Discussion:
lib2to3 and matching of fixers
Vinay Sajip
2011-12-13 22:56:37 UTC
Permalink
I'm trying to write a custom lib2to3 fixer which, when processing the source

x = b'foo' b'bar'

will match the right hand side of the expression. From what I can see I should
be able to match it with 'atom' or 'power', but it doesn't seem to match either
of those. The individual byte-string tokens are matched using STRING, as
expected.

What should I use to match the right-hand side, for example if I want to do
constant folding and transform to something like b'foobar'?

Regards,

Vinay Sajip
Benjamin Peterson
2011-12-14 01:30:16 UTC
Permalink
Post by Vinay Sajip
I'm trying to write a custom lib2to3 fixer which, when processing the source
x = b'foo' b'bar'
will match the right hand side of the expression. From what I can see I should
be able to match it with 'atom' or 'power', but it doesn't seem to match either
of those. The individual byte-string tokens are matched using STRING, as
expected.
What are you using exactly? I would expect "atom< STRING+>" to work.
Post by Vinay Sajip
What should I use to match the right-hand side, for example if I want to do
constant folding and transform to something like b'foobar'?
--
Regards,
Benjamin
Vinay Sajip
2011-12-14 11:43:10 UTC
Permalink
Post by Benjamin Peterson
What are you using exactly? I would expect "atom< STRING+>" to
work.
Thanks, that worked. I tried "atom" and "STRING+" separately - sorry, I'm unfamiliar with the pattern-matching
language, but hopefully I'll get more familiar as I work with it!
I'm now trying to match a tuple using the following pattern:

PATTERN = "atom< '(' args=testlist_gexp< any+ > ')' >"


This is a subset of the pattern used in the isinstance fixer that Armin Ronacher wrote. However, the pattern fails to match the source

x = (int, long)


The parse tree for this source looks like this:

Node(file_input, [Node(simple_stmt, [Node(expr_stmt, [Leaf(1, 'x'), Leaf(22, '='), Node(atom, [Leaf(7, '('), Node(testlist_gexp, [Leaf(1, 'int'), Leaf(12, ','), Leaf(1, 'long')]), Leaf(8, ')')])]), Leaf(4, '\n')]), Leaf(0, '')])


Since it contains an "atom" node with children consisting of a parenthesised "testlist_gexp" node, I'm not sure why there is no match. What am I missing?

Thanks and regards,

Vinay Sajip
Benjamin Peterson
2011-12-14 17:35:25 UTC
Permalink
Post by Vinay Sajip
Post by Benjamin Peterson
What are you using exactly? I would expect "atom< STRING+>" to
work.
Thanks, that worked. I tried "atom" and "STRING+" separately - sorry, I'm unfamiliar with the pattern-matching
language, but hopefully I'll get more familiar as I work with it!
PATTERN = "atom< '(' args=testlist_gexp< any+ > ')' >"
This is a subset of the pattern used in the isinstance fixer that Armin Ronacher wrote. However, the pattern fails to match the source
x = (int, long)
Node(file_input, [Node(simple_stmt, [Node(expr_stmt, [Leaf(1, 'x'), Leaf(22, '='), Node(atom, [Leaf(7, '('), Node(testlist_gexp, [Leaf(1, 'int'), Leaf(12, ','), Leaf(1, 'long')]), Leaf(8, ')')])]), Leaf(4, '\n')]), Leaf(0, '')])
You have to account for the newline.
--
Regards,
Benjamin
Vinay Sajip
2011-12-14 20:38:35 UTC
Permalink
Post by Benjamin Peterson
You have to account for the newline.
Thanks for the response, but it's not clear how or why. As to the "why": the
portion of the tree I get by looking at

tree.children[0].children[0].children[2]

is

Node(atom, [
Leaf(7, '('),
Node(testlist_gexp, [
Leaf(1, 'int'),
Leaf(12, ','),
Leaf(1, 'long')
]),
Leaf(8, ')')
])

which is the bit I expect to match. It does not contain the newline (that's
present as the last child the enclosing grandparent node (simple_stmt) obtained
by looking at tree.children[0]:

Node(simple_stmt, [
Node(expr_stmt, [
Leaf(1, 'x'),
Leaf(22, '='),
Node(atom, [
Leaf(7, '('),
Node(testlist_gexp, [
Leaf(1, 'int'),
Leaf(12, ','),
Leaf(1, 'long')
]),
Leaf(8, ')')
])
]),
Leaf(4, '\n')
])

The pattern I used appears consistent with the subset of the pattern used in
the is_instance fixer to match a tuple in the isinstance call, other than
whitespace in the pattern string itself:

power<
'isinstance'
trailer< '(' arglist< any ',' atom< '('
args=testlist_gexp< any+ >
')' > > ')' >
in the is_instance fixer, as against my fixer's

atom< '(' args=testlist_gexp< any+ > ')' >

So I can't see why the newline needs to be accounted for, nor how to do it.
Arranging the whitespace in my pattern to exactly match that segment in the
is_instance fixer's pattern made no difference, as I would have expected.

So, how do I account for the newline, and why do I need to when it's two levels
up in the tree?

Thanks for your patience with lib2to3 newbie.

Regards,

Vinay Sajip
Benjamin Peterson
2011-12-14 22:18:19 UTC
Permalink
Post by Vinay Sajip
Post by Benjamin Peterson
You have to account for the newline.
Thanks for the response, but it's not clear how or why. As to the "why": the
portion of the tree I get by looking at
Sorry. I was trying to read the parse tree from the linear version in
your last mail and failed. You are correct.

Does your fixer us BM_compatible = True? Try setting that to false.
--
Regards,
Benjamin
Vinay Sajip
2011-12-14 23:01:52 UTC
Permalink
Post by Benjamin Peterson
Does your fixer us BM_compatible = True? Try setting that to false.
With BM_compatible = False, the match is indeed made, and my fixer's transform method called. Thanks!

Is the meaning of the BM_compatible flag, or the overall matching and transformation algorithm, documented somewhere? Most of the fixers I looked at had BM_compatible = True, and I took my cue from the fix_isinstance fixer as it appeared closest to what I was trying to do.

Regards,

Vinay Sajip
Benjamin Peterson
2011-12-14 23:05:31 UTC
Permalink
Post by Vinay Sajip
Post by Benjamin Peterson
Does your fixer us BM_compatible = True? Try setting that to false.
With BM_compatible = False, the match is indeed made, and my fixer's transform method called. Thanks!
Is the meaning of the BM_compatible flag, or the overall matching and transformation algorithm, documented somewhere? Most of the fixers I looked at had BM_compatible = True, and I took my cue from the fix_isinstance fixer as it appeared closest to what I was trying to do.
No, unfortunately. The BM matcher is faster, but has weird behavior on
some things (as you can see).
--
Regards,
Benjamin
Vinay Sajip
2011-12-14 09:23:23 UTC
Permalink
Post by Benjamin Peterson
What are you using exactly? I would expect "atom< STRING+>" to
work.
Thanks, that worked. I tried "atom" and "STRING+" separately - sorry, I'm unfamiliar with the pattern-matching language, but hopefully I'll get more familiar as I work with it!

Regards,

Vinay Sajip

Loading...