thr3ads.net - R devel - [Rd] Mapping parse tree elements to tokens [Jul 2015]

Jim Hester

2015-Jul-29 16:13 UTC

[Rd] Mapping parse tree elements to tokens

I would like to map the parsed tokens obtained from utils::getParseData()
to the parse tree and elements obtained by base::parse().

It looks like back when this code was in the parser package the parse()
function annotated the elements in the tree with their id, which would
allow you to perform this mapping.  However when the code was included in R
this functionality was removed.

?getParseData states
  The 'id' values are not attached to the elements of the parse
  tree, they are only retained in the table returned by
  'getParseData'.

Is there another way you can map between the getParseData() tokens and
elements of the parse tree that makes this additional annotation
unnecessary?  Or is this simply not possible?

Duncan Murdoch

2015-Jul-29 16:43 UTC

On 29/07/2015 12:13 PM, Jim Hester wrote:
> I would like to map the parsed tokens obtained from utils::getParseData()
> to the parse tree and elements obtained by base::parse().
>
> It looks like back when this code was in the parser package the parse()
> function annotated the elements in the tree with their id, which would
> allow you to perform this mapping.  However when the code was included in R
> this functionality was removed.
Yes, not all elements of the parse tree can legally have attributes
attached.
>
> ?getParseData states
>    The 'id' values are not attached to the elements of the parse
>    tree, they are only retained in the table returned by
>    'getParseData'.
>
> Is there another way you can map between the getParseData() tokens and
> elements of the parse tree that makes this additional annotation
> unnecessary?  Or is this simply not possible?
I think you can't get to it, though you can get close by looking at the 
id & parent values in the table.  For example,

  code <- "x + (y + 1)"
  p <- parse(text=code)

getParseData(p)
    line1 col1 line2 col2 id parent     token terminal text
15     1    1     1   11 15      0      expr    FALSE
1      1    1     1    1  1      3    SYMBOL     TRUE    x
3      1    1     1    1  3     15      expr    FALSE
2      1    3     1    3  2     15       '+'     TRUE    +
13     1    5     1   11 13     15      expr    FALSE
4      1    5     1    5  4     13       '('     TRUE    (
11     1    6     1   10 11     13      expr    FALSE
5      1    6     1    6  5      7    SYMBOL     TRUE    y
7      1    6     1    6  7     11      expr    FALSE
6      1    8     1    8  6     11       '+'     TRUE    +
8      1   10     1   10  8      9 NUM_CONST     TRUE    1
9      1   10     1   10  9     11      expr    FALSE
10     1   11     1   11 10     13       ')'     TRUE    )


Now p is an expression, with the parse tree in p[[1]].  From the table, 
we can see that the root node has id 15, and 3 nodes have that as a 
parent.  Those would be p[[c(1,1)]], p[[c(1,2)]], p[[c(1,3)]].  The 
tricky part is the re-ordering:  those correspond to `+`, x, and (y+1) 
respectively, not the order they appear in the original source or in the 
table.  Generally the function call appears first in the parse tree, but 
I'm not sure you could always recognize which is the function call by 
looking at the table.
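
The re-ordering described above can be seen by putting the two views side by side. This is only a sketch: the variable names are made up for illustration, and keep.source = TRUE is passed explicitly on the assumption that the code may run non-interactively, where source references are not kept by default.

```r
code <- "x + (y + 1)"
p  <- parse(text = code, keep.source = TRUE)
pd <- getParseData(p)

# Children of the root node according to the table, in source order.
root_id <- pd$id[pd$parent == 0]
kids    <- pd[pd$parent == root_id, ]
kids    <- kids[order(kids$line1, kids$col1), ]
table_text <- getParseText(pd, kids$id)

# Elements of the same call in the parse tree: p[[c(1,1)]], p[[c(1,2)]], ...
tree_text <- unname(vapply(
  as.list(p[[1]]),
  function(e) paste(deparse(e), collapse = ""),
  character(1)
))

table_text  # source order:     "x" "+" "(y + 1)"
tree_text   # parse-tree order: "+" "x" "(y + 1)"
```

The two vectors hold the same pieces in a different order, so matching by position alone does not work. Matching by text happens to work here, but it would be ambiguous whenever siblings share the same text.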

Duncan Murdoch

Michael Lawrence

2015-Jul-29 18:30 UTC

Probably need a generic tree based on "ParseNode" objects that
associate the line information with the symbol (for leaf nodes). As
Duncan notes, it should be possible to gather that from the table.
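
One way such a tree might be gathered from the table is sketched here. build_node() and the field names of the nested list are hypothetical; no "ParseNode" class exists in R itself, and the sketch only uses the id, parent, and position columns that getParseData() returns.

```r
code <- "x + (y + 1)"
pd <- getParseData(parse(text = code, keep.source = TRUE))

# Hypothetical helper: turn one table row and its descendants into a
# nested list that carries the token, its text, and its source position.
build_node <- function(pd, id) {
  row  <- pd[pd$id == id, ]
  kids <- pd[pd$parent == id, ]
  kids <- kids[order(kids$line1, kids$col1), ]  # keep source order
  list(
    id       = id,
    token    = row$token,
    text     = row$text,
    start    = c(line = row$line1, col = row$col1),
    end      = c(line = row$line2, col = row$col2),
    children = lapply(kids$id, function(k) build_node(pd, k))
  )
}

# The root is the row whose parent is 0.
root <- build_node(pd, pd$id[pd$parent == 0])
str(root, max.level = 2)
```

The result mirrors the table, not the language object in p[[1]]: children stay in source order and the intermediate "expr" wrappers are kept.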

But it would be nice if there were an "expr" column in the parse data
table in addition to "text". It would contain the parsed object.
Otherwise, to use the table, one often ends up reparsing the text, which
just seems redundant and inconvenient.
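
The reparsing workaround mentioned above might look like the following sketch. It uses utils::getParseText() to recover the source text of each non-terminal row and then parses that text again; the variable names are illustrative only.

```r
code <- "x + (y + 1)"
p  <- parse(text = code, keep.source = TRUE)
pd <- getParseData(p)

# Source text of every non-terminal "expr" row in the table.
expr_ids <- pd$id[pd$token == "expr"]
texts    <- getParseText(pd, expr_ids)

# Reparse each piece to get a language object for that row.
exprs <- lapply(texts, function(txt) parse(text = txt, keep.source = FALSE)[[1]])
names(exprs) <- expr_ids

# The object rebuilt for the root row matches the original parse tree.
root_id <- pd$id[pd$parent == 0]
identical(exprs[[as.character(root_id)]], p[[1]])
```

This is the redundancy being pointed out: every sub-expression is parsed a second time just to obtain an object the parser had already built.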

Michael
