Laurent Gautier
2019-Dec-07 21:32 UTC
[Rd] Inconsistent behavior for the C API's R_ParseVector() ?
Thanks for the quick response, Tomas.

The same error is indeed happening when trying to have a zero-length variable name in an environment. The surprising bit is then "why is this happening during parsing?" (that is, why are variables being assigned to an environment?).

We are otherwise aware that the error does not occur in the R console, but it can be traced to a call to R_ParseVector() in R's C API (https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509).

Our specific setup is calling an embedded R from Python, using the cffi library. An error on our end was the first possibility considered, but the puzzling specificity of the error (as shown below, other parsing errors are handled properly) and the difficulty of tracing what is happening in R_ParseVector() made me ask whether someone on this list had a suggestion about the possible issue.

```
>>> import rpy2.rinterface as ri
>>> ri.initr()
>>> e = ri.parse("list(''=1+")
---------------------------------------------------------------------------
RParsingError                             Traceback (most recent call last)
>>> e = ri.parse("list(''=123")
R[write to console]: Error: attempt to use zero-length variable name
R[write to console]: Fatal error: unable to initialize the JIT
*** stack smashing detected ***: <unknown> terminated
```

On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:

> Dear Laurent,
>
> Could you please provide a complete reproducible example where parsing results in a crash of R? Calling parse(text="list(''=123") from R works fine for me (gives Error: attempt to use zero-length variable name).
>
> I don't think the problem you observed could be related to the memory leak. The leak is on the heap, not the stack.
>
> Zero-length names of elements in a list are allowed. They are not the same thing as zero-length variables in an environment. If you try to convert "lst" from your example to an environment, you would get the error (attempt to use zero-length variable name).
>
> Best
> Tomas
>
> On 11/30/19 11:55 PM, Laurent Gautier wrote:
> > Hi again,
> >
> > Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of zero-length named elements does not seem consistent either:
> >
> > ```
> > > lst <- list()
> > > lst[[""]] <- 1
> > > names(lst)
> > [1] ""
> > > list("" = 1)
> > Error: attempt to use zero-length variable name
> > ```
> >
> > Should the parser be made to accept as valid what is otherwise possible when using `[[<-` ?
> >
> > Best,
> >
> > Laurent
> >
> > On Sat, Nov 30, 2019 at 17:33, Laurent Gautier <lgautier at gmail.com> wrote:
> >
> >> I found the following code comment in `src/main/gram.c`:
> >>
> >> ```
> >> /* Memory leak
> >>
> >> yyparse(), as generated by bison, allocates extra space for the parser
> >> stack using malloc(). Unfortunately this means that there is a memory
> >> leak in case of an R error (long-jump). In principle, we could define
> >> yyoverflow() to relocate the parser stacks for bison and allocate say on
> >> the R heap, but yyoverflow() is undocumented and somewhat complicated
> >> (we would have to replicate some macros from the generated parser here).
> >> The same problem exists at least in the Rd and LaTeX parsers in tools.
> >> */
> >> ```
> >>
> >> Could this be related to the issue?
> >>
> >> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier <lgautier at gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> The behavior of
> >>> ```
> >>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
> >>> ```
> >>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent depending on the string to be parsed.
> >>>
> >>> Trying to parse a string such as `"list(''=1+"` sets the `ParseStatus` to an incomplete-parsing error, but trying to parse `"list(''=123"` results in R sending a message to the console (followed by a crash):
> >>>
> >>> ```
> >>> R[write to console]: Error: attempt to use zero-length variable name
> >>> R[write to console]: Fatal error: unable to initialize the JIT
> >>> *** stack smashing detected ***: <unknown> terminated
> >>> ```
> >>>
> >>> Is there a reason for the difference in behavior, and is there a workaround?
> >>>
> >>> Thanks,
> >>>
> >>> Laurent
Tomas Kalibera
2019-Dec-09 10:43 UTC
[Rd] Inconsistent behavior for the C API's R_ParseVector() ?
On 12/7/19 10:32 PM, Laurent Gautier wrote:

> Thanks for the quick response Tomas.
>
> The same error is indeed happening when trying to have a zero-length variable name in an environment. The surprising bit is then "why is this happening during parsing" (that is why are variables assigned to an environment) ?

The emitted R error (in the R console) is not a parse (syntax) error, but an error emitted during parsing when the parser tries to intern a name, that is, to look it up in a symbol table. An empty string is not allowed as a symbol name, hence the error. In the call "list(''=1)", the empty name is what could eventually become the name of a local variable inside list(), even though not yet during parsing.

There is probably some error in how the external code is handling R errors (Fatal error: unable to initialize the JIT, stack smashing, etc.) and possibly also in how R is initialized before calling R_ParseVector(). You would probably get the same problem when running, say, "stop('myerror')". Please note that R errors are implemented as long-jumps, so care has to be taken when calling into R; Writing R Extensions has more details (and section 8 specifically about embedding R). This is unlike parse (syntax) errors, which are signaled through the status that R_ParseVector() returns to its caller.

Best,
Tomas
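A minimal sketch of the distinction described above, assuming nothing about R's actual parser: one kind of failure is reported through a status value, while the other leaves the function by a long jump. Every name here (`toy_parse`, `toy_intern`, `toy_parse_protected`, the `toy_status` codes) is hypothetical and only loosely mirrors the `"list(''=1+"` versus `"list(''=123"` pair from this thread.

```c
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; not R's ParseStatus. */
typedef enum { TOY_PARSE_OK, TOY_PARSE_INCOMPLETE } toy_status;

/* Jump target that a caller has to set up before parsing. */
static jmp_buf toy_error_target;

/* Toy symbol-table lookup: an empty name is an error, raised by long jump. */
void toy_intern(const char *name)
{
    if (name[0] == '\0')
        longjmp(toy_error_target, 1);   /* never returns to toy_parse() */
}

/* Toy parser for "NAME=VALUE". An unfinished value is reported through the
   return value; an empty NAME is only noticed later, when it is interned. */
toy_status toy_parse(const char *text)
{
    const char *eq = strchr(text, '=');
    size_t len = strlen(text);
    char name[64];
    size_t n;

    if (eq == NULL || eq[1] == '\0' || text[len - 1] == '+')
        return TOY_PARSE_INCOMPLETE;    /* ordinary, status-based failure */

    n = (size_t)(eq - text);
    if (n >= sizeof name)
        n = sizeof name - 1;
    memcpy(name, text, n);
    name[n] = '\0';
    toy_intern(name);                   /* may long-jump out of here */
    return TOY_PARSE_OK;
}

/* Returns 0 if toy_parse() returned normally (status stored in *status),
   1 if it was left through a long jump. */
int toy_parse_protected(const char *text, toy_status *status)
{
    if (setjmp(toy_error_target) != 0)
        return 1;                       /* arrived here via longjmp */
    *status = toy_parse(text);
    return 0;
}
```

In this sketch, calling `toy_parse("=123")` directly, without a prior `setjmp()` on `toy_error_target`, has undefined behavior in C, which is the general kind of hazard the long-jump remark points at. Whether that is what happens in the embedded case discussed here is not established by this example.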
Laurent Gautier
2019-Dec-09 13:54 UTC
[Rd] Inconsistent behavior for the C API's R_ParseVector() ?
On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:

> The emitted R error (in the R console) is not a parse (syntax) error, but an error emitted during parsing when the parser tries to intern a name - look it up in a symbol table. Empty string is not allowed as a symbol name, and hence the error. In the call "list(''=1)" , the empty name is what could eventually become a name of a local variable inside list(), even though not yet during parsing.

Thanks Tomas. I guess this has to do with R expressions being lazily evaluated, and with the names of arguments in a call also being part of the expression. Now the puzzling part is why that is part of parsing at all: I would have expected R_ParseVector() to be restricted to parsing... Now it feels like R_ParseVector() is performing parsing plus a first level of evaluation for expressions that "should never work" (the empty name).

> There is probably some error in how the external code is handling R errors (Fatal error: unable to initialize the JIT, stack smashing, etc) and possibly also how R is initialized before calling ParseVector. Probably you would get the same problem when running say "stop('myerror')". Please note R errors are implemented as long-jumps, so care has to be taken when calling into R, Writing R Extensions has more details (and section 8 specifically about embedding R). This is unlike parse (syntax) errors signaled via return value to ParseVector()

The issue is that the segfault (because of stack smashing, and therefore because of what I also suspect to be an uncontrolled jump) is happening within the execution of R_ParseVector(). I would think that an issue with the initialization of R is less likely because the project is otherwise used a fair bit and is well covered by automated continuous tests.

After looking more into R's gram.c, I suspect that an execution context is required for R_ParseVector() to work properly (to know where to jump in case of error) when the parsing code decides to fail outside of what it considers a syntax error. If that is the case, R_ParseVector() would function well when called from, say, a C extension to an R package, but fail the way I am seeing it fail when called from an embedded R.

Best,

Laurent
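The "execution context" idea in the message above can be illustrated with a toy context stack. This is only a sketch under stated assumptions, not R's implementation: `toy_context`, `toy_raise`, `toy_toplevel_exec` and the two sample steps are hypothetical names, and returning -1 when no context exists is a simplification made so that the example can be tested.

```c
#include <setjmp.h>
#include <stddef.h>

/* One entry of a toy context stack: where to jump if an error is raised. */
typedef struct toy_context {
    jmp_buf target;
    struct toy_context *outer;
} toy_context;

/* Innermost active context, or NULL when nobody has set one up. */
static toy_context *toy_current = NULL;

/* Raise an error. With a context, control jumps to it and this call never
   returns; without one there is nowhere safe to go, so -1 is returned. */
int toy_raise(void)
{
    if (toy_current == NULL)
        return -1;
    longjmp(toy_current->target, 1);
}

/* Run fn(data) inside a fresh context.
   Returns 1 if fn completed, 0 if it was left through toy_raise(). */
int toy_toplevel_exec(void (*fn)(void *), void *data)
{
    toy_context ctx;
    ctx.outer = toy_current;
    toy_current = &ctx;
    if (setjmp(ctx.target) != 0) {
        toy_current = ctx.outer;        /* pop the context after an error */
        return 0;
    }
    fn(data);
    toy_current = ctx.outer;            /* pop the context on normal exit */
    return 1;
}

/* Sample step that records progress and then fails. */
void toy_failing_step(void *data)
{
    *(int *)data = 1;
    toy_raise();
    *(int *)data = 99;                  /* not reached when a context exists */
}

/* Sample step that completes normally. */
void toy_ok_step(void *data)
{
    *(int *)data = 2;
}
```

Code that calls a function which may fail this way is only safe if some caller further up has established a context first. R's headers declare helpers of this general shape (for example `R_ToplevelExec()`), but whether wrapping the R_ParseVector() call in one of them resolves the crash reported in this thread is an assumption to be verified, not something the thread confirms.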