Suharto Anggono Suharto Anggono
2023-Mar-13 18:42 UTC
[Rd] scan(..., skip=1e11): infinite loop; cannot interrupt
With
 if (!j--) {
     R_CheckUserInterrupt();
     j = 10000;
 }
as in current R devel (r83976), j goes negative (-1) and the interrupt is
checked every 10001 iterations instead of every 10000. I prefer
 if (!--j) {
     R_CheckUserInterrupt();
     j = 10000;
 }
.
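The off-by-one between the two forms can be checked with a small standalone sketch. The function names below are made up for illustration and are not part of R; each function only counts how many loop iterations pass before the check would fire for the first time.

```c
#include <assert.h>

/* Post-decrement form, if (!j--): the OLD value of j is tested and j is
   decremented afterwards, so the test first succeeds on iteration
   period + 1 and leaves j at -1. */
long iterations_until_check_post(long period)
{
    assert(period > 0);
    long j = period, n = 0;
    for (;;) {
        n++;
        if (!j--)
            return n;
    }
}

/* Pre-decrement form, if (!--j): j is decremented first and the NEW
   value is tested, so the test first succeeds on iteration period. */
long iterations_until_check_pre(long period)
{
    assert(period > 0);
    long j = period, n = 0;
    for (;;) {
        n++;
        if (!--j)
            return n;
    }
}
```

With a period of 10000, the first function returns 10001 and the second returns 10000. Since j is reset to the same value after each check, the same spacing applies to every later check.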
In current R devel (r83976), if EOF is reached, the outer loop keeps going, and i keeps incrementing until it reaches nskip.
The outer loop could be made to also stop on EOF.
Alternatively, a single loop instead of nested loops is possible, like the following.
 if (nskip) for (R_xlen_t i = 0, j = 10000; ; ) { /* MBCS-safe */
     c = scanchar(FALSE, &data);
     if (!j--) {
         R_CheckUserInterrupt();
         j = 10000;
     }
     if ((c == '\n' && ++i == nskip) || c == R_EOF)
         break;
 }
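To see the difference in behaviour at EOF, here is a minimal standalone sketch that replaces scanchar() with a hypothetical in-memory reader (sketch_reader, sketch_getc and SKETCH_EOF are stand-ins invented for this example, not R internals) and counts how many read calls each loop shape makes. The interrupt check is left out to keep the comparison short.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_EOF (-1)            /* stand-in for R_EOF */

typedef struct {
    const char *buf;               /* input text */
    size_t len, pos;               /* length and current read position */
    long calls;                    /* number of read calls made so far */
} sketch_reader;

/* Stand-in for scanchar(): returns the next byte, or SKETCH_EOF forever
   once the buffer is exhausted. */
static int sketch_getc(sketch_reader *r)
{
    r->calls++;
    return r->pos < r->len ? (unsigned char) r->buf[r->pos++] : SKETCH_EOF;
}

/* Nested loops where the outer loop does not test for EOF: after EOF,
   every remaining "line" still costs one read call. */
long skip_nested(sketch_reader *r, long nskip)
{
    int c;
    for (long i = 0; i < nskip; i++)
        while ((c = sketch_getc(r)) != '\n' && c != SKETCH_EOF)
            ;
    return r->calls;
}

/* Single loop: stops after the nskip-th newline or at the first EOF. */
long skip_single(sketch_reader *r, long nskip)
{
    assert(nskip >= 0);
    if (nskip)
        for (long i = 0; ; ) {
            int c = sketch_getc(r);
            if ((c == '\n' && ++i == nskip) || c == SKETCH_EOF)
                break;
        }
    return r->calls;
}
```

For a three-line input and nskip = 1000, the nested version makes 1003 read calls (6 for the data, then one per remaining line), while the single loop makes 7 (6 for the data plus the one that sees EOF). With skip=1e11 the first count grows accordingly, which matches the reported hang.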
-----------
On 2/11/23 09:33, Ivan Krylov wrote:
> On Fri, 10 Feb 2023 23:38:55 -0600
> Spencer Graves <spencer.graves using prodsyse.com> wrote:
>
>> I have a 4.54 GB file that I'm trying to read in chunks using
>> "scan(..., skip=__)".  It works as expected for small values of
>> "skip" but goes into an infinite loop for "skip=1e11" and similar
>> large values of skip:  I cannot even interrupt it;  I must kill R.
> Skipping lines is done by two nested loops. The outer loop counts the
> lines to skip; the inner loop reads characters until it encounters a
> newline or end of file. The outer loop doesn't check for EOF and keeps
> asking for more characters until the inner loop runs at least once for
> every line it wants to skip. The following patch should avoid the
> wait in such cases:
>
> --- src/main/scan.c (revision 83797)
> +++ src/main/scan.c (working copy)
> @@ -835,7 +835,7 @@
>   attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
>   {
>       SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
> -    int c, flush, fill, blskip, multiline, escapes, skipNul;
> +    int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
>       R_xlen_t nmax, nlines, nskip;
>       const char *p, *encoding;
>       RCNTXT cntxt;
> @@ -952,7 +952,7 @@
>        if(!data.con->canread)
>    error(_("cannot read from this connection"));
>    }
> - for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
> + for (R_xlen_t i = 0; i < nskip && c != R_EOF; i++) /* MBCS-safe */
>        while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
>       }
>
>
> Making it interruptible is a bit more work: we need to ensure that a
> valid context is set up and check regularly for an interrupt.
>
> --- src/main/scan.c (revision 83797)
> +++ src/main/scan.c (working copy)
> @@ -835,7 +835,7 @@
>   attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
>   {
>       SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
> -    int c, flush, fill, blskip, multiline, escapes, skipNul;
> +    int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
>       R_xlen_t nmax, nlines, nskip;
>       const char *p, *encoding;
>       RCNTXT cntxt;
> @@ -952,8 +952,6 @@
>        if(!data.con->canread)
>    error(_("cannot read from this connection"));
>    }
> - for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
> -     while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
>       }
>
>       ans = R_NilValue; /* -Wall */
> @@ -966,6 +964,10 @@
>       cntxt.cend = &scan_cleanup;
>       cntxt.cenddata = &data;
>
> +    if (ii) for (R_xlen_t i = 0, j = 0; i < nskip && c != R_EOF; i++) /* MBCS-safe */
> + while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF)
> +     if (j++ % 10000 == 9999) R_CheckUserInterrupt();
> +
>       switch (TYPEOF(what)) {
>       case LGLSXP:
>       case INTSXP:
>
> This way, even if you pour a Decanter of Endless Lines (e.g. mkfifo
> LINES; perl -E'print "A"x42 while 1;' > LINES) into scan(), it can
> still be interrupted, even if neither newline nor EOF ever arrives.
Thanks, I've updated the implementation of scan() in R-devel to be
interruptible while skipping lines.
I've done it slightly differently as I found there already was a memory
leak, which could be fixed by creating the context a bit earlier.
I've also avoided modulo on the fast path as I saw 13% performance
overhead on my mailbox file. Decrementing and checking against zero
didn't have measurable overhead.
Best
Tomas
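The two ways of spacing out the interrupt check that are discussed above (a modulo test on a running counter, and a countdown that is reset after each check) can be compared in a small standalone sketch. This is not the code in scan.c; the function names are hypothetical, and a simple counter stands in for the call to R_CheckUserInterrupt(). It only shows that both variants check equally often; it makes no claim about their relative speed.

```c
#include <assert.h>

/* Modulo variant, as in the quoted patch: the check fires whenever the
   running counter is one short of a multiple of period. */
long checks_with_modulo(long n, long period)
{
    assert(period > 0);
    long checks = 0, j = 0;
    for (long k = 0; k < n; k++)
        if (j++ % period == period - 1)
            checks++;              /* stand-in for R_CheckUserInterrupt() */
    return checks;
}

/* Countdown variant: decrement, test against zero, reset after a check.
   No division is needed on the common path. */
long checks_with_countdown(long n, long period)
{
    assert(period > 0);
    long checks = 0, j = period;
    for (long k = 0; k < n; k++)
        if (!--j) {
            checks++;              /* stand-in for R_CheckUserInterrupt() */
            j = period;
        }
    return checks;
}
```

For 25000 iterations and a period of 10000, both functions report 2 checks, fired on iterations 10000 and 20000.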
[snip]
