Suharto Anggono
2023-Mar-13 18:42 UTC
[Rd] scan(..., skip=1e11): infinite loop; cannot interrupt
With
 if (!j--) {
     R_CheckUserInterrupt();
     j = 10000;
 }
as in current R devel (r83976), j goes negative (-1) and interrupt is checked every 10001 instead of 10000. I prefer
 if (!--j) {
     R_CheckUserInterrupt();
     j = 10000;
 }

In current R devel (r83976), if EOF is reached, the outer loop keeps going, i keeps incrementing until nskip. The outer loop could be made to also stop on EOF. Alternatively, not using nested loop is possible, like the following.
 if (nskip) for (R_xlen_t i = 0, j = 10000; ; ) { /* MBCS-safe */
     c = scanchar(FALSE, &data);
     if (!j--) {
         R_CheckUserInterrupt();
         j = 10000;
     }
     if ((c == '\n' && ++i == nskip) || c == R_EOF)
         break;
 }

-----------
On 2/11/23 09:33, Ivan Krylov wrote:
> On Fri, 10 Feb 2023 23:38:55 -0600
> Spencer Graves <spencer.graves using prodsyse.com> wrote:
>
>> I have a 4.54 GB file that I'm trying to read in chunks using
>> "scan(..., skip=__)".  It works as expected for small values of
>> "skip" but goes into an infinite loop for "skip=1e11" and similar
>> large values of skip:  I cannot even interrupt it;  I must kill R.
>
> Skipping lines is done by two nested loops. The outer loop counts the
> lines to skip; the inner loop reads characters until it encounters a
> newline or end of file. The outer loop doesn't check for EOF and keeps
> asking for more characters until the inner loop runs at least once for
> every line it wants to skip. The following patch should avoid the
> wait in such cases:
>
> --- src/main/scan.c (revision 83797)
> +++ src/main/scan.c (working copy)
> @@ -835,7 +835,7 @@
>   attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
>   {
>       SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
> -    int c, flush, fill, blskip, multiline, escapes, skipNul;
> +    int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
>       R_xlen_t nmax, nlines, nskip;
>       const char *p, *encoding;
>       RCNTXT cntxt;
> @@ -952,7 +952,7 @@
>        if(!data.con->canread)
>    error(_("cannot read from this connection"));
>    }
> - for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
> + for (R_xlen_t i = 0; i < nskip && c != R_EOF; i++) /* MBCS-safe */
>        while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
>       }
>
>
> Making it interruptible is a bit more work: we need to ensure that a
> valid context is set up and check regularly for an interrupt.
>
> --- src/main/scan.c (revision 83797)
> +++ src/main/scan.c (working copy)
> @@ -835,7 +835,7 @@
>   attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
>   {
>       SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
> -    int c, flush, fill, blskip, multiline, escapes, skipNul;
> +    int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
>       R_xlen_t nmax, nlines, nskip;
>       const char *p, *encoding;
>       RCNTXT cntxt;
> @@ -952,8 +952,6 @@
>        if(!data.con->canread)
>    error(_("cannot read from this connection"));
>    }
> - for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
> -     while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
>       }
>
>       ans = R_NilValue; /* -Wall */
> @@ -966,6 +964,10 @@
>       cntxt.cend = &scan_cleanup;
>       cntxt.cenddata = &data;
>
> +    if (ii) for (R_xlen_t i = 0, j = 0; i < nskip && c != R_EOF; i++) /* MBCS-safe */
> + while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF)
> +     if (j++ % 10000 == 9999) R_CheckUserInterrupt();
> +
>       switch (TYPEOF(what)) {
>       case LGLSXP:
>       case INTSXP:
>
> This way, even if you pour a Decanter of Endless Lines (e.g. mkfifo
> LINES; perl -E'print "A"x42 while 1;' > LINES) into scan(), it can
> still be interrupted, even if neither newline nor EOF ever arrives.

Thanks, I've updated the implementation of scan() in R-devel to be
interruptible while skipping lines.

I've done it slightly differently as I found there already was a memory
leak, which could be fixed by creating the context a bit earlier.

I've also avoided modulo on the fast path as I saw 13% performance
overhead on my mailbox file. Decrementing and checking against zero
didn't have measurable overhead.

Best
Tomas

[snip]
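
The two points raised at the top of this message (the check interval of `!j--` versus `!--j`, and a single loop that also stops on EOF) can be tried outside of R with a small standalone sketch. Nothing below is code from src/main/scan.c: `reader`, `next_char`, `SKETCH_EOF`, `skip_lines` and the two `iterations_until_check_*` helpers are hypothetical names made up for illustration, `next_char` stands in for scanchar(), and a plain counter stands in for R_CheckUserInterrupt(). The sketch uses the `!--j` form for the countdown.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_EOF (-1)   /* hypothetical stand-in for R_EOF */

/* Hypothetical in-memory reader standing in for scan()'s input source. */
typedef struct {
    const char *buf;
    size_t len;
    size_t pos;
} reader;

/* Stand-in for scanchar(): next byte, or SKETCH_EOF once exhausted. */
int next_char(reader *r)
{
    return r->pos < r->len ? (unsigned char) r->buf[r->pos++] : SKETCH_EOF;
}

/* Iterations before the first check fires with the post-decrement test.
   j is tested before it is decremented, so j must already be 0. */
long iterations_until_check_post(long period)
{
    assert(period > 0);
    long j = period, n = 0;
    for (;;) {
        n++;
        if (!j--) return n;      /* fires on iteration period + 1 */
    }
}

/* Same measurement with the pre-decrement test. */
long iterations_until_check_pre(long period)
{
    assert(period > 0);
    long j = period, n = 0;
    for (;;) {
        n++;
        if (!--j) return n;      /* fires on iteration period */
    }
}

/* Single-loop skip: stops after nskip newlines or at end of input,
   whichever comes first. *checks counts how often the interrupt
   check would have run. Returns the number of lines skipped. */
long long skip_lines(reader *r, long long nskip, long long *checks)
{
    long long i = 0;
    long j = 10000;
    if (nskip <= 0) return 0;
    for (;;) {
        int c = next_char(r);
        if (!--j) {              /* every 10000 characters read */
            (*checks)++;         /* R_CheckUserInterrupt() would go here */
            j = 10000;
        }
        if ((c == '\n' && ++i == nskip) || c == SKETCH_EOF)
            break;
    }
    return i;
}
```

Called on a two-line buffer with nskip = 100000000000, `skip_lines` returns after reading to the end of the buffer, since the EOF test sits in the same loop as the line counter.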