thr3ads.net - R help - [R] iterators : checkFunc with ireadLines [May 2020]

Laurent Rhelp
2020-May-18 20:21 UTC
[R] iterators : checkFunc with ireadLines

Great! This is exactly what I had in mind.
I like the nextElem call in the skip argument.
Thank you very much, William.
Best regards,
Laurent
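
For readers of this thread, the two-step approach that William describes in the quoted message (index the first column, then read only the matching lines with scan()) can be sketched with base R alone. This is a minimal sketch, not William's exact code: it assumes a small whitespace-separated file whose first field is exactly 4 characters wide, it writes a temporary demo file holding the first two values of each row from the thread, and it replaces the iterator with a plain lapply() over the matching line numbers.

```r
sensors <- c("N053", "N163")

# Demo file (assumption: truncated copy of the rows shown in the thread)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Time 0.000000 0.000999",
             "N023 -0.031323 -0.035026",
             "N053 -0.014083 -0.004741",
             "N123 -0.019008 -0.013494",
             "N163 -0.054023 -0.049345",
             "N193 -0.022171 -0.022384"), tmp)

# Pass 1: read only the first 4 characters of every line
first_col <- read.fwf(tmp, widths = 4, as.is = TRUE, flush = TRUE)
index1 <- which(first_col$V1 %in% sensors)

# Pass 2: for each matching line number, skip to that line and read it alone
rows <- lapply(index1, function(i) {
  scan(tmp, what = "", skip = i - 1, nlines = 1, quiet = TRUE)
})

unlink(tmp)
```

Each call to scan() reopens the file and skips from the top, so this trades repeated passes over the file for a small memory footprint.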


On 18/05/2020 at 20:37, William Michels wrote:
> Hi Laurent,
>
> Thank you for explaining your size limitations. Below is an example
> using the read.fwf() function to grab the first column of your input
> file (in 2000 row chunks). This column is converted to an index, and
> the index is used to create an iterator useful for skipping lines when
> reading input with scan(). (You could try processing your large file
> in successive 2000 line chunks, or whatever number of lines fits into
> memory). Maybe not as elegant as the approach you were going for, but
> read.fwf() should be pretty efficient:
>
> > sensors <- c("N053", "N163")
> > read.fwf("test2.txt", widths=c(4), as.is=TRUE, flush=TRUE, n=2000, skip=0)
>     V1
> 1 Time
> 2 N023
> 3 N053
> 4 N123
> 5 N163
> 6 N193
> > first_col <- read.fwf("test2.txt", widths=c(4), as.is=TRUE, flush=TRUE, n=2000, skip=0)
> > which(first_col$V1 %in% sensors)
> [1] 3 5
> > index1 <- which(first_col$V1 %in% sensors)
> > iter_index1 <- iter(1:2000, checkFunc= function(n) {n %in% index1})
> > unlist(scan(file="test2.txt", what=list("","","","","","","","","",""),
> +   flush=TRUE, multi.line=FALSE, skip=nextElem(iter_index1)-1, nlines=1, quiet=TRUE))
>  [1] "N053"      "-0.014083" "-0.004741" "0.001443"  "-0.010152" "-0.012996" "-0.005337" "-0.008738" "-0.015094" "-0.012104"
> > unlist(scan(file="test2.txt", what=list("","","","","","","","","",""),
> +   flush=TRUE, multi.line=FALSE, skip=nextElem(iter_index1)-1, nlines=1, quiet=TRUE))
>  [1] "N163"      "-0.054023" "-0.049345" "-0.037158" "-0.04112"  "-0.044612" "-0.036953" "-0.036061" "-0.044516" "-0.046436"
> (Note for this email and the previous one, I've deleted the first
> "hash" character from each line of your test file for clarity).
>
> HTH, Bill.
>
> W. Michels, Ph.D.
>
>
>
>
>
> On Mon, May 18, 2020 at 3:35 AM Laurent Rhelp <LaurentRHelp at free.fr> wrote:
>> Dear William,
>> Thank you for your answer.
>> My file is very large, so I cannot read it into memory (I cannot use
>> read.table). I therefore want to load only the lines I need to process.
>> With readLines, as I did, it works, but I would like to use an iterator
>> and a foreach loop to understand this way of working, because I thought
>> it would be a better way to write clean code.
>>
>>
>> On 18/05/2020 at 04:54, William Michels wrote:
>>> Apologies, Laurent, for this two-part answer. I misunderstood your
>>> post where you stated you wanted to "filter(ing) some
>>> selected lines according to the line name... ." I thought that meant
>>> you had a separate index (like a series of primes) that you wanted to
>>> use to only read-in selected line numbers from a file (test file below
>>> with numbers 1:1000 each on a separate line):
>>>
>>> > library(gmp)
>>> > library(iterators)
>>> > iprime <- iter(1:100, checkFunc = function(n) isprime(n))
>>> > scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>> Read 1 item
>>> [1] 2
>>> > scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>> Read 1 item
>>> [1] 3
>>> > scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>> Read 1 item
>>> [1] 5
>>> > scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>> Read 1 item
>>> [1] 7
>>> However, what it really seems that you want to do is read each line of
>>> a (possibly enormous) file, test each line "string-wise" to keep or
>>> discard, and if you're keeping it, append the line to a list. I can
>>> certainly see the advantage of this strategy for reading in very, very
>>> large files, but it's not clear to me how the "ireadLines" function (in
>>> the "iterators" package) will help you, since it doesn't seem to
>>> generate anything but a sequential index.
>>>
>>> Anyway, below is an absolutely standard read-in of your data using
>>> read.table(). Hopefully some of the code I've posted has been useful
>>> to you.
>>>
>>> > sensors <- c("N053", "N163")
>>> > read.table("test2.txt")
>>>     V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
>>> 1 Time  0.000000  0.000999  0.001999  0.002998  0.003998  0.004997  0.005997  0.006996  0.007996
>>> 2 N023 -0.031323 -0.035026 -0.029759 -0.024886 -0.024464 -0.026816 -0.033690 -0.041067 -0.038747
>>> 3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
>>> 4 N123 -0.019008 -0.013494 -0.013180 -0.029208 -0.032748 -0.020243 -0.015089 -0.014439 -0.011681
>>> 5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
>>> 6 N193 -0.022171 -0.022384 -0.022338 -0.023304 -0.022569 -0.021827 -0.021996 -0.021755 -0.021846
>>> > Laurent_data <- read.table("test2.txt")
>>> > Laurent_data[Laurent_data$V1 %in% sensors, ]
>>>     V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
>>> 3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
>>> 5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
>>>
>>> Best, Bill.
>>>
>>> W. Michels, Ph.D.
>>>
>>>
>>> On Sun, May 17, 2020 at 5:43 PM Laurent Rhelp <LaurentRHelp at free.fr> wrote:
>>>> Dear R-Help List,
>>>>
>>>> I would like to use an iterator to read a file, keeping only some
>>>> selected lines according to the line name, so that I can then use a
>>>> foreach loop. I wanted to use the checkFunc argument, as in the following
>>>> example found on the internet, which selects only prime numbers:
>>>>
>>>>     iprime <- iter(1:100, checkFunc = function(n) isprime(n))
>>>>
>>>> (https://datawookie.netlify.app/blog/2013/11/iterators-in-r/)
>>>>
>>>> but the checkFunc argument does not seem to be available with the
>>>> ireadLines function (package iterators). So I wrote the code below to
>>>> solve my problem, but I am sure I am missing something about using
>>>> iterators with files. Since I found nothing on the web about ireadLines
>>>> and the checkFunc argument, could somebody help me understand how to use
>>>> an iterator (and a foreach loop) on files while keeping only selected
>>>> lines?
>>>>
>>>> Thank you very much
>>>> Laurent
>>>>
>>>> Presently here is my code:
>>>>
>>>> ##        mock file to read: test.txt
>>>> ##
>>>> # Time    0    0.000999    0.001999    0.002998    0.003998    0.004997    0.005997    0.006996    0.007996
>>>> # N023    -0.031323    -0.035026    -0.029759    -0.024886    -0.024464    -0.026816    -0.03369    -0.041067    -0.038747
>>>> # N053    -0.014083    -0.004741    0.001443    -0.010152    -0.012996    -0.005337    -0.008738    -0.015094    -0.012104
>>>> # N123    -0.019008    -0.013494    -0.01318    -0.029208    -0.032748    -0.020243    -0.015089    -0.014439    -0.011681
>>>> # N163    -0.054023    -0.049345    -0.037158    -0.04112    -0.044612    -0.036953    -0.036061    -0.044516    -0.046436
>>>> # N193    -0.022171    -0.022384    -0.022338    -0.023304    -0.022569    -0.021827    -0.021996    -0.021755    -0.021846
>>>>
>>>>
>>>> # sensors to keep
>>>>
>>>> sensors <-  c("N053", "N163")
>>>>
>>>>
>>>> library(iterators)
>>>>
>>>> library(rlist)
>>>>
>>>>
>>>> file_name <- "test.txt"
>>>>
>>>> con_obj <- file( file_name , "r")
>>>> ifile <- ireadLines( con_obj , n = 1 )
>>>>
>>>>
>>>> ## I do not do a loop for the example
>>>>
>>>> res <- list()
>>>>
>>>> r <- get_Lines_iter( ifile , sensors)
>>>> res <- list.append( res , r )
>>>> res
>>>> r <- get_Lines_iter( ifile , sensors)
>>>> res <- list.append( res , r )
>>>> res
>>>> r <- get_Lines_iter( ifile , sensors)
>>>> do.call("cbind",res)
>>>>
>>>> ## the function get_Lines_iter to select and process the line
>>>> ## (it must be defined before the calls above are run)
>>>>
>>>> get_Lines_iter <- function(iter, sensors, sep = '\t', quiet = FALSE) {
>>>>   ## read the next record from the iterator
>>>>   r <- try(nextElem(iter))
>>>>   while (TRUE) {
>>>>     if (inherits(r, "try-error")) {
>>>>       stop("The iterator is empty")
>>>>     }
>>>>     ## split the line that was read according to the separator
>>>>     r_txt <- textConnection(r)
>>>>     fields <- scan(file = r_txt, what = "character", sep = sep, quiet = quiet)
>>>>     close(r_txt)
>>>>     ## test whether we have to keep the line
>>>>     if (fields[1] %in% sensors) {
>>>>       ## data processing for the selected line (for the example,
>>>>       ## conversion to a data frame)
>>>>       n <- length(fields)
>>>>       x <- data.frame(as.numeric(fields[2:n]))
>>>>       names(x) <- fields[1]
>>>>       ## return the values
>>>>       print(paste0("sensor ", fields[1], " ok"))
>>>>       return(x)
>>>>     } else {
>>>>       print(paste0("Sensor ", fields[1], " not selected"))
>>>>       r <- try(nextElem(iter))
>>>>     }
>>>>   } # end while loop
>>>> }
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
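
The goal stated in the original post (walk through a large file line by line and keep only the lines whose name is in a given set) can also be sketched without the iterators package, using a closure over an open connection. This is a minimal sketch under stated assumptions: make_filtered_reader() and first_field() are hypothetical helper names, not functions from any package, and the demo file is a truncated, tab-separated copy of the rows shown in the thread.

```r
# Hypothetical helper: returns a function that yields the next line accepted
# by keep(), or NULL once the connection is exhausted.
make_filtered_reader <- function(con, keep) {
  function() {
    repeat {
      line <- readLines(con, n = 1L)
      if (length(line) == 0L) return(NULL)  # end of file
      if (keep(line)) return(line)
    }
  }
}

# Hypothetical helper: name of the line, i.e. its first tab-separated field
first_field <- function(line) strsplit(line, "\t", fixed = TRUE)[[1]][1]

# Demo file (assumption: first two values of each row from the thread)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Time\t0\t0.000999",
             "N023\t-0.031323\t-0.035026",
             "N053\t-0.014083\t-0.004741",
             "N163\t-0.054023\t-0.049345"), tmp)

sensors <- c("N053", "N163")
con <- file(tmp, "r")
next_line <- make_filtered_reader(con, function(l) first_field(l) %in% sensors)

# Only one line is held in memory at a time; rejected lines are discarded
kept <- character(0)
while (!is.null(l <- next_line())) kept <- c(kept, l)

close(con)
unlink(tmp)
```

The keep() argument plays the role that checkFunc plays for iter(): any predicate on the raw line can be passed in, and the file is read only once.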