thr3ads.net - R help - [R] iterators : checkFunc with ireadLines [May 2020]


Laurent Rhelp

2020-May-17 13:52 UTC

[R] iterators : checkFunc with ireadLines

Dear R-Help List,

I would like to use an iterator to read a file, keeping only some
selected lines according to the line name, so that I can then use it in a
foreach loop. I wanted to use the checkFunc argument, as in the following
example found on the internet that selects only prime numbers:

    iprime <- iter(1:100, checkFunc = function(n) isprime(n))

(https://datawookie.netlify.app/blog/2013/11/iterators-in-r/)

but the checkFunc argument does not seem to be available for the function
ireadLines (package iterators). So I wrote the code below to solve my
problem, but I am sure I am missing something about using iterators with files.
Since I found nothing on the web about ireadLines and the checkFunc
argument, could somebody help me understand how to use an
iterator (and a foreach loop) on a file while keeping only selected lines?

Thank you very much
Laurent

Here is my current code:

##        mock file to read: test.txt
##
# Time    0    0.000999    0.001999    0.002998    0.003998    0.004997    0.005997    0.006996    0.007996
# N023    -0.031323    -0.035026    -0.029759    -0.024886    -0.024464    -0.026816    -0.03369    -0.041067    -0.038747
# N053    -0.014083    -0.004741    0.001443    -0.010152    -0.012996    -0.005337    -0.008738    -0.015094    -0.012104
# N123    -0.019008    -0.013494    -0.01318    -0.029208    -0.032748    -0.020243    -0.015089    -0.014439    -0.011681
# N163    -0.054023    -0.049345    -0.037158    -0.04112    -0.044612    -0.036953    -0.036061    -0.044516    -0.046436
# N193    -0.022171    -0.022384    -0.022338    -0.023304    -0.022569    -0.021827    -0.021996    -0.021755    -0.021846


# sensors to keep

sensors <- c("N053", "N163")


library(iterators)

library(rlist)


file_name <- "test.txt"

con_obj <- file( file_name , "r")
ifile <- ireadLines( con_obj , n = 1 )


## I do not use a loop in this example; get_Lines_iter (defined below)
## must be defined before these calls are run

res <- list()

r <- get_Lines_iter( ifile , sensors)
res <- list.append( res , r )
res
r <- get_Lines_iter( ifile , sensors)
res <- list.append( res , r )
res
r <- get_Lines_iter( ifile , sensors)
do.call("cbind",res)

## the function get_Lines_iter to select and process the line

get_Lines_iter <- function(iter, sensors, sep = '\t', quiet = FALSE) {
  ## read the next record in the iterator
  r <- try(nextElem(iter), silent = TRUE)
  while (TRUE) {
    if (inherits(r, "try-error")) {
      stop("The iterator is empty")
    } else {
      ## split the read line according to the separator
      r_txt <- textConnection(r)
      fields <- scan(file = r_txt, what = "character", sep = sep,
                     quiet = quiet)
      close(r_txt)
      ## test if we have to keep the line
      if (fields[1] %in% sensors) {
        ## data processing for the selected line (for the example,
        ## transformation into a data frame)
        n <- length(fields)
        x <- data.frame(as.numeric(fields[2:n]))
        names(x) <- fields[1]
        ## return the values
        print(paste0("sensor ", fields[1], " ok"))
        return(x)
      } else {
        print(paste0("Sensor ", fields[1], " not selected"))
        r <- try(nextElem(iter), silent = TRUE)
      }
    }
  } # end while loop
}
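Since ireadLines() does not appear to accept a checkFunc argument, one possible workaround is to wrap the connection in a closure that keeps reading until a line passes a predicate, which is the same skip-until-accepted idea as checkFunc. This is a base-R sketch, not the iterators API: make_filtered_lines(), first_field_in() and line_to_df() are hypothetical names, and signalling the end of input by returning NULL is an assumption (iterators itself raises a "StopIteration" error instead).

```r
# Hypothetical helper: a line "iterator" over a connection that only yields
# lines for which keep(line) is TRUE (a checkFunc-like filter)
make_filtered_lines <- function(con, keep) {
  function() {
    repeat {
      line <- readLines(con, n = 1)
      if (length(line) == 0) return(NULL)  # end of input: signal exhaustion with NULL
      if (keep(line)) return(line)         # line passes the filter
    }
  }
}

# Predicate: keep a line when its first whitespace-separated field is a wanted sensor
first_field_in <- function(sensors) {
  function(line) {
    fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
    length(fields) > 0 && fields[1] %in% sensors
  }
}

# Turn one kept line into a one-column data frame named after the sensor
line_to_df <- function(line) {
  fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  x <- data.frame(as.numeric(fields[-1]))
  names(x) <- fields[1]
  x
}

# Usage on a small in-memory, tab-separated copy of the mock file
txt <- c("Time\t0\t0.000999\t0.001999",
         "N023\t-0.031323\t-0.035026\t-0.029759",
         "N053\t-0.014083\t-0.004741\t0.001443",
         "N163\t-0.054023\t-0.049345\t-0.037158")
con <- textConnection(txt)
next_kept <- make_filtered_lines(con, first_field_in(c("N053", "N163")))
res <- list()
while (!is.null(line <- next_kept())) {
  res[[length(res) + 1]] <- line_to_df(line)
}
close(con)
result <- do.call(cbind, res)
print(result)
```

With a real file, file("test.txt", "r") would take the place of textConnection(txt); the closure reads one line per call, so memory use stays small regardless of file size.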








William Michels

2020-May-18 01:09 UTC


[R] iterators : checkFunc with ireadLines

Dear Laurent,

I'm going through your code quickly, and the first question I have is
whether you loaded the "gmp" package:
> library(gmp)

Attaching package: ‘gmp’

The following objects are masked from ‘package:base’:

    %*%, apply, crossprod, matrix, tcrossprod

> library(iterators)
> iter(1:100, checkFunc = function(n) isprime(n))
$state
<environment: 0x7fbead8837f0>

$length
[1] 100

$checkFunc
function (n)
isprime(n)

$recycle
[1] FALSE

attr(,"class")
[1] "containeriter" "iter"
>
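The checkFunc slot shown above is a predicate: according to the iterators documentation, nextElem() skips any element for which it returns FALSE. A base-R sketch of that behaviour, so it can be run without gmp or iterators; make_checked_iter() and is_prime_small() are hypothetical names, and is_prime_small() is only a naive stand-in for gmp::isprime().

```r
# Naive primality test, a stand-in for gmp::isprime so the sketch needs no packages
is_prime_small <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)
  all(n %% 2:floor(sqrt(n)) != 0)
}

# Hypothetical emulation of iter(x, checkFunc = f): each call returns the next
# element of x for which f() is TRUE, or stops with "StopIteration" when exhausted
make_checked_iter <- function(x, checkFunc = function(e) TRUE) {
  i <- 0
  function() {
    repeat {
      i <<- i + 1
      if (i > length(x)) stop("StopIteration", call. = FALSE)
      if (checkFunc(x[[i]])) return(x[[i]])
    }
  }
}

next_prime <- make_checked_iter(1:100, checkFunc = is_prime_small)
first_five <- vapply(1:5, function(k) next_prime(), integer(1))
print(first_five)  # 2 3 5 7 11
```

The filtering happens inside the "next element" call itself, which is why checkFunc only makes sense for iterators built from an in-memory vector, as in the example above.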
HTH, Bill.

W. Michels, Ph.D.




William Michels

2020-May-18 02:54 UTC


[R] iterators : checkFunc with ireadLines

Apologies, Laurent, for this two-part answer. I misunderstood your
post where you stated you wanted to "filter(ing) some
selected lines according to the line name...". I thought that meant
you had a separate index (like a series of primes) that you wanted to
use to read in only selected line numbers from a file (test file below,
with the numbers 1:1000 each on a separate line):
> library(gmp)
> library(iterators)
> iprime <- iter(1:100, checkFunc = function(n) isprime(n))
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 2
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 3
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 5
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 7
>
However, what it really seems you want to do is read each line of
a (possibly enormous) file, test each line "string-wise" to keep or
discard it, and, if you're keeping it, append the line to a list. I can
certainly see the advantage of this strategy for reading in very, very
large files, but it's not clear to me how the "ireadLines" function
(in the "iterators" package) will help you, since it doesn't seem to
generate anything but a sequential index.
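The strategy described above (read a possibly enormous file piece by piece, test each line, keep or discard it) can be done with readLines() on an open connection, without ireadLines(). A minimal sketch, assuming whitespace-separated lines whose first field is the sensor name; read_selected_lines() and chunk_size are hypothetical names, and reading a chunk of lines per call rather than one line at a time is a design choice to cut per-call overhead while keeping memory bounded.

```r
# Hypothetical helper: read a file in chunks of lines, keeping only the lines
# whose first whitespace-separated field is one of `sensors`
read_selected_lines <- function(path, sensors, chunk_size = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- character(0)
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0) break                       # end of file
    first <- sub("^[[:space:]]*([^[:space:]]+).*$", "\\1", chunk)
    kept <- c(kept, chunk[first %in% sensors])          # vectorised keep/discard
  }
  kept
}

# Usage with a small temporary file standing in for the real data
path <- tempfile(fileext = ".txt")
writeLines(c("Time\t0\t0.000999",
             "N023\t-0.031323\t-0.035026",
             "N053\t-0.014083\t-0.004741",
             "N163\t-0.054023\t-0.049345"), path)
kept <- read_selected_lines(path, c("N053", "N163"), chunk_size = 2L)
selected <- read.table(text = kept, col.names = c("sensor", "t1", "t2"))
print(selected)
```

Only the kept lines are ever parsed into numbers, so the cost of read.table() scales with the selection rather than with the whole file.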

Anyway, below is an absolutely standard read-in of your data using
read.table(). Hopefully some of the code I've posted has been useful
to you.
> sensors <- c("N053", "N163")
> read.table("test2.txt")
    V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
1 Time  0.000000  0.000999  0.001999  0.002998  0.003998  0.004997  0.005997  0.006996  0.007996
2 N023 -0.031323 -0.035026 -0.029759 -0.024886 -0.024464 -0.026816 -0.033690 -0.041067 -0.038747
3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
4 N123 -0.019008 -0.013494 -0.013180 -0.029208 -0.032748 -0.020243 -0.015089 -0.014439 -0.011681
5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
6 N193 -0.022171 -0.022384 -0.022338 -0.023304 -0.022569 -0.021827 -0.021996 -0.021755 -0.021846
> Laurent_data <- read.table("test2.txt")
> Laurent_data[Laurent_data$V1 %in% sensors, ]
    V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436

Best, Bill.

W. Michels, Ph.D.



Ivan Krylov

2020-May-20 07:46 UTC


[R] iterators : checkFunc with ireadLines

Hi Laurent,

I am not saying this will work every time, and I do recognise that this
is very different from the more general solution you had envisioned,
but if you are on a Unix-like system, or have the relevant utilities
installed and on the %PATH% on Windows, you can filter the input file
line by line using a pipe and an external program:

On Sun, 17 May 2020 15:52:30 +0200
Laurent Rhelp <LaurentRHelp at free.fr> wrote:
> # sensors to keep
> sensors <- c("N053", "N163")

# filter on the beginning of the line
i <- pipe("grep -E '^(N053|N163)' test.txt")
# or:
# filter on the beginning of the given column
# (use $2 for the second column, etc.)
i <- pipe("awk '($1 ~ \"^(N053|N163)\")' test.txt")
# or:
# since your message is full of Unicode non-breaking spaces, I have to
# bring in heavier machinery to handle those correctly;
# only this solution manages to match full column values
# (here you can also use $F[1] for second column and so on)
i <- pipe("perl -CSD -F'\\s+' -lE \\
 'print join qq{\\t}, @F if $F[0] =~ /^(N053|N163)$/' \\
 test.txt
")
lines <- read.table(i) # closes i when done

The downside of this approach is having to shell-escape the command
lines, which can become complicated, and choosing between use of regular
expressions and more wordy programs (Unicode whitespace in the input
doesn't help, either).
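One way to reduce the hand-escaping is to build the pattern from the sensors vector and let base R's shQuote() quote it for a POSIX shell. This is a sketch: make_grep_cmd() is a hypothetical helper; requiring whitespace after the sensor name (so that, say, N0530 does not match N053) is an assumption about the file layout; sensor names are assumed to contain no regex metacharacters; and type = "sh" assumes the command runs in a Bourne-style shell (such as the Bash that ships with Git on Windows), not cmd.exe.

```r
# Hypothetical helper: build a grep command that keeps lines starting with any
# of the given sensor names followed by whitespace
make_grep_cmd <- function(sensors, path) {
  pattern <- paste0("^(", paste(sensors, collapse = "|"), ")[[:space:]]")
  paste("grep -E", shQuote(pattern, type = "sh"), shQuote(path, type = "sh"))
}

sensors <- c("N053", "N163")
cmd <- make_grep_cmd(sensors, "test.txt")
print(cmd)
# lines <- read.table(pipe(cmd))   # run only where grep is available
```

Generating the command from `sensors` keeps the R vector as the single source of truth, so the filter cannot silently drift from the list of sensors to keep.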

-- 
Best regards,
Ivan

Laurent Rhelp

2020-May-22 11:47 UTC


[R] iterators : checkFunc with ireadLines

Hi Ivan,
Indeed, it is a good idea. I am on MS Windows, but I can use the
Bash shell that I use with Git. I will see how to do that with the Unix
command-line tools.


