Dear R-Help List,

I would like to use an iterator to read a file, keeping only some selected
lines according to the line name, so that I can then use it in a foreach
loop. I wanted to use the checkFunc argument, as in the following example
found on the internet, which selects only prime numbers:

    iprime <- iter(1:100, checkFunc = function(n) isprime(n))

(https://datawookie.netlify.app/blog/2013/11/iterators-in-r/)

but the checkFunc argument does not seem to be available in the function
ireadLines (package iterators). So I wrote the code below to solve my
problem, but I am sure that I am missing something about using iterators
with files. Since I found nothing on the web about ireadLines and the
checkFunc argument, could somebody help me understand how to use an
iterator (and a foreach loop) on a file while keeping only selected lines?

Thank you very much
Laurent

Here is my current code:

## mock file to read: test.txt
##
# Time  0          0.000999   0.001999   0.002998   0.003998   0.004997   0.005997   0.006996   0.007996
# N023  -0.031323  -0.035026  -0.029759  -0.024886  -0.024464  -0.026816  -0.03369   -0.041067  -0.038747
# N053  -0.014083  -0.004741  0.001443   -0.010152  -0.012996  -0.005337  -0.008738  -0.015094  -0.012104
# N123  -0.019008  -0.013494  -0.01318   -0.029208  -0.032748  -0.020243  -0.015089  -0.014439  -0.011681
# N163  -0.054023  -0.049345  -0.037158  -0.04112   -0.044612  -0.036953  -0.036061  -0.044516  -0.046436
# N193  -0.022171  -0.022384  -0.022338  -0.023304  -0.022569  -0.021827  -0.021996  -0.021755  -0.021846

library(iterators)
library(rlist)

## the function get_Lines_iter to select and process the line
get_Lines_iter <- function(iter, sensors, sep = '\t', quiet = FALSE) {
  ## read the next record in the iterator
  r <- try(nextElem(iter))
  while (TRUE) {
    if (inherits(r, "try-error")) {
      stop("The iterator is empty")
    } else {
      ## split the read line according to the separator
      r_txt <- textConnection(r)
      fields <- scan(file = r_txt, what = "character", sep = sep, quiet = quiet)
      close(r_txt)
      ## test if we have to keep the line
      if (fields[1] %in% sensors) {
        ## data processing for the selected line
        ## (for the example: transformation into a data frame)
        n <- length(fields)
        x <- data.frame(as.numeric(fields[2:n]))
        names(x) <- fields[1]
        ## we return the values
        print(paste0("sensor ", fields[1], " ok"))
        return(x)
      } else {
        print(paste0("Sensor ", fields[1], " not selected"))
        r <- try(nextElem(iter))
      }
    }
  } # end while loop
}

# sensors to keep
sensors <- c("N053", "N163")

file_name <- "test.txt"
con_obj <- file(file_name, "r")
ifile <- ireadLines(con_obj, n = 1)

## I do not do a loop for the example
res <- list()
r <- get_Lines_iter(ifile, sensors)
res <- list.append(res, r)
res
r <- get_Lines_iter(ifile, sensors)
res <- list.append(res, r)
res
## the third call reaches the end of the file and stops with an error
r <- get_Lines_iter(ifile, sensors)
do.call("cbind", res)
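One possible way to get a filtering iterator over a file is sketched below. It does not use ireadLines or checkFunc; instead it builds a small custom iterator around readLines(). The helper name ifilter_lines is hypothetical, and the class vector c("abstractiter", "iter") rests on the assumption that the iterators package dispatches nextElem() on such objects by calling their nextElem element, as described for custom iterators in that package. The code itself only needs base R.

```r
# Hypothetical helper (not part of the iterators package): returns an
# iterator over the lines of `path` that yields only the lines whose first
# field is in `keep`. Each element is the character vector of split fields.
ifilter_lines <- function(path, keep, sep = "\t") {
  con <- file(path, "r")
  done <- FALSE
  next_el <- function() {
    repeat {
      line <- if (done) character(0) else readLines(con, n = 1)
      if (length(line) == 0) {
        # End of file: close the connection once, then signal exhaustion
        # with the "StopIteration" error message.
        if (!done) {
          close(con)
          done <<- TRUE
        }
        stop("StopIteration", call. = FALSE)
      }
      fields <- strsplit(line, sep, fixed = TRUE)[[1]]
      if (length(fields) > 0 && fields[1] %in% keep) return(fields)
    }
  }
  # Assumption: nextElem() and foreach accept objects of this class and
  # call their `nextElem` element.
  structure(list(nextElem = next_el), class = c("abstractiter", "iter"))
}

# Small demonstration on a temporary tab-separated file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Time\t0\t0.000999\t0.001999",
             "N023\t-0.031323\t-0.035026\t-0.029759",
             "N053\t-0.014083\t-0.004741\t0.001443",
             "N163\t-0.054023\t-0.049345\t-0.037158"), tmp)

it <- ifilter_lines(tmp, keep = c("N053", "N163"))
res <- list()
repeat {
  # Stop the loop on "StopIteration"; re-raise any other error.
  fields <- tryCatch(it$nextElem(), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(fields)) break
  res[[fields[1]]] <- as.numeric(fields[-1])
}
selected <- as.data.frame(res)
```

If the class assumption holds, the same object should also be usable with library(iterators) as nextElem(it), or as the iteration source of a foreach loop, so the filtering stays inside the iterator rather than in the loop body.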
Dear Laurent,

I'm going through your code quickly, and the first question I have is
whether you loaded the "gmp" library?

> library(gmp)

Attaching package: 'gmp'

The following objects are masked from 'package:base':

    %*%, apply, crossprod, matrix, tcrossprod

> library(iterators)
> iter(1:100, checkFunc = function(n) isprime(n))
$state
<environment: 0x7fbead8837f0>

$length
[1] 100

$checkFunc
function (n)
isprime(n)

$recycle
[1] FALSE

attr(,"class")
[1] "containeriter" "iter"

HTH, Bill.

W. Michels, Ph.D.

On Sun, May 17, 2020 at 5:43 PM Laurent Rhelp <LaurentRHelp at free.fr> wrote:
>
> I wanted to use the checkFunc argument as the following
> example found on internet to select only prime numbers :
>
> iprime <- iter(1:100, checkFunc = function(n) isprime(n))
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
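The transcript above only prints the iterator object. As a rough illustration of what checkFunc does once nextElem() is called, the sketch below applies it to a character vector of lines that has already been read, keeping only the lines whose first field is a wanted sensor. The helper first_field is hypothetical, the data are shortened, and the approach assumes the whole file fits in memory, so it does not address the large-file case. The iterator part is skipped if the iterators package is not installed.

```r
# Lines as they might come out of readLines("test.txt"); shortened here.
lines <- c("Time\t0\t0.000999",
           "N023\t-0.031323\t-0.035026",
           "N053\t-0.014083\t-0.004741",
           "N163\t-0.054023\t-0.049345")
sensors <- c("N053", "N163")

# Hypothetical helper: first tab-separated field of a line.
first_field <- function(x) strsplit(x, "\t", fixed = TRUE)[[1]][1]

kept <- character(0)
if (requireNamespace("iterators", quietly = TRUE)) {
  # checkFunc is an argument of iter(): elements for which it returns
  # FALSE are skipped by nextElem().
  it <- iterators::iter(lines,
                        checkFunc = function(x) first_field(x) %in% sensors)
  repeat {
    # nextElem() raises an error when the iterator is exhausted.
    x <- tryCatch(iterators::nextElem(it), error = function(e) NULL)
    if (is.null(x)) break
    kept <- c(kept, first_field(x))
  }
}
```

Here kept collects the names of the lines that passed the check, in file order.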
Apologies, Laurent, for this two-part answer. I misunderstood your post
where you stated you wanted to "filter(ing) some selected lines according
to the line name... ." I thought that meant you had a separate index (like
a series of primes) that you wanted to use to only read in selected line
numbers from a file (test file below with numbers 1:1000, each on a
separate line):

> library(gmp)
> library(iterators)
> iprime <- iter(1:100, checkFunc = function(n) isprime(n))
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 2
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 3
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 5
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 7

However, what it really seems that you want to do is read each line of a
(possibly enormous) file, test each line "string-wise" to keep or discard,
and if you're keeping it, append the line to a list. I can certainly see
the advantage of this strategy for reading in very, very large files, but
it's not clear to me how the "ireadLines" function (in the "iterators"
package) will help you, since it doesn't seem to generate anything but a
sequential index.

Anyway, below is an absolutely standard read-in of your data using
read.table(). Hopefully some of the code I've posted has been useful to
you.

> sensors <- c("N053", "N163")
> read.table("test2.txt")
    V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
1 Time  0.000000  0.000999  0.001999  0.002998  0.003998  0.004997  0.005997  0.006996  0.007996
2 N023 -0.031323 -0.035026 -0.029759 -0.024886 -0.024464 -0.026816 -0.033690 -0.041067 -0.038747
3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
4 N123 -0.019008 -0.013494 -0.013180 -0.029208 -0.032748 -0.020243 -0.015089 -0.014439 -0.011681
5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
6 N193 -0.022171 -0.022384 -0.022338 -0.023304 -0.022569 -0.021827 -0.021996 -0.021755 -0.021846
> Laurent_data <- read.table("test2.txt")
> Laurent_data[Laurent_data$V1 %in% sensors, ]
    V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436

Best, Bill.

W. Michels, Ph.D.
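For a file too large for a single read.table() call, the "read, test, keep or discard" strategy described above can be sketched in base R by reading from an open connection in chunks. The helper name read_selected and the chunk_size parameter are hypothetical, and the sketch assumes a tab-separated file whose first field is the line name.

```r
# Hypothetical helper: read `path` in chunks of `chunk_size` lines and keep
# only the rows whose first field is in `keep`.
read_selected <- function(path, keep, sep = "\t", chunk_size = 1000L) {
  con <- file(path, "r")
  on.exit(close(con))
  kept <- character(0)
  repeat {
    # Reading from an open connection continues where the last call stopped.
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0) break
    # First field of every line in the chunk ("" for empty lines).
    first <- vapply(strsplit(chunk, sep, fixed = TRUE),
                    function(f) if (length(f)) f[1] else "",
                    character(1))
    kept <- c(kept, chunk[first %in% keep])
  }
  if (length(kept) == 0) return(NULL)
  # Parse only the retained lines.
  read.table(text = kept, sep = sep, stringsAsFactors = FALSE)
}

# Demonstration on a temporary file, with a deliberately small chunk size.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Time\t0\t0.000999\t0.001999",
             "N023\t-0.031323\t-0.035026\t-0.029759",
             "N053\t-0.014083\t-0.004741\t0.001443",
             "N123\t-0.019008\t-0.013494\t-0.01318",
             "N163\t-0.054023\t-0.049345\t-0.037158"), tmp)
selected <- read_selected(tmp, keep = c("N053", "N163"), chunk_size = 2L)
```

Only the retained lines are held in memory at the end, so peak memory use depends on chunk_size and on the number of selected lines, not on the size of the file.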
Hi Laurent,

I am not saying this will work every time, and I do recognise that this
is very different from the more general solution that you had envisioned,
but if you are on a UNIX-like system, or have the relevant utilities
installed and on the %PATH% on Windows, you can filter the input file
line-by-line using a pipe and an external program:

On Sun, 17 May 2020 15:52:30 +0200
Laurent Rhelp <LaurentRHelp at free.fr> wrote:

> # sensors to keep
> sensors <- c("N053", "N163")

# filter on the beginning of the line
i <- pipe("grep -E '^(N053|N163)' test.txt")
# or:
# filter on the beginning of the given column
# (use $2 for the second column, etc.)
i <- pipe("awk '($1 ~ \"^(N053|N163)\")' test.txt")
# or:
# since your message is full of Unicode non-breaking spaces, I have to
# bring in heavier machinery to handle those correctly;
# only this solution manages to match full column values
# (here you can also use $F[1] for second column and so on)
i <- pipe("perl -CSD -F'\\s+' -lE \\
 'print join qq{\\t}, @F if $F[0] =~ /^(N053|N163)$/' \\
 test.txt
")
lines <- read.table(i) # closes i when done

The downside of this approach is having to shell-escape the command
lines, which can become complicated, and having to choose between regular
expressions and wordier programs (Unicode whitespace in the input doesn't
help, either).

--
Best regards,
Ivan
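The shell-escaping downside mentioned above can be reduced a little by building the command from the sensors vector and letting shQuote() do the quoting. The sketch below is one way to do that; the helper name grep_command is hypothetical, and it assumes sensor names made only of letters, digits and underscores, and a file whose fields are separated by ordinary ASCII whitespace (so it does not handle the non-breaking spaces mentioned above).

```r
# Hypothetical helper: build a grep command that keeps the lines whose
# first field is exactly one of `sensors`.
grep_command <- function(sensors, path) {
  # Assumption: sensor names are plain alphanumerics, so they need no
  # regular-expression escaping.
  stopifnot(all(grepl("^[A-Za-z0-9_]+$", sensors)))
  # The trailing [[:space:]] requires the name to be followed by whitespace,
  # so "N053" does not also match a line starting with "N0531".
  pattern <- paste0("^(", paste(sensors, collapse = "|"), ")[[:space:]]")
  paste("grep -E", shQuote(pattern), shQuote(path))
}

cmd <- grep_command(c("N053", "N163"), "test.txt")

# Demonstration on a temporary file, only if grep is available.
selected <- NULL
if (nzchar(Sys.which("grep"))) {
  tmp <- tempfile(fileext = ".txt")
  writeLines(c("Time\t0\t0.000999",
               "N023\t-0.031323\t-0.035026",
               "N053\t-0.014083\t-0.004741",
               "N163\t-0.054023\t-0.049345"), tmp)
  # read.table() opens the pipe, reads the filtered lines and closes it.
  selected <- read.table(pipe(grep_command(c("N053", "N163"), tmp)))
}
```

shQuote() picks a quoting style for the platform's default shell, so on Windows the result may need type = "sh" when the command is run through a bash shell.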
Hi Ivan,

Indeed, it is a good idea. I am on MS Windows, but I can use the bash
shell that I use with Git. I will see how to do that with the Unix
command lines.

On 20/05/2020 at 09:46, Ivan Krylov wrote:
> you can filter the input file line-by-line using a pipe and an
> external program: