GREAT! That is exactly what I was asking for. I like the nextElem call
in the skip argument.

Thank you very much, William.

Best regards,
Laurent


On 18/05/2020 at 20:37, William Michels wrote:
> Hi Laurent,
>
> Thank you for explaining your size limitations. Below is an example
> using the read.fwf() function to grab the first column of your input
> file (in 2000 row chunks). This column is converted to an index, and
> the index is used to create an iterator useful for skipping lines when
> reading input with scan(). (You could try processing your large file
> in successive 2000 line chunks, or whatever number of lines fits into
> memory). Maybe not as elegant as the approach you were going for, but
> read.fwf() should be pretty efficient:
>
> > sensors <- c("N053", "N163")
> > read.fwf("test2.txt", widths=c(4), as.is=TRUE, flush=TRUE, n=2000, skip=0)
>     V1
> 1 Time
> 2 N023
> 3 N053
> 4 N123
> 5 N163
> 6 N193
> > first_col <- read.fwf("test2.txt", widths=c(4), as.is=TRUE, flush=TRUE, n=2000, skip=0)
> > which(first_col$V1 %in% sensors)
> [1] 3 5
> > index1 <- which(first_col$V1 %in% sensors)
> > iter_index1 <- iter(1:2000, checkFunc= function(n) {n %in% index1})
> > unlist(scan(file="test2.txt", what=list("","","","","","","","","",""), flush=TRUE, multi.line=FALSE, skip=nextElem(iter_index1)-1, nlines=1, quiet=TRUE))
>  [1] "N053" "-0.014083" "-0.004741" "0.001443" "-0.010152"
>  "-0.012996" "-0.005337" "-0.008738" "-0.015094" "-0.012104"
> > unlist(scan(file="test2.txt", what=list("","","","","","","","","",""), flush=TRUE, multi.line=FALSE, skip=nextElem(iter_index1)-1, nlines=1, quiet=TRUE))
>  [1] "N163" "-0.054023" "-0.049345" "-0.037158" "-0.04112"
>  "-0.044612" "-0.036953" "-0.036061" "-0.044516" "-0.046436"
>
> (Note for this email and the previous one, I've deleted the first
> "hash" character from each line of your test file for clarity).
>
> HTH, Bill.
>
> W. Michels, Ph.D.
>
> On Mon, May 18, 2020 at 3:35 AM Laurent Rhelp <LaurentRHelp at free.fr> wrote:
>> Dear William,
>> Thank you for your answer.
>> My file is very large, so I cannot read it into memory (I cannot use
>> read.table). So I want to keep in memory only the line I need to process.
>> With readLines, as I did, it works, but I would like to use an iterator
>> and a foreach loop to understand this approach, because I thought that
>> it was a better way to write clean code.
>>
>>
>> On 18/05/2020 at 04:54, William Michels wrote:
>>> Apologies, Laurent, for this two-part answer. I misunderstood your
>>> post where you stated you wanted to "filter(ing) some
>>> selected lines according to the line name... ." I thought that meant
>>> you had a separate index (like a series of primes) that you wanted to
>>> use to only read-in selected line numbers from a file (test file below
>>> with numbers 1:1000 each on a separate line):
>>>
>>> > library(gmp)
>>> > library(iterators)
>>> > iprime <- iter(1:100, checkFunc = function(n) isprime(n))
>>> > scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>> Read 1 item
>>> [1] 2
>>> > scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>> Read 1 item
>>> [1] 3
>>> > scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>> Read 1 item
>>> [1] 5
>>> > scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>> Read 1 item
>>> [1] 7
>>>
>>> However, what it really seems that you want to do is read each line of
>>> a (possibly enormous) file, test each line "string-wise" to keep or
>>> discard, and if you're keeping it, append the line to a list. I can
>>> certainly see the advantage of this strategy for reading in very, very
>>> large files, but it's not clear to me how the "ireadLines" function
>>> (in the "iterators" package) will help you, since it doesn't seem to
>>> generate anything but a sequential index.
>>>
>>> Anyway, below is an absolutely standard read-in of your data using
>>> read.table(). Hopefully some of the code I've posted has been useful
>>> to you.
>>>
>>> > sensors <- c("N053", "N163")
>>> > read.table("test2.txt")
>>>     V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
>>> 1 Time  0.000000  0.000999  0.001999  0.002998  0.003998  0.004997  0.005997  0.006996  0.007996
>>> 2 N023 -0.031323 -0.035026 -0.029759 -0.024886 -0.024464 -0.026816 -0.033690 -0.041067 -0.038747
>>> 3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
>>> 4 N123 -0.019008 -0.013494 -0.013180 -0.029208 -0.032748 -0.020243 -0.015089 -0.014439 -0.011681
>>> 5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
>>> 6 N193 -0.022171 -0.022384 -0.022338 -0.023304 -0.022569 -0.021827 -0.021996 -0.021755 -0.021846
>>> > Laurent_data <- read.table("test2.txt")
>>> > Laurent_data[Laurent_data$V1 %in% sensors, ]
>>>     V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
>>> 3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
>>> 5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
>>>
>>> Best, Bill.
>>>
>>> W. Michels, Ph.D.
>>>
>>>
>>> On Sun, May 17, 2020 at 5:43 PM Laurent Rhelp <LaurentRHelp at free.fr> wrote:
>>>> Dear R-Help List,
>>>>
>>>> I would like to use an iterator to read a file filtering some
>>>> selected lines according to the line name in order to use after a
>>>> foreach loop.
>>>> I wanted to use the checkFunc argument, as in the following
>>>> example found on the internet, to select only prime numbers:
>>>>
>>>> iprime <- iter(1:100, checkFunc = function(n) isprime(n))
>>>>
>>>> (https://datawookie.netlify.app/blog/2013/11/iterators-in-r/)
>>>>
>>>> but the checkFunc argument does not seem to be available with the
>>>> function ireadLines (package iterators). So I wrote the code below to
>>>> solve my problem, but I am sure that I am missing something about
>>>> using iterators with files.
>>>> Since I found nothing on the web about ireadLines and the checkFunc
>>>> argument, could somebody help me understand how to use an
>>>> iterator (and a foreach loop) on files, keeping only selected lines?
>>>>
>>>> Thank you very much
>>>> Laurent
>>>>
>>>> Presently here is my code:
>>>>
>>>> ## mock file to read: test.txt
>>>> ##
>>>> # Time 0 0.000999 0.001999 0.002998 0.003998 0.004997 0.005997 0.006996 0.007996
>>>> # N023 -0.031323 -0.035026 -0.029759 -0.024886 -0.024464 -0.026816 -0.03369 -0.041067 -0.038747
>>>> # N053 -0.014083 -0.004741 0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
>>>> # N123 -0.019008 -0.013494 -0.01318 -0.029208 -0.032748 -0.020243 -0.015089 -0.014439 -0.011681
>>>> # N163 -0.054023 -0.049345 -0.037158 -0.04112 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
>>>> # N193 -0.022171 -0.022384 -0.022338 -0.023304 -0.022569 -0.021827 -0.021996 -0.021755 -0.021846
>>>>
>>>>
>>>> # sensors to keep
>>>>
>>>> sensors <- c("N053", "N163")
>>>>
>>>>
>>>> library(iterators)
>>>>
>>>> library(rlist)
>>>>
>>>>
>>>> file_name <- "test.txt"
>>>>
>>>> con_obj <- file( file_name , "r")
>>>> ifile <- ireadLines( con_obj , n = 1 )
>>>>
>>>>
>>>> ## I do not do a loop for the example
>>>>
>>>> res <- list()
>>>>
>>>> r <- get_Lines_iter( ifile , sensors)
>>>> res <- list.append( res , r )
>>>> res
>>>> r <- get_Lines_iter( ifile , sensors)
>>>> res <- list.append( res , r )
>>>> res
>>>> r <- get_Lines_iter( ifile , sensors)
>>>> do.call("cbind",res)
>>>>
>>>> ## the function get_Lines_iter to select and process the line
>>>>
>>>> get_Lines_iter <- function( iter , sensors, sep = '\t', quiet = FALSE){
>>>>   ## read the next record in the iterator
>>>>   r = try( nextElem(iter) )
>>>>   while( TRUE ){
>>>>     if( class(r) == "try-error") {
>>>>       return( stop("The iterator is empty") )
>>>>     } else {
>>>>       ## split the read line according to the separator
>>>>       r_txt <- textConnection(r)
>>>>       fields <- scan(file = r_txt, what = "character", sep = sep, quiet = quiet)
>>>>       ## test if we have to keep the line
>>>>       if( fields[1] %in% sensors){
>>>>         ## data processing for the selected line
>>>>         ## (for the example, conversion to a data frame)
>>>>         n <- length(fields)
>>>>         x <- data.frame( as.numeric(fields[2:n]) )
>>>>         names(x) <- fields[1]
>>>>         ## We return the values
>>>>         print(paste0("sensor ",fields[1]," ok"))
>>>>         return( x )
>>>>       }else{
>>>>         print(paste0("Sensor ", fields[1] ," not selected"))
>>>>         r = try(nextElem(iter) )}
>>>>     }
>>>>   }# end while loop
>>>> }
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
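
For reference, the filtering asked about in the original question (a
checkFunc-like test applied to lines coming from ireadLines) can probably
be had by wrapping ireadLines in a small custom iterator. What follows is
only a minimal sketch under a few assumptions: the helper name
ifilterLines is hypothetical (it is not part of the iterators package),
the file is assumed to be tab-separated as in the get_Lines_iter default,
and the demo file is a shortened excerpt of the test data from the
thread. It relies on the "abstractiter" class convention described in the
iterators vignette on writing custom iterators.

```r
library(iterators)

## Hypothetical helper, not part of the iterators package: wraps
## ireadLines() and only hands back lines whose first field passes `keep`.
ifilterLines <- function(con, keep, sep = "\t") {
  it <- ireadLines(con, n = 1)
  nextEl <- function() {
    repeat {
      ## nextElem() signals the error "StopIteration" at end of file;
      ## it is deliberately left to propagate to the caller.
      line <- nextElem(it)
      fields <- strsplit(line, sep, fixed = TRUE)[[1]]
      if (length(fields) > 0 && keep(fields[1])) return(fields)
    }
  }
  obj <- list(nextElem = nextEl)
  class(obj) <- c("ifilterLines", "abstractiter", "iter")
  obj
}

## Demo on a temporary file (assumption: tab-separated, first 3 values only)
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Time\t0\t0.000999\t0.001999",
  "N023\t-0.031323\t-0.035026\t-0.029759",
  "N053\t-0.014083\t-0.004741\t0.001443",
  "N123\t-0.019008\t-0.013494\t-0.01318",
  "N163\t-0.054023\t-0.049345\t-0.037158"
), tmp)

sensors <- c("N053", "N163")
con <- file(tmp, "r")
it <- ifilterLines(con, keep = function(name) name %in% sensors)

res <- list()
repeat {
  ## turn the end-of-iteration signal into NULL, re-raise anything else
  fields <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(fields)) break
  res[[fields[1]]] <- as.numeric(fields[-1])
}
close(con)

result <- as.data.frame(res)
```

Only one line is held in memory at a time, and because the wrapper lets
"StopIteration" propagate, it should also be usable as the iterator in a
foreach loop, although that part is not shown or tested here.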