Dear R-users,

I would like to know how I can read a file whose lines have different lengths.
I need to read this file and create an output to feed my database.
So after reading it I'll need to create an output like this:

"INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20100910,837460, 39,390)"

I mean, each line should be read. But I don't know how to do this when the
lines have different lengths.

I would really appreciate any help.

Thanks.

====Below the file that should be read ==========

*2010 10 01 00
83746  -43.25 -22.81      6  51*
1012.0  -9999    320     1.5   299.1   294.4 64
1000.0    114    250     4.1   298.4   294.8 32
 925.0    797      0     0.0   293.6   292.9 32
 850.0   1524    195     3.1   289.6   288.9 32
 700.0   3156    290    11.3   280.1   280.1 32
 500.0   5870    280    20.1   266.1   260.1 32
 400.0   7570    265    23.7   256.6   222.7 32
 300.0   9670    265    28.8   240.2   218.2 32
 250.0  10920    280    27.3   230.2   220.2 32
 200.0  12390    260    32.4   218.7   206.7 32
 176.0  -9999    255    37.6 -9999.0 -9999.0  8
 150.0  14180    245    35.5   205.1   196.1 32
 100.0  16560    300    17.0   195.2   186.2 32
*2010 10 01 00
83768  -51.13 -23.33    569  41
* 1000.0     79  -9999 -9999.0 -9999.0 -9999.0 32
 946.0  -9999    270     1.0   295.8   292.1 64
 925.0    763     15     2.1   296.4   290.4 32
 850.0   1497    175     3.6   290.8   288.4 32
 700.0   3140    295     9.8   282.9   278.6 32
 500.0   5840    285    23.7   267.1   232.1 32
 400.0   7550    255    35.5   255.4   231.4 32
 300.0   9640    265    37.0   242.2   216.2 32

Best Regards,

--
Abraço,
Nilza Barros

Hello Nilza,

If your file is small you can read it into a character vector like this:

    indata <- readLines("foo.dat")

If your file is very big you can read it in batches like this...

    MAXRECS <- 1000  # for example
    fcon <- file("foo.dat", open="r")
    indata <- readLines(fcon, n=MAXRECS)

The number of lines read will be given by length(indata). Each further call
to readLines(fcon, n=MAXRECS) continues where the previous one stopped. Once
the end of the file has been reached, a call returns a zero-length vector,
so you can test for the end of the file with:

    length(indata) == 0

(isIncomplete(fcon) reports whether a read on a non-blocking connection was
blocked, so it is not an end-of-file test.)

If a leading "*" character is a flag for the start of a station data block
you can find this in the indata vector with grepl...

    start.pos <- which(grepl("^\\s*\\*", indata))

When you're finished reading the file...

    close(fcon)

Hope this helps,
Michael

On 3 October 2010 13:31, Nilza BARROS <nilzabarros at gmail.com> wrote:
> I would like to know how could I read a file with different lines lengths.
> [...]
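
The batch-reading advice above can be sketched as a loop. This is only an illustration under stated assumptions: the temporary file stands in for "foo.dat", the sample lines are adapted from the posted file (with the second block's asterisks placed as in the first block), MAXRECS is set to a tiny value so that several batches are needed, and the names n_lines and n_blocks are hypothetical.

```r
# Sample lines adapted from the posted file; a temporary file stands in for "foo.dat".
sample_lines <- c(
  "*2010 10 01 00",
  "83746 -43.25 -22.81 6 51*",
  "1012.0 -9999 320 1.5 299.1 294.4 64",
  "1000.0 114 250 4.1 298.4 294.8 32",
  "*2010 10 01 00",
  "83768 -51.13 -23.33 569 41*",
  "1000.0 79 -9999 -9999.0 -9999.0 -9999.0 32",
  "946.0 -9999 270 1.0 295.8 292.1 64"
)
tmp <- tempfile(fileext = ".dat")
writeLines(sample_lines, tmp)

MAXRECS <- 3                      # deliberately tiny, for illustration only
fcon <- file(tmp, open = "r")
n_lines  <- 0                     # total number of lines seen
n_blocks <- 0                     # number of lines that start a station block
repeat {
  indata <- readLines(fcon, n = MAXRECS)
  if (length(indata) == 0) break  # a zero-length batch means end of file
  n_lines  <- n_lines + length(indata)
  # Assumption: a block starts with a line whose first non-blank character is "*".
  n_blocks <- n_blocks + sum(grepl("^\\s*\\*", indata))
}
close(fcon)
unlink(tmp)
```

In a real run the body of the loop would process each batch (for example, build the INSERT statements) rather than just count lines. A station block may straddle two batches, so the date and station of the current block would need to be carried over from one iteration to the next.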

On Sat, Oct 2, 2010 at 11:31 PM, Nilza BARROS <nilzabarros at gmail.com> wrote:
> Dear R-users,
>
> I would like to know how could I read a file with different lines lengths.
> I need read this file and create an output to feed my database.
> So after reading I'll need create an output like this
>
> "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20100910,837460, 39,390)"
>

Read the data filling the short lines (i.e. the date and station lines)
with NAs. Replace the *s with spaces and compute how many non-NAs are in
each row (cnt). Append group which is 1 for lines pertaining to the 1st
station, 2 for the 2nd, etc. Then merge it all together in one big data
frame, All, and generate a vector of SQL strings:

    DF <- read.table("d2010100100.txt", fill = TRUE)
    DF[] <- lapply(DF, function(x) as.numeric(chartr("*", " ", x)))
    cnt <- rowSums(!is.na(DF))
    DF$group <- cumsum(cnt == 4)
    Merge <- function(x, y) merge(x, y, by = "group")
    All <- Reduce(Merge, split(DF, cnt))
    with(All, sprintf("INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (%04d%02d%02d, %d, %d, %d)",
      V1.x, V2.x, V3.x, V1.y, V1, V2))

The result looks like this:

[1] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 82599, 1008, -9999)"
[2] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 1011, -9999)"
[3] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 1000, 96)"
[4] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 925, 782)"
[5] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 850, 1520)"
[6] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 700, 3171)"
[7] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 500, 5890)"
[8] "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20101001, 83649, 400, 7600)"

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
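
For comparison, the same grouping idea (count the fields on each line, then number the blocks with cumsum) can be sketched on a character vector such as the one returned by readLines, without read.table. This is a rough sketch, not the method from the reply above: the sample lines are adapted from the posted file with the asterisks of the second block placed as in the first, the helper make_inserts is hypothetical, and VAR1 and VAR2 are assumed to be the first two columns of each data line, as in the reply. Note that "%d" only accepts whole numbers, so a fractional pressure value would need a different format.

```r
# Sample lines adapted from the posted file (assumed layout: date line,
# station line, then data lines with seven fields each).
lines <- c(
  "*2010 10 01 00",
  "83746 -43.25 -22.81 6 51*",
  "1012.0 -9999 320 1.5 299.1 294.4 64",
  "1000.0 114 250 4.1 298.4 294.8 32",
  "*2010 10 01 00",
  "83768 -51.13 -23.33 569 41*",
  "1000.0 79 -9999 -9999.0 -9999.0 -9999.0 32",
  "946.0 -9999 270 1.0 295.8 292.1 64"
)

# Replace the asterisks with spaces and split every line into numeric fields.
fields <- lapply(strsplit(trimws(chartr("*", " ", lines)), "\\s+"), as.numeric)
cnt    <- lengths(fields)     # 4 = date line, 5 = station line, 7 = data line
group  <- cumsum(cnt == 4)    # block number of every line

# Hypothetical helper: build the INSERT statements for block number g.
make_inserts <- function(g) {
  blk     <- fields[group == g]
  date    <- blk[[1]]         # year, month, day, hour
  station <- blk[[2]][1]      # first field of the station line
  obs     <- blk[-(1:2)]      # the remaining lines are observations
  vapply(obs, function(x) sprintf(
    "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (%04d%02d%02d, %d, %d, %d)",
    date[1], date[2], date[3], station, x[1], x[2]), character(1))
}

sql <- unlist(lapply(unique(group), make_inserts))
```

The vector sql then holds one statement per data line and can be written out with writeLines(sql, "inserts.sql"), where the output file name is just an example.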