Hi!

I have data (also in attached file) in the following form:

num1 num2 num3 int1 fac1 fac2 cha1 cha2 Date POSIXt
1 1 f q 1900-01-01 1900-01-01 01:01:01
2 1.0 1316666.5 2 a g r z 1900-01-01 01:01:01
3 1.5 1188830.5 3 b h s y 1900-01-01 1900-01-01 01:01:01
4 2.0 1271846.3 4 c i t x 1900-01-01 1900-01-01 01:01:01
5 2.5 829737.4 d j u w 1900-01-01
6 3.0 1240967.3 5 e k v v 1900-01-01 1900-01-01 01:01:01
7 3.5 919684.4 6 f l w u 1900-01-01 1900-01-01 01:01:01
8 4.0 968214.6 7 g m x t 1900-01-01 1900-01-01 01:01:01
9 4.5 1232076.4 8 h n y s 1900-01-01 1900-01-01 01:01:01
10 5.0 1141273.4 9 i o z r 1900-01-01 1900-01-01 01:01:01
5.5 988481.4 10 j q 1900-01-01 1900-01-01 01:01:01

This is a FWF (fixed width format) file. I cannot use read.table here because of the missing values. I have tried the following:

> read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
           header=TRUE)
Error in read.table(file = FILE, header = header, sep = sep, as.is = as.is, :
  more columns than column names

I could use:

> read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
           header=FALSE, skip=1)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1 NA NA 1 f q 1900-01-01 1900-01-01 01:01:01
2 2 1.0 1316666.5 2 a g r z 1900-01-01 01:01:01
3 3 1.5 1188830.5 3 b h s y 1900-01-01 1900-01-01 01:01:01
4 4 2.0 1271846.3 4 c i t x 1900-01-01 1900-01-01 01:01:01
5 5 2.5 829737.4 NA d j u w 1900-01-01
6 6 3.0 1240967.3 5 e k v v 1900-01-01 1900-01-01 01:01:01
7 7 3.5 919684.4 6 f l w u 1900-01-01 1900-01-01 01:01:01
8 8 4.0 968214.6 7 g m x t 1900-01-01 1900-01-01 01:01:01
9 9 4.5 1232076.4 8 h n y s 1900-01-01 1900-01-01 01:01:01
10 10 5.0 1141273.4 9 i o z r 1900-01-01 1900-01-01 01:01:01
11 NA 5.5 988481.4 10 j q 1900-01-01 1900-01-01 01:01:01

Does anyone have a clue how to get the above result with the header?

Thanks!
--
Lep pozdrav / With regards,
    Gregor Gorjanc
----------------------------------------------------------------------
University of Ljubljana     PhD student
Biotechnical Faculty
Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3                   mail: gregor.gorjanc <at> bfro.uni-lj.si
SI-1230 Domzale             tel: +386 (0)1 72 17 861
Slovenia, Europe            fax: +386 (0)1 72 17 888
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.
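For context on the error above: read.fwf expects the header line to be separated by sep, whose default is a tab, so a space-separated header is taken as a single column name while the data rows are split into as many fields as there are widths. A minimal sketch that reproduces this on a small made-up file (the three columns, their names and widths are assumptions for illustration, not the data from the post):

```r
# Hypothetical three-column fixed-width file; names and widths are made up.
widths <- c(3, 5, 2)
rows <- sprintf("%3s%5s%2s",
                c("1", "2", ""),      # id, missing in the last row
                c("1.5", "", "2.5"),  # value, missing in the second row
                c("a", "b", "c"))     # grp
path <- tempfile(fileext = ".txt")
writeLines(c("id value grp", rows), path)

# The header contains no tab, so splitting on the default sep gives one field.
header_fields <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]
length(header_fields)

# header = TRUE then fails; capture the message instead of stopping the script.
res <- tryCatch(read.fwf(path, widths = widths, header = TRUE),
                error = function(e) conditionMessage(e))
res
```

So the header either has to use sep as its separator, or it has to be read separately from the data.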
On Mon, 2006-10-30 at 19:51 +0100, Gregor Gorjanc wrote:
> <snip>
> Does anyone have a clue, how to get above result with header?
>
> Thanks!

The attachment did not come through. Perhaps it was too large?

Not sure if this is the most efficient way, but how about this:

# Read the header line on its own and pass it as col.names;
# skip = 1 keeps that line out of the fixed-width data.
DF <- read.fwf("test.txt",
               widths = c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
               skip = 1, strip.white = TRUE,
               col.names = read.table("test.txt", nrows = 1,
                                      as.is = TRUE)[1, ])

> DF
num1 num2 num3 int1 fac1 fac2 cha1 cha2 Date
1 1 NA NA 1 f q 1900-01-01
2 2 1.0 1316666.5 2 a g r z
3 3 1.5 1188830.5 3 b h s y 1900-01-01
4 4 2.0 1271846.3 4 c i t x 1900-01-01
5 5 2.5 829737.4 NA d j u w 1900-01-01
6 6 3.0 1240967.3 5 e k v v 1900-01-01
7 7 3.5 919684.4 6 f l w u 1900-01-01
8 8 4.0 968214.6 7 g m x t 1900-01-01
9 9 4.5 1232076.4 8 h n y s 1900-01-01
10 10 5.0 1141273.4 9 i o z r 1900-01-01
11 NA 5.5 988481.4 10 j q 1900-01-01
POSIXt
1 1900-01-01 01:01:01
2 1900-01-01 01:01:01
3 1900-01-01 01:01:01
4 1900-01-01 01:01:01
5 <NA>
6 1900-01-01 01:01:01
7 1900-01-01 01:01:01
8 1900-01-01 01:01:01
9 1900-01-01 01:01:01
10 1900-01-01 01:01:01
11 1900-01-01 01:01:01

Of course, with the limited number of columns, you can always just set

colnames(DF) <- c("num1", "num2", "num3", "int1", "fac1", "fac2",
                  "cha1", "cha2", "Date", "POSIXt")

as a post-import step.

HTH,

Marc Schwartz
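A self-contained variant of the same idea, for anyone without test.txt at hand. This is only a sketch on a made-up three-column file (names and widths are assumptions), and it uses scan() rather than read.table() to pick up the header names, which assumes the names themselves contain no spaces:

```r
# Hypothetical three-column fixed-width file; names and widths are made up.
widths <- c(3, 5, 2)
rows <- sprintf("%3s%5s%2s",
                c("1", "2", ""),      # id, missing in the last row
                c("1.5", "", "2.5"),  # value, missing in the second row
                c("a", "b", "c"))     # grp
path <- tempfile(fileext = ".txt")
writeLines(c("id value grp", rows), path)

# Read only the first line as whitespace-separated names.
col_names <- scan(path, what = "", nlines = 1, quiet = TRUE)
stopifnot(length(col_names) == length(widths))

# Skip the header when reading the data and supply the names explicitly.
df <- read.fwf(path, widths = widths, skip = 1,
               strip.white = TRUE, col.names = col_names)
df
```

The stopifnot() guard catches the case where the number of header names and the number of widths disagree, which would otherwise surface later as a less obvious error.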
Gregor,

According to the help for read.fwf, sep needs to be set to a value that occurs only in the header record. I changed the spaces to commas in the header record of your example and used the following syntax and was able to read the file just fine.

new.data <- read.fwf(file="test.txt",
                     widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 19),
                     header=TRUE, sep=',')

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch]
> On Behalf Of Gregor Gorjanc
> Sent: Monday, October 30, 2006 10:52 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] read.fwf and header
>
> <snip>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
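The same effect can probably be had without editing the file on disk, by rewriting the header line in memory and handing the lines to read.fwf through a text connection. A rough sketch on a made-up three-column file (names and widths are assumptions); it uses the default tab as sep instead of a comma and assumes the header names contain no spaces:

```r
# Hypothetical three-column fixed-width file; names and widths are made up.
widths <- c(3, 5, 2)
rows <- sprintf("%3s%5s%2s",
                c("1", "2", ""),      # id, missing in the last row
                c("1.5", "", "2.5"),  # value, missing in the second row
                c("a", "b", "c"))     # grp
path <- tempfile(fileext = ".txt")
writeLines(c("id value grp", rows), path)

# Replace runs of whitespace in the header (first line only) with tabs,
# the default sep of read.fwf; the data lines are left untouched.
lines <- readLines(path)
lines[1] <- gsub("[[:space:]]+", "\t", trimws(lines[1]))

# Feed the modified lines through a text connection instead of a file.
con <- textConnection(lines)
df <- read.fwf(con, widths = widths, header = TRUE, strip.white = TRUE)
close(con)
df
```

This reads the whole file into memory first, so it is only a reasonable choice for files of moderate size.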
"Archaic" it may be, but I still have to deal with fixed format data files on a daily basis.

David L. Reiner
Rho Trading Securities, LLC
Chicago IL 60605
312-362-4963

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch]
On Behalf Of Martin Maechler
Sent: Tuesday, October 31, 2006 1:52 AM
To: gregor.gorjanc at bfro.uni-lj.si
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] read.fwf and header

<snip>

In my (and probably R-core's) view, read.fwf() should only have to be used for ``legacy data files'' (those times when people used *no* separators in order to save disk space), since nowadays, such data files should "automatically" have correct separators.

--> Fix the "file producing process" rather than make read.fwf() unnecessarily more complicated.

Martin Maechler, ETH Zurich
How about using a connection and reading the header separate from the data, like this:

# An open connection keeps its position, so the second read
# starts right after the header line consumed by scan().
tmp1 <- file('c:/temp/tmp.dat')
open(tmp1)
my.names <- scan(tmp1, nlines=1, what='')
new.data <- read.fwf(file=tmp1,
                     widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 19),
                     header=FALSE)
names(new.data) <- my.names
close(tmp1)

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch]
On Behalf Of Gregor Gorjanc
Sent: Monday, October 30, 2006 3:33 PM
To: Daniel Nordlund
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] read.fwf and header

Daniel Nordlund wrote:
> Gregor,
>
> According to the help for read.fwf, sep needs to be set to a value
> that occurs only in the header record. I changed the spaces to commas
> in the header record of your example and used the following syntax and
> was able to read the file just fine.
>
> new.data<-read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 19),
>                    header=TRUE, sep=',')
>
> Hope this is helpful,
>
> Dan

Thanks Dan! But I have to modify the file first. Not that much work, but still.

Regards, Gregor
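The connection approach could be wrapped in a small helper so that the connection is always closed, even if reading fails. A sketch only: read_fwf_with_header is a hypothetical name, the three-column file is made up, and the header is read with readLines() and split on whitespace rather than with scan(), which assumes the names contain no spaces:

```r
# Hypothetical helper: read a whitespace-separated header line, then the
# fixed-width data that follows it, from one open connection.
read_fwf_with_header <- function(path, widths, ...) {
  con <- file(path, open = "rt")
  on.exit(close(con))                      # close even if an error occurs
  header <- readLines(con, n = 1)          # consumes exactly the first line
  col_names <- strsplit(trimws(header), "[[:space:]]+")[[1]]
  stopifnot(length(col_names) == length(widths))
  df <- read.fwf(con, widths = widths, header = FALSE, ...)
  names(df) <- col_names
  df
}

# Made-up three-column file to try it on.
widths <- c(3, 5, 2)
rows <- sprintf("%3s%5s%2s",
                c("1", "2", ""),      # id, missing in the last row
                c("1.5", "", "2.5"),  # value, missing in the second row
                c("a", "b", "c"))     # grp
path <- tempfile(fileext = ".txt")
writeLines(c("id value grp", rows), path)

df <- read_fwf_with_header(path, widths, strip.white = TRUE)
df
```

Extra arguments such as strip.white are passed on to read.fwf unchanged.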