Jason Rupert
2009-May-05 02:20 UTC
[R] Way to handle variable length and numbers of columns using read.table(...)
I've got read.table to successfully read in my table of three columns. Most of the time I will have a set number of rows, but sometimes that will vary, and sometimes there will be only two values in one row, e.g.

Time Loc1 Loc2
1 22.33 44.55
2 66.77 88.99
3 222.33344.55
4 66.77 88.99

Is there any way to have read.table handle (1) a variable number of rows, and (2) rows that sometimes have only two values, as shown at Time = 3 above?

Just curious about how to handle this, and whether read.table is the right way to go about it or whether I should read in all the data and then try to parse it out as best I can.

Thanks again.

> R.version
               _
platform       i386-apple-darwin8.11.1
arch           i386
os             darwin8.11.1
system         i386, darwin8.11.1
status
major          2
minor          8.0
year           2008
month          10
day            20
svn rev        46754
language       R
version.string R version 2.8.0 (2008-10-20)
jim holtman
2009-May-05 02:47 UTC
Well, if you read in your data, you get:

> x <- read.table('clipboard', header=TRUE, fill=TRUE)
Warning message:
In read.table("clipboard", header = TRUE, fill = TRUE) :
  incomplete final line found by readTableHeader on 'clipboard'
> x
  Time         Loc1  Loc2
1    1        22.33 44.55
2    2        66.77 88.99
3    3 222.33344.55    NA
4    4        66.77 88.99
> str(x)
'data.frame':   4 obs. of  3 variables:
 $ Time: int  1 2 3 4
 $ Loc1: Factor w/ 3 levels "22.33","222.33344.55",..: 1 3 2 3
 $ Loc2: num  44.5 89 NA 89
>

As you can see, the value that has two decimal points is read in as a character string, which causes the whole column to be converted to a factor. It appears that you have some fixed-length fields that are overflowing.

You could read in the data and parse it with regular expressions: match the first part, which has two decimal places, and then extract the rest. The question is, is this the only "problem" you have in the data? If so, parsing it is not hard.
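The regular-expression repair described above can be sketched in base R roughly as follows. This is only an illustration, assuming that every value ends in exactly two decimal places, so that a digit immediately after ".dd" must belong to the next field. The names raw, fixed, and x are placeholders, and the data are retyped from the original post rather than read from a file.

```r
# Sample lines retyped from the original post; row 3 has two values fused.
raw <- c("Time Loc1 Loc2",
         "1 22.33 44.55",
         "2 66.77 88.99",
         "3 222.33344.55",
         "4 66.77 88.99")

# Assumption: values always end in two decimals. Wherever ".dd" is followed
# directly by another digit, insert a space to split the overflowing field.
fixed <- gsub("([.][0-9]{2})(?=[0-9])", "\\1 ", raw, perl = TRUE)

# Re-read the repaired lines; textConnection also works on older R versions.
con <- textConnection(fixed)
x <- read.table(con, header = TRUE)
close(con)

str(x)
```

If the file has other kinds of damage (for example, rows that are genuinely missing a value), fill = TRUE would still be needed, and this substitution alone would not be enough.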
Gabor Grothendieck
2009-May-05 03:04 UTC
It's not clear exactly what the rules are for this, but if we assume that numbers always end in a decimal point plus two digits, then using strapply from the gsubfn package:

> Lines <- "Time Loc1 Loc2
+ 1 22.33 44.55
+ 2 66.77 88.99
+ 3 222.33344.55
+ 4 66.77 88.99"
>
> library(gsubfn)
> L <- readLines(textConnection(Lines))
> strapply(L[-1], "[0-9]*[.][0-9][0-9]", as.numeric, simplify = rbind)
       [,1]   [,2]
[1,]  22.33  44.55
[2,]  66.77  88.99
[3,] 222.33 344.55
[4,]  66.77  88.99

See http://gsubfn.googlecode.com and, for regular expressions, see ?regex
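For readers without gsubfn installed, roughly the same extraction can be sketched with base R's gregexpr and regmatches. This is not the strapply implementation, just an approximation under the same assumption (every value ends in a decimal point plus two digits) plus one more: each data row yields exactly two matches. The names body, vals, m, and result are illustrative. It also keeps the Time column, which the matrix above drops.

```r
# Same sample lines as above, retyped from the original post.
L <- c("Time Loc1 Loc2",
       "1 22.33 44.55",
       "2 66.77 88.99",
       "3 222.33344.55",
       "4 66.77 88.99")
body <- L[-1]  # drop the header line

# Pull out every substring that looks like digits, a point, and two digits.
vals <- regmatches(body, gregexpr("[0-9]*[.][0-9][0-9]", body))

# Assumption: each row produced exactly two matches, so rbind gives an n x 2 matrix.
m <- do.call(rbind, lapply(vals, as.numeric))

# The first whitespace-delimited token on each row is taken as Time.
result <- data.frame(Time = as.integer(sub(" .*", "", body)),
                     Loc1 = m[, 1],
                     Loc2 = m[, 2])
print(result)
```

If some rows could contain one or three values, the rbind step would need a check on the lengths of vals before combining.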