Johan Jackson
2010-May-26 00:05 UTC
[R] reading in table with different number of elements in each row
Hi all,

This is probably simple, but I haven't been able to locate the answer either in the Import Manual or by searching the listserv.

I have tab-delimited data with different numbers of elements in each row. I want to read it into R, such that R fills in "NA" in elements that have no data. How do I accomplish this?

Example:

DATA on disk:

 1 -0.068191 -0.050729 -0.113982 -0.044363 -0.072445 -0.044516 -0.048597 -0.051866 -0.051563 -0.041576
 2 -0.032645 -0.062389 -0.054491 -0.058061 -0.034690 -0.038044 -0.045332 -0.043785 -0.050639 -0.049617
 3 -0.068191 -0.044207 -0.058061 -0.050729 -0.034991 -0.045360 -0.051563 -0.060290 -0.043785 -0.048757
 4 -0.068191 -0.062389 -0.050729 -0.058579 -0.056481 -0.044363 -0.042347 -0.060290 -0.051563 -0.037216 -0.041576 -0.056476
 5 -0.068191 -0.047649 -0.062389 -0.058061 -0.034227 -0.185829 -0.071855 -0.064096 -0.195645
 6 -0.040208 -0.068191 -0.036475 -0.041268 -0.044207 -0.044363 -0.034991 -0.059810 -0.051619 -0.051563 -0.037216 -0.041576 -0.019762
 7 -0.068191 -0.034227 -0.044363 -0.051563 -0.041576 -0.053823 -0.057023 -0.046083 -0.089374 -0.057436
 8 -0.068191 -0.050731 -0.044207 -0.169714 -0.060025 -0.048597 -0.037827 -0.053823 -0.055154
 9 -0.062389 -0.044207 -0.050729 -0.044363 -0.043785
10 -0.040208 -0.036716 -0.068191 -0.051466 -0.050731 -0.050729 -0.048095 -0.044363 -0.044817 -0.059810 -0.051563 -0.037827 -0.053985 -0.059573 -0.052893
11 -0.068191 -0.034227 -0.048597 -0.051563 -0.041576 -0.056512
12 -0.040208 -0.050731 -0.044207 -0.048095 -0.044363 -0.044817 -0.037827 -0.053985 -0.059573

My attempt:

> x <- read.table("DATA", fill=TRUE, sep="\t", colClasses="numeric")
> x
          V1        V2        V3        V4        V5        V6        V7        V8        V9       V10       V11       V12       V13
1  -0.068191 -0.050729 -0.113982 -0.044363 -0.072445 -0.044516 -0.048597 -0.051866 -0.051563 -0.041576        NA        NA        NA
2  -0.032645 -0.062389 -0.054491 -0.058061 -0.034690 -0.038044 -0.045332 -0.043785 -0.050639 -0.049617        NA        NA        NA
3  -0.068191 -0.044207 -0.058061 -0.050729 -0.034991 -0.045360 -0.051563 -0.060290 -0.043785 -0.048757        NA        NA        NA
4  -0.068191 -0.062389 -0.050729 -0.058579 -0.056481 -0.044363 -0.042347 -0.060290 -0.051563 -0.037216 -0.041576 -0.056476        NA
5  -0.068191 -0.047649 -0.062389 -0.058061 -0.034227 -0.185829 -0.071855 -0.064096 -0.195645        NA        NA        NA        NA
6  -0.040208 -0.068191 -0.036475 -0.041268 -0.044207 -0.044363 -0.034991 -0.059810 -0.051619 -0.051563 -0.037216 -0.041576 -0.019762
7  -0.068191 -0.034227 -0.044363 -0.051563 -0.041576 -0.053823 -0.057023 -0.046083 -0.089374 -0.057436        NA        NA        NA
8  -0.068191 -0.050731 -0.044207 -0.169714 -0.060025 -0.048597 -0.037827 -0.053823 -0.055154        NA        NA        NA        NA
9  -0.062389 -0.044207 -0.050729 -0.044363 -0.043785        NA        NA        NA        NA        NA        NA        NA        NA
10 -0.040208 -0.036716 -0.068191 -0.051466 -0.050731 -0.050729 -0.048095 -0.044363 -0.044817 -0.059810 -0.051563 -0.037827 -0.053985
11 -0.059573 -0.052893        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA
12 -0.068191 -0.034227 -0.048597 -0.051563 -0.041576 -0.056512        NA        NA        NA        NA        NA        NA        NA
13 -0.040208 -0.050731 -0.044207 -0.048095 -0.044363 -0.044817 -0.037827 -0.053985 -0.059573        NA        NA        NA        NA

The above is almost right, but x has 13 rows instead of 12! WHY? Row 10 (which has 15 elements) was cut off at 13, and then the last two elements were put in a new row. WHY?

I have tried messing with colClasses to no avail. Any help would be ... umm... helpful!

JJ
jim holtman
2010-May-26 00:59 UTC
[R] reading in table with different number of elements in each row
This is in the Details section of the read.table help page:

    The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer. This could conceivably be wrong if fill or blank.lines.skip are true, so specify col.names if necessary.

Try:

    read.table(..., col.names=1:30)

This will assume there are 30 columns of data (you only mentioned a maximum of 15, but let's double it).

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
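As a sketch of the col.names suggestion above: instead of guessing an upper bound such as 30, count.fields() can report how many fields each line actually has, and the maximum can then be passed through col.names. The file contents and the names tmp, n_col, bad and good are made up for illustration; they are not from the original post.

```r
# Hypothetical tab-delimited file: the longest line (5 fields) is the sixth,
# so it lies outside the first five lines that read.table() inspects.
lines <- c("1\t2\t3", "4\t5", "6\t7\t8", "9", "10\t11\t12",
           "13\t14\t15\t16\t17")
tmp <- tempfile()
writeLines(lines, tmp)

# Without col.names the column count is taken from the first five lines (3),
# and the 5-field line is split across two rows.
bad <- read.table(tmp, sep = "\t", fill = TRUE)

# Count the fields on every line and use the maximum as the column count.
n_col <- max(count.fields(tmp, sep = "\t"))
good <- read.table(tmp, sep = "\t", fill = TRUE,
                   col.names = paste0("V", seq_len(n_col)))

print(dim(bad))   # one row more than the file has lines
print(dim(good))  # one row per line, short rows padded with NA
```

Scanning the file once with count.fields() costs an extra pass over the data, but it avoids both an undersized guess and a block of all-NA columns from an oversized one.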
David Winsemius
2010-May-26 01:05 UTC
[R] reading in table with different number of elements in each row
On May 25, 2010, at 8:05 PM, Johan Jackson wrote:

> I have tab-delimited data with different numbers of elements in each
> row. I want to read it into R, such that R fills in "NA" in elements
> that have no data. How do I accomplish this?

Look at the fill argument to read.table.

read.table(textConnection(" 1 -0.068191 -0.050729 -0.113982 -0.044363\n -0.072445 -0.044516 -0.048597 -0.051866\n -0.051563 -0.041576\n 2 -0.032645 -0.062389 -0.054491 -0.058061\n -0.034690 -0.038044 -0.045332 -0.043785\n -0.050639 -0.049617"),
           header=FALSE, fill=TRUE, colClasses=rep("numeric", 4))

         V1        V2        V3        V4        V5
1  1.000000 -0.068191 -0.050729 -0.113982 -0.044363
2 -0.072445 -0.044516 -0.048597 -0.051866        NA
3 -0.051563 -0.041576        NA        NA        NA
4  2.000000 -0.032645 -0.062389 -0.054491 -0.058061
5 -0.034690 -0.038044 -0.045332 -0.043785        NA
6 -0.050639 -0.049617        NA        NA        NA

In your case you may want to use sep="\t".

--
David.

David Winsemius, MD
West Hartford, CT
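To illustrate why sep="\t" can matter together with fill=TRUE, here is a small sketch on made-up input (the string txt and the names by_tab and by_ws are hypothetical). With the default whitespace separator, consecutive tabs collapse into one delimiter, so an empty field in the middle of a row disappears and later values shift left. With sep="\t", the empty field is kept and read as NA.

```r
# Hypothetical input: the first row has an empty second field (two tabs in a row).
txt <- "1\t\t3\n4\t5\t6"

# Tab as the separator: the empty field stays in place and becomes NA.
by_tab <- read.table(text = txt, sep = "\t", fill = TRUE)

# Default separator (any run of whitespace): the two tabs act as one delimiter,
# so the first row has only two fields and fill = TRUE pads it at the end.
by_ws <- read.table(text = txt, fill = TRUE)

print(by_tab)
print(by_ws)
```

In by_tab the value 3 remains in the third column, while in by_ws it moves into the second column and the NA appears at the end of the row.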