Johan Jackson
2010-May-26 00:05 UTC
[R] reading in table with different number of elements in each row
Hi all,
This is probably simple, but I haven't been able to locate the answer either
in the Import Manual or by searching the listserv.
I have tab-delimited data with a different number of elements in each row. I
want to read it into R such that R fills in "NA" for elements that have no
data. How do I accomplish this?
Example:
DATA on disk:
 1 -0.068191 -0.050729 -0.113982 -0.044363 -0.072445 -0.044516 -0.048597 -0.051866 -0.051563 -0.041576
 2 -0.032645 -0.062389 -0.054491 -0.058061 -0.034690 -0.038044 -0.045332 -0.043785 -0.050639 -0.049617
 3 -0.068191 -0.044207 -0.058061 -0.050729 -0.034991 -0.045360 -0.051563 -0.060290 -0.043785 -0.048757
 4 -0.068191 -0.062389 -0.050729 -0.058579 -0.056481 -0.044363 -0.042347 -0.060290 -0.051563 -0.037216 -0.041576 -0.056476
 5 -0.068191 -0.047649 -0.062389 -0.058061 -0.034227 -0.185829 -0.071855 -0.064096 -0.195645
 6 -0.040208 -0.068191 -0.036475 -0.041268 -0.044207 -0.044363 -0.034991 -0.059810 -0.051619 -0.051563 -0.037216 -0.041576 -0.019762
 7 -0.068191 -0.034227 -0.044363 -0.051563 -0.041576 -0.053823 -0.057023 -0.046083 -0.089374 -0.057436
 8 -0.068191 -0.050731 -0.044207 -0.169714 -0.060025 -0.048597 -0.037827 -0.053823 -0.055154
 9 -0.062389 -0.044207 -0.050729 -0.044363 -0.043785
10 -0.040208 -0.036716 -0.068191 -0.051466 -0.050731 -0.050729 -0.048095 -0.044363 -0.044817 -0.059810 -0.051563 -0.037827 -0.053985 -0.059573 -0.052893
11 -0.068191 -0.034227 -0.048597 -0.051563 -0.041576 -0.056512
12 -0.040208 -0.050731 -0.044207 -0.048095 -0.044363 -0.044817 -0.037827 -0.053985 -0.059573
My attempts:
> x <- read.table("DATA", fill=TRUE, sep="\t", colClasses="numeric")
> x
          V1        V2        V3        V4        V5        V6        V7        V8        V9       V10       V11       V12       V13
1  -0.068191 -0.050729 -0.113982 -0.044363 -0.072445 -0.044516 -0.048597 -0.051866 -0.051563 -0.041576        NA        NA        NA
2  -0.032645 -0.062389 -0.054491 -0.058061 -0.034690 -0.038044 -0.045332 -0.043785 -0.050639 -0.049617        NA        NA        NA
3  -0.068191 -0.044207 -0.058061 -0.050729 -0.034991 -0.045360 -0.051563 -0.060290 -0.043785 -0.048757        NA        NA        NA
4  -0.068191 -0.062389 -0.050729 -0.058579 -0.056481 -0.044363 -0.042347 -0.060290 -0.051563 -0.037216 -0.041576 -0.056476        NA
5  -0.068191 -0.047649 -0.062389 -0.058061 -0.034227 -0.185829 -0.071855 -0.064096 -0.195645        NA        NA        NA        NA
6  -0.040208 -0.068191 -0.036475 -0.041268 -0.044207 -0.044363 -0.034991 -0.059810 -0.051619 -0.051563 -0.037216 -0.041576 -0.019762
7  -0.068191 -0.034227 -0.044363 -0.051563 -0.041576 -0.053823 -0.057023 -0.046083 -0.089374 -0.057436        NA        NA        NA
8  -0.068191 -0.050731 -0.044207 -0.169714 -0.060025 -0.048597 -0.037827 -0.053823 -0.055154        NA        NA        NA        NA
9  -0.062389 -0.044207 -0.050729 -0.044363 -0.043785        NA        NA        NA        NA        NA        NA        NA        NA
10 -0.040208 -0.036716 -0.068191 -0.051466 -0.050731 -0.050729 -0.048095 -0.044363 -0.044817 -0.059810 -0.051563 -0.037827 -0.053985
11 -0.059573 -0.052893        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA
12 -0.068191 -0.034227 -0.048597 -0.051563 -0.041576 -0.056512        NA        NA        NA        NA        NA        NA        NA
13 -0.040208 -0.050731 -0.044207 -0.048095 -0.044363 -0.044817 -0.037827 -0.053985 -0.059573        NA        NA        NA        NA
The above is almost right, but x has 13 rows instead of 12! WHY? Row 10
(which has 15 elements) was cut off at 13, and then the last two elements
were put in a new row. WHY?
I have tried messing with colClasses to no avail. Any help would be ...
umm... helpful!
JJ
jim holtman
2010-May-26 00:59 UTC
[R] reading in table with different number of elements in each row
This is in the Details section of the help page:

    The number of data columns is determined by looking at the first five
    lines of input (or the whole file if it has less than five lines), or
    from the length of col.names if it is specified and is longer. This
    could conceivably be wrong if fill or blank.lines.skip are true, so
    specify col.names if necessary.

Try:

    read.table(..., col.names=1:30)

This will assume there are 30 columns of data (you only said a max of 15,
but let's double it).

--
Jim Holtman
Cincinnati, OH

What is the problem that you are trying to solve?
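The behaviour described in the help page can be reproduced on a small file, and the col.names fix can be sized from the data rather than hard-coded. The following is only a sketch: the file contents are made up for illustration, and using count.fields() to find the widest row is an assumption layered on top of the suggestion above, not something proposed in the thread.

```r
# Hypothetical stand-in for the poster's file: seven 3-field rows followed
# by one 5-field row, tab-delimited. The wide row sits past the first five
# lines, so read.table() does not see it when it guesses the column count.
rows <- c(rep("1\t2\t3", 7), "4\t5\t6\t7\t8")
path <- tempfile(fileext = ".txt")
writeLines(rows, path)

# Without col.names, the column count comes from the first lines only,
# and the extra fields of the wide row spill onto an additional row.
bad <- read.table(path, sep = "\t", fill = TRUE)

# count.fields() reports the number of fields on every line of the file,
# so col.names can be made exactly as long as the widest row.
max_cols <- max(count.fields(path, sep = "\t"))
good <- read.table(path, sep = "\t", fill = TRUE,
                   col.names = paste0("V", seq_len(max_cols)))

dim(bad)   # one row more than the file has lines
dim(good)  # one row per line, one column per field of the widest line
```

With col.names sized this way, short rows are padded with NA on the right and no row is split, at the cost of one extra pass over the file.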
David Winsemius
2010-May-26 01:05 UTC
[R] reading in table with different number of elements in each row
On May 25, 2010, at 8:05 PM, Johan Jackson wrote:

> I have tab-delimited data with different numbers of elements in each
> row. I want to read it into R, such that R fills in "NA" in elements
> that have no data. How do I accomplish this?

Look at the fill argument to read.table.

    read.table(textConnection(" 1 -0.068191 -0.050729 -0.113982 -0.044363
     -0.072445 -0.044516 -0.048597 -0.051866
     -0.051563 -0.041576
     2 -0.032645 -0.062389 -0.054491 -0.058061
     -0.034690 -0.038044 -0.045332 -0.043785
     -0.050639 -0.049617"),
               header=FALSE, fill=TRUE, colClasses=rep("numeric", 4))

             V1        V2        V3        V4        V5
    1  1.000000 -0.068191 -0.050729 -0.113982 -0.044363
    2 -0.072445 -0.044516 -0.048597 -0.051866        NA
    3 -0.051563 -0.041576        NA        NA        NA
    4  2.000000 -0.032645 -0.062389 -0.054491 -0.058061
    5 -0.034690 -0.038044 -0.045332 -0.043785        NA
    6 -0.050639 -0.049617        NA        NA        NA

In your case you may want to use sep="\t".

--
David.

David Winsemius, MD
West Hartford, CT
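For completeness, ragged rows can also be padded by hand, which avoids read.table's column guessing altogether. This is a different technique from the fill argument discussed above, shown only as a rough sketch: the input lines are invented, and in practice they would presumably come from readLines() on the real file.

```r
# Hypothetical ragged, tab-delimited input; with a real file this vector
# would come from readLines("DATA") instead.
lines <- c("0.1\t0.2\t0.3", "0.4", "0.5\t0.6\t0.7\t0.8\t0.9")

# Split each line on tabs and convert the pieces to numbers.
fields <- lapply(strsplit(lines, "\t", fixed = TRUE), as.numeric)

# Pad every row with NA up to the length of the longest row.
width <- max(lengths(fields))
padded <- lapply(fields, function(v) c(v, rep(NA_real_, width - length(v))))

# Bind the padded rows into a numeric matrix, one input line per row.
m <- do.call(rbind, padded)
m
```

The result is a matrix rather than a data frame; as.data.frame(m) converts it if a data frame is needed.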