Hi,

Wasn't sure how to explain this problem succinctly in a title. I am trying to read in a text file that looks like:

0 1000 175 1 2 3
1 1000 58 0 2 9
2 1000 35 0 1 3 10
3 1000 300 0 2 4 5 10 11 18
4 1000 150 3 5 6
5 1000 100 3 4 6 7 18
6 1000 50 4 5 7 8
7 1000 155 5 6 8 19
8 1000 255 6 7 19
9 1000 200 1 10 12
10 1000 52 2 3 9 11 12 13
11 1000 70 3 10 14 15 16 17 18 19
12 1000 250 9 10 13
13 1000 40 10 12 14
14 1000 235 11 13 15
15 1000 127 11 14 16 17
16 1000 177 11 15 17
17 1000 358 11 15 16
18 1000 296 3 5 11 19
19 1000 120 7 8 11 18

The problem with this is that the 12th row (row with 11 in the first column) doesn't get read in correctly. To read into R, I'm using a command like:

matrix(unlist(read.table(datafile, sep="",fill=T)),
       ncol=max(count.fields(datafile, sep="")),byrow=F)

but that gives

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
 [1,]    0   19 1000  358   11   14   15   NA   NA    NA    18
 [2,]    1 1000 1000  296   11   15   16   NA   NA    NA    NA
 [3,]    2 1000  175  120    3   15   17   17   NA    NA    NA
 [4,]    3 1000   58    1    7    5   16   NA   NA    NA    NA
 [5,]    4 1000   35    0    2    8   11   NA   NA    NA    NA
 [6,]    5 1000  300    0    2    3   11   19   NA    NA    NA
 [7,]    6 1000  150    0    1    9   NA   18   NA    NA    NA
 [8,]    7 1000  100    3    2    3   NA   NA   NA    NA    NA
 [9,]    8 1000   50    3    5    4   10   NA   NA    NA    NA
[10,]    9 1000  155    4    4    6    5   NA   NA    NA    NA
[11,]   10 1000  255    5    5    6   NA   10   NA    NA     0
[12,]   11 1000  200    6    6    7    7   NA   11    NA     1
[13,]   19 1000   52    1    7    8    8   18   NA    18     2
[14,]   12   NA   70    2   10   19   19   NA   NA    NA     3
[15,]   13 1000   NA    3    3   12   NA   NA   NA    NA     4
[16,]   14 1000  250   NA   10    9   NA   NA   NA    NA     5
[17,]   15 1000   40    9   NA   14   11   NA   NA    NA     6
[18,]   16 1000  235   10   10   NA   15   12   NA    NA     7
[19,]   17 1000  127   11   12   13   NA   16   13    NA     8
[20,]   18 1000  177   11   13   14   NA   NA   17    NA     9

I've tried other things, but this is as close as I've been able to get and I'm at a loss at this point. Any input would be helpful...thanks...mj

Mike Jones wrote:
> [...]
> The problem with this is that the 12th row (row with 11 in the first
> column) doesn't get read in correctly.
> [...]

There are two ways that I know of to get around this.
I'm sure there are others:

## read in the file to determine the max number of columns
x <- scan("file.txt", what = "", sep = "\n")
x <- strsplit(x, "[ \t]+")  # split string by white space
max.col <- max(sapply(x, length))

## option 1
## specify col.names as ?read.table suggests
cn <- paste("V", 1:max.col, sep = "")
z1 <- read.table("file.txt", fill = TRUE, col.names = cn)

## option 2
## parse `x' yourself and construct a matrix
z2 <- t(sapply(x, function(i) {
  n <- length(i)
  y <- rep(NA, max.col)
  y[1:n] <- as.numeric(i)
  y
}))
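
For readers who want to try the padding idea from option 2 without a file on disk, here is a minimal sketch on a small made-up input. The vector `txt` and the helper `pad` are hypothetical names used only for illustration, and assigning to `length()` is swapped in for the explicit `rep(NA, ...)` fill used above (extending a vector this way pads it with NA).

```r
## Toy ragged input (hypothetical; not the poster's file)
txt <- c("0 1000 175 1 2 3",
         "1 1000 58 0 2 9 7",
         "2 1000 35")

## Split each line on white space and find the widest line
fields  <- strsplit(txt, "[ \t]+")
max.col <- max(lengths(fields))

## Convert one line to numeric and pad it to max.col entries
pad <- function(f) {
  v <- as.numeric(f)
  length(v) <- max.col   # extending a vector fills the new slots with NA
  v
}

## sapply() returns one column per line, so transpose to get one row per line
m <- t(sapply(fields, pad))
print(m)
```

Short lines end up right-padded with NA, so every row of `m` has `max.col` entries.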

On 11/15/05, Mike Jones <MikeJones at westat.com> wrote:
> [...]
> The problem with this is that the 12th row (row with 11 in the first
> column) doesn't get read in correctly.
> [...]

Try this:

nf <- max(count.fields(datafile))
read.table(datafile, fill = TRUE, col.names = 1:nf)

======= 2005-11-15 23:18:05 =======
> [...]
> matrix(unlist(read.table(datafile, sep="",fill=T)),
>        ncol=max(count.fields(datafile, sep="")),byrow=F)

In ?read.table you will find:

    The number of data columns is determined by looking at the first
    five lines of input (or the whole file if it has less than five
    lines), or from the length of 'col.names' if it is specified and
    is longer. This could conceivably be wrong if 'fill' or
    'blank.lines.skip' are true, so specify 'col.names' if necessary.

So try:

nc <- max(count.fields(datafile, sep=""))
x <- read.table(datafile, sep="", col.names=paste("v", 1:nc, sep="."), fill=TRUE)
matrix(unlist(x), ncol=nc)

2005-11-16
------
Department of Sociology
Fudan University
My new mail address is ronggui.huang at gmail.com
Blog: http://sociology.yculblog.com
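
The replies above all come down to passing `col.names` so that read.table does not guess the column count from the first five lines. In the posted data those five lines have at most 10 fields, while the line starting with 11 has 11 fields. The following is a rough, self-contained sketch of that fix: it writes the posted data to a temporary file (the `tempfile()` step and the names `bad` and `good` are assumptions made for illustration, not part of the original posts) and compares the two calls.

```r
## Write the posted data to a temporary file so the sketch is self-contained
datafile <- tempfile(fileext = ".txt")
writeLines(c(
  "0 1000 175 1 2 3",
  "1 1000 58 0 2 9",
  "2 1000 35 0 1 3 10",
  "3 1000 300 0 2 4 5 10 11 18",
  "4 1000 150 3 5 6",
  "5 1000 100 3 4 6 7 18",
  "6 1000 50 4 5 7 8",
  "7 1000 155 5 6 8 19",
  "8 1000 255 6 7 19",
  "9 1000 200 1 10 12",
  "10 1000 52 2 3 9 11 12 13",
  "11 1000 70 3 10 14 15 16 17 18 19",
  "12 1000 250 9 10 13",
  "13 1000 40 10 12 14",
  "14 1000 235 11 13 15",
  "15 1000 127 11 14 16 17",
  "16 1000 177 11 15 17",
  "17 1000 358 11 15 16",
  "18 1000 296 3 5 11 19",
  "19 1000 120 7 8 11 18"
), datafile)

## Widest line in the whole file, not just the first five lines
nc <- max(count.fields(datafile, sep = ""))

## Without col.names: the column count is guessed from the first five lines
bad <- read.table(datafile, sep = "", fill = TRUE)

## With col.names: every line is read into nc columns, short lines padded with NA
good <- read.table(datafile, sep = "", fill = TRUE,
                   col.names = paste("v", seq_len(nc), sep = "."))

print(dim(bad))
print(dim(good))

## One row per input line, as the original poster wanted
m <- as.matrix(good)
```

Comparing `dim(bad)` with `dim(good)` shows whether the guessed column count fell short. `as.matrix(good)` is used here instead of `matrix(unlist(x), ncol = nc)`; for an all-numeric data frame both should give the same values, and `as.matrix` keeps the column names.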