I've been trying to figure out why the following is happening....

I've got some data I'll load in from a file...

rm(list=ls(all=TRUE))
trees <- read.table( "c:/cruisepak/data.txt", header=T)
trees$ct <- 1

And when I create some temp variable, then split the data to perform further
processing, the additional column doesn't maintain the data correctly....

mtrees <- trees[trees$m == 1,]
ctrees <- trees[trees$m == 0,]

The results are as follows...

> trees
   plot tree m sp dbh tht ct
1     1    1 1 DF  44 185  1
3     1    3 0 DF  40 192  1
.....blah, blah, blah......
6     1    6 0 DF  26 156  1
8     1    8 0 DF  26 155  1

but,

> mtrees
   plot tree m sp dbh tht ct
1     1    1 1 DF  44 185  1
2     1    2 1 DF  38 188 NA
17    2    6 1 DF  26 174 NA
26    3    1 1 DF  42 185 NA

and

> ctrees
   plot tree m sp dbh tht ct
3     1    3 0 DF  40 192 NA
4     1    4 0 DF  33 148 NA
5     1    5 0 DF  43 182 NA

when the value of ct for all the records in all the data.frames should be 1,
not NA.

Why is that? Am I missing a step here? I'm running R 1.7.1 on Win2k.

Jeff.

---
Jeff D. Hamann
Forest Informatics, Inc.
PO Box 1421
Corvallis, Oregon USA 97339-1421
(office) 541-754-1428
(cell) 541-740-5988
jeff.hamann at forestinformatics.com
www.forestinformatics.com
Not having the data, I can't reproduce this, but I would say that

trees$ct <- 1

is not a robust way to add a column to a data frame; it relies on data
frames being implemented as lists (and is probably the cause of the
error - only guessing though). I suspect ct repeats ones only on printout;
internally it is a half-dataframe, half-list hybrid.

Do you get the same problem if you use

trees <- data.frame(trees, ct=1)

or even more explicitly

trees <- data.frame(trees, ct=rep(1, length(trees[,1])))

> -----Original Message-----
> From: Jeff D. Hamann [mailto:jeff.hamann at forestinformatics.com]
> Sent: 06 November 2003 17:17
> To: r-help at stat.math.ethz.ch
> Subject: [R] created data doesn't remain when split...

Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 644449
Fax: +44 (0) 1379 644445
email: Simon.Fear at synequanon.com
web: http://www.synequanon.com
On Thu, 6 Nov 2003, Jeff D. Hamann wrote:

> I've been trying to figure out why the following is happening....
>
> I've got some data I'll load in from a file...
>
> rm(list=ls(all=TRUE))
> trees <- read.table( "c:/cruisepak/data.txt", header=T)
> trees$ct <- 1

That is incorrect usage: you set the last element to 1, and you used list
indexing on a data frame. Either use trees["ct"] <- 1 or, better, use a
value of the correct length. (As from 1.8.0 using $ with data frames is
supported.)

> And when I create some temp variable, then split the data to perform further
> processing, the additional column doesn't maintain the data correctly....

Actually, it was maintained correctly.

> mtrees <- trees[trees$m == 1,]
> ctrees <- trees[trees$m == 0,]
>
> The results are as follows...
>
> > trees
>    plot tree m sp dbh tht ct
> 1     1    1 1 DF  44 185  1
> 3     1    3 0 DF  40 192  1
> .....blah, blah, blah......
> 6     1    6 0 DF  26 156  1
> 8     1    8 0 DF  26 155  1

That's a bug that got fixed in 1.8.0.

> but,
>
> > mtrees
>    plot tree m sp dbh tht ct
> 1     1    1 1 DF  44 185  1
> 2     1    2 1 DF  38 188 NA
> 17    2    6 1 DF  26 174 NA
> 26    3    1 1 DF  42 185 NA
>
> and
>
> > ctrees
>    plot tree m sp dbh tht ct
> 3     1    3 0 DF  40 192 NA
> 4     1    4 0 DF  33 148 NA
> 5     1    5 0 DF  43 182 NA
>
> when the value of ct for all the records in all the data.frames should be 1,
> not NA.
>
> Why is that? Am I missing a step here? I'm running R 1.7.1 on Win2k.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
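
For anyone following along, here is a minimal sketch of the "value of the correct length" advice from the replies. The small data frame is only a stand-in built from four of the rows printed in the original post (data.txt itself was never posted), and the check at the end is an illustrative addition, not something from the thread. Giving ct one entry per row should make the result independent of how a particular R version recycles a scalar in trees$ct <- 1.

```r
# Stand-in for read.table("c:/cruisepak/data.txt", header=T):
# four rows copied from the output shown in the original post.
trees <- data.frame(
  plot = c(1, 1, 1, 2),
  tree = c(1, 2, 3, 6),
  m    = c(1, 1, 0, 1),
  sp   = c("DF", "DF", "DF", "DF"),
  dbh  = c(44, 38, 40, 26),
  tht  = c(185, 188, 192, 174)
)

# Add the new column with one value per row instead of a single scalar.
trees$ct <- rep(1, nrow(trees))

# Split on the m flag, as in the original post.
mtrees <- trees[trees$m == 1, ]
ctrees <- trees[trees$m == 0, ]

# Illustrative check: the new column is full length and has no NA
# in either subset.
stopifnot(length(trees$ct) == nrow(trees))
stopifnot(!any(is.na(mtrees$ct)), !any(is.na(ctrees$ct)))
```

The same rep(1, nrow(trees)) value can be passed to data.frame(trees, ct = ...) as suggested earlier in the thread; nrow(trees) is just a shorter way of writing length(trees[,1]).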