Dear Friends,

I have a data set with around 220,000 rows and 17 columns. One of the columns is an ID variable whose values are grouped from 1000 through 9000. I need to perform the following operations.

1) Remove all the observations with IDs between 6000 and 6999.

I tried this method:

    remdat1 <- subset(data, ID < 6000)
    remdat2 <- subset(data, ID >= 7000)
    donedat <- rbind(remdat1, remdat2)

I checked the first and last entries and found no ID values in the 6000 range. I therefore think this might be correct, but is it the most efficient way of doing this?

2) I need to remove observations when the values in columns 3, 4, 6 or 8 are negative. For instance, if the number in column 3 is -4, then I need to delete the entire observation. Can somebody help me with this too?

Thanks and Regards,
Anup

On Apr 17, 2007, at 8:03 PM, Anup Nandialath wrote:

> Dear Friends,
>
> I have data set with around 220,000 rows and 17 columns. One of the
> columns is an id variable which is grouped from 1000 through 9000.
> I need to perform the following operations.
>
> 1) Remove all the observations with id's between 6000 and 6999
>
> I tried using this method.
>
> remdat1 <- subset(data, ID<6000)
> remdat2 <- subset(data, ID>=7000)
> donedat <- rbind(remdat1, remdat2)
>
> I check the last and first entry and found that it did not have ID
> values 6000. Therefore I think that this might be correct, but is
> this the most efficient way of doing this?

The rbind is a bit unnecessary probably. I think all you are missing for both questions is the "or" operator, "|". ( ?"|" ) Simply:

    donedat <- subset(data, ID < 6000 | ID >= 7000)

would do for this. Not sure about efficiency, but if the code is fast as it stands I wouldn't worry too much about it.

> 2) I need to remove observations within columns 3, 4, 6 and 8 when
> they are negative. For instance if the number in column 3 is -4,
> then I need to delete the entire observation. Can somebody help me
> with this too.

The following should do it (untested, not sure if it would handle NA's):

    toremove <- data[,3] < 0 | data[,4] < 0 | data[,6] < 0 | data[,8] < 0
    data[!toremove,]

If you want more columns than those 4, then we could perhaps look for a better line than the first line above.

> Thank and Regards
>
> Anup

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
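
On the two open points above (NA handling and a line that scales to more columns), here is a minimal sketch rather than a tested solution for the real data. It assumes a small made-up data frame `dat` with hypothetical column names `v2` to `v8`, and it assumes that a missing value should not count as negative; if NAs should be dropped as well, the rule would need to change.

```r
# Hypothetical toy data standing in for the real 220,000-row data set
dat <- data.frame(
  ID = c(1000, 5999, 6000, 6999, 7000, 9000),
  v2 = c(1, 2, 3, 4, 5, 6),
  v3 = c(-4, 1, 1, 1, NA, 1),
  v4 = c(1, 1, 1, 1, 1, 1),
  v5 = c(-1, -1, -1, -1, -1, -1),   # negative, but column 5 is not checked
  v6 = c(1, 1, 1, 1, 1, -2),
  v7 = c(0, 0, 0, 0, 0, 0),
  v8 = c(1, 1, 1, 1, 1, 1)
)

# Step 1: drop IDs 6000-6999 with a single logical index
donedat <- dat[dat$ID < 6000 | dat$ID >= 7000, , drop = FALSE]

# Step 2: flag rows with a negative value in any of the chosen columns
cols <- c(3, 4, 6, 8)
neg <- donedat[, cols] < 0                     # logical matrix, NA where data is NA
toremove <- rowSums(neg, na.rm = TRUE) > 0     # assumption: NA is "not negative"
findat <- donedat[!toremove, , drop = FALSE]
```

Changing `cols` is then the only edit needed to check a different set of columns.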

...is this what you're looking for?

    donedat <- subset(data, ID < 6000 | ID >= 7000)
    neg <- unique(rapply(donedat[, c(3, 4, 6, 8)], function(x) which(x < 0)))
    findat <- if (length(neg) > 0) donedat[-neg, , drop = FALSE] else donedat

The second line looks through columns 3, 4, 6 and 8 and finds the row indices of negative values: rapply() returns all of them as a single vector, and unique() removes duplicated elements. The third line uses negative indexing to remove those rows from donedat, and leaves donedat unchanged when no negative values are found, since indexing with an empty vector would otherwise drop every row.

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
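
One thing to watch when removing rows by negative indexing: if which() finds nothing, the index is empty and the result has zero rows instead of all of them. The sketch below illustrates this on a made-up data frame `clean` (hypothetical, not part of the original data) and shows one possible way around it using setdiff().

```r
# Hypothetical toy data with no negative values at all
clean <- data.frame(ID = c(1000, 2000, 3000), x = c(1, 2, 3))

neg <- which(clean$x < 0)                      # integer(0): nothing to remove
empty_case <- clean[-neg, , drop = FALSE]      # -integer(0) selects no rows
nrow(empty_case)                               # 0

# Selecting the rows to keep avoids the problem
keep <- setdiff(seq_len(nrow(clean)), neg)
kept <- clean[keep, , drop = FALSE]
nrow(kept)                                     # 3
```

Because which() skips NA comparisons, this version also treats missing values as "not negative", which is an assumption that may or may not suit the real data.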