Dear Friends,
I have a data set with around 220,000 rows and 17 columns. One of the columns is
an ID variable with values ranging from 1000 through 9000. I need to perform the
following operations.
1) Remove all the observations with IDs between 6000 and 6999.
I tried using this method.
remdat1 <- subset(data, ID<6000)
remdat2 <- subset(data, ID>=7000)
donedat <- rbind(remdat1, remdat2)
I checked the first and last entries and found that they did not have ID values
in the 6000s, so I think this is probably correct, but is it the most efficient
way of doing this?
2) I need to remove any observation that has a negative value in column 3, 4, 6
or 8. For instance, if the number in column 3 is -4, then I need to delete the
entire observation. Can somebody help me with this too?
Thanks and regards,
Anup
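Checking only the first and last rows of donedat cannot confirm the filter worked: rbind() puts the ID < 6000 rows first and the ID >= 7000 rows last, so the endpoints say nothing about the rows in between. A minimal sketch of a whole-column check, run on a small hypothetical data frame standing in for the real 220,000-row data (the column name value is made up for illustration):

```r
# Hypothetical toy data; only the ID column matters for this check.
data <- data.frame(ID    = c(1500, 6000, 6450, 6999, 7000, 8800),
                   value = c(2.5, -1, 3, 4, -4, 0.5))

remdat1 <- subset(data, ID < 6000)
remdat2 <- subset(data, ID >= 7000)
donedat <- rbind(remdat1, remdat2)

# Test every remaining ID, not just the first and last rows.
in6000s <- donedat$ID >= 6000 & donedat$ID < 7000
any(in6000s)       # FALSE means no 6000-series IDs survived
range(donedat$ID)  # smallest and largest remaining ID
nrow(donedat)      # should equal the number of rows outside [6000, 7000)
```

Comparing nrow(donedat) with sum(data$ID < 6000 | data$ID >= 7000) also confirms that no rows outside the 6000s were lost along the way.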
On Apr 17, 2007, at 8:03 PM, Anup Nandialath wrote:

> 1) Remove all the observations with IDs between 6000 and 6999

The rbind is probably unnecessary. I think all you are missing for both
questions is the "or" operator, "|" (see ?"|"). Simply:

donedat <- subset(data, ID < 6000 | ID >= 7000)

would do for this. Not sure about efficiency, but if the code is fast as it
stands, I wouldn't worry too much about it.

> 2) I need to remove observations within columns 3, 4, 6 and 8 when
> they are negative.

The following should do it (untested, not sure if it would handle NAs):

toremove <- data[,3] < 0 | data[,4] < 0 | data[,6] < 0 | data[,8] < 0
data[!toremove,]

If you want more columns than those 4, then we could perhaps look for a better
line than the first line above.

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
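The toremove line grows with every extra column. A minimal sketch that takes the column numbers as a vector instead, assuming (as in the question) that the screened columns are numeric. It swaps in rowSums() over a logical matrix, which is not the poster's code, and runs on hypothetical toy data whose column names are made up. On NAs: in the one-liner above, a row with an NA in one of the four columns and no negative in the others gets an NA index, and data[!toremove,] then returns a row filled with NAs. Here na.rm = TRUE keeps such rows instead; that is an assumption you may want to change.

```r
# Hypothetical toy data with 8 columns, so that columns 3, 4, 6 and 8 exist.
# Column 5 ("b") is negative everywhere but is NOT one of the screened columns.
data <- data.frame(ID = c(1500, 6200, 7100, 8000, 9000),
                   a  = 1:5,
                   c3 = c(1, 2, -4, 5, NA),
                   c4 = c(0, 1, 2, 3, 4),
                   b  = c(-9, -9, -9, -9, -9),
                   c6 = c(3, -1, 2, 2, 2),
                   c7 = letters[1:5],
                   c8 = c(1, 1, 1, 0, 1))

cols <- c(3, 4, 6, 8)  # columns to screen for negative values

# Step 1: drop IDs 6000-6999 with a single subset() call using "|".
donedat <- subset(data, ID < 6000 | ID >= 7000)

# Step 2: flag rows with at least one negative value in the chosen columns.
# donedat[, cols] < 0 is a logical matrix; rowSums() counts the TRUEs per row,
# and na.rm = TRUE means an NA is treated as "not negative".
hasneg <- rowSums(donedat[, cols] < 0, na.rm = TRUE) > 0

findat <- donedat[!hasneg, , drop = FALSE]
findat$ID
```

Adding columns only means extending cols. To drop rows with NAs in those columns as well, one option is to flag them too, e.g. hasneg | rowSums(is.na(donedat[, cols])) > 0.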
...is this what you're looking for?
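donedat <- subset(data, ID < 6000 | ID >= 7000)
neg <- unique(rapply(donedat[, c(3, 4, 6, 8)], function(x) which(x < 0)))
findat <- if (length(neg) > 0) donedat[-neg, , drop = FALSE] else donedat
The second line looks through columns 3, 4, 6 and 8 and finds the row indices
of negative values; rapply() returns all of them as one vector, unique() removes
duplicated elements, and negative indexing then removes those rows from donedat.
The length() check is there because, if no value is negative, neg is empty and
donedat[-neg, ] would return no rows at all.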
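One caveat with negative indexing: when no value is negative, the index vector is empty, and x[-integer(0), ] selects zero rows rather than all of them. A minimal sketch reproducing this on a hypothetical two-row data frame with no negative values, followed by a length() guard:

```r
# Hypothetical toy data containing no negative values at all.
df <- data.frame(ID = c(1500, 7000), v = c(1, 2))

# Same idea as above: collect row indices of negative values in every column.
neg <- unique(rapply(df, function(x) which(x < 0)))
length(neg)                     # 0: nothing to remove
nrow(df[-neg, , drop = FALSE])  # 0: -integer(0) is still empty, so no rows are selected

# Guarding on length() keeps every row when nothing is negative.
safe <- if (length(neg) > 0) df[-neg, , drop = FALSE] else df
nrow(safe)                      # 2
```

A logical index (keep rows where no value is negative) avoids this edge case altogether, since an all-FALSE "remove" vector simply keeps everything.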
--- Anup Nandialath <anup_nandialath at yahoo.com> wrote:
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.