Hello,

The quick version of my question is: how can I extract a matrix instead of a vector using tapply()? I would like to be able to access both the results of tapply() and the index variables.

In case further explanation would help: I am analyzing a large (3 million rows x 9 columns) spatial/temporal dataset and am attempting to calculate the number of unique years containing any data within each geographic area (10 degree cells in this case). I can do this, but I also want to extract a subset vector of the index variable (area).

My script to calculate the number of unique years containing any data for each area is:

    x <- tapply(years, area, function(x) length(unique(x)))

Now, I want to extract the vector of areas where the number of unique years containing any data is >20, but tapply() only returns a vector of the unique-year counts and I want a matrix.

I could use a looping function to do this, but tapply() is much faster with large datasets, so I would like to use it if possible.

Any help is appreciated.
Thanks.

--
View this message in context: http://www.nabble.com/extracting-index-list-when-using-tapply%28%29-tp18345794p18345794.html
Sent from the R help mailing list archive at Nabble.com.
Hi,

How about using "subset"?

    x <- tapply(years, area, function(x) length(unique(x)))
    x1 <- subset(x, x > 20)

I hope this works.

Chunhao

Quoting hesicaia <dboyce at dal.ca>:

> Hello,
> The quick version of my question is how can I extract a matrix instead of
> a vector using tapply()? I would like to be able to access both the results
> of tapply() and also the index variables.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
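Since tapply() with a single grouping variable returns a vector named by the groups, the areas themselves can be read off with names(). Here is a minimal sketch on made-up data; the `area` and `years` vectors below are invented for illustration (not the poster's dataset), and the threshold is lowered from 20 to 2 to suit the tiny sample.

```r
# Made-up example data: one element per observation (not the poster's dataset)
area  <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
years <- c(1990, 1991, 1991, 1992, 2000, 2000, 2000, 1985, 1986, 1987)

# Number of distinct years per area; the result is named by area
n_years <- tapply(years, area, function(y) length(unique(y)))

# A threshold of 2 stands in for the 20 used in the thread
min_years <- 2

# Areas whose count of distinct years exceeds the threshold
areas_kept <- names(n_years)[n_years > min_years]

# Index and result side by side, if a two-column layout is wanted
result <- data.frame(area = names(n_years), n_years = as.vector(n_years))

print(areas_kept)
print(result)
```

The data frame at the end is only one way to keep the index variable next to the counts; a named vector already carries the same information.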
On Tue, 8 Jul 2008, hesicaia wrote:

> Hello,
> The quick version of my question is how can I extract a matrix instead of
> a vector using tapply()? I would like to be able to access both the results
> of tapply() and also the index variables.
>
> In case further explanation would help: I am analyzing a large (3million
> rows x 9 columns) spatial/temporal dataset and am attempting to calculate
> the number of unique years containing any data within each geographic area
> (10 degree cells in this case). I can do this, but I also want to extract a
> subset vector of the index variable (area).

It really would help to provide a working example, as another poster suggested. We cannot test our suggestions without a trial dataset.

> My script to calculate the number of unique years containing any data for
> each area is:
> x<-tapply(years, area, function(x) length(unique(x)))

or

    tab <- table( area, years )
    x <- rowSums( tab != 0 )

> Now, I want to extract the vector of areas where the number of unique years
> containing any data is >20, but tapply() only returns a vector of unique
> years and I was a matrix.

    x <- rownames(tab)[ rowSums( tab != 0 ) > 20 ]

unless, perhaps, you meant

    x <- rownames(tab)[ rowSums( tab > 20 ) != 0 ]

> I could use a looping function to do this, but tapply() is much faster with
> large datasets and so I would like to use it if possible.

Depending on the size of the dataset and the number of different years and areas, there may be better ways to do this (since 'tab' could be very big and sparse). For a start in that direction, see

    http://finzi.psych.upenn.edu/R/Rhelp02a/archive/118816.html

and perhaps library(Matrix) (on CRAN).

HTH,

Chuck

> Any help is appreciated.
> Thanks.
Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
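The table()-based suggestion in the reply above can be tried on a small invented dataset. This is only a sketch: the `area` and `years` vectors are made up, and a threshold `k` of 2 stands in for the 20 in the thread. It shows that the two readings of the question give different answers, and adds a de-duplication variant that avoids building the full area-by-year table. That last step is a plain base-R alternative, not the method from the linked archive post.

```r
# Made-up example data (not the poster's dataset)
area  <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
years <- c(1990, 1991, 1991, 1992, 2000, 2000, 2000, 1985, 1986, 1987)

k <- 2  # stands in for the threshold of 20 used in the thread

# Cross-tabulate: rows are areas, columns are years, cells are record counts
tab <- table(area, years)

# Reading 1: areas with more than k distinct years containing data
n_years    <- rowSums(tab != 0)
many_years <- rownames(tab)[n_years > k]

# Reading 2: areas where at least one year has more than k records
busy_year <- rownames(tab)[rowSums(tab > k) != 0]

# Alternative without the (possibly large, sparse) full table:
# keep each area/year pair once, then count pairs per area
pairs       <- unique(data.frame(area, years))
n_years_alt <- table(pairs$area)
many_alt    <- names(n_years_alt)[n_years_alt > k]

print(many_years)
print(busy_year)
print(many_alt)
```

With many areas and years, the unique() step works on one row per observed area/year pair, so its size follows the data rather than the product of the two dimensions.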
Working code would help. I would probably use 'lapply', since it appears that you want to return a variable number of items for each condition.

On Tue, Jul 8, 2008 at 2:23 PM, hesicaia <dboyce at dal.ca> wrote:

> Hello,
> The quick version of my question is how can I extract a matrix instead of
> a vector using tapply()? I would like to be able to access both the results
> of tapply() and also the index variables.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
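One possible reading of the 'lapply' suggestion is to split the years by area and keep the unique years themselves, so that each area carries a variable-length result. The sketch below assumes that reading; the data are invented and the threshold of 2 stands in for the 20 in the thread.

```r
# Made-up example data (not the poster's dataset)
area  <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
years <- c(1990, 1991, 1991, 1992, 2000, 2000, 2000, 1985, 1986, 1987)

# One list element per area, holding that area's distinct years
years_by_area <- lapply(split(years, area), unique)

# Count of distinct years per area, named by area
n_years <- lengths(years_by_area)

# Areas above the threshold (2 here, 20 in the thread)
keep <- names(n_years)[n_years > 2]

# The distinct years for just those areas
print(years_by_area[keep])
```

Keeping the list means the years behind each count remain available, which a bare vector of counts does not give.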