Hello,

I have a dataset with 40 variables, some of which are always 0 (in every row). I would like to make a subset containing only the columns whose values are not all 0, but I don't know how to do it.

I tried:

    for(cut_column in 1:40) {
      if(sum(dataset[,cut_column])!=0) {
        columns_useful<-c(columns_useful,dataset[cut_column])
      }
    }

    sorted_dataset<-subset(dataset, select=columns_useful)

But it doesn't work.

Thank you,
Francisco
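A note on why the loop above probably fails (an assumption, since no error message was posted): `columns_useful` is never initialised before the loop, and the loop appends the column contents (`dataset[cut_column]`) rather than column names that could be used for selection. A minimal sketch of a repaired loop, using a small toy data frame as a stand-in for the 40-column one:

```r
# Toy stand-in for the poster's data (hypothetical values)
dataset <- data.frame(a = 1:10,
                      b = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0),
                      c = rep(0, 10))

columns_useful <- character(0)   # start with an empty vector of column names
for (cut_column in seq_along(dataset)) {
  # keep the column if at least one of its values differs from 0
  if (any(dataset[[cut_column]] != 0)) {
    columns_useful <- c(columns_useful, names(dataset)[cut_column])
  }
}

# select by name; drop = FALSE keeps a data frame even if one column remains
sorted_dataset <- dataset[, columns_useful, drop = FALSE]
names(sorted_dataset)
```

Here the test is `any(x != 0)` rather than `sum(x) != 0`, so a column whose positive and negative values cancel out is still kept.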
    > dataset<-data.frame(a=1:10,b=c(0,0,0,1,0,0,0,0,1,0),c=rep(0,10))
    > apply(dataset,2,function(x) all(x==0))
        a     b     c
    FALSE FALSE  TRUE
    > dataset[,!apply(dataset,2,function(x) all(x==0))]
        a b
    1   1 0
    2   2 0
    3   3 0
    4   4 1
    5   5 0
    6   6 0
    7   7 0
    8   8 0
    9   9 1
    10 10 0

On Tue, Jan 24, 2012 at 8:14 AM, Francisco <franciscororolaio@google.com> wrote:
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Try also

    dataset[, colSums(dataset == 0) != nrow(dataset)]

HTH,
Jorge.-
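One thing to watch with this kind of indexing: if only a single column survives, `[` with a comma simplifies the result to a plain vector by default. A small sketch, using a made-up data frame `d`, of how `drop = FALSE` keeps the result a data frame:

```r
# Hypothetical data: only column "a" contains non-zero values
d <- data.frame(a = 1:3, b = c(0, 0, 0), c = c(0, 0, 0))
keep <- colSums(d == 0) != nrow(d)

v  <- d[, keep]                 # one surviving column: simplified to a vector
df <- d[, keep, drop = FALSE]   # still a data frame with one column

is.data.frame(v)    # FALSE
is.data.frame(df)   # TRUE
```

The same applies to the other `dataset[, ...]` suggestions in this thread.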
    dataset.1 <- dataset[, apply(dataset, 2, sum)>0]

Gregorio R. Serrano

-- 
Dr. Gregorio R. Serrano
Dpto. Economía Cuantitativa (UCM)
Voz: +34 91394 2361
Twitter: @grserrano_
http://www.grserrano.es
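Note that testing the column sum with `> 0` implicitly assumes the data are non-negative. A brief sketch with made-up values showing where sum-based tests and an element-wise test can disagree:

```r
# Hypothetical data with positive, negative, cancelling and all-zero columns
d <- data.frame(pos   = c(0, 2, 0),
                neg   = c(0, -2, 0),
                mixed = c(-1, 0, 1),
                zero  = c(0, 0, 0))

sum_gt0 <- apply(d, 2, sum) > 0                     # keeps only "pos"
sum_ne0 <- apply(d, 2, sum) != 0                    # keeps "pos" and "neg"
any_ne0 <- apply(d, 2, function(x) any(x != 0))     # keeps all but "zero"

names(d)[sum_gt0]
names(d)[sum_ne0]
names(d)[any_ne0]
```

If negative values are possible, the element-wise test is the one that matches "not all 0".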
Try:

    good_dataset <- dataset[,sapply(dataset,function(x){!all(x==0)})]

This works modulo possible gotchas induced by floating point arithmetic.

Another possibility:

    tol <- sqrt(.Machine$double.eps)
    good_dataset <- dataset[,sapply(dataset,function(x){!all(abs(x)<=tol)})]

Or:

    good_dataset <- dataset[,sapply(dataset,function(x){!isTRUE(all.equal(x,rep(0,length(x))))})]

The foregoing could trip up if some columns of "dataset" have extra attributes tagging along. E.g. the column could actually be a numeric matrix of zeroes --- in which case it wouldn't get dropped.

cheers,

Rolf Turner
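To illustrate the floating-point gotcha mentioned above, here is a small sketch with a made-up column whose entries are zero in exact arithmetic but not in double precision:

```r
# Hypothetical data: y should be 0 mathematically, but is about 5.6e-17
d <- data.frame(x = c(1, 2, 3),
                y = rep(0.1 + 0.2 - 0.3, 3))

# exact comparison with 0
exact <- sapply(d, function(x) !all(x == 0))

# comparison within a tolerance
tol   <- sqrt(.Machine$double.eps)
fuzzy <- sapply(d, function(x) !all(abs(x) <= tol))

exact   # y is kept: its values are tiny but not exactly 0
fuzzy   # y is dropped: its values lie within the tolerance of 0
```

Whether such a column should count as "all 0" depends on how the values were computed, so the choice of tolerance is a judgment call.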
Another way would be a which statement.

    good_dataset=data[,which(colSums(data)!=0)]

I believe this depends on how the data are structured though.
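One way the structure of the data can matter: `colSums()` requires numeric input, so a data frame containing, say, a character column makes it stop with an error. A hedged sketch, with a made-up data frame `d`, that tests only the numeric columns and keeps everything else:

```r
# Hypothetical data with a non-numeric identifier column
d <- data.frame(id = c("r1", "r2", "r3"),
                a  = c(0, 0, 0),
                b  = c(0, 5, 0),
                stringsAsFactors = FALSE)

# colSums(d) would fail here because of the character column "id"
all_zero <- sapply(d, function(x) is.numeric(x) && all(x == 0))

good_dataset <- d[, !all_zero, drop = FALSE]
names(good_dataset)
```

This sketch does not handle missing values; with `NA`s present, `all(x == 0)` can itself return `NA`, so some explicit policy (for example `na.rm = TRUE`) would be needed.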