Dear R Staff,

You can see my data.csv file in the attachment.

I am trying to count the non-zero values in my dataset, but I need to exclude NA values from this calculation.

My code (below) is very long. How can I write it more concisely and efficiently?

## [NA_Count] - Find NA values
data.na = sapply(data[,3:ncol(data)], function(c) sum(length(which(is.na(c)))))

## [Zero] - Find zero values
data.z = apply(data[,3:ncol(data)], 2, function(c) sum(c==0))

## [Non-Zero] - Find non-zero values
data.nz = nrow(data[,3:ncol(data)]) - (data.na+data.z)

Sincerely,
Engin YILMAZ
Hello,

Your attachment didn't come through; R-Help strips off most types of files, including CSV.
Anyway, the following will do what I understand of your question. Tested with a fake dataset.

set.seed(3026) # make the results reproducible
data <- matrix(1:100, ncol = 10)
data[sample(100, 15)] <- 0
data[sample(100, 10)] <- NA
data <- as.data.frame(data)

zero <- sapply(data, function(x) sum(x == 0, na.rm = TRUE))
na <- sapply(data, function(x) sum(is.na(x)))
totals <- nrow(data) - zero - na # totals non zero per column
grand_total <- sum(totals) # total non zero

totals
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#  6  8  8  8  8  7  7  8  6  10

grand_total
#[1] 76

# another way
prod(dim(data)) - sum(zero + na)
#[1] 76

Hope this helps,

Rui Barradas

On 29-10-2017 10:25, Engin YILMAZ wrote:
> I try to count non-zero values in dataset but I need to exclude NA in this
> calculation
> [...]
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
If one does not need all the intermediate results, then after defining data a single line is enough:

grand_total <- nrow(data)*ncol(data) - sum( sapply(data, function(x) sum( is.na(x) | x == 0 ) ) )
# 76

On Sun, Oct 29, 2017 at 2:38 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Anyway, the following will do what I understand of your question. Tested
> with a fake dataset.
> [...]
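The one-liner works without `na.rm` because of how R's `|` handles missing values: `TRUE | NA` evaluates to `TRUE`, so a missing entry is flagged by `is.na(x)` even though `x == 0` is itself `NA` at that position. A minimal sketch on a made-up vector (the vector `x` and the names `flagged` and `n_nonzero` are illustrations, not from the thread):

```r
# Hypothetical vector, for illustration only
x <- c(0, 3, NA, 5, 0, NA)

x == 0               # NA wherever x is NA
is.na(x) | x == 0    # no NA in the result: TRUE | NA evaluates to TRUE

# Flag every entry that is either missing or zero, then count the rest
flagged <- is.na(x) | x == 0
n_nonzero <- length(x) - sum(flagged)
n_nonzero
# [1] 2
```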
Since I could not see your data, the easiest thing that comes to mind to count values excluding NAs is something like this:

sum(!is.na(x))

Best of luck,
EK

On Sun, Oct 29, 2017 at 6:25 AM, Engin YILMAZ <ispanyolcom at gmail.com> wrote:
> I try to count non-zero values in dataset but I need to exclude NA in this
> calculation
> [...]
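For comparison, `sum(!is.na(x))` counts every non-missing value, zeros included, so it may need one more condition to match the original question. A small sketch on a made-up vector (the vector and the names `n_not_na` and `n_nonzero` are assumptions for illustration):

```r
# Hypothetical vector, for illustration only
x <- c(0, 4, NA, 9, 0)

n_not_na  <- sum(!is.na(x))             # non-missing values, zeros included
n_nonzero <- sum(x != 0, na.rm = TRUE)  # non-missing and non-zero

c(not_na = n_not_na, nonzero = n_nonzero)
#  not_na nonzero
#       4       2
```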
Dear R Staff,

This is my file (www.fiscalforecasting.com/data.csv). If you cannot download it, my dataset looks like the following:

Year  Month  A   B   C   D   E
2005  July   0   4   NA  NA  1
2005  July   0   NA  NA  0   9
2005  July   NA  4   0   1   0
2005  July   4   0   2   9   NA

I am trying to count, for every column, the values that are neither zero nor NA.

Sincerely,
Engin YILMAZ

2017-10-29 15:01 GMT+03:00 Ek Esawi <esawiek at gmail.com>:
> Since i could not see your data, the easiest thing comes to mind is court
> values excluding NAs, is something like this
> sum(!is.na(x))
> [...]

--
Regards,
Engin YILMAZ
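As a worked check on the four sample rows above, the per-column count can be sketched as follows. The data frame `d` is retyped by hand from the message, and the names `vals` and `nonzero_per_col` are hypothetical; the column selection `3:ncol(d)` mirrors the original post and assumes the first two columns are always Year and Month.

```r
# Sample rows retyped from the message above
d <- data.frame(
  Year  = rep(2005, 4),
  Month = rep("July", 4),
  A = c(0, 0, NA, 4),
  B = c(4, NA, 4, 0),
  C = c(NA, NA, 0, 2),
  D = c(NA, 0, 1, 9),
  E = c(1, 9, 0, NA)
)

# Keep only the value columns (everything after Year and Month)
vals <- d[, 3:ncol(d)]

# vals != 0 is NA where the value is missing; na.rm = TRUE drops those cells
nonzero_per_col <- colSums(vals != 0, na.rm = TRUE)
nonzero_per_col
# A B C D E
# 1 2 1 2 2
```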
What was suggested by Eric and Rui works well, but here is a shorter and maybe simpler answer, provided your data is similar to what Eric posted. It should work for your data too.

aa <- is.na(data)|data==0
nrow(data)-colSums(aa)

EK

On Sun, Oct 29, 2017 at 6:25 AM, Engin YILMAZ <ispanyolcom at gmail.com> wrote:
> I try to count non-zero values in dataset but I need to exclude NA in this
> calculation
> [...]
Thanks Esawi, Barradas and Berger.

Sincerely,
Engin YILMAZ

2017-10-29 23:14 GMT+03:00 Ek Esawi <esawiek at gmail.com>:
> What was suggested by Eric and Rui works well, but here is a short and
> may be simpler answer provided your data is similar what Eric posted.
> [...]
On 10/29/2017 3:25 AM, Engin YILMAZ wrote:
> I try to count non-zero values in dataset but I need to exclude NA in this
> calculation
> [...]

This looks like a good place for apply():

apply(data,2,function(x) sum(x != 0, na.rm=TRUE))

Hope this is helpful,

Dan

--
Daniel Nordlund
Port Townsend, WA USA
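As a cross-check, the three per-column approaches from this thread can be compared on a fake dataset built the same way as Rui's. This is only a sketch: the seed is arbitrary, and the names `by_subtraction`, `by_colsums` and `by_apply` are hypothetical. Since `apply()` first converts a data frame to a matrix, the sketch assumes every column is numeric; with the Year and Month columns of the real file, one would presumably subset to `data[, 3:ncol(data)]` first, as in the original post.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable
data <- matrix(1:100, ncol = 10)
data[sample(100, 15)] <- 0
data[sample(100, 10)] <- NA
data <- as.data.frame(data)

# Rui's approach: subtract the zero and NA counts from the row count
zero <- sapply(data, function(x) sum(x == 0, na.rm = TRUE))
na   <- sapply(data, function(x) sum(is.na(x)))
by_subtraction <- nrow(data) - zero - na

# EK's approach: flag NA-or-zero cells, then subtract the column sums
by_colsums <- nrow(data) - colSums(is.na(data) | data == 0)

# Dan's approach: count non-zero cells directly, ignoring NA
by_apply <- apply(data, 2, function(x) sum(x != 0, na.rm = TRUE))

# All three should give the same per-column counts
all.equal(by_subtraction, by_colsums)
all.equal(by_subtraction, by_apply)
```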