Boris Steipe
2017-Nov-10 13:43 UTC
[R] Calculating frequencies of multiple values in 200 colomns
|> x <- sample(0:2, 10, replace = TRUE)
|> x
 [1] 1 0 2 1 0 2 2 0 2 1
|> tabulate(x)
[1] 3 4
|> table(x)
x
0 1 2
3 3 4

B.

> On Nov 10, 2017, at 4:32 AM, Allaisone 1 <allaisone1 at hotmail.com> wrote:
>
> Thank you for your effort, Bert.
>
> I see what the problem is now: the values (1, 2, 3) were only an example. The values I have are 0, 1, 2. The tabulate() function seems to ignore the 0 values, and this is my exact problem, as the frequency of the 0 values must also be counted for the maf to be calculated correctly.
>
> ________________________________
> From: Bert Gunter <bgunter.4567 at gmail.com>
> Sent: 09 November 2017 23:51:35
> To: Allaisone 1; R-help
> Subject: Re: [R] Calculating frequencies of multiple values in 200 colomns
>
> "For example, if I have the values 1, 2, 3 in each column, applying tabulate() would calculate the frequency of 1 and 2 without 3"
>
> Huh??
>
> > x <- sample(1:3,10,TRUE)
> > x
>  [1] 1 3 1 1 1 3 2 3 2 1
> > tabulate(x)
> [1] 5 2 3
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Thu, Nov 9, 2017 at 3:44 PM, Allaisone 1 <allaisone1 at hotmail.com> wrote:
>
> Thank you so much for your reply.
>
> Actually, I tried the apply() function but struggled with writing the appropriate function inside it to calculate the frequency of the 3 values. The tabulate() function is a good start, but the problem is that it calculates the frequency of only two values per column, which means that when I apply the maf() function, the maf value will be calculated from the frequency of these 2 values only, without considering the frequency of the 3rd value. For example, if I have the values 1, 2, 3 in each column, applying tabulate() would calculate the frequency of 1 and 2 without 3. I need a way to calculate the frequencies of all 3 values so that the maf calculation is correct, as it will then consider all 3 frequencies and not only 2.
>
> Regards
>
> Allahisone
>
> ________________________________
> From: Bert Gunter <bgunter.4567 at gmail.com>
> Sent: 09 November 2017 20:56:39
> To: Allaisone 1
> Cc: r-help at R-project.org
> Subject: Re: [R] Calculating frequencies of multiple values in 200 colomns
>
> This is not a good way to do things! R has many powerful built-in functions to do this sort of thing for you. Searching -- e.g. at rseek.org or even a plain old Google search -- can help you find them. Also, it looks like you need to go through a tutorial or two to learn more about R's basic functionality.
>
> In this case, something like (no reproducible example given, so can't confirm):
>
> apply(Values, 2, function(x) maf(tabulate(x)))
>
> should be close to what you want.
>
> Cheers,
> Bert
>
> On Thu, Nov 9, 2017 at 11:44 AM, Allaisone 1 <allaisone1 at hotmail.com> wrote:
>
> Hi All
>
> I have a dataset of 200 columns and 1000 rows; there are 3 repeated values under each column (7, 8, 10). I want to calculate the frequency of each value under each column and then apply the function maf(), given that the frequency of each value is known. I can do the analysis step by step like this:
>
> > Values
>
>    A  B  C ... 200
> 1  7 10  7
> 2  7  8  7
> 3 10  8  7
> 4  8  7 10
> .
> .
> .
>
> For column A, I calculate the frequency of the 3 values as follows:
>
> count7 <- length(which(Values$A == 7))
> count8 <- length(which(Values$A == 8))
> count10 <- length(which(Values$A == 10))
>
> count7 = 2, count8 = 1, count10 = 1.
>
> Then I create a vector and type the frequencies manually:
>
> Freq <- c(count7 = 2, count8 = 1, count10 = 1)
>
> Then I apply the function maf():
>
> maf(Freq)
>
> This gives me the result I need for column A. Could you please help me to perform the analysis for all of the 200 columns at once?
>
> Regards
>
> Allahisone
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
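The contrast Boris shows between tabulate() and table() suggests one way to get per-column counts that always include the 0s. What follows is only a sketch under stated assumptions: the data frame `Values` is made up for illustration (it is not the poster's data), and `count_levels` is a hypothetical helper name. Wrapping each column in factor() with explicit levels makes table() report a count for every allowed value, including values that never occur in a given column.

```r
# Made-up data coded 0, 1, 2 (an assumption; not the poster's dataset).
Values <- data.frame(
  A = c(0, 1, 2, 1, 0, 2),
  B = c(1, 1, 1, 2, 2, 2),   # this column contains no 0s
  C = c(0, 0, 0, 0, 1, 2)
)

# Hypothetical helper: count every value listed in `levels`,
# reporting 0 for values that are absent from x.
count_levels <- function(x, levels = 0:2) {
  as.vector(table(factor(x, levels = levels)))
}

# One column of counts per column of Values; rows correspond to 0, 1, 2.
counts <- vapply(Values, count_levels, FUN.VALUE = integer(3))
rownames(counts) <- 0:2
counts
```

The maf() function from the thread is not defined here; if it accepts a vector of three counts, something like apply(counts, 2, maf) would presumably run it over every column.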
Marc Schwartz
2017-Nov-10 14:34 UTC
[R] Calculating frequencies of multiple values in 200 colomns
Hi,

To clarify the default behavior that Boris is referencing below, note the definition of the 'bin' argument to the tabulate() function:

     bin: a numeric vector ***(of positive integers)***, or a factor.
          Long vectors are supported.

I added the asterisks for emphasis.

This is also noted in the examples used for the function in ?tabulate at the bottom of the help page.

The second argument, 'nbins', which defaults to max(1, bin, na.rm = TRUE), also affects the output:

> tabulate(c(2, 3, 5))
[1] 0 1 1 0 1

In this case, each element in the returned vector indicates how many 1's, 2's, 3's, 4's and 5's are present in the source vector.

Compare that to:

> tabulate(c(2, 3, 5), nbins = 3)
[1] 0 1 1

In the above example, 5 is ignored.

Note also that tabulate(), unlike table(), does not return a named vector, just the frequencies.

While tabulate() is used within the table() function, reviewing the code for the latter reveals how the default behavior of tabulate() is modified and preceded/wrapped in other code for use there.

Regards,

Marc Schwartz

> On Nov 10, 2017, at 8:43 AM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
>
> |> x <- sample(0:2, 10, replace = TRUE)
> |> x
>  [1] 1 0 2 1 0 2 2 0 2 1
> |> tabulate(x)
> [1] 3 4
> |> table(x)
> x
> 0 1 2
> 3 3 4
>
> B.
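Marc's point about 'bin' and 'nbins' can be illustrated with the four values shown for column A in the original post (7, 7, 10, 8). This is a sketch, not the method used inside table(): it assumes the set of allowed values is known in advance, and the name `allowed` is introduced here for illustration. match() converts each value to its position in `allowed`, which gives tabulate() the positive integers it expects, and fixing nbins keeps the result the same length for every column.

```r
# Column A as shown in the original post.
x <- c(7, 7, 10, 8)

# Default behavior: one bin per integer from 1 up to max(x),
# so the result has 10 entries, most of them zero.
default_counts <- tabulate(x)

# Assumed set of allowed values (hypothetical name `allowed`).
allowed <- c(7, 8, 10)

# Map each value to its position in `allowed` (1, 2 or 3), then count.
# nbins is fixed so a value that never occurs still gets a 0 entry.
counts_a <- tabulate(match(x, allowed), nbins = length(allowed))
names(counts_a) <- allowed
counts_a
```

The result agrees with the manual counts in the original post (count7 = 2, count8 = 1, count10 = 1). The same call works for values 0, 1, 2 by setting allowed <- 0:2.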
Eric Berger
2017-Nov-10 16:28 UTC
[R] Calculating frequencies of multiple values in 200 colomns
How about this workaround - add 1 to the vector

x <- c(1,0,2,1,0,2,2,0,2,1)
tabulate(x)
# [1] 3 4
tabulate(x+1)
# [1] 3 3 4
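One caveat with the add-1 workaround, combined with Marc's note on 'nbins': if a column happens to contain no 2s, tabulate(x + 1) returns only two counts, so the results for different columns would have different lengths. A minimal sketch of guarding against that, using a made-up matrix `Values` (an assumption, not the poster's data):

```r
# Made-up data coded 0, 1, 2; column B contains no 2s.
Values <- cbind(
  A = c(1, 0, 2, 1, 0),
  B = c(1, 0, 1, 1, 0)
)

# Without nbins, column B yields only two bins.
short_counts <- tabulate(Values[, "B"] + 1)

# With nbins = 3, every column yields counts for 0, 1 and 2 in that order,
# so apply() can bind the results into a 3-row matrix.
counts <- apply(Values, 2, function(x) tabulate(x + 1, nbins = 3))
rownames(counts) <- 0:2
counts
```

Fixing nbins is what lets apply() return a matrix here; with varying lengths it would fall back to a list.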