Displaying 20 results from an estimated 2000 matches similar to: "Complicated analysis for huge databases"
2017 Nov 17
0
Complicated analysis for huge databases
Combine columns 1 and 2 into a column with a single ID like "33.55", "44.66" and use split() on these IDs to break up your dataset. Iterate over the list of data frames split() returns.
B.
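A minimal sketch of the split() approach described above, on a toy data frame. The column names (MealA, MealB, I, II) and the colMeans() summary are assumptions for illustration only; the real data and the real per-group analysis are not shown in the thread.

```r
# Toy stand-in for the real data; column names are assumptions.
MyData <- data.frame(
  MealA = c(33, 33, 44, 44),
  MealB = c(55, 55, 66, 66),
  I     = c(0L, 1L, 2L, 0L),
  II    = c(1L, 1L, 0L, 2L)
)

# Build a single ID per row, e.g. "33.55", from the first two columns.
ids <- paste(MyData$MealA, MyData$MealB, sep = ".")

# One data frame per ID, keeping only the measurement columns.
groups <- split(MyData[, c("I", "II")], ids)

# Iterate over the list; colMeans() is only a placeholder analysis.
res <- lapply(groups, colMeans)
```

Using lapply() over the list returned by split() keeps the group IDs as list names, so each result can be looked up by its ID afterwards.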
> On Nov 17, 2017, at 12:59 PM, Allaisone 1 <allaisone1 at hotmail.com> wrote:
>
>
> Hi all ..,
>
>
> I have a large dataset of around 600,000 rows and 600
2017 Nov 18
2
Complicated analysis for huge databases
Thanks Boris, this was very helpful, but I'm struggling with the last part.
1) I combined the first 2 columns:
library(tidyr)
SingleMealsCode <- unite(MyData, MealsCombinations, c(MealA, MealB), remove = FALSE)
SingleMealsCode <- SingleMealsCode[, -2]
2) I separated this data frame into different data frames based on the "MealsCombinations"
column so R will recognize each meal
2017 Nov 18
2
Complicated analysis for huge databases
Although the loop seems to be formulated correctly I wonder why
it gives me these errors :
-object 'i' not found
- unexpected '}' in "}"
The desired output is expected to be very large, as for each data frame in the list of data frames I expect to see a maf value for each of the 600 columns, and this is only
for one data frame in the list. I have around 150-200
2017 Nov 18
3
Complicated analysis for huge databases
The loop :
AllMAFs <- list()
for (i in length(SeparatedGroupsofmealsCombs) {
AllMAFs[[i]] <- apply( SeparatedGroupsofmealsCombs[[i]], 2, function(x)maf( tabulate( x+1) ))
}
gives these errors (I tried this many times and I'm sure I copied it entirely) :-
Error in apply(SeparatedGroupsofmealsCombs[[i]], 2, function(x) maf(tabulate(x + :
object 'i' not found
> }
2017 Nov 18
0
Complicated analysis for huge databases
> On Nov 18, 2017, at 1:52 AM, Allaisone 1 <allaisone1 at hotmail.com> wrote:
>
> Although the loop seems to be formulated correctly I wonder why
> it gives me these errors :
>
> -object 'i' not found
> - unexpected '}' in "}"
You probably did not copy the entire code offered. But we cannot know since you did not "show your code",
2017 Nov 18
0
Complicated analysis for huge databases
Something like the following?
AllMAFs <- list()
for (i in length(SeparatedGroupsofmealsCombs) {
AllMAFs[[i]] <- apply(SeparatedGroupsofmealsCombs[[i]], 2, function(x)maf(tabulate(x+1)))
}
(untested, of course)
Also the solution is a bit generic since I don't know what the output of maf() looks like in your case, and I don't understand why you use tabulate because I would have
2017 Nov 18
0
Complicated analysis for huge databases
On 18/11/2017 4:40 PM, Allaisone 1 wrote:
>
> The loop :
>
>
> AllMAFs <- list()
>
> for (i in length(SeparatedGroupsofmealsCombs) {
> AllMAFs[[i]] <- apply( SeparatedGroupsofmealsCombs[[i]], 2, function(x)maf( tabulate( x+1) ))
> }
>
>
> gives these errors (I tried this many times and I'm sure I copied it entirely) :-
>
> Error in
2017 Nov 19
1
Complicated analysis for huge databases
Thanks, but a new error appeared with the loop:
Error in x + 1 : non-numeric argument to binary operator
I think this can be solved by converting the columns (I, II, III, ..., 600) into "numeric" instead of
the current "int" type, as shown below in the structure of the "33_55" data frame.
$ 33_55:'data.frame': 256 obs. of 600 variables:
..$ MealsCombinations
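One possible cause of that error, assuming MealsCombinations is a character column (which is what tidyr's unite() produces): apply() converts a data frame with as.matrix(), and a single character column makes the whole matrix character, so x + 1 fails. Integer columns on their own are fine. The toy data frame below is an assumption, not the poster's data.

```r
# Toy data frame with a character ID column next to integer columns.
df <- data.frame(
  MealsCombinations = c("33_55", "33_55"),
  I  = c(0L, 2L),
  II = c(1L, 1L),
  stringsAsFactors = FALSE
)

# apply() calls as.matrix() on a data frame; here that gives character data.
is.character(as.matrix(df))

# Dropping the ID column first keeps the matrix numeric.
num <- df[, setdiff(names(df), "MealsCombinations"), drop = FALSE]

# Count the values 0, 1, 2 in every remaining column.
counts <- apply(num, 2, function(x) tabulate(x + 1, nbins = 3))
```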
2017 Nov 18
0
Complicated analysis for huge databases
The correct code is:
for (i in 1:length(SeparatedGroupsofmealsCombs)) { ...
I had mentioned that this is untested, but the error is so obvious ...
B.
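For reference, a self-contained sketch of the corrected loop. Note that for (i in length(x)) visits only the last index, while seq_along(x) visits every index and is also safe for an empty list. The function maf_stub() is a hypothetical stand-in (the poster's maf() is never shown), and the two small data frames are made-up examples.

```r
# Hypothetical stand-in for maf(); the real function is not shown in the thread.
# Assumes counts of values coded 0, 1, 2 (copies of one allele).
maf_stub <- function(counts) {
  p <- (counts[2] + 2 * counts[3]) / (2 * sum(counts))
  min(p, 1 - p)
}

# Made-up list of data frames, one per meal combination.
groups <- list(
  "33_55" = data.frame(I = c(0L, 1L, 2L, 0L), II = c(1L, 1L, 0L, 2L)),
  "44_66" = data.frame(I = c(2L, 2L, 1L, 0L), II = c(0L, 0L, 0L, 1L))
)

# Pre-allocate the result list and carry the group names over.
AllMAFs <- vector("list", length(groups))
names(AllMAFs) <- names(groups)

# seq_along() yields 1, 2, ..., length(groups).
for (i in seq_along(groups)) {
  AllMAFs[[i]] <- apply(groups[[i]], 2,
                        function(x) maf_stub(tabulate(x + 1, nbins = 3)))
}
```

Passing nbins = 3 keeps every count vector at length 3 even when the value 2 never occurs in a column.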
> On Nov 18, 2017, at 4:40 PM, Allaisone 1 <allaisone1 at hotmail.com> wrote:
>
>
> The loop :
>
> AllMAFs <- list()
>
> for (i in length(SeparatedGroupsofmealsCombs) {
> AllMAFs[[i]] <-
2017 Nov 10
0
Calculating frequencies of multiple values in 200 colomns
Thank you for your effort, Bert.
I know what the problem is now: the values (1,2,3) were only an example. The values I have are 0, 1, 2. The tabulate() function seems to skip the frequency of 0 values, and this is exactly my problem, as the frequency of 0 values must also be counted for the maf to be calculated correctly.
________________________________
From: Bert Gunter
2017 Nov 09
2
Calculating frequencies of multiple values in 200 colomns
Always reply to the list. I am not a free, private consultant!
"For example, if I have the values : 1 , 2 , 3 in each column, applying
Tabulate () would calculate the frequency of 1 and 2 without 3"
Huh??
> x <- sample(1:3,10,TRUE)
> x
[1] 1 3 1 1 1 3 2 3 2 1
> tabulate(x)
[1] 5 2 3
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people
2017 Nov 10
2
Calculating frequencies of multiple values in 200 colomns
|> x <- sample(0:2, 10, replace = TRUE)
|> x
[1] 1 0 2 1 0 2 2 0 2 1
|> tabulate(x)
[1] 3 4
|> table(x)
x
0 1 2
3 3 4
B.
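Two common ways to make the zeros count, sketched on the same ten values as above: shift the data by one before calling tabulate(), or use table() on a factor with explicit levels. Which one fits depends on what maf() expects as input, which is not shown here.

```r
x <- c(1, 0, 2, 1, 0, 2, 2, 0, 2, 1)

# tabulate() only counts positive integers, so the zeros are dropped.
plain <- tabulate(x)

# Shifting by one maps 0, 1, 2 to bins 1, 2, 3.
shifted <- tabulate(x + 1, nbins = 3)

# Explicit levels also keep a zero count for values that never occur.
tab <- table(factor(x, levels = 0:2))
```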
> On Nov 10, 2017, at 4:32 AM, Allaisone 1 <allaisone1 at hotmail.com> wrote:
>
>
>
> Thank you for your effort, Bert.
>
>
> I know what the problem is now: the values (1,2,3) were only an example. The values I have are
2017 Nov 10
0
Calculating frequencies of multiple values in 200 colomns
Hi,
To clarify the default behavior that Boris is referencing below, note the definition of the 'bin' argument to the tabulate() function:
bin: a numeric vector ***(of positive integers)***, or a factor. Long vectors are supported.
I added the asterisks for emphasis.
This is also noted in the examples used for the function in ?tabulate at the bottom of the help page.
The second
2017 Nov 09
3
Calculating frequencies of multiple values in 200 colomns
Hi All
I have a dataset of 200 columns and 1000 rows; each column contains 3 repeated values (7, 8, 10). I want to calculate the frequency of each value in each column and then apply the function maf() to those frequencies. I can do the analysis step by step like this:
> Values
A B C ... 200
1 7 10 7
2
2012 Feb 10
3
problem subsetting data frame with variable instead of constant
Hello,
I've encountered a very weird issue with subset(), or maybe this
is something I don't know about it: when you're subsetting
based on the columns of a data frame, can you only use constants (0.1, 2.3,
2.2) instead of variables?
Here's a look at my data frame called 'ea.cad.pwr':
> ea.ca.pwr[1:5,]
MAF OR POWER
1 0.02 0.01 0.9999
2 0.02
2010 Nov 29
2
FW: how to use by() ?
Thank you for the suggestion, Bill. The result is not quite what I would like. Here's sample code for you or anyone else who may be interested:
Al1 = c('A','C','C','C')
Al2 = c('G','G','G','T')
Freq1 = c(0.0078,0.0567,0.9434,0.9908)
MAF = c(0.0078,0.0567,0.0566,0.0092)
m1 = data.frame(Al1=Al1,
2005 Jul 15
1
2D contour predictions
Hi All
I have been fitting regression models and would now like to produce some
contour & image plots from the predictors.
Is there an easy way to do this? My current (newbie) experience with R
would suggest there is but that it's not always easy to find it!
f3 <- lm( fc ~ poly( speed, 2 ) + poly( torque, 2 ) + poly( sonl, 2 ) +
poly( p_rail, 2 ) + poly( pil_sep, 2 ) + poly( maf, 2
2003 Sep 17
2
CART analysis
Greetings,
Does anyone know of an R code for classification and regression tree
analysis (CART)?
Thank you
Ron
Ron Thornton BVSc, PhD, MACVSc (pathology, epidemiology)
Programme Co-ordinator, Active Surveillance
Animal Biosecurity
MAF Biosecurity Authority
P O Box 2526
Wellington, New Zealand
phone: 64-4-4744156
027 223 7582
fax: 64-4-474-4133
e-mail: ron.thornton at maf.govt.nz
2007 Jan 21
2
efficient code. how to reduce running time?
Hi,
I am new to R,
and even though I've gotten my code to run and do what it needs to,
it is taking forever and I can't use it like this.
I was wondering if you could help me find ways to make the code run
faster.
Here is my code.
the data set is a bunch of 0s and 1s in a data.frame.
What I am doing is this.
I pick a column and make up a new column Y with values associated with that
2010 Nov 29
3
how to use by() ?
Hello, All!
How might one accomplish this using the by() function?
m1 is a data frame.
# populate column "m1$major_allele"
for ( i in 1:length(m1$major_allele)) {
if ( m1$Freq1[i] == m1$MAF[i]){
m1$major_allele[i] = m1$Al1[i]
}
else{
m1$major_allele[i] = m1$Al2[i]
}
}
Jim
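A vectorized sketch of the loop above using ifelse() rather than by(); this is an alternative technique, not an answer from the thread. The sample values are the vectors posted in the follow-up message, and the condition simply mirrors the loop as written.

```r
# Sample data taken from the follow-up post in this thread.
m1 <- data.frame(
  Al1   = c("A", "C", "C", "C"),
  Al2   = c("G", "G", "G", "T"),
  Freq1 = c(0.0078, 0.0567, 0.9434, 0.9908),
  MAF   = c(0.0078, 0.0567, 0.0566, 0.0092),
  stringsAsFactors = FALSE
)

# Same rule as the loop: take Al1 where Freq1 equals MAF, otherwise Al2.
m1$major_allele <- ifelse(m1$Freq1 == m1$MAF, m1$Al1, m1$Al2)
```

Comparing doubles with == works here because the values are typed in identically; for computed frequencies a tolerance-based comparison would be safer.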