John Clark
2011-Sep-04 13:25 UTC
[R] generating multiple dataset and applying function and output multiple output dataset......
Dear R experts:

Here is my problem, just hard for me...

I want to generate multiple datasets, then apply a function to these
datasets and output corresponding output in single or multiple dataset
(whatever possible)...

# my example, although I need to generate a large number of variables and
# datasets

seed <- round(runif(10)*1000000)

datagen <- function(x){
  set.seed(x)
  var <- rep(1:3, c(rep(3, 3)))
  yvar <- rnorm(length(var), 50, 10)
  matrix <- matrix(sample(1:10, c(10*length(var)), replace = TRUE), ncol = 10)
  mydata <- data.frame(var, yvar, matrix)
}

gdt <- lapply (seed, datagen)

# resulting list (I believe is correct term) has 10 dataframes: gdt[1]
# .......to gdt[10]

# my function, this will perform anova in every component data frames and
# output probability coefficients...
anovp <- function(x){
  ind <- 3:ncol(x)
  out <- lm(gdt[x]$yvar ~ gdt[x][, ind[ind]])
  pval <- out$coefficients[,4][2]
  pval <- do.call(rbind,pval)
}

plist <- lapply (gdt, anovp)

Error in gdt[x] : invalid subscript type 'list'

This is not working, I tried different options. But could not figure
out...finally decided to bother experts, sorry for that...

My questions are:

(1) Is this possible to handle such situation in this way or there are other
alternatives to handle such multiple datasets created?

(2) If this is right way, how can I do it?

Thank you for attention and I will appreciate your help...

JC
Sarah Goslee
2011-Sep-05 13:40 UTC
[R] generating multiple dataset and applying function and output multiple output dataset......
Hi,

On Sun, Sep 4, 2011 at 9:25 AM, John Clark <rosbreed.pba at gmail.com> wrote:
> Dear R experts:
>
> Here is my problem, just hard for me...
>
> I want to generate multiple datasets, then apply a function to these
> datasets and output corresponding output in single or multiple dataset
> (whatever possible)...
>
> # my example, although I need to generate a large number of variables and
> datasets
>
> seed <- round(runif(10)*1000000)
>
> datagen <- function(x){
> set.seed(x)
> var <- rep(1:3, c(rep(3, 3)))
> yvar <- rnorm(length(var), 50, 10)
> matrix <- matrix(sample(1:10, c(10*length(var)), replace = TRUE), ncol = 10)
> mydata <- data.frame(var, yvar, matrix)
> }
>
> gdt <- lapply (seed, datagen)
>
> # resulting list (I believe is correct term) has 10 dataframes: gdt[1]
> .......to gdt[10]

Yes, that's a list of dataframes, though the correct reference is gdt[[1]]

> # my function, this will perform anova in every component data frames and
> output probability coefficients...
> anovp <- function(x){
>          ind <- 3:ncol(x)
>          out <- lm(gdt[x]$yvar ~ gdt[x][, ind[ind]])
>          pval <- out$coefficients[,4][2]
>          pval <- do.call(rbind,pval)
>         }
>
> plist <- lapply (gdt, anovp)
>
> Error in gdt[x] : invalid subscript type 'list'

It's not a matter of your use of lapply(), which is fine. It's that
your anovp() function just plain doesn't work. You need to debug it
with ONE dataframe before you try to lapply it to a whole bunch.

> anovp(gdt[[1]])
Error in gdt[x] : invalid subscript type 'list'

This suggests to me that x should be a matrix rather than a list (a
dataframe is a type of list), so I tried:

> anovp(as.matrix(gdt[[1]]))
Error in gdt[x][, ind[ind]] : incorrect number of dimensions

But as you see there are still problems. You'll need to solve those first:
if anovp() doesn't work for one dataframe, it won't work on a list of them.

> This is not working, I tried different options. But could not figure
> out...finally decided to bother experts, sorry for that...
>
> My questions are:
>
> (1) Is this possible to handle such situation in this way or there are other
> alternatives to handle such multiple datasets created?
>
> (2) If this is right way, how can I do it?
>
>
> Thank you for attention and I will appreciate your help...
>
>
> JC
>

-- 
Sarah Goslee
http://www.functionaldiversity.org
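
For later readers, here is a minimal sketch of one way to get a working anovp(), following the advice above to debug on a single data frame first. Two things break the original function. Inside anovp() the argument x is already one data frame, so gdt[x] tries to index the list with a data frame, which is what triggers the "invalid subscript type 'list'" error. And out$coefficients on an lm fit is a plain named vector, so [,4] cannot work; the p-values sit in the coefficient table returned by summary(). What follows rests on an assumption about the intent, which the thread does not confirm: each data frame has only 9 rows but 10 predictor columns, so a single joint model would leave no residual degrees of freedom, and the sketch instead fits one simple regression per predictor column. The fixed seeds 1:10 and the names X, d, j, and pmat are likewise choices made here for illustration.

```r
# Fixed seeds (an assumption, for reproducibility); the original drew them with runif().
seed <- 1:10

datagen <- function(s) {
  set.seed(s)
  var  <- rep(1:3, each = 3)                     # same as rep(1:3, c(3, 3, 3))
  yvar <- rnorm(length(var), 50, 10)
  X    <- matrix(sample(1:10, 10 * length(var), replace = TRUE), ncol = 10)
  data.frame(var, yvar, X)                       # return the data frame explicitly
}

gdt <- lapply(seed, datagen)                     # list of 10 data frames; use gdt[[i]]

# For one data frame d: regress yvar on each predictor column (3..ncol) in turn
# and return the p-value of the slope from the summary() coefficient table.
anovp <- function(d) {
  ind <- 3:ncol(d)
  pvals <- sapply(ind, function(j) {
    fit <- lm(d$yvar ~ d[[j]])
    cf  <- coef(summary(fit))
    # A constant predictor has no estimable slope; report NA in that case.
    if (nrow(cf) < 2) NA_real_ else cf[2, 4]
  })
  names(pvals) <- names(d)[ind]
  pvals
}

anovp(gdt[[1]])                  # check it on ONE data frame first
plist <- lapply(gdt, anovp)      # then apply it to the whole list
pmat  <- do.call(rbind, plist)   # one row per dataset, one column per predictor
```

Collecting the results with do.call(rbind, plist) answers the "single or multiple dataset" part of the question: plist keeps one vector of p-values per dataset, and pmat stacks them into a single matrix.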