Vining, Kelly
2012-Apr-18 20:13 UTC
[R] problem extracting data from a set of list vectors
Dear useRs,

A colleague has sent me several batches of output I need to process, and I'm struggling with the format to the point that I don't even know how to extract a test set to upload here. My apologies, but I think that my issue is straightforward enough (for some of you, not for me!) that you can help in the absence of a test set. Here is the scenario:

# Data sets are lists:
> ls()
[1] "res.Callus.Explant" "res.Callus.Regen"   "res.Explant.Regen"
> is.list(res.Callus.Explant)
[1] TRUE

# The elements of each list look like this:
> names(res.Callus.Explant)
 [1] "name"          "group1"        "group2"        "alternative"   "rows"          "counts"
 [7] "eff.lib.sizes" "dispersion"    "x"             "beta0"         "beta.hat"      "beta.tilde"
[13] "e"             "e1"            "e2"            "log.fc"        "p.values"      "q.values"

I want to 1) extract specific fields from this data structure into a data frame, 2) subset from this data frame into a new data frame based on selection criteria. What I've done is this:

all.comps <- ls(pattern="^res")
for(i in all.comps){
obj = i;
gene.ids = rownames(obj$counts);
x = data.frame(gene.ids = gene.ids, obj$counts, obj$e1, obj$e2, obj$log.fc,
obj$p.value, obj$q.value);
DiffGenes.i = subset(x, x$obj.p.value<0.05 | x$obj.q.value<=0.1)
}

Obviously, this doesn't work because pattern searching in the first line is not feeding the entire data structure into the all.comps variable. But how can I accomplish feeding the whole data structure for each one of these lists into the loop? Should I be able to use sapply here? If so, how? Also, I suspect that "DiffGenes.i" is not going to give me the data frame I want, which in the example I'm showing would be "DiffGenes.res.Callus.Explant." How should I name output data frames from a loop like this (if a loop is even the best way to do this)?

Any help with this will be greatly appreciated.

--Kelly V.
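For readers who want to experiment without the real data, a small mock object with some of the same element names could be built roughly as follows. Everything in it is assumed for illustration: the values are invented, only a few of the eighteen elements are included, and "counts" is assumed to be a matrix whose row names are the gene IDs, as the rownames(obj$counts) call in the post suggests.

```r
# Hypothetical stand-in for one of the result lists. The real objects have
# more elements and real data; the numbers here are made up.
set.seed(1)
n <- 5
counts <- matrix(rpois(n * 4, lambda = 20), nrow = n,
                 dimnames = list(paste0("gene", 1:n), paste0("lib", 1:4)))

res.Callus.Explant <- list(
  name     = "Callus.Explant",
  counts   = counts,        # assumed: genes in rows, libraries in columns
  e1       = runif(n),
  e2       = runif(n),
  log.fc   = rnorm(n),
  p.values = c(0.01, 0.20, 0.04, 0.50, 0.90),
  q.values = c(0.05, 0.30, 0.20, 0.60, 0.95)
)

# A compact overview of the structure, one line per element.
str(res.Callus.Explant, max.level = 1)
```

Posting the str() output of the real object would show whether these assumptions about the element types hold.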
Milan Bouchet-Valat
2012-Apr-18 21:04 UTC
[R] problem extracting data from a set of list vectors
You did not tell us exactly how you imported your data or how your data sets are structured. The output of str(res.Callus.Explant) would help.

Specifically, I suspect your objects are already data frames, which are a special case of lists. You can check that with is.data.frame(). If they aren't, but their elements are all of the same length, you can use as.data.frame() to convert them to data frames. Then, you would simply do something like:

sets <- list(res.Callus.Explant, res.Callus.Regen, res.Explant.Regen)
sets <- lapply(sets, subset, p.values<0.05 | q.values<=0.1)

Hope this helps
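The lapply() suggestion can be sketched on mock data as follows. The mock lists, their values, and the helper names make_mock and to_frame are all hypothetical. Because the element names in the post suggest that not every element has one entry per gene, this sketch builds each data frame from the per-gene elements only instead of calling as.data.frame() on the whole list.

```r
# Hypothetical mock of one result list (element names taken from the post).
make_mock <- function(p, q) {
  list(counts = matrix(1:6, nrow = 3,
                       dimnames = list(c("g1", "g2", "g3"), c("a", "b"))),
       log.fc = c(1.5, -0.2, 0.8),
       p.values = p, q.values = q)
}
sets <- list(
  res.Callus.Explant = make_mock(c(0.01, 0.30, 0.20), c(0.04, 0.50, 0.09)),
  res.Callus.Regen   = make_mock(c(0.60, 0.70, 0.80), c(0.90, 0.90, 0.90))
)

# Keep only the per-gene elements, so every column has the same length.
to_frame <- function(obj) {
  data.frame(gene.ids = rownames(obj$counts), obj$counts,
             log.fc = obj$log.fc, p.values = obj$p.values,
             q.values = obj$q.values)
}

frames <- lapply(sets, to_frame)

# An explicit function makes it obvious that the condition is evaluated
# against the columns of each data frame in turn.
diff.genes <- lapply(frames,
                     function(d) subset(d, p.values < 0.05 | q.values <= 0.1))
```

Because sets is a named list, the results keep those names, so diff.genes$res.Callus.Explant holds the selected genes for that comparison.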
MacQueen, Don
2012-Apr-18 21:42 UTC
[R] problem extracting data from a set of list vectors
Try this (NOT tested) or something similar:

all.comps <- ls(pattern="^res")
for(i in all.comps) {
  obj <- get(i)
  gene.ids <- rownames(obj$counts)
  x <- data.frame(gene.ids = gene.ids, obj$counts, obj$e1, obj$e2,
                  obj$log.fc, obj$p.value, obj$q.value)
  x <- subset(x, obj.p.value<0.05 | obj.q.value<=0.1)
  assign( paste('DiffGenes',i,sep='.') , x, '.GlobalEnv')
}

Before you try this, make sure you have a copy of everything, or can reconstruct it. The assign() function is dangerous. With it you can overwrite other data if you are not careful.

You might test first; instead of using assign() as above, do

  cat('output object name is: ', paste('DiffGenes',i,sep='.'),'\n')
  cat('output object data is:\n')
  print(x)
  cat('\n')

To explain a little:

i is the name of the data structure, not the data structure itself. all.comps holds only the names returned by ls(), so you fetch the data structure itself with get(i).

The assign() function takes the output object (x in this case) and writes it to the "global environment" using a name that is constructed using paste(). The global environment is the first place in your search path; see search().

Note the simplification of the subset() statement.

You don't need semi-colons at the end of each line.

When you construct x, you might find it helpful to name the rest of the columns, not just the first one, instead of letting data.frame() construct the names.

I re-wrapped the lines in the hopes that my email software will not re-wrap them for me.

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
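As a possible alternative to assign(), the results could be collected in a single named list, which avoids writing new objects into the global environment. The sketch below is only an illustration under assumptions: the mock objects and the helper name mock are invented, a private environment (env) stands in for the workspace, and the columns are named explicitly as suggested above.

```r
# Hypothetical mock objects, stored in a private environment for the example.
env <- new.env()
mock <- function(p, q) list(
  counts = matrix(1:4, nrow = 2, dimnames = list(c("g1", "g2"), c("a", "b"))),
  p.values = p, q.values = q)
assign("res.Callus.Explant", mock(c(0.01, 0.50), c(0.20, 0.60)), envir = env)
assign("res.Callus.Regen",   mock(c(0.30, 0.40), c(0.05, 0.70)), envir = env)

# ls() returns names; mget() fetches the objects themselves as a named list.
all.comps <- ls(pattern = "^res", envir = env)
objs <- mget(all.comps, envir = env)

# One filtered data frame per comparison, with explicitly named columns.
results <- lapply(objs, function(obj) {
  x <- data.frame(gene.ids = rownames(obj$counts), obj$counts,
                  p.value = obj$p.values, q.value = obj$q.values)
  subset(x, p.value < 0.05 | q.value <= 0.1)
})
names(results) <- paste("DiffGenes", names(results), sep = ".")
```

Each data frame is then available by name, for example results[["DiffGenes.res.Callus.Explant"]]. In an interactive session the envir arguments can be dropped so that ls() and mget() look in the current workspace.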