Tim Richter-Heitmann
2015-Feb-20 17:33 UTC
[R] Split a dataframe by rownames and/or colnames
Dear List,

Consider this example:

df <- data.frame(matrix(rnorm(9*9), ncol=9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)

indx <- gsub(".*_", "", names(df))

I can split the data frame by the index that is given in the column names after the underscore "_":

list2env(
  setNames(
    lapply(split(colnames(df), indx), function(x) df[x]),
    paste('df', sort(unique(indx)), sep="_")),
  envir=.GlobalEnv)

However, I changed my mind and now want to do it by row names. Exchanging colnames with rownames does not work; it gives exactly the same output (9 rows x 3 columns). I could do

as.data.frame(t(df_x))

but maybe that is not elegant. What would be the solution for splitting the data frame by rows?

Thank you very much!

-- 
Tim Richter-Heitmann
I think ?tapply and friends (?by, ?aggregate, ?ave) are what you want.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Fri, Feb 20, 2015 at 9:33 AM, Tim Richter-Heitmann <trichter at uni-bremen.de> wrote:
> What would be the solution for splitting the dataframe by rows?
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
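For readers who want to see what the functions named in this reply do with a row-name-derived grouping, here is a minimal sketch. It assumes the goal is a per-group summary rather than separate data frames; the seed, the use of mean/colMeans as the summary, and the names `group_means` and `group_colmeans` are illustrative choices that are not part of the thread.

```r
# Rebuild the example data (the seed is an assumption, added for reproducibility)
set.seed(1)
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)

# Grouping index taken from the part of each row name after the underscore
indx <- sub(".+_", "", rownames(df))

# aggregate(): one row per group, here holding the column means
group_means <- aggregate(df, by = list(group = indx), FUN = mean)

# by(): apply a function to the rows of each group; returns one result per group
group_colmeans <- by(df, indx, function(d) colMeans(d))

dim(group_means)
length(group_colmeans)
```

These functions summarise within groups; they do not hand back the row subsets themselves, which is what split() does.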
On Feb 20, 2015, at 9:33 AM, Tim Richter-Heitmann wrote:

> However, i changed my mind and want to do it now by rownames. Exchanging colnames with rownames does not work, it gives the exact same output (9 rows x 3 columns). I could do
> as.data.frame(t(df_x),
> but maybe that is not elegant.
> What would be the solution for splitting the dataframe by rows?

The split.data.frame method seems to work perfectly well with a rownames-derived index argument:

> split(df, sub(".+_","", rownames(df) ) )
$`1`
      c_1   d_1   e_1   a_p   b_p   c_p  1_o1  2_o1  3_o1
c_1 -0.11 -0.04  1.33 -0.87 -0.16 -0.25 -0.75  0.34  0.14
d_1 -0.62 -0.94  0.80 -0.78 -0.70  0.74  0.11  1.44 -0.33
e_1  0.98 -0.83  0.48  0.19 -0.32 -1.01  1.28  1.04 -2.16

$o1
       c_1   d_1   e_1   a_p   b_p   c_p  1_o1  2_o1  3_o1
1_o1 -0.93 -0.02  0.69 -0.67  1.04  1.04 -1.50 -0.36  0.50
2_o1  0.02 -0.16 -0.09 -1.50 -0.02 -1.04  1.07 -0.45  1.56
3_o1 -1.42  0.88 -0.05  0.85 -1.35  0.21  1.35  0.92 -0.76

$p
      c_1   d_1   e_1   a_p   b_p   c_p  1_o1  2_o1  3_o1
a_p -1.35  0.91 -0.58 -0.63  0.94 -1.13  0.71  0.25  0.82
b_p -0.25 -0.73 -0.41 -1.71  1.28  0.19 -0.35  1.74 -0.93
c_p -0.01 -1.11 -0.12  0.58  1.51  0.03 -0.99 -0.23 -0.03

-- 
David Winsemius
Alameda, CA, USA
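A short sketch of why swapping colnames for rownames in the original call changes nothing, and how the row-wise split relates to it. In this example the row names equal the column names, so split(rownames(df), indx) yields the same character vectors as split(colnames(df), indx), and df[x] with a single index always selects columns. The seed and the names `by_cols`, `by_rows` and `by_rows2` are assumptions made for illustration.

```r
# Rebuild the example data (the seed is an assumption, added for reproducibility)
set.seed(1)
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)
indx <- sub(".+_", "", rownames(df))

# df[x] picks columns, so each piece is 9 rows x 3 columns
by_cols <- lapply(split(colnames(df), indx), function(x) df[x])

# df[x, , drop = FALSE] picks rows, so each piece is 3 rows x 9 columns
by_rows <- lapply(split(rownames(df), indx), function(x) df[x, , drop = FALSE])

# split() on the data frame itself produces the row-wise pieces directly
by_rows2 <- split(df, indx)

sapply(by_cols, dim)
sapply(by_rows, dim)
sapply(by_rows2, dim)
```

The explicit df[x, , drop = FALSE] form is only useful when the row selection needs extra logic; otherwise split(df, indx) is the shorter route.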
Tim Richter-Heitmann
2015-Feb-23 12:03 UTC
[R] Split a dataframe by rownames and/or colnames
Thank you very much for the line. It does the split as suggested. However, I want to release all the data frames to the environment (later on, some dozen lines of code will be carried out for each data frame, and I don't know how to do that with lapply or a for loop, so I do it separately):

list2env(split(df, sub(".+_","", rownames(df))), envir=.GlobalEnv)

However, some of the resulting data frames now have numeric names and cannot be easily accessed because of that. How would the line be altered to add a "df_" prefix to each of the data frame names resulting from list2env?

Thank you very much!

On 20.02.2015 20:36, David Winsemius wrote:
> The split.data.frame method seems to work perfectly well with a rownames-derived index argument:
>
>> split(df, sub(".+_","", rownames(df) ) )

-- 
Tim Richter-Heitmann (M.Sc.)
PhD Candidate
International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069
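One possible way to get the "df_" prefix, offered as a sketch rather than as the list's answer: rename the elements of the split list before passing it to list2env. The seed and the names `parts` and `row_counts` are assumptions for illustration. Without the prefix, an object named 1 can only be reached with backticks (`1`), which is the access problem described above.

```r
# Rebuild the example data (the seed is an assumption, added for reproducibility)
set.seed(1)
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)

# Split by rows, then prefix every list name with "df_"
parts <- split(df, sub(".+_", "", rownames(df)))
names(parts) <- paste0("df_", names(parts))

# Release the pieces into the global environment as df_1, df_o1, df_p
list2env(parts, envir = .GlobalEnv)

# Alternative: keep the list and run the same code on every piece
row_counts <- vapply(parts, nrow, integer(1))
```

Keeping the pieces in the list and wrapping the per-data-frame code in a function passed to lapply avoids repeating the same dozen lines for each object, at the cost of writing that function once.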