wk y
2009-Oct-13 00:29 UTC
[R] splitting dataframe, assign to new dataframe, add new rows to new dataframe
Hi, all,

My objective is to split a dataframe named "cmbine" according to the value of "classes". After the split, I will take the first instance from each class and bin them into a new dataframe, "df1". In the 2nd iteration, I will take the 2nd available instance and bin them into another new dataframe, "df2".

> cmbine$names
apple tiger pencil chicken banana pear

> cmbine$mass
0.50 100.00 0.01 1.00 0.15 0.30

> cmbine$classes
1 2 3 2 1 1

These are the results which I want to obtain:

> df1
classes mass
apple   0.50
tiger   100.00
pencil  0.01

> df2
classes mass
banana  0.15
chicken 1.00

> df3
classes mass
pear    0.30

Below is what I have tried. The main problem is that I don't know how to assign the selected instance to a new dataframe whose name is generated 'on-the-fly' from the value of j (the jth row).

    for (i in 1:3) {
      same_cell <- cmbine[cmbine$classes == i, ]
      if (nrow(same_cell) != 0) {
        for (j in 1:nrow(same_cell)) {
          picked <- same_cell[j, ]
          assign(paste("df", j, sep=""), picked)
          #assign(paste("df",j, sep=""), paste("df", j, sep=""))
        }
      }
    }

The problem is that the assign function overwrites the previous value of df, so I have not been able to insert rows into the three df dataframes and always end up with only 1 (final) row in df1, df2 and df3. I have tried using rbind but was not able to assign values back to the "on-the-fly" variable names.

I really need your advice and assistance since I have been stuck on this for some time now. Thank you.
cls59
2009-Oct-13 01:41 UTC
[R] splitting dataframe, assign to new dataframe, add new rows to new dataframe
wk yeo wrote:
> Hi, all,
>
> My objective is to split a dataframe named "cmbine" according to the value
> of "classes". After the split, I will take the first instance from each
> class and bin them into a new dataframe, "df1". In the 2nd iteration, I
> will take the 2nd available instance and bin them into another new
> dataframe, "df2".
>
> >cmbine$names
> apple tiger pencil chicken banana pear
>
> >cmbine$mass
> 0.50 100.00 0.01 1.00 0.15 0.30
>
> >cmbine$classes
> 1 2 3 2 1 1

If possible, it would be helpful to provide sample data in a form that can be copied and pasted directly into an R session, like so:

    cmbine <- data.frame( names = c('apple', 'tiger', 'pencil', 'chicken', 'banana', 'pear' ) )
    cmbine['mass'] <- c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30)
    cmbine['classes'] <- factor(c(1, 2, 3, 2, 1, 1))

It saves people on the list a bunch of copying/pasting/quote adding. Another quick way to do this is to use dump(), which prints the structure of your object in a form that can be copied and pasted:

    dump( 'cmbine', file='' )

wk yeo wrote:
> These are the results which I want to obtain:
>
> >df1
> classes mass
> apple 0.50
> tiger 100.00
> pencil 0.01
>
> >df2
> classes mass
> banana 0.15
> chicken 1.00
>
> >df3
> classes mass
> pear 0.30
>
> Below shows what I have tried. The main problem I have = I don't know how
> to assign the selected instance into a new dataframe with a name which is
> generated 'on-the-fly' based on the value of j (the jth row).
>
> for (i in 1:3) {
>   same_cell <- cmbine[cmbine$classes == i, ]
>   if (nrow(same_cell)!=0){
>     for (j in 1:nrow(same_cell)){
>       picked <- same_cell[j, ]
>       assign(paste("df", j, sep=""), picked)
>       #assign(paste("df",j, sep=""), paste("df", j, sep=""))
>     }
>   }
> }

I'm assuming you want the results grouped by class, i.e. all the 1s in one data frame and all the 2s in another.
This can be done with a slight modification of your loop:

    for (i in 1:3) {
      same_cell <- cmbine[cmbine$classes == i, ]
      if (nrow(same_cell) != 0) {
        assign(paste("df", i, sep=""), same_cell)
      }
    }

However, the results I get aren't the same as the results you said you wanted:

> df1
   names mass classes
1  apple 0.50       1
5 banana 0.15       1
6   pear 0.30       1

> df2
    names mass classes
2   tiger  100       2
4 chicken    1       2

> df3
   names mass classes
3 pencil 0.01       3

The "R way" of doing this is to use the by() function, which breaks a data frame into sub-data frames based on a column of factors, such as the classes. For your example, it would be used as:

    by( cmbine, cmbine[['classes']], function( df ){
      # Lots of stuff can happen inside this function, in this case we are really
      # just returning the subset that got passed in.
      return( df )
    })

cmbine[["classes"]]: 1
   names mass classes
1  apple 0.50       1
5 banana 0.15       1
6   pear 0.30       1
-----------------------------------------------------------------------
cmbine[["classes"]]: 2
    names mass classes
2   tiger  100       2
4 chicken    1       2
-----------------------------------------------------------------------
cmbine[["classes"]]: 3
   names mass classes
3 pencil 0.01       3

The by() function returns a fancy list, each component of which can be accessed using the [[ ]] operator.

Hope this helps!

-Charlie

-----
Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University
--
View this message in context: http://www.nabble.com/splitting-dataframe%2C-assign-to-new-dataframe%2C-add-new-rows-to-new-dataframe-tp25865409p25865911.html
Sent from the R help mailing list archive at Nabble.com.
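As a rough illustration of pulling individual pieces out of such a result, here is a minimal sketch. It rebuilds the sample data from the thread, and the names byClass, class1 and perClass are only illustrative. It also uses split(), which is swapped in here as a plainer alternative: it returns an ordinary named list of sub-data frames rather than a by object.

```r
# Sample data as given earlier in the thread.
cmbine <- data.frame(
  names   = c("apple", "tiger", "pencil", "chicken", "banana", "pear"),
  mass    = c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30),
  classes = factor(c(1, 2, 3, 2, 1, 1))
)

# by() with an identity function: one sub-data frame per class.
byClass <- by(cmbine, cmbine[["classes"]], function(df) df)

# [[ ]] with a position extracts a single component as a plain data frame.
# Position 1 corresponds to the first factor level, i.e. class "1".
class1 <- byClass[[1]]

# split() does the same grouping and returns a named list,
# so components can also be looked up by class label.
perClass <- split(cmbine, cmbine$classes)
class2 <- perClass[["2"]]
```

Keeping the pieces in one list avoids generating variable names with assign(), and the list can be looped over with lapply() afterwards.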
cls59
2009-Oct-13 02:02 UTC
[R] splitting dataframe, assign to new dataframe, add new rows to new dataframe
wk yeo wrote:
> Hi, all,
>
> My objective is to split a dataframe named "cmbine" according to the value
> of "classes". After the split, I will take the first instance from each
> class and bin them into a new dataframe, "df1". In the 2nd iteration, I
> will take the 2nd available instance and bin them into another new
> dataframe, "df2".

My apologies, I did not read the first lines of your question carefully.

Say we split the data frame by class using by():

    byClass <- by( cmbine, cmbine[['classes']], function( df ){ return(df) } )

We could then determine the maximum number of rows in all the returned data frames:

    maxRows <- max(sapply( byClass, nrow ))

Then, I usually resort to a gratuitous application of lapply() and do.call():

    # Loop over each value between 1 and the maximum number of rows, return results as a list.
    lapply( 1:maxRows, function(i){
      # Loop over each data frame, extract the ith rows and rbind the results
      # together. Indexing past the last row of a data frame yields a row of NAs.
      ithRows <- do.call(rbind, lapply(byClass, function(df){ return( df[i,] ) }))
      # Remove all NA rows
      ithRows <- ithRows[ !is.na(ithRows[,1]), ]
      return(ithRows)
    })

[[1]]
   names  mass classes
1  apple 5e-01       1
2  tiger 1e+02       2
3 pencil 1e-02       3

[[2]]
    names mass classes
1  banana 0.15       1
2 chicken 1.00       2

[[3]]
  names mass classes
1  pear  0.3       1

There's definitely a more elegant way to do this, perhaps using some routines in the plyr package.

Good luck!

-Charlie
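One possible base-R alternative to the lapply()/do.call() approach, offered only as a sketch and not as the method from the posts above: number each row within its class using ave(), then split on that number. The names sorted, rankInClass and byRank are illustrative, and the sample data is rebuilt from the thread.

```r
# Sample data as given earlier in the thread.
cmbine <- data.frame(
  names   = c("apple", "tiger", "pencil", "chicken", "banana", "pear"),
  mass    = c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30),
  classes = factor(c(1, 2, 3, 2, 1, 1))
)

# Sort by class so each resulting data frame lists class 1 first, then 2, then 3.
# order() leaves ties in their original order, so rows keep their position
# within each class.
sorted <- cmbine[order(cmbine$classes), ]

# Position of each row within its class: 1 for the first instance, 2 for the
# second, and so on.
rankInClass <- ave(seq_len(nrow(sorted)), sorted$classes, FUN = seq_along)

# Group rows by that position: all first instances together, all second
# instances together, etc.
byRank <- split(sorted, rankInClass)
names(byRank) <- paste("df", names(byRank), sep = "")
```

Here byRank$df1 holds apple, tiger and pencil, byRank$df2 holds banana and chicken, and byRank$df3 holds pear. If free-standing variables named df1, df2 and df3 are really wanted, list2env(byRank, envir = globalenv()) should create them, though keeping them in the list is usually easier to work with.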