Daren Tan
2008-Dec-03 04:52 UTC
[R] Speeding up casting a dataframe from long to wide format
Hi,

I am casting a data frame from long to wide format. The same code that works for a smaller data frame takes a long time (more than two hours and still running) for a larger data frame of 2495227 rows and ten different predictors. How can I make it more efficient?

> wer <- data.frame(Name=c(1:5, 4:5), Type=c(letters[1:5], letters[4:5]), Predictor=c("A", "A", "A", "A", "A", "B", "B"))
> wer
  Name Type Predictor
1    1    a         A
2    2    b         A
3    3    c         A
4    4    d         A
5    5    e         A
6    4    d         B
7    5    e         B

> wer.melt <- melt(wer, id.var=c("Name", "Type"))
> cast(wer.melt, Name + Type ~ value, length, fill=0)
  Name Type A B
1    1    a 1 0
2    2    b 1 0
3    3    c 1 0
4    4    d 1 1
5    5    e 1 1

> sessionInfo()
R version 2.7.0 (2008-04-22)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] reshape_0.8.0
Gabor Grothendieck
2008-Dec-03 09:59 UTC
[R] Speeding up casting a dataframe from long to wide format
Try timing this to see if it's any faster:

> lev <- levels(wer$Predictor)
> out <- outer(wer$Predictor, lev, "==")
> colnames(out) <- lev
> aggregate(out, wer[1:2], sum)
  Name Type A B
1    1    a 1 0
2    2    b 1 0
3    3    c 1 0
4    4    d 1 1
5    5    e 1 1
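In the same spirit as the outer() plus aggregate() approach above, here is a minimal sketch of another base-R variant that may be worth timing: build one key per (Name, Type) pair and cross-tabulate it against Predictor with table(). The names key, counts, first, ids and wide are illustrative only, and Predictor is wrapped in factor() explicitly because R 4.0 and later no longer convert strings to factors by default. It has not been timed on the 2495227-row data, so any speed-up is an assumption to check.

```r
# Toy data from the original post.
wer <- data.frame(
  Name = c(1:5, 4:5),
  Type = c(letters[1:5], letters[4:5]),
  Predictor = factor(c("A", "A", "A", "A", "A", "B", "B"))
)

# One key per (Name, Type) combination that actually occurs in the data.
key <- interaction(wer$Name, wer$Type, drop = TRUE)

# Cross-tabulate key against Predictor: one row per key, one column per level.
counts <- unclass(table(key, wer$Predictor))

# Take the id columns from the first row of each key and align the count
# matrix to that order by indexing on the key labels.
first <- !duplicated(key)
ids <- wer[first, c("Name", "Type")]
wide <- data.frame(ids, counts[as.character(key[first]), , drop = FALSE])
rownames(wide) <- NULL

print(wide)
```

On the toy data this gives the same Name, Type, A, B layout as the cast() call in the original post. The key is built only from combinations present in the data, so the count matrix has one row per observed pair rather than one per possible pair.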
hadley wickham
2008-Dec-03 13:23 UTC
[R] Speeding up casting a dataframe from long to wide format
Hi Daren,

Unfortunately, the current version of reshape isn't very efficient. I'm working on a new version which should be 10-20 times faster for the operation that you're performing, but this won't be ready for a while. In the meantime, you might want to try an alternative approach, like the one that Gabor suggested.

Hadley

-- 
http://had.co.nz/