thr3ads.net - R help - [R] Speeding up casting a dataframe from long to wide format [Dec 2008]


Daren Tan

2008-Dec-03 04:52 UTC

[R] Speeding up casting a dataframe from long to wide format

Hi, 
 
I am casting a dataframe from long to wide format. The same code that works for
a smaller dataframe takes a long time (more than two hours and still
running) for a longer dataframe of 2495227 rows and ten different predictors.
How can I make it more efficient?
 
wer <- data.frame(Name=c(1:5, 4:5), Type=c(letters[1:5], letters[4:5]),
                  Predictor=c("A", "A", "A", "A", "A", "B", "B"))
> wer
  Name Type Predictor
1    1    a         A
2    2    b         A
3    3    c         A
4    4    d         A
5    5    e         A
6    4    d         B
7    5    e         B

wer.melt <- melt(wer, id.var=c("Name", "Type"))

cast(wer.melt, Name + Type ~ value, length, fill=0)
  Name Type A B
1    1    a 1 0
2    2    b 1 0
3    3    c 1 0
4    4    d 1 1
5    5    e 1 1

> sessionInfo()
R version 2.7.0 (2008-04-22)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] reshape_0.8.0

Gabor Grothendieck

2008-Dec-03 09:59 UTC

Try timing this to see if it's any faster:
> lev <- levels(wer$Predictor)
> out <- outer(wer$Predictor, lev, "==")
> colnames(out) <- lev
> aggregate(out, wer[1:2], sum)
  Name Type A B
1    1    a 1 0
2    2    b 1 0
3    3    c 1 0
4    4    d 1 1
5    5    e 1 1
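
As a rough illustration of how the suggestion above could be timed, here is a minimal sketch on synthetic data. The data frame `big`, its size, and the helper `count_wide` are assumptions made up for this example, not part of the thread. Note that in R 4.0.0 and later `data.frame()` no longer converts strings to factors by default, so `levels(wer$Predictor)` would return NULL for the `wer` shown in the question unless `Predictor` is first wrapped in `factor()`; the sketch creates the factor explicitly.

```r
# Synthetic long-format data. The sizes and value ranges are illustrative
# assumptions, not the poster's real data.
set.seed(42)
n <- 20000
big <- data.frame(Name = sample(1:500, n, replace = TRUE))
big$Type <- letters[big$Name %% 26 + 1]            # Type is determined by Name
big$Predictor <- factor(sample(LETTERS[1:10], n, replace = TRUE))

# Hypothetical helper wrapping the outer() + aggregate() steps.
count_wide <- function(d) {
  lev <- levels(d$Predictor)                       # needs a factor column
  # n x length(lev) logical matrix: TRUE where the row has that predictor
  out <- outer(as.character(d$Predictor), lev, "==")
  colnames(out) <- lev
  # Sum the indicator columns within each (Name, Type) group
  aggregate(out, d[c("Name", "Type")], sum)
}

elapsed <- system.time(res <- count_wide(big))["elapsed"]
print(elapsed)
```

Increasing `n` toward the real row count gives a feel for how the run time scales before committing to the full dataset.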



hadley wickham

2008-Dec-03 13:23 UTC

Hi Daren,

Unfortunately, the current version of reshape isn't very efficient.
I'm working on a new version which should be 10-20 times faster for
the operation that you're performing, but this won't be ready for a
while. In the meantime you might want to try an alternative
approach, like the one that Gabor suggested.

Hadley



-- 
http://had.co.nz/
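
Another alternative in the same spirit, not proposed by anyone in the thread, is to count with base R's `table()` instead of building an indicator matrix. The sketch below is a hedged illustration on the example data from the question; the variable names `key`, `counts`, `ids`, `ids_key`, and `wide` are made up for this example, and it has not been benchmarked against the real 2495227-row data.

```r
wer <- data.frame(
  Name = c(1:5, 4:5),
  Type = c(letters[1:5], letters[4:5]),
  Predictor = c("A", "A", "A", "A", "A", "B", "B"),
  stringsAsFactors = TRUE
)

# One key per (Name, Type) combination actually present in the data.
# Caveat: keys are joined with ".", so values that themselves contain "."
# could collide; pick a different `sep` in interaction() if that matters.
key <- interaction(wer$Name, wer$Type, drop = TRUE)

# Integer matrix of counts: rows are keys, columns are predictor levels.
counts <- unclass(table(key, wer$Predictor))

# Recover the id columns in order of first appearance and attach the counts.
ids <- unique(wer[c("Name", "Type")])
ids_key <- interaction(ids$Name, ids$Type, drop = TRUE)
wide <- data.frame(
  ids,
  as.data.frame(counts[as.character(ids_key), , drop = FALSE]),
  row.names = NULL
)
print(wide)
```

On the example data this should give the same counts as the `cast()` call in the question, with one row per Name and Type pair and one column per predictor.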
