thr3ads.net - R help - [R] More efficient use of reshape? [Dec 2012]

Nathan Miller

2012-Dec-13 17:16 UTC

[R] More efficient use of reshape?

Hi all,

I have played a bit with the "reshape" package and function along with
"melt" and "cast", but I feel I still don't have a good handle on how to
use them efficiently. Below I have included an application of "reshape"
that is rather clunky, and I'm hoping someone can offer advice on how to
use reshape (or melt/cast) more efficiently.


#For this example I am using climate change data available on-line

file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
clim.data <- read.csv(file, header=TRUE)

library(lubridate)
library(reshape)

#I've been playing with the lubridate package a bit to work with dates, but
#as the climate dataset only uses year and month, I have added a "day" to
#each entry in the "yr_mn" column and then used "dym" from lubridate to
#generate POSIXct date-times in a new column, clim.data$date

clim.data$yr_mn<-paste("01", clim.data$yr_mn, sep="")
clim.data$date<-dym(clim.data$yr_mn)
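If only a first-of-month date is needed, base R can build one without lubridate. This is a minimal sketch assuming, hypothetically, that yr_mn holds strings of the form "YYYY-MM" (for example "1880-01"); the real column format in the CSV may differ, in which case the format string has to change. Unlike dym(), this returns a Date rather than a date-time.

```r
# Hypothetical example values; the real yr_mn format in the CSV may differ.
yr_mn <- c("1880-01", "1880-02", "1978-12")

# Append a day ("-01") and parse with an explicit format; the result has
# class "Date" (no time-of-day component).
first_of_month <- as.Date(paste0(yr_mn, "-01"), format = "%Y-%m-%d")

print(first_of_month)
print(class(first_of_month))
```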

#Now to the reshape. The dataframe is in a wide format. The columns GISS,
#HAD, NOAA, RSS, and UAH are all different sources from which the global
#temperature anomaly has been calculated since 1880 (only since 1978 for
#RSS and UAH). What I would like to do is plot the temperature anomaly vs.
#date and use ggplot to facet by data source (GISS, HAD, etc.). Thus I need
#the data in long format with a date column, a temperature anomaly column,
#and a data source column. The code below works, but it's really very
#clunky, and I'm sure I am not using these tools as efficiently as I can.

#The varying=list(3:7) argument specifies the columns in the dataframe that
#correspond to the sources (GISS, etc.), but in the resulting reshaped
#dataframe the sources are numbered 1-5, so I have to reassign their names.
#In addition, the original dataframe has extra data columns I do not want,
#so after reshaping I create yet another dataframe with just the columns I
#need, and then I have to rename them so that I can keep track of what
#everything is. Whew! Not the most elegant of code.

d<-reshape(clim.data, varying=list(3:7),idvar="date",
v.names="anomaly",direction="long")

d$time<-ifelse(d$time==1,"GISS",d$time)
d$time<-ifelse(d$time==2,"HAD",d$time)
d$time<-ifelse(d$time==3,"NOAA",d$time)
d$time<-ifelse(d$time==4,"RSS",d$time)
d$time<-ifelse(d$time==5,"UAH",d$time)

new.data<-data.frame(d$date,d$time,d$anomaly)
names(new.data)<-c("date","source","anomaly")
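The five chained ifelse() calls and the separate names() step can each be collapsed. A minimal sketch, assuming d$time holds the integers 1 to 5 in the same order as columns 3:7 (GISS, HAD, NOAA, RSS, UAH); the small d below is a hypothetical stand-in with made-up values, not the real reshaped data.

```r
# Hypothetical stand-in for the reshaped data: 'time' holds the integer
# codes 1..5 that reshape() assigns when 'times' is not supplied.
d <- data.frame(
  date    = as.Date(c("1979-01-01", "1979-01-01", "1979-01-01")),
  time    = c(1L, 2L, 5L),
  anomaly = c(0.1, 0.2, 0.3)
)

source_names <- c("GISS", "HAD", "NOAA", "RSS", "UAH")

# Indexing the name vector by the integer code maps 1 -> "GISS",
# 2 -> "HAD", ... in one vectorised step; naming the arguments of
# data.frame() sets the column names, so no separate names() call is needed.
new.data <- data.frame(
  date    = d$date,
  source  = source_names[d$time],
  anomaly = d$anomaly,
  stringsAsFactors = FALSE
)
print(new.data)
```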

I realize this is a mess, though it works. I think that with just some help
on how to handle this example better, I'll probably get over the learning
hump and actually figure out how to use these data manipulation functions
more cleanly.
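For comparison, base R's reshape() (from the stats package) can do the labelling and column selection itself: timevar and times label each row with the source name instead of 1-5, and drop discards unwanted columns before reshaping. A minimal sketch, assuming the source columns are named GISS, HAD, NOAA, RSS and UAH as described above; the small clim data frame, its values, and its yr_frac/yr_mn/AMO columns are hypothetical stand-ins for the real CSV.

```r
# Hypothetical two-month toy data mimicking the wide layout described above;
# all numbers are made up for illustration.
clim <- data.frame(
  yr_frac = c(1978.04, 1978.12),
  yr_mn   = c("1978-01", "1978-02"),
  GISS = c(0.10, 0.20), HAD = c(0.11, 0.21), NOAA = c(0.12, 0.22),
  RSS  = c(0.13, 0.23), UAH = c(0.14, 0.24),
  AMO  = c(-0.1, -0.2),
  stringsAsFactors = FALSE
)
clim$date <- as.Date(paste0(clim$yr_mn, "-01"))

sources <- c("GISS", "HAD", "NOAA", "RSS", "UAH")

long <- reshape(
  clim,
  direction = "long",
  varying   = sources,      # wide columns to stack, by name rather than 3:7
  v.names   = "anomaly",    # name of the stacked value column
  timevar   = "source",     # name of the column that identifies the source
  times     = sources,      # use source names instead of 1..5
  idvar     = "date",
  drop      = c("yr_frac", "yr_mn", "AMO")  # columns not wanted in the result
)
rownames(long) <- NULL
long <- long[c("date", "source", "anomaly")]  # fix the column order
print(long)
```

With the real data, clim.data would replace clim and drop would list every other unwanted column; the result should then have the date/source/anomaly layout needed for faceting on source in ggplot.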

Any advice or assistance would be appreciated.
Thanks,
Nate


David Winsemius

2012-Dec-13 17:48 UTC

On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:
> Hi all,
>
> I have played a bit with the "reshape" package and function along with
> "melt" and "cast", but I feel I still don't have a good handle on how to
> use them efficiently. Below I have included an application of "reshape"
> that is rather clunky, and I'm hoping someone can offer advice on how to
> use reshape (or melt/cast) more efficiently.
>
You do realize that the 'reshape' function is _not_ in the reshape  
package, right? And also that the reshape package has been superseded  
by the reshape2 package?
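This can be checked from within R: find() reports which attached package provides a name, and environment() gives the namespace a function was defined in. A minimal sketch for a fresh session; the answer may differ if some other attached package masks reshape().

```r
# Which attached package provides reshape()? In a fresh session this is
# the stats package, not the reshape (or reshape2) package.
print(find("reshape"))

# The namespace that the reshape() function object was defined in.
print(environmentName(environment(reshape)))
```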

-- 
David.
David Winsemius, MD
Alameda, CA, USA

John Kane

2012-Dec-14 15:38 UTC


I think David was pointing out that reshape() is not a reshape2 function. It is
in the stats package.

I am not sure exactly what you are doing, but perhaps something along the lines
of

library(reshape2)
mm <- melt(clim.data, id.vars = c("yr_frac", "yr_mn", "AMO", "NINO34", "SSTA"))

is a start?
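If reshape2 is not installed, base R's utils::stack() gives a rough analogue of melt() (a different function from the one suggested here): it stacks the chosen columns into a values column plus an ind factor naming the column each value came from. A minimal sketch; the toy wide data frame and its values are hypothetical, and only two source columns are used to keep it short.

```r
# Hypothetical toy data (made-up values): two id columns and two sources.
wide <- data.frame(
  yr_frac = c(1979.04, 1979.12),
  yr_mn   = c("1979-01", "1979-02"),
  GISS    = c(0.10, 0.20),
  HAD     = c(0.11, 0.21),
  stringsAsFactors = FALSE
)

measure <- c("GISS", "HAD")

# stack() stacks the measure columns one after another: 'values' holds the
# data and 'ind' names the source column (like melt()'s value/variable).
st <- stack(wide[measure])

# Repeat the id columns once per stacked column so the rows line up.
idx <- rep(seq_len(nrow(wide)), times = length(measure))
long <- data.frame(
  yr_frac  = wide$yr_frac[idx],
  yr_mn    = wide$yr_mn[idx],
  variable = as.character(st$ind),
  value    = st$values,
  stringsAsFactors = FALSE
)
print(long)
```

With reshape2 itself, the columns to stack can also be named explicitly through melt()'s measure.vars argument, which keeps a column such as a date from being melted by accident.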

I also don't think that recent versions of ggplot2 automatically load
reshape2, so it may be that you are working with a relatively old
installation of ggplot2 and reshape?

sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lubridate_1.2.0    directlabels_2.9   RColorBrewer_1.0-5 gridExtra_0.9.1    stringr_0.6.2
[6] scales_0.2.3       plyr_1.8           reshape2_1.2.1     ggplot2_0.9.3

loaded via a namespace (and not attached):
[1] colorspace_1.2-0 dichromat_1.2-4  digest_0.6.0     gtable_0.1.2     labeling_0.1
[6] MASS_7.3-22      munsell_0.4      proto_0.3-9.2    tools_2.15.2
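The sessionInfo() output separates packages that are attached from those "loaded via a namespace (and not attached)". That distinction matters for the ggplot2/reshape2 question: a package that another package imports can be loaded without its functions (such as melt()) being callable by name. A minimal sketch of a helper to check both states; pkg_status() is hypothetical and not part of any package.

```r
# Hypothetical helper: is a package attached (on the search path, so its
# exported functions can be called by name), and is its namespace loaded
# (possibly only because another package imports it)?
pkg_status <- function(pkg) {
  c(
    attached = paste0("package:", pkg) %in% search(),
    loaded   = pkg %in% loadedNamespaces()
  )
}

print(pkg_status("stats"))
print(pkg_status("reshape2"))
```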






John Kane
Kingston ON Canada

> -----Original Message-----
> From: natemiller77 at gmail.com
> Sent: Thu, 13 Dec 2012 09:58:34 -0800
> To: dwinsemius at comcast.net
> Subject: Re: [R] More efficient use of reshape?
> 
> Sorry David,
> 
> In my attempt to simplify the example and include only the code I felt was
> necessary, I left out the loading of ggplot2, which imports reshape2 and
> which was actually used in the code I provided. Sorry for the mistake and
> for my misunderstanding of where the reshape function was coming from. I
> should have checked that more carefully.
> 
> Thanks,
> Nate