Hello,
I have a large data frame that is organized by date in a peculiar way. I
am seeking advice on how to transform the data into a format that is of
more use to me.
The data is organized as follows:
STN_ID YEAR MM ELEM X1 X2 X3 X4
X5 X6 X7
1 2402594 1997 9 1 *-00233* *-00204* *-00119* -00190 -00251
-00243 -00249
2 2402594 1997 10 1 -00003 -00005 -00001 -00039
-00031 -00036 -00033
3 2402594 1997 11 1 000025 000065 000070 000069
000115 000072 000093
4 2402594 1997 12 1 000160 000114 000143 000140
000093 000068 000157
5 2402594 1998 1 1 000067 000095 000139 000113
000066 000081 000070
6 2402594 1998 2 1 000098 000102 000140 000124
000082 000111 000047
7 2402594 1998 3 1 -00039 -00006 000015 000015
000016 000035 000013
8 2402594 1998 4 1 -00035 -00046 -00046 -00062
-00018 -00025 -00012
9 2402594 1998 5 1 000031 000011 -00005 -00061
-00061 -00080 -00217
10 2402594 1997 9 2 *-00339 -00339 -00343* -00346 -00285
-00253 -00253
11 2402594 1997 10 2 -00207 -00289 -00278 -00271
-00258 -00315 -00341
12 2402594 1997 11 2 -00242 -00230 -00206 -00180
-00256 -00227 -00241
13 2402594 1997 12 2 -00155 -00153 -00118 -00066
-00088 -00073 -00032
14 2402594 1998 1 2 000003 -00021 -00033 -00022
-00014 000001 000008
15 2402594 1998 2 2 000050 000077 000106 000073
000060 000060 000083
16 2402594 1998 3 2 000095 000063 000030 000057
000073 000144 000090
17 2402594 1998 4 2 000128 000178 000195 000157
000160 000160 000117
18 2402594 1998 5 2 000074 000064 000051 000027
000053 000063 000067
Where "MM" is the month of the year, and ELEM is the variable to which
the values in the X* columns describe (in the actual data there are 31 X
columns, one for each day of the month). The values in bold are the
values that are transferred into the small chart below (which is the
result I hope to get). This is to give a sense of how the data is picked
out of the original data frame.
I would like to organize the data so it looks like this:
STN_ID YEAR MM DAY ELEM1 ELEM2
1 2402594 1997 9 X1 -00233 -00339
2 2402594 1997 9 X2 -00204 000077
3 2402594 1997 9 X3 -00119 000030
Such that I create a new column named "DAY" that is made up of the
numbers following "X" in the original data.frame columns. Also, the
ELEM
values are converted to columns and parsed with the ELEM code (in this
case 1 and 2).
I have tried to split apart the columns, transform them, and bind them
back together, but my ability to do so just isn't there yet. I am still
fairly new to R, and would really appreciate some help in working
towards organizing this data frame.
Thanks in advance,
Scott Hatcher
[[alternative HTML version deleted]]
My suggestion, since bold doesn't show up in a text only mailing list, would be to look into the function ?aggregate. It looks like something like (assuming your data is in a mydat): mydat.new <- aggregate(cbind(STN_ID, YEAR, MM, DAY) ~ ELEM + ?, mydat, FUN = ?) #this is up to you Alternatively, the plyr package is great at transforming data.frames. On Tue, May 24, 2011 at 3:03 PM, Scott Hatcher <scott.v.hatcher at gmail.com> wrote:> Hello, > > I have a large data frame that is organized by date in a peculiar way. I > am seeking advice on how to transform the data into a format that is of > more use to me. > > The data is organized as follows: > > ? ? STN_ID YEAR MM ELEM ? ? ?X1 ? ? ? ? X2 ? ? ? X3 ? ? ? ? X4 > X5 ? ? ? ?X6 ? ? ? ? X7 > 1 ?2402594 1997 ? 9 ? 1 *-00233* *-00204* *-00119* ?-00190 ?-00251 > -00243 ?-00249 > 2 ?2402594 1997 ?10 ?1 ? ? ? ? ? ? ?-00003 ?-00005 ?-00001 ?-00039 > -00031 -00036 ?-00033 > 3 ?2402594 1997 ?11 ?1 ? ? ? ? ? ? ?000025 ?000065 ?000070 ?000069 > 000115 ?000072 ?000093 > 4 ?2402594 1997 ?12 ?1 ? ? ? ? ? ? ?000160 ?000114 ?000143 ?000140 > 000093 ?000068 ?000157 > 5 ?2402594 1998 ?1 ? ?1 ? ? ? ? ? ? ?000067 ?000095 ?000139 ?000113 > 000066 ?000081 ?000070 > 6 ?2402594 1998 ?2 ? ?1 ? ? ? ? ? ? ?000098 ?000102 ?000140 ?000124 > 000082 ?000111 ?000047 > 7 ?2402594 1998 ?3 ? ?1 ? ? ? ? ? ? ?-00039 ?-00006 ?000015 ?000015 > 000016 ?000035 ?000013 > 8 ?2402594 1998 ?4 ? ?1 ? ? ? ? ? ? ?-00035 ?-00046 ?-00046 ?-00062 > -00018 ?-00025 ?-00012 > 9 ?2402594 1998 ?5 ? ?1 ? ? ? ? ? ? ?000031 ?000011 ?-00005 ?-00061 > -00061 ?-00080 ?-00217 > 10 2402594 1997 ?9 ? ?2 *-00339 ?-00339 ?-00343* ?-00346 ?-00285 > -00253 ?-00253 > 11 2402594 1997 ?10 ?2 ? ? ? ? ? ? ?-00207 ?-00289 ?-00278 ?-00271 > -00258 ?-00315 ?-00341 > 12 2402594 1997 ?11 ?2 ? ? ? ? ? ? ?-00242 ?-00230 ?-00206 ?-00180 > -00256 ?-00227 ?-00241 > 13 2402594 1997 ?12 ?2 ? ? ? ? ? ? ?-00155 ?-00153 ?-00118 ?-00066 > -00088 ?-00073 ?-00032 > 14 2402594 1998 ?1 ? ?2 ? ? ? ? ? ? ?000003 ?-00021 ?-00033 ?-00022 > -00014 ?000001 ?000008 > 15 2402594 1998 ?2 ? ?2 ? ? ? ? ? ? ?000050 ?000077 ?000106 ?000073 > 000060 ?000060 ?000083 > 16 2402594 1998 ?3 ? ?2 ? ? ? ? ? ? ?000095 ?000063 ?000030 ?000057 > 000073 ?000144 ?000090 > 17 2402594 1998 ?4 ? ?2 ? ? ? ? ? ? ?000128 ?000178 ?000195 ?000157 > 000160 ?000160 ?000117 > 18 2402594 1998 ?5 ? ?2 ? ? ? ? ? ? ?000074 ?000064 ?000051 ?000027 > 000053 ?000063 ?000067 > > Where "MM" is the month of the year, and ELEM is the variable to which > the values in the X* columns describe (in the actual data there are 31 X > columns, one for each day of the month). The values in bold are the > values that are transferred into the small chart below (which is the > result I hope to get). This is to give a sense of how the data is picked > out of the original data frame. > > I would like to organize the data so it looks like this: > > ? ? ? STN_ID YEAR MM DAY ? ?ELEM1 ELEM2 > 1 ? ? 2402594 1997 ? 9 ?X1 ? ? ? -00233 -00339 > 2 ? ? 2402594 1997 ? 9 ?X2 ? ? ? -00204 000077 > 3 ? ? 2402594 1997 ? 9 ?X3 ? ? ? -00119 000030 > > Such that I create a new column named "DAY" that is made up of the > numbers following "X" in the original data.frame columns. Also, the ELEM > values are converted to columns and parsed with the ELEM code (in this > case 1 and 2). > > I have tried to split apart the columns, transform them, and bind them > back together, but my ability to do so just isn't there yet. I am still > fairly new to R, and would really appreciate some help in working > towards organizing this data frame. > > Thanks in advance, > Scott Hatcher > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- ==============================================Jon Daily Technician ==============================================#!/usr/bin/env outside # It's great, trust me.
On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:> Hello, > > I have a large data frame that is organized by date in a peculiar > way. I > am seeking advice on how to transform the data into a format that is > of > more use to me. > > The data is organized as follows: > > STN_ID YEAR MM ELEM X1 X2 X3 > X4 X5 X6 X7 > 1 2402594 1997 9 1 *-00233* *-00204* *-00119* -00190 -00251 > -00243 -00249 > 2 2402594 1997 10 1 -00003 -00005 -00001 -00039 > -00031 -00036 -00033 > 3 2402594 1997 11 1 000025 000065 000070 000069 > 000115 000072 000093 > > Where "MM" is the month of the year, and ELEM is the variable to which > the values in the X* columns describe (in the actual data there are > 31 X > columns, one for each day of the month). The values in bold are the > values that are transferred into the small chart below (which is the > result I hope to get). This is to give a sense of how the data is > picked > out of the original data frame.assuming this dataframe is named 'tst': require(reshape2) mtst <- melt(tst[, 1:7], id.vars=1:4) Only select idvars and X1:X3 str(mtst) #---------- 'data.frame': 54 obs. of 6 variables: $ STN_ID : num 2402594 2402594 2402594 2402594 2402594 ... $ YEAR : num 1997 1997 1997 1997 1998 ... $ MM : num 9 10 11 12 1 2 3 4 5 9 ... $ ELEM : num 1 1 1 1 1 1 1 1 1 2 ... $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ... $ value : chr "-00233" "-00003" "000025" "000160" ... dcast(mtst, STN_ID +YEAR+ MM + variable ~ ELEM) #--------- STN_ID YEAR MM variable 1 2 1 2402594 1997 9 X1 -00233 -00339 2 2402594 1997 9 X2 -00204 -00339 3 2402594 1997 9 X3 -00119 -00343 4 2402594 1997 10 X1 -00003 -00207 5 2402594 1997 10 X2 -00005 -00289 6 2402594 1997 10 X3 -00001 -00278 7 2402594 1997 11 X1 000025 -00242 snipped output> > I would like to organize the data so it looks like this: > > STN_ID YEAR MM DAY ELEM1 ELEM2 > 1 2402594 1997 9 X1 -00233 -00339 > 2 2402594 1997 9 X2 -00204 000077 > 3 2402594 1997 9 X3 -00119 000030Where is that second column coming from. I don't see it in the data example> > Such that I create a new column named "DAY" that is made up of the > numbers following "X" in the original data.frame columns. Also, the > ELEM > values are converted to columns and parsed with the ELEM code (in this > case 1 and 2). > > I have tried to split apart the columns, transform them, and bind them > back together, but my ability to do so just isn't there yet. I am > still > fairly new to R, and would really appreciate some help in working > towards organizing this data frame. > > Thanks in advance, > Scott Hatcher > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD West Hartford, CT