Hi Folks,
I have a nasty data restructuring problem!
I can think of one or two really clumsy ways of doing it
with 'for' loops and the like, but I can't think of a
*neat* way of doing it in R.
The data are the Hadley Centre "Central England Temperature"
series, daily from 01/01/1772 to 31/03/2007, and can be
viewed/downloaded at
  http://hadobs.metoffice.com/hadcet/cetdl1772on.dat
and the structure is as follows:
Year DoM Jan  Feb Mar  Apr May  Jun Jul Aug  Sep Oct  Nov Dec
-------------------------------------------------------------
1772  1   32  -15  18   25  87  128 187 177  105 111   78 112
1772  2   20    7  28   38  77  138 154 158  143 150   85  62
1772  3   27   15  36   33  84  170 139 153  113 124   83  60
1772  4   27  -25  61   58  96   90 151 160  173 114   60  47
1772  5   15   -5  68   69 133  146 179 170  173 116   83  50
1772  6   22  -45  51   77 113  105 175 198  160 134  134  42
.................................
.................................
1772 27    0   46  66   74  77  198 156 144   76 104   45   5
1772 28   15   77  86   64 116  167 151 155   66  84   60  10
1772 29  -33   56  83   50 113  131 170 182  135 140   63  12
1772 30  -10 -999  66   77 121  122 179 163  143 143   55  15
1772 31   -8 -999  46 -999 108 -999 168 144 -999 145 -999  22
1773  1   20    0  79   13  93  174 104 151  171 131   68  55
1773  2   10   17  71   25  65  109 128 184  164  91   34  75
1773  3    5  -28  94   70  41   79 135 192  149 101   78  85
1773  4    5  -23  99  107  49  107 144 173  144  98   86  83
1773  5  -28  -30  76   65  83  128 144 182  116  98   66  38
.................................
"DoM" is Day of Month, 1-31 for each month ("short" months
get entries -999 on missing days).
So each year is a block of 31 lines and 14 columns, of
which the last 12 are Temperature (in 10ths of a degree C),
each column a month, running down each column for the
31 days of the month in that year.
What I want to do is convert this into a 4-column format:
  Year, Month, DoM, Temp
with a separate row for each consecutive day from 01/01/1772
to 31/03/2007, and omitting days which have a "-999" entry
(OK I still have to check that "-999" is only used for DoMs
which don't exist, and doesn't also indicate that a Temperature
may be missing for some other reason; but I believe the series
to be complete).
What it boils down to is stacking the 12 31-day Temperature
columns on top of each other in each year, filling in the
Year, Month, DoM, and stacking the results for consecutive
years on top of each other (after which one can strike out
the "-999"s). Hence, really clunky for-loops!
Any really *neat* ideas for this?
With thanks,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Apr-07                                       Time: 22:24:51
------------------------------ XFMail ------------------------------
Hi Ted,
melt(df, id=c("Year","DoM"))
should get you started.
Hadley
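
For anyone following along, here is a minimal sketch of the stacking Ted describes, written in base R so it runs without extra packages. It is only an illustration under stated assumptions: the four sample rows are copied from the post rather than read from the real file, the names `txt`, `wide`, `temps` and `long` are made up for the example, and the filter assumes -999 marks only days that don't exist (the point Ted says he still has to check).

```r
# Sample rows copied from the post; the names txt/wide/temps/long are illustrative.
txt <- "Year DoM Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1772 30 -10 -999 66 77 121 122 179 163 143 143 55 15
1772 31 -8 -999 46 -999 108 -999 168 144 -999 145 -999 22
1773 1 20 0 79 13 93 174 104 151 171 131 68 55
1773 2 10 17 71 25 65 109 128 184 164 91 34 75"
wide <- read.table(text = txt, header = TRUE)

# Stack the 12 month columns. as.vector() unrolls a matrix column by column,
# so Year and DoM are repeated once per month, and each month number is
# repeated once per input row.
temps <- as.matrix(wide[, 3:14])
long <- data.frame(
  Year  = rep(wide$Year, times = 12),
  Month = rep(1:12, each = nrow(wide)),
  DoM   = rep(wide$DoM, times = 12),
  Temp  = as.vector(temps)
)

# Assumption: -999 only marks non-existent days. Drop those rows,
# then sort into calendar order.
long <- long[long$Temp != -999, ]
long <- long[order(long$Year, long$Month, long$DoM), ]
rownames(long) <- NULL
```

The melt() call above (melt comes from the reshape package) does the same stacking in one step, with the month names ending up in a "variable" column; the -999 rows would still need to be filtered out afterwards.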