thr3ads.net - R help - [R] [newbie] aggregating table() results and simplifying code with loop [Sep 2012]

If this information is useful, please help other people find it:
Share via:

Davide Rizzo

2012-Sep-13 13:36 UTC

[R] [newbie] aggregating table() results and simplifying code with loop

Dear all,
I'm looking for primary help at aggregating table() results and at
writing a loop (if useful)

My dataset ( http://goo.gl/gEPKW ) is composed of 23k rows, each one
representing a point in the space of which we know the land cover over
10 years (column y01 to y10).

I need to analyse it with a temporal sliding window of 5 years (y01 to
y05, y02 to y06 and so forth)
For each period I'm looking for specific sequences (e.g., Maize,
-noMaize, -noMaize, -noMaize, -noMaize) to calculate the "return time"
of principal land covers: barley (2BC), colza (2Co), maize (2Ma), etc.
I define the "return time" as the presence of a given land cover
according to a given sequence. Hence, each return time could require
the sum of different sequences (e.g., a return time of 5 years derives
from the sum of [2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
[no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
I need to repeat the calculation for each land cover for each time
window. In addition, I need to repeat the process over three datasets
(the one I give is the first one, the second one is from year 12 to
year 24, the third one from year 27 to year 31. So I have breaks in
the monitoring of land cover that avoid me to create a continuous
dataset). At the end I expect to aggregate the sum for each spatial
entity (column WS)

I've started writing the code for the first crop in the first 5yrs
period (http://goo.gl/FhZNx) then copying and pasting it for each crop
then for each time window...
Moreover I do not know how to aggregate the results of table(). (NB
sometimes I have a different number of WS per table because a given
sequence could be absent in a given spatial entity... so I have the
following warning msg: number of columns of result is not a multiple
of vector length (arg 1)). Therefore, I'm "obliged" to
copy&paste the
table corresponding to each sequence....

FIRST QUEST. How to aggregate the results of table() when the number
of columns is different?
Or the other way around: Is there a way to have a table where each row
reports the number of points per time return per WS? something like

WS1    WS2    WS3    WS4    ...    WS16    crop    period
23    15    18    43    ...    52       Ma5    01
18    11    25    84    ...    105       Ma2    01
...    ...    ...    ...    ...    ...    ...    ...
...    ...    ...    ...    ...    ...    Co5    01
...    ...    ...    ...    ...    ...    ...    ...
...    ...    ...    ...    ...    ...    Ma5    02
...    ...    ...    ...    ...    ...    ...    ...
In this table each row should represent a return time for a given land
cover a given period (one of the 6 time window of 5 years)?

SECOND QUEST. Could a loop (instead of a modular copy/paste code)
improve the time/reliability of the calculation? If yes, could you
please indicate me some entry-level references to write it?

I am aware this are newbie's questions, but I have not be able to
solve them using manuals and available sources.
Thank you in advance for your help.

Greetings,
Dd

PS
R: version 2.14.2 (2012-02-29)
OS: MS Windows XP Home 32-bit SP3


*****************************
Davide Rizzo
post-doc researcher
INRA UR055 SAD-ASTER
website :: http://sites.google.com/site/ridavide/

Rui Barradas

2012-Sep-13 23:25 UTC

head link

[R] [newbie] aggregating table() results and simplifying code with loop

Hello,

You would get more and better help if you were to break your problem 
into smaller sub-problems.
I am not really sure if this is what you want but here it goes.


inxList <- function(DF, logcover, nYears){
     f <- function(x, n){
         if(any(x)){
             r <- rle(x)
             any(r$lengths[!r$values] == n)
         }else FALSE
     }
     yrs <- lapply(0:5, `+`, seq_len(5))
     inx.list <- lapply(yrs, function(i){apply(logcover[, i], 1, f, 
nYears)})
     inx.list
}

makeCovers <- function(Cover, nYears){
     lcover <- T80[, ycols] == Cover
     inx.list <- inxList(T80[, ycols], lcover, nYears)
     tmp <- rep(NA, length(ws))
     names(tmp) <- ws
     xtb <- do.call(rbind, lapply(seq_along(inx.list), function(ix){
         xt <- xtabs(~ WS, T80[ inx.list[[ix]], c(2, 1 + ix + 1:5) ])
         tmp[names(xt)] <- xt; tmp}))
     colnames(xtb) <- paste("WS", seq_along(ws), sep =
"")
     data.frame(xtb, Cover = Cover, Period = seq_along(inx.list))
}

T80 <- read.table("sample.txt", header=TRUE, sep = ";")

ycols <- grep("y", names(T80))
ws <- unique(T80$WS)
covers <- as.character(unique(unlist(T80[ycols])))

result <- lapply( 4:2, function(.n)
         do.call(rbind, lapply(covers, makeCovers, .n)) )
names(result) <- paste("nYears", 4:2, sep = ".")

str(result)
# See the results for 4 years
result[[ "nYears.4" ]]  # or any other number of years


Hope this helps,

Rui Barradas
Em 13-09-2012 14:36, Davide Rizzo escreveu:> Dear all,
> I'm looking for primary help at aggregating table() results and at
> writing a loop (if useful)
>
> My dataset ( http://goo.gl/gEPKW ) is composed of 23k rows, each one
> representing a point in the space of which we know the land cover over
> 10 years (column y01 to y10).
>
> I need to analyse it with a temporal sliding window of 5 years (y01 to
> y05, y02 to y06 and so forth)
> For each period I'm looking for specific sequences (e.g., Maize,
> -noMaize, -noMaize, -noMaize, -noMaize) to calculate the "return
time"
> of principal land covers: barley (2BC), colza (2Co), maize (2Ma), etc.
> I define the "return time" as the presence of a given land cover
> according to a given sequence. Hence, each return time could require
> the sum of different sequences (e.g., a return time of 5 years derives
> from the sum of [2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
> [no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
> I need to repeat the calculation for each land cover for each time
> window. In addition, I need to repeat the process over three datasets
> (the one I give is the first one, the second one is from year 12 to
> year 24, the third one from year 27 to year 31. So I have breaks in
> the monitoring of land cover that avoid me to create a continuous
> dataset). At the end I expect to aggregate the sum for each spatial
> entity (column WS)
>
> I've started writing the code for the first crop in the first 5yrs
> period (http://goo.gl/FhZNx) then copying and pasting it for each crop
> then for each time window...
> Moreover I do not know how to aggregate the results of table(). (NB
> sometimes I have a different number of WS per table because a given
> sequence could be absent in a given spatial entity... so I have the
> following warning msg: number of columns of result is not a multiple
> of vector length (arg 1)). Therefore, I'm "obliged" to
copy&paste the
> table corresponding to each sequence....
>
> FIRST QUEST. How to aggregate the results of table() when the number
> of columns is different?
> Or the other way around: Is there a way to have a table where each row
> reports the number of points per time return per WS? something like
>
> WS1    WS2    WS3    WS4    ...    WS16    crop    period
> 23    15    18    43    ...    52       Ma5    01
> 18    11    25    84    ...    105       Ma2    01
> ...    ...    ...    ...    ...    ...    ...    ...
> ...    ...    ...    ...    ...    ...    Co5    01
> ...    ...    ...    ...    ...    ...    ...    ...
> ...    ...    ...    ...    ...    ...    Ma5    02
> ...    ...    ...    ...    ...    ...    ...    ...
> In this table each row should represent a return time for a given land
> cover a given period (one of the 6 time window of 5 years)?
>
> SECOND QUEST. Could a loop (instead of a modular copy/paste code)
> improve the time/reliability of the calculation? If yes, could you
> please indicate me some entry-level references to write it?
>
> I am aware this are newbie's questions, but I have not be able to
> solve them using manuals and available sources.
> Thank you in advance for your help.
>
> Greetings,
> Dd
>
> PS
> R: version 2.14.2 (2012-02-29)
> OS: MS Windows XP Home 32-bit SP3
>
>
> *****************************
> Davide Rizzo
> post-doc researcher
> INRA UR055 SAD-ASTER
> website :: http://sites.google.com/site/ridavide/
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

John Kane

2012-Sep-15 15:51 UTC

head link

[R] [newbie] aggregating table() results and simplifying code with loop

I have not seen any replies to your questions so I will suggest an approach that
may work if I can get a function to work.

If I understand what you want, you have a pattern something like this:
pattern1  <-  c("2Ma", "no2Ma","no2Ma",
"no2Ma","no2Ma")
pattern2  <-  c("no2Ma", 'no2Ma', "no2Ma",
"no2Ma", "2Ma")

for each five year period where 2Ma stands to Maize, one of 11 different grains
  1AU   2BC   2Co   2Ma   2MG   2ML   2oc   2PG   2SA   2We   3sN 

and what you want to know is if each year gives a pattern like

check1 <-  c(TRUE, FALSE, FALSE, FALSE, FALSE)
check2  <-  c(FALSE, FALSE, FALSE, FALSE, TRUE)

If I understand the patterns you only care for the two above, is that correct?

I am running out of time today but I think that this approach will get you
started
==========================================================
T80<-read.table(file="C:/sample.txt", header=T, sep=";")

# Reminder of just what we want to get as a final result.
check1 <-  c(TRUE, FALSE, FALSE, FALSE, FALSE)
check2  <-  c(FALSE, FALSE, FALSE, FALSE, TRUE)

pattern1  <-  c("2Ma", "2Ma","2Ma",
"2Ma","2Ma")

# one row examples to see that is happening
T80[1,3:7]
T80[1, 3:7] == pattern1

T80[405, 3:7]
T80[405, 3:7] == pattern1

# now we apply the patterns to the entire data set.
pp1  <-  T80[, 3:7] == pattern1
pp2  <-  T80[, 3:7] == pattern2

# reassign the WS values so we know where the data is from
WSnames  <-  rep(T80$WS, 2)

# Assmble new data frame. 
maizedata  <-  data.frame(WSnames, rbind(pp1,pp2))
=======================================================
Now, assuming this runs for you and I have not made a serious mistake in logic,
kyou should be able to do some subsetting  (?subset)  to extract only the
check1 and check2 patterns above.  

This is where I ran into trouble as I don't have the time this morning to
work out the subsetting conditions. It looks tricking and you probably need a
couple of subsetting moves.

It's not a pretty  solutlion and, particularly, I expect someone could clean
it up to make the subsetting easier or even unnecessary but I hope it helps.

Once you have extracted what you want   use apply() or perhaps the plyr package
to aggregate the results.

Repeat for all grains.  Actually look into setting the whole thing up as a
function. You should be able to write the program once as a function and do a
loop or an apply() to do all 11 grains in one go.

Best of luck.

John Kane
Kingston ON Canada

> -----Original Message-----
> From: ridavide at gmail.com
> Sent: Thu, 13 Sep 2012 15:36:28 +0200
> To: r-help at r-project.org
> Subject: [R] [newbie] aggregating table() results and simplifying code
> with loop
> 
> Dear all,
> I'm looking for primary help at aggregating table() results and at
> writing a loop (if useful)
> 
> My dataset ( http://goo.gl/gEPKW ) is composed of 23k rows, each one
> representing a point in the space of which we know the land cover over
> 10 years (column y01 to y10).
> 
> I need to analyse it with a temporal sliding window of 5 years (y01 to
> y05, y02 to y06 and so forth)
> For each period I'm looking for specific sequences (e.g., Maize,
> -noMaize, -noMaize, -noMaize, -noMaize) to calculate the "return
time"
> of principal land covers: barley (2BC), colza (2Co), maize (2Ma), etc.
> I define the "return time" as the presence of a given land cover
> according to a given sequence. Hence, each return time could require
> the sum of different sequences (e.g., a return time of 5 years derives
> from the sum of [2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
> [no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
> I need to repeat the calculation for each land cover for each time
> window. In addition, I need to repeat the process over three datasets
> (the one I give is the first one, the second one is from year 12 to
> year 24, the third one from year 27 to year 31. So I have breaks in
> the monitoring of land cover that avoid me to create a continuous
> dataset). At the end I expect to aggregate the sum for each spatial
> entity (column WS)
> 
> I've started writing the code for the first crop in the first 5yrs
> period (http://goo.gl/FhZNx) then copying and pasting it for each crop
> then for each time window...
> Moreover I do not know how to aggregate the results of table(). (NB
> sometimes I have a different number of WS per table because a given
> sequence could be absent in a given spatial entity... so I have the
> following warning msg: number of columns of result is not a multiple
> of vector length (arg 1)). Therefore, I'm "obliged" to
copy&paste the
> table corresponding to each sequence....
> 
> FIRST QUEST. How to aggregate the results of table() when the number
> of columns is different?
> Or the other way around: Is there a way to have a table where each row
> reports the number of points per time return per WS? something like
> 
> WS1    WS2    WS3    WS4    ...    WS16    crop    period
> 23    15    18    43    ...    52       Ma5    01
> 18    11    25    84    ...    105       Ma2    01
> ...    ...    ...    ...    ...    ...    ...    ...
> ...    ...    ...    ...    ...    ...    Co5    01
> ...    ...    ...    ...    ...    ...    ...    ...
> ...    ...    ...    ...    ...    ...    Ma5    02
> ...    ...    ...    ...    ...    ...    ...    ...
> In this table each row should represent a return time for a given land
> cover a given period (one of the 6 time window of 5 years)?
> 
> SECOND QUEST. Could a loop (instead of a modular copy/paste code)
> improve the time/reliability of the calculation? If yes, could you
> please indicate me some entry-level references to write it?
> 
> I am aware this are newbie's questions, but I have not be able to
> solve them using manuals and available sources.
> Thank you in advance for your help.
> 
> Greetings,
> Dd
> 
> PS
> R: version 2.14.2 (2012-02-29)
> OS: MS Windows XP Home 32-bit SP3
>
> *****************************
> Davide Rizzo
> post-doc researcher
> INRA UR055 SAD-ASTER
> website :: http://sites.google.com/site/ridavide/
____________________________________________________________
GET FREE 5GB EMAIL - Check out spam free email with many cool features!
Visit http://www.inbox.com/email to find out more!

John Kane

2012-Sep-16 18:24 UTC

head link

[R] [newbie] aggregating table() results and simplifying code with loop

Hi Davide,

I had some time this afternoon and I wonder if this approach is llkely to get
the results you want?  As before it is not complete but I think it holds
promise.

On the other hand Rui is a much better programer than I am so he may have a much
cleaner solution.  My way still looks labour-intensive at the moment.

I am using the plyr package which you will probably have to install.
load.packages("plyr") should do it.
=========================================================# load the plyr package
-
library(plyr)

# sample data
T80<- read.csv("/home/john/rdata/sample.csv",  header = TRUE, sep =
";")
# Davide's actual read statement
# T80<-read.table(file="C:/sample.txt", header=T,
sep=";")

# Looking for Maize
pattern  <-  c("2Ma", "2Ma","2Ma",
"2Ma","2Ma")

# one row examples to see that is happening
T80[1,3:7]
T80[1, 3:7] == pattern

T80[405, 3:7]
T80[405, 3:7] == pattern

T80[55, 3:7] == pattern

# now we apply the patterns to the entire data set.
pp1  <-  T80[, 3:7] == pattern

# paste the TRUEs and FALSEs together to form a single variable
concatdat  <-  paste(pp1[, 1], pp1[, 2], pp1[, 3], pp1[, 4],pp1[,5] ,  sep =
"+")

# Assmble new data frame. 
maizedata  <-  data.frame(T80$WS, concatdat)
names(maizedata)  <-  c("WS", "crop_pattern")

mzcount  <-  ddply(maizedata, .(WS, crop_pattern),  summarize, count =
length(crop_pattern))
mzcount  # This is all the data not just the relevant maise patterns 

# This seems to be getting us somewhere though we are not not there yet
# Does this subset  look like we are going in the right direction?
m51  <-  subset(mzcount, 
mzcount$crop_pattern == "FALSE+FALSE+FALSE+FALSE+TRUE" 
| mzcount$crop_pattern == "TRUE+FALSE+FALSE+FALSE+FALSE")

m51  <-  ddply(m51, .(WS), summarize, count = sum(count))
m51
================================================================












John Kane
Kingston ON Canada

> -----Original Message-----
> From: ridavide at gmail.com
> Sent: Sat, 15 Sep 2012 19:00:29 +0200
> To: jrkrideau at inbox.com, ruipbarradas at sapo.pt
> Subject: Re: [R] [newbie] aggregating table() results and simplifying
> code with loop
> 
> Thanks Rui, thanks John for your very different solutions.
> 
> I'll try to break my questions into smaller steps following your tips.
> However, not everything is clear for me... so before giving you a
> feed-back I need to study further your answers. For the moment I could
> specify that I'm looking for the following 19 patterns:
> 
> 1. True, False, False, False, False # return period of 5 years (1/2)
> 2. False, False, False, False, True # return period of 5 years (2/2)
> 3. True, False, False, False, True # return period of 4 years (1/3)
> 4. False, True, False, False, False # return period of 4 years (2/3)
> 5. False, False, False, True, False # return period of 4 years (3/3)
> 6. True, False, False, True, False # return period of 3 years (1/3)
> 7. False, True, False, False, True # return period of 3 years (2/3)
> 8. False, False, True, False, False # return period of 3 years (3/3)
> 9. False, True, False, True, False # return period of 2 years (1/2)
> 10. True, False, True, False, True # return period of 2 years (1/2)
> 11. True, True, True, True, True # mono-succession of 5 years
> 12. False, True, True, True, True # mono-succession of 4 years (1/2)
> 13. True, True, True, True, False # mono-succession of 4 years (2/2)
> 14. True, False, True, True, True # mono-succession of 3 years (1/5)
> 15. True. True. True. False, True # mono-succession of 3 years (2/5)
> 16. False, False, True, True, True # mono-succession of 3 years (3/5)
> 17. True, True, True, False, False # mono-succession of 3 years (4/5)
> 18. False, True, True, True, False # mono-succession of 3 years (5/5)
> 19. True, True, False, True, True # crops repeated two years
> 
> In particular, I want to apply all these 19 patterns to 7 (out of 11)
> land covers: 2BC, 2Co, 2Ma, 2We, 2MG, 2ML, 2PG. The pattern are so
> structured: True means presence of a given land cover (iteratively,
> one of the seven listed above), False means any other land-cover
> (amidst the remainder 10).
> 
> Thanks again for any further help.
> Greetings,
> Dd
> 
> ***********************************************************
> Davide Rizzo
> website :: http://sites.google.com/site/ridavide/
> 
> 
> On Sat, Sep 15, 2012 at 5:51 PM, John Kane <jrkrideau at inbox.com>
wrote:
>> I have not seen any replies to your questions so I will suggest an
>> approach that may work if I can get a function to work.
>> 
>> If I understand what you want, you have a pattern something like this:
>> pattern1  <-  c("2Ma",
"no2Ma","no2Ma", "no2Ma","no2Ma")
>> pattern2  <-  c("no2Ma", 'no2Ma',
"no2Ma", "no2Ma", "2Ma")
>> 
>> for each five year period where 2Ma stands to Maize, one of 11
different
>> grains
>>   1AU   2BC   2Co   2Ma   2MG   2ML   2oc   2PG   2SA   2We   3sN
>> 
>> and what you want to know is if each year gives a pattern like
>> 
>> check1 <-  c(TRUE, FALSE, FALSE, FALSE, FALSE)
>> check2  <-  c(FALSE, FALSE, FALSE, FALSE, TRUE)
>> 
>> If I understand the patterns you only care for the two above, is that
>> correct?
>> 
>> I am running out of time today but I think that this approach will get
>> you started
>> ==========================================================>> 
>> T80<-read.table(file="C:/sample.txt", header=T,
sep=";")
>> 
>> # Reminder of just what we want to get as a final result.
>> check1 <-  c(TRUE, FALSE, FALSE, FALSE, FALSE)
>> check2  <-  c(FALSE, FALSE, FALSE, FALSE, TRUE)
>> 
>> pattern1  <-  c("2Ma", "2Ma","2Ma",
"2Ma","2Ma")
>> 
>> # one row examples to see that is happening
>> T80[1,3:7]
>> T80[1, 3:7] == pattern1
>> 
>> T80[405, 3:7]
>> T80[405, 3:7] == pattern1
>> 
>> # now we apply the patterns to the entire data set.
>> pp1  <-  T80[, 3:7] == pattern1
>> pp2  <-  T80[, 3:7] == pattern2
>> 
>> # reassign the WS values so we know where the data is from
>> WSnames  <-  rep(T80$WS, 2)
>> 
>> # Assmble new data frame.
>> maizedata  <-  data.frame(WSnames, rbind(pp1,pp2))
>> =======================================================>> 
>> Now, assuming this runs for you and I have not made a serious mistake
in
>> logic, kyou should be able to do some subsetting  (?subset)  to extract
>> only the
>> check1 and check2 patterns above.
>> 
>> This is where I ran into trouble as I don't have the time this
morning
>> to work out the subsetting conditions. It looks tricking and you
>> probably need a couple of subsetting moves.
>> 
>> It's not a pretty  solutlion and, particularly, I expect someone
could
>> clean it up to make the subsetting easier or even unnecessary but I
hope
>> it helps.
>> 
>> Once you have extracted what you want   use apply() or perhaps the plyr
>> package to aggregate the results.
>> 
>> Repeat for all grains.  Actually look into setting the whole thing up
as
>> a function. You should be able to write the program once as a function
>> and do a loop or an apply() to do all 11 grains in one go.
>> 
>> Best of luck.
>> 
>> John Kane
>> Kingston ON Canada
>> 
>> 
>>> -----Original Message-----
>>> From: ridavide at gmail.com
>>> Sent: Thu, 13 Sep 2012 15:36:28 +0200
>>> To: r-help at r-project.org
>>> Subject: [R] [newbie] aggregating table() results and simplifying
code
>>> with loop
>>> 
>>> Dear all,
>>> I'm looking for primary help at aggregating table() results and
at
>>> writing a loop (if useful)
>>> 
>>> My dataset ( http://goo.gl/gEPKW ) is composed of 23k rows, each
one
>>> representing a point in the space of which we know the land cover
over
>>> 10 years (column y01 to y10).
>>> 
>>> I need to analyse it with a temporal sliding window of 5 years (y01
to
>>> y05, y02 to y06 and so forth)
>>> For each period I'm looking for specific sequences (e.g.,
Maize,
>>> -noMaize, -noMaize, -noMaize, -noMaize) to calculate the
"return time"
>>> of principal land covers: barley (2BC), colza (2Co), maize (2Ma),
etc.
>>> I define the "return time" as the presence of a given
land cover
>>> according to a given sequence. Hence, each return time could
require
>>> the sum of different sequences (e.g., a return time of 5 years
derives
>>> from the sum of [2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
>>> [no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
>>> I need to repeat the calculation for each land cover for each time
>>> window. In addition, I need to repeat the process over three
datasets
>>> (the one I give is the first one, the second one is from year 12 to
>>> year 24, the third one from year 27 to year 31. So I have breaks in
>>> the monitoring of land cover that avoid me to create a continuous
>>> dataset). At the end I expect to aggregate the sum for each spatial
>>> entity (column WS)
>>> 
>>> I've started writing the code for the first crop in the first
5yrs
>>> period (http://goo.gl/FhZNx) then copying and pasting it for each
crop
>>> then for each time window...
>>> Moreover I do not know how to aggregate the results of table(). (NB
>>> sometimes I have a different number of WS per table because a given
>>> sequence could be absent in a given spatial entity... so I have the
>>> following warning msg: number of columns of result is not a
multiple
>>> of vector length (arg 1)). Therefore, I'm "obliged"
to copy&paste the
>>> table corresponding to each sequence....
>>> 
>>> FIRST QUEST. How to aggregate the results of table() when the
number
>>> of columns is different?
>>> Or the other way around: Is there a way to have a table where each
row
>>> reports the number of points per time return per WS? something like
>>> 
>>> WS1    WS2    WS3    WS4    ...    WS16    crop    period
>>> 23    15    18    43    ...    52       Ma5    01
>>> 18    11    25    84    ...    105       Ma2    01
>>> ...    ...    ...    ...    ...    ...    ...    ...
>>> ...    ...    ...    ...    ...    ...    Co5    01
>>> ...    ...    ...    ...    ...    ...    ...    ...
>>> ...    ...    ...    ...    ...    ...    Ma5    02
>>> ...    ...    ...    ...    ...    ...    ...    ...
>>> In this table each row should represent a return time for a given
land
>>> cover a given period (one of the 6 time window of 5 years)?
>>> 
>>> SECOND QUEST. Could a loop (instead of a modular copy/paste code)
>>> improve the time/reliability of the calculation? If yes, could you
>>> please indicate me some entry-level references to write it?
>>> 
>>> I am aware this are newbie's questions, but I have not be able
to
>>> solve them using manuals and available sources.
>>> Thank you in advance for your help.
>>> 
>>> Greetings,
>>> Dd
>>> 
>>> PS
>>> R: version 2.14.2 (2012-02-29)
>>> OS: MS Windows XP Home 32-bit SP3
>>> 
>>> *****************************
>>> Davide Rizzo
>>> post-doc researcher
>>> INRA UR055 SAD-ASTER
>>> website :: http://sites.google.com/site/ridavide/
>> 
>> ____________________________________________________________
>> GET FREE 5GB EMAIL - Check out spam free email with many cool features!
>> Visit http://www.inbox.com/email to find out more!
>> 
>>
____________________________________________________________
FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop!

John Kane

2012-Sep-17 13:19 UTC

head link

[R] [newbie] aggregating table() results and simplifying code with loop

Comments in-line below.

Sorry to be so long getting back to you but sleep intervened. 
> -----Original Message-----
> From: ridavide at gmail.com
> Sent: Sun, 16 Sep 2012 21:24:15 +0200
> To: jrkrideau at inbox.com
> Subject: Re: [R] [newbie] aggregating table() results and simplifying
> code with loop
> 
> Thank you John,
> 
> you are giving me two precious tips (in addition, well explained!):
> 1. to use the package plyr (I didn't know it before, but it seems to
> make the deal!)
> 2. a smart and promising way to use it
> 
> I can finally plot the partial results, to have a first glance and
> compare to them
> 
> =========================================================> 
> # once the sequences for a given crop have been assembled in a
> dataframe (e.g., maizedata)
> library(latticeExtra) # load the package for "improved" graphics
> dotplot(mzcount$crop_pattern) # to plot the occurrence of all the
> retrieved sequences
> 
> #once the target pattern have been subset (e.g., m51)
> dotplot(m51$WS ~ m51$count) # to plot the occurrence of the target
> pattern(s) per watershed
Ah, so that's what WS stands for!  Do you have an abstract or short summary
of what you are doing in English or French?  My Itallian is limited to about 3
words or what I can guess from French/Latin.

I think that you are misunderstanding what is happening in m51. It is just
summarizing the total counts across those two conditions.  I think either I made
a logic error or was not understanding what was needed. In any case the dotplot
is not plotting the actual occurances but the number of times the crop_pattern
was found.  The count is the actual number of occurances per WS per crop_pattern


Have a look at my new effort.  it is still ugly but I think it more accurately
supplies some of what you want.

Two caveats : 1.  This is still very rough and an better programer may have a
better approach.
2. I used the package ggplot2 to do the graphs. One more thing to learn but I am
not used to lattice and I am used to ggplot2. Rather than spend a lot of time
with lattice or
2. The last faceted plots are not intended to be of real use, just my quick look
to see if anything looked like I expected. A bit of thinking probably would give
somethimg much better





> =========================================================> 
> This help me retrieving all the possible patterns for the different
> land covers. Hence, you made me able to improve the subset of
> patterns.
> 
> Tomorrow morning I'm going to test the tips for all the land covers
> for all the 5years time-periods. Even if still labour-intensive, the
> solution you propose it's surely a steady improvement.
> Now only the second question remains: is there a way "to clean"
this
> approach?... or probably not ?...
Yes I suspect that there are lots of ways to 'clean' the process and
automate it so that it would apply to all three data sets quite easily.  Some I
think I can help with and some you may need other people's  help.

For example, once we have a working model to handle one or two conditions It
should be relatively easy to use an apply() or a loop to handle all of them and
so on..

Well, I'm off to work now so I probably won't be able to get back to
much before late evening my time ( probably after you are asleep) I think
I'm 6 hours ahead of you.

##===================Revised approach===================# load the various
packages (plyr, latticeExtra, ggplot2, reshape2)
library(plyr)
library(latticeExtra)
library(ggplot2)
library(reshape2)

# sample data
T80<- read.csv("/home/john/rdata/sample.csv",  header = TRUE, sep =
";")
# Davide's actual read statement
# T80<-read.table(file="C:/sample.txt", header=T,
sep=";")

# Looking for Maize
pattern  <-  c("2Ma", "2Ma","2Ma",
"2Ma","2Ma")

# one row examples to see that is happening
T80[1,3:7]
T80[1, 3:7] == pattern

T80[405, 3:7]
T80[405, 3:7] == pattern

T80[55, 3:7] == pattern

# now we apply the patterns to the entire data set.
pp1  <-  T80[, 3:7] == pattern

# paste the TRUEs and FALSEs together to form a single variable
concatdat  <-  paste(pp1[, 1], pp1[, 2], pp1[, 3], pp1[, 4],pp1[,5] ,  sep =
"+")

# Assmble new data frame. 
maizedata  <-  data.frame(T80$WS, concatdat)
names(maizedata)  <-  c("WS", "crop_pattern")
str(maizedata)

maizedata$crop_pattern  <-  as.character(maizedata$crop_pattern)

pattern_count  <-  ddply(maizedata, .(crop_pattern), summarize, npattern =
length(crop_pattern))
str(pattern_count)
head(pattern_count);  dim(pattern_count) # 	quick look at data.frame and  its
size.
                                                                                
# FALSE+FALSE+FALSE+FALSE+FALSE accounts for 21,493 values.

which(pattern_count$npattern == max(pattern_count$npattern))  # this does the
same as looking at the data
                                                                                
# not needed here but useful for larger datasets.

# If we graph pattern_count as it stands we lose any useful detail because of
that outlier.
p  <-  ggplot(pattern_count  , aes(crop_pattern, npattern  )) + geom_point()
+
           coord_flip()
p

(pattern1  <-  pattern_count[-1,])  # Drop the offending
FALSE+FALSE+FALSE+FALSE+FALSE
dim(pattern1)  # Okay now we have the maize patterns, without the WS who had no
maize at all.

p  <-  ggplot(pattern1  , aes(crop_pattern, npattern   )) + geom_point() +
           coord_flip()
p 


newmaize  <-  subset(maizedata, maizedata$crop_pattern !=
"FALSE+FALSE+FALSE+FALSE+FALSE")
dim(newmaize) ;  head(newmaize)
str(newmaize)

summaize_by_WS   <-  ddply(newmaize, .(WS), summarize, crop_pattern_ws =
length(crop_pattern))

p  <-  ggplot( summaize_by_WS  , aes(summaize_by_WS  )) + geom_point + 
             coord_flip()
p 

summaize_by_WS_and_crop   <-  ddply(newmaize, .(WS, crop_pattern), summarize,
crop_pattern_ws = length(crop_pattern))

# crappy graph but just try to see what we might get.  probablly need to subset
or use better grid layout.
p  <-  ggplot( summaize_by_WS_and_crop  , aes(crop_pattern, crop_pattern_ws  
)) + geom_point() +
             coord_flip() + facet_grid(WS ~ . )
p 

# save the last graph to look at it in a graphics package== still terrible
ggsave( "/home/john/Rjunk/crop.png")

##> 
> Thanks again for the help.
> Cheers,
> Dd
> 
> ***********************************************************
> Davide Rizzo
> website :: http://sites.google.com/site/ridavide/
> 
> 
> On Sun, Sep 16, 2012 at 8:24 PM, John Kane <jrkrideau at inbox.com>
wrote:
>> Hi Davide,
>> 
>> I had some time this afternoon and I wonder if this approach is llkely
>> to get the results you want?  As before it is not complete but I think
>> it holds promise.
>> 
>> On the other hand Rui is a much better programer than I am so he may
>> have a much cleaner solution.  My way still looks labour-intensive at
>> the moment.
>> 
>> I am using the plyr package which you will probably have to install.
>> load.packages("plyr") should do it.
>> =========================================================>> #
load the plyr package -
>> library(plyr)
>> 
>> # sample data
>> T80<- read.csv("/home/john/rdata/sample.csv",  header =
TRUE, sep = ";")
>> # Davide's actual read statement
>> # T80<-read.table(file="C:/sample.txt", header=T,
sep=";")
>> 
>> # Looking for Maize
>> pattern  <-  c("2Ma", "2Ma","2Ma",
"2Ma","2Ma")
>> 
>> # one row examples to see that is happening
>> T80[1,3:7]
>> T80[1, 3:7] == pattern
>> 
>> T80[405, 3:7]
>> T80[405, 3:7] == pattern
>> 
>> T80[55, 3:7] == pattern
>> 
>> # now we apply the patterns to the entire data set.
>> pp1  <-  T80[, 3:7] == pattern
>> 
>> # paste the TRUEs and FALSEs together to form a single variable
>> concatdat  <-  paste(pp1[, 1], pp1[, 2], pp1[, 3], pp1[, 4],pp1[,5]
,
>> sep = "+")
>> 
>> # Assmble new data frame.
>> maizedata  <-  data.frame(T80$WS, concatdat)
>> names(maizedata)  <-  c("WS", "crop_pattern")
>> 
>> mzcount  <-  ddply(maizedata, .(WS, crop_pattern),  summarize, count
>> length(crop_pattern))
>> mzcount  # This is all the data not just the relevant maise patterns
>> 
>> # This seems to be getting us somewhere though we are not not there yet
>> # Does this subset  look like we are going in the right direction?
>> m51  <-  subset(mzcount,
>> mzcount$crop_pattern == "FALSE+FALSE+FALSE+FALSE+TRUE"
>> | mzcount$crop_pattern == "TRUE+FALSE+FALSE+FALSE+FALSE")
>> 
>> m51  <-  ddply(m51, .(WS), summarize, count = sum(count))
>> m51
>>
================================================================>>
>> John Kane
>> Kingston ON Canada
>> 
>> 
>>> -----Original Message-----
>>> From: ridavide at gmail.com
>>> Sent: Sat, 15 Sep 2012 19:00:29 +0200
>>> To: jrkrideau at inbox.com, ruipbarradas at sapo.pt
>>> Subject: Re: [R] [newbie] aggregating table() results and
simplifying
>>> code with loop
>>> 
>>> Thanks Rui, thanks John for your very different solutions.
>>> 
>>> I'll try to break my questions into smaller steps following
your tips.
>>> However, not everything is clear for me... so before giving you a
>>> feed-back I need to study further your answers. For the moment I
could
>>> specify that I'm looking for the following 19 patterns:
>>> 
>>> 1. True, False, False, False, False # return period of 5 years
(1/2)
>>> 2. False, False, False, False, True # return period of 5 years
(2/2)
>>> 3. True, False, False, False, True # return period of 4 years (1/3)
>>> 4. False, True, False, False, False # return period of 4 years
(2/3)
>>> 5. False, False, False, True, False # return period of 4 years
(3/3)
>>> 6. True, False, False, True, False # return period of 3 years (1/3)
>>> 7. False, True, False, False, True # return period of 3 years (2/3)
>>> 8. False, False, True, False, False # return period of 3 years
(3/3)
>>> 9. False, True, False, True, False # return period of 2 years (1/2)
>>> 10. True, False, True, False, True # return period of 2 years (1/2)
>>> 11. True, True, True, True, True # mono-succession of 5 years
>>> 12. False, True, True, True, True # mono-succession of 4 years
(1/2)
>>> 13. True, True, True, True, False # mono-succession of 4 years
(2/2)
>>> 14. True, False, True, True, True # mono-succession of 3 years
(1/5)
>>> 15. True. True. True. False, True # mono-succession of 3 years
(2/5)
>>> 16. False, False, True, True, True # mono-succession of 3 years
(3/5)
>>> 17. True, True, True, False, False # mono-succession of 3 years
(4/5)
>>> 18. False, True, True, True, False # mono-succession of 3 years
(5/5)
>>> 19. True, True, False, True, True # crops repeated two years
>>> 
>>> In particular, I want to apply all these 19 patterns to 7 (out of
11)
>>> land covers: 2BC, 2Co, 2Ma, 2We, 2MG, 2ML, 2PG. The pattern are so
>>> structured: True means presence of a given land cover (iteratively,
>>> one of the seven listed above), False means any other land-cover
>>> (amidst the remainder 10).
>>> 
>>> Thanks again for any further help.
>>> Greetings,
>>> Dd
>>> 
>>> ***********************************************************
>>> Davide Rizzo
>>> website :: http://sites.google.com/site/ridavide/
>>> 
>>> 
>>> On Sat, Sep 15, 2012 at 5:51 PM, John Kane <jrkrideau at
inbox.com> wrote:
>>>> I have not seen any replies to your questions so I will suggest
an
>>>> approach that may work if I can get a function to work.
>>>> 
>>>> If I understand what you want, you have a pattern something
like this:
>>>> pattern1  <-  c("2Ma",
"no2Ma","no2Ma", "no2Ma","no2Ma")
>>>> pattern2  <-  c("no2Ma", 'no2Ma',
"no2Ma", "no2Ma", "2Ma")
>>>> 
>>>> for each five year period where 2Ma stands to Maize, one of 11
>>>> different
>>>> grains
>>>>   1AU   2BC   2Co   2Ma   2MG   2ML   2oc   2PG   2SA   2We  
3sN
>>>> 
>>>> and what you want to know is if each year gives a pattern like
>>>> 
>>>> check1 <-  c(TRUE, FALSE, FALSE, FALSE, FALSE)
>>>> check2  <-  c(FALSE, FALSE, FALSE, FALSE, TRUE)
>>>> 
>>>> If I understand the patterns you only care for the two above,
is that
>>>> correct?
>>>> 
>>>> I am running out of time today but I think that this approach
will get
>>>> you started
>>>>
==========================================================>>>>
>>>> T80<-read.table(file="C:/sample.txt", header=T,
sep=";")
>>>> 
>>>> # Reminder of just what we want to get as a final result.
>>>> check1 <-  c(TRUE, FALSE, FALSE, FALSE, FALSE)
>>>> check2  <-  c(FALSE, FALSE, FALSE, FALSE, TRUE)
>>>> 
>>>> pattern1  <-  c("2Ma",
"2Ma","2Ma", "2Ma","2Ma")
>>>> 
>>>> # one row examples to see that is happening
>>>> T80[1,3:7]
>>>> T80[1, 3:7] == pattern1
>>>> 
>>>> T80[405, 3:7]
>>>> T80[405, 3:7] == pattern1
>>>> 
>>>> # now we apply the patterns to the entire data set.
>>>> pp1  <-  T80[, 3:7] == pattern1
>>>> pp2  <-  T80[, 3:7] == pattern2
>>>> 
>>>> # reassign the WS values so we know where the data is from
>>>> WSnames  <-  rep(T80$WS, 2)
>>>> 
>>>> # Assmble new data frame.
>>>> maizedata  <-  data.frame(WSnames, rbind(pp1,pp2))
>>>>
=======================================================>>>>
>>>> Now, assuming this runs for you and I have not made a serious
mistake
>>>> in
>>>> logic, kyou should be able to do some subsetting  (?subset)  to
>>>> extract
>>>> only the
>>>> check1 and check2 patterns above.
>>>> 
>>>> This is where I ran into trouble as I don't have the time
this morning
>>>> to work out the subsetting conditions. It looks tricking and
you
>>>> probably need a couple of subsetting moves.
>>>> 
>>>> It's not a pretty  solutlion and, particularly, I expect
someone could
>>>> clean it up to make the subsetting easier or even unnecessary
but I
>>>> hope
>>>> it helps.
>>>> 
>>>> Once you have extracted what you want   use apply() or perhaps
the
>>>> plyr
>>>> package to aggregate the results.
>>>> 
>>>> Repeat for all grains.  Actually look into setting the whole
thing up
>>>> as
>>>> a function. You should be able to write the program once as a
function
>>>> and do a loop or an apply() to do all 11 grains in one go.
>>>> 
>>>> Best of luck.
>>>> 
>>>> John Kane
>>>> Kingston ON Canada
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: ridavide at gmail.com
>>>>> Sent: Thu, 13 Sep 2012 15:36:28 +0200
>>>>> To: r-help at r-project.org
>>>>> Subject: [R] [newbie] aggregating table() results and
simplifying
>>>>> code
>>>>> with loop
>>>>> 
>>>>> Dear all,
>>>>> I'm looking for primary help at aggregating table()
results and at
>>>>> writing a loop (if useful)
>>>>> 
>>>>> My dataset ( http://goo.gl/gEPKW ) is composed of 23k rows,
each one
>>>>> representing a point in the space of which we know the land
cover
>>>>> over
>>>>> 10 years (column y01 to y10).
>>>>> 
>>>>> I need to analyse it with a temporal sliding window of 5
years (y01
>>>>> to
>>>>> y05, y02 to y06 and so forth)
>>>>> For each period I'm looking for specific sequences
(e.g., Maize,
>>>>> -noMaize, -noMaize, -noMaize, -noMaize) to calculate the
"return
>>>>> time"
>>>>> of principal land covers: barley (2BC), colza (2Co), maize
(2Ma),
>>>>> etc.
>>>>> I define the "return time" as the presence of a
given land cover
>>>>> according to a given sequence. Hence, each return time
could require
>>>>> the sum of different sequences (e.g., a return time of 5
years
>>>>> derives
>>>>> from the sum of [2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
>>>>> [no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
>>>>> I need to repeat the calculation for each land cover for
each time
>>>>> window. In addition, I need to repeat the process over
three datasets
>>>>> (the one I give is the first one, the second one is from
year 12 to
>>>>> year 24, the third one from year 27 to year 31. So I have
breaks in
>>>>> the monitoring of land cover that avoid me to create a
continuous
>>>>> dataset). At the end I expect to aggregate the sum for each
spatial
>>>>> entity (column WS)
>>>>> 
>>>>> I've started writing the code for the first crop in the
first 5yrs
>>>>> period (http://goo.gl/FhZNx) then copying and pasting it
for each
>>>>> crop
>>>>> then for each time window...
>>>>> Moreover I do not know how to aggregate the results of
table(). (NB
>>>>> sometimes I have a different number of WS per table because
a given
>>>>> sequence could be absent in a given spatial entity... so I
have the
>>>>> following warning msg: number of columns of result is not a
multiple
>>>>> of vector length (arg 1)). Therefore, I'm
"obliged" to copy&paste the
>>>>> table corresponding to each sequence....
>>>>> 
>>>>> FIRST QUEST. How to aggregate the results of table() when
the number
>>>>> of columns is different?
>>>>> Or the other way around: Is there a way to have a table
where each
>>>>> row
>>>>> reports the number of points per time return per WS?
something like
>>>>> 
>>>>> WS1    WS2    WS3    WS4    ...    WS16    crop    period
>>>>> 23    15    18    43    ...    52       Ma5    01
>>>>> 18    11    25    84    ...    105       Ma2    01
>>>>> ...    ...    ...    ...    ...    ...    ...    ...
>>>>> ...    ...    ...    ...    ...    ...    Co5    01
>>>>> ...    ...    ...    ...    ...    ...    ...    ...
>>>>> ...    ...    ...    ...    ...    ...    Ma5    02
>>>>> ...    ...    ...    ...    ...    ...    ...    ...
>>>>> In this table each row should represent a return time for a
given
>>>>> land
>>>>> cover a given period (one of the 6 time window of 5 years)?
>>>>> 
>>>>> SECOND QUEST. Could a loop (instead of a modular copy/paste
code)
>>>>> improve the time/reliability of the calculation? If yes,
could you
>>>>> please indicate me some entry-level references to write it?
>>>>> 
>>>>> I am aware this are newbie's questions, but I have not
be able to
>>>>> solve them using manuals and available sources.
>>>>> Thank you in advance for your help.
>>>>> 
>>>>> Greetings,
>>>>> Dd
>>>>> 
>>>>> PS
>>>>> R: version 2.14.2 (2012-02-29)
>>>>> OS: MS Windows XP Home 32-bit SP3
>>>>> 
>>>>> *****************************
>>>>> Davide Rizzo
>>>>> post-doc researcher
>>>>> INRA UR055 SAD-ASTER
>>>>> website :: http://sites.google.com/site/ridavide/
>>>>
____________________________________________________________
FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop!

John Kane

2012-Sep-18 14:29 UTC

head link

[R] [newbie] aggregating table() results and simplifying code with loop

Looks very interesting and as I said you're a much better progamer than I
am. I get headaches using lapply()!

I am going to have to spend some time stepping through the functions to really
see what you are doing.   However if I understand the output, for example one
can simply aggreate the information by Crop and Period in each element of the
list to get the corresponding sums for each of the desired patterns?

Thus for example in result[[1]] we get all the results for condition 19 (crops)
from wanted?


John Kane
Kingston ON Canada

> -----Original Message-----
> From: ruipbarradas at sapo.pt
> Sent: Mon, 17 Sep 2012 16:04:15 +0100
> To: jrkrideau at inbox.com, ridavide at gmail.com, jrkrideau at inbox.com
> Subject: Re: [R] [newbie] aggregating table() results and simplifying
> code with loop
> 
> Hello,
> 
> Now I believe I've got it. (So-so believe.)
> ddply is a good idea, in my first post I was using xtabs to the same
> effect, but ddply makes it a lot easier.
> I had made a big mistake, though. I thought that only sequences of FALSE
> were valid, hence rle(). In a follow-up post, Davide listed all
> combinations of T/F that matter. The code below uses them to create a df
> of cases to count with ddply. The creation is done by merge, on the T/F
> columns (function f1) and the rest is just to apply to 6 periods of 5
> years each, after applying to covers, after applying to years of crop
> type.
> Sounds confusing, and it probably is but I've separated all of it in
> small functions.
> Each of the functions is in fact very simple, and all one needs is to
> remember that the inner most function is reached after a series of
> *apply statements, doing one thing at a time.
> 
> 
>
#------------------------------------------------------------------------------
> # data.frame of target sequences of T/F per crop type and years
> # (from an earlier post)
> 
> wanted <-
> structure(list(V2 = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE,
> FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE,
> TRUE, FALSE, TRUE), V3 = c(FALSE, FALSE, FALSE, TRUE, FALSE,
> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE,
> FALSE, TRUE, TRUE, TRUE), V4 = c(FALSE, FALSE, FALSE, FALSE,
> FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE,
> TRUE, TRUE, TRUE, TRUE, FALSE), V5 = c(FALSE, FALSE, FALSE, FALSE,
> TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE,
> FALSE, TRUE, FALSE, TRUE, TRUE), V6 = c(FALSE, TRUE, TRUE, FALSE,
> FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE,
> TRUE, TRUE, FALSE, FALSE, TRUE), V7 = c(" return", "
return",
> " return", " return", " return", "
return", " return", " return",
> " return", " return", " mono-succession",
" mono-succession",
> " mono-succession", " mono-succession", "
mono-succession", "
> mono-succession",
> " mono-succession", " mono-succession", "
crops"), V8 = c(5L,
> 5L, 4L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 5L, 4L, 4L, 3L, 3L, 3L, 3L,
> 3L, 2L)), .Names = c("V2", "V3", "V4",
"V5", "V6", "V7", "V8"
> ), row.names = c(NA, -19L), class = "data.frame")
> 
> want <- split(wanted, wanted$V7)
> 
> #--------------------------------------------------------------------
> # The functions go from f5 down to f1 (f5 calls f4, f4 calls f3, etc.)
> #
> f1 <- function(DF, Mat, currcols, WS){
> 	mcols <- names(T80[, ycols])[currcols]  # names for 'merge'
> 	names(DF)[1:5] <- mcols
> 	m <- merge(DF, data.frame(Mat[, mcols], WS = WS))
> 	ddply(m, .(WS), summarize, length(V8))
> }
> f2 <- function(w, tf, WS){
> 	l1 <- lapply(yrs, function(yy) t(f1(w, tf, yy, WS)))
> 	l2 <- lapply(l1, function(x){
> 		nms <- as.character(x["WS",])
> 		resline[ nms ] <- x[2, ]
> 		resline})
> 	res <- do.call(rbind, l2)
> 	data.frame(res)
> }
> f3 <- function(Cover, w){
> 	tf <- T80[, ycols] == Cover
> 	inx <- as.vector(apply(tf, 1, function(.x) any(.x)))
> 	tf <- tf[inx, ]
> 	WS <- T80$WS[inx]
> 	# Special care with empty sets
> 	if(sum(tf) > 0){
> 		res <- f2(w, tf, WS)
> 		na <- as.vector(apply(res, 1, function(.x) all(is.na(.x))))
> 		res <- res[!na, ]
> 		if(!is.null(res) && nrow(res) > 0){
> 			res$Cover <- Cover
> 			res$Period <- (1:6)[!na]
> 		}else res <- NULL
> 	}else res <- NULL
> 	res
> }
> f4 <- function(w){
> 	res <- do.call(rbind, lapply(covers, f3, w))
> 	# Special care with empty sets
> 	if(!is.null(nrow(res)) && nrow(res) > 0){
> 		res <- data.frame(res)
> 		res$Years <- w$V8[1]
> 	}else res <- NULL
> 	res
> }
> f5 <- function(w){
> 	sp <- split(w, w$V8)
> 	res <- do.call( rbind, lapply(sp, f4) )
> 	res <- data.frame(res)
> 	res
> }
> 
> T80 <- read.table("sample.txt", header = TRUE, sep =
";")
> 
> ycols <- grep("y", names(T80))
> ws <- unique(T80$WS)
> covers <- as.character(unique(unlist(T80[ycols])))
> yrs <- lapply(0:5, `+`, 1:5)
> resline <- rep(NA, length(ws))
> names(resline) <- ws
> 
> result <- lapply(want, f5)
> 
> 
> (Ok, maybe two or three things at a time.)
> 
> I hope this is it!
> Have fun.
> 
> Rui Barradas
> 
> Em 17-09-2012 14:19, John Kane escreveu:
>> Comments in-line below.
>> 
>> Sorry to be so long getting back to you but sleep intervened.
>> 
>>> -----Original Message-----
>>> From: ridavide at gmail.com
>>> Sent: Sun, 16 Sep 2012 21:24:15 +0200
>>> To: jrkrideau at inbox.com
>>> Subject: Re: [R] [newbie] aggregating table() results and
simplifying
>>> code with loop
>>> 
>>> Thank you John,
>>> 
>>> you are giving me two precious tips (in addition, well explained!):
>>> 1. to use the package plyr (I didn't know it before, but it
seems to
>>> make the deal!)
>>> 2. a smart and promising way to use it
>>> 
>>> I can finally plot the partial results, to have a first glance and
>>> compare to them
>>> 
>>>
=========================================================>>>
>>> # once the sequences for a given crop have been assembled in a
>>> dataframe (e.g., maizedata)
>>> library(latticeExtra) # load the package for "improved"
graphics
>>> dotplot(mzcount$crop_pattern) # to plot the occurrence of all the
>>> retrieved sequences
>>> 
>>> #once the target pattern have been subset (e.g., m51)
>>> dotplot(m51$WS ~ m51$count) # to plot the occurrence of the target
>>> pattern(s) per watershed
>> Ah, so that's what WS stands for!  Do you have an abstract or short
>> summary of what you are doing in English or French?  My Itallian is
>> limited to about 3 words or what I can guess from French/Latin.
>> 
>> I think that you are misunderstanding what is happening in m51. It is
>> just summarizing the total counts across those two conditions.  I think
>> either I made a logic error or was not understanding what was needed.
In
>> any case the dotplot is not plotting the actual occurances but the
>> number of times the crop_pattern was found.  The count is the actual
>> number of occurances per WS per crop_pattern
>> 
>> 
>> Have a look at my new effort.  it is still ugly but I think it more
>> accurately supplies some of what you want.
>> 
>> Two caveats : 1.  This is still very rough and an better programer may
>> have a better approach.
>> 2. I used the package ggplot2 to do the graphs. One more thing to learn
>> but I am not used to lattice and I am used to ggplot2. Rather than
spend
>> a lot of time with lattice or
>> 2. The last faceted plots are not intended to be of real use, just my
>> quick look to see if anything looked like I expected. A bit of thinking
>> probably would give somethimg much better
>> 
>> 
>> 
>> 
>> 
>> 
>>>
=========================================================>>>
>>> This help me retrieving all the possible patterns for the different
>>> land covers. Hence, you made me able to improve the subset of
>>> patterns.
>>> 
>>> Tomorrow morning I'm going to test the tips for all the land
covers
>>> for all the 5years time-periods. Even if still labour-intensive,
the
>>> solution you propose it's surely a steady improvement.
>>> Now only the second question remains: is there a way "to
clean" this
>>> approach?... or probably not ?...
>> Yes I suspect that there are lots of ways to 'clean' the
process and
>> automate it so that it would apply to all three data sets quite easily.
>> Some I think I can help with and some you may need other people's 
help.
>> 
>> For example, once we have a working model to handle one or two
>> conditions It should be relatively easy to use an apply() or a loop to
>> handle all of them and so on..
>> 
>> Well, I'm off to work now so I probably won't be able to get
back to
>> much before late evening my time ( probably after you are asleep) I
>> think I'm 6 hours ahead of you.
>> 
>> ##===================Revised approach===================>> # load
the various packages (plyr, latticeExtra, ggplot2, reshape2)
>> library(plyr)
>> library(latticeExtra)
>> library(ggplot2)
>> library(reshape2)
>> 
>> # sample data
>> T80<- read.csv("/home/john/rdata/sample.csv",  header =
TRUE, sep = ";")
>> # Davide's actual read statement
>> # T80<-read.table(file="C:/sample.txt", header=T,
sep=";")
>> 
>> # Looking for Maize
>> pattern  <-  c("2Ma", "2Ma","2Ma",
"2Ma","2Ma")
>> 
>> # one row examples to see that is happening
>> T80[1,3:7]
>> T80[1, 3:7] == pattern
>> 
>> T80[405, 3:7]
>> T80[405, 3:7] == pattern
>> 
>> T80[55, 3:7] == pattern
>> 
>> # now we apply the patterns to the entire data set.
>> pp1  <-  T80[, 3:7] == pattern
>> 
>> # paste the TRUEs and FALSEs together to form a single variable
>> concatdat  <-  paste(pp1[, 1], pp1[, 2], pp1[, 3], pp1[, 4],pp1[,5]
,
>> sep = "+")
>> 
>> # Assmble new data frame.
>> maizedata  <-  data.frame(T80$WS, concatdat)
>> names(maizedata)  <-  c("WS", "crop_pattern")
>> str(maizedata)
>> 
>> maizedata$crop_pattern  <-  as.character(maizedata$crop_pattern)
>> 
>> pattern_count  <-  ddply(maizedata, .(crop_pattern), summarize,
npattern
>> = length(crop_pattern))
>> str(pattern_count)
>> head(pattern_count);  dim(pattern_count) # 	quick look at data.frame
and
>> its size.
>> 
>> # FALSE+FALSE+FALSE+FALSE+FALSE accounts for 21,493 values.
>> 
>> which(pattern_count$npattern == max(pattern_count$npattern))  # this
>> does the same as looking at the data
>> 
>> # not needed here but useful for larger datasets.
>> 
>> # If we graph pattern_count as it stands we lose any useful detail
>> because of that outlier.
>> p  <-  ggplot(pattern_count  , aes(crop_pattern, npattern  )) +
>> geom_point() +
>>             coord_flip()
>> p
>> 
>> (pattern1  <-  pattern_count[-1,])  # Drop the offending
>> FALSE+FALSE+FALSE+FALSE+FALSE
>> dim(pattern1)  # Okay now we have the maize patterns, without the WS
who
>> had no maize at all.
>> 
>> p  <-  ggplot(pattern1  , aes(crop_pattern, npattern   )) +
geom_point()
>> +
>>             coord_flip()
>> p
>> 
>> 
>> newmaize  <-  subset(maizedata, maizedata$crop_pattern !>>
"FALSE+FALSE+FALSE+FALSE+FALSE")
>> dim(newmaize) ;  head(newmaize)
>> str(newmaize)
>> 
>> summaize_by_WS   <-  ddply(newmaize, .(WS), summarize,
crop_pattern_ws >> length(crop_pattern))
>> 
>> p  <-  ggplot( summaize_by_WS  , aes(summaize_by_WS  )) + geom_point
+
>>               coord_flip()
>> p
>> 
>> summaize_by_WS_and_crop   <-  ddply(newmaize, .(WS, crop_pattern),
>> summarize, crop_pattern_ws = length(crop_pattern))
>> 
>> # crappy graph but just try to see what we might get.  probablly need
to
>> subset or use better grid layout.
>> p  <-  ggplot( summaize_by_WS_and_crop  , aes(crop_pattern,
>> crop_pattern_ws   )) + geom_point() +
>>               coord_flip() + facet_grid(WS ~ . )
>> p
>> 
>> # save the last graph to look at it in a graphics package== still
>> terrible
>> ggsave( "/home/john/Rjunk/crop.png")
>> 
>> ##
>>> Thanks again for the help.
>>> Cheers,
>>> Dd
>>> 
>>> ***********************************************************
>>> Davide Rizzo
>>> website :: http://sites.google.com/site/ridavide/
>>> 
>>> 
>>> On Sun, Sep 16, 2012 at 8:24 PM, John Kane <jrkrideau at
inbox.com> wrote:
>>>> Hi Davide,
>>>> 
>>>> I had some time this afternoon and I wonder if this approach is
llkely
>>>> to get the results you want?  As before it is not complete but
I think
>>>> it holds promise.
>>>> 
>>>> On the other hand Rui is a much better programer than I am so
he may
>>>> have a much cleaner solution.  My way still looks
labour-intensive at
>>>> the moment.
>>>> 
>>>> I am using the plyr package which you will probably have to
install.
>>>> load.packages("plyr") should do it.
>>>>
=========================================================>>>> # load
the plyr package -
>>>> library(plyr)
>>>> 
>>>> # sample data
>>>> T80<- read.csv("/home/john/rdata/sample.csv", 
header = TRUE, sep >>>> ";")
>>>> # Davide's actual read statement
>>>> # T80<-read.table(file="C:/sample.txt", header=T,
sep=";")
>>>> 
>>>> # Looking for Maize
>>>> pattern  <-  c("2Ma",
"2Ma","2Ma", "2Ma","2Ma")
>>>> 
>>>> # one row examples to see that is happening
>>>> T80[1,3:7]
>>>> T80[1, 3:7] == pattern
>>>> 
>>>> T80[405, 3:7]
>>>> T80[405, 3:7] == pattern
>>>> 
>>>> T80[55, 3:7] == pattern
>>>> 
>>>> # now we apply the patterns to the entire data set.
>>>> pp1  <-  T80[, 3:7] == pattern
>>>> 
>>>> # paste the TRUEs and FALSEs together to form a single variable
>>>> concatdat  <-  paste(pp1[, 1], pp1[, 2], pp1[, 3], pp1[,
4],pp1[,5] ,
>>>> sep = "+")
>>>> 
>>>> # Assmble new data frame.
>>>> maizedata  <-  data.frame(T80$WS, concatdat)
>>>> names(maizedata)  <-  c("WS",
"crop_pattern")
>>>> 
>>>> mzcount  <-  ddply(maizedata, .(WS, crop_pattern), 
summarize, count >>>> length(crop_pattern))
>>>> mzcount  # This is all the data not just the relevant maise
patterns
>>>> 
>>>> # This seems to be getting us somewhere though we are not not
there
>>>> yet
>>>> # Does this subset  look like we are going in the right
direction?
>>>> m51  <-  subset(mzcount,
>>>> mzcount$crop_pattern ==
"FALSE+FALSE+FALSE+FALSE+TRUE"
>>>> | mzcount$crop_pattern ==
"TRUE+FALSE+FALSE+FALSE+FALSE")
>>>> 
>>>> m51  <-  ddply(m51, .(WS), summarize, count = sum(count))
>>>> m51
>>>>
================================================================>>>>
>>>> John Kane
>>>> Kingston ON Canada
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: ridavide at gmail.com
>>>>> Sent: Sat, 15 Sep 2012 19:00:29 +0200
>>>>> To: jrkrideau at inbox.com, ruipbarradas at sapo.pt
>>>>> Subject: Re: [R] [newbie] aggregating table() results and
simplifying
>>>>> code with loop
>>>>> 
>>>>> Thanks Rui, thanks John for your very different solutions.
>>>>> 
>>>>> I'll try to break my questions into smaller steps
following your
>>>>> tips.
>>>>> However, not everything is clear for me... so before giving
you a
>>>>> feed-back I need to study further your answers. For the
moment I
>>>>> could
>>>>> specify that I'm looking for the following 19 patterns:
>>>>> 
>>>>> 1. True, False, False, False, False # return period of 5
years (1/2)
>>>>> 2. False, False, False, False, True # return period of 5
years (2/2)
>>>>> 3. True, False, False, False, True # return period of 4
years (1/3)
>>>>> 4. False, True, False, False, False # return period of 4
years (2/3)
>>>>> 5. False, False, False, True, False # return period of 4
years (3/3)
>>>>> 6. True, False, False, True, False # return period of 3
years (1/3)
>>>>> 7. False, True, False, False, True # return period of 3
years (2/3)
>>>>> 8. False, False, True, False, False # return period of 3
years (3/3)
>>>>> 9. False, True, False, True, False # return period of 2
years (1/2)
>>>>> 10. True, False, True, False, True # return period of 2
years (1/2)
>>>>> 11. True, True, True, True, True # mono-succession of 5
years
>>>>> 12. False, True, True, True, True # mono-succession of 4
years (1/2)
>>>>> 13. True, True, True, True, False # mono-succession of 4
years (2/2)
>>>>> 14. True, False, True, True, True # mono-succession of 3
years (1/5)
>>>>> 15. True. True. True. False, True # mono-succession of 3
years (2/5)
>>>>> 16. False, False, True, True, True # mono-succession of 3
years (3/5)
>>>>> 17. True, True, True, False, False # mono-succession of 3
years (4/5)
>>>>> 18. False, True, True, True, False # mono-succession of 3
years (5/5)
>>>>> 19. True, True, False, True, True # crops repeated two
years
>>>>> 
>>>>> In particular, I want to apply all these 19 patterns to 7
(out of 11)
>>>>> land covers: 2BC, 2Co, 2Ma, 2We, 2MG, 2ML, 2PG. The pattern
are so
>>>>> structured: True means presence of a given land cover
(iteratively,
>>>>> one of the seven listed above), False means any other
land-cover
>>>>> (amidst the remainder 10).
>>>>> 
>>>>> Thanks again for any further help.
>>>>> Greetings,
>>>>> Dd
>>>>> 
>>>>> ***********************************************************
>>>>> Davide Rizzo
>>>>> website :: http://sites.google.com/site/ridavide/
>>>>> 
>>>>> 
>>>>> On Sat, Sep 15, 2012 at 5:51 PM, John Kane <jrkrideau at
inbox.com>
>>>>> wrote:
>>>>>> I have not seen any replies to your questions so I will
suggest an
>>>>>> approach that may work if I can get a function to work.
>>>>>> 
>>>>>> If I understand what you want, you have a pattern
something like
>>>>>> this:
>>>>>> pattern1  <-  c("2Ma",
"no2Ma","no2Ma", "no2Ma","no2Ma")
>>>>>> pattern2  <-  c("no2Ma", 'no2Ma',
"no2Ma", "no2Ma", "2Ma")
>>>>>> 
>>>>>> for each five year period where 2Ma stands to Maize,
one of 11
>>>>>> different
>>>>>> grains
>>>>>>    1AU   2BC   2Co   2Ma   2MG   2ML   2oc   2PG   2SA 
2We   3sN
>>>>>> 
>>>>>> and what you want to know is if each year gives a
pattern like
>>>>>> 
>>>>>> check1 <-  c(TRUE, FALSE, FALSE, FALSE, FALSE)
>>>>>> check2  <-  c(FALSE, FALSE, FALSE, FALSE, TRUE)
>>>>>> 
>>>>>> If I understand the patterns you only care for the two
above, is
>>>>>> that
>>>>>> correct?
>>>>>> 
>>>>>> I am running out of time today but I think that this
approach will
>>>>>> get
>>>>>> you started
>>>>>>
==========================================================>>>>>>
>>>>>> T80<-read.table(file="C:/sample.txt",
header=T, sep=";")
>>>>>> 
>>>>>> # Reminder of just what we want to get as a final
result.
>>>>>> check1 <-  c(TRUE, FALSE, FALSE, FALSE, FALSE)
>>>>>> check2  <-  c(FALSE, FALSE, FALSE, FALSE, TRUE)
>>>>>> 
>>>>>> pattern1  <-  c("2Ma",
"2Ma","2Ma", "2Ma","2Ma")
>>>>>> 
>>>>>> # one row examples to see that is happening
>>>>>> T80[1,3:7]
>>>>>> T80[1, 3:7] == pattern1
>>>>>> 
>>>>>> T80[405, 3:7]
>>>>>> T80[405, 3:7] == pattern1
>>>>>> 
>>>>>> # now we apply the patterns to the entire data set.
>>>>>> pp1  <-  T80[, 3:7] == pattern1
>>>>>> pp2  <-  T80[, 3:7] == pattern2
>>>>>> 
>>>>>> # reassign the WS values so we know where the data is
from
>>>>>> WSnames  <-  rep(T80$WS, 2)
>>>>>> 
>>>>>> # Assmble new data frame.
>>>>>> maizedata  <-  data.frame(WSnames, rbind(pp1,pp2))
>>>>>>
=======================================================>>>>>>
>>>>>> Now, assuming this runs for you and I have not made a
serious
>>>>>> mistake
>>>>>> in
>>>>>> logic, kyou should be able to do some subsetting 
(?subset)  to
>>>>>> extract
>>>>>> only the
>>>>>> check1 and check2 patterns above.
>>>>>> 
>>>>>> This is where I ran into trouble as I don't have
the time this
>>>>>> morning
>>>>>> to work out the subsetting conditions. It looks
tricking and you
>>>>>> probably need a couple of subsetting moves.
>>>>>> 
>>>>>> It's not a pretty  solutlion and, particularly, I
expect someone
>>>>>> could
>>>>>> clean it up to make the subsetting easier or even
unnecessary but I
>>>>>> hope
>>>>>> it helps.
>>>>>> 
>>>>>> Once you have extracted what you want   use apply() or
perhaps the
>>>>>> plyr
>>>>>> package to aggregate the results.
>>>>>> 
>>>>>> Repeat for all grains.  Actually look into setting the
whole thing
>>>>>> up
>>>>>> as
>>>>>> a function. You should be able to write the program
once as a
>>>>>> function
>>>>>> and do a loop or an apply() to do all 11 grains in one
go.
>>>>>> 
>>>>>> Best of luck.
>>>>>> 
>>>>>> John Kane
>>>>>> Kingston ON Canada
>>>>>> 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: ridavide at gmail.com
>>>>>>> Sent: Thu, 13 Sep 2012 15:36:28 +0200
>>>>>>> To: r-help at r-project.org
>>>>>>> Subject: [R] [newbie] aggregating table() results
and simplifying
>>>>>>> code
>>>>>>> with loop
>>>>>>> 
>>>>>>> Dear all,
>>>>>>> I'm looking for primary help at aggregating
table() results and at
>>>>>>> writing a loop (if useful)
>>>>>>> 
>>>>>>> My dataset ( http://goo.gl/gEPKW ) is composed of
23k rows, each
>>>>>>> one
>>>>>>> representing a point in the space of which we know
the land cover
>>>>>>> over
>>>>>>> 10 years (column y01 to y10).
>>>>>>> 
>>>>>>> I need to analyse it with a temporal sliding window
of 5 years (y01
>>>>>>> to
>>>>>>> y05, y02 to y06 and so forth)
>>>>>>> For each period I'm looking for specific
sequences (e.g., Maize,
>>>>>>> -noMaize, -noMaize, -noMaize, -noMaize) to
calculate the "return
>>>>>>> time"
>>>>>>> of principal land covers: barley (2BC), colza
(2Co), maize (2Ma),
>>>>>>> etc.
>>>>>>> I define the "return time" as the
presence of a given land cover
>>>>>>> according to a given sequence. Hence, each return
time could
>>>>>>> require
>>>>>>> the sum of different sequences (e.g., a return time
of 5 years
>>>>>>> derives
>>>>>>> from the sum of [2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
>>>>>>> [no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
>>>>>>> I need to repeat the calculation for each land
cover for each time
>>>>>>> window. In addition, I need to repeat the process
over three
>>>>>>> datasets
>>>>>>> (the one I give is the first one, the second one is
from year 12 to
>>>>>>> year 24, the third one from year 27 to year 31. So
I have breaks in
>>>>>>> the monitoring of land cover that avoid me to
create a continuous
>>>>>>> dataset). At the end I expect to aggregate the sum
for each spatial
>>>>>>> entity (column WS)
>>>>>>> 
>>>>>>> I've started writing the code for the first
crop in the first 5yrs
>>>>>>> period (http://goo.gl/FhZNx) then copying and
pasting it for each
>>>>>>> crop
>>>>>>> then for each time window...
>>>>>>> Moreover I do not know how to aggregate the results
of table(). (NB
>>>>>>> sometimes I have a different number of WS per table
because a given
>>>>>>> sequence could be absent in a given spatial
entity... so I have the
>>>>>>> following warning msg: number of columns of result
is not a
>>>>>>> multiple
>>>>>>> of vector length (arg 1)). Therefore, I'm
"obliged" to copy&paste
>>>>>>> the
>>>>>>> table corresponding to each sequence....
>>>>>>> 
>>>>>>> FIRST QUEST. How to aggregate the results of
table() when the
>>>>>>> number
>>>>>>> of columns is different?
>>>>>>> Or the other way around: Is there a way to have a
table where each
>>>>>>> row
>>>>>>> reports the number of points per time return per
WS? something like
>>>>>>> 
>>>>>>> WS1    WS2    WS3    WS4    ...    WS16    crop   
period
>>>>>>> 23    15    18    43    ...    52       Ma5    01
>>>>>>> 18    11    25    84    ...    105       Ma2    01
>>>>>>> ...    ...    ...    ...    ...    ...    ...   
...
>>>>>>> ...    ...    ...    ...    ...    ...    Co5    01
>>>>>>> ...    ...    ...    ...    ...    ...    ...   
...
>>>>>>> ...    ...    ...    ...    ...    ...    Ma5    02
>>>>>>> ...    ...    ...    ...    ...    ...    ...   
...
>>>>>>> In this table each row should represent a return
time for a given
>>>>>>> land
>>>>>>> cover a given period (one of the 6 time window of 5
years)?
>>>>>>> 
>>>>>>> SECOND QUEST. Could a loop (instead of a modular
copy/paste code)
>>>>>>> improve the time/reliability of the calculation? If
yes, could you
>>>>>>> please indicate me some entry-level references to
write it?
>>>>>>> 
>>>>>>> I am aware this are newbie's questions, but I
have not be able to
>>>>>>> solve them using manuals and available sources.
>>>>>>> Thank you in advance for your help.
>>>>>>> 
>>>>>>> Greetings,
>>>>>>> Dd
>>>>>>> 
>>>>>>> PS
>>>>>>> R: version 2.14.2 (2012-02-29)
>>>>>>> OS: MS Windows XP Home 32-bit SP3
>>>>>>> 
>>>>>>> *****************************
>>>>>>> Davide Rizzo
>>>>>>> post-doc researcher
>>>>>>> INRA UR055 SAD-ASTER
>>>>>>> website :: http://sites.google.com/site/ridavide/
>> ____________________________________________________________
>> FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop!
>> Check it out at http://www.inbox.com/earth
>> 
>> 
>
____________________________________________________________
GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at
http://www.inbox.com/smileys
Works with AIM?, MSN? Messenger, Yahoo!? Messenger, ICQ?, Google Talk? and most
webmails

John Kane

2012-Sep-19 13:29 UTC

head link

[R] [newbie] aggregating table() results and simplifying code with loop

> -----Original Message-----
> From: ridavide at gmail.com
> Sent: Wed, 19 Sep 2012 00:42:50 +0200
> To: ruipbarradas at sapo.pt
> Subject: Re: [R] [newbie] aggregating table() results and simplifying
> code with loop
> 
> Hi dear R-helpers,
> 
> I really appreciate your "coding the answers" to (my) questions.
> I am now in the situation to get results in a very shorter time and
> with a proper code.
> 
> +++++ the study (summary) +++++++++++++++++++++
> 
> In (short) response to the question by John, the global target of the
> analysis is to identify the dynamics of crop sequences in a given
> region. The spatial grain is the elementary watershed (as defined by
> the local water agency). The temporal grain is a 5 year period (it
> allows, according to previous analyses, to grasp the major crop
> sequences for the study area). The input data is a grid of points
> where the land cover has been monitored since the 80's up to now (with
> a couple of interruptions). The first part of the analysis was aimed
> at characterizing crop sequences and their occurrences in the
> different watersheds.
> The step of the analysis for what I'm asking your help is aimed at
> calculating the return time of the most relevant crops (from an
> agronomic point of view), on which operate then a clustering. Raw
> attempts I've tried so far with other methods appeared to give
> promising results, so I've decided to improve the approach using R.
> Here is where you come in.
> +++++++++++++++++++++++++++++++++++++++++++
> 
> ++++++ (unsolved) questions +++++++++++++++++++
> 
> The proposal by John is clear to me, probably because his way of
> coding is closer to my capability to understand. 
That's because I'm much closer to you in knowledge about R.  
> The solution by Rui
> looks interesting, complete and effective (few elementary functions
> and reduced time to have the results to be used in the other steps of
> the analysis).
> 
> Nevertheless I'm still in trouble because I'm not used with
function
> (that's one of the reasons I'm here).
> The following questions will sound naive to you, but I'm not able to
> decrypt Rui's code without asking again your help. Basic manuals (like
> "R Language Definition" ch. 4, or "The R Inferno" ch.
5) were not
> sufficient for me to understand how "w", "DF",
"Mat", "WS" and
> "currcols" are defined.
> In other words, I see that the five embedded functions Rui has
> proposed call for the above mentioned arguments, but I'm not able to
> understand at which moment these arguments are defined.
> +++++++++++++++++++++++++++++++++++++++++++
Rui's approach is "very, very  nice but does seem a bit opaque to us
newbies.

I think the first thing is : Do you undrestand the 'result' list
structure? It sounds like you do but basically he is processing each set of
conditions that you supplied and storing them indiviually as data frames in each
element of the list.

Secondly, to work through the functions you should be able to step through each
function without running the function.


For example for F5 the working parts are
===========================================================sp <- split(w,
w$V8)
	res <- do.call( rbind, lapply(sp, f4) )
	res <- data.frame(res)
	res
============================================================
We see from the main program that 

result <- lapply(want, f5)

so for the above code want is the "w" value.  Just stick it into the
code and see what happens. That is print out values as you go.

Each call seems to go to another function but bit by bit you should be able to
see what is happening.  I suspect it will be messy but it should help a bit.

Another trick might be to make up a very tiny data set that meets all conditions
and run the program in, I think , debug mode and stick in print statements all
along the way so that you can get a feel for what is happening at each step.

In any case you should be able to extact each condition from result by something
like crop  <-  result[[1]] (note [[1]]  rather than [1] and run ddply() or
aggregate() on the resulting data frame to see if the output looks reasonable.

Thanks for the quick summary.  Once I realised that WS = Watershed I thought
that was what it was likely to be about but it`s handy to know and sometimes
even a slight bit of knowledge about the subject area can help alert one to
something strange in the output that may indicate a programing or logic problem.

Good luck.   R is amazingly powerful and can often produce results much more
quickly than other stats language but it requires a different mindset. I used to
get very funny mind aches when I first started using it after years of SAS or
SYSTAT or even SPSS.
> 
> Everything flows without warnings, so I have the bad feeling to find
> myself less smart than R...
> That not a big problem, but having not understood the arguments, I do
> not understand the functions (so, the results).
> Thanks in advance for meeting this call for some extra-help...
> 
> Take care,
> Dd
> 
> ***********************************************************
> Davide Rizzo
> website :: http://sites.google.com/site/ridavide/
> 
> 
> On Tue, Sep 18, 2012 at 5:08 PM, Rui Barradas <ruipbarradas at
sapo.pt>
> wrote:
>> 
>> Hello,
>> Em 18-09-2012 15:29, John Kane escreveu:
>> 
>>> Looks very interesting and as I said you're a much better
progamer than
>>> I am. I get headaches using lapply()!
>> 
>> 
>> Thanks!
>> As for lapply, if you understand apply you'll understand lapply.
With
>> some differences, the principle is the same.
>> 
>>> 
>>> I am going to have to spend some time stepping through the
functions to
>>> really see what you are doing.
>> 
>> 
>> I've tried to keep each function simple, but it can become
complicated
>> to follow the call tree down to f1.
>> 
>> 
>>>     However if I understand the output, for example one can simply
>>> aggreate the information by Crop and Period in each element of the
list
>>> to get the corresponding sums for each of the desired patterns?
>>> 
>>> Thus for example in result[[1]] we get all the results for
condition 19
>>> (crops) from wanted?
>> 
>> 
>> Yes, or at least it's supposed to. And I've kept in separate
list
>> elements precisely to have simplified access to each combination of
Crop
>> and Period.
>> (Condition 19, the last one, comes first because split sorts its output
>> by list element name, and 'crop' is the first.)
>> 
>> Rui Barradas
>> 
>> 
>>> 
>>> 
>>> John Kane
>>> Kingston ON Canada
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: ruipbarradas at sapo.pt
>>>> Sent: Mon, 17 Sep 2012 16:04:15 +0100
>>>> To: jrkrideau at inbox.com, ridavide at gmail.com, jrkrideau at
inbox.com
>>>> Subject: Re: [R] [newbie] aggregating table() results and
simplifying
>>>> code with loop
>>>> 
>>>> Hello,
>>>> 
>>>> Now I believe I've got it. (So-so believe.)
>>>> ddply is a good idea, in my first post I was using xtabs to the
same
>>>> effect, but ddply makes it a lot easier.
>>>> I had made a big mistake, though. I thought that only sequences
of
>>>> FALSE
>>>> were valid, hence rle(). In a follow-up post, Davide listed all
>>>> combinations of T/F that matter. The code below uses them to
create a
>>>> df
>>>> of cases to count with ddply. The creation is done by merge, on
the
>>>> T/F
>>>> columns (function f1) and the rest is just to apply to 6
periods of 5
>>>> years each, after applying to covers, after applying to years
of crop
>>>> type.
>>>> Sounds confusing, and it probably is but I've separated all
of it in
>>>> small functions.
>>>> Each of the functions is in fact very simple, and all one needs
is to
>>>> remember that the inner most function is reached after a series
of
>>>> *apply statements, doing one thing at a time.
>>>> 
>>>> 
>>>>
#------------------------------------------------------------------------------
>>>> # data.frame of target sequences of T/F per crop type and years
>>>> # (from an earlier post)
>>>> 
>>>> wanted <-
>>>> structure(list(V2 = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE,
>>>> FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE,
FALSE,
>>>> TRUE, FALSE, TRUE), V3 = c(FALSE, FALSE, FALSE, TRUE, FALSE,
>>>> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE,
>>>> FALSE, TRUE, TRUE, TRUE), V4 = c(FALSE, FALSE, FALSE, FALSE,
>>>> FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>>> TRUE, TRUE, TRUE, TRUE, FALSE), V5 = c(FALSE, FALSE, FALSE,
FALSE,
>>>> TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE,
>>>> FALSE, TRUE, FALSE, TRUE, TRUE), V6 = c(FALSE, TRUE, TRUE,
FALSE,
>>>> FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE,
TRUE,
>>>> TRUE, TRUE, FALSE, FALSE, TRUE), V7 = c(" return",
" return",
>>>> " return", " return", " return",
" return", " return", " return",
>>>> " return", " return", "
mono-succession", " mono-succession",
>>>> " mono-succession", " mono-succession",
" mono-succession","
>>>> mono-succession",
>>>> " mono-succession", " mono-succession",
" crops"), V8 = c(5L,
>>>> 5L, 4L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 5L, 4L, 4L, 3L, 3L, 3L, 3L,
>>>> 3L, 2L)), .Names = c("V2", "V3",
"V4", "V5", "V6", "V7", "V8"
>>>> ), row.names = c(NA, -19L), class = "data.frame")
>>>> 
>>>> want <- split(wanted, wanted$V7)
>>>> 
>>>>
#--------------------------------------------------------------------
>>>> # The functions go from f5 down to f1 (f5 calls f4, f4 calls
f3, etc.)
>>>> #
>>>> f1 <- function(DF, Mat, currcols, WS){
>>>>         mcols <- names(T80[, ycols])[currcols]  # names for
'merge'
>>>>         names(DF)[1:5] <- mcols
>>>>         m <- merge(DF, data.frame(Mat[, mcols], WS = WS))
>>>>         ddply(m, .(WS), summarize, length(V8))
>>>> }
>>>> f2 <- function(w, tf, WS){
>>>>         l1 <- lapply(yrs, function(yy) t(f1(w, tf, yy, WS)))
>>>>         l2 <- lapply(l1, function(x){
>>>>                 nms <- as.character(x["WS",])
>>>>                 resline[ nms ] <- x[2, ]
>>>>                 resline})
>>>>         res <- do.call(rbind, l2)
>>>>         data.frame(res)
>>>> }
>>>> f3 <- function(Cover, w){
>>>>         tf <- T80[, ycols] == Cover
>>>>         inx <- as.vector(apply(tf, 1, function(.x) any(.x)))
>>>>         tf <- tf[inx, ]
>>>>         WS <- T80$WS[inx]
>>>>         # Special care with empty sets
>>>>         if(sum(tf) > 0){
>>>>                 res <- f2(w, tf, WS)
>>>>                 na <- as.vector(apply(res, 1, function(.x)
>>>> all(is.na(.x))))
>>>>                 res <- res[!na, ]
>>>>                 if(!is.null(res) && nrow(res) > 0){
>>>>                         res$Cover <- Cover
>>>>                         res$Period <- (1:6)[!na]
>>>>                 }else res <- NULL
>>>>         }else res <- NULL
>>>>         res
>>>> }
>>>> f4 <- function(w){
>>>>         res <- do.call(rbind, lapply(covers, f3, w))
>>>>         # Special care with empty sets
>>>>         if(!is.null(nrow(res)) && nrow(res) > 0){
>>>>                 res <- data.frame(res)
>>>>                 res$Years <- w$V8[1]
>>>>         }else res <- NULL
>>>>         res
>>>> }
>>>> f5 <- function(w){
>>>>         sp <- split(w, w$V8)
>>>>         res <- do.call( rbind, lapply(sp, f4) )
>>>>         res <- data.frame(res)
>>>>         res
>>>> }
>>>> 
>>>> T80 <- read.table("sample.txt", header = TRUE, sep
= ";")
>>>> 
>>>> ycols <- grep("y", names(T80))
>>>> ws <- unique(T80$WS)
>>>> covers <- as.character(unique(unlist(T80[ycols])))
>>>> yrs <- lapply(0:5, `+`, 1:5)
>>>> resline <- rep(NA, length(ws))
>>>> names(resline) <- ws
>>>> 
>>>> result <- lapply(want, f5)
>>>> 
>>>> 
>>>> (Ok, maybe two or three things at a time.)
>>>> 
>>>> I hope this is it!
>>>> Have fun.
>>>> 
>>>> Rui Barradas
>>>> 
>>>> Em 17-09-2012 14:19, John Kane escreveu:
>>>>> 
>>>>> Comments in-line below.
>>>>> 
>>>>> Sorry to be so long getting back to you but sleep
intervened.
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: ridavide at gmail.com
>>>>>> Sent: Sun, 16 Sep 2012 21:24:15 +0200
>>>>>> To: jrkrideau at inbox.com
>>>>>> Subject: Re: [R] [newbie] aggregating table() results
and
>>>>>> simplifying
>>>>>> code with loop
>>>>>> 
>>>>>> Thank you John,
>>>>>> 
>>>>>> you are giving me two precious tips (in addition, well
explained!):
>>>>>> 1. to use the package plyr (I didn't know it
before, but it seems to
>>>>>> make the deal!)
>>>>>> 2. a smart and promising way to use it
>>>>>> 
>>>>>> I can finally plot the partial results, to have a first
glance and
>>>>>> compare to them
>>>>>> 
>>>>>>
=========================================================>>>>>>
>>>>>> # once the sequences for a given crop have been
assembled in a
>>>>>> dataframe (e.g., maizedata)
>>>>>> library(latticeExtra) # load the package for
"improved" graphics
>>>>>> dotplot(mzcount$crop_pattern) # to plot the occurrence
of all the
>>>>>> retrieved sequences
>>>>>> 
>>>>>> #once the target pattern have been subset (e.g., m51)
>>>>>> dotplot(m51$WS ~ m51$count) # to plot the occurrence of
the target
>>>>>> pattern(s) per watershed
>>>>> 
>>>>> Ah, so that's what WS stands for!  Do you have an
abstract or short
>>>>> summary of what you are doing in English or French?  My
Itallian is
>>>>> limited to about 3 words or what I can guess from
French/Latin.
>>>>> 
>>>>> I think that you are misunderstanding what is happening in
m51. It is
>>>>> just summarizing the total counts across those two
conditions.  I
>>>>> think
>>>>> either I made a logic error or was not understanding what
was needed.
>>>>> In
>>>>> any case the dotplot is not plotting the actual occurances
but the
>>>>> number of times the crop_pattern was found.  The count is
the actual
>>>>> number of occurances per WS per crop_pattern
>>>>> 
>>>>> 
>>>>> Have a look at my new effort.  it is still ugly but I think
it more
>>>>> accurately supplies some of what you want.
>>>>> 
>>>>> Two caveats : 1.  This is still very rough and an better
programer
>>>>> may
>>>>> have a better approach.
>>>>> 2. I used the package ggplot2 to do the graphs. One more
thing to
>>>>> learn
>>>>> but I am not used to lattice and I am used to ggplot2.
Rather than
>>>>> spend
>>>>> a lot of time with lattice or
>>>>> 2. The last faceted plots are not intended to be of real
use, just my
>>>>> quick look to see if anything looked like I expected. A bit
of
>>>>> thinking
>>>>> probably would give somethimg much better
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>>
=========================================================>>>>>>
>>>>>> This help me retrieving all the possible patterns for
the different
>>>>>> land covers. Hence, you made me able to improve the
subset of
>>>>>> patterns.
>>>>>> 
>>>>>> Tomorrow morning I'm going to test the tips for all
the land covers
>>>>>> for all the 5years time-periods. Even if still
labour-intensive, the
>>>>>> solution you propose it's surely a steady
improvement.
>>>>>> Now only the second question remains: is there a way
"to clean" this
>>>>>> approach?... or probably not ?...
>>>>> 
>>>>> Yes I suspect that there are lots of ways to
'clean' the process and
>>>>> automate it so that it would apply to all three data sets
quite
>>>>> easily.
>>>>> Some I think I can help with and some you may need other
people's
>>>>> help.
>>>>> 
>>>>> For example, once we have a working model to handle one or
two
>>>>> conditions It should be relatively easy to use an apply()
or a loop
>>>>> to
>>>>> handle all of them and so on..
>>>>> 
>>>>> Well, I'm off to work now so I probably won't be
able to get back to
>>>>> much before late evening my time ( probably after you are
asleep) I
>>>>> think I'm 6 hours ahead of you.
>>>>> 
>>>>> ##===================Revised
approach===================>>>>> # load the various packages
(plyr, latticeExtra, ggplot2, reshape2)
>>>>> library(plyr)
>>>>> library(latticeExtra)
>>>>> library(ggplot2)
>>>>> library(reshape2)
>>>>> 
>>>>> # sample data
>>>>> T80<- read.csv("/home/john/rdata/sample.csv", 
header = TRUE, sep >>>>> ";")
>>>>> # Davide's actual read statement
>>>>> # T80<-read.table(file="C:/sample.txt",
header=T, sep=";")
>>>>> 
>>>>> # Looking for Maize
>>>>> pattern  <-  c("2Ma",
"2Ma","2Ma", "2Ma","2Ma")
>>>>> 
>>>>> # one row examples to see that is happening
>>>>> T80[1,3:7]
>>>>> T80[1, 3:7] == pattern
>>>>> 
>>>>> T80[405, 3:7]
>>>>> T80[405, 3:7] == pattern
>>>>> 
>>>>> T80[55, 3:7] == pattern
>>>>> 
>>>>> # now we apply the patterns to the entire data set.
>>>>> pp1  <-  T80[, 3:7] == pattern
>>>>> 
>>>>> # paste the TRUEs and FALSEs together to form a single
variable
>>>>> concatdat  <-  paste(pp1[, 1], pp1[, 2], pp1[, 3], pp1[,
4],pp1[,5] ,
>>>>> sep = "+")
>>>>> 
>>>>> # Assmble new data frame.
>>>>> maizedata  <-  data.frame(T80$WS, concatdat)
>>>>> names(maizedata)  <-  c("WS",
"crop_pattern")
>>>>> str(maizedata)
>>>>> 
>>>>> maizedata$crop_pattern  <- 
as.character(maizedata$crop_pattern)
>>>>> 
>>>>> pattern_count  <-  ddply(maizedata, .(crop_pattern),
summarize,
>>>>> npattern
>>>>> = length(crop_pattern))
>>>>> str(pattern_count)
>>>>> head(pattern_count);  dim(pattern_count) #      quick look
at
>>>>> data.frame and
>>>>> its size.
>>>>> 
>>>>> # FALSE+FALSE+FALSE+FALSE+FALSE accounts for 21,493 values.
>>>>> 
>>>>> which(pattern_count$npattern ==
max(pattern_count$npattern))  # this
>>>>> does the same as looking at the data
>>>>> 
>>>>> # not needed here but useful for larger datasets.
>>>>> 
>>>>> # If we graph pattern_count as it stands we lose any useful
detail
>>>>> because of that outlier.
>>>>> p  <-  ggplot(pattern_count  , aes(crop_pattern,
npattern  )) +
>>>>> geom_point() +
>>>>>              coord_flip()
>>>>> p
>>>>> 
>>>>> (pattern1  <-  pattern_count[-1,])  # Drop the offending
>>>>> FALSE+FALSE+FALSE+FALSE+FALSE
>>>>> dim(pattern1)  # Okay now we have the maize patterns,
without the WS
>>>>> who
>>>>> had no maize at all.
>>>>> 
>>>>> p  <-  ggplot(pattern1  , aes(crop_pattern, npattern  
)) +
>>>>> geom_point()
>>>>> +
>>>>>              coord_flip()
>>>>> p
>>>>> 
>>>>> 
>>>>> newmaize  <-  subset(maizedata, maizedata$crop_pattern
!>>>>> "FALSE+FALSE+FALSE+FALSE+FALSE")
>>>>> dim(newmaize) ;  head(newmaize)
>>>>> str(newmaize)
>>>>> 
>>>>> summaize_by_WS   <-  ddply(newmaize, .(WS), summarize,
>>>>> crop_pattern_ws >>>>> length(crop_pattern))
>>>>> 
>>>>> p  <-  ggplot( summaize_by_WS  , aes(summaize_by_WS  ))
+ geom_point
>>>>> +
>>>>>                coord_flip()
>>>>> p
>>>>> 
>>>>> summaize_by_WS_and_crop   <-  ddply(newmaize, .(WS,
crop_pattern),
>>>>> summarize, crop_pattern_ws = length(crop_pattern))
>>>>> 
>>>>> # crappy graph but just try to see what we might get. 
probablly need
>>>>> to
>>>>> subset or use better grid layout.
>>>>> p  <-  ggplot( summaize_by_WS_and_crop  ,
aes(crop_pattern,
>>>>> crop_pattern_ws   )) + geom_point() +
>>>>>                coord_flip() + facet_grid(WS ~ . )
>>>>> p
>>>>> 
>>>>> # save the last graph to look at it in a graphics package==
still
>>>>> terrible
>>>>> ggsave( "/home/john/Rjunk/crop.png")
>>>>> 
>>>>> ##
>>>>>> 
>>>>>> Thanks again for the help.
>>>>>> Cheers,
>>>>>> Dd
>>>>>> 
>>>>>>
***********************************************************
>>>>>> Davide Rizzo
>>>>>> website :: http://sites.google.com/site/ridavide/
>>>>>> 
>>>>>> 
>>>>>> On Sun, Sep 16, 2012 at 8:24 PM, John Kane
<jrkrideau at inbox.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi Davide,
>>>>>>> 
>>>>>>> I had some time this afternoon and I wonder if this
approach is
>>>>>>> llkely
>>>>>>> to get the results you want?  As before it is not
complete but I
>>>>>>> think
>>>>>>> it holds promise.
>>>>>>> 
>>>>>>> On the other hand Rui is a much better programer
than I am so he
>>>>>>> may
>>>>>>> have a much cleaner solution.  My way still looks
labour-intensive
>>>>>>> at
>>>>>>> the moment.
>>>>>>> 
>>>>>>> I am using the plyr package which you will probably
have to
>>>>>>> install.
>>>>>>> load.packages("plyr") should do it.
>>>>>>>
=========================================================>>>>>>>
# load the plyr package -
>>>>>>> library(plyr)
>>>>>>> 
>>>>>>> # sample data
>>>>>>> T80<-
read.csv("/home/john/rdata/sample.csv",  header = TRUE, sep
>>>>>>> ";")
>>>>>>> # Davide's actual read statement
>>>>>>> #
T80<-read.table(file="C:/sample.txt", header=T, sep=";")
>>>>>>> 
>>>>>>> # Looking for Maize
>>>>>>> pattern  <-  c("2Ma",
"2Ma","2Ma", "2Ma","2Ma")
>>>>>>> 
>>>>>>> # one row examples to see that is happening
>>>>>>> T80[1,3:7]
>>>>>>> T80[1, 3:7] == pattern
>>>>>>> 
>>>>>>> T80[405, 3:7]
>>>>>>> T80[405, 3:7] == pattern
>>>>>>> 
>>>>>>> T80[55, 3:7] == pattern
>>>>>>> 
>>>>>>> # now we apply the patterns to the entire data set.
>>>>>>> pp1  <-  T80[, 3:7] == pattern
>>>>>>> 
>>>>>>> # paste the TRUEs and FALSEs together to form a
single variable
>>>>>>> concatdat  <-  paste(pp1[, 1], pp1[, 2], pp1[,
3], pp1[, 4],pp1[,5]
>>>>>>> ,
>>>>>>> sep = "+")
>>>>>>> 
>>>>>>> # Assmble new data frame.
>>>>>>> maizedata  <-  data.frame(T80$WS, concatdat)
>>>>>>> names(maizedata)  <-  c("WS",
"crop_pattern")
>>>>>>> 
>>>>>>> mzcount  <-  ddply(maizedata, .(WS,
crop_pattern),  summarize,
>>>>>>> count >>>>>>>
length(crop_pattern))
>>>>>>> mzcount  # This is all the data not just the
relevant maise
>>>>>>> patterns
>>>>>>> 
>>>>>>> # This seems to be getting us somewhere though we
are not not there
>>>>>>> yet
>>>>>>> # Does this subset  look like we are going in the
right direction?
>>>>>>> m51  <-  subset(mzcount,
>>>>>>> mzcount$crop_pattern ==
"FALSE+FALSE+FALSE+FALSE+TRUE"
>>>>>>> | mzcount$crop_pattern ==
"TRUE+FALSE+FALSE+FALSE+FALSE")
>>>>>>> 
>>>>>>> m51  <-  ddply(m51, .(WS), summarize, count =
sum(count))
>>>>>>> m51
>>>>>>>
================================================================>>>>>>>
>>>>>>> John Kane
>>>>>>> Kingston ON Canada
>>>>>>> 
>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: ridavide at gmail.com
>>>>>>>> Sent: Sat, 15 Sep 2012 19:00:29 +0200
>>>>>>>> To: jrkrideau at inbox.com, ruipbarradas at
sapo.pt
>>>>>>>> Subject: Re: [R] [newbie] aggregating table()
results and
>>>>>>>> simplifying
>>>>>>>> code with loop
>>>>>>>> 
>>>>>>>> Thanks Rui, thanks John for your very different
solutions.
>>>>>>>> 
>>>>>>>> I'll try to break my questions into smaller
steps following your
>>>>>>>> tips.
>>>>>>>> However, not everything is clear for me... so
before giving you a
>>>>>>>> feed-back I need to study further your answers.
For the moment I
>>>>>>>> could
>>>>>>>> specify that I'm looking for the following
19 patterns:
>>>>>>>> 
>>>>>>>> 1. True, False, False, False, False # return
period of 5 years
>>>>>>>> (1/2)
>>>>>>>> 2. False, False, False, False, True # return
period of 5 years
>>>>>>>> (2/2)
>>>>>>>> 3. True, False, False, False, True # return
period of 4 years
>>>>>>>> (1/3)
>>>>>>>> 4. False, True, False, False, False # return
period of 4 years
>>>>>>>> (2/3)
>>>>>>>> 5. False, False, False, True, False # return
period of 4 years
>>>>>>>> (3/3)
>>>>>>>> 6. True, False, False, True, False # return
period of 3 years
>>>>>>>> (1/3)
>>>>>>>> 7. False, True, False, False, True # return
period of 3 years
>>>>>>>> (2/3)
>>>>>>>> 8. False, False, True, False, False # return
period of 3 years
>>>>>>>> (3/3)
>>>>>>>> 9. False, True, False, True, False # return
period of 2 years
>>>>>>>> (1/2)
>>>>>>>> 10. True, False, True, False, True # return
period of 2 years
>>>>>>>> (1/2)
>>>>>>>> 11. True, True, True, True, True #
mono-succession of 5 years
>>>>>>>> 12. False, True, True, True, True #
mono-succession of 4 years
>>>>>>>> (1/2)
>>>>>>>> 13. True, True, True, True, False #
mono-succession of 4 years
>>>>>>>> (2/2)
>>>>>>>> 14. True, False, True, True, True #
mono-succession of 3 years
>>>>>>>> (1/5)
>>>>>>>> 15. True. True. True. False, True #
mono-succession of 3 years
>>>>>>>> (2/5)
>>>>>>>> 16. False, False, True, True, True #
mono-succession of 3 years
>>>>>>>> (3/5)
>>>>>>>> 17. True, True, True, False, False #
mono-succession of 3 years
>>>>>>>> (4/5)
>>>>>>>> 18. False, True, True, True, False #
mono-succession of 3 years
>>>>>>>> (5/5)
>>>>>>>> 19. True, True, False, True, True # crops
repeated two years
>>>>>>>> 
>>>>>>>> In particular, I want to apply all these 19
patterns to 7 (out of
>>>>>>>> 11)
>>>>>>>> land covers: 2BC, 2Co, 2Ma, 2We, 2MG, 2ML, 2PG.
The pattern are so
>>>>>>>> structured: True means presence of a given land
cover
>>>>>>>> (iteratively,
>>>>>>>> one of the seven listed above), False means any
other land-cover
>>>>>>>> (amidst the remainder 10).
>>>>>>>> 
>>>>>>>> Thanks again for any further help.
>>>>>>>> Greetings,
>>>>>>>> Dd
>>>>>>>> 
>>>>>>>>
***********************************************************
>>>>>>>> Davide Rizzo
>>>>>>>> website ::
http://sites.google.com/site/ridavide/
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sat, Sep 15, 2012 at 5:51 PM, John Kane
<jrkrideau at inbox.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> I have not seen any replies to your
questions so I will suggest
>>>>>>>>> an
>>>>>>>>> approach that may work if I can get a
function to work.
>>>>>>>>> 
>>>>>>>>> If I understand what you want, you have a
pattern something like
>>>>>>>>> this:
>>>>>>>>> pattern1  <-  c("2Ma",
"no2Ma","no2Ma", "no2Ma","no2Ma")
>>>>>>>>> pattern2  <-  c("no2Ma",
'no2Ma', "no2Ma", "no2Ma", "2Ma")
>>>>>>>>> 
>>>>>>>>> for each five year period where 2Ma stands
to Maize, one of 11
>>>>>>>>> different
>>>>>>>>> grains
>>>>>>>>>     1AU   2BC   2Co   2Ma   2MG   2ML   2oc
2PG   2SA   2We
>>>>>>>>> 3sN
>>>>>>>>> 
>>>>>>>>> and what you want to know is if each year
gives a pattern like
>>>>>>>>> 
>>>>>>>>> check1 <-  c(TRUE, FALSE, FALSE, FALSE,
FALSE)
>>>>>>>>> check2  <-  c(FALSE, FALSE, FALSE,
FALSE, TRUE)
>>>>>>>>> 
>>>>>>>>> If I understand the patterns you only care
for the two above, is
>>>>>>>>> that
>>>>>>>>> correct?
>>>>>>>>> 
>>>>>>>>> I am running out of time today but I think
that this approach
>>>>>>>>> will
>>>>>>>>> get
>>>>>>>>> you started
>>>>>>>>>
==========================================================>>>>>>>>>
>>>>>>>>>
T80<-read.table(file="C:/sample.txt", header=T, sep=";")
>>>>>>>>> 
>>>>>>>>> # Reminder of just what we want to get as a
final result.
>>>>>>>>> check1 <-  c(TRUE, FALSE, FALSE, FALSE,
FALSE)
>>>>>>>>> check2  <-  c(FALSE, FALSE, FALSE,
FALSE, TRUE)
>>>>>>>>> 
>>>>>>>>> pattern1  <-  c("2Ma",
"2Ma","2Ma", "2Ma","2Ma")
>>>>>>>>> 
>>>>>>>>> # one row examples to see that is happening
>>>>>>>>> T80[1,3:7]
>>>>>>>>> T80[1, 3:7] == pattern1
>>>>>>>>> 
>>>>>>>>> T80[405, 3:7]
>>>>>>>>> T80[405, 3:7] == pattern1
>>>>>>>>> 
>>>>>>>>> # now we apply the patterns to the entire
data set.
>>>>>>>>> pp1  <-  T80[, 3:7] == pattern1
>>>>>>>>> pp2  <-  T80[, 3:7] == pattern2
>>>>>>>>> 
>>>>>>>>> # reassign the WS values so we know where
the data is from
>>>>>>>>> WSnames  <-  rep(T80$WS, 2)
>>>>>>>>> 
>>>>>>>>> # Assmble new data frame.
>>>>>>>>> maizedata  <-  data.frame(WSnames,
rbind(pp1,pp2))
>>>>>>>>>
=======================================================>>>>>>>>>
>>>>>>>>> Now, assuming this runs for you and I have
not made a serious
>>>>>>>>> mistake
>>>>>>>>> in
>>>>>>>>> logic, kyou should be able to do some
subsetting  (?subset)  to
>>>>>>>>> extract
>>>>>>>>> only the
>>>>>>>>> check1 and check2 patterns above.
>>>>>>>>> 
>>>>>>>>> This is where I ran into trouble as I
don't have the time this
>>>>>>>>> morning
>>>>>>>>> to work out the subsetting conditions. It
looks tricking and you
>>>>>>>>> probably need a couple of subsetting moves.
>>>>>>>>> 
>>>>>>>>> It's not a pretty  solutlion and,
particularly, I expect someone
>>>>>>>>> could
>>>>>>>>> clean it up to make the subsetting easier
or even unnecessary but
>>>>>>>>> I
>>>>>>>>> hope
>>>>>>>>> it helps.
>>>>>>>>> 
>>>>>>>>> Once you have extracted what you want   use
apply() or perhaps
>>>>>>>>> the
>>>>>>>>> plyr
>>>>>>>>> package to aggregate the results.
>>>>>>>>> 
>>>>>>>>> Repeat for all grains.  Actually look into
setting the whole
>>>>>>>>> thing
>>>>>>>>> up
>>>>>>>>> as
>>>>>>>>> a function. You should be able to write the
program once as a
>>>>>>>>> function
>>>>>>>>> and do a loop or an apply() to do all 11
grains in one go.
>>>>>>>>> 
>>>>>>>>> Best of luck.
>>>>>>>>> 
>>>>>>>>> John Kane
>>>>>>>>> Kingston ON Canada
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: ridavide at gmail.com
>>>>>>>>>> Sent: Thu, 13 Sep 2012 15:36:28 +0200
>>>>>>>>>> To: r-help at r-project.org
>>>>>>>>>> Subject: [R] [newbie] aggregating
table() results and
>>>>>>>>>> simplifying
>>>>>>>>>> code
>>>>>>>>>> with loop
>>>>>>>>>> 
>>>>>>>>>> Dear all,
>>>>>>>>>> I'm looking for primary help at
aggregating table() results and
>>>>>>>>>> at
>>>>>>>>>> writing a loop (if useful)
>>>>>>>>>> 
>>>>>>>>>> My dataset ( http://goo.gl/gEPKW ) is
composed of 23k rows, each
>>>>>>>>>> one
>>>>>>>>>> representing a point in the space of
which we know the land
>>>>>>>>>> cover
>>>>>>>>>> over
>>>>>>>>>> 10 years (column y01 to y10).
>>>>>>>>>> 
>>>>>>>>>> I need to analyse it with a temporal
sliding window of 5 years
>>>>>>>>>> (y01
>>>>>>>>>> to
>>>>>>>>>> y05, y02 to y06 and so forth)
>>>>>>>>>> For each period I'm looking for
specific sequences (e.g., Maize,
>>>>>>>>>> -noMaize, -noMaize, -noMaize, -noMaize)
to calculate the "return
>>>>>>>>>> time"
>>>>>>>>>> of principal land covers: barley (2BC),
colza (2Co), maize
>>>>>>>>>> (2Ma),
>>>>>>>>>> etc.
>>>>>>>>>> I define the "return time" as
the presence of a given land cover
>>>>>>>>>> according to a given sequence. Hence,
each return time could
>>>>>>>>>> require
>>>>>>>>>> the sum of different sequences (e.g., a
return time of 5 years
>>>>>>>>>> derives
>>>>>>>>>> from the sum of
[2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
>>>>>>>>>> [no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
>>>>>>>>>> I need to repeat the calculation for
each land cover for each
>>>>>>>>>> time
>>>>>>>>>> window. In addition, I need to repeat
the process over three
>>>>>>>>>> datasets
>>>>>>>>>> (the one I give is the first one, the
second one is from year 12
>>>>>>>>>> to
>>>>>>>>>> year 24, the third one from year 27 to
year 31. So I have breaks
>>>>>>>>>> in
>>>>>>>>>> the monitoring of land cover that avoid
me to create a
>>>>>>>>>> continuous
>>>>>>>>>> dataset). At the end I expect to
aggregate the sum for each
>>>>>>>>>> spatial
>>>>>>>>>> entity (column WS)
>>>>>>>>>> 
>>>>>>>>>> I've started writing the code for
the first crop in the first
>>>>>>>>>> 5yrs
>>>>>>>>>> period (http://goo.gl/FhZNx) then
copying and pasting it for
>>>>>>>>>> each
>>>>>>>>>> crop
>>>>>>>>>> then for each time window...
>>>>>>>>>> Moreover I do not know how to aggregate
the results of table().
>>>>>>>>>> (NB
>>>>>>>>>> sometimes I have a different number of
WS per table because a
>>>>>>>>>> given
>>>>>>>>>> sequence could be absent in a given
spatial entity... so I have
>>>>>>>>>> the
>>>>>>>>>> following warning msg: number of
columns of result is not a
>>>>>>>>>> multiple
>>>>>>>>>> of vector length (arg 1)). Therefore,
I'm "obliged" to
>>>>>>>>>> copy&paste
>>>>>>>>>> the
>>>>>>>>>> table corresponding to each
sequence....
>>>>>>>>>> 
>>>>>>>>>> FIRST QUEST. How to aggregate the
results of table() when the
>>>>>>>>>> number
>>>>>>>>>> of columns is different?
>>>>>>>>>> Or the other way around: Is there a way
to have a table where
>>>>>>>>>> each
>>>>>>>>>> row
>>>>>>>>>> reports the number of points per time
return per WS? something
>>>>>>>>>> like
>>>>>>>>>> 
>>>>>>>>>> WS1    WS2    WS3    WS4    ...    WS16
crop    period
>>>>>>>>>> 23    15    18    43    ...    52      
Ma5    01
>>>>>>>>>> 18    11    25    84    ...    105     
Ma2    01
>>>>>>>>>> ...    ...    ...    ...    ...    ... 
...    ...
>>>>>>>>>> ...    ...    ...    ...    ...    ... 
Co5    01
>>>>>>>>>> ...    ...    ...    ...    ...    ... 
...    ...
>>>>>>>>>> ...    ...    ...    ...    ...    ... 
Ma5    02
>>>>>>>>>> ...    ...    ...    ...    ...    ... 
...    ...
>>>>>>>>>> In this table each row should represent
a return time for a
>>>>>>>>>> given
>>>>>>>>>> land
>>>>>>>>>> cover a given period (one of the 6 time
window of 5 years)?
>>>>>>>>>> 
>>>>>>>>>> SECOND QUEST. Could a loop (instead of
a modular copy/paste
>>>>>>>>>> code)
>>>>>>>>>> improve the time/reliability of the
calculation? If yes, could
>>>>>>>>>> you
>>>>>>>>>> please indicate me some entry-level
references to write it?
>>>>>>>>>> 
>>>>>>>>>> I am aware this are newbie's
questions, but I have not be able
>>>>>>>>>> to
>>>>>>>>>> solve them using manuals and available
sources.
>>>>>>>>>> Thank you in advance for your help.
>>>>>>>>>> 
>>>>>>>>>> Greetings,
>>>>>>>>>> Dd
>>>>>>>>>> 
>>>>>>>>>> PS
>>>>>>>>>> R: version 2.14.2 (2012-02-29)
>>>>>>>>>> OS: MS Windows XP Home 32-bit SP3
>>>>>>>>>> 
>>>>>>>>>> *****************************
>>>>>>>>>> Davide Rizzo
>>>>>>>>>> post-doc researcher
>>>>>>>>>> INRA UR055 SAD-ASTER
>>>>>>>>>> website ::
http://sites.google.com/site/ridavide/
>>>>> 
>>>>>
____________________________________________________________
>>>>> FREE 3D EARTH SCREENSAVER - Watch the Earth right on your
desktop!
>>>>> Check it out at http://www.inbox.com/earth
>>>>> 
>>>>> 
>>> ____________________________________________________________
>>> GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at
>>> http://www.inbox.com/smileys
>>> Works with AIM?, MSN? Messenger, Yahoo!? Messenger, ICQ?, Google
Talk?
>>> and most webmails
>>> 
>>> 
>>
____________________________________________________________
FREE ONLINE PHOTOSHARING - Share your photos online with your friends and
family!
Visit http://www.inbox.com/photosharing to find out more!

Possibly Parallel Threads

Search for more seemingly similar threads

R help - Sep 2012 - [newbie] aggregating table() results and simplifying code with loop

[R] [newbie] aggregating table() results and simplifying code with loop

[R] [newbie] aggregating table() results and simplifying code with loop

[R] [newbie] aggregating table() results and simplifying code with loop

[R] [newbie] aggregating table() results and simplifying code with loop

[R] [newbie] aggregating table() results and simplifying code with loop

[R] [newbie] aggregating table() results and simplifying code with loop

[R] [newbie] aggregating table() results and simplifying code with loop

Possibly Parallel Threads