thr3ads.net - R help - [R] Arrange data [Aug 2020]

Rasmus Liland

2020-Aug-03 11:33 UTC

[R] Arrange data

On 2020-08-03 21:11 +1000, Jim Lemon wrote:
> On Mon, Aug 3, 2020 at 8:52 PM Md. Moyazzem Hossain <hossainmm at juniv.edu> wrote:
> >
> > Hi,
> >
> > I have a dataset of monthly observations (January to December)
> > covering a period of years (2000 to 2018). I am trying to take the
> > average of the values from January to July of each year.
> >
> > The data looks like
> > Year    Month   Value
> > 2000    1       25
> > 2000    2       28
> > 2000    3       22
> > ....    ....    ....
> > 2000    12      26
> > 2001    1       27
> > ....    ....    ....
> > 2018    11      30
> > 2018    12      29
> >
> > Can someone help me in this regard? 
> >
> > Many thanks in advance.
> 
> Hi Md,
> One way is to form a subset of your 
> data, then calculate the means by 
> year:
> 
> # assume your data is named mddat, with columns Year, Month and Value
> # January to July means Month <= 7
> mddat2 <- mddat[mddat$Month <= 7, ]
> jan2jul <- by(mddat2$Value, mddat2$Year, mean)
> 
> Jim

Hi Md,

You can also define the period in a new column and use aggregate
like this:

    Md <- structure(list(
      Year  = c(2000L, 2000L, 2000L, 2000L, 2001L, 2018L, 2018L),
      Month = c(1L, 2L, 3L, 12L, 1L, 11L, 12L),
      Value = c(25L, 28L, 22L, 26L, 27L, 30L, 29L)),
      class = "data.frame",
      row.names = c(NA, -7L))

    Md[Md$Month %in% 1:6, "Period"] <- "first six months of the year"
    Md[Md$Month %in% 7:12, "Period"] <- "last six months of the year"

    # The formula is passed unnamed so the call works across R versions
    aggregate(Value ~ Year + Period, data = Md, FUN = mean)

Rasmus
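
The period-column idea above can be adapted to the January to July window in the original question. This is only a minimal sketch: the labels "Jan-Jul" and "Aug-Dec" and the names res and jan_jul are made up for illustration, and the data are the seven sample rows from the message above, not the full dataset.

```r
# Rasmus's example data, rebuilt with data.frame()
Md <- data.frame(
  Year  = c(2000L, 2000L, 2000L, 2000L, 2001L, 2018L, 2018L),
  Month = c(1L, 2L, 3L, 12L, 1L, 11L, 12L),
  Value = c(25L, 28L, 22L, 26L, 27L, 30L, 29L)
)

# Hypothetical labels: split each year at the end of July
Md$Period <- ifelse(Md$Month <= 7, "Jan-Jul", "Aug-Dec")

# One mean per Year/Period combination that occurs in the data
res <- aggregate(Value ~ Year + Period, data = Md, FUN = mean)

# Keep only the January-July rows
jan_jul <- res[res$Period == "Jan-Jul", c("Year", "Value")]
jan_jul
```

Labelling every row and filtering afterwards keeps the August to December means available as well, at the cost of one extra column.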

Rui Barradas

2020-Aug-03 22:28 UTC

Hello,

And here is another way, with aggregate.

Make up test data.

set.seed(2020)
df1 <- expand.grid(Year = 2000:2018, Month = 1:12)
df1 <- df1[order(df1$Year),]
df1$Value <- sample(20:30, nrow(df1), TRUE)
head(df1)


#Use subset to keep only the relevant months
aggregate(Value ~ Year, data = subset(df1, Month <= 7), FUN = mean)


Hope this helps,

Rui Barradas
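
As a cross-check, the subset-plus-aggregate call above should give the same yearly means as a tapply (or by) call on the same subset. The sketch below reuses the made-up test data from this message; the names jan_jul, agg and tap are introduced here only for illustration.

```r
# Same made-up test data as above
set.seed(2020)
df1 <- expand.grid(Year = 2000:2018, Month = 1:12)
df1 <- df1[order(df1$Year), ]
df1$Value <- sample(20:30, nrow(df1), TRUE)

# January to July only
jan_jul <- subset(df1, Month <= 7)

# Yearly means, once as a data.frame and once as a named vector
agg <- aggregate(Value ~ Year, data = jan_jul, FUN = mean)
tap <- tapply(jan_jul$Value, jan_jul$Year, mean)

# Both approaches should agree year by year
all.equal(agg$Value, as.vector(tap))
```

aggregate returns a data.frame that is convenient for merging or plotting, while tapply returns a named vector that is handy for quick lookups by year.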

Rui Barradas

2020-Aug-04 21:45 UTC

Hello,

Please keep cc-ing the list. R-help is threaded, and questions and
answers might be of help to others in the future.

As for the question, see if the following code does what you want.
First, create a logical index i of the months from July through March
(July to December, plus January to March) and use that index to subset
the original data.frame. Dropping April to June leaves a gap in the row
numbers between consecutive groups, so a cumsum trick over those gaps
gives a vector M defining the data grouping. Group and compute the Value
means with aggregate. Finally, since each group spans a year border,
create a more meaningful Years column and put everything together.

df1 <- read.csv("mddat.csv")

# TRUE for July-December and January-March
i <- with(df1, (Month >= 7 & Month <= 12) | (Month >= 1 & Month <= 3))
df2 <- df1[i, ]
# A jump of more than 1 in the original row numbers starts a new group
M <- cumsum(c(FALSE, diff(as.integer(row.names(df2))) > 1))

agg <- aggregate(Value ~ M, df2, mean)
# Label each group by the years of its first and last rows
Years <- sapply(split(df2$Year, M), function(x) paste(x[1], x[length(x)], sep = "-"))
final <- cbind.data.frame(Years, Value = agg[["Value"]])

head(final)
#      Years    Value
#0 1975-1975 87.00000
#1 1975-1976 89.44444
#2 1976-1977 85.77778
#3 1977-1978 81.55556
#4 1978-1979 71.55556
#5 1979-1980 75.77778


Hope this helps,

Rui Barradas
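
An alternative to the cumsum grouping, sketched here under assumptions: give each row a season label equal to the year in which its July falls, so that January to March are counted with the previous year. The data below are a hypothetical stand-in for mddat.csv (the attachment is not part of this thread), and the names Season, means, counts and complete are invented for the example.

```r
# Hypothetical stand-in for mddat.csv: three years of monthly values
df1 <- expand.grid(Month = 1:12, Year = 1975:1977)
df1$Value <- df1$Month + 10 * (df1$Year - 1975)

# Keep July-December and January-March
df2 <- subset(df1, Month >= 7 | Month <= 3)

# January-March belong to the season that started the previous July
df2$Season <- ifelse(df2$Month <= 3, df2$Year - 1, df2$Year)

# Mean and number of months per season (same row order in both results)
means  <- aggregate(Value ~ Season, data = df2, FUN = mean)
counts <- aggregate(Value ~ Season, data = df2, FUN = length)

# Keep only seasons with all nine months present
complete <- means[counts$Value == 9, ]
complete
```

Unlike the row-name approach, this does not depend on the rows being in chronological order, and the month count makes it easy to drop the partial seasons at the start and end of the series.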



At 20:44 on 04/08/20, Md. Moyazzem Hossain wrote:
> Dear Rui,
> 
> Thanks a lot for your help.
> 
> It is working. Now I am also trying to find the average of the values from
> *July 1975 to March 1976* and record it as the value for the year 1975.
> I want to continue this up to the year 2017. You may check the
> attached file for the data (mddat.csv).
>
> I used the following call but got an error:
> aggregate(Value ~ Year, data = subset(df1, Month >= 7 & Month <= 3), FUN = mean)
> 
> Please help me again. Thanks in advance.
> 
> Best Regards,
> Md
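
On the quoted call: the condition Month >= 7 & Month <= 3 can never be true for a single month, so the subset is empty and aggregate has no rows to work with, which is the likely source of the error. A small sketch of the difference between & and | on the month numbers (the names both and either are chosen only for this example):

```r
Month <- 1:12

# "&" asks for months that are both >= 7 and <= 3; there are none
both <- Month[Month >= 7 & Month <= 3]
length(both)

# "|" keeps months that are >= 7 or <= 3
either <- Month[Month >= 7 | Month <= 3]
either
```

Switching to | selects the right months, but grouping by Year alone would still average January to March together with July to December of the same calendar year, which is why a separate grouping variable is needed.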

Md. Moyazzem Hossain

2020-Aug-09 19:59 UTC

Dear Rui,

Thank you for your nice help.

Take care and be safe.

Md

-- 
Best Regards,
Md. Moyazzem Hossain
Associate Professor
Department of Statistics
Jahangirnagar University
Savar, Dhaka-1342
Bangladesh
Website: http://www.juniv.edu/teachers/hossainmm
Research: Google Scholar <https://scholar.google.com/citations?user=-U03XCgAAAAJ&hl=en&oi=ao>;
ResearchGate <https://www.researchgate.net/profile/Md_Hossain107>;
ORCID iD <https://orcid.org/0000-0003-3593-6936>
