thr3ads.net - R help - [R] About data manipulation [Nov 2016]

lily li

2016-Nov-26 16:11 UTC

[R] About data manipulation

Hi R users,

I'm trying to manipulate a dataframe and have some difficulties.

The original dataset is like this:

DF
year   month   total   id     note
2000     1         98    GA   1
2001     1        100   GA   1
2002     2         99    GA   1
2002     2         80    GB   1
...
2012     1         78    GA   2
...

The structure is like this: when year is between 2000 and 2005, note is 1; when
year is between 2006 and 2010, note is 2; GA, GB, etc. represent different
groups, but they all cover the years 2000-2005, 2006-2010 and 2011-2015.
I want to calculate one average value for each month in each time slice.
For example, for 2000-2005 (note is 1), GA should have one value for month 1,
one value for month 2, etc.; GB should likewise have one value for month 1,
one value for month 2, and so on for that time period. The result would have
no 'year' column, but would keep the other columns.
I tried the script: DF_GA = aggregate(total~year+month,data=subset(DF,
id==GA&note==1)), but it did not give me the data frame I wanted. How should
I do this?
Thanks for your help.


P Tennant

2016-Nov-26 23:42 UTC

Hi,

This may help:

aggregate(DF$total, list(DF$note, DF$id, DF$month), mean)

should give you means broken down by time slice (note), id and month. 
You could then subset means for GA or GB from the aggregated dataframe.
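
For illustration, here is a minimal sketch of this approach using only the five rows visible in the original post (the elided rows are left out, so the numbers are not the real result). It uses the formula interface of aggregate(), which keeps the original column names; the object names means and means_GA are made up for this example.

```r
# Toy data: only the five rows shown in the original post (elided rows omitted).
DF <- data.frame(
  year  = c(2000, 2001, 2002, 2002, 2012),
  month = c(1, 1, 2, 2, 1),
  total = c(98, 100, 99, 80, 78),
  id    = c("GA", "GA", "GA", "GB", "GA"),
  note  = c(1, 1, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Mean of total for each note/id/month combination.
# 'year' is not in the formula, so it is averaged over and dropped.
means <- aggregate(total ~ note + id + month, data = DF, FUN = mean)

# Keep only group GA; the group label must be quoted.
means_GA <- subset(means, id == "GA")
print(means_GA)
```

Note that the original attempt had no FUN argument and compared id against an unquoted GA, which R reads as a variable name rather than the string "GA".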

Philip


Bert Gunter

2016-Nov-27 00:10 UTC

A reproducible example was not provided, but I think what is wanted is
either ?tapply or ?ave; e.g.

within(DF, means <- ave(total, note, month, FUN = mean))
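
As a rough sketch of how the two differ, using only the five rows visible in the original post: ave() returns the group mean repeated for every input row, so all rows and the year column are kept, while tapply() returns one value per group. Adding id to the grouping is an assumption made here to match the per-group means asked for in the question; DF2 and tab are names made up for this example.

```r
# Toy data: only the five rows shown in the original post (elided rows omitted).
DF <- data.frame(
  year  = c(2000, 2001, 2002, 2002, 2012),
  month = c(1, 1, 2, 2, 1),
  total = c(98, 100, 99, 80, 78),
  id    = c("GA", "GA", "GA", "GB", "GA"),
  note  = c(1, 1, 1, 1, 2),
  stringsAsFactors = FALSE
)

# ave(): one value per row (the mean of that row's group); same number of rows as DF.
# Grouping by id as well is an assumption, not part of the call above.
DF2 <- within(DF, means <- ave(total, note, id, month, FUN = mean))

# tapply(): one value per group, returned as an array indexed by note, id and month.
# Combinations that do not occur in the data come back as NA.
tab <- tapply(DF$total, list(note = DF$note, id = DF$id, month = DF$month), mean)

print(DF2)
print(tab["1", "GA", "1"])
```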


Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



jim holtman

2016-Nov-27 00:55 UTC

You did not provide any data, but I will take a stab at it using the
"dplyr" package, assuming your data frame is called DF as in your post:

library(dplyr)
DF %>%
    group_by(month, id, note) %>%
    summarise(avg = mean(total))



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


lily li

2016-Nov-27 07:03 UTC

Thanks Jim, this method is very convenient and is what I want. How can I
save the resulting data frame? It was printed directly to the console.
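
One possible way to do this, sketched here on the five rows visible in the original post with the data frame named DF: assign the result of the pipeline to a name instead of letting it print, then optionally write it to a file. The names result and out_file, and the file name monthly_means.csv, are placeholders chosen for this example.

```r
library(dplyr)

# Toy data: only the five rows shown in the original post (elided rows omitted).
DF <- data.frame(
  year  = c(2000, 2001, 2002, 2002, 2012),
  month = c(1, 1, 2, 2, 1),
  total = c(98, 100, 99, 80, 78),
  id    = c("GA", "GA", "GA", "GB", "GA"),
  note  = c(1, 1, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Assigning with <- stores the summary in 'result' instead of printing it.
result <- DF %>%
  group_by(month, id, note) %>%
  summarise(avg = mean(total)) %>%
  ungroup()

# Optionally save it to disk; the path is a placeholder in a temporary directory.
out_file <- file.path(tempdir(), "monthly_means.csv")
write.csv(result, out_file, row.names = FALSE)
```

The stored object can then be inspected with print(result) or read back later with read.csv(out_file).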

