Subhamitra Patra
2019-Sep-13 13:20 UTC
[R] Query about calculating the monthly average of daily data columns
Dear Sir, Yes, I understood the logic. But, still, I have a few queries that I mentioned below your answers. "# if you only have to get the monthly averages, it can be done this way> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" > > B. Here, I need to define the no. of months, and years separately, right? > or else what 2, and 3 (in bold) indicates? >To get the grouping variable of sequential months that you want, you only need the month and year values of the dates in the first column. First I used the "strsplit" function to split the date field at the hyphens, then used "sapply" to extract ("[") the second (month) and *third (year)* parts as two new columns. Because you have more than one year of data, you need the year values or you will group all Januarys, all Februarys and so on. Notice how I pass both of the new columns as a list (a data frame is a type of list) in the call to get the mean of each month. 1. Here, as per my understanding, the "3" indicates the 3rd year, right? But, you showed an average for 2 months of the same year. Then, what "3" in the spdat$year object indicate? C. From this part, I got the exact average values of both January and> February of 1994 for country A, and B. But, in code, I have a query that I > need to define spdat$returnA, and spdat$returnB separately before writing > this code, right? Like this, I need to define for each 84 countries > separately with their respective number of months, and years before writing > this code, right? >I don't think so. Because I don't know what your data looks like, I am guessing that for each row, it has columns for each of the 84 countries. I don't know what these columns are named, either. Maybe: date Australia Belarus ... Zambia 01/01/1994 20 21 22 ... Here, due to my misunderstanding about the code, I was wrong. But, what data structure you guessed, it is absolutely right that for each row, I have columns for each of the 84 countries. So, I think, I need to define the date column with no. of months, and years once for all the countries. Therefore, I got my answer to the first and third question in the previous email (what you suggested) that I no need to define the column of each country, as the date, and no. of observations are same for all countries. But, the no. of days are different for each month, and similarly, for each year. So, I think I need to define date for each year separately. Hence, I have given an example of 12 months, for 2 years (i.e. 1994, and 1995), and have written the following code. Please correct me in case I am wrong. spdat<-data.frame( dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(rep(1,21),rep(2,20), rep(3,23), rep(4,21), rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12,22) ),rep(1994,260) dates1paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,22),rep(2,20), rep(3,23), rep(4,20), rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12,21) ),rep(1995,259) ,sep="-") Concerning the exporting of structure of the dataset to excel, I will have 12*84 matrix. But, please suggest me the way to proceed for the large sample. I have mentioned below what I understood from your code. Please correct me if I am wrong. 1. I need to define the date for each year as the no. of days in each month are different for each year (as mentioned in my above code). For instance, in my data file, Jan 1994 has 21 days while Jan 1995 has 22 days. 2. Need to define the date column as character. 3. Need to define the monthly average for each month, and year. So, now code will be as follows. spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12) %%%%As I need all months average sequentially. spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3) Here, this meaning of "3", I am really unable to get. 4. Need to define each country with each month and year as mentioned in the last part of your code. Please suggest me in this regard. Thank you. [image: Mailtrack] <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> Sender notified by Mailtrack <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> 09/13/19, 06:41:41 PM On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <drjimlemon at gmail.com> wrote:> Hi Subhamitra, > I'll try to write my answers adjacent to your questions below. > > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra < > subhamitra.patra at gmail.com> wrote: > >> Dear Sir, >> >> Thank you very much for your suggestion. >> >> Yes, your suggested code worked. But, actually, I have data from 3rd >> January 1994 to 3rd August 2017 for very large (i.e. for 84 countries) >> sample. From this, I have given the example of the years up to 2000. Before >> applying the same code for the long 24 years, I want to learn the logic >> behind the code. Actually, some part of the code is not understandable to >> me which I mentioned in the bold letter as follows. >> >> "spdat<-data.frame( >> dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"), >> returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))" >> >> A. Here, I need to define the no. of days in a month, and the no. of >> countries name separately, right? But, what is meant by 15:50, and 10:45 in >> return A, and B respectively? >> > > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the > real values of return are, so I made them up using the "sample" function. > However, this is not meant to mislead anyone, just to show how whatever > numbers are in your data can be used in calculations. The colon (":") > operator creates a sequence of numbers starting with the one to the left > and ending with the one to the right. > >> >> "# if you only have to get the monthly averages, it can be done this way >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" >> >> B. Here, I need to define the no. of months, and years separately, right? >> or else what 2, and 3 (in bold) indicates? >> > > To get the grouping variable of sequential months that you want, you only > need the month and year values of the dates in the first column. First I > used the "strsplit" function to split the date field at the hyphens, then > used "sapply" to extract ("[") the second (month) and third (year) parts as > two new columns. Because you have more than one year of data, you need the > year values or you will group all Januarys, all Februarys and so on. Notice > how I pass both of the new columns as a list (a data frame is a type of > list) in the call to get the mean of each month. > >> >> "# get the averages by month and year - is this correct? >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean) >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)" >> >> C. From this part, I got the exact average values of both January and >> February of 1994 for country A, and B. But, in code, I have a query that I >> need to define spdat$returnA, and spdat$returnB separately before writing >> this code, right? Like this, I need to define for each 84 countries >> separately with their respective number of months, and years before writing >> this code, right? >> > > I don't think so. Because I don't know what your data looks like, I am > guessing that for each row, it has columns for each of the 84 countries. I > don't know what these columns are named, either. Maybe: > > date Australia Belarus ... Zambia > 01/01/1994 20 21 22 > ... > > >> Yes, after obtaining the monthly average for each country's data, I need >> to use them for further calculations. So, I want to export the result to >> excel. But, until understanding the code, I think I willn't able to apply >> for the entire sample, and cannot be able to discuss the format of the >> resulted column to export to excel. >> > > Say that we perform the grouped mean calculation for the first two country > columns like this: > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean) > monmeans > Australia Belarus > [1,] 29.70000 30.43333 > [2,] 34.17857 27.39286 > > We are presented with a 2x2 matrix of monthly means in just the format > someone might use for importing into Excel. The first row is January 1994, > the second February 1994 and so on. By expanding the columns to include all > the countries in your data, You should have the result you want. > > Jim >-- *Best Regards,* *Subhamitra Patra* *Phd. Research Scholar* *Department of Humanities and Social Sciences* *Indian Institute of Technology, Kharagpur* *INDIA* [[alternative HTML version deleted]]
PIKAL Petr
2019-Sep-13 13:33 UTC
[R] Query about calculating the monthly average of daily data columns
Hi I am almost 100% sure that you would spare yourself much trouble if you changed your date column to real date ?as.Date reshape your wide format to long one library(reshape2) ?melt to get 3 column data.frame with one date column, one country column and one value column use ?aggregate and ?format to get summary value something like aggregate(value column, list(format(date column, "%m.%Y"), country column), mean) But if you insist to scratch your left ear with right hand accross your head, you could continue your way. Cheers Petr> -----Original Message----- > From: R-help <r-help-bounces at r-project.org> On Behalf Of Subhamitra > Patra > Sent: Friday, September 13, 2019 3:20 PM > To: Jim Lemon <drjimlemon at gmail.com>; r-help mailing list <r-help at r- > project.org> > Subject: Re: [R] Query about calculating the monthly average of daily data > columns > > Dear Sir, > > Yes, I understood the logic. But, still, I have a few queries that I mentioned > below your answers. > > "# if you only have to get the monthly averages, it can be done this way > > spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" > > > > B. Here, I need to define the no. of months, and years separately, right? > > or else what 2, and 3 (in bold) indicates? > > > > To get the grouping variable of sequential months that you want, you only > need the month and year values of the dates in the first column. First I used > the "strsplit" function to split the date field at the hyphens, then used > "sapply" to extract ("[") the second (month) and *third (year)* parts as two > new columns. Because you have more than one year of data, you need the > year values or you will group all Januarys, all Februarys and so on. > Notice how I pass both of the new columns as a list (a data frame is a type of > list) in the call to get the mean of each month. > > 1. Here, as per my understanding, the "3" indicates the 3rd year, right? > But, you showed an average for 2 months of the same year. Then, what "3" > in the spdat$year object indicate? > > > C. From this part, I got the exact average values of both January and > > February of 1994 for country A, and B. But, in code, I have a query > > that I need to define spdat$returnA, and spdat$returnB separately > > before writing this code, right? Like this, I need to define for each > > 84 countries separately with their respective number of months, and > > years before writing this code, right? > > > > I don't think so. Because I don't know what your data looks like, I am > guessing that for each row, it has columns for each of the 84 countries. I > don't know what these columns are named, either. Maybe: > > date Australia Belarus ... Zambia > 01/01/1994 20 21 22 > ... > > Here, due to my misunderstanding about the code, I was wrong. But, what > data structure you guessed, it is absolutely right that for each row, I have > columns for each of the 84 countries. So, I think, I need to define the date > column with no. of months, and years once for all the countries. > Therefore, I got my answer to the first and third question in the previous > email (what you suggested) that I no need to define the column of each > country, as the date, and no. of observations are same for all countries. > But, the no. of days are different for each month, and similarly, for each > year. So, I think I need to define date for each year separately. Hence, I have > given an example of 12 months, for 2 years (i.e. 1994, and 1995), and have > written the following code. Please correct me in case I am wrong. > > spdat<-data.frame( > > dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r > ep(1,21),rep(2,20), > rep(3,23), rep(4,21), > rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12 > ,22) > ),rep(1994,260) > dates1> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2 > 2),rep(2,20), > rep(3,23), rep(4,20), > rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12 > ,21) > ),rep(1995,259) ,sep="-") > > Concerning the exporting of structure of the dataset to excel, I will have > 12*84 matrix. But, please suggest me the way to proceed for the large > sample. I have mentioned below what I understood from your code. Please > correct me if I am wrong. > 1. I need to define the date for each year as the no. of days in each month > are different for each year (as mentioned in my above code). For instance, in > my data file, Jan 1994 has 21 days while Jan 1995 has 22 days. > 2. Need to define the date column as character. > 3. Need to define the monthly average for each month, and year. So, now > code will be as follows. > spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12) > %%%%As I need all months average sequentially. > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3) > > Here, this meaning of "3", I am really unable to get. > > 4. Need to define each country with each month and year as mentioned in > the last part of your code. > > Please suggest me in this regard. > > Thank you. > > > > > > > > [image: Mailtrack] > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca > mpaign=signaturevirality5&> > Sender > notified by > Mailtrack > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca > mpaign=signaturevirality5&> > 09/13/19, > 06:41:41 PM > > On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <drjimlemon at gmail.com> wrote: > > > Hi Subhamitra, > > I'll try to write my answers adjacent to your questions below. > > > > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra < > > subhamitra.patra at gmail.com> wrote: > > > >> Dear Sir, > >> > >> Thank you very much for your suggestion. > >> > >> Yes, your suggested code worked. But, actually, I have data from 3rd > >> January 1994 to 3rd August 2017 for very large (i.e. for 84 > >> countries) sample. From this, I have given the example of the years > >> up to 2000. Before applying the same code for the long 24 years, I > >> want to learn the logic behind the code. Actually, some part of the > >> code is not understandable to me which I mentioned in the bold letter as > follows. > >> > >> "spdat<-data.frame( > >> dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"), > >> returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))" > >> > >> A. Here, I need to define the no. of days in a month, and the no. of > >> countries name separately, right? But, what is meant by 15:50, and > >> 10:45 in return A, and B respectively? > >> > > > > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the > > real values of return are, so I made them up using the "sample" function. > > However, this is not meant to mislead anyone, just to show how > > whatever numbers are in your data can be used in calculations. The > > colon (":") operator creates a sequence of numbers starting with the > > one to the left and ending with the one to the right. > > > >> > >> "# if you only have to get the monthly averages, it can be done this > >> way > >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) > >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" > >> > >> B. Here, I need to define the no. of months, and years separately, right? > >> or else what 2, and 3 (in bold) indicates? > >> > > > > To get the grouping variable of sequential months that you want, you > > only need the month and year values of the dates in the first column. > > First I used the "strsplit" function to split the date field at the > > hyphens, then used "sapply" to extract ("[") the second (month) and > > third (year) parts as two new columns. Because you have more than one > > year of data, you need the year values or you will group all Januarys, > > all Februarys and so on. Notice how I pass both of the new columns as > > a list (a data frame is a type of > > list) in the call to get the mean of each month. > > > >> > >> "# get the averages by month and year - is this correct? > >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean) > >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)" > >> > >> C. From this part, I got the exact average values of both January and > >> February of 1994 for country A, and B. But, in code, I have a query > >> that I need to define spdat$returnA, and spdat$returnB separately > >> before writing this code, right? Like this, I need to define for each > >> 84 countries separately with their respective number of months, and > >> years before writing this code, right? > >> > > > > I don't think so. Because I don't know what your data looks like, I am > > guessing that for each row, it has columns for each of the 84 > > countries. I don't know what these columns are named, either. Maybe: > > > > date Australia Belarus ... Zambia > > 01/01/1994 20 21 22 > > ... > > > > > >> Yes, after obtaining the monthly average for each country's data, I > >> need to use them for further calculations. So, I want to export the > >> result to excel. But, until understanding the code, I think I willn't > >> able to apply for the entire sample, and cannot be able to discuss > >> the format of the resulted column to export to excel. > >> > > > > Say that we perform the grouped mean calculation for the first two > > country columns like this: > > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean) > > monmeans > > Australia Belarus > > [1,] 29.70000 30.43333 > > [2,] 34.17857 27.39286 > > > > We are presented with a 2x2 matrix of monthly means in just the format > > someone might use for importing into Excel. The first row is January > > 1994, the second February 1994 and so on. By expanding the columns to > > include all the countries in your data, You should have the result you want. > > > > Jim > > > > > -- > *Best Regards,* > *Subhamitra Patra* > *Phd. Research Scholar* > *Department of Humanities and Social Sciences* *Indian Institute of > Technology, Kharagpur* > *INDIA* > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.Osobn? ?daje: Informace o zpracov?n? a ochran? osobn?ch ?daj? obchodn?ch partner? PRECHEZA a.s. jsou zve?ejn?ny na: https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information about processing and protection of business partner?s personal data are available on website: https://www.precheza.cz/en/personal-data-protection-principles/ D?v?rnost: Tento e-mail a jak?koliv k n?mu p?ipojen? dokumenty jsou d?v?rn? a podl?haj? tomuto pr?vn? z?vazn?mu prohl??en? o vylou?en? odpov?dnosti: https://www.precheza.cz/01-dovetek/ | This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
Subhamitra Patra
2019-Sep-13 13:58 UTC
[R] Query about calculating the monthly average of daily data columns
Dear PIKAL, Thank you very much for your suggestion. I tried your previous suggested code and getting the average value for each month for both country A, and B. But in your recent email, you are suggesting not to change the date column to real date. If I am going through your recently suggested code, i.e. "aggregate(value column, list(format(date column, "%m.%Y"), country column), mean)" I am getting an Error that "*aggregate(value, list(format(date, "%m.%Y"), country), mean) : **object 'value' not found"*. Here, my query "*may I need to define the date column, country column, and value column separately?"* Further, I need something the average value result like below in the data frame Month Country A Country B Jan 1994 26.66 35.78 Feb 1994 26.13 29.14 so that it will be easy for me to export to excel, and to use for the further calculations. Please suggest me in this regard. Thank you. [image: Mailtrack] <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> Sender notified by Mailtrack <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> 09/13/19, 07:22:53 PM On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr <petr.pikal at precheza.cz> wrote:> Hi > > I am almost 100% sure that you would spare yourself much trouble if you > changed your date column to real date > > ?as.Date > > reshape your wide format to long one > library(reshape2) > ?melt > > to get 3 column data.frame with one date column, one country column and > one value column > > use ?aggregate and ?format to get summary value > > something like > aggregate(value column, list(format(date column, "%m.%Y"), country > column), mean) > > But if you insist to scratch your left ear with right hand accross your > head, you could continue your way. > > Cheers > Petr > > > -----Original Message----- > > From: R-help <r-help-bounces at r-project.org> On Behalf Of Subhamitra > > Patra > > Sent: Friday, September 13, 2019 3:20 PM > > To: Jim Lemon <drjimlemon at gmail.com>; r-help mailing list <r-help at r- > > project.org> > > Subject: Re: [R] Query about calculating the monthly average of daily > data > > columns > > > > Dear Sir, > > > > Yes, I understood the logic. But, still, I have a few queries that I > mentioned > > below your answers. > > > > "# if you only have to get the monthly averages, it can be done this way > > > spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) > > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" > > > > > > B. Here, I need to define the no. of months, and years separately, > right? > > > or else what 2, and 3 (in bold) indicates? > > > > > > > To get the grouping variable of sequential months that you want, you only > > need the month and year values of the dates in the first column. First I > used > > the "strsplit" function to split the date field at the hyphens, then used > > "sapply" to extract ("[") the second (month) and *third (year)* parts as > two > > new columns. Because you have more than one year of data, you need the > > year values or you will group all Januarys, all Februarys and so on. > > Notice how I pass both of the new columns as a list (a data frame is a > type of > > list) in the call to get the mean of each month. > > > > 1. Here, as per my understanding, the "3" indicates the 3rd year, right? > > But, you showed an average for 2 months of the same year. Then, what "3" > > in the spdat$year object indicate? > > > > > > C. From this part, I got the exact average values of both January and > > > February of 1994 for country A, and B. But, in code, I have a query > > > that I need to define spdat$returnA, and spdat$returnB separately > > > before writing this code, right? Like this, I need to define for each > > > 84 countries separately with their respective number of months, and > > > years before writing this code, right? > > > > > > > I don't think so. Because I don't know what your data looks like, I am > > guessing that for each row, it has columns for each of the 84 countries. > I > > don't know what these columns are named, either. Maybe: > > > > date Australia Belarus ... Zambia > > 01/01/1994 20 21 22 > > ... > > > > Here, due to my misunderstanding about the code, I was wrong. But, what > > data structure you guessed, it is absolutely right that for each row, I > have > > columns for each of the 84 countries. So, I think, I need to define the > date > > column with no. of months, and years once for all the countries. > > Therefore, I got my answer to the first and third question in the > previous > > email (what you suggested) that I no need to define the column of each > > country, as the date, and no. of observations are same for all countries. > > But, the no. of days are different for each month, and similarly, for > each > > year. So, I think I need to define date for each year separately. > Hence, I have > > given an example of 12 months, for 2 years (i.e. 1994, and 1995), and > have > > written the following code. Please correct me in case I am wrong. > > > > spdat<-data.frame( > > > > > dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r > > ep(1,21),rep(2,20), > > rep(3,23), rep(4,21), > > > rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12 > > ,22) > > ),rep(1994,260) > > dates1> > > paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2 > > 2),rep(2,20), > > rep(3,23), rep(4,20), > > > rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12 > > ,21) > > ),rep(1995,259) ,sep="-") > > > > Concerning the exporting of structure of the dataset to excel, I will > have > > 12*84 matrix. But, please suggest me the way to proceed for the large > > sample. I have mentioned below what I understood from your code. Please > > correct me if I am wrong. > > 1. I need to define the date for each year as the no. of days in each > month > > are different for each year (as mentioned in my above code). For > instance, in > > my data file, Jan 1994 has 21 days while Jan 1995 has 22 days. > > 2. Need to define the date column as character. > > 3. Need to define the monthly average for each month, and year. So, now > > code will be as follows. > > > spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12) > > %%%%As I need all months average sequentially. > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3) > > > > Here, this meaning of "3", I am really unable to get. > > > > 4. Need to define each country with each month and year as mentioned in > > the last part of your code. > > > > Please suggest me in this regard. > > > > Thank you. > > > > > > > > > > > > > > > > [image: Mailtrack] > > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca > > mpaign=signaturevirality5&> > > Sender > > notified by > > Mailtrack > > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca > > mpaign=signaturevirality5&> > > 09/13/19, > > 06:41:41 PM > > > > On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <drjimlemon at gmail.com> wrote: > > > > > Hi Subhamitra, > > > I'll try to write my answers adjacent to your questions below. > > > > > > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra < > > > subhamitra.patra at gmail.com> wrote: > > > > > >> Dear Sir, > > >> > > >> Thank you very much for your suggestion. > > >> > > >> Yes, your suggested code worked. But, actually, I have data from 3rd > > >> January 1994 to 3rd August 2017 for very large (i.e. for 84 > > >> countries) sample. From this, I have given the example of the years > > >> up to 2000. Before applying the same code for the long 24 years, I > > >> want to learn the logic behind the code. Actually, some part of the > > >> code is not understandable to me which I mentioned in the bold letter > as > > follows. > > >> > > >> "spdat<-data.frame( > > >> > dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"), > > >> returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))" > > >> > > >> A. Here, I need to define the no. of days in a month, and the no. of > > >> countries name separately, right? But, what is meant by 15:50, and > > >> 10:45 in return A, and B respectively? > > >> > > > > > > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the > > > real values of return are, so I made them up using the "sample" > function. > > > However, this is not meant to mislead anyone, just to show how > > > whatever numbers are in your data can be used in calculations. The > > > colon (":") operator creates a sequence of numbers starting with the > > > one to the left and ending with the one to the right. > > > > > >> > > >> "# if you only have to get the monthly averages, it can be done this > > >> way > > >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) > > >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" > > >> > > >> B. Here, I need to define the no. of months, and years separately, > right? > > >> or else what 2, and 3 (in bold) indicates? > > >> > > > > > > To get the grouping variable of sequential months that you want, you > > > only need the month and year values of the dates in the first column. > > > First I used the "strsplit" function to split the date field at the > > > hyphens, then used "sapply" to extract ("[") the second (month) and > > > third (year) parts as two new columns. Because you have more than one > > > year of data, you need the year values or you will group all Januarys, > > > all Februarys and so on. Notice how I pass both of the new columns as > > > a list (a data frame is a type of > > > list) in the call to get the mean of each month. > > > > > >> > > >> "# get the averages by month and year - is this correct? > > >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean) > > >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)" > > >> > > >> C. From this part, I got the exact average values of both January and > > >> February of 1994 for country A, and B. But, in code, I have a query > > >> that I need to define spdat$returnA, and spdat$returnB separately > > >> before writing this code, right? Like this, I need to define for each > > >> 84 countries separately with their respective number of months, and > > >> years before writing this code, right? > > >> > > > > > > I don't think so. Because I don't know what your data looks like, I am > > > guessing that for each row, it has columns for each of the 84 > > > countries. I don't know what these columns are named, either. Maybe: > > > > > > date Australia Belarus ... Zambia > > > 01/01/1994 20 21 22 > > > ... > > > > > > > > >> Yes, after obtaining the monthly average for each country's data, I > > >> need to use them for further calculations. So, I want to export the > > >> result to excel. But, until understanding the code, I think I willn't > > >> able to apply for the entire sample, and cannot be able to discuss > > >> the format of the resulted column to export to excel. > > >> > > > > > > Say that we perform the grouped mean calculation for the first two > > > country columns like this: > > > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean) > > > monmeans > > > Australia Belarus > > > [1,] 29.70000 30.43333 > > > [2,] 34.17857 27.39286 > > > > > > We are presented with a 2x2 matrix of monthly means in just the format > > > someone might use for importing into Excel. The first row is January > > > 1994, the second February 1994 and so on. By expanding the columns to > > > include all the countries in your data, You should have the result you > want. > > > > > > Jim > > > > > > > > > -- > > *Best Regards,* > > *Subhamitra Patra* > > *Phd. Research Scholar* > > *Department of Humanities and Social Sciences* *Indian Institute of > > Technology, Kharagpur* > > *INDIA* > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting- > > guide.html > > and provide commented, minimal, self-contained, reproducible code. > Osobn? ?daje: Informace o zpracov?n? a ochran? osobn?ch ?daj? obchodn?ch > partner? PRECHEZA a.s. jsou zve?ejn?ny na: > https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information > about processing and protection of business partner?s personal data are > available on website: > https://www.precheza.cz/en/personal-data-protection-principles/ > D?v?rnost: Tento e-mail a jak?koliv k n?mu p?ipojen? dokumenty jsou > d?v?rn? a podl?haj? tomuto pr?vn? z?vazn?mu prohl??en? o vylou?en? > odpov?dnosti: https://www.precheza.cz/01-dovetek/ | This email and any > documents attached to it may be confidential and are subject to the legally > binding disclaimer: https://www.precheza.cz/en/01-disclaimer/ > >-- *Best Regards,* *Subhamitra Patra* *Phd. Research Scholar* *Department of Humanities and Social Sciences* *Indian Institute of Technology, Kharagpur* *INDIA* [[alternative HTML version deleted]]
PIKAL Petr
2019-Sep-16 11:00 UTC
[R] Query about calculating the monthly average of daily data columns
Original email did not come through (some problems with formating). Hi No, on contrary. I **am** suggesting to change date column to real date asi it is easy to handle with appropriate functions. Here are some fake data> str(spdat)'data.frame':?? 260 obs. of? 3 variables: $ dates?? : Date, format: "1995-01-01" "1995-01-02" "1995-01-03" "1995-01-04" ... $ coutryA : num? 0.188 0.405 -0.107 -0.596 -0.529 ... $ countryB: num? 9.4 10.76 11.24 8.26 10.71 ..> head(spdat)?? ????dates??? coutryA? countryB 1 1995-01-01? 0.1875060? 9.402851 2 1995-01-02? 0.4045193 10.755112 3 1995-01-03 -0.1073904 11.243663 4 1995-01-04 -0.5959683? 8.256424 5 1995-01-05 -0.5293772 10.705431 6 1995-01-06 -0.2228029 10.171461 First I melt it spdat.m <- melt(spdat, id.var="dates")> head(spdat.m)???? ??dates variable????? value 1 1995-01-01? coutryA? 0.1875060 2 1995-01-02? coutryA? 0.4045193 3 1995-01-03? coutryA -0.1073904 4 1995-01-04? coutryA -0.5959683 5 1995-01-05? coutryA -0.5293772 6 1995-01-06? coutryA -0.2228029 I do aggregation> spdat.ag <- aggregate(spdat.m$value, list(format(spdat.m$dates, "%m.%Y"), spdat.m$variable), mean)And now I use dcast ?to get required result.> dcast(spdat.ag, Group.1~Group.2)Using x as value column: use value.var to override. ?? Group.1????? coutryA? countryB 1? 01.1995? 0.098688137 10.177696 2? 02.1995? 0.352264682? 9.609261 3? 03.1995? 0.155521876 10.043503 4? 04.1995 -0.166092393 10.129844 5? 05.1995? 0.164665188 10.308275 6? 06.1995? 0.260633585 10.210129 7? 07.1995? 0.003671979 10.549016 8? 08.1995? 0.045295990 10.087435 9? 09.1995 -0.145488206? 9.689876 10 10.1995 -0.225645950? 9.743744 11 11.1995? 0.030273383 10.025435 12 12.1995? 0.043557468 10.105626 Cheers Petr Here are the data.> dput(spdat)spdat ?<- structure(list(dates = structure(c(9131, 9132, 9133, 9134, 9135, 9136, 9137, 9138, 9139, 9140, 9141, 9142, 9143, 9144, 9145, 9146, 9147, 9148, 9149, 9150, 9151, 9152, 9162, 9163, 9164, 9165, 9166, 9167, 9168, 9169, 9170, 9171, 9172, 9173, 9174, 9175, 9176, 9177, 9178, 9179, 9180, 9181, 9190, 9191, 9192, 9193, 9194, 9195, 9196, 9197, 9198, 9199, 9200, 9201, 9202, 9203, 9204, 9205, 9206, 9207, 9208, 9209, 9210, 9211, 9212, 9221, 9222, 9223, 9224, 9225, 9226, 9227, 9228, 9229, 9230, 9231, 9232, 9233, 9234, 9235, 9236, 9237, 9238, 9239, 9240, 9251, 9252, 9253, 9254, 9255, 9256, 9257, 9258, 9259, 9260, 9261, 9262, 9263, 9264, 9265, 9266, 9267, 9268, 9269, 9270, 9271, 9272, 9273, 9282, 9283, 9284, 9285, 9286, 9287, 9288, 9289, 9290, 9291, 9292, 9293, 9294, 9295, 9296, 9297, 9298, 9299, 9300, 9301, 9302, 9303, 9312, 9313, 9314, 9315, 9316, 9317, 9318, 9319, 9320, 9321, 9322, 9323, 9324, 9325, 9326, 9327, 9328, 9329, 9330, 9331, 9332, 9343, 9344, 9345, 9346, 9347, 9348, 9349, 9350, 9351, 9352, 9353, 9354, 9355, 9356, 9357, 9358, 9359, 9360, 9361, 9362, 9363, 9364, 9365, 9374, 9375, 9376, 9377, 9378, 9379, 9380, 9381, 9382, 9383, 9384, 9385, 9386, 9387, 9388, 9389, 9390, 9391, 9392, 9393, 9394, 9404, 9405, 9406, 9407, 9408, 9409, 9410, 9411, 9412, 9413, 9414, 9415, 9416, 9417, 9418, 9419, 9420, 9421, 9422, 9423, 9424, 9456, 9435, 9436, 9437, 9438, 9439, 9440, 9441, 9442, 9443, 9444, 9445, 9446, 9447, 9448, 9449, 9450, 9451, 9452, 9453, 9454, 9455, 9486, 9465, 9466, 9467, 9468, 9469, 9470, 9471, 9472, 9473, 9474, 9475, 9476, 9477, 9478, 9479, 9480, 9481, 9482, 9483, 9484, 9151), class = "Date"), coutryA = c(0.187506004416315, 0.404519257417805, -0.107390371811605, -0.595968278805544, -0.529377240936012, -0.222802921207767, 0.413182392872818, 0.689673026532298, -1.2768723266992, -0.506308625809406, 0.113859233745174, -0.0963423819877653, 0.323987304768398, 1.63846917270538, 0.893233423250338, 0.297732439150487, 0.949323101836486, -0.599518074708052, 0.366372319197032, -2.25734971953878, -0.190971733204918, -0.0874143568874351, 1.46699645184047, 0.00702170238687361, 0.11221346278474, -0.8060359607624, 0.340842350476532, 0.798838328074708, 0.449214745851041, -0.664972890558734, 0.521830282184173, -1.35020467264521, -0.95240631225826, 1.25607320999849, 1.57018988549379, 0.99477900888445, -0.936218787378207, 1.48489932847779, 0.529222943794807, 0.0995675049147771, 0.477770516727839, 1.64567253670186, -0.0212651530684566, 0.558952796713992, 0.0409979382929057, 0.428675380654606, 0.0919422583362682, -0.819694497340459, 1.23998830450888, 0.607498144489643, -1.27724580163097, 1.41634774644371, -0.579094515769707, 2.02039606694223, 0.0740478208705996, -1.69826944583929, -0.321482399813063, -0.489198601400924, 2.0066750686302, -1.90624857752322, 0.46762405849973, 1.31264724137396, -0.0473627194710677, 0.141362267796145, 0.329709761206515, 0.518454586458572, -1.39489985851779, -0.388303591187678, -0.668922704543522, 0.0735115674875065, 1.30737242978235, 0.198503397980751, 0.257831448122427, -1.31173539205588, -1.45147941969116, 0.359725782295977, 0.612882118056585, -0.0733768753346202, -0.508349204402508, 1.35776663767231, 0.997807735669086, -1.41717534266382, -0.894170593324238, -0.68578120845151, -0.211509378018794, 0.436738904337909, -1.46932152770435, 0.0817388759874159, -0.0389350881653141, 0.709198476466861, -0.963669144724435, -0.548607422521798, -0.896886885575286, 0.322231150840934, 1.37327611339939, 0.0310213133870952, 0.796577750757324, -0.2010067423637, -0.241723752424226, 1.37547329580654, -1.15382202538982, 0.101454200596915, 0.273663839664217, 1.8315140887841, 1.86096518756473, -0.536393730924719, -0.45845011727266, 1.10226256157127, -0.385596991265563, 3.20218061566932, -1.25865250042183, -0.13613128784276, 0.483329357746514, -0.597187329618306, 0.710977603908319, -1.07945708269043, -0.477626236401394, 1.51034914684104, 2.35886426985999, -0.0250526828683629, -0.29439443478131, 0.665774016744828, 0.464027472251246, 0.226658374792016, -0.802597030454373, 0.825517059805602, -1.11293193130819, -1.27677400513873, 1.60776237113347, 1.12490009531342, 0.95767047134623, 0.0475745549797055, -0.0591587460876868, -0.690617365240342, -1.62111622938126, -1.3545210707469, 1.8607927043106, 0.764367674339969, 1.49261525602638, 0.549570728337346, -1.29658399741794, -1.6289903797869, 0.00573336252135834, 0.0300702149640632, 0.440810830115721, 0.663568666361326, -0.126685900835146, -0.00221628368438927, 0.815321995886579, -0.499280888368945, -0.271814047751667, -0.071025546459042, 1.73165491816826, -0.0294770299043331, 0.833605607221529, -0.670108794857159, -0.303323318026829, 1.29039844459134, -0.818806702120603, -0.445515595649677, -0.0128796557666887, 0.320923705586147, 0.230597275812536, -1.54009153212366, -0.294702981688559, 0.581209734391958, 0.121384768986639, 0.502914098451111, -1.59018268505718, -0.635101104166451, 1.48005776676403, -0.25631761189957, 0.171947814411552, 0.444646195980014, 0.172655758440111, -0.00432159794094836, -0.549321974240026, 0.585055026451421, -1.22813371480849, 0.846807540195381, 0.319629441352597, 0.393525732059709, -1.40275675444594, 1.11062585584811, 0.214809571213853, -0.636432711800391, -0.283087127251573, -1.46385553207618, 0.436928676930225, -1.34231945433777, 0.451281957595763, -0.523155001924496, -2.69416779107545, 1.5513477373689, 0.989632029400036, 1.34636075948993, 0.346147428691405, -0.464527560160041, 0.337233933370495, 1.11331396366389, -1.00060600083316, -0.734784444487169, 1.40476315358621, 1.01671092179193, -0.0144306250829694, -0.923555930346906, -1.02275966525015, 0.619422010219383, 0.603484309754755, -0.774553813657576, 0.0932792545556387, -0.651884521428279, -0.61965612647073, -1.22104834441579, -1.31439612639271, -2.87707752518163, -0.0343801084491906, -0.640678302378492, -1.38653452986558, 0.884963139028743, -0.657454283462004, 0.462842665244993, -0.20881674837534, 0.6345884135548, 0.707165108434729, -0.162090928425892, -0.998662309785188, 1.3130254639318, 0.191890764940071, -0.0493619237876962, -0.55183232511689, 0.470263932874487, -0.217088645692971, 0.231550037620628, -0.530406537266415, -0.616522469083808, 0.329347084038265, 1.49420692610475, 1.91750823142859, 0.753536143872474, 0.766584887163714, -0.259803384094296, -0.402463714097741, -0.0229799209735185, -0.259677990559218, -1.41529707261105, 0.191362852138627, 1.54483266684747, -1.17947655378489, -0.426265411073274, 0.723010460481118, 1.37405142869537, -0.374771207936141, 0.0513905365832423, -0.369432731236118, -0.945441984794364, 0.179506648255554, 0.31971255725438, -1.25117960937319, 2.46228549580083 ), countryB = c(9.4028512714591, 10.7551115504652, 11.2436629228434, 8.25642360904389, 10.7054313972395, 10.1714609666091, 10.3726975056206, 10.6594299429162, 8.56250595443296, 10.5612153841067, 8.07612112826519, 9.94704207511951, 11.392407273156, 10.4257690445268, 10.6339442533038, 10.5314883825356, 10.3506665399033, 10.2426403362978, 10.8437715647564, 10.8247200587034, 11.2449815064171, 9.2898697883112, 9.05418978124619, 10.6080277672463, 9.19882175737148, 11.3589722806948, 10.4139334238189, 9.44305216810892, 9.58426470056472, 8.84208362003176, 10.8125431356391, 7.71357872650814, 8.73526671289828, 10.714693958224, 9.49976972594189, 9.41896864941478, 7.33073349261249, 10.5268398982262, 9.92255671125184, 10.5665378092202, 10.5035704895405, 7.93682068228084, 10.882421050424, 9.3237610577468, 8.42128120513304, 9.46103753451174, 10.3472215515392, 11.0483414005193, 10.3421689244599, 7.85120280889754, 11.6327644046254, 9.57620745972098, 10.6553844719749, 10.8490250129346, 10.2742492933876, 9.55428072119304, 9.0976351049804, 10.0731951766966, 10.6956488509516, 11.1530744146062, 10.3496303024767, 9.52734980693306, 9.64478424894734, 9.28301632295047, 10.9568790570688, 11.6052870914912, 9.58530202776537, 11.1338134902295, 8.66685735149472, 11.0230863576875, 10.8000609212302, 10.6510296259782, 11.831292326569, 9.53836122448974, 9.55805411377422, 9.90336204189518, 9.36377040999133, 11.7041009433341, 9.95628297574831, 10.718111342931, 10.4562688422472, 8.85976383099186, 8.94085496683824, 8.19538394018434, 10.1058448260449, 9.70821289789561, 9.08757962570738, 10.6599997541876, 10.0521137258219, 9.9924295699559, 11.8730269098299, 11.2634470795907, 11.3854762443416, 9.56742053529845, 10.4101561978503, 9.53376547865009, 9.75410966995361, 9.92804558924886, 8.36231430067066, 10.7486459346681, 12.0143881312685, 11.0083060332839, 9.32820954213586, 10.8420346742049, 9.73064414798223, 10.7593902723319, 10.976622155215, 10.1039774975157, 8.36317871802524, 9.21809894958653, 10.1015362220683, 11.4655736295123, 9.65528297274543, 9.67844310028008, 10.1516820910267, 8.38764450852642, 10.163558398201, 11.1432463477237, 12.0509818193223, 10.9896913965091, 11.1772406550953, 9.14396687337779, 9.93338627749979, 10.9548864433126, 8.64911301751956, 11.706463972364, 11.1012846649741, 8.7805267197408, 11.5802098773954, 10.2268513542863, 10.3509617168731, 9.09646558899397, 11.2706666647314, 11.3984335011704, 11.4808985388742, 10.5608771133999, 10.3684356806175, 10.4815588822618, 10.5818867877558, 12.2561035284691, 8.6464271477849, 10.3412351841865, 10.7577574534162, 11.1124067479261, 9.91627943243343, 10.6356898895291, 10.2107566441478, 10.0672734202575, 10.2385787014999, 11.7112606160069, 10.0453801263575, 8.84654136100724, 10.2173421609193, 9.27919801705716, 10.4755578829547, 7.69340209082122, 9.24705253848083, 10.8415406794597, 8.69603117680965, 11.2589214416702, 10.5425642239737, 10.1389355042458, 9.17267675180435, 12.3052338002213, 10.0181674985326, 12.2715476751051, 9.64516268052739, 10.6305299379912, 10.1829347684655, 9.97983942366781, 10.2559847744715, 10.3092266661814, 9.75215330673072, 10.250464278709, 9.31118800061454, 10.3310666767171, 9.09703848990093, 10.241195105962, 8.57290406448477, 8.98090855172704, 8.64653101832229, 12.6791587435376, 9.56000538681993, 10.4062255533723, 11.067091476284, 10.5255014737268, 10.2240941949978, 9.13081571869084, 9.5942352120783, 9.2753466212409, 10.2789293993548, 8.10255065585342, 9.48751297655077, 8.51198576785003, 9.46310532206947, 9.86727270762806, 11.5149248124739, 9.31557156735022, 9.34351230206303, 10.022139448869, 11.4111350893792, 8.57891783464065, 10.3761090924661, 9.38300408584683, 9.33694577526158, 9.25815555686085, 9.29856853889735, 8.4250073823245, 8.83022950824832, 9.1510846172981, 10.2553042376765, 10.0739540955956, 9.04955917463259, 10.8927827168631, 9.44611041690694, 10.7883395708593, 10.6010088332078, 7.72560864006592, 10.1760839916637, 11.5576569894392, 11.384809257294, 8.73504353987083, 9.00585942714512, 9.62327893504013, 10.3527072699866, 10.5220100705827, 8.74921668696853, 8.56415116683662, 12.1348451793815, 10.9496674323819, 9.64443817181322, 9.52977454697087, 10.4281877186725, 8.52701721410292, 11.6911584965782, 10.2300108250139, 8.65368821276485, 11.7733431942379, 10.2060233777681, 9.57291673029552, 9.82687667895106, 10.5939736188493, 11.2510605726337, 10.3383384488323, 9.92301237292945, 10.0164623230529, 10.4939857044034, 10.5631769648289, 10.935731043532, 11.0659359187168, 8.51697010486427, 9.79512310587405, 9.35132038807071, 11.3286703149903, 10.4621597293933, 10.4099459919071, 8.86246315190942, 9.30054044639769, 9.40346575227191, 9.59278722974697)), row.names = c(NA, -260L), class = "data.frame") From: Subhamitra Patra <mailto:subhamitra.patra at gmail.com> Sent: Friday, September 13, 2019 3:59 PM To: PIKAL Petr <mailto:petr.pikal at precheza.cz>; r-help mailing list <mailto:r-help at r-project.org> Subject: Re: [R] Query about calculating the monthly average of daily data columns Dear PIKAL, Thank you very much for your suggestion. I tried your previous suggested code and getting the average value for each month for both country A, and B. But in your recent email, you are suggesting not to change the date column to real date. If I am going through your recently suggested code, i.e. ?"aggregate(value column, list(format(date column, "%m.%Y"), country column), mean)" I am getting an?Error that "aggregate(value, list(format(date, "%m.%Y"), country), mean) :?object 'value' not found".? Here, my query "may I need to define the date column, country column, and value column separately?" Further, I need something the average value result like below in the data frame Month? ? ? ?Country A? ?Country B Jan 1994? ? 26.66? ? ? ? ?35.78 Feb 1994? ? 26.13? ? ? ? ?29.14 so that it will be easy for me to export to excel, and to use for the further calculations. Please suggest me in this regard. Thank you. https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5& Sender notified by https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5& 09/13/19, 07:22:53 PM On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr <mailto:petr.pikal at precheza.cz> wrote: Hi I am almost 100% sure that you would spare yourself much trouble if you changed your date column to real date ?as.Date reshape your wide format to long one library(reshape2) ?melt to get 3 column data.frame with one date column, one country column and one value column use ?aggregate and ?format to get summary value something like aggregate(value column, list(format(date column, "%m.%Y"), country column), mean) But if you insist to scratch your left ear with right hand accross your head, you could continue your way. Cheers Petr> -----Original Message----- > From: R-help <mailto:r-help-bounces at r-project.org> On Behalf Of Subhamitra > Patra > Sent: Friday, September 13, 2019 3:20 PM > To: Jim Lemon <mailto:drjimlemon at gmail.com>; r-help mailing list <r-help at r- > http://project.org> > Subject: Re: [R] Query about calculating the monthly average of daily data > columns > > Dear Sir, > > Yes, I understood the logic. But, still, I have a few queries that I mentioned > below your answers. > > "# if you only have to get the monthly averages, it can be done this way > > spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" > > > > B. Here, I need to define the no. of months, and years separately, right? > > or else what 2, and 3 (in bold) indicates? > > > > To get the grouping variable of sequential months that you want, you only > need the month and year values of the dates in the first column. First I used > the "strsplit" function to split the date field at the hyphens, then used > "sapply" to extract ("[") the second (month) and *third (year)* parts as two > new columns. Because you have more than one year of data, you need the > year values or you will group all Januarys, all Februarys and so on. > Notice how I pass both of the new columns as a list (a data frame is a type of > list) in the call to get the mean of each month. > > 1. Here, as per my understanding, the "3" indicates the 3rd year, right? > But, you showed an average for 2 months of the same year. Then, what "3" > in the? spdat$year object indicate? > > > C. From this part, I got the exact average values of both January and > > February of 1994 for country A, and B. But, in code, I have a query > > that I need to define? spdat$returnA, and? spdat$returnB separately > > before writing this code, right? Like this, I need to define for each > > 84 countries separately with their respective number of months, and > > years before writing this code, right? > > > > I don't think so. Because I don't know what your data looks like, I am > guessing that for each row, it has columns for each of the 84 countries. I > don't know what these columns are named, either. Maybe: > > date? ? ? ? ? ? ?Australia? ?Belarus? ?...? ? Zambia > 01/01/1994? ?20? ? ? ? ? ? ?21? ? ? ? ? ? ? ? ?22 > ... > > Here, due to my misunderstanding about the code, I was wrong. But, what > data structure you guessed, it is absolutely right that for each row, I have > columns for each of the 84 countries. So, I think, I need to define the date > column with no. of months, and years once for all the countries. > Therefore, I got my answer to the first and third question in the previous > email (what you suggested) that I no need to define the column of each > country, as the date, and no. of observations are same for all countries. > But, the no. of days are different for each month, and similarly, for each > year. So, I think I need to define date for each year separately.? Hence, I have > given an example of 12 months, for 2 years (i.e. 1994, and 1995), and have > written the following code. Please correct me in case I am wrong. > >? spdat<-data.frame( > > dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r > ep(1,21),rep(2,20), > rep(3,23), rep(4,21), > rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12 > ,22) > ),rep(1994,260) >? dates1> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2 > 2),rep(2,20), > rep(3,23), rep(4,20), > rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12 > ,21) > ),rep(1995,259) ,sep="-") > > Concerning the exporting of structure of the dataset to excel, I will have > 12*84 matrix. But, please suggest me the way to proceed for the large > sample. I have mentioned below what I understood from your code. Please > correct me if I am wrong. > 1. I need to define the date for each year as the no. of days in each month > are different for each year (as mentioned in my above code). For instance, in > my data file, Jan 1994 has 21 days while Jan 1995 has 22 days. > 2. Need to define the date column as character. > 3. Need to define the monthly average for each month, and year. So, now > code will be as follows. > spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12) >? ?%%%%As I need all months average sequentially. > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3) > > Here, this meaning of "3", I am really unable to get. > > 4. Need to define each country with each month and year as mentioned in > the last part of your code. > > Please suggest me in this regard. > > Thank you. > > > > > > > > [image: Mailtrack] > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca > mpaign=signaturevirality5&> > Sender > notified by > Mailtrack > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca > mpaign=signaturevirality5&> > 09/13/19, > 06:41:41 PM > > On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <mailto:drjimlemon at gmail.com> wrote: > > > Hi Subhamitra, > > I'll try to write my answers adjacent to your questions below. > > > > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra < > > mailto:subhamitra.patra at gmail.com> wrote: > > > >> Dear Sir, > >> > >> Thank you very much for your suggestion. > >> > >> Yes, your suggested code worked. But, actually, I have data from 3rd > >> January 1994 to 3rd August 2017 for very large (i.e. for 84 > >> countries) sample. From this, I have given the example of the years > >> up to 2000. Before applying the same code for the long 24 years, I > >> want to learn the logic behind the code. Actually, some part of the > >> code is not understandable to me which I mentioned in the bold letter as > follows. > >> > >> "spdat<-data.frame( > >>? ?dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"), > >>? ?returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))" > >> > >> A. Here, I need to define the no. of days in a month, and the no. of > >> countries name separately, right? But, what is meant by 15:50, and > >> 10:45 in return A, and B respectively? > >> > > > > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the > > real values of return are, so I made them up using the "sample" function. > > However, this is not meant to mislead anyone, just to show how > > whatever numbers are in your data can be used in calculations. The > > colon (":") operator creates a sequence of numbers starting with the > > one to the left and ending with the one to the right. > > > >> > >> "# if you only have to get the monthly averages, it can be done this > >> way > >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) > >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" > >> > >> B. Here, I need to define the no. of months, and years separately, right? > >> or else what 2, and 3 (in bold) indicates? > >> > > > > To get the grouping variable of sequential months that you want, you > > only need the month and year values of the dates in the first column. > > First I used the "strsplit" function to split the date field at the > > hyphens, then used "sapply" to extract ("[") the second (month) and > > third (year) parts as two new columns. Because you have more than one > > year of data, you need the year values or you will group all Januarys, > > all Februarys and so on. Notice how I pass both of the new columns as > > a list (a data frame is a type of > > list) in the call to get the mean of each month. > > > >> > >> "# get the averages by month and year - is this correct? > >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean) > >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)" > >> > >> C. From this part, I got the exact average values of both January and > >> February of 1994 for country A, and B. But, in code, I have a query > >> that I need to define? spdat$returnA, and? spdat$returnB separately > >> before writing this code, right? Like this, I need to define for each > >> 84 countries separately with their respective number of months, and > >> years before writing this code, right? > >> > > > > I don't think so. Because I don't know what your data looks like, I am > > guessing that for each row, it has columns for each of the 84 > > countries. I don't know what these columns are named, either. Maybe: > > > > date? ? ? ? ? ? ?Australia? ?Belarus? ?...? ? Zambia > > 01/01/1994? ?20? ? ? ? ? ? ?21? ? ? ? ? ? ? ? ?22 > > ... > > > > > >> Yes, after obtaining the monthly average for each country's data, I > >> need to use them for further calculations. So, I want to export the > >> result to excel. But, until understanding the code, I think I willn't > >> able to apply for the entire sample, and cannot be able to discuss > >> the format of the resulted column to export to excel. > >> > > > > Say that we perform the grouped mean calculation for the first two > > country columns like this: > > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean) > > monmeans > >? ? ?Australia? Belarus > > [1,]? 29.70000 30.43333 > > [2,]? 34.17857 27.39286 > > > > We are presented with a 2x2 matrix of monthly means in just the format > > someone might use for importing into Excel. The first row is January > > 1994, the second February 1994 and so on. By expanding the columns to > > include all the countries in your data, You should have the result you want. > > > > Jim > > > > > -- > *Best Regards,* > *Subhamitra Patra* > *Phd. Research Scholar* > *Department of Humanities and Social Sciences* *Indian Institute of > Technology, Kharagpur* > *INDIA* > > [[alternative HTML version deleted]] > > ______________________________________________ > mailto:R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.Osobn? ?daje: Informace o zpracov?n? a ochran? osobn?ch ?daj? obchodn?ch partner? PRECHEZA a.s. jsou zve?ejn?ny na: https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information about processing and protection of business partner?s personal data are available on website: https://www.precheza.cz/en/personal-data-protection-principles/ D?v?rnost: Tento e-mail a jak?koliv k n?mu p?ipojen? dokumenty jsou d?v?rn? a podl?haj? tomuto pr?vn? z?vazn?mu prohl??en? o vylou?en? odpov?dnosti: https://www.precheza.cz/01-dovetek/ | This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/ -- Best Regards, Subhamitra Patra Phd. Research Scholar Department of Humanities and Social Sciences Indian Institute of Technology, Kharagpur INDIA
Subhamitra Patra
2019-Oct-20 12:09 UTC
[R] Query about calculating the monthly average of daily data columns
Dear Sir, Thank you very much for your suggestions. Due to certain inconveniences, I was unable to work on your suggestions. Today I worked on both suggestions and got the result that I really wanted that monthly averages for each country. Here, I am asking one more query (just for learning purpose) that if my country name and its respective variable is in the panel format, and I want to take the monthly average for each country, how the code will be arranged. For your convenience, I am providing a small data sample below. PERMNO DATE Spread 111 19940103 0.025464308 111 19940104 0.064424296 111 19940105 0.018579337 111 19940106 0.018872211 111 19940107 0.065279782 111 19940110 0.063485905 111 19940111 0.018355453 111 19940112 0.064135683 111 19940113 0.063519987 111 19940114 0.018277351 111 19940117 0.018628417 111 19940118 0.065630229 111 19940119 0.018713152 111 19940120 0.019119037 111 19940121 0.068342043 111 19940124 0.020843244 111 19940125 0.019954211 111 19940126 0.018980321 111 19940127 0.066827165 111 19940128 0.067459235 111 19940131 0.068682559 111 19940201 0.02081465 111 19940202 0.068236091 111 19940203 0.068821406 111 19940204 0.020075648 111 19940207 0.066070584 111 19940208 0.066068837 111 19940209 0.019077072 111 19940210 0.065894875 111 19940211 0.018847478 111 19940214 0.065040844 111 19940215 0.01880332 111 19940216 0.018836199 111 19940217 0.066888865 111 19940218 0.067116793 111 19940221 0.068809742 111 19940222 0.068230213 111 19940223 0.069502855 111 19940224 0.070383523 111 19940225 0.020430811 111 19940228 0.067087257 111 19940301 0.066776479 111 19940302 0.019959031 111 19940303 0.066596469 111 19940304 0.019131334 111 19940307 0.019312528 111 19940308 0.067349909 111 19940309 0.068916431 111 19940310 0.068620043 111 19940311 0.070494844 111 19940314 0.071056842 111 19940315 0.071042517 111 19940316 0.072401771 111 19940317 0.071940001 111 19940318 0.07352884 111 19940321 0.072671688 111 19940322 0.072652595 111 19940323 0.021352138 111 19940324 0.069933727 111 19940325 0.068717467 111 19940328 0.020470748 111 19940329 0.020003748 111 19940330 0.065833717 111 19940331 0.065268388 111 19940401 0.018762356 111 19940404 0.064914179 111 19940405 0.064706743 111 19940406 0.018764175 111 19940407 0.06524806 111 19940408 0.018593449 111 19940411 0.064913949 111 19940412 0.01872089 111 19940413 0.018729328 111 19940414 0.018978773 111 19940415 0.065477137 111 19940418 0.064614365 111 19940419 0.064184148 111 19940420 0.018553192 111 19940421 0.066872771 111 19940422 0.06680782 111 19940425 0.067467961 111 19940426 0.02014297 111 19940427 0.062464016 111 19940428 0.062357052 112 19940429 0.000233993 112 19940103 0.000815264 112 19940104 0.000238165 112 19940105 0.000813632 112 19940106 0.000236915 112 19940107 0.000809102 112 19940110 0.000801642 112 19940111 0.000797932 112 19940112 0.000795251 112 19940113 0.000795186 112 19940114 0.000231359 112 19940117 0.000232134 112 19940118 0.000233718 112 19940119 0.000233993 112 19940120 0.000234694 112 19940121 0.000235753 112 19940124 0.000808653 112 19940125 0.000235604 112 19940126 0.000805068 112 19940127 0.000802337 112 19940128 0.000801768 112 19940131 0.000233517 112 19940201 0.000797431 112 19940202 0.000233338 112 19940203 0.000233826 112 19940204 0.000799519 112 19940207 0.000798105 112 19940208 0.000792245 112 19940209 0.000231113 112 19940210 0.000233413 112 19940211 0.000798168 112 19940214 0.000233282 112 19940215 0.000797848 112 19940216 0.000785165 112 19940217 0.000228426 112 19940218 0.000786783 112 19940221 0.00078343 112 19940222 0.000781459 112 19940223 0.000776264 112 19940224 0.000226399 112 19940225 0.000779066 112 19940228 0.000773603 112 19940301 0.000226487 112 19940302 0.000775233 112 19940303 0.000227017 112 19940304 0.000227854 112 19940307 0.000782814 112 19940308 0.000229164 112 19940309 0.000787033 112 19940310 0.000784049 112 19940311 0.000228984 112 19940314 0.00078697 112 19940315 0.000782567 112 19940316 0.000228516 112 19940317 0.000786347 112 19940318 0.000229236 112 19940321 0.000230107 112 19940322 0.000792689 112 19940323 0.000787284 112 19940324 0.000787221 112 19940325 0.000227978 112 19940328 0.000228713 112 19940329 0.000228894 112 19940330 0.000229255 112 19940331 0.000231003 112 19940401 0.000796567 112 19940404 0.000790668 112 19940405 0.00078195 112 19940406 0.000780475 112 19940407 0.000228355 112 19940408 0.000781723 112 19940411 0.000775741 112 19940412 0.000226647 112 19940413 0.000778876 112 19940414 0.000777336 112 19940415 0.000775253 112 19940418 0.000226362 112 19940419 0.000779554 112 19940420 0.000774824 112 19940421 0.000225582 112 19940422 0.000225724 112 19940425 0.000773361 112 19940426 0.0002256 112 19940427 0.000776416 113 19940428 0.000280542 113 19940429 0.000964148 113 19940103 0.000962654 113 19940104 0.000281768 113 19940105 0.000962219 113 19940106 0.000961965 113 19940107 0.000958602 113 19940110 0.000280056 113 19940111 0.000956348 113 19940112 0.000952171 113 19940113 0.000948176 113 19940114 0.000275607 113 19940117 0.000275773 113 19940118 0.000276738 113 19940119 0.000947068 113 19940120 0.000940959 113 19940121 0.000275224 113 19940124 0.000948489 113 19940125 0.000940076 113 19940126 0.0009309 113 19940127 0.000269955 113 19940128 0.000270328 113 19940131 0.000924234 113 19940201 0.000924038 113 19940202 0.000269088 113 19940203 0.000270247 113 19940204 0.000270562 113 19940207 0.00092656 113 19940208 0.000921819 113 19940209 0.000920361 113 19940210 0.000268958 113 19940211 0.000924758 113 19940214 0.000266768 113 19940215 0.000911325 113 19940216 0.000909294 113 19940217 0.000905887 113 19940218 0.000262919 113 19940221 0.000262978 113 19940222 0.000263189 113 19940223 0.000904439 113 19940224 0.000263512 113 19940225 0.000906184 113 19940228 0.000265198 113 19940301 0.000906126 113 19940302 0.000264357 113 19940303 0.000265392 113 19940304 0.000912495 113 19940307 0.000910641 113 19940308 0.000266143 113 19940309 0.000910113 113 19940310 0.000909277 113 19940311 0.000905056 113 19940314 0.00090285 113 19940315 0.000898831 113 19940316 0.000896118 113 19940317 0.000261294 113 19940318 0.000892563 113 19940321 0.000890852 113 19940322 0.00088639 113 19940323 0.000258509 113 19940324 0.000260286 113 19940325 0.000889354 113 19940328 0.000888373 113 19940329 0.000885049 113 19940330 0.000259116 113 19940331 0.000259474 113 19940401 0.000260316 113 19940404 0.000897493 113 19940405 0.000894592 113 19940406 0.000260435 113 19940407 0.000260989 113 19940408 0.000262061 113 19940411 0.000262262 113 19940412 0.000263604 113 19940413 0.000908682 113 19940414 0.000265348 113 19940415 0.000265637 113 19940418 0.00026862 113 19940419 0.000918882 113 19940420 0.000909904 113 19940421 0.000901725 113 19940422 0.000900062 113 19940425 0.000893547 113 19940426 0.000260899 Here, the 1st column is the name of the countries panel which I identified in number, 2nd is the date column arranged in yyyy-mm-dd format, and 3rd is the daily variable for each country for 4 months of 1994. I need to take the monthly average of spread variable for each country (i.e. noted as 111, 112, and 113) in the above example. In short, my monthly spread variable should be sorted on the basis of both PERMNO (i.e. country identifier), and months of the particular year. Please educate me that in this data format, how the average code can be written? Thank you very much. [image: Mailtrack] <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> Sender notified by Mailtrack <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&> 10/20/19, 05:31:23 PM On Mon, Sep 16, 2019 at 4:31 PM PIKAL Petr <petr.pikal at precheza.cz> wrote:> Original email did not come through (some problems with formating). > > Hi > > No, on contrary. I **am** suggesting to change date column to real date > asi it is easy to handle with appropriate functions. > > Here are some fake data > > > str(spdat) > 'data.frame': 260 obs. of 3 variables: > $ dates : Date, format: "1995-01-01" "1995-01-02" "1995-01-03" > "1995-01-04" ... > $ coutryA : num 0.188 0.405 -0.107 -0.596 -0.529 ... > $ countryB: num 9.4 10.76 11.24 8.26 10.71 .. > > > head(spdat) > dates coutryA countryB > 1 1995-01-01 0.1875060 9.402851 > 2 1995-01-02 0.4045193 10.755112 > 3 1995-01-03 -0.1073904 11.243663 > 4 1995-01-04 -0.5959683 8.256424 > 5 1995-01-05 -0.5293772 10.705431 > 6 1995-01-06 -0.2228029 10.171461 > > First I melt it > spdat.m <- melt(spdat, id.var="dates") > > > head(spdat.m) > dates variable value > 1 1995-01-01 coutryA 0.1875060 > 2 1995-01-02 coutryA 0.4045193 > 3 1995-01-03 coutryA -0.1073904 > 4 1995-01-04 coutryA -0.5959683 > 5 1995-01-05 coutryA -0.5293772 > 6 1995-01-06 coutryA -0.2228029 > > I do aggregation > > > spdat.ag <- aggregate(spdat.m$value, list(format(spdat.m$dates, > "%m.%Y"), spdat.m$variable), mean) > > And now I use dcast to get required result. > > > dcast(spdat.ag, Group.1~Group.2) > Using x as value column: use value.var to override. > Group.1 coutryA countryB > 1 01.1995 0.098688137 10.177696 > 2 02.1995 0.352264682 9.609261 > 3 03.1995 0.155521876 10.043503 > 4 04.1995 -0.166092393 10.129844 > 5 05.1995 0.164665188 10.308275 > 6 06.1995 0.260633585 10.210129 > 7 07.1995 0.003671979 10.549016 > 8 08.1995 0.045295990 10.087435 > 9 09.1995 -0.145488206 9.689876 > 10 10.1995 -0.225645950 9.743744 > 11 11.1995 0.030273383 10.025435 > 12 12.1995 0.043557468 10.105626 > > Cheers > Petr > > Here are the data. > > > dput(spdat) > spdat <- structure(list(dates = structure(c(9131, 9132, 9133, 9134, 9135, > 9136, 9137, 9138, 9139, 9140, 9141, 9142, 9143, 9144, 9145, 9146, > 9147, 9148, 9149, 9150, 9151, 9152, 9162, 9163, 9164, 9165, 9166, > 9167, 9168, 9169, 9170, 9171, 9172, 9173, 9174, 9175, 9176, 9177, > 9178, 9179, 9180, 9181, 9190, 9191, 9192, 9193, 9194, 9195, 9196, > 9197, 9198, 9199, 9200, 9201, 9202, 9203, 9204, 9205, 9206, 9207, > 9208, 9209, 9210, 9211, 9212, 9221, 9222, 9223, 9224, 9225, 9226, > 9227, 9228, 9229, 9230, 9231, 9232, 9233, 9234, 9235, 9236, 9237, > 9238, 9239, 9240, 9251, 9252, 9253, 9254, 9255, 9256, 9257, 9258, > 9259, 9260, 9261, 9262, 9263, 9264, 9265, 9266, 9267, 9268, 9269, > 9270, 9271, 9272, 9273, 9282, 9283, 9284, 9285, 9286, 9287, 9288, > 9289, 9290, 9291, 9292, 9293, 9294, 9295, 9296, 9297, 9298, 9299, > 9300, 9301, 9302, 9303, 9312, 9313, 9314, 9315, 9316, 9317, 9318, > 9319, 9320, 9321, 9322, 9323, 9324, 9325, 9326, 9327, 9328, 9329, > 9330, 9331, 9332, 9343, 9344, 9345, 9346, 9347, 9348, 9349, 9350, > 9351, 9352, 9353, 9354, 9355, 9356, 9357, 9358, 9359, 9360, 9361, > 9362, 9363, 9364, 9365, 9374, 9375, 9376, 9377, 9378, 9379, 9380, > 9381, 9382, 9383, 9384, 9385, 9386, 9387, 9388, 9389, 9390, 9391, > 9392, 9393, 9394, 9404, 9405, 9406, 9407, 9408, 9409, 9410, 9411, > 9412, 9413, 9414, 9415, 9416, 9417, 9418, 9419, 9420, 9421, 9422, > 9423, 9424, 9456, 9435, 9436, 9437, 9438, 9439, 9440, 9441, 9442, > 9443, 9444, 9445, 9446, 9447, 9448, 9449, 9450, 9451, 9452, 9453, > 9454, 9455, 9486, 9465, 9466, 9467, 9468, 9469, 9470, 9471, 9472, > 9473, 9474, 9475, 9476, 9477, 9478, 9479, 9480, 9481, 9482, 9483, > 9484, 9151), class = "Date"), coutryA = c(0.187506004416315, > 0.404519257417805, -0.107390371811605, -0.595968278805544, > -0.529377240936012, > -0.222802921207767, 0.413182392872818, 0.689673026532298, > -1.2768723266992, > -0.506308625809406, 0.113859233745174, -0.0963423819877653, > 0.323987304768398, > 1.63846917270538, 0.893233423250338, 0.297732439150487, 0.949323101836486, > -0.599518074708052, 0.366372319197032, -2.25734971953878, > -0.190971733204918, > -0.0874143568874351, 1.46699645184047, 0.00702170238687361, > 0.11221346278474, > -0.8060359607624, 0.340842350476532, 0.798838328074708, 0.449214745851041, > -0.664972890558734, 0.521830282184173, -1.35020467264521, > -0.95240631225826, > 1.25607320999849, 1.57018988549379, 0.99477900888445, -0.936218787378207, > 1.48489932847779, 0.529222943794807, 0.0995675049147771, > 0.477770516727839, > 1.64567253670186, -0.0212651530684566, 0.558952796713992, > 0.0409979382929057, > 0.428675380654606, 0.0919422583362682, -0.819694497340459, > 1.23998830450888, > 0.607498144489643, -1.27724580163097, 1.41634774644371, > -0.579094515769707, > 2.02039606694223, 0.0740478208705996, -1.69826944583929, > -0.321482399813063, > -0.489198601400924, 2.0066750686302, -1.90624857752322, 0.46762405849973, > 1.31264724137396, -0.0473627194710677, 0.141362267796145, > 0.329709761206515, > 0.518454586458572, -1.39489985851779, -0.388303591187678, > -0.668922704543522, > 0.0735115674875065, 1.30737242978235, 0.198503397980751, > 0.257831448122427, > -1.31173539205588, -1.45147941969116, 0.359725782295977, > 0.612882118056585, > -0.0733768753346202, -0.508349204402508, 1.35776663767231, > 0.997807735669086, > -1.41717534266382, -0.894170593324238, -0.68578120845151, > -0.211509378018794, > 0.436738904337909, -1.46932152770435, 0.0817388759874159, > -0.0389350881653141, > 0.709198476466861, -0.963669144724435, -0.548607422521798, > -0.896886885575286, > 0.322231150840934, 1.37327611339939, 0.0310213133870952, > 0.796577750757324, > -0.2010067423637, -0.241723752424226, 1.37547329580654, -1.15382202538982, > 0.101454200596915, 0.273663839664217, 1.8315140887841, 1.86096518756473, > -0.536393730924719, -0.45845011727266, 1.10226256157127, > -0.385596991265563, > 3.20218061566932, -1.25865250042183, -0.13613128784276, 0.483329357746514, > -0.597187329618306, 0.710977603908319, -1.07945708269043, > -0.477626236401394, > 1.51034914684104, 2.35886426985999, -0.0250526828683629, > -0.29439443478131, > 0.665774016744828, 0.464027472251246, 0.226658374792016, > -0.802597030454373, > 0.825517059805602, -1.11293193130819, -1.27677400513873, 1.60776237113347, > 1.12490009531342, 0.95767047134623, 0.0475745549797055, > -0.0591587460876868, > -0.690617365240342, -1.62111622938126, -1.3545210707469, 1.8607927043106, > 0.764367674339969, 1.49261525602638, 0.549570728337346, -1.29658399741794, > -1.6289903797869, 0.00573336252135834, 0.0300702149640632, > 0.440810830115721, > 0.663568666361326, -0.126685900835146, -0.00221628368438927, > 0.815321995886579, -0.499280888368945, -0.271814047751667, > -0.071025546459042, > 1.73165491816826, -0.0294770299043331, 0.833605607221529, > -0.670108794857159, > -0.303323318026829, 1.29039844459134, -0.818806702120603, > -0.445515595649677, > -0.0128796557666887, 0.320923705586147, 0.230597275812536, > -1.54009153212366, > -0.294702981688559, 0.581209734391958, 0.121384768986639, > 0.502914098451111, > -1.59018268505718, -0.635101104166451, 1.48005776676403, > -0.25631761189957, > 0.171947814411552, 0.444646195980014, 0.172655758440111, > -0.00432159794094836, > -0.549321974240026, 0.585055026451421, -1.22813371480849, > 0.846807540195381, > 0.319629441352597, 0.393525732059709, -1.40275675444594, 1.11062585584811, > 0.214809571213853, -0.636432711800391, -0.283087127251573, > -1.46385553207618, > 0.436928676930225, -1.34231945433777, 0.451281957595763, > -0.523155001924496, > -2.69416779107545, 1.5513477373689, 0.989632029400036, 1.34636075948993, > 0.346147428691405, -0.464527560160041, 0.337233933370495, > 1.11331396366389, > -1.00060600083316, -0.734784444487169, 1.40476315358621, 1.01671092179193, > -0.0144306250829694, -0.923555930346906, -1.02275966525015, > 0.619422010219383, > 0.603484309754755, -0.774553813657576, 0.0932792545556387, > -0.651884521428279, > -0.61965612647073, -1.22104834441579, -1.31439612639271, > -2.87707752518163, > -0.0343801084491906, -0.640678302378492, -1.38653452986558, > 0.884963139028743, > -0.657454283462004, 0.462842665244993, -0.20881674837534, 0.6345884135548, > 0.707165108434729, -0.162090928425892, -0.998662309785188, > 1.3130254639318, > 0.191890764940071, -0.0493619237876962, -0.55183232511689, > 0.470263932874487, > -0.217088645692971, 0.231550037620628, -0.530406537266415, > -0.616522469083808, > 0.329347084038265, 1.49420692610475, 1.91750823142859, 0.753536143872474, > 0.766584887163714, -0.259803384094296, -0.402463714097741, > -0.0229799209735185, > -0.259677990559218, -1.41529707261105, 0.191362852138627, > 1.54483266684747, > -1.17947655378489, -0.426265411073274, 0.723010460481118, > 1.37405142869537, > -0.374771207936141, 0.0513905365832423, -0.369432731236118, > -0.945441984794364, > 0.179506648255554, 0.31971255725438, -1.25117960937319, 2.46228549580083 > ), countryB = c(9.4028512714591, 10.7551115504652, 11.2436629228434, > 8.25642360904389, 10.7054313972395, 10.1714609666091, 10.3726975056206, > 10.6594299429162, 8.56250595443296, 10.5612153841067, 8.07612112826519, > 9.94704207511951, 11.392407273156, 10.4257690445268, 10.6339442533038, > 10.5314883825356, 10.3506665399033, 10.2426403362978, 10.8437715647564, > 10.8247200587034, 11.2449815064171, 9.2898697883112, 9.05418978124619, > 10.6080277672463, 9.19882175737148, 11.3589722806948, 10.4139334238189, > 9.44305216810892, 9.58426470056472, 8.84208362003176, 10.8125431356391, > 7.71357872650814, 8.73526671289828, 10.714693958224, 9.49976972594189, > 9.41896864941478, 7.33073349261249, 10.5268398982262, 9.92255671125184, > 10.5665378092202, 10.5035704895405, 7.93682068228084, 10.882421050424, > 9.3237610577468, 8.42128120513304, 9.46103753451174, 10.3472215515392, > 11.0483414005193, 10.3421689244599, 7.85120280889754, 11.6327644046254, > 9.57620745972098, 10.6553844719749, 10.8490250129346, 10.2742492933876, > 9.55428072119304, 9.0976351049804, 10.0731951766966, 10.6956488509516, > 11.1530744146062, 10.3496303024767, 9.52734980693306, 9.64478424894734, > 9.28301632295047, 10.9568790570688, 11.6052870914912, 9.58530202776537, > 11.1338134902295, 8.66685735149472, 11.0230863576875, 10.8000609212302, > 10.6510296259782, 11.831292326569, 9.53836122448974, 9.55805411377422, > 9.90336204189518, 9.36377040999133, 11.7041009433341, 9.95628297574831, > 10.718111342931, 10.4562688422472, 8.85976383099186, 8.94085496683824, > 8.19538394018434, 10.1058448260449, 9.70821289789561, 9.08757962570738, > 10.6599997541876, 10.0521137258219, 9.9924295699559, 11.8730269098299, > 11.2634470795907, 11.3854762443416, 9.56742053529845, 10.4101561978503, > 9.53376547865009, 9.75410966995361, 9.92804558924886, 8.36231430067066, > 10.7486459346681, 12.0143881312685, 11.0083060332839, 9.32820954213586, > 10.8420346742049, 9.73064414798223, 10.7593902723319, 10.976622155215, > 10.1039774975157, 8.36317871802524, 9.21809894958653, 10.1015362220683, > 11.4655736295123, 9.65528297274543, 9.67844310028008, 10.1516820910267, > 8.38764450852642, 10.163558398201, 11.1432463477237, 12.0509818193223, > 10.9896913965091, 11.1772406550953, 9.14396687337779, 9.93338627749979, > 10.9548864433126, 8.64911301751956, 11.706463972364, 11.1012846649741, > 8.7805267197408, 11.5802098773954, 10.2268513542863, 10.3509617168731, > 9.09646558899397, 11.2706666647314, 11.3984335011704, 11.4808985388742, > 10.5608771133999, 10.3684356806175, 10.4815588822618, 10.5818867877558, > 12.2561035284691, 8.6464271477849, 10.3412351841865, 10.7577574534162, > 11.1124067479261, 9.91627943243343, 10.6356898895291, 10.2107566441478, > 10.0672734202575, 10.2385787014999, 11.7112606160069, 10.0453801263575, > 8.84654136100724, 10.2173421609193, 9.27919801705716, 10.4755578829547, > 7.69340209082122, 9.24705253848083, 10.8415406794597, 8.69603117680965, > 11.2589214416702, 10.5425642239737, 10.1389355042458, 9.17267675180435, > 12.3052338002213, 10.0181674985326, 12.2715476751051, 9.64516268052739, > 10.6305299379912, 10.1829347684655, 9.97983942366781, 10.2559847744715, > 10.3092266661814, 9.75215330673072, 10.250464278709, 9.31118800061454, > 10.3310666767171, 9.09703848990093, 10.241195105962, 8.57290406448477, > 8.98090855172704, 8.64653101832229, 12.6791587435376, 9.56000538681993, > 10.4062255533723, 11.067091476284, 10.5255014737268, 10.2240941949978, > 9.13081571869084, 9.5942352120783, 9.2753466212409, 10.2789293993548, > 8.10255065585342, 9.48751297655077, 8.51198576785003, 9.46310532206947, > 9.86727270762806, 11.5149248124739, 9.31557156735022, 9.34351230206303, > 10.022139448869, 11.4111350893792, 8.57891783464065, 10.3761090924661, > 9.38300408584683, 9.33694577526158, 9.25815555686085, 9.29856853889735, > 8.4250073823245, 8.83022950824832, 9.1510846172981, 10.2553042376765, > 10.0739540955956, 9.04955917463259, 10.8927827168631, 9.44611041690694, > 10.7883395708593, 10.6010088332078, 7.72560864006592, 10.1760839916637, > 11.5576569894392, 11.384809257294, 8.73504353987083, 9.00585942714512, > 9.62327893504013, 10.3527072699866, 10.5220100705827, 8.74921668696853, > 8.56415116683662, 12.1348451793815, 10.9496674323819, 9.64443817181322, > 9.52977454697087, 10.4281877186725, 8.52701721410292, 11.6911584965782, > 10.2300108250139, 8.65368821276485, 11.7733431942379, 10.2060233777681, > 9.57291673029552, 9.82687667895106, 10.5939736188493, 11.2510605726337, > 10.3383384488323, 9.92301237292945, 10.0164623230529, 10.4939857044034, > 10.5631769648289, 10.935731043532, 11.0659359187168, 8.51697010486427, > 9.79512310587405, 9.35132038807071, 11.3286703149903, 10.4621597293933, > 10.4099459919071, 8.86246315190942, 9.30054044639769, 9.40346575227191, > 9.59278722974697)), row.names = c(NA, -260L), class = "data.frame") > > > > > > From: Subhamitra Patra <mailto:subhamitra.patra at gmail.com> > Sent: Friday, September 13, 2019 3:59 PM > To: PIKAL Petr <mailto:petr.pikal at precheza.cz>; r-help mailing list > <mailto:r-help at r-project.org> > Subject: Re: [R] Query about calculating the monthly average of daily data > columns > > Dear PIKAL, > > Thank you very much for your suggestion. > > I tried your previous suggested code and getting the average value for > each month for both country A, and B. But in your recent email, you are > suggesting not to change the date column to real date. If I am going > through your recently suggested code, i.e. > > "aggregate(value column, list(format(date column, "%m.%Y"), country > column), mean)" > > I am getting an Error that "aggregate(value, list(format(date, "%m.%Y"), > country), mean) : object 'value' not found". > > Here, my query "may I need to define the date column, country column, and > value column separately?" > > Further, I need something the average value result like below in the data > frame > > Month Country A Country B > Jan 1994 26.66 35.78 > Feb 1994 26.13 29.14 > > so that it will be easy for me to export to excel, and to use for the > further calculations. > > Please suggest me in this regard. > > Thank you. > > > > > > > https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5& > Sender notified by > > https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5& > 09/13/19, 07:22:53 PM > > > > On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr <mailto:petr.pikal at precheza.cz> > wrote: > Hi > > I am almost 100% sure that you would spare yourself much trouble if you > changed your date column to real date > > ?as.Date > > reshape your wide format to long one > library(reshape2) > ?melt > > to get 3 column data.frame with one date column, one country column and > one value column > > use ?aggregate and ?format to get summary value > > something like > aggregate(value column, list(format(date column, "%m.%Y"), country > column), mean) > > But if you insist to scratch your left ear with right hand accross your > head, you could continue your way. > > Cheers > Petr > > > -----Original Message----- > > From: R-help <mailto:r-help-bounces at r-project.org> On Behalf Of > Subhamitra > > Patra > > Sent: Friday, September 13, 2019 3:20 PM > > To: Jim Lemon <mailto:drjimlemon at gmail.com>; r-help mailing list > <r-help at r- > > http://project.org> > > Subject: Re: [R] Query about calculating the monthly average of daily > data > > columns > > > > Dear Sir, > > > > Yes, I understood the logic. But, still, I have a few queries that I > mentioned > > below your answers. > > > > "# if you only have to get the monthly averages, it can be done this way > > > spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) > > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" > > > > > > B. Here, I need to define the no. of months, and years separately, > right? > > > or else what 2, and 3 (in bold) indicates? > > > > > > > To get the grouping variable of sequential months that you want, you only > > need the month and year values of the dates in the first column. First I > used > > the "strsplit" function to split the date field at the hyphens, then used > > "sapply" to extract ("[") the second (month) and *third (year)* parts as > two > > new columns. Because you have more than one year of data, you need the > > year values or you will group all Januarys, all Februarys and so on. > > Notice how I pass both of the new columns as a list (a data frame is a > type of > > list) in the call to get the mean of each month. > > > > 1. Here, as per my understanding, the "3" indicates the 3rd year, right? > > But, you showed an average for 2 months of the same year. Then, what "3" > > in the spdat$year object indicate? > > > > > > C. From this part, I got the exact average values of both January and > > > February of 1994 for country A, and B. But, in code, I have a query > > > that I need to define spdat$returnA, and spdat$returnB separately > > > before writing this code, right? Like this, I need to define for each > > > 84 countries separately with their respective number of months, and > > > years before writing this code, right? > > > > > > > I don't think so. Because I don't know what your data looks like, I am > > guessing that for each row, it has columns for each of the 84 countries. > I > > don't know what these columns are named, either. Maybe: > > > > date Australia Belarus ... Zambia > > 01/01/1994 20 21 22 > > ... > > > > Here, due to my misunderstanding about the code, I was wrong. But, what > > data structure you guessed, it is absolutely right that for each row, I > have > > columns for each of the 84 countries. So, I think, I need to define the > date > > column with no. of months, and years once for all the countries. > > Therefore, I got my answer to the first and third question in the > previous > > email (what you suggested) that I no need to define the column of each > > country, as the date, and no. of observations are same for all countries. > > But, the no. of days are different for each month, and similarly, for > each > > year. So, I think I need to define date for each year separately. > Hence, I have > > given an example of 12 months, for 2 years (i.e. 1994, and 1995), and > have > > written the following code. Please correct me in case I am wrong. > > > > spdat<-data.frame( > > > > > dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r > > ep(1,21),rep(2,20), > > rep(3,23), rep(4,21), > > > rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12 > > ,22) > > ),rep(1994,260) > > dates1> > > paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2 > > 2),rep(2,20), > > rep(3,23), rep(4,20), > > > rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12 > > ,21) > > ),rep(1995,259) ,sep="-") > > > > Concerning the exporting of structure of the dataset to excel, I will > have > > 12*84 matrix. But, please suggest me the way to proceed for the large > > sample. I have mentioned below what I understood from your code. Please > > correct me if I am wrong. > > 1. I need to define the date for each year as the no. of days in each > month > > are different for each year (as mentioned in my above code). For > instance, in > > my data file, Jan 1994 has 21 days while Jan 1995 has 22 days. > > 2. Need to define the date column as character. > > 3. Need to define the monthly average for each month, and year. So, now > > code will be as follows. > > > spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12) > > %%%%As I need all months average sequentially. > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3) > > > > Here, this meaning of "3", I am really unable to get. > > > > 4. Need to define each country with each month and year as mentioned in > > the last part of your code. > > > > Please suggest me in this regard. > > > > Thank you. > > > > > > > > > > > > > > > > [image: Mailtrack] > > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca > > mpaign=signaturevirality5&> > > Sender > > notified by > > Mailtrack > > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca > > mpaign=signaturevirality5&> > > 09/13/19, > > 06:41:41 PM > > > > On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <mailto:drjimlemon at gmail.com> > wrote: > > > > > Hi Subhamitra, > > > I'll try to write my answers adjacent to your questions below. > > > > > > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra < > > > mailto:subhamitra.patra at gmail.com> wrote: > > > > > >> Dear Sir, > > >> > > >> Thank you very much for your suggestion. > > >> > > >> Yes, your suggested code worked. But, actually, I have data from 3rd > > >> January 1994 to 3rd August 2017 for very large (i.e. for 84 > > >> countries) sample. From this, I have given the example of the years > > >> up to 2000. Before applying the same code for the long 24 years, I > > >> want to learn the logic behind the code. Actually, some part of the > > >> code is not understandable to me which I mentioned in the bold letter > as > > follows. > > >> > > >> "spdat<-data.frame( > > >> > dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"), > > >> returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))" > > >> > > >> A. Here, I need to define the no. of days in a month, and the no. of > > >> countries name separately, right? But, what is meant by 15:50, and > > >> 10:45 in return A, and B respectively? > > >> > > > > > > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what the > > > real values of return are, so I made them up using the "sample" > function. > > > However, this is not meant to mislead anyone, just to show how > > > whatever numbers are in your data can be used in calculations. The > > > colon (":") operator creates a sequence of numbers starting with the > > > one to the left and ending with the one to the right. > > > > > >> > > >> "# if you only have to get the monthly averages, it can be done this > > >> way > > >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*) > > >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)" > > >> > > >> B. Here, I need to define the no. of months, and years separately, > right? > > >> or else what 2, and 3 (in bold) indicates? > > >> > > > > > > To get the grouping variable of sequential months that you want, you > > > only need the month and year values of the dates in the first column. > > > First I used the "strsplit" function to split the date field at the > > > hyphens, then used "sapply" to extract ("[") the second (month) and > > > third (year) parts as two new columns. Because you have more than one > > > year of data, you need the year values or you will group all Januarys, > > > all Februarys and so on. Notice how I pass both of the new columns as > > > a list (a data frame is a type of > > > list) in the call to get the mean of each month. > > > > > >> > > >> "# get the averages by month and year - is this correct? > > >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean) > > >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)" > > >> > > >> C. From this part, I got the exact average values of both January and > > >> February of 1994 for country A, and B. But, in code, I have a query > > >> that I need to define spdat$returnA, and spdat$returnB separately > > >> before writing this code, right? Like this, I need to define for each > > >> 84 countries separately with their respective number of months, and > > >> years before writing this code, right? > > >> > > > > > > I don't think so. Because I don't know what your data looks like, I am > > > guessing that for each row, it has columns for each of the 84 > > > countries. I don't know what these columns are named, either. Maybe: > > > > > > date Australia Belarus ... Zambia > > > 01/01/1994 20 21 22 > > > ... > > > > > > > > >> Yes, after obtaining the monthly average for each country's data, I > > >> need to use them for further calculations. So, I want to export the > > >> result to excel. But, until understanding the code, I think I willn't > > >> able to apply for the entire sample, and cannot be able to discuss > > >> the format of the resulted column to export to excel. > > >> > > > > > > Say that we perform the grouped mean calculation for the first two > > > country columns like this: > > > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean) > > > monmeans > > > Australia Belarus > > > [1,] 29.70000 30.43333 > > > [2,] 34.17857 27.39286 > > > > > > We are presented with a 2x2 matrix of monthly means in just the format > > > someone might use for importing into Excel. The first row is January > > > 1994, the second February 1994 and so on. By expanding the columns to > > > include all the countries in your data, You should have the result you > want. > > > > > > Jim > > > > > > > > > -- > > *Best Regards,* > > *Subhamitra Patra* > > *Phd. Research Scholar* > > *Department of Humanities and Social Sciences* *Indian Institute of > > Technology, Kharagpur* > > *INDIA* > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > mailto:R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting- > > guide.html > > and provide commented, minimal, self-contained, reproducible code. > Osobn? ?daje: Informace o zpracov?n? a ochran? osobn?ch ?daj? obchodn?ch > partner? PRECHEZA a.s. jsou zve?ejn?ny na: > https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information > about processing and protection of business partner?s personal data are > available on website: > https://www.precheza.cz/en/personal-data-protection-principles/ > D?v?rnost: Tento e-mail a jak?koliv k n?mu p?ipojen? dokumenty jsou > d?v?rn? a podl?haj? tomuto pr?vn? z?vazn?mu prohl??en? o vylou?en? > odpov?dnosti: https://www.precheza.cz/01-dovetek/ | This email and any > documents attached to it may be confidential and are subject to the legally > binding disclaimer: https://www.precheza.cz/en/01-disclaimer/ > > > > -- > Best Regards, > Subhamitra Patra > Phd. Research Scholar > Department of Humanities and Social Sciences > Indian Institute of Technology, Kharagpur > INDIA > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- *Best Regards,* *Subhamitra Patra* *Phd. Research Scholar* *Department of Humanities and Social Sciences* *Indian Institute of Technology, Kharagpur* *INDIA* [[alternative HTML version deleted]]