Jim Lemon
2018-Dec-16 09:28 UTC
[R] [R studio] Plotting of line chart for each columns at 1 page
Hi Subhamitra,
As I said, the code I sent is an approximation to get your year labels in about the correct places. You are welcome to improve the calculations.

182 days is about half a year, so the first "tick" will fall around the end of June (i.e. the middle of the year). If you specify the increment as 226, you get one too many labels. 229 is what is known as a kludge (a clumsy solution that works).

Yes, I mistakenly thought that the number of observations was the same throughout the four files. As you know this (and I didn't), you can do a better job of placing the year labels by changing the sequence for each of the CSV (not Excel) files. The best method of all would be to have a date for each observation. You could then discard all these approximations I have made to get the plots to work.

No, the arguments of the axis function are:

axis(<side of plot>, <position of ticks>, <labels for the ticks>)

The first argument is the side: 1=bottom, 2=left, 3=top, 4=right. The next two arguments must be the same length. If not, you will get an error. As you can see, only every other tick has a label to avoid crowding. There are ways to get more tick labels on an axis.

Jim

On Sun, Dec 16, 2018 at 7:03 PM Subhamitra Patra <subhamitra.patra at gmail.com> wrote:

> Hello Sir,
>
> I have three queries regarding your suggested code.
>
> 1. In my last email, I mentioned why there are missing observations in
> my data series. In the line year_mids<-seq(182,5655,by=229),
>
> A. what does 182 indicate, and what is the logic behind the increment of
> 229, although there are 226 observations per year?
> B. Each excel file has a different number of observations depending on
> its starting date. So, is it required to add year_mids in the loop? I
> think I need to adjust the year_mids object each time after importing
> the individual excel files. If I am wrong, kindly correct me.
>
> 2. Further, in the command axis(1,at=year_mids,labels=1994:2017), 1
> indicates the no. of increments of year name, right?
>
> Kindly clarify my queries Sir for which I shall be always grateful to you.
>
> Thank you very much.
>
> On Sun, Dec 16, 2018 at 12:24 PM Subhamitra Patra <
> subhamitra.patra at gmail.com> wrote:
>
>> Thank you very much sir. Actually, I excluded all the non-trading days.
>> Therefore, each year will have 226 observations and total 6154 observations
>> for each column. The data which I plotted is not raw data. I obtained the
>> rolling observations of window 500 from my original data. So, the no. of
>> observations for each resulting column is (6154-500)+1=5655. So, it is
>> not accurate as per the days of calculations of each year.
>>
>> Ok, Sir, I will go through your suggestion, obtain the results for each
>> column of my data and would like to discuss the results with you. After
>> solving this problem, I would like to discuss another 2 queries.
>>
>> Thank you very much Sir for educating a new R learner.
>>
>> On Sun, Dec 16, 2018 at 8:10 AM Jim Lemon <drjimlemon at gmail.com> wrote:
>>
>>> Hi Subhamitra,
>>> Thanks. Now I can provide some assistance instead of just complaining.
>>> Your first problem is the temporal extent of the data. There are 8613 days
>>> and 6512 weekdays between the two dates you list, but only 5655
>>> observations in your data. Therefore it is unlikely that you have a
>>> complete data series, or perhaps you have the wrong dates. For the moment
>>> I'll assume that there are missing observations. What I am going to do is
>>> to match the 24 years (1994-2017) to their approximate positions in the
>>> time series. This will give you the x-axis labels that you want, close
>>> enough for this illustration. I doubt that you will need anything more
>>> accurate. You have a span of 24.58 years, which means that if your missing
>>> observations are uniformly distributed, you will have almost exactly 226
>>> observations per year. When I tried this, I got too many intervals, so I
>>> increased the increment to 229 and that worked. To get the positions for
>>> the middle of each year in the indices of the data:
>>>
>>> year_mids<-seq(182,5655,by=229)
>>>
>>> Now I suppress the x-axis by adding xaxt="n" to each call to plot. Then
>>> I add a command to display the years at the positions I have calculated:
>>>
>>> axis(1,at=year_mids,labels=1994:2017)
>>>
>>> Also note that I have added braces to the "for" loop. Putting it all
>>> together:
>>>
>>> year_mids<-seq(182,5655,by=229)
>>> pdf("EMs.pdf",width=20,height=20)
>>> par(mfrow=c(5,4))
>>> # import your first sheet here (16 columns)
>>> EMs1.1<-read.csv("EMs1.1.csv")
>>> ncolumns<-ncol(EMs1.1)
>>> for(i in 1:ncolumns) {
>>>  plot(EMs1.1[,i],type="l",col = "Red", xlab="Time",
>>>   ylab="APEn", main=names(EMs1.1)[i],xaxt="n")
>>>  axis(1,at=year_mids,labels=1994:2017)
>>> }
>>> # import your second sheet here (1 column)
>>> EMs2.1<-read.csv("EMs2.1.csv")
>>> ncolumns<-ncol(EMs2.1)
>>> for(i in 1:ncolumns) {
>>>  plot(EMs2.1[,i],type="l",col = "Red", xlab="Time",
>>>   ylab="APEn", main=names(EMs2.1)[i],xaxt="n")
>>>  axis(1,at=year_mids,labels=1994:2017)
>>> }
>>> # import your third sheet here (1 column)
>>> EMs3.1<-read.csv("EMs3.1.csv")
>>> ncolumns<-ncol(EMs3.1)
>>> for(i in 1:ncolumns) {
>>>  plot(EMs3.1[,i],type="l",col = "Red", xlab="Time",
>>>   ylab="APEn", main=names(EMs3.1)[i],xaxt="n")
>>>  axis(1,at=year_mids,labels=1994:2017)
>>> }
>>> # import your fourth sheet here (1 column)
>>> EMs4.1<-read.csv("EMs4.1.csv")
>>> ncolumns<-ncol(EMs4.1)
>>> for(i in 1:ncolumns) {
>>>  plot(EMs4.1[,i],type="l",col = "Red", xlab="Time",
>>>   ylab="APEn", main=names(EMs4.1)[i],xaxt="n")
>>>  axis(1,at=year_mids,labels=1994:2017)
>>> }
>>> # finish plotting
>>> dev.off()
>>>
>>> With any luck, you are now okay. Remember, this is a hack to deal with
>>> data that are not what you think they are.
>>>
>>> Jim
>>
>> --
>> Best Regards,
>> Subhamitra Patra
>> Phd. Research Scholar
>> Department of Humanities and Social Sciences
>> Indian Institute of Technology, Kharagpur
>> INDIA
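The tick counts mentioned in this message can be checked directly. The following is only a sketch: the series y is made-up data standing in for one column of EMs1.1, and the names n_obs, ticks_226 and ticks_229 are introduced here for illustration. It shows that an increment of 226 gives one position more than there are year labels, while 229 gives exactly 24, which is what axis() needs.

```r
n_obs <- 5655          # number of observations quoted in the thread
years <- 1994:2017     # the 24 year labels

ticks_226 <- seq(182, n_obs, by = 226)
ticks_229 <- seq(182, n_obs, by = 229)

# one label per tick is required by axis()
c(labels = length(years),
  by_226 = length(ticks_226),   # one too many
  by_229 = length(ticks_229))   # matches the labels

# made-up series, NOT the real data
set.seed(1)
y <- cumsum(rnorm(n_obs))

pdf(NULL)   # null device, so the sketch runs without writing a file
plot(y, type = "l", col = "red", xlab = "Time", ylab = "APEn", xaxt = "n")
# side 1 = bottom; 'at' and 'labels' have the same length
axis(1, at = ticks_229, labels = years)
invisible(dev.off())
```

Swapping ticks_229 for ticks_226 in the axis() call should reproduce the "lengths differ" error, because the two vectors no longer match.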
Subhamitra Patra
2018-Dec-17 07:13 UTC
[R] [R studio] Plotting of line chart for each columns at 1 page
Hello Sir,

Thank you very much for your excellent guidance to a new R learner.

I tried your suggested code and got the expected results, but for 2 of the CSV files (i.e. EMs2.1 and EMs3.1), the year labels do not appear on the x-axis (shown in the last row of the attached result PDF file). I think I need to use a value larger or smaller than 229 in year_mids, because the starting dates of these two CSV files are 03-01-2002 and 04-07-2001 (date-month-year) for EMs2.1 and EMs3.1 respectively. Sir, hence I am quite confused about the logic behind the choice of year_mids. For your convenience, I am attaching both the code and the result file.

pdf("EMs1.pdf",width=20,height=20)
par(mfrow=c(5,4))
# import your first sheet here (16 columns)
EMs1.1<-read.csv("EMs1.1.csv")
ncolumns<-ncol(EMs1.1)
for(i in 1:ncolumns) {
 plot(EMs1.1[,i],type="l",col = "Red", xlab="Time",
  ylab="APEn", main=names(EMs1.1)[i],xaxt="n")
 year_mids<-seq(182,5655,by=229)
 axis(1,at=year_mids,labels=1994:2017)
}
# import your second sheet here (1 column)
EMs2.1<-read.csv("EMs2.1.csv")
ncolumns<-ncol(EMs2.1)
for(i in 1:ncolumns) {
 plot(EMs2.1[,i],type="l",col = "Red", xlab="Time",
  ylab="APEn", main=names(EMs2.1)[i],xaxt="n")
 year_mids<-seq(182,3567,by=229)
 axis(1,at=year_mids,labels=2002:2017)
}
# import your third sheet here (1 column)
EMs3.1<-read.csv("EMs3.1.csv")
ncolumns<-ncol(EMs3.1)
for(i in 1:ncolumns) {
 plot(EMs3.1[,i],type="l",col = "Red", xlab="Time",
  ylab="APEn", main=names(EMs3.1)[i],xaxt="n")
 year_mids<-seq(182,3698,by=229)
 axis(1,at=year_mids,labels=2001:2017)
}
# import your fourth sheet here (1 column)
EMs4.1<-read.csv("EMs4.1.csv")
ncolumns<-ncol(EMs4.1)
for(i in 1:ncolumns) {
 plot(EMs4.1[,i],type="l",col = "Red", xlab="Time",
  ylab="APEn", main=names(EMs4.1)[i],xaxt="n")
 year_mids<-seq(182,5265,by=229)
 axis(1,at=year_mids,labels=1995:2017)
}
# finish plotting
dev.off()

Sir, you suggested: "you can do a better job of placing the year labels by changing the sequence for each of the CSV (not Excel) files. The best method of all would be to have a date for each observation. You could then discard all these approximations I have made to get the plots to work." When I add the dates (i.e. date-month-year) to the labels, as in

 year_mids<-seq(182,5655,by=229)
 axis(1,at=year_mids,labels=03-01-1994:03-08-2017)
}

I get an error. Kindly suggest.

Thank you very much.

--
Best Regards,
Subhamitra Patra
Phd. Research Scholar
Department of Humanities and Social Sciences
Indian Institute of Technology, Kharagpur
INDIA

-------------- next part --------------
A non-text attachment was scrubbed...
Name: EMs.pdf
Type: application/pdf
Size: 545317 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20181217/96de6a51/attachment.pdf>
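The labels expression in the failing axis() call can be inspected on its own. The sketch below is an illustration under stated assumptions, not the poster's data: obs_dates uses every weekday in the period as a stand-in for the real trading days, y is a made-up series, and the names bad_labels, obs_dates and mid_years are invented here. It shows that R evaluates 03-01-1994:03-08-2017 as arithmetic rather than as two dates, and then one possible way to label years once each observation has a real Date attached.

```r
# ":" binds more tightly than "-", so this is 3 - 1 - (1994:3) - 8 - 2017,
# a numeric vector with one element per value of 1994:3
bad_labels <- 03-01-1994:03-08-2017
length(bad_labels)

year_mids <- seq(182, 5655, by = 229)
length(year_mids)    # 24 positions, so axis() sees mismatched lengths

# dates have to be built from strings with an explicit format
span  <- as.Date(c("03-01-1994", "03-08-2017"), format = "%d-%m-%Y")
years <- as.integer(format(span[1], "%Y")):as.integer(format(span[2], "%Y"))

# ASSUMPTION: weekdays stand in for the actual trading days
all_days  <- seq(span[1], span[2], by = "day")
obs_dates <- all_days[as.POSIXlt(all_days)$wday %in% 1:5]

# made-up series, NOT the real data
set.seed(1)
y <- cumsum(rnorm(length(obs_dates)))

# one tick at 1 July of every year
mid_years <- seq(as.Date("1994-07-01"), by = "year",
                 length.out = length(years))

pdf(NULL)   # null device, so the sketch runs without writing a file
plot(obs_dates, y, type = "l", col = "red", xlab = "Time", ylab = "APEn",
     xaxt = "n")
axis.Date(1, at = mid_years, format = "%Y")
invisible(dev.off())
```

With dates on the x-axis the tick positions no longer depend on how many observations each year happens to contain, so no increment has to be tuned per file.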
Jim Lemon
2018-Dec-18 01:56 UTC
[R] [R studio] Plotting of line chart for each columns at 1 page
Hi Subhamitra,
As for the error that you mention, it was probably:

Error in axis(1, at = year_mids, labels = 3 - 1 - 1994:3 - 8 - 2017) :
 'at' and 'labels' lengths differ, 24 != 1992

Anything more than a passing glance reveals that you didn't read the explanation I sent about the arguments passed to the "axis" function. Perhaps it will be rewarding to read the help page for the "axis" function in the "graphics" package.

Your confusion about the logic (really simple arithmetic) of assigning positions for the year labels may be allayed by the following. Think back to those grade school problems that read:

"If I have m apples to give to n people, how many must I give each person so that all will receive the same number and I will have the fewest apples left?"

I'm sure that you remember that this can be solved in a number of ways. You can divide m/n and drop the remainder. So, from 03-01-2002 to 03-08-2017 in EMs2.1:

diff(as.Date(c("03-01-2002","03-08-2017"),"%d-%m-%Y"))
Time difference of 5691 days
# plus 1 for all of the days included
# calculate the number of years
5692/365.25
[1] 15.58385

So if there had been an observation each day, you would have the trivial task of dividing the number of days by the number of years to get the tick increments:

5692/15.58385
[1] 365.2499

Of course you don't have that many observations and you are trying to get the number of observations, not days, in each year. By making the assumption that the missing observations are spread evenly over the years, you can simply replace the number of days with the number of observations. At the moment I don't have that as I unrared your data at home. But you do have it and I will call it nobs:

# this calculates the number of observations per year
obs_per_year<-nobs/15.58385

This will yield the number of observations in each year, so you have your tick increments. Now for the offset. If you want the year ticks to appear at the middle of each year, you will want to start at 182 minus the two days missing in January, or 180. So your new year_mids will be:

year_mids<-seq(180,nobs,obs_per_year)

Your years are 2002:2017 for EMs2.1, so:

axis(1,year_mids,2002:2017)

may well be what you want for axis ticks. As you can see, the "m apples to n people" approach gives you the answer. The only missing part was the offset, or where to start handing out apples. You might want to have another look at the help pages for "axis" and "seq" (or ":") which will show you why your axis command failed badly. Good luck.

Jim
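The arithmetic in this message can be run end to end once a value for nobs is supplied. The sketch below ASSUMES nobs = 3567, a figure taken from the earlier seq(182,3567,by=229) call for EMs2.1 and not confirmed as the real row count. Under that assumption, an offset of 180 (which is measured in days) gives only 15 positions for the 16 labels 2002:2017, so axis() would again report differing lengths. Expressing the offset in observations, here simply half of obs_per_year, gives 16. That offset is a substitution made for this sketch, not something stated in the thread, and it ignores the two days missing in January.

```r
span    <- as.Date(c("03-01-2002", "03-08-2017"), format = "%d-%m-%Y")
n_days  <- as.numeric(diff(span)) + 1   # both end days included
n_years <- n_days / 365.25              # about 15.58

nobs <- 3567                            # ASSUMPTION, see the text above
obs_per_year <- nobs / n_years

years <- 2002:2017

# offset of 180 is in days, while the positions are in observations
ticks_day_offset <- seq(180, nobs, by = obs_per_year)
length(ticks_day_offset)

# sketch-only alternative: start half an "observation year" in
year_mids <- seq(obs_per_year / 2, nobs, by = obs_per_year)
length(year_mids) == length(years)

# made-up series, NOT the real data
set.seed(1)
y <- cumsum(rnorm(nobs))

pdf(NULL)   # null device, so the sketch runs without writing a file
plot(y, type = "l", col = "red", xlab = "Time", ylab = "APEn", xaxt = "n")
axis(1, year_mids, years)
invisible(dev.off())
```

If the real number of rows in EMs2.1 differs from 3567, both counts may change, so it is worth comparing length(year_mids) with length(years) before calling axis().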