Jim Lemon
2018-Dec-16 09:28 UTC
[R] [R studio] Plotting of line chart for each columns at 1 page
Hi Subhamitra,
As I said, the code I sent is an approximation to get your year labels in
about the correct places. You are welcome to improve the calculations.

182 days is about half a year, so that the first "tick" will fall around
the end of June (i.e. the middle of the year). If you specify the increment
as 226, you get one too many labels. 229 is what is known as a kludge (a
clumsy solution that works).

Yes, I mistakenly thought that the observations were the same throughout
the four files. As you know this (and I didn't) you can do a better job of
placing the year labels by changing the sequence for each of the CSV (not
Excel) files. The best method of all would be to have a date for each
observation. You could then discard all these approximations I have made to
get the plots to work.

No, the arguments of the axis function are:

axis(<side of plot>, <position of ticks>, <labels for the ticks>)

The first argument is: 1=bottom, 2=left, 3=top, 4=right. The next two
arguments must be the same length. If not, you will get an error. As you
can see, only every other tick has a label to avoid crowding. There are
ways to get more tick labels on an axis.

Jim

On Sun, Dec 16, 2018 at 7:03 PM Subhamitra Patra <subhamitra.patra at gmail.com> wrote:
> Hello Sir,
>
> I have three queries regarding your suggested code.
>
> *1.* In my last email, I mentioned why there are missing observations in
> my data series. In the line *year_mids<-seq(182,5655,by=229)*,
>
> *A. what 182 indicates and what is the logic behind the consideration of
> 229 increments, although there are 226 observations per year?*
> *B. Each excel file is having different observations depending on the
> variation of starting dates. So, is it required to add year_mids in
> the loop? I think I need to justify year_mids object each time after
> importing the individual excel files. If I am wrong, kindly correct me.*
>
> 2. Further, in the command *axis(1,at=year_mids,labels=1994:2017), 1
> indicates the no. of increments of year name, right?*
>
> Kindly clarify my queries Sir for which I shall be always grateful to you.
>
> Thank you very much.
>
> On Sun, Dec 16, 2018 at 12:24 PM Subhamitra Patra <
> subhamitra.patra at gmail.com> wrote:
>
>> Thank you very much sir. Actually, I excluded all the non-trading days.
>> Therefore, Each year will have 226 observations and total 6154 observations
>> for each column. The data which I plotted is not rough data. I obtained the
>> rolling observations of window 500 from my original data. So, the no. of
>> observations for each resulted column is (6154-500)+1=5655. So, It is
>> not accurate as per the days of calculations of each year.
>>
>> Ok, Sir, I will go through your suggestion, obtain the results for each
>> column of my data and would like to discuss the results with you. After
>> solving of this problem, I would like to discuss another 2 queries.
>>
>> Thank you very much Sir for educating a new R learner.
>>
>> On Sun, Dec 16, 2018 at 8:10 AM Jim Lemon <drjimlemon at gmail.com> wrote:
>>
>>> Hi Subhamitra,
>>> Thanks. Now I can provide some assistance instead of just complaining.
>>> Your first problem is the temporal extent of the data. There are 8613 days
>>> and 6512 weekdays between the two dates you list, but only 5655
>>> observations in your data. Therefore it is unlikely that you have a
>>> complete data series, or perhaps you have the wrong dates. For the moment
>>> I'll assume that there are missing observations. What I am going to do is
>>> to match the 24 years (1994-2017) to their approximate positions in the
>>> time series. This will give you the x-axis labels that you want, close
>>> enough for this illustration. I doubt that you will need anything more
>>> accurate. You have a span of 24.58 years, which means that if your missing
>>> observations are uniformly distributed, you will have almost exactly 226
>>> observations per year. When I tried this, I got too many intervals, so I
>>> increased the increment to 229 and that worked. To get the positions for
>>> the middle of each year in the indices of the data:
>>>
>>> year_mids<-seq(182,5655,by=229)
>>>
>>> Now I suppress the x-axis by adding xaxt="n" to each call to plot. Then
>>> I add a command to display the years at the positions I have calculated:
>>>
>>> axis(1,at=year_mids,labels=1994:2017)
>>>
>>> Also note that I have added braces to the "for" loop. Putting it all
>>> together:
>>>
>>> year_mids<-seq(182,5655,by=229)
>>> pdf("EMs.pdf",width=20,height=20)
>>> par(mfrow=c(5,4))
>>> # import your first sheet here (16 columns)
>>> EMs1.1<-read.csv("EMs1.1.csv")
>>> ncolumns<-ncol(EMs1.1)
>>> for(i in 1:ncolumns) {
>>>  plot(EMs1.1[,i],type="l",col = "Red", xlab="Time",
>>>   ylab="APEn", main=names(EMs1.1)[i],xaxt="n")
>>>  axis(1,at=year_mids,labels=1994:2017)
>>> }
>>> # import your second sheet here (1 column)
>>> EMs2.1<-read.csv("EMs2.1.csv")
>>> ncolumns<-ncol(EMs2.1)
>>> for(i in 1:ncolumns) {
>>>  plot(EMs2.1[,i],type="l",col = "Red", xlab="Time",
>>>   ylab="APEn", main=names(EMs2.1)[i],xaxt="n")
>>>  axis(1,at=year_mids,labels=1994:2017)
>>> }
>>> # import your third sheet here (1 column)
>>> EMs3.1<-read.csv("EMs3.1.csv")
>>> ncolumns<-ncol(EMs3.1)
>>> for(i in 1:ncolumns) {
>>>  plot(EMs3.1[,i],type="l",col = "Red", xlab="Time",
>>>   ylab="APEn", main=names(EMs3.1)[i],xaxt="n")
>>>  axis(1,at=year_mids,labels=1994:2017)
>>> }
>>> # import your fourth sheet here (1 column)
>>> EMs4.1<-read.csv("EMs4.1.csv")
>>> ncolumns<-ncol(EMs4.1)
>>> for(i in 1:ncolumns) {
>>>  plot(EMs4.1[,i],type="l",col = "Red", xlab="Time",
>>>   ylab="APEn", main=names(EMs4.1)[i],xaxt="n")
>>>  axis(1,at=year_mids,labels=1994:2017)
>>> }
>>> # finish plotting
>>> dev.off()
>>>
>>> With any luck, you are now okay. Remember, this is a hack to deal with
>>> data that are not what you think they are.
>>>
>>> Jim
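The reply above notes that the best method would be to have a date for each observation. A minimal sketch of that idea follows. It uses simulated data, because the real CSV files are not part of this thread: the names dates, apen and year_starts, and the weekday-only calendar standing in for trading days, are all assumptions made for illustration.

```r
# Simulated stand-in for one column of the real data (assumption: the real
# file would carry one date per observation).
set.seed(1)
all_days <- seq(as.Date("2002-01-03"), as.Date("2017-08-03"), by = "day")
# keep Monday to Friday only, as a rough stand-in for trading days
dates <- all_days[as.POSIXlt(all_days)$wday %in% 1:5]
apen <- cumsum(rnorm(length(dates)))

# one tick at the start of each full calendar year in the data
year_starts <- seq(as.Date("2003-01-01"), as.Date("2017-01-01"), by = "year")

pdf(NULL)  # null device, so that nothing is written to disk
plot(dates, apen, type = "l", col = "red", xlab = "Time", ylab = "APEn",
     xaxt = "n")
# positions and labels come from the same vector, so their lengths agree
axis(1, at = as.numeric(year_starts), labels = format(year_starts, "%Y"))
dev.off()
```

With real dates on the x-axis, the tick positions no longer depend on how many observations are missing in a given year.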
Subhamitra Patra
2018-Dec-17 07:13 UTC
[R] [R studio] Plotting of line chart for each columns at 1 page
Hello Sir,
Thank you very much for your excellent guidance to a new R learner.
I tried your suggested code and got the expected results, but for two of
the CSV files (i.e. EMs2.1 and EMs3.1), the year labels do not appear on
the x-axis (shown in the last row of the attached result PDF file). I
think I need an increment larger or smaller than 229 in year_mids because
the starting dates of these two files are 03-01-2002 for EMs2.1 and
04-07-2001 for EMs3.1 (day-month-year). *Sir, hence I am quite confused
about the logic behind the choice of year_mids*. For your convenience, I am
attaching both the code and the result file.
pdf("EMs1.pdf",width=20,height=20)
par(mfrow=c(5,4))
# import your first sheet here (16 columns)
EMs1.1<-read.csv("EMs1.1.csv")
ncolumns<-ncol(EMs1.1)
for(i in 1:ncolumns) {
 plot(EMs1.1[,i],type="l",col = "Red", xlab="Time",
  ylab="APEn", main=names(EMs1.1)[i],xaxt="n")
 year_mids<-seq(182,5655,by=229)
 axis(1,at=year_mids,labels=1994:2017)
}
# import your second sheet here (1 column)
EMs2.1<-read.csv("EMs2.1.csv")
ncolumns<-ncol(EMs2.1)
for(i in 1:ncolumns) {
 plot(EMs2.1[,i],type="l",col = "Red", xlab="Time",
  ylab="APEn", main=names(EMs2.1)[i],xaxt="n")
 year_mids<-seq(182,3567,by=229)
 axis(1,at=year_mids,labels=2002:2017)
}
# import your third sheet here (1 column)
EMs3.1<-read.csv("EMs3.1.csv")
ncolumns<-ncol(EMs3.1)
for(i in 1:ncolumns) {
 plot(EMs3.1[,i],type="l",col = "Red", xlab="Time",
  ylab="APEn", main=names(EMs3.1)[i],xaxt="n")
 year_mids<-seq(182,3698,by=229)
 axis(1,at=year_mids,labels=2001:2017)
}
# import your fourth sheet here (1 column)
EMs4.1<-read.csv("EMs4.1.csv")
ncolumns<-ncol(EMs4.1)
for(i in 1:ncolumns) {
 plot(EMs4.1[,i],type="l",col = "Red", xlab="Time",
  ylab="APEn", main=names(EMs4.1)[i],xaxt="n")
 year_mids<-seq(182,5265,by=229)
 axis(1,at=year_mids,labels=1995:2017)
}
# finish plotting
dev.off()
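One possible reason the labels are missing for EMs2.1 and EMs3.1 is that axis() needs "at" and "labels" to have the same length. The sketch below only counts the positions and labels used in the code above; the list name ticks is introduced here for illustration and is not from the thread.

```r
# Tick positions and year labels exactly as written in the code above
ticks <- list(
  EMs1.1 = list(at = seq(182, 5655, by = 229), labels = 1994:2017),
  EMs2.1 = list(at = seq(182, 3567, by = 229), labels = 2002:2017),
  EMs3.1 = list(at = seq(182, 3698, by = 229), labels = 2001:2017),
  EMs4.1 = list(at = seq(182, 5265, by = 229), labels = 1995:2017)
)
n_at <- sapply(ticks, function(tk) length(tk$at))
n_labels <- sapply(ticks, function(tk) length(tk$labels))
# one row per file: number of positions, number of labels, do they agree?
print(data.frame(n_at, n_labels, match = n_at == n_labels))

# for comparison, an increment of 226 on the first file
n_226 <- length(seq(182, 5655, by = 226))
print(n_226)
```

The counts agree for EMs1.1 (24 and 24) and EMs4.1 (23 and 23), but EMs2.1 has 15 positions for 16 labels and EMs3.1 has 16 positions for 17 labels. With an increment of 226 the first file gets 25 positions for its 24 labels, which is the "one too many" mentioned earlier in the thread.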
Sir, following your suggestion that *"you can do a better job of placing
the year labels by changing the sequence for each of the CSV (not Excel)
files. The best method of all would be to have a date for each observation.
You could then discard all these approximations I have made to get the
plots to work."*, I added the dates (i.e. day-month-year) to the axis
command as follows:

year_mids<-seq(182,5655,by=229)
axis(1,at=year_mids,labels=03-01-1994:03-08-2017)
}

*With this I am getting an error.*
Kindly suggest.
Thank you very much.
--
*Best Regards,*
*Subhamitra Patra*
*Phd. Research Scholar*
*Department of Humanities and Social Sciences*
*Indian Institute of Technology, Kharagpur*
*INDIA*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: EMs.pdf
Type: application/pdf
Size: 545317 bytes
Desc: not available
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20181217/96de6a51/attachment.pdf>
Jim Lemon
2018-Dec-18 01:56 UTC
[R] [R studio] Plotting of line chart for each columns at 1 page
Hi Subhamitra,
As for the error that you mention, it was probably:
Error in axis(1, at = year_mids, labels = 3 - 1 - 1994:3 - 8 - 2017) :
'at' and 'labels' lengths differ, 24 != 1992
Anything more than a passing glance reveals that you didn't read the
explanation I sent about the arguments passed to the "axis" function.
Perhaps it will be rewarding to read the help page for the "axis" function
in the "graphics" package.
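The count of 1992 in that error message follows from operator precedence in R: ":" binds more tightly than binary "-", so the unquoted dates are read as arithmetic. A short sketch follows; the names bad, labels and date_labels are introduced here for illustration, and the choice of 1 July in the date labels is an assumption.

```r
# parsed as 3 - 1 - (1994:3) - 8 - 2017, i.e. a vector as long as 1994:3
bad <- 03-01-1994:03-08-2017
print(length(bad))
print(length(1994:3))

# one label per tick position, as axis() requires
year_mids <- seq(182, 5655, by = 229)
labels <- 1994:2017
stopifnot(length(year_mids) == length(labels))

# if date-style labels are wanted, they have to be character strings
date_labels <- format(as.Date(paste0(1994:2017, "-07-01")), "%d-%m-%Y")
print(head(date_labels, 3))
```

Either labels or date_labels could then be passed as the third argument of axis(), since both have the same length as year_mids.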
Your confusion about the logic (really simple arithmetic) of assigning
positions for the year labels may be allayed by the following. Think back
to those grade school problems that read:
"If I have m apples to give to n people, how many must I give each person
so that all will receive the same number and I will have the fewest apples
left?"
I'm sure that you remember that this can be solved in a number of ways. You
can divide m/n and drop the remainder. So, from 03-01-2002 to 03-08-2017 in
EMs2.1:
diff(as.Date(c("03-01-2002","03-08-2017"),"%d-%m-%Y"))
Time difference of 5691 days
# plus 1 for all of the days included
# calculate the number of years
5692/365.25
[1] 15.58385
So if there had been an observation each day, you would have the trivial
task of dividing the number of days by the number of years to get the tick
increments:
5692/15.58385
365.2499
Of course you don't have that many observations and you are trying to get
the number of observations, not days, in each year. By making the
assumption that the missing observations are spread evenly over the years,
you can simply replace the number of days with the number of observations.
At the moment I don't have that as I unrared your data at home. But you do
have it and I will call it nobs:
# this calculates the number of observations per year
nobs/15.58385
<obs_per_year>
will yield the number of observations in each year. So you have your tick
increments. Now for the offset. If you want the year ticks to appear at the
middle of each year, you will want to start at 182 minus the two days
missing in January or 180. So your new year_mids will be:
year_mids<-seq(180,nobs,obs_per_year)
Your years are 2002:2017 for EMs2.1, so:
axis(1,year_mids,2002:2017)
may well be what you want for axis ticks. As you can see, the "m apples to
n people" approach gives you the answer. The only missing part was the
offset, or where to start handing out apples. You might want to have
another look at the help pages for "axis" and "seq" (or ":") which will
show you why your axis command failed badly. Good luck.
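The arithmetic above can be wrapped in a small function. This is only a sketch, not the method from the thread: year_ticks is a hypothetical helper, nobs = 3567 is assumed from the "to" value in the posted code for EMs2.1, and the offset is scaled into observation units instead of the fixed 180 used above. It also drops any tick that would fall outside the data, so "at" and "labels" always have the same length.

```r
# Hypothetical helper: approximate mid-year tick positions, in observation
# units, assuming missing observations are spread evenly over the period.
year_ticks <- function(nobs, start, end, fmt = "%d-%m-%Y") {
  d <- as.Date(c(start, end), fmt)
  ndays <- as.numeric(diff(d)) + 1      # days covered, both ends included
  obs_per_day <- nobs / ndays           # the "m apples to n people" step
  years <- as.integer(format(d[1], "%Y")):as.integer(format(d[2], "%Y"))
  mids <- as.Date(paste0(years, "-07-01"))   # 1 July as the middle of a year
  at <- (as.numeric(mids - d[1]) + 1) * obs_per_day
  keep <- at >= 1 & at <= nobs          # drop ticks that fall outside the data
  list(at = at[keep], labels = years[keep])
}

# assumed number of observations for EMs2.1
tk <- year_ticks(3567, "03-01-2002", "03-08-2017")
stopifnot(length(tk$at) == length(tk$labels))
print(round(tk$at[1:3], 1))
# inside the plotting loop: axis(1, at = tk$at, labels = tk$labels)
```

Because positions and labels are filtered together, the same call can be reused for each file by changing nobs and the two dates.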
Jim