Simon Kiss
2010-Jun-28 15:04 UTC
[R] Stacked Histogram, multiple lines for dates of news stories?
Dear colleagues,

I have extracted the dates of several news stories from a newspaper database to chart coverage trends of an issue over time. They are in a data frame that looks just like the one generated by the reproducible code below. I can already generate a histogram of the dates with various intervals (weeks, months, quarters, years) using hist.Date. However, there are two other things I'd like to do.

First, I'd like to create a stacked histogram so that one could see whether one newspaper really pushed coverage of an issue at a certain point while others followed later on. Second, or alternatively, I would like to draw a line graph of the same data, with one line per paper, to represent the same trends.

What I'm finding challenging is that I don't have counts of the number of stories on each day, in each week, or in each month; I just have the dates themselves. The hist.Date command was very useful in turning those into bins, but I'd like to push it a bit further, to a stacked histogram or a multiple-line chart. Can anyone suggest a way to go about doing this?

I should say, I played around with Hadley Wickham's ggplot package and looked at his website, and there is a way to render multiple lines here: http://had.co.nz/ggplot2/scale_date.html but it was not clear to me how to plot just the dates or an index of the dates, as I don't have a value for the y axis other than the number of times a story was published in that time frame.

Regardless, I hope someone can suggest something.

Yours,
Simon J. Kiss

test=sample(1:3, 50, replace=TRUE)
test=as.factor(test)
levels(test)=c("Star", "Globe and Mail", "Post")
test2=ISOdatetime(sample(2004:2009, 50, replace=TRUE), sample(1:12, size=50, replace=TRUE), sample(1:30, 50, replace=TRUE), 0,0,0)
test2=as.Date(test2)
test_df=data.frame(test, test2)

*********************************
Simon J. Kiss, PhD
SSHRC and DAAD Post-Doctoral Fellow
John F. Kennedy Institute of North America Studies
Free University of Berlin
Lansstraße 7-9
14195 Berlin, Germany
Cell: +49 (0)1525-300-2812, Web: http://www.jfki.fu-berlin.de/index.html
Hadley Wickham
2010-Jun-28 16:58 UTC
[R] Stacked Histogram, multiple lines for dates of news stories?
Hi Simon,

Here are two ways to do that with ggplot:

qplot(test2, data = test_df, geom = "freqpoly", colour = test, binwidth = 30, drop = FALSE)
qplot(test2, data = test_df, geom = "bar", fill = test, binwidth = 30)

binwidth is in days. If you want to bin by other intervals (like months), I'd recommend doing so before plotting.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
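Hadley's suggestion to bin by month before plotting can be sketched roughly as follows. This is only one way to do it, not code from the thread: it uses base R's cut() on Date objects and table() to turn the raw dates into per-paper monthly counts. The names month_bin and counts are made up for the illustration, the seed is added so the example repeats, and days are drawn from 1:28 (rather than 1:30 as in the original post) so that no invalid dates such as 30 February turn into NA. The plotting step assumes a reasonably recent ggplot2 and is skipped if the package is not installed.

```r
# Rebuild data like the original post's (seed and 1:28 are assumptions, see above)
set.seed(1)
test <- factor(sample(1:3, 50, replace = TRUE), levels = 1:3,
               labels = c("Star", "Globe and Mail", "Post"))
test2 <- as.Date(ISOdate(sample(2004:2009, 50, replace = TRUE),
                         sample(1:12, 50, replace = TRUE),
                         sample(1:28, 50, replace = TRUE)))
test_df <- data.frame(test, test2)

# Assign each story to a calendar month; the resulting factor has a level
# for every month in the observed range, including months with no stories
month_bin <- cut(test_df$test2, breaks = "month")

# Cross-tabulate month by paper and convert to long format (one row per cell)
counts <- as.data.frame(table(month = month_bin, paper = test_df$test))
counts$month <- as.Date(as.character(counts$month))

# Optional plots: stacked bars and one line per paper
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(counts, aes(month, Freq, fill = paper)) + geom_col())
  print(ggplot(counts, aes(month, Freq, colour = paper)) + geom_line())
}
```

Keeping the factor returned by cut() rather than dropping empty months means the line chart falls to zero where a paper published nothing, instead of interpolating across the gap. Other intervals such as "week" or "quarter" can be passed to breaks in the same way.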
Jim Lemon
2010-Jun-29 09:03 UTC
[R] Stacked Histogram, multiple lines for dates of news stories?
Hi Simon,

I had to think about this for a while, but the following may be what you want. It also gave me an idea for a new plot. Thanks.

Jim

library(plotrix)
# Story counts per paper; breaks=6 is only a suggestion to hist(), so the
# number of bins actually returned depends on the data
count1<-hist(as.numeric(test_df$test2[test_df$test=="Globe and Mail"]),
 breaks=6)$counts
count2<-hist(as.numeric(test_df$test2[test_df$test=="Post"]),
 breaks=6)$counts
count3<-hist(as.numeric(test_df$test2[test_df$test=="Star"]),
 breaks=6)$counts
# Empty plot frame: dates on the x axis, one row per paper on the y axis
plot(test_df$test2,test_df$test,ylim=c(0.4,3.6),type="n",
 main="Date of articles",xlab="Year",ylab="Journal",axes=FALSE)
# Six x positions, roughly mid-2004 to mid-2009, in days since 1970-01-01
yearpos<-seq(12599,14425,length.out=6)
axis(1,at=yearpos,labels=2004:2009)
axis(2,at=1:3,labels=c("Globe and Mail","Post","Star"))
box()
# One filled band per paper, with width scaled to that paper's maximum count
dispersion(yearpos,rep(1,6),count1/(max(count1)*2),
 type="l",fill="green")
dispersion(yearpos,rep(2,6),count2/(max(count2)*2),
 type="l",fill="red")
# Only five positions here, presumably because hist() returned five bins
# for the Star dates in this particular random sample
dispersion(yearpos[1:5],rep(3,5),count3/(max(count3)*2),
 type="l",fill="blue")
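Because a single number passed to breaks in hist() is only a suggestion, the three papers can come back with different numbers of bins for a different random sample. A possible way around this, sketched here with base R only, is to pass explicit year boundaries so that every paper gets the same six bins. This is an illustration rather than part of Jim's reply: the names year_breaks and counts_by_paper are invented, the seed is added for repeatability, and days are drawn from 1:28 instead of 1:30 to avoid invalid dates.

```r
# Rebuild data like the original post's (seed and 1:28 are assumptions, see above)
set.seed(1)
test <- factor(sample(1:3, 50, replace = TRUE), levels = 1:3,
               labels = c("Star", "Globe and Mail", "Post"))
test2 <- as.Date(ISOdate(sample(2004:2009, 50, replace = TRUE),
                         sample(1:12, 50, replace = TRUE),
                         sample(1:28, 50, replace = TRUE)))
test_df <- data.frame(test, test2)

# Shared bin edges: 1 January of each year from 2004 to 2010, as day numbers
year_breaks <- as.numeric(as.Date(paste0(2004:2010, "-01-01")))

# One column of yearly counts per paper; right = FALSE makes bins [start, end)
# so a story dated 1 January is counted in the year it belongs to
counts_by_paper <- sapply(levels(test_df$test), function(p) {
  hist(as.numeric(test_df$test2[test_df$test == p]),
       breaks = year_breaks, right = FALSE, plot = FALSE)$counts
})
rownames(counts_by_paper) <- 2004:2009

# Stacked bars (papers stacked within each year) and one line per paper
barplot(t(counts_by_paper), legend.text = TRUE,
        xlab = "Year", ylab = "Stories")
matplot(2004:2009, counts_by_paper, type = "l", lty = 1,
        xlab = "Year", ylab = "Stories")
legend("topright", legend = colnames(counts_by_paper), col = 1:3, lty = 1)
```

With a fixed-size count matrix like this, the same positions can be used for every paper, whether the counts are drawn with barplot(), matplot(), or passed on to plotrix's dispersion().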