Hi,

I'm trying to create a plot showing the density distribution of some shipping data. I like the look of violin plots, but my data are binned rather than continuous, and I want to make sure their binned (not smooth) nature is apparent in the final plot. For example, I have the number of individuals per vessel, but rather than the actual number of individuals I have data in the format: 7 values of zero, 11 values between 1-10, 6 values between 10-100, 13 values between 100-1000, etc. To plot these data I generated a new dataset in which the first 7 values are 0, representing the 7 values of zero, the next 11 values are 5.5, representing the 11 values between 1-10, etc. Sample data below.

I can make a violin plot (code below) using a log y-axis, which looks alright (though I still have to deal with the zeros), but in its default format it hides the fact that these are binned data, which seems a bit misleading. Is it possible to make a violin plot that looks a bit more angular (more corners, less smoothing), or that in some way shows the distribution but also clearly shows the true nature of these data? I've tried playing with the bandwidth adjustment and the kernel but haven't been able to get a figure that seems to work.

Anyone have some thoughts on this?
Thanks,
Nate

library(ggplot2)
library(scales)

# Sample data: one row per observation, each set to its bin's midpoint.
# (Read in first so that the plotting code below runs top to bottom.)
data2 <- read.table(textConnection("
vessel values
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+04
rec 5.5e+04
rec 5.5e+04
rec 5.5e+05
rec 5.5e+05
"), header = TRUE)

# Violin plot of values per vessel on a log10 y-axis with 10^x tick labels.
p <- ggplot(data2, aes(vessel, values))
p + geom_violin() +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))
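The expansion described in the post (one row per observation, each set to its bin's midpoint) can be sketched compactly with rep(). This is only an illustration: the counts and midpoints are read off the sample data above, and the names bin_counts and bin_values are made up for the example.

```r
# Bin counts and representative (midpoint) values, read off the sample data.
# bin_counts and bin_values are illustrative names, not from the original post.
bin_counts <- c(7, 11, 6, 13, 7, 3, 2)
bin_values <- c(0, 5.5, 55, 550, 5500, 55000, 550000)

# Repeat each representative value once per observation in its bin.
data2 <- data.frame(
  vessel = "rec",
  values = rep(bin_values, times = bin_counts)
)

nrow(data2)          # total number of observations
table(data2$values)  # tabulating the values recovers the bin counts
```

Keeping the counts and midpoints in two short vectors makes it easy to check that the expanded data still match the original bins.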
On 11/04/2012 06:27 AM, Nathan Miller wrote:
> [...]
> Is it possible to make a violin plot that looks a bit more
> angular (more corners, less smoothing) or in some way shows the
> distribution, but also clearly shows the true nature of these data?
> [...]

Hi Nate,

I'm not exactly sure what you are doing in the data transformation, but you can display this type of information as a single polygon for each instance (kiteChart) or as separate rectangles (battleship.plot).
library(plotrix)

# Made-up counts: 5 vessels (rows) by 4 passenger-number bins (columns)
vessels <- matrix(c(zero = sample(1:10, 5),
                    one2ten = sample(5:20, 5),
                    ten2hundred = sample(15:36, 5),
                    hundred2thousand = sample(10:16, 5)),
                  ncol = 4)

# Separate rectangles, one per vessel and bin
battleship.plot(vessels, xlab = "Number of passengers",
                yaxlab = c("Barnacle", "Maelstrom", "Poopdeck", "Seasick", "Wallower"),
                xaxlab = c("0", "1-10", "10-100", "100-1000"))

# A single polygon (kite) per vessel
kiteChart(vessels, xlab = "Number of passengers", ylab = "Vessel",
          varlabels = c("Barnacle", "Maelstrom", "Poopdeck", "Seasick", "Wallower"),
          timelabels = c("0", "1-10", "10-100", "100-1000"))

Jim
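The example above uses random counts. A rough sketch of how the expanded data from the original post might be turned back into the same vessels-by-bins matrix layout is given below. It assumes only the four bins named in the question, with a single vessel "rec", and the name counts is illustrative.

```r
# Long-format data as in the original post, limited to the four bins
# named in the question (an assumption made for this sketch).
data2 <- data.frame(
  vessel = "rec",
  values = rep(c(0, 5.5, 55, 550), times = c(7, 11, 6, 13))
)

# Cross-tabulate so that rows are vessels and columns are bins,
# then drop the "table" class to leave a plain integer matrix.
counts <- unclass(table(data2$vessel, data2$values))

# Replace the midpoint column names with readable bin labels.
colnames(counts) <- c("0", "1-10", "10-100", "100-1000")
counts
```

With several vessels in data2, the same cross-tabulation would give one row per vessel, matching the shape of the vessels matrix above.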