thr3ads.net - R help - [R] Problems with Boxplot [Sep 2009]


gug

2009-Sep-02 12:23 UTC

[R] Problems with Boxplot

Hello,

I have been having difficulty getting boxplot to give the output I want -
probably a result of the way I have been handling the data.

The data is arranged in columns: each date has two sets of data.  The number
of data points varies with the date, so each column is of different length. 
I want to get a series of boxplots with the date along the x-axis, with
alternating colors, so that it is easy to see the difference between the
results within each date, as well as across dates.

testdata <- c("C:\\Files\\R\\Sample R code\\Post trial data.csv")
data_headings <- read.table(testdata, skip = 0, sep = ",", header = FALSE)[1,]
my_data <- read.table(testdata, skip = 1, sep = ",", na.strings = "na", header = FALSE)
boxplot(my_data*100, names = data_headings, outline = FALSE, range = 0.3,
        border = c(2,4))

The result is a boxplot, but it does not show the date along the bottom (the
"names = data_headings" bit achieves nothing).  I can alternatively try this:

new_data <- read.table(testdata, skip = 0, sep = ",", na.strings = "na", header = TRUE)
boxplot(new_data, outline = FALSE, range = 0.3, border = c(2,4))

This takes all the data and plots it, but I then lose the ability to
multiply by 100 (I'm trying to show percentages: e.g. 10% as "10", rather
than as "0.1").

1) My first question is: is there a simple way of getting both dates along
the x-axis and the "*100" calculation (or percentages)?

2) Next, how can I put a legend somewhere to show that red is "data set 1"
and blue is "data set 2"?

3) Is it possible to get the date label to straddle the two boxes it covers?
As it is, one tick has the date and the other does not.

4) Is it possible to show both the median and the mean with boxplot?

5) Finally, the code works as described above (i.e. up to a point) with the
"Post trial data.csv" file I have posted.  However, when I try with a larger
file ("Larger trial.csv", also posted), I get the message: "Error in
scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :  line
145 did not have 50 elements" when I get to the "data_headings" line.  I
have no idea why R is seeing a difference between these two files.

Post+trial+data.csv: http://www.nabble.com/file/p25256461/Post%2Btrial%2Bdata.csv
Larger+trial.csv: http://www.nabble.com/file/p25256461/Larger%2Btrial.csv

Thanks for any suggestions,

Guy Green
 

-- 
View this message in context:
http://www.nabble.com/Problems-with-Boxplot-tp25256461p25256461.html
Sent from the R help mailing list archive at Nabble.com.

gug

2009-Sep-03 11:41 UTC

[R] Problems with Boxplot

I'm posting answers to my own questions here, as far as I have them: first so
that people don't spend time on them, and second in case the solutions are
helpful to anyone else in future.

1) My first question is: is there a simple way of getting both dates along
the x-axis and the "*100" calculation (or percentages)?
I still don't know how to change the format of the y-axis tick labels.  I'd
be interested if anyone has a quick way to get percentages and, additionally,
a way to get numbers in the "0,000" format along the x or y-axis.  In the
meantime, I can live with this.
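
For anyone reading later, here is one possible way to do both, offered as a sketch rather than something tested on the posted files. It assumes the CSV is read with header = TRUE and that every data column is numeric, in which case the whole data frame can be multiplied by 100. The inline CSV text and its headers ("01-Sep A" and so on) are hypothetical stand-ins for the real file. Adding check.names = FALSE asks read.table() to keep the headers as they are instead of converting them to syntactic names, and the y-axis is suppressed with yaxt = "n" so that axis() can draw labels with a "%" suffix.

```r
# Hypothetical stand-in for the CSV file: two columns per date, unequal lengths
csv_text <- "01-Sep A,01-Sep B,02-Sep A,02-Sep B
0.10,0.02,0.07,-0.01
0.12,0.05,0.03,0.04
-0.03,na,0.09,0.02
0.08,na,na,0.06"

# check.names = FALSE keeps the headers exactly as written in the file
new_data <- read.table(text = csv_text, sep = ",", header = TRUE,
                       na.strings = "na", check.names = FALSE)

# Multiplying a numeric data frame by a scalar scales every column
pct_data <- new_data * 100

# Draw the boxes without the default y-axis, then add one with "%" labels
boxplot(pct_data, col = c("lightblue", "salmon"), yaxt = "n")
ticks <- axTicks(2)
axis(2, at = ticks, labels = paste0(ticks, "%"), las = 1)

# For the "0,000" style, format() accepts a thousands separator
big <- format(1234567, big.mark = ",")
```

The same axis() call accepts labels = format(ticks, big.mark = ",") when the tick values are large numbers rather than percentages.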

2) Next is how can I put a legend somewhere to show that red is "data set 1"
and blue is "data set 2".
I did this with the following text:
legend("top", c("Top","Bottom"), cex=1.5, lty=1:2,
       fill=c("lightblue", "salmon"), bty="n")

3) Is it possible to get the date to straddle across each of the two dates
it covers: as it is, one tick has the date, the other does not.
I didn't manage to do this.  However, there were over 20 dates in the final
data (i.e. 40 plots), so after changing the width of the chart window not
every plot was labeled anyway, and the result was clear enough.
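
A possible approach to the straddling labels, as a minimal sketch: boxplot() places the boxes at x = 1, 2, 3, ... by default, so the midpoint of each pair sits at 1.5, 3.5, 5.5 and so on. Suppressing the default x-axis with xaxt = "n" and drawing one label per pair with axis() puts each date between its two boxes. The dates and the random data below are made up for illustration.

```r
# Hypothetical dates, each covering two adjacent boxes
dates <- c("01-Sep", "02-Sep", "03-Sep")

# Made-up data: six groups of different lengths, two per date
set.seed(42)
groups <- lapply(1:6, function(i) rnorm(10 + i))

# Draw the boxes without the default x-axis labels
boxplot(groups, col = c("lightblue", "salmon"), xaxt = "n", las = 1)

# Boxes sit at x = 1..6, so pair midpoints are 1.5, 3.5, 5.5
mid <- seq(1.5, by = 2, length.out = length(dates))
axis(1, at = mid, labels = dates, tick = FALSE)
```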

4) Is it possible to show both the median and the mean with boxplot?
I gave up on this, but I think the data looks OK in the end with just the
boxplot defaults.
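
One way this could be done, sketched on made-up data: boxplot() draws the median line itself, and the means can be overlaid afterwards with points(), using the same x positions as the boxes (1, 2, 3, ... by default). The group names and values here are hypothetical.

```r
# Made-up skewed data so that mean and median visibly differ
set.seed(1)
groups <- list(set1 = rexp(30), set2 = rexp(20))

# The boxes show the median as the usual horizontal line
boxplot(groups, col = c("lightblue", "salmon"), las = 1)

# Compute one mean per group, ignoring any NA padding
means <- sapply(groups, mean, na.rm = TRUE)

# Overlay the means as diamonds at the box positions 1, 2, ...
points(seq_along(means), means, pch = 18, cex = 1.5)
```

With a data frame such as new_data, sapply(new_data, mean, na.rm = TRUE) should give the per-column means in the same way.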

5) Finally, the code works as described above (i.e. up to a point) with the
"Post trial data.csv" file I have posted.  However when I try with a larger
file ("Larger trial.csv", also posted), I get the message: "Error in
scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :  line
145 did not have 50 elements" when I get to the "data_headings" line.  I
have no idea why R is seeing a difference between these two files.
I ended up finding that even for specific small files, I got this error
message, which prevented me from processing the data and so was fatal to the
code.  I narrowed it down to a small file, and then looked at the csv file
in notepad.  The bottom of the file (which was just 2 columns of data, of
different column lengths), was along these lines:

-0.48013245,0.095652174
-0.039344262,-0.067142857
0.018022077,-0.079295154
-0.078534031,
0.010054845,
0.096153846,
0.177568018
0.013818182
0.002402883

It seemed that R could cope with empty fields, as long as there was a ","
to indicate that the column was there, but it could NOT cope with a line
where the column was missing altogether (because there was no ",").  The
problem was that Excel, which was generating the CSV file, wasn't writing
the "," for empty trailing cells in certain circumstances.  The solution was
to fill the empty cells in Excel with "na" before saving as CSV.  Excel then
saves it correctly, and R deals with it correctly.
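
An alternative that avoids editing the file in Excel, given here as a sketch and not tried on the posted files: read.table() has a fill argument which, when TRUE, pads short lines with blank fields, and these become NA in numeric columns. (read.csv() sets fill = TRUE by default.) The inline text below is a shortened, hypothetical version of the file tail shown above, with made-up headers "a" and "b". Note that this only helps when the missing fields are at the end of the line, as they are here.

```r
# Shortened stand-in for the problem file: the last line has no trailing ","
csv_text <- "a,b
-0.48013245,0.095652174
0.010054845,
0.177568018"

# fill = TRUE pads lines that have too few fields instead of raising an error
ragged <- read.table(text = csv_text, sep = ",", header = TRUE,
                     fill = TRUE, na.strings = c("na", ""))

# Inspect the result: missing trailing fields should show up as NA
print(ragged)
```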

The final code (though without the y-axis formatting being fixed) is:

testdata <- c("C:\\Files\\R\\Sample R code\\Post trial data.csv")
new_data <- read.table(testdata, skip = 0, sep = ",", na.strings = "na", header = TRUE)
x11(width=16, height=7, pointsize=14)
boxplot(new_data, outline = FALSE, col = c("lightblue", "salmon"), las = 1,
        boxwex = 0.5)
legend("top", c("Label for blue boxes", "Label for red boxes"), cex=1.5,
       lty=1:2, fill=c("lightblue", "salmon"), bty="n")
title(main="Chart title text", cex.main = 1.8)
grid()

Guy

