Coey Minear
2008-Sep-05  18:53 UTC
[R] boxplot including null info from dataframe, not with SQLite dataframe
I have been trying to use R to gather some information from parsed log
files (as part of examining some performance issues).  I parsed the
log files and put the data into an SQLite database, and then used
RSQLite to load the data into R.  The fields of interest are
controller, action and total_time: controller and action have string
values; total_time has a decimal value.
I first drew the following box plot to find the problem controllers:
  boxplot(total_time ~ controller, all_data)
Having identified one controller of interest (let's say
"BadController"), I then wanted to focus on the actions
associated with that controller.  So I did this:
  boxplot(total_time ~ action, subset(all_data, controller == "BadController"))
This gave me a plot I was expecting: just the actions which are
associated with "BadController".  However, I'd done this work on a
FreeBSD system, and when I wanted to print it, the easiest route
seemed to be re-plotting with R on Windows.  So I wrote the data to a file,
moved it to Windows and loaded it up there.  
On FreeBSD: 
  write.table(all_data, "datafile.R")
On Windows:
  all_data <- read.table("datafile.R")
However, on Windows, when I get to the boxplot of the subset of data,
I'm seeing every action that's part of all_data, not just the ones
that are associated with "BadController".  Eventually, I found that
this wasn't a Windows vs. FreeBSD issue, because if I reload the data
from the file on FreeBSD, I start to see the same behavior.
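The symptom can be reproduced with a tiny data frame. This is a sketch that assumes read.table() turned the string columns into factors (its default, stringsAsFactors = TRUE, was in effect until R 4.0.0); the column values are invented for illustration and only stand in for the real all_data.

```r
# Hypothetical stand-in for all_data; the values are made up.
all_data <- data.frame(
  controller = c("BadController", "BadController", "GoodController", "GoodController"),
  action     = c("index", "show", "list", "edit"),
  total_time = c(1.5, 2.0, 0.3, 0.4),
  stringsAsFactors = TRUE  # mimic the pre-4.0.0 read.table() default
)

bad <- subset(all_data, controller == "BadController")

# Only two rows survive the subset...
nrow(bad)           # 2
# ...but the factor still carries every level seen in the full data,
# so boxplot(total_time ~ action, bad) reserves a slot for each of them.
levels(bad$action)  # "edit" "index" "list" "show"
table(bad$action)   # zero counts for "edit" and "list"
```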
Based on a few bug reports I found, it would seem that this is working
as designed.  So my question is: 
What do I do to get it to truly ignore the actions with no data?  I've
tried adding "drop=FALSE" to the subset call; that hasn't worked.  I
also tried specifically adding "na.action=NULL" to the boxplot call,
with no change.
I'm also curious what differs between a data frame loaded from
SQLite and one loaded from a file.
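A likely answer, assuming RSQLite handed back the text columns as plain character vectors (which is what current RSQLite versions do): a character column has no stored levels, so grouping a subset of it only produces groups for the values actually present, whereas a factor read back by read.table() remembers every level it saw in the whole file. The sketch below compares the two on hypothetical data and writes to a temporary file rather than datafile.R.

```r
# Hypothetical data with action stored as character, as assumed for RSQLite.
chr_df <- data.frame(
  controller = c("BadController", "GoodController"),
  action     = c("index", "list"),
  total_time = c(1.5, 0.3),
  stringsAsFactors = FALSE
)

# Round-trip through a temporary file, forcing the old read.table() default.
path <- tempfile(fileext = ".txt")
write.table(chr_df, path)
fct_df <- read.table(path, stringsAsFactors = TRUE)

class(chr_df$action)  # "character"
class(fct_df$action)  # "factor"

# After subsetting, the character column only holds the values present,
# while the factor keeps the levels from the full file.
chr_bad <- subset(chr_df, controller == "BadController")
fct_bad <- subset(fct_df, controller == "BadController")
unique(chr_bad$action)  # "index"
levels(fct_bad$action)  # "index" "list"
```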
In the meantime, I'll either try to install RSQLite on Windows or get
postscript working on FreeBSD.  (My quick attempt with postscript in R
on FreeBSD drew neither the bounding box nor the axes.)
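For printing from FreeBSD directly, R's postscript() device writes to a file that is only complete once the device is closed with dev.off(). The sketch below is a minimal baseline using hypothetical data and a hypothetical output path (bad_controller.ps in a temporary directory); it is not a diagnosis of the missing bounding box and axes, just something to compare a failing attempt against.

```r
# Hypothetical data standing in for the BadController subset.
bad <- data.frame(
  action     = factor(c("index", "index", "show", "show")),
  total_time = c(1.5, 1.7, 2.0, 2.4)
)

out <- file.path(tempdir(), "bad_controller.ps")  # hypothetical output path

# paper = "special" uses the given width/height (in inches) as the page size.
postscript(out, paper = "special", width = 7, height = 5, horizontal = FALSE)
boxplot(total_time ~ action, data = bad)
invisible(dev.off())  # close the device so the file is finalized

file.exists(out)  # TRUE
```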
Thanks for any help.
-- 
Coey Minear
Ben Bolker
2008-Sep-05  19:20 UTC
[R] boxplot including null info from dataframe, not with SQLite dataframe
Coey Minear <cminear <at> securecomputing.com> writes:
> [...]
>   boxplot(total_time ~ action, subset(all_data, controller == "BadController"))
> [...]
> On FreeBSD:
>   write.table(all_data, "datafile.R")
>
> On Windows:
>   all_data <- read.table("datafile.R")

I'm guessing that you want

  bad <- subset(all_data, controller == "BadController")
  bad$action <- factor(bad$action)
  boxplot(total_time ~ action, data = bad)

Subsetting doesn't drop factor levels that don't occur, which is an
unfortunate design decision ...

Ben Bolker
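For reference, the drop argument of subset() is passed on to `[` and controls whether a one-column result collapses to a vector, not whether unused factor levels are kept, which would explain why drop=FALSE changed nothing. The sketch below, on hypothetical data, compares re-creating the factor as in the reply with droplevels(), which drops unused levels from every factor column at once; note that droplevels() was added in R 2.12.0, after this thread was written.

```r
# Hypothetical data with factor columns, as read.table() used to produce.
all_data <- data.frame(
  controller = factor(c("BadController", "BadController", "GoodController")),
  action     = factor(c("index", "show", "list")),
  total_time = c(1.5, 2.0, 0.3)
)

bad <- subset(all_data, controller == "BadController")
levels(bad$action)  # "index" "list" "show" (unused "list" is still there)

# Approach from the reply: rebuild one factor from the values present.
bad1 <- bad
bad1$action <- factor(bad1$action)

# droplevels() (R >= 2.12.0) does this for all factor columns.
bad2 <- droplevels(bad)

levels(bad1$action)      # "index" "show"
levels(bad2$action)      # "index" "show"
levels(bad2$controller)  # "BadController"

boxplot(total_time ~ action, data = bad2)  # only "index" and "show" appear
```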