Hello,

my data.frame is sort of a collection of process values, i.e. a huge
run-chart. It consists of a time-stamp in the first column (date as
string), factors in the following columns (used for subset-filtering),
and some process-data columns.
Hereafter, two examples are listed, showing the problems that occur
during plotting:

At first the example that works fine:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
a = c(1:10)                     # create a vector of integers
b = rep(c("a","b"),5)           # create a vector of chars, used
                                # as factor-levels
d = rnorm(10)                   # some random numbers
e = data.frame(a,b,d)           # connect to a data.frame

e.1 = subset(e, b=="a")         # create two subsets
e.2 = subset(e, b=="b")
plot(d~a, e.1, pch=3, col=2)    # plot first data-subset
points(d~a, e.2, pch=4, col=3)  # plot the 2nd one
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All looks fine in these plots.

However, changing the content of vector "a" to a set of strings, the
following happens:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
a = c("a","b","c","d","e","f","g","h","i","j")
e = data.frame(a,b,d)           # re-build data.frame

e.1 = subset(e, b=="a")         # create two subsets
e.2 = subset(e, b=="b")
plot(d~a, e.1, pch=3, col=2)
points(d~a, e.2, pch=4, col=3)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The plot command produces horizontal lines instead of dots. This seems
to happen when the x-axis contains strings rather than numbers. Is
there a way out?

Best regards,
/Steffen

--
Steffen Uhlig, PhD
Mechatronik und Sensortechnik
HTW des Saarlandes
Goebenstraße 40
66117 Saarbrücken

Tel.: +49 (0) 681 58 67 274

Hello,

This is completely normal. I advise you to read the manual "An
Introduction to R" on the CRAN website. For example, section 12.1.1
says:

12.1.1 The plot() function

One of the most frequently used plotting functions in R is the plot()
function. This is a generic function: the type of plot produced is
dependent on the type or class of the first argument.

plot(x, y)
plot(xy)
    If x and y are vectors, plot(x, y) produces a scatterplot of y
    against x. The same effect can be produced by supplying one
    argument (second form) as either a list containing two elements x
    and y or a two-column matrix.

plot(x)
    If x is a time series, this produces a time-series plot. If x is a
    numeric vector, it produces a plot of the values in the vector
    against their index in the vector. If x is a complex vector, it
    produces a plot of imaginary versus real parts of the vector
    elements.

plot(f)
plot(f, y)
    f is a factor object, y is a numeric vector. The first form
    generates a bar plot of f; the second form produces boxplots of y
    for each level of f.

plot(df)
plot(~ expr)
plot(y ~ expr)
    df is a data frame, y is any object, expr is a list of object
    names separated by '+' (e.g., a + b + c). The first two forms
    produce distributional plots of the variables in a data frame
    (first form) or of a number of named objects (second form). The
    third form plots y against every object named in expr.

Alain

On 26-Jul-10 13:38, Steffen Uhlig wrote:
> The plot command produces horizontal lines instead of dots. This seems
> to happen when the x-axis contains strings rather than numbers. Is
> there a way out?
>
> Best regards,
> /Steffen

--
Alain Guillet
Statistician and Computer Scientist

SMCS - IMMAQ - Université catholique de Louvain
Bureau c.316
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium

tel: +32 10 47 30 50
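
The dispatch described in the manual excerpt can be tried out directly. What follows is only a minimal sketch, not code from the thread: the names e_num and e_fac are made up for illustration, and stringsAsFactors = TRUE is written out as an assumption, because data.frame() no longer converts strings to factors by default since R 4.0.0.

```r
set.seed(1)                        # reproducible random numbers
d <- rnorm(10)
b <- rep(c("a", "b"), 5)

# Numeric x: the formula method of plot() draws a scatterplot.
e_num <- data.frame(a = 1:10, b = b, d = d)
class(e_num$a)                     # "integer"
plot(d ~ a, data = e_num)

# Factor x: the same call draws one boxplot per factor level.
# With a single observation per level, each box collapses to a
# horizontal line, which matches what the original post describes.
e_fac <- data.frame(a = letters[1:10], b = b, d = d,
                    stringsAsFactors = TRUE)   # assumption, see above
class(e_fac$a)                     # "factor"
plot(d ~ a, data = e_fac)
```

Checking class() of the x column before plotting is usually the quickest way to tell which kind of plot to expect.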

Hi

r-help-bounces at r-project.org wrote on 26.07.2010 13:38:44:

> Hello,
>
> my data.frame is sort of a collection of process values, i.e. a huge
> run-chart. It consists of a time-stamp in the first column (date as
> string), factors in the following columns (used for subset-filtering),
> and some process-data columns.
> Hereafter, two examples are listed, showing the problems that occur
> during plotting:
>
> At first the example that works fine:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> a = c(1:10)                     # create a vector of integers
> b = rep(c("a","b"),5)           # create a vector of chars, used
>                                 # as factor-levels
> d = rnorm(10)                   # some random numbers
> e = data.frame(a,b,d)           # connect to a data.frame
>
> e.1 = subset(e, b=="a")         # create two subsets
> e.2 = subset(e, b=="b")
> plot(d~a, e.1, pch=3, col=2)    # plot first data-subset

A rather unusual plot call. I usually call

plot(a, d, pch=as.numeric(as.factor(b))+2, col=as.numeric(as.factor(b))+1)

because with your approach you can run into problems when some point in
the second subset lies outside the range of the first subset.

> points(d~a, e.2, pch=4, col=3)  # plot the 2nd one
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> All looks fine in these plots.
>
>
> However, changing the content of vector "a" to a set of strings, the
> following happens:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> a = c("a","b","c","d","e","f","g","h","i","j")
> e = data.frame(a,b,d)           # re-build data.frame
>
> e.1 = subset(e, b=="a")         # create two subsets
> e.2 = subset(e, b=="b")
> plot(d~a, e.1, pch=3, col=2)
> points(d~a, e.2, pch=4, col=3)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> The plot command produces horizontal lines instead of dots. This seems
> to happen when the x-axis contains strings rather than numbers. Is there
> a way out?

You actually called boxplots, hence the lines and the labels under the
x axis. The way out depends on how you want everything to be plotted.
If vector "a" were a factor, you could convert it to its numeric
representation with

a.n <- as.numeric(a)

and plot d against a.n with axis labels taken from a.

Try going through the help pages for plot, plot.default, boxplot and
factor.

Regards
Petr

>
> Best regards,
> /Steffen
> --
> Steffen Uhlig, PhD
> Mechatronik und Sensortechnik
> HTW des Saarlandes
> Goebenstraße 40
> 66117 Saarbrücken
>
> Tel.: +49 (0) 681 58 67 274
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
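
Petr's suggestion (numeric codes on the x axis, labels taken from the factor) could look roughly like the sketch below. This is one possible reading of his advice, not his code: the column name a.n inside the data.frame, the use of xaxt = "n" with axis(), and the xlim/ylim computed from the full data are assumptions added for illustration. stringsAsFactors = TRUE is spelled out because newer R versions (4.0.0 and later) keep strings as character by default.

```r
set.seed(1)                        # reproducible random numbers
a <- c("a","b","c","d","e","f","g","h","i","j")
b <- rep(c("a","b"), 5)
d <- rnorm(10)
e <- data.frame(a, b, d, stringsAsFactors = TRUE)

e$a.n <- as.numeric(e$a)           # integer codes of the factor levels

e.1 <- subset(e, b == "a")         # create two subsets
e.2 <- subset(e, b == "b")

# Take the axis limits from the full data so that points of the
# second subset cannot fall outside the plot region, and suppress
# the default x axis so it can be relabelled afterwards.
plot(d ~ a.n, data = e.1, pch = 3, col = 2,
     xlim = range(e$a.n), ylim = range(e$d),
     xaxt = "n", xlab = "a")
points(d ~ a.n, data = e.2, pch = 4, col = 3)

# Put the original level names back on the x axis.
axis(1, at = seq_along(levels(e$a)), labels = levels(e$a))
```

Because the x values are now numeric, plot() draws points rather than boxplots, and the range problem Petr mentions is avoided by fixing xlim and ylim up front.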

You could have a look at the ggplot2 package to make such plots. The
code for the plots is more readable than with base graphics.

a = c(1:10)             # create a vector of integers
b = rep(c("a","b"),5)   # create a vector of chars, used
                        # as factor-levels
d = rnorm(10)           # some random numbers
e = data.frame(a,b,d)
library(ggplot2)
ggplot(e, aes(x = a, y = d, colour = b, shape = b)) + geom_point()

a = c("a","b","c","d","e","f","g","h","i","j")
e = data.frame(a,b,d)   # re-build data.frame
ggplot(e, aes(x = a, y = d, colour = b, shape = b)) + geom_point()

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data.
~ John Tukey

> -----Original message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On behalf of Steffen Uhlig
> Sent: Monday 26 July 2010 13:39
> To: r-help at r-project.org
> Subject: [R] Plot of a subset of a data.frame()
>
> The plot command produces horizontal lines instead of dots.
> This seems to happen when the x-axis contains strings rather
> than numbers. Is there a way out?

On Jul 26, 2010, at 7:38 AM, Steffen Uhlig wrote:

> Hello,
>
> my data.frame is sort of a collection of process values, i.e. a huge
> run-chart. It consists of a time-stamp in the first column (date as
> string), factors in the following columns (used for subset-filtering),
> and some process-data columns.
> Hereafter, two examples are listed, showing the problems that occur
> during plotting:
>
> At first the example that works fine:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> a = c(1:10)                     # create a vector of integers
> b = rep(c("a","b"),5)           # create a vector of chars, used
>                                 # as factor-levels
> d = rnorm(10)                   # some random numbers
> e = data.frame(a,b,d)           # connect to a data.frame

You've gotten several answers, but none have addressed an aspect of R
behavior that took me longer to appreciate than it perhaps should have.
The "b" column inside the "e" data.frame is now a factor column. I
mention that because you later referred to it as a "string", which it
is not. It is an integer vector with an associated character vector of
levels that the integers index into. Many of the functions that you
might think would "work" on "strings" will give either errors or
unexpected results when applied to factors.

>
> e.1 = subset(e, b=="a")         # create two subsets
> e.2 = subset(e, b=="b")
> plot(d~a, e.1, pch=3, col=2)    # plot first data-subset
> points(d~a, e.2, pch=4, col=3)  # plot the 2nd one
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> All looks fine in these plots.
>
>
> However, changing the content of vector "a" to a set of strings, the
> following happens:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> a = c("a","b","c","d","e","f","g","h","i","j")
> e = data.frame(a,b,d)           # re-build data.frame
>
> e.1 = subset(e, b=="a")         # create two subsets
> e.2 = subset(e, b=="b")
> plot(d~a, e.1, pch=3, col=2)
> points(d~a, e.2, pch=4, col=3)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> The plot command produces horizontal lines instead of dots. This
> seems to happen when the x-axis contains strings rather than
> numbers. Is there a way out?
>
> Best regards,
> /Steffen

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
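
David's point that a factor is an integer vector plus a table of level labels can be checked at the prompt. The following is only a small illustrative sketch; the vector f and its values are made up and do not come from the thread.

```r
# A factor built from strings that happen to look like numbers.
f <- factor(c("10", "20", "10", "30"))

typeof(f)          # "integer": the underlying storage
levels(f)          # "10" "20" "30": the lookup table of labels
as.integer(f)      # 1 2 1 3: the codes, not the original values
is.character(f)    # FALSE: a factor is not a string vector

# A common surprise: converting straight to numeric returns the codes.
as.numeric(f)                  # 1 2 1 3

# Going through character first recovers what the labels spell out.
as.numeric(as.character(f))    # 10 20 10 30
```

The same mechanism is why plot() treats a column of former strings as categories: it sees a factor, not text.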