Hello, I am very new to R.

My current data set is a mix of values and categories. It is a geoscience data set, with values per rock sample. Case in point: each sample belongs to a lithology class, and each sample has several physical property measurements (density, porosity...).

I want to be able to plot these physical properties for all samples in each lithology class. This is how I'm doing it now:

> tc = read.table(....)
> attach(tc)
> names(tc)
tc =  [1] "Well"         "Depth"        "Latitude"     "Longitude"
      [5] "Formation"    "Lithology"    "LithClass"    "CondUncert"
      [9] "sample.ID"    "Conductivity" "Density"      "Porosity"

> plot(Depth[LithClass=='sand'], Conductivity[LithClass=='sand'])
(ad nauseam... how can I loop through them all?)

and ...

> boxplot(Conductivity[LithClass=='clay'])
> boxplot(Conductivity~LithClass)   # whole set of boxplots on one diagram, but
                                    # what if I want to exclude one or two of the LithClasses?

and ...

> pairs(c(tc[10],tc[2],tc[11],tc[12]))

This is as advanced as I've got. Any tips would be greatly appreciated.

Ben.
Hi,

First, a good tip when you ask on the R list is to provide data in a form that we can readily use. I think the best way to do that is to copy the output of dput(tc) into the email you write.

There might be better ways to do what you want, but here is what I would do:

# first subset what you need:
tc_clay <- tc[tc$LithClass == "clay", ]
# or
tc_not_clay <- tc[tc$LithClass != "clay", ]

# you might need to drop the unused levels if you don't want
# the excluded classes to show up as empty groups in the plots.
# Two ways (at least) to do it:
tc_clay$LithClass <- factor(tc_clay$LithClass)   # re-create the factor for one column
tc_not_clay <- droplevels(tc_not_clay)           # or drop unused levels in the whole data frame

# then you can plot
plot(tc_clay$Conductivity ~ tc_clay$Depth)
boxplot(tc_not_clay$Conductivity ~ tc_not_clay$LithClass)

I hope this helps you get started.
Ivan
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
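
Since no data were posted, here is a minimal sketch of why dropping unused levels matters when a class is excluded. The data frame tc_demo and all of its values are made up for illustration; only the column names follow the original post.

```r
# Hypothetical stand-in for tc: the values are invented for illustration.
tc_demo <- data.frame(
  LithClass    = factor(c("sand", "clay", "sand", "shale", "clay", "shale")),
  Depth        = c(100, 150, 200, 250, 300, 350),
  Conductivity = c(2.1, 1.4, 2.3, 1.8, 1.5, 1.9)
)

# Subsetting rows keeps every factor level, even one with no rows left.
not_clay <- tc_demo[tc_demo$LithClass != "clay", ]
levels(not_clay$LithClass)   # "clay" "sand" "shale"

# droplevels() removes the unused level, so boxplot() draws no empty group.
not_clay <- droplevels(not_clay)
levels(not_clay$LithClass)   # "sand" "shale"

boxplot(Conductivity ~ LithClass, data = not_clay)
```

Without the droplevels() call, the boxplot would still reserve an empty position on the axis for "clay".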
Hi:

Your intention isn't crystal clear to me, but I'll give it a shot...

On Mon, Jan 17, 2011 at 10:48 PM, Ben Harrison <b.harrison2@pgrad.unimelb.edu.au> wrote:

> > plot(Depth[LithClass=='sand'], Conductivity[LithClass=='sand'])
> (ad nauseam... how can I loop through them all?)

One way:

sand <- subset(tc, LithClass == 'sand')
plot(Conductivity ~ Depth, data = sand, ...)

Re the parenthesized statement: loop through all of what? The pairs of variables within a data subset, the data subsets, or both? If you made it clear what you were really after, one could be more forthcoming, but if the idea is to plot all pairs of variables within a particular data subset, there are functions in the apply family and in package plyr that can help you in that regard.

Here's one approach for the individual boxplots, applied to the clay LithClass:

vars <- names(which(sapply(tc, is.numeric)))   # select numeric vars
clay <- subset(tc, LithClass == 'clay')

# Using the plot function: x is a column name given as a string
g <- function(x) {
    with(clay, boxplot(eval(parse(text = x)), xlab = x))
    Sys.sleep(1)    # pause so each plot stays visible for a moment
}

# (1): loop
for(i in seq_along(vars)) g(vars[i])

# (2): Use lapply in place of the loop:
lapply(vars, g)

# For all x-y scatterplots, one approach is via the m_ply() function
# in package plyr. A more efficient means of choosing x and y in
# expand.grid() is certainly possible, though.
library(plyr)
vpairs <- expand.grid(x = vars, y = vars)
vpairs <- vpairs[vpairs$x != vpairs$y, ]    # drop the x-against-itself pairs

# Basic plot function:
f <- function(x, y) {
    fm <- as.formula(paste(y, x, sep = '~'))
    plot(fm, data = clay)
    Sys.sleep(1)
}

# Let the show begin:
m_ply(vpairs, f)

It's easier to do all of this when the plot function has a formula interface. Of course, the pairs() function or one of its variants among multiple packages may well be a more efficient approach.

If you can split the data into multiple subsets, you can wrap up some or all of the above in a function and call it with lapply() in the base package. I'll leave that as a homework exercise :) Then there's the issue of saving them...

HTH,
Dennis
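
One possible reading of the "homework exercise" is sketched below: split() the data by lithology class, then apply one plotting function to each piece. The data frame tc_demo, its values, and the helper name plot_class are hypothetical; this uses vapply() from the apply family rather than lapply() so that the row counts come back as a plain named vector.

```r
# Hypothetical stand-in for tc: the values are invented for illustration.
tc_demo <- data.frame(
  LithClass    = c("sand", "clay", "sand", "shale", "clay", "shale"),
  Depth        = c(100, 150, 200, 250, 300, 350),
  Conductivity = c(2.1, 1.4, 2.3, 1.8, 1.5, 1.9)
)

# One data frame per lithology class, named after the class.
by_class <- split(tc_demo, tc_demo$LithClass)

# Plot Conductivity against Depth for one class; return its row count.
plot_class <- function(name) {
  plot(Conductivity ~ Depth, data = by_class[[name]], main = name)
  nrow(by_class[[name]])
}

op <- par(mfrow = c(1, length(by_class)))   # one panel per class, side by side
n_per_class <- vapply(names(by_class), plot_class, integer(1))
par(op)                                     # restore the previous layout
```

Looping over names(by_class) rather than over the pieces themselves keeps the class name available for the plot title.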
Since you don't provide data, let's borrow from the help(droplevels) page:

aq <- transform(airquality, Month = factor(Month, labels = month.abb[5:9]))
str(aq)
#'data.frame':   153 obs. of  6 variables:   |
# $ Ozone  : int  41 36 12 18 NA 28 23 19    |
# $ Solar.R: int  190 118 149 313 NA NA 29   |
# $ Wind   : num  7.4 8 12.6 11.5 14.3 14.   | etc
# $ Temp   : int  67 72 74 62 56 66 65 59    |
# $ Month  : Factor w/ 5 levels "May","Jun   |
# $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...   |

Now see if the following give you some R inspiration:

plot(Ozone ~ Temp, data = aq)
plot(Ozone ~ Temp, data = aq, subset = {Month == "Sep"})
boxplot(Ozone ~ Month, data = aq)
boxplot(Ozone ~ Month, data = aq, subset = {Month != "Aug"})
boxplot(Ozone ~ Month, data = aq, subset = {!(Month %in% c("Jul", "Aug"))})
boxplot(Ozone ~ Month, data = droplevels(subset(aq, subset = {Month != "Aug"})))
boxplot(Ozone ~ Month, data = droplevels(subset(aq, !(Month %in% c("Jul", "Aug")))))

BTW, attach() is not usually a good idea; have a look at ?with.

Peter Ehlers
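
To illustrate the closing remark about ?with, here is a small sketch of with() in place of attach(), along with a pairs() call that selects columns by name instead of by position. The data frame tc_demo and its values are made up; only the column names come from the original post.

```r
# Hypothetical stand-in for tc: the values are invented for illustration.
tc_demo <- data.frame(
  LithClass    = c("sand", "clay", "sand", "shale", "clay", "shale"),
  Depth        = c(100, 150, 200, 250, 300, 350),
  Conductivity = c(2.1, 1.4, 2.3, 1.8, 1.5, 1.9),
  Density      = c(2.65, 2.40, 2.60, 2.55, 2.45, 2.50),
  Porosity     = c(0.25, 0.40, 0.22, 0.10, 0.38, 0.12)
)

# with() evaluates the expression inside the data frame, so column names
# resolve without attach() and nothing is left on the search path afterwards.
sand_mean <- with(tc_demo, mean(Conductivity[LithClass == "sand"]))
with(tc_demo, boxplot(Conductivity ~ LithClass))

# pairs() takes a data frame directly; picking columns by name is sturdier
# than tc[10], tc[2], ... if the column order ever changes.
pairs(tc_demo[, c("Conductivity", "Depth", "Density", "Porosity")])
```

For the plotting functions that have a formula interface, the data = argument shown in the reply above serves the same purpose as with().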