The sample data sets that come with the NADA package are limited to one or
two variables and a censored measurement indicator column. I try to mimic
examples using my data but keep missing the target.

My water chemistry data is available in two formats: long (as seen in a
database table) and wide (as seen in a spreadsheet). The two structures are:

str(chem)
'data.frame':   65349 obs. of  8 variables:
 $ site    : Factor w/ 64 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 1 ...
 $ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
 $ era     : Factor w/ 2 levels "Post","Pre": 1 1 1 1 1 1 1 1 1 1 ...
 $ param   : Factor w/ 64 levels "AgDis","AgTot",..: 2 4 5 7 11 15 25 ...
 $ quant   : num  1.30e-04 1.06e-01 2.31e+02 1.13e-02 5.00e-03 ...
 $ ceneq1  : logi  TRUE FALSE FALSE FALSE TRUE FALSE ...
 $ floor   : num  0 0.106 231 0.0113 0 100 0 1.43 0 0.0239 ...
 $ ceiling : num  1.30e-04 1.06e-01 2.31e+02 1.13e-02 5.00e-03 2.39e-02 ...

and

str(chem.cast)
'data.frame':   56938 obs. of  70 variables:
 $ site     : Factor w/ 64 levels "D-1","D-2","D-3",..: 1 1 1 1 1 ...
 $ sampdate : Date, format: "2007-12-12" "2007-12-12" ...
 $ era      : Factor w/ 2 levels "Post","Pre": 1 1 1 1 1 1 1 1 1 1 ...
 $ ceneq1   : logi  TRUE FALSE FALSE FALSE TRUE FALSE ...
 $ floor    : num  0 0.106 231 0.0113 0 100 0 1.43 0 0.0239 ...
 $ ceiling  : num  1.30e-04 1.06e-01 2.31e+02 1.13e-02 5.00e-03 ...
 $ AgDis    : num  NA NA NA NA NA NA NA NA NA NA ...
 $ AgTot    : num  0.00013 NA NA NA NA NA NA NA NA NA ...
 $ AlDis    : num  NA NA NA NA NA NA NA NA NA NA ...
 $ AlTot    : num  NA 0.106 NA NA NA NA NA NA NA NA ...
 $ Alk      : num  NA NA 231 NA NA NA NA NA NA NA ...
 $ AsDis    : num  NA NA NA NA NA NA NA NA NA NA ...
and so on.

I do not know if the latter is appropriate; that is, that the ceneq1,
floor, and ceiling values are available for each site, sampdate, and
chemical.

Is the appropriate way to use the NADA methods for analyses and plotting
to subset each chemical separately from the 'chem' data frame? Or, is there
a syntax other than, for example,

cenboxplot(chem&Vdis, chem$ceneq1, chem$era)
Error in cenros(obs[group == i], cen[group == i]) :
  error in evaluating the argument 'obs' in selecting a method for function
'ros': Error: object 'Vdis' not found

I get the same error when trying to use the 'chem.cast' data frame.

Rich
R. Michael Weylandt
2012-Aug-07 18:31 UTC
[R] NADA Package: Referencing Data Frame Columns
On Tue, Aug 7, 2012 at 11:26 AM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
> Is the appropriate way to use the NADA methods for analyses and plotting
> to subset each chemical separately from the 'chem' data frame? Or, is there
> a syntax other than, for example,
>
> cenboxplot(chem&Vdis, chem$ceneq1, chem$era)
> Error in cenros(obs[group == i], cen[group == i]) :
>   error in evaluating the argument 'obs' in selecting a method for function
> 'ros': Error: object 'Vdis' not found
>
> I get the same error when trying to use the 'chem.cast' data frame.
>

Take a look at with()

Michael

> Rich
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
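
For readers following the thread, here is a minimal sketch of what the with() suggestion amounts to. In the failing call, `chem&Vdis` is parsed as `chem & Vdis` (a logical AND), so R goes looking for a standalone object named Vdis, which is where the "object 'Vdis' not found" message comes from. with() evaluates an expression inside a data frame, so columns can be referred to by bare name. The data frame below is a made-up stand-in for 'chem', and the NADA call appears only as a comment because the toy data is too small for a meaningful fit; treat it as an illustration rather than the exact call used in the thread.

```r
# Toy stand-in for the long-format 'chem' data frame (values are made up).
chem <- data.frame(
  era    = c("Pre", "Pre", "Post", "Post"),
  param  = c("AgTot", "AgTot", "AgTot", "AgTot"),
  quant  = c(0.005, 0.012, 0.005, 0.020),
  ceneq1 = c(TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# with() makes the columns visible as variables inside the expression,
# so no 'chem$' prefix is needed and there is nothing to mistype.
n_censored    <- with(chem, sum(ceneq1))
mean_detected <- with(chem, mean(quant[!ceneq1]))

# The same pattern with a NADA function, on one parameter at a time
# (not run here; assumes library(NADA) and a realistic sample size):
# with(subset(chem, param == "AgTot"), cenboxplot(quant, ceneq1, era))
```

The subset() step matters in the long format because quant holds the values for every parameter; without it, all chemicals would be pooled into one plot.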
On Tue, 7 Aug 2012, R. Michael Weylandt wrote:

> Take a look at with()

Michael,

  Works like a charm! Thanks again for the pointer.

Rich
Hi Rich,

I may not have the complete picture here, but I do see what looks to me
like a problem with your chem.cast. Specifically, since it has only a
single detection indicator column (ceneq1), it implies that within any
single sample either all the analytes were detected, or all were not. Not
what I would expect.

If the typo that others pointed out was not the entire answer to your
question, then I would add:

As to your larger question of which layout is appropriate for use with
NADA functions, the answer is that either can be used. The "trick" is to
use the appropriate syntax to extract the values needed to pass the data to
a NADA function. The syntax is different for the long vs the wide format.
At this point, it's not really a NADA issue, just a matter of R syntax.

There are multiple ways to do either one. I suppose each has pros and cons,
to some extent depends on what kinds of graphics or analyses you need to
do, and there's plenty of room for personal preference.

For the long format you subset the rows, then pass the appropriate columns.
Here's one way:

  with(subset(chem, param == 'AgDis'), ros(quant, ceneq1))

For the wide format you pass the appropriate columns

  ros(chem.cast$AgDis, chem.cast$AgDis.ceneq1)

where I have invented the name of a new column that has the censoring
indicator specific to AgDis.

Hope this helps.
-Don

(p.s., I still think you'll be better off in the long run if you store
site, param, and maybe era, as character objects, not factors.)

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 8/7/12 9:26 AM, "Rich Shepard" <rshepard at appl-ecosys.com> wrote:

> The sample data sets that come with the NADA package are limited to one or
> two variables and a censored measurement indicator column. I try to mimic
> examples using my data but keep missing the target.
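
The two access patterns Don describes, and his point about needing a censoring indicator per analyte in the wide layout, can be sketched with base R alone. Everything below is an assumption for illustration: the values are made up, and reshape() is just one way to build a wide table (the thread does not say how chem.cast was produced). Note that reshape() names the new columns ceneq1.AgDis and so on, rather than the AgDis.ceneq1 name Don invented. The NADA calls are left as comments since the toy data is far too small to fit.

```r
# Toy long-format data (made-up values); column names follow the thread.
chem <- data.frame(
  site     = c("D-1", "D-1", "D-2", "D-2"),
  sampdate = as.Date(c("2007-12-12", "2007-12-12",
                       "2007-12-13", "2007-12-13")),
  param    = c("AgDis", "AlTot", "AgDis", "AlTot"),
  quant    = c(0.00013, 0.106, 0.00020, 0.005),
  ceneq1   = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Long format: subset the rows for one analyte, then use its columns.
ag <- subset(chem, param == "AgDis")
# with(ag, ros(quant, ceneq1))                      # NADA call, not run

# Wide format: spread BOTH quant and ceneq1 so that every analyte keeps
# its own censoring indicator alongside its own value column.
chem.wide <- reshape(
  chem,
  idvar     = c("site", "sampdate"),
  timevar   = "param",
  v.names   = c("quant", "ceneq1"),
  direction = "wide"
)
# with(chem.wide, ros(quant.AgDis, ceneq1.AgDis))   # NADA call, not run
```

With one row per site and sampling date, each analyte's value column sits next to its own TRUE/FALSE censoring column, which avoids the single shared ceneq1 problem noted above.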