I have water chemistry data with censored values (i.e., those less than reporting levels) in a data frame with a narrow (i.e., database table) format. The structure is:

 $ site    : Factor w/ 64 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 1 1 ...
 $ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
 $ preeq0  : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ param   : Factor w/ 37 levels "Ag","Al","Alk_tot",..: 1 2 8 17 3 4 9 ...
 $ quant   : num 0.005 0.106 1 231 231 0.011 0.001 0.002 0.001 100 ...
 $ ceneq1  : logi TRUE FALSE TRUE FALSE FALSE FALSE ...
 $ floor   : num 0 0.106 0 231 231 0.011 0 0 0 100 ...
 $ ceiling : num 0.005 0.106 1 231 231 0.011 0.001 0.002 0.001 100 ...

The logical 'preeq0' separates sampdate into two groups; 'ceneq1' indicates censored/uncensored values; 'floor' and 'ceiling' are the minima and maxima for censored values.

The NADA package methods will be used, but I have not found information on whether this format or the wide (i.e., spreadsheet) format should be used. The NADA.pdf document doesn't tell me; at least, I haven't found the answer there. I can apply reshape2 to melt and re-cast the data in wide format if that's what is appropriate. Please provide a pointer to documents I can read for an answer to this and related questions.

Rich
Rich,

I am not familiar with the NADA package, but the reference manual (http://cran.r-project.org/web/packages/NADA/NADA.pdf) has many examples using several data sets included in the package. Look up one of the functions that you plan to use, run the example in R, and look at the data used in the example to see how it is organized.

Jean

Rich Shepard <rshepard@appl-ecosys.com> wrote on 07/03/2012 11:57:30 AM:
> The NADA package methods will be used, but I have not found information on
> whether this format or the wide (i.e., spreadsheet) format should be used.
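Jean's suggestion can be tried directly from the R console. The following is only a sketch and assumes the NADA package has been installed from CRAN; it lists the data sets bundled with the package and then runs the example code from the cenros help page. The variable name have_nada is made up for this illustration.

```r
# Sketch only: assumes the NADA package is installed from CRAN.
have_nada <- requireNamespace("NADA", quietly = TRUE)

if (have_nada) {
  library(NADA)

  # List the data sets that ship with the package (name and title).
  print(data(package = "NADA")$results[, c("Item", "Title")])

  # Run the example code from the cenros help page without prompting.
  example(cenros, package = "NADA", ask = FALSE)
} else {
  message("NADA is not installed; try install.packages(\"NADA\")")
}
```

After the example has run, calling str() on whichever object it used shows how the observations and the censoring indicator are laid out, which can then be compared with the str() listing in the original question.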
I haven't used NADA functions in quite a while, but from what I recall, you will likely be using the "narrow" format and subsetting as needed for the different analytes. As Jean suggested, the examples in the help pages for the NADA function(s) of interest should make it clear.

This example follows exactly the example in ?cenros:

    with(subset(yourdataframe, param == 'Ag'), cenros(quant, ceneq1))

This should do a simple censored summary statistics calculation for silver (assuming quant contains your reporting level for censored results, which appears to be the case).

I'd also suggest you try to load your data so that site and param are not factors, though this could depend on your ultimate analysis.

-Don

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
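To make Don's approach concrete, here is a minimal sketch on a toy narrow-format data frame. The data frame name chem and every value in it are invented for illustration; only the column names (site, param, quant, ceneq1) follow the original question. The cenros call is guarded so the rest of the code still runs when NADA is not installed.

```r
# Toy narrow-format data; all values are invented for illustration only.
chem <- data.frame(
  site   = c("D-1", "D-1", "D-2", "D-2", "D-3", "D-3", "D-1", "D-2", "D-3"),
  param  = c(rep("Ag", 6), rep("Al", 3)),
  # For censored rows, quant holds the reporting level (here 0.005).
  quant  = c(0.005, 0.012, 0.005, 0.020, 0.031, 0.008, 0.106, 0.250, 0.090),
  ceneq1 = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE  # keep site and param as character, not factor
)

# One analyte at a time: subset the narrow frame by param.
ag <- subset(chem, param == "Ag")

# Or split the whole frame into a list with one data frame per analyte.
by_param <- split(chem, chem$param)

# Fraction of censored results per analyte.
cen_frac <- sapply(by_param, function(d) mean(d$ceneq1))
print(cen_frac)

if (requireNamespace("NADA", quietly = TRUE)) {
  library(NADA)
  # Regression on order statistics for silver, as in Don's one-liner.
  ag_ros <- with(ag, cenros(quant, ceneq1))
  print(ag_ros)
}
```

With this layout no recasting to wide format is needed: each analyte is pulled out of the narrow frame as a pair of vectors (values and censoring flags). Whether site and param are better kept as character or as factor depends, as Don notes, on the later analysis.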