Greetings, I am new to R and am having trouble with parsing a file with the following characteristics: * Individual results for a single sample are written to multiple lines. * First 16 columns are constant from sample to sample. * Remaining 10 need to be matched up (cross-tabbed?) o (the exact contents for the remaining 10 vary from sample to sample, as indicated in the extract below) * Ultimate goal is to run various comparisons between the variable columns, compare samples from separate populations, and graph samples from the separate populations. * (An extract is provided below) The data is initially extracted from an SQL database into Excel, then saved as a tab-delimited text file for use in R. I have been successful in using subset() to extract specific sample types, but have not yet been able to transform the data so that all the data needed is on a single line. I have looked at several R manuals, read through 'R in a Nutshell', prowled the help resources (R Site Search and the Google link), tried stack(), subset(), reshape(), and several other functions, to no avail. Thank you very much for your help. This seems like a wonderful community, Guy Jett, R.G. Project Geologist gjett@itsi.com<mailto:gjett@itsi.com> Example Data Input (subset): fldsampid CLP_ID sacode matrix etc... prccode Lab EXMCODE Analysis PARLABEL PARVQ Result 2268 LHR020GW-01E2 N WG INO BRLS NONE E300 CL = 23590.9 2269 LHR020GW-01E2 N WG INO BRLS NONE E300 PO4 ND 50 2270 LHR020GW-01E2 N WG INO BRLS NONE E300 SO4 = 22460 2272 LHR020GW-01E2 N WG MET BRLS FLDFLT E1631 HG = 0.00171 2273 LHR020GW-01E2 N WG MET BRLS FLDFLT E1638 AG = 2.57 2274 LHR020GW-01E2 N WG MET BRLS FLDFLT E1638 AL = 122 2275 LHR020GW-01E2 N WG MET BRLS FLDFLT E1638 AS = 317 2276 LHR020GW-01E2 N WG MET BRLS FLDFLT E1638 B = 9970 2289 LHR020GW-01E2 N WG MET BRLS FLDFLT E1638 V = 131 2290 LHR020GW-01E2 N WG MET BRLS FLDFLT E1638 Zn = 1.76 2291 LHR020GW-01E2 N WG MET BRLS METHOD E1638 PB ND 0.008 2292 LHR020GW-01E2 N WG MI BRLS NONE A2320 ALK = 807000 2293 LHR020GW-01E2 N WG MI BRLS NONE A2320 ALKB = 807000 2294 LHR020GW-01E2 N WG MI BRLS NONE A2320 ALKC ND 2500 2295 LHR020GW-01E2 N WG ORG BRLS NONE A5310B DOC = 49330 2296 LHR020GW-01E2 N WG SN BRLS NONE E300 NO3 = 792 2326 LHR020SD-00E2 N SE MET BRLS METHOD E1630 MEHG = 4.28 2327 LHR020SD-00E2 N SE MI BRLS METHOD E160.3 SOLID = 48.45 2328 LHR020SD-00E2 N SE ORG BRLS NONE SW9060 TOC = 4.823 2329 LHR020SD-00E2 MY77J8 N SE MET A4SW METHOD C245.5 HG = 5100 2330 LHR020SD-00E2 MY77J8 N SE MET A4SW METHOD E200.8 AG ND 1050 2331 LHR020SD-00E2 MY77J8 N SE MET A4SW METHOD E200.8 AS = 5500 2332 LHR020SD-00E2 MY77J8 N SE MET A4SW METHOD E200.8 B = 11400 2346 LHR020SD-00E2 MY77J8 N SE MET A4SW SW3050B SW6010B V = 56900 2349 LHR020SD-00E2 MY77J8 N SE MI A4SW METHOD A2540G SOLID = 47.7 Desired output: fldsampid CLP_ID sacode matrix etc... CL PO4 SO4 AG AL AS B V Zn etc... ALK ALKB ALKC SOLID DOC TOC NO3 LHR020GW-01E2 N WG <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> <value> LHR020SD-00E2 MY77J8 N SE NA NA NA <value> <value> <value> <value> <value> NA <value> NA NA NA <value> NA NA NA [[alternative HTML version deleted]]
That sample data set is really hard to read. Could you resent it after having used dput on it? A file output with dput is easily read into R and makes seeing what you need much easier. BTW what are the = doing? Thanks --- On Mon, 1/10/11, Guy Jett <GJett at itsi.com> wrote:> From: Guy Jett <GJett at itsi.com> > Subject: [R] Help with Data Transformation > To: "r-help at r-project.org" <r-help at r-project.org> > Received: Monday, January 10, 2011, 3:59 PM > Greetings, > I am new to R and am having trouble with parsing a file > with the following characteristics: > > *? ? ? ???Individual results > for a single sample are written to multiple lines. > > *? ? ? ???First 16 columns > are constant from sample to sample. > > *? ? ? ???Remaining 10 need > to be matched up (cross-tabbed?) > > o???(the exact contents for the remaining 10 > vary from sample to sample, as indicated in the extract > below) > > *? ? ? ???Ultimate goal is to > run various comparisons between the variable columns, > compare samples from separate populations, and graph samples > from the separate populations. > > *? ? ? ???(An extract is > provided below) > > The data is initially extracted from an SQL database into > Excel, then saved as a tab-delimited text file for use in > R. > I have been successful in using subset() to extract > specific sample types, but have not yet been able to > transform the data so that all the data needed is on a > single line.? I have looked at several R manuals, read > through 'R in a Nutshell', prowled the help resources (R > Site Search and the Google link), tried stack(), subset(), > reshape(), and several other functions, to no avail. > > Thank you very much for your help.? This seems like a > wonderful community, > Guy Jett, R.G. > Project Geologist > gjett at itsi.com<mailto:gjett at itsi.com> > > Example Data Input (subset): > ? ? ? ? ? ? ? ? > fldsampid? ? ? ? ? ? > CLP_ID? sacode? > matrix???etc...? ? ? > prccode? ? ? ? ? ? ? > ? Lab? ? ? > ???EXMCODE? ? ? ? > ???Analysis? ? ? ? ? > ? ? ? PARLABEL? ? ? ? > ? ? ? ? PARVQ Result > 2268? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? INO? ? ? ? > BRLS? ? ? NONE? ? E300? ? > ???CL? ? ? ? > ???=? ? ? ? ? > ???23590.9 > 2269? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? INO? ? ? ? > BRLS? ? ? NONE? ? E300? ? > ???PO4? ? ? ? ND? > ? ? ? ? 50 > 2270? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? INO? ? ? ? > BRLS? ? ? NONE? ? E300? ? > ???SO4? ? ? ? =? > ? ? ? ? ???22460 > 2272? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? MET? ? > ???BRLS? ? ? FLDFLT? > E1631? ? HG? ? ? ? ? > =? ? ? ? ? > ???0.00171 > 2273? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? MET? ? > ???BRLS? ? ? FLDFLT? > E1638? ? AG? ? ? ? ? > =? ? ? ? ? ???2.57 > 2274? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? MET? ? > ???BRLS? ? ? FLDFLT? > E1638? ? AL? ? ? ? > ???=? ? ? ? ? > ???122 > 2275? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? MET? ? > ???BRLS? ? ? FLDFLT? > E1638? ? AS? ? ? ? > ???=? ? ? ? ? > ???317 > 2276? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? MET? ? > ???BRLS? ? ? FLDFLT? > E1638? ? B? ? ? ? ? > ???=? ? ? ? ? > ???9970 > 2289? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? MET? ? > ???BRLS? ? ? FLDFLT? > E1638? ? V? ? ? ? ? > ???=? ? ? ? ? > ???131 > 2290? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? MET? ? > ???BRLS? ? ? FLDFLT? > E1638? ? Zn? ? ? ? > ???=? ? ? ? ? > ???1.76 > 2291? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? MET? ? > ???BRLS? ? ? METHOD? > ? ? ? ? ? ? E1638? ? > PB? ? ? ? ???ND? > ? ? ? ? ? ? ? 0.008 > 2292? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? MI? ? ? ? ? > BRLS? ? ? NONE? ? A2320? > ? ALK? ? ? ? =? ? ? > ? ? ???807000 > 2293? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? MI? ? ? ? ? > BRLS? ? ? NONE? ? A2320? > ? ALKB? ? ? =? ? ? ? > ? ???807000 > 2294? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? MI? ? ? ? ? > BRLS? ? ? NONE? ? A2320? > ? ALKC? ? ? ND? ? ? > ? ? 2500 > 2295? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? ORG? ? > ???BRLS? ? ? NONE? ? > A5310B DOC? ? ???=? ? > ? ? ? ???49330 > 2296? ? ???LHR020GW-01E2? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ???N? > ? ? ? ? ???WG? > ? ? ? ? ? ? ? ? > ? ? ? SN? ? ? ? ? > BRLS? ? ? NONE? ? E300? ? > ???NO3? ? ???=? > ? ? ? ? ???792 > 2326? ? ???LHR020SD-00E2? > ? ? ? ? ? ? > ???N? ? ? ? ? > ???SE? ? ? ? ? > ? ? ? ? ? ? ? > ???MET? ? > ???BRLS? ? ? METHOD? > ? ? ? ? ? ? E1630? ? > MEHG???=? ? ? ? ? > ???4.28 > 2327? ? ???LHR020SD-00E2? > ? ? ? ? ? ? > ???N? ? ? ? ? > ???SE? ? ? ? ? > ? ? ? ? ? ? ? > ???MI? ? ? ? ? > BRLS? ? ? METHOD? ? ? ? > ? ? ? E160.3???SOLID? > ? =? ? ? ? ? > ???48.45 > 2328? ? ???LHR020SD-00E2? > ? ? ? ? ? ? > ???N? ? ? ? ? > ???SE? ? ? ? ? > ? ? ? ? ? ? ? > ???ORG? ? > ???BRLS? ? ? NONE? ? > SW9060? ? ? ? ? ? ? > ? TOC? ? ? ? =? ? ? > ? ? ???4.823 > 2329? ? ???LHR020SD-00E2 MY77J8 > N? ? ? ? ? > ???SE? ? ? ? ? > ? ? ? ? ? ? ? > ???MET? ? > ???A4SW? ? METHOD? ? > ? ? ? ? ? > C245.5???HG? ? ? ? ? > =? ? ? ? ? ???5100 > 2330? ? ???LHR020SD-00E2 MY77J8 > N? ? ? ? ? > ???SE? ? ? ? ? > ? ? ? ? ? ? ? > ???MET? ? > ???A4SW? ? METHOD? ? > ? ? ? ? ? > E200.8???AG? ? ? ? ? > ND? ? ? ? ? 1050 > 2331? ? ???LHR020SD-00E2 MY77J8 > N? ? ? ? ? > ???SE? ? ? ? ? > ? ? ? ? ? ? ? > ???MET? ? > ???A4SW? ? METHOD? ? > ? ? ? ? ? > E200.8???AS? ? ? ? > ???=? ? ? ? ? > ???5500 > 2332? ? ???LHR020SD-00E2 MY77J8 > N? ? ? ? ? > ???SE? ? ? ? ? > ? ? ? ? ? ? ? > ???MET? ? > ???A4SW? ? METHOD? ? > ? ? ? ? ? > E200.8???B? ? ? ? ? > ???=? ? ? ? ? > ???11400 > 2346? ? ???LHR020SD-00E2 MY77J8 > N? ? ? ? ? > ???SE? ? ? ? ? > ? ? ? ? ? ? ? > ???MET? ? > ???A4SW? ? SW3050B? ? > ? ? ? ???SW6010B? ? > ? ? ? ???V? ? ? > ? ? ???=? ? ? ? > ? ? ? ? 56900 > 2349? ? ???LHR020SD-00E2 MY77J8 > N? ? ? ? ? > ???SE? ? ? ? ? > ? ? ? ? ? ? ? > ???MI? ? ? ? ? > A4SW? ? METHOD? ? ? ? ? > ? ? A2540G? ? ? ? ? > ? ? ? SOLID? ? =? ? > ? ? ? ???47.7 > > Desired output: > ? ? ? ? ? ? ? ? > fldsampid? ? ? ? ? ? > CLP_ID? sacode? > matrix???etc...? ? ? CL? > ? ? ? ???PO4? ? > ? ? SO4? ? ? ? AG? ? > ? ? ? AL? ? ? ? > ???AS? ? ? ? > ???B? ? ? ? ? > ???V? ? ? ? ? > ???Zn? ? ? ? ? > ? ? ? etc...? ? ? ALK? > ? ? ? ALKB? ? ? ALKC? > ? ? SOLID? ? DOC? ? > ???TOC? ? ? ? NO3 > ? ? ? ? ? ? ? ? > LHR020GW-01E2? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ???N? ? ? ? ? > ???WG? ? ? ? ? > ? ? ? ? ? ? ? > <value>? ? ? ? ? ? > ? ? <value>? ? ? ? > ? ? ? ? <value>? ? > ? ? ? ? ? ? > <value>? ? ? ? ? ? > ? ? <value>? ? ? ? > ? ? ? ? <value>? ? > ? ? ? ? ? ? > <value>? ? ? ? ? ? > ? ? <value>? ? ? ? > ? ? ? ? <value>? ? > ? ? ? ? ? ? > <value>? ? ? ? ? ? > ? ? <value>? ? ? ? > ? ? ? ? <value>? ? > ? ? ? ? ? ? > <value>? ? ? ? ? ? > ? ? <value>? ? ? ? > ? ? ? ? <value>? ? > ? ? ? ? ? ? > <value>? ? ? ? ? ? > ? ? <value> > ? ? ? ? ? ? ? ? > LHR020SD-00E2 MY77J8 N? ? ? ? ? > ???SE? ? ? ? ? > ? ? ? ? ? ? ? > ???NA? ? ? ? ? > NA? ? ? ? ? NA? ? ? > ? ? <value>? ? ? ? > ? ? ? ? <value>? ? > ? ? ? ? ? ? > <value>? ? ? ? ? ? > ? ? <value>? ? ? ? > ? ? ? ? <value>? ? > ? ? ? ? ? ? NA? ? > ? ? ? <value>? ? ? > ? ? ? ? ? NA? ? ? > ? ? NA? ? ? ? ? NA? > ? ? ? ? <value>? ? > ? ? ? ? ? ? NA? ? > ? ? ? NA? ? ? ? ? > NA > > ??? [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org > mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, > reproducible code. >
Dear Guy, Have a look at cast() from the reshape package. You'll need something like cast(fldsampleid ~ Analysis, value = "Result", data = your.data.frame) Best regards, Thierry ------------------------------------------------------------------------ ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek team Biometrie & Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium Research Institute for Nature and Forest team Biometrics & Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey> -----Oorspronkelijk bericht----- > Van: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] Namens Guy Jett > Verzonden: maandag 10 januari 2011 22:00 > Aan: r-help at r-project.org > Onderwerp: [R] Help with Data Transformation > > Greetings, > I am new to R and am having trouble with parsing a file with > the following characteristics: > > * Individual results for a single sample are written > to multiple lines. > > * First 16 columns are constant from sample to sample. > > * Remaining 10 need to be matched up (cross-tabbed?) > > o (the exact contents for the remaining 10 vary from sample > to sample, as indicated in the extract below) > > * Ultimate goal is to run various comparisons between > the variable columns, compare samples from separate > populations, and graph samples from the separate populations. > > * (An extract is provided below) > > The data is initially extracted from an SQL database into > Excel, then saved as a tab-delimited text file for use in R. > I have been successful in using subset() to extract specific > sample types, but have not yet been able to transform the > data so that all the data needed is on a single line. I have > looked at several R manuals, read through 'R in a Nutshell', > prowled the help resources (R Site Search and the Google > link), tried stack(), subset(), reshape(), and several other > functions, to no avail. > > Thank you very much for your help. This seems like a > wonderful community, > Guy Jett, R.G. > Project Geologist > gjett at itsi.com<mailto:gjett at itsi.com> > > Example Data Input (subset): > fldsampid CLP_ID sacode matrix > etc... prccode Lab EXMCODE > Analysis PARLABEL PARVQ Result > 2268 LHR020GW-01E2 N > WG INO BRLS NONE > E300 CL = 23590.9 > 2269 LHR020GW-01E2 N > WG INO BRLS NONE > E300 PO4 ND 50 > 2270 LHR020GW-01E2 N > WG INO BRLS NONE > E300 SO4 = 22460 > 2272 LHR020GW-01E2 N > WG MET BRLS FLDFLT > E1631 HG = 0.00171 > 2273 LHR020GW-01E2 N > WG MET BRLS FLDFLT > E1638 AG = 2.57 > 2274 LHR020GW-01E2 N > WG MET BRLS FLDFLT > E1638 AL = 122 > 2275 LHR020GW-01E2 N > WG MET BRLS FLDFLT > E1638 AS = 317 > 2276 LHR020GW-01E2 N > WG MET BRLS FLDFLT > E1638 B = 9970 > 2289 LHR020GW-01E2 N > WG MET BRLS FLDFLT > E1638 V = 131 > 2290 LHR020GW-01E2 N > WG MET BRLS FLDFLT > E1638 Zn = 1.76 > 2291 LHR020GW-01E2 N > WG MET BRLS METHOD > E1638 PB ND 0.008 > 2292 LHR020GW-01E2 N > WG MI BRLS NONE > A2320 ALK = 807000 > 2293 LHR020GW-01E2 N > WG MI BRLS NONE > A2320 ALKB = 807000 > 2294 LHR020GW-01E2 N > WG MI BRLS NONE > A2320 ALKC ND 2500 > 2295 LHR020GW-01E2 N > WG ORG BRLS NONE > A5310B DOC = 49330 > 2296 LHR020GW-01E2 N > WG SN BRLS NONE > E300 NO3 = 792 > 2326 LHR020SD-00E2 N SE > MET BRLS METHOD > E1630 MEHG = 4.28 > 2327 LHR020SD-00E2 N SE > MI BRLS METHOD > E160.3 SOLID = 48.45 > 2328 LHR020SD-00E2 N SE > ORG BRLS NONE SW9060 > TOC = 4.823 > 2329 LHR020SD-00E2 MY77J8 N SE > MET A4SW METHOD C245.5 > HG = 5100 > 2330 LHR020SD-00E2 MY77J8 N SE > MET A4SW METHOD E200.8 > AG ND 1050 > 2331 LHR020SD-00E2 MY77J8 N SE > MET A4SW METHOD E200.8 > AS = 5500 > 2332 LHR020SD-00E2 MY77J8 N SE > MET A4SW METHOD E200.8 B > = 11400 > 2346 LHR020SD-00E2 MY77J8 N SE > MET A4SW SW3050B SW6010B > V = 56900 > 2349 LHR020SD-00E2 MY77J8 N SE > MI A4SW METHOD A2540G > SOLID = 47.7 > > Desired output: > fldsampid CLP_ID sacode matrix > etc... CL PO4 SO4 AG AL > AS B V Zn > etc... ALK ALKB ALKC SOLID > DOC TOC NO3 > LHR020GW-01E2 N > WG <value> > <value> <value> <value> > <value> <value> <value> > <value> <value> > <value> <value> <value> > <value> <value> <value> > <value> <value> > LHR020SD-00E2 MY77J8 N SE > NA NA NA <value> > <value> <value> > <value> <value> NA > <value> NA NA NA > <value> NA NA NA > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
> The data is initially extracted from an SQL database into Excel, then saved as a tab-delimited text file for use in R.You might also want to look at the SQL packages for R so you can skip this manual step. I'd recommend starting with http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/