thr3ads.net - R help - [R] Help with Data Transformation

If this information is useful, please help other people find it:
Share via:
Guy Jett
2011-Jan-13 00:37 UTC
[R] Help with Data Transformation - RESOLVED

Hi Dennis,
SOLVED!!!

My thanks to both you, John, and others who chimed in.  Took a little more
digging before finally working, but it is working!

Here's a little more of what I did and the ultimate resolutions:
I installed the reshape2 library from CRAN.
Executed> (.packages())[1] "reshape2"  "stats"     "graphics" 
"grDevices" "datasets"  "rcom"     
"rscproxy"  "utils"     "methods"  
"base"
confirming presence.
Executed your toy script (stepwise so as not to include the output as input)
Received the following result at the end:
            > cast(df, idnum ~ lab, value = 'y')
Error: could not find function "cast"

The Package 'reshape2' documentation indicates that either acast() or
dcast() are used, not giving an example of cast() itself.
I resolved this by trying both acast() and dcast().  Both yielded the same
screen results.

Turning to my own data I was able to successfully execute the operation on one
of my subsets AFTER, cleaning up some results with duplicates
"PARLABELS".   (throws the resulting matrix into a count of values but
the docs give an indication of the resolution).  So it looks like I'm on my
way after some further data clean-up.

I am using dcast() as I wish to stay 2-dimensional (personal limitations on
brain()).

Cheers and thanks again to all,
Guy
ITSI,  A Gilbane Company
(925) 946-3340 Direct
(925) 457-4168 ITSI Cell
gjett@itsi.com<mailto:gjett@itsi.com>

From: Dennis Murphy [mailto:djmuser@gmail.com]
Sent: Wednesday, January 12, 2011 1:19 PM
To: Guy Jett
Cc: r-help@r-project.org
Subject: Re: [R] Help with Data Transformation

Hi:

This seems like a problem that is well suited for the cast() function in package
reshape2. Here's a toy example:

library(reshape2)
df <- data.frame(idnum = rep(101:104, c(3, 2, 4, 3)),
                  lab = unlist(sapply(c(3, 2, 4, 3), function(x)
sample(LETTERS[1:6], x))),
                  y = rpois(12, 7))
df
   idnum lab  y
1    101   C  7
2    101   F 11
3    101   E  4
4    102   B 10
5    102   A  7
6    103   E  6
7    103   D  9
8    103   B  3
9    103   F 12
10   104   C 10
11   104   D  9
12   104   A  5
cast(df, idnum ~ lab, value = 'y')
  idnum  A  B  C  D  E  F
1   101 NA NA  7 NA  4 11
2   102  7 10 NA NA NA NA
3   103 NA  3 NA  9  6 12
4   104  5 NA 10  9 NA NA

Since you have multiple 'invariant' variables, you could use something
like

cast(df, . ~ lab, value = 'y')

This would make all variables other than lab the 'rows', the values of
lab as separate 'columns', with the value of the variable y inserted in
the appropriate locations, NA fill otherwise.

cast() is an alternative to the reshape() function in base R.

HTH,
Dennis
On Wed, Jan 12, 2011 at 12:19 PM, Guy Jett
<GJett@itsi.com<mailto:GJett@itsi.com>> wrote:
Hi John,
Thank you for your patience.  I was away for a State certification exam
yesterday, so am just getting back to this.

Reading through you response I believe I wasn't clear enough about what
I'm trying to do.  Your description seems to rearrange the matrix without
grouping the analytical results for a single sample onto a single line, as I had
hoped.  I may have confused things by attempting to send a truncated/simplified
dataset.

Restatement of needs:
* I have 863 individual samples.  The following columns contain invariant
results for each sample:
   -
"Transect","Offset","Location","fldsampid","CLP_ID","sacode","matrix","LTCCODE",
      
"Northing","Easting","CRDUNITS","Event","LOGDATE","sbd","sed".
   - Sorting can make use of "fldsampid" as these values are entirely
unique to
     each sample.
* Each sample is associates with one or more of the following 48 analytical
parameters:
   -
AG","AL","ALK","ALKB","ALKC","AS","B","BA","BE","BR","CA","CD""CL","CO","CR","CU",
    
"DOC","FE","Hg","HG","HGACIDLAB","HGEXTINO","HGEXTORG","HGNONMOB""HGSEMIMOB","K",
    
"MEHG","MG","MN","MO","NH3N","NI","NO2N","NO3","NO3N","PB""PO4","S","SB","SE","SO4",
      
"SOLID","SSC","TL","TOC","V","Zn","ZN"
   - These are currently stored in the "PARLABEL" column.
* For each sample ID I would like to create a single line;
 - Extract each "PARLABEL" to use as a column name; and
 - Place the "Result" in the appropriate column.
* I can subset the data so that "prccode", "Lab",
"EXMCODE", "Analysis", "PARVQ", "RL",
 "EPA_FLAGS", and "units" are irrelevant to the issue.

The following snippet should illustrate the absolute minumum needs:
INPUT
fldsampid   |  PARLABEL    |  Result
------------+--------------+---------
fldsampid1  |  PARLABEL-a  |  value-8
fldsampid1  |  PARLABEL-b  |  value-5
fldsampid1  |  PARLABEL-x  |  value-2
fldsampid1  |  PARLABEL-y  |  value-0
fldsampid2  |  PARLABEL-a  |  value-9
fldsampid2  |  PARLABEL-c  |  value-8
fldsampid3  |  PARLABEL-a  |  value-2
fldsampid3  |  PARLABEL-d  |  value-8
fldsampid3  |  PARLABEL-w  |  value-3
fldsampid3  |  PARLABEL-x  |  value-9
fldsampid3  |  PARLABEL-y  |  value-6

OUTPUT
fldsampid   |  PARLABEL-a  |  PARLABEL-b  |  PARLABEL-w  |  PARLABEL-x  | 
PARLABEL-y
------------+--------------+--------------+--------------+--------------+--------------
fldsampid1  |   value-8    |   value-5    |   NA         |   value-2    |  
value-0
fldsampid2  |   value-9    |   value-2    |   NA         |   NA         |   NA
fldsampid3  |   value-2    |   NA         |   value-3    |   value-9    | 
value-6

If it would help I could attach a 31kb file written with
       write.table(Units_NG.L, file="Units_NG.L", quote=FALSE,
sep="\t")
This subset has 97 individual samples and 3 "PARLABELS" distributed
across 249 individual lines.


Added Responses:
1. The structure of my actual input file appears to be correct per the
following:
  (I has sent you a separate extration from an excel file)
  (Strings as Factors, numbers as num or int; a date changed via as.Date())
'data.frame':   19694 obs. of  25 variables:
 $ Transect : Factor w/ 78 levels "FLR01","FLR02",..: 1 1 1
1 1 1 1 1 1 1 ...
 $ Offset   : Factor w/ 16 levels
"0","A","B","C",..: 1 1 1 1 1 1 1 1 1 1
...
 $ Location : Factor w/ 246 levels "FLR010","FLR01A",..: 1 1
1 1 1 1 1 1 1 1 ...
 $ fldsampid: Factor w/ 863 levels "FLR010-ANE1",..: 1 1 1 1 1 1 1 1 1
1 ...
 $ CLP_ID   : Factor w/ 586 levels
"","MY6591","MY6593",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ sacode   : Factor w/ 2 levels "FD","N": 2 2 2 2 2 2 2 2 2
2 ...
 $ matrix   : Factor w/ 6 levels
"SE","SO","TA",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ LTCCODE  : Factor w/ 4 levels
"BH","LK","RE",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ Northing : num  2444733 2444733 2444733 2444733 2444733 ...
 $ Easting  : num  5684613 5684613 5684613 5684613 5684613 ...
 $ CRDUNITS : Factor w/ 1 level "FT": 1 1 1 1 1 1 1 1 1 1 ...
 $ Event    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ LOGDATE  :Class 'Date'  num [1:19694] -717743 -717743 -717743 -717743
-717743 ...
 $ sbd      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ sed      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ prccode  : Factor w/ 5 levels
"INO","MET","MI",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Lab      : Factor w/ 5 levels "A4SW","BRLS",..: 2 2 2 2 2
2 2 2 2 2 ...
 $ EXMCODE  : Factor w/ 5 levels "FLDFLT","METHOD",..: 2 2 2
2 2 2 2 2 2 2 ...
 $ Analysis : Factor w/ 23 levels "A2320","A2540G",..: 10 11
12 12 12 12 12 12 12 12 ...
 $ PARLABEL : Factor w/ 48 levels
"AG","AL","ALK",..: 27 20 1 2 6 7 8 9 12 14 ...
 $ PARVQ    : Factor w/ 3 levels "=","ND","TR": 1
1 2 1 1 1 1 2 1 1 ...
 $ Result   : num  20.6 24.7 5 14900 60 100 4930 4 182 80 ...
 $ RL       : num  3.1 0.77 10 5750 160 790 160 8 10 80 ...
 $ EPA_FLAGS: Factor w/ 10 levels
"","J","J-","J+",..: 4 1 7 3 2 2 1 7 1 2
...
 $ units    : Factor w/ 3 levels "ug/kg","ug/L",..: 1 1 1 1
1 1 1 1 1 1 ...

2. "etc..." Sorry to confuse you, this was to indicate additional
columns.

Guy Jett
ITSI,  A Gilbane Company
(925) 946-3340 Direct
(925) 457-4168 ITSI Cell
gjett@itsi.com<mailto:gjett@itsi.com>

-----Original Message-----
From: John Kane [mailto:jrkrideau@yahoo.ca<mailto:jrkrideau@yahoo.ca>]
Sent: Monday, January 10, 2011 4:29 PM
To: r-help@r-project.org<mailto:r-help@r-project.org>; Guy Jett
Subject: RE: [R] Help with Data Transformation

That's fine.  Am I correct that this is the format you want for the output
file?

nams <- c("fldsampid", "CLP_ID", "sacode",
"matrix", "etc.", "CL", "PO4",
"SO4AG", "AL", "AS", "B", "V",
"Zn", "etc.", "ALK", "ALKB",
"ALKC", "SOLID", "DOC", "TOC",
"NO3")


It seems a bit suspicious as you have etc. in there twice.


In any case if I understand what you want all you need to do is arrange those
names in the order you want and put them in a vector call it bb.

Then you simply say

newxx <- xx[,bb]
et voilàI

You may want to have a look at indexing in the Introduction to R to get a
feeling for what's happening herel

Below is a small example.

However I don't think your data.frame is quite what you think it is.

When I do str(xx) to look at the structure all your variables are being read in
as factors, which I suspect is not what you want.  R tries to recognize what
type of variable is being read in and often seems to decide a character or even
a numeric variable is a factor

You may want to run the command
options(stringsAsFactors = FALSE)
before you load the data into the data.frame

I hope this is of some help.
#==================================================================df1 <-
structure(list(site = c(1, 1, 4, 4, 1, 4), id = structure(c(1L, 2L, 2L, 3L, 1L,
2L), .Label = c("a", "b", "c"), class =
"factor"),
   cata = c(1, 1, 6, 1, 1, NA), catb = c(1, 2, 3, 4, 5, 6),
   doga = c(3, 5, 3, 6, 4, 0), dogb = c(2, 4, 6, 8, 10, 12),
   rata = c(NA, 9, 9, 8, 9, 8), ratb = c(1, 2, 3, 4, 5, 6),
   bata = c(12, 42, NA, 45, 32, 54), batb = c(13, 15, 17, 19,
   21, 23)), .Names = c("site", "id", "cata",
"catb", "doga", "dogb", "rata",
"ratb", "bata", "batb"), row.names =
c("aa", "bb", "cc", "dd",
"ee", "ff"), class = "data.frame")

df1

bb <- c("dogb", "rata", "ratb",
"bata", "batb", "site", "id",
"cata",
   "catb", "doga")

   newdf <- df1[,bb]


#=================================================================
--- On Mon, 1/10/11, Guy Jett
<GJett@itsi.com<mailto:GJett@itsi.com>> wrote:
> From: Guy Jett <GJett@itsi.com<mailto:GJett@itsi.com>>
> Subject: RE: [R] Help with Data Transformation
> To: "John Kane"
<jrkrideau@yahoo.ca<mailto:jrkrideau@yahoo.ca>>,
"r-help@r-project.org<mailto:r-help@r-project.org>"
> <r-help@r-project.org<mailto:r-help@r-project.org>>
> Received: Monday, January 10, 2011, 5:20 PM Thank you John, I have put
> a copy of the dput file below my signature block.  I hope that is what
> you need as I am unfamiliar with that function.  Note that empty cells
> need to become "NA".
> The "=" character is part of the "PARVALUE" column.
> Yours,
> Guy
> gjett@itsi.com<mailto:gjett@itsi.com>
>
> structure(list(X = c(2268L, 2269L, 2270L, 2272L, 2273L, 2274L, 2275L,
> 2276L, 2289L, 2290L, 2291L, 2292L, 2293L, 2294L, 2295L, 2296L, 2326L,
> 2327L, 2328L, 2329L, 2330L, 2331L, 2332L, 2346L, 2349L, NA, NA, NA,
> NA, NA), fldsampid = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
> 1L, 2L, 3L, 4L), .Label = c("", "fldsampid",
"LHR020GW-01E2",
> "LHR020SD-00E2"), class = "factor"), CLP_ID =
structure(c(1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L,
> 3L, 3L, 3L, 3L, 3L, 1L, 1L, 2L, 1L, 3L), .Label = c("",
"CLP_ID",
> "MY77J8"), class = "factor"), sacode = structure(c(2L,
2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 1L, 1L, 3L, 2L, 2L), .Label = c("", "N",
"sacode"), class
> = "factor"), matrix = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L,
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,
> 1L, 2L, 4L, 3L), .Label = c("", "matrix",
"SE", "WG"), class > "factor"), etc. =
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 2L, 1L, 1L), .Label = c("", "etc."), class =
"factor"), prccode > structure(c(4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 6L, 6L, 6L,
> 7L, 8L, 5L, 6L, 7L, 5L, 5L, 5L, 5L, 5L, 6L, 1L, 1L, 3L, 2L, NA),
> .Label = c("", "<value>", "CL",
"INO", "MET", "MI", "ORG",
"SN"),
> class = "factor"), Lab = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L,
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L,
> 1L, 1L, 5L, 2L, NA), .Label = c("", "<value>",
"A4SW", "BRLS", "PO4"),
> class = "factor"), EXMCODE = structure(c(5L, 5L, 5L, 3L, 3L, 3L,
3L,
> 3L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 5L, 4L, 4L, 4L, 4L, 7L,
> 4L, 1L, 1L, 6L, 2L, NA), .Label = c("",
"<value>", "FLDFLT", "METHOD",
> "NONE", "SO4", "SW3050B"), class =
"factor"),
>     Analysis = structure(c(13L, 13L, 13L, 10L, 11L, 11L, 11L,
>     11L, 11L, 11L, 11L, 3L, 3L, 3L, 5L, 13L, 9L, 8L, 15L, 7L,
>     12L, 12L, 12L, 14L, 4L, 1L, 1L, 6L, 2L, 2L), .Label = c("",
>     "<value>", "A2320", "A2540G",
"A5310B", "AG", "C245.5", "E160.3",
>     "E1630", "E1631", "E1638",
"E200.8", "E300", "SW6010B", "SW9060"
>     ), class = "factor"), PARLABEL > structure(c(10L, 16L,
17L,
>     12L, 3L, 4L, 8L, 9L, 20L, 21L, 15L, 5L, 6L, 7L, 11L, 14L,
>     13L, 18L, 19L, 12L, 3L, 8L, 9L, 20L, 18L, 1L, 1L, 4L, 2L,
>     2L), .Label = c("", "<value>",
"AG", "AL", "ALK", "ALKB",
>     "ALKC", "AS", "B", "CL",
"DOC", "HG", "MEHG", "NO3",
"PB",
>     "PO4", "SO4", "SOLID", "TOC",
"V", "Zn"), class = "factor"),
>     PARVQ = structure(c(3L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>     5L, 3L, 3L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 3L, 3L, 3L, 3L,
>     1L, 1L, 4L, 2L, 2L), .Label = c("",
"<value>", "=", "AS",
>     "ND"), class = "factor"), Result = structure(c(12L,
20L,
>     11L, 3L, 10L, 8L, 14L, 26L, 9L, 5L, 4L, 25L, 25L, 13L, 19L,
>     24L, 15L, 18L, 16L, 21L, 6L, 22L, 7L, 23L, 17L, 1L, 1L, 27L,
>     2L, 2L), .Label = c("", "<value>",
"0.00171", "0.008", "1.76",
>     "1050", "11400", "122", "131",
"2.57", "22460", "23590.9",
>     "2500", "317", "4.28", "4.823",
"47.7", "48.45", "49330",
>     "50", "5100", "5500", "56900",
"792", "807000", "9970", "B"
>     ), class = "factor"), X.1 = structure(c(1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 2L), .Label = c("",
"<value>",
>     "V"), class = "factor"), X.2 > structure(c(1L,
1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, NA), .Label = c("",
>     "<value>", "Zn"), class =
"factor"),
> X.3 = structure(c(1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 2L), .Label >
c("",
>     "<value>", "etc."), class =
"factor"),
> X.4 = structure(c(1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, NA), .Label >
c("",
>     "<value>", "ALK"), class =
"factor"),
> X.5 = structure(c(1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, NA), .Label >
c("",
>     "<value>", "ALKB"), class =
"factor"),
> X.6 = structure(c(1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, NA), .Label >
c("",
>     "<value>", "ALKC"), class =
"factor"),
> X.7 = structure(c(1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 2L), .Label >
c("",
>     "<value>", "SOLID"), class =
"factor"),
> X.8 = structure(c(1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, NA), .Label >
c("",
>     "<value>", "DOC"), class =
"factor"),
> X.9 = structure(c(1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, NA), .Label >
c("",
>     "<value>", "TOC"), class =
"factor"),
> X.10 = structure(c(1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, NA), .Label >
c("",
>     "<value>", "NO3"), class =
"factor")), .Names = c("X",
> "fldsampid", "CLP_ID", "sacode",
"matrix", "etc.", "prccode", "Lab",
> "EXMCODE", "Analysis", "PARLABEL",
"PARVQ", "Result", "X.1", "X.2",
> "X.3", "X.4", "X.5", "X.6",
"X.7", "X.8", "X.9", "X.10"), class >
"data.frame", row.names = c(NA,
> -30L))
>
>
> -----Original Message-----
> From: John Kane
[mailto:jrkrideau@yahoo.ca<mailto:jrkrideau@yahoo.ca>]
>
> Sent: Monday, January 10, 2011 1:43 PM
> To: r-help@r-project.org<mailto:r-help@r-project.org>;
> Guy Jett
> Subject: Re: [R] Help with Data Transformation
>
> That sample data set is really hard to read.  Could you resent it
> after having used dput on it?
>
> A file output with dput is easily read into R and makes seeing what
> you need much easier.  BTW what are the = doing?
>
> Thanks
>
>
>
> --- On Mon, 1/10/11, Guy Jett
<GJett@itsi.com<mailto:GJett@itsi.com>>
> wrote:
>
> > From: Guy Jett <GJett@itsi.com<mailto:GJett@itsi.com>>
> > Subject: [R] Help with Data Transformation
> > To:
"r-help@r-project.org<mailto:r-help@r-project.org>"
> <r-help@r-project.org<mailto:r-help@r-project.org>>
> > Received: Monday, January 10, 2011, 3:59 PM Greetings,
> I am new to R
> > and am having trouble with parsing a file with the
> following
> > characteristics:
> >
> > *         Individual results
> > for a single sample are written to multiple lines.
> >
> > *         First 16 columns
> > are constant from sample to sample.
> >
> > *         Remaining 10 need
> > to be matched up (cross-tabbed?)
> >
> > o   (the exact contents for the remaining 10 vary
> from sample to
> > sample, as indicated in the extract
> > below)
> >
> > *         Ultimate goal is to
> > run various comparisons between the variable columns,
> compare samples
> > from separate populations, and graph samples from the
> separate
> > populations.
> >
> > *         (An extract is
> > provided below)
> >
> > The data is initially extracted from an SQL database
> into Excel, then
> > saved as a tab-delimited text file for use in R.
> > I have been successful in using subset() to extract
> specific sample
> > types, but have not yet been able to transform the
> data so that all
> > the data needed is on a single line.  I have looked
> at several R
> > manuals, read through 'R in a Nutshell', prowled the
> help resources (R
> > Site Search and the Google link), tried stack(),
> subset(), reshape(),
> > and several other functions, to no avail.
> >
> > Thank you very much for your help.  This seems like a
> wonderful
> > community, Guy Jett, R.G.
> > Project Geologist
> >
gjett@itsi.com<mailto:gjett@itsi.com><mailto:gjett@itsi.com<mailto:gjett@itsi.com>>
> >
> > Example Data Input (subset):
> >
> > fldsampid
> > CLP_ID  sacode
> > matrix   etc...
> > prccode
> >   Lab
> >    EXMCODE
> >    Analysis
> >       PARLABEL
> >         PARVQ Result
> > 2268       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       INO
> > BRLS      NONE    E300
> >    CL
> >    > >    23590.9
> > 2269       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       INO
> > BRLS      NONE    E300
> >    PO4        ND
> >         50
> > 2270       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       INO
> > BRLS      NONE    E300
> >    SO4        > >            22460
> > 2272       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       MET
> >    BRLS      FLDFLT
> > E1631    HG
> > > >    0.00171
> > 2273       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       MET
> >    BRLS      FLDFLT
> > E1638    AG
> > =             2.57
> > 2274       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       MET
> >    BRLS      FLDFLT
> > E1638    AL
> >    > >    122
> > 2275       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       MET
> >    BRLS      FLDFLT
> > E1638    AS
> >    > >    317
> > 2276       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       MET
> >    BRLS      FLDFLT
> > E1638    B
> >    > >    9970
> > 2289       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       MET
> >    BRLS      FLDFLT
> > E1638    V
> >    > >    131
> > 2290       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       MET
> >    BRLS      FLDFLT
> > E1638    Zn
> >    > >    1.76
> > 2291       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       MET
> >    BRLS      METHOD
> >             E1638
> > PB           ND
> >               0.008
> > 2292       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       MI
> > BRLS      NONE    A2320
> >   ALK        > >        807000
> > 2293       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       MI
> > BRLS      NONE    A2320
> >   ALKB      > >      807000
> > 2294       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       MI
> > BRLS      NONE    A2320
> >   ALKC      ND
> >     2500
> > 2295       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       ORG
> >    BRLS      NONE
> > A5310B DOC       > >          49330
> > 2296       LHR020GW-01E2
> >
> >              N
> >            WG
> >
> >       SN
> > BRLS      NONE    E300
> >    NO3       > >            792
> > 2326       LHR020SD-00E2
> >
> >    N
> >    SE
> >
> >    MET
> >    BRLS      METHOD
> >             E1630
> > MEHG   > >    4.28
> > 2327       LHR020SD-00E2
> >
> >    N
> >    SE
> >
> >    MI
> > BRLS      METHOD
> >       E160.3   SOLID
> >   > >    48.45
> > 2328       LHR020SD-00E2
> >
> >    N
> >    SE
> >
> >    ORG
> >    BRLS      NONE
> > SW9060
> >   TOC        > >        4.823
> > 2329       LHR020SD-00E2 MY77J8
> > N
> >    SE
> >
> >    MET
> >    A4SW    METHOD
> >
> > C245.5   HG
> > =             5100
> > 2330       LHR020SD-00E2 MY77J8
> > N
> >    SE
> >
> >    MET
> >    A4SW    METHOD
> >
> > E200.8   AG
> > ND          1050
> > 2331       LHR020SD-00E2 MY77J8
> > N
> >    SE
> >
> >    MET
> >    A4SW    METHOD
> >
> > E200.8   AS
> >    > >    5500
> > 2332       LHR020SD-00E2 MY77J8
> > N
> >    SE
> >
> >    MET
> >    A4SW    METHOD
> >
> > E200.8   B
> >    > >    11400
> > 2346       LHR020SD-00E2 MY77J8
> > N
> >    SE
> >
> >    MET
> >    A4SW    SW3050B
> >          SW6010B
> >          V
> >        > >         56900
> > 2349       LHR020SD-00E2 MY77J8
> > N
> >    SE
> >
> >    MI
> > A4SW    METHOD
> >     A2540G
> >       SOLID    > >          47.7
> >
> > Desired output:
> >
> > fldsampid
> > CLP_ID  sacode
> > matrix   etc...      CL
> >          PO4
> >     SO4        AG
> >       AL
> >    AS
> >    B
> >    V
> >    Zn
> >       etc...      ALK
> >       ALKB      ALKC
> >     SOLID    DOC
> >    TOC        NO3
> >
> > LHR020GW-01E2
> >
> >    N
> >    WG
> >
> > <value>
> >     <value>
> >         <value>
> >
> > <value>
> >     <value>
> >         <value>
> >
> > <value>
> >     <value>
> >         <value>
> >
> > <value>
> >     <value>
> >         <value>
> >
> > <value>
> >     <value>
> >         <value>
> >
> > <value>
> >     <value>
> >
> > LHR020SD-00E2 MY77J8 N
> >    SE
> >
> >    NA
> > NA          NA
> >     <value>
> >         <value>
> >
> > <value>
> >     <value>
> >         <value>
> >             NA
> >       <value>
> >           NA
> >     NA          NA
> >         <value>
> >             NA
> >       NA
> > NA
> >
> >     [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org<mailto:R-help@r-project.org>
> > mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
______________________________________________
R-help@r-project.org<mailto:R-help@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


	[[alternative HTML version deleted]]
Seemingly Similar Threads

Search for more maybe matching threads
R help - Jan 2011 - Help with Data Transformation - RESOLVED

[R] Help with Data Transformation - RESOLVED

Seemingly Similar Threads

Wisdom of the Ancients