Ottorino-Luca Pantani
2009-Sep-03 16:17 UTC
[R] dividing a dataframe column by different constants
Dear R users,

today I've got the following problem. Here is a data frame as an example. There are some SAMPLEs for which a CONCentration was recorded through TIME. The time during which the concentration was recorded is not always the same: 10 points for sample A, 7 points for sample B and 11 for sample C. Also, the initial concentration was not the same for the three samples.

I would like to express the concentrations as a % of the concentration at time = 1, so I wrote the following code. It does the job, but it is impractical when there are, as in my real case, more than one hundred samples. It is known that the maximum concentration occurs at the minimum time, and all the other concentrations in the sample are to be divided by it.

I'm quite sure that there's a more elegant solution, but I cannot even imagine how to write it.

Thanks in advance for your time

(df.mydata <- data.frame(
  CONC = c(seq( from = 1, to = 0.1, by = -0.1 ),
           seq( from = 0.8, to = 0.2, by = -0.1 ),
           seq( from = 0.6, to = 0.1, by = -0.05 )),
  TIME = c(1:10,
           2:8,
           4:14 ),
  SAMPLE = c( rep( "A", 10 ),
              rep( "B", 7 ),
              rep( "C", 11 )
            )
  )
)

MAX <- tapply( df.mydata$CONC, df.mydata$SAMPLE, max )

(df.mydata$PERCENTAGE <-
  ifelse(df.mydata$SAMPLE == "A", df.mydata$CONC / MAX[1],
    ifelse(df.mydata$SAMPLE == "B", df.mydata$CONC / MAX[2],
      df.mydata$CONC / MAX[3])))

--
Ottorino-Luca Pantani, Università di Firenze
Dip. Scienza del Suolo e Nutrizione della Pianta
P.zle Cascine 28 50144 Firenze Italia
Tel 39 055 3288 202 (348 lab) Fax 39 055 333 273
OLPantani at unifi.it
http://www4.unifi.it/dssnp/
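The nested ifelse() above needs one branch per sample, which is what makes it impractical for a hundred samples. As a minimal sketch (not taken from any reply in the thread), the same division can be written once by looking up each row's maximum in MAX by sample name. The helper name max_per_row is hypothetical, and as.character() is used on the assumption that SAMPLE may be either a factor or a character column depending on the R version.

```r
# Example data from the post.
df.mydata <- data.frame(
  CONC = c(seq(from = 1,   to = 0.1, by = -0.1),
           seq(from = 0.8, to = 0.2, by = -0.1),
           seq(from = 0.6, to = 0.1, by = -0.05)),
  TIME = c(1:10, 2:8, 4:14),
  SAMPLE = c(rep("A", 10), rep("B", 7), rep("C", 11))
)

# One maximum per sample; the result is named by sample label.
MAX <- tapply(df.mydata$CONC, df.mydata$SAMPLE, max)

# Look up the maximum that belongs to each row's sample (by name),
# then drop the names and dimensions with as.vector().
# max_per_row is a hypothetical helper name.
max_per_row <- as.vector(MAX[as.character(df.mydata$SAMPLE)])

# Divide every concentration by its own sample's maximum.
df.mydata$PERCENTAGE <- df.mydata$CONC / max_per_row
```

The lookup works for any number of samples because the sample label, not its position in MAX, selects the divisor.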
David Winsemius
2009-Sep-03 16:43 UTC
[R] dividing a dataframe column by different constants
On Sep 3, 2009, at 12:17 PM, Ottorino-Luca Pantani wrote:

> I would like to express the concentrations as a % of the concentration
> at time = 1, so I wrote the following code. It does the job, but it is
> impractical when there are, as in my real case, more than one hundred
> samples.
> It is known that the maximum concentration occurs at the minimum time,
> and all the other concentrations in the sample are to be divided by it.

Perhaps this:

by(df.mydata, df.mydata$SAMPLE, function(x) x$CONC/x$CONC[1] )

...or if you wanted to use max(x$CONC) as the standardizing value, that ought to work as well. With your data it gives identical results. The equivalent tapply construction would be:

tapply(df.mydata$CONC, df.mydata$SAMPLE, function(x) x/x[1] )

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
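Two points about the x/x[1] form may be worth spelling out. Both by() and tapply() return one piece per sample rather than a column, and x[1] is the earliest reading only if each sample's rows are already ordered by TIME, as they are in the example. The following is a minimal sketch under the assumption that rows could arrive unsorted; the shuffle, the seed, and the names shuffled, sorted and ratios are illustrative and not from the thread.

```r
# Example data from the original post.
df.mydata <- data.frame(
  CONC = c(seq(from = 1,   to = 0.1, by = -0.1),
           seq(from = 0.8, to = 0.2, by = -0.1),
           seq(from = 0.6, to = 0.1, by = -0.05)),
  TIME = c(1:10, 2:8, 4:14),
  SAMPLE = c(rep("A", 10), rep("B", 7), rep("C", 11))
)

# Shuffle the rows to mimic input that is not ordered by time
# (illustrative only; the seed is arbitrary).
set.seed(1)
shuffled <- df.mydata[sample(nrow(df.mydata)), ]

# Sort by sample, then by time, so that the first row of each
# sample is its earliest reading.
sorted <- shuffled[order(shuffled$SAMPLE, shuffled$TIME), ]

# tapply() returns one vector of ratios per sample.
ratios <- tapply(sorted$CONC, sorted$SAMPLE, function(x) x / x[1])

# Because 'sorted' is grouped by sample in the same order as the
# tapply() result, concatenating the pieces lines up with its rows.
sorted$PERCENTAGE <- unlist(ratios, use.names = FALSE)
```

Dividing by max(x) instead of x[1] avoids the dependence on row order, at the cost of relying on the assumption that the first reading is also the largest.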
Jorge Ivan Velez
2009-Sep-03 16:48 UTC
[R] dividing a dataframe column by different constants
Dear Ottorino-Luca,

Here is a suggestion using ave():

df.mydata$PERCENTAGE <- with(df.mydata, ave(CONC, list(SAMPLE),
    FUN = function(x) x / max(x) ))
df.mydata[1:5,]
#   CONC TIME SAMPLE PERCENTAGE
# 1  1.0    1      A        1.0
# 2  0.9    2      A        0.9
# 3  0.8    3      A        0.8
# 4  0.7    4      A        0.7
# 5  0.6    5      A        0.6

See ?ave and ?tapply for more information.

HTH,
Jorge
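As a quick check, the ave() approach can be compared with the nested ifelse() from the original post. This is a minimal sketch, not part of the thread: the factor of 100 is an assumption added here to give actual percentages (the code in the thread yields fractions between 0 and 1), and the name reference is hypothetical.

```r
# Example data from the original post.
df.mydata <- data.frame(
  CONC = c(seq(from = 1,   to = 0.1, by = -0.1),
           seq(from = 0.8, to = 0.2, by = -0.1),
           seq(from = 0.6, to = 0.1, by = -0.05)),
  TIME = c(1:10, 2:8, 4:14),
  SAMPLE = c(rep("A", 10), rep("B", 7), rep("C", 11))
)

# ave() applies FUN within each sample and returns a vector with the
# same length and row order as CONC, so it can be assigned directly.
# The factor of 100 (an addition here) converts fractions to percent.
df.mydata$PERCENTAGE <- with(df.mydata,
  ave(CONC, SAMPLE, FUN = function(x) 100 * x / max(x)))

# The nested ifelse() from the original post, scaled the same way,
# used only as a reference for comparison.
MAX <- tapply(df.mydata$CONC, df.mydata$SAMPLE, max)
reference <- 100 * ifelse(df.mydata$SAMPLE == "A", df.mydata$CONC / MAX[1],
                   ifelse(df.mydata$SAMPLE == "B", df.mydata$CONC / MAX[2],
                          df.mydata$CONC / MAX[3]))

# Stops with an error if the two approaches disagree.
stopifnot(isTRUE(all.equal(df.mydata$PERCENTAGE, reference)))
```

Unlike the ifelse() version, the ave() call does not change as samples are added.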