Jesús Guillermo Andrade
2009-Feb-09 22:11 UTC
[R] Dataframes: conditional calculations per row
Dear Sirs:

I've been working with several variables in a dataframe that serve as part of a calculation that I need to perform in a different way depending on the value of one of them. Let me explain.

The main dataframe is called llmcc:

llmcc : 'data.frame': 283 obs. of 11 variables:
 $ Area     : num 308.8 105.6 51.4 51.4 52.9 ...
 $ mFondo   : num 30.1 10 10.2 10.2 40.4 ...
 $ mFachada : num 22.95 6.7 4.72 4.72 4.72 ...
 $ Marca    : Factor w/ 132 levels "AA_Movilnet",..: 11 32 82 82 32 32 32 32 32 32 ...
 $ Clase    : int 8 4 1 1 1 1 1 1 12 1 ...
 $ Categoria: int 2 6 6 6 1 1 1 1 1 1 ...
 $ Phi      : num 0.128 0.147 0.217 0.217 0.887 ...
 $ Rf       : num 0.119 0.102 0.147 0.147 0.143 ...
 $ OldA     : num 0.737 0.258 0.375 0.375 0.385 ...
 $ OldCondo : num 4436 1555 2260 2260 2318 ...
 $ NewA_Jon : num 1.069 0.368 0.256 0.256 0.264 ...

I perform an initial operation using the original variables plus one numeric (Abase) that is external and has the same number of rows as the dataframe:

alitemp <- ((Abase/llmcc$Clase)*PClase)+(((1/llmcc$Categoria)*Abase)*PCategoria)+((Abase*llmcc$Phi)*PPhi)+((Abase*llmcc$Rf)*PRf)

After I obtain the results of this calculation, I append the series by creating an additional column within the original dataframe:

llmcc$Alitmp <- alitemp

The problem is this: I need to calculate a new column using a formula whose structure depends on the values of llmcc$Clase. For any given row of llmcc where llmcc$Clase is >= 10, I have to perform some operations with other values in the same row that are, by definition, different from the ones I need when llmcc$Clase is < 10.

I've managed to break down the original dataframe by using subsets and then performing the calculations, but it is then complicated to put the results back in the same order as the original dataframe.

I understand the workings of the control structures available in R, but after reading the docs and help files I can't figure out how to perform a conditional calculation row by row that first checks the values of a given column and then applies the corresponding operation to another column, so that it outputs a series in exactly the same order as the dataframe.

Any light that you might share with me on this will be highly appreciated.

Thanks in advance.

Guillermo.

Never ask a barber whether you need a haircut. Murray's Law.
------------------------------
Jesús Guillermo Andrade (Abg.)
Litigation and Corporate Manager. EDM. AC. API.
Andrade & Moreno S.C. (http://amlegal.wordpress.com/)
You can use 'ifelse':

> x <- data.frame(id=sample(1:4,20,TRUE))
> # use ifelse to do the calculations
> x$cal <- ifelse(x$id == 1, 21,
+              ifelse(x$id == 2, 221,
+                  ifelse(x$id == 3, 2221, 22221)))
> x
   id   cal
1   4 22221
2   1    21
3   3  2221
4   1    21
5   2   221
6   2   221
7   1    21
8   2   221
9   4 22221
10  2   221
11  2   221
12  3  2221
13  2   221
14  1    21
15  4 22221
16  3  2221
17  4 22221
18  1    21
19  3  2221
20  2   221

On Mon, Feb 9, 2009 at 5:11 PM, Jesús Guillermo Andrade <jgandradev at mac.com> wrote:
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
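The 'ifelse' suggestion above can be carried over to the Clase >= 10 versus Clase < 10 split from the original question. What follows is only a sketch under stated assumptions: the toy data frame stands in for llmcc, and the two branch formulas (high_clase and low_clase) are hypothetical placeholders, since the thread never states the real ones.

```r
# Toy stand-in for llmcc; the values and both formulas are made up.
llmcc <- data.frame(
  Clase = c(8L, 4L, 1L, 12L, 10L),
  Phi   = c(0.128, 0.147, 0.217, 0.887, 0.500),
  Rf    = c(0.119, 0.102, 0.147, 0.143, 0.200)
)

# Hypothetical per-branch formulas, each written for whole columns.
high_clase <- function(d) d$Phi * 100   # meant for rows with Clase >= 10
low_clase  <- function(d) d$Rf * 10     # meant for rows with Clase <  10

# ifelse() takes element i from the second or third argument according
# to element i of the test, so the result keeps the data frame's row order.
llmcc$NewCol <- ifelse(llmcc$Clase >= 10, high_clase(llmcc), low_clase(llmcc))

# Logical-index form: each formula is evaluated only on its own rows,
# and the results are written back into their original positions.
hi <- llmcc$Clase >= 10
alt <- numeric(nrow(llmcc))
alt[hi]  <- high_clase(llmcc[hi, ])
alt[!hi] <- low_clase(llmcc[!hi, ])
```

Note that with ifelse() both formulas are computed for every row and only then selected, so the logical-index form may be preferable when one formula is invalid (or produces warnings) on the other branch's rows.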
One way; there may be better ones. The apply function will work with just one row (or one column) at a time.

> DF
   Month Week Estpassage MedFL
1   July   27        665    34
2   July   28       2232    35
3   July   29       9241    35
4   July   30      28464    35
5    Aug   31      41049    35
6    Aug   32      82216    35
7    Aug   33     230411    35
8    Aug   34     358541    35
9   Sept   35     747839    35
10  Sept   36     459682    36
11  Sept   37     609567    36
12  Sept   38     979475    36
13  Sept   39     837189    36

Build a function, say we call it switch.cond:

switch.cond <- function (x) {
    if (x["Week"] >= 33) return( fun1(x) ) else return( fun2(x) )
}

# Build two more functions to handle the dispatched rows
fun1 <- function(x){ cat("do function 1\n") }  # replace the cat-call with your first calculation
fun2 <- function(x){ cat("do function 2\n") }  # and use various x["<colname>"]'s as arguments

> apply(DF, 1, switch.cond)
do function 2
do function 2
do function 2
do function 2
do function 2
do function 2
do function 1
do function 1
do function 1
do function 1
do function 1
do function 1
do function 1
NULL

HTH:
David Winsemius

On Feb 9, 2009, at 5:11 PM, Jesús Guillermo Andrade wrote:
> [...]
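One caveat on the row-wise dispatch above: apply() first converts the data frame to a matrix, so with a non-numeric column such as Month every value reaches switch.cond as a character string. The sketch below is one possible variant that indexes rows directly so each column keeps its type and the dispatcher returns a number per row. The bodies of fun1 and fun2 are hypothetical placeholders, and the data is a four-row subset of DF.

```r
# Four rows taken from the DF shown above.
DF <- data.frame(
  Month      = c("July", "Aug", "Aug", "Sept"),
  Week       = c(27, 32, 33, 36),
  Estpassage = c(665, 82216, 230411, 459682),
  MedFL      = c(34, 35, 35, 36)
)

# Hypothetical branch calculations; each receives a one-row data frame.
fun1 <- function(row) row$Estpassage / row$MedFL   # rows with Week >= 33
fun2 <- function(row) row$Estpassage * 2           # rows with Week <  33

# Same dispatch idea as switch.cond in the post, but returning a value.
switch.cond <- function(row) {
  if (row$Week >= 33) fun1(row) else fun2(row)
}

# Loop over row indices instead of apply(): DF[i, ] is still a data
# frame, so Week and Estpassage stay numeric. vapply() also checks that
# every row yields exactly one number, and results come back in row order.
DF$result <- vapply(
  seq_len(nrow(DF)),
  function(i) switch.cond(DF[i, ]),
  numeric(1)
)
```

For 283 rows either approach is fast enough; on larger data a vectorized form such as ifelse() would usually be preferred over a per-row loop.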