I want to sum columns based on their names. As an example, how could I sum the columns whose names contain 6574, 7584 and 85? In addition, how could I sum those which contain 6574, 7584 and 85 in their names and have the prefix "f"? My data contains several variables with

dput(df1)
structure(list(date = structure(c(1230768000, 1230854400, 1230940800,
1231027200, 1231113600, 1231200000, 1231286400, 1231372800, 1231459200,
1231545600, 1231632000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
    f014card = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), f1534card = c(0,
    1, 1, 0, 0, 1, 0, 0, 1, 0, 1), f3564card = c(1, 6, 1, 5,
    5, 4, 4, 7, 6, 4, 6), f6574card = c(3, 6, 4, 5, 5, 2, 10,
    3, 4, 2, 4), f7584card = c(13, 6, 1, 4, 10, 6, 8, 12, 10,
    4, 3), f85card = c(5, 3, 1, 0, 2, 10, 7, 9, 1, 7, 3), m014card = c(0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), m1534card = c(0, 0, 1, 0,
    0, 0, 0, 1, 1, 1, 0), m3564card = c(12, 7, 4, 7, 12, 13,
    12, 7, 12, 2, 11), m6574card = c(3, 4, 8, 8, 8, 10, 7, 6,
    7, 7, 5), m7584card = c(8, 10, 5, 4, 12, 7, 14, 11, 9, 1,
    11), m85card = c(1, 4, 3, 0, 3, 4, 5, 5, 4, 5, 0)), .Names = c("date",
"f014card", "f1534card", "f3564card", "f6574card", "f7584card",
"f85card", "m014card", "m1534card", "m3564card", "m6574card",
"m7584card", "m85card"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
Charles Determan Jr
2014-Oct-13 13:05 UTC
[R] How to sum some columns based on their names
You can use grep with a basic regular expression to index your data frame, then apply colSums:

colSums(df1[, grep("6574|7584|85", colnames(df1))])
colSums(df1[, grep("^f(6574|7584|85)", colnames(df1))])

Regards,
Dr. Charles Determan

On Mon, Oct 13, 2014 at 7:57 AM, Kuma Raj <pollaroid at gmail.com> wrote:
> I want to sum columns based on their names. As an example, how could I
> sum the columns whose names contain 6574, 7584 and 85? In
> addition, how could I sum those which contain 6574, 7584 and 85 in
> their names and have the prefix "f"?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Dr. Charles Determan, PhD
Integrated Biosciences
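For readers following along, here is a minimal sketch of the grep-plus-colSums approach on a cut-down data frame. The name `toy` and the choice of three rows are assumptions made for illustration; the values are the first three rows of the corresponding columns in df1. Note that grep() takes a regular expression rather than a shell glob: the pattern can match anywhere in the name, `|` separates alternatives, and `*` means "zero or more of the preceding item".

```r
# Toy data (assumption): first three rows of a few columns from df1.
toy <- data.frame(
  f3564card = c(1, 6, 1),
  f6574card = c(3, 6, 4),
  f7584card = c(13, 6, 1),
  f85card   = c(5, 3, 1),
  m6574card = c(3, 4, 8),
  m7584card = c(8, 10, 5),
  m85card   = c(1, 4, 3)
)

# Columns whose names contain any of the three digit sequences.
all_idx <- grep("6574|7584|85", names(toy))

# Same, but only names that start with the prefix "f".
f_idx <- grep("^f(6574|7584|85)", names(toy))

# drop = FALSE keeps a data frame even if only one column matches.
col_totals_all <- colSums(toy[, all_idx, drop = FALSE])
col_totals_f   <- colSums(toy[, f_idx, drop = FALSE])

# If the goal is one total per row across the matching columns,
# rowSums() gives that instead.
row_totals_f <- rowSums(toy[, f_idx, drop = FALSE])

print(col_totals_all)
print(col_totals_f)
print(row_totals_f)
```

colSums() returns one total per selected column, while rowSums() returns one total per date; which of the two is wanted depends on how "sum columns" is meant in the question.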
Learn regular expressions. There are many websites and books that describe how they work, and R has a number of functions that use them:

?regexp
?grep

For example:

grep("^[^0-9]*(6574|85|7584)[^0-9]*$", names(dta))

where dta is your data frame. You can read that regular expression as: zero or more characters that are not digits at the beginning of the string, followed by any one of the three specified sequences of digits, followed by zero or more non-digit characters up to the end of the string.

You can then use that call as the column index to look only at certain columns. The sapply function can apply the sum function to each of those columns:

sapply(dta[, grep("^[^0-9]*(6574|85|7584)[^0-9]*$", names(dta))], sum)

---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On October 13, 2014 5:57:45 AM PDT, Kuma Raj <pollaroid at gmail.com> wrote:
>I want to sum columns based on their names. As an example, how could I
>sum the columns whose names contain 6574, 7584 and 85? In
>addition, how could I sum those which contain 6574, 7584 and 85 in
>their names and have the prefix "f"?
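To see what the anchored pattern buys over a plain substring match, the sketch below compares the two on a few column names. The name "f1850card" and all values in `dta` are made up for illustration and do not come from the posted data.

```r
# Hypothetical column names; "f1850card" is invented to show a false match.
nms <- c("f85card", "f1850card", "m7584card", "f3564card")

# Unanchored: matches the digit sequences anywhere, even inside a longer number.
loose <- grepl("6574|7584|85", nms)

# Anchored: the digit sequence must be the only digits in the name.
strict <- grepl("^[^0-9]*(6574|85|7584)[^0-9]*$", nms)

print(data.frame(name = nms, loose = loose, strict = strict))

# Toy data frame (values are assumptions, chosen only for the demonstration).
dta <- data.frame(
  f85card   = c(5, 3, 1),
  f1850card = c(9, 9, 9),
  m7584card = c(8, 10, 5),
  f3564card = c(1, 6, 1)
)

idx <- grep("^[^0-9]*(6574|85|7584)[^0-9]*$", names(dta))

# dta[idx] (list-style indexing) always returns a data frame, so sapply()
# iterates over columns even when only one column matches.
totals <- sapply(dta[idx], sum)
print(totals)
```

With `dta[, idx]` a single matching column would be simplified to a plain vector, and sapply() would then visit each element rather than the column as a whole; `dta[idx]` or `dta[, idx, drop = FALSE]` avoids that.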
Hello,

I have two data frames, Dem and Rap. I would like to build a new data frame (dfnew) by combining Dem and Rap in the following way: for each value of Dem$Nom (dfnew$Demandeur), I associate two different values of Rap$Nom (dfnew$Rapporteur1 and dfnew$Rapporteur2) such that

* for each dfnew$Demandeur, dfnew$Rapporteur1 does not have the same value for Departement as Dem$Departement
* for each dfnew$Demandeur, dfnew$Rapporteur2 does not have the same value for Unite as Dem$Unite
* the counts in table(dfnew$Rapporteur1) and in table(dfnew$Rapporteur2) must be balanced and not too different (accepted difference: 1)

table(dfnew$Rapporteur1)
Rapporteur01 Rapporteur02 Rapporteur03 Rapporteur04 Rapporteur05
           4            4            4            4            4

Thanks for your help
Michel

Dem <- structure(list(Nom = c("John", "Jim", "Julie", "Charles", "Michel",
"Emma", "Sandra", "Elodie", "Thierry", "Albert", "Jean", "Francois",
"Pierre", "Cyril", "Damien", "Jean-Michel", "Vincent", "Daniel",
"Yvan", "Catherine"), Departement = c("D", "A", "A", "C", "D",
"B", "D", "B", "C", "D", "B", "B", "B", "A", "C", "D", "B", "A",
"D", "D"), Unite = c("Unite8", "Unite4", "Unite4", "Unite7", "Unite9",
"Unite1", "Unite6", "Unite5", "Unite7", "Unite3", "Unite2", "Unite6",
"Unite8", "Unite8", "Unite3", "Unite8", "Unite9", "Unite7", "Unite9",
"Unite5")), .Names = c("Nom", "Departement", "Unite"), row.names = c(NA,
-20L), class = "data.frame")

Rap <- structure(list(Nom = c("Rapporteur01", "Rapporteur02", "Rapporteur03",
"Rapporteur04", "Rapporteur05"), Departement = c("C", "D", "C",
"C", "D"), Unite = c("Unite10", "Unite6", "Unite5", "Unite5",
"Unite4")), .Names = c("Nom", "Departement", "Unite"), row.names = c(NA,
-5L), class = "data.frame")

dfnew <- structure(list(Demandeur = structure(c(13L, 12L, 14L, 3L, 15L,
8L, 17L, 7L, 18L, 1L, 10L, 9L, 16L, 4L, 5L, 11L, 19L, 6L, 20L,
2L), .Label = c("Albert", "Catherine", "Charles", "Cyril", "Damien",
"Daniel", "Elodie", "Emma", "Francois", "Jean", "Jean-Michel",
"Jim", "John", "Julie", "Michel", "Pierre", "Sandra", "Thierry",
"Vincent", "Yvan"), class = "factor"), Rapporteur1 = structure(c(3L,
1L, 3L, 5L, 1L, 5L, 1L, 2L, 5L, 4L, 2L, 4L, 2L, 3L, 5L, 4L, 4L,
2L, 3L, 1L), .Label = c("Rapporteur01", "Rapporteur02", "Rapporteur03",
"Rapporteur04", "Rapporteur05"), class = "factor"), Rapporteur2 = structure(c(1L,
3L, 4L, 4L, 2L, 4L, 5L, 1L, 2L, 3L, 3L, 3L, 5L, 5L, 1L, 1L, 2L,
5L, 4L, 2L), .Label = c("Rapporteur01", "Rapporteur02", "Rapporteur03",
"Rapporteur04", "Rapporteur05"), class = "factor")), .Names = c("Demandeur",
"Rapporteur1", "Rapporteur2"), row.names = c(NA, -20L), class = "data.frame")

--
Michel ARNAUD
Cirad Montpellier
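The constraints listed in the question can be checked mechanically, which may help when trying out candidate assignments. The sketch below is not a method for building dfnew, only a validator. The function name `check_assignment`, its `tol` argument, and the small toy tables are all assumptions introduced for illustration; they use the same column names as Dem, Rap and dfnew.

```r
# Hypothetical helper: returns one TRUE/FALSE per constraint.
check_assignment <- function(dfnew, Dem, Rap, tol = 1) {
  d  <- Dem[match(dfnew$Demandeur, Dem$Nom), ]
  r1 <- Rap[match(dfnew$Rapporteur1, Rap$Nom), ]
  r2 <- Rap[match(dfnew$Rapporteur2, Rap$Nom), ]
  # Count assignments per rapporteur, including rapporteurs never used.
  n1 <- table(factor(dfnew$Rapporteur1, levels = Rap$Nom))
  n2 <- table(factor(dfnew$Rapporteur2, levels = Rap$Nom))
  c(
    distinct    = all(as.character(dfnew$Rapporteur1) !=
                      as.character(dfnew$Rapporteur2)),
    departement = all(r1$Departement != d$Departement),
    unite       = all(r2$Unite != d$Unite),
    balanced1   = (max(n1) - min(n1)) <= tol,
    balanced2   = (max(n2) - min(n2)) <= tol
  )
}

# Toy tables (assumption): a reduced example, not the full data.
Dem <- data.frame(Nom = c("John", "Jim", "Charles", "Emma"),
                  Departement = c("D", "A", "C", "B"),
                  Unite = c("Unite8", "Unite4", "Unite7", "Unite1"),
                  stringsAsFactors = FALSE)
Rap <- data.frame(Nom = c("Rapporteur01", "Rapporteur02"),
                  Departement = c("C", "D"),
                  Unite = c("Unite10", "Unite6"),
                  stringsAsFactors = FALSE)
dfnew <- data.frame(
  Demandeur   = c("John", "Jim", "Charles", "Emma"),
  Rapporteur1 = c("Rapporteur01", "Rapporteur02", "Rapporteur02", "Rapporteur01"),
  Rapporteur2 = c("Rapporteur02", "Rapporteur01", "Rapporteur01", "Rapporteur02"),
  stringsAsFactors = FALSE
)

result <- check_assignment(dfnew, Dem, Rap)
print(result)
```

The same function can be called on the full Dem, Rap and dfnew objects from the question, since match() and the comparisons work with factor columns as well as character columns.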