I want to sum columns based on their names. As an example, how could I
sum columns which contain 6574, 7584 and 85 in their column names? In
addition, how could I sum those which contain 6574, 7584 and 85 in
their names and have a prefix "f"? My data contains several variables
with
dput(df1)
structure(list(
  date = structure(c(1230768000, 1230854400, 1230940800, 1231027200,
    1231113600, 1231200000, 1231286400, 1231372800, 1231459200,
    1231545600, 1231632000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  f014card = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  f1534card = c(0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1),
  f3564card = c(1, 6, 1, 5, 5, 4, 4, 7, 6, 4, 6),
  f6574card = c(3, 6, 4, 5, 5, 2, 10, 3, 4, 2, 4),
  f7584card = c(13, 6, 1, 4, 10, 6, 8, 12, 10, 4, 3),
  f85card = c(5, 3, 1, 0, 2, 10, 7, 9, 1, 7, 3),
  m014card = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  m1534card = c(0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0),
  m3564card = c(12, 7, 4, 7, 12, 13, 12, 7, 12, 2, 11),
  m6574card = c(3, 4, 8, 8, 8, 10, 7, 6, 7, 7, 5),
  m7584card = c(8, 10, 5, 4, 12, 7, 14, 11, 9, 1, 11),
  m85card = c(1, 4, 3, 0, 3, 4, 5, 5, 4, 5, 0)),
  .Names = c("date", "f014card", "f1534card", "f3564card", "f6574card",
    "f7584card", "f85card", "m014card", "m1534card", "m3564card",
    "m6574card", "m7584card", "m85card"),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
Charles Determan Jr
2014-Oct-13 13:05 UTC
[R] How to sum some columns based on their names
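You can use grep with a basic regular expression to pick the columns, index
your data frame with the result, and apply colSums. grep patterns are regular
expressions, not shell wildcards, so no "*" is needed around the digits:
colSums(df1[, grep("6574|7584|85", colnames(df1))])
colSums(df1[, grep("^f(6574|7584|85)", colnames(df1))])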
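"Sum columns" can mean either one total per column or one total per row across the selected columns. The following is a minimal sketch of both, assuming a toy subset of the posted data (only the first three rows of five of the columns). The names any_idx, f_idx, col_totals and row_totals are made up for this illustration.

```r
# Toy subset of the posted data: first three rows of five columns
df1 <- data.frame(
  f3564card = c(1, 6, 1),
  f6574card = c(3, 6, 4),
  f7584card = c(13, 6, 1),
  f85card   = c(5, 3, 1),
  m6574card = c(3, 4, 8)
)

# Columns whose names contain any of the three codes
any_idx <- grep("6574|7584|85", names(df1))

# Same codes, but only when the name starts with "f"
f_idx <- grep("^f(6574|7584|85)", names(df1))

# One total per selected column
col_totals <- colSums(df1[, f_idx, drop = FALSE])

# One total per row, summed across the selected columns
row_totals <- rowSums(df1[, f_idx, drop = FALSE])
```

drop = FALSE keeps the selection a data frame even if only one column matches, so colSums and rowSums still work in that case.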
Regards,
Dr. Charles Determan
On Mon, Oct 13, 2014 at 7:57 AM, Kuma Raj <pollaroid at gmail.com> wrote:
> I want to sum columns based on their names. As an example, how could I
> sum columns which contain 6574, 7584 and 85 in their column names? In
> addition, how could I sum those which contain 6574, 7584 and 85 in
> their names and have a prefix "f"? My data contains several variables
> with
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Dr. Charles Determan, PhD
Integrated Biosciences
Learn regular expressions. There are many websites and books that describe how
they work, and R has a number of functions that use them:
?regexp
?grep
For example:
grep("^[^0-9]*(6574|85|7584)[^0-9]*$", names(dta))
where dta is your data frame. You can read that regular expression as: zero or
more non-digit characters at the beginning of the string, followed by
any one of the three specified digit sequences, followed by zero or more
non-digit characters up to the end of the string.
You can then use the result of that call as the column index to look only at
those columns. The sapply function can apply the sum function to each of those
columns:
sapply(dta[, grep("^[^0-9]*(6574|85|7584)[^0-9]*$", names(dta))], sum)
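To see what the anchored pattern buys over a plain substring match, here is a small sketch that compares the two. It assumes a handful of the posted column names plus two hypothetical ones ("f1850card" and "x85") that are not in the original data and are included only to show the difference.

```r
# Some posted column names, plus two hypothetical ones for illustration
nms <- c("date", "f3564card", "f6574card", "f85card", "m7584card",
         "f1850card", "x85")

# Substring match: TRUE whenever one of the codes appears anywhere in the name
loose <- grepl("6574|7584|85", nms)

# Anchored match: the code must be the only run of digits in the name
strict <- grepl("^[^0-9]*(6574|85|7584)[^0-9]*$", nms)

comparison <- data.frame(name = nms, loose = loose, strict = strict)
print(comparison)
```

The substring pattern also picks up "f1850card", because "85" occurs inside "1850", while the anchored pattern rejects it. On the posted column names both patterns select the same columns.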
---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Hello,
I have two data frames, Dem and Rap.
I would like to build a new data frame (dfnew) by combining Dem and Rap
as follows:
For each value of Dem$Nom (dfnew$Demandeur), I assign two different
values of Rap$Nom (dfnew$Rapporteur1 and dfnew$Rapporteur2) such that
* for each dfnew$Demandeur, dfnew$Rapporteur1 does not have the same
value for Departement as the Demandeur (Dem$Departement)
* for each dfnew$Demandeur, dfnew$Rapporteur2 does not have the same
value for Unite as the Demandeur (Dem$Unite)
* the counts in table(dfnew$Rapporteur1) and in
table(dfnew$Rapporteur2) are balanced, differing by at most 1
For example:
table(dfnew$Rapporteur1)
Rapporteur01 Rapporteur02 Rapporteur03 Rapporteur04 Rapporteur05
           4            4            4            4            4
Thanks for your help
Michel
Dem <- structure(list(
  Nom = c("John", "Jim", "Julie", "Charles", "Michel", "Emma", "Sandra",
    "Elodie", "Thierry", "Albert", "Jean", "Francois", "Pierre", "Cyril",
    "Damien", "Jean-Michel", "Vincent", "Daniel", "Yvan", "Catherine"),
  Departement = c("D", "A", "A", "C", "D", "B", "D", "B", "C", "D",
    "B", "B", "B", "A", "C", "D", "B", "A", "D", "D"),
  Unite = c("Unite8", "Unite4", "Unite4", "Unite7", "Unite9", "Unite1",
    "Unite6", "Unite5", "Unite7", "Unite3", "Unite2", "Unite6", "Unite8",
    "Unite8", "Unite3", "Unite8", "Unite9", "Unite7", "Unite9", "Unite5")),
  .Names = c("Nom", "Departement", "Unite"),
  row.names = c(NA, -20L), class = "data.frame")
Rap <- structure(list(
  Nom = c("Rapporteur01", "Rapporteur02", "Rapporteur03", "Rapporteur04",
    "Rapporteur05"),
  Departement = c("C", "D", "C", "C", "D"),
  Unite = c("Unite10", "Unite6", "Unite5", "Unite5", "Unite4")),
  .Names = c("Nom", "Departement", "Unite"),
  row.names = c(NA, -5L), class = "data.frame")
dfnew <- structure(list(
  Demandeur = structure(c(13L, 12L, 14L, 3L, 15L, 8L, 17L, 7L, 18L, 1L,
    10L, 9L, 16L, 4L, 5L, 11L, 19L, 6L, 20L, 2L),
    .Label = c("Albert", "Catherine", "Charles", "Cyril", "Damien",
      "Daniel", "Elodie", "Emma", "Francois", "Jean", "Jean-Michel",
      "Jim", "John", "Julie", "Michel", "Pierre", "Sandra", "Thierry",
      "Vincent", "Yvan"), class = "factor"),
  Rapporteur1 = structure(c(3L, 1L, 3L, 5L, 1L, 5L, 1L, 2L, 5L, 4L,
    2L, 4L, 2L, 3L, 5L, 4L, 4L, 2L, 3L, 1L),
    .Label = c("Rapporteur01", "Rapporteur02", "Rapporteur03",
      "Rapporteur04", "Rapporteur05"), class = "factor"),
  Rapporteur2 = structure(c(1L, 3L, 4L, 4L, 2L, 4L, 5L, 1L, 2L, 3L,
    3L, 3L, 5L, 5L, 1L, 1L, 2L, 5L, 4L, 2L),
    .Label = c("Rapporteur01", "Rapporteur02", "Rapporteur03",
      "Rapporteur04", "Rapporteur05"), class = "factor")),
  .Names = c("Demandeur", "Rapporteur1", "Rapporteur2"),
  row.names = c(NA, -20L), class = "data.frame")
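One possible way to produce such a dfnew is simple rejection sampling: shuffle a balanced pool of rapporteurs and keep the first shuffle that violates none of the constraints. The sketch below is only an illustration, not a method proposed in this thread. It assumes that Rapporteur1 and Rapporteur2 must differ for each Demandeur (read from "2 different values of Rap$Nom"), and the helper name draw_balanced and its max_tries argument are made up here. Dem and Rap are rebuilt with the posted values so that the block runs on its own.

```r
# Data as posted in the question, rebuilt compactly
Dem <- data.frame(
  Nom = c("John", "Jim", "Julie", "Charles", "Michel", "Emma", "Sandra",
          "Elodie", "Thierry", "Albert", "Jean", "Francois", "Pierre",
          "Cyril", "Damien", "Jean-Michel", "Vincent", "Daniel", "Yvan",
          "Catherine"),
  Departement = c("D", "A", "A", "C", "D", "B", "D", "B", "C", "D",
                  "B", "B", "B", "A", "C", "D", "B", "A", "D", "D"),
  Unite = paste0("Unite", c(8, 4, 4, 7, 9, 1, 6, 5, 7, 3,
                            2, 6, 8, 8, 3, 8, 9, 7, 9, 5)),
  stringsAsFactors = FALSE
)
Rap <- data.frame(
  Nom = sprintf("Rapporteur%02d", 1:5),
  Departement = c("C", "D", "C", "C", "D"),
  Unite = paste0("Unite", c(10, 6, 5, 5, 4)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `ok` is a logical matrix (rows = demandeurs,
# columns = rapporteurs, TRUE = pairing allowed). The pool repeats every
# rapporteur equally often, so the counts can differ by at most 1.
draw_balanced <- function(ok, max_tries = 1e5) {
  n <- nrow(ok)
  pool <- rep(seq_len(ncol(ok)), length.out = n)
  for (i in seq_len(max_tries)) {
    pick <- sample(pool)                      # random balanced assignment
    if (all(ok[cbind(seq_len(n), pick)])) {   # every pairing allowed?
      return(pick)
    }
  }
  stop("no valid assignment found; the constraints may be infeasible")
}

set.seed(42)  # only to make the random draw repeatable

# Rapporteur1: Departement must differ from the demandeur's
ok1 <- outer(Dem$Departement, Rap$Departement, "!=")
r1 <- draw_balanced(ok1)

# Rapporteur2: Unite must differ, and it must not repeat Rapporteur1
ok2 <- outer(Dem$Unite, Rap$Unite, "!=")
ok2[cbind(seq_len(nrow(Dem)), r1)] <- FALSE
r2 <- draw_balanced(ok2)

dfnew <- data.frame(Demandeur = Dem$Nom,
                    Rapporteur1 = Rap$Nom[r1],
                    Rapporteur2 = Rap$Nom[r2],
                    stringsAsFactors = FALSE)
```

Rejection sampling is easy to reason about because the balance holds by construction and the other constraints are checked directly, but it relies on valid shuffles not being too rare. For 20 demandeurs and 5 rapporteurs this should take on the order of a thousand shuffles per column; for much larger or tighter problems a matching or integer-programming formulation would probably be a better fit.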
--
Michel ARNAUD
Cirad Montpellier