thr3ads.net - R help - [R] How to sum some columns based on their names [Oct 2014]

Kuma Raj

2014-Oct-13 12:57 UTC

[R] How to sum some columns based on their names

I want to sum columns based on their names. As an example, how could I
sum columns which contain 6574, 7584 and 85 in their column names? In
addition, how could I sum those which contain 6574, 7584 and 85 in
their names and have a prefix "f"? My data contains several variables
with

dput(df1)
structure(list(date = structure(c(1230768000, 1230854400, 1230940800,
1231027200, 1231113600, 1231200000, 1231286400, 1231372800, 1231459200,
1231545600, 1231632000), class = c("POSIXct", "POSIXt"),
tzone = "UTC"),
    f014card = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), f1534card = c(0,
    1, 1, 0, 0, 1, 0, 0, 1, 0, 1), f3564card = c(1, 6, 1, 5,
    5, 4, 4, 7, 6, 4, 6), f6574card = c(3, 6, 4, 5, 5, 2, 10,
    3, 4, 2, 4), f7584card = c(13, 6, 1, 4, 10, 6, 8, 12, 10,
    4, 3), f85card = c(5, 3, 1, 0, 2, 10, 7, 9, 1, 7, 3), m014card = c(0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), m1534card = c(0, 0, 1, 0,
    0, 0, 0, 1, 1, 1, 0), m3564card = c(12, 7, 4, 7, 12, 13,
    12, 7, 12, 2, 11), m6574card = c(3, 4, 8, 8, 8, 10, 7, 6,
    7, 7, 5), m7584card = c(8, 10, 5, 4, 12, 7, 14, 11, 9, 1,
    11), m85card = c(1, 4, 3, 0, 3, 4, 5, 5, 4, 5, 0)), .Names =
c("date",
"f014card", "f1534card", "f3564card",
"f6574card", "f7584card",
"f85card", "m014card", "m1534card",
"m3564card", "m6574card",
"m7584card", "m85card"), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6",
"7", "8", "9", "10", "11"))

Charles Determan Jr

2014-Oct-13 13:05 UTC

[R] How to sum some columns based on their names

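You can use grep with a basic regular expression to index your data frame,
then apply colSums. Note that regular expressions are not shell wildcards,
so no "*" is needed around the digits, and "^f" anchors the prefix:

colSums(df1[, grep("6574|7584|85", colnames(df1))])
colSums(df1[, grep("^f(6574|7584|85)", colnames(df1))])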
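
As a rough, self-contained sketch of the same idea: the data frame below is a cut-down copy of the poster's df1 (first three rows, a subset of columns), and the variable names age_cols, f_cols and the *_totals objects are made up for illustration. It is an assumption that "sum columns" may mean either one total per column (colSums) or one total per row across the selected columns (rowSums), so both are shown.

```r
# Cut-down copy of the poster's df1: first three rows, a subset of columns.
df1 <- data.frame(
  f3564card = c(1, 6, 1),
  f6574card = c(3, 6, 4),
  f7584card = c(13, 6, 1),
  f85card   = c(5, 3, 1),
  m6574card = c(3, 4, 8),
  m7584card = c(8, 10, 5),
  m85card   = c(1, 4, 3)
)

# Names containing 6574, 7584 or 85, whatever the prefix.
age_cols <- grep("6574|7584|85", colnames(df1), value = TRUE)

# The same age groups, restricted to names that start with "f".
f_cols <- grep("^f(6574|7584|85)", colnames(df1), value = TRUE)

col_totals   <- colSums(df1[, age_cols])  # one total per selected column
row_totals   <- rowSums(df1[, age_cols])  # one total per row (per date)
f_row_totals <- rowSums(df1[, f_cols])    # per-row total, "f" columns only
```

Using value = TRUE returns the matching names rather than their positions, which makes it easier to check which columns were picked up before summing.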


Regards,
Dr. Charles Determan

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Dr. Charles Determan, PhD
Integrated Biosciences

Jeff Newmiller

2014-Oct-13 13:30 UTC

[R] How to sum some columns based on their names

Learn regular expressions. There are many websites and books that describe how
they work, and R has a number of functions that use them:

?regexp
?grep

For example...

grep("^[^0-9]*(6574|85|7584)[^0-9]*$",names(dta))

where dta is your data frame. You can read that regular expression as zero or
more characters that are not digits at the beginning of the string, followed by
any of three specified sequences of digits, followed by zero or more non-digit
characters at the end of the string.

You can then use that function as the column specification index to look only at
certain columns. The sapply function can apply the sum function to all of those
columns:

sapply(dta[,grep("^[^0-9]*(6574|85|7584)[^0-9]*$",names(dta))],sum)
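
To see what the anchoring buys, here is a small sketch comparing a loose pattern with the anchored one above. The vector nms mixes some of the poster's column names with one hypothetical name, f185card, which is not in the original data and is there only to show a false match.

```r
# Some of the poster's column names plus one hypothetical name (f185card).
nms <- c("date", "f3564card", "f6574card", "f85card", "m7584card", "f185card")

# Loose: matches the digits anywhere, so "f185card" is picked up via "85".
loose <- grep("6574|7584|85", nms, value = TRUE)

# Anchored: only non-digits may surround the digit group, so "f185card" fails.
strict <- grep("^[^0-9]*(6574|85|7584)[^0-9]*$", nms, value = TRUE)

# Variant for the second half of the question: the name must start with "f".
f_strict <- grep("^f(6574|85|7584)[^0-9]*$", nms, value = TRUE)
```

With the names actually present in the posted data, the loose and anchored patterns select the same columns; the anchored form only matters if other digit runs containing 85 can appear.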


Arnaud Michel

2014-Oct-14 08:46 UTC

[R] To build a new Df from 2 Df

Hello

I have two data frames, Dem and Rap.
I would like to build a new data frame (dfnew) by combining these two
data frames (Dem and Rap) in the following way:

For each value of Dem$Nom (dfnew$Demandeur), I associate two different
values of Rap$Nom (dfnew$Rapporteur1 and dfnew$Rapporteur2) such that:

  * for each dfnew$Demandeur, dfnew$Rapporteur1 does not have the same
    value for Departement as Dem$Departement
  * for each dfnew$Demandeur, dfnew$Rapporteur2 does not have the same
    value for Unite as Dem$Unite
  * the counts in table(dfnew$Rapporteur1) and in
    table(dfnew$Rapporteur2) must be balanced, differing by at most 1

table(dfnew$Rapporteur1)
Rapporteur01 Rapporteur02 Rapporteur03 Rapporteur04 Rapporteur05
           4            4            4            4            4

Thanks for your help
Michel

  Dem <- structure(list(Nom = c("John", "Jim",
"Julie", "Charles",
"Michel",
"Emma", "Sandra", "Elodie", "Thierry",
"Albert", "Jean", "Francois",
"Pierre", "Cyril", "Damien",
"Jean-Michel", "Vincent", "Daniel",
"Yvan", "Catherine"), Departement = c("D",
"A", "A", "C", "D",
"B", "D", "B", "C", "D",
"B", "B", "B", "A", "C",
"D", "B", "A",
"D", "D"), Unite = c("Unite8", "Unite4",
"Unite4", "Unite7",
"Unite9", "Unite1", "Unite6", "Unite5",
"Unite7", "Unite3", "Unite2",
"Unite6", "Unite8", "Unite8", "Unite3",
"Unite8", "Unite9", "Unite7",
"Unite9", "Unite5")), .Names = c("Nom",
"Departement", "Unite"
), row.names = c(NA, -20L), class = "data.frame")

Rap <- structure(list(Nom = c("Rapporteur01",
"Rapporteur02",
"Rapporteur03",
"Rapporteur04", "Rapporteur05"), Departement =
c("C", "D", "C",
"C", "D"), Unite = c("Unite10",
"Unite6", "Unite5", "Unite5",
"Unite4")), .Names = c("Nom", "Departement",
"Unite"), row.names = c(NA,
-5L), class = "data.frame")

dfnew <- structure(list(Demandeur = structure(c(13L, 12L, 14L, 3L, 15L,
8L, 17L, 7L, 18L, 1L, 10L, 9L, 16L, 4L, 5L, 11L, 19L, 6L, 20L,
2L), .Label = c("Albert", "Catherine", "Charles",
"Cyril", "Damien",
"Daniel", "Elodie", "Emma", "Francois",
"Jean", "Jean-Michel",
"Jim", "John", "Julie", "Michel",
"Pierre", "Sandra", "Thierry",
"Vincent", "Yvan"), class = "factor"), Rapporteur1
= structure(c(3L,
1L, 3L, 5L, 1L, 5L, 1L, 2L, 5L, 4L, 2L, 4L, 2L, 3L, 5L, 4L, 4L,
2L, 3L, 1L), .Label = c("Rapporteur01", "Rapporteur02",
"Rapporteur03",
"Rapporteur04", "Rapporteur05"), class =
"factor"), Rapporteur2 =
structure(c(1L,
3L, 4L, 4L, 2L, 4L, 5L, 1L, 2L, 3L, 3L, 3L, 5L, 5L, 1L, 1L, 2L,
5L, 4L, 2L), .Label = c("Rapporteur01", "Rapporteur02",
"Rapporteur03",
"Rapporteur04", "Rapporteur05"), class =
"factor")), .Names =
c("Demandeur",
"Rapporteur1", "Rapporteur2"), row.names = c(NA, -20L),
class =
"data.frame")
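
The three rules above can be written down as a check on a candidate dfnew. This is only a sketch of a validator, not a method for building the assignment: the function name check_assignment and the toy data below (three requesters, three reviewers) are hypothetical and much smaller than the data posted here, and the balance rule is read as "largest and smallest count differ by at most 1".

```r
# Returns one TRUE/FALSE per rule for a candidate assignment.
check_assignment <- function(dfnew, Dem, Rap) {
  dem_row <- match(dfnew$Demandeur, Dem$Nom)    # row of each requester in Dem
  r1_row  <- match(dfnew$Rapporteur1, Rap$Nom)  # row of first reviewer in Rap
  r2_row  <- match(dfnew$Rapporteur2, Rap$Nom)  # row of second reviewer in Rap
  # Count against all reviewers so that an unused reviewer counts as 0.
  count <- function(x) table(factor(as.character(x), levels = Rap$Nom))
  c(
    distinct    = all(as.character(dfnew$Rapporteur1) !=
                      as.character(dfnew$Rapporteur2)),
    departement = all(Rap$Departement[r1_row] != Dem$Departement[dem_row]),
    unite       = all(Rap$Unite[r2_row] != Dem$Unite[dem_row]),
    balanced1   = diff(range(count(dfnew$Rapporteur1))) <= 1,
    balanced2   = diff(range(count(dfnew$Rapporteur2))) <= 1
  )
}

# Hypothetical toy data, not the data posted above.
Dem <- data.frame(Nom = c("A", "B", "C"),
                  Departement = c("D1", "D2", "D1"),
                  Unite = c("U1", "U2", "U3"),
                  stringsAsFactors = FALSE)
Rap <- data.frame(Nom = c("R1", "R2", "R3"),
                  Departement = c("D2", "D1", "D3"),
                  Unite = c("U2", "U9", "U1"),
                  stringsAsFactors = FALSE)

good <- data.frame(Demandeur   = c("A", "B", "C"),
                   Rapporteur1 = c("R1", "R2", "R3"),
                   Rapporteur2 = c("R2", "R3", "R1"),
                   stringsAsFactors = FALSE)

# Same assignment, except A's first reviewer is in A's own Departement.
bad <- good
bad$Rapporteur1[1] <- "R2"

good_result <- check_assignment(good, Dem, Rap)
bad_result  <- check_assignment(bad, Dem, Rap)
```

A checker like this could be paired with any search strategy (for example, repeated random draws that are kept only when every element of the result is TRUE); which strategy suits the real data is left open here.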


-- 
Michel ARNAUD
Cirad Montpellier

