Hi:
Here's one approach, although I imagine there are more efficient ways.
# A function to strip the spaces from a string and return its first three
# characters (this assumes each item in 'product' is a single character):
keyset <- function(x) substr(gsub(' ', '', x)[1], 1, 3)
# Apply the function to each product to generate the key:
a$key <- sapply(a$product, keyset)
> a
      date         product sales key
1 20081201       a b c d e     1 abc
2 20081202     a b c g h t     2 abc
3 20081201 d e h a c e h g     3 deh
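The keyset() function above only works when every item is a single character, and its key drops the spaces ("abc" rather than the "a b c" asked for in the question). A minimal sketch of a vectorized alternative, assuming items are separated by whitespace; first3() is a hypothetical helper name, not part of base R:

```r
# Toy data from the original question (character column for 'product').
a <- data.frame(date = c(20081201, 20081202, 20081201),
                product = c("a b c d e", "a b c g h t", "d e h a c e h g"),
                sales = c(1, 2, 3),
                stringsAsFactors = FALSE)

# Hypothetical helper: keep the first three whitespace-separated items of
# each string, joined by single spaces. Strings with fewer than three items
# are returned whole (after whitespace normalisation).
first3 <- function(x) {
  # Trim the ends and collapse runs of whitespace to a single space.
  x <- gsub("\\s+", " ", trimws(as.character(x)), perl = TRUE)
  # Capture up to two "item " prefixes plus one more item; drop the rest.
  sub("^((\\S+ ){0,2}\\S+).*$", "\\1", x, perl = TRUE)
}

# gsub() and sub() are vectorized, so no sapply() is needed.
a$key <- first3(a$product)
a$key
# [1] "a b c" "a b c" "d e h"
```

Because the key keeps its spaces, multi-character items such as "ab cd ef gh" map to "ab cd ef" instead of being truncated to "abc".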
# Use aggregate to sum sales by key:
aggregate(sales ~ key, data = a, FUN = sum)
  key sales
1 abc     3
2 deh     3
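The strsplit()/paste() route mentioned in the question is also fairly short when written with vapply(). A minimal sketch, assuming whitespace-separated items and the toy data from the question; unlike keyset(), it keeps the spaces in the key:

```r
# Toy data from the original question (character column for 'product').
a <- data.frame(date = c(20081201, 20081202, 20081201),
                product = c("a b c d e", "a b c g h t", "d e h a c e h g"),
                sales = c(1, 2, 3),
                stringsAsFactors = FALSE)

# Split each product on runs of whitespace, keep at most the first three
# items, and glue them back together with single spaces.
items <- strsplit(trimws(a$product), "\\s+")
a$key <- vapply(items, function(v) paste(head(v, 3), collapse = " "),
                character(1))

# Sum sales within each key using the formula interface of aggregate().
res <- aggregate(sales ~ key, data = a, FUN = sum)
print(res)
```

With this data the result has two rows, "a b c" (sales 1 + 2 = 3) and "d e h" (sales 3). vapply() is used instead of sapply() so the result is guaranteed to be a character vector of length one per product.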
HTH,
Dennis
On Wed, Mar 9, 2011 at 6:02 PM, Hui Du <Hui.Du@dataventures.com> wrote:
>
> Hi All,
>
> I have a data frame like
>
> a = data.frame(date = c(20081201, 20081202, 20081201),
>                product = c("a b c d e", "a b c g h t", "d e h a c e h g"),
>                sales = c(1, 2, 3))
>
> Now I want to aggregate the sales by part of a$product.
> 'product' is the product name, a string of items separated by spaces. The
> key in my aggregate function is the first three items in the 'product'
> field. In my example, the keys are "a b c", "a b c" and "d e h",
> respectively. Do you know how to do it? I thought of an awkward way that
> needs several function calls (like strsplit, lapply, paste, etc.) to
> manipulate the string in the 'product' field. I guess there could be a
> more elegant way to do it.
>
> Thanks in advance.
>
>
> HXD
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>