Dear list,
Here's a little problem I already solved with my own coding style, but I
feel there is a more efficient and cleaner way to write it, but had no
success finding the "clever" solution.
I want to produce a factor from a subset of the combination of two
vectors. I have the vectors a and b in a data-frame:
> df <- expand.grid(a=c(0, 5, 10, 25, 50), b=c(0, 25, 50, 100, 200))
> df
a b
1 0 0
2 5 0
3 10 0
4 25 0
5 50 0
6 0 25
7 5 25
<snip>
and want to create a factor whose levels correspond to particular
combinations of a and b (let's say Low for a=0 & b=0, Medium for
a=10 & b=50, High for a=50 & b=200, all other combinations set to NA),
reading them from a data-frame which describes the desired subset and
the corresponding levels.
Here's my own solution (inputs are data-frames df and cas, output is the
sub factor):
> cas <- as.data.frame(matrix(c(0, 10, 50, 0, 50, 200), 3, 2,
+     dimnames=list(c("Low", "Medium", "High"), c("a", "b"))))
> cas
a b
Low 0 0
Medium 10 50
High 50 200
> sub <- character(length(df$a))
> for (i in 1:length(df$a)) {
+ temp <- rownames(cas)[cas$a==df$a[i] & cas$b==df$b[i]]
+ sub[i] <- ifelse(length(temp)>0, temp, NA)
+ }
> sub <- ordered(sub, levels=c("Low", "Medium", "High"))
> sub
 [1] Low    <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>
[10] <NA>   <NA>   <NA>   Medium <NA>   <NA>   <NA>   <NA>   <NA>
[19] <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   High
Levels: Low < Medium < High
I was looking for a vectorized solution (apply style) binding
data-frames df and cas, but didn't succeed in avoiding the for loop.
Could anybody shed some light on my ignorance? Thank you very much in
advance.
--
Ir. Yves BROSTAUX
Unité de Statistique et Informatique
Faculté universitaire des Sciences agronomiques de Gembloux (FUSAGx)
8, avenue de la Faculté
B-5030 Gembloux
Belgique
Tél: +32 81 62 24 69
Email: brostaux.y at fsagx.ac.be
Hi Yves,
Using your objects, here is a way:
> cascombo <- do.call("paste", c(cas, sep="."))
> factor(do.call("paste", c(df, sep=".")), levels=cascombo, labels=rownames(cas))
 [1] Low    <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>
[10] <NA>   <NA>   <NA>   Medium <NA>   <NA>   <NA>   <NA>   <NA>
[19] <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   High
Levels: Low Medium High
It uses:
- paste (sep=".") to create the combinations, i.e. 0.0, 10.50, etc.
- do.call to invoke paste on the columns of the data frames
- factor, specifying the existing levels (only those defined by the cas
  data frame) and labels
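The three steps above can be followed one at a time. This is only a sketch that rebuilds df and cas from the original post; the names key_df and key_cas are introduced here for illustration and are not part of Eric's reply.

```r
# Rebuild the objects from the original post.
df  <- expand.grid(a = c(0, 5, 10, 25, 50), b = c(0, 25, 50, 100, 200))
cas <- data.frame(a = c(0, 10, 50), b = c(0, 50, 200),
                  row.names = c("Low", "Medium", "High"))

# Step 1: build one key string per row. do.call() hands each column of the
# data frame to paste() as a separate argument, together with sep = ".".
key_df  <- do.call("paste", c(df,  sep = "."))
key_cas <- do.call("paste", c(cas, sep = "."))
key_cas            # "0.0" "10.50" "50.200"

# Step 2: keys that are not listed in 'levels' become NA;
# 'labels' renames the levels that are kept.
sub <- factor(key_df, levels = key_cas, labels = rownames(cas))
which(!is.na(sub)) # 1 13 25
```

One caveat, under the assumption that a and b might hold non-integer values: with sep = "." the keys can collide (1.5 and 2 give the same string as 1 and 5.2), so a separator that cannot occur in the values, such as "_", would be a safer choice in that case.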
Eric
At 10:12 30/11/2004, Yves Brostaux wrote:
><snip>
Eric Lecoutre
UCL / Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium
tel: (+32)(0)10473050
lecoutre at stat.ucl.ac.be
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre
If the statistics are boring, then you've got the wrong numbers. -Edward Tufte
Gabor Grothendieck
2004-Nov-30 11:55 UTC
[R] Creating a factor from a combination of vectors
Yves Brostaux <brostaux.y <at> fsagx.ac.be> writes:
: <snip>
Use interaction() and factor() like this:
factor(interaction(df), levels = c("0.0", "10.50", "50.200"),
       labels = c("Low", "Medium", "High"), ordered = TRUE)
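Since the original request was to read the combinations from the cas data frame, the hard-coded level strings could probably be derived from cas as well. The following is a sketch under that assumption; lev and sub2 are names made up for this example.

```r
# Rebuild the objects from the original post.
df  <- expand.grid(a = c(0, 5, 10, 25, 50), b = c(0, 25, 50, 100, 200))
cas <- data.frame(a = c(0, 10, 50), b = c(0, 50, 200),
                  row.names = c("Low", "Medium", "High"))

# interaction() accepts a list (hence a data frame) of vectors and joins
# their values with "." by default, giving one key per row of cas.
lev <- as.character(interaction(cas))

# Same call as above, with levels and labels both taken from cas.
sub2 <- factor(interaction(df), levels = lev,
               labels = rownames(cas), ordered = TRUE)
table(sub2, useNA = "ifany")
```

With this variant, adding or changing a row of cas changes the resulting factor without touching the call itself.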
Richard A. O'Keefe
2004-Dec-01 01:37 UTC
[R] Creating a factor from a combination of vectors
Yves Brostaux <brostaux.y at fsagx.ac.be> wrote:
I want to produce a factor from a subset of the combination of two
vectors. I have the vectors a and b in a data-frame:
> df <- expand.grid(a=c(0, 5, 10, 25, 50), b=c(0, 25, 50, 100, 200))
...
and want to create a factor which levels correspond to particular
combinations of a and b (let's say Low for a=0 & b=0, Medium for
a=10 & b=50, High for a=50 & b=200, others levels set to NA), reading
them from a data-frame which describes the desired subset and
corresponding levels.
Here's my own solution (inputs are data-frames df and cas, output is the
sub factor):
...
Why not do it the obvious way?
ifelse(a == 0 & b == 0, "Low",
ifelse(a == 10 & b == 50, "Medium",
ifelse(a == 50 & b == 200, "High",
"Other")))
gives you the mapping from vectors a and b to strings you want.
To get at the vectors locally, you need
with(df, ...)
To convert the vector of strings you get to an ordered factor,
with "Other" mapped to NA, just do
ordered(..., levels = c("Low","Medium","High"))
because any string not listed in levels= will be mapped to NA.
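A small illustration of that behaviour, using a made-up character vector x (not from the thread):

```r
# "Other" is not among the levels, so it becomes NA in the factor.
x <- c("Low", "Other", "High", "Medium")
f <- ordered(x, levels = c("Low", "Medium", "High"))
is.na(f)    # FALSE  TRUE FALSE FALSE
levels(f)   # "Low" "Medium" "High"
```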
Put these pieces together, and you get
output <- ordered(with(df,
    ifelse(a == 0 & b == 0, "Low",
    ifelse(a == 10 & b == 50, "Medium",
    ifelse(a == 50 & b == 200, "High",
    "Other")))),
    levels = c("Low","Medium","High"))
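As a sanity check, the nested ifelse() result can be compared with the loop from the original post. This is a sketch that rebuilds df and cas as posted; the loop is lightly restated (seq_len and a plain if/else) but is meant to do the same thing.

```r
# Rebuild the objects from the original post.
df  <- expand.grid(a = c(0, 5, 10, 25, 50), b = c(0, 25, 50, 100, 200))
cas <- data.frame(a = c(0, 10, 50), b = c(0, 50, 200),
                  row.names = c("Low", "Medium", "High"))

# Loop version, as in the original post.
sub <- character(nrow(df))
for (i in seq_len(nrow(df))) {
  temp <- rownames(cas)[cas$a == df$a[i] & cas$b == df$b[i]]
  sub[i] <- if (length(temp) > 0) temp else NA
}
sub <- ordered(sub, levels = c("Low", "Medium", "High"))

# Nested ifelse() version.
output <- ordered(with(df,
    ifelse(a == 0 & b == 0, "Low",
    ifelse(a == 10 & b == 50, "Medium",
    ifelse(a == 50 & b == 200, "High",
    "Other")))),
    levels = c("Low", "Medium", "High"))

# Compare the two ordered factors element by element and as objects.
identical(sub, output)
```

Note that the ifelse() version hard-codes the combinations, so it has to be edited by hand whenever cas changes.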