thr3ads.net - R help - [R] Assigning entries to categories [Jun 2010]


LogLord

2010-Jun-30 10:15 UTC

[R] Assigning entries to categories

Hi,

I have the following problem:
I have a large data frame where each row is specified by two numerical values
(one in 1:25 and the other a large specific number, e.g. 203043). I have a
list of 60 categories, each of which is also assigned to one of the first
numerical values (1:25) but has a range for the second numerical value
(e.g. 200020 - 208040) given in two different columns.

I now want to assign a category to each row in a new variable by testing
whether the first numerical value agrees and whether the second numerical
value falls within the range.

For example:
entry1 has numerical value 1 = 15 and numerical value 2 = 200050.
This would be assigned to category3, which has a numerical value 1 = 15 and
a range for numerical value 2 = 200000 - 201000.

It would be great if anyone could help me out with this.

Thanks!



-- 
View this message in context:
http://r.789695.n4.nabble.com/Assigning-entries-to-categories-tp2272697p2272697.html
Sent from the R help mailing list archive at Nabble.com.

LogLord

2010-Jul-01 08:03 UTC


[R] Assigning entries to categories

As requested, here is some example data:
a=c("x","y","z")
b=c(1,5,8)
c=c(200010,535388,19929)
data=data.frame(a,b,c)

d=c("cat1","cat2","cat3")
b1=c(1,5,8)
c_start=c(200000,500000,600000)
c_stop=c(201000,550000,700000)
category=data.frame(d,b1,c_start,c_stop)

I want to add a variable to data which, in this case, assigns "cat1" to "x"
and "cat2" to "y", and leaves "z" unassigned. So first it should test if
b = b1 for each row and, if this is true, it should test if c >= c_start and
< c_stop. If this is all true, the value of d should be transferred into the
new variable.

Charles C. Berry

2010-Jul-01 16:51 UTC


[R] Assigning entries to categories

On Thu, 1 Jul 2010, LogLord wrote:
>
> As requested, here is some example data:
> a=c("x","y","z")
> b=c(1,5,8)
> c=c(200010,535388,19929)
> data=data.frame(a,b,c)
>
> d=c("cat1","cat2","cat3")
> b1=c(1,5,8)
> c_start=c(200000,500000,600000)
> c_stop=c(201000,550000,700000)
> category=data.frame(d,b1,c_start,c_stop)
>
> I want to add a variable into data, which assigns in this case to "x"
> "cat1", "y" "cat2" and leaves "z" unassigned. So first it should test
> if b = b1 for each row and if this is true it should test if
> c >= c_start and < c_stop. If this is all true the value of d should
> be transferred into the new variable.

Like this?

> data$new.var <- category$d
> is.na( data$new.var ) <- with(data, b != category$b1 | c < category$c_start | c > category$c_stop )
> data
  a b      c new.var
1 x 1 200010    cat1
2 y 5 535388    cat2
3 z 8  19929    <NA>

You may want to read up on

 	?match
and
 	?merge

in case the rows of data and those of category are not in 
one-to-one correspondence.
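
For readers following along, here is a minimal sketch of how ?merge and ?match might be combined when the rows do not line up. It reuses the example data from the thread. The object names entries, cats, pairs and hits, the deliberately reversed row order of cats, and the inclusive upper bound are assumptions made for illustration and are not part of the original reply. It also assumes that column a identifies each row uniquely and that at most one category matches an entry.

```r
# Example data from the thread, renamed to avoid masking data() and c()
entries <- data.frame(a = c("x", "y", "z"),
                      b = c(1, 5, 8),
                      c = c(200010, 535388, 19929))
cats <- data.frame(d = c("cat1", "cat2", "cat3"),
                   b1 = c(1, 5, 8),
                   c_start = c(200000, 500000, 600000),
                   c_stop = c(201000, 550000, 700000))
cats <- cats[3:1, ]  # assumption: reverse the rows so they no longer line up

# Pair each entry with every category that shares its first value
pairs <- merge(entries, cats, by.x = "b", by.y = "b1")

# Keep only the pairs whose second value lies inside the category range
hits <- pairs[pairs$c >= pairs$c_start & pairs$c <= pairs$c_stop, c("a", "d")]

# Look up each entry in the hits; entries without a hit become NA
entries$new.var <- hits$d[match(entries$a, hits$a)]
entries
```

With this data, "x" and "y" should receive cat1 and cat2 and "z" should stay NA, as in the reply above. If several categories can match one entry, match() keeps only the first hit, which may or may not be what is wanted.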


Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

LogLord

2010-Jul-03 08:29 UTC


[R] Assigning entries to categories

Thanks for your help!
You are right, it is not assigned one-to-one; that would indeed be very
easy... it's more like assigning 1000 entries to 60 categories...

Unfortunately, ?match and ?merge did not help me a lot... I am a newbie
to this kind of programming in R.

It would be great if you could help me again to set this up.

Charles C. Berry

2010-Jul-03 17:15 UTC


[R] Assigning entries to categories

On Sat, 3 Jul 2010, LogLord wrote:
>
> Thanks for your help!
> You are right it is not one-to-one assigned that would be indeed very
> easy... its more like assigning 1000 entries to 60 categories...
>
> Unfortunately, the ?match and ?merge did not help me a lot... I am a newbie
> to such programming stuff in R.
>
> It would be great if you could help me again to set this up.
Then you need to observe this:

 	   PLEASE do read the posting guide
            http://www.R-project.org/posting-guide.html and provide
            commented, minimal, self-contained, reproducible code.

If you provide a _reproducible example_ that properly mimics the features 
of the problem you need to solve, the chance that someone will either 
solve it for you or point you in the right direction will be better.


[stuff deleted]


LogLord

2010-Jul-05 12:54 UTC


[R] Assigning entries to categories

OK, thanks for the help!

Here is a more complex example:

a=c("x","y","z")
b=c(8,14,19)
c=c(200010,535388,19929)
data=data.frame(a,b,c)

d=c("cat1","cat2","cat3","cat4","cat5","cat6")
b1=c(14,5,8,20,19,1)
c_start=c(500000,500000,200000,200000,18000,600000)
c_stop=c(550000,550000,201000,201000,20000,700000)
category=data.frame(d,b1,c_start,c_stop) 


Again, I want to create a new variable which automatically assigns the
category to the data based on matching b = b1 and c >= c_start and
<= c_stop.

I hope this explains my problem more explicitly.

Thanks!

David Winsemius

2010-Jul-05 15:08 UTC


[R] Assigning entries to categories

On Jul 5, 2010, at 8:54 AM, LogLord wrote:
>
> OK, thanks for the help!
>
> Here a more complex example:
>
> a=c("x","y","z")
> b=c(8,14,19)
> c=c(200010,535388,19929)
> data=data.frame(a,b,c)
>
> d=c("cat1","cat2","cat3","cat4","cat5","cat6")
> b1=c(14,5,8,20,19,1)
> c_start=c(500000,500000,200000,200000,18000,600000)
> c_stop=c(550000,550000,201000,201000,20000,700000)
> category=data.frame(d,b1,c_start,c_stop)
>
>
> Again I want to create a new variable, which automatically assigns the
> category to the data based on matching b = b1 and c  >= c_start and
> <=c_stop.
>

Probably not the most elegant solution. For each data row, see which
one or more rows of category satisfy the conditions. Not tested for the
possibility of a non-hit:

 > for (i in 1:nrow(data)) print( category[
       which(apply(category[, -1], 1,
                   function(x) {data$b[i]==x[1] & data$c[i] > x[2] & x[3] > data$c[i]})),
       1] )
[1] cat3
Levels: cat1 cat2 cat3 cat4 cat5 cat6
[1] cat1
Levels: cat1 cat2 cat3 cat4 cat5 cat6
[1] cat5
Levels: cat1 cat2 cat3 cat4 cat5 cat6

A couple of points. It is bad practice to name variables or objects "c".
It is also bad practice to name objects "data". Both are common R
function names.
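
The loop above prints the matches instead of storing them. A minimal sketch of one way to write the result into a new column, with NA for a non-hit, might look like this. The names entries, cats and find_cat, the extra row "w" (added only to show a non-hit), and the inclusive bounds taken from the question are assumptions for illustration, not part of the original reply.

```r
# Data from the thread, with objects renamed so they do not mask data() and c()
entries <- data.frame(a = c("x", "y", "z", "w"),   # "w" is a hypothetical non-hit
                      b = c(8, 14, 19, 2),
                      c = c(200010, 535388, 19929, 1))
cats <- data.frame(d = c("cat1", "cat2", "cat3", "cat4", "cat5", "cat6"),
                   b1 = c(14, 5, 8, 20, 19, 1),
                   c_start = c(500000, 500000, 200000, 200000, 18000, 600000),
                   c_stop = c(550000, 550000, 201000, 201000, 20000, 700000),
                   stringsAsFactors = FALSE)

# Return the first category whose b1 equals `first` and whose range
# contains `second` (both bounds inclusive), or NA if there is none
find_cat <- function(first, second) {
  hit <- which(cats$b1 == first & cats$c_start <= second & second <= cats$c_stop)
  if (length(hit) == 0) NA_character_ else cats$d[hit[1]]
}

entries$new.var <- mapply(find_cat, entries$b, entries$c, USE.NAMES = FALSE)
entries$new.var  # cat3, cat1, cat5, NA
```

Taking hit[1] silently picks the first category when ranges overlap; whether that is acceptable depends on the real category table.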
> I hope this explains my problem more explicit.
>
> Thanks!
David Winsemius, MD
West Hartford, CT

Gabor Grothendieck

2010-Jul-05 15:20 UTC


[R] Assigning entries to categories

On Mon, Jul 5, 2010 at 8:54 AM, LogLord <nils.schoof at web.de> wrote:
>
> OK, thanks for the help!
>
> Here a more complex example:
>
> a=c("x","y","z")
> b=c(8,14,19)
> c=c(200010,535388,19929)
> data=data.frame(a,b,c)
>
> d=c("cat1","cat2","cat3","cat4","cat5","cat6")
> b1=c(14,5,8,20,19,1)
> c_start=c(500000,500000,200000,200000,18000,600000)
> c_stop=c(550000,550000,201000,201000,20000,700000)
> category=data.frame(d,b1,c_start,c_stop)
>
>
> Again I want to create a new variable, which automatically assigns the
> category to the data based on matching b = b1 and c >= c_start and
> <=c_stop.
>
Try this:

> library(sqldf)
>
> sqldf("select data.*, d from data, category where data.b = category.b1 and c >= c_start and c <= c_stop")
  a  b      c    d
1 x  8 200010 cat3
2 y 14 535388 cat1
3 z 19  19929 cat5
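
The query above returns only the rows that find a category. If unmatched rows should be kept with a missing category, a left join is one option. The following is a sketch under assumptions: the names entries and cats, the table aliases e and k, and the extra row "w" are hypothetical and were added for illustration, and the code only runs if the sqldf package is installed.

```r
# Data from the thread, renamed, plus one hypothetical row without a category
entries <- data.frame(a = c("x", "y", "z", "w"),
                      b = c(8, 14, 19, 2),
                      c = c(200010, 535388, 19929, 1))
cats <- data.frame(d = c("cat1", "cat2", "cat3", "cat4", "cat5", "cat6"),
                   b1 = c(14, 5, 8, 20, 19, 1),
                   c_start = c(500000, 500000, 200000, 200000, 18000, 600000),
                   c_stop = c(550000, 550000, 201000, 201000, 20000, 700000))

if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  # LEFT JOIN keeps every row of entries; d is NA where no category fits
  res <- sqldf("select e.a, e.b, e.c, k.d
                from entries e
                left join cats k
                  on e.b = k.b1 and e.c between k.c_start and k.c_stop
                order by e.a")
  print(res)
}
```

Here "w" should come back with d set to NA, while "x", "y" and "z" keep the categories shown above.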
