Hi, I have the following problem: I have a large data frame in which each row is specified by two numerical values: the first is in 1:25 and the second is a large specific number (e.g. 203043). I also have a list of 60 categories. Each category is likewise assigned to one value of the first variable (1:25), but has a range for the second value (e.g. 200020 - 208040), stored in two different columns. I now want to assign a category to each row in a new variable by testing whether the first value matches and whether the second value falls within the range. For example: entry1 has numerical value 1 = 15 and numerical value 2 = 200050. It would be assigned to category3, which has numerical value 1 = 15 and a range for numerical value 2 of 200000 - 201000.

It would be great if anyone could help me out with this. Thanks!

-- 
View this message in context: http://r.789695.n4.nabble.com/Assigning-entries-to-categories-tp2272697p2272697.html
Sent from the R help mailing list archive at Nabble.com.
As requested, here is some example data:

a=c("x","y","z")
b=c(1,5,8)
c=c(200010,535388,19929)
data=data.frame(a,b,c)

d=c("cat1","cat2","cat3")
b1=c(1,5,8)
c_start=c(200000,500000,600000)
c_stop=c(201000,550000,700000)
category=data.frame(d,b1,c_start,c_stop)

I want to add a variable to data which, in this case, assigns "cat1" to "x" and "cat2" to "y", and leaves "z" unassigned. So for each row it should first test whether b == b1, and if this is true it should test whether c >= c_start and c < c_stop. If all of this is true, the value of d should be transferred into the new variable.
On Thu, 1 Jul 2010, LogLord wrote:

> As requested, here is some example data:
> [stuff deleted]
> I want to add a variable into data, which assigns in this case to "x"
> "cat1", "y" "cat2" and leaves "z" unassigned. So first it should test if
> b == b1 for each row and if this is true it should test if c >= c_start
> and < c_stop. If this is all true the value of d should be transfered into
> the new variable.

Like this?

> data$new.var <- category$d
> is.na( data$new.var ) <- with(data, b != category$b1 | c < category$c_start | c > category$c_stop )
> data
  a b      c new.var
1 x 1 200010    cat1
2 y 5 535388    cat2
3 z 8  19929    <NA>

You may want to read up on ?match and ?merge in case the rows of data and those of category are not in one-to-one correspondence.

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
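When each value of b1 identifies at most one category, as in this example data, the ?match hint can be turned into a short sketch: match() finds the category row for each b, and the range test then blanks out rows whose c falls outside [c_start, c_stop). Unlike the code above, this does not require the rows of the two data frames to line up. The object name dat and the use of stringsAsFactors = FALSE are illustrative choices, not taken from the thread.

```r
# Example data from the thread; 'dat' is an illustrative name used instead of 'data'
dat <- data.frame(a = c("x", "y", "z"),
                  b = c(1, 5, 8),
                  c = c(200010, 535388, 19929),
                  stringsAsFactors = FALSE)
category <- data.frame(d = c("cat1", "cat2", "cat3"),
                       b1 = c(1, 5, 8),
                       c_start = c(200000, 500000, 600000),
                       c_stop = c(201000, 550000, 700000),
                       stringsAsFactors = FALSE)

# For each row of dat, the position of the first category with b1 == b (NA if none)
idx <- match(dat$b, category$b1)

# TRUE when a category was found and c lies in [c_start, c_stop);
# rows with no matching b1 give FALSE because FALSE & NA is FALSE
in_range <- !is.na(idx) &
  dat$c >= category$c_start[idx] &
  dat$c < category$c_stop[idx]

# Copy the label, then blank out rows that fail the range test
dat$new.var <- category$d[idx]
dat$new.var[!in_range] <- NA
dat
```

Because match() returns only the first hit, this sketch is not suitable when several categories share the same b1 value, as in the original 60-category problem.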
Thanks for your help! You are right: it is not a one-to-one assignment (that would indeed be very easy). It is more like assigning 1000 entries to 60 categories.

Unfortunately, ?match and ?merge did not help me much; I am a newbie to this kind of programming in R. It would be great if you could help me again to set this up.
On Sat, 3 Jul 2010, LogLord wrote:

> Thanks for your help!
> You are right it is not one-to-one assigned that would be indeed very
> easy... its more like assigning 1000 entries to 60 categories...
>
> Unfortunately, the ?match and ?merge did not help me a lot... I am a newbie
> to such programming stuff in R.
>
> It would be great if you could help me again to set this up.

Then you need to observe this:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

If you provide a _reproducible example_ that properly mimics the features of the problem you need to solve, the chance that someone will either solve it for you or point you in the right direction will be better.

[stuff deleted]

Charles C. Berry
OK, thanks for the help! Here is a more complex example:

a=c("x","y","z")
b=c(8,14,19)
c=c(200010,535388,19929)
data=data.frame(a,b,c)

d=c("cat1","cat2","cat3","cat4","cat5","cat6")
b1=c(14,5,8,20,19,1)
c_start=c(500000,500000,200000,200000,18000,600000)
c_stop=c(550000,550000,201000,201000,20000,700000)
category=data.frame(d,b1,c_start,c_stop)

Again, I want to create a new variable that automatically assigns a category to each row of data, based on b == b1 and c >= c_start and c <= c_stop.

I hope this explains my problem more explicitly. Thanks!
On Jul 5, 2010, at 8:54 AM, LogLord wrote:

> OK, thanks for the help!
>
> Here a more complex example:
> [stuff deleted]
> Again I want to create a new variable, which automatically assigns the
> category to the data based on matching b = b1 and c >= c_start and
> <=c_stop.

Probably not the most elegant solution. For each data row, see which one or more rows of category satisfy the conditions. Not tested for the possibility of a non-hit:

> for (i in 1:nrow(data)) print( category[ which(apply(category[, -1], 1,
+     function(x) {data$b[i]==x[1] & data$c[i] >= x[2] & x[3] >= data$c[i]})), 1] )
[1] cat3
Levels: cat1 cat2 cat3 cat4 cat5 cat6
[1] cat1
Levels: cat1 cat2 cat3 cat4 cat5 cat6
[1] cat5
Levels: cat1 cat2 cat3 cat4 cat5 cat6

A couple of points. It is bad practice to name variables or objects "c", and also bad practice to name objects "data". Both are common R function names.

David Winsemius, MD
West Hartford, CT
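The loop above prints a factor for each row and, as noted, has not been checked for rows with no matching category. Below is a minimal sketch of the same per-row search that returns NA for a non-hit and stores the labels in a new column. The helper find_category is hypothetical, and taking the first hit when several categories qualify is an assumption; the objects are renamed dat and category here, following the naming advice above.

```r
dat <- data.frame(a = c("x", "y", "z"),
                  b = c(8, 14, 19),
                  c = c(200010, 535388, 19929),
                  stringsAsFactors = FALSE)
category <- data.frame(d = c("cat1", "cat2", "cat3", "cat4", "cat5", "cat6"),
                       b1 = c(14, 5, 8, 20, 19, 1),
                       c_start = c(500000, 500000, 200000, 200000, 18000, 600000),
                       c_stop = c(550000, 550000, 201000, 201000, 20000, 700000),
                       stringsAsFactors = FALSE)

# Hypothetical helper: label of the first category whose b1 equals b and whose
# [c_start, c_stop] range contains cval; NA_character_ when nothing matches
find_category <- function(b, cval, cats) {
  hit <- which(cats$b1 == b & cats$c_start <= cval & cval <= cats$c_stop)
  if (length(hit) == 0L) NA_character_ else cats$d[hit[1L]]
}

# vapply() guarantees exactly one character label per row of dat
dat$new.var <- vapply(seq_len(nrow(dat)),
                      function(i) find_category(dat$b[i], dat$c[i], category),
                      character(1))
dat
```

For example, find_category(8, 19929, category) returns NA, because the only category with b1 == 8 (cat3) covers 200000 to 201000.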
On Mon, Jul 5, 2010 at 8:54 AM, LogLord <nils.schoof at web.de> wrote:

> OK, thanks for the help!
>
> Here a more complex example:
> [stuff deleted]
> Again I want to create a new variable, which automatically assigns the
> category to the data based on matching b = b1 and c >= c_start and
> <=c_stop.

Try this:

> library(sqldf)
> sqldf("select data.*, d from data, category where data.b = category.b1 and c >= c_start and c <= c_stop")
  a  b      c    d
1 x  8 200010 cat3
2 y 14 535388 cat1
3 z 19  19929 cat5
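The SQL query above is an inner join, so a row of data with no matching category would be dropped from the result rather than shown with a missing d. For readers who prefer base R, a rough equivalent uses merge(): join on b == b1, keep the pairs whose c lies in [c_start, c_stop], and left-join the hits back so unmatched rows get NA. The row_id column is a hypothetical helper added so that rows are identified uniquely; note that a row falling into several categories would appear once per category.

```r
dat <- data.frame(a = c("x", "y", "z"),
                  b = c(8, 14, 19),
                  c = c(200010, 535388, 19929),
                  stringsAsFactors = FALSE)
category <- data.frame(d = c("cat1", "cat2", "cat3", "cat4", "cat5", "cat6"),
                       b1 = c(14, 5, 8, 20, 19, 1),
                       c_start = c(500000, 500000, 200000, 200000, 18000, 600000),
                       c_stop = c(550000, 550000, 201000, 201000, 20000, 700000),
                       stringsAsFactors = FALSE)

# Hypothetical row identifier so hits can be joined back unambiguously
dat$row_id <- seq_len(nrow(dat))

# Every (row, category) pair that agrees on the first value: b == b1
candidates <- merge(dat, category, by.x = "b", by.y = "b1")

# Keep only the pairs whose c lies inside [c_start, c_stop]
hits <- candidates[candidates$c >= candidates$c_start &
                     candidates$c <= candidates$c_stop,
                   c("row_id", "d")]

# Left join back onto dat: rows without a hit get d = NA
result <- merge(dat, hits, by = "row_id", all.x = TRUE)
result
```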