Dimitri Liakhovitski
2013-Jan-29 21:11 UTC
[R] Fastest way to compare a single value with all values in one column of a data frame
Hello!

I have a large data frame x:

  x <- data.frame(item=letters[1:5], a=1:5, b=11:15)  # in actuality, x has 1000 rows
  x$item <- as.character(x$item)

I also have a small data frame y with just 1 row:

  y <- data.frame(item="f", a=3, b=10)
  y$item <- as.character(y$item)

I have to decide if y$a is larger than the smallest of all the values in x$a. If it is, I want y to replace the whole row in x that has the lowest value in column a. This is how I'd do it:

  if (y$a > min(x$a)) {
    whichmin <- which(x$a == min(x$a))
    x[whichmin, ] <- y[1, ]
  }

I am wondering if there is a faster way of doing it. What would be the fastest possible way? I'd have to do it, unfortunately, many-many times.

Thank you very much!

--
Dimitri Liakhovitski
gfk.com <http://marketfusionanalytics.com/>
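A minimal sketch of the step described in the question, assuming ties for the minimum can be ignored: which.min() returns the position of the first minimum in one call, so the separate min() and which(... == ...) lookups are not needed. The helper name replace_min_row is hypothetical, not something from the thread, and unlike which(x$a == min(x$a)) it replaces only the first minimal row.

```r
# Toy data from the question
x <- data.frame(item = letters[1:5], a = 1:5, b = 11:15,
                stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)

# Hypothetical helper: replace the first row holding min(x$a) with y,
# but only when y$a is strictly larger than that minimum.
replace_min_row <- function(x, y) {
  i <- which.min(x$a)          # position of the first minimum
  if (y$a > x$a[i]) {
    x[i, ] <- y[1, ]
  }
  x
}

x <- replace_min_row(x, y)
x$item
```

Whether this is measurably faster than the original on 1000 rows would need to be timed on the real data.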
nalluri pratap
2013-Jan-30 12:32 UTC
[R] Fastest way to compare a single value with all values in one column of a data frame
Hi Dimitri,

Does this help?

  k1 <- data.frame(item=sample(letters, 10, replace=TRUE), a=1:10, b=11:20)
  k2 <- data.frame(item="f", a=3, b=10)

  # note: this definition masks base::merge
  merge <- function(y, x) {
    if (y$a > min(x$a)) {
      x <- rbind(x, y)             # append y as a new last row
      x <- x[-which.min(x$a), ]    # drop the first row with the smallest a
    }
    return(x)
  }
  merge(k2, k1)

Or a much faster way would be to use library(sqldf).
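Since the update has to be repeated many times, one option worth benchmarking (an assumption, not something measured in this thread) is to keep the columns as plain vectors while looping and rebuild the data frame once at the end, which avoids repeated data frame row assignment. The candidates data below are made up for illustration.

```r
x <- data.frame(item = letters[1:5], a = 1:5, b = 11:15,
                stringsAsFactors = FALSE)

# Made-up stream of one-row candidates, stored together for the example
candidates <- data.frame(item = c("f", "g", "h"),
                         a = c(3, 0, 4),
                         b = c(10, 9, 8),
                         stringsAsFactors = FALSE)

# Work on plain vectors inside the loop
item <- x$item
a <- as.numeric(x$a)
b <- as.numeric(x$b)

for (k in seq_len(nrow(candidates))) {
  i <- which.min(a)                  # position of the current minimum
  if (candidates$a[k] > a[i]) {      # replace only if the candidate is larger
    item[i] <- candidates$item[k]
    a[i] <- candidates$a[k]
    b[i] <- candidates$b[k]
  }
}

# Rebuild the data frame once, after all updates
x <- data.frame(item = item, a = a, b = b, stringsAsFactors = FALSE)
x
```

Like which.min() itself, this replaces only the first minimal row on each pass.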
Jessica Streicher
2013-Jan-30 12:38 UTC
[R] Fastest way to compare a single value with all values in one column of a data frame
If you wanted this for all values in x that are smaller, I'd use

  x[x$a < y$a, ] <- y

For just the smallest:

  x[intersect(which(x$a < y$a), which.min(x$a)), ] <- y
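To make the difference between the two variants concrete, here is a small sketch that runs both on the toy data from the original question. The names x_all and x_min are only used here to keep the two results apart.

```r
x <- data.frame(item = letters[1:5], a = 1:5, b = 11:15,
                stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)

# Variant 1: replace every row whose a is smaller than y$a
# (rows 1 and 2 here; the single row of y is recycled)
x_all <- x
x_all[x_all$a < y$a, ] <- y

# Variant 2: replace only the row holding the minimum,
# and only if that minimum is smaller than y$a (row 1 here)
x_min <- x
x_min[intersect(which(x_min$a < y$a), which.min(x_min$a)), ] <- y

x_all$item   # "f" "f" "c" "d" "e"
x_min$item   # "f" "b" "c" "d" "e"
```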
arun
2013-Jan-30 14:22 UTC
[R] Fastest way to compare a single value with all values in one column of a data frame
Hi,

I guess you could also use:

  x[match(min(x$a), x$a[x$a < y$a]), ] <- y
  x
  #  item a  b
  #1    f 3 10
  #2    b 2 12
  #3    c 3 13
  #4    d 4 14
  #5    e 5 15

A.K.
arun
2013-Jan-30 16:03 UTC
[R] Fastest way to compare a single value with all values in one column of a data frame
Hi,

Sorry, my previous solution doesn't work. This should work for your dataset:

  set.seed(1851)
  x <- data.frame(item=sample(letters[1:5],20,replace=TRUE), a=sample(1:15,20,replace=TRUE), b=sample(20:30,20,replace=TRUE), stringsAsFactors=FALSE)
  y <- data.frame(item="f", a=3, b=10, stringsAsFactors=FALSE)
  x[x$a %in% which.min(x[x$a<y$a,]$a), ] <- y

  # if there are multiple minimum values
  set.seed(1241)
  x1 <- data.frame(item=sample(letters[1:10],1e4,replace=TRUE), a=sample(1:30,1e4,replace=TRUE), b=sample(1:100,1e4,replace=TRUE), stringsAsFactors=FALSE)
  y1 <- data.frame(item="f", a=3, b=10, stringsAsFactors=FALSE)
  length(x1$a[x1$a==1])
  #[1] 330
  system.time({x1[x1$a %in% which.min(x1[x1$a<y1$a,]$a), ] <- y1})
  #   user  system elapsed
  #  0.000   0.000   0.001
  length(x1$a[x1$a==1])
  #[1] 0

  # For some reason, it does not work when the number of tied minimum values exceeds some value
  set.seed(1241)
  x1 <- data.frame(item=sample(letters[1:10],1e5,replace=TRUE), a=sample(1:30,1e5,replace=TRUE), b=sample(1:100,1e5,replace=TRUE), stringsAsFactors=FALSE)
  y1 <- data.frame(item="f", a=3, b=10, stringsAsFactors=FALSE)
  length(x1$a[x1$a==1])
  #[1] 3404
  x1[x1$a %in% which.min(x1[x1$a<y1$a,]$a), ] <- y1
  length(x1$a[x1$a==1])
  #[1] 3404  # not getting replaced

  # However, if I try:
  set.seed(1241)
  x1 <- data.frame(item=sample(letters[1:10],1e6,replace=TRUE), a=sample(1:5000,1e6,replace=TRUE), b=sample(1:100,1e6,replace=TRUE), stringsAsFactors=FALSE)
  y1 <- data.frame(item="f", a=3, b=10, stringsAsFactors=FALSE)
  length(x1$a[x1$a==1])
  #[1] 208
  system.time(x1[x1$a %in% which.min(x1[x1$a<y1$a,]$a), ] <- y1)
  #   user  system elapsed
  #  0.124   0.016   0.138
  length(x1$a[x1$a==1])
  #[1] 0

  # Tried Jessica's solution:
  set.seed(1851)
  x <- data.frame(item=sample(letters[1:5],20,replace=TRUE), a=sample(1:15,20,replace=TRUE), b=sample(20:30,20,replace=TRUE), stringsAsFactors=FALSE)
  y <- data.frame(item="f", a=3, b=10, stringsAsFactors=FALSE)
  x[intersect(which(x$a < y$a), which.min(x$a)), ] <- y
  x
  #   item  a  b
  #1     a  8 25
  #2     a 10 26
  #3     f  3 10  # replaced
  #4     e 15 26
  #5     b 13 20
  #6     a  5 23
  #7     d  4 29
  #8     e  2 24
  #9     c  7 30
  #10    e 14 24
  #11    d  2 20
  #12    e 10 21
  #13    c 13 27
  #14    d 12 23
  #15    b 11 26
  #16    e  5 22
  #17    c  1 26  # it is not replaced
  #18    a  8 21
  #19    e 10 26
  #20    c  2 22

A.K.
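One thing to keep in mind when reading the attempts above: which.min() returns a position within the vector it is given, not a value of a and not a row number of the full data frame. The sketch below, on small made-up data, shows that distinction and one possible way to replace every row tied for the minimum with a logical mask. It is an illustration under those assumptions, not a diagnosis of the timings reported in the thread.

```r
# Small made-up data with a tied minimum (a == 1 in rows 2 and 3)
x <- data.frame(item = c("a", "b", "c", "d"),
                a = c(2, 1, 1, 5),
                b = c(21, 22, 23, 24),
                stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)

# which.min() on the filtered vector c(2, 1, 1) gives position 2,
# which is neither the minimum value (1) nor a row index of x.
pos_in_subset <- which.min(x$a[x$a < y$a])

# Mask-based replacement: every row tied for the minimum is replaced,
# with the single row of y recycled across them.
m <- min(x$a)
if (y$a > m) {
  hit <- x$a == m
  x[hit, ] <- y
}
x
```

This follows the original question's which(x$a == min(x$a)) semantics, where all tied rows are replaced.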