When I just run a for loop, it works. But if I am going to run a for loop every time for large vectors, I might as well use C or any other language. Isn't the reason R is powerful that it can handle large vectors without each element being manipulated individually? Please let me know where I am wrong.

    for (i in 1:length(news1o)) {
      if (news1o[i] > s2o[i])
        s[i] <- 1
      else
        s[i] <- -1
    }

-- 
'Raghu'
I don't know what is wrong with your code, but I believe you should use ifelse() instead of a for loop:

    s <- ifelse(news1o > s2o, 1, -1)

Alain

On 12-Jul-10 16:09, Raghu wrote:
> When I just run a for loop it works. But if I am going to run a for loop
> every time for large vectors I might as well use C or any other language.
> The reason R is powerful is because it can handle large vectors without each
> element being manipulated? Please let me know where I am wrong.
>
> for (i in 1:length(news1o)) {
>   if (news1o[i] > s2o[i])
>     s[i] <- 1
>   else
>     s[i] <- -1
> }

-- 
Alain Guillet
Statistician and Computer Scientist
SMCS - IMMAQ - Université catholique de Louvain
Bureau c.316
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium
tel: +32 10 47 30 50
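To see the ifelse() suggestion next to the original loop, here is a minimal sketch on made-up data. The values of news1o and s2o, and the names s_loop and s_vec, are assumptions for illustration only. The loop result is pre-allocated with numeric(), which the original post does not show.

```r
# Made-up inputs, chosen only for illustration
news1o <- c(3, 1, 4, 1, 5)
s2o    <- c(2, 7, 1, 8, 2)

# Loop from the original question (result vector allocated first)
s_loop <- numeric(length(news1o))
for (i in 1:length(news1o)) {
  if (news1o[i] > s2o[i]) s_loop[i] <- 1 else s_loop[i] <- -1
}

# Vectorised alternative: one call over the whole vectors
s_vec <- ifelse(news1o > s2o, 1, -1)

print(s_loop)
print(s_vec)
print(identical(s_loop, s_vec))
```

Both forms compare the vectors element by element. With ifelse() the comparison and the selection each happen in a single call over the whole vectors.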
Seeliger.Curt at epamail.epa.gov
2010-Jul-12 16:14 UTC
[R] in continuation with the earlier R puzzle
> The reason R is powerful is because it can handle large vectors without each
> element being manipulated? Please let me know where I am wrong.
>
> for (i in 1:length(news1o)) {
>   if (news1o[i] > s2o[i])
>     s[i] <- 1
>   else
>     s[i] <- -1
> }

You might give ifelse() a shot here.

    s <- ifelse(news1o > s2o, 1, -1)

Learning to think in vectors is important in R, just like thinking in sets is important for SQL, or thinking in rows and steps is important in SAS.

cur
-- 
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.curt@epa.gov
541/754-4638
On Jul 12, 2010, at 10:09 AM, Raghu wrote:
> When I just run a for loop it works. But if I am going to run a for loop
> every time for large vectors I might as well use C or any other language.
> The reason R is powerful is because it can handle large vectors without each
> element being manipulated? Please let me know where I am wrong.
>
> for (i in 1:length(news1o)) {
>   if (news1o[i] > s2o[i])
>     s[i] <- 1
>   else
>     s[i] <- -1
> }

Perhaps:

    s <- 2*( news1o > s2o[1:length(news1o)] ) - 1

...which I think will throw errors under pretty much the same conditions that would cause errors in that loop.

-- 
David Winsemius, MD
West Hartford, CT
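One condition worth checking by hand is a length mismatch. The sketch below uses toy data in which s2o is one element shorter than news1o. The data and the names s_vec, s_loop and loop_msg are assumptions for illustration. In this particular case the two forms do not fail in quite the same way: the loop stops at the first NA comparison, while the arithmetic form carries the NA through to the result.

```r
# Toy data (hypothetical): s2o is one element shorter than news1o
news1o <- c(0.9, 0.2, 0.5)
s2o    <- c(0.1, 0.7)

# Vectorised form from the post: indexing past the end of s2o yields NA
s_vec <- 2 * (news1o > s2o[1:length(news1o)]) - 1

# Loop form, wrapped in tryCatch() so the script keeps running after the error
s_loop <- numeric(length(news1o))
loop_msg <- tryCatch({
  for (i in 1:length(news1o)) {
    if (news1o[i] > s2o[i]) s_loop[i] <- 1 else s_loop[i] <- -1
  }
  "no error"
}, error = function(e) conditionMessage(e))

print(s_vec)     # third element is NA
print(loop_msg)  # message of the error raised by if() on an NA condition
```

If silent NA values would be a problem, a check such as stopifnot(length(news1o) == length(s2o)) before the vectorised line is one option.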
On 12-Jul-10 14:09:30, Raghu wrote:
> When I just run a for loop it works. But if I am going to
> run a for loop every time for large vectors I might as well
> use C or any other language.
> The reason R is powerful is because it can handle large vectors
> without each element being manipulated? Please let me know where
> I am wrong.
>
> for (i in 1:length(news1o)) {
>   if (news1o[i] > s2o[i])
>     s[i] <- 1
>   else
>     s[i] <- -1
> }
>
> --
> 'Raghu'

Many operations over the whole length of vectors can be done in "vectorised" form, in which an entire vector is changed in one operation based on the values of the separate elements of other vectors, all taken into account in a single operation. What happens "behind the scenes" is that the single element-by-element operations are performed by a function in a precompiled (usually from C) library. Hence R already does what you are suggesting as a "might as well" alternative!

Below is an example, using long vectors. The first case is a copy of your R loop above (with some additional initialisation of the vectors). The second achieves the same result in the "vectorised" form.

    news1o <- runif(1000000)
    s2o <- runif(1000000)
    s <- numeric(length(news1o))

    proc.time()
    #    user  system elapsed
    #   1.728   0.680 450.257
    for(i in 1:length(news1o)){        ### Using a loop
      if(news1o[i]>s2o[i])
        s[i]<- 1
      else
        s[i]<- (-1)
    }
    proc.time()
    #    user  system elapsed
    #  11.184   0.756 460.340

    s2 <- 2*(news1o > s2o) - 1         ### Vectorised
    proc.time()
    #    user  system elapsed
    #  11.348   0.852 460.663

    sum(s2 != s)
    # [1] 0                            ### Results identical

Result: The loop took (11.184 - 1.728) = 9.456 seconds.
Vectorised, it took (11.348 - 11.184) = 0.164 seconds.

Loop/Vector = (11.184 - 1.728)/(11.348 - 11.184) = 57.65854

i.e. nearly 60 times as long.

Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 12-Jul-10                                       Time: 17:36:07
------------------------------ XFMail ------------------------------
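The subtraction of successive proc.time() readings can also be done in code rather than by hand. The following is a rough sketch under assumptions: the vector length n is smaller than in the post so that it runs quickly, the seed is arbitrary, and the names t0, loop_time and vec_time are made up here. Actual timings will differ by machine and R version, so none are shown.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
n <- 1e5             # smaller than in the post, to keep the run short
news1o <- runif(n)
s2o    <- runif(n)

# Loop version, timed by differencing two proc.time() readings
s  <- numeric(n)
t0 <- proc.time()
for (i in 1:n) {
  if (news1o[i] > s2o[i]) s[i] <- 1 else s[i] <- -1
}
loop_time <- proc.time() - t0

# Vectorised version, timed the same way
t0 <- proc.time()
s2 <- 2 * (news1o > s2o) - 1
vec_time <- proc.time() - t0

print(loop_time[["elapsed"]])
print(vec_time[["elapsed"]])
print(sum(s2 != s))  # number of positions where the two results differ
```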
I wanted to point out one thing that Ted said, about initializing the vectors ('s' in your example). This can make a dramatic speed difference if you are using a for loop (the difference is negligible with vectorized computations). Also, a lot of benchmarks have been flying around, each from a different system and using random numbers without identical seeds. So to provide an overall comparison of all the methods I saw here, plus demonstrate the speed difference for initializing a vector (if you know its desired length in advance), I ran these benchmarks.

Notes: I did not want to interfere with your objects, so I used different names. The equivalencies are: news1o = x; s2o = y; s = z. system.time() automatically calculates the time difference from proc.time() between start and finish.

    > ##R version info
    > sessionInfo()
    R version 2.11.1 (2010-05-31)
    x86_64-pc-mingw32
    #snipped
    >
    > ##Some Sample Data
    > set.seed(10)
    > x <- rnorm(10^6)
    > set.seed(15)
    > y <- rnorm(10^6)
    >
    > ##Benchmark 1
    > z.1 <- NULL
    > system.time(for(i in 1:length(x)) {
    +   if(x[i] > y[i]) {
    +     z.1[i] <- 1
    +   } else {
    +     z.1[i] <- -1}
    + }
    + )
       user  system elapsed
    1303.83  174.24 1483.74
    >
    > ##Benchmark 2
    > #initialize 'z' at length
    > z.2 <- vector("numeric", length = 10^6)
    > system.time(for(i in 1:length(x)) {
    +   if(x[i] > y[i]) {
    +     z.2[i] <- 1
    +   } else {
    +     z.2[i] <- -1}
    + }
    + )
       user  system elapsed
       3.77    0.00    3.77
    >
    > ##Benchmark 3
    >
    > z.3 <- NULL
    > system.time(z.3 <- ifelse(x > y, 1, -1))
       user  system elapsed
       0.38    0.00    0.38
    >
    > ##Benchmark 4
    >
    > z.4 <- vector("numeric", length = 10^6)
    > system.time(z.4 <- ifelse(x > y, 1, -1))
       user  system elapsed
       0.31    0.00    0.31
    >
    > ##Benchmark 5
    >
    > system.time(z.5 <- 2*(x > y) - 1)
       user  system elapsed
       0.01    0.00    0.01
    >
    > ##Benchmark 6
    >
    > system.time(z.6 <- numeric(length(x))-1)
       user  system elapsed
          0       0       0
    > system.time(z.6[x > y] <- 1)
       user  system elapsed
       0.03    0.00    0.03
    >
    > ##Show that all results are identical
    >
    > identical(z.1, z.2)
    [1] TRUE
    > identical(z.1, z.3)
    [1] TRUE
    > identical(z.1, z.4)
    [1] TRUE
    > identical(z.1, z.5)
    [1] TRUE
    > identical(z.1, z.6)
    [1] TRUE

I have not replicated these on other systems, but tentatively, it appears that loops are significantly slower than ifelse(), which in turn is slower than options 5 and 6. However, when using the same test data and the same system, I did not find an appreciable difference between options 5 and 6 speed-wise.

Cheers,

Josh

On Mon, Jul 12, 2010 at 7:09 AM, Raghu <r.raghuraman at gmail.com> wrote:
> When I just run a for loop it works. But if I am going to run a for loop
> every time for large vectors I might as well use C or any other language.
> The reason R is powerful is because it can handle large vectors without each
> element being manipulated? Please let me know where I am wrong.
>
> for (i in 1:length(news1o)) {
>   if (news1o[i] > s2o[i])
>     s[i] <- 1
>   else
>     s[i] <- -1
> }
>
> --
> 'Raghu'

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
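The difference between Benchmarks 1 and 2 comes down to whether the result vector is grown inside the loop or allocated once beforehand. Here is a small sketch of the two patterns. The function names grow() and prealloc() are hypothetical, and n is far smaller than in the benchmarks above so that it finishes quickly. Timings depend heavily on the machine and the R version, so the gap may look quite different from the one reported for R 2.11.1.

```r
set.seed(10)         # seed chosen arbitrarily; n is much smaller than above
n <- 2e4
x <- rnorm(n)
y <- rnorm(n)

# Pattern of Benchmark 1: the result starts as NULL and grows by one each pass
grow <- function(x, y) {
  z <- NULL
  for (i in seq_along(x)) {
    z[i] <- if (x[i] > y[i]) 1 else -1
  }
  z
}

# Pattern of Benchmark 2: the result is allocated at full length up front
prealloc <- function(x, y) {
  z <- numeric(length(x))
  for (i in seq_along(x)) {
    z[i] <- if (x[i] > y[i]) 1 else -1
  }
  z
}

print(system.time(z_grow <- grow(x, y)))
print(system.time(z_pre  <- prealloc(x, y)))
print(identical(z_grow, z_pre))
```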
Hi

r-help-bounces at r-project.org wrote on 12.07.2010 16:09:30:
> When I just run a for loop it works. But if I am going to run a for loop
> every time for large vectors I might as well use C or any other language.
> The reason R is powerful is because it can handle large vectors without each
> element being manipulated? Please let me know where I am wrong.
>
> for (i in 1:length(news1o)) {
>   if (news1o[i] > s2o[i])
>     s[i] <- 1
>   else
>     s[i] <- -1
> }
>
> --
> 'Raghu'

Think in R, not in C. Why use loops when you can use the whole object directly? It is like drinking beer from snifters: it is possible, but using pints is preferable and more convenient.

news1o > s2o gives you a logical vector of the same length, and you can use it directly for further selection or computation. You can treat FALSE as 0 and TRUE as 1 and use it as a numeric vector, so

    x <- runif(10)
    y <- runif(10)
    c(-1,1)[(x>y)+1]

selects -1 when FALSE and 1 when TRUE.

Or you can use it in a mathematical operation directly:

    (x>y)*2-1

Regards
Petr
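The two tricks are easier to follow with the intermediate steps written out on fixed values instead of runif(). In this sketch the numbers and the names cmp, idx, by_index and by_arith are assumptions for illustration.

```r
# Made-up values so the result is easy to check by eye
x <- c(0.2, 0.8, 0.5)
y <- c(0.6, 0.3, 0.5)

cmp <- x > y                 # logical vector, one element per pair
idx <- cmp + 1               # FALSE becomes 1, TRUE becomes 2

by_index <- c(-1, 1)[idx]    # index 1 picks -1, index 2 picks 1
by_arith <- cmp * 2 - 1      # FALSE: 0*2-1 = -1, TRUE: 1*2-1 = 1

print(cmp)
print(by_index)
print(by_arith)
```

Note that the tie in the third position (0.5 against 0.5) compares as FALSE and so maps to -1, just as it would in the original loop.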