thr3ads.net - R help - [R] vectorization of loops in R [Nov 2021]

If this information is useful, please help other people find it:
Share via:

Luigi Marongiu

2021-Nov-17 13:20 UTC

[R] vectorization of loops in R

Hello,
I have a dataframe with 3 variables. I want to loop through it to get
the mean value of the variable `z`, as follows:
```
df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
y = rep(letters[1:5],3),
z = rnorm(15),
stringsAsFactors = FALSE)
m = vector()
for (i in unique(df$y)) {
s = df[df$y == i,]
m = append(m, mean(s$z))
}
names(m) = unique(df$y)> (m)a          b          c          d          e
-0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
```
The problem is that I have one million `y` values, so the work takes
almost a day. I understand that vectorization will speed up the
procedure. But how shall I write the procedure in vectorial terms?
Thank you

Kevin Thorpe

2021-Nov-17 13:28 UTC

head link

[R] vectorization of loops in R

If I follow what you are trying to do, you want the mean of z for each value of
y.

tapply(df$z, df$y, mean)

> On Nov 17, 2021, at 8:20 AM, Luigi Marongiu <marongiu.luigi at
gmail.com> wrote:
> 
> Hello,
> I have a dataframe with 3 variables. I want to loop through it to get
> the mean value of the variable `z`, as follows:
> ```
> df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
> y = rep(letters[1:5],3),
> z = rnorm(15),
> stringsAsFactors = FALSE)
> m = vector()
> for (i in unique(df$y)) {
> s = df[df$y == i,]
> m = append(m, mean(s$z))
> }
> names(m) = unique(df$y)
>> (m)
> a          b          c          d          e
> -0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
> ```
> The problem is that I have one million `y` values, so the work takes
> almost a day. I understand that vectorization will speed up the
> procedure. But how shall I write the procedure in vectorial terms?
> Thank you
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael?s Hospital
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016

Jan van der Laan

2021-Nov-17 13:32 UTC

head link

[R] vectorization of loops in R

Have a look at the base functions tapply and aggregate.

For example see:
- 
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#The-function-tapply_0028_0029-and-ragged-arrays
,
- https://online.stat.psu.edu/stat484/lesson/9/9.2,
- or ?tapply and ?aggregate.

Also your current code seems to contain an error: `s = df[df$y == i,]` 
should be `s = df$z[df$y == i]` I think.

HTH,
Jan






On 17-11-2021 14:20, Luigi Marongiu wrote:> Hello,
> I have a dataframe with 3 variables. I want to loop through it to get
> the mean value of the variable `z`, as follows:
> ```
> df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
> y = rep(letters[1:5],3),
> z = rnorm(15),
> stringsAsFactors = FALSE)
> m = vector()
> for (i in unique(df$y)) {
> s = df[df$y == i,]
> m = append(m, mean(s$z))
> }
> names(m) = unique(df$y)
>> (m)
> a          b          c          d          e
> -0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
> ```
> The problem is that I have one million `y` values, so the work takes
> almost a day. I understand that vectorization will speed up the
> procedure. But how shall I write the procedure in vectorial terms?
> Thank you
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

PIKAL Petr

2021-Nov-18 11:39 UTC

head link

[R] vectorization of loops in R

Hi

above tapply and aggregate, split *apply could be used)

sapply(with(df, split(z, y)), mean)

Cheers
Petr
> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Luigi
Marongiu
> Sent: Wednesday, November 17, 2021 2:21 PM
> To: r-help <r-help at r-project.org>
> Subject: [R] vectorization of loops in R
> 
> Hello,
> I have a dataframe with 3 variables. I want to loop through it to get
> the mean value of the variable `z`, as follows:
> ```
> df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
> y = rep(letters[1:5],3),
> z = rnorm(15),
> stringsAsFactors = FALSE)
> m = vector()
> for (i in unique(df$y)) {
> s = df[df$y == i,]
> m = append(m, mean(s$z))
> }
> names(m) = unique(df$y)
> > (m)
> a          b          c          d          e
> -0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
> ```
> The problem is that I have one million `y` values, so the work takes
> almost a day. I understand that vectorization will speed up the
> procedure. But how shall I write the procedure in vectorial terms?
> Thank you
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

R help - Nov 2021 - vectorization of loops in R

[R] vectorization of loops in R

[R] vectorization of loops in R

[R] vectorization of loops in R

[R] vectorization of loops in R