Chris Beeley
2011-Aug-29 13:55 UTC
[R] Basic question about re-writing for loop as a function
Hello-
Sorry to ask a basic question, but I've spent many hours on this now
and seem to be missing something.
I have a loop that looks like this:
mainmat=data.frame(matrix(data=0, ncol=92, nrow=length(predata$Words_MH)))
for(i in 1:length(predata$Words_MH)){
  for(j in 1:92){
    mainmat[i,j]=ifelse(j %in%
      as.numeric(unlist(strsplit(predata$Words_MH[i], split=","))), 1, 0)
  }
}
What it's doing is creating a matrix with 92 columns, that's the
number of different codes, and then for every row of my data it looks
to see if the code (code 1, code 2, etc.) is in the string and if it
is, returns a 1 in the relevant column (column 1 for code 1, column 2
for code 2, etc.)
There are 1000 rows in the database, and I have to run several
versions of this code, so it just takes way too long. I have been
trying to rewrite it using lapply. I tried this:
myfunction=function(x, y) ifelse(x %in%
  as.numeric(unlist(strsplit(predata$Words_MH[y], split=","))), 1, 0)
for(j in 1:92){
  mainmat[,j]= lapply(predata$Words, myfunction)
}
but I don't think I can use something that takes two inputs, and I
can't seem to remove either.
Here's a dput of the first 10 rows of the variable in case that's
helpful:
predata$Words=c("1", "1", "1", "1",
"2,3,4", "5", "1", "1", "6",
"7,8,9,10")
Given these data, I want the function to return, for the first column,
1, 1, 1, 1, 0, 0, 1, 1, 0, 0 (because those are the values of Words
which contain a 1) and for the second column return 0, 0, 0, 0, 1, 0,
0, 0, 0, 0 (because the fifth value is the only one that contains a
2).
Any suggestions gratefully received!
Chris Beeley
Institute of Mental Health, UK
Patrick Burns
2011-Aug-29 16:51 UTC
[R] Basic question about re-writing for loop as a function
You are somewhere in Circles 3 and 4 of
'The R Inferno'.
If you have a function to apply over more
than one argument, then 'mapply' will do
that.
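As a rough sketch of what the mapply route could look like (the helper name `has_code`, the three-element `words` vector and the use of `expand.grid` are assumptions made for illustration, not part of the original code):

```r
words <- c("1", "2,3", "1")   # small example input

# Hypothetical helper: is `code` among the comma-separated codes in `string`?
has_code <- function(code, string) {
  as.numeric(code %in% as.numeric(strsplit(string, split = ",")[[1]]))
}

# All (row, code) combinations for 3 codes; `row` varies fastest
grid <- expand.grid(row = seq_along(words), code = 1:3)

# mapply walks the two vectors in parallel, calling has_code once per pair
hits <- mapply(has_code, grid$code, words[grid$row])

# hits is in column-major order, so it fills the matrix column by column
mat <- matrix(hits, nrow = length(words))
```

This still calls strsplit once per cell, so it is unlikely to be much faster than the nested loop; it only shows how a two-argument function can be applied.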
But you don't need to do that -- you can do
the operation you want efficiently:
*) create your resulting matrix with all zeros,
no reason for this to be a data frame, almost
surely.
mainmat <- matrix(0, ncol=92, nrow=...)
*) create a subscripting matrix giving the row
and column combinations to change to 1. Here is
a small example:
> ss <- strsplit(c("1", "2,3", "1"), split=",")
> sr <- rep(1:length(ss), sapply(ss, length))
> sr
[1] 1 2 2 3
> sc <- as.numeric(unlist(ss))
> sc
[1] 1 2 3 1
> mainmat[cbind(sr, sc)] <- 1
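Put together on the ten sample values from the original post, the two steps might look like the sketch below. The name `words` and the use of 10 columns instead of 92 are assumptions made to keep the example small.

```r
# Sample data from the original post (stands in for predata$Words_MH)
words <- c("1", "1", "1", "1", "2,3,4", "5", "1", "1", "6", "7,8,9,10")
ncodes <- 10   # assumption: 10 codes here; the real data uses 92

# Start with a plain numeric matrix of zeros
mainmat <- matrix(0, nrow = length(words), ncol = ncodes)

# Split every string once, then build (row, column) index pairs
ss <- strsplit(words, split = ",")
sr <- rep(seq_along(ss), sapply(ss, length))   # row number of each code
sc <- as.numeric(unlist(ss))                   # column number of each code

# A two-column matrix used as a subscript sets all listed cells at once
mainmat[cbind(sr, sc)] <- 1

mainmat[, 1]   # 1 1 1 1 0 0 1 1 0 0
mainmat[, 2]   # 0 0 0 0 1 0 0 0 0 0
```

The two printed columns match the output the original post asked for.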
--
Patrick Burns
pburns at pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')
Jim Holtman
2011-Sep-01 03:07 UTC
[R] Basic question about re-writing for loop as a function
Use Rprof to see where the time is spent. Take the strsplit out of the loop and do it once outside to create an object you can test against in the loop. You can probably get rid of the loop easily, but since there is no example of the data, it is hard to create a solution.
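The suggestion to split once outside the loop could be sketched as follows, using the sample values from the original post. The names `words`, `codes` and `ncodes`, and the choice of 10 columns rather than 92, are assumptions for illustration.

```r
# Sample data from the original post (stands in for predata$Words_MH)
words <- c("1", "1", "1", "1", "2,3,4", "5", "1", "1", "6", "7,8,9,10")
ncodes <- 10   # assumption: the real data has 92 codes

# Split and convert once, outside any loop
codes <- lapply(strsplit(words, split = ","), as.numeric)

mainmat <- matrix(0, nrow = length(words), ncol = ncodes)

# One loop over rows; %in% is vectorised over all codes, so no inner loop
for (i in seq_along(codes)) {
  mainmat[i, ] <- as.numeric(seq_len(ncodes) %in% codes[[i]])
}
```

To check where the time goes, the original loop could be run between `Rprof("prof.out")` and `Rprof(NULL)` and the result inspected with `summaryRprof("prof.out")`; the file name here is arbitrary.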
Jim Holtman
2011-Sep-01 03:18 UTC
[R] Basic question about re-writing for loop as a function
Sorry, I did not see your data at the bottom of the email.