thr3ads.net - R help - [R] Accomplishing a loop on multiple columns [Jan 2012]

Nerak

2012-Jan-11 10:22 UTC

[R] Accomplishing a loop on multiple columns

Hello,
I have a question concerning for loops over multiple columns.
I made 91 columns of results (all created together in a for loop) and I
want to use lm to fit a model to each of them.
I want to compare each of these 91 calculated columns with one
column of observed values. I use the function lm to fit the model and
extract r.squared. I manage to do this for each column separately:

For example: my calculated results are in the data frame "results6", my
observed results in data (data$observed).
#To calculate R2 for column 1:
lm.modelobs1 <- lm(results6[,c(1)] ~ data$observed)
R2.1 <- summary(lm.modelobs1)["r.squared"]
#To calculate R2 for column 91:
lm.modelobs91 <- lm(results6[,c(91)] ~ data$observed)
R2.91 <- summary(lm.modelobs91)["r.squared"]

But I think there has to be a method to do this automatically and not 91
times.
I tried to use a for loop:
###(length(C) = 91)
results7<-data.frame(lm.modelobs=rep(NA,length(C)))
for (i in (1:91))
{
results7$lm.modelobs[i] <- lm(results6[i] ~ data$observed)
R2.[i] <- summary(lm.modelobs[i])["r.squared"]
}

I also tried just computing results7$lm.modelobs[i] without extracting
r.squared in the same step, but that didn't work either. It seems it's not
possible to refer to a column inside a for loop or a function. (If I
just ask R for the data in column 5 with "results6[5]", that works;
"results6[,c(5)]" gives the same values. But replacing results6[i] with
results6[,c([i])] in the for loop is apparently not a solution either.) I'm
looking for a way to repeat a calculation or function over several columns. I
need this further on in my script as well, not only in this part.

I would greatly appreciate any suggestions!
Thanks!
Nerak


--
View this message in context:
http://r.789695.n4.nabble.com/Accomplishing-a-loop-on-multiple-columns-tp4284974p4284974.html
Sent from the R help mailing list archive at Nabble.com.

iliketurtles

2012-Jan-11 11:39 UTC

Lists are the answer.


LIST<-list()
 for(i in 1:ncol(results6))
 {
  LIST[[i]]<-lm(results6[,i]~data$observed)
 }

You'll now have a list of 91 fitted lm objects. You can then do something like
this:

LIST2<-list()
 for(i in 1:length(LIST))
 {
  # r.squared is stored in the model summary, not in the lm object itself
  LIST2[[i]]<-summary(LIST[[i]])$r.squared
 }

This should now be a list of 91 R-squared values, which you can unlist() and save
in matrix form if you want.
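
A more compact variant of the same idea, as a minimal sketch: sapply() loops over the columns and returns the R-squared values directly as a named vector. The data below are made up purely so the snippet runs on its own; only the names results6 and data$observed come from the thread. Note that results6[i] returns a one-column data frame, whereas results6[, i] or results6[[i]] returns the column as a plain vector, which is what lm() needs in a formula.

```r
# Synthetic stand-ins for the data in the thread (assumed shapes, made-up values)
set.seed(123)
data <- data.frame(observed = rnorm(60))
results6 <- as.data.frame(replicate(91, data$observed + rnorm(60)))

# Fit one model per column and pull R-squared from the model summary.
# sapply() passes each column to the function as a plain numeric vector.
r2 <- sapply(results6, function(col) summary(lm(col ~ data$observed))$r.squared)

length(r2)      # one R-squared value per column
which.max(r2)   # name and position of the column with the highest R-squared
```

Because r2 keeps the column names, there is no need to track separately which value belongs to which column.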

-----
----

Isaac
Research Assistant
Quantitative Finance Faculty, UTS

Nerak

2012-Jan-12 14:37 UTC

Many thanks! I had never used lists before, but it's a great solution! It works
very well!


I do have a follow-up question, though.
I want to know for which value (column) I have the maximal R-squared.
Therefore, I unlist the list so that it's stored as a vector.
The columns are always named in the same way: they start with
results4$depth_ followed by a number. The numbers are constructed as
seq(1,10,0.1). But once the R-squared values are in one column, I don't know
which column each was calculated for. So I made a new data frame with both
columns:
R2 <- unlist(LIST2)   # LIST2 holds the R-squared values
Cvalue <- seq(1,10,0.1)
results5 <- data.frame(Cvalue,R2)

# I know I can calculate the maximum R-squared this way:

max(results5$R2)

# Now I want to know which Cvalue this belongs to. I would write it like
this:
results5$Cvalue[which(results5$R2 == "max(results5$R2)")]
# But I always get the result:
numeric(0)
# I don't know if these R-squared values are in some format for which this
doesn't work? (I have done similar things before, and I know that, for example,
it cannot work if R treats the values as dates.) Maybe it's because of the
decimals? I know that max(results5$R2) is 0.6081547 in this example, and
I can see that it belongs to Cvalue == 1.8. The lookup works in the opposite
direction:
results5$R2[which(results5$Cvalue == "1.8")]
# But neither
results5$Cvalue[which(results5$R2 == "0.6081547")]
# nor
results5$Cvalue[which(results5$R2 == "max(results5$R2)")]
# works.
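
For what it's worth, a likely explanation for the numeric(0) results above: putting max(results5$R2) in quotes turns it into a plain character string, so R compares each R-squared value (converted to text) with the literal text "max(results5$R2)", which never matches. Comparing with "0.6081547" fails for a similar reason, since the printed value is rounded and the stored number has more digits. The sketch below uses made-up R-squared values (constructed so the maximum sits at Cvalue 1.8, as in the example) and shows the unquoted comparison and which.max().

```r
# Made-up R-squared values, built so that the maximum is at Cvalue 1.8
Cvalue <- seq(1, 10, 0.1)
R2 <- 0.6 - (Cvalue - 1.8)^2 / 200
results5 <- data.frame(Cvalue, R2)

# Quoted: compares against a character string, so nothing matches
quoted <- results5$Cvalue[which(results5$R2 == "max(results5$R2)")]
length(quoted)

# Unquoted: compares against the actual numeric maximum
results5$Cvalue[results5$R2 == max(results5$R2)]

# which.max() returns the row index of the (first) maximum directly
best <- which.max(results5$R2)
results5$Cvalue[best]
```

which.max() avoids the equality test altogether, which also sidesteps any floating-point rounding issues.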




# I have another question about performing calculations on several
columns. Again, there is a loop involved. I don't know if I should ask it in
this topic as well, but I don't want to start too many similar
topics. I searched the help forum but unfortunately couldn't find
anything similar.

Again, I manage to do it for one column (using the specific name
of that column).
Each column has 60 values. But to compare it to another column, I
need to shift the values down by one row: the value in row 1 moves to row 2,
the value in row 2 to row 3, and so on. The first value becomes NA.
If I did this for one column, I would do it like this:
results$newdepth[1] <- NA
for (t in 2:60)
{
results$newdepth[t] <- results$depth[t-1]
}

As I mentioned before, the names of the columns are constructed in the
same way: results$depth_ followed by a number (seq(1,10,0.1)).

So I don't know how to repeat this for all the columns at the same
time. I would think of a for loop with, for example, for (i in 1:91),
because there are 91 columns, but then I don't know how to say that it
should happen for each column. I was thinking about using this:
for (u in 1:91)
{
results$newdepth [,u]<- results$depth [,u]
for (t in 2:60)
{
results$newdepth[,u][t] <- results$depth[,u] [t-1]
}}

But I can see that there are several reasons why a for loop like this cannot
work (like [ ][ ], ...).
I just really cannot find another way to repeat a calculation or
something else on several columns...
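
One possible way to apply the one-row shift to every depth column at once, sketched with made-up data: select the columns by name pattern, shift each one with lapply(), and bind the results back as new columns. The three-column data frame and the "new" prefix for the shifted columns are assumptions for illustration; only the depth_ naming scheme comes from the description above.

```r
# Hypothetical stand-in for the real data: 60 rows, three depth columns
set.seed(42)
results <- data.frame(depth_1 = rnorm(60),
                      depth_1.1 = rnorm(60),
                      depth_1.2 = rnorm(60))

# Pick out all columns whose names start with "depth_"
depth_cols <- grep("^depth_", names(results), value = TRUE)

# Shift each column down one row: NA first, last value dropped
shifted <- lapply(results[depth_cols], function(x) c(NA, head(x, -1)))
names(shifted) <- paste0("new", depth_cols)

# Append the shifted columns to the original data frame
results <- cbind(results, as.data.frame(shifted))
head(results, 3)
```

The expression c(NA, head(x, -1)) replaces the inner loop over rows, and lapply() replaces the outer loop over columns.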



Nerak

2012-Jan-12 15:08 UTC

I just saw a small mistake in my last post: at the very end, in the last line of
the last loop, results$depth[,u] [t-1] should be results$newdepth[,u] [t-1].
My apologies.

for (u in 1:91)
{
results$newdepth [,u]<- results$depth [,u]
for (t in 2:60)
{
results$newdepth[,u][t] <- results$newdepth[,u] [t-1]
}}

