Hello,
I have a question concerning "for loops" on multiple columns.
I made 91 columns with results (all created together in a for loop) and I
want to use lm to fit the model.
I want to compare the results of all these calculated columns (91) with one
column of observed values. I use the function lm to fit the model and
calculate r.squared. I manage to do this for each column separately.
For example: my calculated results are in the data frame "results6", my
observed results are in data (data$observed).
#To calculate R2 for column 1:
lm.modelobs1 <- lm(results6[,c(1)] ~ data$observed)
R2.1 <- summary(lm.modelobs1)["r.squared"]
#To calculate R2 for column 91:
lm.modelobs91 <- lm(results6[,c(91)] ~ data$observed)
R2.91 <- summary(lm.modelobs91)["r.squared"]
But I think there has to be a method to do this automatically and not 91
times.
I tried to use a for loop:
###(length(C) = 91)
results7<-data.frame(lm.modelobs=rep(NA,length(C)))
for (i in (1:91))
{
results7$lm.modelobs[i] <- lm(results6[i] ~ data$observed)
R2.[i] <- summary(lm.modelobs[i])["r.squared"]
}
I also tried just to calculate results7$lm.modelobs[i] without directly
calculating r.squared, but I didn't manage that either. It seems like it's not
possible to refer to a column inside a for loop or a function. (If I
just ask R for the data in column 5 with "results6[5]", that works, and
"results6[,c(5)]" gives the same, but replacing results6[i] by
results6[,c([i])] in the for loop is apparently also not a solution.) I'm
looking for a way to repeat a calculation/function on several columns. I
need this further on in my script as well, not only in this part.
I would greatly appreciate any suggestions!
Thanks!
Nerak
--
View this message in context:
http://r.789695.n4.nabble.com/Accomplishing-a-loop-on-multiple-columns-tp4284974p4284974.html
Sent from the R help mailing list archive at Nabble.com.
Lists are the answer.
LIST<-list()
for(i in 1:ncol(results6))
{
LIST[[i]]<-lm(results6[,i]~data$observed)
}
You'll now have a list of 91 lm() fits. You can then do something like
this:
LIST2<-list()
for(i in 1:length(LIST))
{
LIST2[[i]]<-summary(LIST[[i]])$r.squared
}
This should now be a list of 91 R-squared values (r.squared is a component
of summary(), not of the lm object itself), which you can unlist() and save
in matrix form if you want.
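The two loops above can also be collapsed into a single sapply() call. This is only a minimal sketch: it assumes results6 is a data frame of calculated columns and data$observed holds the observations, and the simulated data are made up purely so the example runs.

```r
# Simulated stand-ins for the poster's objects (hypothetical data)
set.seed(1)
data <- data.frame(observed = rnorm(60))
results6 <- as.data.frame(
  sapply(1:91, function(j) data$observed * j / 91 + rnorm(60))
)

# One R-squared per column; r.squared is taken from summary() of each fit
R2 <- sapply(results6, function(col) {
  summary(lm(col ~ data$observed))$r.squared
})

length(R2)  # one value per column of results6
```

sapply() walks over the columns of the data frame, so no index variable is needed, and the result is already a plain numeric vector rather than a list.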
-----
----
Isaac
Research Assistant
Quantitative Finance Faculty, UTS
Many thanks! I have never used lists before, but it's a great solution! It works
very well!
I do have a follow-up question, though.
I want to know for which value (column) I have the maximal R-squared.
Therefore, I unlist the LIST so that it's written as a vector.
The columns were always named in the same way. They always start with
results4$depth_ followed by the number. The numbers are constructed as:
seq(1,10,0.1). But if the R-squared values are now in 1 column, I don't know
for which column they were calculated. So I made a new data frame with both
columns:
R2 <- unlist(LIST2)
Cvalue <- c(seq(1,10,0.1))
results5 <- data.frame(Cvalue,R2)
# I know I can calculate the max value of Rsquared by this way:
max(results5$R2)
# Now I want to know which Cvalue this belongs to. I would write it like
# this:
results5$Cvalue[which(results5$R2 == "max(results5$R2)")]
# But I always get the solution:
numeric(0)
# I don't know if these R-squared values are in some kind of format for which
# this doesn't work. (I have used this approach before for similar things, and
# I know that, for example, it does not work if R recognises the values as
# dates.) Maybe it's because of the decimals? I know that max(results5$R2) is
# 0.6081547 in this example and I can see that it belongs to Cvalue == 1.8.
# It works the other way round:
results5$R2[which(results5$Cvalue == "1.8")]
# But neither
results5$Cvalue[which(results5$R2 == "0.6081547")]
# nor
results5$Cvalue[which(results5$R2 == "max(results5$R2)")]
# works?
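One likely cause, offered as a sketch rather than a confirmed diagnosis: with the quotes, R compares each number against the literal text "max(results5$R2)", which never matches, and "0.6081547" is only the rounded printout of the stored value, so it does not match exactly either. Dropping the quotes, or using which.max(), avoids both problems. The R2 values below are invented for illustration.

```r
# Hypothetical R-squared values standing in for the real results
Cvalue <- seq(1, 10, 0.1)
set.seed(42)
R2 <- runif(length(Cvalue))
results5 <- data.frame(Cvalue, R2)

# Compare against the number itself, not a quoted string
best1 <- results5$Cvalue[which(results5$R2 == max(results5$R2))]

# which.max() returns the index of the (first) maximum directly
best2 <- results5$Cvalue[which.max(results5$R2)]

# The quoted version compares numbers with text and matches nothing
none <- results5$Cvalue[which(results5$R2 == "max(results5$R2)")]
```

which.max() is usually the safer choice because it never relies on an exact equality test between floating-point numbers.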
# I have another question concerning calculations on several
# columns. Again, there is a loop involved. I don't know if I should ask it in
# this topic as well, but I don't want to start too many similar
# topics. I searched the help forum but unfortunately I couldn't find
# anything similar.
Again, I manage to do it for one column (using the specific name
of this column).
In each column, I have 60 values. But to compare it to another column, I
have to shift the values down by one position: the old value 1 becomes the
new value 2, the old value 2 becomes the new value 3, and so on. The first
value becomes NA.
If I would do this for 1 column, I would do it like this:
results$newdepth[1] <- NA
for (t in 2:60)
{
results$newdepth[t] <- results$depth[t-1]
}
As I mentioned before, the names of the columns are all constructed in the
same way: results$depth_ followed by a number (seq(1,10,0.1)).
I don't know how to repeat this for all the columns at the same
time. I would think of a for loop with, for example, for (i in 1:91)
because there are 91 columns, but then I don't know how to say that it
should happen for each column. I was thinking about using this:
for (u in 1:91)
{
results$newdepth [,u]<- results$depth [,u]
for (t in 2:60)
{
results$newdepth[,u][t] <- results$depth[,u] [t-1]
}}
But I can see that there are several reasons why a for loop like this cannot
work (like [ ][ ], ...).
I just really cannot find another way to repeat a calculation or
something else on several columns...
I just saw a little mistake in my last post: at the very end, in the last line of
the last loop, results$depth[,u] [t-1] should be results$newdepth[,u] [t-1].
My apologies.
for (u in 1:91)
{
results$newdepth [,u]<- results$depth [,u]
for (t in 2:60)
{
results$newdepth[,u][t] <- results$newdepth[,u] [t-1]
}}
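The shift itself does not need an inner loop: c(NA, head(x, -1)) moves every value of a vector x down one position. Here is a sketch of one way to apply it to all columns at once. It assumes the 91 columns are named depth_1, depth_1.1, ... depth_10 (the exact names are a guess from the description) and the data are simulated.

```r
# Hypothetical data frame: 60 rows, one column per value of seq(1, 10, 0.1)
Cvalue <- seq(1, 10, 0.1)
set.seed(7)
results <- as.data.frame(matrix(rnorm(60 * length(Cvalue)), nrow = 60))
names(results) <- paste0("depth_", Cvalue)

# Shift one vector down by one position: first value NA, last value dropped
shift1 <- function(x) c(NA, head(x, -1))

# Apply the shift to every depth column and store the results as new columns
depth_cols <- grep("^depth_", names(results), value = TRUE)
shifted <- lapply(results[depth_cols], shift1)
names(shifted) <- sub("^depth_", "newdepth_", depth_cols)
results[names(shifted)] <- shifted
```

Selecting the columns by name pattern keeps the code independent of how many depth columns there are, so the same lines work further down a script where the number of columns differs.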