I predicted that y would increase as x increased. However, I only made the prediction on the ranks of the scores. The ranks don't correlate with the predicted ranks. And, I don't think a regression on the ranks is warranted. However, the actual scores do yield a significant slope for b, and a significant R^2 using a linear regression (y is the value and x is the predicted rank). What should my argument be here? Should I have endorsed using the actual scores instead of ranks to begin with, for some reason that doesn't have anything to do with my current result? :)

Oh, on another note, I can use rcorr to get the Spearman correlations, but I'd like to be able to just add the ranks as a column. I was going to just use order and add a simple factor. But, that doesn't deal with ties correctly.

And, I also wanted to analyze correlations subject by subject and compare my two groups. However, there doesn't seem to be a good way to get this. I tried using "by" with "cor". However, this requires binding x and y, which causes cor to return a matrix (if you could pass it x and y separately it would just return a number).

Given

data frame s
x y subj
4 7 harry
5 1 harry
6 9 harry
2 4 steve
3 7 steve
...

I'd like to be able to produce

r   subj
.12 harry
.52 steve
...

Any tips?
John Christie
2003-Aug-22 12:18 UTC
[R] a pickle (solved first part now need r's from data)
On Friday, August 22, 2003, at 01:44 AM, John Christie wrote:

> I predicted that y would increase as x increased. However, I only
> made the prediction on the ranks of the scores. The ranks don't
> correlate with predicted. And, I don't think a regression on the
> ranks is warranted. However, the actual scores do yield a significant
> slope for b, and a significant R^2 using a linear regression (y is the
> value and x is the predicted rank). What should my argument be here?
> Should I have endorsed using the actual scores instead of ranks to
> begin for some reason that doesn't have anything to do with my current
> result? :)

OK, now I realize that I should probably not have been correlating ranks in the first place, because my real data may have had a non-linear, but still steadily increasing, slope. The ranks would tend to increase variance where the slope was low, and that ruined my chance of finding an effect.

> Oh, on another note, I can use rcorr to get the Spearman correlations,
> but I'd like to be able to just add
> the ranks as a column. I was going to just use order and add a simple
> factor. But, that doesn't deal with ties correctly.

Still don't have these yet.

> And, I also wanted to analyze correlations subject by subject and
> compare my two groups. However, there doesn't seem to be a good way
> to get this. I tried using "by" with "cor". However, this requires
> binding x and y which causes cor to return a matrix (if you could pass
> it x and y separate it would just return a number).
>
> given
>
> data frame s
> x y subj
> 4 7 harry
> 5 1 harry
> 6 9 harry
> 2 4 steve
> 3 7 steve
> ...
>
> i'd like to be able to produce
>
> r subj
> .12 harry
> .52 steve
> ...
>
> any tips?
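On the still-open question of adding ranks as a column: base R's rank() assigns tied values their average rank by default, which is likely what is wanted here instead of order(). The following is only a minimal sketch; the data values and the column names x.rank, y.rank and y.rank.within are made up for illustration and are not from the original data.

```r
# Hypothetical data with tied y values (not the poster's real data)
s <- data.frame(
  x    = c(4, 5, 6, 2, 3),
  y    = c(7, 1, 7, 4, 7),
  subj = c("harry", "harry", "harry", "steve", "steve")
)

# rank() gives tied values the average of the ranks they span
# (default ties.method = "average")
s$x.rank <- rank(s$x)
s$y.rank <- rank(s$y)          # 4, 1, 4, 2, 4

# Ranks computed separately within each subject, via ave()
s$y.rank.within <- ave(s$y, s$subj, FUN = rank)   # 2.5, 1, 2.5, 1, 2

# Pearson correlation of the rank columns is the Spearman correlation
# of the raw scores
r.ranks <- cor(s$x.rank, s$y.rank)
r.spear <- cor(s$x, s$y, method = "spearman")
```

Whether ranks should be taken over the whole data set or within each subject depends on the analysis; both are shown only so the difference is visible.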
John -

Here are two equivalent solutions to your final question:

    data <- data.frame(x=seq(15), y=sample(seq(15), 15),
                       subj=sample(c("harry","steve","nathan","john"), 15, TRUE))
    result.1 <- unclass(by(data, data$subj, function(dd) cor(dd$x, dd$y)))
    result.2 <- unclass(by(data, data$subj, function(dd) cor(dd[c(1,2)])[1,2]))

I guess I prefer result.1 since the code is easier to read, even though it does bury literal column names into the code.

The "function(dd)" stuff is a very common construction in by(), sapply(), lapply() constructs. It defines a little function in-line, without ever naming it, and passes it as the third argument to by(). I use this all the time, when I need to rearrange the order, or do a little bit of subscripting (as here), in the arguments of a function (cor()) which I would otherwise just pass directly as the third argument to by().

I'll let others comment on my use of unclass() here. The goal was to get a numeric vector with a names attribute, so it can be incorporated into further processing.

I'm surprised just how much tinkering it took to get this all to work. This might actually make a useful example to add to the help page for by().

- tom blackwell - u michigan medical school - ann arbor -

On Fri, 22 Aug 2003, John Christie wrote:

> . . . And, I also wanted to analyze correlations subject by subject and
> compare my two groups. However, there doesn't seem to be a good way to
> get this. I tried using "by" with "cor". However, this requires
> binding x and y which causes cor to return a matrix (if you could pass
> it x and y separate it would just return a number).
>
> given
>
> data frame s
> x y subj
> 4 7 harry
> 5 1 harry
> 6 9 harry
> 2 4 steve
> 3 7 steve
> ...
>
> i'd like to be able to produce
>
> r subj
> .12 harry
> .52 steve
> ...
>
> any tips?
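To go from a named vector of per-subject correlations to the two-column "r subj" layout asked for in the question, one possible sketch uses split() with sapply() and then builds a data frame from the names. The data below reuse the five rows shown in the question plus one invented row for steve, so that each subject has three points; the names r.by.subj and result are hypothetical.

```r
# Rows from the question, plus one made-up row (5, 6, "steve") for illustration
s <- data.frame(
  x    = c(4, 5, 6, 2, 3, 5),
  y    = c(7, 1, 9, 4, 7, 6),
  subj = c("harry", "harry", "harry", "steve", "steve", "steve")
)

# One correlation per subject: split the data frame by subj, then pass the
# x and y columns to cor() separately so it returns a single number
r.by.subj <- sapply(split(s, s$subj), function(dd) cor(dd$x, dd$y))

# Named numeric vector -> data frame with columns r and subj
result <- data.frame(r = unname(r.by.subj), subj = names(r.by.subj))
print(result)
```

Passing method = "spearman" to cor() inside the anonymous function would give rank-based correlations per subject instead. A subject with fewer than two rows, or with no variation in x or y, yields NA, so those cases may need handling before comparing groups.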