Hello,

I'm attempting to create a data frame with correlations between every pair of variables in a data frame, so that I can then sort by the value of the correlation coefficient and see which pairs of variables are most strongly correlated.

The sm2vec function in the corpcor library works very nicely, as shown here:

    library(Hmisc)
    library(corpcor)

    # Create example data
    x1 = runif(50)
    x2 = runif(50)
    x3 = runif(50)
    d = data.frame(x1=x1,x2=x2,x3=x3)
    label(d$x1) = "Variable x1"
    label(d$x2) = "Variable x2"
    label(d$x3) = "Variable x3"

    # Get correlations
    cormat = cor(d)

    # Get vector form of lower triangular elements
    cors = sm2vec(cormat,diag=F)
    inds = sm.index(cormat,diag=F)

    # Create a data frame
    var1 = dimnames(cormat)[[1]][inds[,1]]
    var2 = dimnames(cormat)[[2]][inds[,2]]
    lbl1 = label(d[,var1])
    lbl2 = label(d[,var2])
    cor_df = data.frame(Var1=lbl1,Var2=lbl2,Cor=cors)

The issue that I run into is when trying to get the labels in lbl1 and lbl2. I get the warning:

    In mapply(FUN = label, x = x, default = default, MoreArgs = list(self = TRUE), :
      longer argument not a multiple of length of shorter

My usage of label seems ambiguous, since the data frame could also have a label attached to it, aside from the labels attached to variables within the data frame. However, the code above does work, with the warning. Aside from using a loop to get the label of one variable at a time, is there any other way of getting the labels for all variables in the data frame?

Also, if there is a better way to achieve my goal of getting the correlations between all variable pairs, I'd love to know.

Thanks in advance for any responses!

--Krishna
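On the second question (another way to get the correlations between all variable pairs), one possible approach that avoids corpcor altogether is sketched below. It is only an illustration in base R, not the method used in the post: `lower.tri()` and `which(..., arr.ind = TRUE)` stand in for `sm2vec()` and `sm.index()`, and the seed and the final sort by absolute value are assumptions added for the example.

```r
# Base-R sketch: one row per variable pair, strongest correlation first.
set.seed(1)  # arbitrary seed, only to make the example repeatable
d <- data.frame(x1 = runif(50), x2 = runif(50), x3 = runif(50))

cormat <- cor(d)

# Row/column positions of the strictly lower triangle of the matrix
idx <- which(lower.tri(cormat), arr.ind = TRUE)

cor_df <- data.frame(
  Var1 = rownames(cormat)[idx[, "row"]],
  Var2 = colnames(cormat)[idx[, "col"]],
  Cor  = cormat[idx],           # matrix indexing by (row, col) pairs
  stringsAsFactors = FALSE
)

# Sort so that the most strongly correlated pairs come first
cor_df <- cor_df[order(abs(cor_df$Cor), decreasing = TRUE), ]
print(cor_df)
```

Sorting on `abs(Cor)` treats strong negative and strong positive correlations alike; drop the `abs()` if only the signed value matters.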
KT_rfan wrote:
> I'm attempting to create a data frame with correlations between every pair
> of variables in a data frame, so that I can then sort by the value of the
> correlation coefficient and see which pairs of variables are most strongly
> correlated.
>
> The sm2vec function in the corpcor library works very nicely as shown
> here:
>
> library(Hmisc)
> library(corpcor)
>
> # Create example data
> x1 = runif(50)
> x2 = runif(50)
> x3 = runif(50)
> d = data.frame(x1=x1,x2=x2,x3=x3)
> label(d$x1) = "Variable x1"
> label(d$x2) = "Variable x2"
> label(d$x3) = "Variable x3"
> .. rest of code omitted

This has nothing to do with Hmisc and corpcor. If things get confusing, simplify and use str().

What you wanted to "label" the columns is "names" or, probably better named, "colnames". Note that your way of labeling converts the column to class "labelled", which is not what functions take for breakfast.

    d = data.frame(x1=runif(10),x2=runif(10))
    label(d) # This alone gives the warning message
    # x1 x2
    # "" ""
    # Warning message:
    # In mapply(FUN = label, x = x, default = default, MoreArgs = list(self = TRUE), :
    #   longer argument not a multiple of length of shorter

    str(d)
    # 'data.frame': 10 obs. of 2 variables:
    # $ x1: num 0.1353 0.7234 0.0266 0.074 0.2391 ...
    # $ x2: num 0.833 0.573 0.136 0.395 0.308 ...

    label(d$x1) = "Variable x1"
    str(d)
    # 'data.frame': 10 obs. of 2 variables:
    # $ x1:Class 'labelled' atomic [1:10] 0.1353 0.7234 0.0266 0.074 0.2391 ...
    # .. ..- attr(*, "label")= chr "Variable x1"
    # $ x2: num 0.833 0.573 0.136 0.395 0.308 ...

    # Labeling columns, the correct way
    d = data.frame(x1=runif(10),x2=runif(10))
    str(d)
    names(d) = c("Var1","Var2")
    str(d)

--
View this message in context: http://r.789695.n4.nabble.com/Hmisc-label-function-applied-to-data-frame-tp3069784p3070777.html
Sent from the R help mailing list archive at Nabble.com.
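The distinction drawn in this reply, between column names and a descriptive label stored on the column, can be illustrated without loading Hmisc. The sketch below assumes, based only on the `attr(*, "label")` line in the str() output above, that the label lives in a plain "label" attribute; it sets that attribute directly with base R's `attr<-`, so unlike Hmisc's `label<-` it does not add the "labelled" class.

```r
# Base-R sketch: column names and "label" attributes are separate things.
d <- data.frame(x1 = runif(10), x2 = runif(10))

# Attach a descriptive label to each column as a plain attribute
attr(d$x1, "label") <- "Variable x1"
attr(d$x2, "label") <- "Variable x2"

# The column names are untouched by the labels
names(d)

# Collect the labels one column at a time; unlabeled columns give ""
lbls <- sapply(d, function(col) {
  lbl <- attr(col, "label")
  if (is.null(lbl)) "" else lbl
})
lbls
```

Keeping short names for computation and long labels for display is one reason to prefer attributes over renaming the columns; which of the two fits better depends on how the data frame is used downstream.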
Hello again,

I have found that if I use sapply, I do not get a warning, i.e.,

    lbl1 = sapply(d[,var1],label)

works correctly and gives no warning. I'm sorry this did not occur to me earlier, my apologies!

--Krishna

On Thu, Dec 2, 2010 at 11:36 AM, Krishna Tateneni <tateneni@gmail.com> wrote:
> Hello,
>
> I'm attempting to create a data frame with correlations between every pair
> of variables in a data frame, so that I can then sort by the value of the
> correlation coefficient and see which pairs of variables are most strongly
> correlated.
>
> The sm2vec function in the corpcor library works very nicely as shown here:
>
> library(Hmisc)
> library(corpcor)
>
> # Create example data
> x1 = runif(50)
> x2 = runif(50)
> x3 = runif(50)
> d = data.frame(x1=x1,x2=x2,x3=x3)
> label(d$x1) = "Variable x1"
> label(d$x2) = "Variable x2"
> label(d$x3) = "Variable x3"
>
> # Get correlations
> cormat = cor(d)
>
> # Get vector form of lower triangular elements
> cors = sm2vec(cormat,diag=F)
> inds = sm.index(cormat,diag=F)
>
> # Create a data frame
> var1 = dimnames(cormat)[[1]][inds[,1]]
> var2 = dimnames(cormat)[[2]][inds[,2]]
> lbl1 = label(d[,var1])
> lbl2 = label(d[,var2])
> cor_df = data.frame(Var1=lbl1,Var2=lbl2,Cor=cors)
>
> The issue that I run into is when trying to get the labels in lbl1 and
> lbl2. I get the warning:
>
> In mapply(FUN = label, x = x, default = default, MoreArgs = list(self = TRUE), :
>   longer argument not a multiple of length of shorter
>
> My usage of label seems ambiguous since the data frame could also have a label
> attached to it, aside from labels attached to variables within the data
> frame. However, the code above does work, with the warning. Aside from
> using a loop to get the label of one variable at a time, is there any other
> way of getting the labels for all variables in the data frame?
>
> Also, if there is a better way to achieve my goal of getting the
> correlations between all variable pairs, I'd love to know.
>
> Thanks in advance for any responses!
>
> --Krishna
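The sapply idea from this follow-up can be combined with the pair indices to build the labeled correlation table in one pass. The following is a hedged sketch rather than the poster's code: it uses base R's `attr()` as a stand-in for Hmisc's `label()` (an assumption, so that the example runs without extra packages), builds a single lookup vector of labels, and indexes it by variable name instead of subsetting the data frame with repeated columns.

```r
# Sketch: labeled pairwise-correlation table via a label lookup vector.
set.seed(42)  # arbitrary seed for a repeatable example
d <- data.frame(x1 = runif(50), x2 = runif(50), x3 = runif(50))

# Stand-in for label(d$x1) = "Variable x1" etc. (assumes labels are
# stored in a "label" attribute on each column)
for (v in names(d)) attr(d[[v]], "label") <- paste("Variable", v)

# One label per column, named by column: the sapply step
lbls <- sapply(d, attr, "label")

cormat <- cor(d)
idx <- which(lower.tri(cormat), arr.ind = TRUE)
var1 <- rownames(cormat)[idx[, 1]]
var2 <- colnames(cormat)[idx[, 2]]

# Look labels up by variable name rather than re-subsetting d
cor_df <- data.frame(
  Var1 = unname(lbls[var1]),
  Var2 = unname(lbls[var2]),
  Cor  = cormat[idx]
)
print(cor_df)
```

With Hmisc loaded, `sapply(d, label)` should be able to replace the `sapply(d, attr, "label")` line, in keeping with what the follow-up reports.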