Josh B
2010-Jun-09 19:04 UTC
[R] Performing a function on columns specified in another dataframe
Hello Listserve,

Here is another question to keep you on your toes. Please consider the following toy dataset:

a <- read.table(textConnection("fred sam joe alex
measure.1 10 4 10 1
measure.2 10 4 2 8
measure.3 3 1 8 3
measure.4 5 1 3 3
measure.5 8 6 8 3
measure.6 9 5 1 0
measure.7 4 6 10 1
measure.8 3 6 8 9
measure.9 8 6 7 7
measure.10 7 8 9 8"), header = TRUE)

And also please consider this toy dataset:

b <- read.table(textConnection("x y
test.1 fred sam
test.2 sam joe
test.3 joe alex"), header = TRUE)

What I want to do is perform some Student's t-tests. The comparisons I want to make are specified in the dataset called "b": I'd like to test fred versus sam, sam versus joe, and joe versus alex. How could I use the dataset called "b" to specify the columns to use in the series of t-tests? Keep in mind that my real dataset is enormous (1000 columns) and will likely change, so solutions relying on numeric indexing would not work for me.

I'm thinking the code would look something like this:

# create a matrix for the output
results <- matrix(nrow = nrow(b), ncol = 1)
results <- cbind(b, results)

# one comparison per row of b, so loop over rows (length(b) would count columns)
for (i in 1:nrow(b)) {
  # this is where I'm stuck: how do I pull the information I want out of b
  # (i.e., the columns to use) to do the appropriate comparisons?
  results[i, 3] <- t.test(???, ???)
}

I'm hoping for a solution that doesn't create any new subsetted matrices along the way, because this will slow down the run time.

Thanks in advance,
Josh
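For comparison with the loop outlined in the question, here is a minimal sketch that fills the ??? placeholders with name-based column lookups. It assumes only the p-value is wanted per comparison (an assumption, not stated in the question) and keeps t.test()'s default settings, which run Welch's test rather than the pooled-variance Student's test. It reads the toy data with read.table(text = ...) instead of textConnection(), which needs R 2.14.0 or later.

```r
# Toy data from the question (read.table's text= argument needs R >= 2.14.0).
a <- read.table(text = "fred sam joe alex
measure.1 10 4 10 1
measure.2 10 4 2 8
measure.3 3 1 8 3
measure.4 5 1 3 3
measure.5 8 6 8 3
measure.6 9 5 1 0
measure.7 4 6 10 1
measure.8 3 6 8 9
measure.9 8 6 7 7
measure.10 7 8 9 8", header = TRUE)

b <- read.table(text = "x y
test.1 fred sam
test.2 sam joe
test.3 joe alex", header = TRUE)

# One output row per comparison; the loop fills in the p-values.
results <- b
results$p.value <- NA_real_

for (i in seq_len(nrow(b))) {
  # Look the two columns up by name. as.character() matters when b's columns
  # are factors: indexing by a factor uses its integer codes, not its labels.
  x_col <- as.character(b$x[i])
  y_col <- as.character(b$y[i])
  results$p.value[i] <- t.test(a[[x_col]], a[[y_col]])$p.value
}

results
```

Because the columns are found by name, adding, dropping, or reordering columns in a does not break the loop; only the names listed in b have to exist.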
Jorge Ivan Velez
2010-Jun-09 20:05 UTC
[R] Performing a function on columns specified in another dataframe
Hi Josh,

One way would be:

# apply() passes each row of b as a character vector: Names[1] is x, Names[2] is y
res <- apply(b, 1, function(Names) t.test(a[, Names[1]], a[, Names[2]]))
# collect the t statistic, df and p-value from each test into one matrix
do.call(rbind, lapply(res, function(l) c(l$statistic, l$parameter, p = l$p.value)))

#                t       df          p
# test.1  1.775490 17.35589 0.09335398
# test.2 -1.489210 15.82584 0.15608937
# test.3  1.533333 17.99873 0.14258339

HTH,
Jorge

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
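The reply's t.test() calls use the function's default var.equal = FALSE, i.e. Welch's t-test, which is why the df column is fractional. If classic Student's t-tests are wanted, as the question says, a minimal sketch is to pass var.equal = TRUE. This variant also swaps apply() for mapply() over the two name columns, so b is never converted to a matrix. student_tests() is a hypothetical helper name introduced here for illustration.

```r
# Toy data from the question (read.table's text= argument needs R >= 2.14.0).
a <- read.table(text = "fred sam joe alex
measure.1 10 4 10 1
measure.2 10 4 2 8
measure.3 3 1 8 3
measure.4 5 1 3 3
measure.5 8 6 8 3
measure.6 9 5 1 0
measure.7 4 6 10 1
measure.8 3 6 8 9
measure.9 8 6 7 7
measure.10 7 8 9 8", header = TRUE)

b <- read.table(text = "x y
test.1 fred sam
test.2 sam joe
test.3 joe alex", header = TRUE)

# Hypothetical helper: one Student's (pooled-variance) t-test per row of b,
# with the columns of a looked up by the names stored in b$x and b$y.
student_tests <- function(a, b) {
  stats <- mapply(function(x_name, y_name) {
    tt <- t.test(a[[x_name]], a[[y_name]], var.equal = TRUE)
    c(tt$statistic, tt$parameter, p = tt$p.value)  # names: t, df, p
  }, as.character(b$x), as.character(b$y), USE.NAMES = FALSE)
  out <- t(stats)  # mapply() returns one column per comparison; transpose
  rownames(out) <- rownames(b)
  out
}

student_tests(a, b)
```

With ten measurements in every column the pooled and Welch t statistics coincide (equal group sizes make the two standard errors identical), so compared with the reply's table only the df column (now 18 for every test) and the p-values change.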