Dear R community,
For 100 sites on human chromosomes, I ran two tests. The first treats an
experimental measurement as a continuous variable (multiple regression); the
second compares the top 25% of samples with the bottom 25%, ranked by the
measured variable (categorical analysis). A total of 16 sites show
significance. In the results below I show only five variables (site,
region, test, chr, start); I then need to add a sixth variable, "common",
that labels each common region (two in this example file) with significant
p-values from both tests.
In the second "common" region, chr (chromosome) is the same (chr 1) and the
start location is also the same for all six sites (three from the categorical
analysis and three from the continuous analysis); only the end locations
(not known) differ, so I labeled them as one common region. The first
"common" region is also on chromosome 1, but the start locations are not the
same; the difference between them is less than 1000 base pairs, so I consider
them to be in the same chromosome region.
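The proximity rule can be written as a clustering step: sort the sites by chromosome and start, and begin a new cluster whenever the chromosome changes or the gap to the previous start is 1000 bp or more. A minimal base-R sketch; `cluster_sites` and `max_gap` are hypothetical names, and chaining neighbouring sites this way (so one cluster may span more than 1000 bp in total) is an assumption about what "same region" should mean.

```r
# Hypothetical helper (not from any package): assign a cluster id to each site.
# Sites on the same chromosome whose sorted start positions are less than
# max_gap bp from the previous site share an id (neighbours are chained).
cluster_sites <- function(chr, start, max_gap = 1000) {
  ord     <- order(chr, start)          # sort by chromosome, then position
  chr_s   <- chr[ord]
  start_s <- start[ord]
  # New cluster at the first site, when the chromosome changes,
  # or when the gap to the previous start is max_gap bp or more
  new_cluster <- c(TRUE,
                   chr_s[-1] != chr_s[-length(chr_s)] |
                     diff(start_s) >= max_gap)
  id <- integer(length(start))
  id[ord] <- cumsum(new_cluster)        # put ids back in the original row order
  id
}

# The first two starts are 622 bp apart on chr 1; the others are far away
cluster_sites(chr = c(1, 1, 1, 2),
              start = c(16553549, 16554171, 32826843, 30669385))
# returns 1 1 2 3
```

Comparing neighbours only after sorting keeps this linear after the sort, instead of computing all pairwise distances.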
I first used the SAS first.location idea, then an R cumsum function I learned
from Bert. By comparing the region and num.location variables I can find the
second common region, although I have not figured out how to label it in R.
I have no idea how to find the first "common" region.
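The first.location / cumsum idea can be keyed on chromosome and start together, and counting the distinct tests at each location then flags the exact-match common region. A minimal sketch, assuming the rows are already ordered by location as in the example file; `key`, `n.tests` and `both` are hypothetical names chosen for illustration.

```r
# Data from the example file (test, chr and start columns only)
chr   <- c(rep(1, 13), rep(2, 3))
start <- rep(c(3229921, 16553549, 16554171, 32826843, 30669385),
             times = c(2, 3, 2, 6, 3))
test  <- rep(rep(c("categorical", "continuous"), 3), times = c(2, 3, 2, 3, 3, 3))

# Hypothetical key combining chromosome and start, so the same start on two
# different chromosomes is not treated as one location
key <- paste(chr, start, sep = ":")

# first.location / num.location as in the post, but on the combined key.
# cumsum() only numbers groups correctly if equal keys sit in adjacent rows.
first.location <- !duplicated(key)
num.location   <- cumsum(first.location)

# Number of distinct tests observed at each exact location
n.tests <- ave(seq_along(test), num.location,
               FUN = function(i) length(unique(test[i])))

# Sites at a location reported by both tests
both <- n.tests == 2
which(both)
# returns 8 9 10 11 12 13
```

This only catches identical start positions, which is why it finds the second common region but not the first.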
Can you help me?
Thank you very much!!
Ding
# "common" holds the labels I want to produce (filled in by hand here)
common <- c(NA, NA, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, NA, NA, NA)
site   <- seq(1, 16)
region <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6)
test   <- c("categorical", "categorical", "continuous", "continuous", "continuous",
            "categorical", "categorical", "continuous", "continuous", "continuous",
            "categorical", "categorical", "categorical", "continuous", "continuous",
            "continuous")
chr    <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2)
start  <- c(3229921, 3229921, 16553549, 16553549, 16553549, 16554171, 16554171,
            32826843, 32826843, 32826843, 32826843, 32826843, 32826843,
            30669385, 30669385, 30669385)
dat <- data.frame(common, site, region, test, chr, start, stringsAsFactors = FALSE)
# TRUE at the first row of each distinct start position (SAS first.location idea)
dat$first.location <- !duplicated(dat$start)
# Running count of distinct start positions (Bert's cumsum idea)
dat$num.location <- cumsum(!duplicated(dat$start))
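One possible way to fill in the "common" column: cluster the sites by location with the 1000 bp rule, keep only clusters that contain sites from both tests, and number those clusters in order. This is a sketch under stated assumptions (neighbouring starts are chained, and the threshold is read as strictly less than 1000 bp); `max_gap`, `cluster` and `n_tests` are illustrative names, not part of any package. On the example data it reproduces the hand-made labels.

```r
# Rebuild the example data (same values as in the post, minus "common")
dat <- data.frame(
  site   = 1:16,
  region = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6),
  test   = rep(rep(c("categorical", "continuous"), 3), times = c(2, 3, 2, 3, 3, 3)),
  chr    = c(rep(1, 13), rep(2, 3)),
  start  = rep(c(3229921, 16553549, 16554171, 32826843, 30669385),
               times = c(2, 3, 2, 6, 3)),
  stringsAsFactors = FALSE
)

# Assumed rule: same chromosome and less than 1000 bp from the previous site
max_gap <- 1000
ord     <- order(dat$chr, dat$start)
chr_s   <- dat$chr[ord]
start_s <- dat$start[ord]
new_cluster <- c(TRUE,
                 chr_s[-1] != chr_s[-length(chr_s)] | diff(start_s) >= max_gap)
dat$cluster <- integer(nrow(dat))
dat$cluster[ord] <- cumsum(new_cluster)   # cluster id in original row order

# A cluster is "common" when both tests contributed significant sites to it
n_tests <- ave(seq_len(nrow(dat)), dat$cluster,
               FUN = function(i) length(unique(dat$test[i])))
is_common <- n_tests == 2

# Number the common clusters 1, 2, ... in order of appearance; others stay NA
dat$common <- NA_integer_
dat$common[is_common] <- match(dat$cluster[is_common],
                               unique(dat$cluster[is_common]))
dat$common
# returns NA NA 1 1 1 1 1 2 2 2 2 2 2 NA NA NA
```

With this data, sites 3 to 7 (starts 622 bp apart) form the first common region and sites 8 to 13 (identical starts) form the second; regions 1 and 6 come from only one test each, so they stay NA.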