Umesh Rosyara
2011-Mar-05 23:29 UTC
[R] please help ! label selected data points in huge number of data points potentially as high as 50, 000 !
Dear All
I am reposting because my problem is a real issue and I have been working on
this. I know this might be simple to those who know it! Anyway, I need help!
Let me make my point clear. I have a huge number of data points plotted using either
the base plot function or xyplot in lattice (I prefer lattice).
  name xvar           p
1   M1    1 0.107983837
2   M2   11 0.209125624
3   M3   21 0.163959428
4   M4   31 0.132469859
5   M5   41 0.086095130
6   M6   51 0.180822010
7   M7   61 0.246619925
8   M8   71 0.147363687
9   M9   81 0.162663127
........
5000 observations
I need to plot xvar (x variable) and p (y variable) using either plot() or
xyplot(), and I want to show (print on the graph) the name labels for those
rows that have a p value < 0.01 (meaning that they are significant). With my
limited R knowledge I can use text(x, y, labels) to add the text manually,
but I have a huge number of data points (I provide just 5000 here, but
potentially it can go up to 50,000). So I want to display the names corresponding
to those observations (rows) that have a p value below the threshold (0.01 here).
Here is my example dataset and my status:
name <- c(paste ("M", 1:5000, sep = ""))
xvar <- seq(1, 50000, 10)
set.seed(134)
p <- rnorm(5000, 0.15,0.05)
dataf <- data.frame(name,xvar, p)
# using lattice (my first preference)
require(lattice)
xyplot(p ~ xvar, dataf)
# I want to display names for the observations that meet the requirement of p < 0.01.
which (dataf$p < 0.01)
[1] 811 854 1636 1704 2148 2161 2244 3205 3268 4177 4564 4614 4639 4706
Thus significant observations are:
      name  xvar             p
811   M811  8101  0.0050637068
854   M854  8531 -0.0433901783
1636 M1636 16351 -0.0279014039
1704 M1704 17031  0.0029878335
2148 M2148 21471  0.0048898232
2161 M2161 21601 -0.0354130557
2244 M2244 22431  0.0003255200
3205 M3205 32041  0.0079758430
3268 M3268 32671  0.0012797145
4177 M4177 41761  0.0015487439
4564 M4564 45631  0.0024867152
4614 M4614 46131  0.0078381964
4639 M4639 46381 -0.0063151605
4706 M4706 47051  0.0032200517
I want the data point (8101, 0.0050637068) labelled with M811 in the plot, and
similarly for all of the above (the significant ones). I do not want to label
the rest of the 5000 that do not have p value < 0.01. I know I can add labels
manually in base graphics, e.g. text(8101, 0.0050637068, "M811") after plot().
plot (dataf$xvar,p)
text (8101, 0.0050637068, "M811")
text (8531, -0.0433901783, "M854")
I need more automation to deal with as many as 50,000 observations. In practice
I do not know how many variables there will be.
Your help is highly appreciated. Thank you.
Best Regards
Umesh R
_____
From: Umesh Rosyara [mailto:rosyaraur@gmail.com]
Sent: Saturday, March 05, 2011 12:30 PM
To: 'r-help@r-project.org'
Subject: displaying label meeting condition (i.e. significant, i..e p value
less than 005) in plot function
Dear R users,
Here is my problem:
# example data
name <- c(paste ("M", 1:1000, sep = ""))
xvar <- seq(1, 10000, 10)
set.seed(134)
p <- rnorm(1000, 0.15,0.05)
dataf <- data.frame(name,xvar, p)
plot (dataf$xvar,p)
abline(h=0.05)
# I can know which observation number is less than 0.05
which (dataf$p < 0.05)
 [1]  12  20  80 269 272 338 366 368 397 403 432 453 494 543 592 691 723 789 811
[20] 854 891 931 955
I want to display (label) the corresponding names on the plot above, meaning
the 12th observation as M12, the 20th observation as M20, and so on. Please note
that my real names are not in numerical sequence (they are different names);
I used these only to create the example dataset easily.
Thanks in advance
Umesh R
Sarah Goslee
2011-Mar-06 13:11 UTC
[R] please help ! label selected data points in huge number of data points potentially as high as 50, 000 !
I think you've made your problem too complicated. Given your example
(and THANK YOU for including a workable example), is this not what you need?

sigdata <- dataf[dataf$p < 0.01,]
plot(dataf$xvar, dataf$p)
text(sigdata$xvar, sigdata$p, sigdata$name)

text() will take vectors of arguments.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
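Since the original question states a preference for lattice, the same idea can probably be carried over to xyplot() with a custom panel function. What follows is only a minimal sketch, not something posted in the thread: it rebuilds the example data from the question, and the names threshold, sig and plt are made up for illustration. It assumes the lattice functions panel.xyplot() and panel.text() and the subscripts argument that lattice passes to panel functions that ask for it.

```r
library(lattice)

# Rebuild the example data from the original post
name <- paste("M", 1:5000, sep = "")
xvar <- seq(1, 50000, 10)
set.seed(134)
p <- rnorm(5000, 0.15, 0.05)
dataf <- data.frame(name, xvar, p, stringsAsFactors = FALSE)

threshold <- 0.01            # assumed cutoff, taken from the question
sig <- dataf$p < threshold   # TRUE for the rows that should get a label

plt <- xyplot(p ~ xvar, data = dataf,
  panel = function(x, y, subscripts, ...) {
    panel.xyplot(x, y, ...)            # draw every point as usual
    keep <- y < threshold              # points in this panel below the cutoff
    # subscripts maps the panel's points back to rows of dataf
    panel.text(x[keep], y[keep],
               labels = dataf$name[subscripts][keep],
               pos = 3, cex = 0.7)     # pos = 3 puts the label above the point
  })
print(plt)  # explicit print is needed inside scripts and functions
```

Because the labels are looked up by row through subscripts, this should also work when the names are arbitrary strings rather than M1, M2, and so on.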
csrabak
2011-Mar-06 17:46 UTC
[R] please help ! label selected data points in huge number of data points potentially as high as 50, 000 !
Umesh,

Given that you have already been shown how to do more or less what you intend,
I want to suggest something simpler which, given the description of your
problem, seems more appropriate to me: create a vector that changes the symbol
used to plot the points in the scatter diagram:

p.point <- ifelse(p < 0.01, 1, 19)  # decides which symbol each point gets
# look at example(pch)
plot(xvar, p, pch = p.point)

HTH

--
Cesar Rabak
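The two suggestions in this thread (a different symbol for significant points, and vectorised text() labels) can be combined in base graphics. The sketch below is one possible way to do it, not a method given by either respondent: it rebuilds the example data from the question, and threshold and sig are illustrative names. The dashed reference line and the label placement via pos are additions made here for readability.

```r
# Rebuild the example data from the original post
name <- paste("M", 1:5000, sep = "")
xvar <- seq(1, 50000, 10)
set.seed(134)
p <- rnorm(5000, 0.15, 0.05)
dataf <- data.frame(name, xvar, p, stringsAsFactors = FALSE)

threshold <- 0.01            # assumed cutoff, taken from the question
sig <- dataf$p < threshold   # one logical flag per row

# Open circle (pch 1) for significant points, filled circle (pch 19) otherwise
plot(dataf$xvar, dataf$p,
     pch = ifelse(sig, 1, 19),
     xlab = "xvar", ylab = "p")
abline(h = threshold, lty = 2)   # dashed line at the cutoff

# text() is vectorised, so one call labels every flagged point
text(dataf$xvar[sig], dataf$p[sig],
     labels = dataf$name[sig],
     pos = 3, cex = 0.7)         # pos = 3 places each label above its point
```

Nothing in this sketch depends on the number of rows, so the same calls should apply unchanged to 50,000 observations; only the flagged rows are labelled.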