Benjamin Dubreuil
2015-Jun-02 10:37 UTC
[R] Scatterplot : smoothing colors according to density of points
Hello everyone,

I have a data frame D with 4 columns: id, X, Y, C.
I want to draw a simple scatter plot of D$X vs. D$Y, using the D$C values as the color. (id is just a text string not used for the plot.)

However, I don't want to use the raw values of D$C. I would prefer to calculate the average value of D$C according to the density of points in a fixed neighborhood.
In other words, I would like to smooth the colors according to the density of points.

I am looking for any function or package that could solve this.
So far, I've been looking at the MASS library and its function kde2d, which can calculate the density of points in 2 directions, but I don't see how I could then use this information to recalculate my D$C values.

Here is a piece of the data:

> head(D)
      id         X         Y            C
1 O13297 44.444444  21.61220 -0.136651639
2 O13329 31.272085   4.01590 -0.117016949
3 O13525  6.865672   2.43884 -0.161173913
4 O13539 14.176245   7.81217 -0.075756757
5 O13541 73.275862   3.59012 -0.006988235
6 O13547 28.991597 258.99900 -0.013985507

> dim(D)
[1] 3616    4

> apply(D[,-1],2,range)
               X          Y          C
[1,]   0.3378378     0.0003 -0.7382222
[2,] 100.0000000 24556.4000  0.5582500

(Y is not linear, so I use log='y' in the plot function.)

I used a palette of 100 colors ranging from blue to yellow to red.

> pal = colorRampPalette(c("blue","yellow","red"))(100)

To make the D$C values correspond to a color, I used a cut with the following breaks (101 breaks from -1.2 to 1.2):

> BREAKS
  [1] -1.2000 -0.8000 -0.4000 -0.3600 -0.3200 -0.2800 -0.2400 -0.2000 -0.1925
 [10] -0.1850 -0.1775 -0.1700 -0.1625 -0.1550 -0.1475 -0.1400 -0.1368 -0.1336
 [19] -0.1304 -0.1272 -0.1240 -0.1208 -0.1176 -0.1144 -0.1112 -0.1080 -0.1048
 [28] -0.1016 -0.0984 -0.0952 -0.0920 -0.0888 -0.0856 -0.0824 -0.0792 -0.0760
 [37] -0.0728 -0.0696 -0.0664 -0.0632 -0.0600 -0.0568 -0.0536 -0.0504 -0.0472
 [46] -0.0440 -0.0408 -0.0376 -0.0344 -0.0312 -0.0280 -0.0248 -0.0216 -0.0184
 [55] -0.0152 -0.0120 -0.0088 -0.0056 -0.0024  0.0008  0.0040  0.0072  0.0104
 [64]  0.0136  0.0168  0.0200  0.0232  0.0264  0.0296  0.0328  0.0360  0.0392
 [73]  0.0424  0.0456  0.0488  0.0520  0.0552  0.0584  0.0616  0.0648  0.0680
 [82]  0.0712  0.0744  0.0776  0.0808  0.0840  0.0872  0.0904  0.0936  0.0968
 [91]  0.1000  0.1250  0.1500  0.1750  0.2000  0.2250  0.2500  0.4875  0.7250
[100]  0.9625  1.2000

> C.levels = as.numeric(cut(D$C,breaks=BREAKS))
> length(C.levels)
[1] 3616

C.levels ranges from 2 to 98, and to plot the colors I used pal[C.levels].

> plot( x=D$X, y=D$Y, col=pal[ C.levels ],log='y')
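On the kde2d point raised in the question: kde2d returns a density grid rather than one value per point, so one possible way to use it is to look up the grid cell that each point falls into. The following is only a minimal sketch on synthetic data; the stand-in data frame, the grid size of 50, the use of log10(Y), and the column name dens are all assumptions made for illustration. Note that it colors points by local density and does not by itself average D$C.

```r
library(MASS)  # provides kde2d()

set.seed(1)
# Synthetic stand-in for D (assumed values, not the poster's data)
n <- 500
D <- data.frame(X = runif(n, 0, 100),
                Y = 10^rnorm(n, mean = 1, sd = 1),
                C = rnorm(n, sd = 0.1))

# Work on log10(Y) because the plot uses a log y axis (an assumption here)
lx <- D$X
ly <- log10(D$Y)

# 2-D kernel density estimate on a 50 x 50 grid spanning the data range
k <- kde2d(lx, ly, n = 50)

# Index of the grid cell containing each point; all.inside = TRUE keeps
# the indices within 1..(n - 1) so the lookup below is always valid
ix <- findInterval(lx, k$x, all.inside = TRUE)
iy <- findInterval(ly, k$y, all.inside = TRUE)
D$dens <- k$z[cbind(ix, iy)]

# Map the per-point density onto the 100-color palette and plot
pal <- colorRampPalette(c("blue", "yellow", "red"))(100)
dens.levels <- cut(D$dens, breaks = 100, labels = FALSE)
plot(D$X, D$Y, col = pal[dens.levels], log = "y", pch = 16)
```

The lookup takes the density at the lower-left node of each cell, which is coarse; a finer grid or bilinear interpolation would give smoother values.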
Adams, Jean
2015-Jun-02 13:51 UTC
[R] Scatterplot : smoothing colors according to density of points
Try this.

Jean

D <- structure(list(
  id = structure(1:6, .Label = c("O13297", "O13329", "O13525",
    "O13539", "O13541", "O13547"), class = "factor"),
  X = c(44.444444, 31.272085, 6.865672, 14.176245, 73.275862,
    28.991597),
  Y = c(21.6122, 4.0159, 2.43884, 7.81217, 3.59012, 258.999)),
  .Names = c("id", "X", "Y"), class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6"))

# define the number of colors
ncol <- 100
# define the radius of the neighborhood
distcut <- 30
pal <- colorRampPalette(c("blue", "yellow", "red"))(ncol)

# calculate the euclidean distance between all pairs of points, based on X, Y coordinates
Ddist <- with(D, as.matrix(dist(cbind(X, Y), diag=TRUE, upper=TRUE)))
# count up the number of neighbors within distcut distance of each point
D$C <- apply(Ddist<distcut, 2, sum)
# use this count to define the levels (which will then be used to color points in the plot)
D$Clevels <- with(D,
  cut(C, breaks=seq(min(C), max(C), length.out=ncol+1),
    labels=FALSE, include.lowest=TRUE))

# plot the data
with(D, plot(X, Y, col=pal[Clevels], log="y", pch=16))

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
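The reply's code colors each point by its number of neighbors. Since the original question asked for the average of D$C within a fixed neighborhood, a possible variation is sketched below. It is not the poster's code: the synthetic data, the rescaling of X and log10(Y) to [0, 1], the radius of 0.1, and the names rescale01, nnb and Cmean are assumptions chosen for illustration.

```r
set.seed(1)
# Synthetic stand-in for D (assumed values, not the poster's data)
n <- 200
D <- data.frame(X = runif(n, 0, 100),
                Y = 10^runif(n, -1, 3),
                C = rnorm(n, sd = 0.1))

# Coordinates used for distances: X and log10(Y), each rescaled to [0, 1]
# (an assumption made here so that neither axis dominates the distance)
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
xy <- cbind(rescale01(D$X), rescale01(log10(D$Y)))

radius <- 0.1  # hypothetical neighborhood radius in the rescaled units

# W[i, j] is 1 when points i and j are within the radius, 0 otherwise;
# the diagonal is 1, so every point counts as its own neighbor
W <- (as.matrix(dist(xy)) < radius) * 1

D$nnb   <- rowSums(W)                      # neighbors per point
D$Cmean <- as.vector(W %*% D$C) / D$nnb    # mean of C over each neighborhood

# Color by the smoothed values
pal <- colorRampPalette(c("blue", "yellow", "red"))(100)
lev <- cut(D$Cmean, breaks = 100, labels = FALSE)
plot(D$X, D$Y, col = pal[lev], log = "y", pch = 16)
```

The full distance matrix needs on the order of n^2 values, which is still manageable for the 3616 rows mentioned in the question but grows quickly for larger data.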
jim holtman
2015-Jun-15 00:08 UTC
[R] Scatterplot : smoothing colors according to density of points
Check out the 'hexbin' package for making scatter plots that have a lot of points overlapping in a small area.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
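For readers who want to try the hexbin suggestion, here is a rough sketch of how it might be combined with the averaging asked for in the original question. It assumes the hexbin package is installed from CRAN; the synthetic data, the choice of 20 bins, binning on log10(Y), and the column name Cmean are illustrative assumptions rather than anything stated in the thread.

```r
library(hexbin)  # CRAN package; install.packages("hexbin") if missing

set.seed(1)
# Synthetic stand-in for D (assumed values, not the poster's data)
n <- 1000
D <- data.frame(X = runif(n, 0, 100),
                Y = 10^rnorm(n, mean = 1, sd = 1),
                C = rnorm(n, sd = 0.1))

# Bin on X and log10(Y); IDs = TRUE records the cell of every point
hb <- hexbin(D$X, log10(D$Y), xbins = 20, IDs = TRUE)

# Mean of C within each occupied hexagon, then mapped back to the points
cell.mean <- tapply(D$C, hb@cID, mean)
D$Cmean <- as.vector(cell.mean[as.character(hb@cID)])

# Default hexbin plot: hexagons shaded by the number of points they hold
plot(hb)
```

The per-point Cmean column can then be passed through cut() and a palette in the same way as the raw C values in the original post.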