I have a scatter plot with 10000 points. I would like to add a line that bins every 50 points and connects the average of each bin. I'm looking for something similar to line type "m" in Stata.

With this dataset of 10000 points, I would also like to bin the data and make boxplots at certain intervals, so that I have a set of boxplots to represent each bin. I would also like the width of each box to be proportional to the number of points in each bin.

How can I make these plots? Is there a simple package to use?

Jeffrey
On Nov 22, 2011, at 12:29 AM, Jeffrey Joh wrote:

> I have a scatter plot with 10000 points.

So you have numeric x and y values.

> I would like to add a line that bins every 50 points and connects
> the average of each bin.

What is the rule to be applied to form these bins? You may want to look at ?cut and ?quantile.

> I'm looking for something similar to line type "m" in Stata.

Many, perhaps most, of us do not know what that means. People complain about the help files for R but they are crystal clear compared with the help files I have seen for Stata, so I do not intend searching those out.

> With this dataset of 10000 points, I would also like to bin the data
> and make boxplots at certain intervals,

... of what?

> so that I have a set of boxplots to represent each bin. I would
> also like the width of each box to be proportional to the number of
> points in each bin.
>
> How can I make these plots? Is there a simple package to use?

Probably any of the three plotting paradigms can be used but you need to describe the problem in an unambiguous manner.

> Jeffrey
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
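To illustrate the ?cut and ?quantile pointers above, here is a minimal sketch of two possible binning rules followed by boxplots per bin. The data are simulated, and the names x, y, bin_width, and bin_quant are assumptions made up for this example; nothing here is taken from the original poster's data. Note that varwidth = TRUE in base boxplot() scales box widths by the square root of the bin counts, not by the counts themselves.

```r
# Simulated stand-in data (an assumption; the thread gives no data).
set.seed(1)
x <- rnorm(10000)
y <- 2 * x + rnorm(10000)

# Rule A: ten equal-width intervals along x, so the counts differ per bin.
bin_width <- cut(x, breaks = 10)

# Rule B: breaks at the deciles of x, so each bin holds about the same count.
bin_quant <- cut(x, breaks = quantile(x, probs = seq(0, 1, by = 0.1)),
                 include.lowest = TRUE)

table(bin_width)   # uneven counts
table(bin_quant)   # roughly 1000 points per bin

# One box per bin; varwidth = TRUE makes box widths proportional to
# the square roots of the number of observations in each bin.
boxplot(y ~ bin_width, varwidth = TRUE, las = 2, xlab = "", ylab = "y")
```

With rule B the boxes would all come out about the same width, so the variable-width display is mainly informative for rule A. boxplot() also accepts a width argument giving relative box widths, which may be closer to what was asked if strict proportionality to the counts is wanted.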
Hi Jeffrey,

See ?factor, ?rep, and ?cut. Basically, you just need to create another variable that indicates which bin a point belongs to, and then you just do regular plots. If you want to mix the scatterplot and the binned points, you'll need to make sure the bins fall somewhere in the same space. Once you have a variable indicating the bins, you could center the bins at the mean, median, or ... for each set of points using ?ave.

Cheers,

Josh

On Mon, Nov 21, 2011 at 9:29 PM, Jeffrey Joh <johjeffrey at hotmail.com> wrote:
>
> I have a scatter plot with 10000 points. I would like to add a line that bins every 50 points and connects the average of each bin. I'm looking for something similar to line type "m" in Stata.
>
> With this dataset of 10000 points, I would also like to bin the data and make boxplots at certain intervals, so that I have a set of boxplots to represent each bin. I would also like the width of each box to be proportional to the number of points in each bin.
>
> How can I make these plots? Is there a simple package to use?
>
> Jeffrey
>

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
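The suggestion above (build a bin indicator with rep() and factor(), then center each bin with ave()) could look roughly like the sketch below. It assumes the bins are meant to be consecutive runs of 50 points after sorting by x, which the original question does not actually state. The data are simulated, and the names xy, bin, x_center, y_center, and centers are made up for illustration.

```r
# Simulated stand-in data (an assumption; the thread gives no data).
set.seed(1)
x <- runif(10000)
y <- sin(2 * pi * x) + rnorm(10000, sd = 0.3)
xy <- data.frame(x = x, y = y)

# Sort by x so that consecutive rows are neighbours along the x axis.
xy <- xy[order(xy$x), ]

# Bin indicator: rows 1-50 are bin 1, rows 51-100 are bin 2, and so on.
bin_size <- 50
n_bins <- ceiling(nrow(xy) / bin_size)
xy$bin <- factor(rep(seq_len(n_bins), each = bin_size, length.out = nrow(xy)))

# ave() returns a vector as long as its input holding each row's bin mean
# (mean is the default FUN), so the centers live in the same x-y space.
xy$x_center <- ave(xy$x, xy$bin)
xy$y_center <- ave(xy$y, xy$bin)

# Keep one row per bin for drawing the line.
centers <- unique(xy[, c("bin", "x_center", "y_center")])

# Scatter plot with the line of bin means drawn on top.
plot(xy$x, xy$y, pch = ".", col = "grey50", xlab = "x", ylab = "y")
lines(centers$x_center, centers$y_center, col = "red", lwd = 2)
```

Passing FUN = median to ave() would center the bins at the median instead.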
On 11/22/2011 04:29 PM, Jeffrey Joh wrote:
>
> I have a scatter plot with 10000 points. I would like to add a line that bins every 50 points and connects the average of each bin. I'm looking for something similar to line type "m" in Stata.
>
> With this dataset of 10000 points, I would also like to bin the data and make boxplots at certain intervals, so that I have a set of boxplots to represent each bin. I would also like the width of each box to be proportional to the number of points in each bin.
>
> How can I make these plots? Is there a simple package to use?
>

Hi Jeffrey,

There are three possibilities that come to mind:

1) You want to bin the points based on their order in the data frame.

2) You want to bin the points based on the x or y values of the coordinates.

3) You want to bin the points based on the x _and_ y values of the coordinates.

Number 1 is trivial and has already been answered (assume a two column data frame of coordinates named "xypoints", with columns "x" and "y").

# first point - set up a loop to get a vector of averages
# 10000 points in bins of 50 gives 200 bins
meanx<-rep(0,200)
meany<-rep(0,200)
for(index in 1:200) {
 # first row of this bin; the bin covers rows start to start+49
 start<-1+50*(index-1)
 meanx[index]<-mean(xypoints[start:(start+49),"x"])
 meany[index]<-mean(xypoints[start:(start+49),"y"])
}
# this draws a new plot; use lines(meanx,meany) instead to add
# the line to an existing scatter plot
plot(meanx,meany,type="l")

Number 2 requires that you sort the pairs based on the value of the one you want, then apply the same process as 1 to the sorted pairs.

Number 3 is somewhat more difficult. I don't do this much, and some of the people who do map analysis will probably come up with a much better method.

Find the most extreme point.
Find the 49 points closest to that point to constitute group 1.
Remove those points from the data frame.
Go back to the first step if there are any points left.

You will end up with 200 groups of points that are spatially grouped. Get the centroids and plot as above.

Another wild guess from

Jim
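The steps for number 3 can be sketched as the loop below. This is only one reading of the description: "most extreme" is taken here to mean the remaining point farthest from the overall centroid, and closeness is plain Euclidean distance, both of which are assumptions. The data are simulated, and the names group_size, remaining, group, and centroids are made up for illustration. Because spatial groups have no natural ordering, the centroids are drawn as points, not joined by a line.

```r
# Simulated stand-in data (an assumption; the thread gives no data).
set.seed(1)
xypoints <- data.frame(x = rnorm(10000), y = rnorm(10000))

group_size <- 50
remaining <- seq_len(nrow(xypoints))   # row indices not yet assigned
group <- integer(nrow(xypoints))       # group id per row, 0 = unassigned
center <- colMeans(xypoints[, c("x", "y")])
g <- 0

while (length(remaining) > 0) {
  g <- g + 1
  px <- xypoints$x[remaining]
  py <- xypoints$y[remaining]
  # Assumption: "most extreme" = farthest from the overall centroid.
  seed <- which.max((px - center[["x"]])^2 + (py - center[["y"]])^2)
  # Squared distance from the seed to every remaining point; the seed
  # itself has distance 0, so it is always part of its own group.
  d2 <- (px - px[seed])^2 + (py - py[seed])^2
  take <- order(d2)[seq_len(min(group_size, length(remaining)))]
  group[remaining[take]] <- g
  remaining <- remaining[-take]        # remove the grouped points
}

# Centroid of each spatial group.
centroids <- data.frame(x = tapply(xypoints$x, group, mean),
                        y = tapply(xypoints$y, group, mean))

plot(xypoints$x, xypoints$y, pch = ".", col = "grey50", xlab = "x", ylab = "y")
points(centroids$x, centroids$y, pch = 19, col = "red")
```

A greedy pass like this tends to leave scattered leftovers for the last few groups, so the later centroids may be less meaningful than the early ones.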