Dear R-experts,

Below is my reproducible example. I would like to fit/add a Gaussian (normal) curve to this data. I cannot get it to work: there is no error message, but I do not get what I am looking for.

Many thanks for your help.

############################################################
mydates <- as.Date(c("2020-03-15", "2020-03-16","2020-03-17","2020-03-18","2020-03-19","2020-03-20","2020-03-21","2020-03-22","2020-03-23","2020-03-24","2020-03-25","2020-03-26","2020-03-27","2020-03-28","2020-03-29","2020-03-30","2020-03-31","2020-04-01","2020-04-02","2020-04-03","2020-04-04","2020-04-05","2020-04-06","2020-04-07","2020-04-08","2020-04-09","2020-04-10"))

nc<-c(1,1,2,7,3,6,6,20,17,46,67,71,56,70,85,93,301,339,325,226,608,546,1069,1264,1340,813,608)

plot(as.Date(mydates),nc,pch=16,type="o",col="blue",ylim=c(1,1400), xlim=c(min(as.Date(mydates)),max(as.Date(mydates))))

x <- seq(min(mydates), max(mydates), 0.1)

curve(dnorm(x, mean(nc), sd(nc)), add=TRUE, col="red", lwd=2)
############################################################

On 4/11/20 7:00 AM, varin sacha via R-help wrote:
> I would like to fit/add the Gauss normal curve to this data.
> [...]
> curve(dnorm(x, mean(nc), sd(nc)), add=TRUE, col="red", lwd=2)

(I infer) The values in the `nc` vector are not observations that can be interpreted as independent samples from a continuous random variable. They are counts, i.e. "new cases". Furthermore, the `nc` vector is not the "x" value in your plot but rather the "y" vector, so even if it were appropriate to use a Normal curve for fitting, you would need to treat the `nc` vector as corresponding to a density along the time axis.

You could probably do as well by "eyeballing" where you want the "normal" curve to sit, since there would be no theoretical support for more refined curve-fitting efforts. You would also need to scale the density values so that they appear as something other than a flat line. And the `curve` function does need an expression, but here it would be plotting that result far to the left of your current plotting range, which is set by the integer values of those dates, i.e. values in the tens of thousands.

Use the `lines` function for better control.

lines(x = as.numeric(mydates),
      # 3000 was an eyeball guess as to a scaling factor that might work,
      # but it needed a larger number to make the curves commensurate
      y = 10000 * dnorm(x = as.numeric(mydates),                    # set a proper x scale
                        mean = as.numeric(mydates[which.max(nc)]),  # use location of max
                        sd = 7))

You might need a smaller value for the "standard deviation" and a higher scaling factor to improve the eyeball fit. You might like a value of sd=4, but it would remain an unsupportable effort from a statistical viewpoint.

-- 
David

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
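
The two points in the reply above (the date axis is numeric in the tens of thousands, and the density must be rescaled to be visible) can be checked directly. The following is only a sketch: `sd_guess`, `peak_day` and `scale_factor` are names introduced here for illustration, and matching the curve's peak to max(nc) is an assumption standing in for the eyeballed factor of 10000, not a statistical fit.

```r
mydates <- seq(as.Date("2020-03-15"), as.Date("2020-04-10"), by = "day")
nc <- c(1,1,2,7,3,6,6,20,17,46,67,71,56,70,85,93,301,339,325,226,
        608,546,1069,1264,1340,813,608)

# Dates are stored as days since 1970-01-01, so the x axis covers values
# above 18000, while mean(nc) is only a few hundred: a curve centred at
# mean(nc) lies far to the left of the plotting region.
range(as.numeric(mydates))
mean(nc)

# Illustrative assumption: pick the scale so that the peak of the curve
# equals the largest observed count.
sd_guess     <- 4
peak_day     <- as.numeric(mydates[which.max(nc)])
scale_factor <- max(nc) / dnorm(0, mean = 0, sd = sd_guess)

plot(mydates, nc, pch = 16, type = "o", col = "blue", ylim = c(1, 1400))
lines(x = as.numeric(mydates),
      y = scale_factor * dnorm(as.numeric(mydates), mean = peak_day, sd = sd_guess),
      col = "red", lwd = 2)
```

Changing `sd_guess` widens or narrows the curve while the peak height stays at max(nc), which makes the eyeball adjustment a one-parameter exercise.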

Two obvious problems:

1. mean(nc) is a count, not a date, sd likewise
2. the scale of dnorm() is density, not count

So (slightly inefficient, but who cares...):

y <- rep(mydates, nc)
n <- sum(nc)
curve(n*dnorm(x, mean(y), sd(y)), add=TRUE, col="red", lwd=2)

-pd

> On 11 Apr 2020, at 16:00 , varin sacha via R-help <r-help at r-project.org> wrote:
>
> [...]
> curve(dnorm(x, mean(nc), sd(nc)), add=TRUE, col="red", lwd=2)

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
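
As a side note on why the rep() trick above works: repeating each date nc times turns the counts into one date per case, so mean(y) and sd(y) are the count-weighted mean and standard deviation of the dates, and multiplying the density by n converts it to expected cases per one-day bin. The sketch below computes the same two quantities without expanding the vector; the names `d`, `mu` and `sigma` are introduced here for illustration only.

```r
mydates <- seq(as.Date("2020-03-15"), as.Date("2020-04-10"), by = "day")
nc <- c(1,1,2,7,3,6,6,20,17,46,67,71,56,70,85,93,301,339,325,226,
        608,546,1069,1264,1340,813,608)

d <- as.numeric(mydates)   # dates as days since 1970-01-01
n <- sum(nc)               # total number of cases

# Count-weighted mean and (sample) standard deviation of the dates
mu    <- sum(nc * d) / n
sigma <- sqrt(sum(nc * (d - mu)^2) / (n - 1))

# Same values as the expanded-vector version
y <- rep(mydates, nc)
all.equal(mu, as.numeric(mean(y)))
all.equal(sigma, sd(as.numeric(y)))

plot(mydates, nc, pch = 16, type = "o", col = "blue", ylim = c(1, 1400))
curve(n * dnorm(x, mu, sigma), add = TRUE, col = "red", lwd = 2)
```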

Dear Peter, Dear David,

Many thanks for your responses. Indeed, counts do not have a Gaussian distribution, even if one sometimes approximates their distribution by a Gaussian one, usually by appeal to the Central Limit Theorem.

Below is the reproducible example. One last thing: on the graph I can see that the red Gaussian curve is centered around the 5th of April. Is it possible to move the curve to the left or to the right, for example to center it on the 30th of March? If so, how?

############################################################
mydates <- as.Date(c("2020-03-15", "2020-03-16","2020-03-17","2020-03-18","2020-03-19","2020-03-20","2020-03-21","2020-03-22","2020-03-23","2020-03-24","2020-03-25","2020-03-26","2020-03-27","2020-03-28","2020-03-29","2020-03-30","2020-03-31","2020-04-01","2020-04-02","2020-04-03","2020-04-04","2020-04-05","2020-04-06","2020-04-07","2020-04-08","2020-04-09","2020-04-10"))

nc<-c(1,1,2,7,3,6,6,20,17,46,67,71,56,70,85,93,301,339,325,226,608,546,1069,1264,1340,813,608)

plot(as.Date(mydates),nc,pch=16,type="o",col="blue",ylim=c(1,1400), xlim=c(min(as.Date(mydates)),max(as.Date(mydates))))

y <- rep(mydates, nc)
n <- sum(nc)
curve(n*dnorm(x, mean(y), sd(y)), add=TRUE, col="red", lwd=2)
############################################################

On Saturday, 11 April 2020 at 17:02:36 UTC+2, peter dalgaard <pdalgd at gmail.com> wrote:
> [...]
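
On the question of moving the curve: its center is simply the mean argument passed to dnorm(), so one possible sketch is to replace mean(y) with the numeric value of the desired date. The names `centre` and `spread` are introduced here for illustration; keeping the standard deviation of the expanded dates is an assumption, and a curve repositioned this way is no longer fitted to the data in any sense.

```r
mydates <- seq(as.Date("2020-03-15"), as.Date("2020-04-10"), by = "day")
nc <- c(1,1,2,7,3,6,6,20,17,46,67,71,56,70,85,93,301,339,325,226,
        608,546,1069,1264,1340,813,608)

y <- rep(mydates, nc)
n <- sum(nc)

# Center chosen by hand (30 March 2020) instead of mean(y)
centre <- as.numeric(as.Date("2020-03-30"))
# Assumption: keep the spread estimated from the expanded dates
spread <- sd(as.numeric(y))

plot(mydates, nc, pch = 16, type = "o", col = "blue", ylim = c(1, 1400))
curve(n * dnorm(x, mean = centre, sd = spread), add = TRUE, col = "red", lwd = 2)
```

Any other date can be substituted in as.Date(); adding or subtracting a number of days from `centre` shifts the curve right or left by that many days.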