Dear all,

I am trying to validate a model by comparing simulated output values against observed values. I have produced a simple X-y scatter plot with a 1:1 line, so that the closer the points fall to this line, the better the 'fit' between the modelled data and the observation data.

I am now attempting to quantify the strength of this fit by using a statistical test in R. I am no statistics guru, but from my limited understanding, I suspect that I need to use the Chi Squared test (I am more than happy to be corrected on this though!).

However, this results in the following:

> chisq.test(data$Simulation,data$Observation)

        Pearson's Chi-squared test

data:  data$Simulation and data$Observation
X-squared = 567, df = 550, p-value = 0.2989

Warning message:
In chisq.test(data$Simulation, data$Observation) :
  Chi-squared approximation may be incorrect

The ?chisq.test document suggests that the objects should be of vector or matrix format, so I tried the following, but still receive a warning message (and different results):

> chisq.test(as.matrix(data[,4:5]))

        Pearson's Chi-squared test

data:  as.matrix(data[, 4:5])
X-squared = 130.8284, df = 26, p-value = 6.095e-16

Warning message:
In chisq.test(as.matrix(data[, 4:5])) :
  Chi-squared approximation may be incorrect

What am I doing wrong and how can I successfully measure how well the simulated values fit the observed values?

If it's of any help, here are how my data are structured - note that I am only using columns 4 and 5 (Observation and Simulation).

> str(data)
'data.frame':   27 obs. of  5 variables:
 $ Location        : Factor w/ 27 levels "Australia","Brazil",..: 8 2 13 19 22 14 16 23 6 7 ...
 $ Vegetation      : Factor w/ 21 levels "Beech","Broadleaf evergreen laurel",..: 17 21 2 16 15 16 9 16 3 4 ...
 $ Vegetation.Class: Factor w/ 4 levels "Boreal and Temperate Evergreen",..: 3 3 4 1 1 1 4 1 4 1 ...
 $ Observation     : num  24 8.9 14.7 26.7 42.4 31.7 30.8 7.5 14 22 ...
 $ Simulation      : num  33.9 7.8 9.74 7.6 11.8 10.7 12 28.1 1.7 1.7 ...

I hope someone is able to point me in the right direction.

Many thanks,

Steve

On Nov 26, 2009, at 9:48 AM, Steve Murray wrote:

> I am now attempting to quantify the strength of this fit by using a
> statistical test in R.
>
> [...]
>
>> chisq.test(data$Simulation,data$Observation)
>
>         Pearson's Chi-squared test
>
> data:  data$Simulation and data$Observation
> X-squared = 567, df = 550, p-value = 0.2989
>
> [...]
>
>> chisq.test(as.matrix(data[,4:5]))
>
>         Pearson's Chi-squared test
>
> data:  as.matrix(data[, 4:5])
> X-squared = 130.8284, df = 26, p-value = 6.095e-16

Your "data" has only 27 cases, so it is implausible that your first invocation, with 550 degrees of freedom, is giving you something meaningful. The second one might have been a more meaningful goodness-of-fit test. I cannot explain why call #1 did not give the same results, since I would have thought that R's positional matching would have produced the same results for both calls. What happens if you try:

chisq.test(data$Simulation, y=data$Observation)  # ?

All of that being said, chisq.test is primarily intended for contingency tables. Testing association between two paired continuous variables is usually approached with regression and correlation tests. E.g.:

?cor
?lm

You may also want to look at the Q-Q plot.

?qqplot

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

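On why the two calls in the question disagree: as documented in ?chisq.test, when x and y are both vectors they are coerced to factors and cross-tabulated, whereas a matrix argument is taken directly as a contingency table of counts. The sketch below illustrates this using only the first ten pairs visible in the str() output (not the full 27 rows); the names obs, sim, tab, res_vec and res_mat are placeholders chosen for this example.

```r
# First ten pairs as shown in the str() output; the remaining 17 rows
# were not posted, so this is an illustration only.
obs <- c(24, 8.9, 14.7, 26.7, 42.4, 31.7, 30.8, 7.5, 14, 22)
sim <- c(33.9, 7.8, 9.74, 7.6, 11.8, 10.7, 12, 28.1, 1.7, 1.7)

# Two-vector form: each distinct value becomes a factor level, so the
# test runs on a sparse table of 0/1 counts.
tab <- table(sim, obs)
dim(tab)                      # one row per distinct sim, one column per distinct obs

# suppressWarnings() hides the "approximation may be incorrect" warning.
res_vec <- suppressWarnings(chisq.test(sim, obs))
res_vec$parameter             # df = (rows - 1) * (cols - 1) of 'tab'

# Matrix form: the 10 x 2 matrix is treated as a table of counts,
# with the measurements themselves read as cell frequencies.
res_mat <- suppressWarnings(chisq.test(cbind(sim, obs)))
res_mat$parameter             # df = (10 - 1) * (2 - 1)
```

By the same arithmetic, df = 26 in the second call of the question equals (27 - 1) * (2 - 1), and df = 550 = 22 * 25 would be consistent with one column having 23 distinct values and the other 26. Neither table is a meaningful summary of paired continuous measurements, which supports the advice to use correlation or regression instead.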
Steve Murray wrote:

> What am I doing wrong and how can I successfully measure how well the simulated values fit the observed values?
>
> [...]
>
>> str(data)
> 'data.frame':   27 obs. of  5 variables:
> [...]
>  $ Observation     : num  24 8.9 14.7 26.7 42.4 31.7 30.8 7.5 14 22 ...
>  $ Simulation      : num  33.9 7.8 9.74 7.6 11.8 10.7 12 28.1 1.7 1.7 ...

The chi-squared test is not the right thing here. You may have been fooled by the "goodness-of-fit" phrase associated with the test.

I would do a cor.test(). But if the above is the real data, then there probably isn't much to test; you have very little agreement for the first 10 pairs.

-Peter Ehlers

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
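
As a rough sketch of the cor.test() and lm() suggestions in this thread, the following uses only the ten pairs visible in the str() output, so the numbers it produces say nothing definite about the full 27-row data set. The names obs, sim, ct, fit, rmse and bias are placeholders for this example. The RMSE and mean bias lines are an addition not proposed by the posters: correlation measures linear association, not closeness to the 1:1 line, so an error measure may be a useful complement.

```r
# First ten pairs from the posted str() output (illustration only).
obs <- c(24, 8.9, 14.7, 26.7, 42.4, 31.7, 30.8, 7.5, 14, 22)
sim <- c(33.9, 7.8, 9.74, 7.6, 11.8, 10.7, 12, 28.1, 1.7, 1.7)

# Pearson correlation test: null hypothesis of zero correlation.
ct <- cor.test(sim, obs)
ct$estimate                   # sample correlation
ct$p.value

# Regression of simulated on observed values. Perfect agreement with
# the 1:1 line would give an intercept near 0 and a slope near 1.
fit <- lm(sim ~ obs)
coef(fit)
confint(fit)                  # do the intervals cover 0 and 1?

# Direct measures of distance from the 1:1 line (assumed additions).
rmse <- sqrt(mean((sim - obs)^2))   # root mean squared error
bias <- mean(sim - obs)             # negative: simulation below observation
c(rmse = rmse, bias = bias)
```

With the real data, the vectors would be replaced by data$Observation and data$Simulation.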