Hi,

I've been running some simulations for a while and the performance of R has been great. However, I've recently changed the code to perform a sort of chi-square goodness-of-fit test. To get the observed count for each cell I've been using cut2() from Hmisc to divide the range into a specified number of cells, and then table() to count how many observations fall in each cell:

> obs <- table(cut2(z.trun, cuts=breaks))

Since making this change, the code takes much longer to run, up to 10x as long. Is there a more efficient way of doing this? Anyone have any thoughts?

-- 
SC

Simon Cullen
Room 3030
Dept. Of Economics
Trinity College Dublin

Ph. (608)3477
Email cullens at tcd.ie
Have you tried using hist() with `breaks' specified and `plot = FALSE'? See the note in ?cut.

-roger

-- 
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
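A minimal sketch of the hist() suggestion follows. The thread never shows z.trun or breaks, so the data below are hypothetical stand-ins, and base cut() is used for the comparison rather than cut2() so that the sketch runs without Hmisc. Note that hist() stops with an error if the breaks do not span the range of the data, and that its default intervals are right-closed, which may differ from the convention cut2() uses.

```r
# Hypothetical stand-ins for the poster's data (not from the thread).
set.seed(1)
z.trun <- runif(1000)                    # values in (0, 1)
breaks <- seq(0, 1, length.out = 11)     # 11 break points, i.e. 10 cells

# hist() bins in compiled code; plot = FALSE returns the summary without drawing.
# Defaults: right = TRUE, include.lowest = TRUE, i.e. cells (a, b] with the
# first cell closed on the left as well.
obs <- hist(z.trun, breaks = breaks, plot = FALSE)$counts

# Same cells via table(cut()), for checking that the counts agree.
obs.cut <- as.vector(table(cut(z.trun, breaks = breaks, include.lowest = TRUE)))

print(obs)
print(all(obs == obs.cut))
```

If left-closed cells are wanted instead, passing right = FALSE to both hist() and cut() should keep the two versions consistent; it is worth checking the counts against the original cut2() output once before switching.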
Since you didn't provide an example of what z.trun and breaks may look like, most people can only guess. Before asking how the code can be made more efficient, it might be more helpful to find out where the time is being spent. Try:

Rprof()
obs <- table(cut2(z.trun, cuts=breaks))
Rprof(NULL)
summaryRprof()

Andy
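One practical point about the profiling recipe: Rprof() samples the call stack at fixed intervals (20 ms by default), so a single fast call may record few or no samples. A hedged sketch of one way around this is to repeat the call in a loop while profiling. The data below are hypothetical stand-ins, the repeat count is arbitrary, and base cut() is used so the sketch runs without Hmisc; substitute the original cut2() call to profile the real code.

```r
# Hypothetical stand-ins for the poster's data (not from the thread).
set.seed(1)
z.trun <- runif(20000)
breaks <- seq(0, 1, length.out = 21)     # 20 cells

prof.file <- tempfile()                  # keep the profile out of the working directory
Rprof(prof.file)
for (i in 1:500) {
  # Repeated so that the sampling profiler collects enough samples.
  obs <- table(cut(z.trun, breaks = breaks, include.lowest = TRUE))
}
Rprof(NULL)

# by.self lists the functions in which the time was actually spent.
prof <- summaryRprof(prof.file)
print(head(prof$by.self))
```

The functions at the top of by.self show whether the time goes to the binning step or to building the table from the resulting factor, which indicates which part is worth replacing.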
On Tue, 2004-07-06 at 07:56, Simon Cullen wrote:
> > obs <- table(cut2(z.trun, cuts=breaks))
>
> Having done this I've found that the code takes much longer to run - up to
> 10x as long. Is there a more efficient way of doing this? Anyone have any
> thoughts?

It would appear that you might be attempting to do a Hosmer-Lemeshow type of GOF test. If that is indeed the case, before making the above more efficient, you should spend some time reviewing the following posts by Frank Harrell on this subject:

http://maths.newcastle.edu.au/~rking/R/help/02b/4210.html
http://maths.newcastle.edu.au/~rking/R/help/02b/3111.html

HTH,

Marc Schwartz