Dear all R users,

I am an IT student with little statistical background, and a new R user with only two months of experience. I have a dataset named medcost, imported with read.table(), as shown below (the real dataset has 500 cases). The header id is the case id, member is the number of members in a family, and cost is what the family pays in medical costs every 6 months.

    id member cost
     1      4  320
     2      2  150
     3      3  420
     4      5  330
     5      6  540
     6      2  310
     7      4  169
     8      6  647
     9      3  347
    10      4  567

I would like to run a chi-square analysis on this dataset to see whether there is any relationship between family size and medical cost (do more members in a family raise its medical cost?). I have found the package called stats, but from what I have read in books I think I need to transform the dataset into a contingency table first. I am not sure if I am correct, but I think the table should look like this:

                 member
    cost        [2] [3] [4] [5] [6] Total
    [0,100]       1   0   0   0   0     1
    [100,200]     0   0   1   0   0     1
    [200,300]     0   0   0   0   0     0
    [300,400]     1   1   1   1   0     4
    [400,500]     0   1   0   0   0     1
    [500,600]     0   0   1   0   1     2
    [600,700]     0   0   0   0   1     1
    Total         2   2   3   1   2    10

I tried to use the method in chapter 5.0 of "R Introduction" to create a frequency table, but it did not work. Can anyone help me with this? Thank you for your help.

Regards,
Charlie
If you want to test whether 'member' has an effect on 'cost' (a continuous numerical variable), I do not recommend a chi-square test, but rather a simple linear regression or a one-way analysis of variance. Chi-square tests are for categorical variables, and unless you have a good reason, there is no need to discretize the 'cost' variable.

First, look at the data to see whether 'member' actually has an effect on 'cost' (I assume your data are in a data.frame 'a'):

    tapply(a$cost, a$member, summary)
    boxplot(a$cost ~ a$member)

Then, try a linear regression:

    l <- lm(a$cost ~ a$member)
    plot(a$cost ~ a$member)
    abline(l)
    summary(l)

or, if the linear fit is not too good, use an analysis of variance (ANOVA):

    l <- aov(a$cost ~ as.factor(a$member))
    summary(l)
    TukeyHSD(l)

On 6/12/07, Charlie Chi <tsang0323@hotmail.com> wrote:
> I would like to use this dataset with chi-square analysis to see if there
> is any relationship between family member and medical cost (more members
> in a family will raise their medical cost?)
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Christophe Pallier (http://www.pallier.org)
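The regression and ANOVA steps suggested above can be tried on the ten sample rows from the original post. This is only a minimal sketch: the data frame name `a` follows the reply, while the `data = a` formula style and the object names `group_means`, `fit_lm`, `slope`, and `fit_aov` are choices made here for illustration. Ten rows are far too few to draw conclusions, so the point is the workflow rather than the result.

```r
# Sample rows copied from the original post (the real dataset has 500 cases).
a <- data.frame(
  id     = 1:10,
  member = c(4, 2, 3, 5, 6, 2, 4, 6, 3, 4),
  cost   = c(320, 150, 420, 330, 540, 310, 169, 647, 347, 567)
)

# Mean cost per family size: a quick look before any modelling.
group_means <- tapply(a$cost, a$member, mean)
print(group_means)

# Simple linear regression: family size is treated as a numeric predictor.
fit_lm <- lm(cost ~ member, data = a)
slope  <- coef(fit_lm)[["member"]]  # estimated change in cost per extra member
print(summary(fit_lm)$coefficients)

# One-way ANOVA: each family size is treated as its own group.
fit_aov <- aov(cost ~ factor(member), data = a)
print(summary(fit_aov))
```

Passing `data = a` instead of writing `a$cost ~ a$member` fits the same model but gives shorter coefficient labels and makes `predict()` on new data easier to use.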
Dear Charlie,

    dat <- data.frame(id = 1:10,
                      member = c(4,2,3,5,6,2,4,6,3,4),
                      cost = c(320,150,420,330,540,310,169,647,347,567))
    dat[,"costF"] <- cut(dat[,"cost"], breaks = seq(100, 700, by=100))
    table(dat[,"costF"], dat[,"member"])

This should create the table you would like.

Best regards,

Christoph

--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C13
ETH Zurich 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------
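For readers who want the row and column totals from the original question, and who still want to run the chi-square test on the binned table, the following is a rough sketch rather than a recommended analysis. It makes two assumptions that are not in the reply above: the breaks start at 0 so that any cost below 100 in the full dataset would not become NA, and empty cost bins are dropped before testing because a row with zero counts gives zero expected counts. The names `tab`, `tab_nonempty`, and `res` are illustrative.

```r
# Sample rows copied from the original post.
dat <- data.frame(
  id     = 1:10,
  member = c(4, 2, 3, 5, 6, 2, 4, 6, 3, 4),
  cost   = c(320, 150, 420, 330, 540, 310, 169, 647, 347, 567)
)

# Bin cost into intervals (0,100], (100,200], ..., (600,700].
# Starting at 0 is an assumption made here, not part of the reply above.
dat$costF <- cut(dat$cost, breaks = seq(0, 700, by = 100))

# Cross-tabulate cost bin against family size and show the margins ("Sum").
tab <- table(cost = dat$costF, member = dat$member)
print(addmargins(tab))

# Drop cost bins with no observations, then run the chi-square test.
# With counts this small R warns that the approximation may be incorrect.
tab_nonempty <- tab[rowSums(tab) > 0, , drop = FALSE]
res <- chisq.test(tab_nonempty)
print(res)
```

With only ten rows nearly every expected count is below 1, so the test result here is not meaningful. On the full 500 cases the warning may disappear, but binning still discards information about cost, which is the reason the other reply in this thread suggests regression or ANOVA instead.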