eugen pircalabelu
2007-Oct-27 08:21 UTC
[R] [non-statistics question]methodological problem
Good afternoon!

As mentioned in the subject, my question concerns the methodological side of survey design more than the statistics involved. I have the following data:

    a <- data.frame(id_hh = c(1:5), strata = c(1,1,2,2,1),
                    Nhstrata = c(100,100,200,200,100), Nrmemb = c(2,4,2,5,4))
    a$ocmemb1 <- c("wk","jl","st","jl","st")
    a$ocmemb2 <- c("wk","jl","st","wk","wk")

where id_hh is an identification code for the household (my analysis refers to households), strata is the stratum from which the hh is sampled, Nhstrata is the size of the population stratum from which the hh is sampled, Nrmemb is the number of members in a hh, and ocmemb1, ocmemb2, ... is the occupation of each individual member of the hh (worker, jobless, student).

    > a
      id_hh strata Nhstrata Nrmemb ocmemb1 ocmemb2
    1     1      1      100      2      wk      wk
    2     2      1      100      4      jl      jl
    3     3      2      200      2      st      st
    4     4      2      200      5      jl      wk
    5     5      1      100      4      st      wk

Now, is it possible to design weights for each household based on the characteristics of the individuals who form the hh? Say I want to calibrate each hh for its occupational category, but the auxiliary data are available for individuals rather than for households. For example, I don't know that 32% of households fall in the category "student hh" (where inclusion is based on the status of the head of the hh), but I do know that 32% of all the individuals in the population from which the sample of hhs is drawn are students.

So, is it possible to design these weights for hhs when the additional information is available only for the individuals who form those hhs? And is it a solid way of calibrating, I mean is it reliable and trustworthy?

Thank you and have a great day!
On Sat, 27 Oct 2007, eugen pircalabelu wrote:

> As mentioned in the subject, my question regards more the methodological
> part that accompanies survey design and the statistical part that is
> involved. So, I have the following data:

You might get more helpful (or more authoritative) advice on methodological issues in survey sampling on other lists, in particular from srmsnet, rather than posting the same question twice to r-help.

> Now, is there a possibility of designing some weights for each household
> based on the characteristics of individuals which form the hh? Say, I
> want to calibrate each hh for its occupational category but i don't have
> the additional data for household, rather it is available for
> individuals, ex: I don't know that 32% of households are included in the
> category of studenthh (inclusion which is based on the status of the
> head of hh), but i know that 32% of all the individuals from which the
> sample of hhs is drawn are all students.

Yes and no. You can't calibrate to population totals you don't know. You can create household-level weights that calibrate the individual-level data to individual-level population totals. The survey package knows how to do this: it is the aggregate.stage= or aggregate.index= argument to calibrate(), depending on whether you are using design information or replicate weights, respectively, for your standard errors.

I don't know if this technique is useful in your setting. My impression is that it is mainly used by national statistics agencies that want to avoid weird-looking inconsistencies (eg 2,000,000 marriages involving 1,100,000 men and 900,000 women [1]). It is presumably less efficient than using individual-level weights.

A description from Statistics Belgium is linked from ?calibrate.

    -thomas

[1] Apart from in civilised places like, eg, Canada or MA.

Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle
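For readers who want to try the approach described above, here is a minimal sketch of household-constant calibration with calibrate(..., aggregate.stage = 1). Several things in it are assumptions made only for illustration and are not from the thread: the person-level data frame `persons` is built from just the two listed occupations per household (the remaining members' occupations were not posted), and the population totals in `pop` (600 individuals, 192 students, 240 workers) are invented so that students make up 32% of individuals. It also assumes a one-stage design in which households are the sampling units within strata.

```r
library(survey)

# Household sample as posted (only the two listed occupations are available)
hh <- data.frame(id_hh = 1:5, strata = c(1, 1, 2, 2, 1),
                 Nhstrata = c(100, 100, 200, 200, 100),
                 ocmemb1 = c("wk", "jl", "st", "jl", "st"),
                 ocmemb2 = c("wk", "jl", "st", "wk", "wk"))

# Hypothetical person-level file: one row per listed household member
keys <- hh[c("id_hh", "strata", "Nhstrata")]
persons <- rbind(data.frame(keys, occ = hh$ocmemb1),
                 data.frame(keys, occ = hh$ocmemb2))
persons$occ <- factor(persons$occ, levels = c("jl", "st", "wk"))

# Households are the sampling units; persons inherit the household weight
des <- svydesign(ids = ~id_hh, strata = ~strata, fpc = ~Nhstrata,
                 data = persons)

# ASSUMED individual-level population totals, named to match the columns
# of model.matrix(~occ): total persons, students, workers
pop <- c("(Intercept)" = 600, occst = 192, occwk = 240)

# Calibrate person-level data to person-level totals while keeping the
# calibrated weights constant within each household (stage-1 unit)
cal <- calibrate(des, formula = ~occ, population = pop,
                 aggregate.stage = 1)

svytotal(~occ, cal)                        # calibrated occupation totals
tapply(weights(cal), persons$id_hh, unique) # one weight per household
```

With real data, the person file would contain every household member and `pop` would come from an external source such as a census. For a replicate-weights design, the corresponding argument is aggregate.index=, for example aggregate.index = ~id_hh.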