thr3ads.net - R help es - [R-es] help: analisis factorial con NA (Chabi) [Dec 2011]


javier villacampa

2011-Dec-14 00:47 UTC

[R-es] help: analisis factorial con NA (Chabi)

Hello,

I am running a factor analysis and would like to know whether there is a
way to make use of the rows that contain a missing value (NA).

For now I am using only the rows without missing values:
factanal(na.omit(datos),factors=4)

If it is possible, and above all mathematically rigorous, I would like to
use the rows with a missing value (NA) as well.

Many thanks to everyone.

 Javier (Chabi) Villacampa González.

 		 	   		  

Carlos Ortega

2011-Dec-14 22:10 UTC


Hello Javier,

This is a problem without an easy solution. For reference, I am including
below the relevant passage from the book "An Introduction to Applied
Multivariate Analysis with R", which deals with precisely this issue in its
first chapter. You will see that the solutions it proposes are not
definitive, but at least there are some alternatives.

Beyond that passage, I suggest you read the chapter devoted specifically to
factor analysis and its discussion of alternatives to this technique.

Regards,
Carlos Ortega
www.qualityexcellence.es


############# Reference ########
# An Introduction to Applied
# Multivariate Analysis with R
#
# Brian Everitt
# Torsten Hothorn
#
# Springer 2011
###############################

1.3.1 Missing values

Table 1.1 also illustrates one of the problems often faced by statisticians
undertaking statistical analysis in general and multivariate analysis in
particular, namely the presence of missing values in the data; i.e.,
observations and measurements that should have been recorded but for one
reason or another, were not. Missing values in multivariate data may arise
for a number of reasons; for example, non-response in sample surveys,
dropouts in longitudinal data (see Chapter 8), or refusal to answer
particular questions in a questionnaire. The most important approach for
dealing with missing data is to try to avoid them during the
data-collection stage of a study. But despite all the efforts a researcher
may make, he or she may still be faced with a data set that contains a
number of missing values. So what can be done? One answer to this question
is to take the complete-case analysis route because this is what most
statistical software packages do automatically. Using complete-case
analysis on multivariate data means omitting any case with a missing value
on any of the variables. It is easy to see that if the number of variables
is large, then even a sparse pattern of missing values can result in a
substantial number of incomplete cases. One possibility to ease this
problem is to simply drop any variables that have many missing values. But
complete-case analysis is not recommended for two reasons:

   - Omitting a possibly substantial number of individuals will cause a
     large amount of information to be discarded and lower the effective
     sample size of the data, making any analyses less effective than they
     would have been if all the original sample had been available.

   - More worrisome is that dropping the cases with missing values on one
     or more variables can lead to serious biases in both estimation and
     inference unless the discarded cases are essentially a random
     subsample of the observed data (the term missing completely at random
     is often used; see Chapter 8 and Little and Rubin (1987) for more
     details).

So, at the very least, complete-case analysis leads to a loss, and
perhaps a substantial loss, in power by discarding data, but worse,
analyses based just on complete cases might lead to misleading conclusions
and inferences.
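
To get a feel for how much data complete-case analysis throws away, one can
count the rows that na.omit() would drop. The following is a minimal sketch
in base R and is not part of the book excerpt; `datos` is a simulated
stand-in for the real data, and the 8 variables and 5% missingness rate are
assumptions chosen only for illustration.

```r
# Simulated stand-in for the poster's data: 200 rows, 8 numeric variables.
set.seed(1)
datos <- as.data.frame(matrix(rnorm(200 * 8), nrow = 200))

# Knock out about 5% of the cells completely at random (an assumption).
mask <- matrix(runif(200 * 8) < 0.05, nrow = 200)
datos[mask] <- NA

# Rows that survive na.omit(), i.e. the complete cases.
n_complete <- sum(complete.cases(datos))
n_dropped  <- nrow(datos) - n_complete

# Share of missing cells versus share of discarded rows.
cell_rate <- mean(is.na(datos))
row_rate  <- n_dropped / nrow(datos)
cat(sprintf("missing cells: %.1f%%, rows dropped: %.1f%%\n",
            100 * cell_rate, 100 * row_rate))
```

With 8 variables and each cell missing independently with probability 0.05,
the expected share of incomplete rows is 1 - 0.95^8, roughly 34%, which is
the "sparse pattern, many incomplete cases" effect described above.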

A relatively simple alternative to complete-case analysis that is often
used is available-case analysis. This is a straightforward attempt to
exploit the incomplete information by using all the cases available to
estimate quantities of interest. For example, if the researcher is
interested in estimating the correlation matrix (see Subsection 1.5.2) of
a set of multivariate data, then available-case analysis uses all the cases
with variables Xi and Xj present to estimate the correlation between the
two variables. This approach appears to make better use of the data than
complete-case analysis, but unfortunately available-case analysis has its
own problems. The sample of individuals used changes from correlation to
correlation, creating potential difficulties when the missing data are not
missing completely at random. There is no guarantee that the estimated
correlation matrix is even positive-definite which can create problems for
some of the methods, such as factor analysis (see Chapter 5) and structural
equation modelling (see Chapter 7), that the researcher may wish to
apply to the matrix.
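
In R, available-case correlations correspond to cor() with
use = "pairwise.complete.obs". The sketch below is not from the book; it
uses simulated data (an assumption) to show the pairwise matrix, the
differing number of rows behind each entry, and an eigenvalue check for
positive-definiteness that seems prudent before passing the matrix to
factanal(covmat = ...).

```r
# Simulated stand-in data with about 10% of cells missing at random
# (an assumption made only for this illustration).
set.seed(2)
datos <- as.data.frame(matrix(rnorm(100 * 6), nrow = 100))
datos[matrix(runif(100 * 6) < 0.10, nrow = 100)] <- NA

# Available-case (pairwise) correlations: each entry uses every row in
# which both variables are observed.
R_pair <- cor(datos, use = "pairwise.complete.obs")

# Number of rows behind each correlation; it changes from pair to pair.
obs    <- 1 - is.na(as.matrix(datos))   # 1 = observed, 0 = missing
n_pair <- crossprod(obs)

# A usable correlation matrix must have strictly positive eigenvalues.
ev    <- eigen(R_pair, symmetric = TRUE, only.values = TRUE)$values
is_pd <- min(ev) > 0
cat("range of pairwise n:", range(n_pair), " positive-definite:", is_pd, "\n")
```

If the check fails, one possible repair (not discussed in the excerpt) is
Matrix::nearPD(R_pair, corr = TRUE), although that changes the estimates
and does not address the underlying missing-data problem.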

Both complete-case and available-case analyses are unattractive unless
the number of missing values in the data set is “small”. An alternative
answer to the missing-data problem is to consider some form of imputation,
the practise of “filling in” missing data with plausible values.
Methods that impute the missing values have the advantage that, unlike in
complete-case analysis, observed values in the incomplete cases are
retained. On the surface, it looks like imputation will solve the
missing-data problem and enable the investigator to progress normally.
But, from a statistical viewpoint, careful consideration needs to be
given to the method used for imputation or otherwise it may cause more
problems than it solves; for example, imputing an observed variable mean
for a variable’s missing values preserves the observed sample means but
distorts the covariance matrix (see Subsection 1.5.1), biasing estimated
variances and covariances towards zero. On the other hand, imputing
predicted values from regression models tends to inflate observed
correlations, biasing them away from zero (see Little 2005). And
treating imputed data as if they were “real” in estimation and inference
can lead to misleading standard errors and p-values since they fail to
reflect the uncertainty due to the missing data.
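
The two distortions just described are easy to reproduce. The sketch below
is not from the book; it simulates two correlated variables (all values and
the 30% missingness are assumptions) and compares mean imputation and
regression imputation with the observed data.

```r
set.seed(3)
n <- 500
x <- rnorm(n)
y <- 0.8 * x + rnorm(n, sd = 0.6)

# Remove 150 of the 500 y values completely at random (an assumption).
y_obs <- y
y_obs[sample(n, size = 150)] <- NA

# Mean imputation: fill every gap with the observed mean of y.
y_mean <- ifelse(is.na(y_obs), mean(y_obs, na.rm = TRUE), y_obs)

# Regression imputation: fill every gap with the fitted value from y ~ x.
fit   <- lm(y_obs ~ x)   # lm() drops the incomplete rows by default
y_reg <- ifelse(is.na(y_obs),
                predict(fit, newdata = data.frame(x = x)),
                y_obs)

var_obs  <- var(y_obs, na.rm = TRUE)
var_mean <- var(y_mean)                 # shrunk towards zero
cor_obs  <- cor(x, y_obs, use = "complete.obs")
cor_reg  <- cor(x, y_reg)               # pushed away from zero

cat(sprintf("variance: observed %.3f, mean-imputed %.3f\n", var_obs, var_mean))
cat(sprintf("correlation: observed %.3f, regression-imputed %.3f\n",
            cor_obs, cor_reg))
```

Mean imputation keeps the sum of squares unchanged while increasing the
number of cases, so the variance must fall; regression imputation places
the filled-in points exactly on the fitted line, so the correlation must
rise in absolute value.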

The most appropriate way to deal with missing values is by a procedure
suggested by Rubin (1987) known as multiple imputation. This is a Monte
Carlo technique in which the missing values are replaced by m > 1 simulated
versions, where m is typically small (say 3–10). Each of the simulated
complete data sets is analysed using the method appropriate for the
investigation at hand, and the results are later combined to produce, say,
estimates and confidence intervals that incorporate missing-data
uncertainty. Details are given in Rubin (1987) and more concisely in Schafer
(1999). The great virtues of multiple imputation are its simplicity and its
generality. The user may analyse the data using virtually any technique
that would be appropriate if the data were complete. However, one should
always bear in mind that the imputed values are not real measurements. We
do not get something for nothing! And if there is a substantial proportion
of individuals with large amounts of missing data, one should clearly
question whether any form of statistical analysis is worth the bother.
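
For the factor analysis in the original question, multiple imputation
could be sketched as follows. This is not from the book and is only one
possible workflow: it assumes the 'mice' package is installed, uses
simulated data in place of `datos`, and combines the m analyses by
averaging the correlation matrices before calling factanal(), which is a
pragmatic choice rather than an established pooling rule for loadings.

```r
# Sketch only: requires the 'mice' package (install.packages("mice")).
library(mice)

# Simulated stand-in for 'datos': 6 variables driven by 2 latent factors.
set.seed(4)
n <- 300
f <- matrix(rnorm(n * 2), ncol = 2)
lambda <- rbind(c(0.8, 0), c(0.7, 0), c(0.6, 0),
                c(0, 0.8), c(0, 0.7), c(0, 0.6))
datos <- as.data.frame(f %*% t(lambda) +
                       matrix(rnorm(n * 6, sd = 0.5), ncol = 6))
datos[matrix(runif(n * 6) < 0.10, nrow = n)] <- NA

# Step 1: create m completed data sets (mice's default method for
# numeric columns is predictive mean matching).
m   <- 5
imp <- mice(datos, m = m, seed = 4, printFlag = FALSE)

# Step 2: compute the correlation matrix of each completed data set.
cors <- lapply(seq_len(m), function(i) cor(complete(imp, i)))

# Step 3: combine. Averaging the m matrices is an assumption of this
# sketch; it is not Rubin's rules for standard errors.
R_avg <- Reduce(`+`, cors) / m

# Step 4: factor-analyse the pooled matrix. n.obs = n is optimistic,
# because the imputed values carry less information than real ones.
fa <- factanal(covmat = R_avg, factors = 2, n.obs = n)
print(fa$loadings, cutoff = 0.3)
```

Averaging matrices rather than loadings sidesteps the fact that loadings
are identified only up to rotation and sign, so they are not directly
comparable across the m fits. Each completed-data matrix is a proper
correlation matrix, and so is their average.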

######################




