Hi Javier,
This is a problem with no easy solution. For reference, I am attaching the
section of the book "An Introduction to Applied Multivariate Analysis with
R" that deals with exactly this issue in its first chapter.
You will see that the solutions proposed there are not definitive, but at
least there are some alternatives.
Besides that section, I suggest you read the chapter devoted specifically
to factor analysis and its discussion of alternatives to that technique.
Regards,
Carlos Ortega
www.qualityexcellence.es
############# Reference ########
# An Introduction to Applied
# Multivariate Analysis with R
#
# Brian Everitt
# Torsten Hothorn
#
# Springer 2011
###############################
1.3.1 Missing values
Table 1.1 also illustrates one of the problems often faced by statisticians
undertaking statistical analysis in general and multivariate analysis in
particular, namely the presence of missing values in the data; i.e.,
observations and measurements that should have been recorded but for one
reason or another, were not. Missing values in multivariate data may arise
for a number of reasons; for example, non-response in sample surveys,
dropouts in longitudinal data (see Chapter 8), or refusal to answer
particular questions in a questionnaire. The most important approach for
dealing with missing data is to try to avoid them during the
data-collection stage of a study. But despite all the efforts a researcher
may make, he or she may still be faced with a data set that contains a
number of missing values. So what can be done? One answer to this question
is to take the complete-case analysis route because this is what most
statistical software packages do automatically. Using complete-case
analysis on multivariate data means omitting any case with a missing value
on any of the variables. It is easy to see that if the number of variables
is large, then even a sparse pattern of missing values can result in a
substantial number of incomplete cases. One possibility to ease this
problem is to simply drop any variables that have many missing values. But
complete-case analysis is not recommended for two reasons:
- Omitting a possibly substantial number of individuals will cause a
  large amount of information to be discarded and lower the effective sample
  size of the data, making any analyses less effective than they would have
  been if all the original sample had been available.
- More worrisome is that dropping the cases with missing values on one
  or more variables can lead to serious biases in both estimation and
  inference unless the discarded cases are essentially a random subsample of
  the observed data (the term missing completely at random is often used; see
  Chapter 8 and Little and Rubin (1987) for more details).
So, at the very least, complete-case analysis leads to a loss, and
perhaps a substantial loss, in power by discarding data, but worse,
analyses based just on complete cases might lead to misleading conclusions
and inferences.
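
As a rough illustration in R (a sketch on simulated data, not part of the book's text): with 10 variables and 5% of the cells missing independently at random, only about 0.95^10, roughly 60%, of the cases are expected to be complete. The name `datos` is borrowed from the question quoted below; the sample size, number of variables and missingness rate are assumptions chosen for the example.

```r
# Simulated data (assumed sizes): 200 cases, 10 variables,
# each cell set to NA independently with probability 0.05.
set.seed(1)
n <- 200
p <- 10
datos <- as.data.frame(matrix(rnorm(n * p), nrow = n))
mask <- matrix(runif(n * p) < 0.05, nrow = n)
datos[mask] <- NA

# Share of missing cells versus share of cases that survive
# a complete-case analysis.
prop_missing_cells  <- mean(is.na(datos))
prop_complete_cases <- mean(complete.cases(datos))

# na.omit() keeps exactly the complete cases, which is what
# factanal(na.omit(datos), ...) ends up analysing.
n_complete <- nrow(na.omit(datos))

cat("missing cells:  ", round(100 * prop_missing_cells, 1), "%\n")
cat("complete cases: ", round(100 * prop_complete_cases, 1), "%\n")
```

Comparing the two percentages shows how a sparse pattern of missing cells turns into a much larger share of discarded rows.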
A relatively simple alternative to complete-case analysis that is often
used is available-case analysis. This is a straightforward attempt to
exploit the incomplete information by using all the cases available to
estimate quantities of interest. For example, if the researcher is
interested in estimating the correlation matrix (see Subsection 1.5.2) of
a set of multivariate data, then available-case analysis uses all the cases
with variables Xi and Xj present to estimate the correlation between the
two variables. This approach appears to make better use of the data than
complete-case analysis, but unfortunately available-case analysis has its
own problems. The sample of individuals used changes from correlation to
correlation, creating potential difficulties when the missing data are not
missing completely at random. There is no guarantee that the estimated
correlation matrix is even positive-definite which can create problems for
some of the methods, such as factor analysis (see Chapter 5) and
structural equation modelling (see Chapter 7), that the researcher may wish
to apply to the matrix.
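
A minimal R sketch of available-case analysis (again on simulated data, with sizes chosen only for illustration): `cor()` with `use = "pairwise.complete.obs"` estimates each correlation from the cases where both variables are present, and the eigenvalues of the result show whether the matrix is positive-definite.

```r
# Simulated data (assumed sizes): 100 cases, 4 variables, 40 cells set to NA.
set.seed(2)
n <- 100
x <- matrix(rnorm(n * 4), nrow = n,
            dimnames = list(NULL, paste0("V", 1:4)))
x[sample(length(x), 40)] <- NA

# Available-case (pairwise) and complete-case correlation matrices.
R_pair <- cor(x, use = "pairwise.complete.obs")
R_cc   <- cor(x, use = "complete.obs")

# Number of cases behind each pairwise correlation: it changes
# from one pair of variables to the next.
n_pair <- crossprod(ifelse(is.na(x), 0, 1))

# A correlation matrix is positive-definite only if all of its
# eigenvalues are strictly positive.
min_eig <- min(eigen(R_pair, symmetric = TRUE, only.values = TRUE)$values)
is_pd   <- min_eig > 0

print(n_pair)
print(is_pd)
```

`factanal()` can take a correlation or covariance matrix through its `covmat` argument, together with `n.obs`, so a pairwise matrix could in principle be passed to it. That is only reasonable if the matrix is positive-definite, and there is no single obvious value for `n.obs` when the number of cases differs by pair.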
Both complete-case and available-case analyses are unattractive unless
the number of missing values in the data set is “small”. An alternative
answer to the missing-data problem is to consider some form of imputation,
the practise of “filling in” missing data with plausible values.
Methods that impute the missing values have the advantage that, unlike in
complete-case analysis, observed values in the incomplete cases are
retained. On the surface, it looks like imputation will solve the
missing-data problem and enable the investigator to progress normally.
But, from a statistical viewpoint, careful consideration needs to be
given to the method used for imputation or otherwise it may cause more
problems than it solves; for example, imputing an observed variable mean
for a variable’s missing values preserves the observed sample means but
distorts the covariance matrix (see Subsection 1.5.1), biasing estimated
variances and covariances towards zero. On the other hand, imputing
predicted values from regression models tends to inflate observed
correlations, biasing them away from zero (see Little 2005). And treating
imputed data as if they were “real” in estimation and inference can lead to
misleading standard errors and p-values since they fail to reflect the
uncertainty due to the missing data.
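
Both distortions can be seen in a small R sketch (simulated data; the sample size, the true correlation of 0.5 and the 30% missingness rate are assumptions for the example). Mean imputation leaves the mean unchanged but shrinks the variance and covariance; imputing fitted values from a regression puts the imputed points exactly on the regression line and so strengthens the correlation.

```r
# Two correlated variables; 30% of y is then deleted completely at random.
set.seed(3)
n <- 500
x <- rnorm(n)
y <- 0.5 * x + rnorm(n, sd = sqrt(0.75))
y_obs <- y
y_obs[sample(n, 150)] <- NA
n_obs <- sum(!is.na(y_obs))

# Mean imputation: every NA is replaced by the observed mean of y.
y_mean <- ifelse(is.na(y_obs), mean(y_obs, na.rm = TRUE), y_obs)

# Regression imputation: every NA is replaced by the fitted value
# from a regression of y on x (lm() drops the incomplete rows).
fit   <- lm(y_obs ~ x)
y_reg <- ifelse(is.na(y_obs), predict(fit, newdata = data.frame(x = x)), y_obs)

var_observed <- var(y_obs, na.rm = TRUE)
var_mean_imp <- var(y_mean)                 # smaller than var_observed
cov_observed <- cov(x, y_obs, use = "complete.obs")
cov_mean_imp <- cov(x, y_mean)              # shrunk towards zero
cor_observed <- cor(x, y_obs, use = "complete.obs")
cor_reg_imp  <- cor(x, y_reg)               # larger in absolute value

print(c(var_observed = var_observed, var_mean_imp = var_mean_imp))
print(c(cor_observed = cor_observed, cor_reg_imp = cor_reg_imp))
```

With mean imputation the variance and the covariance are both multiplied by (n_obs - 1) / (n - 1), which makes the size of the bias towards zero explicit.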
The most appropriate way to deal with missing values is by a procedure
suggested by Rubin (1987) known as multiple imputation. This is a Monte
Carlo technique in which the missing values are replaced by m > 1 simulated
versions, where m is typically small (say 3–10). Each of the simulated
complete data sets is analysed using the method appropriate for the
investigation at hand, and the results are later combined to produce, say,
estimates and confidence intervals that incorporate missing-data
uncertainty. Details are given in Rubin (1987) and more concisely in Schafer
(1999). The great virtues of multiple imputation are its simplicity and its
generality. The user may analyse the data using virtually any technique
that would be appropriate if the data were complete. However, one should
always bear in mind that the imputed values are not real measurements. We
do not get something for nothing! And if there is a substantial proportion
of individuals with large amounts of missing data, one should clearly
question whether any form of statistical analysis is worth the bother.
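
The combining step can be sketched in R with the usual Rubin (1987) rules for a scalar quantity. This is only an illustration: `pool_rubin` is a hypothetical helper written for this note, the five estimates and variances are made-up inputs, and the imputation step itself is not shown (in R it is commonly delegated to a package such as mice).

```r
# Combine m point estimates and their squared standard errors,
# one pair per imputed data set, using Rubin's rules.
pool_rubin <- function(estimates, variances, level = 0.95) {
  m <- length(estimates)
  stopifnot(m > 1, length(variances) == m)
  q_bar <- mean(estimates)             # pooled point estimate
  w     <- mean(variances)             # within-imputation variance
  b     <- var(estimates)              # between-imputation variance
  total <- w + (1 + 1 / m) * b         # total variance
  # Degrees of freedom of the reference t distribution (Rubin 1987).
  df    <- (m - 1) * (1 + w / ((1 + 1 / m) * b))^2
  half  <- qt(1 - (1 - level) / 2, df) * sqrt(total)
  list(estimate = q_bar, se = sqrt(total), df = df,
       ci = c(lower = q_bar - half, upper = q_bar + half))
}

# Hypothetical results from m = 5 imputed data sets.
est <- c(10.2, 9.8, 10.5, 10.1, 9.9)
v   <- c(0.25, 0.24, 0.26, 0.25, 0.25)
pooled <- pool_rubin(est, v)
print(pooled)
```

The pooled standard error is larger than the average within-imputation standard error because the between-imputation variance, which reflects the missing-data uncertainty, is added on top.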
######################
On 14 December 2011 at 01:47, javier villacampa
<hermesh2@hotmail.com> wrote:
>
>
>
> Hello,
>
> I am running a factor analysis and would like to know whether there is any
> way to make use of the rows that contain some missing (NA) values.
>
> For now I am using only the data without the missing values:
> factanal(na.omit(datos), factors = 4)
>
> But if it is possible and, above all, mathematically rigorous, I would
> like to use the rows that contain some missing (NA) values.
>
> Many thanks to everyone.
>
> Javier(Chabi) Villacampa González.
>
>
>
>
> _______________________________________________
> R-help-es mailing list
> R-help-es@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-help-es
>
>