thr3ads.net - R help - [R] Summary of variables with NA, empty [Oct 2012]

Lopez, Dan

2012-Oct-23 18:17 UTC

[R] Summary of variables with NA, empty

Hi,

Is there a function I can use on my data frame to give me a concise summary of
variables that are NA, blank, etc.? Basically all null values, empty strings,
white space, and blank values. Ideally it would look something like the below:

# It should only include the fields with NAs, blanks, etc. An added bonus would be
# to include the column index.
# Valid Records = records that are not NA, blank, etc.
# ColIndex = position of the column in the original data frame: 1, 2, 3, ... xth

                Valid Records     Null (NA?)           Empty String      White Space       Blank Value        ColIndex
Var1                       52           8                                       2
Var2                       40           20                                      10                           10                                           3
Var3                       58                                                   2                                                                             20
..

I know there is summary(), but I am not sure if that always displays NAs and
blanks, especially with factor variables that have several levels (it lumps them
into 'Other' when I run it on the entire data frame). In these instances I can run
the individual field separately and see all levels, but that would be inefficient
to do for a data frame with over 50 variables.

Dan



David Winsemius

2012-Oct-23 21:44 UTC

On Oct 23, 2012, at 11:17 AM, Lopez, Dan wrote:
> Hi,
>
> Is there a function I can use on my data frame to give me a concise summary
> of variables that are NA, blank, etc.? Basically all null values, empty strings,
> white space, and blank values. Ideally it would look something like the below:
>
> # It should only include the fields with NAs, blanks, etc. An added bonus
> # would be to include the column index.
> # Valid Records = records that are not NA, blank, etc.
> # ColIndex = position of the column in the original data frame: 1, 2, 3, ... xth
>
>                Valid Records  Null (NA?)        Empty String      White Space       Blank Value        ColIndex
Would a "Valid Record" be defined by grep("[^ ]", column)? ... i.e. has
a non-space character in it?
What is a "ColIndex"?
How is an "Empty String" different from "White Space" or a
"Blank Value"?

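To make those questions concrete, here is a minimal sketch of one possible operational definition, using base R only. The category names, the example vector, and the choice to treat tabs as white space are all assumptions for illustration; the original post does not define them. Note that the `[^ ]` pattern above would count a tab-only string as valid, whereas `[[:space:]]` does not.

```r
# Hypothetical example vector; the values are made up for illustration.
x <- c("a", NA, "", "  ", "\t", " b ")

is_na    <- is.na(x)
is_empty <- !is_na & x == ""                        # zero-length string
is_white <- !is_na & grepl("^[[:space:]]+$", x)     # only spaces, tabs, etc.
is_valid <- !is_na & grepl("[^[:space:]]", x)       # has a non-whitespace character

# Each element falls into exactly one of the four categories.
counts <- c(Valid = sum(is_valid), NAs = sum(is_na),
            Empty = sum(is_empty), Whitespace = sum(is_white))
counts
```

"Blank Value" is left out here because nothing in the thread says how it would differ from the other categories.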

> Var1                       52        8                                     2
> Var2                       40           20                                 10                           10                                           3
> Var3                       58                                              2                                                                             20
> ..
>
> 
I generally use describe from package:Hmisc. There are other versions of
describe in other packages. It's not going to classify items composed
entirely of a varying number of spaces and other whitespace characters like tabs
as a single group. And it's unclear what you will use as an operational
definition to separate blanks and white space. You will probably need to code
that yourself. You might want to look at the code for Hmisc::describe as a
starting point.
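If it does come down to coding it by hand, a minimal base R sketch of such a summary might look like the following. The function name `na_blank_summary`, the three "invalid" categories, and the example data frame are hypothetical, not taken from Hmisc or from the original post, and "Blank Value" is omitted because the thread never defines it.

```r
# Hypothetical helper: one row per column that has any NA, empty, or
# whitespace-only entries, plus the column's position in the data frame.
na_blank_summary <- function(df) {
  count_one <- function(x) {
    x     <- as.character(x)                  # factors and numbers become strings
    na    <- is.na(x)
    empty <- !na & x == ""
    white <- !na & grepl("^[[:space:]]+$", x)
    c(Valid = sum(!na & !empty & !white),
      NAs = sum(na), Empty = sum(empty), Whitespace = sum(white))
  }
  out <- as.data.frame(t(vapply(df, count_one, integer(4))))
  out$ColIndex <- seq_along(df)
  out[out$Valid < nrow(df), , drop = FALSE]   # keep only columns with problems
}

# Made-up data for illustration.
df <- data.frame(
  id   = 1:4,
  name = c("a", "", NA, " "),
  grp  = factor(c("x", NA, "y", "x")),
  stringsAsFactors = FALSE
)
res <- na_blank_summary(df)
res
```

Converting every column with `as.character()` keeps the sketch short, at the cost of treating numeric and factor columns as strings for the purpose of the counts.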

> I know there is summary(), but I am not sure if that always displays NAs and
> blanks, especially with factor variables that have several levels (it lumps them
> into 'Other' when I run it on the entire data frame).
> In these instances I can run the individual field separately and see all
> levels, but that would be inefficient to do for a data frame with over 50
> variables.
How were you going to "run the individual field"? If you show us code,
there might be more rapid progress. It would probably be very easy to turn that
into a function that could then be "run" with `lapply`.
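As a rough illustration of the `lapply` idea, and assuming that "running the individual field" means tabulating one column at a time, something like the sketch below would show every level of every column, with NA counted whenever it occurs. For context, `summary()` on a data frame collapses factor levels beyond its `maxsum` argument (7 by default) into "(Other)", which is probably the lumping described above. The data frame here is made up.

```r
# Made-up data for illustration.
df <- data.frame(
  colour = factor(c("red", "blue", NA, "red")),
  size   = c("S", "", "M", NA),
  stringsAsFactors = FALSE
)

# One full frequency table per column; NA gets its own entry when present.
level_counts <- lapply(df, function(x) table(x, useNA = "ifany"))

level_counts$colour
level_counts$size
```

The result is a named list with one table per column, so a single column can be inspected with `level_counts$colour` rather than rerunning anything.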

David Winsemius, MD
Alameda, CA, USA
