thr3ads.net - R help - [R] Any recommendations for reusable profiling of name fields? [Jan 2014]

Jeff Johnson

2014-Jan-17 04:38 UTC

[R] Any recommendations for reusable profiling of name fields?

Hi, I'm pretty new to R and am trying to develop a reusable set of scripts
that I can use to profile various data types and common fields in our
database. I know that what I'm asking is a can of worms, so please bear
with me. :)

For example, we store a person's first name, last name, phone number, email
address, last gift amount, gift date, etc. as well as integer type data.
I'm wondering if there's a "best practice" for validating a field that
holds, for example, first name or last name. A couple of things I've come
up with are:
1) Count of characters (nchar) in the first (or last) name field
2) Number of unique tokens
3) Patterns (converting alpha to A and numeric to N), counting the
frequency of each unique pattern that results. I suppose I could make lower
case alpha 'a' and upper = 'A' to be more specific.
4) Min and max name (helps identify those with leading spaces, numbers)
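
The four checks above could be sketched in base R roughly as follows. This is only a minimal sketch, not an established best practice: the function name profile_names and the names of the list elements it returns are hypothetical, and treating a "token" as a whitespace-separated word is an assumption about what item 2 means.

```r
# Hypothetical helper: profile a character vector holding a name field.
profile_names <- function(x) {
  x <- as.character(x)

  # 1) Frequency of character counts (NA values are counted separately)
  length_freq <- table(nchar(x), useNA = "ifany")

  # 2) Number of unique tokens; assumption: tokens are whitespace-separated
  tokens <- unlist(strsplit(trimws(x), "[[:space:]]+"))
  tokens <- tokens[!is.na(tokens) & nzchar(tokens)]
  n_unique_tokens <- length(unique(tokens))

  # 3) Patterns: lower case -> "a", upper case -> "A", digits -> "N";
  #    everything else (spaces, punctuation) is left unchanged
  pattern <- gsub("[[:lower:]]", "a", x)
  pattern <- gsub("[[:upper:]]", "A", pattern)
  pattern <- gsub("[[:digit:]]", "N", pattern)
  pattern_freq <- sort(table(pattern, useNA = "ifany"), decreasing = TRUE)

  # 4) Min and max value according to the current locale's sort order
  list(
    length_freq     = length_freq,
    n_unique_tokens = n_unique_tokens,
    pattern_freq    = pattern_freq,
    min_name        = min(x, na.rm = TRUE),
    max_name        = max(x, na.rm = TRUE)
  )
}

# Example with made-up values
res <- profile_names(c("Mary", " bob", "O'Neil", "Anne Marie", "J0hn", NA))
res$pattern_freq
```

Note that min and max on character vectors follow the collation rules of the current locale, so whether leading spaces or digits sort to the extremes may vary between systems.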

Does anyone have more suggestions for techniques that are common or that
you'd recommend for name fields? Ultimately, I'm looking to develop a
common set of profiles for various data types, so if there's a white paper
(I've googled, but not found any that hit the mark yet) I'd love to see
it.

Perhaps there's even a package for this type of thing?

Thanks much!

-- 
Jeff

