thr3ads.net - R help - [R] For loop on column names [Jan 2014]

Jeff Johnson

2014-Jan-18 00:26 UTC

[R] For loop on column names

I'm trying to find a more efficient way to calculate the percentage of
rows in which a field is populated, and to repeat that for each field
(column).

First, I'm counting the number of lines:
library(R.utils)  # provides countLines()
extract <- 'C:/Users/jeffjohn/Desktop/batchextract_100k_sample.csv'
lines <- as.integer(countLines(extract) - 1)  # subtract 1 for the header row
dput(lines)
100000L

mydf <- read.csv(file = extract, header = TRUE)
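
A side note on the row count: once the file has been read, nrow() on the data frame gives the number of data rows directly, so a separate line count may not be needed. (The two can also differ if a quoted field contains an embedded newline.) A minimal sketch, using a hypothetical three-row temporary file and the made-up name demo_df in place of the real extract:

```r
# Hypothetical stand-in for the real extract: a small temporary CSV.
tmp <- tempfile(fileext = ".csv")
writeLines(c("PARTY_ID,CITY", "1,Austin", "2,", "3,Boston"), tmp)

demo_df <- read.csv(file = tmp, header = TRUE)

# nrow() counts data rows only; the header line is not included.
n_rows <- nrow(demo_df)
print(n_rows)
```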

Here's the list of columns in my file:
> dput(colnames(mydf))
c("PERSONPROFILE_POS", "PARTY_ID",
"PERSON_FIRST_NAME", "PERSON_LAST_NAME",
"PERSON_MIDDLE_NAME", "PARTY_NUMBER",
"ACCOUNT_NUMBER", "ABILITEC_LINK",
"ADDRESS1", "ADDRESS2", "ADDRESS3",
"ADDRESS4", "CITY", "COUNTY",
"STATE", "PROVINCE", "POSTAL_CODE",
"COUNTRY", "PRIMARY_PER_TYPE",
"SELLTOADDR_LOS", "LOCATION_ID", "SELLTOADDR_SOS",
"PARTY_SITE_ID",
"PRIMARYPHONE_CPOS", "CONTACT_POINT_ID_PCP",
"CONTACT_POINT_PURPOSE_PCP",
"PHONE_LINE_TYPE", "PRIMARY_FLAG_PCP",
"PHONE_COUNTRY_CODE",
"PHONE_AREA_CODE", "PHONE_NUMBER", "EMAIL_CPOS",
"CONTACT_POINT_ID_ECP",
"CONTACT_POINT_PURPOSE_ECP", "PRIMARY_FLAG_ECP",
"EMAIL_ADDRESS",
"BB_PARTY_ID")

I want to count the percentage populated for each field. Rather than do:
percent(length(is.null(mydf$PERSONPROFILE_POS)) / lines)
percent(length(is.null(mydf$PARTY_ID)) / lines)
etc.
and repeat for each field manually, I want to use a for loop.

I am trying the following:
a <- length(colnames(mydf)) # this is to get the total number of columns

for (i in 1:a)
 print((percent(length(is.null(a)) / lines))

which isn't correct. I'm new to programming, so I don't quite know how
to deal with this. Any suggestions? Thanks much.
-- 
Jeff
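
One possible approach, offered as a sketch rather than a definitive answer. is.null() tests the whole object rather than each element, so length(is.null(x)) is always 1 regardless of the column. Counting populated values usually means testing each element with is.na() and, for text columns, checking for an empty string. The data frame demo_df and the helper pct_populated() below are made up for illustration, and treating a blank string as unpopulated is an assumption about the data. sprintf() stands in for percent() to keep the sketch free of extra packages.

```r
# Hypothetical sample data standing in for mydf.
demo_df <- data.frame(
  PARTY_ID = c(1, 2, 3, 4),
  CITY     = c("Austin", "", NA, "Boston"),
  ADDRESS2 = c(NA, NA, NA, "Suite 5"),
  stringsAsFactors = FALSE
)

# Fraction of values in a column that are neither NA nor an empty string.
# (Treating "" as unpopulated is an assumption about the data.)
pct_populated <- function(x) {
  populated <- !is.na(x) & as.character(x) != ""
  mean(populated)
}

# For-loop version: iterate over column names and index with [[ ]].
for (col in colnames(demo_df)) {
  cat(col, ":", sprintf("%.1f%%", 100 * pct_populated(demo_df[[col]])), "\n")
}

# Vectorised equivalent: one named numeric vector, no explicit loop.
result <- sapply(demo_df, pct_populated)
print(result)
```

With the real data, the same call would be sapply(mydf, pct_populated). Because mean() divides by the length of each column, the separate line count is not needed for this step.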
