thr3ads.net - R help - [R] Processing a hierarchical string name [Jun 2023]


Kevin Zembower

2023-Jun-28 20:29 UTC

[R] Processing a hierarchical string name

Hello, all

I'm trying to process the names of the variables in the US Census 
database, that I'm retrieving with tidycensus. My end goal is to produce 
nicely formatted tables with natural labels.

The labels as downloaded from the US Census look like this:

## Packages: tidycensus for the Census API, dplyr for %>%, filter(), etc.
library(tidycensus)
library(dplyr)

## Get the P1 table for block group 3 in census tract 2711.01:
bg3_race <- get_decennial(
     geography = "block group",
     state = "MD",
     county = "Baltimore city",
     table = "P1",
     cache_table = TRUE,
     year = "2020",
     sumfile = "pl") %>%
     filter(substr(GEOID, 6, 12) == "2711013")

## Load the names and labels of the variables:
pl_vars <- load_variables(year = "2020", dataset = "pl",
                          cache = TRUE)

## Join the labels to the variables, and drop the zero counts
bg3_race_sum <- bg3_race %>%
     left_join(pl_vars, by=c("variable" = "name")) %>%
     filter(value > 0) %>%
     select(c(GEOID, value, label))
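The filter substr(GEOID, 6, 12) == "2711013" works because of how a block group GEOID is laid out. A minimal sketch, assuming the standard 12-digit layout (2-digit state FIPS, 3-digit county, 6-digit tract, 1-digit block group); the GEOID below is a hypothetical value assembled from Maryland (24), Baltimore city (510), tract 2711.01 and block group 3, not a row copied from the downloaded data:

```r
# Hypothetical block group GEOID, assuming the usual layout:
# state (2 digits) + county (3) + tract (6) + block group (1)
geoid <- "245102711013"

parts <- c(
  state       = substr(geoid, 1, 2),   # "24"     Maryland
  county      = substr(geoid, 3, 5),   # "510"    Baltimore city
  tract       = substr(geoid, 6, 11),  # "271101" tract 2711.01 (implied decimal)
  block_group = substr(geoid, 12, 12)  # "3"
)
print(parts)

# Characters 6 to 12 are tract + block group, which is what the filter compares
print(substr(geoid, 6, 12) == "2711013")
```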

head(bg3_race_sum$label)
[1] " !!Total:"
[2] " !!Total:!!Population of one race:"
[3] " !!Total:!!Population of one race:!!White alone"
[4] " !!Total:!!Population of one race:!!Black or African American alone"
[5] " !!Total:!!Population of one race:!!American Indian and Alaska Native alone"
[6] " !!Total:!!Population of one race:!!Asian alone"


I think my algorithm for the labels is:
1. keep everything after the last "!!", up to and including the last
character;
2. in everything before that, replace each "!!.*:" group with a single
space.

This turns head() into:
"Total:"
" Population of one race:"
"  White alone"
"  Black or African American alone"
"  American Indian and Alaska Native alone"
"  Asian alone"
[may not be clearly visible if not rendered in a monospaced font]
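The two steps above can be sketched directly with base R regular expressions. This is one possible reading of the algorithm, not code from the thread: the labels are copied from the head() output, and "!![^!]*:" is used instead of "!!.*:" because the greedy ".*" would swallow several levels at once.

```r
# Sample labels copied from the head() output above
labels <- c(
  " !!Total:",
  " !!Total:!!Population of one race:",
  " !!Total:!!Population of one race:!!White alone",
  " !!Total:!!Population of one race:!!Black or African American alone",
  " !!Total:!!Population of one race:!!American Indian and Alaska Native alone",
  " !!Total:!!Population of one race:!!Asian alone"
)

# Step 1: keep only the text after the last "!!" (".*" is greedy)
last_part <- sub(".*!!", "", labels)

# Step 2: drop that last piece, then count the remaining "!!...:" groups;
# each group becomes one space of indentation
prefix <- sub("!![^!]*$", "", labels)
depth  <- lengths(regmatches(prefix, gregexpr("!![^!]*:", prefix)))

nice_labels <- paste0(strrep(" ", depth), last_part)
print(nice_labels)
```

Counting the groups (rather than literally substituting them) makes it easy to change the indent later, e.g. strrep("  ", depth) for two spaces per level.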

I think I need lapply() here, but I'm not sure of that, or of what to
do next. I can split the label using str_split(label, pattern = "!!")
to get a vector of strings, but I don't know how to handle the last
string separately from all the rest.

Thank you for any suggestions to nudge me along towards a workable solution.

-Kevin

Ivan Krylov

2023-Jun-28 20:56 UTC



On Wed, 28 Jun 2023 20:29:23 +0000
Kevin Zembower via R-help <r-help at r-project.org> wrote:
> I think my algorithm for the labels is:
> 1. keep everything from the last "!!" up to and including the last
> character
> 2. for everything remaining, replace each "!!.*:" group with a single
> space.
If you remove the initial ' !!', the problem becomes a more tractable
"replace each group of non-'!' followed by '!!' with one
space":

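bg3_race_sum$label |>
 # drop the leading " !!" that every label starts with
 (\(.) sub('^ !!', '', .))() |>
 # turn each "<level>!!" prefix into one space of indentation
 (\(.) gsub('[^!]*!!', ' ', .))()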

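But that solution depends on the exact shape of this task (for example,
it assumes no label contains a '!' of its own) and could have been
impossible if the task were slightly different.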
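The native pipe |> and the \(.) lambda shorthand both need R 4.1.0 or later. A minimal sketch of the same two regex steps written as nested calls, assuming only base R, so it also runs on older versions; the sample labels are copied from Kevin's head() output:

```r
labels <- c(
  " !!Total:",
  " !!Total:!!Population of one race:",
  " !!Total:!!Population of one race:!!White alone"
)

# Inner call: strip the leading " !!"
# Outer call: replace every "<non-! text>!!" group with one space
nice_labels <- gsub("[^!]*!!", " ", sub("^ !!", "", labels))
print(nice_labels)
```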
> I can split the label using str_split(label, pattern = "!!") to get a
> vector of strings, but don't know how to work on the last string and
> all the rest of the strings separately.
str_split() would have given you a list of character vectors. You can
use lapply to evaluate a function on each vector inside that list.
Inside the function, use length(x) (if `x` is the argument of the
function) to find out how many spaces to produce and which element of
the vector is the last one. (For code golf points, use rev(x)[1] to get
the last element.)
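The split-then-lapply approach described above might look like the sketch below. It is an illustration, not code from the thread: it uses base R's strsplit() with fixed = TRUE so it needs no extra packages (for these labels it returns the same pieces as str_split(label, "!!")), and it assumes every label starts with " !!", so the first piece is always the lone " ".

```r
labels <- c(
  " !!Total:",
  " !!Total:!!Population of one race:",
  " !!Total:!!Population of one race:!!White alone"
)

# Split every label on the literal "!!"; the result is a list with one
# character vector per label, e.g. c(" ", "Total:", "Population of one race:")
pieces <- strsplit(labels, "!!", fixed = TRUE)

nice_labels <- unlist(lapply(pieces, function(x) {
  # x[1] is the " " before the leading "!!", so the hierarchy depth is
  # length(x) - 1 and the number of indent spaces is length(x) - 2
  n_spaces <- length(x) - 2
  # rev(x)[1] is the last (deepest) level, as suggested above
  paste0(strrep(" ", n_spaces), rev(x)[1])
}))
print(nice_labels)
```

Because the function sees the whole vector for one label, the last element and the earlier ones can be treated separately, which is what the regex shortcut cannot easily do if the task changes.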

-- 
Best regards,
Ivan
