OK, so you want parentheses, not "brackets". I think I misinterpreted your
specification, which I think is actually incomplete. Based on what I think
you meant, how does this work:
gsub("((\\\\|/)[[:alnum:]]+)|(\\([[:alnum:]-]+\\))", "", tmp$Text)
[1] "? ????? ????, ???? ?????"
[2] "???? ???????\n??????? ??????"
[3] "? ????? ????, ???? ????? "
[4] "?\n????? ?????, ???? ???????? ???????"
[5] "? ????? ????, ????\n?????"
[6] "? ????? ????, ???? ???????"
[7] "?\n???????? ????, ???? ?????"
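To make the pattern above easier to read, here is the same regex assembled
from two named pieces, as a rough sketch. The ASCII strings in x are made-up
stand-ins for the Ukrainian text, and the names slash_suffix, paren_suffix
and x are mine, purely for illustration:

```r
# Piece 1: a backslash or forward slash followed by one or more alphanumerics
slash_suffix <- "(\\\\|/)[[:alnum:]]+"
# Piece 2: a literal "(" ... ")" holding only alphanumerics or hyphens
paren_suffix <- "\\([[:alnum:]-]+\\)"
# Join the two alternatives; this reproduces the pattern string used above
pattern <- paste0("(", slash_suffix, ")|(", paren_suffix, ")")

# Hypothetical ASCII stand-ins for the ways the feminine suffix is marked
x <- c("ready", "ready (readya)", "ready(-a)", "ready/a", "ready\\a")

cleaned <- gsub(pattern, "", x)
# trimws() drops the space left behind when "(...)" ends the string
trimmed <- trimws(cleaned)
print(trimmed)
```

Note that a parenthesised form preceded by a space leaves that space behind,
which is why element [3] above ends in a blank; trimws() only handles the
ends of each string, not doubled spaces in the middle.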
If you want the \n's shown as actual line breaks, cat() the result:
cat(gsub("((\\\\|/)[[:alnum:]]+)|(\\([[:alnum:]-]+\\))", "", tmp$Text))
? ????? ????, ???? ????? ???? ???????
??????? ?????? ? ????? ????, ???? ????? ?
????? ?????, ???? ???????? ??????? ? ????? ????, ????
????? ? ????? ????, ???? ??????? ?
???????? ????, ???? ?????
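A note on the str_replace() attempt quoted below: in "\\(.*\\)" the
parentheses are escaped, so they match literal characters and create no
capture group, which is presumably why the "\\1" replacement fails with an
out-of-bounds index. Separately, ".*" is greedy, which matters on lines with
two parenthesised suffixes. A small base R sketch, using a made-up ASCII
string y as a stand-in for the real text:

```r
# Hypothetical ASCII stand-in for a line with two parenthesised suffixes
y <- "tired (tireda) and sure (surea)"

# Greedy: ".*" runs from the first "(" to the last ")"
greedy <- gsub("\\(.*\\)", "", y)
print(greedy)     # "tired "

# Negated class: each "(...)" is matched on its own
separate <- gsub("\\([^)]*\\)", "", y)
print(separate)   # "tired  and sure "

# To inspect what would be removed, extract the matches instead of replacing
found <- regmatches(y, gregexpr("\\([^)]*\\)", y))[[1]]
print(found)      # "(tireda)" "(surea)"
```

Extracting with regmatches() first is a cheap way to check a pattern before
using it to delete anything.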
Cheers,
Bert
On Tue, Jun 27, 2023 at 11:09 AM Bert Gunter <bgunter.4567 at gmail.com> wrote:
> Does this do it for you (or get you closer):
>
> gsub("\\[.*\\]|[\\\\] |/ ","",tmp$Text)
> [1] "? ????? ????, ???? ?????"
> [2] "???? ???????\n??????? ??????"
> [3] "? ????? (???????) ????, ???? ????? (??????)"
> [4] "?\n?????(-??) ?????, ???? ???????? ???????"
> [5] "? ?????/?? ????, ????\n?????/??"
> [6] "? ?????\\??????? ????, ???? ???????\\????????"
> [7] "?\n????????(??) ????, ???? ?????(??)"
>
> On Tue, Jun 27, 2023 at 10:16 AM Chris Evans via R-help <r-help at r-project.org> wrote:
>
>> I am sure this is easy for people who are good at regexps but I'm
>> failing with it. The situation is that I have hundreds of lines of
>> Ukrainian translations of some English. They contain things like this:
>>
>> 1 "? ????? ????, ???? ?????"
>> 2 "???? ??????? ??????? ??????"
>> 3 "? ????? (???????) ????, ???? ????? (??????)"
>> 4 "? ?????(-??) ?????, ???? ???????? ???????"
>> 5 "? ?????/?? ????, ???? ?????/??"
>> 6 "? ?????\\??????? ????, ???? ???????\\????????."
>> 7 "? ????????(??) ????, ???? ?????(??)"
>>
>> Using dput():
>>
>> tmp <- structure(list(Text = c("? ????? ????, ???? ?????",
>>   "???? ???????\n??????? ??????",
>>   "? ????? (???????) ????, ???? ????? (??????)",
>>   "?\n?????(-??) ?????, ???? ???????? ???????",
>>   "? ?????/?? ????, ????\n?????/??",
>>   "? ?????\\??????? ????, ???? ???????\\????????",
>>   "?\n????????(??) ????, ???? ?????(??)")),
>>   row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
>>
>> Those show four different ways translators have handled gendered words:
>>
>> 1) Ignore them and (I'm guessing) only give the masculine
>> 2) Give the feminine form of the word (or just the feminine suffix) in
>>    brackets
>> 3) Give the feminine form/suffix prefixed by a forward slash
>> 4) Give the feminine form/suffix prefixed by backslash (here a double
>>    backslash)
>>
>> I would like just to drop all these feminine gendered options. (Don't
>> worry, they'll get back in later.) So I would like to replace
>>
>> 1) anything between brackets with nothing!
>> 2) anything between a forward slash and the next space with nothing
>> 3) anything between a backslash and the next space with nothing
>>
>> but preserving the rest of the text.
>>
>> I have been trying to achieve this using str_replace_all() but I am
>> failing utterly. Here's a silly little example of my failures. This was
>> just trying to get the text I wanted to replace (as I was trying to
>> simplify the issues for my tired wetware):
>>
>> tmp %>%
>> + as_tibble() %>%
>> + rename(Text = value) %>%
>> + mutate(Text = str_replace_all(Text, fixed("."), "")) %>%
>> + filter(row_number() < 4) %>%
>> + mutate(Text2 = str_replace(Text, "\\(.*\\)", "\\1"))
>> Error in `mutate()`:
>> i In argument: `Text2 = str_replace(Text, "\\(.*\\)", "\\1")`.
>> Caused by error in `stri_replace_first_regex()`:
>> ! Trying to access the index that is out of bounds. (U_INDEX_OUTOFBOUNDS_ERROR)
>> Run `rlang::last_trace()` to see where the error occurred.
>>
>> I have tried gurgling around the internet but am striking out so throwing
>> myself on the list. Apologies if this is trivial but I'd hate to have to
>> clean these hundreds of lines by hand though it's starting to look as if
>> I'd achieve that faster by hand than I will by banging my ignorance of R
>> regexp syntax on the problem.
>>
>> TIA, Chris
>>
>> --
>> Chris Evans (he/him)
>> Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor,
>> University of Roehampton, London, UK.
>> Work web site: https://www.psyctc.org/psyctc/
>> CORE site: http://www.coresystemtrust.org.uk/
>> Personal site: https://www.psyctc.org/pelerinage2016/