Hi
I did not see any answer, so I will try to give one.
It seems to me that your second attempt was quite close.
If passengerid were numeric, the following code would probably give you the
required result.
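library(stringr)
res <- rep(NA, nrow(df1))
for (i in seq_len(nrow(df1))) {
  if (df1$HusbandName[i] == "") next  # no husband listed, keep NA
  # coll() makes the husband name a literal string, not a regular expression
  sel <- which(str_detect(df1$Name, coll(df1$HusbandName[i])))
  if (length(sel) > 0) res[i] <- df1$passengerid[sel[1]]  # first match only
}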
res should contain the passengerid of the husband for each row with a match and
NA where there is no match. You can then add it to your data frame as a new
column.
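If you prefer to avoid the explicit loop, the same lookup can be sketched in base R. This is only a sketch under my own assumptions: passengerid is stored as character (so the leading zero of "0908" survives), the husband name has to appear as a literal substring of Name, and only the first match is used. The helper name find_husband_id is made up for illustration.

```r
# Test data retyped from the mail; passengerid is character by assumption
df1 <- data.frame(
  passengerid = c("0908", "9883", "7767", "3302"),
  Name = c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
           "Backstrom, Mr. Karl Alfred John",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Cumings, Mr. John Bradley"),
  HusbandName = c("Backstrom, Mr. Karl Alfred", "",
                  "Cumings, Mr. John Bradley", ""),
  stringsAsFactors = FALSE
)

# find_husband_id is a hypothetical helper, not part of any package
find_husband_id <- function(husband, names, ids) {
  # blank or missing husband name: nothing to look up
  if (is.na(husband) || husband == "") return(NA_character_)
  # fixed = TRUE treats the name as a literal string, not a regex
  sel <- which(grepl(husband, names, fixed = TRUE))
  if (length(sel) == 0) NA_character_ else ids[sel[1]]
}

# one lookup per row, collected directly into a new column
df1$Husbandid <- vapply(df1$HusbandName, find_husband_id, character(1),
                        names = df1$Name, ids = df1$passengerid,
                        USE.NAMES = FALSE)
```

With the data above, Husbandid comes out as "9883", NA, "3302", NA. Literal matching is what lets "Backstrom, Mr. Karl Alfred" find "Backstrom, Mr. Karl Alfred John" without matching the "Mrs." row.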
The problem is that, although you provided "a kind of" example, the HTML
format probably scrambled it. It is better to use dput for sending test
data and to post in plain text instead of HTML.
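As a small illustration of what I mean by dput (the object df_small and its contents are made up, this is not your data):

```r
# Made-up toy data frame, only to show the round trip
df_small <- data.frame(id = c(1, 2), name = c("a", "b"),
                       stringsAsFactors = FALSE)

# dput() prints a structure(...) call that can be pasted into a plain-text mail
dput(df_small)

# The reader recreates the object by evaluating the pasted text;
# here the text is captured instead of being copied by hand
txt <- paste(capture.output(dput(df_small)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```

The point is that the reader gets the object with its column types intact, which a printed or HTML-formatted table cannot guarantee.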
This is the data frame I got from your mail.
> dput(df1)
structure(list(passengerid = structure(c(3L, 4L, 2L, 1L), .Label = c("3302",
"7767", "908", "9883"), class = "factor"), Name = c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
"Backstrom, Mr. Karl Alfred John", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
"Cumings, Mr. John Bradley"), HusbandName = c("Backstrom, Mr. Karl Alfred",
"", "Cumings, Mr. John\nBradley", "")), row.names = c(NA, -4L
), class = "data.frame")
Cheers
Petr
> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of gary chimuzinga
> Sent: Tuesday, November 20, 2018 5:06 PM
> To: r-help at r-project.org
> Subject: [R] Partial LookUP
>
> I am working in R, using RStudio.
> I have a dataframe with 4 columns. Column A contains passenger iD, B contains
> passenger name, C contains husband name.
> I am attempting to create a new column which looks to see if the husband name
> in column C is listed in any of the records in column B. If so, it should then
> return to me the passenger iD of the husband from column A.
> To make things more complicated, as in the first example, in some cases the
> husband's name given in column C might not include his second name, which
> would be included in column B.
>
> Reproducible Example
> library(stringr)
> rm(list=ls())
> passengerid <- c(0908,9883,7767,3302)
>
> Name<- c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
> "Backstrom, Mr. Karl Alfred John",
> "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
> "Cumings, Mr. John Bradley")
>
> HusbandName <- c("Backstrom, Mr. Karl Alfred","","Cumings, Mr. John
> Bradley","")
>
>
>
> df1<- data.frame(cbind(passengerid,Name,HusbandName))
> df1$Name <- as.character(df1$Name)
> df1$HusbandName <- as.character(df1$HusbandName)
>
> I have tried using stringr, but I am facing problems because 1) I need the code
> to look at only 1 element of the vector HusbandName and search for it in the
> whole vector Name. 2) I found it difficult to use regular expressions given that
> the pattern I am looking for is vectorised (as HusbandName).
> This is what I have tried so far:
>
> Attempt 1 - only finds exact matches & doesn't return the passengerID &
> doesn't add column to df
> df1$Husbandid < - for (i in 1:NROW(df1$HusbandName)) {
> print(HusbandName[i] %in% Name)}
>
>
> Attempt 2 - finds partial matches, but does not ignore blanks & does not tell
> me passenger id & doesn't add column to df
> df1$Husbandid <- for (i in 1:NROW(df1$HusbandName)) {
> print(which(str_detect(df1$Name,df1$HusbandName[i])))}
>
>
> #Attempt 3 - almost works, but the printed results are different from those
> added into the dataframe as a new column. How can I correct for this?
> Ultimately I need the ones in the df to be correct. The error is that those
> without husbands are showing a husbandiD when this should be blank or NA. Can
> this be corrected, or is there a way to convert the output of the for loop into
> a vector we can add to the df?
> for (i in 1:NROW(df1$HusbandName)) {
> if (df1$HusbandName[i] =="") {
> print("Man") & next()
> }
> FoundHusbandNames<-
> c(which(str_detect(df1$Name,df1$HusbandName[i])))
> print(df1$passengerid[FoundHusbandNames]) -> df1$Husbandid[i] }
>
>