Hi R users,

I am trying to omit rows of data based on partial matches. An example of my data (seal_dist) is below.

A quick breakdown of my coding and why I need this: I am dealing with a colony of seals where, for example, A1 is a female with a pup and A1.1 is that female's pup. The important part of the data here is DIST, which gives the distance between one seal (ID) and another (TO_ID). What I want to do is take a mean of these data for a nearest neighbour analysis, but I want to omit any rows that give the distance between a female and her pup, i.e. in the example above, omit rows where A1 and A1.1 occur together.

I have looked at grep and pmatch, but these appear to work across columns and don't appear to do what I'm looking to do.

If anyone can point me in the right direction, I'd be most grateful.

Best wishes,

Ross

   FROM TO    DIST    ID HR DD MM YY ANIMAL DAY TO_ID TO_ANIMAL
2     1  2 4.81803    A1  1 30  9  9      1   1 MALE1        12
3     1  3 2.53468    A1  1 30  9  9      1   1    A2         3
4     1  4 7.57332    A1  1 30  9  9      1   1  A1.1         7
5     1  1 7.57332  A1.1  1 30  9  9      7   1    A1         1
6     1  2 7.89665  A1.1  1 30  9  9      7   1 MALE1        12
7     1  3 6.47847  A1.1  1 30  9  9      7   1    A2         3
9     1  1 2.53468    A2  1 30  9  9      3   1    A1         1
10    1  2 2.59051    A2  1 30  9  9      3   1 MALE1        12
12    1  4 6.47847    A2  1 30  9  9      3   1  A1.1         7
13    1  1 4.81803 MALE1  1 30  9  9     12   1    A1         1
15    1  3 2.59051 MALE1  1 30  9  9     12   1    A2         3
16    1  4 7.89665 MALE1  1 30  9  9     12   1  A1.1         7
17    1  1 3.85359    A1  2 30  9  9      1   1 MALE1        12
19    1  3 4.88826    A1  2 30  9  9      1   1    A2         3
20    1  4 7.25773    A1  2 30  9  9      1   1  A1.1         7
21    1  1 9.96431  A1.1  2 30  9  9      7   1 MALE1        12
22    1  2 7.25773  A1.1  2 30  9  9      7   1    A1         1
23    1  3 5.71725  A1.1  2 30  9  9      7   1    A2         3
25    1  1 8.73759    A2  2 30  9  9      3   1 MALE1        12
26    1  2 4.88826    A2  2 30  9  9      3   1    A1         1
28    1  4 5.71725    A2  2 30  9  9      3   1  A1.1         7
30    1  2 3.85359 MALE1  2 30  9  9     12   1    A1         1
31    1  3 8.73759 MALE1  2 30  9  9     12   1    A2         3
32    1  4 9.96431 MALE1  2 30  9  9     12   1  A1.1         7
33    1  1 7.95399    A1  3 30  9  9      1   1 MALE1        12
35    1  3 0.60443    A1  3 30  9  9      1   1  A1.1         7
36    1  4 1.91136    A1  3 30  9  9      1   1    A2         3
37    1  1 8.29967  A1.1  3 30  9  9      7   1 MALE1        12
38    1  2 0.60443  A1.1  3 30  9  9      7   1    A1         1
40    1  4 1.43201  A1.1  3 30  9  9      7   1    A2         3
41    1  1 9.71659    A2  3 30  9  9      3   1 MALE1        12
42    1  2 1.91136    A2  3 30  9  9      3   1    A1         1
43    1  3 1.43201    A2  3 30  9  9      3   1  A1.1         7
46    1  2 7.95399 MALE1  3 30  9  9     12   1    A1         1
47    1  3 8.29967 MALE1  3 30  9  9     12   1  A1.1         7
48    1  4 9.71659 MALE1  3 30  9  9     12   1    A2         3

--
View this message in context: http://r.789695.n4.nabble.com/partial-matches-across-rows-not-columns-tp2247757p2247757.html
Sent from the R help mailing list archive at Nabble.com.
Is this what you are looking for:

> # assume females start with "A"
> # extract first part (the female) from ID
> x.id <- sub("(A[[:digit:]]+).*", "\\1", x$ID)
> # now see if this pattern matches first part of TO_ID
> x.match <- x.id == substring(x$TO_ID, 1, nchar(x.id))
> # here are the ones that would be eliminated
> x[x.match,]
   FROM TO    DIST    ID HR DD MM YY ANIMAL DAY TO_ID TO_ANIMAL
4     1  4 7.57332    A1  1 30  9  9      1   1  A1.1         7
5     1  1 7.57332  A1.1  1 30  9  9      7   1    A1         1
20    1  4 7.25773    A1  2 30  9  9      1   1  A1.1         7
22    1  2 7.25773  A1.1  2 30  9  9      7   1    A1         1
35    1  3 0.60443    A1  3 30  9  9      1   1  A1.1         7
38    1  2 0.60443  A1.1  3 30  9  9      7   1    A1         1
>

On Tue, Jun 8, 2010 at 1:43 PM, RCulloch <ross.culloch at dur.ac.uk> wrote:
> I am trying to omit rows of data based on partial matches [...]

--
Jim Holtman
Cincinnati, OH

What is the problem that you are trying to solve?
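To go from the rows flagged above to the mean Ross asked for, the logical vector can be negated. The following is only a sketch on six rows copied from the posted data (rows 2 to 7); the names `x.keep` and `mean.dist` are made up for the illustration and are not part of the original reply.

```r
# Rows 2-7 of the posted data (HR 1), keeping only the columns used here.
x <- data.frame(
  DIST  = c(4.81803, 2.53468, 7.57332, 7.57332, 7.89665, 6.47847),
  ID    = c("A1", "A1", "A1", "A1.1", "A1.1", "A1.1"),
  TO_ID = c("MALE1", "A2", "A1.1", "A1", "MALE1", "A2"),
  stringsAsFactors = FALSE
)

# Female part of ID: "A1.1" -> "A1"; strings without A<digits> are left unchanged.
x.id <- sub("(A[[:digit:]]+).*", "\\1", x$ID)

# TRUE where TO_ID starts with that female part (mother-pup pair, either direction).
x.match <- x.id == substring(x$TO_ID, 1, nchar(x.id))

# Drop the mother-pup rows (hypothetical name x.keep), then average what is left.
x.keep <- x[!x.match, ]
mean.dist <- mean(x.keep$DIST)
```

On this subset the two A1/A1.1 rows are dropped and the mean is taken over the remaining four distances.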
I did not go too deep into your zoology problem ;-) but as far as I understood you, you want to omit all rows where ID and TO_ID are A1 and A1.1 (or A2, ...), correct?

If the data you sent us is all the data, and if no other situations occur, the following should be sufficient.

Transform the vectors ID and TO_ID to values without the "." and the number following it (e.g. A1.1 -> A1):

    ID.clean    <- gsub("\\..*$", "", data$ID)
    TO_ID.clean <- gsub("\\..*$", "", data$TO_ID)

And then use logical indexing to keep only the rows where the cleaned IDs differ:

    data.clean <- data[ID.clean != TO_ID.clean, ]

HTH
Jannis

RCulloch wrote:
> I am trying to omit rows of data based on partial matches [...]
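A minimal sketch of the strip-then-compare idea, run on six rows copied from the posted data (rows 17 to 23, HR 2). The pattern `"\\..*$"` is my reading of "remove the dot and the number following it", and the per-animal mean via `tapply` is just one way the "mean for a nearest neighbour analysis" might be taken. The names `seal.clean` and `mean.by.id` are invented for the example.

```r
seal_dist <- data.frame(
  DIST  = c(3.85359, 4.88826, 7.25773, 9.96431, 7.25773, 5.71725),
  ID    = c("A1", "A1", "A1", "A1.1", "A1.1", "A1.1"),
  TO_ID = c("MALE1", "A2", "A1.1", "MALE1", "A1", "A2"),
  stringsAsFactors = FALSE
)

# Remove a literal dot and everything after it: "A1.1" -> "A1", "MALE1" stays "MALE1".
ID.clean    <- sub("\\..*$", "", seal_dist$ID)
TO_ID.clean <- sub("\\..*$", "", seal_dist$TO_ID)

# Keep rows whose cleaned IDs differ, i.e. rows that are not a mother-pup pair.
seal.clean <- seal_dist[ID.clean != TO_ID.clean, ]

# Mean distance to the remaining neighbours, one value per focal animal.
mean.by.id <- tapply(seal.clean$DIST, seal.clean$ID, mean)
```

Because the cleaned IDs are compared as whole strings, this would also drop any row where an animal is paired with itself, which does not occur in the posted data.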
Hi Jim and Hi Jannis,

Thanks very much to both of you for your help! Both methods work perfectly! Always good to know that there is more than one way to skin a cat when it comes to R! I will just need to get a grip on the regular expressions, it would seem.

Many thanks again for your help, much appreciated,

Ross

--
View this message in context: http://r.789695.n4.nabble.com/partial-matches-across-rows-not-columns-tp2247757p2250306.html
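For anyone else getting a grip on the regular expressions in this thread, here is a small sketch of what the two kinds of pattern do to a handful of IDs. The ID "A10" is hypothetical (it is not in the posted data) and is included only to show how a prefix comparison and an exact comparison can disagree once female numbers reach two digits.

```r
ids <- c("A1", "A1.1", "A2", "A10", "MALE1")

# "(A[[:digit:]]+)" captures "A" plus one or more digits, ".*" swallows the rest,
# and "\\1" puts back only the captured part. Strings with no match are unchanged.
female.part <- sub("(A[[:digit:]]+).*", "\\1", ids)

# "\\." is a literal dot (an unescaped "." matches any character);
# ".*$" is everything from that dot to the end of the string.
no.suffix <- sub("\\..*$", "", ids)

# Prefix comparison versus exact comparison for the made-up neighbour "A10".
prefix.hit <- "A1" == substring("A10", 1, nchar("A1"))
exact.hit  <- "A1" == sub("\\..*$", "", "A10")
```

Both patterns reduce "A1.1" to "A1" and leave "MALE1" alone. The prefix test treats "A10" as belonging to "A1" while the exact test does not, so with more than nine females the exact comparison is probably the safer choice.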