Hello - I am interested in parsing arbitrary street addresses from strings (semi-clean voter lists, mainly). These data may show up in various formats, but there are several common patterns. Non-exhaustive examples:

12-123 Washington Ave Minneapolis MN 12345
12/A-123 Washington Hwy Minneapolis Minnesota
12 Washington Dr Minneapolis Minn 12345
12 Washington Ridge St ...
12/AB-123 Washington Blvd ...
12/A-123 Washington Pl ...
#12-123 Washington Rd E ...
1234/A Washington Ave ...
12B-123-A Washington St ...
etc...

My question is this: before I start cooking up a complex regexp to parse these strings into standard pieces (like state, city, street name, street type, unit number, etc.), has someone already done this? Or is there some kind of toolkit to assist the parsing of street addresses? Surely this is a very common problem and it must have been solved many times by now. Or perhaps this type of data is so irregular as to preclude syntactical analysis?

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
Hi, why not just split the strings on spaces?
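To make the whitespace-splitting suggestion concrete, here is a minimal Ruby sketch. It is only an illustration under stated assumptions: the helper name `rough_split`, the `STREET_TYPES` list, and the rule "a trailing 5-digit token is a ZIP code" are all hypothetical and are not taken from any existing library.

```ruby
# Hypothetical helper: split an address on whitespace, then label tokens
# by position. The street-type list is illustrative, not exhaustive.
STREET_TYPES = %w[Ave Blvd Dr Hwy Pl Rd St].freeze

def rough_split(address)
  tokens = address.split            # split on runs of whitespace
  result = { unit: tokens.shift }   # first token: house/unit number, kept verbatim

  # Assumption: a trailing 5-digit token is a ZIP code.
  result[:zip] = tokens.pop if /\A\d{5}\z/.match?(tokens.last.to_s)

  # The last token that is a known street type ends the street part.
  idx = tokens.rindex { |t| STREET_TYPES.include?(t) }
  if idx
    result[:street_name] = tokens[0...idx].join(" ")
    result[:street_type] = tokens[idx]
    result[:rest]        = tokens[(idx + 1)..-1].join(" ")
  else
    result[:rest] = tokens.join(" ")
  end
  result
end

parts = rough_split("12-123 Washington Ave Minneapolis MN 12345")
parts[:street_name]  # "Washington"
parts[:rest]         # "Minneapolis MN"
```

Splitting alone only yields tokens; the labelling still needs rules like the ones above, and those rules are fragile. For instance, a city such as "St Paul" would be mistaken for a street type by this sketch, and the city and state remain lumped together in `:rest`.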
On Dec 27, 2006, at 11:01 PM, krst wrote:
>
> Hello - I am interested in parsing arbitrary street addresses from
> strings (semi-clean voter lists, mainly). These data may show up in
> various formats, but there are several common patterns. Non-exhaustive
> examples:
>
> 12-123 Washington Ave Minneapolis MN 12345
> 12/A-123 Washington Hwy Minneapolis Minnesota
> 12 Washington Dr Minneapolis Minn 12345
> 12 Washington Ridge St ...
> 12/AB-123 Washington Blvd ...
> 12/A-123 Washington Pl ...
> #12-123 Washington Rd E ...
> 1234/A Washington Ave ...
> 12B-123-A Washington St ...
> etc...
>
> My question is this: before I start cooking up a complex regexp to
> parse these strings into standard pieces(like state, city, street
> name, street type, unit number, etc), has someone already done this?
> Or is there some kind of toolkit to assist the parsing of street
> addresses? Surely this is a very common problem and it must have been
> solved many times by now. Or perhaps this type of data is so irregular
> as to preclude syntactical analysis?

The USPS might be able to help somewhat:

http://www.usps.com/business/addressverification/welcome.htm

Craig
I did some more searching and found a very useful website: http://regexlib.com

They store hundreds of standard regular expressions for a variety of purposes. I searched on "address", "postal", etc. and found some patterns to start with. I think this site could be useful for anyone who needs to get started quickly on regexp parsing.
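As a rough idea of what such starter patterns might look like in Ruby, here is a small sketch. These two expressions are my own illustrations written against the sample addresses at the top of the thread; they are not copied from regexlib.com, and the names `UNIT_RE`, `ZIP_RE`, `leading_unit`, and `trailing_zip` are hypothetical.

```ruby
# Illustrative patterns only; they cover the sample formats in this
# thread and make no claim to handle addresses in general.

# Leading house/unit designator: 12, 12/A-123, #12-123, 1234/A, 12B-123-A
UNIT_RE = %r{\A#?\d[0-9A-Za-z]*(?:[/-][0-9A-Za-z]+)*}

# Trailing US ZIP code, optionally ZIP+4: 12345 or 12345-6789
ZIP_RE = /\b\d{5}(?:-\d{4})?\z/

# Returns the leading unit designator, or nil if the string
# does not start with an (optionally #-prefixed) digit.
def leading_unit(address)
  address.strip[UNIT_RE]
end

# Returns the trailing ZIP code, or nil if there is none.
def trailing_zip(address)
  address.strip[ZIP_RE]
end

leading_unit("12/AB-123 Washington Blvd")                   # "12/AB-123"
trailing_zip("12 Washington Dr Minneapolis Minn 12345")     # "12345"
```

Anchoring one pattern at the start of the string and the other at the end keeps each expression small; the middle part (street name, street type, city, state) is the genuinely ambiguous piece and is left untouched here.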