Hello.

We have book titles as a column in our database, which I would like to use in our URLS for SEO purposes. Given that these are titles, they include characters other than alphabets and numbers (e.g. punctuation, blanks, foreign characters in some cases).

What's the easiest way to do this? Here is some more information:

Original string:

On One Flower: Butterflies, Ticks and a Few More Icks

What I would like to see:

on-one-flower-butterflies-ticks-and-a-few-more-icks

I'm currently doing something like this; is there a better way?

title.squeeze.downcase.tr("(),? !':.[]", "-").gsub('--', '-')

Thanks in advance.
--
Posted via http://www.ruby-forum.com/.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
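One detail of the expression in the question may be worth checking before comparing alternatives: `String#squeeze` called with no arguments collapses every run of a repeated character, not only blanks. The sketch below shows the effect on the sample title and one possible variant that squeezes only dashes. The helper name `squeeze_dash_slug` is made up for illustration and is not from the thread.

```ruby
title = "On One Flower: Butterflies, Ticks and a Few More Icks"

# With no arguments, squeeze also collapses the "tt" in "Butterflies".
puts title.squeeze
# => On One Flower: Buterflies, Ticks and a Few More Icks

# Hypothetical variant: same tr call as in the question, but squeeze is
# applied afterwards and restricted to dashes, so letters are untouched.
def squeeze_dash_slug(title)
  title.downcase.tr("(),? !':.[]", "-").squeeze("-")
end

puts squeeze_dash_slug(title)
# => on-one-flower-butterflies-ticks-and-a-few-more-icks
```

Restricting `squeeze` to `"-"` also makes the trailing `gsub('--', '-')` unnecessary, since runs of any length are reduced to one dash.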
This is arguably a tad bit prettier:

title.downcase.gsub(/[^a-z ]/, '').gsub(/ /, '-')

Not sure if it's that much better...

-----Original Message-----
From: rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org [mailto:rubyonrails-talk@googlegroups.com] On Behalf Of Ben Knight
Sent: Thursday, October 02, 2008 10:52 AM
To: rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: [Rails] Removing Non Alpha & Numeric Characters From String
Hassan Schroeder
2008-Oct-02 18:30 UTC
Re: Removing Non Alpha & Numeric Characters From String
On Thu, Oct 2, 2008 at 11:00 AM, Pardee, Roy <pardee.r-go57ItdSaco@public.gmane.org> wrote:
>
> This is arguably a tad bit prettier:
>
> title.downcase.gsub(/[^a-z ]/, '').gsub(/ /, '-')

..but doesn't do the same thing as the OP's -- this one will strip out non-vanilla-ASCII accented characters, which could well be part of a title.

Not all books are written in America :-)

--
Hassan Schroeder ------------------------ hassan.schroeder-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
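To make the difference concrete, here is a small sketch that wraps the quoted expression in a helper and runs it on three titles. The helper name `ascii_only_slug` and the title "Fahrenheit 451" are illustrative assumptions, not from the thread. The accented name is written with `\u` escapes so the result does not depend on how the source file is encoded.

```ruby
# The quoted expression, wrapped in a hypothetical helper for illustration.
def ascii_only_slug(title)
  title.downcase.gsub(/[^a-z ]/, '').gsub(/ /, '-')
end

puts ascii_only_slug("On One Flower: Butterflies, Ticks and a Few More Icks")
# => on-one-flower-butterflies-ticks-and-a-few-more-icks

# "Hernán Cortés": the accented letters are removed, not transliterated.
puts ascii_only_slug("Hern\u00e1n Cort\u00e9s")
# => hernn-corts

# Digits are outside a-z as well, and the blank before them survives as a
# trailing dash.
puts ascii_only_slug("Fahrenheit 451")
# => fahrenheit-
```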
Roy Pardee wrote:
> This is arguably a tad bit prettier:
>
> title.downcase.gsub(/[^a-z ]/, '').gsub(/ /, '-')
>
> Not sure if it's that much better...

That works great (and looks prettier :-)! Thanks.
--
Posted via http://www.ruby-forum.com/.
Well, Hassan makes a good point that this will eat any non-ASCII characters. Consider whether you want to do that. If you don't, you'll likely have to url-encode the result (I don't think e.g., accented characters are usable in URLs, are they?)

-----Original Message-----
From: rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org [mailto:rubyonrails-talk@googlegroups.com] On Behalf Of Ben Knight
Sent: Thursday, October 02, 2008 11:48 AM
To: rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: [Rails] Re: Removing Non Alpha & Numeric Characters From String
Hassan Schroeder
2008-Oct-02 19:33 UTC
Re: Removing Non Alpha & Numeric Characters From String
On Thu, Oct 2, 2008 at 11:52 AM, Pardee, Roy <pardee.r-go57ItdSaco@public.gmane.org> wrote:
>
> If you don't, you'll likely have to url-encode the result (I don't think
> e.g., accented characters are usable in URLs, are they?

It depends on the Web server being able to handle it, but yes, you can have non-ISO-8859-1 characters in a URL.

--
Hassan Schroeder ------------------------ hassan.schroeder-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
David A. Black
2008-Oct-02 19:39 UTC
Re: Removing Non Alpha & Numeric Characters From String
Hi --

On Thu, 2 Oct 2008, Ben Knight wrote:

>
> Hello.
>
> We have book titles as a column in our database, which I would like to
> use in our URLS for SEO purposes. Given that these are titles, they
> include characters other than alphabets and numbers (e.g. punctuation,
> blanks, foreign characters in some cases).
>
> What's the easiest way to do this? Here is some more information:
>
> Original string:
>
> On One Flower: Butterflies, Ticks and a Few More Icks
>
>
> What I would like to see:
>
> on-one-flower-butterflies-ticks-and-a-few-more-icks
>
>
> I'm currently doing something like this; is there a better way?
>
> title.squeeze.downcase.tr("(),? !':.[]", "-").gsub('--', '-')

Here's one possible technique:

title.downcase.gsub(/\W+/, '-')

\W does not include underscore (which is part of the \w class), so if you want to translate underscores you would do:

/[\W_]+/

David

--
Rails training from David A. Black and Ruby Power and Light:
  Intro to Ruby on Rails      January 12-15   Fort Lauderdale, FL
  Advancing with Rails        January 19-22   Fort Lauderdale, FL *
  * Co-taught with Patrick Ewing!
See http://www.rubypal.com for details and updates!
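As a quick check of the `\W+` technique, the sketch below applies both patterns to the sample title and to two made-up strings. The helper names `word_slug` and `word_slug_no_underscores`, and the inputs other than the sample title, are assumptions for illustration. Because `+` matches a whole run of non-word characters, ": " and ", " each become a single dash.

```ruby
# Hypothetical helper around the \W+ technique.
def word_slug(title)
  title.downcase.gsub(/\W+/, '-')
end

# Hypothetical variant that also translates underscores.
def word_slug_no_underscores(title)
  title.downcase.gsub(/[\W_]+/, '-')
end

puts word_slug("On One Flower: Butterflies, Ticks and a Few More Icks")
# => on-one-flower-butterflies-ticks-and-a-few-more-icks

puts word_slug("snake_case title")
# => snake_case-title
puts word_slug_no_underscores("snake_case title")
# => snake-case-title

# Punctuation at the end of a title still turns into a trailing dash, so a
# final trim may be wanted.
puts word_slug("Who? Me!")
# => who-me-
puts word_slug("Who? Me!").gsub(/\A-+|-+\z/, '')
# => who-me
```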
Petite Abeille
2008-Oct-02 19:52 UTC
Re: Removing Non Alpha & Numeric Characters From String
On Oct 2, 2008, at 9:33 PM, Hassan Schroeder wrote:

> It depends on the Web server being able to handle it, but yes, you
> can have non-ISO-8859-1 characters in a URL.

Hmmm... are you sure? I thought one would need to encode anything but a small subset of US-ASCII:

"The generic URI syntax mandates that new URI schemes that provide for the representation of character data in a URI must, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8, and then percent-encode those values."

http://en.wikipedia.org/wiki/Percent-encoding

"When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded. For example, the character A would be represented as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be represented as "%C3%80", and the character KATAKANA LETTER A would be represented as "%E3%82%A2"."

http://tools.ietf.org/html/rfc3986

In any case, one approach to URL normalization would be to transliterate the path to ASCII, then convert any non-alphanumeric characters into dashes or something, e.g.:

€2 commemorative coins -> http://svr225.stepx.com:3388/eur2-commemorative-coins
Hernán Cortés -> http://svr225.stepx.com:3388/hernan-cortes
Scanian (linguistics) -> http://svr225.stepx.com:3388/scanian-linguistics
Scheme (programming language) -> http://svr225.stepx.com:3388/scheme-programming-language

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/
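A minimal Ruby sketch of the "transliterate, then dashify" idea follows. It is not the code behind the URLs above. It assumes Ruby 2.2 or later for `String#unicode_normalize`, and the helper name `ascii_slug` is hypothetical. It strips accents by decomposing each character and dropping the combining marks, which covers accented Latin letters only.

```ruby
# Hypothetical helper: transliterate accented Latin letters to ASCII, then
# turn every run of non-alphanumeric characters into a single dash.
def ascii_slug(title)
  title
    .unicode_normalize(:nfd)   # decompose accented letters: base letter + combining mark
    .gsub(/\p{Mn}/, '')        # drop the combining marks
    .downcase
    .gsub(/[^a-z0-9]+/, '-')   # everything else becomes a dash
    .gsub(/\A-+|-+\z/, '')     # trim dashes at either end
end

# "Hernán Cortés", written with escapes to avoid source-encoding surprises.
puts ascii_slug("Hern\u00e1n Cort\u00e9s")
# => hernan-cortes
puts ascii_slug("Scanian (linguistics)")
# => scanian-linguistics
puts ascii_slug("Scheme (programming language)")
# => scheme-programming-language
```

Symbols such as "€" have no decomposition, so this sketch would simply drop the sign. Producing "eur2" as in the first example would need an explicit mapping table or a transliteration library.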
Hassan Schroeder
2008-Oct-02 20:12 UTC
Re: Removing Non Alpha & Numeric Characters From String
On Thu, Oct 2, 2008 at 12:52 PM, Petite Abeille <petite.abeille-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

>> It depends on the Web server being able to handle it, but yes, you
>> can have non-ISO-8859-1 characters in a URL.
>
> Hmmm... are you sure? I thought one would need to encode anything but
> a small subset of US-ASCII:

As a simple test, I create a file called "Chrétien.txt" which I drop into a Tomcat web server to view as "http://localhost/sample/Chrétien.txt".

Firefox 2 turns this into: http://localhost/sample/Chr%C3%A9tien.txt
while Safari requests http://localhost/sample/Chrétien.txt

But the main thing is that, regardless, the non-US-ASCII name is used to match the resource in the file system.

--
Hassan Schroeder ------------------------ hassan.schroeder-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Petite Abeille
2008-Oct-02 20:19 UTC
Re: Removing Non Alpha & Numeric Characters From String
On Oct 2, 2008, at 10:12 PM, Hassan Schroeder wrote:

> Firefox 2 turns this into: http://localhost/sample/Chr%C3%A9tien.txt
> while Safari requests http://localhost/sample/Chrétien.txt

Even though Safari does indeed display the accentuated characters in its UI, it does encode the URL properly when sending the HTTP request to the server... take a look at your log...

> But the main thing is that, regardless, the non-US-ASCII name is used
> to match the resource in the file system.

Well, yes... once it has been decoded from the HTTP request back to its original form...

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/
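The encode-on-send, decode-on-receive round trip can be sketched with Ruby's standard `uri` library. This only illustrates the encoding step and says nothing about what any particular browser or server does. Note that `URI.encode_www_form_component` implements form encoding, so it would turn a blank into "+"; none of the strings below contain one.

```ruby
require 'uri'

# "Chrétien.txt", written with an escape so the bytes are unambiguous.
name = "Chr\u00e9tien.txt"

# UTF-8 bytes outside the unreserved set are percent-encoded.
encoded = URI.encode_www_form_component(name)
puts encoded
# => Chr%C3%A9tien.txt

# Decoding restores the original name, which is what gets matched against
# the file system.
decoded = URI.decode_www_form_component(encoded)
puts decoded == name
# => true

# The two examples from the RFC 3986 passage quoted earlier in the thread.
puts URI.encode_www_form_component("\u00c0")  # LATIN CAPITAL LETTER A WITH GRAVE
# => %C3%80
puts URI.encode_www_form_component("\u30a2")  # KATAKANA LETTER A
# => %E3%82%A2
```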
Hi, guys. I ended up doing this for now and it works for us (for now):

title.downcase.gsub(/[(,?!\'":.)]/, '').gsub(' ', '-').gsub(/-$/, '')

Also, since this is for SEO purposes only, I basically don't use the parameter in my code. For example, I have a /mycontroller/:id/:dummy/ in my routes.rb (dummy being the above book title).

Thanks.
--
Posted via http://www.ruby-forum.com/.
Petite Abeille
2008-Oct-03 18:32 UTC
Re: Removing Non Alpha & Numeric Characters From String
On Oct 3, 2008, at 4:54 PM, Ben Knight wrote:

> Hi, guys. I ended up doing this for now and it works for us (for
> now):
>
> title.downcase.gsub(/[(,?!\'":.)]/, '').gsub(' ', '-').gsub(/-$/,
> '')

What about multiple dashes in the middle of the title?

For example, given:

Primetime Emmy Award for Outstanding Lead Actress - Miniseries or a Movie

One would expect:

primetime-emmy-award-for-outstanding-lead-actress-miniseries-or-a-movie

Note the transition between 'Actress' and 'Miniseries'.

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/
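The case raised here can be reproduced directly. In the sketch below, the helper names `posted_slug` and `collapsed_slug` are hypothetical. The first wraps the expression as posted and the second shows one possible fix, squeezing runs of dashes into one.

```ruby
# The expression as posted, wrapped in a hypothetical helper.
def posted_slug(title)
  title.downcase.gsub(/[(,?!\'":.)]/, '').gsub(' ', '-').gsub(/-$/, '')
end

# One possible fix: collapse any run of dashes into a single dash.
def collapsed_slug(title)
  posted_slug(title).squeeze('-')
end

title = "Primetime Emmy Award for Outstanding Lead Actress - Miniseries or a Movie"

# " - " turns into three dashes: one per blank plus the literal dash.
puts posted_slug(title)
# => primetime-emmy-award-for-outstanding-lead-actress---miniseries-or-a-movie

puts collapsed_slug(title)
# => primetime-emmy-award-for-outstanding-lead-actress-miniseries-or-a-movie
```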