Hi, here is the array I am scanning:
["\n<td> <a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~2489041&FF=rwr+121&1,1,\">The Academic Writer: A Brief Guide</
a>\n</td>\n<td >\n Ede, Lisa\n</td>\n\n<td >\n Valley
Reserves -- VR 282 -- AVAILABLE\n</td>\n\n<td >\n \n</td>\n\n</
tr>\n<tr>\n<td> <a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~1334646&FF=rwr+121&1,1,\">Cultural literacy : what every
American needs to know / E.D. Hirsch, Jr. ; with an appendix, What li</
a>\n</td>\n<td >\n Hirsch, E. D. (Eric Donald), 1928-\n</td>\n
\n<td >\n Valley Reserves -- LC149 .H57 1987 -- AVAILABLE\n</td>
\n\n<td >\n \n</td>]

I am trying to pull out the essential (everything but the newlines and
such) value in between the <td></td>.

Here is the regex I am trying:
s.first.scan(/\<td \>(.*?)\<\/td\>/mi)
But I don't get the first <td> a href value.

Any help would be appreciated.
Kim

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
Here is the array I am scanning:
["\n<td> <a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~2489041&FF=rwr+121&1,1,\">The Academic Writer: A Brief Guide</
a>\n</td>\n<td >\n Ede, Lisa\n</td>\n\n<td >\n Valley
Reserves -- VR 282 -- AVAILABLE\n</td>\n\n<td >\n \n</td>\n\n</
tr>\n<tr>\n<td> <a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~1334646&FF=rwr+121&1,1,\">Cultural literacy : what every
American needs to know / E.D. Hirsch, Jr. ; with an appendix, What li</
a>\n</td>\n<td >\n Hirsch, E. D. (Eric Donald), 1928-\n</td>\n
\n<td >\n Valley Reserves -- LC149 .H57 1987 -- AVAILABLE\n</td>
\n\n<td >\n \n</td>]

I am trying to get the values (all but newlines and such) out from in
between the <td> </td>

Tried this:
s.first.scan(/\<td \>(.*?)\<\/td\>/mi)
But I never get the first <td> a href values.

Any help is appreciated. Thanks.
Kim
Use the Hpricot plugin to handle HTML parsing.

On Oct 22, 9:14 am, Kim <Kim.Gri...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi here is the array I am scanning:
> ["\n<td> <a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
> frameset~2489041&FF=rwr+121&1,1,\">The Academic Writer: A Brief Guide</
> a>\n</td>\n<td >\n Ede, Lisa\n</td>\n\n<td >\n Valley
> Reserves -- VR 282 -- AVAILABLE\n</td>\n\n<td >\n \n</td>\n\n</
> tr>\n<tr>\n<td> <a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
> frameset~1334646&FF=rwr+121&1,1,\">Cultural literacy : what every
> American needs to know / E.D. Hirsch, Jr. ; with an appendix, What li</
> a>\n</td>\n<td >\n Hirsch, E. D. (Eric Donald), 1928-\n</td>\n
> \n<td >\n Valley Reserves -- LC149 .H57 1987 -- AVAILABLE\n</td>
> \n\n<td >\n \n</td>]
>
> I am trying to pull out the essential (everything but the newlines and
> such) value in between the <td></td>.
>
> Here is the regex I am trying:
> s.first.scan(/\<td \>(.*?)\<\/td\>/mi)
> But I don't get the first <td> a href value.
>
> Any help would be appreciated. Kim
On Oct 22, 2008, at 12:38 AM, Kim wrote:
> Here is the array I am scanning:
> ["\n<td> <a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
> frameset~2489041&FF=rwr+121&1,1,\">The Academic Writer: A Brief
> Guide</
> a>\n</td>\n<td >\n Ede, Lisa\n</td>\n\n<td >\n Valley
> Reserves -- VR 282 -- AVAILABLE\n</td>\n\n<td >\n \n</td>\n\n</
> tr>\n<tr>\n<td> <a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
> frameset~1334646&FF=rwr+121&1,1,\">Cultural literacy : what every
> American needs to know / E.D. Hirsch, Jr. ; with an appendix, What
> li</
> a>\n</td>\n<td >\n Hirsch, E. D. (Eric Donald), 1928-\n</td>\n
> \n<td >\n Valley Reserves -- LC149 .H57 1987 -- AVAILABLE\n</td>
> \n\n<td >\n \n</td>]
>
> I am trying to get the values (all but newlines and such) out from in
> between the <td> </td>
>
> Tried this :
> s.first.scan(/\<td \>(.*?)\<\/td\>/mi)
> But I never get the first <td> a href values.
>
> Any help is appreciated. Thanks.
> Kim

Because the first <td> is not a <td >, of course. Your regexp looks for:

    /\<td \>
         ^

Did you mean something like:

    %r{<td\b[^>]*>(.*?)</td>}mi

Note that %r{} for a regexp literal can be more convenient when you hope to match a slash.

-Rob

Rob Biedenharn    http://agileconsultingllc.com
Rob-xa9cJyRlE0mWcWVYNo9pwxS2lgjeYSpx@public.gmane.org
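To illustrate the difference between the two patterns, here is a minimal sketch using only core Ruby. The sample string is a hypothetical, shortened stand-in for the quoted array, and the variable names (original, relaxed, original_cells, relaxed_cells) are made up for this example. The strip call is one assumed way to drop the surrounding newlines and spaces.

```ruby
# Hypothetical sample data: a shortened version of the array from the question.
# Note that the first cell opens with "<td>" and the second with "<td >".
s = ["\n<td> <a href=\"/search?id=1\">The Academic Writer</a>\n</td>\n<td >\n Ede, Lisa\n</td>\n"]

# The pattern from the question: it requires a literal space before ">".
original = /\<td \>(.*?)\<\/td\>/mi

# The suggested pattern: "<td", a word boundary, any attributes or spaces, then ">".
relaxed = %r{<td\b[^>]*>(.*?)</td>}mi

# With one capture group, scan returns an array of one-element arrays,
# so flatten it and strip the leading/trailing whitespace of each cell.
original_cells = s.first.scan(original).flatten.map { |cell| cell.strip }
relaxed_cells  = s.first.scan(relaxed).flatten.map { |cell| cell.strip }

puts original_cells.inspect  # only the "<td >" cell is found
puts relaxed_cells.inspect   # both cells are found, including the one holding the <a href>
```

A regexp like this is still a quick fix: it assumes the cells are not nested and that every cell has a closing </td>, which is where a real HTML parser tends to be the safer choice.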
On Oct 22, 12:14 am, Kim <Kim.Gri...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi here is the array I am scanning:
> ["\n<td> <a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
> frameset~2489041&FF=rwr+121&1,1,\">The Academic Writer: A Brief Guide</
> a>\n</td>\n<td >\n Ede, Lisa\n</td>\n\n<td >\n Valley
> Reserves -- VR 282 -- AVAILABLE\n</td>\n\n<td >\n \n</td>\n\n</
> tr>\n<tr>\n<td> <a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
> frameset~1334646&FF=rwr+121&1,1,\">Cultural literacy : what every
> American needs to know / E.D. Hirsch, Jr. ; with an appendix, What li</
> a>\n</td>\n<td >\n Hirsch, E. D. (Eric Donald), 1928-\n</td>\n
> \n<td >\n Valley Reserves -- LC149 .H57 1987 -- AVAILABLE\n</td>
> \n\n<td >\n \n</td>]
>
> I am trying to pull out the essential (everything but the newlines and
> such) value in between the <td></td>.

I agree with Mukund. Use Hpricot:

    require 'hpricot'

    # Parse the HTML fragment and print the contents of each <td> cell.
    html = Hpricot(s.first)
    html.search( "td" ) do |cell|
      puts cell.inner_html
    end

--
Mark.