For an application I am working on, I have to extract URLs and the text used to link to them. For example:

..... <a href="http://www.rubyonrails.org" title="rails" >Ruby on Rails</a>....

I have been trying all night but cannot come up with the regular expression needed to extract the URLs and the text. I have tried:

myurls=response.scan(/href\s*=\s*["'](http|https)(.*)["']\s*.*>(.*)<\/a>/)

However, I am left with:

://domain.com/filename" rel="tag

and

://domain.com/filename " title="permanent link

Can anyone please help me specify a pattern that extracts everything up to the next single or double quote character? Or how else can I go about extracting the URL and the linked text?

I will greatly appreciate it.

Thanks
Frank
irb> response = %{Here is some link <a href="http://www.rubyonrails.org" title="rails" >Ruby on Rails</a>
and <a href="http://www.google.com">Google ofcourse</a>
and <a href="ftp://www.foo.bar" title="bar">Foo!</a>}
irb> response.scan(/href="([^"]+)".*?>([^>]+)</)
=> [["http://www.rubyonrails.org", "Ruby on Rails"],
["http://www.google.com", "Google ofcourse"],
["ftp://www.foo.bar", "Foo!"]]
What you're looking for is a negated character class, so

href="([^"]+)"
       ^^^^^

matches anything that is not a double quote, all the way until you bump into one.

And similarly

>([^>]+)<
  ^^^^^

matches the link text: one or more characters that are not >, up to a <.
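To make the difference concrete, here is a minimal sketch that contrasts a greedy (.*) with a negated character class, and also accepts either quote style, which the original question asked about. The sample string and the variable names (html, greedy, link_re, links) are made up for illustration. The backreference \1 is one possible way to require a matching closing quote, not the only one, and a regex like this is only a rough tool for simple, well-formed markup.

```ruby
# Hypothetical sample input; the second link uses single quotes.
html = %{<a href="http://www.rubyonrails.org" title="rails" >Ruby on Rails</a> } +
       %{and <a href='http://www.google.com' rel="tag">Google</a>}

# Greedy: (.*) runs to the LAST quote on the line, so the capture
# swallows the other attributes and everything in between.
greedy = html.scan(/href\s*=\s*["'](.*)["']/)

# Negated class: remember the opening quote in group 1, take only
# characters that are not quotes, then require the same quote (\1).
# [^>]* skips any remaining attributes, and ([^<]+) is the link text.
link_re = /href\s*=\s*(["'])([^"']+)\1[^>]*>([^<]+)</

# scan yields [quote, url, text] per match; drop the quote group.
links = html.scan(link_re).map { |_quote, url, text| [url, text.strip] }

p links
# => [["http://www.rubyonrails.org", "Ruby on Rails"], ["http://www.google.com", "Google"]]
```

Here greedy contains a single match whose capture runs from the first URL all the way to rel="tag, which is the same kind of over-match described in the question.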
cheers,
-Mehryar
On Fri, 17 Feb 2006, softwareengineer 99 wrote:
> For an application I am working on I have to extract URLs and the text used
> to link.
>
> For example,
>
> ..... <a href="http://www.rubyonrails.org" title="rails" >Ruby on Rails</a>....
>
> I have been trying all night but cannot come up with the regular
> expression needed to extract the URLs and the text.
>
> I have tried:
>
> myurls=response.scan(/href\s*=\s*["'](http|https)(.*)["']\s*.*>(.*)<\/a>/)
>
> However I am left with :
>
> ://domain.com/filename" rel="tag
>
> and
>
> ://domain.com/filename " title="permanent link
>
> Can anyone please help me as to how I can specify to extract everything
> till the next single or double quote character? Or how can I go about
> extracting URL and the linked text?
>
> I will greatly appreciate it.
>
> Thanks
> Frank
-------------------------------------------------------
... with proper design, the features come cheaply. This
approach is arduous, but continues to succeed.
---Dennis Ritchie
Hello Mehryar,
This works like a charm :)
Thank you so much. I really appreciate it.
Frank
_______________________________________________
Rails mailing list
Rails@lists.rubyonrails.org
http://lists.rubyonrails.org/mailman/listinfo/rails