Hello friends,

I need to write a regular expression that will extract and return the domain name.

For example, if a user passes in any of the URLs below, it should save only "foo.com":

http://www.foo.com/
http://www.foo.com/something
http://foo.com/
https://something.foo.com/

Thanks for any help.

Thanks,
abhis

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
A good way to start is to experiment with an online regular expression editor:

http://rubular.com
Hey Srinivas,
Thanks for the reply.
I am able to get the output, but the only problem is that I have to
define all of the suffixes (co.uk|com|net|org|in) myself.
So I am trying to figure out the best way to get the output.
url_pattern = /^(?:.+?\.)+(.+?\.(?:co\.uk|com|net|org|in))(\:[0-9]{2,5})?\/*.*$/i
url = "http://www.foo.com"
url_pattern.match(url)
$1 #=> "foo.com"
Thanks,
Abhishek
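One way to keep the suffix list out of the pattern itself is to build the alternation from an array. This is only a sketch: SUFFIXES, SUFFIX_RE, URL_PATTERN and domain_from_url are made-up names, not something from this thread, and the list still has to be maintained by hand.

```ruby
# Hypothetical, hand-maintained suffix list; it is not exhaustive.
SUFFIXES = %w[co.uk com net org in].freeze

# Regexp.union escapes the dots; longer suffixes are listed first.
SUFFIX_RE = Regexp.union(SUFFIXES.sort_by { |s| -s.length })

# Optional scheme, any number of leading labels, then capture
# "label.suffix" when it is followed by a port, path, query,
# fragment, or the end of the string.
URL_PATTERN = %r{\A(?:[a-z][a-z0-9+.-]*://)?(?:[^/.:?#]+\.)*?([^/.:?#]+\.(?:#{SUFFIX_RE.source}))(?=[:/?#]|\z)}i

# Returns the captured domain, or nil when no listed suffix matches.
def domain_from_url(url)
  match = URL_PATTERN.match(url)
  match && match[1]
end

puts domain_from_url("http://www.foo.com/")        # foo.com
puts domain_from_url("https://something.foo.com/") # foo.com
puts domain_from_url("http://www.foo.co.uk/path")  # foo.co.uk
```

Any URL whose suffix is missing from the list returns nil, so this only moves the maintenance problem into one array rather than solving it.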
require 'uri'

urls = [ "http://www.foo.com/", "http://www.foo.com/something",
         "http://foo.com/", "https://something.foo.com/" ]

urls.each { |url| puts URI::parse( url ).host.split( "." )[-2,2].join(".") }

Good luck,
-Conrad
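The same idea can be wrapped in a small method that copes with input URI cannot parse. A minimal sketch: last_two_labels is a hypothetical name, and the co.uk input is only there to show where taking the last two labels stops working.

```ruby
require 'uri'

# Hypothetical helper: returns the last two dot-separated labels of the
# host, or nil when the URL cannot be parsed or has no host.
def last_two_labels(url)
  host = URI.parse(url).host
  return nil if host.nil?
  host.split('.').last(2).join('.')
rescue URI::InvalidURIError
  nil
end

puts last_two_labels("http://www.foo.com/something") # foo.com
puts last_two_labels("https://something.foo.com/")   # foo.com
puts last_two_labels("http://www.foo.co.uk/")        # co.uk (two-part suffix, not the domain)
```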
Hi Abhishek,
You can try the Addressable gem for your requirement.
Step 1: Install the Addressable gem with the following command.
$ sudo gem install addressable
Step 2: I will explain using IRB; you can try it and then integrate it with
your Rails application.
$ irb
> require 'rubygems'
> require 'addressable/uri'
> uri = Addressable::URI.parse("http://google.com")
=> #<Addressable::URI:0xfdb9aee5c URI:http://google.com>
Step 3: You can extract just the host with the following command.
> uri.host
=> "google.com"
There are many other options you can explore:
http://addressable.rubyforge.org/api/classes/Addressable/URI.html
Hope this helps!
Best regards,
Srinivas Iyer
http://talkonsomething.com
http://twitter.com/srinivasiyermv
Hi, the Addressable gem doesn't produce the domain part of the web address; uri.host returns the full host name. For example:
irb(main):002:0> require 'addressable/uri'
=> true
irb(main):003:0> uri = Addressable::URI.parse("http://www.usc.edu/home.html")
=> #<Addressable::URI:0x90e89c URI:http://www.usc.edu/home.html>
irb(main):004:0> uri.host
=> "www.usc.edu"
-Conrad
On Thu, 2009-11-12 at 02:31 -0800, Conrad Taylor wrote:
> irb(main):004:0> uri.host
> => "www.usc.edu"
----
uri.host.split('.')[0]
Craig
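For reference, here is a quick sketch of what splitting that host on dots yields. The variable names are just for illustration. Index 0 is the leftmost label, so for this host it is the last two labels that give usc.edu.

```ruby
host = "www.usc.edu"

# Split the host into its dot-separated labels.
labels = host.split('.')

p labels                       # ["www", "usc", "edu"]
p labels[0]                    # "www"
p labels.last(2).join('.')     # "usc.edu"
```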
# Given a URL, return a domain
def self.url_to_domain(url)
  begin
    host = URI.parse(self.fix_url(url)).host
    host.gsub(/\Awww\./, "")
  rescue
    ""
  end
end
Oops, I forgot to add the other function I was using:
# Prepend URL with http if necessary
def self.fix_url(u)
  !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
end
Note that you need to require uri:
require 'uri'
I put this in a module called Utilities, so the whole thing is:
require 'uri'
module Utilities
  # Given a URL, return a domain
  def self.url_to_domain(url)
    begin
      host = URI.parse(self.fix_url(url)).host
      host.gsub(/\Awww\./, "")
    rescue
      ""
    end
  end
  # Prepend URL with http if necessary
  def self.fix_url(u)
    !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
  end
end
And you call it with Utilities::url_to_domain(u)
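A short usage sketch, with the module repeated in slightly condensed form so the snippet runs on its own. The sample inputs are made up. It shows that only a leading "www." is stripped, so other subdomains are kept.

```ruby
require 'uri'

module Utilities
  # Given a URL, return the host without a leading "www."
  def self.url_to_domain(url)
    host = URI.parse(fix_url(url)).host
    host.gsub(/\Awww\./, "")
  rescue
    ""
  end

  # Prepend "http://" when the URL has no http/https scheme
  def self.fix_url(u)
    u =~ /\Ahttps?:\/\//i ? u : "http://#{u}"
  end
end

puts Utilities.url_to_domain("http://www.foo.com/something") # foo.com
puts Utilities.url_to_domain("foo.com")                      # foo.com
puts Utilities.url_to_domain("https://something.foo.com/")   # something.foo.com
```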
Hello,
Thanks, friends, for the superb solutions. Really appreciated.
Thanks,
Abhis
I faced the exact same situation a while ago. Here's what I came up with after reading the rest of this thread:

#!/usr/bin/env ruby
require 'uri'

module DomainExtractor
  VALID_GENERIC_SUFIXES_RE = /^(com|net|org|co)$/

  def self.extract(url)
    u = fix_url(url)
    uri = URI::parse(u)
    domain = uri.host
    chunks = domain.split('.')
    if ! (chunks[-1] =~ VALID_GENERIC_SUFIXES_RE).nil?
      # generic suffix is the last label, e.g. google.com
      domain = chunks[-2, 2].join('.')
    elsif ! (chunks[-2] =~ VALID_GENERIC_SUFIXES_RE).nil?
      # generic suffix followed by a country code, e.g. google.com.uy
      domain = chunks[-3, 3].join('.')
    else
      domain = ""
    end
    domain.gsub(/\Awww\./, "")
  rescue
    ""
  end

  def self.fix_url(url)
    !!( url !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{url}" : url
  end
end

# test
urls = [
  "http://google.com",
  "http://www.google.com",
  "http://google.com.uy",
  "http://www.google.com.uy",
  "http://google.com.uy/index.html",
  "http://subdomain1.google.com.uy/index.html",
  "http://subdomain1.subdomain2.google.com",
  "http://www.subdomain1.google.com.uy/index.html",
  "http://subdomain1.google.net/index.html",
  "http://subdomain1.sub2.sub3.google.org.kz?test=3",
  "http://kb.mediatemple.net/questions/251/Running+rake+tasks+from+cron",
  "https://creaproject.basecamphq.com/projects/3620850/todo_items/413078/comments",
  "google.com",
  "google.com.uy",
  "google.com.uy/index.php",
  "sub1.sub2.google.com.uy?test=value",
  "www.sub1.sub2.google.com.uy?test=value",
  "http://sub1.sub2.google.com.uy?test=value",
  "http://www.wwwsub1.sub2.google.com.uy?test=value"
]

urls.each do |url|
  puts
  puts "URL : #{url}"
  result = DomainExtractor::extract(url)
  puts "result: #{result}"
end