Hello Friends,

I need to write a regular expression that extracts and returns the domain name from a URL.

For example, if a user passes in any of the URLs below, it should save only "foo.com":

http://www.foo.com/
http://www.foo.com/something
http://foo.com/
https://something.foo.com/

Thanks for any help.

Thanks,
abhis

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
A good way to start is to experiment in an online regular expression editor such as http://rubular.com

On 11/12/09, Abhishek shukla <betterabhi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I need to write a regular expression which will extract and return the
> domain name.
Hey Srinivas,

Thanks for the reply. I am able to get the output, but the only problem is that I have to list every suffix explicitly (uk|com|net|org|in), so I am trying to figure out the best way to get the output.

  url_pattern = /^(?:.+?\.)+(.+?\.(?:co\.uk|com|net|org|in))(\:[0-9]{2,5})?\/*.*$/i
  url = "http://www.foo.com"
  url_pattern.match(url)
  $1 #=> "foo.com"

Thanks,
Abhishek

On Thu, Nov 12, 2009 at 12:09 PM, Srinivas Iyer <srimviyer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Good way to Start is trying it to learn on online Regular Expression
> Editor
>
> http://rubular.com
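A pattern alone cannot tell that "co.uk" is a suffix while "foo.com" is not, so some list of suffixes seems unavoidable with the regex approach. The list can at least live in one array and be interpolated into the pattern. The following is a minimal sketch under that assumption; SUFFIXES, DOMAIN_PATTERN and registered_domain are hypothetical names, the suffix list is illustrative and incomplete, and URLs with user info (user@host) are not handled.

```ruby
# Illustrative, incomplete suffix list (an assumption, not a registry).
SUFFIXES = %w[co.uk com net org in].freeze

# Build the alternation once, escaping the dots in entries such as "co.uk".
SUFFIX_ALTERNATION = SUFFIXES.map { |s| Regexp.escape(s) }.join("|")

# Optional scheme, optional subdomains, then one label plus a known suffix,
# then an optional port and an optional path/query/fragment.
DOMAIN_PATTERN = %r{\A(?:https?://)?(?:[^/:?#]+\.)?([^./:?#]+\.(?:#{SUFFIX_ALTERNATION}))(?::\d{2,5})?(?:[/?#].*)?\z}i

# Hypothetical helper: returns the captured domain, or nil when nothing matches.
def registered_domain(url)
  match = DOMAIN_PATTERN.match(url.strip)
  match && match[1].downcase
end

[
  "http://www.foo.com/",
  "http://www.foo.com/something",
  "http://foo.com/",
  "https://something.foo.com/"
].each do |url|
  puts "#{url} -> #{registered_domain(url)}"
end
```

Each of the four URLs from the original question yields "foo.com" here; a host whose suffix is not in SUFFIXES yields nil rather than a guess.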
On Wed, Nov 11, 2009 at 10:25 PM, Abhishek shukla <betterabhi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I need to write a regular expression which will extract and return the
> domain name.

  require 'uri'

  urls = [ "http://www.foo.com/", "http://www.foo.com/something",
           "http://foo.com/", "https://something.foo.com/" ]

  urls.each { |url| puts URI::parse( url ).host.split( "." )[-2,2].join(".") }

Good luck,

-Conrad
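Keeping the last two labels of the host works for the four sample URLs, but it is worth seeing what it does with a two-part suffix. A small sketch, where last_two_labels is a hypothetical helper name and the co.uk URL is an added test case that is not in the original question:

```ruby
require 'uri'

# Hypothetical helper: keeps the last two dot-separated labels of the host.
def last_two_labels(url)
  host = URI.parse(url).host
  return nil if host.nil?
  host.split(".").last(2).join(".")
end

%w[
  http://www.foo.com/
  https://something.foo.com/
  http://www.foo.co.uk/
].each do |url|
  puts "#{url} -> #{last_two_labels(url)}"
end
# The first two print "foo.com"; the last prints "co.uk", not "foo.co.uk".
```

So this approach is fine when every host is known to end in a single-label suffix such as .com, and needs extra handling otherwise.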
Hi Abhishek,

You can try the Addressable gem for this.

Step 1: Install the Addressable gem with the following command:

  $ sudo gem install addressable

Step 2: I will demonstrate in IRB; you can then integrate it into your Rails application.

  $ irb
  > require 'rubygems'
  > require 'addressable/uri'
  > uri = Addressable::URI.parse("http://google.com")
  => #<Addressable::URI:0xfdb9aee5c URI:http://google.com>

Step 3: Extract only the host with the following command:

  > uri.host
  => "google.com"

There are many other options you can explore:
http://addressable.rubyforge.org/api/classes/Addressable/URI.html

Hope this helps!

Best regards,
Srinivas Iyer
http://talkonsomething.com
http://twitter.com/srinivasiyermv

On 11/12/09, Conrad Taylor <conradwt-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> urls.each { |url| puts URI::parse( url ).host.split( "." )[-2,2].join(".") }
On Wed, Nov 11, 2009 at 11:46 PM, Srinivas Iyer <srimviyer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Step 3 : You can extract only the host with the following command
>
> > uri.host
> => "google.com"

Hi, the addressable gem doesn't produce the domain part of the web address; uri.host returns the full host name, including any subdomain. For example:

  irb(main):002:0> require 'addressable/uri'
  => true
  irb(main):003:0> uri = Addressable::URI.parse("http://www.usc.edu/home.html")
  => #<Addressable::URI:0x90e89c URI:http://www.usc.edu/home.html>
  irb(main):004:0> uri.host
  => "www.usc.edu"

-Conrad
On Thu, 2009-11-12 at 02:31 -0800, Conrad Taylor wrote:
> irb(main):004:0> uri.host
> => "www.usc.edu"

----
  uri.host.split('.')[0]   #=> "www"

Craig
  # Given a URL, return a domain
  def self.url_to_domain(url)
    begin
      host = URI.parse(self.fix_url(url)).host
      host.gsub(/\Awww\./, "")
    rescue
      ""
    end
  end

On Nov 12, 9:48 am, Craig White <craigwh...-BQ75lA0ptkhBDgjK7y7TUQ@public.gmane.org> wrote:
> uri.host.split('.')[0]
Oops, I forgot to add the other function I was using:

  # Prepend URL with http if necessary
  def self.fix_url(u)
    !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
  end

Note that you need to require uri:

  require 'uri'

I put this in a module called Utilities, so the whole thing is:

  require 'uri'

  module Utilities

    # Given a URL, return a domain
    def self.url_to_domain(url)
      begin
        host = URI.parse(self.fix_url(url)).host
        host.gsub(/\Awww\./, "")
      rescue
        ""
      end
    end

    # Prepend URL with http if necessary
    def self.fix_url(u)
      !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
    end

  end

And you call it with Utilities::url_to_domain(u)

On Nov 12, 9:59 am, Tony Amoyal <corgan1...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> # Given a URL, return a domain
> def self.url_to_domain(url)
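For comparison, here is a hedged variant of the same idea that rescues only URI::InvalidURIError and checks for a missing host explicitly instead of relying on a bare rescue. HostUtils is a hypothetical module name chosen so it does not clash with Utilities above. Like the original, it strips only a leading "www.", so other subdomains are kept.

```ruby
require 'uri'

# Hypothetical module name; a sketch, not a replacement for Utilities.
module HostUtils
  # Prepend "http://" when the string has no http/https scheme.
  def self.fix_url(u)
    u =~ %r{\Ahttps?://}i ? u : "http://#{u}"
  end

  # Return the host without a leading "www.", or "" when it cannot be parsed.
  def self.url_to_domain(url)
    host = URI.parse(fix_url(url)).host
    host ? host.sub(/\Awww\./i, "") : ""
  rescue URI::InvalidURIError
    ""
  end
end

puts HostUtils.url_to_domain("www.foo.com/something")      # foo.com
puts HostUtils.url_to_domain("https://something.foo.com/") # something.foo.com
```

The second call shows the limitation: for "https://something.foo.com/" the result is "something.foo.com", not "foo.com" as the original question asked.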
Hello,

Thanks, friends, for the superb solutions. Really appreciated.

Thanks,
Abhis

On Thu, Nov 12, 2009 at 8:46 PM, Tony Amoyal <corgan1003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> And you call it with Utilities::url_to_domain(u)
On Nov 12, 1:16 pm, Tony Amoyal <corgan1...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I put this in a module called Utilities so the whole thing is:

I faced the exact same situation a while ago. Here's what I came up with after reading the rest of this thread:

  #!/usr/bin/env ruby

  require 'uri'

  module DomainExtractor

    # Generic suffix labels. A host either ends in one of these ("foo.com")
    # or has one followed by a country code ("foo.com.uy").
    VALID_GENERIC_SUFIXES_RE = /\A(com|net|org|co)\z/

    def self.extract(url)
      u = fix_url(url)
      uri = URI::parse(u)
      domain = uri.host
      chunks = domain.split('.')

      if chunks[-1] =~ VALID_GENERIC_SUFIXES_RE
        domain = chunks[-2, 2].join('.')
      elsif chunks[-2] =~ VALID_GENERIC_SUFIXES_RE
        domain = chunks[-3, 3].join('.')
      else
        domain = ""
      end

      domain.gsub(/\Awww\./, "")
    rescue
      # Unparseable URL, missing host, or too few labels
      ""
    end

    # Prepend URL with http if necessary
    def self.fix_url(url)
      !!( url !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{url}" : url
    end

  end

  # test
  urls = [
    "http://google.com",
    "http://www.google.com",
    "http://google.com.uy",
    "http://www.google.com.uy",
    "http://google.com.uy/index.html",
    "http://subdomain1.google.com.uy/index.html",
    "http://subdomain1.subdomain2.google.com",
    "http://www.subdomain1.google.com.uy/index.html",
    "http://subdomain1.google.net/index.html",
    "http://subdomain1.sub2.sub3.google.org.kz?test=3",
    "http://kb.mediatemple.net/questions/251/Running+rake+tasks+from+cron",
    "https://creaproject.basecamphq.com/projects/3620850/todo_items/413078/comments",
    "google.com",
    "google.com.uy",
    "google.com.uy/index.php",
    "sub1.sub2.google.com.uy?test=value",
    "www.sub1.sub2.google.com.uy?test=value",
    "http://sub1.sub2.google.com.uy?test=value",
    "http://www.wwwsub1.sub2.google.com.uy?test=value"
  ]

  urls.each do |url|
    puts
    puts "URL   : #{url}"
    result = DomainExtractor::extract(url)
    puts "result: #{result}"
  end
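The heuristic above can also be written as a lookup against an explicit table of suffixes, trying the longest candidate first. This is only a sketch under stated assumptions: KNOWN_SUFFIXES and registrable_domain are hypothetical names, and the table is a tiny illustrative sample, not a real registry list. Unlike the module above, it recognises only the suffixes it is given, so a maintained source such as the Public Suffix List would be needed for real coverage.

```ruby
require 'uri'
require 'set'

# Illustrative, incomplete suffix table (an assumption for this sketch).
KNOWN_SUFFIXES = Set.new(%w[com net org com.uy org.kz co.uk]).freeze

# Hypothetical helper: returns the suffix plus one label, or "" if unknown.
def registrable_domain(url)
  url = "http://#{url}" unless url =~ %r{\Ahttps?://}i
  host = URI.parse(url).host
  return "" if host.nil?

  labels = host.downcase.split(".")
  # Walk from the longest possible suffix to the shortest so that
  # "com.uy" is found before a shorter candidate would be tried.
  (1...labels.size).each do |i|
    suffix = labels[i..-1].join(".")
    return labels[(i - 1)..-1].join(".") if KNOWN_SUFFIXES.include?(suffix)
  end
  ""
rescue URI::InvalidURIError
  ""
end

puts registrable_domain("http://www.subdomain1.google.com.uy/index.html")
puts registrable_domain("google.com")
```

The first call prints "google.com.uy" and the second prints "google.com"; a host whose suffix is missing from the table returns an empty string.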