Hey Bil!

On Tue, Aug 21, 2007 at 11:56:40AM -0400, Bil Kleb wrote:
> Hi,
>
> Does anyone have the formula for getting logged into LinkedIn?
>
> Here's my current attempt:
>
>   require 'rubygems'
>   require 'mechanize'
>
>   agent = WWW::Mechanize.new
>
>   home_page = agent.get('http://www.linkedin.com')
>
>   signin_page = agent.click home_page.links.text('Sign in')
>   puts "\nSIGNIN PAGE"
>   pp signin_page
>
>   login_form = signin_page.form('login')
>   login_form.session_login = 'LOGIN'
>   login_form.session_password = 'PASSWORD'
>
>   welcome_page = agent.submit(login_form, login_form.buttons.first)
>   puts "\nWELCOME PAGE"
>   pp welcome_page   <<<< Currently returns signin page
>
> I tried mucking about with a session key, but no joy:
>
>   login_form.session_rikey = agent.cookies.find{ |c| 'JSESSIONID' == c.name }.value
>
> (My goal is to scrape a list of my connections' new connections.)

I think the "session_login" field is misleading.  Give this a try:

  mech = WWW::Mechanize.new
  page = mech.get('https://www.linkedin.com/secure/login')
  page = page.form('login') { |form|
    form.session_key      = ARGV[0]
    form.session_password = ARGV[1]
  }.submit.links.first.click
  page.save_as('out.html')

Hope that helps!  I'll add this to the mechanize examples.  :-)

--
Aaron Patterson
http://tenderlovemaking.com/
Hi,

Does anyone have the formula for getting logged into LinkedIn?

Here's my current attempt:

  require 'rubygems'
  require 'mechanize'

  agent = WWW::Mechanize.new

  home_page = agent.get('http://www.linkedin.com')

  signin_page = agent.click home_page.links.text('Sign in')
  puts "\nSIGNIN PAGE"
  pp signin_page

  login_form = signin_page.form('login')
  login_form.session_login = 'LOGIN'
  login_form.session_password = 'PASSWORD'

  welcome_page = agent.submit(login_form, login_form.buttons.first)
  puts "\nWELCOME PAGE"
  pp welcome_page   <<<< Currently returns signin page

I tried mucking about with a session key, but no joy:

  login_form.session_rikey = agent.cookies.find{ |c| 'JSESSIONID' == c.name }.value

(My goal is to scrape a list of my connections' new connections.)

Thanks,
--
Bil Kleb
http://nasarb.rubyforge.org
Bil,

It's possible there is more to it than this, but looking at the page it
appears that you have the field names wrong for the login form.  Also, the
login submit button is an image, which is a hoop I've had to jump through
before, even so far as having to specify the exact coordinates of where I
"clicked" on the button, so tell it to submit using that button
specifically.  Since there is only one button on that form, you can just
use the following for your login form:

  login_form = agent.page.form('login')
  login_form.set_fields(:session_key => 'LOGIN', :session_password => 'PASSWORD')
  agent.submit(login_form, login_form.buttons.first)

When I tried this, it said that my credentials were invalid, so hopefully
it will work for you as you actually have a login :).

Good luck!

Matt White

----- Original Message ----
From: Bil Kleb <Bil.Kleb at NASA.gov>
To: mechanize-users at rubyforge.org
Sent: Tuesday, August 21, 2007 9:56:40 AM
Subject: [Mechanize-users] Signin to LinkedIn

Hi,

Does anyone have the formula for getting logged into LinkedIn?

Here's my current attempt:

  require 'rubygems'
  require 'mechanize'

  agent = WWW::Mechanize.new

  home_page = agent.get('http://www.linkedin.com')

  signin_page = agent.click home_page.links.text('Sign in')
  puts "\nSIGNIN PAGE"
  pp signin_page

  login_form = signin_page.form('login')
  login_form.session_login = 'LOGIN'
  login_form.session_password = 'PASSWORD'

  welcome_page = agent.submit(login_form, login_form.buttons.first)
  puts "\nWELCOME PAGE"
  pp welcome_page   <<<< Currently returns signin page

I tried mucking about with a session key, but no joy:

  login_form.session_rikey = agent.cookies.find{ |c| 'JSESSIONID' == c.name }.value

(My goal is to scrape a list of my connections' new connections.)

Thanks,
--
Bil Kleb
http://nasarb.rubyforge.org
_______________________________________________
Mechanize-users mailing list
Mechanize-users at rubyforge.org
http://rubyforge.org/mailman/listinfo/mechanize-users
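Matt's point about wrong field names is easy to check by listing what a form actually contains before filling it in. The following is a minimal, library-free sketch of that check. SAMPLE_FORM is a hypothetical stand-in for the login markup: the field names session_key, session_password, session_login and session_rikey appear in this thread, but the input types, their order, and the name of the image button are assumptions. The input_fields helper is likewise made up for illustration and is not part of Mechanize.

```ruby
# Hypothetical stand-in for the login form; the real markup may differ.
# Only the four session_* names are taken from the thread.
SAMPLE_FORM = <<~HTML
  <form name="login" action="/secure/login" method="post">
    <input type="hidden" name="session_rikey" value="">
    <input type="text" name="session_key">
    <input type="password" name="session_password">
    <input type="hidden" name="session_login" value="">
    <input type="image" name="signin" src="/img/signin.gif">
  </form>
HTML

# Return [type, name] pairs for every <input> tag found in an HTML string.
# A missing type attribute is reported as 'text', the HTML default.
def input_fields(html)
  html.scan(/<input\b[^>]*>/i).map do |tag|
    type = tag[/\btype\s*=\s*["']([^"']*)["']/i, 1] || 'text'
    name = tag[/\bname\s*=\s*["']([^"']*)["']/i, 1]
    [type, name]
  end
end

# Print one line per input so the visible fields stand out from hidden ones.
input_fields(SAMPLE_FORM).each do |type, name|
  puts "#{type.ljust(10)} #{name}"
end
```

In a listing like this, the text and password inputs are the ones a user types into, which is presumably why session_key rather than session_login turned out to be the field to set. With Mechanize itself, `pp signin_page` (as in Bil's script) prints the forms and their fields in a similar way.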
On Tue, Aug 21, 2007 at 01:43:45PM -0400, Bil Kleb wrote:
> Aaron Patterson wrote:
> > Hey Bil!
>
> Hi, and thanks again for betabrite,
>
>   http://tenderlovemaking.com/2006/09/28/new-ruby-betabrite-002/
>
> It was a blast!

No problem.  I actually got a new sign, but they don't have an API for
it.  It's got a USB cable with a proprietary protocol.  :-(

> > Hope that helps!  I'll add this to the mechanize examples.  :-)
>
> Yes, but it looks like they've hidden the actual "Connections"
> in an embedded javascript browser:

It looks like they also have a CSV export of the contacts.  Would that
get you the information you want?

  mech = WWW::Mechanize.new
  page = mech.get('https://www.linkedin.com/secure/login')
  page.form('login') { |form|
    form.session_key      = ARGV[0]
    form.session_password = ARGV[1]
  }.submit
  page = mech.get('http://www.linkedin.com/addressBookExport')
  form = page.form('exportSettingsForm')
  form.submit(form.buttons.first).save_as('contacts.csv')

--
Aaron Patterson
http://tenderlovemaking.com/
Matt White wrote:
> Bil,
>
> It's possible there is more to it than this, but looking at the page it
> appears that you have the field names wrong for the login form. [..]

Ah, thanks!  Now I'm apparently on to the next step...

  #<WWW::Mechanize::Page
   {url #<URI::HTTPS:0x9b74f4 URL:https://www.linkedin.com/secure/login>}
   {meta #<WWW::Mechanize::Meta "" "http://www.linkedin.com/home">}
   {title "Redirecting..."}
   {iframes}
   {frames}
   {links #<WWW::Mechanize::Link "click here" "http://www.linkedin.com/home">}
   {forms}>

Thanks again,
--
Bil Kleb
http://fun3d.larc.nasa.gov
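The page in that output is an interstitial: it has no forms, a single "click here" link, and a meta entry pointing at http://www.linkedin.com/home. Aaron's first snippet steps past it by chaining `.links.first.click` onto the submit. As a library-free illustration of the same idea, the sketch below pulls the target out of a meta refresh tag. REDIRECT_PAGE is hypothetical markup; only the title, the link text and the target URL come from the output above, and meta_refresh_target is a made-up helper name.

```ruby
# Hypothetical markup for the "Redirecting..." page. The exact form of the
# meta tag is an assumption; the URL and title are from the pp output.
REDIRECT_PAGE = <<~HTML
  <html><head>
    <title>Redirecting...</title>
    <meta http-equiv="refresh" content="0; url=http://www.linkedin.com/home">
  </head><body>
    <a href="http://www.linkedin.com/home">click here</a>
  </body></html>
HTML

# Return the URL named by a <meta http-equiv="refresh"> tag, or nil when
# the page has no such tag or the tag carries no url= part.
def meta_refresh_target(html)
  tag = html[/<meta\b[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/i]
  return nil unless tag

  content = tag[/\bcontent\s*=\s*"([^"]*)"/i, 1]
  return nil unless content

  content[/url\s*=\s*['"]?([^'";\s]+)/i, 1]
end

puts meta_refresh_target(REDIRECT_PAGE)
```

Whichever way the target is found, the next request should go to that URL with the same agent, so the session cookies set by the login are sent along.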
On Tue, Aug 21, 2007 at 02:30:28PM -0400, Bil Kleb wrote:
> Aaron Patterson wrote:
> >
> > It looks like they also have a CSV export of the contacts.  Would that
> > get you the information you want?
>
> I don't think so, because I'm going after my connections'
> /new/ connections, which are one step removed from that and
> indicated by yellow outlines in the connections listing.

Ah.  Yes, this is getting messier.  I was able to get mechanize to fetch
the javascript used to populate that list:

  id = mech.cookies.find { |c| c.name == 'JSESSIONID' }.value
  page = mech.post('/dwr/exec/ConnectionsBrowserService.getMyConnections.dwr', {
    'callCount'     => '1',
    'JSESSIONID'    => id,
    'c0-scriptName' => 'ConnectionsBrowserService',
    'c0-methodName' => 'getMyConnections',
    'c0-id'         => '8656_1187721167904',
    'c0-param0'     => 'number:-1',
    'c0-param1'     => 'number:-1',
    'c0-param2'     => 'string:DONT_CARE',
    'c0-param3'     => 'number:500',
    'c0-param4'     => 'boolean:false',
    'c0-param5'     => 'boolean:true',
    'xml'           => 'true',
  })

I don't know how brittle that is....  I don't know where the c0-id number
comes from, so it may break for you.  That javascript has the info you
want, but it might be kind of nasty to parse.

--
Aaron Patterson
http://tenderlovemaking.com/
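The parameters in that POST follow a visible pattern: each c0-paramN value is a type tag, a colon, and the value. The sketch below builds the same hash from plain Ruby values, which may make the call easier to vary. The helper names (dwr_encode, dwr_call_id, dwr_params) are made up for illustration. The call id is a guess: the second half of Aaron's c0-id, 1187721167904, reads as a Unix timestamp in milliseconds that falls on 21 August 2007, so dwr_call_id assumes "random number, underscore, current time in milliseconds". Nothing in the thread confirms that the server accepts an id built this way.

```ruby
# Encode a Ruby value in the "type:value" form used by the c0-paramN
# entries in the request above.
def dwr_encode(value)
  case value
  when true, false then "boolean:#{value}"
  when Numeric     then "number:#{value}"
  else                  "string:#{value}"
  end
end

# ASSUMPTION: a call id is a random number, an underscore, and the current
# time in milliseconds. This is inferred from one sample id only.
def dwr_call_id(now = Time.now, rng = Random.new)
  millis = now.to_i * 1000 + now.usec / 1000
  "#{rng.rand(10_000)}_#{millis}"
end

# Build the POST parameters for a single call (callCount is fixed at 1).
def dwr_params(script, method, session_id, args, call_id = dwr_call_id)
  params = {
    'callCount'     => '1',
    'JSESSIONID'    => session_id,
    'c0-scriptName' => script,
    'c0-methodName' => method,
    'c0-id'         => call_id
  }
  args.each_with_index { |arg, i| params["c0-param#{i}"] = dwr_encode(arg) }
  params['xml'] = 'true'
  params
end

# 'SESSION_ID' is a placeholder for the JSESSIONID cookie value.
params = dwr_params('ConnectionsBrowserService', 'getMyConnections',
                    'SESSION_ID', [-1, -1, 'DONT_CARE', 500, false, true])
params.each { |key, value| puts "#{key}=#{value}" }
```

The resulting hash can be passed as the second argument to `mech.post`, as in the message above. If the generated id is rejected, falling back to the literal id from Aaron's example is the obvious first thing to try.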
Aaron Patterson wrote:
> Hey Bil!

Hi, and thanks again for betabrite,

  http://tenderlovemaking.com/2006/09/28/new-ruby-betabrite-002/

It was a blast!

> Hope that helps!  I'll add this to the mechanize examples.  :-)

Yes, but it looks like they've hidden the actual "Connections"
in an embedded javascript browser:

  <div id="main">
    <noscript>
      <h1>
        <span>My Contacts:</span> Connections
        <div class="hdrlink">
          <p class="dc88x31">
            <script type="text/javascript">
              var dbl_page = 'connections_browser';
              var dbl_tile = '5';
              var dbl_sz = '88x31';
            </script>
            <script type="text/javascript" src="/js/doubleclick.js?v=build-402_2_1431"></script>
          </p>
        </div>
      </h1>
      <p>You currently have JavaScript disabled or are using a browser
      that doesn't support it. Either enable JavaScript and refresh this
      page or proceed to the
      <a href="/connectionsnojs?trk=cnx_nojslink" >basic connection browser</a>.</p>
    </noscript>

All I get on the fetched page (or with "view source") is

  <div id="connection-listing">
    <p class="processing" id="processing">
      ...processing
    </p>
    <ol id="listing-results">
      <li class="result-message"></li>
    </ol>
  </div>

(I swear just last month the connections were available with "view source".)

Stymied for now,
--
Bil Kleb
http://fun3d.larc.nasa.gov
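The noscript block in that page does name a fallback, the "basic connection browser" at /connectionsnojs?trk=cnx_nojslink. The sketch below shows one library-free way to pull such fallback links out of a fetched page and resolve them against the site root. NOSCRIPT_SNIPPET is a trimmed copy of the markup quoted above, noscript_links is a made-up helper name, and the base URL is the one used elsewhere in this thread. Whether the basic browser shows the new connections Bil is after is not something the thread establishes.

```ruby
require 'uri'

# Trimmed copy of the <noscript> block quoted in the message above.
NOSCRIPT_SNIPPET = <<~HTML
  <noscript>
    <p>You currently have JavaScript disabled or are using a browser
    that doesn't support it. Either enable JavaScript and refresh this
    page or proceed to the
    <a href="/connectionsnojs?trk=cnx_nojslink" >basic connection browser</a>.</p>
  </noscript>
HTML

# Collect every link that appears inside a <noscript> block and resolve
# it against +base+, returning absolute URLs as strings.
def noscript_links(html, base)
  blocks = html.scan(%r{<noscript\b[^>]*>(.*?)</noscript>}mi).flatten
  hrefs  = blocks.flat_map do |block|
    block.scan(/<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/i).flatten
  end
  hrefs.map { |href| URI.join(base, href).to_s }
end

puts noscript_links(NOSCRIPT_SNIPPET, 'http://www.linkedin.com/')
```

A regular expression is enough for a snippet this small; for whole pages, the link list that Mechanize already builds for each page is likely the more robust route.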
Aaron Patterson wrote:
>
> It looks like they also have a CSV export of the contacts.  Would that
> get you the information you want?

I don't think so, because I'm going after my connections'
/new/ connections, which are one step removed from that and
indicated by yellow outlines in the connections listing.

Maybe I'll have to resort to firewatir?

Later,
--
Bil Kleb
http://fun3d.larc.nasa.gov