Tomoe Sugihara
2010-Dec-21 23:09 UTC
[SPAM] Re: [Xen-API] [SPAM] XCP 1.0beta and vastSky and how "far" I got with it.
Hi Henrik,

Thanks for trying out vastsky on XCP 1.0 beta.

Testing has not been completed yet, and we have at least two issues (not related to yours) that need to be addressed to get it working. One of them requires a dom0 kernel bug fix, for which I have been waiting on the next update release. So please be patient for a little while ;)

We can discuss your vastsky-specific issue at vastsky-devel@lists.sourceforge.net

Thanks,
Tomoe

On 12/21/2010 05:07 PM, Henrik Andersson wrote:
> I also sent this email to the xen-users list; I hope that's OK.
>
> Hello all. I'm trying to figure out vastSky on XCP 1.0 beta. As we all know, it has been integrated into XCP. That's just about all one can find on the matter. I've been googling a lot, but with no luck. I'll write here what information I've gathered, what I tried, and how far I managed to get with this.
>
> I include information about my hardware, in case it has something to do with all this. I have one four-node SuperMicro twin2 server (2026TT-HiBQRF) with QDR InfiniBand (I haven't bought a switch or managed to get drivers for the dom0s yet, so it's gigabit ethernet for now). Each node is identical, containing: 1x Intel Xeon E5620, 12GB DDR3, 3x 60GB OCZ Vertex2 SSD and 3x 500 GB Seagate Momentus 7200.4 SATA 2.5". No RAID cards, just the onboard ICH10.
>
> Networking configuration:
>
> node A: hostname: super0nodeA ip: 192.168.10.210
> node B: hostname: super0nodeB ip: 192.168.10.211
> node C: hostname: super0nodeC ip: 192.168.10.212
> node D: hostname: super0nodeD ip: 192.168.10.213
>
> I have bonded two interfaces on each node, have only one gigabit switch, and haven't done any multipath configuration.
>
> My plan was to use super0nodeA as the storage manager and super0nodeB, super0nodeC and super0nodeD as storage servers, but I ended up installing the storage and head servers on super0nodeA as well.
>
> http://sourceforge.net/apps/mediawiki/vastsky/index.php?title=Main_Page
> This seems to be a good starting point.
> If you click "Install manual" on the left, you get to: http://vastsky.svn.sourceforge.net/viewvc/vastsky/tags/v3.0/doc/vas_install.txt?revision=367&view=markup
>
> The install manual isn't XCP specific; in fact it only refers to XCP a couple of times, but it seems pretty straightforward when it comes to the configs. Someone at ##xen-api clarified which things I need to install. I mean, there actually is an /etc/vas.conf on stock XCP 1.0 beta, but one still needs to install the required rpms to get the functionality.
>
> So, as (also) stated in the installation document, one needs (taken from vas_install.txt):
> <start copy paste>
> vastsky-common.rpm    Common library and configuration
> vastsky-hsvr.rpm      Head server agent
> vastsky-ssvr.rpm      Storage server agent
> vastsky-sm.rpm        Storage manager
> vastsky-cli.rpm       Storage manager command-line clients
> vastsky-doc.rpm       Documentations (including this file)
>
> Basically,
> - -common package is required by other packages.
> - Head servers need -hsvr package.
> - Storage servers need -ssvr package.
> - The storage manager needs and -sm package.
> - The host on which you want to run user commands needs -cli package.
> <end copy paste>
>
> Everything I did, I did in dom0 on each server; actually, I had no domUs on these servers when I did all this.
>
> So, first I edited "/etc/vas.conf", which exists on all four nodes, and inserted the IP of "super0nodeA". It says "Comma separated list of hosts on which storage manager runs", but I remember reading somewhere that there can only be one instance of it. Maybe one can define multiple IPs of a single host. I didn't find anything else to modify in "/etc/vas.conf".
>
> <part of vas.conf>
> [storage_manager]
>
> # host_list:
> # Comma separated list of hosts on which storage manager runs.
> host_list: 192.168.10.210
> </part of vas.conf>
>
> Then I created "/var/lib/vas/register_device_list" on each node and added disks, following the instructions in vas_install.txt. I configured one SSD and one HDD on each node. Actually, I first did this on nodes B, C and D, but later on I did it on A as well.
>
> I didn't modify "/etc/multipath.conf", since vas_install.txt states "This step is not necessary if you solely use our XCP SR driver". I also didn't modify "/etc/hosts", since I used an IP address instead of a host name in "/etc/vas.conf" and haven't found anywhere else to insert host names or IP addresses.
>
> Then, after multiple reboots and plenty of googling, I went to #xen and #xen-api to ask for help. I was told that I need to install the rpms. It was an "aha" moment and explained nicely why I didn't have the CLI commands available or the "/etc/init.d" scripts for the vastSky servers. So I did "rpm -i vastsky-hsvr.rpm" and "rpm -i vastsky-ssvr.rpm" on all nodes. I also did "rpm -i vastsky-sm.rpm" and "rpm -i vastsky-cli.rpm" on "super0nodeA". vastsky-common.rpm is already installed on "stock" XCP 1.0 beta and it is vastSky 2.1, so all the rpms I installed were from 2.1, not 3.0, which seems to be the newest version available at: http://sourceforge.net/projects/vastsky/files/vastsky/
>
> Then I did "/etc/init.d/vas_sm init" and "/etc/init.d/vas_sm start" on "super0nodeA". Seemed like I was on fire. Finally I had some processes running that I was pretty comfortable thinking had something to do with vastSky. Finally I had commands working, like:
>
> - hsvr_list "list head servers"
> - ssvr_list "list storage servers"
> - pdsk_list "list physical disks"
>
> However, no resources were listed, even after I issued "/etc/init.d/vas_hsvr start" and "/etc/init.d/vas_ssvr start" on nodes "super0nodeB", "super0nodeC" and "super0nodeD".
> I knew that these services had started because "ps -aux | grep vas" told me so, and also because I started getting lines in "/var/log/vas_<host name>.log" (not sure if that name is exactly right, but the log files can be found in "/var/log"; only one there starts with "vas", and it is similar to what I wrote).
>
> This is when I started wondering whether the problem might be network related. So I installed vastsky-hsvr.rpm and vastsky-ssvr.rpm on super0nodeA and started them. I also modified my "/etc/hosts" and added:
>
> 192.168.10.210 super0nodeA super0nodeA-data1 super0nodeA-data2
> 192.168.10.211 super0nodeB super0nodeB-data1 super0nodeB-data2
> 192.168.10.212 super0nodeC super0nodeC-data1 super0nodeC-data2
> 192.168.10.213 super0nodeD super0nodeD-data1 super0nodeD-data2
>
> I did this on all nodes.
>
> This is when I finally got something out of the "storage manager". If I ran hsvr_list, ssvr_list or pdsk_list, each printed one resource, and it was the one on "super0nodeA", where the storage manager was also running. So there were still no connections from the other nodes, even after I rebooted all of them.
>
> After re-re-re-re-checking all the configs, I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and "/etc/init.d/vas_ssvr start" on "super0nodeA". About 5 seconds after I started vas_ssvr, I observed my server shutting down. I tried to start it, just to see it shut itself down again right after the loading screen with the panda on it. There was just some text saying something about stunnel and a bunch of numbers at the top of the screen. Well, I thought it was something I did, so I reinstalled XCP.
>
> While I was reinstalling XCP on node A, I started to think that my problem might be node A, so I installed vastsky-cli.rpm and vastsky-sm.rpm on "super0nodeB" and modified "/etc/vas.conf" on nodes B, C and D (changed "host_list: 192.168.10.210" to 192.168.10.211). Again, I had connections from head and storage servers, but only from the local ones. Still no connections from nodes C or D.
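One way to narrow down the "only local connections" symptom described above is to confirm, on each node, which storage manager address /etc/vas.conf actually points at, and whether that address is reachable over TCP. The following is only a minimal sketch in Python 3 (dom0 ships Python 2.4, so it would need adapting or running elsewhere). The sample text mirrors the quoted vas.conf fragment; the function names are made up for illustration, and the storage manager's listening port is not stated anywhere in this thread, so it is left as a parameter you would have to supply.

```python
import configparser
import socket

# Mirrors the vas.conf fragment quoted in the message (node B as manager).
SAMPLE_VAS_CONF = """\
[storage_manager]
# host_list:
# Comma separated list of hosts on which storage manager runs.
host_list: 192.168.10.211
"""


def storage_manager_hosts(conf_text):
    """Return the hosts listed under [storage_manager] host_list."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("storage_manager", "host_list")
    return [host.strip() for host in raw.split(",") if host.strip()]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    The port is an assumption left to the caller: the thread does not
    say which port the storage manager listens on.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Only parses the sample text; no network traffic is generated here.
    print(storage_manager_hosts(SAMPLE_VAS_CONF))
```

Running `storage_manager_hosts` against the real file contents on nodes C and D, and then `can_connect(host, port)` with the correct port, would at least separate a configuration mismatch from a plain network problem.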
>
> I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and "/etc/init.d/vas_ssvr start" on node B and again, the server started shutting itself down. This time I had another ssh session running "tail -f /var/log/vas_super0nodeB.log", so even though the server shut itself down, I was able to copy and paste the contents of the screen:
>
> <start of log>
> 2010-12-19 15:47:59,435 ssvr_reporter DEBUG /opt/vas/bin/daemon_launcher -n 1 /opt/vas/bin/DiskPatroller /var/run/DiskPatroller.run
> 2010-12-19 15:47:59,443 storage_manager INFO DISPATCH registerStorageServer called. ({'ip_data': ['192.168.10.211', '192.168.10.211'], 'ver': 3},)
> 2010-12-19 15:47:59,444 storage_manager INFO DISPATCH registerStorageServer EXCEPTION <Fault 17: 'EEXIST'>
> 2010-12-19 15:47:59,445 ssvr_reporter ERROR shutdown
> 2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now
> 2010-12-19 15:47:59,501 ssvr_reporter ERROR Traceback (most recent call last):
>   File "ssvr_reporter.py", line 231, in main
>   File "ssvr_reporter.py", line 100, in register_resources
>   File "vas_subr.py", line 68, in send_request
>   File "/usr/lib/python2.4/xmlrpclib.py", line 1096, in __call__
>     return self.__send(self.__name, args)
>   File "/usr/lib/python2.4/xmlrpclib.py", line 1383, in __request
>     verbose=self.__verbose
>   File "/usr/lib/python2.4/xmlrpclib.py", line 1147, in request
>     return self._parse_response(h.getfile(), sock)
>   File "/usr/lib/python2.4/xmlrpclib.py", line 1286, in _parse_response
>     return u.close()
>   File "/usr/lib/python2.4/xmlrpclib.py", line 744, in close
>     raise Fault(**self._stack[0])
> Fault: <Fault 17: 'EEXIST'>
> 2010-12-19 15:48:00,337 storage_manager DEBUG RW.__send_request ('192.168.10.211', '192.168.10.211') 8883 registerShredRequest {'dextid': 4, 'capacity': 465, 'pdskid': 3, 'ver': 3, 'offset': 0}
> 2010-12-19 15:48:00,338 storage_manager DEBUG RW.__send_request ('192.168.10.211', '192.168.10.211') 8883 registerShredRequest {'dextid': 2, 'capacity': 55, 'pdskid': 2, 'ver': 3, 'offset': 0}
> 2010-12-19 15:48:00,340 ssvr_agent INFO DISPATCH registerShredRequest called. ({'dextid': 4, 'ver': 3, 'pdskid': 3, 'capacity': 465, 'offset': 0},)
> 2010-12-19 15:48:00,342 ssvr_agent INFO DISPATCH registerShredRequest called. ({'dextid': 2, 'ver': 3, 'pdskid': 2, 'capacity': 55, 'offset': 0},)
> 2010-12-19 15:48:00,343 ssvr_agent INFO false [Status 256]
> 2010-12-19 15:48:00,343 ssvr_agent INFO retrying(1/16) ...
> 2010-12-19 15:48:00,345 ssvr_agent INFO false [Status 256]
> 2010-12-19 15:48:00,345 ssvr_agent INFO retrying(1/16) ...
> <end of log>
>
> Notice: "2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now"
>
> I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and "/etc/init.d/vas_ssvr start" on node C as well, and exactly the same thing happened. The server shut itself down and can't be started. Same stunnel... error.
>
> This is how far I got before I stopped trying. I hope this helps someone else. I would also welcome input if someone has something to say.
>
> -Henrik Andersson
>
> _______________________________________________
> xen-api mailing list
> xen-api@lists.xensource.com
> http://lists.xensource.com/mailman/listinfo/xen-api
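For anyone trying to reproduce this, the sequence in the quoted log (a registerStorageServer fault followed directly by ssvr_reporter logging a shutdown command) can be picked out of a longer log file mechanically. The sketch below is an illustration in Python 3, not part of vastsky: the function name and regular expressions are made up here, and the sample lines are copied from the log quoted above. Fault code 17 appears to correspond to the usual Linux errno for EEXIST, but that reading is an assumption about how vastsky chooses its fault codes.

```python
import re

# A few lines copied from the log quoted in the message.
SAMPLE_LOG = """\
2010-12-19 15:47:59,444 storage_manager INFO DISPATCH registerStorageServer EXCEPTION <Fault 17: 'EEXIST'>
2010-12-19 15:47:59,445 ssvr_reporter ERROR shutdown
2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now
2010-12-19 15:48:00,343 ssvr_agent INFO retrying(1/16) ...
"""

# timestamp, component, level, message
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\S+) (\S+) (.*)$"
)
# Accepts one or more quote characters, since some archives double them.
FAULT_RE = re.compile(r"<Fault (\d+): '+(\w+)'+>")


def interesting_events(log_text):
    """Return (timestamp, component, kind, detail) for faults and
    logged shutdown commands; all other lines are ignored."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # e.g. traceback continuation lines
        stamp, component, _level, message = match.groups()
        fault = FAULT_RE.search(message)
        if fault:
            events.append((stamp, component, "fault", fault.group(2)))
        elif message.startswith("shutdown -"):
            events.append((stamp, component, "shutdown_command", message))
    return events


if __name__ == "__main__":
    for event in interesting_events(SAMPLE_LOG):
        print(event)
```

On the sample lines this reports the EEXIST fault from storage_manager and then the "shutdown -g0 -h now" command from ssvr_reporter, which matches the order Henrik points out in his "Notice:" line.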