Henrik Andersson
2010-Dec-21 07:50 UTC
[Xen-users] [SPAM] XCP 1.0beta and vastSky. What information I have gathered, what I did and how "far" I got with it.
Hello all. I'm trying to figure out vastSky on XCP 1.0 beta. As we all
know, it has been integrated into XCP. That's just about all one can find
about the matter. I've been googling a lot, but with no luck. I'll write
here what information I've gathered, what I tried, and how far I managed
to get with this.
I include information about my hardware, in case it has something to do with
all this. I have one four-node SuperMicro Twin2 server (2026TT-HiBQRF) with
QDR InfiniBand (I haven't bought a switch or managed to get drivers for the
dom0s yet, so it's gigabit ethernet for now). Each node is identical,
containing: 1x Intel Xeon E5620, 12 GB DDR3, 3x 60 GB OCZ Vertex2 SSD and
3x 500 GB Seagate Momentus 7200.4 SATA 2.5". No RAID cards, just the onboard ICH10.
Networking configuration:
node A: hostname: super0nodeA ip: 192.168.10.210
node B: hostname: super0nodeB ip: 192.168.10.211
node C: hostname: super0nodeC ip: 192.168.10.212
node D: hostname: super0nodeD ip: 192.168.10.213
I have bonded two interfaces on each node, have only one gigabit switch, and
haven't done any multipath configuration.
My plan was to use super0nodeA as the Storage Manager and super0nodeB,
super0nodeC and super0nodeD as storage servers, but I ended up installing the
storage and head server on super0nodeA as well.
http://sourceforge.net/apps/mediawiki/vastsky/index.php?title=Main_Page
This seems to be a good starting point. If one clicks "Install manual" on
the left, you get to:
http://vastsky.svn.sourceforge.net/viewvc/vastsky/tags/v3.0/doc/vas_install.txt?revision=367&view=markup
The install manual isn't XCP specific; it actually only references XCP a
couple of times, but it seems pretty straightforward when it comes to the
configs.
Someone at ##xen-api clarified what I need to install. I mean, there
actually is an /etc/vas.conf on stock XCP 1.0 beta, but one still needs to
install the needed RPMs to get the functionality.
So, as (also) stated in the installation document, one needs (taken from
vas_install.txt):
<start copy paste>
vastsky-common.rpm Common library and configuration
vastsky-hsvr.rpm Head server agent
vastsky-ssvr.rpm Storage server agent
vastsky-sm.rpm Storage manager
vastsky-cli.rpm Storage manager command-line clients
vastsky-doc.rpm Documentations (including this file)
Basically,
- -common package is required by other packages.
- Head servers need -hsvr package.
- Storage servers need -ssvr package.
- The storage manager needs the -sm package.
- The host on which you want to run user commands needs -cli package.
<end copy paste>
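Mapping that package table onto my four nodes, the install boils down to
something like this (just a sketch; the role names are my own labels for my
planned layout, not anything vastSky defines):

```shell
#!/bin/sh
# Sketch: which vastSky packages each node role needs, per vas_install.txt.
# "head", "storage" and "manager" are my own labels, not vastSky terms.
packages_for_role() {
    case "$1" in
        head)    echo "vastsky-common vastsky-hsvr" ;;
        storage) echo "vastsky-common vastsky-ssvr" ;;
        manager) echo "vastsky-common vastsky-sm vastsky-cli" ;;
        *)       echo "" ;;
    esac
}

# e.g. on the storage manager node one would then do something like:
#   for p in $(packages_for_role manager); do rpm -i "$p.rpm"; done
packages_for_role manager
```

(Remember that on stock XCP 1.0 beta, vastsky-common is already there, so
rpm will just complain that it is installed.)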
Everything I did, I did in dom0 of each server; actually I had no domUs on
these servers when I did all this.
So, first I edited "/etc/vas.conf", which exists on all four nodes, and
inserted the IP of "super0nodeA". It says "Comma separated list of hosts on
which storage manager runs", but I remember reading somewhere that there can
only be one instance of it. Maybe one can define multiple IPs of a single
host. I didn't find anything else to modify in "/etc/vas.conf".
<part of vas.conf>
[storage_manager]
# host_list:
# Comma separated list of hosts on which storage manager runs.
host_list: 192.168.10.210
</part of vas.conf>
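A quick way to double-check which storage manager address a node is actually
pointing at is to just sed the value out of the file (shown here against a
scratch copy, so the paths are illustrative; on a node you'd read
/etc/vas.conf itself):

```shell
#!/bin/sh
# Illustration: extract the host_list value from a vas.conf-style file.
# A temp copy is used so this is safe to run anywhere.
cat > /tmp/vas.conf.example <<'EOF'
[storage_manager]
# host_list:
#   Comma separated list of hosts on which storage manager runs.
host_list: 192.168.10.210
EOF

# Commented-out lines start with '#', so only the live setting matches.
sed -n 's/^host_list:[[:space:]]*//p' /tmp/vas.conf.example
# prints: 192.168.10.210
```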
Then I created "/var/lib/vas/register_device_list" on each node and added
disks, following the instructions in vas_install.txt. I configured one SSD
and one HDD on each node. Actually, first I added this on nodes B, C and D,
but later on I added it on A as well.
I didn't modify "/etc/multipath.conf", since vas_install.txt states "This
step is not necessary if you solely use our XCP SR driver". I also didn't
modify "/etc/hosts", since I used IP addresses instead of host names in
"/etc/vas.conf" and hadn't found anywhere else to insert host names or IP
addresses.
Then, after multiple reboots and plenty of googling, I went to #xen and
#xen-api to ask for some help. I was told that I need to install the RPMs.
It was an "aha" moment and explained nicely why I didn't have the CLI
commands available or the "/etc/init.d" scripts for the vastSky servers. So
I did "rpm -i vastsky-hsvr.rpm" and "rpm -i vastsky-ssvr.rpm" on all nodes.
I also did "rpm -i vastsky-sm.rpm" and "rpm -i vastsky-cli.rpm" on
"super0nodeA".
vastsky-common.rpm is already installed on "stock" XCP 1.0 beta and it is
vastSky 2.1, so all the RPMs I installed were from 2.1, not 3.0, which seems
to be the newest version available at:
http://sourceforge.net/projects/vastsky/files/vastsky/
Then I did "/etc/init.d/vas_sm init" and "/etc/init.d/vas_sm start" on
"super0nodeA". Seemed like I was on fire. Finally I had some processes
running that I was pretty comfortable thinking had something to do with
vastSky. Finally I had commands working like:
- hsvr_list "list head servers"
- ssvr_list "list storage servers"
- pdsk_list "list physical disks"
Though no resources were present, even after I issued
"/etc/init.d/vas_hsvr start" and "/etc/init.d/vas_ssvr start" on nodes
"super0nodeB", "super0nodeC" and "super0nodeD".
I knew that these services started, since "ps aux | grep vas" told me so and
also because I started getting lines in "/var/log/vas_<host name>.log" (not
sure if that is the exact name, but the log files can be found in
"/var/log"; there is only one starting with "vas" there and it is similar
to what I wrote).
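For anyone repeating this, those were really my only two sanity checks: the
process list and whatever "vas_*" log shows up under /var/log. Simulated
here with a scratch directory so it can run anywhere (on a real node you
would look in /var/log and use "ps aux | grep [v]as" instead):

```shell
#!/bin/sh
# Sketch of the log check; /tmp/vas-demo-log stands in for /var/log.
LOGDIR=/tmp/vas-demo-log
mkdir -p "$LOGDIR"
: > "$LOGDIR/vas_super0nodeB.log"   # stand-in for the real agent log

# Which vas logs exist?  On a node: ls /var/log | grep '^vas_'
ls "$LOGDIR" | grep '^vas_'
# prints: vas_super0nodeB.log
```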
This is when I started wondering whether the problem might be network
related. So I installed vastsky-hsvr.rpm and vastsky-ssvr.rpm on super0nodeA
and started them. I also modified my "/etc/hosts" and added:
192.168.10.210 super0nodeA super0nodeA-data1 super0nodeA-data2
192.168.10.211 super0nodeB super0nodeB-data1 super0nodeB-data2
192.168.10.212 super0nodeC super0nodeC-data1 super0nodeC-data2
192.168.10.213 super0nodeD super0nodeD-data1 super0nodeD-data2
I did this to all nodes.
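To make sure every node really ended up with all the -data1/-data2 aliases,
a loop like this over the hosts file would catch a missing line (checked
against a scratch copy here; on a node you would point HOSTS at /etc/hosts):

```shell
#!/bin/sh
# Verify each node name and its data aliases appear in a hosts file.
HOSTS=/tmp/hosts.vas-example
cat > "$HOSTS" <<'EOF'
192.168.10.210 super0nodeA super0nodeA-data1 super0nodeA-data2
192.168.10.211 super0nodeB super0nodeB-data1 super0nodeB-data2
192.168.10.212 super0nodeC super0nodeC-data1 super0nodeC-data2
192.168.10.213 super0nodeD super0nodeD-data1 super0nodeD-data2
EOF

for n in A B C D; do
    for name in super0node$n super0node$n-data1 super0node$n-data2; do
        grep -qw "$name" "$HOSTS" || { echo "missing: $name"; exit 1; }
    done
done
echo "all aliases present"
```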
This is when I finally had something coming out of the "storage manager". If
I did hsvr_list, ssvr_list or pdsk_list, they all printed one resource, and
it was the one on "super0nodeA", where the storage manager was also running.
So still no connections from the other nodes, even after I rebooted all
nodes.
After re-re-re-re-checking all the configs, I did
"/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and
"/etc/init.d/vas_ssvr start" on "super0nodeA".
About 5 s after I started vas_ssvr, I watched my server shut itself down. I
tried to start it, just to see it shut itself down again right after the
loading screen with the panda on it, with only a line of text on top of the
screen saying something about stunnel and a bunch of numbers. Well, I
thought it was something I did, so I re-installed XCP.
While I was reinstalling XCP on node A, I started to think that my problem
might be node A itself, so I installed vastsky-cli.rpm and vastsky-sm.rpm on
"super0nodeB" and modified "/etc/vas.conf" on nodes B, C and D (changed
"host_list: 192.168.10.210" to 192.168.10.211). Again, I had connections
from head and storage servers, but only from the local ones. Still no
connections from nodes C or D.
I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and
"/etc/init.d/vas_ssvr start" on node B, and again, the server started
shutting itself down. This time I had another ssh session with
"tail -f /var/log/vas_super0nodeB.log" running, so even though the server
shut itself down, I was able to copy-paste the contents of the screen:
<start of log>
2010-12-19 15:47:59,435 ssvr_reporter DEBUG /opt/vas/bin/daemon_launcher -n 1 /opt/vas/bin/DiskPatroller /var/run/DiskPatroller.run
2010-12-19 15:47:59,443 storage_manager INFO DISPATCH registerStorageServer called. ({'ip_data': ['192.168.10.211', '192.168.10.211'], 'ver': 3},)
2010-12-19 15:47:59,444 storage_manager INFO DISPATCH registerStorageServer EXCEPTION <Fault 17: 'EEXIST'>
2010-12-19 15:47:59,445 ssvr_reporter ERROR shutdown
2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now
2010-12-19 15:47:59,501 ssvr_reporter ERROR Traceback (most recent call last):
  File "ssvr_reporter.py", line 231, in main
  File "ssvr_reporter.py", line 100, in register_resources
  File "vas_subr.py", line 68, in send_request
  File "/usr/lib/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.4/xmlrpclib.py", line 1147, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib/python2.4/xmlrpclib.py", line 1286, in _parse_response
    return u.close()
  File "/usr/lib/python2.4/xmlrpclib.py", line 744, in close
    raise Fault(**self._stack[0])
Fault: <Fault 17: 'EEXIST'>
2010-12-19 15:48:00,337 storage_manager DEBUG RW.__send_request ('192.168.10.211', '192.168.10.211') 8883 registerShredRequest {'dextid': 4, 'capacity': 465, 'pdskid': 3, 'ver': 3, 'offset': 0}
2010-12-19 15:48:00,338 storage_manager DEBUG RW.__send_request ('192.168.10.211', '192.168.10.211') 8883 registerShredRequest {'dextid': 2, 'capacity': 55, 'pdskid': 2, 'ver': 3, 'offset': 0}
2010-12-19 15:48:00,340 ssvr_agent INFO DISPATCH registerShredRequest called. ({'dextid': 4, 'ver': 3, 'pdskid': 3, 'capacity': 465, 'offset': 0},)
2010-12-19 15:48:00,342 ssvr_agent INFO DISPATCH registerShredRequest called. ({'dextid': 2, 'ver': 3, 'pdskid': 2, 'capacity': 55, 'offset': 0},)
2010-12-19 15:48:00,343 ssvr_agent INFO false [Status 256]
2010-12-19 15:48:00,343 ssvr_agent INFO retrying(1/16) ...
2010-12-19 15:48:00,345 ssvr_agent INFO false [Status 256]
2010-12-19 15:48:00,345 ssvr_agent INFO retrying(1/16) ...
<end of log>
Notice: "2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now"
I did "/etc/init.d/vas_hsvr stop", "/etc/init.d/vas_ssvr stop" and
"/etc/init.d/vas_ssvr start" on node C as well, and exactly the same thing
happened: the server shut itself down and can't be started. Same stunnel...
error.
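My reading of that log (an assumption on my part, not confirmed by any
vastSky docs I could find): the storage manager already has a record of this
storage server, so registerStorageServer fails with Fault 17 'EEXIST', and
ssvr_reporter reacts to the failed registration by running
"shutdown -g0 -h now" on the host, which would explain the node powering
off. A quick filter to spot the same pattern in a log (demonstrated on a
snippet of the log above; on a node, point LOG at /var/log/vas_<hostname>.log):

```shell
#!/bin/sh
# Pull the registration fault and the shutdown command out of a vas log.
LOG=/tmp/vas-eexist-demo.log
cat > "$LOG" <<'EOF'
2010-12-19 15:47:59,444 storage_manager INFO DISPATCH registerStorageServer EXCEPTION <Fault 17: 'EEXIST'>
2010-12-19 15:47:59,445 ssvr_reporter ERROR shutdown
2010-12-19 15:47:59,500 ssvr_reporter DEBUG shutdown -g0 -h now
EOF

grep -E 'EEXIST|shutdown' "$LOG"
```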
This is how far I got before I stopped trying. Hope this helps someone else.
I would also welcome input if someone has something to say.
-Henrik Andersson
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
