Thanks much Daire,
Your insights are very much appreciated.
> Interesting configuration. Is there a particular reason why you decided
> on using Xen VMs? Is failover better with Xen instances? I'm guessing
> you don't have hundreds of clients hammering the hardware.
Cost was the deciding factor for XEN. True, failover and STONITH are FAST
with XEN (it's mostly all up in memory and killing a domU is easy), and
boot time of XEN domUs is measured in seconds (35 - 50) rather than
minutes. But it all came down to cost. We only get about 5k folks a day
knocking at the Customer Portal door, so not a lot of traffic (we're not
Google by any stretch of the imagination).
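(For the curious, the fence-and-restart cycle from the dom0 boils down to
something like the two commands below; the domU name and config path are
just placeholders for whatever your fencing / cluster tooling actually
calls them.)

  xm destroy portal-app-01              # hard-kill the hung domU (the STONITH step)
  xm create /etc/xen/portal-app-01.cfg  # boot a fresh copy; roughly 35-50 seconds to a usable node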
A small group within our ITS department got together to architect this
solution in 3 days (Security, Portal Apps Devel, App Admins, System
Admins, Network Admins, Storage Admins). The choice to use XEN VMs really
was based on cost. We were pleasantly surprised to receive X number of
dollars from our interim VP to create a high availability solution for our
client facing customer portal, and those X dollars weren't a lot and were
only available for a limited time, so we had to hustle (use it or lose it
for the year). Out of that bucket we needed to purchase network and server
hardware, OS support, App and Portal support, backup server client
licenses, LDAP support and high availability disk solution support (Lustre
training and ongoing support and initial configuration). So the funds got
gobbled up fast, and anywhere we could save a buck was reviewed and held
weight. Purchasing RHEL 5 Advanced Platform Premium gave us 24x7 support
and an unlimited number of XEN servers (right value, right price).
We bought low cost IBM Intel xSeries servers for DEV, CERT, and PROD. The
DEV server is running 5 XEN domUs. The 3 CERT servers all together are
running 13 XEN domUs. The 5 PROD servers are all together running 22 XEN
domUs. Nine physical servers total, 40 virtual, hardware that's maxed out
and configured with multiple dual HBAs and quad GigE NICs to the tune of
about $200k I think. We've got WAS and WPS servers, HTTP servers, LDAP
servers, DB2 servers (thankfully free with the Portal server), Lustre MDS
/ OSS all running side by side with each other.
Our current customer facing portal (we haven't cut over to the new
hardware yet) consists of 16 servers, and dropping to 9 reduces our "carbon
footprint" but definitely increases the complexity of our environment.
Our data center, like many, is power constrained (we're fully using our
UPS capacity) and we have a large internal push to consolidate and
virtualize to realize full server utilization potential (do more with
less) as well as reduce energy costs. We currently use RHEL's GFS / CS,
but we're on 3U8 (which uses disk pools), and from RHEL 4 up, GFS uses LVM
instead of disk pools. This requires a complete rebuild and hardware
refresh any way you look at it, so we opened the playing field to all HA
disk solutions. We wanted to decouple the HA disk from the Application
Server layer (to allow the app layer to remain up when GFS panics...and
yes, GFS has panicked the entire 6 node cluster before and brought down
the Application layer; bye-bye portal access; RHEL 3 was very buggy).
Lustre allows us to do that and has a great support base.
> I'm curious as to why you created 5 filesystems on the same "hardware"
> instead of one big filesystem?
Legacy filesystems, before my time and reaching far into the past. Those
filesystems have migrated from an IBM Regatta class p690 server, to the
current RHEL GFS 16 server environment, to this new environment. We're
working with the data content owners to establish new filesystem
guidelines which will include archiving old / unused data and better
recognizing data ownership. But that's Phase 2 of this project and
another type of migration. Phase 1 is the migration and implementation of
a solid HA hardware / software environment (or as SPOF free as we can make
it). Phase 2 will change the entire file system structure and provide us
with tools to enforce disk usage and accountability (along with
establishing better control over disk growth and who to charge back for
SAN expansion). To change those filesystems now would mean our entire
development and publishing structure would break down (automated publishing
scripts would all break, several integral connected servers that check
data existence would go nuts, VPs would have words with VPs, not a pretty
scene). Basically, politics and corporate culture are the current
reasons. We just need time to plan and carefully coordinate with all
parties to develop a new filesystem structure and get folks to start
posting new data and migrating existing data to new filesystems.
> I'm not sure it is possible to migrate an existing filesystem to LVM
> easily - you would need to do a file backup of your MDT first and restore
> to the LVM device (section 15.1.3.1 of the manual). So in your case to
> wipe (!) a single MDT and create a new one I'd do something like:
>
> pvcreate /dev/xvdj
> vgcreate lusfs01 /dev/xvdj
> vgchange -a y lusfs01
> lvcreate -L3G -nmdt lusfs01
I have another limited time opportunity open to me. Into the middle of
this large Customer Portal project another project got dropped: a new SAN,
with all the fun that comes with a new SAN (migration, migration,
migration). As I'm being asked to migrate all my virtual domU OSes (which
are located on the old SAN) along with their corresponding data disks
(also on the old SAN), I figured I'd take advantage of that migration and
instead rebuild Lustre with LVM to get the benefits of journaling and
snapshotting, as you'd mentioned. Thank you for making clear that the
MDTs are where you'd recommend using LVM; I wasn't sure if it was just
the MDS servers or both MDS + OSS.
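(To make sure I have the sequence straight in my head, building on your
example, the rebuild would look roughly like the below. The device names
and mount points are placeholders, the --mgs flag assumes a co-located
MGS, and I'll follow the exact backup / restore steps in section 15.1.3.1
of the manual rather than trust my own paraphrase.)

  # new SAN LUN presented to the MDS domU, carved into an LVM-backed MDT
  pvcreate /dev/xvdj
  vgcreate lusfs01 /dev/xvdj
  vgchange -a y lusfs01
  lvcreate -L3G -nmdt lusfs01
  mkfs.lustre --fsname=lusfs01 --mgs --mdt /dev/lusfs01/mdt

  # with Lustre stopped, file-level copy of the old MDT onto the new one,
  # extended attributes included (old device name is a placeholder)
  mount -t ldiskfs /dev/xvdi /mnt/mdt_old
  mount -t ldiskfs /dev/lusfs01/mdt /mnt/mdt_new
  cd /mnt/mdt_old
  getfattr -R -d -m '.*' -P . > /tmp/mdt_ea.bak
  tar cf - --sparse . | (cd /mnt/mdt_new && tar xpf -)
  cd /mnt/mdt_new && setfattr --restore=/tmp/mdt_ea.bak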
And thank you for taking the time to answer. Your reply is absolutely
brilliant and what I'd hoped for (it's exactly what I need to present my
case to the business). We're not live, so let's recreate and bring on
board these additional LVM features! Give me some new SAN data disks for
building up LVM, I'll build it alongside the old SAN data disks, transfer
the MDT data and then drop the old SAN. This is faster and more efficient
than using our SAN vendor's migration solution for the data disks (I still
have to use it for the OS disk, but rebuilding with LVM is still a time
savings and a known procedure).
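(And once the MDT is sitting on LVM, the snapshot-based backup you
mentioned should be roughly the following; sizes, names and paths are
placeholders, and I'll double-check the backup chapter of the manual
before relying on it.)

  lvcreate -s -L1G -n mdt-snap /dev/lusfs01/mdt   # point-in-time snapshot of the MDT volume
  mount -t ldiskfs /dev/lusfs01/mdt-snap /mnt/mdt-snap
  tar czf /backup/mdt-$(date +%Y%m%d).tgz --sparse -C /mnt/mdt-snap .
  # (plus the same getfattr step as above so the EAs are captured)
  umount /mnt/mdt-snap
  lvremove -f /dev/lusfs01/mdt-snap               # don't let the copy-on-write snapshot linger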
Cheers and many thanks again, Daire,
Ms. Andrea D. Rucks
Sr. Unix Systems Administrator,
Lawson ITS Unix Server Team