Jim McCusker
2009-Aug-06 19:08 UTC
[Lustre-discuss] Large scale delete results in lag on clients
We have a 15 TB Lustre volume across 4 OSTs, and we recently deleted over 4
million files from it in order to free up the 80 GB MDT/MDS (going from 100%
capacity on it to 81%). As a result, after the rm completed, there is
significant lag on most file system operations (but fast access once it
occurs), even after the two servers that host the targets were rebooted. It
seems to clear up for a little while after a reboot, but comes back after
some time.

Any ideas?

For the curious, we host a large image archive (almost 400k images) and do
research on processing them. We had a lot of intermediate files that we
needed to clean up:

http://krauthammerlab.med.yale.edu/imagefinder (currently laggy and
unresponsive due to this problem)

Thanks,
Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker at yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu
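(For reference: how full the MDT and each OST are, in both bytes and inodes,
can be checked with lfs df from any client. This is a generic sketch;
/mnt/lustre stands in for the actual mount point.)

    # space usage per MDT/OST
    lfs df -h /mnt/lustre
    # inode usage per MDT/OST; the MDT line shows how many more files the
    # metadata target can hold
    lfs df -i /mnt/lustre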
Andreas Dilger
2009-Aug-06 20:27 UTC
[Lustre-discuss] Large scale delete results in lag on clients
On Aug 06, 2009 15:08 -0400, Jim McCusker wrote:
> We have a 15 TB Lustre volume across 4 OSTs, and we recently deleted over 4
> million files from it in order to free up the 80 GB MDT/MDS (going from 100%
> capacity on it to 81%). As a result, after the rm completed, there is
> significant lag on most file system operations (but fast access once it
> occurs), even after the two servers that host the targets were rebooted. It
> seems to clear up for a little while after a reboot, but comes back after
> some time.
>
> Any ideas?

The Lustre unlink processing is somewhat asynchronous, so you may still be
catching up with unlinks. You can check this by looking at the OSS service
RPC stats file to see if there are still object destroys being processed
by the OSTs. You could also just check the system load/io on the OSTs to
see how busy they are in a "no load" situation.

> For the curious, we host a large image archive (almost 400k images) and do
> research on processing them. We had a lot of intermediate files that we
> needed to clean up:
>
> http://krauthammerlab.med.yale.edu/imagefinder (currently laggy and
> unresponsive due to this problem)
>
> Thanks,
> Jim

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
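For example (parameter names and /proc paths vary by Lustre version; this
assumes a 1.6.x-era layout):

    # on each OSS, see whether ost_destroy RPCs are still being processed
    grep -i destroy /proc/fs/lustre/ost/OSS/*/stats
    # newer releases expose the same counters through lctl
    lctl get_param ost.OSS.*.stats 2>/dev/null | grep -i destroy
    # and watch raw disk activity on the OST block devices
    iostat -x 5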
Arden Wiebe
2009-Aug-07 10:45 UTC
[Lustre-discuss] Large scale delete results in lag on clients
--- On Thu, 8/6/09, Andreas Dilger <adilger at sun.com> wrote:
> From: Andreas Dilger <adilger at sun.com>
> Subject: Re: [Lustre-discuss] Large scale delete results in lag on clients
> To: "Jim McCusker" <james.mccusker at yale.edu>
> Cc: "lustre-discuss" <lustre-discuss at lists.lustre.org>
> Date: Thursday, August 6, 2009, 1:27 PM
>
> The Lustre unlink processing is somewhat asynchronous, so you may still be
> catching up with unlinks. You can check this by looking at the OSS service
> RPC stats file to see if there are still object destroys being processed
> by the OSTs. You could also just check the system load/io on the OSTs to
> see how busy they are in a "no load" situation.

Jim, from the web side perspective it seems responsive. Are you actually
serving the images from the Lustre cluster? I have run a few searches looking
for "Purified HIV Electron Microscope" and your project returns 15 pages of
results with great links to full abstracts almost instantly, but obviously
none with real purified HIV electron microscope images similar to a real
pathogenic virus like
http://krauthammerlab.med.yale.edu/imagefinder/Figure.external?sp=62982&state:Figure=BrO0ABXcRAAAAAQAACmRvY3VtZW50SWRzcgARamF2YS5sYW5nLkludGVnZXIS4qCk94GHOAIAAUkABXZhbHVleHIAEGphdmEubGFuZy5OdW1iZXKGrJUdC5TgiwIAAHhwAAD2Cg%3D%3D

Again though, not surprisingly, some of the same proteins in this virus are
present in molecular clones of HIV. I'll have to agree more now with
http://www.karymullis.com that using PCR to detect viral infection is a bad
idea lacking proper viral isolation of HIV, which is still overlooked after
25 years. No doubt http://ThePerthGroup.com are probably correct in their
views, but enough curiosity.

Have you physically separated your MDS/MDT from the MGS portion on different
servers? I somehow doubt you overlooked this, but if you didn't for some
reason, this could be a cause of unresponsiveness on the client side. Again,
if you're serving up the images from the cluster, I find it works great.
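(A quick way to see which Lustre services a given server is actually running,
e.g. whether the MGS lives alongside the MDT, is the device list. A generic
sketch; the exact device names and types in the output depend on the release.)

    # on the metadata server
    lctl dl
    # an entry of type "mgs" next to the mdt/mds entries means the MGS is
    # co-located with the metadata target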
Jim McCusker
2009-Aug-07 12:25 UTC
[Lustre-discuss] Large scale delete results in lag on clients
On Fri, Aug 7, 2009 at 6:45 AM, Arden Wiebe <albert682 at yahoo.com> wrote:
> Jim, from the web side perspective it seems responsive. Are you actually
> serving the images from the Lustre cluster? I have run a few searches
> looking for "Purified HIV Electron Microscope" and your project returns 15
> pages of results with great links to full abstracts almost instantly, but
> obviously none with real purified HIV electron microscope images similar to
> a real pathogenic virus like
> http://krauthammerlab.med.yale.edu/imagefinder/Figure.external?sp=62982&state:Figure=BrO0ABXcRAAAAAQAACmRvY3VtZW50SWRzcgARamF2YS5sYW5nLkludGVnZXIS4qCk94GHOAIAAUkABXZhbHVleHIAEGphdmEubGFuZy5OdW1iZXKGrJUdC5TgiwIAAHhwAAD2Cg%3D%3D

The images and the Lucene index are both served from the Lustre cluster (as
is just about everything else on our network). I think Andreas is right; it
seems to have cleared itself up, so you're seeing typical performance. If you
don't find what you're looking for, you can expand your search to the full
text, abstract, or title using the checkboxes below the search box. Of
course, the lack of images in search has more to do with the availability of
open access papers on the topic than the performance of Lustre. :-)

> Have you physically separated your MDS/MDT from the MGS portion on
> different servers? I somehow doubt you overlooked this, but if you didn't
> for some reason, this could be a cause of unresponsiveness on the client
> side. Again, if you're serving up the images from the cluster, I find it
> works great.

This server started life as a 1.4.x server, so the MGS is still on the same
partition as the MDS/MDT. We have one server with the MGS, MDS/MDT, and two
OSTs, and another server with two more OSTs. The first server also provides
NFS and SMB services for the volume in question. I know that we're not
supposed to mount the volume on a server that provides it, but a limited
budget means limited servers, and performance has been excellent except for
this one problem.

Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker at yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu
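(For anyone formatting fresh hardware, a standalone MGS is just one extra
small target at mkfs time. A hedged sketch only; the device names, fsname,
and NID below are placeholders, not the commands used on this system.)

    # dedicated MGS on its own small device
    mkfs.lustre --mgs /dev/sda1
    # MDT for filesystem "archive", registering with that MGS
    mkfs.lustre --fsname=archive --mdt --mgsnode=10.0.0.1@tcp0 /dev/sdb1
    # each OST likewise registers with the MGS
    mkfs.lustre --fsname=archive --ost --mgsnode=10.0.0.1@tcp0 /dev/sdc1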
Arden Wiebe
2009-Aug-07 20:11 UTC
[Lustre-discuss] Large scale delete results in lag on clients
--- On Fri, 8/7/09, Jim McCusker <james.mccusker at yale.edu> wrote:
> From: Jim McCusker <james.mccusker at yale.edu>
> Subject: Re: [Lustre-discuss] Large scale delete results in lag on clients
> To: "lustre-discuss" <lustre-discuss at lists.lustre.org>
> Date: Friday, August 7, 2009, 5:25 AM
>
> The images and the Lucene index are both served from the Lustre cluster (as
> is just about everything else on our network). I think Andreas is right; it
> seems to have cleared itself up, so you're seeing typical performance. If
> you don't find what you're looking for, you can expand your search to the
> full text, abstract, or title using the checkboxes below the search box. Of
> course, the lack of images in search has more to do with the availability
> of open access papers on the topic than the performance of Lustre. :-)

Yeah, I was all over the full text check box as soon as I ran one query.
Great project, by the way, as there really is no way for any researcher or
doctor to read the volumes of scientific journals the pharmaceutical industry
pays for every month. Sad how mass consensus has replaced the actual
scientific method, all for capitalism.

> This server started life as a 1.4.x server, so the MGS is still on the same
> partition as the MDS/MDT. We have one server with the MGS, MDS/MDT, and two
> OSTs, and another server with two more OSTs. The first server also provides
> NFS and SMB services for the volume in question. I know that we're not
> supposed to mount the volume on a server that provides it, but a limited
> budget means limited servers, and performance has been excellent except for
> this one problem.

I roll the same way at http://oil-gas.ca/phpsysinfo and
http://linuxguru.ca/phpsysinfo, with the OSTs actually providing tcp routing
and DNS service for the network that leads surfers to the internal
Lustre-powered webservers. At this time I'm only serving one file via a
symlink from the physically separated (by block device) Lustre cluster, at
http://workwanted.ca/images/3689011.avi (let me know how fast it downloads
back to you), but I am tempted to symlink the entire /var/www/html directory
for a few domains over to the Lustre filesystem. I also run other services
like smb, and of course apache and mysql.

The fact remains that Lustre can be built with off-the-shelf hardware and is
very robust, dependable, and obviously upgradable if you're coming from 1.4.x
servers. I believe the "Lustre Product" could be used by more people, but
given the stigma of a High Performance Compute filesystem it will take a
little magic marketing it to more of the masses. Like you, budget is a
concern at times, although I can see a solid state OST in the near future,
even if that one box deployed costs five thousand.
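(For the symlink idea, the client side is just a normal Lustre mount plus a
symlink. A rough sketch; the NID, fsname, and paths are placeholders.)

    # on the webserver (Lustre client)
    mount -t lustre 10.0.0.1@tcp0:/webfs /mnt/webfs
    # move one domain's document root onto the shared filesystem and link it back
    mkdir -p /mnt/webfs/www
    mv /var/www/html/example.org /mnt/webfs/www/example.org
    ln -s /mnt/webfs/www/example.org /var/www/html/example.org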
Jim McCusker
2009-Aug-11 01:02 UTC
[Lustre-discuss] Large scale delete results in lag on clients
Just a quick update: the lag is back. It seems to hold off for about 24
hours, then things get so slow that our web applications time out. Is there a
delay before freeing inodes commences after a reboot, or is this officially
now Something Else? Should I just tough it out for now and see if it clears
up?

Thanks,
Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker at yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu
Oleg Drokin
2009-Aug-11 02:56 UTC
[Lustre-discuss] Large scale delete results in lag on clients
Hello!

On Aug 10, 2009, at 9:02 PM, Jim McCusker wrote:
> Just a quick update: the lag is back. It seems to hold off for about 24
> hours, then things get so slow that our web applications time out. Is there
> a delay before freeing inodes commences after a reboot, or is this
> officially now Something Else? Should I just tough it out for now and see
> if it clears up?

What Lustre version is it now?

We used to have uncontrolled unlinking, where the OSTs might get swamped with
unlink requests. Now we limit it to 8 unlinks in flight to an OST at any one
time. This slows down the deletion process, but at least there are no
aftershocks following it. (Bug 13843, included in the 1.6.5 release.)

Bye,
Oleg
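(To check the running version on a server or client; the /proc file is the
older interface, and lctl get_param exists on newer releases.)

    cat /proc/fs/lustre/version
    lctl get_param version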
Jim McCusker
2009-Aug-11 03:03 UTC
[Lustre-discuss] Large scale delete results in lag on clients
On Monday, August 10, 2009, Oleg Drokin <Oleg.Drokin at sun.com> wrote:
> What Lustre version is it now?
>
> We used to have uncontrolled unlinking, where the OSTs might get swamped
> with unlink requests. Now we limit it to 8 unlinks in flight to an OST at
> any one time. This slows down the deletion process, but at least there are
> no aftershocks following it. (Bug 13843, included in the 1.6.5 release.)

We're at 1.6.4.x. Is it too late to upgrade?

Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker at yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu
Oleg Drokin
2009-Aug-11 03:11 UTC
[Lustre-discuss] Large scale delete results in lag on clients
Hello!

On Aug 10, 2009, at 11:03 PM, Jim McCusker wrote:
> We're at 1.6.4.x. Is it too late to upgrade?

Well, it's never too late to upgrade ;)

After you upgrade, two things will happen:

1. All big delete jobs might start to take more time.
2. The object unlinking that you have now will stop (as soon as you kill the
   client where you did the unlink), and on the next MDS reconnection to the
   OSTs the orphan destroy process will kill the objects. So if you just want
   to stop the slowness, you need to kill the client where you did the
   removal, but you will have space leakage until the next MDS restart.

Bye,
Oleg
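(Roughly what "kill the client" and an MDS restart translate to on the setup
described earlier, where the client that ran the rm is one of the servers
itself. A hedged sketch with placeholder devices and mount points; schedule
the MDS restart for a quiet time, since it briefly interrupts all clients.)

    # on the client where the rm was run: drop the client mount
    umount /mnt/lustre
    # on the MDS: restart the metadata target so orphan cleanup reclaims space
    umount /mnt/mdt
    mount -t lustre /dev/md0 /mnt/mdt
    # then remount the client
    mount -t lustre 10.0.0.1@tcp0:/archive /mnt/lustre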
Arden Wiebe
2009-Aug-12 07:36 UTC
[Lustre-discuss] Large scale delete results in lag on clients
Jim: Mag just started a good thread about a live backup. Depending on your
budget, if the spare boxes and enough disks are available, just make another
Lustre filesystem and copy the existing data over with smb. Here is a
screenshot of my commodity-hardware rig-up of a 5.4 TB RAID 10 Lustre
filesystem that uses 28 1 TB hard drives and could easily be built for under
$10,000.00 if you shopped a little more conservatively or had existing
hardware you could utilize in the build-out:

http://www.ioio.ca/Lustre-tcp-bonding/images.html and
http://www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html

Because I used RAID 10 for the underlying redundancy, my available storage
space was reduced substantially. I'm sure you could squeeze 15 TB out of
close to that number of disks if you used the right RAID level. Here is the
hardware recipe I used, at http://oil-gas.ca/phpsysinfo, if it helps you to
contemplate the upgrade route or the backup-then-upgrade route.

Otherwise, if you knew someone with a spare 15 TB of storage and bandwidth,
you could quickly (or not so quickly) upload your data and then download it
again. Just ideas, but the thought of doing a 15 TB end-to-end data transfer
using Lustre is interesting.

Arden

--- On Mon, 8/10/09, Jim McCusker <james.mccusker at yale.edu> wrote:
> We're at 1.6.4.x. Is it too late to upgrade?
>
> Jim
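(If a second filesystem does become available, copying directly between two
Lustre client mounts avoids the SMB hop, and rsync makes a 15 TB copy
restartable. A rough sketch with placeholder NIDs and mount points.)

    # both filesystems mounted on the same client
    mount -t lustre 10.0.0.1@tcp0:/archive  /mnt/old
    mount -t lustre 10.0.0.2@tcp0:/archive2 /mnt/new
    # restartable copy preserving ownership, timestamps, and hard links
    rsync -aHv /mnt/old/ /mnt/new/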