Jim McCusker
2009-Aug-06 19:08 UTC
[Lustre-discuss] Large scale delete results in lag on clients
We have a 15 TB Lustre volume across 4 OSTs, and we recently deleted over 4
million files from it in order to free up the 80 GB MDT/MDS (going from 100%
capacity on it to 81%). As a result, after the rm completed, there is
significant lag on most file system operations (but fast access once it
occurs), even after the two servers that host the targets were rebooted. It
seems to clear up for a little while after a reboot, but comes back after
some time.

Any ideas?

For the curious, we host a large image archive (almost 400k images) and do
research on processing them. We had a lot of intermediate files that we
needed to clean up:

http://krauthammerlab.med.yale.edu/imagefinder (currently laggy and
unresponsive due to this problem)

Thanks,
Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker at yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu
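(For reference: how full the MDT and each OST are, in both bytes and inodes,
can be checked with lfs df from any client. This is a generic sketch;
/mnt/lustre stands in for the actual mount point.)

    # space usage per MDT/OST
    lfs df -h /mnt/lustre
    # inode usage per MDT/OST; the MDT line shows how many more files the
    # metadata target can hold
    lfs df -i /mnt/lustre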
Andreas Dilger
2009-Aug-06 20:27 UTC
[Lustre-discuss] Large scale delete results in lag on clients
On Aug 06, 2009 15:08 -0400, Jim McCusker wrote:
> We have a 15 TB Lustre volume across 4 OSTs, and we recently deleted over 4
> million files from it in order to free up the 80 GB MDT/MDS (going from 100%
> capacity on it to 81%). As a result, after the rm completed, there is
> significant lag on most file system operations (but fast access once it
> occurs), even after the two servers that host the targets were rebooted. It
> seems to clear up for a little while after a reboot, but comes back after
> some time.
>
> Any ideas?

The Lustre unlink processing is somewhat asynchronous, so you may still be
catching up with unlinks. You can check this by looking at the OSS service
RPC stats file to see if there are still object destroys being processed
by the OSTs. You could also just check the system load/io on the OSTs to
see how busy they are in a "no load" situation.

> For the curious, we host a large image archive (almost 400k images) and do
> research on processing them. We had a lot of intermediate files that we
> needed to clean up:
>
> http://krauthammerlab.med.yale.edu/imagefinder (currently laggy and
> unresponsive due to this problem)
>
> Thanks,
> Jim

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
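For example (parameter names and /proc paths vary by Lustre version; this
assumes a 1.6.x-era layout):

    # on each OSS, see whether ost_destroy RPCs are still being processed
    grep -i destroy /proc/fs/lustre/ost/OSS/*/stats
    # newer releases expose the same counters through lctl
    lctl get_param ost.OSS.*.stats 2>/dev/null | grep -i destroy
    # and watch raw disk activity on the OST block devices
    iostat -x 5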
Arden Wiebe
2009-Aug-07 10:45 UTC
[Lustre-discuss] Large scale delete results in lag on clients
--- On Thu, 8/6/09, Andreas Dilger <adilger at sun.com> wrote:
> From: Andreas Dilger <adilger at sun.com>
> Subject: Re: [Lustre-discuss] Large scale delete results in lag on clients
> To: "Jim McCusker" <james.mccusker at yale.edu>
> Cc: "lustre-discuss" <lustre-discuss at lists.lustre.org>
> Date: Thursday, August 6, 2009, 1:27 PM
>
> The Lustre unlink processing is somewhat asynchronous, so you may still be
> catching up with unlinks. You can check this by looking at the OSS service
> RPC stats file to see if there are still object destroys being processed
> by the OSTs. You could also just check the system load/io on the OSTs to
> see how busy they are in a "no load" situation.

Jim, from the web side perspective it seems responsive. Are you actually
serving the images from the Lustre cluster? I have run a few searches looking
for "Purified HIV Electron Microscope" and your project returns 15 pages of
results with great links to full abstracts almost instantly, but obviously
none with real purified HIV electron microscope images similar to a real
pathogenic virus like
http://krauthammerlab.med.yale.edu/imagefinder/Figure.external?sp=62982&state:Figure=BrO0ABXcRAAAAAQAACmRvY3VtZW50SWRzcgARamF2YS5sYW5nLkludGVnZXIS4qCk94GHOAIAAUkABXZhbHVleHIAEGphdmEubGFuZy5OdW1iZXKGrJUdC5TgiwIAAHhwAAD2Cg%3D%3D

Again though, not surprisingly, some of the same proteins in this virus are
present in molecular clones of HIV. I'll have to agree more now with
http://www.karymullis.com that using PCR to detect viral infection is a bad
idea lacking proper viral isolation of HIV, which is still overlooked after
25 years. No doubt http://ThePerthGroup.com are probably correct in their
views, but enough curiosity.

Have you physically separated your MDS/MDT from the MGS portion on different
servers? I somehow doubt you overlooked this, but if you didn't for some
reason, this could be a cause of unresponsiveness on the client side. Again,
if you're serving up the images from the cluster, I find it works great.
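(A quick way to see which Lustre services a given server is actually running,
e.g. whether the MGS lives alongside the MDT, is the device list. A generic
sketch; the exact device names and types in the output depend on the release.)

    # on the metadata server
    lctl dl
    # an entry of type "mgs" next to the mdt/mds entries means the MGS is
    # co-located with the metadata target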
Jim McCusker
2009-Aug-07 12:25 UTC
[Lustre-discuss] Large scale delete results in lag on clients
On Fri, Aug 7, 2009 at 6:45 AM, Arden Wiebe <albert682 at yahoo.com> wrote:
> Jim, from the web side perspective it seems responsive. Are you actually
> serving the images from the Lustre cluster? I have run a few searches
> looking for "Purified HIV Electron Microscope" and your project returns 15
> pages of results with great links to full abstracts almost instantly, but
> obviously none with real purified HIV electron microscope images similar to
> a real pathogenic virus like
> http://krauthammerlab.med.yale.edu/imagefinder/Figure.external?sp=62982&state:Figure=BrO0ABXcRAAAAAQAACmRvY3VtZW50SWRzcgARamF2YS5sYW5nLkludGVnZXIS4qCk94GHOAIAAUkABXZhbHVleHIAEGphdmEubGFuZy5OdW1iZXKGrJUdC5TgiwIAAHhwAAD2Cg%3D%3D

The images and the Lucene index are both served from the Lustre cluster (as
is just about everything else on our network). I think Andreas is right; it
seems to have cleared itself up, so you're seeing typical performance. If you
don't find what you're looking for, you can expand your search to the full
text, abstract, or title using the checkboxes below the search box. Of
course, the lack of images in search has more to do with the availability of
open access papers on the topic than the performance of Lustre. :-)

> Have you physically separated your MDS/MDT from the MGS portion on
> different servers? I somehow doubt you overlooked this, but if you didn't
> for some reason, this could be a cause of unresponsiveness on the client
> side. Again, if you're serving up the images from the cluster, I find it
> works great.

This server started life as a 1.4.x server, so the MGS is still on the same
partition as the MDS/MDT. We have one server with the MGS, MDS/MDT, and two
OSTs, and another server with two more OSTs. The first server also provides
NFS and SMB services for the volume in question. I know that we're not
supposed to mount the volume on a server that provides it, but a limited
budget means limited servers, and performance has been excellent except for
this one problem.

Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker at yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu
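(For anyone formatting fresh hardware, a standalone MGS is just one extra
small target at mkfs time. A hedged sketch only; the device names, fsname,
and NID below are placeholders, not the commands used on this system.)

    # dedicated MGS on its own small device
    mkfs.lustre --mgs /dev/sda1
    # MDT for filesystem "archive", registering with that MGS
    mkfs.lustre --fsname=archive --mdt --mgsnode=10.0.0.1@tcp0 /dev/sdb1
    # each OST likewise registers with the MGS
    mkfs.lustre --fsname=archive --ost --mgsnode=10.0.0.1@tcp0 /dev/sdc1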
Arden Wiebe
2009-Aug-07 20:11 UTC
[Lustre-discuss] Large scale delete results in lag on clients
--- On Fri, 8/7/09, Jim McCusker <james.mccusker at yale.edu> wrote:
> From: Jim McCusker <james.mccusker at yale.edu>
> Subject: Re: [Lustre-discuss] Large scale delete results in lag on clients
> To: "lustre-discuss" <lustre-discuss at lists.lustre.org>
> Date: Friday, August 7, 2009, 5:25 AM
>
> The images and the Lucene index are both served from the Lustre cluster (as
> is just about everything else on our network). I think Andreas is right; it
> seems to have cleared itself up, so you're seeing typical performance. If
> you don't find what you're looking for, you can expand your search to the
> full text, abstract, or title using the checkboxes below the search box. Of
> course, the lack of images in search has more to do with the availability
> of open access papers on the topic than the performance of Lustre. :-)

Yeah, I was all over the full text check box as soon as I ran one query.
Great project, by the way, as there really is no way for any researcher or
doctor to read the volumes of scientific journals the pharmaceutical industry
pays for every month. Sad how mass consensus has replaced the actual
scientific method, all for capitalism.

> This server started life as a 1.4.x server, so the MGS is still on the same
> partition as the MDS/MDT. We have one server with the MGS, MDS/MDT, and two
> OSTs, and another server with two more OSTs. The first server also provides
> NFS and SMB services for the volume in question. I know that we're not
> supposed to mount the volume on a server that provides it, but a limited
> budget means limited servers, and performance has been excellent except for
> this one problem.

I roll the same way at http://oil-gas.ca/phpsysinfo and
http://linuxguru.ca/phpsysinfo, with the OSTs actually providing tcp routing
and DNS service for the network that leads surfers to the internal
Lustre-powered webservers. At this time I'm only serving one file via a
symlink from the physically separated (by block device) Lustre cluster, at
http://workwanted.ca/images/3689011.avi (let me know how fast it downloads
back to you), but I am tempted to symlink the entire /var/www/html directory
for a few domains over to the Lustre filesystem. I also run other services
like smb, and of course apache and mysql.

The fact remains that Lustre can be built with off-the-shelf hardware and is
very robust, dependable, and obviously upgradable if you're coming from 1.4.x
servers. I believe the "Lustre Product" could be used by more people, but
given the stigma of a High Performance Compute filesystem it will take a
little magic marketing it to more of the masses. Like you, budget is a
concern at times, although I can see a solid state OST in the near future,
even if that one box deployed costs five thousand.
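(For the symlink idea, the client side is just a normal Lustre mount plus a
symlink. A rough sketch; the NID, fsname, and paths are placeholders.)

    # on the webserver (Lustre client)
    mount -t lustre 10.0.0.1@tcp0:/webfs /mnt/webfs
    # move one domain's document root onto the shared filesystem and link it back
    mkdir -p /mnt/webfs/www
    mv /var/www/html/example.org /mnt/webfs/www/example.org
    ln -s /mnt/webfs/www/example.org /var/www/html/example.org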
Jim McCusker
2009-Aug-11 01:02 UTC
[Lustre-discuss] Large scale delete results in lag on clients
Just a quick update: the lag is back. It seems to hold off for about 24
hours, then things get so slow that our web applications time out. Is there a
delay before freeing inodes commences after a reboot, or is this officially
now Something Else? Should I just tough it out for now and see if it clears
up?

Thanks,
Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker at yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu
Oleg Drokin
2009-Aug-11 02:56 UTC
[Lustre-discuss] Large scale delete results in lag on clients
Hello!

On Aug 10, 2009, at 9:02 PM, Jim McCusker wrote:
> Just a quick update: the lag is back. It seems to hold off for about 24
> hours, then things get so slow that our web applications time out. Is there
> a delay before freeing inodes commences after a reboot, or is this
> officially now Something Else? Should I just tough it out for now and see
> if it clears up?

What Lustre version is it now?

We used to have uncontrolled unlinking, where the OSTs might get swamped with
unlink requests. Now we limit it to 8 unlinks in flight to an OST at any one
time. This slows down the deletion process, but at least there are no
aftershocks following it. (Bug 13843, included in the 1.6.5 release.)

Bye,
Oleg
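(To check the running version on a server or client; the /proc file is the
older interface, and lctl get_param exists on newer releases.)

    cat /proc/fs/lustre/version
    lctl get_param version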
Jim McCusker
2009-Aug-11 03:03 UTC
[Lustre-discuss] Large scale delete results in lag on clients
On Monday, August 10, 2009, Oleg Drokin <Oleg.Drokin at sun.com> wrote:
> What Lustre version is it now?
>
> We used to have uncontrolled unlinking, where the OSTs might get swamped
> with unlink requests. Now we limit it to 8 unlinks in flight to an OST at
> any one time. This slows down the deletion process, but at least there are
> no aftershocks following it. (Bug 13843, included in the 1.6.5 release.)

We're at 1.6.4.x. Is it too late to upgrade?

Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker at yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu
Oleg Drokin
2009-Aug-11 03:11 UTC
[Lustre-discuss] Large scale delete results in lag on clients
Hello!

On Aug 10, 2009, at 11:03 PM, Jim McCusker wrote:
> We're at 1.6.4.x. Is it too late to upgrade?

Well, it's never too late to upgrade ;)

After you upgrade, two things will happen:

1. All big delete jobs might start to take more time.
2. The object unlinking that you have now will stop (as soon as you kill the
   client where you did the unlink), and on the next MDS reconnection to the
   OSTs the orphan destroy process will kill the objects. So if you just want
   to stop the slowness, you need to kill the client where you did the
   removal, but you will have space leakage until the next MDS restart.

Bye,
Oleg
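(Roughly what "kill the client" and an MDS restart translate to on the setup
described earlier, where the client that ran the rm is one of the servers
itself. A hedged sketch with placeholder devices and mount points; schedule
the MDS restart for a quiet time, since it briefly interrupts all clients.)

    # on the client where the rm was run: drop the client mount
    umount /mnt/lustre
    # on the MDS: restart the metadata target so orphan cleanup reclaims space
    umount /mnt/mdt
    mount -t lustre /dev/md0 /mnt/mdt
    # then remount the client
    mount -t lustre 10.0.0.1@tcp0:/archive /mnt/lustre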
Arden Wiebe
2009-Aug-12 07:36 UTC
[Lustre-discuss] Large scale delete results in lag on clients
Jim: Mag just started a good thread about a live backup. Depending on your
budget, if the spare boxes and enough disks are available, just make another
Lustre filesystem and copy the existing data over with smb. Here is a
screenshot of my commodity-hardware rig-up of a 5.4 TB RAID 10 Lustre
filesystem that uses 28 1 TB hard drives and could easily be built for under
$10,000.00 if you shopped a little more conservatively or had existing
hardware you could utilize in the build-out:

http://www.ioio.ca/Lustre-tcp-bonding/images.html and
http://www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html

Because I used RAID 10 for the underlying redundancy, my available storage
space was reduced substantially. I'm sure you could squeeze 15 TB out of
close to that number of disks if you used the right RAID level. Here is the
hardware recipe I used, at http://oil-gas.ca/phpsysinfo, if it helps you to
contemplate the upgrade route or the backup-then-upgrade route.

Otherwise, if you knew someone with a spare 15 TB of storage and bandwidth,
you could quickly (or not so quickly) upload your data and then download it
again. Just ideas, but the thought of doing a 15 TB end-to-end data transfer
using Lustre is interesting.

Arden

--- On Mon, 8/10/09, Jim McCusker <james.mccusker at yale.edu> wrote:
> We're at 1.6.4.x. Is it too late to upgrade?
>
> Jim
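(If a second filesystem does become available, copying directly between two
Lustre client mounts avoids the SMB hop, and rsync makes a 15 TB copy
restartable. A rough sketch with placeholder NIDs and mount points.)

    # both filesystems mounted on the same client
    mount -t lustre 10.0.0.1@tcp0:/archive  /mnt/old
    mount -t lustre 10.0.0.2@tcp0:/archive2 /mnt/new
    # restartable copy preserving ownership, timestamps, and hard links
    rsync -aHv /mnt/old/ /mnt/new/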