behlendorf1@llnl.gov
2006-Dec-21 11:30 UTC
[Lustre-devel] [Bug 11471] New: Many 160-stripe files cause memory fragmentation and failures
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11471

Recently, on one of our systems, a user set the default striping on one of his directories to stripe all files 160 wide. He then proceeded to create many thousands of multi-gigabyte files in the directory. This worked fine for roughly 44 days, until system memory became so fragmented that the order-4 allocations required for the LOV for each file started failing regularly. As a result, normal system tools such as cp, mv, and ls got ENOMEM errors when manipulating any of these files.

For now we've advised the user not to stripe quite so widely by default, and we'll be rebooting the client to clear up the fragmentation. That said, I think we're going to need to adjust how the LOV is allocated: instead of a single kmalloc(), rely on a page array to keep each allocation small. Sadly, I think that's going to complicate how the LOV is packed when it needs to be sent between client and server, but it seems like the right fix.
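
To make the allocation arithmetic concrete, here is a rough sketch in plain userspace C. It is not Lustre code. The 4 KiB page size, the helper names, and the header and per-stripe byte counts used below are assumptions chosen only for illustration; the report does not give the real structure sizes. The order helper follows the same idea as the kernel's get_order(): the smallest power-of-two run of pages that covers the request.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size; 4 KiB is common but is not stated in the report. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Smallest order such that (SKETCH_PAGE_SIZE << order) >= bytes.
 * A contiguous allocation of this order needs 2^order physically
 * adjacent free pages, which is what fragmentation makes scarce.
 */
unsigned int sketch_alloc_order(size_t bytes)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    assert(bytes > 0);
    while (span < bytes) {
        span <<= 1;
        order++;
    }
    return order;
}

/*
 * Number of independent single pages (order-0 allocations) needed
 * to hold the same data in a page array instead of one big buffer.
 */
size_t sketch_pages_needed(size_t bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Hypothetical descriptor size: a fixed header plus one fixed-size
 * entry per stripe. Both sizes are placeholders, not Lustre's.
 */
size_t sketch_lov_bytes(size_t header, size_t per_stripe, unsigned int stripes)
{
    return header + per_stripe * (size_t)stripes;
}
```

With the made-up figures of a 64-byte header and 256 bytes per stripe, a 160-stripe descriptor comes to 41024 bytes. Any request between 32769 and 65536 bytes lands at order 4 (16 contiguous pages), while the same data in a page array needs 11 separate order-0 pages, each of which can come from anywhere in memory. The price is the one noted above: the descriptor is no longer one flat buffer, so packing it for the wire has to walk the pages.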