Carlos Santana
2009-Jul-13 17:51 UTC
[Lustre-discuss] file system shrinking and OST failure
Does Lustre support shrinking a file system, online or offline? I read that online shrinking is not supported, but I couldn't find any information about offline shrinking. My guess is that it is not supported either. Please correct me if I am wrong. Can OST failure/removal be related to shrinking? Thanks, CS.
Brian J. Murrell
2009-Jul-13 17:56 UTC
[Lustre-discuss] file system shrinking and OST failure
On Mon, 2009-07-13 at 12:51 -0500, Carlos Santana wrote:
> Does lustre support shrinking of file system size - online or offline?
> I read online is not supported, but I couldn't find any info for the
> offline shrinking. My guess is that it is not supported. Please
> correct me if I am wrong.

You can shrink the filesystem by simply removing an OST. Of course, if there are objects on that OST, you need to move them off first, or you will lose the files (or, for striped files, parts of the files) that those objects belong to.

The manual, this list, and bugzilla have all discussed how to move files off an OST in great detail. Please check them for details on how.

b.
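As a rough illustration of "moving objects off an OST", here is a minimal sketch and not a procedure taken from this thread. It assumes the OST has already been marked down on the MDS so that no new objects are allocated on it. The file system name, OST UUID, and mount point are hypothetical placeholders, and the helper names find_cmd and migrate_file are made up for this example. Copying a file and renaming the copy over the original is one commonly described way to get its data rewritten onto other OSTs; it is not safe for files that are open or being modified.

```shell
#!/bin/sh
# Sketch only. FSNAME, OST_UUID and MOUNT are hypothetical placeholders.
FSNAME="lustre"
OST_UUID="${FSNAME}-OST0001_UUID"
MOUNT="/mnt/lustre"

# Print (do not run) the command that lists files with objects on the OST
# that is going to be removed. "lfs find --obd" selects files by OST UUID.
find_cmd() {
    echo "lfs find --obd $OST_UUID $MOUNT"
}

# Rewrite one file by copying it and renaming the copy over the original.
# The copy gets newly allocated objects; with the OST marked down on the MDS
# those are expected to land on the remaining OSTs.
# Assumption: the file is not open or changing while this runs.
migrate_file() {
    src=$1
    tmp="$src.migrate.$$"
    cp -p "$src" "$tmp" && mv "$tmp" "$src"
}
```

On a client, the output of the command printed by find_cmd would be fed to migrate_file one path at a time. Running find_cmd again afterwards is a reasonable way to check that nothing is left on the OST before it is removed.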
Carlos Santana
2009-Jul-13 19:02 UTC
[Lustre-discuss] file system shrinking and OST failure
All right, so we can shrink the file system. The manual has useful info about OST failure/removal, and I have a few related questions about it.

In the failover chapter (8-4), a note about stopping a client process that waits indefinitely says: "the OST is explicitly marked "inactive" on the clients: lctl --device <failed OSC device on the client> deactivate". But a note in chapter 4-18 says: "Do not deactivate the OST on the clients. Do so causes errors (EIOs), and the copy out to fail." This is a bit confusing. What should we do when an OST fails, and when should we deactivate the OST (or, to be precise, the OSC) on the client?

Could you please elaborate on configuring failover while making a new file system? The mkfs.lustre command does not have a --failover switch, but it does have a --failnode switch. So do we just need to specify '--failnode=<ip.addr.of.another.OST at interface>', or is anything else needed? What is the correct method?

And does this (spare) OST need to be configured for the file system and be active/mounted while running the above mkfs.lustre command?

- CS.
Andreas Dilger
2009-Jul-13 20:47 UTC
[Lustre-discuss] file system shrinking and OST failure
On Jul 13, 2009 14:02 -0500, Carlos Santana wrote:
> The manual has a note in failover chapter 8-4 for stopping client
> process which waits indefinitely saying - "the OST is explicitly
> marked "inactive" on the clients: lctl --device <failed OSC device on
> the client> deactivate". But, a note in chapter 4-18 says "Do not
> deactivate the OST on the clients. Do so causes errors (EIOs), and the
> copy out to fail.". This is a bit confusing. So what should we do when
> an OST fails? and when should we deactivate OST (or to be precise
> OSC?) on client?

These are for two different situations. If the OST is unavailable, and you want clients to return -EIO if they access files located on that OST, then deactivate the OSC on the client. If the OST is just marked administratively down on the MDS in order to manually balance space usage, then the OSC should NOT be deactivated on the client, so that the clients can read/write/unlink files on that OST.

> Could you please elaborate more on configuring failover while making a
> new filesystem? The mkfs.lustre command does not have --failover
> switch, but rather has --failnode switch. So we just need to specify
> '--failnode=<ip.addr.of.another.OST at interface>' or anything else?
> What is the correct method?
>
> And do we need to configure this (spare) OST for the file system and
> be it active/mounted while running above mkfs.lustre command?

You are mixing up "OST" and "OSS". For failover you are moving the OST (disk/filesystem) from a primary OSS (server) to a backup OSS. The backup OSS can also be active serving its own OSTs, and then also take over the failed OSS's OSTs, so long as it has enough RAM and is of course cabled to the shared disk that holds the failed-over OSTs.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
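The two points above can be sketched as a small dry-run script. This is an illustration under assumptions, not a recipe from the thread: every function only prints a command line and nothing is executed. The function names, the situation labels, the device number, and the NIDs in the usage note are hypothetical. The sketch reads --failnode as naming the backup OSS (a server) by its NID, which is how the OST/OSS distinction above suggests it should be read.

```shell
#!/bin/sh
# Dry-run sketch: functions only print commands; nothing is executed.

# Decide on which node the OSC should be deactivated.
#   ost-unavailable: the OST is gone and clients should get -EIO -> client
#   rebalance:       the OST is healthy but should get no new objects -> MDS
#                    only, so clients can still read/write/unlink its files
deactivate_target() {
    case "$1" in
        ost-unavailable) echo "client" ;;
        rebalance)       echo "mds" ;;
        *) echo "unknown situation: $1" >&2; return 1 ;;
    esac
}

# $1 = situation, $2 = OSC device number on that node (as listed by "lctl dl")
deactivate_cmd() {
    node=$(deactivate_target "$1") || return 1
    echo "[$node] lctl --device $2 deactivate"
}

# Print the command that formats an OST on its primary OSS and names the
# backup OSS by NID.
# $1 = fsname, $2 = MGS NID, $3 = backup OSS NID, $4 = block device
mkfs_ost_cmd() {
    echo "mkfs.lustre --fsname=$1 --ost --mgsnode=$2 --failnode=$3 $4"
}
```

For example, `deactivate_cmd ost-unavailable 7` prints `[client] lctl --device 7 deactivate`, while `deactivate_cmd rebalance 7` prints the same lctl command tagged `[mds]`. `mkfs_ost_cmd lustre mgs@tcp0 oss2@tcp0 /dev/sdb` prints a mkfs.lustre line whose --failnode value is the backup server's NID rather than another OST.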