We have been testing a method to have a backup MDS and MDT using the documentation here: https://jira.hpdd.intel.com/browse/LUDOC-161

The procedure we follow is described in Scott's comment and was taken from our own internal documentation. Our method is to create the ZFS file systems

  lustre-meta/fsmeta
  lustre-meta/mgs

We then take a recursive snapshot (zfs snapshot -r) and replicate it with zfs send -R / zfs receive. To use the backup, we swap its IP addresses and name with those of the primary.

If the backup is swapped in and used immediately after the snapshot is taken, this method works. However, it does not work if you continue to use the original server before migrating to the backup, which is the equivalent of migrating to a snapshot that is not perfectly up to date. In that case, you can mount and read from the file system, but no newly written files make it to the OSTs. The MDT does show the files we attempt to create, but their attributes are all unknown, and we are unable to manipulate the files (rm, mv, etc.). The error returned is "cannot allocate memory" (LU-4524).

We suspected the configuration logs, so we re-ran writeconf and remounted. The behavior was the same.

On our primary MDT right now we have both a working MGS/MDT and the non-working MGS/MDT, which we could switch to if testing were requested.

We are running Lustre 2.4.0 on the servers and have tested with 2.4.0 and 2.1.6 clients.

Best,
Jesse Stroik
University of Wisconsin
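
For reference, the snapshot and send/receive steps described above might look roughly like the sketch below. Only the pool name lustre-meta and the -r / -R flags come from this report. The snapshot name, the backup host name, the use of ssh as the transport, and the -F flag on zfs receive are assumptions made for illustration and may not match the procedure actually used. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands by default instead of running them.
POOL="lustre-meta"          # pool name taken from the report
SNAP="backup-1"             # hypothetical snapshot name
BACKUP_HOST="mds-backup"    # hypothetical backup MDS host name
DRY_RUN="${DRY_RUN:-1}"     # set DRY_RUN=0 to actually execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Recursive snapshot: covers lustre-meta/fsmeta and lustre-meta/mgs together.
take_snapshot() {
    run zfs snapshot -r "${POOL}@${SNAP}"
}

# Send the full replication stream (-R) to the backup server.
# Assumption: ssh transport and "zfs receive -F", which overwrites
# the destination pool's existing contents.
replicate() {
    run sh -c "zfs send -R ${POOL}@${SNAP} | ssh ${BACKUP_HOST} zfs receive -F ${POOL}"
}

take_snapshot
replicate
```

Any writes the primary accepts after take_snapshot runs are absent from the replicated copy, which is the situation described in this report.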