Eric Chris Garrison
2013-Oct-17 13:16 UTC
[Samba] Can't restore from GPFS snapshots, disk_free error
Hello,

We're trying to set up a GPFS system with Samba running on top with CTDB managing it.

I have snapshots set up to be accessible in every directory as the invisible directory .snap

The snapshots are in the following format:

    /usr/lpp/mmfs/bin/mmcrsnapshot 1MB `TZ=GMT date +@GMT-%Y.%m.%d-%H.%M.%S`

...and look like this from the UNIX level:

    ~ecgarris/RSFS/.snap/@GMT-2013.10.16-20.00.01/thing1

I've set up shares like this:

    [homes]
    path = %H/RSFS
    comment = RSFS Home Directories
    browseable = No
    shadow:snapdir = .snap
    # shadow:basedir = %H/RSFS
    shadow:fixinodes = yes
    shadow:snapdirseverywhere=yes

(I commented out basedir in case it was causing a path issue. The problem is the same either way, and I don't think basedir is needed if the .snap is in every directory.)

We can right-click, choose "Restore previous versions", and see all the other snapshots. We can "Open" those files and see the contents.

"Copy" fails with "Item Not Found / Could not find this item / This is no longer located in Z:\sharename (\\hostname.domainname.edu)(Z:)\dirname." "Restore" gives the same error, but then the current version disappears.

Upon doing a "Copy", we see the following in the logs:

    Oct 15 16:34:06 hostname smbd[18355]: [2013/10/15 16:34:06.706603, 0] ../source3/smbd/dfree.c:138(sys_disk_free)
    Oct 15 16:34:06 hostname smbd[18355]: disk_free: sys_fsusage() failed. Error was : No such file or directory

A similar issue comes up in the following forum thread, but no solution is given:

http://forums.freenas.org/threads/9-1-and-windows-7-2008r2-previous-versions.14416/

One thing I notice is that it puts the share name in the path, when that share is already the root level of (Z:) at that point. I tried putting a symlink to the share name under the share, in case it was getting an extra level in there, but that didn't solve the problem, so that's probably not it.

In summary, it acts as if the snapshot version is unreadable for both Restore and Copy, yet it can be read fine for Open.

If I do a Copy or Restore for a whole directory tree, it restores all the directories, but they are all empty of files, and I get an error for each file that fails to restore.

I'd appreciate any help. It feels like we're really close.

Thanks.

Chris
Research Storage
Indiana University
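For reference, the snapshot naming used in the message above can be reproduced and parsed with a few lines of Python. This is only a minimal sketch: the format string is taken from the mmcrsnapshot command quoted above, while the function names are hypothetical and not part of GPFS or Samba.

```python
from datetime import datetime, timezone

# Format string copied from the `date` invocation in the message above.
SNAP_FORMAT = "@GMT-%Y.%m.%d-%H.%M.%S"


def snapshot_name(when=None):
    """Return a snapshot name for `when` (default: now), expressed in UTC.

    A naive datetime is interpreted as local time by astimezone().
    """
    if when is None:
        when = datetime.now(timezone.utc)
    return when.astimezone(timezone.utc).strftime(SNAP_FORMAT)


def parse_snapshot_name(name):
    """Parse a snapshot name back into a timezone-aware UTC datetime.

    Raises ValueError if the name does not match SNAP_FORMAT.
    """
    return datetime.strptime(name, SNAP_FORMAT).replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    stamp = datetime(2013, 10, 16, 20, 0, 1, tzinfo=timezone.utc)
    print(snapshot_name(stamp))
    print(parse_snapshot_name("@GMT-2013.10.16-20.00.01").isoformat())
```

Parsing the names back is a quick way to check that every directory under .snap really matches the expected pattern, since a name that does not match raises ValueError.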
Jonathan Buzzard
2013-Oct-17 15:46 UTC
[Samba] Can't restore from GPFS snapshots, disk_free error
On Thu, 2013-10-17 at 09:16 -0400, Eric Chris Garrison wrote:
> Hello,
>
> We're trying to set up a GPFS system with Samba running on top with CTDB
> managing it.
>
> I have snapshots set up to be accessible in every directory as the invisible
> directory .snap
>
> The snapshots are in the following format:
>
> /usr/lpp/mmfs/bin/mmcrsnapshot 1MB `TZ=GMT date +@GMT-%Y.%m.%d-%H.%M.%S`
>
> ...and look like this from the UNIX level:
>
> ~ecgarris/RSFS/.snap/@GMT-2013.10.16-20.00.01/thing1
>
> I've set up shares like this:
>
> [homes]
> path = %H/RSFS
> comment = RSFS Home Directories
> browseable = No
> shadow:snapdir = .snap
> # shadow:basedir = %H/RSFS
> shadow:fixinodes = yes
> shadow:snapdirseverywhere=yes
>

[SNIP]

> I'd appreciate any help. It feels like we're really close. Thanks.
>

I am assuming that you are loading the shadow_copy2 and gpfs VFS modules.

Assuming that you don't have independent filesets as well, scrap the snapshot directories all over the place and in the general configuration do

    # enable shadow copies
    shadow : snapdir = /gpfs/.snapshots
    shadow : basedir = /gpfs
    shadow : fixinodes = yes

Works for certain with GPFS 3.4.x and Samba 3.5.x and 3.6.x.

Now the bad news: give it up anyway, as snapshots are unworkable on a GPFS file system that is in production. In my experience the following commands should only be run under very light load (aka a maintenance window) on a production file system: mmcrsnapshot, mmdelsnapshot and mmunlinkfileset.

I have a wonderful Perl script that can be called from all nodes in the file system that are able to run admin commands. It will create snapshots as required and remove old snapshots as required, complete with full locking (the script is called from crontab on multiple nodes so that a node down for maintenance does not cause you to lose snapshots), and it all works beautifully in test. Runs for months without a problem.

Put it on a production system, and within days if not hours you will get a total freeze on the file system when it blocks trying to get a quiescent lock to either take or remove a snapshot.

If you wish I can send you a copy of my Perl script, but my advice is to forget it, as ahead lies only pain and anguish :-(

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.
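To make the difference between the two layouts discussed in this thread concrete, here is a rough Python sketch of where the snapshot copy of a file would be expected under each. It is a simplified illustration only, not Samba's actual shadow_copy2 logic. The function names are hypothetical, as are the /home/ecgarris and /gpfs/home/ecgarris example paths (the thread only gives ~ecgarris).

```python
import posixpath


def central_snapshot_path(path, basedir, snapdir, snapshot):
    """Single snapshot tree (snapdir/basedir style configuration).

    The part of `path` below `basedir` is re-rooted under
    snapdir/<snapshot>/.
    """
    relative = posixpath.relpath(path, basedir)
    return posixpath.join(snapdir, snapshot, relative)


def per_directory_snapshot_path(path, snapdir_name, snapshot):
    """Snapshot directory visible in every directory (.snap style).

    The snapshot copy is looked up next to the file itself, under
    <parent>/<snapdir_name>/<snapshot>/.
    """
    parent, name = posixpath.split(path)
    return posixpath.join(parent, snapdir_name, snapshot, name)


if __name__ == "__main__":
    snap = "@GMT-2013.10.16-20.00.01"
    # Hypothetical absolute paths, used for illustration only.
    print(central_snapshot_path(
        "/gpfs/home/ecgarris/RSFS/thing1", "/gpfs", "/gpfs/.snapshots", snap))
    print(per_directory_snapshot_path(
        "/home/ecgarris/RSFS/thing1", ".snap", snap))
```

If the path that Samba ends up resolving differs from what a mapping like this predicts (for example, because an extra share-name component is inserted), a lookup failing with "No such file or directory" is the kind of symptom one would expect.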