Mikko Vatanen
2014-May-09 10:24 UTC
[Gluster-users] Repetitive rename() causes device or resource busy errors
Hi!

Hello to all, as I am a newbie on this mailing list. :) I have been trying to google and figure out this problem without success, so I am asking here for some help. As I have understood it, GlusterFS should have an atomic rename() operation. Am I wrong about that?

I have some separate testing setups:

- 2 virtual servers, low performance, two bricks on each host
- 4 medium/high performance servers, a few bricks on each host, 10Gbit ethernet between them
- Running RHEL 6.5 / kernel 2.6.32-431.11.2.el6.x86_64

I have tried the following versions (with out-of-the-box settings):

- glusterfs 3.5.0 built on Apr 23 2014 12:53:57
- glusterfs 3.4.2 built on Jan 3 2014 12:38:06

The tested volume is a simple 2x replicated volume with default settings:

volume create storage_vol01 rep 2 transport tcp ...the bricks...

In addition I have also tried disabling the write/read caches on the server side and a bunch of other parameters, with no change.

The test is run by mounting the glusterfs volume with the fuse client on 2 separate hosts and running a test script on both hosts: http://pastebin.com/aAUhaswM

The script sleeps for a short time and, if the directory given as the first parameter exists, renames it to the second directory. This is repeated forever. For example:

host1: python test.py 1/test_directory 2/test_directory
host2: python test.py 2/test_directory 1/test_directory

Both client logs show the following messages:

[2014-05-09 07:34:41.227499] I [dht-layout.c:640:dht_layout_normalize] 2-ingest_vol-dht: found anomalies in /test/1/test_directory. holes=1 overlaps=0
[2014-05-09 07:34:41.229184] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 2-ingest_vol-client-2: remote operation failed: File exists. Path: /test/1/test_directory
[2014-05-09 07:34:41.229240] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 2-ingest_vol-client-3: remote operation failed: File exists. Path: /test/1/test_directory

After some time has passed, one of the scripts raises an exception:

Traceback (most recent call last):
  File "test.py", line 28, in <module>
    main()
  File "test.py", line 25, in main
    os.rename(source_filename, destination_filename)
OSError: [Errno 16] Device or resource busy

The error is always reproducible on both setups once the time.sleep() interval is decreased enough. Any advice on options / tuning that I could try to resolve this problem?

Thanks in advance :)

- Mikko Vatanen
Applications Specialist
Digital Preservation Services
CSC - IT Center for Science Ltd.
P.O. Box 405, FI-02101 Espoo, Finland
Mobile: +358 50 381 2435
http://www.csc.fi/
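The test script behind the pastebin link amounts to roughly the following. This is a minimal sketch reconstructed from the description and the traceback above; the argument handling and the 0.1 s interval are assumptions:

#!/usr/bin/env python
import os
import sys
import time

def main():
    # Directories to shuffle back and forth, taken from the command line.
    source_filename = sys.argv[1]
    destination_filename = sys.argv[2]
    while True:
        # Shortening this sleep makes the EBUSY error appear sooner.
        time.sleep(0.1)
        if os.path.isdir(source_filename):
            os.rename(source_filename, destination_filename)

if __name__ == '__main__':
    main()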
Laurent Chouinard
2014-May-10 13:39 UTC
[Gluster-users] Repetitive rename() causes device or resource busy errors
Hello Mikko,

From what we've seen, rename operations are not atomic across the cluster, because the hashing algorithm will move the data to a new brick after the rename.

To make it atomic, you need to use a feature that allows tweaking the regular expression fed to the hashing algorithm. With this, you can decide which part of the path/filename is used for the distribution hash and which part is ignored. This allowed us to append a ".tmp" extension that the hashing ignores. Consequently, renaming a file to and from ".tmp" ensures it is not moved around the bricks, thereby making the rename an atomic operation.

The setting to use for this would be:

cluster.extra-hash-regex: "(.*)\\.tmp"

Cheers,

Laurent Chouinard
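For reference, on 3.4/3.5 that option would be set per volume from the CLI, e.g. (using Mikko's volume name; the exact shell quoting of the regex may need adjusting):

gluster volume set storage_vol01 cluster.extra-hash-regex '(.*)\.tmp'

With the regex in place, the usual write-then-rename pattern keeps the file on one brick. A minimal sketch, assuming the volume is mounted at /mnt/storage_vol01 (the paths and filenames here are hypothetical):

import os

# "report.csv.tmp" matches the regex, so DHT hashes it on the captured
# group "report.csv", the same hash the final name gets. The rename
# therefore stays on the same brick and can be atomic.
tmp_path = '/mnt/storage_vol01/data/report.csv.tmp'
final_path = '/mnt/storage_vol01/data/report.csv'

with open(tmp_path, 'w') as f:
    f.write('some data\n')

os.rename(tmp_path, final_path)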