Leonid Isaev
2015-Oct-08 02:24 UTC
[Gluster-users] Writing to distributed (non-replicated) volume with failed nodes
Hi, I have an 8-node trusted pool with a distributed, non-replicated volume. The bricks are located on only 2 machines (2 bricks per node), so there are 6 "dummy" nodes. Everything works great until one of the brick-carrying nodes experiences a power outage. In this case, I can still mount the volume after a timeout (there are plenty of servers to ask for metadata, after all) and read files from it, but whenever I try to create a randomly named file (e.g. by running touch /mnt/.lock-${RANDOM}${RANDOM}), this succeeds only sometimes and often fails with "no such file or directory". I would understand that error if I were touching files that already exist on the offline node (but are invisible in the degraded volume), but these are new random files which never existed before. So, why does writing to the online bricks fail, and what can I do to enable it? The machines run fully up-to-date Fedora 22 and ArchLinux with gluster 3.7.4. I tried to look for similar problems on this ML but haven't found anything related; sorry if I missed something. Thanks! L. -- Leonid Isaev GPG fingerprints: DA92 034D B4A8 EC51 7EA6 20DF 9291 EE8A 043C B8C4 C0DF 20D0 C075 C3F1 E1BE 775A A7AE F6CB 164B 5A6D
Susant Palai
2015-Oct-08 12:15 UTC
[Gluster-users] Writing to distributed (non-replicated) volume with failed nodes
Hi, If the file creation hashes to the brick which is down, then it fails with ENOENT. Susant
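To illustrate why only some of the random names fail, here is a rough Python sketch of the idea. It is a toy model, not Gluster's actual algorithm: the brick names, the SHA-256-modulo placement, and the helper functions are all assumptions made up for illustration (the real distribute translator uses its own hash and per-directory hash ranges). The point it shows is only that a name is mapped to one brick before the file exists, so a create whose name lands on an offline brick fails even though the file is new.

```python
import hashlib

# Hypothetical layout: 4 bricks, 2 per brick-carrying node (names are made up).
BRICKS = ["nodeA:/brick1", "nodeA:/brick2", "nodeB:/brick1", "nodeB:/brick2"]


def hashed_brick(filename, bricks=BRICKS):
    """Map a file name to exactly one brick.

    Toy stand-in: SHA-256 of the name, modulo the brick count.
    Real placement in Gluster differs; only the principle is the same.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return bricks[int.from_bytes(digest[:4], "big") % len(bricks)]


def try_create(filename, down_nodes=frozenset(), bricks=BRICKS):
    """Return True if the create would succeed, False if it would fail (ENOENT)."""
    node = hashed_brick(filename, bricks).split(":", 1)[0]
    return node not in down_nodes


def success_rate(names, down_nodes=frozenset()):
    """Fraction of names whose create would succeed."""
    names = list(names)
    return sum(try_create(n, down_nodes) for n in names) / len(names)


if __name__ == "__main__":
    names = [f".lock-{i}" for i in range(1000)]
    print("all nodes up:", success_rate(names))
    print("nodeB down:  ", success_rate(names, down_nodes={"nodeB"}))
```

With one of the two brick-carrying nodes down, roughly half of the names in this model fail, which matches the "succeeds only sometimes" behaviour described in the original question.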