Carlos Capriotti
2014-Mar-27 18:45 UTC
[Gluster-users] dude, where is my data ? AKA what is gluster doing with my file...?
Hello all.

I have a very curious puzzle here, and maybe you want to chip in with your opinion.

My current setup, which I was hoping would be the last, is the following:

- One node: 1x Dell 2950, 8 GB RAM, with an external RAID (Ataboy2) connected via a SCSI card, delivering about 70 MB/s (slow, I know).
- One node: Dell 2950 with internal RAID (RAID 5).

Both Dells have 4 (yep, four) bonded NICs using the useless mode 6. I recently learned that REAL link aggregation is switch dependent, so, if you want SPEED, never mind playing around with software.

I have a REPLICATED volume using both servers with the respective bricks.

As part of the scenario I have an Isilon, which is my primary storage for a few (10) fairly big VMware images.

One of the nodes mounts an NFS share from the Isilon and mounts the gluster volume using the native glusterfs option.

I am in a unique situation where I can afford suspending the VM servers for a few hours for a backup, so I wrote a nice simple bash script, run from that node, that does exactly this:

- pauses the VM
- uses cp to copy the pertinent files from the Isilon to the gluster volume
- resumes the VM

and repeats that for all of the servers. Simple and elegant, I like to think.

BUT, here is the trouble:

For 20 SOLID minutes the system sits, reporting a steady connection to the Isilon, receiving at 740 Mbps. Impressive, you would think, right? That would be something like 110 GB of data.

Well, the problem is, NOTHING is written to the disks. NO disk at all. And, with 8 GB of RAM, it is not going to memory either.

There is NOTHING in the gluster-related logs. Actually, the only entries there are from two days ago, when I last rebooted the system. There is nothing in the system logs either, and, regarding network communication, there is no data flow to ANY other peer or point on the servers.
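As a rough cross-check of the volume estimate above: a steady 740 Mbps held for 20 minutes works out to about 111 GB in decimal units, which is far more than 8 GB of RAM could buffer. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of the original post, and it assumes the reported rate is in megabits per second.

```python
def transferred_gb(rate_mbps: float, minutes: float) -> float:
    """Data moved at a steady rate, in decimal gigabytes (1 GB = 10**9 bytes).

    Assumes rate_mbps is megabits per second (10**6 bits/s).
    """
    bits = rate_mbps * 1_000_000 * minutes * 60  # total bits over the interval
    return bits / 8 / 1_000_000_000              # bits -> bytes -> GB


if __name__ == "__main__":
    # Figures taken from the post: 740 Mbps for 20 minutes.
    print(f"{transferred_gb(740, 20):.0f} GB")
```

The same function applied to the later 270-300 Mbps phase gives a feel for how much slower the replicated write path is than the NFS read.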
After those 20 minutes of going nowhere fast, THEN the system decides to start transmitting data to the other node, bandwidth usage falls to around 270-300 Mbps, and now FINALLY we have data recorded to both volumes.

While the issue is happening, gluster-related processes are doing nothing. No processing reported by top.

Any idea about what is going on here? I am not even sure this is gluster, but, well, I can't think of anything else right now.

On the network, x.92 is the Isilon; x.23 and x.24 are the nodes.

While that bizarre behavior was taking place, a quick tcpdump showed:

19:31:41.355065 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq 20534432:20534816, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr 209546443], length 384
19:31:41.355123 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq 20534816:20535648, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr 209546443], length 832
19:31:41.355164 IP 10.0.1.23.50250 > 10.0.1.24.ssh: Flags [.], ack 20534432, win 1122, options [nop,nop,TS val 209546443 ecr 209602327], length 0
19:31:41.355173 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq 20535648:20535856, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr 209546443], length 208
19:31:41.355188 IP 10.0.1.92.nfs > 10.0.1.24.954: Flags [.], seq 602055725:602064673, ack 1211761, win 65535, options [nop,nop,TS val 200506019 ecr 209602324], length 8948
19:31:41.355208 IP 10.0.1.24.954 > 10.0.1.92.nfs: Flags [.], ack 602064673, win 6530, options [nop,nop,TS val 209602327 ecr 200506019], length 0
19:31:41.355213 IP 10.0.1.92.nfs > 10.0.1.24.954: Flags [.], seq 602064673:602073621, ack 1211761, win 65535, options [nop,nop,TS val 200506019 ecr 209602324], length 8948
19:31:41.355218 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq 20535856:20536064, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr 209546443], length 208
19:31:41.355224 IP 10.0.1.92.nfs > 10.0.1.24.954: Flags [.], seq 602073621:602082569, ack 1211761, win 65535, options [nop,nop,TS val 200506019 ecr 209602324], length 8948
19:31:41.355235 IP 10.0.1.24.954 > 10.0.1.92.nfs: Flags [.], ack 602082569, win 6530, options [nop,nop,TS val 209602327 ecr 200506019], length 0
19:31:41.355239 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq 20536064:20536416, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr 209546443], length 352
19:31:41.355266 IP 10.0.1.23.50250 > 10.0.1.24.ssh: Flags [.], ack 20535648, win 1122, options [nop,nop,TS val 209546443 ecr 209602327], length 0
19:31:41.355360 IP 10.0.1.23.50250 > 10.0.1.24.ssh: Flags [.], ack 20536416, win 1122, options [nop,nop,TS val 209546443 ecr 209602327], length 0
19:31:41.355429 IP 10.0.1.92.nfs > 10.0.1.24.954: Flags [.], seq 602082569:602091517, ack 1211761, win 65535, options [nop,nop,TS val 200506019 ecr 209602324], length 8948
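To see at a glance which conversations carry the payload in a capture like the one above, the `length` fields can be summed per source/destination pair. The following is a minimal sketch, assuming tcpdump's default one-record text output for TCP over IPv4; the names `RECORD`, `SAMPLE` and `bytes_per_flow` are illustrative, and `SAMPLE` holds only three of the records shown above.

```python
import re
from collections import Counter

# Captures source, destination and TCP payload length of one tcpdump record.
# The non-greedy gap also copes with records that were flattened onto one line.
RECORD = re.compile(r"IP (\S+) > (\S+): .*?length (\d+)")

# Three records copied from the capture in the post.
SAMPLE = """\
19:31:41.355065 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq 20534432:20534816, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr 209546443], length 384
19:31:41.355188 IP 10.0.1.92.nfs > 10.0.1.24.954: Flags [.], seq 602055725:602064673, ack 1211761, win 65535, options [nop,nop,TS val 200506019 ecr 209602324], length 8948
19:31:41.355208 IP 10.0.1.24.954 > 10.0.1.92.nfs: Flags [.], ack 602064673, win 6530, options [nop,nop,TS val 209602327 ecr 200506019], length 0
"""


def bytes_per_flow(capture: str) -> Counter:
    """Sum TCP payload bytes for each (source, destination) pair."""
    totals = Counter()
    for src, dst, length in RECORD.findall(capture):
        totals[(src, dst)] += int(length)
    return totals


if __name__ == "__main__":
    # Print flows from heaviest to lightest.
    for (src, dst), total in bytes_per_flow(SAMPLE).most_common():
        print(f"{src} -> {dst}: {total} bytes")
```

In this capture the only traffic between the two nodes is the ssh session; the bulk payload is NFS data arriving from x.92 at x.24, which matches the observation that nothing is being sent to the other peer yet.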