Hi Geoffrey,

Grep for 'ERROR' in the log file; only those lines would be sufficient.

Thanks,
Vijay

On Wednesday 10 June 2015 04:38 AM, Geoffrey Letessier wrote:
> Hello Vijay,
>
> Quota-verify has now been running for quite a few hours (more than 10)
> and the output files (4 files, because there are 4 bricks per replica)
> are very large: around 800 MB per file on the first server and 5 GB per
> file on the second one. Do you still want these? How can I send them to
> you?
>
> Nice night (in France)
> Geoffrey
> ------------------------------------------------------
> Geoffrey Letessier
> Responsable informatique & ingénieur système
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
> <mailto:geoffrey.letessier at ibpc.fr>
>
> On 9 June 2015, at 12:46, Vijaikumar M <vmallika at redhat.com
> <mailto:vmallika at redhat.com>> wrote:
>
>> Hi Geoffrey,
>>
>> The file content deletion is caused by the 'vi' editor's behaviour of
>> truncating the file when writing the updated content.
>>
>> Regarding the quota size/usage problem, can you please execute the
>> attached script on each brick and provide us with the generated output?
>> This will help us analyse why quota list is showing the wrong size.
>> The script basically crawls the directory given as argument.
>> It collects the quota "contri" and "size" extended attributes, and
>> also the "block size" from a stat call.
>>
>> Usage:
>>
>> ./quota-verify -b <brick_path> | tee brick_name.log
>>
>> Thanks,
>> Vijay
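A minimal sketch of the verify-then-filter workflow described above, assuming the brick paths quoted elsewhere in this thread (the log name brick1.log is just an example):

  # crawl one brick with the attached script; the full log can grow very large
  ./quota-verify -b /export/brick_home/brick1 | tee brick1.log

  # keep only the mismatch lines, which is all that is needed for the analysis
  grep 'ERROR' brick1.log > brick1-errors.log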
>>
>> On Tuesday 09 June 2015 03:45 PM, Vijaikumar M wrote:
>>>
>>> On Tuesday 09 June 2015 03:40 PM, Geoffrey Letessier wrote:
>>>> Hi Vijay,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> Unfortunately, I checked every brick in my storage pool and didn't
>>>> find any backup file... too bad!
>>>
>>> Please check for the backup file on the client machine where the file
>>> was edited, and in the home directory of the user whose login was used
>>> to edit the file.
>>>
>>> Thanks,
>>> Vijay
>>>
>>>> Thank you again!
>>>> Good luck and see you,
>>>> Geoffrey
>>>> ------------------------------------------------------
>>>> Geoffrey Letessier
>>>> Responsable informatique & ingénieur système
>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>> Institut de Biologie Physico-Chimique
>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>> <mailto:geoffrey.letessier at ibpc.fr>
>>>>
>>>>> On 9 June 2015, at 10:05, Vijaikumar M <vmallika at redhat.com
>>>>> <mailto:vmallika at redhat.com>> wrote:
>>>>>
>>>>> On Tuesday 09 June 2015 01:08 PM, Geoffrey Letessier wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Yes of course:
>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team
>>>>>> cl-storage1: 1608522280 /export/brick_home/brick1/amyloid_team
>>>>>> cl-storage3: 1619630616 /export/brick_home/brick1/amyloid_team
>>>>>> cl-storage1: 1614057836 /export/brick_home/brick2/amyloid_team
>>>>>> cl-storage3: 1602653808 /export/brick_home/brick2/amyloid_team
>>>>>>
>>>>>> The sum is 6444864540 KB (around 6.4-6.5 TB), while the quota list
>>>>>> displays 7.7 TB. So the discrepancy is roughly 1.2-1.3 TB, in other
>>>>>> words around 16% - which is far too much, no?
>>>>>>
>>>>>> In addition, since the quota was exceeded, I have noticed a lot of
>>>>>> files like the following:
>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] "cd /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/; ls -ail remd_100.sh 2> /dev/null" 2>/dev/null
>>>>>> cl-storage3: 133325688 ---------T 2 tarus amyloid_team 0 16 févr. 10:20 remd_100.sh
>>>>>> Note the 'T' at the end of the permissions, and the file size of 0 B.
>>>>>>
>>>>>> Also, yesterday some files were duplicated, but they are not
>>>>>> anymore...
>>>>>>
>>>>>> The worst part is that all these files used to be fine. In other
>>>>>> words, exceeding the quota caused file or content deletion or
>>>>>> corruption? What can I do to prevent this situation in the future?
>>>>>> Because I guess I cannot do anything to roll it back now, right?
>>>>>
>>>>> Hi Geoffrey,
>>>>>
>>>>> I tried re-creating the problem.
>>>>>
>>>>> Here is the behaviour of the vi editor: when a file is saved, vi
>>>>> creates a backup file under the home directory and re-opens the
>>>>> original file with the 'O_TRUNC' flag, hence the file gets truncated.
>>>>>
>>>>> Here is the strace of vi when it gets the 'EDQUOT' error:
>>>>>
>>>>> open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 3
>>>>> write(3, "line one\nline two\n", 18) = 18
>>>>> fsync(3) = 0
>>>>> close(3) = -1 EDQUOT (Disk quota exceeded)
>>>>> chmod("hello", 0100644) = 0
>>>>> open("/root/hello~", O_RDONLY) = 3
>>>>> open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 7
>>>>> read(3, "line one\n", 256) = 9
>>>>> write(7, "line one\n", 9) = 9
>>>>> read(3, "", 256) = 0
>>>>> close(7) = -1 EDQUOT (Disk quota exceeded)
>>>>> close(3) = 0
>>>>>
>>>>> To recover the truncated file, please check whether there is a backup
>>>>> file 'remd_115.sh~' under '~/' or in the directory where the file
>>>>> itself lives. If it exists, you can copy it back.
>>>>>
>>>>> Thanks,
>>>>> Vijay
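For anyone who wants to observe this behaviour themselves, a rough sketch (the file name and the traced syscall list are assumptions; vi builds differ in the exact calls they make):

  # trace the file-handling syscalls vi makes while saving a file on the full volume
  strace -f -e trace=open,read,write,close,fsync -o vi-save.trace vi /mnt/test/hello

  # after saving and quitting, look for O_TRUNC opens and EDQUOT failures
  grep -E 'O_TRUNC|EDQUOT' vi-save.trace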
>>>>>
>>>>>> Geoffrey
>>>>>> ------------------------------------------------------
>>>>>> Geoffrey Letessier
>>>>>> Responsable informatique & ingénieur système
>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>> Institut de Biologie Physico-Chimique
>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>> <mailto:geoffrey.letessier at ibpc.fr>
>>>>>>
>>>>>>> On 9 June 2015, at 09:01, Vijaikumar M <vmallika at redhat.com
>>>>>>> <mailto:vmallika at redhat.com>> wrote:
>>>>>>>
>>>>>>> On Monday 08 June 2015 07:11 PM, Geoffrey Letessier wrote:
>>>>>>>> In addition, I notice a very big difference between the sum of du
>>>>>>>> on each brick and the "quota list" display, as you can read below:
>>>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/amyloid_team
>>>>>>>> cl-storage1: 1,6T /export/brick_home/brick1/amyloid_team
>>>>>>>> cl-storage3: 1,6T /export/brick_home/brick1/amyloid_team
>>>>>>>> cl-storage1: 1,6T /export/brick_home/brick2/amyloid_team
>>>>>>>> cl-storage3: 1,6T /export/brick_home/brick2/amyloid_team
>>>>>>>> [root at lucifer ~]# gluster volume quota vol_home list /amyloid_team
>>>>>>>> Path           Hard-limit  Soft-limit  Used   Available
>>>>>>>> --------------------------------------------------------------------------------
>>>>>>>> /amyloid_team  9.0TB       90%         7.8TB  1.2TB
>>>>>>>>
>>>>>>>> As you can see, the sum over all bricks gives me roughly 6.4 TB
>>>>>>>> and "quota list" around 7.8 TB; so there is a difference of 1.4 TB
>>>>>>>> that I am not able to explain. Do you have any idea?
>>>>>>>
>>>>>>> There were a few issues with quota size accounting; we have fixed
>>>>>>> some of them in 3.7.
>>>>>>> 'df -h' rounds off the values; can you please provide the output
>>>>>>> of 'df' without the -h option?
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Geoffrey
>>>>>>>> ------------------------------------------------------
>>>>>>>> Geoffrey Letessier
>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>> <mailto:geoffrey.letessier at ibpc.fr>
>>>>>>>>
>>>>>>>>> On 8 June 2015, at 14:30, Geoffrey Letessier
>>>>>>>>> <geoffrey.letessier at cnrs.fr
>>>>>>>>> <mailto:geoffrey.letessier at cnrs.fr>> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> With GlusterFS version 3.5.3, I ran into a strange issue this
>>>>>>>>> morning when writing a file while the quota is exceeded.
>>>>>>>>>
>>>>>>>>> Someone in my lab, whose quota was exceeded (but she didn't know
>>>>>>>>> it), tried to modify a file; because of the exceeded quota she
>>>>>>>>> could not, and decided to exit vi. Now her file is empty/blank,
>>>>>>>>> as you can read below:
>>>>>>> We suspect 'vi' might have created a tmp file before writing to
>>>>>>> the file. We are working on re-creating this problem and will
>>>>>>> update you on it.
>>>>>>>
>>>>>>>>> pdsh at lucifer: cl-storage3: ssh exited with exit code 2
>>>>>>>>> cl-storage1: ---------T 2 tarus amyloid_team 0 19 févr. 12:34 /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
>>>>>>>>> cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0 8 juin 12:38 /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
>>>>>>>>>
>>>>>>>>> In addition, I don't understand why, my volume being a
>>>>>>>>> distributed volume on top of replicas (cl-storage[1,3] is
>>>>>>>>> replicated only onto cl-storage[2,4]), I have two "identical"
>>>>>>>>> files (same complete path) on two different bricks (as you can
>>>>>>>>> read above).
>>>>>>>>>
>>>>>>>>> Thanks in advance for your help and clarification.
>>>>>>>>> Geoffrey
>>>>>>>>> ------------------------------------------------------
>>>>>>>>> Geoffrey Letessier
>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>> <mailto:geoffrey.letessier at ibpc.fr>
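The zero-byte '---------T' entries shown above are most likely DHT link files: placeholders that GlusterFS's distribute layer leaves on one brick when the file's contents live on (or were being migrated to) another subvolume. A minimal check, assuming that is what they are, is to dump the extended attributes directly on the brick:

  # a DHT link file carries a linkto xattr naming the subvolume that holds the data
  getfattr -m . -d -e hex /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh

If trusted.glusterfs.dht.linkto shows up, the two "identical" paths are one data file plus one pointer, not a real duplicate.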
>>>>>>>>>
>>>>>>>>>> On 2 June 2015, at 23:45, Geoffrey Letessier
>>>>>>>>>> <geoffrey.letessier at cnrs.fr
>>>>>>>>>> <mailto:geoffrey.letessier at cnrs.fr>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Ben,
>>>>>>>>>>
>>>>>>>>>> I just checked my messages log files, both on the client and on
>>>>>>>>>> the servers, and I don't find any hung tasks like the ones you
>>>>>>>>>> see on yours.
>>>>>>>>>>
>>>>>>>>>> As you can read below, I don't see the performance issue with a
>>>>>>>>>> simple dd, but I think my issue concerns sets of small files
>>>>>>>>>> (tens of thousands, or even more).
>>>>>>>>>>
>>>>>>>>>> [root at nisus test]# ddt -t 10g /mnt/test/
>>>>>>>>>> Writing to /mnt/test/ddt.8362 ... syncing ... done.
>>>>>>>>>> sleeping 10 seconds ... done.
>>>>>>>>>> Reading from /mnt/test/ddt.8362 ... done.
>>>>>>>>>>          10240MiB  KiB/s   CPU%
>>>>>>>>>> Write              114770  4
>>>>>>>>>> Read               40675   4
>>>>>>>>>>
>>>>>>>>>> For info: /mnt/test is the single (v2) GlusterFS volume.
>>>>>>>>>>
>>>>>>>>>> [root at nisus test]# ddt -t 10g /mnt/fhgfs/
>>>>>>>>>> Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
>>>>>>>>>> sleeping 10 seconds ... done.
>>>>>>>>>> Reading from /mnt/fhgfs/ddt.8380 ... done.
>>>>>>>>>>          10240MiB  KiB/s   CPU%
>>>>>>>>>> Write              102591  1
>>>>>>>>>> Read               98079   2
>>>>>>>>>>
>>>>>>>>>> Do you have any idea how to tune/optimize the performance
>>>>>>>>>> settings, and/or the TCP settings (MTU, etc.)?
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>> |             | UNTAR  | DU    | FIND   | TAR    | RM     |
>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>> | single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
>>>>>>>>>> | replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
>>>>>>>>>> | distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
>>>>>>>>>> | dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
>>>>>>>>>> | native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
>>>>>>>>>> | BeeGFS      | ~3m43s | ~15s  | ~3s    | ~1m33s | ~46s   |
>>>>>>>>>> | single (v2) | ~3m6s  | ~14s  | ~32s   | ~1m2s  | ~44s   |
>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>> For info:
>>>>>>>>>> - BeeGFS is a distributed FS (4 bricks: 2 bricks per server, 2 servers)
>>>>>>>>>> - single (v2): a simple gluster volume with default settings
>>>>>>>>>>
>>>>>>>>>> I also note that I get the same tar/untar performance issue
>>>>>>>>>> with FhGFS/BeeGFS, but the rest (du, find, rm) looks OK.
>>>>>>>>>>
>>>>>>>>>> Thank you very much for your reply and help.
>>>>>>>>>> Geoffrey
>>>>>>>>>> -----------------------------------------------
>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>
>>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>>>>>>>>> <mailto:geoffrey.letessier at cnrs.fr>
>>>>>>>>>>
>>>>>>>>>>> On 2 June 2015, at 21:53, Ben Turner <bturner at redhat.com
>>>>>>>>>>> <mailto:bturner at redhat.com>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I am seeing problems on 3.7 as well. Can you check
>>>>>>>>>>> /var/log/messages on both the clients and servers for hung
>>>>>>>>>>> tasks like:
>>>>>>>>>>>
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: iozone D 0000000000000001 0 21999 1 0x00000080
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: Call Trace:
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>>>>>>>>>>>
>>>>>>>>>>> Do you see a perf problem with just a simple dd, or do you need
>>>>>>>>>>> a more complex workload to hit the issue? I think I saw an
>>>>>>>>>>> issue with metadata performance that I am trying to run down;
>>>>>>>>>>> let me know if you can see the problem with simple dd
>>>>>>>>>>> reads/writes or if we need to do some sort of dir/metadata
>>>>>>>>>>> access as well.
>>>>>>>>>>>
>>>>>>>>>>> -b
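A quick way to scan for such reports, as a rough sketch (the log path matches Ben's excerpt; the context line counts are arbitrary):

  # print each hung-task report with enough surrounding lines to capture the call trace
  grep -B 2 -A 25 'hung_task_timeout_secs' /var/log/messages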
>>>>>>>>>>>
>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr <mailto:geoffrey.letessier at cnrs.fr>>
>>>>>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>
>>>>>>>>>>>> Cc: gluster-users at gluster.org <mailto:gluster-users at gluster.org>
>>>>>>>>>>>> Sent: Tuesday, June 2, 2015 8:09:04 AM
>>>>>>>>>>>> Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm sorry, but I cannot give you any comparison, because it
>>>>>>>>>>>> would be distorted by the fact that in my production HPC
>>>>>>>>>>>> cluster the network technology is InfiniBand QDR and my
>>>>>>>>>>>> volumes are quite different (bricks in RAID6 (12x2TB), 2
>>>>>>>>>>>> bricks per server and 4 servers in my pool).
>>>>>>>>>>>>
>>>>>>>>>>>> Concerning your request, attached you can find all the
>>>>>>>>>>>> expected results, hoping they can help you solve this serious
>>>>>>>>>>>> performance issue (maybe I need to play with GlusterFS
>>>>>>>>>>>> parameters?).
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you very much in advance,
>>>>>>>>>>>> Geoffrey
>>>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>>>>> <mailto:geoffrey.letessier at ibpc.fr>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2 June 2015, at 10:09, Pranith Kumar Karampuri
>>>>>>>>>>>> <pkarampu at redhat.com <mailto:pkarampu at redhat.com>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> hi Geoffrey,
>>>>>>>>>>>> Since you are saying it happens on all types of volumes, let's
>>>>>>>>>>>> do the following:
>>>>>>>>>>>> 1) Create a dist-repl volume
>>>>>>>>>>>> 2) Set the options etc. you need.
>>>>>>>>>>>> 3) Enable gluster volume profiling using "gluster volume profile <volname> start"
>>>>>>>>>>>> 4) Run the workload
>>>>>>>>>>>> 5) Give the output of "gluster volume profile <volname> info"
>>>>>>>>>>>>
>>>>>>>>>>>> Repeat the steps above on the new and old versions you are
>>>>>>>>>>>> comparing. That should give us insight into what could be
>>>>>>>>>>>> causing the slowness.
>>>>>>>>>>>>
>>>>>>>>>>>> Pranith
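A sketch of steps 3-5 above, assuming a volume name of vol_test (the name is made up; any of the test volumes from this thread would do):

  gluster volume profile vol_test start
  # ... run the untar/du/find/tar/rm workload on a client mount ...
  gluster volume profile vol_test info
  gluster volume profile vol_test stop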
>>>>>>>>>>>> On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a crash-test cluster where I've tested the new version
>>>>>>>>>>>> of GlusterFS (v3.7) before upgrading my production HPC
>>>>>>>>>>>> cluster. But... all my tests show very, very poor performance.
>>>>>>>>>>>>
>>>>>>>>>>>> For my benchmarks, as you can read below, I run a few
>>>>>>>>>>>> operations (untar, du, find, tar, rm) on the Linux kernel
>>>>>>>>>>>> sources, dropping caches, each on distributed, replicated,
>>>>>>>>>>>> distributed-replicated and single (single-brick) volumes, and
>>>>>>>>>>>> on the native FS of one brick.
>>>>>>>>>>>>
>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>
>>>>>>>>>>>> And here are the process times:
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>> |             | UNTAR  | DU    | FIND   | TAR    | RM     |
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>> | single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
>>>>>>>>>>>> | replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
>>>>>>>>>>>> | distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
>>>>>>>>>>>> | dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
>>>>>>>>>>>> | native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>
>>>>>>>>>>>> I get the same results whether with default configurations or
>>>>>>>>>>>> with custom configurations.
>>>>>>>>>>>>
>>>>>>>>>>>> If I look at the output of the ifstat command, I note that my
>>>>>>>>>>>> I/O write processes never exceed 3 MB/s...
>>>>>>>>>>>>
>>>>>>>>>>>> The native EXT4 FS seems to be faster than the XFS one
>>>>>>>>>>>> (roughly 15-20%, but no more).
>>>>>>>>>>>>
>>>>>>>>>>>> My [test] storage cluster consists of 2 identical servers
>>>>>>>>>>>> (dual-CPU Intel Xeon X5355, 8 GB of RAM, 2x2TB HDD (no RAID)
>>>>>>>>>>>> and Gb Ethernet).
>>>>>>>>>>>>
>>>>>>>>>>>> My volume settings:
>>>>>>>>>>>> single: 1 server, 1 brick
>>>>>>>>>>>> replicated: 2 servers, 1 brick each
>>>>>>>>>>>> distributed: 2 servers, 2 bricks each
>>>>>>>>>>>> dist-repl: 2 bricks in the same server, and replica 2
>>>>>>>>>>>>
>>>>>>>>>>>> Everything looks OK in the gluster status command output.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have any idea why I get such bad results?
>>>>>>>>>>>> Thanks in advance.
>>>>>>>>>>>> Geoffrey
>>>>>>>>>>>> -----------------------------------------------
>>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>>>
>>>>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>>>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>>>>>>>>>>> <mailto:geoffrey.letessier at cnrs.fr>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>> <quota-verify.gz>

Hello Vijay,

OK: attached you can find the first 100,000 error lines from each brick's log file.

Thanks in advance for processing them, and for your help,
Best,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr

> On 10 June 2015, at 06:12, Vijaikumar M <vmallika at redhat.com> wrote:
>
> Hi Geoffrey,
>
> Grep for 'ERROR' in the log file; only those lines would be sufficient.
>
> Thanks,
> Vijay
>
> [...]
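One plausible way to produce such an extract (the per-brick log names follow the quota-verify usage quoted earlier and are assumptions; the archive name matches the attachment below):

  # keep the first 100000 error lines of each brick log and pack them for mailing
  for f in brick*.log; do
      grep 'ERROR' "$f" | head -n 100000 > "${f%.log}-errors.log"
  done
  tar czf storage-bricks-log-extract.tgz brick*-errors.log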
-------------- next part --------------
A non-text attachment was scrubbed...
Name: storage-bricks-log-extract.tgz
Type: application/octet-stream
Size: 1910223 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150610/31c2eeb5/attachment-0001.obj>

Hi Vijay,

Could you find some time to take a look at this? The only thing I found about my issue is in the Red Hat Bugzilla (https://bugzilla.redhat.com/show_bug.cgi?id=917901).

My storage and computing clusters are still in production, and I wonder whether I should warn my community about a needed production break, or whether I can apply a fix while staying in production (i.e. without updating the GlusterFS version on my storage cluster).

Thanks in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr

> On 10 June 2015, at 06:12, Vijaikumar M <vmallika at redhat.com> wrote:
>
> Hi Geoffrey,
>
> Grep for 'ERROR' in the log file; only those lines would be sufficient.
>
> Thanks,
> Vijay
>
> [...]
10:05, Vijaikumar M <vmallika at redhat.com <mailto:vmallika at redhat.com>> a ?crit : >>>>>> >>>>>> >>>>>> >>>>>> On Tuesday 09 June 2015 01:08 PM, Geoffrey Letessier wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Yes of course: >>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team >>>>>>> cl-storage1: 1608522280 /export/brick_home/brick1/amyloid_team >>>>>>> cl-storage3: 1619630616 /export/brick_home/brick1/amyloid_team >>>>>>> cl-storage1: 1614057836 /export/brick_home/brick2/amyloid_team >>>>>>> cl-storage3: 1602653808 /export/brick_home/brick2/amyloid_team >>>>>>> >>>>>>> The sum is: 6444864540 (around 6.4-6.5TB) while the quota list displays 7.7TB. >>>>>>> So, the mistake is roughly 1.2-1.3TB, in other words around 16% -which is too huge, no? >>>>>>> >>>>>>> In addition, since the quota is exceeded, i note a lot of files like following: >>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] "cd /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/; ls -ail remd_100.sh 2> /dev/null" 2>/dev/null >>>>>>> cl-storage3: 133325688 ---------T 2 tarus amyloid_team 0 16 f?vr. 10:20 remd_100.sh >>>>>>> note the ?T? at the end of perms and the file size to 0B. >>>>>>> >>>>>>> And, yesterday, some files were duplicated but not anymore... >>>>>>> >>>>>>> The worst is, previously, all these files were OK. In other words, exceeding quota made file or content deletions or corruptions? What can I do to prevent to situation for the futur -because I guess i cannot do something to rollback this situation now, right? >>>>>>> >>>>>> >>>>>> Hi Geoffrey, >>>>>> >>>>>> I tried re-creating the problem. >>>>>> >>>>>> Here is the behaviour of vi editor. >>>>>> When a file is saved in vi editor, it creates a backup file under home dir and opens the original file with 'O_TRUNC' flag and hence file was truncated. >>>>>> >>>>>> >>>>>> Here is the strace of vi editor when it gets 'EDQUOT' error: >>>>>> >>>>>> open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 3 >>>>>> write(3, "line one\nline two\n", 18) = 18 >>>>>> fsync(3) = 0 >>>>>> close(3) = -1 EDQUOT (Disk quota exceeded) >>>>>> chmod("hello", 0100644) = 0 >>>>>> open("/root/hello~", O_RDONLY) = 3 >>>>>> open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 7 >>>>>> read(3, "line one\n", 256) = 9 >>>>>> write(7, "line one\n", 9) = 9 >>>>>> read(3, "", 256) = 0 >>>>>> close(7) = -1 EDQUOT (Disk quota exceeded) >>>>>> close(3) = 0 >>>>>> >>>>>> >>>>>> To re-cover the truncated file, please find if there are any backup file 'remd_115.sh~' under '~/' or on the same dir where this file exists. If exists you can copy this file. >>>>>> >>>>>> Thanks, >>>>>> Vijay >>>>>> >>>>>> >>>>>>> Geoffrey >>>>>>> ------------------------------------------------------ >>>>>>> Geoffrey Letessier >>>>>>> Responsable informatique & ing?nieur syst?me >>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Th?orique >>>>>>> Institut de Biologie Physico-Chimique >>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris >>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr <mailto:geoffrey.letessier at ibpc.fr> >>>>>>>> Le 9 juin 2015 ? 09:01, Vijaikumar M <vmallika at redhat.com <mailto:vmallika at redhat.com>> a ?crit : >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Monday 08 June 2015 07:11 PM, Geoffrey Letessier wrote: >>>>>>>>> In addition, i notice a very big difference between the sum of DU on each brick and ? quota list ? 
display, as you can read below: >>>>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/amyloid_team >>>>>>>>> cl-storage1: 1,6T >>>>>>>>> /export/brick_home/brick1/amyloid_team >>>>>>>>> cl-storage3: 1,6T >>>>>>>>> /export/brick_home/brick1/amyloid_team >>>>>>>>> cl-storage1: 1,6T >>>>>>>>> /export/brick_home/brick2/amyloid_team >>>>>>>>> cl-storage3: 1,6T >>>>>>>>> /export/brick_home/brick2/amyloid_team >>>>>>>>> [root at lucifer ~]# gluster volume quota vol_home list /amyloid_team >>>>>>>>> Path Hard-limit Soft-limit Used Available >>>>>>>>> -------------------------------------------------------------------------------- >>>>>>>>> /amyloid_team 9.0TB 90% 7.8TB 1.2TB >>>>>>>>> >>>>>>>>> As you can notice, the sum of all bricks gives me roughly 6.4TB and ? quota list ? around 7.8TB; so there is a difference of 1.4TB i?m not able to explain? Do you have any idea? >>>>>>>>> >>>>>>>> >>>>>>>> There were few issues when quota accounting the size, we have fixed some of these issues in 3.7 >>>>>>>> 'df -h' will round off the values, can you please provide the output of 'df' without -h option? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Geoffrey >>>>>>>>> ------------------------------------------------------ >>>>>>>>> Geoffrey Letessier >>>>>>>>> Responsable informatique & ing?nieur syst?me >>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Th?orique >>>>>>>>> Institut de Biologie Physico-Chimique >>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris >>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr <mailto:geoffrey.letessier at ibpc.fr> >>>>>>>>>> Le 8 juin 2015 ? 14:30, Geoffrey Letessier <geoffrey.letessier at cnrs.fr <mailto:geoffrey.letessier at cnrs.fr>> a ?crit : >>>>>>>>>> >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> Concerning the 3.5.3 version of GlusterFS, I met this morning a strange issue writing file when quota is exceeded. >>>>>>>>>> >>>>>>>>>> One person of my lab, whose her quota is exceeded (but she didn?t know about) try to modify a file but, because of exceeded quota, she was unable to and decided to exit VI. Now, her file is empty/blank as you can read below: >>>>>>>> we suspect 'vi' might have created tmp file before writing to a file. We are working on re-creating this problem and will update you on the same. >>>>>>>> >>>>>>>> >>>>>>>>>> pdsh at lucifer: cl-storage3: ssh exited with exit code 2 >>>>>>>>>> cl-storage1: ---------T 2 tarus amyloid_team 0 19 f?vr. 12:34 /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh >>>>>>>>>> cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0 8 juin 12:38 /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh >>>>>>>>>> >>>>>>>>>> In addition, i dont understand why, my volume being a distributed volume inside replica (cl-storage[1,3] is replicated only on cl-storage[2,4]), i have 2 ? same ? files (complete path) in 2 different bricks (as you can read above). >>>>>>>>>> >>>>>>>>>> Thanks by advance for your help and clarification. 
>>>>>>>>>> Geoffrey
>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>>>
>>>>>>>>>>> Le 2 juin 2015 à 23:45, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> a écrit :
>>>>>>>>>>>
>>>>>>>>>>> Hi Ben,
>>>>>>>>>>>
>>>>>>>>>>> I just checked my messages log files, both on client and server, and I don't find any of the hung tasks you noticed on yours.
>>>>>>>>>>>
>>>>>>>>>>> As you can read below, I don't see the performance issue with a simple dd, but I think my issue concerns sets of small files (tens of thousands, or even more)...
>>>>>>>>>>>
>>>>>>>>>>> [root at nisus test]# ddt -t 10g /mnt/test/
>>>>>>>>>>> Writing to /mnt/test/ddt.8362 ... syncing ... done.
>>>>>>>>>>> sleeping 10 seconds ... done.
>>>>>>>>>>> Reading from /mnt/test/ddt.8362 ... done.
>>>>>>>>>>>         10240MiB   KiB/s   CPU%
>>>>>>>>>>> Write     114770       4
>>>>>>>>>>> Read       40675       4
>>>>>>>>>>>
>>>>>>>>>>> For info: /mnt/test is the single v2 GlusterFS volume.
>>>>>>>>>>>
>>>>>>>>>>> [root at nisus test]# ddt -t 10g /mnt/fhgfs/
>>>>>>>>>>> Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
>>>>>>>>>>> sleeping 10 seconds ... done.
>>>>>>>>>>> Reading from /mnt/fhgfs/ddt.8380 ... done.
>>>>>>>>>>>         10240MiB   KiB/s   CPU%
>>>>>>>>>>> Write     102591       1
>>>>>>>>>>> Read       98079       2
>>>>>>>>>>>
>>>>>>>>>>> Do you have an idea how to tune/optimize the performance settings and/or TCP settings (MTU, etc.)?
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> |             | UNTAR  |  DU   | FIND   |  TAR   |   RM   |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | BeeGFS      | ~3m43s | ~15s  | ~3s    | ~1m33s | ~46s   |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | single (v2) | ~3m6s  | ~14s  | ~32s   | ~1m2s  | ~44s   |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> For info:
>>>>>>>>>>> - BeeGFS is a distributed FS (4 bricks, 2 bricks per server and 2 servers)
>>>>>>>>>>> - single (v2): simple gluster volume with default settings
>>>>>>>>>>>
>>>>>>>>>>> I also note that I get the same tar/untar performance issue with FhGFS/BeeGFS, but the rest (DU, FIND, RM) looks OK.
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much for your reply and help.
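If the ddt tool is not at hand, roughly the same streaming test can be sketched with plain dd (mount point and file size copied from the runs above; the cache drop makes the read go to the servers instead of the local page cache):

  dd if=/dev/zero of=/mnt/test/ddtest.bin bs=1M count=10240 conv=fsync   # ~10GiB sequential write
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/mnt/test/ddtest.bin of=/dev/null bs=1M                          # read it back
  rm -f /mnt/test/ddtest.bin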
>>>>>>>>>>> Geoffrey
>>>>>>>>>>> -----------------------------------------------
>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>>
>>>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>>>>>>>>>> Le 2 juin 2015 à 21:53, Ben Turner <bturner at redhat.com> a écrit :
>>>>>>>>>>>
>>>>>>>>>>>> I am seeing problems on 3.7 as well. Can you check /var/log/messages on both the clients and servers for hung tasks like:
>>>>>>>>>>>>
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: iozone D 0000000000000001 0 21999 1 0x00000080
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: Call Trace:
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
>>>>>>>>>>>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>>>>>>>>>>>>
>>>>>>>>>>>> Do you see a perf problem with just a simple DD, or do you need a more complex workload to hit the issue? I think I saw an issue with metadata performance that I am trying to run down; let me know if you can see the problem with simple DD reads/writes or if we need to do some sort of dir/metadata access as well.
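A quick way to scan every client and server for these reports is to grep the kernel log for the hung-task detector messages quoted above (log path as on RHEL/CentOS), e.g.:

  # the detector logs "task ... blocked for more than N seconds" alongside
  # the hung_task_timeout_secs hint; either string marks an affected host
  grep -E "hung_task_timeout_secs|blocked for more than" /var/log/messages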
>>>>>>>>>>>>
>>>>>>>>>>>> -b
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
>>>>>>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>>>>>>> Cc: gluster-users at gluster.org
>>>>>>>>>>>>> Sent: Tuesday, June 2, 2015 8:09:04 AM
>>>>>>>>>>>>> Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm sorry, but I cannot give you any comparison, because it would be distorted by the fact that in my production HPC cluster the network technology is InfiniBand QDR and my volumes are quite different (bricks in RAID6 (12x2TB), 2 bricks per server and 4 servers in the pool).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Concerning your request, you can find all the expected results in the attachments, hoping they can help you solve this serious performance issue (maybe I need to play with glusterfs parameters?).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you very much in advance,
>>>>>>>>>>>>> Geoffrey
>>>>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>>>>>>
>>>>>>>>>>>>> Le 2 juin 2015 à 10:09, Pranith Kumar Karampuri <pkarampu at redhat.com> a écrit :
>>>>>>>>>>>>>
>>>>>>>>>>>>> hi Geoffrey,
>>>>>>>>>>>>> Since you are saying it happens on all types of volumes, let's do the following (see the profile command sketch after the benchmark listing below):
>>>>>>>>>>>>> 1) Create a dist-repl volume
>>>>>>>>>>>>> 2) Set the options etc. you need
>>>>>>>>>>>>> 3) Enable gluster volume profiling using "gluster volume profile <volname> start"
>>>>>>>>>>>>> 4) Run the workload
>>>>>>>>>>>>> 5) Give the output of "gluster volume profile <volname> info"
>>>>>>>>>>>>>
>>>>>>>>>>>>> Repeat the steps above on the new and old versions you are comparing. That should give us insight into what could be causing the slowness.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>> On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a crash-test cluster where I tested the new version of GlusterFS (v3.7) before upgrading my production HPC cluster.
>>>>>>>>>>>>> But all my tests show very, very low performance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For my benches, as you can read below, I run some actions (untar, du, find, tar, rm) on the Linux kernel sources, dropping caches, each on distributed, replicated, distributed-replicated and single (single brick) volumes, plus the native FS of one brick.
>>>>>>>>>>>>>
>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>>
>>>>>>>>>>>>> And here are the process times:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> |             | UNTAR  |  DU   | FIND   |  TAR   |   RM   |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> | single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> | replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> | distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> | dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> | native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> I get the same results with default configurations as with custom configurations.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If I look at the ifstat command output, I note that my IO write processes never exceed 3MB/s...
>>>>>>>>>>>>>
>>>>>>>>>>>>> The EXT4 native FS seems to be faster than the XFS one (roughly 15-20%, but no more).
>>>>>>>>>>>>>
>>>>>>>>>>>>> My [test] storage cluster is composed of 2 identical servers (bi-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb Ethernet).
>>>>>>>>>>>>>
>>>>>>>>>>>>> My volume settings:
>>>>>>>>>>>>> single: 1 server, 1 brick
>>>>>>>>>>>>> replicated: 2 servers, 1 brick each
>>>>>>>>>>>>> distributed: 2 servers, 2 bricks each
>>>>>>>>>>>>> dist-repl: 2 bricks in the same server and replica 2
>>>>>>>>>>>>>
>>>>>>>>>>>>> All seems to be OK in the gluster status command line.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you have an idea why I obtain such bad results?
>>>>>>>>>>>>> Thanks in advance.
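For reference, Pranith's profiling steps (3)-(5) above correspond to the following commands; "testvol" is only a placeholder volume name:

  gluster volume profile testvol start   # begin collecting per-brick FOP statistics
  # ... run the untar/du/find/tar/rm workload here ...
  gluster volume profile testvol info    # dump per-FOP latency and call counts
  gluster volume profile testvol stop

Comparing the "info" output between 3.5.x and 3.7 for the same workload should show which FOPs got slower.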
>>>>>>>>>>>>> Geoffrey
>>>>>>>>>>>>> -----------------------------------------------
>>>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>>>>
>>>>>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>>>>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>>
>>> <quota-verify.gz>