Hello,

I think this is the second time this has happened to me. A couple of years ago, I deleted a big (500G) zvol and the machine started to hang some 20 minutes later (out of memory); even rebooting didn't help. But with the great support from Victor Latushkin, who on a weekend helped me debug the problem (abort the transaction and restart it again, which required some black magic and recompiling of ZFS), it worked.

Now I'm facing a similar problem. I was writing about 20GB (from CIFS) to a filesystem. While that was going on, I deleted some old files, freeing up about 60GB in the process. After Windows was done deleting those (it was instant), I tried to delete another file, which I didn't have permission to. So I SSHed to the machine and removed it manually (pfexec rm file). And that's where the problems started.

First, I noticed the rm wasn't instant. It was taking a long time (over 5 minutes). I tried Ctrl-C, Ctrl-Z, another SSH and kill; nothing worked. After a while it died with "killed". I did a "zfs list" and noticed the free space wasn't updated.

I tried "sync"; it also hangs. I try a reboot - it won't, I guess because it's waiting for the sync to finish. So I hard-reboot the machine. When it comes back I can access the ZFS pool again. I go to the directory where I tried to delete the files with "rm": the files are still there (they weren't before the reboot).

I try a "sync" again. Same result (hang). "top" shows a decreasing amount of free memory. zpool iostat 5 shows:

rpool       69.4G  79.6G      0      0      0      0
tera        3.12T   513G     63      0   144K      0
----------  -----  -----  -----  -----  -----  -----
rpool       69.4G  79.6G      0      0      0      0
tera        3.12T   513G     63      0   142K      0
----------  -----  -----  -----  -----  -----  -----
rpool       69.4G  79.6G      0      0      0      0
tera        3.12T   513G     62      0   142K      0
----------  -----  -----  -----  -----  -----  -----
rpool       69.4G  79.6G      0      0      0      0
tera        3.12T   513G     64      0   144K      0
----------  -----  -----  -----  -----  -----  -----
rpool       69.4G  79.6G      0      0      0      0
tera        3.12T   513G     65      0   148K      0

Could this be related to the fact that I THINK I enabled deduplication on this pool a while ago (but then disabled it due to performance reasons)?

What should I do? Do I have to wait for these "reads" to finish? Why are they so slow anyway?

Thanks,
Hernan
Hi,

some information is missing...

How large is your ARC / your main memory?
  Probably too small to hold all metadata (about 1/1000 of the data amount).
  => metadata has to be read again and again

A recordsize smaller than 128k increases the problem.

It's a data volume, perhaps raidz or raidz2, and you are using an older ZPOOL version?
  Reading is done for the whole raidz stripe when you are reading a block.
  => the whole raidz stripe has the attributes of a single disk (see Roch's blog).

The number of files is not specified.

Updating the dedup table needs random access to the table.

~60 reads per second is normal for a SATA disk with 7200 RPM.

So far nothing surprising...

Regards,

    Ulrich
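(As a quick check of how much room the ARC actually has for metadata, the ZFS kstats can be read directly; a minimal sketch, assuming the standard zfs:0:arcstats kstat names of OpenSolaris-era builds - verify the names on your system:)

  # current ARC size, its target, and the hard maximum, in bytes
  kstat -p zfs:0:arcstats:size
  kstat -p zfs:0:arcstats:c
  kstat -p zfs:0:arcstats:c_max

  # installed physical memory, for comparison
  prtconf | grep -i memory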
Hi, thanks for answering,

> How large is your ARC / your main memory?
>   Probably too small to hold all metadata (about 1/1000 of the data amount).
>   => metadata has to be read again and again

Main memory is 8GB. ARC (according to arcstat.pl) usually stays at 5-7GB.

> A recordsize smaller than 128k increases the problem.

recordsize is default, 128k.

> It's a data volume, perhaps raidz or raidz2, and you are using an older ZPOOL version?

It's raidz, pool version is 22.

>   Reading is done for the whole raidz stripe when you are reading a block.
>   => the whole raidz stripe has the attributes of a single disk (see Roch's blog).
>
> The number of files is not specified.

Some 20 files deleted, each about 4GB in size.

> Updating the dedup table needs random access to the table.

dedup was enabled at some point, but I disabled it long ago. Does it still matter? Should I copy all these files again (or zfs send) to un-dedup those blocks?

> ~60 reads per second is normal for a SATA disk with 7200 RPM.

Shouldn't ~60 reads per second at about 128k (not counting prefetch) be about 7MB/s, instead of the 144KB/s (!) I'm getting?
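(For what it's worth, a rough back-of-the-envelope estimate of how large the dedup table for a pool this size could be; hedged: the ~320 bytes per DDT entry figure is a community rule of thumb, the real average block size is unknown, and only data written while dedup was on actually sits in the table:)

  # ~3.12T allocated at an average 128K block size
  echo "3.12 * 1024^3 / 128" | bc -l        # roughly 26 million blocks
  # at roughly 320 bytes of in-core DDT per entry
  echo "26000000 * 320 / 1024^3" | bc -l    # roughly 7.7 GB of dedup table

If anything close to that is true, an 8GB machine with a 5-7GB ARC cannot keep the table in memory, which would fit the ~60 random reads per second seen while the frees grind through it.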
If these files are deduped, and there is not a lot of RAM on the machine, it can take a long, long time to work through the dedupe portion. I don't know enough to know if that is what you are experiencing, but it could be the problem.

How much RAM do you have?

Scott
Hi,

Hernan Freschi wrote:

> dedup was enabled at some point, but I disabled it long ago. Does it
> still matter? Should I copy all these files again (or zfs send) to
> un-dedup those blocks?

When you are writing to a file and dedup is currently enabled, the data is entered into the dedup table of the pool. (There is one dedup table per pool, not per zfs.)

Switching off dedup does not change this data. After switching off dedup, the dedup table is used until the file is deleted or overwritten. Deleting or overwriting then accesses the dedup table and corrects the reference count.

Therefore you will see the effects of the dedup table long after switching dedup off, as long as you wrote the file during a time dedup was switched on.

> Shouldn't ~60 reads per second at about 128k (not counting prefetch) be
> about 7MB/s, instead of the 144KB/s (!) I'm getting?

ZFS can use smaller blocks, and metadata is usually compressed. In the dedup table it is possible that a dedup block compresses well: when only one slot in a block is used, a 128k (logical) block can be stored in a 2k physical block.

Regards,

    Ulrich

--
Ulrich Graef / Sales Consultant / Hardware Presales / Phone: +49 6103 752 359
ORACLE Deutschland B.V. & Co. KG / Amperestr. 6 / 63225 Langen
http://www.oracle.com
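(To confirm whether the pool still carries a dedup table at all, and roughly how big it is, zdb can print a DDT summary. A minimal sketch, assuming the pool name from the iostat output (tera) and a zdb/zpool from the same dedup-capable build; it is read-only but can take a while on a busy pool:)

  # dedup table statistics, with a histogram of reference counts
  zdb -DD tera

  # the pool-wide dedup ratio as ZFS reports it
  zpool get dedupratio tera

If zdb -DD shows a non-trivial number of entries, the frees really are walking the DDT, which matches the slow random-read pattern in the iostat output above.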
On Tue, Jul 20, 2010 at 1:40 PM, Ulrich Graef <ulrich.graef at oracle.com> wrote:

> When you are writing to a file and dedup is currently enabled, the data
> is entered into the dedup table of the pool.
> (There is one dedup table per pool, not per zfs.)
>
> Switching off dedup does not change this data.

Yes, I suppose so (just as enabling dedup or compression doesn't alter on-disk data).

> After switching off dedup, the dedup table is used until the file is deleted
> or overwritten.
> Deleting or overwriting then accesses the dedup table and corrects the
> reference count.

Is there a way to see which files are using dedup? Or should I just copy everything to a new ZFS?
I have 8GB RAM; arcsz as reported by arcstat.pl is usually 5-7GB. It took about 20-30 minutes to delete the files.

Is there a way to see which files have been deduped, so I can copy them again and un-dedupe them?

Thanks,
Hernan
Hi,

> Is there a way to see which files have been deduped, so I can copy them again and un-dedupe them?

unfortunately, that's not easy (I've tried it :) ).

The issue is that the dedup table (which knows which blocks have been deduped) doesn't know about files. And if you pull block pointers for deduped blocks from the dedup table, you'll need to backtrack from there through the filesystem structure to figure out what files are associated with those blocks. (Remember: deduplication happens at the block level, not the file level.)

So, in order to compile a list of deduped _files_, one would need to extract the list of deduped _blocks_ from the dedup table, then chase the pointers from the root of the zpool to the blocks in order to figure out what files they're associated with.

Unless there's a different way that I'm not aware of (and I hope someone can correct me here), the only way to do that is to run a scrub-like process and build up a table of files and their blocks.

Cheers,
   Constantin

--
Constantin Gonzalez Schmitz | Principal Field Technologist
Phone: +49 89 460 08 25 91 || Mobile: +49 172 834 90 30
Oracle Hardware Presales Germany
On Tue, Jul 20, 2010 at 9:48 AM, Hernan Freschi <drgenio at gmail.com> wrote:

> Is there a way to see which files are using dedup? Or should I just
> copy everything to a new ZFS?

Using 'zfs send' to copy the datasets will work and preserve other metadata that copying will lose.

-B

--
Brandon High : bhigh at freaks.com
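(A minimal sketch of that rewrite, assuming a filesystem named tera/data and a snapshot name that are purely illustrative, and enough free space for the source and the copy to coexist:)

  # dedup is reportedly already off pool-wide; setting it explicitly is
  # just belt and braces so the received copy is not re-deduped
  zfs set dedup=off tera

  # snapshot the source and rewrite it into a fresh dataset
  zfs snapshot tera/data@undedup
  zfs send tera/data@undedup | zfs receive tera/data_new

  # after verifying the copy, drop the old (deduped) dataset and rename;
  # note the destroy is itself a mass free and may grind through the DDT
  # just like the rm did
  zfs destroy -r tera/data
  zfs rename tera/data_new tera/data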