Folks

I'm looking for a solution for backups because ZFS has failed on me too many times. In my environment, I have a large amount of data (around 2 TB) that I periodically back up. I keep the last 5 "snapshots". I use rsync so that when I overwrite the oldest backup, most of the data is already there and the backup completes quickly, because only a small number of files have actually changed.

Because of this low change rate, I have used ZFS with its deduplication feature to store the data. I started with a CentOS 6 installation and upgraded years ago to CentOS 7; CentOS 8 is on my agenda. However, I've had several data-loss events with ZFS where, because of a combination of errors and/or mistakes, the entire store was lost. I've also noticed that ZFS is maintained separately from CentOS, and at this moment the CentOS 8 update causes ZFS to fail. Looking for an alternative, I'm trying VDO.

In the VDO installation, I created a logical volume containing two hard drives and defined VDO on top of that logical volume. It appears to be running, yet the deduplication numbers don't pass the smell test. I would expect that if the logical volume contains three copies of essentially identical data, I should see a deduplication ratio close to 3.00, but instead I'm seeing numbers like 1.15. I compute the ratio as follows:
  - use df and extract the "used" value (in 1K blocks) from the third column
  - use vdostats --verbose and extract the number titled "1K-blocks used"
  - divide the first by the second.

Can you provide any advice on my use of ZFS or VDO without telling me that I should be doing backups differently?

Thanks

David
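For concreteness, here is that computation as a small shell sketch (the device and mount point names are placeholders, not David's actual setup):

   DEV=/dev/mapper/vdoback   # placeholder VDO device
   MNT=/backup               # placeholder mount point
   fs_used=$(df -k "$MNT" | awk 'NR==2 {print $3}')   # filesystem "used" in 1K blocks
   vdo_used=$(vdostats --verbose "$DEV" | awk -F: '/1K-blocks used/ {gsub(/ /,"",$2); print $2}')
   echo "scale=2; $fs_used / $vdo_used" | bc          # the dedupe ratio being reported

Note that vdostats' "1K-blocks used" includes the UDS index and VDO metadata, so a lightly filled volume skews this ratio low - a point that comes up later in the thread.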
Erick Perez - Quadrian Enterprises
2020-May-03 03:07 UTC
[CentOS] Understanding VDO vs ZFS
My two cents:
1- Do you have an encrypted filesystem on top of VDO? If yes, you will see no benefit from dedupe.
2- Can you post the stats of vdostats --verbose /dev/mapper/xxxxx (replace with your device)?

You can do something like:

   vdostats --verbose /dev/mapper/xxxxxxxx | grep -B6 'saving percent'

On Sat, May 2, 2020 at 9:54 PM david <david at daku.org> wrote:
> Folks
> I'm looking for a solution for backups because ZFS has failed on me
> too many times. <snip>

--
Erick Perez
Erick Perez - Quadrian Enterprises
2020-May-03 05:33 UTC
[CentOS] Understanding VDO vs ZFS
To follow up, I ran a quick test. I created a 40GB LVM volume group from /dev/sdb and /dev/sdc, then a 40GB LV, then a 60GB VDO volume on top of it (for testing purposes), checking the stats after each step with:

   vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent'

Output on the just-created vdoas:

   [root at localhost ~]# vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent'
     physical blocks     : 10483712
     logical blocks      : 15728640
     1K-blocks           : 41934848
     1K-blocks used      : 4212024
     1K-blocks available : 37722824
     used percent        : 10
     saving percent      : 99
   [root at localhost ~]#

FIRST copy of CentOS-7-x86_64-Minimal-2003.iso (1.1G) to vdoas, from a source outside the VDO volume:

     1K-blocks used      : 4721348
     1K-blocks available : 37213500
     used percent        : 11
     saving percent      : 9

SECOND copy of the same ISO, again from a source outside the VDO volume
(# cp /root/CentOS-7-x86_64-Minimal-2003.iso /mnt/vdomounts/CentOS-7-x86_64-Minimal-2003-version2.iso):

     1K-blocks used      : 5239012
     1K-blocks available : 36695836
     used percent        : 12
     saving percent      : 52

THIRD copy of the same ISO, this time from inside the VDO volume to inside the VDO volume:

     1K-blocks used      : 5248060
     1K-blocks available : 36686788
     used percent        : 12
     saving percent      : 67

Then I did this a total of 9 more times, to have 10 ISOs copied. Total data copied: 10.6GB.

Do note: df shows the VDO logical size, in my case 60G; vdostats shows the size of the underlying LV, in my case 40G. Remember, dedupe AND compression are both enabled. The df -hT output shows the logical space occupied by these ISO files as seen by the filesystem on the VDO volume: since VDO manages a logical-to-physical block map, df sees logical space consumed according to the filesystem that resides on top of the VDO volume. vdostats --hu, by contrast, views the physical block device as managed by VDO. Physically, a single ISO image resides on the disk, but logically the filesystem thinks there are 10 copies occupying 10.6GB.

So at the end I have 10 ISOs of 1086 1MB blocks each (10860 1MB blocks in total), which yield:

     1K-blocks used      : 5248212
     1K-blocks available : 36686636
     used percent        : 12
     saving percent      : 89

At the end it is using 5248212 1K-blocks; subtracting the 4212024 1K-blocks used initially gives (5248212 - 4212024) = 1036188 1K-blocks / 1024 = about 1012MB total.
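In shell terms, netting out the volume's fixed overhead (the constants are taken from the vdostats output above):

   initial=4212024   # 1K-blocks used right after 'vdo create' (UDS index + VDO metadata)
   now=5248212       # 1K-blocks used after all ten ISOs were copied
   echo $(( (now - initial) / 1024 ))   # -> 1011, i.e. ~1GB physical for ~10.6GB logical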
with only "yum install vdo kmod-kvdo" History of commands: [root at localhost vdomounts]# history 2 pvcreate /dev/sdb 3 pvcreate /dev/sdc 8 vgcreate -v -A y vgvol01 /dev/sdb /dev/sdc 9 vgdisplay 13 lvcreate -l 100%FREE -n lvvdo01 vgvol01 14 yum install vdo kmod-kvdo 18 vdo create --name=vdoas --device=/dev/vgvol01/lvvdo01 --vdoLogicalSize=60G --writePolicy=async 19 mkfs.xfs -K /dev/mapper/vdoas 20 ls /mnt 21 mkdir /mnt/vdomounts 22 mount /dev/mapper/vdoas /mnt//vdomounts/ 26 vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent' 28 cp /root/CentOS-7-x86_64-Minimal-2003.iso /mnt/vdomounts/ -vvv 29 vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent' 30 cp /root/CentOS-7-x86_64-Minimal-2003.iso /mnt/vdomounts/CentOS-7-x86_64-Minimal-2003-version2.iso 31 vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent' 33 cd /mnt/vdomounts/ 35 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version3.iso 36 vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent' 37 df 39 vdostats --hu 40 ls -l --block-size=1MB /root/CentOS-7-x86_64-Minimal-2003.iso 41 df -hT 42 vdo status | grep Dedupl 43 vdostats --hu 44 vdostats 48 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version4.iso 49 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version5.iso 50 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version6.iso 51 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version7.iso 52 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version8.iso 53 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version9.iso 54 df -hT 55 ls -l --block-size=1MB 56 vdostats --hu 57 df -hT 58 df 59 vdostats --hu 60 vdostats 61 vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent' 62 cat /etc/centos-release 63 history [root at localhost vdomounts]# On Sat, May 2, 2020 at 10:07 PM Erick Perez - Quadrian Enterprises < eperez at quadrianweb.com> wrote:> My two cents: > 1- Do you have an encrypted filesystem on top of VDO? If yes, you will see > no benefit from dedupe. > 2- can you post the stats of vdostats ?verbose /dev/mapper/xxxxx (replace > with your device) > > you can do something like: "vdostats -verbose /dev/mapper/xxxxxxxx | grep > -B6 'save percentage' > > > > > On Sat, May 2, 2020 at 9:54 PM david <david at daku.org> wrote: > >> Folks >> >> I'm looking for a solution for backups because ZFS has failed on me >> too many times. In my environment, I have a large amount of data >> (around 2tb) that I periodically back up. I keep the last 5 >> "snapshots". I use rsync so that when I overwrite the oldest backup, >> most of the data is already there and the backup completes quickly, >> because only a small number of files have actually changed. >> >> Because of this low change rate, I have used ZFS with its >> deduplication feature to store the data. I started using a Centos-6 >> installation, and upgraded years ago to Centos7. Centos 8 is on my >> agenda. However, I've had several data-loss events with ZFS where >> because of a combination of errors and/or mistakes, the entire store >> was lost. I've also noticed that ZFS is maintained separately from >> Centos. At this moment, the Centos 8 update causes ZFS to >> fail. Looking for an alternate, I'm trying VDO. >> >> In the VDO installation, I created a logical volume containing two >> hard-drives, and defined VDO on top of that logical volume. 
--
Erick Perez
Quadrian Enterprises S.A. - Panama, Republica de Panama
On 03/05/20 04:50, david wrote:
> I'm looking for a solution for backups because ZFS has failed on me
> too many times. [...] I use rsync so that when I overwrite the oldest
> backup, most of the data is already there and the backup completes
> quickly, because only a small number of files have actually changed.
> <snip>

Hi David,
I'm not an expert on VDO, but I'm going to try it for backup purposes myself, with rsync + hardlinks. I know this is not the answer you asked for, sorry about that. Several users have advised me to use a more specialized deduplicating backup tool (borg, in my case); I'm testing it, and I'm not sure yet whether I will adopt it in the long term. Since you are already using an rsync-based solution, I would ask: why not use a more specialized tool? What are the benefits, for you, of staying with rsync? Thank you in advance.
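For reference, a minimal sketch of the rsync + hardlink rotation idea (the paths and the retention count of 5 are illustrative; unchanged files are hardlinked against the previous snapshot, so they take no extra space):

   rm -rf /backup/snap.4                         # drop the oldest snapshot
   for i in 3 2 1 0; do                          # shift the rest down
       [ -d /backup/snap.$i ] && mv /backup/snap.$i /backup/snap.$((i+1))
   done
   rsync -a --delete --link-dest=/backup/snap.1 /srv/data/ /backup/snap.0/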
At 08:07 PM 5/2/2020, you wrote:
> My two cents:
> 1- Do you have an encrypted filesystem on top of VDO? If yes, you will
> see no benefit from dedupe.
> 2- can you post the stats of vdostats --verbose /dev/mapper/xxxxx
> (replace with your device)
> <snip>

BTW: I think the 'saving percent' of 13 is consistent with my computation of 1.16, if one takes into account the overhead blocks. Is that true?

[Attachment: vdostats.txt -- http://lists.centos.org/pipermail/centos/attachments/20200503/6f1b0b4a/attachment.txt]
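As a back-of-the-envelope check (assuming 'saving percent' counts savings against logical blocks used, before index/metadata overhead), the two figures are two views of the same quantity:

   # a 13% saving means physical ≈ 0.87 x logical, so:
   echo "scale=4; 1 / (1 - 0.13)" | bc   # -> 1.1494, i.e. ~1.15, in the same ballpark as 1.16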
Hi David,

In my opinion, VDO isn't worth the effort. I tried VDO for the same use case: backups. My dataset is 2-3TB and I back up daily. Even with a smaller dataset, VDO couldn't live up to its promises. It used tons of CPU and memory, and with a lot of tuning I could get it to kind of work, but it became corrupted at the slightest problem (even a shutdown could do this, and shutdowns could also take hours).

I have tried a number of things, and I now use a combination of two (see the sketch after this message):
1. a btrfs volume with force-compress enabled to store the intermediate data - it compresses my data to about 60% and that's enough for me
2. bup (https://bup.github.io/) to store long-term backups.

bup is incredibly efficient for my use case (full VM backups). Over the course of a whole month, the dataset only increases by about 30% from the initial size (I create a new full backup each month) - and this is with FULL backups of all VMs every day. bup backup sets can also be mounted via FUSE, giving you access to all stored versions in a filesystem-like manner.

If you can back up at will, you can probably forgo the btrfs volume for intermediate storage - that is just a band-aid to work around a specific issue here.

Stefan
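A minimal sketch of such a setup (the repository path, source directory, device name, and compression algorithm are illustrative, not Stefan's actual configuration):

   # btrfs staging volume with forced compression
   mount -o compress-force=zlib /dev/sdX /mnt/staging

   # bup repository for the long-term backups
   export BUP_DIR=/backup/bup
   bup init                                # create the repository once
   bup index /srv/vm-images                # scan for changed files
   bup save -n vm-images /srv/vm-images    # store a deduplicated "full" backup
   bup fuse /mnt/bup-view                  # browse every saved version via FUSE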
On Sat, May 2, 2020 at 10:54 PM david <david at daku.org> wrote:
> I'm looking for a solution for backups because ZFS has failed on me
> too many times. <snip>
> I would expect that if the logical volume contains three copies of
> essentially identical data, I should see deduplication numbers close
> to 3.00, but instead I'm seeing numbers like 1.15.

I'd like to know what kind of data you're looking to back up (that will just help give an idea of whether it's even a good fit for dedupe; though if it dedupes well on ZFS, it probably is fine). I'd also like to know how you configured your VDO volume (please provide the 'vdo create' command you used). As mentioned in some other responses, can you provide vdostats (full 'vdostats --verbose' output as well as base 'vdostats') and df outputs for this volume? That would help in understanding a bit more of what you're experiencing.

The default deduplication window for a VDO volume is ~250G (--indexMem=0.25). Assuming you're writing the full 2T of data each time and want to achieve deduplication across that entire 2T, it would require an "--indexMem=2" configuration. You may want to account for growth as well, which means considering an even larger value for '--indexMem'. Alternatively, if memory isn't plentiful, you could enable the sparse-index option to cover a significantly larger dedupe window for a smaller memory commitment; an additional on-disk footprint goes with it, and the documentation [0] lists the specific requirements. For this setup, a sparse index with the default memory footprint (0.25G) would cover ~2.5T, but would require an additional ~20G of storage over the default index configuration.

[0] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/deduplicating_and_compressing_storage/deploying-vdo_deduplicating-and-compressing-storage#vdo-memory-requirements_vdo-requirements
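As a hedged sketch, the two index configurations described above might look like this (the device path follows the test earlier in the thread; the volume name and logical size are illustrative):

   # dense index sized for ~2T of unique data
   vdo create --name=vdoback --device=/dev/vgvol01/lvvdo01 \
       --indexMem=2 --vdoLogicalSize=10T

   # or: sparse index, ~2.5T dedupe window with the default 0.25G of RAM,
   # at the cost of roughly 20G of extra on-disk index space
   vdo create --name=vdoback --device=/dev/vgvol01/lvvdo01 \
       --indexMem=0.25 --sparseIndex=enabled --vdoLogicalSize=10T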
> Can you provide any advice on my use of ZFS or VDO without telling me
> that I should be doing backups differently?

Without more information about what you're attempting to do, I can't really say that what you're doing is wrong, but I also can't say that there are any expectations of VDO that aren't being met. More context would certainly help get to the bottom of this question.
On Mon, May 4, 2020 at 10:02 AM Stefan S <stefan at kalaam.org> wrote:
> in my opinion, VDO isn't worth the effort. I tried VDO for the same
> use case: backups. [...] it became corrupted at the slightest problem
> (even a shutdown could do this, and shutdowns could also take hours).
> <snip>

I'm sorry to hear you feel that way. I would be interested to understand the situations in which you experienced these problems, so that they can be addressed better in the future. Did you reach out for any guidance when it was happening?
On Sat, May 2, 2020 at 10:54 PM david <david at daku.org> wrote:
> I'm looking for a solution for backups because ZFS has failed on me
> too many times. In my environment, I have a large amount of data
> (around 2tb) that I periodically back up. I keep the last 5
> "snapshots". <snip>

Duplicity works well on CentOS. I had to perform a restore of a website and wiki after I [accidentally] deleted both. Backups go to another machine over SSH, scheduled through systemd. A Duplicity-based backup may help protect your data until you get something in place that you like better.

Jeff
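A minimal sketch of such a setup (the host name and paths are illustrative; duplicity encrypts with GnuPG by default, so a passphrase or key is needed):

   # incremental, encrypted backup to another machine over SSH
   duplicity /srv/data sftp://backup@backuphost//srv/backups/data

   # restore the most recent backup
   duplicity restore sftp://backup@backuphost//srv/backups/data /srv/restore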
Rather than dedupe at the filesystem level, I found the application-level dedupe in BackupPC works really well. I've run BackupPC on both a big ZFS volume and on a giant XFS-over-LVM-over-MDRAID volume (24 x 3TB disks organized as 2 x 11 RAID6 plus 2 hot spares).

The BackupPC server I built at my last $job had 30 days of daily incrementals and 12 months of monthlies of about 25 servers + VMs (including Linux, Solaris, AIX, and Windows). The dedupe is done globally at the file level, so no matter how many instances of a file exist across all those backups ((30+12) * 25), there's only one file in the 'hive'.

Bonus: BackupPC has a nice web UI for retrieving backups. I could create accounts for my various developers, and they could retrieve stuff from any covered date on any of the servers they had access to, without my intervention. About the only manual intervention I ever needed over the several years this was running involved the Windows rsync client needing a PID file deleted after an unexpected reboot.

--
-john r pierce
 recycling used bits in santa cruz
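For reference, the retention John describes might look like this in BackupPC's config.pl (the option names are BackupPC's; the values are a sketch matching the description above, not his actual configuration):

   $Conf{XferMethod}  = 'rsync';
   $Conf{IncrPeriod}  = 0.97;    # run incrementals roughly daily
   $Conf{IncrKeepCnt} = 30;      # keep 30 daily incrementals
   $Conf{FullPeriod}  = 29.97;   # run fulls roughly monthly
   $Conf{FullKeepCnt} = 12;      # keep 12 monthly fulls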