On 2/27/2015 4:52 PM, Khemara Lyn wrote:
> I understand; I tried it in the hope that, I could activate the LV again
> with a new PV replacing the damaged one. But still I could not activate
> it.
>
> What is the right way to recover the remaining PVs left?

take a filing cabinet packed full of 10s of 1000s of files of 100s of
pages each, with the index cards interleaved in the files, and remove
1/4th of the pages in the folders, including some of the indexes... and
toss everything else on the floor... this is what you have. 3 out of 4
pages, semi-randomly with no idea whats what.

a LV built from PV's that are just simple drives is something like
RAID0, which isn't RAID at all, as there's no redundancy, its AID-0.

-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast
On Fri, 27 Feb 2015 19:24:57 -0800
John R Pierce <pierce at hogranch.com> wrote:
> On 2/27/2015 4:52 PM, Khemara Lyn wrote:
> >
> > What is the right way to recover the remaining PVs left?
>
> take a filing cabinet packed full of 10s of 1000s of files of 100s of
> pages each, with the index cards interleaved in the files, and
> remove 1/4th of the pages in the folders, including some of the
> indexes... and toss everything else on the floor... this is what
> you have. 3 out of 4 pages, semi-randomly with no idea whats what.

And this is why I don't like LVM to begin with. If one of the drives
dies, you're screwed not only for the data on that drive, but even for
data on remaining healthy drives.

I never really saw the point of LVM. Storing data on plain physical
partitions, having an intelligent directory structure and a few wise
well-placed symlinks across the drives can go a long way in having
flexible storage, which is way more robust than LVM. With today's huge
drive capacities, I really see no reason to adjust the sizes of
partitions on-the-fly, and putting several TB of data in a single
directory is just Bad Design to begin with.

That said, if you have a multi-TB amount of critical data while not
having at least a simple RAID-1 backup, you are already standing in a
big pile of sh*t just waiting to become obvious, regardless of LVM and
stuff. Hardware fails, and storing data without a backup is just simply
a disaster waiting to happen.

Best, :-)
Marko
On Fri, Feb 27, 2015 at 8:24 PM, John R Pierce <pierce at hogranch.com> wrote:
> On 2/27/2015 4:52 PM, Khemara Lyn wrote:
>>
>> I understand; I tried it in the hope that, I could activate the LV again
>> with a new PV replacing the damaged one. But still I could not activate
>> it.
>>
>> What is the right way to recover the remaining PVs left?
>
> take a filing cabinet packed full of 10s of 1000s of files of 100s of pages
> each, with the index cards interleaved in the files, and remove 1/4th of
> the pages in the folders, including some of the indexes... and toss
> everything else on the floor... this is what you have. 3 out of 4
> pages, semi-randomly with no idea whats what.
>
> a LV built from PV's that are just simple drives is something like RAID0,
> which isn't RAID at all, as there's no redundancy, its AID-0.

If the LE to PE relationship is exactly linear, as in, the PV, VG, LV
were all made at the same time, it's not entirely hopeless. There will
be some superblocks intact so scraping is possible.

I just tried this with a 4 disk LV and XFS. I removed the 3rd drive. I
was able to activate the LV using:

vgchange -a y --activationmode partial

I was able to mount -o ro but I do get errors in dmesg:

[ 1594.835766] XFS (dm-1): Mounting V4 Filesystem
[ 1594.884172] XFS (dm-1): Ending clean mount
[ 1602.753606] XFS (dm-1): metadata I/O error: block 0x5d780040 ("xfs_trans_read_buf_map") error 5 numblks 16
[ 1602.753623] XFS (dm-1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.

# ls -l
ls: cannot access 4: Input/output error
total 0
drwxr-xr-x. 3 root root 16 Feb 27 20:40 1
drwxr-xr-x. 3 root root 16 Feb 27 20:43 2
drwxr-xr-x. 3 root root 16 Feb 27 20:47 3
??????????? ? ?    ?     ?            ? 4

# cp -a 1/ /mnt/btrfs
cp: cannot stat '1/usr/include': Input/output error
cp: cannot stat '1/usr/lib/alsa/init': Input/output error
cp: cannot stat '1/usr/lib/cups': Input/output error
cp: cannot stat '1/usr/lib/debug': Input/output error
[...]

And now in dmesg, thousands of:

[ 1663.722490] XFS (dm-1): metadata I/O error: block 0x425f96d0 ("xfs_trans_read_buf_map") error 5 numblks 8

Out of what should have been 3.5GB of data in 1/, I was able to get
452MB. That's not so bad for just a normal mount and copy. I am in fact
shocked the file system mounts, and stays mounted. Yay XFS.

-- 
Chris Murphy
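For anyone who wants to reproduce a test like the one Chris describes, here
is a rough sketch using loop devices. The sizes, device names, and the
VG/LV names are made up for illustration; Chris did not post his exact
setup.

# create four scratch images and loop devices to stand in for four PVs
for i in 1 2 3 4; do truncate -s 1G /tmp/pv$i.img; done
for i in 1 2 3 4; do losetup /dev/loop$i /tmp/pv$i.img; done

# build a plain linear LV spanning all four PVs, put XFS on it
pvcreate /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4
vgcreate testvg /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4
lvcreate -l 100%FREE -n testlv testvg
mkfs.xfs /dev/testvg/testlv
mkdir -p /mnt/test
mount /dev/testvg/testlv /mnt/test
# ... copy some data in, then simulate losing the third PV
umount /mnt/test
vgchange -a n testvg
losetup -d /dev/loop3

# with a PV missing, only a partial activation will succeed
# (a "pvscan --cache" may be needed first if lvmetad is in use)
vgchange -a y --activationmode partial testvg
mount -o ro /dev/testvg/testlv /mnt/test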
On 2/27/2015 8:00 PM, Marko Vojinovic wrote:
> And this is why I don't like LVM to begin with. If one of the drives
> dies, you're screwed not only for the data on that drive, but even for
> data on remaining healthy drives.

With classic LVM, you were supposed to use RAID for your PVs. The newer
LVM in 6.3+ has integrated RAID at the LV level; you just have to declare
all your LVs with the appropriate RAID levels.

-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast
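For reference, a rough sketch of what that looks like with a recent LVM2;
the VG, LV, and device names here are made up:

# create a new LV as a RAID1 mirror (one extra copy, two legs total)
lvcreate --type raid1 -m 1 -L 100G -n data myvg

# or convert an existing linear LV to RAID1 after adding another PV
vgextend myvg /dev/sdc
lvconvert --type raid1 -m 1 myvg/data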
OK, so ext4 this time, with new disk images. I notice at mkfs.ext4 time
that each virtual disk goes from 2MB to 130MB-150MB. That's a lot of fs
metadata, and it's fairly evenly distributed across each drive.

Copied 3.5GB to the volume. Unmount. Poweroff. Killed the 3rd of 4.
Boot. Mounts fine. No errors. HUH, surprising. As soon as I use ls,
though:

[  182.461819] EXT4-fs error (device dm-1): __ext4_get_inode_loc:3806: inode #43384833: block 173539360: comm ls: unable to read itable block

# cp -a usr /mnt/btrfs
cp: cannot stat 'usr': Input/output error

[  214.411859] EXT4-fs error (device dm-1): __ext4_get_inode_loc:3806: inode #43384833: block 173539360: comm ls: unable to read itable block
[  221.067689] EXT4-fs error (device dm-1): __ext4_get_inode_loc:3806: inode #43384833: block 173539360: comm cp: unable to read itable block

I can't get anything off the drive. And what I have here are ideal
conditions, because it's a brand new clean file system: no fragmentation,
nothing about the LVM volume has been modified, no fsck done. So nothing
is corrupt; it's just missing a 1/4 hunk of its PEs. I'd say an older
production-use fs has zero chance of recovery via mounting. So this is
now a scraping operation with ext4.

Chris Murphy
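A rough idea of what that ext4 scrape could look like, using debugfs in
catastrophic mode, which opens the filesystem read-only and skips the
block and inode bitmaps. The device path, directory name, and destination
are illustrative only:

# -c is catastrophic mode (read-only, bitmaps ignored);
# -R runs a single debugfs request non-interactively
debugfs -c -R 'ls -l /' /dev/mapper/testvg-testlv

# recursively dump a directory that is still reachable onto another disk
debugfs -c -R 'rdump /1 /mnt/recovery' /dev/mapper/testvg-testlv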
On Fri, Feb 27, 2015 at 9:00 PM, Marko Vojinovic <vvmarko at gmail.com> wrote:
> And this is why I don't like LVM to begin with. If one of the drives
> dies, you're screwed not only for the data on that drive, but even for
> data on remaining healthy drives.

It has its uses, just like RAID0 has uses. But yes, as the number of
drives in the pool increases, the risk of catastrophic failure
increases. So you have to bet on consistent backups and be OK with any
intervening data loss. If not, well, use RAID1+ or use a
distributed-replication cluster like GlusterFS or Ceph.

> Hardware fails, and storing data without a backup is just simply
> a disaster waiting to happen.

I agree. I kind of get a wee bit aggressive and say: if you don't have
backups, the data is by (your own) definition not important.

Anyway, changing the underlying storage as little as possible gives the
best chance of success. The linux-raid@ list is full of raid5/6
implosions caused by people panicking, reading a bunch of stuff, not
identifying their actual problem, and just typing a bunch of commands,
ending up with user-induced data loss.

In the case of this thread, I'd say the best chance for success is to
not remove or replace the dead PV, but to do a partial activation:

# vgchange -a y --activationmode partial

And then for ext4 it's a scrape operation with debugfs -c. And for XFS
it looks like some amount of data is possibly recoverable with just an
ro mount. I didn't try any scrape operation, too tedious to test.

-- 
Chris Murphy
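In that spirit of changing as little as possible, it is probably worth
recording what LVM currently sees and saving a copy of the VG metadata
before running even the partial activation. A sketch, with made-up VG and
path names:

# confirm how LVM sees things: the dead PV should show as missing/unknown,
# and a linear segment type means the surviving extents map directly to PVs
pvs -v
vgs -o +vg_uuid
lvs -a -o +devices,segtype

# keep a copy of the current VG metadata before touching anything
vgcfgbackup -f /root/myvg-metadata-backup.txt myvg

# then the partial activation Chris describes, followed by a read-only mount
vgchange -a y --activationmode partial myvg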
On Fri, February 27, 2015 10:00 pm, Marko Vojinovic wrote:
> On Fri, 27 Feb 2015 19:24:57 -0800
> John R Pierce <pierce at hogranch.com> wrote:
>> On 2/27/2015 4:52 PM, Khemara Lyn wrote:
>> >
>> > What is the right way to recover the remaining PVs left?
>>
>> take a filing cabinet packed full of 10s of 1000s of files of 100s of
>> pages each, with the index cards interleaved in the files, and
>> remove 1/4th of the pages in the folders, including some of the
>> indexes... and toss everything else on the floor... this is what
>> you have. 3 out of 4 pages, semi-randomly with no idea whats what.
>
> And this is why I don't like LVM to begin with. If one of the drives
> dies, you're screwed not only for the data on that drive, but even for
> data on remaining healthy drives.
>
> I never really saw the point of LVM. Storing data on plain physical
> partitions, having an intelligent directory structure and a few wise
> well-placed symlinks across the drives can go a long way in having
> flexible storage, which is way more robust than LVM. With today's huge
> drive capacities, I really see no reason to adjust the sizes of
> partitions on-the-fly, and putting several TB of data in a single
> directory is just Bad Design to begin with.
>
> That said, if you have a multi-TB amount of critical data while not
> having at least a simple RAID-1 backup, you are already standing in a
> big pile of sh*t just waiting to become obvious, regardless of LVM and
> stuff. Hardware fails, and storing data without a backup is just simply
> a disaster waiting to happen.
>

Indeed. That is why there is no LVM in my server room. Even no software
RAID. Software RAID relies on the system itself to fulfill its RAID
function; what if the kernel panics before software RAID does its job?
Hardware RAID (for huge filesystems I cannot afford to back up) is the
only thing that makes sense for me. A RAID controller has dedicated
processors and a dedicated, simple system which does one simple task:
RAID.

Just my $0.02

Valeri

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++
----- Original Message -----
| On Fri, 27 Feb 2015 19:24:57 -0800
| John R Pierce <pierce at hogranch.com> wrote:
| > On 2/27/2015 4:52 PM, Khemara Lyn wrote:
| > >
| > > What is the right way to recover the remaining PVs left?
| >
| > take a filing cabinet packed full of 10s of 1000s of files of 100s of
| > pages each, with the index cards interleaved in the files, and
| > remove 1/4th of the pages in the folders, including some of the
| > indexes... and toss everything else on the floor... this is what
| > you have. 3 out of 4 pages, semi-randomly with no idea whats what.
|
| And this is why I don't like LVM to begin with. If one of the drives
| dies, you're screwed not only for the data on that drive, but even for
| data on remaining healthy drives.
|
| I never really saw the point of LVM. Storing data on plain physical
| partitions, having an intelligent directory structure and a few wise
| well-placed symlinks across the drives can go a long way in having
| flexible storage, which is way more robust than LVM. With today's huge
| drive capacities, I really see no reason to adjust the sizes of
| partitions on-the-fly, and putting several TB of data in a single
| directory is just Bad Design to begin with.
|
| That said, if you have a multi-TB amount of critical data while not
| having at least a simple RAID-1 backup, you are already standing in a
| big pile of sh*t just waiting to become obvious, regardless of LVM and
| stuff. Hardware fails, and storing data without a backup is just simply
| a disaster waiting to happen.
|
| Best, :-)
| Marko
|

This is not an LVM vs. physical partitioning problem. This is a "system
component failed, wasn't being monitored, and now we're in deep doo-doo"
problem. The problem also came to us after many recovery attempts that
were likely done incorrectly. If the disk were still at least partially
accessible (monitoring would have caught that), there would be increased
chances of data recovery, although maybe not much better.

People who understand how to use the system do not suffer these problems.
LVM adds a bit of complexity for a bit of extra benefit. You can't blame
LVM for user error. Not having monitoring or backups in place is a user
problem, not an LVM one.

I have managed petabytes worth of data on LVM and not suffered this sort
of problem *knock on wood*, but I also know that I'm not immune to it. I
don't even use partitions for anything but system drives; I use whole-disk
PVs to avoid things like partition alignment issues. Not a single bit of
data loss in 7 years dealing with these servers, either. At least none
that weren't user error. ;)

-- 
James A. Peltier
IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices
Twitter : @sfu_rcg
Powering Engagement Through Technology
"Build upon strengths and weaknesses will generally take care of themselves" - Joyce C. Lock
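For what it's worth, the whole-disk-PV layout James describes is just a
matter of handing unpartitioned disks straight to LVM; a minimal sketch
with made-up device and VG names:

# no partition table at all: each whole disk becomes a PV
pvcreate /dev/sdb /dev/sdc
vgcreate datavg /dev/sdb /dev/sdc
lvcreate -l 100%FREE -n data datavg
mkfs.xfs /dev/datavg/data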