Mike Snitzer
2022-Sep-21 15:08 UTC
[PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
On Tue, Sep 20 2022 at 5:48P -0400,
Daniil Lunev <dlunev at google.com> wrote:

> > There is no such thing as WRITE UNAVAILABLE in NVMe.
>
> Apologies, that is WRITE UNCORRECTABLE. Chapter 3.2.7 of
> NVM Express NVM Command Set Specification 1.0b
>
> > That being said you still haven't actually explained what problem
> > you're even trying to solve.
>
> The specific problem is the following:
> * There is a thinpool over a physical device
> * There are multiple logical volumes over the thin pool
> * Each logical volume has an independent file system and an
> independent application running over it
> * Each application is potentially allowed to consume the entirety
> of the disk space - there is no strict size limit for an application
> * Applications sometimes need to pre-allocate space, for which
> they use fallocate. Once the operation has succeeded, the application
> assumes the space is guaranteed to be there for it.
> * Since the filesystems on the volumes are independent, filesystem-level
> enforcement of size constraints is impossible and the only
> common level is the thin pool; thus, each fallocate has to find its
> representation in the thin pool one way or another - otherwise you
> may end up in a situation where the FS thinks it has allocated space
> but, when it tries to actually write it, the thin pool is already
> exhausted.
> * Hole-Punching fallocate will not reach the thin pool, so the only
> solution presently is zero-writing pre-allocate.
> * Not all storage devices support zero-writing efficiently - apart
> from NVMe being or not being capable of doing efficient write
> zero - changing which is easier said than done, and would take
> years - there are also other types of storage devices that do not
> have WRITE ZERO capability in the first place or have it in a
> peculiar way. And adding custom WRITE ZERO to LVM would be
> arguably a much bigger hack.
> * Thus, a provisioning block operation allows an interface-specific
> operation that guarantees the presence of the block in the
> mapped space. LVM Thin-pool itself is the primary target for our
> use case but the argument is that this operation maps well to
> other interfaces which allow thinly provisioned units.

Thanks for this overview. Should help level-set others.

Adding fallocate support has been a long-standing dm-thin TODO item
for me. I just never got around to it. So thanks to Sarthak, you and
anyone else who had a hand in developing this.

I had a look at the DM thin implementation and it looks pretty simple
(doesn't require a thin-metadata change, etc.). I'll look closer at
the broader implementation (block, etc.) but I'm encouraged by what
I'm seeing.

Mike
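The failure mode Daniil describes (a filesystem believes fallocate reserved space, but the shared thin pool is exhausted by another volume before the write lands) can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the dm-thin or block-layer code from the patch series: `ThinPool`, `provision`, `try_writes` and `scenario` are hypothetical names, the pool size and block counts are arbitrary, and metadata, discards and partial blocks are ignored. The only property modeled is that a physical block is consumed the first time a logical block is written or provisioned.

```python
from typing import Iterable, Set, Tuple


class ThinPool:
    """Toy thin pool: logical blocks from many volumes share a fixed
    number of physical blocks, allocated on first write or provision."""

    def __init__(self, physical_blocks: int) -> None:
        self.free = physical_blocks
        self.mapped: Set[Tuple[str, int]] = set()

    def _map(self, volume: str, block: int) -> None:
        if (volume, block) in self.mapped:
            return  # already backed by a physical block, nothing to allocate
        if self.free == 0:
            raise OSError("thin pool out of space")
        self.free -= 1
        self.mapped.add((volume, block))

    def write(self, volume: str, block: int) -> None:
        # First write to an unmapped logical block allocates a physical one.
        self._map(volume, block)

    def provision(self, volume: str, block: int) -> None:
        # Hypothetical provisioning request: allocate without writing data.
        self._map(volume, block)


def try_writes(pool: ThinPool, volume: str, blocks: Iterable[int]) -> int:
    """Return how many writes succeeded before the first out-of-space error."""
    done = 0
    for b in blocks:
        try:
            pool.write(volume, b)
        except OSError:
            break
        done += 1
    return done


def scenario(provision_on_fallocate: bool) -> Tuple[int, int]:
    pool = ThinPool(physical_blocks=4)
    # Volume "a" fallocates blocks 0-2. Without provisioning, only its
    # filesystem records the reservation; the pool is never told.
    if provision_on_fallocate:
        for b in range(3):
            pool.provision("a", b)
    # Volume "b" then writes three blocks of its own.
    b_ok = try_writes(pool, "b", range(3))
    # Finally "a" writes into the space it believes it reserved.
    a_ok = try_writes(pool, "a", range(3))
    return a_ok, b_ok


print(scenario(provision_on_fallocate=False))  # (1, 3)
print(scenario(provision_on_fallocate=True))   # (3, 1)
```

Without provisioning, volume "a" gets only one of the three writes it was promised; with provisioning, its writes all succeed and the out-of-space error moves to volume "b", which never received a guarantee. That shift is what lets fallocate's promise hold across independent filesystems sharing one pool.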