David Hildenbrand
2021-May-03 10:13 UTC
[PATCH v1 7/7] fs/proc/kcore: use page_offline_(freeze|unfreeze)
On 03.05.21 11:28, Mike Rapoport wrote:> On Mon, May 03, 2021 at 10:28:36AM +0200, David Hildenbrand wrote: >> On 02.05.21 08:34, Mike Rapoport wrote: >>> On Thu, Apr 29, 2021 at 02:25:19PM +0200, David Hildenbrand wrote: >>>> Let's properly synchronize with drivers that set PageOffline(). Unfreeze >>>> every now and then, so drivers that want to set PageOffline() can make >>>> progress. >>>> >>>> Signed-off-by: David Hildenbrand <david at redhat.com> >>>> --- >>>> fs/proc/kcore.c | 15 +++++++++++++++ >>>> 1 file changed, 15 insertions(+) >>>> >>>> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c >>>> index 92ff1e4436cb..3d7531f47389 100644 >>>> --- a/fs/proc/kcore.c >>>> +++ b/fs/proc/kcore.c >>>> @@ -311,6 +311,7 @@ static void append_kcore_note(char *notes, size_t *i, const char *name, >>>> static ssize_t >>>> read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) >>>> { >>>> + size_t page_offline_frozen = 0; >>>> char *buf = file->private_data; >>>> size_t phdrs_offset, notes_offset, data_offset; >>>> size_t phdrs_len, notes_len; >>>> @@ -509,6 +510,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) >>>> pfn = __pa(start) >> PAGE_SHIFT; >>>> page = pfn_to_online_page(pfn); >>> >>> Can't this race with page offlining for the first time we get here? >> >> >> To clarify, we have three types of offline pages in the kernel ... >> >> a) Pages part of an offline memory section; the memap is stale and not >> trustworthy. pfn_to_online_page() checks that. We *can* protect against >> memory offlining using get_online_mems()/put_online_mems(), but usually >> avoid doing so as the race window is very small (and a problem all over the >> kernel we basically never hit) and locking is rather expensive. In the >> future, we might switch to rcu to handle that more efficiently and avoiding >> these possible races. >> >> b) PageOffline(): logically offline pages contained in an online memory >> section with a sane memmap. virtio-mem calls these pages "fake offline"; >> something like a "temporary" memory hole. The new mechanism I propose will >> be used to handle synchronization as races can be more severe, e.g., when >> reading actual page content here. >> >> c) Soft offline pages: hwpoisoned pages that are not actually harmful yet, >> but could become harmful in the future. So we better try to remove the page >> from the page allcoator and try to migrate away existing users. >> >> >> So page_offline_* handle "b) PageOffline()" only. There is a tiny race >> between pfn_to_online_page(pfn) and looking at the memmap as we have in many >> cases already throughout the kernel, to be tackled in the future. > > Right, but here you anyway add locking, so why exclude the first iteration?What we're protecting is PageOffline() below. If I didn't mess up, we should always be calling page_offline_freeze() before calling PageOffline(). Or am I missing something?> > BTW, did you consider something likeYes, I played with something like that. We'd have to handle the first page_offline_freeze() freeze differently, though, and that's where things got a bit ugly in my attempts.> > if (page_offline_frozen++ % MAX_ORDER_NR_PAGES == 0) { > page_offline_unfreeze(); > cond_resched(); > page_offline_freeze(); > } > > We don't seem to care about page_offline_frozen overflows here, do we?No, the buffer size is also size_t and gets incremented on a per-byte basis. The variant I have right now looked the cleanest to me. Happy to hear simpler alternatives. -- Thanks, David / dhildenb