Andrew Morton
2006-May-11 04:41 UTC
[Samba] Re: [PATCH] smbfs: Fix slab corruption in samba error path
Jan Niehusmann <jan@gondor.com> wrote:> > Yesterday, I got the following error with 2.6.16.13 during a file copy > from a smb filesystem over a wireless link. I guess there was some error > on the wireless link, which in turn caused an error condition for the > smb filesystem. > > In the log, smb_file_read reports error=4294966784 (0xfffffe00), which > also shows up in the slab dumps, and also is -ERESTARTSYS. Error code > 27499 corresponds to 0x6b6b, so the rq_errno field seems to be the only > one being set after freeing the slab. > > In smb_add_request (which is the only place in smbfs where I found > ERESTARTSYS), I found the following: > > if (!timeleft || signal_pending(current)) { > /* > * On timeout or on interrupt we want to try and remove the > * request from the recvq/xmitq. > */ > smb_lock_server(server); > if (!(req->rq_flags & SMB_REQ_RECEIVED)) { > list_del_init(&req->rq_queue); > smb_rput(req); > } > smb_unlock_server(server); > } > [...] > if (signal_pending(current)) > req->rq_errno = -ERESTARTSYS; > > I guess that some codepath like smbiod_flush() caused the request > to be removed from the queue, and smb_rput(req) be called, without > SMB_REQ_RECEIVED being set. This violates an asumption made by the > quoted code. > > Then, the above code calls smb_rput(req) again, the req gets freed, > and req->rq_errno = -ERESTARTSYS writes into the already freed slab. As > list_del_init doesn't cause an error if called multiple times, that does > cause the observed behaviour (freed slab with rq_errno=-ERESTARTSYS). > > If this observation is correct, the following patch should fix it. > > .. > > diff --git a/fs/smbfs/request.c b/fs/smbfs/request.c > index c71c375..d2d8226 100644 > --- a/fs/smbfs/request.c > +++ b/fs/smbfs/request.c > @@ -339,9 +339,11 @@ #endif > /* > * On timeout or on interrupt we want to try and remove the > * request from the recvq/xmitq. > + * First check if the request is still part of a queue. (May > + * have been removed by some error condition) > */ > smb_lock_server(server); > - if (!(req->rq_flags & SMB_REQ_RECEIVED)) { > + if (&req->rq_queue != req->rq_queue.next) { > list_del_init(&req->rq_queue); > smb_rput(req); > } >I think the bug is actually that this code is accessing *req after having doen the smb_rput(). I worry that your patch "fixes" this by accidentally leaking the request. We can fairly simply restructure this code so that it doesn't touch the request after that possible smb_rput(). How does this look? If "OK", are you able to test it? fs/smbfs/request.c | 30 ++++++++++++++++-------------- 1 files changed, 16 insertions(+), 14 deletions(-) diff -puN fs/smbfs/request.c~smbfs-fix-slab-corruption-in-samba-error-path fs/smbfs/request.c --- devel/fs/smbfs/request.c~smbfs-fix-slab-corruption-in-samba-error-path 2006-05-10 01:59:08.000000000 -0700 +++ devel-akpm/fs/smbfs/request.c 2006-05-10 02:21:41.000000000 -0700 @@ -335,19 +335,6 @@ int smb_add_request(struct smb_request * timeleft = wait_event_interruptible_timeout(req->rq_wait, req->rq_flags & SMB_REQ_RECEIVED, 30*HZ); - if (!timeleft || signal_pending(current)) { - /* - * On timeout or on interrupt we want to try and remove the - * request from the recvq/xmitq. - */ - smb_lock_server(server); - if (!(req->rq_flags & SMB_REQ_RECEIVED)) { - list_del_init(&req->rq_queue); - smb_rput(req); - } - smb_unlock_server(server); - } - if (!timeleft) { PARANOIA("request [%p, mid=%d] timed out!\n", req, req->rq_mid); @@ -372,7 +359,22 @@ int smb_add_request(struct smb_request * req->rq_errno = smb_errno(req); if (signal_pending(current)) req->rq_errno = -ERESTARTSYS; - return req->rq_errno; + result = req->rq_errno; + + if (!timeleft || signal_pending(current)) { + /* + * On timeout or on interrupt we want to try and remove the + * request from the recvq/xmitq. + */ + smb_lock_server(server); + if (!(req->rq_flags & SMB_REQ_RECEIVED)) { + list_del_init(&req->rq_queue); + smb_rput(req); + } + smb_unlock_server(server); + } + + return result; } /* _
Jan Niehusmann
2006-May-11 04:41 UTC
[Samba] Re: [PATCH] smbfs: Fix slab corruption in samba error path
On Wed, May 10, 2006 at 02:25:29AM -0700, Andrew Morton wrote:> I think the bug is actually that this code is accessing *req after having > doen the smb_rput(). I worry that your patch "fixes" this by accidentally > leaking the request. > > We can fairly simply restructure this code so that it doesn't touch the > request after that possible smb_rput(). > > How does this look? If "OK", are you able to test it?No, it doesn't look ok: The callers of smb_add_request (which are all in fs/smbfs/proc.c) do touch the req structure after calling smb_add_request, even if an error is returned. So your code would still cause access after release and double frees on the req object. As I understand the code smb_add_request is not allowed to completely release the req structure at all. What smb_add_request should do is - increase the req usage counter by one (by calling smb_rget), and add the req to one of the work queues - or leave the usage counter alone, and don't add the req to one of the work queues On error, one has to be careful: If we actively remove the req from the work queues again, we have to decrease the usage counter (otherwise we leak requests). But if some other function already removed the req from the queue, that function already did decrease the counter, so we are not allowed to do it again. The original code did get the latter case partly wrong. It assumed that the only way a req could be removed from the work queue would be in smb_request_recv, where the SMB_REQ_RECEIVED flag gets set. But it did miss the error cases in smbiod.c, eg. smbiod_flush(), where the req gets removed from the queues (and the usage counter decreased), without the SMB_REQ_RECEIVED flag being set. Therefore I changed the code to not check SMB_REQ_RECEIVED, but test if the req is still on one of the work queue linked lists. After that change, smb_add_request never releases the req, so reordering is not necessary. Unfortunately it's not easy to test the patch: Of course I did check that it properly compiles, but the bug is not easily reproducible. Jan