On 12.09.21 21:48, Matt Oursbourn via samba wrote:
> What I have noticed from looking at the Processes tab in System
> Monitor on the server: The server is on a 10Gb network. When I start
> a chia plot (~106GB file) transfer from a client computer that is
> also connected to the 10Gb network, the transfer starts off at
> ~1000MB/s until the samba process is larger than 32.5GiB. The
> largest I have seen is 39GiB. The transfer rate then drops down to
> the hdd write speed of ~170MB/s. That samba process never gets any
> smaller than 32.5GiB, even after the transfer is complete.
>
> If a client computer that is connected with a 1Gb nic starts a
> transfer, another samba process is spawned. That transfer stays
> around 130MB/s to 150MB/s and the samba process grows to ~9GiB.
> After that transfer is complete the process drops down to ~5MiB.
>
> After a day of plotting and transferring ~100 plot files to the
> server from those two computers there are still only the two samba
> processes running. Plotting has been stopped for several hours and
> one is 32.5GiB, the other is 4.9MiB.

the pool-usage of the large process you'd sent me privately contains:

$ grep pthreadpool_tevent_job_state pool-usage\ smbd_32GiB.txt | wc -l
6468

which means there are 6468 SMB2 WRITE requests pending at the server.

This is likely triggered because

- the client is sending data much faster than the server is writing
  data to disk

- Samba returns an SMB2 Async Interim Response for every received
  WRITE request

- These grant SMB2 credits to the client

- As a result the client is not throttled by a crediting window

IIRC we've seen this before some time ago, and IIRC a Windows server
either doesn't send interim async responses or stops sending them at
some point.

To me this looks like a design flaw in the crediting logic, but we
can't change that, so we have to adjust our write processing code to
handle this.

Until that fix materializes (which may take some time) you could
disable aio as a workaround in smb.conf by setting:

aio read size = 0
aio write size = 0

-slow

--
Ralph Boehme, Samba Team            https://samba.org/
SerNet Samba Team Lead              https://sernet.de/en/team-samba
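For reference, a pool-usage dump like the one grepped above can be
requested from a running smbd with smbcontrol (the PID and output
filename below are illustrative; `smbstatus -p` lists the smbd
processes so you can pick the one serving the busy client):

$ smbstatus -p
$ smbcontrol 12345 pool-usage > pool-usage_smbd.txt
$ grep pthreadpool_tevent_job_state pool-usage_smbd.txt | wc -l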
On Tue, Sep 14, 2021 at 08:16:49PM +0200, Ralph Boehme via samba wrote:
>the pool-usage of the large process you'd sent me privately contains:
>
>$ grep pthreadpool_tevent_job_state pool-usage\ smbd_32GiB.txt | wc -l
>6468
>
>which means there are 6468 SMB2 WRITE requests pending at the server.
>
>This is likely triggered because
>
>- the client is sending data much faster than the server is writing
>data to disk
>
>- Samba returns an SMB2 Async Interim Response for every received
>WRITE request
>
>- These grant SMB2 credits to the client
>
>- As a result the client is not throttled by a crediting window
>
>IIRC we've seen this before some time ago, and IIRC a Windows server
>either doesn't send interim async responses or stops sending them at
>some point.
>
>To me this looks like a design flaw in the crediting logic, but we
>can't change that, so we have to adjust our write processing code to
>handle this.
>
>Until that fix materializes (which may take some time) you could
>disable aio as a workaround in smb.conf by setting:
>
>aio read size = 0
>aio write size = 0

Hmmm. Can't we create some back-pressure on the crediting algorithm
by forcing writes to go synchronous after the queue of async io
events reaches a (parameterized) size?

We already count fsp->num_aio_requests, so we have a per-fd count. We
could add a per-process count?

Or maybe start forcing reads/writes to go synchronous once the time
between request and response goes over a certain amount?
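A rough, hypothetical sketch of that last idea (illustrative names
only, not actual Samba code -- in a real patch the submit stamp would
live in smbd's per-request aio state, and the threshold would be a
parameter). The per-process counter variant is what the patch later
in this thread implements.

/*
 * Sketch: record when an aio request was submitted, note on
 * completion whether it took longer than a threshold, and let the
 * schedule_aio_*() paths consult a flag to fall back to synchronous
 * I/O (e.g. by returning NT_STATUS_RETRY).
 */
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define AIO_LATENCY_THRESHOLD_MS 500	/* would be parameterized */

static bool aio_backlogged;

static uint64_t now_ms(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

/* Call at submit time; stash the result in the per-request state. */
static uint64_t aio_submit_stamp(void)
{
	return now_ms();
}

/* Call from the completion path with the stamp taken at submit. */
static void aio_note_completion(uint64_t submit_stamp)
{
	aio_backlogged = (now_ms() - submit_stamp) > AIO_LATENCY_THRESHOLD_MS;
}

/*
 * Call from schedule_aio_*() before going async: if the last
 * completion was slow, the caller falls back to the synchronous
 * code path.
 */
static bool aio_should_go_synchronous(void)
{
	return aio_backlogged;
}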
On Tue, Sep 14, 2021 at 08:16:49PM +0200, Ralph Boehme via samba wrote:
>which means there are 6468 SMB2 WRITE requests pending at the server.
>
>This is likely triggered because
>
>- the client is sending data much faster than the server is writing
>data to disk
>
>- Samba returns an SMB2 Async Interim Response for every received
>WRITE request
>
>- These grant SMB2 credits to the client
>
>- As a result the client is not throttled by a crediting window

OK, here is a horrible test hack (the best kind :-) that should limit
the total number of outstanding aio requests to
MAX_TOTAL_OUTSTANDING_AIO, where that is currently set to:

#define MAX_TOTAL_OUTSTANDING_AIO 1000

Patch included here as an attachment, and also inline in case the
list strips attachments.

Matt, if you want to test this (with the number set to taste) and let
us know if this fixes the problem, that would be an interesting data
point!

Cheers,

Jeremy

---------------------------------------------

diff --git a/source3/smbd/aio.c b/source3/smbd/aio.c
index 038487ad4ba..90b218fcc7a 100644
--- a/source3/smbd/aio.c
+++ b/source3/smbd/aio.c
@@ -24,6 +24,9 @@
 #include "../lib/util/tevent_ntstatus.h"
 #include "../lib/util/tevent_unix.h"
 
+static size_t total_num_outstanding_aio = 0;
+#define MAX_TOTAL_OUTSTANDING_AIO 1000
+
 /****************************************************************************
  The buffer we keep around whilst an aio request is in process.
 *****************************************************************************/
@@ -97,6 +100,7 @@ static int aio_del_req_from_fsp(struct aio_req_fsp_link *lnk)
 		DEBUG(1, ("req %p not found in fsp %p\n", req, fsp));
 		return 0;
 	}
+	total_num_outstanding_aio -= 1;
 	fsp->num_aio_requests -= 1;
 	fsp->aio_requests[i] =
 		fsp->aio_requests[fsp->num_aio_requests];
@@ -141,6 +145,7 @@ bool aio_add_req_to_fsp(files_struct *fsp, struct tevent_req *req)
 	}
 	fsp->aio_requests[fsp->num_aio_requests] = req;
 	fsp->num_aio_requests += 1;
+	total_num_outstanding_aio += 1;
 
 	lnk->fsp = fsp;
 	lnk->req = req;
@@ -192,6 +197,10 @@ NTSTATUS schedule_aio_read_and_X(connection_struct *conn,
 		return NT_STATUS_RETRY;
 	}
 
+	if (total_num_outstanding_aio >= MAX_TOTAL_OUTSTANDING_AIO) {
+		return NT_STATUS_RETRY;
+	}
+
 	/* The following is safe from integer wrap as we've already checked
 	   smb_maxcnt is 128k or less. Wct is 12 for read replies */
 
@@ -460,6 +469,10 @@ NTSTATUS schedule_aio_write_and_X(connection_struct *conn,
 		return NT_STATUS_RETRY;
 	}
 
+	if (total_num_outstanding_aio >= MAX_TOTAL_OUTSTANDING_AIO) {
+		return NT_STATUS_RETRY;
+	}
+
 	bufsize = smb_size + 6*2;
 
 	if (!(aio_ex = create_aio_extra(NULL, fsp, bufsize))) {
@@ -710,6 +723,10 @@ NTSTATUS schedule_smb2_aio_read(connection_struct *conn,
 		return NT_STATUS_RETRY;
 	}
 
+	if (total_num_outstanding_aio >= MAX_TOTAL_OUTSTANDING_AIO) {
+		return NT_STATUS_RETRY;
+	}
+
 	/* Create the out buffer. */
 	*preadbuf = data_blob_talloc(ctx, NULL, smb_maxcnt);
 	if (preadbuf->data == NULL) {
@@ -851,6 +868,10 @@ NTSTATUS schedule_aio_smb2_write(connection_struct *conn,
 		return NT_STATUS_RETRY;
 	}
 
+	if (total_num_outstanding_aio >= MAX_TOTAL_OUTSTANDING_AIO) {
+		return NT_STATUS_RETRY;
+	}
+
 	if (!(aio_ex = create_aio_extra(smbreq->smb2req, fsp, 0))) {
 		return NT_STATUS_NO_MEMORY;
 	}
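For anyone who would like to try a patch like this but has never
built Samba from source, the rough flow is sketched below (the
tarball version and patch filename are illustrative, and you'll need
the usual Samba build dependencies installed first):

$ tar xf samba-4.15.0.tar.gz
$ cd samba-4.15.0
$ patch -p1 < aio-limit.diff    # the diff above, saved to a file
$ ./configure
$ make -j4
$ sudo make install

Building the same version you currently run avoids mixing binaries
from different releases.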
so far I can confirm that with:

aio read size = 0
aio write size = 0

the memory has not grown past 8MiB on any smbd process.

Jeremy,

As for the test you suggest: that is way over my head. I don't
understand your email even a little bit. I am happy to test, but I
have never compiled samba.

-matt

On Tue, Sep 14, 2021 at 11:16 AM Ralph Boehme <slow at samba.org> wrote:
>
> To me this looks like a design flaw in the crediting logic, but we
> can't change that, so we have to adjust our write processing code to
> handle this.
>
> Until that fix materializes (which may take some time) you could
> disable aio as a workaround in smb.conf by setting:
>
> aio read size = 0
> aio write size = 0
>
> -slow