Hello, this is a call for testers of the merge of fpu_kern_enter/leave(9) to RELENG_8. The changes are required to fix some issues with VIA padlock engine, and to actually merge aesni(4) to RELENG_8. I ask to look at the possible FPU context handling regressions. Reports from the users of VIA padlock hardware are also needed. Any user that has suspend/resume magically working on 8 branch, please test that the patch does not make the things worse. Please note that the pre-release freeze will start in 2 weeks, so I need to get testing results relatively quickly to be in time for 8.2. Patch is at http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch Thanks in advance. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20101115/9afae51a/attachment.pgp
On 11/15/2010 4:13 PM, Kostik Belousov wrote:> > Patch is at > http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patchHi, One small failure on the patch The text leading up to this was: -------------------------- |Index: pc98/include/npx.h |==================================================================|--- pc98/include/npx.h (revision 215253) |+++ pc98/include/npx.h (working copy) -------------------------- Patching file pc98/include/npx.h using Plan A... Hunk #1 failed at 1. 1 out of 1 hunks failed--saving rejects to pc98/include/npx.h.rej I tested with openssl and openvpn and all seems to work great on the via board and my i5 board!! Simple test details at http://www.tancsa.com/fpu.html I will try out geli and some more extensive tests tomorrow Thanks for porting this back to RELENG_8 ! ---Mike
On Mon, Nov 15, 2010 at 10:42:50PM -0500, Mike Tancsa wrote:> On 11/15/2010 4:13 PM, Kostik Belousov wrote: > > > > Patch is at > > http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch > > > Hi, > One small failure on the patch > > The text leading up to this was: > -------------------------- > |Index: pc98/include/npx.h > |==================================================================> |--- pc98/include/npx.h (revision 215253) > |+++ pc98/include/npx.h (working copy) > -------------------------- > Patching file pc98/include/npx.h using Plan A... > Hunk #1 failed at 1. > 1 out of 1 hunks failed--saving rejects to pc98/include/npx.h.rejThis is because our patch(1) in base is somewhat old, I believe. The diff was generated by svn diff from the up to date stable/8 checkout, and the reason for failure is expanded $FreeBSD$ tags. Newer gnu patch, available in ports, handless this correctly, reporting about patches applied with "fuzz".> > > I tested with openssl and openvpn and all seems to work great on the via > board and my i5 board!! Simple test details at > > http://www.tancsa.com/fpu.html > > I will try out geli and some more extensive tests tomorrow > > Thanks for porting this back to RELENG_8 !This is actually somewhat puzzling. Does openssl in base automatically use crypto(4) ? Also, could you, please redo the speed tests for aesni(4) with the following patch applied over the driver sources ? Thank you ! diff --git a/sys/crypto/aesni/aesni_wrap.c b/sys/crypto/aesni/aesni_wrap.c index 36c66ea..3fd397c 100644 --- a/sys/crypto/aesni/aesni_wrap.c +++ b/sys/crypto/aesni/aesni_wrap.c @@ -246,14 +246,21 @@ int aesni_cipher_setup(struct aesni_session *ses, struct cryptoini *encini) { struct thread *td; - int error; + int error, saved_ctx; td = curthread; - error = fpu_kern_enter(td, &ses->fpu_ctx, FPU_KERN_NORMAL); + if (!is_fpu_kern_thread(0)) { + error = fpu_kern_enter(td, &ses->fpu_ctx, FPU_KERN_NORMAL); + saved_ctx = 1; + } else { + error = 0; + saved_ctx = 0; + } if (error == 0) { error = aesni_cipher_setup_common(ses, encini->cri_key, encini->cri_klen); - fpu_kern_leave(td, &ses->fpu_ctx); + if (saved_ctx) + fpu_kern_leave(td, &ses->fpu_ctx); } return (error); } @@ -264,16 +271,22 @@ aesni_cipher_process(struct aesni_session *ses, struct cryptodesc *enccrd, { struct thread *td; uint8_t *buf; - int error, allocated; + int error, allocated, saved_ctx; buf = aesni_cipher_alloc(enccrd, crp, &allocated); if (buf == NULL) return (ENOMEM); td = curthread; - error = fpu_kern_enter(td, &ses->fpu_ctx, FPU_KERN_NORMAL); - if (error != 0) - goto out; + if (!is_fpu_kern_thread(0)) { + error = fpu_kern_enter(td, &ses->fpu_ctx, FPU_KERN_NORMAL); + if (error != 0) + goto out; + saved_ctx = 1; + } else { + saved_ctx = 0; + error = 0; + } if ((enccrd->crd_flags & CRD_F_KEY_EXPLICIT) != 0) { error = aesni_cipher_setup_common(ses, enccrd->crd_key, @@ -311,7 +324,8 @@ aesni_cipher_process(struct aesni_session *ses, struct cryptodesc *enccrd, ses->iv); } } - fpu_kern_leave(td, &ses->fpu_ctx); + if (saved_ctx) + fpu_kern_leave(td, &ses->fpu_ctx); if (allocated) crypto_copyback(crp->crp_flags, crp->crp_buf, enccrd->crd_skip, enccrd->crd_len, buf); -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20101116/b06defa5/attachment.pgp
On Tue, Nov 16, 2010 at 08:46:23PM -0500, Mike Tancsa wrote:> On 11/16/2010 5:19 PM, Kostik Belousov wrote: > > Would your conclusion be that the patch seems to increase the throughput > > of the aesni(4) ? > > > > I think that on small-sized blocks, when using aesni(4), the dominating > > factor is the copying/copyout of the data to/from the kernel address > > space. Still would be interesting to compare the full output > > of "openssl speed" on aesni(4) with and without the patch I posted. > > Hi, > There does seem to be some improvement on large blocks. But there are > some freakishly fast times. On other sizes, there is no difference in > speed it would seem > > I did 20 runs. Updated stats at http://www.tancsa.com/fpu.htmlThank you. Indeed, I think that the test units are too small so that random system events can cause the variation. Nonetheless, patch seems to help, so I committed it. Meantime, the similar change may be beneficial for padlock(4) too. f you are going to test it, please note that most likely, openssl padlock engine does not use padlock(4), I do not know for sure. diff --git a/sys/crypto/via/padlock.c b/sys/crypto/via/padlock.c index 77e059b..ba63093 100644 --- a/sys/crypto/via/padlock.c +++ b/sys/crypto/via/padlock.c @@ -170,7 +170,7 @@ padlock_newsession(device_t dev, uint32_t *sidp, struct cryptoini *cri) struct padlock_session *ses = NULL; struct cryptoini *encini, *macini; struct thread *td; - int error; + int error, saved_ctx; if (sidp == NULL || cri == NULL) return (EINVAL); @@ -238,10 +238,18 @@ padlock_newsession(device_t dev, uint32_t *sidp, struct cryptoini *cri) if (macini != NULL) { td = curthread; - error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL); + if (!is_fpu_kern_thread(0)) { + error = fpu_kern_enter(td, &ses->ses_fpu_ctx, + FPU_KERN_NORMAL); + saved_ctx = 1; + } else { + error = 0; + saved_ctx = 0; + } if (error == 0) { error = padlock_hash_setup(ses, macini); - fpu_kern_leave(td, &ses->ses_fpu_ctx); + if (saved_ctx) + fpu_kern_leave(td, &ses->ses_fpu_ctx); } if (error != 0) { padlock_freesession_one(sc, ses, 0); diff --git a/sys/crypto/via/padlock_cipher.c b/sys/crypto/via/padlock_cipher.c index 0ae26c8..1456ddf 100644 --- a/sys/crypto/via/padlock_cipher.c +++ b/sys/crypto/via/padlock_cipher.c @@ -205,7 +205,7 @@ padlock_cipher_process(struct padlock_session *ses, struct cryptodesc *enccrd, struct thread *td; u_char *buf, *abuf; uint32_t *key; - int allocated, error; + int allocated, error, saved_ctx; buf = padlock_cipher_alloc(enccrd, crp, &allocated); if (buf == NULL) @@ -250,14 +250,21 @@ padlock_cipher_process(struct padlock_session *ses, struct cryptodesc *enccrd, } td = curthread; - error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL); + if (!is_fpu_kern_thread(0)) { + error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL); + saved_ctx = 1; + } else { + error = 0; + saved_ctx = 0; + } if (error != 0) goto out; padlock_cbc(abuf, abuf, enccrd->crd_len / AES_BLOCK_LEN, key, cw, ses->ses_iv); - fpu_kern_leave(td, &ses->ses_fpu_ctx); + if (saved_ctx) + fpu_kern_leave(td, &ses->ses_fpu_ctx); if (allocated) { crypto_copyback(crp->crp_flags, crp->crp_buf, enccrd->crd_skip, diff --git a/sys/crypto/via/padlock_hash.c b/sys/crypto/via/padlock_hash.c index 58c58b2..0fe182b 100644 --- a/sys/crypto/via/padlock_hash.c +++ b/sys/crypto/via/padlock_hash.c @@ -366,17 +366,24 @@ padlock_hash_process(struct padlock_session *ses, struct cryptodesc *maccrd, struct cryptop *crp) { struct thread *td; - int error; + int error, saved_ctx; td = curthread; - error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL); + if (!is_fpu_kern_thread(0)) { + error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL); + saved_ctx = 1; + } else { + error = 0; + saved_ctx = 0; + } if (error != 0) return (error); if ((maccrd->crd_flags & CRD_F_KEY_EXPLICIT) != 0) padlock_hash_key_setup(ses, maccrd->crd_key, maccrd->crd_klen); error = padlock_authcompute(ses, maccrd, crp->crp_buf, crp->crp_flags); - fpu_kern_leave(td, &ses->ses_fpu_ctx); + if (saved_ctx) + fpu_kern_leave(td, &ses->ses_fpu_ctx); return (error); } -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20101117/85e24276/attachment.pgp
On 11/17/2010 11:35 AM, Kostik Belousov wrote:> > Meantime, the similar change may be beneficial for padlock(4) too. > f you are going to test it, please note that most likely, openssl padlock > engine does not use padlock(4), I do not know for sure.I did some more tests since someone said they had problems with geli, ipsec and padlock. In the simple tests I did, I didnt find any regressions or speed differences. Info appended to http://www.tancsa.com/fpu.html I also compared to stock RELENG_8 and padlock and didnt find any issues. ---Mike
Not sure if this is the kind of testing you were looking for; but I've run both mprime and boinc/setiathome for the last two days without any problem... It's not a notebook so I can't test suspend/resume.. On 10-11-15 4:13 PM, Kostik Belousov wrote:> Hello, > this is a call for testers of the merge of fpu_kern_enter/leave(9) > to RELENG_8. The changes are required to fix some issues with VIA > padlock engine, and to actually merge aesni(4) to RELENG_8. > > I ask to look at the possible FPU context handling regressions. > Reports from the users of VIA padlock hardware are also needed. > Any user that has suspend/resume magically working on > 8 branch, please test that the patch does not make the things > worse. > > Please note that the pre-release freeze will start in 2 weeks, so > I need to get testing results relatively quickly to be in time for 8.2. > > Patch is at > http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch > > Thanks in advance.-- Daryl Richards Isle Technical Services Inc.
On 11/16/2010 4:43 AM, Kostik Belousov wrote:> On Mon, Nov 15, 2010 at 10:42:50PM -0500, Mike Tancsa wrote: >> On 11/15/2010 4:13 PM, Kostik Belousov wrote: >>> >>> Patch is at >>> http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patchI did some more tests post commit today using the aesni kld taken directly from HEAD. BTW, do you plan to MFC this as well ? Results at the bottom of http://www.tancsa.com/fpu.html It certainly makes a difference with geli. IPSEC and userland stuff, not so much. The CPU itself is crazy fast, so its hard to see a difference in things like ssh and even ipsec didnt yield any differences. For ssh and userland stuff I guess once there is an aesni userland engine, this would probably help over the cryptodev interface. ---Mike
On Sat, Nov 20, 2010 at 01:30:54AM -0500, Mike Tancsa wrote:> On 11/16/2010 4:43 AM, Kostik Belousov wrote: > > On Mon, Nov 15, 2010 at 10:42:50PM -0500, Mike Tancsa wrote: > >> On 11/15/2010 4:13 PM, Kostik Belousov wrote: > >>> > >>> Patch is at > >>> http://people.freebsd.org/~kib/misc/releng_8_fpu.1.patch > > > I did some more tests post commit today using the aesni kld taken > directly from HEAD. BTW, do you plan to MFC this as well ?Sure, I will merge aesni(4), it was the only reason to work on the kern_fpu in stable/8. I want some pause between KPI and driver MFC, to ease the handling of possible mismerge or fixing latent bugs (since stable has much larger testing base then HEAD).> > Results at the bottom of http://www.tancsa.com/fpu.html > > It certainly makes a difference with geli. IPSEC and userland stuff, not > so much. The CPU itself is crazy fast, so its hard to see a difference > in things like ssh and even ipsec didnt yield any differences. For ssh > and userland stuff I guess once there is an aesni userland engine, this > would probably help over the cryptodev interface.Yes, the small blocks encoding/decoding has a large overhead of loop setup code. Thank you. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20101120/3318495c/attachment.pgp