Looking at this code (2.0.2), it appears to have a couple of problems I could not find mentioned on the mailing list archive: (1) If base is zero in an expand-up segment, the conversion will yield an expand-down segment covering the whole 4Gb, thus providing a mechanism to obtain access to XEN space. (2) If a malicious program accesses memory at a small negative offset from gs:0 and the access extends into the positive range, the access will gp-fault with either descriptor setting, thus leading to an endless loop of flipping between the two states. (3) Since escaped opcodes (those starting with 0F) aren''t handled, accessing mm/xmm data in __thread variables (along with other specialized operations on such variable the compiler might generate) is going to kill the program. Of course, it is similarly problematic that SIB addressing still isn''t implemented, but that''s at least stated so in the code. (4) In the no-mod-r/m handling of the decoder, the byte case is handled incorrectly: The address it deals with is still a 32-byte (or 16-byte, but 16-bit addressing isn''t handled anyway) one. There simply must not be a ''case 1'' there, and the insn_decode table should be changed accordingly. (5, minor) The change from 2.0.1 to 2.0.2 (making the code a lot more correct) left an access to the no longer existing positive_access parameter of fixup_seg in (at least) one of the DPRINTK-s. Jan ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
> Looking at this code (2.0.2), it appears to have a couple of problems I > could not find mentioned on the mailing list archive:Thanks, we''ll look into these. The whole TLS workaround is a huge pain in the butt, and has pretty horrible performance anyhow. The segment flipping approach doesn''t buy us much as +ve and -ve accesses seem to be interleaved. It at least enables some simplifications to the instruction decoder (though as you point out, we need to extend it some). We really need to look into producing a suitably patched glibc rpm. Likewise, we need to prevent gcc from generating -ve offsets for tls as the default. Ian> (1) If base is zero in an expand-up segment, the conversion will yield > an expand-down segment covering the whole 4Gb, thus providing a > mechanism to obtain access to XEN space. > > (2) If a malicious program accesses memory at a small negative offset > from gs:0 and the access extends into the positive range, the access > will gp-fault with either descriptor setting, thus leading to an endless > loop of flipping between the two states. > > (3) Since escaped opcodes (those starting with 0F) aren''t handled, > accessing mm/xmm data in __thread variables (along with other > specialized operations on such variable the compiler might generate) is > going to kill the program. Of course, it is similarly problematic that > SIB addressing still isn''t implemented, but that''s at least stated so in > the code. > > (4) In the no-mod-r/m handling of the decoder, the byte case is handled > incorrectly: The address it deals with is still a 32-byte (or 16-byte, > but 16-bit addressing isn''t handled anyway) one. There simply must not > be a ''case 1'' there, and the insn_decode table should be changed > accordingly. > > (5, minor) The change from 2.0.1 to 2.0.2 (making the code a lot more > correct) left an access to the no longer existing positive_access > parameter of fixup_seg in (at least) one of the DPRINTK-s. > > Jan > > > ------------------------------------------------------- > The SF.Net email is sponsored by: Beat the post-holiday blues > Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. > It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/xen-devel------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
Hi! Ian Pratt wrote: [..]> We really need to look into producing a suitably patched glibc > rpm. Likewise, we need to prevent gcc from generating -ve offsets > for tls as the default.You can workaround that in userspace with: LD_ASSUME_KERNEL=2.4.9 Set it in /etc/environment or in your kernel command line. The later depends on your setup: IMMV. Regards, Nuno Silva ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
I don''t think that would get us all the way. It would stop ld.so linking to TLS glibc after booting, but: 1. GCC can now emit -ve offset accesses directly in application binaries. So the problem may not be entirely solvable at link-time. 2. Still need to deal with linking of ''init'' process, which is run before you get a chance to set LD_ASSUME_KERNEL. Maybe there''s a workaround for 2 less invasive than deleting /lib/tls, but I don''t know what it is. -- Keir> > Hi! > > Ian Pratt wrote: > > [..] > > > We really need to look into producing a suitably patched glibc > > rpm. Likewise, we need to prevent gcc from generating -ve offsets > > for tls as the default. > > You can workaround that in userspace with: > LD_ASSUME_KERNEL=2.4.9 > > Set it in /etc/environment or in your kernel command line. The later > depends on your setup: IMMV. > > Regards, > Nuno Silva > > > > ------------------------------------------------------- > The SF.Net email is sponsored by: Beat the post-holiday blues > Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. > It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/xen-devel------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
> Maybe there''s a workaround for 2 less invasive than deleting /lib/tls, > but I don''t know what it is.Under Debian at least, deleting /lib/tls is only a temporary measure as if the right packages are updated, it is re-created again. Maybe a redirect could fix that though. James ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
> Looking at this code (2.0.2), it appears to have a couple of problems I > could not find mentioned on the mailing list archive: > > (1) If base is zero in an expand-up segment, the conversion will yield > an expand-down segment covering the whole 4Gb, thus providing a > mechanism to obtain access to XEN space.No it won''t. When we flip to expands-down we set the limit to (-(base & PAGE_MASK) >> 12) - 1 == (-(0 & PAGE_MASK) >> 12) - 1 == 0 - 1 == 0xfffff This is a *zero-length* grows-down segment, exactly as we require for safety. [Yes, you can have a zero-length grows-down segment, even though it is impossible to have a zero-length grows-up segment -- I tested on real silicon.]> (2) If a malicious program accesses memory at a small negative offset > from gs:0 and the access extends into the positive range, the access > will gp-fault with either descriptor setting, thus leading to an endless > loop of flipping between the two states.Harmless. The guest OS will still execute (e.g., to service timer interrupts) and will be able to preempt the malicious program when it has received its timeslice. So the program cannot take over the machine -- it can only receive the same amount of CPU as a user-space infinite CPU loop.> (3) Since escaped opcodes (those starting with 0F) aren''t handled, > accessing mm/xmm data in __thread variables (along with other > specialized operations on such variable the compiler might generate) is > going to kill the program. Of course, it is similarly problematic that > SIB addressing still isn''t implemented, but that''s at least stated so in > the code.Yes there are restrictions in what it supports. But we have a DPRINTK for those cases so we can fix them up as/if they occur.> (4) In the no-mod-r/m handling of the decoder, the byte case is handled > incorrectly: The address it deals with is still a 32-byte (or 16-byte, > but 16-bit addressing isn''t handled anyway) one. There simply must not > be a ''case 1'' there, and the insn_decode table should be changed > accordingly.You mean the following fragment? if ( !(decode & HAS_MODRM) ) { switch ( decode & 7 ) { case 1: offset = (long)(*(char *)pb); goto skip_modrm; It is correct -- sign extends the 8-bit offset to a signed long as we require. The instruction only contains a single-byte offset -- it isn''t stored as 16 or 32 bits.> (5, minor) The change from 2.0.1 to 2.0.2 (making the code a lot more > correct) left an access to the no longer existing positive_access > parameter of fixup_seg in (at least) one of the DPRINTK-s.Now fixed in the new 2.0.3 release. The FC3 issue that some poeple (including Rik) were seeing is also now fixed. -- Keir ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
>> Looking at this code (2.0.2), it appears to have a couple of problemsI>> could not find mentioned on the mailing list archive: >> >> (1) If base is zero in an expand-up segment, the conversion willyield>> an expand-down segment covering the whole 4Gb, thus providing a >> mechanism to obtain access to XEN space. > >No it won''t. When we flip to expands-down we set the limit to > (-(base & PAGE_MASK) >> 12) - 1 > == (-(0 & PAGE_MASK) >> 12) - 1 > == 0 - 1 > == 0xfffff > >This is a *zero-length* grows-down segment, exactly as we require for >safety. [Yes, you can have a zero-length grows-down segment, even >though it is impossible to have a zero-length grows-up segment -- I >tested on real silicon.]Oh, right, I didn''t connect the -1 done after the flip: label. But it seems pointless to flip a segment with a base of zero; the gp fault must have had a different reason (i.e. by flipping it to a zero-length segment you''ll only cause another gp fault right away when restarting the instruction).>> (2) If a malicious program accesses memory at a small negativeoffset>> from gs:0 and the access extends into the positive range, theaccess>> will gp-fault with either descriptor setting, thus leading to anendless>> loop of flipping between the two states. > >Harmless. The guest OS will still execute (e.g., to service timer >interrupts) and will be able to preempt the malicious program when it >has received its timeslice. So the program cannot take over the >machine -- it can only receive the same amount of CPU as a user-space >infinite CPU loop.Not exactly: if a user mode process runs en infinite loop, it''ll not block out interrupts for any more than a single instruction. For the mentioned scenario, (physical) interrupt latency would significantly increase.>> (3) Since escaped opcodes (those starting with 0F) aren''t handled, >> accessing mm/xmm data in __thread variables (along with other >> specialized operations on such variable the compiler might generate)is>> going to kill the program. Of course, it is similarly problematicthat>> SIB addressing still isn''t implemented, but that''s at least statedso in>> the code. > >Yes there are restrictions in what it supports. But we have a DPRINTK >for those cases so we can fix them up as/if they occur.Except that the DPRINTK expands to nothing in the published sources, so one would never know what caused a spurious SEGV seen in a client OSes app without re-running the whole thing with a version of XEN where these DPRINTKs are actually doing something.>> (4) In the no-mod-r/m handling of the decoder, the byte case ishandled>> incorrectly: The address it deals with is still a 32-byte (or16-byte,>> but 16-bit addressing isn''t handled anyway) one. There simply mustnot>> be a ''case 1'' there, and the insn_decode table should be changed >> accordingly. > >You mean the following fragment? > > if ( !(decode & HAS_MODRM) ) > { > switch ( decode & 7 ) > { > case 1: > offset = (long)(*(char *)pb); > goto skip_modrm;Yes.>It is correct -- sign extends the 8-bit offset to a signed long as we >require. The instruction only contains a single-byte offset -- it >isn''t stored as 16 or 32 bits.I don''t think so. The instructions this refers to are (opcodes A1 and A3) movb symbol, %al movb %al, symbol Clearly, they take a full (16-/32-bit) address but operate on a byte at that address (they would be useless if they operated only on an 8-bit sign-extended address). Having a second look at this reveals another dangerous thing: The code here directly derefences pb, whereas all other instances of references through ip correctly use get_user. Finally, in the case I''m still missing something here and the ''case 1'' is needed, then an operation like (long)*(char *) is rather dangerous, because you depend on the (implementation defined) signedness of char. You''d want to code (long)*(signed char *) instead. Jan ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
> >No it won''t. When we flip to expands-down we set the limit to > > (-(base & PAGE_MASK) >> 12) - 1 > > == (-(0 & PAGE_MASK) >> 12) - 1 > > == 0 - 1 > > == 0xfffff > > > >This is a *zero-length* grows-down segment, exactly as we require for > >safety. [Yes, you can have a zero-length grows-down segment, even > >though it is impossible to have a zero-length grows-up segment -- I > >tested on real silicon.] > > Oh, right, I didn''t connect the -1 done after the flip: label. But it > seems pointless to flip a segment with a base of zero; the gp fault must > have had a different reason (i.e. by flipping it to a zero-length > segment you''ll only cause another gp fault right away when restarting > the instruction).We could improve the flipping code by detecting accesses that overlap Xen''s private area and simply propagate the GPF in that case. That would stop us flipping to a zero-length segment. Modifying that code again is not high on our priority list, though. Our main aim is compatibility with non-buggy programs. If there is a minuscule chance of buggy programs looping instead of seg-faulting, we can accept that for now. :-)> >Harmless. The guest OS will still execute (e.g., to service timer > >interrupts) and will be able to preempt the malicious program when it > >has received its timeslice. So the program cannot take over the > >machine -- it can only receive the same amount of CPU as a user-space > >infinite CPU loop. > > Not exactly: if a user mode process runs en infinite loop, it''ll not > block out interrupts for any more than a single instruction. For the > mentioned scenario, (physical) interrupt latency would significantly > increase.Well, maybe the path thru the segment-flip code is a few hundred instructions. I bet there are plenty of syscall paths of *at least* that length that don''t contain a yield point. The major cost of the flipping is actually the privilege-level changes (ring 3 to 0 to 3). An application can trivially cause lots fo those, even with no seg-flipping.> >Yes there are restrictions in what it supports. But we have a DPRINTK > >for those cases so we can fix them up as/if they occur. > > Except that the DPRINTK expands to nothing in the published sources, so > one would never know what caused a spurious SEGV seen in a client OSes > app without re-running the whole thing with a version of XEN where these > DPRINTKs are actually doing something.I agree it is not the best situation. But to fix it we would need a much more complex decoder, and a comprehensive test suite (implementing the decoder without testing every instruction it can decode would just lead to latent bugs in Xen that could be worse than having no decoding ability for those instructions at all). It''s just more effort than we want to commit to -- the main purpose of the seg-flipping is to allow existing distros to initially boot and allow the sysadmin to move /lib/tls out of the way.> >It is correct -- sign extends the 8-bit offset to a signed long as we > >require. The instruction only contains a single-byte offset -- it > >isn''t stored as 16 or 32 bits. > > I don''t think so. The instructions this refers to are (opcodes A1 and > A3)Your analysis of these opcodes (A1 and A3) is correct, but both have code ''O|4'' in the insn_decode table, so they don''t go thru ''case 1:''. A0 and A2 have code ''O|1'' which are instructions: MOV moffs8,AL ; MOV AL,moffs8 These have a single-byte offset incoded within the instruction. The ''case 1:'' *is* needed for A0 and A2.> Having a second look at this reveals another dangerous thing: The code > here directly derefences pb, whereas all other instances of references > through ip correctly use get_user.Yes, this needs fixing.> Finally, in the case I''m still missing something here and the ''case 1'' > is needed, then an operation like (long)*(char *) is rather dangerous, > because you depend on the (implementation defined) signedness of char. > You''d want to code (long)*(signed char *) instead.Yes, this also needs fixing. :-) Thanks, Keir ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
>Your analysis of these opcodes (A1 and A3) is correct, but both have >code ''O|4'' in the insn_decode table, so they don''t go thru ''case 1:''.> >A0 and A2 have code ''O|1'' which are instructions: > MOV moffs8,AL ; MOV AL,moffs8 >These have a single-byte offset incoded within the instruction. >The ''case 1:'' *is* needed for A0 and A2.Hmm, I''m sorry, I actually meant A0 and A2. Just look at what the assembler generates for movb symbol, %al movb %al, symbol (gas -al=movb.l -o movb.o movb.s): GAS LISTING movb.s page 1 1 0000 A0000000 movb symbol, %al 1 00 2 0005 A2000000 movb %al, symbol 2 00 Jan ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
> >Your analysis of these opcodes (A1 and A3) is correct, but both have > >code ''O|4'' in the insn_decode table, so they don''t go thru ''case 1:''. > > > > >A0 and A2 have code ''O|1'' which are instructions: > > MOV moffs8,AL ; MOV AL,moffs8 > >These have a single-byte offset incoded within the instruction. > >The ''case 1:'' *is* needed for A0 and A2. > > Hmm, I''m sorry, I actually meant A0 and A2. Just look at what the > assembler generates for > > movb symbol, %al > movb %al, symbolOops. Sorry - yes, you win. :-) I''ll get rid of ''case 1''. -- Keir ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It''s fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel