Hi, By default, Linaro Kernel is compiled with THUMB2 enabled. This small patch series allow a guest kernel to use THUMB instruction. Cheers, Julien Grall (3): xen/arm: Don''t emulate the MMIO access if the instruction syndrome is invalid xen/arm: Allow secondary cpus to start in THUMB xen/arm: errata 766422: decode thumb store during data abort xen/arch/arm/psci.c | 3 +++ xen/arch/arm/traps.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+) -- 1.7.10.4
Julien Grall
2013-Jul-23 18:05 UTC
[PATCH 1/3] xen/arm: Don''t emulate the MMIO access if the instruction syndrome is invalid
When the instruction syndrome is not valid, the transfer register is unknown. If this register is used in the emulation code (it''s the case for the VGIC), Xen can retrieve wrong data. For safety, consider invalid instruction syndrome as wrong memory access. Signed-off-by: Julien Grall <julien.grall@linaro.org> --- xen/arch/arm/traps.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index bbd60aa..d6dc37d 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1017,6 +1017,10 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, if ( rc == -EFAULT ) goto bad_data_abort; + /* XXX: Decode the instruction if ISS is not valid */ + if ( !dabt.valid ) + goto bad_data_abort; + if (handle_mmio(&info)) { regs->pc += dabt.len ? 4 : 2; -- 1.7.10.4
Julien Grall
2013-Jul-23 18:05 UTC
[PATCH 2/3] xen/arm: Allow secondary cpus to start in THUMB
Unlike bx, eret will not update the instruction set (THUMB,ARM) according to the return address. This will result to an unpredicable behaviour for the processor if the address doesn''t match the right instruction set. When the kernel is compiled with THUMB2, THUMB bit needs to be set in CPSR for the secondary cpus. Signed-off-by: Julien Grall <julien.grall@linaro.org> --- xen/arch/arm/psci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c index 18feead..574c343 100644 --- a/xen/arch/arm/psci.c +++ b/xen/arch/arm/psci.c @@ -43,6 +43,9 @@ int do_psci_cpu_on(uint32_t vcpuid, register_t entry_point) ctxt->ttbr1 = 0; ctxt->ttbcr = 0; /* Defined Reset Value */ ctxt->user_regs.cpsr = PSR_GUEST_INIT; + /* Start the VCPU in THUMB mode if it''s requested by the kernel */ + if ( entry_point & 1 ) + ctxt->user_regs.cpsr |= PSR_THUMB; ctxt->flags = VGCF_online; domain_lock(d); -- 1.7.10.4
Julien Grall
2013-Jul-23 18:05 UTC
[PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
From the errata document: When a non-secure non-hypervisor memory operation instruction generates a stage2 page table translation fault, a trap to the hypervisor will be triggered. For an architecturally defined subset of instructions, the Hypervisor Syndrome Register (HSR) will have the Instruction Syndrome Valid (ISV) bit set to 1’b1, and the Rt field should reflect the source register (for stores) or destination register for loads. On Cortex-A15, for Thumb and ThumbEE stores, the Rt value may be incorrect and should not be used, even if the ISV bit is set. All loads, and all ARM instruction set loads and stores, will have the correct Rt value if the ISV bit is set. To avoid this issue, Xen needs to decode thumb store instruction and update the transfer register. Signed-off-by: Julien Grall <julien.grall@linaro.org> --- xen/arch/arm/traps.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index d6dc37d..da2bef6 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -35,6 +35,7 @@ #include <asm/regs.h> #include <asm/cpregs.h> #include <asm/psci.h> +#include <asm/guest_access.h> #include "io.h" #include "vtimer.h" @@ -996,6 +997,31 @@ done: if (first) unmap_domain_page(first); } +static int read_instruction(struct cpu_user_regs *regs, unsigned len, + uint32_t *instr) +{ + int rc; + + rc = raw_copy_from_guest(instr, (void * __user)regs->pc, len); + + if ( rc ) + return rc; + + switch ( len ) + { + /* 16-bit instruction */ + case 2: + *instr &= 0xffff; + break; + /* 32-bit instruction */ + case 4: + *instr = (*instr & 0xffff) << 16 | (*instr & 0xffff0000) >> 16; + break; + } + + return 0; +} + static void do_trap_data_abort_guest(struct cpu_user_regs *regs, struct hsr_dabt dabt) { @@ -1021,6 +1047,27 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, if ( !dabt.valid ) goto bad_data_abort; + /* + * Errata 766422: Thumb store translation fault to Hypervisor may + * not have correct HSR Rt value. + */ + if ( (regs->cpsr & PSR_THUMB) && dabt.write ) + { + uint32_t instr = 0; + + rc = read_instruction(regs, dabt.len ? 4 : 2, &instr); + if ( rc ) + goto bad_data_abort; + + /* Retrieve the transfer register from the instruction */ + if ( dabt.len ) + /* With 32-bit store instruction, the register is in [12..15] */ + info.dabt.reg = (instr & 0xf000) >> 12; + else + /* With 16-bit store instruction, the register is in [0..3] */ + info.dabt.reg = instr & 0x7; + } + if (handle_mmio(&info)) { regs->pc += dabt.len ? 4 : 2; -- 1.7.10.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote:> Hi, > > By default, Linaro Kernel is compiled with THUMB2 enabled. This small patch > series allow a guest kernel to use THUMB instruction.Cool. I''m actually working on right now on a patch to correctly handle the ITSTATE machine cranking and conditional instructions, which will be needed for this too. Ian.> > Cheers, > > Julien Grall (3): > xen/arm: Don''t emulate the MMIO access if the instruction syndrome is > invalid > xen/arm: Allow secondary cpus to start in THUMB > xen/arm: errata 766422: decode thumb store during data abort > > xen/arch/arm/psci.c | 3 +++ > xen/arch/arm/traps.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 54 insertions(+) >
Ian Campbell
2013-Jul-23 18:12 UTC
Re: [PATCH 2/3] xen/arm: Allow secondary cpus to start in THUMB
On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote:> Unlike bx, eret will not update the instruction set (THUMB,ARM) according to > the return address. This will result to an unpredicable behaviour for the > processor if the address doesn''t match the right instruction set. > > When the kernel is compiled with THUMB2, THUMB bit needs to be set in CPSR > for the secondary cpus. > > Signed-off-by: Julien Grall <julien.grall@linaro.org> > --- > xen/arch/arm/psci.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c > index 18feead..574c343 100644 > --- a/xen/arch/arm/psci.c > +++ b/xen/arch/arm/psci.c > @@ -43,6 +43,9 @@ int do_psci_cpu_on(uint32_t vcpuid, register_t entry_point) > ctxt->ttbr1 = 0; > ctxt->ttbcr = 0; /* Defined Reset Value */ > ctxt->user_regs.cpsr = PSR_GUEST_INIT; > + /* Start the VCPU in THUMB mode if it''s requested by the kernel */ > + if ( entry_point & 1 ) > + ctxt->user_regs.cpsr |= PSR_THUMB;Do we also need to clear bit 0 of the entry point, or does ERET get that right for us?> ctxt->flags = VGCF_online; > > domain_lock(d);
Ian Campbell
2013-Jul-23 18:18 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote:> From the errata document: > > When a non-secure non-hypervisor memory operation instruction generates a > stage2 page table translation fault, a trap to the hypervisor will be triggered. > For an architecturally defined subset of instructions, the Hypervisor Syndrome > Register (HSR) will have the Instruction Syndrome Valid (ISV) bit set to 1’b1, > and the Rt field should reflect the source register (for stores) or destination > register for loads. > On Cortex-A15, for Thumb and ThumbEE stores, the Rt value may be incorrect > and should not be used, even if the ISV bit is set. All loads, and all ARM > instruction set loads and stores, will have the correct Rt value if the ISV > bit is set. > > To avoid this issue, Xen needs to decode thumb store instruction and update > the transfer register. > > Signed-off-by: Julien Grall <julien.grall@linaro.org> > --- > xen/arch/arm/traps.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 47 insertions(+) > > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c > index d6dc37d..da2bef6 100644 > --- a/xen/arch/arm/traps.c > +++ b/xen/arch/arm/traps.c > @@ -35,6 +35,7 @@ > #include <asm/regs.h> > #include <asm/cpregs.h> > #include <asm/psci.h> > +#include <asm/guest_access.h> > > #include "io.h" > #include "vtimer.h" > @@ -996,6 +997,31 @@ done: > if (first) unmap_domain_page(first); > } > > +static int read_instruction(struct cpu_user_regs *regs, unsigned len, > + uint32_t *instr) > +{ > + int rc; > + > + rc = raw_copy_from_guest(instr, (void * __user)regs->pc, len); > + > + if ( rc ) > + return rc; > + > + switch ( len ) > + { > + /* 16-bit instruction */ > + case 2: > + *instr &= 0xffff; > + break; > + /* 32-bit instruction */ > + case 4: > + *instr = (*instr & 0xffff) << 16 | (*instr & 0xffff0000) >> 16;Since you only ever consult bits 12..15 or 0..3 of the result couldn't you just read two bytes from pc+2 instead of reading 4 bytes and swapping etc?> + break; > + } > + > + return 0; > +} > + > static void do_trap_data_abort_guest(struct cpu_user_regs *regs, > struct hsr_dabt dabt) > { > @@ -1021,6 +1047,27 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, > if ( !dabt.valid ) > goto bad_data_abort; > > + /* > + * Errata 766422: Thumb store translation fault to Hypervisor may > + * not have correct HSR Rt value. > + */ > + if ( (regs->cpsr & PSR_THUMB) && dabt.write )Is there some way to more precisely identify the processors with this errata? It'd be better to avoid this rigmarole when we can... I'd think about implementing this as a pseudo-cpuinfo thing setup either via identify_cpu or perhaps via a processor callback in proc-v7.S and friends. Then you can define and use something like cpu_has_errata766422 in the conditional and force it to zero for arm64 builds.> + { > + uint32_t instr = 0; > + > + rc = read_instruction(regs, dabt.len ? 4 : 2, &instr);You might as well just pass dabt.len directly I think, since you just decode it again with a switch.> + if ( rc ) > + goto bad_data_abort; > + > + /* Retrieve the transfer register from the instruction */ > + if ( dabt.len ) > + /* With 32-bit store instruction, the register is in [12..15] */ > + info.dabt.reg = (instr & 0xf000) >> 12; > + else > + /* With 16-bit store instruction, the register is in [0..3] */ > + info.dabt.reg = instr & 0x7; > + } > + > if (handle_mmio(&info)) > { > regs->pc += dabt.len ? 4 : 2;_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Julien Grall
2013-Jul-23 21:28 UTC
Re: [PATCH 2/3] xen/arm: Allow secondary cpus to start in THUMB
On 23 July 2013 19:12, Ian Campbell <ian.campbell@citrix.com> wrote:> On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote: >> Unlike bx, eret will not update the instruction set (THUMB,ARM) according to >> the return address. This will result to an unpredicable behaviour for the >> processor if the address doesn''t match the right instruction set. >> >> When the kernel is compiled with THUMB2, THUMB bit needs to be set in CPSR >> for the secondary cpus. >> >> Signed-off-by: Julien Grall <julien.grall@linaro.org> >> --- >> xen/arch/arm/psci.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c >> index 18feead..574c343 100644 >> --- a/xen/arch/arm/psci.c >> +++ b/xen/arch/arm/psci.c >> @@ -43,6 +43,9 @@ int do_psci_cpu_on(uint32_t vcpuid, register_t entry_point) >> ctxt->ttbr1 = 0; >> ctxt->ttbcr = 0; /* Defined Reset Value */ >> ctxt->user_regs.cpsr = PSR_GUEST_INIT; >> + /* Start the VCPU in THUMB mode if it''s requested by the kernel */ >> + if ( entry_point & 1 ) >> + ctxt->user_regs.cpsr |= PSR_THUMB; > > Do we also need to clear bit 0 of the entry point, or does ERET get that > right for us?ERET will clear bit 0. So no need to clear it. -- Julien Grall
Julien Grall
2013-Jul-23 21:41 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
On 23 July 2013 19:18, Ian Campbell <ian.campbell@citrix.com> wrote:> On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote: >> From the errata document: >> >> When a non-secure non-hypervisor memory operation instruction generates a >> stage2 page table translation fault, a trap to the hypervisor will be triggered. >> For an architecturally defined subset of instructions, the Hypervisor Syndrome >> Register (HSR) will have the Instruction Syndrome Valid (ISV) bit set to 1’b1, >> and the Rt field should reflect the source register (for stores) or destination >> register for loads. >> On Cortex-A15, for Thumb and ThumbEE stores, the Rt value may be incorrect >> and should not be used, even if the ISV bit is set. All loads, and all ARM >> instruction set loads and stores, will have the correct Rt value if the ISV >> bit is set. >> >> To avoid this issue, Xen needs to decode thumb store instruction and update >> the transfer register. >> >> Signed-off-by: Julien Grall <julien.grall@linaro.org> >> --- >> xen/arch/arm/traps.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 47 insertions(+) >> >> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c >> index d6dc37d..da2bef6 100644 >> --- a/xen/arch/arm/traps.c >> +++ b/xen/arch/arm/traps.c >> @@ -35,6 +35,7 @@ >> #include <asm/regs.h> >> #include <asm/cpregs.h> >> #include <asm/psci.h> >> +#include <asm/guest_access.h> >> >> #include "io.h" >> #include "vtimer.h" >> @@ -996,6 +997,31 @@ done: >> if (first) unmap_domain_page(first); >> } >> >> +static int read_instruction(struct cpu_user_regs *regs, unsigned len, >> + uint32_t *instr) >> +{ >> + int rc; >> + >> + rc = raw_copy_from_guest(instr, (void * __user)regs->pc, len); >> + >> + if ( rc ) >> + return rc; >> + >> + switch ( len ) >> + { >> + /* 16-bit instruction */ >> + case 2: >> + *instr &= 0xffff; >> + break; >> + /* 32-bit instruction */ >> + case 4: >> + *instr = (*instr & 0xffff) << 16 | (*instr & 0xffff0000) >> 16; > > Since you only ever consult bits 12..15 or 0..3 of the result couldn't > you just read two bytes from pc+2 instead of reading 4 bytes and > swapping etc?I was thinking to provide a "high level" function to retrieve an instruction just in case we need it in the future.>> + break; >> + } >> + >> + return 0; >> +} >> + >> static void do_trap_data_abort_guest(struct cpu_user_regs *regs, >> struct hsr_dabt dabt) >> { >> @@ -1021,6 +1047,27 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, >> if ( !dabt.valid ) >> goto bad_data_abort; >> >> + /* >> + * Errata 766422: Thumb store translation fault to Hypervisor may >> + * not have correct HSR Rt value. >> + */ >> + if ( (regs->cpsr & PSR_THUMB) && dabt.write ) > > Is there some way to more precisely identify the processors with this > errata? It'd be better to avoid this rigmarole when we can...Only r0p4 (for instance the arndale board) should be impacted by this errata.> I'd think about implementing this as a pseudo-cpuinfo thing setup either > via identify_cpu or perhaps via a processor callback in proc-v7.S and > friends. Then you can define and use something like cpu_has_errata766422 > in the conditional and force it to zero for arm64 builds.Sounds good. I will rework the patch with this solution.>> + { >> + uint32_t instr = 0; >> + >> + rc = read_instruction(regs, dabt.len ? 4 : 2, &instr); > > You might as well just pass dabt.len directly I think, since you just > decode it again with a switch.If a high-level helper to read an instruction is not needed, I will directly mix this function with the decode part below.>> + if ( rc ) >> + goto bad_data_abort; >> + >> + /* Retrieve the transfer register from the instruction */ >> + if ( dabt.len ) >> + /* With 32-bit store instruction, the register is in [12..15] */ >> + info.dabt.reg = (instr & 0xf000) >> 12; >> + else >> + /* With 16-bit store instruction, the register is in [0..3] */ >> + info.dabt.reg = instr & 0x7; >> + } >> + >> if (handle_mmio(&info)) >> { >> regs->pc += dabt.len ? 4 : 2; > >-- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Campbell
2013-Jul-23 22:07 UTC
Re: [PATCH 2/3] xen/arm: Allow secondary cpus to start in THUMB
On Tue, 2013-07-23 at 22:28 +0100, Julien Grall wrote:> On 23 July 2013 19:12, Ian Campbell <ian.campbell@citrix.com> wrote: > > On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote: > >> Unlike bx, eret will not update the instruction set (THUMB,ARM) according to > >> the return address. This will result to an unpredicable behaviour for the > >> processor if the address doesn''t match the right instruction set. > >> > >> When the kernel is compiled with THUMB2, THUMB bit needs to be set in CPSR > >> for the secondary cpus. > >> > >> Signed-off-by: Julien Grall <julien.grall@linaro.org> > >> --- > >> xen/arch/arm/psci.c | 3 +++ > >> 1 file changed, 3 insertions(+) > >> > >> diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c > >> index 18feead..574c343 100644 > >> --- a/xen/arch/arm/psci.c > >> +++ b/xen/arch/arm/psci.c > >> @@ -43,6 +43,9 @@ int do_psci_cpu_on(uint32_t vcpuid, register_t entry_point) > >> ctxt->ttbr1 = 0; > >> ctxt->ttbcr = 0; /* Defined Reset Value */ > >> ctxt->user_regs.cpsr = PSR_GUEST_INIT; > >> + /* Start the VCPU in THUMB mode if it''s requested by the kernel */ > >> + if ( entry_point & 1 ) > >> + ctxt->user_regs.cpsr |= PSR_THUMB; > > > > Do we also need to clear bit 0 of the entry point, or does ERET get that > > right for us? > > ERET will clear bit 0. So no need to clear it.Perfect! Acked-by: Ian Campbell <ian.campbell@citrix.com> Strictly speaking the PSCI spec (DEN 0022B.b) says that for aarch64 guests we should return INVALID_PARAMETERS (-2) if bit 1 is set. Ian
Julien Grall
2013-Jul-23 22:09 UTC
Re: [PATCH 2/3] xen/arm: Allow secondary cpus to start in THUMB
On 23 July 2013 23:07, Ian Campbell <ian.campbell@citrix.com> wrote:> On Tue, 2013-07-23 at 22:28 +0100, Julien Grall wrote: >> On 23 July 2013 19:12, Ian Campbell <ian.campbell@citrix.com> wrote: >> > On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote: >> >> Unlike bx, eret will not update the instruction set (THUMB,ARM) according to >> >> the return address. This will result to an unpredicable behaviour for the >> >> processor if the address doesn''t match the right instruction set. >> >> >> >> When the kernel is compiled with THUMB2, THUMB bit needs to be set in CPSR >> >> for the secondary cpus. >> >> >> >> Signed-off-by: Julien Grall <julien.grall@linaro.org> >> >> --- >> >> xen/arch/arm/psci.c | 3 +++ >> >> 1 file changed, 3 insertions(+) >> >> >> >> diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c >> >> index 18feead..574c343 100644 >> >> --- a/xen/arch/arm/psci.c >> >> +++ b/xen/arch/arm/psci.c >> >> @@ -43,6 +43,9 @@ int do_psci_cpu_on(uint32_t vcpuid, register_t entry_point) >> >> ctxt->ttbr1 = 0; >> >> ctxt->ttbcr = 0; /* Defined Reset Value */ >> >> ctxt->user_regs.cpsr = PSR_GUEST_INIT; >> >> + /* Start the VCPU in THUMB mode if it''s requested by the kernel */ >> >> + if ( entry_point & 1 ) >> >> + ctxt->user_regs.cpsr |= PSR_THUMB; >> > >> > Do we also need to clear bit 0 of the entry point, or does ERET get that >> > right for us? >> >> ERET will clear bit 0. So no need to clear it. > > Perfect! > > Acked-by: Ian Campbell <ian.campbell@citrix.com> > > Strictly speaking the PSCI spec (DEN 0022B.b) says that for aarch64 > guests we should return INVALID_PARAMETERS (-2) if bit 1 is set.I can update the patch to check if we are running an aarch64 guest. -- Julien Grall
Ian Campbell
2013-Jul-23 22:10 UTC
Re: [PATCH 1/3] xen/arm: Don''t emulate the MMIO access if the instruction syndrome is invalid
On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote:> When the instruction syndrome is not valid, the transfer register is unknown.Are there known circumstances when this can happen? Trapped store multiples or something like that? Did you actually see one?> If this register is used in the emulation code (it''s the case for the VGIC), > Xen can retrieve wrong data. > > For safety, consider invalid instruction syndrome as wrong memory access.That''s not really what it is though. I think this deserves at least a printed warning but to be honest if we aren''t going to emulate the instruction then there isn''t much chance that the guest will be able to recover from a spurious dabt -- IOW we might as well just shoot the guest in the head?> Signed-off-by: Julien Grall <julien.grall@linaro.org> > --- > xen/arch/arm/traps.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c > index bbd60aa..d6dc37d 100644 > --- a/xen/arch/arm/traps.c > +++ b/xen/arch/arm/traps.c > @@ -1017,6 +1017,10 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, > if ( rc == -EFAULT ) > goto bad_data_abort; > > + /* XXX: Decode the instruction if ISS is not valid */ > + if ( !dabt.valid ) > + goto bad_data_abort; > + > if (handle_mmio(&info)) > { > regs->pc += dabt.len ? 4 : 2;
Ian Campbell
2013-Jul-23 22:16 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
On Tue, 2013-07-23 at 22:41 +0100, Julien Grall wrote:> On 23 July 2013 19:18, Ian Campbell <ian.campbell@citrix.com> wrote: > > On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote: > >> From the errata document: > >> > >> When a non-secure non-hypervisor memory operation instruction generates a > >> stage2 page table translation fault, a trap to the hypervisor will be triggered. > >> For an architecturally defined subset of instructions, the Hypervisor Syndrome > >> Register (HSR) will have the Instruction Syndrome Valid (ISV) bit set to 1’b1, > >> and the Rt field should reflect the source register (for stores) or destination > >> register for loads. > >> On Cortex-A15, for Thumb and ThumbEE stores, the Rt value may be incorrect > >> and should not be used, even if the ISV bit is set. All loads, and all ARM > >> instruction set loads and stores, will have the correct Rt value if the ISV > >> bit is set. > >> > >> To avoid this issue, Xen needs to decode thumb store instruction and update > >> the transfer register. > >> > >> Signed-off-by: Julien Grall <julien.grall@linaro.org> > >> --- > >> xen/arch/arm/traps.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 47 insertions(+) > >> > >> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c > >> index d6dc37d..da2bef6 100644 > >> --- a/xen/arch/arm/traps.c > >> +++ b/xen/arch/arm/traps.c > >> @@ -35,6 +35,7 @@ > >> #include <asm/regs.h> > >> #include <asm/cpregs.h> > >> #include <asm/psci.h> > >> +#include <asm/guest_access.h> > >> > >> #include "io.h" > >> #include "vtimer.h" > >> @@ -996,6 +997,31 @@ done: > >> if (first) unmap_domain_page(first); > >> } > >> > >> +static int read_instruction(struct cpu_user_regs *regs, unsigned len, > >> + uint32_t *instr) > >> +{ > >> + int rc; > >> + > >> + rc = raw_copy_from_guest(instr, (void * __user)regs->pc, len); > >> + > >> + if ( rc ) > >> + return rc; > >> + > >> + switch ( len ) > >> + { > >> + /* 16-bit instruction */ > >> + case 2: > >> + *instr &= 0xffff; > >> + break; > >> + /* 32-bit instruction */ > >> + case 4: > >> + *instr = (*instr & 0xffff) << 16 | (*instr & 0xffff0000) >> 16; > > > > Since you only ever consult bits 12..15 or 0..3 of the result couldn't > > you just read two bytes from pc+2 instead of reading 4 bytes and > > swapping etc? > > I was thinking to provide a "high level" function to retrieve an > instruction just in case we need it in the future.I suppose that does sound potentially useful. Were does this the idea of swapping them come from though? The ARM ARM seems (see e.g. section A6.3) to describe things in the order they are in memory -- is doing otherwise not going to end up confusing us? [...]> > I'd think about implementing this as a pseudo-cpuinfo thing setup either > > via identify_cpu or perhaps via a processor callback in proc-v7.S and > > friends. Then you can define and use something like cpu_has_errata766422 > > in the conditional and force it to zero for arm64 builds. > > Sounds good. I will rework the patch with this solution.Thanks. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Julien Grall
2013-Jul-23 22:43 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
On 23 July 2013 23:16, Ian Campbell <ian.campbell@citrix.com> wrote:> On Tue, 2013-07-23 at 22:41 +0100, Julien Grall wrote: >> On 23 July 2013 19:18, Ian Campbell <ian.campbell@citrix.com> wrote: >> > On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote: >> >> From the errata document: >> >> >> >> When a non-secure non-hypervisor memory operation instruction generates a >> >> stage2 page table translation fault, a trap to the hypervisor will be triggered. >> >> For an architecturally defined subset of instructions, the Hypervisor Syndrome >> >> Register (HSR) will have the Instruction Syndrome Valid (ISV) bit set to 1’b1, >> >> and the Rt field should reflect the source register (for stores) or destination >> >> register for loads. >> >> On Cortex-A15, for Thumb and ThumbEE stores, the Rt value may be incorrect >> >> and should not be used, even if the ISV bit is set. All loads, and all ARM >> >> instruction set loads and stores, will have the correct Rt value if the ISV >> >> bit is set. >> >> >> >> To avoid this issue, Xen needs to decode thumb store instruction and update >> >> the transfer register. >> >> >> >> Signed-off-by: Julien Grall <julien.grall@linaro.org> >> >> --- >> >> xen/arch/arm/traps.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ >> >> 1 file changed, 47 insertions(+) >> >> >> >> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c >> >> index d6dc37d..da2bef6 100644 >> >> --- a/xen/arch/arm/traps.c >> >> +++ b/xen/arch/arm/traps.c >> >> @@ -35,6 +35,7 @@ >> >> #include <asm/regs.h> >> >> #include <asm/cpregs.h> >> >> #include <asm/psci.h> >> >> +#include <asm/guest_access.h> >> >> >> >> #include "io.h" >> >> #include "vtimer.h" >> >> @@ -996,6 +997,31 @@ done: >> >> if (first) unmap_domain_page(first); >> >> } >> >> >> >> +static int read_instruction(struct cpu_user_regs *regs, unsigned len, >> >> + uint32_t *instr) >> >> +{ >> >> + int rc; >> >> + >> >> + rc = raw_copy_from_guest(instr, (void * __user)regs->pc, len); >> >> + >> >> + if ( rc ) >> >> + return rc; >> >> + >> >> + switch ( len ) >> >> + { >> >> + /* 16-bit instruction */ >> >> + case 2: >> >> + *instr &= 0xffff; >> >> + break; >> >> + /* 32-bit instruction */ >> >> + case 4: >> >> + *instr = (*instr & 0xffff) << 16 | (*instr & 0xffff0000) >> 16; >> > >> > Since you only ever consult bits 12..15 or 0..3 of the result couldn't >> > you just read two bytes from pc+2 instead of reading 4 bytes and >> > swapping etc? >> >> I was thinking to provide a "high level" function to retrieve an >> instruction just in case we need it in the future. > > I suppose that does sound potentially useful. > > Were does this the idea of swapping them come from though? The ARM ARM > seems (see e.g. section A6.3) to describe things in the order they are > in memory -- is doing otherwise not going to end up confusing us?In THUMB set, a 32-bit instruction consisting of two consecutive halfwords (see section A6.1). So the instruction is in "big endian", but Xen will read the word as a "little endian" things. But in ARM set, a 32-bit instruction is a single word. I will add a check to handle this case. -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Campbell
2013-Jul-24 14:55 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
On Tue, 2013-07-23 at 23:43 +0100, Julien Grall wrote:> On 23 July 2013 23:16, Ian Campbell <ian.campbell@citrix.com> wrote: > > On Tue, 2013-07-23 at 22:41 +0100, Julien Grall wrote: > >> On 23 July 2013 19:18, Ian Campbell <ian.campbell@citrix.com> wrote: > >> > On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote: > >> >> From the errata document: > >> >> > >> >> When a non-secure non-hypervisor memory operation instruction generates a > >> >> stage2 page table translation fault, a trap to the hypervisor will be triggered. > >> >> For an architecturally defined subset of instructions, the Hypervisor Syndrome > >> >> Register (HSR) will have the Instruction Syndrome Valid (ISV) bit set to 1’b1, > >> >> and the Rt field should reflect the source register (for stores) or destination > >> >> register for loads. > >> >> On Cortex-A15, for Thumb and ThumbEE stores, the Rt value may be incorrect > >> >> and should not be used, even if the ISV bit is set. All loads, and all ARM > >> >> instruction set loads and stores, will have the correct Rt value if the ISV > >> >> bit is set. > >> >> > >> >> To avoid this issue, Xen needs to decode thumb store instruction and update > >> >> the transfer register. > >> >> > >> >> Signed-off-by: Julien Grall <julien.grall@linaro.org> > >> >> --- > >> >> xen/arch/arm/traps.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ > >> >> 1 file changed, 47 insertions(+) > >> >> > >> >> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c > >> >> index d6dc37d..da2bef6 100644 > >> >> --- a/xen/arch/arm/traps.c > >> >> +++ b/xen/arch/arm/traps.c > >> >> @@ -35,6 +35,7 @@ > >> >> #include <asm/regs.h> > >> >> #include <asm/cpregs.h> > >> >> #include <asm/psci.h> > >> >> +#include <asm/guest_access.h> > >> >> > >> >> #include "io.h" > >> >> #include "vtimer.h" > >> >> @@ -996,6 +997,31 @@ done: > >> >> if (first) unmap_domain_page(first); > >> >> } > >> >> > >> >> +static int read_instruction(struct cpu_user_regs *regs, unsigned len, > >> >> + uint32_t *instr) > >> >> +{ > >> >> + int rc; > >> >> + > >> >> + rc = raw_copy_from_guest(instr, (void * __user)regs->pc, len); > >> >> + > >> >> + if ( rc ) > >> >> + return rc; > >> >> + > >> >> + switch ( len ) > >> >> + { > >> >> + /* 16-bit instruction */ > >> >> + case 2: > >> >> + *instr &= 0xffff; > >> >> + break; > >> >> + /* 32-bit instruction */ > >> >> + case 4: > >> >> + *instr = (*instr & 0xffff) << 16 | (*instr & 0xffff0000) >> 16; > >> > > >> > Since you only ever consult bits 12..15 or 0..3 of the result couldn't > >> > you just read two bytes from pc+2 instead of reading 4 bytes and > >> > swapping etc? > >> > >> I was thinking to provide a "high level" function to retrieve an > >> instruction just in case we need it in the future. > > > > I suppose that does sound potentially useful. > > > > Were does this the idea of swapping them come from though? The ARM ARM > > seems (see e.g. section A6.3) to describe things in the order they are > > in memory -- is doing otherwise not going to end up confusing us? > > In THUMB set, a 32-bit instruction consisting of two consecutive > halfwords (see section A6.1). > So the instruction is in "big endian", but Xen will read the word as a > "little endian" things.Are you saying that the 16-bit sub-words are in the opposite order to what I would have expected and what seems most natural from a decode PoV? Consider the T3 encoding of STR (imm, thumb) from A8.8.203, which is: 1 1 1 1 1 0 0 0 1 1 0 0 Rn | Rt imm12 (where Rn == bits 0..3 of the first halfword, Rt is 15..12 of the second and imm12 is the remainder of the second halfword). I would have expected that the halfword with the "11111" pattern (which identifies this as a 32-bit thumb instruction) would have to come first, so the decode knows to look for the second. IOW the halfword with 11111 should be at PC and the Rt/imm12 halfword should be at PC+2. So if we copy 4 bytes from guest PC we should end up with things in the order given above (and in the manual) and swapping shouldn't be needed. Perhaps I need to go have some coffee... Have you tested and actually observed this case on real h/w?> > But in ARM set, a 32-bit instruction is a single word. I will add a > check to handle this case. >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Julien Grall
2013-Jul-24 15:19 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
On 07/24/2013 03:55 PM, Ian Campbell wrote:> On Tue, 2013-07-23 at 23:43 +0100, Julien Grall wrote: >> On 23 July 2013 23:16, Ian Campbell <ian.campbell@citrix.com> wrote: >>> On Tue, 2013-07-23 at 22:41 +0100, Julien Grall wrote: >>>> On 23 July 2013 19:18, Ian Campbell <ian.campbell@citrix.com> wrote: >>>>> On Tue, 2013-07-23 at 19:05 +0100, Julien Grall wrote: >>>>>> From the errata document: >>>>>> >>>>>> When a non-secure non-hypervisor memory operation instruction generates a >>>>>> stage2 page table translation fault, a trap to the hypervisor will be triggered. >>>>>> For an architecturally defined subset of instructions, the Hypervisor Syndrome >>>>>> Register (HSR) will have the Instruction Syndrome Valid (ISV) bit set to 1’b1, >>>>>> and the Rt field should reflect the source register (for stores) or destination >>>>>> register for loads. >>>>>> On Cortex-A15, for Thumb and ThumbEE stores, the Rt value may be incorrect >>>>>> and should not be used, even if the ISV bit is set. All loads, and all ARM >>>>>> instruction set loads and stores, will have the correct Rt value if the ISV >>>>>> bit is set. >>>>>> >>>>>> To avoid this issue, Xen needs to decode thumb store instruction and update >>>>>> the transfer register. >>>>>> >>>>>> Signed-off-by: Julien Grall <julien.grall@linaro.org> >>>>>> --- >>>>>> xen/arch/arm/traps.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ >>>>>> 1 file changed, 47 insertions(+) >>>>>> >>>>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c >>>>>> index d6dc37d..da2bef6 100644 >>>>>> --- a/xen/arch/arm/traps.c >>>>>> +++ b/xen/arch/arm/traps.c >>>>>> @@ -35,6 +35,7 @@ >>>>>> #include <asm/regs.h> >>>>>> #include <asm/cpregs.h> >>>>>> #include <asm/psci.h> >>>>>> +#include <asm/guest_access.h> >>>>>> >>>>>> #include "io.h" >>>>>> #include "vtimer.h" >>>>>> @@ -996,6 +997,31 @@ done: >>>>>> if (first) unmap_domain_page(first); >>>>>> } >>>>>> >>>>>> +static int read_instruction(struct cpu_user_regs *regs, unsigned len, >>>>>> + uint32_t *instr) >>>>>> +{ >>>>>> + int rc; >>>>>> + >>>>>> + rc = raw_copy_from_guest(instr, (void * __user)regs->pc, len); >>>>>> + >>>>>> + if ( rc ) >>>>>> + return rc; >>>>>> + >>>>>> + switch ( len ) >>>>>> + { >>>>>> + /* 16-bit instruction */ >>>>>> + case 2: >>>>>> + *instr &= 0xffff; >>>>>> + break; >>>>>> + /* 32-bit instruction */ >>>>>> + case 4: >>>>>> + *instr = (*instr & 0xffff) << 16 | (*instr & 0xffff0000) >> 16; >>>>> >>>>> Since you only ever consult bits 12..15 or 0..3 of the result couldn't >>>>> you just read two bytes from pc+2 instead of reading 4 bytes and >>>>> swapping etc? >>>> >>>> I was thinking to provide a "high level" function to retrieve an >>>> instruction just in case we need it in the future. >>> >>> I suppose that does sound potentially useful. >>> >>> Were does this the idea of swapping them come from though? The ARM ARM >>> seems (see e.g. section A6.3) to describe things in the order they are >>> in memory -- is doing otherwise not going to end up confusing us? >> >> In THUMB set, a 32-bit instruction consisting of two consecutive >> halfwords (see section A6.1). >> So the instruction is in "big endian", but Xen will read the word as a >> "little endian" things. > > Are you saying that the 16-bit sub-words are in the opposite order to > what I would have expected and what seems most natural from a decode > PoV?> Consider the T3 encoding of STR (imm, thumb) from A8.8.203, which is: > 1 1 1 1 1 0 0 0 1 1 0 0 Rn | Rt imm12 > > (where Rn == bits 0..3 of the first halfword, Rt is 15..12 of the second > and imm12 is the remainder of the second halfword). > > I would have expected that the halfword with the "11111" pattern (which > identifies this as a 32-bit thumb instruction) would have to come first, > so the decode knows to look for the second. IOW the halfword with 11111 > should be at PC and the Rt/imm12 halfword should be at PC+2. So if we > copy 4 bytes from guest PC we should end up with things in the order > given above (and in the manual) and swapping shouldn't be needed. > Perhaps I need to go have some coffee...The ARM ARM specification describes a THUMB 32 bit instruction with HW1 HW2 HW1 => [31:16] => most significant HW2 => [15:0] => less significant raw_copy_from_copy will directly read 4 bytes and store in an uint32_t. As Xen is running in little endian, it ends up in: HW2 => [31:16] HW1 => [15:0] Which is false. If it's more clear, I can do something like this uint16_t a[2]; rc = raw_copy_from_guest(instr, (void * __user)regs->pc, len); ... switch ( len ) { ... case 4: instr = a[0] << 16 | a[1]; ... }> Have you tested and actually observed this case on real h/w?I tried on the arndale board. -- Julien _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Tim Deegan
2013-Jul-29 14:46 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
Hi, At 16:19 +0100 on 24 Jul (1374682769), Julien Grall wrote:> On 07/24/2013 03:55 PM, Ian Campbell wrote: > > On Tue, 2013-07-23 at 23:43 +0100, Julien Grall wrote: > >> On 23 July 2013 23:16, Ian Campbell <ian.campbell@citrix.com> wrote: > >>> Were does this the idea of swapping them come from though? The ARM ARM > >>> seems (see e.g. section A6.3) to describe things in the order they are > >>> in memory -- is doing otherwise not going to end up confusing us? > >> > >> In THUMB set, a 32-bit instruction consisting of two consecutive > >> halfwords (see section A6.1). > >> So the instruction is in "big endian", but Xen will read the word as a > >> "little endian" things. > > > > Are you saying that the 16-bit sub-words are in the opposite order to > > what I would have expected and what seems most natural from a decode > > PoV? > > > Consider the T3 encoding of STR (imm, thumb) from A8.8.203, which is: > > 1 1 1 1 1 0 0 0 1 1 0 0 Rn | Rt imm12 > > > > (where Rn == bits 0..3 of the first halfword, Rt is 15..12 of the second > > and imm12 is the remainder of the second halfword). > > > > I would have expected that the halfword with the "11111" pattern (which > > identifies this as a 32-bit thumb instruction) would have to come first, > > so the decode knows to look for the second. IOW the halfword with 11111 > > should be at PC and the Rt/imm12 halfword should be at PC+2.Yes. This seems to be pretty much explicit in section A6.1 -- you can read 16 bits and decide from those whether you need to read another 16. "If bits [15:11] of the halfword being decoded take any of the following values, the halfword is the first halfword of a 32-bit instruction"> > So if we > > copy 4 bytes from guest PC we should end up with things in the order > > given above (and in the manual) and swapping shouldn''t be needed.Sadly, no. Instruction memory is always little-endian, but: "The Thumb instruction stream is a sequence of halfword-aligned halfwords. Each Thumb instruction is either a single 16-bit halfword in that stream, or a 32-bit instruction consisting of two consecutive halfwords in that stream." So a 32-bit thumb instruction with bytes ABCD (where byte A is the one with the magic 5-bit pattern) will be stored in memory as a mixed-endian monstrosity: PC: B PC+1: A PC+2: D PC+3: C A little-endian halfword read of PC gives you AB; another halfword read at PC+2 gives CD, and a 32-bit little-endian read gives CDAB. We don''t necessarily have to do the bit-shuffling explicitly: we could do thumb32 decode with the shuffle implicit in the layout used for decoding.> uint16_t a[2]; > rc = raw_copy_from_guest(instr, (void * __user)regs->pc, len);You definitely should read exactly the correct number of bytes -- if we are decoding a 16-bit instruction at the end of a page we don''t want to trigger a fault by reading 32 bits and crossing a page boundary. Oh, and one other comment that I''ve already lost the context for: can you please rename the instruction fetch-and-shuffle function to have ''thumb'' in its name somewhere? Cheers, Tim.
Ian Campbell
2013-Jul-29 14:55 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
On Mon, 2013-07-29 at 15:46 +0100, Tim Deegan wrote:> Hi, > > At 16:19 +0100 on 24 Jul (1374682769), Julien Grall wrote: > > On 07/24/2013 03:55 PM, Ian Campbell wrote: > > > On Tue, 2013-07-23 at 23:43 +0100, Julien Grall wrote: > > >> On 23 July 2013 23:16, Ian Campbell <ian.campbell@citrix.com> wrote: > > >>> Were does this the idea of swapping them come from though? The ARM ARM > > >>> seems (see e.g. section A6.3) to describe things in the order they are > > >>> in memory -- is doing otherwise not going to end up confusing us? > > >> > > >> In THUMB set, a 32-bit instruction consisting of two consecutive > > >> halfwords (see section A6.1). > > >> So the instruction is in "big endian", but Xen will read the word as a > > >> "little endian" things. > > > > > > Are you saying that the 16-bit sub-words are in the opposite order to > > > what I would have expected and what seems most natural from a decode > > > PoV? > > > > > Consider the T3 encoding of STR (imm, thumb) from A8.8.203, which is: > > > 1 1 1 1 1 0 0 0 1 1 0 0 Rn | Rt imm12 > > > > > > (where Rn == bits 0..3 of the first halfword, Rt is 15..12 of the second > > > and imm12 is the remainder of the second halfword). > > > > > > I would have expected that the halfword with the "11111" pattern (which > > > identifies this as a 32-bit thumb instruction) would have to come first, > > > so the decode knows to look for the second. IOW the halfword with 11111 > > > should be at PC and the Rt/imm12 halfword should be at PC+2. > > Yes. This seems to be pretty much explicit in section A6.1 -- you can > read 16 bits and decide from those whether you need to read another 16. > > "If bits [15:11] of the halfword being decoded take any of the following > values, the halfword is the first halfword of a 32-bit instruction" > > > > So if we > > > copy 4 bytes from guest PC we should end up with things in the order > > > given above (and in the manual) and swapping shouldn''t be needed. > > Sadly, no. Instruction memory is always little-endian, but: > > "The Thumb instruction stream is a sequence of halfword-aligned > halfwords. Each Thumb instruction is either a single 16-bit halfword > in that stream, or a 32-bit instruction consisting of two consecutive > halfwords in that stream." > > So a 32-bit thumb instruction with bytes ABCD (where byte A is the one > with the magic 5-bit pattern) will be stored in memory as a mixed-endian > monstrosity: > > PC: B > PC+1: A > PC+2: D > PC+3: C > > A little-endian halfword read of PC gives you AB; another halfword read > at PC+2 gives CD, and a 32-bit little-endian read gives CDAB.Good lord. No wonder I didn''t get what Julien was trying to explain to me, he was trying to explain an insanity!> We don''t necessarily have to do the bit-shuffling explicitly: we could > do thumb32 decode with the shuffle implicit in the layout used for > decoding.Since we already know the instruction length in this context that would seem OK. Although it does make it harder to reason about code which handles either 16 or 32 bit instructions -- which half contains "bit 0" becomes different. If we wanted to be more general we would do a 16-bit read and then check for the magic pattern before doing a shift and a second 16-bit read, which mimic the hardware and avoid my tiny brain having to think to hard about mixed endianness.> > uint16_t a[2]; > > rc = raw_copy_from_guest(instr, (void * __user)regs->pc, len); > > You definitely should read exactly the correct number of bytes -- if we > are decoding a 16-bit instruction at the end of a page we don''t want to > trigger a fault by reading 32 bits and crossing a page boundary.Yes.> Oh, and one other comment that I''ve already lost the context for: can > you please rename the instruction fetch-and-shuffle function to have > ''thumb'' in its name somewhere?ACK! Ian.
Julien Grall
2013-Jul-29 14:56 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
On 07/29/2013 03:46 PM, Tim Deegan wrote:> Hi, > > At 16:19 +0100 on 24 Jul (1374682769), Julien Grall wrote: >> On 07/24/2013 03:55 PM, Ian Campbell wrote: >>> On Tue, 2013-07-23 at 23:43 +0100, Julien Grall wrote: >>>> On 23 July 2013 23:16, Ian Campbell <ian.campbell@citrix.com> wrote: >>>>> Were does this the idea of swapping them come from though? The ARM ARM >>>>> seems (see e.g. section A6.3) to describe things in the order they are >>>>> in memory -- is doing otherwise not going to end up confusing us? >>>> >>>> In THUMB set, a 32-bit instruction consisting of two consecutive >>>> halfwords (see section A6.1). >>>> So the instruction is in "big endian", but Xen will read the word as a >>>> "little endian" things. >>> >>> Are you saying that the 16-bit sub-words are in the opposite order to >>> what I would have expected and what seems most natural from a decode >>> PoV? >> >>> Consider the T3 encoding of STR (imm, thumb) from A8.8.203, which is: >>> 1 1 1 1 1 0 0 0 1 1 0 0 Rn | Rt imm12 >>> >>> (where Rn == bits 0..3 of the first halfword, Rt is 15..12 of the second >>> and imm12 is the remainder of the second halfword). >>> >>> I would have expected that the halfword with the "11111" pattern (which >>> identifies this as a 32-bit thumb instruction) would have to come first, >>> so the decode knows to look for the second. IOW the halfword with 11111 >>> should be at PC and the Rt/imm12 halfword should be at PC+2. > > Yes. This seems to be pretty much explicit in section A6.1 -- you can > read 16 bits and decide from those whether you need to read another 16. > > "If bits [15:11] of the halfword being decoded take any of the following > values, the halfword is the first halfword of a 32-bit instruction" > >>> So if we >>> copy 4 bytes from guest PC we should end up with things in the order >>> given above (and in the manual) and swapping shouldn''t be needed. > > Sadly, no. Instruction memory is always little-endian, but: > > "The Thumb instruction stream is a sequence of halfword-aligned > halfwords. Each Thumb instruction is either a single 16-bit halfword > in that stream, or a 32-bit instruction consisting of two consecutive > halfwords in that stream." > > So a 32-bit thumb instruction with bytes ABCD (where byte A is the one > with the magic 5-bit pattern) will be stored in memory as a mixed-endian > monstrosity: > > PC: B > PC+1: A > PC+2: D > PC+3: C > > A little-endian halfword read of PC gives you AB; another halfword read > at PC+2 gives CD, and a 32-bit little-endian read gives CDAB. > > We don''t necessarily have to do the bit-shuffling explicitly: we could > do thumb32 decode with the shuffle implicit in the layout used for > decoding.I think we need to keep the bit-shuffling explicitly. It''s less confusing than having "non-coherent" shift during the decoding.>> uint16_t a[2]; >> rc = raw_copy_from_guest(instr, (void * __user)regs->pc, len); > > You definitely should read exactly the correct number of bytes -- if we > are decoding a 16-bit instruction at the end of a page we don''t want to > trigger a fault by reading 32 bits and crossing a page boundary.I already read either 16 or 32 bits depending of the instruction len. The array is bigger to fit to the both instruction.> Oh, and one other comment that I''ve already lost the context for: can > you please rename the instruction fetch-and-shuffle function to have > ''thumb'' in its name somewhere?Actually, I have sent a version for this patch series (http://www.gossamer-threads.com/lists/xen/devel/291808). It''s also support ARM instruction decoding. Cheers, -- Julien
Ian Campbell
2013-Jul-29 15:00 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
On Mon, 2013-07-29 at 15:56 +0100, Julien Grall wrote:> On 07/29/2013 03:46 PM, Tim Deegan wrote: > > Hi, > > > > At 16:19 +0100 on 24 Jul (1374682769), Julien Grall wrote: > >> On 07/24/2013 03:55 PM, Ian Campbell wrote: > >>> On Tue, 2013-07-23 at 23:43 +0100, Julien Grall wrote: > >>>> On 23 July 2013 23:16, Ian Campbell <ian.campbell@citrix.com> wrote: > >>>>> Were does this the idea of swapping them come from though? The ARM ARM > >>>>> seems (see e.g. section A6.3) to describe things in the order they are > >>>>> in memory -- is doing otherwise not going to end up confusing us? > >>>> > >>>> In THUMB set, a 32-bit instruction consisting of two consecutive > >>>> halfwords (see section A6.1). > >>>> So the instruction is in "big endian", but Xen will read the word as a > >>>> "little endian" things. > >>> > >>> Are you saying that the 16-bit sub-words are in the opposite order to > >>> what I would have expected and what seems most natural from a decode > >>> PoV? > >> > >>> Consider the T3 encoding of STR (imm, thumb) from A8.8.203, which is: > >>> 1 1 1 1 1 0 0 0 1 1 0 0 Rn | Rt imm12 > >>> > >>> (where Rn == bits 0..3 of the first halfword, Rt is 15..12 of the second > >>> and imm12 is the remainder of the second halfword). > >>> > >>> I would have expected that the halfword with the "11111" pattern (which > >>> identifies this as a 32-bit thumb instruction) would have to come first, > >>> so the decode knows to look for the second. IOW the halfword with 11111 > >>> should be at PC and the Rt/imm12 halfword should be at PC+2. > > > > Yes. This seems to be pretty much explicit in section A6.1 -- you can > > read 16 bits and decide from those whether you need to read another 16. > > > > "If bits [15:11] of the halfword being decoded take any of the following > > values, the halfword is the first halfword of a 32-bit instruction" > > > >>> So if we > >>> copy 4 bytes from guest PC we should end up with things in the order > >>> given above (and in the manual) and swapping shouldn''t be needed. > > > > Sadly, no. Instruction memory is always little-endian, but: > > > > "The Thumb instruction stream is a sequence of halfword-aligned > > halfwords. Each Thumb instruction is either a single 16-bit halfword > > in that stream, or a 32-bit instruction consisting of two consecutive > > halfwords in that stream." > > > > So a 32-bit thumb instruction with bytes ABCD (where byte A is the one > > with the magic 5-bit pattern) will be stored in memory as a mixed-endian > > monstrosity: > > > > PC: B > > PC+1: A > > PC+2: D > > PC+3: C > > > > A little-endian halfword read of PC gives you AB; another halfword read > > at PC+2 gives CD, and a 32-bit little-endian read gives CDAB. > > > > We don''t necessarily have to do the bit-shuffling explicitly: we could > > do thumb32 decode with the shuffle implicit in the layout used for > > decoding. > > I think we need to keep the bit-shuffling explicitly. It''s less > confusing than having "non-coherent" shift during the decoding.Whatever we do it needs a big comment along the lines of what Tim wrote above. Ian.
Julien Grall
2013-Jul-29 15:04 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
On 07/29/2013 04:00 PM, Ian Campbell wrote:> On Mon, 2013-07-29 at 15:56 +0100, Julien Grall wrote: >> On 07/29/2013 03:46 PM, Tim Deegan wrote: >>> Hi, >>> >>> At 16:19 +0100 on 24 Jul (1374682769), Julien Grall wrote: >>>> On 07/24/2013 03:55 PM, Ian Campbell wrote: >>>>> On Tue, 2013-07-23 at 23:43 +0100, Julien Grall wrote: >>>>>> On 23 July 2013 23:16, Ian Campbell <ian.campbell@citrix.com> wrote: >>>>>>> Were does this the idea of swapping them come from though? The ARM ARM >>>>>>> seems (see e.g. section A6.3) to describe things in the order they are >>>>>>> in memory -- is doing otherwise not going to end up confusing us? >>>>>> >>>>>> In THUMB set, a 32-bit instruction consisting of two consecutive >>>>>> halfwords (see section A6.1). >>>>>> So the instruction is in "big endian", but Xen will read the word as a >>>>>> "little endian" things. >>>>> >>>>> Are you saying that the 16-bit sub-words are in the opposite order to >>>>> what I would have expected and what seems most natural from a decode >>>>> PoV? >>>> >>>>> Consider the T3 encoding of STR (imm, thumb) from A8.8.203, which is: >>>>> 1 1 1 1 1 0 0 0 1 1 0 0 Rn | Rt imm12 >>>>> >>>>> (where Rn == bits 0..3 of the first halfword, Rt is 15..12 of the second >>>>> and imm12 is the remainder of the second halfword). >>>>> >>>>> I would have expected that the halfword with the "11111" pattern (which >>>>> identifies this as a 32-bit thumb instruction) would have to come first, >>>>> so the decode knows to look for the second. IOW the halfword with 11111 >>>>> should be at PC and the Rt/imm12 halfword should be at PC+2. >>> >>> Yes. This seems to be pretty much explicit in section A6.1 -- you can >>> read 16 bits and decide from those whether you need to read another 16. >>> >>> "If bits [15:11] of the halfword being decoded take any of the following >>> values, the halfword is the first halfword of a 32-bit instruction" >>> >>>>> So if we >>>>> copy 4 bytes from guest PC we should end up with things in the order >>>>> given above (and in the manual) and swapping shouldn''t be needed. >>> >>> Sadly, no. Instruction memory is always little-endian, but: >>> >>> "The Thumb instruction stream is a sequence of halfword-aligned >>> halfwords. Each Thumb instruction is either a single 16-bit halfword >>> in that stream, or a 32-bit instruction consisting of two consecutive >>> halfwords in that stream." >>> >>> So a 32-bit thumb instruction with bytes ABCD (where byte A is the one >>> with the magic 5-bit pattern) will be stored in memory as a mixed-endian >>> monstrosity: >>> >>> PC: B >>> PC+1: A >>> PC+2: D >>> PC+3: C >>> >>> A little-endian halfword read of PC gives you AB; another halfword read >>> at PC+2 gives CD, and a 32-bit little-endian read gives CDAB. >>> >>> We don''t necessarily have to do the bit-shuffling explicitly: we could >>> do thumb32 decode with the shuffle implicit in the layout used for >>> decoding. >> >> I think we need to keep the bit-shuffling explicitly. It''s less >> confusing than having "non-coherent" shift during the decoding. > > Whatever we do it needs a big comment along the lines of what Tim wrote > above.I will add the comment in the third version of this patch series. -- Julien
Ian Campbell
2013-Jul-29 15:15 UTC
Re: [PATCH 3/3] xen/arm: errata 766422: decode thumb store during data abort
On Mon, 2013-07-29 at 16:04 +0100, Julien Grall wrote:> On 07/29/2013 04:00 PM, Ian Campbell wrote: > > On Mon, 2013-07-29 at 15:56 +0100, Julien Grall wrote: > >> On 07/29/2013 03:46 PM, Tim Deegan wrote: > >>> Hi, > >>> > >>> At 16:19 +0100 on 24 Jul (1374682769), Julien Grall wrote: > >>>> On 07/24/2013 03:55 PM, Ian Campbell wrote: > >>>>> On Tue, 2013-07-23 at 23:43 +0100, Julien Grall wrote: > >>>>>> On 23 July 2013 23:16, Ian Campbell <ian.campbell@citrix.com> wrote: > >>>>>>> Were does this the idea of swapping them come from though? The ARM ARM > >>>>>>> seems (see e.g. section A6.3) to describe things in the order they are > >>>>>>> in memory -- is doing otherwise not going to end up confusing us? > >>>>>> > >>>>>> In THUMB set, a 32-bit instruction consisting of two consecutive > >>>>>> halfwords (see section A6.1). > >>>>>> So the instruction is in "big endian", but Xen will read the word as a > >>>>>> "little endian" things. > >>>>> > >>>>> Are you saying that the 16-bit sub-words are in the opposite order to > >>>>> what I would have expected and what seems most natural from a decode > >>>>> PoV? > >>>> > >>>>> Consider the T3 encoding of STR (imm, thumb) from A8.8.203, which is: > >>>>> 1 1 1 1 1 0 0 0 1 1 0 0 Rn | Rt imm12 > >>>>> > >>>>> (where Rn == bits 0..3 of the first halfword, Rt is 15..12 of the second > >>>>> and imm12 is the remainder of the second halfword). > >>>>> > >>>>> I would have expected that the halfword with the "11111" pattern (which > >>>>> identifies this as a 32-bit thumb instruction) would have to come first, > >>>>> so the decode knows to look for the second. IOW the halfword with 11111 > >>>>> should be at PC and the Rt/imm12 halfword should be at PC+2. > >>> > >>> Yes. This seems to be pretty much explicit in section A6.1 -- you can > >>> read 16 bits and decide from those whether you need to read another 16. > >>> > >>> "If bits [15:11] of the halfword being decoded take any of the following > >>> values, the halfword is the first halfword of a 32-bit instruction" > >>> > >>>>> So if we > >>>>> copy 4 bytes from guest PC we should end up with things in the order > >>>>> given above (and in the manual) and swapping shouldn''t be needed. > >>> > >>> Sadly, no. Instruction memory is always little-endian, but: > >>> > >>> "The Thumb instruction stream is a sequence of halfword-aligned > >>> halfwords. Each Thumb instruction is either a single 16-bit halfword > >>> in that stream, or a 32-bit instruction consisting of two consecutive > >>> halfwords in that stream." > >>> > >>> So a 32-bit thumb instruction with bytes ABCD (where byte A is the one > >>> with the magic 5-bit pattern) will be stored in memory as a mixed-endian > >>> monstrosity: > >>> > >>> PC: B > >>> PC+1: A > >>> PC+2: D > >>> PC+3: C > >>> > >>> A little-endian halfword read of PC gives you AB; another halfword read > >>> at PC+2 gives CD, and a 32-bit little-endian read gives CDAB. > >>> > >>> We don''t necessarily have to do the bit-shuffling explicitly: we could > >>> do thumb32 decode with the shuffle implicit in the layout used for > >>> decoding. > >> > >> I think we need to keep the bit-shuffling explicitly. It''s less > >> confusing than having "non-coherent" shift during the decoding. > > > > Whatever we do it needs a big comment along the lines of what Tim wrote > > above. > > I will add the comment in the third version of this patch series.FYI I am in the process of applying the first 2 patches of this series, so you might want to wait an hour or so. Ian.