diff mbox

[10/10] ARM: virt: Fix hypervisor stub installation work without RAM

Message ID alpine.LFD.2.02.1202151611010.24536@xanadu.home
State New
Headers show

Commit Message

Nicolas Pitre Feb. 15, 2012, 9:47 p.m. UTC
On Wed, 15 Feb 2012, Dave Martin wrote:

> The zImage loader doesn't have any usable RAM until the kernel is
> relocated.  To support this, the hypervisor stub is abstracted to
> allow the boot CPU mode to be stored into an SPSR when building the
> zImage loader.
> 
> When building the kernel proper, we store the value into .data
> (moved from .bss, because zeroing out of the .bss at boot time will
> likely clobber it).
> 
> A helper function __get_boot_cpu_mode() is provided to read back
> the stored value.  This is mostly to help keep the zImage loader
> code clean, since in the kernel proper, the __boot_cpu_mode
> variable can straightforwardly be relied upon instead.  However,
> __get_boot_cpu_mode() will still work.
> 
> The safe_svcmode_maskall macro is modified to transfer the old
> mode's SPSR to the new mode.  For the main kernel entry point this
> will result in a couple of redundant instructions, since the SPSR
> is not significant -- however, this allows us to continue to use
> the same macro in the kernel and in the zImage loader.

Bleh....  I don't like this.

This is becoming too convoluted, and the abstractions stretched too much 
IMHO.  And tying the abstractions with the zImage wrapper restrictions 
is too cumbersome to remain manageable in the future.

For example, trying to reuse the same stub code as for the kernel proper 
is problematic because the zImage code has to be relocatable.  Using a 
.data variable because .bss is zeroed late is good, but then that 
variable has to be referenced with a PLT entry.  That makes the assembly 
code pretty hard to share as is.

I'd much prefer something straight forward and obvious such as:


Completely untested, but you get the idea.  This is self contained, 
independent from the main kernel code, and smaller as well.

Yet, it occurs to me that this is all broken anyway.  Exactly because 
the zImage code does get relocated.  That means that the HVBAR will be 
wrong as soon as zImage copies itself out of the way of the decompressed 
kernel image and attempting a hvc will branch to a random point in the 
final kernel image.  Trying to update the HVBAR would require yet more 
code making the whole thing even more fragile.

So I'm starting to think that simply adding LPAE support to the zImage 
wrapper code and leaving the CPU in HYP mode when decompressing is 
likely to be much simpler after all.  The LPAE additions in 
kernel/head.S weren't very intrusive.


Nicolas

Comments

Dave Martin Feb. 16, 2012, 11:46 a.m. UTC | #1
On Wed, Feb 15, 2012 at 9:47 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Wed, 15 Feb 2012, Dave Martin wrote:
>
>> The zImage loader doesn't have any usable RAM until the kernel is
>> relocated.  To support this, the hypervisor stub is abstracted to
>> allow the boot CPU mode to be stored into an SPSR when building the
>> zImage loader.
>>
>> When building the kernel proper, we store the value into .data
>> (moved from .bss, because zeroing out of the .bss at boot time will
>> likely clobber it).
>>
>> A helper function __get_boot_cpu_mode() is provided to read back
>> the stored value.  This is mostly to help keep the zImage loader
>> code clean, since in the kernel proper, the __boot_cpu_mode
>> variable can straightforwardly be relied upon instead.  However,
>> __get_boot_cpu_mode() will still work.
>>
>> The safe_svcmode_maskall macro is modified to transfer the old
>> mode's SPSR to the new mode.  For the main kernel entry point this
>> will result in a couple of redundant instructions, since the SPSR
>> is not significant -- however, this allows us to continue to use
>> the same macro in the kernel and in the zImage loader.
>
> Bleh....  I don't like this.
>
> This is becoming too convoluted, and the abstractions stretched too much
> IMHO.  And tying the abstractions with the zImage wrapper restrictions
> is too cumbersome to remain manageable in the future.
>
> For example, trying to reuse the same stub code as for the kernel proper
> is problematic because the zImage code has to be relocatable.  Using a
> .data variable because .bss is zeroed late is good, but then that
> variable has to be referenced with a PLT entry.  That makes the assembly
> code pretty hard to share as is.

The assembler is already completely position-independent, so there is
no GOT entry -- this is already needed even for the main kernel boot,
since the MMU is not on when this code runs.  For zImage .data/.bss
isn't used at all, because we use the SPSR instead to store the old
mode.

But I agree that things are getting a bit strained here to the point
where having a common abstraction is not really worth it.

One reason for keeping that stuff in a separate file was to avoid
building the whole file with armv7-a+virt, since this will make it too
easy to build something that won't work on earlier CPUs without this
triggering a build error.

Now that we have macros to replace hvc etc., we could just use those
instead and inline the code, instead of changing the assembler flags.
This feels better, though for kernel/head.S there is more added code,
and I'm still not sure if it should be inlined -- what's your view?

> I'd much prefer something straight forward and obvious such as:
>
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index c5d60250d4..469326580d 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -160,6 +160,28 @@ not_angel:
>                 * be needed here - is there an Angel SWI call for this?
>                 */
>
> +#ifdef CONFIG_ARM_VIRT
> +               /*
> +                * If we were called in HYP mode, drop to SVC mode
> +                * temporarily to perform kernel decompression with
> +                * the non-LPAE page table code.
> +                */
> +               mrs     r0, cpsr
> +               and     r1, r0, #MODE_MASK
> +               teq     r1, #HYP_MODE

(Why not cmp?  I may have asked this before... but I'm curious.
Anyhow, it works either way.)

> +               bne     1f
> +               adr     r1, __hyp_vectors
> +               mcr     p15, 4, r1, c12, c0, 0  @ set HVBAR
> +               /*
> +                * Use IRQ mode (same PL1 as SVC mode) to remember
> +                * it was HYP mode initially.
> +                */
> +               eor     r0, r0, #(HYP_MODE ^ IRQ_MODE)
> +               msr     cpsr_c, r0
> +               isb
> +1:
> +#endif
> +
>                /*
>                 * some architecture specific code can be inserted
>                 * by the linker here, but it should preserve r7, r8, and r9.
> @@ -458,7 +480,29 @@ not_relocated:     mov     r0, #0
>                bl      decompress_kernel
>                bl      cache_clean_flush
>                bl      cache_off
> -               mov     r0, #0                  @ must be zero
> +
> +#ifdef CONFIG_ARM_VIRT
> +               /*
> +                * If necessary, return to HYP mode before calling the kernel
> +                * indicated by IRQ mode.
> +                */
> +               mrs     r0, cpsr
> +               and     r0, r0, #MODE_MASK
> +               teq     r0, #IRQ_MODE
> +               bne     1f
> +               hvc     #0
> +
> +               .align  5
> +__hyp_vectors:
> +               W(b)    .                       @ reset
> +               W(b)    .                       @ undef
> +               W(b)    .                       @ svc
> +               W(b)    .                       @ pabort
> +               W(b)    .                       @ dabort
> +               @ hyp trap: fall through
> +#endif
> +
> +1:             mov     r0, #0                  @ must be zero
>                mov     r1, r7                  @ restore architecture number
>                mov     r2, r8                  @ restore atags pointer
>  ARM(          mov     pc, r4  )               @ call kernel
>
> Completely untested, but you get the idea.  This is self contained,
> independent from the main kernel code, and smaller as well.
>
> Yet, it occurs to me that this is all broken anyway.  Exactly because
> the zImage code does get relocated.  That means that the HVBAR will be

Hmmm, yes, you're totally right here.

> wrong as soon as zImage copies itself out of the way of the decompressed
> kernel image and attempting a hvc will branch to a random point in the
> final kernel image.  Trying to update the HVBAR would require yet more
> code making the whole thing even more fragile.
>
> So I'm starting to think that simply adding LPAE support to the zImage
> wrapper code and leaving the CPU in HYP mode when decompressing is
> likely to be much simpler after all.  The LPAE additions in
> kernel/head.S weren't very intrusive.

I was thinking about that yesterday... and I was starting to consider
this option too.  We're just creating a flat mapping, so it shouldn't
be as complex as all that.

I can maybe have a go at that.

Cheers
---Dave
Christoffer Dall Feb. 16, 2012, 6:39 p.m. UTC | #2
On Thu, Feb 16, 2012 at 6:46 AM, Dave Martin <dave.martin@linaro.org> wrote:
> On Wed, Feb 15, 2012 at 9:47 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> On Wed, 15 Feb 2012, Dave Martin wrote:
>>
>>> The zImage loader doesn't have any usable RAM until the kernel is
>>> relocated.  To support this, the hypervisor stub is abstracted to
>>> allow the boot CPU mode to be stored into an SPSR when building the
>>> zImage loader.
>>>
>>> When building the kernel proper, we store the value into .data
>>> (moved from .bss, because zeroing out of the .bss at boot time will
>>> likely clobber it).
>>>
>>> A helper function __get_boot_cpu_mode() is provided to read back
>>> the stored value.  This is mostly to help keep the zImage loader
>>> code clean, since in the kernel proper, the __boot_cpu_mode
>>> variable can straightforwardly be relied upon instead.  However,
>>> __get_boot_cpu_mode() will still work.
>>>
>>> The safe_svcmode_maskall macro is modified to transfer the old
>>> mode's SPSR to the new mode.  For the main kernel entry point this
>>> will result in a couple of redundant instructions, since the SPSR
>>> is not significant -- however, this allows us to continue to use
>>> the same macro in the kernel and in the zImage loader.
>>
>> Bleh....  I don't like this.
>>
>> This is becoming too convoluted, and the abstractions stretched too much
>> IMHO.  And tying the abstractions with the zImage wrapper restrictions
>> is too cumbersome to remain manageable in the future.
>>
>> For example, trying to reuse the same stub code as for the kernel proper
>> is problematic because the zImage code has to be relocatable.  Using a
>> .data variable because .bss is zeroed late is good, but then that
>> variable has to be referenced with a PLT entry.  That makes the assembly
>> code pretty hard to share as is.
>
> The assembler is already completely position-independent, so there is
> no GOT entry -- this is already needed even for the main kernel boot,
> since the MMU is not on when this code runs.  For zImage .data/.bss
> isn't used at all, because we use the SPSR instead to store the old
> mode.
>
> But I agree that things are getting a bit strained here to the point
> where having a common abstraction is not really worth it.
>
> One reason for keeping that stuff in a separate file was to avoid
> building the whole file with armv7-a+virt, since this will make it too
> easy to build something that won't work on earlier CPUs without this
> triggering a build error.
>
> Now that we have macros to replace hvc etc., we could just use those
> instead and inline the code, instead of changing the assembler flags.
> This feels better, though for kernel/head.S there is more added code,
> and I'm still not sure if it should be inlined -- what's your view?
>
>> I'd much prefer something straight forward and obvious such as:
>>
>> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
>> index c5d60250d4..469326580d 100644
>> --- a/arch/arm/boot/compressed/head.S
>> +++ b/arch/arm/boot/compressed/head.S
>> @@ -160,6 +160,28 @@ not_angel:
>>                 * be needed here - is there an Angel SWI call for this?
>>                 */
>>
>> +#ifdef CONFIG_ARM_VIRT
>> +               /*
>> +                * If we were called in HYP mode, drop to SVC mode
>> +                * temporarily to perform kernel decompression with
>> +                * the non-LPAE page table code.
>> +                */
>> +               mrs     r0, cpsr
>> +               and     r1, r0, #MODE_MASK
>> +               teq     r1, #HYP_MODE
>
> (Why not cmp?  I may have asked this before... but I'm curious.
> Anyhow, it works either way.)
>
>> +               bne     1f
>> +               adr     r1, __hyp_vectors
>> +               mcr     p15, 4, r1, c12, c0, 0  @ set HVBAR
>> +               /*
>> +                * Use IRQ mode (same PL1 as SVC mode) to remember
>> +                * it was HYP mode initially.
>> +                */
>> +               eor     r0, r0, #(HYP_MODE ^ IRQ_MODE)
>> +               msr     cpsr_c, r0
>> +               isb
>> +1:
>> +#endif
>> +
>>                /*
>>                 * some architecture specific code can be inserted
>>                 * by the linker here, but it should preserve r7, r8, and r9.
>> @@ -458,7 +480,29 @@ not_relocated:     mov     r0, #0
>>                bl      decompress_kernel
>>                bl      cache_clean_flush
>>                bl      cache_off
>> -               mov     r0, #0                  @ must be zero
>> +
>> +#ifdef CONFIG_ARM_VIRT
>> +               /*
>> +                * If necessary, return to HYP mode before calling the kernel
>> +                * indicated by IRQ mode.
>> +                */
>> +               mrs     r0, cpsr
>> +               and     r0, r0, #MODE_MASK
>> +               teq     r0, #IRQ_MODE
>> +               bne     1f
>> +               hvc     #0
>> +
>> +               .align  5
>> +__hyp_vectors:
>> +               W(b)    .                       @ reset
>> +               W(b)    .                       @ undef
>> +               W(b)    .                       @ svc
>> +               W(b)    .                       @ pabort
>> +               W(b)    .                       @ dabort
>> +               @ hyp trap: fall through
>> +#endif
>> +
>> +1:             mov     r0, #0                  @ must be zero
>>                mov     r1, r7                  @ restore architecture number
>>                mov     r2, r8                  @ restore atags pointer
>>  ARM(          mov     pc, r4  )               @ call kernel
>>
>> Completely untested, but you get the idea.  This is self contained,
>> independent from the main kernel code, and smaller as well.
>>
>> Yet, it occurs to me that this is all broken anyway.  Exactly because
>> the zImage code does get relocated.  That means that the HVBAR will be
>
> Hmmm, yes, you're totally right here.
>
>> wrong as soon as zImage copies itself out of the way of the decompressed
>> kernel image and attempting a hvc will branch to a random point in the
>> final kernel image.  Trying to update the HVBAR would require yet more
>> code making the whole thing even more fragile.
>>
>> So I'm starting to think that simply adding LPAE support to the zImage
>> wrapper code and leaving the CPU in HYP mode when decompressing is
>> likely to be much simpler after all.  The LPAE additions in
>> kernel/head.S weren't very intrusive.
>
> I was thinking about that yesterday... and I was starting to consider
> this option too.  We're just creating a flat mapping, so it shouldn't
> be as complex as all that.
>
sounds to me like you would only have to or that user bit for the flat
mapping, so definitely worth looking into...
Nicolas Pitre Feb. 16, 2012, 7:14 p.m. UTC | #3
On Thu, 16 Feb 2012, Dave Martin wrote:

> On Wed, Feb 15, 2012 at 9:47 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > For example, trying to reuse the same stub code as for the kernel proper
> > is problematic because the zImage code has to be relocatable.  Using a
> > .data variable because .bss is zeroed late is good, but then that
> > variable has to be referenced with a PLT entry.  That makes the assembly
> > code pretty hard to share as is.
> 
> The assembler is already completely position-independent, so there is
> no GOT entry -- this is already needed even for the main kernel boot,
> since the MMU is not on when this code runs.

The main kernel boot always has a constant offset between code and data. 
The zImage wrapper is trickier.  You do need a GOT entry (or the 
equivalent) since the code and data might be relocated separately. And 
because we didn't want to deal with .data copying, all .data content was 
simply forbidden.  And then we had to prevent static variables as well as 
they would use a GOT relative fixup which didn't work either with the 
simplistic relocation code in zImage.

> For zImage .data/.bss isn't used at all, because we use the SPSR 
> instead to store the old mode.
> 
> But I agree that things are getting a bit strained here to the point
> where having a common abstraction is not really worth it.

Right.

> One reason for keeping that stuff in a separate file was to avoid
> building the whole file with armv7-a+virt, since this will make it too
> easy to build something that won't work on earlier CPUs without this
> triggering a build error.


Hmmm.  Fine with me for the main kernel code, but not necessarily for 
the zImage wrapper.

> Now that we have macros to replace hvc etc., we could just use those
> instead and inline the code, instead of changing the assembler flags.

Indeed.

> This feels better, though for kernel/head.S there is more added code,
> and I'm still not sure if it should be inlined -- what's your view?

Your current set of patches seemed well balanced in that regard.

> > +               teq     r1, #HYP_MODE
> 
> (Why not cmp?  I may have asked this before... but I'm curious.
> Anyhow, it works either way.)

Old habit.  The teq insn doesn't clobber the V status flags unlike cmp, 
hence that's what I typically use when looking for pure equality.

> > Yet, it occurs to me that this is all broken anyway.  Exactly because
> > the zImage code does get relocated.  That means that the HVBAR will be
> 
> Hmmm, yes, you're totally right here.
> 
> > wrong as soon as zImage copies itself out of the way of the decompressed
> > kernel image and attempting a hvc will branch to a random point in the
> > final kernel image.  Trying to update the HVBAR would require yet more
> > code making the whole thing even more fragile.
> >
> > So I'm starting to think that simply adding LPAE support to the zImage
> > wrapper code and leaving the CPU in HYP mode when decompressing is
> > likely to be much simpler after all.  The LPAE additions in
> > kernel/head.S weren't very intrusive.
> 
> I was thinking about that yesterday... and I was starting to consider
> this option too.  We're just creating a flat mapping, so it shouldn't
> be as complex as all that.
> 
> I can maybe have a go at that.

Good!


Nicolas
diff mbox

Patch

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index c5d60250d4..469326580d 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -160,6 +160,28 @@  not_angel:
 		 * be needed here - is there an Angel SWI call for this?
 		 */
 
+#ifdef CONFIG_ARM_VIRT
+		/*
+		 * If we were called in HYP mode, drop to SVC mode
+		 * temporarily to perform kernel decompression with
+		 * the non-LPAE page table code.
+		 */
+		mrs	r0, cpsr
+		and	r1, r0, #MODE_MASK
+		teq	r1, #HYP_MODE
+		bne	1f
+		adr	r1, __hyp_vectors
+		mcr	p15, 4, r1, c12, c0, 0	@ set HVBAR
+		/*
+		 * Use IRQ mode (same PL1 as SVC mode) to remember
+		 * it was HYP mode initially.
+		 */
+		eor	r0, r0, #(HYP_MODE ^ IRQ_MODE)
+		msr	cpsr_c, r0
+		isb
+1:
+#endif
+
 		/*
 		 * some architecture specific code can be inserted
 		 * by the linker here, but it should preserve r7, r8, and r9.
@@ -458,7 +480,29 @@  not_relocated:	mov	r0, #0
 		bl	decompress_kernel
 		bl	cache_clean_flush
 		bl	cache_off
-		mov	r0, #0			@ must be zero
+
+#ifdef CONFIG_ARM_VIRT
+		/*
+		 * If necessary, return to HYP mode before calling the kernel
+		 * indicated by IRQ mode.
+		 */
+		mrs	r0, cpsr
+		and	r0, r0, #MODE_MASK
+		teq	r0, #IRQ_MODE
+		bne	1f
+		hvc	#0
+
+		.align	5
+__hyp_vectors:
+		W(b)	.			@ reset
+		W(b)	.			@ undef
+		W(b)	.			@ svc
+		W(b)	.			@ pabort
+		W(b)	.			@ dabort
+		@ hyp trap: fall through
+#endif
+
+1:		mov	r0, #0			@ must be zero
 		mov	r1, r7			@ restore architecture number
 		mov	r2, r8			@ restore atags pointer
  ARM(		mov	pc, r4	)		@ call kernel