[v7,05/32] target-arm: make arm_current_el() return EL3

Message ID 1413910544-20150-6-git-send-email-greg.bellows@linaro.org
State New
Commit Message

Greg Bellows Oct. 21, 2014, 4:55 p.m. UTC
From: Fabian Aggeler <aggelerf@ethz.ch>

Make arm_current_el() return EL3 for secure PL1 and monitor mode.
Increase MMU modes since mmu_index is directly inferred from
arm_current_el(). Change assertion in arm_el_is_aa64() to allow EL3.

Signed-off-by: Fabian Aggeler <aggelerf@ethz.ch>
Signed-off-by: Greg Bellows <greg.bellows@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

Comments

Peter Maydell Jan. 16, 2015, 6:36 p.m. UTC | #1
On 21 October 2014 at 17:55, Greg Bellows <greg.bellows@linaro.org> wrote:
> From: Fabian Aggeler <aggelerf@ethz.ch>
>
> Make arm_current_el() return EL3 for secure PL1 and monitor mode.
> Increase MMU modes since mmu_index is directly inferred from
> arm_current_el(). Change assertion in arm_el_is_aa64() to allow EL3.

> -#define NB_MMU_MODES 2
> +#define NB_MMU_MODES 4

So this turns out not to quite be what we want.
A QEMU MMU mode index basically defines a (vaddr -> paddr,permissions)
mapping. This is similar to the ARM ARM concept of a "translation
regime", with the differences that:
 * the ARM ARM translation regimes may have split permissions,
   for user and privileged code, so we need two mmu_idx values
   for a translation regime that applies to both EL0 and EL1
 * stage 1 and stage 2 translations for a VA->IPA->PA lookup
   for an EL1/EL0 hypervisor guest are two different translation
   regimes, but for QEMU we can just cache the whole VA->PA
   and use a single mmu_idx. [We only need to separately do
   VA->IPA and IPA->PA for the "do this address translation"
   system instructions, which don't need to touch the TLB;
   a combined stage1+stage2 TLB is permitted by the architecture.]

The translation regimes are:

If EL3 is 64-bit:
 * Secure EL3
 * Secure EL1 & EL0
 * NonSecure EL2
 * NonSecure EL1 & 0 stage 1
 * NonSecure EL1 & 0 stage 2
If EL3 is 32-bit:
 * Secure PL0 & PL1
 * NonSecure PL2
 * NonSecure PL1 & 0 stage 1
 * NonSecure PL1 & 0 stage 2
(reminder: for 32 bit EL3, Secure PL1 is *EL3*, not EL1.)

which we can give the following mmu indexes:

64 bit EL3:
 0 : NS EL0 stage 1+2
 1 : NS EL1 stage 1+2
 2 : NS EL2
 3 : S EL3
 4 : S EL0
 5 : S EL1

32 bit EL3:
 0 : NS EL0 (aka NS PL0) stage 1+2
 1 : NS EL1 (aka NS PL1) stage 1+2
 2 : NS EL2 (aka NS PL2)
 3 : S EL3 (aka S PL1)
 4 : S EL0 (aka S PL0)

Notice how they end up being the same, except that with a
64 bit EL3 we need an extra mmu index that 32 bit doesn't have.
They aren't simply "what is our current EL?", though as you
can see I've put them in an order that comes close.

So the right answer for NB_MMU_MODES is 6 :-)

-- PMM
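
The index table above can be read as a function of the current EL and the
security state. A minimal sketch in C, assuming the EL is obtained the way
this patch's arm_current_el() computes it (so Secure PL1 with a 32-bit EL3
already arrives as EL3). The enum and function names are hypothetical and
are not taken from the patch or from QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the values follow the index table in the comment above. */
typedef enum {
    MMU_IDX_NS_EL0 = 0, /* NS EL0 (NS PL0), stage 1+2 */
    MMU_IDX_NS_EL1 = 1, /* NS EL1 (NS PL1), stage 1+2 */
    MMU_IDX_NS_EL2 = 2, /* NS EL2 (NS PL2) */
    MMU_IDX_S_EL3  = 3, /* S EL3 (S PL1 when EL3 is 32-bit) */
    MMU_IDX_S_EL0  = 4, /* S EL0 (S PL0) */
    MMU_IDX_S_EL1  = 5, /* only used when EL3 is 64-bit */
    MMU_IDX_COUNT  = 6
} SketchMMUIdx;

/* el:     current exception level, 0..3
 * secure: true if the CPU is in the Secure state
 *
 * With a 32-bit EL3 the caller never passes (el == 1, secure == true),
 * because Secure privileged modes are reported as EL3, so index 5 stays
 * unused in that configuration.
 */
static inline int sketch_mmu_index(int el, bool secure)
{
    assert(el >= 0 && el <= 3);
    switch (el) {
    case 3:
        return MMU_IDX_S_EL3;   /* EL3 is always Secure */
    case 2:
        return MMU_IDX_NS_EL2;  /* EL2 is NonSecure in this scheme */
    case 1:
        return secure ? MMU_IDX_S_EL1 : MMU_IDX_NS_EL1;
    default:
        return secure ? MMU_IDX_S_EL0 : MMU_IDX_NS_EL0;
    }
}
```

This keeps the NonSecure indexes equal to the EL number, which is why the
ordering "comes close" to the current EL without being identical to it.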
Peter Maydell Jan. 19, 2015, 1:22 p.m. UTC | #2
On 16 January 2015 at 18:36, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 21 October 2014 at 17:55, Greg Bellows <greg.bellows@linaro.org> wrote:
>> -#define NB_MMU_MODES 2
>> +#define NB_MMU_MODES 4
>
> So this turns out not to quite be what we want.
> A QEMU MMU mode index basically defines a (vaddr -> paddr,permissions)
> mapping. This is similar to the ARM ARM concept of a "translation
> regime", with the differences that:
>  * the ARM ARM translation regimes may have split permissions,
>    for user and privileged code, so we need two mmu_idx values
>    for a translation regime that applies to both EL0 and EL1
>  * stage 1 and stage 2 translations for a VA->IPA->PA lookup
>    for an EL1/EL0 hypervisor guest are two different translation
>    regimes, but for QEMU we can just cache the whole VA->PA
>    and use a single mmu_idx. [We only need to separately do
>    VA->IPA and IPA->PA for the "do this address translation"
>    system instructions, which don't need to touch the TLB;
>    a combined stage1+stage2 TLB is permitted by the architecture.]
>
> The translation regimes are:
>
> If EL3 is 64-bit:
>  * Secure EL3
>  * Secure EL1 & EL0
>  * NonSecure EL2
>  * NonSecure EL1 & 0 stage 1
>  * NonSecure EL1 & 0 stage 2
> If EL3 is 32-bit:
>  * Secure PL0 & PL1
>  * NonSecure PL2
>  * NonSecure PL1 & 0 stage 1
>  * NonSecure PL1 & 0 stage 2
> (reminder: for 32 bit EL3, Secure PL1 is *EL3*, not EL1.)
>
> which we can give the following mmu indexes:
>
> 64 bit EL3:
>  0 : NS EL0 stage 1+2
>  1 : NS EL1 stage 1+2
>  2 : NS EL2
>  3 : S EL3
>  4 : S EL0
>  5 : S EL1
>
> 32 bit EL3:
>  0 : NS EL0 (aka NS PL0) stage 1+2
>  1 : NS EL1 (aka NS PL1) stage 1+2
>  2 : NS EL2 (aka NS PL2)
>  3 : S EL3 (aka S PL1)
>  4 : S EL0 (aka S PL0)
>
> Notice how they end up being the same, except that with a
> 64 bit EL3 we need an extra mmu index that 32 bit doesn't have.
> They aren't simply "what is our current EL?", though as you
> can see I've put them in an order that comes close.
>
> So the right answer for NB_MMU_MODES is 6 :-)

...except we would also kind of like to be able to cache
NS stage 2 lookups, because otherwise every access we make
to a stage 1 page table word (accessed by IPA) is going to
require a full stage 2 page table walk. That would mean
7 MMU modes.

Richard: do you have a feel for how expensive it is to
have lots and lots of mmu modes? I might be able to
merge "S EL1" with "NS EL1 stage 1+2" and ditto "S EL0"
with "NS EL0 stage 1+2" but we'd need to do more TLB
flushing and it's not clear to me currently exactly
where the extra flushes would have to go...

-- PMM
Peter Maydell Jan. 19, 2015, 7 p.m. UTC | #3
On 19 January 2015 at 17:44, Richard Henderson <rth@twiddle.net> wrote:
> On 01/19/2015 05:22 AM, Peter Maydell wrote:
>> Richard: do you have a feel for how expensive it is to
>> have lots and lots of mmu modes? I might be able to
>> merge "S EL1" with "NS EL1 stage 1+2" and ditto "S EL0"
>> with "NS EL0 stage1 + 2" but we'd need to do more TLB
>> flushing and it's not clear to me currently exactly
>> where the extra flushes would have to go...
>
> It's 10k per mmu mode, more or less.  That's what you've
> got to memset (to -1) whenever a flush occurs.

Hmm. If the tlb flush memset is the main perf issue, we
could let the target tell the generic code how many MMU
modes it was using at runtime. We might need 7 modes in
the general case, but we could avoid burdening "no TZ"
or "no virtualization" CPUs with the overhead of clearing
TLB entries that we never actually use.
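
A rough sketch of the runtime-count idea, assuming a per-CPU TLB laid out as
one array of entries per mmu mode. The struct layout, the sizes and every
name here are illustrative assumptions, not QEMU's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RT_NB_MMU_MODES 7   /* compile-time maximum (assumed) */
#define RT_TLB_SIZE     256 /* entries per mode (illustrative) */

typedef struct {
    uint64_t addr_read;
    uint64_t addr_write;
    uint64_t addr_code;
    uint64_t addend;
} RtTLBEntry;

typedef struct {
    RtTLBEntry tlb[RT_NB_MMU_MODES][RT_TLB_SIZE];
    int nb_modes_in_use; /* hypothetical: set once by the target at CPU init */
} RtTLB;

/* Full flush that only clears the modes this CPU model can ever use.
 * Entries are invalidated by filling them with 0xff bytes (i.e. -1).
 */
static void rt_tlb_flush(RtTLB *t)
{
    assert(t->nb_modes_in_use >= 0 && t->nb_modes_in_use <= RT_NB_MMU_MODES);
    memset(t->tlb, 0xff, (size_t)t->nb_modes_in_use * sizeof(t->tlb[0]));
}
```

A CPU without TrustZone or virtualization would set nb_modes_in_use to 2 and
pay for clearing only the first two arrays on each flush.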

Alternatively (better!), for a lot of the tlb_flush()es triggered
by target-arm code we could be more precise about the affected
mmu_idx values, since the common case is going to be
"NS EL1 did something that needs a TLB flush", and by definition
that can't affect TLB entries for EL2, EL3 or S-EL1/EL0.

So I think my preference would be to use 7 mmu indexes,
and add a tlb_flush_mmuidx() function. (Assuming I'm
not missing anything that makes that not workable...)

-- PMM
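
One possible shape for the proposed tlb_flush_mmuidx(), sketched under
assumptions: the thread only names the function, so the bitmask parameter,
the TLB layout and all identifiers below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MASK_NB_MMU_MODES 7   /* the 7 indexes discussed above */
#define MASK_TLB_SIZE     256 /* entries per mode (illustrative) */

typedef struct {
    uint64_t addr_read;
    uint64_t addr_write;
    uint64_t addr_code;
    uint64_t addend;
} MaskTLBEntry;

typedef struct {
    MaskTLBEntry tlb[MASK_NB_MMU_MODES][MASK_TLB_SIZE];
} MaskTLB;

/* Invalidate only the mmu indexes whose bit is set in idx_mask.
 * Bit i selects mmu index i; untouched modes keep their cached entries.
 */
static void sketch_tlb_flush_mmuidx(MaskTLB *t, unsigned idx_mask)
{
    assert((idx_mask >> MASK_NB_MMU_MODES) == 0); /* no out-of-range bits */
    for (int i = 0; i < MASK_NB_MMU_MODES; i++) {
        if (idx_mask & (1u << i)) {
            memset(t->tlb[i], 0xff, sizeof(t->tlb[i]));
        }
    }
}
```

For the common case described above (NS EL1 did something that needs a TLB
flush), the caller would pass the bits for the NS EL0 and NS EL1 indexes
only, leaving the EL2, EL3 and Secure entries intact.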
Patch

==========

v6 -> v7
- Fix commit message

v5 -> v6
- Rework arm_current_el() logic to properly return EL3 for secure PL1 when EL3
  is 32-bit.
- Replace direct access of env->aarch64 with is_a64()

Signed-off-by: Greg Bellows <greg.bellows@linaro.org>
---
 target-arm/cpu.h | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 1138539..cb6ec5c 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -100,7 +100,7 @@  typedef uint32_t ARMReadCPFunc(void *opaque, int cp_info,
 
 struct arm_boot_info;
 
-#define NB_MMU_MODES 2
+#define NB_MMU_MODES 4
 
 /* We currently assume float and double are IEEE single and double
    precision respectively.
@@ -803,11 +803,12 @@  static inline bool arm_is_secure(CPUARMState *env)
 /* Return true if the specified exception level is running in AArch64 state. */
 static inline bool arm_el_is_aa64(CPUARMState *env, int el)
 {
-    /* We don't currently support EL2 or EL3, and this isn't valid for EL0
+    /* We don't currently support EL2, and this isn't valid for EL0
      * (if we're in EL0, is_a64() is what you want, and if we're not in EL0
      * then the state of EL0 isn't well defined.)
      */
-    assert(el == 1);
+    assert(el == 1 || el == 3);
+
     /* AArch64-capable CPUs always run with EL1 in AArch64 mode. This
      * is a QEMU-imposed simplification which we may wish to change later.
      * If we in future support EL2 and/or EL3, then the state of lower
@@ -996,17 +997,27 @@  static inline bool cptype_valid(int cptype)
  */
 static inline int arm_current_el(CPUARMState *env)
 {
-    if (env->aarch64) {
+    if (is_a64(env)) {
         return extract32(env->pstate, 2, 2);
     }
 
-    if ((env->uncached_cpsr & 0x1f) == ARM_CPU_MODE_USR) {
+    switch (env->uncached_cpsr & 0x1f) {
+    case ARM_CPU_MODE_USR:
         return 0;
+    case ARM_CPU_MODE_HYP:
+        return 2;
+    case ARM_CPU_MODE_MON:
+        return 3;
+    default:
+        if (arm_is_secure(env) && !arm_el_is_aa64(env, 3)) {
+            /* If EL3 is 32-bit then all secure privileged modes run in
+             * EL3
+             */
+            return 3;
+        }
+
+        return 1;
     }
-    /* We don't currently implement the Virtualization or TrustZone
-     * extensions, so EL2 and EL3 don't exist for us.
-     */
-    return 1;
 }
 
 typedef struct ARMCPRegInfo ARMCPRegInfo;