[Xen-devel,v2,4/6] xen/arm: cpufeature: Add helper to check constant caps

Message ID 20180925172043.20248-5-julien.grall@arm.com
State Superseded
Headers show
Series
  • xen/arm: SMCCC fixup and improvement
Related show

Commit Message

Julien Grall Sept. 25, 2018, 5:20 p.m.
Some capababilities are set right during boot and will never change
afterwards. At the moment, the function cpu_have_caps will check whether
the cap is enabled from the memory.

It is possible to avoid the load from the memory by using an
ALTERNATIVE. With that the check is just reduced to 1 instruction.

Signed-off-by: Julien Grall <julien.grall@arm.com>

---

This is the static key for the poor. At some point we might want to
introduce something similar to static key in Xen.

    Changes in v2:
        - Use unlikely
---
 xen/include/asm-arm/cpufeature.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Stefano Stabellini Sept. 26, 2018, 4:53 p.m. | #1
On Tue, 25 Sep 2018, Julien Grall wrote:
> Some capababilities are set right during boot and will never change
> afterwards. At the moment, the function cpu_have_caps will check whether
> the cap is enabled from the memory.
> 
> It is possible to avoid the load from the memory by using an
> ALTERNATIVE. With that the check is just reduced to 1 instruction.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>

I enjoyed reading the patch :-)  But I don't think it is worth going
into this extreme level of optimization. test_bit is efficient enough,
right? What do you think we need to use alternatives just to check one
bit?


> ---
> 
> This is the static key for the poor. At some point we might want to
> introduce something similar to static key in Xen.
> 
>     Changes in v2:
>         - Use unlikely
> ---
>  xen/include/asm-arm/cpufeature.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
> index 3de6b54301..c6cbc2ec84 100644
> --- a/xen/include/asm-arm/cpufeature.h
> +++ b/xen/include/asm-arm/cpufeature.h
> @@ -63,6 +63,18 @@ static inline bool cpus_have_cap(unsigned int num)
>      return test_bit(num, cpu_hwcaps);
>  }
>  
> +/* System capability check for constant cap */
> +#define cpus_have_const_cap(num) ({                 \
> +        bool __ret;                                 \
> +                                                    \
> +        asm volatile (ALTERNATIVE("mov %0, #0",     \
> +                                  "mov %0, #1",     \
> +                                  num)              \
> +                      : "=r" (__ret));              \
> +                                                    \
> +        unlikely(__ret);                            \
> +        })
> +
>  static inline void cpus_set_cap(unsigned int num)
>  {
>      if (num >= ARM_NCAPS)
> -- 
> 2.11.0
>
Julien Grall Sept. 26, 2018, 7 p.m. | #2
Hi Stefano,

On 09/26/2018 05:53 PM, Stefano Stabellini wrote:
> On Tue, 25 Sep 2018, Julien Grall wrote:
>> Some capababilities are set right during boot and will never change
>> afterwards. At the moment, the function cpu_have_caps will check whether
>> the cap is enabled from the memory.
>>
>> It is possible to avoid the load from the memory by using an
>> ALTERNATIVE. With that the check is just reduced to 1 instruction.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> I enjoyed reading the patch :-)  But I don't think it is worth going
> into this extreme level of optimization. test_bit is efficient enough,
> right? What do you think we need to use alternatives just to check one
> bit?

We already have an helper using test_bit (see cpus_have_cap). However 
test_bit requires to load the word from the memory. This is an overhead 
when the decision never change after boot.

One load may be insignificant by itself (thought may have a cache miss), 
but if you are in hotpath this is a win in long term.

The mechanism is very similar to static key but for the poor (I don't 
have much time to implement better for now). We already use similar 
construction on other places (see CHECK_WORKAROUND_HELPER).

Cheers,
Stefano Stabellini Sept. 27, 2018, 10:34 p.m. | #3
On Wed, 26 Sep 2018, Julien Grall wrote:
> Hi Stefano,
> 
> On 09/26/2018 05:53 PM, Stefano Stabellini wrote:
> > On Tue, 25 Sep 2018, Julien Grall wrote:
> > > Some capababilities are set right during boot and will never change
> > > afterwards. At the moment, the function cpu_have_caps will check whether
> > > the cap is enabled from the memory.
> > > 
> > > It is possible to avoid the load from the memory by using an
> > > ALTERNATIVE. With that the check is just reduced to 1 instruction.
> > > 
> > > Signed-off-by: Julien Grall <julien.grall@arm.com>
> > 
> > I enjoyed reading the patch :-)  But I don't think it is worth going
> > into this extreme level of optimization. test_bit is efficient enough,
> > right? What do you think we need to use alternatives just to check one
> > bit?
> 
> We already have an helper using test_bit (see cpus_have_cap). However test_bit
> requires to load the word from the memory. This is an overhead when the
> decision never change after boot.
> 
> One load may be insignificant by itself (thought may have a cache miss), but
> if you are in hotpath this is a win in long term.
> 
> The mechanism is very similar to static key but for the poor (I don't have
> much time to implement better for now). We already use similar construction on
> other places (see CHECK_WORKAROUND_HELPER).

I like the mechanism and the little ALTERNATIVE trick.  I can see it
could very useful in some cases. It is just that it is a bit of a waste
to use it on SMCCC virtualization -- SMCCC calls cannot be done often
enough for this to make a difference.

But who am I to complain when you make things faster? :-)
Julien Grall Sept. 28, 2018, 9:27 a.m. | #4
Hi,

On 09/27/2018 11:34 PM, Stefano Stabellini wrote:
> On Wed, 26 Sep 2018, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 09/26/2018 05:53 PM, Stefano Stabellini wrote:
>>> On Tue, 25 Sep 2018, Julien Grall wrote:
>>>> Some capababilities are set right during boot and will never change
>>>> afterwards. At the moment, the function cpu_have_caps will check whether
>>>> the cap is enabled from the memory.
>>>>
>>>> It is possible to avoid the load from the memory by using an
>>>> ALTERNATIVE. With that the check is just reduced to 1 instruction.
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>
>>> I enjoyed reading the patch :-)  But I don't think it is worth going
>>> into this extreme level of optimization. test_bit is efficient enough,
>>> right? What do you think we need to use alternatives just to check one
>>> bit?
>>
>> We already have an helper using test_bit (see cpus_have_cap). However test_bit
>> requires to load the word from the memory. This is an overhead when the
>> decision never change after boot.
>>
>> One load may be insignificant by itself (thought may have a cache miss), but
>> if you are in hotpath this is a win in long term.
>>
>> The mechanism is very similar to static key but for the poor (I don't have
>> much time to implement better for now). We already use similar construction on
>> other places (see CHECK_WORKAROUND_HELPER).
> 
> I like the mechanism and the little ALTERNATIVE trick.  I can see it
> could very useful in some cases. It is just that it is a bit of a waste
> to use it on SMCCC virtualization -- SMCCC calls cannot be done often
> enough for this to make a difference.

I would not be so sure ;). SMC can be called at *every* entry and exit 
into the hypervisor to apply SSBD workaround.

While SSBD will directly call arm_smccc_1_1_smc (and not the generic 
one), I would not rule out more hotpath with SMC call in it.

Cheers,

Patch

diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index 3de6b54301..c6cbc2ec84 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -63,6 +63,18 @@  static inline bool cpus_have_cap(unsigned int num)
     return test_bit(num, cpu_hwcaps);
 }
 
+/* System capability check for constant cap */
+#define cpus_have_const_cap(num) ({                 \
+        bool __ret;                                 \
+                                                    \
+        asm volatile (ALTERNATIVE("mov %0, #0",     \
+                                  "mov %0, #1",     \
+                                  num)              \
+                      : "=r" (__ret));              \
+                                                    \
+        unlikely(__ret);                            \
+        })
+
 static inline void cpus_set_cap(unsigned int num)
 {
     if (num >= ARM_NCAPS)