[RFC,v2,3/7] arm64: alternative: Apply alternatives early in boot process

Message ID 1442237181-17064-4-git-send-email-daniel.thompson@linaro.org
State New

Commit Message

Daniel Thompson Sept. 14, 2015, 1:26 p.m.
Currently alternatives are applied very late in the boot process (and
a long time after we enable scheduling). Some alternative sequences,
such as those that alter the way CPU context is stored, must be applied
much earlier in the boot sequence.

Introduce apply_alternatives_early() to allow some alternatives to be
applied immediately after we detect the CPU features of the boot CPU.

Currently apply_alternatives_all() is not optimized and will re-patch
code that has already been updated. This is harmless but could be
removed by adding extra flags to the alternatives store.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
 arch/arm64/include/asm/alternative.h |  1 +
 arch/arm64/kernel/alternative.c      | 15 +++++++++++++++
 arch/arm64/kernel/setup.c            |  7 +++++++
 3 files changed, 23 insertions(+)

Comments

Will Deacon Sept. 16, 2015, 1:05 p.m. | #1
On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
> Currently alternatives are applied very late in the boot process (and
> a long time after we enable scheduling). Some alternative sequences,
> such as those that alter the way CPU context is stored, must be applied
> much earlier in the boot sequence.
> 
> Introduce apply_alternatives_early() to allow some alternatives to be
> applied immediately after we detect the CPU features of the boot CPU.
> 
> Currently apply_alternatives_all() is not optimized and will re-patch
> code that has already been updated. This is harmless but could be
> removed by adding extra flags to the alternatives store.
> 
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
> ---
>  arch/arm64/include/asm/alternative.h |  1 +
>  arch/arm64/kernel/alternative.c      | 15 +++++++++++++++
>  arch/arm64/kernel/setup.c            |  7 +++++++
>  3 files changed, 23 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
> index d56ec0715157..f9dad1b7c651 100644
> --- a/arch/arm64/include/asm/alternative.h
> +++ b/arch/arm64/include/asm/alternative.h
> @@ -17,6 +17,7 @@ struct alt_instr {
>  	u8  alt_len;		/* size of new instruction(s), <= orig_len */
>  };
>  
> +void __init apply_alternatives_early(void);
>  void __init apply_alternatives_all(void);
>  void apply_alternatives(void *start, size_t length);
>  void free_alternatives_memory(void);
> diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> index ab9db0e9818c..59989a4bed7c 100644
> --- a/arch/arm64/kernel/alternative.c
> +++ b/arch/arm64/kernel/alternative.c
> @@ -117,6 +117,21 @@ static void __apply_alternatives(void *alt_region)
>  }
>  
>  /*
> + * This is called very early in the boot process (directly after we run
> + * a feature detect on the boot CPU). No need to worry about other CPUs
> + * here.
> + */
> +void apply_alternatives_early(void)
> +{
> +	struct alt_region region = {
> +		.begin	= __alt_instructions,
> +		.end	= __alt_instructions_end,
> +	};
> +
> +	__apply_alternatives(&region);
> +}

How do you choose which alternatives are applied early and which are
applied later? AFAICT, this just applies everything before we've
established the capabilities of the CPUs in the system, which could cause
problems for big/little SoCs.

Also, why do we need this for the NMI?

Will
Daniel Thompson Sept. 16, 2015, 3:51 p.m. | #2
On 16/09/15 14:05, Will Deacon wrote:
> On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
>> Currently alternatives are applied very late in the boot process (and
>> a long time after we enable scheduling). Some alternative sequences,
>> such as those that alter the way CPU context is stored, must be applied
>> much earlier in the boot sequence.
>>
>> Introduce apply_alternatives_early() to allow some alternatives to be
>> applied immediately after we detect the CPU features of the boot CPU.
>>
>> Currently apply_alternatives_all() is not optimized and will re-patch
>> code that has already been updated. This is harmless but could be
>> removed by adding extra flags to the alternatives store.
>>
>> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
>> ---
> [snip]
>>   /*
>> + * This is called very early in the boot process (directly after we run
>> + * a feature detect on the boot CPU). No need to worry about other CPUs
>> + * here.
>> + */
>> +void apply_alternatives_early(void)
>> +{
>> +	struct alt_region region = {
>> +		.begin	= __alt_instructions,
>> +		.end	= __alt_instructions_end,
>> +	};
>> +
>> +	__apply_alternatives(&region);
>> +}
>
> How do you choose which alternatives are applied early and which are
> applied later? AFAICT, this just applies everything before we've
> established the capabilities of the CPUs in the system, which could cause
> problems for big/little SoCs.

They are applied twice. This relies for correctness on the fact that 
cpufeatures can be set but not unset.

In other words the boot CPU does a feature detect and, as a result, a 
subset of the required alternatives will be applied. However after this 
the other CPUs will boot and the remaining alternatives applied as
before.

The current implementation is inefficient (because it will redundantly 
patch the same code twice) but I don't think it is broken.
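
To make the "applied twice" argument concrete, here is a minimal user-space sketch. It is not kernel code: `alt_entry`, `set_cap()` and `apply_all()` are hypothetical stand-ins for the alternatives table, cpus_set_cap() and __apply_alternatives(). It assumes, as stated above, that capability bits are only ever set and never cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCAPS 4	/* illustrative number of capabilities */

/* Hypothetical stand-in for one entry in the alternatives table. */
struct alt_entry {
	unsigned int cap;	/* capability that selects the replacement */
	uint32_t *site;		/* instruction word to patch */
	uint32_t repl;		/* replacement instruction word */
};

static uint32_t caps;	/* capability bitmap: bits are set, never cleared */

static void set_cap(unsigned int cap)
{
	assert(cap < NCAPS);
	caps |= 1u << cap;
}

/* Patch every entry whose capability is currently set; return writes made. */
static size_t apply_all(const struct alt_entry *tbl, size_t n)
{
	size_t writes = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(caps & (1u << tbl[i].cap)))
			continue;
		*tbl[i].site = tbl[i].repl;	/* same value on every pass */
		writes++;
	}
	return writes;
}
```

Calling `apply_all()` once after the boot CPU's capabilities are set and again after the secondaries come up rewrites the early sites with the same value. That is redundant, but the final contents match a single late pass, provided no capability bit is ever cleared in between.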


> Also, why do we need this for the NMI?

I was/am concerned that a context saved before the alternatives are 
applied might be restored afterwards. If that happens the bit that 
indicates what value to put into the PMR would be read during the restore
without having been saved first. Applying early ensures that the context 
save/restore code is updated before it is ever used.


Daniel.
Will Deacon Sept. 16, 2015, 4:24 p.m. | #3
On Wed, Sep 16, 2015 at 04:51:12PM +0100, Daniel Thompson wrote:
> On 16/09/15 14:05, Will Deacon wrote:
> > On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
> >> Currently alternatives are applied very late in the boot process (and
> >> a long time after we enable scheduling). Some alternative sequences,
> >> such as those that alter the way CPU context is stored, must be applied
> >> much earlier in the boot sequence.
> >>
> >> Introduce apply_alternatives_early() to allow some alternatives to be
> >> applied immediately after we detect the CPU features of the boot CPU.
> >>
> >> Currently apply_alternatives_all() is not optimized and will re-patch
> >> code that has already been updated. This is harmless but could be
> >> removed by adding extra flags to the alternatives store.
> >>
> >> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
> >> ---
> > [snip]
> >>   /*
> >> + * This is called very early in the boot process (directly after we run
> >> + * a feature detect on the boot CPU). No need to worry about other CPUs
> >> + * here.
> >> + */
> >> +void apply_alternatives_early(void)
> >> +{
> >> +	struct alt_region region = {
> >> +		.begin	= __alt_instructions,
> >> +		.end	= __alt_instructions_end,
> >> +	};
> >> +
> >> +	__apply_alternatives(&region);
> >> +}
> >
> > How do you choose which alternatives are applied early and which are
> > applied later? AFAICT, this just applies everything before we've
> > established the capabilities of the CPUs in the system, which could cause
> > problems for big/little SoCs.
> 
> They are applied twice. This relies for correctness on the fact that 
> cpufeatures can be set but not unset.
> 
> In other words the boot CPU does a feature detect and, as a result, a 
> subset of the required alternatives will be applied. However after this 
> > the other CPUs will boot and the remaining alternatives applied as
> before.
> 
> The current implementation is inefficient (because it will redundantly 
> patch the same code twice) but I don't think it is broken.

What about a big/little system where we boot on the big cores and only
they support LSE atomics?

> > Also, why do we need this for the NMI?
> 
> I was/am concerned that a context saved before the alternatives are 
> applied might be restored afterwards. If that happens the bit that 
> indicates what value to put into the PMR would be read during the restore
> without having been saved first. Applying early ensures that the context 
> save/restore code is updated before it is ever used.

Damn, and stop_machine makes use of local_irq_restore immediately after
the patching has completed, so it's a non-starter. Still, special-casing
this feature via an explicit apply_alternatives call would be better
than moving everything earlier, I think.

We also need to think about how an incoming NMI interacts with
concurrent patching of later features. I suspect we want to set the I
bit, like you do for WFI, unless you can guarantee that no patched
sequences run in NMI context.

Will
Daniel Thompson Sept. 17, 2015, 1:25 p.m. | #4
On 16/09/15 17:24, Will Deacon wrote:
> On Wed, Sep 16, 2015 at 04:51:12PM +0100, Daniel Thompson wrote:
>> On 16/09/15 14:05, Will Deacon wrote:
>>> On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
>>>> Currently alternatives are applied very late in the boot process (and
>>>> a long time after we enable scheduling). Some alternative sequences,
>>>> such as those that alter the way CPU context is stored, must be applied
>>>> much earlier in the boot sequence.
>>>>
>>>> Introduce apply_alternatives_early() to allow some alternatives to be
>>>> applied immediately after we detect the CPU features of the boot CPU.
>>>>
>>>> Currently apply_alternatives_all() is not optimized and will re-patch
>>>> code that has already been updated. This is harmless but could be
>>>> removed by adding extra flags to the alternatives store.
>>>>
>>>> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
>>>> ---
>>> [snip]
>>>>    /*
>>>> + * This is called very early in the boot process (directly after we run
>>>> + * a feature detect on the boot CPU). No need to worry about other CPUs
>>>> + * here.
>>>> + */
>>>> +void apply_alternatives_early(void)
>>>> +{
>>>> +	struct alt_region region = {
>>>> +		.begin	= __alt_instructions,
>>>> +		.end	= __alt_instructions_end,
>>>> +	};
>>>> +
>>>> +	__apply_alternatives(&region);
>>>> +}
>>>
>>> How do you choose which alternatives are applied early and which are
>>> applied later? AFAICT, this just applies everything before we've
>>> established the capabilities of the CPUs in the system, which could cause
>>> problems for big/little SoCs.
>>
>> They are applied twice. This relies for correctness on the fact that
>> cpufeatures can be set but not unset.
>>
>> In other words the boot CPU does a feature detect and, as a result, a
>> subset of the required alternatives will be applied. However after this
>> the other CPUs will boot and the remaining alternatives applied as
>> before.
>>
>> The current implementation is inefficient (because it will redundantly
>> patch the same code twice) but I don't think it is broken.
>
> What about a big/little system where we boot on the big cores and only
> they support LSE atomics?

Hmmnn... I don't think this patch will impact that.

Once something in the boot sequence calls cpus_set_cap() then if there 
is a corresponding alternative then it is *going* to be applied isn't 
it? The patch only means that some of the alternatives will be applied 
early. Once the boot is complete the patched .text should be the same 
with and without the patch.

Have I overlooked some code in the current kernel that prevents a system 
with mis-matched LSE support from applying the alternatives?


>>> Also, why do we need this for the NMI?
>>
>> I was/am concerned that a context saved before the alternatives are
>> applied might be restored afterwards. If that happens the bit that
>> indicates what value to put into the PMR would be read during the restore
>> without having been saved first. Applying early ensures that the context
>> save/restore code is updated before it is ever used.
>
> Damn, and stop_machine makes use of local_irq_restore immediately after
> the patching has completed, so it's a non-starter. Still, special-casing
> this feature via an explicit apply_alternatives call would be better
> than moving everything earlier, I think.

Can you expand on your concerns here? Assuming I didn't miss anything
about how the current machinery works then it really is only a matter of 
whether applying some alternatives early could harm the boot sequence. 
After we have booted the results should be the same.


> We also need to think about how an incoming NMI interacts with
> concurrent patching of later features. I suspect we want to set the I
> bit, like you do for WFI, unless you can guarantee that no patched
> sequences run in NMI context.

Good point. I'll fix this in the next respin.


Daniel.
Will Deacon Sept. 17, 2015, 2:01 p.m. | #5
On Thu, Sep 17, 2015 at 02:25:56PM +0100, Daniel Thompson wrote:
> On 16/09/15 17:24, Will Deacon wrote:
> > On Wed, Sep 16, 2015 at 04:51:12PM +0100, Daniel Thompson wrote:
> >> On 16/09/15 14:05, Will Deacon wrote:
> >>> On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
> >>>>    /*
> >>>> + * This is called very early in the boot process (directly after we run
> >>>> + * a feature detect on the boot CPU). No need to worry about other CPUs
> >>>> + * here.
> >>>> + */
> >>>> +void apply_alternatives_early(void)
> >>>> +{
> >>>> +	struct alt_region region = {
> >>>> +		.begin	= __alt_instructions,
> >>>> +		.end	= __alt_instructions_end,
> >>>> +	};
> >>>> +
> >>>> +	__apply_alternatives(&region);
> >>>> +}
> >>>
> >>> How do you choose which alternatives are applied early and which are
> >>> applied later? AFAICT, this just applies everything before we've
> >>> established the capabilities of the CPUs in the system, which could cause
> >>> problems for big/little SoCs.
> >>
> >> They are applied twice. This relies for correctness on the fact that
> >> cpufeatures can be set but not unset.
> >>
> >> In other words the boot CPU does a feature detect and, as a result, a
> >> subset of the required alternatives will be applied. However after this
> >> the other CPUs will boot and the remaining alternatives applied as
> >> before.
> >>
> >> The current implementation is inefficient (because it will redundantly
> >> patch the same code twice) but I don't think it is broken.
> >
> > What about a big/little system where we boot on the big cores and only
> > they support LSE atomics?
> 
> Hmmnn... I don't think this patch will impact that.
> 
> Once something in the boot sequence calls cpus_set_cap() then if there 
> is a corresponding alternative then it is *going* to be applied isn't 
> it? The patch only means that some of the alternatives will be applied 
> early. Once the boot is complete the patched .text should be the same 
> with and without the patch.
> 
> Have I overlooked some code in the current kernel that prevents a system 
> with mis-matched LSE support from applying the alternatives?

Sorry, I'm thinking slightly ahead of myself, but the series from Suzuki
creates a shadow "safe" view of the ID registers in the system,
corresponding to the intersection of CPU features:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2015-September/370386.html

In this case, it is necessary to inspect all of the possible CPUs before
we can apply the patching, but as I say above, I'm prepared to make an
exception for NMI because I don't think we can assume a safe value anyway
for a system with mismatched GIC CPU interfaces. I just don't want to
drag all of the alternatives patching earlier as well.
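
As a rough illustration of why the "safe" view needs every CPU to be inspected first, the sketch below computes the intersection of per-CPU feature bitmaps. This is only a user-space model and not code from Suzuki's series; `safe_features()` and the `FEAT_*` bit assignments are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit assignments, not the kernel's capability numbers. */
#define FEAT_LSE	(1u << 0)
#define FEAT_GIC_CPUIF	(1u << 1)

/* System-wide "safe" view: only the features that every CPU reports. */
static uint32_t safe_features(const uint32_t *per_cpu, size_t ncpus)
{
	uint32_t safe = UINT32_MAX;

	assert(ncpus > 0);
	for (size_t i = 0; i < ncpus; i++)
		safe &= per_cpu[i];
	return safe;
}
```

With two big cores reporting `FEAT_LSE | FEAT_GIC_CPUIF` and two little cores reporting only `FEAT_GIC_CPUIF`, the intersection is `FEAT_GIC_CPUIF`. Looking at the boot CPU alone would wrongly include `FEAT_LSE`, which is the big/little case raised earlier in the thread.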

> > We also need to think about how an incoming NMI interacts with
> > concurrent patching of later features. I suspect we want to set the I
> > bit, like you do for WFI, unless you can guarantee that no patched
> > sequences run in NMI context.
> 
> Good point. I'll fix this in the next respin.

Great, thanks. It probably also means that the NMI code needs
__kprobes/__notrace annotations for similar reasons.

Will
Daniel Thompson Sept. 17, 2015, 3:28 p.m. | #6
On 17/09/15 15:01, Will Deacon wrote:
> On Thu, Sep 17, 2015 at 02:25:56PM +0100, Daniel Thompson wrote:
>> On 16/09/15 17:24, Will Deacon wrote:
>>> On Wed, Sep 16, 2015 at 04:51:12PM +0100, Daniel Thompson wrote:
>>>> On 16/09/15 14:05, Will Deacon wrote:
>>>>> On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
>>>>>>     /*
>>>>>> + * This is called very early in the boot process (directly after we run
>>>>>> + * a feature detect on the boot CPU). No need to worry about other CPUs
>>>>>> + * here.
>>>>>> + */
>>>>>> +void apply_alternatives_early(void)
>>>>>> +{
>>>>>> +	struct alt_region region = {
>>>>>> +		.begin	= __alt_instructions,
>>>>>> +		.end	= __alt_instructions_end,
>>>>>> +	};
>>>>>> +
>>>>>> +	__apply_alternatives(&region);
>>>>>> +}
>>>>>
>>>>> How do you choose which alternatives are applied early and which are
>>>>> applied later? AFAICT, this just applies everything before we've
>>>>> established the capabilities of the CPUs in the system, which could cause
>>>>> problems for big/little SoCs.
>>>>
>>>> They are applied twice. This relies for correctness on the fact that
>>>> cpufeatures can be set but not unset.
>>>>
>>>> In other words the boot CPU does a feature detect and, as a result, a
>>>> subset of the required alternatives will be applied. However after this
>>>> the other CPUs will boot and the remaining alternatives applied as
>>>> before.
>>>>
>>>> The current implementation is inefficient (because it will redundantly
>>>> patch the same code twice) but I don't think it is broken.
>>>
>>> What about a big/little system where we boot on the big cores and only
>>> they support LSE atomics?
>>
>> Hmmnn... I don't think this patch will impact that.
>>
>> Once something in the boot sequence calls cpus_set_cap() then if there
>> is a corresponding alternative then it is *going* to be applied isn't
>> it? The patch only means that some of the alternatives will be applied
>> early. Once the boot is complete the patched .text should be the same
>> with and without the patch.
>>
>> Have I overlooked some code in the current kernel that prevents a system
>> with mis-matched LSE support from applying the alternatives?
>
> Sorry, I'm thinking slightly ahead of myself, but the series from Suzuki
> creates a shadow "safe" view of the ID registers in the system,
> corresponding to the intersection of CPU features:
>
>    http://lists.infradead.org/pipermail/linux-arm-kernel/2015-September/370386.html
>
> In this case, it is necessary to inspect all of the possible CPUs before
> we can apply the patching, but as I say above, I'm prepared to make an
> exception for NMI because I don't think we can assume a safe value anyway
> for a system with mismatched GIC CPU interfaces. I just don't want to
> drag all of the alternatives patching earlier as well.

Thanks. I'll take a close look at this patch set and work out how to 
cooperate with it.

However I would like, if I can, to persuade you that we are making an 
exception for ARM64_HAS_SYSREG_GIC_CPUIF rather than specifically for things
that are NMI related. AFAIK all ARMv8 cores have a GIC_CPUIF and the 
system either has a GICv3+ or it doesn't so it shouldn't matter what 
core you check the feature on; it is in the nature of the feature we are 
detecting that it is safe to patch early.

To some extent this is quibbling about semantics but:

1. Treating this as a general case will put us in a good position if we
    ever have to deal with an erratum that cannot wait until the system
    has nearly finished booting.

2. It makes the resulting code very simple because we can just have a
    bitmask indicating which cpufeatures we should apply early and
    which we apply late. That in turn means we don't have to
    differentiate NMI alternatives from other alternatives (thus avoiding
    a bunch of new alternative macros).
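
Point 2 could look roughly like the following. This is a user-space sketch under stated assumptions, not the patch itself: `early_mask`, `should_patch()` and the `CAP_*` numbering are hypothetical, and the late pass is assumed to skip whatever the early pass already handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { CAP_GIC_CPUIF, CAP_LSE, NCAPS };	/* illustrative numbering */

/* Opt-in list: capabilities that need, and can tolerate, early patching. */
static const uint32_t early_mask = 1u << CAP_GIC_CPUIF;

/* Should an alternative keyed on 'cap' be patched in this pass? */
static bool should_patch(unsigned int cap, uint32_t detected, bool early)
{
	uint32_t bit;

	assert(cap < (unsigned int)NCAPS);
	bit = 1u << cap;
	if (!(detected & bit))
		return false;
	/* Early pass takes only opted-in caps; late pass takes the rest. */
	return early ? (early_mask & bit) != 0 : (early_mask & bit) == 0;
}
```

The selection is keyed on the capability number that each alternative already carries, so the alternative macros would not need an "early" variant. Splitting the two passes this way would also remove the redundant second patching.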

I'm not seeking any kind of binding agreement from you before you see the
patch but if you *know* right now that you would nack something that 
follows the above thinking then please let me know so I don't waste time 
writing it ;-) . If you're on the fence I'll happily write the patch and 
you can see what I think then.


>>> We also need to think about how an incoming NMI interacts with
>>> concurrent patching of later features. I suspect we want to set the I
>>> bit, like you do for WFI, unless you can guarantee that no patched
>>> sequences run in NMI context.
>>
>> Good point. I'll fix this in the next respin.
>
> Great, thanks. It probably also means that the NMI code needs
> __kprobes/__notrace annotations for similar reasons.

Oops. That I really should have thought about already (but I didn't).


Daniel.
Will Deacon Sept. 17, 2015, 3:43 p.m. | #7
On Thu, Sep 17, 2015 at 04:28:11PM +0100, Daniel Thompson wrote:
> On 17/09/15 15:01, Will Deacon wrote:
> > Sorry, I'm thinking slightly ahead of myself, but the series from Suzuki
> > creates a shadow "safe" view of the ID registers in the system,
> > corresponding to the intersection of CPU features:
> >
> >    http://lists.infradead.org/pipermail/linux-arm-kernel/2015-September/370386.html
> >
> > In this case, it is necessary to inspect all of the possible CPUs before
> > we can apply the patching, but as I say above, I'm prepared to make an
> > exception for NMI because I don't think we can assume a safe value anyway
> > for a system with mismatched GIC CPU interfaces. I just don't want to
> > drag all of the alternatives patching earlier as well.
> 
> Thanks. I'll take a close look at this patch set and work out how to 
> cooperate with it.

Brill, thanks.

> However I would like, if I can, to persuade you that we are making an 
> exception for ARM64_HAS_SYSREG_GIC_CPUIF rather than specifically for things
> that are NMI related.

Sure, I conflated the two above.

> AFAIK all ARMv8 cores have a GIC_CPUIF and the system either has a GICv3+
> or it doesn't so it shouldn't matter what core you check the feature on;
> it is in the nature of the feature we are detecting that it is safe to
> patch early.

I'm not at all convinced that it's not possible to build something with
mismatched CPU interfaces, but that's not something we can support in
Linux without significant rework of the GIC code, so we can ignore that
possibility for now.

> To some extent this is quibbling about semantics but:
> 
> 1. Treating this as a general case will put us in a good position if we
>     ever have to deal with an erratum that cannot wait until the system
>     has nearly finished booting.
> 
> 2. It makes the resulting code very simple because we can just have a
>     bitmask indicating which cpufeatures we should apply early and
>     which we apply late. That in turn means we don't have to
>     differentiate NMI alternatives from other alternatives (thus avoiding
>     a bunch of new alternative macros).
> 
> I'm not seeking any kind of binding agreement from you before you see the
> patch but if you *know* right now that you would nack something that 
> follows the above thinking then please let me know so I don't waste time 
> writing it ;-) . If you're on the fence I'll happily write the patch and 
> you can see what I think then

I don't object to the early patching if it's done on an opt-in basis for
features that (a) really need it and (b) are guaranteed to work across
the whole system for anything that Linux supports.

Deal? I think it gives you the rope you need :)

Will

Patch

diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index d56ec0715157..f9dad1b7c651 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -17,6 +17,7 @@  struct alt_instr {
 	u8  alt_len;		/* size of new instruction(s), <= orig_len */
 };
 
+void __init apply_alternatives_early(void);
 void __init apply_alternatives_all(void);
 void apply_alternatives(void *start, size_t length);
 void free_alternatives_memory(void);
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index ab9db0e9818c..59989a4bed7c 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -117,6 +117,21 @@  static void __apply_alternatives(void *alt_region)
 }
 
 /*
+ * This is called very early in the boot process (directly after we run
+ * a feature detect on the boot CPU). No need to worry about other CPUs
+ * here.
+ */
+void apply_alternatives_early(void)
+{
+	struct alt_region region = {
+		.begin	= __alt_instructions,
+		.end	= __alt_instructions_end,
+	};
+
+	__apply_alternatives(&region);
+}
+
+/*
  * We might be patching the stop_machine state machine, so implement a
  * really simple polling protocol here.
  */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 6bab21f84a9f..0cddc5ff8089 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -211,6 +211,13 @@  static void __init setup_processor(void)
 	cpuinfo_store_boot_cpu();
 
 	/*
+	 * We now know enough about the boot CPU to apply the
+	 * alternatives that cannot wait until interrupt handling
+	 * and/or scheduling is enabled.
+	 */
+	apply_alternatives_early();
+
+	/*
 	 * Check for sane CTR_EL0.CWG value.
 	 */
 	cwg = cache_type_cwg();