[RFC,v3,14/21] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()

Message ID E1rDOgx-00Dvkv-Bb@rmk-PC.armlinux.org.uk
State New
Series ACPI/arm64: add support for virtual cpu hotplug

Commit Message

Russell King (Oracle) Dec. 13, 2023, 12:50 p.m. UTC
From: James Morse <james.morse@arm.com>

gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions().
It should only count the number of enabled redistributors, but it
also tries to sanity check the GICC entry, currently returning an
error if the Enabled bit is set, but the gicr_base_address is zero.

Adding support for the online-capable bit to the sanity check
complicates it, for no benefit. The existing check implicitly
depends on gic_acpi_count_gicr_regions() previously failing to find
any GICR regions (as it is valid to have gicr_base_address of zero if
the redistributors are described via a GICR entry).

Instead of complicating the check, remove it. Failures that happen
at this point cause the irqchip not to register, meaning no irqs
can be requested. The kernel grinds to a panic() pretty quickly.

Without the check, MADT tables that exhibit this problem are still
caught by gic_populate_rdist(), which helpfully also prints what
went wrong:
| CPU4: mpidr 100 has no re-distributor!

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/irqchip/irq-gic-v3.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

Comments

Jonathan Cameron Dec. 15, 2023, 4:33 p.m. UTC | #1
On Wed, 13 Dec 2023 12:50:23 +0000
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote:

> From: James Morse <james.morse@arm.com>
> 
> gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions().
> It should only count the number of enabled redistributors, but it
> also tries to sanity check the GICC entry, currently returning an
> error if the Enabled bit is set, but the gicr_base_address is zero.
> 
> Adding support for the online-capable bit to the sanity check
> complicates it, for no benefit. The existing check implicitly
> depends on gic_acpi_count_gicr_regions() previous failing to find
> any GICR regions (as it is valid to have gicr_base_address of zero if
> the redistributors are described via a GICR entry).
> 
> Instead of complicating the check, remove it. Failures that happen
> at this point cause the irqchip not to register, meaning no irqs
> can be requested. The kernel grinds to a panic() pretty quickly.
> 
> Without the check, MADT tables that exhibit this problem are still
> caught by gic_populate_rdist(), which helpfully also prints what
> went wrong:
> | CPU4: mpidr 100 has no re-distributor!
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Tested-by: Miguel Luis <miguel.luis@oracle.com>
> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
>  drivers/irqchip/irq-gic-v3.c | 18 ++++++------------
>  1 file changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 98b0329b7154..ebecd4546830 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -2420,21 +2420,15 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
>  
>  	/*
>  	 * If GICC is enabled and has valid gicr base address, then it means
> -	 * GICR base is presented via GICC
> +	 * GICR base is presented via GICC. The redistributor is only known to
> +	 * be accessible if the GICC is marked as enabled. If this bit is not
> +	 * set, we'd need to add the redistributor at runtime, which isn't
> +	 * supported.
>  	 */
> -	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {
> +	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)

I was very vague in my previous review.  I think the reason you are switching
from acpi_gicc_is_usable(gicc) to gicc->flags & ACPI_MADT_ENABLED
needs calling out, as I'm fairly sure that at this point in the series at least,
acpi_gicc_is_usable() is the same as current upstream:

static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
{
	return gicc->flags & ACPI_MADT_ENABLED;
}

>  		acpi_data.enabled_rdists++;
> -		return 0;
> -	}
>  
> -	/*
> -	 * It's perfectly valid firmware can pass disabled GICC entry, driver
> -	 * should not treat as errors, skip the entry instead of probe fail.
> -	 */
> -	if (!acpi_gicc_is_usable(gicc))
> -		return 0;
> -
> -	return -ENODEV;
> +	return 0;
>  }
>  
>  static int __init gic_acpi_count_gicr_regions(void)
Russell King (Oracle) Jan. 9, 2024, 7:27 p.m. UTC | #2
On Fri, Dec 15, 2023 at 04:33:01PM +0000, Jonathan Cameron wrote:
> On Wed, 13 Dec 2023 12:50:23 +0000
> Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote:
> 
> > From: James Morse <james.morse@arm.com>
> > 
> > gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions().
> > It should only count the number of enabled redistributors, but it
> > also tries to sanity check the GICC entry, currently returning an
> > error if the Enabled bit is set, but the gicr_base_address is zero.
> > 
> > Adding support for the online-capable bit to the sanity check
> > complicates it, for no benefit. The existing check implicitly
> > depends on gic_acpi_count_gicr_regions() previous failing to find
> > any GICR regions (as it is valid to have gicr_base_address of zero if
> > the redistributors are described via a GICR entry).
> > 
> > Instead of complicating the check, remove it. Failures that happen
> > at this point cause the irqchip not to register, meaning no irqs
> > can be requested. The kernel grinds to a panic() pretty quickly.
> > 
> > Without the check, MADT tables that exhibit this problem are still
> > caught by gic_populate_rdist(), which helpfully also prints what
> > went wrong:
> > | CPU4: mpidr 100 has no re-distributor!
> > 
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > Tested-by: Miguel Luis <miguel.luis@oracle.com>
> > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> > Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > ---
> >  drivers/irqchip/irq-gic-v3.c | 18 ++++++------------
> >  1 file changed, 6 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > index 98b0329b7154..ebecd4546830 100644
> > --- a/drivers/irqchip/irq-gic-v3.c
> > +++ b/drivers/irqchip/irq-gic-v3.c
> > @@ -2420,21 +2420,15 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
> >  
> >  	/*
> >  	 * If GICC is enabled and has valid gicr base address, then it means
> > -	 * GICR base is presented via GICC
> > +	 * GICR base is presented via GICC. The redistributor is only known to
> > +	 * be accessible if the GICC is marked as enabled. If this bit is not
> > +	 * set, we'd need to add the redistributor at runtime, which isn't
> > +	 * supported.
> >  	 */
> > -	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {
> > +	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
> 
> I was very vague in previous review.  I think the reasons you are switching
> from acpi_gicc_is_useable(gicc) to the gicc->flags & ACPI_MADT_ENABLED
> needs calling out as I'm fairly sure that this point in the series at least
> acpi_gicc_is_usable is same as current upstream:
> 
> static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
> {
> 	return gicc->flags & ACPI_MADT_ENABLED;
> }

In a previous patch adding acpi_gicc_is_usable() c54e52f84d7a ("arm64,
irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper") this
was:

-       if ((gicc->flags & ACPI_MADT_ENABLED) && gicc->gicr_base_address) {
+       if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {

so this patch is effectively undoing that particular change. That makes me
wonder why the change was made in the first place if it's just going to be
reverted in a later patch. The reason for the revert is that a following
patch adds a condition to acpi_gicc_is_usable() that isn't applicable
here: it makes acpi_gicc_is_usable() return true if either
ACPI_MADT_ENABLED _or_ ACPI_MADT_GICC_ONLINE_CAPABLE (as it is now known)
is set.

However, if ACPI_MADT_GICC_ONLINE_CAPABLE is set, does that actually
mean that the GICC is usable? I'm not sure it does. ACPI v6.5 says that
this bit indicates that the system supports enabling this processor
later. Is the GICC of a currently disabled processor "usable"...

Clearly, the intention of this change is not to count this GICC entry
if it is marked ACPI_MADT_GICC_ONLINE_CAPABLE, but I feel that isn't
described in the commit message.

Moreover, I am getting the feeling that there are _two_ changes going
on here - there's the change that's talked about in the commit message
(the complex validation that seems unnecessary) and then there's the
preparation for the change to acpi_gicc_is_usable() - which maybe
should be in the following patch where it would be less confusing.

Would you agree?
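
To make the divergence described above concrete, here is a minimal user-space sketch. It is not kernel code: the `sketch_` names are hypothetical, the "widened" helper is only a guess at what the following patch does based on the description in this thread, and the flag values are assumed to mirror the ACPI 6.5 GICC flags (Enabled in bit 0, Online Capable in bit 3).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values for this sketch: Enabled is bit 0, Online Capable is bit 3. */
#define SKETCH_MADT_ENABLED             (1u << 0)
#define SKETCH_MADT_GICC_ONLINE_CAPABLE (1u << 3)

/* The helper as quoted in this thread: Enabled bit only. */
static bool sketch_is_usable_now(uint32_t flags)
{
	return (flags & SKETCH_MADT_ENABLED) != 0;
}

/* The helper as the thread says a following patch would widen it (assumption). */
static bool sketch_is_usable_widened(uint32_t flags)
{
	return (flags & (SKETCH_MADT_ENABLED |
			 SKETCH_MADT_GICC_ONLINE_CAPABLE)) != 0;
}

/* What the redistributor counting site needs: enabled now, with a GICR base. */
static bool sketch_counts_as_enabled_rdist(uint32_t flags, uint64_t gicr_base)
{
	return (flags & SKETCH_MADT_ENABLED) && gicr_base != 0;
}
```

With these definitions, an entry that is only online-capable passes the widened helper but is not counted as an enabled redistributor, which is why the counting site cannot keep using the helper once it is widened.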
Jonathan Cameron Jan. 23, 2024, 10:08 a.m. UTC | #3
On Tue, 9 Jan 2024 19:27:20 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Fri, Dec 15, 2023 at 04:33:01PM +0000, Jonathan Cameron wrote:
> > On Wed, 13 Dec 2023 12:50:23 +0000
> > Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote:
> >   
> > > From: James Morse <james.morse@arm.com>
> > > 
> > > gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions().
> > > It should only count the number of enabled redistributors, but it
> > > also tries to sanity check the GICC entry, currently returning an
> > > error if the Enabled bit is set, but the gicr_base_address is zero.
> > > 
> > > Adding support for the online-capable bit to the sanity check
> > > complicates it, for no benefit. The existing check implicitly
> > > depends on gic_acpi_count_gicr_regions() previous failing to find
> > > any GICR regions (as it is valid to have gicr_base_address of zero if
> > > the redistributors are described via a GICR entry).
> > > 
> > > Instead of complicating the check, remove it. Failures that happen
> > > at this point cause the irqchip not to register, meaning no irqs
> > > can be requested. The kernel grinds to a panic() pretty quickly.
> > > 
> > > Without the check, MADT tables that exhibit this problem are still
> > > caught by gic_populate_rdist(), which helpfully also prints what
> > > went wrong:
> > > | CPU4: mpidr 100 has no re-distributor!
> > > 
> > > Signed-off-by: James Morse <james.morse@arm.com>
> > > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > > Tested-by: Miguel Luis <miguel.luis@oracle.com>
> > > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> > > Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > > ---
> > >  drivers/irqchip/irq-gic-v3.c | 18 ++++++------------
> > >  1 file changed, 6 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > > index 98b0329b7154..ebecd4546830 100644
> > > --- a/drivers/irqchip/irq-gic-v3.c
> > > +++ b/drivers/irqchip/irq-gic-v3.c
> > > @@ -2420,21 +2420,15 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
> > >  
> > >  	/*
> > >  	 * If GICC is enabled and has valid gicr base address, then it means
> > > -	 * GICR base is presented via GICC
> > > +	 * GICR base is presented via GICC. The redistributor is only known to
> > > +	 * be accessible if the GICC is marked as enabled. If this bit is not
> > > +	 * set, we'd need to add the redistributor at runtime, which isn't
> > > +	 * supported.
> > >  	 */
> > > -	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {
> > > +	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)  
> > 
> > I was very vague in previous review.  I think the reasons you are switching
> > from acpi_gicc_is_useable(gicc) to the gicc->flags & ACPI_MADT_ENABLED
> > needs calling out as I'm fairly sure that this point in the series at least
> > acpi_gicc_is_usable is same as current upstream:
> > 
> > static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
> > {
> > 	return gicc->flags & ACPI_MADT_ENABLED;
> > }  
> 
> In a previous patch adding acpi_gicc_is_usable() c54e52f84d7a ("arm64,
> irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper") this
> was:
> 
> -       if ((gicc->flags & ACPI_MADT_ENABLED) && gicc->gicr_base_address) {
> +       if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {
> 
> so effectively this is undoing that particular change, which raises in
> my mind why the change was made in the first place if it's just going
> to be reverted in a later patch (because in a following patch,
> acpi_gicc_is_usable() has an additional condition added to it that
> isn't applicable here.) which effectively makes acpi_gicc_is_usable()
> return true if either ACPI_MADT_ENABLED _or_
> ACPI_MADT_GICC_ONLINE_CAPABLE (as it is now known) are set.

Ok. So maybe just call out that we are about to change the meaning
of acpi_gicc_is_usable(), and so need to partly revert the earlier patch
that made use of it everywhere.

Or perhaps introduce acpi_gicc_is_enabled(), which acpi_gicc_is_usable()
would call alongside the new conditions when they are added. Though, as you
say later, what does usable mean?

> 
> However, if ACPI_MADT_GICC_ONLINE_CAPABLE is set, does that actually
> mean that the GICC is usable? I'm not sure it does. ACPI v6.5 says that
> this bit indicates that the system supports enabling this processor
> later. Is the GICC of a currently disabled processor "usable"...

I agree, this is confusing.

acpi_gicc_may_be_usable()?

Or invert it in all places to give a cleaner meaning
!acpi_gicc_never_usable()

Bit of a pain to change this throughout again, but maybe necessary
to avoid confusion in future.

> 
> Clearly, the intention of this change is not to count this GICC entry
> if it is marked ACPI_MADT_GICC_ONLINE_CAPABLE, but I feel that isn't
> described in the commit message.

Agreed, though that only happens in the next patch, so it is easier to describe
there. Or via a patch that adds multiple, initially identical helper functions
which then diverge in the following patch?

Whilst a helper for this one location seems silly, it would let us put
the two helpers next to each other, where the distinction is obvious.

> 
> Moreover, I am getting the feeling that there are _two_ changes going
> on here - there's the change that's talked about in the commit message
> (the complex validation that seems unnecessary) and then there's the
> preparation for the change to acpi_gicc_is_usable() - which maybe
> should be in the following patch where it would be less confusing.

Agreed.

> 
> Would you agree?
> 
Yes, the move would help, as then it's obvious why this needs to change,
and that is separate from the naming question.

So in conclusion, I agree with everything you've called out on this one;
it's up to you to pick which solution cleans this up. I think the options are:
1) Just move the change to the next patch where it's easier to describe.
   Leaves the odd 'usable' behind.
2) Rename usable() to something else, maybe inverting the logic, as
   !never is easier than now_or_maybe_later.
3) Possibly add another helper for this new case which starts out matching
   the existing one, but diverges in a later patch. (It should still not be
   in this patch which, as you observe, is doing something else and I think
   is actually a bug fix anyway, albeit one that has never mattered for
   any shipping firmware.)

Jonathan
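
Options 2 and 3 above could look roughly like the following. This is a hedged, user-space sketch only: the struct is a trimmed stand-in for `struct acpi_madt_generic_interrupt`, the helper names are borrowed from the suggestions in this thread with a `sketch_` prefix, and the flag values are assumed (Enabled in bit 0, Online Capable in bit 3). None of it is taken from a posted patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values for this sketch. */
#define SKETCH_MADT_ENABLED             (1u << 0)
#define SKETCH_MADT_GICC_ONLINE_CAPABLE (1u << 3)

/* Hypothetical, trimmed-down stand-in for the MADT GICC entry. */
struct sketch_gicc {
	uint32_t flags;
	uint64_t gicr_base_address;
};

/* Option 3: a narrow helper meaning "enabled right now". */
static inline bool sketch_gicc_is_enabled(const struct sketch_gicc *gicc)
{
	return (gicc->flags & SKETCH_MADT_ENABLED) != 0;
}

/*
 * Option 2: the inverted form. True only when the entry is neither enabled
 * nor online-capable, so callers would test !sketch_gicc_never_usable().
 */
static inline bool sketch_gicc_never_usable(const struct sketch_gicc *gicc)
{
	return !sketch_gicc_is_enabled(gicc) &&
	       !(gicc->flags & SKETCH_MADT_GICC_ONLINE_CAPABLE);
}
```

Keeping the two predicates side by side makes the distinction visible at the definition site: the redistributor count would use the first, while code that only needs to skip permanently absent CPUs would use the negation of the second.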

Patch

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 98b0329b7154..ebecd4546830 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -2420,21 +2420,15 @@  static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
 
 	/*
 	 * If GICC is enabled and has valid gicr base address, then it means
-	 * GICR base is presented via GICC
+	 * GICR base is presented via GICC. The redistributor is only known to
+	 * be accessible if the GICC is marked as enabled. If this bit is not
+	 * set, we'd need to add the redistributor at runtime, which isn't
+	 * supported.
 	 */
-	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {
+	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
 		acpi_data.enabled_rdists++;
-		return 0;
-	}
 
-	/*
-	 * It's perfectly valid firmware can pass disabled GICC entry, driver
-	 * should not treat as errors, skip the entry instead of probe fail.
-	 */
-	if (!acpi_gicc_is_usable(gicc))
-		return 0;
-
-	return -ENODEV;
+	return 0;
 }
 
 static int __init gic_acpi_count_gicr_regions(void)
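
The behaviour of the patched function can be modelled outside the kernel as below. This is an illustrative sketch under stated assumptions, not the driver code: `struct sketch_gicc` and `sketch_match_gicc()` are hypothetical stand-ins, the counter is passed in explicitly instead of living in `acpi_data`, and the Enabled flag is assumed to be bit 0.

```c
#include <stdint.h>

/* Stand-in for ACPI_MADT_ENABLED (assumed to be bit 0 of the GICC flags). */
#define SKETCH_MADT_ENABLED (1u << 0)

/* Hypothetical, trimmed-down stand-in for struct acpi_madt_generic_interrupt. */
struct sketch_gicc {
	uint32_t flags;
	uint64_t gicr_base_address;
};

/*
 * Models the post-patch gic_acpi_match_gicc(): count the entry when it is
 * enabled and carries a GICR base address, and never report an error.
 * Entries that are disabled, or enabled with a zero base, are simply skipped.
 */
static int sketch_match_gicc(const struct sketch_gicc *gicc,
			     unsigned int *enabled_rdists)
{
	if ((gicc->flags & SKETCH_MADT_ENABLED) && gicc->gicr_base_address)
		(*enabled_rdists)++;

	return 0;
}
```

An enabled entry with a zero `gicr_base_address` used to produce `-ENODEV` here; in this model it returns 0 and is not counted, leaving detection of a genuinely missing redistributor to the later gic_populate_rdist() check mentioned in the commit message.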