mbox series

[v2,0/3] Support managed interrupts for platform devices

Message ID 1603888387-52499-1-git-send-email-john.garry@huawei.com
Headers show
Series Support managed interrupts for platform devices | expand

Message

John Garry Oct. 28, 2020, 12:33 p.m. UTC
So far, managed interrupts are only used for PCI MSIs. This series add
platform device support for managed interrupts. Initially this topic was
discussed at [0].

The method to enable managed interrupts is to allocate all the IRQs for
the device, and then switch the interrupts to managed - this is done
through new function irq_update_affinity_desc().

API platform_get_irqs_affinity() is added as a helper to manage this work,
such that we don't need to export irq_update_affinity_desc() or
irq_create_affinity_masks().

For now, the HiSilicon SAS v2 hw driver is switched over. This is used
in the D05 dev board.

Performance gain observed for 6x SAS SSDs is ~357K -> 420K IOPs for fio read.

[0] https://lore.kernel.org/lkml/84a9411b-4ae3-1928-3d35-1666f2687ec8@huawei.com/

Changes since v1:
- Update authorship on genirq change

John Garry (3):
  genirq/affinity: Add irq_update_affinity_desc()
  Driver core: platform: Add platform_get_irqs_affinity()
  scsi: hisi_sas: Expose HW queues for v2 hw

 drivers/base/platform.c                | 58 +++++++++++++++++++++
 drivers/scsi/hisi_sas/hisi_sas.h       |  4 ++
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 11 ++++
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 71 ++++++++++++++++++++++----
 include/linux/interrupt.h              |  8 +++
 include/linux/platform_device.h        |  5 ++
 kernel/irq/manage.c                    | 19 +++++++
 7 files changed, 165 insertions(+), 11 deletions(-)

Comments

Thomas Gleixner Oct. 28, 2020, 6:22 p.m. UTC | #1
On Wed, Oct 28 2020 at 20:33, John Garry wrote:
>  
> +int irq_update_affinity_desc(unsigned int irq,
> +			     struct irq_affinity_desc *affinity)
> +{
> +	unsigned long flags;
> +	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
> +
> +	if (!desc)
> +		return -EINVAL;

Just looking at it some more. This needs a check whether the interrupt
is actually shut down. Otherwise the update will corrupt
state. Something like this:

        if (irqd_is_started(&desc->irq_data))
        	return -EBUSY;

But all of this can't work on x86 due to the way how vector allocation
works. Let me think about that.

Thanks,

        tglx
John Garry Nov. 2, 2020, 5:32 p.m. UTC | #2
On 28/10/2020 18:22, Thomas Gleixner wrote:
> On Wed, Oct 28 2020 at 20:33, John Garry wrote:

Hi Thomas,

>>   
>> +int irq_update_affinity_desc(unsigned int irq,
>> +			     struct irq_affinity_desc *affinity)
>> +{
>> +	unsigned long flags;
>> +	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
>> +
>> +	if (!desc)
>> +		return -EINVAL;
> Just looking at it some more. This needs a check whether the interrupt
> is actually shut down. Otherwise the update will corrupt
> state. Something like this:
> 
>          if (irqd_is_started(&desc->irq_data))
>          	return -EBUSY;
> 
> But all of this can't work on x86 due to the way how vector allocation
> works. Let me think about that.
> 

Is the problem that we reserve per-cpu managed interrupt space when 
allocated irq vectors on x86, and so later changing managed vs 
non-managed setting for irqs messes up this accounting somehow?

Cheers,
John
Thomas Gleixner Nov. 2, 2020, 8:35 p.m. UTC | #3
On Mon, Nov 02 2020 at 17:32, John Garry wrote:
> On 28/10/2020 18:22, Thomas Gleixner wrote:
>> But all of this can't work on x86 due to the way how vector allocation
>> works. Let me think about that.
>
> Is the problem that we reserve per-cpu managed interrupt space when 
> allocated irq vectors on x86, and so later changing managed vs 
> non-managed setting for irqs messes up this accounting somehow?

Correct. I have a halfways working solution for that, but I need to fix
some other thing first.

Thanks,

        tglx
Thomas Gleixner Nov. 17, 2020, 9:28 p.m. UTC | #4
On Mon, Nov 02 2020 at 21:35, Thomas Gleixner wrote:
> On Mon, Nov 02 2020 at 17:32, John Garry wrote:

> Correct. I have a halfways working solution for that, but I need to fix

> some other thing first.


Sorry for the delay. Supporting this truly on x86 needs some more
thought and surgery, but for ARM it should not matter. I made a few
tweaks to your original code. See below.

Thanks,

        tglx
---
From: John Garry <john.garry@huawei.com>

Subject: genirq/affinity: Add irq_update_affinity_desc()
Date: Wed, 28 Oct 2020 20:33:05 +0800

From: John Garry <john.garry@huawei.com>


Add a function to allow the affinity of an interrupt be switched to
managed, such that interrupts allocated for platform devices may be
managed.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Garry <john.garry@huawei.com>

---
 include/linux/interrupt.h |    8 ++++++
 kernel/irq/manage.c       |   56 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -352,6 +352,8 @@ extern int irq_can_set_affinity(unsigned
 extern int irq_select_affinity(unsigned int irq);
 
 extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
+extern int irq_update_affinity_desc(unsigned int irq,
+				    struct irq_affinity_desc *affinity);
 
 extern int
 irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);
@@ -386,6 +388,12 @@ static inline int irq_set_affinity_hint(
 {
 	return -EINVAL;
 }
+
+static inline int irq_update_affinity_desc(unsigned int irq,
+					   struct irq_affinity_desc *affinity)
+{
+	return -EINVAL;
+}
 
 static inline int
 irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -371,6 +371,62 @@ int irq_set_affinity_locked(struct irq_d
 	return ret;
 }
 
+/**
+ * irq_update_affinity_desc - Update affinity management for an interrupt
+ * @irq:	The interrupt number to update
+ * @affinity:	Pointer to the affinity descriptor
+ *
+ * This interface can be used to configure the affinity management of
+ * interrupts which have been allocated already.
+ */
+int irq_update_affinity_desc(unsigned int irq,
+			     struct irq_affinity_desc *affinity)
+{
+	struct irq_desc *desc;
+	unsigned long flags;
+	bool activated;
+
+	/*
+	 * Supporting this with the reservation scheme used by x86 needs
+	 * some more thought. Fail it for now.
+	 */
+	if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
+		return -EOPNOTSUPP;
+
+	desc = irq_get_desc_buslock(irq, &flags, 0);
+	if (!desc)
+		return -EINVAL;
+
+	/* Requires the interrupt to be shut down */
+	if (irqd_is_started(&desc->irq_data))
+		return -EBUSY;
+
+	/* Interrupts which are already managed cannot be modified */
+	if (irqd_is_managed(&desc->irq_data))
+		return -EBUSY;
+
+	/*
+	 * Deactivate the interrupt. That's required to undo
+	 * anything an earlier activation has established.
+	 */
+	activated = irqd_is_activated(&desc->irq_data);
+	if (activated)
+		irq_domain_deactivate_irq(&desc->irq_data);
+
+	if (affinity->is_managed) {
+		irqd_set(&desc->irq_data, IRQD_AFFINITY_MANAGED);
+		irqd_set(&desc->irq_data, IRQD_MANAGED_SHUTDOWN);
+	}
+
+	cpumask_copy(desc->irq_common_data.affinity, &affinity->mask);
+
+	/* Restore the activation state */
+	if (activated)
+		irq_domain_deactivate_irq(&desc->irq_data);
+	irq_put_desc_busunlock(desc, flags);
+	return 0;
+}
+
 int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
John Garry Nov. 18, 2020, 11:34 a.m. UTC | #5
Hi Thomas,

> 

> Sorry for the delay. 


No worries, Thanks for the effort.

> Supporting this truly on x86 needs some more

> thought and surgery, but for ARM it should not matter. 


ok, as long as you are content not to support x86 atm.

> I made a few

> tweaks to your original code. See below.


I think maybe a few more tweaks, below. Apart from that, it looks to 
work ok.

> 

> Thanks,

> 

>          tglx

> ---

> From: John Garry <john.garry@huawei.com>

> Subject: genirq/affinity: Add irq_update_affinity_desc()

> Date: Wed, 28 Oct 2020 20:33:05 +0800

> 

> From: John Garry <john.garry@huawei.com>

> 

> Add a function to allow the affinity of an interrupt be switched to

> managed, such that interrupts allocated for platform devices may be

> managed.

> 

> Suggested-by: Thomas Gleixner <tglx@linutronix.de>

> Signed-off-by: John Garry <john.garry@huawei.com>

> ---

>   include/linux/interrupt.h |    8 ++++++

>   kernel/irq/manage.c       |   56 ++++++++++++++++++++++++++++++++++++++++++++++

>   2 files changed, 64 insertions(+)

> 


...

> +/**

> + * irq_update_affinity_desc - Update affinity management for an interrupt

> + * @irq:	The interrupt number to update

> + * @affinity:	Pointer to the affinity descriptor

> + *

> + * This interface can be used to configure the affinity management of

> + * interrupts which have been allocated already.

> + */

> +int irq_update_affinity_desc(unsigned int irq,

> +			     struct irq_affinity_desc *affinity)


Just a note on the return value, in the only current callsite - 
platform_get_irqs_affinity() - we don't check the return value and 
propagate the error. This is because we don't want to fail the interrupt 
init just because of problems updating the affinity mask. So I could 
print a message to inform the user of error (at the callsite).

> +{

> +	struct irq_desc *desc;

> +	unsigned long flags;

> +	bool activated;

> +

> +	/*

> +	 * Supporting this with the reservation scheme used by x86 needs

> +	 * some more thought. Fail it for now.

> +	 */

> +	if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))

> +		return -EOPNOTSUPP;

> +

> +	desc = irq_get_desc_buslock(irq, &flags, 0);

> +	if (!desc)

> +		return -EINVAL;

> +

> +	/* Requires the interrupt to be shut down */

> +	if (irqd_is_started(&desc->irq_data))


We're missing the unlock here, right?

> +		return -EBUSY;

> +

> +	/* Interrupts which are already managed cannot be modified */

> +	if (irqd_is_managed(&desc->irq_data))


And here, and I figure that this should be irqd_affinity_is_managed()

> +		return -EBUSY;

> +

> +	/*

> +	 * Deactivate the interrupt. That's required to undo

> +	 * anything an earlier activation has established.

> +	 */

> +	activated = irqd_is_activated(&desc->irq_data);

> +	if (activated)

> +		irq_domain_deactivate_irq(&desc->irq_data);

> +

> +	if (affinity->is_managed) {

> +		irqd_set(&desc->irq_data, IRQD_AFFINITY_MANAGED);

> +		irqd_set(&desc->irq_data, IRQD_MANAGED_SHUTDOWN);

> +	}

> +

> +	cpumask_copy(desc->irq_common_data.affinity, &affinity->mask);

> +

> +	/* Restore the activation state */

> +	if (activated)

> +		irq_domain_deactivate_irq(&desc->irq_data);

> +	irq_put_desc_busunlock(desc, flags);

> +	return 0;

> +}

> +

>   int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force)

>   {

>   	struct irq_desc *desc = irq_to_desc(irq);

> .

> 


Thanks,
John
Thomas Gleixner Nov. 18, 2020, 8:38 p.m. UTC | #6
John,

On Wed, Nov 18 2020 at 11:34, John Garry wrote:
>> +/**

>> + * irq_update_affinity_desc - Update affinity management for an interrupt

>> + * @irq:	The interrupt number to update

>> + * @affinity:	Pointer to the affinity descriptor

>> + *

>> + * This interface can be used to configure the affinity management of

>> + * interrupts which have been allocated already.

>> + */

>> +int irq_update_affinity_desc(unsigned int irq,

>> +			     struct irq_affinity_desc *affinity)

>

> Just a note on the return value, in the only current callsite - 

> platform_get_irqs_affinity() - we don't check the return value and 

> propagate the error. This is because we don't want to fail the interrupt 

> init just because of problems updating the affinity mask. So I could 

> print a message to inform the user of error (at the callsite).


Well, not sure about that. During init on a platform which does not have
the issues with reservation mode there failure cases are:

 1) Interrupt does not exist. Definitely a full fail

 2) Interrupt is already started up. Not a good idea on init() and
    a clear fail.

 3) Interrupt has already been switched to managed. Double init is not
    really a good sign either.

>> +	/* Requires the interrupt to be shut down */

>> +	if (irqd_is_started(&desc->irq_data))

>

> We're missing the unlock here, right?


Duh yes.

>> +		return -EBUSY;

>> +

>> +	/* Interrupts which are already managed cannot be modified */

>> +	if (irqd_is_managed(&desc->irq_data))

>

> And here, and I figure that this should be irqd_affinity_is_managed()


More duh :)

I assume you send a fixed variant of this.

Thanks,

        tglx
John Garry Nov. 19, 2020, 9:31 a.m. UTC | #7
Hi Thomas,

>>> +int irq_update_affinity_desc(unsigned int irq,

>>> +			     struct irq_affinity_desc *affinity)

>> Just a note on the return value, in the only current callsite -

>> platform_get_irqs_affinity() - we don't check the return value and

>> propagate the error. This is because we don't want to fail the interrupt

>> init just because of problems updating the affinity mask. So I could

>> print a message to inform the user of error (at the callsite).

> Well, not sure about that. During init on a platform which does not have

> the issues with reservation mode there failure cases are:

> 

>   1) Interrupt does not exist. Definitely a full fail

> 

>   2) Interrupt is already started up. Not a good idea on init() and

>      a clear fail.

> 

>   3) Interrupt has already been switched to managed. Double init is not

>      really a good sign either.


I just tested that and case 3) would be a problem. I don't see us 
clearing the managed flag when free'ing the interrupt. So with 
CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, we attempt this affinity update 
twice, and error from the irqd_affinity_is_managed() check.

> 

>>> +	/* Requires the interrupt to be shut down */

>>> +	if (irqd_is_started(&desc->irq_data))

>> We're missing the unlock here, right?

> Duh yes.

> 

>>> +		return -EBUSY;

>>> +

>>> +	/* Interrupts which are already managed cannot be modified */

>>> +	if (irqd_is_managed(&desc->irq_data))

>> And here, and I figure that this should be irqd_affinity_is_managed()

> More duh:)

> 

> I assume you send a fixed variant of this.


Can do.

Thanks,
John
Thomas Gleixner Nov. 19, 2020, 6:09 p.m. UTC | #8
On Thu, Nov 19 2020 at 09:31, John Garry wrote:
>>   1) Interrupt does not exist. Definitely a full fail

>> 

>>   2) Interrupt is already started up. Not a good idea on init() and

>>      a clear fail.

>> 

>>   3) Interrupt has already been switched to managed. Double init is not

>>      really a good sign either.

>

> I just tested that and case 3) would be a problem. I don't see us 

> clearing the managed flag when free'ing the interrupt. So with 

> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, we attempt this affinity update 

> twice, and error from the irqd_affinity_is_managed() check.


That means the interrupt is not deallocated and reallocated, which does
not make sense to me.

Thanks,

        tglx
John Garry Nov. 19, 2020, 7:56 p.m. UTC | #9
Hi Thomas,

>>>    3) Interrupt has already been switched to managed. Double init is not

>>>       really a good sign either.

>> I just tested that and case 3) would be a problem. I don't see us

>> clearing the managed flag when free'ing the interrupt. So with

>> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, we attempt this affinity update

>> twice, and error from the irqd_affinity_is_managed() check.

> That means the interrupt is not deallocated and reallocated, which does

> not make sense to me.

> 


Just mentioning a couple of things here, which could be a clue to what 
is going on:
- the device is behind mbigen secondary irq controller
- the flow in the LLDD is to allocate all 128 interrupts during probe, 
but we only register handlers for a subset with device managed API

Thanks,
John
Thomas Gleixner Nov. 19, 2020, 9:03 p.m. UTC | #10
On Thu, Nov 19 2020 at 19:56, John Garry wrote:
>>>>    3) Interrupt has already been switched to managed. Double init is not

>>>>       really a good sign either.

>>> I just tested that and case 3) would be a problem. I don't see us

>>> clearing the managed flag when free'ing the interrupt. So with

>>> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, we attempt this affinity update

>>> twice, and error from the irqd_affinity_is_managed() check.

>> That means the interrupt is not deallocated and reallocated, which does

>> not make sense to me.

>> 

>

> Just mentioning a couple of things here, which could be a clue to what 

> is going on:

> - the device is behind mbigen secondary irq controller

> - the flow in the LLDD is to allocate all 128 interrupts during probe, 

> but we only register handlers for a subset with device managed API


Right, but if the driver is removed then the interrupts should be
deallocated, right?

Thanks,

        tglx
John Garry Nov. 20, 2020, 11:52 a.m. UTC | #11
Hi Thomas,

>> Just mentioning a couple of things here, which could be a clue to what

>> is going on:

>> - the device is behind mbigen secondary irq controller

>> - the flow in the LLDD is to allocate all 128 interrupts during probe,

>> but we only register handlers for a subset with device managed API

> Right, but if the driver is removed then the interrupts should be

> deallocated, right?

> 


When removing the driver we just call free_irq(), which removes the 
handler and disables the interrupt.

But about the irq_desc, this is created when the mapping is created in 
irq_create_fwspec_mapping(), and I don't see this being torn down in the 
driver removal, so persistent in that regard.

So for pci msi I can see that we free the irq_desc in pci_disable_msi() 
-> free_msi_irqs() -> msi_domain_free_irqs() ...

So what I am missing here?

Thanks,
John
Marc Zyngier Nov. 22, 2020, 1:38 p.m. UTC | #12
On Fri, 20 Nov 2020 11:52:09 +0000,
John Garry <john.garry@huawei.com> wrote:
> 

> Hi Thomas,

> 

> >> Just mentioning a couple of things here, which could be a clue to what

> >> is going on:

> >> - the device is behind mbigen secondary irq controller

> >> - the flow in the LLDD is to allocate all 128 interrupts during probe,

> >> but we only register handlers for a subset with device managed API

> > Right, but if the driver is removed then the interrupts should be

> > deallocated, right?

> > 

> 

> When removing the driver we just call free_irq(), which removes the

> handler and disables the interrupt.

> 

> But about the irq_desc, this is created when the mapping is created in

> irq_create_fwspec_mapping(), and I don't see this being torn down in

> the driver removal, so persistent in that regard.


If the irq_descs are created via the platform_get_irq() calls in
platform_get_irqs_affinity(), I'd expect some equivalent helper to
tear things down as a result, calling irq_dispose_mapping() behind the
scenes.

> So for pci msi I can see that we free the irq_desc in

> pci_disable_msi() -> free_msi_irqs() -> msi_domain_free_irqs() ...

> 

> So what I am missing here?


I'm not sure the paths are strictly equivalent. On the PCI side, we
can have something that completely driver agnostic, as it is all
architectural. In your case, only the endpoint driver knows about what
happens, and needs to free things accordingly.

Finally, there is the issue in your driver that everything is
requested using devm_request_irq, which cannot play nicely with an
explicit irq_desc teardown. You'll probably need to provide the
equivalent devm helpers for your driver to safely be taken down.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
John Garry Nov. 23, 2020, 12:54 p.m. UTC | #13
Hi Marc,

>>> Right, but if the driver is removed then the interrupts should be

>>> deallocated, right?

>>>

>>

>> When removing the driver we just call free_irq(), which removes the

>> handler and disables the interrupt.

>>

>> But about the irq_desc, this is created when the mapping is created in

>> irq_create_fwspec_mapping(), and I don't see this being torn down in

>> the driver removal, so persistent in that regard.

> 

> If the irq_descs are created via the platform_get_irq() calls in

> platform_get_irqs_affinity(), I'd expect some equivalent helper to

> tear things down as a result, calling irq_dispose_mapping() behind the

> scenes.

> 


So this looks lacking in the kernel AFAICS...

So is there a reason for which irq dispose mapping is not a requirement 
for drivers when finished with the irq? because of shared interrupts?

>> So for pci msi I can see that we free the irq_desc in

>> pci_disable_msi() -> free_msi_irqs() -> msi_domain_free_irqs() ...

>>

>> So what I am missing here?

> 

> I'm not sure the paths are strictly equivalent. On the PCI side, we

> can have something that completely driver agnostic, as it is all

> architectural. In your case, only the endpoint driver knows about what

> happens, and needs to free things accordingly.

> 

> Finally, there is the issue in your driver that everything is

> requested using devm_request_irq, which cannot play nicely with an

> explicit irq_desc teardown. You'll probably need to provide the

> equivalent devm helpers for your driver to safely be taken down.

> 


Yeah, so since we use the devm irq request method, we also need a devm 
dispose release method as we can't dispose the irq mapping in the 
remove() method, prior to the irq_free() in the later devm release method.

But it looks like there is more to it than that, which I'm worried is 
far from non-trivial. For example, just calling irq_dispose_mapping() 
for removal and then plaform_get_irq()->acpi_get_irq() second time fails 
as it looks like more tidy-up is needed for removal...

Cheers,
John
Marc Zyngier Nov. 23, 2020, 1:26 p.m. UTC | #14
Hi John,

On 2020-11-23 12:54, John Garry wrote:
> Hi Marc,

> 

>>>> Right, but if the driver is removed then the interrupts should be

>>>> deallocated, right?

>>>> 

>>> 

>>> When removing the driver we just call free_irq(), which removes the

>>> handler and disables the interrupt.

>>> 

>>> But about the irq_desc, this is created when the mapping is created 

>>> in

>>> irq_create_fwspec_mapping(), and I don't see this being torn down in

>>> the driver removal, so persistent in that regard.

>> 

>> If the irq_descs are created via the platform_get_irq() calls in

>> platform_get_irqs_affinity(), I'd expect some equivalent helper to

>> tear things down as a result, calling irq_dispose_mapping() behind the

>> scenes.

>> 

> 

> So this looks lacking in the kernel AFAICS...

> 

> So is there a reason for which irq dispose mapping is not a

> requirement for drivers when finished with the irq? because of shared

> interrupts?


For a bunch of reasons: IRQ number used to be created 1:1 with their
physical counterpart, so there wasn't a need to "get rid" of the
associated data structures. Also, people expected their drivers
to be there for the lifetime of the system (believe it or not,
hotplug devices are pretty new!).

Shared interrupts are just another part of the problem (although we
should be able to work out whether there is any action attached to
the descriptor before blowing it away.

> 

>>> So for pci msi I can see that we free the irq_desc in

>>> pci_disable_msi() -> free_msi_irqs() -> msi_domain_free_irqs() ...

>>> 

>>> So what I am missing here?

>> 

>> I'm not sure the paths are strictly equivalent. On the PCI side, we

>> can have something that completely driver agnostic, as it is all

>> architectural. In your case, only the endpoint driver knows about what

>> happens, and needs to free things accordingly.

>> 

>> Finally, there is the issue in your driver that everything is

>> requested using devm_request_irq, which cannot play nicely with an

>> explicit irq_desc teardown. You'll probably need to provide the

>> equivalent devm helpers for your driver to safely be taken down.

>> 

> 

> Yeah, so since we use the devm irq request method, we also need a devm

> dispose release method as we can't dispose the irq mapping in the

> remove() method, prior to the irq_free() in the later devm release

> method.

> 

> But it looks like there is more to it than that, which I'm worried is

> far from non-trivial. For example, just calling irq_dispose_mapping()

> for removal and then plaform_get_irq()->acpi_get_irq() second time

> fails as it looks like more tidy-up is needed for removal...


Most probably. I could imagine things failing if there is any trace
of an existing translation in the ITS or in the platform-MSI layer,
for example, or if the interrupt is still active...

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
John Garry Nov. 23, 2020, 3:45 p.m. UTC | #15
Hi Marc,

>> So is there a reason for which irq dispose mapping is not a

>> requirement for drivers when finished with the irq? because of shared

>> interrupts?

> 

> For a bunch of reasons: IRQ number used to be created 1:1 with their

> physical counterpart, so there wasn't a need to "get rid" of the

> associated data structures. Also, people expected their drivers

> to be there for the lifetime of the system (believe it or not,

> hotplug devices are pretty new!).

> 

> Shared interrupts are just another part of the problem (although we

> should be able to work out whether there is any action attached to

> the descriptor before blowing it away.

> 


OK, understood.

>> But it looks like there is more to it than that, which I'm worried is

>> far from non-trivial. For example, just calling irq_dispose_mapping()

>> for removal and then plaform_get_irq()->acpi_get_irq() second time

>> fails as it looks like more tidy-up is needed for removal...

> 

> Most probably. I could imagine things failing if there is any trace

> of an existing translation in the ITS or in the platform-MSI layer,

> for example, or if the interrupt is still active...


So this looks to be a problem I have. So if I hack the code to skip the 
check in acpi_get_irq() for the irq already being init'ed, I run into a 
use-after-free in the gic-v3-its driver. I may be skipping something 
with this hack, but I'll ask anyway.

So initially in the msi_prepare method we setup the its dev - this is 
from the mbigen probe. Then when all the irqs are unmapped later for end 
device driver removal, we release this its device in 
its_irq_domain_free(). But I don't see anything to set it up again. Is 
it improper to have released the its device in this scenario? Commenting 
out the release makes things "good" again.

Thanks,
john

Complete splat:

[   35.751627] 
==================================================================
[   35.758892] BUG: KASAN: use-after-free in its_irq_domain_alloc+0x44/0x1a8
[   35.765714] Read of size 8 at addr ffff04101bc93210 by task swapper/0/1
[   35.772357]
[   35.773854] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
5.10.0-rc4-18055-g3247a1b07719 #955
[   35.782072] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
D05 IT21 Nemo 2.0 RC0 04/18/2018
[   35.791250] Call trace:
[   35.793706]  dump_backtrace+0x0/0x2d0
[   35.797384]  show_stack+0x18/0x68
[   35.800713]  dump_stack+0x100/0x16c
[   35.804216]  print_address_description.constprop.12+0x6c/0x4e8
[   35.810078]  kasan_report+0x130/0x200
[   35.813755]  __asan_load8+0x9c/0xd8
[   35.817257]  its_irq_domain_alloc+0x44/0x1a8
[   35.821548]  irq_domain_alloc_irqs_parent+0x68/0x88
[   35.826447]  msi_domain_alloc+0x98/0x180
[   35.830387]  irq_domain_alloc_irqs_hierarchy+0x58/0x78
[   35.835549]  msi_domain_populate_irqs+0x16c/0x1d8
[   35.840275]  platform_msi_domain_alloc+0x9c/0xd0
[   35.844914]  mbigen_irq_domain_alloc+0x100/0x180
[   35.849553]  __irq_domain_alloc_irqs+0x188/0x498
[   35.854191]  irq_create_fwspec_mapping+0x21c/0x4e0
[   35.859005]  acpi_irq_get+0x1f0/0x218
[   35.862683]  platform_get_irq_optional+0x1c4/0x4a0
[   35.867495]  platform_get_irq+0x20/0x68
[   35.871348]  hisi_sas_v2_probe+0x2c/0x88
[   35.875287]  platform_drv_probe+0x70/0xd0
[   35.879313]  really_probe+0x414/0x640
[   35.882990]  driver_probe_device+0x78/0xe8
[   35.887103]  device_driver_attach+0x9c/0xa8
[   35.891304]  __driver_attach+0x74/0x120
[   35.895156]  bus_for_each_dev+0xec/0x160
[   35.899094]  driver_attach+0x34/0x48
[   35.902684]  bus_add_driver+0x1b8/0x2c0
[   35.906535]  driver_register+0xc0/0x1e0
[   35.910386]  __platform_driver_register+0x80/0x90
[   35.915115]  hisi_sas_v2_driver_init+0x1c/0x28
[   35.919578]  do_one_initcall+0xd4/0x268
[   35.923431]  kernel_init_freeable+0x270/0x2d8
[   35.927807]  kernel_init+0x14/0x124
[   35.931309]  ret_from_fork+0x10/0x34
[   35.934895]
[   35.936386] Allocated by task 1:
[   35.939628]  stack_trace_save+0x94/0xc8
[   35.943480]  kasan_save_stack+0x28/0x58
[   35.947331]  __kasan_kmalloc.isra.6+0xcc/0xf0
[   35.951706]  kasan_kmalloc+0x10/0x20
[   35.955295]  its_create_device+0x1b8/0x448
[   35.959408]  its_msi_prepare+0x120/0x1d8
[   35.963347]  its_pmsi_prepare+0x20c/0x280
[   35.967373]  msi_domain_prepare_irqs+0x80/0x98
[   35.971836]  __platform_msi_create_device_domain+0xa8/0x130
[   35.977435]  mbigen_device_probe+0x298/0x340
[   35.981723]  platform_drv_probe+0x70/0xd0
[   35.985748]  really_probe+0x414/0x640
[   35.989424]  driver_probe_device+0x78/0xe8
[   35.993537]  device_driver_attach+0x9c/0xa8
[   35.997737]  __driver_attach+0x74/0x120
[   36.001589]  bus_for_each_dev+0xec/0x160
[   36.005527]  driver_attach+0x34/0x48
[   36.009118]  bus_add_driver+0x1b8/0x2c0
[   36.012968]  driver_register+0xc0/0x1e0
[   36.016819]  __platform_driver_register+0x80/0x90
[   36.021545]  mbigen_platform_driver_init+0x1c/0x28
[   36.026356]  do_one_initcall+0xd4/0x268
[   36.030207]  kernel_init_freeable+0x270/0x2d8
[   36.034583]  kernel_init+0x14/0x124
[   36.038084]  ret_from_fork+0x10/0x34
[   36.041671]
[   36.043162] Freed by task 1:
[   36.046053]  stack_trace_save+0x94/0xc8
[   36.049904]  kasan_save_stack+0x28/0x58
[   36.053754]  kasan_set_track+0x28/0x40
[   36.057518]  kasan_set_free_info+0x24/0x48
[   36.061631]  __kasan_slab_free+0x104/0x188
[   36.065744]  kasan_slab_free+0x14/0x20
[   36.069510]  slab_free_freelist_hook+0x8c/0x190
[   36.074060]  kfree+0x308/0x448
[   36.077125]  its_irq_domain_free+0x31c/0x338
[   36.081414]  irq_domain_free_irqs_common+0xd8/0x128
[   36.086314]  irq_domain_free_irqs_top+0x70/0x88
[   36.090864]  msi_domain_free+0xa8/0xc8
[   36.094629]  irq_domain_free_irqs_common+0xd8/0x128
[   36.099528]  platform_msi_domain_free+0xdc/0x1b0
[   36.104167]  mbigen_irq_domain_free+0x10/0x20
[   36.108543]  irq_domain_free_irqs+0x184/0x208
[   36.112918]  irq_dispose_mapping+0x54/0x98
[   36.117033]  devm_platform_get_irqs_affinity_release+0xbc/0xdc
[   36.122893]  release_nodes+0x350/0x3e8
[   36.126657]  devres_release_all+0x54/0x78
[   36.130683]  really_probe+0x234/0x640
[   36.134359]  driver_probe_device+0x78/0xe8
[   36.138473]  device_driver_attach+0x9c/0xa8
[   36.142673]  __driver_attach+0x74/0x120
[   36.146524]  bus_for_each_dev+0xec/0x160
[   36.150462]  driver_attach+0x34/0x48
[   36.154052]  bus_add_driver+0x1b8/0x2c0
[   36.157903]  driver_register+0xc0/0x1e0
[   36.161754]  __platform_driver_register+0x80/0x90
[   36.166479]  hisi_sas_v2_driver_init+0x1c/0x28
[   36.170942]  do_one_initcall+0xd4/0x268
[   36.174793]  kernel_init_freeable+0x270/0x2d8
[   36.179168]  kernel_init+0x14/0x124
[   36.182670]  ret_from_fork+0x10/0x34
[   36.186256]
[   36.187749] The buggy address belongs to the object at ffff04101bc93200
[   36.187749]  which belongs to the cache kmalloc-128 of size 128
[   36.200335] The buggy address is located 16 bytes inside of
[   36.200335]  128-byte region [ffff04101bc93200, ffff04101bc93280)
[   36.212046] The buggy address belongs to the page:
[   36.216864] page:(____ptrval____) refcount:1 mapcount:0 
mapping:0000000000000000 index:0x0 pfn:0x4101bc92
[   36.226479] head:(____ptrval____) order:1 compound_mapcount:0
[   36.232253] flags: 0x2bfffc0000010200(slab|head)
[   36.236895] raw: 2bfffc0000010200 dead000000000100 dead000000000122 
ffff00104000fc00
[   36.244679] raw: 0000000000000000 0000000000200020 00000001ffffffff 
0000000000000000
[   36.252459] page dumped because: kasan: bad access detected
[   36.258055]
[   36.259545] Memory state around the buggy address:
[   36.264358]  ffff04101bc93100: fa fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[   36.271616]  ffff04101bc93180: fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc fc fc
[   36.278874] >ffff04101bc93200: fa fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[   36.286130]                          ^
[   36.289893]  ffff04101bc93280: fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc fc fc
[   36.297151]  ffff04101bc93300: fa fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[   36.304406] 
==================================================================
[   36.311663] Disabling lock debugging due to kernel taint
Marc Zyngier Nov. 24, 2020, 4:51 p.m. UTC | #16
On 2020-11-23 15:45, John Garry wrote:

Hi John,

>>> But it looks like there is more to it than that, which I'm worried is

>>> far from non-trivial. For example, just calling irq_dispose_mapping()

>>> for removal and then plaform_get_irq()->acpi_get_irq() second time

>>> fails as it looks like more tidy-up is needed for removal...

>> 

>> Most probably. I could imagine things failing if there is any trace

>> of an existing translation in the ITS or in the platform-MSI layer,

>> for example, or if the interrupt is still active...

> 

> So this looks to be a problem I have. So if I hack the code to skip

> the check in acpi_get_irq() for the irq already being init'ed, I run

> into a use-after-free in the gic-v3-its driver. I may be skipping

> something with this hack, but I'll ask anyway.

> 

> So initially in the msi_prepare method we setup the its dev - this is

> from the mbigen probe. Then when all the irqs are unmapped later for

> end device driver removal, we release this its device in

> its_irq_domain_free(). But I don't see anything to set it up again. Is

> it improper to have released the its device in this scenario?

> Commenting out the release makes things "good" again.


Huh, that's ugly. The issue is that the device that deals with the
interrupts isn't the device that the ITS knows about (there isn't a
1:1 mapping between mbigen and the endpoint).

The mbigen is responsible for the creation of the corresponding
irqdomain, and and crucially for the "prepare" phase, which results
in storing the its_dev pointer in info->scratchpad[0].

As we free all the interrupts associated with the endpoint, we
free the its_dev (nothing else needs it at this point). On the
next allocation, we reuse the damn its_dev pointer, and we're SOL.
This is wrong, because we haven't removed the mbigen, only the
device *connected* to the mbigen. And since the mbigen can be shared
across endpoints, we can't reliably tear it down at all. Boo.

The only thing to do is to convey that by marking the its_dev as
shared so that it isn't deleted when no LPIs are being used. After
all, it isn't like the mbigen is going anywhere.

It is just that passing that information down isn't a simple affair,
as msi_alloc_info_t isn't a generic type... Let me have a think.

         M.
-- 
Jazz is not dead. It just smells funny...
John Garry Nov. 24, 2020, 5:38 p.m. UTC | #17
Hi Marc,

>> So initially in the msi_prepare method we setup the its dev - this is

>> from the mbigen probe. Then when all the irqs are unmapped later for

>> end device driver removal, we release this its device in

>> its_irq_domain_free(). But I don't see anything to set it up again. Is

>> it improper to have released the its device in this scenario?

>> Commenting out the release makes things "good" again.

> 

> Huh, that's ugly. The issue is that the device that deals with the

> interrupts isn't the device that the ITS knows about (there isn't a

> 1:1 mapping between mbigen and the endpoint).

> 

> The mbigen is responsible for the creation of the corresponding

> irqdomain, and and crucially for the "prepare" phase, which results

> in storing the its_dev pointer in info->scratchpad[0].

> 

> As we free all the interrupts associated with the endpoint, we

> free the its_dev (nothing else needs it at this point). On the

> next allocation, we reuse the damn its_dev pointer, and we're SOL.

> This is wrong, because we haven't removed the mbigen, only the

> device *connected* to the mbigen. And since the mbigen can be shared

> across endpoints, we can't reliably tear it down at all. Boo.

> 

> The only thing to do is to convey that by marking the its_dev as

> shared so that it isn't deleted when no LPIs are being used. After

> all, it isn't like the mbigen is going anywhere.


Right, I did consider this.

> 

> It is just that passing that information down isn't a simple affair,

> as msi_alloc_info_t isn't a generic type... Let me have a think.


I think that there is a way to circumvent the problem, which you might 
call hacky, but OTOH, not sure if there's much point changing mbigen or 
related infrastructure at this stage.

Anyway, so we have 128 irqs in total for the mbigen domain, but the 
driver only is interesting in something like irq indexes 1,2,72-81, and 
96-112. So we can just dispose the mappings for irq index 0-112 at 
removal stage, thereby keeping the its device around. We do still call 
platform_irq_count(), which sets up all 128 mappings, so maybe we should 
be unmapping all of these - this would be the contentious part. But 
maybe not, as the device driver is only interested in that subset, and 
has no business unmapping the rest.

With that change, the platform.c API would work a bit more like the pci 
msi code equivalent, where we request a min and max number of vectors. 
In fact, that platform.c change needs to be made anyway as 
platform_get_irqs_affinity() is broken currently for when nr_cpus < #hw 
queues.

Thoughts?

Thanks,
John
Marc Zyngier Nov. 25, 2020, 6:35 p.m. UTC | #18
On 2020-11-24 17:38, John Garry wrote:
> Hi Marc,

> 

>>> So initially in the msi_prepare method we setup the its dev - this is

>>> from the mbigen probe. Then when all the irqs are unmapped later for

>>> end device driver removal, we release this its device in

>>> its_irq_domain_free(). But I don't see anything to set it up again. 

>>> Is

>>> it improper to have released the its device in this scenario?

>>> Commenting out the release makes things "good" again.

>> 

>> Huh, that's ugly. The issue is that the device that deals with the

>> interrupts isn't the device that the ITS knows about (there isn't a

>> 1:1 mapping between mbigen and the endpoint).

>> 

>> The mbigen is responsible for the creation of the corresponding

>> irqdomain, and and crucially for the "prepare" phase, which results

>> in storing the its_dev pointer in info->scratchpad[0].

>> 

>> As we free all the interrupts associated with the endpoint, we

>> free the its_dev (nothing else needs it at this point). On the

>> next allocation, we reuse the damn its_dev pointer, and we're SOL.

>> This is wrong, because we haven't removed the mbigen, only the

>> device *connected* to the mbigen. And since the mbigen can be shared

>> across endpoints, we can't reliably tear it down at all. Boo.

>> 

>> The only thing to do is to convey that by marking the its_dev as

>> shared so that it isn't deleted when no LPIs are being used. After

>> all, it isn't like the mbigen is going anywhere.

> 

> Right, I did consider this.


FWIW, I've pushed my hack branch[1] out with a couple of patches
for you to try (the top 3 patches). They allow platform-MSI domains
created by devices (mbigen, ICU) to be advertised as shared between
devices, so that the low-level driver can handle that in an appropriate
way.

I gave it a go on my D05 and nothing blew up, but I can't really remove
the kernel module, as that's where my disks are... :-/
Please let me know if that helps.

>> It is just that passing that information down isn't a simple affair,

>> as msi_alloc_info_t isn't a generic type... Let me have a think.

> 

> I think that there is a way to circumvent the problem, which you might

> call hacky, but OTOH, not sure if there's much point changing mbigen

> or related infrastructure at this stage.


Bah, it's a simple change, and there is now more than the mbigen using
the same API...

> 

> Anyway, so we have 128 irqs in total for the mbigen domain, but the

> driver only is interesting in something like irq indexes 1,2,72-81,

> and 96-112. So we can just dispose the mappings for irq index 0-112 at

> removal stage, thereby keeping the its device around. We do still call

> platform_irq_count(), which sets up all 128 mappings, so maybe we

> should be unmapping all of these - this would be the contentious part.

> But maybe not, as the device driver is only interested in that subset,

> and has no business unmapping the rest.


I don't think the driver should mess with interrupts it doesn't own.
And while the mbigen port that is connected to the SAS controller
doesn't seem to be shared between endpoints, some other ports definitely
are:

# cat /sys/kernel/debug/irq/domains/\\_SB.MBI1
name:   \_SB.MBI1
  size:   409
  mapped: 192
  flags:  0x00000003

[...]

I guess that the other 217 lines are connected somewhere.

> With that change, the platform.c API would work a bit more like the

> pci msi code equivalent, where we request a min and max number of

> vectors. In fact, that platform.c change needs to be made anyway as

> platform_get_irqs_affinity() is broken currently for when nr_cpus <

> #hw queues.

> 

> Thoughts?


I'm happy to look at some code! ;-)

         M.
-- 
Jazz is not dead. It just smells funny...
John Garry Nov. 26, 2020, 10:47 a.m. UTC | #19
Hi Marc,

>> Right, I did consider this.

> 

> FWIW, I've pushed my hack branch[1]


Did you miss that reference?

> out with a couple of patches

> for you to try (the top 3 patches). They allow platform-MSI domains

> created by devices (mbigen, ICU) to be advertised as shared between

> devices, so that the low-level driver can handle that in an appropriate

> way.

> 

> I gave it a go on my D05 and nothing blew up, but I can't really remove

> the kernel module, as that's where my disks are... :-/


You still should be able to enable my favorite 
CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, while the distro still boot. But I'll 
just test if you want.

> Please let me know if that helps.

> 

>>> It is just that passing that information down isn't a simple affair,

>>> as msi_alloc_info_t isn't a generic type... Let me have a think.

>>

>> I think that there is a way to circumvent the problem, which you might

>> call hacky, but OTOH, not sure if there's much point changing mbigen

>> or related infrastructure at this stage.

> 

> Bah, it's a simple change, and there is now more than the mbigen using

> the same API...

> 

>>

>> Anyway, so we have 128 irqs in total for the mbigen domain, but the

>> driver only is interesting in something like irq indexes 1,2,72-81,

>> and 96-112. So we can just dispose the mappings for irq index 0-112 at

>> removal stage, thereby keeping the its device around. We do still call

>> platform_irq_count(), which sets up all 128 mappings, so maybe we

>> should be unmapping all of these - this would be the contentious part.

>> But maybe not, as the device driver is only interested in that subset,

>> and has no business unmapping the rest.

> 

> I don't think the driver should mess with interrupts it doesn't own.


I would tend to agree. But all 128 lines here are for the SAS 
controller. It's quite strange, as only about ~20 are useful.

> And while the mbigen port that is connected to the SAS controller

> doesn't seem to be shared between endpoints, some other ports definitely

> are:

> 

> # cat /sys/kernel/debug/irq/domains/\\_SB.MBI1

> name:   \_SB.MBI1

>   size:   409

>   mapped: 192

>   flags:  0x00000003

> 

> [...]

> 

> I guess that the other 217 lines are connected somewhere.

> 


I think that is not the right one. See 
https://github.com/tianocore/edk2-platforms/blob/master/Silicon/Hisilicon/Hi1616/D05AcpiTables/Dsdt/D05Sas.asl#L101


Thanks,
John
Marc Zyngier Nov. 26, 2020, 11:08 a.m. UTC | #20
Hi John,

On 2020-11-26 10:47, John Garry wrote:
> Hi Marc,

> 

>>> Right, I did consider this.

>> 

>> FWIW, I've pushed my hack branch[1]

> 

> Did you miss that reference?


I did:

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/hacks

> 

>> out with a couple of patches

>> for you to try (the top 3 patches). They allow platform-MSI domains

>> created by devices (mbigen, ICU) to be advertised as shared between

>> devices, so that the low-level driver can handle that in an 

>> appropriate

>> way.

>> 

>> I gave it a go on my D05 and nothing blew up, but I can't really 

>> remove

>> the kernel module, as that's where my disks are... :-/

> 

> You still should be able to enable my favorite

> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, while the distro still boot. But

> I'll just test if you want.


Ah! Let me try that then. Having been debugging some ugly driver removal
lately, I wish I had known that config option and inflicted it on their
authors...

[...]

>> And while the mbigen port that is connected to the SAS controller

>> doesn't seem to be shared between endpoints, some other ports 

>> definitely

>> are:

>> 

>> # cat /sys/kernel/debug/irq/domains/\\_SB.MBI1

>> name:   \_SB.MBI1

>>   size:   409

>>   mapped: 192

>>   flags:  0x00000003

>> 

>> [...]

>> 

>> I guess that the other 217 lines are connected somewhere.

>> 

> 

> I think that is not the right one. See

> https://github.com/tianocore/edk2-platforms/blob/master/Silicon/Hisilicon/Hi1616/D05AcpiTables/Dsdt/D05Sas.asl#L101

> 


I know, I was just outlining the fact that some of the mbigen
ports are shared between devices, and MBI1 seems to be an example
of that (though my reading of ASL is... primitive).

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
John Garry Nov. 26, 2020, 11:29 a.m. UTC | #21
Hi Marc,

> 

> I did:

> 

> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/hacks 


ok, I'll have a look

>>

>> You still should be able to enable my favorite

>> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, while the distro still boot. But

>> I'll just test if you want.

> 

> Ah! Let me try that then. Having been debugging some ugly driver removal

> lately, I wish I had known that config option and inflicted it on their

> authors...


Just in case - be careful with this one! It kills the boot of many 
systems, but D05 should be fine.

> 

> [...]

> 

>>> And while the mbigen port that is connected to the SAS controller

>>> doesn't seem to be shared between endpoints, some other ports definitely

>>> are:

>>>

>>> # cat /sys/kernel/debug/irq/domains/\\_SB.MBI1

>>> name:   \_SB.MBI1

>>>   size:   409

>>>   mapped: 192

>>>   flags:  0x00000003

>>>

>>> [...]

>>>

>>> I guess that the other 217 lines are connected somewhere.

>>>

>>

>> I think that is not the right one. See

>> https://github.com/tianocore/edk2-platforms/blob/master/Silicon/Hisilicon/Hi1616/D05AcpiTables/Dsdt/D05Sas.asl#L101 

>>

>>

> 

> I know, I was just outlining the fact that some of the mbigen

> ports are shared between devices, and MBI1 seems to be an example

> of that (though my reading of ASL is... primitive).


Ah, ok. So this one is for the networking subsystem, and it does have 
many devices in the topology model, but I can't claim to known much more 
than that.

Thanks,
John
John Garry Nov. 26, 2020, 4:52 p.m. UTC | #22
Hi Marc,

>>

>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/hacks 

> 

> 

> ok, I'll have a look


I tried that and it doesn't look to work.

I find that for the its_msi_prepare() call, its_dev->shared does not get 
set, as MSI_ALLOC_FLAGS_SHARED_DEVICE is not set in info->flags.

If I understand the code correctly, MSI_ALLOC_FLAGS_SHARED_DEVICE is 
supposed to be set in info->flags in platform_msi_set_desc(), but this 
is called per-msi after its_msi_prepare(), so we don't the flags set at 
the right time. That's how it looks to me...

Cheers,
John
Marc Zyngier Nov. 27, 2020, 9:57 a.m. UTC | #23
On 2020-11-26 16:52, John Garry wrote:
> Hi Marc,

> 

>>> 

>>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/hacks

>> 

>> 

>> ok, I'll have a look

> 

> I tried that and it doesn't look to work.

> 

> I find that for the its_msi_prepare() call, its_dev->shared does not

> get set, as MSI_ALLOC_FLAGS_SHARED_DEVICE is not set in info->flags.

> 

> If I understand the code correctly, MSI_ALLOC_FLAGS_SHARED_DEVICE is

> supposed to be set in info->flags in platform_msi_set_desc(), but this

> is called per-msi after its_msi_prepare(), so we don't the flags set

> at the right time. That's how it looks to me...


Meh. I was trying multiple things, and of course commited the worse
possible approach.

I've updated the branch, having verified that we do get the flag in
the ITS now.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
John Garry Nov. 27, 2020, 12:45 p.m. UTC | #24
On 27/11/2020 09:57, Marc Zyngier wrote:
>> If I understand the code correctly, MSI_ALLOC_FLAGS_SHARED_DEVICE is

>> supposed to be set in info->flags in platform_msi_set_desc(), but this

>> is called per-msi after its_msi_prepare(), so we don't the flags set

>> at the right time. That's how it looks to me...

> 

> Meh. I was trying multiple things, and of course commited the worse

> possible approach.

> 

> I've updated the branch, having verified that we do get the flag in

> the ITS now.


Hi Marc,

That looks to work.

So do you have an upstream plan for this? I ask, as if you go with this, 
then I may change my series to map and unmap all the irqs again - but 
not sure about that.

Thanks,
John
Marc Zyngier Nov. 27, 2020, 12:49 p.m. UTC | #25
On 2020-11-27 12:45, John Garry wrote:
> On 27/11/2020 09:57, Marc Zyngier wrote:

>>> If I understand the code correctly, MSI_ALLOC_FLAGS_SHARED_DEVICE is

>>> supposed to be set in info->flags in platform_msi_set_desc(), but 

>>> this

>>> is called per-msi after its_msi_prepare(), so we don't the flags set

>>> at the right time. That's how it looks to me...

>> 

>> Meh. I was trying multiple things, and of course commited the worse

>> possible approach.

>> 

>> I've updated the branch, having verified that we do get the flag in

>> the ITS now.

> 

> Hi Marc,

> 

> That looks to work.


Thanks for having given it a go.

> So do you have an upstream plan for this? I ask, as if you go with

> this, then I may change my series to map and unmap all the irqs again

> - but not sure about that.


I'll write some commit messages, and post that. Either this weekend,
or on Monday at the latest.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...