Message ID | 20230508142842.854564-1-apatel@ventanamicro.com |
---|---|
Series | Linux RISC-V AIA Support |
On 2023-05-08 15:28, Anup Patel wrote:
> We have a separate RISC-V IMSIC MSI address for each CPU so changing
> MSI (or IRQ) affinity results in re-programming of MSI address in
> the PCIe (or platform) device.
>
> Currently, the iommu_dma_prepare_msi() is called only once at the
> time of IRQ allocation so IOMMU DMA domain will only have mapping
> for one MSI page. This means iommu_dma_compose_msi_msg() called
> by imsic_irq_compose_msi_msg() will always use the same MSI page
> irrespective to target CPU MSI address. In other words, changing
> MSI (or IRQ) affinity for device using IOMMU DMA domain will not
> work.
>
> To address above issue, we do the following:
> 1) Map MSI pages for all CPUs in imsic_irq_domain_alloc()
>    using iommu_dma_prepare_msi().
> 2) Add a new iommu_dma_select_msi() API to select a specific
>    MSI page from a set of already mapped MSI pages.
> 3) Use iommu_dma_select_msi() to select a specific MSI page
>    before calling iommu_dma_compose_msi_msg() in
>    imsic_irq_compose_msi_msg().

The high-level design is that prepare ensures any necessary page
mappings exist, then compose retrieves the appropriate page for the
given message. I think it generalises well enough without needing a new
op, it just means that caching a single page in the msi_desc up-front no
longer fits, so that wants tweaking to allow compose to do a more
general lookup.

Thanks,
Robin.

> Reported-by: Vincent Chen <vincent.chen@sifive.com>
> Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> ---
>  drivers/iommu/dma-iommu.c         | 38 +++++++++++++++++++++++++++++++
>  drivers/irqchip/irq-riscv-imsic.c | 27 ++++++++++++----------
>  include/linux/iommu.h             |  6 +++++
>  3 files changed, 59 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7a9f0b0bddbd..07782c77a6eb 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1677,6 +1677,44 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>  	return 0;
>  }
>
> +/**
> + * iommu_dma_select_msi() - Select a MSI page from a set of
> + * already mapped MSI pages in the IOMMU domain.
> + *
> + * @desc: MSI descriptor prepared by iommu_dma_prepare_msi()
> + * @msi_addr: physical address of the MSI page to be selected
> + *
> + * Return: 0 on success or negative error code if the select failed.
> + */
> +int iommu_dma_select_msi(struct msi_desc *desc, phys_addr_t msi_addr)
> +{
> +	struct device *dev = msi_desc_to_dev(desc);
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	const struct iommu_dma_msi_page *msi_page;
> +	struct iommu_dma_cookie *cookie;
> +
> +	if (!domain || !domain->iova_cookie) {
> +		desc->iommu_cookie = NULL;
> +		return 0;
> +	}
> +
> +	cookie = domain->iova_cookie;
> +	msi_addr &= ~(phys_addr_t)(cookie_msi_granule(cookie) - 1);
> +
> +	msi_page = msi_desc_get_iommu_cookie(desc);
> +	if (msi_page && msi_page->phys == msi_addr)
> +		return 0;
> +
> +	list_for_each_entry(msi_page, &cookie->msi_page_list, list) {
> +		if (msi_page->phys == msi_addr) {
> +			msi_desc_set_iommu_cookie(desc, msi_page);
> +			return 0;
> +		}
> +	}
> +
> +	return -ENOENT;
> +}
> +
>  /**
>   * iommu_dma_compose_msi_msg() - Apply translation to an MSI message
>   * @desc: MSI descriptor prepared by iommu_dma_prepare_msi()
> diff --git a/drivers/irqchip/irq-riscv-imsic.c b/drivers/irqchip/irq-riscv-imsic.c
> index 30247c84a6b0..ec61c599e0c5 100644
> --- a/drivers/irqchip/irq-riscv-imsic.c
> +++ b/drivers/irqchip/irq-riscv-imsic.c
> @@ -446,6 +446,10 @@ static void imsic_irq_compose_msi_msg(struct irq_data *d,
>  	if (WARN_ON(err))
>  		return;
>
> +	err = iommu_dma_select_msi(desc, msi_addr);
> +	if (WARN_ON(err))
> +		return;
> +
>  	msg->address_hi = upper_32_bits(msi_addr);
>  	msg->address_lo = lower_32_bits(msi_addr);
>  	msg->data = d->hwirq;
> @@ -493,11 +497,18 @@ static int imsic_irq_domain_alloc(struct irq_domain *domain,
>  	int i, hwirq, err = 0;
>  	unsigned int cpu;
>
> -	err = imsic_get_cpu(&imsic->lmask, false, &cpu);
> -	if (err)
> -		return err;
> +	/* Map MSI address of all CPUs */
> +	for_each_cpu(cpu, &imsic->lmask) {
> +		err = imsic_cpu_page_phys(cpu, 0, &msi_addr);
> +		if (err)
> +			return err;
>
> -	err = imsic_cpu_page_phys(cpu, 0, &msi_addr);
> +		err = iommu_dma_prepare_msi(info->desc, msi_addr);
> +		if (err)
> +			return err;
> +	}
> +
> +	err = imsic_get_cpu(&imsic->lmask, false, &cpu);
>  	if (err)
>  		return err;
>
> @@ -505,10 +516,6 @@ static int imsic_irq_domain_alloc(struct irq_domain *domain,
>  	if (hwirq < 0)
>  		return hwirq;
>
> -	err = iommu_dma_prepare_msi(info->desc, msi_addr);
> -	if (err)
> -		goto fail;
> -
>  	for (i = 0; i < nr_irqs; i++) {
>  		imsic_id_set_target(hwirq + i, cpu);
>  		irq_domain_set_info(domain, virq + i, hwirq + i,
> @@ -528,10 +535,6 @@ static int imsic_irq_domain_alloc(struct irq_domain *domain,
>  	}
>
>  	return 0;
> -
> -fail:
> -	imsic_ids_free(hwirq, get_count_order(nr_irqs));
> -	return err;
>  }
>
>  static void imsic_irq_domain_free(struct irq_domain *domain,
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index e8c9a7da1060..41e8613832ab 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -1117,6 +1117,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
>  int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
>
>  int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr);
> +int iommu_dma_select_msi(struct msi_desc *desc, phys_addr_t msi_addr);
>  void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg);
>
>  #else /* CONFIG_IOMMU_DMA */
> @@ -1138,6 +1139,11 @@ static inline int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_a
>  	return 0;
>  }
>
> +static inline int iommu_dma_select_msi(struct msi_desc *desc, phys_addr_t msi_addr)
> +{
> +	return 0;
> +}
> +
>  static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
>  {
>  }

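
Editorial sketch of the prepare/compose split Robin describes in this message: prepare makes sure a mapping exists for each MSI page, and compose finds the right page by a general lookup instead of relying on a single cached page. This is a userspace model only. Every name in it (struct msi_cookie, msi_prepare(), msi_compose(), MAX_MSI_PAGES) is hypothetical and none of it is the dma-iommu.c implementation; in particular the fixed-size array and the bump IOVA allocator are simplifying assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
#define MAX_MSI_PAGES 8

struct msi_page {
	uint64_t phys;	/* granule-aligned physical address of the MSI page */
	uint64_t iova;	/* address the device writes to instead */
};

struct msi_cookie {
	uint64_t granule;	/* mapping granule, must be a power of two */
	uint64_t next_iova;	/* next free IOVA for a new mapping */
	size_t nr_pages;
	struct msi_page pages[MAX_MSI_PAGES];
};

static void cookie_init(struct msi_cookie *c, uint64_t granule, uint64_t iova_base)
{
	assert(granule && (granule & (granule - 1)) == 0);
	c->granule = granule;
	c->next_iova = iova_base;
	c->nr_pages = 0;
}

/* Find the mapped page covering msi_addr, or NULL if none exists. */
static const struct msi_page *msi_lookup(const struct msi_cookie *c, uint64_t msi_addr)
{
	uint64_t phys = msi_addr & ~(c->granule - 1);

	for (size_t i = 0; i < c->nr_pages; i++) {
		if (c->pages[i].phys == phys)
			return &c->pages[i];
	}

	return NULL;
}

/* "prepare": ensure a mapping exists for the page containing msi_addr. */
static int msi_prepare(struct msi_cookie *c, uint64_t msi_addr)
{
	if (msi_lookup(c, msi_addr))
		return 0;
	if (c->nr_pages == MAX_MSI_PAGES)
		return -1;

	c->pages[c->nr_pages].phys = msi_addr & ~(c->granule - 1);
	c->pages[c->nr_pages].iova = c->next_iova;
	c->next_iova += c->granule;
	c->nr_pages++;

	return 0;
}

/* "compose": translate msi_addr via a general lookup, not a cached page. */
static int msi_compose(const struct msi_cookie *c, uint64_t msi_addr, uint64_t *out)
{
	const struct msi_page *page = msi_lookup(c, msi_addr);

	if (!page)
		return -1;

	/* Keep the offset within the granule, swap in the mapped IOVA. */
	*out = page->iova + (msi_addr & (c->granule - 1));
	return 0;
}
```

In this model an allocation path would call msi_prepare() once per CPU MSI page, and every later affinity change would only call msi_compose() with the new target address, so no separate select step is needed.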
On Mon, May 08, 2023 at 07:58:32PM +0530, Anup Patel wrote:
> We add common riscv_fw_parent_hartid() which help device drivers
> to get parent hartid of the INTC (i.e. local interrupt controller)
> fwnode. Currently, this new function only supports device tree
> but it can be extended to support ACPI as well.
>
> Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> ---
>  arch/riscv/include/asm/processor.h |  3 +++
>  arch/riscv/kernel/cpu.c            | 12 ++++++++++++
>  2 files changed, 15 insertions(+)
>
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index 94a0590c6971..6fb8bbec8459 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -77,6 +77,9 @@ struct device_node;
>  int riscv_of_processor_hartid(struct device_node *node, unsigned long *hartid);
>  int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid);
>
> +struct fwnode_handle;
> +int riscv_fw_parent_hartid(struct fwnode_handle *node, unsigned long *hartid);
> +
>  extern void riscv_fill_hwcap(void);
>  extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
>
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index 5de6fb703cc2..1adbe48b2b58 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -73,6 +73,18 @@ int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid)
>  	return -1;
>  }
>
> +/* Find hart ID of the CPU fwnode under which given fwnode falls. */
> +int riscv_fw_parent_hartid(struct fwnode_handle *node, unsigned long *hartid)
> +{
> +	/*
> +	 * Currently, this function only supports DT but it can be
> +	 * extended to support ACPI as well.
> +	 */

Statement of the obvious here, no?
Although, it seems a little odd to read this comment & the corresponding
statement in the commit message, when the series appears to have been
based on the ACPI?

Perhaps by the time v4 comes around, ACPI support will have been merged
& that'll be moot.

> +	if (!is_of_node(node))
> +		return -EINVAL;
> +	return riscv_of_parent_hartid(to_of_node(node), hartid);

nit: blank line before the return here please.

Thanks,
Conor.

On Mon, May 08, 2023 at 07:58:38PM +0530, Anup Patel wrote:
> We have a separate RISC-V IMSIC MSI address for each CPU so changing
> MSI (or IRQ) affinity results in re-programming of MSI address in
> the PCIe (or platform) device.
>
> Currently, the iommu_dma_prepare_msi() is called only once at the
> time of IRQ allocation so IOMMU DMA domain will only have mapping
> for one MSI page. This means iommu_dma_compose_msi_msg() called
> by imsic_irq_compose_msi_msg() will always use the same MSI page
> irrespective to target CPU MSI address. In other words, changing
> MSI (or IRQ) affinity for device using IOMMU DMA domain will not
> work.
>
> To address above issue, we do the following:
> 1) Map MSI pages for all CPUs in imsic_irq_domain_alloc()
>    using iommu_dma_prepare_msi().
> 2) Add a new iommu_dma_select_msi() API to select a specific
>    MSI page from a set of already mapped MSI pages.
> 3) Use iommu_dma_select_msi() to select a specific MSI page
>    before calling iommu_dma_compose_msi_msg() in
>    imsic_irq_compose_msi_msg().

Is there an iommu driver somewhere in all this? I don't obviously see
one?

There should be no reason to use the dma-iommu.c stuff just to make
interrupts work, that is only necessary if there is an iommu, and the
platform architecture requires the iommu to have the MSI region
programmed into IOPTEs.

And I'd be much happier if we could clean this design up before risc-v
starts using it too :\

Jason

On Wed, May 10, 2023 at 6:15 PM Conor Dooley <conor.dooley@microchip.com> wrote:
>
> On Mon, May 08, 2023 at 07:58:32PM +0530, Anup Patel wrote:
> > We add common riscv_fw_parent_hartid() which help device drivers
> > to get parent hartid of the INTC (i.e. local interrupt controller)
> > fwnode. Currently, this new function only supports device tree
> > but it can be extended to support ACPI as well.
> >
> > Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> > ---
> >  arch/riscv/include/asm/processor.h |  3 +++
> >  arch/riscv/kernel/cpu.c            | 12 ++++++++++++
> >  2 files changed, 15 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> > index 94a0590c6971..6fb8bbec8459 100644
> > --- a/arch/riscv/include/asm/processor.h
> > +++ b/arch/riscv/include/asm/processor.h
> > @@ -77,6 +77,9 @@ struct device_node;
> >  int riscv_of_processor_hartid(struct device_node *node, unsigned long *hartid);
> >  int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid);
> >
> > +struct fwnode_handle;
> > +int riscv_fw_parent_hartid(struct fwnode_handle *node, unsigned long *hartid);
> > +
> >  extern void riscv_fill_hwcap(void);
> >  extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
> >
> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > index 5de6fb703cc2..1adbe48b2b58 100644
> > --- a/arch/riscv/kernel/cpu.c
> > +++ b/arch/riscv/kernel/cpu.c
> > @@ -73,6 +73,18 @@ int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid)
> >  	return -1;
> >  }
> >
> > +/* Find hart ID of the CPU fwnode under which given fwnode falls. */
> > +int riscv_fw_parent_hartid(struct fwnode_handle *node, unsigned long *hartid)
> > +{
> > +	/*
> > +	 * Currently, this function only supports DT but it can be
> > +	 * extended to support ACPI as well.
> > +	 */
>
> Statement of the obvious here, no?
> Although, it seems a little odd to read this comment & the corresponding
> statement in the commit message, when the series appears to have been
> based on the ACPI?
>
> Perhaps by the time v4 comes around, ACPI support will have been merged
> & that'll be moot.

Yes, I was anyway going to update this in v4 to support both DT and ACPI.

>
> > +	if (!is_of_node(node))
> > +		return -EINVAL;
> > +	return riscv_of_parent_hartid(to_of_node(node), hartid);
>
> nit: blank line before the return here please.

Okay, I will update.

>
> Thanks,
> Conor.

Regards,
Anup
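
Editorial sketch of the shape under review in this sub-thread: a firmware-node helper that rejects anything other than a DT node with -EINVAL and otherwise defers to the DT walk, with the blank line before the final return that Conor asks for. It is a userspace mock under stated assumptions, not the v4 code. struct fw_node, enum fw_kind, of_parent_hartid() and fw_parent_hartid() are hypothetical stand-ins for struct fwnode_handle, is_of_node(), riscv_of_parent_hartid() and riscv_fw_parent_hartid(), and the parent walk is deliberately simplified.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types, for illustration only. */
enum fw_kind { FW_KIND_OF, FW_KIND_ACPI };

struct fw_node {
	enum fw_kind kind;		/* which firmware description owns the node */
	const struct fw_node *parent;	/* NULL at the root */
	int is_cpu;			/* non-zero if this node describes a CPU */
	unsigned long hartid;		/* only meaningful when is_cpu is set */
};

/* Walk up the DT-like tree until a CPU node supplies the hart ID. */
static int of_parent_hartid(const struct fw_node *node, unsigned long *hartid)
{
	for (; node; node = node->parent) {
		if (node->is_cpu) {
			*hartid = node->hartid;
			return 0;
		}
	}

	return -1;
}

/* Find the hart ID of the CPU node under which the given node falls. */
static int fw_parent_hartid(const struct fw_node *node, unsigned long *hartid)
{
	assert(hartid != NULL);

	/* Only the DT flavour is handled in this mock. */
	if (!node || node->kind != FW_KIND_OF)
		return -EINVAL;

	return of_parent_hartid(node, hartid);
}
```

An ACPI-aware version would add a second branch on the node kind ahead of the -EINVAL fallback; how that lookup would work is not covered by this thread, so it is left out here.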