
[RFC,00/17] Solve iommu probe races around iommu_fwspec

Message ID: 0-v1-5f734af130a3+34f-iommu_fwspec_jgg@nvidia.com
Series: Solve iommu probe races around iommu_fwspec

Message

Jason Gunthorpe Nov. 3, 2023, 4:44 p.m. UTC
This is a more complete solution than the first attempt here:
https://lore.kernel.org/r/1698825902-10685-1-git-send-email-quic_zhenhuah@quicinc.com

I haven't been able to test this on any HW that touches these paths, so if
some people with HW can help get it in shape it can become non-RFC.


The iommu subsystem uses dev->iommu to store bits of information about the
attached iommu driver. This has been co-opted by the ACPI/OF code to also
be a place to pass around the iommu_fwspec before a driver is probed.

Since both use the same pointers without any locking, concurrent driver
loading triggers races:

     CPU0                                     CPU1
of_iommu_configure()                iommu_device_register()
 ..                                   bus_iommu_probe()
  iommu_fwspec_of_xlate()              __iommu_probe_device()
                                        iommu_init_device()
   dev_iommu_get()
                                          .. ops->probe fails, no fwspec ..
                                          dev_iommu_free()
   dev->iommu->fwspec    *crash*

My first attempt to get the locking right here was to use the device_lock
to protect the entire *_iommu_configure() and iommu_probe() paths. This
allowed safe use of dev->iommu within those paths. Unfortunately, enough
drivers abuse the of_iommu_configure() flow without proper locking that
this approach failed.

This approach removes the accesses to dev->iommu from the
*_iommu_configure() code. The few accesses that are still required are
moved into iommu.c and protected with the existing iommu_probe_device_lock.

To do this, we change *_iommu_configure() to keep the iommu_fwspec in a
local variable while it is being built, instead of in dev->iommu. Once it
is fully formed, the core code installs it into dev->iommu when it calls
probe.
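A minimal userspace sketch of the "build locally, then install under the lock" idea, assuming heavily simplified stand-ins: struct fwspec, struct dev_iommu, struct device, probe_lock, fwspec_append_id() and probe_device_fwspec() are all hypothetical illustrations, not the kernel's types or APIs. The point is that the build phase only writes to a private object, so a concurrent probe on another CPU has nothing to free out from under it; dev->iommu is only written while probe_lock is held.

```c
#include <pthread.h>
#include <stdlib.h>

#define FWSPEC_MAX_IDS 8

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct fwspec {
	unsigned int num_ids;
	unsigned int ids[FWSPEC_MAX_IDS];
};

struct dev_iommu {
	struct fwspec *fwspec;
};

struct device {
	struct dev_iommu *iommu;
};

/* Plays the role of iommu_probe_device_lock: it guards dev->iommu. */
static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;

/* Build phase: writes only to the caller's private fwspec. */
static int fwspec_append_id(struct fwspec *fwspec, unsigned int id)
{
	if (fwspec->num_ids >= FWSPEC_MAX_IDS)
		return -1;
	fwspec->ids[fwspec->num_ids++] = id;
	return 0;
}

/*
 * Install phase: publish the finished fwspec into dev->iommu under the
 * lock. Ownership always passes in: on failure the fwspec is freed here.
 * This sketch assumes no fwspec was installed on the device before.
 */
static int probe_device_fwspec(struct device *dev, struct fwspec *fwspec)
{
	int ret = 0;

	pthread_mutex_lock(&probe_lock);
	if (!dev->iommu)
		dev->iommu = calloc(1, sizeof(*dev->iommu));
	if (dev->iommu)
		dev->iommu->fwspec = fwspec;
	else
		ret = -1;
	pthread_mutex_unlock(&probe_lock);

	if (ret)
		free(fwspec);
	return ret;
}

/* Usage: build two IDs privately, install them, read the count back. */
static unsigned int demo_install(void)
{
	struct device dev = { 0 };
	struct fwspec *fwspec = calloc(1, sizeof(*fwspec));
	unsigned int n;

	if (!fwspec)
		return 0;
	fwspec_append_id(fwspec, 0x10);
	fwspec_append_id(fwspec, 0x11);
	if (probe_device_fwspec(&dev, fwspec))
		return 0;

	pthread_mutex_lock(&probe_lock);
	n = dev.iommu->fwspec->num_ids;
	pthread_mutex_unlock(&probe_lock);

	free(dev.iommu->fwspec);
	free(dev.iommu);
	return n;
}
```

demo_install() should return 2. The design choice being illustrated is that nothing becomes visible through dev->iommu until it is complete, which is what closes the window shown in the CPU0/CPU1 diagram.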

This also removes all uses of iommu_ops from the *_iommu_configure() paths
and makes that mechanism private to the iommu core.

A few more lockdep assertions are added to discourage future misuse.
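As a rough userspace analogue of what a lockdep assertion buys, the sketch below tracks lock ownership per thread (as lockdep does per task) and asserts it in a setter that must only run under the lock. The names probe_lock_acquire(), assert_probe_lock_held() and priv_set_locked() are hypothetical; in the kernel the check would be lockdep_assert_held() on iommu_probe_device_lock, which is compiled out without CONFIG_LOCKDEP much as assert() is compiled out under NDEBUG.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;
/* Per-thread flag: like lockdep, ownership is tracked per task. */
static _Thread_local bool probe_lock_held;

static void probe_lock_acquire(void)
{
	pthread_mutex_lock(&probe_lock);
	probe_lock_held = true;
}

static void probe_lock_release(void)
{
	probe_lock_held = false;
	pthread_mutex_unlock(&probe_lock);
}

/* Rough analogue of lockdep_assert_held(&iommu_probe_device_lock). */
#define assert_probe_lock_held() assert(probe_lock_held)

struct dev_iommu {
	void *priv;
};

/* Hypothetical setter that, like a lockdep-annotated setter, needs the lock. */
static void priv_set_locked(struct dev_iommu *iommu, void *priv)
{
	assert_probe_lock_held();
	iommu->priv = priv;
}

/* Usage: the locked call passes; calling without the lock would abort. */
static bool demo_locked_set(void)
{
	static int token;
	struct dev_iommu iommu = { 0 };
	bool held_inside;

	probe_lock_acquire();
	priv_set_locked(&iommu, &token);
	held_inside = probe_lock_held;
	probe_lock_release();

	return iommu.priv == &token && held_inside && !probe_lock_held;
}
```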

This is on github: https://github.com/jgunthorpe/linux/commits/iommu_fwspec

Jason Gunthorpe (17):
  iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()
  of: Do not return struct iommu_ops from of_iommu_configure()
  of: Use -ENODEV consistently in of_iommu_configure()
  acpi: Do not return struct iommu_ops from acpi_iommu_configure_id()
  iommu: Make iommu_fwspec->ids a distinct allocation
  iommu: Add iommu_fwspec_alloc/dealloc()
  iommu: Add iommu_probe_device_fwspec()
  of: Do not use dev->iommu within of_iommu_configure()
  iommu: Add iommu_fwspec_append_ids()
  acpi: Do not use dev->iommu within acpi_iommu_configure()
  iommu: Hold iommu_probe_device_lock while calling ops->of_xlate
  iommu: Make iommu_ops_from_fwnode() static
  iommu: Remove dev_iommu_fwspec_set()
  iommu: Remove pointless iommu_fwspec_free()
  iommu: Add ops->of_xlate_fwspec()
  iommu: Mark dev_iommu_get() with lockdep
  iommu: Mark dev_iommu_priv_set() with a lockdep

 arch/arc/mm/dma.c                           |   2 +-
 arch/arm/mm/dma-mapping-nommu.c             |   2 +-
 arch/arm/mm/dma-mapping.c                   |  10 +-
 arch/arm64/mm/dma-mapping.c                 |   4 +-
 arch/mips/mm/dma-noncoherent.c              |   2 +-
 arch/riscv/mm/dma-noncoherent.c             |   2 +-
 drivers/acpi/arm64/iort.c                   |  39 ++--
 drivers/acpi/scan.c                         | 104 +++++----
 drivers/acpi/viot.c                         |  44 ++--
 drivers/hv/hv_common.c                      |   2 +-
 drivers/iommu/amd/iommu.c                   |   2 -
 drivers/iommu/apple-dart.c                  |   1 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   9 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c       |  23 +-
 drivers/iommu/intel/iommu.c                 |   2 -
 drivers/iommu/iommu.c                       | 227 +++++++++++++++-----
 drivers/iommu/of_iommu.c                    | 129 +++++------
 drivers/iommu/omap-iommu.c                  |   1 -
 drivers/iommu/tegra-smmu.c                  |   1 -
 drivers/iommu/virtio-iommu.c                |   8 +-
 drivers/of/device.c                         |  24 ++-
 include/acpi/acpi_bus.h                     |   8 +-
 include/linux/acpi_iort.h                   |   3 +-
 include/linux/acpi_viot.h                   |   5 +-
 include/linux/dma-map-ops.h                 |   4 +-
 include/linux/iommu.h                       |  46 ++--
 include/linux/of_iommu.h                    |  13 +-
 27 files changed, 417 insertions(+), 300 deletions(-)


base-commit: ab41f1aafb43c2555b358147b14b4d7b8105b452

Comments

Jerry Snitselaar Nov. 4, 2023, 12:48 a.m. UTC | #1
On Fri, Nov 03, 2023 at 01:44:49PM -0300, Jason Gunthorpe wrote:
> Nothing needs this pointer. Return a normal error code with the usual
> IOMMU semantic that ENODEV means 'there is no IOMMU driver'.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/acpi/scan.c | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 

...

>  #else /* !CONFIG_IOMMU_API */
> @@ -1623,7 +1624,7 @@ static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
>  int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
>  			  const u32 *input_id)
>  {
> -	const struct iommu_ops *iommu;
> +	int ret;
>  
>  	if (attr == DEV_DMA_NOT_SUPPORTED) {
>  		set_dma_ops(dev, &dma_dummy_ops);
> @@ -1632,10 +1633,15 @@ int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
>  
>  	acpi_arch_dma_setup(dev);
>  
> -	iommu = acpi_iommu_configure_id(dev, input_id);
> -	if (PTR_ERR(iommu) == -EPROBE_DEFER)
> +	ret = acpi_iommu_configure_id(dev, input_id);
> +	if (ret == -EPROBE_DEFER)
>  		return -EPROBE_DEFER;
>  
                return ret; ?

> +	/*
> +	 * Historically this routine doesn't fail driver probing due to errors
> +	 * in acpi_iommu_configure()

              acpi_iommu_configure_id()

> +	 */
> +
>  	arch_setup_dma_ops(dev, 0, U64_MAX, attr == DEV_DMA_COHERENT);
>  
>  	return 0;
> -- 
> 2.42.0
>
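The error policy in the quoted acpi_dma_configure_id() hunk can be modeled as below: only -EPROBE_DEFER from the IOMMU step fails the probe, while every other result, including -ENODEV ("there is no IOMMU driver"), falls through to the arch DMA setup. This is a hypothetical sketch; dma_configure_policy() is not a kernel function, and EPROBE_DEFER is defined locally because it is a kernel-internal errno that userspace headers do not provide. Since the branch only runs when ret == -EPROBE_DEFER, "return ret;" and "return -EPROBE_DEFER;" return the same value.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal errno value (include/linux/errno.h); not in userspace headers. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/*
 * Hypothetical model of the policy: only -EPROBE_DEFER from the IOMMU
 * step fails the probe; any other result, including -ENODEV, still
 * performs the arch DMA setup and reports success.
 */
static int dma_configure_policy(int iommu_ret, bool *arch_setup_called)
{
	*arch_setup_called = false;

	if (iommu_ret == -EPROBE_DEFER)
		return iommu_ret; /* same value as "return -EPROBE_DEFER;" */

	/* Stands in for arch_setup_dma_ops(dev, 0, U64_MAX, coherent). */
	*arch_setup_called = true;
	return 0;
}
```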
Jason Gunthorpe Nov. 5, 2023, 1:24 p.m. UTC | #2
On Fri, Nov 03, 2023 at 05:48:01PM -0700, Jerry Snitselaar wrote:
> > @@ -1632,10 +1633,15 @@ int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
> >  
> >  	acpi_arch_dma_setup(dev);
> >  
> > -	iommu = acpi_iommu_configure_id(dev, input_id);
> > -	if (PTR_ERR(iommu) == -EPROBE_DEFER)
> > +	ret = acpi_iommu_configure_id(dev, input_id);
> > +	if (ret == -EPROBE_DEFER)
> >  		return -EPROBE_DEFER;
> >  
>                 return ret; ?

Maybe? This seemed to be a pattern in this code, so I left it.

> > +	/*
> > +	 * Historically this routine doesn't fail driver probing due to errors
> > +	 * in acpi_iommu_configure()
> 
>               acpi_iommu_configure_id()

Thanks

Jason
Jerry Snitselaar Nov. 5, 2023, 5:55 p.m. UTC | #3
On Sun, Nov 05, 2023 at 09:24:09AM -0400, Jason Gunthorpe wrote:
> On Fri, Nov 03, 2023 at 05:48:01PM -0700, Jerry Snitselaar wrote:
> > > @@ -1632,10 +1633,15 @@ int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
> > >  
> > >  	acpi_arch_dma_setup(dev);
> > >  
> > > -	iommu = acpi_iommu_configure_id(dev, input_id);
> > > -	if (PTR_ERR(iommu) == -EPROBE_DEFER)
> > > +	ret = acpi_iommu_configure_id(dev, input_id);
> > > +	if (ret == -EPROBE_DEFER)
> > >  		return -EPROBE_DEFER;
> > >  
> >                 return ret; ?
> 
> Maybe? Like this seemed to be a pattern in this code so I left it

Yeah, it is fine. I think it just caught my eye because of this earlier
bit in the patch:

        if (err == -EPROBE_DEFER) {
-               return ERR_PTR(err);
+               return err;

which needed to get rid of the ERR_PTR.
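For context on why the ERR_PTR() had to go: the old code encoded the errno inside the returned pointer, while the new code returns a plain int. Below is a userspace re-creation of the include/linux/err.h idiom (the top MAX_ERRNO = 4095 addresses are reserved for encoded errors), with two hypothetical functions, configure_ptr() and configure_int(), contrasting the two return styles. It is a sketch, not the kernel header.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal errno */
#endif

/* Userspace re-creation of the include/linux/err.h idiom. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* The top MAX_ERRNO addresses are reserved to encode -1..-MAX_ERRNO. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct ops {
	int id;
};

static const struct ops example_ops = { .id = 1 };

/* Old style: a pointer that may carry an encoded errno. */
static const struct ops *configure_ptr(int err)
{
	return err ? err_ptr(err) : &example_ops;
}

/* New style: plain 0 / -errno, nothing to decode. */
static int configure_int(int err)
{
	return err;
}
```

With the int-returning style, a leftover "return ERR_PTR(err);" would no longer type-check cleanly, which is why that line had to become "return err;".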

Regards,
Jerry

> 
> > > +	/*
> > > +	 * Historically this routine doesn't fail driver probing due to errors
> > > +	 * in acpi_iommu_configure()
> > 
> >               acpi_iommu_configure_id()
> 
> Thanks
> 
> Jason
Rafael J. Wysocki Nov. 6, 2023, 2:36 p.m. UTC | #4
On Fri, Nov 3, 2023 at 5:45 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> This call chain is using dev->iommu->fwspec to pass around the fwspec
> between the three parts (acpi_iommu_configure(), acpi_iommu_fwspec_init(),
> iommu_probe_device()).
>
> However there is no locking around the accesses to dev->iommu, so this is
> all racy.
>
> Allocate a clean, local fwspec at the start of acpi_iommu_configure(),
> pass it through all functions on the stack to fill it with data, and
> finally pass it into iommu_probe_device_fwspec() which will load it into
> dev->iommu under a lock.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
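The patch threads the local fwspec through pci_for_each_dma_alias() by bundling it with the device in a context struct (iort_pci_alias_info gains a fwspec member, and viot_pci_alias_info is added). A minimal sketch of that callback-context pattern follows; for_each_alias() is a hypothetical stand-in for pci_for_each_dma_alias() that, like it, stops at the first nonzero return, and the struct layouts are simplified assumptions.

```c
#include <errno.h>
#include <stddef.h>

#define FWSPEC_MAX_IDS 4

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct fwspec {
	unsigned int num_ids;
	unsigned int ids[FWSPEC_MAX_IDS];
};

struct device {
	const char *name;
};

/* Same shape as iort_pci_alias_info / viot_pci_alias_info in the patch. */
struct alias_info {
	struct device *dev;
	struct fwspec *fwspec;
};

/*
 * Hypothetical stand-in for pci_for_each_dma_alias(): calls fn for each
 * alias and stops at the first nonzero return value.
 */
static int for_each_alias(const unsigned short *aliases, size_t n,
			  int (*fn)(unsigned short alias, void *data),
			  void *data)
{
	int ret = 0;

	for (size_t i = 0; i < n && !ret; i++)
		ret = fn(aliases[i], data);
	return ret;
}

/* Callback: unpacks the context and appends to the caller's local fwspec. */
static int record_alias(unsigned short alias, void *data)
{
	struct alias_info *info = data;

	if (info->fwspec->num_ids >= FWSPEC_MAX_IDS)
		return -ENOSPC;
	info->fwspec->ids[info->fwspec->num_ids++] = alias;
	return 0;
}

/* Usage: both aliases land in the local fwspec; no dev->iommu is involved. */
static unsigned int demo_aliases(void)
{
	static const unsigned short aliases[] = { 0x100, 0x108 };
	struct device dev = { .name = "0000:01:00.0" };
	struct fwspec fwspec = { 0 };
	struct alias_info info = { .dev = &dev, .fwspec = &fwspec };

	if (for_each_alias(aliases, 2, record_alias, &info))
		return 0;
	return fwspec.num_ids;
}
```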

> ---
>  drivers/acpi/arm64/iort.c | 39 ++++++++---------
>  drivers/acpi/scan.c       | 89 ++++++++++++++++++---------------------
>  drivers/acpi/viot.c       | 44 ++++++++++---------
>  drivers/iommu/iommu.c     |  5 +--
>  include/acpi/acpi_bus.h   |  8 ++--
>  include/linux/acpi_iort.h |  3 +-
>  include/linux/acpi_viot.h |  5 ++-
>  include/linux/iommu.h     |  2 +
>  8 files changed, 97 insertions(+), 98 deletions(-)
>
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 6496ff5a6ba20d..accd01dcfe93f5 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1218,10 +1218,9 @@ static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
>         return pci_rc->ats_attribute & ACPI_IORT_ATS_SUPPORTED;
>  }
>
> -static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node,
> -                           u32 streamid)
> +static int iort_iommu_xlate(struct iommu_fwspec *fwspec, struct device *dev,
> +                           struct acpi_iort_node *node, u32 streamid)
>  {
> -       const struct iommu_ops *ops;
>         struct fwnode_handle *iort_fwnode;
>
>         if (!node)
> @@ -1239,17 +1238,14 @@ static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node,
>          * in the kernel or not, defer the IOMMU configuration
>          * or just abort it.
>          */
> -       ops = iommu_ops_from_fwnode(iort_fwnode);
> -       if (!ops)
> -               return iort_iommu_driver_enabled(node->type) ?
> -                      -EPROBE_DEFER : -ENODEV;
> -
> -       return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops);
> +       return acpi_iommu_fwspec_init(fwspec, dev, streamid, iort_fwnode,
> +                                     iort_iommu_driver_enabled(node->type));
>  }
>
>  struct iort_pci_alias_info {
>         struct device *dev;
>         struct acpi_iort_node *node;
> +       struct iommu_fwspec *fwspec;
>  };
>
>  static int iort_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
> @@ -1260,7 +1256,7 @@ static int iort_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
>
>         parent = iort_node_map_id(info->node, alias, &streamid,
>                                   IORT_IOMMU_TYPE);
> -       return iort_iommu_xlate(info->dev, parent, streamid);
> +       return iort_iommu_xlate(info->fwspec, info->dev, parent, streamid);
>  }
>
>  static void iort_named_component_init(struct device *dev,
> @@ -1280,7 +1276,8 @@ static void iort_named_component_init(struct device *dev,
>                 dev_warn(dev, "Could not add device properties\n");
>  }
>
> -static int iort_nc_iommu_map(struct device *dev, struct acpi_iort_node *node)
> +static int iort_nc_iommu_map(struct iommu_fwspec *fwspec, struct device *dev,
> +                            struct acpi_iort_node *node)
>  {
>         struct acpi_iort_node *parent;
>         int err = -ENODEV, i = 0;
> @@ -1293,13 +1290,13 @@ static int iort_nc_iommu_map(struct device *dev, struct acpi_iort_node *node)
>                                                    i++);
>
>                 if (parent)
> -                       err = iort_iommu_xlate(dev, parent, streamid);
> +                       err = iort_iommu_xlate(fwspec, dev, parent, streamid);
>         } while (parent && !err);
>
>         return err;
>  }
>
> -static int iort_nc_iommu_map_id(struct device *dev,
> +static int iort_nc_iommu_map_id(struct iommu_fwspec *fwspec, struct device *dev,
>                                 struct acpi_iort_node *node,
>                                 const u32 *in_id)
>  {
> @@ -1308,7 +1305,7 @@ static int iort_nc_iommu_map_id(struct device *dev,
>
>         parent = iort_node_map_id(node, *in_id, &streamid, IORT_IOMMU_TYPE);
>         if (parent)
> -               return iort_iommu_xlate(dev, parent, streamid);
> +               return iort_iommu_xlate(fwspec, dev, parent, streamid);
>
>         return -ENODEV;
>  }
> @@ -1322,15 +1319,16 @@ static int iort_nc_iommu_map_id(struct device *dev,
>   *
>   * Returns: 0 on success, <0 on failure
>   */
> -int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
> +int iort_iommu_configure_id(struct iommu_fwspec *fwspec, struct device *dev,
> +                           const u32 *id_in)
>  {
>         struct acpi_iort_node *node;
>         int err = -ENODEV;
>
>         if (dev_is_pci(dev)) {
> -               struct iommu_fwspec *fwspec;
>                 struct pci_bus *bus = to_pci_dev(dev)->bus;
> -               struct iort_pci_alias_info info = { .dev = dev };
> +               struct iort_pci_alias_info info = { .dev = dev,
> +                                                   .fwspec = fwspec };
>
>                 node = iort_scan_node(ACPI_IORT_NODE_PCI_ROOT_COMPLEX,
>                                       iort_match_node_callback, &bus->dev);
> @@ -1341,8 +1339,7 @@ int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
>                 err = pci_for_each_dma_alias(to_pci_dev(dev),
>                                              iort_pci_iommu_init, &info);
>
> -               fwspec = dev_iommu_fwspec_get(dev);
> -               if (fwspec && iort_pci_rc_supports_ats(node))
> +               if (iort_pci_rc_supports_ats(node))
>                         fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS;
>         } else {
>                 node = iort_scan_node(ACPI_IORT_NODE_NAMED_COMPONENT,
> @@ -1350,8 +1347,8 @@ int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
>                 if (!node)
>                         return -ENODEV;
>
> -               err = id_in ? iort_nc_iommu_map_id(dev, node, id_in) :
> -                             iort_nc_iommu_map(dev, node);
> +               err = id_in ? iort_nc_iommu_map_id(fwspec, dev, node, id_in) :
> +                             iort_nc_iommu_map(fwspec, dev, node);
>
>                 if (!err)
>                         iort_named_component_init(dev, node);
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index fbabde001a23a2..1e01a8e0316867 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1543,74 +1543,67 @@ int acpi_dma_get_range(struct device *dev, const struct bus_dma_region **map)
>  }
>
>  #ifdef CONFIG_IOMMU_API
> -int acpi_iommu_fwspec_init(struct device *dev, u32 id,
> -                          struct fwnode_handle *fwnode,
> -                          const struct iommu_ops *ops)
> +int acpi_iommu_fwspec_init(struct iommu_fwspec *fwspec, struct device *dev,
> +                          u32 id, struct fwnode_handle *fwnode,
> +                          bool iommu_driver_available)
>  {
> -       int ret = iommu_fwspec_init(dev, fwnode, ops);
> +       int ret;
>
> -       if (!ret)
> -               ret = iommu_fwspec_add_ids(dev, &id, 1);
> -
> -       return ret;
> -}
> -
> -static inline const struct iommu_ops *acpi_iommu_fwspec_ops(struct device *dev)
> -{
> -       struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> -
> -       return fwspec ? fwspec->ops : NULL;
> +       ret = iommu_fwspec_assign_iommu(fwspec, dev, fwnode);
> +       if (ret) {
> +               if (ret == -EPROBE_DEFER && !iommu_driver_available)
> +                       return -ENODEV;
> +               return ret;
> +       }
> +       return iommu_fwspec_append_ids(fwspec, &id, 1);
>  }
>
>  static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
>  {
>         int err;
> -       const struct iommu_ops *ops;
> +       struct iommu_fwspec *fwspec;
>
> -       /*
> -        * If we already translated the fwspec there is nothing left to do,
> -        * return the iommu_ops.
> -        */
> -       ops = acpi_iommu_fwspec_ops(dev);
> -       if (ops)
> -               return 0;
> +       fwspec = iommu_fwspec_alloc();
> +       if (IS_ERR(fwspec))
> +               return PTR_ERR(fwspec);
>
> -       err = iort_iommu_configure_id(dev, id_in);
> -       if (err && err != -EPROBE_DEFER)
> -               err = viot_iommu_configure(dev);
> +       err = iort_iommu_configure_id(fwspec, dev, id_in);
> +       if (err == -ENODEV)
> +               err = viot_iommu_configure(fwspec, dev);
> +       if (err == -ENODEV || err == -EPROBE_DEFER)
> +               goto err_free;
> +       if (err)
> +               goto err_log;
>
> -       /*
> -        * If we have reason to believe the IOMMU driver missed the initial
> -        * iommu_probe_device() call for dev, replay it to get things in order.
> -        */
> -       if (!err && dev->bus)
> -               err = iommu_probe_device(dev);
> -
> -       /* Ignore all other errors apart from EPROBE_DEFER */
> -       if (err == -EPROBE_DEFER) {
> -               return err;
> -       } else if (err) {
> -               dev_dbg(dev, "Adding to IOMMU failed: %d\n", err);
> -               return -ENODEV;
> +       err = iommu_probe_device_fwspec(dev, fwspec);
> +       if (err) {
> +               /*
> +                * Ownership for fwspec always passes into
> +                * iommu_probe_device_fwspec()
> +                */
> +               fwspec = NULL;
> +               goto err_log;
>         }
> -       if (!acpi_iommu_fwspec_ops(dev))
> -               return -ENODEV;
> -       return 0;
> +       return 0;
> +err_log:
> +       dev_dbg(dev, "Adding to IOMMU failed: %d\n", err);
> +err_free:
> +       iommu_fwspec_dealloc(fwspec);
> +       return err;
>  }
>
>  #else /* !CONFIG_IOMMU_API */
>
> -int acpi_iommu_fwspec_init(struct device *dev, u32 id,
> -                          struct fwnode_handle *fwnode,
> -                          const struct iommu_ops *ops)
> +int acpi_iommu_fwspec_init(struct iommu_fwspec *fwspec, struct device *dev,
> +                          u32 id, struct fwnode_handle *fwnode,
> +                          bool iommu_driver_available)
>  {
>         return -ENODEV;
>  }
>
> -static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
> -                                                      const u32 *id_in)
> +static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
>  {
> -       return NULL;
> +       return -ENODEV;
>  }
>
>  #endif /* !CONFIG_IOMMU_API */
> diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
> index c8025921c129b2..33b511dd202d15 100644
> --- a/drivers/acpi/viot.c
> +++ b/drivers/acpi/viot.c
> @@ -304,11 +304,9 @@ void __init acpi_viot_init(void)
>         acpi_put_table(hdr);
>  }
>
> -static int viot_dev_iommu_init(struct device *dev, struct viot_iommu *viommu,
> -                              u32 epid)
> +static int viot_dev_iommu_init(struct iommu_fwspec *fwspec, struct device *dev,
> +                              struct viot_iommu *viommu, u32 epid)
>  {
> -       const struct iommu_ops *ops;
> -
>         if (!viommu)
>                 return -ENODEV;
>
> @@ -316,19 +314,20 @@ static int viot_dev_iommu_init(struct device *dev, struct viot_iommu *viommu,
>         if (device_match_fwnode(dev, viommu->fwnode))
>                 return -EINVAL;
>
> -       ops = iommu_ops_from_fwnode(viommu->fwnode);
> -       if (!ops)
> -               return IS_ENABLED(CONFIG_VIRTIO_IOMMU) ?
> -                       -EPROBE_DEFER : -ENODEV;
> -
> -       return acpi_iommu_fwspec_init(dev, epid, viommu->fwnode, ops);
> +       return acpi_iommu_fwspec_init(fwspec, dev, epid, viommu->fwnode,
> +                                     IS_ENABLED(CONFIG_VIRTIO_IOMMU));
>  }
>
> +struct viot_pci_alias_info {
> +       struct device *dev;
> +       struct iommu_fwspec *fwspec;
> +};
> +
>  static int viot_pci_dev_iommu_init(struct pci_dev *pdev, u16 dev_id, void *data)
>  {
>         u32 epid;
>         struct viot_endpoint *ep;
> -       struct device *aliased_dev = data;
> +       struct viot_pci_alias_info *info = data;
>         u32 domain_nr = pci_domain_nr(pdev->bus);
>
>         list_for_each_entry(ep, &viot_pci_ranges, list) {
> @@ -339,14 +338,15 @@ static int viot_pci_dev_iommu_init(struct pci_dev *pdev, u16 dev_id, void *data)
>                         epid = ((domain_nr - ep->segment_start) << 16) +
>                                 dev_id - ep->bdf_start + ep->endpoint_id;
>
> -                       return viot_dev_iommu_init(aliased_dev, ep->viommu,
> -                                                  epid);
> +                       return viot_dev_iommu_init(info->fwspec, info->dev,
> +                                                  ep->viommu, epid);
>                 }
>         }
>         return -ENODEV;
>  }
>
> -static int viot_mmio_dev_iommu_init(struct platform_device *pdev)
> +static int viot_mmio_dev_iommu_init(struct iommu_fwspec *fwspec,
> +                                   struct platform_device *pdev)
>  {
>         struct resource *mem;
>         struct viot_endpoint *ep;
> @@ -357,8 +357,8 @@ static int viot_mmio_dev_iommu_init(struct platform_device *pdev)
>
>         list_for_each_entry(ep, &viot_mmio_endpoints, list) {
>                 if (ep->address == mem->start)
> -                       return viot_dev_iommu_init(&pdev->dev, ep->viommu,
> -                                                  ep->endpoint_id);
> +                       return viot_dev_iommu_init(fwspec, &pdev->dev,
> +                                                  ep->viommu, ep->endpoint_id);
>         }
>         return -ENODEV;
>  }
> @@ -369,12 +369,16 @@ static int viot_mmio_dev_iommu_init(struct platform_device *pdev)
>   *
>   * Return: 0 on success, <0 on failure
>   */
> -int viot_iommu_configure(struct device *dev)
> +int viot_iommu_configure(struct iommu_fwspec *fwspec, struct device *dev)
>  {
> -       if (dev_is_pci(dev))
> +       if (dev_is_pci(dev)) {
> +               struct viot_pci_alias_info info = { .dev = dev,
> +                                                   .fwspec = fwspec };
>                 return pci_for_each_dma_alias(to_pci_dev(dev),
> -                                             viot_pci_dev_iommu_init, dev);
> +                                             viot_pci_dev_iommu_init, &info);
> +       }
>         else if (dev_is_platform(dev))
> -               return viot_mmio_dev_iommu_init(to_platform_device(dev));
> +               return viot_mmio_dev_iommu_init(fwspec,
> +                                               to_platform_device(dev));
>         return -ENODEV;
>  }
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 15dbe2d9eb24c2..9cfba9d12d1400 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2960,9 +2960,8 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
>         return ops;
>  }
>
> -static int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec,
> -                                    struct device *dev,
> -                                    struct fwnode_handle *iommu_fwnode)
> +int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec, struct device *dev,
> +                             struct fwnode_handle *iommu_fwnode)
>  {
>         const struct iommu_ops *ops;
>
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index 254685085c825c..70f97096c776e4 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -12,6 +12,8 @@
>  #include <linux/device.h>
>  #include <linux/property.h>
>
> +struct iommu_fwspec;
> +
>  /* TBD: Make dynamic */
>  #define ACPI_MAX_HANDLES       10
>  struct acpi_handle_list {
> @@ -625,9 +627,9 @@ struct acpi_pci_root {
>
>  bool acpi_dma_supported(const struct acpi_device *adev);
>  enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
> -int acpi_iommu_fwspec_init(struct device *dev, u32 id,
> -                          struct fwnode_handle *fwnode,
> -                          const struct iommu_ops *ops);
> +int acpi_iommu_fwspec_init(struct iommu_fwspec *fwspec, struct device *dev,
> +                          u32 id, struct fwnode_handle *fwnode,
> +                          bool iommu_driver_available);
>  int acpi_dma_get_range(struct device *dev, const struct bus_dma_region **map);
>  int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
>                            const u32 *input_id);
> diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
> index 1cb65592c95dd3..80794ec45d1693 100644
> --- a/include/linux/acpi_iort.h
> +++ b/include/linux/acpi_iort.h
> @@ -40,7 +40,8 @@ void iort_put_rmr_sids(struct fwnode_handle *iommu_fwnode,
>                        struct list_head *head);
>  /* IOMMU interface */
>  int iort_dma_get_ranges(struct device *dev, u64 *size);
> -int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
> +int iort_iommu_configure_id(struct iommu_fwspec *fwspec, struct device *dev,
> +                           const u32 *id_in);
>  void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head);
>  phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
>  #else
> diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
> index a5a12243156377..f1874cb6d43c09 100644
> --- a/include/linux/acpi_viot.h
> +++ b/include/linux/acpi_viot.h
> @@ -8,11 +8,12 @@
>  #ifdef CONFIG_ACPI_VIOT
>  void __init acpi_viot_early_init(void);
>  void __init acpi_viot_init(void);
> -int viot_iommu_configure(struct device *dev);
> +int viot_iommu_configure(struct iommu_fwspec *fwspec, struct device *dev);
>  #else
>  static inline void acpi_viot_early_init(void) {}
>  static inline void acpi_viot_init(void) {}
> -static inline int viot_iommu_configure(struct device *dev)
> +static inline int viot_iommu_configure(struct iommu_fwspec *fwspec,
> +                                      struct device *dev)
>  {
>         return -ENODEV;
>  }
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index c5a5e2b5e2cc2a..27e4605d498850 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -688,6 +688,8 @@ void iommu_fwspec_dealloc(struct iommu_fwspec *fwspec);
>  int iommu_fwspec_of_xlate(struct iommu_fwspec *fwspec, struct device *dev,
>                           struct fwnode_handle *iommu_fwnode,
>                           struct of_phandle_args *iommu_spec);
> +int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec, struct device *dev,
> +                             struct fwnode_handle *iommu_fwnode);
>
>  int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
>                       const struct iommu_ops *ops);
> --
> 2.42.0
>
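The ownership rule stated in the acpi_iommu_configure_id() hunk ("Ownership for fwspec always passes into iommu_probe_device_fwspec()") plus the IORT-then-VIOT fallback can be sketched as follows. Everything here is hypothetical: the iort_ret/viot_ret/probe_ret knobs replace the real parsers, "installed" models dev->iommu->fwspec, and the dev_dbg() logging is omitted. The key property is that the caller frees the fwspec only on paths where it still owns it, and returns directly after handing it to the probe.

```c
#include <errno.h>
#include <stdlib.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal errno */
#endif

struct fwspec {
	unsigned int num_ids;
};

/* Hypothetical knobs standing in for the IORT, VIOT and probe results. */
static int iort_ret, viot_ret, probe_ret;
/* Models dev->iommu->fwspec after a successful probe. */
static struct fwspec *installed;

static int iort_configure(struct fwspec *fwspec)
{
	if (!iort_ret)
		fwspec->num_ids++;
	return iort_ret;
}

static int viot_configure(struct fwspec *fwspec)
{
	if (!viot_ret)
		fwspec->num_ids++;
	return viot_ret;
}

/* Always takes ownership of fwspec, whether it succeeds or fails. */
static int probe_device_fwspec(struct fwspec *fwspec)
{
	if (probe_ret) {
		free(fwspec);
		return probe_ret;
	}
	installed = fwspec;
	return 0;
}

static int configure_id(void)
{
	struct fwspec *fwspec = calloc(1, sizeof(*fwspec));
	int err;

	if (!fwspec)
		return -ENOMEM;

	err = iort_configure(fwspec);
	if (err == -ENODEV)
		err = viot_configure(fwspec); /* only if IORT describes nothing */
	if (err)
		goto err_free; /* fwspec is still owned here */

	/* Ownership passes in; fwspec must not be touched or freed after this. */
	return probe_device_fwspec(fwspec);

err_free:
	free(fwspec);
	return err;
}
```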
André Draszik Nov. 8, 2023, 6:34 p.m. UTC | #5
Hi Jason,

On Fri, 2023-11-03 at 13:44 -0300, Jason Gunthorpe wrote:
> This is a more complete solution that the first attempt here:
> https://lore.kernel.org/r/1698825902-10685-1-git-send-email-quic_zhenhuah@quicinc.com
> 
> I haven't been able to test this on any HW that touches these paths, so if
> some people with HW can help get it in shape it can become non-RFC.

Thank you for this series.

Please note that we're also observing this issue on 6.1.
I think this series is a good candidate for a back port (slightly complicated by
the fact that various refactors have happened since).

For me, it's working fine so far on master, and I've also done my own back port
to 6.1 and am currently testing both. An official back port once finalised
could be useful, though :-)


Cheers,
Andre'
Jason Gunthorpe Nov. 8, 2023, 7:22 p.m. UTC | #6
On Wed, Nov 08, 2023 at 06:34:58PM +0000, André Draszik wrote:

> For me, it's working fine so far on master, and I've also done my own back port
> to 6.1 and am currently testing both. An official back port once finalised
> could be useful, though :-)

Great, I'll post a non-RFC version next week (LPC permitting)

BTW, the kbuild 0-day bot also caught the issue from your note in the other
email, along with a bunch of other wonky stuff; I've fixed those in the
GitHub version.

Thanks,
Jason
Zhenhua Huang Nov. 14, 2023, 4:56 a.m. UTC | #7
Thanks Jason.

On 2023/11/4 0:44, Jason Gunthorpe wrote:
> This is a more complete solution that the first attempt here:
> https://lore.kernel.org/r/1698825902-10685-1-git-send-email-quic_zhenhuah@quicinc.com
> 
> I haven't been able to test this on any HW that touches these paths, so if
> some people with HW can help get it in shape it can become non-RFC.

Thank you for addressing this so quickly and thoroughly. I have
backported it to the Android common kernel 6.1 and basic sanity testing
passes. I will share it with OEMs to see whether they can still reproduce
the issue.

Thanks,
Zhenhua