[v8,3/5] iommu/of: Add msi address regions reservation helper

Message ID 20170927133241.21036-4-shameerali.kolothum.thodi@huawei.com
State New
Headers show
Series
  • iommu/smmu-v3: Workaround for hisilicon 161010801 erratum(reserve HW MSI)
Related show

Commit Message

Shameerali Kolothum Thodi Sept. 27, 2017, 1:32 p.m.
From: John Garry <john.garry@huawei.com>


On some platforms msi-controller address regions have to be excluded
from normal IOVA allocation in that they are detected and decoded in
a HW specific way by system components and so they cannot be considered
normal IOVA address space.

Add a helper function that retrieves msi address regions through device
tree msi mapping, so that these regions will not be translated by IOMMU
and will be excluded from IOVA allocations.

Signed-off-by: John Garry <john.garry@huawei.com>

[Shameer: Modified msi-parent retrieval logic]
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

---
 drivers/iommu/of_iommu.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/of_iommu.h | 10 +++++
 2 files changed, 105 insertions(+)

-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Marc Zyngier Oct. 4, 2017, 11:22 a.m. | #1
On 27/09/17 14:32, Shameer Kolothum wrote:
> From: John Garry <john.garry@huawei.com>

> 

> On some platforms msi-controller address regions have to be excluded

> from normal IOVA allocation in that they are detected and decoded in

> a HW specific way by system components and so they cannot be considered

> normal IOVA address space.

> 

> Add a helper function that retrieves msi address regions through device

> tree msi mapping, so that these regions will not be translated by IOMMU

> and will be excluded from IOVA allocations.

> 

> Signed-off-by: John Garry <john.garry@huawei.com>

> [Shameer: Modified msi-parent retrieval logic]

> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

> ---

>  drivers/iommu/of_iommu.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++

>  include/linux/of_iommu.h | 10 +++++

>  2 files changed, 105 insertions(+)

> 

> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c

> index e60e3db..ffd7fb7 100644

> --- a/drivers/iommu/of_iommu.c

> +++ b/drivers/iommu/of_iommu.c

> @@ -21,6 +21,7 @@

>  #include <linux/iommu.h>

>  #include <linux/limits.h>

>  #include <linux/of.h>

> +#include <linux/of_address.h>

>  #include <linux/of_iommu.h>

>  #include <linux/of_pci.h>

>  #include <linux/slab.h>

> @@ -138,6 +139,14 @@ static int of_iommu_xlate(struct device *dev,

>  	return ops->of_xlate(dev, iommu_spec);

>  }

>  

> +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)

> +{

> +	u32 *rid = data;

> +

> +	*rid = alias;

> +	return 0;

> +}

> +

>  struct of_pci_iommu_alias_info {

>  	struct device *dev;

>  	struct device_node *np;

> @@ -163,6 +172,73 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)

>  	return info->np == pdev->bus->dev.of_node;

>  }

>  

> +static int of_iommu_alloc_resv_region(struct device_node *np,

> +				      struct list_head *head)

> +{

> +	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;

> +	struct iommu_resv_region *region;

> +	struct resource res;

> +	int err;

> +

> +	err = of_address_to_resource(np, 0, &res);


What is the rational for registering the first region only? Surely you
can have more than one region in an MSI controller (yes, your particular
case is the ITS which has only one, but we're trying to do something
generic here).

Another issue I have with this code is that it inserts all of the ITS
MMIO in the RESV_MSI range. As long as we don't generate any page tables
for this, we're fine. But if we ever change this, we'll end-up with the
ITS programming interface being exposed to a device, which wouldn't be
acceptable.

Why can't we have some generic property instead, that would describe the
actual ranges that cannot be translated? That way, no random insertion
of a random range, and relatively future proof.

I'm not sure where to declare it (PCIe node? IOMMU node?), but it'd feel
like a much nicer and simpler solution to this problem.

Thoughts?

	M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi Oct. 4, 2017, 1:50 p.m. | #2
On Wed, Oct 04, 2017 at 12:22:04PM +0100, Marc Zyngier wrote:
> On 27/09/17 14:32, Shameer Kolothum wrote:

> > From: John Garry <john.garry@huawei.com>

> > 

> > On some platforms msi-controller address regions have to be excluded

> > from normal IOVA allocation in that they are detected and decoded in

> > a HW specific way by system components and so they cannot be considered

> > normal IOVA address space.

> > 

> > Add a helper function that retrieves msi address regions through device

> > tree msi mapping, so that these regions will not be translated by IOMMU

> > and will be excluded from IOVA allocations.

> > 

> > Signed-off-by: John Garry <john.garry@huawei.com>

> > [Shameer: Modified msi-parent retrieval logic]

> > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

> > ---

> >  drivers/iommu/of_iommu.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++

> >  include/linux/of_iommu.h | 10 +++++

> >  2 files changed, 105 insertions(+)

> > 

> > diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c

> > index e60e3db..ffd7fb7 100644

> > --- a/drivers/iommu/of_iommu.c

> > +++ b/drivers/iommu/of_iommu.c

> > @@ -21,6 +21,7 @@

> >  #include <linux/iommu.h>

> >  #include <linux/limits.h>

> >  #include <linux/of.h>

> > +#include <linux/of_address.h>

> >  #include <linux/of_iommu.h>

> >  #include <linux/of_pci.h>

> >  #include <linux/slab.h>

> > @@ -138,6 +139,14 @@ static int of_iommu_xlate(struct device *dev,

> >  	return ops->of_xlate(dev, iommu_spec);

> >  }

> >  

> > +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)

> > +{

> > +	u32 *rid = data;

> > +

> > +	*rid = alias;

> > +	return 0;

> > +}

> > +

> >  struct of_pci_iommu_alias_info {

> >  	struct device *dev;

> >  	struct device_node *np;

> > @@ -163,6 +172,73 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)

> >  	return info->np == pdev->bus->dev.of_node;

> >  }

> >  

> > +static int of_iommu_alloc_resv_region(struct device_node *np,

> > +				      struct list_head *head)

> > +{

> > +	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;

> > +	struct iommu_resv_region *region;

> > +	struct resource res;

> > +	int err;

> > +

> > +	err = of_address_to_resource(np, 0, &res);

> 

> What is the rational for registering the first region only? Surely you

> can have more than one region in an MSI controller (yes, your particular

> case is the ITS which has only one, but we're trying to do something

> generic here).

> 

> Another issue I have with this code is that it inserts all of the ITS

> MMIO in the RESV_MSI range. As long as we don't generate any page tables

> for this, we're fine. But if we ever change this, we'll end-up with the

> ITS programming interface being exposed to a device, which wouldn't be

> acceptable.

> 

> Why can't we have some generic property instead, that would describe the

> actual ranges that cannot be translated? That way, no random insertion

> of a random range, and relatively future proof.


IORT code has the same issue (ie it reserves all ITS regions) and I do
not know where a property can be added to describe those ranges (IORT
specs ? I'd rather not) in ACPI other than the IORT tables entries
themselves.

> I'm not sure where to declare it (PCIe node? IOMMU node?), but it'd feel

> like a much nicer and simpler solution to this problem.


It could be but if we throw ACPI into the picture we have to knock
together Hisilicon specific namespace bindings to handle this and
quickly.

Lorenzo
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shameerali Kolothum Thodi Oct. 4, 2017, 5:07 p.m. | #3
> -----Original Message-----

> From: Marc Zyngier [mailto:marc.zyngier@arm.com]

> Sent: Wednesday, October 04, 2017 12:22 PM

> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;

> lorenzo.pieralisi@arm.com; sudeep.holla@arm.com; will.deacon@arm.com;

> robin.murphy@arm.com; joro@8bytes.org; mark.rutland@arm.com;

> robh@kernel.org

> Cc: Gabriele Paoloni <gabriele.paoloni@huawei.com>; John Garry

> <john.garry@huawei.com>; iommu@lists.linux-foundation.org; linux-arm-

> kernel@lists.infradead.org; linux-acpi@vger.kernel.org;

> devicetree@vger.kernel.org; devel@acpica.org; Linuxarm

> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;

> Guohanjun (Hanjun Guo) <guohanjun@huawei.com>

> Subject: Re: [PATCH v8 3/5] iommu/of: Add msi address regions reservation

> helper

> 

> On 27/09/17 14:32, Shameer Kolothum wrote:

> > From: John Garry <john.garry@huawei.com>

> >

> > On some platforms msi-controller address regions have to be excluded

> > from normal IOVA allocation in that they are detected and decoded in a

> > HW specific way by system components and so they cannot be considered

> > normal IOVA address space.

> >

> > Add a helper function that retrieves msi address regions through

> > device tree msi mapping, so that these regions will not be translated

> > by IOMMU and will be excluded from IOVA allocations.

> >

> > Signed-off-by: John Garry <john.garry@huawei.com>

> > [Shameer: Modified msi-parent retrieval logic]

> > Signed-off-by: Shameer Kolothum

> <shameerali.kolothum.thodi@huawei.com>

> > ---

> >  drivers/iommu/of_iommu.c | 95

> > ++++++++++++++++++++++++++++++++++++++++++++++++

> >  include/linux/of_iommu.h | 10 +++++

> >  2 files changed, 105 insertions(+)

> >

> > diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c

> index

> > e60e3db..ffd7fb7 100644

> > --- a/drivers/iommu/of_iommu.c

> > +++ b/drivers/iommu/of_iommu.c

> > @@ -21,6 +21,7 @@

> >  #include <linux/iommu.h>

> >  #include <linux/limits.h>

> >  #include <linux/of.h>

> > +#include <linux/of_address.h>

> >  #include <linux/of_iommu.h>

> >  #include <linux/of_pci.h>

> >  #include <linux/slab.h>

> > @@ -138,6 +139,14 @@ static int of_iommu_xlate(struct device *dev,

> >  	return ops->of_xlate(dev, iommu_spec);  }

> >

> > +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)

> > +{

> > +	u32 *rid = data;

> > +

> > +	*rid = alias;

> > +	return 0;

> > +}

> > +

> >  struct of_pci_iommu_alias_info {

> >  	struct device *dev;

> >  	struct device_node *np;

> > @@ -163,6 +172,73 @@ static int of_pci_iommu_init(struct pci_dev *pdev,

> u16 alias, void *data)

> >  	return info->np == pdev->bus->dev.of_node;  }

> >

> > +static int of_iommu_alloc_resv_region(struct device_node *np,

> > +				      struct list_head *head)

> > +{

> > +	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;

> > +	struct iommu_resv_region *region;

> > +	struct resource res;

> > +	int err;

> > +

> > +	err = of_address_to_resource(np, 0, &res);

> 

> What is the rational for registering the first region only? Surely you can have

> more than one region in an MSI controller (yes, your particular case is the ITS

> which has only one, but we're trying to do something generic here).


Ok. 

> Another issue I have with this code is that it inserts all of the ITS MMIO in the

> RESV_MSI range. As long as we don't generate any page tables for this, we're

> fine. But if we ever change this, we'll end-up with the ITS programming

> interface being exposed to a device, which wouldn't be acceptable.


I understand the concern of reserving the whole of ITS MMIO region. Sorry, 
but just being curious, how this will be exposed to a  device ? You mean a device
can  be configured to access the ITS MMIO region and it will fail because there is
no SMMU mapping for that?
 
> Why can't we have some generic property instead, that would describe the

> actual ranges that cannot be translated? That way, no random insertion of a

> random range, and relatively future proof.

> 

> I'm not sure where to declare it (PCIe node? IOMMU node?), but it'd feel like

> a much nicer and simpler solution to this problem.


I am not sure that will be seen as legitimizing the untranslated regions or not.
We had this discussion to include the regions specified in IORT spec and the
answer was that, it will in a way legitimize this and encourage future
implementations.

How about making the _msi_get_resv_regions() function be very specific to GIC
ITS like _its_msi_get_resv_regions() ? Is that something we can consider?
(In fact we had a checking for "arm, gic-v3-its" in a previous version of this series).

Thanks,
Shameer
Robin Murphy Oct. 5, 2017, 11:07 a.m. | #4
On 04/10/17 14:50, Lorenzo Pieralisi wrote:
> On Wed, Oct 04, 2017 at 12:22:04PM +0100, Marc Zyngier wrote:

>> On 27/09/17 14:32, Shameer Kolothum wrote:

>>> From: John Garry <john.garry@huawei.com>

>>>

>>> On some platforms msi-controller address regions have to be excluded

>>> from normal IOVA allocation in that they are detected and decoded in

>>> a HW specific way by system components and so they cannot be considered

>>> normal IOVA address space.

>>>

>>> Add a helper function that retrieves msi address regions through device

>>> tree msi mapping, so that these regions will not be translated by IOMMU

>>> and will be excluded from IOVA allocations.

>>>

>>> Signed-off-by: John Garry <john.garry@huawei.com>

>>> [Shameer: Modified msi-parent retrieval logic]

>>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

>>> ---

>>>  drivers/iommu/of_iommu.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++

>>>  include/linux/of_iommu.h | 10 +++++

>>>  2 files changed, 105 insertions(+)

>>>

>>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c

>>> index e60e3db..ffd7fb7 100644

>>> --- a/drivers/iommu/of_iommu.c

>>> +++ b/drivers/iommu/of_iommu.c

>>> @@ -21,6 +21,7 @@

>>>  #include <linux/iommu.h>

>>>  #include <linux/limits.h>

>>>  #include <linux/of.h>

>>> +#include <linux/of_address.h>

>>>  #include <linux/of_iommu.h>

>>>  #include <linux/of_pci.h>

>>>  #include <linux/slab.h>

>>> @@ -138,6 +139,14 @@ static int of_iommu_xlate(struct device *dev,

>>>  	return ops->of_xlate(dev, iommu_spec);

>>>  }

>>>  

>>> +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)

>>> +{

>>> +	u32 *rid = data;

>>> +

>>> +	*rid = alias;

>>> +	return 0;

>>> +}

>>> +

>>>  struct of_pci_iommu_alias_info {

>>>  	struct device *dev;

>>>  	struct device_node *np;

>>> @@ -163,6 +172,73 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)

>>>  	return info->np == pdev->bus->dev.of_node;

>>>  }

>>>  

>>> +static int of_iommu_alloc_resv_region(struct device_node *np,

>>> +				      struct list_head *head)

>>> +{

>>> +	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;

>>> +	struct iommu_resv_region *region;

>>> +	struct resource res;

>>> +	int err;

>>> +

>>> +	err = of_address_to_resource(np, 0, &res);

>>

>> What is the rational for registering the first region only? Surely you

>> can have more than one region in an MSI controller (yes, your particular

>> case is the ITS which has only one, but we're trying to do something

>> generic here).

>>

>> Another issue I have with this code is that it inserts all of the ITS

>> MMIO in the RESV_MSI range. As long as we don't generate any page tables

>> for this, we're fine. But if we ever change this, we'll end-up with the

>> ITS programming interface being exposed to a device, which wouldn't be

>> acceptable.

>>

>> Why can't we have some generic property instead, that would describe the

>> actual ranges that cannot be translated? That way, no random insertion

>> of a random range, and relatively future proof.


Indeed. Furthermore, IORT has the advantage of being limited to a few
distinct device types and SBSA-compliant system topologies, so the
ITS-chasing is reasonable there (modulo only reserving GITS_TRANSLATER).
The scope of DT covers more embedded things as well like PCI host
controllers with internal MSI doorbells, and potentially even
direct-mapped memory regions for things like bootloader framebuffers to
prevent display glitches - a generic address/size/flags description of a
region could work for just about everything.

> IORT code has the same issue (ie it reserves all ITS regions) and I do

> not know where a property can be added to describe those ranges (IORT

> specs ? I'd rather not) in ACPI other than the IORT tables entries

> themselves.

> 

>> I'm not sure where to declare it (PCIe node? IOMMU node?), but it'd feel

>> like a much nicer and simpler solution to this problem.

> 

> It could be but if we throw ACPI into the picture we have to knock

> together Hisilicon specific namespace bindings to handle this and

> quickly.


As above I'm happy with the ITS-specific solution for ACPI given the
limits of IORT. What I had in mind for DT was something tied in with the
generic IOMMU bindings. Something like this is probably easiest to
handle, but does rather spread the information around:


  pci {
  	...
  	iommu-map = <0 0 &iommu1 0x10000>;
  	iommu-reserved-ranges = <0x12340000 0x1000 IOMMU_MSI>,
  				<0x34560000 0x1000 IOMMU_MSI>;
  };

  display {
  	...
  	iommus = <&iommu1 0x20000>;
  	/* simplefb region */
  	iommu-reserved-ranges = <0x80080000 0x80000 IOMMU_DIRECT>,
  };


Alternatively, something inspired by reserved-memory might perhaps be
conceptually neater, at the risk of being more complicated:


  iommu1: iommu@acbd0000 {
  	...
  	#iommu-cells = <1>;

  	iommu-reserved-ranges {
  		#address-cells = <1>;
  		#size-cells = <1>;

		its0_resv: msi@12340000 {
  			compatible = "iommu-msi-region";
  			reg = <0x12340000 0x1000>;
  		};

		its1_resv: msi@34560000 {
  			compatible = "iommu-msi-region";
  			reg = <0x34560000 0x1000>;
  		};

		fb_resv: direct@12340000 {
  			compatible = "iommu-direct-region";
  			reg = <0x80080000 0x80000>;
  		};
  	};
  };


DT folks - any opinions?

Robin.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Deacon Oct. 5, 2017, 11:57 a.m. | #5
On Thu, Oct 05, 2017 at 12:07:26PM +0100, Robin Murphy wrote:
> On 04/10/17 14:50, Lorenzo Pieralisi wrote:

> > On Wed, Oct 04, 2017 at 12:22:04PM +0100, Marc Zyngier wrote:

> >> On 27/09/17 14:32, Shameer Kolothum wrote:

> >>> From: John Garry <john.garry@huawei.com>

> >>>

> >>> On some platforms msi-controller address regions have to be excluded

> >>> from normal IOVA allocation in that they are detected and decoded in

> >>> a HW specific way by system components and so they cannot be considered

> >>> normal IOVA address space.

> >>>

> >>> Add a helper function that retrieves msi address regions through device

> >>> tree msi mapping, so that these regions will not be translated by IOMMU

> >>> and will be excluded from IOVA allocations.

> >>>

> >>> Signed-off-by: John Garry <john.garry@huawei.com>

> >>> [Shameer: Modified msi-parent retrieval logic]

> >>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

> >>> ---

> >>>  drivers/iommu/of_iommu.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++

> >>>  include/linux/of_iommu.h | 10 +++++

> >>>  2 files changed, 105 insertions(+)

> >>>

> >>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c

> >>> index e60e3db..ffd7fb7 100644

> >>> --- a/drivers/iommu/of_iommu.c

> >>> +++ b/drivers/iommu/of_iommu.c

> >>> @@ -21,6 +21,7 @@

> >>>  #include <linux/iommu.h>

> >>>  #include <linux/limits.h>

> >>>  #include <linux/of.h>

> >>> +#include <linux/of_address.h>

> >>>  #include <linux/of_iommu.h>

> >>>  #include <linux/of_pci.h>

> >>>  #include <linux/slab.h>

> >>> @@ -138,6 +139,14 @@ static int of_iommu_xlate(struct device *dev,

> >>>  	return ops->of_xlate(dev, iommu_spec);

> >>>  }

> >>>  

> >>> +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)

> >>> +{

> >>> +	u32 *rid = data;

> >>> +

> >>> +	*rid = alias;

> >>> +	return 0;

> >>> +}

> >>> +

> >>>  struct of_pci_iommu_alias_info {

> >>>  	struct device *dev;

> >>>  	struct device_node *np;

> >>> @@ -163,6 +172,73 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)

> >>>  	return info->np == pdev->bus->dev.of_node;

> >>>  }

> >>>  

> >>> +static int of_iommu_alloc_resv_region(struct device_node *np,

> >>> +				      struct list_head *head)

> >>> +{

> >>> +	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;

> >>> +	struct iommu_resv_region *region;

> >>> +	struct resource res;

> >>> +	int err;

> >>> +

> >>> +	err = of_address_to_resource(np, 0, &res);

> >>

> >> What is the rational for registering the first region only? Surely you

> >> can have more than one region in an MSI controller (yes, your particular

> >> case is the ITS which has only one, but we're trying to do something

> >> generic here).

> >>

> >> Another issue I have with this code is that it inserts all of the ITS

> >> MMIO in the RESV_MSI range. As long as we don't generate any page tables

> >> for this, we're fine. But if we ever change this, we'll end-up with the

> >> ITS programming interface being exposed to a device, which wouldn't be

> >> acceptable.

> >>

> >> Why can't we have some generic property instead, that would describe the

> >> actual ranges that cannot be translated? That way, no random insertion

> >> of a random range, and relatively future proof.

> 

> Indeed. Furthermore, IORT has the advantage of being limited to a few

> distinct device types and SBSA-compliant system topologies, so the

> ITS-chasing is reasonable there (modulo only reserving GITS_TRANSLATER).

> The scope of DT covers more embedded things as well like PCI host

> controllers with internal MSI doorbells, and potentially even

> direct-mapped memory regions for things like bootloader framebuffers to

> prevent display glitches - a generic address/size/flags description of a

> region could work for just about everything.

> 

> > IORT code has the same issue (ie it reserves all ITS regions) and I do

> > not know where a property can be added to describe those ranges (IORT

> > specs ? I'd rather not) in ACPI other than the IORT tables entries

> > themselves.

> > 

> >> I'm not sure where to declare it (PCIe node? IOMMU node?), but it'd feel

> >> like a much nicer and simpler solution to this problem.

> > 

> > It could be but if we throw ACPI into the picture we have to knock

> > together Hisilicon specific namespace bindings to handle this and

> > quickly.

> 

> As above I'm happy with the ITS-specific solution for ACPI given the

> limits of IORT. What I had in mind for DT was something tied in with the

> generic IOMMU bindings. Something like this is probably easiest to

> handle, but does rather spread the information around:

> 

> 

>   pci {

>   	...

>   	iommu-map = <0 0 &iommu1 0x10000>;

>   	iommu-reserved-ranges = <0x12340000 0x1000 IOMMU_MSI>,

>   				<0x34560000 0x1000 IOMMU_MSI>;

>   };

> 

>   display {

>   	...

>   	iommus = <&iommu1 0x20000>;

>   	/* simplefb region */

>   	iommu-reserved-ranges = <0x80080000 0x80000 IOMMU_DIRECT>,

>   };

> 

> 

> Alternatively, something inspired by reserved-memory might perhaps be

> conceptually neater, at the risk of being more complicated:

> 

> 

>   iommu1: iommu@acbd0000 {

>   	...

>   	#iommu-cells = <1>;

> 

>   	iommu-reserved-ranges {

>   		#address-cells = <1>;

>   		#size-cells = <1>;

> 

> 		its0_resv: msi@12340000 {

>   			compatible = "iommu-msi-region";

>   			reg = <0x12340000 0x1000>;

>   		};

> 

> 		its1_resv: msi@34560000 {

>   			compatible = "iommu-msi-region";

>   			reg = <0x34560000 0x1000>;

>   		};

> 

> 		fb_resv: direct@12340000 {

>   			compatible = "iommu-direct-region";

>   			reg = <0x80080000 0x80000>;

>   		};

>   	};

>   };


I like the locality of this, but is it definitely flexible enough? Do we
need to deal with systems where the reserved regions are specific to the
master (i.e. TBU) as opposed to the entire SMMU block?

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Garry Oct. 5, 2017, 12:37 p.m. | #6
On 05/10/2017 12:07, Robin Murphy wrote:
> On 04/10/17 14:50, Lorenzo Pieralisi wrote:

>> On Wed, Oct 04, 2017 at 12:22:04PM +0100, Marc Zyngier wrote:

>>> On 27/09/17 14:32, Shameer Kolothum wrote:

>>>> From: John Garry <john.garry@huawei.com>

>>>>

>>>> On some platforms msi-controller address regions have to be excluded

>>>> from normal IOVA allocation in that they are detected and decoded in

>>>> a HW specific way by system components and so they cannot be considered

>>>> normal IOVA address space.

>>>>

>>>> Add a helper function that retrieves msi address regions through device

>>>> tree msi mapping, so that these regions will not be translated by IOMMU

>>>> and will be excluded from IOVA allocations.

>>>>

>>>> Signed-off-by: John Garry <john.garry@huawei.com>

>>>> [Shameer: Modified msi-parent retrieval logic]

>>>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

>>>> ---

>>>>  drivers/iommu/of_iommu.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++

>>>>  include/linux/of_iommu.h | 10 +++++

>>>>  2 files changed, 105 insertions(+)

>>>>

>>>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c

>>>> index e60e3db..ffd7fb7 100644

>>>> --- a/drivers/iommu/of_iommu.c

>>>> +++ b/drivers/iommu/of_iommu.c

>>>> @@ -21,6 +21,7 @@

>>>>  #include <linux/iommu.h>

>>>>  #include <linux/limits.h>

>>>>  #include <linux/of.h>

>>>> +#include <linux/of_address.h>

>>>>  #include <linux/of_iommu.h>

>>>>  #include <linux/of_pci.h>

>>>>  #include <linux/slab.h>

>>>> @@ -138,6 +139,14 @@ static int of_iommu_xlate(struct device *dev,

>>>>  	return ops->of_xlate(dev, iommu_spec);

>>>>  }

>>>>

>>>> +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)

>>>> +{

>>>> +	u32 *rid = data;

>>>> +

>>>> +	*rid = alias;

>>>> +	return 0;

>>>> +}

>>>> +

>>>>  struct of_pci_iommu_alias_info {

>>>>  	struct device *dev;

>>>>  	struct device_node *np;

>>>> @@ -163,6 +172,73 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)

>>>>  	return info->np == pdev->bus->dev.of_node;

>>>>  }

>>>>

>>>> +static int of_iommu_alloc_resv_region(struct device_node *np,

>>>> +				      struct list_head *head)

>>>> +{

>>>> +	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;

>>>> +	struct iommu_resv_region *region;

>>>> +	struct resource res;

>>>> +	int err;

>>>> +

>>>> +	err = of_address_to_resource(np, 0, &res);

>>>

>>> What is the rational for registering the first region only? Surely you

>>> can have more than one region in an MSI controller (yes, your particular

>>> case is the ITS which has only one, but we're trying to do something

>>> generic here).

>>>

>>> Another issue I have with this code is that it inserts all of the ITS

>>> MMIO in the RESV_MSI range. As long as we don't generate any page tables

>>> for this, we're fine. But if we ever change this, we'll end-up with the

>>> ITS programming interface being exposed to a device, which wouldn't be

>>> acceptable.

>>>

>>> Why can't we have some generic property instead, that would describe the

>>> actual ranges that cannot be translated? That way, no random insertion

>>> of a random range, and relatively future proof.

>

> Indeed. Furthermore, IORT has the advantage of being limited to a few

> distinct device types and SBSA-compliant system topologies, so the

> ITS-chasing is reasonable there (modulo only reserving GITS_TRANSLATER).

> The scope of DT covers more embedded things as well like PCI host

> controllers with internal MSI doorbells, and potentially even

> direct-mapped memory regions for things like bootloader framebuffers to

> prevent display glitches - a generic address/size/flags description of a

> region could work for just about everything.

>

>> IORT code has the same issue (ie it reserves all ITS regions) and I do

>> not know where a property can be added to describe those ranges (IORT

>> specs ? I'd rather not) in ACPI other than the IORT tables entries

>> themselves.

>>

>>> I'm not sure where to declare it (PCIe node? IOMMU node?), but it'd feel

>>> like a much nicer and simpler solution to this problem.

>>

>> It could be but if we throw ACPI into the picture we have to knock

>> together Hisilicon specific namespace bindings to handle this and

>> quickly.

>

> As above I'm happy with the ITS-specific solution for ACPI given the

> limits of IORT. What I had in mind for DT was something tied in with the

> generic IOMMU bindings. Something like this is probably easiest to

> handle, but does rather spread the information around:

>

>

>   pci {

>   	...

>   	iommu-map = <0 0 &iommu1 0x10000>;

>   	iommu-reserved-ranges = <0x12340000 0x1000 IOMMU_MSI>,

>   				<0x34560000 0x1000 IOMMU_MSI>;

>   };

>

>   display {

>   	...

>   	iommus = <&iommu1 0x20000>;

>   	/* simplefb region */

>   	iommu-reserved-ranges = <0x80080000 0x80000 IOMMU_DIRECT>,

>   };

>

>

> Alternatively, something inspired by reserved-memory might perhaps be

> conceptually neater, at the risk of being more complicated:

>

>

>   iommu1: iommu@acbd0000 {

>   	...

>   	#iommu-cells = <1>;

>

>   	iommu-reserved-ranges {

>   		#address-cells = <1>;

>   		#size-cells = <1>;

>

> 		its0_resv: msi@12340000 {

>   			compatible = "iommu-msi-region";

>   			reg = <0x12340000 0x1000>;

>   		};

>

> 		its1_resv: msi@34560000 {

>   			compatible = "iommu-msi-region";

>   			reg = <0x34560000 0x1000>;

>   		};

>

> 		fb_resv: direct@12340000 {

>   			compatible = "iommu-direct-region";

>   			reg = <0x80080000 0x80000>;

>   		};

>   	};

>   };

>

>


If we did this, wouldn't it be easier to create dangerous reserved 
regions, like our ITS region which Marc was concerned by? It's not so 
hard to get dts changes into the kernel.

John

> DT folks - any opinions?

>

> Robin.

>

> .

>



--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Robin Murphy Oct. 5, 2017, 12:44 p.m. | #7
On 05/10/17 13:37, John Garry wrote:
> On 05/10/2017 12:07, Robin Murphy wrote:

>> On 04/10/17 14:50, Lorenzo Pieralisi wrote:

>>> On Wed, Oct 04, 2017 at 12:22:04PM +0100, Marc Zyngier wrote:

>>>> On 27/09/17 14:32, Shameer Kolothum wrote:

>>>>> From: John Garry <john.garry@huawei.com>

>>>>>

>>>>> On some platforms msi-controller address regions have to be excluded

>>>>> from normal IOVA allocation in that they are detected and decoded in

>>>>> a HW specific way by system components and so they cannot be

>>>>> considered

>>>>> normal IOVA address space.

>>>>>

>>>>> Add a helper function that retrieves msi address regions through

>>>>> device

>>>>> tree msi mapping, so that these regions will not be translated by

>>>>> IOMMU

>>>>> and will be excluded from IOVA allocations.

>>>>>

>>>>> Signed-off-by: John Garry <john.garry@huawei.com>

>>>>> [Shameer: Modified msi-parent retrieval logic]

>>>>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

>>>>> ---

>>>>>  drivers/iommu/of_iommu.c | 95

>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++

>>>>>  include/linux/of_iommu.h | 10 +++++

>>>>>  2 files changed, 105 insertions(+)

>>>>>

>>>>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c

>>>>> index e60e3db..ffd7fb7 100644

>>>>> --- a/drivers/iommu/of_iommu.c

>>>>> +++ b/drivers/iommu/of_iommu.c

>>>>> @@ -21,6 +21,7 @@

>>>>>  #include <linux/iommu.h>

>>>>>  #include <linux/limits.h>

>>>>>  #include <linux/of.h>

>>>>> +#include <linux/of_address.h>

>>>>>  #include <linux/of_iommu.h>

>>>>>  #include <linux/of_pci.h>

>>>>>  #include <linux/slab.h>

>>>>> @@ -138,6 +139,14 @@ static int of_iommu_xlate(struct device *dev,

>>>>>      return ops->of_xlate(dev, iommu_spec);

>>>>>  }

>>>>>

>>>>> +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)

>>>>> +{

>>>>> +    u32 *rid = data;

>>>>> +

>>>>> +    *rid = alias;

>>>>> +    return 0;

>>>>> +}

>>>>> +

>>>>>  struct of_pci_iommu_alias_info {

>>>>>      struct device *dev;

>>>>>      struct device_node *np;

>>>>> @@ -163,6 +172,73 @@ static int of_pci_iommu_init(struct pci_dev

>>>>> *pdev, u16 alias, void *data)

>>>>>      return info->np == pdev->bus->dev.of_node;

>>>>>  }

>>>>>

>>>>> +static int of_iommu_alloc_resv_region(struct device_node *np,

>>>>> +                      struct list_head *head)

>>>>> +{

>>>>> +    int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;

>>>>> +    struct iommu_resv_region *region;

>>>>> +    struct resource res;

>>>>> +    int err;

>>>>> +

>>>>> +    err = of_address_to_resource(np, 0, &res);

>>>>

>>>> What is the rational for registering the first region only? Surely you

>>>> can have more than one region in an MSI controller (yes, your

>>>> particular

>>>> case is the ITS which has only one, but we're trying to do something

>>>> generic here).

>>>>

>>>> Another issue I have with this code is that it inserts all of the ITS

>>>> MMIO in the RESV_MSI range. As long as we don't generate any page

>>>> tables

>>>> for this, we're fine. But if we ever change this, we'll end-up with the

>>>> ITS programming interface being exposed to a device, which wouldn't be

>>>> acceptable.

>>>>

>>>> Why can't we have some generic property instead, that would describe

>>>> the

>>>> actual ranges that cannot be translated? That way, no random insertion

>>>> of a random range, and relatively future proof.

>>

>> Indeed. Furthermore, IORT has the advantage of being limited to a few

>> distinct device types and SBSA-compliant system topologies, so the

>> ITS-chasing is reasonable there (modulo only reserving GITS_TRANSLATER).

>> The scope of DT covers more embedded things as well like PCI host

>> controllers with internal MSI doorbells, and potentially even

>> direct-mapped memory regions for things like bootloader framebuffers to

>> prevent display glitches - a generic address/size/flags description of a

>> region could work for just about everything.

>>

>>> IORT code has the same issue (ie it reserves all ITS regions) and I do

>>> not know where a property can be added to describe those ranges (IORT

>>> specs ? I'd rather not) in ACPI other than the IORT tables entries

>>> themselves.

>>>

>>>> I'm not sure where to declare it (PCIe node? IOMMU node?), but it'd

>>>> feel

>>>> like a much nicer and simpler solution to this problem.

>>>

>>> It could be but if we throw ACPI into the picture we have to knock

>>> together Hisilicon specific namespace bindings to handle this and

>>> quickly.

>>

>> As above I'm happy with the ITS-specific solution for ACPI given the

>> limits of IORT. What I had in mind for DT was something tied in with the

>> generic IOMMU bindings. Something like this is probably easiest to

>> handle, but does rather spread the information around:

>>

>>

>>   pci {

>>       ...

>>       iommu-map = <0 0 &iommu1 0x10000>;

>>       iommu-reserved-ranges = <0x12340000 0x1000 IOMMU_MSI>,

>>                   <0x34560000 0x1000 IOMMU_MSI>;

>>   };

>>

>>   display {

>>       ...

>>       iommus = <&iommu1 0x20000>;

>>       /* simplefb region */

>>       iommu-reserved-ranges = <0x80080000 0x80000 IOMMU_DIRECT>,

>>   };

>>

>>

>> Alternatively, something inspired by reserved-memory might perhaps be

>> conceptually neater, at the risk of being more complicated:

>>

>>

>>   iommu1: iommu@acbd0000 {

>>       ...

>>       #iommu-cells = <1>;

>>

>>       iommu-reserved-ranges {

>>           #address-cells = <1>;

>>           #size-cells = <1>;

>>

>>         its0_resv: msi@12340000 {

>>               compatible = "iommu-msi-region";

>>               reg = <0x12340000 0x1000>;

>>           };

>>

>>         its1_resv: msi@34560000 {

>>               compatible = "iommu-msi-region";

>>               reg = <0x34560000 0x1000>;

>>           };

>>

>>         fb_resv: direct@12340000 {

>>               compatible = "iommu-direct-region";

>>               reg = <0x80080000 0x80000>;

>>           };

>>       };

>>   };

>>

>>

> 

> If we did this, wouldn't it be easier to create dangerous reserved

> regions, like our ITS region which Marc was concerned by? It's not so

> hard to get dts changes into the kernel.


Well, yeah. It's also equally easy to add some peripheral register
region to the /memory node and watch hilarity ensue. The solution to
both is "don't put bogus crap in your DT".

There's a big difference between wilfully misdescribing your platform
requirements vs. having the kernel automagically infer something but
leave a hole open in the process.

Robin.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Robin Murphy Oct. 5, 2017, 12:55 p.m. | #8
On 05/10/17 12:57, Will Deacon wrote:
> On Thu, Oct 05, 2017 at 12:07:26PM +0100, Robin Murphy wrote:

>> On 04/10/17 14:50, Lorenzo Pieralisi wrote:

>>> On Wed, Oct 04, 2017 at 12:22:04PM +0100, Marc Zyngier wrote:

>>>> On 27/09/17 14:32, Shameer Kolothum wrote:

>>>>> From: John Garry <john.garry@huawei.com>

>>>>>

>>>>> On some platforms msi-controller address regions have to be excluded

>>>>> from normal IOVA allocation in that they are detected and decoded in

>>>>> a HW specific way by system components and so they cannot be considered

>>>>> normal IOVA address space.

>>>>>

>>>>> Add a helper function that retrieves msi address regions through device

>>>>> tree msi mapping, so that these regions will not be translated by IOMMU

>>>>> and will be excluded from IOVA allocations.

>>>>>

>>>>> Signed-off-by: John Garry <john.garry@huawei.com>

>>>>> [Shameer: Modified msi-parent retrieval logic]

>>>>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

>>>>> ---

>>>>>  drivers/iommu/of_iommu.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++

>>>>>  include/linux/of_iommu.h | 10 +++++

>>>>>  2 files changed, 105 insertions(+)

>>>>>

>>>>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c

>>>>> index e60e3db..ffd7fb7 100644

>>>>> --- a/drivers/iommu/of_iommu.c

>>>>> +++ b/drivers/iommu/of_iommu.c

>>>>> @@ -21,6 +21,7 @@

>>>>>  #include <linux/iommu.h>

>>>>>  #include <linux/limits.h>

>>>>>  #include <linux/of.h>

>>>>> +#include <linux/of_address.h>

>>>>>  #include <linux/of_iommu.h>

>>>>>  #include <linux/of_pci.h>

>>>>>  #include <linux/slab.h>

>>>>> @@ -138,6 +139,14 @@ static int of_iommu_xlate(struct device *dev,

>>>>>  	return ops->of_xlate(dev, iommu_spec);

>>>>>  }

>>>>>  

>>>>> +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)

>>>>> +{

>>>>> +	u32 *rid = data;

>>>>> +

>>>>> +	*rid = alias;

>>>>> +	return 0;

>>>>> +}

>>>>> +

>>>>>  struct of_pci_iommu_alias_info {

>>>>>  	struct device *dev;

>>>>>  	struct device_node *np;

>>>>> @@ -163,6 +172,73 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)

>>>>>  	return info->np == pdev->bus->dev.of_node;

>>>>>  }

>>>>>  

>>>>> +static int of_iommu_alloc_resv_region(struct device_node *np,

>>>>> +				      struct list_head *head)

>>>>> +{

>>>>> +	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;

>>>>> +	struct iommu_resv_region *region;

>>>>> +	struct resource res;

>>>>> +	int err;

>>>>> +

>>>>> +	err = of_address_to_resource(np, 0, &res);

>>>>

>>>> What is the rational for registering the first region only? Surely you

>>>> can have more than one region in an MSI controller (yes, your particular

>>>> case is the ITS which has only one, but we're trying to do something

>>>> generic here).

>>>>

>>>> Another issue I have with this code is that it inserts all of the ITS

>>>> MMIO in the RESV_MSI range. As long as we don't generate any page tables

>>>> for this, we're fine. But if we ever change this, we'll end-up with the

>>>> ITS programming interface being exposed to a device, which wouldn't be

>>>> acceptable.

>>>>

>>>> Why can't we have some generic property instead, that would describe the

>>>> actual ranges that cannot be translated? That way, no random insertion

>>>> of a random range, and relatively future proof.

>>

>> Indeed. Furthermore, IORT has the advantage of being limited to a few

>> distinct device types and SBSA-compliant system topologies, so the

>> ITS-chasing is reasonable there (modulo only reserving GITS_TRANSLATER).

>> The scope of DT covers more embedded things as well like PCI host

>> controllers with internal MSI doorbells, and potentially even

>> direct-mapped memory regions for things like bootloader framebuffers to

>> prevent display glitches - a generic address/size/flags description of a

>> region could work for just about everything.

>>

>>> IORT code has the same issue (ie it reserves all ITS regions) and I do

>>> not know where a property can be added to describe those ranges (IORT

>>> specs ? I'd rather not) in ACPI other than the IORT tables entries

>>> themselves.

>>>

>>>> I'm not sure where to declare it (PCIe node? IOMMU node?), but it'd feel

>>>> like a much nicer and simpler solution to this problem.

>>>

>>> It could be but if we throw ACPI into the picture we have to knock

>>> together Hisilicon specific namespace bindings to handle this and

>>> quickly.

>>

>> As above I'm happy with the ITS-specific solution for ACPI given the

>> limits of IORT. What I had in mind for DT was something tied in with the

>> generic IOMMU bindings. Something like this is probably easiest to

>> handle, but does rather spread the information around:

>>

>>

>>   pci {

>>   	...

>>   	iommu-map = <0 0 &iommu1 0x10000>;

>>   	iommu-reserved-ranges = <0x12340000 0x1000 IOMMU_MSI>,

>>   				<0x34560000 0x1000 IOMMU_MSI>;

>>   };

>>

>>   display {

>>   	...

>>   	iommus = <&iommu1 0x20000>;

>>   	/* simplefb region */

>>   	iommu-reserved-ranges = <0x80080000 0x80000 IOMMU_DIRECT>,

>>   };

>>

>>

>> Alternatively, something inspired by reserved-memory might perhaps be

>> conceptually neater, at the risk of being more complicated:

>>

>>

>>   iommu1: iommu@acbd0000 {

>>   	...

>>   	#iommu-cells = <1>;

>>

>>   	iommu-reserved-ranges {

>>   		#address-cells = <1>;

>>   		#size-cells = <1>;

>>

>> 		its0_resv: msi@12340000 {

>>   			compatible = "iommu-msi-region";

>>   			reg = <0x12340000 0x1000>;

>>   		};

>>

>> 		its1_resv: msi@34560000 {

>>   			compatible = "iommu-msi-region";

>>   			reg = <0x34560000 0x1000>;

>>   		};

>>

>> 		fb_resv: direct@12340000 {

>>   			compatible = "iommu-direct-region";

>>   			reg = <0x80080000 0x80000>;

>>   		};

>>   	};

>>   };

> 

> I like the locality of this, but is it definitely flexible enough? Do we

> need to deal with systems where the reserved regions are specific to the

> master (i.e. TBU) as opposed to the entire SMMU block?


That would certainly be true for most direct-mapping cases, where the
reservation is tied to the specific stream ID(s) of one master, let
alone a TBU. I guess we'd have to make a phandle reference from the
device node(s) to the region node mandatory, such that software need
only make the actual reservation/mapping upon encountering a device that
actually needs it.

Robin.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Garry Oct. 5, 2017, 1:08 p.m. | #9
On 05/10/2017 13:44, Robin Murphy wrote:
> On 05/10/17 13:37, John Garry wrote:

>> On 05/10/2017 12:07, Robin Murphy wrote:

>>> On 04/10/17 14:50, Lorenzo Pieralisi wrote:

>>>> On Wed, Oct 04, 2017 at 12:22:04PM +0100, Marc Zyngier wrote:

>>>>> On 27/09/17 14:32, Shameer Kolothum wrote:

>>>>>> From: John Garry <john.garry@huawei.com>

>>>>>>

>>>>>> On some platforms msi-controller address regions have to be excluded

>>>>>> from normal IOVA allocation in that they are detected and decoded in

>>>>>> a HW specific way by system components and so they cannot be

>>>>>> considered

>>>>>> normal IOVA address space.

>>>>>>

>>>>>> Add a helper function that retrieves msi address regions through

>>>>>> device

>>>>>> tree msi mapping, so that these regions will not be translated by

>>>>>> IOMMU

>>>>>> and will be excluded from IOVA allocations.

>>>>>>

>>>>>> Signed-off-by: John Garry <john.garry@huawei.com>

>>>>>> [Shameer: Modified msi-parent retrieval logic]

>>>>>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

>>>>>> ---

>>>>>>  drivers/iommu/of_iommu.c | 95

>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++

>>>>>>  include/linux/of_iommu.h | 10 +++++

>>>>>>  2 files changed, 105 insertions(+)

>>>>>>

>>>>>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c

>>>>>> index e60e3db..ffd7fb7 100644

>>>>>> --- a/drivers/iommu/of_iommu.c

>>>>>> +++ b/drivers/iommu/of_iommu.c

>>>>>> @@ -21,6 +21,7 @@

>>>>>>  #include <linux/iommu.h>

>>>>>>  #include <linux/limits.h>

>>>>>>  #include <linux/of.h>

>>>>>> +#include <linux/of_address.h>

>>>>>>  #include <linux/of_iommu.h>

>>>>>>  #include <linux/of_pci.h>

>>>>>>  #include <linux/slab.h>

>>>>>> @@ -138,6 +139,14 @@ static int of_iommu_xlate(struct device *dev,

>>>>>>      return ops->of_xlate(dev, iommu_spec);

>>>>>>  }

>>>>>>

>>>>>> +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)

>>>>>> +{

>>>>>> +    u32 *rid = data;

>>>>>> +

>>>>>> +    *rid = alias;

>>>>>> +    return 0;

>>>>>> +}

>>>>>> +

>>>>>>  struct of_pci_iommu_alias_info {

>>>>>>      struct device *dev;

>>>>>>      struct device_node *np;

>>>>>> @@ -163,6 +172,73 @@ static int of_pci_iommu_init(struct pci_dev

>>>>>> *pdev, u16 alias, void *data)

>>>>>>      return info->np == pdev->bus->dev.of_node;

>>>>>>  }

>>>>>>

>>>>>> +static int of_iommu_alloc_resv_region(struct device_node *np,

>>>>>> +                      struct list_head *head)

>>>>>> +{

>>>>>> +    int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;

>>>>>> +    struct iommu_resv_region *region;

>>>>>> +    struct resource res;

>>>>>> +    int err;

>>>>>> +

>>>>>> +    err = of_address_to_resource(np, 0, &res);

>>>>>

>>>>> What is the rational for registering the first region only? Surely you

>>>>> can have more than one region in an MSI controller (yes, your

>>>>> particular

>>>>> case is the ITS which has only one, but we're trying to do something

>>>>> generic here).

>>>>>

>>>>> Another issue I have with this code is that it inserts all of the ITS

>>>>> MMIO in the RESV_MSI range. As long as we don't generate any page

>>>>> tables

>>>>> for this, we're fine. But if we ever change this, we'll end-up with the

>>>>> ITS programming interface being exposed to a device, which wouldn't be

>>>>> acceptable.

>>>>>

>>>>> Why can't we have some generic property instead, that would describe

>>>>> the

>>>>> actual ranges that cannot be translated? That way, no random insertion

>>>>> of a random range, and relatively future proof.

>>>

>>> Indeed. Furthermore, IORT has the advantage of being limited to a few

>>> distinct device types and SBSA-compliant system topologies, so the

>>> ITS-chasing is reasonable there (modulo only reserving GITS_TRANSLATER).

>>> The scope of DT covers more embedded things as well like PCI host

>>> controllers with internal MSI doorbells, and potentially even

>>> direct-mapped memory regions for things like bootloader framebuffers to

>>> prevent display glitches - a generic address/size/flags description of a

>>> region could work for just about everything.

>>>

>>>> IORT code has the same issue (ie it reserves all ITS regions) and I do

>>>> not know where a property can be added to describe those ranges (IORT

>>>> specs ? I'd rather not) in ACPI other than the IORT tables entries

>>>> themselves.

>>>>

>>>>> I'm not sure where to declare it (PCIe node? IOMMU node?), but it'd

>>>>> feel

>>>>> like a much nicer and simpler solution to this problem.

>>>>

>>>> It could be but if we throw ACPI into the picture we have to knock

>>>> together Hisilicon specific namespace bindings to handle this and

>>>> quickly.

>>>

>>> As above I'm happy with the ITS-specific solution for ACPI given the

>>> limits of IORT. What I had in mind for DT was something tied in with the

>>> generic IOMMU bindings. Something like this is probably easiest to

>>> handle, but does rather spread the information around:

>>>

>>>

>>>   pci {

>>>       ...

>>>       iommu-map = <0 0 &iommu1 0x10000>;

>>>       iommu-reserved-ranges = <0x12340000 0x1000 IOMMU_MSI>,

>>>                   <0x34560000 0x1000 IOMMU_MSI>;

>>>   };

>>>

>>>   display {

>>>       ...

>>>       iommus = <&iommu1 0x20000>;

>>>       /* simplefb region */

>>>       iommu-reserved-ranges = <0x80080000 0x80000 IOMMU_DIRECT>,

>>>   };

>>>

>>>

>>> Alternatively, something inspired by reserved-memory might perhaps be

>>> conceptually neater, at the risk of being more complicated:

>>>

>>>

>>>   iommu1: iommu@acbd0000 {

>>>       ...

>>>       #iommu-cells = <1>;

>>>

>>>       iommu-reserved-ranges {

>>>           #address-cells = <1>;

>>>           #size-cells = <1>;

>>>

>>>         its0_resv: msi@12340000 {

>>>               compatible = "iommu-msi-region";

>>>               reg = <0x12340000 0x1000>;

>>>           };

>>>

>>>         its1_resv: msi@34560000 {

>>>               compatible = "iommu-msi-region";

>>>               reg = <0x34560000 0x1000>;

>>>           };

>>>

>>>         fb_resv: direct@12340000 {

>>>               compatible = "iommu-direct-region";

>>>               reg = <0x80080000 0x80000>;

>>>           };

>>>       };

>>>   };

>>>

>>>

>>

>> If we did this, wouldn't it be easier to create dangerous reserved

>> regions, like our ITS region which Marc was concerned by? It's not so

>> hard to get dts changes into the kernel.

>

> Well, yeah. It's also equally easy to add some peripheral register

> region to the /memory node and watch hilarity ensue. The solution to

> both is "don't put bogus crap in your DT".

>


Sure, but people make mistakes and often a lot more subtle than your 
example.

Only one person spotted the problem with our approach to the ITS 
problem. I'm pretty confident it would not have been spotted if it was 
submitted as a dts change.

Much appreciated,
John

> There's a big difference between wilfully misdescribing your platform

> requirements vs. having the kernel automatically infer something but

> leave a hole open in the process.

>

> Robin.

>

> .

>



--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index e60e3db..ffd7fb7 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -21,6 +21,7 @@ 
 #include <linux/iommu.h>
 #include <linux/limits.h>
 #include <linux/of.h>
+#include <linux/of_address.h>
 #include <linux/of_iommu.h>
 #include <linux/of_pci.h>
 #include <linux/slab.h>
@@ -138,6 +139,14 @@  static int of_iommu_xlate(struct device *dev,
 	return ops->of_xlate(dev, iommu_spec);
 }
 
+static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)
+{
+	u32 *rid = data;
+
+	*rid = alias;
+	return 0;
+}
+
 struct of_pci_iommu_alias_info {
 	struct device *dev;
 	struct device_node *np;
@@ -163,6 +172,73 @@  static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
 	return info->np == pdev->bus->dev.of_node;
 }
 
+static int of_iommu_alloc_resv_region(struct device_node *np,
+				      struct list_head *head)
+{
+	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+	struct iommu_resv_region *region;
+	struct resource res;
+	int err;
+
+	err = of_address_to_resource(np, 0, &res);
+	if (err)
+		return err;
+
+	region = iommu_alloc_resv_region(res.start, resource_size(&res),
+					 prot, IOMMU_RESV_MSI);
+	if (!region)
+		return -ENOMEM;
+
+	list_add_tail(&region->list, head);
+
+	return 0;
+}
+
+static int of_pci_msi_get_resv_regions(struct device *dev,
+				       struct list_head *head)
+{
+	struct device_node *msi_np;
+	struct device *pdev;
+	u32 rid;
+	int err, resv = 0;
+
+	pci_for_each_dma_alias(to_pci_dev(dev), __get_pci_rid, &rid);
+
+	for_each_node_with_property(msi_np, "msi-controller") {
+		for (pdev = dev; pdev; pdev = pdev->parent) {
+			if (!of_pci_map_rid(pdev->of_node, rid, "msi-map",
+					    "msi-map-mask", &msi_np, NULL)) {
+
+				err = of_iommu_alloc_resv_region(msi_np, head);
+				if (err)
+					return err;
+				resv++;
+			}
+		}
+	}
+
+	return resv;
+}
+
+static int of_platform_msi_get_resv_regions(struct device *dev,
+				       struct list_head *head)
+{
+	struct of_phandle_args args;
+	int err, resv = 0;
+
+	while (!of_parse_phandle_with_args(dev->of_node, "msi-parent",
+					   "#msi-cells", resv, &args)) {
+
+		err = of_iommu_alloc_resv_region(args.np, head);
+		of_node_put(args.np);
+		if (err)
+			return err;
+		resv++;
+	}
+
+	return resv;
+}
+
 const struct iommu_ops *of_iommu_configure(struct device *dev,
 					   struct device_node *master_np)
 {
@@ -235,6 +311,25 @@  const struct iommu_ops *of_iommu_configure(struct device *dev,
 	return ops;
 }
 
+/**
+ * of_iommu_msi_get_resv_regions - Reserved region driver helper
+ * @dev: Device from iommu_get_resv_regions()
+ * @head: Reserved region list from iommu_get_resv_regions()
+ *
+ * Returns: Number of reserved regions on success (0 if no associated
+ *          msi parent), appropriate error value otherwise.
+ */
+int of_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
+{
+
+	if (dev_is_pci(dev))
+		return of_pci_msi_get_resv_regions(dev, head);
+	else if (dev->of_node)
+		return of_platform_msi_get_resv_regions(dev, head);
+
+	return 0;
+}
+
 static int __init of_iommu_init(void)
 {
 	struct device_node *np;
diff --git a/include/linux/of_iommu.h b/include/linux/of_iommu.h
index 13394ac..9267772 100644
--- a/include/linux/of_iommu.h
+++ b/include/linux/of_iommu.h
@@ -14,6 +14,9 @@  extern int of_get_dma_window(struct device_node *dn, const char *prefix,
 extern const struct iommu_ops *of_iommu_configure(struct device *dev,
 					struct device_node *master_np);
 
+extern int of_iommu_msi_get_resv_regions(struct device *dev,
+					struct list_head *head);
+
 #else
 
 static inline int of_get_dma_window(struct device_node *dn, const char *prefix,
@@ -29,6 +32,13 @@  static inline const struct iommu_ops *of_iommu_configure(struct device *dev,
 	return NULL;
 }
 
+static int of_iommu_msi_get_resv_regions(struct device *dev,
+					struct list_head *head)
+{
+	return -ENODEV;
+}
+
+
 #endif	/* CONFIG_OF_IOMMU */
 
 extern struct of_device_id __iommu_of_table;