
[Xen-devel,v3,12/13] xen/iommu: smmu: Add Xen specific code to be able to use the driver

Message ID 1422643768-23614-13-git-send-email-julien.grall@linaro.org
State Superseded, archived

Commit Message

Julien Grall Jan. 30, 2015, 6:49 p.m. UTC
The main goal is to modify the Linux code as little as possible, so that
new features added to the driver in the Linux repository can easily be ported.

To achieve that we:
    - Add helpers for Linux functions not implemented on Xen
    - Add callbacks used by Xen to do our own stuff and call the Linux ones
    - Only modify the code coming from Linux when required. In that case,
    a /* Xen: ... */ comment has been added explaining why the change is
    necessary.

The PCI support has been commented out because PCI is not yet supported by
Xen on ARM and the code therefore won't compile.

Signed-off-by: Julien Grall <julien.grall@linaro.org>

---
    Changes in v2:
        - Add the ACCESS_ONCE definition in the drivers. The patch to
        introduce the one in common code has been dropped.
        - The include xen/device.h has been dropped in favor of
        asm/device.h
---
 xen/drivers/passthrough/arm/Makefile |   1 +
 xen/drivers/passthrough/arm/smmu.c   | 678 +++++++++++++++++++++++++++++++----
 2 files changed, 612 insertions(+), 67 deletions(-)

Comments

Julien Grall Feb. 2, 2015, 11:01 p.m. UTC | #1
Hello Manish,

On 02/02/2015 18:55, Manish wrote:
> A general comment,

Please avoid top post.

> master->of_node is used as a unique value in the rb_tree. In case of
> pci_passthrough devices are enumerated and not present in device tree,
> so I would have to remove the rb_tree and replace with a linked list.

Clearly not. No big change is necessary in the code. Consider that 
Linux is able to handle PCI with master->of_node, so why would Xen have 
to be different?

In the PCI case, a master is attached to a root bus. Every root bus is 
described in the device tree, so each one is associated with a device node.

I have added a few TODOs (just search for TODO) in the code where you 
should plug in PCI support. Any change outside of them is by default wrong.

The TODOs are:
	* dev_get_dev_node: you have to find the root bus and return its
	device node.
	* arm_smmu_add_device: you have to get the stream IDs associated
	with the device and store them in the PCI device IOMMU data.

Regards,
Julien Grall Feb. 9, 2015, 3:40 p.m. UTC | #2
Hi Stefano,

On 06/02/2015 21:20, Stefano Stabellini wrote:
> On Fri, 30 Jan 2015, Julien Grall wrote:
>> -static int force_stage;
>> -module_param_named(force_stage, force_stage, int, S_IRUGO | S_IWUSR);
>> -MODULE_PARM_DESC(force_stage,
>> -	"Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
>
> for the sake of minimizing the diff, I would add
>
> #define module_param_named(a, b, c, d)
> #define MODULE_PARM_DESC(a, b)
>
> to the Xen definitions above

We still have to drop the force_stage module parameter because the 
variable would otherwise be unused and wouldn't compile.
So it's not too bad to remove the two other lines as well.

[..]

>> +static struct iommu_group *iommu_group_get(struct device *dev)
>> +{
>> +	struct iommu_group *group = dev_iommu_group(dev);
>> +
>> +	if (group)
>> +		atomic_inc(&group->ref);
>> +
>> +	return group;
>> +}
>> +
>> +#define iommu_group_get_iommudata(group) (group)->cfg
>
> I would move all the Xen stuff at the beginning of the file or maybe to
> a separate file. You could #include the linux smmu driver into it.
> Maybe we cannot have runtime patching of this file, but at least we can
> keep Xen and Linux stuff clearly separate.

Rather than using a void * in iommu_group, I directly use the real type 
(arm_smmu_master_cfg), which is defined a bit earlier.

I would prefer to keep the real type because it makes the code easier to 
understand and helps catch possible errors.

All the Xen code begins with /* Xen: */. Though I could add a /* 
Xen: End */ to make the separation clearer.

>
>>   static DEFINE_SPINLOCK(arm_smmu_devices_lock);
>>   static LIST_HEAD(arm_smmu_devices);
>>
>> @@ -455,6 +690,8 @@ static void parse_driver_options(struct arm_smmu_device *smmu)
>>
>>   static struct device_node *dev_get_dev_node(struct device *dev)
>>   {
>> +	/* Xen: TODO: Add support for PCI */
>> +#if 0
>>   	if (dev_is_pci(dev)) {
>>   		struct pci_bus *bus = to_pci_dev(dev)->bus;
>>
>> @@ -462,7 +699,7 @@ static struct device_node *dev_get_dev_node(struct device *dev)
>>   			bus = bus->parent;
>>   		return bus->bridge->parent->of_node;
>>   	}
>> -
>> +#endif
>
> Would it be possible instead to add:
>
> #define dev_is_pci (0)
>
> above among the Xen definitions?

dev_is_pci is already defined to 0 (see include/asm-arm/device.h), but 
that doesn't prevent the compiler from checking the validity of the code
inside...

pci_dev doesn't contain a struct pci_bus on Xen, so the code won't compile.

>
>>   	return dev->of_node;
>>   }
>>
>> @@ -556,6 +793,9 @@ static int register_smmu_master(struct arm_smmu_device *smmu,
>>   	master->of_node			= masterspec->np;
>>   	master->cfg.num_streamids	= masterspec->args_count;
>>
>> +	/* Xen: Let Xen knows that the device is protected by an SMMU */
>> +	dt_device_set_protected(masterspec->np);
>> +
>>   	for (i = 0; i < master->cfg.num_streamids; ++i) {
>>   		u16 streamid = masterspec->args[i];
>>
>> @@ -651,11 +891,12 @@ static void arm_smmu_tlb_inv_context(struct arm_smmu_domain *smmu_domain)
>>   	arm_smmu_tlb_sync(smmu);
>>   }
>>
>> -static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>> +static void arm_smmu_context_fault(int irq, void *dev,
>> +				   struct cpu_user_regs *regs)
>>   {
>> -	int flags, ret;
>> +	int flags;
>>   	u32 fsr, far, fsynr, resume;
>> -	unsigned long iova;
>> +	paddr_t iova;
>>   	struct iommu_domain *domain = dev;
>>   	struct arm_smmu_domain *smmu_domain = domain->priv;
>>   	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>> @@ -666,7 +907,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>   	fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
>>
>>   	if (!(fsr & FSR_FAULT))
>> -		return IRQ_NONE;
>> +		return;
>>
>>   	if (fsr & FSR_IGN)
>>   		dev_err_ratelimited(smmu->dev,
>> @@ -678,19 +919,16 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>
>>   	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_LO);
>>   	iova = far;
>> -#ifdef CONFIG_64BIT
>> +	/* Xen: The fault address maybe higher than 32 bits on arm32 */
>>   	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_HI);
>> -	iova |= ((unsigned long)far << 32);
>> -#endif
>> +	iova |= ((paddr_t)far << 32);
>>
>>   	if (!report_iommu_fault(domain, smmu->dev, iova, flags)) {
>> -		ret = IRQ_HANDLED;
>>   		resume = RESUME_RETRY;
>>   	} else {
>>   		dev_err_ratelimited(smmu->dev,
>> -		    "Unhandled context fault: iova=0x%08lx, fsynr=0x%x, cb=%d\n",
>> +		    "Unhandled context fault: iova=0x%"PRIpaddr", fsynr=0x%x, cb=%d\n",
>>   		    iova, fsynr, cfg->cbndx);
>> -		ret = IRQ_NONE;
>>   		resume = RESUME_TERMINATE;
>>   	}
>>
>> @@ -700,11 +938,10 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>   	/* Retry or terminate any stalled transactions */
>>   	if (fsr & FSR_SS)
>>   		writel_relaxed(resume, cb_base + ARM_SMMU_CB_RESUME);
>> -
>> -	return ret;
>>   }
>
> For the sake of minimizing the changes to Linux functions, could you
> write a wrapper around this function instead? That might allow you to
> remove the changes related to the return value.

We shouldn't try to minimize the diff at the cost of code clarity...

The change you suggest would require modifying the caller of this 
function, which is less clear than the current approach.

>> -static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>> +static void arm_smmu_global_fault(int irq, void *dev,
>> +                                  struct cpu_user_regs *regs)
>>   {
>>   	u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
>>   	struct arm_smmu_device *smmu = dev;
>> @@ -716,7 +953,7 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>>   	gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
>>
>>   	if (!gfsr)
>> -		return IRQ_NONE;
>> +		return;
>>
>>   	dev_err_ratelimited(smmu->dev,
>>   		"Unexpected global fault, this could be serious\n");
>> @@ -725,9 +962,10 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>>   		gfsr, gfsynr0, gfsynr1, gfsynr2);
>>
>>   	writel(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
>> -	return IRQ_HANDLED;
>>   }
>
> Same here

Ditto.

>
>> +/* Xen: Page tables are shared with the processor */
>> +#if 0
>>   static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
>>   				   size_t size)
>>   {
>> @@ -749,6 +987,7 @@ static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
>>   				DMA_TO_DEVICE);
>>   	}
>>   }
>> +#endif
>
> But then you need to fix all the call sites. I think it is best to add a
> return at the beginning of the function. Again the goal is to minimize
> changes to the linux driver.

There is only one call site, and the pgd field is dropped because pgd_t 
doesn't exist on Xen. We would have to add a comment on the caller 
explaining PTRS_PER_PGD, which is not meaningful on Xen.

Again, we need to find a fair trade-off between clarity and the size of 
the changes. If I had really wanted to minimize the diff, I would have 
avoided the #if 0 and added hundreds of typedefs/macros to make the code 
compile.

At the moment, I believe the current patch is the best we can have.

>
>>   static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain)
>>   {
>> @@ -757,6 +996,7 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain)
>>   	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>>   	struct arm_smmu_device *smmu = smmu_domain->smmu;
>>   	void __iomem *cb_base, *gr0_base, *gr1_base;
>> +	paddr_t p2maddr;
>>
>>   	gr0_base = ARM_SMMU_GR0(smmu);
>>   	gr1_base = ARM_SMMU_GR1(smmu);
>> @@ -840,11 +1080,16 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain)
>>   	}
>>
>>   	/* TTBR0 */
>> -	arm_smmu_flush_pgtable(smmu, cfg->pgd,
>> -			       PTRS_PER_PGD * sizeof(pgd_t));
>> -	reg = __pa(cfg->pgd);
>> +	/* Xen: The page table is shared with the P2M code */
>> +	ASSERT(smmu_domain->cfg.domain != NULL);
>> +	p2maddr = page_to_maddr(smmu_domain->cfg.domain->arch.p2m.root);
>> +
>> +	dev_notice(smmu->dev, "d%u: p2maddr 0x%"PRIpaddr"\n",
>> +		   smmu_domain->cfg.domain->domain_id, p2maddr);
>> +
>> +	reg = (p2maddr & ((1ULL << 32) - 1));
>>   	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_LO);
>> -	reg = (phys_addr_t)__pa(cfg->pgd) >> 32;
>> +	reg = (p2maddr >> 32);
>>   	if (stage1)
>>   		reg |= ARM_SMMU_CB_ASID(cfg) << TTBRn_HI_ASID_SHIFT;
>>   	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_HI);
>
> Not sure how we could get rid of this change.
> Maybe we could keep pgd in arm_smmu_cfg and make it point to
> smmu_domain->cfg.domain->arch.p2m.root?

Maybe, I will look at it.

>
>> @@ -886,9 +1131,21 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain)
>>   			reg |= (64 - smmu->s1_input_size) << TTBCR_T0SZ_SHIFT;
>>   		}
>>   	} else {
>> +#if CONFIG_ARM_32
>> +		/* Xen: Midway is using 40-bit address space. Though it may
>> +		 * not work on other ARM32 platform with SMMU-v1.
>> +		 * TODO: A quirk may be necessary if we have to support
>> +		 * other ARM32 platform with SMMU-v1.
>> +		 */
>> +		reg = 0x18 << TTBCR_T0SZ_SHIFT;
>> +#else
>>   		reg = 0;
>> +#endif
>>   	}
>>
>> +	/* Xen: The attributes to walk the page table should be the same as
>> +	 * VTCR_EL2. Currently doesn't differ from Linux ones.
>> +	 */
>>   	reg |= TTBCR_EAE |
>>   	      (TTBCR_SH_IS << TTBCR_SH0_SHIFT) |
>>   	      (TTBCR_RGN_WBWA << TTBCR_ORGN0_SHIFT) |
>> @@ -1031,7 +1288,6 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
>>   static int arm_smmu_domain_init(struct iommu_domain *domain)
>>   {
>>   	struct arm_smmu_domain *smmu_domain;
>> -	pgd_t *pgd;
>>
>>   	/*
>>   	 * Allocate the domain and initialise some of its data structures.
>> @@ -1042,20 +1298,12 @@ static int arm_smmu_domain_init(struct iommu_domain *domain)
>>   	if (!smmu_domain)
>>   		return -ENOMEM;
>>
>> -	pgd = kcalloc(PTRS_PER_PGD, sizeof(pgd_t), GFP_KERNEL);
>> -	if (!pgd)
>> -		goto out_free_domain;
>> -	smmu_domain->cfg.pgd = pgd;
>> -
>>   	spin_lock_init(&smmu_domain->lock);
>>   	domain->priv = smmu_domain;
>>   	return 0;
>> -
>> -out_free_domain:
>> -	kfree(smmu_domain);
>> -	return -ENOMEM;
>
> I would consider using explicit if (0) or #if 0 blocks instead: next
> time you'll see right away what to remove.

That would add more diff, in this case three #if 0 / #endif pairs plus 
some comments to explain them, because you would still want to understand 
what the variable pgd is.

>
>>   }
>>
>> +#if 0
>>   static void arm_smmu_free_ptes(pmd_t *pmd)
>>   {
>>   	pgtable_t table = pmd_pgtable(*pmd);
>> @@ -1118,6 +1366,7 @@ static void arm_smmu_free_pgtables(struct arm_smmu_domain *smmu_domain)
>>
>>   	kfree(pgd_base);
>>   }
>> +#endif
>
> If you use return at the beginning of the functions instead, you can
> avoid changing the call sites, like below:

Calling this function would be very confusing for someone debugging the 
code. I think the #if 0/#endif around the whole functions makes more 
sense.

[..]


>> @@ -1454,10 +1704,14 @@ static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
>>   	if (!cfg)
>>   		return;
>>
>> -	dev->archdata.iommu = NULL;
>> +	dev_iommu_domain(dev) = NULL;
>>   	arm_smmu_domain_remove_master(smmu_domain, cfg);
>>   }
>
>   %s/dev->archdata.iommu/dev_iommu_domain(dev)/g is not too bad I guess

We have to store more data for Xen, so the only solution is to change 
this code.

[..]

>>   static int arm_smmu_add_device(struct device *dev)
>>   {
>> @@ -1784,6 +2042,10 @@ static int arm_smmu_add_device(struct device *dev)
>>   	void (*releasefn)(void *) = NULL;
>>   	int ret;
>>
>> +	/* Xen: Check if the device has already been added */
>> +	if (dev_iommu_group(dev))
>> +		return -EBUSY;
>
> This is strange. Why is this check missing in Linux?

IIRC it's done in the IOMMU common code. But actually this check is 
already done in arm_smmu_add_device (a Xen-specific function). It may be 
a left-over from my development.
I will drop it.

>
>>   	smmu = find_smmu_for_device(dev);
>>   	if (!smmu)
>>   		return -ENODEV;
>> @@ -1795,6 +2057,9 @@ static int arm_smmu_add_device(struct device *dev)
>>   	}
>>
>>   	if (dev_is_pci(dev)) {
>> +		/* Xen: TODO: Add PCI support */
>> +		BUG();
>> +#if 0
>>   		struct pci_dev *pdev = to_pci_dev(dev);
>>
>>   		cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
>> @@ -1811,6 +2076,7 @@ static int arm_smmu_add_device(struct device *dev)
>>   		pci_for_each_dma_alias(pdev, __arm_smmu_get_pci_sid,
>>   				       &cfg->streamids[0]);
>>   		releasefn = __arm_smmu_release_pci_iommudata;
>> +#endif
>
> just make dev_is_pci be (0)

dev_is_pci is already set to 0, but an if (0) won't prevent compilation 
errors...

[..]

>> -MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
>
> #define MODULE_DEVICE_TABLE(a, b)

Will do.

[..]


>> +out_free:
>> +	kfree(smmu->irqs);
>> +	if (!IS_ERR(smmu->base))
>> +		iounmap(smmu->base);
>> +	kfree(smmu);
>> +
>>   	return err;
>
> It would require fewer changes to add a wrapper function.

But we wouldn't be able to get the smmu pointer, as it's not stored if 
the function fails.

[..]

>> +DT_DEVICE_START(smmu, "ARM SMMU", DEVICE_IOMMU)
>> +	.dt_match = arm_smmu_of_match,
>> +	.init = arm_smmu_dt_init,
>> +DT_DEVICE_END
>
> If possible, I would move them all together at the beginning of the
> file or maybe to a separate file.

Not possible. Some of the Xen code depends on functions defined in the 
Linux part. We would have to add prototypes for those functions, but they 
use structures defined within this file. So we would end up adding all 
the code in the middle of the file anyway.

We could use a separate file, but we would also need to modify some bits 
of the Linux code... it's pointless.

Overall, the Xen functions/definitions are placed where they fit best 
while keeping the whole code clear.

Regards,
Julien Grall Feb. 19, 2015, 5:17 p.m. UTC | #3
Hi Stefano,

On 09/02/15 15:40, Julien Grall wrote:
>>
>>>   static void arm_smmu_init_context_bank(struct arm_smmu_domain
>>> *smmu_domain)
>>>   {
>>> @@ -757,6 +996,7 @@ static void arm_smmu_init_context_bank(struct
>>> arm_smmu_domain *smmu_domain)
>>>       struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>>>       struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>       void __iomem *cb_base, *gr0_base, *gr1_base;
>>> +    paddr_t p2maddr;
>>>
>>>       gr0_base = ARM_SMMU_GR0(smmu);
>>>       gr1_base = ARM_SMMU_GR1(smmu);
>>> @@ -840,11 +1080,16 @@ static void arm_smmu_init_context_bank(struct
>>> arm_smmu_domain *smmu_domain)
>>>       }
>>>
>>>       /* TTBR0 */
>>> -    arm_smmu_flush_pgtable(smmu, cfg->pgd,
>>> -                   PTRS_PER_PGD * sizeof(pgd_t));
>>> -    reg = __pa(cfg->pgd);
>>> +    /* Xen: The page table is shared with the P2M code */
>>> +    ASSERT(smmu_domain->cfg.domain != NULL);
>>> +    p2maddr = page_to_maddr(smmu_domain->cfg.domain->arch.p2m.root);
>>> +
>>> +    dev_notice(smmu->dev, "d%u: p2maddr 0x%"PRIpaddr"\n",
>>> +           smmu_domain->cfg.domain->domain_id, p2maddr);
>>> +
>>> +    reg = (p2maddr & ((1ULL << 32) - 1));
>>>       writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_LO);
>>> -    reg = (phys_addr_t)__pa(cfg->pgd) >> 32;
>>> +    reg = (p2maddr >> 32);
>>>       if (stage1)
>>>           reg |= ARM_SMMU_CB_ASID(cfg) << TTBRn_HI_ASID_SHIFT;
>>>       writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_HI);
>>
>> Not sure how we could get rid of this change.
>> Maybe we could keep pgd in arm_smmu_cfg and make it point to
>> smmu_domain->cfg.domain->arch.p2m.root?
> 
> Maybe, I will look at it.

I thought more about it. __pa(...) requires a virtual address as
parameter. As the pages allocated for the P2M are part of the domheap,
there is no permanent mapping.

That would require mapping the p2m or overriding __pa.

IMHO, both of those options are less clear than the current code. So I
will keep the current solution.

Regards,
Julien Grall Feb. 20, 2015, 1:50 p.m. UTC | #4
Hi Ian,

On 20/02/15 13:23, Ian Campbell wrote:
> On Fri, 2015-02-06 at 13:20 +0000, Stefano Stabellini wrote:
>>> +#define iommu_group_get_iommudata(group) (group)->cfg
>>
>> I would move all the Xen stuff at the beginning of the file or maybe to
>> a separate file. You could #include the linux smmu driver into it.
>> Maybe we cannot have runtime patching of this file, but at least we can
>> keep Xen and Linux stuff clearly separate.
> 
> I was going to suggest something similar e.g. #include <ssmu-xen-shim.h>
> or something.

See my comment on Stefano's mail for this.

>>
>>
>>>  static DEFINE_SPINLOCK(arm_smmu_devices_lock);
>>>  static LIST_HEAD(arm_smmu_devices);
>>>  
>>> @@ -455,6 +690,8 @@ static void parse_driver_options(struct arm_smmu_device *smmu)
>>>  
>>>  static struct device_node *dev_get_dev_node(struct device *dev)
>>>  {
>>> +	/* Xen: TODO: Add support for PCI */
>>> +#if 0
>>>  	if (dev_is_pci(dev)) {
>>>  		struct pci_bus *bus = to_pci_dev(dev)->bus;
>>>  
>>> @@ -462,7 +699,7 @@ static struct device_node *dev_get_dev_node(struct device *dev)
>>>  			bus = bus->parent;
>>>  		return bus->bridge->parent->of_node;
>>>  	}
>>> -
>>> +#endif
>>
>> Would it be possible instead to add:
>>
>> #define dev_is_pci (0)
>>
>> above among the Xen definitions?
> 
> Would be preferable if possible.

It's already done, but... if (0) doesn't mean the code inside the if can
be invalid.

In this specific case, our pci_dev doesn't have a struct pci_bus *bus
field, so it won't compile.

>>   
>>> @@ -700,11 +938,10 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>>>  	/* Retry or terminate any stalled transactions */
>>>  	if (fsr & FSR_SS)
>>>  		writel_relaxed(resume, cb_base + ARM_SMMU_CB_RESUME);
>>> -
>>> -	return ret;
>>>  }
>>
>> For the sake of minimizing the changes to Linux functions, could you
>> write a wrapper around this function instead? That might allow you to
>> remove the changes related to the return value.
> 
> typedef void irqreturn_t; and "#define IRQ_NONE /**/" etc perhaps?
> 
> Or even just cast the function in the call to request_irq, the differing
> return type shouldn't cause a problem in this context I think.

I can give a look.

>>
>>
>>> +/* Xen: Page tables are shared with the processor */
>>> +#if 0
>>>  static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
>>>  				   size_t size)
>>>  {
>>> @@ -749,6 +987,7 @@ static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
>>>  				DMA_TO_DEVICE);
>>>  	}
>>>  }
>>> +#endif
>>
>> But then you need to fix all the call sites. I think it is best to add a
>> return at the beginning of the function.
> 
> Or add a nop stub... (return sounds better though)

I don't seem to manage to get my point across on this kind of thing...

While I agree that we have to minimize the changes to the code, we
shouldn't do it at the cost of comprehensibility.

The function arm_smmu_flush_pgtable doesn't make any sense at all on Xen
(the parameters are even invalid).

Furthermore, there is only one call site.


>>> +#if CONFIG_ARM_32
>>> +		/* Xen: Midway is using 40-bit address space.
> 
> I'm a bit surprised that the upstream driver isn't prepared to cope with
> this sort of thing, there are a few LPAE arm32 systems around these
> days. I had the same thought around the " /* Xen: The fault address
> maybe higher than 32 bits on arm32 */" comment too.

At the time I resynced the SMMU driver, we were using short page
tables, and therefore the number of address bits was limited for the guest.

I don't plan to lose a minimum of two weeks re-syncing the driver again
any time soon.

Regards,
Julien Grall Feb. 20, 2015, 2:01 p.m. UTC | #5
On 20/02/15 13:29, Ian Campbell wrote:
> On Mon, 2015-02-09 at 23:40 +0800, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 06/02/2015 21:20, Stefano Stabellini wrote:
>>> On Fri, 30 Jan 2015, Julien Grall wrote:
>>>> -static int force_stage;
>>>> -module_param_named(force_stage, force_stage, int, S_IRUGO | S_IWUSR);
>>>> -MODULE_PARM_DESC(force_stage,
>>>> -	"Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
>>>
>>> for the sake of minimizing the diff, I would add
>>>
>>> #define module_param_named(a, b, c, d)
>>> #define MODULE_PARM_DESC(a, b)
>>>
>>> to the Xen definitions above
>>
>> We still have to drop force_stage because it will get unused and won't 
>> compile.
>> So it's not too bad to remove the 2 other lines.
> 
> You currently redefine it as a const int -- is it really unused?

This variable is only read to force the driver to use the specified stage
(1 or 2).

On Xen, we only want to use stage-2. The driver will bail out if the
SMMU doesn't support it.


>> All the Xen code is beginning with /* Xen: */. Though, I could add a /* 
>> Xen: End */ to make the separation clearer.
> 
> BTW, I was thinking to suggest replacing all the #if 0 with #ifndef
> CONFIG_XEN or something, which might make some of the comments shorter
> or even mean they aren't really needed.
> 
> Either way I would write it as:
>    #if 0 /* Xen: Comment comment */
> (similarly any if (0) ...)

I prefer the #if 0 /* Xen: */

>>> For the sake of minimizing the changes to Linux functions, could you
>>> write a wrapper around this function instead? That might allow you to
>>> remove the changes related to the return value.
>>
>> We should avoid to try to minimize the code against the clarity of the 
>> code...
>>
>> The change you suggest would require to modify the caller of this 
>> function, which is less clearer than this one.
> 
> Given an original function foo can you do
> 
> static void foo(void)
> {
>   original code, unmodified
> }
> static void foo_wrapper(void)
> {
>    wrapper
>    foo_orig
>    wrapper
> }
> #define foo foo_wrapper
> 
> or something along those lines?

There are some cases where we need to modify the original function, so I'm
not sure it's worth adding more lines just for a wrapper.

Regards,

Patch

diff --git a/xen/drivers/passthrough/arm/Makefile b/xen/drivers/passthrough/arm/Makefile
index 0484b79..f4cd26e 100644
--- a/xen/drivers/passthrough/arm/Makefile
+++ b/xen/drivers/passthrough/arm/Makefile
@@ -1 +1,2 @@ 
 obj-y += iommu.o
+obj-y += smmu.o
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 8a6514f..373eee8 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -18,6 +18,13 @@ 
  *
  * Author: Will Deacon <will.deacon@arm.com>
  *
+ * Based on Linux drivers/iommu/arm-smmu.c
+ *	=> commit e6b5be2be4e30037eb551e0ed09dd97bd00d85d3
+ *
+ * Xen modification:
+ * Julien Grall <julien.grall@linaro.org>
+ * Copyright (C) 2014 Linaro Limited.
+ *
  * This driver currently supports:
  *	- SMMUv1 and v2 implementations
  *	- Stream-matching and stream-indexing
@@ -28,26 +35,164 @@ 
  *	- Context fault reporting
  */
 
-#define pr_fmt(fmt) "arm-smmu: " fmt
 
-#include <linux/delay.h>
-#include <linux/dma-mapping.h>
-#include <linux/err.h>
-#include <linux/interrupt.h>
-#include <linux/io.h>
-#include <linux/iommu.h>
-#include <linux/mm.h>
-#include <linux/module.h>
-#include <linux/of.h>
-#include <linux/pci.h>
-#include <linux/platform_device.h>
-#include <linux/slab.h>
-#include <linux/spinlock.h>
-#include <linux/bitops.h>
+#include <xen/config.h>
+#include <xen/delay.h>
+#include <xen/errno.h>
+#include <xen/err.h>
+#include <xen/irq.h>
+#include <xen/lib.h>
+#include <xen/list.h>
+#include <xen/mm.h>
+#include <xen/vmap.h>
+#include <xen/rbtree.h>
+#include <xen/sched.h>
+#include <xen/sizes.h>
+#include <asm/atomic.h>
+#include <asm/device.h>
+#include <asm/io.h>
+#include <asm/platform.h>
+
+/* Xen: The below defines are redefined within the file. Undef it */
+#undef SCTLR_AFE
+#undef SCTLR_TRE
+#undef SCTLR_M
+#undef TTBCR_EAE
+
+/* Alias to Xen device tree helpers */
+#define device_node dt_device_node
+#define of_phandle_args dt_phandle_args
+#define of_device_id dt_device_match
+#define of_match_node dt_match_node
+#define of_property_read_u32(np, pname, out) (!dt_property_read_u32(np, pname, out))
+#define of_property_read_bool dt_property_read_bool
+#define of_parse_phandle_with_args dt_parse_phandle_with_args
+
+/* Xen: Helpers to get device MMIO and IRQs */
+struct resource
+{
+	u64 addr;
+	u64 size;
+	unsigned int type;
+};
+
+#define resource_size(res) (res)->size;
+
+#define platform_device dt_device_node
+
+#define IORESOURCE_MEM 0
+#define IORESOURCE_IRQ 1
+
+static struct resource *platform_get_resource(struct platform_device *pdev,
+					      unsigned int type,
+					      unsigned int num)
+{
+	/*
+	 * The resource is only used between 2 calls of platform_get_resource.
+	 * It's quite ugly but it's avoid to add too much code in the part
+	 * imported from Linux
+	 */
+	static struct resource res;
+	int ret = 0;
+
+	res.type = type;
+
+	switch (type) {
+	case IORESOURCE_MEM:
+		ret = dt_device_get_address(pdev, num, &res.addr, &res.size);
+
+		return ((ret) ? NULL : &res);
+
+	case IORESOURCE_IRQ:
+		ret = platform_get_irq(pdev, num);
+		if (ret < 0)
+			return NULL;
+
+		res.addr = ret;
+		res.size = 1;
+
+		return &res;
+
+	default:
+		return NULL;
+	}
+}
+
+/* Alias to Xen IRQ functions */
+#define request_irq(irq, func, flags, name, dev) request_irq(irq, flags, func, name, dev)
+#define free_irq release_irq
+
+/*
+ * Device logger functions
+ * TODO: Handle PCI
+ */
+#define dev_print(dev, lvl, fmt, ...)						\
+	 printk(lvl "smmu: %s: " fmt, dt_node_full_name(dev_to_dt(dev)), ## __VA_ARGS__)
+
+#define dev_dbg(dev, fmt, ...) dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
+#define dev_notice(dev, fmt, ...) dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
+#define dev_warn(dev, fmt, ...) dev_print(dev, XENLOG_WARNING, fmt, ## __VA_ARGS__)
+#define dev_err(dev, fmt, ...) dev_print(dev, XENLOG_ERR, fmt, ## __VA_ARGS__)
+
+#define dev_err_ratelimited(dev, fmt, ...)					\
+	 dev_print(dev, XENLOG_ERR, fmt, ## __VA_ARGS__)
 
-#include <linux/amba/bus.h>
+#define dev_name(dev) dt_node_full_name(dev_to_dt(dev))
 
-#include <asm/pgalloc.h>
+/* Alias to Xen allocation helpers */
+#define kfree xfree
+#define kmalloc(size, flags)		_xmalloc(size, sizeof(void *))
+#define kzalloc(size, flags)		_xzalloc(size, sizeof(void *))
+#define devm_kzalloc(dev, size, flags)	_xzalloc(size, sizeof(void *))
+#define kmalloc_array(size, n, flags)	_xmalloc_array(size, sizeof(void *), n)
+
+static void __iomem *devm_ioremap_resource(struct device *dev,
+					   struct resource *res)
+{
+	void __iomem *ptr;
+
+	if (!res || res->type != IORESOURCE_MEM) {
+		dev_err(dev, "Invalid resource\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	ptr = ioremap_nocache(res->addr, res->size);
+	if (!ptr) {
+		dev_err(dev,
+			"ioremap failed (addr 0x%"PRIx64" size 0x%"PRIx64")\n",
+			res->addr, res->size);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return ptr;
+}
+
+/* Xen doesn't handle IOMMU fault */
+#define report_iommu_fault(...)	1
+
+#define IOMMU_FAULT_READ	0
+#define IOMMU_FAULT_WRITE	1
+
+/* Xen: misc */
+#define PHYS_MASK_SHIFT		PADDR_BITS
+
+#ifdef CONFIG_ARM_64
+# define CONFIG_64BIT
+#endif
+
+#define VA_BITS		0	/* Only used for configuring stage-1 input size */
+
+/* The macro ACCESS_ONCE start to be replaced in Linux in favor of
+ * {READ, WRITE}_ONCE. Rather than introducing in the common code, keep a
+ * version here. We will have to drop it when the SMMU code in Linux will
+ * switch to {READ, WRITE}_ONCE.
+ */
+#define __ACCESS_ONCE(x) ({ \
+	 __maybe_unused typeof(x) __var = 0; \
+	(volatile typeof(x) *)&(x); })
+#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
+
+/***** Start of SMMU definitions *****/
 
 /* Maximum number of stream IDs assigned to a single device */
 #define MAX_MASTER_STREAMIDS		MAX_PHANDLE_ARGS
@@ -330,10 +475,14 @@ 
 
 #define FSYNR0_WNR			(1 << 4)
 
-static int force_stage;
-module_param_named(force_stage, force_stage, int, S_IRUGO | S_IWUSR);
-MODULE_PARM_DESC(force_stage,
-	"Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
+/* Force SMMU mapping to be installed at a particular stage of translation.
+ * A value of '1' or '2' forces the corresponding state. All other values
+ * are ignored (i.e no stage is forced). Note that selecting a specific stage
+ * will disable support for nested translation.
+ *
+ * Xen is only supported stage-2 translation, so force the value to 2.
+ */
+static const int force_stage = 2;
 
 enum arm_smmu_arch_version {
 	ARM_SMMU_V1 = 1,
@@ -406,7 +555,9 @@  struct arm_smmu_cfg {
 	u8				cbndx;
 	u8				irptndx;
 	u32				cbar;
-	pgd_t				*pgd;
+
+	/* Xen: Domain associated to this configuration */
+	struct domain			*domain;
 };
 #define INVALID_IRPTNDX			0xff
 
@@ -426,6 +577,90 @@  struct arm_smmu_domain {
 	spinlock_t			lock;
 };
 
+/* Xen: Dummy iommu_domain */
+struct iommu_domain
+{
+	struct arm_smmu_domain		*priv;
+
+	/* Used to link domain contexts for a same domain */
+	struct list_head		list;
+};
+
+/* Xen: Describes informations required for a Xen domain */
+struct arm_smmu_xen_domain {
+	spinlock_t			lock;
+	/* List of context (i.e iommu_domain) associated to this domain */
+	struct list_head		contexts;
+};
+
+/* Xen: Information about each device stored in dev->archdata.iommu */
+struct arm_smmu_xen_device {
+	struct iommu_domain *domain;
+	struct iommu_group *group;
+};
+
+#define dev_archdata(dev) ((struct arm_smmu_xen_device *)dev->archdata.iommu)
+#define dev_iommu_domain(dev) (dev_archdata(dev)->domain)
+#define dev_iommu_group(dev) (dev_archdata(dev)->group)
+
+/* Xen: Dummy iommu_group */
+struct iommu_group
+{
+	struct arm_smmu_master_cfg *cfg;
+
+	atomic_t ref;
+};
+
+static struct iommu_group *iommu_group_alloc(void)
+{
+	struct iommu_group *group = xzalloc(struct iommu_group);
+
+	if (!group)
+		return ERR_PTR(-ENOMEM);
+
+	atomic_set(&group->ref, 1);
+
+	return group;
+}
+
+static void iommu_group_put(struct iommu_group *group)
+{
+	if (atomic_dec_and_test(&group->ref))
+		xfree(group);
+}
+
+static void iommu_group_set_iommudata(struct iommu_group *group,
+				      struct arm_smmu_master_cfg *cfg,
+				      void (*releasefn)(void *))
+{
+	/* TODO: Store the releasefn for the PCI */
+	ASSERT(releasefn == NULL);
+
+	group->cfg = cfg;
+}
+
+static int iommu_group_add_device(struct iommu_group *group,
+				  struct device *dev)
+{
+	dev_iommu_group(dev) = group;
+
+	atomic_inc(&group->ref);
+
+	return 0;
+}
+
+static struct iommu_group *iommu_group_get(struct device *dev)
+{
+	struct iommu_group *group = dev_iommu_group(dev);
+
+	if (group)
+		atomic_inc(&group->ref);
+
+	return group;
+}
+
+#define iommu_group_get_iommudata(group) (group)->cfg
+
 static DEFINE_SPINLOCK(arm_smmu_devices_lock);
 static LIST_HEAD(arm_smmu_devices);
 
@@ -455,6 +690,8 @@  static void parse_driver_options(struct arm_smmu_device *smmu)
 
 static struct device_node *dev_get_dev_node(struct device *dev)
 {
+	/* Xen: TODO: Add support for PCI */
+#if 0
 	if (dev_is_pci(dev)) {
 		struct pci_bus *bus = to_pci_dev(dev)->bus;
 
@@ -462,7 +699,7 @@  static struct device_node *dev_get_dev_node(struct device *dev)
 			bus = bus->parent;
 		return bus->bridge->parent->of_node;
 	}
-
+#endif
 	return dev->of_node;
 }
 
@@ -556,6 +793,9 @@  static int register_smmu_master(struct arm_smmu_device *smmu,
 	master->of_node			= masterspec->np;
 	master->cfg.num_streamids	= masterspec->args_count;
 
+	/* Xen: Let Xen know that the device is protected by an SMMU */
+	dt_device_set_protected(masterspec->np);
+
 	for (i = 0; i < master->cfg.num_streamids; ++i) {
 		u16 streamid = masterspec->args[i];
 
@@ -651,11 +891,12 @@  static void arm_smmu_tlb_inv_context(struct arm_smmu_domain *smmu_domain)
 	arm_smmu_tlb_sync(smmu);
 }
 
-static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
+static void arm_smmu_context_fault(int irq, void *dev,
+				   struct cpu_user_regs *regs)
 {
-	int flags, ret;
+	int flags;
 	u32 fsr, far, fsynr, resume;
-	unsigned long iova;
+	paddr_t iova;
 	struct iommu_domain *domain = dev;
 	struct arm_smmu_domain *smmu_domain = domain->priv;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
@@ -666,7 +907,7 @@  static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
 
 	if (!(fsr & FSR_FAULT))
-		return IRQ_NONE;
+		return;
 
 	if (fsr & FSR_IGN)
 		dev_err_ratelimited(smmu->dev,
@@ -678,19 +919,16 @@  static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 
 	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_LO);
 	iova = far;
-#ifdef CONFIG_64BIT
+	/* Xen: The fault address may be higher than 32 bits on arm32 */
 	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_HI);
-	iova |= ((unsigned long)far << 32);
-#endif
+	iova |= ((paddr_t)far << 32);
 
 	if (!report_iommu_fault(domain, smmu->dev, iova, flags)) {
-		ret = IRQ_HANDLED;
 		resume = RESUME_RETRY;
 	} else {
 		dev_err_ratelimited(smmu->dev,
-		    "Unhandled context fault: iova=0x%08lx, fsynr=0x%x, cb=%d\n",
+		    "Unhandled context fault: iova=0x%"PRIpaddr", fsynr=0x%x, cb=%d\n",
 		    iova, fsynr, cfg->cbndx);
-		ret = IRQ_NONE;
 		resume = RESUME_TERMINATE;
 	}
 
@@ -700,11 +938,10 @@  static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	/* Retry or terminate any stalled transactions */
 	if (fsr & FSR_SS)
 		writel_relaxed(resume, cb_base + ARM_SMMU_CB_RESUME);
-
-	return ret;
 }
 
-static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
+static void arm_smmu_global_fault(int irq, void *dev,
+                                  struct cpu_user_regs *regs)
 {
 	u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
 	struct arm_smmu_device *smmu = dev;
@@ -716,7 +953,7 @@  static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
 	gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
 
 	if (!gfsr)
-		return IRQ_NONE;
+		return;
 
 	dev_err_ratelimited(smmu->dev,
 		"Unexpected global fault, this could be serious\n");
@@ -725,9 +962,10 @@  static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
 		gfsr, gfsynr0, gfsynr1, gfsynr2);
 
 	writel(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
-	return IRQ_HANDLED;
 }
 
+/* Xen: Page tables are shared with the processor */
+#if 0
 static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
 				   size_t size)
 {
@@ -749,6 +987,7 @@  static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
 				DMA_TO_DEVICE);
 	}
 }
+#endif
 
 static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain)
 {
@@ -757,6 +996,7 @@  static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain)
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	void __iomem *cb_base, *gr0_base, *gr1_base;
+	paddr_t p2maddr;
 
 	gr0_base = ARM_SMMU_GR0(smmu);
 	gr1_base = ARM_SMMU_GR1(smmu);
@@ -840,11 +1080,16 @@  static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain)
 	}
 
 	/* TTBR0 */
-	arm_smmu_flush_pgtable(smmu, cfg->pgd,
-			       PTRS_PER_PGD * sizeof(pgd_t));
-	reg = __pa(cfg->pgd);
+	/* Xen: The page table is shared with the P2M code */
+	ASSERT(smmu_domain->cfg.domain != NULL);
+	p2maddr = page_to_maddr(smmu_domain->cfg.domain->arch.p2m.root);
+
+	dev_notice(smmu->dev, "d%u: p2maddr 0x%"PRIpaddr"\n",
+		   smmu_domain->cfg.domain->domain_id, p2maddr);
+
+	reg = (p2maddr & ((1ULL << 32) - 1));
 	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_LO);
-	reg = (phys_addr_t)__pa(cfg->pgd) >> 32;
+	reg = (p2maddr >> 32);
 	if (stage1)
 		reg |= ARM_SMMU_CB_ASID(cfg) << TTBRn_HI_ASID_SHIFT;
 	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_HI);
@@ -886,9 +1131,21 @@  static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain)
 			reg |= (64 - smmu->s1_input_size) << TTBCR_T0SZ_SHIFT;
 		}
 	} else {
+#ifdef CONFIG_ARM_32
+		/* Xen: Midway uses a 40-bit address space, though this may
+		 * not work on other ARM32 platforms with SMMU-v1.
+		 * TODO: A quirk may be necessary if we have to support
+		 * other ARM32 platforms with SMMU-v1.
+		 */
+		reg = 0x18 << TTBCR_T0SZ_SHIFT;
+#else
 		reg = 0;
+#endif
 	}
 
+	/* Xen: The attributes to walk the page table should be the same as
+	 * VTCR_EL2. Currently they don't differ from the Linux ones.
+	 */
 	reg |= TTBCR_EAE |
 	      (TTBCR_SH_IS << TTBCR_SH0_SHIFT) |
 	      (TTBCR_RGN_WBWA << TTBCR_ORGN0_SHIFT) |
@@ -1031,7 +1288,6 @@  static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 static int arm_smmu_domain_init(struct iommu_domain *domain)
 {
 	struct arm_smmu_domain *smmu_domain;
-	pgd_t *pgd;
 
 	/*
 	 * Allocate the domain and initialise some of its data structures.
@@ -1042,20 +1298,12 @@  static int arm_smmu_domain_init(struct iommu_domain *domain)
 	if (!smmu_domain)
 		return -ENOMEM;
 
-	pgd = kcalloc(PTRS_PER_PGD, sizeof(pgd_t), GFP_KERNEL);
-	if (!pgd)
-		goto out_free_domain;
-	smmu_domain->cfg.pgd = pgd;
-
 	spin_lock_init(&smmu_domain->lock);
 	domain->priv = smmu_domain;
 	return 0;
-
-out_free_domain:
-	kfree(smmu_domain);
-	return -ENOMEM;
 }
 
+#if 0
 static void arm_smmu_free_ptes(pmd_t *pmd)
 {
 	pgtable_t table = pmd_pgtable(*pmd);
@@ -1118,6 +1366,7 @@  static void arm_smmu_free_pgtables(struct arm_smmu_domain *smmu_domain)
 
 	kfree(pgd_base);
 }
+#endif
 
 /*
  * For a given set N of 2**order different stream IDs (no duplicates
@@ -1259,7 +1508,6 @@  static void arm_smmu_domain_destroy(struct iommu_domain *domain)
 	 * already been detached.
 	 */
 	arm_smmu_destroy_domain_context(domain);
-	arm_smmu_free_pgtables(smmu_domain);
 	kfree(smmu_domain);
 }
 
@@ -1384,11 +1632,12 @@  static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
 	/*
 	 * We *must* clear the S2CR first, because freeing the SMR means
 	 * that it can be re-allocated immediately.
+	 * Xen: Unlike Linux, any access to a non-configured stream will fault.
 	 */
 	for (i = 0; i < cfg->num_streamids; ++i) {
 		u32 idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
 
-		writel_relaxed(S2CR_TYPE_BYPASS,
+		writel_relaxed(S2CR_TYPE_FAULT,
 			       gr0_base + ARM_SMMU_GR0_S2CR(idx));
 	}
 
@@ -1408,7 +1657,7 @@  static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		return -ENXIO;
 	}
 
-	if (dev->archdata.iommu) {
+	if (dev_iommu_domain(dev)) {
 		dev_err(dev, "already attached to IOMMU domain\n");
 		return -EEXIST;
 	}
@@ -1440,8 +1689,9 @@  static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		return -ENODEV;
 
 	ret = arm_smmu_domain_add_master(smmu_domain, cfg);
+
 	if (!ret)
-		dev->archdata.iommu = domain;
+		dev_iommu_domain(dev) = domain;
 	return ret;
 }
 
@@ -1454,10 +1704,14 @@  static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
 	if (!cfg)
 		return;
 
-	dev->archdata.iommu = NULL;
+	dev_iommu_domain(dev) = NULL;
 	arm_smmu_domain_remove_master(smmu_domain, cfg);
 }
 
+/* Xen: The page table is shared with the processor, so the helpers to
+ * manage a separate page table are not necessary.
+ */
+#if 0
 static bool arm_smmu_pte_is_contiguous_range(unsigned long addr,
 					     unsigned long end)
 {
@@ -1746,7 +2000,10 @@  static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
 
 	return __pfn_to_phys(pte_pfn(pte)) | (iova & ~PAGE_MASK);
 }
+#endif
 
+/* Xen: Functions are not used at the moment */
+#if 0
 static bool arm_smmu_capable(enum iommu_cap cap)
 {
 	switch (cap) {
@@ -1775,6 +2032,7 @@  static void __arm_smmu_release_pci_iommudata(void *data)
 {
 	kfree(data);
 }
+#endif
 
 static int arm_smmu_add_device(struct device *dev)
 {
@@ -1784,6 +2042,10 @@  static int arm_smmu_add_device(struct device *dev)
 	void (*releasefn)(void *) = NULL;
 	int ret;
 
+	/* Xen: Check if the device has already been added */
+	if (dev_iommu_group(dev))
+		return -EBUSY;
+
 	smmu = find_smmu_for_device(dev);
 	if (!smmu)
 		return -ENODEV;
@@ -1795,6 +2057,9 @@  static int arm_smmu_add_device(struct device *dev)
 	}
 
 	if (dev_is_pci(dev)) {
+		/* Xen: TODO: Add PCI support */
+		BUG();
+#if 0
 		struct pci_dev *pdev = to_pci_dev(dev);
 
 		cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
@@ -1811,6 +2076,7 @@  static int arm_smmu_add_device(struct device *dev)
 		pci_for_each_dma_alias(pdev, __arm_smmu_get_pci_sid,
 				       &cfg->streamids[0]);
 		releasefn = __arm_smmu_release_pci_iommudata;
+#endif
 	} else {
 		struct arm_smmu_master *master;
 
@@ -1831,6 +2097,8 @@  out_put_group:
 	return ret;
 }
 
+/* Xen: We don't support device removal for now. It will be useful for PCI */
+#if 0
 static void arm_smmu_remove_device(struct device *dev)
 {
 	iommu_group_remove_device(dev);
@@ -1888,6 +2156,7 @@  static const struct iommu_ops arm_smmu_ops = {
 				   ARM_SMMU_PTE_CONT_SIZE |
 				   PAGE_SIZE),
 };
+#endif
 
 static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 {
@@ -1903,7 +2172,11 @@  static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	/* Mark all SMRn as invalid and all S2CRn as bypass */
 	for (i = 0; i < smmu->num_mapping_groups; ++i) {
 		writel_relaxed(0, gr0_base + ARM_SMMU_GR0_SMR(i));
-		writel_relaxed(S2CR_TYPE_BYPASS,
+		/*
+		 * Xen: Unlike Linux, any access to a non-configured stream
+		 * will fault by default.
+		 */
+		writel_relaxed(S2CR_TYPE_FAULT,
 			gr0_base + ARM_SMMU_GR0_S2CR(i));
 	}
 
@@ -1929,6 +2202,8 @@  static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 
 	/* Enable client access, but bypass when no mapping is found */
 	reg &= ~(sCR0_CLIENTPD | sCR0_USFCFG);
+	/* Xen: Unlike Linux, generate a fault when no mapping is found */
+	reg |= sCR0_USFCFG;
 
 	/* Disable forced broadcasting */
 	reg &= ~sCR0_FB;
@@ -2039,7 +2314,7 @@  static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 		smmu->smr_id_mask = sid;
 
 		dev_notice(smmu->dev,
-			   "\tstream matching with %u register groups, mask 0x%x",
+			   "\tstream matching with %u register groups, mask 0x%x\n",
 			   smmu->num_mapping_groups, mask);
 	} else {
 		smmu->num_mapping_groups = (id >> ID0_NUMSIDB_SHIFT) &
@@ -2074,12 +2349,30 @@  static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 	size = arm_smmu_id_size_to_bits((id >> ID2_IAS_SHIFT) & ID2_IAS_MASK);
 	smmu->s1_output_size = min_t(unsigned long, PHYS_MASK_SHIFT, size);
 
+	/* Xen: Stage-2 input size is not restricted */
+	smmu->s2_input_size = size;
+#if 0
 	/* Stage-2 input size limited due to pgd allocation (PTRS_PER_PGD) */
 #ifdef CONFIG_64BIT
 	smmu->s2_input_size = min_t(unsigned long, VA_BITS, size);
 #else
 	smmu->s2_input_size = min(32UL, size);
 #endif
+#endif
+
+	/*
+	 * Xen: SMMU v1: Only a 40-bit input address size is supported for
+	 * arm32. See arm_smmu_init_context_bank.
+	 */
+#ifdef CONFIG_ARM_32
+	if ( smmu->version == ARM_SMMU_V1 && smmu->s2_input_size != 40 )
+	{
+		dev_err(smmu->dev,
+			"Stage-2 Input size %ld not supported for SMMUv1\n",
+			smmu->s2_input_size);
+		return -ENODEV;
+	}
+#endif
 
 	/* The stage-2 output mask is also applied for bypass */
 	size = arm_smmu_id_size_to_bits((id >> ID2_OAS_SHIFT) & ID2_OAS_MASK);
@@ -2124,8 +2417,11 @@  static const struct of_device_id arm_smmu_of_match[] = {
 	{ .compatible = "arm,mmu-500", .data = (void *)ARM_SMMU_V2 },
 	{ },
 };
-MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
 
+/*
+ * Xen: We don't refcount allocated memory, so manually free it when an
+ * error occurs.
+ */
 static int arm_smmu_device_dt_probe(struct platform_device *pdev)
 {
 	const struct of_device_id *of_id;
@@ -2149,14 +2445,17 @@  static int arm_smmu_device_dt_probe(struct platform_device *pdev)
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	smmu->base = devm_ioremap_resource(dev, res);
-	if (IS_ERR(smmu->base))
-		return PTR_ERR(smmu->base);
+	if (IS_ERR(smmu->base)) {
+		err = PTR_ERR(smmu->base);
+		goto out_free;
+	}
 	smmu->size = resource_size(res);
 
 	if (of_property_read_u32(dev->of_node, "#global-interrupts",
 				 &smmu->num_global_irqs)) {
 		dev_err(dev, "missing #global-interrupts property\n");
-		return -ENODEV;
+		err = -ENODEV;
+		goto out_free;
 	}
 
 	num_irqs = 0;
@@ -2169,14 +2468,16 @@  static int arm_smmu_device_dt_probe(struct platform_device *pdev)
 	if (!smmu->num_context_irqs) {
 		dev_err(dev, "found %d interrupts but expected at least %d\n",
 			num_irqs, smmu->num_global_irqs + 1);
-		return -ENODEV;
+		err = -ENODEV;
+		goto out_free;
 	}
 
 	smmu->irqs = devm_kzalloc(dev, sizeof(*smmu->irqs) * num_irqs,
 				  GFP_KERNEL);
 	if (!smmu->irqs) {
 		dev_err(dev, "failed to allocate %d irqs\n", num_irqs);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto out_free;
 	}
 
 	for (i = 0; i < num_irqs; ++i) {
@@ -2184,7 +2485,8 @@  static int arm_smmu_device_dt_probe(struct platform_device *pdev)
 
 		if (irq < 0) {
 			dev_err(dev, "failed to get irq index %d\n", i);
-			return -ENODEV;
+			err = -ENODEV;
+			goto out_free;
 		}
 		smmu->irqs[i] = irq;
 	}
@@ -2259,12 +2561,20 @@  out_put_masters:
 	for (node = rb_first(&smmu->masters); node; node = rb_next(node)) {
 		struct arm_smmu_master *master
 			= container_of(node, struct arm_smmu_master, node);
-		of_node_put(master->of_node);
+		kfree(master);
 	}
 
+out_free:
+	kfree(smmu->irqs);
+	if (!IS_ERR(smmu->base))
+		iounmap(smmu->base);
+	kfree(smmu);
+
 	return err;
 }
 
+/* Xen: We never remove SMMU */
+#if 0
 static int arm_smmu_device_remove(struct platform_device *pdev)
 {
 	int i;
@@ -2359,3 +2669,237 @@  module_exit(arm_smmu_exit);
 MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations");
 MODULE_AUTHOR("Will Deacon <will.deacon@arm.com>");
 MODULE_LICENSE("GPL v2");
+#endif
+
+/* Xen specific function */
+
+static void arm_smmu_iotlb_flush_all(struct domain *d)
+{
+	struct arm_smmu_xen_domain *smmu_domain = domain_hvm_iommu(d)->arch.priv;
+	struct iommu_domain *cfg;
+
+	spin_lock(&smmu_domain->lock);
+	list_for_each_entry(cfg, &smmu_domain->contexts, list) {
+		/*
+		 * Only invalidate the context when SMMU is present.
+		 * This is because the context initialization is delayed
+		 * until a master has been added.
+		 */
+		if (unlikely(!ACCESS_ONCE(cfg->priv->smmu)))
+			continue;
+		arm_smmu_tlb_inv_context(cfg->priv);
+	}
+	spin_unlock(&smmu_domain->lock);
+}
+
+static void arm_smmu_iotlb_flush(struct domain *d, unsigned long gfn,
+                                 unsigned int page_count)
+{
+    /* ARM SMMU v1 doesn't have flush by VMA and VMID */
+    arm_smmu_iotlb_flush_all(d);
+}
+
+static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
+			       struct device *dev)
+{
+	struct iommu_domain *domain;
+	struct arm_smmu_xen_domain *xen_domain;
+	int ret;
+
+	xen_domain = domain_hvm_iommu(d)->arch.priv;
+
+	if (!dev->archdata.iommu) {
+		dev->archdata.iommu = xzalloc(struct arm_smmu_xen_device);
+		if (!dev->archdata.iommu)
+			return -ENOMEM;
+	}
+
+	if (!dev_iommu_group(dev)) {
+		ret = arm_smmu_add_device(dev);
+		if (ret)
+			return ret;
+	}
+
+	/*
+	 * TODO: Share the context bank (i.e. iommu_domain) when the device
+	 * is under the same SMMU as another device assigned to this domain.
+	 * It would be useful for PCI.
+	 */
+	domain = xzalloc(struct iommu_domain);
+	if (!domain)
+		return -ENOMEM;
+
+	ret = arm_smmu_domain_init(domain);
+	if (ret)
+		goto err_dom_init;
+
+	domain->priv->cfg.domain = d;
+
+	ret = arm_smmu_attach_dev(domain, dev);
+	if (ret)
+		goto err_attach_dev;
+
+	spin_lock(&xen_domain->lock);
+	/* Chain the new context to the domain */
+	list_add(&domain->list, &xen_domain->contexts);
+	spin_unlock(&xen_domain->lock);
+
+	return 0;
+
+err_attach_dev:
+	arm_smmu_domain_destroy(domain);
+err_dom_init:
+	xfree(domain);
+
+	return ret;
+}
+
+static int arm_smmu_deassign_dev(struct domain *d, struct device *dev)
+{
+	struct iommu_domain *domain = dev_iommu_domain(dev);
+	struct arm_smmu_xen_domain *xen_domain;
+
+	xen_domain = domain_hvm_iommu(d)->arch.priv;
+
+	if (!domain || domain->priv->cfg.domain != d) {
+		dev_err(dev, " not attached to domain %d\n", d->domain_id);
+		return -ESRCH;
+	}
+
+	arm_smmu_detach_dev(domain, dev);
+
+	spin_lock(&xen_domain->lock);
+	list_del(&domain->list);
+	spin_unlock(&xen_domain->lock);
+
+	arm_smmu_domain_destroy(domain);
+	xfree(domain);
+
+	return 0;
+}
+
+static int arm_smmu_reassign_dev(struct domain *s, struct domain *t,
+				 u8 devfn,  struct device *dev)
+{
+	int ret = 0;
+
+	/* Don't allow remapping to a domain other than the hwdom */
+	if (t != hardware_domain)
+		return -EPERM;
+
+	if (t == s)
+		return 0;
+
+	ret = arm_smmu_deassign_dev(s, dev);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int arm_smmu_iommu_domain_init(struct domain *d)
+{
+	struct arm_smmu_xen_domain *xen_domain;
+
+	xen_domain = xzalloc(struct arm_smmu_xen_domain);
+	if ( !xen_domain )
+		return -ENOMEM;
+
+	spin_lock_init(&xen_domain->lock);
+	INIT_LIST_HEAD(&xen_domain->contexts);
+
+	domain_hvm_iommu(d)->arch.priv = xen_domain;
+
+	return 0;
+}
+
+static void __hwdom_init arm_smmu_iommu_hwdom_init(struct domain *d)
+{
+}
+
+static void arm_smmu_iommu_domain_teardown(struct domain *d)
+{
+	struct arm_smmu_xen_domain *xen_domain = domain_hvm_iommu(d)->arch.priv;
+
+	ASSERT(list_empty(&xen_domain->contexts));
+	xfree(xen_domain);
+}
+
+
+static int arm_smmu_map_page(struct domain *d, unsigned long gfn,
+			     unsigned long mfn, unsigned int flags)
+{
+	p2m_type_t t;
+
+	/*
+	 * Grant mappings can be used for DMA requests. The dev_bus_addr
+	 * returned by the hypercall is the MFN (not the IPA). For a device
+	 * protected by an IOMMU, Xen needs to add a 1:1 mapping in the domain
+	 * p2m to allow DMA requests to work.
+	 * This is only valid when the domain is direct mapped. Hence this
+	 * function should only be used by gnttab code with gfn == mfn.
+	 */
+	BUG_ON(!is_domain_direct_mapped(d));
+	BUG_ON(mfn != gfn);
+
+	/* We only support readable and writable flags */
+	if (!(flags & (IOMMUF_readable | IOMMUF_writable)))
+		return -EINVAL;
+
+	t = (flags & IOMMUF_writable) ? p2m_iommu_map_rw : p2m_iommu_map_ro;
+
+	/*
+	 * The function guest_physmap_add_entry replaces the current mapping
+	 * if there is already one...
+	 */
+	return guest_physmap_add_entry(d, gfn, mfn, 0, t);
+}
+
+static int arm_smmu_unmap_page(struct domain *d, unsigned long gfn)
+{
+	/*
+	 * This function should only be used by gnttab code when the domain
+	 * is direct mapped
+	 */
+	if ( !is_domain_direct_mapped(d) )
+		return -EINVAL;
+
+	guest_physmap_remove_page(d, gfn, gfn, 0);
+
+	return 0;
+}
+
+static const struct iommu_ops arm_smmu_iommu_ops = {
+    .init = arm_smmu_iommu_domain_init,
+    .hwdom_init = arm_smmu_iommu_hwdom_init,
+    .teardown = arm_smmu_iommu_domain_teardown,
+    .iotlb_flush = arm_smmu_iotlb_flush,
+    .iotlb_flush_all = arm_smmu_iotlb_flush_all,
+    .assign_device = arm_smmu_assign_dev,
+    .reassign_device = arm_smmu_reassign_dev,
+    .map_page = arm_smmu_map_page,
+    .unmap_page = arm_smmu_unmap_page,
+};
+
+static __init int arm_smmu_dt_init(struct dt_device_node *dev,
+				   const void *data)
+{
+	int rc;
+
+	/*
+	 * Even if the device can't be initialized, we don't want to
+	 * give the SMMU device to dom0.
+	 */
+	dt_device_set_used_by(dev, DOMID_XEN);
+
+	rc = arm_smmu_device_dt_probe(dev);
+	if ( !rc )
+		iommu_set_ops(&arm_smmu_iommu_ops);
+
+	return rc;
+}
+
+DT_DEVICE_START(smmu, "ARM SMMU", DEVICE_IOMMU)
+	.dt_match = arm_smmu_of_match,
+	.init = arm_smmu_dt_init,
+DT_DEVICE_END