
[v7,3/8] arm64: introduce is_device_dma_coherent

Message ID 20141105165646.GN32700@e104818-lin.cambridge.arm.com
State New

Commit Message

Catalin Marinas Nov. 5, 2014, 4:56 p.m. UTC
On Mon, Nov 03, 2014 at 11:10:18AM +0000, Stefano Stabellini wrote:
> On Mon, 3 Nov 2014, Will Deacon wrote:
> > On Mon, Nov 03, 2014 at 10:46:03AM +0000, Stefano Stabellini wrote:
> > > On Mon, 27 Oct 2014, Stefano Stabellini wrote:
> > > > Introduce a boolean flag and an accessor function to check whether a
> > > > device is dma_coherent. Set the flag from set_arch_dma_coherent_ops.
> > > > 
> > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > CC: will.deacon@arm.com
> > > 
> > > Will, Catalin,
> > > are you OK with this patch?
> > 
> > It would be nicer if the dma_coherent flag didn't have to be duplicated by
> > each architecture in dev_archdata. Is there any reason not to put it in the
> > core code?
> 
> Yes, there is a reason for it: if I added a boolean dma_coherent flag in
> struct device as Catalin initially suggested, what would be the default
> for each architecture? Where would I set it for arch that don't use
> device tree?

You don't need to. An architecture whose DMA is always coherent doesn't
need to do anything. One whose DMA is always non-coherent only needs to
select HAVE_DMA_NONCOHERENT. One that has a mix of both needs to find a
way to set dev->dma_coherent. Since that's a new API you introduce, it
doesn't break any existing architectures.

Note that if !is_device_dma_coherent(), it doesn't always mean that
standard cache maintenance would be enough (but that's a Xen problem,
not sure how to solve).
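
The scheme above can be sketched as a small userspace model (illustrative C only, not the actual kernel code: the Kconfig symbol is stood in for by a plain #define, and struct device is reduced to the one flag under discussion):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the proposed scheme -- not kernel code.
 * A mixed-coherency architecture would select HAVE_DMA_NONCOHERENT;
 * here the Kconfig symbol is modeled with a preprocessor define. */
#define CONFIG_HAVE_DMA_NONCOHERENT 1

struct device {
	bool dma_coherent;	/* set by e.g. of_dma_configure() on DT platforms */
};

#ifdef CONFIG_HAVE_DMA_NONCOHERENT
/* Mixed-coherency architecture: trust the per-device flag. */
static inline bool is_device_dma_coherent(struct device *dev)
{
	return dev->dma_coherent;
}
#else
/* Always-coherent architecture: nothing to set, always true. */
static inline bool is_device_dma_coherent(struct device *dev)
{
	(void)dev;
	return true;
}
#endif
```

An always-coherent architecture simply never selects the symbol and gets the constant-true accessor for free.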


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Stefano Stabellini Nov. 5, 2014, 6:15 p.m. UTC | #1
On Wed, 5 Nov 2014, Catalin Marinas wrote:
> On Mon, Nov 03, 2014 at 11:10:18AM +0000, Stefano Stabellini wrote:
> > On Mon, 3 Nov 2014, Will Deacon wrote:
> > > On Mon, Nov 03, 2014 at 10:46:03AM +0000, Stefano Stabellini wrote:
> > > > On Mon, 27 Oct 2014, Stefano Stabellini wrote:
> > > > > Introduce a boolean flag and an accessor function to check whether a
> > > > > device is dma_coherent. Set the flag from set_arch_dma_coherent_ops.
> > > > > 
> > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > > > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > > CC: will.deacon@arm.com
> > > > 
> > > > Will, Catalin,
> > > > are you OK with this patch?
> > > 
> > > It would be nicer if the dma_coherent flag didn't have to be duplicated by
> > > each architecture in dev_archdata. Is there any reason not to put it in the
> > > core code?
> > 
> > Yes, there is a reason for it: if I added a boolean dma_coherent flag in
> > struct device as Catalin initially suggested, what would be the default
> > for each architecture? Where would I set it for arch that don't use
> > device tree?
> 
> You don't need to. An architecture whose DMA is always coherent doesn't
> need to do anything. One whose DMA is always non-coherent only needs to
> select HAVE_DMA_NONCOHERENT. One that has a mix of both needs to find a
> way to set dev->dma_coherent. Since that's a new API you introduce, it
> doesn't break any existing architectures.

I am not sure that this is better than the current patch but I can see
that this approach is not too controversial, so I am happy to go with
whatever the maintainers prefer.


> Note that if !is_device_dma_coherent(), it doesn't always mean that
> standard cache maintenance would be enough (but that's a Xen problem,
> not sure how to solve).

It is a thorny issue indeed.
Xen would need to know how to do non-standard cache maintenance
operations.
Otherwise we would need to resurrect XENFEAT_grant_map_identity (that I
am reverting in this series) and be content with having CONFIG_XEN_DOM0
depend on CONFIG_ARM_LPAE.


> diff --git a/arch/Kconfig b/arch/Kconfig
> index 05d7a8a458d5..8462b2e7491b 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -203,6 +203,9 @@ config HAVE_DMA_ATTRS
>  config HAVE_DMA_CONTIGUOUS
>  	bool
>  
> +config HAVE_DMA_NONCOHERENT
> +	bool
> +
>  config GENERIC_SMP_IDLE_THREAD
>         bool
>  
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 89c4b5ccc68d..fd7d5522764c 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -40,6 +40,7 @@ config ARM
>  	select HAVE_DMA_API_DEBUG
>  	select HAVE_DMA_ATTRS
>  	select HAVE_DMA_CONTIGUOUS if MMU
> +	select HAVE_DMA_NONCOHERENT if OF
>  	select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL)
>  	select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU
>  	select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL)
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 9532f8d5857e..eb7a5aa64e0e 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -46,6 +46,7 @@ config ARM64
>  	select HAVE_DMA_API_DEBUG
>  	select HAVE_DMA_ATTRS
>  	select HAVE_DMA_CONTIGUOUS
> +	select HAVE_DMA_NONCOHERENT
>  	select HAVE_DYNAMIC_FTRACE
>  	select HAVE_EFFICIENT_UNALIGNED_ACCESS
>  	select HAVE_FTRACE_MCOUNT_RECORD
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 3b64d0bf5bba..7e827726b702 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -183,6 +183,7 @@ static void of_dma_configure(struct device *dev)
>  	 * dma coherent operations.
>  	 */
>  	if (of_dma_is_coherent(dev->of_node)) {
> +		dev->dma_coherent = true;
>  		set_arch_dma_coherent_ops(dev);
>  		dev_dbg(dev, "device is dma coherent\n");
>  	}

I think that this would need to be #ifdef'ed as it is possible to have
OF support but no HAVE_DMA_NONCOHERENT (PPC?).


> diff --git a/include/linux/device.h b/include/linux/device.h
> index ce1f21608b16..e00ca876db01 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -796,6 +796,7 @@ struct device {
>  
>  	bool			offline_disabled:1;
>  	bool			offline:1;
> +	bool			dma_coherent:1;
>  };

I guess we would have to #ifdef CONFIG_HAVE_DMA_NONCOHERENT the
dma_coherent flag, right? Otherwise architectures that do not select
CONFIG_HAVE_DMA_NONCOHERENT (x86 for example) would end up with a flag
in struct device that doesn't reflect the properties of the device (dma
coherent devices with dev->dma_coherent == 0).

Overall it is a lot of ifdefs for not so much code sharing.


>  static inline struct device *kobj_to_dev(struct kobject *kobj)
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index d5d388160f42..0bdffba2337d 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -78,6 +78,18 @@ static inline int is_device_dma_capable(struct device *dev)
>  	return dev->dma_mask != NULL && *dev->dma_mask != DMA_MASK_NONE;
>  }
>  
> +#ifdef CONFIG_HAVE_DMA_NONCOHERENT
> +static inline int is_device_dma_coherent(struct device *dev)
> +{
> +	return dev->dma_coherent;
> +}
> +#else
> +static inline int is_device_dma_coherent(struct device *dev)
> +{
> +	return 1;
> +}
> +#endif
> +
>  #ifdef CONFIG_HAS_DMA
>  #include <asm/dma-mapping.h>
>  #else
Catalin Marinas Nov. 6, 2014, 10:33 a.m. UTC | #2
On Wed, Nov 05, 2014 at 06:15:38PM +0000, Stefano Stabellini wrote:
> On Wed, 5 Nov 2014, Catalin Marinas wrote:
> > On Mon, Nov 03, 2014 at 11:10:18AM +0000, Stefano Stabellini wrote:
> > > On Mon, 3 Nov 2014, Will Deacon wrote:
> > > > On Mon, Nov 03, 2014 at 10:46:03AM +0000, Stefano Stabellini wrote:
> > > > > On Mon, 27 Oct 2014, Stefano Stabellini wrote:
> > > > > > Introduce a boolean flag and an accessor function to check whether a
> > > > > > device is dma_coherent. Set the flag from set_arch_dma_coherent_ops.
> > > > > > 
> > > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > > > > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > > > CC: will.deacon@arm.com
> > > > > 
> > > > > Will, Catalin,
> > > > > are you OK with this patch?
> > > > 
> > > > It would be nicer if the dma_coherent flag didn't have to be duplicated by
> > > > each architecture in dev_archdata. Is there any reason not to put it in the
> > > > core code?
> > > 
> > > Yes, there is a reason for it: if I added a boolean dma_coherent flag in
> > > struct device as Catalin initially suggested, what would be the default
> > > for each architecture? Where would I set it for arch that don't use
> > > device tree?
> > 
> > You don't need to. An architecture whose DMA is always coherent doesn't
> > need to do anything. One whose DMA is always non-coherent only needs to
> > select HAVE_DMA_NONCOHERENT. One that has a mix of both needs to find a
> > way to set dev->dma_coherent. Since that's a new API you introduce, it
> > doesn't break any existing architectures.
> 
> I am not sure that this is better than the current patch but I can see
> that this approach is not too controversial, so I am happy to go with
> whatever the maintainers prefer.

Functionally it is the same, but just less code duplication.

> > Note that if !is_device_dma_coherent(), it doesn't always mean that
> > standard cache maintenance would be enough (but that's a Xen problem,
> > not sure how to solve).
> 
> It is a thorny issue indeed.
> Xen would need to know how to do non-standard cache maintenance
> operations.

Is EL2 hyp or EL1 dom0 doing such maintenance? If the latter, you could
just use the currently registered dma ops.

> Otherwise we would need to resurrect XENFEAT_grant_map_identity (that I
> am reverting in this series) and be content with having CONFIG_XEN_DOM0
> depend on CONFIG_ARM_LPAE.

So what does it buy you? Is it just the hope that with LPAE you won't have
weird system caches?

> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 05d7a8a458d5..8462b2e7491b 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -203,6 +203,9 @@ config HAVE_DMA_ATTRS
> >  config HAVE_DMA_CONTIGUOUS
> >  	bool
> >  
> > +config HAVE_DMA_NONCOHERENT
> > +	bool
> > +
> >  config GENERIC_SMP_IDLE_THREAD
> >         bool
> >  
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index 89c4b5ccc68d..fd7d5522764c 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -40,6 +40,7 @@ config ARM
> >  	select HAVE_DMA_API_DEBUG
> >  	select HAVE_DMA_ATTRS
> >  	select HAVE_DMA_CONTIGUOUS if MMU
> > +	select HAVE_DMA_NONCOHERENT if OF
> >  	select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL)
> >  	select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU
> >  	select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL)
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 9532f8d5857e..eb7a5aa64e0e 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -46,6 +46,7 @@ config ARM64
> >  	select HAVE_DMA_API_DEBUG
> >  	select HAVE_DMA_ATTRS
> >  	select HAVE_DMA_CONTIGUOUS
> > +	select HAVE_DMA_NONCOHERENT
> >  	select HAVE_DYNAMIC_FTRACE
> >  	select HAVE_EFFICIENT_UNALIGNED_ACCESS
> >  	select HAVE_FTRACE_MCOUNT_RECORD
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index 3b64d0bf5bba..7e827726b702 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -183,6 +183,7 @@ static void of_dma_configure(struct device *dev)
> >  	 * dma coherent operations.
> >  	 */
> >  	if (of_dma_is_coherent(dev->of_node)) {
> > +		dev->dma_coherent = true;
> >  		set_arch_dma_coherent_ops(dev);
> >  		dev_dbg(dev, "device is dma coherent\n");
> >  	}
> 
> I think that this would need to be #ifdef'ed as it is possible to have
> OF support but no HAVE_DMA_NONCOHERENT (PPC?).

The field is always there. But with !HAVE_DMA_NONCOHERENT,
is_device_dma_coherent() would always return 1. You could avoid
defining is_device_dma_coherent() entirely when !HAVE_DMA_NONCOHERENT,
it wouldn't be worse than your patch in terms of an undefined function.

> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index ce1f21608b16..e00ca876db01 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -796,6 +796,7 @@ struct device {
> >  
> >  	bool			offline_disabled:1;
> >  	bool			offline:1;
> > +	bool			dma_coherent:1;
> >  };
> 
> I guess we would have to #ifdef CONFIG_HAVE_DMA_NONCOHERENT the
> dma_coherent flag, right? Otherwise architectures that do not select
> CONFIG_HAVE_DMA_NONCOHERENT (x86 for example) would end up with a flag
> in struct device that doesn't reflect the properties of the device (dma
> coherent devices with dev->dma_coherent == 0).

In my proposal you should not read this field directly but rather access
it only via is_device_dma_coherent() (you can add a function for setting
it as well).
Stefano Stabellini Nov. 6, 2014, 12:22 p.m. UTC | #3
On Thu, 6 Nov 2014, Catalin Marinas wrote:
> On Wed, Nov 05, 2014 at 06:15:38PM +0000, Stefano Stabellini wrote:
> > On Wed, 5 Nov 2014, Catalin Marinas wrote:
> > > On Mon, Nov 03, 2014 at 11:10:18AM +0000, Stefano Stabellini wrote:
> > > > On Mon, 3 Nov 2014, Will Deacon wrote:
> > > > > On Mon, Nov 03, 2014 at 10:46:03AM +0000, Stefano Stabellini wrote:
> > > > > > On Mon, 27 Oct 2014, Stefano Stabellini wrote:
> > > > > > > Introduce a boolean flag and an accessor function to check whether a
> > > > > > > device is dma_coherent. Set the flag from set_arch_dma_coherent_ops.
> > > > > > > 
> > > > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > > > > > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > > > > CC: will.deacon@arm.com
> > > > > > 
> > > > > > Will, Catalin,
> > > > > > are you OK with this patch?
> > > > > 
> > > > > It would be nicer if the dma_coherent flag didn't have to be duplicated by
> > > > > each architecture in dev_archdata. Is there any reason not to put it in the
> > > > > core code?
> > > > 
> > > > Yes, there is a reason for it: if I added a boolean dma_coherent flag in
> > > > struct device as Catalin initially suggested, what would be the default
> > > > for each architecture? Where would I set it for arch that don't use
> > > > device tree?
> > > 
> > > You don't need to. An architecture whose DMA is always coherent doesn't
> > > need to do anything. One whose DMA is always non-coherent only needs to
> > > select HAVE_DMA_NONCOHERENT. One that has a mix of both needs to find a
> > > way to set dev->dma_coherent. Since that's a new API you introduce, it
> > > doesn't break any existing architectures.
> > 
> > I am not sure that this is better than the current patch but I can see
> > that this approach is not too controversial, so I am happy to go with
> > whatever the maintainers prefer.
> 
> Functionally it is the same, but just less code duplication.
> 
> > > Note that if !is_device_dma_coherent(), it doesn't always mean that
> > > standard cache maintenance would be enough (but that's a Xen problem,
> > > not sure how to solve).
> > 
> > It is a thorny issue indeed.
> > Xen would need to know how to do non-standard cache maintenance
> > operations.
> 
> Is EL2 hyp or EL1 dom0 doing such maintenance? If the latter, you could
> just use the currently registered dma ops.

It is Xen, so the EL2 hypervisor.


> > Otherwise we would need to resurrect XENFEAT_grant_map_identity (that I
> > am reverting in this series) and be content with having CONFIG_XEN_DOM0
> > depend on CONFIG_ARM_LPAE.
> 
> So what does it buy you? Is it just the hope that with LPAE you won't have
> weird system caches?

Let me explain a bit more and let's assume there is no SMMU.

The problem is not normal dma originating in dom0, currently mapped 1:1
(pseudo-physical == physical). The problem are dma operations that
involve foreign pages (pages belonging to other virtual machines)
currently mapped also in dom0 to do IO. It is common for the Xen PV
network and block frontends to grant access to one or more pages to the
network and block backends for IO. The backends, usually in dom0, map
the pages and perform the actual IO. Sometimes the IO is a dma operation
that involves one of the granted pages directly.

For these pages, the pseudo-physical address in dom0 is different from
the physical address. Dom0 keeps track of the pseudo-physical to
physical relationship (pfn_to_mfn in Xen terminology). The swiotlb-xen
driver makes sure to return the right mfn from map_page and map_sg.

However some of the dma_ops give you only a dma_addr_t handle
(unmap_page and unmap_sg), that is the physical address of the page.
Dom0 doesn't know how to find the pseudo-physical address corresponding
to it. Therefore it also doesn't know how to find the struct page for
it. This is not a trivial problem to solve as the same page can be
granted multiple times, therefore could have multiple pseudo-physical
addresses and multiple struct pages corresponding to one physical
address in dom0.
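
As a toy illustration of this ambiguity (a userspace model with made-up pfn/mfn numbers, not Xen's real p2m code):

```c
#include <assert.h>

/* Userspace model of the ambiguity described above (illustrative only).
 * Two grant mappings in dom0 (two pseudo-physical frames, "pfn") can
 * reference the same machine frame ("mfn"), so pfn -> mfn is a function
 * but mfn -> pfn is not. */
#define NR_PFNS 4
static unsigned long p2m[NR_PFNS] = {
	/* pfn 0 */ 100,	/* local page */
	/* pfn 1 */ 777,	/* foreign page, granted once */
	/* pfn 2 */ 777,	/* the same foreign page, granted a second time */
	/* pfn 3 */ 103,	/* local page */
};

static unsigned long pfn_to_mfn(unsigned long pfn)
{
	return p2m[pfn];	/* forward lookup: well defined */
}

/* Count how many dom0 pfns map to a given mfn: a count > 1 means the
 * reverse lookup needed by unmap_page()/unmap_sg() has no unique answer. */
static int mfn_refcount(unsigned long mfn)
{
	int n = 0;
	for (unsigned long pfn = 0; pfn < NR_PFNS; pfn++)
		if (p2m[pfn] == mfn)
			n++;
	return n;
}
```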

Fortunately these dma_ops don't actually need to do much. In fact they
only need to flush caches if the device is not dma-coherent. So the
current proposed solution is to issue a hypercall to ask Xen to do the
flushing, passing the physical address dom0 knows.

The older solution that this series is reverting, is based on
XENFEAT_grant_map_identity, that was a feature offered by Xen, already
reverted in the hypervisor. XENFEAT_grant_map_identity told dom0 that
the hypervisor mapped foreign pages at the address requested by dom0
*and* at pseudo-physical == physical address. That way Dom0 could always
find the page at pseudo-physical == physical address and do the flushing
there. However it only works if Dom0 is compiled with CONFIG_ARM_LPAE
(a non-LPAE kernel is not guaranteed to be able to reach the
pseudo-physical address in question) and we didn't want to introduce
that limitation so we changed approach.


> > > diff --git a/arch/Kconfig b/arch/Kconfig
> > > index 05d7a8a458d5..8462b2e7491b 100644
> > > --- a/arch/Kconfig
> > > +++ b/arch/Kconfig
> > > @@ -203,6 +203,9 @@ config HAVE_DMA_ATTRS
> > >  config HAVE_DMA_CONTIGUOUS
> > >  	bool
> > >  
> > > +config HAVE_DMA_NONCOHERENT
> > > +	bool
> > > +
> > >  config GENERIC_SMP_IDLE_THREAD
> > >         bool
> > >  
> > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > > index 89c4b5ccc68d..fd7d5522764c 100644
> > > --- a/arch/arm/Kconfig
> > > +++ b/arch/arm/Kconfig
> > > @@ -40,6 +40,7 @@ config ARM
> > >  	select HAVE_DMA_API_DEBUG
> > >  	select HAVE_DMA_ATTRS
> > >  	select HAVE_DMA_CONTIGUOUS if MMU
> > > +	select HAVE_DMA_NONCOHERENT if OF
> > >  	select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL)
> > >  	select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU
> > >  	select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL)
> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > index 9532f8d5857e..eb7a5aa64e0e 100644
> > > --- a/arch/arm64/Kconfig
> > > +++ b/arch/arm64/Kconfig
> > > @@ -46,6 +46,7 @@ config ARM64
> > >  	select HAVE_DMA_API_DEBUG
> > >  	select HAVE_DMA_ATTRS
> > >  	select HAVE_DMA_CONTIGUOUS
> > > +	select HAVE_DMA_NONCOHERENT
> > >  	select HAVE_DYNAMIC_FTRACE
> > >  	select HAVE_EFFICIENT_UNALIGNED_ACCESS
> > >  	select HAVE_FTRACE_MCOUNT_RECORD
> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index 3b64d0bf5bba..7e827726b702 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -183,6 +183,7 @@ static void of_dma_configure(struct device *dev)
> > >  	 * dma coherent operations.
> > >  	 */
> > >  	if (of_dma_is_coherent(dev->of_node)) {
> > > +		dev->dma_coherent = true;
> > >  		set_arch_dma_coherent_ops(dev);
> > >  		dev_dbg(dev, "device is dma coherent\n");
> > >  	}
> > 
> > I think that this would need to be #ifdef'ed as it is possible to have
> > OF support but no HAVE_DMA_NONCOHERENT (PPC?).
> 
> The field is always there. But with !HAVE_DMA_NONCOHERENT,
> is_device_dma_coherent() would always return 1. You could avoid
> defining is_device_dma_coherent() entirely when !HAVE_DMA_NONCOHERENT,
> it wouldn't be worse than your patch in terms of an undefined function.
> 
> > > diff --git a/include/linux/device.h b/include/linux/device.h
> > > index ce1f21608b16..e00ca876db01 100644
> > > --- a/include/linux/device.h
> > > +++ b/include/linux/device.h
> > > @@ -796,6 +796,7 @@ struct device {
> > >  
> > >  	bool			offline_disabled:1;
> > >  	bool			offline:1;
> > > +	bool			dma_coherent:1;
> > >  };
> > 
> > I guess we would have to #ifdef CONFIG_HAVE_DMA_NONCOHERENT the
> > dma_coherent flag, right? Otherwise architectures that do not select
> > CONFIG_HAVE_DMA_NONCOHERENT (x86 for example) would end up with a flag
> > in struct device that doesn't reflect the properties of the device (dma
> > coherent devices with dev->dma_coherent == 0).
> 
> In my proposal you should not read this field directly but rather access
> it only via is_device_dma_coherent() (you can add a function for setting
> it as well).

This is the part that I don't like: on other architectures you now have a
field in struct device that is actually incorrect. It is true that one
should call the accessor function to read the property but still...

As I don't feel that strongly against it, I'll resend including this
patch (with you as the author of course).
Catalin Marinas Nov. 6, 2014, 1:40 p.m. UTC | #4
On Thu, Nov 06, 2014 at 12:22:13PM +0000, Stefano Stabellini wrote:
> As I don't feel that strongly against it, I'll resend including this
> patch (with you as the author of course).

Not just yet, let me understand your reply properly ;).
Catalin Marinas Nov. 7, 2014, 11:05 a.m. UTC | #5
On Thu, Nov 06, 2014 at 12:22:13PM +0000, Stefano Stabellini wrote:
> On Thu, 6 Nov 2014, Catalin Marinas wrote:
> > On Wed, Nov 05, 2014 at 06:15:38PM +0000, Stefano Stabellini wrote:
> > > On Wed, 5 Nov 2014, Catalin Marinas wrote:
> > > > Note that if !is_device_dma_coherent(), it doesn't always mean that
> > > > standard cache maintenance would be enough (but that's a Xen problem,
> > > > not sure how to solve).
> > > 
> > > It is a thorny issue indeed.
> > > Xen would need to know how to do non-standard cache maintenance
> > > operations.
> > 
> > Is EL2 hyp or EL1 dom0 doing such maintenance? If the latter, you could
> > just use the currently registered dma ops.
> 
> It is Xen, so the EL2 hypervisor.

So what is arch/arm/xen/mm32.c cache maintenance for? Doesn't it run in
dom0?

> > > Otherwise we would need to resurrect XENFEAT_grant_map_identity (that I
> > > am reverting in this series) and be content with having CONFIG_XEN_DOM0
> > > depend on CONFIG_ARM_LPAE.
> > 
> > So what does it buy you? Is it just the hope that with LPAE you won't have
> > weird system caches?
> 
> Let me explain a bit more and let's assume there is no SMMU.
> 
> The problem is not normal dma originating in dom0, currently mapped 1:1
> (pseudo-physical == physical). The problem is dma operations that
> involve foreign pages (pages belonging to other virtual machines)
> currently mapped also in dom0 to do IO. It is common for the Xen PV
> network and block frontends to grant access to one or more pages to the
> network and block backends for IO. The backends, usually in dom0, map
> the pages and perform the actual IO. Sometimes the IO is a dma operation
> that involves one of the granted pages directly.
> 
> For these pages, the pseudo-physical address in dom0 is different from
> the physical address. Dom0 keeps track of the pseudo-physical to
> physical relationship (pfn_to_mfn in Xen terminology). The swiotlb-xen
> driver makes sure to return the right mfn from map_page and map_sg.

For my understanding, xen_dma_map_page() is called from dom0 and it
simply calls the __generic_dma_ops()->map_page(). The "page" argument
here is the dom0 page and it gets translated to dom0 phys and then to a
bus address via xen_phys_to_bus().

On arm64, __swiotlb_map_page() uses the page structure to get some
pseudo device address which gets converted back to a dom0 virtual
address. The latter is used for cache maintenance by VA in dom0. With
PIPT-like caches, that's fine, you don't need the actual machine PA
unless you have caches like PL310 (which are not compatible with
virtualisation anyway and the latest ARMv8 ARM even makes a clear
statement here).

(one potential problem here is the dma_capable() check in
swiotlb_map_page() (non-xen) which uses the pseudo device address)

The interesting part is that xen_swiotlb_map_page() returns the real
machine device address to dom0, which is different from what dom0 thinks
of a device address (phys_to_dma(page_to_phys(page))). Does dom0 use
such real/machine device address to program the device directly or there
is another layer where you could do the translation?

> However some of the dma_ops give you only a dma_addr_t handle
> (unmap_page and unmap_sg), that is the physical address of the page.

OK, I started to get it now, the __swiotlb_unmap_page() on arm64, if
called from dom0, would get the machine device address and dma_to_phys()
does not work.

> Dom0 doesn't know how to find the pseudo-physical address corresponding
> to it. Therefore it also doesn't know how to find the struct page for
> it. This is not a trivial problem to solve as the same page can be
> granted multiple times, therefore could have multiple pseudo-physical
> addresses and multiple struct pages corresponding to one physical
> address in dom0.

So what are mfn_to_pfn() and xen_bus_to_phys() used for? Isn't
mfn_to_pfn(pfn_to_mfn(x)) an identity function?

> Fortunately these dma_ops don't actually need to do much. In fact they
> only need to flush caches if the device is not dma-coherent. So the
> current proposed solution is to issue a hypercall to ask Xen to do the
> flushing, passing the physical address dom0 knows.

I really don't like the dma_cache_maint() duplication you added in
arch/arm/xen/mm32.c just because you cannot call the host dma_ops when
the page is not available.

You basically only need a VA that was mapped in dom0 when map_page() was
called and is still mapped when page_unmap() is called. You can free the
actual mapping after calling __generic_dma_ops()->unmap_page() in
xen_dma_unmap_page() (in xen_unmap_single()).
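
A minimal userspace sketch of the bookkeeping this suggestion implies (the names and the fixed-size table are invented for illustration; a real implementation would need a proper lookup structure and locking):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model, not kernel code: record the dom0 VA at map time,
 * keyed by the machine bus address handed to the device, so that unmap
 * can flush by VA even though dma_to_phys() cannot invert a machine
 * address for a foreign page. */
#define MAX_MAPPINGS 16

struct va_entry {
	unsigned long long dev_addr;	/* machine bus address given to the device */
	void *va;			/* dom0 VA established at map_page() time */
	int in_use;
};

static struct va_entry va_table[MAX_MAPPINGS];

/* Called from the map_page() path: remember dev_addr -> va. */
static int record_mapping(unsigned long long dev_addr, void *va)
{
	for (int i = 0; i < MAX_MAPPINGS; i++) {
		if (!va_table[i].in_use) {
			va_table[i] = (struct va_entry){ dev_addr, va, 1 };
			return 0;
		}
	}
	return -1;	/* table full */
}

/* Called from the unmap_page() path: fetch the VA for the flush, then
 * release the entry (the mapping itself could be torn down afterwards). */
static void *lookup_and_release(unsigned long long dev_addr)
{
	for (int i = 0; i < MAX_MAPPINGS; i++) {
		if (va_table[i].in_use && va_table[i].dev_addr == dev_addr) {
			va_table[i].in_use = 0;
			return va_table[i].va;
		}
	}
	return NULL;
}
```

The cost of maintaining exactly this kind of structure on every map/unmap is the objection raised in the reply below.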

BTW, drivers/xen/swiotlb-xen.c needs something like:

        dma_addr_t dev_addr = xen_phys_to_bus(phys_to_dma(phys));

phys != bus address.
Catalin Marinas Nov. 7, 2014, 11:11 a.m. UTC | #6
On Fri, Nov 07, 2014 at 11:05:25AM +0000, Catalin Marinas wrote:
> BTW, drivers/xen/swiotlb-xen.c needs something like:
> 
>         dma_addr_t dev_addr = xen_phys_to_bus(phys_to_dma(phys));
> 
> phys != bus address.

Wrong place, what I meant is when calling:

        xen_dma_unmap_page(hwdev, phys_to_dma(paddr), size, dir, attrs);
Stefano Stabellini Nov. 7, 2014, 2:10 p.m. UTC | #7
On Fri, 7 Nov 2014, Catalin Marinas wrote:
> On Thu, Nov 06, 2014 at 12:22:13PM +0000, Stefano Stabellini wrote:
> > On Thu, 6 Nov 2014, Catalin Marinas wrote:
> > > On Wed, Nov 05, 2014 at 06:15:38PM +0000, Stefano Stabellini wrote:
> > > > On Wed, 5 Nov 2014, Catalin Marinas wrote:
> > > > > Note that if !is_device_dma_coherent(), it doesn't always mean that
> > > > > standard cache maintenance would be enough (but that's a Xen problem,
> > > > > not sure how to solve).
> > > > 
> > > > It is a thorny issue indeed.
> > > > Xen would need to know how to do non-standard cache maintenance
> > > > operations.
> > > 
> > > Is EL2 hyp or EL1 dom0 doing such maintenance? If the latter, you could
> > > just use the currently registered dma ops.
> > 
> > It is Xen, so the EL2 hypervisor.
> 
> So what is arch/arm/xen/mm32.c cache maintenance for? Doesn't it run in
> dom0?

This patch series changes that code path specifically. As a matter of
fact it removes mm32.c.
Before this patch series, cache maintenance in dom0 was being done with
XENFEAT_grant_map_identity (see explanation in previous email).
After this patch series, a hypercall is made when foreign pages are
involved, see patch 8/8, the if(!pfn_valid(pfn)) code path. Local pages
are still flushed in dom0.
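
The resulting dispatch can be modeled in a few lines (userspace sketch only; pfn_valid() is stubbed out and the enum and function names are invented for illustration, not taken from the series):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the dispatch described above (patch 8/8's
 * "if (!pfn_valid(pfn))" path, as summarized in this thread). */

enum maint_path { LOCAL_CPU_FLUSH, XEN_HYPERCALL };

/* Model: dom0's own RAM covers pfns below MAX_LOCAL_PFN; a foreign
 * (granted) page shows up at a pfn with no struct page, i.e. !pfn_valid. */
#define MAX_LOCAL_PFN 1024UL

static bool pfn_valid(unsigned long pfn)
{
	return pfn < MAX_LOCAL_PFN;
}

/* Decide who performs cache maintenance for a non-coherent device. */
static enum maint_path dma_maint_path(unsigned long pfn)
{
	if (!pfn_valid(pfn))
		return XEN_HYPERCALL;	/* foreign page: ask Xen to flush */
	return LOCAL_CPU_FLUSH;		/* local page: dom0 flushes via dma_ops */
}
```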


> > > > Otherwise we would need to resurrect XENFEAT_grant_map_identity (that I
> > > > am reverting in this series) and be content with having CONFIG_XEN_DOM0
> > > > depend on CONFIG_ARM_LPAE.
> > > 
> > > So what does it buy you? Is it just the hope that with LPAE you won't have
> > > weird system caches?
> > 
> > Let me explain a bit more and let's assume there is no SMMU.
> > 
> > The problem is not normal dma originating in dom0, currently mapped 1:1
> > (pseudo-physical == physical). The problem is dma operations that
> > involve foreign pages (pages belonging to other virtual machines)
> > currently mapped also in dom0 to do IO. It is common for the Xen PV
> > network and block frontends to grant access to one or more pages to the
> > network and block backends for IO. The backends, usually in dom0, map
> > the pages and perform the actual IO. Sometimes the IO is a dma operation
> > that involves one of the granted pages directly.
> > 
> > For these pages, the pseudo-physical address in dom0 is different from
> > the physical address. Dom0 keeps track of the pseudo-physical to
> > physical relationship (pfn_to_mfn in Xen terminology). The swiotlb-xen
> > driver makes sure to return the right mfn from map_page and map_sg.
> 
> For my understanding, xen_dma_map_page() is called from dom0 and it
> simply calls the __generic_dma_ops()->map_page(). The "page" argument
> here is the dom0 page and it gets translated to dom0 phys and then to a
> bus address via xen_phys_to_bus().
> 
> On arm64, __swiotlb_map_page() uses the page structure to get some
> pseudo device address which gets converted back to a dom0 virtual
> address. The latter is used for cache maintenance by VA in dom0. With
> PIPT-like caches, that's fine, you don't need the actual machine PA
> unless you have caches like PL310 (which are not compatible with
> virtualisation anyway and the latest ARMv8 ARM even makes a clear
> statement here).
> 
> (one potential problem here is the dma_capable() check in
> swiotlb_map_page() (non-xen) which uses the pseudo device address)
> 
> The interesting part is that xen_swiotlb_map_page() returns the real
> machine device address to dom0, which is different from what dom0 thinks
> of a device address (phys_to_dma(page_to_phys(page))). Does dom0 use
> such real/machine device address to program the device directly or there
> is another layer where you could do the translation?

No other layer. Dom0 uses the real/machine device address returned by
xen_phys_to_bus to program the device directly.


> > However some of the dma_ops give you only a dma_addr_t handle
> > (unmap_page and unmap_sg), that is the physical address of the page.
> 
> OK, I started to get it now, the __swiotlb_unmap_page() on arm64, if
> called from dom0, would get the machine device address and dma_to_phys()
> does not work.

Right!


> > Dom0 doesn't know how to find the pseudo-physical address corresponding
> > to it. Therefore it also doesn't know how to find the struct page for
> > it. This is not a trivial problem to solve as the same page can be
> > granted multiple times, therefore could have multiple pseudo-physical
> > addresses and multiple struct pages corresponding to one physical
> > address in dom0.
> 
> So what are mfn_to_pfn() and xen_bus_to_phys() used for?

They are not used on ARM, but swiotlb-xen is not ARM-specific and
xen_bus_to_phys is still important on x86, where it works differently.


> Isn't mfn_to_pfn(pfn_to_mfn(x)) an identity function?

Yes, it should be, and it is for local pages.
However, it is not an identity function for foreign pages, because
mfn_to_pfn is broken and just returns the pfn on ARM, while pfn_to_mfn
works correctly.


> > Fortunately these dma_ops don't actually need to do much. In fact they
> > only need to flush caches if the device is not dma-coherent. So the
> > current proposed solution is to issue an hypercall to ask Xen to do the
> > flushing, passing the physical address dom0 knows.
> 
> I really don't like the dma_cache_maint() duplication you added in
> arch/arm/xen/mm32.c just because you cannot call the host dma_ops when
> the page is not available.

I agree it is bad, but this patch series is a big step forward in removing
the duplication. In fact, now that I think about it, when a page is local
(not a foreign page) I could just call the host dma_ops. mm.c:dma_cache_maint
would only be used for foreign pages (pages for which !pfn_valid(pfn)).


> You basically only need a VA that was mapped in dom0 when map_page() was
> called and is still mapped when page_unmap() is called. You can free the
> actual mapping after calling __generic_dma_ops()->unmap_page() in
> xen_dma_unmap_page() (in xen_unmap_single()).

That is true, but where would I store the mapping?

I don't want to introduce another data structure to keep physical-to-VA
or physical-to-struct-page mappings that I would need to modify on every
map_page and unmap_page. It wouldn't even be a trivial data structure, as
multiple dma operations involving the same mfn but different struct pages
can happen simultaneously.

The performance hit of maintaining such a data structure would be
significant.

It would negatively impact any dma operations involving any device,
including dma-coherent ones, while this series only requires special
handling of dma operations involving non-coherent devices (resulting in
a hypercall being made).

In a way this series rewards hardware that makes life easier for software
developers :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Stefano Stabellini Nov. 7, 2014, 2:10 p.m. UTC | #8
On Fri, 7 Nov 2014, Catalin Marinas wrote:
> On Fri, Nov 07, 2014 at 11:05:25AM +0000, Catalin Marinas wrote:
> > BTW, drivers/xen/swiotlb-xen.c needs something like:
> > 
> >         dma_addr_t dev_addr = xen_phys_to_bus(phys_to_dma(phys));
> > 
> > phys != bus address.
> 
> Wrong place, what I meant is when calling:
> 
>         xen_dma_unmap_page(hwdev, phys_to_dma(paddr), size, dir, attrs);

Well spotted! This is an actual mistake in the current code, independent
of the problem described before. Julien reported it a couple of days
ago and I have already queued a patch to fix it at the end of the
series.
Catalin Marinas Nov. 7, 2014, 4 p.m. UTC | #9
(talking to Ian in person helped a bit ;))

On Fri, Nov 07, 2014 at 02:10:25PM +0000, Stefano Stabellini wrote:
> On Fri, 7 Nov 2014, Catalin Marinas wrote:
> > On Thu, Nov 06, 2014 at 12:22:13PM +0000, Stefano Stabellini wrote:
> > > On Thu, 6 Nov 2014, Catalin Marinas wrote:
> > > > On Wed, Nov 05, 2014 at 06:15:38PM +0000, Stefano Stabellini wrote:
> > > > > On Wed, 5 Nov 2014, Catalin Marinas wrote:
> > > > > > Note that if !is_device_dma_coherent(), it doesn't always mean that
> > > > > > standard cache maintenance would be enough (but that's a Xen problem,
> > > > > > not sure how to solve).
> > > > > 
> > > > > It is a thorny issue indeed.
> > > > > Xen would need to know how to do non-standard cache maintenance
> > > > > operations.
> > > > 
> > > > Is EL2 hyp or EL1 dom0 doing such maintenance? If the latter, you could
> > > > just use the currently registered dma ops.
> > > 
> > > It is Xen, so El2 hypervisor.
> > 
> > So what is arch/arm/xen/mm32.c cache maintenance for? Doesn't it run in
> > dom0?
> 
> This patch series changes that code path specifically. As a matter of
> fact it removes mm32.c.

Well, it actually merges it into another file; dma_cache_maint() still
remains.

> Before this patch series, cache maintenance in dom0 was being done with
> XENFEAT_grant_map_identity (see explanation in previous email).
> After this patch series, an hypercall is made when foreign pages are
> involved, see patch 8/8, the if(!pfn_valid(pfn)) code path. Local pages
> are still flushed in dom0.

OK. One question I had, though, is whether pfn_valid(mfn) is always false
for foreign pages. IIUC, dom0 uses a 1:1 pfn:mfn mapping, so it would work.
But from what I gather, there can be another guest, not necessarily dom0,
handling the DMA, in which case, if it doesn't use a 1:1 pfn:mfn mapping,
your pfn_valid(mfn) check would no longer work for foreign pages.

> > > Fortunately these dma_ops don't actually need to do much. In fact they
> > > only need to flush caches if the device is not dma-coherent. So the
> > > current proposed solution is to issue an hypercall to ask Xen to do the
> > > flushing, passing the physical address dom0 knows.
> > 
> > I really don't like the dma_cache_maint() duplication you added in
> > arch/arm/xen/mm32.c just because you cannot call the host dma_ops when
> > the page is not available.
> 
> I agree is bad, but this patch series is a big step forward removing the
> duplication. In fact now that I think about it, when a page is local (not a
> foreign page) I could just call the host dma_ops. mm.c:dma_cache_maint
> would only be used for foreign pages (pages for which !pfn_valid(pfn)).

Indeed. I already replied to patch 6/8. This way I think you can remove
the dma_cache_maint() duplication entirely and just add one specific to
foreign pages.

What I would like to see is xen_dma_map_page() also using hyp calls for
cache maintenance when !pfn_valid(), for symmetry with unmap. You would
need another argument to xen_dma_map_page() though to pass the real
device address or mfn (and on the map side you could simply check for
page_to_pfn(page) != mfn). For such cases, Xen swiotlb already handles
bouncing so you don't need dom0 swiotlb involved as well.

> > You basically only need a VA that was mapped in dom0 when map_page() was
> > called and is still mapped when page_unmap() is called. You can free the
> > actual mapping after calling __generic_dma_ops()->unmap_page() in
> > xen_dma_unmap_page() (in xen_unmap_single()).
> 
> That is true, but where would I store the mapping?

dev_archdata ;).

> I don't want to introduce another data structure to keep physical to va
> or physical to struct page mappings that I need to modify at every
> map_page and unmap_page. It wouldn't even be a trivial data structure as
> multiple dma operations involving the same mfn but different struct page
> can happen simultaneously.
> 
> The performance hit of maintaining such a data structure would be
> significant.

There would indeed be a performance hit especially for sg ops.

> It would negatively impact any dma operations involving any devices,
> including dma coherent ones. While this series only requires special
> handling of dma operations involving non-coherent devices (resulting in
> an hypercall being made).
> 
> In a way this series rewards hardware that makes life easier to software
> developers :-)

I only need the pfn_valid() clarification now together with the
map/unmap symmetry fix.

I think for the time being is_device_dma_coherent() can stay as an
arch-only function as it is only used by Xen on arm/arm64. If we later
need this on other architectures, we can make it generic.
Stefano Stabellini Nov. 7, 2014, 5:24 p.m. UTC | #10
On Fri, 7 Nov 2014, Catalin Marinas wrote:
> (talking to Ian in person helped a bit ;))
> 
> On Fri, Nov 07, 2014 at 02:10:25PM +0000, Stefano Stabellini wrote:
> > On Fri, 7 Nov 2014, Catalin Marinas wrote:
> > > On Thu, Nov 06, 2014 at 12:22:13PM +0000, Stefano Stabellini wrote:
> > > > On Thu, 6 Nov 2014, Catalin Marinas wrote:
> > > > > On Wed, Nov 05, 2014 at 06:15:38PM +0000, Stefano Stabellini wrote:
> > > > > > On Wed, 5 Nov 2014, Catalin Marinas wrote:
> > > > > > > Note that if !is_device_dma_coherent(), it doesn't always mean that
> > > > > > > standard cache maintenance would be enough (but that's a Xen problem,
> > > > > > > not sure how to solve).
> > > > > > 
> > > > > > It is a thorny issue indeed.
> > > > > > Xen would need to know how to do non-standard cache maintenance
> > > > > > operations.
> > > > > 
> > > > > Is EL2 hyp or EL1 dom0 doing such maintenance? If the latter, you could
> > > > > just use the currently registered dma ops.
> > > > 
> > > > It is Xen, so El2 hypervisor.
> > > 
> > > So what is arch/arm/xen/mm32.c cache maintenance for? Doesn't it run in
> > > dom0?
> > 
> > This patch series changes that code path specifically. As a matter of
> > fact it removes mm32.c.
> 
> Well, it actually merges it into another file, dma_cache_maint() still
> remains.
> 
> > Before this patch series, cache maintenance in dom0 was being done with
> > XENFEAT_grant_map_identity (see explanation in previous email).
> > After this patch series, an hypercall is made when foreign pages are
> > involved, see patch 8/8, the if(!pfn_valid(pfn)) code path. Local pages
> > are still flushed in dom0.
> 
> OK. One question I had though is whether pfn_valid(mfn) is always false for
> foreign pages. IIUC, dom0 uses a 1:1 pfn:mfn mapping, so it would work.

Right


> But from what I gather, there can be other guest, not necessarily dom0,
> handling the DMA in which case, if it doesn't use a 1:1 pfn:mfn mapping,
> your pfn_valid(mfn) check would no longer work for foreign pages.

We don't support assigning devices to guests other than dom0 without an
SMMU. With an SMMU it should all be transparent. Either way the
pfn_valid check should always work.



> > > > Fortunately these dma_ops don't actually need to do much. In fact they
> > > > only need to flush caches if the device is not dma-coherent. So the
> > > > current proposed solution is to issue an hypercall to ask Xen to do the
> > > > flushing, passing the physical address dom0 knows.
> > > 
> > > I really don't like the dma_cache_maint() duplication you added in
> > > arch/arm/xen/mm32.c just because you cannot call the host dma_ops when
> > > the page is not available.
> > 
> > I agree is bad, but this patch series is a big step forward removing the
> > duplication. In fact now that I think about it, when a page is local (not a
> > foreign page) I could just call the host dma_ops. mm.c:dma_cache_maint
> > would only be used for foreign pages (pages for which !pfn_valid(pfn)).
> 
> Indeed. I already replied to patch 6/8. This way I think you can remove
> dma_cache_maint() duplication entirely, just add one specific to foreign
> pages.

Yes, that was a very good suggestion, the code is much cleaner now.
Thanks!


> What I would like to see is xen_dma_map_page() also using hyp calls for
> cache maintenance when !pfn_valid(), for symmetry with unmap. You would
> need another argument to xen_dma_map_page() though to pass the real
> device address or mfn (and on the map side you could simply check for
> page_to_pfn(page) != mfn). For such cases, Xen swiotlb already handles
> bouncing so you don't need dom0 swiotlb involved as well.

I can see that it would be nice to have map_page and unmap_page be
symmetrical. However, actually doing the map_page flush with a hypercall
would slow things down: hypercalls are slower than function calls. It is
faster to do the cache flushing in dom0 if possible. In the map_page
case we have the struct page, so we can easily do it by calling the
native dma_ops function.

Maybe I could just add a comment to explain the reason for the asymmetry?


> > > You basically only need a VA that was mapped in dom0 when map_page() was
> > > called and is still mapped when page_unmap() is called. You can free the
> > > actual mapping after calling __generic_dma_ops()->unmap_page() in
> > > xen_dma_unmap_page() (in xen_unmap_single()).
> > 
> > That is true, but where would I store the mapping?
> 
> dev_archdata ;).

Uhm... I haven't thought about using dev_archdata. It might simplify the
problem a bit. I'll have to think it over. It could be 3.19 or 3.20
material.


> > I don't want to introduce another data structure to keep physical to va
> > or physical to struct page mappings that I need to modify at every
> > map_page and unmap_page. It wouldn't even be a trivial data structure as
> > multiple dma operations involving the same mfn but different struct page
> > can happen simultaneously.
> > 
> > The performance hit of maintaining such a data structure would be
> > significant.
> 
> There would indeed be a performance hit especially for sg ops.
> 
> > It would negatively impact any dma operations involving any devices,
> > including dma coherent ones. While this series only requires special
> > handling of dma operations involving non-coherent devices (resulting in
> > an hypercall being made).
> > 
> > In a way this series rewards hardware that makes life easier to software
> > developers :-)
> 
> I only need the pfn_valid() clarification now together with the
> map/unmap symmetry fix.
>
> I think for the time being is_device_dma_coherent() can stay as an
> arch-only function as it is only used by Xen on arm/arm64. If we later
> need this on other architectures, we can make it generic.

OK
Stefano Stabellini Nov. 7, 2014, 5:35 p.m. UTC | #11
On Fri, 7 Nov 2014, Stefano Stabellini wrote:
> On Fri, 7 Nov 2014, Catalin Marinas wrote:
> > What I would like to see is xen_dma_map_page() also using hyp calls for
> > cache maintenance when !pfn_valid(), for symmetry with unmap. You would
> > need another argument to xen_dma_map_page() though to pass the real
> > device address or mfn (and on the map side you could simply check for
> > page_to_pfn(page) != mfn). For such cases, Xen swiotlb already handles
> > bouncing so you don't need dom0 swiotlb involved as well.
> 
> I can see that it would be nice to have map_page and unmap_page be
> symmetrical. However actually doing the map_page flush with an hypercall
> would slow things down. Hypercalls are slower than function calls. It is
> faster to do the cache flushing in dom0 if possible. In the map_page
> case we have the struct page so we can easily do it by calling the
> native dma_ops function.
> 
> Maybe I could just add a comment to explain the reason for the asymmetry?

Ah, but the problem is that map_page could allocate a swiotlb buffer
(it actually does on arm64) that, without a corresponding unmap_page
call, would end up being leaked, right?

Oh well.. two hypercalls it is then :-/
Catalin Marinas Nov. 7, 2014, 6:14 p.m. UTC | #12
On Fri, Nov 07, 2014 at 05:35:41PM +0000, Stefano Stabellini wrote:
> On Fri, 7 Nov 2014, Stefano Stabellini wrote:
> > On Fri, 7 Nov 2014, Catalin Marinas wrote:
> > > What I would like to see is xen_dma_map_page() also using hyp calls for
> > > cache maintenance when !pfn_valid(), for symmetry with unmap. You would
> > > need another argument to xen_dma_map_page() though to pass the real
> > > device address or mfn (and on the map side you could simply check for
> > > page_to_pfn(page) != mfn). For such cases, Xen swiotlb already handles
> > > bouncing so you don't need dom0 swiotlb involved as well.
> > 
> > I can see that it would be nice to have map_page and unmap_page be
> > symmetrical. However actually doing the map_page flush with an hypercall
> > would slow things down. Hypercalls are slower than function calls. It is
> > faster to do the cache flushing in dom0 if possible. In the map_page
> > case we have the struct page so we can easily do it by calling the
> > native dma_ops function.
> > 
> > Maybe I could just add a comment to explain the reason for the asymmetry?
> 
> Ah, but the problem is that map_page could allocate a swiotlb buffer
> (actually it does on arm64) that without a corresponding unmap_page
> call, would end up being leaked, right?

Yes. You could hack dma_capable() to always return true for dom0
(because the pfn/dma address here doesn't have anything to do with the
real mfn) but that's more of a hack assuming a lot about the swiotlb
implementation.

> Oh well.. two hypercalls it is then :-/

That looks cleaner to me (though I haven't seen the code yet ;)).

Alternatively, you could keep a permanent page (per-cpu) in dom0 that
you ask the hypervisor to point at the mfn just for the unmap cache
maintenance (similar to the kernel's kmap_atomic, used for cache
maintenance on highmem pages). This could be more expensive but, OTOH, you
only need the hyp call on the unmap path. The is_device_dma_coherent()
would still remain to avoid any unnecessary map/unmap calls but for
non-coherent devices you use whatever the SoC provides.
Stefano Stabellini Nov. 7, 2014, 6:45 p.m. UTC | #13
On Fri, 7 Nov 2014, Catalin Marinas wrote:
> On Fri, Nov 07, 2014 at 05:35:41PM +0000, Stefano Stabellini wrote:
> > On Fri, 7 Nov 2014, Stefano Stabellini wrote:
> > > On Fri, 7 Nov 2014, Catalin Marinas wrote:
> > > > What I would like to see is xen_dma_map_page() also using hyp calls for
> > > > cache maintenance when !pfn_valid(), for symmetry with unmap. You would
> > > > need another argument to xen_dma_map_page() though to pass the real
> > > > device address or mfn (and on the map side you could simply check for
> > > > page_to_pfn(page) != mfn). For such cases, Xen swiotlb already handles
> > > > bouncing so you don't need dom0 swiotlb involved as well.
> > > 
> > > I can see that it would be nice to have map_page and unmap_page be
> > > symmetrical. However actually doing the map_page flush with an hypercall
> > > would slow things down. Hypercalls are slower than function calls. It is
> > > faster to do the cache flushing in dom0 if possible. In the map_page
> > > case we have the struct page so we can easily do it by calling the
> > > native dma_ops function.
> > > 
> > > Maybe I could just add a comment to explain the reason for the asymmetry?
> > 
> > Ah, but the problem is that map_page could allocate a swiotlb buffer
> > (actually it does on arm64) that without a corresponding unmap_page
> > call, would end up being leaked, right?
> 
> Yes. You could hack dma_capable() to always return true for dom0
> (because the pfn/dma address here doesn't have anything to do with the
> real mfn) but that's more of a hack assuming a lot about the swiotlb
> implementation.

Another idea would be to avoid calling the native map_page for foreign
pages: in the Xen-specific implementation, instead of making the
hypercall, we could call __dma_map_area on arm64 and map_page on arm.

Something like this:


In arch/arm/include/asm/xen/page-coherent.h:

static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
	     dma_addr_t dev_addr, unsigned long offset, size_t size,
	     enum dma_data_direction dir, struct dma_attrs *attrs)
{
	if (pfn_valid(PFN_DOWN(dev_addr))) {
		/* local page: defer to the native dma_ops, if any */
		if (__generic_dma_ops(hwdev)->map_page)
			__generic_dma_ops(hwdev)->map_page(hwdev, page, offset, size, dir, attrs);
	} else
		/* foreign page: do the cache maintenance ourselves */
		__xen_dma_map_page(hwdev, page, dev_addr, offset, size, dir, attrs);
}



In arch/arm/xen/mm.c:

void __xen_dma_map_page(struct device *hwdev, struct page *page,
	     dma_addr_t dev_addr, unsigned long offset, size_t size,
	     enum dma_data_direction dir, struct dma_attrs *attrs)
{
	if (is_device_dma_coherent(hwdev))
		return;		/* coherent device: no flushing needed */
#ifdef CONFIG_ARM64
	/* flush directly, bypassing the arm64 swiotlb bounce logic */
	__dma_map_area(page_to_phys(page) + offset, size, dir);
#else
	__generic_dma_ops(hwdev)->map_page(hwdev, page, offset, size, dir, attrs);
#endif
}


It wouldn't be as nice as using the hypercall but it would be faster and
wouldn't depend on the inner workings of the arm64 implementation of
map_page, except for __dma_map_area.
Catalin Marinas Nov. 10, 2014, 10:16 a.m. UTC | #14
On Fri, Nov 07, 2014 at 06:45:22PM +0000, Stefano Stabellini wrote:
> On Fri, 7 Nov 2014, Catalin Marinas wrote:
> > On Fri, Nov 07, 2014 at 05:35:41PM +0000, Stefano Stabellini wrote:
> > > On Fri, 7 Nov 2014, Stefano Stabellini wrote:
> > > > On Fri, 7 Nov 2014, Catalin Marinas wrote:
> > > > > What I would like to see is xen_dma_map_page() also using hyp calls for
> > > > > cache maintenance when !pfn_valid(), for symmetry with unmap. You would
> > > > > need another argument to xen_dma_map_page() though to pass the real
> > > > > device address or mfn (and on the map side you could simply check for
> > > > > page_to_pfn(page) != mfn). For such cases, Xen swiotlb already handles
> > > > > bouncing so you don't need dom0 swiotlb involved as well.
> > > > 
> > > > I can see that it would be nice to have map_page and unmap_page be
> > > > symmetrical. However actually doing the map_page flush with an hypercall
> > > > would slow things down. Hypercalls are slower than function calls. It is
> > > > faster to do the cache flushing in dom0 if possible. In the map_page
> > > > case we have the struct page so we can easily do it by calling the
> > > > native dma_ops function.
> > > > 
> > > > Maybe I could just add a comment to explain the reason for the asymmetry?
> > > 
> > > Ah, but the problem is that map_page could allocate a swiotlb buffer
> > > (actually it does on arm64) that without a corresponding unmap_page
> > > call, would end up being leaked, right?
> > 
> > Yes. You could hack dma_capable() to always return true for dom0
> > (because the pfn/dma address here doesn't have anything to do with the
> > real mfn) but that's more of a hack assuming a lot about the swiotlb
> > implementation.
> 
> Another idea would be to avoid calling the native map_page for foreign
> pages, but in the xen specific implementation instead of making the
> hypercall, we could call __dma_map_area on arm64 and map_page on arm.

The problem here is that you assume that for an SoC that is not fully
coherent all it needs is __dma_map_area. If you look at mach-mvebu, the
DMA is nearly cache coherent but it needs some specific synchronisation
barrier at the interconnect level. If we get something like this on a
platform with virtualisation, it would be implemented at the dom0 level
by SoC-specific DMA ops. The Xen hypervisor, I assume, has its own BSP,
hence it could implement SoC-specific cache flushing there. But with the
mix of cache flushing in dom0 on map and in the hypervisor on unmap, such
a thing would no longer be possible in a nice way.

> In arch/arm/include/asm/xen/page-coherent.h:
> 
> static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
> 	     dma_addr_t dev_addr, unsigned long offset, size_t size,
> 	     enum dma_data_direction dir, struct dma_attrs *attrs)
> {
> 	if (pfn_valid(PFN_DOWN(dev_addr))) {

BTW, pfn_valid() is more expensive than simply comparing the mfn with
pfn for dom0. The calling code knows this already and it may be quicker
to simply pass a "bool foreign" argument.

> It wouldn't be as nice as using the hypercall but it would be faster and
> wouldn't depend on the inner workings of the arm64 implementation of
> map_page, except for __dma_map_area.

This "except" is big, as we may at some point get an SoC like mvebu (I
would say less likely for arm64 than arm32).

BTW, does the Xen hypervisor already have a mapping of the mfn? If not,
and it has to create one temporarily for the flush, you may not lose
much by using the dom0 ops for unmap with a hypercall for temporarily
mapping the foreign page into dom0's IPA space (you could do the
unmapping lazily).
Stefano Stabellini Nov. 10, 2014, 12:35 p.m. UTC | #15
On Mon, 10 Nov 2014, Catalin Marinas wrote:
> On Fri, Nov 07, 2014 at 06:45:22PM +0000, Stefano Stabellini wrote:
> > On Fri, 7 Nov 2014, Catalin Marinas wrote:
> > > On Fri, Nov 07, 2014 at 05:35:41PM +0000, Stefano Stabellini wrote:
> > > > On Fri, 7 Nov 2014, Stefano Stabellini wrote:
> > > > > On Fri, 7 Nov 2014, Catalin Marinas wrote:
> > > > > > What I would like to see is xen_dma_map_page() also using hyp calls for
> > > > > > cache maintenance when !pfn_valid(), for symmetry with unmap. You would
> > > > > > need another argument to xen_dma_map_page() though to pass the real
> > > > > > device address or mfn (and on the map side you could simply check for
> > > > > > page_to_pfn(page) != mfn). For such cases, Xen swiotlb already handles
> > > > > > bouncing so you don't need dom0 swiotlb involved as well.
> > > > > 
> > > > > I can see that it would be nice to have map_page and unmap_page be
> > > > > symmetrical. However actually doing the map_page flush with an hypercall
> > > > > would slow things down. Hypercalls are slower than function calls. It is
> > > > > faster to do the cache flushing in dom0 if possible. In the map_page
> > > > > case we have the struct page so we can easily do it by calling the
> > > > > native dma_ops function.
> > > > > 
> > > > > Maybe I could just add a comment to explain the reason for the asymmetry?
> > > > 
> > > > Ah, but the problem is that map_page could allocate a swiotlb buffer
> > > > (actually it does on arm64) that without a corresponding unmap_page
> > > > call, would end up being leaked, right?
> > > 
> > > Yes. You could hack dma_capable() to always return true for dom0
> > > (because the pfn/dma address here doesn't have anything to do with the
> > > real mfn) but that's more of a hack assuming a lot about the swiotlb
> > > implementation.
> > 
> > Another idea would be to avoid calling the native map_page for foreign
> > pages, but in the xen specific implementation instead of making the
> > hypercall, we could call __dma_map_area on arm64 and map_page on arm.
> 
> The problem here is that you assume that for an SoC that is not fully
> coherent all it needs is __dma_map_area. If you look at mach-mvebu, the
> DMA is nearly cache coherent but it needs some specific synchronisation
> barrier at the interconnect level. If we get something like this on a
> platform with virtualisation, it would be implemented at the dom0 level
> by SoC-specific DMA ops. Xen hypervisor I assume has its own BSP, hence
> it could implement SoC specific cache flushing there. But with the mix
> of cache flushing in dom0 on map and hypervisor on unmap such thing is
> no longer be possible in a nice way.

Yes, Xen has its own little platform files.

I think you convinced me that having map in Linux and unmap in Xen is
not a solid long-term solution.


> > In arch/arm/include/asm/xen/page-coherent.h:
> > 
> > static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
> > 	     dma_addr_t dev_addr, unsigned long offset, size_t size,
> > 	     enum dma_data_direction dir, struct dma_attrs *attrs)
> > {
> > 	if (pfn_valid(PFN_DOWN(dev_addr))) {
> 
> BTW, pfn_valid() is more expensive than simply comparing the mfn with
> pfn for dom0. The calling code knows this already and it may be quicker
> to simply pass a "bool foreign" argument.

Good point, I can compare mfn and pfn here.
I cannot do the same in unmap_page and the various sync operations,
because there is no pfn available, but maybe I could use
set_page_private to set a flag. I'll think about it.


> > It wouldn't be as nice as using the hypercall but it would be faster and
> > wouldn't depend on the inner workings of the arm64 implementation of
> > map_page, except for __dma_map_area.
> 
> This "except" is big as we may at some point get some SoC like mvebu (I
> would say less likely for arm64 than arm32).
> 
> BTW, does the Xen hypervisor already have a mapping of the mfn?

On arm64, it does.
On arm32, we have a pool of mappings; when we run out, we start
overwriting old mappings.


> If not
> and it has to create one temporarily for the flush, you may not lose
> much by using the dom0 ops for unmap with a hyper call for temporarily
> mapping the foreign page in dom0's IPA space (you could do the unmapping
> lazily).

That's true. Fortunately on arm64 we are OK.
Stefano Stabellini Nov. 10, 2014, 12:41 p.m. UTC | #16
On Mon, 10 Nov 2014, Stefano Stabellini wrote:
> > BTW, pfn_valid() is more expensive than simply comparing the mfn with
> > pfn for dom0. The calling code knows this already and it may be quicker
> > to simply pass a "bool foreign" argument.
> 
> Good point, I can compare mfn and pfn here.
> I cannot do the same in unmap_page and the various sync operations,
> because there is no pfn available, but maybe I could use
> set_page_private to set a flag. I'll think about it.

But of course there is no struct page either, so I think pfn_valid will
have to stay in unmap and sync.
Patch

diff --git a/arch/Kconfig b/arch/Kconfig
index 05d7a8a458d5..8462b2e7491b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -203,6 +203,9 @@  config HAVE_DMA_ATTRS
 config HAVE_DMA_CONTIGUOUS
 	bool
 
+config HAVE_DMA_NONCOHERENT
+	bool
+
 config GENERIC_SMP_IDLE_THREAD
        bool
 
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 89c4b5ccc68d..fd7d5522764c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -40,6 +40,7 @@  config ARM
 	select HAVE_DMA_API_DEBUG
 	select HAVE_DMA_ATTRS
 	select HAVE_DMA_CONTIGUOUS if MMU
+	select HAVE_DMA_NONCOHERENT if OF
 	select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL)
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU
 	select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9532f8d5857e..eb7a5aa64e0e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -46,6 +46,7 @@  config ARM64
 	select HAVE_DMA_API_DEBUG
 	select HAVE_DMA_ATTRS
 	select HAVE_DMA_CONTIGUOUS
+	select HAVE_DMA_NONCOHERENT
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 3b64d0bf5bba..7e827726b702 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -183,6 +183,7 @@  static void of_dma_configure(struct device *dev)
 	 * dma coherent operations.
 	 */
 	if (of_dma_is_coherent(dev->of_node)) {
+		dev->dma_coherent = true;
 		set_arch_dma_coherent_ops(dev);
 		dev_dbg(dev, "device is dma coherent\n");
 	}
diff --git a/include/linux/device.h b/include/linux/device.h
index ce1f21608b16..e00ca876db01 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -796,6 +796,7 @@  struct device {
 
 	bool			offline_disabled:1;
 	bool			offline:1;
+	bool			dma_coherent:1;
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index d5d388160f42..0bdffba2337d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -78,6 +78,18 @@  static inline int is_device_dma_capable(struct device *dev)
 	return dev->dma_mask != NULL && *dev->dma_mask != DMA_MASK_NONE;
 }
 
+#ifdef CONFIG_HAVE_DMA_NONCOHERENT
+static inline int is_device_dma_coherent(struct device *dev)
+{
+	return dev->dma_coherent;
+}
+#else
+static inline int is_device_dma_coherent(struct device *dev)
+{
+	return 1;
+}
+#endif
+
 #ifdef CONFIG_HAS_DMA
 #include <asm/dma-mapping.h>
 #else