
[RFC,2/7] dma: Handle swiotlb throttling for SGLs

Message ID 20240822183718.1234-3-mhklinux@outlook.com
State New
Series Introduce swiotlb throttling

Commit Message

mhkelley58@gmail.com Aug. 22, 2024, 6:37 p.m. UTC
From: Michael Kelley <mhklinux@outlook.com>

When a DMA map request is for a SGL, each SGL entry results in an
independent mapping operation. If the mapping requires a bounce buffer
due to running in a CoCo VM or due to swiotlb=force on the boot line,
swiotlb is invoked. If swiotlb throttling is enabled for the request,
each SGL entry results in a separate throttling operation. This is
problematic because a thread may be holding swiotlb memory while waiting
for memory to become free.

Resolve this problem by only allowing throttling on the 0th SGL
entry. When unmapping the SGL, unmap entries 1 through N-1 first, then
unmap entry 0 so that the throttle isn't released until all swiotlb
memory has been freed.
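
The intended ordering can be modeled as a small userspace sketch (map_entry(), unmap_entry(), and the throttle flag below are hypothetical stand-ins for the real swiotlb mapping calls and throttle semaphore; this is an illustration of the ordering, not the kernel code):

```c
#include <assert.h>

/* Simplified model: only the 0th SGL entry may hold the throttle. */
static int throttle_held;
static int mapped[4];

static void map_entry(int i, int may_block)
{
	if (i == 0 && may_block)
		throttle_held = 1;	/* only entry 0 acquires the throttle */
	mapped[i] = 1;
}

static void unmap_entry(int i)
{
	mapped[i] = 0;
	if (i == 0)
		throttle_held = 0;	/* unmapping entry 0 releases the throttle */
}

static void unmap_sg(int nents)
{
	/*
	 * Unmap entries 1..N-1 first, then entry 0, so the throttle is
	 * released only after all swiotlb memory has been freed.
	 */
	for (int i = 1; i < nents; i++)
		unmap_entry(i);
	if (nents)
		unmap_entry(0);
}
```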

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
This approach to SGLs muddies the line between DMA direct and swiotlb
throttling functionality. To keep the MAY_BLOCK attr fully generic, it
should propagate to the mapping of all SGL entries.

An alternate approach is to define an additional DMA attribute that
is internal to the DMA layer. Instead of clearing MAY_BLOCK, this
attr is added by dma_direct_map_sg() when mapping SGL entries other
than the 0th entry. swiotlb would do throttling only when MAY_BLOCK
is set and this new attr is not set.

This approach has a modest amount of additional complexity. Given
that we currently have no other users of the MAY_BLOCK attr, the
conceptual cleanliness may not be warranted until we do.

Thoughts?

 kernel/dma/direct.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

Comments

Petr Tesařík Aug. 23, 2024, 8:02 a.m. UTC | #1
On Thu, 22 Aug 2024 11:37:13 -0700
mhkelley58@gmail.com wrote:

> From: Michael Kelley <mhklinux@outlook.com>
> 
> When a DMA map request is for a SGL, each SGL entry results in an
> independent mapping operation. If the mapping requires a bounce buffer
> due to running in a CoCo VM or due to swiotlb=force on the boot line,
> swiotlb is invoked. If swiotlb throttling is enabled for the request,
> each SGL entry results in a separate throttling operation. This is
> problematic because a thread may be holding swiotlb memory while waiting
> for memory to become free.
> 
> Resolve this problem by only allowing throttling on the 0th SGL
> entry. When unmapping the SGL, unmap entries 1 thru N-1 first, then
> unmap entry 0 so that the throttle isn't released until all swiotlb
> memory has been freed.
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> This approach to SGLs muddies the line between DMA direct and swiotlb
> throttling functionality. To keep the MAY_BLOCK attr fully generic, it
> should propagate to the mapping of all SGL entries.
> 
> An alternate approach is to define an additional DMA attribute that
> is internal to the DMA layer. Instead of clearing MAY_BLOCK, this
> attr is added by dma_direct_map_sg() when mapping SGL entries other
> than the 0th entry. swiotlb would do throttling only when MAY_BLOCK
> is set and this new attr is not set.
> 
> This approach has a modest amount of additional complexity. Given
> that we currently have no other users of the MAY_BLOCK attr, the
> conceptual cleanliness may not be warranted until we do.
> 
> Thoughts?

If we agree to change the unthrottling logic (see my comment to your
RFC 1/7), we'll need an additional attribute to delay unthrottling when
unmapping sg list entries 1 to N-1. This attribute could convey that
the mapping is the non-initial segment of an sg list and it could then
be also used to disable blocking in swiotlb_tbl_map_single().
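
A minimal sketch of that gating (the name DMA_ATTR_SG_NON_INITIAL and the bit positions are made up for illustration; the real attrs are defined in include/linux/dma-mapping.h):

```c
#include <assert.h>

#define DMA_ATTR_MAY_BLOCK	(1UL << 10)	/* hypothetical bit position */
#define DMA_ATTR_SG_NON_INITIAL	(1UL << 11)	/* hypothetical internal attr */

/*
 * swiotlb would throttle only when MAY_BLOCK is set and the mapping is
 * not a non-initial segment of an sg list.
 */
static int may_throttle(unsigned long attrs)
{
	return (attrs & DMA_ATTR_MAY_BLOCK) &&
	       !(attrs & DMA_ATTR_SG_NON_INITIAL);
}
```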

> 
>  kernel/dma/direct.c | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 4480a3cd92e0..80e03c0838d4 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -438,6 +438,18 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
>  		arch_sync_dma_for_cpu_all();
>  }
>  
> +static void dma_direct_unmap_sgl_entry(struct device *dev,
> +		struct scatterlist *sgl, enum dma_data_direction dir,

Nitpick: This parameter should probably be called "sg", because it is
never used to do any operation on the whole list. Similarly, the
function could be called dma_direct_unmap_sg_entry(), because there is
no dma_direct_unmap_sgl() either...

> +		unsigned long attrs)
> +
> +{
> +	if (sg_dma_is_bus_address(sgl))
> +		sg_dma_unmark_bus_address(sgl);
> +	else
> +		dma_direct_unmap_page(dev, sgl->dma_address,
> +				      sg_dma_len(sgl), dir, attrs);
> +}
> +
>  /*
>   * Unmaps segments, except for ones marked as pci_p2pdma which do not
>   * require any further action as they contain a bus address.
> @@ -449,12 +461,20 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>  	int i;
>  
>  	for_each_sg(sgl,  sg, nents, i) {
> -		if (sg_dma_is_bus_address(sg))
> -			sg_dma_unmark_bus_address(sg);
> -		else
> -			dma_direct_unmap_page(dev, sg->dma_address,
> -					      sg_dma_len(sg), dir, attrs);
> +		/*
> +		 * Skip the 0th SGL entry in case this SGL consists of
> +		 * throttled swiotlb mappings. In such a case, any other
> +		 * entries should be unmapped first since unmapping the
> +		 * 0th entry will release the throttle semaphore.
> +		 */
> +		if (!i)
> +			continue;
> +		dma_direct_unmap_sgl_entry(dev, sg, dir, attrs);
>  	}
> +
> +	/* Now do the 0th SGL entry */
> +	if (nents)

I wonder if nents can ever be zero here, but it's nowhere enforced and
dma_map_sg_attrs() is exported, so I agree, let's play it safe.

> +		dma_direct_unmap_sgl_entry(dev, sgl, dir, attrs);
>  }
>  #endif
>  
> @@ -492,6 +512,11 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>  			ret = -EIO;
>  			goto out_unmap;
>  		}
> +
> +		/* Allow only the 0th SGL entry to block */
> +		if (!i)

Are you sure? I think the modified value of attrs is first used in the
next loop iteration, so the conditional should be removed, or else both
segment index 0 and 1 will keep the flag.

Petr T

> +			attrs &= ~DMA_ATTR_MAY_BLOCK;
> +
>  		sg_dma_len(sg) = sg->length;
>  	}
>
Michael Kelley Aug. 23, 2024, 8:42 p.m. UTC | #2
From: Petr Tesařík <petr@tesarici.cz> Sent: Friday, August 23, 2024 1:03 AM
> 
> On Thu, 22 Aug 2024 11:37:13 -0700
> mhkelley58@gmail.com wrote:
> 
> > From: Michael Kelley <mhklinux@outlook.com>
> >
> > When a DMA map request is for a SGL, each SGL entry results in an
> > independent mapping operation. If the mapping requires a bounce buffer
> > due to running in a CoCo VM or due to swiotlb=force on the boot line,
> > swiotlb is invoked. If swiotlb throttling is enabled for the request,
> > each SGL entry results in a separate throttling operation. This is
> > problematic because a thread may be holding swiotlb memory while waiting
> > for memory to become free.
> >
> > Resolve this problem by only allowing throttling on the 0th SGL
> > entry. When unmapping the SGL, unmap entries 1 thru N-1 first, then
> > unmap entry 0 so that the throttle isn't released until all swiotlb
> > memory has been freed.
> >
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > ---
> > This approach to SGLs muddies the line between DMA direct and swiotlb
> > throttling functionality. To keep the MAY_BLOCK attr fully generic, it
> > should propagate to the mapping of all SGL entries.
> >
> > An alternate approach is to define an additional DMA attribute that
> > is internal to the DMA layer. Instead of clearing MAY_BLOCK, this
> > attr is added by dma_direct_map_sg() when mapping SGL entries other
> > than the 0th entry. swiotlb would do throttling only when MAY_BLOCK
> > is set and this new attr is not set.
> >
> > This approach has a modest amount of additional complexity. Given
> > that we currently have no other users of the MAY_BLOCK attr, the
> > conceptual cleanliness may not be warranted until we do.
> >
> > Thoughts?
> 
> If we agree to change the unthrottling logic (see my comment to your
> RFC 1/7), we'll need an additional attribute to delay unthrottling when
> unmapping sg list entries 1 to N-1. This attribute could convey that
> the mapping is the non-initial segment of an sg list and it could then
> be also used to disable blocking in swiotlb_tbl_map_single().
> 
> >
> >  kernel/dma/direct.c | 35 ++++++++++++++++++++++++++++++-----
> >  1 file changed, 30 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 4480a3cd92e0..80e03c0838d4 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -438,6 +438,18 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
> >  		arch_sync_dma_for_cpu_all();
> >  }
> >
> > +static void dma_direct_unmap_sgl_entry(struct device *dev,
> > +		struct scatterlist *sgl, enum dma_data_direction dir,
> 
> Nitpick: This parameter should probably be called "sg", because it is
> never used to do any operation on the whole list. Similarly, the
> function could be called dma_direct_unmap_sg_entry(), because there is
> no dma_direct_unmap_sgl() either...

OK.  I agree.

> 
> > +		unsigned long attrs)
> > +
> > +{
> > +	if (sg_dma_is_bus_address(sgl))
> > +		sg_dma_unmark_bus_address(sgl);
> > +	else
> > +		dma_direct_unmap_page(dev, sgl->dma_address,
> > +				      sg_dma_len(sgl), dir, attrs);
> > +}
> > +
> >  /*
> >   * Unmaps segments, except for ones marked as pci_p2pdma which do not
> >   * require any further action as they contain a bus address.
> > @@ -449,12 +461,20 @@ void dma_direct_unmap_sg(struct device *dev, struct
> scatterlist *sgl,
> >  	int i;
> >
> >  	for_each_sg(sgl,  sg, nents, i) {
> > -		if (sg_dma_is_bus_address(sg))
> > -			sg_dma_unmark_bus_address(sg);
> > -		else
> > -			dma_direct_unmap_page(dev, sg->dma_address,
> > -					      sg_dma_len(sg), dir, attrs);
> > +		/*
> > +		 * Skip the 0th SGL entry in case this SGL consists of
> > +		 * throttled swiotlb mappings. In such a case, any other
> > +		 * entries should be unmapped first since unmapping the
> > +		 * 0th entry will release the throttle semaphore.
> > +		 */
> > +		if (!i)
> > +			continue;
> > +		dma_direct_unmap_sgl_entry(dev, sg, dir, attrs);
> >  	}
> > +
> > +	/* Now do the 0th SGL entry */
> > +	if (nents)
> 
> I wonder if nents can ever be zero here, but it's nowhere enforced and
> dma_map_sg_attrs() is exported, so I agree, let's play it safe.

Yep -- my thinking exactly.

> 
> > +		dma_direct_unmap_sgl_entry(dev, sgl, dir, attrs);
> >  }
> >  #endif
> >
> > @@ -492,6 +512,11 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist
> *sgl, int nents,
> >  			ret = -EIO;
> >  			goto out_unmap;
> >  		}
> > +
> > +		/* Allow only the 0th SGL entry to block */
> > +		if (!i)
> 
> Are you sure? I think the modified value of attrs is first used in the
> next loop iteration, so the conditional should be removed, or else both
> segment index 0 and 1 will keep the flag.

I don't understand your comment. If it's present, the MAY_BLOCK flag
is used for the index 0 SGL entry, and then is cleared before the loop is
run again for the index 1 and subsequent SGL entries. But it would
still work with the conditional removed, and maybe the CPU overhead
of always clearing the flag is the same as doing the conditional.
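
That sequencing can be checked with a trivial stand-alone loop: entry 0 is mapped while the flag is still set, and the clear at the bottom of iteration 0 only affects entries 1 onward (MAY_BLOCK here is just a stand-in bit, not the real attribute):

```c
#include <assert.h>

#define MAY_BLOCK 0x1UL	/* stand-in for DMA_ATTR_MAY_BLOCK */

/* Record which loop iterations still see MAY_BLOCK when "mapping". */
static void map_loop(unsigned long attrs, int nents, int *saw_flag)
{
	for (int i = 0; i < nents; i++) {
		saw_flag[i] = !!(attrs & MAY_BLOCK);	/* mapping reads attrs here */
		if (!i)
			attrs &= ~MAY_BLOCK;	/* cleared after entry 0 has mapped */
	}
}
```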

Michael

> 
> Petr T
> 
> > +			attrs &= ~DMA_ATTR_MAY_BLOCK;
> > +
> >  		sg_dma_len(sg) = sg->length;
> >  	}
> >
Petr Tesařík Aug. 24, 2024, 7:56 p.m. UTC | #3
On Fri, 23 Aug 2024 20:42:08 +0000
Michael Kelley <mhklinux@outlook.com> wrote:

> From: Petr Tesařík <petr@tesarici.cz> Sent: Friday, August 23, 2024 1:03 AM
> > 
> > On Thu, 22 Aug 2024 11:37:13 -0700
> > mhkelley58@gmail.com wrote:
> >   
> > > From: Michael Kelley <mhklinux@outlook.com>
> > >
> > > When a DMA map request is for a SGL, each SGL entry results in an
> > > independent mapping operation. If the mapping requires a bounce buffer
> > > due to running in a CoCo VM or due to swiotlb=force on the boot line,
> > > swiotlb is invoked. If swiotlb throttling is enabled for the request,
> > > each SGL entry results in a separate throttling operation. This is
> > > problematic because a thread may be holding swiotlb memory while waiting
> > > for memory to become free.
> > >
> > > Resolve this problem by only allowing throttling on the 0th SGL
> > > entry. When unmapping the SGL, unmap entries 1 thru N-1 first, then
> > > unmap entry 0 so that the throttle isn't released until all swiotlb
> > > memory has been freed.
> > >
> > > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > > ---
> > > This approach to SGLs muddies the line between DMA direct and swiotlb
> > > throttling functionality. To keep the MAY_BLOCK attr fully generic, it
> > > should propagate to the mapping of all SGL entries.
> > >
> > > An alternate approach is to define an additional DMA attribute that
> > > is internal to the DMA layer. Instead of clearing MAY_BLOCK, this
> > > attr is added by dma_direct_map_sg() when mapping SGL entries other
> > > than the 0th entry. swiotlb would do throttling only when MAY_BLOCK
> > > is set and this new attr is not set.
> > >
> > > This approach has a modest amount of additional complexity. Given
> > > that we currently have no other users of the MAY_BLOCK attr, the
> > > conceptual cleanliness may not be warranted until we do.
> > >
> > > Thoughts?  
> > 
> > If we agree to change the unthrottling logic (see my comment to your
> > RFC 1/7), we'll need an additional attribute to delay unthrottling when
> > unmapping sg list entries 1 to N-1. This attribute could convey that
> > the mapping is the non-initial segment of an sg list and it could then
> > be also used to disable blocking in swiotlb_tbl_map_single().
> >   
> > >
> > >  kernel/dma/direct.c | 35 ++++++++++++++++++++++++++++++-----
> > >  1 file changed, 30 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > > index 4480a3cd92e0..80e03c0838d4 100644
> > > --- a/kernel/dma/direct.c
> > > +++ b/kernel/dma/direct.c
> > > @@ -438,6 +438,18 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
> > >  		arch_sync_dma_for_cpu_all();
> > >  }
> > >
> > > +static void dma_direct_unmap_sgl_entry(struct device *dev,
> > > +		struct scatterlist *sgl, enum dma_data_direction dir,  
> > 
> > Nitpick: This parameter should probably be called "sg", because it is
> > never used to do any operation on the whole list. Similarly, the
> > function could be called dma_direct_unmap_sg_entry(), because there is
> > no dma_direct_unmap_sgl() either...  
> 
> OK.  I agree.
> 
> >   
> > > +		unsigned long attrs)
> > > +
> > > +{
> > > +	if (sg_dma_is_bus_address(sgl))
> > > +		sg_dma_unmark_bus_address(sgl);
> > > +	else
> > > +		dma_direct_unmap_page(dev, sgl->dma_address,
> > > +				      sg_dma_len(sgl), dir, attrs);
> > > +}
> > > +
> > >  /*
> > >   * Unmaps segments, except for ones marked as pci_p2pdma which do not
> > >   * require any further action as they contain a bus address.
> > > @@ -449,12 +461,20 @@ void dma_direct_unmap_sg(struct device *dev, struct  
> > scatterlist *sgl,  
> > >  	int i;
> > >
> > >  	for_each_sg(sgl,  sg, nents, i) {
> > > -		if (sg_dma_is_bus_address(sg))
> > > -			sg_dma_unmark_bus_address(sg);
> > > -		else
> > > -			dma_direct_unmap_page(dev, sg->dma_address,
> > > -					      sg_dma_len(sg), dir, attrs);
> > > +		/*
> > > +		 * Skip the 0th SGL entry in case this SGL consists of
> > > +		 * throttled swiotlb mappings. In such a case, any other
> > > +		 * entries should be unmapped first since unmapping the
> > > +		 * 0th entry will release the throttle semaphore.
> > > +		 */
> > > +		if (!i)
> > > +			continue;
> > > +		dma_direct_unmap_sgl_entry(dev, sg, dir, attrs);
> > >  	}
> > > +
> > > +	/* Now do the 0th SGL entry */
> > > +	if (nents)  
> > 
> > I wonder if nents can ever be zero here, but it's nowhere enforced and
> > dma_map_sg_attrs() is exported, so I agree, let's play it safe.  
> 
> Yep -- my thinking exactly.
> 
> >   
> > > +		dma_direct_unmap_sgl_entry(dev, sgl, dir, attrs);
> > >  }
> > >  #endif
> > >
> > > @@ -492,6 +512,11 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist  
> > *sgl, int nents,  
> > >  			ret = -EIO;
> > >  			goto out_unmap;
> > >  		}
> > > +
> > > +		/* Allow only the 0th SGL entry to block */
> > > +		if (!i)  
> > 
> > Are you sure? I think the modified value of attrs is first used in the
> > next loop iteration, so the conditional should be removed, or else both
> > segment index 0 and 1 will keep the flag.  
> 
> I don't understand your comment. If it's present, the MAY_BLOCK flag
> is used for the index 0 SGL entry, and then is cleared before the loop is
> run again for the index 1 and subsequent SGL entries. But it would
> still work with the conditional removed, and maybe the CPU overhead
> of always clearing the flag is the same as doing the conditional.

Yes, this was my original thinking, but then I somehow got caught up in
it and went on to thinking that the condition was not only unnecessary,
but wrong. You're right, the code is perfectly fine as it is.

Sorry for the noise.

Petr T

Patch

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4480a3cd92e0..80e03c0838d4 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -438,6 +438,18 @@  void dma_direct_sync_sg_for_cpu(struct device *dev,
 		arch_sync_dma_for_cpu_all();
 }
 
+static void dma_direct_unmap_sgl_entry(struct device *dev,
+		struct scatterlist *sgl, enum dma_data_direction dir,
+		unsigned long attrs)
+
+{
+	if (sg_dma_is_bus_address(sgl))
+		sg_dma_unmark_bus_address(sgl);
+	else
+		dma_direct_unmap_page(dev, sgl->dma_address,
+				      sg_dma_len(sgl), dir, attrs);
+}
+
 /*
  * Unmaps segments, except for ones marked as pci_p2pdma which do not
  * require any further action as they contain a bus address.
@@ -449,12 +461,20 @@  void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 	int i;
 
 	for_each_sg(sgl,  sg, nents, i) {
-		if (sg_dma_is_bus_address(sg))
-			sg_dma_unmark_bus_address(sg);
-		else
-			dma_direct_unmap_page(dev, sg->dma_address,
-					      sg_dma_len(sg), dir, attrs);
+		/*
+		 * Skip the 0th SGL entry in case this SGL consists of
+		 * throttled swiotlb mappings. In such a case, any other
+		 * entries should be unmapped first since unmapping the
+		 * 0th entry will release the throttle semaphore.
+		 */
+		if (!i)
+			continue;
+		dma_direct_unmap_sgl_entry(dev, sg, dir, attrs);
 	}
+
+	/* Now do the 0th SGL entry */
+	if (nents)
+		dma_direct_unmap_sgl_entry(dev, sgl, dir, attrs);
 }
 #endif
 
@@ -492,6 +512,11 @@  int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 			ret = -EIO;
 			goto out_unmap;
 		}
+
+		/* Allow only the 0th SGL entry to block */
+		if (!i)
+			attrs &= ~DMA_ATTR_MAY_BLOCK;
+
 		sg_dma_len(sg) = sg->length;
 	}