mbox series

[v5,0/7] udmbuf bug fix and some improvements

Message ID 20240903083818.3071759-1-link@vivo.com
Headers show
Series udmbuf bug fix and some improvements | expand

Message

Huan Yang Sept. 3, 2024, 8:38 a.m. UTC
This patchset attempts to fix some errors in udmabuf and remove the
upin_list structure.

By this patchset, udmabuf remove page struct relay, only use
pfn and folios.

Some of this fix just gather the patches which I upload before.

Any patch has passed the udmabuf self-test suite's tests.
Suggested by Kasireddy, Vivek <vivek.kasireddy@intel.com>
Patch5 modified the unpin function, therefore running the udmabuf
self-test program in a loop did not reveal any memory leaks.

Notice: Test item 6 maybe requires running the command:
  echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Changelog
===
  v4 -> v5:
    Patch1 doc why pre-fault failed not report error.
    Patch2 no change
    Patch3 update ack tag marked in v4
    Patch4 split some change into patch5 and patch7, this only contains
           udmabuf pin folio's function
    Patch5 add helper init/deini to clean udmabuf struct initialized
    Patch6 remove unnecessary code
    Patch7 remove reuse folio remove v4 patch4

  v3 -> v4:
    Patch1 fill vma area with pfn only when true access.
    Patch2 no change
    Patch3 select VMAP_PFN
    Patch4 use function pin each item.
    Patch5 use pinned folio array instead udmabuf_folio struct.
  v2 -> v3:
    Patch1, avoid use page, instead, use pfn, and use vmf_insert_pfn map
    suggested-by Kasireddy, Vivek <vivek.kasireddy@intel.com>

    Patch2, update acked-by Kasireddy, Vivek <vivek.kasireddy@intel.com>
    And keep the kvcalloc on the same line.

    Patch3, avoid use page, instead, use pfn, then use vmap_pfn map

    Patch4, split v2 patch4, single update codestyle to keep review
    easy.

    Patch5, another way to remove udmabuf_folio
    
---
  v1 -> v2:
    Patch1, 3 Rectify the improper use of the sg table.
    suggested-by Christian König <christian.koenig@amd.com>
    
    Patch2 add acked-by Christian K�nig <christian.koenig@amd.com> which
    marked in v1

    Patch4
    Modify the data structure to restore the use of pages and
    correct the misunderstanding of loop conditions such as "pgcnt".
    make sure pass self test.

    remove v1's patch4

v4
  https://lore.kernel.org/all/20240822084342.1574914-1-link@vivo.com/
v3
  https://lore.kernel.org/all/20240813090518.3252469-1-link@vivo.com/
v2
  https://lore.kernel.org/all/20240805032550.3912454-1-link@vivo.com/
v1
  https://lore.kernel.org/all/20240801104512.4056860-1-link@vivo.com/

Huan Yang (7):
  udmabuf: pre-fault when first page fault
  udmabuf: change folios array from kmalloc to kvmalloc
  udmabuf: fix vmap_udmabuf error page set
  udmabuf: udmabuf_create pin folio codestyle cleanup
  udmabuf: introduce udmabuf init and deinit helper
  udmabuf: remove udmabuf_folio
  udmabuf: reuse folio array when pin folios

 drivers/dma-buf/Kconfig   |   1 +
 drivers/dma-buf/udmabuf.c | 272 +++++++++++++++++++++++---------------
 2 files changed, 166 insertions(+), 107 deletions(-)


base-commit: c79c85875f1af04040fe4492ed94ce37ad729c4d

Comments

Kasireddy, Vivek Sept. 6, 2024, 8:12 a.m. UTC | #1
Hi Huan,

> Subject: [PATCH v5 1/7] udmabuf: pre-fault when first page fault
> 
> The current udmabuf mmap uses a page fault to populate the vma.
> 
> However, the current udmabuf has already obtained and pinned the folio
> upon completion of the creation.This means that the physical memory has
> already been acquired, rather than being accessed dynamically.
> 
> As a result, the page fault has lost its purpose as a demanding
> page. Due to the fact that page fault requires trapping into kernel mode
> and filling in when accessing the corresponding virtual address in mmap,
> when creating a large size udmabuf, this represents a considerable
> overhead.
> 
> This patch first the pfn into page table, and then pre-fault each pfn
> into vma, when first access. Should know, if anything wrong when
> pre-fault, will not report it's error, else, report when task access it
> at the first time.
> 
> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> Signed-off-by: Huan Yang <link@vivo.com>
> ---
>  drivers/dma-buf/udmabuf.c | 35 +++++++++++++++++++++++++++++++++--
>  1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 047c3cd2ceff..0a8c231a36e1 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -43,7 +43,8 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault
> *vmf)
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct udmabuf *ubuf = vma->vm_private_data;
>  	pgoff_t pgoff = vmf->pgoff;
> -	unsigned long pfn;
> +	unsigned long addr, end, pfn;
> +	vm_fault_t ret;
> 
>  	if (pgoff >= ubuf->pagecount)
>  		return VM_FAULT_SIGBUS;
> @@ -51,7 +52,37 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault
> *vmf)
>  	pfn = folio_pfn(ubuf->folios[pgoff]);
>  	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
> 
> -	return vmf_insert_pfn(vma, vmf->address, pfn);
> +	ret = vmf_insert_pfn(vma, vmf->address, pfn);
> +	if (ret & VM_FAULT_ERROR)
> +		return ret;
> +
> +	/* pre fault */
> +	pgoff = vma->vm_pgoff;
> +	end = vma->vm_end;
Nit: use vma->vm_end directly in the loop below, as end is used only once.

> +	addr = vma->vm_start;
> +
> +	for (; addr < end; pgoff++, addr += PAGE_SIZE) {
> +		if (addr == vmf->address)
> +			continue;
> +
> +		if (WARN_ON(pgoff >= ubuf->pagecount))
> +			break;
> +
> +		pfn = folio_pfn(ubuf->folios[pgoff]);
> +
Nit: no need for a blank line here.

> +		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
> +
> +		/**
> +		 * If something wrong, due to this vm fault success,
> +		 * do not report in here, report only when true access
> +		 * this addr.
> +		 * So, don't update ret here, just break.
Please rewrite the above comments as:
* If the below vmf_insert_pfn() fails, we do not return an error here
* during this pre-fault step. However, an error will be returned if the
* failure occurs when the addr is truly accessed.

With that,
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>

> +		 */
> +		if (vmf_insert_pfn(vma, addr, pfn) & VM_FAULT_ERROR)
> +			break;
> +	}
> +
> +	return ret;
>  }
> 
>  static const struct vm_operations_struct udmabuf_vm_ops = {
> --
> 2.45.2
Kasireddy, Vivek Sept. 6, 2024, 8:20 a.m. UTC | #2
Hi Huan,

> Subject: [PATCH v5 5/7] udmabuf: introduce udmabuf init and deinit helper
> 
> After udmabuf is allocated, its resources need to be initialized,
> including various array structures. The current array structure has
> already been greatly expanded.
> 
> Also, before udmabuf needs to be kfree, the occupied resources need to
> be released.
> 
> This part is repetitive and maybe overlooked.
> 
> This patch give a helper function when init and deinit, by this,
> deduce duplicate code.
*reduce

If possible, please try to improve the wording and grammatical correctness
in the commit messages of other patches as well.

> 
> Signed-off-by: Huan Yang <link@vivo.com>
> ---
>  drivers/dma-buf/udmabuf.c | 52 +++++++++++++++++++++++----------------
>  1 file changed, 31 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index ca2b21c5c57f..254d9ec3d9f3 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -226,6 +226,28 @@ static int add_to_unpin_list(struct list_head
> *unpin_list,
>  	return 0;
>  }
> 
> +static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t
> pgcnt)
> +{
> +	INIT_LIST_HEAD(&ubuf->unpin_list);
> +
> +	ubuf->folios = kvmalloc_array(pgcnt, sizeof(*ubuf->folios),
> GFP_KERNEL);
> +	if (!ubuf->folios)
> +		return -ENOMEM;
> +
> +	ubuf->offsets = kvcalloc(pgcnt, sizeof(*ubuf->offsets), GFP_KERNEL);
> +	if (!ubuf->offsets)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static __always_inline void deinit_udmabuf(struct udmabuf *ubuf)
> +{
> +	unpin_all_folios(&ubuf->unpin_list);
> +	kvfree(ubuf->offsets);
> +	kvfree(ubuf->folios);
> +}
> +
>  static void release_udmabuf(struct dma_buf *buf)
>  {
>  	struct udmabuf *ubuf = buf->priv;
> @@ -234,9 +256,7 @@ static void release_udmabuf(struct dma_buf *buf)
>  	if (ubuf->sg)
>  		put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
> 
> -	unpin_all_folios(&ubuf->unpin_list);
> -	kvfree(ubuf->offsets);
> -	kvfree(ubuf->folios);
> +	deinit_udmabuf(ubuf);
>  	kfree(ubuf);
>  }
> 
> @@ -396,33 +416,24 @@ static long udmabuf_create(struct miscdevice
> *device,
>  	if (!ubuf)
>  		return -ENOMEM;
> 
> -	INIT_LIST_HEAD(&ubuf->unpin_list);
>  	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
>  	for (i = 0; i < head->count; i++) {
>  		if (!PAGE_ALIGNED(list[i].offset))
> -			goto err;
> +			goto err_noinit;
>  		if (!PAGE_ALIGNED(list[i].size))
> -			goto err;
> +			goto err_noinit;
> 
>  		pgcnt += list[i].size >> PAGE_SHIFT;
>  		if (pgcnt > pglimit)
> -			goto err;
> +			goto err_noinit;
>  	}
> 
>  	if (!pgcnt)
> -		goto err;
> +		goto err_noinit;
> 
> -	ubuf->folios = kvmalloc_array(pgcnt, sizeof(*ubuf->folios),
> GFP_KERNEL);
> -	if (!ubuf->folios) {
> -		ret = -ENOMEM;
> +	ret = init_udmabuf(ubuf, pgcnt);
> +	if (ret)
>  		goto err;
> -	}
> -
> -	ubuf->offsets = kvcalloc(pgcnt, sizeof(*ubuf->offsets), GFP_KERNEL);
> -	if (!ubuf->offsets) {
> -		ret = -ENOMEM;
> -		goto err;
> -	}
> 
>  	for (i = 0; i < head->count; i++) {
>  		struct file *memfd = fget(list[i].memfd);
> @@ -446,9 +457,8 @@ static long udmabuf_create(struct miscdevice
> *device,
>  	return ret;
> 
>  err:
> -	unpin_all_folios(&ubuf->unpin_list);
> -	kvfree(ubuf->offsets);
> -	kvfree(ubuf->folios);
> +	deinit_udmabuf(ubuf);
> +err_noinit:
I don't really see the need for this new label, but I guess it makes things a
bit clear.

Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>

>  	kfree(ubuf);
>  	return ret;
>  }
> --
> 2.45.2
Kasireddy, Vivek Sept. 6, 2024, 8:23 a.m. UTC | #3
Hi Huan,

> Subject: [PATCH v5 7/7] udmabuf: reuse folio array when pin folios
> 
> When invoke memfd_pin_folios, we need offer an array to save each folio
> which we pinned.
> 
> The currently way is dynamic alloc an array, get folios, save into
> udmabuf and then free.
> 
> If the size is tiny, alloc from slab, is ok due to slab can cache it.
> Or, just PCP order can cover, also ok.
> 
> But if size is huge, need fallback into vmalloc, then, not well, due to
> each page will iter alloc, and map into vmalloc area. Too heavy.
> 
> Now that we need to iter each udmabuf item, then pin it's range folios,
> we can reuse the maximum size range's folios array.
> 
> Signed-off-by: Huan Yang <link@vivo.com>
> ---
>  drivers/dma-buf/udmabuf.c | 34 ++++++++++++++++++++--------------
>  1 file changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index d449c1fd67a5..d70e45c33442 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -343,28 +343,21 @@ static int export_udmabuf(struct udmabuf *ubuf,
>  }
> 
>  static int udmabuf_pin_folios(struct udmabuf *ubuf, struct file *memfd,
> -			      loff_t start, loff_t size)
> +			      loff_t start, loff_t size, struct folio **folios)
>  {
>  	pgoff_t pgoff, pgcnt;
>  	pgoff_t upgcnt = ubuf->pagecount;
>  	pgoff_t nr_pinned = ubuf->nr_pinned;
>  	u32 cur_folio, cur_pgcnt;
> -	struct folio **folios = NULL;
>  	long nr_folios;
>  	loff_t end;
>  	int ret = 0;
> 
>  	pgcnt = size >> PAGE_SHIFT;
> -	folios = kvmalloc_array(pgcnt, sizeof(*folios), GFP_KERNEL);
> -	if (!folios)
> -		return -ENOMEM;
> -
>  	end = start + (pgcnt << PAGE_SHIFT) - 1;
>  	nr_folios = memfd_pin_folios(memfd, start, end, folios, pgcnt,
> &pgoff);
> -	if (nr_folios <= 0) {
> -		ret = nr_folios ? nr_folios : -EINVAL;
> -		goto err;
> -	}
> +	if (nr_folios <= 0)
> +		return nr_folios ? nr_folios : -EINVAL;
> 
>  	cur_pgcnt = 0;
>  	for (cur_folio = 0; cur_folio < nr_folios; ++cur_folio) {
> @@ -389,10 +382,8 @@ static int udmabuf_pin_folios(struct udmabuf
> *ubuf, struct file *memfd,
>  		pgoff = 0;
>  	}
>  end:
> -err:
>  	ubuf->pagecount = upgcnt;
>  	ubuf->nr_pinned = nr_pinned;
> -	kvfree(folios);
>  	return ret;
The variable ret is now unused in this function. Remove it and just
return 0 at the end.

>  }
> 
> @@ -403,6 +394,8 @@ static long udmabuf_create(struct miscdevice
> *device,
>  	pgoff_t pgcnt = 0, pglimit;
>  	long ret = -EINVAL;
>  	struct udmabuf *ubuf;
> +	struct folio **folios = NULL;
> +	unsigned long max_nr_folios = 0;
>  	u32 i, flags;
> 
>  	ubuf = kzalloc(sizeof(*ubuf), GFP_KERNEL);
> @@ -411,14 +404,19 @@ static long udmabuf_create(struct miscdevice
> *device,
> 
>  	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
>  	for (i = 0; i < head->count; i++) {
> +		pgoff_t subpgcnt;
> +
>  		if (!PAGE_ALIGNED(list[i].offset))
>  			goto err_noinit;
>  		if (!PAGE_ALIGNED(list[i].size))
>  			goto err_noinit;
> 
> -		pgcnt += list[i].size >> PAGE_SHIFT;
> +		subpgcnt = list[i].size >> PAGE_SHIFT;
> +		pgcnt += subpgcnt;
>  		if (pgcnt > pglimit)
>  			goto err_noinit;
> +
> +		max_nr_folios = max_t(unsigned long, subpgcnt,
> max_nr_folios);
>  	}
> 
>  	if (!pgcnt)
> @@ -428,6 +426,12 @@ static long udmabuf_create(struct miscdevice
> *device,
>  	if (ret)
>  		goto err;
> 
> +	folios = kvmalloc_array(max_nr_folios, sizeof(*folios), GFP_KERNEL);
> +	if (!folios) {
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
>  	for (i = 0; i < head->count; i++) {
>  		struct file *memfd = fget(list[i].memfd);
> 
> @@ -436,7 +440,7 @@ static long udmabuf_create(struct miscdevice
> *device,
>  			goto err;
> 
>  		ret = udmabuf_pin_folios(ubuf, memfd, list[i].offset,
> -					 list[i].size);
> +					 list[i].size, folios);
>  		fput(memfd);
>  		if (ret)
>  			goto err;
> @@ -447,12 +451,14 @@ static long udmabuf_create(struct miscdevice
> *device,
>  	if (ret < 0)
>  		goto err;
> 
> +	kvfree(folios);
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>

>  	return ret;
> 
>  err:
>  	deinit_udmabuf(ubuf);
>  err_noinit:
>  	kfree(ubuf);
> +	kvfree(folios);
>  	return ret;
>  }
> 
> --
> 2.45.2
Huan Yang Sept. 6, 2024, 8:37 a.m. UTC | #4
在 2024/9/6 16:12, Kasireddy, Vivek 写道:
> Hi Huan,
>
>> Subject: [PATCH v5 1/7] udmabuf: pre-fault when first page fault
>>
>> The current udmabuf mmap uses a page fault to populate the vma.
>>
>> However, the current udmabuf has already obtained and pinned the folio
>> upon completion of the creation.This means that the physical memory has
>> already been acquired, rather than being accessed dynamically.
>>
>> As a result, the page fault has lost its purpose as a demanding
>> page. Due to the fact that page fault requires trapping into kernel mode
>> and filling in when accessing the corresponding virtual address in mmap,
>> when creating a large size udmabuf, this represents a considerable
>> overhead.
>>
>> This patch first the pfn into page table, and then pre-fault each pfn
>> into vma, when first access. Should know, if anything wrong when
>> pre-fault, will not report it's error, else, report when task access it
>> at the first time.
>>
>> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>> Signed-off-by: Huan Yang <link@vivo.com>
>> ---
>>   drivers/dma-buf/udmabuf.c | 35 +++++++++++++++++++++++++++++++++--
>>   1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
>> index 047c3cd2ceff..0a8c231a36e1 100644
>> --- a/drivers/dma-buf/udmabuf.c
>> +++ b/drivers/dma-buf/udmabuf.c
>> @@ -43,7 +43,8 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault
>> *vmf)
>>   	struct vm_area_struct *vma = vmf->vma;
>>   	struct udmabuf *ubuf = vma->vm_private_data;
>>   	pgoff_t pgoff = vmf->pgoff;
>> -	unsigned long pfn;
>> +	unsigned long addr, end, pfn;
>> +	vm_fault_t ret;
>>
>>   	if (pgoff >= ubuf->pagecount)
>>   		return VM_FAULT_SIGBUS;
>> @@ -51,7 +52,37 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault
>> *vmf)
>>   	pfn = folio_pfn(ubuf->folios[pgoff]);
>>   	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
>>
>> -	return vmf_insert_pfn(vma, vmf->address, pfn);
>> +	ret = vmf_insert_pfn(vma, vmf->address, pfn);
>> +	if (ret & VM_FAULT_ERROR)
>> +		return ret;
>> +
>> +	/* pre fault */
>> +	pgoff = vma->vm_pgoff;
>> +	end = vma->vm_end;
> Nit: use vma->vm_end directly in the loop below, as end is used only once.
>
>> +	addr = vma->vm_start;
>> +
>> +	for (; addr < end; pgoff++, addr += PAGE_SIZE) {
>> +		if (addr == vmf->address)
>> +			continue;
>> +
>> +		if (WARN_ON(pgoff >= ubuf->pagecount))
>> +			break;
>> +
>> +		pfn = folio_pfn(ubuf->folios[pgoff]);
>> +
> Nit: no need for a blank line here.
>
>> +		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
>> +
>> +		/**
>> +		 * If something wrong, due to this vm fault success,
>> +		 * do not report in here, report only when true access
>> +		 * this addr.
>> +		 * So, don't update ret here, just break.
> Please rewrite the above comments as:
> * If the below vmf_insert_pfn() fails, we do not return an error here
> * during this pre-fault step. However, an error will be returned if the
> * failure occurs when the addr is truly accessed.
Thank you
>
> With that,

OK.

Thanks

> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>
>> +		 */
>> +		if (vmf_insert_pfn(vma, addr, pfn) & VM_FAULT_ERROR)
>> +			break;
>> +	}
>> +
>> +	return ret;
>>   }
>>
>>   static const struct vm_operations_struct udmabuf_vm_ops = {
>> --
>> 2.45.2
Huan Yang Sept. 6, 2024, 8:38 a.m. UTC | #5
在 2024/9/6 16:20, Kasireddy, Vivek 写道:
> Hi Huan,
>
>> Subject: [PATCH v5 5/7] udmabuf: introduce udmabuf init and deinit helper
>>
>> After udmabuf is allocated, its resources need to be initialized,
>> including various array structures. The current array structure has
>> already been greatly expanded.
>>
>> Also, before udmabuf needs to be kfree, the occupied resources need to
>> be released.
>>
>> This part is repetitive and maybe overlooked.
>>
>> This patch give a helper function when init and deinit, by this,
>> deduce duplicate code.
> *reduce
>
> If possible, please try to improve the wording and grammatical correctness
> in the commit messages of other patches as well.

I'll fix it in next-version

>
>> Signed-off-by: Huan Yang <link@vivo.com>
>> ---
>>   drivers/dma-buf/udmabuf.c | 52 +++++++++++++++++++++++----------------
>>   1 file changed, 31 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
>> index ca2b21c5c57f..254d9ec3d9f3 100644
>> --- a/drivers/dma-buf/udmabuf.c
>> +++ b/drivers/dma-buf/udmabuf.c
>> @@ -226,6 +226,28 @@ static int add_to_unpin_list(struct list_head
>> *unpin_list,
>>   	return 0;
>>   }
>>
>> +static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t
>> pgcnt)
>> +{
>> +	INIT_LIST_HEAD(&ubuf->unpin_list);
>> +
>> +	ubuf->folios = kvmalloc_array(pgcnt, sizeof(*ubuf->folios),
>> GFP_KERNEL);
>> +	if (!ubuf->folios)
>> +		return -ENOMEM;
>> +
>> +	ubuf->offsets = kvcalloc(pgcnt, sizeof(*ubuf->offsets), GFP_KERNEL);
>> +	if (!ubuf->offsets)
>> +		return -ENOMEM;
>> +
>> +	return 0;
>> +}
>> +
>> +static __always_inline void deinit_udmabuf(struct udmabuf *ubuf)
>> +{
>> +	unpin_all_folios(&ubuf->unpin_list);
>> +	kvfree(ubuf->offsets);
>> +	kvfree(ubuf->folios);
>> +}
>> +
>>   static void release_udmabuf(struct dma_buf *buf)
>>   {
>>   	struct udmabuf *ubuf = buf->priv;
>> @@ -234,9 +256,7 @@ static void release_udmabuf(struct dma_buf *buf)
>>   	if (ubuf->sg)
>>   		put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
>>
>> -	unpin_all_folios(&ubuf->unpin_list);
>> -	kvfree(ubuf->offsets);
>> -	kvfree(ubuf->folios);
>> +	deinit_udmabuf(ubuf);
>>   	kfree(ubuf);
>>   }
>>
>> @@ -396,33 +416,24 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   	if (!ubuf)
>>   		return -ENOMEM;
>>
>> -	INIT_LIST_HEAD(&ubuf->unpin_list);
>>   	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
>>   	for (i = 0; i < head->count; i++) {
>>   		if (!PAGE_ALIGNED(list[i].offset))
>> -			goto err;
>> +			goto err_noinit;
>>   		if (!PAGE_ALIGNED(list[i].size))
>> -			goto err;
>> +			goto err_noinit;
>>
>>   		pgcnt += list[i].size >> PAGE_SHIFT;
>>   		if (pgcnt > pglimit)
>> -			goto err;
>> +			goto err_noinit;
>>   	}
>>
>>   	if (!pgcnt)
>> -		goto err;
>> +		goto err_noinit;
>>
>> -	ubuf->folios = kvmalloc_array(pgcnt, sizeof(*ubuf->folios),
>> GFP_KERNEL);
>> -	if (!ubuf->folios) {
>> -		ret = -ENOMEM;
>> +	ret = init_udmabuf(ubuf, pgcnt);
>> +	if (ret)
>>   		goto err;
>> -	}
>> -
>> -	ubuf->offsets = kvcalloc(pgcnt, sizeof(*ubuf->offsets), GFP_KERNEL);
>> -	if (!ubuf->offsets) {
>> -		ret = -ENOMEM;
>> -		goto err;
>> -	}
>>
>>   	for (i = 0; i < head->count; i++) {
>>   		struct file *memfd = fget(list[i].memfd);
>> @@ -446,9 +457,8 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   	return ret;
>>
>>   err:
>> -	unpin_all_folios(&ubuf->unpin_list);
>> -	kvfree(ubuf->offsets);
>> -	kvfree(ubuf->folios);
>> +	deinit_udmabuf(ubuf);
>> +err_noinit:
> I don't really see the need for this new label, but I guess it makes things a
> bit clear.

If not this, each list err will need kfree, I think use this more clear.

Thank you. :)

>
> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>
>>   	kfree(ubuf);
>>   	return ret;
>>   }
>> --
>> 2.45.2
Huan Yang Sept. 6, 2024, 8:40 a.m. UTC | #6
在 2024/9/6 16:23, Kasireddy, Vivek 写道:
> Hi Huan,
>
>> Subject: [PATCH v5 7/7] udmabuf: reuse folio array when pin folios
>>
>> When invoke memfd_pin_folios, we need offer an array to save each folio
>> which we pinned.
>>
>> The currently way is dynamic alloc an array, get folios, save into
>> udmabuf and then free.
>>
>> If the size is tiny, alloc from slab, is ok due to slab can cache it.
>> Or, just PCP order can cover, also ok.
>>
>> But if size is huge, need fallback into vmalloc, then, not well, due to
>> each page will iter alloc, and map into vmalloc area. Too heavy.
>>
>> Now that we need to iter each udmabuf item, then pin it's range folios,
>> we can reuse the maximum size range's folios array.
>>
>> Signed-off-by: Huan Yang <link@vivo.com>
>> ---
>>   drivers/dma-buf/udmabuf.c | 34 ++++++++++++++++++++--------------
>>   1 file changed, 20 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
>> index d449c1fd67a5..d70e45c33442 100644
>> --- a/drivers/dma-buf/udmabuf.c
>> +++ b/drivers/dma-buf/udmabuf.c
>> @@ -343,28 +343,21 @@ static int export_udmabuf(struct udmabuf *ubuf,
>>   }
>>
>>   static int udmabuf_pin_folios(struct udmabuf *ubuf, struct file *memfd,
>> -			      loff_t start, loff_t size)
>> +			      loff_t start, loff_t size, struct folio **folios)
>>   {
>>   	pgoff_t pgoff, pgcnt;
>>   	pgoff_t upgcnt = ubuf->pagecount;
>>   	pgoff_t nr_pinned = ubuf->nr_pinned;
>>   	u32 cur_folio, cur_pgcnt;
>> -	struct folio **folios = NULL;
>>   	long nr_folios;
>>   	loff_t end;
>>   	int ret = 0;
>>
>>   	pgcnt = size >> PAGE_SHIFT;
>> -	folios = kvmalloc_array(pgcnt, sizeof(*folios), GFP_KERNEL);
>> -	if (!folios)
>> -		return -ENOMEM;
>> -
>>   	end = start + (pgcnt << PAGE_SHIFT) - 1;
>>   	nr_folios = memfd_pin_folios(memfd, start, end, folios, pgcnt,
>> &pgoff);
>> -	if (nr_folios <= 0) {
>> -		ret = nr_folios ? nr_folios : -EINVAL;
>> -		goto err;
>> -	}
>> +	if (nr_folios <= 0)
>> +		return nr_folios ? nr_folios : -EINVAL;
>>
>>   	cur_pgcnt = 0;
>>   	for (cur_folio = 0; cur_folio < nr_folios; ++cur_folio) {
>> @@ -389,10 +382,8 @@ static int udmabuf_pin_folios(struct udmabuf
>> *ubuf, struct file *memfd,
>>   		pgoff = 0;
>>   	}
>>   end:
>> -err:
>>   	ubuf->pagecount = upgcnt;
>>   	ubuf->nr_pinned = nr_pinned;
>> -	kvfree(folios);
>>   	return ret;
> The variable ret is now unused in this function. Remove it and just
> return 0 at the end.

Oh, nice catch.

Thanks.

>
>>   }
>>
>> @@ -403,6 +394,8 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   	pgoff_t pgcnt = 0, pglimit;
>>   	long ret = -EINVAL;
>>   	struct udmabuf *ubuf;
>> +	struct folio **folios = NULL;
>> +	unsigned long max_nr_folios = 0;
>>   	u32 i, flags;
>>
>>   	ubuf = kzalloc(sizeof(*ubuf), GFP_KERNEL);
>> @@ -411,14 +404,19 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>
>>   	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
>>   	for (i = 0; i < head->count; i++) {
>> +		pgoff_t subpgcnt;
>> +
>>   		if (!PAGE_ALIGNED(list[i].offset))
>>   			goto err_noinit;
>>   		if (!PAGE_ALIGNED(list[i].size))
>>   			goto err_noinit;
>>
>> -		pgcnt += list[i].size >> PAGE_SHIFT;
>> +		subpgcnt = list[i].size >> PAGE_SHIFT;
>> +		pgcnt += subpgcnt;
>>   		if (pgcnt > pglimit)
>>   			goto err_noinit;
>> +
>> +		max_nr_folios = max_t(unsigned long, subpgcnt,
>> max_nr_folios);
>>   	}
>>
>>   	if (!pgcnt)
>> @@ -428,6 +426,12 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   	if (ret)
>>   		goto err;
>>
>> +	folios = kvmalloc_array(max_nr_folios, sizeof(*folios), GFP_KERNEL);
>> +	if (!folios) {
>> +		ret = -ENOMEM;
>> +		goto err;
>> +	}
>> +
>>   	for (i = 0; i < head->count; i++) {
>>   		struct file *memfd = fget(list[i].memfd);
>>
>> @@ -436,7 +440,7 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   			goto err;
>>
>>   		ret = udmabuf_pin_folios(ubuf, memfd, list[i].offset,
>> -					 list[i].size);
>> +					 list[i].size, folios);
>>   		fput(memfd);
>>   		if (ret)
>>   			goto err;
>> @@ -447,12 +451,14 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   	if (ret < 0)
>>   		goto err;
>>
>> +	kvfree(folios);
> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>
>>   	return ret;
>>
>>   err:
>>   	deinit_udmabuf(ubuf);
>>   err_noinit:
>>   	kfree(ubuf);
>> +	kvfree(folios);
>>   	return ret;
>>   }
>>
>> --
>> 2.45.2