diff mbox series

[1/2] lib/string_helpers: Introduce strsplit_u32()

Message ID 20220707091301.1282291-1-cezary.rojewski@intel.com
State New
Headers show
Series [1/2] lib/string_helpers: Introduce strsplit_u32() | expand

Commit Message

Cezary Rojewski July 7, 2022, 9:13 a.m. UTC
Add strsplit_u32() and its __user variant to allow for splitting
specified string into array of u32 tokens.

Originally this functionality was added for the SOF sound driver. As
more users are on the horizon, relocate it so it becomes a common good.

Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
---
 include/linux/string_helpers.h    |  3 +
 lib/string_helpers.c              | 96 +++++++++++++++++++++++++++++++
 sound/soc/sof/sof-client-probes.c | 51 +---------------
 3 files changed, 100 insertions(+), 50 deletions(-)

Comments

Andy Shevchenko July 8, 2022, 10:22 a.m. UTC | #1
On Thu, Jul 7, 2022 at 11:03 AM Cezary Rojewski
<cezary.rojewski@intel.com> wrote:
>
> Add strsplit_u32() and its __user variant to allow for splitting
> specified string into array of u32 tokens.

And I believe we have more of this done in old code.
Since all callers use ',' as a delimiter, have you considered using
get_options()?

> Originally this functionality was added for the SOF sound driver. As
> more users are on the horizon, relocate it so it becomes a common good.

Maybe it can be fixed just there.
Cezary Rojewski July 8, 2022, 11:32 a.m. UTC | #2
On 2022-07-08 12:22 PM, Andy Shevchenko wrote:
> On Thu, Jul 7, 2022 at 11:03 AM Cezary Rojewski
> <cezary.rojewski@intel.com> wrote:
>>
>> Add strsplit_u32() and its __user variant to allow for splitting
>> specified string into array of u32 tokens.
> 
> And I believe we have more of this done in old code.
> Since all callers use ',' as a delimiter, have you considered using
> get_options()?


Thanks for your input, Andy.

When I'd written the very first version of this function many months 
ago, get_options() looked as it does not fulfill our needs. It seems to 
be true even today: caller needs to know the number of elements in an 
array upfront. Also, kstrtox() takes into account '0x' and modifies the 
base accordingly if that's the case. simple_strtoull() looks as not 
capable of doing the same thing.

The goal is to be able to parse input such as:

0x1000003,0,0,0x1000004,0,0

into a sequence of 6 uints, filling the *tkns and *num_tkns for the caller.

>> Originally this functionality was added for the SOF sound driver. As
>> more users are on the horizon, relocate it so it becomes a common good.
> 
> Maybe it can be fixed just there.

avs-driver, which is also part of the ASoC framework has very similar 
debug-interface. I believe there's no need to duplicate the functions - 
move them to common code instead.


Regards,
Czarek
Andy Shevchenko July 8, 2022, 11:46 a.m. UTC | #3
On Fri, Jul 8, 2022 at 1:33 PM Cezary Rojewski
<cezary.rojewski@intel.com> wrote:
> On 2022-07-08 12:22 PM, Andy Shevchenko wrote:
> > On Thu, Jul 7, 2022 at 11:03 AM Cezary Rojewski
> > <cezary.rojewski@intel.com> wrote:
> >>
> >> Add strsplit_u32() and its __user variant to allow for splitting
> >> specified string into array of u32 tokens.
> >
> > And I believe we have more of this done in old code.
> > Since all callers use ',' as a delimiter, have you considered using
> > get_options()?
>
>
> Thanks for your input, Andy.
>
> When I'd written the very first version of this function many months
> ago, get_options() looked as it does not fulfill our needs. It seems to
> be true even today: caller needs to know the number of elements in an
> array upfront.

Have you read a kernel doc for it? It does return the number of
elements at the first pass.

> Also, kstrtox() takes into account '0x' and modifies the
> base accordingly if that's the case. simple_strtoull() looks as not
> capable of doing the same thing.

How come?! It does parse all known prefixes: 0x, 0, +, -.

> The goal is to be able to parse input such as:
>
> 0x1000003,0,0,0x1000004,0,0
>
> into a sequence of 6 uints, filling the *tkns and *num_tkns for the caller.

Yes. Have you checked the test cases for get_options()?

> >> Originally this functionality was added for the SOF sound driver. As
> >> more users are on the horizon, relocate it so it becomes a common good.
> >
> > Maybe it can be fixed just there.
>
> avs-driver, which is also part of the ASoC framework has very similar
> debug-interface. I believe there's no need to duplicate the functions -
> move them to common code instead.

Taking the above into account, please try to use get_options() and
then tell me what's not working with it. If so, we will add test cases
to get_options() and fix it.
Cezary Rojewski July 8, 2022, 12:13 p.m. UTC | #4
On 2022-07-08 1:46 PM, Andy Shevchenko wrote:
> On Fri, Jul 8, 2022 at 1:33 PM Cezary Rojewski
> <cezary.rojewski@intel.com> wrote:

...

>> When I'd written the very first version of this function many months
>> ago, get_options() looked as it does not fulfill our needs. It seems to
>> be true even today: caller needs to know the number of elements in an
>> array upfront.
> 
> Have you read a kernel doc for it? It does return the number of
> elements at the first pass.

Yes, I've checked several parts of it. Perhaps I did miss something but 
simple_strtoull() doc reads: use kstrtox() instead. Thus the 
strsplit_u32() makes use of kstrtox().

>> Also, kstrtox() takes into account '0x' and modifies the
>> base accordingly if that's the case. simple_strtoull() looks as not
>> capable of doing the same thing.
> 
> How come?! It does parse all known prefixes: 0x, 0, +, -.

Hmm.. doc says that it stops at the first non-digit character. Will 
re-check.

>> The goal is to be able to parse input such as:
>>
>> 0x1000003,0,0,0x1000004,0,0
>>
>> into a sequence of 6 uints, filling the *tkns and *num_tkns for the caller.
> 
> Yes. Have you checked the test cases for get_options()?
> 

...

>> avs-driver, which is also part of the ASoC framework has very similar
>> debug-interface. I believe there's no need to duplicate the functions -
>> move them to common code instead.
> 
> Taking the above into account, please try to use get_options() and
> then tell me what's not working with it. If so, we will add test cases
> to get_options() and fix it.

There is a difference:

	// get_options
	int ints[5];

	s = get_options(str, ARRAY_SIZE(ints), ints);

	// strsplit_u32()
	u32 *tkns, num_tkns;

	ret = strsplit_u32(str, delim, &tkns, &num_tkns);

Nothing has been told upfront for in the second case.
Peter Ujfalusi July 8, 2022, 12:35 p.m. UTC | #5
On 08/07/2022 15:30, Andy Shevchenko wrote:
> On Fri, Jul 08, 2022 at 02:13:14PM +0200, Cezary Rojewski wrote:
>> On 2022-07-08 1:46 PM, Andy Shevchenko wrote:
>>> On Fri, Jul 8, 2022 at 1:33 PM Cezary Rojewski
>>> <cezary.rojewski@intel.com> wrote:
> 
> ...
> 
>>>> When I'd written the very first version of this function many months
>>>> ago, get_options() looked as it does not fulfill our needs. It seems to
>>>> be true even today: caller needs to know the number of elements in an
>>>> array upfront.
>>>
>>> Have you read a kernel doc for it? It does return the number of
>>> elements at the first pass.
>>
>> Yes, I've checked several parts of it. Perhaps I did miss something but
>> simple_strtoull() doc reads: use kstrtox() instead.
> 
> Doc was fixed to make clearer that in some cases it's okay to use
> simple_strtox(). And this was exactly due to obfuscation code with this
> recommendation. Yes, in general one supposed to use kstrtox(), but it's
> not 100% obligatory.
> 
>> Thus the strsplit_u32()
>> makes use of kstrtox().
> 
> Yeah...
> 
>>>> Also, kstrtox() takes into account '0x' and modifies the
>>>> base accordingly if that's the case. simple_strtoull() looks as not
>>>> capable of doing the same thing.
>>>
>>> How come?! It does parse all known prefixes: 0x, 0, +, -.
>>
>> Hmm.. doc says that it stops at the first non-digit character. Will
>> re-check.
> 
> Yes, but under non-digit implies the standard prefixes of digits.
> simple_strtox() and kstrotox() use the very same function for prefixes.
> 
>>>> The goal is to be able to parse input such as:
>>>>
>>>> 0x1000003,0,0,0x1000004,0,0
>>>>
>>>> into a sequence of 6 uints, filling the *tkns and *num_tkns for the caller.
>>>
>>> Yes. Have you checked the test cases for get_options()?
> 
> (1)
> 
> ...
> 
>>>> avs-driver, which is also part of the ASoC framework has very similar
>>>> debug-interface. I believe there's no need to duplicate the functions -
>>>> move them to common code instead.
>>>
>>> Taking the above into account, please try to use get_options() and
>>> then tell me what's not working with it. If so, we will add test cases
>>> to get_options() and fix it.
>>
>> There is a difference:
>>
>> 	// get_options
>> 	int ints[5];
>>
>> 	s = get_options(str, ARRAY_SIZE(ints), ints);
>>
>> 	// strsplit_u32()
>> 	u32 *tkns, num_tkns;
>>
>> 	ret = strsplit_u32(str, delim, &tkns, &num_tkns);
>>
>> Nothing has been told upfront for in the second case.
> 
> It seems you are missing the (1). The code has checks for the case where you
> can do get number upfront, it would just require two passes, but it's nothing
> in comparison of heave realloc().
> 
>   unsigned int *tokens;
>   char *p;
>   int num;
> 
>   p = get_options(str, 0, &num);
>   if (num == 0)
> 	// No numbers in the string!
> 
>   tokens = kcalloc(num + 1, sizeof(*tokens), GFP_KERNEL);
>   if (!tokens)
> 	return -ENOMEM;
> 
>   p = get_oprions(str, num, &tokens);
>   if (*p)
> 	// String was parsed only partially!
> 	// assuming it's not a fatal error
> 
>   return tokens;
> 

This diff is tested and works:
diff --git a/sound/soc/sof/sof-client-probes.c b/sound/soc/sof/sof-client-probes.c
index 60e4250fac87..48d405f78a83 100644
--- a/sound/soc/sof/sof-client-probes.c
+++ b/sound/soc/sof/sof-client-probes.c
@@ -410,61 +410,11 @@ static const struct snd_compress_ops sof_probes_compressed_ops = {
 	.copy = sof_probes_compr_copy,
 };
 
-/**
- * strsplit_u32 - Split string into sequence of u32 tokens
- * @buf:	String to split into tokens.
- * @delim:	String containing delimiter characters.
- * @tkns:	Returned u32 sequence pointer.
- * @num_tkns:	Returned number of tokens obtained.
- */
-static int strsplit_u32(char *buf, const char *delim, u32 **tkns, size_t *num_tkns)
-{
-	char *s;
-	u32 *data, *tmp;
-	size_t count = 0;
-	size_t cap = 32;
-	int ret = 0;
-
-	*tkns = NULL;
-	*num_tkns = 0;
-	data = kcalloc(cap, sizeof(*data), GFP_KERNEL);
-	if (!data)
-		return -ENOMEM;
-
-	while ((s = strsep(&buf, delim)) != NULL) {
-		ret = kstrtouint(s, 0, data + count);
-		if (ret)
-			goto exit;
-		if (++count >= cap) {
-			cap *= 2;
-			tmp = krealloc(data, cap * sizeof(*data), GFP_KERNEL);
-			if (!tmp) {
-				ret = -ENOMEM;
-				goto exit;
-			}
-			data = tmp;
-		}
-	}
-
-	if (!count)
-		goto exit;
-	*tkns = kmemdup(data, count * sizeof(*data), GFP_KERNEL);
-	if (!(*tkns)) {
-		ret = -ENOMEM;
-		goto exit;
-	}
-	*num_tkns = count;
-
-exit:
-	kfree(data);
-	return ret;
-}
-
 static int tokenize_input(const char __user *from, size_t count,
 			  loff_t *ppos, u32 **tkns, size_t *num_tkns)
 {
+	int ret, nints, i, *ints;
 	char *buf;
-	int ret;
 
 	buf = kmalloc(count + 1, GFP_KERNEL);
 	if (!buf)
@@ -473,12 +423,36 @@ static int tokenize_input(const char __user *from, size_t count,
 	ret = simple_write_to_buffer(buf, count, ppos, from, count);
 	if (ret != count) {
 		ret = ret >= 0 ? -EIO : ret;
-		goto exit;
+		goto free_buf;
 	}
 
 	buf[count] = '\0';
-	ret = strsplit_u32(buf, ",", tkns, num_tkns);
-exit:
+	get_options(buf, 0, &nints);
+	if (!nints) {
+		ret = -ENOENT;
+		goto free_buf;
+	}
+
+	ints = kcalloc(nints + 1, sizeof(*ints), GFP_KERNEL);
+	if (!ints) {
+		ret = -ENOMEM;
+		goto free_buf;
+	}
+
+	*tkns = kcalloc(nints, sizeof(u32), GFP_KERNEL);
+	if (!(*tkns)) {
+		ret = -ENOMEM;
+		goto free_ints;
+	}
+
+	get_options(buf, nints + 1, ints);
+	*num_tkns = nints;
+	for (i = 0; i < nints; i++)
+		(*tkns)[i] = ints[i + 1];
+	
+free_ints:
+	kfree(ints);
+free_buf:
 	kfree(buf);
 	return ret;
 }

Could be made nicer with some brain work put to it, we need strict u32 within the IPC message for the array...
Andy Shevchenko July 8, 2022, 3:25 p.m. UTC | #6
On Fri, Jul 8, 2022 at 2:34 PM Péter Ujfalusi
<peter.ujfalusi@linux.intel.com> wrote:
> On 08/07/2022 15:30, Andy Shevchenko wrote:
> > On Fri, Jul 08, 2022 at 02:13:14PM +0200, Cezary Rojewski wrote:

...

> > It seems you are missing the (1). The code has checks for the case where you
> > can do get number upfront, it would just require two passes, but it's nothing
> > in comparison of heave realloc().
> >
> >   unsigned int *tokens;
> >   char *p;
> >   int num;
> >
> >   p = get_options(str, 0, &num);
> >   if (num == 0)
> >       // No numbers in the string!
> >
> >   tokens = kcalloc(num + 1, sizeof(*tokens), GFP_KERNEL);
> >   if (!tokens)
> >       return -ENOMEM;
> >
> >   p = get_oprions(str, num, &tokens);
> >   if (*p)
> >       // String was parsed only partially!
> >       // assuming it's not a fatal error
> >
> >   return tokens;

> This diff is tested and works:

Thanks, Peter!

But at least you can memove() to avoid second allocation.
ideally to refactor that the result of get_options is consumed as is
(it may be casted to struct tokens { int n; u32 v[]; })

...

> Could be made nicer with some brain work put to it, we need strict u32 within the IPC message for the array...

True, it needs to be thought through. But I guess you got the idea of
how to use existing library routines.
Andy Shevchenko July 8, 2022, 4:49 p.m. UTC | #7
On Fri, Jul 8, 2022 at 6:32 PM Cezary Rojewski
<cezary.rojewski@intel.com> wrote:
>
> On 2022-07-08 5:25 PM, Andy Shevchenko wrote:
> > On Fri, Jul 8, 2022 at 2:34 PM Péter Ujfalusi
> > <peter.ujfalusi@linux.intel.com> wrote:

> A long shot, but what if we were to modify get_options() so it takes
> additional element-size parameter instead?

But why? int / unsigned int, u32 / s32  are all compatible in the current cases.
Cezary Rojewski July 12, 2022, 1:51 p.m. UTC | #8
On 2022-07-09 10:42 PM, Andy Shevchenko wrote:
> On Sat, Jul 09, 2022 at 10:45:49AM +0200, Cezary Rojewski wrote:
>> On 2022-07-08 6:49 PM, Andy Shevchenko wrote:
>>> On Fri, Jul 8, 2022 at 6:32 PM Cezary Rojewski
>>> <cezary.rojewski@intel.com> wrote:
>>>>
>>>> On 2022-07-08 5:25 PM, Andy Shevchenko wrote:
>>>>> On Fri, Jul 8, 2022 at 2:34 PM Péter Ujfalusi
>>>>> <peter.ujfalusi@linux.intel.com> wrote:
>>>
>>>> A long shot, but what if we were to modify get_options() so it takes
>>>> additional element-size parameter instead?
>>>
>>> But why? int / unsigned int, u32 / s32  are all compatible in the current cases.
>>
>> I'd like to avoid any additional operations, so that the retrieved payload
>> can be provided to the IPC handler directly. The IPC handlers for AudioDSP
>> drivers are expecting payload in u32s.
>>
>> // u32 **tkns, size_t *num_tkns as foo() arguments
>> // u32 *ints, int nints as locals
>>
>> 	get_options(buf, 0, &nints);
>> 	if (!nints) {
>> 		ret = -ENOENT;
>> 		goto free_buf;
>> 	}
>>
>> 	ints = kcalloc(nints + 1, sizeof(*ints), GFP_KERNEL);
>> 	if (!ints) {
>> 		ret = -ENOMEM;
>> 		goto free_buf;
>> 	}
>>
>> 	get_num_options(buf, nints + 1, ints, sizeof(*ints));
>>
>> 	*tkns = ints;
>> 	*num_tkns = nints;
>>
>> No additional operations in between. The intermediate IPC handler can later
>> refer to the actual payload via &tkns[1] before passing it to the generic
>> one.
>>
>> Casting int array into u32 array does not feel right, or perhaps I'm missing
>> something like in the doc case.
> 
> C standard.
> 
> int to unsigned int is not promoted. And standard says that "The rank of any
> unsigned integer type shall equal the rank of the corresponding signed integer
> type, if any."
> 
> I don't know why one needs to have an additional churn here. int and unsigned
> int are interoperable with the adjustment to the sign when the other argument
> is signed or lesser rank of.


I still believe that casting blindly is not the way to go. I did 
explicitly ask about int vs u32, not int vs unsigned int. Please note 
that these values are later passed to the IPC handlers, and this changes 
the context a bit. If hw expects u32, then u32 it shall be.

Please correct me if I'm wrong, but there is no guarantee that int is 
always 32bits long. What is guaranteed though, is that int holds at 
least -/+ 32,767. Also, values larger than INT_MAX are allowed in the 
IPC payload.


Regards,
Czarek
Andy Shevchenko July 12, 2022, 1:59 p.m. UTC | #9
On Tue, Jul 12, 2022 at 3:51 PM Cezary Rojewski
<cezary.rojewski@intel.com> wrote:
> On 2022-07-09 10:42 PM, Andy Shevchenko wrote:
> > On Sat, Jul 09, 2022 at 10:45:49AM +0200, Cezary Rojewski wrote:
> >> On 2022-07-08 6:49 PM, Andy Shevchenko wrote:
> >>> On Fri, Jul 8, 2022 at 6:32 PM Cezary Rojewski
> >>> <cezary.rojewski@intel.com> wrote:
> >>>> On 2022-07-08 5:25 PM, Andy Shevchenko wrote:
> >>>>> On Fri, Jul 8, 2022 at 2:34 PM Péter Ujfalusi
> >>>>> <peter.ujfalusi@linux.intel.com> wrote:

...

> >>>> A long shot, but what if we were to modify get_options() so it takes
> >>>> additional element-size parameter instead?
> >>>
> >>> But why? int / unsigned int, u32 / s32  are all compatible in the current cases.
> >>
> >> I'd like to avoid any additional operations, so that the retrieved payload
> >> can be provided to the IPC handler directly. The IPC handlers for AudioDSP
> >> drivers are expecting payload in u32s.
> >>
> >> // u32 **tkns, size_t *num_tkns as foo() arguments
> >> // u32 *ints, int nints as locals
> >>
> >>      get_options(buf, 0, &nints);
> >>      if (!nints) {
> >>              ret = -ENOENT;
> >>              goto free_buf;
> >>      }
> >>
> >>      ints = kcalloc(nints + 1, sizeof(*ints), GFP_KERNEL);
> >>      if (!ints) {
> >>              ret = -ENOMEM;
> >>              goto free_buf;
> >>      }
> >>
> >>      get_num_options(buf, nints + 1, ints, sizeof(*ints));
> >>
> >>      *tkns = ints;
> >>      *num_tkns = nints;
> >>
> >> No additional operations in between. The intermediate IPC handler can later
> >> refer to the actual payload via &tkns[1] before passing it to the generic
> >> one.
> >>
> >> Casting int array into u32 array does not feel right, or perhaps I'm missing
> >> something like in the doc case.
> >
> > C standard.
> >
> > int to unsigned int is not promoted. And standard says that "The rank of any
> > unsigned integer type shall equal the rank of the corresponding signed integer
> > type, if any."
> >
> > I don't know why one needs to have an additional churn here. int and unsigned
> > int are interoperable with the adjustment to the sign when the other argument
> > is signed or lesser rank of.
>
> I still believe that casting blindly is not the way to go. I did
> explicitly ask about int vs u32,

There is no such type in the C standard.

> not int vs unsigned int. Please note
> that these values are later passed to the IPC handlers, and this changes
> the context a bit. If hw expects u32, then u32 it shall be.

H/W doesn't expect u32, HW expects bytes or group of bytes with:
1) dedicated address alignment (if required);
2) dedicated byte order;
3) dedicated padding (if required).

Correct me if I'm wrong.

> Please correct me if I'm wrong, but there is no guarantee that int is
> always 32bits long.

There is no guarantee by the C standard, indeed, but there is an upper
level guarantee, by the Linux kernel.

> What is guaranteed though, is that int holds at
> least -/+ 32,767. Also, values larger than INT_MAX are allowed in the
> IPC payload.

Yeah... this is binary protocol, right? So, what limits are you
talking about here if they are not applicable there anyway. It's
simply different dimension of limits (i.e. bytes and bits and not C
language types).
Mark Brown July 12, 2022, 2:02 p.m. UTC | #10
On Tue, Jul 12, 2022 at 03:51:04PM +0200, Cezary Rojewski wrote:

> Please correct me if I'm wrong, but there is no guarantee that int is always
> 32bits long. What is guaranteed though, is that int holds at least -/+
> 32,767. Also, values larger than INT_MAX are allowed in the IPC payload.

Right, int is just the natural size for an integer on the platform.
Andy Shevchenko July 12, 2022, 2:24 p.m. UTC | #11
On Tue, Jul 12, 2022 at 03:51:04PM +0200, Cezary Rojewski wrote:
> On 2022-07-09 10:42 PM, Andy Shevchenko wrote:

...

> I still believe that casting blindly is not the way to go. I did explicitly
> ask about int vs u32, not int vs unsigned int. Please note that these values
> are later passed to the IPC handlers, and this changes the context a bit. If
> hw expects u32, then u32 it shall be.

What you can do is probably utilize _Generic() which will reduce the code base
and allow to use the same template for different types.
David Laight July 13, 2022, 9:14 a.m. UTC | #12
>          if (pint)
> -               *pint = value;
> +               memcpy(pint, &value, min(nsize, sizeof(value)));

That is just soooooo broken.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Andy Shevchenko July 13, 2022, 9:38 a.m. UTC | #13
On Wed, Jul 13, 2022 at 11:14 AM David Laight <David.Laight@aculab.com> wrote:
>
> >          if (pint)
> > -               *pint = value;
> > +               memcpy(pint, &value, min(nsize, sizeof(value)));
>
> That is just soooooo broken.

OK.
Cezary Rojewski July 19, 2022, 11:40 a.m. UTC | #14
On 2022-07-12 4:24 PM, Andy Shevchenko wrote:
> On Tue, Jul 12, 2022 at 03:51:04PM +0200, Cezary Rojewski wrote:
>> On 2022-07-09 10:42 PM, Andy Shevchenko wrote:
> 
> ...
> 
>> I still believe that casting blindly is not the way to go. I did explicitly
>> ask about int vs u32, not int vs unsigned int. Please note that these values
>> are later passed to the IPC handlers, and this changes the context a bit. If
>> hw expects u32, then u32 it shall be.
> 
> What you can do is probably utilize _Generic() which will reduce the code base
> and allow to use the same template for different types.


Writing to let you know that the feedback is not ignored, just my TODO 
queue is large. Will come back once _Generic() idea is verified on my 
side or when I have more questions. Thanks once again for your input Andy!


Regards,
Czarek
Cezary Rojewski Aug. 9, 2022, 9:55 a.m. UTC | #15
On 2022-07-12 4:24 PM, Andy Shevchenko wrote:
> On Tue, Jul 12, 2022 at 03:51:04PM +0200, Cezary Rojewski wrote:
>> On 2022-07-09 10:42 PM, Andy Shevchenko wrote:
> 
> ...
> 
>> I still believe that casting blindly is not the way to go. I did explicitly
>> ask about int vs u32, not int vs unsigned int. Please note that these values
>> are later passed to the IPC handlers, and this changes the context a bit. If
>> hw expects u32, then u32 it shall be.
> 
> What you can do is probably utilize _Generic() which will reduce the code base
> and allow to use the same template for different types.


Hello,

I've spent some time analyzing possible utilization of _Generic() in 
context of get_options() but in my opinion get_range() complicates 
things enough that get_range() and get_option() would basically need a 
copy per type.


If Linux kernel guarantees that sizeof(int), sizeof(unsigned int), 
sizeof(s32) and sizeof(u32) are all equal (given the currently supported 
arch set), then indeed modifying get_options() may not be necessary. 
This plus shamelessly casting (u32 *) to (int *) of course.

What's left to do is the __user helper function. What I have in mind is:

int tokenize_user_input(const char __user *from, size_t count, loff_t 
*ppos, int **tkns)
{
	int *ints, nints;
	char *buf;
	int ret;

	buf = kmalloc(count + 1, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	ret = simple_write_to_buffer(buf, count, ppos, from, count);
	if (ret != count) {
		ret = (ret < 0) ? ret : -EIO;
		goto free_buf;
	}

	buf[count] = '\0';

	get_options(buf, 0, &nints);
	if (!nints) {
		ret = -ENOENT;
		goto free_buf;
	}

	ints = kcalloc(nints + 1, sizeof(*ints), GFP_KERNEL);
	if (!ints) {
		ret = -ENOMEM;
		goto free_buf;
	}

	get_options(buf, nints + 1, ints);
	*tkns = ints;
	ret = 0;

free_buf:
	kfree(buf);
	return ret;
}


Usage:
	u32 *tkns;

	ret = tokenize_user_input(from, count, ppos, (int **)&tkns);


as a part of fs/libfs.c not lib/cmdline.c. Is such approach acceptable?



Regards,
Czarek
Andy Shevchenko Aug. 9, 2022, 3:23 p.m. UTC | #16
On Tue, Aug 9, 2022 at 11:55 AM Cezary Rojewski
<cezary.rojewski@intel.com> wrote:
> On 2022-07-12 4:24 PM, Andy Shevchenko wrote:

...

> I've spent some time analyzing possible utilization of _Generic() in
> context of get_options() but in my opinion get_range() complicates
> things enough that get_range() and get_option() would basically need a
> copy per type.

Thanks for keeping us updated.

> If Linux kernel guarantees that sizeof(int), sizeof(unsigned int),
> sizeof(s32) and sizeof(u32) are all equal (given the currently supported
> arch set), then indeed modifying get_options() may not be necessary.
> This plus shamelessly casting (u32 *) to (int *) of course.

I think as long as Linux kernel states that it requires (at least)
32-bit architecture to run, we are fine. I have heard of course about
a funny project of running Linux on 8-bit microcontrollers, but it's
such a fun niche, which by the way uses emulation without changing
actual 32-bit code, that I won't even talk about.

> What's left to do is the __user helper function. What I have in mind is:
>
> int tokenize_user_input(const char __user *from, size_t count, loff_t
> *ppos, int **tkns)
> {
>         int *ints, nints;
>         char *buf;
>         int ret;
>
>         buf = kmalloc(count + 1, GFP_KERNEL);
>         if (!buf)
>                 return -ENOMEM;
>
>         ret = simple_write_to_buffer(buf, count, ppos, from, count);
>         if (ret != count) {
>                 ret = (ret < 0) ? ret : -EIO;
>                 goto free_buf;
>         }
>
>         buf[count] = '\0';

I guess this may be simplified with memdup_user(). Otherwise it looks like that.

>         get_options(buf, 0, &nints);

(You don't use ppos here, so it's pointless to use
simple_write_to_buffer(), right? I have noticed this pattern in SOF
code, which might be simplified the same way as I suggested above)

>         if (!nints) {
>                 ret = -ENOENT;
>                 goto free_buf;
>         }
>
>         ints = kcalloc(nints + 1, sizeof(*ints), GFP_KERNEL);
>         if (!ints) {
>                 ret = -ENOMEM;
>                 goto free_buf;
>         }
>
>         get_options(buf, nints + 1, ints);
>         *tkns = ints;
>         ret = 0;
>
> free_buf:
>         kfree(buf);
>         return ret;
> }

...

> as a part of fs/libfs.c not lib/cmdline.c. Is such approach acceptable?

I think so.
Cezary Rojewski Aug. 16, 2022, 9:28 a.m. UTC | #17
On 2022-08-09 5:23 PM, Andy Shevchenko wrote:
> On Tue, Aug 9, 2022 at 11:55 AM Cezary Rojewski
> <cezary.rojewski@intel.com> wrote:

...

> 
> I guess this may be simplified with memdup_user(). Otherwise it looks like that.

...

> (You don't use ppos here, so it's pointless to use
> simple_write_to_buffer(), right? I have noticed this pattern in SOF
> code, which might be simplified the same way as I suggested above)


Hello Andy,

Given the two major suggestions (memdup_user() and re-using 
get_options()) that had a major impact on the patch are both provided by 
you, would you like me to add any tags to the commit message? I'm 
speaking of Suggested-by or Co-developed-by and such. In you choose 
'yes', please specify tags to be added.

By the way, I've provided 'the final form' on thesofproject/linux as PR 
[1] to see if no regression is caused in basic scenarios.


[1]: https://github.com/thesofproject/linux/pull/3812


Regards,
Czarek
Andy Shevchenko Aug. 25, 2022, 3:09 p.m. UTC | #18
On Tue, Aug 16, 2022 at 11:28:46AM +0200, Cezary Rojewski wrote:
> On 2022-08-09 5:23 PM, Andy Shevchenko wrote:

...

> Given the two major suggestions (memdup_user() and re-using get_options())
> that had a major impact on the patch are both provided by you, would you
> like me to add any tags to the commit message? I'm speaking of Suggested-by
> or Co-developed-by and such. In you choose 'yes', please specify tags to be
> added.

Suggested-by would be enough.

> By the way, I've provided 'the final form' on thesofproject/linux as PR [1]
> to see if no regression is caused in basic scenarios.

When you will be ready, send it for upstream review. It would be easier for
the kernel community to look at and comment on.

> [1]: https://github.com/thesofproject/linux/pull/3812
Cezary Rojewski Aug. 25, 2022, 4:44 p.m. UTC | #19
On 2022-08-25 5:09 PM, Andy Shevchenko wrote:
> On Tue, Aug 16, 2022 at 11:28:46AM +0200, Cezary Rojewski wrote:
>> On 2022-08-09 5:23 PM, Andy Shevchenko wrote:

...

> Suggested-by would be enough.
> 
>> By the way, I've provided 'the final form' on thesofproject/linux as PR [1]
>> to see if no regression is caused in basic scenarios.
> 
> When you will be ready, send it for upstream review. It would be easier for
> the kernel community to look at and comment on.

On its way. Was awaiting your replay so I do not need to send the series 
twice : ) The PR was there to check if there is no regression on SOF 
side, at least on a BAT (Basic Acceptable Tests) level.


Regards,
Czarek
diff mbox series

Patch

diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h
index 4d72258d42fd..a4630ddfca27 100644
--- a/include/linux/string_helpers.h
+++ b/include/linux/string_helpers.h
@@ -126,4 +126,7 @@  static inline const char *str_enabled_disabled(bool v)
 	return v ? "enabled" : "disabled";
 }
 
+int strsplit_u32(const char *str, const char *delim, u32 **tkns, size_t *num_tkns);
+int strsplit_u32_user(const char __user *from, size_t count, loff_t *ppos, const char *delim,
+		      u32 **tkns, size_t *num_tkns);
 #endif
diff --git a/lib/string_helpers.c b/lib/string_helpers.c
index 5ed3beb066e6..bb24f0c62539 100644
--- a/lib/string_helpers.c
+++ b/lib/string_helpers.c
@@ -984,3 +984,99 @@  void fortify_panic(const char *name)
 }
 EXPORT_SYMBOL(fortify_panic);
 #endif /* CONFIG_FORTIFY_SOURCE */
+
+/**
+ * strsplit_u32 - Split string into sequence of u32 tokens
+ * @str:	The string to split into tokens.
+ * @delim:	The string containing delimiter characters.
+ * @tkns:	Returned u32 sequence pointer.
+ * @num_tkns:	Returned number of tokens obtained.
+ *
+ * On success @num_tkns and @tkns are assigned the number of tokens extracted
+ * and the array itself respectively.
+ * Caller takes responsibility for freeing @tkns when no longer needed.
+ */
+int strsplit_u32(const char *str, const char *delim, u32 **tkns, size_t *num_tkns)
+{
+	size_t max_count = 32;
+	size_t count = 0;
+	char *s, **p;
+	u32 *buf, *tmp;
+	int ret = 0;
+
+	p = (char **)&str;
+	*tkns = NULL;
+	*num_tkns = 0;
+
+	buf = kcalloc(max_count, sizeof(*buf), GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	while ((s = strsep(p, delim)) != NULL) {
+		ret = kstrtouint(s, 0, buf + count);
+		if (ret)
+			goto free_buf;
+
+		if (++count > max_count) {
+			max_count *= 2;
+			tmp = krealloc(buf, max_count * sizeof(*buf), GFP_KERNEL);
+			if (!tmp) {
+				ret = -ENOMEM;
+				goto free_buf;
+			}
+			buf = tmp;
+		}
+	}
+
+	if (!count)
+		goto free_buf;
+	*tkns = kmemdup(buf, count * sizeof(*buf), GFP_KERNEL);
+	if (*tkns == NULL) {
+		ret = -ENOMEM;
+		goto free_buf;
+	}
+	*num_tkns = count;
+
+free_buf:
+	kfree(buf);
+	return ret;
+}
+EXPORT_SYMBOL(strsplit_u32);
+
+/**
+ * strsplit_u32_user - Split string into sequence of u32 tokens
+ * @from:	The user space buffer to read from
+ * @ppos:	The current position in the buffer
+ * @count:	The maximum number of bytes to read
+ * @delim:	The string containing delimiter characters.
+ * @tkns:	Returned u32 sequence pointer.
+ * @num_tkns:	Returned number of tokens obtained.
+ *
+ * On success @num_tkns and @tkns are assigned the number of tokens extracted
+ * and the array itself respectively.
+ * Caller takes responsibility for freeing @tkns when no longer needed.
+ */
+int strsplit_u32_user(const char __user *from, size_t count, loff_t *ppos, const char *delim,
+		      u32 **tkns, size_t *num_tkns)
+{
+	char *buf;
+	int ret;
+
+	buf = kmalloc(count + 1, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	ret = simple_write_to_buffer(buf, count, ppos, from, count);
+	if (ret != count) {
+		ret = (ret < 0) ? ret : -EIO;
+		goto free_buf;
+	}
+
+	buf[count] = '\0';
+	ret = strsplit_u32(buf, delim, tkns, num_tkns);
+
+free_buf:
+	kfree(buf);
+	return ret;
+}
+EXPORT_SYMBOL(strsplit_u32_user);
diff --git a/sound/soc/sof/sof-client-probes.c b/sound/soc/sof/sof-client-probes.c
index 1f1ea93a7fbf..48ebbe58e2b9 100644
--- a/sound/soc/sof/sof-client-probes.c
+++ b/sound/soc/sof/sof-client-probes.c
@@ -12,6 +12,7 @@ 
 #include <linux/debugfs.h>
 #include <linux/module.h>
 #include <linux/pm_runtime.h>
+#include <linux/string_helpers.h>
 #include <sound/soc.h>
 #include <sound/sof/header.h>
 #include "sof-client.h"
@@ -410,56 +411,6 @@  static const struct snd_compress_ops sof_probes_compressed_ops = {
 	.copy = sof_probes_compr_copy,
 };
 
-/**
- * strsplit_u32 - Split string into sequence of u32 tokens
- * @buf:	String to split into tokens.
- * @delim:	String containing delimiter characters.
- * @tkns:	Returned u32 sequence pointer.
- * @num_tkns:	Returned number of tokens obtained.
- */
-static int strsplit_u32(char *buf, const char *delim, u32 **tkns, size_t *num_tkns)
-{
-	char *s;
-	u32 *data, *tmp;
-	size_t count = 0;
-	size_t cap = 32;
-	int ret = 0;
-
-	*tkns = NULL;
-	*num_tkns = 0;
-	data = kcalloc(cap, sizeof(*data), GFP_KERNEL);
-	if (!data)
-		return -ENOMEM;
-
-	while ((s = strsep(&buf, delim)) != NULL) {
-		ret = kstrtouint(s, 0, data + count);
-		if (ret)
-			goto exit;
-		if (++count >= cap) {
-			cap *= 2;
-			tmp = krealloc(data, cap * sizeof(*data), GFP_KERNEL);
-			if (!tmp) {
-				ret = -ENOMEM;
-				goto exit;
-			}
-			data = tmp;
-		}
-	}
-
-	if (!count)
-		goto exit;
-	*tkns = kmemdup(data, count * sizeof(*data), GFP_KERNEL);
-	if (!(*tkns)) {
-		ret = -ENOMEM;
-		goto exit;
-	}
-	*num_tkns = count;
-
-exit:
-	kfree(data);
-	return ret;
-}
-
 static int tokenize_input(const char __user *from, size_t count,
 			  loff_t *ppos, u32 **tkns, size_t *num_tkns)
 {