diff mbox series

tty: serial: qcom-geni-serial: minor fixes to get_clk_div_rate()

Message ID 1654021066-13341-1-git-send-email-quic_vnivarth@quicinc.com
State New
Headers show
Series tty: serial: qcom-geni-serial: minor fixes to get_clk_div_rate() | expand

Commit Message

Vijaya Krishna Nivarthi May 31, 2022, 6:17 p.m. UTC
Add missing initialisation and correct type casting

Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com>
---
 drivers/tty/serial/qcom_geni_serial.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Vijaya Krishna Nivarthi June 3, 2022, 5:43 p.m. UTC | #1
Hi,


On 6/1/2022 9:03 PM, Doug Anderson wrote:
> Hi,
>
> On Wed, Jun 1, 2022 at 3:46 AM Vijaya Krishna Nivarthi
> <quic_vnivarth@quicinc.com> wrote:
>> Hi,
>>
>> On 6/1/2022 12:58 AM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Tue, May 31, 2022 at 11:18 AM Vijaya Krishna Nivarthi
>>> <quic_vnivarth@quicinc.com> wrote:
>>>> Add missing initialisation and correct type casting
>>>>
>>>> Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com>
>>>> ---
>>>>    drivers/tty/serial/qcom_geni_serial.c | 8 ++++----
>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
>>>> index 4733a23..08f3ad4 100644
>>>> --- a/drivers/tty/serial/qcom_geni_serial.c
>>>> +++ b/drivers/tty/serial/qcom_geni_serial.c
>>>> @@ -943,11 +943,11 @@ static int qcom_geni_serial_startup(struct uart_port *uport)
>>>>    static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
>>>>                           unsigned int sampling_rate, unsigned int *clk_div)
>>>>    {
>>>> -       unsigned long ser_clk;
>>>> +       unsigned long ser_clk = 0;
>>> In this patch it's not at all obvious why you'd need to init to 0. I
>>> think the "for loop" is guaranteed to run at least once because
>>> "max_div" is known at compile time. ...and currently each time through
>>> the "for" loop you'll always set "ser_clk".
>> Ok, I realised we will never break out of for loop exceeding ULONG_MAX
>> in 1st pass, so yes ser_clk will always be set.
>>
>>> I think in a future patch you'll want to _remove_ this from the for loop:
>>>
>>> if (!prev)
>>>     ser_clk = freq;
>> Intent is to save (and use) 1st freq if we cannot find an exact divider.
>>
>> Isn't it ok?
>>
>> For example please find debug output for a required frequency of 51.2MHz.
>>
>> We try dividers 1, 2, 3 and end up with 52.1MHz the first result.
>>
>> [   18.815432] 20220509 get_clk_div_rate desired_clk:51200000
>> [   18.821081] 20220509 get_clk_div_rate maxdiv:4095
>> [   18.825924] 20220509 get_clk_div_rate div:1
>> [   18.830239] 20220509 get_clk_div_rate freq:52174000
>> [   18.835288] 20220509 get_clk_div_rate div:2
>> [   18.839628] 20220509 get_clk_div_rate freq:100000000
>> [   18.844794] 20220509 get_clk_div_rate div:3
>> [   18.849119] 20220509 get_clk_div_rate freq:100000000
>> [   18.854254] 20220509 get_clk_div_rate reached max frequency breaking...
>> [   18.861072] 20220509 get_clk_div_rate clk_div=1, ser_clk=52174000
>>
>> The behaviour was same earlier too when root_freq table was present.
> Are you certain about the behavior being the same earlier? Before
> commit c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart
> frequency table..."), the behavior was that get_clk_cfg() would return
> 0 if there was no exact match. Then get_clk_div_rate() would see this
> 0 and print an error and return. Then the rest of
> qcom_geni_serial_set_termios() would do nothing at all.
>
> Ah, or I guess what you're saying is that the table historically
> contained "rounded" rates but that clk_round_rate() isn't returning
> nice round rates. OK, but if we truly want to support an inexact
> match, you'd want to pick the rate that reduces the error, not just
> pick the first one. In other words, something like this (untested):
>
> freq = clk_round_rate(clk, mult);
> diff = abs(((long)mult - freq) / div);
> if (diff < best_diff) {
>    best_diff = diff;
>    ser_clk = freq;
>    best_div = div;
> }
I am not sure if its required that freq is a multiple of best_div now 
that we don't have a multiple of desired_clk anyway.

If it is indeed required, with above patch its not guaranteed and 
finding best_div gets little more complicated?

We may have to loop through all available frequencies and dividers?

PFB, a proposed implementation with a 2nd loop. Its tested but I haven't 
been able to optimise it further because it misses corner theoretical 
cases when I try


     maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
     prev = 0;

     /* run through quicker loop anticipating to find an exact match */
     for (div = 1; div <= maxdiv; div++) {
         mult = (unsigned long long)div * desired_clk;
         if (mult > ULONG_MAX)
             break;

         freq = clk_round_rate(clk, max((unsigned long)mult, prev+1));
         if (!(freq % desired_clk)) {
             *clk_div = freq / desired_clk;
             return freq;
         }

         if (prev && prev == freq)
             break;

         prev = freq;
     }

     pr_warn("Can't find exact match frequency and divider\n");

     /*
      * this scenario ideally should be a rare occurrence
      * run through all frequencies and find closest match
      * note that it cannot get better than a difference of 1
      */
     freq = 0;
     best_diff = ULONG_MAX;
     while (true) {
         prev = freq;
         freq = clk_round_rate(clk, freq+1);

         if (freq == prev)
             break;

         for (div = 1; div <= maxdiv; div++) {
             if (!(freq % div)) {
                 diff = abs((long)(freq/div) - desired_clk);
                 if (diff < best_diff) {
                     best_diff = diff;
                     ser_clk = freq;
                     *clk_div = div;
                     if (diff == 1)
                         break;
                 }
             }
         }
     }

     return ser_clk;
}

>
> Why do you need this? Imagine that the desired rate was 50000001 or
> 49999999. The closest match would be to use the rate 100000000 and
> divide it by 2. ...but your existing algorithm would just arbitrarily
> pick the first rate returned.
>
> NOTE also that you could end up with a slightly higher or slightly
> lower clock than requested, right? So it's important to:
> * Do signed math when comparing.
> * Save the "div" instead of trying to recompute it at the end.
>
>
>> The table did contain 51.2MHz and we would exit with same but on call to
>> clk_set_rate(51.2MHz) we were ending up with 52.1MHz
>>
>>> ...and _that's_ when you should init "ser_clk" to 0. Until then I'd
>>> leave it as uninitialized...
>>>
>>> Honestly, I'd throw all the fixes into one series, too.
>> My concern was if there would be a requirement to split the changes.
>>
>> Will put in all in 1 series with Fixes tag.
>>
>>>
>>>>           unsigned long desired_clk;
>>>>           unsigned long freq, prev;
>>>>           unsigned long div, maxdiv;
>>>> -       int64_t mult;
>>>> +       unsigned long long mult;
>>>>
>>>>           desired_clk = baud * sampling_rate;
>>>>           if (!desired_clk) {
>>>> @@ -959,8 +959,8 @@ static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
>>>>           prev = 0;
>>>>
>>>>           for (div = 1; div <= maxdiv; div++) {
>>>> -               mult = div * desired_clk;
>>>> -               if (mult > ULONG_MAX)
>>>> +               mult = (unsigned long long)div * (unsigned long long)desired_clk;
>>> I think you only need to cast one of the two. The other will be
>>> up-cast automatically.
>> Will change.
>>>
>>>> +               if (mult > (unsigned long long)ULONG_MAX)
>>> I don't think you need this cast. As far as I know the C language will
>>> "upcast" to the larger of the two types.
>> Will change.
>>>
>>> -Doug
>> Thank you.
>>
>> -Vijay/
>>
Vijaya Krishna Nivarthi June 6, 2022, 6:19 p.m. UTC | #2
Hi,


On 6/4/2022 12:10 AM, Doug Anderson wrote:
> Hi,
>
> On Fri, Jun 3, 2022 at 10:43 AM Vijaya Krishna Nivarthi
> <quic_vnivarth@quicinc.com> wrote:
>>
>> Ah, or I guess what you're saying is that the table historically
>> contained "rounded" rates but that clk_round_rate() isn't returning
>> nice round rates. OK, but if we truly want to support an inexact
>> match, you'd want to pick the rate that reduces the error, not just
>> pick the first one. In other words, something like this (untested):
>>
>> freq = clk_round_rate(clk, mult);
>> diff = abs(((long)mult - freq) / div);
>> if (diff < best_diff) {
>>     best_diff = diff;
>>     ser_clk = freq;
>>     best_div = div;
>> }
>> I am not sure if its required that freq is a multiple of best_div now
>> that we don't have a multiple of desired_clk anyway.
> How about just this (untested):
>
> freq = clk_round_rate(clk, mult);
> candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
> candidate_freq = freq / candidate_div;
> diff = abs((long)desired_clk - candidate_freq);
> if (diff < best_diff) {
>    best_diff = diff;
>    ser_clk = freq;
>    best_div = candidate_div;
> }

I am afraid this still doesn't guarantee that ser_clk is a multiple of 
best_div

I tested it with a function simulates clk_round_rate.

static unsigned long clk_round_rate_test(struct clk *clk, unsigned long 
in_freq)
{
     unsigned long root_freq[6] = {105, 204, 303, 402, 501, 602};
     int i;

     for (i = 0; i < 6; i++) {
         if (root_freq[i] >= in_freq)
             return root_freq[i];
     }
     return root_freq[6];
}

     {
         unsigned long ser_clk;
         unsigned long desired_clk;
         unsigned long freq;
         int div_round_closest;
         unsigned long div;
         unsigned long mult;
         unsigned long candidate_div, candidate_freq;

         unsigned long diff, best_diff, best_div;
         unsigned long one;

         desired_clk = 100;
         one = 1;
         best_diff = ULONG_MAX;
         pr_err("\ndesired_clk-%d\n", desired_clk);
         for (div = 1; div <= 10; div++) {
             mult = div * desired_clk;

             freq = clk_round_rate_test(clk, mult);
             div_round_closest = DIV_ROUND_CLOSEST(freq, desired_clk);
             candidate_div = max(one, (unsigned long)div_round_closest);
             candidate_freq = freq / candidate_div;
             diff = abs((long)desired_clk - candidate_freq);
             pr_err("div-%d, mult-%d, freq-%d, div_round_closest-%d, 
candidate_div-%d, candidate_freq-%d, diff-%d\n",
                 div, mult, freq, div_round_closest, candidate_div, 
candidate_freq, diff);
             if (diff < best_diff) {
                 pr_err("This is best so far\n");
                 best_diff = diff;
                 ser_clk = freq;
                 best_div = candidate_div;
             }
         }
         pr_err("\nbest_diff-%d, ser_clk-%d, best_div-%d\n",
             best_diff, ser_clk, best_div);
     }

And here is the output

[   17.835167] desired_clk-100
[   17.839567] div-1, mult-100, freq-105, div_round_closest-1, 
candidate_div-1, candidate_freq-105, diff-5
[   17.849220] This is best so far
[   17.852458] div-2, mult-200, freq-204, div_round_closest-2, 
candidate_div-2, candidate_freq-102, diff-2
[   17.862104] This is best so far
[   17.865345] div-3, mult-300, freq-303, div_round_closest-3, 
candidate_div-3, candidate_freq-101, diff-1
[   17.874995] This is best so far
[   17.878237] div-4, mult-400, freq-402, div_round_closest-4, 
candidate_div-4, candidate_freq-100, diff-0
[   17.887882] This is best so far
[   17.891118] div-5, mult-500, freq-501, div_round_closest-5, 
candidate_div-5, candidate_freq-100, diff-0
[   17.900770] div-6, mult-600, freq-602, div_round_closest-6, 
candidate_div-6, candidate_freq-100, diff-0
[   17.910415] div-7, mult-700, freq-602, div_round_closest-6, 
candidate_div-6, candidate_freq-100, diff-0
[   17.920057] div-8, mult-800, freq-602, div_round_closest-6, 
candidate_div-6, candidate_freq-100, diff-0
[   17.929703] div-9, mult-900, freq-602, div_round_closest-6, 
candidate_div-6, candidate_freq-100, diff-0
[   17.939353] div-10, mult-1000, freq-602, div_round_closest-6, 
candidate_div-6, candidate_freq-100, diff-0
[   17.949181]
[   17.949181] best_diff-0, ser_clk-402, best_div-4

Please note that we go past cases when we have an divider that can 
exactly divide the frequency(105/1, 204/2, 303/3) and end up with one 
that doesn't.

It seems like we need to run through all dividers to find one that can 
divide output freq from clk_round_rate and among those choose smallest 
delta.

In regular bootup 2nd loop was required only for 51200000. For others 
1843200, 153600, 1st loop worked fine.

Thank you.

>
> Here:
>
> freq: a freq we can definitely make
>
> candidate_div: the best number to divide freq by to get the desired clock.
>
> candidate_freq: the frequency we'll end up if we divide freq by
> candidate_div. We want this to be close to desired_clk.
>
> diff: how far away the candidate_freq is away from what we want.
>
> best_diff: how far away the best candidate was from what we wanted.
>
> ser_clk: What we should pass to clk_set_rate() to get the best candidate.
>
> best_div: What we should use as a divider to get the best candidate.
>
>
> -Doug
Doug Anderson June 7, 2022, 7:25 p.m. UTC | #3
Hi,

On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi
<quic_vnivarth@quicinc.com> wrote:
>
> Hi,
>
> On 6/7/2022 1:29 AM, Doug Anderson wrote:
> > Hi,
> >
> > On Mon, Jun 6, 2022 at 11:19 AM Vijaya Krishna Nivarthi
> > <quic_vnivarth@quicinc.com> wrote:
> >> Hi,
> >>
> >>
> >> On 6/4/2022 12:10 AM, Doug Anderson wrote:
> >>> Hi,
> >>>
> >>> On Fri, Jun 3, 2022 at 10:43 AM Vijaya Krishna Nivarthi
> >>> <quic_vnivarth@quicinc.com> wrote:
> >>>> Ah, or I guess what you're saying is that the table historically
> >>>> contained "rounded" rates but that clk_round_rate() isn't returning
> >>>> nice round rates. OK, but if we truly want to support an inexact
> >>>> match, you'd want to pick the rate that reduces the error, not just
> >>>> pick the first one. In other words, something like this (untested):
> >>>>
> >>>> freq = clk_round_rate(clk, mult);
> >>>> diff = abs(((long)mult - freq) / div);
> >>>> if (diff < best_diff) {
> >>>>      best_diff = diff;
> >>>>      ser_clk = freq;
> >>>>      best_div = div;
> >>>> }
> >>>> I am not sure if its required that freq is a multiple of best_div now
> >>>> that we don't have a multiple of desired_clk anyway.
> >>> How about just this (untested):
> >>>
> >>> freq = clk_round_rate(clk, mult);
> >>> candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
> >>> candidate_freq = freq / candidate_div;
> >>> diff = abs((long)desired_clk - candidate_freq);
> >>> if (diff < best_diff) {
> >>>     best_diff = diff;
> >>>     ser_clk = freq;
> >>>     best_div = candidate_div;
> >>> }
> >> I am afraid this still doesn't guarantee that ser_clk is a multiple of
> >> best_div
> > OK. ...I guess my question would be: does it matter for some reason?
> > "ser_clk" is just a local variable in this function. Who cares if it's
> > not a multiple of best_div? This is why we're keeping track of
> > "best_div" in the first place, so that later in the function instead
> > of:
> >
> > *clk_div = ser_clk / desired_clk;
> > if (!(*clk_div))
> >    *clk_div = 1;
> >
> > You just do:
> >
> > *clk_div = best_div;
>
> My only concern continues to be...
>
> Given ser_clk is the final frequency that this function is going to
> return and best_div is going to be the clk_divider, is it ok if the
> divider cant divide the frequency exactly?
>
> In other words, Can this function output combinations like (402,4)
> (501,5) ?
>
> If ok, then we can go ahead with this patch or even previous perhaps.

I don't see why not. You're basically just getting a resulting clock
that's not an integral "Hz", right?

So if "baud" is 9600 and sampling_rate is 16 then desired_clk is (9600
* 16) = 153600

Let's imagine that we do all the math and we finally decide that our
best bet is with the rate 922000 and a divider of 6. That means that
the actual clock we'll make is 153666.67 when we _wanted_ 153600.
There's no reason it needs to be integral, though, and 153666.67 would
still be better than making 160000.


> >> I tested it with a function simulates clk_round_rate.
> >>
> >> static unsigned long clk_round_rate_test(struct clk *clk, unsigned long
> >> in_freq)
> >> {
> >>       unsigned long root_freq[6] = {105, 204, 303, 402, 501, 602};
> >>       int i;
> >>
> >>       for (i = 0; i < 6; i++) {
> >>           if (root_freq[i] >= in_freq)
> >>               return root_freq[i];
> >>       }
> >>       return root_freq[6];
> >> }
> >>
> >>       {
> >>           unsigned long ser_clk;
> >>           unsigned long desired_clk;
> >>           unsigned long freq;
> >>           int div_round_closest;
> >>           unsigned long div;
> >>           unsigned long mult;
> >>           unsigned long candidate_div, candidate_freq;
> >>
> >>           unsigned long diff, best_diff, best_div;
> >>           unsigned long one;
> >>
> >>           desired_clk = 100;
> >>           one = 1;
> >>           best_diff = ULONG_MAX;
> >>           pr_err("\ndesired_clk-%d\n", desired_clk);
> >>           for (div = 1; div <= 10; div++) {
> >>               mult = div * desired_clk;
> >>
> >>               freq = clk_round_rate_test(clk, mult);
> >>               div_round_closest = DIV_ROUND_CLOSEST(freq, desired_clk);
> >>               candidate_div = max(one, (unsigned long)div_round_closest);
> >>               candidate_freq = freq / candidate_div;
> >>               diff = abs((long)desired_clk - candidate_freq);
> >>               pr_err("div-%d, mult-%d, freq-%d, div_round_closest-%d,
> >> candidate_div-%d, candidate_freq-%d, diff-%d\n",
> >>                   div, mult, freq, div_round_closest, candidate_div,
> >> candidate_freq, diff);
> >>               if (diff < best_diff) {
> >>                   pr_err("This is best so far\n");
> >>                   best_diff = diff;
> >>                   ser_clk = freq;
> >>                   best_div = candidate_div;
> >>               }
> >>           }
> >>           pr_err("\nbest_diff-%d, ser_clk-%d, best_div-%d\n",
> >>               best_diff, ser_clk, best_div);
> >>       }
> >>
> >> And here is the output
> >>
> >> [   17.835167] desired_clk-100
> >> [   17.839567] div-1, mult-100, freq-105, div_round_closest-1,
> >> candidate_div-1, candidate_freq-105, diff-5
> >> [   17.849220] This is best so far
> >> [   17.852458] div-2, mult-200, freq-204, div_round_closest-2,
> >> candidate_div-2, candidate_freq-102, diff-2
> >> [   17.862104] This is best so far
> >> [   17.865345] div-3, mult-300, freq-303, div_round_closest-3,
> >> candidate_div-3, candidate_freq-101, diff-1
> >> [   17.874995] This is best so far
> >> [   17.878237] div-4, mult-400, freq-402, div_round_closest-4,
> >> candidate_div-4, candidate_freq-100, diff-0
> >> [   17.887882] This is best so far
> >> [   17.891118] div-5, mult-500, freq-501, div_round_closest-5,
> >> candidate_div-5, candidate_freq-100, diff-0
> >> [   17.900770] div-6, mult-600, freq-602, div_round_closest-6,
> >> candidate_div-6, candidate_freq-100, diff-0
> >> [   17.910415] div-7, mult-700, freq-602, div_round_closest-6,
> >> candidate_div-6, candidate_freq-100, diff-0
> >> [   17.920057] div-8, mult-800, freq-602, div_round_closest-6,
> >> candidate_div-6, candidate_freq-100, diff-0
> >> [   17.929703] div-9, mult-900, freq-602, div_round_closest-6,
> >> candidate_div-6, candidate_freq-100, diff-0
> >> [   17.939353] div-10, mult-1000, freq-602, div_round_closest-6,
> >> candidate_div-6, candidate_freq-100, diff-0
> >> [   17.949181]
> >> [   17.949181] best_diff-0, ser_clk-402, best_div-4
> > That doesn't look like a terrible result. I guess nominally 602 is a
> > better approximation, but if we're accepting that we're not going to
> > have an exact rate anyway then maybe being off by that tiny amount
> > doesn't matter and we'd do better with the slow clock (maybe saves
> > power?)
> Actually power saving was the anticipation behind returning first
> frequency in original patch, when we cant find exact frequency.

Right, except that if you just pick the first clock you find it would
be _wildly_ off. I guess if you really want to do this the right way,
you need to set a maximum tolerance and pick the first rate you find
that meets that tolerance. Random web search for "uart baud rate
tolerance" makes me believe that +/- 5% deviation is OK, but to be
safe you probably want something lower. Maybe 2%? So if the desired
clock is within 2% of a clock you can make, can you just pick that
one?


> >> Please note that we go past cases when we have an divider that can
> >> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one
> >> that doesn't.
> > Ah, good point. Luckily that's a 1-line fix, right?
>
> Apologies, I could not figure out how.

Ah, sorry. Not quite 1 line, but this (untested)


freq = clk_round_rate(clk, mult);

if (freq % desired_clk == 0) {
 ser_clk = freq;
 best_div = freq / desired_clk;
 break;
}

candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
candidate_freq = freq / candidate_div;
diff = abs((long)desired_clk - candidate_freq);
if (diff < best_diff) {
  best_diff = diff;
  ser_clk = freq;
  best_div = candidate_div;
}
Vijaya Krishna Nivarthi June 8, 2022, 6:33 p.m. UTC | #4
Hi,


On 6/8/2022 12:55 AM, Doug Anderson wrote:
> Hi,
>
> On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi
> <quic_vnivarth@quicinc.com> wrote:
>> Hi,
>>
>> On 6/7/2022 1:29 AM, Doug Anderson wrote:
>>
>> My only concern continues to be...
>>
>> Given ser_clk is the final frequency that this function is going to
>> return and best_div is going to be the clk_divider, is it ok if the
>> divider cant divide the frequency exactly?
>>
>> In other words, Can this function output combinations like (402,4)
>> (501,5) ?
>>
>> If ok, then we can go ahead with this patch or even previous perhaps.
> I don't see why not. You're basically just getting a resulting clock
> that's not an integral "Hz", right?
>
> So if "baud" is 9600 and sampling_rate is 16 then desired_clk is (9600
> * 16) = 153600
>
> Let's imagine that we do all the math and we finally decide that our
> best bet is with the rate 922000 and a divider of 6. That means that
> the actual clock we'll make is 153666.67 when we _wanted_ 153600.
> There's no reason it needs to be integral, though, and 153666.67 would
> still be better than making 160000.
>
Thank you for clarification.
>>> power?)
>> Actually power saving was the anticipation behind returning first
>> frequency in original patch, when we cant find exact frequency.
> Right, except that if you just pick the first clock you find it would
> be _wildly_ off. I guess if you really want to do this the right way,
> you need to set a maximum tolerance and pick the first rate you find
> that meets that tolerance. Random web search for "uart baud rate
> tolerance" makes me believe that +/- 5% deviation is OK, but to be
> safe you probably want something lower. Maybe 2%? So if the desired
> clock is within 2% of a clock you can make, can you just pick that
> one?
Ok, 2% seems good.
>
>>>> Please note that we go past cases when we have an divider that can
>>>> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one
>>>> that doesn't.
>>> Ah, good point. Luckily that's a 1-line fix, right?
>> Apologies, I could not figure out how.
> Ah, sorry. Not quite 1 line, but this (untested)
>
>
> freq = clk_round_rate(clk, mult);
>
> if (freq % desired_clk == 0) {
>   ser_clk = freq;
>   best_div = freq / desired_clk;
>   break;
> }
>
> candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
> candidate_freq = freq / candidate_div;
> diff = abs((long)desired_clk - candidate_freq);
> if (diff < best_diff) {
>    best_diff = diff;
>    ser_clk = freq;
>    best_div = candidate_div;
> }

But then once again, we would likely need 2 loops because while we are 
ok with giving up on search for best_div on finding something within 2% 
tolerance, we may not want to give up on exact match (freq % desired_clk 
== 0 )

So how about something like this with 2 loops (more optimised than 
previous version with 2 loops)? (untested)


     maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
     prev = 0;

     /* run through quicker loop anticipating to find an exact match */
     for (div = 1; div <= maxdiv; div++) {
         mult = (unsigned long long)div * desired_clk;
         if (mult > ULONG_MAX)
             break;

         freq = clk_round_rate(clk, max((unsigned long)mult, prev+1));
         if (!(freq % desired_clk)) {
             *clk_div = freq / desired_clk;
             return freq;
         }

         if (prev && prev == freq)
             break;

         prev = freq;
     }

     pr_warn("Can't find exact match frequency and divider\n");

     freq = 0;
     best_diff = ULONG_MAX;
     prev_candidate_div = -1;
     while (true) {
         prev = freq;
         freq = clk_round_rate(clk, freq+1);

         if (freq == prev)
             break; /* end of table */

         candidate_div = DIV_ROUND_CLOSEST(freq, desired_clk);
         /*
          * Since the frequencies are increasing, previous is better
          * if we have same divider, proceed to next in table
          */
         if (prev_candidate_div == candidate_div)
             continue;
         prev_candidate_div = candidate_div;

         if (candidate_div)
             candidate_freq = freq / candidate_div;
         else
             candidate_freq = freq;

         diff = abs((long)desired_clk - candidate_freq);
         if (diff < best_diff) {
             best_diff = diff;
             ser_clk = freq;
             *clk_div = candidate_div;
             if (diff * 50 < ser_clk) {
                 two_percent_tolerance = true;
                 break;
             }
         }
     }

     if (!two_percent_tolerance) {
         pr_warn("Can't find frequency within 2 percent tolerance\n");
     }

     return ser_clk;
}

Thank you.
Doug Anderson June 8, 2022, 10:37 p.m. UTC | #5
Hi,

On Wed, Jun 8, 2022 at 11:34 AM Vijaya Krishna Nivarthi
<quic_vnivarth@quicinc.com> wrote:
>
> Hi,
>
>
> On 6/8/2022 12:55 AM, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi
> > <quic_vnivarth@quicinc.com> wrote:
> >> Hi,
> >>
> >> On 6/7/2022 1:29 AM, Doug Anderson wrote:
> >>
> >> My only concern continues to be...
> >>
> >> Given ser_clk is the final frequency that this function is going to
> >> return and best_div is going to be the clk_divider, is it ok if the
> >> divider cant divide the frequency exactly?
> >>
> >> In other words, Can this function output combinations like (402,4)
> >> (501,5) ?
> >>
> >> If ok, then we can go ahead with this patch or even previous perhaps.
> > I don't see why not. You're basically just getting a resulting clock
> > that's not an integral "Hz", right?
> >
> > So if "baud" is 9600 and sampling_rate is 16 then desired_clk is (9600
> > * 16) = 153600
> >
> > Let's imagine that we do all the math and we finally decide that our
> > best bet is with the rate 922000 and a divider of 6. That means that
> > the actual clock we'll make is 153666.67 when we _wanted_ 153600.
> > There's no reason it needs to be integral, though, and 153666.67 would
> > still be better than making 160000.
> >
> Thank you for clarification.
> >>> power?)
> >> Actually power saving was the anticipation behind returning first
> >> frequency in original patch, when we cant find exact frequency.
> > Right, except that if you just pick the first clock you find it would
> > be _wildly_ off. I guess if you really want to do this the right way,
> > you need to set a maximum tolerance and pick the first rate you find
> > that meets that tolerance. Random web search for "uart baud rate
> > tolerance" makes me believe that +/- 5% deviation is OK, but to be
> > safe you probably want something lower. Maybe 2%? So if the desired
> > clock is within 2% of a clock you can make, can you just pick that
> > one?
> Ok, 2% seems good.
> >
> >>>> Please note that we go past cases when we have an divider that can
> >>>> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one
> >>>> that doesn't.
> >>> Ah, good point. Luckily that's a 1-line fix, right?
> >> Apologies, I could not figure out how.
> > Ah, sorry. Not quite 1 line, but this (untested)
> >
> >
> > freq = clk_round_rate(clk, mult);
> >
> > if (freq % desired_clk == 0) {
> >   ser_clk = freq;
> >   best_div = freq / desired_clk;
> >   break;
> > }
> >
> > candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
> > candidate_freq = freq / candidate_div;
> > diff = abs((long)desired_clk - candidate_freq);
> > if (diff < best_diff) {
> >    best_diff = diff;
> >    ser_clk = freq;
> >    best_div = candidate_div;
> > }
>
> But then once again, we would likely need 2 loops because while we are
> ok with giving up on search for best_div on finding something within 2%
> tolerance, we may not want to give up on exact match (freq % desired_clk
> == 0 )

Ah, it took me a while to understand why two loops. It's because in
one case you're trying multiplies and in the other you're bumping up
to the next closest clock rate. I don't think you really need to do
that. Just test the (rate - 2%) and the rate. How about this (only
lightly tested):

    ser_clk = 0;
    maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
    div = 1;
    while (div < maxdiv) {
        mult = (unsigned long long)div * desired_clk;
        if (mult != (unsigned long)mult)
            break;

        two_percent = mult / 50;

        /*
         * Loop requesting (freq - 2%) and possibly (freq).
         *
         * We'll keep track of the lowest freq inexact match we found
         * but always try to find a perfect match. NOTE: this algorithm
         * could miss a slightly better freq if there's more than one
         * freq between (freq - 2%) and (freq) but (freq) can't be made
         * exactly, but that's OK.
         *
         * This absolutely relies on the fact that the Qualcomm clock
         * driver always rounds up.
         */
        test_freq = mult - two_percent;
        while (test_freq <= mult) {
            freq = clk_round_rate(clk, test_freq);

            /*
             * A dead-on freq is an insta-win. This implicitly
             * handles when "freq == mult"
             */
            if (!(freq % desired_clk)) {
                *clk_div = freq / desired_clk;
                return freq;
            }

            /*
             * Only time clock framework doesn't round up is if
             * we're past the max clock rate. We're done searching
             * if that's the case.
             */
            if (freq < test_freq)
                return ser_clk;

            /* Save the first (lowest freq) within 2% */
            if (!ser_clk && freq <= mult + two_percent) {
                ser_clk = freq;
                *clk_div = div;
            }

            /*
             * If we already rounded up past mult then this will
             * cause the loop to exit. If not then this will run
             * the loop a second time with exactly mult.
             */
            test_freq = max(freq + 1, mult);
        }

        /*
         * test_freq will always be bigger than mult by at least 1.
         * That means we can get the next divider with a DIV_ROUND_UP.
         * This has the advantage of skipping by a whole bunch of divs
         * If the clock framework already bypassed them.
         */
        div = DIV_ROUND_UP(test_freq, desired_clk);
        }

    return ser_clk;
Vijaya Krishna Nivarthi June 9, 2022, 5:48 p.m. UTC | #6
Hi,


On 6/9/2022 4:07 AM, Doug Anderson wrote:
> Hi,
>
> On Wed, Jun 8, 2022 at 11:34 AM Vijaya Krishna Nivarthi
> <quic_vnivarth@quicinc.com> wrote:
>> Hi,
>>
>>
>> On 6/8/2022 12:55 AM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi
>>> <quic_vnivarth@quicinc.com> wrote:
>>> Ah, sorry. Not quite 1 line, but this (untested) 
>>>
>>> freq = clk_round_rate(clk, mult);
>>>
>>> if (freq % desired_clk == 0) {
>>>    ser_clk = freq;
>>>    best_div = freq / desired_clk;
>>>    break;
>>> }
>>>
>>> candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
>>> candidate_freq = freq / candidate_div;
>>> diff = abs((long)desired_clk - candidate_freq);
>>> if (diff < best_diff) {
>>>     best_diff = diff;
>>>     ser_clk = freq;
>>>     best_div = candidate_div;
>>> }
>> But then once again, we would likely need 2 loops because while we are
>> ok with giving up on search for best_div on finding something within 2%
>> tolerance, we may not want to give up on exact match (freq % desired_clk
>> == 0 )
> Ah, it took me a while to understand why two loops. It's because in
> one case you're trying multiplies and in the other you're bumping up
> to the next closest clock rate. I don't think you really need to do
> that. Just test the (rate - 2%) and the rate. How about this (only
> lightly tested):
>
>      ser_clk = 0;
>      maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
>      div = 1;
>      while (div < maxdiv) {

div <= maxdiv ?

>          mult = (unsigned long long)div * desired_clk;
>          if (mult != (unsigned long)mult)
>              break;
>
>          two_percent = mult / 50;
>
>          /*
>           * Loop requesting (freq - 2%) and possibly (freq).
>           *
>           * We'll keep track of the lowest freq inexact match we found
>           * but always try to find a perfect match. NOTE: this algorithm
>           * could miss a slightly better freq if there's more than one
>           * freq between (freq - 2%) and (freq) but (freq) can't be made
>           * exactly, but that's OK.
>           *
>           * This absolutely relies on the fact that the Qualcomm clock
>           * driver always rounds up.
>           */
>          test_freq = mult - two_percent;
>          while (test_freq <= mult) {
>              freq = clk_round_rate(clk, test_freq);
>
>              /*
>               * A dead-on freq is an insta-win. This implicitly
>               * handles when "freq == mult"
>               */
>              if (!(freq % desired_clk)) {
>                  *clk_div = freq / desired_clk;
>                  return freq;
>              }
>
>              /*
>               * Only time clock framework doesn't round up is if
>               * we're past the max clock rate. We're done searching
>               * if that's the case.
>               */
>              if (freq < test_freq)
>                  return ser_clk;
>
>              /* Save the first (lowest freq) within 2% */
>              if (!ser_clk && freq <= mult + two_percent) {
>                  ser_clk = freq;
>                  *clk_div = div;
>              }

My last concern is with search happening only within 2% tolerance.

Do we fail otherwise?

This real case has best tolerance of 1.9%.

[   17.963672] 20220530 desired_clk-51200000
[   21.193550] 20220530 returning ser_clk-52174000, div-1, diff-974000

Seems close.

Thank you.

>
>              /*
>               * If we already rounded up past mult then this will
>               * cause the loop to exit. If not then this will run
>               * the loop a second time with exactly mult.
>               */
>              test_freq = max(freq + 1, mult);
>          }
>
>          /*
>           * test_freq will always be bigger than mult by at least 1.
>           * That means we can get the next divider with a DIV_ROUND_UP.
>           * This has the advantage of skipping by a whole bunch of divs
>           * If the clock framework already bypassed them.
>           */
>          div = DIV_ROUND_UP(test_freq, desired_clk);
>          }
>
>      return ser_clk;
Vijaya Krishna Nivarthi June 9, 2022, 6:08 p.m. UTC | #7
Hi,

Re-sending as my earlier message bounced.


On 6/9/2022 4:07 AM, Doug Anderson wrote:
> Hi,
>
> On Wed, Jun 8, 2022 at 11:34 AM Vijaya Krishna Nivarthi
> <quic_vnivarth@quicinc.com> wrote:
>> Hi,
>>
>>
>> On 6/8/2022 12:55 AM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi
>>> <quic_vnivarth@quicinc.com> wrote:
>>>> Hi,
>>>>
>>>> On 6/7/2022 1:29 AM, Doug Anderson wrote:
>>>>
>>>> My only concern continues to be...
>>>>
>>>> Given ser_clk is the final frequency that this function is going to
>>>> return and best_div is going to be the clk_divider, is it ok if the
>>>> divider cant divide the frequency exactly?
>>>>
>>>> In other words, Can this function output combinations like (402,4)
>>>> (501,5) ?
>>>>
>>>> If ok, then we can go ahead with this patch or even previous perhaps.
>>> I don't see why not. You're basically just getting a resulting clock
>>> that's not an integral "Hz", right?
>>>
>>> So if "baud" is 9600 and sampling_rate is 16 then desired_clk is (9600
>>> * 16) = 153600
>>>
>>> Let's imagine that we do all the math and we finally decide that our
>>> best bet is with the rate 922000 and a divider of 6. That means that
>>> the actual clock we'll make is 153666.67 when we _wanted_ 153600.
>>> There's no reason it needs to be integral, though, and 153666.67 would
>>> still be better than making 160000.
>>>
>> Thank you for clarification.
>>>>> power?)
>>>> Actually power saving was the anticipation behind returning first
>>>> frequency in original patch, when we cant find exact frequency.
>>> Right, except that if you just pick the first clock you find it would
>>> be _wildly_ off. I guess if you really want to do this the right way,
>>> you need to set a maximum tolerance and pick the first rate you find
>>> that meets that tolerance. Random web search for "uart baud rate
>>> tolerance" makes me believe that +/- 5% deviation is OK, but to be
>>> safe you probably want something lower. Maybe 2%? So if the desired
>>> clock is within 2% of a clock you can make, can you just pick that
>>> one?
>> Ok, 2% seems good.
>>>>>> Please note that we go past cases when we have an divider that can
>>>>>> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one
>>>>>> that doesn't.
>>>>> Ah, good point. Luckily that's a 1-line fix, right?
>>>> Apologies, I could not figure out how.
>>> Ah, sorry. Not quite 1 line, but this (untested)
>>>
>>>
>>> freq = clk_round_rate(clk, mult);
>>>
>>> if (freq % desired_clk == 0) {
>>>    ser_clk = freq;
>>>    best_div = freq / desired_clk;
>>>    break;
>>> }
>>>
>>> candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk));
>>> candidate_freq = freq / candidate_div;
>>> diff = abs((long)desired_clk - candidate_freq);
>>> if (diff < best_diff) {
>>>     best_diff = diff;
>>>     ser_clk = freq;
>>>     best_div = candidate_div;
>>> }
>> But then once again, we would likely need 2 loops because while we are
>> ok with giving up on search for best_div on finding something within 2%
>> tolerance, we may not want to give up on exact match (freq % desired_clk
>> == 0 )
> Ah, it took me a while to understand why two loops. It's because in
> one case you're trying multiplies and in the other you're bumping up
> to the next closest clock rate. I don't think you really need to do
> that. Just test the (rate - 2%) and the rate. How about this (only
> lightly tested):
>
>      ser_clk = 0;
>      maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
>      div = 1;
>      while (div < maxdiv) {
div <= maxdiv ?
>          mult = (unsigned long long)div * desired_clk;
>          if (mult != (unsigned long)mult)
>              break;
>
>          two_percent = mult / 50;
>
>          /*
>           * Loop requesting (freq - 2%) and possibly (freq).
>           *
>           * We'll keep track of the lowest freq inexact match we found
>           * but always try to find a perfect match. NOTE: this algorithm
>           * could miss a slightly better freq if there's more than one
>           * freq between (freq - 2%) and (freq) but (freq) can't be made
>           * exactly, but that's OK.
>           *
>           * This absolutely relies on the fact that the Qualcomm clock
>           * driver always rounds up.
>           */
>          test_freq = mult - two_percent;
>          while (test_freq <= mult) {
>              freq = clk_round_rate(clk, test_freq);
>
>              /*
>               * A dead-on freq is an insta-win. This implicitly
>               * handles when "freq == mult"
>               */
>              if (!(freq % desired_clk)) {
>                  *clk_div = freq / desired_clk;
>                  return freq;
>              }
>
>              /*
>               * Only time clock framework doesn't round up is if
>               * we're past the max clock rate. We're done searching
>               * if that's the case.
>               */
>              if (freq < test_freq)
>                  return ser_clk;
>
>              /* Save the first (lowest freq) within 2% */
>              if (!ser_clk && freq <= mult + two_percent) {
>                  ser_clk = freq;
>                  *clk_div = div;
>              }
My last concern is with search happening only within 2% tolerance.

Do we fail otherwise?

This real case has best tolerance of 1.9%.

[   17.963672] 20220530 desired_clk-51200000
[   21.193550] 20220530 returning ser_clk-52174000, div-1, diff-974000

Seems close.

Thank you.
>
>              /*
>               * If we already rounded up past mult then this will
>               * cause the loop to exit. If not then this will run
>               * the loop a second time with exactly mult.
>               */
>              test_freq = max(freq + 1, mult);
>          }
>
>          /*
>           * test_freq will always be bigger than mult by at least 1.
>           * That means we can get the next divider with a DIV_ROUND_UP.
>           * This has the advantage of skipping by a whole bunch of divs
>           * If the clock framework already bypassed them.
>           */
>          div = DIV_ROUND_UP(test_freq, desired_clk);
>          }
>
>      return ser_clk;
Vijaya Krishna Nivarthi June 10, 2022, 9:33 a.m. UTC | #8
Hi,

Re-sending (2nd attempt) as emails are bouncing...


> >
> > But then once again, we would likely need 2 loops because while we are
> > ok with giving up on search for best_div on finding something within
> > 2% tolerance, we may not want to give up on exact match (freq %
> > desired_clk == 0 )
> 
> Ah, it took me a while to understand why two loops. It's because in one case
> you're trying multiplies and in the other you're bumping up to the next
> closest clock rate. I don't think you really need to do that. Just test the (rate -
> 2%) and the rate. How about this (only lightly tested):
> 
>     ser_clk = 0;
>     maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
>     div = 1;
>     while (div < maxdiv) {


div <= maxdiv ?


>         mult = (unsigned long long)div * desired_clk;
>         if (mult != (unsigned long)mult)
>             break;
> 
>         two_percent = mult / 50;
> 
>         /*
>          * Loop requesting (freq - 2%) and possibly (freq).
>          *
>          * We'll keep track of the lowest freq inexact match we found
>          * but always try to find a perfect match. NOTE: this algorithm
>          * could miss a slightly better freq if there's more than one
>          * freq between (freq - 2%) and (freq) but (freq) can't be made
>          * exactly, but that's OK.
>          *
>          * This absolutely relies on the fact that the Qualcomm clock
>          * driver always rounds up.
>          */
>         test_freq = mult - two_percent;
>         while (test_freq <= mult) {
>             freq = clk_round_rate(clk, test_freq);
> 
>             /*
>              * A dead-on freq is an insta-win. This implicitly
>              * handles when "freq == mult"
>              */
>             if (!(freq % desired_clk)) {
>                 *clk_div = freq / desired_clk;
>                 return freq;
>             }
> 
>             /*
>              * Only time clock framework doesn't round up is if
>              * we're past the max clock rate. We're done searching
>              * if that's the case.
>              */
>             if (freq < test_freq)
>                 return ser_clk;
> 
>             /* Save the first (lowest freq) within 2% */
>             if (!ser_clk && freq <= mult + two_percent) {
>                 ser_clk = freq;
>                 *clk_div = div;
>             }

My last concern is with search happening only within 2% tolerance.
Do we fail otherwise?

This real case has best tolerance of 1.9% and seems close.

[   17.963672] 20220530 desired_clk-51200000
[   21.193550] 20220530 returning ser_clk-52174000, div-1, diff-974000

Perhaps we can fallback on 1st clock rate?

Thank you.

> 
>             /*
>              * If we already rounded up past mult then this will
>              * cause the loop to exit. If not then this will run
>              * the loop a second time with exactly mult.
>              */
>             test_freq = max(freq + 1, mult);
>         }
> 
>         /*
>          * test_freq will always be bigger than mult by at least 1.
>          * That means we can get the next divider with a DIV_ROUND_UP.
>          * This has the advantage of skipping by a whole bunch of divs
>          * If the clock framework already bypassed them.
>          */
>         div = DIV_ROUND_UP(test_freq, desired_clk);
>         }
> 
>     return ser_clk;
Doug Anderson June 10, 2022, 5:20 p.m. UTC | #9
Hi,

On Fri, Jun 10, 2022 at 2:33 AM Vijaya Krishna Nivarthi (Temp) (QUIC)
<quic_vnivarth@quicinc.com> wrote:
>
> Hi,
>
> Re-sending (2nd attempt) as emails are bouncing...
>
>
> > >
> > > But then once again, we would likely need 2 loops because while we are
> > > ok with giving up on search for best_div on finding something within
> > > 2% tolerance, we may not want to give up on exact match (freq %
> > > desired_clk == 0 )
> >
> > Ah, it took me a while to understand why two loops. It's because in one case
> > you're trying multiplies and in the other you're bumping up to the next
> > closest clock rate. I don't think you really need to do that. Just test the (rate -
> > 2%) and the rate. How about this (only lightly tested):
> >
> >     ser_clk = 0;
> >     maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
> >     div = 1;
> >     while (div < maxdiv) {
>
>
> div <= maxdiv ?

Ah, sure.


> >         mult = (unsigned long long)div * desired_clk;
> >         if (mult != (unsigned long)mult)
> >             break;
> >
> >         two_percent = mult / 50;
> >
> >         /*
> >          * Loop requesting (freq - 2%) and possibly (freq).
> >          *
> >          * We'll keep track of the lowest freq inexact match we found
> >          * but always try to find a perfect match. NOTE: this algorithm
> >          * could miss a slightly better freq if there's more than one
> >          * freq between (freq - 2%) and (freq) but (freq) can't be made
> >          * exactly, but that's OK.
> >          *
> >          * This absolutely relies on the fact that the Qualcomm clock
> >          * driver always rounds up.
> >          */
> >         test_freq = mult - two_percent;
> >         while (test_freq <= mult) {
> >             freq = clk_round_rate(clk, test_freq);
> >
> >             /*
> >              * A dead-on freq is an insta-win. This implicitly
> >              * handles when "freq == mult"
> >              */
> >             if (!(freq % desired_clk)) {
> >                 *clk_div = freq / desired_clk;
> >                 return freq;
> >             }
> >
> >             /*
> >              * Only time clock framework doesn't round up is if
> >              * we're past the max clock rate. We're done searching
> >              * if that's the case.
> >              */
> >             if (freq < test_freq)
> >                 return ser_clk;
> >
> >             /* Save the first (lowest freq) within 2% */
> >             if (!ser_clk && freq <= mult + two_percent) {
> >                 ser_clk = freq;
> >                 *clk_div = div;
> >             }
>
> My last concern is with search happening only within 2% tolerance.
> Do we fail otherwise?
>
> This real case has best tolerance of 1.9% and seems close.
>
> [   17.963672] 20220530 desired_clk-51200000
> [   21.193550] 20220530 returning ser_clk-52174000, div-1, diff-974000
>
> Perhaps we can fallback on 1st clock rate?

I don't feel super comfortable just blindly falling back on the 1st
clock rate. It could be wildly (more than 5%) wrong, can't it?

IMO:
* If you're not comfortable with 2%, you could always pick 3% or 4%.
As I said, my random web search seemed to indicate that up to 5% was
perhaps OK.
* It's probably overkill, but you could abstract the whole search out
and try searching once for 2% and then try 4%?


-Doug
Vijaya Krishna Nivarthi (Temp) June 21, 2022, 5:58 p.m. UTC | #10
Hi,

For desired_clk = 100 and clock rates like 1st from below, DIV_ROUND_UP seems to cause missing candidate solutions.

static unsigned long clk_round_rate_test(struct clk *clk, unsigned long in_freq)
{
	//unsigned long root_freq[] = {301, 702, 1004};
	//unsigned long root_freq[] = {301, 702, 1004, 2000, 3000};
	//unsigned long root_freq[] = {50, 97, 99};
	//unsigned long root_freq[] = {50, 97, 99, 200};
	//unsigned long root_freq[] = {92, 110, 193, 230};
	//unsigned long root_freq[] = {92, 110, 193, 230, 300, 401};
	//unsigned long root_freq[] = {92, 110, 193, 230, 295, 296, 297, 401};
	//unsigned long root_freq[] = {92, 110, 193, 230, 295, 296, 297, 300, 401};
	//unsigned long root_freq[] = {197, 198, 199};
	unsigned long root_freq[] = {197, 198, 199, 200};
	int i;
	size_t n = sizeof root_freq / sizeof *root_freq;

	for (i = 0; i < n; i++) {
		if (root_freq[i] >= in_freq)
			return root_freq[i];
	}
	return root_freq[n-1];
}

I modified to handle such cases, optimised little and uploaded a patch.
It seems to work for all the cases like above.

Thank you,
Vijay/



> 
> Ah, it took me a while to understand why two loops. It's because in one case
> you're trying multiplies and in the other you're bumping up to the next
> closest clock rate. I don't think you really need to do that. Just test the (rate -
> 2%) and the rate. How about this (only lightly tested):
> 
>     ser_clk = 0;
>     maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT;
>     div = 1;
>     while (div < maxdiv) {
>         mult = (unsigned long long)div * desired_clk;
>         if (mult != (unsigned long)mult)
>             break;
> 
>         two_percent = mult / 50;
> 
>         /*
>          * Loop requesting (freq - 2%) and possibly (freq).
>          *
>          * We'll keep track of the lowest freq inexact match we found
>          * but always try to find a perfect match. NOTE: this algorithm
>          * could miss a slightly better freq if there's more than one
>          * freq between (freq - 2%) and (freq) but (freq) can't be made
>          * exactly, but that's OK.
>          *
>          * This absolutely relies on the fact that the Qualcomm clock
>          * driver always rounds up.
>          */
>         test_freq = mult - two_percent;
>         while (test_freq <= mult) {
>             freq = clk_round_rate(clk, test_freq);
> 
>             /*
>              * A dead-on freq is an insta-win. This implicitly
>              * handles when "freq == mult"
>              */
>             if (!(freq % desired_clk)) {
>                 *clk_div = freq / desired_clk;
>                 return freq;
>             }
> 
>             /*
>              * Only time clock framework doesn't round up is if
>              * we're past the max clock rate. We're done searching
>              * if that's the case.
>              */
>             if (freq < test_freq)
>                 return ser_clk;
> 
>             /* Save the first (lowest freq) within 2% */
>             if (!ser_clk && freq <= mult + two_percent) {
>                 ser_clk = freq;
>                 *clk_div = div;
>             }
> 
>             /*
>              * If we already rounded up past mult then this will
>              * cause the loop to exit. If not then this will run
>              * the loop a second time with exactly mult.
>              */
>             test_freq = max(freq + 1, mult);
>         }
> 
>         /*
>          * test_freq will always be bigger than mult by at least 1.
>          * That means we can get the next divider with a DIV_ROUND_UP.
>          * This has the advantage of skipping by a whole bunch of divs
>          * If the clock framework already bypassed them.
>          */
>         div = DIV_ROUND_UP(test_freq, desired_clk);
>         }
> 
>     return ser_clk;
diff mbox series

Patch

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 4733a23..08f3ad4 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -943,11 +943,11 @@  static int qcom_geni_serial_startup(struct uart_port *uport)
 static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
 			unsigned int sampling_rate, unsigned int *clk_div)
 {
-	unsigned long ser_clk;
+	unsigned long ser_clk = 0;
 	unsigned long desired_clk;
 	unsigned long freq, prev;
 	unsigned long div, maxdiv;
-	int64_t mult;
+	unsigned long long mult;
 
 	desired_clk = baud * sampling_rate;
 	if (!desired_clk) {
@@ -959,8 +959,8 @@  static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud,
 	prev = 0;
 
 	for (div = 1; div <= maxdiv; div++) {
-		mult = div * desired_clk;
-		if (mult > ULONG_MAX)
+		mult = (unsigned long long)div * (unsigned long long)desired_clk;
+		if (mult > (unsigned long long)ULONG_MAX)
 			break;
 
 		freq = clk_round_rate(clk, (unsigned long)mult);