Message ID | 1654021066-13341-1-git-send-email-quic_vnivarth@quicinc.com |
---|---|

State | New |

Headers | show |

Series | tty: serial: qcom-geni-serial: minor fixes to get_clk_div_rate() | expand |

Hi, On 6/1/2022 9:03 PM, Doug Anderson wrote: > Hi, > > On Wed, Jun 1, 2022 at 3:46 AM Vijaya Krishna Nivarthi > <quic_vnivarth@quicinc.com> wrote: >> Hi, >> >> On 6/1/2022 12:58 AM, Doug Anderson wrote: >>> Hi, >>> >>> On Tue, May 31, 2022 at 11:18 AM Vijaya Krishna Nivarthi >>> <quic_vnivarth@quicinc.com> wrote: >>>> Add missing initialisation and correct type casting >>>> >>>> Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com> >>>> --- >>>> drivers/tty/serial/qcom_geni_serial.c | 8 ++++---- >>>> 1 file changed, 4 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c >>>> index 4733a23..08f3ad4 100644 >>>> --- a/drivers/tty/serial/qcom_geni_serial.c >>>> +++ b/drivers/tty/serial/qcom_geni_serial.c >>>> @@ -943,11 +943,11 @@ static int qcom_geni_serial_startup(struct uart_port *uport) >>>> static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud, >>>> unsigned int sampling_rate, unsigned int *clk_div) >>>> { >>>> - unsigned long ser_clk; >>>> + unsigned long ser_clk = 0; >>> In this patch it's not at all obvious why you'd need to init to 0. I >>> think the "for loop" is guaranteed to run at least once because >>> "max_div" is known at compile time. ...and currently each time through >>> the "for" loop you'll always set "ser_clk". >> Ok, I realised we will never break out of for loop exceeding ULONG_MAX >> in 1st pass, so yes ser_clk will always be set. >> >>> I think in a future patch you'll want to _remove_ this from the for loop: >>> >>> if (!prev) >>> ser_clk = freq; >> Intent is to save (and use) 1st freq if we cannot find an exact divider. >> >> Isn't it ok? >> >> For example please find debug output for a required frequency of 51.2MHz. >> >> We try dividers 1, 2, 3 and end up with 52.1MHz the first result. >> >> [ 18.815432] 20220509 get_clk_div_rate desired_clk:51200000 >> [ 18.821081] 20220509 get_clk_div_rate maxdiv:4095 >> [ 18.825924] 20220509 get_clk_div_rate div:1 >> [ 18.830239] 20220509 get_clk_div_rate freq:52174000 >> [ 18.835288] 20220509 get_clk_div_rate div:2 >> [ 18.839628] 20220509 get_clk_div_rate freq:100000000 >> [ 18.844794] 20220509 get_clk_div_rate div:3 >> [ 18.849119] 20220509 get_clk_div_rate freq:100000000 >> [ 18.854254] 20220509 get_clk_div_rate reached max frequency breaking... >> [ 18.861072] 20220509 get_clk_div_rate clk_div=1, ser_clk=52174000 >> >> The behaviour was same earlier too when root_freq table was present. > Are you certain about the behavior being the same earlier? Before > commit c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart > frequency table..."), the behavior was that get_clk_cfg() would return > 0 if there was no exact match. Then get_clk_div_rate() would see this > 0 and print an error and return. Then the rest of > qcom_geni_serial_set_termios() would do nothing at all. > > Ah, or I guess what you're saying is that the table historically > contained "rounded" rates but that clk_round_rate() isn't returning > nice round rates. OK, but if we truly want to support an inexact > match, you'd want to pick the rate that reduces the error, not just > pick the first one. In other words, something like this (untested): > > freq = clk_round_rate(clk, mult); > diff = abs(((long)mult - freq) / div); > if (diff < best_diff) { > best_diff = diff; > ser_clk = freq; > best_div = div; > } I am not sure if its required that freq is a multiple of best_div now that we don't have a multiple of desired_clk anyway. If it is indeed required, with above patch its not guaranteed and finding best_div gets little more complicated? We may have to loop through all available frequencies and dividers? PFB, a proposed implementation with a 2nd loop. Its tested but I haven't been able to optimise it further because it misses corner theoretical cases when I try maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT; prev = 0; /* run through quicker loop anticipating to find an exact match */ for (div = 1; div <= maxdiv; div++) { mult = (unsigned long long)div * desired_clk; if (mult > ULONG_MAX) break; freq = clk_round_rate(clk, max((unsigned long)mult, prev+1)); if (!(freq % desired_clk)) { *clk_div = freq / desired_clk; return freq; } if (prev && prev == freq) break; prev = freq; } pr_warn("Can't find exact match frequency and divider\n"); /* * this scenario ideally should be a rare occurrence * run through all frequencies and find closest match * note that it cannot get better than a difference of 1 */ freq = 0; best_diff = ULONG_MAX; while (true) { prev = freq; freq = clk_round_rate(clk, freq+1); if (freq == prev) break; for (div = 1; div <= maxdiv; div++) { if (!(freq % div)) { diff = abs((long)(freq/div) - desired_clk); if (diff < best_diff) { best_diff = diff; ser_clk = freq; *clk_div = div; if (diff == 1) break; } } } } return ser_clk; } > > Why do you need this? Imagine that the desired rate was 50000001 or > 49999999. The closest match would be to use the rate 100000000 and > divide it by 2. ...but your existing algorithm would just arbitrarily > pick the first rate returned. > > NOTE also that you could end up with a slightly higher or slightly > lower clock than requested, right? So it's important to: > * Do signed math when comparing. > * Save the "div" instead of trying to recompute it at the end. > > >> The table did contain 51.2MHz and we would exit with same but on call to >> clk_set_rate(51.2MHz) we were ending up with 52.1MHz >> >>> ...and _that's_ when you should init "ser_clk" to 0. Until then I'd >>> leave it as uninitialized... >>> >>> Honestly, I'd throw all the fixes into one series, too. >> My concern was if there would be a requirement to split the changes. >> >> Will put in all in 1 series with Fixes tag. >> >>> >>>> unsigned long desired_clk; >>>> unsigned long freq, prev; >>>> unsigned long div, maxdiv; >>>> - int64_t mult; >>>> + unsigned long long mult; >>>> >>>> desired_clk = baud * sampling_rate; >>>> if (!desired_clk) { >>>> @@ -959,8 +959,8 @@ static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud, >>>> prev = 0; >>>> >>>> for (div = 1; div <= maxdiv; div++) { >>>> - mult = div * desired_clk; >>>> - if (mult > ULONG_MAX) >>>> + mult = (unsigned long long)div * (unsigned long long)desired_clk; >>> I think you only need to cast one of the two. The other will be >>> up-cast automatically. >> Will change. >>> >>>> + if (mult > (unsigned long long)ULONG_MAX) >>> I don't think you need this cast. As far as I know the C language will >>> "upcast" to the larger of the two types. >> Will change. >>> >>> -Doug >> Thank you. >> >> -Vijay/ >>

Hi, On 6/4/2022 12:10 AM, Doug Anderson wrote: > Hi, > > On Fri, Jun 3, 2022 at 10:43 AM Vijaya Krishna Nivarthi > <quic_vnivarth@quicinc.com> wrote: >> >> Ah, or I guess what you're saying is that the table historically >> contained "rounded" rates but that clk_round_rate() isn't returning >> nice round rates. OK, but if we truly want to support an inexact >> match, you'd want to pick the rate that reduces the error, not just >> pick the first one. In other words, something like this (untested): >> >> freq = clk_round_rate(clk, mult); >> diff = abs(((long)mult - freq) / div); >> if (diff < best_diff) { >> best_diff = diff; >> ser_clk = freq; >> best_div = div; >> } >> I am not sure if its required that freq is a multiple of best_div now >> that we don't have a multiple of desired_clk anyway. > How about just this (untested): > > freq = clk_round_rate(clk, mult); > candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk)); > candidate_freq = freq / candidate_div; > diff = abs((long)desired_clk - candidate_freq); > if (diff < best_diff) { > best_diff = diff; > ser_clk = freq; > best_div = candidate_div; > } I am afraid this still doesn't guarantee that ser_clk is a multiple of best_div I tested it with a function simulates clk_round_rate. static unsigned long clk_round_rate_test(struct clk *clk, unsigned long in_freq) { unsigned long root_freq[6] = {105, 204, 303, 402, 501, 602}; int i; for (i = 0; i < 6; i++) { if (root_freq[i] >= in_freq) return root_freq[i]; } return root_freq[6]; } { unsigned long ser_clk; unsigned long desired_clk; unsigned long freq; int div_round_closest; unsigned long div; unsigned long mult; unsigned long candidate_div, candidate_freq; unsigned long diff, best_diff, best_div; unsigned long one; desired_clk = 100; one = 1; best_diff = ULONG_MAX; pr_err("\ndesired_clk-%d\n", desired_clk); for (div = 1; div <= 10; div++) { mult = div * desired_clk; freq = clk_round_rate_test(clk, mult); div_round_closest = DIV_ROUND_CLOSEST(freq, desired_clk); candidate_div = max(one, (unsigned long)div_round_closest); candidate_freq = freq / candidate_div; diff = abs((long)desired_clk - candidate_freq); pr_err("div-%d, mult-%d, freq-%d, div_round_closest-%d, candidate_div-%d, candidate_freq-%d, diff-%d\n", div, mult, freq, div_round_closest, candidate_div, candidate_freq, diff); if (diff < best_diff) { pr_err("This is best so far\n"); best_diff = diff; ser_clk = freq; best_div = candidate_div; } } pr_err("\nbest_diff-%d, ser_clk-%d, best_div-%d\n", best_diff, ser_clk, best_div); } And here is the output [ 17.835167] desired_clk-100 [ 17.839567] div-1, mult-100, freq-105, div_round_closest-1, candidate_div-1, candidate_freq-105, diff-5 [ 17.849220] This is best so far [ 17.852458] div-2, mult-200, freq-204, div_round_closest-2, candidate_div-2, candidate_freq-102, diff-2 [ 17.862104] This is best so far [ 17.865345] div-3, mult-300, freq-303, div_round_closest-3, candidate_div-3, candidate_freq-101, diff-1 [ 17.874995] This is best so far [ 17.878237] div-4, mult-400, freq-402, div_round_closest-4, candidate_div-4, candidate_freq-100, diff-0 [ 17.887882] This is best so far [ 17.891118] div-5, mult-500, freq-501, div_round_closest-5, candidate_div-5, candidate_freq-100, diff-0 [ 17.900770] div-6, mult-600, freq-602, div_round_closest-6, candidate_div-6, candidate_freq-100, diff-0 [ 17.910415] div-7, mult-700, freq-602, div_round_closest-6, candidate_div-6, candidate_freq-100, diff-0 [ 17.920057] div-8, mult-800, freq-602, div_round_closest-6, candidate_div-6, candidate_freq-100, diff-0 [ 17.929703] div-9, mult-900, freq-602, div_round_closest-6, candidate_div-6, candidate_freq-100, diff-0 [ 17.939353] div-10, mult-1000, freq-602, div_round_closest-6, candidate_div-6, candidate_freq-100, diff-0 [ 17.949181] [ 17.949181] best_diff-0, ser_clk-402, best_div-4 Please note that we go past cases when we have an divider that can exactly divide the frequency(105/1, 204/2, 303/3) and end up with one that doesn't. It seems like we need to run through all dividers to find one that can divide output freq from clk_round_rate and among those choose smallest delta. In regular bootup 2nd loop was required only for 51200000. For others 1843200, 153600, 1st loop worked fine. Thank you. > > Here: > > freq: a freq we can definitely make > > candidate_div: the best number to divide freq by to get the desired clock. > > candidate_freq: the frequency we'll end up if we divide freq by > candidate_div. We want this to be close to desired_clk. > > diff: how far away the candidate_freq is away from what we want. > > best_diff: how far away the best candidate was from what we wanted. > > ser_clk: What we should pass to clk_set_rate() to get the best candidate. > > best_div: What we should use as a divider to get the best candidate. > > > -Doug

Hi, On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com> wrote: > > Hi, > > On 6/7/2022 1:29 AM, Doug Anderson wrote: > > Hi, > > > > On Mon, Jun 6, 2022 at 11:19 AM Vijaya Krishna Nivarthi > > <quic_vnivarth@quicinc.com> wrote: > >> Hi, > >> > >> > >> On 6/4/2022 12:10 AM, Doug Anderson wrote: > >>> Hi, > >>> > >>> On Fri, Jun 3, 2022 at 10:43 AM Vijaya Krishna Nivarthi > >>> <quic_vnivarth@quicinc.com> wrote: > >>>> Ah, or I guess what you're saying is that the table historically > >>>> contained "rounded" rates but that clk_round_rate() isn't returning > >>>> nice round rates. OK, but if we truly want to support an inexact > >>>> match, you'd want to pick the rate that reduces the error, not just > >>>> pick the first one. In other words, something like this (untested): > >>>> > >>>> freq = clk_round_rate(clk, mult); > >>>> diff = abs(((long)mult - freq) / div); > >>>> if (diff < best_diff) { > >>>> best_diff = diff; > >>>> ser_clk = freq; > >>>> best_div = div; > >>>> } > >>>> I am not sure if its required that freq is a multiple of best_div now > >>>> that we don't have a multiple of desired_clk anyway. > >>> How about just this (untested): > >>> > >>> freq = clk_round_rate(clk, mult); > >>> candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk)); > >>> candidate_freq = freq / candidate_div; > >>> diff = abs((long)desired_clk - candidate_freq); > >>> if (diff < best_diff) { > >>> best_diff = diff; > >>> ser_clk = freq; > >>> best_div = candidate_div; > >>> } > >> I am afraid this still doesn't guarantee that ser_clk is a multiple of > >> best_div > > OK. ...I guess my question would be: does it matter for some reason? > > "ser_clk" is just a local variable in this function. Who cares if it's > > not a multiple of best_div? This is why we're keeping track of > > "best_div" in the first place, so that later in the function instead > > of: > > > > *clk_div = ser_clk / desired_clk; > > if (!(*clk_div)) > > *clk_div = 1; > > > > You just do: > > > > *clk_div = best_div; > > My only concern continues to be... > > Given ser_clk is the final frequency that this function is going to > return and best_div is going to be the clk_divider, is it ok if the > divider cant divide the frequency exactly? > > In other words, Can this function output combinations like (402,4) > (501,5) ? > > If ok, then we can go ahead with this patch or even previous perhaps. I don't see why not. You're basically just getting a resulting clock that's not an integral "Hz", right? So if "baud" is 9600 and sampling_rate is 16 then desired_clk is (9600 * 16) = 153600 Let's imagine that we do all the math and we finally decide that our best bet is with the rate 922000 and a divider of 6. That means that the actual clock we'll make is 153666.67 when we _wanted_ 153600. There's no reason it needs to be integral, though, and 153666.67 would still be better than making 160000. > >> I tested it with a function simulates clk_round_rate. > >> > >> static unsigned long clk_round_rate_test(struct clk *clk, unsigned long > >> in_freq) > >> { > >> unsigned long root_freq[6] = {105, 204, 303, 402, 501, 602}; > >> int i; > >> > >> for (i = 0; i < 6; i++) { > >> if (root_freq[i] >= in_freq) > >> return root_freq[i]; > >> } > >> return root_freq[6]; > >> } > >> > >> { > >> unsigned long ser_clk; > >> unsigned long desired_clk; > >> unsigned long freq; > >> int div_round_closest; > >> unsigned long div; > >> unsigned long mult; > >> unsigned long candidate_div, candidate_freq; > >> > >> unsigned long diff, best_diff, best_div; > >> unsigned long one; > >> > >> desired_clk = 100; > >> one = 1; > >> best_diff = ULONG_MAX; > >> pr_err("\ndesired_clk-%d\n", desired_clk); > >> for (div = 1; div <= 10; div++) { > >> mult = div * desired_clk; > >> > >> freq = clk_round_rate_test(clk, mult); > >> div_round_closest = DIV_ROUND_CLOSEST(freq, desired_clk); > >> candidate_div = max(one, (unsigned long)div_round_closest); > >> candidate_freq = freq / candidate_div; > >> diff = abs((long)desired_clk - candidate_freq); > >> pr_err("div-%d, mult-%d, freq-%d, div_round_closest-%d, > >> candidate_div-%d, candidate_freq-%d, diff-%d\n", > >> div, mult, freq, div_round_closest, candidate_div, > >> candidate_freq, diff); > >> if (diff < best_diff) { > >> pr_err("This is best so far\n"); > >> best_diff = diff; > >> ser_clk = freq; > >> best_div = candidate_div; > >> } > >> } > >> pr_err("\nbest_diff-%d, ser_clk-%d, best_div-%d\n", > >> best_diff, ser_clk, best_div); > >> } > >> > >> And here is the output > >> > >> [ 17.835167] desired_clk-100 > >> [ 17.839567] div-1, mult-100, freq-105, div_round_closest-1, > >> candidate_div-1, candidate_freq-105, diff-5 > >> [ 17.849220] This is best so far > >> [ 17.852458] div-2, mult-200, freq-204, div_round_closest-2, > >> candidate_div-2, candidate_freq-102, diff-2 > >> [ 17.862104] This is best so far > >> [ 17.865345] div-3, mult-300, freq-303, div_round_closest-3, > >> candidate_div-3, candidate_freq-101, diff-1 > >> [ 17.874995] This is best so far > >> [ 17.878237] div-4, mult-400, freq-402, div_round_closest-4, > >> candidate_div-4, candidate_freq-100, diff-0 > >> [ 17.887882] This is best so far > >> [ 17.891118] div-5, mult-500, freq-501, div_round_closest-5, > >> candidate_div-5, candidate_freq-100, diff-0 > >> [ 17.900770] div-6, mult-600, freq-602, div_round_closest-6, > >> candidate_div-6, candidate_freq-100, diff-0 > >> [ 17.910415] div-7, mult-700, freq-602, div_round_closest-6, > >> candidate_div-6, candidate_freq-100, diff-0 > >> [ 17.920057] div-8, mult-800, freq-602, div_round_closest-6, > >> candidate_div-6, candidate_freq-100, diff-0 > >> [ 17.929703] div-9, mult-900, freq-602, div_round_closest-6, > >> candidate_div-6, candidate_freq-100, diff-0 > >> [ 17.939353] div-10, mult-1000, freq-602, div_round_closest-6, > >> candidate_div-6, candidate_freq-100, diff-0 > >> [ 17.949181] > >> [ 17.949181] best_diff-0, ser_clk-402, best_div-4 > > That doesn't look like a terrible result. I guess nominally 602 is a > > better approximation, but if we're accepting that we're not going to > > have an exact rate anyway then maybe being off by that tiny amount > > doesn't matter and we'd do better with the slow clock (maybe saves > > power?) > Actually power saving was the anticipation behind returning first > frequency in original patch, when we cant find exact frequency. Right, except that if you just pick the first clock you find it would be _wildly_ off. I guess if you really want to do this the right way, you need to set a maximum tolerance and pick the first rate you find that meets that tolerance. Random web search for "uart baud rate tolerance" makes me believe that +/- 5% deviation is OK, but to be safe you probably want something lower. Maybe 2%? So if the desired clock is within 2% of a clock you can make, can you just pick that one? > >> Please note that we go past cases when we have an divider that can > >> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one > >> that doesn't. > > Ah, good point. Luckily that's a 1-line fix, right? > > Apologies, I could not figure out how. Ah, sorry. Not quite 1 line, but this (untested) freq = clk_round_rate(clk, mult); if (freq % desired_clk == 0) { ser_clk = freq; best_div = freq / desired_clk; break; } candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk)); candidate_freq = freq / candidate_div; diff = abs((long)desired_clk - candidate_freq); if (diff < best_diff) { best_diff = diff; ser_clk = freq; best_div = candidate_div; }

Hi, On 6/8/2022 12:55 AM, Doug Anderson wrote: > Hi, > > On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi > <quic_vnivarth@quicinc.com> wrote: >> Hi, >> >> On 6/7/2022 1:29 AM, Doug Anderson wrote: >> >> My only concern continues to be... >> >> Given ser_clk is the final frequency that this function is going to >> return and best_div is going to be the clk_divider, is it ok if the >> divider cant divide the frequency exactly? >> >> In other words, Can this function output combinations like (402,4) >> (501,5) ? >> >> If ok, then we can go ahead with this patch or even previous perhaps. > I don't see why not. You're basically just getting a resulting clock > that's not an integral "Hz", right? > > So if "baud" is 9600 and sampling_rate is 16 then desired_clk is (9600 > * 16) = 153600 > > Let's imagine that we do all the math and we finally decide that our > best bet is with the rate 922000 and a divider of 6. That means that > the actual clock we'll make is 153666.67 when we _wanted_ 153600. > There's no reason it needs to be integral, though, and 153666.67 would > still be better than making 160000. > Thank you for clarification. >>> power?) >> Actually power saving was the anticipation behind returning first >> frequency in original patch, when we cant find exact frequency. > Right, except that if you just pick the first clock you find it would > be _wildly_ off. I guess if you really want to do this the right way, > you need to set a maximum tolerance and pick the first rate you find > that meets that tolerance. Random web search for "uart baud rate > tolerance" makes me believe that +/- 5% deviation is OK, but to be > safe you probably want something lower. Maybe 2%? So if the desired > clock is within 2% of a clock you can make, can you just pick that > one? Ok, 2% seems good. > >>>> Please note that we go past cases when we have an divider that can >>>> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one >>>> that doesn't. >>> Ah, good point. Luckily that's a 1-line fix, right? >> Apologies, I could not figure out how. > Ah, sorry. Not quite 1 line, but this (untested) > > > freq = clk_round_rate(clk, mult); > > if (freq % desired_clk == 0) { > ser_clk = freq; > best_div = freq / desired_clk; > break; > } > > candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk)); > candidate_freq = freq / candidate_div; > diff = abs((long)desired_clk - candidate_freq); > if (diff < best_diff) { > best_diff = diff; > ser_clk = freq; > best_div = candidate_div; > } But then once again, we would likely need 2 loops because while we are ok with giving up on search for best_div on finding something within 2% tolerance, we may not want to give up on exact match (freq % desired_clk == 0 ) So how about something like this with 2 loops (more optimised than previous version with 2 loops)? (untested) maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT; prev = 0; /* run through quicker loop anticipating to find an exact match */ for (div = 1; div <= maxdiv; div++) { mult = (unsigned long long)div * desired_clk; if (mult > ULONG_MAX) break; freq = clk_round_rate(clk, max((unsigned long)mult, prev+1)); if (!(freq % desired_clk)) { *clk_div = freq / desired_clk; return freq; } if (prev && prev == freq) break; prev = freq; } pr_warn("Can't find exact match frequency and divider\n"); freq = 0; best_diff = ULONG_MAX; prev_candidate_div = -1; while (true) { prev = freq; freq = clk_round_rate(clk, freq+1); if (freq == prev) break; /* end of table */ candidate_div = DIV_ROUND_CLOSEST(freq, desired_clk); /* * Since the frequencies are increasing, previous is better * if we have same divider, proceed to next in table */ if (prev_candidate_div == candidate_div) continue; prev_candidate_div = candidate_div; if (candidate_div) candidate_freq = freq / candidate_div; else candidate_freq = freq; diff = abs((long)desired_clk - candidate_freq); if (diff < best_diff) { best_diff = diff; ser_clk = freq; *clk_div = candidate_div; if (diff * 50 < ser_clk) { two_percent_tolerance = true; break; } } } if (!two_percent_tolerance) { pr_warn("Can't find frequency within 2 percent tolerance\n"); } return ser_clk; } Thank you.

Hi, On Wed, Jun 8, 2022 at 11:34 AM Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com> wrote: > > Hi, > > > On 6/8/2022 12:55 AM, Doug Anderson wrote: > > Hi, > > > > On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi > > <quic_vnivarth@quicinc.com> wrote: > >> Hi, > >> > >> On 6/7/2022 1:29 AM, Doug Anderson wrote: > >> > >> My only concern continues to be... > >> > >> Given ser_clk is the final frequency that this function is going to > >> return and best_div is going to be the clk_divider, is it ok if the > >> divider cant divide the frequency exactly? > >> > >> In other words, Can this function output combinations like (402,4) > >> (501,5) ? > >> > >> If ok, then we can go ahead with this patch or even previous perhaps. > > I don't see why not. You're basically just getting a resulting clock > > that's not an integral "Hz", right? > > > > So if "baud" is 9600 and sampling_rate is 16 then desired_clk is (9600 > > * 16) = 153600 > > > > Let's imagine that we do all the math and we finally decide that our > > best bet is with the rate 922000 and a divider of 6. That means that > > the actual clock we'll make is 153666.67 when we _wanted_ 153600. > > There's no reason it needs to be integral, though, and 153666.67 would > > still be better than making 160000. > > > Thank you for clarification. > >>> power?) > >> Actually power saving was the anticipation behind returning first > >> frequency in original patch, when we cant find exact frequency. > > Right, except that if you just pick the first clock you find it would > > be _wildly_ off. I guess if you really want to do this the right way, > > you need to set a maximum tolerance and pick the first rate you find > > that meets that tolerance. Random web search for "uart baud rate > > tolerance" makes me believe that +/- 5% deviation is OK, but to be > > safe you probably want something lower. Maybe 2%? So if the desired > > clock is within 2% of a clock you can make, can you just pick that > > one? > Ok, 2% seems good. > > > >>>> Please note that we go past cases when we have an divider that can > >>>> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one > >>>> that doesn't. > >>> Ah, good point. Luckily that's a 1-line fix, right? > >> Apologies, I could not figure out how. > > Ah, sorry. Not quite 1 line, but this (untested) > > > > > > freq = clk_round_rate(clk, mult); > > > > if (freq % desired_clk == 0) { > > ser_clk = freq; > > best_div = freq / desired_clk; > > break; > > } > > > > candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk)); > > candidate_freq = freq / candidate_div; > > diff = abs((long)desired_clk - candidate_freq); > > if (diff < best_diff) { > > best_diff = diff; > > ser_clk = freq; > > best_div = candidate_div; > > } > > But then once again, we would likely need 2 loops because while we are > ok with giving up on search for best_div on finding something within 2% > tolerance, we may not want to give up on exact match (freq % desired_clk > == 0 ) Ah, it took me a while to understand why two loops. It's because in one case you're trying multiplies and in the other you're bumping up to the next closest clock rate. I don't think you really need to do that. Just test the (rate - 2%) and the rate. How about this (only lightly tested): ser_clk = 0; maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT; div = 1; while (div < maxdiv) { mult = (unsigned long long)div * desired_clk; if (mult != (unsigned long)mult) break; two_percent = mult / 50; /* * Loop requesting (freq - 2%) and possibly (freq). * * We'll keep track of the lowest freq inexact match we found * but always try to find a perfect match. NOTE: this algorithm * could miss a slightly better freq if there's more than one * freq between (freq - 2%) and (freq) but (freq) can't be made * exactly, but that's OK. * * This absolutely relies on the fact that the Qualcomm clock * driver always rounds up. */ test_freq = mult - two_percent; while (test_freq <= mult) { freq = clk_round_rate(clk, test_freq); /* * A dead-on freq is an insta-win. This implicitly * handles when "freq == mult" */ if (!(freq % desired_clk)) { *clk_div = freq / desired_clk; return freq; } /* * Only time clock framework doesn't round up is if * we're past the max clock rate. We're done searching * if that's the case. */ if (freq < test_freq) return ser_clk; /* Save the first (lowest freq) within 2% */ if (!ser_clk && freq <= mult + two_percent) { ser_clk = freq; *clk_div = div; } /* * If we already rounded up past mult then this will * cause the loop to exit. If not then this will run * the loop a second time with exactly mult. */ test_freq = max(freq + 1, mult); } /* * test_freq will always be bigger than mult by at least 1. * That means we can get the next divider with a DIV_ROUND_UP. * This has the advantage of skipping by a whole bunch of divs * If the clock framework already bypassed them. */ div = DIV_ROUND_UP(test_freq, desired_clk); } return ser_clk;

Hi, On 6/9/2022 4:07 AM, Doug Anderson wrote: > Hi, > > On Wed, Jun 8, 2022 at 11:34 AM Vijaya Krishna Nivarthi > <quic_vnivarth@quicinc.com> wrote: >> Hi, >> >> >> On 6/8/2022 12:55 AM, Doug Anderson wrote: >>> Hi, >>> >>> On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi >>> <quic_vnivarth@quicinc.com> wrote: >>> Ah, sorry. Not quite 1 line, but this (untested) >>> >>> freq = clk_round_rate(clk, mult); >>> >>> if (freq % desired_clk == 0) { >>> ser_clk = freq; >>> best_div = freq / desired_clk; >>> break; >>> } >>> >>> candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk)); >>> candidate_freq = freq / candidate_div; >>> diff = abs((long)desired_clk - candidate_freq); >>> if (diff < best_diff) { >>> best_diff = diff; >>> ser_clk = freq; >>> best_div = candidate_div; >>> } >> But then once again, we would likely need 2 loops because while we are >> ok with giving up on search for best_div on finding something within 2% >> tolerance, we may not want to give up on exact match (freq % desired_clk >> == 0 ) > Ah, it took me a while to understand why two loops. It's because in > one case you're trying multiplies and in the other you're bumping up > to the next closest clock rate. I don't think you really need to do > that. Just test the (rate - 2%) and the rate. How about this (only > lightly tested): > > ser_clk = 0; > maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT; > div = 1; > while (div < maxdiv) { div <= maxdiv ? > mult = (unsigned long long)div * desired_clk; > if (mult != (unsigned long)mult) > break; > > two_percent = mult / 50; > > /* > * Loop requesting (freq - 2%) and possibly (freq). > * > * We'll keep track of the lowest freq inexact match we found > * but always try to find a perfect match. NOTE: this algorithm > * could miss a slightly better freq if there's more than one > * freq between (freq - 2%) and (freq) but (freq) can't be made > * exactly, but that's OK. > * > * This absolutely relies on the fact that the Qualcomm clock > * driver always rounds up. > */ > test_freq = mult - two_percent; > while (test_freq <= mult) { > freq = clk_round_rate(clk, test_freq); > > /* > * A dead-on freq is an insta-win. This implicitly > * handles when "freq == mult" > */ > if (!(freq % desired_clk)) { > *clk_div = freq / desired_clk; > return freq; > } > > /* > * Only time clock framework doesn't round up is if > * we're past the max clock rate. We're done searching > * if that's the case. > */ > if (freq < test_freq) > return ser_clk; > > /* Save the first (lowest freq) within 2% */ > if (!ser_clk && freq <= mult + two_percent) { > ser_clk = freq; > *clk_div = div; > } My last concern is with search happening only within 2% tolerance. Do we fail otherwise? This real case has best tolerance of 1.9%. [ 17.963672] 20220530 desired_clk-51200000 [ 21.193550] 20220530 returning ser_clk-52174000, div-1, diff-974000 Seems close. Thank you. > > /* > * If we already rounded up past mult then this will > * cause the loop to exit. If not then this will run > * the loop a second time with exactly mult. > */ > test_freq = max(freq + 1, mult); > } > > /* > * test_freq will always be bigger than mult by at least 1. > * That means we can get the next divider with a DIV_ROUND_UP. > * This has the advantage of skipping by a whole bunch of divs > * If the clock framework already bypassed them. > */ > div = DIV_ROUND_UP(test_freq, desired_clk); > } > > return ser_clk;

Hi, Re-sending as my earlier message bounced. On 6/9/2022 4:07 AM, Doug Anderson wrote: > Hi, > > On Wed, Jun 8, 2022 at 11:34 AM Vijaya Krishna Nivarthi > <quic_vnivarth@quicinc.com> wrote: >> Hi, >> >> >> On 6/8/2022 12:55 AM, Doug Anderson wrote: >>> Hi, >>> >>> On Tue, Jun 7, 2022 at 10:40 AM Vijaya Krishna Nivarthi >>> <quic_vnivarth@quicinc.com> wrote: >>>> Hi, >>>> >>>> On 6/7/2022 1:29 AM, Doug Anderson wrote: >>>> >>>> My only concern continues to be... >>>> >>>> Given ser_clk is the final frequency that this function is going to >>>> return and best_div is going to be the clk_divider, is it ok if the >>>> divider cant divide the frequency exactly? >>>> >>>> In other words, Can this function output combinations like (402,4) >>>> (501,5) ? >>>> >>>> If ok, then we can go ahead with this patch or even previous perhaps. >>> I don't see why not. You're basically just getting a resulting clock >>> that's not an integral "Hz", right? >>> >>> So if "baud" is 9600 and sampling_rate is 16 then desired_clk is (9600 >>> * 16) = 153600 >>> >>> Let's imagine that we do all the math and we finally decide that our >>> best bet is with the rate 922000 and a divider of 6. That means that >>> the actual clock we'll make is 153666.67 when we _wanted_ 153600. >>> There's no reason it needs to be integral, though, and 153666.67 would >>> still be better than making 160000. >>> >> Thank you for clarification. >>>>> power?) >>>> Actually power saving was the anticipation behind returning first >>>> frequency in original patch, when we cant find exact frequency. >>> Right, except that if you just pick the first clock you find it would >>> be _wildly_ off. I guess if you really want to do this the right way, >>> you need to set a maximum tolerance and pick the first rate you find >>> that meets that tolerance. Random web search for "uart baud rate >>> tolerance" makes me believe that +/- 5% deviation is OK, but to be >>> safe you probably want something lower. Maybe 2%? So if the desired >>> clock is within 2% of a clock you can make, can you just pick that >>> one? >> Ok, 2% seems good. >>>>>> Please note that we go past cases when we have an divider that can >>>>>> exactly divide the frequency(105/1, 204/2, 303/3) and end up with one >>>>>> that doesn't. >>>>> Ah, good point. Luckily that's a 1-line fix, right? >>>> Apologies, I could not figure out how. >>> Ah, sorry. Not quite 1 line, but this (untested) >>> >>> >>> freq = clk_round_rate(clk, mult); >>> >>> if (freq % desired_clk == 0) { >>> ser_clk = freq; >>> best_div = freq / desired_clk; >>> break; >>> } >>> >>> candidate_div = max(1, DIV_ROUND_CLOSEST(freq, desired_clk)); >>> candidate_freq = freq / candidate_div; >>> diff = abs((long)desired_clk - candidate_freq); >>> if (diff < best_diff) { >>> best_diff = diff; >>> ser_clk = freq; >>> best_div = candidate_div; >>> } >> But then once again, we would likely need 2 loops because while we are >> ok with giving up on search for best_div on finding something within 2% >> tolerance, we may not want to give up on exact match (freq % desired_clk >> == 0 ) > Ah, it took me a while to understand why two loops. It's because in > one case you're trying multiplies and in the other you're bumping up > to the next closest clock rate. I don't think you really need to do > that. Just test the (rate - 2%) and the rate. How about this (only > lightly tested): > > ser_clk = 0; > maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT; > div = 1; > while (div < maxdiv) { div <= maxdiv ? > mult = (unsigned long long)div * desired_clk; > if (mult != (unsigned long)mult) > break; > > two_percent = mult / 50; > > /* > * Loop requesting (freq - 2%) and possibly (freq). > * > * We'll keep track of the lowest freq inexact match we found > * but always try to find a perfect match. NOTE: this algorithm > * could miss a slightly better freq if there's more than one > * freq between (freq - 2%) and (freq) but (freq) can't be made > * exactly, but that's OK. > * > * This absolutely relies on the fact that the Qualcomm clock > * driver always rounds up. > */ > test_freq = mult - two_percent; > while (test_freq <= mult) { > freq = clk_round_rate(clk, test_freq); > > /* > * A dead-on freq is an insta-win. This implicitly > * handles when "freq == mult" > */ > if (!(freq % desired_clk)) { > *clk_div = freq / desired_clk; > return freq; > } > > /* > * Only time clock framework doesn't round up is if > * we're past the max clock rate. We're done searching > * if that's the case. > */ > if (freq < test_freq) > return ser_clk; > > /* Save the first (lowest freq) within 2% */ > if (!ser_clk && freq <= mult + two_percent) { > ser_clk = freq; > *clk_div = div; > } My last concern is with search happening only within 2% tolerance. Do we fail otherwise? This real case has best tolerance of 1.9%. [ 17.963672] 20220530 desired_clk-51200000 [ 21.193550] 20220530 returning ser_clk-52174000, div-1, diff-974000 Seems close. Thank you. > > /* > * If we already rounded up past mult then this will > * cause the loop to exit. If not then this will run > * the loop a second time with exactly mult. > */ > test_freq = max(freq + 1, mult); > } > > /* > * test_freq will always be bigger than mult by at least 1. > * That means we can get the next divider with a DIV_ROUND_UP. > * This has the advantage of skipping by a whole bunch of divs > * If the clock framework already bypassed them. > */ > div = DIV_ROUND_UP(test_freq, desired_clk); > } > > return ser_clk;

Hi, Re-sending (2nd attempt) as emails are bouncing... > > > > But then once again, we would likely need 2 loops because while we are > > ok with giving up on search for best_div on finding something within > > 2% tolerance, we may not want to give up on exact match (freq % > > desired_clk == 0 ) > > Ah, it took me a while to understand why two loops. It's because in one case > you're trying multiplies and in the other you're bumping up to the next > closest clock rate. I don't think you really need to do that. Just test the (rate - > 2%) and the rate. How about this (only lightly tested): > > ser_clk = 0; > maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT; > div = 1; > while (div < maxdiv) { div <= maxdiv ? > mult = (unsigned long long)div * desired_clk; > if (mult != (unsigned long)mult) > break; > > two_percent = mult / 50; > > /* > * Loop requesting (freq - 2%) and possibly (freq). > * > * We'll keep track of the lowest freq inexact match we found > * but always try to find a perfect match. NOTE: this algorithm > * could miss a slightly better freq if there's more than one > * freq between (freq - 2%) and (freq) but (freq) can't be made > * exactly, but that's OK. > * > * This absolutely relies on the fact that the Qualcomm clock > * driver always rounds up. > */ > test_freq = mult - two_percent; > while (test_freq <= mult) { > freq = clk_round_rate(clk, test_freq); > > /* > * A dead-on freq is an insta-win. This implicitly > * handles when "freq == mult" > */ > if (!(freq % desired_clk)) { > *clk_div = freq / desired_clk; > return freq; > } > > /* > * Only time clock framework doesn't round up is if > * we're past the max clock rate. We're done searching > * if that's the case. > */ > if (freq < test_freq) > return ser_clk; > > /* Save the first (lowest freq) within 2% */ > if (!ser_clk && freq <= mult + two_percent) { > ser_clk = freq; > *clk_div = div; > } My last concern is with search happening only within 2% tolerance. Do we fail otherwise? This real case has best tolerance of 1.9% and seems close. [ 17.963672] 20220530 desired_clk-51200000 [ 21.193550] 20220530 returning ser_clk-52174000, div-1, diff-974000 Perhaps we can fallback on 1st clock rate? Thank you. > > /* > * If we already rounded up past mult then this will > * cause the loop to exit. If not then this will run > * the loop a second time with exactly mult. > */ > test_freq = max(freq + 1, mult); > } > > /* > * test_freq will always be bigger than mult by at least 1. > * That means we can get the next divider with a DIV_ROUND_UP. > * This has the advantage of skipping by a whole bunch of divs > * If the clock framework already bypassed them. > */ > div = DIV_ROUND_UP(test_freq, desired_clk); > } > > return ser_clk;

Hi, On Fri, Jun 10, 2022 at 2:33 AM Vijaya Krishna Nivarthi (Temp) (QUIC) <quic_vnivarth@quicinc.com> wrote: > > Hi, > > Re-sending (2nd attempt) as emails are bouncing... > > > > > > > > But then once again, we would likely need 2 loops because while we are > > > ok with giving up on search for best_div on finding something within > > > 2% tolerance, we may not want to give up on exact match (freq % > > > desired_clk == 0 ) > > > > Ah, it took me a while to understand why two loops. It's because in one case > > you're trying multiplies and in the other you're bumping up to the next > > closest clock rate. I don't think you really need to do that. Just test the (rate - > > 2%) and the rate. How about this (only lightly tested): > > > > ser_clk = 0; > > maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT; > > div = 1; > > while (div < maxdiv) { > > > div <= maxdiv ? Ah, sure. > > mult = (unsigned long long)div * desired_clk; > > if (mult != (unsigned long)mult) > > break; > > > > two_percent = mult / 50; > > > > /* > > * Loop requesting (freq - 2%) and possibly (freq). > > * > > * We'll keep track of the lowest freq inexact match we found > > * but always try to find a perfect match. NOTE: this algorithm > > * could miss a slightly better freq if there's more than one > > * freq between (freq - 2%) and (freq) but (freq) can't be made > > * exactly, but that's OK. > > * > > * This absolutely relies on the fact that the Qualcomm clock > > * driver always rounds up. > > */ > > test_freq = mult - two_percent; > > while (test_freq <= mult) { > > freq = clk_round_rate(clk, test_freq); > > > > /* > > * A dead-on freq is an insta-win. This implicitly > > * handles when "freq == mult" > > */ > > if (!(freq % desired_clk)) { > > *clk_div = freq / desired_clk; > > return freq; > > } > > > > /* > > * Only time clock framework doesn't round up is if > > * we're past the max clock rate. We're done searching > > * if that's the case. > > */ > > if (freq < test_freq) > > return ser_clk; > > > > /* Save the first (lowest freq) within 2% */ > > if (!ser_clk && freq <= mult + two_percent) { > > ser_clk = freq; > > *clk_div = div; > > } > > My last concern is with search happening only within 2% tolerance. > Do we fail otherwise? > > This real case has best tolerance of 1.9% and seems close. > > [ 17.963672] 20220530 desired_clk-51200000 > [ 21.193550] 20220530 returning ser_clk-52174000, div-1, diff-974000 > > Perhaps we can fallback on 1st clock rate? I don't feel super comfortable just blindly falling back on the 1st clock rate. It could be wildly (more than 5%) wrong, can't it? IMO: * If you're not comfortable with 2%, you could always pick 3% or 4%. As I said, my random web search seemed to indicate that up to 5% was perhaps OK. * It's probably overkill, but you could abstract the whole search out and try searching once for 2% and then try 4%? -Doug

Hi, For desired_clk = 100 and clock rates like 1st from below, DIV_ROUND_UP seems to cause missing candidate solutions. static unsigned long clk_round_rate_test(struct clk *clk, unsigned long in_freq) { //unsigned long root_freq[] = {301, 702, 1004}; //unsigned long root_freq[] = {301, 702, 1004, 2000, 3000}; //unsigned long root_freq[] = {50, 97, 99}; //unsigned long root_freq[] = {50, 97, 99, 200}; //unsigned long root_freq[] = {92, 110, 193, 230}; //unsigned long root_freq[] = {92, 110, 193, 230, 300, 401}; //unsigned long root_freq[] = {92, 110, 193, 230, 295, 296, 297, 401}; //unsigned long root_freq[] = {92, 110, 193, 230, 295, 296, 297, 300, 401}; //unsigned long root_freq[] = {197, 198, 199}; unsigned long root_freq[] = {197, 198, 199, 200}; int i; size_t n = sizeof root_freq / sizeof *root_freq; for (i = 0; i < n; i++) { if (root_freq[i] >= in_freq) return root_freq[i]; } return root_freq[n-1]; } I modified to handle such cases, optimised little and uploaded a patch. It seems to work for all the cases like above. Thank you, Vijay/ > > Ah, it took me a while to understand why two loops. It's because in one case > you're trying multiplies and in the other you're bumping up to the next > closest clock rate. I don't think you really need to do that. Just test the (rate - > 2%) and the rate. How about this (only lightly tested): > > ser_clk = 0; > maxdiv = CLK_DIV_MSK >> CLK_DIV_SHFT; > div = 1; > while (div < maxdiv) { > mult = (unsigned long long)div * desired_clk; > if (mult != (unsigned long)mult) > break; > > two_percent = mult / 50; > > /* > * Loop requesting (freq - 2%) and possibly (freq). > * > * We'll keep track of the lowest freq inexact match we found > * but always try to find a perfect match. NOTE: this algorithm > * could miss a slightly better freq if there's more than one > * freq between (freq - 2%) and (freq) but (freq) can't be made > * exactly, but that's OK. > * > * This absolutely relies on the fact that the Qualcomm clock > * driver always rounds up. > */ > test_freq = mult - two_percent; > while (test_freq <= mult) { > freq = clk_round_rate(clk, test_freq); > > /* > * A dead-on freq is an insta-win. This implicitly > * handles when "freq == mult" > */ > if (!(freq % desired_clk)) { > *clk_div = freq / desired_clk; > return freq; > } > > /* > * Only time clock framework doesn't round up is if > * we're past the max clock rate. We're done searching > * if that's the case. > */ > if (freq < test_freq) > return ser_clk; > > /* Save the first (lowest freq) within 2% */ > if (!ser_clk && freq <= mult + two_percent) { > ser_clk = freq; > *clk_div = div; > } > > /* > * If we already rounded up past mult then this will > * cause the loop to exit. If not then this will run > * the loop a second time with exactly mult. > */ > test_freq = max(freq + 1, mult); > } > > /* > * test_freq will always be bigger than mult by at least 1. > * That means we can get the next divider with a DIV_ROUND_UP. > * This has the advantage of skipping by a whole bunch of divs > * If the clock framework already bypassed them. > */ > div = DIV_ROUND_UP(test_freq, desired_clk); > } > > return ser_clk;

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c index 4733a23..08f3ad4 100644 --- a/drivers/tty/serial/qcom_geni_serial.c +++ b/drivers/tty/serial/qcom_geni_serial.c @@ -943,11 +943,11 @@ static int qcom_geni_serial_startup(struct uart_port *uport) static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud, unsigned int sampling_rate, unsigned int *clk_div) { - unsigned long ser_clk; + unsigned long ser_clk = 0; unsigned long desired_clk; unsigned long freq, prev; unsigned long div, maxdiv; - int64_t mult; + unsigned long long mult; desired_clk = baud * sampling_rate; if (!desired_clk) { @@ -959,8 +959,8 @@ static unsigned long get_clk_div_rate(struct clk *clk, unsigned int baud, prev = 0; for (div = 1; div <= maxdiv; div++) { - mult = div * desired_clk; - if (mult > ULONG_MAX) + mult = (unsigned long long)div * (unsigned long long)desired_clk; + if (mult > (unsigned long long)ULONG_MAX) break; freq = clk_round_rate(clk, (unsigned long)mult);

`Add missing initialisation and correct type casting Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com> --- drivers/tty/serial/qcom_geni_serial.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)`