diff mbox series

HID: logitech-hidpp: Handle timeout differently from busy

Message ID 20230531082428.21763-1-hadess@hadess.net
State Accepted
Commit 6199d23c91ce53bfed455f09a8c5ed170d516824
Headers show
Series HID: logitech-hidpp: Handle timeout differently from busy | expand

Commit Message

Bastien Nocera May 31, 2023, 8:24 a.m. UTC
If an attempt at contacting a receiver or a device fails because the
receiver or device never responds, don't restart the communication, only
restart it if the receiver or device answers that it's busy, as originally
intended.

This was the behaviour on communication timeout before commit 586e8fede795
("HID: logitech-hidpp: Retry commands when device is busy").

This fixes some overly long waits in a critical path on boot, when
checking whether the device is connected by getting its HID++ version.

Signed-off-by: Bastien Nocera <hadess@hadess.net>
Suggested-by: Mark Lord <mlord@pobox.com>
Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
---
 drivers/hid/hid-logitech-hidpp.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jiri Kosina May 31, 2023, 2:07 p.m. UTC | #1
On Wed, 31 May 2023, Bastien Nocera wrote:

> If an attempt at contacting a receiver or a device fails because the
> receiver or device never responds, don't restart the communication, only
> restart it if the receiver or device answers that it's busy, as originally
> intended.
> 
> This was the behaviour on communication timeout before commit 586e8fede795
> ("HID: logitech-hidpp: Retry commands when device is busy").
> 
> This fixes some overly long waits in a critical path on boot, when
> checking whether the device is connected by getting its HID++ version.
> 
> Signed-off-by: Bastien Nocera <hadess@hadess.net>
> Suggested-by: Mark Lord <mlord@pobox.com>
> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
> ---
>  drivers/hid/hid-logitech-hidpp.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
> index 0fcfd85fea0f..2246044b1639 100644
> --- a/drivers/hid/hid-logitech-hidpp.c
> +++ b/drivers/hid/hid-logitech-hidpp.c
> @@ -314,6 +314,7 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
>  			dbg_hid("%s:timeout waiting for response\n", __func__);
>  			memset(response, 0, sizeof(struct hidpp_report));
>  			ret = -ETIMEDOUT;
> +			goto exit;
>  		}
>  

I have applied this even before getting confirmation from the reporters in 
bugzilla, as it's the right thing to do anyway.

Thanks,
Jiri Kosina June 3, 2023, 12:41 p.m. UTC | #2
On Wed, 31 May 2023, Jiri Kosina wrote:

> > If an attempt at contacting a receiver or a device fails because the
> > receiver or device never responds, don't restart the communication, only
> > restart it if the receiver or device answers that it's busy, as originally
> > intended.
> > 
> > This was the behaviour on communication timeout before commit 586e8fede795
> > ("HID: logitech-hidpp: Retry commands when device is busy").
> > 
> > This fixes some overly long waits in a critical path on boot, when
> > checking whether the device is connected by getting its HID++ version.
> > 
> > Signed-off-by: Bastien Nocera <hadess@hadess.net>
> > Suggested-by: Mark Lord <mlord@pobox.com>
> > Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
> > ---
> >  drivers/hid/hid-logitech-hidpp.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
> > index 0fcfd85fea0f..2246044b1639 100644
> > --- a/drivers/hid/hid-logitech-hidpp.c
> > +++ b/drivers/hid/hid-logitech-hidpp.c
> > @@ -314,6 +314,7 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> >  			dbg_hid("%s:timeout waiting for response\n", __func__);
> >  			memset(response, 0, sizeof(struct hidpp_report));
> >  			ret = -ETIMEDOUT;
> > +			goto exit;
> >  		}
> >  
> 
> I have applied this even before getting confirmation from the reporters in 
> bugzilla, as it's the right thing to do anyway.

Unfortunately it doesn't seem to cure the reported issue (while reverting 
586e8fede79 does): https://bugzilla.kernel.org/show_bug.cgi?id=217523#c2
Mark Lord June 3, 2023, 1:17 p.m. UTC | #3
On 2023-06-03 08:41 AM, Jiri Kosina wrote:
> On Wed, 31 May 2023, Jiri Kosina wrote:
> 
>>> If an attempt at contacting a receiver or a device fails because the
>>> receiver or device never responds, don't restart the communication, only
>>> restart it if the receiver or device answers that it's busy, as originally
>>> intended.
>>>
>>> This was the behaviour on communication timeout before commit 586e8fede795
>>> ("HID: logitech-hidpp: Retry commands when device is busy").
>>>
>>> This fixes some overly long waits in a critical path on boot, when
>>> checking whether the device is connected by getting its HID++ version.
>>>
>>> Signed-off-by: Bastien Nocera <hadess@hadess.net>
>>> Suggested-by: Mark Lord <mlord@pobox.com>
>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
>>> ---
>>>  drivers/hid/hid-logitech-hidpp.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
>>> index 0fcfd85fea0f..2246044b1639 100644
>>> --- a/drivers/hid/hid-logitech-hidpp.c
>>> +++ b/drivers/hid/hid-logitech-hidpp.c
>>> @@ -314,6 +314,7 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
>>>  			dbg_hid("%s:timeout waiting for response\n", __func__);
>>>  			memset(response, 0, sizeof(struct hidpp_report));
>>>  			ret = -ETIMEDOUT;
>>> +			goto exit;
>>>  		}
>>>  
>>
>> I have applied this even before getting confirmation from the reporters in 
>> bugzilla, as it's the right thing to do anyway.
> 
> Unfortunately it doesn't seem to cure the reported issue (while reverting 
> 586e8fede79 does): https://bugzilla.kernel.org/show_bug.cgi?id=217523#c2

I wonder if this code could be re-worked to not even do this (waiting)
from the _probe() function?  It ought to be able to throw it on a workqueue
or something, rather than stalling system boot for a minimum of 5-seconds
(or much longer as as-is).
Linux regression tracking (Thorsten Leemhuis) June 5, 2023, 8:44 a.m. UTC | #4
On 03.06.23 14:41, Jiri Kosina wrote:
> On Wed, 31 May 2023, Jiri Kosina wrote:
> 
>>> If an attempt at contacting a receiver or a device fails because the
>>> receiver or device never responds, don't restart the communication, only
>>> restart it if the receiver or device answers that it's busy, as originally
>>> intended.
>>>
>>> This was the behaviour on communication timeout before commit 586e8fede795
>>> ("HID: logitech-hidpp: Retry commands when device is busy").
>>>
>>> This fixes some overly long waits in a critical path on boot, when
>>> checking whether the device is connected by getting its HID++ version.
>>>
>>> Signed-off-by: Bastien Nocera <hadess@hadess.net>
>>> Suggested-by: Mark Lord <mlord@pobox.com>
>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
> [...]  
>>
>> I have applied this even before getting confirmation from the reporters in 
>> bugzilla, as it's the right thing to do anyway.
> 
> Unfortunately it doesn't seem to cure the reported issue (while reverting 
> 586e8fede79 does):

BTW, remind me again: was fixing this by reverting 586e8fede79 for now a
option? I guess it's not, but if I'm wrong I wonder if that might at
this point be the best way forward.

> https://bugzilla.kernel.org/show_bug.cgi?id=217523#c2

FWIW, another comment showed up there:

```
> --- Comment #6 from vova7890 ---
> Same problem. I researched this some time ago. I noticed that if I add a small
> delay between commands to the dongle - everything goes fine. Repeated
> request(586e8fede7953b1695b5ccc6112eff9b052e79ac) made the situation more
> visible
```

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot ^backmonitor:
https://lore.kernel.org/all/15e5d50f-95fc-c7c9-0918-015f24c6fc6d@leemhuis.info/
Benjamin Tissoires June 5, 2023, 2:27 p.m. UTC | #5
On Jun 05 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> 
> On 03.06.23 14:41, Jiri Kosina wrote:
> > On Wed, 31 May 2023, Jiri Kosina wrote:
> > 
> >>> If an attempt at contacting a receiver or a device fails because the
> >>> receiver or device never responds, don't restart the communication, only
> >>> restart it if the receiver or device answers that it's busy, as originally
> >>> intended.
> >>>
> >>> This was the behaviour on communication timeout before commit 586e8fede795
> >>> ("HID: logitech-hidpp: Retry commands when device is busy").
> >>>
> >>> This fixes some overly long waits in a critical path on boot, when
> >>> checking whether the device is connected by getting its HID++ version.
> >>>
> >>> Signed-off-by: Bastien Nocera <hadess@hadess.net>
> >>> Suggested-by: Mark Lord <mlord@pobox.com>
> >>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
> >>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
> > [...]  
> >>
> >> I have applied this even before getting confirmation from the reporters in 
> >> bugzilla, as it's the right thing to do anyway.
> > 
> > Unfortunately it doesn't seem to cure the reported issue (while reverting 
> > 586e8fede79 does):
> 
> BTW, remind me again: was fixing this by reverting 586e8fede79 for now a
> option? I guess it's not, but if I'm wrong I wonder if that might at
> this point be the best way forward.

Could be. I don't think we thought at simply reverting it because it is
required for some new supoprted devices because they might differ
slightly from what we currently supported.

That being said, Bastien will be unavailable for at least a week AFAIU,
so maybe we should revert that patch.

> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=217523#c2
> 
> FWIW, another comment showed up there:
> 
> ```
> > --- Comment #6 from vova7890 ---
> > Same problem. I researched this some time ago. I noticed that if I add a small
> > delay between commands to the dongle - everything goes fine. Repeated
> > request(586e8fede7953b1695b5ccc6112eff9b052e79ac) made the situation more
> > visible

I don't think I ever had to add any delays between commands. The USB
stack should be capable of forwarding the commands just fine. So unless
the receiver is of a different hardware (but same VID/PID) that might
expose a problem elsewhere (in the USB controller maybe???). Just a long
shot, but maybe getting the config of the impacted users and what are
the USB controllers/drivers they are using might gives us a better
understanding.

Cheers,
Benjamin
Mark Lord June 5, 2023, 4:25 p.m. UTC | #6
On 2023-06-05 10:27 AM, Benjamin Tissoires wrote:
> 
> On Jun 05 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
>>
>> On 03.06.23 14:41, Jiri Kosina wrote:
>>> On Wed, 31 May 2023, Jiri Kosina wrote:
>>>
>>>>> If an attempt at contacting a receiver or a device fails because the
>>>>> receiver or device never responds, don't restart the communication, only
>>>>> restart it if the receiver or device answers that it's busy, as originally
>>>>> intended.
>>>>>
>>>>> This was the behaviour on communication timeout before commit 586e8fede795
>>>>> ("HID: logitech-hidpp: Retry commands when device is busy").
>>>>>
>>>>> This fixes some overly long waits in a critical path on boot, when
>>>>> checking whether the device is connected by getting its HID++ version.
>>>>>
>>>>> Signed-off-by: Bastien Nocera <hadess@hadess.net>
>>>>> Suggested-by: Mark Lord <mlord@pobox.com>
>>>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
>>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
>>> [...]  
>>>>
>>>> I have applied this even before getting confirmation from the reporters in 
>>>> bugzilla, as it's the right thing to do anyway.
>>>
>>> Unfortunately it doesn't seem to cure the reported issue (while reverting 
>>> 586e8fede79 does):
>>
>> BTW, remind me again: was fixing this by reverting 586e8fede79 for now a
>> option? I guess it's not, but if I'm wrong I wonder if that might at
>> this point be the best way forward.
> 
> Could be. I don't think we thought at simply reverting it because it is
> required for some new supoprted devices because they might differ
> slightly from what we currently supported.
> 
> That being said, Bastien will be unavailable for at least a week AFAIU,
> so maybe we should revert that patch.
> 
>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=217523#c2

Pardon me, but I don't understand what that "retry" loop
is even attempting to do inside function hidpp_send_message_sync().

It appears to unconditionally loop until:
   (1) the __hidpp_send_report() fails,
or (2) it gets a HIDPP_ERROR,
or (3) it gets a HIDPP20_ERROR other than HIDPP20_ERROR_BUSY,
or (4) until it has looped 3 times, which appears to be the normal exit.

It doesn't seem to have any provision to exit the loop earlier on "success"
(whatever that is).

And so when it finally does exit after the 3 iterations,
it then returns the last value of "ret",
which will be zero from the __hidpp_send_report() call,
or sometimes the most recent non-BUSY HIDPP20_ERROR seen.

Obviously I'm missing something, as otherwise this code would never have
passed review and made it into the Linux kernel in the first place.  Right?

What is this code trying to do?  And what am I not seeing?
Mark Lord June 5, 2023, 5:47 p.m. UTC | #7
On 2023-06-05 01:04 PM, Benjamin Tissoires wrote:
> 
> 
> On Jun 05 2023, Mark Lord wrote:
>>
>> On 2023-06-05 10:27 AM, Benjamin Tissoires wrote:
>>>
>>> On Jun 05 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>
>>>> On 03.06.23 14:41, Jiri Kosina wrote:
>>>>> On Wed, 31 May 2023, Jiri Kosina wrote:
>>>>>
>>>>>>> If an attempt at contacting a receiver or a device fails because the
>>>>>>> receiver or device never responds, don't restart the communication, only
>>>>>>> restart it if the receiver or device answers that it's busy, as originally
>>>>>>> intended.
>>>>>>>
>>>>>>> This was the behaviour on communication timeout before commit 586e8fede795
>>>>>>> ("HID: logitech-hidpp: Retry commands when device is busy").
>>>>>>>
>>>>>>> This fixes some overly long waits in a critical path on boot, when
>>>>>>> checking whether the device is connected by getting its HID++ version.
>>>>>>>
>>>>>>> Signed-off-by: Bastien Nocera <hadess@hadess.net>
>>>>>>> Suggested-by: Mark Lord <mlord@pobox.com>
>>>>>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
>>>>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
>>>>> [...]  
>>>>>>
>>>>>> I have applied this even before getting confirmation from the reporters in 
>>>>>> bugzilla, as it's the right thing to do anyway.
>>>>>
>>>>> Unfortunately it doesn't seem to cure the reported issue (while reverting 
>>>>> 586e8fede79 does):
>>>>
>>>> BTW, remind me again: was fixing this by reverting 586e8fede79 for now a
>>>> option? I guess it's not, but if I'm wrong I wonder if that might at
>>>> this point be the best way forward.
>>>
>>> Could be. I don't think we thought at simply reverting it because it is
>>> required for some new supoprted devices because they might differ
>>> slightly from what we currently supported.
>>>
>>> That being said, Bastien will be unavailable for at least a week AFAIU,
>>> so maybe we should revert that patch.
>>>
>>>>
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217523#c2
>>
>> Pardon me, but I don't understand what that "retry" loop
>> is even attempting to do inside function hidpp_send_message_sync().
>>
>> It appears to unconditionally loop until:
>>    (1) the __hidpp_send_report() fails,
>> or (2) it gets a HIDPP_ERROR,
>> or (3) it gets a HIDPP20_ERROR other than HIDPP20_ERROR_BUSY,
>> or (4) until it has looped 3 times, which appears to be the normal exit.
>>
>> It doesn't seem to have any provision to exit the loop earlier on "success"
>> (whatever that is).
>>
>> And so when it finally does exit after the 3 iterations,
>> it then returns the last value of "ret",
>> which will be zero from the __hidpp_send_report() call,
>> or sometimes the most recent non-BUSY HIDPP20_ERROR seen.
>>
>> Obviously I'm missing something, as otherwise this code would never have
>> passed review and made it into the Linux kernel in the first place.  Right?
> 
> Ouch, very much ouch :(
> 
> So that one is on me, sorry, I completely missed that and trusted the
> local tests. We are human and can make mistakes. And TBH a lot of people
> have been staring at this code for a while now without noticing that.
> 
> Would you mind giving a shot at the following (untested) patch:
> 
> ---
>>From 7d03a6b765571d7c518c85e7e74f6f9d498fa354 Mon Sep 17 00:00:00 2001
> From: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> Date: Mon, 5 Jun 2023 18:56:36 +0200
> Subject: [PATCH] HID: hidpp: terminate retry loop on success
> 
> It seems we forgot the normal case to terminate the retry loop,
> making us asking 3 times each command, which is probably a little bit
> too much.
> 
> And remove the ugly "goto exit" that can be replaced by a simpler "break"
> 
> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
> Suggested-by: Mark Lord <mlord@pobox.com>
> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> ---
>  drivers/hid/hid-logitech-hidpp.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
> index 2246044b1639..5e1a412fd28f 100644
> --- a/drivers/hid/hid-logitech-hidpp.c
> +++ b/drivers/hid/hid-logitech-hidpp.c
> @@ -286,7 +286,7 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
>  	struct hidpp_report *message,
>  	struct hidpp_report *response)
>  {
> -	int ret;
> +	int ret = -1;
>  	int max_retries = 3;
>  
>  	mutex_lock(&hidpp->send_mutex);
> @@ -300,13 +300,13 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
>  	 */
>  	*response = *message;
>  
> -	for (; max_retries != 0; max_retries--) {
> +	for (; max_retries != 0 && ret; max_retries--) {
>  		ret = __hidpp_send_report(hidpp->hid_dev, message);
>  
>  		if (ret) {
>  			dbg_hid("__hidpp_send_report returned err: %d\n", ret);
>  			memset(response, 0, sizeof(struct hidpp_report));
> -			goto exit;
> +			break;
>  		}
>  
>  		if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
> @@ -314,14 +314,14 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
>  			dbg_hid("%s:timeout waiting for response\n", __func__);
>  			memset(response, 0, sizeof(struct hidpp_report));
>  			ret = -ETIMEDOUT;
> -			goto exit;
> +			break;
>  		}
>  
>  		if (response->report_id == REPORT_ID_HIDPP_SHORT &&
>  		    response->rap.sub_id == HIDPP_ERROR) {
>  			ret = response->rap.params[1];
>  			dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
> -			goto exit;
> +			break;
>  		}
>  
>  		if ((response->report_id == REPORT_ID_HIDPP_LONG ||
> @@ -330,13 +330,12 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
>  			ret = response->fap.params[1];
>  			if (ret != HIDPP20_ERROR_BUSY) {
>  				dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__, ret);
> -				goto exit;
> +				break;
>  			}
>  			dbg_hid("%s:got busy hidpp 2.0 error %02X, retrying\n", __func__, ret);
>  		}
>  	}
>  
> -exit:
>  	mutex_unlock(&hidpp->send_mutex);
>  	return ret;
>  
> 

Tested-by: Mark Lord <mlord@pobox.com>

That works fine for me on top of 6.3.6,
and I don't even see the ETIMEDOUT happening there either (added a printk() for it).

I am unable to test on 6.4.0-rc5, because that kernel doesn't work with my USB3 docking station.

Cheers
Jiri Kosina June 5, 2023, 8:03 p.m. UTC | #8
On Mon, 5 Jun 2023, Mark Lord wrote:

> Tested-by: Mark Lord <mlord@pobox.com>
> 
> That works fine for me on top of 6.3.6, and I don't even see the 
> ETIMEDOUT happening there either (added a printk() for it).
> 
> I am unable to test on 6.4.0-rc5, because that kernel doesn't work with 
> my USB3 docking station.

Thanks a lot, Mark!

Applied and will be sending to Linus soon.
Greg Kroah-Hartman June 7, 2023, 9:46 a.m. UTC | #9
On Tue, Jun 06, 2023 at 08:37:26PM +0200, Benjamin Tissoires wrote:
> On Tue, Jun 6, 2023 at 8:18 PM Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
> >
> >
> >
> > On 06.06.23 15:27, Jiri Kosina wrote:
> > > On Mon, 5 Jun 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> > >
> > >>>>> If an attempt at contacting a receiver or a device fails because the
> > >>>>> receiver or device never responds, don't restart the communication, only
> > >>>>> restart it if the receiver or device answers that it's busy, as originally
> > >>>>> intended.
> > >>>>>
> > >>>>> This was the behaviour on communication timeout before commit 586e8fede795
> > >>>>> ("HID: logitech-hidpp: Retry commands when device is busy").
> > >>>>>
> > >>>>> This fixes some overly long waits in a critical path on boot, when
> > >>>>> checking whether the device is connected by getting its HID++ version.
> > >>>>>
> > >>>>> Signed-off-by: Bastien Nocera <hadess@hadess.net>
> > >>>>> Suggested-by: Mark Lord <mlord@pobox.com>
> > >>>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
> > >>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
> > >>> [...]
> > >>>>
> > >>>> I have applied this even before getting confirmation from the reporters in
> > >>>> bugzilla, as it's the right thing to do anyway.
> > >>>
> > >>> Unfortunately it doesn't seem to cure the reported issue (while reverting
> > >>> 586e8fede79 does):
> > >>
> > >> BTW, remind me again: was fixing this by reverting 586e8fede79 for now a
> > >> option? I guess it's not, but if I'm wrong I wonder if that might at
> > >> this point be the best way forward.
> > >
> > > This should now all be fixed by
> > >
> > >     https://git.kernel.org/linus/7c28afd5512e371773dbb2bf95a31ed5625651d9
> >
> > Jiri, Benjamin, many many thx for working on this.
> >
> > Hmmm. No CC: <stable... tag.
> >
> > Should we ask Greg to pick this up for 6.3 now, or better wait a few
> > days? He currently already has 6199d23c91ce ("HID: logitech-hidpp:
> > Handle timeout differently from busy") in his queue for the next 6.3.y
> > release.
> 
> Well, the Fixes: tag supposedly is enough to let the stable folks to
> pick it up.

No, not at all, please see:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

(hint, we need a cc: stable@ in the signed-off-by area.)

We only pick up stuff with "Fixes:" semi-often, sometimes never,
depending on our workload.  Never rely on that.

It's been this way for 18+ years now, nothing new :)

> But you are right, let's Cc Greg for a quicker inclusion
> in the 6.3 tree.
> 
> Greg, would you mind adding the commit above (7c28afd5512e37) onto the
> 6.3 stable queue? Thanks!

Now queued up, thanks.

greg k-h
Benjamin Tissoires June 7, 2023, 10:07 a.m. UTC | #10
On Wed, Jun 7, 2023 at 11:46 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Jun 06, 2023 at 08:37:26PM +0200, Benjamin Tissoires wrote:
> > On Tue, Jun 6, 2023 at 8:18 PM Linux regression tracking (Thorsten
> > Leemhuis) <regressions@leemhuis.info> wrote:
> > >
> > >
> > >
> > > On 06.06.23 15:27, Jiri Kosina wrote:
> > > > On Mon, 5 Jun 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> > > >
> > > >>>>> If an attempt at contacting a receiver or a device fails because the
> > > >>>>> receiver or device never responds, don't restart the communication, only
> > > >>>>> restart it if the receiver or device answers that it's busy, as originally
> > > >>>>> intended.
> > > >>>>>
> > > >>>>> This was the behaviour on communication timeout before commit 586e8fede795
> > > >>>>> ("HID: logitech-hidpp: Retry commands when device is busy").
> > > >>>>>
> > > >>>>> This fixes some overly long waits in a critical path on boot, when
> > > >>>>> checking whether the device is connected by getting its HID++ version.
> > > >>>>>
> > > >>>>> Signed-off-by: Bastien Nocera <hadess@hadess.net>
> > > >>>>> Suggested-by: Mark Lord <mlord@pobox.com>
> > > >>>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
> > > >>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
> > > >>> [...]
> > > >>>>
> > > >>>> I have applied this even before getting confirmation from the reporters in
> > > >>>> bugzilla, as it's the right thing to do anyway.
> > > >>>
> > > >>> Unfortunately it doesn't seem to cure the reported issue (while reverting
> > > >>> 586e8fede79 does):
> > > >>
> > > >> BTW, remind me again: was fixing this by reverting 586e8fede79 for now a
> > > >> option? I guess it's not, but if I'm wrong I wonder if that might at
> > > >> this point be the best way forward.
> > > >
> > > > This should now all be fixed by
> > > >
> > > >     https://git.kernel.org/linus/7c28afd5512e371773dbb2bf95a31ed5625651d9
> > >
> > > Jiri, Benjamin, many many thx for working on this.
> > >
> > > Hmmm. No CC: <stable... tag.
> > >
> > > Should we ask Greg to pick this up for 6.3 now, or better wait a few
> > > days? He currently already has 6199d23c91ce ("HID: logitech-hidpp:
> > > Handle timeout differently from busy") in his queue for the next 6.3.y
> > > release.
> >
> > Well, the Fixes: tag supposedly is enough to let the stable folks to
> > pick it up.
>
> No, not at all, please see:
>     https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> for how to do this properly.
>
> (hint, we need a cc: stable@ in the signed-off-by area.)
>
> We only pick up stuff with "Fixes:" semi-often, sometimes never,
> depending on our workload.  Never rely on that.

Oh right. Given that those patches eventually end up in stable sooner
or later I made the shortcut in my head. Thanks for correcting that :)

>
> It's been this way for 18+ years now, nothing new :)
>
> > But you are right, let's Cc Greg for a quicker inclusion
> > in the 6.3 tree.
> >
> > Greg, would you mind adding the commit above (7c28afd5512e37) onto the
> > 6.3 stable queue? Thanks!
>
> Now queued up, thanks.

Great thanks!

Cheers,
Benjamin
Linux regression tracking (Thorsten Leemhuis) June 15, 2023, 7:24 a.m. UTC | #11
On 06.06.23 20:18, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 06.06.23 15:27, Jiri Kosina wrote:
>> On Mon, 5 Jun 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>>> If an attempt at contacting a receiver or a device fails because the
>>>>>> receiver or device never responds, don't restart the communication, only
>>>>>> restart it if the receiver or device answers that it's busy, as originally
>>>>>> intended.
>>>>>> Signed-off-by: Bastien Nocera <hadess@hadess.net>
> [...]  
>>>> Unfortunately it doesn't seem to cure the reported issue (while reverting 
>>>> 586e8fede79 does):
> [...]
>> This should now all be fixed by
>>
>>     https://git.kernel.org/linus/7c28afd5512e371773dbb2bf95a31ed5625651d9
> 
> Jiri, Benjamin, many many thx for working on this.

FWIW, it seems things work for many people, but something still is not
completely right:

```
https://bugzilla.kernel.org/show_bug.cgi?id=217412

--- Comment #47 from Mark Blakeney ---
@Juha, kernel 6.3.7 adds the 2 patches intended to fix this bug and the
startup delay is now gone. However, I have had 2 cases over the last 5
days in which I have been running 6.3.7 where my mouse fails to be
detected at all after startup. I have to pull the Logitech receiver
out/in to get the mouse working. Never seen this issue before so I
suspect the patches are not right.
```

Ciao, Thorsten
Mark Lord June 15, 2023, 11:24 a.m. UTC | #12
On 2023-06-15 03:24 AM, Linux regression tracking (Thorsten Leemhuis) wrote:
>
...
> https://bugzilla.kernel.org/show_bug.cgi?id=217412
> 
> --- Comment #47 from Mark Blakeney ---
> @Juha, kernel 6.3.7 adds the 2 patches intended to fix this bug and the
> startup delay is now gone. However, I have had 2 cases over the last 5
> days in which I have been running 6.3.7 where my mouse fails to be
> detected at all after startup. I have to pull the Logitech receiver
> out/in to get the mouse working. Never seen this issue before so I
> suspect the patches are not right.
> ```

I too have had that happen with recent kernels, but have not yet put
a finger to a specific version or cause.

Just toggling the power button on the wireless mouse is enough for
it to "re-appear".

The 5.4.xx kernels never had this issue.  I went straight from those
to the 6.3.xx ones, where it does happen sometimes, both with and without
the recent "delay" fixes.
Linux regression tracking (Thorsten Leemhuis) June 21, 2023, 2:03 p.m. UTC | #13
On 15.06.23 13:24, Mark Lord wrote:
> On 2023-06-15 03:24 AM, Linux regression tracking (Thorsten Leemhuis) wrote:
> ...
>> https://bugzilla.kernel.org/show_bug.cgi?id=217412
>>
>> --- Comment #47 from Mark Blakeney ---
>> @Juha, kernel 6.3.7 adds the 2 patches intended to fix this bug and the
>> startup delay is now gone. However, I have had 2 cases over the last 5
>> days in which I have been running 6.3.7 where my mouse fails to be
>> detected at all after startup. I have to pull the Logitech receiver
>> out/in to get the mouse working. Never seen this issue before so I
>> suspect the patches are not right.
>> ```
> 
> I too have had that happen with recent kernels,

Ahh, good to know!

> but have not yet put
> a finger to a specific version or cause.
> 
> Just toggling the power button on the wireless mouse is enough for
> it to "re-appear".
> 
> The 5.4.xx kernels never had this issue.  I went straight from those
> to the 6.3.xx ones, where it does happen sometimes, both with and without
> the recent "delay" fixes.
Benjamin Tissoires June 21, 2023, 3:10 p.m. UTC | #14
On Wed, Jun 21, 2023 at 4:09 PM Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> On 15.06.23 13:24, Mark Lord wrote:
> > On 2023-06-15 03:24 AM, Linux regression tracking (Thorsten Leemhuis) wrote:
> > ...
> >> https://bugzilla.kernel.org/show_bug.cgi?id=217412
> >>
> >> --- Comment #47 from Mark Blakeney ---
> >> @Juha, kernel 6.3.7 adds the 2 patches intended to fix this bug and the
> >> startup delay is now gone. However, I have had 2 cases over the last 5
> >> days in which I have been running 6.3.7 where my mouse fails to be
> >> detected at all after startup. I have to pull the Logitech receiver
> >> out/in to get the mouse working. Never seen this issue before so I
> >> suspect the patches are not right.
> >> ```
> >
> > I too have had that happen with recent kernels,
>
> Ahh, good to know!
>
> > but have not yet put
> > a finger to a specific version or cause.
> >
> > Just toggling the power button on the wireless mouse is enough for
> > it to "re-appear".
> >
> > The 5.4.xx kernels never had this issue.  I went straight from those
> > to the 6.3.xx ones, where it does happen sometimes, both with and without
> > the recent "delay" fixes.
>
> From Mark Blakeney it sounded a lot like this is something that started
> with 6.3, but would be good to confirm. Which brings me to the reason
> why I write this mail:
>
> Is anyone still working on this? There was radio silence already for a
> week now. Okay, it's not really urgent, but I guess this should not fall
> through the cracks.

Sorry for the radio silence. I worked on hidpp yesterday and submitted
a new patch for a problem that was overlooked in the previous fixes:
https://lore.kernel.org/linux-input/20230621-logitech-fixes-v1-1-32e70933c0b0@redhat.com/

The loop was not properly initializing all its local states, meaning
that when we encountered a "please retry" from the device, we could
never do the actual retry and returned a different error than in the
6.2 series.

Would anyone give it a shot?

Cheers,
Benjamin

>
> Bastian, are you back?
>
> If not: Does anyone know if there is hope that Bastien will be able to
> look into this in the not too far future?
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
Linux regression tracking (Thorsten Leemhuis) June 21, 2023, 5:19 p.m. UTC | #15
On 21.06.23 17:10, Benjamin Tissoires wrote:
> On Wed, Jun 21, 2023 at 4:09 PM Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
>>
>> On 15.06.23 13:24, Mark Lord wrote:
>>> On 2023-06-15 03:24 AM, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> ...
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217412
>>>>
>>>> --- Comment #47 from Mark Blakeney ---
>>>> @Juha, kernel 6.3.7 adds the 2 patches intended to fix this bug and the
>>>> startup delay is now gone. However, I have had 2 cases over the last 5
>>>> days in which I have been running 6.3.7 where my mouse fails to be
>>>> detected at all after startup. I have to pull the Logitech receiver
>>>> out/in to get the mouse working. Never seen this issue before so I
>>>> suspect the patches are not right.
>>>> ```
>>>
>>> I too have had that happen with recent kernels,
>>
>> Ahh, good to know!
>>
>>> but have not yet put
>>> a finger to a specific version or cause.
>>>
>>> Just toggling the power button on the wireless mouse is enough for
>>> it to "re-appear".
>>>
>>> The 5.4.xx kernels never had this issue.  I went straight from those
>>> to the 6.3.xx ones, where it does happen sometimes, both with and without
>>> the recent "delay" fixes.
>>
>> From Mark Blakeney it sounded a lot like this is something that started
>> with 6.3, but would be good to confirm. Which brings me to the reason
>> why I write this mail:
>>
>> Is anyone still working on this? There was radio silence already for a
>> week now. Okay, it's not really urgent, but I guess this should not fall
>> through the cracks.
> 
> Sorry for the radio silence.

No worries, we all have much to do. But I thought it was time for a
quick status inquiry.

> I worked on hidpp yesterday and submitted
> a new patch for a problem that was overlooked in the previous fixes:
> https://lore.kernel.org/linux-input/20230621-logitech-fixes-v1-1-32e70933c0b0@redhat.com/

Great, many thx!

> The loop was not properly initializing all its local states, meaning
> that when we encountered a "please retry" from the device, we could
> never do the actual retry and returned a different error than in the
> 6.2 series.
> 
> Would anyone give it a shot?

Might be worth mentioning it in the bugzilla ticket, as I sadly can't CC
those people here. I'll do this for once (konstantin's bugbot will soon
solve this problem...)

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
diff mbox series

Patch

diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
index 0fcfd85fea0f..2246044b1639 100644
--- a/drivers/hid/hid-logitech-hidpp.c
+++ b/drivers/hid/hid-logitech-hidpp.c
@@ -314,6 +314,7 @@  static int hidpp_send_message_sync(struct hidpp_device *hidpp,
 			dbg_hid("%s:timeout waiting for response\n", __func__);
 			memset(response, 0, sizeof(struct hidpp_report));
 			ret = -ETIMEDOUT;
+			goto exit;
 		}
 
 		if (response->report_id == REPORT_ID_HIDPP_SHORT &&