diff mbox series

[v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Message ID 20231006155646.12938-1-hgajjar@de.adit-jv.com
State Superseded
Headers show
Series [v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach | expand

Commit Message

Hardik Gajjar Oct. 6, 2023, 3:56 p.m. UTC
This patch replaces the usage of netif_stop_queue with netif_device_detach
in the u_ether driver. The netif_device_detach function not only stops all
tx queues by calling netif_tx_stop_all_queues but also marks the device as
removed by clearing the __LINK_STATE_PRESENT bit.

This change helps notify user space about the disconnection of the device
more effectively, compared to netif_stop_queue, which only stops a single
transmit queue.

Signed-off-by: Hardik Gajjar <hgajjar@de.adit-jv.com>
---
Changes since version 1:
	- Correct Singed-off user name and e-mail

Changes since version 2:
	- Move change history below signed-off-by

Changes since version 3:
	- move netif_device_detach from eth_stop to gether_disconnect.
---
 drivers/usb/gadget/function/u_ether.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andy Shevchenko Jan. 14, 2024, 4:59 p.m. UTC | #1
+Cc: Ferry.

On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> This patch replaces the usage of netif_stop_queue with netif_device_detach
> in the u_ether driver. The netif_device_detach function not only stops all
> tx queues by calling netif_tx_stop_all_queues but also marks the device as
> removed by clearing the __LINK_STATE_PRESENT bit.
> 
> This change helps notify user space about the disconnection of the device
> more effectively, compared to netif_stop_queue, which only stops a single
> transmit queue.

This change effectively broke my USB ether setup.

git bisect start
# status: waiting for both good and bad commits
# good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
# status: waiting for bad commit, 1 good commit known
# bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 'usb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
# bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix wrong data added to platform device
git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
# good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: add a bandwidth budget table
git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
# good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: gadget: uvc: cleanup request when not in correct state"
git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
# bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: tps6598x: Refactor tps6598x port registration
git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
# bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach
git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
# good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix stub_dev hub disconnect
git bisect good 97475763484245916735a1aa9a3310a01d46b008
# good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix a type promotion bug
git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
# first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach

Note, revert indeed helps. Should I send a revert?

I use configfs to setup USB EEM function and it worked till this commit.
If needed, I can share my scripts, but I believe it's not needed as here
we see a clear regression.
Ferry Toth April 3, 2024, 9:01 p.m. UTC | #2
Hi,

Op 15-01-2024 om 21:10 schreef Ferry Toth:
> Hi,
>
> Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
>> On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
>>> +Cc: Ferry.
>>>
>>> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
>>>> This patch replaces the usage of netif_stop_queue with 
>>>> netif_device_detach
>>>> in the u_ether driver. The netif_device_detach function not only 
>>>> stops all
>>>> tx queues by calling netif_tx_stop_all_queues but also marks the 
>>>> device as
>>>> removed by clearing the __LINK_STATE_PRESENT bit.
>>>>
>>>> This change helps notify user space about the disconnection of the 
>>>> device
>>>> more effectively, compared to netif_stop_queue, which only stops a 
>>>> single
>>>> transmit queue.
>>>
>>> This change effectively broke my USB ether setup.
>>>
>>> git bisect start
>>> # status: waiting for both good and bad commits
>>> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 
>>> 'tty-6.7-rc1' of 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>>> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
>>> # status: waiting for bad commit, 1 good commit known
>>> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 
>>> 'usb-6.7-rc1' of 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
>>> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
>>> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix 
>>> wrong data added to platform device
>>> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
>>> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: 
>>> add a bandwidth budget table
>>> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
>>> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: 
>>> gadget: uvc: cleanup request when not in correct state"
>>> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
>>> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: 
>>> tps6598x: Refactor tps6598x port registration
>>> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
>>> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: 
>>> u_ether: Replace netif_stop_queue with netif_device_detach
>>> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
>>> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix 
>>> stub_dev hub disconnect
>>> git bisect good 97475763484245916735a1aa9a3310a01d46b008
>>> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix 
>>> a type promotion bug
>>> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
>>> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: 
>>> gadget: u_ether: Replace netif_stop_queue with netif_device_detach
>>>
>>> Note, revert indeed helps. Should I send a revert?
>>>
>>> I use configfs to setup USB EEM function and it worked till this 
>>> commit.
>>> If needed, I can share my scripts, but I believe it's not needed as 
>>> here
>>> we see a clear regression.
>>>
>>> -- 
>>> With Best Regards,
>>> Andy Shevchenko
>>>
>>>
>>
>> Without this patch, there may be a potential crash in a race 
>> condition, as __LINK_STATE_PRESENT is monitored at many places in the 
>> Network stack to determine the status of the link.
>>
>> Could you please provide details on how this patch affects your 
>> functionality? Are you experiencing connection problems or data 
>> transfer interruptions?
>
> In my case on mrfld (Intel Edison Arduino) using configfs with this 
> patch no config from host through dhcp is received. Manual setting 
> correct ipv4 addr / mask / gw still no connection.
>
>> Instead of reverting this patch, consider trying the upcoming patch 
>> (soon to be available in the mainline) to see if it resolves your issue.
>>
>> https://lore.kernel.org/lkml/2023122900-commence-agenda-db2c@gregkh/T/#m36a812d3f1e5d744ee32381f6ae4185940b376de 
>>
>
> This patch works for me with v6.7.0.

I need to revisit this. The patch in this topic landed in v6.7.0-rc1 
(f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and 
other platforms as well.

The mentioned fix "usb: gadget: u_ether: Re-attach netif device to 
mirror detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does 
fix: I am able to make a USB EEM function again.

However, now a hidden issue appears. With mrfld there is an external 
switch to easily switch between host and device mode.

What is not fixed:

- when in device mode and unplugging/plugging the cable when using 
`ifconfig usb0` the line "usb0: 
flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to "usb0: 
flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route table 
is updated and the dir `/sys/class/net/usb0` exists and in the dir `cat 
carrier*` shows the carrier up and down counts. This is the expected 
behavior.

- when in device mode and switching to host mode `ifconfig usb0` 
continues to show "RUNNING", the route table is not modified and the dir 
`/sys/class/net/usb0` no longer exists.

- switching to device mode again, USB EEM works fine, no changes to 
RUNNING or the route table happen and the dir `/sys/class/net/usb0` 
still is non- existing.

- unplugging/plugging the cable in device mode after this does not 
restore the original situation.

This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential 
patches on top) and bisected back to this patch in v6.70-rc1.

It seems `netif_device_detach` does not completely clean up as expected 
and `netif_device_attach` does not completely rebuild.

I am wondering if on other platforms this can be reproduced? If so, inho 
it would be best to revert the both patches until the issue is resolved.

Thanks,

Ferry

>> Thanks,
>> Hardik
>
Hardik Gajjar April 5, 2024, 11:38 a.m. UTC | #3
On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote:
> Hi,
> 
> Op 15-01-2024 om 21:10 schreef Ferry Toth:
> > Hi,
> > 
> > Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
> > > On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
> > > > +Cc: Ferry.
> > > > 
> > > > On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
> > > > > This patch replaces the usage of netif_stop_queue with
> > > > > netif_device_detach
> > > > > in the u_ether driver. The netif_device_detach function not
> > > > > only stops all
> > > > > tx queues by calling netif_tx_stop_all_queues but also marks
> > > > > the device as
> > > > > removed by clearing the __LINK_STATE_PRESENT bit.
> > > > > 
> > > > > This change helps notify user space about the disconnection
> > > > > of the device
> > > > > more effectively, compared to netif_stop_queue, which only
> > > > > stops a single
> > > > > transmit queue.
> > > > 
> > > > This change effectively broke my USB ether setup.
> > > > 
> > > > git bisect start
> > > > # status: waiting for both good and bad commits
> > > > # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag
> > > > 'tty-6.7-rc1' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> > > > git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
> > > > # status: waiting for bad commit, 1 good commit known
> > > > # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag
> > > > 'usb-6.7-rc1' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
> > > > git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
> > > > # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix
> > > > wrong data added to platform device
> > > > git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
> > > > # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb:
> > > > xhci-mtk: add a bandwidth budget table
> > > > git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
> > > > # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb:
> > > > gadget: uvc: cleanup request when not in correct state"
> > > > git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
> > > > # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec:
> > > > tps6598x: Refactor tps6598x port registration
> > > > git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
> > > > # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget:
> > > > u_ether: Replace netif_stop_queue with netif_device_detach
> > > > git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
> > > > # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip:
> > > > fix stub_dev hub disconnect
> > > > git bisect good 97475763484245916735a1aa9a3310a01d46b008
> > > > # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest:
> > > > fix a type promotion bug
> > > > git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
> > > > # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918]
> > > > usb: gadget: u_ether: Replace netif_stop_queue with
> > > > netif_device_detach
> > > > 
> > > > Note, revert indeed helps. Should I send a revert?
> > > > 
> > > > I use configfs to setup USB EEM function and it worked till this
> > > > commit.
> > > > If needed, I can share my scripts, but I believe it's not needed
> > > > as here
> > > > we see a clear regression.
> > > > 
> > > > -- 
> > > > With Best Regards,
> > > > Andy Shevchenko
> > > > 
> > > > 
> > > 
> > > Without this patch, there may be a potential crash in a race
> > > condition, as __LINK_STATE_PRESENT is monitored at many places in
> > > the Network stack to determine the status of the link.
> > > 
> > > Could you please provide details on how this patch affects your
> > > functionality? Are you experiencing connection problems or data
> > > transfer interruptions?
> > 
> > In my case on mrfld (Intel Edison Arduino) using configfs with this
> > patch no config from host through dhcp is received. Manual setting
> > correct ipv4 addr / mask / gw still no connection.
> > 
> > > Instead of reverting this patch, consider trying the upcoming patch
> > > (soon to be available in the mainline) to see if it resolves your
> > > issue.
> > > 
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e=
> > > 
> > 
> > This patch works for me with v6.7.0.
> 
> I need to revisit this. The patch in this topic landed in v6.7.0-rc1
> (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other
> platforms as well.
> 
> The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror
> detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am
> able to make a USB EEM function again.
> 
> However, now a hidden issue appears. With mrfld there is an external switch
> to easily switch between host and device mode.
> 
> What is not fixed:
> 
> - when in device mode and unplugging/plugging the cable when using `ifconfig
> usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to
> "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route
> table is updated and the dir `/sys/class/net/usb0` exists and in the dir
> `cat carrier*` shows the carrier up and down counts. This is the expected
> behavior.
> 
> - when in device mode and switching to host mode `ifconfig usb0` continues
> to show "RUNNING", the route table is not modified and the dir
> `/sys/class/net/usb0` no longer exists.
> 
> - switching to device mode again, USB EEM works fine, no changes to RUNNING
> or the route table happen and the dir `/sys/class/net/usb0` still is non-
> existing.
> 
> - unplugging/plugging the cable in device mode after this does not restore
> the original situation.
> 
> This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential
> patches on top) and bisected back to this patch in v6.70-rc1.
> 
> It seems `netif_device_detach` does not completely clean up as expected and
> `netif_device_attach` does not completely rebuild.
> 
> I am wondering if on other platforms this can be reproduced? If so, inho it
> would be best to revert the both patches until the issue is resolved.
> 
> Thanks,
> 
> Ferry

I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available.
This behavior seems not correct.

The gether_cleanup function should remove the interface along with the associated kobject,
and this function should only be called during the unloading of the driver or deleting the gadget.
Could you please confirm whether any of your custom modifications are removing the net interface kobject?

> 
> > > Thanks,
> > > Hardik
> >
Ferry Toth April 7, 2024, 8:51 p.m. UTC | #4
Hi

Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote:
>> Hi,
>>
>> Op 15-01-2024 om 21:10 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 15-01-2024 om 14:27 schreef Hardik Gajjar:
>>>> On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote:
>>>>> +Cc: Ferry.
>>>>>
>>>>> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote:
>>>>>> This patch replaces the usage of netif_stop_queue with
>>>>>> netif_device_detach
>>>>>> in the u_ether driver. The netif_device_detach function not
>>>>>> only stops all
>>>>>> tx queues by calling netif_tx_stop_all_queues but also marks
>>>>>> the device as
>>>>>> removed by clearing the __LINK_STATE_PRESENT bit.
>>>>>>
>>>>>> This change helps notify user space about the disconnection
>>>>>> of the device
>>>>>> more effectively, compared to netif_stop_queue, which only
>>>>>> stops a single
>>>>>> transmit queue.
>>>>> This change effectively broke my USB ether setup.
>>>>>
>>>>> git bisect start
>>>>> # status: waiting for both good and bad commits
>>>>> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag
>>>>> 'tty-6.7-rc1' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
>>>>> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923
>>>>> # status: waiting for bad commit, 1 good commit known
>>>>> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag
>>>>> 'usb-6.7-rc1' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
>>>>> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b
>>>>> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix
>>>>> wrong data added to platform device
>>>>> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d
>>>>> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb:
>>>>> xhci-mtk: add a bandwidth budget table
>>>>> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96
>>>>> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb:
>>>>> gadget: uvc: cleanup request when not in correct state"
>>>>> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1
>>>>> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec:
>>>>> tps6598x: Refactor tps6598x port registration
>>>>> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59
>>>>> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget:
>>>>> u_ether: Replace netif_stop_queue with netif_device_detach
>>>>> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918
>>>>> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip:
>>>>> fix stub_dev hub disconnect
>>>>> git bisect good 97475763484245916735a1aa9a3310a01d46b008
>>>>> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest:
>>>>> fix a type promotion bug
>>>>> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7
>>>>> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918]
>>>>> usb: gadget: u_ether: Replace netif_stop_queue with
>>>>> netif_device_detach
>>>>>
>>>>> Note, revert indeed helps. Should I send a revert?
>>>>>
>>>>> I use configfs to setup USB EEM function and it worked till this
>>>>> commit.
>>>>> If needed, I can share my scripts, but I believe it's not needed
>>>>> as here
>>>>> we see a clear regression.
>>>>>
>>>>> -- 
>>>>> With Best Regards,
>>>>> Andy Shevchenko
>>>>>
>>>>>
>>>> Without this patch, there may be a potential crash in a race
>>>> condition, as __LINK_STATE_PRESENT is monitored at many places in
>>>> the Network stack to determine the status of the link.
>>>>
>>>> Could you please provide details on how this patch affects your
>>>> functionality? Are you experiencing connection problems or data
>>>> transfer interruptions?
>>> In my case on mrfld (Intel Edison Arduino) using configfs with this
>>> patch no config from host through dhcp is received. Manual setting
>>> correct ipv4 addr / mask / gw still no connection.
>>>
>>>> Instead of reverting this patch, consider trying the upcoming patch
>>>> (soon to be available in the mainline) to see if it resolves your
>>>> issue.
>>>>
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e=
>>>>
>>> This patch works for me with v6.7.0.
>> I need to revisit this. The patch in this topic landed in v6.7.0-rc1
>> (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other
>> platforms as well.
>>
>> The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror
>> detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am
>> able to make a USB EEM function again.
>>
>> However, now a hidden issue appears. With mrfld there is an external switch
>> to easily switch between host and device mode.
>>
>> What is not fixed:
>>
>> - when in device mode and unplugging/plugging the cable when using `ifconfig
>> usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to
>> "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route
>> table is updated and the dir `/sys/class/net/usb0` exists and in the dir
>> `cat carrier*` shows the carrier up and down counts. This is the expected
>> behavior.
>>
>> - when in device mode and switching to host mode `ifconfig usb0` continues
>> to show "RUNNING", the route table is not modified and the dir
>> `/sys/class/net/usb0` no longer exists.
>>
>> - switching to device mode again, USB EEM works fine, no changes to RUNNING
>> or the route table happen and the dir `/sys/class/net/usb0` still is non-
>> existing.
>>
>> - unplugging/plugging the cable in device mode after this does not restore
>> the original situation.
>>
>> This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential
>> patches on top) and bisected back to this patch in v6.70-rc1.
>>
>> It seems `netif_device_detach` does not completely clean up as expected and
>> `netif_device_attach` does not completely rebuild.
>>
>> I am wondering if on other platforms this can be reproduced? If so, inho it
>> would be best to revert the both patches until the issue is resolved.
>>
>> Thanks,
>>
>> Ferry
> I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available.
> This behavior seems not correct.

Exactly. And this didn't happen before the 2 patches.

To be precise: /sys/class/net/usb0 is not removed and it is a link, the 
link target 
/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no 
longer exists

> The gether_cleanup function should remove the interface along with the associated kobject,
> and this function should only be called during the unloading of the driver or deleting the gadget.
> Could you please confirm whether any of your custom modifications are removing the net interface kobject?

As far as know not. I have one tusb1210 (usb phy) and 2 dwc3 patches, 
nothing related to gadgets or net.

For reference, patches are here: 
https://github.com/htot/meta-intel-edison/tree/nanbield/meta-intel-edison-bsp/recipes-kernel/linux/files

0001-phy-ti-tusb1210-write-to-scratch-on-power-on.patch

0001a-usb-dwc3-core-Fix-dwc3_core_soft_reset-before-anythi.patch

044-REVERTME-usb-dwc3-gadget-skip-endpoints-ep-18-in-out.patch

Seems (just guessing) gether_cleanup is being called due to the switch 
to host mode, but cleanup only partly succeeds. Now I'm finding I can 
make the net interface disappear with `ip link set dev usb0 down` and 
the broken link goes away by destroying the gadget in configfs.

Is that intentional? Do I need to tear down the gadgets in reverse order 
as on create when switching to host mode? That would be new.

What happens when you trigger a switch to host mode without actually 
unplugging your cable?

>>>> Thanks,
>>>> Hardik
Andy Shevchenko April 11, 2024, 4:39 p.m. UTC | #5
On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:

...

> > > Exactly. And this didn't happen before the 2 patches.
> > > 
> > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> > > longer exists
> 
> So, it means that the /sys/class/net/usb0 is present, but the symlink is
> broken. In that case, the dwc3 driver should recreate the device, and the
> symlink should become active again
> I have the dwc3 IP base usb controller, Let me check with this patch and
> share result here.  May be we need some fix in dwc3

It's quite possible, please test on your side.
We are happy to test any fixes if you come up with.
Ferry Toth April 11, 2024, 8:52 p.m. UTC | #6
Hi

Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> ...
>
>>>> Exactly. And this didn't happen before the 2 patches.
>>>>
>>>> To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
>>>> longer exists
>> So, it means that the /sys/class/net/usb0 is present, but the symlink is
>> broken. In that case, the dwc3 driver should recreate the device, and the
>> symlink should become active again

Yes, on first enabling gadget (when device mode is activated):

root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
driver  net  power  sound  subsystem  suspended  uevent

Then switching to host mode:

root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
ls: cannot access 
'/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such 
file or directory

Then back to device mode:

root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
driver  power  sound  subsystem  suspended  uevent

net is missing. But, network functions:

root@yuna:~# ping 10.42.0.1
PING 10.42.0.1 (10.42.0.1): 56 data bytes

Mass storage device is created and removed each time as expected.

>> I have the dwc3 IP base usb controller, Let me check with this patch and
>> share result here.  May be we need some fix in dwc3
Would have been nice if someone could test on other controller as well. 
But another instance of dwc3 is also very welcome.
> It's quite possible, please test on your side.
> We are happy to test any fixes if you come up with.
>
Hardik Gajjar April 17, 2024, 3:13 p.m. UTC | #7
On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> 
> ...
> 
> > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > 
> > > > > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
> > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
> > > > > > longer exists
> > > > So, it means that the /sys/class/net/usb0 is present, but the symlink is
> > > > broken. In that case, the dwc3 driver should recreate the device, and the
> > > > symlink should become active again
> > 
> > Yes, on first enabling gadget (when device mode is activated):
> > 
> > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > driver  net  power  sound  subsystem  suspended  uevent
> > 
> > Then switching to host mode:
> > 
> > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > ls: cannot access
> > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file
> > or directory
> > 
> > Then back to device mode:
> > 
> > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > driver  power  sound  subsystem  suspended  uevent
> > 
> > net is missing. But, network functions:
> > 
> > root@yuna:~# ping 10.42.0.1
> > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > 
> > Mass storage device is created and removed each time as expected.
> 
> So, what's the conclusion? Shall we move towards revert of those two changes?


As promised, I have the tested the this patch with the dwc3 gadget. I could not reproduce
the issue. 

I can see the usb0 exist all the time and accessible regardless of the role switching of the USB mode (peripheral <-> host)

Following are the logs:
//Host to device

console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > mode
console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
usb0

//device to host
console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
usb0
s a800000.dwc3/gadget/net/usb0                                                <
addr_assign_type    duplex             phys_port_name
addr_len            flags              phys_switch_id
address             gro_flush_timeout  power
broadcast           ifalias            proto_down
carrier             ifindex            queues
carrier_changes     iflink             speed
carrier_down_count  link_mode          statistics
carrier_up_count    mtu                subsystem
dev_id              name_assign_type   tx_queue_len
dev_port            netdev_group       type
device              operstate          uevent
dormant             phys_port_id       waiting_for_supplier
console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 TX bytes:0

console:/sys/bus/platform/devices/a800000.ssusb #

I strongly advise against reverting the patch solely based on the observed issue of removing the /sys/class/net/usb0 directory while the usb0 interface remains available. 
Instead, I recommend enabling FTRACE to trace the functions involved and identify which faulty call is responsible for removing usb0.

According to current kernel architecture of u_ether driver, only gether_cleanup should remove the usb0 interface along with its kobject and sysfs interface.
I suggest sharing the analysis here to understand why this practice is not followed in your use case or driver ?

I am curious why the driver was developed without adhering to the kernel's gadget architecture.

> 
> > > > I have the dwc3 IP base usb controller, Let me check with this patch and
> > > > share result here.  May be we need some fix in dwc3
> > Would have been nice if someone could test on other controller as well. But
> > another instance of dwc3 is also very welcome.
> > > It's quite possible, please test on your side.
> > > We are happy to test any fixes if you come up with.
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>
Ferry Toth April 25, 2024, 9:27 p.m. UTC | #8
Hi,

Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>
>> ...
>>
>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>
>>>>>>> To be precise: /sys/class/net/usb0 is not removed and it is a link, the link
>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
>>>>>>> longer exists
>>>>> So, it means that the /sys/class/net/usb0 is present, but the symlink is
>>>>> broken. In that case, the dwc3 driver should recreate the device, and the
>>>>> symlink should become active again
>>>
>>> Yes, on first enabling gadget (when device mode is activated):
>>>
>>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>> driver  net  power  sound  subsystem  suspended  uevent
>>>
>>> Then switching to host mode:
>>>
>>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>> ls: cannot access
>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file
>>> or directory
>>>
>>> Then back to device mode:
>>>
>>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>> driver  power  sound  subsystem  suspended  uevent
>>>
>>> net is missing. But, network functions:
>>>
>>> root@yuna:~# ping 10.42.0.1
>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>
>>> Mass storage device is created and removed each time as expected.
>>
>> So, what's the conclusion? Shall we move towards revert of those two changes?
> 
> 
> As promised, I have the tested the this patch with the dwc3 gadget. I could not reproduce
> the issue.
> 
> I can see the usb0 exist all the time and accessible regardless of the role switching of the USB mode (peripheral <-> host)
> 
> Following are the logs:
> //Host to device
> 
> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > mode
> console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
> usb0
> 
> //device to host
> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/
> usb0

That is weird. When I switch to host mode (using the physical switch), 
the whole gadget directory is removed (now testing 6.9.0-rc5)

Switching back to device mode, that gadget directory is recreated. And 
gadget/sound as well, but not gadget/net.

> s a800000.dwc3/gadget/net/usb0                                                <
> addr_assign_type    duplex             phys_port_name
> addr_len            flags              phys_switch_id
> address             gro_flush_timeout  power
> broadcast           ifalias            proto_down
> carrier             ifindex            queues
> carrier_changes     iflink             speed
> carrier_down_count  link_mode          statistics
> carrier_up_count    mtu                subsystem
> dev_id              name_assign_type   tx_queue_len
> dev_port            netdev_group       type
> device              operstate          uevent
> dormant             phys_port_id       waiting_for_supplier
> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>            BROADCAST MULTICAST  MTU:1500  Metric:1
>            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>            collisions:0 txqueuelen:1000
>            RX bytes:0 TX bytes:0
> 
> console:/sys/bus/platform/devices/a800000.ssusb #
> 
> I strongly advise against reverting the patch solely based on the observed issue of removing the /sys/class/net/usb0 directory while the usb0 interface remains available.

There's more to it. I also mentioned that switching the role or 
unplugging the cable leaves the usb0 connection.

I have while in host mode:
root@yuna:~# ifconfig -a usb0
usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
         inet 10.42.0.221  netmask 255.255.255.0  broadcast 10.42.0.255
         inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 0x20<link>


You don't see that because you didn't create a connection at all.

> Instead, I recommend enabling FTRACE to trace the functions involved and identify which faulty call is responsible for removing usb0.

Switching from device -> host -> device:

root@yuna:~# trace-cmd record -p function_graph -l *gether_*
   plugin 'function_graph'
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x1c8000
     188 bytes in size (4096 uncompressed)
CPU1 data recorded at offset=0x1c9000
     0 bytes in size (0 uncompressed)
root@yuna:~# trace-cmd report
cpus=2
      irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      # 
2079.480 us |  gether_disconnect();
      irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      + 
11.640 us  |  gether_disconnect();
      irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      ! 
116.520 us |  gether_connect();
      irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      # 
2057.260 us |  gether_disconnect();
      irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      ! 
109.000 us |  gether_connect();


> According to current kernel architecture of u_ether driver, only gether_cleanup should remove the usb0 interface along with its kobject and sysfs interface.
> I suggest sharing the analysis here to understand why this practice is not followed in your use case or driver ?

Yes, I'll try to trace where that happens.

Nevertheless, the disappearance of the net/usb0 directory seems 
harmless? But the usb: net device remaining after disconnect or role 
switch is not good, as the route remains.

May be they are 2 separate problems. Could you try to reproduce what 
happens if you make eem connection and then unplug?

> I am curious why the driver was developed without adhering to the kernel's gadget architecture.
> 
>>
>>>>> I have the dwc3 IP base usb controller, Let me check with this patch and
>>>>> share result here.  May be we need some fix in dwc3
>>> Would have been nice if someone could test on other controller as well. But
>>> another instance of dwc3 is also very welcome.
>>>> It's quite possible, please test on your side.
>>>> We are happy to test any fixes if you come up with.
>>
>> -- 
>> With Best Regards,
>> Andy Shevchenko
>>
>>
Ferry Toth April 28, 2024, 9:07 p.m. UTC | #9
Hi,

Op 25-04-2024 om 23:27 schreef Ferry Toth:
> Hi,
> 
> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>
>>> ...
>>>
>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>
>>>>>>>> To be precise: /sys/class/net/usb0 is not removed and it is a 
>>>>>>>> link, the link
>>>>>>>> target 
>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no
>>>>>>>> longer exists
>>>>>> So, it means that the /sys/class/net/usb0 is present, but the 
>>>>>> symlink is
>>>>>> broken. In that case, the dwc3 driver should recreate the device, 
>>>>>> and the
>>>>>> symlink should become active again
>>>>
>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>
>>>> root@yuna:~# ls 
>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>
>>>> Then switching to host mode:
>>>>
>>>> root@yuna:~# ls 
>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>> ls: cannot access
>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No 
>>>> such file
>>>> or directory
>>>>
>>>> Then back to device mode:
>>>>
>>>> root@yuna:~# ls 
>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>> driver  power  sound  subsystem  suspended  uevent
>>>>
>>>> net is missing. But, network functions:
>>>>
>>>> root@yuna:~# ping 10.42.0.1
>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>
>>>> Mass storage device is created and removed each time as expected.
>>>
>>> So, what's the conclusion? Shall we move towards revert of those two 
>>> changes?
>>
>>
>> As promised, I have the tested the this patch with the dwc3 gadget. I 
>> could not reproduce
>> the issue.
>>
>> I can see the usb0 exist all the time and accessible regardless of the 
>> role switching of the USB mode (peripheral <-> host)
>>
>> Following are the logs:
>> //Host to device
>>
>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > 
>> mode
>> console:/sys/bus/platform/devices/a800000.ssusb # ls 
>> a800000.dwc3/gadget/net/
>> usb0
>>
>> //device to host
>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>> console:/sys/bus/platform/devices/a800000.ssusb # ls 
>> a800000.dwc3/gadget/net/
>> usb0
> 
> That is weird. When I switch to host mode (using the physical switch), 
> the whole gadget directory is removed (now testing 6.9.0-rc5)
> 
> Switching back to device mode, that gadget directory is recreated. And 
> gadget/sound as well, but not gadget/net.
> 
>> s 
>> a800000.dwc3/gadget/net/usb0                                                <
>> addr_assign_type    duplex             phys_port_name
>> addr_len            flags              phys_switch_id
>> address             gro_flush_timeout  power
>> broadcast           ifalias            proto_down
>> carrier             ifindex            queues
>> carrier_changes     iflink             speed
>> carrier_down_count  link_mode          statistics
>> carrier_up_count    mtu                subsystem
>> dev_id              name_assign_type   tx_queue_len
>> dev_port            netdev_group       type
>> device              operstate          uevent
>> dormant             phys_port_id       waiting_for_supplier
>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>            BROADCAST MULTICAST  MTU:1500  Metric:1
>>            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>            collisions:0 txqueuelen:1000
>>            RX bytes:0 TX bytes:0
>>
>> console:/sys/bus/platform/devices/a800000.ssusb #
>>
>> I strongly advise against reverting the patch solely based on the 
>> observed issue of removing the /sys/class/net/usb0 directory while the 
>> usb0 interface remains available.
> 
> There's more to it. I also mentioned that switching the role or 
> unplugging the cable leaves the usb0 connection.
> 
> I have while in host mode:
> root@yuna:~# ifconfig -a usb0
> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>          inet 10.42.0.221  netmask 255.255.255.0  broadcast 10.42.0.255
>          inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 0x20<link>
> 
> 
> You don't see that because you didn't create a connection at all.
> 
>> Instead, I recommend enabling FTRACE to trace the functions involved 
>> and identify which faulty call is responsible for removing usb0.
> 
> Switching from device -> host -> device:
> 
> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>    plugin 'function_graph'
> Hit Ctrl^C to stop recording
> ^CCPU0 data recorded at offset=0x1c8000
>      188 bytes in size (4096 uncompressed)
> CPU1 data recorded at offset=0x1c9000
>      0 bytes in size (0 uncompressed)
> root@yuna:~# trace-cmd report
> cpus=2
>       irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      # 
> 2079.480 us |  gether_disconnect();
>       irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      + 
> 11.640 us  |  gether_disconnect();
>       irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      ! 
> 116.520 us |  gether_connect();
>       irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      # 
> 2057.260 us |  gether_disconnect();
>       irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      ! 
> 109.000 us |  gether_connect();

I tried to get a more useful trace:
root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
root@yuna:/sys/kernel/tracing# echo function > current_tracer
root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
-> switch to host mode then back to device
root@yuna:/sys/kernel/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 53/53   #P:2
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
      irq/68-dwc3-523     [000] D..3.   133.990254: reset_config 
<-__composite_disconnect
      irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable 
<-reset_config
      irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect 
<-reset_config
      kworker/1:3-443     [001] ...1.   134.022453: eem_unbind 
<-purge_configs_funcs

-> to device mode

      kworker/1:3-443     [001] ...1.   148.630773: eem_bind 
<-usb_add_function
      irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt 
<-composite_setup
      irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect 
<-eem_set_alt
      irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect 
<-eem_set_alt
      irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt 
<-composite_setup
      irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect 
<-eem_set_alt
      irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect 
<-eem_set_alt
      irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap <-rx_complete
...

I don't know where to look exactly. Any hints?

> 
>> According to current kernel architecture of u_ether driver, only 
>> gether_cleanup should remove the usb0 interface along with its kobject 
>> and sysfs interface.
>> I suggest sharing the analysis here to understand why this practice is 
>> not followed in your use case or driver ?
> 
> Yes, I'll try to trace where that happens.
> 
> Nevertheless, the disappearance of the net/usb0 directory seems 
> harmless? But the usb: net device remaining after disconnect or role 
> switch is not good, as the route remains.
> 
> May be they are 2 separate problems. Could you try to reproduce what 
> happens if you make eem connection and then unplug?
> 
>> I am curious why the driver was developed without adhering to the 
>> kernel's gadget architecture.

I don't know what you mean here. Which driver do you mean?

>>>
>>>>>> I have the dwc3 IP base usb controller, Let me check with this 
>>>>>> patch and
>>>>>> share result here.  May be we need some fix in dwc3
>>>> Would have been nice if someone could test on other controller as 
>>>> well. But
>>>> another instance of dwc3 is also very welcome.
>>>>> It's quite possible, please test on your side.
>>>>> We are happy to test any fixes if you come up with.
>>>
>>> -- 
>>> With Best Regards,
>>> Andy Shevchenko
>>>
>>>
>
Hardik Gajjar April 30, 2024, 3:32 p.m. UTC | #10
On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> Hi,
> 
> Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > Hi,
> > 
> > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > > > 
> > > > ...
> > > > 
> > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > 
> > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > removed and it is a link, the link
> > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > no
> > > > > > > > > longer exists
> > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > present, but the symlink is
> > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > recreate the device, and the
> > > > > > > symlink should become active again
> > > > > 
> > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > 
> > > > > root@yuna:~# ls
> > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > > 
> > > > > Then switching to host mode:
> > > > > 
> > > > > root@yuna:~# ls
> > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > ls: cannot access
> > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > No such file
> > > > > or directory
> > > > > 
> > > > > Then back to device mode:
> > > > > 
> > > > > root@yuna:~# ls
> > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > driver  power  sound  subsystem  suspended  uevent
> > > > > 
> > > > > net is missing. But, network functions:
> > > > > 
> > > > > root@yuna:~# ping 10.42.0.1
> > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > 
> > > > > Mass storage device is created and removed each time as expected.
> > > > 
> > > > So, what's the conclusion? Shall we move towards revert of those
> > > > two changes?
> > > 
> > > 
> > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > I could not reproduce
> > > the issue.
> > > 
> > > I can see the usb0 exist all the time and accessible regardless of
> > > the role switching of the USB mode (peripheral <-> host)
> > > 
> > > Following are the logs:
> > > //Host to device
> > > 
> > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > mode
> > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > a800000.dwc3/gadget/net/
> > > usb0
> > > 
> > > //device to host
> > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > a800000.dwc3/gadget/net/
> > > usb0
> > 
> > That is weird. When I switch to host mode (using the physical switch),
> > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > 
> > Switching back to device mode, that gadget directory is recreated. And
> > gadget/sound as well, but not gadget/net.
> > 
> > > s a800000.dwc3/gadget/net/usb0                                               
> > > <
> > > addr_assign_type    duplex             phys_port_name
> > > addr_len            flags              phys_switch_id
> > > address             gro_flush_timeout  power
> > > broadcast           ifalias            proto_down
> > > carrier             ifindex            queues
> > > carrier_changes     iflink             speed
> > > carrier_down_count  link_mode          statistics
> > > carrier_up_count    mtu                subsystem
> > > dev_id              name_assign_type   tx_queue_len
> > > dev_port            netdev_group       type
> > > device              operstate          uevent
> > > dormant             phys_port_id       waiting_for_supplier
> > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > >            BROADCAST MULTICAST  MTU:1500  Metric:1
> > >            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > >            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > >            collisions:0 txqueuelen:1000
> > >            RX bytes:0 TX bytes:0
> > > 
> > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > 
> > > I strongly advise against reverting the patch solely based on the
> > > observed issue of removing the /sys/class/net/usb0 directory while
> > > the usb0 interface remains available.
> > 
> > There's more to it. I also mentioned that switching the role or
> > unplugging the cable leaves the usb0 connection.
> > 
> > I have while in host mode:
> > root@yuna:~# ifconfig -a usb0
> > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> >          inet 10.42.0.221  netmask 255.255.255.0  broadcast 10.42.0.255
> >          inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 0x20<link>
> > 
> > 
> > You don't see that because you didn't create a connection at all.
> > 
> > > Instead, I recommend enabling FTRACE to trace the functions involved
> > > and identify which faulty call is responsible for removing usb0.
> > 
> > Switching from device -> host -> device:
> > 
> > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> >    plugin 'function_graph'
> > Hit Ctrl^C to stop recording
> > ^CCPU0 data recorded at offset=0x1c8000
> >      188 bytes in size (4096 uncompressed)
> > CPU1 data recorded at offset=0x1c9000
> >      0 bytes in size (0 uncompressed)
> > root@yuna:~# trace-cmd report
> > cpus=2
> >       irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> > 2079.480 us |  gether_disconnect();
> >       irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> > 11.640 us  |  gether_disconnect();
> >       irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> > 116.520 us |  gether_connect();
> >       irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> > 2057.260 us |  gether_disconnect();
> >       irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> > 109.000 us |  gether_connect();
> 
> I tried to get a more useful trace:
> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> root@yuna:/sys/kernel/tracing# echo function > current_tracer
> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> -> switch to host mode then back to device
> root@yuna:/sys/kernel/tracing# cat trace
> # tracer: function
> #
> # entries-in-buffer/entries-written: 53/53   #P:2
> #
> #                                _-----=> irqs-off/BH-disabled
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>      irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> <-__composite_disconnect
>      irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> <-reset_config
>      irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
> <-reset_config
>      kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> <-purge_configs_funcs
> 
> -> to device mode
> 
>      kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> <-usb_add_function
>      irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> <-composite_setup
>      irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
> <-eem_set_alt
>      irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> <-eem_set_alt
>      irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> <-composite_setup
>      irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
> <-eem_set_alt
>      irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> <-eem_set_alt
>      irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap <-rx_complete
> ...
> 
> I don't know where to look exactly. Any hints?

do you see anything related to gether_cleanup() after eem_unbind() ?
If not then, you may try to enable tracing of TCP/IP stack and network side to check who deleting the sysfs entry

Hardik


> 
> > 
> > > According to current kernel architecture of u_ether driver, only
> > > gether_cleanup should remove the usb0 interface along with its
> > > kobject and sysfs interface.
> > > I suggest sharing the analysis here to understand why this practice
> > > is not followed in your use case or driver ?
> > 
> > Yes, I'll try to trace where that happens.
> > 
> > Nevertheless, the disappearance of the net/usb0 directory seems
> > harmless? But the usb: net device remaining after disconnect or role
> > switch is not good, as the route remains.
> > 
> > May be they are 2 separate problems. Could you try to reproduce what
> > happens if you make eem connection and then unplug?
> > 
> > > I am curious why the driver was developed without adhering to the
> > > kernel's gadget architecture.
> 
> I don't know what you mean here. Which driver do you mean?
> 
> > > > 
> > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > with this patch and
> > > > > > > share result here.  May be we need some fix in dwc3
> > > > > Would have been nice if someone could test on other
> > > > > controller as well. But
> > > > > another instance of dwc3 is also very welcome.
> > > > > > It's quite possible, please test on your side.
> > > > > > We are happy to test any fixes if you come up with.
> > > > 
> > > > -- 
> > > > With Best Regards,
> > > > Andy Shevchenko
> > > > 
> > > > 
> >
Ferry Toth April 30, 2024, 7:40 p.m. UTC | #11
Hi,

Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>> Hi,
>>
>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>
>>>>> ...
>>>>>
>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>
>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>> no
>>>>>>>>>> longer exists
>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>> present, but the symlink is
>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>> recreate the device, and the
>>>>>>>> symlink should become active again
>>>>>>
>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>
>>>>>> root@yuna:~# ls
>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>
>>>>>> Then switching to host mode:
>>>>>>
>>>>>> root@yuna:~# ls
>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>> ls: cannot access
>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>> No such file
>>>>>> or directory
>>>>>>
>>>>>> Then back to device mode:
>>>>>>
>>>>>> root@yuna:~# ls
>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>
>>>>>> net is missing. But, network functions:
>>>>>>
>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>
>>>>>> Mass storage device is created and removed each time as expected.
>>>>>
>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>> two changes?
>>>>
>>>>
>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>> I could not reproduce
>>>> the issue.
>>>>
>>>> I can see the usb0 exist all the time and accessible regardless of
>>>> the role switching of the USB mode (peripheral <-> host)
>>>>
>>>> Following are the logs:
>>>> //Host to device
>>>>
>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>> mode
>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>> a800000.dwc3/gadget/net/
>>>> usb0
>>>>
>>>> //device to host
>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>> a800000.dwc3/gadget/net/
>>>> usb0
>>>
>>> That is weird. When I switch to host mode (using the physical switch),
>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>
>>> Switching back to device mode, that gadget directory is recreated. And
>>> gadget/sound as well, but not gadget/net.
>>>
>>>> s a800000.dwc3/gadget/net/usb0
>>>> <
>>>> addr_assign_type    duplex             phys_port_name
>>>> addr_len            flags              phys_switch_id
>>>> address             gro_flush_timeout  power
>>>> broadcast           ifalias            proto_down
>>>> carrier             ifindex            queues
>>>> carrier_changes     iflink             speed
>>>> carrier_down_count  link_mode          statistics
>>>> carrier_up_count    mtu                subsystem
>>>> dev_id              name_assign_type   tx_queue_len
>>>> dev_port            netdev_group       type
>>>> device              operstate          uevent
>>>> dormant             phys_port_id       waiting_for_supplier
>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>             collisions:0 txqueuelen:1000
>>>>             RX bytes:0 TX bytes:0
>>>>
>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>
>>>> I strongly advise against reverting the patch solely based on the
>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>> the usb0 interface remains available.
>>>
>>> There's more to it. I also mentioned that switching the role or
>>> unplugging the cable leaves the usb0 connection.
>>>
>>> I have while in host mode:
>>> root@yuna:~# ifconfig -a usb0
>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast 10.42.0.255
>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 0x20<link>
>>>
>>>
>>> You don't see that because you didn't create a connection at all.
>>>
>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>> and identify which faulty call is responsible for removing usb0.
>>>
>>> Switching from device -> host -> device:
>>>
>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>     plugin 'function_graph'
>>> Hit Ctrl^C to stop recording
>>> ^CCPU0 data recorded at offset=0x1c8000
>>>       188 bytes in size (4096 uncompressed)
>>> CPU1 data recorded at offset=0x1c9000
>>>       0 bytes in size (0 uncompressed)
>>> root@yuna:~# trace-cmd report
>>> cpus=2
>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>> 2079.480 us |  gether_disconnect();
>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>> 11.640 us  |  gether_disconnect();
>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>> 116.520 us |  gether_connect();
>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>> 2057.260 us |  gether_disconnect();
>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>> 109.000 us |  gether_connect();
>>
>> I tried to get a more useful trace:
>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>> -> switch to host mode then back to device
>> root@yuna:/sys/kernel/tracing# cat trace
>> # tracer: function
>> #
>> # entries-in-buffer/entries-written: 53/53   #P:2
>> #
>> #                                _-----=> irqs-off/BH-disabled
>> #                               / _----=> need-resched
>> #                              | / _---=> hardirq/softirq
>> #                              || / _--=> preempt-depth
>> #                              ||| / _-=> migrate-disable
>> #                              |||| /     delay
>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>> #              | |         |   |||||     |         |
>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>> <-__composite_disconnect
>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>> <-reset_config
>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>> <-reset_config
>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>> <-purge_configs_funcs
>>
>> -> to device mode
>>
>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>> <-usb_add_function
>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>> <-composite_setup
>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>> <-eem_set_alt
>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>> <-eem_set_alt
>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>> <-composite_setup
>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>> <-eem_set_alt
>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>> <-eem_set_alt
>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap <-rx_complete
>> ...
>>
>> I don't know where to look exactly. Any hints?
> 
> do you see anything related to gether_cleanup() after eem_unbind() ?

Nope. It's a pitty that the trace formatting got messed up above. But as 
you can see I traced gether_* and eem_*. After eem_unbind no traced 
function is called, until I flip the switch to device mode.
The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete 
and eem_wrap <-eth_start_xmit lines.

> If not then, you may try to enable tracing of TCP/IP stack and network side to check who deleting the sysfs entry

Yes, that's a vast amount of functions to trace. And I don't see how 
that would be related to the patch we're discussing here. I was hoping 
for a little more targeted hint.

You may recall the whole issue did not occur before this patch got applied.

> Hardik
> 
> 
>>
>>>
>>>> According to current kernel architecture of u_ether driver, only
>>>> gether_cleanup should remove the usb0 interface along with its
>>>> kobject and sysfs interface.
>>>> I suggest sharing the analysis here to understand why this practice
>>>> is not followed in your use case or driver ?
>>>
>>> Yes, I'll try to trace where that happens.
>>>
>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>> harmless? But the usb: net device remaining after disconnect or role
>>> switch is not good, as the route remains.
>>>
>>> May be they are 2 separate problems. Could you try to reproduce what
>>> happens if you make eem connection and then unplug?
>>>
>>>> I am curious why the driver was developed without adhering to the
>>>> kernel's gadget architecture.
>>
>> I don't know what you mean here. Which driver do you mean?
>>
>>>>>
>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>> with this patch and
>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>> Would have been nice if someone could test on other
>>>>>> controller as well. But
>>>>>> another instance of dwc3 is also very welcome.
>>>>>>> It's quite possible, please test on your side.
>>>>>>> We are happy to test any fixes if you come up with.
>>>>>
>>>>> -- 
>>>>> With Best Regards,
>>>>> Andy Shevchenko
>>>>>
>>>>>
>>>
Ferry Toth April 30, 2024, 9:12 p.m. UTC | #12
Hi,

Op 30-04-2024 om 21:40 schreef Ferry Toth:
> Hi,
> 
> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>> Hi,
>>>
>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>> Hi,
>>>>
>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>
>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>> target 
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>> no
>>>>>>>>>>> longer exists
>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>> present, but the symlink is
>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>> recreate the device, and the
>>>>>>>>> symlink should become active again
>>>>>>>
>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>
>>>>>>> root@yuna:~# ls
>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>
>>>>>>> Then switching to host mode:
>>>>>>>
>>>>>>> root@yuna:~# ls
>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>> ls: cannot access
>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>> No such file
>>>>>>> or directory
>>>>>>>
>>>>>>> Then back to device mode:
>>>>>>>
>>>>>>> root@yuna:~# ls
>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>
>>>>>>> net is missing. But, network functions:
>>>>>>>
>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>
>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>
>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>> two changes?
>>>>>
>>>>>
>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>> I could not reproduce
>>>>> the issue.
>>>>>
>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>
>>>>> Following are the logs:
>>>>> //Host to device
>>>>>
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>> mode
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>> a800000.dwc3/gadget/net/
>>>>> usb0
>>>>>
>>>>> //device to host
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>> a800000.dwc3/gadget/net/
>>>>> usb0
>>>>
>>>> That is weird. When I switch to host mode (using the physical switch),
>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>
>>>> Switching back to device mode, that gadget directory is recreated. And
>>>> gadget/sound as well, but not gadget/net.
>>>>
>>>>> s a800000.dwc3/gadget/net/usb0
>>>>> <
>>>>> addr_assign_type    duplex             phys_port_name
>>>>> addr_len            flags              phys_switch_id
>>>>> address             gro_flush_timeout  power
>>>>> broadcast           ifalias            proto_down
>>>>> carrier             ifindex            queues
>>>>> carrier_changes     iflink             speed
>>>>> carrier_down_count  link_mode          statistics
>>>>> carrier_up_count    mtu                subsystem
>>>>> dev_id              name_assign_type   tx_queue_len
>>>>> dev_port            netdev_group       type
>>>>> device              operstate          uevent
>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>             collisions:0 txqueuelen:1000
>>>>>             RX bytes:0 TX bytes:0
>>>>>
>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>
>>>>> I strongly advise against reverting the patch solely based on the
>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>> the usb0 interface remains available.
>>>>
>>>> There's more to it. I also mentioned that switching the role or
>>>> unplugging the cable leaves the usb0 connection.
>>>>
>>>> I have while in host mode:
>>>> root@yuna:~# ifconfig -a usb0
>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast 
>>>> 10.42.0.255
>>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64  scopeid 
>>>> 0x20<link>
>>>>
>>>>
>>>> You don't see that because you didn't create a connection at all.
>>>>
>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>> and identify which faulty call is responsible for removing usb0.
>>>>
>>>> Switching from device -> host -> device:
>>>>
>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>     plugin 'function_graph'
>>>> Hit Ctrl^C to stop recording
>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>       188 bytes in size (4096 uncompressed)
>>>> CPU1 data recorded at offset=0x1c9000
>>>>       0 bytes in size (0 uncompressed)
>>>> root@yuna:~# trace-cmd report
>>>> cpus=2
>>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>>> 2079.480 us |  gether_disconnect();
>>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>>> 11.640 us  |  gether_disconnect();
>>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>>> 116.520 us |  gether_connect();
>>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>>> 2057.260 us |  gether_disconnect();
>>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>>> 109.000 us |  gether_connect();
>>>
>>> I tried to get a more useful trace:
>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>> -> switch to host mode then back to device
>>> root@yuna:/sys/kernel/tracing# cat trace
>>> # tracer: function
>>> #
>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>> #
>>> #                                _-----=> irqs-off/BH-disabled
>>> #                               / _----=> need-resched
>>> #                              | / _---=> hardirq/softirq
>>> #                              || / _--=> preempt-depth
>>> #                              ||| / _-=> migrate-disable
>>> #                              |||| /     delay
>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>> #              | |         |   |||||     |         |
>>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>> <-__composite_disconnect
>>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>> <-reset_config
>>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>>> <-reset_config
>>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>> <-purge_configs_funcs
>>>
>>> -> to device mode
>>>
>>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>> <-usb_add_function
>>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>> <-composite_setup
>>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>> <-composite_setup
>>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>> <-eem_set_alt
>>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap 
>>> <-rx_complete
>>> ...
>>>
>>> I don't know where to look exactly. Any hints?
>>
>> do you see anything related to gether_cleanup() after eem_unbind() ?
> 
> Nope. It's a pitty that the trace formatting got messed up above. But as 
> you can see I traced gether_* and eem_*. After eem_unbind no traced 
> function is called, until I flip the switch to device mode.
> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete 
> and eem_wrap <-eth_start_xmit lines.
> 
>> If not then, you may try to enable tracing of TCP/IP stack and network 
>> side to check who deleting the sysfs entry
> 
> Yes, that's a vast amount of functions to trace. And I don't see how 
> that would be related to the patch we're discussing here. I was hoping 
> for a little more targeted hint.

Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 
'usb_fun*', 'reset_config' and 'device_remove_file' leads me to:

TIMESTAMP  FUNCTION
    |         |
   49.952477: eem_wrap <-eth_start_xmit
   55.072455: eem_wrap <-eth_start_xmit
   55.072621: eem_unwrap <-rx_complete
   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
   59.011545: composite_reset <-configfs_composite_reset
   59.011548: reset_config <-__composite_disconnect
   59.013565: eem_disable <-reset_config
   59.013567: gether_disconnect <-reset_config
   59.049560: device_remove_file <-device_remove
   59.051185: configfs_composite_disconnect <-usb_gadget_disco
   59.051189: composite_disconnect <-configfs_composite_discon
   59.051195: configfs_composite_unbind <-gadget_unbind_driver
   59.052519: eem_unbind <-purge_configs_funcs
   59.052529: composite_dev_cleanup <-configfs_composite_unbin
   59.052537: device_remove_file <-composite_dev_cleanup

device_remove_file gets called twice, once by device_remove after 
gether_disconnect (that the one). The 2nd time by composite_dev_cleanup 
(removing the gadget)

> You may recall the whole issue did not occur before this patch got applied.
> 
>> Hardik
>>
>>
>>>
>>>>
>>>>> According to current kernel architecture of u_ether driver, only
>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>> kobject and sysfs interface.
>>>>> I suggest sharing the analysis here to understand why this practice
>>>>> is not followed in your use case or driver ?
>>>>
>>>> Yes, I'll try to trace where that happens.
>>>>
>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>> harmless? But the usb: net device remaining after disconnect or role
>>>> switch is not good, as the route remains.
>>>>
>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>> happens if you make eem connection and then unplug?
>>>>
>>>>> I am curious why the driver was developed without adhering to the
>>>>> kernel's gadget architecture.
>>>
>>> I don't know what you mean here. Which driver do you mean?
>>>
>>>>>>
>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>> with this patch and
>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>> Would have been nice if someone could test on other
>>>>>>> controller as well. But
>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>> It's quite possible, please test on your side.
>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>
>>>>>> -- 
>>>>>> With Best Regards,
>>>>>> Andy Shevchenko
>>>>>>
>>>>>>
>>>>
>
Hardik Gajjar May 2, 2024, 3:29 p.m. UTC | #13
On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> Hi,
> 
> Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > Hi,
> > 
> > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > Hi,
> > > > 
> > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > Hi,
> > > > > 
> > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > > > > > > 
> > > > > > > ...
> > > > > > > 
> > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > > 
> > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > no
> > > > > > > > > > > > longer exists
> > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > present, but the symlink is
> > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > recreate the device, and the
> > > > > > > > > > symlink should become active again
> > > > > > > > 
> > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > > 
> > > > > > > > root@yuna:~# ls
> > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > > > > > 
> > > > > > > > Then switching to host mode:
> > > > > > > > 
> > > > > > > > root@yuna:~# ls
> > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > ls: cannot access
> > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > No such file
> > > > > > > > or directory
> > > > > > > > 
> > > > > > > > Then back to device mode:
> > > > > > > > 
> > > > > > > > root@yuna:~# ls
> > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > driver  power  sound  subsystem  suspended  uevent
> > > > > > > > 
> > > > > > > > net is missing. But, network functions:
> > > > > > > > 
> > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > > 
> > > > > > > > Mass storage device is created and removed each time as expected.
> > > > > > > 
> > > > > > > So, what's the conclusion? Shall we move towards revert of those
> > > > > > > two changes?
> > > > > > 
> > > > > > 
> > > > > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > > > > I could not reproduce
> > > > > > the issue.
> > > > > > 
> > > > > > I can see the usb0 exist all the time and accessible regardless of
> > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > > 
> > > > > > Following are the logs:
> > > > > > //Host to device
> > > > > > 
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > > > > mode
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > a800000.dwc3/gadget/net/
> > > > > > usb0
> > > > > > 
> > > > > > //device to host
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > a800000.dwc3/gadget/net/
> > > > > > usb0
> > > > > 
> > > > > That is weird. When I switch to host mode (using the physical switch),
> > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > > 
> > > > > Switching back to device mode, that gadget directory is recreated. And
> > > > > gadget/sound as well, but not gadget/net.
> > > > > 
> > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > <
> > > > > > addr_assign_type    duplex             phys_port_name
> > > > > > addr_len            flags              phys_switch_id
> > > > > > address             gro_flush_timeout  power
> > > > > > broadcast           ifalias            proto_down
> > > > > > carrier             ifindex            queues
> > > > > > carrier_changes     iflink             speed
> > > > > > carrier_down_count  link_mode          statistics
> > > > > > carrier_up_count    mtu                subsystem
> > > > > > dev_id              name_assign_type   tx_queue_len
> > > > > > dev_port            netdev_group       type
> > > > > > device              operstate          uevent
> > > > > > dormant             phys_port_id       waiting_for_supplier
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > > > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > > > > >             BROADCAST MULTICAST  MTU:1500  Metric:1
> > > > > >             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > >             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > > > > >             collisions:0 txqueuelen:1000
> > > > > >             RX bytes:0 TX bytes:0
> > > > > > 
> > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > > 
> > > > > > I strongly advise against reverting the patch solely based on the
> > > > > > observed issue of removing the /sys/class/net/usb0 directory while
> > > > > > the usb0 interface remains available.
> > > > > 
> > > > > There's more to it. I also mentioned that switching the role or
> > > > > unplugging the cable leaves the usb0 connection.
> > > > > 
> > > > > I have while in host mode:
> > > > > root@yuna:~# ifconfig -a usb0
> > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> > > > >           inet 10.42.0.221  netmask 255.255.255.0  broadcast
> > > > > 10.42.0.255
> > > > >           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
> > > > > scopeid 0x20<link>
> > > > > 
> > > > > 
> > > > > You don't see that because you didn't create a connection at all.
> > > > > 
> > > > > > Instead, I recommend enabling FTRACE to trace the functions involved
> > > > > > and identify which faulty call is responsible for removing usb0.
> > > > > 
> > > > > Switching from device -> host -> device:
> > > > > 
> > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > >     plugin 'function_graph'
> > > > > Hit Ctrl^C to stop recording
> > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > >       188 bytes in size (4096 uncompressed)
> > > > > CPU1 data recorded at offset=0x1c9000
> > > > >       0 bytes in size (0 uncompressed)
> > > > > root@yuna:~# trace-cmd report
> > > > > cpus=2
> > > > >        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> > > > > 2079.480 us |  gether_disconnect();
> > > > >        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> > > > > 11.640 us  |  gether_disconnect();
> > > > >        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> > > > > 116.520 us |  gether_connect();
> > > > >        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> > > > > 2057.260 us |  gether_disconnect();
> > > > >        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> > > > > 109.000 us |  gether_connect();
> > > > 
> > > > I tried to get a more useful trace:
> > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> > > > -> switch to host mode then back to device
> > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > # tracer: function
> > > > #
> > > > # entries-in-buffer/entries-written: 53/53   #P:2
> > > > #
> > > > #                                _-----=> irqs-off/BH-disabled
> > > > #                               / _----=> need-resched
> > > > #                              | / _---=> hardirq/softirq
> > > > #                              || / _--=> preempt-depth
> > > > #                              ||| / _-=> migrate-disable
> > > > #                              |||| /     delay
> > > > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > > > #              | |         |   |||||     |         |
> > > >       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> > > > <-__composite_disconnect
> > > >       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> > > > <-reset_config
> > > >       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
> > > > <-reset_config
> > > >       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> > > > <-purge_configs_funcs
> > > > 
> > > > -> to device mode
> > > > 
> > > >       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> > > > <-usb_add_function
> > > >       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> > > > <-composite_setup
> > > >       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> > > > <-composite_setup
> > > >       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> > > > <-eem_set_alt
> > > >       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
> > > > <-rx_complete
> > > > ...
> > > > 
> > > > I don't know where to look exactly. Any hints?
> > > 
> > > do you see anything related to gether_cleanup() after eem_unbind() ?
> > 
> > Nope. It's a pitty that the trace formatting got messed up above. But as
> > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > function is called, until I flip the switch to device mode.
> > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> > and eem_wrap <-eth_start_xmit lines.
> > 
> > > If not then, you may try to enable tracing of TCP/IP stack and
> > > network side to check who deleting the sysfs entry
> > 
> > Yes, that's a vast amount of functions to trace. And I don't see how
> > that would be related to the patch we're discussing here. I was hoping
> > for a little more targeted hint.
> 
> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
> 'reset_config' and 'device_remove_file' leads me to:
> 
> TIMESTAMP  FUNCTION
>    |         |
>   49.952477: eem_wrap <-eth_start_xmit
>   55.072455: eem_wrap <-eth_start_xmit
>   55.072621: eem_unwrap <-rx_complete
>   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>   59.011545: composite_reset <-configfs_composite_reset
>   59.011548: reset_config <-__composite_disconnect
>   59.013565: eem_disable <-reset_config
>   59.013567: gether_disconnect <-reset_config
>   59.049560: device_remove_file <-device_remove
>   59.051185: configfs_composite_disconnect <-usb_gadget_disco
>   59.051189: composite_disconnect <-configfs_composite_discon
>   59.051195: configfs_composite_unbind <-gadget_unbind_driver
>   59.052519: eem_unbind <-purge_configs_funcs
>   59.052529: composite_dev_cleanup <-configfs_composite_unbin
>   59.052537: device_remove_file <-composite_dev_cleanup
> 
> device_remove_file gets called twice, once by device_remove after
> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> (removing the gadget)

I believe that the device_remove_file function is only removing suspend-specific attributes, not the complete gadget. 
Typically, when you perform the role switch, the Gadget start/stop function in your UDC driver is called. These functions should not delete the gadget

To investigate further, could you please enable the DWC3 functions in ftrace and check who is removing the gadget? 
I can also enable this on my system and compare the logs with yours, but I will be in PI planning for 1.5 weeks and may not be able to provide immediate support.

Additionally, please check if you have any customized DWC patches that may be causing this problem.

> 
> > You may recall the whole issue did not occur before this patch got applied.
> > 
> > > Hardik
> > > 
> > > 
> > > > 
> > > > > 
> > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > kobject and sysfs interface.
> > > > > > I suggest sharing the analysis here to understand why this practice
> > > > > > is not followed in your use case or driver ?
> > > > > 
> > > > > Yes, I'll try to trace where that happens.
> > > > > 
> > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > harmless? But the usb: net device remaining after disconnect or role
> > > > > switch is not good, as the route remains.
> > > > > 
> > > > > May be they are 2 separate problems. Could you try to reproduce what
> > > > > happens if you make eem connection and then unplug?
> > > > > 
> > > > > > I am curious why the driver was developed without adhering to the
> > > > > > kernel's gadget architecture.
> > > > 
> > > > I don't know what you mean here. Which driver do you mean?
> > > > 
> > > > > > > 
> > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > with this patch and
> > > > > > > > > > share result here.  May be we need some fix in dwc3
> > > > > > > > Would have been nice if someone could test on other
> > > > > > > > controller as well. But
> > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > We are happy to test any fixes if you come up with.
> > > > > > > 
> > > > > > > -- 
> > > > > > > With Best Regards,
> > > > > > > Andy Shevchenko
> > > > > > > 
> > > > > > > 
> > > > > 
> >
Hardik Gajjar May 2, 2024, 4:16 p.m. UTC | #14
On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
> > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> > > Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> 
> ...
> 
> > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > > > no
> > > > > > > > > > > > > > longer exists
> > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > > > present, but the symlink is
> > > > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > > > recreate the device, and the
> > > > > > > > > > > > symlink should become active again
> > > > > > > > > > 
> > > > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > > > > 
> > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > > > > > > > 
> > > > > > > > > > Then switching to host mode:
> > > > > > > > > > 
> > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > ls: cannot access
> > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > > > No such file
> > > > > > > > > > or directory
> > > > > > > > > > 
> > > > > > > > > > Then back to device mode:
> > > > > > > > > > 
> > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > driver  power  sound  subsystem  suspended  uevent
> > > > > > > > > > 
> > > > > > > > > > net is missing. But, network functions:
> > > > > > > > > > 
> > > > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > > > > 
> > > > > > > > > > Mass storage device is created and removed each time as expected.
> > > > > > > > > 
> > > > > > > > > So, what's the conclusion? Shall we move towards revert of those
> > > > > > > > > two changes?
> > > > > > > > 
> > > > > > > > 
> > > > > > > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > > > > > > I could not reproduce
> > > > > > > > the issue.
> > > > > > > > 
> > > > > > > > I can see the usb0 exist all the time and accessible regardless of
> > > > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > > > > 
> > > > > > > > Following are the logs:
> > > > > > > > //Host to device
> > > > > > > > 
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > > > > > > mode
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > usb0
> > > > > > > > 
> > > > > > > > //device to host
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > usb0
> > > > > > > 
> > > > > > > That is weird. When I switch to host mode (using the physical switch),
> > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > > > > 
> > > > > > > Switching back to device mode, that gadget directory is recreated. And
> > > > > > > gadget/sound as well, but not gadget/net.
> > > > > > > 
> > > > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > > > <
> > > > > > > > addr_assign_type    duplex             phys_port_name
> > > > > > > > addr_len            flags              phys_switch_id
> > > > > > > > address             gro_flush_timeout  power
> > > > > > > > broadcast           ifalias            proto_down
> > > > > > > > carrier             ifindex            queues
> > > > > > > > carrier_changes     iflink             speed
> > > > > > > > carrier_down_count  link_mode          statistics
> > > > > > > > carrier_up_count    mtu                subsystem
> > > > > > > > dev_id              name_assign_type   tx_queue_len
> > > > > > > > dev_port            netdev_group       type
> > > > > > > > device              operstate          uevent
> > > > > > > > dormant             phys_port_id       waiting_for_supplier
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > > > > > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > > > > > > >             BROADCAST MULTICAST  MTU:1500  Metric:1
> > > > > > > >             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > > > >             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > > > > > > >             collisions:0 txqueuelen:1000
> > > > > > > >             RX bytes:0 TX bytes:0
> > > > > > > > 
> > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > > > > 
> > > > > > > > I strongly advise against reverting the patch solely based on the
> > > > > > > > observed issue of removing the /sys/class/net/usb0 directory while
> > > > > > > > the usb0 interface remains available.
> > > > > > > 
> > > > > > > There's more to it. I also mentioned that switching the role or
> > > > > > > unplugging the cable leaves the usb0 connection.
> > > > > > > 
> > > > > > > I have while in host mode:
> > > > > > > root@yuna:~# ifconfig -a usb0
> > > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> > > > > > >           inet 10.42.0.221  netmask 255.255.255.0  broadcast
> > > > > > > 10.42.0.255
> > > > > > >           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
> > > > > > > scopeid 0x20<link>
> > > > > > > 
> > > > > > > 
> > > > > > > You don't see that because you didn't create a connection at all.
> > > > > > > 
> > > > > > > > Instead, I recommend enabling FTRACE to trace the functions involved
> > > > > > > > and identify which faulty call is responsible for removing usb0.
> > > > > > > 
> > > > > > > Switching from device -> host -> device:
> > > > > > > 
> > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > > > >     plugin 'function_graph'
> > > > > > > Hit Ctrl^C to stop recording
> > > > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > > > >       188 bytes in size (4096 uncompressed)
> > > > > > > CPU1 data recorded at offset=0x1c9000
> > > > > > >       0 bytes in size (0 uncompressed)
> > > > > > > root@yuna:~# trace-cmd report
> > > > > > > cpus=2
> > > > > > >        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> > > > > > > 2079.480 us |  gether_disconnect();
> > > > > > >        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> > > > > > > 11.640 us  |  gether_disconnect();
> > > > > > >        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> > > > > > > 116.520 us |  gether_connect();
> > > > > > >        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> > > > > > > 2057.260 us |  gether_disconnect();
> > > > > > >        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> > > > > > > 109.000 us |  gether_connect();
> > > > > > 
> > > > > > I tried to get a more useful trace:
> > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> > > > > > -> switch to host mode then back to device
> > > > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > > > # tracer: function
> > > > > > #
> > > > > > # entries-in-buffer/entries-written: 53/53   #P:2
> > > > > > #
> > > > > > #                                _-----=> irqs-off/BH-disabled
> > > > > > #                               / _----=> need-resched
> > > > > > #                              | / _---=> hardirq/softirq
> > > > > > #                              || / _--=> preempt-depth
> > > > > > #                              ||| / _-=> migrate-disable
> > > > > > #                              |||| /     delay
> > > > > > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > > > > > #              | |         |   |||||     |         |
> > > > > >       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> > > > > > <-__composite_disconnect
> > > > > >       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> > > > > > <-reset_config
> > > > > >       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
> > > > > > <-reset_config
> > > > > >       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> > > > > > <-purge_configs_funcs
> > > > > > 
> > > > > > -> to device mode
> > > > > > 
> > > > > >       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> > > > > > <-usb_add_function
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> > > > > > <-composite_setup
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> > > > > > <-composite_setup
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> > > > > > <-eem_set_alt
> > > > > >       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
> > > > > > <-rx_complete
> > > > > > ...
> > > > > > 
> > > > > > I don't know where to look exactly. Any hints?
> > > > > 
> > > > > do you see anything related to gether_cleanup() after eem_unbind() ?
> > > > 
> > > > Nope. It's a pitty that the trace formatting got messed up above. But as
> > > > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > > > function is called, until I flip the switch to device mode.
> > > > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> > > > and eem_wrap <-eth_start_xmit lines.
> > > > 
> > > > > If not then, you may try to enable tracing of TCP/IP stack and
> > > > > network side to check who deleting the sysfs entry
> > > > 
> > > > Yes, that's a vast amount of functions to trace. And I don't see how
> > > > that would be related to the patch we're discussing here. I was hoping
> > > > for a little more targeted hint.
> > > 
> > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
> > > 'reset_config' and 'device_remove_file' leads me to:
> > > 
> > > TIMESTAMP  FUNCTION
> > >    |         |
> > >   49.952477: eem_wrap <-eth_start_xmit
> > >   55.072455: eem_wrap <-eth_start_xmit
> > >   55.072621: eem_unwrap <-rx_complete
> > >   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> > >   59.011545: composite_reset <-configfs_composite_reset
> > >   59.011548: reset_config <-__composite_disconnect
> > >   59.013565: eem_disable <-reset_config
> > >   59.013567: gether_disconnect <-reset_config
> > >   59.049560: device_remove_file <-device_remove
> > >   59.051185: configfs_composite_disconnect <-usb_gadget_disco
> > >   59.051189: composite_disconnect <-configfs_composite_discon
> > >   59.051195: configfs_composite_unbind <-gadget_unbind_driver
> > >   59.052519: eem_unbind <-purge_configs_funcs
> > >   59.052529: composite_dev_cleanup <-configfs_composite_unbin
> > >   59.052537: device_remove_file <-composite_dev_cleanup
> > > 
> > > device_remove_file gets called twice, once by device_remove after
> > > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> > > (removing the gadget)
> > 
> > I believe that the device_remove_file function is only removing
> > suspend-specific attributes, not the complete gadget.  Typically, when you
> > perform the role switch, the Gadget start/stop function in your UDC driver is
> > called. These functions should not delete the gadget
> > 
> > To investigate further, could you please enable the DWC3 functions in ftrace
> > and check who is removing the gadget?  I can also enable this on my system
> > and compare the logs with yours, but I will be in PI planning for 1.5 weeks
> > and may not be able to provide immediate support.
> 
> Since we are almost at -rc7, I propose to revert and try again next cycle.

Why? There is no problem with this patch!

I don't know why you are so excited to revert the patch instead of investigating the original issue. 
Even though I have proved that the problem is caused by usb0 being removed just by role-switching the port by the UDC driver, this behavior is incorrect.

There is no behavior in the Linux kernel that keeps the network interface and removes the related sysfs entry. THIS IS WRONG and has been FIXED without reverting any mainline patch.

Please don't encourage reverting any mainstream patches to avoid investigating a (probably) faulty driver. This is not how the community should work.

Also, I'm a bit wondering: the patch has been in the mainline for some time, and we haven't had any issues except this one, which is due to the wrong behavior of the UDC driver.

> 
> > Additionally, please check if you have any customized DWC patches that may be causing this problem.
> > 
> > > > You may recall the whole issue did not occur before this patch got applied.
> > > > 
> > > > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > > > kobject and sysfs interface.
> > > > > > > > I suggest sharing the analysis here to understand why this practice
> > > > > > > > is not followed in your use case or driver ?
> > > > > > > 
> > > > > > > Yes, I'll try to trace where that happens.
> > > > > > > 
> > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > > > harmless? But the usb: net device remaining after disconnect or role
> > > > > > > switch is not good, as the route remains.
> > > > > > > 
> > > > > > > May be they are 2 separate problems. Could you try to reproduce what
> > > > > > > happens if you make eem connection and then unplug?
> > > > > > > 
> > > > > > > > I am curious why the driver was developed without adhering to the
> > > > > > > > kernel's gadget architecture.
> > > > > > 
> > > > > > I don't know what you mean here. Which driver do you mean?
> > > > > > 
> > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > > > with this patch and
> > > > > > > > > > > > share result here.  May be we need some fix in dwc3
> > > > > > > > > > Would have been nice if someone could test on other
> > > > > > > > > > controller as well. But
> > > > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > > > We are happy to test any fixes if you come up with.
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>
Ferry Toth May 2, 2024, 8:13 p.m. UTC | #15
Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>> Hi,
>>
>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>> Hi,
>>>>>
>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>> Hi,
>>>>>>
>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>> no
>>>>>>>>>>>>> longer exists
>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>> symlink should become active again
>>>>>>>>>
>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>
>>>>>>>>> root@yuna:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>
>>>>>>>>> Then switching to host mode:
>>>>>>>>>
>>>>>>>>> root@yuna:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> ls: cannot access
>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>> No such file
>>>>>>>>> or directory
>>>>>>>>>
>>>>>>>>> Then back to device mode:
>>>>>>>>>
>>>>>>>>> root@yuna:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>
>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>
>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>
>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>
>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>> two changes?
>>>>>>>
>>>>>>>
>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>> I could not reproduce
>>>>>>> the issue.
>>>>>>>
>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>
>>>>>>> Following are the logs:
>>>>>>> //Host to device
>>>>>>>
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>> mode
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>> a800000.dwc3/gadget/net/
>>>>>>> usb0
>>>>>>>
>>>>>>> //device to host
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>> a800000.dwc3/gadget/net/
>>>>>>> usb0
>>>>>>
>>>>>> That is weird. When I switch to host mode (using the physical switch),
>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>
>>>>>> Switching back to device mode, that gadget directory is recreated. And
>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>
>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>> <
>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>> addr_len            flags              phys_switch_id
>>>>>>> address             gro_flush_timeout  power
>>>>>>> broadcast           ifalias            proto_down
>>>>>>> carrier             ifindex            queues
>>>>>>> carrier_changes     iflink             speed
>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>> dev_port            netdev_group       type
>>>>>>> device              operstate          uevent
>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>              BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>              collisions:0 txqueuelen:1000
>>>>>>>              RX bytes:0 TX bytes:0
>>>>>>>
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>
>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>> the usb0 interface remains available.
>>>>>>
>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>
>>>>>> I have while in host mode:
>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>            inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>> 10.42.0.255
>>>>>>            inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>> scopeid 0x20<link>
>>>>>>
>>>>>>
>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>
>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>
>>>>>> Switching from device -> host -> device:
>>>>>>
>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>      plugin 'function_graph'
>>>>>> Hit Ctrl^C to stop recording
>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>        188 bytes in size (4096 uncompressed)
>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>        0 bytes in size (0 uncompressed)
>>>>>> root@yuna:~# trace-cmd report
>>>>>> cpus=2
>>>>>>         irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>         irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>         irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>>>>> 116.520 us |  gether_connect();
>>>>>>         irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>         irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>>>>> 109.000 us |  gether_connect();
>>>>>
>>>>> I tried to get a more useful trace:
>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>>>> -> switch to host mode then back to device
>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>> # tracer: function
>>>>> #
>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>> #
>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>> #                               / _----=> need-resched
>>>>> #                              | / _---=> hardirq/softirq
>>>>> #                              || / _--=> preempt-depth
>>>>> #                              ||| / _-=> migrate-disable
>>>>> #                              |||| /     delay
>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>> #              | |         |   |||||     |         |
>>>>>        irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>> <-__composite_disconnect
>>>>>        irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>> <-reset_config
>>>>>        irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>>>>> <-reset_config
>>>>>        kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>> <-purge_configs_funcs
>>>>>
>>>>> -> to device mode
>>>>>
>>>>>        kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>> <-usb_add_function
>>>>>        irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>> <-composite_setup
>>>>>        irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>>>>> <-eem_set_alt
>>>>>        irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>> <-eem_set_alt
>>>>>        irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>> <-composite_setup
>>>>>        irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>>>>> <-eem_set_alt
>>>>>        irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>> <-eem_set_alt
>>>>>        irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>> <-rx_complete
>>>>> ...
>>>>>
>>>>> I don't know where to look exactly. Any hints?
>>>>
>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>
>>> Nope. It's a pitty that the trace formatting got messed up above. But as
>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>> function is called, until I flip the switch to device mode.
>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
>>> and eem_wrap <-eth_start_xmit lines.
>>>
>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>> network side to check who deleting the sysfs entry
>>>
>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>> that would be related to the patch we're discussing here. I was hoping
>>> for a little more targeted hint.
>>
>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
>> 'reset_config' and 'device_remove_file' leads me to:
>>
>> TIMESTAMP  FUNCTION
>>     |         |
>>    49.952477: eem_wrap <-eth_start_xmit
>>    55.072455: eem_wrap <-eth_start_xmit
>>    55.072621: eem_unwrap <-rx_complete
>>    59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>    59.011545: composite_reset <-configfs_composite_reset
>>    59.011548: reset_config <-__composite_disconnect
>>    59.013565: eem_disable <-reset_config
>>    59.013567: gether_disconnect <-reset_config
>>    59.049560: device_remove_file <-device_remove
>>    59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>    59.051189: composite_disconnect <-configfs_composite_discon
>>    59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>    59.052519: eem_unbind <-purge_configs_funcs
>>    59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>    59.052537: device_remove_file <-composite_dev_cleanup
>>
>> device_remove_file gets called twice, once by device_remove after
>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>> (removing the gadget)
> 
> I believe that the device_remove_file function is only removing suspend-specific attributes, not the complete gadget.
> Typically, when you perform the role switch, the Gadget start/stop function in your UDC driver is called. These functions should not delete the gadget
> 
> To investigate further, could you please enable the DWC3 functions in ftrace and check who is removing the gadget?
> I can also enable this on my system and compare the logs with yours, but I will be in PI planning for 1.5 weeks and may not be able to provide immediate support.

Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so 
trace is 600kb. I cut uninteresting stuff before 
configfs_composite_reset <-usb_gadget_udc_reset and after 
__dwc3_set_mode, < 300 lines remain. See attached tar.gz for you to compare.

> Additionally, please check if you have any customized DWC patches that may be causing this problem.
> 
>>
>>> You may recall the whole issue did not occur before this patch got applied.
>>>
>>>> Hardik
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>> kobject and sysfs interface.
>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>> is not followed in your use case or driver ?
>>>>>>
>>>>>> Yes, I'll try to trace where that happens.
>>>>>>
>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>> switch is not good, as the route remains.
>>>>>>
>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>> happens if you make eem connection and then unplug?
>>>>>>
>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>> kernel's gadget architecture.
>>>>>
>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>
>>>>>>>>
>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>> with this patch and
>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>> controller as well. But
>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> With Best Regards,
>>>>>>>> Andy Shevchenko
>>>>>>>>
>>>>>>>>
>>>>>>
>>>
Ferry Toth May 2, 2024, 8:32 p.m. UTC | #16
Oops, sorry, wrong file attached . Now correct one.

Op 02-05-2024 om 22:13 schreef Ferry Toth:
> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>> Hi,
>>>
>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>> Hi,
>>>>
>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>> target 
>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>> no
>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>
>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>
>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> ls: cannot access
>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>> No such file
>>>>>>>>>> or directory
>>>>>>>>>>
>>>>>>>>>> Then back to device mode:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>
>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>
>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>
>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>> two changes?
>>>>>>>>
>>>>>>>>
>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>> I could not reproduce
>>>>>>>> the issue.
>>>>>>>>
>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>
>>>>>>>> Following are the logs:
>>>>>>>> //Host to device
>>>>>>>>
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>> mode
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>> usb0
>>>>>>>>
>>>>>>>> //device to host
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > 
>>>>>>>> mode
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>> usb0
>>>>>>>
>>>>>>> That is weird. When I switch to host mode (using the physical 
>>>>>>> switch),
>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>
>>>>>>> Switching back to device mode, that gadget directory is 
>>>>>>> recreated. And
>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>
>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>> <
>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>> address             gro_flush_timeout  power
>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>> carrier             ifindex            queues
>>>>>>>> carrier_changes     iflink             speed
>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>> dev_port            netdev_group       type
>>>>>>>> device              operstate          uevent
>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>              BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>              collisions:0 txqueuelen:1000
>>>>>>>>              RX bytes:0 TX bytes:0
>>>>>>>>
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>
>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>> the usb0 interface remains available.
>>>>>>>
>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>
>>>>>>> I have while in host mode:
>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>>            inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>> 10.42.0.255
>>>>>>>            inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>> scopeid 0x20<link>
>>>>>>>
>>>>>>>
>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>
>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions 
>>>>>>>> involved
>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>
>>>>>>> Switching from device -> host -> device:
>>>>>>>
>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>      plugin 'function_graph'
>>>>>>> Hit Ctrl^C to stop recording
>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>        188 bytes in size (4096 uncompressed)
>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>        0 bytes in size (0 uncompressed)
>>>>>>> root@yuna:~# trace-cmd report
>>>>>>> cpus=2
>>>>>>>         irq/68-dwc3-725   [000]   514.575337: 
>>>>>>> funcgraph_entry:      #
>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.263731: 
>>>>>>> funcgraph_entry:      +
>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.263743: 
>>>>>>> funcgraph_entry:      !
>>>>>>> 116.520 us |  gether_connect();
>>>>>>>         irq/68-dwc3-946   [000]   524.268029: 
>>>>>>> funcgraph_entry:      #
>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.270089: 
>>>>>>> funcgraph_entry:      !
>>>>>>> 109.000 us |  gether_connect();
>>>>>>
>>>>>> I tried to get a more useful trace:
>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> 
>>>>>> set_ftrace_filter
>>>>>> -> switch to host mode then back to device
>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>> # tracer: function
>>>>>> #
>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>> #
>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>> #                               / _----=> need-resched
>>>>>> #                              | / _---=> hardirq/softirq
>>>>>> #                              || / _--=> preempt-depth
>>>>>> #                              ||| / _-=> migrate-disable
>>>>>> #                              |||| /     delay
>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>> #              | |         |   |||||     |         |
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>> <-__composite_disconnect
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>> <-reset_config
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.992276: 
>>>>>> gether_disconnect
>>>>>> <-reset_config
>>>>>>        kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>> <-purge_configs_funcs
>>>>>>
>>>>>> -> to device mode
>>>>>>
>>>>>>        kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>> <-usb_add_function
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>> <-composite_setup
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155215: 
>>>>>> gether_disconnect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>> <-composite_setup
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.157292: 
>>>>>> gether_disconnect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>> <-rx_complete
>>>>>> ...
>>>>>>
>>>>>> I don't know where to look exactly. Any hints?
>>>>>
>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>
>>>> Nope. It's a pitty that the trace formatting got messed up above. 
>>>> But as
>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>> function is called, until I flip the switch to device mode.
>>>> The ... at the end is where I cut uninteresting eem_unwrap 
>>>> <-rx_complete
>>>> and eem_wrap <-eth_start_xmit lines.
>>>>
>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>> network side to check who deleting the sysfs entry
>>>>
>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>> that would be related to the patch we're discussing here. I was hoping
>>>> for a little more targeted hint.
>>>
>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 
>>> 'usb_fun*',
>>> 'reset_config' and 'device_remove_file' leads me to:
>>>
>>> TIMESTAMP  FUNCTION
>>>     |         |
>>>    49.952477: eem_wrap <-eth_start_xmit
>>>    55.072455: eem_wrap <-eth_start_xmit
>>>    55.072621: eem_unwrap <-rx_complete
>>>    59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>    59.011545: composite_reset <-configfs_composite_reset
>>>    59.011548: reset_config <-__composite_disconnect
>>>    59.013565: eem_disable <-reset_config
>>>    59.013567: gether_disconnect <-reset_config
>>>    59.049560: device_remove_file <-device_remove
>>>    59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>    59.051189: composite_disconnect <-configfs_composite_discon
>>>    59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>    59.052519: eem_unbind <-purge_configs_funcs
>>>    59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>    59.052537: device_remove_file <-composite_dev_cleanup
>>>
>>> device_remove_file gets called twice, once by device_remove after
>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>> (removing the gadget)
>>
>> I believe that the device_remove_file function is only removing 
>> suspend-specific attributes, not the complete gadget.
>> Typically, when you perform the role switch, the Gadget start/stop 
>> function in your UDC driver is called. These functions should not 
>> delete the gadget
>>
>> To investigate further, could you please enable the DWC3 functions in 
>> ftrace and check who is removing the gadget?
>> I can also enable this on my system and compare the logs with yours, 
>> but I will be in PI planning for 1.5 weeks and may not be able to 
>> provide immediate support.
> 
> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so 
> trace is 600kb. I cut uninteresting stuff before 
> configfs_composite_reset <-usb_gadget_udc_reset and after 
> __dwc3_set_mode, < 300 lines remain. See attached tar.gz for you to 
> compare.
> 
>> Additionally, please check if you have any customized DWC patches that 
>> may be causing this problem.
>>
>>>
>>>> You may recall the whole issue did not occur before this patch got 
>>>> applied.
>>>>
>>>>> Hardik
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>> kobject and sysfs interface.
>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>> is not followed in your use case or driver ?
>>>>>>>
>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>
>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>> switch is not good, as the route remains.
>>>>>>>
>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>
>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>> kernel's gadget architecture.
>>>>>>
>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>
>>>>>>>>>
>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>> with this patch and
>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>> controller as well. But
>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> With Best Regards,
>>>>>>>>> Andy Shevchenko
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>
Linux regression tracking (Thorsten Leemhuis) May 3, 2024, 7:24 a.m. UTC | #17
Greg, not totally sure, but this might be something where you might need
to make a judgement call. See below:

On 02.05.24 18:16, Hardik Gajjar wrote:
> On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
>> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>> 

>> ...
>> 
>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>
>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>
>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>> ls: cannot access
>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>> No such file
>>>>>>>>>>> or directory
>>>>>>>>>>>
>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>
>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>
>>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>>
>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>>> two changes?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>>> I could not reproduce
>>>>>>>>> the issue.
>>>>>>>>>
>>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>
>>>>>>>>> Following are the logs:
>>>>>>>>> //Host to device
>>>>>>>>>
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>>> mode
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>> usb0
>>>>>>>>>
>>>>>>>>> //device to host
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>> usb0
>>>>>>>>
>>>>>>>> That is weird. When I switch to host mode (using the physical switch),
>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>
>>>>>>>> Switching back to device mode, that gadget directory is recreated. And
>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>
>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>> <
>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>> carrier             ifindex            queues
>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>> device              operstate          uevent
>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>>             collisions:0 txqueuelen:1000
>>>>>>>>>             RX bytes:0 TX bytes:0
>>>>>>>>>
>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>
>>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>>> the usb0 interface remains available.
>>>>>>>>
>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>
>>>>>>>> I have while in host mode:
>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>> 10.42.0.255
>>>>>>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
>>>>>>>> scopeid 0x20<link>
>>>>>>>>
>>>>>>>>
>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>
>>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>
>>>>>>>> Switching from device -> host -> device:
>>>>>>>>
>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>     plugin 'function_graph'
>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>       188 bytes in size (4096 uncompressed)
>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>       0 bytes in size (0 uncompressed)
>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>> cpus=2
>>>>>>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>
>>>>>>> I tried to get a more useful trace:
>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>>>>>> -> switch to host mode then back to device
>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>> # tracer: function
>>>>>>> #
>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>> #
>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>> #                               / _----=> need-resched
>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>> #                              || / _--=> preempt-depth
>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>> #                              |||| /     delay
>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>> #              | |         |   |||||     |         |
>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>>> <-__composite_disconnect
>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>> <-reset_config
>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>>>>>>> <-reset_config
>>>>>>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>> <-purge_configs_funcs
>>>>>>>
>>>>>>> -> to device mode
>>>>>>>
>>>>>>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>> <-usb_add_function
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>> <-composite_setup
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>> <-composite_setup
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>>>> <-eem_set_alt
>>>>>>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>> <-rx_complete
>>>>>>> ...
>>>>>>>
>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>
>>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>>
>>>>> Nope. It's a pitty that the trace formatting got messed up above. But as
>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>> function is called, until I flip the switch to device mode.
>>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>
>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>> network side to check who deleting the sysfs entry
>>>>>
>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>> that would be related to the patch we're discussing here. I was hoping
>>>>> for a little more targeted hint.
>>>>
>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>
>>>> TIMESTAMP  FUNCTION
>>>>    |         |
>>>>   49.952477: eem_wrap <-eth_start_xmit
>>>>   55.072455: eem_wrap <-eth_start_xmit
>>>>   55.072621: eem_unwrap <-rx_complete
>>>>   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>   59.011545: composite_reset <-configfs_composite_reset
>>>>   59.011548: reset_config <-__composite_disconnect
>>>>   59.013565: eem_disable <-reset_config
>>>>   59.013567: gether_disconnect <-reset_config
>>>>   59.049560: device_remove_file <-device_remove
>>>>   59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>   59.051189: composite_disconnect <-configfs_composite_discon
>>>>   59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>   59.052519: eem_unbind <-purge_configs_funcs
>>>>   59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>   59.052537: device_remove_file <-composite_dev_cleanup
>>>>
>>>> device_remove_file gets called twice, once by device_remove after
>>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>>> (removing the gadget)
>>>
>>> I believe that the device_remove_file function is only removing
>>> suspend-specific attributes, not the complete gadget.  Typically, when you
>>> perform the role switch, the Gadget start/stop function in your UDC driver is
>>> called. These functions should not delete the gadget
>>>
>>> To investigate further, could you please enable the DWC3 functions in ftrace
>>> and check who is removing the gadget?  I can also enable this on my system
>>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks
>>> and may not be able to provide immediate support.
>>
>> Since we are almost at -rc7, I propose to revert and try again next cycle.
> 
> Why? There is no problem with this patch!
> 
> I don't know why you are so excited to revert the patch instead of
> investigating the original issue. Even though I have proved that the
> problem is caused by usb0 being removed just by role-switching the
> port by the UDC driver, this behavior is incorrect.
> 
> There is no behavior in the Linux kernel that keeps the network
> interface and removes the related sysfs entry. THIS IS WRONG and has
> been FIXED without reverting any mainline patch.
> 
> Please don't encourage reverting any mainstream patches to avoid
> investigating a (probably) faulty driver. This is not how the
> community should work.

I only skimmed the thread and this is not my area of expertise, so
correct me if I'm wrong anywhere here:

But from what I see I tend to disagree: the patch mentioned in the
subject apparently regressed something. This afaics was reported 16
weeks ago during the 6.7 cycle[1]. Due to how we apply the "no
regressions" rule[2] the change causing this ideally should have been
reverted before 6.8 was release -- it does not matter that it just
exposed an existing problem rooted somewhere else. That revert did not
happen. Then even a few more weeks went by and this is still not fixed.
Then I'd guess reverting this might be the right course of action to
motivate someone to fix the exposed problem. Unless of course we know or
fear that a revert causes regressions for users of 6.8.y -- is that the
case?

[1] https://lore.kernel.org/all/ZaQS5x-XK08Jre6I@smile.fi.intel.com/

[2] https://docs.kernel.org/process/handling-regressions.html

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> Also, I'm a bit wondering: the patch has been in the mainline for
> some time, and we haven't had any issues except this one, which is
> due to the wrong behavior of the UDC driver.
>>
>>> Additionally, please check if you have any customized DWC patches that may be causing this problem.
>>>
>>>>> You may recall the whole issue did not occur before this patch got applied.
>>>>>
>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>> kobject and sysfs interface.
>>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>
>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>
>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>>> switch is not good, as the route remains.
>>>>>>>>
>>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>
>>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>>> kernel's gadget architecture.
>>>>>>>
>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>
>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>> controller as well. But
>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>
>> -- 
>> With Best Regards,
>> Andy Shevchenko
>>
>>
Hardik Gajjar May 3, 2024, 9:15 a.m. UTC | #18
On Fri, May 03, 2024 at 09:24:16AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> Greg, not totally sure, but this might be something where you might need
> to make a judgement call. See below:
> 
> On 02.05.24 18:16, Hardik Gajjar wrote:
> > On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
> >> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
> >>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> >>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
> >>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> >>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> >>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
> >>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> >>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> >>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> >>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> >>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> >>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
> >>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> >>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> >> 
> 
> >> ...
> >> 
> >>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
> >>>>>>>>>>>>>>> removed and it is a link, the link
> >>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> >>>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>> longer exists
> >>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
> >>>>>>>>>>>>> present, but the symlink is
> >>>>>>>>>>>>> broken. In that case, the dwc3 driver should
> >>>>>>>>>>>>> recreate the device, and the
> >>>>>>>>>>>>> symlink should become active again
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ls
> >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> >>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
> >>>>>>>>>>>
> >>>>>>>>>>> Then switching to host mode:
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ls
> >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> >>>>>>>>>>> ls: cannot access
> >>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> >>>>>>>>>>> No such file
> >>>>>>>>>>> or directory
> >>>>>>>>>>>
> >>>>>>>>>>> Then back to device mode:
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ls
> >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> >>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
> >>>>>>>>>>>
> >>>>>>>>>>> net is missing. But, network functions:
> >>>>>>>>>>>
> >>>>>>>>>>> root@yuna:~# ping 10.42.0.1
> >>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
> >>>>>>>>>>>
> >>>>>>>>>>> Mass storage device is created and removed each time as expected.
> >>>>>>>>>>
> >>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
> >>>>>>>>>> two changes?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
> >>>>>>>>> I could not reproduce
> >>>>>>>>> the issue.
> >>>>>>>>>
> >>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
> >>>>>>>>> the role switching of the USB mode (peripheral <-> host)
> >>>>>>>>>
> >>>>>>>>> Following are the logs:
> >>>>>>>>> //Host to device
> >>>>>>>>>
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> >>>>>>>>>> mode
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
> >>>>>>>>> a800000.dwc3/gadget/net/
> >>>>>>>>> usb0
> >>>>>>>>>
> >>>>>>>>> //device to host
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
> >>>>>>>>> a800000.dwc3/gadget/net/
> >>>>>>>>> usb0
> >>>>>>>>
> >>>>>>>> That is weird. When I switch to host mode (using the physical switch),
> >>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
> >>>>>>>>
> >>>>>>>> Switching back to device mode, that gadget directory is recreated. And
> >>>>>>>> gadget/sound as well, but not gadget/net.
> >>>>>>>>
> >>>>>>>>> s a800000.dwc3/gadget/net/usb0
> >>>>>>>>> <
> >>>>>>>>> addr_assign_type    duplex             phys_port_name
> >>>>>>>>> addr_len            flags              phys_switch_id
> >>>>>>>>> address             gro_flush_timeout  power
> >>>>>>>>> broadcast           ifalias            proto_down
> >>>>>>>>> carrier             ifindex            queues
> >>>>>>>>> carrier_changes     iflink             speed
> >>>>>>>>> carrier_down_count  link_mode          statistics
> >>>>>>>>> carrier_up_count    mtu                subsystem
> >>>>>>>>> dev_id              name_assign_type   tx_queue_len
> >>>>>>>>> dev_port            netdev_group       type
> >>>>>>>>> device              operstate          uevent
> >>>>>>>>> dormant             phys_port_id       waiting_for_supplier
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> >>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> >>>>>>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
> >>>>>>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> >>>>>>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> >>>>>>>>>             collisions:0 txqueuelen:1000
> >>>>>>>>>             RX bytes:0 TX bytes:0
> >>>>>>>>>
> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
> >>>>>>>>>
> >>>>>>>>> I strongly advise against reverting the patch solely based on the
> >>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
> >>>>>>>>> the usb0 interface remains available.
> >>>>>>>>
> >>>>>>>> There's more to it. I also mentioned that switching the role or
> >>>>>>>> unplugging the cable leaves the usb0 connection.
> >>>>>>>>
> >>>>>>>> I have while in host mode:
> >>>>>>>> root@yuna:~# ifconfig -a usb0
> >>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> >>>>>>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast
> >>>>>>>> 10.42.0.255
> >>>>>>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
> >>>>>>>> scopeid 0x20<link>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> You don't see that because you didn't create a connection at all.
> >>>>>>>>
> >>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
> >>>>>>>>> and identify which faulty call is responsible for removing usb0.
> >>>>>>>>
> >>>>>>>> Switching from device -> host -> device:
> >>>>>>>>
> >>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> >>>>>>>>     plugin 'function_graph'
> >>>>>>>> Hit Ctrl^C to stop recording
> >>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
> >>>>>>>>       188 bytes in size (4096 uncompressed)
> >>>>>>>> CPU1 data recorded at offset=0x1c9000
> >>>>>>>>       0 bytes in size (0 uncompressed)
> >>>>>>>> root@yuna:~# trace-cmd report
> >>>>>>>> cpus=2
> >>>>>>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
> >>>>>>>> 2079.480 us |  gether_disconnect();
> >>>>>>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
> >>>>>>>> 11.640 us  |  gether_disconnect();
> >>>>>>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
> >>>>>>>> 116.520 us |  gether_connect();
> >>>>>>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
> >>>>>>>> 2057.260 us |  gether_disconnect();
> >>>>>>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
> >>>>>>>> 109.000 us |  gether_connect();
> >>>>>>>
> >>>>>>> I tried to get a more useful trace:
> >>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> >>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> >>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
> >>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
> >>>>>>> -> switch to host mode then back to device
> >>>>>>> root@yuna:/sys/kernel/tracing# cat trace
> >>>>>>> # tracer: function
> >>>>>>> #
> >>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
> >>>>>>> #
> >>>>>>> #                                _-----=> irqs-off/BH-disabled
> >>>>>>> #                               / _----=> need-resched
> >>>>>>> #                              | / _---=> hardirq/softirq
> >>>>>>> #                              || / _--=> preempt-depth
> >>>>>>> #                              ||| / _-=> migrate-disable
> >>>>>>> #                              |||| /     delay
> >>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> >>>>>>> #              | |         |   |||||     |         |
> >>>>>>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> >>>>>>> <-__composite_disconnect
> >>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> >>>>>>> <-reset_config
> >>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
> >>>>>>> <-reset_config
> >>>>>>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> >>>>>>> <-purge_configs_funcs
> >>>>>>>
> >>>>>>> -> to device mode
> >>>>>>>
> >>>>>>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> >>>>>>> <-usb_add_function
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> >>>>>>> <-composite_setup
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
> >>>>>>> <-eem_set_alt
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> >>>>>>> <-eem_set_alt
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> >>>>>>> <-composite_setup
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
> >>>>>>> <-eem_set_alt
> >>>>>>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> >>>>>>> <-eem_set_alt
> >>>>>>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
> >>>>>>> <-rx_complete
> >>>>>>> ...
> >>>>>>>
> >>>>>>> I don't know where to look exactly. Any hints?
> >>>>>>
> >>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
> >>>>>
> >>>>> Nope. It's a pitty that the trace formatting got messed up above. But as
> >>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
> >>>>> function is called, until I flip the switch to device mode.
> >>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
> >>>>> and eem_wrap <-eth_start_xmit lines.
> >>>>>
> >>>>>> If not then, you may try to enable tracing of TCP/IP stack and
> >>>>>> network side to check who deleting the sysfs entry
> >>>>>
> >>>>> Yes, that's a vast amount of functions to trace. And I don't see how
> >>>>> that would be related to the patch we're discussing here. I was hoping
> >>>>> for a little more targeted hint.
> >>>>
> >>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
> >>>> 'reset_config' and 'device_remove_file' leads me to:
> >>>>
> >>>> TIMESTAMP  FUNCTION
> >>>>    |         |
> >>>>   49.952477: eem_wrap <-eth_start_xmit
> >>>>   55.072455: eem_wrap <-eth_start_xmit
> >>>>   55.072621: eem_unwrap <-rx_complete
> >>>>   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> >>>>   59.011545: composite_reset <-configfs_composite_reset
> >>>>   59.011548: reset_config <-__composite_disconnect
> >>>>   59.013565: eem_disable <-reset_config
> >>>>   59.013567: gether_disconnect <-reset_config
> >>>>   59.049560: device_remove_file <-device_remove
> >>>>   59.051185: configfs_composite_disconnect <-usb_gadget_disco
> >>>>   59.051189: composite_disconnect <-configfs_composite_discon
> >>>>   59.051195: configfs_composite_unbind <-gadget_unbind_driver
> >>>>   59.052519: eem_unbind <-purge_configs_funcs
> >>>>   59.052529: composite_dev_cleanup <-configfs_composite_unbin
> >>>>   59.052537: device_remove_file <-composite_dev_cleanup
> >>>>
> >>>> device_remove_file gets called twice, once by device_remove after
> >>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> >>>> (removing the gadget)
> >>>
> >>> I believe that the device_remove_file function is only removing
> >>> suspend-specific attributes, not the complete gadget.  Typically, when you
> >>> perform the role switch, the Gadget start/stop function in your UDC driver is
> >>> called. These functions should not delete the gadget
> >>>
> >>> To investigate further, could you please enable the DWC3 functions in ftrace
> >>> and check who is removing the gadget?  I can also enable this on my system
> >>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks
> >>> and may not be able to provide immediate support.
> >>
> >> Since we are almost at -rc7, I propose to revert and try again next cycle.
> > 
> > Why? There is no problem with this patch!
> > 
> > I don't know why you are so excited to revert the patch instead of
> > investigating the original issue. Even though I have proved that the
> > problem is caused by usb0 being removed just by role-switching the
> > port by the UDC driver, this behavior is incorrect.
> > 
> > There is no behavior in the Linux kernel that keeps the network
> > interface and removes the related sysfs entry. THIS IS WRONG and has
> > been FIXED without reverting any mainline patch.
> > 
> > Please don't encourage reverting any mainstream patches to avoid
> > investigating a (probably) faulty driver. This is not how the
> > community should work.
> 
> I only skimmed the thread and this is not my area of expertise, so
> correct me if I'm wrong anywhere here:
> 
> But from what I see I tend to disagree: the patch mentioned in the
> subject apparently regressed something. This afaics was reported 16
> weeks ago during the 6.7 cycle[1]. Due to how we apply the "no
> regressions" rule[2] the change causing this ideally should have been
> reverted before 6.8 was release -- it does not matter that it just
> exposed an existing problem rooted somewhere else. That revert did not
> happen. Then even a few more weeks went by and this is still not fixed.
> Then I'd guess reverting this might be the right course of action to
> motivate someone to fix the exposed problem. Unless of course we know or
> fear that a revert causes regressions for users of 6.8.y -- is that the
> case?
> 
> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_ZaQS5x-2DXK08Jre6I-40smile.fi.intel.com_&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=HBokvujOYQdyCxk3eMV6xtwLLNdCxjsnZ4nTzbhbLXM&e=
> 
> [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.kernel.org_process_handling-2Dregressions.html&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=lOh1a5Vzu2VNEaspoNgEipxRU9UDDPcibRLoOVrGbIU&e=
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__linux-2Dregtracking.leemhuis.info_about_-23tldr&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=MuVuulVZqYAjIqCVpNc01ve8yD6DFt2wOZEH4p9khTc&e=
> If I did something stupid, please tell me, as explained on that page.

I understand your perspective. Do you have documentation or guidelines available to ensure that the existing problem is resolved and the correct patch is reinstated?

As far as I know, the issue is currently only reproducible by the reporter. I also have a similar setup, but I have not been able to reproduce the issue myself (please refer to our previous discussions). This discrepancy could be due to some OEM modifications.

If the problem is only reproducible by the reporter, how do we motivate others to support and upstream the fix? Isn't it demotivating to revert the correct patch and retain the wrong behavior of the driver?

Additionally, is it possible to determine how many correct patches have been reverted in the past to address this particular problem?

Please correct me if I am wrong ?

> 
> > Also, I'm a bit wondering: the patch has been in the mainline for
> > some time, and we haven't had any issues except this one, which is
> > due to the wrong behavior of the UDC driver.
> >>
> >>> Additionally, please check if you have any customized DWC patches that may be causing this problem.
> >>>
> >>>>> You may recall the whole issue did not occur before this patch got applied.
> >>>>>
> >>>>>>>>> According to current kernel architecture of u_ether driver, only
> >>>>>>>>> gether_cleanup should remove the usb0 interface along with its
> >>>>>>>>> kobject and sysfs interface.
> >>>>>>>>> I suggest sharing the analysis here to understand why this practice
> >>>>>>>>> is not followed in your use case or driver ?
> >>>>>>>>
> >>>>>>>> Yes, I'll try to trace where that happens.
> >>>>>>>>
> >>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
> >>>>>>>> harmless? But the usb: net device remaining after disconnect or role
> >>>>>>>> switch is not good, as the route remains.
> >>>>>>>>
> >>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
> >>>>>>>> happens if you make eem connection and then unplug?
> >>>>>>>>
> >>>>>>>>> I am curious why the driver was developed without adhering to the
> >>>>>>>>> kernel's gadget architecture.
> >>>>>>>
> >>>>>>> I don't know what you mean here. Which driver do you mean?
> >>>>>>>
> >>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
> >>>>>>>>>>>>> with this patch and
> >>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
> >>>>>>>>>>> Would have been nice if someone could test on other
> >>>>>>>>>>> controller as well. But
> >>>>>>>>>>> another instance of dwc3 is also very welcome.
> >>>>>>>>>>>> It's quite possible, please test on your side.
> >>>>>>>>>>>> We are happy to test any fixes if you come up with.
> >>
> >> -- 
> >> With Best Regards,
> >> Andy Shevchenko
> >>
> >>
Linux regression tracking (Thorsten Leemhuis) May 3, 2024, 12:39 p.m. UTC | #19
On 03.05.24 11:15, Hardik Gajjar wrote:
> On Fri, May 03, 2024 at 09:24:16AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> Greg, not totally sure, but this might be something where you might need
>> to make a judgement call. See below:
>>
>> On 02.05.24 18:16, Hardik Gajjar wrote:
>>> On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote:
>>>> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote:
>>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>
>>
>>>> ...
>>>>
>>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>>> No such file
>>>>>>>>>>>>> or directory
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>
>>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>>>>
>>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>>>>> two changes?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>>>>> I could not reproduce
>>>>>>>>>>> the issue.
>>>>>>>>>>>
>>>>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>>
>>>>>>>>>>> Following are the logs:
>>>>>>>>>>> //Host to device
>>>>>>>>>>>
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>>>>> mode
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>> usb0
>>>>>>>>>>>
>>>>>>>>>>> //device to host
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>> usb0
>>>>>>>>>>
>>>>>>>>>> That is weird. When I switch to host mode (using the physical switch),
>>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>>
>>>>>>>>>> Switching back to device mode, that gadget directory is recreated. And
>>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>>
>>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>>> <
>>>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>>>> carrier             ifindex            queues
>>>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>>>> device              operstate          uevent
>>>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>>>             BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>>>             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>>>>             collisions:0 txqueuelen:1000
>>>>>>>>>>>             RX bytes:0 TX bytes:0
>>>>>>>>>>>
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>>
>>>>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>>
>>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>>
>>>>>>>>>> I have while in host mode:
>>>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>>>>>           inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>>>> 10.42.0.255
>>>>>>>>>>           inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64 
>>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>>>
>>>>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>>>
>>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>>>     plugin 'function_graph'
>>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>>>       188 bytes in size (4096 uncompressed)
>>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>>>       0 bytes in size (0 uncompressed)
>>>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>>>> cpus=2
>>>>>>>>>>        irq/68-dwc3-725   [000]   514.575337: funcgraph_entry:      #
>>>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>>>        irq/68-dwc3-946   [000]   524.263731: funcgraph_entry:      +
>>>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>>>        irq/68-dwc3-946   [000]   524.263743: funcgraph_entry:      !
>>>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>>>        irq/68-dwc3-946   [000]   524.268029: funcgraph_entry:      #
>>>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>>>        irq/68-dwc3-946   [000]   524.270089: funcgraph_entry:      !
>>>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>>>
>>>>>>>>> I tried to get a more useful trace:
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>>>>>>>> -> switch to host mode then back to device
>>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>>>> # tracer: function
>>>>>>>>> #
>>>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>>>> #
>>>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>>>> #                               / _----=> need-resched
>>>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>>>> #                              || / _--=> preempt-depth
>>>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>>>> #                              |||| /     delay
>>>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>>>> #              | |         |   |||||     |         |
>>>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>>>>> <-__composite_disconnect
>>>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>>>> <-reset_config
>>>>>>>>>       irq/68-dwc3-523     [000] D..3.   133.992276: gether_disconnect
>>>>>>>>> <-reset_config
>>>>>>>>>       kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>>>> <-purge_configs_funcs
>>>>>>>>>
>>>>>>>>> -> to device mode
>>>>>>>>>
>>>>>>>>>       kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>>>> <-usb_add_function
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>>>> <-composite_setup
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155215: gether_disconnect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>>>> <-composite_setup
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.157292: gether_disconnect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>       irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>       irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>>>> <-rx_complete
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>>
>>>>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>>>>
>>>>>>> Nope. It's a pitty that the trace formatting got messed up above. But as
>>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>>> function is called, until I flip the switch to device mode.
>>>>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
>>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>>
>>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>>> network side to check who deleting the sysfs entry
>>>>>>>
>>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>>>> that would be related to the patch we're discussing here. I was hoping
>>>>>>> for a little more targeted hint.
>>>>>>
>>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
>>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>>
>>>>>> TIMESTAMP  FUNCTION
>>>>>>    |         |
>>>>>>   49.952477: eem_wrap <-eth_start_xmit
>>>>>>   55.072455: eem_wrap <-eth_start_xmit
>>>>>>   55.072621: eem_unwrap <-rx_complete
>>>>>>   59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>>>   59.011545: composite_reset <-configfs_composite_reset
>>>>>>   59.011548: reset_config <-__composite_disconnect
>>>>>>   59.013565: eem_disable <-reset_config
>>>>>>   59.013567: gether_disconnect <-reset_config
>>>>>>   59.049560: device_remove_file <-device_remove
>>>>>>   59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>>>   59.051189: composite_disconnect <-configfs_composite_discon
>>>>>>   59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>>>   59.052519: eem_unbind <-purge_configs_funcs
>>>>>>   59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>>>   59.052537: device_remove_file <-composite_dev_cleanup
>>>>>>
>>>>>> device_remove_file gets called twice, once by device_remove after
>>>>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>>>>> (removing the gadget)
>>>>>
>>>>> I believe that the device_remove_file function is only removing
>>>>> suspend-specific attributes, not the complete gadget.  Typically, when you
>>>>> perform the role switch, the Gadget start/stop function in your UDC driver is
>>>>> called. These functions should not delete the gadget
>>>>>
>>>>> To investigate further, could you please enable the DWC3 functions in ftrace
>>>>> and check who is removing the gadget?  I can also enable this on my system
>>>>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks
>>>>> and may not be able to provide immediate support.
>>>>
>>>> Since we are almost at -rc7, I propose to revert and try again next cycle.
>>>
>>> Why? There is no problem with this patch!
>>>
>>> I don't know why you are so excited to revert the patch instead of
>>> investigating the original issue. Even though I have proved that the
>>> problem is caused by usb0 being removed just by role-switching the
>>> port by the UDC driver, this behavior is incorrect.
>>>
>>> There is no behavior in the Linux kernel that keeps the network
>>> interface and removes the related sysfs entry. THIS IS WRONG and has
>>> been FIXED without reverting any mainline patch.
>>>
>>> Please don't encourage reverting any mainstream patches to avoid
>>> investigating a (probably) faulty driver. This is not how the
>>> community should work.
>>
>> I only skimmed the thread and this is not my area of expertise, so
>> correct me if I'm wrong anywhere here:
>>
>> But from what I see I tend to disagree: the patch mentioned in the
>> subject apparently regressed something. This afaics was reported 16
>> weeks ago during the 6.7 cycle[1]. Due to how we apply the "no
>> regressions" rule[2] the change causing this ideally should have been
>> reverted before 6.8 was release -- it does not matter that it just
>> exposed an existing problem rooted somewhere else. That revert did not
>> happen. Then even a few more weeks went by and this is still not fixed.
>> Then I'd guess reverting this might be the right course of action to
>> motivate someone to fix the exposed problem. Unless of course we know or
>> fear that a revert causes regressions for users of 6.8.y -- is that the
>> case?
>>
>> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_ZaQS5x-2DXK08Jre6I-40smile.fi.intel.com_&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=HBokvujOYQdyCxk3eMV6xtwLLNdCxjsnZ4nTzbhbLXM&e=
>>
>> [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.kernel.org_process_handling-2Dregressions.html&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=lOh1a5Vzu2VNEaspoNgEipxRU9UDDPcibRLoOVrGbIU&e=
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__linux-2Dregtracking.leemhuis.info_about_-23tldr&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=MuVuulVZqYAjIqCVpNc01ve8yD6DFt2wOZEH4p9khTc&e=
>> If I did something stupid, please tell me, as explained on that page.
> 
> I understand your perspective. Do you have documentation or
> guidelines available to ensure that the existing problem is resolved
> and the correct patch is reinstated?

The second link I gave (e.g. to a rendered version of
Documentation/process/handling-regressions.rst ) has quite a few quotes
from Linus which iirc should support this. Apart from this there is only
the lore archive and the the git log afaik.

> As far as I know, the issue is currently only reproducible by the
> reporter. I also have a similar setup, but I have not been able to
> reproduce the issue myself (please refer to our previous
> discussions). This discrepancy could be due to some OEM
> modifications.

To the board/machine in question or the kernel used beforehand? If the
former: that does not matter much. In the latter: than of course it's
not a regression.

> If the problem is only reproducible by the reporter, how do we
> motivate others to support and upstream the fix?

The general idea afaics is: there was someone that for some reason was
motivated to bring in the patch that caused the regression (indirectly
in this case); with a bit of luck that someone after the revert wants it
to be included again and thus is motivated to fix it (or search for the
underlying issue and fix that in this case).

Of course that does not always work out. OTOH with a bit of luck the
other developers will help with that, too.

And there are gray areas, but they are not that big. Not sure if this is
one, that in the end is up to Greg or Linus to decide.

> Isn't it
> demotivating to revert the correct patch and retain the wrong
> behavior of the driver?

Sure, but I'd say it's even more demotivating if things break for users
when they update their kernel, as then they won't update the next time
and stay on old versions with bugs and vulnerabilities. As that's afaics
what Linus wants to prevent -- and why he made the rule and established
current practices.

> Additionally, is it possible to determine how many correct patches
> have been reverted in the past to address this particular problem?

git log?

> Please correct me if I am wrong ?

HTH, Ciao, Thorsten

>>> Also, I'm a bit wondering: the patch has been in the mainline for
>>> some time, and we haven't had any issues except this one, which is
>>> due to the wrong behavior of the UDC driver.
>>>>
>>>>> Additionally, please check if you have any customized DWC patches that may be causing this problem.
>>>>>
>>>>>>> You may recall the whole issue did not occur before this patch got applied.
>>>>>>>
>>>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>>
>>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>>
>>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>>
>>>>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>>
>>>>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>>
>>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>>
>>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>
>>>> -- 
>>>> With Best Regards,
>>>> Andy Shevchenko
>>>>
>>>>
> 
>
Hardik Gajjar May 10, 2024, 9:45 a.m. UTC | #20
On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
> Oops, sorry, wrong file attached . Now correct one.
> 
> Op 02-05-2024 om 22:13 schreef Ferry Toth:
> > Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
> > > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
> > > > Hi,
> > > > 
> > > > Op 30-04-2024 om 21:40 schreef Ferry Toth:
> > > > > Hi,
> > > > > 
> > > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
> > > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
> > > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
> > > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
> > > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
> > > > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
> > > > > > > > > > > > > On Wed, Apr 10, 2024 at
> > > > > > > > > > > > > 08:37:42PM +0300, Andy
> > > > > > > > > > > > > Shevchenko wrote:
> > > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
> > > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
> > > > > > > > > > 
> > > > > > > > > > ...
> > > > > > > > > > 
> > > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not
> > > > > > > > > > > > > > > removed and it is a link, the link
> > > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
> > > > > > > > > > > > > > > no
> > > > > > > > > > > > > > > longer exists
> > > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is
> > > > > > > > > > > > > present, but the symlink is
> > > > > > > > > > > > > broken. In that case, the dwc3 driver should
> > > > > > > > > > > > > recreate the device, and the
> > > > > > > > > > > > > symlink should become active again
> > > > > > > > > > > 
> > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated):
> > > > > > > > > > > 
> > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > driver  net  power  sound  subsystem  suspended  uevent
> > > > > > > > > > > 
> > > > > > > > > > > Then switching to host mode:
> > > > > > > > > > > 
> > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > ls: cannot access
> > > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
> > > > > > > > > > > No such file
> > > > > > > > > > > or directory
> > > > > > > > > > > 
> > > > > > > > > > > Then back to device mode:
> > > > > > > > > > > 
> > > > > > > > > > > root@yuna:~# ls
> > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
> > > > > > > > > > > driver  power  sound  subsystem  suspended  uevent
> > > > > > > > > > > 
> > > > > > > > > > > net is missing. But, network functions:
> > > > > > > > > > > 
> > > > > > > > > > > root@yuna:~# ping 10.42.0.1
> > > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes
> > > > > > > > > > > 
> > > > > > > > > > > Mass storage device is created and removed each time as expected.
> > > > > > > > > > 
> > > > > > > > > > So, what's the conclusion? Shall we move towards revert of those
> > > > > > > > > > two changes?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > As promised, I have the tested the this patch with the dwc3 gadget.
> > > > > > > > > I could not reproduce
> > > > > > > > > the issue.
> > > > > > > > > 
> > > > > > > > > I can see the usb0 exist all the time and accessible regardless of
> > > > > > > > > the role switching of the USB mode (peripheral <-> host)
> > > > > > > > > 
> > > > > > > > > Following are the logs:
> > > > > > > > > //Host to device
> > > > > > > > > 
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
> > > > > > > > > > mode
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > > usb0
> > > > > > > > > 
> > > > > > > > > //device to host
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb
> > > > > > > > > # echo "host" > mode
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls
> > > > > > > > > a800000.dwc3/gadget/net/
> > > > > > > > > usb0
> > > > > > > > 
> > > > > > > > That is weird. When I switch to host mode (using
> > > > > > > > the physical switch),
> > > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5)
> > > > > > > > 
> > > > > > > > Switching back to device mode, that gadget
> > > > > > > > directory is recreated. And
> > > > > > > > gadget/sound as well, but not gadget/net.
> > > > > > > > 
> > > > > > > > > s a800000.dwc3/gadget/net/usb0
> > > > > > > > > <
> > > > > > > > > addr_assign_type    duplex             phys_port_name
> > > > > > > > > addr_len            flags              phys_switch_id
> > > > > > > > > address             gro_flush_timeout  power
> > > > > > > > > broadcast           ifalias            proto_down
> > > > > > > > > carrier             ifindex            queues
> > > > > > > > > carrier_changes     iflink             speed
> > > > > > > > > carrier_down_count  link_mode          statistics
> > > > > > > > > carrier_up_count    mtu                subsystem
> > > > > > > > > dev_id              name_assign_type   tx_queue_len
> > > > > > > > > dev_port            netdev_group       type
> > > > > > > > > device              operstate          uevent
> > > > > > > > > dormant             phys_port_id       waiting_for_supplier
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
> > > > > > > > > usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
> > > > > > > > >              BROADCAST MULTICAST  MTU:1500  Metric:1
> > > > > > > > >              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > > > > > > > >              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > > > > > > > >              collisions:0 txqueuelen:1000
> > > > > > > > >              RX bytes:0 TX bytes:0
> > > > > > > > > 
> > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb #
> > > > > > > > > 
> > > > > > > > > I strongly advise against reverting the patch solely based on the
> > > > > > > > > observed issue of removing the /sys/class/net/usb0 directory while
> > > > > > > > > the usb0 interface remains available.
> > > > > > > > 
> > > > > > > > There's more to it. I also mentioned that switching the role or
> > > > > > > > unplugging the cable leaves the usb0 connection.
> > > > > > > > 
> > > > > > > > I have while in host mode:
> > > > > > > > root@yuna:~# ifconfig -a usb0
> > > > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
> > > > > > > >            inet 10.42.0.221  netmask 255.255.255.0  broadcast
> > > > > > > > 10.42.0.255
> > > > > > > >            inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
> > > > > > > > scopeid 0x20<link>
> > > > > > > > 
> > > > > > > > 
> > > > > > > > You don't see that because you didn't create a connection at all.
> > > > > > > > 
> > > > > > > > > Instead, I recommend enabling FTRACE to
> > > > > > > > > trace the functions involved
> > > > > > > > > and identify which faulty call is responsible for removing usb0.
> > > > > > > > 
> > > > > > > > Switching from device -> host -> device:
> > > > > > > > 
> > > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_*
> > > > > > > >      plugin 'function_graph'
> > > > > > > > Hit Ctrl^C to stop recording
> > > > > > > > ^CCPU0 data recorded at offset=0x1c8000
> > > > > > > >        188 bytes in size (4096 uncompressed)
> > > > > > > > CPU1 data recorded at offset=0x1c9000
> > > > > > > >        0 bytes in size (0 uncompressed)
> > > > > > > > root@yuna:~# trace-cmd report
> > > > > > > > cpus=2
> > > > > > > >         irq/68-dwc3-725   [000]   514.575337:
> > > > > > > > funcgraph_entry:      #
> > > > > > > > 2079.480 us |  gether_disconnect();
> > > > > > > >         irq/68-dwc3-946   [000]   524.263731:
> > > > > > > > funcgraph_entry:      +
> > > > > > > > 11.640 us  |  gether_disconnect();
> > > > > > > >         irq/68-dwc3-946   [000]   524.263743:
> > > > > > > > funcgraph_entry:      !
> > > > > > > > 116.520 us |  gether_connect();
> > > > > > > >         irq/68-dwc3-946   [000]   524.268029:
> > > > > > > > funcgraph_entry:      #
> > > > > > > > 2057.260 us |  gether_disconnect();
> > > > > > > >         irq/68-dwc3-946   [000]   524.270089:
> > > > > > > > funcgraph_entry:      !
> > > > > > > > 109.000 us |  gether_connect();
> > > > > > > 
> > > > > > > I tried to get a more useful trace:
> > > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
> > > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
> > > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer
> > > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config'
> > > > > > > >> set_ftrace_filter
> > > > > > > -> switch to host mode then back to device
> > > > > > > root@yuna:/sys/kernel/tracing# cat trace
> > > > > > > # tracer: function
> > > > > > > #
> > > > > > > # entries-in-buffer/entries-written: 53/53   #P:2
> > > > > > > #
> > > > > > > #                                _-----=> irqs-off/BH-disabled
> > > > > > > #                               / _----=> need-resched
> > > > > > > #                              | / _---=> hardirq/softirq
> > > > > > > #                              || / _--=> preempt-depth
> > > > > > > #                              ||| / _-=> migrate-disable
> > > > > > > #                              |||| /     delay
> > > > > > > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > > > > > > #              | |         |   |||||     |         |
> > > > > > >        irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
> > > > > > > <-__composite_disconnect
> > > > > > >        irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
> > > > > > > <-reset_config
> > > > > > >        irq/68-dwc3-523     [000] D..3.   133.992276:
> > > > > > > gether_disconnect
> > > > > > > <-reset_config
> > > > > > >        kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
> > > > > > > <-purge_configs_funcs
> > > > > > > 
> > > > > > > -> to device mode
> > > > > > > 
> > > > > > >        kworker/1:3-443     [001] ...1.   148.630773: eem_bind
> > > > > > > <-usb_add_function
> > > > > > >        irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
> > > > > > > <-composite_setup
> > > > > > >        irq/68-dwc3-734     [000] D..3.   149.155215:
> > > > > > > gether_disconnect
> > > > > > > <-eem_set_alt
> > > > > > >        irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
> > > > > > > <-eem_set_alt
> > > > > > >        irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
> > > > > > > <-composite_setup
> > > > > > >        irq/68-dwc3-734     [000] D..3.   149.157292:
> > > > > > > gether_disconnect
> > > > > > > <-eem_set_alt
> > > > > > >        irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
> > > > > > > <-eem_set_alt
> > > > > > >        irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
> > > > > > > <-rx_complete
> > > > > > > ...
> > > > > > > 
> > > > > > > I don't know where to look exactly. Any hints?
> > > > > > 
> > > > > > do you see anything related to gether_cleanup() after eem_unbind() ?
> > > > > 
> > > > > Nope. It's a pitty that the trace formatting got messed up
> > > > > above. But as
> > > > > you can see I traced gether_* and eem_*. After eem_unbind no traced
> > > > > function is called, until I flip the switch to device mode.
> > > > > The ... at the end is where I cut uninteresting eem_unwrap
> > > > > <-rx_complete
> > > > > and eem_wrap <-eth_start_xmit lines.
> > > > > 
> > > > > > If not then, you may try to enable tracing of TCP/IP stack and
> > > > > > network side to check who deleting the sysfs entry
> > > > > 
> > > > > Yes, that's a vast amount of functions to trace. And I don't see how
> > > > > that would be related to the patch we're discussing here. I was hoping
> > > > > for a little more targeted hint.
> > > > 
> > > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
> > > > 'usb_fun*',
> > > > 'reset_config' and 'device_remove_file' leads me to:
> > > > 
> > > > TIMESTAMP  FUNCTION
> > > >     |         |
> > > >    49.952477: eem_wrap <-eth_start_xmit
> > > >    55.072455: eem_wrap <-eth_start_xmit
> > > >    55.072621: eem_unwrap <-rx_complete
> > > >    59.011540: configfs_composite_reset <-usb_gadget_udc_reset
> > > >    59.011545: composite_reset <-configfs_composite_reset
> > > >    59.011548: reset_config <-__composite_disconnect
> > > >    59.013565: eem_disable <-reset_config
> > > >    59.013567: gether_disconnect <-reset_config
> > > >    59.049560: device_remove_file <-device_remove
> > > >    59.051185: configfs_composite_disconnect <-usb_gadget_disco
> > > >    59.051189: composite_disconnect <-configfs_composite_discon
> > > >    59.051195: configfs_composite_unbind <-gadget_unbind_driver
> > > >    59.052519: eem_unbind <-purge_configs_funcs
> > > >    59.052529: composite_dev_cleanup <-configfs_composite_unbin
> > > >    59.052537: device_remove_file <-composite_dev_cleanup
> > > > 
> > > > device_remove_file gets called twice, once by device_remove after
> > > > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
> > > > (removing the gadget)
> > > 
> > > I believe that the device_remove_file function is only removing
> > > suspend-specific attributes, not the complete gadget.
> > > Typically, when you perform the role switch, the Gadget start/stop
> > > function in your UDC driver is called. These functions should not
> > > delete the gadget
> > > 
> > > To investigate further, could you please enable the DWC3 functions
> > > in ftrace and check who is removing the gadget?
> > > I can also enable this on my system and compare the logs with yours,
> > > but I will be in PI planning for 1.5 weeks and may not be able to
> > > provide immediate support.
> > 
> > Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
> > trace is 600kb. I cut uninteresting stuff before
> > configfs_composite_reset <-usb_gadget_udc_reset and after
> > __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
> > compare.
> >
Could you please try with the following patch and see if your issue resolves ?

diff --git a/drivers/usb/gadget/function/f_eem.c b/drivers/usb/gadget/function/f_eem.c
index 3b445bd88498..c2a904475083 100644
--- a/drivers/usb/gadget/function/f_eem.c
+++ b/drivers/usb/gadget/function/f_eem.c
@@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)
 	struct usb_composite_dev *cdev = c->cdev;
 	struct f_eem		*eem = func_to_eem(f);
 	struct usb_string	*us;
-	int			status;
+	int			status = 0;
 	struct usb_ep		*ep;
 
 	struct f_eem_opts	*eem_opts;
@@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)
 	 * with list_for_each_entry, so we assume no race condition
 	 * with regard to eem_opts->bound access
 	 */
+	mutex_lock(&eem_opts->lock);
+	gether_set_gadget(eem_opts->net, cdev->gadget);
+
 	if (!eem_opts->bound) {
-		mutex_lock(&eem_opts->lock);
-		gether_set_gadget(eem_opts->net, cdev->gadget);
 		status = gether_register_netdev(eem_opts->net);
-		mutex_unlock(&eem_opts->lock);
-		if (status)
-			return status;
-		eem_opts->bound = true;
 	}
 
+	mutex_unlock(&eem_opts->lock);
+
+	if (status)
+		goto fail;
+
+	eem_opts->bound = true;
+
 	us = usb_gstrings_attach(cdev, eem_strings,
 				 ARRAY_SIZE(eem_string_defs));
 
> > > Additionally, please check if you have any customized DWC patches
> > > that may be causing this problem.
> > > 
> > > > 
> > > > > You may recall the whole issue did not occur before this
> > > > > patch got applied.
> > > > > 
> > > > > > Hardik
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > > > According to current kernel architecture of u_ether driver, only
> > > > > > > > > gether_cleanup should remove the usb0 interface along with its
> > > > > > > > > kobject and sysfs interface.
> > > > > > > > > I suggest sharing the analysis here to understand why this practice
> > > > > > > > > is not followed in your use case or driver ?
> > > > > > > > 
> > > > > > > > Yes, I'll try to trace where that happens.
> > > > > > > > 
> > > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems
> > > > > > > > harmless? But the usb: net device remaining after disconnect or role
> > > > > > > > switch is not good, as the route remains.
> > > > > > > > 
> > > > > > > > May be they are 2 separate problems. Could you try to reproduce what
> > > > > > > > happens if you make eem connection and then unplug?
> > > > > > > > 
> > > > > > > > > I am curious why the driver was developed without adhering to the
> > > > > > > > > kernel's gadget architecture.
> > > > > > > 
> > > > > > > I don't know what you mean here. Which driver do you mean?
> > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check
> > > > > > > > > > > > > with this patch and
> > > > > > > > > > > > > share result here.  May be we need some fix in dwc3
> > > > > > > > > > > Would have been nice if someone could test on other
> > > > > > > > > > > controller as well. But
> > > > > > > > > > > another instance of dwc3 is also very welcome.
> > > > > > > > > > > > It's quite possible, please test on your side.
> > > > > > > > > > > > We are happy to test any fixes if you come up with.
> > > > > > > > > > 
> > > > > > > > > > -- 
> > > > > > > > > > With Best Regards,
> > > > > > > > > > Andy Shevchenko
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > 
> > > > >
Ferry Toth May 15, 2024, 6:38 p.m. UTC | #21
Hi,

Sorry to have kept you waiting, I was away for a short break.

Patch tested below.

Op 10-05-2024 om 11:45 schreef Hardik Gajjar:
> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
>> Oops, sorry, wrong file attached . Now correct one.
>>
>> Op 02-05-2024 om 22:13 schreef Ferry Toth:
>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>> Hi,
>>>>>
>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>> Hi,
>>>>>>
>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at
>>>>>>>>>>>>>> 08:37:42PM +0300, Andy
>>>>>>>>>>>>>> Shevchenko wrote:
>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>>
>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>
>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>
>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>> No such file
>>>>>>>>>>>> or directory
>>>>>>>>>>>>
>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>
>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>
>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>
>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>
>>>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>>>
>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>>>> two changes?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>>>> I could not reproduce
>>>>>>>>>> the issue.
>>>>>>>>>>
>>>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>
>>>>>>>>>> Following are the logs:
>>>>>>>>>> //Host to device
>>>>>>>>>>
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>>>> mode
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>> usb0
>>>>>>>>>>
>>>>>>>>>> //device to host
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>> # echo "host" > mode
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>> usb0
>>>>>>>>>
>>>>>>>>> That is weird. When I switch to host mode (using
>>>>>>>>> the physical switch),
>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>
>>>>>>>>> Switching back to device mode, that gadget
>>>>>>>>> directory is recreated. And
>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>
>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>> <
>>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>>> carrier             ifindex            queues
>>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>>> device              operstate          uevent
>>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>>               BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>>               RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>>               TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>>>               collisions:0 txqueuelen:1000
>>>>>>>>>>               RX bytes:0 TX bytes:0
>>>>>>>>>>
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>
>>>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>
>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>
>>>>>>>>> I have while in host mode:
>>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>>>>             inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>>> 10.42.0.255
>>>>>>>>>             inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>>
>>>>>>>>>> Instead, I recommend enabling FTRACE to
>>>>>>>>>> trace the functions involved
>>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>>
>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>
>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>>       plugin 'function_graph'
>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>>         188 bytes in size (4096 uncompressed)
>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>>         0 bytes in size (0 uncompressed)
>>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>>> cpus=2
>>>>>>>>>          irq/68-dwc3-725   [000]   514.575337:
>>>>>>>>> funcgraph_entry:      #
>>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>>          irq/68-dwc3-946   [000]   524.263731:
>>>>>>>>> funcgraph_entry:      +
>>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>>          irq/68-dwc3-946   [000]   524.263743:
>>>>>>>>> funcgraph_entry:      !
>>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>>          irq/68-dwc3-946   [000]   524.268029:
>>>>>>>>> funcgraph_entry:      #
>>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>>          irq/68-dwc3-946   [000]   524.270089:
>>>>>>>>> funcgraph_entry:      !
>>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>>
>>>>>>>> I tried to get a more useful trace:
>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config'
>>>>>>>>>> set_ftrace_filter
>>>>>>>> -> switch to host mode then back to device
>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>>> # tracer: function
>>>>>>>> #
>>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>>> #
>>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>>> #                               / _----=> need-resched
>>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>>> #                              || / _--=> preempt-depth
>>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>>> #                              |||| /     delay
>>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>>> #              | |         |   |||||     |         |
>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>>>> <-__composite_disconnect
>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>>> <-reset_config
>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992276:
>>>>>>>> gether_disconnect
>>>>>>>> <-reset_config
>>>>>>>>         kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>>> <-purge_configs_funcs
>>>>>>>>
>>>>>>>> -> to device mode
>>>>>>>>
>>>>>>>>         kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>>> <-usb_add_function
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>>> <-composite_setup
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155215:
>>>>>>>> gether_disconnect
>>>>>>>> <-eem_set_alt
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>>>>> <-eem_set_alt
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>>> <-composite_setup
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157292:
>>>>>>>> gether_disconnect
>>>>>>>> <-eem_set_alt
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>>>>> <-eem_set_alt
>>>>>>>>         irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>>> <-rx_complete
>>>>>>>> ...
>>>>>>>>
>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>
>>>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>>>
>>>>>> Nope. It's a pitty that the trace formatting got messed up
>>>>>> above. But as
>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>> function is called, until I flip the switch to device mode.
>>>>>> The ... at the end is where I cut uninteresting eem_unwrap
>>>>>> <-rx_complete
>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>
>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>> network side to check who deleting the sysfs entry
>>>>>>
>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>>> that would be related to the patch we're discussing here. I was hoping
>>>>>> for a little more targeted hint.
>>>>>
>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
>>>>> 'usb_fun*',
>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>
>>>>> TIMESTAMP  FUNCTION
>>>>>      |         |
>>>>>     49.952477: eem_wrap <-eth_start_xmit
>>>>>     55.072455: eem_wrap <-eth_start_xmit
>>>>>     55.072621: eem_unwrap <-rx_complete
>>>>>     59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>>     59.011545: composite_reset <-configfs_composite_reset
>>>>>     59.011548: reset_config <-__composite_disconnect
>>>>>     59.013565: eem_disable <-reset_config
>>>>>     59.013567: gether_disconnect <-reset_config
>>>>>     59.049560: device_remove_file <-device_remove
>>>>>     59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>>     59.051189: composite_disconnect <-configfs_composite_discon
>>>>>     59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>>     59.052519: eem_unbind <-purge_configs_funcs
>>>>>     59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>>     59.052537: device_remove_file <-composite_dev_cleanup
>>>>>
>>>>> device_remove_file gets called twice, once by device_remove after
>>>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>>>> (removing the gadget)
>>>>
>>>> I believe that the device_remove_file function is only removing
>>>> suspend-specific attributes, not the complete gadget.
>>>> Typically, when you perform the role switch, the Gadget start/stop
>>>> function in your UDC driver is called. These functions should not
>>>> delete the gadget
>>>>
>>>> To investigate further, could you please enable the DWC3 functions
>>>> in ftrace and check who is removing the gadget?
>>>> I can also enable this on my system and compare the logs with yours,
>>>> but I will be in PI planning for 1.5 weeks and may not be able to
>>>> provide immediate support.
>>>
>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
>>> trace is 600kb. I cut uninteresting stuff before
>>> configfs_composite_reset <-usb_gadget_udc_reset and after
>>> __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
>>> compare.
>>>
> Could you please try with the following patch and see if your issue resolves ?
> 
> diff --git a/drivers/usb/gadget/function/f_eem.c b/drivers/usb/gadget/function/f_eem.c
> index 3b445bd88498..c2a904475083 100644
> --- a/drivers/usb/gadget/function/f_eem.c
> +++ b/drivers/usb/gadget/function/f_eem.c
> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)
>   	struct usb_composite_dev *cdev = c->cdev;
>   	struct f_eem		*eem = func_to_eem(f);
>   	struct usb_string	*us;
> -	int			status;
> +	int			status = 0;
>   	struct usb_ep		*ep;
>   
>   	struct f_eem_opts	*eem_opts;
> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)
>   	 * with list_for_each_entry, so we assume no race condition
>   	 * with regard to eem_opts->bound access
>   	 */
> +	mutex_lock(&eem_opts->lock);
> +	gether_set_gadget(eem_opts->net, cdev->gadget);
> +
>   	if (!eem_opts->bound) {
> -		mutex_lock(&eem_opts->lock);
> -		gether_set_gadget(eem_opts->net, cdev->gadget);
>   		status = gether_register_netdev(eem_opts->net);
> -		mutex_unlock(&eem_opts->lock);
> -		if (status)
> -			return status;
> -		eem_opts->bound = true;
>   	}
>   
> +	mutex_unlock(&eem_opts->lock);
> +
> +	if (status)
> +		goto fail;
> +
> +	eem_opts->bound = true;
> +
>   	us = usb_gstrings_attach(cdev, eem_strings,
>   				 ARRAY_SIZE(eem_string_defs));
> 

After switching from host -> device -> host the usb0 device as seen by 
ifconfig remains "RUNNING" and in the routing table.

FYI This is the regression.

Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is 
still missing.

After deeper diving into v6.1.0, I found the latter a longer existing 
bug, not caused by your patch.

More over, this doesn't seem to do any harm.

The first issue does, as we try to route traffic over a device that no 
longer exists.

>>>> Additionally, please check if you have any customized DWC patches
>>>> that may be causing this problem.
>>>>
>>>>>
>>>>>> You may recall the whole issue did not occur before this
>>>>>> patch got applied.
>>>>>>
>>>>>>> Hardik
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>
>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>
>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>
>>>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>
>>>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>
>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>> With Best Regards,
>>>>>>>>>>> Andy Shevchenko
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
> 
>
Ferry Toth May 26, 2024, 8:52 p.m. UTC | #22
Hi

Op 15-05-2024 om 20:38 schreef Ferry Toth:
> Hi,
> 
> Sorry to have kept you waiting, I was away for a short break.
> 
> Patch tested below.
> 
> Op 10-05-2024 om 11:45 schreef Hardik Gajjar:
>> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
>>> Oops, sorry, wrong file attached . Now correct one.
>>>
>>> Op 02-05-2024 om 22:13 schreef Ferry Toth:
>>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at
>>>>>>>>>>>>>>> 08:37:42PM +0300, Andy
>>>>>>>>>>>>>>> Shevchenko wrote:
>>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>>>>
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>>> target 
>>>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>>> No such file
>>>>>>>>>>>>> or directory
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>
>>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mass storage device is created and removed each time as 
>>>>>>>>>>>>> expected.
>>>>>>>>>>>>
>>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of 
>>>>>>>>>>>> those
>>>>>>>>>>>> two changes?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As promised, I have the tested the this patch with the dwc3 
>>>>>>>>>>> gadget.
>>>>>>>>>>> I could not reproduce
>>>>>>>>>>> the issue.
>>>>>>>>>>>
>>>>>>>>>>> I can see the usb0 exist all the time and accessible 
>>>>>>>>>>> regardless of
>>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>>
>>>>>>>>>>> Following are the logs:
>>>>>>>>>>> //Host to device
>>>>>>>>>>>
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo 
>>>>>>>>>>> "peripheral"
>>>>>>>>>>>> mode
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>> usb0
>>>>>>>>>>>
>>>>>>>>>>> //device to host
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>>> # echo "host" > mode
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>> usb0
>>>>>>>>>>
>>>>>>>>>> That is weird. When I switch to host mode (using
>>>>>>>>>> the physical switch),
>>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>>
>>>>>>>>>> Switching back to device mode, that gadget
>>>>>>>>>> directory is recreated. And
>>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>>
>>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>>> <
>>>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>>>> carrier             ifindex            queues
>>>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>>>> device              operstate          uevent
>>>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a 
>>>>>>>>>>> usb0
>>>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>>>               BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>>>               RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>>>               TX packets:0 errors:0 dropped:0 overruns:0 
>>>>>>>>>>> carrier:0
>>>>>>>>>>>               collisions:0 txqueuelen:1000
>>>>>>>>>>>               RX bytes:0 TX bytes:0
>>>>>>>>>>>
>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>>
>>>>>>>>>>> I strongly advise against reverting the patch solely based on 
>>>>>>>>>>> the
>>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory 
>>>>>>>>>>> while
>>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>>
>>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>>
>>>>>>>>>> I have while in host mode:
>>>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  
>>>>>>>>>> mtu 1500
>>>>>>>>>>             inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>>>> 10.42.0.255
>>>>>>>>>>             inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>>>
>>>>>>>>>>> Instead, I recommend enabling FTRACE to
>>>>>>>>>>> trace the functions involved
>>>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>>>
>>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>>
>>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>>>       plugin 'function_graph'
>>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>>>         188 bytes in size (4096 uncompressed)
>>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>>>         0 bytes in size (0 uncompressed)
>>>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>>>> cpus=2
>>>>>>>>>>          irq/68-dwc3-725   [000]   514.575337:
>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>>>          irq/68-dwc3-946   [000]   524.263731:
>>>>>>>>>> funcgraph_entry:      +
>>>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>>>          irq/68-dwc3-946   [000]   524.263743:
>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>>>          irq/68-dwc3-946   [000]   524.268029:
>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>>>          irq/68-dwc3-946   [000]   524.270089:
>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>>>
>>>>>>>>> I tried to get a more useful trace:
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config'
>>>>>>>>>>> set_ftrace_filter
>>>>>>>>> -> switch to host mode then back to device
>>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>>>> # tracer: function
>>>>>>>>> #
>>>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>>>> #
>>>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>>>> #                               / _----=> need-resched
>>>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>>>> #                              || / _--=> preempt-depth
>>>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>>>> #                              |||| /     delay
>>>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>>>> #              | |         |   |||||     |         |
>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>>>>> <-__composite_disconnect
>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>>>> <-reset_config
>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992276:
>>>>>>>>> gether_disconnect
>>>>>>>>> <-reset_config
>>>>>>>>>         kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>>>> <-purge_configs_funcs
>>>>>>>>>
>>>>>>>>> -> to device mode
>>>>>>>>>
>>>>>>>>>         kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>>>> <-usb_add_function
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>>>> <-composite_setup
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155215:
>>>>>>>>> gether_disconnect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155220: 
>>>>>>>>> gether_connect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>>>> <-composite_setup
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157292:
>>>>>>>>> gether_disconnect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.159338: 
>>>>>>>>> gether_connect
>>>>>>>>> <-eem_set_alt
>>>>>>>>>         irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>>>> <-rx_complete
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>>
>>>>>>>> do you see anything related to gether_cleanup() after 
>>>>>>>> eem_unbind() ?
>>>>>>>
>>>>>>> Nope. It's a pitty that the trace formatting got messed up
>>>>>>> above. But as
>>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>>> function is called, until I flip the switch to device mode.
>>>>>>> The ... at the end is where I cut uninteresting eem_unwrap
>>>>>>> <-rx_complete
>>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>>
>>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>>> network side to check who deleting the sysfs entry
>>>>>>>
>>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>>>> that would be related to the patch we're discussing here. I was 
>>>>>>> hoping
>>>>>>> for a little more targeted hint.
>>>>>>
>>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
>>>>>> 'usb_fun*',
>>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>>
>>>>>> TIMESTAMP  FUNCTION
>>>>>>      |         |
>>>>>>     49.952477: eem_wrap <-eth_start_xmit
>>>>>>     55.072455: eem_wrap <-eth_start_xmit
>>>>>>     55.072621: eem_unwrap <-rx_complete
>>>>>>     59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>>>     59.011545: composite_reset <-configfs_composite_reset
>>>>>>     59.011548: reset_config <-__composite_disconnect
>>>>>>     59.013565: eem_disable <-reset_config
>>>>>>     59.013567: gether_disconnect <-reset_config
>>>>>>     59.049560: device_remove_file <-device_remove
>>>>>>     59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>>>     59.051189: composite_disconnect <-configfs_composite_discon
>>>>>>     59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>>>     59.052519: eem_unbind <-purge_configs_funcs
>>>>>>     59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>>>     59.052537: device_remove_file <-composite_dev_cleanup
>>>>>>
>>>>>> device_remove_file gets called twice, once by device_remove after
>>>>>> gether_disconnect (that the one). The 2nd time by 
>>>>>> composite_dev_cleanup
>>>>>> (removing the gadget)
>>>>>
>>>>> I believe that the device_remove_file function is only removing
>>>>> suspend-specific attributes, not the complete gadget.
>>>>> Typically, when you perform the role switch, the Gadget start/stop
>>>>> function in your UDC driver is called. These functions should not
>>>>> delete the gadget
>>>>>
>>>>> To investigate further, could you please enable the DWC3 functions
>>>>> in ftrace and check who is removing the gadget?
>>>>> I can also enable this on my system and compare the logs with yours,
>>>>> but I will be in PI planning for 1.5 weeks and may not be able to
>>>>> provide immediate support.
>>>>
>>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
>>>> trace is 600kb. I cut uninteresting stuff before
>>>> configfs_composite_reset <-usb_gadget_udc_reset and after
>>>> __dwc3_set_mode, 
>>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
>>>> compare.
>>>>
>> Could you please try with the following patch and see if your issue 
>> resolves ?
>>
>> diff --git a/drivers/usb/gadget/function/f_eem.c 
>> b/drivers/usb/gadget/function/f_eem.c
>> index 3b445bd88498..c2a904475083 100644
>> --- a/drivers/usb/gadget/function/f_eem.c
>> +++ b/drivers/usb/gadget/function/f_eem.c
>> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, 
>> struct usb_function *f)
>>       struct usb_composite_dev *cdev = c->cdev;
>>       struct f_eem        *eem = func_to_eem(f);
>>       struct usb_string    *us;
>> -    int            status;
>> +    int            status = 0;
>>       struct usb_ep        *ep;
>>       struct f_eem_opts    *eem_opts;
>> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration *c, 
>> struct usb_function *f)
>>        * with list_for_each_entry, so we assume no race condition
>>        * with regard to eem_opts->bound access
>>        */
>> +    mutex_lock(&eem_opts->lock);
>> +    gether_set_gadget(eem_opts->net, cdev->gadget);
>> +
>>       if (!eem_opts->bound) {
>> -        mutex_lock(&eem_opts->lock);
>> -        gether_set_gadget(eem_opts->net, cdev->gadget);
>>           status = gether_register_netdev(eem_opts->net);
>> -        mutex_unlock(&eem_opts->lock);
>> -        if (status)
>> -            return status;
>> -        eem_opts->bound = true;
>>       }
>> +    mutex_unlock(&eem_opts->lock);
>> +
>> +    if (status)
>> +        goto fail;
>> +
>> +    eem_opts->bound = true;
>> +
>>       us = usb_gstrings_attach(cdev, eem_strings,
>>                    ARRAY_SIZE(eem_string_defs));
>>
> 
> After switching from host -> device -> host the usb0 device as seen by 
> ifconfig remains "RUNNING" and in the routing table.
> 
> FYI This is the regression.

For us, I don't see any advantages in having this patch and only the 
downside. Do you have any further ideas / patches that we could test.

Otherwise, it might be best to go with the original suggestion and 
revert until it gets sorted out?

> Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is 
> still missing.
> 
> After deeper diving into v6.1.0, I found the latter a longer existing 
> bug, not caused by your patch.
> 
> More over, this doesn't seem to do any harm.
> 
> The first issue does, as we try to route traffic over a device that no 
> longer exists.
> 
>>>>> Additionally, please check if you have any customized DWC patches
>>>>> that may be causing this problem.
>>>>>
>>>>>>
>>>>>>> You may recall the whole issue did not occur before this
>>>>>>> patch got applied.
>>>>>>>
>>>>>>>> Hardik
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>>> I suggest sharing the analysis here to understand why this 
>>>>>>>>>>> practice
>>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>>
>>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>>
>>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>>> harmless? But the usb: net device remaining after disconnect 
>>>>>>>>>> or role
>>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>>
>>>>>>>>>> May be they are 2 separate problems. Could you try to 
>>>>>>>>>> reproduce what
>>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>>
>>>>>>>>>>> I am curious why the driver was developed without adhering to 
>>>>>>>>>>> the
>>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>>
>>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>>>>
>>>>>>>>>>>> -- 
>>>>>>>>>>>> With Best Regards,
>>>>>>>>>>>> Andy Shevchenko
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>
>>
>
Linux regression tracking (Thorsten Leemhuis) May 27, 2024, 6:29 a.m. UTC | #23
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Seems more that four months after the report[1] there is still no
resolution for this regression in sight. This is not how it should be.

Greg, could you maybe chime in here and share your opinion? From my "I
do not know anything about the details of this problem" point of view it
looks like reverting the culprit (f49449fbc21e7e ("usb: gadget: u_ether:
Replace netif_stop_queue with netif_device_detach")) might be the best
option for now, unless of course that creates a regression as well --
but you are the best to judge here.

[1] https://lore.kernel.org/all/ZaQS5x-XK08Jre6I@smile.fi.intel.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

On 26.05.24 22:52, Ferry Toth wrote:
> Op 15-05-2024 om 20:38 schreef Ferry Toth:
>>
>> Sorry to have kept you waiting, I was away for a short break.
>>
>> Patch tested below.
>>
>> Op 10-05-2024 om 11:45 schreef Hardik Gajjar:
>>> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
>>>> Oops, sorry, wrong file attached . Now correct one.
>>>>
>>>> Op 02-05-2024 om 22:13 schreef Ferry Toth:
>>>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>>>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at
>>>>>>>>>>>>>>>> 08:37:42PM +0300, Andy
>>>>>>>>>>>>>>>> Shevchenko wrote:
>>>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is
>>>>>>>>>>>>>> activated):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>>>> No such file
>>>>>>>>>>>>>> or directory
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mass storage device is created and removed each time as
>>>>>>>>>>>>>> expected.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of
>>>>>>>>>>>>> those
>>>>>>>>>>>>> two changes?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> As promised, I have the tested the this patch with the dwc3
>>>>>>>>>>>> gadget.
>>>>>>>>>>>> I could not reproduce
>>>>>>>>>>>> the issue.
>>>>>>>>>>>>
>>>>>>>>>>>> I can see the usb0 exist all the time and accessible
>>>>>>>>>>>> regardless of
>>>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>>>
>>>>>>>>>>>> Following are the logs:
>>>>>>>>>>>> //Host to device
>>>>>>>>>>>>
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo
>>>>>>>>>>>> "peripheral"
>>>>>>>>>>>>> mode
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>>> usb0
>>>>>>>>>>>>
>>>>>>>>>>>> //device to host
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>>>> # echo "host" > mode
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>>> usb0
>>>>>>>>>>>
>>>>>>>>>>> That is weird. When I switch to host mode (using
>>>>>>>>>>> the physical switch),
>>>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>>>
>>>>>>>>>>> Switching back to device mode, that gadget
>>>>>>>>>>> directory is recreated. And
>>>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>>>
>>>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>>>> <
>>>>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>>>>> carrier             ifindex            queues
>>>>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>>>>> device              operstate          uevent
>>>>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig
>>>>>>>>>>>> -a usb0
>>>>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>>>>               BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>>>>               RX packets:0 errors:0 dropped:0 overruns:0
>>>>>>>>>>>> frame:0
>>>>>>>>>>>>               TX packets:0 errors:0 dropped:0 overruns:0
>>>>>>>>>>>> carrier:0
>>>>>>>>>>>>               collisions:0 txqueuelen:1000
>>>>>>>>>>>>               RX bytes:0 TX bytes:0
>>>>>>>>>>>>
>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>>>
>>>>>>>>>>>> I strongly advise against reverting the patch solely based
>>>>>>>>>>>> on the
>>>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory
>>>>>>>>>>>> while
>>>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>>>
>>>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>>>
>>>>>>>>>>> I have while in host mode:
>>>>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> 
>>>>>>>>>>> mtu 1500
>>>>>>>>>>>             inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>>>>> 10.42.0.255
>>>>>>>>>>>             inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> You don't see that because you didn't create a connection at
>>>>>>>>>>> all.
>>>>>>>>>>>
>>>>>>>>>>>> Instead, I recommend enabling FTRACE to
>>>>>>>>>>>> trace the functions involved
>>>>>>>>>>>> and identify which faulty call is responsible for removing
>>>>>>>>>>>> usb0.
>>>>>>>>>>>
>>>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>>>
>>>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>>>>       plugin 'function_graph'
>>>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>>>>         188 bytes in size (4096 uncompressed)
>>>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>>>>         0 bytes in size (0 uncompressed)
>>>>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>>>>> cpus=2
>>>>>>>>>>>          irq/68-dwc3-725   [000]   514.575337:
>>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.263731:
>>>>>>>>>>> funcgraph_entry:      +
>>>>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.263743:
>>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.268029:
>>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>>>>          irq/68-dwc3-946   [000]   524.270089:
>>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>>>>
>>>>>>>>>> I tried to get a more useful trace:
>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' >
>>>>>>>>>> set_ftrace_filter
>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config'
>>>>>>>>>>>> set_ftrace_filter
>>>>>>>>>> -> switch to host mode then back to device
>>>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>>>>> # tracer: function
>>>>>>>>>> #
>>>>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>>>>> #
>>>>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>>>>> #                               / _----=> need-resched
>>>>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>>>>> #                              || / _--=> preempt-depth
>>>>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>>>>> #                              |||| /     delay
>>>>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>>>>> #              | |         |   |||||     |         |
>>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.990254:
>>>>>>>>>> reset_config
>>>>>>>>>> <-__composite_disconnect
>>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>>>>> <-reset_config
>>>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992276:
>>>>>>>>>> gether_disconnect
>>>>>>>>>> <-reset_config
>>>>>>>>>>         kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>>>>> <-purge_configs_funcs
>>>>>>>>>>
>>>>>>>>>> -> to device mode
>>>>>>>>>>
>>>>>>>>>>         kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>>>>> <-usb_add_function
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>>>>> <-composite_setup
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155215:
>>>>>>>>>> gether_disconnect
>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155220:
>>>>>>>>>> gether_connect
>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>>>>> <-composite_setup
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157292:
>>>>>>>>>> gether_disconnect
>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.159338:
>>>>>>>>>> gether_connect
>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>         irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>>>>> <-rx_complete
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>>>
>>>>>>>>> do you see anything related to gether_cleanup() after
>>>>>>>>> eem_unbind() ?
>>>>>>>>
>>>>>>>> Nope. It's a pitty that the trace formatting got messed up
>>>>>>>> above. But as
>>>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>>>> function is called, until I flip the switch to device mode.
>>>>>>>> The ... at the end is where I cut uninteresting eem_unwrap
>>>>>>>> <-rx_complete
>>>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>>>
>>>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>>>> network side to check who deleting the sysfs entry
>>>>>>>>
>>>>>>>> Yes, that's a vast amount of functions to trace. And I don't see
>>>>>>>> how
>>>>>>>> that would be related to the patch we're discussing here. I was
>>>>>>>> hoping
>>>>>>>> for a little more targeted hint.
>>>>>>>
>>>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
>>>>>>> 'usb_fun*',
>>>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>>>
>>>>>>> TIMESTAMP  FUNCTION
>>>>>>>      |         |
>>>>>>>     49.952477: eem_wrap <-eth_start_xmit
>>>>>>>     55.072455: eem_wrap <-eth_start_xmit
>>>>>>>     55.072621: eem_unwrap <-rx_complete
>>>>>>>     59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>>>>     59.011545: composite_reset <-configfs_composite_reset
>>>>>>>     59.011548: reset_config <-__composite_disconnect
>>>>>>>     59.013565: eem_disable <-reset_config
>>>>>>>     59.013567: gether_disconnect <-reset_config
>>>>>>>     59.049560: device_remove_file <-device_remove
>>>>>>>     59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>>>>     59.051189: composite_disconnect <-configfs_composite_discon
>>>>>>>     59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>>>>     59.052519: eem_unbind <-purge_configs_funcs
>>>>>>>     59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>>>>     59.052537: device_remove_file <-composite_dev_cleanup
>>>>>>>
>>>>>>> device_remove_file gets called twice, once by device_remove after
>>>>>>> gether_disconnect (that the one). The 2nd time by
>>>>>>> composite_dev_cleanup
>>>>>>> (removing the gadget)
>>>>>>
>>>>>> I believe that the device_remove_file function is only removing
>>>>>> suspend-specific attributes, not the complete gadget.
>>>>>> Typically, when you perform the role switch, the Gadget start/stop
>>>>>> function in your UDC driver is called. These functions should not
>>>>>> delete the gadget
>>>>>>
>>>>>> To investigate further, could you please enable the DWC3 functions
>>>>>> in ftrace and check who is removing the gadget?
>>>>>> I can also enable this on my system and compare the logs with yours,
>>>>>> but I will be in PI planning for 1.5 weeks and may not be able to
>>>>>> provide immediate support.
>>>>>
>>>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
>>>>> trace is 600kb. I cut uninteresting stuff before
>>>>> configfs_composite_reset <-usb_gadget_udc_reset and after
>>>>> __dwc3_set_mode,
>>>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
>>>>> compare.
>>>>>
>>> Could you please try with the following patch and see if your issue
>>> resolves ?
>>>
>>> diff --git a/drivers/usb/gadget/function/f_eem.c
>>> b/drivers/usb/gadget/function/f_eem.c
>>> index 3b445bd88498..c2a904475083 100644
>>> --- a/drivers/usb/gadget/function/f_eem.c
>>> +++ b/drivers/usb/gadget/function/f_eem.c
>>> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c,
>>> struct usb_function *f)
>>>       struct usb_composite_dev *cdev = c->cdev;
>>>       struct f_eem        *eem = func_to_eem(f);
>>>       struct usb_string    *us;
>>> -    int            status;
>>> +    int            status = 0;
>>>       struct usb_ep        *ep;
>>>       struct f_eem_opts    *eem_opts;
>>> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration
>>> *c, struct usb_function *f)
>>>        * with list_for_each_entry, so we assume no race condition
>>>        * with regard to eem_opts->bound access
>>>        */
>>> +    mutex_lock(&eem_opts->lock);
>>> +    gether_set_gadget(eem_opts->net, cdev->gadget);
>>> +
>>>       if (!eem_opts->bound) {
>>> -        mutex_lock(&eem_opts->lock);
>>> -        gether_set_gadget(eem_opts->net, cdev->gadget);
>>>           status = gether_register_netdev(eem_opts->net);
>>> -        mutex_unlock(&eem_opts->lock);
>>> -        if (status)
>>> -            return status;
>>> -        eem_opts->bound = true;
>>>       }
>>> +    mutex_unlock(&eem_opts->lock);
>>> +
>>> +    if (status)
>>> +        goto fail;
>>> +
>>> +    eem_opts->bound = true;
>>> +
>>>       us = usb_gstrings_attach(cdev, eem_strings,
>>>                    ARRAY_SIZE(eem_string_defs));
>>>
>>
>> After switching from host -> device -> host the usb0 device as seen by
>> ifconfig remains "RUNNING" and in the routing table.
>>
>> FYI This is the regression.
> 
> For us, I don't see any advantages in having this patch and only the
> downside. Do you have any further ideas / patches that we could test.
> 
> Otherwise, it might be best to go with the original suggestion and
> revert until it gets sorted out?
> 
>> Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is
>> still missing.
>>
>> After deeper diving into v6.1.0, I found the latter a longer existing
>> bug, not caused by your patch.
>>
>> More over, this doesn't seem to do any harm.
>>
>> The first issue does, as we try to route traffic over a device that no
>> longer exists.
>>
>>>>>> Additionally, please check if you have any customized DWC patches
>>>>>> that may be causing this problem.
>>>>>>
>>>>>>>
>>>>>>>> You may recall the whole issue did not occur before this
>>>>>>>> patch got applied.
>>>>>>>>
>>>>>>>>> Hardik
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> According to current kernel architecture of u_ether driver,
>>>>>>>>>>>> only
>>>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>>>> I suggest sharing the analysis here to understand why this
>>>>>>>>>>>> practice
>>>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>>>
>>>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>>>
>>>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>>>> harmless? But the usb: net device remaining after disconnect
>>>>>>>>>>> or role
>>>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>>>
>>>>>>>>>>> May be they are 2 separate problems. Could you try to
>>>>>>>>>>> reproduce what
>>>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>>>
>>>>>>>>>>>> I am curious why the driver was developed without adhering
>>>>>>>>>>>> to the
>>>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>>>
>>>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- 
>>>>>>>>>>>>> With Best Regards,
>>>>>>>>>>>>> Andy Shevchenko
Ferry Toth May 30, 2024, 7:06 p.m. UTC | #24
Hi,

Op 28-05-2024 om 09:18 schreef Hardik Gajjar:
> On Sun, May 26, 2024 at 10:52:53PM +0200, Ferry Toth wrote:
>> Hi
>>
>> Op 15-05-2024 om 20:38 schreef Ferry Toth:
>>> Hi,
>>>
>>> Sorry to have kept you waiting, I was away for a short break.
>>>
>>> Patch tested below.
>>>
>>> Op 10-05-2024 om 11:45 schreef Hardik Gajjar:
>>>> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
>>>>> Oops, sorry, wrong file attached . Now correct one.
>>>>>
>>>>> Op 02-05-2024 om 22:13 schreef Ferry Toth:
>>>>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>>>>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>>>>> On Thu, Apr 11, 2024
>>>>>>>>>>>>>>>> at 04:26:37PM +0200,
>>>>>>>>>>>>>>>> Hardik Gajjar wrote:
>>>>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at
>>>>>>>>>>>>>>>>> 08:37:42PM +0300, Andy
>>>>>>>>>>>>>>>>> Shevchenko wrote:
>>>>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>>>>> No such file
>>>>>>>>>>>>>>> or directory
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> root@yuna:~# ls
>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1
>>>>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mass storage device is
>>>>>>>>>>>>>>> created and removed each
>>>>>>>>>>>>>>> time as expected.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, what's the conclusion?
>>>>>>>>>>>>>> Shall we move towards revert
>>>>>>>>>>>>>> of those
>>>>>>>>>>>>>> two changes?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> As promised, I have the tested
>>>>>>>>>>>>> the this patch with the dwc3
>>>>>>>>>>>>> gadget.
>>>>>>>>>>>>> I could not reproduce
>>>>>>>>>>>>> the issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can see the usb0 exist all the
>>>>>>>>>>>>> time and accessible regardless
>>>>>>>>>>>>> of
>>>>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Following are the logs:
>>>>>>>>>>>>> //Host to device
>>>>>>>>>>>>>
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>>>>> # echo "peripheral"
>>>>>>>>>>>>>> mode
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>>>> usb0
>>>>>>>>>>>>>
>>>>>>>>>>>>> //device to host
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>>>>> # echo "host" > mode
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>>>>> usb0
>>>>>>>>>>>>
>>>>>>>>>>>> That is weird. When I switch to host mode (using
>>>>>>>>>>>> the physical switch),
>>>>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>>>>
>>>>>>>>>>>> Switching back to device mode, that gadget
>>>>>>>>>>>> directory is recreated. And
>>>>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>>>>
>>>>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>>>>> <
>>>>>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>>>>>> carrier             ifindex            queues
>>>>>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>>>>>> device              operstate          uevent
>>>>>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>>>>> # ifconfig -a usb0
>>>>>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>>>>>                BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>>>>>                RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>>>>>                TX packets:0
>>>>>>>>>>>>> errors:0 dropped:0 overruns:0
>>>>>>>>>>>>> carrier:0
>>>>>>>>>>>>>                collisions:0 txqueuelen:1000
>>>>>>>>>>>>>                RX bytes:0 TX bytes:0
>>>>>>>>>>>>>
>>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>>>>
>>>>>>>>>>>>> I strongly advise against
>>>>>>>>>>>>> reverting the patch solely based
>>>>>>>>>>>>> on the
>>>>>>>>>>>>> observed issue of removing the
>>>>>>>>>>>>> /sys/class/net/usb0 directory
>>>>>>>>>>>>> while
>>>>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>>>>
>>>>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>>>>
>>>>>>>>>>>> I have while in host mode:
>>>>>>>>>>>> root@yuna:~# ifconfig -a usb0
>>>>>>>>>>>> usb0:
>>>>>>>>>>>> flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>
>>>>>>>>>>>> mtu 1500
>>>>>>>>>>>>              inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>>>>>> 10.42.0.255
>>>>>>>>>>>>              inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>>>>>
>>>>>>>>>>>>> Instead, I recommend enabling FTRACE to
>>>>>>>>>>>>> trace the functions involved
>>>>>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>>>>>
>>>>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>>>>
>>>>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>>>>>        plugin 'function_graph'
>>>>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>>>>>          188 bytes in size (4096 uncompressed)
>>>>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>>>>>          0 bytes in size (0 uncompressed)
>>>>>>>>>>>> root@yuna:~# trace-cmd report
>>>>>>>>>>>> cpus=2
>>>>>>>>>>>>           irq/68-dwc3-725   [000]   514.575337:
>>>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>>>>>           irq/68-dwc3-946   [000]   524.263731:
>>>>>>>>>>>> funcgraph_entry:      +
>>>>>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>>>>>           irq/68-dwc3-946   [000]   524.263743:
>>>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>>>>>           irq/68-dwc3-946   [000]   524.268029:
>>>>>>>>>>>> funcgraph_entry:      #
>>>>>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>>>>>           irq/68-dwc3-946   [000]   524.270089:
>>>>>>>>>>>> funcgraph_entry:      !
>>>>>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>>>>>
>>>>>>>>>>> I tried to get a more useful trace:
>>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config'
>>>>>>>>>>>>> set_ftrace_filter
>>>>>>>>>>> -> switch to host mode then back to device
>>>>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace
>>>>>>>>>>> # tracer: function
>>>>>>>>>>> #
>>>>>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>>>>>> #
>>>>>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>>>>>> #                               / _----=> need-resched
>>>>>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>>>>>> #                              || / _--=> preempt-depth
>>>>>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>>>>>> #                              |||| /     delay
>>>>>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>>>>>> #              | |         |   |||||     |         |
>>>>>>>>>>>          irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>>>>>>> <-__composite_disconnect
>>>>>>>>>>>          irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>>>>>> <-reset_config
>>>>>>>>>>>          irq/68-dwc3-523     [000] D..3.   133.992276:
>>>>>>>>>>> gether_disconnect
>>>>>>>>>>> <-reset_config
>>>>>>>>>>>          kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>>>>>> <-purge_configs_funcs
>>>>>>>>>>>
>>>>>>>>>>> -> to device mode
>>>>>>>>>>>
>>>>>>>>>>>          kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>>>>>> <-usb_add_function
>>>>>>>>>>>          irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>>>>>> <-composite_setup
>>>>>>>>>>>          irq/68-dwc3-734     [000] D..3.   149.155215:
>>>>>>>>>>> gether_disconnect
>>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>>          irq/68-dwc3-734     [000]
>>>>>>>>>>> D..3.   149.155220: gether_connect
>>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>>          irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>>>>>> <-composite_setup
>>>>>>>>>>>          irq/68-dwc3-734     [000] D..3.   149.157292:
>>>>>>>>>>> gether_disconnect
>>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>>          irq/68-dwc3-734     [000]
>>>>>>>>>>> D..3.   149.159338: gether_connect
>>>>>>>>>>> <-eem_set_alt
>>>>>>>>>>>          irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>>>>>> <-rx_complete
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>>>>
>>>>>>>>>> do you see anything related to
>>>>>>>>>> gether_cleanup() after eem_unbind() ?
>>>>>>>>>
>>>>>>>>> Nope. It's a pitty that the trace formatting got messed up
>>>>>>>>> above. But as
>>>>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>>>>> function is called, until I flip the switch to device mode.
>>>>>>>>> The ... at the end is where I cut uninteresting eem_unwrap
>>>>>>>>> <-rx_complete
>>>>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>>>>
>>>>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>>>>> network side to check who deleting the sysfs entry
>>>>>>>>>
>>>>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>>>>>> that would be related to the patch we're
>>>>>>>>> discussing here. I was hoping
>>>>>>>>> for a little more targeted hint.
>>>>>>>>
>>>>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
>>>>>>>> 'usb_fun*',
>>>>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>>>>
>>>>>>>> TIMESTAMP  FUNCTION
>>>>>>>>       |         |
>>>>>>>>      49.952477: eem_wrap <-eth_start_xmit
>>>>>>>>      55.072455: eem_wrap <-eth_start_xmit
>>>>>>>>      55.072621: eem_unwrap <-rx_complete
>>>>>>>>      59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>>>>>      59.011545: composite_reset <-configfs_composite_reset
>>>>>>>>      59.011548: reset_config <-__composite_disconnect
>>>>>>>>      59.013565: eem_disable <-reset_config
>>>>>>>>      59.013567: gether_disconnect <-reset_config
>>>>>>>>      59.049560: device_remove_file <-device_remove
>>>>>>>>      59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>>>>>      59.051189: composite_disconnect <-configfs_composite_discon
>>>>>>>>      59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>>>>>      59.052519: eem_unbind <-purge_configs_funcs
>>>>>>>>      59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>>>>>      59.052537: device_remove_file <-composite_dev_cleanup
>>>>>>>>
>>>>>>>> device_remove_file gets called twice, once by device_remove after
>>>>>>>> gether_disconnect (that the one). The 2nd time by
>>>>>>>> composite_dev_cleanup
>>>>>>>> (removing the gadget)
>>>>>>>
>>>>>>> I believe that the device_remove_file function is only removing
>>>>>>> suspend-specific attributes, not the complete gadget.
>>>>>>> Typically, when you perform the role switch, the Gadget start/stop
>>>>>>> function in your UDC driver is called. These functions should not
>>>>>>> delete the gadget
>>>>>>>
>>>>>>> To investigate further, could you please enable the DWC3 functions
>>>>>>> in ftrace and check who is removing the gadget?
>>>>>>> I can also enable this on my system and compare the logs with yours,
>>>>>>> but I will be in PI planning for 1.5 weeks and may not be able to
>>>>>>> provide immediate support.
>>>>>>
>>>>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
>>>>>> trace is 600kb. I cut uninteresting stuff before
>>>>>> configfs_composite_reset <-usb_gadget_udc_reset and after
>>>>>> __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
>>>>>> compare.
>>>>>>
>>>> Could you please try with the following patch and see if your issue
>>>> resolves ?
>>>>
>>>> diff --git a/drivers/usb/gadget/function/f_eem.c
>>>> b/drivers/usb/gadget/function/f_eem.c
>>>> index 3b445bd88498..c2a904475083 100644
>>>> --- a/drivers/usb/gadget/function/f_eem.c
>>>> +++ b/drivers/usb/gadget/function/f_eem.c
>>>> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c,
>>>> struct usb_function *f)
>>>>        struct usb_composite_dev *cdev = c->cdev;
>>>>        struct f_eem        *eem = func_to_eem(f);
>>>>        struct usb_string    *us;
>>>> -    int            status;
>>>> +    int            status = 0;
>>>>        struct usb_ep        *ep;
>>>>        struct f_eem_opts    *eem_opts;
>>>> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration
>>>> *c, struct usb_function *f)
>>>>         * with list_for_each_entry, so we assume no race condition
>>>>         * with regard to eem_opts->bound access
>>>>         */
>>>> +    mutex_lock(&eem_opts->lock);
>>>> +    gether_set_gadget(eem_opts->net, cdev->gadget);
>>>> +
>>>>        if (!eem_opts->bound) {
>>>> -        mutex_lock(&eem_opts->lock);
>>>> -        gether_set_gadget(eem_opts->net, cdev->gadget);
>>>>            status = gether_register_netdev(eem_opts->net);
>>>> -        mutex_unlock(&eem_opts->lock);
>>>> -        if (status)
>>>> -            return status;
>>>> -        eem_opts->bound = true;
>>>>        }
>>>> +    mutex_unlock(&eem_opts->lock);
>>>> +
>>>> +    if (status)
>>>> +        goto fail;
>>>> +
>>>> +    eem_opts->bound = true;
>>>> +
>>>>        us = usb_gstrings_attach(cdev, eem_strings,
>>>>                     ARRAY_SIZE(eem_string_defs));
>>>>
>>>
>>> After switching from host -> device -> host the usb0 device as seen by
>>> ifconfig remains "RUNNING" and in the routing table.
>>>
>>> FYI This is the regression.
>>
>> For us, I don't see any advantages in having this patch and only the
>> downside. Do you have any further ideas / patches that we could test.
>>
>> Otherwise, it might be best to go with the original suggestion and revert
>> until it gets sorted out?
> 
> Hello,
> 
> Okay, lets revert the patch till we fix your problem. Would you be able to send the revert ? or shall I ?
> 
> However, We are very much intrested to fix your problem and upstream the v2.
> Therefore could you please share the following details to reproduce the issue

Instruction to establish the eem connection on the device (connman) and 
the host (networkmanager) here:
https://edison-fw.github.io/meta-intel-edison/4.2-networking.html#gadget

> - How did you created the gadget ? Could you please share the exact commands to create the eem gadget ?

Sure, you can find the u-dev rules and the script here on GH:
https://github.com/edison-fw/meta-intel-edison/tree/kirkstone/meta-intel-edison-bsp/recipes-support/gadget/files

> - Arch of your test bench (arm, x86)

This is x86_64 although we can just as easily build i686 if needed.

> - How the DWC3 controller connected to your system ? via PCIe slot ? Could you please share the part number of the PCIe card ?

Like Andy mentioned it is in the SOC. It's described in ACPI (in U-Boot 
together with other SOC level devices) here:
https://github.com/u-boot/u-boot/blob/ea722aa5eb33740ae77e8816aeb72b385e621cd0/arch/x86/include/asm/arch-tangier/acpi/southcluster.asl#L333

> Hardik
> 
>>
>>> Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is
>>> still missing.
>>>
>>> After deeper diving into v6.1.0, I found the latter a longer existing
>>> bug, not caused by your patch.
>>>
>>> More over, this doesn't seem to do any harm.
>>>
>>> The first issue does, as we try to route traffic over a device that no
>>> longer exists.
>>>
>>>>>>> Additionally, please check if you have any customized DWC patches
>>>>>>> that may be causing this problem.
>>>>>>>
>>>>>>>>
>>>>>>>>> You may recall the whole issue did not occur before this
>>>>>>>>> patch got applied.
>>>>>>>>>
>>>>>>>>>> Hardik
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>>>>> I suggest sharing the analysis
>>>>>>>>>>>>> here to understand why this
>>>>>>>>>>>>> practice
>>>>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>>>>
>>>>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>>>>> harmless? But the usb: net device
>>>>>>>>>>>> remaining after disconnect or role
>>>>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>>>>
>>>>>>>>>>>> May be they are 2 separate problems.
>>>>>>>>>>>> Could you try to reproduce what
>>>>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>>>>
>>>>>>>>>>>>> I am curious why the driver was
>>>>>>>>>>>>> developed without adhering to
>>>>>>>>>>>>> the
>>>>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>>>>
>>>>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>> With Best Regards,
>>>>>>>>>>>>>> Andy Shevchenko
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>
>>>>
>>>
Linux regression tracking (Thorsten Leemhuis) June 6, 2024, 7:58 a.m. UTC | #25
On 28.05.24 09:18, Hardik Gajjar wrote:
>
> Okay, lets revert the patch till we fix your problem. Would you be able to send the revert ? or shall I ?
diff mbox series

Patch

diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 4bb0553da658..9d1c40c152d8 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -1200,7 +1200,7 @@  void gether_disconnect(struct gether *link)
 
 	DBG(dev, "%s\n", __func__);
 
-	netif_stop_queue(dev->net);
+	netif_device_detach(dev->net);
 	netif_carrier_off(dev->net);
 
 	/* disable endpoints, forcing (synchronous) completion