Message ID | 20210420075406.64105-1-acelan.kao@canonical.com |
---|---|
State | Superseded |
Headers | show |
Series | net: called rtnl_unlock() before runpm resumes devices | expand |
Yes, should add Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") and also Fixes: 9513d2a5dc7f ("igc: Add legacy power management support") Jakub Kicinski <kuba@kernel.org> 於 2021年4月21日 週三 上午3:27寫道: > > On Tue, 20 Apr 2021 10:34:17 +0200 Eric Dumazet wrote: > > On Tue, Apr 20, 2021 at 9:54 AM AceLan Kao <acelan.kao@canonical.com> wrote: > > > > > > From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com> > > > > > > The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in > > > __dev_open() it calls pm_runtime_resume() to resume devices, and in > > > some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock() > > > again. That leads to a recursive lock. > > > > > > It should leave the devices' resume function to decide if they need to > > > call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling > > > pm_runtime_resume() and then call rtnl_lock() after it in __dev_open(). > > > > > > > > > > Hi Acelan > > > > When was the bugg added ? > > Please add a Fixes: tag > > For immediate cause probably: > > Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") > > > By doing so, you give more chances for reviewers to understand why the > > fix is not risky, > > and help stable teams work. > > IMO the driver lacks internal locking. Taking rtnl from resume is just > one example, git history shows many more places that lacked locking and > got papered over with rtnl here.
On 22.04.2021 08:30, AceLan Kao wrote: > Yes, should add > > Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") > and also > Fixes: 9513d2a5dc7f ("igc: Add legacy power management support") > Please don't top-post. Apart from that: If the issue was introduced with driver changes, then adding a workaround in net core may not be the right approach. > Jakub Kicinski <kuba@kernel.org> 於 2021年4月21日 週三 上午3:27寫道: >> >> On Tue, 20 Apr 2021 10:34:17 +0200 Eric Dumazet wrote: >>> On Tue, Apr 20, 2021 at 9:54 AM AceLan Kao <acelan.kao@canonical.com> wrote: >>>> >>>> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com> >>>> >>>> The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in >>>> __dev_open() it calls pm_runtime_resume() to resume devices, and in >>>> some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock() >>>> again. That leads to a recursive lock. >>>> >>>> It should leave the devices' resume function to decide if they need to >>>> call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling >>>> pm_runtime_resume() and then call rtnl_lock() after it in __dev_open(). >>>> >>>> >>> >>> Hi Acelan >>> >>> When was the bugg added ? >>> Please add a Fixes: tag >> >> For immediate cause probably: >> >> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") >> >>> By doing so, you give more chances for reviewers to understand why the >>> fix is not risky, >>> and help stable teams work. >> >> IMO the driver lacks internal locking. Taking rtnl from resume is just >> one example, git history shows many more places that lacked locking and >> got papered over with rtnl here.
Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月22日 週四 下午3:09寫道: > > On 22.04.2021 08:30, AceLan Kao wrote: > > Yes, should add > > > > Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") > > and also > > Fixes: 9513d2a5dc7f ("igc: Add legacy power management support") > > > Please don't top-post. Apart from that: > If the issue was introduced with driver changes, then adding a workaround > in net core may not be the right approach. It's hard to say who introduces this issue, we probably could point our finger to below commit bd869245a3dc net: core: try to runtime-resume detached device in __dev_open This calling path is not usual, in my case, the NIC is not plugged in any Ethernet cable, and we are doing networking tests on another NIC on the system. So, remove the rtnl lock from igb driver will affect other scenarios. > > > Jakub Kicinski <kuba@kernel.org> 於 2021年4月21日 週三 上午3:27寫道: > >> > >> On Tue, 20 Apr 2021 10:34:17 +0200 Eric Dumazet wrote: > >>> On Tue, Apr 20, 2021 at 9:54 AM AceLan Kao <acelan.kao@canonical.com> wrote: > >>>> > >>>> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com> > >>>> > >>>> The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in > >>>> __dev_open() it calls pm_runtime_resume() to resume devices, and in > >>>> some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock() > >>>> again. That leads to a recursive lock. > >>>> > >>>> It should leave the devices' resume function to decide if they need to > >>>> call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling > >>>> pm_runtime_resume() and then call rtnl_lock() after it in __dev_open(). > >>>> > >>>> > >>> > >>> Hi Acelan > >>> > >>> When was the bugg added ? > >>> Please add a Fixes: tag > >> > >> For immediate cause probably: > >> > >> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") > >> > >>> By doing so, you give more chances for reviewers to understand why the > >>> fix is not risky, > >>> and help stable teams work. > >> > >> IMO the driver lacks internal locking. Taking 看rtnl from resume is just > >> one example, git history shows many more places that lacked locking and > >> got papered over with rtnl here. >
On 23.04.2021 05:42, AceLan Kao wrote: > Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月22日 週四 下午3:09寫道: >> >> On 22.04.2021 08:30, AceLan Kao wrote: >>> Yes, should add >>> >>> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") >>> and also >>> Fixes: 9513d2a5dc7f ("igc: Add legacy power management support") >>> >> Please don't top-post. Apart from that: >> If the issue was introduced with driver changes, then adding a workaround >> in net core may not be the right approach. > It's hard to say who introduces this issue, we probably could point > our finger to below commit > bd869245a3dc net: core: try to runtime-resume detached device in __dev_open > > This calling path is not usual, in my case, the NIC is not plugged in > any Ethernet cable, > and we are doing networking tests on another NIC on the system. So, > remove the rtnl lock from igb driver will affect other scenarios. > >> >>> Jakub Kicinski <kuba@kernel.org> 於 2021年4月21日 週三 上午3:27寫道: >>>> >>>> On Tue, 20 Apr 2021 10:34:17 +0200 Eric Dumazet wrote: >>>>> On Tue, Apr 20, 2021 at 9:54 AM AceLan Kao <acelan.kao@canonical.com> wrote: >>>>>> >>>>>> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com> >>>>>> >>>>>> The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in >>>>>> __dev_open() it calls pm_runtime_resume() to resume devices, and in >>>>>> some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock() >>>>>> again. That leads to a recursive lock. >>>>>> >>>>>> It should leave the devices' resume function to decide if they need to >>>>>> call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling >>>>>> pm_runtime_resume() and then call rtnl_lock() after it in __dev_open(). >>>>>> >>>>>> >>>>> >>>>> Hi Acelan >>>>> >>>>> When was the bugg added ? >>>>> Please add a Fixes: tag >>>> >>>> For immediate cause probably: >>>> >>>> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") >>>> >>>>> By doing so, you give more chances for reviewers to understand why the >>>>> fix is not risky, >>>>> and help stable teams work. >>>> >>>> IMO the driver lacks internal locking. Taking 看rtnl from resume is just >>>> one example, git history shows many more places that lacked locking and >>>> got papered over with rtnl here. >> You could alternatively try the following. It should avoid the deadlock, and when runtime-resuming if __IGB_DOWN is set all we do is marking the net_device as present (because of PCI D3 -> D0 transition). I do basically the same in r8169 and it works as intended. Disclaimer: I don't have an igb-driven device and therefore can't test the proposal. diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 038a9fd1a..21436626a 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -9300,6 +9300,14 @@ static int __maybe_unused igb_runtime_suspend(struct device *dev) static int __maybe_unused igb_runtime_resume(struct device *dev) { + struct net_device *netdev = dev_get_drvdata(dev); + struct igb_adapter *adapter = netdev_priv(netdev); + + if (test_bit(__IGB_DOWN, &adapter->state)) { + netif_device_attach(netdev); + return 0; + } + return igb_resume(dev); } -- 2.31.1
Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月25日 週日 上午4:07寫道: > > On 23.04.2021 05:42, AceLan Kao wrote: > > Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月22日 週四 下午3:09寫道: > >> > >> On 22.04.2021 08:30, AceLan Kao wrote: > >>> Yes, should add > >>> > >>> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") > >>> and also > >>> Fixes: 9513d2a5dc7f ("igc: Add legacy power management support") > >>> > >> Please don't top-post. Apart from that: > >> If the issue was introduced with driver changes, then adding a workaround > >> in net core may not be the right approach. > > It's hard to say who introduces this issue, we probably could point > > our finger to below commit > > bd869245a3dc net: core: try to runtime-resume detached device in __dev_open > > > > This calling path is not usual, in my case, the NIC is not plugged in > > any Ethernet cable, > > and we are doing networking tests on another NIC on the system. So, > > remove the rtnl lock from igb driver will affect other scenarios. > > > >> > >>> Jakub Kicinski <kuba@kernel.org> 於 2021年4月21日 週三 上午3:27寫道: > >>>> > >>>> On Tue, 20 Apr 2021 10:34:17 +0200 Eric Dumazet wrote: > >>>>> On Tue, Apr 20, 2021 at 9:54 AM AceLan Kao <acelan.kao@canonical.com> wrote: > >>>>>> > >>>>>> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com> > >>>>>> > >>>>>> The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in > >>>>>> __dev_open() it calls pm_runtime_resume() to resume devices, and in > >>>>>> some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock() > >>>>>> again. That leads to a recursive lock. > >>>>>> > >>>>>> It should leave the devices' resume function to decide if they need to > >>>>>> call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling > >>>>>> pm_runtime_resume() and then call rtnl_lock() after it in __dev_open(). > >>>>>> > >>>>>> > >>>>> > >>>>> Hi Acelan > >>>>> > >>>>> When was the bugg added ? > >>>>> Please add a Fixes: tag > >>>> > >>>> For immediate cause probably: > >>>> > >>>> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") > >>>> > >>>>> By doing so, you give more chances for reviewers to understand why the > >>>>> fix is not risky, > >>>>> and help stable teams work. > >>>> > >>>> IMO the driver lacks internal locking. Taking 看rtnl from resume is just > >>>> one example, git history shows many more places that lacked locking and > >>>> got papered over with rtnl here. > >> > > You could alternatively try the following. It should avoid the deadlock, > and when runtime-resuming if __IGB_DOWN is set all we do is marking the > net_device as present (because of PCI D3 -> D0 transition). > I do basically the same in r8169 and it works as intended. > > Disclaimer: I don't have an igb-driven device and therefore can't test > the proposal. > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c > index 038a9fd1a..21436626a 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -9300,6 +9300,14 @@ static int __maybe_unused igb_runtime_suspend(struct device *dev) > > static int __maybe_unused igb_runtime_resume(struct device *dev) > { > + struct net_device *netdev = dev_get_drvdata(dev); > + struct igb_adapter *adapter = netdev_priv(netdev); > + > + if (test_bit(__IGB_DOWN, &adapter->state)) { > + netif_device_attach(netdev); > + return 0; > + } > + > return igb_resume(dev); > } > > -- > 2.31.1 > Hi Heiner, I encountered below error after applied your patch. [ 121.489970] u kernel: ------------[ cut here ]------------ [ 121.489979] u kernel: igb 0000:05:00.0: disabling already-disabled device [ 121.490008] u kernel: WARNING: CPU: 7 PID: 258 at drivers/pci/pci.c:2146 pci_disable_device+0x91/0xb0 [ 121.490028] u kernel: Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth ecdh_generic ecc joydev input_leds inte l_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp ath10k_pci ath10k_core kvm_intel ath mac80211 kvm snd_sof_pci_intel_tgl snd_soc_acpi_intel_ma tch snd_sof_intel_hda_common nls_iso8859_1 soundwire_intel soundwire_generic_allocation soundwire_cadence soundwire_bus snd_sof_pci snd_soc_acpi snd_sof snd_soc_core snd _hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul snd_sof_xtensa_dsp ghash_clmulni_intel ledtrig_audio aesni_intel snd_hda_intel libarc4 crypto_simd snd_intel_dspcfg snd_intel_sdw_acpi cryptd snd_hda_codec cfg80211 mei_hdcp snd_hwdep snd_hda_core intel_wmi_thunderbolt snd_pcm wmi_bmof snd_seq inte l_cstate efi_pstore snd_timer snd_seq_device ee1004 mei_me snd mei ucsi_acpi soundcore typec_ucsi typec wmi mac_hid acpi_pad acpi_tad sch_fq_codel [ 121.490314] u kernel: parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_ pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear hid_generic usbhid hid i915 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc _core igb drm nvme e1000e nvme_core i2c_i801 dca i2c_smbus i2c_algo_bit intel_lpss_pci intel_lpss ahci idma64 video xhci_pci libahci virt_dma xhci_pci_renesas pinctrl_ti gerlake [ 121.490508] u kernel: CPU: 7 PID: 258 Comm: kworker/7:2 Tainted: G U 5.12.0-rc7+ #79 [ 121.490518] u kernel: Hardware name: Dell Inc. OptiPlex 7090/, BIOS 0.12.80 02/23/2021 [ 121.490525] u kernel: Workqueue: pm pm_runtime_work [ 121.490540] u kernel: RIP: 0010:pci_disable_device+0x91/0xb0 [ 121.490550] u kernel: Code: 4d 85 e4 75 07 4c 8b a3 c8 00 00 00 48 8d bb c8 00 00 00 e8 61 8d 17 00 4c 89 e2 48 c7 c7 60 5a e0 a5 48 89 c6 e8 9b a3 59 00 <0f> 0b eb 8 d 48 89 df e8 e3 fe ff ff 80 a3 49 0a 00 00 df 5b 41 5c [ 121.490558] u kernel: RSP: 0018:ffffb76b4169fc90 EFLAGS: 00010286 [ 121.490569] u kernel: RAX: 0000000000000000 RBX: ffff9e2581ee6000 RCX: 0000000000000027 [ 121.490576] u kernel: RDX: 0000000000000027 RSI: ffffffffa493bca0 RDI: ffff9e27073e89b8 [ 121.490582] u kernel: RBP: ffffb76b4169fca0 R08: ffff9e27073e89b0 R09: 0000000000000000 [ 121.490588] u kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff9e2581af7c80 [ 121.490594] u kernel: R13: ffff9e2581ee6000 R14: ffff9e25a0914000 R15: ffff9e25a0915280 [ 121.490600] u kernel: FS: 0000000000000000(0000) GS:ffff9e2707200000(0000) knlGS:0000000000000000 [ 121.490608] u kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 121.490614] u kernel: CR2: 00007ff86ec8d024 CR3: 0000000189c28002 CR4: 0000000000770ee0 [ 121.490621] u kernel: PKRU: 55555554 [ 121.490626] u kernel: Call Trace: [ 121.490638] u kernel: __igb_shutdown+0xf2/0x1c0 [igb] [ 121.490676] u kernel: igb_runtime_suspend+0x1c/0x20 [igb] [ 121.490703] u kernel: pci_pm_runtime_suspend+0x63/0x180 [ 121.490715] u kernel: ? pci_pm_runtime_resume+0x90/0x90 [ 121.490727] u kernel: __rpm_callback+0xc7/0x140 [ 121.490740] u kernel: rpm_callback+0x57/0x80 [ 121.490750] u kernel: ? pci_pm_runtime_resume+0x90/0x90 [ 121.490759] u kernel: rpm_suspend+0x119/0x640 [ 121.490774] u kernel: pm_runtime_work+0x64/0xc0 [ 121.490784] u kernel: process_one_work+0x2af/0x5d0 [ 121.490803] u kernel: worker_thread+0x4d/0x3e0 [ 121.490814] u kernel: ? process_one_work+0x5d0/0x5d0 [ 121.490825] u kernel: kthread+0x12a/0x160 [ 121.490834] u kernel: ? kthread_park+0x90/0x90 [ 121.490844] u kernel: ret_from_fork+0x1f/0x30 [ 121.490867] u kernel: irq event stamp: 0[ 121.490871] u kernel: hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 121.490916] u kernel: hardirqs last disabled at (0): [<ffffffffa489ea44>] copy_process+0x714/0x1cc0 [ 121.490929] u kernel: softirqs last enabled at (0): [<ffffffffa489ea44>] copy_process+0x714/0x1cc0 [ 121.490938] u kernel: softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 121.490949] u kernel: ---[ end trace a9c7ffc27c226979 ]---
On 26.04.2021 09:36, AceLan Kao wrote: > Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月25日 週日 上午4:07寫道: >> >> On 23.04.2021 05:42, AceLan Kao wrote: >>> Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月22日 週四 下午3:09寫道: >>>> >>>> On 22.04.2021 08:30, AceLan Kao wrote: >>>>> Yes, should add >>>>> >>>>> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") >>>>> and also >>>>> Fixes: 9513d2a5dc7f ("igc: Add legacy power management support") >>>>> >>>> Please don't top-post. Apart from that: >>>> If the issue was introduced with driver changes, then adding a workaround >>>> in net core may not be the right approach. >>> It's hard to say who introduces this issue, we probably could point >>> our finger to below commit >>> bd869245a3dc net: core: try to runtime-resume detached device in __dev_open >>> >>> This calling path is not usual, in my case, the NIC is not plugged in >>> any Ethernet cable, >>> and we are doing networking tests on another NIC on the system. So, >>> remove the rtnl lock from igb driver will affect other scenarios. >>> >>>> >>>>> Jakub Kicinski <kuba@kernel.org> 於 2021年4月21日 週三 上午3:27寫道: >>>>>> >>>>>> On Tue, 20 Apr 2021 10:34:17 +0200 Eric Dumazet wrote: >>>>>>> On Tue, Apr 20, 2021 at 9:54 AM AceLan Kao <acelan.kao@canonical.com> wrote: >>>>>>>> >>>>>>>> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com> >>>>>>>> >>>>>>>> The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in >>>>>>>> __dev_open() it calls pm_runtime_resume() to resume devices, and in >>>>>>>> some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock() >>>>>>>> again. That leads to a recursive lock. >>>>>>>> >>>>>>>> It should leave the devices' resume function to decide if they need to >>>>>>>> call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling >>>>>>>> pm_runtime_resume() and then call rtnl_lock() after it in __dev_open(). >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> Hi Acelan >>>>>>> >>>>>>> When was the bugg added ? >>>>>>> Please add a Fixes: tag >>>>>> >>>>>> For immediate cause probably: >>>>>> >>>>>> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") >>>>>> >>>>>>> By doing so, you give more chances for reviewers to understand why the >>>>>>> fix is not risky, >>>>>>> and help stable teams work. >>>>>> >>>>>> IMO the driver lacks internal locking. Taking 看rtnl from resume is just >>>>>> one example, git history shows many more places that lacked locking and >>>>>> got papered over with rtnl here. >>>> >> >> You could alternatively try the following. It should avoid the deadlock, >> and when runtime-resuming if __IGB_DOWN is set all we do is marking the >> net_device as present (because of PCI D3 -> D0 transition). >> I do basically the same in r8169 and it works as intended. >> >> Disclaimer: I don't have an igb-driven device and therefore can't test >> the proposal. >> >> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c >> index 038a9fd1a..21436626a 100644 >> --- a/drivers/net/ethernet/intel/igb/igb_main.c >> +++ b/drivers/net/ethernet/intel/igb/igb_main.c >> @@ -9300,6 +9300,14 @@ static int __maybe_unused igb_runtime_suspend(struct device *dev) >> >> static int __maybe_unused igb_runtime_resume(struct device *dev) >> { >> + struct net_device *netdev = dev_get_drvdata(dev); >> + struct igb_adapter *adapter = netdev_priv(netdev); >> + >> + if (test_bit(__IGB_DOWN, &adapter->state)) { >> + netif_device_attach(netdev); >> + return 0; >> + } >> + >> return igb_resume(dev); >> } >> >> -- >> 2.31.1 >> > > Hi Heiner, > > I encountered below error after applied your patch. > Presumably similar changes are needed also for the runtime_suspend callback. If __IGB_DOWN is set, I think just the net_device needs to be detached. > [ 121.489970] u kernel: ------------[ cut here ]------------ > [ 121.489979] u kernel: igb 0000:05:00.0: disabling already-disabled device > [ 121.490008] u kernel: WARNING: CPU: 7 PID: 258 at > drivers/pci/pci.c:2146 pci_disable_device+0x91/0xb0 > [ 121.490028] u kernel: Modules linked in: rfcomm cmac algif_hash > algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth > ecdh_generic ecc joydev input_leds inte > l_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp > coretemp ath10k_pci ath10k_core kvm_intel ath mac80211 kvm > snd_sof_pci_intel_tgl snd_soc_acpi_intel_ma > tch snd_sof_intel_hda_common nls_iso8859_1 soundwire_intel > soundwire_generic_allocation soundwire_cadence soundwire_bus > snd_sof_pci snd_soc_acpi snd_sof snd_soc_core snd > _hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi > crct10dif_pclmul crc32_pclmul snd_sof_xtensa_dsp ghash_clmulni_intel > ledtrig_audio aesni_intel snd_hda_intel > libarc4 crypto_simd snd_intel_dspcfg snd_intel_sdw_acpi cryptd > snd_hda_codec cfg80211 mei_hdcp snd_hwdep snd_hda_core > intel_wmi_thunderbolt snd_pcm wmi_bmof snd_seq inte > l_cstate efi_pstore snd_timer snd_seq_device ee1004 mei_me snd mei > ucsi_acpi soundcore typec_ucsi typec wmi mac_hid acpi_pad acpi_tad > sch_fq_codel > [ 121.490314] u kernel: parport_pc ppdev lp parport ip_tables > x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 > async_raid6_recov async_memcpy async_ > pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath > linear hid_generic usbhid hid i915 drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops cec rc > _core igb drm nvme e1000e nvme_core i2c_i801 dca i2c_smbus > i2c_algo_bit intel_lpss_pci intel_lpss ahci idma64 video xhci_pci > libahci virt_dma xhci_pci_renesas pinctrl_ti > gerlake > [ 121.490508] u kernel: CPU: 7 PID: 258 Comm: kworker/7:2 Tainted: G > U 5.12.0-rc7+ #79 > [ 121.490518] u kernel: Hardware name: Dell Inc. OptiPlex 7090/, BIOS > 0.12.80 02/23/2021 > [ 121.490525] u kernel: Workqueue: pm pm_runtime_work > [ 121.490540] u kernel: RIP: 0010:pci_disable_device+0x91/0xb0 > [ 121.490550] u kernel: Code: 4d 85 e4 75 07 4c 8b a3 c8 00 00 00 48 > 8d bb c8 00 00 00 e8 61 8d 17 00 4c 89 e2 48 c7 c7 60 5a e0 a5 48 89 > c6 e8 9b a3 59 00 <0f> 0b eb 8 > d 48 89 df e8 e3 fe ff ff 80 a3 49 0a 00 00 df 5b 41 5c > [ 121.490558] u kernel: RSP: 0018:ffffb76b4169fc90 EFLAGS: 00010286 > [ 121.490569] u kernel: RAX: 0000000000000000 RBX: ffff9e2581ee6000 > RCX: 0000000000000027 > [ 121.490576] u kernel: RDX: 0000000000000027 RSI: ffffffffa493bca0 > RDI: ffff9e27073e89b8 > [ 121.490582] u kernel: RBP: ffffb76b4169fca0 R08: ffff9e27073e89b0 > R09: 0000000000000000 > [ 121.490588] u kernel: R10: 0000000000000000 R11: 0000000000000001 > R12: ffff9e2581af7c80 > [ 121.490594] u kernel: R13: ffff9e2581ee6000 R14: ffff9e25a0914000 > R15: ffff9e25a0915280 > [ 121.490600] u kernel: FS: 0000000000000000(0000) > GS:ffff9e2707200000(0000) knlGS:0000000000000000 > [ 121.490608] u kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 121.490614] u kernel: CR2: 00007ff86ec8d024 CR3: 0000000189c28002 > CR4: 0000000000770ee0 > [ 121.490621] u kernel: PKRU: 55555554 > [ 121.490626] u kernel: Call Trace: > [ 121.490638] u kernel: __igb_shutdown+0xf2/0x1c0 [igb] > [ 121.490676] u kernel: igb_runtime_suspend+0x1c/0x20 [igb] > [ 121.490703] u kernel: pci_pm_runtime_suspend+0x63/0x180 > [ 121.490715] u kernel: ? pci_pm_runtime_resume+0x90/0x90 > [ 121.490727] u kernel: __rpm_callback+0xc7/0x140 > [ 121.490740] u kernel: rpm_callback+0x57/0x80 > [ 121.490750] u kernel: ? pci_pm_runtime_resume+0x90/0x90 > [ 121.490759] u kernel: rpm_suspend+0x119/0x640 > [ 121.490774] u kernel: pm_runtime_work+0x64/0xc0 > [ 121.490784] u kernel: process_one_work+0x2af/0x5d0 > [ 121.490803] u kernel: worker_thread+0x4d/0x3e0 > [ 121.490814] u kernel: ? process_one_work+0x5d0/0x5d0 > [ 121.490825] u kernel: kthread+0x12a/0x160 > [ 121.490834] u kernel: ? kthread_park+0x90/0x90 > [ 121.490844] u kernel: ret_from_fork+0x1f/0x30 > [ 121.490867] u kernel: irq event stamp: 0[ 121.490871] u kernel: > hardirqs last enabled at (0): [<0000000000000000>] 0x0 > [ 121.490916] u kernel: hardirqs last disabled at (0): > [<ffffffffa489ea44>] copy_process+0x714/0x1cc0 > [ 121.490929] u kernel: softirqs last enabled at (0): > [<ffffffffa489ea44>] copy_process+0x714/0x1cc0 > [ 121.490938] u kernel: softirqs last disabled at (0): [<0000000000000000>] 0x0 > [ 121.490949] u kernel: ---[ end trace a9c7ffc27c226979 ]--- >
I got another issue ticket with the same dump_stack with igc NIC. To fix this on the device side would lead to many other issues and we have to check on the device side if the calling path contains rtnl_lock already. I still think we should not keep rtnl locked when calling functions outside of network stack. Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月26日 週一 下午4:42寫道: > > On 26.04.2021 09:36, AceLan Kao wrote: > > Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月25日 週日 上午4:07寫道: > >> > >> On 23.04.2021 05:42, AceLan Kao wrote: > >>> Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月22日 週四 下午3:09寫道: > >>>> > >>>> On 22.04.2021 08:30, AceLan Kao wrote: > >>>>> Yes, should add > >>>>> > >>>>> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") > >>>>> and also > >>>>> Fixes: 9513d2a5dc7f ("igc: Add legacy power management support") > >>>>> > >>>> Please don't top-post. Apart from that: > >>>> If the issue was introduced with driver changes, then adding a workaround > >>>> in net core may not be the right approach. > >>> It's hard to say who introduces this issue, we probably could point > >>> our finger to below commit > >>> bd869245a3dc net: core: try to runtime-resume detached device in __dev_open > >>> > >>> This calling path is not usual, in my case, the NIC is not plugged in > >>> any Ethernet cable, > >>> and we are doing networking tests on another NIC on the system. So, > >>> remove the rtnl lock from igb driver will affect other scenarios. > >>> > >>>> > >>>>> Jakub Kicinski <kuba@kernel.org> 於 2021年4月21日 週三 上午3:27寫道: > >>>>>> > >>>>>> On Tue, 20 Apr 2021 10:34:17 +0200 Eric Dumazet wrote: > >>>>>>> On Tue, Apr 20, 2021 at 9:54 AM AceLan Kao <acelan.kao@canonical.com> wrote: > >>>>>>>> > >>>>>>>> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com> > >>>>>>>> > >>>>>>>> The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in > >>>>>>>> __dev_open() it calls pm_runtime_resume() to resume devices, and in > >>>>>>>> some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock() > >>>>>>>> again. That leads to a recursive lock. > >>>>>>>> > >>>>>>>> It should leave the devices' resume function to decide if they need to > >>>>>>>> call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling > >>>>>>>> pm_runtime_resume() and then call rtnl_lock() after it in __dev_open(). > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> Hi Acelan > >>>>>>> > >>>>>>> When was the bugg added ? > >>>>>>> Please add a Fixes: tag > >>>>>> > >>>>>> For immediate cause probably: > >>>>>> > >>>>>> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") > >>>>>> > >>>>>>> By doing so, you give more chances for reviewers to understand why the > >>>>>>> fix is not risky, > >>>>>>> and help stable teams work. > >>>>>> > >>>>>> IMO the driver lacks internal locking. Taking 看rtnl from resume is just > >>>>>> one example, git history shows many more places that lacked locking and > >>>>>> got papered over with rtnl here. > >>>> > >> > >> You could alternatively try the following. It should avoid the deadlock, > >> and when runtime-resuming if __IGB_DOWN is set all we do is marking the > >> net_device as present (because of PCI D3 -> D0 transition). > >> I do basically the same in r8169 and it works as intended. > >> > >> Disclaimer: I don't have an igb-driven device and therefore can't test > >> the proposal. > >> > >> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c > >> index 038a9fd1a..21436626a 100644 > >> --- a/drivers/net/ethernet/intel/igb/igb_main.c > >> +++ b/drivers/net/ethernet/intel/igb/igb_main.c > >> @@ -9300,6 +9300,14 @@ static int __maybe_unused igb_runtime_suspend(struct device *dev) > >> > >> static int __maybe_unused igb_runtime_resume(struct device *dev) > >> { > >> + struct net_device *netdev = dev_get_drvdata(dev); > >> + struct igb_adapter *adapter = netdev_priv(netdev); > >> + > >> + if (test_bit(__IGB_DOWN, &adapter->state)) { > >> + netif_device_attach(netdev); > >> + return 0; > >> + } > >> + > >> return igb_resume(dev); > >> } > >> > >> -- > >> 2.31.1 > >> > > > > Hi Heiner, > > > > I encountered below error after applied your patch. > > > Presumably similar changes are needed also for the runtime_suspend callback. > If __IGB_DOWN is set, I think just the net_device needs to be detached. > > > > [ 121.489970] u kernel: ------------[ cut here ]------------ > > [ 121.489979] u kernel: igb 0000:05:00.0: disabling already-disabled device > > [ 121.490008] u kernel: WARNING: CPU: 7 PID: 258 at > > drivers/pci/pci.c:2146 pci_disable_device+0x91/0xb0 > > [ 121.490028] u kernel: Modules linked in: rfcomm cmac algif_hash > > algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth > > ecdh_generic ecc joydev input_leds inte > > l_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp > > coretemp ath10k_pci ath10k_core kvm_intel ath mac80211 kvm > > snd_sof_pci_intel_tgl snd_soc_acpi_intel_ma > > tch snd_sof_intel_hda_common nls_iso8859_1 soundwire_intel > > soundwire_generic_allocation soundwire_cadence soundwire_bus > > snd_sof_pci snd_soc_acpi snd_sof snd_soc_core snd > > _hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi > > crct10dif_pclmul crc32_pclmul snd_sof_xtensa_dsp ghash_clmulni_intel > > ledtrig_audio aesni_intel snd_hda_intel > > libarc4 crypto_simd snd_intel_dspcfg snd_intel_sdw_acpi cryptd > > snd_hda_codec cfg80211 mei_hdcp snd_hwdep snd_hda_core > > intel_wmi_thunderbolt snd_pcm wmi_bmof snd_seq inte > > l_cstate efi_pstore snd_timer snd_seq_device ee1004 mei_me snd mei > > ucsi_acpi soundcore typec_ucsi typec wmi mac_hid acpi_pad acpi_tad > > sch_fq_codel > > [ 121.490314] u kernel: parport_pc ppdev lp parport ip_tables > > x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 > > async_raid6_recov async_memcpy async_ > > pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath > > linear hid_generic usbhid hid i915 drm_kms_helper syscopyarea > > sysfillrect sysimgblt fb_sys_fops cec rc > > _core igb drm nvme e1000e nvme_core i2c_i801 dca i2c_smbus > > i2c_algo_bit intel_lpss_pci intel_lpss ahci idma64 video xhci_pci > > libahci virt_dma xhci_pci_renesas pinctrl_ti > > gerlake > > [ 121.490508] u kernel: CPU: 7 PID: 258 Comm: kworker/7:2 Tainted: G > > U 5.12.0-rc7+ #79 > > [ 121.490518] u kernel: Hardware name: Dell Inc. OptiPlex 7090/, BIOS > > 0.12.80 02/23/2021 > > [ 121.490525] u kernel: Workqueue: pm pm_runtime_work > > [ 121.490540] u kernel: RIP: 0010:pci_disable_device+0x91/0xb0 > > [ 121.490550] u kernel: Code: 4d 85 e4 75 07 4c 8b a3 c8 00 00 00 48 > > 8d bb c8 00 00 00 e8 61 8d 17 00 4c 89 e2 48 c7 c7 60 5a e0 a5 48 89 > > c6 e8 9b a3 59 00 <0f> 0b eb 8 > > d 48 89 df e8 e3 fe ff ff 80 a3 49 0a 00 00 df 5b 41 5c > > [ 121.490558] u kernel: RSP: 0018:ffffb76b4169fc90 EFLAGS: 00010286 > > [ 121.490569] u kernel: RAX: 0000000000000000 RBX: ffff9e2581ee6000 > > RCX: 0000000000000027 > > [ 121.490576] u kernel: RDX: 0000000000000027 RSI: ffffffffa493bca0 > > RDI: ffff9e27073e89b8 > > [ 121.490582] u kernel: RBP: ffffb76b4169fca0 R08: ffff9e27073e89b0 > > R09: 0000000000000000 > > [ 121.490588] u kernel: R10: 0000000000000000 R11: 0000000000000001 > > R12: ffff9e2581af7c80 > > [ 121.490594] u kernel: R13: ffff9e2581ee6000 R14: ffff9e25a0914000 > > R15: ffff9e25a0915280 > > [ 121.490600] u kernel: FS: 0000000000000000(0000) > > GS:ffff9e2707200000(0000) knlGS:0000000000000000 > > [ 121.490608] u kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 121.490614] u kernel: CR2: 00007ff86ec8d024 CR3: 0000000189c28002 > > CR4: 0000000000770ee0 > > [ 121.490621] u kernel: PKRU: 55555554 > > [ 121.490626] u kernel: Call Trace: > > [ 121.490638] u kernel: __igb_shutdown+0xf2/0x1c0 [igb] > > [ 121.490676] u kernel: igb_runtime_suspend+0x1c/0x20 [igb] > > [ 121.490703] u kernel: pci_pm_runtime_suspend+0x63/0x180 > > [ 121.490715] u kernel: ? pci_pm_runtime_resume+0x90/0x90 > > [ 121.490727] u kernel: __rpm_callback+0xc7/0x140 > > [ 121.490740] u kernel: rpm_callback+0x57/0x80 > > [ 121.490750] u kernel: ? pci_pm_runtime_resume+0x90/0x90 > > [ 121.490759] u kernel: rpm_suspend+0x119/0x640 > > [ 121.490774] u kernel: pm_runtime_work+0x64/0xc0 > > [ 121.490784] u kernel: process_one_work+0x2af/0x5d0 > > [ 121.490803] u kernel: worker_thread+0x4d/0x3e0 > > [ 121.490814] u kernel: ? process_one_work+0x5d0/0x5d0 > > [ 121.490825] u kernel: kthread+0x12a/0x160 > > [ 121.490834] u kernel: ? kthread_park+0x90/0x90 > > [ 121.490844] u kernel: ret_from_fork+0x1f/0x30 > > [ 121.490867] u kernel: irq event stamp: 0[ 121.490871] u kernel: > > hardirqs last enabled at (0): [<0000000000000000>] 0x0 > > [ 121.490916] u kernel: hardirqs last disabled at (0): > > [<ffffffffa489ea44>] copy_process+0x714/0x1cc0 > > [ 121.490929] u kernel: softirqs last enabled at (0): > > [<ffffffffa489ea44>] copy_process+0x714/0x1cc0 > > [ 121.490938] u kernel: softirqs last disabled at (0): [<0000000000000000>] 0x0 > > [ 121.490949] u kernel: ---[ end trace a9c7ffc27c226979 ]--- > > >
On 27.04.2021 03:58, AceLan Kao wrote: > I got another issue ticket with the same dump_stack with igc NIC. > To fix this on the device side would lead to many other issues and we > have to check on the device side if the calling path contains > rtnl_lock already. I think it's not clean that the igb/igc runtime_resume handlers call __igb_open() before ndo_open(). I understand why it works in the drivers, but I'd consider it a little bit hacky. And there's no need for the drivers to acquire rtnl_lock before ndo_open(). > I still think we should not keep rtnl locked when calling functions > outside of network stack. > Well, the runtime pm handlers of a MAC driver are within the network stack. I understand that your proposed change is the easiest way to work around the issue you're facing. I'm not sure it's the best way however, but I don't have a strong opinion here and would leave it to the maintainers which option they prefer. > Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月26日 週一 下午4:42寫道: >> >> On 26.04.2021 09:36, AceLan Kao wrote: >>> Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月25日 週日 上午4:07寫道: >>>> >>>> On 23.04.2021 05:42, AceLan Kao wrote: >>>>> Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月22日 週四 下午3:09寫道: >>>>>> >>>>>> On 22.04.2021 08:30, AceLan Kao wrote: >>>>>>> Yes, should add >>>>>>> >>>>>>> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") >>>>>>> and also >>>>>>> Fixes: 9513d2a5dc7f ("igc: Add legacy power management support") >>>>>>> >>>>>> Please don't top-post. Apart from that: >>>>>> If the issue was introduced with driver changes, then adding a workaround >>>>>> in net core may not be the right approach. >>>>> It's hard to say who introduces this issue, we probably could point >>>>> our finger to below commit >>>>> bd869245a3dc net: core: try to runtime-resume detached device in __dev_open >>>>> >>>>> This calling path is not usual, in my case, the NIC is not plugged in >>>>> any Ethernet cable, >>>>> and we are doing networking tests on another NIC on the system. So, >>>>> remove the rtnl lock from igb driver will affect other scenarios. >>>>> >>>>>> >>>>>>> Jakub Kicinski <kuba@kernel.org> 於 2021年4月21日 週三 上午3:27寫道: >>>>>>>> >>>>>>>> On Tue, 20 Apr 2021 10:34:17 +0200 Eric Dumazet wrote: >>>>>>>>> On Tue, Apr 20, 2021 at 9:54 AM AceLan Kao <acelan.kao@canonical.com> wrote: >>>>>>>>>> >>>>>>>>>> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com> >>>>>>>>>> >>>>>>>>>> The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in >>>>>>>>>> __dev_open() it calls pm_runtime_resume() to resume devices, and in >>>>>>>>>> some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock() >>>>>>>>>> again. That leads to a recursive lock. >>>>>>>>>> >>>>>>>>>> It should leave the devices' resume function to decide if they need to >>>>>>>>>> call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling >>>>>>>>>> pm_runtime_resume() and then call rtnl_lock() after it in __dev_open(). >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Acelan >>>>>>>>> >>>>>>>>> When was the bugg added ? >>>>>>>>> Please add a Fixes: tag >>>>>>>> >>>>>>>> For immediate cause probably: >>>>>>>> >>>>>>>> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") >>>>>>>> >>>>>>>>> By doing so, you give more chances for reviewers to understand why the >>>>>>>>> fix is not risky, >>>>>>>>> and help stable teams work. >>>>>>>> >>>>>>>> IMO the driver lacks internal locking. Taking 看rtnl from resume is just >>>>>>>> one example, git history shows many more places that lacked locking and >>>>>>>> got papered over with rtnl here. >>>>>> >>>> >>>> You could alternatively try the following. It should avoid the deadlock, >>>> and when runtime-resuming if __IGB_DOWN is set all we do is marking the >>>> net_device as present (because of PCI D3 -> D0 transition). >>>> I do basically the same in r8169 and it works as intended. >>>> >>>> Disclaimer: I don't have an igb-driven device and therefore can't test >>>> the proposal. >>>> >>>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c >>>> index 038a9fd1a..21436626a 100644 >>>> --- a/drivers/net/ethernet/intel/igb/igb_main.c >>>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c >>>> @@ -9300,6 +9300,14 @@ static int __maybe_unused igb_runtime_suspend(struct device *dev) >>>> >>>> static int __maybe_unused igb_runtime_resume(struct device *dev) >>>> { >>>> + struct net_device *netdev = dev_get_drvdata(dev); >>>> + struct igb_adapter *adapter = netdev_priv(netdev); >>>> + >>>> + if (test_bit(__IGB_DOWN, &adapter->state)) { >>>> + netif_device_attach(netdev); >>>> + return 0; >>>> + } >>>> + >>>> return igb_resume(dev); >>>> } >>>> >>>> -- >>>> 2.31.1 >>>> >>> >>> Hi Heiner, >>> >>> I encountered below error after applied your patch. >>> >> Presumably similar changes are needed also for the runtime_suspend callback. >> If __IGB_DOWN is set, I think just the net_device needs to be detached. >> >> >>> [ 121.489970] u kernel: ------------[ cut here ]------------ >>> [ 121.489979] u kernel: igb 0000:05:00.0: disabling already-disabled device >>> [ 121.490008] u kernel: WARNING: CPU: 7 PID: 258 at >>> drivers/pci/pci.c:2146 pci_disable_device+0x91/0xb0 >>> [ 121.490028] u kernel: Modules linked in: rfcomm cmac algif_hash >>> algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth >>> ecdh_generic ecc joydev input_leds inte >>> l_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp >>> coretemp ath10k_pci ath10k_core kvm_intel ath mac80211 kvm >>> snd_sof_pci_intel_tgl snd_soc_acpi_intel_ma >>> tch snd_sof_intel_hda_common nls_iso8859_1 soundwire_intel >>> soundwire_generic_allocation soundwire_cadence soundwire_bus >>> snd_sof_pci snd_soc_acpi snd_sof snd_soc_core snd >>> _hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi >>> crct10dif_pclmul crc32_pclmul snd_sof_xtensa_dsp ghash_clmulni_intel >>> ledtrig_audio aesni_intel snd_hda_intel >>> libarc4 crypto_simd snd_intel_dspcfg snd_intel_sdw_acpi cryptd >>> snd_hda_codec cfg80211 mei_hdcp snd_hwdep snd_hda_core >>> intel_wmi_thunderbolt snd_pcm wmi_bmof snd_seq inte >>> l_cstate efi_pstore snd_timer snd_seq_device ee1004 mei_me snd mei >>> ucsi_acpi soundcore typec_ucsi typec wmi mac_hid acpi_pad acpi_tad >>> sch_fq_codel >>> [ 121.490314] u kernel: parport_pc ppdev lp parport ip_tables >>> x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 >>> async_raid6_recov async_memcpy async_ >>> pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath >>> linear hid_generic usbhid hid i915 drm_kms_helper syscopyarea >>> sysfillrect sysimgblt fb_sys_fops cec rc >>> _core igb drm nvme e1000e nvme_core i2c_i801 dca i2c_smbus >>> i2c_algo_bit intel_lpss_pci intel_lpss ahci idma64 video xhci_pci >>> libahci virt_dma xhci_pci_renesas pinctrl_ti >>> gerlake >>> [ 121.490508] u kernel: CPU: 7 PID: 258 Comm: kworker/7:2 Tainted: G >>> U 5.12.0-rc7+ #79 >>> [ 121.490518] u kernel: Hardware name: Dell Inc. OptiPlex 7090/, BIOS >>> 0.12.80 02/23/2021 >>> [ 121.490525] u kernel: Workqueue: pm pm_runtime_work >>> [ 121.490540] u kernel: RIP: 0010:pci_disable_device+0x91/0xb0 >>> [ 121.490550] u kernel: Code: 4d 85 e4 75 07 4c 8b a3 c8 00 00 00 48 >>> 8d bb c8 00 00 00 e8 61 8d 17 00 4c 89 e2 48 c7 c7 60 5a e0 a5 48 89 >>> c6 e8 9b a3 59 00 <0f> 0b eb 8 >>> d 48 89 df e8 e3 fe ff ff 80 a3 49 0a 00 00 df 5b 41 5c >>> [ 121.490558] u kernel: RSP: 0018:ffffb76b4169fc90 EFLAGS: 00010286 >>> [ 121.490569] u kernel: RAX: 0000000000000000 RBX: ffff9e2581ee6000 >>> RCX: 0000000000000027 >>> [ 121.490576] u kernel: RDX: 0000000000000027 RSI: ffffffffa493bca0 >>> RDI: ffff9e27073e89b8 >>> [ 121.490582] u kernel: RBP: ffffb76b4169fca0 R08: ffff9e27073e89b0 >>> R09: 0000000000000000 >>> [ 121.490588] u kernel: R10: 0000000000000000 R11: 0000000000000001 >>> R12: ffff9e2581af7c80 >>> [ 121.490594] u kernel: R13: ffff9e2581ee6000 R14: ffff9e25a0914000 >>> R15: ffff9e25a0915280 >>> [ 121.490600] u kernel: FS: 0000000000000000(0000) >>> GS:ffff9e2707200000(0000) knlGS:0000000000000000 >>> [ 121.490608] u kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [ 121.490614] u kernel: CR2: 00007ff86ec8d024 CR3: 0000000189c28002 >>> CR4: 0000000000770ee0 >>> [ 121.490621] u kernel: PKRU: 55555554 >>> [ 121.490626] u kernel: Call Trace: >>> [ 121.490638] u kernel: __igb_shutdown+0xf2/0x1c0 [igb] >>> [ 121.490676] u kernel: igb_runtime_suspend+0x1c/0x20 [igb] >>> [ 121.490703] u kernel: pci_pm_runtime_suspend+0x63/0x180 >>> [ 121.490715] u kernel: ? pci_pm_runtime_resume+0x90/0x90 >>> [ 121.490727] u kernel: __rpm_callback+0xc7/0x140 >>> [ 121.490740] u kernel: rpm_callback+0x57/0x80 >>> [ 121.490750] u kernel: ? pci_pm_runtime_resume+0x90/0x90 >>> [ 121.490759] u kernel: rpm_suspend+0x119/0x640 >>> [ 121.490774] u kernel: pm_runtime_work+0x64/0xc0 >>> [ 121.490784] u kernel: process_one_work+0x2af/0x5d0 >>> [ 121.490803] u kernel: worker_thread+0x4d/0x3e0 >>> [ 121.490814] u kernel: ? process_one_work+0x5d0/0x5d0 >>> [ 121.490825] u kernel: kthread+0x12a/0x160 >>> [ 121.490834] u kernel: ? kthread_park+0x90/0x90 >>> [ 121.490844] u kernel: ret_from_fork+0x1f/0x30 >>> [ 121.490867] u kernel: irq event stamp: 0[ 121.490871] u kernel: >>> hardirqs last enabled at (0): [<0000000000000000>] 0x0 >>> [ 121.490916] u kernel: hardirqs last disabled at (0): >>> [<ffffffffa489ea44>] copy_process+0x714/0x1cc0 >>> [ 121.490929] u kernel: softirqs last enabled at (0): >>> [<ffffffffa489ea44>] copy_process+0x714/0x1cc0 >>> [ 121.490938] u kernel: softirqs last disabled at (0): [<0000000000000000>] 0x0 >>> [ 121.490949] u kernel: ---[ end trace a9c7ffc27c226979 ]--- >>> >>
On Tue, 20 Apr 2021 at 09:55, AceLan Kao <acelan.kao@canonical.com> wrote: > > From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com> > > The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in > __dev_open() it calls pm_runtime_resume() to resume devices, and in > some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock() > again. That leads to a recursive lock. > > It should leave the devices' resume function to decide if they need to > call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling > pm_runtime_resume() and then call rtnl_lock() after it in __dev_open(). > > [ 967.723577] INFO: task ip:6024 blocked for more than 120 seconds. > [ 967.723588] Not tainted 5.12.0-rc3+ #1 > [ 967.723592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 967.723594] task:ip state:D stack: 0 pid: 6024 ppid: 5957 flags:0x00004000 > [ 967.723603] Call Trace: > [ 967.723610] __schedule+0x2de/0x890 > [ 967.723623] schedule+0x4f/0xc0 > [ 967.723629] schedule_preempt_disabled+0xe/0x10 > [ 967.723636] __mutex_lock.isra.0+0x190/0x510 > [ 967.723644] __mutex_lock_slowpath+0x13/0x20 > [ 967.723651] mutex_lock+0x32/0x40 > [ 967.723657] rtnl_lock+0x15/0x20 > [ 967.723665] igb_resume+0xee/0x1d0 [igb] > [ 967.723687] ? pci_pm_default_resume+0x30/0x30 > [ 967.723696] igb_runtime_resume+0xe/0x10 [igb] > [ 967.723713] pci_pm_runtime_resume+0x74/0x90 > [ 967.723718] __rpm_callback+0x53/0x1c0 > [ 967.723725] rpm_callback+0x57/0x80 > [ 967.723730] ? pci_pm_default_resume+0x30/0x30 > [ 967.723735] rpm_resume+0x547/0x760 > [ 967.723740] __pm_runtime_resume+0x52/0x80 > [ 967.723745] __dev_open+0x56/0x160 > [ 967.723753] ? _raw_spin_unlock_bh+0x1e/0x20 > [ 967.723758] __dev_change_flags+0x188/0x1e0 > [ 967.723766] dev_change_flags+0x26/0x60 > [ 967.723773] do_setlink+0x723/0x10b0 > [ 967.723782] ? __nla_validate_parse+0x5b/0xb80 > [ 967.723792] __rtnl_newlink+0x594/0xa00 > [ 967.723800] ? nla_put_ifalias+0x38/0xa0 > [ 967.723807] ? __nla_reserve+0x41/0x50 > [ 967.723813] ? __nla_reserve+0x41/0x50 > [ 967.723818] ? __kmalloc_node_track_caller+0x49b/0x4d0 > [ 967.723824] ? pskb_expand_head+0x75/0x310 > [ 967.723830] ? nla_reserve+0x28/0x30 > [ 967.723835] ? skb_free_head+0x25/0x30 > [ 967.723843] ? security_sock_rcv_skb+0x2f/0x50 > [ 967.723850] ? netlink_deliver_tap+0x3d/0x210 > [ 967.723859] ? sk_filter_trim_cap+0xc1/0x230 > [ 967.723863] ? skb_queue_tail+0x43/0x50 > [ 967.723870] ? sock_def_readable+0x4b/0x80 > [ 967.723876] ? __netlink_sendskb+0x42/0x50 > [ 967.723888] ? security_capable+0x3d/0x60 > [ 967.723894] ? __cond_resched+0x19/0x30 > [ 967.723900] ? kmem_cache_alloc_trace+0x390/0x440 > [ 967.723906] rtnl_newlink+0x49/0x70 > [ 967.723913] rtnetlink_rcv_msg+0x13c/0x370 > [ 967.723920] ? _copy_to_iter+0xa0/0x460 > [ 967.723927] ? rtnl_calcit.isra.0+0x130/0x130 > [ 967.723934] netlink_rcv_skb+0x55/0x100 > [ 967.723939] rtnetlink_rcv+0x15/0x20 > [ 967.723944] netlink_unicast+0x1a8/0x250 > [ 967.723949] netlink_sendmsg+0x233/0x460 > [ 967.723954] sock_sendmsg+0x65/0x70 > [ 967.723958] ____sys_sendmsg+0x218/0x290 > [ 967.723961] ? copy_msghdr_from_user+0x5c/0x90 > [ 967.723966] ? lru_cache_add_inactive_or_unevictable+0x27/0xb0 > [ 967.723974] ___sys_sendmsg+0x81/0xc0 > [ 967.723980] ? __mod_memcg_lruvec_state+0x22/0xe0 > [ 967.723987] ? kmem_cache_free+0x244/0x420 > [ 967.723991] ? dentry_free+0x37/0x70 > [ 967.723996] ? mntput_no_expire+0x4c/0x260 > [ 967.724001] ? __cond_resched+0x19/0x30 > [ 967.724007] ? security_file_free+0x54/0x60 > [ 967.724013] ? call_rcu+0xa4/0x250 > [ 967.724021] __sys_sendmsg+0x62/0xb0 > [ 967.724026] ? exit_to_user_mode_prepare+0x3d/0x1a0 > [ 967.724032] __x64_sys_sendmsg+0x1f/0x30 > [ 967.724037] do_syscall_64+0x38/0x90 > [ 967.724044] entry_SYSCALL_64_after_hwframe+0x44/0xae > > Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> > --- > net/core/dev.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 1f79b9aa9a3f..427cbc80d1e5 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -1537,8 +1537,11 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack) > > if (!netif_device_present(dev)) { > /* may be detached because parent is runtime-suspended */ > - if (dev->dev.parent) > + if (dev->dev.parent) { > + rtnl_unlock(); > pm_runtime_resume(dev->dev.parent); A side topic, maybe a little bit silly question (I don't know that much about net core): Why not increase the parent or current PM runtime usage counter instead? The problem with calling pm_runtime_resume() is that if the parent device was already suspended (so it's usage counter ==0), it might get suspended right after this call. IOW, you do not have any guarantee that the device will be really resumed. Probably it should be pm_runtime_resume_and_get() (and later "put" on close or other events). This however might not solve the lock problem at all. Best regards, Krzysztof
On 29.04.2021 13:58, Krzysztof Kozlowski wrote: > On Tue, 20 Apr 2021 at 09:55, AceLan Kao <acelan.kao@canonical.com> wrote: >> >> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com> >> >> The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in >> __dev_open() it calls pm_runtime_resume() to resume devices, and in >> some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock() >> again. That leads to a recursive lock. >> >> It should leave the devices' resume function to decide if they need to >> call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling >> pm_runtime_resume() and then call rtnl_lock() after it in __dev_open(). >> >> [ 967.723577] INFO: task ip:6024 blocked for more than 120 seconds. >> [ 967.723588] Not tainted 5.12.0-rc3+ #1 >> [ 967.723592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> [ 967.723594] task:ip state:D stack: 0 pid: 6024 ppid: 5957 flags:0x00004000 >> [ 967.723603] Call Trace: >> [ 967.723610] __schedule+0x2de/0x890 >> [ 967.723623] schedule+0x4f/0xc0 >> [ 967.723629] schedule_preempt_disabled+0xe/0x10 >> [ 967.723636] __mutex_lock.isra.0+0x190/0x510 >> [ 967.723644] __mutex_lock_slowpath+0x13/0x20 >> [ 967.723651] mutex_lock+0x32/0x40 >> [ 967.723657] rtnl_lock+0x15/0x20 >> [ 967.723665] igb_resume+0xee/0x1d0 [igb] >> [ 967.723687] ? pci_pm_default_resume+0x30/0x30 >> [ 967.723696] igb_runtime_resume+0xe/0x10 [igb] >> [ 967.723713] pci_pm_runtime_resume+0x74/0x90 >> [ 967.723718] __rpm_callback+0x53/0x1c0 >> [ 967.723725] rpm_callback+0x57/0x80 >> [ 967.723730] ? pci_pm_default_resume+0x30/0x30 >> [ 967.723735] rpm_resume+0x547/0x760 >> [ 967.723740] __pm_runtime_resume+0x52/0x80 >> [ 967.723745] __dev_open+0x56/0x160 >> [ 967.723753] ? _raw_spin_unlock_bh+0x1e/0x20 >> [ 967.723758] __dev_change_flags+0x188/0x1e0 >> [ 967.723766] dev_change_flags+0x26/0x60 >> [ 967.723773] do_setlink+0x723/0x10b0 >> [ 967.723782] ? __nla_validate_parse+0x5b/0xb80 >> [ 967.723792] __rtnl_newlink+0x594/0xa00 >> [ 967.723800] ? nla_put_ifalias+0x38/0xa0 >> [ 967.723807] ? __nla_reserve+0x41/0x50 >> [ 967.723813] ? __nla_reserve+0x41/0x50 >> [ 967.723818] ? __kmalloc_node_track_caller+0x49b/0x4d0 >> [ 967.723824] ? pskb_expand_head+0x75/0x310 >> [ 967.723830] ? nla_reserve+0x28/0x30 >> [ 967.723835] ? skb_free_head+0x25/0x30 >> [ 967.723843] ? security_sock_rcv_skb+0x2f/0x50 >> [ 967.723850] ? netlink_deliver_tap+0x3d/0x210 >> [ 967.723859] ? sk_filter_trim_cap+0xc1/0x230 >> [ 967.723863] ? skb_queue_tail+0x43/0x50 >> [ 967.723870] ? sock_def_readable+0x4b/0x80 >> [ 967.723876] ? __netlink_sendskb+0x42/0x50 >> [ 967.723888] ? security_capable+0x3d/0x60 >> [ 967.723894] ? __cond_resched+0x19/0x30 >> [ 967.723900] ? kmem_cache_alloc_trace+0x390/0x440 >> [ 967.723906] rtnl_newlink+0x49/0x70 >> [ 967.723913] rtnetlink_rcv_msg+0x13c/0x370 >> [ 967.723920] ? _copy_to_iter+0xa0/0x460 >> [ 967.723927] ? rtnl_calcit.isra.0+0x130/0x130 >> [ 967.723934] netlink_rcv_skb+0x55/0x100 >> [ 967.723939] rtnetlink_rcv+0x15/0x20 >> [ 967.723944] netlink_unicast+0x1a8/0x250 >> [ 967.723949] netlink_sendmsg+0x233/0x460 >> [ 967.723954] sock_sendmsg+0x65/0x70 >> [ 967.723958] ____sys_sendmsg+0x218/0x290 >> [ 967.723961] ? copy_msghdr_from_user+0x5c/0x90 >> [ 967.723966] ? lru_cache_add_inactive_or_unevictable+0x27/0xb0 >> [ 967.723974] ___sys_sendmsg+0x81/0xc0 >> [ 967.723980] ? __mod_memcg_lruvec_state+0x22/0xe0 >> [ 967.723987] ? kmem_cache_free+0x244/0x420 >> [ 967.723991] ? dentry_free+0x37/0x70 >> [ 967.723996] ? mntput_no_expire+0x4c/0x260 >> [ 967.724001] ? __cond_resched+0x19/0x30 >> [ 967.724007] ? security_file_free+0x54/0x60 >> [ 967.724013] ? call_rcu+0xa4/0x250 >> [ 967.724021] __sys_sendmsg+0x62/0xb0 >> [ 967.724026] ? exit_to_user_mode_prepare+0x3d/0x1a0 >> [ 967.724032] __x64_sys_sendmsg+0x1f/0x30 >> [ 967.724037] do_syscall_64+0x38/0x90 >> [ 967.724044] entry_SYSCALL_64_after_hwframe+0x44/0xae >> >> Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> >> --- >> net/core/dev.c | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/net/core/dev.c b/net/core/dev.c >> index 1f79b9aa9a3f..427cbc80d1e5 100644 >> --- a/net/core/dev.c >> +++ b/net/core/dev.c >> @@ -1537,8 +1537,11 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack) >> >> if (!netif_device_present(dev)) { >> /* may be detached because parent is runtime-suspended */ >> - if (dev->dev.parent) >> + if (dev->dev.parent) { >> + rtnl_unlock(); >> pm_runtime_resume(dev->dev.parent); > > A side topic, maybe a little bit silly question (I don't know that > much about net core): > Why not increase the parent or current PM runtime usage counter > instead? The problem with calling pm_runtime_resume() is that if the > parent device was already suspended (so it's usage counter ==0), it > might get suspended right after this call. IOW, you do not have any > guarantee that the device will be really resumed. Probably it should > be pm_runtime_resume_and_get() (and later "put" on close or other > events). This however might not solve the lock problem at all. > The point of runtime-resuming the parent here isn't to ensure it's resumed for the complete time the device is open. It's called because netif_device_present() may be (initially) false just because the parent is runtime-suspended. We want the device to be able to runtime-suspend later if e.g. the link is down. > Best regards, > Krzysztof >
It's been a while, do we have any conclusion about this? Do you need me re-send the patch with "Fixes:"? Heiner Kallweit <hkallweit1@gmail.com> 於 2021年4月30日 週五 上午3:36寫道: > > On 29.04.2021 13:58, Krzysztof Kozlowski wrote: > > On Tue, 20 Apr 2021 at 09:55, AceLan Kao <acelan.kao@canonical.com> wrote: > >> > >> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com> > >> > >> The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in > >> __dev_open() it calls pm_runtime_resume() to resume devices, and in > >> some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock() > >> again. That leads to a recursive lock. > >> > >> It should leave the devices' resume function to decide if they need to > >> call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling > >> pm_runtime_resume() and then call rtnl_lock() after it in __dev_open(). > >> > >> [ 967.723577] INFO: task ip:6024 blocked for more than 120 seconds. > >> [ 967.723588] Not tainted 5.12.0-rc3+ #1 > >> [ 967.723592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> [ 967.723594] task:ip state:D stack: 0 pid: 6024 ppid: 5957 flags:0x00004000 > >> [ 967.723603] Call Trace: > >> [ 967.723610] __schedule+0x2de/0x890 > >> [ 967.723623] schedule+0x4f/0xc0 > >> [ 967.723629] schedule_preempt_disabled+0xe/0x10 > >> [ 967.723636] __mutex_lock.isra.0+0x190/0x510 > >> [ 967.723644] __mutex_lock_slowpath+0x13/0x20 > >> [ 967.723651] mutex_lock+0x32/0x40 > >> [ 967.723657] rtnl_lock+0x15/0x20 > >> [ 967.723665] igb_resume+0xee/0x1d0 [igb] > >> [ 967.723687] ? pci_pm_default_resume+0x30/0x30 > >> [ 967.723696] igb_runtime_resume+0xe/0x10 [igb] > >> [ 967.723713] pci_pm_runtime_resume+0x74/0x90 > >> [ 967.723718] __rpm_callback+0x53/0x1c0 > >> [ 967.723725] rpm_callback+0x57/0x80 > >> [ 967.723730] ? pci_pm_default_resume+0x30/0x30 > >> [ 967.723735] rpm_resume+0x547/0x760 > >> [ 967.723740] __pm_runtime_resume+0x52/0x80 > >> [ 967.723745] __dev_open+0x56/0x160 > >> [ 967.723753] ? _raw_spin_unlock_bh+0x1e/0x20 > >> [ 967.723758] __dev_change_flags+0x188/0x1e0 > >> [ 967.723766] dev_change_flags+0x26/0x60 > >> [ 967.723773] do_setlink+0x723/0x10b0 > >> [ 967.723782] ? __nla_validate_parse+0x5b/0xb80 > >> [ 967.723792] __rtnl_newlink+0x594/0xa00 > >> [ 967.723800] ? nla_put_ifalias+0x38/0xa0 > >> [ 967.723807] ? __nla_reserve+0x41/0x50 > >> [ 967.723813] ? __nla_reserve+0x41/0x50 > >> [ 967.723818] ? __kmalloc_node_track_caller+0x49b/0x4d0 > >> [ 967.723824] ? pskb_expand_head+0x75/0x310 > >> [ 967.723830] ? nla_reserve+0x28/0x30 > >> [ 967.723835] ? skb_free_head+0x25/0x30 > >> [ 967.723843] ? security_sock_rcv_skb+0x2f/0x50 > >> [ 967.723850] ? netlink_deliver_tap+0x3d/0x210 > >> [ 967.723859] ? sk_filter_trim_cap+0xc1/0x230 > >> [ 967.723863] ? skb_queue_tail+0x43/0x50 > >> [ 967.723870] ? sock_def_readable+0x4b/0x80 > >> [ 967.723876] ? __netlink_sendskb+0x42/0x50 > >> [ 967.723888] ? security_capable+0x3d/0x60 > >> [ 967.723894] ? __cond_resched+0x19/0x30 > >> [ 967.723900] ? kmem_cache_alloc_trace+0x390/0x440 > >> [ 967.723906] rtnl_newlink+0x49/0x70 > >> [ 967.723913] rtnetlink_rcv_msg+0x13c/0x370 > >> [ 967.723920] ? _copy_to_iter+0xa0/0x460 > >> [ 967.723927] ? rtnl_calcit.isra.0+0x130/0x130 > >> [ 967.723934] netlink_rcv_skb+0x55/0x100 > >> [ 967.723939] rtnetlink_rcv+0x15/0x20 > >> [ 967.723944] netlink_unicast+0x1a8/0x250 > >> [ 967.723949] netlink_sendmsg+0x233/0x460 > >> [ 967.723954] sock_sendmsg+0x65/0x70 > >> [ 967.723958] ____sys_sendmsg+0x218/0x290 > >> [ 967.723961] ? copy_msghdr_from_user+0x5c/0x90 > >> [ 967.723966] ? lru_cache_add_inactive_or_unevictable+0x27/0xb0 > >> [ 967.723974] ___sys_sendmsg+0x81/0xc0 > >> [ 967.723980] ? __mod_memcg_lruvec_state+0x22/0xe0 > >> [ 967.723987] ? kmem_cache_free+0x244/0x420 > >> [ 967.723991] ? dentry_free+0x37/0x70 > >> [ 967.723996] ? mntput_no_expire+0x4c/0x260 > >> [ 967.724001] ? __cond_resched+0x19/0x30 > >> [ 967.724007] ? security_file_free+0x54/0x60 > >> [ 967.724013] ? call_rcu+0xa4/0x250 > >> [ 967.724021] __sys_sendmsg+0x62/0xb0 > >> [ 967.724026] ? exit_to_user_mode_prepare+0x3d/0x1a0 > >> [ 967.724032] __x64_sys_sendmsg+0x1f/0x30 > >> [ 967.724037] do_syscall_64+0x38/0x90 > >> [ 967.724044] entry_SYSCALL_64_after_hwframe+0x44/0xae > >> > >> Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> > >> --- > >> net/core/dev.c | 5 ++++- > >> 1 file changed, 4 insertions(+), 1 deletion(-) > >> > >> diff --git a/net/core/dev.c b/net/core/dev.c > >> index 1f79b9aa9a3f..427cbc80d1e5 100644 > >> --- a/net/core/dev.c > >> +++ b/net/core/dev.c > >> @@ -1537,8 +1537,11 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack) > >> > >> if (!netif_device_present(dev)) { > >> /* may be detached because parent is runtime-suspended */ > >> - if (dev->dev.parent) > >> + if (dev->dev.parent) { > >> + rtnl_unlock(); > >> pm_runtime_resume(dev->dev.parent); > > > > A side topic, maybe a little bit silly question (I don't know that > > much about net core): > > Why not increase the parent or current PM runtime usage counter > > instead? The problem with calling pm_runtime_resume() is that if the > > parent device was already suspended (so it's usage counter ==0), it > > might get suspended right after this call. IOW, you do not have any > > guarantee that the device will be really resumed. Probably it should > > be pm_runtime_resume_and_get() (and later "put" on close or other > > events). This however might not solve the lock problem at all. > > > The point of runtime-resuming the parent here isn't to ensure it's > resumed for the complete time the device is open. It's called > because netif_device_present() may be (initially) false just because > the parent is runtime-suspended. > We want the device to be able to runtime-suspend later if e.g. > the link is down. > > > Best regards, > > Krzysztof > > >
diff --git a/net/core/dev.c b/net/core/dev.c index 1f79b9aa9a3f..427cbc80d1e5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1537,8 +1537,11 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack) if (!netif_device_present(dev)) { /* may be detached because parent is runtime-suspended */ - if (dev->dev.parent) + if (dev->dev.parent) { + rtnl_unlock(); pm_runtime_resume(dev->dev.parent); + rtnl_lock(); + } if (!netif_device_present(dev)) return -ENODEV; }