Message ID | 20230203060128.19625-1-quic_mpubbise@quicinc.com |
---|---|
Series | Enable low power mode when WLAN is not active |
Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes: > Currently, WLAN chip is powered once during driver probe and is kept > ON (powered) always even when WLAN is not active; keeping the chip > powered ON all the time will consume extra power which is not > desirable for a battery operated device. Same is the case with non-WoW > suspend, chip will never be put into low power mode when the system is > suspended resulting in higher battery drain. > > As per the recommendation, sending a PDEV suspend WMI command followed > by a QMI MODE OFF command will cease all WLAN activity and put the device > in low power mode. When WLAN interfaces are brought up, sending a QMI > MISSION MODE command would be sufficient to bring the chip out of low > power. This is a better approach than doing hif_power_down()/hif_power_up() > for every WiFi ON/OFF sequence since the turnaround time for entry/exit of > low power mode is much less. Overhead is just the time taken for sending > QMI MODE OFF & QMI MISSION MODE commands instead of going through the > entire chip boot & QMI init sequence. > > Currently the changes are applicable only for WCN6750. This can be > extended to other targets with a future patch. > > Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1 > Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16 > > Signed-off-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> This is still crashing for me every time with WCN6855 on a NUC x86 device when I rmmod ath11k. Interestingly enough QCA6390 on a Dell XPS 13 9310 does not crash. 
I investigated the crash more. It happens in
ath11k_dp_process_rxdma_err() on this line:

	srng = &ab->hal.srng_list[err_ring->ring_id];

Here are the debug messages before the crash (first and last are my
own messages):

[ 226.766111] rmmod ath11k_pci
[ 227.003678] ath11k_pci 0000:06:00.0: txpower from firmware NaN, reported -2147483648 dBm
[ 227.082283] ath11k_pci 0000:06:00.0: qmi wifi fw del server
[ 227.195760] ath11k_pci 0000:06:00.0: cookie:0x0
[ 227.195843] ath11k_pci 0000:06:00.0: WLAON_WARM_SW_ENTRY 0x15b894d
[ 227.216022] ath11k_pci 0000:06:00.0: WLAON_WARM_SW_ENTRY 0x0
[ 227.216086] ath11k_pci 0000:06:00.0: soc reset cause:0
[ 227.236170] ath11k_pci 0000:06:00.0: MHISTATUS 0xff04
[ 227.270816] ath11k_pci 0000:06:00.0: ext irq:167
[ 227.271231] ath11k_dp_process_rxdma_err() 4187 ab ffff888145520000 err_ring 00000000000001d0

So we get irq 167 which is:

167: 0 0 0 0 0 0 0 0 IR-PCI-MSI-0000:06:00.0 14-edge DP_EXT_IRQ

But in ath11k_pcic_ext_interrupt_handler() ATH11K_FLAG_EXT_IRQ_ENABLED
is still set, so the irq is processed:

	if (!test_bit(ATH11K_FLAG_EXT_IRQ_ENABLED, &ab->dev_flags))
		return IRQ_HANDLED;

It looks like, after applying patch 3, we no longer call
ath11k_hif_irq_disable() whenever ath11k_pci_remove() is called. I
checked that without patch 3 ath11k_hif_irq_disable() is always called.
So this patch is definitely breaking something fundamental, but I ran
out of time to investigate further. I hope this still helps.

Do note I have concerns about this patchset: it changes quite a lot of
the driver logic and I'm worried about what else it breaks. Also we
should definitely test with another AHB device like IPQ8074; this
patchset needs extensive testing.
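The failure mode described in this mail can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_base`, `model_ext_irq_handler()` and `model_remove()` are hypothetical stand-ins and not ath11k code, and the real handler and teardown path are considerably more involved. It shows why the handler's flag check only guards against a late interrupt if the flag is cleared before the ring state is torn down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the real ath11k definitions. */
enum model_flag { MODEL_FLAG_EXT_IRQ_ENABLED = 0 };

struct model_base {
	unsigned long dev_flags; /* bitmask, models ab->dev_flags */
	bool rings_valid;        /* false once ring memory is torn down */
	int unsafe_accesses;     /* handler runs that touched torn-down rings */
};

/* Userspace approximations of the kernel bit helpers (not atomic). */
static bool model_test_bit(int nr, const unsigned long *addr)
{
	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * 8));
	return (*addr >> nr) & 1UL;
}

static void model_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static void model_clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

/* Returns true if the handler went on to process the rings. */
static bool model_ext_irq_handler(struct model_base *ab)
{
	/* Same shape as the check quoted in the mail: bail out if disabled. */
	if (!model_test_bit(MODEL_FLAG_EXT_IRQ_ENABLED, &ab->dev_flags))
		return false;

	/* In the real driver this would be an access to freed ring state. */
	if (!ab->rings_valid)
		ab->unsafe_accesses++;

	return true;
}

/* Models driver removal, with or without the IRQ-disable step. */
static void model_remove(struct model_base *ab, bool disable_irq_first)
{
	if (disable_irq_first)
		model_clear_bit(MODEL_FLAG_EXT_IRQ_ENABLED, &ab->dev_flags);

	ab->rings_valid = false;
}
```

Calling `model_remove(&ab, false)` and then `model_ext_irq_handler(&ab)` increments `unsafe_accesses`, which corresponds to the late DP_EXT_IRQ seen in the log; with `disable_irq_first` set to true the handler returns early.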
Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes: > Currently, WLAN chip is powered once during driver probe and is kept > ON (powered) always even when WLAN is not active; keeping the chip > powered ON all the time will consume extra power which is not > desirable for a battery operated device. Same is the case with non-WoW > suspend, chip will never be put into low power mode when the system is > suspended resulting in higher battery drain. > > As per the recommendation, sending a PDEV suspend WMI command followed > by a QMI MODE OFF command will cease all WLAN activity and put the device > in low power mode. When WLAN interfaces are brought up, sending a QMI > MISSION MODE command would be sufficient to bring the chip out of low > power. This is a better approach than doing hif_power_down()/hif_power_up() > for every WiFi ON/OFF sequence since the turnaround time for entry/exit of > low power mode is much less. Overhead is just the time taken for sending > QMI MODE OFF & QMI MISSION MODE commands instead of going through the > entire chip boot & QMI init sequence. > > Currently the changes are applicable only for WCN6750. This can be > extended to other targets with a future patch. > > Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1 > Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16 > > Signed-off-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> [...] > +static int ath11k_ahb_core_start_ipq8074(struct ath11k_base *ab) > +{ > + /* TODO: Currently initializing the hardware/firmware only > + * during hardware recovery. Support to shutdown/turn-on > + * the hardware during Wi-Fi OFF/ON will be added later. > + */ > + if (!test_bit(ATH11K_FLAG_RECOVERY, &ab->dev_flags)) > + return 0; > + > + return ath11k_core_start_device(ab); > +} > + > +static void ath11k_ahb_core_stop_ipq8074(struct ath11k_base *ab) > +{ > + /* TODO: Currently stopping the hardware/firmware only > + * during driver unload. 
Support to shutdown/turn-on
> +	 * the hardware during Wi-Fi OFF/ON will be added later.
> +	 */
> +	if (!test_bit(ATH11K_FLAG_UNREGISTERING, &ab->dev_flags))
> +		return;
> +
> +	return ath11k_core_stop_device(ab);
> +}

Please clarify what Wi-Fi OFF/ON exactly means in these two comments,
it's not clear to me.

Also I want to mention that I suspect eventually we have to always power
off the firmware during suspend to get hibernation working:

https://bugzilla.kernel.org/show_bug.cgi?id=214649
Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes: > Currently, WLAN chip is powered once during driver probe and is kept > ON (powered) always even when WLAN is not active; keeping the chip > powered ON all the time will consume extra power which is not > desirable for battery operated devices. Same is the case with non-WoW > suspend, chip will not be put into low power mode when the system is > suspended resulting in higher battery drain. > > Send QMI MODE OFF command to firmware during WiFi OFF to put device > into low power mode. > > Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1 > Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16 > > Manikanta Pubbisetty (3): > ath11k: Fix double free issue during SRNG deinit > ath11k: Move hardware initialization logic to start() > ath11k: Enable low power mode when WLAN is not active Please add "wifi:" to all patches. And please add "wifi: ath11k:" to the cover letter.
On 2/24/2023 8:16 PM, Kalle Valo wrote:
> Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:
>
>> Currently, WLAN chip is powered once during driver probe and is kept
>> ON (powered) always even when WLAN is not active; keeping the chip
>> powered ON all the time will consume extra power which is not
>> desirable for a battery operated device. Same is the case with non-WoW
>> suspend, chip will never be put into low power mode when the system is
>> suspended resulting in higher battery drain.
>>
>> As per the recommendation, sending a PDEV suspend WMI command followed
>> by a QMI MODE OFF command will cease all WLAN activity and put the device
>> in low power mode. When WLAN interfaces are brought up, sending a QMI
>> MISSION MODE command would be sufficient to bring the chip out of low
>> power. This is a better approach than doing hif_power_down()/hif_power_up()
>> for every WiFi ON/OFF sequence since the turnaround time for entry/exit of
>> low power mode is much less. Overhead is just the time taken for sending
>> QMI MODE OFF & QMI MISSION MODE commands instead of going through the
>> entire chip boot & QMI init sequence.
>>
>> Currently the changes are applicable only for WCN6750. This can be
>> extended to other targets with a future patch.
>>
>> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
>> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16
>>
>> Signed-off-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com>
>
> This is still crashing for me every time with WCN6855 on a NUC x86
> device when I rmmod ath11k. Interestingly enough QCA6390 on a Dell XPS
> 13 9310 does not crash.
>

Surprisingly enough, I have never been able to reproduce the problem on
my device, which has WCN6855.
> I investigated the crash more, the crash happens in > ath11k_dp_process_rxdma_err() on this line: > > srng = &ab->hal.srng_list[err_ring->ring_id]; > > Here are the debug messages before the crash (first and last are my > own messages): > > [ 226.766111] rmmod ath11k_pci > [ 227.003678] ath11k_pci 0000:06:00.0: txpower from firmware NaN, reported -2147483648 dBm > [ 227.082283] ath11k_pci 0000:06:00.0: qmi wifi fw del server > [ 227.195760] ath11k_pci 0000:06:00.0: cookie:0x0 > [ 227.195843] ath11k_pci 0000:06:00.0: WLAON_WARM_SW_ENTRY 0x15b894d > [ 227.216022] ath11k_pci 0000:06:00.0: WLAON_WARM_SW_ENTRY 0x0 > [ 227.216086] ath11k_pci 0000:06:00.0: soc reset cause:0 > [ 227.236170] ath11k_pci 0000:06:00.0: MHISTATUS 0xff04 > [ 227.270816] ath11k_pci 0000:06:00.0: ext irq:167 > [ 227.271231] ath11k_dp_process_rxdma_err() 4187 ab ffff888145520000 err_ring 00000000000001d0 > > So we get irq 167 which is: > > 167: 0 0 0 0 0 0 0 0 IR-PCI-MSI-0000:06:00.0 14-edge DP_EXT_IRQ > > But in ath11k_pcic_ext_interrupt_handler() ATH11K_FLAG_EXT_IRQ_ENABLED > is still enabled so the irq is processed: > > if (!test_bit(ATH11K_FLAG_EXT_IRQ_ENABLED, &ab->dev_flags)) > return IRQ_HANDLED; > > It looks like that, after applying this patch 3, whenever > ath11k_pci_remove() is called we are not calling > ath11k_hif_irq_disable() anymore. I checked that without patch 3 > ath11k_hif_irq_disable() is always called. So this patch is definitely > breaking something fundamental, but I ran out of time to invetigate > further. I hope this still helps. > This definitely helps, thanks a lot for the direction. I'll check what is missing here. > Do note I have concerns about this patchset, it just changes quite a lot > of the driver logic and I'm worried what else this breaks. Also we > should definitely test with another AHB device like IPQ8074, this > patchset needs extensive testing. > Noted. Thanks, Manikanta
On 2/24/2023 8:20 PM, Kalle Valo wrote: > Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes: > >> Currently, WLAN chip is powered once during driver probe and is kept >> ON (powered) always even when WLAN is not active; keeping the chip >> powered ON all the time will consume extra power which is not >> desirable for a battery operated device. Same is the case with non-WoW >> suspend, chip will never be put into low power mode when the system is >> suspended resulting in higher battery drain. >> >> As per the recommendation, sending a PDEV suspend WMI command followed >> by a QMI MODE OFF command will cease all WLAN activity and put the device >> in low power mode. When WLAN interfaces are brought up, sending a QMI >> MISSION MODE command would be sufficient to bring the chip out of low >> power. This is a better approach than doing hif_power_down()/hif_power_up() >> for every WiFi ON/OFF sequence since the turnaround time for entry/exit of >> low power mode is much less. Overhead is just the time taken for sending >> QMI MODE OFF & QMI MISSION MODE commands instead of going through the >> entire chip boot & QMI init sequence. >> >> Currently the changes are applicable only for WCN6750. This can be >> extended to other targets with a future patch. >> >> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1 >> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16 >> >> Signed-off-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> > > [...] > >> +static int ath11k_ahb_core_start_ipq8074(struct ath11k_base *ab) >> +{ >> + /* TODO: Currently initializing the hardware/firmware only >> + * during hardware recovery. Support to shutdown/turn-on >> + * the hardware during Wi-Fi OFF/ON will be added later. 
>> +	 */
>> +	if (!test_bit(ATH11K_FLAG_RECOVERY, &ab->dev_flags))
>> +		return 0;
>> +
>> +	return ath11k_core_start_device(ab);
>> +}
>> +
>> +static void ath11k_ahb_core_stop_ipq8074(struct ath11k_base *ab)
>> +{
>> +	/* TODO: Currently stopping the hardware/firmware only
>> +	 * during driver unload. Support to shutdown/turn-on
>> +	 * the hardware during Wi-Fi OFF/ON will be added later.
>> +	 */
>> +	if (!test_bit(ATH11K_FLAG_UNREGISTERING, &ab->dev_flags))
>> +		return;
>> +
>> +	return ath11k_core_stop_device(ab);
>> +}
>
> Please clarify what Wi-Fi OFF/ON exactly means on these two comments,
> it's not clear for me.
>

By Wi-Fi OFF/ON I mean bringing the last WLAN interface down/up, which
is effectively the non-WoW suspend/resume.

> Also I want to mention that I suspect eventually we have to always power
> off the firmware during suspend to get hibernation working:
>

This patch will power off the firmware for WCN6750. I'm not sure how to
get that working for other ath11k devices.

Thanks,
Manikanta
On 2/24/2023 8:24 PM, Kalle Valo wrote: > Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes: > >> Currently, WLAN chip is powered once during driver probe and is kept >> ON (powered) always even when WLAN is not active; keeping the chip >> powered ON all the time will consume extra power which is not >> desirable for battery operated devices. Same is the case with non-WoW >> suspend, chip will not be put into low power mode when the system is >> suspended resulting in higher battery drain. >> >> Send QMI MODE OFF command to firmware during WiFi OFF to put device >> into low power mode. >> >> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1 >> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16 >> >> Manikanta Pubbisetty (3): >> ath11k: Fix double free issue during SRNG deinit >> ath11k: Move hardware initialization logic to start() >> ath11k: Enable low power mode when WLAN is not active > > Please add "wifi:" to all patches. > > And please add "wifi: ath11k:" to the cover letter. > Sorry, I have missed adding the prefix. I'll fix this. Thanks, Manikanta
Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes:

>> Also I want to mention that I suspect eventually we have to always power
>> off the firmware during suspend to get hibernation working:
>
> This patch will power off the firmware for WCN6750. I'm not sure how
> to get that working for other ath11k devices.

Oh, I didn't realize that. If you can, try to clarify the commit log on
this part. For example, "for suspend we run command FOO_OFF which means
that the power from the firmware is completely turned off". Or maybe I
just read them too hastily, but still, having clear (and simple) commit
logs helps in understanding the issue.
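To make the "Wi-Fi OFF/ON" definition from this exchange concrete (the last WLAN interface going down, or the first one coming up), here is a minimal userspace sketch. All names (`model_dev`, `model_vif_up()`, `model_vif_down()`, the `MODEL_FW_MODE_*` values) are hypothetical and are not taken from the patch; the sketch only models when the firmware mode would switch, not the actual WMI/QMI commands.

```c
#include <assert.h>

/* Hypothetical firmware modes for illustration; not the driver's enum. */
enum model_fw_mode { MODEL_FW_MODE_OFF = 0, MODEL_FW_MODE_MISSION };

struct model_dev {
	int num_up_vifs;         /* interfaces currently up */
	enum model_fw_mode mode; /* mode the firmware was last asked to enter */
	int mode_switches;       /* how many mode commands were "sent" */
};

/* First interface up: leave low power (models the MISSION MODE command). */
static void model_vif_up(struct model_dev *d)
{
	if (d->num_up_vifs++ == 0) {
		d->mode = MODEL_FW_MODE_MISSION;
		d->mode_switches++;
	}
}

/* Last interface down: enter low power. Per the commit log, the real
 * sequence is a PDEV suspend WMI command followed by QMI MODE OFF.
 */
static void model_vif_down(struct model_dev *d)
{
	assert(d->num_up_vifs > 0);

	if (--d->num_up_vifs == 0) {
		d->mode = MODEL_FW_MODE_OFF;
		d->mode_switches++;
	}
}
```

With two interfaces, only the first `model_vif_up()` and the last `model_vif_down()` change the mode, so the cost of a mode switch is paid once per OFF/ON cycle rather than once per interface.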
On 2/24/2023 8:16 PM, Kalle Valo wrote: > Manikanta Pubbisetty <quic_mpubbise@quicinc.com> writes: > >> Currently, WLAN chip is powered once during driver probe and is kept >> ON (powered) always even when WLAN is not active; keeping the chip >> powered ON all the time will consume extra power which is not >> desirable for a battery operated device. Same is the case with non-WoW >> suspend, chip will never be put into low power mode when the system is >> suspended resulting in higher battery drain. >> >> As per the recommendation, sending a PDEV suspend WMI command followed >> by a QMI MODE OFF command will cease all WLAN activity and put the device >> in low power mode. When WLAN interfaces are brought up, sending a QMI >> MISSION MODE command would be sufficient to bring the chip out of low >> power. This is a better approach than doing hif_power_down()/hif_power_up() >> for every WiFi ON/OFF sequence since the turnaround time for entry/exit of >> low power mode is much less. Overhead is just the time taken for sending >> QMI MODE OFF & QMI MISSION MODE commands instead of going through the >> entire chip boot & QMI init sequence. >> >> Currently the changes are applicable only for WCN6750. This can be >> extended to other targets with a future patch. >> >> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1 >> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16 >> >> Signed-off-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> > > This is still crashing for me every time with WCN6855 on a NUC x86 > device when I rmmod ath11k. Interestingly enough QCA6390 on a Dell XPS > 13 9310 does not crash. 
> > I investigated the crash more, the crash happens in > ath11k_dp_process_rxdma_err() on this line: > > srng = &ab->hal.srng_list[err_ring->ring_id]; > > Here are the debug messages before the crash (first and last are my > own messages): > > [ 226.766111] rmmod ath11k_pci > [ 227.003678] ath11k_pci 0000:06:00.0: txpower from firmware NaN, reported -2147483648 dBm > [ 227.082283] ath11k_pci 0000:06:00.0: qmi wifi fw del server > [ 227.195760] ath11k_pci 0000:06:00.0: cookie:0x0 > [ 227.195843] ath11k_pci 0000:06:00.0: WLAON_WARM_SW_ENTRY 0x15b894d > [ 227.216022] ath11k_pci 0000:06:00.0: WLAON_WARM_SW_ENTRY 0x0 > [ 227.216086] ath11k_pci 0000:06:00.0: soc reset cause:0 > [ 227.236170] ath11k_pci 0000:06:00.0: MHISTATUS 0xff04 > [ 227.270816] ath11k_pci 0000:06:00.0: ext irq:167 > [ 227.271231] ath11k_dp_process_rxdma_err() 4187 ab ffff888145520000 err_ring 00000000000001d0 > > So we get irq 167 which is: > > 167: 0 0 0 0 0 0 0 0 IR-PCI-MSI-0000:06:00.0 14-edge DP_EXT_IRQ > > But in ath11k_pcic_ext_interrupt_handler() ATH11K_FLAG_EXT_IRQ_ENABLED > is still enabled so the irq is processed: > > if (!test_bit(ATH11K_FLAG_EXT_IRQ_ENABLED, &ab->dev_flags)) > return IRQ_HANDLED; > > It looks like that, after applying this patch 3, whenever > ath11k_pci_remove() is called we are not calling > ath11k_hif_irq_disable() anymore. I checked that without patch 3 > ath11k_hif_irq_disable() is always called. So this patch is definitely > breaking something fundamental, but I ran out of time to invetigate > further. I hope this still helps. > Hi Kalle, I was checking the logic around this and have added some debug logs to check if all the de-init APIs are getting called in the rmmod path. 
This is the function call flow with WCN6855 on my machine:

ath11k_pci 0000:06:00.0: Manikanta: ath11k_core_pdev_destroy
ath11k_pci 0000:06:00.0: Manikanta: ath11k_thermal_unregister
ath11k_pci 0000:06:00.0: Manikanta: ath11k_mac_unregister
ath11k_pci 0000:06:00.0: Manikanta: ath11k_pcic_ext_irq_disable
ath11k_pci 0000:06:00.0: Manikanta: __ath11k_pcic_ext_irq_disable
ath11k_pci 0000:06:00.0: Manikanta: ath11k_dp_pdev_free
ath11k_pci 0000:06:00.0: Manikanta: ath11k_dp_rx_pdev_free
ath11k_pci 0000:06:00.0: Manikanta: ath11k_dp_rx_pdev_mon_detach
ath11k_pci 0000:06:00.0: Manikanta: ath11k_pcic_stop
ath11k_pci 0000:06:00.0: Manikanta: ath11k_dp_pdev_reo_cleanup
ath11k_pci 0000:06:00.0: Manikanta: ath11k_dp_free
ath11k_pci 0000:06:00.0: Manikanta: ath11k_pci_power_down
ath11k_pci 0000:06:00.0: Manikanta: ath11k_mac_destroy
ath11k_pci 0000:06:00.0: Manikanta: ath11k_reg_free
ath11k_pci 0000:06:00.0: Manikanta: ath11k_pcic_free_irq
ath11k_pci 0000:06:00.0: Manikanta: ath11k_pci_free_msi
ath11k_pci 0000:06:00.0: Manikanta: ath11k_hal_srng_deinit
ath11k_pci 0000:06:00.0: Manikanta: ath11k_core_free

In stark contrast to your observations, the above call flow shows that
ath11k_hif_irq_disable() is getting called and the IRQs are getting
disabled. ath11k_pcic_ext_irq_disable() is registered for
hif_irq_disable().

I even tried the single MSI vector configuration, suspecting that could
be the difference. Even in the single MSI case, I don't see any crashes
during rmmod.

I'm completely clueless as to why the same code is behaving differently
with the same hardware. How can we take this forward? Could you please
suggest? I'm thinking of keeping these changes specific to WCN6750 for
now.

Thanks,
Manikanta