Message ID | 20250410145704.207969-1-usama.anjum@collabora.com |
---|---|
State | New |
Headers | show |
Series | [v2] bus: mhi: host: don't free bhie tables during suspend/hibernation | expand |
On 4/11/2025 12:32 PM, Muhammad Usama Anjum wrote: > On 4/11/25 8:37 AM, Krishna Chaitanya Chundru wrote: >> >> >> On 4/10/2025 8:26 PM, Muhammad Usama Anjum wrote: >>> Fix dma_direct_alloc() failure at resume time during bhie_table >>> allocation. There is a crash report where at resume time, the memory >>> from the dma doesn't get allocated and MHI fails to re-initialize. >>> There may be fragmentation of some kind which fails the allocation >>> call. >>> >>> To fix it, don't free the memory at power down during suspend / >>> hibernation. Instead, use the same allocated memory again after every >>> resume / hibernation. This patch has been tested with resume and >>> hibernation both. >>> >>> The rddm is of constant size for a given hardware. While the fbc_image >>> size depends on the firmware. If the firmware changes, we'll free and >> If firmware image will change between suspend and resume ? > Yes, correct. > why the firmware image size will change between suspend & resume? who will update the firmware image after bootup? It is not expected behaviour. - Krishna chaitanya. >>> allocate new memory for it. >>> >>> Here are the crash logs: >>> >>> [ 3029.338587] mhi mhi0: Requested to power ON >>> [ 3029.338621] mhi mhi0: Power on setup success >>> [ 3029.668654] kworker/u33:8: page allocation failure: order:7, >>> mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0 >>> [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted >>> 6.11.11-valve10-1-neptune-611-gb69e902b4338 >>> #1ed779c892334112fb968aaa3facf9686b5ff0bd7 >>> [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 >>> 08/01/2024 >>> [ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi] >>> [ 3029.668717] Call Trace: >>> [ 3029.668722] <TASK> >>> [ 3029.668728] dump_stack_lvl+0x4e/0x70 >>> [ 3029.668738] warn_alloc+0x164/0x190 >>> [ 3029.668747] ? srso_return_thunk+0x5/0x5f >>> [ 3029.668754] ? __alloc_pages_direct_compact+0xaf/0x360 >>> [ 3029.668761] __alloc_pages_slowpath.constprop.0+0xc75/0xd70 >>> [ 3029.668774] __alloc_pages_noprof+0x321/0x350 >>> [ 3029.668782] __dma_direct_alloc_pages.isra.0+0x14a/0x290 >>> [ 3029.668790] dma_direct_alloc+0x70/0x270 >>> [ 3029.668796] mhi_alloc_bhie_table+0xe8/0x190 [mhi >>> faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >>> [ 3029.668814] mhi_fw_load_handler+0x1bc/0x310 [mhi >>> faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >>> [ 3029.668830] mhi_pm_st_worker+0x5c8/0xaa0 [mhi >>> faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >>> [ 3029.668844] ? srso_return_thunk+0x5/0x5f >>> [ 3029.668853] process_one_work+0x17e/0x330 >>> [ 3029.668861] worker_thread+0x2ce/0x3f0 >>> [ 3029.668868] ? __pfx_worker_thread+0x10/0x10 >>> [ 3029.668873] kthread+0xd2/0x100 >>> [ 3029.668879] ? __pfx_kthread+0x10/0x10 >>> [ 3029.668885] ret_from_fork+0x34/0x50 >>> [ 3029.668892] ? __pfx_kthread+0x10/0x10 >>> [ 3029.668898] ret_from_fork_asm+0x1a/0x30 >>> [ 3029.668910] </TASK> >>> >>> Tested-on: QCNFA765 WLAN.HSP.1.1-03926.13- >>> QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 >>> >>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >>> --- >>> Changes sice v1: >>> - Don't free bhie tables during suspend/hibernation only >>> - Handle fbc_image changed size correctly >>> - Remove fbc_image getting set to NULL in *free_bhie_table() >>> --- >>> drivers/bus/mhi/host/boot.c | 15 +++++++++++---- >>> drivers/bus/mhi/host/init.c | 13 ++++++++++--- >>> drivers/net/wireless/ath/ath11k/mhi.c | 9 +++++---- >>> include/linux/mhi.h | 7 +++++++ >>> 4 files changed, 33 insertions(+), 11 deletions(-) >>> >>> diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c >>> index 9dcc7184817d5..0df26100c8f9c 100644 >>> --- a/drivers/bus/mhi/host/boot.c >>> +++ b/drivers/bus/mhi/host/boot.c >>> @@ -487,10 +487,17 @@ void mhi_fw_load_handler(struct mhi_controller >>> *mhi_cntrl) >>> * device transitioning into MHI READY state >>> */ >>> if (mhi_cntrl->fbc_download) { >>> - ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image, >>> fw_sz); >>> - if (ret) { >>> - release_firmware(firmware); >>> - goto error_fw_load; >>> + if (mhi_cntrl->fbc_image && fw_sz != mhi_cntrl->prev_fw_sz) { >>> + mhi_free_bhie_table(mhi_cntrl, mhi_cntrl->fbc_image); >>> + mhi_cntrl->fbc_image = NULL; >>> + } >>> + if (!mhi_cntrl->fbc_image) { >>> + ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl- >>>> fbc_image, fw_sz); >>> + if (ret) { >>> + release_firmware(firmware); >>> + goto error_fw_load; >>> + } >>> + mhi_cntrl->prev_fw_sz = fw_sz; >>> } >>> /* Load the firmware into BHIE vec table */ >>> diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c >>> index a9b1f8beee7bc..09b946b86ac46 100644 >>> --- a/drivers/bus/mhi/host/init.c >>> +++ b/drivers/bus/mhi/host/init.c >>> @@ -1173,8 +1173,9 @@ int mhi_prepare_for_power_up(struct >>> mhi_controller *mhi_cntrl) >>> /* >>> * Allocate RDDM table for debugging purpose if specified >>> */ >>> - mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image, >>> - mhi_cntrl->rddm_size); >>> + if (!mhi_cntrl->rddm_image) >>> + mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image, >>> + mhi_cntrl->rddm_size); >>> if (mhi_cntrl->rddm_image) { >>> ret = mhi_rddm_prepare(mhi_cntrl, >>> mhi_cntrl->rddm_image); >>> @@ -1212,12 +1213,18 @@ void mhi_unprepare_after_power_down(struct >>> mhi_controller *mhi_cntrl) >>> mhi_cntrl->rddm_image = NULL; >>> } >>> + mhi_partial_unprepare_after_power_down(mhi_cntrl); >>> +} >>> +EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down); >>> + >>> +void mhi_partial_unprepare_after_power_down(struct mhi_controller >>> *mhi_cntrl) >>> +{ >>> mhi_cntrl->bhi = NULL; >>> mhi_cntrl->bhie = NULL; >>> mhi_deinit_dev_ctxt(mhi_cntrl); >>> } >>> -EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down); >>> +EXPORT_SYMBOL_GPL(mhi_partial_unprepare_after_power_down); >>> >> Instead of adding new API you can free memory from the unregister >> controller also. >> >> - Krishna Chaitanya. >>> static void mhi_release_device(struct device *dev) >>> { >>> diff --git a/drivers/net/wireless/ath/ath11k/mhi.c b/drivers/net/ >>> wireless/ath/ath11k/mhi.c >>> index acd76e9392d31..f77cec79b5b80 100644 >>> --- a/drivers/net/wireless/ath/ath11k/mhi.c >>> +++ b/drivers/net/wireless/ath/ath11k/mhi.c >>> @@ -460,12 +460,13 @@ void ath11k_mhi_stop(struct ath11k_pci *ab_pci, >>> bool is_suspend) >>> * workaround, otherwise ath11k_core_resume() will timeout >>> * during resume. >>> */ >>> - if (is_suspend) >>> + if (is_suspend) { >>> mhi_power_down_keep_dev(ab_pci->mhi_ctrl, true); >>> - else >>> + mhi_partial_unprepare_after_power_down(ab_pci->mhi_ctrl); >>> + } else { >>> mhi_power_down(ab_pci->mhi_ctrl, true); >>> - >>> - mhi_unprepare_after_power_down(ab_pci->mhi_ctrl); >>> + mhi_unprepare_after_power_down(ab_pci->mhi_ctrl); >>> + } >>> } >>> int ath11k_mhi_suspend(struct ath11k_pci *ab_pci) >>> diff --git a/include/linux/mhi.h b/include/linux/mhi.h >>> index 059dc94d20bb6..65a47c712b3a0 100644 >>> --- a/include/linux/mhi.h >>> +++ b/include/linux/mhi.h >>> @@ -382,6 +382,7 @@ struct mhi_controller { >>> const char *fw_image; >>> const u8 *fw_data; >>> size_t fw_sz; >>> + size_t prev_fw_sz; >>> const char *edl_image; >>> size_t rddm_size; >>> size_t sbl_size; >>> @@ -662,6 +663,12 @@ void mhi_power_down_keep_dev(struct >>> mhi_controller *mhi_cntrl, bool graceful); >>> */ >>> void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl); >>> +/** >>> + * mhi_partial_unprepare_after_power_down - Free any allocated memory >>> after power down partially >>> + * @mhi_cntrl: MHI controller >>> + */ >>> +void mhi_partial_unprepare_after_power_down(struct mhi_controller >>> *mhi_cntrl); >>> + >>> /** >>> * mhi_pm_suspend - Move MHI into a suspended state >>> * @mhi_cntrl: MHI controller > >
On 4/10/25 10:00 PM, Greg Kroah-Hartman wrote: > On Thu, Apr 10, 2025 at 07:56:54PM +0500, Muhammad Usama Anjum wrote: >> Fix dma_direct_alloc() failure at resume time during bhie_table >> allocation. There is a crash report where at resume time, the memory >> from the dma doesn't get allocated and MHI fails to re-initialize. >> There may be fragmentation of some kind which fails the allocation >> call. >> >> To fix it, don't free the memory at power down during suspend / >> hibernation. Instead, use the same allocated memory again after every >> resume / hibernation. This patch has been tested with resume and >> hibernation both. >> >> The rddm is of constant size for a given hardware. While the fbc_image >> size depends on the firmware. If the firmware changes, we'll free and >> allocate new memory for it. >> >> Here are the crash logs: >> >> [ 3029.338587] mhi mhi0: Requested to power ON >> [ 3029.338621] mhi mhi0: Power on setup success >> [ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0 >> [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7 >> [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024 >> [ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi] >> [ 3029.668717] Call Trace: >> [ 3029.668722] <TASK> >> [ 3029.668728] dump_stack_lvl+0x4e/0x70 >> [ 3029.668738] warn_alloc+0x164/0x190 >> [ 3029.668747] ? srso_return_thunk+0x5/0x5f >> [ 3029.668754] ? __alloc_pages_direct_compact+0xaf/0x360 >> [ 3029.668761] __alloc_pages_slowpath.constprop.0+0xc75/0xd70 >> [ 3029.668774] __alloc_pages_noprof+0x321/0x350 >> [ 3029.668782] __dma_direct_alloc_pages.isra.0+0x14a/0x290 >> [ 3029.668790] dma_direct_alloc+0x70/0x270 >> [ 3029.668796] mhi_alloc_bhie_table+0xe8/0x190 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >> [ 3029.668814] mhi_fw_load_handler+0x1bc/0x310 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >> [ 3029.668830] mhi_pm_st_worker+0x5c8/0xaa0 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >> [ 3029.668844] ? srso_return_thunk+0x5/0x5f >> [ 3029.668853] process_one_work+0x17e/0x330 >> [ 3029.668861] worker_thread+0x2ce/0x3f0 >> [ 3029.668868] ? __pfx_worker_thread+0x10/0x10 >> [ 3029.668873] kthread+0xd2/0x100 >> [ 3029.668879] ? __pfx_kthread+0x10/0x10 >> [ 3029.668885] ret_from_fork+0x34/0x50 >> [ 3029.668892] ? __pfx_kthread+0x10/0x10 >> [ 3029.668898] ret_from_fork_asm+0x1a/0x30 >> [ 3029.668910] </TASK> >> >> Tested-on: QCNFA765 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 >> >> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >> --- >> Changes sice v1: >> - Don't free bhie tables during suspend/hibernation only >> - Handle fbc_image changed size correctly >> - Remove fbc_image getting set to NULL in *free_bhie_table() > > What commit id does this fix? I think, these errors are happening because of the fragmentation. So this patch is doing an improvement. Its hard to call it a fix for something already added. The following patch had added fbc_image allocation: cd457afb1667 bus: mhi: core: Add support for downloading firmware over BHIe The following commit had added rddm allocation: 3215d8e0691b bus: mhi: core: Set BHI/BHIe offsets on power up preparation Even if I want to add a fixes-by tag, it would be difficult to decide which commit to chose. Maybe we divide the patch into 2 in these scenarios or just select the earlier commit in Fixes tag. Please suggest what is best way? > > thanks, > > greg k-h >
Hi Jeff, Thank you for reviewing. On 4/11/25 9:10 PM, Jeff Hugo wrote: > On 4/10/2025 8:56 AM, Muhammad Usama Anjum wrote: >> Fix dma_direct_alloc() failure at resume time during bhie_table >> allocation. There is a crash report where at resume time, the memory >> from the dma doesn't get allocated and MHI fails to re-initialize. >> There may be fragmentation of some kind which fails the allocation >> call. >> >> To fix it, don't free the memory at power down during suspend / >> hibernation. Instead, use the same allocated memory again after every >> resume / hibernation. This patch has been tested with resume and >> hibernation both. >> >> The rddm is of constant size for a given hardware. While the fbc_image >> size depends on the firmware. If the firmware changes, we'll free and >> allocate new memory for it. >> >> Here are the crash logs: >> >> [ 3029.338587] mhi mhi0: Requested to power ON >> [ 3029.338621] mhi mhi0: Power on setup success >> [ 3029.668654] kworker/u33:8: page allocation failure: order:7, >> mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0 >> [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted >> 6.11.11-valve10-1-neptune-611-gb69e902b4338 >> #1ed779c892334112fb968aaa3facf9686b5ff0bd7 >> [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 >> 08/01/2024 >> [ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi] >> [ 3029.668717] Call Trace: >> [ 3029.668722] <TASK> >> [ 3029.668728] dump_stack_lvl+0x4e/0x70 >> [ 3029.668738] warn_alloc+0x164/0x190 >> [ 3029.668747] ? srso_return_thunk+0x5/0x5f >> [ 3029.668754] ? __alloc_pages_direct_compact+0xaf/0x360 >> [ 3029.668761] __alloc_pages_slowpath.constprop.0+0xc75/0xd70 >> [ 3029.668774] __alloc_pages_noprof+0x321/0x350 >> [ 3029.668782] __dma_direct_alloc_pages.isra.0+0x14a/0x290 >> [ 3029.668790] dma_direct_alloc+0x70/0x270 >> [ 3029.668796] mhi_alloc_bhie_table+0xe8/0x190 [mhi >> faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >> [ 3029.668814] mhi_fw_load_handler+0x1bc/0x310 [mhi >> faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >> [ 3029.668830] mhi_pm_st_worker+0x5c8/0xaa0 [mhi >> faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >> [ 3029.668844] ? srso_return_thunk+0x5/0x5f >> [ 3029.668853] process_one_work+0x17e/0x330 >> [ 3029.668861] worker_thread+0x2ce/0x3f0 >> [ 3029.668868] ? __pfx_worker_thread+0x10/0x10 >> [ 3029.668873] kthread+0xd2/0x100 >> [ 3029.668879] ? __pfx_kthread+0x10/0x10 >> [ 3029.668885] ret_from_fork+0x34/0x50 >> [ 3029.668892] ? __pfx_kthread+0x10/0x10 >> [ 3029.668898] ret_from_fork_asm+0x1a/0x30 >> [ 3029.668910] </TASK> >> >> Tested-on: QCNFA765 WLAN.HSP.1.1-03926.13- >> QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 >> >> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >> --- >> Changes sice v1: >> - Don't free bhie tables during suspend/hibernation only >> - Handle fbc_image changed size correctly >> - Remove fbc_image getting set to NULL in *free_bhie_table() >> --- >> drivers/bus/mhi/host/boot.c | 15 +++++++++++---- >> drivers/bus/mhi/host/init.c | 13 ++++++++++--- >> drivers/net/wireless/ath/ath11k/mhi.c | 9 +++++---- >> include/linux/mhi.h | 7 +++++++ >> 4 files changed, 33 insertions(+), 11 deletions(-) >> >> diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c >> index 9dcc7184817d5..0df26100c8f9c 100644 >> --- a/drivers/bus/mhi/host/boot.c >> +++ b/drivers/bus/mhi/host/boot.c >> @@ -487,10 +487,17 @@ void mhi_fw_load_handler(struct mhi_controller >> *mhi_cntrl) >> * device transitioning into MHI READY state >> */ >> if (mhi_cntrl->fbc_download) { >> - ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image, >> fw_sz); >> - if (ret) { >> - release_firmware(firmware); >> - goto error_fw_load; >> + if (mhi_cntrl->fbc_image && fw_sz != mhi_cntrl->prev_fw_sz) { >> + mhi_free_bhie_table(mhi_cntrl, mhi_cntrl->fbc_image); >> + mhi_cntrl->fbc_image = NULL; >> + } >> + if (!mhi_cntrl->fbc_image) { >> + ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl- >> >fbc_image, fw_sz); >> + if (ret) { >> + release_firmware(firmware); >> + goto error_fw_load; >> + } >> + mhi_cntrl->prev_fw_sz = fw_sz; > > This seems confusing. Why do we care about the previous fw size when we > care about the allocated bhie table size? The table size depends on seg_size and alloc_size. The seg_size remains same. While alloc_size would change if firmware size changes. So I'm checking just firmware size here. > Also, if the fw size is > smaller than the allocated table size it looks like we'll do a free/ > alloc, when it seems like we could jsut use the memory we already have. This can be done. I'll do it if we can find out if the firmware can change at resume time or not. > >> } >> /* Load the firmware into BHIE vec table */ >> diff --git a/include/linux/mhi.h b/include/linux/mhi.h >> index 059dc94d20bb6..65a47c712b3a0 100644 >> --- a/include/linux/mhi.h >> +++ b/include/linux/mhi.h >> @@ -382,6 +382,7 @@ struct mhi_controller { >> const char *fw_image; >> const u8 *fw_data; >> size_t fw_sz; >> + size_t prev_fw_sz; > > No documentation? Sorry, I'll add: * @prev_fw_sz: Previous firmware image data size, used when fbc_download is true > >> const char *edl_image; >> size_t rddm_size; >> size_t sbl_size; >> @@ -662,6 +663,12 @@ void mhi_power_down_keep_dev(struct >> mhi_controller *mhi_cntrl, bool graceful); >> */ >> void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl); >> +/** >> + * mhi_partial_unprepare_after_power_down - Free any allocated memory >> after power down partially > > This looks like it exceeds 80 char. > Also what is a "power down partially"? Fixing it: * mhi_partial_unprepare_after_power_down - Free any allocated memory after * power down other than fbc_image * and rddm_image > >
On 4/12/2025 12:02 AM, Muhammad Usama Anjum wrote: > On 4/11/25 1:39 PM, Krishna Chaitanya Chundru wrote: >> >> >> On 4/11/2025 12:32 PM, Muhammad Usama Anjum wrote: >>> On 4/11/25 8:37 AM, Krishna Chaitanya Chundru wrote: >>>> >>>> >>>> On 4/10/2025 8:26 PM, Muhammad Usama Anjum wrote: >>>>> Fix dma_direct_alloc() failure at resume time during bhie_table >>>>> allocation. There is a crash report where at resume time, the memory >>>>> from the dma doesn't get allocated and MHI fails to re-initialize. >>>>> There may be fragmentation of some kind which fails the allocation >>>>> call. >>>>> >>>>> To fix it, don't free the memory at power down during suspend / >>>>> hibernation. Instead, use the same allocated memory again after every >>>>> resume / hibernation. This patch has been tested with resume and >>>>> hibernation both. >>>>> >>>>> The rddm is of constant size for a given hardware. While the fbc_image >>>>> size depends on the firmware. If the firmware changes, we'll free and >>>> If firmware image will change between suspend and resume ? >>> Yes, correct. >>> >> why the firmware image size will change between suspend & resume? >> who will update the firmware image after bootup? >> It is not expected behaviour. > I was trying to research if the firmware can change or not. I've not > found any documentation on it. > > If the firmare is updated in filesystem before suspend/hibernate, would > the new firwmare be loaded the next time kernel resumes as the older > firmware is no where to be found? > > What do you think about this? > I don't think firmware can be updated before suspend/hibernate. I don't see any reason why it can be updated. If you think it can be updated please quote relevant doc. - Krishna Chaitanya. >> >> - Krishna chaitanya. >>>>> allocate new memory for it. >>>>> >>>>> Here are the crash logs: >>>>> >>>>> [ 3029.338587] mhi mhi0: Requested to power ON >>>>> [ 3029.338621] mhi mhi0: Power on setup success >>>>> [ 3029.668654] kworker/u33:8: page allocation failure: order:7, >>>>> mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0 >>>>> [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted >>>>> 6.11.11-valve10-1-neptune-611-gb69e902b4338 >>>>> #1ed779c892334112fb968aaa3facf9686b5ff0bd7 >>>>> [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 >>>>> 08/01/2024 >>>>> [ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi] >>>>> [ 3029.668717] Call Trace: >>>>> [ 3029.668722] <TASK> >>>>> [ 3029.668728] dump_stack_lvl+0x4e/0x70 >>>>> [ 3029.668738] warn_alloc+0x164/0x190 >>>>> [ 3029.668747] ? srso_return_thunk+0x5/0x5f >>>>> [ 3029.668754] ? __alloc_pages_direct_compact+0xaf/0x360 >>>>> [ 3029.668761] __alloc_pages_slowpath.constprop.0+0xc75/0xd70 >>>>> [ 3029.668774] __alloc_pages_noprof+0x321/0x350 >>>>> [ 3029.668782] __dma_direct_alloc_pages.isra.0+0x14a/0x290 >>>>> [ 3029.668790] dma_direct_alloc+0x70/0x270 >>>>> [ 3029.668796] mhi_alloc_bhie_table+0xe8/0x190 [mhi >>>>> faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >>>>> [ 3029.668814] mhi_fw_load_handler+0x1bc/0x310 [mhi >>>>> faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >>>>> [ 3029.668830] mhi_pm_st_worker+0x5c8/0xaa0 [mhi >>>>> faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] >>>>> [ 3029.668844] ? srso_return_thunk+0x5/0x5f >>>>> [ 3029.668853] process_one_work+0x17e/0x330 >>>>> [ 3029.668861] worker_thread+0x2ce/0x3f0 >>>>> [ 3029.668868] ? __pfx_worker_thread+0x10/0x10 >>>>> [ 3029.668873] kthread+0xd2/0x100 >>>>> [ 3029.668879] ? __pfx_kthread+0x10/0x10 >>>>> [ 3029.668885] ret_from_fork+0x34/0x50 >>>>> [ 3029.668892] ? __pfx_kthread+0x10/0x10 >>>>> [ 3029.668898] ret_from_fork_asm+0x1a/0x30 >>>>> [ 3029.668910] </TASK> >>>>> >>>>> Tested-on: QCNFA765 WLAN.HSP.1.1-03926.13- >>>>> QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 >>>>> >>>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >>>>> --- >>>>> Changes sice v1: >>>>> - Don't free bhie tables during suspend/hibernation only >>>>> - Handle fbc_image changed size correctly >>>>> - Remove fbc_image getting set to NULL in *free_bhie_table() >>>>> --- >>>>> drivers/bus/mhi/host/boot.c | 15 +++++++++++---- >>>>> drivers/bus/mhi/host/init.c | 13 ++++++++++--- >>>>> drivers/net/wireless/ath/ath11k/mhi.c | 9 +++++---- >>>>> include/linux/mhi.h | 7 +++++++ >>>>> 4 files changed, 33 insertions(+), 11 deletions(-) >>>>> >>>>> diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c >>>>> index 9dcc7184817d5..0df26100c8f9c 100644 >>>>> --- a/drivers/bus/mhi/host/boot.c >>>>> +++ b/drivers/bus/mhi/host/boot.c >>>>> @@ -487,10 +487,17 @@ void mhi_fw_load_handler(struct mhi_controller >>>>> *mhi_cntrl) >>>>> * device transitioning into MHI READY state >>>>> */ >>>>> if (mhi_cntrl->fbc_download) { >>>>> - ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image, >>>>> fw_sz); >>>>> - if (ret) { >>>>> - release_firmware(firmware); >>>>> - goto error_fw_load; >>>>> + if (mhi_cntrl->fbc_image && fw_sz != mhi_cntrl->prev_fw_sz) { >>>>> + mhi_free_bhie_table(mhi_cntrl, mhi_cntrl->fbc_image); >>>>> + mhi_cntrl->fbc_image = NULL; >>>>> + } >>>>> + if (!mhi_cntrl->fbc_image) { >>>>> + ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl- >>>>>> fbc_image, fw_sz); >>>>> + if (ret) { >>>>> + release_firmware(firmware); >>>>> + goto error_fw_load; >>>>> + } >>>>> + mhi_cntrl->prev_fw_sz = fw_sz; >>>>> } >>>>> /* Load the firmware into BHIE vec table */ >>>>> diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c >>>>> index a9b1f8beee7bc..09b946b86ac46 100644 >>>>> --- a/drivers/bus/mhi/host/init.c >>>>> +++ b/drivers/bus/mhi/host/init.c >>>>> @@ -1173,8 +1173,9 @@ int mhi_prepare_for_power_up(struct >>>>> mhi_controller *mhi_cntrl) >>>>> /* >>>>> * Allocate RDDM table for debugging purpose if specified >>>>> */ >>>>> - mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image, >>>>> - mhi_cntrl->rddm_size); >>>>> + if (!mhi_cntrl->rddm_image) >>>>> + mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image, >>>>> + mhi_cntrl->rddm_size); >>>>> if (mhi_cntrl->rddm_image) { >>>>> ret = mhi_rddm_prepare(mhi_cntrl, >>>>> mhi_cntrl->rddm_image); >>>>> @@ -1212,12 +1213,18 @@ void mhi_unprepare_after_power_down(struct >>>>> mhi_controller *mhi_cntrl) >>>>> mhi_cntrl->rddm_image = NULL; >>>>> } >>>>> + mhi_partial_unprepare_after_power_down(mhi_cntrl); >>>>> +} >>>>> +EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down); >>>>> + >>>>> +void mhi_partial_unprepare_after_power_down(struct mhi_controller >>>>> *mhi_cntrl) >>>>> +{ >>>>> mhi_cntrl->bhi = NULL; >>>>> mhi_cntrl->bhie = NULL; >>>>> mhi_deinit_dev_ctxt(mhi_cntrl); >>>>> } >>>>> -EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down); >>>>> +EXPORT_SYMBOL_GPL(mhi_partial_unprepare_after_power_down); >>>>> >>>> Instead of adding new API you can free memory from the unregister >>>> controller also. >>>> >>>> - Krishna Chaitanya. >>>>> static void mhi_release_device(struct device *dev) >>>>> { >>>>> diff --git a/drivers/net/wireless/ath/ath11k/mhi.c b/drivers/net/ >>>>> wireless/ath/ath11k/mhi.c >>>>> index acd76e9392d31..f77cec79b5b80 100644 >>>>> --- a/drivers/net/wireless/ath/ath11k/mhi.c >>>>> +++ b/drivers/net/wireless/ath/ath11k/mhi.c >>>>> @@ -460,12 +460,13 @@ void ath11k_mhi_stop(struct ath11k_pci *ab_pci, >>>>> bool is_suspend) >>>>> * workaround, otherwise ath11k_core_resume() will timeout >>>>> * during resume. >>>>> */ >>>>> - if (is_suspend) >>>>> + if (is_suspend) { >>>>> mhi_power_down_keep_dev(ab_pci->mhi_ctrl, true); >>>>> - else >>>>> + mhi_partial_unprepare_after_power_down(ab_pci->mhi_ctrl); >>>>> + } else { >>>>> mhi_power_down(ab_pci->mhi_ctrl, true); >>>>> - >>>>> - mhi_unprepare_after_power_down(ab_pci->mhi_ctrl); >>>>> + mhi_unprepare_after_power_down(ab_pci->mhi_ctrl); >>>>> + } >>>>> } >>>>> int ath11k_mhi_suspend(struct ath11k_pci *ab_pci) >>>>> diff --git a/include/linux/mhi.h b/include/linux/mhi.h >>>>> index 059dc94d20bb6..65a47c712b3a0 100644 >>>>> --- a/include/linux/mhi.h >>>>> +++ b/include/linux/mhi.h >>>>> @@ -382,6 +382,7 @@ struct mhi_controller { >>>>> const char *fw_image; >>>>> const u8 *fw_data; >>>>> size_t fw_sz; >>>>> + size_t prev_fw_sz; >>>>> const char *edl_image; >>>>> size_t rddm_size; >>>>> size_t sbl_size; >>>>> @@ -662,6 +663,12 @@ void mhi_power_down_keep_dev(struct >>>>> mhi_controller *mhi_cntrl, bool graceful); >>>>> */ >>>>> void mhi_unprepare_after_power_down(struct mhi_controller >>>>> *mhi_cntrl); >>>>> +/** >>>>> + * mhi_partial_unprepare_after_power_down - Free any allocated memory >>>>> after power down partially >>>>> + * @mhi_cntrl: MHI controller >>>>> + */ >>>>> +void mhi_partial_unprepare_after_power_down(struct mhi_controller >>>>> *mhi_cntrl); >>>>> + >>>>> /** >>>>> * mhi_pm_suspend - Move MHI into a suspended state >>>>> * @mhi_cntrl: MHI controller >>> >>> > >
On 4/14/2025 1:32 AM, Muhammad Usama Anjum wrote: > On 4/12/25 6:22 AM, Krishna Chaitanya Chundru wrote: >> >> On 4/12/2025 12:02 AM, Muhammad Usama Anjum wrote: >>> On 4/11/25 1:39 PM, Krishna Chaitanya Chundru wrote: >>>> >>>> >>>> On 4/11/2025 12:32 PM, Muhammad Usama Anjum wrote: >>>>> On 4/11/25 8:37 AM, Krishna Chaitanya Chundru wrote: >>>>>> >>>>>> >>>>>> On 4/10/2025 8:26 PM, Muhammad Usama Anjum wrote: >>>>>>> Fix dma_direct_alloc() failure at resume time during bhie_table >>>>>>> allocation. There is a crash report where at resume time, the memory >>>>>>> from the dma doesn't get allocated and MHI fails to re-initialize. >>>>>>> There may be fragmentation of some kind which fails the allocation >>>>>>> call. >>>>>>> >>>>>>> To fix it, don't free the memory at power down during suspend / >>>>>>> hibernation. Instead, use the same allocated memory again after every >>>>>>> resume / hibernation. This patch has been tested with resume and >>>>>>> hibernation both. >>>>>>> >>>>>>> The rddm is of constant size for a given hardware. While the >>>>>>> fbc_image >>>>>>> size depends on the firmware. If the firmware changes, we'll free and >>>>>> If firmware image will change between suspend and resume ? >>>>> Yes, correct. >>>>> >>>> why the firmware image size will change between suspend & resume? >>>> who will update the firmware image after bootup? >>>> It is not expected behaviour. >>> I was trying to research if the firmware can change or not. I've not >>> found any documentation on it. >>> >>> If the firmare is updated in filesystem before suspend/hibernate, would >>> the new firwmare be loaded the next time kernel resumes as the older >>> firmware is no where to be found? >>> >>> What do you think about this? >>> >> I don't think firmware can be updated before suspend/hibernate. I don't >> see any reason why it can be updated. If you think it can be updated >> please quote relevant doc. > I've not found any documentation on it. Let's wait for others to review > and it it cannot be updated, I'll remove this part. > Wouldn't this be trivial to test? Boot the device, go modify the firmware on the filesystem, then go through a suspend cycle. -Jeff
On 4/14/25 7:14 PM, Jeff Hugo wrote: > On 4/14/2025 1:32 AM, Muhammad Usama Anjum wrote: >> On 4/12/25 6:22 AM, Krishna Chaitanya Chundru wrote: >>> >>> On 4/12/2025 12:02 AM, Muhammad Usama Anjum wrote: >>>> On 4/11/25 1:39 PM, Krishna Chaitanya Chundru wrote: >>>>> >>>>> >>>>> On 4/11/2025 12:32 PM, Muhammad Usama Anjum wrote: >>>>>> On 4/11/25 8:37 AM, Krishna Chaitanya Chundru wrote: >>>>>>> >>>>>>> >>>>>>> On 4/10/2025 8:26 PM, Muhammad Usama Anjum wrote: >>>>>>>> Fix dma_direct_alloc() failure at resume time during bhie_table >>>>>>>> allocation. There is a crash report where at resume time, the >>>>>>>> memory >>>>>>>> from the dma doesn't get allocated and MHI fails to re-initialize. >>>>>>>> There may be fragmentation of some kind which fails the allocation >>>>>>>> call. >>>>>>>> >>>>>>>> To fix it, don't free the memory at power down during suspend / >>>>>>>> hibernation. Instead, use the same allocated memory again after >>>>>>>> every >>>>>>>> resume / hibernation. This patch has been tested with resume and >>>>>>>> hibernation both. >>>>>>>> >>>>>>>> The rddm is of constant size for a given hardware. While the >>>>>>>> fbc_image >>>>>>>> size depends on the firmware. If the firmware changes, we'll >>>>>>>> free and >>>>>>> If firmware image will change between suspend and resume ? >>>>>> Yes, correct. >>>>>> >>>>> why the firmware image size will change between suspend & resume? >>>>> who will update the firmware image after bootup? >>>>> It is not expected behaviour. >>>> I was trying to research if the firmware can change or not. I've not >>>> found any documentation on it. >>>> >>>> If the firmare is updated in filesystem before suspend/hibernate, would >>>> the new firwmare be loaded the next time kernel resumes as the older >>>> firmware is no where to be found? >>>> >>>> What do you think about this? >>>> >>> I don't think firmware can be updated before suspend/hibernate. I don't >>> see any reason why it can be updated. If you think it can be updated >>> please quote relevant doc. >> I've not found any documentation on it. Let's wait for others to review >> and it it cannot be updated, I'll remove this part. >> > > Wouldn't this be trivial to test? Boot the device, go modify the > firmware on the filesystem, then go through a suspend cycle. I just tested this. I've used an old firmware from last year vs the latest one. Firmware A: old firmware size: 5349376 Firmware B: new firmware size: 5165056 A here has bigger size. 1. I loaded A at boot and then replaced the firmwares in filesystem with B before syspend. At resume time, B was loaded fine by freeing the bigger memory area and allocating the smaller one. 2. I loaded B and then replaced A in its place before suspend. At resume time, memory was freed and larger memory was allocated. But driver wasn't able to initialize correctly: [ 184.051902] ath11k_pci 0000:03:00.0: timeout while waiting for restart complete [ 184.051916] ath11k_pci 0000:03:00.0: failed to resume core: -110 [ 184.051923] ath11k_pci 0000:03:00.0: PM: dpm_run_callback(): pci_pm_resume returns -110 [ 184.051945] ath11k_pci 0000:03:00.0: PM: failed to resume async: error -110 [ 187.251911] ath11k_pci 0000:03:00.0: wmi command 16387 timeout [ 187.251924] ath11k_pci 0000:03:00.0: failed to send WMI_PDEV_SET_PARAM cmd [ 187.251933] ath11k_pci 0000:03:00.0: failed to enable dynamic bw: -11 So should we generalize above that changing firmware at suspend/hibernation time isn't supported. If firmware package is updated, does user restarts every time? Attaching the dirty logs for reference. There are a lot of debug logs in it.
On 4/18/2025 2:10 AM, Muhammad Usama Anjum wrote: > On 4/14/25 7:14 PM, Jeff Hugo wrote: >> On 4/14/2025 1:32 AM, Muhammad Usama Anjum wrote: >>> On 4/12/25 6:22 AM, Krishna Chaitanya Chundru wrote: >>>> >>>> On 4/12/2025 12:02 AM, Muhammad Usama Anjum wrote: >>>>> On 4/11/25 1:39 PM, Krishna Chaitanya Chundru wrote: >>>>>> >>>>>> >>>>>> On 4/11/2025 12:32 PM, Muhammad Usama Anjum wrote: >>>>>>> On 4/11/25 8:37 AM, Krishna Chaitanya Chundru wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 4/10/2025 8:26 PM, Muhammad Usama Anjum wrote: >>>>>>>>> Fix dma_direct_alloc() failure at resume time during bhie_table >>>>>>>>> allocation. There is a crash report where at resume time, the >>>>>>>>> memory >>>>>>>>> from the dma doesn't get allocated and MHI fails to re-initialize. >>>>>>>>> There may be fragmentation of some kind which fails the allocation >>>>>>>>> call. >>>>>>>>> >>>>>>>>> To fix it, don't free the memory at power down during suspend / >>>>>>>>> hibernation. Instead, use the same allocated memory again after >>>>>>>>> every >>>>>>>>> resume / hibernation. This patch has been tested with resume and >>>>>>>>> hibernation both. >>>>>>>>> >>>>>>>>> The rddm is of constant size for a given hardware. While the >>>>>>>>> fbc_image >>>>>>>>> size depends on the firmware. If the firmware changes, we'll >>>>>>>>> free and >>>>>>>> If firmware image will change between suspend and resume ? >>>>>>> Yes, correct. >>>>>>> >>>>>> why the firmware image size will change between suspend & resume? >>>>>> who will update the firmware image after bootup? >>>>>> It is not expected behaviour. >>>>> I was trying to research if the firmware can change or not. I've not >>>>> found any documentation on it. >>>>> >>>>> If the firmare is updated in filesystem before suspend/hibernate, would >>>>> the new firwmare be loaded the next time kernel resumes as the older >>>>> firmware is no where to be found? >>>>> >>>>> What do you think about this? >>>>> >>>> I don't think firmware can be updated before suspend/hibernate. I don't >>>> see any reason why it can be updated. If you think it can be updated >>>> please quote relevant doc. >>> I've not found any documentation on it. Let's wait for others to review >>> and it it cannot be updated, I'll remove this part. >>> >> >> Wouldn't this be trivial to test? Boot the device, go modify the >> firmware on the filesystem, then go through a suspend cycle. > I just tested this. I've used an old firmware from last year vs the > latest one. > > Firmware A: old firmware size: 5349376 > Firmware B: new firmware size: 5165056 > > A here has bigger size. > > 1. I loaded A at boot and then replaced the firmwares in filesystem with > B before syspend. At resume time, B was loaded fine by freeing the > bigger memory area and allocating the smaller one. > > 2. I loaded B and then replaced A in its place before suspend. At resume > time, memory was freed and larger memory was allocated. But driver > wasn't able to initialize correctly: > > [ 184.051902] ath11k_pci 0000:03:00.0: timeout while waiting for > restart complete > [ 184.051916] ath11k_pci 0000:03:00.0: failed to resume core: -110 > [ 184.051923] ath11k_pci 0000:03:00.0: PM: dpm_run_callback(): > pci_pm_resume returns -110 > [ 184.051945] ath11k_pci 0000:03:00.0: PM: failed to resume async: > error -110 > [ 187.251911] ath11k_pci 0000:03:00.0: wmi command 16387 timeout > [ 187.251924] ath11k_pci 0000:03:00.0: failed to send > WMI_PDEV_SET_PARAM cmd > [ 187.251933] ath11k_pci 0000:03:00.0: failed to enable dynamic bw: -11 > > So should we generalize above that changing firmware at > suspend/hibernation time isn't supported. If firmware package is > updated, does user restarts every time? You may want to review how other devices handle this. I can think of these threads as potential reference https://lore.kernel.org/all/CAPM=9twyvq3EWkwUeoTdMMj76u_sRPmUDHWrzbzEZFQ8eL++BQ@mail.gmail.com/ https://lore.kernel.org/all/20250207012531.621369-1-airlied@gmail.com/ -Jeff
On 4/22/2025 1:23 AM, Muhammad Usama Anjum wrote: > On 4/18/25 7:08 PM, Jeff Hugo wrote: >> On 4/18/2025 2:10 AM, Muhammad Usama Anjum wrote: >>> On 4/14/25 7:14 PM, Jeff Hugo wrote: >>>> On 4/14/2025 1:32 AM, Muhammad Usama Anjum wrote: >>>>> On 4/12/25 6:22 AM, Krishna Chaitanya Chundru wrote: >>>>>> >>>>>> On 4/12/2025 12:02 AM, Muhammad Usama Anjum wrote: >>>>>>> On 4/11/25 1:39 PM, Krishna Chaitanya Chundru wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 4/11/2025 12:32 PM, Muhammad Usama Anjum wrote: >>>>>>>>> On 4/11/25 8:37 AM, Krishna Chaitanya Chundru wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 4/10/2025 8:26 PM, Muhammad Usama Anjum wrote: >>>>>>>>>>> Fix dma_direct_alloc() failure at resume time during bhie_table >>>>>>>>>>> allocation. There is a crash report where at resume time, the >>>>>>>>>>> memory >>>>>>>>>>> from the dma doesn't get allocated and MHI fails to re- >>>>>>>>>>> initialize. >>>>>>>>>>> There may be fragmentation of some kind which fails the >>>>>>>>>>> allocation >>>>>>>>>>> call. >>>>>>>>>>> >>>>>>>>>>> To fix it, don't free the memory at power down during suspend / >>>>>>>>>>> hibernation. Instead, use the same allocated memory again after >>>>>>>>>>> every >>>>>>>>>>> resume / hibernation. This patch has been tested with resume and >>>>>>>>>>> hibernation both. >>>>>>>>>>> >>>>>>>>>>> The rddm is of constant size for a given hardware. While the >>>>>>>>>>> fbc_image >>>>>>>>>>> size depends on the firmware. If the firmware changes, we'll >>>>>>>>>>> free and >>>>>>>>>> If firmware image will change between suspend and resume ? >>>>>>>>> Yes, correct. >>>>>>>>> >>>>>>>> why the firmware image size will change between suspend & resume? >>>>>>>> who will update the firmware image after bootup? >>>>>>>> It is not expected behaviour. >>>>>>> I was trying to research if the firmware can change or not. I've not >>>>>>> found any documentation on it. >>>>>>> >>>>>>> If the firmare is updated in filesystem before suspend/hibernate, >>>>>>> would >>>>>>> the new firwmare be loaded the next time kernel resumes as the older >>>>>>> firmware is no where to be found? >>>>>>> >>>>>>> What do you think about this? >>>>>>> >>>>>> I don't think firmware can be updated before suspend/hibernate. I >>>>>> don't >>>>>> see any reason why it can be updated. If you think it can be updated >>>>>> please quote relevant doc. >>>>> I've not found any documentation on it. Let's wait for others to review >>>>> and it it cannot be updated, I'll remove this part. >>>>> >>>> >>>> Wouldn't this be trivial to test? Boot the device, go modify the >>>> firmware on the filesystem, then go through a suspend cycle. >>> I just tested this. I've used an old firmware from last year vs the >>> latest one. >>> >>> Firmware A: old firmware size: 5349376 >>> Firmware B: new firmware size: 5165056 >>> >>> A here has bigger size. >>> >>> 1. I loaded A at boot and then replaced the firmwares in filesystem with >>> B before syspend. At resume time, B was loaded fine by freeing the >>> bigger memory area and allocating the smaller one. >>> >>> 2. I loaded B and then replaced A in its place before suspend. At resume >>> time, memory was freed and larger memory was allocated. But driver >>> wasn't able to initialize correctly: >>> >>> [ 184.051902] ath11k_pci 0000:03:00.0: timeout while waiting for >>> restart complete >>> [ 184.051916] ath11k_pci 0000:03:00.0: failed to resume core: -110 >>> [ 184.051923] ath11k_pci 0000:03:00.0: PM: dpm_run_callback(): >>> pci_pm_resume returns -110 >>> [ 184.051945] ath11k_pci 0000:03:00.0: PM: failed to resume async: >>> error -110 >>> [ 187.251911] ath11k_pci 0000:03:00.0: wmi command 16387 timeout >>> [ 187.251924] ath11k_pci 0000:03:00.0: failed to send >>> WMI_PDEV_SET_PARAM cmd >>> [ 187.251933] ath11k_pci 0000:03:00.0: failed to enable dynamic bw: -11 >>> >>> So should we generalize above that changing firmware at >>> suspend/hibernation time isn't supported. If firmware package is >>> updated, does user restarts every time? >> >> You may want to review how other devices handle this. I can think of >> these threads as potential reference >> >> https://lore.kernel.org/all/ >> CAPM=9twyvq3EWkwUeoTdMMj76u_sRPmUDHWrzbzEZFQ8eL++BQ@mail.gmail.com/ >> https://lore.kernel.org/all/20250207012531.621369-1-airlied@gmail.com/ > They are talking about firmware cache which is not being used in the > wireless drivers. In my kernel config, firwmare cache is enabeld. But > everytime kernel needs to read the firwamre, it reads from the filesystem. > > What can be the way forward for this patch? Assuming my previous > experiment with changed firmwares across suspend/resume failed, I should > remove reuse logic and send again? Perhaps you need to refactor the wireless drivers? I'm not convinced your patch is valid. If FW needs to be reloaded due to suspend/resume, it seems like the proper thing is to load the same FW that was loaded at device boot. Per your testing, loading changed FW can cause a failure. Even if it doesn't fail, will the changed firmware cause a "breakage" from the user perspective by modifying the device behavior? This does not seem to be a problem that is relevant to all MHI devices, so whatever the end solution ends up being, I think that it should not be blanket applied to all of MHI. -Jeff
On Thu, Apr 10, 2025 at 07:56:54PM +0500, Muhammad Usama Anjum wrote: > Fix dma_direct_alloc() failure at resume time during bhie_table > allocation. There is a crash report where at resume time, the memory > from the dma doesn't get allocated and MHI fails to re-initialize. > There may be fragmentation of some kind which fails the allocation > call. > If dma_direct_alloc() fails, then it is a platform limitation/issue. We cannot workaround that in the device drivers. What is the guarantee that other drivers will also continue to work? Will you go ahead and patch all of them which release memory during suspend? Please investigate why the allocation fails. Even this is not a device issue, so we cannot add quirks :/ > To fix it, don't free the memory at power down during suspend / > hibernation. Instead, use the same allocated memory again after every > resume / hibernation. This patch has been tested with resume and > hibernation both. > > The rddm is of constant size for a given hardware. While the fbc_image > size depends on the firmware. If the firmware changes, we'll free and > allocate new memory for it. > > Here are the crash logs: > > [ 3029.338587] mhi mhi0: Requested to power ON > [ 3029.338621] mhi mhi0: Power on setup success > [ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0 > [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7 > [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024 Did you intend to leak this information? If not, please remove it from stacktrace. - Mani
On 4/25/25 12:04 PM, Manivannan Sadhasivam wrote: > On Thu, Apr 10, 2025 at 07:56:54PM +0500, Muhammad Usama Anjum wrote: >> Fix dma_direct_alloc() failure at resume time during bhie_table >> allocation. There is a crash report where at resume time, the memory >> from the dma doesn't get allocated and MHI fails to re-initialize. >> There may be fragmentation of some kind which fails the allocation >> call. >> > > If dma_direct_alloc() fails, then it is a platform limitation/issue. We cannot > workaround that in the device drivers. What is the guarantee that other drivers > will also continue to work? Will you go ahead and patch all of them which > release memory during suspend? > > Please investigate why the allocation fails. Even this is not a device issue, so > we cannot add quirks :/ This isn't a platform specific quirk. We are only hitting it because there is high memory pressure during suspend/resume. This dma allocation failure can happen with memory pressure on any device. The purpose of this patch is just to make driver more robust to memory pressure during resume. I'm not sure about MHI. But other drivers already have such patches as dma_direct_alloc() is susceptible to failures when memory pressure is high. This patch was motivated from ath12k [1] and ath11k [2]. [1] https://lore.kernel.org/all/20240419034034.2842-1-quic_bqiang@quicinc.com/ [2] https://lore.kernel.org/all/20220506141448.10340-1-quic_akolli@quicinc.com/ What do you think can be the way forward for this patch? > >> To fix it, don't free the memory at power down during suspend / >> hibernation. Instead, use the same allocated memory again after every >> resume / hibernation. This patch has been tested with resume and >> hibernation both. >> >> The rddm is of constant size for a given hardware. While the fbc_image >> size depends on the firmware. If the firmware changes, we'll free and >> allocate new memory for it. >> >> Here are the crash logs: >> >> [ 3029.338587] mhi mhi0: Requested to power ON >> [ 3029.338621] mhi mhi0: Power on setup success >> [ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0 >> [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7 >> [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024 > > Did you intend to leak this information? If not, please remove it from > stacktrace. The device isn't private. Its fine. > > - Mani >
On Fri, Apr 25, 2025 at 12:14:39PM +0500, Muhammad Usama Anjum wrote: > On 4/25/25 12:04 PM, Manivannan Sadhasivam wrote: > > On Thu, Apr 10, 2025 at 07:56:54PM +0500, Muhammad Usama Anjum wrote: > >> Fix dma_direct_alloc() failure at resume time during bhie_table > >> allocation. There is a crash report where at resume time, the memory > >> from the dma doesn't get allocated and MHI fails to re-initialize. > >> There may be fragmentation of some kind which fails the allocation > >> call. > >> > > > > If dma_direct_alloc() fails, then it is a platform limitation/issue. We cannot > > workaround that in the device drivers. What is the guarantee that other drivers > > will also continue to work? Will you go ahead and patch all of them which > > release memory during suspend? > > > > Please investigate why the allocation fails. Even this is not a device issue, so > > we cannot add quirks :/ > This isn't a platform specific quirk. We are only hitting it because > there is high memory pressure during suspend/resume. This dma allocation > failure can happen with memory pressure on any device. > Yes. > The purpose of this patch is just to make driver more robust to memory > pressure during resume. > > I'm not sure about MHI. But other drivers already have such patches as > dma_direct_alloc() is susceptible to failures when memory pressure is > high. This patch was motivated from ath12k [1] and ath11k [2]. > Even if we patch the MHI driver, the issue is going to trip some other driver. How does the DMA memory goes low during resume? So some other driver is consuming more than it did during probe()? > [1] > https://lore.kernel.org/all/20240419034034.2842-1-quic_bqiang@quicinc.com/ > [2] > https://lore.kernel.org/all/20220506141448.10340-1-quic_akolli@quicinc.com/ > > What do you think can be the way forward for this patch? > Let's try first to analyze why the memory pressure happens during suspend. As I can see, even if we fix the MHI driver, you are likely to hit this issue somewhere else. - Mani > > [...] > > Did you intend to leak this information? If not, please remove it from > > stacktrace. > The device isn't private. Its fine. > Okay. - Mani
On 4/25/25 12:32 PM, Manivannan Sadhasivam wrote: > On Fri, Apr 25, 2025 at 12:14:39PM +0500, Muhammad Usama Anjum wrote: >> On 4/25/25 12:04 PM, Manivannan Sadhasivam wrote: >>> On Thu, Apr 10, 2025 at 07:56:54PM +0500, Muhammad Usama Anjum wrote: >>>> Fix dma_direct_alloc() failure at resume time during bhie_table >>>> allocation. There is a crash report where at resume time, the memory >>>> from the dma doesn't get allocated and MHI fails to re-initialize. >>>> There may be fragmentation of some kind which fails the allocation >>>> call. >>>> >>> >>> If dma_direct_alloc() fails, then it is a platform limitation/issue. We cannot >>> workaround that in the device drivers. What is the guarantee that other drivers >>> will also continue to work? Will you go ahead and patch all of them which >>> release memory during suspend? >>> >>> Please investigate why the allocation fails. Even this is not a device issue, so >>> we cannot add quirks :/ >> This isn't a platform specific quirk. We are only hitting it because >> there is high memory pressure during suspend/resume. This dma allocation >> failure can happen with memory pressure on any device. >> > > Yes. Thanks for understanding. > >> The purpose of this patch is just to make driver more robust to memory >> pressure during resume. >> >> I'm not sure about MHI. But other drivers already have such patches as >> dma_direct_alloc() is susceptible to failures when memory pressure is >> high. This patch was motivated from ath12k [1] and ath11k [2]. >> > > Even if we patch the MHI driver, the issue is going to trip some other driver. > How does the DMA memory goes low during resume? So some other driver is > consuming more than it did during probe()? Think it like this. The first probe happens just after boot. Most of the RAM was empty. Then let's say user launches applications which not only consume entire RAM but also the Swap. The DMA memory area is the first ~4GB on x86_64 (if I'm not mistaken). Now at resume time when we want to allocate memory from dma, it may not be available entirely or because of fragmentation we cannot allocate that much contiguous memory. In our testing and real world cases, right now only wifi driver is misbehaving. Wifi is also very important. So we are hoping to make wifi driver robust. > >> [1] >> https://lore.kernel.org/all/20240419034034.2842-1-quic_bqiang@quicinc.com/ >> [2] >> https://lore.kernel.org/all/20220506141448.10340-1-quic_akolli@quicinc.com/ >> >> What do you think can be the way forward for this patch? >> > > Let's try first to analyze why the memory pressure happens during suspend. As I > can see, even if we fix the MHI driver, you are likely to hit this issue > somewhere else.> > - Mani > >>> > > [...] > >>> Did you intend to leak this information? If not, please remove it from >>> stacktrace. >> The device isn't private. Its fine. >> > > Okay. > > - Mani >
On Fri, Apr 25, 2025 at 12:42:38PM +0500, Muhammad Usama Anjum wrote: > On 4/25/25 12:32 PM, Manivannan Sadhasivam wrote: > > On Fri, Apr 25, 2025 at 12:14:39PM +0500, Muhammad Usama Anjum wrote: > >> On 4/25/25 12:04 PM, Manivannan Sadhasivam wrote: > >>> On Thu, Apr 10, 2025 at 07:56:54PM +0500, Muhammad Usama Anjum wrote: > >>>> Fix dma_direct_alloc() failure at resume time during bhie_table > >>>> allocation. There is a crash report where at resume time, the memory > >>>> from the dma doesn't get allocated and MHI fails to re-initialize. > >>>> There may be fragmentation of some kind which fails the allocation > >>>> call. > >>>> > >>> > >>> If dma_direct_alloc() fails, then it is a platform limitation/issue. We cannot > >>> workaround that in the device drivers. What is the guarantee that other drivers > >>> will also continue to work? Will you go ahead and patch all of them which > >>> release memory during suspend? > >>> > >>> Please investigate why the allocation fails. Even this is not a device issue, so > >>> we cannot add quirks :/ > >> This isn't a platform specific quirk. We are only hitting it because > >> there is high memory pressure during suspend/resume. This dma allocation > >> failure can happen with memory pressure on any device. > >> > > > > Yes. > Thanks for understanding. > > > > >> The purpose of this patch is just to make driver more robust to memory > >> pressure during resume. > >> > >> I'm not sure about MHI. But other drivers already have such patches as > >> dma_direct_alloc() is susceptible to failures when memory pressure is > >> high. This patch was motivated from ath12k [1] and ath11k [2]. > >> > > > > Even if we patch the MHI driver, the issue is going to trip some other driver. > > How does the DMA memory goes low during resume? So some other driver is > > consuming more than it did during probe()? > Think it like this. The first probe happens just after boot. Most of the > RAM was empty. Then let's say user launches applications which not only > consume entire RAM but also the Swap. The DMA memory area is the first > ~4GB on x86_64 (if I'm not mistaken). Now at resume time when we want to > allocate memory from dma, it may not be available entirely or because of > fragmentation we cannot allocate that much contiguous memory. > Looks like you have a workload that consumes the limited DMA coherent memory. Most likely the GPU applications I think. > In our testing and real world cases, right now only wifi driver is > misbehaving. Wifi is also very important. So we are hoping to make wifi > driver robust. > Sounds fair. If you want to move forward, please modify the exisiting mhi_power_down_keep_dev() to include this partial unprepare as well: mhi_power_down_unprepare_keep_dev() Since both APIs are anyway going to be used together, I don't see a need to introduce yet another API. - Mani > > > >> [1] > >> https://lore.kernel.org/all/20240419034034.2842-1-quic_bqiang@quicinc.com/ > >> [2] > >> https://lore.kernel.org/all/20220506141448.10340-1-quic_akolli@quicinc.com/ > >> > >> What do you think can be the way forward for this patch? > >> > > > > Let's try first to analyze why the memory pressure happens during suspend. As I > > can see, even if we fix the MHI driver, you are likely to hit this issue > > somewhere else.> > > - Mani > > > >>> > > > > [...] > > > >>> Did you intend to leak this information? If not, please remove it from > >>> stacktrace. > >> The device isn't private. Its fine. > >> > > > > Okay. > > > > - Mani > > > > > -- > Regards, > Usama
On 4/25/25 1:59 PM, Manivannan Sadhasivam wrote: > On Fri, Apr 25, 2025 at 12:42:38PM +0500, Muhammad Usama Anjum wrote: >> On 4/25/25 12:32 PM, Manivannan Sadhasivam wrote: >>> On Fri, Apr 25, 2025 at 12:14:39PM +0500, Muhammad Usama Anjum wrote: >>>> On 4/25/25 12:04 PM, Manivannan Sadhasivam wrote: >>>>> On Thu, Apr 10, 2025 at 07:56:54PM +0500, Muhammad Usama Anjum wrote: >>>>>> Fix dma_direct_alloc() failure at resume time during bhie_table >>>>>> allocation. There is a crash report where at resume time, the memory >>>>>> from the dma doesn't get allocated and MHI fails to re-initialize. >>>>>> There may be fragmentation of some kind which fails the allocation >>>>>> call. >>>>>> >>>>> >>>>> If dma_direct_alloc() fails, then it is a platform limitation/issue. We cannot >>>>> workaround that in the device drivers. What is the guarantee that other drivers >>>>> will also continue to work? Will you go ahead and patch all of them which >>>>> release memory during suspend? >>>>> >>>>> Please investigate why the allocation fails. Even this is not a device issue, so >>>>> we cannot add quirks :/ >>>> This isn't a platform specific quirk. We are only hitting it because >>>> there is high memory pressure during suspend/resume. This dma allocation >>>> failure can happen with memory pressure on any device. >>>> >>> >>> Yes. >> Thanks for understanding. >> >>> >>>> The purpose of this patch is just to make driver more robust to memory >>>> pressure during resume. >>>> >>>> I'm not sure about MHI. But other drivers already have such patches as >>>> dma_direct_alloc() is susceptible to failures when memory pressure is >>>> high. This patch was motivated from ath12k [1] and ath11k [2]. >>>> >>> >>> Even if we patch the MHI driver, the issue is going to trip some other driver. >>> How does the DMA memory goes low during resume? So some other driver is >>> consuming more than it did during probe()? >> Think it like this. The first probe happens just after boot. Most of the >> RAM was empty. Then let's say user launches applications which not only >> consume entire RAM but also the Swap. The DMA memory area is the first >> ~4GB on x86_64 (if I'm not mistaken). Now at resume time when we want to >> allocate memory from dma, it may not be available entirely or because of >> fragmentation we cannot allocate that much contiguous memory. >> > > Looks like you have a workload that consumes the limited DMA coherent memory. > Most likely the GPU applications I think. > >> In our testing and real world cases, right now only wifi driver is >> misbehaving. Wifi is also very important. So we are hoping to make wifi >> driver robust. >> > > Sounds fair. If you want to move forward, please modify the exisiting > mhi_power_down_keep_dev() to include this partial unprepare as well: > > mhi_power_down_unprepare_keep_dev() > > Since both APIs are anyway going to be used together, I don't see a need to > introduce yet another API. I've looked at usages of mhi_power_down_keep_dev(). Its getting used by ath12k and ath11k both. We would have to look at ath12k as well before we can change mhi_power_down_keep_dev(). Unfortunately, I don't have device using ath12k at hand. Should we keep this new API or what should we do? > > - Mani > >>> >>>> [1] >>>> https://lore.kernel.org/all/20240419034034.2842-1-quic_bqiang@quicinc.com/ >>>> [2] >>>> https://lore.kernel.org/all/20220506141448.10340-1-quic_akolli@quicinc.com/ >>>> >>>> What do you think can be the way forward for this patch? >>>> >>> >>> Let's try first to analyze why the memory pressure happens during suspend. As I >>> can see, even if we fix the MHI driver, you are likely to hit this issue >>> somewhere else.> >>> - Mani >>> >>>>> >>> >>> [...] >>> >>>>> Did you intend to leak this information? If not, please remove it from >>>>> stacktrace. >>>> The device isn't private. Its fine. >>>> >>> >>> Okay. >>> >>> - Mani >>> >> >> >> -- >> Regards, >> Usama >
On Fri, Apr 25, 2025 at 04:41:43PM +0500, Muhammad Usama Anjum wrote: > On 4/25/25 1:59 PM, Manivannan Sadhasivam wrote: > > On Fri, Apr 25, 2025 at 12:42:38PM +0500, Muhammad Usama Anjum wrote: > >> On 4/25/25 12:32 PM, Manivannan Sadhasivam wrote: > >>> On Fri, Apr 25, 2025 at 12:14:39PM +0500, Muhammad Usama Anjum wrote: > >>>> On 4/25/25 12:04 PM, Manivannan Sadhasivam wrote: > >>>>> On Thu, Apr 10, 2025 at 07:56:54PM +0500, Muhammad Usama Anjum wrote: > >>>>>> Fix dma_direct_alloc() failure at resume time during bhie_table > >>>>>> allocation. There is a crash report where at resume time, the memory > >>>>>> from the dma doesn't get allocated and MHI fails to re-initialize. > >>>>>> There may be fragmentation of some kind which fails the allocation > >>>>>> call. > >>>>>> > >>>>> > >>>>> If dma_direct_alloc() fails, then it is a platform limitation/issue. We cannot > >>>>> workaround that in the device drivers. What is the guarantee that other drivers > >>>>> will also continue to work? Will you go ahead and patch all of them which > >>>>> release memory during suspend? > >>>>> > >>>>> Please investigate why the allocation fails. Even this is not a device issue, so > >>>>> we cannot add quirks :/ > >>>> This isn't a platform specific quirk. We are only hitting it because > >>>> there is high memory pressure during suspend/resume. This dma allocation > >>>> failure can happen with memory pressure on any device. > >>>> > >>> > >>> Yes. > >> Thanks for understanding. > >> > >>> > >>>> The purpose of this patch is just to make driver more robust to memory > >>>> pressure during resume. > >>>> > >>>> I'm not sure about MHI. But other drivers already have such patches as > >>>> dma_direct_alloc() is susceptible to failures when memory pressure is > >>>> high. This patch was motivated from ath12k [1] and ath11k [2]. > >>>> > >>> > >>> Even if we patch the MHI driver, the issue is going to trip some other driver. > >>> How does the DMA memory goes low during resume? So some other driver is > >>> consuming more than it did during probe()? > >> Think it like this. The first probe happens just after boot. Most of the > >> RAM was empty. Then let's say user launches applications which not only > >> consume entire RAM but also the Swap. The DMA memory area is the first > >> ~4GB on x86_64 (if I'm not mistaken). Now at resume time when we want to > >> allocate memory from dma, it may not be available entirely or because of > >> fragmentation we cannot allocate that much contiguous memory. > >> > > > > Looks like you have a workload that consumes the limited DMA coherent memory. > > Most likely the GPU applications I think. > > > >> In our testing and real world cases, right now only wifi driver is > >> misbehaving. Wifi is also very important. So we are hoping to make wifi > >> driver robust. > >> > > > > Sounds fair. If you want to move forward, please modify the exisiting > > mhi_power_down_keep_dev() to include this partial unprepare as well: > > > > mhi_power_down_unprepare_keep_dev() > > > > Since both APIs are anyway going to be used together, I don't see a need to > > introduce yet another API. > I've looked at usages of mhi_power_down_keep_dev(). Its getting used by > ath12k and ath11k both. We would have to look at ath12k as well before > we can change mhi_power_down_keep_dev(). Unfortunately, I don't have > device using ath12k at hand. > ath12k conversion looks trivial. So please go ahead with this new API conversion for that driver as well. - Mani
diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c index 9dcc7184817d5..0df26100c8f9c 100644 --- a/drivers/bus/mhi/host/boot.c +++ b/drivers/bus/mhi/host/boot.c @@ -487,10 +487,17 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl) * device transitioning into MHI READY state */ if (mhi_cntrl->fbc_download) { - ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image, fw_sz); - if (ret) { - release_firmware(firmware); - goto error_fw_load; + if (mhi_cntrl->fbc_image && fw_sz != mhi_cntrl->prev_fw_sz) { + mhi_free_bhie_table(mhi_cntrl, mhi_cntrl->fbc_image); + mhi_cntrl->fbc_image = NULL; + } + if (!mhi_cntrl->fbc_image) { + ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image, fw_sz); + if (ret) { + release_firmware(firmware); + goto error_fw_load; + } + mhi_cntrl->prev_fw_sz = fw_sz; } /* Load the firmware into BHIE vec table */ diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c index a9b1f8beee7bc..09b946b86ac46 100644 --- a/drivers/bus/mhi/host/init.c +++ b/drivers/bus/mhi/host/init.c @@ -1173,8 +1173,9 @@ int mhi_prepare_for_power_up(struct mhi_controller *mhi_cntrl) /* * Allocate RDDM table for debugging purpose if specified */ - mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image, - mhi_cntrl->rddm_size); + if (!mhi_cntrl->rddm_image) + mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image, + mhi_cntrl->rddm_size); if (mhi_cntrl->rddm_image) { ret = mhi_rddm_prepare(mhi_cntrl, mhi_cntrl->rddm_image); @@ -1212,12 +1213,18 @@ void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl) mhi_cntrl->rddm_image = NULL; } + mhi_partial_unprepare_after_power_down(mhi_cntrl); +} +EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down); + +void mhi_partial_unprepare_after_power_down(struct mhi_controller *mhi_cntrl) +{ mhi_cntrl->bhi = NULL; mhi_cntrl->bhie = NULL; mhi_deinit_dev_ctxt(mhi_cntrl); } -EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down); +EXPORT_SYMBOL_GPL(mhi_partial_unprepare_after_power_down); static void mhi_release_device(struct device *dev) { diff --git a/drivers/net/wireless/ath/ath11k/mhi.c b/drivers/net/wireless/ath/ath11k/mhi.c index acd76e9392d31..f77cec79b5b80 100644 --- a/drivers/net/wireless/ath/ath11k/mhi.c +++ b/drivers/net/wireless/ath/ath11k/mhi.c @@ -460,12 +460,13 @@ void ath11k_mhi_stop(struct ath11k_pci *ab_pci, bool is_suspend) * workaround, otherwise ath11k_core_resume() will timeout * during resume. */ - if (is_suspend) + if (is_suspend) { mhi_power_down_keep_dev(ab_pci->mhi_ctrl, true); - else + mhi_partial_unprepare_after_power_down(ab_pci->mhi_ctrl); + } else { mhi_power_down(ab_pci->mhi_ctrl, true); - - mhi_unprepare_after_power_down(ab_pci->mhi_ctrl); + mhi_unprepare_after_power_down(ab_pci->mhi_ctrl); + } } int ath11k_mhi_suspend(struct ath11k_pci *ab_pci) diff --git a/include/linux/mhi.h b/include/linux/mhi.h index 059dc94d20bb6..65a47c712b3a0 100644 --- a/include/linux/mhi.h +++ b/include/linux/mhi.h @@ -382,6 +382,7 @@ struct mhi_controller { const char *fw_image; const u8 *fw_data; size_t fw_sz; + size_t prev_fw_sz; const char *edl_image; size_t rddm_size; size_t sbl_size; @@ -662,6 +663,12 @@ void mhi_power_down_keep_dev(struct mhi_controller *mhi_cntrl, bool graceful); */ void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl); +/** + * mhi_partial_unprepare_after_power_down - Free any allocated memory after power down partially + * @mhi_cntrl: MHI controller + */ +void mhi_partial_unprepare_after_power_down(struct mhi_controller *mhi_cntrl); + /** * mhi_pm_suspend - Move MHI into a suspended state * @mhi_cntrl: MHI controller
Fix dma_direct_alloc() failure at resume time during bhie_table allocation. There is a crash report where at resume time, the memory from the dma doesn't get allocated and MHI fails to re-initialize. There may be fragmentation of some kind which fails the allocation call. To fix it, don't free the memory at power down during suspend / hibernation. Instead, use the same allocated memory again after every resume / hibernation. This patch has been tested with resume and hibernation both. The rddm is of constant size for a given hardware. While the fbc_image size depends on the firmware. If the firmware changes, we'll free and allocate new memory for it. Here are the crash logs: [ 3029.338587] mhi mhi0: Requested to power ON [ 3029.338621] mhi mhi0: Power on setup success [ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0 [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7 [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024 [ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi] [ 3029.668717] Call Trace: [ 3029.668722] <TASK> [ 3029.668728] dump_stack_lvl+0x4e/0x70 [ 3029.668738] warn_alloc+0x164/0x190 [ 3029.668747] ? srso_return_thunk+0x5/0x5f [ 3029.668754] ? __alloc_pages_direct_compact+0xaf/0x360 [ 3029.668761] __alloc_pages_slowpath.constprop.0+0xc75/0xd70 [ 3029.668774] __alloc_pages_noprof+0x321/0x350 [ 3029.668782] __dma_direct_alloc_pages.isra.0+0x14a/0x290 [ 3029.668790] dma_direct_alloc+0x70/0x270 [ 3029.668796] mhi_alloc_bhie_table+0xe8/0x190 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] [ 3029.668814] mhi_fw_load_handler+0x1bc/0x310 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] [ 3029.668830] mhi_pm_st_worker+0x5c8/0xaa0 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0] [ 3029.668844] ? srso_return_thunk+0x5/0x5f [ 3029.668853] process_one_work+0x17e/0x330 [ 3029.668861] worker_thread+0x2ce/0x3f0 [ 3029.668868] ? __pfx_worker_thread+0x10/0x10 [ 3029.668873] kthread+0xd2/0x100 [ 3029.668879] ? __pfx_kthread+0x10/0x10 [ 3029.668885] ret_from_fork+0x34/0x50 [ 3029.668892] ? __pfx_kthread+0x10/0x10 [ 3029.668898] ret_from_fork_asm+0x1a/0x30 [ 3029.668910] </TASK> Tested-on: QCNFA765 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6 Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> --- Changes sice v1: - Don't free bhie tables during suspend/hibernation only - Handle fbc_image changed size correctly - Remove fbc_image getting set to NULL in *free_bhie_table() --- drivers/bus/mhi/host/boot.c | 15 +++++++++++---- drivers/bus/mhi/host/init.c | 13 ++++++++++--- drivers/net/wireless/ath/ath11k/mhi.c | 9 +++++---- include/linux/mhi.h | 7 +++++++ 4 files changed, 33 insertions(+), 11 deletions(-)