Message ID | fd26ce4a-a9f3-4ada-8d46-ed36fb2456ca@freebox.fr |
---|---|
Headers | show |
Series | Work around missing MSA_READY indicator for msm8998 devices | expand |
On 28/03/2024 18:36, Marc Gonzalez wrote: > The ath10k driver waits for an "MSA_READY" indicator > to complete initialization. If the indicator is not > received, then the device remains unusable. > > cf. ath10k_qmi_driver_event_work() > > Several msm8998-based devices are affected by this issue. > Oddly, it seems safe to NOT wait for the indicator, and > proceed immediately when QMI_EVENT_SERVER_ARRIVE. > This is v2, so where is the changelog? Best regards, Krzysztof
On 28/03/2024 18:39, Marc Gonzalez wrote: > The ath10k driver waits for an "MSA_READY" indicator > to complete initialization. If the indicator is not > received, then the device remains unusable. > > cf. ath10k_qmi_driver_event_work() > > Several msm8998-based devices are affected by this issue. > Oddly, it seems safe to NOT wait for the indicator, and > proceed immediately when QMI_EVENT_SERVER_ARRIVE. > > Jeff Johnson wrote: > > The feedback I received was "it might be ok to change all ath10k qmi > to skip waiting for msa_ready", and it was pointed out that ath11k > (and ath12k) do not wait for it. > > However with so many deployed devices, "might be ok" isn't a strong > argument for changing the default behavior. > I think you got pretty clear comments: "This sounds more like a firmware feature, not a hardware feature." "This is why having this property in DT does not look right place for this." Best regards, Krzysztof
On 4/2/2024 8:55 AM, Dmitry Baryshkov wrote: > I'd say, we should take a step back and actually verify how this was > handled in the vendor kernel. (error handling and other non-QMI code removed from the following for readability) In ath10k we unconditionally call the following in ath10k_qmi_event_server_arrive(): ret = ath10k_qmi_host_cap_send_sync(qmi); ret = ath10k_qmi_msa_mem_info_send_sync_msg(qmi); ret = ath10k_qmi_setup_msa_permissions(qmi); ret = ath10k_qmi_msa_ready_send_sync_msg(qmi); ret = ath10k_qmi_cap_send_sync_msg(qmi); In vendor icnss2 there is conditional logic in icnss_driver_event_server_arrive(): if (priv->device_id == WCN6750_DEVICE_ID || priv->device_id == WCN6450_DEVICE_ID) { ret = wlfw_host_cap_send_sync(priv); } if (priv->device_id == ADRASTEA_DEVICE_ID) { ret = wlfw_msa_mem_info_send_sync_msg(priv); ret = wlfw_msa_ready_send_sync_msg(priv); } ret = wlfw_cap_send_sync_msg(priv); reference: https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/wlan/platform/-/blob/wlan-platform.lnx.1.0.r1-rel/icnss2/main.c?ref_type=heads#L890 /jeff
On Tue, 2 Apr 2024 at 21:22, Jeff Johnson <quic_jjohnson@quicinc.com> wrote: > > On 4/2/2024 8:55 AM, Dmitry Baryshkov wrote: > > I'd say, we should take a step back and actually verify how this was > > handled in the vendor kernel. > > (error handling and other non-QMI code removed from the following for readability) > > In ath10k we unconditionally call the following in > ath10k_qmi_event_server_arrive(): > ret = ath10k_qmi_host_cap_send_sync(qmi); > ret = ath10k_qmi_msa_mem_info_send_sync_msg(qmi); > ret = ath10k_qmi_setup_msa_permissions(qmi); > ret = ath10k_qmi_msa_ready_send_sync_msg(qmi); > ret = ath10k_qmi_cap_send_sync_msg(qmi); > > In vendor icnss2 there is conditional logic in icnss_driver_event_server_arrive(): Note, wcn3990 is icnss, not icnss2 > if (priv->device_id == WCN6750_DEVICE_ID || > priv->device_id == WCN6450_DEVICE_ID) { > ret = wlfw_host_cap_send_sync(priv); > } > > if (priv->device_id == ADRASTEA_DEVICE_ID) { > ret = wlfw_msa_mem_info_send_sync_msg(priv); > ret = wlfw_msa_ready_send_sync_msg(priv); > } The problem with applying this approach is that here the discriminator is the WiFi device ID. WCN6750, WCN6450 and this ADRASTEA are different WiFi/BT chips. However for msm8998 and e.g. sdm845 there is no easy way to distinguish the WiFi chips. Both platforms use wcn3990 device. > > ret = wlfw_cap_send_sync_msg(priv); > > reference: > https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/wlan/platform/-/blob/wlan-platform.lnx.1.0.r1-rel/icnss2/main.c?ref_type=heads#L890 > > /jeff
On 03/04/2024 16:12, Dmitry Baryshkov wrote:
> From [Jeff's] message it looks like we are expected to get MSA READY even on msm8998.
This is the code we're using:
https://git.codelinaro.org/clo/la/kernel/msm-4.4/-/blob/caf_migration/kernel.lnx.4.4.r38-rel/drivers/net/wireless/ath/ath10k/qmi.c
When ATH10K_SNOC_DRIVER_EVENT_SERVER_ARRIVE,
driver registers an "indicator handler"
ath10k_snoc_qmi_wlfw_clnt_ind()
It handles QMI_WLFW_FW_READY_IND_V01 by posting
ATH10K_SNOC_DRIVER_EVENT_FW_READY_IND
which is handled in the
ath10k_snoc_driver_event_work() work queue.
But QMI_WLFW_MSA_READY_IND_V01 only triggers
a debug log and setting qmi_cfg->msa_ready = true;
$ git grep '\<msa_ready\>'
drivers/net/wireless/ath/ath10k/qmi.c: qmi_cfg->msa_ready = true;
drivers/net/wireless/ath/ath10k/qmi.c: qmi_cfg->msa_ready = false;
drivers/net/wireless/ath/ath10k/qmi.h: * msa_ready: wlan firmware msa memory ready for board data download
drivers/net/wireless/ath/ath10k/qmi.h: bool msa_ready;
So basically, the vendor ath10k driver ignores QMI_WLFW_MSA_READY_IND_V01.
I will test the following patch which aligns the behavior
of mainline driver to that of vendor driver:
diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c
index 38e939f572a9e..0e1ab5aca663b 100644
--- a/drivers/net/wireless/ath/ath10k/qmi.c
+++ b/drivers/net/wireless/ath/ath10k/qmi.c
@@ -1040,6 +1040,7 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work)
switch (event->type) {
case ATH10K_QMI_EVENT_SERVER_ARRIVE:
ath10k_qmi_event_server_arrive(qmi);
+ ath10k_qmi_event_msa_ready(qmi);
break;
case ATH10K_QMI_EVENT_SERVER_EXIT:
ath10k_qmi_event_server_exit(qmi);
@@ -1048,7 +1049,7 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work)
ath10k_qmi_event_fw_ready_ind(qmi);
break;
case ATH10K_QMI_EVENT_MSA_READY_IND:
- ath10k_qmi_event_msa_ready(qmi);
+ printk(KERN_WARNING "IGNORING MSA_READY INDICATOR");
break;
default:
ath10k_warn(ar, "invalid event type: %d", event->type);
Dmitry Baryshkov reported:
Works on sm8150, sdm845, qrb2210
Regards
On Thu, 4 Apr 2024 at 15:30, Marc Gonzalez <mgonzalez@freebox.fr> wrote: > > On 04/04/2024 13:57, Kalle Valo wrote: > > > Dmitry Baryshkov wrote: > > > >> I'd say, we should take a step back and actually verify how this was > >> handled in the vendor kernel. > > > > One comment related to this: usually vendor driver and firmware branches > > go "hand in hand", meaning that a version of driver supports only one > > specific firmware branch. And there can be a lot of branches. So even if > > one branch might have a check for something specific, there are no > > guarantees what the other N+1 branches do :/ > > The consequences and ramifications of the above comment are not clear to me. > > Does this mean: > "It is pointless to analyze a given version (or even several versions) > of the vendor driver downstream, because there are exist a large number > of variations of the code." ? > > And thus, "it is nonsensical to try to "align" the mainline driver to > "the" vendor driver, as there is no single "vendor driver"" ? > > Thus, the following patch (or one functionally-equivalent) is not acceptable? For reference, I tested this patch on sdm845 (db845c), qcm2290 aka qrb2210 (rb1), sm6115 aka qrb4210 (rb2) and sm8150 platforms. I was not able to fully test it on sda660, modem crashes without this patch (there is no MSA_READY indication) and with the patch applied the device hangs, most likely because of the IOMMU or clocking issue. > > diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c > index 38e939f572a9e..fd9ac9717488a 100644 > --- a/drivers/net/wireless/ath/ath10k/qmi.c > +++ b/drivers/net/wireless/ath/ath10k/qmi.c > @@ -1040,6 +1040,8 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work) > switch (event->type) { > case ATH10K_QMI_EVENT_SERVER_ARRIVE: > ath10k_qmi_event_server_arrive(qmi); > + printk(KERN_NOTICE "NOT WAITING FOR MSA_READY INDICATOR"); > + ath10k_qmi_event_msa_ready(qmi); > break; > case ATH10K_QMI_EVENT_SERVER_EXIT: > ath10k_qmi_event_server_exit(qmi); > @@ -1048,7 +1050,7 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work) > ath10k_qmi_event_fw_ready_ind(qmi); > break; > case ATH10K_QMI_EVENT_MSA_READY_IND: > - ath10k_qmi_event_msa_ready(qmi); > + printk(KERN_NOTICE "IGNORING ACTUAL MSA_READY INDICATOR"); > break; > default: > ath10k_warn(ar, "invalid event type: %d", event->type); > > > > Regards >
On 04/04/2024 17:28, Kalle Valo wrote: > Marc Gonzalez wrote: > >> On 04/04/2024 13:57, Kalle Valo wrote: >> >>> Dmitry Baryshkov wrote: >>> >>>> I'd say, we should take a step back and actually verify how this was >>>> handled in the vendor kernel. >>> >>> One comment related to this: usually vendor driver and firmware branches >>> go "hand in hand", meaning that a version of driver supports only one >>> specific firmware branch. And there can be a lot of branches. So even if >>> one branch might have a check for something specific, there are no >>> guarantees what the other N+1 branches do :/ >> >> The consequences and ramifications of the above comment are not clear to me. >> >> Does this mean: >> "It is pointless to analyze a given version (or even several versions) >> of the vendor driver downstream, because there are exist a large number >> of variations of the code." ? > > I was trying to say that because the design philosophy between vendor > drivers and upstream drivers is very different, we can't 100% trust > vendor drivers. It's a very good idea to check what vendor drivers do > but we just need to be careful before making any conclusions. Testing > real hardware (and corresponding firmware) is the most reliable way to > know how different products/firmware work, unfortunately. > >> And thus, "it is nonsensical to try to "align" the mainline driver to >> "the" vendor driver, as there is no single "vendor driver"" ? > > No no, I'm not saying that. I have suffered this "N+1 different firmware > branches behaving slighly differently" problem since ath6kl days so for > me this is business as usual, sadly. I'm sure we can find a solution for > ath10k. Hello Kalle, I can spin a v3, no problem. Do you prefer: Option A = never waiting for the MSA_READY indicator for ANYONE Option B = not waiting for the MSA_READY indicator when qcom,no-msa-ready-indicator is defined Option C = not waiting for the MSA_READY indicator for certain platforms (based on root compatible) Option D = some other solution not yet discussed Dmitry has tested Option A on 5 platforms, where it does not induce regressions. I worked on msm8998, where Option A (or any equivalent) unbreaks WiFi. Please provide guidance :) Regards
Marc Gonzalez <mgonzalez@freebox.fr> writes: > On 04/04/2024 17:28, Kalle Valo wrote: > >> Marc Gonzalez wrote: >> >>> On 04/04/2024 13:57, Kalle Valo wrote: >>> >>>> Dmitry Baryshkov wrote: >>>> >>>>> I'd say, we should take a step back and actually verify how this was >>>>> handled in the vendor kernel. >>>> >>>> One comment related to this: usually vendor driver and firmware branches >>>> go "hand in hand", meaning that a version of driver supports only one >>>> specific firmware branch. And there can be a lot of branches. So even if >>>> one branch might have a check for something specific, there are no >>>> guarantees what the other N+1 branches do :/ >>> >>> The consequences and ramifications of the above comment are not clear to me. >>> >>> Does this mean: >>> "It is pointless to analyze a given version (or even several versions) >>> of the vendor driver downstream, because there are exist a large number >>> of variations of the code." ? >> >> I was trying to say that because the design philosophy between vendor >> drivers and upstream drivers is very different, we can't 100% trust >> vendor drivers. It's a very good idea to check what vendor drivers do >> but we just need to be careful before making any conclusions. Testing >> real hardware (and corresponding firmware) is the most reliable way to >> know how different products/firmware work, unfortunately. >> >>> And thus, "it is nonsensical to try to "align" the mainline driver to >>> "the" vendor driver, as there is no single "vendor driver"" ? >> >> No no, I'm not saying that. I have suffered this "N+1 different firmware >> branches behaving slighly differently" problem since ath6kl days so for >> me this is business as usual, sadly. I'm sure we can find a solution for >> ath10k. > > Hello Kalle, > > I can spin a v3, no problem. > > Do you prefer: > > Option A = never waiting for the MSA_READY indicator for ANYONE > Option B = not waiting for the MSA_READY indicator when > qcom,no-msa-ready-indicator is defined > Option C = not waiting for the MSA_READY indicator for certain > platforms (based on root compatible) > Option D = some other solution not yet discussed After firmware-N.bin solution didn't work (sorry about that!) my prerence is option B. > Dmitry has tested Option A on 5 platforms, where it does not induce regressions. > I worked on msm8998, where Option A (or any equivalent) unbreaks WiFi. What do you mean here? Are you saying that option A works on all devices? I'm guessing I'm misunderstanding something.
On 25/04/2024 11:42, Kalle Valo wrote: > Marc Gonzalez wrote: > >> Do you prefer: >> >> Option A = never waiting for the MSA_READY indicator for ANYONE >> Option B = not waiting for the MSA_READY indicator when >> qcom,no-msa-ready-indicator is defined >> Option C = not waiting for the MSA_READY indicator for certain >> platforms (based on root compatible) >> Option D = some other solution not yet discussed > > After firmware-N.bin solution didn't work (sorry about that!) my > preference is option B. Actually, Option B is this patch series. Could you formally review it? Perhaps one thing I could do slightly differently is to NOT call ath10k_qmi_event_msa_ready() a second time if we DO receive the indicator later. >> Dmitry has tested Option A on 5 platforms, where it does not induce regressions. >> I worked on msm8998, where Option A (or any equivalent) unbreaks WiFi. > > What do you mean here? Are you saying that option A works on all > devices? I'm guessing I'm misunderstanding something. No one serious would ever claim "this works on all devices". Dmitry and I tested the following patch: diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c index 38e939f572a9e..fd9ac9717488a 100644 --- a/drivers/net/wireless/ath/ath10k/qmi.c +++ b/drivers/net/wireless/ath/ath10k/qmi.c @@ -1040,6 +1040,8 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work) switch (event->type) { case ATH10K_QMI_EVENT_SERVER_ARRIVE: ath10k_qmi_event_server_arrive(qmi); + printk(KERN_NOTICE "NOT WAITING FOR MSA_READY INDICATOR"); + ath10k_qmi_event_msa_ready(qmi); break; case ATH10K_QMI_EVENT_SERVER_EXIT: ath10k_qmi_event_server_exit(qmi); @@ -1048,7 +1050,7 @@ static void ath10k_qmi_driver_event_work(struct work_struct *work) ath10k_qmi_event_fw_ready_ind(qmi); break; case ATH10K_QMI_EVENT_MSA_READY_IND: - ath10k_qmi_event_msa_ready(qmi); + printk(KERN_NOTICE "IGNORING ACTUAL MSA_READY INDICATOR"); break; default: ath10k_warn(ar, "invalid event type: %d", event->type); Dmitry tested several platforms: > For reference, I tested this patch on sdm845 (db845c), qcm2290 aka > qrb2210 (rb1), sm6115 aka qrb4210 (rb2) and sm8150 platforms. > I was not able to fully test it on sda660, modem crashes without this > patch (there is no MSA_READY indication) and with the patch applied > the device hangs, most likely because of the IOMMU or clocking issue. I tested on apq8098 (msm8998 sibling). Patch makes adapter work on my msm8998 platform. Regards
Marc Gonzalez <mgonzalez@freebox.fr> writes: > On 25/04/2024 11:42, Kalle Valo wrote: > >> Marc Gonzalez wrote: >> >>> Do you prefer: >>> >>> Option A = never waiting for the MSA_READY indicator for ANYONE >>> Option B = not waiting for the MSA_READY indicator when >>> qcom,no-msa-ready-indicator is defined >>> Option C = not waiting for the MSA_READY indicator for certain >>> platforms (based on root compatible) >>> Option D = some other solution not yet discussed >> >> After firmware-N.bin solution didn't work (sorry about that!) my >> preference is option B. > > Actually, Option B is this patch series. > Could you formally review it? I'm happy with this series and would take it to ath.git, just need an ack from DT maintainers: https://patchwork.kernel.org/project/linux-wireless/patch/84f20fb5-5d48-419c-8eff-d7044afb81c0@freebox.fr/ > Perhaps one thing I could do slightly differently is to NOT call > ath10k_qmi_event_msa_ready() a second time if we DO receive the > indicator later. Good point. And maybe add an ath10k_warn() message so that we notice if there's a mismatch.
On Thu, Apr 25, 2024 at 06:42:16PM +0300, Kalle Valo wrote: > Marc Gonzalez <mgonzalez@freebox.fr> writes: > > > On 25/04/2024 11:42, Kalle Valo wrote: > > > >> Marc Gonzalez wrote: > >> > >>> Do you prefer: > >>> > >>> Option A = never waiting for the MSA_READY indicator for ANYONE > >>> Option B = not waiting for the MSA_READY indicator when > >>> qcom,no-msa-ready-indicator is defined > >>> Option C = not waiting for the MSA_READY indicator for certain > >>> platforms (based on root compatible) > >>> Option D = some other solution not yet discussed > >> > >> After firmware-N.bin solution didn't work (sorry about that!) my > >> preference is option B. > > > > Actually, Option B is this patch series. > > Could you formally review it? > > I'm happy with this series and would take it to ath.git, just need an > ack from DT maintainers: As far as I can tell, Krzysztof wanted a commit message update for 1/3. Usually this dts patch would go via the qcom dts tree, so presumably there's a need for an Ack from Bjorn or Konrad? > > https://patchwork.kernel.org/project/linux-wireless/patch/84f20fb5-5d48-419c-8eff-d7044afb81c0@freebox.fr/ > > > Perhaps one thing I could do slightly differently is to NOT call > > ath10k_qmi_event_msa_ready() a second time if we DO receive the > > indicator later. > > Good point. And maybe add an ath10k_warn() message so that we notice if > there's a mismatch. > > -- > https://patchwork.kernel.org/project/linux-wireless/list/ > > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Conor Dooley <conor@kernel.org> writes: > On Thu, Apr 25, 2024 at 06:42:16PM +0300, Kalle Valo wrote: > >> Marc Gonzalez <mgonzalez@freebox.fr> writes: >> >> > On 25/04/2024 11:42, Kalle Valo wrote: >> > >> >> Marc Gonzalez wrote: >> >> >> >>> Do you prefer: >> >>> >> >>> Option A = never waiting for the MSA_READY indicator for ANYONE >> >>> Option B = not waiting for the MSA_READY indicator when >> >>> qcom,no-msa-ready-indicator is defined >> >>> Option C = not waiting for the MSA_READY indicator for certain >> >>> platforms (based on root compatible) >> >>> Option D = some other solution not yet discussed >> >> >> >> After firmware-N.bin solution didn't work (sorry about that!) my >> >> preference is option B. >> > >> > Actually, Option B is this patch series. >> > Could you formally review it? >> >> I'm happy with this series and would take it to ath.git, just need an >> ack from DT maintainers: > > As far as I can tell, Krzysztof wanted a commit message update for > 1/3. That's my understanding as well, I assume Marc will submit v3. I marked this patchset as 'Changes Requested' in patchwork. > Usually this dts patch would go via the qcom dts tree, so presumably > there's a need for an Ack from Bjorn or Konrad? Thanks pointing this out. What I meant to say earlier that I'm happy to take patches 1-2 to ath.git but I prefer that patch 3 goes via qcom dts tree.