mbox series

[V4,0/5] arm_scmi: vendors: Qualcomm Generic Vendor Extensions

Message ID 20241007061023.1978380-1-quic_sibis@quicinc.com
Headers show
Series arm_scmi: vendors: Qualcomm Generic Vendor Extensions | expand

Message

Sibi Sankar Oct. 7, 2024, 6:10 a.m. UTC
The QCOM SCMI vendor protocol provides a generic way of exposing a
number of Qualcomm SoC specific features (like memory bus scaling)
through a mixture of pre-determined algorithm strings and param_id
pairs hosted on the SCMI controller. Introduce a client driver that
uses the memlat algorithm string hosted on QCOM SCMI Vendor Protocol
to detect memory latency workloads and control frequency/level of
the various memory buses (DDR/LLCC/DDR_QOS).

QCOM SCMI Generic Vendor protocol background:
It was found that a lot of the vendor protocol used internally was
for debug/internal development purposes that would either be super
SoC specific or had to be disabled because of some features being
fused out during production. This lead to a large number of vendor
protocol numbers being quickly consumed and were never released
either. Using a generic vendor protocol with functionality abstracted
behind algorithm strings gave us the flexibility of allowing such
functionality exist during initial development/debugging while
still being able to expose functionality like memlat once they have
matured enough. The param-ids are certainly expected to act as ABI
for algorithms strings like MEMLAT.

Thanks in advance for taking time to review the series.

V3:
* Restructure the bindings to mimic IMX [Christian]
* Add documentation for the qcom generic vendor protocol [Christian/Sudeep]
* Pick up Rb tag and fixup/re-order elements of the vendor ops [Christian]
* Follow naming convention and folder structure used by IMX
* Add missing enum in the vendor protocol and fix documentation [Konrad]
* Add missing enum in the scmi memlat driver and fix documentation [Konrad]
* Add checks for max memory and monitor [Shivnandan]
* Fix typo from START_TIMER -> STOP_TIMER [Shivnandan]
* Make populate_physical_mask func to void [Shivnandan]
* Remove unecessary zero set [Shivnandan]
* Use __free(device node) in init_cpufreq-memfreqmap [Christian/Konrad]
* Use sdev->dev.of_node directly [Christian]
* use return dev_err_probe in multiple places [Christian]

V2:
* Drop container dvfs memlat container node. [Rob]
* Move scmi-memlat.yaml to protocol level given that a lot of vendors might end up
* using the same protocol number. [Rob]
* Replace qcom,cpulist with the standard "cpus" property. [Rob]
* Fix up compute-type/ipm-ceil required. [Rob]
* Make driver changes to the accommodate bindings changes. [Rob]
* Minor fixups in subjects/coverletter.
* Minor style fixes in dts.

V1:
* Add missing bindings for the protocol. [Konrad/Dmitry]
* Use alternate bindings. [Dmitry/Konrad]
* Rebase on top of Cristian's "SCMI multiple vendor protocol support" series. [Cristian]
* Add more documentation wherever possible. [Sudeep]
* Replace pr_err/info with it's dev equivalents.
* Mixed tabs and initialization cleanups in the memlat driver. [Konrad]
* Commit message update for the memlat driver. [Dmitry]
* Cleanups/Fixes suggested for the client driver. [Dmitry/Konrad/Cristian]
* Use opp-tables instead of memfreq-tbl. [Dmitry/Konrad]
* Detect physical cpu to deal with variants with reduced cpu count.
* Add support for DDR_QOS mem_type.

Sibi Sankar (5):
  dt-bindings: firmware: Document bindings for QCOM SCMI Generic
    Extension
  firmware: arm_scmi: Add QCOM Generic Vendor Protocol documentation
  firmware: arm_scmi: vendors: Add QCOM SCMI Generic Extensions
  soc: qcom: Introduce SCMI based Memlat (Memory Latency) governor
  arm64: dts: qcom: x1e80100: Enable LLCC/DDR/DDR_QOS dvfs

 .../bindings/firmware/arm,scmi.yaml           |   1 +
 .../bindings/firmware/qcom,scmi-memlat.yaml   | 246 ++++++++
 arch/arm64/boot/dts/qcom/x1e80100.dtsi        | 138 +++++
 drivers/firmware/arm_scmi/Kconfig             |   1 +
 drivers/firmware/arm_scmi/Makefile            |   1 +
 .../firmware/arm_scmi/vendors/qcom/Kconfig    |  15 +
 .../firmware/arm_scmi/vendors/qcom/Makefile   |   2 +
 .../arm_scmi/vendors/qcom/qcom-generic-ext.c  | 184 ++++++
 .../arm_scmi/vendors/qcom/qcom_generic.rst    | 210 +++++++
 drivers/soc/qcom/Kconfig                      |  12 +
 drivers/soc/qcom/Makefile                     |   1 +
 drivers/soc/qcom/qcom_scmi_memlat_client.c    | 569 ++++++++++++++++++
 .../dt-bindings/firmware/qcom,scmi-memlat.h   |  22 +
 include/linux/scmi_qcom_protocol.h            |  39 ++
 14 files changed, 1441 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/firmware/qcom,scmi-memlat.yaml
 create mode 100644 drivers/firmware/arm_scmi/vendors/qcom/Kconfig
 create mode 100644 drivers/firmware/arm_scmi/vendors/qcom/Makefile
 create mode 100644 drivers/firmware/arm_scmi/vendors/qcom/qcom-generic-ext.c
 create mode 100644 drivers/firmware/arm_scmi/vendors/qcom/qcom_generic.rst
 create mode 100644 drivers/soc/qcom/qcom_scmi_memlat_client.c
 create mode 100644 include/dt-bindings/firmware/qcom,scmi-memlat.h
 create mode 100644 include/linux/scmi_qcom_protocol.h

Comments

Sibi Sankar Oct. 22, 2024, 8:24 a.m. UTC | #1
On 10/8/24 12:22, Krzysztof Kozlowski wrote:
> On Mon, Oct 07, 2024 at 11:40:18AM +0530, Sibi Sankar wrote:
>> The QCOM SCMI vendor protocol provides a generic way of exposing a
>> number of Qualcomm SoC specific features (like memory bus scaling)
>> through a mixture of pre-determined algorithm strings and param_id
>> pairs hosted on the SCMI controller. Introduce a client driver that
>> uses the memlat algorithm string hosted on QCOM SCMI Vendor Protocol
>> to detect memory latency workloads and control frequency/level of
>> the various memory buses (DDR/LLCC/DDR_QOS).
> 
> None of your patches are wrapped according to Linux coding style which
> makes reviewing more difficult than it should be. And before you answer
> with checkpatch, checkpatch is not a coding style.

I can see that you've been a reviewer of this series from the very
initial version. That would imply you had a chance to shape/guide the
series to whatever shape you prefer. Yet you choose not to do so and
make a blanket statement now that it's close to merge in v4 :/

-Sibi

> 
> Best regards,
> Krzysztof
>
Johan Hovold Nov. 6, 2024, 12:55 p.m. UTC | #2
On Mon, Oct 07, 2024 at 11:40:18AM +0530, Sibi Sankar wrote:
> The QCOM SCMI vendor protocol provides a generic way of exposing a
> number of Qualcomm SoC specific features (like memory bus scaling)
> through a mixture of pre-determined algorithm strings and param_id
> pairs hosted on the SCMI controller. Introduce a client driver that
> uses the memlat algorithm string hosted on QCOM SCMI Vendor Protocol
> to detect memory latency workloads and control frequency/level of
> the various memory buses (DDR/LLCC/DDR_QOS).
> 
> QCOM SCMI Generic Vendor protocol background:
> It was found that a lot of the vendor protocol used internally was
> for debug/internal development purposes that would either be super
> SoC specific or had to be disabled because of some features being
> fused out during production. This lead to a large number of vendor
> protocol numbers being quickly consumed and were never released
> either. Using a generic vendor protocol with functionality abstracted
> behind algorithm strings gave us the flexibility of allowing such
> functionality exist during initial development/debugging while
> still being able to expose functionality like memlat once they have
> matured enough. The param-ids are certainly expected to act as ABI
> for algorithms strings like MEMLAT.

I wanted to give this series a quick spin on the x1e80100 CRD, but ran
into some issues.

First, I expected the drivers to be loaded automatically when built as
modules, but that did not happen so something appears to be missing.

Second, after loading the protocol and client drivers manually (in that
order, shouldn't the client driver pull in the protocol?), I got:

	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95

which seems to suggest that the firmware on my CRD does not support this
feature. Is that the way this should be interpreted? And does that mean
that non of the commercial laptops supports this either?

Johan
Cristian Marussi Nov. 6, 2024, 8:03 p.m. UTC | #3
On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:
> On Mon, Oct 07, 2024 at 11:40:18AM +0530, Sibi Sankar wrote:
> > The QCOM SCMI vendor protocol provides a generic way of exposing a
> > number of Qualcomm SoC specific features (like memory bus scaling)
> > through a mixture of pre-determined algorithm strings and param_id
> > pairs hosted on the SCMI controller. Introduce a client driver that
> > uses the memlat algorithm string hosted on QCOM SCMI Vendor Protocol
> > to detect memory latency workloads and control frequency/level of
> > the various memory buses (DDR/LLCC/DDR_QOS).
> > 
> > QCOM SCMI Generic Vendor protocol background:
> > It was found that a lot of the vendor protocol used internally was
> > for debug/internal development purposes that would either be super
> > SoC specific or had to be disabled because of some features being
> > fused out during production. This lead to a large number of vendor
> > protocol numbers being quickly consumed and were never released
> > either. Using a generic vendor protocol with functionality abstracted
> > behind algorithm strings gave us the flexibility of allowing such
> > functionality exist during initial development/debugging while
> > still being able to expose functionality like memlat once they have
> > matured enough. The param-ids are certainly expected to act as ABI
> > for algorithms strings like MEMLAT.
> 
> I wanted to give this series a quick spin on the x1e80100 CRD, but ran
> into some issues.
> 
> First, I expected the drivers to be loaded automatically when built as
> modules, but that did not happen so something appears to be missing.
> 

Hi Johan,

so the SCMI stack is fully modularizable as of this release, i.e.

 - SCMI core (scmi-core + scmi-module)
 - SCMI transports (scmi_transport_{mailbox,virtio,smc,optee}
 - optional SCMI Vendor protos
 - Std and Vendor SCMI Drivers on top of protos

....on the other side the SCMI standard protocols are still embedded
in scmi-module (or builtin) as of now...

Even though, module usage is tracked by the core AND when an SCMI Vendor
driver tries to use protocol_X, it causes protocol_X to be initialized
(calling its protocol_init), there is NO auto-loading for SCMI Vendor
Protocols when bult as LKM...because there were really no ask till now
and this stuff is in general needed so very early dburing boot...so the
usecase of all these LKM modules is just debug/test as in your case
(or in mine frequently)....

...and I am NOT saying with this that is necessarily right or must be
stay like this...just explaining how it is now....

....anyway...it is mostly trivial to add vendor/protocols autoloading
transparently...today I was experimenting with a patch that triggers
autoloading based on a generic common alias pattern in the form
'scmi-protocol-0x<NN>' (with NN the specific protocol ID of course)
that triggers the loading as soon as the SCMI Vendor driver tries to
access the protocol during its probe...

....I will post it for the next cycle once it is clean.
(unless I am missing something else that you want to add...)

> Second, after loading the protocol and client drivers manually (in that
> order, shouldn't the client driver pull in the protocol?), I got:
> 
> 	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
> 	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
> 	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
> 	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
> 
> which seems to suggest that the firmware on my CRD does not support this
> feature. Is that the way this should be interpreted? And does that mean
> that non of the commercial laptops supports this either?

This seems like FW rejecting the command, maybe just only for the specific
Linux OSPM agent since it is not allowed to ask for that specific setup,
and only Sibi can shed a light here :D

...but this Vendor protocol, if I am not mistaken, AFAIU, uses a bunch
of "algo strings" coming from tokens it picks from DT and use thsoe as
params for the set_param() VendorProtocol ops...cannot be that a bad/missing
DT value could cause the FW to reject the command due to the params ?
(even if the command is supported)...

...just a guess ah... I have no real knowledge of this venndor proto...


Thanks,
Cristian
Johan Hovold Nov. 8, 2024, 3:14 p.m. UTC | #4
On Wed, Nov 06, 2024 at 08:03:30PM +0000, Cristian Marussi wrote:
> On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:

> > First, I expected the drivers to be loaded automatically when built as
> > modules, but that did not happen so something appears to be missing.

> Even though, module usage is tracked by the core AND when an SCMI Vendor
> driver tries to use protocol_X, it causes protocol_X to be initialized
> (calling its protocol_init), there is NO auto-loading for SCMI Vendor
> Protocols when bult as LKM...because there were really no ask till now
> and this stuff is in general needed so very early dburing boot...so the
> usecase of all these LKM modules is just debug/test as in your case
> (or in mine frequently)....
> 
> ...and I am NOT saying with this that is necessarily right or must be
> stay like this...just explaining how it is now....

Ok, thanks for clarifying.

> ....anyway...it is mostly trivial to add vendor/protocols autoloading
> transparently...today I was experimenting with a patch that triggers
> autoloading based on a generic common alias pattern in the form
> 'scmi-protocol-0x<NN>' (with NN the specific protocol ID of course)
> that triggers the loading as soon as the SCMI Vendor driver tries to
> access the protocol during its probe...
> 
> ....I will post it for the next cycle once it is clean.
> (unless I am missing something else that you want to add...)

Sounds like that would solve the issue. I was just expecting the memlat
client driver and its protocol dependency to be loaded automatically
when built as modules on machines that can use them (e.g. as we don't
want to have every vendor protocol driver built into distro kernels,
etc).

> > Second, after loading the protocol and client drivers manually (in that
> > order, shouldn't the client driver pull in the protocol?), I got:
> > 
> > 	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
> > 	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
> > 	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
> > 	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
> > 
> > which seems to suggest that the firmware on my CRD does not support this
> > feature. Is that the way this should be interpreted? And does that mean
> > that non of the commercial laptops supports this either?
> 
> This seems like FW rejecting the command, maybe just only for the specific
> Linux OSPM agent since it is not allowed to ask for that specific setup,
> and only Sibi can shed a light here :D
> 
> ...but this Vendor protocol, if I am not mistaken, AFAIU, uses a bunch
> of "algo strings" coming from tokens it picks from DT and use thsoe as
> params for the set_param() VendorProtocol ops...cannot be that a bad/missing
> DT value could cause the FW to reject the command due to the params ?
> (even if the command is supported)...
> 
> ...just a guess ah... I have no real knowledge of this venndor proto...

Yeah, hopefully Sibi can shed some light on this. I'm using the DT
patch (5/5) from this series, which according to the commit message is
supposed to enable bus scaling on the x1e80100 platform. So I guess
something is missing in my firmware.

Johan
Sibi Sankar Dec. 5, 2024, 10:56 a.m. UTC | #5
On 11/22/24 14:07, Johan Hovold wrote:
> On Thu, Nov 14, 2024 at 09:52:12AM +0530, Sibi Sankar wrote:
>> On 11/8/24 20:44, Johan Hovold wrote:
> 
>>>> On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:
> 
>>>>> Second, after loading the protocol and client drivers manually (in that
>>>>> order, shouldn't the client driver pull in the protocol?), I got:
>>>>>
>>>>> 	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
>>>>> 	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
>>>>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
>>>>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
>>>>>
>>>>> which seems to suggest that the firmware on my CRD does not support this
>>>>> feature. Is that the way this should be interpreted? And does that mean
>>>>> that non of the commercial laptops supports this either?
> 
>>> Yeah, hopefully Sibi can shed some light on this. I'm using the DT
>>> patch (5/5) from this series, which according to the commit message is
>>> supposed to enable bus scaling on the x1e80100 platform. So I guess
>>> something is missing in my firmware.
>>
>> Nah, it's probably just because of the algo string used.
>> The past few series used caps MEMLAT string instead of
>> memlat to pass the tuneables, looks like all the laptops
>> havn't really switched to it yet. Will revert back to
>> using to lower case memlat so that all devices are
>> supported. Thanks for trying the series out!
> 
> I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
> there too, and there I do *not* see the above mentioned -EOPNOSUPP error
> and the memlat driver probes successfully.
> 
> On the other hand, this series seems to have no effect on a kernel
> compilation benchmark. Is that expected?

I can have a look at your tree. But memlat in general
depends on the cpu frequency when your benchmarks max
the cpu's the ddr/llcc are scaled accordingly by it.

> 
> And does this mean that you should stick with the uppercase "MEMLAT"
> string after all? The firmware on my CRD is not the latest one, but I am
> using the latest available firmware for the T14s.

We should stick with "memlat" if we run into a device in the
wild that doesn't support "MEMLAT"

-Sibi

> 
> Johan
>
Sudeep Holla Dec. 5, 2024, 5:01 p.m. UTC | #6
On Fri, Nov 22, 2024 at 09:37:47AM +0100, Johan Hovold wrote:
> On Thu, Nov 14, 2024 at 09:52:12AM +0530, Sibi Sankar wrote:
> > On 11/8/24 20:44, Johan Hovold wrote:
>
> > >> On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:
>
> > >>> Second, after loading the protocol and client drivers manually (in that
> > >>> order, shouldn't the client driver pull in the protocol?), I got:
> > >>>
> > >>> 	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
> > >>> 	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
> > >>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
> > >>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
> > >>>
> > >>> which seems to suggest that the firmware on my CRD does not support this
> > >>> feature. Is that the way this should be interpreted? And does that mean
> > >>> that non of the commercial laptops supports this either?
>
> > > Yeah, hopefully Sibi can shed some light on this. I'm using the DT
> > > patch (5/5) from this series, which according to the commit message is
> > > supposed to enable bus scaling on the x1e80100 platform. So I guess
> > > something is missing in my firmware.
> >
> > Nah, it's probably just because of the algo string used.
> > The past few series used caps MEMLAT string instead of
> > memlat to pass the tuneables, looks like all the laptops
> > havn't really switched to it yet. Will revert back to
> > using to lower case memlat so that all devices are
> > supported. Thanks for trying the series out!
>
> I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
> there too, and there I do *not* see the above mentioned -EOPNOSUPP error
> and the memlat driver probes successfully.
>
> On the other hand, this series seems to have no effect on a kernel
> compilation benchmark. Is that expected?
>

Hijacking this thread to rant about state of firmware implementation on
this platform that gives me zero confidence in merging any of these without
examining each of the interface details in depth and at lengths.

Also I see the standard protocol like PERF seem to have so many issues which
adds to my no confidence. I can't comment on that thread for specific reasons.

I will briefly mention my suspicion here. This Lenovo ThinkPad T14s being
primarily targeting other OS using ACPI might have just implemented what is
required for ACPI CPPC which conveniently doesn't have to discover lot of
fastchannel details since they are supplied in the tables straight away.
But that also would mean it could be not fully compliant to SCMI spec.

Either we need to run some compliance test suite(which again may need more
work as it is unfortunately make platform specific - referring to [1])
or use the raw interface in the kernel and throw /dev/random at it and see
how well it can cope up.

--
Regards,
Sudeep

[1] https://gitlab.arm.com/tests/scmi-tests (Not so great and needs platform
     specific vectors to check compliance)
Sibi Sankar Dec. 17, 2024, 11:49 a.m. UTC | #7
On 12/5/24 21:22, Johan Hovold wrote:
> On Thu, Dec 05, 2024 at 04:26:55PM +0530, Sibi Sankar wrote:
>> On 11/22/24 14:07, Johan Hovold wrote:
> 
>>> I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
>>> there too, and there I do *not* see the above mentioned -EOPNOSUPP error
>>> and the memlat driver probes successfully.
>>>
>>> On the other hand, this series seems to have no effect on a kernel
>>> compilation benchmark. Is that expected?
>>
>> I can have a look at your tree. But memlat in general
>> depends on the cpu frequency when your benchmarks max
>> the cpu's the ddr/llcc are scaled accordingly by it.
> 
> A kernel compilation should max out the CPU frequency on all cores.
> 
>>> And does this mean that you should stick with the uppercase "MEMLAT"
>>> string after all? The firmware on my CRD is not the latest one, but I am
>>> using the latest available firmware for the T14s.
>>
>> We should stick with "memlat" if we run into a device in the
>> wild that doesn't support "MEMLAT"
> 
> Ok. So the updated firmware supports both strings?


Sry for the delay, was out sick. Yes the updated firmware supports both
strings.

-Sibi

> 
> Johan
Sibi Sankar Dec. 17, 2024, 12:25 p.m. UTC | #8
On 12/5/24 22:31, Sudeep Holla wrote:
> On Fri, Nov 22, 2024 at 09:37:47AM +0100, Johan Hovold wrote:
>> On Thu, Nov 14, 2024 at 09:52:12AM +0530, Sibi Sankar wrote:
>>> On 11/8/24 20:44, Johan Hovold wrote:
>>
>>>>> On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:
>>
>>>>>> Second, after loading the protocol and client drivers manually (in that
>>>>>> order, shouldn't the client driver pull in the protocol?), I got:
>>>>>>
>>>>>> 	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
>>>>>> 	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
>>>>>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
>>>>>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
>>>>>>
>>>>>> which seems to suggest that the firmware on my CRD does not support this
>>>>>> feature. Is that the way this should be interpreted? And does that mean
>>>>>> that non of the commercial laptops supports this either?
>>
>>>> Yeah, hopefully Sibi can shed some light on this. I'm using the DT
>>>> patch (5/5) from this series, which according to the commit message is
>>>> supposed to enable bus scaling on the x1e80100 platform. So I guess
>>>> something is missing in my firmware.
>>>
>>> Nah, it's probably just because of the algo string used.
>>> The past few series used caps MEMLAT string instead of
>>> memlat to pass the tuneables, looks like all the laptops
>>> havn't really switched to it yet. Will revert back to
>>> using to lower case memlat so that all devices are
>>> supported. Thanks for trying the series out!
>>
>> I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
>> there too, and there I do *not* see the above mentioned -EOPNOSUPP error
>> and the memlat driver probes successfully.
>>
>> On the other hand, this series seems to have no effect on a kernel
>> compilation benchmark. Is that expected?
>>
> 
> Hijacking this thread to rant about state of firmware implementation on
> this platform that gives me zero confidence in merging any of these without
> examining each of the interface details in depth and at lengths.
> 

Hey Sudeep,

Thanks for taking time to review the series.

> Also I see the standard protocol like PERF seem to have so many issues which
> adds to my no confidence. I can't comment on that thread for specific reasons.

^^ is largely untrue, a lot of finger pointing and a gross
misrepresentation of reality :/

The only major problem that X1E perf protocol has is a firmware
crash in the LEVEL_GET regular message implementation. This
pretty much went unnoticed because of messaging in perf implementation
in kernel. Given the fastchannel implementation isn't mandatory
according to spec, the kernel clearly says it switches to
regular messaging when it clearly doesn't do that and uses
stale data structures instead. This ensured that level get regular
messaging never got tested.

We pretty much have been good upstream citizens, finding bugs and
sending fixes wherever we can. We clearly don't deserve such a hostile
stance.

> 
> I will briefly mention my suspicion here. This Lenovo ThinkPad T14s being
> primarily targeting other OS using ACPI might have just implemented what is
> required for ACPI CPPC which conveniently doesn't have to discover lot of
> fastchannel details since they are supplied in the tables straight away.
> But that also would mean it could be not fully compliant to SCMI spec.

Not fully compliant to the spec? I am pretty sure this series would
have been shot down completely and NAKd on the list by you if that
was the case lol.

-Sibi

> 
> Either we need to run some compliance test suite(which again may need more
> work as it is unfortunately make platform specific - referring to [1])
> or use the raw interface in the kernel and throw /dev/random at it and see
> how well it can cope up.
> 
> --
> Regards,
> Sudeep
> 
> [1] https://gitlab.arm.com/tests/scmi-tests (Not so great and needs platform
>       specific vectors to check compliance)
Cristian Marussi Dec. 17, 2024, 2:45 p.m. UTC | #9
On Tue, Dec 17, 2024 at 05:55:35PM +0530, Sibi Sankar wrote:
> 
> 
> On 12/5/24 22:31, Sudeep Holla wrote:
> > On Fri, Nov 22, 2024 at 09:37:47AM +0100, Johan Hovold wrote:
> > > On Thu, Nov 14, 2024 at 09:52:12AM +0530, Sibi Sankar wrote:
> > > > On 11/8/24 20:44, Johan Hovold wrote:
> > > 
> > > > > > On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:
> > > 
> > > > > > > Second, after loading the protocol and client drivers manually (in that
> > > > > > > order, shouldn't the client driver pull in the protocol?), I got:
> > > > > > > 
> > > > > > > 	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
> > > > > > > 	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
> > > > > > > 	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
> > > > > > > 	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
> > > > > > > 
> > > > > > > which seems to suggest that the firmware on my CRD does not support this
> > > > > > > feature. Is that the way this should be interpreted? And does that mean
> > > > > > > that non of the commercial laptops supports this either?
> > > 
> > > > > Yeah, hopefully Sibi can shed some light on this. I'm using the DT
> > > > > patch (5/5) from this series, which according to the commit message is
> > > > > supposed to enable bus scaling on the x1e80100 platform. So I guess
> > > > > something is missing in my firmware.
> > > > 
> > > > Nah, it's probably just because of the algo string used.
> > > > The past few series used caps MEMLAT string instead of
> > > > memlat to pass the tuneables, looks like all the laptops
> > > > havn't really switched to it yet. Will revert back to
> > > > using to lower case memlat so that all devices are
> > > > supported. Thanks for trying the series out!
> > > 
> > > I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
> > > there too, and there I do *not* see the above mentioned -EOPNOSUPP error
> > > and the memlat driver probes successfully.
> > > 
> > > On the other hand, this series seems to have no effect on a kernel
> > > compilation benchmark. Is that expected?
> > > 
> > 
> > Hijacking this thread to rant about state of firmware implementation on
> > this platform that gives me zero confidence in merging any of these without
> > examining each of the interface details in depth and at lengths.
> > 
> 

Hi Sibi,

> Hey Sudeep,
> 
> Thanks for taking time to review the series.
> 
> > Also I see the standard protocol like PERF seem to have so many issues which
> > adds to my no confidence. I can't comment on that thread for specific reasons.
> 
> ^^ is largely untrue, a lot of finger pointing and a gross
> misrepresentation of reality :/
> 
> The only major problem that X1E perf protocol has is a firmware
> crash in the LEVEL_GET regular message implementation. This
> pretty much went unnoticed because of messaging in perf implementation
> in kernel. Given the fastchannel implementation isn't mandatory
> according to spec, the kernel clearly says it switches to
> regular messaging when it clearly doesn't do that and uses
> stale data structures instead. This ensured that level get regular
> messaging never got tested.

You claimed this a couple of times here and on IRC, but sincerely,
looking at the fastchannel implementation in SCMI core and Perf, I could
not track down where this could have happened in the recent code
(i.e. with or without your recent, welcomed, patches...)

When FC initialization fails and bailout it says:
	
	"Failed to get FC for protocol %X [MSG_ID:%u / RES_ID:%u] - ret:%d. Using regular messaging."

... and it clears any gathered address for that FC, so that in __scmi_perf_level_get()
you end up skipping the FC machinery and use messaging

	if (dom->fc_info && dom->fc_info[PERF_FC_LEVEL].get_addr) {
		...
	}

	return scmi_perf_msg_level_get(ph, dom->id, level, poll);

Now this is done ONLY for the FC that specifically failed
initialization, i.e. identified by the tuple PROTO_ID/MSG_ID/RES_ID
(as stated in the noisy message above where MSG_ID is specified) NOT for
all Fastchannel, so you can have an FC successfully initialized only on
the GET but failing in the SET, so only the GET FC will be used.

I dont really understand how the Kernel was misbehaving and using
instead stale data, neither, if this was the case, I can see where this
issue would have been fixed.

To be clear, I am not really interested in throwing an argument here, but
I sincerely dont see where the alleged problem was and how was fixed (kernel
side), so I fear it could be still there, hidden maybe by a change in the
platform fw.

Apologies if I missed something along the history of this..

Thanks,
Cristian
Sudeep Holla Dec. 17, 2024, 5:59 p.m. UTC | #10
On Tue, Dec 17, 2024 at 05:55:35PM +0530, Sibi Sankar wrote:
> 
> 
> On 12/5/24 22:31, Sudeep Holla wrote:
> > On Fri, Nov 22, 2024 at 09:37:47AM +0100, Johan Hovold wrote:
> > > On Thu, Nov 14, 2024 at 09:52:12AM +0530, Sibi Sankar wrote:
> > > > On 11/8/24 20:44, Johan Hovold wrote:
> > > 
> > > > > > On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:
> > > 
> > > > > > > Second, after loading the protocol and client drivers manually (in that
> > > > > > > order, shouldn't the client driver pull in the protocol?), I got:
> > > > > > > 
> > > > > > > 	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
> > > > > > > 	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
> > > > > > > 	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
> > > > > > > 	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
> > > > > > > 
> > > > > > > which seems to suggest that the firmware on my CRD does not support this
> > > > > > > feature. Is that the way this should be interpreted? And does that mean
> > > > > > > that non of the commercial laptops supports this either?
> > > 
> > > > > Yeah, hopefully Sibi can shed some light on this. I'm using the DT
> > > > > patch (5/5) from this series, which according to the commit message is
> > > > > supposed to enable bus scaling on the x1e80100 platform. So I guess
> > > > > something is missing in my firmware.
> > > > 
> > > > Nah, it's probably just because of the algo string used.
> > > > The past few series used caps MEMLAT string instead of
> > > > memlat to pass the tuneables, looks like all the laptops
> > > > havn't really switched to it yet. Will revert back to
> > > > using to lower case memlat so that all devices are
> > > > supported. Thanks for trying the series out!
> > > 
> > > I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
> > > there too, and there I do *not* see the above mentioned -EOPNOSUPP error
> > > and the memlat driver probes successfully.
> > > 
> > > On the other hand, this series seems to have no effect on a kernel
> > > compilation benchmark. Is that expected?
> > > 
> > 
> > Hijacking this thread to rant about state of firmware implementation on
> > this platform that gives me zero confidence in merging any of these without
> > examining each of the interface details in depth and at lengths.
> > 
> 
> Hey Sudeep,
> 
> Thanks for taking time to review the series.
> 
> > Also I see the standard protocol like PERF seem to have so many issues which
> > adds to my no confidence. I can't comment on that thread for specific reasons.
> 
> ^^ is largely untrue, a lot of finger pointing and a gross
> misrepresentation of reality :/
>

Sorry if I was not clear, I just said I don't have confidence yet and if
the firmware is stable, then it is just the impression I have arrived based
on the discussions.

> crash in the LEVEL_GET regular message implementation. This
> pretty much went unnoticed because of messaging in perf implementation
> in kernel.

OK, is there any scope to improve in your opinion ? Please suggest and
discuss or post a patch to have separate discussion.

> Given the fastchannel implementation isn't mandatory
> according to spec, the kernel clearly says it switches to
> regular messaging when it clearly doesn't do that and uses
> stale data structures instead.

Interesting, it sounds like a bug. Please provide details or patch to
fix the bug. That would probably fix it on whatever platform we are
concerned here.

> This ensured that level get regular messaging never got tested.
>

You seem to point at this bug several time now, we need to get it fixed,
but we need to understand it first if you want us to fix it or as mentioned
before you can as well post the patch.

> We pretty much have been good upstream citizens, finding bugs and
> sending fixes wherever we can. We clearly don't deserve such a hostile
> stance.
>

Not sure what made you think we are hostile towards your contributions.
We just need a maintainable solution merged upstream and we are working
towards the same. The documents written as part of this series is not
there yet to help me understand the protocol yet. I have asked questions
and answer to those can be made part of the next version to improve it
IMO.

> > I will briefly mention my suspicion here. This Lenovo ThinkPad T14s being
> > primarily targeting other OS using ACPI might have just implemented what is
> > required for ACPI CPPC which conveniently doesn't have to discover lot of
> > fastchannel details since they are supplied in the tables straight away.
> > But that also would mean it could be not fully compliant to SCMI spec.
>
> Not fully compliant to the spec? I am pretty sure this series would
> have been shot down completely and NAKd on the list by you if that
> was the case lol.
>

Honestly I am still trying to make any sense out of this vendor protocols.
The documents produced as part of this series doesn't help me understand
the same and that is my main feedback so far on this thread. I haven't
looked at the code yet so I can't comment on the same as I first need
to understand the vendor protocol document/specification.
Johan Hovold Dec. 19, 2024, 10:37 a.m. UTC | #11
On Tue, Dec 17, 2024 at 05:19:25PM +0530, Sibi Sankar wrote:
> On 12/5/24 21:22, Johan Hovold wrote:
> > On Thu, Dec 05, 2024 at 04:26:55PM +0530, Sibi Sankar wrote:
> >> On 11/22/24 14:07, Johan Hovold wrote:
> > 
> >>> I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
> >>> there too, and there I do *not* see the above mentioned -EOPNOSUPP error
> >>> and the memlat driver probes successfully.
> >>>
> >>> On the other hand, this series seems to have no effect on a kernel
> >>> compilation benchmark. Is that expected?
> >>
> >> I can have a look at your tree. But memlat in general
> >> depends on the cpu frequency when your benchmarks max
> >> the cpu's the ddr/llcc are scaled accordingly by it.
> > 
> > A kernel compilation should max out the CPU frequency on all cores.

Answering my own question here; bwmon should scale the buses for
benchmarks like kernel compilations so I guess the non-existing impact
of memlat is expected here.

Ettore helped me run some further benchmarks, including cachebench, but
also saw no positive (or negative) effect with this series running on an
X1E CRD (with recent firmware).

Do you have any suggestions of benchmarks to run where the effect of
memlat should show up? What have you been using for testing?

I did measure a possibly slightly higher (idle) power consumption with
memlat, but I guess that is also expected given the intended more
aggressive ramping of the bus clocks.

These are the branches (and configs; johan_defconfig) we've used for
testing:

	https://github.com/jhovold/linux/tree/wip/x1e80100-6.13-rc3
	https://github.com/jhovold/linux/tree/wip/x1e80100-6.13-rc3-memlat

> >>> And does this mean that you should stick with the uppercase "MEMLAT"
> >>> string after all? The firmware on my CRD is not the latest one, but I am
> >>> using the latest available firmware for the T14s.
> >>
> >> We should stick with "memlat" if we run into a device in the
> >> wild that doesn't support "MEMLAT"
> > 
> > Ok. So the updated firmware supports both strings?
> 
> Sry for the delay, was out sick. Yes the updated firmware supports both
> strings.

No worries, hope you're feeling better.

I noticed that the firmware on the T14s indeed accepts both strings.

Johan
Sibi Sankar Dec. 23, 2024, 2 p.m. UTC | #12
On 12/19/24 16:07, Johan Hovold wrote:
> On Tue, Dec 17, 2024 at 05:19:25PM +0530, Sibi Sankar wrote:
>> On 12/5/24 21:22, Johan Hovold wrote:
>>> On Thu, Dec 05, 2024 at 04:26:55PM +0530, Sibi Sankar wrote:
>>>> On 11/22/24 14:07, Johan Hovold wrote:
>>>
>>>>> I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
>>>>> there too, and there I do *not* see the above mentioned -EOPNOSUPP error
>>>>> and the memlat driver probes successfully.
>>>>>
>>>>> On the other hand, this series seems to have no effect on a kernel
>>>>> compilation benchmark. Is that expected?
>>>>
>>>> I can have a look at your tree. But memlat in general
>>>> depends on the cpu frequency when your benchmarks max
>>>> the cpu's the ddr/llcc are scaled accordingly by it.
>>>
>>> A kernel compilation should max out the CPU frequency on all cores.
> 
> Answering my own question here; bwmon should scale the buses for
> benchmarks like kernel compilations so I guess the non-existing impact
> of memlat is expected here.

you would see impact only in cases where you would benefit from
having ddr and llcc at a higher frequency i.e. latency workloads.
I usually run geekbench with and we are expected to see a big
difference with and without it.

> 
> Ettore helped me run some further benchmarks, including cachebench, but
> also saw no positive (or negative) effect with this series running on an
> X1E CRD (with recent firmware).
> 
> Do you have any suggestions of benchmarks to run where the effect of
> memlat should show up? What have you been using for testing?
> 
> I did measure a possibly slightly higher (idle) power consumption with
> memlat, but I guess that is also expected given the intended more
> aggressive ramping of the bus clocks.
> 
> These are the branches (and configs; johan_defconfig) we've used for
> testing:
> 
> 	https://github.com/jhovold/linux/tree/wip/x1e80100-6.13-rc3
> 	https://github.com/jhovold/linux/tree/wip/x1e80100-6.13-rc3-memlat

Thanks, we'll get this sorted out.

> 
>>>>> And does this mean that you should stick with the uppercase "MEMLAT"
>>>>> string after all? The firmware on my CRD is not the latest one, but I am
>>>>> using the latest available firmware for the T14s.
>>>>
>>>> We should stick with "memlat" if we run into a device in the
>>>> wild that doesn't support "MEMLAT"
>>>
>>> Ok. So the updated firmware supports both strings?
>>
>> Sry for the delay, was out sick. Yes the updated firmware supports both
>> strings.
> 
> No worries, hope you're feeling better.
> 
> I noticed that the firmware on the T14s indeed accepts both strings.
> 
> Johan
Sibi Sankar Dec. 23, 2024, 2:09 p.m. UTC | #13
On 12/17/24 20:15, Cristian Marussi wrote:
> On Tue, Dec 17, 2024 at 05:55:35PM +0530, Sibi Sankar wrote:
>>
>>
>> On 12/5/24 22:31, Sudeep Holla wrote:
>>> On Fri, Nov 22, 2024 at 09:37:47AM +0100, Johan Hovold wrote:
>>>> On Thu, Nov 14, 2024 at 09:52:12AM +0530, Sibi Sankar wrote:
>>>>> On 11/8/24 20:44, Johan Hovold wrote:
>>>>
>>>>>>> On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:
>>>>
>>>>>>>> Second, after loading the protocol and client drivers manually (in that
>>>>>>>> order, shouldn't the client driver pull in the protocol?), I got:
>>>>>>>>
>>>>>>>> 	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
>>>>>>>> 	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
>>>>>>>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
>>>>>>>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
>>>>>>>>
>>>>>>>> which seems to suggest that the firmware on my CRD does not support this
>>>>>>>> feature. Is that the way this should be interpreted? And does that mean
>>>>>>>> that non of the commercial laptops supports this either?
>>>>
>>>>>> Yeah, hopefully Sibi can shed some light on this. I'm using the DT
>>>>>> patch (5/5) from this series, which according to the commit message is
>>>>>> supposed to enable bus scaling on the x1e80100 platform. So I guess
>>>>>> something is missing in my firmware.
>>>>>
>>>>> Nah, it's probably just because of the algo string used.
>>>>> The past few series used caps MEMLAT string instead of
>>>>> memlat to pass the tuneables, looks like all the laptops
>>>>> havn't really switched to it yet. Will revert back to
>>>>> using to lower case memlat so that all devices are
>>>>> supported. Thanks for trying the series out!
>>>>
>>>> I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
>>>> there too, and there I do *not* see the above mentioned -EOPNOSUPP error
>>>> and the memlat driver probes successfully.
>>>>
>>>> On the other hand, this series seems to have no effect on a kernel
>>>> compilation benchmark. Is that expected?
>>>>
>>>
>>> Hijacking this thread to rant about state of firmware implementation on
>>> this platform that gives me zero confidence in merging any of these without
>>> examining each of the interface details in depth and at lengths.
>>>
>>
> 
> Hi Sibi,
> 
>> Hey Sudeep,
>>
>> Thanks for taking time to review the series.
>>
>>> Also I see the standard protocol like PERF seem to have so many issues which
>>> adds to my no confidence. I can't comment on that thread for specific reasons.
>>
>> ^^ is largely untrue, a lot of finger pointing and a gross
>> misrepresentation of reality :/
>>
>> The only major problem that X1E perf protocol has is a firmware
>> crash in the LEVEL_GET regular message implementation. This
>> pretty much went unnoticed because of messaging in perf implementation
>> in kernel. Given the fastchannel implementation isn't mandatory
>> according to spec, the kernel clearly says it switches to
>> regular messaging when it clearly doesn't do that and uses
>> stale data structures instead. This ensured that level get regular
>> messaging never got tested.
> 
> You claimed this a couple of times here and on IRC, but sincerely,
> looking at the fastchannel implementation in SCMI core and Perf, I could
> not track down where this could have happened in the recent code
> (i.e. with or without your recent, welcomed, patches...)
> 
> When FC initialization fails and bailout it says:
> 	
> 	"Failed to get FC for protocol %X [MSG_ID:%u / RES_ID:%u] - ret:%d. Using regular messaging."
> 
> ... and it clears any gathered address for that FC, so that in __scmi_perf_level_get()
> you end up skipping the FC machinery and use messaging
> 
> 	if (dom->fc_info && dom->fc_info[PERF_FC_LEVEL].get_addr) {
> 		...
> 	}
> 
> 	return scmi_perf_msg_level_get(ph, dom->id, level, poll);
> 
> Now this is done ONLY for the FC that specifically failed
> initialization, i.e. identified by the tuple PROTO_ID/MSG_ID/RES_ID
> (as stated in the noisy message above where MSG_ID is specified) NOT for
> all Fastchannel, so you can have an FC successfully initialized only on
> the GET but failing in the SET, so only the GET FC will be used.
> 
> I dont really understand how the Kernel was misbehaving and using
> instead stale data, neither, if this was the case, I can see where this
> issue would have been fixed.
> 
> To be clear, I am not really interested in throwing an argument here, but
> I sincerely dont see where the alleged problem was and how was fixed (kernel
> side), so I fear it could be still there, hidden maybe by a change in the
> platform fw.
> 
> Apologies if I missed something along the history of this..

lol, this is pretty embarrassing :|, It's just like you said
looks like this fw supports get_level fastchannel but fails
to say it supports it. This was the reason behind get_level
regular message for being never tested and being buggy and
had nothing to do the kernel messaging or being buggy.
My bad :(, sry again.

-Sibi

> 
> Thanks,
> Cristian
Sibi Sankar Dec. 23, 2024, 2:14 p.m. UTC | #14
On 12/17/24 23:29, Sudeep Holla wrote:
> On Tue, Dec 17, 2024 at 05:55:35PM +0530, Sibi Sankar wrote:
>>
>>
>> On 12/5/24 22:31, Sudeep Holla wrote:
>>> On Fri, Nov 22, 2024 at 09:37:47AM +0100, Johan Hovold wrote:
>>>> On Thu, Nov 14, 2024 at 09:52:12AM +0530, Sibi Sankar wrote:
>>>>> On 11/8/24 20:44, Johan Hovold wrote:
>>>>
>>>>>>> On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:
>>>>
>>>>>>>> Second, after loading the protocol and client drivers manually (in that
>>>>>>>> order, shouldn't the client driver pull in the protocol?), I got:
>>>>>>>>
>>>>>>>> 	scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm  20000
>>>>>>>> 	arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
>>>>>>>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
>>>>>>>> 	scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95
>>>>>>>>
>>>>>>>> which seems to suggest that the firmware on my CRD does not support this
>>>>>>>> feature. Is that the way this should be interpreted? And does that mean
>>>>>>>> that non of the commercial laptops supports this either?
>>>>
>>>>>> Yeah, hopefully Sibi can shed some light on this. I'm using the DT
>>>>>> patch (5/5) from this series, which according to the commit message is
>>>>>> supposed to enable bus scaling on the x1e80100 platform. So I guess
>>>>>> something is missing in my firmware.
>>>>>
>>>>> Nah, it's probably just because of the algo string used.
>>>>> The past few series used caps MEMLAT string instead of
>>>>> memlat to pass the tuneables, looks like all the laptops
>>>>> havn't really switched to it yet. Will revert back to
>>>>> using to lower case memlat so that all devices are
>>>>> supported. Thanks for trying the series out!
>>>>
>>>> I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
>>>> there too, and there I do *not* see the above mentioned -EOPNOSUPP error
>>>> and the memlat driver probes successfully.
>>>>
>>>> On the other hand, this series seems to have no effect on a kernel
>>>> compilation benchmark. Is that expected?
>>>>
>>>
>>> Hijacking this thread to rant about state of firmware implementation on
>>> this platform that gives me zero confidence in merging any of these without
>>> examining each of the interface details in depth and at lengths.
>>>
>>
>> Hey Sudeep,
>>
>> Thanks for taking time to review the series.
>>
>>> Also I see the standard protocol like PERF seem to have so many issues which
>>> adds to my no confidence. I can't comment on that thread for specific reasons.
>>
>> ^^ is largely untrue, a lot of finger pointing and a gross
>> misrepresentation of reality :/
>>
> 
> Sorry if I was not clear, I just said I don't have confidence yet and if
> the firmware is stable, then it is just the impression I have arrived based
> on the discussions.

It's like you said the SCMI PERF protocol isn't used in Windows
but they do vendor protocol for bus scaling i.e. the memlat
algostring hosted on the generic vendor protocol. So those
bits are expected to be pretty stable.

> 
>> crash in the LEVEL_GET regular message implementation. This
>> pretty much went unnoticed because of messaging in perf implementation
>> in kernel.
> 
> OK, is there any scope to improve in your opinion ? Please suggest and
> discuss or post a patch to have separate discussion.
> 
>> Given the fastchannel implementation isn't mandatory
>> according to spec, the kernel clearly says it switches to
>> regular messaging when it clearly doesn't do that and uses
>> stale data structures instead.
> 
> Interesting, it sounds like a bug. Please provide details or patch to
> fix the bug. That would probably fix it on whatever platform we are
> concerned here.

sry, It was just a misunderstanding. Please ignore.

> 
>> This ensured that level get regular messaging never got tested.
>>
> 
> You seem to point at this bug several time now, we need to get it fixed,
> but we need to understand it first if you want us to fix it or as mentioned
> before you can as well post the patch.
> 
>> We pretty much have been good upstream citizens, finding bugs and
>> sending fixes wherever we can. We clearly don't deserve such a hostile
>> stance.
>>
> 
> Not sure what made you think we are hostile towards your contributions.
> We just need a maintainable solution merged upstream and we are working
> towards the same. The documents written as part of this series is not
> there yet to help me understand the protocol yet. I have asked questions
> and answer to those can be made part of the next version to improve it
> IMO.

Ack and we would ensure those get implemented to ensure the
protocol remains easily reviewable and maintainable.

-Sibi

> 
>>> I will briefly mention my suspicion here. This Lenovo ThinkPad T14s being
>>> primarily targeting other OS using ACPI might have just implemented what is
>>> required for ACPI CPPC which conveniently doesn't have to discover lot of
>>> fastchannel details since they are supplied in the tables straight away.
>>> But that also would mean it could be not fully compliant to SCMI spec.
>>
>> Not fully compliant to the spec? I am pretty sure this series would
>> have been shot down completely and NAKd on the list by you if that
>> was the case lol.
>>
> 
> Honestly I am still trying to make any sense out of this vendor protocols.
> The documents produced as part of this series doesn't help me understand
> the same and that is my main feedback so far on this thread. I haven't
> looked at the code yet so I can't comment on the same as I first need
> to understand the vendor protocol document/specification.
>