mbox series

[RESEND,v3,00/17] EDAC/mc/synopsys: Various fixes and cleanups

Message ID 20220929232712.12202-1-Sergey.Semin@baikalelectronics.ru
Headers show
Series EDAC/mc/synopsys: Various fixes and cleanups | expand

Message

Serge Semin Sept. 29, 2022, 11:26 p.m. UTC
This patchset is a first one in the series created in the framework of
my Baikal-T1 DDRC-related work:

[1: In-progress] EDAC/mc/synopsys: Various fixes and cleanups
Link: ---you are looking at it---
[2: In-progress] EDAC/synopsys: Add generic DDRC info and address mapping
Link: https://lore.kernel.org/linux-edac/20220910195007.11027-1-Sergey.Semin@baikalelectronics.ru
[3: In-progress] EDAC/synopsys: Add generic resources and Baikal-T1 support
Link: https://lore.kernel.org/linux-edac/20220910195659.11843-1-Sergey.Semin@baikalelectronics.ru

Note the patchsets above must be merged in the same order as they are
placed in the list in order to prevent conflicts. Nothing prevents them
from being reviewed synchronously though. Any tests are very welcome.
Thanks in advance.

Regarding this series content. It's an initial patchset which
traditionally provides various fixes, cleanups and modifications required
for the more comfortable further features development. The main goal of it
though is to detach the Xilinx Zynq A05 DDRC related code into the
dedicated driver since first it has nothing to do with the Synopsys DW
uMCTL2 DDR controller and second it will be a great deal obstacle on the
way of extending the Synopsys-part functionality.

The series starts with fixes patches, which in short concern the next
aspects: touching the ZynqMP-specific CSRs on the Xilinx ZinqMP platform
only, serializing an access to the ECCCLR register, adding correct memory
devices type detection, setting a correct value to the
mem_ctl_info.scrub_cap field, dropping an erroneous ADDRMAP[4] parsing and
getting back a correct order of the ECC errors info detection procedure.

Afterwards the patchset provides several cleanup patches required for the
more coherent code splitting up (Xilinx Zynq A05 and Synopsys DW uMCTL2)
so the provided modifications would be useful in both drivers. First we
get to replace the platform resource manual IO-remapping with the
devm_platform_ioremap_resource() method call. Secondly we suggest to drop:
internal CE/UE errors counters, local to_mci() macros definition, some
redundant ecc_error_info structure fields and redundant info from the
error message, duplicated dimm->nr_pages debug printout and spaces from
the MEM_TYPE flags declarations. (The later two updates concern the MCI
core part.) Thirdly before splitting up the driver we need to add an
unique MC index allocation infrastructure to the MCI core.  It's required
since after splitting the driver up we'll need to make sure both device
types could be correctly probed on the same platform. Finally the Xilinx
Zynq A05 part of the driver is moved out to a dedicated driver where it
should been originally placed. After that the platform-specific setups API
is removed from the Synopsys DW uMCTL2 DDRC driver since it's no longer
required.

Finally as the cherry on the cake we suggest to unify the DW uMCTL2 DDRC
driver entities naming and replace the open-coded "shift/mask" patter with
the kernel helpers like BIT/GENMASK/FIELD_x in there. It shall
significantly improve the code readability.

Link: https://lore.kernel.org/linux-edac/20220822190730.27277-1-Sergey.Semin@baikalelectronics.ru/
Changelog v2:
- Move Synopsys DW uMCTL2 DDRC bindings file renaming to a separate patch.
  (@Krzysztof)
- Introduce a new compatible string "snps,dw-umctl2-ddrc" matching the new
  DT-schema name.
- Forgot to fix some of the prefix of the SYNPS_ZYNQMP_IRQ_REGS macro
  in several places. (@tbot)
- Drop the no longer used "priv" pointer from the mc_init() function.
  (@tbot)
- Include "linux/bitfield.h" header file to get the FIELD_GET macro
  definition. (@tbot)
- Drop the already merged in patches:
[PATCH 12/20] EDAC/mc: Replace spaces with tabs in memtype flags definition
[PATCH 13/20] EDAC/mc: Drop duplicated dimm->nr_pages debug printout

Link: https://lore.kernel.org/linux-edac/20220910194237.10142-1-Sergey.Semin@baikalelectronics.ru
Changelog v3:
- Drop the no longer used "priv" pointer from the mc_init() function.
  (@tbot)
- Drop the merged in patches:
[PATCH v2 14/19] dt-bindings: memory: snps: Detach Zynq DDRC controller support
[PATCH v2 15/19] dt-bindings: memory: snps: Use more descriptive device name
  (@Krzysztof)

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Michail Ivanov <Michail.Ivanov@baikalelectronics.ru>
Cc: Pavel Parkhomenko <Pavel.Parkhomenko@baikalelectronics.ru>
Cc: Punnaiah Choudary Kalluri <punnaiah.choudary.kalluri@xilinx.com>
Cc: Manish Narani <manish.narani@xilinx.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Robert Richter <rric@kernel.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: devicetree@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Serge Semin (17):
  EDAC/synopsys: Fix native uMCTL2 IRQs handling procedure
  EDAC/synopsys: Fix generic device type detection procedure
  EDAC/synopsys: Fix mci->scrub_cap field setting
  EDAC/synopsys: Drop erroneous ADDRMAP4.addrmap_col_b10 parse
  EDAC/synopsys: Fix reading errors count before ECC status
  EDAC/synopsys: Use platform device devm ioremap method
  EDAC/synopsys: Drop internal CE and UE counters
  EDAC/synopsys: Drop local to_mci macro implementation
  EDAC/synopsys: Drop struct ecc_error_info.blknr field
  EDAC/synopsys: Shorten out struct ecc_error_info.bankgrpnr field name
  EDAC/synopsys: Drop redundant info from error message
  EDAC/mc: Init DIMM labels in MC registration method
  EDAC/mc: Add MC unique index allocation procedure
  EDAC/synopsys: Detach Zynq DDRC controller support
  EDAC/synopsys: Drop unused platform-specific setup API
  EDAC/synopsys: Unify the driver entities naming
  EDAC/synopsys: Convert to using BIT/GENMASK/FIELD_x macros

 MAINTAINERS                  |   1 +
 drivers/edac/Kconfig         |   9 +-
 drivers/edac/Makefile        |   1 +
 drivers/edac/edac_mc.c       | 135 +++++-
 drivers/edac/edac_mc.h       |   4 +
 drivers/edac/synopsys_edac.c | 903 ++++++++++++-----------------------
 drivers/edac/zynq_edac.c     | 501 +++++++++++++++++++
 7 files changed, 927 insertions(+), 627 deletions(-)
 create mode 100644 drivers/edac/zynq_edac.c

Comments

Borislav Petkov Sept. 30, 2022, 2:29 p.m. UTC | #1
On Fri, Sep 30, 2022 at 02:26:55AM +0300, Serge Semin wrote:
> This patchset is a first one in the series created in the framework of
> my Baikal-T1 DDRC-related work:
> 
> [1: In-progress] EDAC/mc/synopsys: Various fixes and cleanups
> Link: ---you are looking at it---
> [2: In-progress] EDAC/synopsys: Add generic DDRC info and address mapping
> Link: https://lore.kernel.org/linux-edac/20220910195007.11027-1-Sergey.Semin@baikalelectronics.ru
> [3: In-progress] EDAC/synopsys: Add generic resources and Baikal-T1 support
> Link: https://lore.kernel.org/linux-edac/20220910195659.11843-1-Sergey.Semin@baikalelectronics.ru
> 
> Note the patchsets above must be merged in the same order as they are
> placed in the list in order to prevent conflicts. Nothing prevents them
> from being reviewed synchronously though. Any tests are very welcome.
> Thanks in advance.

So I'd take a look slowly but I'd like for this driver's maintainer -
Michal Simek - to have a look first.

Thx.
Borislav Petkov Sept. 30, 2022, 2:42 p.m. UTC | #2
On Fri, Sep 30, 2022 at 02:27:09AM +0300, Serge Semin wrote:
> It was a bad idea in the first place to combine two absolutely different
> controllers support in a single driver [1]. It caused having an additional
> level of abstraction, which obviously have needlessly overcomplicated the
> driver and as such caused many problems in the new main controller
> features support implementation. The solution looks even more unreasonable
> now seeing the justification of having both controllers support in a
> single driver hasn't been implemented by the original code author [2].

Yeah, no, you need to give more concrete details here.

Why exactly is this a problem?

Are you saying that if synopsys puts out 10 incompatible memory
controllers, we should have 10 separate EDAC drivers?

Hell no.

synopsys_edac.c is not a huge file and the probe logic which matches
which synps_platform_data to load is pretty straight-forward to me.

But maybe I'm missing something so please explain in detail what the
actual problems with this are?

Thx.
Serge Semin Oct. 8, 2022, 12:42 a.m. UTC | #3
On Thu, Oct 06, 2022 at 03:25:42PM +0200, Borislav Petkov wrote:
> On Thu, Oct 06, 2022 at 03:17:40PM +0300, Serge Semin wrote:
> > In general because it needlessly overcomplicates the driver, worsen
> > it scalability, maintainability and readability, makes it much harder
> > to add new controller features. Moreover even if you still able to add
> > some more features support the driver will get to be more and more messy
> > (as Michal correctly said in the original thread [1]).
> 

> Did you read that thread until the end?

Of course I did. Here is a short summary:

1. @Punnaiah described that he got a Zynq system with two different
DDR controllers. One of them was Synopsys DW uMCTL2 DDRC (ARM cortex A9
PS) and another one - Zynq A05 DDRC (Xilinx PL). All of these DDR
controllers could be probed in Linux. It was required to detect the ECC
errors for both of them.

2. @Punnaiah suggested that both of these controllers should have been
handled by two different drivers since the controllers were different.
But he was reasonably afraid that there could be a race condition for
the mc_num assignment since it was the drivers responsibility to
allocate one.

3. @Punnaiah suggested two solutions: 1) Keep two drivers in single file
and use static global variable for tracking the mc_num. 2) Let the
framework assign the mc_num to the edac driver.

4. @Punnaiah preferred the solution 2, but you said that the solution
1 would do it.

5. @Michal was against mixing up the support of two different devices
in a single driver. He reasonably assumed that the driver would get
to look messy. 

6. @Boris disagreed saying that it won't be messy and any
communication between the two memory controllers could be done
internally in that driver.

7. So you all decided to keep both controllers support in a single file,
but would get back to a discussion of having separate drivers for them
later.

But after all these years of the Synopsys EDAC driver living in kernel
we can conclude that combining the two different devices support in a
single driver was pointless. First by the driver design nobody ever
used the driver to handle both of them at the same time (MC index is
fixed to zero). Second there is no even a glimpse of communications
between two devices in the system. Third there are two different set
of macros and multiple platform-specific methods and if-flag-else
statements fully abstracting out the devices implementation from the
EDAC core.  All of that has made the driver overcomplicated in-vain
seeing none of the reasons of these complications have been realised.
Meanwhile extending further the Synopsys uMCTL2 DDR controller
functionality support would mean adding new macros, functions,
platform-specific flags, parameters and device-specific data. That
would have made the driver even more complicated. Dropping the
abstraction layer out and splitting up the drivers was the best choice
in the current circumstances especially seeing the driver author and
@Michal preferred that solution in the first place.

Moreover in order to cover a still possible case of having both
Synopsys uMCTL2 DDRC and Zynq A05 DDRC used on the same system in
future I have implemented the solution 2.

> 
> > It will get to be messy since you'll need to add more if-flag-else
> > conditionals or define new device-specific callbacks to select the
> > right code path for different controllers; you'll need to define
> > some private data specific to one controller and unused in another.
> > All of that you already have in the driver and all of that would be
> > unneeded should the driver author have followed the standard kernel
> > bus-device-driver model.
> 

> So lemme ask this again because the last time it all sounded weird and I
> don't think it got clarified fully: you cannot have more than one memory
> controller type at the same time on those systems, can you?

To be clear I don't work with the Xilinx systems. So I can only say based
on your discussion with @Punnaiah on the initial driver review.
@Punnaiah said they could have more than one memory controller type on
their systems. That's why he described the problem with the MC indexes
allocation and suggested to combine the devices support in a single
driver.

> 
> Because if you can and you want to support that, the current EDAC
> "design" allows to have only a single EDAC driver per system. So you
> can't do two drivers now.

I do understand that.

> 
> If your answer to that is
> 
> Subject: [PATCH RESEND v3 13/17] EDAC/mc: Add MC unique index allocation procedure
> 
> then I'm sceptical but I'd need to look at that first.

Em, if there is something else which makes the EDAC drivers to be
impossible to co-exist on the same system, then it greatly violates
the bus-device-driver model.

> 
> And reading your commit messages, you're talking a lot about what you're
> doing. But that should be visible from the patch. What I wanna read is
> *why* you're doing it.

Each patchlog has a thorough enough description of "why".

> 
> Like in this patch above, what's that "unique index allocation
> procedure" for?

Have you read the patchlog? 

> 
> edac_mc_alloc() already gets a mc_num as the MC number.
> 
> And yes, if you want to do multiple driver instances like x86 does per
> NUMA node instances, then that is done with edac_mc_alloc() which gives
> you a memory controller descriptor and then you can deal with those.
> 

See the "EDAC/mc: Add MC unique index allocation procedure" patch. It
also provides a way to assign the indexes based on the OF-aliases.

> From all the text it sounds to me you want to add a separate driver for
> that Zynq A05 thing but I might still be missing something...

I do want to detach the Zynq A05 device support from out of the
Synopsys uMCTL2 driver.

BTW have you read the cover letter? It contains a short summary of the
changes and their justification. It seems as if you started your
review straight from this patch.

-Sergey

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Oct. 12, 2022, 5:29 p.m. UTC | #4
On Fri, Sep 30, 2022 at 02:27:08AM +0300, Serge Semin wrote:
> In case of the unique index allocation it's not that optimal to always
> rely on the low-level device drivers (platform drivers), because they get
> to start to implement either the same design pattern (for instance global
> static MC counter) or may end-up with having non-unique index eventually
> at runtime. Needless to say that having a generic unique index
> allocation/tracking procedure will make code more readable and safer.

I guess this is trying to say that the current memory controller index
thing doesn't work. But why doesn't it work?

It works just fine with the x86 drivers - there the memory controller
number is the same as the node number where it is located so that works
just fine.

If that scheme cannot work on other systems, then I need to see an
explanation why it cannot work first.

> The suggested implementation is based on the kernel IDA infrastructure
> exposed by the lib/idr.c driver with API described in linux/idr.h header
> file. It's used to create an ID resource descriptor "mc_idr", which then
> is utilized either to track the custom MC idx specified by EDAC LLDDs or
> to allocate the next-free MC idx.

This is talking about the "what" and not the "why".

> A new special MC index is introduced here. It's defined by the
> EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least
> probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM
> index is specified by the EDAC LLDD, the MC index will be either retrieved
> from the MC device OF-node alias index ("mc[:number:]") or automatically
> generated as the next-free MC index found by the ID allocation procedure.

This is also talking about the "what" and not the "why".

At best, what you're doing should be visible from the patch itself.

Here's a longer explanation of how a commit message should look like:

https://kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

Thx.
Serge Semin Oct. 12, 2022, 7:27 p.m. UTC | #5
On Wed, Oct 12, 2022 at 07:28:48PM +0200, Borislav Petkov wrote:
> On Sat, Oct 08, 2022 at 03:42:24AM +0300, Serge Semin wrote:
> > Of course I did. Here is a short summary:
> 
> You didn't have to do a short summary but sure, if you prefer...

It seemed to me as I should have so to make sure we are on the page of
the original discussion understanding.

> 
> > 7. So you all decided to keep both controllers support in a single file,
> > but would get back to a discussion of having separate drivers for them
> > later.
> 
> Yes, pretty much.
> 
> > But after all these years of the Synopsys EDAC driver living in kernel
> > we can conclude that combining the two different devices support in a
> > single driver was pointless. First by the driver design nobody ever
> > used the driver to handle both of them at the same time (MC index is
> > fixed to zero).
> 

> So how was this supposed to work on his system?

How am I supposed to know? You should have asked him at the time of
his patches review before accepting them. He (Punnaiah Choudary
Kalluri) said they had the system with two different DDR controllers.
Since the MC idx was supposed to be provided by the controller driver
he suggested to have both devices support implemented in the same
driver. You agreed. If no interaction intended between the two devices
why did he need to combine their support in the same driver then? He
could have as well just split the drivers up and didn't bother with
all the discussions.

> 
> If you have a system with two different memory controllers 

I don't. The Synopsys EDAC driver author (Punnaiah) did judging by
what he said in your discussion.

> and you want
> to have two different EDAC drivers for each, then the whole EDAC core
> code needs to be audited wrt concurrency and synchronizing access to
> its facilities because I don't think it has ever supported more than a
> single EDAC driver per system.

Once again. What is driver-depended do you have in the EDAC core?
Does it follow the bus-device-drivers model? I failed to find any
inter-driver dependency (what could be seen for instance in the
tty/serial subsystem). But I am not the subsystem maintainer after all.
I found the possible races in the MC index registration and fixed it
in the corresponding patch. But that was the just
registration-specifics behavior which could be easily fixed in the
same way as the most of the kernel subsystem do.

You are worried about the concurrencies. Does the EDAC core have
problems if there are several DDR devices of the same type probed?
AFAICS it doesn't as long as the indexes are properly allocated by the
driver. What is the problem with registering devices of different
types? The error counters are the MC-data-specific, so are the
sysfs-nodes. The EDAC MC error handler function doesn't touch any
inter-device parts of the core. The registration procedure is
protected by the mutex and RCU. So it seems as the EDAC core developer
thought about having the devices being registered concurrently.

> 
> And it has never needed to, at least not on x86 land. Which is
> currently changing because of CXL, because of accelerators needing
> RAS, GPUs needing RAS and so on and so on. So eventually we'd have to
> either put the new RAS functionality in the existing chipset-specific
> driver or have to go the multiple EDAC drivers route. But that's only
> tangential...

If it has never needed to, then please explain why did you let the
Synopsys EDAC driver being accepted like that then? 

> 
> So first I'd like to hear what your use case is: single EDAC driver for
> your particular Baical-T1 device or you need to support multiple EDAC
> drivers.

In my case it's a single EDAC driver per-chip. There can be several
DDR-controllers installed on the same SoC, but all of them of the same
type (Synopsys DW uMCTL2 v2.61a).

> 
> If so, why?

I don't have a system with different DDR controllers detected on the
same SoC but Punnaiah Choudary Kalluri, original Synopsys EDAC driver
developer, did.

> 
> > Moreover in order to cover a still possible case of having both
> > Synopsys uMCTL2 DDRC and Zynq A05 DDRC used on the same system in
> > future I have implemented the solution 2.
> 
> See above.
> 
> > Em, if there is something else which makes the EDAC drivers to be
> > impossible to co-exist on the same system, then it greatly violates
> > the bus-device-driver model.
> 
> If by that you mean the aspect of a driver associating with a device and
> performing operations with it then why do you assume that EDAC drivers
> have to adhere to that model?
> 
> > Have you read the patchlog?
> 
> Lemme reply to it directly.
> 
> > BTW have you read the cover letter? It contains a short summary of the
> > changes and their justification.
> 

> Yes, I have read it and it contains a lot of unnecessary detail which
> should be in the respective patches themselves. And I still don't know
> exactly what *you* are trying to do, as I said above.
> 
> A cover letter should contain a short executive summary explaining only
> the goal of the patchset and then you can go into details if you prefer.
> A reviewer should not have to dig into patch management details to know
> what this patchset is trying to do.

The main part of the letter starts exactly with the goals and then has
text with more details of what is done and why. It is enough to get a
notion regarding the patchset aim and content. Each patchlog starts
with the problem description, the suggested solution and some details
of the implementation I thought was required to add. Everything in
accordance with the kernel patches standards.

-Sergey

> 
> A possible structure could be:
> 
> Problem is A.
> 
> It happens because of B.
> 
> Fix it by doing C.
> 
> (Potentially do D).
> 
> Btw, when you're writing your commit messages, please use passive voice
> in your commit message: no "we" or "I", etc, and describe your changes
> in imperative mood.
> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Oct. 12, 2022, 7:44 p.m. UTC | #6
On Wed, Oct 12, 2022 at 10:27:43PM +0300, Serge Semin wrote:
> ... inter-device parts of the core. The registration procedure is
> protected by the mutex and RCU. So it seems as the EDAC core developer

Seems, schmeems. As I said already, EDAC has always had a single
chipset-specific driver. Period. So if one needs to run more than one
chipset-specific driver concurrently, then the whole code needs to be
audited because this hasn't been done before.

> If it has never needed to, then please explain why did you let the
> Synopsys EDAC driver being accepted like that then? 

I think I already did.

> In my case it's a single EDAC driver per-chip. There can be several
> DDR-controllers installed on the same SoC, but all of them of the same
> type (Synopsys DW uMCTL2 v2.61a).

Good.

I'll look at your patches as time allows.
Serge Semin Oct. 12, 2022, 8:01 p.m. UTC | #7
On Wed, Oct 12, 2022 at 07:29:22PM +0200, Borislav Petkov wrote:
> On Fri, Sep 30, 2022 at 02:27:08AM +0300, Serge Semin wrote:
> > In case of the unique index allocation it's not that optimal to always
> > rely on the low-level device drivers (platform drivers), because they get
> > to start to implement either the same design pattern (for instance global
> > static MC counter) or may end-up with having non-unique index eventually
> > at runtime. Needless to say that having a generic unique index
> > allocation/tracking procedure will make code more readable and safer.
> 

> I guess this is trying to say that the current memory controller index
> thing doesn't work. But why doesn't it work?
Borislav Petkov Oct. 12, 2022, 8:33 p.m. UTC | #8
On Wed, Oct 12, 2022 at 11:01:54PM +0300, Serge Semin wrote:
> The unified approach makes code indeed more readable in the platform
> drivers and safer since they didn't have to bother with more coding.
> See for instance the drivers with the static variable-based IDs
> allocation.

Which drivers? Concrete examples please.

> Have you read it yourself? 

Yes. I even have improved it over the years.

> Here is a short excerpt from there:
> "Once the problem is established, describe what you are actually doing
> about it in technical detail.  It's important to describe the change
> in plain English for the reviewer to verify that the code is behaving
> as you intend it to."

Maybe that part can be misunderstood: "describe what you're doing about
it". That doesn't mean the text should explain what you're adding and
how stuff is defined: "It's defined by the EDAC_AUTO_MC_NUM macro." I
can see that from the diff.

So let me try to explain to you what I'm expecting from commit messages
in the EDAC tree:

The commit message should explain *why* a change is being done so that,
months, years from now, when you've gone on to do something else, people
doing git archeology can actually figure out *why* this change was done.

And the explanation in that commit message should be *complete* and
should contain *all* necessary information to understand why this change
was done.

Your commit message is not explaining the problem.

"In case of the unique index allocation it's not that optimal to always
rely on the low-level device drivers (platform drivers)"

That's your statement. That needs to have exact details so that people
can look at that commit message, look at the code which *you* point them
to in it and go, aha, that is the problem.

"because they get to start to implement either the same design pattern
(for instance global static MC counter) or may end-up with having
non-unique index eventually at runtime."

Who are they, exact pointers please.

"The suggested implementation is based on the kernel IDA infrastructure
exposed by the lib/idr.c driver with API described in linux/idr.h header
file."

That doesn't matter one bit for the change you're doing. You could have
added it under the "---" line.

"A new special MC index is introduced here. It's defined by the
EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least
probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM
index is specified by the EDAC LLDD, the MC index will be either retrieved
from the MC device OF-node alias index ("mc[:number:]") or automatically
generated as the next-free MC index found by the ID allocation procedure."

Some of that paragraph should go over the function as a comment - not in
the commit message as it pertains to what the function does and it would
make a *lot* more sense there when someone tries to figure out what the
function does instead of in the commit message.

So, I'm still not convinced why do some EDAC drivers need unique MC
identifiers, why the current scheme doesn't work and where it doesn't
work.
Serge Semin Oct. 12, 2022, 8:55 p.m. UTC | #9
On Wed, Oct 12, 2022 at 09:44:00PM +0200, Borislav Petkov wrote:
> On Wed, Oct 12, 2022 at 10:27:43PM +0300, Serge Semin wrote:
> > ... inter-device parts of the core. The registration procedure is
> > protected by the mutex and RCU. So it seems as the EDAC core developer
> 
> Seems, schmeems. As I said already, EDAC has always had a single
> chipset-specific driver. Period. So if one needs to run more than one
> chipset-specific driver concurrently, then the whole code needs to be
> audited because this hasn't been done before.
> 
> > If it has never needed to, then please explain why did you let the
> > Synopsys EDAC driver being accepted like that then? 
> 
> I think I already did.

Kind of. What you didn't explain was the driver-specific problem in the
edac_mc core. What is the difference in the EDAC core handling
two devices (including of difference types) on the same platform and
handling the same devices each probed by two different drivers? (Consider
the drivers are designed thread-safe and we are talking about the EDAC
MC core.)

> 
> > In my case it's a single EDAC driver per-chip. There can be several
> > DDR-controllers installed on the same SoC, but all of them of the same
> > type (Synopsys DW uMCTL2 v2.61a).
> 
> Good.
> 
> I'll look at your patches as time allows.

Ok. Thanks in advance.

-Sergey

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette