Message ID | 20190624150758.6695-2-rrichter@marvell.com |
---|---|
State | New |
Series | EDAC, mc, ghes: Fixes and updates to improve memory error reporting |
On Mon, Jun 24, 2019 at 03:08:55PM +0000, Robert Richter wrote:
> The grain in edac is defined as "minimum granularity for an error
> report, in bytes". The following calculation of the grain_bits in
> edac_mc is wrong:
>
>   grain_bits = fls_long(e->grain) + 1;
>
> Where grain_bits is defined as:
>
>   grain = 1 << grain_bits
>
> Example:
>
>   grain = 8	# 64 bit (8 bytes)
>   grain_bits = fls_long(8) + 1
>   grain_bits = 4 + 1 = 5
>
>   grain = 1 << grain_bits
>   grain = 1 << 5 = 32
>
> Replacing it with the correct calculation:
>
>   grain_bits = fls_long(e->grain - 1);
>
> The example gives now:
>
>   grain_bits = fls_long(8 - 1)
>   grain_bits = fls_long(7)
>   grain_bits = 3
>
>   grain = 1 << 3 = 8
>
> Note: We need to check if the hardware reports a reasonable grain != 0
> and fallback with a warn_once and 1 byte granularity otherwise.
>
> Signed-off-by: Robert Richter <rrichter@marvell.com>
> ---
>  drivers/edac/edac_mc.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

Applied to the new EDAC repo:

https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-for-next

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 64922c8fa7e3..45cac74ab833 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1235,9 +1235,15 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	if (p > e->location)
 		*(p - 1) = '\0';
 
-	/* Report the error via the trace interface */
-	grain_bits = fls_long(e->grain) + 1;
+	/*
+	 * We expect the hw to report a reasonable grain, fallback to
+	 * 1 byte granularity otherwise.
+	 */
+	if (WARN_ON_ONCE(!e->grain))
+		e->grain = 1;
+	grain_bits = fls_long(e->grain - 1);
 
+	/* Report the error via the trace interface */
 	if (IS_ENABLED(CONFIG_RAS))
 		trace_mc_event(type, e->msg, e->label, e->error_count,
 			       mci->mc_idx, e->top_layer, e->mid_layer,
The grain in edac is defined as "minimum granularity for an error
report, in bytes". The following calculation of the grain_bits in
edac_mc is wrong:

  grain_bits = fls_long(e->grain) + 1;

Where grain_bits is defined as:

  grain = 1 << grain_bits

Example:

  grain = 8	# 64 bit (8 bytes)
  grain_bits = fls_long(8) + 1
  grain_bits = 4 + 1 = 5

  grain = 1 << grain_bits
  grain = 1 << 5 = 32

Replacing it with the correct calculation:

  grain_bits = fls_long(e->grain - 1);

The example gives now:

  grain_bits = fls_long(8 - 1)
  grain_bits = fls_long(7)
  grain_bits = 3

  grain = 1 << 3 = 8

Note: We need to check if the hardware reports a reasonable grain != 0
and fallback with a warn_once and 1 byte granularity otherwise.

Signed-off-by: Robert Richter <rrichter@marvell.com>
---
 drivers/edac/edac_mc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

-- 
2.20.1
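
The arithmetic in the commit message can be checked outside the kernel with a small userspace sketch. This is not the kernel code: fls_long_sketch() is a hypothetical stand-in that assumes the kernel's fls_long() semantics (1-based position of the most significant set bit, 0 for an input of 0), and grain_bits_old() and grain_bits_new() are illustrative names for the calculation before and after the patch. The WARN_ON_ONCE() from the patch is reduced to a comment.

```c
/*
 * Hypothetical userspace stand-in for the kernel's fls_long():
 * returns the 1-based position of the most significant set bit,
 * or 0 if no bit is set.
 */
static unsigned int fls_long_sketch(unsigned long x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Calculation before the patch: two bits too many for a power-of-two grain. */
static unsigned int grain_bits_old(unsigned long grain)
{
	return fls_long_sketch(grain) + 1;
}

/* Calculation from the patch, including the fallback for grain == 0. */
static unsigned int grain_bits_new(unsigned long grain)
{
	if (!grain)
		grain = 1;	/* the patch additionally warns once here */
	return fls_long_sketch(grain - 1);
}
```

For grain = 8, grain_bits_old() yields 5 (1 << 5 = 32) and grain_bits_new() yields 3 (1 << 3 = 8), matching the example in the commit message. For a grain that is not a power of two, the new calculation rounds up to the next power of two.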