[v2,01/24] EDAC, mc: Fix grain_bits calculation

Message ID 20190624150758.6695-2-rrichter@marvell.com
State New
Series
  • EDAC, mc, ghes: Fixes and updates to improve memory error reporting

Commit Message

Robert Richter June 24, 2019, 3:08 p.m.
The grain in EDAC is defined as the "minimum granularity for an error
report, in bytes". The current calculation of grain_bits in edac_mc
is wrong:

	grain_bits = fls_long(e->grain) + 1;

Here grain_bits is defined by the relation:

	grain = 1 << grain_bits

Example:

	grain = 8	# 64 bit (8 bytes)
	grain_bits = fls_long(8) + 1
	grain_bits = 4 + 1 = 5

	grain = 1 << grain_bits
	grain = 1 << 5 = 32

Replace it with the correct calculation:

	grain_bits = fls_long(e->grain - 1);

The example now gives:

	grain_bits = fls_long(8 - 1)
	grain_bits = fls_long(7)
	grain_bits = 3

	grain = 1 << 3 = 8
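
For readers who want to check the arithmetic outside the kernel, here is
a minimal userspace sketch. fls_long_sketch() is a hypothetical stand-in
for the kernel's fls_long() (1-based index of the most significant set
bit, 0 for an input of 0), and grain_bits_old() and grain_bits_new() are
illustrative names that do not appear in edac_mc.c.

```c
/*
 * Userspace stand-in for the kernel's fls_long(): returns the 1-based
 * position of the most significant set bit, or 0 if no bit is set.
 * This is an assumption for illustration, not the kernel implementation.
 */
static unsigned int fls_long_sketch(unsigned long x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Old calculation: two bits too many for a power-of-two grain. */
static unsigned int grain_bits_old(unsigned long grain)
{
	return fls_long_sketch(grain) + 1;
}

/* Fixed calculation: smallest n with (1UL << n) >= grain, for grain >= 1. */
static unsigned int grain_bits_new(unsigned long grain)
{
	return fls_long_sketch(grain - 1);
}
```

With grain = 8 the old helper returns 5 and the new one returns 3. A
grain that is not a power of two is rounded up by the new calculation,
for example grain = 12 yields 4 bits (16 bytes).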

Note: We need to check that the hardware reports a reasonable grain
(!= 0) and otherwise fall back to 1 byte granularity with a
WARN_ON_ONCE().
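
Why the zero check matters can be sketched in userspace as follows,
assuming the grain is handled as an unsigned long: with grain = 0 the
expression grain - 1 wraps to ULONG_MAX, so the bit count becomes the
full word width instead of a usable value. The helper names below are
hypothetical, and the fprintf() only approximates a warn-once; it is not
the kernel's WARN_ON_ONCE().

```c
#include <stdio.h>

/* Userspace stand-in for fls_long(); an assumption for illustration. */
static unsigned int fls_long_sketch(unsigned long x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/*
 * Hypothetical helper: replace a zero grain with 1 byte, warn only on
 * the first occurrence, then compute grain_bits with the fixed formula.
 */
static unsigned int grain_bits_checked(unsigned long *grain)
{
	static int warned;

	if (*grain == 0) {
		if (!warned) {
			fprintf(stderr, "grain is 0, falling back to 1 byte\n");
			warned = 1;
		}
		*grain = 1;
	}
	return fls_long_sketch(*grain - 1);
}
```

Passing a grain of 0 leaves the variable set to 1 and returns 0 bits,
so that 1 << grain_bits is again 1 byte.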

Signed-off-by: Robert Richter <rrichter@marvell.com>

---
 drivers/edac/edac_mc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

-- 
2.20.1

Comments

Borislav Petkov Aug. 3, 2019, 10:08 a.m. | #1
On Mon, Jun 24, 2019 at 03:08:55PM +0000, Robert Richter wrote:
> The grain in edac is defined as "minimum granularity for an error
> report, in bytes". The following calculation of the grain_bits in
> edac_mc is wrong:
> 
> 	grain_bits = fls_long(e->grain) + 1;
> 
> Where grain_bits is defined as:
> 
> 	grain = 1 << grain_bits
> 
> Example:
> 
> 	grain = 8	# 64 bit (8 bytes)
> 	grain_bits = fls_long(8) + 1
> 	grain_bits = 4 + 1 = 5
> 
> 	grain = 1 << grain_bits
> 	grain = 1 << 5 = 32
> 
> Replacing it with the correct calculation:
> 
> 	grain_bits = fls_long(e->grain - 1);
> 
> The example gives now:
> 
> 	grain_bits = fls_long(8 - 1)
> 	grain_bits = fls_long(8 - 1)
> 	grain_bits = 3
> 
> 	grain = 1 << 3 = 8
> 
> Note: We need to check if the hardware reports a reasonable grain != 0
> and fallback with a warn_once and 1 byte granularity otherwise.
> 
> Signed-off-by: Robert Richter <rrichter@marvell.com>
> ---
>  drivers/edac/edac_mc.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

Applied to the new EDAC repo:

https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-for-next

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Patch

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 64922c8fa7e3..45cac74ab833 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1235,9 +1235,15 @@  void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	if (p > e->location)
 		*(p - 1) = '\0';
 
-	/* Report the error via the trace interface */
-	grain_bits = fls_long(e->grain) + 1;
+	/*
+	 * We expect the hw to report a reasonable grain, fallback to
+	 * 1 byte granularity otherwise.
+	 */
+	if (WARN_ON_ONCE(!e->grain))
+		e->grain = 1;
+	grain_bits = fls_long(e->grain - 1);
 
+	/* Report the error via the trace interface */
 	if (IS_ENABLED(CONFIG_RAS))
 		trace_mc_event(type, e->msg, e->label, e->error_count,
 			       mci->mc_idx, e->top_layer, e->mid_layer,