[v2,0/3] ghes_edac: refactor memory error reporting to avoid code duplication

Message ID	20211210134019.28536-1-xueshuai@linux.alibaba.com
Headers	show Return-Path: <linux-efi-owner@kernel.org> From: Shuai Xue <xueshuai@linux.alibaba.com> To: mchehab@kernel.org, bp@alien8.de, tony.luck@intel.com, james.morse@arm.com, rric@kernel.org, ardb@kernel.org, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org Cc: xueshuai@linux.alibaba.com, zhangliguang@linux.alibaba.com, zhuo.song@linux.alibaba.com Subject: [PATCH v2 0/3] ghes_edac: refactor memory error reporting to avoid code duplication Date: Fri, 10 Dec 2021 21:40:16 +0800 Message-Id: <20211210134019.28536-1-xueshuai@linux.alibaba.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	ghes_edac: refactor memory error reporting to avoid code duplication \| expand [v2,0/3] ghes_edac: refactor memory error reporting to avoid code duplication [v2,1/3] ghes_edac: unify memory error report format with cper [v2,2/3] ghes_edac: refactor memory error location processing [v2,3/3] ghes_edac: refactor error status fields decoding

Shuai Xue Dec. 10, 2021, 1:40 p.m. UTC

ghes_edac_report_mem_error() in ghes_edac.c is Long Method and have
Duplicated Code with cper_mem_err_location(), cper_dimm_err_location(), and
cper_mem_err_type_str() in drivers/firmware/efi/cper.c. In addition, the
cper_print_mem() in drivers/firmware/efi/cper.c only reports the error
status and misses its description.

This patch set is to refactor ghes_edac_report_mem_error with:

- Patch 01 is to unify memory error report format with cper;
- Patch 02 is to introduces cper_*(), into ghes_edac_report_mem_error(),
  this can avoid bunch of duplicate code lines;
- Patch 02 is to wrap up error status decoding logics and reuse it in
    cper_print_mem().

Changes since v1:
https://lore.kernel.org/all/20211207031905.61906-2-xueshuai@linux.alibaba.com/

- add a new patch to unify ghes and cper before removing duplication.
- document the changes in patch description
- add EXPORT_SYMBOL_GPL()s for cper_*()
- document and the dependency and add UEFI_CPER dependency explicitly
Thanks Robert Richter for review comments.

Shuai Xue (3):
  ghes_edac: unify memory error report format with cper
  ghes_edac: refactor memory error location processing
  ghes_edac: refactor error status fields decoding

 drivers/edac/Kconfig        |   1 +
 drivers/edac/ghes_edac.c    | 196 +++++++-----------------------------
 drivers/firmware/efi/cper.c |  86 ++++++++++++----
 include/linux/cper.h        |   3 +
 4 files changed, 105 insertions(+), 181 deletions(-)

Borislav Petkov Jan. 24, 2022, 9:16 p.m. UTC | #1

On Mon, Jan 24, 2022 at 10:47:58AM +0800, Shuai Xue wrote:
> Introduce a new helper function cper_mem_err_status_str() which is used to
> decode the description of error status, and the cper_print_mem() will call
> it and report the details of error status.
> 
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
>  drivers/firmware/efi/cper.c | 46 ++++++++++++++++++++++++++++++++++++-
>  include/linux/cper.h        |  1 +
>  2 files changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 6ec8edec6329..addafccecd84 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -211,6 +211,49 @@ const char *cper_mem_err_type_str(unsigned int etype)
>  }
>  EXPORT_SYMBOL_GPL(cper_mem_err_type_str);
>  
> +const char *cper_mem_err_status_str(u64 status)
> +{
> +	switch ((status >> 8) & 0xff) {
> +	case 1:
> +		return "Error detected internal to the component";

You can make that table a lot more compact:

        switch ((status >> 8) & 0xff) {
        case  1:        return "Error detected internal to the component";
        case  4:        return "Storage error in DRAM memory";
        case  5:        return "Storage error in TLB";
        case  6:        return "Storage error in cache";
        case  7:        return "Error in one or more functional units";
        case  8:        return "component failed self test";
        case  9:        return "Overflow or undervalue of internal queue";
        case 16:        return "Error detected in the bus";
	...

> +	case 16:
> +		return "Error detected in the bus";

And yes, that 16 needs to come before 17, ofc.

> @@ -334,7 +377,8 @@ static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem,
>  		return;
>  	}
>  	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
> -		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
> +		printk("%s""error_status: 0x%016llx, %s\n", pfx, mem->error_status,
> +				cper_mem_err_status_str(mem->error_status));

Arguments need to be aligned at opening brace, i.e.:

                printk("%s""error_status: 0x%016llx, %s\n",
                        pfx, mem->error_status, cper_mem_err_status_str(mem->error_status));


Also, the naked error status number is not as user-friendly when we have
the decoded string. So the format should be:

                printk("%s error_status: %s (0x%016llx)\n",
                        pfx, cper_mem_err_status_str(mem->error_status), mem->error_status);

Shuai Xue Jan. 25, 2022, 2:45 a.m. UTC | #2

Hi, Borislav,

Thank you for your valuable comments.

在 2022/1/25 AM5:16, Borislav Petkov 写道:
> On Mon, Jan 24, 2022 at 10:47:58AM +0800, Shuai Xue wrote:
>> Introduce a new helper function cper_mem_err_status_str() which is used to
>> decode the description of error status, and the cper_print_mem() will call
>> it and report the details of error status.
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
>>  drivers/firmware/efi/cper.c | 46 ++++++++++++++++++++++++++++++++++++-
>>  include/linux/cper.h        |  1 +
>>  2 files changed, 46 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
>> index 6ec8edec6329..addafccecd84 100644
>> --- a/drivers/firmware/efi/cper.c
>> +++ b/drivers/firmware/efi/cper.c
>> @@ -211,6 +211,49 @@ const char *cper_mem_err_type_str(unsigned int etype)
>>  }
>>  EXPORT_SYMBOL_GPL(cper_mem_err_type_str);
>>  
>> +const char *cper_mem_err_status_str(u64 status)
>> +{
>> +	switch ((status >> 8) & 0xff) {
>> +	case 1:
>> +		return "Error detected internal to the component";
> 
> You can make that table a lot more compact:
> 
>         switch ((status >> 8) & 0xff) {
>         case  1:        return "Error detected internal to the component";
>         case  4:        return "Storage error in DRAM memory";
>         case  5:        return "Storage error in TLB";
>         case  6:        return "Storage error in cache";
>         case  7:        return "Error in one or more functional units";
>         case  8:        return "component failed self test";
>         case  9:        return "Overflow or undervalue of internal queue";
>         case 16:        return "Error detected in the bus";
> 	...
> 
>> +	case 16:
>> +		return "Error detected in the bus";
> 
> And yes, that 16 needs to come before 17, ofc.

I will fix it in next version.

>> @@ -334,7 +377,8 @@ static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem,
>>  		return;
>>  	}
>>  	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
>> -		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
>> +		printk("%s""error_status: 0x%016llx, %s\n", pfx, mem->error_status,
>> +				cper_mem_err_status_str(mem->error_status));
> 
> Arguments need to be aligned at opening brace, i.e.:
> 
>                 printk("%s""error_status: 0x%016llx, %s\n",
>                         pfx, mem->error_status, cper_mem_err_status_str(mem->error_status));
> 
> 
> Also, the naked error status number is not as user-friendly when we have
> the decoded string. So the format should be:
> 
>                 printk("%s error_status: %s (0x%016llx)\n",
>                         pfx, cper_mem_err_status_str(mem->error_status), mem->error_status);
> 

Good point. Will fix it.

Best Regard,
Shuai

Borislav Petkov Jan. 25, 2022, 10:21 a.m. UTC | #3

On Tue, Jan 25, 2022 at 10:45:31AM +0800, Shuai Xue wrote:
> I will fix it in next version.

Yes, thanks.

However, you don't have to resend immediately but wait instead until
people have had time to review the whole thing. And while you're
waiting, you can read through Documentation/process/...

There are passages like the following one, for example:

"Don't get discouraged - or impatient
------------------------------------

After you have submitted your change, be patient and wait.  Reviewers are
busy people and may not get to your patch right away.

Once upon a time, patches used to disappear into the void without comment,
but the development process works more smoothly than that now.  You should
receive comments within a week or so; if that does not happen, make sure
that you have sent your patches to the right place.  Wait for a minimum of
one week before resubmitting or pinging reviewers - possibly longer during
busy times like merge windows.

It's also ok to resend the patch or the patch series after a couple of
weeks with the word "RESEND" added to the subject line::

   [PATCH Vx RESEND] sub/sys: Condensed patch summary

Don't add "RESEND" when you are submitting a modified version of your
patch or patch series - "RESEND" only applies to resubmission of a
patch or patch series which have not been modified in any way from the
previous submission."

Shuai Xue Jan. 25, 2022, 11:49 a.m. UTC | #4

Hi, Borislav,

Thank you for your reply.

I am sorry if you feel the RESEND tag is pushing you.

Take your time, I will be more patient :)

Best Regards,
Shuai

在 2022/1/25 PM6:21, Borislav Petkov 写道:
> On Tue, Jan 25, 2022 at 10:45:31AM +0800, Shuai Xue wrote:
>> I will fix it in next version.
> 
> Yes, thanks.
> 
> However, you don't have to resend immediately but wait instead until
> people have had time to review the whole thing. And while you're
> waiting, you can read through Documentation/process/...
> 
> There are passages like the following one, for example:
> 
> "Don't get discouraged - or impatient
> ------------------------------------
> 
> After you have submitted your change, be patient and wait.  Reviewers are
> busy people and may not get to your patch right away.
> 
> Once upon a time, patches used to disappear into the void without comment,
> but the development process works more smoothly than that now.  You should
> receive comments within a week or so; if that does not happen, make sure
> that you have sent your patches to the right place.  Wait for a minimum of
> one week before resubmitting or pinging reviewers - possibly longer during
> busy times like merge windows.
> 
> It's also ok to resend the patch or the patch series after a couple of
> weeks with the word "RESEND" added to the subject line::
> 
>    [PATCH Vx RESEND] sub/sys: Condensed patch summary
> 
> Don't add "RESEND" when you are submitting a modified version of your
> patch or patch series - "RESEND" only applies to resubmission of a
> patch or patch series which have not been modified in any way from the
> previous submission."
>

Borislav Petkov Jan. 25, 2022, 12:37 p.m. UTC | #5

On Tue, Jan 25, 2022 at 07:49:41PM +0800, Shuai Xue wrote:
> I am sorry if you feel the RESEND tag is pushing you.

It is not pushing me - there are rules, simply. Rules you should read
first before sending patches.

How about I start flooding you a patchset every day?

Also, please do not top-post. That's also explained in that
documentation directory I mentioned. Read about it too pls.

Shuai Xue Jan. 25, 2022, 3:02 p.m. UTC | #6

Hi Borislav,

在 2022/1/25 PM8:37, Borislav Petkov 写道:
> On Tue, Jan 25, 2022 at 07:49:41PM +0800, Shuai Xue wrote:
>> I am sorry if you feel the RESEND tag is pushing you.
> 
> It is not pushing me - there are rules, simply. Rules you should read
> first before sending patches.

Got it. I will learn rules first.

> How about I start flooding you a patchset every day?

Haha, I see. I am sorry to bother you and thank you very much for your patient
and valuable comments, just take your time. By the way, after sending patchset
v3 8 days, I resend it yesterday, and the patchset v4 sent today is to address
your comments, not a resend patchset. Anyway, I will be more patient.

> Also, please do not top-post. That's also explained in that
> documentation directory I mentioned. Read about it too pls.

I will, thank you.

Best Regards,
Shuai

Borislav Petkov Jan. 25, 2022, 7:08 p.m. UTC | #7

On Tue, Jan 25, 2022 at 10:49:38AM +0800, Shuai Xue wrote:
> Introduce a new helper function cper_mem_err_status_str() which is used to
> decode the description of error status, and the cper_print_mem() will call
> it and report the details of error status.
> 
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
>  drivers/firmware/efi/cper.c | 29 ++++++++++++++++++++++++++++-
>  include/linux/cper.h        |  1 +
>  2 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 6ec8edec6329..7f08d4ea906e 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -211,6 +211,31 @@ const char *cper_mem_err_type_str(unsigned int etype)
>  }
>  EXPORT_SYMBOL_GPL(cper_mem_err_type_str);
>  
> +const char *cper_mem_err_status_str(u64 status)
> +{
> +	switch ((status >> 8) & 0xff) {
> +	case  1:	return "Error detected internal to the component";
> +	case  4:	return "Storage error in DRAM memory";
> +	case  5:	return "Storage error in TLB";
> +	case  6:	return "Storage error in cache";
> +	case  7:	return "Error in one or more functional units";
> +	case  8:	return "component failed self test";

Well, at least start them all with capital letters: "Component... " And
yes, I know this is how it is in the spec but the spec has typos and
other problems - doesn't mean we have to copy them too.

> +	case  9:	return "Overflow or undervalue of internal queue";
> +	case 16:	return "Error detected in the bus";
> +	case 17:	return "Virtual address not found on IO-TLB or IO-PDIR";
> +	case 18:	return "Improper access error";
> +	case 19:	return "Access to a memory address which is not mapped to any component";
> +	case 20:	return "Loss of Lockstep";
> +	case 21:	return "Response not associated with a request";
> +	case 22:	return "Bus parity error - must also set the A, C, or D Bits";
> +	case 23:	return "Detection of a PATH_ERROR ";

Trailing space here. Also what is PATH_ERROR?

That "PATH_ERROR" is nowhere else explained in that big fat UEFI spec.
2558 pages and they can't explain *that*. Geez.

> +	case 25:	return "Bus operation timeout";
> +	case 26:	return "A read was issued to data that has been poisoned";
> +	default:	return "reserved";
> +	}
> +}
> +EXPORT_SYMBOL_GPL(cper_mem_err_status_str);
> +
>  static int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
>  {
>  	u32 len, n;
> @@ -334,7 +359,9 @@ static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem,
>  		return;
>  	}
>  	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
> -		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
> +		printk("%s""error_status: %s (0x%016llx)\n",

Why do you insist on having two back-to-back strings instead of one
here?

(And don't tell me it is because the other function calls here do it
too.)

FWIW, even checkpatch complains here:

WARNING: Consecutive strings are generally better as a single string
#87: FILE: drivers/firmware/efi/cper.c:362:
+               printk("%s""error_status: %s (0x%016llx)\n",

Btw, please integrate scripts/checkpatch.pl into your patch creation
workflow. Some of the warnings/errors *actually* make sense.

Shuai Xue Jan. 26, 2022, 5:25 a.m. UTC | #8

Hi, Borislav,

Thank you for your comments.

在 2022/1/26 AM3:08, Borislav Petkov 写道:
> On Tue, Jan 25, 2022 at 10:49:38AM +0800, Shuai Xue wrote:
>> Introduce a new helper function cper_mem_err_status_str() which is used to
>> decode the description of error status, and the cper_print_mem() will call
>> it and report the details of error status.
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
>>  drivers/firmware/efi/cper.c | 29 ++++++++++++++++++++++++++++-
>>  include/linux/cper.h        |  1 +
>>  2 files changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
>> index 6ec8edec6329..7f08d4ea906e 100644
>> --- a/drivers/firmware/efi/cper.c
>> +++ b/drivers/firmware/efi/cper.c
>> @@ -211,6 +211,31 @@ const char *cper_mem_err_type_str(unsigned int etype)
>>  }
>>  EXPORT_SYMBOL_GPL(cper_mem_err_type_str);
>>  
>> +const char *cper_mem_err_status_str(u64 status)
>> +{
>> +	switch ((status >> 8) & 0xff) {
>> +	case  1:	return "Error detected internal to the component";
>> +	case  4:	return "Storage error in DRAM memory";
>> +	case  5:	return "Storage error in TLB";
>> +	case  6:	return "Storage error in cache";
>> +	case  7:	return "Error in one or more functional units";
>> +	case  8:	return "component failed self test";
> 
> Well, at least start them all with capital letters: "Component... " And
> yes, I know this is how it is in the spec but the spec has typos and
> other problems - doesn't mean we have to copy them too.

You are right, I will fix it in next version.

>> +	case  9:	return "Overflow or undervalue of internal queue";
>> +	case 16:	return "Error detected in the bus";
>> +	case 17:	return "Virtual address not found on IO-TLB or IO-PDIR";
>> +	case 18:	return "Improper access error";
>> +	case 19:	return "Access to a memory address which is not mapped to any component";
>> +	case 20:	return "Loss of Lockstep";
>> +	case 21:	return "Response not associated with a request";
>> +	case 22:	return "Bus parity error - must also set the A, C, or D Bits";
>> +	case 23:	return "Detection of a PATH_ERROR ";
> 
> Trailing space here.

Sorry, will delete it.

> Also what is PATH_ERROR?
> 
> That "PATH_ERROR" is nowhere else explained in that big fat UEFI spec.
> 2558 pages and they can't explain *that*. Geez.

I don't know either. A related item I found is "iSCSI Device Path error".
Section 10 defines the device path protocol, but I don't know if "PATH_ERROR"
means the path of the device is not found, or something else.

>> +	case 25:	return "Bus operation timeout";
>> +	case 26:	return "A read was issued to data that has been poisoned";
>> +	default:	return "reserved";
>> +	}
>> +}
>> +EXPORT_SYMBOL_GPL(cper_mem_err_status_str);
>> +
>>  static int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
>>  {
>>  	u32 len, n;
>> @@ -334,7 +359,9 @@ static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err *mem,
>>  		return;
>>  	}
>>  	if (mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS)
>> -		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
>> +		printk("%s""error_status: %s (0x%016llx)\n",
> 
> Why do you insist on having two back-to-back strings instead of one
> here?
> 
> (And don't tell me it is because the other function calls here do it
> too.)
> 
> FWIW, even checkpatch complains here:
> 
> WARNING: Consecutive strings are generally better as a single string
> #87: FILE: drivers/firmware/efi/cper.c:362:
> +               printk("%s""error_status: %s (0x%016llx)\n",
> 
> Btw, please integrate scripts/checkpatch.pl into your patch creation
> workflow. Some of the warnings/errors *actually* make sense.

Sorry, I did integrate scripts/checkpatch.pl before sending patch. And as you see,
other function calls here do the same, so I ignored the warnings. I will change as
your comments in next version:

-		printk("%s""error_status: 0x%016llx\n", pfx, mem->error_status);
+		printk("%s error_status: %s (0x%016llx)\n",


Best regards,
Shuai

Shuai Xue Jan. 26, 2022, 9:26 a.m. UTC | #9

Hi, Borislav,

在 2022/1/26 PM4:20, Borislav Petkov 写道:
> On Wed, Jan 26, 2022 at 04:17:00PM +0800, Shuai Xue wrote:
>> ghes_edac_report_mem_error() in ghes_edac.c is a Long Method and have
>> Duplicated Code with cper_mem_err_location(), cper_dimm_err_location(), and
>> cper_mem_err_type_str() in drivers/firmware/efi/cper.c. In addition, the
>> cper_print_mem() in drivers/firmware/efi/cper.c only reports the error
>> status and misses its description.
> 
> Dude, what about
> 
> 	wait for a week or until the patchset has been fully reviewed
> 
> don't you understand?!
> 
> Please let me know what about the review process is not clear to you so
> that we can document it better.

Emmm, when I received your replied email, I thought you had fully reviewed them. So
I work to address your comments and reply as soon as possible. Sorry, I misunderstood.

Of course, I can wait. As I said before, take your time.

By the way, I have a question about review process: after waiting for a period
of time, how can I tell whether you have no comments or are still in review process?

Best Regards,
Shuai

Borislav Petkov Jan. 26, 2022, 10:16 a.m. UTC | #10

On January 26, 2022 9:26:01 AM UTC, Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>By the way, I have a question about review process: after waiting for a period
>of time, how can I tell whether you have no comments or are still in review process?
>

A good sign for when review is done is to wait to see replies to every patch.

BUT, there are other people on CC too so they would need to get a chance to have a look too.

Regardless, you wait for a week and then you incorporate all review comments and resend - not before.

This constant spamming with the patchset is not productive. You're not the only one who sends patches and wants review - you should consider that there are others who would need to get reviewed too.

Shuai Xue Jan. 27, 2022, 2:28 a.m. UTC | #11

Hi, Borislav,

在 2022/1/26 PM6:16, Boris Petkov 写道:
> On January 26, 2022 9:26:01 AM UTC, Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>> By the way, I have a question about review process: after waiting for a period
>> of time, how can I tell whether you have no comments or are still in review process?
>>
> 
> A good sign for when review is done is to wait to see replies to every patch.
> 
> BUT, there are other people on CC too so they would need to get a chance to have a look too.
> 
> Regardless, you wait for a week and then you incorporate all review comments and resend - not before.
> 
> This constant spamming with the patchset is not productive. You're not the only one who sends patches and wants review - you should consider that there are others who would need to get reviewed too.

Got it. Thank you very much.

Best Regards,
Shuai

Borislav Petkov March 4, 2022, 10:16 a.m. UTC | #12

On Thu, Mar 03, 2022 at 08:26:26PM +0800, Shuai Xue wrote:
> To Borislav: Sorry, I only delete the format change summary in this commit
> log in this version. If I missed any comments, could you please point out
> clearly? Thank you very much.

I can't be more clear than the below - I simply took your patch and
heavily massaged it because it is a lot quicker this way.

You can compare it with your version and see what needed to be changed.
And you can of course ask questios why, if it is not clear. But it
should be - a shortlog of what I did is also in the commit message.

Now, when you look at the resulting patch you can see what the change
does. Yours did more than one logical thing - which a patch should never
do.

Also, I'd really really urge you to read

Documentation/process/submitting-patches.rst

carefully before sending other patches. I won't do this heavy editing
in the future so you're going to have to read review comments and
*incorporate* them into your patches before submitting them again.

You can test the below on your machine now to make sure it still
performs as expected.

Thx.

---
From: Shuai Xue <xueshuai@linux.alibaba.com>
Date: Thu, 3 Mar 2022 20:26:26 +0800
Subject: [PATCH] EDAC/ghes: Unify CPER memory error location reporting

Switch the GHES EDAC memory error reporting functions to use the common
CPER ones and get rid of code duplication.

  [ bp:
      - rewrite commit message, remove useless text
      - rip out useless reformatting
      - align function params on the opening brace
      - rename function to a more descriptive name
      - drop useless function exports
      - handle buffer lengths properly when printing other detail
      - remove useless casting
  ]

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220303122626.99740-3-xueshuai@linux.alibaba.com
---
 drivers/edac/Kconfig        |   1 +
 drivers/edac/ghes_edac.c    | 200 +++++++-----------------------------
 drivers/firmware/efi/cper.c |   4 +-
 include/linux/cper.h        |   2 +
 4 files changed, 42 insertions(+), 165 deletions(-)

diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 58ab63642e72..23f11554f400 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -55,6 +55,7 @@ config EDAC_DECODE_MCE
 config EDAC_GHES
 	bool "Output ACPI APEI/GHES BIOS detected errors via EDAC"
 	depends on ACPI_APEI_GHES && (EDAC=y)
+	select UEFI_CPER
 	help
 	  Not all machines support hardware-driven error report. Some of those
 	  provide a BIOS-driven error report mechanism via ACPI, using the
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 6d1ddecbf0da..2805d5610300 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -15,11 +15,13 @@
 #include "edac_module.h"
 #include <ras/ras_event.h>
 
+#define OTHER_DETAIL_LEN	400
+
 struct ghes_pvt {
 	struct mem_ctl_info *mci;
 
 	/* Buffers for the error handling routine */
-	char other_detail[400];
+	char other_detail[OTHER_DETAIL_LEN];
 	char msg[80];
 };
 
@@ -235,8 +237,34 @@ static void ghes_scan_system(void)
 	system_scanned = true;
 }
 
+static int print_mem_error_other_detail(const struct cper_sec_mem_err *mem, char *msg,
+					const char *location, unsigned int len)
+{
+	u32 n;
+
+	if (!msg)
+		return 0;
+
+	n = 0;
+	len -= 1;
+
+	n += scnprintf(msg + n, len - n, "APEI location: %s ", location);
+
+	if (!(mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS))
+		goto out;
+
+	n += scnprintf(msg + n, len - n, "status(0x%016llx): ", mem->error_status);
+	n += scnprintf(msg + n, len - n, "%s ", cper_mem_err_status_str(mem->error_status));
+
+out:
+	msg[n] = '\0';
+
+	return n;
+}
+
 void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 {
+	struct cper_mem_err_compact cmem;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
 	struct ghes_pvt *pvt;
@@ -292,60 +320,10 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	/* Error type, mapped on e->msg */
 	if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
+		u8 etype = mem_err->error_type;
+
 		p = pvt->msg;
-		switch (mem_err->error_type) {
-		case 0:
-			p += sprintf(p, "Unknown");
-			break;
-		case 1:
-			p += sprintf(p, "No error");
-			break;
-		case 2:
-			p += sprintf(p, "Single-bit ECC");
-			break;
-		case 3:
-			p += sprintf(p, "Multi-bit ECC");
-			break;
-		case 4:
-			p += sprintf(p, "Single-symbol ChipKill ECC");
-			break;
-		case 5:
-			p += sprintf(p, "Multi-symbol ChipKill ECC");
-			break;
-		case 6:
-			p += sprintf(p, "Master abort");
-			break;
-		case 7:
-			p += sprintf(p, "Target abort");
-			break;
-		case 8:
-			p += sprintf(p, "Parity Error");
-			break;
-		case 9:
-			p += sprintf(p, "Watchdog timeout");
-			break;
-		case 10:
-			p += sprintf(p, "Invalid address");
-			break;
-		case 11:
-			p += sprintf(p, "Mirror Broken");
-			break;
-		case 12:
-			p += sprintf(p, "Memory Sparing");
-			break;
-		case 13:
-			p += sprintf(p, "Scrub corrected error");
-			break;
-		case 14:
-			p += sprintf(p, "Scrub uncorrected error");
-			break;
-		case 15:
-			p += sprintf(p, "Physical Memory Map-out event");
-			break;
-		default:
-			p += sprintf(p, "reserved error (%d)",
-				     mem_err->error_type);
-		}
+		p += snprintf(p, sizeof(pvt->msg), "%s", cper_mem_err_type_str(etype));
 	} else {
 		strcpy(pvt->msg, "unknown error");
 	}
@@ -362,52 +340,19 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	/* Memory error location, mapped on e->location */
 	p = e->location;
-	if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
-		p += sprintf(p, "node:%d ", mem_err->node);
-	if (mem_err->validation_bits & CPER_MEM_VALID_CARD)
-		p += sprintf(p, "card:%d ", mem_err->card);
-	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE)
-		p += sprintf(p, "module:%d ", mem_err->module);
-	if (mem_err->validation_bits & CPER_MEM_VALID_RANK_NUMBER)
-		p += sprintf(p, "rank:%d ", mem_err->rank);
-	if (mem_err->validation_bits & CPER_MEM_VALID_BANK)
-		p += sprintf(p, "bank:%d ", mem_err->bank);
-	if (mem_err->validation_bits & CPER_MEM_VALID_BANK_GROUP)
-		p += sprintf(p, "bank_group:%d ",
-			     mem_err->bank >> CPER_MEM_BANK_GROUP_SHIFT);
-	if (mem_err->validation_bits & CPER_MEM_VALID_BANK_ADDRESS)
-		p += sprintf(p, "bank_address:%d ",
-			     mem_err->bank & CPER_MEM_BANK_ADDRESS_MASK);
-	if (mem_err->validation_bits & (CPER_MEM_VALID_ROW | CPER_MEM_VALID_ROW_EXT)) {
-		u32 row = mem_err->row;
-
-		row |= cper_get_mem_extension(mem_err->validation_bits, mem_err->extended);
-		p += sprintf(p, "row:%d ", row);
-	}
-	if (mem_err->validation_bits & CPER_MEM_VALID_COLUMN)
-		p += sprintf(p, "col:%d ", mem_err->column);
-	if (mem_err->validation_bits & CPER_MEM_VALID_BIT_POSITION)
-		p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
+	cper_mem_err_pack(mem_err, &cmem);
+	p += cper_mem_err_location(&cmem, p);
+
 	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
-		const char *bank = NULL, *device = NULL;
 		struct dimm_info *dimm;
 
-		dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
-		if (bank != NULL && device != NULL)
-			p += sprintf(p, "DIMM location:%s %s ", bank, device);
-		else
-			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
-				     mem_err->mem_dev_handle);
-
+		p += cper_dimm_err_location(&cmem, p);
 		dimm = find_dimm_by_handle(mci, mem_err->mem_dev_handle);
 		if (dimm) {
 			e->top_layer = dimm->idx;
 			strcpy(e->label, dimm->label);
 		}
 	}
-	if (mem_err->validation_bits & CPER_MEM_VALID_CHIP_ID)
-		p += sprintf(p, "chipID: %d ",
-			     mem_err->extended >> CPER_MEM_CHIP_ID_SHIFT);
 	if (p > e->location)
 		*(p - 1) = '\0';
 
@@ -416,78 +361,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	/* All other fields are mapped on e->other_detail */
 	p = pvt->other_detail;
-	p += snprintf(p, sizeof(pvt->other_detail),
-		"APEI location: %s ", e->location);
-	if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_STATUS) {
-		u64 status = mem_err->error_status;
-
-		p += sprintf(p, "status(0x%016llx): ", (long long)status);
-		switch ((status >> 8) & 0xff) {
-		case 1:
-			p += sprintf(p, "Error detected internal to the component ");
-			break;
-		case 16:
-			p += sprintf(p, "Error detected in the bus ");
-			break;
-		case 4:
-			p += sprintf(p, "Storage error in DRAM memory ");
-			break;
-		case 5:
-			p += sprintf(p, "Storage error in TLB ");
-			break;
-		case 6:
-			p += sprintf(p, "Storage error in cache ");
-			break;
-		case 7:
-			p += sprintf(p, "Error in one or more functional units ");
-			break;
-		case 8:
-			p += sprintf(p, "component failed self test ");
-			break;
-		case 9:
-			p += sprintf(p, "Overflow or undervalue of internal queue ");
-			break;
-		case 17:
-			p += sprintf(p, "Virtual address not found on IO-TLB or IO-PDIR ");
-			break;
-		case 18:
-			p += sprintf(p, "Improper access error ");
-			break;
-		case 19:
-			p += sprintf(p, "Access to a memory address which is not mapped to any component ");
-			break;
-		case 20:
-			p += sprintf(p, "Loss of Lockstep ");
-			break;
-		case 21:
-			p += sprintf(p, "Response not associated with a request ");
-			break;
-		case 22:
-			p += sprintf(p, "Bus parity error - must also set the A, C, or D Bits ");
-			break;
-		case 23:
-			p += sprintf(p, "Detection of a PATH_ERROR ");
-			break;
-		case 25:
-			p += sprintf(p, "Bus operation timeout ");
-			break;
-		case 26:
-			p += sprintf(p, "A read was issued to data that has been poisoned ");
-			break;
-		default:
-			p += sprintf(p, "reserved ");
-			break;
-		}
-	}
-	if (mem_err->validation_bits & CPER_MEM_VALID_REQUESTOR_ID)
-		p += sprintf(p, "requestorID: 0x%016llx ",
-			     (long long)mem_err->requestor_id);
-	if (mem_err->validation_bits & CPER_MEM_VALID_RESPONDER_ID)
-		p += sprintf(p, "responderID: 0x%016llx ",
-			     (long long)mem_err->responder_id);
-	if (mem_err->validation_bits & CPER_MEM_VALID_TARGET_ID)
-		p += sprintf(p, "targetID: 0x%016llx ",
-			     (long long)mem_err->responder_id);
+	p += print_mem_error_other_detail(mem_err, p, e->location, OTHER_DETAIL_LEN);
 	if (p > pvt->other_detail)
 		*(p - 1) = '\0';
 
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 34eeaa59f04a..215c778fb33c 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -237,7 +237,7 @@ const char *cper_mem_err_status_str(u64 status)
 }
 EXPORT_SYMBOL_GPL(cper_mem_err_status_str);
 
-static int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
+int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
 {
 	u32 len, n;
 
@@ -291,7 +291,7 @@ static int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
 	return n;
 }
 
-static int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
+int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
 {
 	u32 len, n;
 	const char *bank = NULL, *device = NULL;
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 5b1dd27b317d..eacb7dd7b3af 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -569,5 +569,7 @@ void cper_print_proc_arm(const char *pfx,
 			 const struct cper_sec_proc_arm *proc);
 void cper_print_proc_ia(const char *pfx,
 			const struct cper_sec_proc_ia *proc);
+int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg);
+int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg);
 
 #endif

Shuai Xue March 7, 2022, 6:16 a.m. UTC | #13

Hi, Borislav,

在 2022/3/4 PM6:16, Borislav Petkov 写道:
> On Thu, Mar 03, 2022 at 08:26:26PM +0800, Shuai Xue wrote:
>> To Borislav: Sorry, I only delete the format change summary in this commit
>> log in this version. If I missed any comments, could you please point out
>> clearly? Thank you very much.
> 
> I can't be more clear than the below - I simply took your patch and
> heavily massaged it because it is a lot quicker this way.
> 
> You can compare it with your version and see what needed to be changed.
> And you can of course ask questios why, if it is not clear. But it
> should be - a shortlog of what I did is also in the commit message.
> 
> Now, when you look at the resulting patch you can see what the change
> does. Yours did more than one logical thing - which a patch should never
> do.
> 
> Also, I'd really really urge you to read
> 
> Documentation/process/submitting-patches.rst
> 
> carefully before sending other patches. I won't do this heavy editing
> in the future so you're going to have to read review comments and
> *incorporate* them into your patches before submitting them again.

Thank you very much for your valuable reworking. I see your concerns.

> You can test the below on your machine now to make sure it still
> performs as expected.

I have test this patch, it works well.

>       - rip out useless reformatting

I have a question about the format after this patch:

	[ 4578.277035] EDAC MC0: 1 CE single-symbol chipkill ECC on unknown memory (node: 0 card: 0 rank: 0 bank: 256 row: 11098 column: 1032 page:0x95ad2e offset:0x20 grain:1 syndrome:0x0 - APEI location: node: 0 card: 0 rank: 0 bank: 256 row: 11098 column: 1032)

Should we add a new patch to remove the 'space' delimiter after the colon?

Best Regards,
Shuai


> ---
> From: Shuai Xue <xueshuai@linux.alibaba.com>
> Date: Thu, 3 Mar 2022 20:26:26 +0800
> Subject: [PATCH] EDAC/ghes: Unify CPER memory error location reporting
> 
> Switch the GHES EDAC memory error reporting functions to use the common
> CPER ones and get rid of code duplication.
> 
>   [ bp:
>       - rewrite commit message, remove useless text
>       - rip out useless reformatting
>       - align function params on the opening brace
>       - rename function to a more descriptive name
>       - drop useless function exports
>       - handle buffer lengths properly when printing other detail
>       - remove useless casting
>   ]
> 
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Link: https://lore.kernel.org/r/20220303122626.99740-3-xueshuai@linux.alibaba.com
> ---
>  drivers/edac/Kconfig        |   1 +
>  drivers/edac/ghes_edac.c    | 200 +++++++-----------------------------
>  drivers/firmware/efi/cper.c |   4 +-
>  include/linux/cper.h        |   2 +
>  4 files changed, 42 insertions(+), 165 deletions(-)
> 
> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
> index 58ab63642e72..23f11554f400 100644
> --- a/drivers/edac/Kconfig
> +++ b/drivers/edac/Kconfig
> @@ -55,6 +55,7 @@ config EDAC_DECODE_MCE
>  config EDAC_GHES
>  	bool "Output ACPI APEI/GHES BIOS detected errors via EDAC"
>  	depends on ACPI_APEI_GHES && (EDAC=y)
> +	select UEFI_CPER
>  	help
>  	  Not all machines support hardware-driven error report. Some of those
>  	  provide a BIOS-driven error report mechanism via ACPI, using the
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 6d1ddecbf0da..2805d5610300 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -15,11 +15,13 @@
>  #include "edac_module.h"
>  #include <ras/ras_event.h>
>  
> +#define OTHER_DETAIL_LEN	400
> +
>  struct ghes_pvt {
>  	struct mem_ctl_info *mci;
>  
>  	/* Buffers for the error handling routine */
> -	char other_detail[400];
> +	char other_detail[OTHER_DETAIL_LEN];
>  	char msg[80];
>  };
>  
> @@ -235,8 +237,34 @@ static void ghes_scan_system(void)
>  	system_scanned = true;
>  }
>  
> +static int print_mem_error_other_detail(const struct cper_sec_mem_err *mem, char *msg,
> +					const char *location, unsigned int len)
> +{
> +	u32 n;
> +
> +	if (!msg)
> +		return 0;
> +
> +	n = 0;
> +	len -= 1;
> +
> +	n += scnprintf(msg + n, len - n, "APEI location: %s ", location);
> +
> +	if (!(mem->validation_bits & CPER_MEM_VALID_ERROR_STATUS))
> +		goto out;
> +
> +	n += scnprintf(msg + n, len - n, "status(0x%016llx): ", mem->error_status);
> +	n += scnprintf(msg + n, len - n, "%s ", cper_mem_err_status_str(mem->error_status));
> +
> +out:
> +	msg[n] = '\0';
> +
> +	return n;
> +}
> +
>  void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>  {
> +	struct cper_mem_err_compact cmem;
>  	struct edac_raw_error_desc *e;
>  	struct mem_ctl_info *mci;
>  	struct ghes_pvt *pvt;
> @@ -292,60 +320,10 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>  
>  	/* Error type, mapped on e->msg */
>  	if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
> +		u8 etype = mem_err->error_type;
> +
>  		p = pvt->msg;
> -		switch (mem_err->error_type) {
> -		case 0:
> -			p += sprintf(p, "Unknown");
> -			break;
> -		case 1:
> -			p += sprintf(p, "No error");
> -			break;
> -		case 2:
> -			p += sprintf(p, "Single-bit ECC");
> -			break;
> -		case 3:
> -			p += sprintf(p, "Multi-bit ECC");
> -			break;
> -		case 4:
> -			p += sprintf(p, "Single-symbol ChipKill ECC");
> -			break;
> -		case 5:
> -			p += sprintf(p, "Multi-symbol ChipKill ECC");
> -			break;
> -		case 6:
> -			p += sprintf(p, "Master abort");
> -			break;
> -		case 7:
> -			p += sprintf(p, "Target abort");
> -			break;
> -		case 8:
> -			p += sprintf(p, "Parity Error");
> -			break;
> -		case 9:
> -			p += sprintf(p, "Watchdog timeout");
> -			break;
> -		case 10:
> -			p += sprintf(p, "Invalid address");
> -			break;
> -		case 11:
> -			p += sprintf(p, "Mirror Broken");
> -			break;
> -		case 12:
> -			p += sprintf(p, "Memory Sparing");
> -			break;
> -		case 13:
> -			p += sprintf(p, "Scrub corrected error");
> -			break;
> -		case 14:
> -			p += sprintf(p, "Scrub uncorrected error");
> -			break;
> -		case 15:
> -			p += sprintf(p, "Physical Memory Map-out event");
> -			break;
> -		default:
> -			p += sprintf(p, "reserved error (%d)",
> -				     mem_err->error_type);
> -		}
> +		p += snprintf(p, sizeof(pvt->msg), "%s", cper_mem_err_type_str(etype));
>  	} else {
>  		strcpy(pvt->msg, "unknown error");
>  	}
> @@ -362,52 +340,19 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>  
>  	/* Memory error location, mapped on e->location */
>  	p = e->location;
> -	if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
> -		p += sprintf(p, "node:%d ", mem_err->node);
> -	if (mem_err->validation_bits & CPER_MEM_VALID_CARD)
> -		p += sprintf(p, "card:%d ", mem_err->card);
> -	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE)
> -		p += sprintf(p, "module:%d ", mem_err->module);
> -	if (mem_err->validation_bits & CPER_MEM_VALID_RANK_NUMBER)
> -		p += sprintf(p, "rank:%d ", mem_err->rank);
> -	if (mem_err->validation_bits & CPER_MEM_VALID_BANK)
> -		p += sprintf(p, "bank:%d ", mem_err->bank);
> -	if (mem_err->validation_bits & CPER_MEM_VALID_BANK_GROUP)
> -		p += sprintf(p, "bank_group:%d ",
> -			     mem_err->bank >> CPER_MEM_BANK_GROUP_SHIFT);
> -	if (mem_err->validation_bits & CPER_MEM_VALID_BANK_ADDRESS)
> -		p += sprintf(p, "bank_address:%d ",
> -			     mem_err->bank & CPER_MEM_BANK_ADDRESS_MASK);
> -	if (mem_err->validation_bits & (CPER_MEM_VALID_ROW | CPER_MEM_VALID_ROW_EXT)) {
> -		u32 row = mem_err->row;
> -
> -		row |= cper_get_mem_extension(mem_err->validation_bits, mem_err->extended);
> -		p += sprintf(p, "row:%d ", row);
> -	}
> -	if (mem_err->validation_bits & CPER_MEM_VALID_COLUMN)
> -		p += sprintf(p, "col:%d ", mem_err->column);
> -	if (mem_err->validation_bits & CPER_MEM_VALID_BIT_POSITION)
> -		p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
> +	cper_mem_err_pack(mem_err, &cmem);
> +	p += cper_mem_err_location(&cmem, p);
> +
>  	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
> -		const char *bank = NULL, *device = NULL;
>  		struct dimm_info *dimm;
>  
> -		dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
> -		if (bank != NULL && device != NULL)
> -			p += sprintf(p, "DIMM location:%s %s ", bank, device);
> -		else
> -			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
> -				     mem_err->mem_dev_handle);
> -
> +		p += cper_dimm_err_location(&cmem, p);
>  		dimm = find_dimm_by_handle(mci, mem_err->mem_dev_handle);
>  		if (dimm) {
>  			e->top_layer = dimm->idx;
>  			strcpy(e->label, dimm->label);
>  		}
>  	}
> -	if (mem_err->validation_bits & CPER_MEM_VALID_CHIP_ID)
> -		p += sprintf(p, "chipID: %d ",
> -			     mem_err->extended >> CPER_MEM_CHIP_ID_SHIFT);
>  	if (p > e->location)
>  		*(p - 1) = '\0';
>  
> @@ -416,78 +361,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>  
>  	/* All other fields are mapped on e->other_detail */
>  	p = pvt->other_detail;
> -	p += snprintf(p, sizeof(pvt->other_detail),
> -		"APEI location: %s ", e->location);
> -	if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_STATUS) {
> -		u64 status = mem_err->error_status;
> -
> -		p += sprintf(p, "status(0x%016llx): ", (long long)status);
> -		switch ((status >> 8) & 0xff) {
> -		case 1:
> -			p += sprintf(p, "Error detected internal to the component ");
> -			break;
> -		case 16:
> -			p += sprintf(p, "Error detected in the bus ");
> -			break;
> -		case 4:
> -			p += sprintf(p, "Storage error in DRAM memory ");
> -			break;
> -		case 5:
> -			p += sprintf(p, "Storage error in TLB ");
> -			break;
> -		case 6:
> -			p += sprintf(p, "Storage error in cache ");
> -			break;
> -		case 7:
> -			p += sprintf(p, "Error in one or more functional units ");
> -			break;
> -		case 8:
> -			p += sprintf(p, "component failed self test ");
> -			break;
> -		case 9:
> -			p += sprintf(p, "Overflow or undervalue of internal queue ");
> -			break;
> -		case 17:
> -			p += sprintf(p, "Virtual address not found on IO-TLB or IO-PDIR ");
> -			break;
> -		case 18:
> -			p += sprintf(p, "Improper access error ");
> -			break;
> -		case 19:
> -			p += sprintf(p, "Access to a memory address which is not mapped to any component ");
> -			break;
> -		case 20:
> -			p += sprintf(p, "Loss of Lockstep ");
> -			break;
> -		case 21:
> -			p += sprintf(p, "Response not associated with a request ");
> -			break;
> -		case 22:
> -			p += sprintf(p, "Bus parity error - must also set the A, C, or D Bits ");
> -			break;
> -		case 23:
> -			p += sprintf(p, "Detection of a PATH_ERROR ");
> -			break;
> -		case 25:
> -			p += sprintf(p, "Bus operation timeout ");
> -			break;
> -		case 26:
> -			p += sprintf(p, "A read was issued to data that has been poisoned ");
> -			break;
> -		default:
> -			p += sprintf(p, "reserved ");
> -			break;
> -		}
> -	}
> -	if (mem_err->validation_bits & CPER_MEM_VALID_REQUESTOR_ID)
> -		p += sprintf(p, "requestorID: 0x%016llx ",
> -			     (long long)mem_err->requestor_id);
> -	if (mem_err->validation_bits & CPER_MEM_VALID_RESPONDER_ID)
> -		p += sprintf(p, "responderID: 0x%016llx ",
> -			     (long long)mem_err->responder_id);
> -	if (mem_err->validation_bits & CPER_MEM_VALID_TARGET_ID)
> -		p += sprintf(p, "targetID: 0x%016llx ",
> -			     (long long)mem_err->responder_id);
> +	p += print_mem_error_other_detail(mem_err, p, e->location, OTHER_DETAIL_LEN);
>  	if (p > pvt->other_detail)
>  		*(p - 1) = '\0';
>  
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 34eeaa59f04a..215c778fb33c 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -237,7 +237,7 @@ const char *cper_mem_err_status_str(u64 status)
>  }
>  EXPORT_SYMBOL_GPL(cper_mem_err_status_str);
>  
> -static int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
> +int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
>  {
>  	u32 len, n;
>  
> @@ -291,7 +291,7 @@ static int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
>  	return n;
>  }
>  
> -static int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
> +int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg)
>  {
>  	u32 len, n;
>  	const char *bank = NULL, *device = NULL;
> diff --git a/include/linux/cper.h b/include/linux/cper.h
> index 5b1dd27b317d..eacb7dd7b3af 100644
> --- a/include/linux/cper.h
> +++ b/include/linux/cper.h
> @@ -569,5 +569,7 @@ void cper_print_proc_arm(const char *pfx,
>  			 const struct cper_sec_proc_arm *proc);
>  void cper_print_proc_ia(const char *pfx,
>  			const struct cper_sec_proc_ia *proc);
> +int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg);
> +int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg);
>  
>  #endif

Borislav Petkov March 7, 2022, 7:08 a.m. UTC | #14

On Mon, Mar 07, 2022 at 02:16:42PM +0800, Shuai Xue wrote:
> Should we add a new patch to remove the 'space' delimiter after the colon?

Yeah, it looks like it would be easier to parse this way - both by
humans and tools.

[v2,0/3] ghes_edac: refactor memory error reporting to avoid code duplication

Message

Comments