[v8,6/7] apei/ghes: Use unrcu_pointer for cmpxchg

Message ID	20221010023559.69655-7-justin.he@arm.com
State	New
Headers	show Return-Path: <linux-efi-owner@kernel.org> From: Jia He <justin.he@arm.com> To: Len Brown <lenb@kernel.org>, James Morse <james.morse@arm.com>, Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>, Mauro Carvalho Chehab <mchehab@kernel.org>, Robert Richter <rric@kernel.org>, Robert Moore <robert.moore@intel.com>, Qiuxu Zhuo <qiuxu.zhuo@intel.com>, Yazen Ghannam <yazen.ghannam@amd.com>, Jan Luebbe <jlu@pengutronix.de>, Khuong Dinh <khuong@os.amperecomputing.com>, Kani Toshi <toshi.kani@hpe.com> Cc: Ard Biesheuvel <ardb@kernel.org>, linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, devel@acpica.org, "Rafael J . Wysocki" <rafael@kernel.org>, Shuai Xue <xueshuai@linux.alibaba.com>, Jarkko Sakkinen <jarkko@kernel.org>, linux-efi@vger.kernel.org, nd@arm.com, Jia He <justin.he@arm.com>, kernel test robot <lkp@intel.com> Subject: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg Date: Mon, 10 Oct 2022 02:35:58 +0000 Message-Id: <20221010023559.69655-7-justin.he@arm.com> In-Reply-To: <20221010023559.69655-1-justin.he@arm.com> References: <20221010023559.69655-1-justin.he@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Make ghes_edac a proper module \| expand [v8,0/7] Make ghes_edac a proper module [v8,1/7] efi/cper: export several helpers for ghes_edac to use [v8,2/7] EDAC/ghes: Add a notifier for reporting memory errors [v8,3/7] EDAC/ghes: Prepare to make ghes_edac a proper module [v8,4/7] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes [v8,5/7] EDAC: Add the ghes_get_devices() check for chipset-specific edac drivers [v8,6/7] apei/ghes: Use unrcu_pointer for cmpxchg [v8,7/7] EDAC/igen6: Return consistent errno when another edac driver is enabled

Jia He Oct. 10, 2022, 2:35 a.m. UTC

ghes_estatus_caches should be add rcu annotation to avoid sparse warnings.
   drivers/acpi/apei/ghes.c:733:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
   drivers/acpi/apei/ghes.c:733:25: sparse:    struct ghes_estatus_cache *
   drivers/acpi/apei/ghes.c:813:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache [noderef] __rcu *
   drivers/acpi/apei/ghes.c:813:25: sparse:    struct ghes_estatus_cache *

unrcu_pointer is to strip the __rcu in cmpxchg.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 drivers/acpi/apei/ghes.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Borislav Petkov Oct. 11, 2022, 2:45 p.m. UTC | #1

On Tue, Oct 11, 2022 at 02:32:48PM +0000, Justin He wrote:
> My original purpose is to make it pass the sparse checking.

Then do this pls.

This is a combined diff - do a second patch which does only remove the
smp_wmb(). The smp_wmb() there is not needed as the cmpxchg() already
implies a smp_mb() so there's no need for that separate, explicit one.

But it would be prudent to have it in a separate patch just in case.

Thx.

---
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 85acfc589fb7..fd285bf8d114 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -154,7 +154,7 @@ struct ghes_vendor_record_entry {
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;
 
-static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
@@ -842,9 +842,7 @@ static void ghes_estatus_cache_add(
 			slot_cache = cache;
 		}
 	}
-	/* new_cache must be put into array after its contents are written */
-	smp_wmb();
-	if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
+	if (slot != -1 && cmpxchg((struct ghes_estatus_cache ** __force)(ghes_estatus_caches + slot),
 				  slot_cache, new_cache) == slot_cache) {
 		if (slot_cache)
 			call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);

Jia He Oct. 12, 2022, 4:35 a.m. UTC | #2

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Tuesday, October 11, 2022 10:46 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Len Brown <lenb@kernel.org>; James Morse <James.Morse@arm.com>;
> Tony Luck <tony.luck@intel.com>; Mauro Carvalho Chehab
> <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert Moore
> <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>; Yazen
> Ghannam <yazen.ghannam@amd.com>; Jan Luebbe <jlu@pengutronix.de>;
> Khuong Dinh <khuong@os.amperecomputing.com>; Kani Toshi
> <toshi.kani@hpe.com>; Ard Biesheuvel <ardb@kernel.org>;
> linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com>;
> kernel test robot <lkp@intel.com>
> Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> 
> On Tue, Oct 11, 2022 at 02:32:48PM +0000, Justin He wrote:
> > My original purpose is to make it pass the sparse checking.
> 
> Then do this pls.
> 
> This is a combined diff - do a second patch which does only remove the
> smp_wmb(). The smp_wmb() there is not needed as the cmpxchg() already
> implies a smp_mb() so there's no need for that separate, explicit one.
> 
> But it would be prudent to have it in a separate patch just in case.
> 
Sure, I will insert/append your patch in my v9 series if there are no more comments.

Thanks for the

Jia He Oct. 12, 2022, 12:04 p.m. UTC | #3

Hi Borislav

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Tuesday, October 11, 2022 10:46 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Len Brown <lenb@kernel.org>; James Morse <James.Morse@arm.com>;
> Tony Luck <tony.luck@intel.com>; Mauro Carvalho Chehab
> <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert Moore
> <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>; Yazen
> Ghannam <yazen.ghannam@amd.com>; Jan Luebbe <jlu@pengutronix.de>;
> Khuong Dinh <khuong@os.amperecomputing.com>; Kani Toshi
> <toshi.kani@hpe.com>; Ard Biesheuvel <ardb@kernel.org>;
> linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com>;
> kernel test robot <lkp@intel.com>
> Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> 
> On Tue, Oct 11, 2022 at 02:32:48PM +0000, Justin He wrote:
> > My original purpose is to make it pass the sparse checking.
> 
> Then do this pls.
> 
> This is a combined diff - do a second patch which does only remove the
> smp_wmb(). The smp_wmb() there is not needed as the cmpxchg() already
> implies a smp_mb() so there's no need for that separate, explicit one.
> 
I have a concern about what if cmpxchg failed? Do we have to still guarantee the ordering since cmpxchg will not imply a smp_mb if it failed.

Besides, I didn't find the paired smp_mb or smp_rmb
for this smp_wmb. Do you have any ideas?

--
Cheers,
Justin (Jia He)

Borislav Petkov Oct. 13, 2022, 1:37 p.m. UTC | #4

On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:
> I have a concern about what if cmpxchg failed? Do we have to still
> guarantee the ordering since cmpxchg will not imply a smp_mb if it
> failed.

Of course it will imply that. At least on x86 it does. smp_wmb() is a
compiler barrier there and cmpxchg() already has that barrier semantics
by clobbering "memory". I'm pretty sure you should have the same thing
on ARM.

And even if that weren't the case, the write barrier is, as the comment
says, "new_cache must be put into array after its contents are written".

Are we writing anything into the cache if cmpxchg fails?

> Besides, I didn't find the paired smp_mb or smp_rmb for this smp_wmb.

Why would there be pairs? I don't understand that statement here.

Ard Biesheuvel Oct. 13, 2022, 3:41 p.m. UTC | #5

On Thu, 13 Oct 2022 at 15:37, Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:
> > I have a concern about what if cmpxchg failed? Do we have to still
> > guarantee the ordering since cmpxchg will not imply a smp_mb if it
> > failed.
>
> Of course it will imply that. At least on x86 it does. smp_wmb() is a
> compiler barrier there and cmpxchg() already has that barrier semantics
> by clobbering "memory". I'm pretty sure you should have the same thing
> on ARM.
>

No it definitely does not imply that. A memory clobber is a codegen
construct, and the hardware could still complete the writes in a way
that could result in another observer seeing a mix of old and new
values that is inconsistent with the ordering of the stores as issued
by the compiler.

> says, "new_cache must be put into array after its contents are written".
>
> Are we writing anything into the cache if cmpxchg fails?
>

The cache fields get updated but the pointer to the struct is never
shared globally if the cmpxchg() fails so not having the barrier on
failure should not be an issue here.

> > Besides, I didn't find the paired smp_mb or smp_rmb for this smp_wmb.
>
> Why would there be pairs? I don't understand that statement here.
>

Typically, the other observer pairs the write barrier with a read barrier.

In this case, the other observer appears to be ghes_estatus_cached(),
and the reads of the cache struct fields must be ordered after the
read of the cache struct's address. However, there is an implicit
ordering there through an address dependency (you cannot dereference a
struct without knowing its address) so the ordering is implied (and
rcu_dereference() has a READ_ONCE() inside so we are guaranteed to
always dereference the same struct, even if the array slot gets
updated concurrently.)

If you want to get rid of the barrier, you could drop it and change
the cmpxchg() to cmpxchg_release().

Justin: so why are the RCU_INITIALIZER()s needed here?

Borislav Petkov Oct. 13, 2022, 4:37 p.m. UTC | #6

On Thu, Oct 13, 2022 at 05:41:06PM +0200, Ard Biesheuvel wrote:
> No it definitely does not imply that. A memory clobber is a codegen
> construct, and the hardware could still complete the writes in a way
> that could result in another observer seeing a mix of old and new
> values that is inconsistent with the ordering of the stores as issued
> by the compiler.

Yes, but look at the code. There's a:

	smp_wmb();

which on x86 is

	#define smp_wmb() barrier()

which is

	#define barrier() __asm__ __volatile__("": : :"memory")

so there wasn't a hardware memory barrier there in the first place.

Unless ARM does something else in those primitives...

Peter Zijlstra Oct. 13, 2022, 4:41 p.m. UTC | #7

On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:

> > This is a combined diff - do a second patch which does only remove the
> > smp_wmb(). The smp_wmb() there is not needed as the cmpxchg() already
> > implies a smp_mb() so there's no need for that separate, explicit one.
> > 
> I have a concern about what if cmpxchg failed? Do we have to still
> guarantee the ordering since cmpxchg will not imply a smp_mb if it
> failed.
> 
> Besides, I didn't find the paired smp_mb or smp_rmb
> for this smp_wmb. Do you have any ideas?

failed cmpxchg does indeed not imply smp_mb; but in that case I can't
find a store it orders against; and the comment is utterly shite since
it doesn't spell out the ordering in any case.

The way I read that code is that the cmpxchg effectively publishes the
data and all it wants to ensure is that if the pointer is publised the
object is complete -- in which case the cpmxchg() is sufficient, on
success the object is published and you get the ordering, on failure the
object isn't published and nobody cares about the ordering anyway.

If there's anything else, the comment is in dire need of fixing anyway.

Peter Zijlstra Oct. 13, 2022, 4:45 p.m. UTC | #8

On Thu, Oct 13, 2022 at 05:41:06PM +0200, Ard Biesheuvel wrote:
> On Thu, 13 Oct 2022 at 15:37, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:
> > > I have a concern about what if cmpxchg failed? Do we have to still
> > > guarantee the ordering since cmpxchg will not imply a smp_mb if it
> > > failed.
> >
> > Of course it will imply that. At least on x86 it does. smp_wmb() is a
> > compiler barrier there and cmpxchg() already has that barrier semantics
> > by clobbering "memory". I'm pretty sure you should have the same thing
> > on ARM.
> >
> 
> No it definitely does not imply that. A memory clobber is a codegen
> construct, and the hardware could still complete the writes in a way
> that could result in another observer seeing a mix of old and new
> values that is inconsistent with the ordering of the stores as issued
> by the compiler.

Borislav is thinking too much x86. Failed cmpxchg() does indeed not
imply any memory ordering for all architectures -- and while the memory
clobber (aka. barrier()) is equivalent to an smp_wmb() on x86, that most
certainly doesn't hold for non x86 code.

> > says, "new_cache must be put into array after its contents are written".
> >
> > Are we writing anything into the cache if cmpxchg fails?
> >
> 
> The cache fields get updated but the pointer to the struct is never
> shared globally if the cmpxchg() fails so not having the barrier on
> failure should not be an issue here.

That is how I read the code too; so if the cmpxchg fails the object is
not published and nobody cares about the ordering.

> 
> > > Besides, I didn't find the paired smp_mb or smp_rmb for this smp_wmb.
> >
> > Why would there be pairs? I don't understand that statement here.
> >
> 
> Typically, the other observer pairs the write barrier with a read barrier.
> 
> In this case, the other observer appears to be ghes_estatus_cached(),
> and the reads of the cache struct fields must be ordered after the
> read of the cache struct's address. However, there is an implicit
> ordering there through an address dependency (you cannot dereference a
> struct without knowing its address) so the ordering is implied (and
> rcu_dereference() has a READ_ONCE() inside so we are guaranteed to
> always dereference the same struct, even if the array slot gets
> updated concurrently.)
> 
> If you want to get rid of the barrier, you could drop it and change
> the cmpxchg() to cmpxchg_release().

cmpxchg_release() is strictly weaker than cmpxchg(); so I don't see the
point there other than optimizing for weak architectures. It can't
fundamentally fix anything.

Borislav Petkov Oct. 13, 2022, 5:42 p.m. UTC | #9

On Thu, Oct 13, 2022 at 06:45:44PM +0200, Peter Zijlstra wrote:
> Borislav is thinking too much x86. Failed cmpxchg() does indeed not
> imply any memory ordering for all architectures -- and while the memory
> clobber (aka. barrier()) is equivalent to an smp_wmb() on x86, that most
> certainly doesn't hold for non x86 code.

Right, but the patch was addied by an Intel person, CCed:

152cef40a808 ("ACPI, APEI, GHES, Error records content based throttle")a

So I don't think he was thinking about ARM when doing that.

And that commit message doesn't say one whit why that memory barrier is
needed there.

Reading that comment, it sounds like he wanted a hw memory barrier -
MFENCE - but I don't see how normal data dependency wouldn't enforce the proper order
already...

So that barrier looks out of place there.

Btw, this is the next perfect example why I'm asking people to write
proper commit messages so that when we do git archeology later, we can
figure out why something was done the way it has been.

And in this case, we can't. ;-\

Because writing proper commit messages is for losers. Yeah, right.</sarcasm>

Ard Biesheuvel Oct. 14, 2022, 9:40 a.m. UTC | #10

On Thu, 13 Oct 2022 at 19:42, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Oct 13, 2022 at 06:45:44PM +0200, Peter Zijlstra wrote:
> > Borislav is thinking too much x86. Failed cmpxchg() does indeed not
> > imply any memory ordering for all architectures -- and while the memory
> > clobber (aka. barrier()) is equivalent to an smp_wmb() on x86, that most
> > certainly doesn't hold for non x86 code.
>
> Right, but the patch was addied by an Intel person, CCed:
>
> 152cef40a808 ("ACPI, APEI, GHES, Error records content based throttle")a
>
> So I don't think he was thinking about ARM when doing that.
>
> And that commit message doesn't say one whit why that memory barrier is
> needed there.
>
> Reading that comment, it sounds like he wanted a hw memory barrier -
> MFENCE - but I don't see how normal data dependency wouldn't enforce the proper order
> already...
>
> So that barrier looks out of place there.
>

The cache struct pointer should not be published until after the
struct itself is fully populated. So on the producer side, some kind
of hardware barrier is definitely needed, or the struct may appear
half-baked to other cores that can read the updated pointer.

But as Peter points out, cmpxchg() itself has the required barrier
semantics already, so the separate smp_wmb() is likely unnecessary.
And as I suggested earlier, a full barrier is not necessary so we
could relax this to cmpxchg_release() if desired.

OTOH the code seems to be working fine as is, so why modify it at all?
(apart from the purely cosmetic changes)

> Btw, this is the next perfect example why I'm asking people to write
> proper commit messages so that when we do git archeology later, we can
> figure out why something was done the way it has been.
>
> And in this case, we can't. ;-\
>
> Because writing proper commit messages is for losers. Yeah, right.</sarcasm>
>

Yeah, agree there.

Jia He Oct. 14, 2022, noon UTC | #11

Hi Ard

> -----Original Message-----
> From: Ard Biesheuvel <ardb@kernel.org>
> Sent: Thursday, October 13, 2022 11:41 PM
> To: Borislav Petkov <bp@alien8.de>
> Cc: Justin He <Justin.He@arm.com>; Len Brown <lenb@kernel.org>; James
> Morse <James.Morse@arm.com>; Tony Luck <tony.luck@intel.com>; Mauro
> Carvalho Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>;
> Robert Moore <robert.moore@intel.com>; Qiuxu Zhuo
> <qiuxu.zhuo@intel.com>; Yazen Ghannam <yazen.ghannam@amd.com>; Jan
> Luebbe <jlu@pengutronix.de>; Khuong Dinh
> <khuong@os.amperecomputing.com>; Kani Toshi <toshi.kani@hpe.com>;
> linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com>;
> kernel test robot <lkp@intel.com>
> Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> 
> On Thu, 13 Oct 2022 at 15:37, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:
> > > I have a concern about what if cmpxchg failed? Do we have to still
> > > guarantee the ordering since cmpxchg will not imply a smp_mb if it
> > > failed.
> >
> > Of course it will imply that. At least on x86 it does. smp_wmb() is a
> > compiler barrier there and cmpxchg() already has that barrier
> > semantics by clobbering "memory". I'm pretty sure you should have the
> > same thing on ARM.
> >
> 
> No it definitely does not imply that. A memory clobber is a codegen construct,
> and the hardware could still complete the writes in a way that could result in
> another observer seeing a mix of old and new values that is inconsistent with
> the ordering of the stores as issued by the compiler.
> 
> > says, "new_cache must be put into array after its contents are written".
> >
> > Are we writing anything into the cache if cmpxchg fails?
> >
> 
> The cache fields get updated but the pointer to the struct is never shared
> globally if the cmpxchg() fails so not having the barrier on failure should not be
> an issue here.
> 
> > > Besides, I didn't find the paired smp_mb or smp_rmb for this smp_wmb.
> >
> > Why would there be pairs? I don't understand that statement here.
> >
> 
> Typically, the other observer pairs the write barrier with a read barrier.
> 
> In this case, the other observer appears to be ghes_estatus_cached(), and the
> reads of the cache struct fields must be ordered after the read of the cache
> struct's address. However, there is an implicit ordering there through an address
> dependency (you cannot dereference a struct without knowing its address) so
> the ordering is implied (and
> rcu_dereference() has a READ_ONCE() inside so we are guaranteed to always
> dereference the same struct, even if the array slot gets updated concurrently.)
> 
> If you want to get rid of the barrier, you could drop it and change the cmpxchg()
> to cmpxchg_release().
> 
> Justin: so why are the RCU_INITIALIZER()s needed here?

In my this patch, I add the "__rcu" to the definition of ghes_estatus_caches. Hence
Sparse will still have the warning on X86 with this RCU_INITIALIZER cast.
drivers/acpi/apei/ghes.c:843:27: sparse: warning: incorrect type in initializer (different address spaces)
drivers/acpi/apei/ghes.c:843:27: sparse:    expected struct ghes_estatus_cache [noderef] __rcu *__old
drivers/acpi/apei/ghes.c:843:27: sparse:    got struct ghes_estatus_cache *[assigned] slot_cache
drivers/acpi/apei/ghes.c:843:27: sparse: warning: incorrect type in initializer (different address spaces)
drivers/acpi/apei/ghes.c:843:27: sparse:    expected struct ghes_estatus_cache [noderef] __rcu *__new
drivers/acpi/apei/ghes.c:843:27: sparse:    got struct ghes_estatus_cache *[assigned] new_cache

On Arm, IMO the macro cmpxchg doesn't care about it, that is, sparse will not report warnings with or
without RCU_INITIALIZER cast.

I tend to remain this cast, what do you think of it.

--
Cheers,
Justin (Jia He)

Ard Biesheuvel Oct. 14, 2022, 2:31 p.m. UTC | #12

On Fri, 14 Oct 2022 at 14:00, Justin He <Justin.He@arm.com> wrote:
>
> Hi Ard
>
> > -----Original Message-----
> > From: Ard Biesheuvel <ardb@kernel.org>
> > Sent: Thursday, October 13, 2022 11:41 PM
> > To: Borislav Petkov <bp@alien8.de>
> > Cc: Justin He <Justin.He@arm.com>; Len Brown <lenb@kernel.org>; James
> > Morse <James.Morse@arm.com>; Tony Luck <tony.luck@intel.com>; Mauro
> > Carvalho Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>;
> > Robert Moore <robert.moore@intel.com>; Qiuxu Zhuo
> > <qiuxu.zhuo@intel.com>; Yazen Ghannam <yazen.ghannam@amd.com>; Jan
> > Luebbe <jlu@pengutronix.de>; Khuong Dinh
> > <khuong@os.amperecomputing.com>; Kani Toshi <toshi.kani@hpe.com>;
> > linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org;
> > linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> > <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko
> > Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com>;
> > kernel test robot <lkp@intel.com>
> > Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> >
> > On Thu, 13 Oct 2022 at 15:37, Borislav Petkov <bp@alien8.de> wrote:
> > >
> > > On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote:
> > > > I have a concern about what if cmpxchg failed? Do we have to still
> > > > guarantee the ordering since cmpxchg will not imply a smp_mb if it
> > > > failed.
> > >
> > > Of course it will imply that. At least on x86 it does. smp_wmb() is a
> > > compiler barrier there and cmpxchg() already has that barrier
> > > semantics by clobbering "memory". I'm pretty sure you should have the
> > > same thing on ARM.
> > >
> >
> > No it definitely does not imply that. A memory clobber is a codegen construct,
> > and the hardware could still complete the writes in a way that could result in
> > another observer seeing a mix of old and new values that is inconsistent with
> > the ordering of the stores as issued by the compiler.
> >
> > > says, "new_cache must be put into array after its contents are written".
> > >
> > > Are we writing anything into the cache if cmpxchg fails?
> > >
> >
> > The cache fields get updated but the pointer to the struct is never shared
> > globally if the cmpxchg() fails so not having the barrier on failure should not be
> > an issue here.
> >
> > > > Besides, I didn't find the paired smp_mb or smp_rmb for this smp_wmb.
> > >
> > > Why would there be pairs? I don't understand that statement here.
> > >
> >
> > Typically, the other observer pairs the write barrier with a read barrier.
> >
> > In this case, the other observer appears to be ghes_estatus_cached(), and the
> > reads of the cache struct fields must be ordered after the read of the cache
> > struct's address. However, there is an implicit ordering there through an address
> > dependency (you cannot dereference a struct without knowing its address) so
> > the ordering is implied (and
> > rcu_dereference() has a READ_ONCE() inside so we are guaranteed to always
> > dereference the same struct, even if the array slot gets updated concurrently.)
> >
> > If you want to get rid of the barrier, you could drop it and change the cmpxchg()
> > to cmpxchg_release().
> >
> > Justin: so why are the RCU_INITIALIZER()s needed here?
>
> In my this patch, I add the "__rcu" to the definition of ghes_estatus_caches. Hence
> Sparse will still have the warning on X86 with this RCU_INITIALIZER cast.
> drivers/acpi/apei/ghes.c:843:27: sparse: warning: incorrect type in initializer (different address spaces)
> drivers/acpi/apei/ghes.c:843:27: sparse:    expected struct ghes_estatus_cache [noderef] __rcu *__old
> drivers/acpi/apei/ghes.c:843:27: sparse:    got struct ghes_estatus_cache *[assigned] slot_cache
> drivers/acpi/apei/ghes.c:843:27: sparse: warning: incorrect type in initializer (different address spaces)
> drivers/acpi/apei/ghes.c:843:27: sparse:    expected struct ghes_estatus_cache [noderef] __rcu *__new
> drivers/acpi/apei/ghes.c:843:27: sparse:    got struct ghes_estatus_cache *[assigned] new_cache
>
> On Arm, IMO the macro cmpxchg doesn't care about it, that is, sparse will not report warnings with or
> without RCU_INITIALIZER cast.
>
> I tend to remain this cast, what do you think of it.
>

OK, fair enough, I had only tested the arm64 build myself.

But just putting unrcu_pointer() and RCU_INITIALIZER() arbitrarily to
shut up sparse is a bit sloppy, imho. Passing around pointers like
this code does makes that necessary, unfortunately, so it would be
nice if we could clean that up, by getting rid of the slot_cache
variable.

And now that I have spent some time looking at this code, I wonder
what the point of the cmpxchg() is in the first place.
ghes_estatus_cache_add() selects a slot, and either succeeds in
replacing its contents with a pointer to a new cached item, or it just
gives up and frees the new item again, without attempting to select
another slot even if one might be available.

Since we only insert new items, the race can only cause a failure if
the selected slot was updated with another new item concurrently,
which means that it is arbitrary which of those two items gets
dropped. This means we don't need the cmpxchg() and the special case,
and we can just drop the existing item unconditionally. Note that this
does not result in loss of error events, it simply means we might
cause a false cache miss, and report the same event one additional
time in quick succession even if the cache should have prevented that.


------------------------8<------------------------------
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 80ad530583c9..03acdfa35dab 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -138,7 +138,7 @@ struct ghes_vendor_record_entry {
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;

-static struct ghes_estatus_cache
*ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu
*ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;

 static int ghes_panic_timeout __read_mostly = 30;
@@ -773,31 +773,26 @@ static struct ghes_estatus_cache
*ghes_estatus_cache_alloc(
        return cache;
 }

-static void ghes_estatus_cache_free(struct ghes_estatus_cache *cache)
+static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
 {
+       struct ghes_estatus_cache *cache;
        u32 len;

+       cache = container_of(head, struct ghes_estatus_cache, rcu);
        len = cper_estatus_len(GHES_ESTATUS_FROM_CACHE(cache));
        len = GHES_ESTATUS_CACHE_LEN(len);
        gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len);
        atomic_dec(&ghes_estatus_cache_alloced);
 }

-static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
-{
-       struct ghes_estatus_cache *cache;
-
-       cache = container_of(head, struct ghes_estatus_cache, rcu);
-       ghes_estatus_cache_free(cache);
-}
-
 static void ghes_estatus_cache_add(
        struct acpi_hest_generic *generic,
        struct acpi_hest_generic_status *estatus)
 {
        int i, slot = -1, count;
        unsigned long long now, duration, period, max_period = 0;
-       struct ghes_estatus_cache *cache, *slot_cache = NULL, *new_cache;
+       struct ghes_estatus_cache *cache, *new_cache;
+       struct ghes_estatus_cache __rcu *victim;

        new_cache = ghes_estatus_cache_alloc(generic, estatus);
        if (new_cache == NULL)
@@ -808,13 +803,11 @@ static void ghes_estatus_cache_add(
                cache = rcu_dereference(ghes_estatus_caches[i]);
                if (cache == NULL) {
                        slot = i;
-                       slot_cache = NULL;
                        break;
                }
                duration = now - cache->time_in;
                if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) {
                        slot = i;
-                       slot_cache = cache;
                        break;
                }
                count = atomic_read(&cache->count);
@@ -823,17 +816,28 @@ static void ghes_estatus_cache_add(
                if (period > max_period) {
                        max_period = period;
                        slot = i;
-                       slot_cache = cache;
                }
        }
-       /* new_cache must be put into array after its contents are written */
-       smp_wmb();
-       if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
-                                 slot_cache, new_cache) == slot_cache) {
-               if (slot_cache)
-                       call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
-       } else
-               ghes_estatus_cache_free(new_cache);
+       if (slot != -1) {
+               /*
+                * Use release semantics to ensure that ghes_estatus_cached()
+                * running on another CPU will see the updated cache fields if
+                * it can see the new value of the pointer.
+                */
+               victim = xchg_release(ghes_estatus_caches + slot,
+                                     RCU_INITIALIZER(new_cache));
+
+               /*
+                * At this point, victim may point to a cached item different
+                * from the one based on which we selected the slot. Instead of
+                * going to the loop again to pick another slot, let's just
+                * drop the other item anyway: this may cause a false cache
+                * miss later on, but that won't cause any problems.
+                */
+               if (victim)
+                       call_rcu(&rcu_dereference(victim)->rcu,
+                                ghes_estatus_cache_rcu_free);
+       }
        rcu_read_unlock();
 }

Peter Zijlstra Oct. 14, 2022, 3:10 p.m. UTC | #13

On Fri, Oct 14, 2022 at 04:31:37PM +0200, Ard Biesheuvel wrote:
> +       if (slot != -1) {
> +               /*
> +                * Use release semantics to ensure that ghes_estatus_cached()
> +                * running on another CPU will see the updated cache fields if
> +                * it can see the new value of the pointer.
> +                */
> +               victim = xchg_release(ghes_estatus_caches + slot,
> +                                     RCU_INITIALIZER(new_cache));
> +
> +               /*
> +                * At this point, victim may point to a cached item different
> +                * from the one based on which we selected the slot. Instead of
> +                * going to the loop again to pick another slot, let's just
> +                * drop the other item anyway: this may cause a false cache
> +                * miss later on, but that won't cause any problems.
> +                */
> +               if (victim) {
> +                       call_rcu(&rcu_dereference(victim)->rcu,
> +                                ghes_estatus_cache_rcu_free);
		}

I think you can use unrcu_pointer() here instead, there should not be a
data dependency since the ->rcu member itself should be otherwise unused
(and if it were, we wouldn't care about its previous content anyway).

But only Alpha cares about that distinction anyway, so *shrug*.

While I much like the xchg() variant; I still don't really fancy the
verbage the sparse nonsense makes us do.

		victim = xchg_release(&ghes_estatus_caches[slot], new_cache);
		if (victim)
			call_rcu(&victim->rcu, ghes_estatus_cache_rcu_free);

is much nicer code.

Over all; I'd simply ignore sparse (I often do).

> +       }
>         rcu_read_unlock();
>  }

Ard Biesheuvel Oct. 14, 2022, 3:24 p.m. UTC | #14

On Fri, 14 Oct 2022 at 17:11, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Oct 14, 2022 at 04:31:37PM +0200, Ard Biesheuvel wrote:
> > +       if (slot != -1) {
> > +               /*
> > +                * Use release semantics to ensure that ghes_estatus_cached()
> > +                * running on another CPU will see the updated cache fields if
> > +                * it can see the new value of the pointer.
> > +                */
> > +               victim = xchg_release(ghes_estatus_caches + slot,
> > +                                     RCU_INITIALIZER(new_cache));
> > +
> > +               /*
> > +                * At this point, victim may point to a cached item different
> > +                * from the one based on which we selected the slot. Instead of
> > +                * going to the loop again to pick another slot, let's just
> > +                * drop the other item anyway: this may cause a false cache
> > +                * miss later on, but that won't cause any problems.
> > +                */
> > +               if (victim) {
> > +                       call_rcu(&rcu_dereference(victim)->rcu,
> > +                                ghes_estatus_cache_rcu_free);
>                 }
>
> I think you can use unrcu_pointer() here instead, there should not be a
> data dependency since the ->rcu member itself should be otherwise unused
> (and if it were, we wouldn't care about its previous content anyway).
>
> But only Alpha cares about that distinction anyway, so *shrug*.
>

Ah yeah good point - and we are not actually dereferencing the pointer
at all here, just adding an offset to get at the address of the rcu
member.

So we can take this block out of the rcu_read_lock() section as well.


> While I much like the xchg() variant; I still don't really fancy the
> verbage the sparse nonsense makes us do.
>
>                 victim = xchg_release(&ghes_estatus_caches[slot], new_cache);
>                 if (victim)
>                         call_rcu(&victim->rcu, ghes_estatus_cache_rcu_free);
>
> is much nicer code.
>
> Over all; I'd simply ignore sparse (I often do).
>

No disagreement there.

Borislav Petkov Oct. 14, 2022, 7:40 p.m. UTC | #15

On Fri, Oct 14, 2022 at 11:40:39AM +0200, Ard Biesheuvel wrote:
> The cache struct pointer should not be published until after the
> struct itself is fully populated. So on the producer side, some kind
> of hardware barrier is definitely needed, or the struct may appear
> half-baked to other cores that can read the updated pointer.

Ah, right you are, ghes_estatus_cached() is called in all kinds of
contexts and by other cores, sure.

> OTOH the code seems to be working fine as is, so why modify it at all?
> (apart from the purely cosmetic changes)

peterz questioned that smp_wmb() there and then we started digging. And
frankly, even if removing that barrier won't make any difference, I'd
still prefer it gone and have the code simpler and cleaner.

Thx.

Jia He Oct. 17, 2022, 8:47 a.m. UTC | #16

Hi Ard

> -----Original Message-----
> Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
>
> On Fri, 14 Oct 2022 at 17:11, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Oct 14, 2022 at 04:31:37PM +0200, Ard Biesheuvel wrote:
> > > +       if (slot != -1) {
> > > +               /*
> > > +                * Use release semantics to ensure that
> ghes_estatus_cached()
> > > +                * running on another CPU will see the updated cache
> fields if
> > > +                * it can see the new value of the pointer.
> > > +                */
> > > +               victim = xchg_release(ghes_estatus_caches + slot,
> > > +
> RCU_INITIALIZER(new_cache));
> > > +
> > > +               /*
> > > +                * At this point, victim may point to a cached item
> different
> > > +                * from the one based on which we selected the slot.
> Instead of
> > > +                * going to the loop again to pick another slot, let's
> just
> > > +                * drop the other item anyway: this may cause a false
> cache
> > > +                * miss later on, but that won't cause any problems.
> > > +                */
> > > +               if (victim) {
> > > +                       call_rcu(&rcu_dereference(victim)->rcu,
> > > +                                ghes_estatus_cache_rcu_free);
> >                 }
> >
> > I think you can use unrcu_pointer() here instead, there should not be
> > a data dependency since the ->rcu member itself should be otherwise
> > unused (and if it were, we wouldn't care about its previous content anyway).
> >
> > But only Alpha cares about that distinction anyway, so *shrug*.
> >
>
> Ah yeah good point - and we are not actually dereferencing the pointer at all
> here, just adding an offset to get at the address of the rcu member.
>
> So we can take this block out of the rcu_read_lock() section as well.
>
>
> > While I much like the xchg() variant; I still don't really fancy the
> > verbage the sparse nonsense makes us do.
> >
> >                 victim = xchg_release(&ghes_estatus_caches[slot],
> new_cache);
> >                 if (victim)
> >                         call_rcu(&victim->rcu,
> > ghes_estatus_cache_rcu_free);
> >
> > is much nicer code.
> >
> > Over all; I'd simply ignore sparse (I often do).
> >
>
> No disagreement there.

What do you think of the updated patch:

apei/ghes: Use xchg() for updating new cache slot instead of
 cmpxchg()

From: Ard Biesheuvel <ardb@kernel.org>

ghes_estatus_cache_add() selects a slot, and either succeeds in
replacing its contents with a pointer to a new cached item, or it just
gives up and frees the new item again, without attempting to select
another slot even if one might be available.

Since only inserting new items is needed, the race can only cause a failure
if the selected slot was updated with another new item concurrently,
which means that it is arbitrary which of those two items gets
dropped. This means the cmpxchg() and the special case are not necessary,
and hence just drop the existing item unconditionally. Note that this
does not result in loss of error events, it simply means we might
cause a false cache miss, and report the same event one additional
time in quick succession even if the cache should have prevented that.

Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
[Justin]: I removed __rcu annotation of victim, removed the RCU_INITIALIZER
cast and added the unptr for xchg.

drivers/acpi/apei/ghes.c | 44 ++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 27c72b175e4b..5fc8a135450b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -150,7 +150,7 @@ struct ghes_vendor_record_entry {
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;

-static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;

 static int ghes_panic_timeout __read_mostly = 30;
@@ -785,31 +785,26 @@ static struct ghes_estatus_cache *ghes_estatus_cache_alloc(
        return cache;
 }

-static void ghes_estatus_cache_free(struct ghes_estatus_cache *cache)
+static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
 {
+       struct ghes_estatus_cache *cache;
        u32 len;

+       cache = container_of(head, struct ghes_estatus_cache, rcu);
        len = cper_estatus_len(GHES_ESTATUS_FROM_CACHE(cache));
        len = GHES_ESTATUS_CACHE_LEN(len);
        gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len);
        atomic_dec(&ghes_estatus_cache_alloced);
 }

-static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
-{
-       struct ghes_estatus_cache *cache;
-
-       cache = container_of(head, struct ghes_estatus_cache, rcu);
-       ghes_estatus_cache_free(cache);
-}
-
 static void ghes_estatus_cache_add(
        struct acpi_hest_generic *generic,
        struct acpi_hest_generic_status *estatus)
 {
        int i, slot = -1, count;
        unsigned long long now, duration, period, max_period = 0;
-       struct ghes_estatus_cache *cache, *slot_cache = NULL, *new_cache;
+       struct ghes_estatus_cache *cache, *new_cache;
+       struct ghes_estatus_cache *victim;

        new_cache = ghes_estatus_cache_alloc(generic, estatus);
        if (new_cache == NULL)
@@ -820,13 +815,11 @@ static void ghes_estatus_cache_add(
                cache = rcu_dereference(ghes_estatus_caches[i]);
                if (cache == NULL) {
                        slot = i;
-                       slot_cache = NULL;
                        break;
                }
                duration = now - cache->time_in;
                if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) {
                        slot = i;
-                       slot_cache = cache;
                        break;
                }
                count = atomic_read(&cache->count);
@@ -835,17 +828,24 @@ static void ghes_estatus_cache_add(
                if (period > max_period) {
                        max_period = period;
                        slot = i;
-                       slot_cache = cache;
                }
        }
-       /* new_cache must be put into array after its contents are written */
-       smp_wmb();
-       if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
-                                 slot_cache, new_cache) == slot_cache) {
-               if (slot_cache)
-                       call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
-       } else
-               ghes_estatus_cache_free(new_cache);
+       if (slot != -1) {
+               /*
+                * Use release semantics to ensure that ghes_estatus_cached()
+                * running on another CPU will see the updated cache fields if
+                * it can see the new value of the pointer.
+                * At this point, victim may point to a cached item different
+                * from the one based on which we selected the slot. Instead of
+                * going to the loop again to pick another slot, let's just
+                * drop the other item anyway: this may cause a false cache
+                * miss later on, but that won't cause any problems.
+                */
+               victim = unrcu_pointer(xchg_release(&ghes_estatus_caches[slot],
+                                       new_cache));
+               if (victim)
+                       call_rcu(&victim->rcu, ghes_estatus_cache_rcu_free);
+       }
        rcu_read_unlock();
 }

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Ard Biesheuvel Oct. 17, 2022, 9:27 a.m. UTC | #17

Hi Justin,

On Mon, 17 Oct 2022 at 10:47, Justin He <Justin.He@arm.com> wrote:
>
> Hi Ard
>
> > -----Original Message-----
> > Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> >
> > On Fri, 14 Oct 2022 at 17:11, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Fri, Oct 14, 2022 at 04:31:37PM +0200, Ard Biesheuvel wrote:
> > > > +       if (slot != -1) {
> > > > +               /*
> > > > +                * Use release semantics to ensure that
> > ghes_estatus_cached()
> > > > +                * running on another CPU will see the updated cache
> > fields if
> > > > +                * it can see the new value of the pointer.
> > > > +                */
> > > > +               victim = xchg_release(ghes_estatus_caches + slot,
> > > > +
> > RCU_INITIALIZER(new_cache));
> > > > +
> > > > +               /*
> > > > +                * At this point, victim may point to a cached item
> > different
> > > > +                * from the one based on which we selected the slot.
> > Instead of
> > > > +                * going to the loop again to pick another slot, let's
> > just
> > > > +                * drop the other item anyway: this may cause a false
> > cache
> > > > +                * miss later on, but that won't cause any problems.
> > > > +                */
> > > > +               if (victim) {
> > > > +                       call_rcu(&rcu_dereference(victim)->rcu,
> > > > +                                ghes_estatus_cache_rcu_free);
> > >                 }
> > >
> > > I think you can use unrcu_pointer() here instead, there should not be
> > > a data dependency since the ->rcu member itself should be otherwise
> > > unused (and if it were, we wouldn't care about its previous content anyway).
> > >
> > > But only Alpha cares about that distinction anyway, so *shrug*.
> > >
> >
> > Ah yeah good point - and we are not actually dereferencing the pointer at all
> > here, just adding an offset to get at the address of the rcu member.
> >
> > So we can take this block out of the rcu_read_lock() section as well.
> >
> >
> > > While I much like the xchg() variant; I still don't really fancy the
> > > verbage the sparse nonsense makes us do.
> > >
> > >                 victim = xchg_release(&ghes_estatus_caches[slot],
> > new_cache);
> > >                 if (victim)
> > >                         call_rcu(&victim->rcu,
> > > ghes_estatus_cache_rcu_free);
> > >
> > > is much nicer code.
> > >
> > > Over all; I'd simply ignore sparse (I often do).
> > >
> >
> > No disagreement there.
>
> What do you think of the updated patch:
>
> apei/ghes: Use xchg() for updating new cache slot instead of
>  cmpxchg()
>
> From: Ard Biesheuvel <ardb@kernel.org>
>
> ghes_estatus_cache_add() selects a slot, and either succeeds in
> replacing its contents with a pointer to a new cached item, or it just
> gives up and frees the new item again, without attempting to select
> another slot even if one might be available.
>
> Since only inserting new items is needed, the race can only cause a failure
> if the selected slot was updated with another new item concurrently,
> which means that it is arbitrary which of those two items gets
> dropped. This means the cmpxchg() and the special case are not necessary,
> and hence just drop the existing item unconditionally. Note that this
> does not result in loss of error events, it simply means we might
> cause a false cache miss, and report the same event one additional
> time in quick succession even if the cache should have prevented that.
>

Please add a line here

Co-developed-by: Jia He <justin.he@arm.com>

> Signed-off-by: Jia He <justin.he@arm.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> [Justin]: I removed __rcu annotation of victim, removed the RCU_INITIALIZER
> cast and added the unptr for xchg.
>
> drivers/acpi/apei/ghes.c | 44 ++++++++++++++++++++--------------------
>  1 file changed, 22 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 27c72b175e4b..5fc8a135450b 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -150,7 +150,7 @@ struct ghes_vendor_record_entry {
>  static struct gen_pool *ghes_estatus_pool;
>  static unsigned long ghes_estatus_pool_size_request;
>
> -static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> +static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
>  static atomic_t ghes_estatus_cache_alloced;
>
>  static int ghes_panic_timeout __read_mostly = 30;
> @@ -785,31 +785,26 @@ static struct ghes_estatus_cache *ghes_estatus_cache_alloc(
>         return cache;
>  }
>
> -static void ghes_estatus_cache_free(struct ghes_estatus_cache *cache)
> +static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
>  {
> +       struct ghes_estatus_cache *cache;
>         u32 len;
>
> +       cache = container_of(head, struct ghes_estatus_cache, rcu);
>         len = cper_estatus_len(GHES_ESTATUS_FROM_CACHE(cache));
>         len = GHES_ESTATUS_CACHE_LEN(len);
>         gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len);
>         atomic_dec(&ghes_estatus_cache_alloced);
>  }
>
> -static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
> -{
> -       struct ghes_estatus_cache *cache;
> -
> -       cache = container_of(head, struct ghes_estatus_cache, rcu);
> -       ghes_estatus_cache_free(cache);
> -}
> -
>  static void ghes_estatus_cache_add(
>         struct acpi_hest_generic *generic,
>         struct acpi_hest_generic_status *estatus)
>  {
>         int i, slot = -1, count;
>         unsigned long long now, duration, period, max_period = 0;
> -       struct ghes_estatus_cache *cache, *slot_cache = NULL, *new_cache;
> +       struct ghes_estatus_cache *cache, *new_cache;
> +       struct ghes_estatus_cache *victim;
>
>         new_cache = ghes_estatus_cache_alloc(generic, estatus);
>         if (new_cache == NULL)
> @@ -820,13 +815,11 @@ static void ghes_estatus_cache_add(
>                 cache = rcu_dereference(ghes_estatus_caches[i]);
>                 if (cache == NULL) {
>                         slot = i;
> -                       slot_cache = NULL;
>                         break;
>                 }
>                 duration = now - cache->time_in;
>                 if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) {
>                         slot = i;
> -                       slot_cache = cache;
>                         break;
>                 }
>                 count = atomic_read(&cache->count);
> @@ -835,17 +828,24 @@ static void ghes_estatus_cache_add(
>                 if (period > max_period) {
>                         max_period = period;
>                         slot = i;
> -                       slot_cache = cache;
>                 }
>         }
> -       /* new_cache must be put into array after its contents are written */
> -       smp_wmb();
> -       if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
> -                                 slot_cache, new_cache) == slot_cache) {
> -               if (slot_cache)
> -                       call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
> -       } else
> -               ghes_estatus_cache_free(new_cache);
> +       if (slot != -1) {
> +               /*
> +                * Use release semantics to ensure that ghes_estatus_cached()
> +                * running on another CPU will see the updated cache fields if
> +                * it can see the new value of the pointer.

Please move the comment back where it was. 'At this point' is now
ambiguous because victim has not been assigned yet.

> +                * At this point, victim may point to a cached item different
> +                * from the one based on which we selected the slot. Instead of
> +                * going to the loop again to pick another slot, let's just
> +                * drop the other item anyway: this may cause a false cache
> +                * miss later on, but that won't cause any problems.
> +                */
> +               victim = unrcu_pointer(xchg_release(&ghes_estatus_caches[slot],
> +                                       new_cache));

Doesn't this still trigger the sparse warning on x86?

> +               if (victim)
> +                       call_rcu(&victim->rcu, ghes_estatus_cache_rcu_free);

I think it is better to add back the __rcu annotation to 'victim', and
change this line to

call_rcu(&unrcu_pointer(victim)->rcu, ghes_estatus_cache_rcu_free);

> +       }
>         rcu_read_unlock();

This can now be moved before the if()

>  }
>
> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Please get rid of this footer.

Jia He Oct. 17, 2022, 11:57 a.m. UTC | #18

Hi Ard

> -----Original Message-----
> From: Ard Biesheuvel <ardb@kernel.org>
> Sent: Monday, October 17, 2022 5:27 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Borislav Petkov <bp@alien8.de>; Len Brown <lenb@kernel.org>; James
> Morse <James.Morse@arm.com>; Tony Luck <tony.luck@intel.com>; Mauro
> Carvalho Chehab <mchehab@kernel.org>; Peter Zijlstra
> <peterz@infradead.org>; Robert Richter <rric@kernel.org>; Robert Moore
> <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>; Yazen
> Ghannam <yazen.ghannam@amd.com>; Jan Luebbe <jlu@pengutronix.de>;
> Khuong Dinh <khuong@os.amperecomputing.com>; Kani Toshi
> <toshi.kani@hpe.com>; linux-acpi@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org; devel@acpica.org;
> Rafael J . Wysocki <rafael@kernel.org>; Shuai Xue
> <xueshuai@linux.alibaba.com>; Jarkko Sakkinen <jarkko@kernel.org>;
> linux-efi@vger.kernel.org; kernel test robot <lkp@intel.com>
> Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> 
> Hi Justin,
> 
> On Mon, 17 Oct 2022 at 10:47, Justin He <Justin.He@arm.com> wrote:
> >
> > Hi Ard
> >
> > > -----Original Message-----
> > > Subject: Re: [PATCH v8 6/7] apei/ghes: Use unrcu_pointer for cmpxchg
> > >
> > > On Fri, 14 Oct 2022 at 17:11, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Fri, Oct 14, 2022 at 04:31:37PM +0200, Ard Biesheuvel wrote:
> > > > > +       if (slot != -1) {
> > > > > +               /*
> > > > > +                * Use release semantics to ensure that
> > > ghes_estatus_cached()
> > > > > +                * running on another CPU will see the updated
> > > > > + cache
> > > fields if
> > > > > +                * it can see the new value of the pointer.
> > > > > +                */
> > > > > +               victim = xchg_release(ghes_estatus_caches +
> > > > > + slot,
> > > > > +
> > > RCU_INITIALIZER(new_cache));
> > > > > +
> > > > > +               /*
> > > > > +                * At this point, victim may point to a cached
> > > > > + item
> > > different
> > > > > +                * from the one based on which we selected the slot.
> > > Instead of
> > > > > +                * going to the loop again to pick another slot,
> > > > > + let's
> > > just
> > > > > +                * drop the other item anyway: this may cause a
> > > > > + false
> > > cache
> > > > > +                * miss later on, but that won't cause any problems.
> > > > > +                */
> > > > > +               if (victim) {
> > > > > +                       call_rcu(&rcu_dereference(victim)->rcu,
> > > > > +                                ghes_estatus_cache_rcu_free);
> > > >                 }
> > > >
> > > > I think you can use unrcu_pointer() here instead, there should not
> > > > be a data dependency since the ->rcu member itself should be
> > > > otherwise unused (and if it were, we wouldn't care about its previous
> content anyway).
> > > >
> > > > But only Alpha cares about that distinction anyway, so *shrug*.
> > > >
> > >
> > > Ah yeah good point - and we are not actually dereferencing the
> > > pointer at all here, just adding an offset to get at the address of the rcu
> member.
> > >
> > > So we can take this block out of the rcu_read_lock() section as well.
> > >
> > >
> > > > While I much like the xchg() variant; I still don't really fancy
> > > > the verbage the sparse nonsense makes us do.
> > > >
> > > >                 victim = xchg_release(&ghes_estatus_caches[slot],
> > > new_cache);
> > > >                 if (victim)
> > > >                         call_rcu(&victim->rcu,
> > > > ghes_estatus_cache_rcu_free);
> > > >
> > > > is much nicer code.
> > > >
> > > > Over all; I'd simply ignore sparse (I often do).
> > > >
> > >
> > > No disagreement there.
> >
> > What do you think of the updated patch:
> >
> > apei/ghes: Use xchg() for updating new cache slot instead of
> >  cmpxchg()
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > ghes_estatus_cache_add() selects a slot, and either succeeds in
> > replacing its contents with a pointer to a new cached item, or it just
> > gives up and frees the new item again, without attempting to select
> > another slot even if one might be available.
> >
> > Since only inserting new items is needed, the race can only cause a
> > failure if the selected slot was updated with another new item
> > concurrently, which means that it is arbitrary which of those two
> > items gets dropped. This means the cmpxchg() and the special case are
> > not necessary, and hence just drop the existing item unconditionally.
> > Note that this does not result in loss of error events, it simply
> > means we might cause a false cache miss, and report the same event one
> > additional time in quick succession even if the cache should have prevented
> that.
> >
> 
> Please add a line here
> 
> Co-developed-by: Jia He <justin.he@arm.com>
> 
> > Signed-off-by: Jia He <justin.he@arm.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > [Justin]: I removed __rcu annotation of victim, removed the
> > RCU_INITIALIZER cast and added the unptr for xchg.
> >
> > drivers/acpi/apei/ghes.c | 44 ++++++++++++++++++++--------------------
> >  1 file changed, 22 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
> > 27c72b175e4b..5fc8a135450b 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -150,7 +150,7 @@ struct ghes_vendor_record_entry {  static struct
> > gen_pool *ghes_estatus_pool;  static unsigned long
> > ghes_estatus_pool_size_request;
> >
> > -static struct ghes_estatus_cache
> > *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> > +static struct ghes_estatus_cache __rcu
> > +*ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> >  static atomic_t ghes_estatus_cache_alloced;
> >
> >  static int ghes_panic_timeout __read_mostly = 30; @@ -785,31 +785,26
> > @@ static struct ghes_estatus_cache *ghes_estatus_cache_alloc(
> >         return cache;
> >  }
> >
> > -static void ghes_estatus_cache_free(struct ghes_estatus_cache *cache)
> > +static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
> >  {
> > +       struct ghes_estatus_cache *cache;
> >         u32 len;
> >
> > +       cache = container_of(head, struct ghes_estatus_cache, rcu);
> >         len = cper_estatus_len(GHES_ESTATUS_FROM_CACHE(cache));
> >         len = GHES_ESTATUS_CACHE_LEN(len);
> >         gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len);
> >         atomic_dec(&ghes_estatus_cache_alloced);
> >  }
> >
> > -static void ghes_estatus_cache_rcu_free(struct rcu_head *head) -{
> > -       struct ghes_estatus_cache *cache;
> > -
> > -       cache = container_of(head, struct ghes_estatus_cache, rcu);
> > -       ghes_estatus_cache_free(cache);
> > -}
> > -
> >  static void ghes_estatus_cache_add(
> >         struct acpi_hest_generic *generic,
> >         struct acpi_hest_generic_status *estatus)  {
> >         int i, slot = -1, count;
> >         unsigned long long now, duration, period, max_period = 0;
> > -       struct ghes_estatus_cache *cache, *slot_cache = NULL,
> *new_cache;
> > +       struct ghes_estatus_cache *cache, *new_cache;
> > +       struct ghes_estatus_cache *victim;
> >
> >         new_cache = ghes_estatus_cache_alloc(generic, estatus);
> >         if (new_cache == NULL)
> > @@ -820,13 +815,11 @@ static void ghes_estatus_cache_add(
> >                 cache = rcu_dereference(ghes_estatus_caches[i]);
> >                 if (cache == NULL) {
> >                         slot = i;
> > -                       slot_cache = NULL;
> >                         break;
> >                 }
> >                 duration = now - cache->time_in;
> >                 if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) {
> >                         slot = i;
> > -                       slot_cache = cache;
> >                         break;
> >                 }
> >                 count = atomic_read(&cache->count); @@ -835,17
> +828,24
> > @@ static void ghes_estatus_cache_add(
> >                 if (period > max_period) {
> >                         max_period = period;
> >                         slot = i;
> > -                       slot_cache = cache;
> >                 }
> >         }
> > -       /* new_cache must be put into array after its contents are written
> */
> > -       smp_wmb();
> > -       if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
> > -                                 slot_cache, new_cache) == slot_cache)
> {
> > -               if (slot_cache)
> > -                       call_rcu(&slot_cache->rcu,
> ghes_estatus_cache_rcu_free);
> > -       } else
> > -               ghes_estatus_cache_free(new_cache);
> > +       if (slot != -1) {
> > +               /*
> > +                * Use release semantics to ensure that
> ghes_estatus_cached()
> > +                * running on another CPU will see the updated cache
> fields if
> > +                * it can see the new value of the pointer.
> 
> Please move the comment back where it was. 'At this point' is now ambiguous
> because victim has not been assigned yet.
> 
> > +                * At this point, victim may point to a cached item
> different
> > +                * from the one based on which we selected the slot.
> Instead of
> > +                * going to the loop again to pick another slot, let's just
> > +                * drop the other item anyway: this may cause a false
> cache
> > +                * miss later on, but that won't cause any problems.
> > +                */
> > +               victim =
> unrcu_pointer(xchg_release(&ghes_estatus_caches[slot],
> > +                                       new_cache));
> 
> Doesn't this still trigger the sparse warning on x86?
Yes,
I will add the RCU_INITIALIZER back if Peter doesn't object.


--
Cheers,
Justin (Jia He)

[v8,6/7] apei/ghes: Use unrcu_pointer for cmpxchg

Commit Message

Comments

Patch