ARM: make memzero optimization smarter

Message ID	20180116162905.1404585-1-arnd@arndb.de
State	New
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; From: Arnd Bergmann <arnd@arndb.de> To: Russell King <linux@armlinux.org.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>, Zhichang Yuan <yuanzhichang@hisilicon.com>, Nicolas Pitre <nico@linaro.org>, Arnd Bergmann <arnd@arndb.de>, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH] ARM: make memzero optimization smarter Date: Tue, 16 Jan 2018 17:28:43 +0100 Message-Id: <20180116162905.1404585-1-arnd@arndb.de> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk
Series	ARM: make memzero optimization smarter \| expand ARM: make memzero optimization smarter

Arnd Bergmann Jan. 16, 2018, 4:28 p.m. UTC

While testing with a gcc-8.0.0 snapshot, I ran into a harmless
build warning:

In file included from include/linux/string.h:18:0,
		 ...
                 from drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:10:
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c: In function '__lb_other_process':
arch/arm/include/asm/string.h:50:5: error: 'memset' specified size 4294967295 exceeds maximum object size 2147483647 [-Werror=stringop-overflow=]
     memset((__p),(v),(__n));  \
     ^~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:394:3: note: in expansion of macro 'memset'
   memset(&skb->data[frame_size / 2], 0xAA, frame_size / 2 - 1);
   ^~~~~~

I think the warning is unintentional here, and gcc should not actually
warn, so I reported this in the gcc bugzilla as pr82103. From the
discussion there, it seems unlikely that this gets addressed in
the compiler.

The problem here is that testing the 'frame_size' variable for non-zero
in the first memset() macro invocation leads to a code path in which
gcc thinks it may be zero, and that code path would lead to an overly
large length for the following memset that is now "(u32)-1". We know
this won't happen as the skb len is already guaranteed to be nonzero
when we get here (it has just been allocated with a nonzero size).

However, we can avoid this class of bogus warnings for the memset() macro
by only doing the micro-optimization for zero-length arguments when the
length is a compile-time constant. This should also reduce code size by
a few bytes, and avoid an extra branch for the cases that a variable-length
argument is always nonzero, which is probably the common case anyway.

I have made sure that the __memzero implementation can safely handle
a zero length argument.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82103
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---
Originally sent in September 2017, but then forgot about it
as nobody replied.

I slightly updated the change text now to reflect that the gcc
developers treat it as an invalid bug.
---
 arch/arm/include/asm/string.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.9.0

Nicolas Pitre Jan. 16, 2018, 5:10 p.m. UTC | #1

On Tue, 16 Jan 2018, Arnd Bergmann wrote:

> While testing with a gcc-8.0.0 snapshot, I ran into a harmless

> build warning:

> 

> In file included from include/linux/string.h:18:0,

> 		 ...

>                  from drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:10:

> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c: In function '__lb_other_process':

> arch/arm/include/asm/string.h:50:5: error: 'memset' specified size 4294967295 exceeds maximum object size 2147483647 [-Werror=stringop-overflow=]

>      memset((__p),(v),(__n));  \

>      ^~~~~~~~~~~~~~~~~~~~~~~

> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:394:3: note: in expansion of macro 'memset'

>    memset(&skb->data[frame_size / 2], 0xAA, frame_size / 2 - 1);

>    ^~~~~~

> 

> I think the warning is unintentional here, and gcc should not actually

> warn, so I reported this in the gcc bugzilla as pr82103. From the

> discussion there, it seems unlikely that this gets addressed in

> the compiler.

> 

> The problem here is that testing the 'frame_size' variable for non-zero

> in the first memset() macro invocation leads to a code path in which

> gcc thinks it may be zero, and that code path would lead to an overly

> large length for the following memset that is now "(u32)-1". We know

> this won't happen as the skb len is already guaranteed to be nonzero

> when we get here (it has just been allocated with a nonzero size).

> 

> However, we can avoid this class of bogus warnings for the memset() macro

> by only doing the micro-optimization for zero-length arguments when the

> length is a compile-time constant. This should also reduce code size by

> a few bytes, and avoid an extra branch for the cases that a variable-length

> argument is always nonzero, which is probably the common case anyway.

> 

> I have made sure that the __memzero implementation can safely handle

> a zero length argument.


Why not simply drop the test on (__n) != 0 then? I fail to see what the 
advantage is in that case.


> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82103

> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> ---

> Originally sent in September 2017, but then forgot about it

> as nobody replied.

> 

> I slightly updated the change text now to reflect that the gcc

> developers treat it as an invalid bug.

> ---

>  arch/arm/include/asm/string.h | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h

> index f54a3136aac6..a8b90e33cc87 100644

> --- a/arch/arm/include/asm/string.h

> +++ b/arch/arm/include/asm/string.h

> @@ -44,7 +44,7 @@ extern void __memzero(void *ptr, __kernel_size_t n);

>  #define memset(p,v,n)							\

>  	({								\

>  	 	void *__p = (p); size_t __n = n;			\

> -		if ((__n) != 0) {					\

> +		if (!__builtin_constant_p(__n) || (__n) != 0) {		\

>  			if (__builtin_constant_p((v)) && (v) == 0)	\

>  				__memzero((__p),(__n));			\

>  			else						\

> -- 

> 2.9.0

> 

>

Arnd Bergmann Jan. 16, 2018, 10:30 p.m. UTC | #2

On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 16 Jan 2018, Arnd Bergmann wrote:

>

>> While testing with a gcc-8.0.0 snapshot, I ran into a harmless

>> build warning:

>>

>> In file included from include/linux/string.h:18:0,

>>                ...

>>                  from drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:10:

>> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c: In function '__lb_other_process':

>> arch/arm/include/asm/string.h:50:5: error: 'memset' specified size 4294967295 exceeds maximum object size 2147483647 [-Werror=stringop-overflow=]

>>      memset((__p),(v),(__n));  \

>>      ^~~~~~~~~~~~~~~~~~~~~~~

>> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:394:3: note: in expansion of macro 'memset'

>>    memset(&skb->data[frame_size / 2], 0xAA, frame_size / 2 - 1);

>>    ^~~~~~

>>

>> I think the warning is unintentional here, and gcc should not actually

>> warn, so I reported this in the gcc bugzilla as pr82103. From the

>> discussion there, it seems unlikely that this gets addressed in

>> the compiler.

>>

>> The problem here is that testing the 'frame_size' variable for non-zero

>> in the first memset() macro invocation leads to a code path in which

>> gcc thinks it may be zero, and that code path would lead to an overly

>> large length for the following memset that is now "(u32)-1". We know

>> this won't happen as the skb len is already guaranteed to be nonzero

>> when we get here (it has just been allocated with a nonzero size).

>>

>> However, we can avoid this class of bogus warnings for the memset() macro

>> by only doing the micro-optimization for zero-length arguments when the

>> length is a compile-time constant. This should also reduce code size by

>> a few bytes, and avoid an extra branch for the cases that a variable-length

>> argument is always nonzero, which is probably the common case anyway.

>>

>> I have made sure that the __memzero implementation can safely handle

>> a zero length argument.

>

> Why not simply drop the test on (__n) != 0 then? I fail to see what the

> advantage is in that case.


Good point. We might actually get even better results by dropping the
__memzero path entirely, since gcc has can optimize trivial memset()
operations and inline them.

If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr'
instructions compared to the memset.S implementation, but calling
memset() rather than __memzero() from C code ends up saving a
function call at least some of the time.

Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero()
and 636 calls to memset() in vmlinux. If I remove the macro entirely,
I get 1775 calls to memset() instead, so 780 memzero instances got
inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the
__memzero implementation that we could possibly also drop.

FWIW, the zero-length check saves five references to __memzero()
and one reference to memset(), or 16 bytes in kernel size, I have not
checked what those are.

      Arnd

Nicolas Pitre Jan. 17, 2018, 4:07 a.m. UTC | #3

On Tue, 16 Jan 2018, Arnd Bergmann wrote:

> On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:

> > On Tue, 16 Jan 2018, Arnd Bergmann wrote:

> >

> >> However, we can avoid this class of bogus warnings for the memset() macro

> >> by only doing the micro-optimization for zero-length arguments when the

> >> length is a compile-time constant. This should also reduce code size by

> >> a few bytes, and avoid an extra branch for the cases that a variable-length

> >> argument is always nonzero, which is probably the common case anyway.

> >>

> >> I have made sure that the __memzero implementation can safely handle

> >> a zero length argument.

> >

> > Why not simply drop the test on (__n) != 0 then? I fail to see what the

> > advantage is in that case.

> 

> Good point. We might actually get even better results by dropping the

> __memzero path entirely, since gcc has can optimize trivial memset()

> operations and inline them.

> 

> If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr'

> instructions compared to the memset.S implementation, but calling

> memset() rather than __memzero() from C code ends up saving a

> function call at least some of the time.

> 

> Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero()

> and 636 calls to memset() in vmlinux. If I remove the macro entirely,

> I get 1775 calls to memset() instead, so 780 memzero instances got

> inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the

> __memzero implementation that we could possibly also drop.


I get 3668 fewer bytes just by removing the test against 0 in the macro.

And an additional 5092 fewer bytes by removing the call-to-__memzero 
optimization.

That's using gcc v6.3.1.

> FWIW, the zero-length check saves five references to __memzero()

> and one reference to memset(), or 16 bytes in kernel size, I have not

> checked what those are.


They apparently are:

security/keys/key.c:1117:2:
  memset(&ktype->lock_class, 0, sizeof(ktype->lock_class));
crypto/drbg.c:615:3:
   memset(drbg->V, 1, drbg_statelen(drbg));
crypto/drbg.c:1120:3:
   memset(drbg->V, 0, drbg_statelen(drbg));
crypto/drbg.c:1121:3:
   memset(drbg->C, 0, drbg_statelen(drbg));
drivers/crypto/bcm/cipher.c:1963:2:
  memset(ctx->bcm_spu_req_hdr, 0, alloc_len);
drivers/media/platform/vivid/vivid-vbi-cap.c:106:2:
  memset(vbuf, 0x10, vb2_plane_size(&buf->vb.vb2_buf, 0));
drivers/media/platform/vivid/vivid-vbi-cap.c:127:2:
  memset(vbuf, 0, vb2_plane_size(&buf->vb.vb2_buf, 0));
drivers/gpu/drm/nouveau/nvkm/subdev/bios/conn.c:50:2:
  memset(info, 0x00, sizeof(*info));


Nicolas

Russell King (Oracle) Jan. 17, 2018, 10:58 a.m. UTC | #4

On Tue, Jan 16, 2018 at 11:07:34PM -0500, Nicolas Pitre wrote:
> On Tue, 16 Jan 2018, Arnd Bergmann wrote:

> 

> > On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:

> > > On Tue, 16 Jan 2018, Arnd Bergmann wrote:

> > >

> > >> However, we can avoid this class of bogus warnings for the memset() macro

> > >> by only doing the micro-optimization for zero-length arguments when the

> > >> length is a compile-time constant. This should also reduce code size by

> > >> a few bytes, and avoid an extra branch for the cases that a variable-length

> > >> argument is always nonzero, which is probably the common case anyway.

> > >>

> > >> I have made sure that the __memzero implementation can safely handle

> > >> a zero length argument.

> > >

> > > Why not simply drop the test on (__n) != 0 then? I fail to see what the

> > > advantage is in that case.

> > 

> > Good point. We might actually get even better results by dropping the

> > __memzero path entirely, since gcc has can optimize trivial memset()

> > operations and inline them.

> > 

> > If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr'

> > instructions compared to the memset.S implementation, but calling

> > memset() rather than __memzero() from C code ends up saving a

> > function call at least some of the time.

> > 

> > Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero()

> > and 636 calls to memset() in vmlinux. If I remove the macro entirely,

> > I get 1775 calls to memset() instead, so 780 memzero instances got

> > inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the

> > __memzero implementation that we could possibly also drop.

> 

> I get 3668 fewer bytes just by removing the test against 0 in the macro.

> 

> And an additional 5092 fewer bytes by removing the call-to-__memzero 

> optimization.


However, __memzero is not safe against being called with a zero length
so it's not something we can simply remove.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

Nicolas Pitre Jan. 17, 2018, 2:03 p.m. UTC | #5

On Wed, 17 Jan 2018, Russell King - ARM Linux wrote:

> On Tue, Jan 16, 2018 at 11:07:34PM -0500, Nicolas Pitre wrote:

> > On Tue, 16 Jan 2018, Arnd Bergmann wrote:

> > 

> > > On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:

> > > > On Tue, 16 Jan 2018, Arnd Bergmann wrote:

> > > >

> > > >> However, we can avoid this class of bogus warnings for the memset() macro

> > > >> by only doing the micro-optimization for zero-length arguments when the

> > > >> length is a compile-time constant. This should also reduce code size by

> > > >> a few bytes, and avoid an extra branch for the cases that a variable-length

> > > >> argument is always nonzero, which is probably the common case anyway.

> > > >>

> > > >> I have made sure that the __memzero implementation can safely handle

> > > >> a zero length argument.

> > > >

> > > > Why not simply drop the test on (__n) != 0 then? I fail to see what the

> > > > advantage is in that case.

> > > 

> > > Good point. We might actually get even better results by dropping the

> > > __memzero path entirely, since gcc has can optimize trivial memset()

> > > operations and inline them.

> > > 

> > > If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr'

> > > instructions compared to the memset.S implementation, but calling

> > > memset() rather than __memzero() from C code ends up saving a

> > > function call at least some of the time.

> > > 

> > > Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero()

> > > and 636 calls to memset() in vmlinux. If I remove the macro entirely,

> > > I get 1775 calls to memset() instead, so 780 memzero instances got

> > > inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the

> > > __memzero implementation that we could possibly also drop.

> > 

> > I get 3668 fewer bytes just by removing the test against 0 in the macro.

> > 

> > And an additional 5092 fewer bytes by removing the call-to-__memzero 

> > optimization.

> 

> However, __memzero is not safe against being called with a zero length

> so it's not something we can simply remove.


The idea is about the possibility of removing __memzero altogether.
It is not clear that the tiny performance gain from a dedicated memzero 
implementation is worth the current overhead around it.


Nicolas

Nicolas Pitre Jan. 17, 2018, 9:14 p.m. UTC | #6

On Wed, 17 Jan 2018, Nicolas Pitre wrote:

> On Wed, 17 Jan 2018, Russell King - ARM Linux wrote:

> 

> > However, __memzero is not safe against being called with a zero length

> > so it's not something we can simply remove.

> 

> The idea is about the possibility of removing __memzero altogether.

> It is not clear that the tiny performance gain from a dedicated memzero 

> implementation is worth the current overhead around it.


This being said, I fail to see how __memzero is not safe against a zero 
length. Are you sure it isn't?


Nicolas

ARM: make memzero optimization smarter

Commit Message

Comments

Patch