diff mbox series

[1/3,builtins] Generic support for __builtin_load_no_speculate()

Message ID 3f3375cf241d099400b7f90c7c6e42c2e140734c.1515072356.git.Richard.Earnshaw@arm.com
State Superseded
Headers show
Series Add __builtin_load_no_speculate | expand

Commit Message

Richard Earnshaw (lists) Jan. 4, 2018, 1:58 p.m. UTC
This patch adds generic support for the new builtin
__builtin_load_no_speculate.  It provides the overloading of the
different access sizes and a default fall-back expansion for targets
that do not support a mechanism for inhibiting speculation.

	* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
	New builtin type signature.
	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.
	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.
	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.
	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.
	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.
	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.
	* target.def (inhibit_load_speculation): New hook.
	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to
	documentation.
	* doc/tm.texi: Regenerated.
	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.
	* doc/extend.texi: Document __builtin_load_no_speculate.
	* c-family/c-common.c (load_no_speculate_resolve_size): New function.
	(load_no_speculate_resolve_params): New function.
	(load_no_speculate_resolve_return): New function.
	(resolve_overloaded_builtin): Handle overloading
	__builtin_load_no_speculate.
	* builtins.c (expand_load_no_speculate): New function.
	(expand_builtin): Handle new no-speculation builtins.
	* targhooks.h (default_inhibit_load_speculation): Declare.
	* targhooks.c (default_inhibit_load_speculation): New function.
---
 gcc/builtin-types.def       |  16 +++++
 gcc/builtins.c              |  99 ++++++++++++++++++++++++++
 gcc/builtins.def            |  22 ++++++
 gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++
 gcc/c-family/c-cppbuiltin.c |   5 +-
 gcc/doc/cpp.texi            |   4 ++
 gcc/doc/extend.texi         |  53 ++++++++++++++
 gcc/doc/tm.texi             |   6 ++
 gcc/doc/tm.texi.in          |   2 +
 gcc/target.def              |  20 ++++++
 gcc/targhooks.c             |  69 +++++++++++++++++++
 gcc/targhooks.h             |   3 +
 12 files changed, 462 insertions(+), 1 deletion(-)

Comments

Bill Schmidt Jan. 8, 2018, 2:20 a.m. UTC | #1
Hi Richard,

Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
have a way to force resolution of a condition prior to continuing speculation, so this
will just introduce another comparison that we would speculate past.  For our mitigation
we will have to introduce an instruction that halts all speculation at that point, and place
it in front of all dangerous loads.  I wish it were otherwise.

Thanks,
Bill

> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw <richard.earnshaw@arm.com> wrote:

> 

> 

> This patch adds generic support for the new builtin

> __builtin_load_no_speculate.  It provides the overloading of the

> different access sizes and a default fall-back expansion for targets

> that do not support a mechanism for inhibiting speculation.

> 

> 	* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):

> 	New builtin type signature.

> 	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

> 	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

> 	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

> 	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

> 	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.

> 	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.

> 	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.

> 	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.

> 	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.

> 	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.

> 	* target.def (inhibit_load_speculation): New hook.

> 	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to

> 	documentation.

> 	* doc/tm.texi: Regenerated.

> 	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.

> 	* doc/extend.texi: Document __builtin_load_no_speculate.

> 	* c-family/c-common.c (load_no_speculate_resolve_size): New function.

> 	(load_no_speculate_resolve_params): New function.

> 	(load_no_speculate_resolve_return): New function.

> 	(resolve_overloaded_builtin): Handle overloading

> 	__builtin_load_no_speculate.

> 	* builtins.c (expand_load_no_speculate): New function.

> 	(expand_builtin): Handle new no-speculation builtins.

> 	* targhooks.h (default_inhibit_load_speculation): Declare.

> 	* targhooks.c (default_inhibit_load_speculation): New function.

> ---

> gcc/builtin-types.def       |  16 +++++

> gcc/builtins.c              |  99 ++++++++++++++++++++++++++

> gcc/builtins.def            |  22 ++++++

> gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++

> gcc/c-family/c-cppbuiltin.c |   5 +-

> gcc/doc/cpp.texi            |   4 ++

> gcc/doc/extend.texi         |  53 ++++++++++++++

> gcc/doc/tm.texi             |   6 ++

> gcc/doc/tm.texi.in          |   2 +

> gcc/target.def              |  20 ++++++

> gcc/targhooks.c             |  69 +++++++++++++++++++

> gcc/targhooks.h             |   3 +

> 12 files changed, 462 insertions(+), 1 deletion(-)

> 

> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>
Jeff Law Jan. 8, 2018, 4:47 a.m. UTC | #2
On 01/07/2018 07:20 PM, Bill Schmidt wrote:
> Hi Richard,

> 

> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

> have a way to force resolution of a condition prior to continuing speculation, so this

> will just introduce another comparison that we would speculate past.  For our mitigation

> we will have to introduce an instruction that halts all speculation at that point, and place

> it in front of all dangerous loads.  I wish it were otherwise.

So could you have an expander for __builtin_load_no_speculate that just
emits the magic insn that halts all speculation and essentially ignores
the additional stuff that __builtin_load_no_speculate might be able to
do on other platforms?

jeff
Richard Biener Jan. 8, 2018, 11:46 a.m. UTC | #3
On Mon, Jan 8, 2018 at 5:47 AM, Jeff Law <law@redhat.com> wrote:
> On 01/07/2018 07:20 PM, Bill Schmidt wrote:

>> Hi Richard,

>>

>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>> have a way to force resolution of a condition prior to continuing speculation, so this

>> will just introduce another comparison that we would speculate past.  For our mitigation

>> we will have to introduce an instruction that halts all speculation at that point, and place

>> it in front of all dangerous loads.  I wish it were otherwise.

> So could you have an expander for __builtin_load_no_speculate that just

> emits the magic insn that halts all speculation and essentially ignores

> the additional stuff that __builtin_load_no_speculate might be able to

> do on other platforms?


I think you at least need to expand the builtin semantically given as designed
it might consume the condition entirely in the source code.

I also think the user documentation in extend.texi should contain examples on
how to actually use the builtin to mitigate the Spectre attack, that
is, code before
and after using it.

And somebody might want to set up a spectre.html page and some NEWS item
at some point.

Richard.

>

> jeff
Richard Earnshaw (lists) Jan. 8, 2018, 2:06 p.m. UTC | #4
On 08/01/18 02:20, Bill Schmidt wrote:
> Hi Richard,

> 

> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

> have a way to force resolution of a condition prior to continuing speculation, so this

> will just introduce another comparison that we would speculate past.  For our mitigation

> we will have to introduce an instruction that halts all speculation at that point, and place

> it in front of all dangerous loads.  I wish it were otherwise.


So can't you make the builtin expand to (in pseudo code):

	if (bounds_check)
	  {
	    __asm ("barrier");
	    result = *ptr;
          }
        else
	  result = failval;

R.

> 

> Thanks,

> Bill

> 

>> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw <richard.earnshaw@arm.com> wrote:

>>

>>

>> This patch adds generic support for the new builtin

>> __builtin_load_no_speculate.  It provides the overloading of the

>> different access sizes and a default fall-back expansion for targets

>> that do not support a mechanism for inhibiting speculation.

>>

>> 	* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):

>> 	New builtin type signature.

>> 	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>> 	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>> 	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>> 	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>> 	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.

>> 	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.

>> 	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.

>> 	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.

>> 	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.

>> 	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.

>> 	* target.def (inhibit_load_speculation): New hook.

>> 	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to

>> 	documentation.

>> 	* doc/tm.texi: Regenerated.

>> 	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.

>> 	* doc/extend.texi: Document __builtin_load_no_speculate.

>> 	* c-family/c-common.c (load_no_speculate_resolve_size): New function.

>> 	(load_no_speculate_resolve_params): New function.

>> 	(load_no_speculate_resolve_return): New function.

>> 	(resolve_overloaded_builtin): Handle overloading

>> 	__builtin_load_no_speculate.

>> 	* builtins.c (expand_load_no_speculate): New function.

>> 	(expand_builtin): Handle new no-speculation builtins.

>> 	* targhooks.h (default_inhibit_load_speculation): Declare.

>> 	* targhooks.c (default_inhibit_load_speculation): New function.

>> ---

>> gcc/builtin-types.def       |  16 +++++

>> gcc/builtins.c              |  99 ++++++++++++++++++++++++++

>> gcc/builtins.def            |  22 ++++++

>> gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++

>> gcc/c-family/c-cppbuiltin.c |   5 +-

>> gcc/doc/cpp.texi            |   4 ++

>> gcc/doc/extend.texi         |  53 ++++++++++++++

>> gcc/doc/tm.texi             |   6 ++

>> gcc/doc/tm.texi.in          |   2 +

>> gcc/target.def              |  20 ++++++

>> gcc/targhooks.c             |  69 +++++++++++++++++++

>> gcc/targhooks.h             |   3 +

>> 12 files changed, 462 insertions(+), 1 deletion(-)

>>

>> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>

>
Bill Schmidt Jan. 8, 2018, 2:19 p.m. UTC | #5
> On Jan 7, 2018, at 10:47 PM, Jeff Law <law@redhat.com> wrote:

> 

> On 01/07/2018 07:20 PM, Bill Schmidt wrote:

>> Hi Richard,

>> 

>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>> have a way to force resolution of a condition prior to continuing speculation, so this

>> will just introduce another comparison that we would speculate past.  For our mitigation

>> we will have to introduce an instruction that halts all speculation at that point, and place

>> it in front of all dangerous loads.  I wish it were otherwise.

> So could you have an expander for __builtin_load_no_speculate that just

> emits the magic insn that halts all speculation and essentially ignores

> the additional stuff that __builtin_load_no_speculate might be able to

> do on other platforms?


This is possible, but the builtin documentation is completely misleading for
powerpc.  We will not provide the semantics that this builtin claims to provide.
So at a minimum we would need the documentation to indicate that the additional
range-checking is target-specific behavior as well, not just the speculation code.
At that point it isn't really a very target-neutral solution.

What about other targets?  This builtin seems predicated on specific behavior
of ARM architecture; I don't know whether other targets have a guaranteed
speculation-rectifying conditional test.

For POWER, all we would need, or be able to exploit, is 

	void __builtin_speculation_barrier ()

or some such.  If there are two dangerous loads in one block, a single call
to this suffices, but a generic solution involving range checks for specific
loads would require one per load.

Thanks,
Bill  

> 

> jeff

>
Richard Earnshaw (lists) Jan. 8, 2018, 3:23 p.m. UTC | #6
On 08/01/18 14:19, Bill Schmidt wrote:
> 

>> On Jan 7, 2018, at 10:47 PM, Jeff Law <law@redhat.com> wrote:

>>

>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:

>>> Hi Richard,

>>>

>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>>> have a way to force resolution of a condition prior to continuing speculation, so this

>>> will just introduce another comparison that we would speculate past.  For our mitigation

>>> we will have to introduce an instruction that halts all speculation at that point, and place

>>> it in front of all dangerous loads.  I wish it were otherwise.

>> So could you have an expander for __builtin_load_no_speculate that just

>> emits the magic insn that halts all speculation and essentially ignores

>> the additional stuff that __builtin_load_no_speculate might be able to

>> do on other platforms?

> 

> This is possible, but the builtin documentation is completely misleading for

> powerpc.  We will not provide the semantics that this builtin claims to provide.

> So at a minimum we would need the documentation to indicate that the additional

> range-checking is target-specific behavior as well, not just the speculation code.

> At that point it isn't really a very target-neutral solution.

> 

> What about other targets?  This builtin seems predicated on specific behavior

> of ARM architecture; I don't know whether other targets have a guaranteed

> speculation-rectifying conditional test.

> 

> For POWER, all we would need, or be able to exploit, is 

> 

> 	void __builtin_speculation_barrier ()

> 

> or some such.  If there are two dangerous loads in one block, a single call

> to this suffices, but a generic solution involving range checks for specific

> loads would require one per load.

> 


Do you have any data to suggest that multiple /independent/ vulnerable
accesses occur under a single guarding condition more often than 'once
in a blue moon'?  It seems to me that would be highly unlikely.


R.
Bill Schmidt Jan. 8, 2018, 4:01 p.m. UTC | #7
On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 

> On 08/01/18 02:20, Bill Schmidt wrote:

>> Hi Richard,

>> 

>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>> have a way to force resolution of a condition prior to continuing speculation, so this

>> will just introduce another comparison that we would speculate past.  For our mitigation

>> we will have to introduce an instruction that halts all speculation at that point, and place

>> it in front of all dangerous loads.  I wish it were otherwise.

> 

> So can't you make the builtin expand to (in pseudo code):

> 

> 	if (bounds_check)

> 	  {

> 	    __asm ("barrier");

> 	    result = *ptr;

>          }

>        else

> 	  result = failval;


Could, but this just generates unnecessary code for Power.  We would instead generate

	__asm ("barrier");
	result = *ptr;

without any checks.  We would ignore everything but the first argument.

Thanks,
Bill

> 

> R.

> 

>> 

>> Thanks,

>> Bill

>> 

>>> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw <richard.earnshaw@arm.com> wrote:

>>> 

>>> 

>>> This patch adds generic support for the new builtin

>>> __builtin_load_no_speculate.  It provides the overloading of the

>>> different access sizes and a default fall-back expansion for targets

>>> that do not support a mechanism for inhibiting speculation.

>>> 

>>> 	* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):

>>> 	New builtin type signature.

>>> 	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>> 	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>> 	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>> 	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>> 	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.

>>> 	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.

>>> 	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.

>>> 	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.

>>> 	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.

>>> 	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.

>>> 	* target.def (inhibit_load_speculation): New hook.

>>> 	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to

>>> 	documentation.

>>> 	* doc/tm.texi: Regenerated.

>>> 	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.

>>> 	* doc/extend.texi: Document __builtin_load_no_speculate.

>>> 	* c-family/c-common.c (load_no_speculate_resolve_size): New function.

>>> 	(load_no_speculate_resolve_params): New function.

>>> 	(load_no_speculate_resolve_return): New function.

>>> 	(resolve_overloaded_builtin): Handle overloading

>>> 	__builtin_load_no_speculate.

>>> 	* builtins.c (expand_load_no_speculate): New function.

>>> 	(expand_builtin): Handle new no-speculation builtins.

>>> 	* targhooks.h (default_inhibit_load_speculation): Declare.

>>> 	* targhooks.c (default_inhibit_load_speculation): New function.

>>> ---

>>> gcc/builtin-types.def       |  16 +++++

>>> gcc/builtins.c              |  99 ++++++++++++++++++++++++++

>>> gcc/builtins.def            |  22 ++++++

>>> gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++

>>> gcc/c-family/c-cppbuiltin.c |   5 +-

>>> gcc/doc/cpp.texi            |   4 ++

>>> gcc/doc/extend.texi         |  53 ++++++++++++++

>>> gcc/doc/tm.texi             |   6 ++

>>> gcc/doc/tm.texi.in          |   2 +

>>> gcc/target.def              |  20 ++++++

>>> gcc/targhooks.c             |  69 +++++++++++++++++++

>>> gcc/targhooks.h             |   3 +

>>> 12 files changed, 462 insertions(+), 1 deletion(-)

>>> 

>>> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>

>> 

>
Bill Schmidt Jan. 8, 2018, 4:52 p.m. UTC | #8
On Jan 8, 2018, at 9:23 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 

> On 08/01/18 14:19, Bill Schmidt wrote:

>> 

>>> On Jan 7, 2018, at 10:47 PM, Jeff Law <law@redhat.com> wrote:

>>> 

>>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:

>>>> Hi Richard,

>>>> 

>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>>>> have a way to force resolution of a condition prior to continuing speculation, so this

>>>> will just introduce another comparison that we would speculate past.  For our mitigation

>>>> we will have to introduce an instruction that halts all speculation at that point, and place

>>>> it in front of all dangerous loads.  I wish it were otherwise.

>>> So could you have an expander for __builtin_load_no_speculate that just

>>> emits the magic insn that halts all speculation and essentially ignores

>>> the additional stuff that __builtin_load_no_speculate might be able to

>>> do on other platforms?

>> 

>> This is possible, but the builtin documentation is completely misleading for

>> powerpc.  We will not provide the semantics that this builtin claims to provide.

>> So at a minimum we would need the documentation to indicate that the additional

>> range-checking is target-specific behavior as well, not just the speculation code.

>> At that point it isn't really a very target-neutral solution.

>> 

>> What about other targets?  This builtin seems predicated on specific behavior

>> of ARM architecture; I don't know whether other targets have a guaranteed

>> speculation-rectifying conditional test.

>> 

>> For POWER, all we would need, or be able to exploit, is 

>> 

>> 	void __builtin_speculation_barrier ()

>> 

>> or some such.  If there are two dangerous loads in one block, a single call

>> to this suffices, but a generic solution involving range checks for specific

>> loads would require one per load.

>> 

> 

> Do you have any data to suggest that multiple /independent/ vulnerable

> accesses occur under a single guarding condition more often than 'once

> in a blue moon'?  It seems to me that would be highly unlikely.


No, I agree with that.

This is more thinking ahead about the problem of trying to identify these
cases automatically.  Anything we do there is going to be necessarily 
conservative, so more loads may look dangerous than a human user can 
identify.  But for something like that then you probably aren't looking at
using __builtin_load_no_speculate anyhow.

Thanks,
Bill

> 

> 

> R.
Jeff Law Jan. 8, 2018, 7:40 p.m. UTC | #9
On 01/08/2018 07:19 AM, Bill Schmidt wrote:
> 

>> On Jan 7, 2018, at 10:47 PM, Jeff Law <law@redhat.com> wrote:

>>

>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:

>>> Hi Richard,

>>>

>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>>> have a way to force resolution of a condition prior to continuing speculation, so this

>>> will just introduce another comparison that we would speculate past.  For our mitigation

>>> we will have to introduce an instruction that halts all speculation at that point, and place

>>> it in front of all dangerous loads.  I wish it were otherwise.

>> So could you have an expander for __builtin_load_no_speculate that just

>> emits the magic insn that halts all speculation and essentially ignores

>> the additional stuff that __builtin_load_no_speculate might be able to

>> do on other platforms?

> 

> This is possible, but the builtin documentation is completely misleading for

> powerpc.  We will not provide the semantics that this builtin claims to provide.

> So at a minimum we would need the documentation to indicate that the additional

> range-checking is target-specific behavior as well, not just the speculation code.

> At that point it isn't really a very target-neutral solution.

> 

> What about other targets?  This builtin seems predicated on specific behavior

> of ARM architecture; I don't know whether other targets have a guaranteed

> speculation-rectifying conditional test.

> 

> For POWER, all we would need, or be able to exploit, is 

> 

> 	void __builtin_speculation_barrier ()

> 

> or some such.  If there are two dangerous loads in one block, a single call

> to this suffices, but a generic solution involving range checks for specific

> loads would require one per load.

So my concern is that if we have multiple builtins to deal with this
problem, then we're forcing the pain of figuring out which one to use
onto the users.

I'd really like there to be a single builtin to address this problem.
Otherwise the kernel (and anyone else that wants to use this stuff) is
stuck with using both, conditional compilation or something along those
lines which seems like a huge lose.

We'd really like them to be able to add one appropriate
__builtin_whatever call at the key site(s) that does the right thing
regardless of the architecture.

I think that implies we're likely to have arguments that are unused on
some architectures.  I can live with that.  But it also implies we need
better language around the semantics.

As you mention -- there's some belief that we're going to want to do
something for automatic detection in the.  I think a builtin for that
could well be different than what we provide to the kernel folks in the
immediate term.  I want to focus first on getting a builtin the kernel
guys can use today as needed though.

Jeff
Bill Schmidt Jan. 8, 2018, 9:03 p.m. UTC | #10
On Jan 8, 2018, at 1:40 PM, Jeff Law <law@redhat.com> wrote:
> 

> On 01/08/2018 07:19 AM, Bill Schmidt wrote:

>> 

>>> On Jan 7, 2018, at 10:47 PM, Jeff Law <law@redhat.com> wrote:

>>> 

>>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:

>>>> Hi Richard,

>>>> 

>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>>>> have a way to force resolution of a condition prior to continuing speculation, so this

>>>> will just introduce another comparison that we would speculate past.  For our mitigation

>>>> we will have to introduce an instruction that halts all speculation at that point, and place

>>>> it in front of all dangerous loads.  I wish it were otherwise.

>>> So could you have an expander for __builtin_load_no_speculate that just

>>> emits the magic insn that halts all speculation and essentially ignores

>>> the additional stuff that __builtin_load_no_speculate might be able to

>>> do on other platforms?

>> 

>> This is possible, but the builtin documentation is completely misleading for

>> powerpc.  We will not provide the semantics that this builtin claims to provide.

>> So at a minimum we would need the documentation to indicate that the additional

>> range-checking is target-specific behavior as well, not just the speculation code.

>> At that point it isn't really a very target-neutral solution.

>> 

>> What about other targets?  This builtin seems predicated on specific behavior

>> of ARM architecture; I don't know whether other targets have a guaranteed

>> speculation-rectifying conditional test.

>> 

>> For POWER, all we would need, or be able to exploit, is 

>> 

>> 	void __builtin_speculation_barrier ()

>> 

>> or some such.  If there are two dangerous loads in one block, a single call

>> to this suffices, but a generic solution involving range checks for specific

>> loads would require one per load.

> So my concern is that if we have multiple builtins to deal with this

> problem, then we're forcing the pain of figuring out which one to use

> onto the users.

> 

> I'd really like there to be a single builtin to address this problem.

> Otherwise the kernel (and anyone else that wants to use this stuff) is

> stuck with using both, conditional compilation or something along those

> lines which seems like a huge lose.

> 

> We'd really like them to be able to add one appropriate

> __builtin_whatever call at the key site(s) that does the right thing

> regardless of the architecture.

> 

> I think that implies we're likely to have arguments that are unused on

> some architectures.  I can live with that.  But it also implies we need

> better language around the semantics.

> 

> As you mention -- there's some belief that we're going to want to do

> something for automatic detection in the.  I think a builtin for that

> could well be different than what we provide to the kernel folks in the

> immediate term.  I want to focus first on getting a builtin the kernel

> guys can use today as needed though.


Hi Jeff,

I agree 100% with this approach.  I just wanted to raise the point in case
other architectures have different needs.  Power can work around this
by just ignoring 4 of the 5 arguments.  As long as nobody else needs
*additional* arguments, this should work out just fine.  But I want to be clear
that the only guarantee of the semantics for everybody is that "speculation 
stops here," while on some processors it may be "speculation stops here
if out of range."  If we can write this into the documentation, then I'm fine
writing a target expander for Power as discussed.

I had a brief interchange with Richi last week, and he suggested that for
the automatic detection we might look into flagging MEM_REFs rather
than inserting a built-in; a target hook can still handle such a flag.  That
has some advantages and some disadvantages that I can think of, so
we'll have to talk that out on the list over time after we get through the
crisis mode reactions.

Thanks!

Bill

> 

> Jeff
Richard Earnshaw (lists) Jan. 9, 2018, 10:21 a.m. UTC | #11
On 08/01/18 16:01, Bill Schmidt wrote:
> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:

>>

>> On 08/01/18 02:20, Bill Schmidt wrote:

>>> Hi Richard,

>>>

>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>>> have a way to force resolution of a condition prior to continuing speculation, so this

>>> will just introduce another comparison that we would speculate past.  For our mitigation

>>> we will have to introduce an instruction that halts all speculation at that point, and place

>>> it in front of all dangerous loads.  I wish it were otherwise.

>>

>> So can't you make the builtin expand to (in pseudo code):

>>

>> 	if (bounds_check)

>> 	  {

>> 	    __asm ("barrier");

>> 	    result = *ptr;

>>          }

>>        else

>> 	  result = failval;

> 

> Could, but this just generates unnecessary code for Power.  We would instead generate

> 

> 	__asm ("barrier");

> 	result = *ptr;

> 

> without any checks.  We would ignore everything but the first argument.


You can't do that with the builtin as it is currently specified as it
also has a defined behaviour for the result it returns.  You can,
however, expand the code as normal RTL and let the optimizers remove any
redundant code if they can make that deduction and you don't need the
additional behaviour.

R.

> 

> Thanks,

> Bill

> 

>>

>> R.

>>

>>>

>>> Thanks,

>>> Bill

>>>

>>>> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw <richard.earnshaw@arm.com> wrote:

>>>>

>>>>

>>>> This patch adds generic support for the new builtin

>>>> __builtin_load_no_speculate.  It provides the overloading of the

>>>> different access sizes and a default fall-back expansion for targets

>>>> that do not support a mechanism for inhibiting speculation.

>>>>

>>>> 	* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):

>>>> 	New builtin type signature.

>>>> 	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>>> 	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>>> 	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>>> 	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>>> 	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.

>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.

>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.

>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.

>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.

>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.

>>>> 	* target.def (inhibit_load_speculation): New hook.

>>>> 	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to

>>>> 	documentation.

>>>> 	* doc/tm.texi: Regenerated.

>>>> 	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.

>>>> 	* doc/extend.texi: Document __builtin_load_no_speculate.

>>>> 	* c-family/c-common.c (load_no_speculate_resolve_size): New function.

>>>> 	(load_no_speculate_resolve_params): New function.

>>>> 	(load_no_speculate_resolve_return): New function.

>>>> 	(resolve_overloaded_builtin): Handle overloading

>>>> 	__builtin_load_no_speculate.

>>>> 	* builtins.c (expand_load_no_speculate): New function.

>>>> 	(expand_builtin): Handle new no-speculation builtins.

>>>> 	* targhooks.h (default_inhibit_load_speculation): Declare.

>>>> 	* targhooks.c (default_inhibit_load_speculation): New function.

>>>> ---

>>>> gcc/builtin-types.def       |  16 +++++

>>>> gcc/builtins.c              |  99 ++++++++++++++++++++++++++

>>>> gcc/builtins.def            |  22 ++++++

>>>> gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++

>>>> gcc/c-family/c-cppbuiltin.c |   5 +-

>>>> gcc/doc/cpp.texi            |   4 ++

>>>> gcc/doc/extend.texi         |  53 ++++++++++++++

>>>> gcc/doc/tm.texi             |   6 ++

>>>> gcc/doc/tm.texi.in          |   2 +

>>>> gcc/target.def              |  20 ++++++

>>>> gcc/targhooks.c             |  69 +++++++++++++++++++

>>>> gcc/targhooks.h             |   3 +

>>>> 12 files changed, 462 insertions(+), 1 deletion(-)

>>>>

>>>> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>

>>>

>>

>
Bill Schmidt Jan. 9, 2018, 5:11 p.m. UTC | #12
On Jan 9, 2018, at 4:21 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 

> On 08/01/18 16:01, Bill Schmidt wrote:

>> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:

>>> 

>>> On 08/01/18 02:20, Bill Schmidt wrote:

>>>> Hi Richard,

>>>> 

>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>>>> have a way to force resolution of a condition prior to continuing speculation, so this

>>>> will just introduce another comparison that we would speculate past.  For our mitigation

>>>> we will have to introduce an instruction that halts all speculation at that point, and place

>>>> it in front of all dangerous loads.  I wish it were otherwise.

>>> 

>>> So can't you make the builtin expand to (in pseudo code):

>>> 

>>> 	if (bounds_check)

>>> 	  {

>>> 	    __asm ("barrier");

>>> 	    result = *ptr;

>>>         }

>>>       else

>>> 	  result = failval;

>> 

>> Could, but this just generates unnecessary code for Power.  We would instead generate

>> 

>> 	__asm ("barrier");

>> 	result = *ptr;

>> 

>> without any checks.  We would ignore everything but the first argument.

> 

> You can't do that with the builtin as it is currently specified as it

> also has a defined behaviour for the result it returns.  You can,

> however, expand the code as normal RTL and let the optimizers remove any

> redundant code if they can make that deduction and you don't need the

> additional behaviour.


But that's my original point.  "As currently specified" is overspecified for
our architecture, and expanding the extra code *hoping* it will go away
is not something I feel we should do.  If our hopes are dashed, we end up
with yet worse performance.  If we're going to use something generic
for everyone, then I argue that the semantics may not be the same for
all targets, and that this needs to be specified in the documentation.

I'm just looking for a solution that works for everyone.

Thanks,
Bill

> 

> R.

> 

>> 

>> Thanks,

>> Bill

>> 

>>> 

>>> R.

>>> 

>>>> 

>>>> Thanks,

>>>> Bill

>>>> 

>>>>> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw <richard.earnshaw@arm.com> wrote:

>>>>> 

>>>>> 

>>>>> This patch adds generic support for the new builtin

>>>>> __builtin_load_no_speculate.  It provides the overloading of the

>>>>> different access sizes and a default fall-back expansion for targets

>>>>> that do not support a mechanism for inhibiting speculation.

>>>>> 

>>>>> 	* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):

>>>>> 	New builtin type signature.

>>>>> 	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>>>> 	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>>>> 	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>>>> 	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.

>>>>> 	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.

>>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.

>>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.

>>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.

>>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.

>>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.

>>>>> 	* target.def (inhibit_load_speculation): New hook.

>>>>> 	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to

>>>>> 	documentation.

>>>>> 	* doc/tm.texi: Regenerated.

>>>>> 	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.

>>>>> 	* doc/extend.texi: Document __builtin_load_no_speculate.

>>>>> 	* c-family/c-common.c (load_no_speculate_resolve_size): New function.

>>>>> 	(load_no_speculate_resolve_params): New function.

>>>>> 	(load_no_speculate_resolve_return): New function.

>>>>> 	(resolve_overloaded_builtin): Handle overloading

>>>>> 	__builtin_load_no_speculate.

>>>>> 	* builtins.c (expand_load_no_speculate): New function.

>>>>> 	(expand_builtin): Handle new no-speculation builtins.

>>>>> 	* targhooks.h (default_inhibit_load_speculation): Declare.

>>>>> 	* targhooks.c (default_inhibit_load_speculation): New function.

>>>>> ---

>>>>> gcc/builtin-types.def       |  16 +++++

>>>>> gcc/builtins.c              |  99 ++++++++++++++++++++++++++

>>>>> gcc/builtins.def            |  22 ++++++

>>>>> gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++

>>>>> gcc/c-family/c-cppbuiltin.c |   5 +-

>>>>> gcc/doc/cpp.texi            |   4 ++

>>>>> gcc/doc/extend.texi         |  53 ++++++++++++++

>>>>> gcc/doc/tm.texi             |   6 ++

>>>>> gcc/doc/tm.texi.in          |   2 +

>>>>> gcc/target.def              |  20 ++++++

>>>>> gcc/targhooks.c             |  69 +++++++++++++++++++

>>>>> gcc/targhooks.h             |   3 +

>>>>> 12 files changed, 462 insertions(+), 1 deletion(-)

>>>>> 

>>>>> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>
Jeff Law Jan. 10, 2018, 11:26 p.m. UTC | #13
On 01/08/2018 09:01 AM, Bill Schmidt wrote:
> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:

>>

>> On 08/01/18 02:20, Bill Schmidt wrote:

>>> Hi Richard,

>>>

>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>>> have a way to force resolution of a condition prior to continuing speculation, so this

>>> will just introduce another comparison that we would speculate past.  For our mitigation

>>> we will have to introduce an instruction that halts all speculation at that point, and place

>>> it in front of all dangerous loads.  I wish it were otherwise.

>>

>> So can't you make the builtin expand to (in pseudo code):

>>

>> 	if (bounds_check)

>> 	  {

>> 	    __asm ("barrier");

>> 	    result = *ptr;

>>          }

>>        else

>> 	  result = failval;

> 

> Could, but this just generates unnecessary code for Power.  We would instead generate

> 

> 	__asm ("barrier");

> 	result = *ptr;

> 

> without any checks.  We would ignore everything but the first argument.

So how bad would it be to drop the whole concept of failval and make the
result indeterminate in the out of bounds case.


Would that give us enough freedom to generate an appropriate sequence
for aarch64 and ppc?  It feels like these two architectures are
essentially on opposite sides of the spectrum and if we can find a
reasonable way to handle them that we'd likely have semantics we can use
on just about any architecture.


jeff
Jeff Law Jan. 10, 2018, 11:32 p.m. UTC | #14
On 01/09/2018 10:11 AM, Bill Schmidt wrote:
> On Jan 9, 2018, at 4:21 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:

>>

>> On 08/01/18 16:01, Bill Schmidt wrote:

>>> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:

>>>>

>>>> On 08/01/18 02:20, Bill Schmidt wrote:

>>>>> Hi Richard,

>>>>>

>>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>>>>> have a way to force resolution of a condition prior to continuing speculation, so this

>>>>> will just introduce another comparison that we would speculate past.  For our mitigation

>>>>> we will have to introduce an instruction that halts all speculation at that point, and place

>>>>> it in front of all dangerous loads.  I wish it were otherwise.

>>>>

>>>> So can't you make the builtin expand to (in pseudo code):

>>>>

>>>> 	if (bounds_check)

>>>> 	  {

>>>> 	    __asm ("barrier");

>>>> 	    result = *ptr;

>>>>         }

>>>>       else

>>>> 	  result = failval;

>>>

>>> Could, but this just generates unnecessary code for Power.  We would instead generate

>>>

>>> 	__asm ("barrier");

>>> 	result = *ptr;

>>>

>>> without any checks.  We would ignore everything but the first argument.

>>

>> You can't do that with the builtin as it is currently specified as it

>> also has a defined behaviour for the result it returns.  You can,

>> however, expand the code as normal RTL and let the optimizers remove any

>> redundant code if they can make that deduction and you don't need the

>> additional behaviour.

> 

> But that's my original point.  "As currently specified" is overspecified for

> our architecture, and expanding the extra code *hoping* it will go away

> is not something I feel we should do.  If our hopes are dashed, we end up

> with yet worse performance.  If we're going to use something generic

> for everyone, then I argue that the semantics may not be the same for

> all targets, and that this needs to be specified in the documentation.

Other than in the case where bounds_check is a compile-time constant I
don't see that the compiler is likely to drop the else clause above.  So
relying on the compiler to optimize away the redundant code doesn't seem
like a viable option.

However, if we relax the semantics on failval, then I think we get
enough freedom to generate the desired code on PPC.

> I'm just looking for a solution that works for everyone.

Likewise.  It'd be unfortunate if we end up with two distinct builtins
and the kernel buys have to write some level of abstraction on top of
them to select between them based on the target.


jeff
Jeff Law Jan. 10, 2018, 11:41 p.m. UTC | #15
On 01/08/2018 02:03 PM, Bill Schmidt wrote:
> 

> I agree 100% with this approach.  I just wanted to raise the point in case

> other architectures have different needs.  Power can work around this

> by just ignoring 4 of the 5 arguments.  As long as nobody else needs

> *additional* arguments, this should work out just fine.  But I want to be clear

> that the only guarantee of the semantics for everybody is that "speculation 

> stops here," while on some processors it may be "speculation stops here

> if out of range."  If we can write this into the documentation, then I'm fine

> writing a target expander for Power as discussed.

My recollection of other micro-architectures and how they handle
conditional moves makes me believe that the test and conditional move
may be enough to stop the rampant speculation that causes the problems.
They're just enough of a fence to provide a level of mitigation.

So I see value in providing those arguments for architectures other than
ARM/AArch64.

What I think we're really trying to nail down is how crisply the
semantics of this builtin are, particularly around the need to test and
provide a failval.

I'm actually going to be in a meeting with another chip vendor tomorrow
AM and will make a point to discuss this with them and see what
direction they want to go.  I suspect they're closer to PPC in terms of
the semantics they want, but need to verify.


> 

> I had a brief interchange with Richi last week, and he suggested that for

> the automatic detection we might look into flagging MEM_REFs rather

> than inserting a built-in; a target hook can still handle such a flag.  That

> has some advantages and some disadvantages that I can think of, so

> we'll have to talk that out on the list over time after we get through the

> crisis mode reactions.

I think someone could likely spend a huge amount of time in this space.
Both with the analysis around finding potentially vulnerable sequences,
generating appropriate mitigation code and optimizing that to be as
painless as possible.  But as you say, this is something we should hash
out post-crisis.

jeff
Richard Earnshaw (lists) Jan. 12, 2018, 4:08 p.m. UTC | #16
On 10/01/18 23:26, Jeff Law wrote:
> On 01/08/2018 09:01 AM, Bill Schmidt wrote:

>> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:

>>>

>>> On 08/01/18 02:20, Bill Schmidt wrote:

>>>> Hi Richard,

>>>>

>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>>>> have a way to force resolution of a condition prior to continuing speculation, so this

>>>> will just introduce another comparison that we would speculate past.  For our mitigation

>>>> we will have to introduce an instruction that halts all speculation at that point, and place

>>>> it in front of all dangerous loads.  I wish it were otherwise.

>>>

>>> So can't you make the builtin expand to (in pseudo code):

>>>

>>> 	if (bounds_check)

>>> 	  {

>>> 	    __asm ("barrier");

>>> 	    result = *ptr;

>>>          }

>>>        else

>>> 	  result = failval;

>>

>> Could, but this just generates unnecessary code for Power.  We would instead generate

>>

>> 	__asm ("barrier");

>> 	result = *ptr;

>>

>> without any checks.  We would ignore everything but the first argument.

> So how bad would it be to drop the whole concept of failval and make the

> result indeterminate in the out of bounds case.

> 


Indeterminate on its own is too loose.  An arbitrary value fetched from
memory is 'indeterminate', so we'd need tighter wording that that, see
below.

> 

> Would that give us enough freedom to generate an appropriate sequence

> for aarch64 and ppc?  It feels like these two architectures are

> essentially on opposite sides of the spectrum and if we can find a

> reasonable way to handle them that we'd likely have semantics we can use

> on just about any architecture.

> 

> 

> jeff

> 


So if we changed the specification (using the same parameter names) to:

1) When the call to the builtin is not being speculatively executed the
   result is *ptr if lower_bound <= cmpptr < upper_bound and undefined
   otherwise.
2) When code is being speculatively executed either:
   a) execution of subsequent instructions that depend on the result
      will be prevented until it can be proven that the call to the
      builtin is not being speculatively executed (ie until execution
      can continue under point 1), or
   b) speculation may continue using *ptr as the result when
      lower_bound <= cmpptr < upper_bound, or an unspecified constant
      value (eg 0) if cmpptr lies outside that range.

Then I think that meets those requirements.

We could still implement the existing relaxations on permitting the
builtin to be called with only one of upper and lower.

My concern is that this would make it quite hard for programmers to
reason safely about the behaviour overall.

R.
Bill Schmidt Jan. 12, 2018, 10:44 p.m. UTC | #17
Hi Richard and Jeff,

Sorry I missed this earlier today, it somehow ended up in my spam folder...

> On Jan 12, 2018, at 10:08 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:

> 

> On 10/01/18 23:26, Jeff Law wrote:

>> On 01/08/2018 09:01 AM, Bill Schmidt wrote:

>>> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:

>>>> 

>>>> On 08/01/18 02:20, Bill Schmidt wrote:

>>>>> Hi Richard,

>>>>> 

>>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't

>>>>> have a way to force resolution of a condition prior to continuing speculation, so this

>>>>> will just introduce another comparison that we would speculate past.  For our mitigation

>>>>> we will have to introduce an instruction that halts all speculation at that point, and place

>>>>> it in front of all dangerous loads.  I wish it were otherwise.

>>>> 

>>>> So can't you make the builtin expand to (in pseudo code):

>>>> 

>>>> 	if (bounds_check)

>>>> 	  {

>>>> 	    __asm ("barrier");

>>>> 	    result = *ptr;

>>>>         }

>>>>       else

>>>> 	  result = failval;

>>> 

>>> Could, but this just generates unnecessary code for Power.  We would instead generate

>>> 

>>> 	__asm ("barrier");

>>> 	result = *ptr;

>>> 

>>> without any checks.  We would ignore everything but the first argument.

>> So how bad would it be to drop the whole concept of failval and make the

>> result indeterminate in the out of bounds case.

>> 

> 

> Indeterminate on its own is too loose.  An arbitrary value fetched from

> memory is 'indeterminate', so we'd need tighter wording that that, see

> below.

> 

>> 

>> Would that give us enough freedom to generate an appropriate sequence

>> for aarch64 and ppc?  It feels like these two architectures are

>> essentially on opposite sides of the spectrum and if we can find a

>> reasonable way to handle them that we'd likely have semantics we can use

>> on just about any architecture.

>> 

>> 

>> jeff

>> 

> 

> So if we changed the specification (using the same parameter names) to:

> 

> 1) When the call to the builtin is not being speculatively executed the

>   result is *ptr if lower_bound <= cmpptr < upper_bound and undefined

>   otherwise.

> 2) When code is being speculatively executed either:

>   a) execution of subsequent instructions that depend on the result

>      will be prevented until it can be proven that the call to the

>      builtin is not being speculatively executed (ie until execution

>      can continue under point 1), or

>   b) speculation may continue using *ptr as the result when

>      lower_bound <= cmpptr < upper_bound, or an unspecified constant

>      value (eg 0) if cmpptr lies outside that range.

> 

> Then I think that meets those requirements.


This is quite acceptable for Power.  Thanks, Richard, for your flexibility!

Bill

> 

> We could still implement the existing relaxations on permitting the

> builtin to be called with only one of upper and lower.

> 

> My concern is that this would make it quite hard for programmers to

> reason safely about the behaviour overall.

> 

> R.
diff mbox series

Patch

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index bb50e60..259aacd 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -785,6 +785,22 @@  DEF_FUNCTION_TYPE_VAR_3 (BT_FN_SSIZE_STRING_SIZE_CONST_STRING_VAR,
 DEF_FUNCTION_TYPE_VAR_3 (BT_FN_INT_FILEPTR_INT_CONST_STRING_VAR,
 			 BT_INT, BT_FILEPTR, BT_INT, BT_CONST_STRING)
 
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I1, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I2, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I4, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I8, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I16, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+
 DEF_FUNCTION_TYPE_VAR_4 (BT_FN_INT_STRING_INT_SIZE_CONST_STRING_VAR,
 			 BT_INT, BT_STRING, BT_INT, BT_SIZE, BT_CONST_STRING)
 
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 98eb804..1bdbc64 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6602,6 +6602,97 @@  expand_stack_save (void)
   return ret;
 }
 
+/* Expand a call to __builtin_load_no_speculate_<N>.  MODE represents the
+   size of the first argument to that call.  We emit a warning if the
+   result isn't used (IGNORE != 0), since the implementation might
+   rely on the value being used to correctly inhibit speculation.  */
+static rtx
+expand_load_no_speculate (machine_mode mode, tree exp, rtx target, int ignore)
+{
+  rtx ptr, op0, op1, op2, op3, op4;
+  unsigned nargs = call_expr_nargs (exp);
+
+  if (ignore)
+    {
+      warning_at (input_location, 0,
+		  "result of __builtin_load_no_speculate must be used to "
+		  "ensure correct operation");
+      target = NULL;
+    }
+
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  tree arg2 = CALL_EXPR_ARG (exp, 2);
+
+  ptr = expand_expr (arg0, NULL_RTX, ptr_mode, EXPAND_SUM);
+  op0 = validize_mem (gen_rtx_MEM (mode, convert_memory_address (Pmode, ptr)));
+
+  set_mem_align (op0, MAX (GET_MODE_ALIGNMENT (mode),
+			   get_pointer_alignment (arg0)));
+  set_mem_alias_set (op0, get_alias_set (TREE_TYPE (TREE_TYPE (arg0))));
+
+  /* Mark the memory access as volatile.  We don't want the optimizers to
+     move it or otherwise substitue an alternative value.  */
+  MEM_VOLATILE_P (op0) = 1;
+
+  if (integer_zerop (tree_strip_nop_conversions (arg1)))
+    op1 = NULL;
+  else
+    {
+      op1 = expand_normal (arg1);
+      if (GET_MODE (op1) != ptr_mode && GET_MODE (op1) != VOIDmode)
+	op1 = convert_modes (ptr_mode, VOIDmode, op1,
+			     TYPE_UNSIGNED (TREE_TYPE (arg1)));
+    }
+
+  if (integer_zerop (tree_strip_nop_conversions (arg2)))
+    op2 = NULL;
+  else
+    {
+      op2 = expand_normal (arg2);
+      if (GET_MODE (op2) != ptr_mode && GET_MODE (op2) != VOIDmode)
+	op2 = convert_modes (ptr_mode, VOIDmode, op2,
+			     TYPE_UNSIGNED (TREE_TYPE (arg2)));
+    }
+
+  if (nargs > 3)
+    {
+      tree arg3 = CALL_EXPR_ARG (exp, 3);
+      op3 = expand_normal (arg3);
+      if (CONST_INT_P (op3))
+	op3 = gen_int_mode (INTVAL (op3), mode);
+      else if (GET_MODE (op3) != mode && GET_MODE (op3) != VOIDmode)
+	op3 = convert_modes (mode, VOIDmode, op3,
+			     TYPE_UNSIGNED (TREE_TYPE (arg3)));
+    }
+  else
+    op3 = const0_rtx;
+
+  if (nargs > 4)
+    {
+      tree arg4 = CALL_EXPR_ARG (exp, 4);
+      op4 = expand_normal (arg4);
+      if (GET_MODE (op4) != ptr_mode && GET_MODE (op4) != VOIDmode)
+	op4 = convert_modes (ptr_mode, VOIDmode, op4,
+			     TYPE_UNSIGNED (TREE_TYPE (arg4)));
+    }
+  else
+    op4 = ptr;
+
+  if (op1 == NULL && op2 == NULL)
+    {
+      error_at (input_location,
+		"at least one speculation bound must be non-NULL");
+      /* Ensure we don't crash later.  */
+      op1 = op4;
+    }
+
+  if (target == NULL)
+    target = gen_reg_rtx (mode);
+
+  return targetm.inhibit_load_speculation (mode, target, op0, op1, op2, op3,
+					   op4);
+}
 
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
@@ -7732,6 +7823,14 @@  expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 	 folding.  */
       break;
 
+    case BUILT_IN_LOAD_NO_SPECULATE_1:
+    case BUILT_IN_LOAD_NO_SPECULATE_2:
+    case BUILT_IN_LOAD_NO_SPECULATE_4:
+    case BUILT_IN_LOAD_NO_SPECULATE_8:
+    case BUILT_IN_LOAD_NO_SPECULATE_16:
+      mode = get_builtin_sync_mode (fcode - BUILT_IN_LOAD_NO_SPECULATE_1);
+      return expand_load_no_speculate (mode, exp, target, ignore);
+
     default:	/* just do library call, if unknown builtin */
       break;
     }
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 671097e..761c063 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1017,6 +1017,28 @@  DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON,
 	     true, true, true, ATTR_NOTHROW_LEAF_LIST, false,
 	     !targetm.have_tls)
 
+/* Suppressing speculation.  Users are expected to use the first (N)
+   variant, which will be translated internally into one of the other
+   types.  */
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_N, "load_no_speculate",
+		 BT_FN_VOID_VAR, ATTR_NULL)
+
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_1, "load_no_speculate_1",
+		 BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		 ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_2, "load_no_speculate_2",
+		 BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		 ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_4, "load_no_speculate_4",
+		 BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		 ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_8, "load_no_speculate_8",
+		 BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		 ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_16, "load_no_speculate_16",
+		 BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		 ATTR_NULL)
+
 /* Exception support.  */
 DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume")
 DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup")
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 197a71f..c213ffd 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -6456,6 +6456,146 @@  builtin_type_for_size (int size, bool unsignedp)
   return type ? type : error_mark_node;
 }
 
+/* Work out the size of the object pointed to by the first arguement
+   of a call to __builtin_load_no_speculate.  Only pointers to
+   integral types and pointers are permitted.  Return 0 if the
+   arguement type is not supported of if the size is too large.  */
+static int
+load_no_speculate_resolve_size (tree function, vec<tree, va_gc> *params)
+{
+  /* Type of the argument.  */
+  tree type;
+  int size;
+
+  if (vec_safe_is_empty (params))
+    {
+      error ("too few arguments to function %qE", function);
+      return 0;
+    }
+
+  type = TREE_TYPE ((*params)[0]);
+
+  if (!POINTER_TYPE_P (type))
+    goto incompatible;
+
+  type = TREE_TYPE (type);
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      /* Force array-to-pointer decay for c++.  */
+      gcc_assert (c_dialect_cxx());
+      (*params)[0] = default_conversion ((*params)[0]);
+      type = TREE_TYPE ((*params)[0]);
+    }
+
+  if (!INTEGRAL_TYPE_P (type) && !POINTER_TYPE_P (type))
+    goto incompatible;
+
+  if (!COMPLETE_TYPE_P (type))
+   goto incompatible;
+
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+  if (size == 1 || size == 2 || size == 4 || size == 8 || size == 16)
+    return size;
+
+ incompatible:
+  /* Issue the diagnostic only if the argument is valid, otherwise
+     it would be redundant at best and could be misleading.  */
+  if (type != error_mark_node)
+    error ("operand type %qT is incompatible with argument %d of %qE",
+	   type, 1, function);
+
+  return 0;
+}
+
+/* Validate and coerce PARAMS, the arguments to ORIG_FUNCTION to fit
+   the prototype for FUNCTION.  The first three arguments are
+   mandatory, but shouldn't need casting as they are all pointers and
+   we've already established that the first argument is a pointer to a
+   permitted type.  The two optional arguments may need to be
+   fabricated if they have been omitted.  */
+static bool
+load_no_speculate_resolve_params (location_t loc, tree orig_function,
+				  tree function,
+				  vec<tree, va_gc> *params)
+{
+  function_args_iterator iter;
+
+  function_args_iter_init (&iter, TREE_TYPE (function));
+  tree arg_type = function_args_iter_cond (&iter);
+  unsigned parmnum;
+  tree val;
+
+  if (params->length () < 3)
+    {
+      error_at (loc, "too few arguments to function %qE", orig_function);
+      return false;
+    }
+  else if (params->length () > 5)
+    {
+      error_at (loc, "too many arguments to function %qE", orig_function);
+      return false;
+    }
+
+  /* Required arguments.  These must all be pointers.  */
+  for (parmnum = 0; parmnum < 3; parmnum++)
+    {
+      arg_type = function_args_iter_cond (&iter);
+      val = (*params)[parmnum];
+      if (TREE_CODE (TREE_TYPE (val)) == ARRAY_TYPE)
+	val = default_conversion (val);
+      if (TREE_CODE (TREE_TYPE (val)) != POINTER_TYPE)
+	goto bad_arg;
+      (*params)[parmnum] = val;
+    }
+
+  /* Optional integer value.  */
+  arg_type = function_args_iter_cond (&iter);
+  if (params->length () >= 4)
+    {
+      val = (*params)[parmnum];
+      val = convert (arg_type, val);
+      (*params)[parmnum] = val;
+    }
+  else
+    return true;
+
+  /* Optional pointer to compare against.  */
+  parmnum = 4;
+  arg_type = function_args_iter_cond (&iter);
+  if (params->length () == 5)
+    {
+      val = (*params)[parmnum];
+      if (TREE_CODE (TREE_TYPE (val)) == ARRAY_TYPE)
+	val = default_conversion (val);
+      if (TREE_CODE (TREE_TYPE (val)) != POINTER_TYPE)
+	goto bad_arg;
+      (*params)[parmnum] = val;
+    }
+
+  return true;
+
+ bad_arg:
+  error_at (loc, "expecting argument of type %qT for argument %u", arg_type,
+	    parmnum);
+  return false;
+}
+
+/* Cast the result of the builtin back to the type pointed to by the
+   first argument, preserving any qualifiers that it might have.  */
+static tree
+load_no_speculate_resolve_return (tree first_param, tree result)
+{
+  tree ptype = TREE_TYPE (TREE_TYPE (first_param));
+  tree rtype = TREE_TYPE (result);
+  ptype = TYPE_MAIN_VARIANT (ptype);
+
+  if (tree_int_cst_equal (TYPE_SIZE (ptype), TYPE_SIZE (rtype)))
+    return convert (ptype, result);
+
+  return result;
+}
+
 /* A helper function for resolve_overloaded_builtin in resolving the
    overloaded __sync_ builtins.  Returns a positive power of 2 if the
    first operand of PARAMS is a pointer to a supported data type.
@@ -7110,6 +7250,30 @@  resolve_overloaded_builtin (location_t loc, tree function,
   /* Handle BUILT_IN_NORMAL here.  */
   switch (orig_code)
     {
+    case BUILT_IN_LOAD_NO_SPECULATE_N:
+      {
+	int n = load_no_speculate_resolve_size (function, params);
+	tree new_function, first_param, result;
+	enum built_in_function fncode;
+
+	if (n == 0)
+	  return error_mark_node;
+
+	fncode = (enum built_in_function)((int)orig_code + exact_log2 (n) + 1);
+	new_function = builtin_decl_explicit (fncode);
+	first_param = (*params)[0];
+	if (!load_no_speculate_resolve_params (loc, function, new_function,
+					       params))
+	  return error_mark_node;
+
+	result = build_function_call_vec (loc, vNULL, new_function, params,
+					  NULL);
+	if (result == error_mark_node)
+	  return result;
+
+	return load_no_speculate_resolve_return (first_param, result);
+      }
+
     case BUILT_IN_ATOMIC_EXCHANGE:
     case BUILT_IN_ATOMIC_COMPARE_EXCHANGE:
     case BUILT_IN_ATOMIC_LOAD:
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 9e33aed..fb06ee7 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1361,7 +1361,10 @@  c_cpp_builtins (cpp_reader *pfile)
     cpp_define (pfile, "__WCHAR_UNSIGNED__");
 
   cpp_atomic_builtins (pfile);
-    
+
+  /* Show support for __builtin_load_no_speculate ().  */
+  cpp_define (pfile, "__HAVE_LOAD_NO_SPECULATE");
+
 #ifdef DWARF2_UNWIND_INFO
   if (dwarf2out_do_cfi_asm ())
     cpp_define (pfile, "__GCC_HAVE_DWARF2_CFI_ASM");
diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi
index 94437d5..9dca2e2 100644
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@@ -2381,6 +2381,10 @@  If GCC cannot determine the current date, it will emit a warning message
 These macros are defined when the target processor supports atomic compare
 and swap operations on operands 1, 2, 4, 8 or 16 bytes in length, respectively.
 
+@item __HAVE_LOAD_NO_SPECULATE
+This macro is defined with the value 1 to show that this version of GCC
+supports @code{__builtin_load_no_speculate}.
+
 @item __GCC_HAVE_DWARF2_CFI_ASM
 This macro is defined when the compiler is emitting DWARF CFI directives
 to the assembler.  When this is defined, it is possible to emit those same
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 2a553ad..7a71f34 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10968,6 +10968,7 @@  the built-in function returns -1.
 @findex __builtin_islessequal
 @findex __builtin_islessgreater
 @findex __builtin_isunordered
+@findex __builtin_load_no_speculate
 @findex __builtin_powi
 @findex __builtin_powif
 @findex __builtin_powil
@@ -11614,6 +11615,58 @@  check its compatibility with @var{size}.
 
 @end deftypefn
 
+@deftypefn {Built-in Function} @var{type} __builtin_load_no_speculate (const volatile @var{type} *ptr, const volatile void *lower_bound, const volatile void *upper_bound, @var{type} failval, const volatile void *cmpptr)
+The @code{__builtin_load_no_speculation} function provides a means to
+limit the extent to which a processor can continue speculative
+execution with the result of loading a value stored at @var{ptr}.
+Logically, the builtin implements the following behavior:
+
+@smallexample
+inline @var{type} __builtin_load_no_speculate
+    (const volatile @var{type} *ptr,
+     const volatile void *lower_bound,
+     const volatile void *upper_bound,
+     @var{type} failval,
+     const volatile void *cmpptr)
+@{
+  @var{type} result;
+  if (cmpptr >= lower_bound && cmpptr < upper_bound)
+    result = *ptr;
+  else
+    result = failval;
+  return result;
+@}
+@end smallexample
+
+but in addition target-specific code will be inserted to ensure that
+speculation using @code{*ptr} cannot occur when @var{cmpptr} lies outside of
+the specified bounds.
+
+@var{type} may be any integral type (signed, or unsigned, @code{char},
+@code{short}, @code{int}, etc) or a pointer to any type.
+
+The final argument, @var{cmpptr}, may be omitted.  If you do this,
+then the compiler will use @var{ptr} for comparison against the upper
+and lower bounds.  Furthermore, if you omit @var{cmpptr}, you may also
+omit @var{failval} and the compiler will use @code{(@var{type})0} for
+the out-of-bounds result.
+
+Additionally, when it is know that one of the bounds can never fail,
+you can use a literal @code{NULL} argument and the compiler will
+generate code that only checks the other boundary condition.  It is generally
+only safe to do this when your code contains a loop construct where the only
+boundary of interest is the one beyond the termination condition.  You cannot
+omit both boundary conditions in this way.
+
+The logical behaviour of the builtin is supported for all architectures, but
+on machines where target-specific support for inhibiting speculation is not
+implemented, or not necessary, the compiler will emit a warning.
+
+The pre-processor macro @code{__HAVE_LOAD_NO_SPECULATE} is defined with the
+value 1 on all implementations of GCC that support this builtin.
+
+@end deftypefn
+
 @deftypefn {Built-in Function} int __builtin_types_compatible_p (@var{type1}, @var{type2})
 
 You can use the built-in function @code{__builtin_types_compatible_p} to
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 9793a0e..7309ccb 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11922,6 +11922,12 @@  maintainer is familiar with.
 
 @end defmac
 
+@deftypefn {Target Hook} rtx TARGET_INHIBIT_LOAD_SPECULATION (machine_mode @var{mode}, rtx @var{result}, rtx @var{mem}, rtx @var{lower_bound}, rtx @var{upper_bound}, rtx @var{fail_result}, rtx @var{cmpptr})
+Generate a target-specific code sequence that implements @code{__builtin_load_no_speculate}, returning the result in @var{result}. If @var{cmpptr} is greater than, or equal to, @var{lower_bound} and less than @var{upper_bound} then @var{mem}, a @code{MEM} of type @var{mode}, should be returned, otherwise @var{failval} should be returned.  The expansion must ensure that subsequent speculation by the processor using the @var{mem} cannot occur if @var{cmpptr} lies outside of the specified bounds.  At most one of @var{lower_bound} and @var{upper_bound} can be @code{NULL_RTX}, indicating that code for that bounds check should not be generated.
+ 
+ The default implementation implements the logic of the builtin but cannot provide the target-specific code necessary to inhibit speculation.  A warning will be emitted to that effect.
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_RUN_TARGET_SELFTESTS (void)
 If selftests are enabled, run any selftests for this target.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 7bcfb37..d34e4bf 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8075,4 +8075,6 @@  maintainer is familiar with.
 
 @end defmac
 
+@hook TARGET_INHIBIT_LOAD_SPECULATION
+
 @hook TARGET_RUN_TARGET_SELFTESTS
diff --git a/gcc/target.def b/gcc/target.def
index e9eacc8..375eb0a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4214,6 +4214,26 @@  DEFHOOK
  hook_bool_void_true)
 
 DEFHOOK
+(inhibit_load_speculation,
+ "Generate a target-specific code sequence that implements\
+ @code{__builtin_load_no_speculate}, returning the result in @var{result}.\
+ If @var{cmpptr} is greater than, or equal to, @var{lower_bound} and less\
+ than @var{upper_bound} then @var{mem}, a @code{MEM} of type @var{mode},\
+ should be returned, otherwise @var{failval} should be returned.  The\
+ expansion must ensure that subsequent speculation by the processor using\
+ the @var{mem} cannot occur if @var{cmpptr} lies outside of the specified\
+ bounds.  At most one of @var{lower_bound} and @var{upper_bound} can be\
+ @code{NULL_RTX}, indicating that code for that bounds check should not be\
+ generated.\n\
+ \n\
+ The default implementation implements the logic of the builtin\
+ but cannot provide the target-specific code necessary to inhibit\
+ speculation.  A warning will be emitted to that effect.",
+ rtx, (machine_mode mode, rtx result, rtx mem, rtx lower_bound,
+       rtx upper_bound, rtx fail_result, rtx cmpptr),
+ default_inhibit_load_speculation)
+
+DEFHOOK
 (can_use_doloop_p,
  "Return true if it is possible to use low-overhead loops (@code{doloop_end}\n\
 and @code{doloop_begin}) for a particular loop.  @var{iterations} gives the\n\
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 653567c..24d9c7b 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -82,6 +82,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "real.h"
 #include "langhooks.h"
+#include "dojump.h"
 
 bool
 default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
@@ -2307,4 +2308,72 @@  default_stack_clash_protection_final_dynamic_probe (rtx residual ATTRIBUTE_UNUSE
   return 0;
 }
 
+/* Default implementation of the load-and-inhibit-speculation builtin.
+   This version does not have, or know of, the target-specific
+   mechanisms necessary to inhibit speculation, so it simply emits a
+   code sequence that implements the architectural aspects of the
+   builtin.  */
+rtx
+default_inhibit_load_speculation (machine_mode mode ATTRIBUTE_UNUSED,
+				  rtx result,
+				  rtx mem,
+				  rtx lower_bound,
+				  rtx upper_bound,
+				  rtx fail_result,
+				  rtx cmpptr)
+{
+  rtx_code_label *done_label = gen_label_rtx ();
+  rtx_code_label *inrange_label = gen_label_rtx ();
+  warning_at
+    (input_location, 0,
+     "this target does not support anti-speculation operations.  "
+     "Your program will still execute correctly, but speculation "
+     "will not be inhibited");
+
+  /* We don't have any despeculation barriers, but if we mark the branch
+     probabilities to be always predicting the out-of-bounds path, then
+     there's a higher chance that the compiler will order code so that
+     static prediction will fall through a safe path.  */
+  if (lower_bound == NULL)
+    {
+      do_compare_rtx_and_jump (cmpptr, upper_bound, LTU, true, ptr_mode,
+			       NULL, NULL, inrange_label,
+			       profile_probability::never ());
+      emit_move_insn (result, fail_result);
+      emit_jump (done_label);
+      emit_label (inrange_label);
+      emit_move_insn (result, mem);
+      emit_label (done_label);
+    }
+  else if (upper_bound == NULL)
+    {
+      do_compare_rtx_and_jump (cmpptr, lower_bound, GEU, true, ptr_mode,
+			       NULL, NULL, inrange_label,
+			       profile_probability::never ());
+      emit_move_insn (result, fail_result);
+      emit_jump (done_label);
+      emit_label (inrange_label);
+      emit_move_insn (result, mem);
+      emit_label (done_label);
+    }
+  else
+    {
+      rtx_code_label *oob_label = gen_label_rtx ();
+      do_compare_rtx_and_jump (cmpptr, lower_bound, LTU, true, ptr_mode,
+			       NULL, NULL, oob_label,
+			       profile_probability::always ());
+      do_compare_rtx_and_jump (cmpptr, upper_bound, GEU, true, ptr_mode,
+			       NULL, NULL, inrange_label,
+			       profile_probability::never ());
+      emit_label (oob_label);
+      emit_move_insn (result, fail_result);
+      emit_jump (done_label);
+      emit_label (inrange_label);
+      emit_move_insn (result, mem);
+      emit_label (done_label);
+    }
+
+  return result;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index e753e58..c55b43f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -286,4 +286,7 @@  extern enum flt_eval_method
 default_excess_precision (enum excess_precision_type ATTRIBUTE_UNUSED);
 extern bool default_stack_clash_protection_final_dynamic_probe (rtx);
 
+extern rtx
+default_inhibit_load_speculation (machine_mode, rtx, rtx, rtx, rtx, rtx, rtx);
+
 #endif /* GCC_TARGHOOKS_H */