Message ID: cover.1515072356.git.Richard.Earnshaw@arm.com
Series: Add __builtin_load_no_speculate
On Thu, 4 Jan 2018, Richard Earnshaw wrote: > 1 - generic modifications to GCC providing the builtin function for all > architectures and expanding to an implementation that gives the > logical behaviour of the builtin only. A warning is generated if > this expansion path is used that code will execute correctly but > without providing protection against speculative use. Presumably it would make sense to have a standard way for architectures with no speculative execution to just use the generic version, but without the warning? E.g., split up default_inhibit_load_speculation so that it generates the warning then calls another function with the same prototype, so such architectures can just define the hook to use that other function? -- Joseph S. Myers joseph@codesourcery.com
On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw <Richard.Earnshaw@arm.com> wrote: > > Recently, Google Project Zero disclosed several classes of attack > against speculative execution. One of these, known as variant-1 > (CVE-2017-5753), allows explicit bounds checks to be bypassed under > speculation, providing an arbitrary read gadget. Further details can > be found on the GPZ blog [1] and the documentation that is included > with the first patch. > > This patch set adds a new builtin function for GCC to provide a > mechanism for limiting speculation by a CPU after a bounds-checked > memory access. I've tried to design this in such a way that it can be > used for any target where this might be necessary. The patch set > provides a generic implementation of the builtin and then > target-specific support for Arm and AArch64. Other architectures can > utilize the internal infrastructure as needed. > > Most of the details of the builtin and the hooks that need to be > implemented to support it are described in the updates to the manual, > but a short summary is given below. > > TYP __builtin_load_no_speculate > (const volatile TYP *ptr, > const volatile void *lower, > const volatile void *upper, > TYP failval, > const volatile void *cmpptr) > > Where TYP can be any integral type (signed or unsigned char, int, > short, long, etc) or any pointer type. > > The builtin implements the following logical behaviour: > > inline TYP __builtin_load_no_speculate > (const volatile TYP *ptr, > const volatile void *lower, > const volatile void *upper, > TYP failval, > const volatile void *cmpptr) > { > TYP result; > > if (cmpptr >= lower && cmpptr < upper) > result = *ptr; > else > result = failval; > return result; > } > > in addition the specification of the builtin ensures that future > speculation using *ptr may only continue iff cmpptr lies within the > bounds specified. I fail to see how the vulnerability doesn't affect aggregate copies or bitfield accesses. The vulnerability doesn't cause the loaded value to leak but (part of) the address by populating the CPU cache side-channel. So why isn't this just T __builtin_load_no_speculate (T *); ? Why that "fallback" and why the lower/upper bounds? Did you talk with other compiler vendors (intel, llvm, ...?)? Richard. > Some optimizations are permitted to make the builtin easier to use. > The final two arguments can both be omitted (c++ style): failval will > default to 0 in this case and if cmpptr is omitted ptr will be used > for expansions of the range check. In addition either lower or upper > (but not both) may be a literal NULL and the expansion will then > ignore that boundary condition when expanding. > > The patch set is constructed as follows: > 1 - generic modifications to GCC providing the builtin function for all > architectures and expanding to an implementation that gives the > logical behaviour of the builtin only. A warning is generated if > this expansion path is used that code will execute correctly but > without providing protection against speculative use. > 2 - AArch64 support > 3 - AArch32 support (arm) for A32 and thumb2 states. > > These patches can be used with the header file that Arm recently > published here: https://github.com/ARM-software/speculation-barrier. > > Kernel patches are also being developed, eg: > https://lkml.org/lkml/2018/1/3/754. The intent is that eventually > code like this will be able to use support directly from the compiler > in a portable manner. 
> > Similar patches are also being developed for LLVM and will be posted > to their development lists shortly. > > [1] More information on the topic can be found here: > https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html > Arm specific information can be found here: https://www.arm.com/security-update > > > > Richard Earnshaw (3): > [builtins] Generic support for __builtin_load_no_speculate() > [aarch64] Implement support for __builtin_load_no_speculate. > [arm] Implement support for the de-speculation intrinsic > > gcc/builtin-types.def | 16 +++++ > gcc/builtins.c | 99 +++++++++++++++++++++++++ > gcc/builtins.def | 22 ++++++ > gcc/c-family/c-common.c | 164 ++++++++++++++++++++++++++++++++++++++++++ > gcc/c-family/c-cppbuiltin.c | 5 +- > gcc/config/aarch64/aarch64.c | 92 ++++++++++++++++++++++++ > gcc/config/aarch64/aarch64.md | 28 ++++++++ > gcc/config/arm/arm.c | 107 +++++++++++++++++++++++++++ > gcc/config/arm/arm.md | 40 ++++++++++- > gcc/config/arm/unspecs.md | 1 + > gcc/doc/cpp.texi | 4 ++ > gcc/doc/extend.texi | 53 ++++++++++++++ > gcc/doc/tm.texi | 6 ++ > gcc/doc/tm.texi.in | 2 + > gcc/target.def | 20 ++++++ > gcc/targhooks.c | 69 ++++++++++++++++++ > gcc/targhooks.h | 3 + > 17 files changed, 729 insertions(+), 2 deletions(-) > > >
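To make the interface quoted in the cover letter above concrete, here is a minimal usage sketch in C. The array name, its size and the index are illustrative only, and the code naturally requires a compiler carrying the proposed patch:

  #include <stddef.h>

  extern int table[256];

  int
  lookup (size_t untrusted_index)
  {
    /* Logically: if the address lies within the array, load it,
       otherwise return 0 (the defaulted failval).  In addition,
       speculative use of table[untrusted_index] is inhibited when the
       index is out of bounds.  cmpptr defaults to ptr and failval
       defaults to 0, so only the first three arguments are given.  */
    return __builtin_load_no_speculate (&table[untrusted_index],
                                        table, &table[256]);
  }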
On 05/01/18 09:44, Richard Biener wrote: > On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw > <Richard.Earnshaw@arm.com> wrote: >> >> Recently, Google Project Zero disclosed several classes of attack >> against speculative execution. One of these, known as variant-1 >> (CVE-2017-5753), allows explicit bounds checks to be bypassed under >> speculation, providing an arbitrary read gadget. Further details can >> be found on the GPZ blog [1] and the documentation that is included >> with the first patch. >> >> This patch set adds a new builtin function for GCC to provide a >> mechanism for limiting speculation by a CPU after a bounds-checked >> memory access. I've tried to design this in such a way that it can be >> used for any target where this might be necessary. The patch set >> provides a generic implementation of the builtin and then >> target-specific support for Arm and AArch64. Other architectures can >> utilize the internal infrastructure as needed. >> >> Most of the details of the builtin and the hooks that need to be >> implemented to support it are described in the updates to the manual, >> but a short summary is given below. >> >> TYP __builtin_load_no_speculate >> (const volatile TYP *ptr, >> const volatile void *lower, >> const volatile void *upper, >> TYP failval, >> const volatile void *cmpptr) >> >> Where TYP can be any integral type (signed or unsigned char, int, >> short, long, etc) or any pointer type. >> >> The builtin implements the following logical behaviour: >> >> inline TYP __builtin_load_no_speculate >> (const volatile TYP *ptr, >> const volatile void *lower, >> const volatile void *upper, >> TYP failval, >> const volatile void *cmpptr) >> { >> TYP result; >> >> if (cmpptr >= lower && cmpptr < upper) >> result = *ptr; >> else >> result = failval; >> return result; >> } >> >> in addition the specification of the builtin ensures that future >> speculation using *ptr may only continue iff cmpptr lies within the >> bounds specified. > > I fail to see how the vulnerability doesn't affect aggregate copies > or bitfield accesses. The vulnerability doesn't cause the loaded > value to leak but (part of) the address by populating the CPU cache > side-channel. > It's not quite as simple as that. You'll need to read the white paper on Arm's web site to get a full grasp of this (linked from https://www.arm.com/security-update). > So why isn't this just > > T __builtin_load_no_speculate (T *); > > ? Why that "fallback" and why the lower/upper bounds? Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem to restrict subsequent speculation, the CSEL needs a condition and the compiler must not be able to optimize it away based on logical reachability. The fallback is used for the other operand of the CSEL instruction. > > Did you talk with other compiler vendors (intel, llvm, ...?)? As stated we'll shortly be submitting similar patches for LLVM. R. > > Richard. > >> Some optimizations are permitted to make the builtin easier to use. >> The final two arguments can both be omitted (c++ style): failval will >> default to 0 in this case and if cmpptr is omitted ptr will be used >> for expansions of the range check. In addition either lower or upper >> (but not both) may be a literal NULL and the expansion will then >> ignore that boundary condition when expanding. 
>> >> The patch set is constructed as follows: >> 1 - generic modifications to GCC providing the builtin function for all >> architectures and expanding to an implementation that gives the >> logical behaviour of the builtin only. A warning is generated if >> this expansion path is used that code will execute correctly but >> without providing protection against speculative use. >> 2 - AArch64 support >> 3 - AArch32 support (arm) for A32 and thumb2 states. >> >> These patches can be used with the header file that Arm recently >> published here: https://github.com/ARM-software/speculation-barrier. >> >> Kernel patches are also being developed, eg: >> https://lkml.org/lkml/2018/1/3/754. The intent is that eventually >> code like this will be able to use support directly from the compiler >> in a portable manner. >> >> Similar patches are also being developed for LLVM and will be posted >> to their development lists shortly. >> >> [1] More information on the topic can be found here: >> https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html >> Arm specific information can be found here: https://www.arm.com/security-update >> >> >> >> Richard Earnshaw (3): >> [builtins] Generic support for __builtin_load_no_speculate() >> [aarch64] Implement support for __builtin_load_no_speculate. >> [arm] Implement support for the de-speculation intrinsic >> >> gcc/builtin-types.def | 16 +++++ >> gcc/builtins.c | 99 +++++++++++++++++++++++++ >> gcc/builtins.def | 22 ++++++ >> gcc/c-family/c-common.c | 164 ++++++++++++++++++++++++++++++++++++++++++ >> gcc/c-family/c-cppbuiltin.c | 5 +- >> gcc/config/aarch64/aarch64.c | 92 ++++++++++++++++++++++++ >> gcc/config/aarch64/aarch64.md | 28 ++++++++ >> gcc/config/arm/arm.c | 107 +++++++++++++++++++++++++++ >> gcc/config/arm/arm.md | 40 ++++++++++- >> gcc/config/arm/unspecs.md | 1 + >> gcc/doc/cpp.texi | 4 ++ >> gcc/doc/extend.texi | 53 ++++++++++++++ >> gcc/doc/tm.texi | 6 ++ >> gcc/doc/tm.texi.in | 2 + >> gcc/target.def | 20 ++++++ >> gcc/targhooks.c | 69 ++++++++++++++++++ >> gcc/targhooks.h | 3 + >> 17 files changed, 729 insertions(+), 2 deletions(-) >> >> >>
On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote: > On 05/01/18 09:44, Richard Biener wrote: >> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw >> <Richard.Earnshaw@arm.com> wrote: >>> >>> Recently, Google Project Zero disclosed several classes of attack >>> against speculative execution. One of these, known as variant-1 >>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under >>> speculation, providing an arbitrary read gadget. Further details can >>> be found on the GPZ blog [1] and the documentation that is included >>> with the first patch. >>> >>> This patch set adds a new builtin function for GCC to provide a >>> mechanism for limiting speculation by a CPU after a bounds-checked >>> memory access. I've tried to design this in such a way that it can be >>> used for any target where this might be necessary. The patch set >>> provides a generic implementation of the builtin and then >>> target-specific support for Arm and AArch64. Other architectures can >>> utilize the internal infrastructure as needed. >>> >>> Most of the details of the builtin and the hooks that need to be >>> implemented to support it are described in the updates to the manual, >>> but a short summary is given below. >>> >>> TYP __builtin_load_no_speculate >>> (const volatile TYP *ptr, >>> const volatile void *lower, >>> const volatile void *upper, >>> TYP failval, >>> const volatile void *cmpptr) >>> >>> Where TYP can be any integral type (signed or unsigned char, int, >>> short, long, etc) or any pointer type. >>> >>> The builtin implements the following logical behaviour: >>> >>> inline TYP __builtin_load_no_speculate >>> (const volatile TYP *ptr, >>> const volatile void *lower, >>> const volatile void *upper, >>> TYP failval, >>> const volatile void *cmpptr) >>> { >>> TYP result; >>> >>> if (cmpptr >= lower && cmpptr < upper) >>> result = *ptr; >>> else >>> result = failval; >>> return result; >>> } >>> >>> in addition the specification of the builtin ensures that future >>> speculation using *ptr may only continue iff cmpptr lies within the >>> bounds specified. >> >> I fail to see how the vulnerability doesn't affect aggregate copies >> or bitfield accesses. The vulnerability doesn't cause the loaded >> value to leak but (part of) the address by populating the CPU cache >> side-channel. >> > > It's not quite as simple as that. You'll need to read the white paper > on Arm's web site to get a full grasp of this (linked from > https://www.arm.com/security-update). > >> So why isn't this just >> >> T __builtin_load_no_speculate (T *); >> >> ? Why that "fallback" and why the lower/upper bounds? > > Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem > to restrict subsequent speculation, the CSEL needs a condition and the > compiler must not be able to optimize it away based on logical > reachability. The fallback is used for the other operand of the CSEL > instruction. But the condition could be just 'true' in the instruction encoding? That is, why do you do both the jump-around and the csel? Maybe I misunderstood the RTL you appear to generate. I'd have expected an UNSPEC to avoid the compiler messing up things. >> >> Did you talk with other compiler vendors (intel, llvm, ...?)? > > As stated we'll shortly be submitting similar patches for LLVM. How do you solve the aggregate load issue? I would imagine that speculating stores (by pre-fetching the affected cache line!) could be another attack vector? 
Not sure if any CPU speculatively does that (but I see no good reason why a CPU shouldn't do that). Does ARM have a barrier like instruction suitable for use in the kernel patches that are floating around? Richard. > R. > >> >> Richard. >> >>> Some optimizations are permitted to make the builtin easier to use. >>> The final two arguments can both be omitted (c++ style): failval will >>> default to 0 in this case and if cmpptr is omitted ptr will be used >>> for expansions of the range check. In addition either lower or upper >>> (but not both) may be a literal NULL and the expansion will then >>> ignore that boundary condition when expanding. >>> >>> The patch set is constructed as follows: >>> 1 - generic modifications to GCC providing the builtin function for all >>> architectures and expanding to an implementation that gives the >>> logical behaviour of the builtin only. A warning is generated if >>> this expansion path is used that code will execute correctly but >>> without providing protection against speculative use. >>> 2 - AArch64 support >>> 3 - AArch32 support (arm) for A32 and thumb2 states. >>> >>> These patches can be used with the header file that Arm recently >>> published here: https://github.com/ARM-software/speculation-barrier. >>> >>> Kernel patches are also being developed, eg: >>> https://lkml.org/lkml/2018/1/3/754. The intent is that eventually >>> code like this will be able to use support directly from the compiler >>> in a portable manner. >>> >>> Similar patches are also being developed for LLVM and will be posted >>> to their development lists shortly. >>> >>> [1] More information on the topic can be found here: >>> https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html >>> Arm specific information can be found here: https://www.arm.com/security-update >>> >>> >>> >>> Richard Earnshaw (3): >>> [builtins] Generic support for __builtin_load_no_speculate() >>> [aarch64] Implement support for __builtin_load_no_speculate. >>> [arm] Implement support for the de-speculation intrinsic >>> >>> gcc/builtin-types.def | 16 +++++ >>> gcc/builtins.c | 99 +++++++++++++++++++++++++ >>> gcc/builtins.def | 22 ++++++ >>> gcc/c-family/c-common.c | 164 ++++++++++++++++++++++++++++++++++++++++++ >>> gcc/c-family/c-cppbuiltin.c | 5 +- >>> gcc/config/aarch64/aarch64.c | 92 ++++++++++++++++++++++++ >>> gcc/config/aarch64/aarch64.md | 28 ++++++++ >>> gcc/config/arm/arm.c | 107 +++++++++++++++++++++++++++ >>> gcc/config/arm/arm.md | 40 ++++++++++- >>> gcc/config/arm/unspecs.md | 1 + >>> gcc/doc/cpp.texi | 4 ++ >>> gcc/doc/extend.texi | 53 ++++++++++++++ >>> gcc/doc/tm.texi | 6 ++ >>> gcc/doc/tm.texi.in | 2 + >>> gcc/target.def | 20 ++++++ >>> gcc/targhooks.c | 69 ++++++++++++++++++ >>> gcc/targhooks.h | 3 + >>> 17 files changed, 729 insertions(+), 2 deletions(-) >>> >>> >>> >
On 05/01/18 10:47, Richard Biener wrote: > On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists) > <Richard.Earnshaw@arm.com> wrote: >> On 05/01/18 09:44, Richard Biener wrote: >>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw >>> <Richard.Earnshaw@arm.com> wrote: >>>> >>>> Recently, Google Project Zero disclosed several classes of attack >>>> against speculative execution. One of these, known as variant-1 >>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under >>>> speculation, providing an arbitrary read gadget. Further details can >>>> be found on the GPZ blog [1] and the documentation that is included >>>> with the first patch. >>>> >>>> This patch set adds a new builtin function for GCC to provide a >>>> mechanism for limiting speculation by a CPU after a bounds-checked >>>> memory access. I've tried to design this in such a way that it can be >>>> used for any target where this might be necessary. The patch set >>>> provides a generic implementation of the builtin and then >>>> target-specific support for Arm and AArch64. Other architectures can >>>> utilize the internal infrastructure as needed. >>>> >>>> Most of the details of the builtin and the hooks that need to be >>>> implemented to support it are described in the updates to the manual, >>>> but a short summary is given below. >>>> >>>> TYP __builtin_load_no_speculate >>>> (const volatile TYP *ptr, >>>> const volatile void *lower, >>>> const volatile void *upper, >>>> TYP failval, >>>> const volatile void *cmpptr) >>>> >>>> Where TYP can be any integral type (signed or unsigned char, int, >>>> short, long, etc) or any pointer type. >>>> >>>> The builtin implements the following logical behaviour: >>>> >>>> inline TYP __builtin_load_no_speculate >>>> (const volatile TYP *ptr, >>>> const volatile void *lower, >>>> const volatile void *upper, >>>> TYP failval, >>>> const volatile void *cmpptr) >>>> { >>>> TYP result; >>>> >>>> if (cmpptr >= lower && cmpptr < upper) >>>> result = *ptr; >>>> else >>>> result = failval; >>>> return result; >>>> } >>>> >>>> in addition the specification of the builtin ensures that future >>>> speculation using *ptr may only continue iff cmpptr lies within the >>>> bounds specified. >>> >>> I fail to see how the vulnerability doesn't affect aggregate copies >>> or bitfield accesses. The vulnerability doesn't cause the loaded >>> value to leak but (part of) the address by populating the CPU cache >>> side-channel. >>> >> >> It's not quite as simple as that. You'll need to read the white paper >> on Arm's web site to get a full grasp of this (linked from >> https://www.arm.com/security-update). >> >>> So why isn't this just >>> >>> T __builtin_load_no_speculate (T *); >>> >>> ? Why that "fallback" and why the lower/upper bounds? >> >> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem >> to restrict subsequent speculation, the CSEL needs a condition and the >> compiler must not be able to optimize it away based on logical >> reachability. The fallback is used for the other operand of the CSEL >> instruction. > > But the condition could be just 'true' in the instruction encoding? That is, > why do you do both the jump-around and the csel? Maybe I misunderstood > the RTL you appear to generate. I'd have expected an UNSPEC to avoid > the compiler messing up things. > That's why the intrinsic takes explicit bounds not a simple bool expressing the conditions. It prevents the optimizers from replacing the condition by value range propagation. 
That does restrict the flexibility of the builtin somewhat but it does allow it to work correctly at all optimization levels. >>> >>> Did you talk with other compiler vendors (intel, llvm, ...?)? >> >> As stated we'll shortly be submitting similar patches for LLVM. > > How do you solve the aggregate load issue? I would imagine > that speculating stores (by pre-fetching the affected cache line!) > could be another attack vector? Not sure if any CPU speculatively > does that (but I see no good reason why a CPU shouldn't do that). > > Does ARM have a barrier like instruction suitable for use in the > kernel patches that are floating around? > > Richard. > >> R. >> >>> >>> Richard. >>> >>>> Some optimizations are permitted to make the builtin easier to use. >>>> The final two arguments can both be omitted (c++ style): failval will >>>> default to 0 in this case and if cmpptr is omitted ptr will be used >>>> for expansions of the range check. In addition either lower or upper >>>> (but not both) may be a literal NULL and the expansion will then >>>> ignore that boundary condition when expanding. >>>> >>>> The patch set is constructed as follows: >>>> 1 - generic modifications to GCC providing the builtin function for all >>>> architectures and expanding to an implementation that gives the >>>> logical behaviour of the builtin only. A warning is generated if >>>> this expansion path is used that code will execute correctly but >>>> without providing protection against speculative use. >>>> 2 - AArch64 support >>>> 3 - AArch32 support (arm) for A32 and thumb2 states. >>>> >>>> These patches can be used with the header file that Arm recently >>>> published here: https://github.com/ARM-software/speculation-barrier. >>>> >>>> Kernel patches are also being developed, eg: >>>> https://lkml.org/lkml/2018/1/3/754. The intent is that eventually >>>> code like this will be able to use support directly from the compiler >>>> in a portable manner. >>>> >>>> Similar patches are also being developed for LLVM and will be posted >>>> to their development lists shortly. >>>> >>>> [1] More information on the topic can be found here: >>>> https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html >>>> Arm specific information can be found here: https://www.arm.com/security-update >>>> >>>> >>>> >>>> Richard Earnshaw (3): >>>> [builtins] Generic support for __builtin_load_no_speculate() >>>> [aarch64] Implement support for __builtin_load_no_speculate. >>>> [arm] Implement support for the de-speculation intrinsic >>>> >>>> gcc/builtin-types.def | 16 +++++ >>>> gcc/builtins.c | 99 +++++++++++++++++++++++++ >>>> gcc/builtins.def | 22 ++++++ >>>> gcc/c-family/c-common.c | 164 ++++++++++++++++++++++++++++++++++++++++++ >>>> gcc/c-family/c-cppbuiltin.c | 5 +- >>>> gcc/config/aarch64/aarch64.c | 92 ++++++++++++++++++++++++ >>>> gcc/config/aarch64/aarch64.md | 28 ++++++++ >>>> gcc/config/arm/arm.c | 107 +++++++++++++++++++++++++++ >>>> gcc/config/arm/arm.md | 40 ++++++++++- >>>> gcc/config/arm/unspecs.md | 1 + >>>> gcc/doc/cpp.texi | 4 ++ >>>> gcc/doc/extend.texi | 53 ++++++++++++++ >>>> gcc/doc/tm.texi | 6 ++ >>>> gcc/doc/tm.texi.in | 2 + >>>> gcc/target.def | 20 ++++++ >>>> gcc/targhooks.c | 69 ++++++++++++++++++ >>>> gcc/targhooks.h | 3 + >>>> 17 files changed, 729 insertions(+), 2 deletions(-) >>>> >>>> >>>> >>
On Fri, Jan 05, 2018 at 10:59:21AM +0000, Richard Earnshaw (lists) wrote: > > But the condition could be just 'true' in the instruction encoding? That is, > > why do you do both the jump-around and the csel? Maybe I misunderstood > > the RTL you appear to generate. I'd have expected an UNSPEC to avoid > > the compiler messing up things. > > > > That's why the intrinsic takes explicit bounds not a simple bool > expressing the conditions. It prevents the optimizers from replacing > the condition by value range propagation. That does restrict the Preventing optimizers from replacing the condition can be done in many ways, IFN, UNSPEC, ... The question is if you really need it at the assembly level, or if you can just do an unconditional branch or branch conditional on say comparison of two constants, without any dynamic bounds. Jakub
I believe the proposed behavior of the new builtin is over-specialized.
In principle the following snippet may be open to exploitation too:

  if (predicate)
    foo = arr[(secret >> untrusted) & 64];

assuming the attacker has a means to observe which part of 'arr' is brought
into cache, but cannot set 'predicate' to true (so has to rely on the
speculative population of the cache); and likewise if a store is
predicated-off rather than a load.

So shouldn't, for generality, the new builtin work "as if" by cleansing the
address rather than the result of the load, like the following?  It would
also be applicable to stores then.

On Thu, 4 Jan 2018, Richard Earnshaw wrote:
> inline TYP __builtin_load_no_speculate
> (const volatile TYP *ptr,
> const volatile void *lower,
> const volatile void *upper,
> TYP failval,
> const volatile void *cmpptr)
> {
> TYP result;
>
> if (cmpptr >= lower && cmpptr < upper)
> result = *ptr;
> else
> result = failval;
> return result;
> }

{
  if (!(cmpptr >= lower && cmpptr < upper))
    ptr = NULL;

  return *ptr;
}

Alexander
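A sketch of how the address-cleansing semantics proposed above would extend to a predicated store ("It would also be applicable to stores then"). The function and parameter names are illustrative, and this reflects the proposed alternative semantics, not the behaviour of the posted patch:

  /* Cleanse the destination address rather than the stored value.  An
     out-of-bounds cmpptr forces ptr to NULL, so the store would fault
     architecturally; the point is only to illustrate the proposed
     speculation behaviour.  */
  static inline void
  store_via_cleansed_address (int *ptr, int value,
                              const volatile void *lower,
                              const volatile void *upper,
                              const volatile void *cmpptr)
  {
    if (!(cmpptr >= lower && cmpptr < upper))
      ptr = NULL;
    *ptr = value;
  }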
On 05/01/18 11:37, Alexander Monakov wrote:
> I believe the proposed behavior of the new builtin is over-specialized.
> In principle the following snippet may be open to exploitation too:
>
> if (predicate)
> foo = arr[(secret >> untrusted) & 64];
>
> assuming the attacker has a means to observe which part of 'arr' is brought
> into cache, but cannot set 'predicate' to true (so has to rely on the
> speculative population of the cache); and likewise if a store is
> predicated-off rather than a load.
>
> So shouldn't, for generality, the new builtin work "as if" by cleansing the
> address rather than the result of the load, like the following? It would
> also be applicable to stores then.

This is quite tricky.  For ARM we have to have a speculated load.

So one way to recast this might be to push 'untrusted' into a stack slot
that was then speculatively loaded back by the builtin.  To do that you'd
need to find a way of expressing predicate as a range check, so something
like

  if (predicate)
    {
      foo = arr[(secret >> __builtin_load_no_speculate (&untrusted,
                                                        predicate_base,
                                                        predicate_upper,
                                                        0, predicate_val))
                & 64];
    }

You could use similar techniques with stores as well.

I'm open to suggestions as to how we might express the conditions better.
It's certainly the least pleasant part of this proposal, but the seemingly
obvious '_Bool condition' parameter won't work because the compiler will
just substitute true in too many situations where it thinks it has proved
that it knows the conditions.

R.

> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
>> inline TYP __builtin_load_no_speculate
>> (const volatile TYP *ptr,
>> const volatile void *lower,
>> const volatile void *upper,
>> TYP failval,
>> const volatile void *cmpptr)
>> {
>> TYP result;
>>
>> if (cmpptr >= lower && cmpptr < upper)
>> result = *ptr;
>> else
>> result = failval;
>> return result;
>> }
>
> {
> if (!(cmpptr >= lower && cmpptr < upper))
> ptr = NULL;
>
> return *ptr;
> }
>
> Alexander
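As a concrete rendering of the "similar techniques with stores" remark above, one possible shape, assuming the builtin interface from the cover letter; predicate_base, predicate_upper and predicate_val are placeholders for however the predicate is expressed as a range check:

  #include <stddef.h>

  /* Launder the untrusted index through the builtin so that, under
     misspeculation of the range check, the index used for the store
     collapses to 0 (the failval) and further speculation on it is
     inhibited.  */
  void
  store_guarded (int *arr, size_t untrusted, int value,
                 const volatile void *predicate_base,
                 const volatile void *predicate_upper,
                 const volatile void *predicate_val)
  {
    if (predicate_val >= predicate_base && predicate_val < predicate_upper)
      {
        size_t idx = __builtin_load_no_speculate (&untrusted,
                                                  predicate_base,
                                                  predicate_upper,
                                                  0, predicate_val);
        arr[idx] = value;
      }
  }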
On 05/01/18 11:35, Jakub Jelinek wrote: > On Fri, Jan 05, 2018 at 10:59:21AM +0000, Richard Earnshaw (lists) wrote: >>> But the condition could be just 'true' in the instruction encoding? That is, >>> why do you do both the jump-around and the csel? Maybe I misunderstood >>> the RTL you appear to generate. I'd have expected an UNSPEC to avoid >>> the compiler messing up things. >>> >> >> That's why the intrinsic takes explicit bounds not a simple bool >> expressing the conditions. It prevents the optimizers from replacing >> the condition by value range propagation. That does restrict the > > Preventing optimizers from replacing the condition can be done in many ways, > IFN, UNSPEC, ... > The question is if you really need it at the assembly level, or if you can > just do an unconditional branch or branch conditional on say comparison of > two constants, without any dynamic bounds. > > Jakub > The bounds /have/ to reflect the real speculation condition. Otherwise the CSEL becomes ineffective. R.
On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
> This is quite tricky. For ARM we have to have a speculated load.
Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
wouldn't work (applying CSEL to the address rather than loaded value), and
if it wouldn't, then ARM-specific lowering of the builtin can handle that
anyhow, right? (by spilling the pointer)
(on x86, Intel's current recommendation is to emit LFENCE prior to the load)
Is the main issue expressing the CSEL condition in the source code? Perhaps it is
possible to introduce
int guard = __builtin_nontransparent(predicate);
if (predicate)
foo = __builtin_load_no_speculate(&arr[addr], guard);
... or maybe even
if (predicate)
foo = arr[__builtin_loadspecbarrier(addr, guard)];
where internally __builtin_nontransparent is the same as
guard = predicate;
asm volatile("" : "+g"(guard));
although admittedly this is not perfect since it forces evaluation of 'guard'
before the branch.
Alexander
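A self-contained version of the __builtin_nontransparent idea sketched above, written as a plain helper rather than a real builtin, and subject to the evaluation-order caveat noted in the last paragraph:

  /* Copy 'predicate' through an empty asm so the optimizers cannot fold
     the value to a constant later on.  This is an optimization barrier
     for the value only; it is not itself a speculation barrier.  */
  static inline int
  nontransparent (int predicate)
  {
    int guard = predicate;
    __asm__ volatile ("" : "+g" (guard));
    return guard;
  }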
On 01/05/2018 02:44 AM, Richard Biener wrote: > On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw > <Richard.Earnshaw@arm.com> wrote: >> >> Recently, Google Project Zero disclosed several classes of attack >> against speculative execution. One of these, known as variant-1 >> (CVE-2017-5753), allows explicit bounds checks to be bypassed under >> speculation, providing an arbitrary read gadget. Further details can >> be found on the GPZ blog [1] and the documentation that is included >> with the first patch. >> >> This patch set adds a new builtin function for GCC to provide a >> mechanism for limiting speculation by a CPU after a bounds-checked >> memory access. I've tried to design this in such a way that it can be >> used for any target where this might be necessary. The patch set >> provides a generic implementation of the builtin and then >> target-specific support for Arm and AArch64. Other architectures can >> utilize the internal infrastructure as needed. >> >> Most of the details of the builtin and the hooks that need to be >> implemented to support it are described in the updates to the manual, >> but a short summary is given below. >> >> TYP __builtin_load_no_speculate >> (const volatile TYP *ptr, >> const volatile void *lower, >> const volatile void *upper, >> TYP failval, >> const volatile void *cmpptr) >> >> Where TYP can be any integral type (signed or unsigned char, int, >> short, long, etc) or any pointer type. >> >> The builtin implements the following logical behaviour: >> >> inline TYP __builtin_load_no_speculate >> (const volatile TYP *ptr, >> const volatile void *lower, >> const volatile void *upper, >> TYP failval, >> const volatile void *cmpptr) >> { >> TYP result; >> >> if (cmpptr >= lower && cmpptr < upper) >> result = *ptr; >> else >> result = failval; >> return result; >> } >> >> in addition the specification of the builtin ensures that future >> speculation using *ptr may only continue iff cmpptr lies within the >> bounds specified. > > I fail to see how the vulnerability doesn't affect aggregate copies > or bitfield accesses. The vulnerability doesn't cause the loaded > value to leak but (part of) the address by populating the CPU cache > side-channel. > > So why isn't this just > > T __builtin_load_no_speculate (T *); > > ? Why that "fallback" and why the lower/upper bounds? > > Did you talk with other compiler vendors (intel, llvm, ...?)? I think "fallback" could potentially be any value, I don't think it's actual value matters much -- I could just as easily be "0" across the board or some indeterminate value (as long as the indeterminate doesn't itself present an information leak). The bounds are used to build a condition for the csel to select between the loaded value and the fail value. You need some way to select between them. If I've got anything wrong, I'm sure Richard E. will correct me. Jeff
On 01/05/2018 03:40 AM, Richard Earnshaw (lists) wrote: > On 05/01/18 09:44, Richard Biener wrote: >> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw >> <Richard.Earnshaw@arm.com> wrote: >>> >>> Recently, Google Project Zero disclosed several classes of attack >>> against speculative execution. One of these, known as variant-1 >>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under >>> speculation, providing an arbitrary read gadget. Further details can >>> be found on the GPZ blog [1] and the documentation that is included >>> with the first patch. >>> >>> This patch set adds a new builtin function for GCC to provide a >>> mechanism for limiting speculation by a CPU after a bounds-checked >>> memory access. I've tried to design this in such a way that it can be >>> used for any target where this might be necessary. The patch set >>> provides a generic implementation of the builtin and then >>> target-specific support for Arm and AArch64. Other architectures can >>> utilize the internal infrastructure as needed. >>> >>> Most of the details of the builtin and the hooks that need to be >>> implemented to support it are described in the updates to the manual, >>> but a short summary is given below. >>> >>> TYP __builtin_load_no_speculate >>> (const volatile TYP *ptr, >>> const volatile void *lower, >>> const volatile void *upper, >>> TYP failval, >>> const volatile void *cmpptr) >>> >>> Where TYP can be any integral type (signed or unsigned char, int, >>> short, long, etc) or any pointer type. >>> >>> The builtin implements the following logical behaviour: >>> >>> inline TYP __builtin_load_no_speculate >>> (const volatile TYP *ptr, >>> const volatile void *lower, >>> const volatile void *upper, >>> TYP failval, >>> const volatile void *cmpptr) >>> { >>> TYP result; >>> >>> if (cmpptr >= lower && cmpptr < upper) >>> result = *ptr; >>> else >>> result = failval; >>> return result; >>> } >>> >>> in addition the specification of the builtin ensures that future >>> speculation using *ptr may only continue iff cmpptr lies within the >>> bounds specified. >> >> I fail to see how the vulnerability doesn't affect aggregate copies >> or bitfield accesses. The vulnerability doesn't cause the loaded >> value to leak but (part of) the address by populating the CPU cache >> side-channel. >> > > It's not quite as simple as that. You'll need to read the white paper > on Arm's web site to get a full grasp of this (linked from > https://www.arm.com/security-update). I actually recommend reading the original papers referenced by Google in addition to the ARM whitepaper. This stuff is far from intuitive. I've been loosely involved for a month or so, but it really didn't start to click for me until right before the holiday break. jeff
On 01/05/2018 04:57 AM, Richard Earnshaw (lists) wrote: > On 05/01/18 11:35, Jakub Jelinek wrote: >> On Fri, Jan 05, 2018 at 10:59:21AM +0000, Richard Earnshaw (lists) wrote: >>>> But the condition could be just 'true' in the instruction encoding? That is, >>>> why do you do both the jump-around and the csel? Maybe I misunderstood >>>> the RTL you appear to generate. I'd have expected an UNSPEC to avoid >>>> the compiler messing up things. >>>> >>> >>> That's why the intrinsic takes explicit bounds not a simple bool >>> expressing the conditions. It prevents the optimizers from replacing >>> the condition by value range propagation. That does restrict the >> >> Preventing optimizers from replacing the condition can be done in many ways, >> IFN, UNSPEC, ... >> The question is if you really need it at the assembly level, or if you can >> just do an unconditional branch or branch conditional on say comparison of >> two constants, without any dynamic bounds. >> >> Jakub >> > > The bounds /have/ to reflect the real speculation condition. Otherwise > the CSEL becomes ineffective. I think this is probably the key that Jakub and Richard B. were missing. It certainly hadn't clicked for me when we spoke earlier. Jeff
On 01/05/2018 03:47 AM, Richard Biener wrote: > On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists) > <Richard.Earnshaw@arm.com> wrote: >> On 05/01/18 09:44, Richard Biener wrote: >>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw >>> <Richard.Earnshaw@arm.com> wrote: >>>> >>>> Recently, Google Project Zero disclosed several classes of attack >>>> against speculative execution. One of these, known as variant-1 >>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under >>>> speculation, providing an arbitrary read gadget. Further details can >>>> be found on the GPZ blog [1] and the documentation that is included >>>> with the first patch. >>>> >>>> This patch set adds a new builtin function for GCC to provide a >>>> mechanism for limiting speculation by a CPU after a bounds-checked >>>> memory access. I've tried to design this in such a way that it can be >>>> used for any target where this might be necessary. The patch set >>>> provides a generic implementation of the builtin and then >>>> target-specific support for Arm and AArch64. Other architectures can >>>> utilize the internal infrastructure as needed. >>>> >>>> Most of the details of the builtin and the hooks that need to be >>>> implemented to support it are described in the updates to the manual, >>>> but a short summary is given below. >>>> >>>> TYP __builtin_load_no_speculate >>>> (const volatile TYP *ptr, >>>> const volatile void *lower, >>>> const volatile void *upper, >>>> TYP failval, >>>> const volatile void *cmpptr) >>>> >>>> Where TYP can be any integral type (signed or unsigned char, int, >>>> short, long, etc) or any pointer type. >>>> >>>> The builtin implements the following logical behaviour: >>>> >>>> inline TYP __builtin_load_no_speculate >>>> (const volatile TYP *ptr, >>>> const volatile void *lower, >>>> const volatile void *upper, >>>> TYP failval, >>>> const volatile void *cmpptr) >>>> { >>>> TYP result; >>>> >>>> if (cmpptr >= lower && cmpptr < upper) >>>> result = *ptr; >>>> else >>>> result = failval; >>>> return result; >>>> } >>>> >>>> in addition the specification of the builtin ensures that future >>>> speculation using *ptr may only continue iff cmpptr lies within the >>>> bounds specified. >>> >>> I fail to see how the vulnerability doesn't affect aggregate copies >>> or bitfield accesses. The vulnerability doesn't cause the loaded >>> value to leak but (part of) the address by populating the CPU cache >>> side-channel. >>> >> >> It's not quite as simple as that. You'll need to read the white paper >> on Arm's web site to get a full grasp of this (linked from >> https://www.arm.com/security-update). >> >>> So why isn't this just >>> >>> T __builtin_load_no_speculate (T *); >>> >>> ? Why that "fallback" and why the lower/upper bounds? >> >> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem >> to restrict subsequent speculation, the CSEL needs a condition and the >> compiler must not be able to optimize it away based on logical >> reachability. The fallback is used for the other operand of the CSEL >> instruction. > > But the condition could be just 'true' in the instruction encoding? That is, > why do you do both the jump-around and the csel? Maybe I misunderstood > the RTL you appear to generate. I'd have expected an UNSPEC to avoid > the compiler messing up things. > >>> >>> Did you talk with other compiler vendors (intel, llvm, ...?)? >> >> As stated we'll shortly be submitting similar patches for LLVM. > > How do you solve the aggregate load issue? 
I would imagine > that speculating stores (by pre-fetching the affected cache line!) > could be another attack vector? Not sure if any CPU speculatively > does that (but I see no good reason why a CPU shouldn't do that). Without going into details, I've speculated to the appropriate folks that there's other vectors for these kinds of attacks, some more exploitable than others. Attacking the cache via loads/stores is just one instance of a class of problems IMHO. I fully expect we'll be fighting this class of problems for a while. This is just the tip of the iceberg. Jeff
On 01/04/2018 06:58 AM, Richard Earnshaw wrote: > > Recently, Google Project Zero disclosed several classes of attack > against speculative execution. One of these, known as variant-1 > (CVE-2017-5753), allows explicit bounds checks to be bypassed under > speculation, providing an arbitrary read gadget. Further details can > be found on the GPZ blog [1] and the documentation that is included > with the first patch. So I think it's important for anyone reading this stuff to read Richard's paragraph carefully -- "an arbitrary read gadget". I fully expect that over the course of time we're going to see other arbitrary read gadgets to be found. Those gadgets may have lower bandwidth, have a higher degree of jitter or be harder to exploit, but they're out there just waiting to be discovered. So I don't expect this to be the only mitigation we have to put into place. Anyway... > > Some optimizations are permitted to make the builtin easier to use. > The final two arguments can both be omitted (c++ style): failval will > default to 0 in this case and if cmpptr is omitted ptr will be used > for expansions of the range check. In addition either lower or upper > (but not both) may be a literal NULL and the expansion will then > ignore that boundary condition when expanding. So what are the cases where FAILVAL is useful rather than just using zero (or some other constant) all the time? Similarly under what conditions would one want to use CMPPTR rather than PTR? I wandered down through the LKML thread but didn't see anything which would shed light on those two questions. jeff >
On 01/04/2018 11:20 AM, Joseph Myers wrote: > On Thu, 4 Jan 2018, Richard Earnshaw wrote: > >> 1 - generic modifications to GCC providing the builtin function for all >> architectures and expanding to an implementation that gives the >> logical behaviour of the builtin only. A warning is generated if >> this expansion path is used that code will execute correctly but >> without providing protection against speculative use. > > Presumably it would make sense to have a standard way for architectures > with no speculative execution to just use the generic version, but without > the warning? E.g., split up default_inhibit_load_speculation so that it > generates the warning then calls another function with the same prototype, > so such architectures can just define the hook to use that other function? > Seems wise. There's still architectures that don't speculate or don't speculate enough to trigger these problems. Jeff
On 05/01/18 13:08, Alexander Monakov wrote:
> On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
>> This is quite tricky. For ARM we have to have a speculated load.
>
> Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
> wouldn't work (applying CSEL to the address rather than loaded value), and
> if it wouldn't, then ARM-specific lowering of the builtin can handle that
> anyhow, right? (by spilling the pointer)

The load has to feed /in/ to the csel/csdb sequence, not come after it.

>
> (on x86, Intel's current recommendation is to emit LFENCE prior to the load)

That can be supported in the way you expand the builtin.  The builtin
expander is given a (MEM (ptr)), but it's up to the back-end where to put
that in the expanded sequence to materialize the load, so you could write
(sorry, I don't know x86 asm very well, but I think this is how you'd put it)

  lfence
  mov (ptr), dest

with branches around that as appropriate to support the remainder of the
builtin's behaviour.

> Is the main issue expressing the CSEL condition in the source code? Perhaps it is
> possible to introduce
>
> int guard = __builtin_nontransparent(predicate);
>
> if (predicate)
> foo = __builtin_load_no_speculate(&arr[addr], guard);
>
> ... or maybe even
>
> if (predicate)
> foo = arr[__builtin_loadspecbarrier(addr, guard)];
>
> where internally __builtin_nontransparent is the same as
>
> guard = predicate;
> asm volatile("" : "+g"(guard));
>
> although admittedly this is not perfect since it forces evaluation of 'guard'
> before the branch.

As I explained to Bernd last night, I think this is likely to be unsafe.
If there's some control path before __builtin_nontransparent that allows
'predicate' to be simplified (e.g. by value range propagation), then your
guard doesn't protect against the speculation that you think it does.
Changing all the optimizers to guarantee that wouldn't happen (and
guaranteeing that all future optimizers won't introduce new problems of
that nature) is, I suspect, very non-trivial.

R.

>
> Alexander
>
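For the x86 case discussed above, a sketch of the lfence-before-load placement using _mm_lfence from <emmintrin.h>; this shows only where the fence goes relative to the load, not the bounds-check and failval behaviour of the builtin:

  #include <emmintrin.h>	/* _mm_lfence (SSE2) */

  static inline int
  load_after_lfence (const int *ptr)
  {
    /* LFENCE does not execute until prior instructions have completed
       locally, so the load below is not issued down a mispredicted
       path.  */
    _mm_lfence ();
    return *ptr;
  }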
On 05/01/18 17:24, Jeff Law wrote:
> On 01/04/2018 06:58 AM, Richard Earnshaw wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
> So I think it's important for anyone reading this stuff to read
> Richard's paragraph carefully -- "an arbitrary read gadget".
>
> I fully expect that over the course of time we're going to see other
> arbitrary read gadgets to be found. Those gadgets may have lower
> bandwidth, have a higher degree of jitter or be harder to exploit, but
> they're out there just waiting to be discovered.
>
> So I don't expect this to be the only mitigation we have to put into place.
>
> Anyway...
>
>>
>> Some optimizations are permitted to make the builtin easier to use.
>> The final two arguments can both be omitted (c++ style): failval will
>> default to 0 in this case and if cmpptr is omitted ptr will be used
>> for expansions of the range check. In addition either lower or upper
>> (but not both) may be a literal NULL and the expansion will then
>> ignore that boundary condition when expanding.
> So what are the cases where FAILVAL is useful rather than just using
> zero (or some other constant) all the time?

So some background first.  My initial attempts to define a builtin were
based entirely around trying to specify the builtin without any hard
functional behaviour.  The specification was a mess.  Things just became
so much easier to specify when I moved to defining a logical behaviour on
top of the intended side effects on speculation.

Having done that, it seemed sensible to permit the user to use the builtin
in more creative ways by allowing it to select the failure value.  The
idea was that code such as

  if (ptr >= base && ptr < limit)	// bounds check
    return *ptr;
  return FAILVAL;

could simply be rewritten as

  return __builtin_load_no_speculate (ptr, base, limit, FAILVAL);

and now you don't have to worry about writing the condition out twice or
any other such nonsense.  In this case the builtin does exactly what you
want it to do.  (It's also easier now to write test cases that check the
correctness of the builtin's expansion, since you can test for a behaviour
of the code's execution, even if you can't check the speculation behaviour
properly.)

>
> Similarly under what conditions would one want to use CMPPTR rather than
> PTR?

This was at the request of some kernel folk.  They have some cases where
CMPPTR may be a pointer to an atomic type and want to do something like

  if (cmpptr >= lower && cmpptr < upper)
    val = __atomic_read_and_inc (cmpptr);

The atomics are complicated enough already and rewriting them all to
additionally inhibit speculation based on the result would be a nightmare.
By separating out cmpptr from ptr you can now write

  if (cmpptr >= lower && cmpptr < upper)
    {
      TYPE tmp_val = __atomic_read_and_inc (cmpptr);
      val = __builtin_load_no_speculate (&tmp_val, lower, upper, 0, cmpptr);
    }

It's meaningless in this case to check the bounds of tmp_val -- it's just
a value pushed onto the stack in order to satisfy the requirement that we
need a load that is being speculatively executed; but we can still test
against the speculation conditions, which are still the range check for
cmpptr.
There may be other similar cases as well where you have an object that you want to clean up that is somehow derived from cmpptr but is not cmpptr itself. You do have to be more careful with the extended form, since it is possible to write sequences that don't inhibit speculation in the way you might think they do, but with greater flexibility also comes greater risk. I don't think there's ever a problem if ptr is somehow derived from dereferencing cmpptr. R. > > I wandered down through the LKML thread but didn't see anything which > would shed light on those two questions. > > jeff >>
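A compilable rendering of the cmpptr example above, substituting the real __atomic_fetch_add builtin for the illustrative __atomic_read_and_inc and using long for TYPE; lower and upper bound the location of the atomic object, exactly as described:

  /* tmp_val exists only to give the builtin a load to attach the
     speculation barrier to; the speculation condition remains the
     range check on cmpptr.  */
  long
  guarded_atomic_inc (long *cmpptr, const volatile void *lower,
                      const volatile void *upper)
  {
    long val = 0;
    if ((const volatile void *) cmpptr >= lower
        && (const volatile void *) cmpptr < upper)
      {
        long tmp_val = __atomic_fetch_add (cmpptr, 1, __ATOMIC_SEQ_CST);
        val = __builtin_load_no_speculate (&tmp_val, lower, upper, 0, cmpptr);
      }
    return val;
  }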
On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote: > > Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence > > wouldn't work (applying CSEL to the address rather than loaded value), and > > if it wouldn't, then ARM-specific lowering of the builtin can handle that > > anyhow, right? (by spilling the pointer) > > The load has to feed /in/ to the csel/csdb sequence, not come after it. Again, I'm sorry, but I have to insist that what you're saying here contradicts the documentation linked from https://developer.arm.com/support/security-update The PDF currently says, in "Details of the CSDB barrier": Until the barrier completes: 1) For any load, store, data or instruction preload, RW2, appearing in program order *after the barrier* [...] 2) For any indirect branch (B2), appearing in program order *after the barrier* [...] [...] the speculative execution of RW2/B2 does not influence the allocations of entries in a cache [...] It doesn't say anything about the behavior of CSDB being dependent on the loads encountered prior to it. (and imho it doesn't make sense for a hardware implementation to work that way) > As I explained to Bernd last night, I think this is likely be unsafe. > If there's some control path before __builtin_nontransparent that allows > 'predicate' to be simplified (eg by value range propagation), then your > guard doesn't protect against the speculation that you think it does. > Changing all the optimizers to guarantee that wouldn't happen (and > guaranteeing that all future optimizers won't introduce new problems of > that nature) is, I suspect, very non-trivial. But note that in that case the compiler could have performed the same simplification in the original code as well, emitting straight-line machine code lacking speculatively executable parts in the first place. Alexander
On 09/01/18 13:27, Alexander Monakov wrote: > On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote: >> > Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence >> > wouldn't work (applying CSEL to the address rather than loaded value), and >> > if it wouldn't, then ARM-specific lowering of the builtin can handle that >> > anyhow, right? (by spilling the pointer) >> >> The load has to feed /in/ to the csel/csdb sequence, not come after it. > > Again, I'm sorry, but I have to insist that what you're saying here > contradicts > the documentation linked from > https://developer.arm.com/support/security-update > The PDF currently says, in "Details of the CSDB barrier": > > Until the barrier completes: > 1) For any load, store, data or instruction preload, RW2, appearing in > program order *after the barrier* [...] > > 2) For any indirect branch (B2), appearing in program order > *after the barrier* [...] > > [...] the speculative execution of RW2/B2 does not influence the > allocations of entries in a cache [...] > > It doesn't say anything about the behavior of CSDB being dependent on > the loads > encountered prior to it. (and imho it doesn't make sense for a hardware > implementation to work that way) Read condition 1 i) again. i) the conditional select instruction has a register data dependency on a load R1, that has been executed speculatively, for one of its input registers, and To be executed speculatively it must be *after* a predicted operation that has not yet resolved. Once it has resolved it is no-longer speculative, it's committed (one way or another). R. > >> As I explained to Bernd last night, I think this is likely be unsafe. >> If there's some control path before __builtin_nontransparent that allows >> 'predicate' to be simplified (eg by value range propagation), then your >> guard doesn't protect against the speculation that you think it does. >> Changing all the optimizers to guarantee that wouldn't happen (and >> guaranteeing that all future optimizers won't introduce new problems of >> that nature) is, I suspect, very non-trivial. > > But note that in that case the compiler could have performed the same > simplification in the original code as well, emitting straight-line > machine code > lacking speculatively executable parts in the first place. > > Alexander
On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote: > Read condition 1 i) again. > > > i) the conditional select instruction has a register data dependency on > a load R1, that has been executed speculatively, for one of its input > registers, and Sorry - I don't know how I missed that. I understand the motivation for the behavior of the new built-in now; the CSDB specification is surprising, but that is off-topic I guess. Seems quite unfortunate that CSDB is constraining the generic built-in like this, though. Thank you for clearing this up. I suggest that the user documentation in extend.texi should explicitly call out that the load itself still may be speculatively executed - only its consumers can be expected to be fenced off. For example, after +but in addition target-specific code will be inserted to ensure that +speculation using @code{*ptr} cannot occur when @var{cmpptr} lies outside of +the specified bounds. it can go on to say e.g., However note that the load of @code{*ptr} itself and the conditional branch leading to it may still be subject to speculative execution (and the load may have observable effects on the cache as a result). Thanks. Alexander
On 05/01/18 17:26, Jeff Law wrote: > On 01/04/2018 11:20 AM, Joseph Myers wrote: >> On Thu, 4 Jan 2018, Richard Earnshaw wrote: >> >>> 1 - generic modifications to GCC providing the builtin function for all >>> architectures and expanding to an implementation that gives the >>> logical behaviour of the builtin only. A warning is generated if >>> this expansion path is used that code will execute correctly but >>> without providing protection against speculative use. >> >> Presumably it would make sense to have a standard way for architectures >> with no speculative execution to just use the generic version, but without >> the warning? E.g., split up default_inhibit_load_speculation so that it >> generates the warning then calls another function with the same prototype, >> so such architectures can just define the hook to use that other function? >> > Seems wise. There's still architectures that don't speculate or don't > speculate enough to trigger these problems. > > Jeff > I'm in two minds about that. There are certainly cases where you might want to use the generic expansion as part of handling a case where you have a more standard speculation barrier; in those cases you might want to emit your barrier and then let the generic code expand and provide the logical behaviour of the builtin. However, if you don't have a barrier you need to ask yourself, will my architecture ever have an implementation that does do speculation? Unless you can be certain that it won't, you probably need to be warned that some code the programmer thinks might be vulnerable to spectre has not been compiled with that protection, otherwise if you run that code on a later implementation that does speculate, it might be vulnerable. Let me give an example, we use the generic code expansion if we encounter the builtin when generating code for Thumb on pre-ARMv7 devices. We don't have instructions in 'thumb1' to guard against speculation and we really want to warn users that they've done this (it might be a mistake in how they're invoking the compiler). I could add an extra parameter to the helper (bool warn_unimplemented), which would be true if called directly from the expanders, but would now allow back-ends to implement a trivial variant that just suppressed the warning. They could also then use the generic expansion if all that was needed was a subsequent fence instruction. R.
On 01/09/2018 03:47 AM, Richard Earnshaw (lists) wrote: > On 05/01/18 13:08, Alexander Monakov wrote: >> On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote: >>> This is quite tricky. For ARM we have to have a speculated load. >> >> Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence >> wouldn't work (applying CSEL to the address rather than loaded value), and >> if it wouldn't, then ARM-specific lowering of the builtin can handle that >> anyhow, right? (by spilling the pointer) > > The load has to feed /in/ to the csel/csdb sequence, not come after it. > >> >> (on x86 the current Intel's recommendation is to emit LFENCE prior to the load) > > That can be supported in the way you expand the builtin. The builtin > expander is given a (MEM (ptr)) , but it's up to the back-end where to > put that in the expanded sequence to materialize the load, so you could > write (sorry, don't know x86 asm very well, but I think this is how > you'd put it) > > lfence > mov (ptr), dest > > with branches around that as appropriate to support the remainder of the > builtin's behaviour. I think the argument is going to be that they don't want the branches around to support the other test + failval semantics. Essentially the same position as IBM has with PPC. > >> Is the main issue expressing the CSEL condition in the source code? Perhaps it is >> possible to introduce >> >> int guard = __builtin_nontransparent(predicate); >> >> if (predicate) >> foo = __builtin_load_no_speculate(&arr[addr], guard); >> >> ... or maybe even >> >> if (predicate) >> foo = arr[__builtin_loadspecbarrier(addr, guard)]; >> >> where internally __builtin_nontransparent is the same as >> >> guard = predicate; >> asm volatile("" : "+g"(guard)); >> >> although admittedly this is not perfect since it forces evaluation of 'guard' >> before the branch. > > As I explained to Bernd last night, I think this is likely be unsafe. > If there's some control path before __builtin_nontransparent that allows > 'predicate' to be simplified (eg by value range propagation), then your > guard doesn't protect against the speculation that you think it does. > Changing all the optimizers to guarantee that wouldn't happen (and > guaranteeing that all future optimizers won't introduce new problems of > that nature) is, I suspect, very non-trivial. Agreed. Whatever PREDICATE happens to be, the compiler is going to go through extreme measures to try and collapse PREDICATE down to a compile-time constant, including splitting paths to the point where PREDICATE is used in the conditional so that on one side it's constant and the other it's non-constant. It seems like this approach is likely to be compromised by the optimizers. Jeff
On 01/09/2018 08:29 AM, Richard Earnshaw (lists) wrote: > I'm in two minds about that. There are certainly cases where you might > want to use the generic expansion as part of handling a case where you > have a more standard speculation barrier; in those cases you might want > to emit your barrier and then let the generic code expand and provide > the logical behaviour of the builtin. > > However, if you don't have a barrier you need to ask yourself, will my > architecture ever have an implementation that does do speculation? > Unless you can be certain that it won't, you probably need to be warned > that some code the programmer thinks might be vulnerable to spectre has > not been compiled with that protection, otherwise if you run that code > on a later implementation that does speculate, it might be vulnerable. > > Let me give an example, we use the generic code expansion if we > encounter the builtin when generating code for Thumb on pre-ARMv7 > devices. We don't have instructions in 'thumb1' to guard against > speculation and we really want to warn users that they've done this (it > might be a mistake in how they're invoking the compiler). > > I could add an extra parameter to the helper (bool warn_unimplemented), > which would be true if called directly from the expanders, but would now > allow back-ends to implement a trivial variant that just suppressed the > warning. They could also then use the generic expansion if all that was > needed was a subsequent fence instruction. Yea, I see your side as well on this -- advisable or not I suspect we're going to see folks saying "my embedded chip doesn't have these problems, I don't want to pay any of these costs" and I don't want to be warned about a problem I know can't happen (today). Anyway, I think relatively speaking this is minor compared to the stuff we're discussing around the semantics of the builtin. I can live with iterating on this aspect based on feedback. jeff