Message ID | 20210120173800.1660730-13-jthierry@redhat.com |
---|---|
State | New |
Headers | show |
Series | objtool: add base support for arm64 | expand |
> From: Raphael Gault <raphael.gault@arm.com> > > This plugins comes into play before the final 2 RTL passes of GCC and > detects switch-tables that are to be outputed in the ELF and writes > information in an ".discard.switch_table_info" section which will be > used by objtool. > > Signed-off-by: Raphael Gault <raphael.gault@arm.com> > [J.T.: Change section name to store switch table information, > Make plugin Kconfig be selected rather than opt-in by user, > Add a relocation in the switch_table_info that points to > the jump operation itself] > Signed-off-by: Julien Thierry <jthierry@redhat.com> Rather than tightly couple this feature to a particular toolchain via plugin, it might be nice to consider what features could be spec'ed out for toolchains to implement (perhaps via a -f flag). Distributions (like Android, CrOS) wont be able to use such a feature as is.
On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote: > > From: Raphael Gault <raphael.gault@arm.com> > > > > This plugins comes into play before the final 2 RTL passes of GCC and > > detects switch-tables that are to be outputed in the ELF and writes > > information in an ".discard.switch_table_info" section which will be > > used by objtool. > > > > Signed-off-by: Raphael Gault <raphael.gault@arm.com> > > [J.T.: Change section name to store switch table information, > > Make plugin Kconfig be selected rather than opt-in by user, > > Add a relocation in the switch_table_info that points to > > the jump operation itself] > > Signed-off-by: Julien Thierry <jthierry@redhat.com> > > Rather than tightly couple this feature to a particular toolchain via > plugin, it might be nice to consider what features could be spec'ed out > for toolchains to implement (perhaps via a -f flag). The problem is being able to detect switch statement jump table vectors. For a given indirect branch (due to a switch statement), what are all the corresponding jump targets? We would need the compiler to annotate that information somehow. > Distributions (like Android, CrOS) wont be able to use such a feature as > is. Would a Clang plugin be out of the question? -- Josh
On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote: > > > From: Raphael Gault <raphael.gault@arm.com> > > > > > > This plugins comes into play before the final 2 RTL passes of GCC and > > > detects switch-tables that are to be outputed in the ELF and writes > > > information in an ".discard.switch_table_info" section which will be > > > used by objtool. > > > > > > Signed-off-by: Raphael Gault <raphael.gault@arm.com> > > > [J.T.: Change section name to store switch table information, > > > Make plugin Kconfig be selected rather than opt-in by user, > > > Add a relocation in the switch_table_info that points to > > > the jump operation itself] > > > Signed-off-by: Julien Thierry <jthierry@redhat.com> > > > > Rather than tightly couple this feature to a particular toolchain via > > plugin, it might be nice to consider what features could be spec'ed out > > for toolchains to implement (perhaps via a -f flag). > > The problem is being able to detect switch statement jump table vectors. > > For a given indirect branch (due to a switch statement), what are all > the corresponding jump targets? > > We would need the compiler to annotate that information somehow. Makes sense, the compiler should have this information. How is this problem solved on x86? > > > Distributions (like Android, CrOS) wont be able to use such a feature as > > is. > > Would a Clang plugin be out of the question? Generally, we frown on out of tree kernel modules for a couple reasons. Maintaining ABI compatibility when the core kernel changes is generally not instantaneous; someone has to notice the ABI has changed which will be more delayed than if the module was in tree. Worse is when semantics subtly change. While we must not break userspace, we provide no such guarantees within the kernel proper. Also, it's less likely that out of tree kernel modules have been reviewed by kernel developers. They may not have the same quality, use the recommended interfaces, follow coding conventions, etc.. Oh, did I say "out of tree kernel modules?" I meant "compiler plugins." But it's two different sides of the same coin to me. FWIW, I think the approach taken by -mstack-protector-guard-reg= is a useful case study. It was prototyped as a GCC extension, then added to GCC proper, then added to LLVM (currently only x86, but most of the machinery is in place in the compiler to get it running on arm64). My recommendation is to skip the plugin part and work on a standard interface for compilers to implement, with input from compiler developers. -- Thanks, ~Nick Desaulniers
On Fri, Jan 29, 2021 at 10:10:01AM -0800, Nick Desaulniers wrote: > On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote: > > > > From: Raphael Gault <raphael.gault@arm.com> > > > > > > > > This plugins comes into play before the final 2 RTL passes of GCC and > > > > detects switch-tables that are to be outputed in the ELF and writes > > > > information in an ".discard.switch_table_info" section which will be > > > > used by objtool. > > > > > > > > Signed-off-by: Raphael Gault <raphael.gault@arm.com> > > > > [J.T.: Change section name to store switch table information, > > > > Make plugin Kconfig be selected rather than opt-in by user, > > > > Add a relocation in the switch_table_info that points to > > > > the jump operation itself] > > > > Signed-off-by: Julien Thierry <jthierry@redhat.com> > > > > > > Rather than tightly couple this feature to a particular toolchain via > > > plugin, it might be nice to consider what features could be spec'ed out > > > for toolchains to implement (perhaps via a -f flag). > > > > The problem is being able to detect switch statement jump table vectors. > > > > For a given indirect branch (due to a switch statement), what are all > > the corresponding jump targets? > > > > We would need the compiler to annotate that information somehow. > > Makes sense, the compiler should have this information. How is this > problem solved on x86? Thus far we've been able to successfully reverse engineer it on x86, though it hasn't been easy. There were some particulars for arm64 which made doing so impossible. (I don't remember the details.) > > > Distributions (like Android, CrOS) wont be able to use such a feature as > > > is. > > > > Would a Clang plugin be out of the question? > > Generally, we frown on out of tree kernel modules for a couple reasons. > > Maintaining ABI compatibility when the core kernel changes is > generally not instantaneous; someone has to notice the ABI has changed > which will be more delayed than if the module was in tree. Worse is > when semantics subtly change. While we must not break userspace, we > provide no such guarantees within the kernel proper. > > Also, it's less likely that out of tree kernel modules have been > reviewed by kernel developers. They may not have the same quality, > use the recommended interfaces, follow coding conventions, etc.. > > Oh, did I say "out of tree kernel modules?" I meant "compiler > plugins." But it's two different sides of the same coin to me. I thought Android already relied on OOT modules. GCC plugins generally enforce the exact same GCC version for OOT modules. So there's no ABI to worry about. I assume Clang does the same? Or did I miss your point? > FWIW, I think the approach taken by -mstack-protector-guard-reg= is a > useful case study. It was prototyped as a GCC extension, then added > to GCC proper, then added to LLVM (currently only x86, but most of the > machinery is in place in the compiler to get it running on arm64). My > recommendation is to skip the plugin part and work on a standard > interface for compilers to implement, with input from compiler > developers. I like the idea. Is there a recommended forum for such discussions? Just an email to GCC/Clang development lists? -- Josh
On Mon, Feb 1, 2021 at 1:44 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > On Fri, Jan 29, 2021 at 10:10:01AM -0800, Nick Desaulniers wrote: > > On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > > > On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote: > > > > > From: Raphael Gault <raphael.gault@arm.com> > > > > > > > > > > This plugins comes into play before the final 2 RTL passes of GCC and > > > > > detects switch-tables that are to be outputed in the ELF and writes > > > > > information in an ".discard.switch_table_info" section which will be > > > > > used by objtool. > > > > > > > > > > Signed-off-by: Raphael Gault <raphael.gault@arm.com> > > > > > [J.T.: Change section name to store switch table information, > > > > > Make plugin Kconfig be selected rather than opt-in by user, > > > > > Add a relocation in the switch_table_info that points to > > > > > the jump operation itself] > > > > > Signed-off-by: Julien Thierry <jthierry@redhat.com> > > > > > > > > Rather than tightly couple this feature to a particular toolchain via > > > > plugin, it might be nice to consider what features could be spec'ed out > > > > for toolchains to implement (perhaps via a -f flag). > > > > > > The problem is being able to detect switch statement jump table vectors. > > > > > > For a given indirect branch (due to a switch statement), what are all > > > the corresponding jump targets? > > > > > > We would need the compiler to annotate that information somehow. > > > > Makes sense, the compiler should have this information. How is this > > problem solved on x86? > > Thus far we've been able to successfully reverse engineer it on x86, > though it hasn't been easy. > > There were some particulars for arm64 which made doing so impossible. > (I don't remember the details.) I think the details are pertinent to finding a portable solution. The commit message of this commit in particular doesn't document such details, such as why such an approach is necessary or how the data is laid out for objtool to consume it. > > > > Distributions (like Android, CrOS) wont be able to use such a feature as > > > > is. > > > > > > Would a Clang plugin be out of the question? > > > > Generally, we frown on out of tree kernel modules for a couple reasons. > > > > Maintaining ABI compatibility when the core kernel changes is > > generally not instantaneous; someone has to notice the ABI has changed > > which will be more delayed than if the module was in tree. Worse is > > when semantics subtly change. While we must not break userspace, we > > provide no such guarantees within the kernel proper. > > > > Also, it's less likely that out of tree kernel modules have been > > reviewed by kernel developers. They may not have the same quality, > > use the recommended interfaces, follow coding conventions, etc.. > > > > Oh, did I say "out of tree kernel modules?" I meant "compiler > > plugins." But it's two different sides of the same coin to me. > > I thought Android already relied on OOT modules. Android Common Kernel does not *rely* on OOT modules or compiler plugins. Android doesn't forbid vendors from using OOT modules, though, just as the mainline kernel does not prevent modules from being built out of tree, or users from insmod'ing them. It's certainly a pragmatic approach; idealism doesn't work for all OEMs. Personally, I lean more towards idealistic; I prefer in-tree modules, dislike compiler plugins (for much the same reasons), and idealize loose coupling of software components. This series looks to me like it tightly couples arm64 live patching to objtool and GCC. On the earlier thread, Julien writes: >> I think most people interested in livepatching are using GCC built >> kernels, but I could be mistaken (althought in the long run, both >> compilers should be supported, and yes, I realize the objtool solution >> currently only would support GCC). Google's production kernels are using livepatching and are built with Clang. Getting similar functionality working for arm64 would be of interest. > GCC plugins generally enforce the exact same GCC version for OOT > modules. So there's no ABI to worry about. I assume Clang does the > same? > > Or did I miss your point? I think so. It would seem that the current solution for stack validation would require multiple different compiler plugins to work reliably. If jump tables are an issue, I don't see why you wouldn't pursue a more portable option like -fno-jump-tables first, then work to claw back any performance loss from that, if relevant? Then there's no complaint about toolchain specific implementations or tight coupling. Objtool support on arm64 is interesting to me though, because it has found bugs in LLVM codegen. That alone is extremely valuable. But not it's not helpful if it's predicated or tightly coupled to GCC, as this series appears to do. The idea of rebuilding control flow from binary analysis and using that to find codegen bugs is a really cool idea (novel, even? idk), and I wish we had some analog for userspace binaries that could perform similar checks. > > > FWIW, I think the approach taken by -mstack-protector-guard-reg= is a > > useful case study. It was prototyped as a GCC extension, then added > > to GCC proper, then added to LLVM (currently only x86, but most of the > > machinery is in place in the compiler to get it running on arm64). My > > recommendation is to skip the plugin part and work on a standard > > interface for compilers to implement, with input from compiler > > developers. > > I like the idea. Is there a recommended forum for such discussions? > Just an email to GCC/Clang development lists? linux-toolchains@vger.kernel.org is probably a good start. -- Thanks, ~Nick Desaulniers
On Mon, Feb 01, 2021 at 03:17:40PM -0800, Nick Desaulniers wrote: > > > > > Rather than tightly couple this feature to a particular toolchain via > > > > > plugin, it might be nice to consider what features could be spec'ed out > > > > > for toolchains to implement (perhaps via a -f flag). > > > > > > > > The problem is being able to detect switch statement jump table vectors. > > > > > > > > For a given indirect branch (due to a switch statement), what are all > > > > the corresponding jump targets? > > > > > > > > We would need the compiler to annotate that information somehow. > > > > > > Makes sense, the compiler should have this information. How is this > > > problem solved on x86? > > > > Thus far we've been able to successfully reverse engineer it on x86, > > though it hasn't been easy. > > > > There were some particulars for arm64 which made doing so impossible. > > (I don't remember the details.) > > I think the details are pertinent to finding a portable solution. The > commit message of this commit in particular doesn't document such > details, such as why such an approach is necessary or how the data is > laid out for objtool to consume it. Agreed, the commit message needs more details. > > > > > Distributions (like Android, CrOS) wont be able to use such a feature as > > > > > is. > > > > > > > > Would a Clang plugin be out of the question? > > > > > > Generally, we frown on out of tree kernel modules for a couple reasons. > > > > > > Maintaining ABI compatibility when the core kernel changes is > > > generally not instantaneous; someone has to notice the ABI has changed > > > which will be more delayed than if the module was in tree. Worse is > > > when semantics subtly change. While we must not break userspace, we > > > provide no such guarantees within the kernel proper. > > > > > > Also, it's less likely that out of tree kernel modules have been > > > reviewed by kernel developers. They may not have the same quality, > > > use the recommended interfaces, follow coding conventions, etc.. > > > > > > Oh, did I say "out of tree kernel modules?" I meant "compiler > > > plugins." But it's two different sides of the same coin to me. > > > > I thought Android already relied on OOT modules. > > Android Common Kernel does not *rely* on OOT modules or compiler > plugins. Android doesn't forbid vendors from using OOT modules, > though, just as the mainline kernel does not prevent modules from > being built out of tree, or users from insmod'ing them. It's > certainly a pragmatic approach; idealism doesn't work for all OEMs. No judgement, RHEL is similar. Linux is nothing if not pragmatic. > Personally, I lean more towards idealistic; I prefer in-tree modules, > dislike compiler plugins (for much the same reasons), and idealize > loose coupling of software components. This series looks to me like it > tightly couples arm64 live patching to objtool and GCC. > > On the earlier thread, Julien writes: > > >> I think most people interested in livepatching are using GCC built > >> kernels, but I could be mistaken (althought in the long run, both > >> compilers should be supported, and yes, I realize the objtool solution > >> currently only would support GCC). > > Google's production kernels are using livepatching and are built with > Clang. Getting similar functionality working for arm64 would be of > interest. Well, that's cool. I had no idea. I'm curious how they're generating livepatch modules? Because kpatch-build doesn't support Clang (AFAIK), and if they're not using kpatch-build then there are some traps to look out for. > > GCC plugins generally enforce the exact same GCC version for OOT > > modules. So there's no ABI to worry about. I assume Clang does the > > same? > > > > Or did I miss your point? > > I think so. It would seem that the current solution for stack > validation would require multiple different compiler plugins to work > reliably. If jump tables are an issue, I don't see why you wouldn't > pursue a more portable option like -fno-jump-tables first, then work > to claw back any performance loss from that, if relevant? Then > there's no complaint about toolchain specific implementations or tight > coupling. > > Objtool support on arm64 is interesting to me though, because it has > found bugs in LLVM codegen. That alone is extremely valuable. But not > it's not helpful if it's predicated or tightly coupled to GCC, as this > series appears to do. I agree 100%, if there are actual Clang livepatch users (which it sounds like there are) then we should target both compilers. And yes, objtool has been pretty good at finding compiler bugs, so the more coverage the better. I like the -fno-jump-tables suggestion, assuming it doesn't affect performance significantly. > The idea of rebuilding control flow from binary analysis and using > that to find codegen bugs is a really cool idea (novel, even? idk), > and I wish we had some analog for userspace binaries that could > perform similar checks. Objtool is generic in many ways -- in fact I recently heard from a PhD candidate who used it successfully on another kernel for an ORC unwinder. It could probably be used on user space without much effort. That was an early original stated goal but I definitely don't have the bandwidth or incentive to work on it. > > > FWIW, I think the approach taken by -mstack-protector-guard-reg= is a > > > useful case study. It was prototyped as a GCC extension, then added > > > to GCC proper, then added to LLVM (currently only x86, but most of the > > > machinery is in place in the compiler to get it running on arm64). My > > > recommendation is to skip the plugin part and work on a standard > > > interface for compilers to implement, with input from compiler > > > developers. > > > > I like the idea. Is there a recommended forum for such discussions? > > Just an email to GCC/Clang development lists? > > linux-toolchains@vger.kernel.org is probably a good start. Thanks, I'll write something up (maybe a more specific proposal) and post it.
On 2/2/21 12:17 AM, Nick Desaulniers wrote: > On Mon, Feb 1, 2021 at 1:44 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: >> >> On Fri, Jan 29, 2021 at 10:10:01AM -0800, Nick Desaulniers wrote: >>> On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: >>>> >>>> On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote: >>>>>> From: Raphael Gault <raphael.gault@arm.com> >>>>>> >>>>>> This plugins comes into play before the final 2 RTL passes of GCC and >>>>>> detects switch-tables that are to be outputed in the ELF and writes >>>>>> information in an ".discard.switch_table_info" section which will be >>>>>> used by objtool. >>>>>> >>>>>> Signed-off-by: Raphael Gault <raphael.gault@arm.com> >>>>>> [J.T.: Change section name to store switch table information, >>>>>> Make plugin Kconfig be selected rather than opt-in by user, >>>>>> Add a relocation in the switch_table_info that points to >>>>>> the jump operation itself] >>>>>> Signed-off-by: Julien Thierry <jthierry@redhat.com> >>>>> >>>>> Rather than tightly couple this feature to a particular toolchain via >>>>> plugin, it might be nice to consider what features could be spec'ed out >>>>> for toolchains to implement (perhaps via a -f flag). >>>> >>>> The problem is being able to detect switch statement jump table vectors. >>>> >>>> For a given indirect branch (due to a switch statement), what are all >>>> the corresponding jump targets? >>>> >>>> We would need the compiler to annotate that information somehow. >>> >>> Makes sense, the compiler should have this information. How is this >>> problem solved on x86? >> >> Thus far we've been able to successfully reverse engineer it on x86, >> though it hasn't been easy. >> >> There were some particulars for arm64 which made doing so impossible. >> (I don't remember the details.) The main issue is that the tables for arm64 have more indirection than x86. On x86, the dispatching jump instruction fetches the target address from a contiguous array of addresses based on a given offset. So the list of potential targets of the jump is neatly organized in a table (and sure, before link time these are just relocation, but still processable). On arm64 (with GCC at least), what is stored in a table is an array of candidate offsets from the jump instruction. And because arm64 is limited to 32bit instructions, the encoding often requires multiple instructions to compute the target address: ldr<*> x_offset, [x_offsets_table, x_index, ...] // load offset adr x_dest_base, <addr> // load target branch for offset 0 add x_dest, x_target_base, x_offset, ... // compute final address br x_dest // jump Where this gets trickier is that (with GCC) the offsets stored in the table might or might not be signed constants (and this can be seen in GCC intermediate representations, but I do not believe this information is output in the final object file). And on top of that, GCC might decide to use offsets that are seen as unsigned during intermediate representation as signed offset by sign extending them in the add instruction. So, to handle this we'd have to track the different operation done with the offset, from the load to the final jump, decoding the instructions and deducing the potential target instructions from the table of offsets. But that is error prone as we don't really know how many instructions can be between the ones doing the address computation, and I remember some messy case of a jump table inside a jump table where tracking the instruction touching one or the other offset would need a lot of corner case handling. And this of course is just for GCC, I haven't looked at what it all looks like on Clang's end. > > I think the details are pertinent to finding a portable solution. The > commit message of this commit in particular doesn't document such > details, such as why such an approach is necessary or how the data is > laid out for objtool to consume it. > Sorry, I will need to make that clearer. The next patch explains it a bit [1] Basically, for simplicity, the plugin creates a new section containing tables (one per jump table) of references to the jump targets, similar to what x86 has, except that in this case this table isn't actually used by runtime code and is discarded at link time. I only chose this to minimize what needed to be changed in objtool and because the format seemed simple enough. But I'm open on some alternative, whether it's a -fjump-table-info option added to compilers with a different format to do the links. The important requirement is to be able to know all the candidate targets for a "br <reg>" instruction. [1] https://lkml.org/lkml/2021/1/20/910 Thanks, -- Julien Thierry
Stirring more goop into the hole .... Requiring gcc-plugins, matching compiler versions and the same 'dwarf' format for OOT modules is probably very painful. In many cases (and this may include drivers released by some distributions) an OOT driver has two separate parts. One part is C source that is compiled when the module is built on the target system and against the installed kernel headers. Getting this to match 'just' relies on having the correct compiler (etc) installed and in $PATH. The second part is much more problematic. This is just an object file compiled by a third party. It doesn't directly depend on anything defined in the kernel headers - so can (currently) be linked into any kernel version. In the past some graphics drivers have had a third party object file. I think some of the laptop wifi drivers might as well. Now I some people think everything should be free source. But there are various commercial and practical reasons for both OOT drivers and object file 'blobs' in OOT drivers. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Mon, Feb 1, 2021 at 4:02 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > On Mon, Feb 01, 2021 at 03:17:40PM -0800, Nick Desaulniers wrote: > > On the earlier thread, Julien writes: > > > > >> I think most people interested in livepatching are using GCC built > > >> kernels, but I could be mistaken (althought in the long run, both > > >> compilers should be supported, and yes, I realize the objtool solution > > >> currently only would support GCC). > > > > Google's production kernels are using livepatching and are built with > > Clang. Getting similar functionality working for arm64 would be of > > interest. > > Well, that's cool. I had no idea. > > I'm curious how they're generating livepatch modules? Because > kpatch-build doesn't support Clang (AFAIK), and if they're not using > kpatch-build then there are some traps to look out for. Ok, I just met with a bunch of folks that are actively working on this. Let me intro Yonghyun Hwang <yonghyun@google.com> Pete Swain <swine@google.com> who will be the folks on point for this from Google. My understanding after some clarifications today is that Google is currently using a proprietary kernel patching mechanism that developed around a decade ago, "pre-ksplice Oracle acquisition." But we are looking to transition to kpatch, and help towards supporting arm64. Live patching is important for deploying kernel fixes faster than predetermined scheduled draining of jobs in clusters. The first steps for kpatch transition is supporting builds with Clang. Yonghyun is working on that and my hope is he will have patches for you for that soon. Curiously, the proprietary mechanism doesn't rely on stack validation. I think that such dependency was questioned on the cover letter patch's thread as well. Maybe there's "some traps to look out for" you're referring to there? I'm not privy to the details, though I would guess it has to do with ensuring kernel threads aren't executing (or planning to return through) code regions that are trying to be patched/unpatched. I am curious about frame pointers never being omitted for arm64; is frame pointer chasing is unreliable in certain contexts? The internal functionality has been used heavily in production for almost a decade, though without it being public or supporting arm64; I'm not sure precisely how they solve such issues (or how others might review such an approach). Either way, the dependencies for live patching are less important, so long as they are toolchain portable. The ability to live patch kernel images is ___important___ to Google. > > Objtool support on arm64 is interesting to me though, because it has > > found bugs in LLVM codegen. That alone is extremely valuable. But not > > it's not helpful if it's predicated or tightly coupled to GCC, as this > > series appears to do. > > I agree 100%, if there are actual Clang livepatch users (which it sounds > like there are) then we should target both compilers. Or will be. (Sorry, I didn't know we hadn't completed the transition to kpatch yet. It is "the opposite side of the house" from where I work; I literally have 8 bosses, not kidding). Though if kpatch moves to requiring GCC plugins for architectures we use extensively or would like to use more of, that's probably going to throw a wrench in multiple transition plans. (The fleet's transition to Clang is done, I'm not worried about that). > And yes, objtool has been pretty good at finding compiler bugs, so the > more coverage the better. > > The idea of rebuilding control flow from binary analysis and using > > that to find codegen bugs is a really cool idea (novel, even? idk), > > and I wish we had some analog for userspace binaries that could > > perform similar checks. > > Objtool is generic in many ways -- in fact I recently heard from a PhD > candidate who used it successfully on another kernel for an ORC > unwinder. That's pretty cool! Reuse outside the initial context is always a good sign that something was designed right. > It could probably be used on user space without much effort. That was > an early original stated goal but I definitely don't have the bandwidth > or incentive to work on it. Heh. I'm a big fan of game theory; carrot or stick, right? -- Thanks, ~Nick Desaulniers
On Tue, Feb 2, 2021 at 12:57 AM Julien Thierry <jthierry@redhat.com> wrote: > > > > On 2/2/21 12:17 AM, Nick Desaulniers wrote: > > On Mon, Feb 1, 2021 at 1:44 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > >> > >> On Fri, Jan 29, 2021 at 10:10:01AM -0800, Nick Desaulniers wrote: > >>> On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > >>>> > >>>> On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote: > >>>>>> From: Raphael Gault <raphael.gault@arm.com> > >>>>>> > >>>>>> This plugins comes into play before the final 2 RTL passes of GCC and > >>>>>> detects switch-tables that are to be outputed in the ELF and writes > >>>>>> information in an ".discard.switch_table_info" section which will be > >>>>>> used by objtool. > >>>>>> > >>>>>> Signed-off-by: Raphael Gault <raphael.gault@arm.com> > >>>>>> [J.T.: Change section name to store switch table information, > >>>>>> Make plugin Kconfig be selected rather than opt-in by user, > >>>>>> Add a relocation in the switch_table_info that points to > >>>>>> the jump operation itself] > >>>>>> Signed-off-by: Julien Thierry <jthierry@redhat.com> > >>>>> > >>>>> Rather than tightly couple this feature to a particular toolchain via > >>>>> plugin, it might be nice to consider what features could be spec'ed out > >>>>> for toolchains to implement (perhaps via a -f flag). > >>>> > >>>> The problem is being able to detect switch statement jump table vectors. > >>>> > >>>> For a given indirect branch (due to a switch statement), what are all > >>>> the corresponding jump targets? > >>>> > >>>> We would need the compiler to annotate that information somehow. > >>> > >>> Makes sense, the compiler should have this information. How is this > >>> problem solved on x86? > >> > >> Thus far we've been able to successfully reverse engineer it on x86, > >> though it hasn't been easy. > >> > >> There were some particulars for arm64 which made doing so impossible. > >> (I don't remember the details.) > > The main issue is that the tables for arm64 have more indirection than x86. I wonder if PAC or BTI also make this slightly more complex? PAC at least has implications for unwinders, IIUC. > > On x86, the dispatching jump instruction fetches the target address from > a contiguous array of addresses based on a given offset. So the list of > potential targets of the jump is neatly organized in a table (and sure, > before link time these are just relocation, but still processable). > > On arm64 (with GCC at least), what is stored in a table is an array of > candidate offsets from the jump instruction. And because arm64 is > limited to 32bit instructions, the encoding often requires multiple > instructions to compute the target address: > > ldr<*> x_offset, [x_offsets_table, x_index, ...] // load offset > adr x_dest_base, <addr> // load target branch for offset 0 > add x_dest, x_target_base, x_offset, ... // compute final address > br x_dest // jump > > Where this gets trickier is that (with GCC) the offsets stored in the > table might or might not be signed constants (and this can be seen in > GCC intermediate representations, but I do not believe this information > is output in the final object file). And on top of that, GCC might > decide to use offsets that are seen as unsigned during intermediate > representation as signed offset by sign extending them in the add > instruction. > > So, to handle this we'd have to track the different operation done with > the offset, from the load to the final jump, decoding the instructions > and deducing the potential target instructions from the table of offsets. > > But that is error prone as we don't really know how many instructions > can be between the ones doing the address computation, and I remember > some messy case of a jump table inside a jump table where tracking the > instruction touching one or the other offset would need a lot of corner > case handling. > > And this of course is just for GCC, I haven't looked at what it all > looks like on Clang's end. Sure, but this is what production unwinders do, and they don't require compiler plugins, right? I don't doubt unwinders can be made simpler with changes to toolchain output; please work with your compiler vendor on making such changes rather than relying on compiler plugins to do so. > > I think the details are pertinent to finding a portable solution. The > > commit message of this commit in particular doesn't document such > > details, such as why such an approach is necessary or how the data is > > laid out for objtool to consume it. > > > > Sorry, I will need to make that clearer. The next patch explains it a > bit [1] > > Basically, for simplicity, the plugin creates a new section containing Right, this takes a focus on simplicity, at the cost of alienating a toolchain. Ard's point about 3193c0836f20 relating to -fgcse is that when presented with tricky cases to unwind, the simplest approach is taken. There it was disabling a compiler specific compiler optimization, here it's either a compiler specific compiler plugin (or disabling another compiler optimization). The pattern seems to be "Objtool isn't smart enough" ... "compiler optimization disabled" or "compiler plugin dependency." > tables (one per jump table) of references to the jump targets, similar > to what x86 has, except that in this case this table isn't actually used > by runtime code and is discarded at link time. I only chose this to > minimize what needed to be changed in objtool and because the format > seemed simple enough. > > But I'm open on some alternative, whether it's a -fjump-table-info Yes, I think we could spec out something like that. But I would appreciate revisiting open questions around stack validation (frame pointers), preventing the generation of jump tables to begin with (-fno-jump-tables) in place of making objtool more robust, or generally the need to depend on compiler plugins. > option added to compilers with a different format to do the links. The > important requirement is to be able to know all the candidate targets > for a "br <reg>" instruction. > > [1] https://lkml.org/lkml/2021/1/20/910 > > Thanks, > > -- > Julien Thierry > -- Thanks, ~Nick Desaulniers
On Tue, Feb 02, 2021 at 02:33:38PM -0800, Nick Desaulniers wrote: > On Mon, Feb 1, 2021 at 4:02 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > On Mon, Feb 01, 2021 at 03:17:40PM -0800, Nick Desaulniers wrote: > > > On the earlier thread, Julien writes: > > > > > > >> I think most people interested in livepatching are using GCC built > > > >> kernels, but I could be mistaken (althought in the long run, both > > > >> compilers should be supported, and yes, I realize the objtool solution > > > >> currently only would support GCC). > > > > > > Google's production kernels are using livepatching and are built with > > > Clang. Getting similar functionality working for arm64 would be of > > > interest. > > > > Well, that's cool. I had no idea. > > > > I'm curious how they're generating livepatch modules? Because > > kpatch-build doesn't support Clang (AFAIK), and if they're not using > > kpatch-build then there are some traps to look out for. > > Ok, I just met with a bunch of folks that are actively working on > this. Let me intro > Yonghyun Hwang <yonghyun@google.com> > Pete Swain <swine@google.com> > who will be the folks on point for this from Google. Nice to meet you all. Adding the live-patching ML sub-thread. > My understanding after some clarifications today is that Google is > currently using a proprietary kernel patching mechanism that developed > around a decade ago, "pre-ksplice Oracle acquisition." But we are > looking to transition to kpatch, and help towards supporting arm64. > Live patching is important for deploying kernel fixes faster than > predetermined scheduled draining of jobs in clusters. > > The first steps for kpatch transition is supporting builds with Clang. > Yonghyun is working on that and my hope is he will have patches for > you for that soon. That would be great! > Curiously, the proprietary mechanism doesn't rely on stack validation. If this proprietary mechanism relies on stack traces, that could problematic. Livepatch originally made the same assumption, but it was shot down quickly: https://lwn.net/Articles/634649/ https://lwn.net/Articles/658333/ > I think that such dependency was questioned on the cover letter > patch's thread as well. Yes, though it's generally agreed that unvalidated compiler-generated unwinder metadata isn't going to be robust enough for kernel live patching. > Maybe there's "some traps to look out for" you're referring to there? The "traps" are more about how the patches are generated. If they're built with source code, like a normal kernel module, you have to be extra careful because of function ABI nastiness. kpatch-build avoids this problem. Unfortunately this still isn't documented. > I'm not privy to the details, though I would guess it has to do with > ensuring kernel threads aren't executing (or planning to return > through) code regions that are trying to be patched/unpatched. Right. There are some good details in Documentation/livepatch/livepatch.rst. > I am curious about frame pointers never being omitted for arm64; is > frame pointer chasing is unreliable in certain contexts? Yes, problematic areas are interrupts, exceptions, inline asm, hand-coded asm. A nice document was recently added in Documentation/livepatch/reliable-stacktrace.rst which covers a lot of this stuff. > The internal functionality has been used heavily in production for > almost a decade, though without it being public or supporting arm64; > I'm not sure precisely how they solve such issues (or how others might > review such an approach). Very impressive to run it in production that long. Their experience and expertise is definitely welcome. > Either way, the dependencies for live patching are less important, so > long as they are toolchain portable. The ability to live patch kernel > images is ___important___ to Google. > > > > Objtool support on arm64 is interesting to me though, because it has > > > found bugs in LLVM codegen. That alone is extremely valuable. But not > > > it's not helpful if it's predicated or tightly coupled to GCC, as this > > > series appears to do. > > > > I agree 100%, if there are actual Clang livepatch users (which it sounds > > like there are) then we should target both compilers. > > Or will be. (Sorry, I didn't know we hadn't completed the transition > to kpatch yet. It is "the opposite side of the house" from where I > work; I literally have 8 bosses, not kidding). > > Though if kpatch moves to requiring GCC plugins for architectures we > use extensively or would like to use more of, that's probably going to > throw a wrench in multiple transition plans. (The fleet's transition > to Clang is done, I'm not worried about that). Hopefully we can just forget the GCC plugin idea. It would be really nice to see some performance numbers for -fno-jump-tables so we can justify doing that instead, at least in the short-term. I'd suspect the difference isn't measurable in the real world. (In the case of GCC+retpolines, it would be a performance improvement.) > > And yes, objtool has been pretty good at finding compiler bugs, so the > > more coverage the better. > > > The idea of rebuilding control flow from binary analysis and using > > > that to find codegen bugs is a really cool idea (novel, even? idk), > > > and I wish we had some analog for userspace binaries that could > > > perform similar checks. > > > > Objtool is generic in many ways -- in fact I recently heard from a PhD > > candidate who used it successfully on another kernel for an ORC > > unwinder. > > That's pretty cool! Reuse outside the initial context is always a > good sign that something was designed right. So basically you're saying objtool is both useful and well-designed. I will quote you on that! -- Josh
On Tue, Feb 2, 2021 at 3:36 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > On Tue, Feb 02, 2021 at 02:33:38PM -0800, Nick Desaulniers wrote: > > On Mon, Feb 1, 2021 at 4:02 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > > > On Mon, Feb 01, 2021 at 03:17:40PM -0800, Nick Desaulniers wrote: > > > And yes, objtool has been pretty good at finding compiler bugs, so the > > > more coverage the better. > > > > The idea of rebuilding control flow from binary analysis and using > > > > that to find codegen bugs is a really cool idea (novel, even? idk), > > > > and I wish we had some analog for userspace binaries that could > > > > perform similar checks. > > > > > > Objtool is generic in many ways -- in fact I recently heard from a PhD > > > candidate who used it successfully on another kernel for an ORC > > > unwinder. > > > > That's pretty cool! Reuse outside the initial context is always a > > good sign that something was designed right. > > So basically you're saying objtool is both useful and well-designed. I > will quote you on that! Haha, all I'm saying is that while I'm not proud that it did find bugs in LLVM (and I do have existing bugs found by it to fix on my plate), I don't see who else or how else those would have been spotted, and I can appreciate that. I think the tools given to us are broken (by design, perhaps), so anything that can help us spot issues might help our code live longer than we do. I also think that there's room for improvement and experimentation in debug info formats, though there is currently a proliferation to support. Live patching and eBPF seem to have some functional overlap IIUC, strengths/weaknesses, and their own unique debug info formats to go with it. Supporting each one does require some level of toolchain support or coordination (or complexity, even). -- Thanks, ~Nick Desaulniers
On Tue, Feb 02, 2021 at 03:01:22PM -0800, Nick Desaulniers wrote: > > >> Thus far we've been able to successfully reverse engineer it on x86, > > >> though it hasn't been easy. > > >> > > >> There were some particulars for arm64 which made doing so impossible. > > >> (I don't remember the details.) > > > > The main issue is that the tables for arm64 have more indirection than x86. > > I wonder if PAC or BTI also make this slightly more complex? PAC at > least has implications for unwinders, IIUC. What is PAC/BTI? > > On x86, the dispatching jump instruction fetches the target address from > > a contiguous array of addresses based on a given offset. So the list of > > potential targets of the jump is neatly organized in a table (and sure, > > before link time these are just relocation, but still processable). > > > > On arm64 (with GCC at least), what is stored in a table is an array of > > candidate offsets from the jump instruction. And because arm64 is > > limited to 32bit instructions, the encoding often requires multiple > > instructions to compute the target address: > > > > ldr<*> x_offset, [x_offsets_table, x_index, ...] // load offset > > adr x_dest_base, <addr> // load target branch for offset 0 > > add x_dest, x_target_base, x_offset, ... // compute final address > > br x_dest // jump > > > > Where this gets trickier is that (with GCC) the offsets stored in the > > table might or might not be signed constants (and this can be seen in > > GCC intermediate representations, but I do not believe this information > > is output in the final object file). And on top of that, GCC might > > decide to use offsets that are seen as unsigned during intermediate > > representation as signed offset by sign extending them in the add > > instruction. > > > > So, to handle this we'd have to track the different operation done with > > the offset, from the load to the final jump, decoding the instructions > > and deducing the potential target instructions from the table of offsets. > > > > But that is error prone as we don't really know how many instructions > > can be between the ones doing the address computation, and I remember > > some messy case of a jump table inside a jump table where tracking the > > instruction touching one or the other offset would need a lot of corner > > case handling. > > > > And this of course is just for GCC, I haven't looked at what it all > > looks like on Clang's end. > > Sure, but this is what production unwinders do, and they don't require > compiler plugins, right? What do you mean by "production unwinders"? Generally unwinders rely on either frame pointers or DWARF, but (without validation) those aren't robust enough for live patching in the kernel, so I'm not sure how this is relevant. > > > I think the details are pertinent to finding a portable solution. The > > > commit message of this commit in particular doesn't document such > > > details, such as why such an approach is necessary or how the data is > > > laid out for objtool to consume it. > > > > > > > Sorry, I will need to make that clearer. The next patch explains it a > > bit [1] > > > > Basically, for simplicity, the plugin creates a new section containing > > Right, this takes a focus on simplicity, at the cost of alienating a toolchain. > > Ard's point about 3193c0836f20 relating to -fgcse is that when > presented with tricky cases to unwind, the simplest approach is taken. > There it was disabling a compiler specific compiler optimization, here > it's either a compiler specific compiler plugin (or disabling another > compiler optimization). The pattern seems to be "Objtool isn't smart > enough" ... "compiler optimization disabled" or "compiler plugin > dependency." You're taking the two absolute worst case scenarios (one of which is just a patch which doesn't look like it's going to get merged anyway) and drawing a false narrative. In this case the simplest approach would have been to just give up and disable jump tables. We try as hard as possible (beyond turning objtool into a full emulator) to avoid doing that kind of thing because objtool isn't supposed to dictate kernel optimizations. Otherwise we would have disabled jump tables (even for non-retpolines) a long time ago, because that's been a serious PITA. You might not like the plugin -- I don't like it either -- but the goal was to avoid penalizing the kernel with "objtool-friendly" optimizations. That said, jump tables are such a pain for objtool (and currently impossible to deal with for arm64) that I'm completely open to just disabling them if they're shown to have negligible benefit for the kernel. -- Josh
On 2/3/21 12:01 AM, Nick Desaulniers wrote: > On Tue, Feb 2, 2021 at 12:57 AM Julien Thierry <jthierry@redhat.com> wrote: >> >> >> >> On 2/2/21 12:17 AM, Nick Desaulniers wrote: >>> On Mon, Feb 1, 2021 at 1:44 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: >>>> >>>> On Fri, Jan 29, 2021 at 10:10:01AM -0800, Nick Desaulniers wrote: >>>>> On Wed, Jan 27, 2021 at 3:27 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote: >>>>>> >>>>>> On Wed, Jan 27, 2021 at 02:15:57PM -0800, Nick Desaulniers wrote: >>>>>>>> From: Raphael Gault <raphael.gault@arm.com> >>>>>>>> >>>>>>>> This plugins comes into play before the final 2 RTL passes of GCC and >>>>>>>> detects switch-tables that are to be outputed in the ELF and writes >>>>>>>> information in an ".discard.switch_table_info" section which will be >>>>>>>> used by objtool. >>>>>>>> >>>>>>>> Signed-off-by: Raphael Gault <raphael.gault@arm.com> >>>>>>>> [J.T.: Change section name to store switch table information, >>>>>>>> Make plugin Kconfig be selected rather than opt-in by user, >>>>>>>> Add a relocation in the switch_table_info that points to >>>>>>>> the jump operation itself] >>>>>>>> Signed-off-by: Julien Thierry <jthierry@redhat.com> >>>>>>> >>>>>>> Rather than tightly couple this feature to a particular toolchain via >>>>>>> plugin, it might be nice to consider what features could be spec'ed out >>>>>>> for toolchains to implement (perhaps via a -f flag). >>>>>> >>>>>> The problem is being able to detect switch statement jump table vectors. >>>>>> >>>>>> For a given indirect branch (due to a switch statement), what are all >>>>>> the corresponding jump targets? >>>>>> >>>>>> We would need the compiler to annotate that information somehow. >>>>> >>>>> Makes sense, the compiler should have this information. How is this >>>>> problem solved on x86? >>>> >>>> Thus far we've been able to successfully reverse engineer it on x86, >>>> though it hasn't been easy. >>>> >>>> There were some particulars for arm64 which made doing so impossible. >>>> (I don't remember the details.) >> >> The main issue is that the tables for arm64 have more indirection than x86. > > I wonder if PAC or BTI also make this slightly more complex? PAC at > least has implications for unwinders, IIUC. > >> >> On x86, the dispatching jump instruction fetches the target address from >> a contiguous array of addresses based on a given offset. So the list of >> potential targets of the jump is neatly organized in a table (and sure, >> before link time these are just relocation, but still processable). >> >> On arm64 (with GCC at least), what is stored in a table is an array of >> candidate offsets from the jump instruction. And because arm64 is >> limited to 32bit instructions, the encoding often requires multiple >> instructions to compute the target address: >> >> ldr<*> x_offset, [x_offsets_table, x_index, ...] // load offset >> adr x_dest_base, <addr> // load target branch for offset 0 >> add x_dest, x_target_base, x_offset, ... // compute final address >> br x_dest // jump >> >> Where this gets trickier is that (with GCC) the offsets stored in the >> table might or might not be signed constants (and this can be seen in >> GCC intermediate representations, but I do not believe this information >> is output in the final object file). And on top of that, GCC might >> decide to use offsets that are seen as unsigned during intermediate >> representation as signed offset by sign extending them in the add >> instruction. >> >> So, to handle this we'd have to track the different operation done with >> the offset, from the load to the final jump, decoding the instructions >> and deducing the potential target instructions from the table of offsets. >> >> But that is error prone as we don't really know how many instructions >> can be between the ones doing the address computation, and I remember >> some messy case of a jump table inside a jump table where tracking the >> instruction touching one or the other offset would need a lot of corner >> case handling. >> >> And this of course is just for GCC, I haven't looked at what it all >> looks like on Clang's end. > > Sure, but this is what production unwinders do, and they don't require > compiler plugins, right? I don't doubt unwinders can be made simpler > with changes to toolchain output; please work with your compiler > vendor on making such changes rather than relying on compiler plugins > to do so. > I think there is a small confusion. The plugin nor the data it generates is not to be used by a kernel unwinder. It's here to allow objtool to assess whether the code being checked can be unwound (?) reliably (not omitting functions). Part of this is checking that a branch/jump in a function does not end up in some code that is not related to the function without setting up a call frame. This is about static validation rather than functionality. >>> I think the details are pertinent to finding a portable solution. The >>> commit message of this commit in particular doesn't document such >>> details, such as why such an approach is necessary or how the data is >>> laid out for objtool to consume it. >>> >> >> Sorry, I will need to make that clearer. The next patch explains it a >> bit [1] >> >> Basically, for simplicity, the plugin creates a new section containing > > Right, this takes a focus on simplicity, at the cost of alienating a toolchain. > > Ard's point about 3193c0836f20 relating to -fgcse is that when > presented with tricky cases to unwind, the simplest approach is taken. > There it was disabling a compiler specific compiler optimization, here > it's either a compiler specific compiler plugin (or disabling another > compiler optimization). The pattern seems to be "Objtool isn't smart > enough" ... "compiler optimization disabled" or "compiler plugin > dependency." > >> tables (one per jump table) of references to the jump targets, similar >> to what x86 has, except that in this case this table isn't actually used >> by runtime code and is discarded at link time. I only chose this to >> minimize what needed to be changed in objtool and because the format >> seemed simple enough. >> >> But I'm open on some alternative, whether it's a -fjump-table-info > > Yes, I think we could spec out something like that. But I would > appreciate revisiting open questions around stack validation (frame > pointers), preventing the generation of jump tables to begin with > (-fno-jump-tables) in place of making objtool more robust, or > generally the need to depend on compiler plugins. > I'll give it a try at least for the arm64 side. Thanks, -- Julien Thierry
On Tue, Feb 02, 2021 at 06:14:14PM -0600, Josh Poimboeuf wrote: > > Sure, but this is what production unwinders do, and they don't require > > compiler plugins, right? > > What do you mean by "production unwinders"? Generally unwinders rely on > either frame pointers or DWARF, but (without validation) those aren't > robust enough for live patching in the kernel, so I'm not sure how this > is relevant. Not to mention that DWARF and consequently it's unders are horribly large, complex and above all fragile things. There's a reason ORC got invented, DWARF is simlpy unacceptable and inadequate. Now, one avenue that has been mentioned in the past, but I've not seen recently, is to have objtool use DWARF as input to help it understand the code. At least in userspace we can rely on DWARF libs. But I'm fairly sure people aren't jumping up and down for having to always build their kernel with DWARFs on, compile speed etc..
On Tue, Feb 02, 2021 at 06:14:14PM -0600, Josh Poimboeuf wrote: > On Tue, Feb 02, 2021 at 03:01:22PM -0800, Nick Desaulniers wrote: > > I wonder if PAC or BTI also make this slightly more complex? PAC at > > least has implications for unwinders, IIUC. > What is PAC/BTI? PAC and BTI are ARM architecture extensions. PAC uses a tag in pointers to sign and verify them, presenting a barrier to ROP, and when BTI is active only specific instructions can be branched to. Since PAC modifies pointers when it is active the unwinder has to undo the tagging to understand what's being pointed to, that's already there.
On Tue, Feb 02, 2021 at 06:14:14PM -0600, Josh Poimboeuf wrote: > On Tue, Feb 02, 2021 at 03:01:22PM -0800, Nick Desaulniers wrote: > > > >> Thus far we've been able to successfully reverse engineer it on x86, > > > >> though it hasn't been easy. > > > >> > > > >> There were some particulars for arm64 which made doing so impossible. > > > >> (I don't remember the details.) > > > > > > The main issue is that the tables for arm64 have more indirection than x86. > > > > I wonder if PAC or BTI also make this slightly more complex? PAC at > > least has implications for unwinders, IIUC. > > What is PAC/BTI? PAC is "Pointer Authentication Codes". The gist is that we munge some bits in pointers when they get stored in memory (called "signing"), and undo that with a check (called "authentication") when reading from memory, in order to detect unexpected modification. There's some new instructions that may exist in function prologues and epilogues, etc. There's a basic introduction at: https://events.static.linuxfound.org/sites/events/files/slides/slides_23.pdf https://www.kernel.org/doc/html/latest/arm64/pointer-authentication.html Return address signing/authentication uses the SP as an input, so without knowing the SP something was signed against it's not possible to alter it reliably (or to check it). The arm64 unwinder ignores the PAC bits, and ftrace uses patchable-function-entry so that we don't have to do anything special to manipulate the return address. Today the ABI used by the kernel doesn't mess with the pointers used in jump tables, but that may come in future as toolchain folk are working to define an ABI that might. BTI is "Branch Target Identification", which is a bit like CET's indirect branch tracking -- indirect branches need to land on a specific instruction, or they'll raise an exception. Thanks, Mark.
> On 3 Feb 2021, at 00:01, Nick Desaulniers <ndesaulniers@google.com> wrote: > > I wonder if PAC or BTI also make this slightly more complex? PAC at > least has implications for unwinders, IIUC. BTI has effect on the jump target\case because a landing pad is required for each and every target. As I see that would not hurt here. The unwinder shall take care of the PAC signed return address if reads the LR from the stack. DWARF contains an entry where the LR is got signed\authenticated. I’m wondering would be simpler or possible to transform the DWARD to ORC.
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 05e17351e4f3..93a320cc8e03 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -100,6 +100,7 @@ config ARM64 select DMA_DIRECT_REMAP select EDAC_SUPPORT select FRAME_POINTER + select GCC_PLUGIN_SWITCH_TABLES if STACK_VALIDATION select GENERIC_ALLOCATOR select GENERIC_ARCH_TOPOLOGY select GENERIC_CLOCKEVENTS_BROADCAST diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins index 952e46876329..8af322311f6b 100644 --- a/scripts/Makefile.gcc-plugins +++ b/scripts/Makefile.gcc-plugins @@ -46,6 +46,8 @@ ifdef CONFIG_GCC_PLUGIN_ARM_SSP_PER_TASK endif export DISABLE_ARM_SSP_PER_TASK_PLUGIN +gcc-plugin-$(CONFIG_GCC_PLUGIN_SWITCH_TABLES) += arm64_switch_table_detection_plugin.so + # All the plugin CFLAGS are collected here in case a build target needs to # filter them out of the KBUILD_CFLAGS. GCC_PLUGINS_CFLAGS := $(strip $(addprefix -fplugin=$(objtree)/scripts/gcc-plugins/, $(gcc-plugin-y)) $(gcc-plugin-cflags-y)) diff --git a/scripts/gcc-plugins/Kconfig b/scripts/gcc-plugins/Kconfig index ab9eb4cbe33a..76efbb97d223 100644 --- a/scripts/gcc-plugins/Kconfig +++ b/scripts/gcc-plugins/Kconfig @@ -104,4 +104,8 @@ config GCC_PLUGIN_ARM_SSP_PER_TASK bool depends on GCC_PLUGINS && ARM +config GCC_PLUGIN_SWITCH_TABLES + bool + depends on GCC_PLUGINS && ARM64 + endif diff --git a/scripts/gcc-plugins/arm64_switch_table_detection_plugin.c b/scripts/gcc-plugins/arm64_switch_table_detection_plugin.c new file mode 100644 index 000000000000..60ef00ff2c5b --- /dev/null +++ b/scripts/gcc-plugins/arm64_switch_table_detection_plugin.c @@ -0,0 +1,85 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <stdio.h> +#include "gcc-common.h" + +__visible int plugin_is_GPL_compatible; + +#define GEN_QUAD(rtx) assemble_integer_with_op(".quad ", rtx) + +/* + * Create an array of metadata for each jump table found in the rtl. + * The metadata contains: + * - A reference to first instruction part of the RTL expanded into an + * acutal jump + * - The number of entries in the table of offsets + * - A reference to each possible jump target + * + * Separate each entry with a null quad word. + */ +static unsigned int arm64_switchtbl_rtl_execute(void) +{ + rtx_insn *insn; + rtx_insn *labelp = NULL; + rtx_jump_table_data *tablep = NULL; + section *swt_sec; + section *curr_sec = current_function_section(); + + swt_sec = get_section(".discard.switch_table_info", + SECTION_DEBUG | SECTION_EXCLUDE, NULL); + + for (insn = get_insns(); insn; insn = NEXT_INSN(insn)) { + /* + * Find a tablejump_p INSN (using a dispatch table) + */ + if (!tablejump_p(insn, &labelp, &tablep)) + continue; + + if (labelp && tablep) { + rtx_code_label *label_to_jump; + rtvec jump_labels = tablep->get_labels(); + int nr_labels = GET_NUM_ELEM(jump_labels); + int i; + + label_to_jump = gen_label_rtx(); + SET_LABEL_KIND(label_to_jump, LABEL_NORMAL); + emit_label_before(label_to_jump, insn); + LABEL_PRESERVE_P(label_to_jump) = 1; + + switch_to_section(swt_sec); + GEN_QUAD(GEN_INT(0)); // mark separation between rela tables + GEN_QUAD(gen_rtx_LABEL_REF(Pmode, label_to_jump)); + GEN_QUAD(GEN_INT(nr_labels)); + for (i = 0; i < nr_labels; i++) + GEN_QUAD(gen_rtx_LABEL_REF(Pmode, + label_ref_label(RTVEC_ELT(jump_labels, i)))); + switch_to_section(curr_sec); + delete_insn(label_to_jump); + } + } + return 0; +} + +#define PASS_NAME arm64_switchtbl_rtl + +#define NO_GATE +#include "gcc-generate-rtl-pass.h" + +__visible int plugin_init(struct plugin_name_args *plugin_info, + struct plugin_gcc_version *version) +{ + const char * const plugin_name = plugin_info->base_name; + + if (!plugin_default_version_check(version, &gcc_version)) { + error(G_("incompatible gcc/plugin versions")); + return 1; + } + + PASS_INFO(arm64_switchtbl_rtl, "final", 1, + PASS_POS_INSERT_BEFORE); + + register_callback(plugin_info->base_name, PLUGIN_PASS_MANAGER_SETUP, + NULL, &arm64_switchtbl_rtl_pass_info); + + return 0; +}