
[v2,0/6] Add support for Control-Flow Integrity

Message ID 20201023200645.1055-1-dbuono@linux.vnet.ibm.com

Message

Daniele Buono Oct. 23, 2020, 8:06 p.m. UTC
v2: Several months (and structural changes in QEMU) have passed since v1.
While the spirit of the patch is similar, the implementation is changed
in multiple points, and should address most if not all the comments
received in v1.
* Instead of disabling CFI in specific functions by using a filter file,
  disable CFI by using a new decorator to be prefixed to the function
  definition. The decorator is automatically expanded to an attribute
  asking clang to disable cfi-icall on the function.
  This should simplify tracking of sensitive functions, compared to
  keeping the list in a separate file.
  I tentatively added myself as maintainer of a new include file defined for
  that purpose, in case a maintainer is considered needed.
* Updated patch to work with the new build system based on meson
* Split LTO and CFI options. Now LTO can be used independently of CFI.
  LTO uses meson's built-in option (b_lto) and can now work, in general,
  with any linker (ld, gold, lld). LTO with meson works fine with
  clang >= 6 and requires LLVM's ar to handle static libraries containing
  intermediate code (selectable by setting the environment variable
  AR to llvm-ar-xx).
* Introduce a small patch for the linker script used by fuzzing targets,
  so that it works properly with both bfd and lld >=12
* Disable a couple of warning checks that trigger errors with clang >= 11
* Add additional checks for fuzzing and LTO. At the moment, only LLVM's
  lld linker (v12) supports fuzzing together with LTO, because of a bug in
  the bfd linker when handling --wrap with LTO. Therefore, lld is
  automatically selected when fuzzing and LTO are both enabled.
* Verified that fuzzing works with LTO and CFI enabled.

-----
v1's cover letter starts here
-----

LLVM/Clang supports multiple types of forward-edge Control-Flow Integrity
(CFI), including runtime checks on indirect function calls.

CFI on indirect function calls can significantly improve QEMU security,
by limiting one of the most commonly used attack vectors for VM escape.
The attacks demonstrated in [1], [2] and [3] all, at some point,
overwrite a function pointer in a QEMU data structure.

At a high level, LLVM's implementation relies on compile-time information
to create a range of consecutive trampolines for "compatible functions".
At runtime, if the target pointer is not in the valid range, the control
flow is assumed to have been hijacked, and the process is terminated with
an "Illegal Instruction" exception.

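The range check can be modelled in a few lines of C. This is a simplified
sketch under assumed parameters (a table of `count` trampolines, each
`1 << shift` bytes long, starting at address `base`); LLVM's actual
lowering differs in detail, and cfi_target_in_range() is a hypothetical
name. It shows how a subtraction, a rotate and a single unsigned compare
reject both out-of-range and misaligned targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the runtime check. Assumed layout: `count`
 * trampolines for one class of compatible functions, each (1 << shift)
 * bytes long, placed consecutively starting at address `base`.
 */
static bool cfi_target_in_range(uintptr_t target, uintptr_t base,
                                unsigned shift, uintptr_t count)
{
    const unsigned bits = sizeof(uintptr_t) * 8;

    assert(shift > 0 && shift < bits);

    /* Unsigned subtraction: a target below `base` wraps to a huge value. */
    uintptr_t diff = target - base;

    /*
     * Rotate right by `shift`. For an aligned target this yields the
     * trampoline index; misaligned low bits land in the top bits and make
     * the value huge. One unsigned compare therefore rejects targets that
     * are out of range or not on a trampoline boundary.
     */
    uintptr_t index = (diff >> shift) | (diff << (bits - shift));

    return index < count;
}
```

For instance, with base 0x1000, shift 3 and count 4, the targets 0x1000
and 0x1018 pass, while 0x1020 (past the end), 0x1004 (misaligned) and
0xff8 (before the start) are all rejected. In the instrumented binary, a
failed check executes a trap instruction (at least in clang's trap mode),
consistent with the "Illegal Instruction" termination described above.
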
CAVEATS:

1) For this CFI check to work, the code must always respect the function
signature when using function pointers. While this is generally true
in QEMU, there are a few instances where pointers are handled as
generic void* by the caller. Since this is a common approach, Clang
offers a flag to relax pointer checks and consider all pointer types
to be compatible.
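
As a sketch of what "respecting the function signature" means in
practice, assuming a callback API that passes its object as void *: the
adapter below has exactly the type of the pointer it is called through,
so a strict cfi-icall check accepts the call. All names are hypothetical.
The relaxation mentioned above is presumably clang's
-fsanitize-cfi-icall-generalize-pointers, which treats pointer parameter
types as compatible with each other.

```c
#include <assert.h>
#include <stddef.h>

/* Generic callback type: callers pass the object as an opaque void *. */
typedef void (*example_notify_fn)(void *opaque);

typedef struct ExampleCounter {
    int hits;
} ExampleCounter;

/*
 * Signature-exact callback: it takes void * as example_notify_fn requires
 * and converts internally. Registering a function declared as
 * void f(ExampleCounter *) through a cast instead would make the call
 * below go through a mismatched type, which strict cfi-icall rejects
 * unless pointer types are generalized.
 */
static void example_counter_bump(void *opaque)
{
    ExampleCounter *c = opaque;

    assert(c != NULL);
    c->hits++;
}

/* The indirect call site that cfi-icall instruments. */
static void example_fire(example_notify_fn fn, void *opaque)
{
    fn(opaque);
}
```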

2) Since CFI relies on compile-time information, it requires using
link-time optimization (LTO) to support CFI across compilation units.
This adds a requirement for the gold linker, and for LLVM's versions of
the static library tools (ar, ranlib, nm).

3) CFI checks cannot be performed on shared libraries (given that their
functions are not known at compile time). This means that indirect function
calls will fail if the function pointer belongs to a shared library.
This does not seem to be a big issue for a standard QEMU deployment today,
but QEMU modules will not work when CFI is enabled.
There is a way to allow shared library pointers, but it is experimental
in LLVM, requires some work, and reduces performance and security. For
these reasons, I think it's best to start with this version and discuss
an extension for modules later.

4) CFI cannot be fully applied to TCG. The purpose of TCG is to transform
code on-the-fly from one ISA to another. In doing so, binary blobs of
executable code are created and called through function pointers.
Since the code did not exist at compile time, runtime CFI checks find such
functions illegal. To allow the code to keep running, CFI checks are not
performed in the core functions of TCG/TCI, nor in the code that
implements TCG plugins.
This does not affect QEMU when executed with KVM, and all the device
emulation code is always protected, even when using TCG.

5) Most of the logic to enable CFI goes in the configure, since it's
just a matter of checking for dependencies and incompatible options.
However, I had to disable CFI checks for a few TCG functions.
This can only be done through a blacklist file. I added a file in the
root of QEMU, called cfi-blacklist.txt for such purpose. I am open to
suggestions on where the file should go, and I am willing to become the
maintainer of it, if deemed necessary.

PERFORMANCE:

Enabling CFI creates a larger binary, which may be an issue in some use
cases. However, the increase is not exceptionally large. On my Ubuntu
system, with default options, the stripped binary grows from 14MiB to
15.3MiB (about 9%) when enabling CFI with Clang v9.

There is also a possible performance issue, since for every indirect
function call, an additional address check is performed, followed by
an additional indirect call to the trampoline function.
However, especially in the case of KVM-based virtualization, the impact
should be minimal, since indirect function pointers should be used mostly
for device emulation.

I used Kata Containers' metrics tests, since they are a simple,
reproducible set of tests to stress storage and network between VMs,
and ran a Lifecycle test to measure VM startup times under a specific
workload. A full report is available at [4].

The difference between LLVM with and without CFI is generally low.
Sometimes CFI is actually offering better performance, which may be
explained by having a different binary layout because of LTO.
Lifecycle and network do not seem to be affected much. With storage,
the situation is a bit more variable, but the oscillations seem to be
more related to the benchmark variability than the CFI overhead.

I also ran a quick check-acceptance on full-system VMs with and without
CFI. The results, also at [4], are comparable, with CFI slightly
outperforming the default binary produced by LLVM.

----

[1] Mehdi Talbi and Paul Fariello. "VM escape - QEMU Case Study".
[2] Nelson Elhage. "Virtunoid: Breaking out of KVM".
[3] Marco Grassi and Kira. "Vulnerability Discovery and Exploitation
    of Virtualization Solutions for Cloud Computing and Desktops".
[4] https://github.com/dbuono/QEMU-CFI-Performance


Daniele Buono (6):
  fuzz: Make fork_fuzz.ld compatible with LLVM's LLD
  configure: avoid new clang 11+ warnings
  configure: add option to enable LTO
  cfi: Initial support for cfi-icall in QEMU
  check-block: enable iotests with cfi-icall
  configure: add support for Control-Flow Integrity

 MAINTAINERS                   |   5 +
 accel/tcg/cpu-exec.c          |   9 ++
 configure                     | 214 ++++++++++++++++++++++++++++++++++
 include/qemu/sanitizers.h     |  22 ++++
 meson.build                   |   3 +
 plugins/core.c                |  25 ++++
 plugins/loader.c              |   5 +
 tcg/tci.c                     |   5 +
 tests/check-block.sh          |  18 +--
 tests/qtest/fuzz/fork_fuzz.ld |  12 +-
 util/main-loop.c              |   9 ++
 util/oslib-posix.c            |   9 ++
 12 files changed, 328 insertions(+), 8 deletions(-)
 create mode 100644 include/qemu/sanitizers.h

Comments

Eric Blake Oct. 23, 2020, 8:33 p.m. UTC | #1
On 10/23/20 3:06 PM, Daniele Buono wrote:
> v2: Several months (and structural changes in QEMU) have passed since v1.
> While the spirit of the patch is similar, the implementation is changed
> in multiple points, and should address most if not all the comments
> received in v1.

> 5) Most of the logic to enable CFI goes in the configure, since it's
> just a matter of checking for dependencies and incompatible options.
> However, I had to disable CFI checks for a few TCG functions.
> This can only be done through a blacklist file. I added a file in the
> root of QEMU, called cfi-blacklist.txt for such purpose. I am open to
> suggestions on where the file should go, and I am willing to become the
> maintainer of it, if deemed necessary.

In the meantime, we have commits like:

commit b199c682f1f0aaee22b2170a5fb885250057eec2
Author: Philippe Mathieu-Daudé <philmd@redhat.com>
Date:   Thu Sep 10 09:01:31 2020 +0200

    target/i386/kvm: Rename host_tsx_blacklisted() as host_tsx_broken()

    In order to use inclusive terminology, rename host_tsx_blacklisted()
    as host_tsx_broken().

which may help you in coming up with a more appropriate name for the new
file.

> 
>  MAINTAINERS                   |   5 +
>  accel/tcg/cpu-exec.c          |   9 ++
>  configure                     | 214 ++++++++++++++++++++++++++++++++++
>  include/qemu/sanitizers.h     |  22 ++++
>  meson.build                   |   3 +
>  plugins/core.c                |  25 ++++
>  plugins/loader.c              |   5 +
>  tcg/tci.c                     |   5 +
>  tests/check-block.sh          |  18 +--
>  tests/qtest/fuzz/fork_fuzz.ld |  12 +-
>  util/main-loop.c              |   9 ++
>  util/oslib-posix.c            |   9 ++
>  12 files changed, 328 insertions(+), 8 deletions(-)
>  create mode 100644 include/qemu/sanitizers.h

although I don't see a new file by that name here, so perhaps the v1
overview is now stale?
Daniele Buono Oct. 24, 2020, 11:58 a.m. UTC | #2
On 10/23/2020 4:33 PM, Eric Blake wrote:
> On 10/23/20 3:06 PM, Daniele Buono wrote:
>> v2: Several months (and structural changes in QEMU) have passed since v1.
>> While the spirit of the patch is similar, the implementation is changed
>> in multiple points, and should address most if not all the comments
>> received in v1.
> 
>> 5) Most of the logic to enable CFI goes in the configure, since it's
>> just a matter of checking for dependencies and incompatible options.
>> However, I had to disable CFI checks for a few TCG functions.
>> This can only be done through a blacklist file. I added a file in the
>> root of QEMU, called cfi-blacklist.txt for such purpose. I am open to
>> suggestions on where the file should go, and I am willing to become the
>> maintainer of it, if deemed necessary.
> 
> In the meantime, we have commits like:
> 
> commit b199c682f1f0aaee22b2170a5fb885250057eec2
> Author: Philippe Mathieu-Daudé <philmd@redhat.com>
> Date:   Thu Sep 10 09:01:31 2020 +0200
> 
>      target/i386/kvm: Rename host_tsx_blacklisted() as host_tsx_broken()
> 
>      In order to use inclusive terminology, rename host_tsx_blacklisted()
>      as host_tsx_broken().
> 
> which may help you in coming up with a more appropriate name for the new
> file.
> 
>>
>>   MAINTAINERS                   |   5 +
>>   accel/tcg/cpu-exec.c          |   9 ++
>>   configure                     | 214 ++++++++++++++++++++++++++++++++++
>>   include/qemu/sanitizers.h     |  22 ++++
>>   meson.build                   |   3 +
>>   plugins/core.c                |  25 ++++
>>   plugins/loader.c              |   5 +
>>   tcg/tci.c                     |   5 +
>>   tests/check-block.sh          |  18 +--
>>   tests/qtest/fuzz/fork_fuzz.ld |  12 +-
>>   util/main-loop.c              |   9 ++
>>   util/oslib-posix.c            |   9 ++
>>   12 files changed, 328 insertions(+), 8 deletions(-)
>>   create mode 100644 include/qemu/sanitizers.h
> 
> although I don't see a new file by that name here, so perhaps the v1
> overview is now stale?
> 
Correct, the v1 overview is stale in that regard. V2 is not using a
"broken" file anymore. CFI is now disabled by using an attribute
directly on the code.

From the v2 overview:
* Instead of disabling CFI in specific functions by using a filter file,
  disable CFI by using a new decorator to be prefixed to the function
  definition.

Besides the removal of a non-inclusive term, I believe this is a better
way to track functions, since it is directly inside the code, so everyone
working on those functions will see it immediately. It is also more
robust to function renaming and, hopefully, this will make maintaining
CFI easier.
Daniel P. Berrangé Oct. 26, 2020, 9:26 a.m. UTC | #3
On Fri, Oct 23, 2020 at 03:33:31PM -0500, Eric Blake wrote:
> On 10/23/20 3:06 PM, Daniele Buono wrote:
> > v2: Several months (and structural changes in QEMU) have passed since v1.
> > While the spirit of the patch is similar, the implementation is changed
> > in multiple points, and should address most if not all the comments
> > received in v1.
> 
> > 5) Most of the logic to enable CFI goes in the configure, since it's
> > just a matter of checking for dependencies and incompatible options.
> > However, I had to disable CFI checks for a few TCG functions.
> > This can only be done through a blacklist file. I added a file in the
> > root of QEMU, called cfi-blacklist.txt for such purpose. I am open to
> > suggestions on where the file should go, and I am willing to become the
> > maintainer of it, if deemed necessary.
> 
> In the meantime, we have commits like:
> 
> commit b199c682f1f0aaee22b2170a5fb885250057eec2
> Author: Philippe Mathieu-Daudé <philmd@redhat.com>
> Date:   Thu Sep 10 09:01:31 2020 +0200
> 
>     target/i386/kvm: Rename host_tsx_blacklisted() as host_tsx_broken()
> 
>     In order to use inclusive terminology, rename host_tsx_blacklisted()
>     as host_tsx_broken().
> 
> which may help you in coming up with a more appropriate name for the new
> file.

Something like cfi-exclude-list.txt or cfi-skip-list.txt seems reasonable.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Paolo Bonzini Oct. 26, 2020, 9:51 a.m. UTC | #4
On 23/10/20 22:06, Daniele Buono wrote:
> This patch allows to compile QEMU with link-time optimization (LTO).
> Compilation with LTO is handled directly by meson. This patch adds checks
> in configure to make sure the toolchain supports LTO.
> 
> Currently, allow LTO only with clang, since I have found a couple of issues
> with gcc-based LTO.
> 
> In case fuzzing is enabled, automatically switch to llvm's linker (lld).
> The standard bfd linker has a bug where function wrapping (used by the fuzz*
> targets) is used in conjunction with LTO.
> 
> Tested with all major versions of clang from 6 to 12
> 
> Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>

What are the problems like if you have GCC or your ar/linker are not up
to the job?  I wouldn't mind omitting the tests since this has to be
enabled explicitly by the user.

Paolo
Daniel P. Berrangé Oct. 26, 2020, 3:50 p.m. UTC | #5
On Mon, Oct 26, 2020 at 10:51:43AM +0100, Paolo Bonzini wrote:
> On 23/10/20 22:06, Daniele Buono wrote:
> > This patch allows to compile QEMU with link-time optimization (LTO).
> > Compilation with LTO is handled directly by meson. This patch adds checks
> > in configure to make sure the toolchain supports LTO.
> > 
> > Currently, allow LTO only with clang, since I have found a couple of issues
> > with gcc-based LTO.
> > 
> > In case fuzzing is enabled, automatically switch to llvm's linker (lld).
> > The standard bfd linker has a bug where function wrapping (used by the fuzz*
> > targets) is used in conjunction with LTO.
> > 
> > Tested with all major versions of clang from 6 to 12
> > 
> > Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
> 
> What are the problems like if you have GCC or your ar/linker are not up
> to the job?  I wouldn't mind omitting the tests since this has to be
> enabled explicitly by the user.

We temporarily disabled LTO in Fedora rawhide due to GCC bugs causing
weird test suite asserts. Those were pre-release versions of GCC/binutils
though. I've just tested again and LTO works correctly, so I've enabled
LTO once again.

Regards,
Daniel
Daniele Buono Oct. 27, 2020, 2:57 p.m. UTC | #6
In terms of ar and linker, if you don't have the right mix it will just
stop at link time with an error.

In terms of using gcc the errors may be a bit more subtle, similar to
what Daniel mentioned: it compiles successfully but then shows issues at
runtime or in the test suite.

I'm using Ubuntu 18.04 and the stock compiler (based on gcc 7.5) issues
a bunch of warnings but compiles successfully with LTO.
However, the TCG binary for sparc64 is broken. System-wide emulation
stops in OpenFirmware with an exception. User emulation triggers a
segmentation fault in some of the test cases. If I compile QEMU with
--enable-debug the tests magically work.

I briefly tested with gcc-9 and that seemed to work OK, but your mileage
may vary.

Daniel P. Berrangé Oct. 27, 2020, 3:17 p.m. UTC | #7
On Tue, Oct 27, 2020 at 10:57:14AM -0400, Daniele Buono wrote:
> In terms of ar and linker, if you don't have the right mix it will just
> stop at link time with an error.
> 
> In terms of using gcc the errors may be a bit more subtle, similar to
> what Daniel mentioned: it compiles successfully but then shows issues at
> runtime or in the test suite.
> 
> I'm using Ubuntu 18.04 and the stock compiler (based on gcc 7.5) issues
> a bunch of warnings but compiles successfully with LTO.
> However, the TCG binary for sparc64 is broken. System-wide emulation
> stops in OpenFirmware with an exception. User emulation triggers a
> segmentation fault in some of the test cases. If I compile QEMU with
> --enable-debug the tests magically work.
> 
> I briefly tested with gcc-9 and that seemed to work OK, but your mileage
> may vary.

This is why we shouldn't artificially block use of LTO with GCC in
the configure script. It blocks completely legitimate usage of
LTO with GCC versions where it works.

The user can detect if their version of GCC is broken by running the
test suite during their build process, which is best practice already,
and actually testing the result.


Regards,
Daniel
Daniele Buono Oct. 27, 2020, 8:42 p.m. UTC | #8
Ok, no problem. I can definitely disable the check on GCC.

Paolo, would you like me to disable checks on AR/linker for lto too?
If so, should I add some of this information on a document, perhaps
docs/devel/lto.rst, so it is written somewhere for future uses?

--

Btw, using LTO with GCC I found another interesting warning here
(adding the SCSI maintainer so they can chip in on the solution):

In function 'scsi_disk_new_request_dump',
     inlined from 'scsi_new_request' at ../qemu-cfi-v3/hw/scsi/scsi-disk.c:2588:9:
../qemu-cfi-v3/hw/scsi/scsi-disk.c:2562:17: warning: argument 1 value '18446744073709551612' exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
      line_buffer = g_malloc(len * 5 + 1);
                  ^
../qemu-cfi-v3/hw/scsi/scsi-disk.c: In function 'scsi_new_request':
/usr/include/glib-2.0/glib/gmem.h:78:10: note: in a call to allocation function 'g_malloc' declared here
  gpointer g_malloc         (gsize  n_bytes) G_GNUC_MALLOC G_GNUC_ALLOC_SIZE(1);

This seems like a bug to me. len is a signed integer filled in by
scsi_cdb_length(), which can return -1 if it can't decode the command.
In that case we would call g_malloc() with a huge size, which would
fail. However, scsi_disk_new_request_dump is used for tracing and:

a) I believe an unknown command here is a possibility, and is
handled by the caller - scsi_new_request - that has the following:

     command = buf[0];
     ops = scsi_disk_reqops_dispatch[command];
     if (!ops) {
         ops = &scsi_disk_emulate_reqops;
     }

so terminating the process on a failed g_malloc() here is probably not
desired.

b) For tracing, we should probably print the content of the buffer
anyway, so that the unknown command can be debugged. However, I don't
know what size I should use here.
I'm thinking either 1, to print just the command header in the buffer,
or the maximum size of the buffer, which I am not sure how to get.
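
The value in the warning can be reproduced with a few lines of C. This
sketch assumes only what the warning shows (the expression len * 5 + 1
with a signed len); example_dump_size() and example_dump_size_checked()
are hypothetical helpers, and the second one merely illustrates option b)
with a size of 1, not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mirrors the expression in the warning, g_malloc(len * 5 + 1), with a
 * signed `len`. The int result is converted to an unsigned size.
 */
static size_t example_dump_size(int len)
{
    return (size_t)(len * 5 + 1);   /* len == -1 gives (size_t)-4 */
}

/*
 * Hypothetical guarded variant of option b) with a size of 1: when the
 * CDB length is unknown, size the buffer for the command byte only.
 */
static size_t example_dump_size_checked(int len)
{
    if (len < 0) {
        len = 1;
    }
    return (size_t)len * 5 + 1;
}
```

On a 64-bit host, SIZE_MAX - 3 is 18446744073709551612, the exact value in
the warning, consistent with len == -1 coming from scsi_cdb_length().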

Any ideas, or would you prefer that I send an initial patch and we
discuss it there?

Paolo Bonzini Oct. 28, 2020, 6:44 a.m. UTC | #9
On 27/10/20 21:42, Daniele Buono wrote:
> Ok, no problem. I can definitely disable the check on GCC.
> 
> Paolo, would you like me to disable checks on AR/linker for lto too?
> If so, should I add some of this information on a document, perhaps
> docs/devel/lto.rst, so it is written somewhere for future uses?

I am not sure of the effects.  Does it simply effectively disable LTO or
is it something worse?

I'll look into the SCSI issue.

Paolo
Alex Bennée Oct. 28, 2020, 9:35 a.m. UTC | #10
Daniele Buono <dbuono@linux.vnet.ibm.com> writes:

> In terms of ar and linker, if you don't have the right mix it will just
> stop at link time with an error.
>
> In terms of using gcc the errors may be a bit more subtle, similar to
> what Daniel mentioned: it compiles successfully but then shows issues at
> runtime or in the test suite.
>
> I'm using Ubuntu 18.04 and the stock compiler (based on gcc 7.5) issues
> a bunch of warnings but compiles successfully with LTO.
> However, the TCG binary for sparc64 is broken.

sparc64-linux-user? I think that might be in a bit of a bit-rotted state
- we had to disable running check-tcg on it in CI because of instability,
so I wouldn't be surprised if messing around with LTO has dug up even
more gremlins.

> System-wide emulation
> stops in OpenFirmware with an exception. User emulation triggers a
> segmentation fault in some of the test cases. If I compile QEMU with
> --enable-debug the tests magically work.

Breakage in both system and linux-user emulation probably points at
something in the instruction decode being broken. Shame we don't have a
working risu setup for sparc64 to give the instruction handling a proper
workout.

Daniele Buono Oct. 28, 2020, 6:22 p.m. UTC | #11
If LTO is enabled with the wrong linker/ar:
- with the checks, it will exit at configure time with an error. I can
change this into a warning and disable LTO if preferred.
- without the checks, compilation will fail.

If LTO is enabled with the wrong compiler (e.g. old gcc), you may get a
bunch of warnings at compile time, and a binary that won't pass some of
the tests in make check.

Daniele Buono Oct. 28, 2020, 6:47 p.m. UTC | #12
On 10/28/2020 5:35 AM, Alex Bennée wrote:
> Breakage in both system and linux-user emulation probably points at
> something in the instruction decode being broken. Shame we don't have a
> working risu setup for sparc64 to give the instruction handling a proper
> work out.

This is what I'm thinking too. The interesting bit is that sparc32
seems to work fine, and it should be the same codebase.

I played with it for a couple of days but couldn't isolate the faulty
instruction. I'd be happy to work on this issue with someone,
perhaps from the sparc maintainers, to see if we can find out what's
happening.
Paolo Bonzini Oct. 29, 2020, 10:19 a.m. UTC | #13
On 28/10/20 19:22, Daniele Buono wrote:
> If LTO is enabled with the wrong linker/ar:
> - with the checks, it will exit at configure time with an error. I can
> change this into a warning and disable LTO if preferred.
> - without the checks, compilation will fail.
> 
> If LTO is enabled with the wrong compiler (e.g. old gcc), you may get a
> bunch of warnings at compile time, and a binary that won't pass some of
> the tests in make check.

I think both of these count as user error or compiler bug, which we
generally don't protect against.

There is one exception.  We check if the C++ compiler driver can link
object files produced by the C compiler driver; this issue arises if the
driver used for compilation (C) is GCC and the driver used for linking
(C++) is clang, because GCC's and clang's sanitizer libraries are not
compatible with each other.

I think, however, that in this case the problem is not one of
compatibility, but just a broken install, so we can simply ignore it
and forward b_lto.

Paolo