mbox series

[v3,0/9] RISCV: Add kvm Sstc timer selftests

Message ID cover.1694421911.git.haibo1.xu@intel.com
Headers show
Series RISCV: Add kvm Sstc timer selftests | expand

Message

Haibo Xu Sept. 14, 2023, 1:36 a.m. UTC
The RISC-V arch_timer selftests is used to validate Sstc timer
functionality in a guest, which sets up periodic timer interrupts
and check the basic interrupt status upon its receipt.

This KVM selftests was ported from aarch64 arch_timer and tested
with Linux v6.6-rc1 on a Qemu riscv64 virt machine.

---
Changed since v2:
  * Rebase to Linux 6.6-rc1
  * Add separate patch for kvm/Makefile improvement
  * Move aarch64 specific macros to aarch64/arch_timer.c
  * Add -DCONFIG_64BIT to kvm/Makefile CFLAGS to ensure
    only 64bit registers were available in csr.h
  * Avoid some #ifdef in kvm/arch_timer.c by setting some
    aarch64 specific variable to 0 on risc-v

Haibo Xu (9):
  KVM: selftests: Unify the codes for guest exception handling
  KVM: selftests: Unify the makefile rule for split targets
  KVM: arm64: selftests: Split arch_timer test code
  tools: riscv: Add header file csr.h
  KVM: riscv: selftests: Switch to use macro from csr.h
  KVM: riscv: selftests: Add exception handling support
  KVM: riscv: selftests: Add guest helper to get vcpu id
  KVM: riscv: selftests: Change vcpu_has_ext to a common function
  KVM: riscv: selftests: Add sstc timer test

 tools/arch/riscv/include/asm/csr.h            | 521 ++++++++++++++++++
 tools/testing/selftests/kvm/Makefile          |  14 +-
 .../selftests/kvm/aarch64/arch_timer.c        | 291 +---------
 .../selftests/kvm/aarch64/debug-exceptions.c  |   4 +-
 .../selftests/kvm/aarch64/page_fault_test.c   |   4 +-
 .../testing/selftests/kvm/aarch64/vgic_irq.c  |   4 +-
 tools/testing/selftests/kvm/arch_timer.c      | 250 +++++++++
 .../selftests/kvm/include/aarch64/processor.h |  12 +-
 .../selftests/kvm/include/kvm_util_base.h     |   9 +
 .../selftests/kvm/include/riscv/arch_timer.h  |  80 +++
 .../selftests/kvm/include/riscv/processor.h   |  63 ++-
 .../testing/selftests/kvm/include/test_util.h |   2 +
 .../selftests/kvm/include/timer_test.h        |  43 ++
 .../selftests/kvm/include/x86_64/processor.h  |   5 -
 .../selftests/kvm/lib/aarch64/processor.c     |   6 +-
 .../selftests/kvm/lib/riscv/handlers.S        | 101 ++++
 .../selftests/kvm/lib/riscv/processor.c       |  86 +++
 .../selftests/kvm/lib/x86_64/processor.c      |   4 +-
 .../testing/selftests/kvm/riscv/arch_timer.c  | 107 ++++
 .../selftests/kvm/riscv/get-reg-list.c        |  16 +-
 tools/testing/selftests/kvm/x86_64/amx_test.c |   4 +-
 .../selftests/kvm/x86_64/fix_hypercall_test.c |   4 +-
 .../selftests/kvm/x86_64/hyperv_evmcs.c       |   4 +-
 .../selftests/kvm/x86_64/hyperv_features.c    |   8 +-
 .../testing/selftests/kvm/x86_64/hyperv_ipi.c |   6 +-
 .../selftests/kvm/x86_64/kvm_pv_test.c        |   4 +-
 .../selftests/kvm/x86_64/monitor_mwait_test.c |   4 +-
 .../kvm/x86_64/pmu_event_filter_test.c        |   8 +-
 .../smaller_maxphyaddr_emulation_test.c       |   4 +-
 .../selftests/kvm/x86_64/svm_int_ctl_test.c   |   4 +-
 .../kvm/x86_64/svm_nested_shutdown_test.c     |   4 +-
 .../kvm/x86_64/svm_nested_soft_inject_test.c  |   4 +-
 .../kvm/x86_64/ucna_injection_test.c          |   8 +-
 .../kvm/x86_64/userspace_msr_exit_test.c      |   4 +-
 .../vmx_exception_with_invalid_guest_state.c  |   4 +-
 .../selftests/kvm/x86_64/vmx_pmu_caps_test.c  |   4 +-
 .../selftests/kvm/x86_64/xapic_ipi_test.c     |   4 +-
 .../selftests/kvm/x86_64/xcr0_cpuid_test.c    |   4 +-
 .../selftests/kvm/x86_64/xen_shinfo_test.c    |   4 +-
 39 files changed, 1338 insertions(+), 374 deletions(-)
 create mode 100644 tools/arch/riscv/include/asm/csr.h
 create mode 100644 tools/testing/selftests/kvm/arch_timer.c
 create mode 100644 tools/testing/selftests/kvm/include/riscv/arch_timer.h
 create mode 100644 tools/testing/selftests/kvm/include/timer_test.h
 create mode 100644 tools/testing/selftests/kvm/lib/riscv/handlers.S
 create mode 100644 tools/testing/selftests/kvm/riscv/arch_timer.c

Comments

Haibo Xu Sept. 15, 2023, 6:13 a.m. UTC | #1
On Thu, Sep 14, 2023 at 4:52 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Thu, Sep 14, 2023 at 09:36:59AM +0800, Haibo Xu wrote:
> > Signed-off-by: Haibo Xu <haibo1.xu@intel.com>
> > ---
> >  tools/testing/selftests/kvm/include/riscv/processor.h | 8 +-------
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
> > index 5b62a3d2aa9b..67766baed4f7 100644
> > --- a/tools/testing/selftests/kvm/include/riscv/processor.h
> > +++ b/tools/testing/selftests/kvm/include/riscv/processor.h
> > @@ -8,6 +8,7 @@
> >  #define SELFTEST_KVM_PROCESSOR_H
> >
> >  #include "kvm_util.h"
> > +#include <asm/csr.h>
> >  #include <linux/stringify.h>
>
> nit: Usually we try to keep the order of our includes separated into five
> categories, listed below, where each category is sorted alphabetically. Of
> course any dependencies the includes have on each other need to be
> considered too.
>
> <library-includes-without-a-subdir>
> <library-includes-with-subdir>
> <linux/...>
> <asm/...>
> "local-includes"
>

It's very helpful. Thanks for sharing it!
Will fix it in v4.

> >
> >  static inline uint64_t __kvm_reg_id(uint64_t type, uint64_t idx,
> > @@ -95,13 +96,6 @@ static inline uint64_t __kvm_reg_id(uint64_t type, uint64_t idx,
> >  #define PGTBL_PAGE_SIZE                              PGTBL_L0_BLOCK_SIZE
> >  #define PGTBL_PAGE_SIZE_SHIFT                        PGTBL_L0_BLOCK_SHIFT
> >
> > -#define SATP_PPN                             _AC(0x00000FFFFFFFFFFF, UL)
> > -#define SATP_MODE_39                         _AC(0x8000000000000000, UL)
> > -#define SATP_MODE_48                         _AC(0x9000000000000000, UL)
> > -#define SATP_ASID_BITS                               16
> > -#define SATP_ASID_SHIFT                              44
> > -#define SATP_ASID_MASK                               _AC(0xFFFF, UL)
> > -
> >  #define SBI_EXT_EXPERIMENTAL_START           0x08000000
> >  #define SBI_EXT_EXPERIMENTAL_END             0x08FFFFFF
> >
> > --
> > 2.34.1
> >
>
> Assuming the CONFIG_64BIT patch will come before this, then
>
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
>
> Thanks,
> drew
Haibo Xu Sept. 15, 2023, 6:21 a.m. UTC | #2
On Thu, Sep 14, 2023 at 5:52 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Thu, Sep 14, 2023 at 09:37:03AM +0800, Haibo Xu wrote:
> > Add a KVM selftests to validate the Sstc timer functionality.
> > The test was ported from arm64 arch timer test.
>
> I just tried this test out. Running it over and over again on QEMU I see
> it works sometimes, but it frequently fails with the
> GUEST_ASSERT_EQ(config_iter + 1, irq_iter) assert and at least once I
> also saw the __GUEST_ASSERT(xcnt >= cmp) assert.
>

Good catch!

I can also reproduce this issue and it is a common problem for both
arm64 and riscv because it also happens in a arm64 Qemu VM.

It seems like a synchronization issue between host and guest shared
variables. Will double check the test code.

> Thanks,
> drew
Haibo Xu Sept. 15, 2023, 6:23 a.m. UTC | #3
On Thu, Sep 14, 2023 at 6:15 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Thu, Sep 14, 2023 at 10:52:15AM +0100, Conor Dooley wrote:
> > On Thu, Sep 14, 2023 at 11:36:01AM +0200, Andrew Jones wrote:
> > > > +static inline void cpu_relax(void)
> > > > +{
> > > > +#ifdef __riscv_zihintpause
> > > > + asm volatile("pause" ::: "memory");
> > > > +#else
> > > > + /* Encoding of the pause instruction */
> > > > + asm volatile(".4byte 0x100000F" ::: "memory");
> > > > +#endif
> > > > +}
> > >
> > > cpu_relax() should go to include/riscv/processor.h
> >
> > Can the one from asm/vdso/processor.h be reused, or are there special
> > considerations preventing that?
>
> We'd need to copy it into tools/arch/riscv/include/asm, but it could be
> done. Hmm, now that I look at it, I see we're missing the barrier() call
> in this kvm selftests version.
>

Will reuse the one from asm/vdso/processor.h and copy it to
tools/arch/riscv/include/asm.

> Thanks,
> drew
Haibo Xu Dec. 4, 2023, 2:42 a.m. UTC | #4
On Fri, Sep 15, 2023 at 2:21 PM Haibo Xu <xiaobo55x@gmail.com> wrote:
>
> On Thu, Sep 14, 2023 at 5:52 PM Andrew Jones <ajones@ventanamicro.com> wrote:
> >
> > On Thu, Sep 14, 2023 at 09:37:03AM +0800, Haibo Xu wrote:
> > > Add a KVM selftests to validate the Sstc timer functionality.
> > > The test was ported from arm64 arch timer test.
> >
> > I just tried this test out. Running it over and over again on QEMU I see
> > it works sometimes, but it frequently fails with the
> > GUEST_ASSERT_EQ(config_iter + 1, irq_iter) assert and at least once I
> > also saw the __GUEST_ASSERT(xcnt >= cmp) assert.
> >
>
> Good catch!
>
> I can also reproduce this issue and it is a common problem for both
> arm64 and riscv because it also happens in a arm64 Qemu VM.
>
> It seems like a synchronization issue between host and guest shared
> variables. Will double check the test code.
>
> > Thanks,
> > drew

Hi Andrew,

After several rounds of regression testing, some findings:
1. The intermittent failure also happened on ARM64 Qemu VM, and even
in the initial arch_timer commit(4959d8650e9f4).
2. it didn't happen on a ARM64 HW(but a different failure occured
during stress test)
3. The failure have a close relationship with
TIMER_TEST_ERR_MARGIN_US(default 100), and after increasing
     the macro to 300, the failure couldn't reproduced in 1000 loops
stress test in RISC-V Qemu VM

So my suggestion is we can expose the TIMER_TEST_ERR_MARGIN_US
parameter as an arch_timer test arg parameter
and tune it based on a specific test environment.

What's your opinion?

Regards,
Haibo
Andrew Jones Dec. 4, 2023, 11:32 a.m. UTC | #5
On Mon, Dec 04, 2023 at 10:42:24AM +0800, Haibo Xu wrote:
> On Fri, Sep 15, 2023 at 2:21 PM Haibo Xu <xiaobo55x@gmail.com> wrote:
> >
> > On Thu, Sep 14, 2023 at 5:52 PM Andrew Jones <ajones@ventanamicro.com> wrote:
> > >
> > > On Thu, Sep 14, 2023 at 09:37:03AM +0800, Haibo Xu wrote:
> > > > Add a KVM selftests to validate the Sstc timer functionality.
> > > > The test was ported from arm64 arch timer test.
> > >
> > > I just tried this test out. Running it over and over again on QEMU I see
> > > it works sometimes, but it frequently fails with the
> > > GUEST_ASSERT_EQ(config_iter + 1, irq_iter) assert and at least once I
> > > also saw the __GUEST_ASSERT(xcnt >= cmp) assert.
> > >
> >
> > Good catch!
> >
> > I can also reproduce this issue and it is a common problem for both
> > arm64 and riscv because it also happens in a arm64 Qemu VM.
> >
> > It seems like a synchronization issue between host and guest shared
> > variables. Will double check the test code.
> >
> > > Thanks,
> > > drew
> 
> Hi Andrew,
> 
> After several rounds of regression testing, some findings:
> 1. The intermittent failure also happened on ARM64 Qemu VM, and even
> in the initial arch_timer commit(4959d8650e9f4).
> 2. it didn't happen on a ARM64 HW(but a different failure occured
> during stress test)
> 3. The failure have a close relationship with
> TIMER_TEST_ERR_MARGIN_US(default 100), and after increasing
>      the macro to 300, the failure couldn't reproduced in 1000 loops
> stress test in RISC-V Qemu VM
> 
> So my suggestion is we can expose the TIMER_TEST_ERR_MARGIN_US
> parameter as an arch_timer test arg parameter
> and tune it based on a specific test environment.
> 
> What's your opinion?

The concept of "timeout for an interrupt to arrive" is always going to
leave us exposed to random failures. Your suggestion of making the
timeout user configurable is probably the best we can do. I would
suggest also adding more descriptive failure text and a hint about
trying to adjust the timeout.

Or, one thing we do in kvm-unit-tests, is to reduce typical delays while
allowing expected delays to be longer by looping over a shorter delay and
a non-fatal check, i.e.

 pass = false;
 for (i = 0; i < 10; i++) {
   udelay(100);
   if (check(...)) {
      pass = true;
      break;
   }
 }
 assert(pass);

We could try that approach here too.

Thanks,
drew
Haibo Xu Dec. 5, 2023, 7:58 a.m. UTC | #6
On Mon, Dec 4, 2023 at 7:32 PM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Mon, Dec 04, 2023 at 10:42:24AM +0800, Haibo Xu wrote:
> > On Fri, Sep 15, 2023 at 2:21 PM Haibo Xu <xiaobo55x@gmail.com> wrote:
> > >
> > > On Thu, Sep 14, 2023 at 5:52 PM Andrew Jones <ajones@ventanamicro.com> wrote:
> > > >
> > > > On Thu, Sep 14, 2023 at 09:37:03AM +0800, Haibo Xu wrote:
> > > > > Add a KVM selftests to validate the Sstc timer functionality.
> > > > > The test was ported from arm64 arch timer test.
> > > >
> > > > I just tried this test out. Running it over and over again on QEMU I see
> > > > it works sometimes, but it frequently fails with the
> > > > GUEST_ASSERT_EQ(config_iter + 1, irq_iter) assert and at least once I
> > > > also saw the __GUEST_ASSERT(xcnt >= cmp) assert.
> > > >
> > >
> > > Good catch!
> > >
> > > I can also reproduce this issue and it is a common problem for both
> > > arm64 and riscv because it also happens in a arm64 Qemu VM.
> > >
> > > It seems like a synchronization issue between host and guest shared
> > > variables. Will double check the test code.
> > >
> > > > Thanks,
> > > > drew
> >
> > Hi Andrew,
> >
> > After several rounds of regression testing, some findings:
> > 1. The intermittent failure also happened on ARM64 Qemu VM, and even
> > in the initial arch_timer commit(4959d8650e9f4).
> > 2. it didn't happen on a ARM64 HW(but a different failure occured
> > during stress test)
> > 3. The failure have a close relationship with
> > TIMER_TEST_ERR_MARGIN_US(default 100), and after increasing
> >      the macro to 300, the failure couldn't reproduced in 1000 loops
> > stress test in RISC-V Qemu VM
> >
> > So my suggestion is we can expose the TIMER_TEST_ERR_MARGIN_US
> > parameter as an arch_timer test arg parameter
> > and tune it based on a specific test environment.
> >
> > What's your opinion?
>
> The concept of "timeout for an interrupt to arrive" is always going to
> leave us exposed to random failures. Your suggestion of making the
> timeout user configurable is probably the best we can do. I would
> suggest also adding more descriptive failure text and a hint about
> trying to adjust the timeout.
>
> Or, one thing we do in kvm-unit-tests, is to reduce typical delays while
> allowing expected delays to be longer by looping over a shorter delay and
> a non-fatal check, i.e.
>
>  pass = false;
>  for (i = 0; i < 10; i++) {
>    udelay(100);
>    if (check(...)) {
>       pass = true;
>       break;
>    }
>  }
>  assert(pass);
>
> We could try that approach here too.
>
> Thanks,
> drew

Thanks for the feedback, I will send out patch set v4 soon!