diff mbox

[v2] ARM: KVM: add irqfd and irq routing support

Message ID 1401694196-30861-1-git-send-email-eric.auger@linaro.org
State New
Headers show

Commit Message

Auger Eric June 2, 2014, 7:29 a.m. UTC
This patch enables irqfd and irq routing on ARM.

It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING

irqfd framework enables to assign physical IRQs to guests.

1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
injects the specified IRQ into the VM (the "GSI" takes the semantic of a
virtual IRQ for that guest).

2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
When someone triggers the eventfd, irqfd handles it but uses the specified
routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
context the GSI takes the semantic of a physical IRQ while the irqchip.pin
takes the semantic of a virtual IRQ.

in 1) routing is used by irqfd but an identity routing is created by default
making the gsi = irqchip.pin. Note on ARM there is a single interrupt
controller kind, the GIC.

GSI routing mostly is implemented in generic irqchip.c.
The tiny ARM specific part is directly implemented in the virtual interrupt
controller (vgic.c) as it is done for powerpc for instance. This option was
prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
Hence irq_comm.c is not used at all.

Routing currently is not used for anything else than irqfd IRQ injection. Only
SPI can be injected. This means the vgic is not totally hidden behind the
irqchip. There are separate discussions on PPI/SGI routing.

Only level sensitive IRQs are supported (with a registered resampler). As a
reminder the resampler is a second eventfd called by irqfd framework when the
virtual IRQ is completed by the guest. This eventfd is supposed to be handled
on user-side

MSI routing is not supported yet.

This work was tested with Calxeda Midway xgmac main interrupt (with and without
explicit user routing) with qemu-system-arm and QEMU VFIO platform device.

changes v1 -> v2:
2 fixes:
- v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
  This is now vgic_set_assigned_irq that increments it before injection.
- v2 now handles the case where a pending assigned irq is cleared through
  MMIO access. The irq is properly acked allowing the resamplefd handler
  to possibly unmask the physical IRQ.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

Conflicts:
	Documentation/virtual/kvm/api.txt
	arch/arm/kvm/Kconfig

Conflicts:
	Documentation/virtual/kvm/api.txt
---
 Documentation/virtual/kvm/api.txt |   4 +-
 arch/arm/include/uapi/asm/kvm.h   |   8 +++
 arch/arm/kvm/Kconfig              |   2 +
 arch/arm/kvm/Makefile             |   1 +
 arch/arm/kvm/irq.h                |  25 +++++++
 virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
 6 files changed, 174 insertions(+), 7 deletions(-)
 create mode 100644 arch/arm/kvm/irq.h

Comments

Marc Zyngier June 2, 2014, 1:54 p.m. UTC | #1
Hi Eric,

On Mon, Jun 02 2014 at  8:29:56 am BST, Eric Auger <eric.auger@linaro.org> wrote:
> This patch enables irqfd and irq routing on ARM.
>
> It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
>
> irqfd framework enables to assign physical IRQs to guests.
>
> 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
> associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
> signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
> injects the specified IRQ into the VM (the "GSI" takes the semantic of a
> virtual IRQ for that guest).

Just so I can understand how this works: Who EOIs (handles) the physical
interrupt? If it is the VFIO driver, then I don't see how you prevent
the interrupt from firing again immediately (unless this is an edge
interrupt?).

> 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
> VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
> a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
> When someone triggers the eventfd, irqfd handles it but uses the specified
> routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
> context the GSI takes the semantic of a physical IRQ while the irqchip.pin
> takes the semantic of a virtual IRQ.
>
> in 1) routing is used by irqfd but an identity routing is created by default
> making the gsi = irqchip.pin. Note on ARM there is a single interrupt
> controller kind, the GIC.
>
> GSI routing mostly is implemented in generic irqchip.c.
> The tiny ARM specific part is directly implemented in the virtual interrupt
> controller (vgic.c) as it is done for powerpc for instance. This option was
> prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
> Hence irq_comm.c is not used at all.
>
> Routing currently is not used for anything else than irqfd IRQ injection. Only
> SPI can be injected. This means the vgic is not totally hidden behind the
> irqchip. There are separate discussions on PPI/SGI routing.
>
> Only level sensitive IRQs are supported (with a registered resampler). As a
> reminder the resampler is a second eventfd called by irqfd framework when the
> virtual IRQ is completed by the guest. This eventfd is supposed to be handled
> on user-side
>
> MSI routing is not supported yet.
>
> This work was tested with Calxeda Midway xgmac main interrupt (with and without
> explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
>
> changes v1 -> v2:
> 2 fixes:
> - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
>   This is now vgic_set_assigned_irq that increments it before injection.
> - v2 now handles the case where a pending assigned irq is cleared through
>   MMIO access. The irq is properly acked allowing the resamplefd handler
>   to possibly unmask the physical IRQ.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> Conflicts:
>         Documentation/virtual/kvm/api.txt
>         arch/arm/kvm/Kconfig
>
> Conflicts:
>         Documentation/virtual/kvm/api.txt
> ---
>  Documentation/virtual/kvm/api.txt |   4 +-
>  arch/arm/include/uapi/asm/kvm.h   |   8 +++
>  arch/arm/kvm/Kconfig              |   2 +
>  arch/arm/kvm/Makefile             |   1 +
>  arch/arm/kvm/irq.h                |  25 +++++++
>  virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
>  6 files changed, 174 insertions(+), 7 deletions(-)
>  create mode 100644 arch/arm/kvm/irq.h
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index b4f5365..b376334 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>  4.52 KVM_SET_GSI_ROUTING
>
>  Capability: KVM_CAP_IRQ_ROUTING
> -Architectures: x86 ia64 s390
> +Architectures: x86 ia64 s390 arm
>  Type: vm ioctl
>  Parameters: struct kvm_irq_routing (in)
>  Returns: 0 on success, -1 on error
> @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
>  4.75 KVM_IRQFD
>
>  Capability: KVM_CAP_IRQFD
> -Architectures: x86 s390
> +Architectures: x86 s390 arm
>  Type: vm ioctl
>  Parameters: struct kvm_irqfd (in)
>  Returns: 0 on success, -1 on error
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index ef0c878..89b864d 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
>  /* Highest supported SPI, from VGIC_NR_IRQS */
>  #define KVM_ARM_IRQ_GIC_MAX            127
>
> +/* needed by IRQ routing */
> +
> +/* One single KVM irqchip, ie. the VGIC */
> +#define KVM_NR_IRQCHIPS          1
> +
> +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
> +#define KVM_IRQCHIP_NUM_PINS     256

Gahhhh... Please don't. We're trying hard to move away from hard-coded
definitions such as this, since GICv3 has much higher limits. And the
comment you've added perfectly outlines why this is such a bad idea
(even on GICv2, we can have up to 960 SPIs).

Have a look at what's brewing in my kvm-arm64/vgic-dyn branch.

>  /* PSCI interface */
>  #define KVM_PSCI_FN_BASE               0x95c1ba5e
>  #define KVM_PSCI_FN(n)                 (KVM_PSCI_FN_BASE + (n))
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 4be5bb1..096692c 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -24,6 +24,7 @@ config KVM
>         select KVM_MMIO
>         select KVM_ARM_HOST
>         depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
> +       select HAVE_KVM_EVENTFD
>         ---help---
>           Support hosting virtualized guest machines. You will also
>           need to select one or more of the processor modules below.
> @@ -56,6 +57,7 @@ config KVM_ARM_VGIC
>         bool "KVM support for Virtual GIC"
>         depends on KVM_ARM_HOST && OF
>         select HAVE_KVM_IRQCHIP
> +       select HAVE_KVM_IRQ_ROUTING
>         default y
>         ---help---
>           Adds support for a hardware assisted, in-kernel GIC emulation.
> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> index 789bca9..29de111 100644
> --- a/arch/arm/kvm/Makefile
> +++ b/arch/arm/kvm/Makefile
> @@ -21,4 +21,5 @@ obj-y += kvm-arm.o init.o interrupts.o
>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
>  obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
>  obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
> +obj-$(CONFIG_HAVE_KVM_EVENTFD) += $(KVM)/eventfd.o $(KVM)/irqchip.o
>  obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
> diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h
> new file mode 100644
> index 0000000..4d6fcc6
> --- /dev/null
> +++ b/arch/arm/kvm/irq.h
> @@ -0,0 +1,25 @@
> +/*
> + * Copyright (C) 2014, STMicroelectronics
> + * Authors: Eric Auger <eric.auger@st.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef __IRQ_H
> +#define __IRQ_H
> +
> +#include <linux/kvm_host.h>
> +/*
> + * Placeholder for irqchip and irq/msi routing declarations
> + * included in irqchip.c
> + */
> +
> +#endif
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 56ff9be..39afa0d 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -93,6 +93,9 @@ static struct device_node *vgic_node;
>  #define ACCESS_WRITE_VALUE     (3 << 1)
>  #define ACCESS_WRITE_MASK(x)   ((x) & (3 << 1))
>
> +static struct kvm_irq_routing_entry identity_table[VGIC_NR_IRQS];
> +static int set_default_routing_table(struct kvm *kvm);
> +
>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>  static void vgic_update_state(struct kvm *kvm);
>  static void vgic_kick_vcpus(struct kvm *kvm);
> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
>                                           struct kvm_exit_mmio *mmio,
>                                           phys_addr_t offset)
>  {
> -       u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
> +       struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +       unsigned int i;
> +       bool is_assigned_irq;
> +       DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
> +       DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
> +       unsigned long *pending =
> +               vgic_bitmap_get_shared_map(&dist->irq_state);
> +       u32 *reg;
> +       bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);

That's really heavy. You could find out which interrupts are potentially
affected (only 32 of them) and just handle those. Also, you do the copy
on both the read and write paths. Not great.

> +       reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>                                        vcpu->vcpu_id, offset);
>         vgic_reg_access(mmio, reg, offset,
>                         ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
>         if (mmio->is_write) {
> +               pending = vgic_bitmap_get_shared_map(&dist->irq_state);
> +               bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
> +               for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
> +                       is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
> +                       if (is_assigned_irq)
> +                               kvm_notify_acked_irq(vcpu->kvm, 0, i);

Are you saying that a masked interrupt should be treated the same as an
EOI-ed interrupt? That seems wrong from my PoV.

> +               }
>                 vgic_update_state(vcpu->kvm);
>                 return true;
>         }
> @@ -1172,6 +1191,8 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  {
>         struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>         bool level_pending = false;
> +       struct kvm *kvm;
> +       int is_assigned_irq;
>
>         kvm_debug("MISR = %08x\n", vgic_cpu->vgic_misr);
>
> @@ -1189,12 +1210,23 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>                         vgic_irq_clear_active(vcpu, irq);
>                         vgic_cpu->vgic_lr[lr] &= ~GICH_LR_EOI;
>
> +                       kvm = vcpu->kvm;
> +                       is_assigned_irq =
> +                               kvm_irq_has_notifier(kvm, 0, irq-VGIC_NR_PRIVATE_IRQS);
>                         /* Any additional pending interrupt? */
> -                       if (vgic_dist_irq_is_pending(vcpu, irq)) {
> -                               vgic_cpu_irq_set(vcpu, irq);
> -                               level_pending = true;
> -                       } else {
> +                       if (is_assigned_irq) {
>                                 vgic_cpu_irq_clear(vcpu, irq);
> +                               kvm_debug("EOI irqchip routed vIRQ %d\n", irq);
> +                               kvm_notify_acked_irq(kvm, 0,
> +                                       irq-VGIC_NR_PRIVATE_IRQS);
> +                               vgic_dist_irq_clear(vcpu, irq);
> +                       } else {
> +                               if (vgic_dist_irq_is_pending(vcpu, irq)) {
> +                                       vgic_cpu_irq_set(vcpu, irq);
> +                                       level_pending = true;
> +                               } else {
> +                                       vgic_cpu_irq_clear(vcpu, irq);
> +                               }
>                         }
>
>                         /*
> @@ -1627,6 +1659,8 @@ int kvm_vgic_create(struct kvm *kvm)
>         kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>         kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
>
> +       set_default_routing_table(kvm);
> +
>  out_unlock:
>         for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
>                 vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
> @@ -2017,3 +2051,100 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = {
>         .get_attr = vgic_get_attr,
>         .has_attr = vgic_has_attr,
>  };
> +
> +
> +/*
> + * set up a default identity routing table
> + * The user-side can further change the routing table using
> + * KVM_SET_GSI_ROUTING VM ioctl
> + */
> +
> +static int set_default_routing_table(struct kvm *kvm)
> +{
> +       struct kvm_irq_routing_entry;
> +       int i;
> +       for (i = 0; i < VGIC_NR_IRQS; i++)      {
> +               identity_table[i].gsi = i;
> +               identity_table[i].type = KVM_IRQ_ROUTING_IRQCHIP;
> +               identity_table[i].u.irqchip.irqchip = 0;
> +               identity_table[i].u.irqchip.pin = i;
> +       }
> +       return kvm_set_irq_routing(kvm, identity_table,
> +                                  ARRAY_SIZE(identity_table), 0);
> +}
> +
> +
> +/*
> + * Functions needed for GSI routing (used by irqchip.c)
> + * implemented in irq_comm.c for x86 and ia64
> + * in architecture specific files for some other archictures (powerpc)
> + */
> +
> +static int vgic_set_assigned_irq(struct kvm_kernel_irq_routing_entry *e,
> +                          struct kvm *kvm, int irq_source_id, int level,
> +                          bool line_status)
> +{
> +       unsigned int spi = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
> +
> +       if (irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID) {
> +               /*
> +                * This path is not tested yet,
> +                * only irqchip with resampler was exercised
> +                */
> +               kvm_vgic_inject_irq(kvm, 0, spi, level);
> +       } else if (irq_source_id == KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID) {
> +               if (level == 1) {
> +                       kvm_debug("Inject irqchip routed vIRQ %d\n",
> +                                       e->irqchip.pin);
> +                       kvm_vgic_inject_irq(kvm, 0, spi, level);
> +                       /*
> +                        * toggling down vIRQ wire is directly handled in
> +                        * process_maintenance for this reason:
> +                        * irqfd_resampler_ack is called in
> +                        * process_maintenance which holds the dist lock.
> +                        * irqfd_resampler_ack calls kvm_set_irq
> +                        * which ends_up calling kvm_vgic_inject_irq.
> +                        * This later attempts to take the lock -> deadlock!
> +                        */
> +               }
> +       }
> +       return 0;
> +
> +}
> +
> +/* void implementation requested to compile irqchip.c */
> +
> +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
> +               struct kvm *kvm, int irq_source_id, int level, bool line_status)
> +{
> +       return 0;
> +}
> +
> +int kvm_set_routing_entry(struct kvm_irq_routing_table *rt,
> +                         struct kvm_kernel_irq_routing_entry *e,
> +                         const struct kvm_irq_routing_entry *ue)
> +{
> +       int r = -EINVAL;
> +
> +       switch (ue->type) {
> +       case KVM_IRQ_ROUTING_IRQCHIP:
> +               e->set = vgic_set_assigned_irq;
> +               e->irqchip.irqchip = ue->u.irqchip.irqchip;
> +               e->irqchip.pin = ue->u.irqchip.pin;
> +               if (e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS)
> +                       goto out;
> +               /* chip[0][virtualID] = physicalID  */
> +               rt->chip[ue->u.irqchip.irqchip][e->irqchip.pin] = ue->gsi;
> +               break;
> +       default:
> +               goto out;
> +       }
> +
> +       r = 0;
> +out:
> +       return r;
> +}

I clearly need to do some more reading about all of this, but some of
the questions/remarks I've outlined above need anwers.

Also, how does this combine with the work the VOS guys are doing?

Thanks,

	M.
Auger Eric June 2, 2014, 2:42 p.m. UTC | #2
On 06/02/2014 03:54 PM, Marc Zyngier wrote:
> Hi Eric,
> 
> On Mon, Jun 02 2014 at  8:29:56 am BST, Eric Auger <eric.auger@linaro.org> wrote:
>> This patch enables irqfd and irq routing on ARM.
>>
>> It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
>>
>> irqfd framework enables to assign physical IRQs to guests.
>>
>> 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
>> associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
>> signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
>> injects the specified IRQ into the VM (the "GSI" takes the semantic of a
>> virtual IRQ for that guest).
> 
Hi Marc,

First thanks for your review.

> Just so I can understand how this works: Who EOIs (handles) the physical
> interrupt? If it is the VFIO driver, then I don't see how you prevent
> the interrupt from firing again immediately (unless this is an edge
> interrupt?).

Yes the physical IRQ is handled by the VFIO platform driver. This later
masks the IRQ in the ISR before signaling the eventfd. The IRQ currently
is unmasked when the virtual IRQ is completed, from user side, using the
VFIO user API.
> 
>> 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
>> VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
>> a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
>> When someone triggers the eventfd, irqfd handles it but uses the specified
>> routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
>> context the GSI takes the semantic of a physical IRQ while the irqchip.pin
>> takes the semantic of a virtual IRQ.
>>
>> in 1) routing is used by irqfd but an identity routing is created by default
>> making the gsi = irqchip.pin. Note on ARM there is a single interrupt
>> controller kind, the GIC.
>>
>> GSI routing mostly is implemented in generic irqchip.c.
>> The tiny ARM specific part is directly implemented in the virtual interrupt
>> controller (vgic.c) as it is done for powerpc for instance. This option was
>> prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
>> Hence irq_comm.c is not used at all.
>>
>> Routing currently is not used for anything else than irqfd IRQ injection. Only
>> SPI can be injected. This means the vgic is not totally hidden behind the
>> irqchip. There are separate discussions on PPI/SGI routing.
>>
>> Only level sensitive IRQs are supported (with a registered resampler). As a
>> reminder the resampler is a second eventfd called by irqfd framework when the
>> virtual IRQ is completed by the guest. This eventfd is supposed to be handled
>> on user-side
>>
>> MSI routing is not supported yet.
>>
>> This work was tested with Calxeda Midway xgmac main interrupt (with and without
>> explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
>>
>> changes v1 -> v2:
>> 2 fixes:
>> - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
>>   This is now vgic_set_assigned_irq that increments it before injection.
>> - v2 now handles the case where a pending assigned irq is cleared through
>>   MMIO access. The irq is properly acked allowing the resamplefd handler
>>   to possibly unmask the physical IRQ.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> Conflicts:
>>         Documentation/virtual/kvm/api.txt
>>         arch/arm/kvm/Kconfig
>>
>> Conflicts:
>>         Documentation/virtual/kvm/api.txt
>> ---
>>  Documentation/virtual/kvm/api.txt |   4 +-
>>  arch/arm/include/uapi/asm/kvm.h   |   8 +++
>>  arch/arm/kvm/Kconfig              |   2 +
>>  arch/arm/kvm/Makefile             |   1 +
>>  arch/arm/kvm/irq.h                |  25 +++++++
>>  virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
>>  6 files changed, 174 insertions(+), 7 deletions(-)
>>  create mode 100644 arch/arm/kvm/irq.h
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index b4f5365..b376334 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>>  4.52 KVM_SET_GSI_ROUTING
>>
>>  Capability: KVM_CAP_IRQ_ROUTING
>> -Architectures: x86 ia64 s390
>> +Architectures: x86 ia64 s390 arm
>>  Type: vm ioctl
>>  Parameters: struct kvm_irq_routing (in)
>>  Returns: 0 on success, -1 on error
>> @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
>>  4.75 KVM_IRQFD
>>
>>  Capability: KVM_CAP_IRQFD
>> -Architectures: x86 s390
>> +Architectures: x86 s390 arm
>>  Type: vm ioctl
>>  Parameters: struct kvm_irqfd (in)
>>  Returns: 0 on success, -1 on error
>> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
>> index ef0c878..89b864d 100644
>> --- a/arch/arm/include/uapi/asm/kvm.h
>> +++ b/arch/arm/include/uapi/asm/kvm.h
>> @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
>>  /* Highest supported SPI, from VGIC_NR_IRQS */
>>  #define KVM_ARM_IRQ_GIC_MAX            127
>>
>> +/* needed by IRQ routing */
>> +
>> +/* One single KVM irqchip, ie. the VGIC */
>> +#define KVM_NR_IRQCHIPS          1
>> +
>> +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
>> +#define KVM_IRQCHIP_NUM_PINS     256
> 
> Gahhhh... Please don't. We're trying hard to move away from hard-coded
> definitions such as this, since GICv3 has much higher limits. And the
> comment you've added perfectly outlines why this is such a bad idea
> (even on GICv2, we can have up to 960 SPIs).
> 
> Have a look at what's brewing in my kvm-arm64/vgic-dyn branch.

OK I will do ;-)
> 
>>  /* PSCI interface */
>>  #define KVM_PSCI_FN_BASE               0x95c1ba5e
>>  #define KVM_PSCI_FN(n)                 (KVM_PSCI_FN_BASE + (n))
>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>> index 4be5bb1..096692c 100644
>> --- a/arch/arm/kvm/Kconfig
>> +++ b/arch/arm/kvm/Kconfig
>> @@ -24,6 +24,7 @@ config KVM
>>         select KVM_MMIO
>>         select KVM_ARM_HOST
>>         depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
>> +       select HAVE_KVM_EVENTFD
>>         ---help---
>>           Support hosting virtualized guest machines. You will also
>>           need to select one or more of the processor modules below.
>> @@ -56,6 +57,7 @@ config KVM_ARM_VGIC
>>         bool "KVM support for Virtual GIC"
>>         depends on KVM_ARM_HOST && OF
>>         select HAVE_KVM_IRQCHIP
>> +       select HAVE_KVM_IRQ_ROUTING
>>         default y
>>         ---help---
>>           Adds support for a hardware assisted, in-kernel GIC emulation.
>> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
>> index 789bca9..29de111 100644
>> --- a/arch/arm/kvm/Makefile
>> +++ b/arch/arm/kvm/Makefile
>> @@ -21,4 +21,5 @@ obj-y += kvm-arm.o init.o interrupts.o
>>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
>>  obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
>>  obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
>> +obj-$(CONFIG_HAVE_KVM_EVENTFD) += $(KVM)/eventfd.o $(KVM)/irqchip.o
>>  obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
>> diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h
>> new file mode 100644
>> index 0000000..4d6fcc6
>> --- /dev/null
>> +++ b/arch/arm/kvm/irq.h
>> @@ -0,0 +1,25 @@
>> +/*
>> + * Copyright (C) 2014, STMicroelectronics
>> + * Authors: Eric Auger <eric.auger@st.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + */
>> +
>> +#ifndef __IRQ_H
>> +#define __IRQ_H
>> +
>> +#include <linux/kvm_host.h>
>> +/*
>> + * Placeholder for irqchip and irq/msi routing declarations
>> + * included in irqchip.c
>> + */
>> +
>> +#endif
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 56ff9be..39afa0d 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -93,6 +93,9 @@ static struct device_node *vgic_node;
>>  #define ACCESS_WRITE_VALUE     (3 << 1)
>>  #define ACCESS_WRITE_MASK(x)   ((x) & (3 << 1))
>>
>> +static struct kvm_irq_routing_entry identity_table[VGIC_NR_IRQS];
>> +static int set_default_routing_table(struct kvm *kvm);
>> +
>>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>  static void vgic_update_state(struct kvm *kvm);
>>  static void vgic_kick_vcpus(struct kvm *kvm);
>> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
>>                                           struct kvm_exit_mmio *mmio,
>>                                           phys_addr_t offset)
>>  {
>> -       u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>> +       struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +       unsigned int i;
>> +       bool is_assigned_irq;
>> +       DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
>> +       DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
>> +       unsigned long *pending =
>> +               vgic_bitmap_get_shared_map(&dist->irq_state);
>> +       u32 *reg;
>> +       bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
> 
> That's really heavy. You could find out which interrupts are potentially
> affected (only 32 of them) and just handle those. Also, you do the copy
> on both the read and write paths. Not great.
for the copy you are fully right. I will add the check. Then to detect
which pending IRQ is cleared I need to further study how I can optimize.
Why do you say 32?  Can't any SPI be assigned to a guest?
> 
>> +       reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>>                                        vcpu->vcpu_id, offset);
>>         vgic_reg_access(mmio, reg, offset,
>>                         ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
>>         if (mmio->is_write) {
>> +               pending = vgic_bitmap_get_shared_map(&dist->irq_state);
>> +               bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
>> +               for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
>> +                       is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
>> +                       if (is_assigned_irq)
>> +                               kvm_notify_acked_irq(vcpu->kvm, 0, i);
> 
> Are you saying that a masked interrupt should be treated the same as an
> EOI-ed interrupt? That seems wrong from my PoV.
Actually all that stuff comes from a bug I encountered with
qemu_system_arm and the VFIO platform QEMU device (RFCv3 sent today).
The scenario is the following:
1) I launch a 1st qemu_system_arm session with one xgmac bound to the
KVM guest with vfio. IRQs are routed through irqfd.
2) I kill that session and launch a 2d one.

After the 1st session kill, xgmac ic still running (funny principle of
VFIO "meta" driver which is HW device agnostic and do not know how to
reset the xgmac). So very early I can see the xgmac sends a main IRQ
which is handled by the vfio platform driver. This later masks the IRQ
before signaling the eventfd. During guest setup I observe MMIO accesses
that clears the xgmac pending IRQ under the hood. So for that IRQ the
maintenance IRQ code will never be called, the notifier will not be be
acked and thus the IRQ is never unmasked at VFIO driver level. As a
result the xgmac driver gets stuck.

So currently this is the best fix I have found. VFIO Reset management
need to be further studied anyway. THis is planned but I understand it
will not straightforward.
> 
>> +               }
>>                 vgic_update_state(vcpu->kvm);
>>                 return true;
>>         }
>> @@ -1172,6 +1191,8 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>  {
>>         struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>         bool level_pending = false;
>> +       struct kvm *kvm;
>> +       int is_assigned_irq;
>>
>>         kvm_debug("MISR = %08x\n", vgic_cpu->vgic_misr);
>>
>> @@ -1189,12 +1210,23 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>                         vgic_irq_clear_active(vcpu, irq);
>>                         vgic_cpu->vgic_lr[lr] &= ~GICH_LR_EOI;
>>
>> +                       kvm = vcpu->kvm;
>> +                       is_assigned_irq =
>> +                               kvm_irq_has_notifier(kvm, 0, irq-VGIC_NR_PRIVATE_IRQS);
>>                         /* Any additional pending interrupt? */
>> -                       if (vgic_dist_irq_is_pending(vcpu, irq)) {
>> -                               vgic_cpu_irq_set(vcpu, irq);
>> -                               level_pending = true;
>> -                       } else {
>> +                       if (is_assigned_irq) {
>>                                 vgic_cpu_irq_clear(vcpu, irq);
>> +                               kvm_debug("EOI irqchip routed vIRQ %d\n", irq);
>> +                               kvm_notify_acked_irq(kvm, 0,
>> +                                       irq-VGIC_NR_PRIVATE_IRQS);
>> +                               vgic_dist_irq_clear(vcpu, irq);
>> +                       } else {
>> +                               if (vgic_dist_irq_is_pending(vcpu, irq)) {
>> +                                       vgic_cpu_irq_set(vcpu, irq);
>> +                                       level_pending = true;
>> +                               } else {
>> +                                       vgic_cpu_irq_clear(vcpu, irq);
>> +                               }
>>                         }
>>
>>                         /*
>> @@ -1627,6 +1659,8 @@ int kvm_vgic_create(struct kvm *kvm)
>>         kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>>         kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
>>
>> +       set_default_routing_table(kvm);
>> +
>>  out_unlock:
>>         for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
>>                 vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
>> @@ -2017,3 +2051,100 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = {
>>         .get_attr = vgic_get_attr,
>>         .has_attr = vgic_has_attr,
>>  };
>> +
>> +
>> +/*
>> + * set up a default identity routing table
>> + * The user-side can further change the routing table using
>> + * KVM_SET_GSI_ROUTING VM ioctl
>> + */
>> +
>> +static int set_default_routing_table(struct kvm *kvm)
>> +{
>> +       struct kvm_irq_routing_entry;
>> +       int i;
>> +       for (i = 0; i < VGIC_NR_IRQS; i++)      {
>> +               identity_table[i].gsi = i;
>> +               identity_table[i].type = KVM_IRQ_ROUTING_IRQCHIP;
>> +               identity_table[i].u.irqchip.irqchip = 0;
>> +               identity_table[i].u.irqchip.pin = i;
>> +       }
>> +       return kvm_set_irq_routing(kvm, identity_table,
>> +                                  ARRAY_SIZE(identity_table), 0);
>> +}
>> +
>> +
>> +/*
>> + * Functions needed for GSI routing (used by irqchip.c)
>> + * implemented in irq_comm.c for x86 and ia64
>> + * in architecture specific files for some other archictures (powerpc)
>> + */
>> +
>> +static int vgic_set_assigned_irq(struct kvm_kernel_irq_routing_entry *e,
>> +                          struct kvm *kvm, int irq_source_id, int level,
>> +                          bool line_status)
>> +{
>> +       unsigned int spi = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
>> +
>> +       if (irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID) {
>> +               /*
>> +                * This path is not tested yet,
>> +                * only irqchip with resampler was exercised
>> +                */
>> +               kvm_vgic_inject_irq(kvm, 0, spi, level);
>> +       } else if (irq_source_id == KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID) {
>> +               if (level == 1) {
>> +                       kvm_debug("Inject irqchip routed vIRQ %d\n",
>> +                                       e->irqchip.pin);
>> +                       kvm_vgic_inject_irq(kvm, 0, spi, level);
>> +                       /*
>> +                        * toggling down vIRQ wire is directly handled in
>> +                        * process_maintenance for this reason:
>> +                        * irqfd_resampler_ack is called in
>> +                        * process_maintenance which holds the dist lock.
>> +                        * irqfd_resampler_ack calls kvm_set_irq
>> +                        * which ends_up calling kvm_vgic_inject_irq.
>> +                        * This later attempts to take the lock -> deadlock!
>> +                        */
>> +               }
>> +       }
>> +       return 0;
>> +
>> +}
>> +
>> +/* void implementation requested to compile irqchip.c */
>> +
>> +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>> +               struct kvm *kvm, int irq_source_id, int level, bool line_status)
>> +{
>> +       return 0;
>> +}
>> +
>> +int kvm_set_routing_entry(struct kvm_irq_routing_table *rt,
>> +                         struct kvm_kernel_irq_routing_entry *e,
>> +                         const struct kvm_irq_routing_entry *ue)
>> +{
>> +       int r = -EINVAL;
>> +
>> +       switch (ue->type) {
>> +       case KVM_IRQ_ROUTING_IRQCHIP:
>> +               e->set = vgic_set_assigned_irq;
>> +               e->irqchip.irqchip = ue->u.irqchip.irqchip;
>> +               e->irqchip.pin = ue->u.irqchip.pin;
>> +               if (e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS)
>> +                       goto out;
>> +               /* chip[0][virtualID] = physicalID  */
>> +               rt->chip[ue->u.irqchip.irqchip][e->irqchip.pin] = ue->gsi;
>> +               break;
>> +       default:
>> +               goto out;
>> +       }
>> +
>> +       r = 0;
>> +out:
>> +       return r;
>> +}
> 
> I clearly need to do some more reading about all of this, but some of
> the questions/remarks I've outlined above need anwers.

hope I answered them.
> 
> Also, how does this combine with the work the VOS guys are doing?

This patch may address a subset of VOSYS original targets (ie. SPI
routing through irqfd). Besides VFIO platform driver I understood
Antonios original plan was to put the whole VGIC behind irqchip which
was not my intent here. But I will let him update us wrt their plans and
progress.

The patch intent is to achieve significant performance improvements
compared to traditional QEMU user-side IRQ routing. irqfd/GSI routing
framework achieves this by
- handling the eventfd on kernel-side instead of on user-side
- virtal EOI trapping at GIC level instead of trapping the MMIO access
into the IRQ status register.

Next step is to remove maintenance IRQ for assigned IRQs, thus removing
completion trapping... sure we will need to rediscuss that together ;-)

Best Regards

Eric

> 
> Thanks,
> 
> 	M.
>
Christoffer Dall June 5, 2014, 10:28 a.m. UTC | #3
On Mon, Jun 02, 2014 at 09:29:56AM +0200, Eric Auger wrote:
> This patch enables irqfd and irq routing on ARM.
> 
> It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
> 
> irqfd framework enables to assign physical IRQs to guests.
> 
> 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
> associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
> signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
> injects the specified IRQ into the VM (the "GSI" takes the semantic of a
> virtual IRQ for that guest).
> 
> 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
> VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
> a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
> When someone triggers the eventfd, irqfd handles it but uses the specified
> routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
> context the GSI takes the semantic of a physical IRQ while the irqchip.pin
> takes the semantic of a virtual IRQ.
> 
> in 1) routing is used by irqfd but an identity routing is created by default
> making the gsi = irqchip.pin. Note on ARM there is a single interrupt
> controller kind, the GIC.
> 
> GSI routing mostly is implemented in generic irqchip.c.
> The tiny ARM specific part is directly implemented in the virtual interrupt
> controller (vgic.c) as it is done for powerpc for instance. This option was
> prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
> Hence irq_comm.c is not used at all.
> 
> Routing currently is not used for anything else than irqfd IRQ injection. Only
> SPI can be injected. This means the vgic is not totally hidden behind the
> irqchip. There are separate discussions on PPI/SGI routing.

What do you mean here?  Is there an ongoing discussion on the mailing
list somewhere?

> 
> Only level sensitive IRQs are supported (with a registered resampler). As a

Is it not trivial to add edge-triggered support in the same go?

> reminder the resampler is a second eventfd called by irqfd framework when the
> virtual IRQ is completed by the guest. This eventfd is supposed to be handled
> on user-side
> 
> MSI routing is not supported yet.
> 
> This work was tested with Calxeda Midway xgmac main interrupt (with and without
> explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
> 
> changes v1 -> v2:
> 2 fixes:
> - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
>   This is now vgic_set_assigned_irq that increments it before injection.
> - v2 now handles the case where a pending assigned irq is cleared through
>   MMIO access. The irq is properly acked allowing the resamplefd handler
>   to possibly unmask the physical IRQ.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> Conflicts:
> 	Documentation/virtual/kvm/api.txt
> 	arch/arm/kvm/Kconfig
> 
> Conflicts:
> 	Documentation/virtual/kvm/api.txt

We usually don't include these conflict notices when sending out
patches.

> ---
>  Documentation/virtual/kvm/api.txt |   4 +-
>  arch/arm/include/uapi/asm/kvm.h   |   8 +++
>  arch/arm/kvm/Kconfig              |   2 +
>  arch/arm/kvm/Makefile             |   1 +
>  arch/arm/kvm/irq.h                |  25 +++++++
>  virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
>  6 files changed, 174 insertions(+), 7 deletions(-)
>  create mode 100644 arch/arm/kvm/irq.h
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index b4f5365..b376334 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>  4.52 KVM_SET_GSI_ROUTING
>  
>  Capability: KVM_CAP_IRQ_ROUTING
> -Architectures: x86 ia64 s390
> +Architectures: x86 ia64 s390 arm
>  Type: vm ioctl
>  Parameters: struct kvm_irq_routing (in)
>  Returns: 0 on success, -1 on error
> @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
>  4.75 KVM_IRQFD
>  
>  Capability: KVM_CAP_IRQFD
> -Architectures: x86 s390
> +Architectures: x86 s390 arm
>  Type: vm ioctl
>  Parameters: struct kvm_irqfd (in)
>  Returns: 0 on success, -1 on error
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index ef0c878..89b864d 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
>  /* Highest supported SPI, from VGIC_NR_IRQS */
>  #define KVM_ARM_IRQ_GIC_MAX		127
>  
> +/* needed by IRQ routing */
> +
> +/* One single KVM irqchip, ie. the VGIC */
> +#define KVM_NR_IRQCHIPS          1
> +
> +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
> +#define KVM_IRQCHIP_NUM_PINS     256

I don't even see how the comment correlates to the define.  Hmmm.  But
since Marc asked you to change this anyhow, maybe this doesn't matter
now.

> +
>  /* PSCI interface */
>  #define KVM_PSCI_FN_BASE		0x95c1ba5e
>  #define KVM_PSCI_FN(n)			(KVM_PSCI_FN_BASE + (n))
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 4be5bb1..096692c 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -24,6 +24,7 @@ config KVM
>  	select KVM_MMIO
>  	select KVM_ARM_HOST
>  	depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
> +	select HAVE_KVM_EVENTFD
>  	---help---
>  	  Support hosting virtualized guest machines. You will also
>  	  need to select one or more of the processor modules below.
> @@ -56,6 +57,7 @@ config KVM_ARM_VGIC
>  	bool "KVM support for Virtual GIC"
>  	depends on KVM_ARM_HOST && OF
>  	select HAVE_KVM_IRQCHIP
> +	select HAVE_KVM_IRQ_ROUTING
>  	default y
>  	---help---
>  	  Adds support for a hardware assisted, in-kernel GIC emulation.
> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> index 789bca9..29de111 100644
> --- a/arch/arm/kvm/Makefile
> +++ b/arch/arm/kvm/Makefile
> @@ -21,4 +21,5 @@ obj-y += kvm-arm.o init.o interrupts.o
>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
>  obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
>  obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
> +obj-$(CONFIG_HAVE_KVM_EVENTFD) += $(KVM)/eventfd.o $(KVM)/irqchip.o
>  obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
> diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h
> new file mode 100644
> index 0000000..4d6fcc6
> --- /dev/null
> +++ b/arch/arm/kvm/irq.h
> @@ -0,0 +1,25 @@
> +/*
> + * Copyright (C) 2014, STMicroelectronics
> + * Authors: Eric Auger <eric.auger@st.com>

please use the Linaro copyright notice for this.  You can add your
@st.com e-mail in addition to the Linaro one in case you want that for
long-term support.

> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef __IRQ_H
> +#define __IRQ_H
> +
> +#include <linux/kvm_host.h>
> +/*
> + * Placeholder for irqchip and irq/msi routing declarations
> + * included in irqchip.c
> + */

But none needed now?

> +
> +#endif
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 56ff9be..39afa0d 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -93,6 +93,9 @@ static struct device_node *vgic_node;
>  #define ACCESS_WRITE_VALUE	(3 << 1)
>  #define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
>  
> +static struct kvm_irq_routing_entry identity_table[VGIC_NR_IRQS];
> +static int set_default_routing_table(struct kvm *kvm);
> +
>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>  static void vgic_update_state(struct kvm *kvm);
>  static void vgic_kick_vcpus(struct kvm *kvm);
> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
>  					  struct kvm_exit_mmio *mmio,
>  					  phys_addr_t offset)
>  {
> -	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +	unsigned int i;
> +	bool is_assigned_irq;
> +	DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
> +	DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
> +	unsigned long *pending =
> +		vgic_bitmap_get_shared_map(&dist->irq_state);
> +	u32 *reg;

please add a blank line between your declarations and the actual code.

> +	bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
> +	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>  				       vcpu->vcpu_id, offset);
>  	vgic_reg_access(mmio, reg, offset,
>  			ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
>  	if (mmio->is_write) {
> +		pending = vgic_bitmap_get_shared_map(&dist->irq_state);
> +		bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
> +		for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
> +			is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
> +			if (is_assigned_irq)
> +				kvm_notify_acked_irq(vcpu->kvm, 0, i);
> +		}

As Mark pointed out, just copy the vgic reg value and do a simple xor on
the two instead, and factor out the two lines that check for
is_assigned_irq and calles notify_acked so that you can give a small
static function a semantically meaningful name and call that from both
this function and the process_maintenance function.

That being said, this whole thing feels a bit weird, I'll comment on the
other thread.

>  		vgic_update_state(vcpu->kvm);
>  		return true;
>  	}
> @@ -1172,6 +1191,8 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>  	bool level_pending = false;
> +	struct kvm *kvm;
> +	int is_assigned_irq;
>  
>  	kvm_debug("MISR = %08x\n", vgic_cpu->vgic_misr);
>  
> @@ -1189,12 +1210,23 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  			vgic_irq_clear_active(vcpu, irq);
>  			vgic_cpu->vgic_lr[lr] &= ~GICH_LR_EOI;
>  
> +			kvm = vcpu->kvm;
> +			is_assigned_irq =
> +				kvm_irq_has_notifier(kvm, 0, irq-VGIC_NR_PRIVATE_IRQS);

I think the preferred style is to keep the function call name on the
same line as the assignment, and the do a line break on the parameter
that doesn't fit in the line width and align that to the opening
parenthesis on the funciton call.  At least I prefer it that way.

also spaces around the '-', please.

>  			/* Any additional pending interrupt? */

This comment seems to only aplly for non-assigned IRQs now, right?

> -			if (vgic_dist_irq_is_pending(vcpu, irq)) {
> -				vgic_cpu_irq_set(vcpu, irq);
> -				level_pending = true;
> -			} else {
> +			if (is_assigned_irq) {
>  				vgic_cpu_irq_clear(vcpu, irq);
> +				kvm_debug("EOI irqchip routed vIRQ %d\n", irq);
> +				kvm_notify_acked_irq(kvm, 0,
> +					irq-VGIC_NR_PRIVATE_IRQS);

spaces around the '-', please.

> +				vgic_dist_irq_clear(vcpu, irq);
> +			} else {
> +				if (vgic_dist_irq_is_pending(vcpu, irq)) {
> +					vgic_cpu_irq_set(vcpu, irq);
> +					level_pending = true;
> +				} else {
> +					vgic_cpu_irq_clear(vcpu, irq);
> +				}
>  			}
>  
>  			/*
> @@ -1627,6 +1659,8 @@ int kvm_vgic_create(struct kvm *kvm)
>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>  	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
>  
> +	set_default_routing_table(kvm);
> +
>  out_unlock:
>  	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
>  		vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
> @@ -2017,3 +2051,100 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = {
>  	.get_attr = vgic_get_attr,
>  	.has_attr = vgic_has_attr,
>  };
> +
> +
> +/*
> + * set up a default identity routing table
> + * The user-side can further change the routing table using
> + * KVM_SET_GSI_ROUTING VM ioctl
> + */
> +

don't add this whitespace between comments and a function declaration,
please check the entire patch for this.

> +static int set_default_routing_table(struct kvm *kvm)
> +{
> +	struct kvm_irq_routing_entry;
> +	int i;
> +	for (i = 0; i < VGIC_NR_IRQS; i++)	{
> +		identity_table[i].gsi = i;
> +		identity_table[i].type = KVM_IRQ_ROUTING_IRQCHIP;
> +		identity_table[i].u.irqchip.irqchip = 0;
> +		identity_table[i].u.irqchip.pin = i;
> +	}
> +	return kvm_set_irq_routing(kvm, identity_table,
> +				   ARRAY_SIZE(identity_table), 0);

is identity table used after this stage?  If not, could you not
dynamically allocate it and free it after use so we don't waste this
staic allocation of memory in the kernel?

> +}
> +
> +
> +/*
> + * Functions needed for GSI routing (used by irqchip.c)
> + * implemented in irq_comm.c for x86 and ia64
> + * in architecture specific files for some other archictures (powerpc)
> + */

Hmmm, this comment seems rather pointless in this file.  If you want to
describe what this function does, then just document this functionality
and the parameters/return value, i.e.

vgic_set_assigned_irq - set state of IRQs driven by irqfd

When an IRQ is raised or lowered.... blah blah blah blah.

@e:   the routing entry describing...
@kvm: the kvm struct
and so on.

return 0 on success, -error on errors.

> +
> +static int vgic_set_assigned_irq(struct kvm_kernel_irq_routing_entry *e,
> +			   struct kvm *kvm, int irq_source_id, int level,
> +			   bool line_status)
> +{
> +	unsigned int spi = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;

I noticed this in the changelogs too, where's the rationale/API
documentaiton for the use of irqchip.pin and its semantics on ARM?

Should we add something in Documentation/virtual/kvm/api.txt ?

> +
> +	if (irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID) {
> +		/*
> +		 * This path is not tested yet,
> +		 * only irqchip with resampler was exercised
> +		 */
> +		kvm_vgic_inject_irq(kvm, 0, spi, level);

hmmm, stuff like this definitely makes this patch an RFC.

> +	} else if (irq_source_id == KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID) {
> +		if (level == 1) {

This seems very wrong.  What if an external device raises a
level-triggered IRQ, but then lowers it again, without the guest was
ever running, now you have to wait until the guest sees the (now
inactive) interrupt and EOIs it before the interrupt is lowered on the
vgic?

> +			kvm_debug("Inject irqchip routed vIRQ %d\n",
> +					e->irqchip.pin);
> +			kvm_vgic_inject_irq(kvm, 0, spi, level);
> +			/*
> +			 * toggling down vIRQ wire is directly handled in
> +			 * process_maintenance for this reason:
> +			 * irqfd_resampler_ack is called in
> +			 * process_maintenance which holds the dist lock.
> +			 * irqfd_resampler_ack calls kvm_set_irq
> +			 * which ends_up calling kvm_vgic_inject_irq.
> +			 * This later attempts to take the lock -> deadlock!
> +			 */

Not sure I understand this comment.  What are we trying to achieve, are
we using some sort of a workaround to avoid a deadlock?

> +		}
> +	}
> +	return 0;
> +
> +}
> +
> +/* void implementation requested to compile irqchip.c */
> +

hmm, "TODO: MSI routing not yet implemented" seems more appropriate.

> +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
> +		struct kvm *kvm, int irq_source_id, int level, bool line_status)
> +{
> +	return 0;
> +}
> +
> +int kvm_set_routing_entry(struct kvm_irq_routing_table *rt,
> +			  struct kvm_kernel_irq_routing_entry *e,
> +			  const struct kvm_irq_routing_entry *ue)
> +{
> +	int r = -EINVAL;
> +
> +	switch (ue->type) {
> +	case KVM_IRQ_ROUTING_IRQCHIP:
> +		e->set = vgic_set_assigned_irq;
> +		e->irqchip.irqchip = ue->u.irqchip.irqchip;
> +		e->irqchip.pin = ue->u.irqchip.pin;
> +		if (e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS)
> +			goto out;
> +		/* chip[0][virtualID] = physicalID  */
> +		rt->chip[ue->u.irqchip.irqchip][e->irqchip.pin] = ue->gsi;

did we verify ue->u.irqchip.irqchip anywhere before reaching this code?

> +		break;
> +	default:
> +		goto out;
> +	}
> +
> +	r = 0;
> +out:
> +	return r;
> +}
> +
> +
> +
> +

That's a lot of unnecessary white space.

> -- 
> 1.9.1
> 

Thanks,
-Christoffer
Christoffer Dall June 5, 2014, 10:37 a.m. UTC | #4
On Mon, Jun 02, 2014 at 04:42:36PM +0200, Eric Auger wrote:
> On 06/02/2014 03:54 PM, Marc Zyngier wrote:
> > Hi Eric,
> > 
> > On Mon, Jun 02 2014 at  8:29:56 am BST, Eric Auger <eric.auger@linaro.org> wrote:

[...]

> >> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
> >>                                           struct kvm_exit_mmio *mmio,
> >>                                           phys_addr_t offset)
> >>  {
> >> -       u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
> >> +       struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> >> +       unsigned int i;
> >> +       bool is_assigned_irq;
> >> +       DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
> >> +       DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
> >> +       unsigned long *pending =
> >> +               vgic_bitmap_get_shared_map(&dist->irq_state);
> >> +       u32 *reg;
> >> +       bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
> > 
> > That's really heavy. You could find out which interrupts are potentially
> > affected (only 32 of them) and just handle those. Also, you do the copy
> > on both the read and write paths. Not great.
> for the copy you are fully right. I will add the check. Then to detect
> which pending IRQ is cleared I need to further study how I can optimize.
> Why do you say 32?  Can't any SPI be assigned to a guest?
> > 
> >> +       reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
> >>                                        vcpu->vcpu_id, offset);
> >>         vgic_reg_access(mmio, reg, offset,
> >>                         ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
> >>         if (mmio->is_write) {
> >> +               pending = vgic_bitmap_get_shared_map(&dist->irq_state);
> >> +               bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
> >> +               for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
> >> +                       is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
> >> +                       if (is_assigned_irq)
> >> +                               kvm_notify_acked_irq(vcpu->kvm, 0, i);
> > 
> > Are you saying that a masked interrupt should be treated the same as an
> > EOI-ed interrupt? That seems wrong from my PoV.
> Actually all that stuff comes from a bug I encountered with
> qemu_system_arm and the VFIO platform QEMU device (RFCv3 sent today).
> The scenario is the following:
> 1) I launch a 1st qemu_system_arm session with one xgmac bound to the
> KVM guest with vfio. IRQs are routed through irqfd.
> 2) I kill that session and launch a 2d one.
> 
> After the 1st session kill, xgmac ic still running (funny principle of
> VFIO "meta" driver which is HW device agnostic and do not know how to
> reset the xgmac). So very early I can see the xgmac sends a main IRQ
> which is handled by the vfio platform driver. This later masks the IRQ
> before signaling the eventfd. During guest setup I observe MMIO accesses
> that clears the xgmac pending IRQ under the hood. So for that IRQ the
> maintenance IRQ code will never be called, the notifier will not be be
> acked and thus the IRQ is never unmasked at VFIO driver level. As a
> result the xgmac driver gets stuck.
> 
> So currently this is the best fix I have found. VFIO Reset management
> need to be further studied anyway. THis is planned but I understand it
> will not straightforward.

It sounds to me like this is being done in the wrong place.  ICPENDRn
and ISPENDRn are used for saving/restoring state, not for
enabling/disabling IRQs.

Somehow, the VFIO driver must know that the xgmac has an active
interrupt which is masked.  When you setup the IRQ routing for your new
VM (when you start it the second time) there must be a mechanism to
probe this state, but you don't inject that into the VM until the VM
actually enables that IRQ (the guest kernel does request_irq()).  This
should already be handled by the existing vgic code.

Note that what should also happen is that the guest driver should now
reset the xgmac so that it doesn't inject IRQs, which should let VFIO
know to lower the line to the vgic too, and then only later it becomes
re-enabeld.  But that should be handled generically, iow. we shouldn't
write code specifically to address only such a flow of events.

[...]

> > 
> > I clearly need to do some more reading about all of this, but some of
> > the questions/remarks I've outlined above need anwers.
> 
> hope I answered them.
> > 
> > Also, how does this combine with the work the VOS guys are doing?
> 
> This patch may address a subset of VOSYS original targets (ie. SPI
> routing through irqfd). Besides VFIO platform driver I understood
> Antonios original plan was to put the whole VGIC behind irqchip which
> was not my intent here. But I will let him update us wrt their plans and
> progress.
> 
I thought we had this conversation with the VOS guys and that there were
no direct overlap between this work and what they had done, but the
details are a bit fussy and I also remember seeing that irq routing
patch from Antonios.

Antonios, other VOSYS guys, what are your thought here?

-Christoffer
Auger Eric June 5, 2014, 1:15 p.m. UTC | #5
On 06/05/2014 12:28 PM, Christoffer Dall wrote:
> On Mon, Jun 02, 2014 at 09:29:56AM +0200, Eric Auger wrote:
>> This patch enables irqfd and irq routing on ARM.
>>
>> It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
>>
>> irqfd framework enables to assign physical IRQs to guests.
>>
>> 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
>> associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
>> signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
>> injects the specified IRQ into the VM (the "GSI" takes the semantic of a
>> virtual IRQ for that guest).
>>
>> 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
>> VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
>> a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
>> When someone triggers the eventfd, irqfd handles it but uses the specified
>> routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
>> context the GSI takes the semantic of a physical IRQ while the irqchip.pin
>> takes the semantic of a virtual IRQ.
>>
>> in 1) routing is used by irqfd but an identity routing is created by default
>> making the gsi = irqchip.pin. Note on ARM there is a single interrupt
>> controller kind, the GIC.
>>
>> GSI routing mostly is implemented in generic irqchip.c.
>> The tiny ARM specific part is directly implemented in the virtual interrupt
>> controller (vgic.c) as it is done for powerpc for instance. This option was
>> prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
>> Hence irq_comm.c is not used at all.
>>
>> Routing currently is not used for anything else than irqfd IRQ injection. Only
>> SPI can be injected. This means the vgic is not totally hidden behind the
>> irqchip. There are separate discussions on PPI/SGI routing.
> 
> What do you mean here?  Is there an ongoing discussion on the mailing
> list somewhere?

Hi Christoffer,

Thanks for the review.

I was refering to that thread where it was invoked to put the whole vgic
behind irqchip:
http://www.spinics.net/lists/kvm-arm/msg08413.html
> 
>>
>> Only level sensitive IRQs are supported (with a registered resampler). As a
> 
> Is it not trivial to add edge-triggered support in the same go?
Yes it shouldn't be a problem. It is more a matter of testing.
> 
>> reminder the resampler is a second eventfd called by irqfd framework when the
>> virtual IRQ is completed by the guest. This eventfd is supposed to be handled
>> on user-side
>>
>> MSI routing is not supported yet.
>>
>> This work was tested with Calxeda Midway xgmac main interrupt (with and without
>> explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
>>
>> changes v1 -> v2:
>> 2 fixes:
>> - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
>>   This is now vgic_set_assigned_irq that increments it before injection.
>> - v2 now handles the case where a pending assigned irq is cleared through
>>   MMIO access. The irq is properly acked allowing the resamplefd handler
>>   to possibly unmask the physical IRQ.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> Conflicts:
>> 	Documentation/virtual/kvm/api.txt
>> 	arch/arm/kvm/Kconfig
>>
>> Conflicts:
>> 	Documentation/virtual/kvm/api.txt
> 
> We usually don't include these conflict notices when sending out
> patches.

OK I will remove them
> 
>> ---
>>  Documentation/virtual/kvm/api.txt |   4 +-
>>  arch/arm/include/uapi/asm/kvm.h   |   8 +++
>>  arch/arm/kvm/Kconfig              |   2 +
>>  arch/arm/kvm/Makefile             |   1 +
>>  arch/arm/kvm/irq.h                |  25 +++++++
>>  virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
>>  6 files changed, 174 insertions(+), 7 deletions(-)
>>  create mode 100644 arch/arm/kvm/irq.h
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index b4f5365..b376334 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>>  4.52 KVM_SET_GSI_ROUTING
>>  
>>  Capability: KVM_CAP_IRQ_ROUTING
>> -Architectures: x86 ia64 s390
>> +Architectures: x86 ia64 s390 arm
>>  Type: vm ioctl
>>  Parameters: struct kvm_irq_routing (in)
>>  Returns: 0 on success, -1 on error
>> @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
>>  4.75 KVM_IRQFD
>>  
>>  Capability: KVM_CAP_IRQFD
>> -Architectures: x86 s390
>> +Architectures: x86 s390 arm
>>  Type: vm ioctl
>>  Parameters: struct kvm_irqfd (in)
>>  Returns: 0 on success, -1 on error
>> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
>> index ef0c878..89b864d 100644
>> --- a/arch/arm/include/uapi/asm/kvm.h
>> +++ b/arch/arm/include/uapi/asm/kvm.h
>> @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
>>  /* Highest supported SPI, from VGIC_NR_IRQS */
>>  #define KVM_ARM_IRQ_GIC_MAX		127
>>  
>> +/* needed by IRQ routing */
>> +
>> +/* One single KVM irqchip, ie. the VGIC */
>> +#define KVM_NR_IRQCHIPS          1
>> +
>> +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
>> +#define KVM_IRQCHIP_NUM_PINS     256
> 
> I don't even see how the comment correlates to the define.  Hmmm.  But
> since Marc asked you to change this anyhow, maybe this doesn't matter
> now.

yes you're right. Those were the figures I was able to find in GIC400
TRM and I was confused by the fact I was not able to find them in the
code so I eventually put the same value as VGIC_NR_IRQS. I started
looking at kvm-arm64/vgic-dyn and found this dist nr_irqs but need more
time to investigate. Nethertheless his KVM_IRQCHIP_NUM_PINS is used in
generic code (irqchip).
> 
>> +
>>  /* PSCI interface */
>>  #define KVM_PSCI_FN_BASE		0x95c1ba5e
>>  #define KVM_PSCI_FN(n)			(KVM_PSCI_FN_BASE + (n))
>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>> index 4be5bb1..096692c 100644
>> --- a/arch/arm/kvm/Kconfig
>> +++ b/arch/arm/kvm/Kconfig
>> @@ -24,6 +24,7 @@ config KVM
>>  	select KVM_MMIO
>>  	select KVM_ARM_HOST
>>  	depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
>> +	select HAVE_KVM_EVENTFD
>>  	---help---
>>  	  Support hosting virtualized guest machines. You will also
>>  	  need to select one or more of the processor modules below.
>> @@ -56,6 +57,7 @@ config KVM_ARM_VGIC
>>  	bool "KVM support for Virtual GIC"
>>  	depends on KVM_ARM_HOST && OF
>>  	select HAVE_KVM_IRQCHIP
>> +	select HAVE_KVM_IRQ_ROUTING
>>  	default y
>>  	---help---
>>  	  Adds support for a hardware assisted, in-kernel GIC emulation.
>> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
>> index 789bca9..29de111 100644
>> --- a/arch/arm/kvm/Makefile
>> +++ b/arch/arm/kvm/Makefile
>> @@ -21,4 +21,5 @@ obj-y += kvm-arm.o init.o interrupts.o
>>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
>>  obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
>>  obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
>> +obj-$(CONFIG_HAVE_KVM_EVENTFD) += $(KVM)/eventfd.o $(KVM)/irqchip.o
>>  obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
>> diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h
>> new file mode 100644
>> index 0000000..4d6fcc6
>> --- /dev/null
>> +++ b/arch/arm/kvm/irq.h
>> @@ -0,0 +1,25 @@
>> +/*
>> + * Copyright (C) 2014, STMicroelectronics
>> + * Authors: Eric Auger <eric.auger@st.com>
> 
> please use the Linaro copyright notice for this.  You can add your
> @st.com e-mail in addition to the Linaro one in case you want that for
> long-term support.

OK I will do
> 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + */
>> +
>> +#ifndef __IRQ_H
>> +#define __IRQ_H
>> +
>> +#include <linux/kvm_host.h>
>> +/*
>> + * Placeholder for irqchip and irq/msi routing declarations
>> + * included in irqchip.c
>> + */
> 
> But none needed now?

Yes nothing for the time being.
> 
>> +
>> +#endif
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 56ff9be..39afa0d 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -93,6 +93,9 @@ static struct device_node *vgic_node;
>>  #define ACCESS_WRITE_VALUE	(3 << 1)
>>  #define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
>>  
>> +static struct kvm_irq_routing_entry identity_table[VGIC_NR_IRQS];
>> +static int set_default_routing_table(struct kvm *kvm);
>> +
>>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>  static void vgic_update_state(struct kvm *kvm);
>>  static void vgic_kick_vcpus(struct kvm *kvm);
>> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
>>  					  struct kvm_exit_mmio *mmio,
>>  					  phys_addr_t offset)
>>  {
>> -	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	unsigned int i;
>> +	bool is_assigned_irq;
>> +	DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
>> +	DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
>> +	unsigned long *pending =
>> +		vgic_bitmap_get_shared_map(&dist->irq_state);
>> +	u32 *reg;
> 
> please add a blank line between your declarations and the actual code.

OK
> 
>> +	bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
>> +	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>>  				       vcpu->vcpu_id, offset);
>>  	vgic_reg_access(mmio, reg, offset,
>>  			ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
>>  	if (mmio->is_write) {
>> +		pending = vgic_bitmap_get_shared_map(&dist->irq_state);
>> +		bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
>> +		for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
>> +			is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
>> +			if (is_assigned_irq)
>> +				kvm_notify_acked_irq(vcpu->kvm, 0, i);
>> +		}
> 
> As Mark pointed out, just copy the vgic reg value and do a simple xor on
> the two instead, and factor out the two lines that check for
> is_assigned_irq and calles notify_acked so that you can give a small
> static function a semantically meaningful name and call that from both
> this function and the process_maintenance function.

Yes I corrected this after looking into more details at the register
semantic.
> 
> That being said, this whole thing feels a bit weird, I'll comment on the
> other thread.
OK
> 
>>  		vgic_update_state(vcpu->kvm);
>>  		return true;
>>  	}
>> @@ -1172,6 +1191,8 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>  	bool level_pending = false;
>> +	struct kvm *kvm;
>> +	int is_assigned_irq;
>>  
>>  	kvm_debug("MISR = %08x\n", vgic_cpu->vgic_misr);
>>  
>> @@ -1189,12 +1210,23 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>  			vgic_irq_clear_active(vcpu, irq);
>>  			vgic_cpu->vgic_lr[lr] &= ~GICH_LR_EOI;
>>  
>> +			kvm = vcpu->kvm;
>> +			is_assigned_irq =
>> +				kvm_irq_has_notifier(kvm, 0, irq-VGIC_NR_PRIVATE_IRQS);
> 
> I think the preferred style is to keep the function call name on the
> same line as the assignment, and the do a line break on the parameter
> that doesn't fit in the line width and align that to the opening
> parenthesis on the funciton call.  At least I prefer it that way.

OK I will change this.
> 
> also spaces around the '-', please.
> 
>>  			/* Any additional pending interrupt? */
> 
> This comment seems to only aplly for non-assigned IRQs now, right?

yes it does
> 
>> -			if (vgic_dist_irq_is_pending(vcpu, irq)) {
>> -				vgic_cpu_irq_set(vcpu, irq);
>> -				level_pending = true;
>> -			} else {
>> +			if (is_assigned_irq) {
>>  				vgic_cpu_irq_clear(vcpu, irq);
>> +				kvm_debug("EOI irqchip routed vIRQ %d\n", irq);
>> +				kvm_notify_acked_irq(kvm, 0,
>> +					irq-VGIC_NR_PRIVATE_IRQS);
> 
> spaces around the '-', please.
OK
> 
>> +				vgic_dist_irq_clear(vcpu, irq);
>> +			} else {
>> +				if (vgic_dist_irq_is_pending(vcpu, irq)) {
>> +					vgic_cpu_irq_set(vcpu, irq);
>> +					level_pending = true;
>> +				} else {
>> +					vgic_cpu_irq_clear(vcpu, irq);
>> +				}
>>  			}
>>  
>>  			/*
>> @@ -1627,6 +1659,8 @@ int kvm_vgic_create(struct kvm *kvm)
>>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>>  	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
>>  
>> +	set_default_routing_table(kvm);
>> +
>>  out_unlock:
>>  	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
>>  		vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
>> @@ -2017,3 +2051,100 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = {
>>  	.get_attr = vgic_get_attr,
>>  	.has_attr = vgic_has_attr,
>>  };
>> +
>> +
>> +/*
>> + * set up a default identity routing table
>> + * The user-side can further change the routing table using
>> + * KVM_SET_GSI_ROUTING VM ioctl
>> + */
>> +
> 
> don't add this whitespace between comments and a function declaration,
> please check the entire patch for this.
OK
> 
>> +static int set_default_routing_table(struct kvm *kvm)
>> +{
>> +	struct kvm_irq_routing_entry;
>> +	int i;
>> +	for (i = 0; i < VGIC_NR_IRQS; i++)	{
>> +		identity_table[i].gsi = i;
>> +		identity_table[i].type = KVM_IRQ_ROUTING_IRQCHIP;
>> +		identity_table[i].u.irqchip.irqchip = 0;
>> +		identity_table[i].u.irqchip.pin = i;
>> +	}
>> +	return kvm_set_irq_routing(kvm, identity_table,
>> +				   ARRAY_SIZE(identity_table), 0);
> 
> is identity table used after this stage?  If not, could you not
> dynamically allocate it and free it after use so we don't waste this
> staic allocation of memory in the kernel?

yes this definitively can be optimized.
> 
>> +}
>> +
>> +
>> +/*
>> + * Functions needed for GSI routing (used by irqchip.c)
>> + * implemented in irq_comm.c for x86 and ia64
>> + * in architecture specific files for some other archictures (powerpc)
>> + */
> 
> Hmmm, this comment seems rather pointless in this file.  If you want to
> describe what this function does, then just document this functionality
> and the parameters/return value, i.e.
> 
> vgic_set_assigned_irq - set state of IRQs driven by irqfd
> 
> When an IRQ is raised or lowered.... blah blah blah blah.
> 
> @e:   the routing entry describing...
> @kvm: the kvm struct
> and so on.

> 
> return 0 on success, -error on errors.
> 

OK
>> +
>> +static int vgic_set_assigned_irq(struct kvm_kernel_irq_routing_entry *e,
>> +			   struct kvm *kvm, int irq_source_id, int level,
>> +			   bool line_status)
>> +{
>> +	unsigned int spi = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
> 
> I noticed this in the changelogs too, where's the rationale/API
> documentaiton for the use of irqchip.pin and its semantics on ARM?
> 
> Should we add something in Documentation/virtual/kvm/api.txt ?

- in 4.24 KVM_CREATE_IRQCHIP I might add that similarly to s390 a dummy
identity table is created.

- KVM_CAP_IRQFD says "kvm_irqfd.gsi specifies the irqchip pin toggled by
this event.  When an event is triggered on the eventfd, an interrupt is
injected into the guest using the specified gsi pin"

Assuming the standard use case is to use an identity/dummy GSI table the
irqchip.pin still remains the "irqchip pin" toggled on eventfd.

By the way I might remove the mention to the use case where the gsi !=
irqchip.pin.

Now when reading 4.25 KVM_IRQ_LINE, it is said that it is used to inject
a GSI as well.

Now on ARM The GSI has the following content.

  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
  field: | irq_type  | vcpu_index |     irq_id     |

As such that's true that currently there is an inconsistency.

Currently my GSI == spi number whereas the GSI as injected by
KVM_IRQ_LINE has the above format.
> 
>> +
>> +	if (irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID) {
>> +		/*
>> +		 * This path is not tested yet,
>> +		 * only irqchip with resampler was exercised
>> +		 */
>> +		kvm_vgic_inject_irq(kvm, 0, spi, level);
> 
> hmmm, stuff like this definitely makes this patch an RFC.

Yes I acknowledge I shall step back to RFC. I was a bit eager.
> 
>> +	} else if (irq_source_id == KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID) {
>> +		if (level == 1) {
> 
> This seems very wrong.  What if an external device raises a
> level-triggered IRQ, but then lowers it again, without the guest was
> ever running, now you have to wait until the guest sees the (now
> inactive) interrupt and EOIs it before the interrupt is lowered on the
> vgic?
Hum I am bit confused here. when you enter that code, this means an
irqfd was triggered. This irqfd was registered by some user code that is
supposed to be alive and do the proper action to complete the IRQ. in my
case the xgmac toggles down the IRQ if anyone resets the IRQ status
register? this action is done in the xgmac guest ISR. A device
reset/error? might provoke the IRQ pin toggling done. Do you see any
other events?
> 
>> +			kvm_debug("Inject irqchip routed vIRQ %d\n",
>> +					e->irqchip.pin);
>> +			kvm_vgic_inject_irq(kvm, 0, spi, level);
>> +			/*
>> +			 * toggling down vIRQ wire is directly handled in
>> +			 * process_maintenance for this reason:
>> +			 * irqfd_resampler_ack is called in
>> +			 * process_maintenance which holds the dist lock.
>> +			 * irqfd_resampler_ack calls kvm_set_irq
>> +			 * which ends_up calling kvm_vgic_inject_irq.
>> +			 * This later attempts to take the lock -> deadlock!
>> +			 */
> 
> Not sure I understand this comment.  What are we trying to achieve, are
> we using some sort of a workaround to avoid a deadlock?

What I wanted to point out here is I would have prefered to handle both
levels 0 and 1 in a symetrical manner. irqfd_resampler_ack (in eventfd)
is calling kvm_set_irq with level 0. This would be the prefered way to
toggle down the SPI at GIC input instead of doing this in
process_maintenance in a dirty manner. However this does work because
irqfd_resampler_ack is called in process_maintenance (the place where
the EOI is analyzed). process_maintenance holds the dist lock and would
eventually call kvm_vgic_inject_irq which also attempts to take the lock.
	
> 
>> +		}
>> +	}
>> +	return 0;
>> +
>> +}
>> +
>> +/* void implementation requested to compile irqchip.c */
>> +
> 
> hmm, "TODO: MSI routing not yet implemented" seems more appropriate.

OK
> 
>> +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>> +		struct kvm *kvm, int irq_source_id, int level, bool line_status)
>> +{
>> +	return 0;
>> +}
>> +
>> +int kvm_set_routing_entry(struct kvm_irq_routing_table *rt,
>> +			  struct kvm_kernel_irq_routing_entry *e,
>> +			  const struct kvm_irq_routing_entry *ue)
>> +{
>> +	int r = -EINVAL;
>> +
>> +	switch (ue->type) {
>> +	case KVM_IRQ_ROUTING_IRQCHIP:
>> +		e->set = vgic_set_assigned_irq;
>> +		e->irqchip.irqchip = ue->u.irqchip.irqchip;
>> +		e->irqchip.pin = ue->u.irqchip.pin;
>> +		if (e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS)
>> +			goto out;
>> +		/* chip[0][virtualID] = physicalID  */
>> +		rt->chip[ue->u.irqchip.irqchip][e->irqchip.pin] = ue->gsi;
> 
> did we verify ue->u.irqchip.irqchip anywhere before reaching this code?

No you are right. Should add this check.
> 
>> +		break;
>> +	default:
>> +		goto out;
>> +	}
>> +
>> +	r = 0;
>> +out:
>> +	return r;
>> +}
>> +
>> +
>> +
>> +
> 
> That's a lot of unnecessary white space.
OK
> 
>> -- 
>> 1.9.1
>>
> 
> Thanks,
> -Christoffer
>
Christoffer Dall June 5, 2014, 2:39 p.m. UTC | #6
On Thu, Jun 05, 2014 at 03:15:15PM +0200, Eric Auger wrote:
> On 06/05/2014 12:28 PM, Christoffer Dall wrote:
> > On Mon, Jun 02, 2014 at 09:29:56AM +0200, Eric Auger wrote:
> >> This patch enables irqfd and irq routing on ARM.
> >>
> >> It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
> >>
> >> irqfd framework enables to assign physical IRQs to guests.
> >>
> >> 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
> >> associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
> >> signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
> >> injects the specified IRQ into the VM (the "GSI" takes the semantic of a
> >> virtual IRQ for that guest).
> >>
> >> 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
> >> VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
> >> a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
> >> When someone triggers the eventfd, irqfd handles it but uses the specified
> >> routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
> >> context the GSI takes the semantic of a physical IRQ while the irqchip.pin
> >> takes the semantic of a virtual IRQ.
> >>
> >> in 1) routing is used by irqfd but an identity routing is created by default
> >> making the gsi = irqchip.pin. Note on ARM there is a single interrupt
> >> controller kind, the GIC.
> >>
> >> GSI routing mostly is implemented in generic irqchip.c.
> >> The tiny ARM specific part is directly implemented in the virtual interrupt
> >> controller (vgic.c) as it is done for powerpc for instance. This option was
> >> prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
> >> Hence irq_comm.c is not used at all.
> >>
> >> Routing currently is not used for anything else than irqfd IRQ injection. Only
> >> SPI can be injected. This means the vgic is not totally hidden behind the
> >> irqchip. There are separate discussions on PPI/SGI routing.
> > 
> > What do you mean here?  Is there an ongoing discussion on the mailing
> > list somewhere?
> 
> Hi Christoffer,
> 
> Thanks for the review.
> 
> I was refering to that thread where it was invoked to put the whole vgic
> behind irqchip:
> http://www.spinics.net/lists/kvm-arm/msg08413.html
> > 
> >>
> >> Only level sensitive IRQs are supported (with a registered resampler). As a
> > 
> > Is it not trivial to add edge-triggered support in the same go?
> Yes it shouldn't be a problem. It is more a matter of testing.
> > 
> >> reminder the resampler is a second eventfd called by irqfd framework when the
> >> virtual IRQ is completed by the guest. This eventfd is supposed to be handled
> >> on user-side
> >>
> >> MSI routing is not supported yet.
> >>
> >> This work was tested with Calxeda Midway xgmac main interrupt (with and without
> >> explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
> >>
> >> changes v1 -> v2:
> >> 2 fixes:
> >> - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
> >>   This is now vgic_set_assigned_irq that increments it before injection.
> >> - v2 now handles the case where a pending assigned irq is cleared through
> >>   MMIO access. The irq is properly acked allowing the resamplefd handler
> >>   to possibly unmask the physical IRQ.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >>
> >> Conflicts:
> >> 	Documentation/virtual/kvm/api.txt
> >> 	arch/arm/kvm/Kconfig
> >>
> >> Conflicts:
> >> 	Documentation/virtual/kvm/api.txt
> > 
> > We usually don't include these conflict notices when sending out
> > patches.
> 
> OK I will remove them
> > 
> >> ---
> >>  Documentation/virtual/kvm/api.txt |   4 +-
> >>  arch/arm/include/uapi/asm/kvm.h   |   8 +++
> >>  arch/arm/kvm/Kconfig              |   2 +
> >>  arch/arm/kvm/Makefile             |   1 +
> >>  arch/arm/kvm/irq.h                |  25 +++++++
> >>  virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
> >>  6 files changed, 174 insertions(+), 7 deletions(-)
> >>  create mode 100644 arch/arm/kvm/irq.h
> >>
> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> index b4f5365..b376334 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
> >>  4.52 KVM_SET_GSI_ROUTING
> >>  
> >>  Capability: KVM_CAP_IRQ_ROUTING
> >> -Architectures: x86 ia64 s390
> >> +Architectures: x86 ia64 s390 arm
> >>  Type: vm ioctl
> >>  Parameters: struct kvm_irq_routing (in)
> >>  Returns: 0 on success, -1 on error
> >> @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
> >>  4.75 KVM_IRQFD
> >>  
> >>  Capability: KVM_CAP_IRQFD
> >> -Architectures: x86 s390
> >> +Architectures: x86 s390 arm
> >>  Type: vm ioctl
> >>  Parameters: struct kvm_irqfd (in)
> >>  Returns: 0 on success, -1 on error
> >> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> >> index ef0c878..89b864d 100644
> >> --- a/arch/arm/include/uapi/asm/kvm.h
> >> +++ b/arch/arm/include/uapi/asm/kvm.h
> >> @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
> >>  /* Highest supported SPI, from VGIC_NR_IRQS */
> >>  #define KVM_ARM_IRQ_GIC_MAX		127
> >>  
> >> +/* needed by IRQ routing */
> >> +
> >> +/* One single KVM irqchip, ie. the VGIC */
> >> +#define KVM_NR_IRQCHIPS          1
> >> +
> >> +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
> >> +#define KVM_IRQCHIP_NUM_PINS     256
> > 
> > I don't even see how the comment correlates to the define.  Hmmm.  But
> > since Marc asked you to change this anyhow, maybe this doesn't matter
> > now.
> 
> yes you're right. Those were the figures I was able to find in GIC400
> TRM and I was confused by the fact I was not able to find them in the
> code so I eventually put the same value as VGIC_NR_IRQS. I started
> looking at kvm-arm64/vgic-dyn and found this dist nr_irqs but need more
> time to investigate. Nethertheless his KVM_IRQCHIP_NUM_PINS is used in
> generic code (irqchip).
> > 
> >> +
> >>  /* PSCI interface */
> >>  #define KVM_PSCI_FN_BASE		0x95c1ba5e
> >>  #define KVM_PSCI_FN(n)			(KVM_PSCI_FN_BASE + (n))
> >> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> >> index 4be5bb1..096692c 100644
> >> --- a/arch/arm/kvm/Kconfig
> >> +++ b/arch/arm/kvm/Kconfig
> >> @@ -24,6 +24,7 @@ config KVM
> >>  	select KVM_MMIO
> >>  	select KVM_ARM_HOST
> >>  	depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
> >> +	select HAVE_KVM_EVENTFD
> >>  	---help---
> >>  	  Support hosting virtualized guest machines. You will also
> >>  	  need to select one or more of the processor modules below.
> >> @@ -56,6 +57,7 @@ config KVM_ARM_VGIC
> >>  	bool "KVM support for Virtual GIC"
> >>  	depends on KVM_ARM_HOST && OF
> >>  	select HAVE_KVM_IRQCHIP
> >> +	select HAVE_KVM_IRQ_ROUTING
> >>  	default y
> >>  	---help---
> >>  	  Adds support for a hardware assisted, in-kernel GIC emulation.
> >> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> >> index 789bca9..29de111 100644
> >> --- a/arch/arm/kvm/Makefile
> >> +++ b/arch/arm/kvm/Makefile
> >> @@ -21,4 +21,5 @@ obj-y += kvm-arm.o init.o interrupts.o
> >>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
> >>  obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
> >>  obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
> >> +obj-$(CONFIG_HAVE_KVM_EVENTFD) += $(KVM)/eventfd.o $(KVM)/irqchip.o
> >>  obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
> >> diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h
> >> new file mode 100644
> >> index 0000000..4d6fcc6
> >> --- /dev/null
> >> +++ b/arch/arm/kvm/irq.h
> >> @@ -0,0 +1,25 @@
> >> +/*
> >> + * Copyright (C) 2014, STMicroelectronics
> >> + * Authors: Eric Auger <eric.auger@st.com>
> > 
> > please use the Linaro copyright notice for this.  You can add your
> > @st.com e-mail in addition to the Linaro one in case you want that for
> > long-term support.
> 
> OK I will do
> > 
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License, version 2, as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + */
> >> +
> >> +#ifndef __IRQ_H
> >> +#define __IRQ_H
> >> +
> >> +#include <linux/kvm_host.h>
> >> +/*
> >> + * Placeholder for irqchip and irq/msi routing declarations
> >> + * included in irqchip.c
> >> + */
> > 
> > But none needed now?
> 
> Yes nothing for the time being.
> > 
> >> +
> >> +#endif
> >> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> >> index 56ff9be..39afa0d 100644
> >> --- a/virt/kvm/arm/vgic.c
> >> +++ b/virt/kvm/arm/vgic.c
> >> @@ -93,6 +93,9 @@ static struct device_node *vgic_node;
> >>  #define ACCESS_WRITE_VALUE	(3 << 1)
> >>  #define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
> >>  
> >> +static struct kvm_irq_routing_entry identity_table[VGIC_NR_IRQS];
> >> +static int set_default_routing_table(struct kvm *kvm);
> >> +
> >>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
> >>  static void vgic_update_state(struct kvm *kvm);
> >>  static void vgic_kick_vcpus(struct kvm *kvm);
> >> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
> >>  					  struct kvm_exit_mmio *mmio,
> >>  					  phys_addr_t offset)
> >>  {
> >> -	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
> >> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> >> +	unsigned int i;
> >> +	bool is_assigned_irq;
> >> +	DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
> >> +	DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
> >> +	unsigned long *pending =
> >> +		vgic_bitmap_get_shared_map(&dist->irq_state);
> >> +	u32 *reg;
> > 
> > please add a blank line between your declarations and the actual code.
> 
> OK
> > 
> >> +	bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
> >> +	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
> >>  				       vcpu->vcpu_id, offset);
> >>  	vgic_reg_access(mmio, reg, offset,
> >>  			ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
> >>  	if (mmio->is_write) {
> >> +		pending = vgic_bitmap_get_shared_map(&dist->irq_state);
> >> +		bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
> >> +		for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
> >> +			is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
> >> +			if (is_assigned_irq)
> >> +				kvm_notify_acked_irq(vcpu->kvm, 0, i);
> >> +		}
> > 
> > As Mark pointed out, just copy the vgic reg value and do a simple xor on
> > the two instead, and factor out the two lines that check for
> > is_assigned_irq and calles notify_acked so that you can give a small
> > static function a semantically meaningful name and call that from both
> > this function and the process_maintenance function.
> 
> Yes I corrected this after looking into more details at the register
> semantic.
> > 
> > That being said, this whole thing feels a bit weird, I'll comment on the
> > other thread.
> OK
> > 
> >>  		vgic_update_state(vcpu->kvm);
> >>  		return true;
> >>  	}
> >> @@ -1172,6 +1191,8 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> >>  {
> >>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> >>  	bool level_pending = false;
> >> +	struct kvm *kvm;
> >> +	int is_assigned_irq;
> >>  
> >>  	kvm_debug("MISR = %08x\n", vgic_cpu->vgic_misr);
> >>  
> >> @@ -1189,12 +1210,23 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> >>  			vgic_irq_clear_active(vcpu, irq);
> >>  			vgic_cpu->vgic_lr[lr] &= ~GICH_LR_EOI;
> >>  
> >> +			kvm = vcpu->kvm;
> >> +			is_assigned_irq =
> >> +				kvm_irq_has_notifier(kvm, 0, irq-VGIC_NR_PRIVATE_IRQS);
> > 
> > I think the preferred style is to keep the function call name on the
> > same line as the assignment, and the do a line break on the parameter
> > that doesn't fit in the line width and align that to the opening
> > parenthesis on the funciton call.  At least I prefer it that way.
> 
> OK I will change this.
> > 
> > also spaces around the '-', please.
> > 
> >>  			/* Any additional pending interrupt? */
> > 
> > This comment seems to only aplly for non-assigned IRQs now, right?
> 
> yes it does
> > 

so move the comment in the else-clause, if it's an assigned irq then
that's not what you're handling here.  But come to think of it, don't we
need to check if the line is still high?

> >> -			if (vgic_dist_irq_is_pending(vcpu, irq)) {
> >> -				vgic_cpu_irq_set(vcpu, irq);
> >> -				level_pending = true;
> >> -			} else {
> >> +			if (is_assigned_irq) {
> >>  				vgic_cpu_irq_clear(vcpu, irq);
> >> +				kvm_debug("EOI irqchip routed vIRQ %d\n", irq);
> >> +				kvm_notify_acked_irq(kvm, 0,
> >> +					irq-VGIC_NR_PRIVATE_IRQS);
> > 
> > spaces around the '-', please.
> OK
> > 
> >> +				vgic_dist_irq_clear(vcpu, irq);
> >> +			} else {
> >> +				if (vgic_dist_irq_is_pending(vcpu, irq)) {
> >> +					vgic_cpu_irq_set(vcpu, irq);
> >> +					level_pending = true;
> >> +				} else {
> >> +					vgic_cpu_irq_clear(vcpu, irq);
> >> +				}
> >>  			}
> >>  
> >>  			/*
> >> @@ -1627,6 +1659,8 @@ int kvm_vgic_create(struct kvm *kvm)
> >>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
> >>  	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
> >>  
> >> +	set_default_routing_table(kvm);
> >> +
> >>  out_unlock:
> >>  	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
> >>  		vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
> >> @@ -2017,3 +2051,100 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = {
> >>  	.get_attr = vgic_get_attr,
> >>  	.has_attr = vgic_has_attr,
> >>  };
> >> +
> >> +
> >> +/*
> >> + * set up a default identity routing table
> >> + * The user-side can further change the routing table using
> >> + * KVM_SET_GSI_ROUTING VM ioctl
> >> + */
> >> +
> > 
> > don't add this whitespace between comments and a function declaration,
> > please check the entire patch for this.
> OK
> > 
> >> +static int set_default_routing_table(struct kvm *kvm)
> >> +{
> >> +	struct kvm_irq_routing_entry;
> >> +	int i;
> >> +	for (i = 0; i < VGIC_NR_IRQS; i++)	{
> >> +		identity_table[i].gsi = i;
> >> +		identity_table[i].type = KVM_IRQ_ROUTING_IRQCHIP;
> >> +		identity_table[i].u.irqchip.irqchip = 0;
> >> +		identity_table[i].u.irqchip.pin = i;
> >> +	}
> >> +	return kvm_set_irq_routing(kvm, identity_table,
> >> +				   ARRAY_SIZE(identity_table), 0);
> > 
> > is identity table used after this stage?  If not, could you not
> > dynamically allocate it and free it after use so we don't waste this
> > staic allocation of memory in the kernel?
> 
> yes this definitively can be optimized.
> > 
> >> +}
> >> +
> >> +
> >> +/*
> >> + * Functions needed for GSI routing (used by irqchip.c)
> >> + * implemented in irq_comm.c for x86 and ia64
> >> + * in architecture specific files for some other archictures (powerpc)
> >> + */
> > 
> > Hmmm, this comment seems rather pointless in this file.  If you want to
> > describe what this function does, then just document this functionality
> > and the parameters/return value, i.e.
> > 
> > vgic_set_assigned_irq - set state of IRQs driven by irqfd
> > 
> > When an IRQ is raised or lowered.... blah blah blah blah.
> > 
> > @e:   the routing entry describing...
> > @kvm: the kvm struct
> > and so on.
> 
> > 
> > return 0 on success, -error on errors.
> > 
> 
> OK
> >> +
> >> +static int vgic_set_assigned_irq(struct kvm_kernel_irq_routing_entry *e,
> >> +			   struct kvm *kvm, int irq_source_id, int level,
> >> +			   bool line_status)
> >> +{
> >> +	unsigned int spi = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
> > 
> > I noticed this in the changelogs too, where's the rationale/API
> > documentaiton for the use of irqchip.pin and its semantics on ARM?
> > 
> > Should we add something in Documentation/virtual/kvm/api.txt ?
> 
> - in 4.24 KVM_CREATE_IRQCHIP I might add that similarly to s390 a dummy
> identity table is created.
> 
> - KVM_CAP_IRQFD says "kvm_irqfd.gsi specifies the irqchip pin toggled by
> this event.  When an event is triggered on the eventfd, an interrupt is
> injected into the guest using the specified gsi pin"
> 
> Assuming the standard use case is to use an identity/dummy GSI table the
> irqchip.pin still remains the "irqchip pin" toggled on eventfd.
> 
> By the way I might remove the mention to the use case where the gsi !=
> irqchip.pin.
> 
> Now when reading 4.25 KVM_IRQ_LINE, it is said that it is used to inject
> a GSI as well.
> 
> Now on ARM The GSI has the following content.
> 
>   bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
>   field: | irq_type  | vcpu_index |     irq_id     |
> 
> As such that's true that currently there is an inconsistency.
> 
> Currently my GSI == spi number whereas the GSI as injected by
> KVM_IRQ_LINE has the above format.

But the ARM use of KVM_IRQFD is not clearly documented.

I think this patch needs to include adding arm to the "Architectures"
list in api.txt and clearly document what the semantics of the fields of
the struct kvm_irqfd are.

> > 
> >> +
> >> +	if (irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID) {
> >> +		/*
> >> +		 * This path is not tested yet,
> >> +		 * only irqchip with resampler was exercised
> >> +		 */
> >> +		kvm_vgic_inject_irq(kvm, 0, spi, level);
> > 
> > hmmm, stuff like this definitely makes this patch an RFC.
> 
> Yes I acknowledge I shall step back to RFC. I was a bit eager.

The alternative is to return an error and put a big fat comment saying
this is NOT IMPLEMENTED yet.

> > 
> >> +	} else if (irq_source_id == KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID) {
> >> +		if (level == 1) {
> > 
> > This seems very wrong.  What if an external device raises a
> > level-triggered IRQ, but then lowers it again, without the guest was
> > ever running, now you have to wait until the guest sees the (now
> > inactive) interrupt and EOIs it before the interrupt is lowered on the
> > vgic?
> Hum I am bit confused here. when you enter that code, this means an
> irqfd was triggered. This irqfd was registered by some user code that is
> supposed to be alive and do the proper action to complete the IRQ. in my
> case the xgmac toggles down the IRQ if anyone resets the IRQ status
> register? this action is done in the xgmac guest ISR. A device
> reset/error? might provoke the IRQ pin toggling done. Do you see any
> other events?

Sure, ok, forget about the 'guest wasn't running', but if the guest ISR
is run with guest interrupts disabled and resets the xgmac device (or
imagine some other device that just lowers the level triggered interrupt
after some timeout), then how does the current code handle this?

> > 
> >> +			kvm_debug("Inject irqchip routed vIRQ %d\n",
> >> +					e->irqchip.pin);
> >> +			kvm_vgic_inject_irq(kvm, 0, spi, level);
> >> +			/*
> >> +			 * toggling down vIRQ wire is directly handled in
> >> +			 * process_maintenance for this reason:
> >> +			 * irqfd_resampler_ack is called in
> >> +			 * process_maintenance which holds the dist lock.
> >> +			 * irqfd_resampler_ack calls kvm_set_irq
> >> +			 * which ends_up calling kvm_vgic_inject_irq.
> >> +			 * This later attempts to take the lock -> deadlock!
> >> +			 */
> > 
> > Not sure I understand this comment.  What are we trying to achieve, are
> > we using some sort of a workaround to avoid a deadlock?
> 
> What I wanted to point out here is I would have prefered to handle both
> levels 0 and 1 in a symetrical manner. irqfd_resampler_ack (in eventfd)
> is calling kvm_set_irq with level 0. This would be the prefered way to
> toggle down the SPI at GIC input instead of doing this in
> process_maintenance in a dirty manner. However this does work because
> irqfd_resampler_ack is called in process_maintenance (the place where
> the EOI is analyzed). process_maintenance holds the dist lock and would
> eventually call kvm_vgic_inject_irq which also attempts to take the lock.
> 	

I'm afraid that's too much of a hack.  There's an external mechanism to
set an interrupt line to active (level=1) or inactive (level=0) and we
must support both.

The fact that vgic_process_maintenance() can set the interrupt line to
inactive is just something we exploit to properly handle level-triggered
interrupts, but the main API to the VGIC must absolutely be supported.

Am I completely wrong here?

The locking issue can be solved by splitting up the locking into a finer
granularity as needed or deferring the call to irqfd_resampler_ack()
until after unlocking the distributor lock in kvm_vgic_sync_hwstate().

> > 
> >> +		}
> >> +	}
> >> +	return 0;
> >> +
> >> +}
> >> +
> >> +/* void implementation requested to compile irqchip.c */
> >> +
> > 
> > hmm, "TODO: MSI routing not yet implemented" seems more appropriate.
> 
> OK
> > 
> >> +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
> >> +		struct kvm *kvm, int irq_source_id, int level, bool line_status)
> >> +{
> >> +	return 0;
> >> +}
> >> +
> >> +int kvm_set_routing_entry(struct kvm_irq_routing_table *rt,
> >> +			  struct kvm_kernel_irq_routing_entry *e,
> >> +			  const struct kvm_irq_routing_entry *ue)
> >> +{
> >> +	int r = -EINVAL;
> >> +
> >> +	switch (ue->type) {
> >> +	case KVM_IRQ_ROUTING_IRQCHIP:
> >> +		e->set = vgic_set_assigned_irq;
> >> +		e->irqchip.irqchip = ue->u.irqchip.irqchip;
> >> +		e->irqchip.pin = ue->u.irqchip.pin;
> >> +		if (e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS)
> >> +			goto out;
> >> +		/* chip[0][virtualID] = physicalID  */
> >> +		rt->chip[ue->u.irqchip.irqchip][e->irqchip.pin] = ue->gsi;
> > 
> > did we verify ue->u.irqchip.irqchip anywhere before reaching this code?
> 
> No you are right. Should add this check.

ouch! ;)

-Christoffer
Auger Eric June 5, 2014, 3:58 p.m. UTC | #7
On 06/05/2014 04:39 PM, Christoffer Dall wrote:
> On Thu, Jun 05, 2014 at 03:15:15PM +0200, Eric Auger wrote:
>> On 06/05/2014 12:28 PM, Christoffer Dall wrote:
>>> On Mon, Jun 02, 2014 at 09:29:56AM +0200, Eric Auger wrote:
>>>> This patch enables irqfd and irq routing on ARM.
>>>>
>>>> It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
>>>>
>>>> irqfd framework enables to assign physical IRQs to guests.
>>>>
>>>> 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
>>>> associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
>>>> signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
>>>> injects the specified IRQ into the VM (the "GSI" takes the semantic of a
>>>> virtual IRQ for that guest).
>>>>
>>>> 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
>>>> VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
>>>> a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
>>>> When someone triggers the eventfd, irqfd handles it but uses the specified
>>>> routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
>>>> context the GSI takes the semantic of a physical IRQ while the irqchip.pin
>>>> takes the semantic of a virtual IRQ.
>>>>
>>>> in 1) routing is used by irqfd but an identity routing is created by default
>>>> making the gsi = irqchip.pin. Note on ARM there is a single interrupt
>>>> controller kind, the GIC.
>>>>
>>>> GSI routing mostly is implemented in generic irqchip.c.
>>>> The tiny ARM specific part is directly implemented in the virtual interrupt
>>>> controller (vgic.c) as it is done for powerpc for instance. This option was
>>>> prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
>>>> Hence irq_comm.c is not used at all.
>>>>
>>>> Routing currently is not used for anything else than irqfd IRQ injection. Only
>>>> SPI can be injected. This means the vgic is not totally hidden behind the
>>>> irqchip. There are separate discussions on PPI/SGI routing.
>>>
>>> What do you mean here?  Is there an ongoing discussion on the mailing
>>> list somewhere?
>>
>> Hi Christoffer,
>>
>> Thanks for the review.
>>
>> I was refering to that thread where it was invoked to put the whole vgic
>> behind irqchip:
>> http://www.spinics.net/lists/kvm-arm/msg08413.html
>>>
>>>>
>>>> Only level sensitive IRQs are supported (with a registered resampler). As a
>>>
>>> Is it not trivial to add edge-triggered support in the same go?
>> Yes it shouldn't be a problem. It is more a matter of testing.
>>>
>>>> reminder the resampler is a second eventfd called by irqfd framework when the
>>>> virtual IRQ is completed by the guest. This eventfd is supposed to be handled
>>>> on user-side
>>>>
>>>> MSI routing is not supported yet.
>>>>
>>>> This work was tested with Calxeda Midway xgmac main interrupt (with and without
>>>> explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
>>>>
>>>> changes v1 -> v2:
>>>> 2 fixes:
>>>> - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
>>>>   This is now vgic_set_assigned_irq that increments it before injection.
>>>> - v2 now handles the case where a pending assigned irq is cleared through
>>>>   MMIO access. The irq is properly acked allowing the resamplefd handler
>>>>   to possibly unmask the physical IRQ.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>
>>>> Conflicts:
>>>> 	Documentation/virtual/kvm/api.txt
>>>> 	arch/arm/kvm/Kconfig
>>>>
>>>> Conflicts:
>>>> 	Documentation/virtual/kvm/api.txt
>>>
>>> We usually don't include these conflict notices when sending out
>>> patches.
>>
>> OK I will remove them
>>>
>>>> ---
>>>>  Documentation/virtual/kvm/api.txt |   4 +-
>>>>  arch/arm/include/uapi/asm/kvm.h   |   8 +++
>>>>  arch/arm/kvm/Kconfig              |   2 +
>>>>  arch/arm/kvm/Makefile             |   1 +
>>>>  arch/arm/kvm/irq.h                |  25 +++++++
>>>>  virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
>>>>  6 files changed, 174 insertions(+), 7 deletions(-)
>>>>  create mode 100644 arch/arm/kvm/irq.h
>>>>
>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>> index b4f5365..b376334 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>>>>  4.52 KVM_SET_GSI_ROUTING
>>>>  
>>>>  Capability: KVM_CAP_IRQ_ROUTING
>>>> -Architectures: x86 ia64 s390
>>>> +Architectures: x86 ia64 s390 arm
>>>>  Type: vm ioctl
>>>>  Parameters: struct kvm_irq_routing (in)
>>>>  Returns: 0 on success, -1 on error
>>>> @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
>>>>  4.75 KVM_IRQFD
>>>>  
>>>>  Capability: KVM_CAP_IRQFD
>>>> -Architectures: x86 s390
>>>> +Architectures: x86 s390 arm
>>>>  Type: vm ioctl
>>>>  Parameters: struct kvm_irqfd (in)
>>>>  Returns: 0 on success, -1 on error
>>>> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
>>>> index ef0c878..89b864d 100644
>>>> --- a/arch/arm/include/uapi/asm/kvm.h
>>>> +++ b/arch/arm/include/uapi/asm/kvm.h
>>>> @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
>>>>  /* Highest supported SPI, from VGIC_NR_IRQS */
>>>>  #define KVM_ARM_IRQ_GIC_MAX		127
>>>>  
>>>> +/* needed by IRQ routing */
>>>> +
>>>> +/* One single KVM irqchip, ie. the VGIC */
>>>> +#define KVM_NR_IRQCHIPS          1
>>>> +
>>>> +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
>>>> +#define KVM_IRQCHIP_NUM_PINS     256
>>>
>>> I don't even see how the comment correlates to the define.  Hmmm.  But
>>> since Marc asked you to change this anyhow, maybe this doesn't matter
>>> now.
>>
>> yes you're right. Those were the figures I was able to find in GIC400
>> TRM and I was confused by the fact I was not able to find them in the
>> code so I eventually put the same value as VGIC_NR_IRQS. I started
>> looking at kvm-arm64/vgic-dyn and found this dist nr_irqs but need more
>> time to investigate. Nethertheless his KVM_IRQCHIP_NUM_PINS is used in
>> generic code (irqchip).
>>>
>>>> +
>>>>  /* PSCI interface */
>>>>  #define KVM_PSCI_FN_BASE		0x95c1ba5e
>>>>  #define KVM_PSCI_FN(n)			(KVM_PSCI_FN_BASE + (n))
>>>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>>>> index 4be5bb1..096692c 100644
>>>> --- a/arch/arm/kvm/Kconfig
>>>> +++ b/arch/arm/kvm/Kconfig
>>>> @@ -24,6 +24,7 @@ config KVM
>>>>  	select KVM_MMIO
>>>>  	select KVM_ARM_HOST
>>>>  	depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
>>>> +	select HAVE_KVM_EVENTFD
>>>>  	---help---
>>>>  	  Support hosting virtualized guest machines. You will also
>>>>  	  need to select one or more of the processor modules below.
>>>> @@ -56,6 +57,7 @@ config KVM_ARM_VGIC
>>>>  	bool "KVM support for Virtual GIC"
>>>>  	depends on KVM_ARM_HOST && OF
>>>>  	select HAVE_KVM_IRQCHIP
>>>> +	select HAVE_KVM_IRQ_ROUTING
>>>>  	default y
>>>>  	---help---
>>>>  	  Adds support for a hardware assisted, in-kernel GIC emulation.
>>>> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
>>>> index 789bca9..29de111 100644
>>>> --- a/arch/arm/kvm/Makefile
>>>> +++ b/arch/arm/kvm/Makefile
>>>> @@ -21,4 +21,5 @@ obj-y += kvm-arm.o init.o interrupts.o
>>>>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
>>>>  obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
>>>>  obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
>>>> +obj-$(CONFIG_HAVE_KVM_EVENTFD) += $(KVM)/eventfd.o $(KVM)/irqchip.o
>>>>  obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
>>>> diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h
>>>> new file mode 100644
>>>> index 0000000..4d6fcc6
>>>> --- /dev/null
>>>> +++ b/arch/arm/kvm/irq.h
>>>> @@ -0,0 +1,25 @@
>>>> +/*
>>>> + * Copyright (C) 2014, STMicroelectronics
>>>> + * Authors: Eric Auger <eric.auger@st.com>
>>>
>>> please use the Linaro copyright notice for this.  You can add your
>>> @st.com e-mail in addition to the Linaro one in case you want that for
>>> long-term support.
>>
>> OK I will do
>>>
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License, version 2, as
>>>> + * published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + */
>>>> +
>>>> +#ifndef __IRQ_H
>>>> +#define __IRQ_H
>>>> +
>>>> +#include <linux/kvm_host.h>
>>>> +/*
>>>> + * Placeholder for irqchip and irq/msi routing declarations
>>>> + * included in irqchip.c
>>>> + */
>>>
>>> But none needed now?
>>
>> Yes nothing for the time being.
>>>
>>>> +
>>>> +#endif
>>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>>> index 56ff9be..39afa0d 100644
>>>> --- a/virt/kvm/arm/vgic.c
>>>> +++ b/virt/kvm/arm/vgic.c
>>>> @@ -93,6 +93,9 @@ static struct device_node *vgic_node;
>>>>  #define ACCESS_WRITE_VALUE	(3 << 1)
>>>>  #define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
>>>>  
>>>> +static struct kvm_irq_routing_entry identity_table[VGIC_NR_IRQS];
>>>> +static int set_default_routing_table(struct kvm *kvm);
>>>> +
>>>>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>>>  static void vgic_update_state(struct kvm *kvm);
>>>>  static void vgic_kick_vcpus(struct kvm *kvm);
>>>> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
>>>>  					  struct kvm_exit_mmio *mmio,
>>>>  					  phys_addr_t offset)
>>>>  {
>>>> -	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>>>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>>>> +	unsigned int i;
>>>> +	bool is_assigned_irq;
>>>> +	DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
>>>> +	DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
>>>> +	unsigned long *pending =
>>>> +		vgic_bitmap_get_shared_map(&dist->irq_state);
>>>> +	u32 *reg;
>>>
>>> please add a blank line between your declarations and the actual code.
>>
>> OK
>>>
>>>> +	bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
>>>> +	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>>>>  				       vcpu->vcpu_id, offset);
>>>>  	vgic_reg_access(mmio, reg, offset,
>>>>  			ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
>>>>  	if (mmio->is_write) {
>>>> +		pending = vgic_bitmap_get_shared_map(&dist->irq_state);
>>>> +		bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
>>>> +		for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
>>>> +			is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
>>>> +			if (is_assigned_irq)
>>>> +				kvm_notify_acked_irq(vcpu->kvm, 0, i);
>>>> +		}
>>>
>>> As Mark pointed out, just copy the vgic reg value and do a simple xor on
>>> the two instead, and factor out the two lines that check for
>>> is_assigned_irq and calles notify_acked so that you can give a small
>>> static function a semantically meaningful name and call that from both
>>> this function and the process_maintenance function.
>>
>> Yes I corrected this after looking into more details at the register
>> semantic.
>>>
>>> That being said, this whole thing feels a bit weird, I'll comment on the
>>> other thread.
>> OK
>>>
>>>>  		vgic_update_state(vcpu->kvm);
>>>>  		return true;
>>>>  	}
>>>> @@ -1172,6 +1191,8 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>  {
>>>>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>>>  	bool level_pending = false;
>>>> +	struct kvm *kvm;
>>>> +	int is_assigned_irq;
>>>>  
>>>>  	kvm_debug("MISR = %08x\n", vgic_cpu->vgic_misr);
>>>>  
>>>> @@ -1189,12 +1210,23 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>  			vgic_irq_clear_active(vcpu, irq);
>>>>  			vgic_cpu->vgic_lr[lr] &= ~GICH_LR_EOI;
>>>>  
>>>> +			kvm = vcpu->kvm;
>>>> +			is_assigned_irq =
>>>> +				kvm_irq_has_notifier(kvm, 0, irq-VGIC_NR_PRIVATE_IRQS);
>>>
>>> I think the preferred style is to keep the function call name on the
>>> same line as the assignment, and the do a line break on the parameter
>>> that doesn't fit in the line width and align that to the opening
>>> parenthesis on the funciton call.  At least I prefer it that way.
>>
>> OK I will change this.
>>>
>>> also spaces around the '-', please.
>>>
>>>>  			/* Any additional pending interrupt? */
>>>
>>> This comment seems to only aplly for non-assigned IRQs now, right?
>>
>> yes it does
>>>
> 
> so move the comment in the else-clause, if it's an assigned irq then
> that's not what you're handling here. 

correct

 But come to think of it, don't we
> need to check if the line is still high?

by construction it should not be possible since the physical IRQ is
masked by the VFIO driver.
> 
>>>> -			if (vgic_dist_irq_is_pending(vcpu, irq)) {
>>>> -				vgic_cpu_irq_set(vcpu, irq);
>>>> -				level_pending = true;
>>>> -			} else {
>>>> +			if (is_assigned_irq) {
>>>>  				vgic_cpu_irq_clear(vcpu, irq);
>>>> +				kvm_debug("EOI irqchip routed vIRQ %d\n", irq);
>>>> +				kvm_notify_acked_irq(kvm, 0,
>>>> +					irq-VGIC_NR_PRIVATE_IRQS);
>>>
>>> spaces around the '-', please.
>> OK
>>>
>>>> +				vgic_dist_irq_clear(vcpu, irq);
>>>> +			} else {
>>>> +				if (vgic_dist_irq_is_pending(vcpu, irq)) {
>>>> +					vgic_cpu_irq_set(vcpu, irq);
>>>> +					level_pending = true;
>>>> +				} else {
>>>> +					vgic_cpu_irq_clear(vcpu, irq);
>>>> +				}
>>>>  			}
>>>>  
>>>>  			/*
>>>> @@ -1627,6 +1659,8 @@ int kvm_vgic_create(struct kvm *kvm)
>>>>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>>>>  	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
>>>>  
>>>> +	set_default_routing_table(kvm);
>>>> +
>>>>  out_unlock:
>>>>  	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
>>>>  		vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
>>>> @@ -2017,3 +2051,100 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = {
>>>>  	.get_attr = vgic_get_attr,
>>>>  	.has_attr = vgic_has_attr,
>>>>  };
>>>> +
>>>> +
>>>> +/*
>>>> + * set up a default identity routing table
>>>> + * The user-side can further change the routing table using
>>>> + * KVM_SET_GSI_ROUTING VM ioctl
>>>> + */
>>>> +
>>>
>>> don't add this whitespace between comments and a function declaration,
>>> please check the entire patch for this.
>> OK
>>>
>>>> +static int set_default_routing_table(struct kvm *kvm)
>>>> +{
>>>> +	struct kvm_irq_routing_entry;
>>>> +	int i;
>>>> +	for (i = 0; i < VGIC_NR_IRQS; i++)	{
>>>> +		identity_table[i].gsi = i;
>>>> +		identity_table[i].type = KVM_IRQ_ROUTING_IRQCHIP;
>>>> +		identity_table[i].u.irqchip.irqchip = 0;
>>>> +		identity_table[i].u.irqchip.pin = i;
>>>> +	}
>>>> +	return kvm_set_irq_routing(kvm, identity_table,
>>>> +				   ARRAY_SIZE(identity_table), 0);
>>>
>>> is identity table used after this stage?  If not, could you not
>>> dynamically allocate it and free it after use so we don't waste this
>>> staic allocation of memory in the kernel?
>>
>> yes this definitively can be optimized.
>>>
>>>> +}
>>>> +
>>>> +
>>>> +/*
>>>> + * Functions needed for GSI routing (used by irqchip.c)
>>>> + * implemented in irq_comm.c for x86 and ia64
>>>> + * in architecture specific files for some other archictures (powerpc)
>>>> + */
>>>
>>> Hmmm, this comment seems rather pointless in this file.  If you want to
>>> describe what this function does, then just document this functionality
>>> and the parameters/return value, i.e.
>>>
>>> vgic_set_assigned_irq - set state of IRQs driven by irqfd
>>>
>>> When an IRQ is raised or lowered.... blah blah blah blah.
>>>
>>> @e:   the routing entry describing...
>>> @kvm: the kvm struct
>>> and so on.
>>
>>>
>>> return 0 on success, -error on errors.
>>>
>>
>> OK
>>>> +
>>>> +static int vgic_set_assigned_irq(struct kvm_kernel_irq_routing_entry *e,
>>>> +			   struct kvm *kvm, int irq_source_id, int level,
>>>> +			   bool line_status)
>>>> +{
>>>> +	unsigned int spi = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
>>>
>>> I noticed this in the changelogs too, where's the rationale/API
>>> documentaiton for the use of irqchip.pin and its semantics on ARM?
>>>
>>> Should we add something in Documentation/virtual/kvm/api.txt ?
>>
>> - in 4.24 KVM_CREATE_IRQCHIP I might add that similarly to s390 a dummy
>> identity table is created.
>>
>> - KVM_CAP_IRQFD says "kvm_irqfd.gsi specifies the irqchip pin toggled by
>> this event.  When an event is triggered on the eventfd, an interrupt is
>> injected into the guest using the specified gsi pin"
>>
>> Assuming the standard use case is to use an identity/dummy GSI table the
>> irqchip.pin still remains the "irqchip pin" toggled on eventfd.
>>
>> By the way I might remove the mention to the use case where the gsi !=
>> irqchip.pin.
>>
>> Now when reading 4.25 KVM_IRQ_LINE, it is said that it is used to inject
>> a GSI as well.
>>
>> Now on ARM The GSI has the following content.
>>
>>   bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
>>   field: | irq_type  | vcpu_index |     irq_id     |
>>
>> As such that's true that currently there is an inconsistency.
>>
>> Currently my GSI == spi number whereas the GSI as injected by
>> KVM_IRQ_LINE has the above format.
> 
> But the ARM use of KVM_IRQFD is not clearly documented.
> 
> I think this patch needs to include adding arm to the "Architectures"
> list in api.txt and clearly document what the semantics of the fields of
> the struct kvm_irqfd are.

Yes indeed, this deserves such clarification, all the more so there is
the above inconsistency. We come back to the discussion about relevance
of routing other things than SPI actually. use case?
> 
>>>
>>>> +
>>>> +	if (irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID) {
>>>> +		/*
>>>> +		 * This path is not tested yet,
>>>> +		 * only irqchip with resampler was exercised
>>>> +		 */
>>>> +		kvm_vgic_inject_irq(kvm, 0, spi, level);
>>>
>>> hmmm, stuff like this definitely makes this patch an RFC.
>>
>> Yes I acknowledge I shall step back to RFC. I was a bit eager.
> 
> The alternative is to return an error and put a big fat comment saying
> this is NOT IMPLEMENTED yet.

actually this KVM_USERSPACE_IRQ_SOURCE_ID path is entered when injecting
IRQs without resamplerfd (is edge sensitive ones). As such we can link
this issue to your above related comment.

Anyway I will move it to RFC since there are quite a lot of open points,
especially the next one.
> 
>>>
>>>> +	} else if (irq_source_id == KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID) {
>>>> +		if (level == 1) {
>>>
>>> This seems very wrong.  What if an external device raises a
>>> level-triggered IRQ, but then lowers it again, without the guest was
>>> ever running, now you have to wait until the guest sees the (now
>>> inactive) interrupt and EOIs it before the interrupt is lowered on the
>>> vgic?
>> Hum I am bit confused here. when you enter that code, this means an
>> irqfd was triggered. This irqfd was registered by some user code that is
>> supposed to be alive and do the proper action to complete the IRQ. in my
>> case the xgmac toggles down the IRQ if anyone resets the IRQ status
>> register? this action is done in the xgmac guest ISR. A device
>> reset/error? might provoke the IRQ pin toggling done. Do you see any
>> other events?
> 
> Sure, ok, forget about the 'guest wasn't running', but if the guest ISR
> is run with guest interrupts disabled and resets the xgmac device (or
> imagine some other device that just lowers the level triggered interrupt
> after some timeout), then how does the current code handle this?
yes I aknowledge current implementation relies on 2 assumptions:
- either the EOI happens or
- the pending IRQ is cleared through ICPENDRn access

and in case of reset typically we are in trouble to clear the distributor.

But I currently do not see any way to detect the device IRQ is toggled
down without trapping something.
> 
>>>
>>>> +			kvm_debug("Inject irqchip routed vIRQ %d\n",
>>>> +					e->irqchip.pin);
>>>> +			kvm_vgic_inject_irq(kvm, 0, spi, level);
>>>> +			/*
>>>> +			 * toggling down vIRQ wire is directly handled in
>>>> +			 * process_maintenance for this reason:
>>>> +			 * irqfd_resampler_ack is called in
>>>> +			 * process_maintenance which holds the dist lock.
>>>> +			 * irqfd_resampler_ack calls kvm_set_irq
>>>> +			 * which ends_up calling kvm_vgic_inject_irq.
>>>> +			 * This later attempts to take the lock -> deadlock!
>>>> +			 */
>>>
>>> Not sure I understand this comment.  What are we trying to achieve, are
>>> we using some sort of a workaround to avoid a deadlock?
>>
>> What I wanted to point out here is I would have prefered to handle both
>> levels 0 and 1 in a symetrical manner. irqfd_resampler_ack (in eventfd)
>> is calling kvm_set_irq with level 0. This would be the prefered way to
>> toggle down the SPI at GIC input instead of doing this in
>> process_maintenance in a dirty manner. However this does work because
>> irqfd_resampler_ack is called in process_maintenance (the place where
>> the EOI is analyzed). process_maintenance holds the dist lock and would
>> eventually call kvm_vgic_inject_irq which also attempts to take the lock.
>> 	
> 
> I'm afraid that's too much of a hack.  There's an external mechanism to
> set an interrupt line to active (level=1) or inactive (level=0) and we
> must support both.

> The fact that vgic_process_maintenance() can set the interrupt line to
> inactive is just something we exploit to properly handle level-triggered
> interrupts, but the main API to the VGIC must absolutely be supported.
> 
> Am I completely wrong here?

Well I am not sure what you call here the VGIC API?  Is it
kvm_vgic_inject_irq? I agree with you on the fact It would be cleaner to
use this later in both edge transitions.
> 
> The locking issue can be solved by splitting up the locking into a finer
> granularity as needed or deferring the call to irqfd_resampler_ack()
> until after unlocking the distributor lock in kvm_vgic_sync_hwstate().
I will investigate a cleaner solution anyway.
> 
>>>
>>>> +		}
>>>> +	}
>>>> +	return 0;
>>>> +
>>>> +}
>>>> +
>>>> +/* void implementation requested to compile irqchip.c */
>>>> +
>>>
>>> hmm, "TODO: MSI routing not yet implemented" seems more appropriate.
>>
>> OK
>>>
>>>> +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>>>> +		struct kvm *kvm, int irq_source_id, int level, bool line_status)
>>>> +{
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +int kvm_set_routing_entry(struct kvm_irq_routing_table *rt,
>>>> +			  struct kvm_kernel_irq_routing_entry *e,
>>>> +			  const struct kvm_irq_routing_entry *ue)
>>>> +{
>>>> +	int r = -EINVAL;
>>>> +
>>>> +	switch (ue->type) {
>>>> +	case KVM_IRQ_ROUTING_IRQCHIP:
>>>> +		e->set = vgic_set_assigned_irq;
>>>> +		e->irqchip.irqchip = ue->u.irqchip.irqchip;
>>>> +		e->irqchip.pin = ue->u.irqchip.pin;
>>>> +		if (e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS)
>>>> +			goto out;
>>>> +		/* chip[0][virtualID] = physicalID  */
>>>> +		rt->chip[ue->u.irqchip.irqchip][e->irqchip.pin] = ue->gsi;
>>>
>>> did we verify ue->u.irqchip.irqchip anywhere before reaching this code?
>>
>> No you are right. Should add this check.
> 
> ouch! ;)
> 
> -Christoffer
>
Christoffer Dall June 5, 2014, 4:08 p.m. UTC | #8
On Thu, Jun 05, 2014 at 05:58:02PM +0200, Eric Auger wrote:
> On 06/05/2014 04:39 PM, Christoffer Dall wrote:
> > On Thu, Jun 05, 2014 at 03:15:15PM +0200, Eric Auger wrote:
> >> On 06/05/2014 12:28 PM, Christoffer Dall wrote:
> >>> On Mon, Jun 02, 2014 at 09:29:56AM +0200, Eric Auger wrote:
> >>>> This patch enables irqfd and irq routing on ARM.
> >>>>
> >>>> It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
> >>>>
> >>>> irqfd framework enables to assign physical IRQs to guests.
> >>>>
> >>>> 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
> >>>> associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
> >>>> signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
> >>>> injects the specified IRQ into the VM (the "GSI" takes the semantic of a
> >>>> virtual IRQ for that guest).
> >>>>
> >>>> 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
> >>>> VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
> >>>> a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
> >>>> When someone triggers the eventfd, irqfd handles it but uses the specified
> >>>> routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
> >>>> context the GSI takes the semantic of a physical IRQ while the irqchip.pin
> >>>> takes the semantic of a virtual IRQ.
> >>>>
> >>>> in 1) routing is used by irqfd but an identity routing is created by default
> >>>> making the gsi = irqchip.pin. Note on ARM there is a single interrupt
> >>>> controller kind, the GIC.
> >>>>
> >>>> GSI routing mostly is implemented in generic irqchip.c.
> >>>> The tiny ARM specific part is directly implemented in the virtual interrupt
> >>>> controller (vgic.c) as it is done for powerpc for instance. This option was
> >>>> prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
> >>>> Hence irq_comm.c is not used at all.
> >>>>
> >>>> Routing currently is not used for anything else than irqfd IRQ injection. Only
> >>>> SPI can be injected. This means the vgic is not totally hidden behind the
> >>>> irqchip. There are separate discussions on PPI/SGI routing.
> >>>
> >>> What do you mean here?  Is there an ongoing discussion on the mailing
> >>> list somewhere?
> >>
> >> Hi Christoffer,
> >>
> >> Thanks for the review.
> >>
> >> I was refering to that thread where it was invoked to put the whole vgic
> >> behind irqchip:
> >> http://www.spinics.net/lists/kvm-arm/msg08413.html
> >>>
> >>>>
> >>>> Only level sensitive IRQs are supported (with a registered resampler). As a
> >>>
> >>> Is it not trivial to add edge-triggered support in the same go?
> >> Yes it shouldn't be a problem. It is more a matter of testing.
> >>>
> >>>> reminder the resampler is a second eventfd called by irqfd framework when the
> >>>> virtual IRQ is completed by the guest. This eventfd is supposed to be handled
> >>>> on user-side
> >>>>
> >>>> MSI routing is not supported yet.
> >>>>
> >>>> This work was tested with Calxeda Midway xgmac main interrupt (with and without
> >>>> explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
> >>>>
> >>>> changes v1 -> v2:
> >>>> 2 fixes:
> >>>> - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
> >>>>   This is now vgic_set_assigned_irq that increments it before injection.
> >>>> - v2 now handles the case where a pending assigned irq is cleared through
> >>>>   MMIO access. The irq is properly acked allowing the resamplefd handler
> >>>>   to possibly unmask the physical IRQ.
> >>>>
> >>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >>>>
> >>>> Conflicts:
> >>>> 	Documentation/virtual/kvm/api.txt
> >>>> 	arch/arm/kvm/Kconfig
> >>>>
> >>>> Conflicts:
> >>>> 	Documentation/virtual/kvm/api.txt
> >>>
> >>> We usually don't include these conflict notices when sending out
> >>> patches.
> >>
> >> OK I will remove them
> >>>
> >>>> ---
> >>>>  Documentation/virtual/kvm/api.txt |   4 +-
> >>>>  arch/arm/include/uapi/asm/kvm.h   |   8 +++
> >>>>  arch/arm/kvm/Kconfig              |   2 +
> >>>>  arch/arm/kvm/Makefile             |   1 +
> >>>>  arch/arm/kvm/irq.h                |  25 +++++++
> >>>>  virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
> >>>>  6 files changed, 174 insertions(+), 7 deletions(-)
> >>>>  create mode 100644 arch/arm/kvm/irq.h
> >>>>
> >>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>>> index b4f5365..b376334 100644
> >>>> --- a/Documentation/virtual/kvm/api.txt
> >>>> +++ b/Documentation/virtual/kvm/api.txt
> >>>> @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
> >>>>  4.52 KVM_SET_GSI_ROUTING
> >>>>  
> >>>>  Capability: KVM_CAP_IRQ_ROUTING
> >>>> -Architectures: x86 ia64 s390
> >>>> +Architectures: x86 ia64 s390 arm
> >>>>  Type: vm ioctl
> >>>>  Parameters: struct kvm_irq_routing (in)
> >>>>  Returns: 0 on success, -1 on error
> >>>> @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
> >>>>  4.75 KVM_IRQFD
> >>>>  
> >>>>  Capability: KVM_CAP_IRQFD
> >>>> -Architectures: x86 s390
> >>>> +Architectures: x86 s390 arm
> >>>>  Type: vm ioctl
> >>>>  Parameters: struct kvm_irqfd (in)
> >>>>  Returns: 0 on success, -1 on error
> >>>> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> >>>> index ef0c878..89b864d 100644
> >>>> --- a/arch/arm/include/uapi/asm/kvm.h
> >>>> +++ b/arch/arm/include/uapi/asm/kvm.h
> >>>> @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
> >>>>  /* Highest supported SPI, from VGIC_NR_IRQS */
> >>>>  #define KVM_ARM_IRQ_GIC_MAX		127
> >>>>  
> >>>> +/* needed by IRQ routing */
> >>>> +
> >>>> +/* One single KVM irqchip, ie. the VGIC */
> >>>> +#define KVM_NR_IRQCHIPS          1
> >>>> +
> >>>> +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
> >>>> +#define KVM_IRQCHIP_NUM_PINS     256
> >>>
> >>> I don't even see how the comment correlates to the define.  Hmmm.  But
> >>> since Marc asked you to change this anyhow, maybe this doesn't matter
> >>> now.
> >>
> >> yes you're right. Those were the figures I was able to find in GIC400
> >> TRM and I was confused by the fact I was not able to find them in the
> >> code so I eventually put the same value as VGIC_NR_IRQS. I started
> >> looking at kvm-arm64/vgic-dyn and found this dist nr_irqs but need more
> >> time to investigate. Nethertheless his KVM_IRQCHIP_NUM_PINS is used in
> >> generic code (irqchip).
> >>>
> >>>> +
> >>>>  /* PSCI interface */
> >>>>  #define KVM_PSCI_FN_BASE		0x95c1ba5e
> >>>>  #define KVM_PSCI_FN(n)			(KVM_PSCI_FN_BASE + (n))
> >>>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> >>>> index 4be5bb1..096692c 100644
> >>>> --- a/arch/arm/kvm/Kconfig
> >>>> +++ b/arch/arm/kvm/Kconfig
> >>>> @@ -24,6 +24,7 @@ config KVM
> >>>>  	select KVM_MMIO
> >>>>  	select KVM_ARM_HOST
> >>>>  	depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
> >>>> +	select HAVE_KVM_EVENTFD
> >>>>  	---help---
> >>>>  	  Support hosting virtualized guest machines. You will also
> >>>>  	  need to select one or more of the processor modules below.
> >>>> @@ -56,6 +57,7 @@ config KVM_ARM_VGIC
> >>>>  	bool "KVM support for Virtual GIC"
> >>>>  	depends on KVM_ARM_HOST && OF
> >>>>  	select HAVE_KVM_IRQCHIP
> >>>> +	select HAVE_KVM_IRQ_ROUTING
> >>>>  	default y
> >>>>  	---help---
> >>>>  	  Adds support for a hardware assisted, in-kernel GIC emulation.
> >>>> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> >>>> index 789bca9..29de111 100644
> >>>> --- a/arch/arm/kvm/Makefile
> >>>> +++ b/arch/arm/kvm/Makefile
> >>>> @@ -21,4 +21,5 @@ obj-y += kvm-arm.o init.o interrupts.o
> >>>>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
> >>>>  obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
> >>>>  obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
> >>>> +obj-$(CONFIG_HAVE_KVM_EVENTFD) += $(KVM)/eventfd.o $(KVM)/irqchip.o
> >>>>  obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
> >>>> diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h
> >>>> new file mode 100644
> >>>> index 0000000..4d6fcc6
> >>>> --- /dev/null
> >>>> +++ b/arch/arm/kvm/irq.h
> >>>> @@ -0,0 +1,25 @@
> >>>> +/*
> >>>> + * Copyright (C) 2014, STMicroelectronics
> >>>> + * Authors: Eric Auger <eric.auger@st.com>
> >>>
> >>> please use the Linaro copyright notice for this.  You can add your
> >>> @st.com e-mail in addition to the Linaro one in case you want that for
> >>> long-term support.
> >>
> >> OK I will do
> >>>
> >>>> + *
> >>>> + * This program is free software; you can redistribute it and/or modify
> >>>> + * it under the terms of the GNU General Public License, version 2, as
> >>>> + * published by the Free Software Foundation.
> >>>> + *
> >>>> + * This program is distributed in the hope that it will be useful,
> >>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >>>> + * GNU General Public License for more details.
> >>>> + *
> >>>> + */
> >>>> +
> >>>> +#ifndef __IRQ_H
> >>>> +#define __IRQ_H
> >>>> +
> >>>> +#include <linux/kvm_host.h>
> >>>> +/*
> >>>> + * Placeholder for irqchip and irq/msi routing declarations
> >>>> + * included in irqchip.c
> >>>> + */
> >>>
> >>> But none needed now?
> >>
> >> Yes nothing for the time being.
> >>>
> >>>> +
> >>>> +#endif
> >>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> >>>> index 56ff9be..39afa0d 100644
> >>>> --- a/virt/kvm/arm/vgic.c
> >>>> +++ b/virt/kvm/arm/vgic.c
> >>>> @@ -93,6 +93,9 @@ static struct device_node *vgic_node;
> >>>>  #define ACCESS_WRITE_VALUE	(3 << 1)
> >>>>  #define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
> >>>>  
> >>>> +static struct kvm_irq_routing_entry identity_table[VGIC_NR_IRQS];
> >>>> +static int set_default_routing_table(struct kvm *kvm);
> >>>> +
> >>>>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
> >>>>  static void vgic_update_state(struct kvm *kvm);
> >>>>  static void vgic_kick_vcpus(struct kvm *kvm);
> >>>> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
> >>>>  					  struct kvm_exit_mmio *mmio,
> >>>>  					  phys_addr_t offset)
> >>>>  {
> >>>> -	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
> >>>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> >>>> +	unsigned int i;
> >>>> +	bool is_assigned_irq;
> >>>> +	DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
> >>>> +	DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
> >>>> +	unsigned long *pending =
> >>>> +		vgic_bitmap_get_shared_map(&dist->irq_state);
> >>>> +	u32 *reg;
> >>>
> >>> please add a blank line between your declarations and the actual code.
> >>
> >> OK
> >>>
> >>>> +	bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
> >>>> +	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
> >>>>  				       vcpu->vcpu_id, offset);
> >>>>  	vgic_reg_access(mmio, reg, offset,
> >>>>  			ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
> >>>>  	if (mmio->is_write) {
> >>>> +		pending = vgic_bitmap_get_shared_map(&dist->irq_state);
> >>>> +		bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
> >>>> +		for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
> >>>> +			is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
> >>>> +			if (is_assigned_irq)
> >>>> +				kvm_notify_acked_irq(vcpu->kvm, 0, i);
> >>>> +		}
> >>>
> >>> As Mark pointed out, just copy the vgic reg value and do a simple xor on
> >>> the two instead, and factor out the two lines that check for
> >>> is_assigned_irq and calles notify_acked so that you can give a small
> >>> static function a semantically meaningful name and call that from both
> >>> this function and the process_maintenance function.
> >>
> >> Yes I corrected this after looking into more details at the register
> >> semantic.
> >>>
> >>> That being said, this whole thing feels a bit weird, I'll comment on the
> >>> other thread.
> >> OK
> >>>
> >>>>  		vgic_update_state(vcpu->kvm);
> >>>>  		return true;
> >>>>  	}
> >>>> @@ -1172,6 +1191,8 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> >>>>  {
> >>>>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> >>>>  	bool level_pending = false;
> >>>> +	struct kvm *kvm;
> >>>> +	int is_assigned_irq;
> >>>>  
> >>>>  	kvm_debug("MISR = %08x\n", vgic_cpu->vgic_misr);
> >>>>  
> >>>> @@ -1189,12 +1210,23 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> >>>>  			vgic_irq_clear_active(vcpu, irq);
> >>>>  			vgic_cpu->vgic_lr[lr] &= ~GICH_LR_EOI;
> >>>>  
> >>>> +			kvm = vcpu->kvm;
> >>>> +			is_assigned_irq =
> >>>> +				kvm_irq_has_notifier(kvm, 0, irq-VGIC_NR_PRIVATE_IRQS);
> >>>
> >>> I think the preferred style is to keep the function call name on the
> >>> same line as the assignment, and the do a line break on the parameter
> >>> that doesn't fit in the line width and align that to the opening
> >>> parenthesis on the funciton call.  At least I prefer it that way.
> >>
> >> OK I will change this.
> >>>
> >>> also spaces around the '-', please.
> >>>
> >>>>  			/* Any additional pending interrupt? */
> >>>
> >>> This comment seems to only aplly for non-assigned IRQs now, right?
> >>
> >> yes it does
> >>>
> > 
> > so move the comment in the else-clause, if it's an assigned irq then
> > that's not what you're handling here. 
> 
> correct
> 
>  But come to think of it, don't we
> > need to check if the line is still high?
> 
> by construction it should not be possible since the physical IRQ is
> masked by the VFIO driver.

But this mechanism is not necessarily tied to VFIO is it?

> > 
> >>>> -			if (vgic_dist_irq_is_pending(vcpu, irq)) {
> >>>> -				vgic_cpu_irq_set(vcpu, irq);
> >>>> -				level_pending = true;
> >>>> -			} else {
> >>>> +			if (is_assigned_irq) {
> >>>>  				vgic_cpu_irq_clear(vcpu, irq);
> >>>> +				kvm_debug("EOI irqchip routed vIRQ %d\n", irq);
> >>>> +				kvm_notify_acked_irq(kvm, 0,
> >>>> +					irq-VGIC_NR_PRIVATE_IRQS);
> >>>
> >>> spaces around the '-', please.
> >> OK
> >>>
> >>>> +				vgic_dist_irq_clear(vcpu, irq);
> >>>> +			} else {
> >>>> +				if (vgic_dist_irq_is_pending(vcpu, irq)) {
> >>>> +					vgic_cpu_irq_set(vcpu, irq);
> >>>> +					level_pending = true;
> >>>> +				} else {
> >>>> +					vgic_cpu_irq_clear(vcpu, irq);
> >>>> +				}
> >>>>  			}
> >>>>  
> >>>>  			/*
> >>>> @@ -1627,6 +1659,8 @@ int kvm_vgic_create(struct kvm *kvm)
> >>>>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
> >>>>  	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
> >>>>  
> >>>> +	set_default_routing_table(kvm);
> >>>> +
> >>>>  out_unlock:
> >>>>  	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
> >>>>  		vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
> >>>> @@ -2017,3 +2051,100 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = {
> >>>>  	.get_attr = vgic_get_attr,
> >>>>  	.has_attr = vgic_has_attr,
> >>>>  };
> >>>> +
> >>>> +
> >>>> +/*
> >>>> + * set up a default identity routing table
> >>>> + * The user-side can further change the routing table using
> >>>> + * KVM_SET_GSI_ROUTING VM ioctl
> >>>> + */
> >>>> +
> >>>
> >>> don't add this whitespace between comments and a function declaration,
> >>> please check the entire patch for this.
> >> OK
> >>>
> >>>> +static int set_default_routing_table(struct kvm *kvm)
> >>>> +{
> >>>> +	struct kvm_irq_routing_entry;
> >>>> +	int i;
> >>>> +	for (i = 0; i < VGIC_NR_IRQS; i++)	{
> >>>> +		identity_table[i].gsi = i;
> >>>> +		identity_table[i].type = KVM_IRQ_ROUTING_IRQCHIP;
> >>>> +		identity_table[i].u.irqchip.irqchip = 0;
> >>>> +		identity_table[i].u.irqchip.pin = i;
> >>>> +	}
> >>>> +	return kvm_set_irq_routing(kvm, identity_table,
> >>>> +				   ARRAY_SIZE(identity_table), 0);
> >>>
> >>> is identity table used after this stage?  If not, could you not
> >>> dynamically allocate it and free it after use so we don't waste this
> >>> staic allocation of memory in the kernel?
> >>
> >> yes this definitively can be optimized.
> >>>
> >>>> +}
> >>>> +
> >>>> +
> >>>> +/*
> >>>> + * Functions needed for GSI routing (used by irqchip.c)
> >>>> + * implemented in irq_comm.c for x86 and ia64
> >>>> + * in architecture specific files for some other archictures (powerpc)
> >>>> + */
> >>>
> >>> Hmmm, this comment seems rather pointless in this file.  If you want to
> >>> describe what this function does, then just document this functionality
> >>> and the parameters/return value, i.e.
> >>>
> >>> vgic_set_assigned_irq - set state of IRQs driven by irqfd
> >>>
> >>> When an IRQ is raised or lowered.... blah blah blah blah.
> >>>
> >>> @e:   the routing entry describing...
> >>> @kvm: the kvm struct
> >>> and so on.
> >>
> >>>
> >>> return 0 on success, -error on errors.
> >>>
> >>
> >> OK
> >>>> +
> >>>> +static int vgic_set_assigned_irq(struct kvm_kernel_irq_routing_entry *e,
> >>>> +			   struct kvm *kvm, int irq_source_id, int level,
> >>>> +			   bool line_status)
> >>>> +{
> >>>> +	unsigned int spi = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
> >>>
> >>> I noticed this in the changelogs too, where's the rationale/API
> >>> documentaiton for the use of irqchip.pin and its semantics on ARM?
> >>>
> >>> Should we add something in Documentation/virtual/kvm/api.txt ?
> >>
> >> - in 4.24 KVM_CREATE_IRQCHIP I might add that similarly to s390 a dummy
> >> identity table is created.
> >>
> >> - KVM_CAP_IRQFD says "kvm_irqfd.gsi specifies the irqchip pin toggled by
> >> this event.  When an event is triggered on the eventfd, an interrupt is
> >> injected into the guest using the specified gsi pin"
> >>
> >> Assuming the standard use case is to use an identity/dummy GSI table the
> >> irqchip.pin still remains the "irqchip pin" toggled on eventfd.
> >>
> >> By the way I might remove the mention to the use case where the gsi !=
> >> irqchip.pin.
> >>
> >> Now when reading 4.25 KVM_IRQ_LINE, it is said that it is used to inject
> >> a GSI as well.
> >>
> >> Now on ARM The GSI has the following content.
> >>
> >>   bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
> >>   field: | irq_type  | vcpu_index |     irq_id     |
> >>
> >> As such that's true that currently there is an inconsistency.
> >>
> >> Currently my GSI == spi number whereas the GSI as injected by
> >> KVM_IRQ_LINE has the above format.
> > 
> > But the ARM use of KVM_IRQFD is not clearly documented.
> > 
> > I think this patch needs to include adding arm to the "Architectures"
> > list in api.txt and clearly document what the semantics of the fields of
> > the struct kvm_irqfd are.
> 
> Yes indeed, this deserves such clarification, all the more so there is
> the above inconsistency. We come back to the discussion about relevance
> of routing other things than SPI actually. use case?
> > 
> >>>
> >>>> +
> >>>> +	if (irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID) {
> >>>> +		/*
> >>>> +		 * This path is not tested yet,
> >>>> +		 * only irqchip with resampler was exercised
> >>>> +		 */
> >>>> +		kvm_vgic_inject_irq(kvm, 0, spi, level);
> >>>
> >>> hmmm, stuff like this definitely makes this patch an RFC.
> >>
> >> Yes I acknowledge I shall step back to RFC. I was a bit eager.
> > 
> > The alternative is to return an error and put a big fat comment saying
> > this is NOT IMPLEMENTED yet.
> 
> actually this KVM_USERSPACE_IRQ_SOURCE_ID path is entered when injecting
> IRQs without resamplerfd (is edge sensitive ones). As such we can link
> this issue to your above related comment.
> 
> Anyway I will move it to RFC since there are quite a lot of open points,
> especially the next one.
> > 
> >>>
> >>>> +	} else if (irq_source_id == KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID) {
> >>>> +		if (level == 1) {
> >>>
> >>> This seems very wrong.  What if an external device raises a
> >>> level-triggered IRQ, but then lowers it again, without the guest was
> >>> ever running, now you have to wait until the guest sees the (now
> >>> inactive) interrupt and EOIs it before the interrupt is lowered on the
> >>> vgic?
> >> Hum I am bit confused here. when you enter that code, this means an
> >> irqfd was triggered. This irqfd was registered by some user code that is
> >> supposed to be alive and do the proper action to complete the IRQ. in my
> >> case the xgmac toggles down the IRQ if anyone resets the IRQ status
> >> register? this action is done in the xgmac guest ISR. A device
> >> reset/error? might provoke the IRQ pin toggling done. Do you see any
> >> other events?
> > 
> > Sure, ok, forget about the 'guest wasn't running', but if the guest ISR
> > is run with guest interrupts disabled and resets the xgmac device (or
> > imagine some other device that just lowers the level triggered interrupt
> > after some timeout), then how does the current code handle this?
> yes I aknowledge current implementation relies on 2 assumptions:
> - either the EOI happens or
> - the pending IRQ is cleared through ICPENDRn access
> 
> and in case of reset typically we are in trouble to clear the distributor.
> 
> But I currently do not see any way to detect the device IRQ is toggled
> down without trapping something.

I guess that's hard without trapping or probing the VFIO driver, but the
current scheme just seems to fragile and seems to be built around a very
specific behavior of your guest and host, and not supporting a generic
solution.

Either we will have to trap MMIO to know what's going on, or perhaps
probe the VFIO state when the interrupt is enabled on the vgic by the
guest.

> > 
> >>>
> >>>> +			kvm_debug("Inject irqchip routed vIRQ %d\n",
> >>>> +					e->irqchip.pin);
> >>>> +			kvm_vgic_inject_irq(kvm, 0, spi, level);
> >>>> +			/*
> >>>> +			 * toggling down vIRQ wire is directly handled in
> >>>> +			 * process_maintenance for this reason:
> >>>> +			 * irqfd_resampler_ack is called in
> >>>> +			 * process_maintenance which holds the dist lock.
> >>>> +			 * irqfd_resampler_ack calls kvm_set_irq
> >>>> +			 * which ends_up calling kvm_vgic_inject_irq.
> >>>> +			 * This later attempts to take the lock -> deadlock!
> >>>> +			 */
> >>>
> >>> Not sure I understand this comment.  What are we trying to achieve, are
> >>> we using some sort of a workaround to avoid a deadlock?
> >>
> >> What I wanted to point out here is I would have prefered to handle both
> >> levels 0 and 1 in a symetrical manner. irqfd_resampler_ack (in eventfd)
> >> is calling kvm_set_irq with level 0. This would be the prefered way to
> >> toggle down the SPI at GIC input instead of doing this in
> >> process_maintenance in a dirty manner. However this does work because
> >> irqfd_resampler_ack is called in process_maintenance (the place where
> >> the EOI is analyzed). process_maintenance holds the dist lock and would
> >> eventually call kvm_vgic_inject_irq which also attempts to take the lock.
> >> 	
> > 
> > I'm afraid that's too much of a hack.  There's an external mechanism to
> > set an interrupt line to active (level=1) or inactive (level=0) and we
> > must support both.
> 
> > The fact that vgic_process_maintenance() can set the interrupt line to
> > inactive is just something we exploit to properly handle level-triggered
> > interrupts, but the main API to the VGIC must absolutely be supported.
> > 
> > Am I completely wrong here?
> 
> Well I am not sure what you call here the VGIC API?  Is it
> kvm_vgic_inject_irq? I agree with you on the fact It would be cleaner to
> use this later in both edge transitions.

Yes, the "set" function pointer, which eventually calls
vgic_set_assigned_irq(), is your API.  If that API carries semantics
that you can raise AND lower the irq through that API, then we need to
support that I think.

Thanks,
-Christoffer
Auger Eric June 6, 2014, 7:41 a.m. UTC | #9
On 06/05/2014 06:08 PM, Christoffer Dall wrote:
> On Thu, Jun 05, 2014 at 05:58:02PM +0200, Eric Auger wrote:
>> On 06/05/2014 04:39 PM, Christoffer Dall wrote:
>>> On Thu, Jun 05, 2014 at 03:15:15PM +0200, Eric Auger wrote:
>>>> On 06/05/2014 12:28 PM, Christoffer Dall wrote:
>>>>> On Mon, Jun 02, 2014 at 09:29:56AM +0200, Eric Auger wrote:
>>>>>> This patch enables irqfd and irq routing on ARM.
>>>>>>
>>>>>> It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
>>>>>>
>>>>>> irqfd framework enables to assign physical IRQs to guests.
>>>>>>
>>>>>> 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
>>>>>> associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
>>>>>> signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
>>>>>> injects the specified IRQ into the VM (the "GSI" takes the semantic of a
>>>>>> virtual IRQ for that guest).
>>>>>>
>>>>>> 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
>>>>>> VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
>>>>>> a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
>>>>>> When someone triggers the eventfd, irqfd handles it but uses the specified
>>>>>> routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
>>>>>> context the GSI takes the semantic of a physical IRQ while the irqchip.pin
>>>>>> takes the semantic of a virtual IRQ.
>>>>>>
>>>>>> in 1) routing is used by irqfd but an identity routing is created by default
>>>>>> making the gsi = irqchip.pin. Note on ARM there is a single interrupt
>>>>>> controller kind, the GIC.
>>>>>>
>>>>>> GSI routing mostly is implemented in generic irqchip.c.
>>>>>> The tiny ARM specific part is directly implemented in the virtual interrupt
>>>>>> controller (vgic.c) as it is done for powerpc for instance. This option was
>>>>>> prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
>>>>>> Hence irq_comm.c is not used at all.
>>>>>>
>>>>>> Routing currently is not used for anything else than irqfd IRQ injection. Only
>>>>>> SPI can be injected. This means the vgic is not totally hidden behind the
>>>>>> irqchip. There are separate discussions on PPI/SGI routing.
>>>>>
>>>>> What do you mean here?  Is there an ongoing discussion on the mailing
>>>>> list somewhere?
>>>>
>>>> Hi Christoffer,
>>>>
>>>> Thanks for the review.
>>>>
>>>> I was refering to that thread where it was invoked to put the whole vgic
>>>> behind irqchip:
>>>> http://www.spinics.net/lists/kvm-arm/msg08413.html
>>>>>
>>>>>>
>>>>>> Only level sensitive IRQs are supported (with a registered resampler). As a
>>>>>
>>>>> Is it not trivial to add edge-triggered support in the same go?
>>>> Yes it shouldn't be a problem. It is more a matter of testing.
>>>>>
>>>>>> reminder the resampler is a second eventfd called by irqfd framework when the
>>>>>> virtual IRQ is completed by the guest. This eventfd is supposed to be handled
>>>>>> on user-side
>>>>>>
>>>>>> MSI routing is not supported yet.
>>>>>>
>>>>>> This work was tested with Calxeda Midway xgmac main interrupt (with and without
>>>>>> explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
>>>>>>
>>>>>> changes v1 -> v2:
>>>>>> 2 fixes:
>>>>>> - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
>>>>>>   This is now vgic_set_assigned_irq that increments it before injection.
>>>>>> - v2 now handles the case where a pending assigned irq is cleared through
>>>>>>   MMIO access. The irq is properly acked allowing the resamplefd handler
>>>>>>   to possibly unmask the physical IRQ.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>
>>>>>> Conflicts:
>>>>>> 	Documentation/virtual/kvm/api.txt
>>>>>> 	arch/arm/kvm/Kconfig
>>>>>>
>>>>>> Conflicts:
>>>>>> 	Documentation/virtual/kvm/api.txt
>>>>>
>>>>> We usually don't include these conflict notices when sending out
>>>>> patches.
>>>>
>>>> OK I will remove them
>>>>>
>>>>>> ---
>>>>>>  Documentation/virtual/kvm/api.txt |   4 +-
>>>>>>  arch/arm/include/uapi/asm/kvm.h   |   8 +++
>>>>>>  arch/arm/kvm/Kconfig              |   2 +
>>>>>>  arch/arm/kvm/Makefile             |   1 +
>>>>>>  arch/arm/kvm/irq.h                |  25 +++++++
>>>>>>  virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
>>>>>>  6 files changed, 174 insertions(+), 7 deletions(-)
>>>>>>  create mode 100644 arch/arm/kvm/irq.h
>>>>>>
>>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>>> index b4f5365..b376334 100644
>>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>>> @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>>>>>>  4.52 KVM_SET_GSI_ROUTING
>>>>>>  
>>>>>>  Capability: KVM_CAP_IRQ_ROUTING
>>>>>> -Architectures: x86 ia64 s390
>>>>>> +Architectures: x86 ia64 s390 arm
>>>>>>  Type: vm ioctl
>>>>>>  Parameters: struct kvm_irq_routing (in)
>>>>>>  Returns: 0 on success, -1 on error
>>>>>> @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
>>>>>>  4.75 KVM_IRQFD
>>>>>>  
>>>>>>  Capability: KVM_CAP_IRQFD
>>>>>> -Architectures: x86 s390
>>>>>> +Architectures: x86 s390 arm
>>>>>>  Type: vm ioctl
>>>>>>  Parameters: struct kvm_irqfd (in)
>>>>>>  Returns: 0 on success, -1 on error
>>>>>> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
>>>>>> index ef0c878..89b864d 100644
>>>>>> --- a/arch/arm/include/uapi/asm/kvm.h
>>>>>> +++ b/arch/arm/include/uapi/asm/kvm.h
>>>>>> @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
>>>>>>  /* Highest supported SPI, from VGIC_NR_IRQS */
>>>>>>  #define KVM_ARM_IRQ_GIC_MAX		127
>>>>>>  
>>>>>> +/* needed by IRQ routing */
>>>>>> +
>>>>>> +/* One single KVM irqchip, ie. the VGIC */
>>>>>> +#define KVM_NR_IRQCHIPS          1
>>>>>> +
>>>>>> +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
>>>>>> +#define KVM_IRQCHIP_NUM_PINS     256
>>>>>
>>>>> I don't even see how the comment correlates to the define.  Hmmm.  But
>>>>> since Marc asked you to change this anyhow, maybe this doesn't matter
>>>>> now.
>>>>
>>>> yes you're right. Those were the figures I was able to find in GIC400
>>>> TRM and I was confused by the fact I was not able to find them in the
>>>> code so I eventually put the same value as VGIC_NR_IRQS. I started
>>>> looking at kvm-arm64/vgic-dyn and found this dist nr_irqs but need more
>>>> time to investigate. Nethertheless his KVM_IRQCHIP_NUM_PINS is used in
>>>> generic code (irqchip).
>>>>>
>>>>>> +
>>>>>>  /* PSCI interface */
>>>>>>  #define KVM_PSCI_FN_BASE		0x95c1ba5e
>>>>>>  #define KVM_PSCI_FN(n)			(KVM_PSCI_FN_BASE + (n))
>>>>>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>>>>>> index 4be5bb1..096692c 100644
>>>>>> --- a/arch/arm/kvm/Kconfig
>>>>>> +++ b/arch/arm/kvm/Kconfig
>>>>>> @@ -24,6 +24,7 @@ config KVM
>>>>>>  	select KVM_MMIO
>>>>>>  	select KVM_ARM_HOST
>>>>>>  	depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
>>>>>> +	select HAVE_KVM_EVENTFD
>>>>>>  	---help---
>>>>>>  	  Support hosting virtualized guest machines. You will also
>>>>>>  	  need to select one or more of the processor modules below.
>>>>>> @@ -56,6 +57,7 @@ config KVM_ARM_VGIC
>>>>>>  	bool "KVM support for Virtual GIC"
>>>>>>  	depends on KVM_ARM_HOST && OF
>>>>>>  	select HAVE_KVM_IRQCHIP
>>>>>> +	select HAVE_KVM_IRQ_ROUTING
>>>>>>  	default y
>>>>>>  	---help---
>>>>>>  	  Adds support for a hardware assisted, in-kernel GIC emulation.
>>>>>> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
>>>>>> index 789bca9..29de111 100644
>>>>>> --- a/arch/arm/kvm/Makefile
>>>>>> +++ b/arch/arm/kvm/Makefile
>>>>>> @@ -21,4 +21,5 @@ obj-y += kvm-arm.o init.o interrupts.o
>>>>>>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
>>>>>>  obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
>>>>>>  obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
>>>>>> +obj-$(CONFIG_HAVE_KVM_EVENTFD) += $(KVM)/eventfd.o $(KVM)/irqchip.o
>>>>>>  obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
>>>>>> diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h
>>>>>> new file mode 100644
>>>>>> index 0000000..4d6fcc6
>>>>>> --- /dev/null
>>>>>> +++ b/arch/arm/kvm/irq.h
>>>>>> @@ -0,0 +1,25 @@
>>>>>> +/*
>>>>>> + * Copyright (C) 2014, STMicroelectronics
>>>>>> + * Authors: Eric Auger <eric.auger@st.com>
>>>>>
>>>>> please use the Linaro copyright notice for this.  You can add your
>>>>> @st.com e-mail in addition to the Linaro one in case you want that for
>>>>> long-term support.
>>>>
>>>> OK I will do
>>>>>
>>>>>> + *
>>>>>> + * This program is free software; you can redistribute it and/or modify
>>>>>> + * it under the terms of the GNU General Public License, version 2, as
>>>>>> + * published by the Free Software Foundation.
>>>>>> + *
>>>>>> + * This program is distributed in the hope that it will be useful,
>>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>>> + * GNU General Public License for more details.
>>>>>> + *
>>>>>> + */
>>>>>> +
>>>>>> +#ifndef __IRQ_H
>>>>>> +#define __IRQ_H
>>>>>> +
>>>>>> +#include <linux/kvm_host.h>
>>>>>> +/*
>>>>>> + * Placeholder for irqchip and irq/msi routing declarations
>>>>>> + * included in irqchip.c
>>>>>> + */
>>>>>
>>>>> But none needed now?
>>>>
>>>> Yes nothing for the time being.
>>>>>
>>>>>> +
>>>>>> +#endif
>>>>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>>>>> index 56ff9be..39afa0d 100644
>>>>>> --- a/virt/kvm/arm/vgic.c
>>>>>> +++ b/virt/kvm/arm/vgic.c
>>>>>> @@ -93,6 +93,9 @@ static struct device_node *vgic_node;
>>>>>>  #define ACCESS_WRITE_VALUE	(3 << 1)
>>>>>>  #define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
>>>>>>  
>>>>>> +static struct kvm_irq_routing_entry identity_table[VGIC_NR_IRQS];
>>>>>> +static int set_default_routing_table(struct kvm *kvm);
>>>>>> +
>>>>>>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>>>>>  static void vgic_update_state(struct kvm *kvm);
>>>>>>  static void vgic_kick_vcpus(struct kvm *kvm);
>>>>>> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
>>>>>>  					  struct kvm_exit_mmio *mmio,
>>>>>>  					  phys_addr_t offset)
>>>>>>  {
>>>>>> -	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>>>>>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>>>>>> +	unsigned int i;
>>>>>> +	bool is_assigned_irq;
>>>>>> +	DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
>>>>>> +	DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
>>>>>> +	unsigned long *pending =
>>>>>> +		vgic_bitmap_get_shared_map(&dist->irq_state);
>>>>>> +	u32 *reg;
>>>>>
>>>>> please add a blank line between your declarations and the actual code.
>>>>
>>>> OK
>>>>>
>>>>>> +	bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
>>>>>> +	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>>>>>>  				       vcpu->vcpu_id, offset);
>>>>>>  	vgic_reg_access(mmio, reg, offset,
>>>>>>  			ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
>>>>>>  	if (mmio->is_write) {
>>>>>> +		pending = vgic_bitmap_get_shared_map(&dist->irq_state);
>>>>>> +		bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
>>>>>> +		for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
>>>>>> +			is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
>>>>>> +			if (is_assigned_irq)
>>>>>> +				kvm_notify_acked_irq(vcpu->kvm, 0, i);
>>>>>> +		}
>>>>>
>>>>> As Mark pointed out, just copy the vgic reg value and do a simple xor on
>>>>> the two instead, and factor out the two lines that check for
>>>>> is_assigned_irq and calles notify_acked so that you can give a small
>>>>> static function a semantically meaningful name and call that from both
>>>>> this function and the process_maintenance function.
>>>>
>>>> Yes I corrected this after looking into more details at the register
>>>> semantic.
>>>>>
>>>>> That being said, this whole thing feels a bit weird, I'll comment on the
>>>>> other thread.
>>>> OK
>>>>>
>>>>>>  		vgic_update_state(vcpu->kvm);
>>>>>>  		return true;
>>>>>>  	}
>>>>>> @@ -1172,6 +1191,8 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>>>  {
>>>>>>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>>>>>  	bool level_pending = false;
>>>>>> +	struct kvm *kvm;
>>>>>> +	int is_assigned_irq;
>>>>>>  
>>>>>>  	kvm_debug("MISR = %08x\n", vgic_cpu->vgic_misr);
>>>>>>  
>>>>>> @@ -1189,12 +1210,23 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>>>  			vgic_irq_clear_active(vcpu, irq);
>>>>>>  			vgic_cpu->vgic_lr[lr] &= ~GICH_LR_EOI;
>>>>>>  
>>>>>> +			kvm = vcpu->kvm;
>>>>>> +			is_assigned_irq =
>>>>>> +				kvm_irq_has_notifier(kvm, 0, irq-VGIC_NR_PRIVATE_IRQS);
>>>>>
>>>>> I think the preferred style is to keep the function call name on the
>>>>> same line as the assignment, and the do a line break on the parameter
>>>>> that doesn't fit in the line width and align that to the opening
>>>>> parenthesis on the funciton call.  At least I prefer it that way.
>>>>
>>>> OK I will change this.
>>>>>
>>>>> also spaces around the '-', please.
>>>>>
>>>>>>  			/* Any additional pending interrupt? */
>>>>>
>>>>> This comment seems to only aplly for non-assigned IRQs now, right?
>>>>
>>>> yes it does
>>>>>
>>>
>>> so move the comment in the else-clause, if it's an assigned irq then
>>> that's not what you're handling here. 
>>
>> correct
>>
>>  But come to think of it, don't we
>>> need to check if the line is still high?
>>
>> by construction it should not be possible since the physical IRQ is
>> masked by the VFIO driver.
> 
> But this mechanism is not necessarily tied to VFIO is it?
> 

Yes that's correct. but it is tied to eventfd mechanism. I need to
further check what it induces as constraints. vhost uses it. Would be
good to put this in place yo test genericity of next versions.
>>>
>>>>>> -			if (vgic_dist_irq_is_pending(vcpu, irq)) {
>>>>>> -				vgic_cpu_irq_set(vcpu, irq);
>>>>>> -				level_pending = true;
>>>>>> -			} else {
>>>>>> +			if (is_assigned_irq) {
>>>>>>  				vgic_cpu_irq_clear(vcpu, irq);
>>>>>> +				kvm_debug("EOI irqchip routed vIRQ %d\n", irq);
>>>>>> +				kvm_notify_acked_irq(kvm, 0,
>>>>>> +					irq-VGIC_NR_PRIVATE_IRQS);
>>>>>
>>>>> spaces around the '-', please.
>>>> OK
>>>>>
>>>>>> +				vgic_dist_irq_clear(vcpu, irq);
>>>>>> +			} else {
>>>>>> +				if (vgic_dist_irq_is_pending(vcpu, irq)) {
>>>>>> +					vgic_cpu_irq_set(vcpu, irq);
>>>>>> +					level_pending = true;
>>>>>> +				} else {
>>>>>> +					vgic_cpu_irq_clear(vcpu, irq);
>>>>>> +				}
>>>>>>  			}
>>>>>>  
>>>>>>  			/*
>>>>>> @@ -1627,6 +1659,8 @@ int kvm_vgic_create(struct kvm *kvm)
>>>>>>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>>>>>>  	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
>>>>>>  
>>>>>> +	set_default_routing_table(kvm);
>>>>>> +
>>>>>>  out_unlock:
>>>>>>  	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
>>>>>>  		vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
>>>>>> @@ -2017,3 +2051,100 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = {
>>>>>>  	.get_attr = vgic_get_attr,
>>>>>>  	.has_attr = vgic_has_attr,
>>>>>>  };
>>>>>> +
>>>>>> +
>>>>>> +/*
>>>>>> + * set up a default identity routing table
>>>>>> + * The user-side can further change the routing table using
>>>>>> + * KVM_SET_GSI_ROUTING VM ioctl
>>>>>> + */
>>>>>> +
>>>>>
>>>>> don't add this whitespace between comments and a function declaration,
>>>>> please check the entire patch for this.
>>>> OK
>>>>>
>>>>>> +static int set_default_routing_table(struct kvm *kvm)
>>>>>> +{
>>>>>> +	struct kvm_irq_routing_entry;
>>>>>> +	int i;
>>>>>> +	for (i = 0; i < VGIC_NR_IRQS; i++)	{
>>>>>> +		identity_table[i].gsi = i;
>>>>>> +		identity_table[i].type = KVM_IRQ_ROUTING_IRQCHIP;
>>>>>> +		identity_table[i].u.irqchip.irqchip = 0;
>>>>>> +		identity_table[i].u.irqchip.pin = i;
>>>>>> +	}
>>>>>> +	return kvm_set_irq_routing(kvm, identity_table,
>>>>>> +				   ARRAY_SIZE(identity_table), 0);
>>>>>
>>>>> is identity table used after this stage?  If not, could you not
>>>>> dynamically allocate it and free it after use so we don't waste this
>>>>> staic allocation of memory in the kernel?
>>>>
>>>> yes this definitively can be optimized.
>>>>>
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>> +/*
>>>>>> + * Functions needed for GSI routing (used by irqchip.c)
>>>>>> + * implemented in irq_comm.c for x86 and ia64
>>>>>> + * in architecture specific files for some other archictures (powerpc)
>>>>>> + */
>>>>>
>>>>> Hmmm, this comment seems rather pointless in this file.  If you want to
>>>>> describe what this function does, then just document this functionality
>>>>> and the parameters/return value, i.e.
>>>>>
>>>>> vgic_set_assigned_irq - set state of IRQs driven by irqfd
>>>>>
>>>>> When an IRQ is raised or lowered.... blah blah blah blah.
>>>>>
>>>>> @e:   the routing entry describing...
>>>>> @kvm: the kvm struct
>>>>> and so on.
>>>>
>>>>>
>>>>> return 0 on success, -error on errors.
>>>>>
>>>>
>>>> OK
>>>>>> +
>>>>>> +static int vgic_set_assigned_irq(struct kvm_kernel_irq_routing_entry *e,
>>>>>> +			   struct kvm *kvm, int irq_source_id, int level,
>>>>>> +			   bool line_status)
>>>>>> +{
>>>>>> +	unsigned int spi = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
>>>>>
>>>>> I noticed this in the changelogs too, where's the rationale/API
>>>>> documentaiton for the use of irqchip.pin and its semantics on ARM?
>>>>>
>>>>> Should we add something in Documentation/virtual/kvm/api.txt ?
>>>>
>>>> - in 4.24 KVM_CREATE_IRQCHIP I might add that similarly to s390 a dummy
>>>> identity table is created.
>>>>
>>>> - KVM_CAP_IRQFD says "kvm_irqfd.gsi specifies the irqchip pin toggled by
>>>> this event.  When an event is triggered on the eventfd, an interrupt is
>>>> injected into the guest using the specified gsi pin"
>>>>
>>>> Assuming the standard use case is to use an identity/dummy GSI table the
>>>> irqchip.pin still remains the "irqchip pin" toggled on eventfd.
>>>>
>>>> By the way I might remove the mention to the use case where the gsi !=
>>>> irqchip.pin.
>>>>
>>>> Now when reading 4.25 KVM_IRQ_LINE, it is said that it is used to inject
>>>> a GSI as well.
>>>>
>>>> Now on ARM The GSI has the following content.
>>>>
>>>>   bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
>>>>   field: | irq_type  | vcpu_index |     irq_id     |
>>>>
>>>> As such that's true that currently there is an inconsistency.
>>>>
>>>> Currently my GSI == spi number whereas the GSI as injected by
>>>> KVM_IRQ_LINE has the above format.
>>>
>>> But the ARM use of KVM_IRQFD is not clearly documented.
>>>
>>> I think this patch needs to include adding arm to the "Architectures"
>>> list in api.txt and clearly document what the semantics of the fields of
>>> the struct kvm_irqfd are.
>>
>> Yes indeed, this deserves such clarification, all the more so there is
>> the above inconsistency. We come back to the discussion about relevance
>> of routing other things than SPI actually. use case?
>>>
>>>>>
>>>>>> +
>>>>>> +	if (irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID) {
>>>>>> +		/*
>>>>>> +		 * This path is not tested yet,
>>>>>> +		 * only irqchip with resampler was exercised
>>>>>> +		 */
>>>>>> +		kvm_vgic_inject_irq(kvm, 0, spi, level);
>>>>>
>>>>> hmmm, stuff like this definitely makes this patch an RFC.
>>>>
>>>> Yes I acknowledge I shall step back to RFC. I was a bit eager.
>>>
>>> The alternative is to return an error and put a big fat comment saying
>>> this is NOT IMPLEMENTED yet.
>>
>> actually this KVM_USERSPACE_IRQ_SOURCE_ID path is entered when injecting
>> IRQs without resamplerfd (is edge sensitive ones). As such we can link
>> this issue to your above related comment.
>>
>> Anyway I will move it to RFC since there are quite a lot of open points,
>> especially the next one.
>>>
>>>>>
>>>>>> +	} else if (irq_source_id == KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID) {
>>>>>> +		if (level == 1) {
>>>>>
>>>>> This seems very wrong.  What if an external device raises a
>>>>> level-triggered IRQ, but then lowers it again, without the guest was
>>>>> ever running, now you have to wait until the guest sees the (now
>>>>> inactive) interrupt and EOIs it before the interrupt is lowered on the
>>>>> vgic?
>>>> Hum I am bit confused here. when you enter that code, this means an
>>>> irqfd was triggered. This irqfd was registered by some user code that is
>>>> supposed to be alive and do the proper action to complete the IRQ. in my
>>>> case the xgmac toggles down the IRQ if anyone resets the IRQ status
>>>> register? this action is done in the xgmac guest ISR. A device
>>>> reset/error? might provoke the IRQ pin toggling done. Do you see any
>>>> other events?
>>>
>>> Sure, ok, forget about the 'guest wasn't running', but if the guest ISR
>>> is run with guest interrupts disabled and resets the xgmac device (or
>>> imagine some other device that just lowers the level triggered interrupt
>>> after some timeout), then how does the current code handle this?
>> yes I aknowledge current implementation relies on 2 assumptions:
>> - either the EOI happens or
>> - the pending IRQ is cleared through ICPENDRn access
>>
>> and in case of reset typically we are in trouble to clear the distributor.
>>
>> But I currently do not see any way to detect the device IRQ is toggled
>> down without trapping something.
> 
> I guess that's hard without trapping or probing the VFIO driver, but the
> current scheme just seems to fragile and seems to be built around a very
> specific behavior of your guest and host, and not supporting a generic
> solution.
> 
> Either we will have to trap MMIO to know what's going on, or perhaps
> probe the VFIO state when the interrupt is enabled on the vgic by the
> guest.
> 

agreed.
>>>
>>>>>
>>>>>> +			kvm_debug("Inject irqchip routed vIRQ %d\n",
>>>>>> +					e->irqchip.pin);
>>>>>> +			kvm_vgic_inject_irq(kvm, 0, spi, level);
>>>>>> +			/*
>>>>>> +			 * toggling down vIRQ wire is directly handled in
>>>>>> +			 * process_maintenance for this reason:
>>>>>> +			 * irqfd_resampler_ack is called in
>>>>>> +			 * process_maintenance which holds the dist lock.
>>>>>> +			 * irqfd_resampler_ack calls kvm_set_irq
>>>>>> +			 * which ends_up calling kvm_vgic_inject_irq.
>>>>>> +			 * This later attempts to take the lock -> deadlock!
>>>>>> +			 */
>>>>>
>>>>> Not sure I understand this comment.  What are we trying to achieve, are
>>>>> we using some sort of a workaround to avoid a deadlock?
>>>>
>>>> What I wanted to point out here is I would have prefered to handle both
>>>> levels 0 and 1 in a symetrical manner. irqfd_resampler_ack (in eventfd)
>>>> is calling kvm_set_irq with level 0. This would be the prefered way to
>>>> toggle down the SPI at GIC input instead of doing this in
>>>> process_maintenance in a dirty manner. However this does work because
>>>> irqfd_resampler_ack is called in process_maintenance (the place where
>>>> the EOI is analyzed). process_maintenance holds the dist lock and would
>>>> eventually call kvm_vgic_inject_irq which also attempts to take the lock.
>>>> 	
>>>
>>> I'm afraid that's too much of a hack.  There's an external mechanism to
>>> set an interrupt line to active (level=1) or inactive (level=0) and we
>>> must support both.
>>
>>> The fact that vgic_process_maintenance() can set the interrupt line to
>>> inactive is just something we exploit to properly handle level-triggered
>>> interrupts, but the main API to the VGIC must absolutely be supported.
>>>
>>> Am I completely wrong here?
>>
>> Well I am not sure what you call here the VGIC API?  Is it
>> kvm_vgic_inject_irq? I agree with you on the fact It would be cleaner to
>> use this later in both edge transitions.
> 
> Yes, the "set" function pointer, which eventually calls
> vgic_set_assigned_irq(), is your API.  If that API carries semantics
> that you can raise AND lower the irq through that API, then we need to
> support that I think.
agreed too
> 
> Thanks,
> -Christoffer
>
Christoffer Dall June 6, 2014, 12:16 p.m. UTC | #10
On Mon, Jun 02, 2014 at 02:54:12PM +0100, Marc Zyngier wrote:
> Hi Eric,
> 
> On Mon, Jun 02 2014 at  8:29:56 am BST, Eric Auger <eric.auger@linaro.org> wrote:
> > This patch enables irqfd and irq routing on ARM.
> >
> > It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
> >
> > irqfd framework enables to assign physical IRQs to guests.
> >
> > 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
> > associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
> > signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
> > injects the specified IRQ into the VM (the "GSI" takes the semantic of a
> > virtual IRQ for that guest).
> 
> Just so I can understand how this works: Who EOIs (handles) the physical
> interrupt? If it is the VFIO driver, then I don't see how you prevent
> the interrupt from firing again immediately (unless this is an edge
> interrupt?).
> 
> > 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
> > VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
> > a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
> > When someone triggers the eventfd, irqfd handles it but uses the specified
> > routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
> > context the GSI takes the semantic of a physical IRQ while the irqchip.pin
> > takes the semantic of a virtual IRQ.
> >
> > in 1) routing is used by irqfd but an identity routing is created by default
> > making the gsi = irqchip.pin. Note on ARM there is a single interrupt
> > controller kind, the GIC.
> >
> > GSI routing mostly is implemented in generic irqchip.c.
> > The tiny ARM specific part is directly implemented in the virtual interrupt
> > controller (vgic.c) as it is done for powerpc for instance. This option was
> > prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
> > Hence irq_comm.c is not used at all.
> >
> > Routing currently is not used for anything else than irqfd IRQ injection. Only
> > SPI can be injected. This means the vgic is not totally hidden behind the
> > irqchip. There are separate discussions on PPI/SGI routing.
> >
> > Only level sensitive IRQs are supported (with a registered resampler). As a
> > reminder the resampler is a second eventfd called by irqfd framework when the
> > virtual IRQ is completed by the guest. This eventfd is supposed to be handled
> > on user-side
> >
> > MSI routing is not supported yet.
> >
> > This work was tested with Calxeda Midway xgmac main interrupt (with and without
> > explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
> >
> > changes v1 -> v2:
> > 2 fixes:
> > - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
> >   This is now vgic_set_assigned_irq that increments it before injection.
> > - v2 now handles the case where a pending assigned irq is cleared through
> >   MMIO access. The irq is properly acked allowing the resamplefd handler
> >   to possibly unmask the physical IRQ.
> >
> > Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >
> > Conflicts:
> >         Documentation/virtual/kvm/api.txt
> >         arch/arm/kvm/Kconfig
> >
> > Conflicts:
> >         Documentation/virtual/kvm/api.txt
> > ---
> >  Documentation/virtual/kvm/api.txt |   4 +-
> >  arch/arm/include/uapi/asm/kvm.h   |   8 +++
> >  arch/arm/kvm/Kconfig              |   2 +
> >  arch/arm/kvm/Makefile             |   1 +
> >  arch/arm/kvm/irq.h                |  25 +++++++
> >  virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
> >  6 files changed, 174 insertions(+), 7 deletions(-)
> >  create mode 100644 arch/arm/kvm/irq.h
> >
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index b4f5365..b376334 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
> >  4.52 KVM_SET_GSI_ROUTING
> >
> >  Capability: KVM_CAP_IRQ_ROUTING
> > -Architectures: x86 ia64 s390
> > +Architectures: x86 ia64 s390 arm
> >  Type: vm ioctl
> >  Parameters: struct kvm_irq_routing (in)
> >  Returns: 0 on success, -1 on error
> > @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
> >  4.75 KVM_IRQFD
> >
> >  Capability: KVM_CAP_IRQFD
> > -Architectures: x86 s390
> > +Architectures: x86 s390 arm
> >  Type: vm ioctl
> >  Parameters: struct kvm_irqfd (in)
> >  Returns: 0 on success, -1 on error
> > diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> > index ef0c878..89b864d 100644
> > --- a/arch/arm/include/uapi/asm/kvm.h
> > +++ b/arch/arm/include/uapi/asm/kvm.h
> > @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
> >  /* Highest supported SPI, from VGIC_NR_IRQS */
> >  #define KVM_ARM_IRQ_GIC_MAX            127
> >
> > +/* needed by IRQ routing */
> > +
> > +/* One single KVM irqchip, ie. the VGIC */
> > +#define KVM_NR_IRQCHIPS          1
> > +
> > +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
> > +#define KVM_IRQCHIP_NUM_PINS     256
> 
> Gahhhh... Please don't. We're trying hard to move away from hard-coded
> definitions such as this, since GICv3 has much higher limits. And the
> comment you've added perfectly outlines why this is such a bad idea
> (even on GICv2, we can have up to 960 SPIs).
> 
> Have a look at what's brewing in my kvm-arm64/vgic-dyn branch.
> 
Looked at this a bit more, and the generic code does require this
definition, but it can be arbitrarily high (it is not used to allocate
tables or anything like that).  Could we not just define it to the
highest possible number for now, and then deal with it if the number is
too large for gicv3?

(There may be some more issues related to this and the identity_table
setup code, but I'll let Eric investigate and comment on this).

-Christoffer
Christoffer Dall June 6, 2014, 12:36 p.m. UTC | #11
On Thu, Jun 05, 2014 at 12:37:36PM +0200, Christoffer Dall wrote:
> On Mon, Jun 02, 2014 at 04:42:36PM +0200, Eric Auger wrote:
> > On 06/02/2014 03:54 PM, Marc Zyngier wrote:
> > > Hi Eric,
> > > 
> > > On Mon, Jun 02 2014 at  8:29:56 am BST, Eric Auger <eric.auger@linaro.org> wrote:
> 
> [...]
> 
> > >> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
> > >>                                           struct kvm_exit_mmio *mmio,
> > >>                                           phys_addr_t offset)
> > >>  {
> > >> -       u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
> > >> +       struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> > >> +       unsigned int i;
> > >> +       bool is_assigned_irq;
> > >> +       DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
> > >> +       DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
> > >> +       unsigned long *pending =
> > >> +               vgic_bitmap_get_shared_map(&dist->irq_state);
> > >> +       u32 *reg;
> > >> +       bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
> > > 
> > > That's really heavy. You could find out which interrupts are potentially
> > > affected (only 32 of them) and just handle those. Also, you do the copy
> > > on both the read and write paths. Not great.
> > for the copy you are fully right. I will add the check. Then to detect
> > which pending IRQ is cleared I need to further study how I can optimize.
> > Why do you say 32?  Can't any SPI be assigned to a guest?
> > > 
> > >> +       reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
> > >>                                        vcpu->vcpu_id, offset);
> > >>         vgic_reg_access(mmio, reg, offset,
> > >>                         ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
> > >>         if (mmio->is_write) {
> > >> +               pending = vgic_bitmap_get_shared_map(&dist->irq_state);
> > >> +               bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
> > >> +               for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
> > >> +                       is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
> > >> +                       if (is_assigned_irq)
> > >> +                               kvm_notify_acked_irq(vcpu->kvm, 0, i);
> > > 
> > > Are you saying that a masked interrupt should be treated the same as an
> > > EOI-ed interrupt? That seems wrong from my PoV.
> > Actually all that stuff comes from a bug I encountered with
> > qemu_system_arm and the VFIO platform QEMU device (RFCv3 sent today).
> > The scenario is the following:
> > 1) I launch a 1st qemu_system_arm session with one xgmac bound to the
> > KVM guest with vfio. IRQs are routed through irqfd.
> > 2) I kill that session and launch a 2d one.
> > 
> > After the 1st session kill, xgmac ic still running (funny principle of
> > VFIO "meta" driver which is HW device agnostic and do not know how to
> > reset the xgmac). So very early I can see the xgmac sends a main IRQ
> > which is handled by the vfio platform driver. This later masks the IRQ
> > before signaling the eventfd. During guest setup I observe MMIO accesses
> > that clears the xgmac pending IRQ under the hood. So for that IRQ the
> > maintenance IRQ code will never be called, the notifier will not be be
> > acked and thus the IRQ is never unmasked at VFIO driver level. As a
> > result the xgmac driver gets stuck.
> > 
> > So currently this is the best fix I have found. VFIO Reset management
> > need to be further studied anyway. THis is planned but I understand it
> > will not straightforward.
> 
> It sounds to me like this is being done in the wrong place.  ICPENDRn
> and ISPENDRn are used for saving/restoring state, not for
> enabling/disabling IRQs.
> 
> Somehow, the VFIO driver must know that the xgmac has an active
> interrupt which is masked.  When you setup the IRQ routing for your new
> VM (when you start it the second time) there must be a mechanism to
> probe this state, but you don't inject that into the VM until the VM
> actually enables that IRQ (the guest kernel does request_irq()).  This
> should already be handled by the existing vgic code.
> 
> Note that what should also happen is that the guest driver should now
> reset the xgmac so that it doesn't inject IRQs, which should let VFIO
> know to lower the line to the vgic too, and then only later it becomes
> re-enabeld.  But that should be handled generically, iow. we shouldn't
> write code specifically to address only such a flow of events.
> 

Talked to Eric about this again, and I think the problem is that we're
not handling ICPENDRn and ISPENDRn properly.  It was a pain to deal with
this in the QEMU too, but basically we're not honering the semantics of
the registers, because the final pending state needs to be or'ed with
the external input signal.

I'll have a look at fixing this.

-Christoffer
Auger Eric June 17, 2014, 11:42 a.m. UTC | #12
On 06/06/2014 02:16 PM, Christoffer Dall wrote:
> On Mon, Jun 02, 2014 at 02:54:12PM +0100, Marc Zyngier wrote:
>> Hi Eric,
>>
>> On Mon, Jun 02 2014 at  8:29:56 am BST, Eric Auger <eric.auger@linaro.org> wrote:
>>> This patch enables irqfd and irq routing on ARM.
>>>
>>> It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
>>>
>>> irqfd framework enables to assign physical IRQs to guests.
>>>
>>> 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
>>> associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
>>> signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
>>> injects the specified IRQ into the VM (the "GSI" takes the semantic of a
>>> virtual IRQ for that guest).
>>
>> Just so I can understand how this works: Who EOIs (handles) the physical
>> interrupt? If it is the VFIO driver, then I don't see how you prevent
>> the interrupt from firing again immediately (unless this is an edge
>> interrupt?).
>>
>>> 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
>>> VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
>>> a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
>>> When someone triggers the eventfd, irqfd handles it but uses the specified
>>> routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
>>> context the GSI takes the semantic of a physical IRQ while the irqchip.pin
>>> takes the semantic of a virtual IRQ.
>>>
>>> in 1) routing is used by irqfd but an identity routing is created by default
>>> making the gsi = irqchip.pin. Note on ARM there is a single interrupt
>>> controller kind, the GIC.
>>>
>>> GSI routing mostly is implemented in generic irqchip.c.
>>> The tiny ARM specific part is directly implemented in the virtual interrupt
>>> controller (vgic.c) as it is done for powerpc for instance. This option was
>>> prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
>>> Hence irq_comm.c is not used at all.
>>>
>>> Routing currently is not used for anything else than irqfd IRQ injection. Only
>>> SPI can be injected. This means the vgic is not totally hidden behind the
>>> irqchip. There are separate discussions on PPI/SGI routing.
>>>
>>> Only level sensitive IRQs are supported (with a registered resampler). As a
>>> reminder the resampler is a second eventfd called by irqfd framework when the
>>> virtual IRQ is completed by the guest. This eventfd is supposed to be handled
>>> on user-side
>>>
>>> MSI routing is not supported yet.
>>>
>>> This work was tested with Calxeda Midway xgmac main interrupt (with and without
>>> explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
>>>
>>> changes v1 -> v2:
>>> 2 fixes:
>>> - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
>>>   This is now vgic_set_assigned_irq that increments it before injection.
>>> - v2 now handles the case where a pending assigned irq is cleared through
>>>   MMIO access. The irq is properly acked allowing the resamplefd handler
>>>   to possibly unmask the physical IRQ.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> Conflicts:
>>>         Documentation/virtual/kvm/api.txt
>>>         arch/arm/kvm/Kconfig
>>>
>>> Conflicts:
>>>         Documentation/virtual/kvm/api.txt
>>> ---
>>>  Documentation/virtual/kvm/api.txt |   4 +-
>>>  arch/arm/include/uapi/asm/kvm.h   |   8 +++
>>>  arch/arm/kvm/Kconfig              |   2 +
>>>  arch/arm/kvm/Makefile             |   1 +
>>>  arch/arm/kvm/irq.h                |  25 +++++++
>>>  virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
>>>  6 files changed, 174 insertions(+), 7 deletions(-)
>>>  create mode 100644 arch/arm/kvm/irq.h
>>>
>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>> index b4f5365..b376334 100644
>>> --- a/Documentation/virtual/kvm/api.txt
>>> +++ b/Documentation/virtual/kvm/api.txt
>>> @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>>>  4.52 KVM_SET_GSI_ROUTING
>>>
>>>  Capability: KVM_CAP_IRQ_ROUTING
>>> -Architectures: x86 ia64 s390
>>> +Architectures: x86 ia64 s390 arm
>>>  Type: vm ioctl
>>>  Parameters: struct kvm_irq_routing (in)
>>>  Returns: 0 on success, -1 on error
>>> @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
>>>  4.75 KVM_IRQFD
>>>
>>>  Capability: KVM_CAP_IRQFD
>>> -Architectures: x86 s390
>>> +Architectures: x86 s390 arm
>>>  Type: vm ioctl
>>>  Parameters: struct kvm_irqfd (in)
>>>  Returns: 0 on success, -1 on error
>>> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
>>> index ef0c878..89b864d 100644
>>> --- a/arch/arm/include/uapi/asm/kvm.h
>>> +++ b/arch/arm/include/uapi/asm/kvm.h
>>> @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
>>>  /* Highest supported SPI, from VGIC_NR_IRQS */
>>>  #define KVM_ARM_IRQ_GIC_MAX            127
>>>
>>> +/* needed by IRQ routing */
>>> +
>>> +/* One single KVM irqchip, ie. the VGIC */
>>> +#define KVM_NR_IRQCHIPS          1
>>> +
>>> +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
>>> +#define KVM_IRQCHIP_NUM_PINS     256
>>
>> Gahhhh... Please don't. We're trying hard to move away from hard-coded
>> definitions such as this, since GICv3 has much higher limits. And the
>> comment you've added perfectly outlines why this is such a bad idea
>> (even on GICv2, we can have up to 960 SPIs).
>>
>> Have a look at what's brewing in my kvm-arm64/vgic-dyn branch.
>>
> Looked at this a bit more, and the generic code does require this
> definition, but it can be arbitrarily high (it is not used to allocate
> tables or anything like that).  Could we not just define it to the
> highest possible number for now, and then deal with it if the number is
> too large for gicv3?

Actually KVM_IRQCHIP_NUM_PINS is used for static allocation  of chip[][]
in kernel routing table struct reminded below.As such it must be
reasonably sized I am afraid.

struct kvm_irq_routing_table {
	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
	struct kvm_kernel_irq_routing_entry *rt_entries;
	u32 nr_rt_entries;
	/*
	 * Array indexed by gsi. Each entry contains list of irq chips
	 * the gsi is connected to.
	 */
	struct hlist_head map[0];
};

As a temporary solution waiting for kvm-arm64/vgic-dyn code, may I use
KVM_ARM_IRQ_GIC_MAX as KVM_IRQCHIP_NUM_PINS value?
> 
> (There may be some more issues related to this and the identity_table
> setup code, but I'll let Eric investigate and comment on this).
What I understand from other architecture implementations like powerpc
or s390 is only a minimalist dummy routing table is created by default
on kernel side. This makes possible to dereference the routing table
pointer. They rely on user side to create the routing table using the
KVM_SET_GSI_ROUTING KVM IOCTL. This is typically done in
hw/intc/openpic_kvm.c, in realize function

    /* set up irq routing */
    kvm_init_irq_routing(kvm_state);
    for (i = 0; i < 256; ++i) {
        kvm_irqchip_add_irq_route(kvm_state, i, 0, i);
    }
    kvm_irqchip_commit_routes(s);

What confuses me is KVM API does not mention specifically that a routing
table must be explicitly built by the user-side before using KVM_IRQFD.
Building the identity table in the kernel enables to avoid such user
action and this is why I did the job in the kernel instead. I asked the
question explicitly on the kvm ML.

Best Regards

Eric

> 
> -Christoffer
>
diff mbox

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index b4f5365..b376334 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1339,7 +1339,7 @@  KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
 4.52 KVM_SET_GSI_ROUTING
 
 Capability: KVM_CAP_IRQ_ROUTING
-Architectures: x86 ia64 s390
+Architectures: x86 ia64 s390 arm
 Type: vm ioctl
 Parameters: struct kvm_irq_routing (in)
 Returns: 0 on success, -1 on error
@@ -2126,7 +2126,7 @@  into the hash PTE second double word).
 4.75 KVM_IRQFD
 
 Capability: KVM_CAP_IRQFD
-Architectures: x86 s390
+Architectures: x86 s390 arm
 Type: vm ioctl
 Parameters: struct kvm_irqfd (in)
 Returns: 0 on success, -1 on error
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index ef0c878..89b864d 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -192,6 +192,14 @@  struct kvm_arch_memory_slot {
 /* Highest supported SPI, from VGIC_NR_IRQS */
 #define KVM_ARM_IRQ_GIC_MAX		127
 
+/* needed by IRQ routing */
+
+/* One single KVM irqchip, ie. the VGIC */
+#define KVM_NR_IRQCHIPS          1
+
+/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
+#define KVM_IRQCHIP_NUM_PINS     256
+
 /* PSCI interface */
 #define KVM_PSCI_FN_BASE		0x95c1ba5e
 #define KVM_PSCI_FN(n)			(KVM_PSCI_FN_BASE + (n))
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 4be5bb1..096692c 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -24,6 +24,7 @@  config KVM
 	select KVM_MMIO
 	select KVM_ARM_HOST
 	depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
+	select HAVE_KVM_EVENTFD
 	---help---
 	  Support hosting virtualized guest machines. You will also
 	  need to select one or more of the processor modules below.
@@ -56,6 +57,7 @@  config KVM_ARM_VGIC
 	bool "KVM support for Virtual GIC"
 	depends on KVM_ARM_HOST && OF
 	select HAVE_KVM_IRQCHIP
+	select HAVE_KVM_IRQ_ROUTING
 	default y
 	---help---
 	  Adds support for a hardware assisted, in-kernel GIC emulation.
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 789bca9..29de111 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -21,4 +21,5 @@  obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
 obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
 obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
+obj-$(CONFIG_HAVE_KVM_EVENTFD) += $(KVM)/eventfd.o $(KVM)/irqchip.o
 obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h
new file mode 100644
index 0000000..4d6fcc6
--- /dev/null
+++ b/arch/arm/kvm/irq.h
@@ -0,0 +1,25 @@ 
+/*
+ * Copyright (C) 2014, STMicroelectronics
+ * Authors: Eric Auger <eric.auger@st.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef __IRQ_H
+#define __IRQ_H
+
+#include <linux/kvm_host.h>
+/*
+ * Placeholder for irqchip and irq/msi routing declarations
+ * included in irqchip.c
+ */
+
+#endif
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 56ff9be..39afa0d 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -93,6 +93,9 @@  static struct device_node *vgic_node;
 #define ACCESS_WRITE_VALUE	(3 << 1)
 #define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
 
+static struct kvm_irq_routing_entry identity_table[VGIC_NR_IRQS];
+static int set_default_routing_table(struct kvm *kvm);
+
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
 static void vgic_update_state(struct kvm *kvm);
 static void vgic_kick_vcpus(struct kvm *kvm);
@@ -408,11 +411,27 @@  static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
 					  struct kvm_exit_mmio *mmio,
 					  phys_addr_t offset)
 {
-	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	unsigned int i;
+	bool is_assigned_irq;
+	DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
+	DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
+	unsigned long *pending =
+		vgic_bitmap_get_shared_map(&dist->irq_state);
+	u32 *reg;
+	bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
+	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
 				       vcpu->vcpu_id, offset);
 	vgic_reg_access(mmio, reg, offset,
 			ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
 	if (mmio->is_write) {
+		pending = vgic_bitmap_get_shared_map(&dist->irq_state);
+		bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
+		for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
+			is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
+			if (is_assigned_irq)
+				kvm_notify_acked_irq(vcpu->kvm, 0, i);
+		}
 		vgic_update_state(vcpu->kvm);
 		return true;
 	}
@@ -1172,6 +1191,8 @@  static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	bool level_pending = false;
+	struct kvm *kvm;
+	int is_assigned_irq;
 
 	kvm_debug("MISR = %08x\n", vgic_cpu->vgic_misr);
 
@@ -1189,12 +1210,23 @@  static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 			vgic_irq_clear_active(vcpu, irq);
 			vgic_cpu->vgic_lr[lr] &= ~GICH_LR_EOI;
 
+			kvm = vcpu->kvm;
+			is_assigned_irq =
+				kvm_irq_has_notifier(kvm, 0, irq-VGIC_NR_PRIVATE_IRQS);
 			/* Any additional pending interrupt? */
-			if (vgic_dist_irq_is_pending(vcpu, irq)) {
-				vgic_cpu_irq_set(vcpu, irq);
-				level_pending = true;
-			} else {
+			if (is_assigned_irq) {
 				vgic_cpu_irq_clear(vcpu, irq);
+				kvm_debug("EOI irqchip routed vIRQ %d\n", irq);
+				kvm_notify_acked_irq(kvm, 0,
+					irq-VGIC_NR_PRIVATE_IRQS);
+				vgic_dist_irq_clear(vcpu, irq);
+			} else {
+				if (vgic_dist_irq_is_pending(vcpu, irq)) {
+					vgic_cpu_irq_set(vcpu, irq);
+					level_pending = true;
+				} else {
+					vgic_cpu_irq_clear(vcpu, irq);
+				}
 			}
 
 			/*
@@ -1627,6 +1659,8 @@  int kvm_vgic_create(struct kvm *kvm)
 	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
 	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
 
+	set_default_routing_table(kvm);
+
 out_unlock:
 	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
 		vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
@@ -2017,3 +2051,100 @@  struct kvm_device_ops kvm_arm_vgic_v2_ops = {
 	.get_attr = vgic_get_attr,
 	.has_attr = vgic_has_attr,
 };
+
+
+/*
+ * set up a default identity routing table
+ * The user-side can further change the routing table using
+ * KVM_SET_GSI_ROUTING VM ioctl
+ */
+
+static int set_default_routing_table(struct kvm *kvm)
+{
+	struct kvm_irq_routing_entry;
+	int i;
+	for (i = 0; i < VGIC_NR_IRQS; i++)	{
+		identity_table[i].gsi = i;
+		identity_table[i].type = KVM_IRQ_ROUTING_IRQCHIP;
+		identity_table[i].u.irqchip.irqchip = 0;
+		identity_table[i].u.irqchip.pin = i;
+	}
+	return kvm_set_irq_routing(kvm, identity_table,
+				   ARRAY_SIZE(identity_table), 0);
+}
+
+
+/*
+ * Functions needed for GSI routing (used by irqchip.c)
+ * implemented in irq_comm.c for x86 and ia64
+ * in architecture specific files for some other archictures (powerpc)
+ */
+
+static int vgic_set_assigned_irq(struct kvm_kernel_irq_routing_entry *e,
+			   struct kvm *kvm, int irq_source_id, int level,
+			   bool line_status)
+{
+	unsigned int spi = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
+
+	if (irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID) {
+		/*
+		 * This path is not tested yet,
+		 * only irqchip with resampler was exercised
+		 */
+		kvm_vgic_inject_irq(kvm, 0, spi, level);
+	} else if (irq_source_id == KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID) {
+		if (level == 1) {
+			kvm_debug("Inject irqchip routed vIRQ %d\n",
+					e->irqchip.pin);
+			kvm_vgic_inject_irq(kvm, 0, spi, level);
+			/*
+			 * toggling down vIRQ wire is directly handled in
+			 * process_maintenance for this reason:
+			 * irqfd_resampler_ack is called in
+			 * process_maintenance which holds the dist lock.
+			 * irqfd_resampler_ack calls kvm_set_irq
+			 * which ends_up calling kvm_vgic_inject_irq.
+			 * This later attempts to take the lock -> deadlock!
+			 */
+		}
+	}
+	return 0;
+
+}
+
+/* void implementation requested to compile irqchip.c */
+
+int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
+		struct kvm *kvm, int irq_source_id, int level, bool line_status)
+{
+	return 0;
+}
+
+int kvm_set_routing_entry(struct kvm_irq_routing_table *rt,
+			  struct kvm_kernel_irq_routing_entry *e,
+			  const struct kvm_irq_routing_entry *ue)
+{
+	int r = -EINVAL;
+
+	switch (ue->type) {
+	case KVM_IRQ_ROUTING_IRQCHIP:
+		e->set = vgic_set_assigned_irq;
+		e->irqchip.irqchip = ue->u.irqchip.irqchip;
+		e->irqchip.pin = ue->u.irqchip.pin;
+		if (e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS)
+			goto out;
+		/* chip[0][virtualID] = physicalID  */
+		rt->chip[ue->u.irqchip.irqchip][e->irqchip.pin] = ue->gsi;
+		break;
+	default:
+		goto out;
+	}
+
+	r = 0;
+out:
+	return r;
+}
+
+
+
+