| Message ID | 1342811652-16931-1-git-send-email-peter.maydell@linaro.org |
|------------|------------------------------------------------------------|
| State      | Superseded                                                  |
On 2012-07-20 21:14, Peter Maydell wrote:
> kvm_allows_irq0_override() is a totally x86 specific concept:
> move it to the target-specific source file where it belongs.
> This means we need a new header file for the prototype:
> kvm_i386.h, in line with the existing kvm_ppc.h.

First of all, the patch is inconsistent as it lacks something like
target-i386/kvm-stub.c (IOW, you forgot about kvm_allows_irq0_override
in kvm-stub.c). But the approach is wrong in general, see below.

> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> I'm sure this isn't the only x86ism in the KVM generic source
> files. However the thing I'm specifically trying to do is
> nuke all the uses of kvm_irqchip_in_kernel() in common code,

No, "irqchip in kernel" is supposed to be a generic concept. We will
also have it on Power. Not sure what your plans are for ARM, maybe it
will always be true there.

That said, maybe there is room for discussion about what it means for
the general KVM code and its users if the irqchip is in the kernel. Two
things that should be common for every arch:
- VCPU idle management is done inside the kernel
- in-kernel KVM helpers like vhost or VFIO can inject IRQs directly

The latter point implies that irqfd is available and that interrupt
routes from virtual IRQs (*) (like the one associated with an irqfd) to
the in-kernel IRQ controller have to be established. That's pretty
generic.

> as part of trying to untangle "what irqchip model do you want"
> from other aspects of the interrupt handling model.
>
> Other than this one, there are uses in cpus.c and hw/virtio-pci.c,
> but I haven't figured out yet exactly what those bits of code
> are trying to do...

See above for the two reasons.

Jan

(*) "GSI" actually means "physical or virtual IRQ line".
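For concreteness, the stub Jan says is missing could look roughly like this (a minimal sketch, not the actual patch; the file path target-i386/kvm-stub.c is taken from his mail, the header name from the patch description, and the return value is an assumption):

    /* target-i386/kvm-stub.c (hypothetical sketch): keeps callers such as
     * hw/pc.c linking when KVM support is compiled out. */
    #include <stdbool.h>
    #include "kvm_i386.h"   /* hypothetical header from the patch */

    bool kvm_allows_irq0_override(void)
    {
        /* Without KVM in the picture, nothing restricts IRQ0 remapping. */
        return true;
    }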
On 21 July 2012 07:57, Jan Kiszka <jan.kiszka@web.de> wrote: > On 2012-07-20 21:14, Peter Maydell wrote: >> I'm sure this isn't the only x86ism in the KVM generic source >> files. However the thing I'm specifically trying to do is >> nuke all the uses of kvm_irqchip_in_kernel() in common code, > > No, "irqchip in kernel" is supposed to be a generic concept. We will > also have it on Power. Not sure what your plans are for ARM, maybe it > will always be true there. I agree that "irqchip in kernel?" is generic (though as you'll see below there's disagreement about what that ought to mean or imply). "irq0_override" though seems to me to be absolutely x86 specific. > That said, maybe there is room for discussion about what it means for > the general KVM code and its users if the irqchip is in the kernel. Two > things that should be common for every arch: > - VCPU idle management is done inside the kernel The trouble is that at the moment QEMU assumes that "is the irqchip in kernel?" == "is VCPU idle management in kernel", for instance. For ARM, VCPU idle management is in kernel whether we're using the kernel's model of the VGIC or not. Alex tells me PPC is the same way. It's only x86 that has tied these two concepts together. The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel() is because I think they're all similar to this -- the common code is using the check as a proxy for something else, and it should be directly asking about that something else. The only bits of code that should care about "is the irqchip in kernel?" are: * target-specific device/machine setup code which needs to know which apic/etc to instantiate * target-specific x86 code which has this weird synchronous IRQ delivery model for irqchip-not-in-kernel (Obviously I might have missed something, I'm flailing around trying to understand this code :-)) > - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly > > The latter point implies that irqfd is available and that interrupt > routes from virtual IRQs (*) (like the one associated with an irqfd) to > the in-kernel IRQ controller have to be established. That's pretty generic. But you can perfectly well have an in-kernel-irqchip that doesn't support irqfd -- it just means that interrupts from devices have to come in via the ioctls same as anything else. Some in-kernel helpers obviously would depend on the existence and use of a full featured in-kernel irqchip (on ARM you don't get the in kernel timer unless you have in kernel VGIC), but I don't see why the virtio code should be asking "is there an in kernel irqchip?" rather than "can I do irqfd routing?" or whatever the question is it actually wants to ask. (In fact the virtio code probably needs to do something more complex anyway: you could perfectly well have a system where there is a full-featured irqchip in the kernel but the virtio device is on the "wrong" side of a second interrupt controller which is not in-kernel. So the actual question it needs to ask is "does the interrupt wiring in this specific machine model mean I can get and use an irqfd from where I am to the main CPU interrupt controller?" or something similar.) >> as part of trying to untangle "what irqchip model do you want" >> from other aspects of the interrupt handling model. >> >> Other than this one, there are uses in cpus.c and hw/virtio-pci.c, >> but I haven't figured out yet exactly what those bits of code >> are trying to do... > > See above for the two reasons. Thanks for the explanations. -- PMM
On 2012-07-21 10:54, Peter Maydell wrote:
> On 21 July 2012 07:57, Jan Kiszka <jan.kiszka@web.de> wrote:
>> On 2012-07-20 21:14, Peter Maydell wrote:
>>> I'm sure this isn't the only x86ism in the KVM generic source
>>> files. However the thing I'm specifically trying to do is
>>> nuke all the uses of kvm_irqchip_in_kernel() in common code,
>>
>> No, "irqchip in kernel" is supposed to be a generic concept. We will
>> also have it on Power. Not sure what your plans are for ARM, maybe it
>> will always be true there.
>
> I agree that "irqchip in kernel?" is generic (though as you'll see
> below there's disagreement about what that ought to mean or imply).
> "irq0_override" though seems to me to be absolutely x86 specific.

Naming is x86 specific, semantic not. It means that KVM doesn't prevent
remapping of IRQs. Granted, I really hope you don't make such mistakes
in your arch.

>> That said, maybe there is room for discussion about what it means for
>> the general KVM code and its users if the irqchip is in the kernel. Two
>> things that should be common for every arch:
>> - VCPU idle management is done inside the kernel
>
> The trouble is that at the moment QEMU assumes that "is the
> irqchip in kernel?" == "is VCPU idle management in kernel", for
> instance. For ARM, VCPU idle management is in kernel whether
> we're using the kernel's model of the VGIC or not. Alex tells
> me PPC is the same way. It's only x86 that has tied these two
> concepts together.

Hmm, and why does Power work despite this mismatch?

If cpu_thread_is_idle doesn't work for you, define something like
kvm_idle_in_kernel() to replace kvm_irqchip_in_kernel here.

> The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel()
> is because I think they're all similar to this -- the common code is
> using the check as a proxy for something else, and it should be directly
> asking about that something else. The only bits of code that should
> care about "is the irqchip in kernel?" are:
>  * target-specific device/machine setup code which needs to know
>    which apic/etc to instantiate
>  * target-specific x86 code which has this weird synchronous IRQ
>    delivery model for irqchip-not-in-kernel
> (Obviously I might have missed something, I'm flailing around
> trying to understand this code :-))
>
>> - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly
>>
>> The latter point implies that irqfd is available and that interrupt
>> routes from virtual IRQs (*) (like the one associated with an irqfd) to
>> the in-kernel IRQ controller have to be established. That's pretty generic.
>
> But you can perfectly well have an in-kernel-irqchip that doesn't
> support irqfd

You could, though this doesn't make much sense.

> -- it just means that interrupts from devices have
> to come in via the ioctls same as anything else. Some in-kernel
> helpers obviously would depend on the existence and use of a full
> featured in-kernel irqchip (on ARM you don't get the in kernel timer
> unless you have in kernel VGIC), but I don't see why the virtio code
> should be asking "is there an in kernel irqchip?" rather than "can
> I do irqfd routing?" or whatever the question is it actually wants
> to ask. (In fact the virtio code probably needs to do something
> more complex anyway: you could perfectly well have a system where
> there is a full-featured irqchip in the kernel but the virtio
> device is on the "wrong" side of a second interrupt controller
> which is not in-kernel. So the actual question it needs to ask
> is "does the interrupt wiring in this specific machine model mean
> I can get and use an irqfd from where I am to the main CPU
> interrupt controller?" or something similar.)

Well, same here: then define more precise generic test functions.

Jan
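A kvm_idle_in_kernel() helper of the kind Jan suggests might look like this (a sketch only, assuming the helper name from his mail; the per-arch default and the simplified cpu_thread_is_idle() below are illustrations, not the real QEMU code):

    /* kvm-all.c (sketch): ask the precise question instead of the proxy check. */
    bool kvm_idle_in_kernel(void)
    {
        /* x86: halted VCPUs stay in the kernel only with the in-kernel
         * irqchip; ARM and PPC would simply return true here. */
        return kvm_irqchip_in_kernel();
    }

    /* cpus.c (simplified sketch of the relevant check) */
    static bool cpu_thread_is_idle(CPUArchState *env)
    {
        if (!env->halted || qemu_cpu_has_work(env)) {
            return false;
        }
        /* A halted VCPU is not idle from the iothread's point of view
         * if the kernel itself wakes it up on interrupt delivery. */
        if (kvm_enabled() && kvm_idle_in_kernel()) {
            return false;
        }
        return true;
    }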
On 21 July 2012 10:14, Jan Kiszka <jan.kiszka@web.de> wrote: > On 2012-07-21 10:54, Peter Maydell wrote: >> On 21 July 2012 07:57, Jan Kiszka <jan.kiszka@web.de> wrote: >>> On 2012-07-20 21:14, Peter Maydell wrote: >>>> I'm sure this isn't the only x86ism in the KVM generic source >>>> files. However the thing I'm specifically trying to do is >>>> nuke all the uses of kvm_irqchip_in_kernel() in common code, >>> >>> No, "irqchip in kernel" is supposed to be a generic concept. We will >>> also have it on Power. Not sure what your plans are for ARM, maybe it >>> will always be true there. >> >> I agree that "irqchip in kernel?" is generic (though as you'll see >> below there's disagreement about what that ought to mean or imply). >> "irq0_override" though seems to me to be absolutely x86 specific. > > Naming is x86 specific, semantic not. It means that KVM doesn't prevent > remapping of IRQs. Granted, I really hope you don't make such mistakes > in your arch. What does "remapping of IRQs" mean here? This is still sounding not very generic to me, in that I really don't know what it would mean in an ARM context. The fact that the only caller of this is in hw/pc.c is also a big red flag that this isn't exactly generic. >>> That said, maybe there is room for discussion about what it means for >>> the general KVM code and its users if the irqchip is in the kernel. Two >>> things that should be common for every arch: >>> - VCPU idle management is done inside the kernel >> >> The trouble is that at the moment QEMU assumes that "is the >> irqchip in kernel?" == "is VCPU idle management in kernel", for >> instance. For ARM, VCPU idle management is in kernel whether >> we're using the kernel's model of the VGIC or not. Alex tells >> me PPC is the same way. It's only x86 that has tied these two >> concepts together. > > Hmm, and why does Power work despite this mismatch? I think because hw/ppc.c:ppc_set_irq() both calls cpu_interrupt() and also kvmppc_set_interrupt(), so we end up with a sort of odd mix of both models... Alex? > If cpu_thread_is_idle doesn't work for you, define something like > kvm_idle_in_kernel() to replace kvm_irqchip_in_kernel here. Yes, this is kind of where I'm headed. I thought I'd start with this patch as the easiest one first, though... >> The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel() >> is because I think they're all similar to this -- the common code is >> using the check as a proxy for something else, and it should be directly >> asking about that something else. The only bits of code that should >> care about "is the irqchip in kernel?" are: >> * target-specific device/machine setup code which needs to know >> which apic/etc to instantiate >> * target-specific x86 code which has this weird synchronous IRQ >> delivery model for irqchip-not-in-kernel >> (Obviously I might have missed something, I'm flailing around >> trying to understand this code :-)) >> >>> - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly >>> >>> The latter point implies that irqfd is available and that interrupt >>> routes from virtual IRQs (*) (like the one associated with an irqfd) to >>> the in-kernel IRQ controller have to be established. That's pretty generic. >> >> But you can perfectly well have an in-kernel-irqchip that doesn't >> support irqfd > > You could, thought this doesn't make much sense. Why doesn't it make sense? 
On ARM, in-kernel-irqchip means you can take advantage of the hardware support for a virtual GIC, and you can use the virtual timer support too. These are both big performance advantages even if QEMU never does anything with irqfds. (In fact the current ARM KVM VGIC code doesn't support irqfds as far as I can see from a quick scan of the kernel code.) >> -- it just means that interrupts from devices have >> to come in via the ioctls same as anything else. Some in-kernel >> helpers obviously would depend on the existence and use of a full >> featured in-kernel irqchip (on ARM you don't get the in kernel timer >> unless you have in kernel VGIC), but I don't see why the virtio code >> should be asking "is there an in kernel irqchip?" rather than "can >> I do irqfd routing?" or whatever the question is it actually wants >> to ask. (In fact the virtio code probably needs to do something >> more complex anyway: you could perfectly well have a system where >> there is a full-featured irqchip in the kernel but the virtio >> device is on the "wrong" side of a second interrupt controller >> which is not in-kernel. So the actual question it needs to ask >> is "does the interrupt wiring in this specific machine model mean >> I can get and use an irqfd from where I am to the main CPU >> interrupt controller?" or something similar.) > > Well, same here: then define more precise generic test functions. First I have to understand what the current code is trying to do; your explanations have been very helpful in that respect. -- PMM
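The mixed model Peter describes looks roughly like this (heavily abridged from the hw/ppc.c of that period; exact field and flag names may differ):

    /* hw/ppc.c (abridged sketch): userspace interrupt state is updated and,
     * when KVM is active, the same level change is also forwarded to the
     * kernel, so the two models stay in sync. */
    void ppc_set_irq(CPUPPCState *env, int n_IRQ, int level)
    {
        if (level) {
            env->pending_interrupts |= 1 << n_IRQ;
            cpu_interrupt(env, CPU_INTERRUPT_HARD);
        } else {
            env->pending_interrupts &= ~(1 << n_IRQ);
            if (env->pending_interrupts == 0) {
                cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
            }
        }
        if (kvm_enabled()) {
            kvmppc_set_interrupt(env, n_IRQ, level);
        }
    }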
On 2012-07-21 11:30, Peter Maydell wrote: > On 21 July 2012 10:14, Jan Kiszka <jan.kiszka@web.de> wrote: >> On 2012-07-21 10:54, Peter Maydell wrote: >>> On 21 July 2012 07:57, Jan Kiszka <jan.kiszka@web.de> wrote: >>>> On 2012-07-20 21:14, Peter Maydell wrote: >>>>> I'm sure this isn't the only x86ism in the KVM generic source >>>>> files. However the thing I'm specifically trying to do is >>>>> nuke all the uses of kvm_irqchip_in_kernel() in common code, >>>> >>>> No, "irqchip in kernel" is supposed to be a generic concept. We will >>>> also have it on Power. Not sure what your plans are for ARM, maybe it >>>> will always be true there. >>> >>> I agree that "irqchip in kernel?" is generic (though as you'll see >>> below there's disagreement about what that ought to mean or imply). >>> "irq0_override" though seems to me to be absolutely x86 specific. >> >> Naming is x86 specific, semantic not. It means that KVM doesn't prevent >> remapping of IRQs. Granted, I really hope you don't make such mistakes >> in your arch. > > What does "remapping of IRQs" mean here? This is still sounding It means that the QEMU model of the board can define interrupt routes in an unconfined way, which is obviously always true when the irqchips are all in userspace but not necessarily when KVM support is in the loop. > not very generic to me, in that I really don't know what it would > mean in an ARM context. The fact that the only caller of this is > in hw/pc.c is also a big red flag that this isn't exactly generic. x86 is also still the only arch with full in-kernel irqchip support. And even if there is only one arch using it, that doesn't mean the test needs to be moved around - if the test itself is generic, just always true for other archs. > >>>> That said, maybe there is room for discussion about what it means for >>>> the general KVM code and its users if the irqchip is in the kernel. Two >>>> things that should be common for every arch: >>>> - VCPU idle management is done inside the kernel >>> >>> The trouble is that at the moment QEMU assumes that "is the >>> irqchip in kernel?" == "is VCPU idle management in kernel", for >>> instance. For ARM, VCPU idle management is in kernel whether >>> we're using the kernel's model of the VGIC or not. Alex tells >>> me PPC is the same way. It's only x86 that has tied these two >>> concepts together. >> >> Hmm, and why does Power work despite this mismatch? > > I think because hw/ppc.c:ppc_set_irq() both calls cpu_interrupt() > and also kvmppc_set_interrupt(), so we end up with a sort of odd > mix of both models... Alex? > >> If cpu_thread_is_idle doesn't work for you, define something like >> kvm_idle_in_kernel() to replace kvm_irqchip_in_kernel here. > > Yes, this is kind of where I'm headed. I thought I'd start with this > patch as the easiest one first, though... Before moving anything, let's refine/break up the semantics of kvm_irqchip_in_kernel first. > >>> The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel() >>> is because I think they're all similar to this -- the common code is >>> using the check as a proxy for something else, and it should be directly >>> asking about that something else. The only bits of code that should >>> care about "is the irqchip in kernel?" 
are: >>> * target-specific device/machine setup code which needs to know >>> which apic/etc to instantiate >>> * target-specific x86 code which has this weird synchronous IRQ >>> delivery model for irqchip-not-in-kernel >>> (Obviously I might have missed something, I'm flailing around >>> trying to understand this code :-)) >>> >>>> - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly >>>> >>>> The latter point implies that irqfd is available and that interrupt >>>> routes from virtual IRQs (*) (like the one associated with an irqfd) to >>>> the in-kernel IRQ controller have to be established. That's pretty generic. >>> >>> But you can perfectly well have an in-kernel-irqchip that doesn't >>> support irqfd >> >> You could, thought this doesn't make much sense. > > Why doesn't it make sense? On ARM, in-kernel-irqchip means you can take > advantage of the hardware support for a virtual GIC, and you can use > the virtual timer support too. These are both big performance advantages > even if QEMU never does anything with irqfds. (In fact the current > ARM KVM VGIC code doesn't support irqfds as far as I can see from > a quick scan of the kernel code.) It doesn't make sense as it means your in-kernel irqchip model is semi-finished. If you didn't consider how to support direct in-kernel IRQ injections, you risk designing something that requires userspace quirk handling later on when extending it to full-featured in-kernel irqchip support. Of course, you are free to do this stepwise, but your digging through QEMU /wrt x86-specifics in the KVM layer should warn you about the risks. ;) Jan
On 21 July 2012 10:44, Jan Kiszka <jan.kiszka@web.de> wrote: > On 2012-07-21 11:30, Peter Maydell wrote: >> On 21 July 2012 10:14, Jan Kiszka <jan.kiszka@web.de> wrote: >>> On 2012-07-21 10:54, Peter Maydell wrote: >>>> On 21 July 2012 07:57, Jan Kiszka <jan.kiszka@web.de> wrote: >>> Naming is x86 specific, semantic not. It means that KVM doesn't prevent >>> remapping of IRQs. Granted, I really hope you don't make such mistakes >>> in your arch. >> >> What does "remapping of IRQs" mean here? > > It means that the QEMU model of the board can define interrupt routes in > an unconfined way, which is obviously always true when the irqchips are > all in userspace but not necessarily when KVM support is in the loop. Er, surely it's only false if your KVM irqchip is obviously broken? I mean, any sane in-kernel-irqchip model needs to support the same set of incoming irq lines as the out-of-kernel irqchip model it's replacing. There's no difference in flexibility there. Or are you trying to talk about defining interrupt routes when the source and destination are both kernel code but the route needs to be set by userspace (ie is machine specific not cpu specific)? Whether that's possible sounds to me like it would depend on all the board model code between the source and destination rather than being a single global boolean check. >> not very generic to me, in that I really don't know what it would >> mean in an ARM context. The fact that the only caller of this is >> in hw/pc.c is also a big red flag that this isn't exactly generic. > > x86 is also still the only arch with full in-kernel irqchip support. And > even if there is only one arch using it, that doesn't mean the test > needs to be moved around - if the test itself is generic, just always > true for other archs. I just don't see why this check is anything other than "do I have a broken x86 kernel irqchip?" In particular it doesn't have any documented semantics defined in generic terms that you could usefully use in general QEMU code to say "do I need to call this function and what should I be doing based on the result?" >>>>> - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly >>>>> >>>>> The latter point implies that irqfd is available and that interrupt >>>>> routes from virtual IRQs (*) (like the one associated with an irqfd) to >>>>> the in-kernel IRQ controller have to be established. That's pretty generic. >>>> >>>> But you can perfectly well have an in-kernel-irqchip that doesn't >>>> support irqfd >>> >>> You could, thought this doesn't make much sense. >> >> Why doesn't it make sense? On ARM, in-kernel-irqchip means you can take >> advantage of the hardware support for a virtual GIC, and you can use >> the virtual timer support too. These are both big performance advantages >> even if QEMU never does anything with irqfds. (In fact the current >> ARM KVM VGIC code doesn't support irqfds as far as I can see from >> a quick scan of the kernel code.) > > It doesn't make sense as it means your in-kernel irqchip model is > semi-finished. If you didn't consider how to support direct in-kernel > IRQ injections, you risk designing something that requires userspace > quirk handling later on when extending it to full-featured in-kernel > irqchip support. Well, the in-kernel virtual timer already does direct in-kernel IRQ injection to the VGIC: it calls a function to say "inject IRQ X"... 
(this works because the interrupt line used is fixed by the CPU, it's not a board model property so there is no need for the routing to be defined by userspace. (ie both ends of this irq injection are in the CPU proper.)) As far as I can tell you seem to be saying "irqfds are an extra feature you might want later", which is exactly "you can have an irqchip without them". (Is there some summary of the design of the irqfds stuff somewhere I can go and read up?) thanks -- PMM
On 2012-07-21 11:56, Peter Maydell wrote: > On 21 July 2012 10:44, Jan Kiszka <jan.kiszka@web.de> wrote: >> On 2012-07-21 11:30, Peter Maydell wrote: >>> On 21 July 2012 10:14, Jan Kiszka <jan.kiszka@web.de> wrote: >>>> On 2012-07-21 10:54, Peter Maydell wrote: >>>>> On 21 July 2012 07:57, Jan Kiszka <jan.kiszka@web.de> wrote: >>>> Naming is x86 specific, semantic not. It means that KVM doesn't prevent >>>> remapping of IRQs. Granted, I really hope you don't make such mistakes >>>> in your arch. >>> >>> What does "remapping of IRQs" mean here? >> >> It means that the QEMU model of the board can define interrupt routes in >> an unconfined way, which is obviously always true when the irqchips are >> all in userspace but not necessarily when KVM support is in the loop. > > Er, surely it's only false if your KVM irqchip is obviously broken? > I mean, any sane in-kernel-irqchip model needs to support the same set > of incoming irq lines as the out-of-kernel irqchip model it's replacing. > There's no difference in flexibility there. > > Or are you trying to talk about defining interrupt routes when the > source and destination are both kernel code but the route needs to > be set by userspace (ie is machine specific not cpu specific)? It describes this requirement primarily. > Whether that's possible sounds to me like it would depend on all > the board model code between the source and destination rather > than being a single global boolean check. It depends on the feature set of the in-kernel irqchips and if this can possibly vary on real hw. > >>> not very generic to me, in that I really don't know what it would >>> mean in an ARM context. The fact that the only caller of this is >>> in hw/pc.c is also a big red flag that this isn't exactly generic. >> >> x86 is also still the only arch with full in-kernel irqchip support. And >> even if there is only one arch using it, that doesn't mean the test >> needs to be moved around - if the test itself is generic, just always >> true for other archs. > > I just don't see why this check is anything other than "do I have > a broken x86 kernel irqchip?" In particular it doesn't have any > documented semantics defined in generic terms that you could usefully > use in general QEMU code to say "do I need to call this function > and what should I be doing based on the result?" Let's look at this again when we redefined the KVM test functions. Maybe then the infrastructure is available to move it cleanly. Maybe it will continue to not matter and we can leave it alone. > >>>>>> - in-kernel KVM helpers like vhost or VFIO can inject IRQs directly >>>>>> >>>>>> The latter point implies that irqfd is available and that interrupt >>>>>> routes from virtual IRQs (*) (like the one associated with an irqfd) to >>>>>> the in-kernel IRQ controller have to be established. That's pretty generic. >>>>> >>>>> But you can perfectly well have an in-kernel-irqchip that doesn't >>>>> support irqfd >>>> >>>> You could, thought this doesn't make much sense. >>> >>> Why doesn't it make sense? On ARM, in-kernel-irqchip means you can take >>> advantage of the hardware support for a virtual GIC, and you can use >>> the virtual timer support too. These are both big performance advantages >>> even if QEMU never does anything with irqfds. (In fact the current >>> ARM KVM VGIC code doesn't support irqfds as far as I can see from >>> a quick scan of the kernel code.) >> >> It doesn't make sense as it means your in-kernel irqchip model is >> semi-finished. 
If you didn't consider how to support direct in-kernel >> IRQ injections, you risk designing something that requires userspace >> quirk handling later on when extending it to full-featured in-kernel >> irqchip support. > > Well, the in-kernel virtual timer already does direct in-kernel IRQ > injection to the VGIC: it calls a function to say "inject IRQ X"... > (this works because the interrupt line used is fixed by the CPU, > it's not a board model property so there is no need for the routing > to be defined by userspace. (ie both ends of this irq injection are > in the CPU proper.)) Could you inject IRQs from an in-kernel helper that (partially or fully) emulates some device sitting on peripheral buses as well (like PCI)? If not, you aren't done with the in-kernel irqchip model yet. > > As far as I can tell you seem to be saying "irqfds are an extra > feature you might want later", which is exactly "you can have > an irqchip without them". Once the prerequisites for peripheral interrupt injections are there, enabling irqfd for your arch should be straightforward. I'm picking on those prerequisites here, not irqfd. > > (Is there some summary of the design of the irqfds stuff somewhere I > can go and read up?) linux/Documentation/virtual/kvm/api.txt is a good start, though not really a high-level summary. Jan
On 21 July 2012 11:22, Jan Kiszka <jan.kiszka@web.de> wrote: > On 2012-07-21 11:56, Peter Maydell wrote: >> Or are you trying to talk about defining interrupt routes when the >> source and destination are both kernel code but the route needs to >> be set by userspace (ie is machine specific not cpu specific)? > > It describes this requirement primarily. > >> Whether that's possible sounds to me like it would depend on all >> the board model code between the source and destination rather >> than being a single global boolean check. > > It depends on the feature set of the in-kernel irqchips and if this can > possibly vary on real hw. If the interrupt route is on-CPU then its routing is fixed (for that CPU), and you don't need to care about irqfds because the kernel knows what CPU it's providing to the guest, has both ends of the connection and can just do the right thing however is most convenient for the internal implementation. If the interrupt route is off-CPU then all bets are off because the routing is machine specific and could go through any kind of logic between the peripheral and the CPU's irqchip. I don't see how you can do this with QEMU's current IRQ infrastructure, which basically just hardwires everything with qemu_irq lines and doesn't provide any way to query the routing and logic from an irq source to its destination. >>>>>> But you can perfectly well have an in-kernel-irqchip that doesn't >>>>>> support irqfd >>>>> >>>>> You could, thought this doesn't make much sense. >>>> >>>> Why doesn't it make sense? On ARM, in-kernel-irqchip means you can take >>>> advantage of the hardware support for a virtual GIC, and you can use >>>> the virtual timer support too. These are both big performance advantages >>>> even if QEMU never does anything with irqfds. (In fact the current >>>> ARM KVM VGIC code doesn't support irqfds as far as I can see from >>>> a quick scan of the kernel code.) >>> >>> It doesn't make sense as it means your in-kernel irqchip model is >>> semi-finished. If you didn't consider how to support direct in-kernel >>> IRQ injections, you risk designing something that requires userspace >>> quirk handling later on when extending it to full-featured in-kernel >>> irqchip support. >> >> Well, the in-kernel virtual timer already does direct in-kernel IRQ >> injection to the VGIC: it calls a function to say "inject IRQ X"... >> (this works because the interrupt line used is fixed by the CPU, >> it's not a board model property so there is no need for the routing >> to be defined by userspace. (ie both ends of this irq injection are >> in the CPU proper.)) > > Could you inject IRQs from an in-kernel helper that (partially or fully) > emulates some device sitting on peripheral buses as well (like PCI)? If > not, you aren't done with the in-kernel irqchip model yet. This is still sounding like "there is an extra feature which you should probably implement at some point and should certainly design with the intention of supporting", not "you cannot have an irqchip without irqfds". Therefore QEMU code which cares about irqfds should be doing checks for irqfd functionality, not "is there an in kernel irqchip". >> As far as I can tell you seem to be saying "irqfds are an extra >> feature you might want later", which is exactly "you can have >> an irqchip without them". > > Once the prerequisites for peripheral interrupt injections are there, > enabling irqfd for your arch should be straightforward. I'm picking on > those prerequisites here, not irqfd. 
> >> >> (Is there some summary of the design of the irqfds stuff somewhere I >> can go and read up?) > > linux/Documentation/virtual/kvm/api.txt is a good start, though not > really a high-level summary. I looked for 'irqfd' in that and found a straightforward ioctl for "wire this eventfd up to this irqchip input". That doesn't say anything about remapping of IRQs, and it's not clear to me either why a straightforward "use an ioctl to deliver incoming interrupts" design would be broken by adding that later: it's just a different source for the interrupt... -- PMM
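For reference, the ioctl Peter found really is that small; wiring an eventfd to a guest interrupt line looks roughly like this (a sketch against the raw kernel ABI, error handling trimmed; the gsi number is whatever route the machine model has assigned):

    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Tell KVM: whenever this eventfd is signalled, raise "gsi" on the
     * in-kernel irqchip.  Returns the ioctl result, <0 on error. */
    static int wire_irqfd(int vm_fd, int gsi)
    {
        int efd = eventfd(0, 0);
        struct kvm_irqfd irqfd = { .fd = efd, .gsi = gsi };

        if (efd < 0) {
            return -1;
        }
        return ioctl(vm_fd, KVM_IRQFD, &irqfd);
    }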
On 2012-07-21 12:53, Peter Maydell wrote: > On 21 July 2012 11:22, Jan Kiszka <jan.kiszka@web.de> wrote: >> On 2012-07-21 11:56, Peter Maydell wrote: >>> Or are you trying to talk about defining interrupt routes when the >>> source and destination are both kernel code but the route needs to >>> be set by userspace (ie is machine specific not cpu specific)? >> >> It describes this requirement primarily. >> >>> Whether that's possible sounds to me like it would depend on all >>> the board model code between the source and destination rather >>> than being a single global boolean check. >> >> It depends on the feature set of the in-kernel irqchips and if this can >> possibly vary on real hw. > > If the interrupt route is on-CPU then its routing is fixed (for > that CPU), and you don't need to care about irqfds because the > kernel knows what CPU it's providing to the guest, has both ends > of the connection and can just do the right thing however is most > convenient for the internal implementation. If the interrupt route > is off-CPU then all bets are off because the routing is machine > specific and could go through any kind of logic between the peripheral > and the CPU's irqchip. > > I don't see how you can do this with QEMU's current IRQ infrastructure, > which basically just hardwires everything with qemu_irq lines and > doesn't provide any way to query the routing and logic from an > irq source to its destination. Routing from arbitrary sources to the in-kernel sink, skipping intermediate steps in the hotpath is in fact an unsolved issue in QEMU. We are just introducing band-aid for PCI on x86 (to introduce PCI device assignment). Long-term, this requires a generic solution which allows path discovery etc. But this is a userspace problem, nothing related to the KVM kernel features. > >>>>>>> But you can perfectly well have an in-kernel-irqchip that doesn't >>>>>>> support irqfd >>>>>> >>>>>> You could, thought this doesn't make much sense. >>>>> >>>>> Why doesn't it make sense? On ARM, in-kernel-irqchip means you can take >>>>> advantage of the hardware support for a virtual GIC, and you can use >>>>> the virtual timer support too. These are both big performance advantages >>>>> even if QEMU never does anything with irqfds. (In fact the current >>>>> ARM KVM VGIC code doesn't support irqfds as far as I can see from >>>>> a quick scan of the kernel code.) >>>> >>>> It doesn't make sense as it means your in-kernel irqchip model is >>>> semi-finished. If you didn't consider how to support direct in-kernel >>>> IRQ injections, you risk designing something that requires userspace >>>> quirk handling later on when extending it to full-featured in-kernel >>>> irqchip support. >>> >>> Well, the in-kernel virtual timer already does direct in-kernel IRQ >>> injection to the VGIC: it calls a function to say "inject IRQ X"... >>> (this works because the interrupt line used is fixed by the CPU, >>> it's not a board model property so there is no need for the routing >>> to be defined by userspace. (ie both ends of this irq injection are >>> in the CPU proper.)) >> >> Could you inject IRQs from an in-kernel helper that (partially or fully) >> emulates some device sitting on peripheral buses as well (like PCI)? If >> not, you aren't done with the in-kernel irqchip model yet. > > This is still sounding like "there is an extra feature which you should > probably implement at some point and should certainly design with the > intention of supporting", not "you cannot have an irqchip without irqfds". 
> > Therefore QEMU code which cares about irqfds should be doing > checks for irqfd functionality, not "is there an in kernel > irqchip". Defining some kvm_irqfd_available() is one thing. Ignoring irqfd "for now" while introducing in-kernel irqchip is another, less wise decision. >>> As far as I can tell you seem to be saying "irqfds are an extra >>> feature you might want later", which is exactly "you can have >>> an irqchip without them". >> >> Once the prerequisites for peripheral interrupt injections are there, >> enabling irqfd for your arch should be straightforward. I'm picking on >> those prerequisites here, not irqfd. >> >>> >>> (Is there some summary of the design of the irqfds stuff somewhere I >>> can go and read up?) >> >> linux/Documentation/virtual/kvm/api.txt is a good start, though not >> really a high-level summary. > > I looked for 'irqfd' in that and found a straightforward ioctl for > "wire this eventfd up to this irqchip input". That doesn't say anything > about remapping of IRQs, and it's not clear to me either why a > straightforward "use an ioctl to deliver incoming interrupts" design > would be broken by adding that later: it's just a different source > for the interrupt... Once you support the backend (KVM_SET_GSI_ROUTING + KVM_IRQ_LINE), adding irqfd is in fact simple. Jan
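A kvm_irqfd_available() test of the kind Jan mentions could be as small as this (sketch; the helper name comes from the discussion, KVM_CAP_IRQFD is the existing capability bit, and the way the result is cached here is an assumption):

    /* kvm-all.c (sketch): cache whether the kernel accepts irqfds for this
     * VM, so device code can ask a precise question instead of
     * "is there an in-kernel irqchip?". */
    static bool kvm_irqfds_supported;

    static void kvm_probe_irqfd(KVMState *s)
    {
        kvm_irqfds_supported = kvm_check_extension(s, KVM_CAP_IRQFD) > 0;
    }

    bool kvm_irqfd_available(void)
    {
        return kvm_enabled() && kvm_irqfds_supported;
    }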
On 21 July 2012 12:08, Jan Kiszka <jan.kiszka@web.de> wrote: > On 2012-07-21 12:53, Peter Maydell wrote: >> This is still sounding like "there is an extra feature which you should >> probably implement at some point and should certainly design with the >> intention of supporting", not "you cannot have an irqchip without irqfds". >> >> Therefore QEMU code which cares about irqfds should be doing >> checks for irqfd functionality, not "is there an in kernel >> irqchip". > > Defining some kvm_irqfd_available() is one thing. Ignoring irqfd "for > now" while introducing in-kernel irqchip is another, less wise decision. You still haven't really explained why we can't just ignore irqfd for now. I don't see how it would particularly affect the design of the kernel implementation very much, and the userspace interface seems to just be an extra ioctl. > Once you support the backend (KVM_SET_GSI_ROUTING + KVM_IRQ_LINE), > adding irqfd is in fact simple. I don't really understand where KVM_SET_GSI_ROUTING comes into this -- the documentation is a bit opaque. It says "Sets the GSI routing table entries" but it doesn't define what a GSI is or what we're routing to where. Googling suggests GSI is an APIC specific term so it doesn't sound like it's relevant for non-x86. I'm not sure what KVM_IRQ_LINE has to do with it either -- this is just the standard interrupt injection ioctl. KVM ARM supports this whether there's an in-kernel irqchip or not. -- PMM
On 2012-07-21 14:17, Peter Maydell wrote: > On 21 July 2012 12:08, Jan Kiszka <jan.kiszka@web.de> wrote: >> On 2012-07-21 12:53, Peter Maydell wrote: >>> This is still sounding like "there is an extra feature which you should >>> probably implement at some point and should certainly design with the >>> intention of supporting", not "you cannot have an irqchip without irqfds". >>> >>> Therefore QEMU code which cares about irqfds should be doing >>> checks for irqfd functionality, not "is there an in kernel >>> irqchip". >> >> Defining some kvm_irqfd_available() is one thing. Ignoring irqfd "for >> now" while introducing in-kernel irqchip is another, less wise decision. > > You still haven't really explained why we can't just ignore irqfd > for now. I don't see how it would particularly affect the design > of the kernel implementation very much, and the userspace interface > seems to just be an extra ioctl. I bet you ignored MSI so far. Physical IRQ lines are just a part of the whole picture. How are MSIs delivered on the systems you want to add? > >> Once you support the backend (KVM_SET_GSI_ROUTING + KVM_IRQ_LINE), >> adding irqfd is in fact simple. > > I don't really understand where KVM_SET_GSI_ROUTING comes into > this -- the documentation is a bit opaque. It says "Sets the GSI > routing table entries" but it doesn't define what a GSI is or > what we're routing to where. Googling suggests GSI is an APIC > specific term so it doesn't sound like it's relevant for non-x86. As I said before: "GSI" needs to be read as "physical or virtual IRQ line". The virtual ones can be of any source you define, irqfd is just one. > > I'm not sure what KVM_IRQ_LINE has to do with it either -- this > is just the standard interrupt injection ioctl. KVM ARM supports > this whether there's an in-kernel irqchip or not. KVM_IRQ_LINE injects according to the routing defined via KVM_SET_GSI_ROUTING, at least on x86 (and ia64). If you define your own KVM_IRQ_LINE semantics, you may have problems later on when you need to redefine it after adding IRQ routing support. Even if you don't want to add full irqchip support now, plan for it. Define the so far used interfaces in a way that they will work seamlessly when adding the missing bits and pieces. However, I still think it's better to skip the intermediate steps in order to avoid planning mistakes. Jan
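Concretely, the two ioctls Jan names fit together roughly like this (a raw sketch against the kernel ABI rather than QEMU's wrappers; the GSI, irqchip and pin numbers are made-up examples):

    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Route GSI 5 to pin 5 of in-kernel irqchip 0, then assert that GSI.
     * KVM_IRQ_LINE injects according to the routing table set up first. */
    static int route_and_inject(int vm_fd)
    {
        struct kvm_irq_routing *table;
        struct kvm_irq_level line = { .irq = 5, .level = 1 };
        int ret;

        table = calloc(1, sizeof(*table) + sizeof(table->entries[0]));
        if (!table) {
            return -1;
        }
        table->nr = 1;
        table->entries[0].gsi = 5;
        table->entries[0].type = KVM_IRQ_ROUTING_IRQCHIP;
        table->entries[0].u.irqchip.irqchip = 0;
        table->entries[0].u.irqchip.pin = 5;

        ret = ioctl(vm_fd, KVM_SET_GSI_ROUTING, table);
        free(table);
        if (ret < 0) {
            return ret;
        }
        return ioctl(vm_fd, KVM_IRQ_LINE, &line);
    }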
On 21 July 2012 13:35, Jan Kiszka <jan.kiszka@web.de> wrote: > On 2012-07-21 14:17, Peter Maydell wrote: >> You still haven't really explained why we can't just ignore irqfd >> for now. I don't see how it would particularly affect the design >> of the kernel implementation very much, and the userspace interface >> seems to just be an extra ioctl. > > I bet you ignored MSI so far. Physical IRQ lines are just a part of the > whole picture. How are MSIs delivered on the systems you want to add? You're using random acronyms without defining them again. It looks as if MSI is a PCI specific term. That would seem to me to fall under the heading of "routing across a board model" which we can't do anyway, because you have no idea how this all wired up, it will depend on the details of the SoC and the PCI controller. (As it happens the initial board model doesn't have PCI support; most ARM boards don't.) I'm not entirely sure we want to have "in kernel random SoC-specific PCI controller"... [Point taken that thought is required here, though.] >>> Once you support the backend (KVM_SET_GSI_ROUTING + KVM_IRQ_LINE), >>> adding irqfd is in fact simple. >> >> I don't really understand where KVM_SET_GSI_ROUTING comes into >> this -- the documentation is a bit opaque. It says "Sets the GSI >> routing table entries" but it doesn't define what a GSI is or >> what we're routing to where. Googling suggests GSI is an APIC >> specific term so it doesn't sound like it's relevant for non-x86. > > As I said before: "GSI" needs to be read as "physical or virtual IRQ > line". The virtual ones can be of any source you define, irqfd is just one. What's a virtual irq line in this context? We're modelling a physical bit of hardware which has N interrupt lines, so I'm not sure what a virtual irq line would be or how it would appear to the guest... -- PMM
On 2012-07-21 14:57, Peter Maydell wrote: > On 21 July 2012 13:35, Jan Kiszka <jan.kiszka@web.de> wrote: >> On 2012-07-21 14:17, Peter Maydell wrote: >>> You still haven't really explained why we can't just ignore irqfd >>> for now. I don't see how it would particularly affect the design >>> of the kernel implementation very much, and the userspace interface >>> seems to just be an extra ioctl. >> >> I bet you ignored MSI so far. Physical IRQ lines are just a part of the >> whole picture. How are MSIs delivered on the systems you want to add? > > You're using random acronyms without defining them again. It looks > as if MSI is a PCI specific term. That would seem to me to fall > under the heading of "routing across a board model" which we can't > do anyway, because you have no idea how this all wired up, it > will depend on the details of the SoC and the PCI controller. For sure you can. You need to discover those wiring, cache it, and then let the source inject to the final destination directly. See the INTx routing notifier and pci_device_route_intx_to_irq from [1] for that simplistic approach we are taking on the x86/PC architecture. > (As it happens the initial board model doesn't have PCI support; > most ARM boards don't.) I'm not entirely sure we want to have > "in kernel random SoC-specific PCI controller"... Isn't ARM going after server scenarios as well? Will be hard without some PCI support. The good news is that you likely won't need a full in-kernel PCI model for this (we don't do so on x86 as well). > > [Point taken that thought is required here, though.] > >>>> Once you support the backend (KVM_SET_GSI_ROUTING + KVM_IRQ_LINE), >>>> adding irqfd is in fact simple. >>> >>> I don't really understand where KVM_SET_GSI_ROUTING comes into >>> this -- the documentation is a bit opaque. It says "Sets the GSI >>> routing table entries" but it doesn't define what a GSI is or >>> what we're routing to where. Googling suggests GSI is an APIC >>> specific term so it doesn't sound like it's relevant for non-x86. >> >> As I said before: "GSI" needs to be read as "physical or virtual IRQ >> line". The virtual ones can be of any source you define, irqfd is just one. > > What's a virtual irq line in this context? We're modelling a physical > bit of hardware which has N interrupt lines, so I'm not sure what > a virtual irq line would be or how it would appear to the guest... A virtual line is an input of the in-kernel IRQ router you configure via SET_GSI_ROUTING. A physical line is a potential output of it that goes into some in-kernel interrupt controller model. It can also be an interrupt message sent to a specific CPU - provided the arch supports such a delivery protocol. Of course, the current router was modeled after x86 and ia64. So I wouldn't be surprised if some ARM system configuration cannot be expressed this way. Then we need to discuss reasonable extensions. But it should provide a sound foundation at least. Jan [1] http://permalink.gmane.org/gmane.comp.emulators.qemu/160792
On Sat, 21 Jul 2012 15:16:56 +0200 Jan Kiszka <jan.kiszka@web.de> wrote: > On 2012-07-21 14:57, Peter Maydell wrote: > > On 21 July 2012 13:35, Jan Kiszka <jan.kiszka@web.de> wrote: > >> On 2012-07-21 14:17, Peter Maydell wrote: > >>> You still haven't really explained why we can't just ignore irqfd > >>> for now. I don't see how it would particularly affect the design > >>> of the kernel implementation very much, and the userspace interface > >>> seems to just be an extra ioctl. > >> > >> I bet you ignored MSI so far. Physical IRQ lines are just a part of the > >> whole picture. How are MSIs delivered on the systems you want to add? > > > > You're using random acronyms without defining them again. It looks > > as if MSI is a PCI specific term. That would seem to me to fall > > under the heading of "routing across a board model" which we can't > > do anyway, because you have no idea how this all wired up, it > > will depend on the details of the SoC and the PCI controller. > > For sure you can. You need to discover those wiring, cache it, and then > let the source inject to the final destination directly. See the INTx > routing notifier and pci_device_route_intx_to_irq from [1] for that > simplistic approach we are taking on the x86/PC architecture. > >>>> Once you support the backend (KVM_SET_GSI_ROUTING + KVM_IRQ_LINE), > >>>> adding irqfd is in fact simple. > >>> > >>> I don't really understand where KVM_SET_GSI_ROUTING comes into > >>> this -- the documentation is a bit opaque. It says "Sets the GSI > >>> routing table entries" but it doesn't define what a GSI is or > >>> what we're routing to where. Googling suggests GSI is an APIC > >>> specific term so it doesn't sound like it's relevant for non-x86. > >> > >> As I said before: "GSI" needs to be read as "physical or virtual IRQ > >> line". The virtual ones can be of any source you define, irqfd is just one. > > > > What's a virtual irq line in this context? We're modelling a physical > > bit of hardware which has N interrupt lines, so I'm not sure what > > a virtual irq line would be or how it would appear to the guest... > > A virtual line is an input of the in-kernel IRQ router you configure via > SET_GSI_ROUTING. A physical line is a potential output of it that goes > into some in-kernel interrupt controller model. It can also be an > interrupt message sent to a specific CPU - provided the arch supports > such a delivery protocol. > > Of course, the current router was modeled after x86 and ia64. So I > wouldn't be surprised if some ARM system configuration cannot be > expressed this way. Then we need to discuss reasonable extensions. But > it should provide a sound foundation at least. OK, so I was reading through this thread since I want to add irqfd support for s390, but we don't have any kind of "irqchip". The understanding I got so far is that !s390 architectures have some kind of mechanism that allows them to "route" an interrupt between a device and a cpu, meaning that there's a fixed tie-in between a device and a cpu. If that's correct, I don't see how to model irqfds via this irqchip infrastructure for s390. Here's in a nutshell how (external and I/O) interrupts work on s390: - Interrupts have an internal prioritation, that means different types of interrupts (external, I/O, machine check, ...) take precedent over other types - External and I/O interrupts are "floating", i. e. they are not tied to a specific cpu, but can be delivered to any cpu that has external resp. 
I/O interrupts enabled - Interrupts take payload with them that defines which device they are for So, for example, if a specific subchannel (=device) has pending status and an I/O interrupt is to be generated, this interrupt remains pending until an arbitrary cpu is enabled for I/O interrupts. If several cpus are enabled for I/O interrupts, any of them may be interrupted. When an I/O interrupt is delivered on a cpu, the cpu's lowcore contains the interrupt payload which defines the subchannel (=device) the interrupt is for. Any idea on how this architecture can be married with the irqchip concept is welcome. If all else fails, would a special irqfd concept for !irqchip be acceptable? Cornelia
On 07/23/2012 03:04 PM, Cornelia Huck wrote: > > OK, so I was reading through this thread since I want to add irqfd > support for s390, but we don't have any kind of "irqchip". > > The understanding I got so far is that !s390 architectures have some > kind of mechanism that allows them to "route" an interrupt between a > device and a cpu, meaning that there's a fixed tie-in between a device > and a cpu. It's not fixed, but it changes rarely. Various interrupt routers can be programmed to change the route, but guests do so only rarely. > If that's correct, I don't see how to model irqfds via > this irqchip infrastructure for s390. > > Here's in a nutshell how (external and I/O) interrupts work on s390: > > - Interrupts have an internal prioritation, that means different types > of interrupts (external, I/O, machine check, ...) take precedent over > other types > > - External and I/O interrupts are "floating", i. e. they are not tied > to a specific cpu, but can be delivered to any cpu that has external > resp. I/O interrupts enabled > > - Interrupts take payload with them that defines which device they are > for > > So, for example, if a specific subchannel (=device) has pending status > and an I/O interrupt is to be generated, this interrupt remains pending > until an arbitrary cpu is enabled for I/O interrupts. If several cpus > are enabled for I/O interrupts, any of them may be interrupted. This may be costly to emulate. On x86 we do not have access to a guest's interrupt status while it is running. Is this not the case for s390? Oh, let me guess. You write some interrupt descriptor in memory somewhere, issue one of your famous instructions, and the hardware finds a guest vcpu and injects the interrupt. x86 has a "least priority" mode which is similar to what you're describing, but I don't think we emulate it correctly. > When an > I/O interrupt is delivered on a cpu, the cpu's lowcore contains the > interrupt payload which defines the subchannel (=device) the interrupt > is for. > > Any idea on how this architecture can be married with the irqchip > concept is welcome. If all else fails, would a special irqfd concept > for !irqchip be acceptable? I don't see an issue. You need an arch-specific irqfd configuration ioctl (or your own data field in the existing ioctl) that defines the payload. Then the kernel installs a poll function on that eventfd that, when called, does the magic sequence needed to get the interrupt there. While you don't have an irqchip, you do have asynchronous interrupt injection, yes? That's what irqchip really is all about.
On 23 July 2012 13:18, Avi Kivity <avi@redhat.com> wrote: > While you don't have an irqchip, you do have asynchronous interrupt > injection, yes? That's what irqchip really is all about. This seems an odd point of view -- async interrupt injection doesn't have anything to do with whether we're modelling the irqchip in the kernel or in QEMU, I thought... -- PMM
On 07/21/2012 11:54 AM, Peter Maydell wrote: > On 21 July 2012 07:57, Jan Kiszka <jan.kiszka@web.de> wrote: >> On 2012-07-20 21:14, Peter Maydell wrote: >>> I'm sure this isn't the only x86ism in the KVM generic source >>> files. However the thing I'm specifically trying to do is >>> nuke all the uses of kvm_irqchip_in_kernel() in common code, >> >> No, "irqchip in kernel" is supposed to be a generic concept. We will >> also have it on Power. Not sure what your plans are for ARM, maybe it >> will always be true there. > > I agree that "irqchip in kernel?" is generic (though as you'll see > below there's disagreement about what that ought to mean or imply). > "irq0_override" though seems to me to be absolutely x86 specific. > >> That said, maybe there is room for discussion about what it means for >> the general KVM code and its users if the irqchip is in the kernel. Two >> things that should be common for every arch: >> - VCPU idle management is done inside the kernel > > The trouble is that at the moment QEMU assumes that "is the > irqchip in kernel?" == "is VCPU idle management in kernel", for > instance. For ARM, VCPU idle management is in kernel whether > we're using the kernel's model of the VGIC or not. Alex tells > me PPC is the same way. It's only x86 that has tied these two > concepts together. Really, "irqchip in kernel" means asynchronous interrupts - you can inject an interrupt from outside the vcpu thread. Obviously if the vcpu is sleeping you need to wake it up and that pulls in idle management. "irqchip" for x86 really means the local APIC, which has a synchronous interface with the cpu core. "local APIC in kernel" really is equivalent to "kernel idle management", "KVM_IRQ_LINE", and irqfd. The ioapic and pit, on the other hand, don't contribute anything to this (just performance). So yes, ARM with and without GIC are irqchip_in_kernel, since the ARM<->GIC interface is asynchronous. Whether to emulate the GIC or not is just a performance question. > The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel() > is because I think they're all similar to this -- the common code is > using the check as a proxy for something else, and it should be directly > asking about that something else. The only bits of code that should > care about "is the irqchip in kernel?" are: > * target-specific device/machine setup code which needs to know > which apic/etc to instantiate > * target-specific x86 code which has this weird synchronous IRQ > delivery model for irqchip-not-in-kernel > (Obviously I might have missed something, I'm flailing around > trying to understand this code :-)) Agree naming should be improved. In fact the early series I pushed to decompose local apic, ioapic, and pic, but that didn't happen. If it did we'd probably not have this conversation.
On 07/23/2012 03:25 PM, Peter Maydell wrote:
> On 23 July 2012 13:18, Avi Kivity <avi@redhat.com> wrote:
>> While you don't have an irqchip, you do have asynchronous interrupt
>> injection, yes? That's what irqchip really is all about.
>
> This seems an odd point of view -- async interrupt injection
> doesn't have anything to do with whether we're modelling
> the irqchip in the kernel or in QEMU, I thought...

It does on x86. The relationship between the APIC and the core is
synchronous - the APIC presents the interrupt, the core grabs it when it
is ready (interrupt flag, etc.) and signals the APIC it has done so.
The APIC then clears the interrupt from the interrupt request register
and sets it in the interrupt status register. This sequence has to be
done while the vcpu is stopped, since we don't have access to the
interrupt flag otherwise.
On 07/23/2012 03:31 PM, Avi Kivity wrote: > On 07/23/2012 03:25 PM, Peter Maydell wrote: >> On 23 July 2012 13:18, Avi Kivity <avi@redhat.com> wrote: >>> While you don't have an irqchip, you do have asynchronous interrupt >>> injection, yes? That's what irqchip really is all about. >> >> This seems an odd point of view -- async interrupt injection >> doesn't have anything to do with whether we're modelling >> the irqchip in the kernel or in QEMU, I thought... > > It does on x86. The relationship between the APIC and the core is > synchronous - the APIC presents the interrupt, the core grabs is when it > is ready (interrupt flag, etc.) and signals the APIC it has done so. > The APIC then clears the interrupt from the interrupt request register > and sets it in the interrupt status register. This sequence has to be > done while the vcpu is stopped, since we don't have access to the > interrupt flag otherwise. Again, this is just the local APIC. The IOAPIC (which is x86 equivalent to the GIC, IIUC) doesn't change this picture and could have been emulated in userspace even with async interrupts. As it happens local APIC emulation pulls in the IOAPIC and PIC. So my view is that ARM with and without kernel GIC are irqchip_in_kernel, since whatever is the local APIC in ARM is always emulated in the kernel.
On 23 July 2012 13:26, Avi Kivity <avi@redhat.com> wrote: > Really, "irqchip in kernel" means asynchronous interrupts - you can > inject an interrupt from outside the vcpu thread. Obviously if the vcpu > is sleeping you need to wake it up and that pulls in idle management. > > "irqchip" for x86 really means the local APIC, which has a synchronous > interface with the cpu core. "local APIC in kernel" really is > equivalent to "kernel idle management", "KVM_IRQ_LINE", and irqfd. The > ioapic and pit, on the other hand, don't contribute anything to this > (just performance). > So yes, ARM with and without GIC are irqchip_in_kernel, since the > ARM<->GIC interface is asynchronous. Whether to emulate the GIC or not > is just a performance question. So should we be using something other than KVM_CREATE_IRQCHIP to ask the kernel to create a GIC model for us (and leave KVM_CREATE_IRQCHIP as a dummy "always succeed" ioctl)? > So my view is that ARM with and without kernel GIC are > irqchip_in_kernel, since whatever is the local APIC in ARM is always > emulated in the kernel. I'm not sure ARM has any equivalent to the local APIC -- the GIC deals with everything and we don't have any equivalent division of labour to the x86 LAPIC-IOAPIC one. -- PMM
On Mon, 23 Jul 2012 15:18:49 +0300 Avi Kivity <avi@redhat.com> wrote: > > So, for example, if a specific subchannel (=device) has pending status > > and an I/O interrupt is to be generated, this interrupt remains pending > > until an arbitrary cpu is enabled for I/O interrupts. If several cpus > > are enabled for I/O interrupts, any of them may be interrupted. > > This may be costly to emulate. On x86 we do not have access to a > guest's interrupt status while it is running. Is this not the case for > s390? > > Oh, let me guess. You write some interrupt descriptor in memory > somewhere, issue one of your famous instructions, and the hardware finds > a guest vcpu and injects the interrupt. Basically, we have some flags in our control block we can set so that the cpu drops out of SIE whenever external/I/O/... interrupts are enabled and then have the host do the lowcore updates, psw swaps, etc. > > x86 has a "least priority" mode which is similar to what you're > describing, but I don't think we emulate it correctly. > > > When an > > I/O interrupt is delivered on a cpu, the cpu's lowcore contains the > > interrupt payload which defines the subchannel (=device) the interrupt > > is for. > > > > Any idea on how this architecture can be married with the irqchip > > concept is welcome. If all else fails, would a special irqfd concept > > for !irqchip be acceptable? > > I don't see an issue. You need an arch-specific irqfd configuration > ioctl (or your own data field in the existing ioctl) that defines the > payload. Then the kernel installs a poll function on that eventfd that, > when called, does the magic sequence needed to get the interrupt there. If extending the existing ioctl is acceptable, I think we will go that route. > While you don't have an irqchip, you do have asynchronous interrupt > injection, yes? That's what irqchip really is all about. You mean injection via ioctl() that is asynchronous to vcpu execution? Yes, although we use a different ioctl than the others. Cornelia
On 07/23/2012 03:58 PM, Peter Maydell wrote: > On 23 July 2012 13:26, Avi Kivity <avi@redhat.com> wrote: >> Really, "irqchip in kernel" means asynchronous interrupts - you can >> inject an interrupt from outside the vcpu thread. Obviously if the vcpu >> is sleeping you need to wake it up and that pulls in idle management. >> >> "irqchip" for x86 really means the local APIC, which has a synchronous >> interface with the cpu core. "local APIC in kernel" really is >> equivalent to "kernel idle management", "KVM_IRQ_LINE", and irqfd. The >> ioapic and pit, on the other hand, don't contribute anything to this >> (just performance). > >> So yes, ARM with and without GIC are irqchip_in_kernel, since the >> ARM<->GIC interface is asynchronous. Whether to emulate the GIC or not >> is just a performance question. > > So should we be using something other than KVM_CREATE_IRQCHIP to > ask the kernel to create a GIC model for us (and leave KVM_CREATE_IRQCHIP > as a dummy "always succeed" ioctl)? Some time ago I suggested using the parameter to KVM_CREATE_IRQCHIP to select the "irqchip type". > >> So my view is that ARM with and without kernel GIC are >> irqchip_in_kernel, since whatever is the local APIC in ARM is always >> emulated in the kernel. > > I'm not sure ARM has any equivalent to the local APIC -- the GIC > deals with everything and we don't have any equivalent division > of labour to the x86 LAPIC-IOAPIC one. It's probably a tiny part of the core with no name. The point is that the x86<->lapic interface is synchronous and bidirectional, while the lapic<->ioapic interface is asynchronous (it is still bidirectional, but not in a stop-the-vcpu way). I assume the ARM<->GIC interface is unidirectional?
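What the suggestion above might look like from userspace, purely as a hypothetical sketch: KVM_CREATE_IRQCHIP takes no argument at this point, so the type constant and the extra parameter below are illustration only, not existing kernel ABI.

#include <sys/ioctl.h>
#include <linux/kvm.h>

#define KVM_IRQCHIP_TYPE_GIC 1   /* made-up value, for illustration only */

/* Ask the kernel for a specific irqchip model via the (hypothetical) typed
 * variant of KVM_CREATE_IRQCHIP; today's kernels take no argument here. */
static int kvm_create_irqchip_typed(int vmfd, unsigned long type)
{
    return ioctl(vmfd, KVM_CREATE_IRQCHIP, type);
}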
On 07/23/2012 04:06 PM, Cornelia Huck wrote: > On Mon, 23 Jul 2012 15:18:49 +0300 > Avi Kivity <avi@redhat.com> wrote: > >> > So, for example, if a specific subchannel (=device) has pending status >> > and an I/O interrupt is to be generated, this interrupt remains pending >> > until an arbitrary cpu is enabled for I/O interrupts. If several cpus >> > are enabled for I/O interrupts, any of them may be interrupted. >> >> This may be costly to emulate. On x86 we do not have access to a >> guest's interrupt status while it is running. Is this not the case for >> s390? >> >> Oh, let me guess. You write some interrupt descriptor in memory >> somewhere, issue one of your famous instructions, and the hardware finds >> a guest vcpu and injects the interrupt. > > Basically, we have some flags in our control block we can set so that > the cpu drops out of SIE whenever external/I/O/... interrupts are > enabled and then have the host do the lowcore updates, psw swaps, etc. Can you write them from a different cpu and expect them to take effect? How do you emulate an interrupt with a large guest? You have to update the flags in the control blocks for all vcpus; then restore them when the interrupt is delivered? >> I don't see an issue. You need an arch-specific irqfd configuration >> ioctl (or your own data field in the existing ioctl) that defines the >> payload. Then the kernel installs a poll function on that eventfd that, >> when called, does the magic sequence needed to get the interrupt there. > > If extending the existing ioctl is acceptable, I think we will go that > route. Sure. >> While you don't have an irqchip, you do have asynchronous interrupt >> injection, yes? That's what irqchip really is all about. > > You mean injection via ioctl() that is asynchronous to vcpu execution? > Yes, although we use a different ioctl than the others. Ok. The difference between irqfd and the ioctl is that with irqfd everything (the payload in your case, the interrupt number for others) is prepared beforehand, while with the ioctl the extra information is delivered with the ioctl.
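The distinction Avi draws can be sketched as follows; the generic KVM_IRQ_LINE ioctl stands in for the arch-specific injection ioctl Cornelia mentions, and the GSI numbers and the missing error handling are placeholders.

#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* irqfd style: the fd<->GSI association (and, for MSI, the payload behind
 * that GSI) is configured once, up front. */
static int setup_irqfd(int vmfd, unsigned int gsi)
{
    int efd = eventfd(0, 0);
    struct kvm_irqfd assign = { .fd = efd, .gsi = gsi };

    ioctl(vmfd, KVM_IRQFD, &assign);
    return efd;
}

/* ... so raising the interrupt later carries no extra information at all. */
static void trigger_irqfd(int efd)
{
    eventfd_write(efd, 1);
}

/* ioctl style: the details travel with every single injection. */
static void trigger_via_ioctl(int vmfd, unsigned int gsi)
{
    struct kvm_irq_level args = { .irq = gsi, .level = 1 };

    ioctl(vmfd, KVM_IRQ_LINE, &args);
}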
On 23 July 2012 14:09, Avi Kivity <avi@redhat.com> wrote: > On 07/23/2012 03:58 PM, Peter Maydell wrote: >> So should we be using something other than KVM_CREATE_IRQCHIP to >> ask the kernel to create a GIC model for us (and leave KVM_CREATE_IRQCHIP >> as a dummy "always succeed" ioctl)? > > Some time ago I suggested using the parameter to KVM_CREATE_IRQCHIP to > select the "irqchip type". That seems reasonable, although we have an awkward ordering issue in KVM as it stands: KVM_CREATE_IRQCHIP needs to be called before we start creating VCPU threads (at the moment it is done via kvm_init) but we don't know what kind of irqchip we want to create until the machine model code actually creates the irqchip device, which generally happens after the CPU object is created (and VCPU threads are created at that point). We could fix that by moving creation of the actual VCPU thread to vl.c after the model has been initialized. >> I'm not sure ARM has any equivalent to the local APIC -- the GIC >> deals with everything and we don't have any equivalent division >> of labour to the x86 LAPIC-IOAPIC one. > > It's probably a tiny part of the core with no name. The point is that > the x86<->lapic interface is synchronous and bidirectional, while the > lapic<->ioapic interface is asynchronous (it is still bidirectional, but > not in a stop-the-vcpu way). I assume the ARM<->GIC interface is > unidirectional? Well, strictly speaking the ARM<->GIC interface is implementation defined, but in practice you can think of it as "the GIC controls the IRQ and FIQ input lines to each core and uses them to signal that an interrupt is present". There's no need for anything to be signalled back in the other direction: the GIC will just continue to hold IRQ asserted until the interrupt handler code writes the relevant GIC register to indicate that the interrupt has been handled. (On a core with the virtualization extensions there are also signals for the GIC to raise a virtual IRQ or FIQ, but we can ignore those for KVM because we don't and can't provide the virtualization extensions to a guest.) -- PMM
On 07/23/2012 04:27 PM, Peter Maydell wrote: > On 23 July 2012 14:09, Avi Kivity <avi@redhat.com> wrote: >> On 07/23/2012 03:58 PM, Peter Maydell wrote: >>> So should we be using something other than KVM_CREATE_IRQCHIP to >>> ask the kernel to create a GIC model for us (and leave KVM_CREATE_IRQCHIP >>> as a dummy "always succeed" ioctl)? >> >> Some time ago I suggested using the parameter to KVM_CREATE_IRQCHIP to >> select the "irqchip type". > > That seems reasonable, although we have an awkward ordering issue > in KVM as it stands: KVM_CREATE_IRQCHIP needs to be called before > we start creating VCPU threads (at the moment it is done via kvm_init) > but we don't know what kind of irqchip we want to create until the > machine model code actually creates the irqchip device, which generally > happens after the CPU object is created (and VCPU threads are created > at that point). We could fix that by moving creation of the actual > VCPU thread to vl.c after the model has been initialized. Seems to be purely a qemu problem, no? While I think it's reasonable to be flexible, in this case I think qemu ought to know all these things beforehand. > >>> I'm not sure ARM has any equivalent to the local APIC -- the GIC >>> deals with everything and we don't have any equivalent division >>> of labour to the x86 LAPIC-IOAPIC one. >> >> It's probably a tiny part of the core with no name. The point is that >> the x86<->lapic interface is synchronous and bidirectional, while the >> lapic<->ioapic interface is asynchronous (it is still bidirectional, but >> not in a stop-the-vcpu way). I assume the ARM<->GIC interface is >> unidirectional? > > Well, strictly speaking the ARM<->GIC interface is implementation > defined, but in practice you can think of it as "the GIC controls > the IRQ and FIQ input lines to each core and uses them to signal > that an interrupt is present". There's no need for anything to > be signalled back in the other direction: the GIC will just continue > to hold IRQ asserted until the interrupt handler code writes the > relevant GIC register to indicate that the interrupt has been > handled. Okay. This agrees with my mental model of how it works. > (On a core with the virtualization extensions there are also > signals for the GIC to raise a virtual IRQ or FIQ, but we can > ignore those for KVM because we don't and can't provide the > virtualization extensions to a guest.) Yet.
On 23 July 2012 14:38, Avi Kivity <avi@redhat.com> wrote: > On 07/23/2012 04:27 PM, Peter Maydell wrote: >> That seems reasonable, although we have an awkward ordering issue >> in KVM as it stands: KVM_CREATE_IRQCHIP needs to be called before >> we start creating VCPU threads (at the moment it is done via kvm_init) >> but we don't know what kind of irqchip we want to create until the >> machine model code actually creates the irqchip device, which generally >> happens after the CPU object is created (and VCPU threads are created >> at that point). We could fix that by moving creation of the actual >> VCPU thread to vl.c after the model has been initialized. > > Seems to be purely a qemu problem, no? While I think it's reasonable to > be flexible, in this case I think qemu ought to know all these things > beforehand. Yes, sorry, I meant "issue in QEMU" but had a thinko; my suggested solution is basically shuffling code around in QEMU so we wait until we know what we are dealing with before asking the kernel to create the vcpu threads. >> (On a core with the virtualization extensions there are also >> signals for the GIC to raise a virtual IRQ or FIQ, but we can >> ignore those for KVM because we don't and can't provide the >> virtualization extensions to a guest.) > > Yet. There is no mechanism in the virtualization extensions to either trap on or present a false value for guest accesses to the CPSR mode bits. So you can't make the guest OS think it is in Hypervisor mode. Therefore you can't provide the guest with the virtualization extensions. (The same argument applies for Monitor mode and means you can't provide the Security extensions (TrustZone) to the guest either.) I guess you could handle all code in the guest hypervisor under TCG but implementing a hybrid TCG+KVM mode would be a lot of effort and probably not really perform very well anyway... -- PMM
On Mon, 23 Jul 2012 16:14:02 +0300 Avi Kivity <avi@redhat.com> wrote: > On 07/23/2012 04:06 PM, Cornelia Huck wrote: > > On Mon, 23 Jul 2012 15:18:49 +0300 > > Avi Kivity <avi@redhat.com> wrote: > > > >> > So, for example, if a specific subchannel (=device) has pending status > >> > and an I/O interrupt is to be generated, this interrupt remains pending > >> > until an arbitrary cpu is enabled for I/O interrupts. If several cpus > >> > are enabled for I/O interrupts, any of them may be interrupted. > >> > >> This may be costly to emulate. On x86 we do not have access to a > >> guest's interrupt status while it is running. Is this not the case for > >> s390? > >> > >> Oh, let me guess. You write some interrupt descriptor in memory > >> somewhere, issue one of your famous instructions, and the hardware finds > >> a guest vcpu and injects the interrupt. > > > > Basically, we have some flags in our control block we can set so that > > the cpu drops out of SIE whenever external/I/O/... interrupts are > > enabled and then have the host do the lowcore updates, psw swaps, etc. > > Can you write them from a different cpu and expect them to take effect? > > How do you emulate an interrupt with a large guest? You have to update > the flags in the control blocks for all vcpus; then restore them when > the interrupt is delivered? We may access the flags for any vcpu via the kvm_s390_float_interrupt structure which is contained in kvm->arch. We'll get control when a vcpu enters a wait state and try to deliver pending interrupts or set/clear the flags. Moreover, idle vcpus are on a wait queue and are the first target for getting an interrupt injected. > >> While you don't have an irqchip, you do have asynchronous interrupt > >> injection, yes? That's what irqchip really is all about. > > > > You mean injection via ioctl() that is asynchronous to vcpu execution? > > Yes, although we use a different ioctl than the others. > > Ok. The difference between irqfd and the ioctl is that with irqfd > everything (the payload in your case, the interrupt number for others) > is prepared beforehand, while with the ioctl the extra information is > delivered with the ioctl. This should work for us. Cornelia
On 07/23/2012 04:55 PM, Cornelia Huck wrote: >> > Basically, we have some flags in our control block we can set so that >> > the cpu drops out of SIE whenever external/I/O/... interrupts are >> > enabled and then have the host do the lowcore updates, psw swaps, etc. >> >> Can you write them from a different cpu and expect them to take effect? >> >> How do you emulate an interrupt with a large guest? You have to update >> the flags in the control blocks for all vcpus; then restore them when >> the interrupt is delivered? > > We may access the flags for any vcpu via the kvm_s390_float_interrupt > structure which is contained in kvm->arch. > > We'll get control when a vcpu enters a wait state and try to deliver > pending interrupts or set/clear the flags. Moreover, idle vcpus are on > a wait queue and are the first target for getting an interrupt injected. Okay. And you can ask a vcpu to exit when it enables interrupts, without interrupting it? On x86, we have to IPI the vcpu so it drops to the host, ask it to let us know when interrupts are enabled, and let it run again.
On 07/23/2012 04:50 PM, Peter Maydell wrote: >> >> Yet. > > There is no mechanism in the virtualization extensions to either > trap on or present a false value for guest accesses to the CPSR > mode bits. So you can't make the guest OS think it is in Hypervisor > mode. Therefore you can't provide the guest with the virtualization > extensions. (The same argument applies for Monitor mode and means > you can't provide the Security extensions (TrustZone) to the guest > either.) I guess you could handle all code in the guest hypervisor > under TCG but implementing a hybrid TCG+KVM mode would be a lot > of effort and probably not really perform very well anyway... Gaah, people add virtualization extensions to fix an ISA's non-virtualizability, then make the same mistake again. But I was only joking. Nested virtualization is interesting technically but so far I haven't seen any huge or even small uptake.
On Mon, 23 Jul 2012 17:27:47 +0300 Avi Kivity <avi@redhat.com> wrote: > On 07/23/2012 04:55 PM, Cornelia Huck wrote: > >> > Basically, we have some flags in our control block we can set so that > >> > the cpu drops out of SIE whenever external/I/O/... interrupts are > >> > enabled and then have the host do the lowcore updates, psw swaps, etc. > >> > >> Can you write them from a different cpu and expect them to take effect? > >> > >> How do you emulate an interrupt with a large guest? You have to update > >> the flags in the control blocks for all vcpus; then restore them when > >> the interrupt is delivered? > > > > We may access the flags for any vcpu via the kvm_s390_float_interrupt > > structure which is contained in kvm->arch. > > > > We'll get control when a vcpu enters a wait state and try to deliver > > pending interrupts or set/clear the flags. Moreover, idle vcpus are on > > a wait queue and are the first target for getting an interrupt injected. > > Okay. And you can ask a vcpu to exit when it enables interrupts, > without interrupting it? Yes. See arch/s390/kvm/interrupt.c and the CPUSTAT_* flags. > > On x86, we have to IPI the vcpu so it drops to the host, ask it to let > us know when interrupts are enabled, and let it run again. > Cornelia
On 23 July 2012 13:26, Avi Kivity <avi@redhat.com> wrote: > On 07/21/2012 11:54 AM, Peter Maydell wrote: >> The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel() >> is because I think they're all similar to this -- the common code is >> using the check as a proxy for something else, and it should be directly >> asking about that something else. The only bits of code that should >> care about "is the irqchip in kernel?" are: >> * target-specific device/machine setup code which needs to know >> which apic/etc to instantiate >> * target-specific x86 code which has this weird synchronous IRQ >> delivery model for irqchip-not-in-kernel >> (Obviously I might have missed something, I'm flailing around >> trying to understand this code :-)) > > Agree naming should be improved. In fact the early series I pushed to > decompose local apic, ioapic, and pic, but that didn't happen. If it > did we'd probably not have this conversation. OK, let's see if we can get some agreement about naming here. First, some test-functions I think we definitely need: kvm_interrupts_are_async() -- true if interrupt delivery is asynchronous default false in kvm_init, set true in kvm_irqchip_create, architectures may set it true in kvm_arch_init [ARM will do so; PPC might want to do so] kvm_irqchip_in_kernel() -- the user-settable option, actual behaviour is arch specific on x86, true means (as it does now) LAPIC,IOAPIC,PIT in kernel on ARM, we ignore this setting and just DTRT on PPC, used as a convenience setting for whether to use an in-kernel model of the interrupt controller Shouldn't be used in non-target-specific code and two I'm not quite so sure about: kvm_has_msi_routing() -- true if we can do routing of MSIs set true only if x86 and kvm_irqchip_in_kernel kvm_has_irqfds() -- true if kernel supports IRQFDs currently true only if x86 and kvm_irqchip_in_kernel Second, current uses of kvm_irqchip_in_kernel(): hw/kvmvapic.c, hw/pc.c, hw/pc_piix.c, target-i386/kvm.c: -- these are all x86 specific and can continue to use kvm_irqchip_in_kernel() cpus.c:cpu_thread_is_idle() -- should use !kvm_interrupts_are_async() [because halt is in userspace iff we're using the synchronous interrupt model] kvm-all.c:kvm_irqchip_set_irq(): -- (just an assert) should be kvm_interrupts_are_async() kvm-all.c:kvm_irqchip_add_msi_route(): -- should be kvm_have_msi_routing() kvm-all.c:kvm_irqchip_assign_irqfd(): -- should be true if kvm_has_irqfds() kvm-all.c:kvm_allows_irq0_override(): -- this still seems to me to be a completely x86 specific concept; it should move to a source file in target-x86 and then it can continue to use kvm_irqchip_in_kernel() hw/virtio-pci.c:virtio_pci_set_guest_notifiers() -- not entirely sure about this one but I think it should be testing kvm_has_msi_routing(). Any mistakes/countersuggestions? thanks -- PMM
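One possible shape for these predicates, as a sketch only: plain globals with trivial accessors, with the flag names following the proposal above rather than any existing QEMU implementation.

#include <stdbool.h>

/* Set once during kvm_init()/kvm_irqchip_create(), or by kvm_arch_init()
 * on architectures (like ARM) where injection is always asynchronous. */
static bool kvm_async_interrupts;
static bool kvm_msi_routing;   /* so far: x86 with in-kernel irqchip only */
static bool kvm_irqfds;        /* kernel supports KVM_IRQFD and we use it */

bool kvm_interrupts_are_async(void)
{
    return kvm_async_interrupts;
}

bool kvm_has_msi_routing(void)
{
    return kvm_msi_routing;
}

bool kvm_has_irqfds(void)
{
    return kvm_irqfds;
}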
On 2012-07-23 17:19, Peter Maydell wrote: > On 23 July 2012 13:26, Avi Kivity <avi@redhat.com> wrote: >> On 07/21/2012 11:54 AM, Peter Maydell wrote: >>> The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel() >>> is because I think they're all similar to this -- the common code is >>> using the check as a proxy for something else, and it should be directly >>> asking about that something else. The only bits of code that should >>> care about "is the irqchip in kernel?" are: >>> * target-specific device/machine setup code which needs to know >>> which apic/etc to instantiate >>> * target-specific x86 code which has this weird synchronous IRQ >>> delivery model for irqchip-not-in-kernel >>> (Obviously I might have missed something, I'm flailing around >>> trying to understand this code :-)) >> >> Agree naming should be improved. In fact the early series I pushed to >> decompose local apic, ioapic, and pic, but that didn't happen. If it >> did we'd probably not have this conversation. > > OK, let's see if we can get some agreement about naming here. > > First, some test-functions I think we definitely need: > > kvm_interrupts_are_async() > -- true if interrupt delivery is asynchronous > default false in kvm_init, set true in kvm_irqchip_create, > architectures may set it true in kvm_arch_init [ARM will > do so; PPC might want to do so] > > kvm_irqchip_in_kernel() > -- the user-settable option, actual behaviour is arch specific > on x86, true means (as it does now) LAPIC,IOAPIC,PIT in kernel > on ARM, we ignore this setting and just DTRT You should reject kernel_irqchip=off as long as you only support an in-kernel GIC model. > on PPC, used as a convenience setting for whether to use > an in-kernel model of the interrupt controller > Shouldn't be used in non-target-specific code > > and two I'm not quite so sure about: > > kvm_has_msi_routing() > -- true if we can do routing of MSIs GSI, not MSI. > set true only if x86 and kvm_irqchip_in_kernel It means that the target architecture supports routing of various interrupt sources (userspace, irqfds, in-kernel device models) to different in-kernel IRQ sinks (CPU cores, irqchip models, whatever). Interrupt messages via (binary-state) irqfd depend on it. > > kvm_has_irqfds() > -- true if kernel supports IRQFDs > currently true only if x86 and kvm_irqchip_in_kernel Note that this and the above are currently static feature tests, not mode checks (i.e. they are true even if kernel_irqchip=off). The "kvm_has" namespace is reserved for such tests. > > > Second, current uses of kvm_irqchip_in_kernel(): > > hw/kvmvapic.c, hw/pc.c, hw/pc_piix.c, target-i386/kvm.c: > -- these are all x86 specific and can continue to use > kvm_irqchip_in_kernel() > > cpus.c:cpu_thread_is_idle() > -- should use !kvm_interrupts_are_async() [because halt is > in userspace iff we're using the synchronous interrupt model] > > kvm-all.c:kvm_irqchip_set_irq(): > -- (just an assert) should be kvm_interrupts_are_async() The name kvm_irqchip_set_irq implies so far that it injects into an in-kernel irqchip model. Either different functions for archs that don't follow this concept need to be provided, or this function requires renaming (kvm_set_irq_async or so). > > kvm-all.c:kvm_irqchip_add_msi_route(): > -- should be kvm_have_msi_routing() Only if you change the semantics of kvm_has_gsi_routing (and rename it). > > kvm-all.c:kvm_irqchip_assign_irqfd(): > -- should be true if kvm_has_irqfds() The same issue. 
Plus there is that naming conflict again if we should ever see irqfd without some in-kernel irqchip. But even s390 would have a logical "irqchip" for me at the point it may route interrupt messages from devices directly to the CPUs. > > kvm-all.c:kvm_allows_irq0_override(): > -- this still seems to me to be a completely x86 specific concept; > it should move to a source file in target-x86 and then it > can continue to use kvm_irqchip_in_kernel() > > hw/virtio-pci.c:virtio_pci_set_guest_notifiers() > -- not entirely sure about this one but I think it > should be testing kvm_has_msi_routing(). It depends on full irqfd support, which includes IRQ routing to allow MSI via irqfd. Something like kvm_msi_via_irqfd_available. Jan
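A sketch of the check virtio-pci could make instead, using the name Jan suggests; this helper does not exist at this point, kvm_has_irqfds() is the predicate proposed earlier in the thread, and kvm_has_gsi_routing() is the existing QEMU helper.

#include <stdbool.h>

bool kvm_has_irqfds(void);       /* proposed above */
int  kvm_has_gsi_routing(void);  /* existing QEMU helper */

/* MSI-via-irqfd needs both: the kernel must accept irqfds, and it must be
 * able to route an MSI payload onto the GSI the irqfd is bound to. */
static inline bool kvm_msi_via_irqfd_available(void)
{
    return kvm_has_irqfds() && kvm_has_gsi_routing();
}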
On 23 July 2012 17:55, Jan Kiszka <jan.kiszka@siemens.com> wrote: > On 2012-07-23 17:19, Peter Maydell wrote: >> OK, let's see if we can get some agreement about naming here. >> >> First, some test-functions I think we definitely need: >> >> kvm_interrupts_are_async() >> -- true if interrupt delivery is asynchronous >> default false in kvm_init, set true in kvm_irqchip_create, >> architectures may set it true in kvm_arch_init [ARM will >> do so; PPC might want to do so] >> >> kvm_irqchip_in_kernel() >> -- the user-settable option, actual behaviour is arch specific >> on x86, true means (as it does now) LAPIC,IOAPIC,PIT in kernel >> on ARM, we ignore this setting and just DTRT > > You should reject kernel_irqchip=off as long as you only support an > in-kernel GIC model. Agreed. (At the moment we still have code in QEMU for out-of-kernel GIC models, which is legacy from before the VGIC implementation; depending on whether that actually vanishes from the kernel ABI or not I may include the QEMU code which makes arm_pic.c handle injecting interrupts to KVM via ioctl. But in that case we would reject =off for A15 and =on for non-A15 (slightly pointlessly since non-A15 will fail the "must be an A15" check.)) >> on PPC, used as a convenience setting for whether to use >> an in-kernel model of the interrupt controller >> Shouldn't be used in non-target-specific code >> >> and two I'm not quite so sure about: >> >> kvm_has_msi_routing() >> -- true if we can do routing of MSIs > > GSI, not MSI. > >> set true only if x86 and kvm_irqchip_in_kernel > > It means that the target architecture supports routing of various > interrupt sources (userspace, irqfds, in-kernel device models) to > different in-kernel IRQ sinks (CPU cores, irqchip models, whatever). > Interrupt messages via (binary-state) irqfd depend on it. So I still don't actually understand what this interrupt routing is (as far as the kernel is concerned). Surely the way this should work is that you use KVM_IRQFD to say "this fd goes to this interrupt sink" (with some sensible scheme for encoding what the interrupt destination actually is, like an (irqchip,pin) tuple or something). Why does the kernel need to care about irq routing? >> kvm-all.c:kvm_irqchip_set_irq(): >> -- (just an assert) should be kvm_interrupts_are_async() > > The name kvm_irqchip_set_irq implies so far that it injects into an > in-kernel irqchip model. Either different functions for archs that don't > follow this concept need to be provided, or this function requires > renaming (kvm_set_irq_async or so). Yes, renaming the function would probably be reasonable. (Currently my QEMU code for using the VGIC just does a direct ioctl itself though, because kvm_irqchip_set_irq() doesn't actually do very much and it's a bit awkward that it insists on fishing the ioctl number out from the KVMState.) >> kvm-all.c:kvm_irqchip_assign_irqfd(): >> -- should be true if kvm_has_irqfds() > > The same issue. Plus there is that naming conflict again if we should > ever see irqfd without some in-kernel irqchip. But even s390 would have > a logical "irqchip" for me at the point it may route interrupt messages > from devices directly to the CPUs. I don't think you can have irqfds without an irqchip because where would you be sending the interrupts? -- PMM
On 2012-07-23 19:41, Peter Maydell wrote: > On 23 July 2012 17:55, Jan Kiszka <jan.kiszka@siemens.com> wrote: >> On 2012-07-23 17:19, Peter Maydell wrote: >>> OK, let's see if we can get some agreement about naming here. >>> >>> First, some test-functions I think we definitely need: >>> >>> kvm_interrupts_are_async() >>> -- true if interrupt delivery is asynchronous >>> default false in kvm_init, set true in kvm_irqchip_create, >>> architectures may set it true in kvm_arch_init [ARM will >>> do so; PPC might want to do so] >>> >>> kvm_irqchip_in_kernel() >>> -- the user-settable option, actual behaviour is arch specific >>> on x86, true means (as it does now) LAPIC,IOAPIC,PIT in kernel >>> on ARM, we ignore this setting and just DTRT >> >> You should reject kernel_irqchip=off as long as you only support an >> in-kernel GIC model. > > Agreed. (At the moment we still have code in QEMU for out-of-kernel > GIC models, which is legacy from before the VGIC implementation; > depending on whether that actually vanishes from the kernel ABI > or not I may include the QEMU code which makes arm_pic.c handle > injecting interrupts to KVM via ioctl. But in that case we would > reject =off for A15 and =on for non-A15 (slightly pointlessly > since non-A15 will fail the "must be an A15" check.)) > >>> on PPC, used as a convenience setting for whether to use >>> an in-kernel model of the interrupt controller >>> Shouldn't be used in non-target-specific code >>> >>> and two I'm not quite so sure about: >>> >>> kvm_has_msi_routing() >>> -- true if we can do routing of MSIs >> >> GSI, not MSI. >> >>> set true only if x86 and kvm_irqchip_in_kernel >> >> It means that the target architecture supports routing of various >> interrupt sources (userspace, irqfds, in-kernel device models) to >> different in-kernel IRQ sinks (CPU cores, irqchip models, whatever). >> Interrupt messages via (binary-state) irqfd depend on it. > > So I still don't actually understand what this interrupt routing > is (as far as the kernel is concerned). Surely the way this should > work is that you use KVM_IRQFD to say "this fd goes to this > interrupt sink" (with some sensible scheme for encoding what the > interrupt destination actually is, like an (irqchip,pin) tuple > or something). Why does the kernel need to care about irq > routing? - to define where to inject if you have multiple irqchips to address, or if there is a special sink that reattaches payload (e.g. MSI messages) to a binary source like irqfd is - to configure board-specific re-directions (like that IRQ0 override on x86) > >>> kvm-all.c:kvm_irqchip_set_irq(): >>> -- (just an assert) should be kvm_interrupts_are_async() >> >> The name kvm_irqchip_set_irq implies so far that it injects into an >> in-kernel irqchip model. Either different functions for archs that don't >> follow this concept need to be provided, or this function requires >> renaming (kvm_set_irq_async or so). > > Yes, renaming the function would probably be reasonable. > (Currently my QEMU code for using the VGIC just does a direct > ioctl itself though, because kvm_irqchip_set_irq() doesn't actually > do very much and it's a bit awkward that it insists on fishing > the ioctl number out from the KVMState.) The latter has to do with legacy and lacking support for certain IOCTLs on older kernels. However, it shouldn't hurt you to take the generic route for the sake of consistency and to ease instrumentation etc. 
> >>> kvm-all.c:kvm_irqchip_assign_irqfd(): >>> -- should be true if kvm_has_irqfds() >> >> The same issue. Plus there is that naming conflict again if we should >> ever see irqfd without some in-kernel irqchip. But even s390 would have >> a logical "irqchip" for me at the point it may route interrupt messages >> from devices directly to the CPUs. > > I don't think you can have irqfds without an irqchip because where > would you be sending the interrupts? To the CPU cores. But even that requires routing and arbitration if there are multiple receivers. So I think s390 has an in-kernel (or "in-hardware") irqchip, conceptually. Jan
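The "special sink that reattaches payload" case Jan describes can be sketched with the existing x86 ABI: an MSI message is entered into the routing table under a GSI number, and an irqfd is then bound to that GSI. KVM_SET_GSI_ROUTING replaces the whole table, so real code resubmits the existing routes as well; the GSI and MSI address/data values below are placeholders and error handling is omitted.

#include <stdint.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int route_msi_to_irqfd(int vmfd, uint32_t gsi,
                              uint64_t msi_addr, uint32_t msi_data)
{
    struct kvm_irq_routing *table;
    struct kvm_irq_routing_entry *e;
    struct kvm_irqfd assign = { .gsi = gsi };
    int efd;

    /* 1. Give the MSI payload a GSI number via the routing table. */
    table = calloc(1, sizeof(*table) + sizeof(*e));
    table->nr = 1;
    e = &table->entries[0];
    e->gsi = gsi;
    e->type = KVM_IRQ_ROUTING_MSI;
    e->u.msi.address_lo = (uint32_t)msi_addr;
    e->u.msi.address_hi = (uint32_t)(msi_addr >> 32);
    e->u.msi.data = msi_data;
    ioctl(vmfd, KVM_SET_GSI_ROUTING, table);
    free(table);

    /* 2. Bind an eventfd to that GSI; signalling it now raises the MSI. */
    efd = eventfd(0, 0);
    assign.fd = efd;
    ioctl(vmfd, KVM_IRQFD, &assign);
    return efd;
}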
On 23 July 2012 15:30, Avi Kivity <avi@redhat.com> wrote: > But I was only joking. Nested virtualization is interesting technically > but so far I haven't seen any huge or even small uptake. Yes; that (as I understand it) is why it wasn't an expected use case for the architecture extensions. The other related thing that might be surprising for x86-background people is that being able to present the guest with a virtual CPU that looks like a pre-virtualization CPU (eg the A9) isn't really an intended use case either. (The ARM world has much less of the 'everything must be fully backwards compatible for existing OSes' than x86...) -- PMM
On 07/23/2012 08:58 PM, Peter Maydell wrote: > On 23 July 2012 15:30, Avi Kivity <avi@redhat.com> wrote: >> But I was only joking. Nested virtualization is interesting technically >> but so far I haven't seen any huge or even small uptake. > > Yes; that (as I understand it) is why it wasn't an expected use > case for the architecture extensions. The other related thing that > might be surprising for x86-background people is that being > able to present the guest with a virtual CPU that looks like > a pre-virtualization CPU (eg the A9) isn't really an intended > use case either. (The ARM world has much less of the 'everything > must be fully backwards compatible for existing OSes' than x86...) I expect this to change once ARM servers become a reality.
On 24 July 2012 09:50, Avi Kivity <avi@redhat.com> wrote: > On 07/23/2012 08:58 PM, Peter Maydell wrote: >> The other related thing that >> might be surprising for x86-background people is that being >> able to present the guest with a virtual CPU that looks like >> a pre-virtualization CPU (eg the A9) isn't really an intended >> use case either. (The ARM world has much less of the 'everything >> must be fully backwards compatible for existing OSes' than x86...) > > I expect this to change once ARM servers become a reality. Yes, compatibility for guests migrating forwards from old virt-supporting host CPUs to new ones is important; it's only the pre-virt to post-virt that is less interesting. -- PMM
On 07/23/2012 06:19 PM, Peter Maydell wrote: > On 23 July 2012 13:26, Avi Kivity <avi@redhat.com> wrote: >> On 07/21/2012 11:54 AM, Peter Maydell wrote: >>> The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel() >>> is because I think they're all similar to this -- the common code is >>> using the check as a proxy for something else, and it should be directly >>> asking about that something else. The only bits of code that should >>> care about "is the irqchip in kernel?" are: >>> * target-specific device/machine setup code which needs to know >>> which apic/etc to instantiate >>> * target-specific x86 code which has this weird synchronous IRQ >>> delivery model for irqchip-not-in-kernel >>> (Obviously I might have missed something, I'm flailing around >>> trying to understand this code :-)) >> >> Agree naming should be improved. In fact the early series I pushed to >> decompose local apic, ioapic, and pic, but that didn't happen. If it >> did we'd probably not have this conversation. > > OK, let's see if we can get some agreement about naming here. > > First, some test-functions I think we definitely need: > > kvm_interrupts_are_async() > -- true if interrupt delivery is asynchronous > default false in kvm_init, set true in kvm_irqchip_create, > architectures may set it true in kvm_arch_init [ARM will > do so; PPC might want to do so] Interrupts are by nature async. I'd say kvm_async_interrupt_injection() to make it clearer. > > kvm_irqchip_in_kernel() > -- the user-settable option, actual behaviour is arch specific > on x86, true means (as it does now) LAPIC,IOAPIC,PIT in kernel > on ARM, we ignore this setting and just DTRT > on PPC, used as a convenience setting for whether to use > an in-kernel model of the interrupt controller > Shouldn't be used in non-target-specific code If it's 100% arch specific, the name can/should be arch specific since it will never be used in generic core. So kvm_ioapic_in_kernel(), kvm_gic_in_kernel() (or even kvm_ioapic(), kvm_gic(), since "kvm" already implies the kernel (that's the k in kvm, after all). > > and two I'm not quite so sure about: > > kvm_has_msi_routing() > -- true if we can do routing of MSIs > set true only if x86 and kvm_irqchip_in_kernel > > kvm_has_irqfds() > -- true if kernel supports IRQFDs > currently true only if x86 and kvm_irqchip_in_kernel Will be true for everyone, eventually.
On 2012-07-24 10:54, Peter Maydell wrote: > On 24 July 2012 09:50, Avi Kivity <avi@redhat.com> wrote: >> On 07/23/2012 08:58 PM, Peter Maydell wrote: >>> The other related thing that >>> might be surprising for x86-background people is that being >>> able to present the guest with a virtual CPU that looks like >>> a pre-virtualization CPU (eg the A9) isn't really an intended >>> use case either. (The ARM world has much less of the 'everything >>> must be fully backwards compatible for existing OSes' than x86...) >> >> I expect this to change once ARM servers become a reality. > > Yes, compatibility for guests migrating forwards from old > virt-supporting host CPUs to new ones is important; it's > only the pre-virt to post-virt that is less interesting. One of the many use cases for virtualization on other archs is precisely this (we run 286 code on latest and greatest CPUs in KVM). There should be quite a few embedded systems out there that could happily be consolidated on a more modern and powerful ARM platform via virtualization. Jan
diff --git a/hw/pc.c b/hw/pc.c
index 598267a..4953376 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -42,6 +42,7 @@
 #include "sysbus.h"
 #include "sysemu.h"
 #include "kvm.h"
+#include "kvm_i386.h"
 #include "xen.h"
 #include "blockdev.h"
 #include "hw/block-common.h"
diff --git a/kvm-all.c b/kvm-all.c
index 2148b20..65f1160 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1667,11 +1667,6 @@ int kvm_has_gsi_routing(void)
 #endif
 }
 
-int kvm_allows_irq0_override(void)
-{
-    return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
-}
-
 void *kvm_vmalloc(ram_addr_t size)
 {
 #ifdef TARGET_S390X
diff --git a/kvm.h b/kvm.h
index 2617dd5..c46132f 100644
--- a/kvm.h
+++ b/kvm.h
@@ -62,8 +62,6 @@ int kvm_has_pit_state2(void);
 int kvm_has_many_ioeventfds(void);
 int kvm_has_gsi_routing(void);
 
-int kvm_allows_irq0_override(void);
-
 #ifdef NEED_CPU_H
 
 int kvm_init_vcpu(CPUArchState *env);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index e53c2f6..503abeb 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -23,6 +23,7 @@
 #include "qemu-common.h"
 #include "sysemu.h"
 #include "kvm.h"
+#include "kvm_i386.h"
 #include "cpu.h"
 #include "gdbstub.h"
 #include "host-utils.h"
@@ -65,6 +66,11 @@ static bool has_msr_async_pf_en;
 static bool has_msr_misc_enable;
 static int lm_capable_kernel;
 
+int kvm_allows_irq0_override(void)
+{
+    return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
+}
+
 static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
 {
     struct kvm_cpuid2 *cpuid;
diff --git a/target-i386/kvm_i386.h b/target-i386/kvm_i386.h
new file mode 100644
index 0000000..87031c0
--- /dev/null
+++ b/target-i386/kvm_i386.h
@@ -0,0 +1,16 @@
+/*
+ * QEMU KVM support -- x86 specific functions.
+ *
+ * Copyright (c) 2012 Linaro Limited
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_KVM_I386_H
+#define QEMU_KVM_I386_H
+
+int kvm_allows_irq0_override(void);
+
+#endif
kvm_allows_irq0_override() is a totally x86 specific concept: move it to the target-specific source file where it belongs. This means we need a new header file for the prototype: kvm_i386.h, in line with the existing kvm_ppc.h.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I'm sure this isn't the only x86ism in the KVM generic source files. However the thing I'm specifically trying to do is nuke all the uses of kvm_irqchip_in_kernel() in common code, as part of trying to untangle "what irqchip model do you want" from other aspects of the interrupt handling model.

Other than this one, there are uses in cpus.c and hw/virtio-pci.c, but I haven't figured out yet exactly what those bits of code are trying to do...

 hw/pc.c                |    1 +
 kvm-all.c              |    5 -----
 kvm.h                  |    2 --
 target-i386/kvm.c      |    6 ++++++
 target-i386/kvm_i386.h |   16 ++++++++++++++++
 5 files changed, 23 insertions(+), 7 deletions(-)
 create mode 100644 target-i386/kvm_i386.h