Message ID | 20201109154831.20779-1-pasic@linux.ibm.com
---|---
State | New |
Series | [1/1] virtio-blk-ccw: tweak the default for num_queues
On Mon, 9 Nov 2020 16:48:31 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > Currently the default value of num_queues is effectively 1 for > virtio-blk-ccw. Recently 9445e1e15e ("virtio-blk-pci: default num_queues > to -smp N") changed the default for pci to the number of vcpus, citing > interrupt better locality and better performance as a rationale. > > While virtio-blk-ccw does not yet benefit from better interrupt > locality, measurements have shown that for secure VMs multiqueue does > help with performance. Since the bounce buffering happens with the queue > lock held (in the guest) this is hardly a surprise. > > As for non-secure VMs the extra queues shouldn't hurt too much. > > Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> > Signed-off-by: Halil Pasic <pasic@linux.ibm.com> > --- > > We would prefer to land this for 5.2. If we do then commit 9445e1e15e > ("virtio-blk-pci: default num_queues to -smp N") took care of all the > necessary compat handling. > > If that's not possible, I will send a version that does the necessary > compat handling. I think we can still get this into 5.2, and that would indeed be less hassle. > --- > hw/s390x/virtio-ccw-blk.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/hw/s390x/virtio-ccw-blk.c b/hw/s390x/virtio-ccw-blk.c > index 2294ce1ce4..7296140dde 100644 > --- a/hw/s390x/virtio-ccw-blk.c > +++ b/hw/s390x/virtio-ccw-blk.c > @@ -10,6 +10,7 @@ > */ > > #include "qemu/osdep.h" > +#include "hw/boards.h" > #include "hw/qdev-properties.h" > #include "hw/virtio/virtio.h" > #include "qapi/error.h" > @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp) > { > VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev); > DeviceState *vdev = DEVICE(&dev->vdev); > + VirtIOBlkConf *conf = &dev->vdev.conf; > + > + if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) { > + conf->num_queues = MIN(4, current_machine->smp.cpus); > + } I would like to have a comment explaining the numbers here, however. virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if possible, apply some other capping). 4 seems to be a bit arbitrary without explanation, although I'm sure you did some measurements :) Do we also want something similar for virtio-scsi (and vhost-scsi)? > > qdev_realize(vdev, BUS(&ccw_dev->bus), errp); > } > > base-commit: 2a190a7256a3e0563b29ffd67e0164097b4a6dac
On Mon, 9 Nov 2020 17:06:16 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > > @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp) > > { > > VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev); > > DeviceState *vdev = DEVICE(&dev->vdev); > > + VirtIOBlkConf *conf = &dev->vdev.conf; > > + > > + if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) { > > + conf->num_queues = MIN(4, current_machine->smp.cpus); > > + } > > I would like to have a comment explaining the numbers here, however. > > virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if > possible, apply some other capping). 4 seems to be a bit arbitrary > without explanation, although I'm sure you did some measurements :) Frankly, I don't have any measurements yet. For the secure case, I think Mimu has assessed the impact of multiqueue, hence adding Mimu to the cc list. @Mimu can you help us out. Regarding the normal non-protected VMs I'm in a middle of producing some measurement data. This was admittedly a bit rushed because of where we are in the cycle. Sorry to disappoint you. The number 4 was suggested by Christian, maybe Christian does have some readily available measurement data for the normal VM case. @Christian: can you help me out? Regards, Halil
On 10.11.20 15:16, Michael Mueller wrote: > > > On 09.11.20 19:53, Halil Pasic wrote: >> On Mon, 9 Nov 2020 17:06:16 +0100 >> Cornelia Huck <cohuck@redhat.com> wrote: >> >>>> @@ -20,6 +21,11 @@ static void >>>> virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp) >>>> { >>>> VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev); >>>> DeviceState *vdev = DEVICE(&dev->vdev); >>>> + VirtIOBlkConf *conf = &dev->vdev.conf; >>>> + >>>> + if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) { >>>> + conf->num_queues = MIN(4, current_machine->smp.cpus); >>>> + } >>> >>> I would like to have a comment explaining the numbers here, however. >>> >>> virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if >>> possible, apply some other capping). 4 seems to be a bit arbitrary >>> without explanation, although I'm sure you did some measurements :) >> >> Frankly, I don't have any measurements yet. For the secure case, >> I think Mimu has assessed the impact of multiqueue, hence adding Mimu to >> the cc list. @Mimu can you help us out. >> >> Regarding the normal non-protected VMs I'm in a middle of producing some >> measurement data. This was admittedly a bit rushed because of where we >> are in the cycle. Sorry to disappoint you. > > I'm talking with the perf team tomorrow. They have done some > measurements with multiqueue for PV guests and I asked for a comparison > to non PV guests as well. The perf team has performed measurements for us that show that a *PV KVM guest* benefits in terms of throughput for random read, random write and sequential read (no difference for sequential write) by a multi queue setup. CPU cost are reduced as well due to reduced spinlock contention. For a *standard KVM guest* it currently has no throughput effect. No benefit and no harm. I have asked them to finalize their measurements by comparing the CPU cost as well. I will receive that information on Friday. Michael > > Michael > >> >> The number 4 was suggested by Christian, maybe Christian does have some >> readily available measurement data for the normal VM case. @Christian: >> can you help me out? >> >> Regards, >> Halil >> >
On Wed, 11 Nov 2020 13:26:11 +0100 Michael Mueller <mimu@linux.ibm.com> wrote: > On 10.11.20 15:16, Michael Mueller wrote: > > > > > > On 09.11.20 19:53, Halil Pasic wrote: > >> On Mon, 9 Nov 2020 17:06:16 +0100 > >> Cornelia Huck <cohuck@redhat.com> wrote: > >> > >>>> @@ -20,6 +21,11 @@ static void > >>>> virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp) > >>>> { > >>>> VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev); > >>>> DeviceState *vdev = DEVICE(&dev->vdev); > >>>> + VirtIOBlkConf *conf = &dev->vdev.conf; > >>>> + > >>>> + if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) { > >>>> + conf->num_queues = MIN(4, current_machine->smp.cpus); > >>>> + } > >>> > >>> I would like to have a comment explaining the numbers here, however. > >>> > >>> virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if > >>> possible, apply some other capping). 4 seems to be a bit arbitrary > >>> without explanation, although I'm sure you did some measurements :) > >> > >> Frankly, I don't have any measurements yet. For the secure case, > >> I think Mimu has assessed the impact of multiqueue, hence adding Mimu to > >> the cc list. @Mimu can you help us out. > >> > >> Regarding the normal non-protected VMs I'm in a middle of producing some > >> measurement data. This was admittedly a bit rushed because of where we > >> are in the cycle. Sorry to disappoint you. > > > > I'm talking with the perf team tomorrow. They have done some > > measurements with multiqueue for PV guests and I asked for a comparison > > to non PV guests as well. > > The perf team has performed measurements for us that show that a *PV > KVM guest* benefits in terms of throughput for random read, random write > and sequential read (no difference for sequential write) by a multi > queue setup. CPU cost are reduced as well due to reduced spinlock > contention. Just to be clear, that was with 4 queues? > > For a *standard KVM guest* it currently has no throughput effect. No > benefit and no harm. I have asked them to finalize their measurements > by comparing the CPU cost as well. I will receive that information on > Friday. Thank you for checking! > > Michael > > > > > > Michael > > > >> > >> The number 4 was suggested by Christian, maybe Christian does have some > >> readily available measurement data for the normal VM case. @Christian: > >> can you help me out? > >> > >> Regards, > >> Halil > >> > > > >
On 11.11.20 13:38, Cornelia Huck wrote: > On Wed, 11 Nov 2020 13:26:11 +0100 > Michael Mueller <mimu@linux.ibm.com> wrote: > >> On 10.11.20 15:16, Michael Mueller wrote: >>> >>> >>> On 09.11.20 19:53, Halil Pasic wrote: >>>> On Mon, 9 Nov 2020 17:06:16 +0100 >>>> Cornelia Huck <cohuck@redhat.com> wrote: >>>> >>>>>> @@ -20,6 +21,11 @@ static void >>>>>> virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp) >>>>>> { >>>>>> VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev); >>>>>> DeviceState *vdev = DEVICE(&dev->vdev); >>>>>> + VirtIOBlkConf *conf = &dev->vdev.conf; >>>>>> + >>>>>> + if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) { >>>>>> + conf->num_queues = MIN(4, current_machine->smp.cpus); >>>>>> + } >>>>> >>>>> I would like to have a comment explaining the numbers here, however. >>>>> >>>>> virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if >>>>> possible, apply some other capping). 4 seems to be a bit arbitrary >>>>> without explanation, although I'm sure you did some measurements :) >>>> >>>> Frankly, I don't have any measurements yet. For the secure case, >>>> I think Mimu has assessed the impact of multiqueue, hence adding Mimu to >>>> the cc list. @Mimu can you help us out. >>>> >>>> Regarding the normal non-protected VMs I'm in a middle of producing some >>>> measurement data. This was admittedly a bit rushed because of where we >>>> are in the cycle. Sorry to disappoint you. >>> >>> I'm talking with the perf team tomorrow. They have done some >>> measurements with multiqueue for PV guests and I asked for a comparison >>> to non PV guests as well. >> >> The perf team has performed measurements for us that show that a *PV >> KVM guest* benefits in terms of throughput for random read, random write >> and sequential read (no difference for sequential write) by a multi >> queue setup. CPU cost are reduced as well due to reduced spinlock >> contention. > > Just to be clear, that was with 4 queues? Yes, we have seen it with 4 and also with 9 queues. Halil, still I would like to know what the exact memory consumption per queue is that you are talking about. Have you made a calculation? Thanks. > >> >> For a *standard KVM guest* it currently has no throughput effect. No >> benefit and no harm. I have asked them to finalize their measurements >> by comparing the CPU cost as well. I will receive that information on >> Friday. > > Thank you for checking! > >> >> Michael >> >> >>> >>> Michael >>> >>>> >>>> The number 4 was suggested by Christian, maybe Christian does have some >>>> readily available measurement data for the normal VM case. @Christian: >>>> can you help me out? >>>> >>>> Regards, >>>> Halil >>>> >>> >> >> >
On Wed, 11 Nov 2020 13:38:15 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Wed, 11 Nov 2020 13:26:11 +0100
> Michael Mueller <mimu@linux.ibm.com> wrote:
>
> > The perf team has performed measurements for us that show that a *PV
> > KVM guest* benefits in terms of throughput for random read, random write
> > and sequential read (no difference for sequential write) by a multi
> > queue setup. CPU cost are reduced as well due to reduced spinlock
> > contention.
>
> Just to be clear, that was with 4 queues?
>
> > For a *standard KVM guest* it currently has no throughput effect. No
> > benefit and no harm. I have asked them to finalize their measurements
> > by comparing the CPU cost as well. I will receive that information on
> > Friday.
>
> Thank you for checking!

The results of my measurements (normal case only) are consistent with
these findings. My setup looks like this:

A guest with 6 vcpus and an attached virtio-blk-ccw disk backed by a raw
image on a tmpfs (i.e. backed by RAM, because we are interested in virtio
and not in the disk). The performance was evaluated with fio, randrw,
queuedepth=1 and bs=1M. I scaled the number of virtqueues from 1 to 5,
and collected 30 data points each. The full fio command line I used is at
the end of this mail. For a nicer table, please see the attached png.

Regarding the difference in averages, it is small, about 1.2 percent.
The percentages are relative to the average over all queue counts.

queues | avg write_iops | avg write_bw | avg read_iops | avg read_bw
---|---|---|---|---
1 | 99.45% | 99.45% | 99.44% | 99.44%
2 | 99.93% | 99.93% | 99.92% | 99.92%
3 | 100.02% | 100.02% | 100.02% | 100.02%
4 | 100.64% | 100.64% | 100.64% | 100.64%
5 | 99.97% | 99.97% | 99.97% | 99.97%
Total | 100.00% | 100.00% | 100.00% | 100.00%

The full fio command line:

fio --ramp_time=30s --output-format=json --bs=1M --ioengine=libaio --readwrite=randrw --runtime=120s --size=1948m --name=measurement --gtod_reduce=1 --direct=1 --iodepth=1 --filename=/dev/vda --time_based
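Not quoted in the thread, but for context: the per-run queue count would typically be pinned on the QEMU command line via the num-queues property, which virtio-blk-ccw exposes (aliased from virtio-blk-device). A hypothetical invocation matching the setup above; the drive id, image path, and memory size are made up and the guest boot options are omitted:

```sh
qemu-system-s390x \
    -machine s390-ccw-virtio -smp 6 -m 4096 \
    -drive if=none,id=disk0,file=/dev/shm/blk.img,format=raw \
    -device virtio-blk-ccw,drive=disk0,num-queues=4
```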
On Wed, 11 Nov 2020 13:49:08 +0100
Michael Mueller <mimu@linux.ibm.com> wrote:

> Halil,
>
> still I would like to know what the exact memory consumption per queue
> is that you are talking about. Have you made a calculation? Thanks.

Hi!

The default size for virtio-blk seems to be 256 ring entries, which
translates to 6668 bytes for the split ring(s). The queue size is user
configurable, and the guest, in theory, can decide to use a smaller
queue. The indirect descriptors are allocated separately, and bounced
via swiotlb in the case of secure guests.

Regards,
Halil
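For reference (not part of Halil's mail): the 6668-byte figure matches the split-virtqueue layout, i.e. descriptor table plus avail and used rings including the event-index fields. A minimal sketch of that arithmetic, ignoring any alignment padding between the three parts:

```c
#include <stddef.h>
#include <stdio.h>

/* Per-queue size of a split virtqueue: descriptor table + avail ring +
 * used ring (with event-idx fields), excluding alignment padding and the
 * separately allocated indirect descriptor tables. */
static size_t split_vring_bytes(size_t qsize)
{
    size_t desc  = 16 * qsize;     /* 16-byte descriptors            */
    size_t avail = 6 + 2 * qsize;  /* flags, idx, ring[], used_event */
    size_t used  = 6 + 8 * qsize;  /* flags, idx, ring[], avail_event */
    return desc + avail + used;
}

int main(void)
{
    /* 256 entries -> 6668 bytes, matching the number quoted above. */
    printf("%zu bytes\n", split_vring_bytes(256));
    return 0;
}
```

With 4 queues that works out to roughly 26 KiB of ring memory per device, which is the order of magnitude the memory-consumption argument is about.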
On Tue, 10 Nov 2020 14:18:39 +0100 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 10.11.20 11:40, Halil Pasic wrote: > > On Tue, 10 Nov 2020 09:47:51 +0100 > > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > >> > >> > >> On 09.11.20 19:53, Halil Pasic wrote: > >>> On Mon, 9 Nov 2020 17:06:16 +0100 > >>> Cornelia Huck <cohuck@redhat.com> wrote: > >>> > >>>>> @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp) > >>>>> { > >>>>> VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev); > >>>>> DeviceState *vdev = DEVICE(&dev->vdev); > >>>>> + VirtIOBlkConf *conf = &dev->vdev.conf; > >>>>> + > >>>>> + if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) { > >>>>> + conf->num_queues = MIN(4, current_machine->smp.cpus); > >>>>> + } > >>>> > >>>> I would like to have a comment explaining the numbers here, however. > >>>> > >>>> virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if > >>>> possible, apply some other capping). 4 seems to be a bit arbitrary > >>>> without explanation, although I'm sure you did some measurements :) > >>> > >>> Frankly, I don't have any measurements yet. For the secure case, > >>> I think Mimu has assessed the impact of multiqueue, hence adding Mimu to > >>> the cc list. @Mimu can you help us out. > >>> > >>> Regarding the normal non-protected VMs I'm in a middle of producing some > >>> measurement data. This was admittedly a bit rushed because of where we > >>> are in the cycle. Sorry to disappoint you. > >>> > >>> The number 4 was suggested by Christian, maybe Christian does have some > >>> readily available measurement data for the normal VM case. @Christian: > >>> can you help me out? > >> My point was to find a balance between performance gain and memory usage. > >> As a matter of fact, virtqueue do consume memory. So 4 looked like a > >> reasonable default for me for large guests as long as we do not have directed > >> interrupts. > >> > >> Now, thinking about this again: If we want to change the default to something > >> else in the future (e.g. to num vcpus) then the compat handling will get > >> really complicated. > > > > Regarding compat handling, I believe we would need a new property for > > virtio-blk-ccw: something like def_num_queues_max. Then logic would > > morph to MIN(def_num_queues_max, current_machine->smp.cpus), and we could > > relatively freely do compat stuff on def_num_queues_max. > > > > IMHO not pretty but certainly doable. > > > >> > >> So we can > >> - go with num queues = num cpus. But this will consume memory > >> for guests with lots of CPUs. > > > > In absence of data that showcases the benefit outweighing the obvious > > detriment, I lean towards finding this option the least favorable. > > > >> - go with the proposed logic of min(4,vcpus) and accept that future compat handling > >> is harder > > > > IMHO not a bad option, but I think I would still feel better about a > > more informed decision. In the end, the end user can already specify the > > num_queues explicitly, so I don't think this is urgent. > > > >> - defer this change > > > > So I tend to lean towards deferring. > > Yes, I was pushing this for 5.2 to avoid compat handling. But maybe it is better > to wait and do it later. But we should certainly continue the discussion to have > something for the next release. <going through older mails> Do we have a better idea now about which values would make sense here? 
> > > Another thought is, provided the load is about evenly spread on the
> > > different virtqueues, if the game is about vCPU locality, one could
> > > think of decreasing the size of each individual virtqueue while
> > > increasing their number, with the idea of not paying much more in terms
> > > of memory. The queue size however needs to be a power of 2,
> > > so there is a limit on the granularity.
> > >
> > > Regarding secure VMs, currently we have to cram at least the swiotlb and
> > > the virtqueues into ZONE_DMA. So increasing the number of
> > > virtqueues heavily may get us into new trouble with exotic configs.
> > >
> > > Regards,
> > > Halil
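To put rough numbers on that thought (illustrative only, reusing the split-ring arithmetic sketched earlier; alignment padding and indirect descriptors again ignored):

```c
/* Using split_vring_bytes() from the earlier sketch:
 *
 *   split_vring_bytes(256) = 6668   -> 1 x 256-entry queue  = 6668 bytes
 *   split_vring_bytes(128) = 3340   -> 2 x 128-entry queues = 6680 bytes
 *   split_vring_bytes(64)  = 1676   -> 4 x  64-entry queues = 6704 bytes
 *
 * Halving the queue size roughly halves the per-queue footprint, so the
 * power-of-2 steps let several smaller queues fit in about the same memory
 * as one default-sized queue.
 */
```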
On Tue, 15 Dec 2020 09:26:56 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > On Tue, 10 Nov 2020 14:18:39 +0100 > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > On 10.11.20 11:40, Halil Pasic wrote: > > > On Tue, 10 Nov 2020 09:47:51 +0100 > > > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > > > >> > > >> > > >> On 09.11.20 19:53, Halil Pasic wrote: > > >>> On Mon, 9 Nov 2020 17:06:16 +0100 > > >>> Cornelia Huck <cohuck@redhat.com> wrote: > > >>> > > >>>>> @@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp) > > >>>>> { > > >>>>> VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev); > > >>>>> DeviceState *vdev = DEVICE(&dev->vdev); > > >>>>> + VirtIOBlkConf *conf = &dev->vdev.conf; > > >>>>> + > > >>>>> + if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) { > > >>>>> + conf->num_queues = MIN(4, current_machine->smp.cpus); > > >>>>> + } > > >>>> > > >>>> I would like to have a comment explaining the numbers here, however. > > >>>> > > >>>> virtio-pci has a pretty good explanation (use 1:1 for vqs:vcpus if > > >>>> possible, apply some other capping). 4 seems to be a bit arbitrary > > >>>> without explanation, although I'm sure you did some measurements :) > > >>> > > >>> Frankly, I don't have any measurements yet. For the secure case, > > >>> I think Mimu has assessed the impact of multiqueue, hence adding Mimu to > > >>> the cc list. @Mimu can you help us out. > > >>> > > >>> Regarding the normal non-protected VMs I'm in a middle of producing some > > >>> measurement data. This was admittedly a bit rushed because of where we > > >>> are in the cycle. Sorry to disappoint you. > > >>> > > >>> The number 4 was suggested by Christian, maybe Christian does have some > > >>> readily available measurement data for the normal VM case. @Christian: > > >>> can you help me out? > > >> My point was to find a balance between performance gain and memory usage. > > >> As a matter of fact, virtqueue do consume memory. So 4 looked like a > > >> reasonable default for me for large guests as long as we do not have directed > > >> interrupts. > > >> > > >> Now, thinking about this again: If we want to change the default to something > > >> else in the future (e.g. to num vcpus) then the compat handling will get > > >> really complicated. > > > > > > Regarding compat handling, I believe we would need a new property for > > > virtio-blk-ccw: something like def_num_queues_max. Then logic would > > > morph to MIN(def_num_queues_max, current_machine->smp.cpus), and we could > > > relatively freely do compat stuff on def_num_queues_max. > > > > > > IMHO not pretty but certainly doable. > > > > > >> > > >> So we can > > >> - go with num queues = num cpus. But this will consume memory > > >> for guests with lots of CPUs. > > > > > > In absence of data that showcases the benefit outweighing the obvious > > > detriment, I lean towards finding this option the least favorable. > > > > > >> - go with the proposed logic of min(4,vcpus) and accept that future compat handling > > >> is harder > > > > > > IMHO not a bad option, but I think I would still feel better about a > > > more informed decision. In the end, the end user can already specify the > > > num_queues explicitly, so I don't think this is urgent. > > > > > >> - defer this change > > > > > > So I tend to lean towards deferring. > > > > Yes, I was pushing this for 5.2 to avoid compat handling. But maybe it is better > > to wait and do it later. 
> > But we should certainly continue the discussion to have
> > something for the next release.
>
> <going through older mails>
>
> Do we have a better idea now about which values would make sense here?

Hi Conny!

I have nothing new since then. Capping at 4 queues still looks like a
reasonable compromise to me. @Mimu: anything new since then?

If you like I can spin a new version (we need compat handling now).

Halil
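Not part of the posted patch: a rough sketch of what that compat handling could look like if it went the def_num_queues_max route floated earlier in the thread. The property name, struct field, and compat entry below are hypothetical.

```c
/* Hypothetical sketch only; names and defaults are illustrative. */
typedef struct VirtIOBlkCcw {
    VirtioCcwDevice parent_obj;
    VirtIOBlock vdev;
    uint32_t auto_num_queues_max;   /* cap for the automatic default */
} VirtIOBlkCcw;

static Property virtio_ccw_blk_properties[] = {
    /* ... existing virtio-blk-ccw properties ... */
    DEFINE_PROP_UINT32("def-num-queues-max", VirtIOBlkCcw,
                       auto_num_queues_max, 4),
    DEFINE_PROP_END_OF_LIST(),
};

static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
{
    VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
    DeviceState *vdev = DEVICE(&dev->vdev);
    VirtIOBlkConf *conf = &dev->vdev.conf;

    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
        conf->num_queues = MIN(dev->auto_num_queues_max,
                               current_machine->smp.cpus);
    }

    qdev_realize(vdev, BUS(&ccw_dev->bus), errp);
}
```

Older machine types would then pin the cap back to 1 via a compat entry along the lines of { "virtio-blk-ccw", "def-num-queues-max", "1" }, keeping their effective default at a single queue.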
On Tue, 15 Dec 2020 12:33:34 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Tue, 15 Dec 2020 09:26:56 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > > Do we have a better idea now about which values would make sense here? > > > > Hi Conny! > > I have nothing new since then. Capping at 4 queues still looks like a > reasonable compromise to me. @Mimu: anything new since then? > > If you like I can spin a new version (we need compat handling now). > > Halil > If your tests show that 4 queues are a reasonable value, this is fine by me (with an explanatory comment in the code.) Feel free to respin.
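For illustration, one way the requested comment could read; the wording is distilled from the rationale discussed in this thread (no interrupt-locality benefit yet, per-queue memory cost, ZONE_DMA pressure for protected guests, measured gains only for protected guests) and is not taken from a posted revision:

```c
    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
        /*
         * Unlike virtio-blk-pci, virtio-blk-ccw does not (yet) get better
         * interrupt locality from one queue per vcpu, and every virtqueue
         * costs guest memory that protected guests additionally have to
         * fit into ZONE_DMA. Measurements showed a throughput benefit from
         * multiqueue for protected guests and no change for normal guests,
         * so cap the automatic default at 4 queues as a compromise between
         * performance and memory consumption.
         */
        conf->num_queues = MIN(4, current_machine->smp.cpus);
    }
```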
diff --git a/hw/s390x/virtio-ccw-blk.c b/hw/s390x/virtio-ccw-blk.c
index 2294ce1ce4..7296140dde 100644
--- a/hw/s390x/virtio-ccw-blk.c
+++ b/hw/s390x/virtio-ccw-blk.c
@@ -10,6 +10,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio.h"
 #include "qapi/error.h"
@@ -20,6 +21,11 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp)
 {
     VirtIOBlkCcw *dev = VIRTIO_BLK_CCW(ccw_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
+    VirtIOBlkConf *conf = &dev->vdev.conf;
+
+    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
+        conf->num_queues = MIN(4, current_machine->smp.cpus);
+    }
 
     qdev_realize(vdev, BUS(&ccw_dev->bus), errp);
 }
Currently the default value of num_queues is effectively 1 for
virtio-blk-ccw. Recently 9445e1e15e ("virtio-blk-pci: default num_queues
to -smp N") changed the default for pci to the number of vcpus, citing
better interrupt locality and better performance as a rationale.

While virtio-blk-ccw does not yet benefit from better interrupt
locality, measurements have shown that for secure VMs multiqueue does
help with performance. Since the bounce buffering happens with the queue
lock held (in the guest), this is hardly a surprise.

As for non-secure VMs, the extra queues shouldn't hurt too much.

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
---

We would prefer to land this for 5.2. If we do, then commit 9445e1e15e
("virtio-blk-pci: default num_queues to -smp N") took care of all the
necessary compat handling.

If that's not possible, I will send a version that does the necessary
compat handling.
---
 hw/s390x/virtio-ccw-blk.c | 6 ++++++
 1 file changed, 6 insertions(+)

base-commit: 2a190a7256a3e0563b29ffd67e0164097b4a6dac
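Also not from the thread, but handy when experimenting with this default: inside the guest, blk-mq exposes one sysfs directory per hardware queue, so the number of request queues the device actually ended up with can be checked directly (assuming the disk shows up as vda):

```sh
ls /sys/block/vda/mq/
# e.g. "0  1  2  3" for a device that negotiated 4 request queues
```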