Message ID | 20200904020907.241550-2-tom.ty89@gmail.com |
---|---|
State | Superseded |
Headers | show |
Series | [v4,1/2] file-posix: split hdev_refresh_limits from raw_refresh_limits | expand |
Maybe you want to add some condition for this: https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659 Or not clamp it at all. On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote: > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears > to have assumed that the only "SCSI Passthrough" is `-device > scsi-generic`, while the fact is there's also `-device scsi-block` > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting > max_sectors is necessary to it (more precisely, hw_max_sectors might > what matters, but BLKSECTGET reports max_sectors, so). > > I'm unsure about how qemu-nbd works, but the commit clearly wasn't the > right approach to fix the original issue it addresses. (It should for > example ignore the max_transfer if it will never matter in to it, or > overrides it in certain cases; when I glimpsed over > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how > it could be file-posix problem when it is reporting the right thing, > regardless of whether "removing" the code helps.) > > I don't think we want to "mark" `-device scsi-block` as sg either. It > will probably bring even more unexpected problems, because they are > literally different sets of things behind the scene / in the kernel. > > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote: > > > > sg devices have different major/minor than their corresponding > > block devices. Using sysfs to get max segments never really worked > > for them. > > > > Fortunately the sg driver provides an ioctl to get sg_tablesize, > > which is apparently equivalent to max segments. > > > > Signed-off-by: Tom Yan <tom.ty89@gmail.com> > > --- > > block/file-posix.c | 17 ++++++++++++++++- > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > index 411a23cf99..9e37594145 100644 > > --- a/block/file-posix.c > > +++ b/block/file-posix.c > > @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd) > > #endif > > } > > > > +static int sg_get_max_segments(int fd) > > +{ > > +#ifdef SG_GET_SG_TABLESIZE > > + long max_segments = 0; > > + > > + if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) { > > + return max_segments; > > + } else { > > + return -errno; > > + } > > +#else > > + return -ENOSYS; > > +#endif > > +} > > + > > static int get_max_transfer_length(int fd) > > { > > #if defined(BLKSECTGET) && defined(BLKSSZGET) > > @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, Error **errp) > > bs->bl.max_transfer = pow2floor(ret); > > } > > > > - ret = get_max_segments(s->fd); > > + ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd); > > if (ret > 0) { > > bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, > > ret * qemu_real_host_page_size); > > -- > > 2.28.0 > >
On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote: > Maybe you want to add some condition for this: > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659 > Or not clamp it at all. > > On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote: > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears > > to have assumed that the only "SCSI Passthrough" is `-device > > scsi-generic`, while the fact is there's also `-device scsi-block` > > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting > > max_sectors is necessary to it (more precisely, hw_max_sectors might > > what matters, but BLKSECTGET reports max_sectors, so). > > > > I'm unsure about how qemu-nbd works, but the commit clearly wasn't the > > right approach to fix the original issue it addresses. (It should for > > example ignore the max_transfer if it will never matter in to it, or > > overrides it in certain cases; when I glimpsed over > > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how > > it could be file-posix problem when it is reporting the right thing, > > regardless of whether "removing" the code helps.) > > > > I don't think we want to "mark" `-device scsi-block` as sg either. It > > will probably bring even more unexpected problems, because they are > > literally different sets of things behind the scene / in the kernel. Yes, I did overlook the fact that scsi-block is kind of hybrid passthough device, doing SG_IO for some things and regular IO for others. I don't think that my approach was wrong way to fix the problem, but as you found out, abusing 'bs->sg' hack (which I would be very happy to remove completely) wasn't a good idea. I actualy was aware of scsi-block and that it does SG_IO but it was forgotten some where on the way. So in summary what the problem is and what I think is the right solution: Each qemu block driver exposes block limits and assumes that they are the same for two IO interfaces the block driver can expose: 1. Regular qemu blk_pread/pwrite alike functions 2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on host block devices/sg char devices, and by iscsi The problem is that these two interfaces can have different block limits. I don't know about iscsi, but for files, doing regular IO is always unlimited, since it passes through the kernel block layer and segemented there on demand which is faster that doing it in userspace, while SG_IO is passed as is to the underlying SCSI device and lacks this segmentation. Regardless of how NBD uses these limits, I think that these limits should be correctly exposed by the block drivers, and therefore I propose is that each qemu block driver would expose a pair of block limits. One for the regular block IO, and other for SG_IO. Then block driver clients (like scsi devices that you mention, nbd, etc) can choose which limit to use, depending on which IO api they use. The scsi-generic, and scsi-block can use the SG_IO limits, while the rest can use the normal (unlimited for file I/O) limits, including the NBD. Best regards, Maxim Levitsky > > > > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote: > > > sg devices have different major/minor than their corresponding > > > block devices. Using sysfs to get max segments never really worked > > > for them. > > > > > > Fortunately the sg driver provides an ioctl to get sg_tablesize, > > > which is apparently equivalent to max segments. > > > > > > Signed-off-by: Tom Yan <tom.ty89@gmail.com> > > > --- > > > block/file-posix.c | 17 ++++++++++++++++- > > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > > index 411a23cf99..9e37594145 100644 > > > --- a/block/file-posix.c > > > +++ b/block/file-posix.c > > > @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd) > > > #endif > > > } > > > > > > +static int sg_get_max_segments(int fd) > > > +{ > > > +#ifdef SG_GET_SG_TABLESIZE > > > + long max_segments = 0; > > > + > > > + if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) { > > > + return max_segments; > > > + } else { > > > + return -errno; > > > + } > > > +#else > > > + return -ENOSYS; > > > +#endif > > > +} > > > + > > > static int get_max_transfer_length(int fd) > > > { > > > #if defined(BLKSECTGET) && defined(BLKSSZGET) > > > @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, Error **errp) > > > bs->bl.max_transfer = pow2floor(ret); > > > } > > > > > > - ret = get_max_segments(s->fd); > > > + ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd); > > > if (ret > 0) { > > > bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, > > > ret * qemu_real_host_page_size); > > > -- > > > 2.28.0 > > >
On Sun, 2020-09-06 at 23:26 +0800, Tom Yan wrote: > I don't disagree with your proposal, but the thing is, do we even need > another field/limit for case 1? For example, do we *really* need to > clamp sizes[2] (NBD_INFO_BLOCK_SIZE) in nbd/server.c and > max_io_sectors (SCSIBlockLimits) in hw/scsi/scsi-disk.c to to any kind > of "dynamic" limit? It depends on if we have other block drivers that have IO segment/max transfer sizes, that we need to enforce, other than the SG_IO case. Knowing that the commit that added these limits to file-posix, which I was fixing is relatively new, I guess there are other cases for these limits. I'll check this tomorrow. > > Either way I can add another field (`max_pwrite`, maybe?) to > BlockLimits, as an infrastructure/hint, but what should be switched to > it, and what value should each block driver set it to, is probably > beyond what I can take. Implementation wise, I think I can do this, but I'll wait few days for the feedback on my proposal. Thanks for finding this bug! Best regards, Maxim Levitsky > > IMHO my patches should be merged first (they are really just fixing a > regression and some bugs). If anyone finds it necessary and is capable > and willing to fix the insufficiency of BlockLimits, they can do it > afterwards with another patch series as an improvement. I can add a > patch to remove the clamping in nbd/server.c as a perhaps-temporary > measure to address the original issue, if desired. > > On Sun, 6 Sep 2020 at 20:53, Maxim Levitsky <mlevitsk@redhat.com> wrote: > > On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote: > > > Maybe you want to add some condition for this: > > > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659 > > > Or not clamp it at all. > > > > > > On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote: > > > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears > > > > to have assumed that the only "SCSI Passthrough" is `-device > > > > scsi-generic`, while the fact is there's also `-device scsi-block` > > > > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting > > > > max_sectors is necessary to it (more precisely, hw_max_sectors might > > > > what matters, but BLKSECTGET reports max_sectors, so). > > > > > > > > I'm unsure about how qemu-nbd works, but the commit clearly wasn't the > > > > right approach to fix the original issue it addresses. (It should for > > > > example ignore the max_transfer if it will never matter in to it, or > > > > overrides it in certain cases; when I glimpsed over > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how > > > > it could be file-posix problem when it is reporting the right thing, > > > > regardless of whether "removing" the code helps.) > > > > > > > > I don't think we want to "mark" `-device scsi-block` as sg either. It > > > > will probably bring even more unexpected problems, because they are > > > > literally different sets of things behind the scene / in the kernel. > > > > Yes, I did overlook the fact that scsi-block is kind of hybrid passthough device, > > doing SG_IO for some things and regular IO for others. > > > > I don't think that my approach was wrong way to fix the problem, but as you found > > out, abusing 'bs->sg' hack (which I would be very happy to remove completely) > > wasn't a good idea. > > I actualy was aware of scsi-block and that it does SG_IO but it > > was forgotten some where on the way. > > > > So in summary what the problem is and what I think is the right solution: > > > > > > Each qemu block driver exposes block limits and assumes that they are the same > > for two IO interfaces the block driver can expose: > > > > 1. Regular qemu blk_pread/pwrite alike functions > > 2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on > > host block devices/sg char devices, and by iscsi > > > > The problem is that these two interfaces can have different block limits. > > > > I don't know about iscsi, but for files, doing regular IO is always unlimited, > > since it passes through the kernel block layer and segemented there on > > demand which is faster that doing it in userspace, while SG_IO is passed as is > > to the underlying SCSI device and lacks this segmentation. > > > > Regardless of how NBD uses these limits, I think that these limits should be correctly > > exposed by the block drivers, and therefore I propose is that each qemu block driver > > would expose a pair of block limits. > > One for the regular block IO, and other for SG_IO. > > > > Then block driver clients (like scsi devices that you mention, nbd, etc) > > can choose which limit to use, depending on which IO api they use. > > The scsi-generic, and scsi-block can use the SG_IO limits, > > while the rest can use the normal (unlimited for file I/O) limits, including the NBD. > > > > Best regards, > > Maxim Levitsky > > > > > > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote: > > > > > sg devices have different major/minor than their corresponding > > > > > block devices. Using sysfs to get max segments never really worked > > > > > for them. > > > > > > > > > > Fortunately the sg driver provides an ioctl to get sg_tablesize, > > > > > which is apparently equivalent to max segments. > > > > > > > > > > Signed-off-by: Tom Yan <tom.ty89@gmail.com> > > > > > --- > > > > > block/file-posix.c | 17 ++++++++++++++++- > > > > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > > > > index 411a23cf99..9e37594145 100644 > > > > > --- a/block/file-posix.c > > > > > +++ b/block/file-posix.c > > > > > @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd) > > > > > #endif > > > > > } > > > > > > > > > > +static int sg_get_max_segments(int fd) > > > > > +{ > > > > > +#ifdef SG_GET_SG_TABLESIZE > > > > > + long max_segments = 0; > > > > > + > > > > > + if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) { > > > > > + return max_segments; > > > > > + } else { > > > > > + return -errno; > > > > > + } > > > > > +#else > > > > > + return -ENOSYS; > > > > > +#endif > > > > > +} > > > > > + > > > > > static int get_max_transfer_length(int fd) > > > > > { > > > > > #if defined(BLKSECTGET) && defined(BLKSSZGET) > > > > > @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, Error **errp) > > > > > bs->bl.max_transfer = pow2floor(ret); > > > > > } > > > > > > > > > > - ret = get_max_segments(s->fd); > > > > > + ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd); > > > > > if (ret > 0) { > > > > > bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, > > > > > ret * qemu_real_host_page_size); > > > > > -- > > > > > 2.28.0 > > > > >
> -----Original Message----- > From: Qemu-block <qemu-block- > bounces+dmitry.fomichev=wdc.com@nongnu.org> On Behalf Of Maxim > Levitsky > Sent: Sunday, September 6, 2020 1:06 PM > To: Tom Yan <tom.ty89@gmail.com>; eblake@redhat.com; > pbonzini@redhat.com; fam@euphon.net; anielhb@linux.vnet.ibm.com; > kwolf@redhat.com; mreitz@redhat.com > Cc: qemu-devel@nongnu.org; qemu-block@nongnu.org > Subject: Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that > actually works with sg > > On Sun, 2020-09-06 at 23:26 +0800, Tom Yan wrote: > > I don't disagree with your proposal, but the thing is, do we even need > > another field/limit for case 1? For example, do we *really* need to > > clamp sizes[2] (NBD_INFO_BLOCK_SIZE) in nbd/server.c and > > max_io_sectors (SCSIBlockLimits) in hw/scsi/scsi-disk.c to to any kind > > of "dynamic" limit? > > It depends on if we have other block drivers that have IO segment/max > transfer sizes, > that we need to enforce, other than the SG_IO case. > > Knowing that the commit that added these limits to file-posix, which I was > fixing is relatively > new, I guess there are other cases for these limits. > > I'll check this tomorrow. > > > > > Either way I can add another field (`max_pwrite`, maybe?) to > > BlockLimits, as an infrastructure/hint, but what should be switched to > > it, and what value should each block driver set it to, is probably > > beyond what I can take. > > Implementation wise, I think I can do this, but I'll wait few days for the > feedback on my proposal. > > Thanks for finding this bug! There was also https://lore.kernel.org/qemu-devel/20200811225122.17342-2-dmitry.fomichev@wdc.com/ posted three weeks ago, but, seemingly, it was ignored... sorry if this bug is already being fixed in a different way, but I think the fix presented in that patch is worth considering. Very best, Dmitry > > Best regards, > Maxim Levitsky > > > > > IMHO my patches should be merged first (they are really just fixing a > > regression and some bugs). If anyone finds it necessary and is capable > > and willing to fix the insufficiency of BlockLimits, they can do it > > afterwards with another patch series as an improvement. I can add a > > patch to remove the clamping in nbd/server.c as a perhaps-temporary > > measure to address the original issue, if desired. > > > > On Sun, 6 Sep 2020 at 20:53, Maxim Levitsky <mlevitsk@redhat.com> > wrote: > > > On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote: > > > > Maybe you want to add some condition for this: > > > > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659 > > > > Or not clamp it at all. > > > > > > > > On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote: > > > > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky > appears > > > > > to have assumed that the only "SCSI Passthrough" is `-device > > > > > scsi-generic`, while the fact is there's also `-device scsi-block` > > > > > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting > > > > > max_sectors is necessary to it (more precisely, hw_max_sectors > might > > > > > what matters, but BLKSECTGET reports max_sectors, so). > > > > > > > > > > I'm unsure about how qemu-nbd works, but the commit clearly > wasn't the > > > > > right approach to fix the original issue it addresses. (It should for > > > > > example ignore the max_transfer if it will never matter in to it, or > > > > > overrides it in certain cases; when I glimpsed over > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see > how > > > > > it could be file-posix problem when it is reporting the right thing, > > > > > regardless of whether "removing" the code helps.) > > > > > > > > > > I don't think we want to "mark" `-device scsi-block` as sg either. It > > > > > will probably bring even more unexpected problems, because they > are > > > > > literally different sets of things behind the scene / in the kernel. > > > > > > Yes, I did overlook the fact that scsi-block is kind of hybrid passthough > device, > > > doing SG_IO for some things and regular IO for others. > > > > > > I don't think that my approach was wrong way to fix the problem, but as > you found > > > out, abusing 'bs->sg' hack (which I would be very happy to remove > completely) > > > wasn't a good idea. > > > I actualy was aware of scsi-block and that it does SG_IO but it > > > was forgotten some where on the way. > > > > > > So in summary what the problem is and what I think is the right solution: > > > > > > > > > Each qemu block driver exposes block limits and assumes that they are > the same > > > for two IO interfaces the block driver can expose: > > > > > > 1. Regular qemu blk_pread/pwrite alike functions > > > 2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on > > > host block devices/sg char devices, and by iscsi > > > > > > The problem is that these two interfaces can have different block limits. > > > > > > I don't know about iscsi, but for files, doing regular IO is always unlimited, > > > since it passes through the kernel block layer and segemented there on > > > demand which is faster that doing it in userspace, while SG_IO is passed > as is > > > to the underlying SCSI device and lacks this segmentation. > > > > > > Regardless of how NBD uses these limits, I think that these limits should > be correctly > > > exposed by the block drivers, and therefore I propose is that each qemu > block driver > > > would expose a pair of block limits. > > > One for the regular block IO, and other for SG_IO. > > > > > > Then block driver clients (like scsi devices that you mention, nbd, etc) > > > can choose which limit to use, depending on which IO api they use. > > > The scsi-generic, and scsi-block can use the SG_IO limits, > > > while the rest can use the normal (unlimited for file I/O) limits, including > the NBD. > > > > > > Best regards, > > > Maxim Levitsky > > > > > > > > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote: > > > > > > sg devices have different major/minor than their corresponding > > > > > > block devices. Using sysfs to get max segments never really worked > > > > > > for them. > > > > > > > > > > > > Fortunately the sg driver provides an ioctl to get sg_tablesize, > > > > > > which is apparently equivalent to max segments. > > > > > > > > > > > > Signed-off-by: Tom Yan <tom.ty89@gmail.com> > > > > > > --- > > > > > > block/file-posix.c | 17 ++++++++++++++++- > > > > > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > > > > > index 411a23cf99..9e37594145 100644 > > > > > > --- a/block/file-posix.c > > > > > > +++ b/block/file-posix.c > > > > > > @@ -1178,6 +1178,21 @@ static int > sg_get_max_transfer_length(int fd) > > > > > > #endif > > > > > > } > > > > > > > > > > > > +static int sg_get_max_segments(int fd) > > > > > > +{ > > > > > > +#ifdef SG_GET_SG_TABLESIZE > > > > > > + long max_segments = 0; > > > > > > + > > > > > > + if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) { > > > > > > + return max_segments; > > > > > > + } else { > > > > > > + return -errno; > > > > > > + } > > > > > > +#else > > > > > > + return -ENOSYS; > > > > > > +#endif > > > > > > +} > > > > > > + > > > > > > static int get_max_transfer_length(int fd) > > > > > > { > > > > > > #if defined(BLKSECTGET) && defined(BLKSSZGET) > > > > > > @@ -1268,7 +1283,7 @@ static void > hdev_refresh_limits(BlockDriverState *bs, Error **errp) > > > > > > bs->bl.max_transfer = pow2floor(ret); > > > > > > } > > > > > > > > > > > > - ret = get_max_segments(s->fd); > > > > > > + ret = bs->sg ? sg_get_max_segments(s->fd) : > get_max_segments(s->fd); > > > > > > if (ret > 0) { > > > > > > bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, > > > > > > ret * qemu_real_host_page_size); > > > > > > -- > > > > > > 2.28.0 > > > > > > > >
Your patch only fixed it for sg devices. S_ISBLK() will never be true when sg_get_max_segments() is only called in raw_refresh_limits() when bs->sg is true. With that said, it's not like we cannot ditch the bs->sg check in raw_refresh_limits() and switch to S_ISBLK()/S_ISCHR(). (Although not strictly necessary, I would also do S_ISCHR() for SG_GET_SG_TABLESIZE.) I don't know if it's a better approach though. Seems repetitive to me (and bs->sg is not only used/checked in file-posix/raw_refresh_limits): https://github.com/qemu/qemu/blob/v5.1.0/block/file-posix.c#L3390 https://github.com/qemu/qemu/blob/v5.1.0/block/file-posix.c#L3504 On Mon, 7 Sep 2020 at 03:50, Dmitry Fomichev <Dmitry.Fomichev@wdc.com> wrote: > > > > > -----Original Message----- > > From: Qemu-block <qemu-block- > > bounces+dmitry.fomichev=wdc.com@nongnu.org> On Behalf Of Maxim > > Levitsky > > Sent: Sunday, September 6, 2020 1:06 PM > > To: Tom Yan <tom.ty89@gmail.com>; eblake@redhat.com; > > pbonzini@redhat.com; fam@euphon.net; anielhb@linux.vnet.ibm.com; > > kwolf@redhat.com; mreitz@redhat.com > > Cc: qemu-devel@nongnu.org; qemu-block@nongnu.org > > Subject: Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that > > actually works with sg > > > > On Sun, 2020-09-06 at 23:26 +0800, Tom Yan wrote: > > > I don't disagree with your proposal, but the thing is, do we even need > > > another field/limit for case 1? For example, do we *really* need to > > > clamp sizes[2] (NBD_INFO_BLOCK_SIZE) in nbd/server.c and > > > max_io_sectors (SCSIBlockLimits) in hw/scsi/scsi-disk.c to to any kind > > > of "dynamic" limit? > > > > It depends on if we have other block drivers that have IO segment/max > > transfer sizes, > > that we need to enforce, other than the SG_IO case. > > > > Knowing that the commit that added these limits to file-posix, which I was > > fixing is relatively > > new, I guess there are other cases for these limits. > > > > I'll check this tomorrow. > > > > > > > > Either way I can add another field (`max_pwrite`, maybe?) to > > > BlockLimits, as an infrastructure/hint, but what should be switched to > > > it, and what value should each block driver set it to, is probably > > > beyond what I can take. > > > > Implementation wise, I think I can do this, but I'll wait few days for the > > feedback on my proposal. > > > > Thanks for finding this bug! > > There was also https://lore.kernel.org/qemu-devel/20200811225122.17342-2-dmitry.fomichev@wdc.com/ > posted three weeks ago, but, seemingly, it was ignored... sorry if this > bug is already being fixed in a different way, but I think the fix presented > in that patch is worth considering. > > Very best, > Dmitry > > > > > Best regards, > > Maxim Levitsky > > > > > > > > IMHO my patches should be merged first (they are really just fixing a > > > regression and some bugs). If anyone finds it necessary and is capable > > > and willing to fix the insufficiency of BlockLimits, they can do it > > > afterwards with another patch series as an improvement. I can add a > > > patch to remove the clamping in nbd/server.c as a perhaps-temporary > > > measure to address the original issue, if desired. > > > > > > On Sun, 6 Sep 2020 at 20:53, Maxim Levitsky <mlevitsk@redhat.com> > > wrote: > > > > On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote: > > > > > Maybe you want to add some condition for this: > > > > > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659 > > > > > Or not clamp it at all. > > > > > > > > > > On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote: > > > > > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky > > appears > > > > > > to have assumed that the only "SCSI Passthrough" is `-device > > > > > > scsi-generic`, while the fact is there's also `-device scsi-block` > > > > > > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting > > > > > > max_sectors is necessary to it (more precisely, hw_max_sectors > > might > > > > > > what matters, but BLKSECTGET reports max_sectors, so). > > > > > > > > > > > > I'm unsure about how qemu-nbd works, but the commit clearly > > wasn't the > > > > > > right approach to fix the original issue it addresses. (It should for > > > > > > example ignore the max_transfer if it will never matter in to it, or > > > > > > overrides it in certain cases; when I glimpsed over > > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see > > how > > > > > > it could be file-posix problem when it is reporting the right thing, > > > > > > regardless of whether "removing" the code helps.) > > > > > > > > > > > > I don't think we want to "mark" `-device scsi-block` as sg either. It > > > > > > will probably bring even more unexpected problems, because they > > are > > > > > > literally different sets of things behind the scene / in the kernel. > > > > > > > > Yes, I did overlook the fact that scsi-block is kind of hybrid passthough > > device, > > > > doing SG_IO for some things and regular IO for others. > > > > > > > > I don't think that my approach was wrong way to fix the problem, but as > > you found > > > > out, abusing 'bs->sg' hack (which I would be very happy to remove > > completely) > > > > wasn't a good idea. > > > > I actualy was aware of scsi-block and that it does SG_IO but it > > > > was forgotten some where on the way. > > > > > > > > So in summary what the problem is and what I think is the right solution: > > > > > > > > > > > > Each qemu block driver exposes block limits and assumes that they are > > the same > > > > for two IO interfaces the block driver can expose: > > > > > > > > 1. Regular qemu blk_pread/pwrite alike functions > > > > 2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on > > > > host block devices/sg char devices, and by iscsi > > > > > > > > The problem is that these two interfaces can have different block limits. > > > > > > > > I don't know about iscsi, but for files, doing regular IO is always unlimited, > > > > since it passes through the kernel block layer and segemented there on > > > > demand which is faster that doing it in userspace, while SG_IO is passed > > as is > > > > to the underlying SCSI device and lacks this segmentation. > > > > > > > > Regardless of how NBD uses these limits, I think that these limits should > > be correctly > > > > exposed by the block drivers, and therefore I propose is that each qemu > > block driver > > > > would expose a pair of block limits. > > > > One for the regular block IO, and other for SG_IO. > > > > > > > > Then block driver clients (like scsi devices that you mention, nbd, etc) > > > > can choose which limit to use, depending on which IO api they use. > > > > The scsi-generic, and scsi-block can use the SG_IO limits, > > > > while the rest can use the normal (unlimited for file I/O) limits, including > > the NBD. > > > > > > > > Best regards, > > > > Maxim Levitsky > > > > > > > > > > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote: > > > > > > > sg devices have different major/minor than their corresponding > > > > > > > block devices. Using sysfs to get max segments never really worked > > > > > > > for them. > > > > > > > > > > > > > > Fortunately the sg driver provides an ioctl to get sg_tablesize, > > > > > > > which is apparently equivalent to max segments. > > > > > > > > > > > > > > Signed-off-by: Tom Yan <tom.ty89@gmail.com> > > > > > > > --- > > > > > > > block/file-posix.c | 17 ++++++++++++++++- > > > > > > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > > > > > > index 411a23cf99..9e37594145 100644 > > > > > > > --- a/block/file-posix.c > > > > > > > +++ b/block/file-posix.c > > > > > > > @@ -1178,6 +1178,21 @@ static int > > sg_get_max_transfer_length(int fd) > > > > > > > #endif > > > > > > > } > > > > > > > > > > > > > > +static int sg_get_max_segments(int fd) > > > > > > > +{ > > > > > > > +#ifdef SG_GET_SG_TABLESIZE > > > > > > > + long max_segments = 0; > > > > > > > + > > > > > > > + if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) { > > > > > > > + return max_segments; > > > > > > > + } else { > > > > > > > + return -errno; > > > > > > > + } > > > > > > > +#else > > > > > > > + return -ENOSYS; > > > > > > > +#endif > > > > > > > +} > > > > > > > + > > > > > > > static int get_max_transfer_length(int fd) > > > > > > > { > > > > > > > #if defined(BLKSECTGET) && defined(BLKSSZGET) > > > > > > > @@ -1268,7 +1283,7 @@ static void > > hdev_refresh_limits(BlockDriverState *bs, Error **errp) > > > > > > > bs->bl.max_transfer = pow2floor(ret); > > > > > > > } > > > > > > > > > > > > > > - ret = get_max_segments(s->fd); > > > > > > > + ret = bs->sg ? sg_get_max_segments(s->fd) : > > get_max_segments(s->fd); > > > > > > > if (ret > 0) { > > > > > > > bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, > > > > > > > ret * qemu_real_host_page_size); > > > > > > > -- > > > > > > > 2.28.0 > > > > > > > > > > > >
diff --git a/block/file-posix.c b/block/file-posix.c index 411a23cf99..9e37594145 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd) #endif } +static int sg_get_max_segments(int fd) +{ +#ifdef SG_GET_SG_TABLESIZE + long max_segments = 0; + + if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) { + return max_segments; + } else { + return -errno; + } +#else + return -ENOSYS; +#endif +} + static int get_max_transfer_length(int fd) { #if defined(BLKSECTGET) && defined(BLKSSZGET) @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, Error **errp) bs->bl.max_transfer = pow2floor(ret); } - ret = get_max_segments(s->fd); + ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd); if (ret > 0) { bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, ret * qemu_real_host_page_size);
sg devices have different major/minor than their corresponding block devices. Using sysfs to get max segments never really worked for them. Fortunately the sg driver provides an ioctl to get sg_tablesize, which is apparently equivalent to max segments. Signed-off-by: Tom Yan <tom.ty89@gmail.com> --- block/file-posix.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)