diff mbox series

[v4,2/2] file-posix: add sg_get_max_segments that actually works with sg

Message ID 20200904020907.241550-2-tom.ty89@gmail.com
State Superseded
Headers show
Series [v4,1/2] file-posix: split hdev_refresh_limits from raw_refresh_limits | expand

Commit Message

Tom Yan Sept. 4, 2020, 2:09 a.m. UTC
sg devices have different major/minor than their corresponding
block devices. Using sysfs to get max segments never really worked
for them.

Fortunately the sg driver provides an ioctl to get sg_tablesize,
which is apparently equivalent to max segments.

Signed-off-by: Tom Yan <tom.ty89@gmail.com>
---
 block/file-posix.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Comments

Tom Yan Sept. 6, 2020, 11:04 a.m. UTC | #1
Maybe you want to add some condition for this:
https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659
Or not clamp it at all.

On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote:
>
> In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears
> to have assumed that the only "SCSI Passthrough" is `-device
> scsi-generic`, while the fact is there's also `-device scsi-block`
> (passthrough without the sg driver). Unlike `-device scsi-hd`, getting
> max_sectors is necessary to it (more precisely, hw_max_sectors might
> what matters, but BLKSECTGET reports max_sectors, so).
>
> I'm unsure about how qemu-nbd works, but the commit clearly wasn't the
> right approach to fix the original issue it addresses. (It should for
> example ignore the max_transfer if it will never matter in to it, or
> overrides it in certain cases; when I glimpsed over
> https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how
> it could be file-posix problem when it is reporting the right thing,
> regardless of whether "removing" the code helps.)
>
> I don't think we want to "mark" `-device scsi-block` as sg either. It
> will probably bring even more unexpected problems, because they are
> literally different sets of things behind the scene / in the kernel.
>
> On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote:
> >
> > sg devices have different major/minor than their corresponding
> > block devices. Using sysfs to get max segments never really worked
> > for them.
> >
> > Fortunately the sg driver provides an ioctl to get sg_tablesize,
> > which is apparently equivalent to max segments.
> >
> > Signed-off-by: Tom Yan <tom.ty89@gmail.com>
> > ---
> >  block/file-posix.c | 17 ++++++++++++++++-
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 411a23cf99..9e37594145 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd)
> >  #endif
> >  }
> >
> > +static int sg_get_max_segments(int fd)
> > +{
> > +#ifdef SG_GET_SG_TABLESIZE
> > +    long max_segments = 0;
> > +
> > +    if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) {
> > +        return max_segments;
> > +    } else {
> > +        return -errno;
> > +    }
> > +#else
> > +    return -ENOSYS;
> > +#endif
> > +}
> > +
> >  static int get_max_transfer_length(int fd)
> >  {
> >  #if defined(BLKSECTGET) && defined(BLKSSZGET)
> > @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
> >          bs->bl.max_transfer = pow2floor(ret);
> >      }
> >
> > -    ret = get_max_segments(s->fd);
> > +    ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
> >      if (ret > 0) {
> >          bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> >                                             ret * qemu_real_host_page_size);
> > --
> > 2.28.0
> >
Maxim Levitsky Sept. 6, 2020, 12:53 p.m. UTC | #2
On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote:
> Maybe you want to add some condition for this:
> https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659
> Or not clamp it at all.
> 
> On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote:
> > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears
> > to have assumed that the only "SCSI Passthrough" is `-device
> > scsi-generic`, while the fact is there's also `-device scsi-block`
> > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting
> > max_sectors is necessary to it (more precisely, hw_max_sectors might
> > what matters, but BLKSECTGET reports max_sectors, so).
> > 
> > I'm unsure about how qemu-nbd works, but the commit clearly wasn't the
> > right approach to fix the original issue it addresses. (It should for
> > example ignore the max_transfer if it will never matter in to it, or
> > overrides it in certain cases; when I glimpsed over
> > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how
> > it could be file-posix problem when it is reporting the right thing,
> > regardless of whether "removing" the code helps.)
> > 
> > I don't think we want to "mark" `-device scsi-block` as sg either. It
> > will probably bring even more unexpected problems, because they are
> > literally different sets of things behind the scene / in the kernel.

Yes, I did overlook the fact that scsi-block is kind of hybrid passthough device,
doing SG_IO for some things and regular IO for others.

I don't think that my approach was wrong way to fix the problem, but as you found
out, abusing 'bs->sg' hack (which I would be very happy to remove completely)
wasn't a good idea.
I actualy was aware of scsi-block and that it does SG_IO but it 
was forgotten some where on the way.

So in summary what the problem is and what I think is the right solution:


Each qemu block driver exposes block limits and assumes that they are the same 
for two IO interfaces the block driver can expose:

1. Regular qemu blk_pread/pwrite alike functions
2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on 
   host block devices/sg char devices, and by iscsi

The problem is that these two interfaces can have different block limits.

I don't know about iscsi, but for files, doing regular IO is always unlimited,
since it passes through the kernel block layer and segemented there on 
demand which is faster that doing it in userspace, while SG_IO is passed as is
to the underlying SCSI device and lacks this segmentation.

Regardless of how NBD uses these limits, I think that these limits should be correctly
exposed by the block drivers, and therefore I propose is that each qemu block driver 
would expose a pair of block limits.
One for the regular block IO, and other for SG_IO.

Then block driver clients (like scsi devices that you mention, nbd, etc) 
can choose which limit to use, depending on which IO api they use.
The scsi-generic, and scsi-block can use the SG_IO limits, 
while the rest  can use the normal (unlimited for file I/O) limits, including the NBD.

Best regards,
	Maxim Levitsky

> > 
> > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote:
> > > sg devices have different major/minor than their corresponding
> > > block devices. Using sysfs to get max segments never really worked
> > > for them.
> > > 
> > > Fortunately the sg driver provides an ioctl to get sg_tablesize,
> > > which is apparently equivalent to max segments.
> > > 
> > > Signed-off-by: Tom Yan <tom.ty89@gmail.com>
> > > ---
> > >  block/file-posix.c | 17 ++++++++++++++++-
> > >  1 file changed, 16 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 411a23cf99..9e37594145 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd)
> > >  #endif
> > >  }
> > > 
> > > +static int sg_get_max_segments(int fd)
> > > +{
> > > +#ifdef SG_GET_SG_TABLESIZE
> > > +    long max_segments = 0;
> > > +
> > > +    if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) {
> > > +        return max_segments;
> > > +    } else {
> > > +        return -errno;
> > > +    }
> > > +#else
> > > +    return -ENOSYS;
> > > +#endif
> > > +}
> > > +
> > >  static int get_max_transfer_length(int fd)
> > >  {
> > >  #if defined(BLKSECTGET) && defined(BLKSSZGET)
> > > @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
> > >          bs->bl.max_transfer = pow2floor(ret);
> > >      }
> > > 
> > > -    ret = get_max_segments(s->fd);
> > > +    ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
> > >      if (ret > 0) {
> > >          bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> > >                                             ret * qemu_real_host_page_size);
> > > --
> > > 2.28.0
> > >
Maxim Levitsky Sept. 6, 2020, 5:05 p.m. UTC | #3
On Sun, 2020-09-06 at 23:26 +0800, Tom Yan wrote:
> I don't disagree with your proposal, but the thing is, do we even need
> another field/limit for case 1? For example, do we *really* need to
> clamp sizes[2] (NBD_INFO_BLOCK_SIZE) in nbd/server.c and
> max_io_sectors (SCSIBlockLimits) in hw/scsi/scsi-disk.c to to any kind
> of "dynamic" limit?

It depends on if we have other block drivers that have IO segment/max transfer sizes,
that we need to enforce, other than the SG_IO case.

Knowing that the commit that added these limits to file-posix, which I was fixing is relatively
new, I guess there are other cases for these limits.

I'll check this tomorrow.

> 
> Either way I can add another field (`max_pwrite`, maybe?) to
> BlockLimits, as an infrastructure/hint, but what should be switched to
> it, and what value should each block driver set it to, is probably
> beyond what I can take.

Implementation wise, I think I can do this, but I'll wait few days for the
feedback on my proposal.

Thanks for finding this bug!

Best regards,
	Maxim Levitsky

> 
> IMHO my patches should be merged first (they are really just fixing a
> regression and some bugs). If anyone finds it necessary and is capable
> and willing to fix the insufficiency of BlockLimits, they can do it
> afterwards with another patch series as an improvement. I can add a
> patch to remove the clamping in nbd/server.c as a perhaps-temporary
> measure to address the original issue, if desired.
> 
> On Sun, 6 Sep 2020 at 20:53, Maxim Levitsky <mlevitsk@redhat.com> wrote:
> > On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote:
> > > Maybe you want to add some condition for this:
> > > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659
> > > Or not clamp it at all.
> > > 
> > > On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote:
> > > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears
> > > > to have assumed that the only "SCSI Passthrough" is `-device
> > > > scsi-generic`, while the fact is there's also `-device scsi-block`
> > > > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting
> > > > max_sectors is necessary to it (more precisely, hw_max_sectors might
> > > > what matters, but BLKSECTGET reports max_sectors, so).
> > > > 
> > > > I'm unsure about how qemu-nbd works, but the commit clearly wasn't the
> > > > right approach to fix the original issue it addresses. (It should for
> > > > example ignore the max_transfer if it will never matter in to it, or
> > > > overrides it in certain cases; when I glimpsed over
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how
> > > > it could be file-posix problem when it is reporting the right thing,
> > > > regardless of whether "removing" the code helps.)
> > > > 
> > > > I don't think we want to "mark" `-device scsi-block` as sg either. It
> > > > will probably bring even more unexpected problems, because they are
> > > > literally different sets of things behind the scene / in the kernel.
> > 
> > Yes, I did overlook the fact that scsi-block is kind of hybrid passthough device,
> > doing SG_IO for some things and regular IO for others.
> > 
> > I don't think that my approach was wrong way to fix the problem, but as you found
> > out, abusing 'bs->sg' hack (which I would be very happy to remove completely)
> > wasn't a good idea.
> > I actualy was aware of scsi-block and that it does SG_IO but it
> > was forgotten some where on the way.
> > 
> > So in summary what the problem is and what I think is the right solution:
> > 
> > 
> > Each qemu block driver exposes block limits and assumes that they are the same
> > for two IO interfaces the block driver can expose:
> > 
> > 1. Regular qemu blk_pread/pwrite alike functions
> > 2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on
> >    host block devices/sg char devices, and by iscsi
> > 
> > The problem is that these two interfaces can have different block limits.
> > 
> > I don't know about iscsi, but for files, doing regular IO is always unlimited,
> > since it passes through the kernel block layer and segemented there on
> > demand which is faster that doing it in userspace, while SG_IO is passed as is
> > to the underlying SCSI device and lacks this segmentation.
> > 
> > Regardless of how NBD uses these limits, I think that these limits should be correctly
> > exposed by the block drivers, and therefore I propose is that each qemu block driver
> > would expose a pair of block limits.
> > One for the regular block IO, and other for SG_IO.
> > 
> > Then block driver clients (like scsi devices that you mention, nbd, etc)
> > can choose which limit to use, depending on which IO api they use.
> > The scsi-generic, and scsi-block can use the SG_IO limits,
> > while the rest  can use the normal (unlimited for file I/O) limits, including the NBD.
> > 
> > Best regards,
> >         Maxim Levitsky
> > 
> > > > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote:
> > > > > sg devices have different major/minor than their corresponding
> > > > > block devices. Using sysfs to get max segments never really worked
> > > > > for them.
> > > > > 
> > > > > Fortunately the sg driver provides an ioctl to get sg_tablesize,
> > > > > which is apparently equivalent to max segments.
> > > > > 
> > > > > Signed-off-by: Tom Yan <tom.ty89@gmail.com>
> > > > > ---
> > > > >  block/file-posix.c | 17 ++++++++++++++++-
> > > > >  1 file changed, 16 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > > > index 411a23cf99..9e37594145 100644
> > > > > --- a/block/file-posix.c
> > > > > +++ b/block/file-posix.c
> > > > > @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd)
> > > > >  #endif
> > > > >  }
> > > > > 
> > > > > +static int sg_get_max_segments(int fd)
> > > > > +{
> > > > > +#ifdef SG_GET_SG_TABLESIZE
> > > > > +    long max_segments = 0;
> > > > > +
> > > > > +    if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) {
> > > > > +        return max_segments;
> > > > > +    } else {
> > > > > +        return -errno;
> > > > > +    }
> > > > > +#else
> > > > > +    return -ENOSYS;
> > > > > +#endif
> > > > > +}
> > > > > +
> > > > >  static int get_max_transfer_length(int fd)
> > > > >  {
> > > > >  #if defined(BLKSECTGET) && defined(BLKSSZGET)
> > > > > @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
> > > > >          bs->bl.max_transfer = pow2floor(ret);
> > > > >      }
> > > > > 
> > > > > -    ret = get_max_segments(s->fd);
> > > > > +    ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
> > > > >      if (ret > 0) {
> > > > >          bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> > > > >                                             ret * qemu_real_host_page_size);
> > > > > --
> > > > > 2.28.0
> > > > >
Dmitry Fomichev Sept. 6, 2020, 7:50 p.m. UTC | #4
> -----Original Message-----

> From: Qemu-block <qemu-block-

> bounces+dmitry.fomichev=wdc.com@nongnu.org> On Behalf Of Maxim

> Levitsky

> Sent: Sunday, September 6, 2020 1:06 PM

> To: Tom Yan <tom.ty89@gmail.com>; eblake@redhat.com;

> pbonzini@redhat.com; fam@euphon.net; anielhb@linux.vnet.ibm.com;

> kwolf@redhat.com; mreitz@redhat.com

> Cc: qemu-devel@nongnu.org; qemu-block@nongnu.org

> Subject: Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that

> actually works with sg

> 

> On Sun, 2020-09-06 at 23:26 +0800, Tom Yan wrote:

> > I don't disagree with your proposal, but the thing is, do we even need

> > another field/limit for case 1? For example, do we *really* need to

> > clamp sizes[2] (NBD_INFO_BLOCK_SIZE) in nbd/server.c and

> > max_io_sectors (SCSIBlockLimits) in hw/scsi/scsi-disk.c to to any kind

> > of "dynamic" limit?

> 

> It depends on if we have other block drivers that have IO segment/max

> transfer sizes,

> that we need to enforce, other than the SG_IO case.

> 

> Knowing that the commit that added these limits to file-posix, which I was

> fixing is relatively

> new, I guess there are other cases for these limits.

> 

> I'll check this tomorrow.

> 

> >

> > Either way I can add another field (`max_pwrite`, maybe?) to

> > BlockLimits, as an infrastructure/hint, but what should be switched to

> > it, and what value should each block driver set it to, is probably

> > beyond what I can take.

> 

> Implementation wise, I think I can do this, but I'll wait few days for the

> feedback on my proposal.

> 

> Thanks for finding this bug!


There was also https://lore.kernel.org/qemu-devel/20200811225122.17342-2-dmitry.fomichev@wdc.com/
posted three weeks ago, but, seemingly, it was ignored... sorry if this
bug is already being fixed in a different way, but I think the fix presented
in that patch is worth considering.

Very best,
Dmitry

> 

> Best regards,

> 	Maxim Levitsky

> 

> >

> > IMHO my patches should be merged first (they are really just fixing a

> > regression and some bugs). If anyone finds it necessary and is capable

> > and willing to fix the insufficiency of BlockLimits, they can do it

> > afterwards with another patch series as an improvement. I can add a

> > patch to remove the clamping in nbd/server.c as a perhaps-temporary

> > measure to address the original issue, if desired.

> >

> > On Sun, 6 Sep 2020 at 20:53, Maxim Levitsky <mlevitsk@redhat.com>

> wrote:

> > > On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote:

> > > > Maybe you want to add some condition for this:

> > > > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659

> > > > Or not clamp it at all.

> > > >

> > > > On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote:

> > > > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky

> appears

> > > > > to have assumed that the only "SCSI Passthrough" is `-device

> > > > > scsi-generic`, while the fact is there's also `-device scsi-block`

> > > > > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting

> > > > > max_sectors is necessary to it (more precisely, hw_max_sectors

> might

> > > > > what matters, but BLKSECTGET reports max_sectors, so).

> > > > >

> > > > > I'm unsure about how qemu-nbd works, but the commit clearly

> wasn't the

> > > > > right approach to fix the original issue it addresses. (It should for

> > > > > example ignore the max_transfer if it will never matter in to it, or

> > > > > overrides it in certain cases; when I glimpsed over

> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see

> how

> > > > > it could be file-posix problem when it is reporting the right thing,

> > > > > regardless of whether "removing" the code helps.)

> > > > >

> > > > > I don't think we want to "mark" `-device scsi-block` as sg either. It

> > > > > will probably bring even more unexpected problems, because they

> are

> > > > > literally different sets of things behind the scene / in the kernel.

> > >

> > > Yes, I did overlook the fact that scsi-block is kind of hybrid passthough

> device,

> > > doing SG_IO for some things and regular IO for others.

> > >

> > > I don't think that my approach was wrong way to fix the problem, but as

> you found

> > > out, abusing 'bs->sg' hack (which I would be very happy to remove

> completely)

> > > wasn't a good idea.

> > > I actualy was aware of scsi-block and that it does SG_IO but it

> > > was forgotten some where on the way.

> > >

> > > So in summary what the problem is and what I think is the right solution:

> > >

> > >

> > > Each qemu block driver exposes block limits and assumes that they are

> the same

> > > for two IO interfaces the block driver can expose:

> > >

> > > 1. Regular qemu blk_pread/pwrite alike functions

> > > 2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on

> > >    host block devices/sg char devices, and by iscsi

> > >

> > > The problem is that these two interfaces can have different block limits.

> > >

> > > I don't know about iscsi, but for files, doing regular IO is always unlimited,

> > > since it passes through the kernel block layer and segemented there on

> > > demand which is faster that doing it in userspace, while SG_IO is passed

> as is

> > > to the underlying SCSI device and lacks this segmentation.

> > >

> > > Regardless of how NBD uses these limits, I think that these limits should

> be correctly

> > > exposed by the block drivers, and therefore I propose is that each qemu

> block driver

> > > would expose a pair of block limits.

> > > One for the regular block IO, and other for SG_IO.

> > >

> > > Then block driver clients (like scsi devices that you mention, nbd, etc)

> > > can choose which limit to use, depending on which IO api they use.

> > > The scsi-generic, and scsi-block can use the SG_IO limits,

> > > while the rest  can use the normal (unlimited for file I/O) limits, including

> the NBD.

> > >

> > > Best regards,

> > >         Maxim Levitsky

> > >

> > > > > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote:

> > > > > > sg devices have different major/minor than their corresponding

> > > > > > block devices. Using sysfs to get max segments never really worked

> > > > > > for them.

> > > > > >

> > > > > > Fortunately the sg driver provides an ioctl to get sg_tablesize,

> > > > > > which is apparently equivalent to max segments.

> > > > > >

> > > > > > Signed-off-by: Tom Yan <tom.ty89@gmail.com>

> > > > > > ---

> > > > > >  block/file-posix.c | 17 ++++++++++++++++-

> > > > > >  1 file changed, 16 insertions(+), 1 deletion(-)

> > > > > >

> > > > > > diff --git a/block/file-posix.c b/block/file-posix.c

> > > > > > index 411a23cf99..9e37594145 100644

> > > > > > --- a/block/file-posix.c

> > > > > > +++ b/block/file-posix.c

> > > > > > @@ -1178,6 +1178,21 @@ static int

> sg_get_max_transfer_length(int fd)

> > > > > >  #endif

> > > > > >  }

> > > > > >

> > > > > > +static int sg_get_max_segments(int fd)

> > > > > > +{

> > > > > > +#ifdef SG_GET_SG_TABLESIZE

> > > > > > +    long max_segments = 0;

> > > > > > +

> > > > > > +    if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) {

> > > > > > +        return max_segments;

> > > > > > +    } else {

> > > > > > +        return -errno;

> > > > > > +    }

> > > > > > +#else

> > > > > > +    return -ENOSYS;

> > > > > > +#endif

> > > > > > +}

> > > > > > +

> > > > > >  static int get_max_transfer_length(int fd)

> > > > > >  {

> > > > > >  #if defined(BLKSECTGET) && defined(BLKSSZGET)

> > > > > > @@ -1268,7 +1283,7 @@ static void

> hdev_refresh_limits(BlockDriverState *bs, Error **errp)

> > > > > >          bs->bl.max_transfer = pow2floor(ret);

> > > > > >      }

> > > > > >

> > > > > > -    ret = get_max_segments(s->fd);

> > > > > > +    ret = bs->sg ? sg_get_max_segments(s->fd) :

> get_max_segments(s->fd);

> > > > > >      if (ret > 0) {

> > > > > >          bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,

> > > > > >                                             ret * qemu_real_host_page_size);

> > > > > > --

> > > > > > 2.28.0

> > > > > >

> 

>
Tom Yan Sept. 7, 2020, 1:29 a.m. UTC | #5
Your patch only fixed it for sg devices. S_ISBLK() will never be true
when sg_get_max_segments() is only called in raw_refresh_limits() when
bs->sg is true.

With that said, it's not like we cannot ditch the bs->sg check in
raw_refresh_limits() and switch to S_ISBLK()/S_ISCHR(). (Although not
strictly necessary, I would also do S_ISCHR() for
SG_GET_SG_TABLESIZE.)

I don't know if it's a better approach though. Seems repetitive to me
(and bs->sg is not only used/checked in
file-posix/raw_refresh_limits):
https://github.com/qemu/qemu/blob/v5.1.0/block/file-posix.c#L3390
https://github.com/qemu/qemu/blob/v5.1.0/block/file-posix.c#L3504

On Mon, 7 Sep 2020 at 03:50, Dmitry Fomichev <Dmitry.Fomichev@wdc.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Qemu-block <qemu-block-
> > bounces+dmitry.fomichev=wdc.com@nongnu.org> On Behalf Of Maxim
> > Levitsky
> > Sent: Sunday, September 6, 2020 1:06 PM
> > To: Tom Yan <tom.ty89@gmail.com>; eblake@redhat.com;
> > pbonzini@redhat.com; fam@euphon.net; anielhb@linux.vnet.ibm.com;
> > kwolf@redhat.com; mreitz@redhat.com
> > Cc: qemu-devel@nongnu.org; qemu-block@nongnu.org
> > Subject: Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that
> > actually works with sg
> >
> > On Sun, 2020-09-06 at 23:26 +0800, Tom Yan wrote:
> > > I don't disagree with your proposal, but the thing is, do we even need
> > > another field/limit for case 1? For example, do we *really* need to
> > > clamp sizes[2] (NBD_INFO_BLOCK_SIZE) in nbd/server.c and
> > > max_io_sectors (SCSIBlockLimits) in hw/scsi/scsi-disk.c to to any kind
> > > of "dynamic" limit?
> >
> > It depends on if we have other block drivers that have IO segment/max
> > transfer sizes,
> > that we need to enforce, other than the SG_IO case.
> >
> > Knowing that the commit that added these limits to file-posix, which I was
> > fixing is relatively
> > new, I guess there are other cases for these limits.
> >
> > I'll check this tomorrow.
> >
> > >
> > > Either way I can add another field (`max_pwrite`, maybe?) to
> > > BlockLimits, as an infrastructure/hint, but what should be switched to
> > > it, and what value should each block driver set it to, is probably
> > > beyond what I can take.
> >
> > Implementation wise, I think I can do this, but I'll wait few days for the
> > feedback on my proposal.
> >
> > Thanks for finding this bug!
>
> There was also https://lore.kernel.org/qemu-devel/20200811225122.17342-2-dmitry.fomichev@wdc.com/
> posted three weeks ago, but, seemingly, it was ignored... sorry if this
> bug is already being fixed in a different way, but I think the fix presented
> in that patch is worth considering.
>
> Very best,
> Dmitry
>
> >
> > Best regards,
> >       Maxim Levitsky
> >
> > >
> > > IMHO my patches should be merged first (they are really just fixing a
> > > regression and some bugs). If anyone finds it necessary and is capable
> > > and willing to fix the insufficiency of BlockLimits, they can do it
> > > afterwards with another patch series as an improvement. I can add a
> > > patch to remove the clamping in nbd/server.c as a perhaps-temporary
> > > measure to address the original issue, if desired.
> > >
> > > On Sun, 6 Sep 2020 at 20:53, Maxim Levitsky <mlevitsk@redhat.com>
> > wrote:
> > > > On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote:
> > > > > Maybe you want to add some condition for this:
> > > > > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659
> > > > > Or not clamp it at all.
> > > > >
> > > > > On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote:
> > > > > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky
> > appears
> > > > > > to have assumed that the only "SCSI Passthrough" is `-device
> > > > > > scsi-generic`, while the fact is there's also `-device scsi-block`
> > > > > > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting
> > > > > > max_sectors is necessary to it (more precisely, hw_max_sectors
> > might
> > > > > > what matters, but BLKSECTGET reports max_sectors, so).
> > > > > >
> > > > > > I'm unsure about how qemu-nbd works, but the commit clearly
> > wasn't the
> > > > > > right approach to fix the original issue it addresses. (It should for
> > > > > > example ignore the max_transfer if it will never matter in to it, or
> > > > > > overrides it in certain cases; when I glimpsed over
> > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see
> > how
> > > > > > it could be file-posix problem when it is reporting the right thing,
> > > > > > regardless of whether "removing" the code helps.)
> > > > > >
> > > > > > I don't think we want to "mark" `-device scsi-block` as sg either. It
> > > > > > will probably bring even more unexpected problems, because they
> > are
> > > > > > literally different sets of things behind the scene / in the kernel.
> > > >
> > > > Yes, I did overlook the fact that scsi-block is kind of hybrid passthough
> > device,
> > > > doing SG_IO for some things and regular IO for others.
> > > >
> > > > I don't think that my approach was wrong way to fix the problem, but as
> > you found
> > > > out, abusing 'bs->sg' hack (which I would be very happy to remove
> > completely)
> > > > wasn't a good idea.
> > > > I actualy was aware of scsi-block and that it does SG_IO but it
> > > > was forgotten some where on the way.
> > > >
> > > > So in summary what the problem is and what I think is the right solution:
> > > >
> > > >
> > > > Each qemu block driver exposes block limits and assumes that they are
> > the same
> > > > for two IO interfaces the block driver can expose:
> > > >
> > > > 1. Regular qemu blk_pread/pwrite alike functions
> > > > 2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on
> > > >    host block devices/sg char devices, and by iscsi
> > > >
> > > > The problem is that these two interfaces can have different block limits.
> > > >
> > > > I don't know about iscsi, but for files, doing regular IO is always unlimited,
> > > > since it passes through the kernel block layer and segemented there on
> > > > demand which is faster that doing it in userspace, while SG_IO is passed
> > as is
> > > > to the underlying SCSI device and lacks this segmentation.
> > > >
> > > > Regardless of how NBD uses these limits, I think that these limits should
> > be correctly
> > > > exposed by the block drivers, and therefore I propose is that each qemu
> > block driver
> > > > would expose a pair of block limits.
> > > > One for the regular block IO, and other for SG_IO.
> > > >
> > > > Then block driver clients (like scsi devices that you mention, nbd, etc)
> > > > can choose which limit to use, depending on which IO api they use.
> > > > The scsi-generic, and scsi-block can use the SG_IO limits,
> > > > while the rest  can use the normal (unlimited for file I/O) limits, including
> > the NBD.
> > > >
> > > > Best regards,
> > > >         Maxim Levitsky
> > > >
> > > > > > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote:
> > > > > > > sg devices have different major/minor than their corresponding
> > > > > > > block devices. Using sysfs to get max segments never really worked
> > > > > > > for them.
> > > > > > >
> > > > > > > Fortunately the sg driver provides an ioctl to get sg_tablesize,
> > > > > > > which is apparently equivalent to max segments.
> > > > > > >
> > > > > > > Signed-off-by: Tom Yan <tom.ty89@gmail.com>
> > > > > > > ---
> > > > > > >  block/file-posix.c | 17 ++++++++++++++++-
> > > > > > >  1 file changed, 16 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > > > > > index 411a23cf99..9e37594145 100644
> > > > > > > --- a/block/file-posix.c
> > > > > > > +++ b/block/file-posix.c
> > > > > > > @@ -1178,6 +1178,21 @@ static int
> > sg_get_max_transfer_length(int fd)
> > > > > > >  #endif
> > > > > > >  }
> > > > > > >
> > > > > > > +static int sg_get_max_segments(int fd)
> > > > > > > +{
> > > > > > > +#ifdef SG_GET_SG_TABLESIZE
> > > > > > > +    long max_segments = 0;
> > > > > > > +
> > > > > > > +    if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) {
> > > > > > > +        return max_segments;
> > > > > > > +    } else {
> > > > > > > +        return -errno;
> > > > > > > +    }
> > > > > > > +#else
> > > > > > > +    return -ENOSYS;
> > > > > > > +#endif
> > > > > > > +}
> > > > > > > +
> > > > > > >  static int get_max_transfer_length(int fd)
> > > > > > >  {
> > > > > > >  #if defined(BLKSECTGET) && defined(BLKSSZGET)
> > > > > > > @@ -1268,7 +1283,7 @@ static void
> > hdev_refresh_limits(BlockDriverState *bs, Error **errp)
> > > > > > >          bs->bl.max_transfer = pow2floor(ret);
> > > > > > >      }
> > > > > > >
> > > > > > > -    ret = get_max_segments(s->fd);
> > > > > > > +    ret = bs->sg ? sg_get_max_segments(s->fd) :
> > get_max_segments(s->fd);
> > > > > > >      if (ret > 0) {
> > > > > > >          bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> > > > > > >                                             ret * qemu_real_host_page_size);
> > > > > > > --
> > > > > > > 2.28.0
> > > > > > >
> >
> >
>
diff mbox series

Patch

diff --git a/block/file-posix.c b/block/file-posix.c
index 411a23cf99..9e37594145 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1178,6 +1178,21 @@  static int sg_get_max_transfer_length(int fd)
 #endif
 }
 
+static int sg_get_max_segments(int fd)
+{
+#ifdef SG_GET_SG_TABLESIZE
+    long max_segments = 0;
+
+    if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) {
+        return max_segments;
+    } else {
+        return -errno;
+    }
+#else
+    return -ENOSYS;
+#endif
+}
+
 static int get_max_transfer_length(int fd)
 {
 #if defined(BLKSECTGET) && defined(BLKSSZGET)
@@ -1268,7 +1283,7 @@  static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.max_transfer = pow2floor(ret);
     }
 
-    ret = get_max_segments(s->fd);
+    ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
     if (ret > 0) {
         bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
                                            ret * qemu_real_host_page_size);