Message ID: 20240823103811.2421-1-anuj20.g@samsung.com
Series: Read/Write with meta/integrity
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
On Fri, Aug 23, 2024 at 04:08:03PM +0530, Anuj Gupta wrote:
> Copy back the bounce buffer to user-space in entirety when the parent
> bio completes.

This looks odd to me.  The usual way to handle iterating the entire
submitter controlled data is to just iterate over the bvec array, as
done by bio_for_each_segment_all/bio_for_each_bvec_all for the bio
data.  I think you want to do the same here, probably with a similar
bip_for_each_bvec_all or similar helper.  That way you don't need to
stash away the iter.  Currently we have the field for that, but I
really want to split up struct bio_integrity_payload into what is
actually needed for the payload and stuff only needed for the block
layer autogenerated PI (bip_bio/bio_iter/bip_work).
On Fri, Aug 23, 2024 at 04:08:05PM +0530, Anuj Gupta wrote:
> +struct uio_meta {
> +	meta_flags_t flags;
> +	u16 app_tag;
> +	struct iov_iter iter;
> +};

Odd formatting here - the aligning tab goes before the field name,
not the name of the structure type.
This patch came in twice, once with a "block" and once with a
"block,nvme" prefix.  One should be fine, and I think just block is
the right one.

How do we communicate what flags can be combined to the upper layer
and userspace given the SCSI limitations here?

> --- a/include/linux/bio-integrity.h
> +++ b/include/linux/bio-integrity.h
> @@ -11,6 +11,9 @@ enum bip_flags {
>  	BIP_DISK_NOCHECK	= 1 << 3, /* disable disk integrity checking */
>  	BIP_IP_CHECKSUM		= 1 << 4, /* IP checksum */
>  	BIP_COPY_USER		= 1 << 5, /* Kernel bounce buffer in use */
> +	BIP_CHECK_GUARD		= 1 << 6,
> +	BIP_CHECK_REFTAG	= 1 << 7,
> +	BIP_CHECK_APPTAG	= 1 << 8,

Maybe describe the flags here like the other ones?
On Fri, Aug 23, 2024 at 04:08:09PM +0530, Anuj Gupta wrote:
> From: Kanchan Joshi <joshi.k@samsung.com>
>
> If iocb contains the meta, extract that and prepare the bip.

If an iocb contains metadata, ...

> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -154,6 +154,9 @@ static void blkdev_bio_end_io(struct bio *bio)
>  		}
>  	}
>
> +	if (bio_integrity(bio) && (dio->iocb->ki_flags & IOCB_HAS_META))
> +		bio_integrity_unmap_user(bio);

How could bio_integrity() be true here without the iocb flag?

> +	if (!is_sync && unlikely(iocb->ki_flags & IOCB_HAS_META)) {

unlikely is actively harmful here, as the code is likely if you use
the feature.

> +		ret = bio_integrity_map_iter(bio, iocb->private);
> +		if (unlikely(ret)) {
> +			bio_release_pages(bio, false);
> +			bio_clear_flag(bio, BIO_REFFED);
> +			bio_put(bio);
> +			blk_finish_plug(&plug);
> +			return ret;
> +		}

This duplicates the error handling done just above.  Please add a goto
label to de-duplicate it.

> +	if (unlikely(iocb->ki_flags & IOCB_HAS_META)) {
> +		ret = bio_integrity_map_iter(bio, iocb->private);
> +		WRITE_ONCE(iocb->private, NULL);
> +		if (unlikely(ret)) {
> +			bio_put(bio);
> +			return ret;

This probably also wants an out_bio_put label even if the duplication
is minimal so far.

You probably also want a WARN_ON for the iocb meta flag in
__blkdev_direct_IO_simple so that we don't get caught off guard if
someone adds a synchronous path using PI.

> diff --git a/block/t10-pi.c b/block/t10-pi.c
> index e7052a728966..cb7bc4a88380 100644
> --- a/block/t10-pi.c
> +++ b/block/t10-pi.c
> @@ -139,6 +139,8 @@ static void t10_pi_type1_prepare(struct request *rq)
>  			/* Already remapped? */
>  			if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
>  				break;
> +			if (bip->bip_flags & BIP_INTEGRITY_USER)
> +				break;

This is wrong.  When submitting metadata on a partition the ref tag
does need to be remapped.  Please also add a test that tests submitting
metadata on a partition so that we have a regression test for this.
> +	BIP_INTEGRITY_USER	= 1 << 9, /* Integrity payload is user
> +					   * address
> +					   */

.. and with the above fix this flag should not be needed.

> };
>
> struct bio_integrity_payload {
> @@ -24,6 +27,7 @@ struct bio_integrity_payload {
>  	unsigned short		bip_vcnt;	/* # of integrity bio_vecs */
>  	unsigned short		bip_max_vcnt;	/* integrity bio_vec slots */
>  	unsigned short		bip_flags;	/* control flags */
> +	u16			app_tag;

Please document the field even if it seems obvious.
> Add support for sending user-meta buffer. Set tags to be checked
> using flags specified by user/block-layer user and underlying DIF/DIX
> configuration. Introduce BLK_INTEGRITY_APP_TAG to specify apptag.
> This provides a way for upper layers to specify apptag checking.

We'll also need that flag for nvme, don't we?  It should also be added
in a block layer patch and not as part of a driver patch.

> +/*
> + * Can't check reftag alone or apptag alone
> + */
> +static bool sd_prot_flags_valid(struct scsi_cmnd *scmd)
> +{
> +	struct request *rq = scsi_cmd_to_rq(scmd);
> +	struct bio *bio = rq->bio;
> +
> +	if (bio_integrity_flagged(bio, BIP_CHECK_REFTAG) &&
> +	    !bio_integrity_flagged(bio, BIP_CHECK_APPTAG))
> +		return false;
> +	if (!bio_integrity_flagged(bio, BIP_CHECK_REFTAG) &&
> +	    bio_integrity_flagged(bio, BIP_CHECK_APPTAG))
> +		return false;
> +	return true;
> +}

We'll need to advertise these limitations to the application or
in-kernel user somehow.

> +	if ((bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) == false) &&
> +	    (!dix || bio_integrity_flagged(bio, BIP_CHECK_REFTAG)))

Incorrect formatting.  This is better:

	if (!bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) &&
	    (!dix || bio_integrity_flagged(bio, BIP_CHECK_REFTAG)))
On Sat, Aug 24, 2024 at 10:31:16AM +0200, Christoph Hellwig wrote:
> On Fri, Aug 23, 2024 at 04:08:03PM +0530, Anuj Gupta wrote:
> > Copy back the bounce buffer to user-space in entirety when the parent
> > bio completes.
>
> This looks odd to me.  The usual way to handle iterating the entire
> submitter controlled data is to just iterate over the bvec array, as
> done by bio_for_each_segment_all/bio_for_each_bvec_all for the bio
> data.  I think you want to do the same here, probably with a similar
> bip_for_each_bvec_all or similar helper.  That way you don't need to
> stash away the iter.  Currently we have the field for that, but I
> really want to split up struct bio_integrity_payload into what is
> actually needed for the payload and stuff only needed for the block
> layer autogenerated PI (bip_bio/bio_iter/bip_work).

I can add it [*], to iterate over the entire bvec array. But the
original bio_iter still needs to be stored during submission, to
calculate the number of bytes in the original integrity/metadata iter
(as it could have gotten split, and I don't have the original
integrity iter stored anywhere). Do you have a different opinion?
[*]

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index ff7de4fe74c4..f1690c644e70 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -125,11 +125,23 @@ static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip)
 	struct bio_vec *copy = &bip->bip_vec[1];
 	size_t bytes = bio_iter_integrity_bytes(bi, bip->bio_iter);
 	struct iov_iter iter;
-	int ret;
+	struct bio_vec *bvec;
+	struct bvec_iter_all iter_all;
 
 	iov_iter_bvec(&iter, ITER_DEST, copy, nr_vecs, bytes);
-	ret = copy_to_iter(bvec_virt(bip->bip_vec), bytes, &iter);
-	WARN_ON_ONCE(ret != bytes);
+	bip_for_each_segment_all(bvec, bip, iter_all) {
+		ssize_t ret;
+
+		ret = copy_page_to_iter(bvec->bv_page,
+					bvec->bv_offset,
+					bvec->bv_len,
+					&iter);
+
+		if (!iov_iter_count(&iter))
+			break;
+
+		WARN_ON_ONCE(ret < bvec->bv_len);
+	}
 
 	bio_integrity_unpin_bvec(copy, nr_vecs, true);
 }
diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h
index 22ff2ae16444..3132ef6f27e0 100644
--- a/include/linux/bio-integrity.h
+++ b/include/linux/bio-integrity.h
@@ -46,6 +46,19 @@ struct uio_meta {
 	struct iov_iter iter;
 };
 
+static inline bool bip_next_segment(const struct bio_integrity_payload *bip,
+				    struct bvec_iter_all *iter)
+{
+	if (iter->idx >= bip->bip_vcnt)
+		return false;
+
+	bvec_advance(&bip->bip_vec[iter->idx], iter);
+	return true;
+}
+
+#define bip_for_each_segment_all(bvl, bip, iter) \
+	for (bvl = bvec_init_iter_all(&iter); bip_next_segment((bip), &iter); )
+
 #define BIP_CLONE_FLAGS (BIP_MAPPED_INTEGRITY | BIP_CTRL_NOCHECK | \
 			 BIP_DISK_NOCHECK | BIP_IP_CHECKSUM | \
 			 BIP_CHECK_GUARD | BIP_CHECK_REFTAG | \
On 8/24/2024 2:05 PM, Christoph Hellwig wrote:
> How do we communicate what flags can be combined to the upper layer
> and userspace given the SCSI limitations here?

Will adding a new read-only sysfs attribute (under the group [*]) that
publishes all valid combinations be fine?

With Guard/Reftag/Apptag, we get 6 combinations. For NVMe, all can be
valid. For SCSI, maximum 4 can be valid. And we factor the pi-type in
while listing what all is valid. For example: 010 or 001 is not valid
for SCSI and should not be shown by this.

[*]
const struct attribute_group blk_integrity_attr_group = {
	.name = "integrity",
	.attrs = integrity_attrs,
};
Anuj,

> Introduce BIP_CLONE_FLAGS describing integrity flags that should be
> inherited in the cloned bip from the parent.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Kanchan,

> With Guard/Reftag/Apptag, we get 6 combinations. For NVMe, all can be
> valid. For SCSI, maximum 4 can be valid. And we factor the pi-type in
> while listing what all is valid. For example: 010 or 001 is not valid
> for SCSI and should not be shown by this.

I thought we had tentatively agreed to let the block layer integrity
flags only describe what the controller should do? And then let sd.c
decide what to do about RDPROTECT/WRPROTECT (since host-to-target is a
different protection envelope anyway). That is kind of how it works
already.
On Wed, Aug 28, 2024 at 04:48:06PM +0530, Anuj Gupta wrote:
> I can add it [*], to iterate over the entire bvec array. But the original
> bio_iter still needs to be stored during submission, to calculate the
> number of bytes in the original integrity/metadata iter (as it could have
> gotten split, and I don't have original integrity iter stored anywhere).
> Do you have a different opinion?

Just like for the bio data, the original submitter should never use
the iter, but the _all iter.
On Wed, Aug 28, 2024 at 07:12:22PM +0530, Kanchan Joshi wrote:
> On 8/24/2024 2:05 PM, Christoph Hellwig wrote:
> > How do we communicate what flags can be combined to the upper layer
> > and userspace given the SCSI limitations here?
>
> Will adding a new read-only sysfs attribute (under the group [*]) that
> publishes all valid combinations be fine?

That's not a good application interface.  Finding the sysfs file is
already a pain for direct users of the block device, and almost
impossible when going through a file system.
On Wed, Aug 28, 2024 at 11:16:53PM -0400, Martin K. Petersen wrote:
> I thought we had tentatively agreed to let the block layer integrity
> flags only describe what the controller should do? And then let sd.c
> decide what to do about RDPROTECT/WRPROTECT (since host-to-target is a
> different protection envelope anyway). That is kind of how it works
> already.

Yes, that should be easier to manage.
On Thu, Aug 29, 2024 at 8:47 AM Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>
>
> Kanchan,
>
> > With Guard/Reftag/Apptag, we get 6 combinations. For NVMe, all can be
> > valid. For SCSI, maximum 4 can be valid. And we factor the pi-type in
> > while listing what all is valid. For example: 010 or 001 is not valid
> > for SCSI and should not be shown by this.
>
> I thought we had tentatively agreed to let the block layer integrity
> flags only describe what the controller should do? And then let sd.c
> decide what to do about RDPROTECT/WRPROTECT (since host-to-target is a
> different protection envelope anyway). That is kind of how it works
> already.

Do you see that this patch (and this set of flags) are fine?
If not, which specific flags do you suggest should be introduced?
Martin, Christoph

On 29/08/24 06:59PM, Anuj gupta wrote:
>On Thu, Aug 29, 2024 at 8:47 AM Martin K. Petersen
><martin.petersen@oracle.com> wrote:
>>
>>
>> Kanchan,
>>
>> > With Guard/Reftag/Apptag, we get 6 combinations. For NVMe, all can be
>> > valid. For SCSI, maximum 4 can be valid. And we factor the pi-type in
>> > while listing what all is valid. For example: 010 or 001 is not valid
>> > for SCSI and should not be shown by this.
>>
>> I thought we had tentatively agreed to let the block layer integrity
>> flags only describe what the controller should do? And then let sd.c
>> decide what to do about RDPROTECT/WRPROTECT (since host-to-target is a
>> different protection envelope anyway). That is kind of how it works
>> already.
>>
>Do you see that this patch (and this set of flags) are fine?
>If not, which specific flags do you suggest should be introduced?

While other things are sorted for the next iteration, it's not fully
clear what we are missing for this part. Can you comment on the above?
Anuj,

> Do you see that this patch (and this set of flags) are fine?
> If not, which specific flags do you suggest should be introduced?

The three exposed BIP flags are fine.  We'll deal with the target flag
peculiarities in the SCSI disk driver.