Message ID | 20210417023323.852530-1-damien.lemoal@wdc.com |
---|---|
Series | Fix dm-crypt zoned block device support |
On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
> Synchronous writes to sequential zone files cannot use zone append
> operations if the underlying zoned device queue limit
> max_zone_append_sectors is 0, indicating that the device does not
> support this operation. In this case, fall back to using regular write
> operations.

Zone append is a mandatory feature of the zoned device API.
On 2021/04/19 15:45, Christoph Hellwig wrote:
> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
>> Synchronous writes to sequential zone files cannot use zone append
>> operations if the underlying zoned device queue limit
>> max_zone_append_sectors is 0, indicating that the device does not
>> support this operation. In this case, fall back to using regular write
>> operations.
>
> Zone append is a mandatory feature of the zoned device API.

Yes, I am well aware of that. All physical zoned devices and null blk do support zone append, but the logical device created by dm-crypt is out. And we cannot simply disable zone support in dm-crypt as there are use cases out there in the field that I am aware of, in SMR space.

So this series is a compromise: preserve dm-crypt zone support for SMR (no one uses the zone append emulation yet, as far as I know) by disabling zone append.

For zonefs, we can:
1) refuse to mount if ZA is disabled, same as btrfs
2) do as I did in the patch and fall back to regular writes, since that is easy to do (zonefs file size tracks the WP position already).

I chose option (2) to allow for SMR+dm-crypt to still work with zonefs.
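The two zonefs options listed above can be sketched as a small decision function. This is a minimal Python model, not zonefs code: `ZoneFile`, `pick_sync_write_op` and `can_mount` are hypothetical names used for illustration. Only `max_zone_append_sectors` and the fact that the file size tracks the write pointer come from the discussion.

```python
from dataclasses import dataclass


@dataclass
class ZoneFile:
    """Hypothetical model of a zonefs sequential zone file."""
    zone_start: int  # first sector of the backing zone
    size: int        # sectors already written; mirrors the zone write pointer


def pick_sync_write_op(max_zone_append_sectors: int, zf: ZoneFile):
    """Return (operation, target sector) for a synchronous write."""
    if max_zone_append_sectors > 0:
        # Zone append always addresses the zone start; the device picks the location.
        return ("zone_append", zf.zone_start)
    # Option (2): a regular write aimed at the write pointer,
    # which is already known from the file size.
    return ("write", zf.zone_start + zf.size)


def can_mount(max_zone_append_sectors: int) -> bool:
    """Option (1): refuse to mount when zone append is unavailable."""
    return max_zone_append_sectors > 0
```

With option (2), a file that already holds 16 sectors in a zone starting at sector 1024 would issue its next regular write at sector 1040.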
On Mon, Apr 19, 2021 at 07:08:46AM +0000, Damien Le Moal wrote:
> 1) refuse to mount if ZA is disabled, same as btrfs
Yes, please do that.
On Sat, 17 Apr 2021, Damien Le Moal wrote:
> Mike,
>
> Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
> of the zone to be written instead of the actual location sector to
> write. The write location is determined by the device and returned to
> the host upon completion of the operation.

I'd like to ask what's the reason for this semantics? Why can't users of the zoned device supply real sector numbers?

> This interface, while simple and efficient for writing into sequential
> zones of a zoned block device, is incompatible with the use of sector
> values to calculate a cypher block IV. All data written in a zone is
> encrypted using an IV calculated from the first sectors of the zone,
> but read operation will specify any sector within the zone, resulting
> in an IV mismatch between encryption and decryption. Reads fail in that
> case.

I would say that it is incompatible with all dm targets - even the linear target is changing the sector number and so it may redirect the write outside of the range specified in dm-table and cause corruption.

Instead of complicating device mapper with imperfect support, I would just disable REQ_OP_ZONE_APPEND on device mapper at all.

Mikulas
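The IV mismatch described in the quoted patch text can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not dm-crypt: `toy_crypt` is a hypothetical XOR keystream cipher built on SHA-256 whose IV is simply the sector number, and the sector values are made up.

```python
import hashlib

SECTOR_SIZE = 512


def toy_crypt(key: bytes, iv_sector: int, data: bytes) -> bytes:
    """XOR data with a keystream derived from (key, iv_sector).

    Toy stand-in for a sector-IV cipher; applying it twice with the
    same IV returns the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(
            key + iv_sector.to_bytes(8, "little") + counter.to_bytes(4, "little")
        ).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


key = b"example key"
zone_start = 1024      # sector carried by the zone append BIO
write_pointer = 1032   # where the device actually places the data
plaintext = b"A" * SECTOR_SIZE

# Encryption happens before the device chooses the location,
# so the only sector available for the IV is the zone start.
ciphertext = toy_crypt(key, zone_start, plaintext)

# A later read addresses the real location and derives its IV from it.
read_back = toy_crypt(key, write_pointer, ciphertext)
```

In this model `read_back` differs from `plaintext`, while decrypting with the zone start as IV recovers it. It also shows that every sector appended to the zone would share one IV.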
On 2021/04/19 21:52, Mikulas Patocka wrote:
> On Sat, 17 Apr 2021, Damien Le Moal wrote:
>
>> Mike,
>>
>> Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
>> of the zone to be written instead of the actual location sector to
>> write. The write location is determined by the device and returned to
>> the host upon completion of the operation.
>
> I'd like to ask what's the reason for this semantics? Why can't users of
> the zoned device supply real sector numbers?

They can, if they use regular write commands :)

Zone append was designed to address sequential write ordering difficulties on the host. Unlike regular writes which must be delivered to the device in sequential order, zone append commands can be sent to the device in any order. The device will process the write at the WP position when the command is executed and return the first sector written. This command makes it easy on the host in multi-queue environment and avoids the need for serializing commands & locking zones for writing. So very efficient performance.

>> This interface, while simple and efficient for writing into sequential
>> zones of a zoned block device, is incompatible with the use of sector
>> values to calculate a cypher block IV. All data written in a zone is
>> encrypted using an IV calculated from the first sectors of the zone,
>> but read operation will specify any sector within the zone, resulting
>> in an IV mismatch between encryption and decryption. Reads fail in that
>> case.
>
> I would say that it is incompatible with all dm targets - even the linear
> target is changing the sector number and so it may redirect the write
> outside of the range specified in dm-table and cause corruption.

DM remapping of BIO sectors is zone compatible because target entries must be zone aligned. In the case of zone append, the BIO sector always points to the start sector of the target zone. DM sector remapping will remap that to another zone start as all zones are the same size. No issue here. We extensively use dm-linear for various test environments to reduce the size of the device tested (to speed up tests). I am confident there are no problems there.

> Instead of complicating device mapper with imperfect support, I would just
> disable REQ_OP_ZONE_APPEND on device mapper at all.

That was my initial approach, but for dm-crypt only since other targets that support zoned devices are fine. However, this breaks zoned block device requirement that zone append be supported so that users are presented with a uniform interface for different devices. So while simple to do, disabling zone append is far from ideal.

>
> Mikulas
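The alignment argument for linear remapping comes down to simple arithmetic, sketched here in Python. The zone size, the table layout and the names `linear_map` and `is_zone_start` are illustrative assumptions, not device-mapper code; the only property taken from the discussion is that target entries are zone aligned and all zones have the same size.

```python
ZONE_SECTORS = 524288  # example value: 256 MiB zones in 512-byte sectors


def linear_map(bio_sector: int, target_start: int, dev_offset: int) -> int:
    """Shift a sector by a constant offset, as a linear mapping does."""
    return dev_offset + (bio_sector - target_start)


def is_zone_start(sector: int) -> bool:
    """True when the sector is the first sector of a zone."""
    return sector % ZONE_SECTORS == 0


# Hypothetical target covering logical zones 0..3, backed by device zones 10..13.
target_start = 0
dev_offset = 10 * ZONE_SECTORS

# Zone append BIOs carry a zone start sector; remap each of them.
mapped = [linear_map(z * ZONE_SECTORS, target_start, dev_offset) for z in range(4)]
```

Because both `target_start` and `dev_offset` are multiples of the zone size, every entry in `mapped` is again a zone start. An offset that is not zone aligned would break this property, which is why the alignment requirement matters.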
On Mon, 19 Apr 2021, Damien Le Moal wrote:
> > I would say that it is incompatible with all dm targets - even the linear
> > target is changing the sector number and so it may redirect the write
> > outside of the range specified in dm-table and cause corruption.
>
> DM remapping of BIO sectors is zone compatible because target entries must be
> zone aligned. In the case of zone append, the BIO sector always points to the
> start sector of the target zone. DM sector remapping will remap that to another
> zone start as all zones are the same size. No issue here. We extensively use
> dm-linear for various test environments to reduce the size of the device tested
> (to speed up tests). I am confident there are no problems there.
>
> > Instead of complicating device mapper with imperfect support, I would just
> > disable REQ_OP_ZONE_APPEND on device mapper at all.
>
> That was my initial approach, but for dm-crypt only since other targets that
> support zoned devices are fine. However, this breaks zoned block device
> requirement that zone append be supported so that users are presented with a
> uniform interface for different devices. So while simple to do, disabling zone
> append is far from ideal.

So, we could enable it for the linear target and disable for all other targets?

I talked with Milan about it and he doesn't want to add more bloat to the crypt target. I agree with him.

Mikulas
On 19/04/2021 15:55, Mikulas Patocka wrote:
> On Mon, 19 Apr 2021, Damien Le Moal wrote:
>
>>> I would say that it is incompatible with all dm targets - even the linear
>>> target is changing the sector number and so it may redirect the write
>>> outside of the range specified in dm-table and cause corruption.
>>
>> DM remapping of BIO sectors is zone compatible because target entries must be
>> zone aligned. In the case of zone append, the BIO sector always points to the
>> start sector of the target zone. DM sector remapping will remap that to another
>> zone start as all zones are the same size. No issue here. We extensively use
>> dm-linear for various test environments to reduce the size of the device tested
>> (to speed up tests). I am confident there are no problems there.
>>
>>> Instead of complicating device mapper with imperfect support, I would just
>>> disable REQ_OP_ZONE_APPEND on device mapper at all.
>>
>> That was my initial approach, but for dm-crypt only since other targets that
>> support zoned devices are fine. However, this breaks zoned block device
>> requirement that zone append be supported so that users are presented with a
>> uniform interface for different devices. So while simple to do, disabling zone
>> append is far from ideal.
>
> So, we could enable it for the linear target and disable for all other
> targets?
>
> I talked with Milan about it and he doesn't want to add more bloat to the
> crypt target. I agree with him.

This is all fine even for dm-crypt IF the tweaking is unique for the sector position (it can be something just derived from the sector offset in principle).

For FDE, we must never allow writing sectors to different positions with the same tweak (IV) and key - there are real attacks based on this issue.

So zones can do any recalculation and reshuffling it wants if sector tweak in dm-crypt is unique.

(Another solution would be to use different keys for different areas. That is not possible with dm-crypt or FDE in general, but fs encryption can do that.)

If you want dm-crypt to support zones properly, there is a need for emulation of the real sector offset - because that is what IV expects now.

And I think such emulation should be in DM core, not in dm-crypt itself, because other targets can need the same functionality (I guess that dm-integrity journal has a problem with that already, Mikulas will know more).

For online reencryption we also use multiple targets in the table that dynamically move (linear combined with dm-crypt), so dm-crypt must support all commands as dm-linear to make this work.

I hope I understand the problem correctly; all I want is to avoid patching the wrong place (dmcrypt crypto) because that problem will appear elsewhere later. Also for security it would be nice to not add exceptions to encryption code - it is always a recipe for disaster.

Milan
On 2021-04-19 2:45 a.m., Christoph Hellwig wrote:
> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
>> Synchronous writes to sequential zone files cannot use zone append
>> operations if the underlying zoned device queue limit
>> max_zone_append_sectors is 0, indicating that the device does not
>> support this operation. In this case, fall back to using regular write
>> operations.
>
> Zone append is a mandatory feature of the zoned device API.

So a hack required for ZNS and not needed by ZBC and ZAC becomes a "mandatory feature" in a Linux API. Like many hacks, that one might come back to bite you :-)

Doug Gilbert
On 2021/04/20 10:20, Douglas Gilbert wrote:
> On 2021-04-19 2:45 a.m., Christoph Hellwig wrote:
>> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
>>> Synchronous writes to sequential zone files cannot use zone append
>>> operations if the underlying zoned device queue limit
>>> max_zone_append_sectors is 0, indicating that the device does not
>>> support this operation. In this case, fall back to using regular write
>>> operations.
>>
>> Zone append is a mandatory feature of the zoned device API.
>
> So a hack required for ZNS and not needed by ZBC and ZAC becomes
> a "mandatory feature" in a Linux API. Like many hacks, that one might
> come back to bite you :-)

Zone append is not a hack in ZNS. It is a write interface that fits very well with the multi-queue nature of NVMe. The "hack" is the emulation in SCSI.

We decided on having this mandatory for zoned devices (all types) to make sure that file systems do not have to implement different IO paths for sequential writing to zones. Zone append does simplify a lot of things and makes it possible to get the best performance from ZNS drives. Zone write locking/serialization of writes per zone using regular writes is much harder to implement, makes a mess of the file system code, and would kill write performance on ZNS.
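The ordering difference between regular writes and zone append can be illustrated with a toy zone model. This is a sketch under stated assumptions, not a description of any driver or device: the `Zone` class, its method names and `replay_writes` are hypothetical, and the only behavior modeled is that a regular write must land exactly on the write pointer while an append is placed there by the device.

```python
class Zone:
    """Toy sequential zone with a write pointer (wp)."""

    def __init__(self, start: int, nr_sectors: int):
        self.start = start
        self.end = start + nr_sectors
        self.wp = start

    def write(self, sector: int, nr: int) -> None:
        """Regular write: only accepted exactly at the write pointer."""
        if sector != self.wp:
            raise ValueError("write not at write pointer")
        if self.wp + nr > self.end:
            raise ValueError("zone full")
        self.wp += nr

    def append(self, nr: int) -> int:
        """Zone append: the zone picks the location and returns it."""
        if self.wp + nr > self.end:
            raise ValueError("zone full")
        written = self.wp
        self.wp += nr
        return written


def replay_writes(zone: Zone, requests) -> int:
    """Issue regular writes in the given order; return how many succeeded."""
    done = 0
    for sector, nr in requests:
        try:
            zone.write(sector, nr)
            done += 1
        except ValueError:
            pass
    return done


# Three 8-sector requests that reach the device in reverse order.
reordered = [(16, 8), (8, 8), (0, 8)]

regular_ok = replay_writes(Zone(0, 64), reordered)

append_zone = Zone(0, 64)
append_locations = [append_zone.append(nr) for _, nr in reordered]
```

In this model only one of the three reordered regular writes is accepted, whereas all three appends succeed and report the sectors they were written to. A host using regular writes would need per-zone serialization to avoid the failures.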