Message ID: cover.1730021644.git.herbert@gondor.apana.org.au
Series: Multibuffer hashing take two
Hi Herbert,

On Sun, Oct 27, 2024 at 05:45:28PM +0800, Herbert Xu wrote:
> Multibuffer hashing was a constant sore while it was part of the
> kernel. It was very buggy and unnecessarily complex. Finally
> it was removed when it had been broken for a while without anyone
> noticing.
>
> Peace reigned in its absence, until Eric Biggers made a proposal
> for its comeback :)
>
> Link: https://lore.kernel.org/all/20240415213719.120673-1-ebiggers@kernel.org/
>
> The issue is that the SHA algorithm (and possibly others) is
> inherently not parallelisable. Therefore the only way to exploit
> parallelism on modern CPUs is to hash multiple independent streams
> of data.
>
> Eric's proposal is a simple interface bolted onto shash that takes
> two streams of data of identical length. I thought the limitation
> of two was too small, and Eric addressed that in his latest version:
>
> Link: https://lore.kernel.org/all/20241001153718.111665-2-ebiggers@kernel.org/
>
> However, I still disliked the addition of this to shash as it meant
> that users would have to spend extra effort in order to accumulate
> and maintain multiple streams of data.
>
> My preference is to use ahash as the basis of multibuffer, because
> its request object interface is perfectly suited to chaining.

As I've explained before, I think your proposed approach is much too complex, inefficient, and broken compared to my much simpler patchset https://lore.kernel.org/linux-crypto/20241001153718.111665-1-ebiggers@kernel.org/. If this is really going to be the API for multibuffer hashing, then I'm not very interested in using or contributing to it.

The much larger positive diffstat of your patchset alone should speak for itself, especially since it doesn't include essential pieces that were included in my smaller patchset, such as self-tests, dm-verity and fs-verity support, and SHA-NI and ARMv8 CE support. Note that due to the complexity of your API, it would require far more updates to the tests in order to cover all the new edge cases. This patchset also removes the shash support for sha256-avx2, which is not realistic, as there are still ~100 users of shash in the kernel.

You say that your API is needed so that users don't need to "spend extra effort in order to accumulate and maintain multiple streams of data." That's incorrect, though. The users, e.g. {dm,fs}-verity, will need to do that anyway even with your API. I think this would have been clear if you had tried to update them to use your API.

With this patchset I am also seeing random crashes in the x86 sha256 glue code, and all multibuffer SHA256 hashes come back as all-zeroes. Bugs like this were predictable, of course. There's a high amount of complexity inherent in the ahash request chaining approach, both at the API layer and in the per-algorithm glue code. It will be hard to get everything right. And I am just submitting 8 equal-length requests, so I haven't even tried any of the edge cases that your proposed API allows, like submitting requests that aren't synced up properly.

I don't think it's worth my time to try to debug and fix your code and add the missing pieces when we could just choose a much simpler design that would result in far fewer bugs. Especially for cryptographic code, choosing sound designs that minimize the number of bugs should be the highest priority.
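To make "much simpler" concrete, a whole batch reduces to a single shash call with the finup_mb interface from my patchset. A minimal usage sketch follows; crypto_shash_finup_mb() is the helper added by that patchset, and its exact signature there may differ slightly from what is shown here:

#include <crypto/hash.h>

/*
 * Sketch: hash 8 equal-length messages with one call.  Apart from
 * crypto_shash_finup_mb() (from the patchset linked above, signature
 * approximate), this is standard shash usage.
 */
static int hash_eight_msgs(struct crypto_shash *tfm,
			   const u8 * const data[8], unsigned int len,
			   u8 * const outs[8])
{
	SHASH_DESC_ON_STACK(desc, tfm);
	int err;

	desc->tfm = tfm;
	err = crypto_shash_init(desc);
	if (err)
		return err;
	/* Finish all 8 messages from the shared initial state. */
	return crypto_shash_finup_mb(desc, data, len, outs, 8);
}

No request objects, no chaining, and no callback plumbing: the caller hands over the N message pointers it already has, and gets N digests back.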
I understand that you're trying to contribute something useful, and perhaps solve a wider set of problems than I set out to solve. The reality, though, is that this patchset creates more problems than it solves. Compared to my patchset, it makes things harder, more error-prone, and less efficient for the users who actually want multibuffer hashing, and likewise for wiring it up to the low-level algorithms. It also doesn't bring us meaningfully closer to applying multibuffer crypto in other applications like IPsec, where it will be very difficult to apply, is irrelevant for the most common algorithm used, and would at best provide less benefit than other easier-to-implement optimizations.

The virtual address support in ahash is a somewhat nice addition, but it could go in independently, and ultimately it seems not that useful, given that users could just use shash or lib/crypto/ instead. shash will still have less overhead, and lib/crypto/ even less, despite ahash getting slightly better. Also, users who actually care about old-school crypto accelerators need pages + async processing anyway for optimal performance, especially given this patchset's proposed approach of handling virtual addresses using a bounce page.

If you're really interested in the AVX2 multibuffer SHA256 for some reason, I'd be willing to clean up that assembly code and wire it up to the much simpler API that I proposed. Despite the favorable microbenchmark result, this would be of limited use, for various reasons that I've explained before. But it could be done if desired, and it would be much simpler than what you have.

- Eric
On Mon, Oct 28, 2024 at 12:00:45PM -0700, Eric Biggers wrote:
>
> You say that your API is needed so that users don't need to "spend extra effort
> in order to accumulate and maintain multiple streams of data." That's
> incorrect, though. The users, e.g. {dm,fs}-verity, will need to do that anyway
> even with your API. I think this would have been clear if you had tried to
> update them to use your API.

It's a lot easier once you switch them back to dynamically allocated requests instead of having them on the stack. Storing the hash state on the stack of course limits your ability to aggregate the hash operations, since each one is hundreds of bytes long. We could also introduce SYNC_AHASH_REQ_ON_STACK, like we do for skcipher (a sketch follows at the end of this message), but I think we should move away from that for the cases where aggregation makes sense.

Note that when I say switching back to ahash, I'm not talking about switching back to an asynchronous API. If you're happy with the synchronous offerings, then you're totally free to use ahash with synchronous-only implementations, just like skcipher.

> With this patchset I am also seeing random crashes in the x86 sha256 glue code,
> and all multibuffer SHA256 hashes come back as all-zeroes. Bugs like this were

Guilty as charged. I haven't tested this at all apart from timing the speed. However, adding a proper test is trivial. We already have the ahash_requests ready to go, so they just have to be chained together and submitted en masse.

> If you're really interested in the AVX2 multibuffer SHA256 for some reason, I'd
> be willing to clean up that assembly code and wire it up to the much simpler API
> that I proposed. Despite the favorable microbenchmark result, this would be of
> limited use, for various reasons that I've explained before. But it could be
> done if desired, and it would be much simpler than what you have.

No, I have zero interest in AVX2. I simply picked it because it was already in the kernel git history and I wasn't certain whether my CPU is recent enough to see much of a benefit from your SHA-NI code.

Cheers,
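For reference, the skcipher precedent mentioned above is SYNC_SKCIPHER_REQUEST_ON_STACK in include/crypto/skcipher.h. A hypothetical ahash analogue, modelled directly on that macro, might look like the sketch below; neither SYNC_AHASH_REQ_ON_STACK nor MAX_SYNC_AHASH_REQSIZE exists in either patchset, so both names and the size bound are assumptions:

#include <crypto/hash.h>
#include <linux/crypto.h>

/*
 * Hypothetical analogue of SYNC_SKCIPHER_REQUEST_ON_STACK: a
 * stack-allocated ahash request sized for the worst-case synchronous
 * implementation.  MAX_SYNC_AHASH_REQSIZE is assumed to bound the
 * reqsize of every synchronous ahash algorithm, in the same way that
 * MAX_SYNC_SKCIPHER_REQSIZE does for skcipher; the value here is a
 * placeholder.
 */
#define MAX_SYNC_AHASH_REQSIZE	384	/* assumed bound, not upstream */

#define SYNC_AHASH_REQ_ON_STACK(name, tfm) \
	char __##name##_req[sizeof(struct ahash_request) + \
			    MAX_SYNC_AHASH_REQSIZE] CRYPTO_MINALIGN_ATTR; \
	struct ahash_request *name = (void *)__##name##_req

A caller would then use it much like the skcipher version: declare the request with SYNC_AHASH_REQ_ON_STACK(req, tfm), bind it with ahash_request_set_tfm(req, tfm), and proceed as with a heap-allocated request.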