Message ID | 20220729194532.228403-1-gpiccoli@igalia.com |
---|---|
Headers | show |
Series | The UEFI panic notification mechanism, 2nd round | expand |
On 29/07/2022 16:45, Guilherme G. Piccoli wrote: > Hey folks, this is the 2nd iteration of the patchset adding a simple > mechanism to notify the UEFI firmware about a panic event in the kernel. > V1 here: > https://lore.kernel.org/lkml/20220520195028.1347426-1-gpiccoli@igalia.com/ > > > First thing, the differences in this V2: > > - Ardb response in V1 mentioned a refactor aimed for v5.20 that removes an > obsolete/confusing way of setting EFI variables - this led to a weird > condition, deleted variables stayed in sysfs after deletion. Well, I've > refactored the code based on efi/next, so I'm using the recommended API > now - thanks a bunch for the advice Ardb! > > - I've changed NULL-terminating char in patch 1 to the format I've seen > in Ardb's code, L'\0'. > > - Patch 2 is new, it's somewhat a fix for a patch only in efi/next, part > of the efivar refactor. > > > In the V1 review, it was mentioned we could maybe use efi-pstore as a way > to signal the firmware about a panic event - in the end, the efi-pstore > mechanism can collect a dmesg, so it's even richer in the information level. > But I disagree that it is the way to go, for 3 main reasons: > > a) efi-pstore could be impossible to use, if the users are already using > another pstore backend (like ramoops), which is _exactly_ our case! > Of course, we could rework pstore and allow 2 backends, quite a bit of work, > but...see next points! > > b) Even if (a) is a not an issue, we have another one, even more important: > signaling the firmware about a panic is *different* than collecting a bunch > of data, a full dmesg even. This could be considered a security issue for > some users; also, the dmesg collected consumes a bunch more memory in the > (potentially scarce) UEFI available memory. > Although related, the goal of pstore is orthogonal to our mechanism here: > users rely on pstore to collect data, our proposal is a simple infrastructure > to just let the firmware know about a panic. Our kernel module also shows a > message and automatically clears the UEFI variable, so it tracks a single > panic, whereas efi-pstore logs are kept by default, in order to provide > data to users. > > c) Finally, it's faster and less "invasive"/risky to just write a byte in a > variable on a panic event than having a ksmg dumper collecting the full dmesg > and writing it to the UEFI memory; again, some users wish to have the logs, > but not all of them. > > > With all of that said, I think this module makes sense, it's a very simple > solution that opens doors to firmware panic handling approaches, like in our > proposed case (a different splash screen on panic). > > Finally, the variable name (PanicWarn) and value (0xFF by default, can be > changed by a module parameter) are just my personal choices but I'm open to > suggestions, not strongly attached to them heh > > Thanks again for the reviews/suggestions! > Cheers, > > > Guilherme > > Hi Ard, sorry for the ping =] Any opinions in this one? Patch 2 is a simple fix, BTW. Cheers, Guilherme
On Mon, 29 Aug 2022 at 15:04, Guilherme G. Piccoli <gpiccoli@igalia.com> wrote: > > On 29/07/2022 16:45, Guilherme G. Piccoli wrote: > > Hey folks, this is the 2nd iteration of the patchset adding a simple > > mechanism to notify the UEFI firmware about a panic event in the kernel. > > V1 here: > > https://lore.kernel.org/lkml/20220520195028.1347426-1-gpiccoli@igalia.com/ > > > > > > First thing, the differences in this V2: > > > > - Ardb response in V1 mentioned a refactor aimed for v5.20 that removes an > > obsolete/confusing way of setting EFI variables - this led to a weird > > condition, deleted variables stayed in sysfs after deletion. Well, I've > > refactored the code based on efi/next, so I'm using the recommended API > > now - thanks a bunch for the advice Ardb! > > > > - I've changed NULL-terminating char in patch 1 to the format I've seen > > in Ardb's code, L'\0'. > > > > - Patch 2 is new, it's somewhat a fix for a patch only in efi/next, part > > of the efivar refactor. > > > > > > In the V1 review, it was mentioned we could maybe use efi-pstore as a way > > to signal the firmware about a panic event - in the end, the efi-pstore > > mechanism can collect a dmesg, so it's even richer in the information level. > > But I disagree that it is the way to go, for 3 main reasons: > > > > a) efi-pstore could be impossible to use, if the users are already using > > another pstore backend (like ramoops), which is _exactly_ our case! > > Of course, we could rework pstore and allow 2 backends, quite a bit of work, > > but...see next points! > > > > b) Even if (a) is a not an issue, we have another one, even more important: > > signaling the firmware about a panic is *different* than collecting a bunch > > of data, a full dmesg even. This could be considered a security issue for > > some users; also, the dmesg collected consumes a bunch more memory in the > > (potentially scarce) UEFI available memory. > > Although related, the goal of pstore is orthogonal to our mechanism here: > > users rely on pstore to collect data, our proposal is a simple infrastructure > > to just let the firmware know about a panic. Our kernel module also shows a > > message and automatically clears the UEFI variable, so it tracks a single > > panic, whereas efi-pstore logs are kept by default, in order to provide > > data to users. > > > > c) Finally, it's faster and less "invasive"/risky to just write a byte in a > > variable on a panic event than having a ksmg dumper collecting the full dmesg > > and writing it to the UEFI memory; again, some users wish to have the logs, > > but not all of them. > > > > > > With all of that said, I think this module makes sense, it's a very simple > > solution that opens doors to firmware panic handling approaches, like in our > > proposed case (a different splash screen on panic). > > > > Finally, the variable name (PanicWarn) and value (0xFF by default, can be > > changed by a module parameter) are just my personal choices but I'm open to > > suggestions, not strongly attached to them heh > > > > Thanks again for the reviews/suggestions! > > Cheers, > > > > > > Guilherme > > > > > > Hi Ard, sorry for the ping =] > > Any opinions in this one? Patch 2 is a simple fix, BTW. Hey, No worries about the ping - apologies for the late response, I was on vacation. I still don't see the point of this series, but I will take the fix if you could please rebase it so it doesn't depend on the first patch. Thanks, Ard.
On 05/09/2022 06:50, Ard Biesheuvel wrote: > [...] >> Hi Ard, sorry for the ping =] >> >> Any opinions in this one? Patch 2 is a simple fix, BTW. > > Hey, > > No worries about the ping - apologies for the late response, I was on vacation. Hi Ard, no need for apologies at all - this is a known vacation period in the year =) > > I still don't see the point of this series, but I will take the fix if > you could please rebase it so it doesn't depend on the first patch. > That part is sad heheh I mean, it's a pity I could convince you - I still don't see how can kernel let UEFI know about a panic without this patch (or pstore, which I still think is the wrong way and sometimes impossible to use, due to other pstore backend usage). Anyway, nothing I can do about that it seems, unfortunately heheh Submitted the V2 fix here: https://lore.kernel.org/linux-efi/20220909194214.186731-1-gpiccoli@igalia.com/ Thanks, Guilherme