Message ID | 1473429644-13480-5-git-send-email-ard.biesheuvel@linaro.org |
---|---|
State | Accepted |
Commit | d947fbed72226011961e5e2691f09baebf128795 |
Headers | show |
Liming: do you have any comments on this patch? On 9 September 2016 at 15:00, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > The new accelerated ARM and AARCH64 implementations take advantage of > features that are only available when the MMU and Dcache are on. So > restrict the use of this library to the DXE phase or later. > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > --- > MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf > index 64d11b09ef06..5ddc0cbc2d77 100644 > --- a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf > +++ b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf > @@ -116,6 +116,15 @@ [Sources.X64] > X64/CopyMem.S > X64/IsZeroBuffer.nasm > > +[Defines.ARM, Defines.AARCH64] > + # > + # The ARM implementations of this library may perform unaligned accesses, and > + # may use DC ZVA instructions that are only allowed when the MMU and D-cache > + # are on. Since SEC, PEI_CORE and PEIM modules may execute with the MMU off, > + # omit them from the supported module types list for this library. > + # > + LIBRARY_CLASS = BaseMemoryLib|DXE_CORE DXE_DRIVER DXE_RUNTIME_DRIVER UEFI_DRIVER UEFI_APPLICATION > + > [Sources.ARM] > Arm/ScanMem.S |GCC > Arm/SetMem.S |GCC > -- > 2.7.4 > _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Hi, On 09/09/2016 09:00 AM, Ard Biesheuvel wrote: > The new accelerated ARM and AARCH64 implementations take advantage of > features that are only available when the MMU and Dcache are on. So > restrict the use of this library to the DXE phase or later. I don't think this is sufficient because DC ZVA doesn't work against device memory/etc. That means that users have to somehow know the page/etc attributes of memory regions before they call SetMemXX() on them. I think this is a problem because nowhere in the UEFI specs do I see such restrictions on those memory operations. For a specific problematic example, the LcdGraphicsOutputBlt.c uses it for BltVideoFill() and the target of that is likely not regular cached video memory. > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > --- > MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf > index 64d11b09ef06..5ddc0cbc2d77 100644 > --- a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf > +++ b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf > @@ -116,6 +116,15 @@ [Sources.X64] > X64/CopyMem.S > X64/IsZeroBuffer.nasm > > +[Defines.ARM, Defines.AARCH64] > + # > + # The ARM implementations of this library may perform unaligned accesses, and > + # may use DC ZVA instructions that are only allowed when the MMU and D-cache > + # are on. Since SEC, PEI_CORE and PEIM modules may execute with the MMU off, > + # omit them from the supported module types list for this library. > + # > + LIBRARY_CLASS = BaseMemoryLib|DXE_CORE DXE_DRIVER DXE_RUNTIME_DRIVER UEFI_DRIVER UEFI_APPLICATION > + > [Sources.ARM] > Arm/ScanMem.S |GCC > Arm/SetMem.S |GCC > _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
On 5 April 2017 at 21:12, Jeremy Linton <jeremy.linton@arm.com> wrote: > Hi, > > On 09/09/2016 09:00 AM, Ard Biesheuvel wrote: >> >> The new accelerated ARM and AARCH64 implementations take advantage of >> features that are only available when the MMU and Dcache are on. So >> restrict the use of this library to the DXE phase or later. > > > I don't think this is sufficient because DC ZVA doesn't work against device > memory/etc. That means that users have to somehow know the page/etc > attributes of memory regions before they call SetMemXX() on them. > Yes. I literally found this out myself yesterday. Note that this applies equally to unaligned accesses. > I think this is a problem because nowhere in the UEFI specs do I see such > restrictions on those memory operations. > Using device attributes for memory is something we should ban for AArch64 in the spec. > For a specific problematic example, the LcdGraphicsOutputBlt.c uses it for > BltVideoFill() and the target of that is likely not regular cached video > memory. > Those drivers should be using EFI_MEMORY_WC not EFI_MEMORY_UC for the VRAM mapping. Note that EFI_MEMORY_UC is nGnRnE which is unnecessarily restrictive. I agree there is a general issue here which we should address by tightening the spec. I don't see a lot of value in avoiding DC ZVA and unaligned accesses altogether, I'd rather fix the code instead. Thanks, Ard. _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Hi, On 04/05/2017 03:34 PM, Ard Biesheuvel wrote: > On 5 April 2017 at 21:12, Jeremy Linton <jeremy.linton@arm.com> wrote: >> Hi, >> >> On 09/09/2016 09:00 AM, Ard Biesheuvel wrote: >>> >>> The new accelerated ARM and AARCH64 implementations take advantage of >>> features that are only available when the MMU and Dcache are on. So >>> restrict the use of this library to the DXE phase or later. >> >> >> I don't think this is sufficient because DC ZVA doesn't work against device >> memory/etc. That means that users have to somehow know the page/etc >> attributes of memory regions before they call SetMemXX() on them. >> > > Yes. I literally found this out myself yesterday. Note that this > applies equally to unaligned accesses. > > >> I think this is a problem because nowhere in the UEFI specs do I see such >> restrictions on those memory operations. >> > > Using device attributes for memory is something we should ban for > AArch64 in the spec. > >> For a specific problematic example, the LcdGraphicsOutputBlt.c uses it for >> BltVideoFill() and the target of that is likely not regular cached video >> memory. >> > > Those drivers should be using EFI_MEMORY_WC not EFI_MEMORY_UC for the > VRAM mapping. Note that EFI_MEMORY_UC is nGnRnE which is unnecessarily > restrictive. > > I agree there is a general issue here which we should address by > tightening the spec. I don't see a lot of value in avoiding DC ZVA and > unaligned accesses altogether, I'd rather fix the code instead. While I agree with the general sentiment, I find the result brittle. If it were used as a DEBUG build way to locate sub-optmimal code I would be more on board. But shipping it like this, puts it into situations where the user inadvertently changes something (say making the background black and therefore triggering the DC) or some obscure option ROM (we will get there right??!!) triggers it in a place where it can't be debugged. Particularly since we are talking boot, where the few percent perf improvement on this operation is likely completely undetectable. The one place where I can think it might even be measurable is in routines to clear system memory, and those routines could be a special case anyway. _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
On 5 April 2017 at 22:28, Jeremy Linton <jeremy.linton@arm.com> wrote: > Hi, > > > On 04/05/2017 03:34 PM, Ard Biesheuvel wrote: >> >> On 5 April 2017 at 21:12, Jeremy Linton <jeremy.linton@arm.com> wrote: >>> >>> Hi, >>> >>> On 09/09/2016 09:00 AM, Ard Biesheuvel wrote: >>>> >>>> >>>> The new accelerated ARM and AARCH64 implementations take advantage of >>>> features that are only available when the MMU and Dcache are on. So >>>> restrict the use of this library to the DXE phase or later. >>> >>> >>> >>> I don't think this is sufficient because DC ZVA doesn't work against >>> device >>> memory/etc. That means that users have to somehow know the page/etc >>> attributes of memory regions before they call SetMemXX() on them. >>> >> >> Yes. I literally found this out myself yesterday. Note that this >> applies equally to unaligned accesses. >> >> >>> I think this is a problem because nowhere in the UEFI specs do I see such >>> restrictions on those memory operations. >>> >> >> Using device attributes for memory is something we should ban for >> AArch64 in the spec. >> >>> For a specific problematic example, the LcdGraphicsOutputBlt.c uses it >>> for >>> BltVideoFill() and the target of that is likely not regular cached video >>> memory. >>> >> >> Those drivers should be using EFI_MEMORY_WC not EFI_MEMORY_UC for the >> VRAM mapping. Note that EFI_MEMORY_UC is nGnRnE which is unnecessarily >> restrictive. >> >> I agree there is a general issue here which we should address by >> tightening the spec. I don't see a lot of value in avoiding DC ZVA and >> unaligned accesses altogether, I'd rather fix the code instead. > > > > While I agree with the general sentiment, I find the result brittle. If it > were used as a DEBUG build way to locate sub-optmimal code I would be more > on board. But shipping it like this, puts it into situations where the user > inadvertently changes something (say making the background black and > therefore triggering the DC) or some obscure option ROM (we will get there > right??!!) triggers it in a place where it can't be debugged. > > Particularly since we are talking boot, where the few percent perf > improvement on this operation is likely completely undetectable. The one > place where I can think it might even be measurable is in routines to clear > system memory, and those routines could be a special case anyway. > I guess this depends on the use case. For server, it may not matter, but the case is different for mobile, and the Broadcom engineers that did some benchmarks on this code were very pleased with the result (and the speedup was significant, although I don't know which routines are the hotspots) As for option ROMs: those will link to their own BaseMemoryLib implementation (assuming that they are EDK2 based) so the only way they would have access to these routines is via the CopyMem() and SetMem() boot services. Note that that does not dismiss the concern at all, it is just a clarification. Leif, any thoughts? _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
On Wed, Apr 05, 2017 at 10:55:49PM +0100, Ard Biesheuvel wrote: > >>> I think this is a problem because nowhere in the UEFI specs do I see such > >>> restrictions on those memory operations. > >> > >> Using device attributes for memory is something we should ban for > >> AArch64 in the spec. Yes, completely agree. And doing so is generally the result of misinderstanding the memory model (i.e., it probably won't provide the guarantee that was sought). Charles/Dong? Something to add to list? Can we insert a test preventing device memory type to be set for regions with _WB attribute? Or is that already only possible through manual trickery? > >>> For a specific problematic example, the LcdGraphicsOutputBlt.c uses it > >>> for > >>> BltVideoFill() and the target of that is likely not regular cached video > >>> memory. > >> > >> Those drivers should be using EFI_MEMORY_WC not EFI_MEMORY_UC for the > >> VRAM mapping. Note that EFI_MEMORY_UC is nGnRnE which is unnecessarily > >> restrictive. > >> > >> I agree there is a general issue here which we should address by > >> tightening the spec. I don't see a lot of value in avoiding DC ZVA and > >> unaligned accesses altogether, I'd rather fix the code instead. > > > > While I agree with the general sentiment, I find the result brittle. If it > > were used as a DEBUG build way to locate sub-optmimal code I would be more > > on board. But shipping it like this, puts it into situations where the user > > inadvertently changes something (say making the background black and > > therefore triggering the DC) or some obscure option ROM (we will get there > > right??!!) triggers it in a place where it can't be debugged. > > > > Particularly since we are talking boot, where the few percent perf > > improvement on this operation is likely completely undetectable. The one > > place where I can think it might even be measurable is in routines to clear > > system memory, and those routines could be a special case anyway. > > I guess this depends on the use case. For server, it may not matter, > but the case is different for mobile, and the Broadcom engineers that > did some benchmarks on this code were very pleased with the result > (and the speedup was significant, although I don't know which routines > are the hotspots) > > As for option ROMs: those will link to their own BaseMemoryLib > implementation (assuming that they are EDK2 based) so the only way > they would have access to these routines is via the CopyMem() and > SetMem() boot services. Note that that does not dismiss the concern at > all, it is just a clarification. > > Leif, any thoughts? I would prefer if we could resolve this without waiting for a new spec version. My gut feeling is that the (end-user, I care a _lot_ less about development platforms) devices that _could_ be affected by this won't be releasing updated firmwares completely rebased onto a newer edk2 HEAD. Rather they're likely to be cherry-picking individual bugfixes and improvements. But certainly having some input from abovementioned Broadcom team, Evan & co, and others is important before we make a call. / Leif _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
On 6 April 2017 at 10:35, Leif Lindholm <leif.lindholm@linaro.org> wrote: > On Wed, Apr 05, 2017 at 10:55:49PM +0100, Ard Biesheuvel wrote: >> >>> I think this is a problem because nowhere in the UEFI specs do I see such >> >>> restrictions on those memory operations. >> >> >> >> Using device attributes for memory is something we should ban for >> >> AArch64 in the spec. > > Yes, completely agree. And doing so is generally the result of > misinderstanding the memory model (i.e., it probably won't provide the > guarantee that was sought). > Charles/Dong? Something to add to list? > As an additional note, the UEFI spec mandates that unaligned accesses are enabled for AArch64, which clearly expresses the intent that routines operating on memory should be able to do so without going out of its way to avoid unaligned accesses. > Can we insert a test preventing device memory type to be set for > regions with _WB attribute? Or is that already only possible through > manual trickery? > We should simply remove the _UC attribute from all memory. I have already done so for many of the platforms I more or less maintain (and for virt, we removed _WT and _WC as well, because KVM only supports _WB) Note that this does not prevent the NOR and RTC drivers from creating _UC regions for their own MMIO registers, it just prevents them from being remapped _UC via the DXE services. >> >>> For a specific problematic example, the LcdGraphicsOutputBlt.c uses it >> >>> for >> >>> BltVideoFill() and the target of that is likely not regular cached video >> >>> memory. >> >> >> >> Those drivers should be using EFI_MEMORY_WC not EFI_MEMORY_UC for the >> >> VRAM mapping. Note that EFI_MEMORY_UC is nGnRnE which is unnecessarily >> >> restrictive. >> >> >> >> I agree there is a general issue here which we should address by >> >> tightening the spec. I don't see a lot of value in avoiding DC ZVA and >> >> unaligned accesses altogether, I'd rather fix the code instead. >> > >> > While I agree with the general sentiment, I find the result brittle. If it >> > were used as a DEBUG build way to locate sub-optmimal code I would be more >> > on board. But shipping it like this, puts it into situations where the user >> > inadvertently changes something (say making the background black and >> > therefore triggering the DC) or some obscure option ROM (we will get there >> > right??!!) triggers it in a place where it can't be debugged. >> > >> > Particularly since we are talking boot, where the few percent perf >> > improvement on this operation is likely completely undetectable. The one >> > place where I can think it might even be measurable is in routines to clear >> > system memory, and those routines could be a special case anyway. >> >> I guess this depends on the use case. For server, it may not matter, >> but the case is different for mobile, and the Broadcom engineers that >> did some benchmarks on this code were very pleased with the result >> (and the speedup was significant, although I don't know which routines >> are the hotspots) >> >> As for option ROMs: those will link to their own BaseMemoryLib >> implementation (assuming that they are EDK2 based) so the only way >> they would have access to these routines is via the CopyMem() and >> SetMem() boot services. Note that that does not dismiss the concern at >> all, it is just a clarification. >> >> Leif, any thoughts? > > I would prefer if we could resolve this without waiting for a new spec > version. > > My gut feeling is that the (end-user, I care a _lot_ less > about development platforms) devices that _could_ be affected by this > won't be releasing updated firmwares completely rebased onto a newer > edk2 HEAD. Rather they're likely to be cherry-picking individual > bugfixes and improvements. > > But certainly having some input from abovementioned Broadcom team, > Evan & co, and others is important before we make a call. > > / > Leif _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
On Thu, Apr 06, 2017 at 10:43:57AM +0100, Ard Biesheuvel wrote: > On 6 April 2017 at 10:35, Leif Lindholm <leif.lindholm@linaro.org> wrote: > > On Wed, Apr 05, 2017 at 10:55:49PM +0100, Ard Biesheuvel wrote: > >> >>> I think this is a problem because nowhere in the UEFI specs do I see such > >> >>> restrictions on those memory operations. > >> >> > >> >> Using device attributes for memory is something we should ban for > >> >> AArch64 in the spec. > > > > Yes, completely agree. And doing so is generally the result of > > misinderstanding the memory model (i.e., it probably won't provide the > > guarantee that was sought). > > Charles/Dong? Something to add to list? > > As an additional note, the UEFI spec mandates that unaligned accesses > are enabled for AArch64, which clearly expresses the intent that > routines operating on memory should be able to do so without going out > of its way to avoid unaligned accesses. It does - but only if you already understand the memory model. > > Can we insert a test preventing device memory type to be set for > > regions with _WB attribute? Or is that already only possible through > > manual trickery? > > We should simply remove the _UC attribute from all memory. I have > already done so for many of the platforms I more or less maintain (and > for virt, we removed _WT and _WC as well, because KVM only supports > _WB) Agreed. > Note that this does not prevent the NOR and RTC drivers from creating > _UC regions for their own MMIO registers, it just prevents them from > being remapped _UC via the DXE services. OK, good. / Leif _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
diff --git a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf index 64d11b09ef06..5ddc0cbc2d77 100644 --- a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf +++ b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf @@ -116,6 +116,15 @@ [Sources.X64] X64/CopyMem.S X64/IsZeroBuffer.nasm +[Defines.ARM, Defines.AARCH64] + # + # The ARM implementations of this library may perform unaligned accesses, and + # may use DC ZVA instructions that are only allowed when the MMU and D-cache + # are on. Since SEC, PEI_CORE and PEIM modules may execute with the MMU off, + # omit them from the supported module types list for this library. + # + LIBRARY_CLASS = BaseMemoryLib|DXE_CORE DXE_DRIVER DXE_RUNTIME_DRIVER UEFI_DRIVER UEFI_APPLICATION + [Sources.ARM] Arm/ScanMem.S |GCC Arm/SetMem.S |GCC
The new accelerated ARM and AARCH64 implementations take advantage of features that are only available when the MMU and Dcache are on. So restrict the use of this library to the DXE phase or later. Contributed-under: TianoCore Contribution Agreement 1.0 Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> --- MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf | 9 +++++++++ 1 file changed, 9 insertions(+) -- 2.7.4 _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel