diff mbox

[edk2,v5,4/4] MdePkg/BaseMemoryLibOptDxe ARM|AARCH64: disallow use in SEC & PEI phases

Message ID 1473429644-13480-5-git-send-email-ard.biesheuvel@linaro.org
State Accepted
Commit d947fbed72226011961e5e2691f09baebf128795
Headers show

Commit Message

Ard Biesheuvel Sept. 9, 2016, 2 p.m. UTC
The new accelerated ARM and AARCH64 implementations take advantage of
features that are only available when the MMU and Dcache are on. So
restrict the use of this library to the DXE phase or later.

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

---
 MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf | 9 +++++++++
 1 file changed, 9 insertions(+)

-- 
2.7.4

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Comments

Ard Biesheuvel Sept. 13, 2016, 2:49 p.m. UTC | #1
Liming: do you have any comments on this patch?


On 9 September 2016 at 15:00, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> The new accelerated ARM and AARCH64 implementations take advantage of

> features that are only available when the MMU and Dcache are on. So

> restrict the use of this library to the DXE phase or later.

>

> Contributed-under: TianoCore Contribution Agreement 1.0

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---

>  MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf | 9 +++++++++

>  1 file changed, 9 insertions(+)

>

> diff --git a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf

> index 64d11b09ef06..5ddc0cbc2d77 100644

> --- a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf

> +++ b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf

> @@ -116,6 +116,15 @@ [Sources.X64]

>    X64/CopyMem.S

>    X64/IsZeroBuffer.nasm

>

> +[Defines.ARM, Defines.AARCH64]

> +  #

> +  # The ARM implementations of this library may perform unaligned accesses, and

> +  # may use DC ZVA instructions that are only allowed when the MMU and D-cache

> +  # are on. Since SEC, PEI_CORE and PEIM modules may execute with the MMU off,

> +  # omit them from the supported module types list for this library.

> +  #

> +  LIBRARY_CLASS = BaseMemoryLib|DXE_CORE DXE_DRIVER DXE_RUNTIME_DRIVER UEFI_DRIVER UEFI_APPLICATION

> +

>  [Sources.ARM]

>    Arm/ScanMem.S       |GCC

>    Arm/SetMem.S        |GCC

> --

> 2.7.4

>

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Jeremy Linton April 5, 2017, 8:12 p.m. UTC | #2
Hi,

On 09/09/2016 09:00 AM, Ard Biesheuvel wrote:
> The new accelerated ARM and AARCH64 implementations take advantage of

> features that are only available when the MMU and Dcache are on. So

> restrict the use of this library to the DXE phase or later.


I don't think this is sufficient because DC ZVA doesn't work against 
device memory/etc. That means that users have to somehow know the 
page/etc attributes of memory regions before they call SetMemXX() on them.

I think this is a problem because nowhere in the UEFI specs do I see 
such restrictions on those memory operations.

For a specific problematic example, the LcdGraphicsOutputBlt.c uses it 
for BltVideoFill() and the target of that is likely not regular cached 
video memory.



>

> Contributed-under: TianoCore Contribution Agreement 1.0

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---

>  MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf | 9 +++++++++

>  1 file changed, 9 insertions(+)

>

> diff --git a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf

> index 64d11b09ef06..5ddc0cbc2d77 100644

> --- a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf

> +++ b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf

> @@ -116,6 +116,15 @@ [Sources.X64]

>    X64/CopyMem.S

>    X64/IsZeroBuffer.nasm

>

> +[Defines.ARM, Defines.AARCH64]

> +  #

> +  # The ARM implementations of this library may perform unaligned accesses, and

> +  # may use DC ZVA instructions that are only allowed when the MMU and D-cache

> +  # are on. Since SEC, PEI_CORE and PEIM modules may execute with the MMU off,

> +  # omit them from the supported module types list for this library.

> +  #

> +  LIBRARY_CLASS = BaseMemoryLib|DXE_CORE DXE_DRIVER DXE_RUNTIME_DRIVER UEFI_DRIVER UEFI_APPLICATION

> +

>  [Sources.ARM]

>    Arm/ScanMem.S       |GCC

>    Arm/SetMem.S        |GCC

>


_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Ard Biesheuvel April 5, 2017, 8:34 p.m. UTC | #3
On 5 April 2017 at 21:12, Jeremy Linton <jeremy.linton@arm.com> wrote:
> Hi,

>

> On 09/09/2016 09:00 AM, Ard Biesheuvel wrote:

>>

>> The new accelerated ARM and AARCH64 implementations take advantage of

>> features that are only available when the MMU and Dcache are on. So

>> restrict the use of this library to the DXE phase or later.

>

>

> I don't think this is sufficient because DC ZVA doesn't work against device

> memory/etc. That means that users have to somehow know the page/etc

> attributes of memory regions before they call SetMemXX() on them.

>


Yes. I literally found this out myself yesterday. Note that this
applies equally to unaligned accesses.


> I think this is a problem because nowhere in the UEFI specs do I see such

> restrictions on those memory operations.

>


Using device attributes for memory is something we should ban for
AArch64 in the spec.

> For a specific problematic example, the LcdGraphicsOutputBlt.c uses it for

> BltVideoFill() and the target of that is likely not regular cached video

> memory.

>


Those drivers should be using EFI_MEMORY_WC not EFI_MEMORY_UC for the
VRAM mapping. Note that EFI_MEMORY_UC is nGnRnE which is unnecessarily
restrictive.

I agree there is a general issue here which we should address by
tightening the spec. I don't see a lot of value in avoiding DC ZVA and
unaligned accesses altogether, I'd rather fix the code instead.

Thanks,
Ard.
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Jeremy Linton April 5, 2017, 9:28 p.m. UTC | #4
Hi,

On 04/05/2017 03:34 PM, Ard Biesheuvel wrote:
> On 5 April 2017 at 21:12, Jeremy Linton <jeremy.linton@arm.com> wrote:

>> Hi,

>>

>> On 09/09/2016 09:00 AM, Ard Biesheuvel wrote:

>>>

>>> The new accelerated ARM and AARCH64 implementations take advantage of

>>> features that are only available when the MMU and Dcache are on. So

>>> restrict the use of this library to the DXE phase or later.

>>

>>

>> I don't think this is sufficient because DC ZVA doesn't work against device

>> memory/etc. That means that users have to somehow know the page/etc

>> attributes of memory regions before they call SetMemXX() on them.

>>

>

> Yes. I literally found this out myself yesterday. Note that this

> applies equally to unaligned accesses.

>

>

>> I think this is a problem because nowhere in the UEFI specs do I see such

>> restrictions on those memory operations.

>>

>

> Using device attributes for memory is something we should ban for

> AArch64 in the spec.

>

>> For a specific problematic example, the LcdGraphicsOutputBlt.c uses it for

>> BltVideoFill() and the target of that is likely not regular cached video

>> memory.

>>

>

> Those drivers should be using EFI_MEMORY_WC not EFI_MEMORY_UC for the

> VRAM mapping. Note that EFI_MEMORY_UC is nGnRnE which is unnecessarily

> restrictive.

>

> I agree there is a general issue here which we should address by

> tightening the spec. I don't see a lot of value in avoiding DC ZVA and

> unaligned accesses altogether, I'd rather fix the code instead.



While I agree with the general sentiment, I find the result brittle. If 
it were used as a DEBUG build way to locate sub-optmimal code I would be 
more on board. But shipping it like this, puts it into situations where 
the user inadvertently changes something (say making the background 
black and therefore triggering the DC) or some obscure option ROM (we 
will get there right??!!) triggers it in a place where it can't be 
debugged.

Particularly since we are talking boot, where the few percent perf 
improvement on this operation is likely completely undetectable. The one 
place where I can think it might even be measurable is in routines to 
clear system memory, and those routines could be a special case anyway.


_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Ard Biesheuvel April 5, 2017, 9:55 p.m. UTC | #5
On 5 April 2017 at 22:28, Jeremy Linton <jeremy.linton@arm.com> wrote:
> Hi,

>

>

> On 04/05/2017 03:34 PM, Ard Biesheuvel wrote:

>>

>> On 5 April 2017 at 21:12, Jeremy Linton <jeremy.linton@arm.com> wrote:

>>>

>>> Hi,

>>>

>>> On 09/09/2016 09:00 AM, Ard Biesheuvel wrote:

>>>>

>>>>

>>>> The new accelerated ARM and AARCH64 implementations take advantage of

>>>> features that are only available when the MMU and Dcache are on. So

>>>> restrict the use of this library to the DXE phase or later.

>>>

>>>

>>>

>>> I don't think this is sufficient because DC ZVA doesn't work against

>>> device

>>> memory/etc. That means that users have to somehow know the page/etc

>>> attributes of memory regions before they call SetMemXX() on them.

>>>

>>

>> Yes. I literally found this out myself yesterday. Note that this

>> applies equally to unaligned accesses.

>>

>>

>>> I think this is a problem because nowhere in the UEFI specs do I see such

>>> restrictions on those memory operations.

>>>

>>

>> Using device attributes for memory is something we should ban for

>> AArch64 in the spec.

>>

>>> For a specific problematic example, the LcdGraphicsOutputBlt.c uses it

>>> for

>>> BltVideoFill() and the target of that is likely not regular cached video

>>> memory.

>>>

>>

>> Those drivers should be using EFI_MEMORY_WC not EFI_MEMORY_UC for the

>> VRAM mapping. Note that EFI_MEMORY_UC is nGnRnE which is unnecessarily

>> restrictive.

>>

>> I agree there is a general issue here which we should address by

>> tightening the spec. I don't see a lot of value in avoiding DC ZVA and

>> unaligned accesses altogether, I'd rather fix the code instead.

>

>

>

> While I agree with the general sentiment, I find the result brittle. If it

> were used as a DEBUG build way to locate sub-optmimal code I would be more

> on board. But shipping it like this, puts it into situations where the user

> inadvertently changes something (say making the background black and

> therefore triggering the DC) or some obscure option ROM (we will get there

> right??!!) triggers it in a place where it can't be debugged.

>

> Particularly since we are talking boot, where the few percent perf

> improvement on this operation is likely completely undetectable. The one

> place where I can think it might even be measurable is in routines to clear

> system memory, and those routines could be a special case anyway.

>


I guess this depends on the use case. For server, it may not matter,
but the case is different for mobile, and the Broadcom engineers that
did some benchmarks on this code were very pleased with the result
(and the speedup was significant, although I don't know which routines
are the hotspots)

As for option ROMs: those will link to their own BaseMemoryLib
implementation (assuming that they are EDK2 based) so the only way
they would have access to these routines is via the CopyMem() and
SetMem() boot services. Note that that does not dismiss the concern at
all, it is just a clarification.

Leif, any thoughts?
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Leif Lindholm April 6, 2017, 9:35 a.m. UTC | #6
On Wed, Apr 05, 2017 at 10:55:49PM +0100, Ard Biesheuvel wrote:
> >>> I think this is a problem because nowhere in the UEFI specs do I see such

> >>> restrictions on those memory operations.

> >>

> >> Using device attributes for memory is something we should ban for

> >> AArch64 in the spec.


Yes, completely agree. And doing so is generally the result of
misinderstanding the memory model (i.e., it probably won't provide the
guarantee that was sought).
Charles/Dong? Something to add to list?

Can we insert a test preventing device memory type to be set for
regions with _WB attribute? Or is that already only possible through
manual trickery?

> >>> For a specific problematic example, the LcdGraphicsOutputBlt.c uses it

> >>> for

> >>> BltVideoFill() and the target of that is likely not regular cached video

> >>> memory.

> >>

> >> Those drivers should be using EFI_MEMORY_WC not EFI_MEMORY_UC for the

> >> VRAM mapping. Note that EFI_MEMORY_UC is nGnRnE which is unnecessarily

> >> restrictive.

> >>

> >> I agree there is a general issue here which we should address by

> >> tightening the spec. I don't see a lot of value in avoiding DC ZVA and

> >> unaligned accesses altogether, I'd rather fix the code instead.

> >

> > While I agree with the general sentiment, I find the result brittle. If it

> > were used as a DEBUG build way to locate sub-optmimal code I would be more

> > on board. But shipping it like this, puts it into situations where the user

> > inadvertently changes something (say making the background black and

> > therefore triggering the DC) or some obscure option ROM (we will get there

> > right??!!) triggers it in a place where it can't be debugged.

> >

> > Particularly since we are talking boot, where the few percent perf

> > improvement on this operation is likely completely undetectable. The one

> > place where I can think it might even be measurable is in routines to clear

> > system memory, and those routines could be a special case anyway.

> 

> I guess this depends on the use case. For server, it may not matter,

> but the case is different for mobile, and the Broadcom engineers that

> did some benchmarks on this code were very pleased with the result

> (and the speedup was significant, although I don't know which routines

> are the hotspots)

> 

> As for option ROMs: those will link to their own BaseMemoryLib

> implementation (assuming that they are EDK2 based) so the only way

> they would have access to these routines is via the CopyMem() and

> SetMem() boot services. Note that that does not dismiss the concern at

> all, it is just a clarification.

>

> Leif, any thoughts?


I would prefer if we could resolve this without waiting for a new spec
version.

My gut feeling is that the (end-user, I care a _lot_ less
about development platforms) devices that _could_ be affected by this
won't be releasing updated firmwares completely rebased onto a newer
edk2 HEAD. Rather they're likely to be cherry-picking individual
bugfixes and improvements.

But certainly having some input from abovementioned Broadcom team,
Evan & co, and others is important before we make a call.

/
    Leif
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Ard Biesheuvel April 6, 2017, 9:43 a.m. UTC | #7
On 6 April 2017 at 10:35, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> On Wed, Apr 05, 2017 at 10:55:49PM +0100, Ard Biesheuvel wrote:

>> >>> I think this is a problem because nowhere in the UEFI specs do I see such

>> >>> restrictions on those memory operations.

>> >>

>> >> Using device attributes for memory is something we should ban for

>> >> AArch64 in the spec.

>

> Yes, completely agree. And doing so is generally the result of

> misinderstanding the memory model (i.e., it probably won't provide the

> guarantee that was sought).

> Charles/Dong? Something to add to list?

>


As an additional note, the UEFI spec mandates that unaligned accesses
are enabled for AArch64, which clearly expresses the intent that
routines operating on memory should be able to do so without going out
of its way to avoid unaligned accesses.

> Can we insert a test preventing device memory type to be set for

> regions with _WB attribute? Or is that already only possible through

> manual trickery?

>


We should simply remove the _UC attribute from all memory. I have
already done so for many of the platforms I more or less maintain (and
for virt, we removed _WT and _WC as well, because KVM only supports
_WB)

Note that this does not prevent the NOR and RTC drivers from creating
_UC regions for their own MMIO registers, it just prevents them from
being remapped _UC via the DXE services.

>> >>> For a specific problematic example, the LcdGraphicsOutputBlt.c uses it

>> >>> for

>> >>> BltVideoFill() and the target of that is likely not regular cached video

>> >>> memory.

>> >>

>> >> Those drivers should be using EFI_MEMORY_WC not EFI_MEMORY_UC for the

>> >> VRAM mapping. Note that EFI_MEMORY_UC is nGnRnE which is unnecessarily

>> >> restrictive.

>> >>

>> >> I agree there is a general issue here which we should address by

>> >> tightening the spec. I don't see a lot of value in avoiding DC ZVA and

>> >> unaligned accesses altogether, I'd rather fix the code instead.

>> >

>> > While I agree with the general sentiment, I find the result brittle. If it

>> > were used as a DEBUG build way to locate sub-optmimal code I would be more

>> > on board. But shipping it like this, puts it into situations where the user

>> > inadvertently changes something (say making the background black and

>> > therefore triggering the DC) or some obscure option ROM (we will get there

>> > right??!!) triggers it in a place where it can't be debugged.

>> >

>> > Particularly since we are talking boot, where the few percent perf

>> > improvement on this operation is likely completely undetectable. The one

>> > place where I can think it might even be measurable is in routines to clear

>> > system memory, and those routines could be a special case anyway.

>>

>> I guess this depends on the use case. For server, it may not matter,

>> but the case is different for mobile, and the Broadcom engineers that

>> did some benchmarks on this code were very pleased with the result

>> (and the speedup was significant, although I don't know which routines

>> are the hotspots)

>>

>> As for option ROMs: those will link to their own BaseMemoryLib

>> implementation (assuming that they are EDK2 based) so the only way

>> they would have access to these routines is via the CopyMem() and

>> SetMem() boot services. Note that that does not dismiss the concern at

>> all, it is just a clarification.

>>

>> Leif, any thoughts?

>

> I would prefer if we could resolve this without waiting for a new spec

> version.

>

> My gut feeling is that the (end-user, I care a _lot_ less

> about development platforms) devices that _could_ be affected by this

> won't be releasing updated firmwares completely rebased onto a newer

> edk2 HEAD. Rather they're likely to be cherry-picking individual

> bugfixes and improvements.

>

> But certainly having some input from abovementioned Broadcom team,

> Evan & co, and others is important before we make a call.

>

> /

>     Leif

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Leif Lindholm April 6, 2017, 10:16 a.m. UTC | #8
On Thu, Apr 06, 2017 at 10:43:57AM +0100, Ard Biesheuvel wrote:
> On 6 April 2017 at 10:35, Leif Lindholm <leif.lindholm@linaro.org> wrote:

> > On Wed, Apr 05, 2017 at 10:55:49PM +0100, Ard Biesheuvel wrote:

> >> >>> I think this is a problem because nowhere in the UEFI specs do I see such

> >> >>> restrictions on those memory operations.

> >> >>

> >> >> Using device attributes for memory is something we should ban for

> >> >> AArch64 in the spec.

> >

> > Yes, completely agree. And doing so is generally the result of

> > misinderstanding the memory model (i.e., it probably won't provide the

> > guarantee that was sought).

> > Charles/Dong? Something to add to list?

> 

> As an additional note, the UEFI spec mandates that unaligned accesses

> are enabled for AArch64, which clearly expresses the intent that

> routines operating on memory should be able to do so without going out

> of its way to avoid unaligned accesses.


It does - but only if you already understand the memory model.

> > Can we insert a test preventing device memory type to be set for

> > regions with _WB attribute? Or is that already only possible through

> > manual trickery?

> 

> We should simply remove the _UC attribute from all memory. I have

> already done so for many of the platforms I more or less maintain (and

> for virt, we removed _WT and _WC as well, because KVM only supports

> _WB)


Agreed.

> Note that this does not prevent the NOR and RTC drivers from creating

> _UC regions for their own MMIO registers, it just prevents them from

> being remapped _UC via the DXE services.


OK, good.

/
    Leif
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
diff mbox

Patch

diff --git a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf
index 64d11b09ef06..5ddc0cbc2d77 100644
--- a/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf
+++ b/MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf
@@ -116,6 +116,15 @@  [Sources.X64]
   X64/CopyMem.S
   X64/IsZeroBuffer.nasm
 
+[Defines.ARM, Defines.AARCH64]
+  #
+  # The ARM implementations of this library may perform unaligned accesses, and
+  # may use DC ZVA instructions that are only allowed when the MMU and D-cache
+  # are on. Since SEC, PEI_CORE and PEIM modules may execute with the MMU off,
+  # omit them from the supported module types list for this library.
+  #
+  LIBRARY_CLASS = BaseMemoryLib|DXE_CORE DXE_DRIVER DXE_RUNTIME_DRIVER UEFI_DRIVER UEFI_APPLICATION
+
 [Sources.ARM]
   Arm/ScanMem.S       |GCC
   Arm/SetMem.S        |GCC