[edk2,1/3] ArmPkg/CompilerIntrinsicsLib: replace memcpy and memset with C code

Ard Biesheuvel Aug. 12, 2016, 8:09 a.m.
On 12 August 2016 at 01:04, Andrew Fish <afish@apple.com> wrote:

On Aug 11, 2016, at 2:50 PM, Cohen, Eugene <eugene@hp.com> wrote:


>>> Why does memcpy performance matter?  In addition to the overall
>>> memcpy stuff scattered around C code we have an instance that is
>>> particularly sensitive to memcpy performance.  For DMA operations
>>> when invoking double-buffering or access to portions of a buffer that
>>> is common mapped (i.e. uncached on non-coherent DMA systems) the
>>> impact of a non-optimized memcpy is enormous compared to the
>>> optimized ones because the penalty is amplified by orders of
>>> magnitude due to uncached memory access latency.



>>> [...] compiler
>>> generated calls, which are few since Tianocore does not allow
>>> initialized locals.


>> I see and agree that should minimize the impact.  I guess I'll ask the naive question: could BaseMemoryLib and CompilerIntrinsicsLib share the same implementation?



> Eugene,


> I think if a CompilerIntrinsicsLib implementation consumes the BaseMemoryLib class (lists it under [LibraryClasses] in its INF), then it should just work.


Adding the memcpy() alias from the diff below (and something similar
for memset()) lets the AArch64 platforms I usually test with build
happily without CompilerIntrinsicsLib.
Since no other changes are required, this means that BaseMemoryLib is
already pulled into all modules one way or the other, so it would
seem like an improvement not to have both implementations, since they
do exactly the same thing.
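
To illustrate what "something similar for memset()" could look like, here
is a minimal, self-contained sketch. All names in it (DemoInternalCopyMem,
DemoInternalSetMem, demo_memcpy, demo_memset) are hypothetical stand-ins
so that it builds outside the tree with GCC or Clang on an ELF target; it
is not the code from BaseMemoryLib. It assumes the internal fill routine
takes (buffer, length, value), which should be checked against the real
prototype: if so, a plain alias is not enough for memset() and a small
wrapper has to reorder the arguments.

```c
#include <stddef.h>

/* Hypothetical stand-in for the library's internal copy routine. */
void *DemoInternalCopyMem(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--) {
        *d++ = *s++;
    }
    return dest;
}

/*
 * The prototype matches the target exactly, so a weak alias suffices:
 * demo_memcpy becomes a second symbol for the same code, and a strong
 * definition elsewhere would take precedence at link time.
 */
__attribute__((__weak__, __alias__("DemoInternalCopyMem")))
void *demo_memcpy(void *dest, const void *src, size_t n);

/* Hypothetical stand-in fill routine, assumed order: (buffer, length, value). */
void *DemoInternalSetMem(void *buf, size_t n, unsigned char value)
{
    unsigned char *p = buf;

    while (n--) {
        *p++ = value;
    }
    return buf;
}

/*
 * memset-style callers pass (buffer, value, length). The order differs
 * from the routine above, so this is a wrapper rather than an alias.
 */
void *demo_memset(void *s, int c, size_t n)
{
    return DemoInternalSetMem(s, n, (unsigned char)c);
}
```

The alias costs no extra code, while the wrapper costs one tail call; both
keep a single implementation behind the two entry points.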

For ARM, this is obviously different given the RT ABI and its __aeabi_
prefixed entry points. I suppose the memcpy and memset intrinsics are
more a GCC thing than an ARM thing.
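
For reference, a rough sketch of why the 32-bit ARM case cannot reuse the
same alias trick: the run-time ABI helpers return void, and
__aeabi_memset() takes its arguments as (dest, n, c) rather than
memset()'s (dest, c, n). The CopyBytes and FillBytes helpers below are
hypothetical stand-ins for the library's internal routines, not code
from the tree, and only three of the __aeabi_ entry points are shown.

```c
#include <stddef.h>

/* Hypothetical stand-in for an internal copy routine. */
static void *CopyBytes(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--) {
        *d++ = *s++;
    }
    return dest;
}

/* Hypothetical stand-in for an internal fill routine. */
static void *FillBytes(void *buf, size_t n, unsigned char value)
{
    unsigned char *p = buf;

    while (n--) {
        *p++ = value;
    }
    return buf;
}

/* RT ABI helper: same argument order as memcpy(), but returns void. */
void __aeabi_memcpy(void *dest, const void *src, size_t n)
{
    CopyBytes(dest, src, n);
}

/* RT ABI helper: note (dest, n, c), i.e. length before value. */
void __aeabi_memset(void *dest, size_t n, int c)
{
    FillBytes(dest, n, (unsigned char)c);
}

/* RT ABI helper: zero-fill variant, no value argument at all. */
void __aeabi_memclr(void *dest, size_t n)
{
    FillBytes(dest, n, 0);
}
```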
--- a/ArmPkg/Library/BaseMemoryLibStm/AArch64/CopyMem.c
+++ b/ArmPkg/Library/BaseMemoryLibStm/AArch64/CopyMem.c
@@ -144,3 +144,10 @@  InternalMemCopyMem (
   return DestinationBuffer;
 }
+// Make this implementation satisfy references to the intrinsic memcpy() that
+// may be emitted by the compiler.
+__attribute__((__weak__, __alias__("InternalMemCopyMem")))
+void *memcpy(void *dest, const void *src, __SIZE_TYPE__ n);