diff mbox

[edk2,2/2] MdeModulePkg: CoreStall: improve accuracy in iterating branch

Message ID 1410700015-23791-3-git-send-email-lersek@redhat.com
State New
Headers show

Commit Message

Laszlo Ersek Sept. 14, 2014, 1:06 p.m. UTC
The iterating branch of CoreStall() adds an extra tick for each iteration
if the division returned a nonzero remainder; effectively rounding up to
whole ticks separately per iteration. This has the potential to waste up
to 9 ticks.

While the current code is technically correct ("the Stall() function
stalls execution on the processor for *at least* the requested number of
microseconds", according to the UEFI spec, emphasis mine), we can improve
the accuracy by rounding up to the exact whole number of ticks. We
accumulate fractional ticks during the iteration, and flush whole ticks as
they become available.

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 MdeModulePkg/Core/Dxe/Misc/Stall.c | 44 +++++++++++++++++++++++++++++---------
 1 file changed, 34 insertions(+), 10 deletions(-)

Comments

Paolo Bonzini Sept. 14, 2014, 2:45 p.m. UTC | #1
Il 14/09/2014 15:06, Laszlo Ersek ha scritto:
> +      } else {
> +        Accumulator -= gMetronome->TickPeriod;
> +        if (Counter == MAX_UINT64) {
> +          CoreInternalWaitForTick (Counter);
> +          CoreInternalWaitForTick (1);
> +        } else {
> +          CoreInternalWaitForTick (Counter + 1);
> +        }

Counter is the result of a division, so it is at most
0x1999999999999999ULL.  You do not need the "if", I think.

Paolo

> +      }
> +    }
>  
> -    if (Remainder != 0) {
> -      //
> -      // If Remainder was not zero, then normally, Counter would be rounded 
> -      // up by 1 tick.  In this case, since a loop for 10 counts was used
> -      // to emulate the multiply by 10 operation, Counter needs to be rounded
> -      // up by 10 counts.
> -      //
> -      CoreInternalWaitForTick (10);
> +    //
> +    // Flush any remaining fractional ticks.
> +    //
> +    if (Accumulator > 0) {
> +      CoreInternalWaitForTick (1);
>      }


------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Paolo Bonzini Sept. 14, 2014, 2:49 p.m. UTC | #2
Il 14/09/2014 16:45, Paolo Bonzini ha scritto:
> Il 14/09/2014 15:06, Laszlo Ersek ha scritto:
>> +      } else {
>> +        Accumulator -= gMetronome->TickPeriod;
>> +        if (Counter == MAX_UINT64) {
>> +          CoreInternalWaitForTick (Counter);
>> +          CoreInternalWaitForTick (1);
>> +        } else {
>> +          CoreInternalWaitForTick (Counter + 1);
>> +        }
> 
> Counter is the result of a division, so it is at most
> 0x1999999999999999ULL.  You do not need the "if", I think.

Oops---sorry, the division is not by 10.

Still, if you count the extra number of ticks, and do a final
CoreInternalWaitForTick (together with the extra wait for Accumulator >
0), I think the code is cleaner.

Paolo

> Paolo
> 
>> +      }
>> +    }
>>  
>> -    if (Remainder != 0) {
>> -      //
>> -      // If Remainder was not zero, then normally, Counter would be rounded 
>> -      // up by 1 tick.  In this case, since a loop for 10 counts was used
>> -      // to emulate the multiply by 10 operation, Counter needs to be rounded
>> -      // up by 10 counts.
>> -      //
>> -      CoreInternalWaitForTick (10);
>> +    //
>> +    // Flush any remaining fractional ticks.
>> +    //
>> +    if (Accumulator > 0) {
>> +      CoreInternalWaitForTick (1);
>>      }
> 
> 
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> 


------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Laszlo Ersek Sept. 14, 2014, 5:16 p.m. UTC | #3
On 09/14/14 16:49, Paolo Bonzini wrote:
> Il 14/09/2014 16:45, Paolo Bonzini ha scritto:
>> Il 14/09/2014 15:06, Laszlo Ersek ha scritto:
>>> +      } else {
>>> +        Accumulator -= gMetronome->TickPeriod;
>>> +        if (Counter == MAX_UINT64) {
>>> +          CoreInternalWaitForTick (Counter);
>>> +          CoreInternalWaitForTick (1);
>>> +        } else {
>>> +          CoreInternalWaitForTick (Counter + 1);
>>> +        }
>>
>> Counter is the result of a division, so it is at most
>> 0x1999999999999999ULL.  You do not need the "if", I think.
> 
> Oops---sorry, the division is not by 10.
> 
> Still, if you count the extra number of ticks, and do a final
> CoreInternalWaitForTick (together with the extra wait for Accumulator >
> 0), I think the code is cleaner.

That did cross my mind, but I didn't realize that it would make the code
cleaner (ie. eliminate the innermost if). If the idea of accumulating
the fractions proves acceptable for the maintainers in general, then
I'll spin a v2 the way you suggest.

Thanks!
Laszlo


------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Laszlo Ersek Sept. 15, 2014, 12:17 p.m. UTC | #4
On 09/15/14 09:00, Tian, Feng wrote:
> Hi, Laszlo
> 
> Here is our intention for the logic:
> 
> If Microseconds is greater than or equal to
> 0x1999999999999999ULL(about 58,494 years), then the divide operation
> is performed first, which means we will lose some accuracy, but this
> is a very large delay request, so waiting a little bit extra in this
> extreme case is an acceptable compromise.
> 
> So I would prefer not to add this patch if there is no real impact.

Fair enough.

Thanks,
Laszlo

------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Andrew Fish Sept. 15, 2014, 4:53 p.m. UTC | #5
On Sep 15, 2014, at 12:00 AM, Tian, Feng <feng.tian@intel.com> wrote:

> Hi, Laszlo
> 
> Here is our intention for the logic:
> 
> If Microseconds is greater than or equal to 0x1999999999999999ULL(about 58,494 years), then the divide operation is performed first, which means we will lose some accuracy, but this is a very large delay request, so waiting a little bit extra in this extreme case is an acceptable compromise.  
> 
> So I would prefer not to add this patch if there is no real impact.
> 

Was this issue caught by some static analysis tool? Why do we need to have code to deal with something that could never happen? 

Thanks,

Andrew Fish

> Thanks
> Feng
> 
> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com] 
> Sent: Sunday, September 14, 2014 21:07
> To: edk2-devel@lists.sourceforge.net
> Subject: [edk2] [PATCH 2/2] MdeModulePkg: CoreStall: improve accuracy in iterating branch
> 
> The iterating branch of CoreStall() adds an extra tick for each iteration if the division returned a nonzero remainder; effectively rounding up to whole ticks separately per iteration. This has the potential to waste up to 9 ticks.
> 
> While the current code is technically correct ("the Stall() function stalls execution on the processor for *at least* the requested number of microseconds", according to the UEFI spec, emphasis mine), we can improve the accuracy by rounding up to the exact whole number of ticks. We accumulate fractional ticks during the iteration, and flush whole ticks as they become available.
> 
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
> MdeModulePkg/Core/Dxe/Misc/Stall.c | 44 +++++++++++++++++++++++++++++---------
> 1 file changed, 34 insertions(+), 10 deletions(-)
> 
> diff --git a/MdeModulePkg/Core/Dxe/Misc/Stall.c b/MdeModulePkg/Core/Dxe/Misc/Stall.c
> index aae54fd..4c63624 100644
> --- a/MdeModulePkg/Core/Dxe/Misc/Stall.c
> +++ b/MdeModulePkg/Core/Dxe/Misc/Stall.c
> @@ -51,51 +51,75 @@ CoreInternalWaitForTick (  **/  EFI_STATUS  EFIAPI  CoreStall (
>   IN UINTN            Microseconds
>   )
> {
>   UINT64  Counter;
>   UINT32  Remainder;
>   UINTN   Index;
> +  UINT64  Accumulator;
> 
>   if (gMetronome == NULL) {
>     return EFI_NOT_AVAILABLE_YET;
>   }
> 
>   //
>   // Counter = Microseconds * 10 / gMetronome->TickPeriod
>   // 0x1999999999999999 = (2^64 - 1) / 10
>   //
>   if (Microseconds > 0x1999999999999999ULL) {
>     //
>     // Microseconds is too large to multiple by 10 first.  Perform the divide 
>     // operation first and loop 10 times to avoid 64-bit math overflow.
>     //
>     Counter = DivU64x32Remainder (
>                 Microseconds,
>                 gMetronome->TickPeriod,
>                 &Remainder
>                 );
> +
> +    Accumulator = 0;
> +    //
> +    // In each iteration we wait for Counter whole ticks, and
> +    // (Remainder/TickPeriod) fractional ticks (the latter being strictly less
> +    // than one). Clearly we can only pass whole ticks to
> +    // CoreInternalWaitForTick(), therefore we keep a running account of the
> +    // fractional ticks. Whenever enough fractional ticks accumulate, we flush
> +    // a whole tick from them.
> +    //
> +    // Note that Accumulator will never reach or exceed 2*TickPeriod, because
> +    // Remainder < TickPeriod. This has two consequences:
> +    // - We never have to flush more than 1 whole tick.
> +    // - Accumulator is always expressible as UINT64 (TickPeriod being UINT32).
> +    //
>     for (Index = 0; Index < 10; Index++) {
> -      CoreInternalWaitForTick (Counter);
> -    }      
> +      Accumulator += Remainder;
> +      if (Accumulator < gMetronome->TickPeriod) {
> +        CoreInternalWaitForTick (Counter);
> +      } else {
> +        Accumulator -= gMetronome->TickPeriod;
> +        if (Counter == MAX_UINT64) {
> +          CoreInternalWaitForTick (Counter);
> +          CoreInternalWaitForTick (1);
> +        } else {
> +          CoreInternalWaitForTick (Counter + 1);
> +        }
> +      }
> +    }
> 
> -    if (Remainder != 0) {
> -      //
> -      // If Remainder was not zero, then normally, Counter would be rounded 
> -      // up by 1 tick.  In this case, since a loop for 10 counts was used
> -      // to emulate the multiply by 10 operation, Counter needs to be rounded
> -      // up by 10 counts.
> -      //
> -      CoreInternalWaitForTick (10);
> +    //
> +    // Flush any remaining fractional ticks.
> +    //
> +    if (Accumulator > 0) {
> +      CoreInternalWaitForTick (1);
>     }
>   } else {
>     //
>     // Calculate the number of ticks by dividing the number of microseconds by
>     // the TickPeriod.  Calculation is based on 100ns unit.
>     //
>     Counter = DivU64x32Remainder (
>                 MultU64x32 (Microseconds, 10),
>                 gMetronome->TickPeriod,
>                 &Remainder
> --
> 1.8.3.1
> 
> 
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/edk2-devel
> 
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/edk2-devel


------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Laszlo Ersek Sept. 15, 2014, 6:29 p.m. UTC | #6
On 09/15/14 18:53, Andrew Fish wrote:
> 
> On Sep 15, 2014, at 12:00 AM, Tian, Feng <feng.tian@intel.com> wrote:
> 
>> Hi, Laszlo
>>
>> Here is our intention for the logic:
>>
>> If Microseconds is greater than or equal to 0x1999999999999999ULL(about 58,494 years), then the divide operation is performed first, which means we will lose some accuracy, but this is a very large delay request, so waiting a little bit extra in this extreme case is an acceptable compromise.  
>>
>> So I would prefer not to add this patch if there is no real impact.
>>
> 
> Was this issue caught by some static analysis tool?

Nikolai submitted a patch that simply caused me to look at the function
in question. In 2005-2006 I had written a UDP data transfer program that
was obsessed with accurate packet rate, time keeping, and preventing
long-term drift (I still use it occasionally, if the link quality
justifies it); hence my interest in fractional ticks.

> Why do we need to have code to deal with something that could never
> happen?

Well, such function arguments *could* happen. How the boot service
should react is a different question. Technically the implementation
matches the UEFI spec.

(Compare for example the select() specification in POSIX:
<http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html>

    Implementations may place limitations on the maximum timeout
    interval supported. All implementations shall support a maximum
    timeout interval of at least 31 days. If the timeout argument
    specifies a timeout interval greater than the implementation-
    defined maximum value, the maximum value shall be used as the
    actual timeout value. [...]

The next errata / release of the UEFI spec might want to state something
like this too.)

Thanks,
Laszlo


------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Andrew Fish Sept. 15, 2014, 9:10 p.m. UTC | #7
On Sep 15, 2014, at 11:29 AM, Laszlo Ersek <lersek@redhat.com> wrote:

> On 09/15/14 18:53, Andrew Fish wrote:
>> 
>> On Sep 15, 2014, at 12:00 AM, Tian, Feng <feng.tian@intel.com> wrote:
>> 
>>> Hi, Laszlo
>>> 
>>> Here is our intention for the logic:
>>> 
>>> If Microseconds is greater than or equal to 0x1999999999999999ULL(about 58,494 years), then the divide operation is performed first, which means we will lose some accuracy, but this is a very large delay request, so waiting a little bit extra in this extreme case is an acceptable compromise.  
>>> 
>>> So I would prefer not to add this patch if there is no real impact.
>>> 
>> 
>> Was this issue caught by some static analysis tool?
> 
> Nikolai submitted a patch that simply caused me to look at the function
> in question. In 2005-2006 I had written a UDP data transfer program that
> was obsessed with accurate packet rate, time keeping, and preventing
> long-term drift (I still use it occasionally, if the link quality
> justifies it); hence my interest in fractional ticks.
> 

Sorry I slotted this in the wrong location. I wanted to ask about the initial patch that added 0x199999999999999ULL?

>> Why do we need to have code to deal with something that could never
>> happen?
> 
> Well, such function arguments *could* happen. How the boot service
> should react is a different question. Technically the implementation
> matches the UEFI spec.
> 
> (Compare for example the select() specification in POSIX:
> <http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html>
> 
>    Implementations may place limitations on the maximum timeout
>    interval supported. All implementations shall support a maximum
>    timeout interval of at least 31 days. If the timeout argument
>    specifies a timeout interval greater than the implementation-
>    defined maximum value, the maximum value shall be used as the
>    actual timeout value. [...]
> 
> The next errata / release of the UEFI spec might want to state something
> like this too.)
> 

I was thinking that capping the max value would not be a bad idea. Thanks for the POSIX reference. 

Thanks,

Andrew Fish

> Thanks,
> Laszlo
> 
> 
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/edk2-devel


------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Tian, Feng Sept. 16, 2014, 12:57 a.m. UTC | #8
Hi, Andrew

The change was proposed to solve incorrect truncation and overflow issue on old revision.

The original code is:
  Counter = (UINT32) DivU64x32Remainder (
                       Microseconds * 10,
                       gMetronome->TickPeriod,
                       &Remainder
                       );

In original code, there is Microseconds * 10 which may be out of UINTN boundary. And Counter is forced to truncate to 32 bit, which may also bring issue.

Thanks
Feng

-----Original Message-----
From: Andrew Fish [mailto:afish@apple.com] 
Sent: Tuesday, September 16, 2014 05:10
To: edk2-devel@lists.sourceforge.net
Subject: Re: [edk2] [PATCH 2/2] MdeModulePkg: CoreStall: improve accuracy in iterating branch


On Sep 15, 2014, at 11:29 AM, Laszlo Ersek <lersek@redhat.com> wrote:

> On 09/15/14 18:53, Andrew Fish wrote:
>> 
>> On Sep 15, 2014, at 12:00 AM, Tian, Feng <feng.tian@intel.com> wrote:
>> 
>>> Hi, Laszlo
>>> 
>>> Here is our intention for the logic:
>>> 
>>> If Microseconds is greater than or equal to 0x1999999999999999ULL(about 58,494 years), then the divide operation is performed first, which means we will lose some accuracy, but this is a very large delay request, so waiting a little bit extra in this extreme case is an acceptable compromise.  
>>> 
>>> So I would prefer not to add this patch if there is no real impact.
>>> 
>> 
>> Was this issue caught by some static analysis tool?
> 
> Nikolai submitted a patch that simply caused me to look at the 
> function in question. In 2005-2006 I had written a UDP data transfer 
> program that was obsessed with accurate packet rate, time keeping, and 
> preventing long-term drift (I still use it occasionally, if the link 
> quality justifies it); hence my interest in fractional ticks.
> 

Sorry I slotted this in the wrong location. I wanted to ask about the initial patch that added 0x199999999999999ULL?

>> Why do we need to have code to deal with something that could never 
>> happen?
> 
> Well, such function arguments *could* happen. How the boot service 
> should react is a different question. Technically the implementation 
> matches the UEFI spec.
> 
> (Compare for example the select() specification in POSIX:
> <http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html
> >
> 
>    Implementations may place limitations on the maximum timeout
>    interval supported. All implementations shall support a maximum
>    timeout interval of at least 31 days. If the timeout argument
>    specifies a timeout interval greater than the implementation-
>    defined maximum value, the maximum value shall be used as the
>    actual timeout value. [...]
> 
> The next errata / release of the UEFI spec might want to state 
> something like this too.)
> 

I was thinking that capping the max value would not be a bad idea. Thanks for the POSIX reference. 

Thanks,

Andrew Fish

> Thanks,
> Laszlo
> 
> 
> ----------------------------------------------------------------------
> --------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce Perforce version control. 
> Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.
> clktrk _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/edk2-devel


------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Scott Duplichan Sept. 16, 2014, 5:53 p.m. UTC | #9
Tian, Feng [mailto:feng.tian@intel.com] wrote:

]Hi, Andrew
]
]The change was proposed to solve incorrect truncation and overflow issue on old revision.
]
]The original code is:
]  Counter = (UINT32) DivU64x32Remainder (
]                       Microseconds * 10,
]                       gMetronome->TickPeriod,
]                       &Remainder
]                       );
]
]In original code, there is Microseconds * 10 which may be out of UINTN boundary. And Counter is forced to truncate to 32 bit,
]which may also bring issue.

It is good to see concern over integer overflow. So often it is just ignored. Here is an example. Which delay is longer:

MicroSecondDelay (5153376776577);
MicroSecondDelay (1);

Answer: the second one. The first malfunctions due to integer overflow.

Thanks,
Scott


]Thanks
]Feng
]
]-----Original Message-----
]From: Andrew Fish [mailto:afish@apple.com] 
]Sent: Tuesday, September 16, 2014 05:10
]To: edk2-devel@lists.sourceforge.net
]Subject: Re: [edk2] [PATCH 2/2] MdeModulePkg: CoreStall: improve accuracy in iterating branch
]
]
]On Sep 15, 2014, at 11:29 AM, Laszlo Ersek <lersek@redhat.com> wrote:
]
]> On 09/15/14 18:53, Andrew Fish wrote:
]>> 
]>> On Sep 15, 2014, at 12:00 AM, Tian, Feng <feng.tian@intel.com> wrote:
]>> 
]>>> Hi, Laszlo
]>>> 
]>>> Here is our intention for the logic:
]>>> 
]>>> If Microseconds is greater than or equal to 0x1999999999999999ULL(about 58,494 years), then the divide operation is performed
]first, which means we will lose some accuracy, but this is a very large delay request, so waiting a little bit extra in this
]extreme case is an acceptable compromise.  
]>>> 
]>>> So I would prefer not to add this patch if there is no real impact.
]>>> 
]>> 
]>> Was this issue caught by some static analysis tool?
]> 
]> Nikolai submitted a patch that simply caused me to look at the 
]> function in question. In 2005-2006 I had written a UDP data transfer 
]> program that was obsessed with accurate packet rate, time keeping, and 
]> preventing long-term drift (I still use it occasionally, if the link 
]> quality justifies it); hence my interest in fractional ticks.
]> 
]
]Sorry I slotted this in the wrong location. I wanted to ask about the initial patch that added 0x199999999999999ULL?
]
]>> Why do we need to have code to deal with something that could never 
]>> happen?
]> 
]> Well, such function arguments *could* happen. How the boot service 
]> should react is a different question. Technically the implementation 
]> matches the UEFI spec.
]> 
]> (Compare for example the select() specification in POSIX:
]> <http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html
]> >
]> 
]>    Implementations may place limitations on the maximum timeout
]>    interval supported. All implementations shall support a maximum
]>    timeout interval of at least 31 days. If the timeout argument
]>    specifies a timeout interval greater than the implementation-
]>    defined maximum value, the maximum value shall be used as the
]>    actual timeout value. [...]
]> 
]> The next errata / release of the UEFI spec might want to state 
]> something like this too.)
]> 
]
]I was thinking that capping the max value would not be a bad idea. Thanks for the POSIX reference. 
]
]Thanks,
]
]Andrew Fish
]
]> Thanks,
]> Laszlo



------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce.
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Andrew Fish Sept. 16, 2014, 6:36 p.m. UTC | #10
f
On Sep 16, 2014, at 10:53 AM, Scott Duplichan <scott@notabs.org> wrote:

> Tian, Feng [mailto:feng.tian@intel.com] wrote:
> 
> ]Hi, Andrew
> ]
> ]The change was proposed to solve incorrect truncation and overflow issue on old revision.
> ]
> ]The original code is:
> ]  Counter = (UINT32) DivU64x32Remainder (
> ]                       Microseconds * 10,
> ]                       gMetronome->TickPeriod,
> ]                       &Remainder
> ]                       );
> ]
> ]In original code, there is Microseconds * 10 which may be out of UINTN boundary. And Counter is forced to truncate to 32 bit,
> ]which may also bring issue.
> 

Well checking for a delay of 58,455 years is not solving a real problem, at least not one we can test. Making sure the 7 minute stall works is probably the problem you want to solve. 

> It is good to see concern over integer overflow. So often it is just ignored. Here is an example. Which delay is longer:
> 
> MicroSecondDelay (5153376776577);
> MicroSecondDelay (1);
> 

That seems like a bug, which TimerLib library instance is this? We should fix it. Scott do you have a system that can sit around for 60 days running the test?

Thanks,

Andrew Fish

> Answer: the second one. The first malfunctions due to integer overflow.
------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce.
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Scott Duplichan Sept. 16, 2014, 10:14 p.m. UTC | #11
Andrew Fish [mailto:afish@apple.com]  wrote:

On Sep 16, 2014, at 10:53 AM, Scott Duplichan <scott@notabs.org <mailto:scott@notabs.org> > wrote:

 

Tian, Feng [ <mailto:feng.tian@intel.com> mailto:feng.tian@intel.com] wrote:

]Hi, Andrew
]
]The change was proposed to solve incorrect truncation and overflow issue on old revision.
]
]The original code is:
]  Counter = (UINT32) DivU64x32Remainder (
]                       Microseconds * 10,
]                       gMetronome->TickPeriod,
]                       &Remainder
]                       );
]
]In original code, there is Microseconds * 10 which may be out of UINTN boundary. And Counter is forced to truncate to 32 bit,
]which may also bring issue.



 

Well checking for a delay of 58,455 years is not solving a real problem, at least not one we can test. Making sure the 7 minute
stall works is probably the problem you want to solve. 





It is good to see concern over integer overflow. So often it is just ignored. Here is an example. Which delay is longer:

MicroSecondDelay (5153376776577);
MicroSecondDelay (1);



 

That seems like a bug, which TimerLib library instance is this? We should fix it. Scott do you have a system that can sit around for
60 days running the test?

 

 

Hello Andrew,

 

The above overflow is from DuetTimerLib, one that uses the ACPI timer. Probably a similar overflow could be demonstrated for the
ones using APIC timer. In many ways it is in the same category as 58,000 year delay problem. It really doesn't matter.

 

Thanks,

Scott

 

Thanks,

 

Andrew Fish





Answer: the second one. The first malfunctions due to integer overflow.
------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce.
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Andrew Fish Sept. 16, 2014, 10:27 p.m. UTC | #12
On Sep 16, 2014, at 3:14 PM, Scott Duplichan <scott@notabs.org> wrote:

> Hello Andrew,
>  
> The above overflow is from DuetTimerLib, one that uses the ACPI timer. Probably a similar overflow could be demonstrated for the ones using APIC timer. In many ways it is in the same category as 58,000 year delay problem. It really doesn't matter.
>  

Scott,

I think this brings up the point that the TimerLIb delay routines should have an upper bound on how long they will delay for. 

There is a high probability that a TimerLib delay is much more power inefficient that a call to UEFI gBS->Stall(). So there is probably not a need for a really long stall via the TimerLib. 

Thanks,

Andrew Fish

> Thanks,
> Scott
>
------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce.
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
diff mbox

Patch

diff --git a/MdeModulePkg/Core/Dxe/Misc/Stall.c b/MdeModulePkg/Core/Dxe/Misc/Stall.c
index aae54fd..4c63624 100644
--- a/MdeModulePkg/Core/Dxe/Misc/Stall.c
+++ b/MdeModulePkg/Core/Dxe/Misc/Stall.c
@@ -51,51 +51,75 @@  CoreInternalWaitForTick (
 **/
 EFI_STATUS
 EFIAPI
 CoreStall (
   IN UINTN            Microseconds
   )
 {
   UINT64  Counter;
   UINT32  Remainder;
   UINTN   Index;
+  UINT64  Accumulator;
 
   if (gMetronome == NULL) {
     return EFI_NOT_AVAILABLE_YET;
   }
 
   //
   // Counter = Microseconds * 10 / gMetronome->TickPeriod
   // 0x1999999999999999 = (2^64 - 1) / 10
   //
   if (Microseconds > 0x1999999999999999ULL) {
     //
     // Microseconds is too large to multiple by 10 first.  Perform the divide 
     // operation first and loop 10 times to avoid 64-bit math overflow.
     //
     Counter = DivU64x32Remainder (
                 Microseconds,
                 gMetronome->TickPeriod,
                 &Remainder
                 );
+
+    Accumulator = 0;
+    //
+    // In each iteration we wait for Counter whole ticks, and
+    // (Remainder/TickPeriod) fractional ticks (the latter being strictly less
+    // than one). Clearly we can only pass whole ticks to
+    // CoreInternalWaitForTick(), therefore we keep a running account of the
+    // fractional ticks. Whenever enough fractional ticks accumulate, we flush
+    // a whole tick from them.
+    //
+    // Note that Accumulator will never reach or exceed 2*TickPeriod, because
+    // Remainder < TickPeriod. This has two consequences:
+    // - We never have to flush more than 1 whole tick.
+    // - Accumulator is always expressible as UINT64 (TickPeriod being UINT32).
+    //
     for (Index = 0; Index < 10; Index++) {
-      CoreInternalWaitForTick (Counter);
-    }      
+      Accumulator += Remainder;
+      if (Accumulator < gMetronome->TickPeriod) {
+        CoreInternalWaitForTick (Counter);
+      } else {
+        Accumulator -= gMetronome->TickPeriod;
+        if (Counter == MAX_UINT64) {
+          CoreInternalWaitForTick (Counter);
+          CoreInternalWaitForTick (1);
+        } else {
+          CoreInternalWaitForTick (Counter + 1);
+        }
+      }
+    }
 
-    if (Remainder != 0) {
-      //
-      // If Remainder was not zero, then normally, Counter would be rounded 
-      // up by 1 tick.  In this case, since a loop for 10 counts was used
-      // to emulate the multiply by 10 operation, Counter needs to be rounded
-      // up by 10 counts.
-      //
-      CoreInternalWaitForTick (10);
+    //
+    // Flush any remaining fractional ticks.
+    //
+    if (Accumulator > 0) {
+      CoreInternalWaitForTick (1);
     }
   } else {
     //
     // Calculate the number of ticks by dividing the number of microseconds by
     // the TickPeriod.  Calculation is based on 100ns unit.
     //
     Counter = DivU64x32Remainder (
                 MultU64x32 (Microseconds, 10),
                 gMetronome->TickPeriod,
                 &Remainder