mbox series

[0/3] arm64: Allow early timestamping of kernel log

Message ID 20190722103330.255312-1-marc.zyngier@arm.com
Headers show
Series arm64: Allow early timestamping of kernel log | expand

Message

Marc Zyngier July 22, 2019, 10:33 a.m. UTC
So far, we've let the arm64 kernel start its meaningful time stamping
of the kernel log pretty late, which is caused by sched_clock() being
initialised rather late compared to other architectures.

Pavel Tatashin proposed[1] to move the initialisation of sched_clock
much earlier, which I had objections to. The reason for initialising
sched_clock late is that a number of systems have broken counters, and
we need to apply all kind of terrifying workarounds to avoid time
going backward on the affected platforms. Being able to identify the
right workaround comes pretty late in the kernel boot, and providing
an unreliable sched_clock, even for a short period of time, isn't an
appealing prospect.

To address this, I'm proposing that we allow an architecture to chose
to (1) divorce time stamping and sched_clock during the early phase of
booting, and (2) inherit the time stamping clock as the new epoch the
first time a sched_sched clock gets registered.

(1) would allow arm64 to provide a time stamping clock, however
unreliable it might be, while (2) would allow sched_clock to provide
time stamps that are continuous with the time-stamping clock.

The last patch in the series adds the necessary logic to arm64,
allowing the (potentially unreliable) time stamping of early kernel
messages.

Tested on a bunch of arm64 systems, both bare-metal and in VMs. Boot
tested on a x86 guest.

[1] https://lore.kernel.org/patchwork/patch/1015110/

Marc Zyngier (3):
  printk: Allow architecture-specific timestamping function
  sched/clock: Allow sched_clock to inherit timestamp_clock epoch
  arm64: Allow early time stamping

 arch/arm64/Kconfig          |  3 +++
 arch/arm64/kernel/setup.c   | 44 +++++++++++++++++++++++++++++++++++++
 include/linux/sched/clock.h | 13 +++++++++++
 kernel/printk/printk.c      |  4 ++--
 kernel/time/sched_clock.c   | 10 +++++++++
 5 files changed, 72 insertions(+), 2 deletions(-)

-- 
2.20.1

Comments

Pasha Tatashin July 22, 2019, 8:52 p.m. UTC | #1
On Mon, Jul 22, 2019 at 3:33 AM Marc Zyngier <marc.zyngier@arm.com> wrote:
>

> So far, we've let the arm64 kernel start its meaningful time stamping

> of the kernel log pretty late, which is caused by sched_clock() being

> initialised rather late compared to other architectures.

>

> Pavel Tatashin proposed[1] to move the initialisation of sched_clock

> much earlier, which I had objections to. The reason for initialising

> sched_clock late is that a number of systems have broken counters, and

> we need to apply all kind of terrifying workarounds to avoid time

> going backward on the affected platforms. Being able to identify the

> right workaround comes pretty late in the kernel boot, and providing

> an unreliable sched_clock, even for a short period of time, isn't an

> appealing prospect.

>

> To address this, I'm proposing that we allow an architecture to chose

> to (1) divorce time stamping and sched_clock during the early phase of

> booting, and (2) inherit the time stamping clock as the new epoch the

> first time a sched_sched clock gets registered.


Could we have a stable clock config for arm64: if it is known that
this Linux build is going to run on non-broken firmware, and with a
known stable cntvct_el0 register it can be optionally configured to
use that stable sched_clock instead of generic clock that arm64 is
using? This way, the early printk are going to be available on those
systems without adding a different method for printk's only. It would
also mean that other users of early sched_clock() such as ftrace could
benefit from it.

>

> (1) would allow arm64 to provide a time stamping clock, however

> unreliable it might be, while (2) would allow sched_clock to provide

> time stamps that are continuous with the time-stamping clock.

>

> The last patch in the series adds the necessary logic to arm64,

> allowing the (potentially unreliable) time stamping of early kernel

> messages.

>

> Tested on a bunch of arm64 systems, both bare-metal and in VMs. Boot

> tested on a x86 guest.

>

> [1] https://lore.kernel.org/patchwork/patch/1015110/

>

> Marc Zyngier (3):

>   printk: Allow architecture-specific timestamping function

>   sched/clock: Allow sched_clock to inherit timestamp_clock epoch

>   arm64: Allow early time stamping

>

>  arch/arm64/Kconfig          |  3 +++

>  arch/arm64/kernel/setup.c   | 44 +++++++++++++++++++++++++++++++++++++

>  include/linux/sched/clock.h | 13 +++++++++++

>  kernel/printk/printk.c      |  4 ++--

>  kernel/time/sched_clock.c   | 10 +++++++++

>  5 files changed, 72 insertions(+), 2 deletions(-)

>

> --

> 2.20.1

>
Hanjun Guo July 23, 2019, 2:42 a.m. UTC | #2
On 2019/7/22 18:33, Marc Zyngier wrote:
> So far, we've let the arm64 kernel start its meaningful time stamping

> of the kernel log pretty late, which is caused by sched_clock() being

> initialised rather late compared to other architectures.

> 

> Pavel Tatashin proposed[1] to move the initialisation of sched_clock

> much earlier, which I had objections to. The reason for initialising

> sched_clock late is that a number of systems have broken counters, and

> we need to apply all kind of terrifying workarounds to avoid time

> going backward on the affected platforms. Being able to identify the

> right workaround comes pretty late in the kernel boot, and providing

> an unreliable sched_clock, even for a short period of time, isn't an

> appealing prospect.

> 

> To address this, I'm proposing that we allow an architecture to chose

> to (1) divorce time stamping and sched_clock during the early phase of

> booting, and (2) inherit the time stamping clock as the new epoch the

> first time a sched_sched clock gets registered.

> 

> (1) would allow arm64 to provide a time stamping clock, however

> unreliable it might be, while (2) would allow sched_clock to provide

> time stamps that are continuous with the time-stamping clock.

> 

> The last patch in the series adds the necessary logic to arm64,

> allowing the (potentially unreliable) time stamping of early kernel

> messages.

> 

> Tested on a bunch of arm64 systems, both bare-metal and in VMs. Boot

> tested on a x86 guest.


This makes the boot log more useful and I can debug some time consuming
issue easier before the arch timer initialization, tested on my ARM64
server, I can see the timestamping from the start [1],

Tested-by: Hanjun Guo <guohanjun@huawei.com>


Thanks
Hanjun

[1]:
[    0.000000] Booting Linux on physical CPU 0x0000080000 [0x481fd010]
[    0.000000] Linux version 5.2.0+ (root@localhost.localdomain) (gcc version 9.0.1 20190312 (Red Hat 9.0.1-0.10) (GCC)) #45 SMP Tue Jul 23 09:17:48 CST 2019
[    0.000000] Using timestamp clock @100MHz
[    0.000074] efi: Getting EFI parameters from FDT:
[    0.000082] efi: EFI v2.70 by EDK II
[    0.000083] efi:  ACPI 2.0=0x3a300000  SMBIOS 3.0=0x39f80000  MEMATTR=0x30996018  MEMRESERVE=0x30997e18
[    0.000122] crashkernel reserved: 0x000000000ba00000 - 0x000000002ba00000 (512 MB)
[    0.000126] cma: Failed to reserve 512 MiB
[    0.185111] ACPI: Early table checksum verification disabled
[    0.185115] ACPI: RSDP 0x000000003A300000 000024 (v02 HISI  )
[    0.185120] ACPI: XSDT 0x000000003A270000 00009C (v01 HISI   HIP08    00000000      01000013)
[    0.185127] ACPI: FACP 0x0000000039B10000 000114 (v06 HISI   HIP08    00000000 HISI 20151124)
[    0.185134] ACPI: DSDT 0x0000000039AB0000 0084E4 (v02 HISI   HIP08    00000000 INTL 20181213)
[    0.185139] ACPI: PCCT 0x0000000039FB0000 00008A (v01 HISI   HIP08    00000000 HISI 20151124)
[    0.185143] ACPI: SSDT 0x0000000039F90000 01021A (v02 HISI   HIP07    00000000 INTL 20181213)
[    0.185147] ACPI: BERT 0x0000000039F50000 000030 (v01 HISI   HIP08    00000000 HISI 20151124)
[    0.185150] ACPI: HEST 0x0000000039F30000 000308 (v01 HISI   HIP08    00000000 HISI 20151124)
[    0.185154] ACPI: ERST 0x0000000039EF0000 000230 (v01 HISI   HIP08    00000000 HISI 20151124)
[    0.185158] ACPI: EINJ 0x0000000039EE0000 000170 (v01 HISI   HIP08    00000000 HISI 20151124)
[    0.185162] ACPI: SLIT 0x0000000039B30000 00003C (v01 HISI   HIP08    00000000 HISI 20151124)
[    0.185166] ACPI: GTDT 0x0000000039B00000 00007C (v02 HISI   HIP08    00000000 HISI 20151124)
[    0.185169] ACPI: MCFG 0x0000000039AF0000 00003C (v01 HISI   HIP08    00000000 HISI 20151124)
[    0.185173] ACPI: SPCR 0x0000000039AE0000 000050 (v02 HISI   HIP08    00000000 HISI 20151124)
[    0.185177] ACPI: SRAT 0x0000000039AD0000 0007D0 (v03 HISI   HIP08    00000000 HISI 20151124)
[    0.185181] ACPI: APIC 0x0000000039AC0000 001E6C (v04 HISI   HIP08    00000000 HISI 20151124)
[    0.185185] ACPI: IORT 0x0000000039AA0000 001310 (v00 HISI   HIP08    00000000 INTL 20181213)
[    0.185189] ACPI: PPTT 0x0000000030970000 0031B0 (v01 HISI   HIP08    00000000 HISI 20151124)
[    0.185196] ACPI: SPCR: console: pl011,mmio32,0x94080000,115200
[    0.185208] ACPI: SRAT: Node 0 PXM 0 [mem 0x2080000000-0x2fffffffff]
[    0.185210] ACPI: SRAT: Node 1 PXM 1 [mem 0x3000000000-0x3fffffffff]
[    0.185212] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
[    0.185213] ACPI: SRAT: Node 2 PXM 2 [mem 0x202000000000-0x202fffffffff]
[    0.185215] ACPI: SRAT: Node 3 PXM 3 [mem 0x203000000000-0x203fffffffff]
[    0.185221] NUMA: NODE_DATA [mem 0x2fffffe3c0-0x2fffffffff]
[    0.185224] NUMA: NODE_DATA [mem 0x3fffffe3c0-0x3fffffffff]
[    0.185226] NUMA: NODE_DATA [mem 0x202fffffe3c0-0x202fffffffff]
[    0.185229] NUMA: NODE_DATA [mem 0x203ffdfde3c0-0x203ffdfdffff]
Marc Zyngier July 23, 2019, 7:17 a.m. UTC | #3
On Mon, 22 Jul 2019 21:52:42 +0100,
Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> 

> On Mon, Jul 22, 2019 at 3:33 AM Marc Zyngier <marc.zyngier@arm.com> wrote:

> >

> > So far, we've let the arm64 kernel start its meaningful time stamping

> > of the kernel log pretty late, which is caused by sched_clock() being

> > initialised rather late compared to other architectures.

> >

> > Pavel Tatashin proposed[1] to move the initialisation of sched_clock

> > much earlier, which I had objections to. The reason for initialising

> > sched_clock late is that a number of systems have broken counters, and

> > we need to apply all kind of terrifying workarounds to avoid time

> > going backward on the affected platforms. Being able to identify the

> > right workaround comes pretty late in the kernel boot, and providing

> > an unreliable sched_clock, even for a short period of time, isn't an

> > appealing prospect.

> >

> > To address this, I'm proposing that we allow an architecture to chose

> > to (1) divorce time stamping and sched_clock during the early phase of

> > booting, and (2) inherit the time stamping clock as the new epoch the

> > first time a sched_sched clock gets registered.

> 

> Could we have a stable clock config for arm64: if it is known that

> this Linux build is going to run on non-broken firmware, and with a

> known stable cntvct_el0 register it can be optionally configured to

> use that stable sched_clock instead of generic clock that arm64 is

> using?


Hmmm. Then I guess a prerequisite is this patch:

https://lwn.net/Articles/490040/

as the number of systems this will reliably run on is a close
approximation to zero. Yes, counting is hard...

More seriously, we've been there before (cue the 32bit ARM port 8
years ago), and really don't want to go back to a time where we had
multiple config options for everything. There is one kernel, which
should run *reliably* on everything.

> This way, the early printk are going to be available on those

> systems without adding a different method for printk's only. It would

> also mean that other users of early sched_clock() such as ftrace could

> benefit from it.


See above. But maybe the real thing to do is to allow local_clock() to
be overridden. Same effect, same complexity, just hidden away from any
given subsystem.

Thanks,

	M.

-- 
Jazz is not dead, it just smells funny.
Pasha Tatashin Sept. 23, 2019, 7:13 p.m. UTC | #4
On Tue, Jul 23, 2019 at 3:17 AM Marc Zyngier <marc.zyngier@arm.com> wrote:
>

> On Mon, 22 Jul 2019 21:52:42 +0100,

> Pavel Tatashin <pasha.tatashin@soleen.com> wrote:

> >

> > On Mon, Jul 22, 2019 at 3:33 AM Marc Zyngier <marc.zyngier@arm.com> wrote:

> > >

> > > So far, we've let the arm64 kernel start its meaningful time stamping

> > > of the kernel log pretty late, which is caused by sched_clock() being

> > > initialised rather late compared to other architectures.

> > >

> > > Pavel Tatashin proposed[1] to move the initialisation of sched_clock

> > > much earlier, which I had objections to. The reason for initialising

> > > sched_clock late is that a number of systems have broken counters, and

> > > we need to apply all kind of terrifying workarounds to avoid time

> > > going backward on the affected platforms. Being able to identify the

> > > right workaround comes pretty late in the kernel boot, and providing

> > > an unreliable sched_clock, even for a short period of time, isn't an

> > > appealing prospect.

> > >

> > > To address this, I'm proposing that we allow an architecture to chose

> > > to (1) divorce time stamping and sched_clock during the early phase of

> > > booting, and (2) inherit the time stamping clock as the new epoch the

> > > first time a sched_sched clock gets registered.


Hi Marc,

I know we briefly discussed this at plumbers, but I want to bring it
up again, because I am still puzzled why it is not possible to
stabilize unstable clock early in boot.

Here is an example where clock is stabilized:
https://soleen.com/source/xref/linux/kernel/sched/clock.c?r=457c8996#265

It uses a value that is read at last ticks to normalize clock, and
because ticks are not available early in boot instead we can make sure
that early in boot sched_clock() never returns value smaller than
previously returned value, and if we want to be extra careful, we can
also make sure that sched_clock() early in boot does not jump ahead by
more than some fixed amount of time i.e. more than one hour.

If sched_clock() is available early we will get the benefit of having
other tracers that use it to debug early boot information.

Pasha