[0/29] arm meltdown fix backporting review for lts 4.9

Message ID 1519790211-16582-1-git-send-email-alex.shi@linaro.org

Message

Alex Shi Feb. 28, 2018, 3:56 a.m. UTC
Hi All,

This backport patchset fixes the Meltdown issue; its original branch is:
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kpti
A few dependency or fixing patches are also picked up where they are
necessary and make no functional changes.

The patchset is also available in this repository:
	git://git.linaro.org/kernel/linux-linaro-stable.git lts-4.9-spectrevv2

No bugs have been found yet in kernelci.org and LKFT testing.

Any comments are appreciated!

Regards
Alex

---
AKASHI Takahiro (1):
      module: extend 'rodata=off' boot cmdline parameter to module mappings

Jayachandran C (2):
      arm64: cputype: Add MIDR values for Cavium ThunderX2 CPUs
      arm64: Turn on KPTI only on CPUs that need it

Marc Zyngier (2):
      arm64: Allow checking of a CPU-local erratum
      arm64: Force KPTI to be disabled on Cavium ThunderX

Mark Rutland (1):
      arm64: factor out entry stack manipulation

Suzuki K Poulose (1):
      arm64: capabilities: Handle duplicate entries for a capability

Will Deacon (21):
      arm64: mm: Use non-global mappings for kernel space
      arm64: mm: Move ASID from TTBR0 to TTBR1
      arm64: mm: Allocate ASIDs in pairs
      arm64: mm: Add arm64_kernel_unmapped_at_el0 helper
      arm64: mm: Invalidate both kernel and user ASIDs when performing TLBI
      arm64: entry: Add exception trampoline page for exceptions from EL0
      arm64: mm: Map entry trampoline into trampoline and kernel page tables
      arm64: entry: Explicitly pass exception level to kernel_ventry macro
      arm64: entry: Hook up entry trampoline to exception vectors
      arm64: tls: Avoid unconditional zeroing of tpidrro_el0 for native tasks
      arm64: entry: Add fake CPU feature for unmapping the kernel at EL0
      arm64: kaslr: Put kernel vectors address in separate data page
      arm64: use RET instruction for exiting the trampoline
      arm64: Kconfig: Add CONFIG_UNMAP_KERNEL_AT_EL0
      arm64: Kconfig: Reword UNMAP_KERNEL_AT_EL0 kconfig entry
      arm64: Take into account ID_AA64PFR0_EL1.CSV3
      arm64: cputype: Add missing MIDR values for Cortex-A72 and Cortex-A75
      arm64: kpti: Make use of nG dependent on arm64_kernel_unmapped_at_el0()
      arm64: kpti: Add ->enable callback to remap swapper using nG mappings
      arm64: entry: Reword comment about post_ttbr_update_workaround
      arm64: idmap: Use "awx" flags for .idmap.text .pushsection directives

Xie XiuQi (1):
      arm64: entry.S: move SError handling into a C function for future expansion

Comments

Alex Shi Feb. 28, 2018, 4:02 a.m. UTC | #1
On 02/28/2018 11:56 AM, Alex Shi wrote:
> The patchset also on repository:
> 	git://git.linaro.org/kernel/linux-linaro-stable.git lts-4.9-spectrevv2

Sorry, the correct branch address is here:

https://git.linaro.org/kernel/speculation-fixes-staging.git v4.9-meltdown

Thanks
Alex
Marc Zyngier March 2, 2018, 10:32 a.m. UTC | #2
On Fri, 02 Mar 2018 09:14:50 +0000,
Alex Shi wrote:
>
> On 03/01/2018 11:24 PM, Greg KH wrote:
> > On Wed, Feb 28, 2018 at 11:56:22AM +0800, Alex Shi wrote:
> >> Hi All,
> >>
> >> This backport patchset fixed the meltdown issue, it's original branch:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kpti
> >> A few dependency or fixingpatches are also picked up, if they are necessary
> >>  and no functional changes.
> >>
> >> The patchset also on repository:
> >> 	git://git.linaro.org/kernel/linux-linaro-stable.git lts-4.9-spectrevv2
> >>
> >> No bug found yet from kernelci.org and lkft testing.
> >
> > No bugs is good, but does it actually fix the meltdown problem?  What
> > did you test it on?
>
> Oh, I have no A73/A75 cpu, so I can not reproduce meltdown bug.

Cortex-A73 is not affected by Meltdown. Only A75 is. Please don't
spread misinformation. They are both affected by Spectre though.
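
For readers who want to tell these cores apart on a running system, the following is a minimal sketch, not part of the patch series. It assumes the raw MIDR_EL1 and ID_AA64PFR0_EL1 values have already been obtained by some other means. The function names and the `MELTDOWN_AFFECTED_PARTS` set are hypothetical and only encode the statement above (of the cores named in this thread, only Cortex-A75 is affected by Meltdown); it is not a complete list of affected CPUs.

```python
# Sketch only: decode MIDR_EL1 and the CSV3 field of ID_AA64PFR0_EL1.
# The "affected" set below is an assumption taken from this thread,
# not an authoritative list.

ARM_IMPLEMENTER = 0x41  # implementer code for Arm Ltd.

PART_NAMES = {
    0xD08: "Cortex-A72",
    0xD09: "Cortex-A73",
    0xD0A: "Cortex-A75",
}

# Hypothetical helper data: per the thread, only A75 of the cores above.
MELTDOWN_AFFECTED_PARTS = {0xD0A}


def decode_midr(midr):
    """Split a MIDR_EL1 value into its architectural fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,
        "variant": (midr >> 20) & 0xF,
        "architecture": (midr >> 16) & 0xF,
        "partnum": (midr >> 4) & 0xFFF,
        "revision": midr & 0xF,
    }


def csv3_field(id_aa64pfr0):
    """Return ID_AA64PFR0_EL1.CSV3 (bits 63:60)."""
    return (id_aa64pfr0 >> 60) & 0xF


def core_name(midr):
    """Name the core if it is one of the Arm parts listed above."""
    fields = decode_midr(midr)
    if fields["implementer"] != ARM_IMPLEMENTER:
        return None
    return PART_NAMES.get(fields["partnum"])


def listed_as_meltdown_affected(midr):
    """True only for Arm parts in the illustrative affected set."""
    fields = decode_midr(midr)
    return (fields["implementer"] == ARM_IMPLEMENTER
            and fields["partnum"] in MELTDOWN_AFFECTED_PARTS)
```

For example, `core_name(0x410FD0A0)` yields "Cortex-A75". A nonzero `csv3_field()` result is the hardware's own statement that it is not vulnerable to this variant, which is what the "Take into account ID_AA64PFR0_EL1.CSV3" patch in the series keys off.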

	M.

-- 
Jazz is not dead, it just smell funny.
Greg KH March 6, 2018, 5:25 p.m. UTC | #3
On Tue, Mar 06, 2018 at 02:26:34PM +0000, Mark Brown wrote:
> On Mon, Mar 05, 2018 at 02:08:59PM +0100, Greg KH wrote:
>
> > I know there is lots more than Android to ARM, but the huge majority by
> > quantity is Android.
>
> > What I'm saying here is look at all of the backports that were required
> > to get this working in the android tree.  It was non-trivial by a long
> > shot, and based on that work, this series feels really "small" and I'm
> > really worried that it's not really working or solving the problem here.
>
> Unfortunately what's been coming over was just the bit about using
> android-common, not the bit about why you're worried about the code.  :(

Sorry, it's been a long few months, my ability to communicate well about
this topic is tough at times without assuming everyone else has been
dealing with it for as long as some of us have.

> > There are major features that were backported to the android trees for
> > ARM that the upstream features for Spectre and Meltdown built on top of
> > to get their solution.  To not backport all of that is a huge risk,
> > right?
>
> I'm not far enough into the details to comment on the specifics here;
> there's other people in the CCs who are.  Let's let people look at the
> code and see if they think some of the fixes are useful in LTS.  The
> Android tree does have things beyond what's in LTS and there's been more
> time for analysis since the changes were made there.

I suggest looking at the backports in the android-common tree that are
needed for this "feature" to work properly, and pull them out and test
them if you really want it in your Linaro trees.  If you think some of
them should be added to the LTS kernels, I'll be glad to consider them,
but don't do a hack to try to work around the lack of these features,
otherwise you will not be happy in the long-run.

Again, look at the mess we have for x86 in 4.4.y and 4.9.y.  You do not
want that for ARM for the simple reason that ARM systems usually last
"longer" with those old kernels than the x86 systems do.

> > So that's why I keep pointing people at the android trees.  Look at what
> > they did there.  There's nothing stopping anyone who is really insistent
> > on staying on these old kernel versions from pulling from those branches
> > to get these bugfixes in a known stable, and tested, implementation.
>
> I think there's enough stuff going on in the Android tree to make that
> unpalatable for a good segment of users.

Really?  Like what?  Last I looked it's only about 300 or so patches.
Something like less than .5% of the normal SoC backport size for any ARM
system recently.  There were some numbers published a few months ago
about the real count, I can dig them up if you are curious.

> > Or just move to 4.14.y.  Seriously, that's probably the safest thing in
> > the long run for anyone here.  And when you realize you can't do that,
> > go yell at your SoC for forcing you into the nightmare that they conned
> > you into by their 3+ million lines added to their kernel tree.  You were
> > always living on borrowed time, and it looks like that time is finally
> > up...
>
> Yes, there are some people who are stuck with enormous out of tree patch
> sets on most architectures (just look at the enterprise distros!) - but
> there are also people who are at or very close to vanilla and just
> trying to control their validation costs by not changing too much when
> they don't need to.

Great, then move to 4.14.y :)

And before someone says "but it takes more to validate a new kernel
version than it does to just validate a core backport for the
architecture code", well...

> There's a good discussion to be had about it being sensible for people
> to accept more change in that segment of the market but equally those
> same attitudes have been an important part of the pressure that's been
> placed on vendors long term to get things in mainline.
>
> > [1] It's also why I keep doing the LTS merges into the android-common
> >     trees within days of the upstream LTS release (today being an
> >     exception).  That way once you do a pull/merge, you can just keep
> >     always merging to keep a secure device that is always up to date
> >     with the latest LTS releases in a simple way.  How much easier can I
> >     make it for the ARM ecosystem here, really?
>
> That's great for the Android ecosystem, it's fantastic work and is doing
> a lot to overcome resistances people had there to merging up the LTS
> which is going to help many people.  While that's a very large part of
> ARM ecosystem it's not all of it, there are also chip vendors and system
> integrators who have made deliberate choices to minimize out of tree
> code just as we've been encouraging them to.

Again great, go use 4.14.y for those systems please.  It's better in the
long run.

thanks,

greg k-h
Ard Biesheuvel March 7, 2018, 6:24 p.m. UTC | #4
On 2 March 2018 at 16:54, Greg KH <greg@kroah.com> wrote:
> On Fri, Mar 02, 2018 at 05:14:50PM +0800, Alex Shi wrote:
>>
>>
>> On 03/01/2018 11:24 PM, Greg KH wrote:
>> > On Wed, Feb 28, 2018 at 11:56:22AM +0800, Alex Shi wrote:
>> >> Hi All,
>> >>
>> >> This backport patchset fixed the meltdown issue, it's original branch:
>> >> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kpti
>> >> A few dependency or fixingpatches are also picked up, if they are necessary
>> >>  and no functional changes.
>> >>
>> >> The patchset also on repository:
>> >>    git://git.linaro.org/kernel/linux-linaro-stable.git lts-4.9-spectrevv2
>> >>
>> >> No bug found yet from kernelci.org and lkft testing.
>> >
>> > No bugs is good, but does it actually fix the meltdown problem?  What
>> > did you test it on?
>>
>> Oh, I have no A73/A75 cpu, so I can not reproduce meltdown bug.
>
> Then why should I trust this backport at all?
>
> Please test on the hardware that is affected, otherwise you do not know
> if your patches do anything or not.
>

I don't think it is feasible to test these backports by confirming
that they make the fundamental issue go away. We simply don't have the
code to reproduce all the variants, and we have to rely on the
information provided by ARM Ltd. regarding which cores are affected
and which aren't.

What we can do (and what I did for the v4.14 backport) is ensure that
the mitigations take effect when they are expected to, i.e., confirm
that the trampoline vector table and page tables are being used (which
can be done using the exploit code for variant 3a btw), and to check
that the branch predictor maintenance code is called as expected. For
variant 1, we just have to have faith ...
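
A much weaker check than the one described above, but one that needs no exploit code, is to confirm that the kernel at least reports enabling the mitigation at boot. The following is a minimal sketch under stated assumptions: it assumes the boot log names the capability with the description "Kernel page table isolation (KPTI)", and the function name `kpti_reported` is hypothetical. It only shows that the kernel decided to turn KPTI on; it does not show that the trampoline vectors or page tables are actually in use.

```python
import re

# Assumption: the capability is announced in the boot log with this
# description string. Adjust the pattern if the backport words it differently.
KPTI_PATTERN = re.compile(r"Kernel page table isolation \(KPTI\)")


def kpti_reported(log_text):
    """Return True if any line of the given log text mentions KPTI."""
    return any(KPTI_PATTERN.search(line) for line in log_text.splitlines())
```

One way to use it is to capture the output of `dmesg` into a string and pass it to `kpti_reported()`, both on a core that is expected to need the mitigation and on one that is not.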

Note that I haven't done so for *this* backport, and I currently don't
have any time to spend on this.
Greg KH March 13, 2018, 10:38 a.m. UTC | #5
On Tue, Mar 13, 2018 at 10:13:26AM +0000, Ard Biesheuvel wrote:
> On 13 March 2018 at 10:04, Greg KH <greg@kroah.com> wrote:
> > On Wed, Mar 07, 2018 at 06:24:09PM +0000, Ard Biesheuvel wrote:
> >> On 2 March 2018 at 16:54, Greg KH <greg@kroah.com> wrote:
> >> > On Fri, Mar 02, 2018 at 05:14:50PM +0800, Alex Shi wrote:
> >> >>
> >> >>
> >> >> On 03/01/2018 11:24 PM, Greg KH wrote:
> >> >> > On Wed, Feb 28, 2018 at 11:56:22AM +0800, Alex Shi wrote:
> >> >> >> Hi All,
> >> >> >>
> >> >> >> This backport patchset fixed the meltdown issue, it's original branch:
> >> >> >> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kpti
> >> >> >> A few dependency or fixingpatches are also picked up, if they are necessary
> >> >> >>  and no functional changes.
> >> >> >>
> >> >> >> The patchset also on repository:
> >> >> >>    git://git.linaro.org/kernel/linux-linaro-stable.git lts-4.9-spectrevv2
> >> >> >>
> >> >> >> No bug found yet from kernelci.org and lkft testing.
> >> >> >
> >> >> > No bugs is good, but does it actually fix the meltdown problem?  What
> >> >> > did you test it on?
> >> >>
> >> >> Oh, I have no A73/A75 cpu, so I can not reproduce meltdown bug.
> >> >
> >> > Then why should I trust this backport at all?
> >> >
> >> > Please test on the hardware that is affected, otherwise you do not know
> >> > if your patches do anything or not.
> >> >
> >>
> >> I don't think it is feasible to test these backports by confirming
> >> that they make the fundamental issue go away. We simply don't have the
> >> code to reproduce all the variants, and we have to rely on the
> >> information provided by ARM Ltd. regarding which cores are affected
> >> and which aren't.
> >
> > You really don't have the reproducers?  Please work with ARM to resolve
> > that, this should not be a non-tested set of patches.  That's really
> > worse than no patches at all, as if they were applied, that would
> > provide a false-sense of "all is fixed".
> >
>
> I know that on x86, the line between architecture and platform is
> blurry. That is not the case on ARM, though.
>
> Unlike platform firmware, the OS is built on top of an abstracted
> platform which is described by ARM's Architecture Reference Manual. If
> ARM Ltd. issues recommendations regarding what firmware PSCI methods
> to call when doing a context switch, or which barrier instruction to
> issue in certain circumstances, they do so because a certain class of
> hardware may require it in some cases. It is really not up to me to go
> find some exploit code on GitHub, run it before and after applying the
> patch and conclude that the problem is fixed. Instead, what I should
> do is confirm that the changes result in the recommended actions to be
> taken at the appropriate times.

To _not_ take that exploit code and run it to _verify_ that your patches
work, would be foolish, right?

I can't believe we are having the argument of "Test that your patches
actually work"...

Ugh, these series are all now dropped from my patch queue until you all
get your act together and get someone to verify the changes actually
work.

greg k-h
Ard Biesheuvel March 13, 2018, 1:01 p.m. UTC | #6
On 13 March 2018 at 10:38, Greg KH <greg@kroah.com> wrote:
> On Tue, Mar 13, 2018 at 10:13:26AM +0000, Ard Biesheuvel wrote:
>> On 13 March 2018 at 10:04, Greg KH <greg@kroah.com> wrote:
>> > On Wed, Mar 07, 2018 at 06:24:09PM +0000, Ard Biesheuvel wrote:
>> >> On 2 March 2018 at 16:54, Greg KH <greg@kroah.com> wrote:
...
>> >> > Please test on the hardware that is affected, otherwise you do not know
>> >> > if your patches do anything or not.
>> >> >
>> >>
>> >> I don't think it is feasible to test these backports by confirming
>> >> that they make the fundamental issue go away. We simply don't have the
>> >> code to reproduce all the variants, and we have to rely on the
>> >> information provided by ARM Ltd. regarding which cores are affected
>> >> and which aren't.
>> >
>> > You really don't have the reproducers?  Please work with ARM to resolve
>> > that, this should not be a non-tested set of patches.  That's really
>> > worse than no patches at all, as if they were applied, that would
>> > provide a false-sense of "all is fixed".
>> >
>>
>> I know that on x86, the line between architecture and platform is
>> blurry. That is not the case on ARM, though.
>>
>> Unlike platform firmware, the OS is built on top of an abstracted
>> platform which is described by ARM's Architecture Reference Manual. If
>> ARM Ltd. issues recommendations regarding what firmware PSCI methods
>> to call when doing a context switch, or which barrier instruction to
>> issue in certain circumstances, they do so because a certain class of
>> hardware may require it in some cases. It is really not up to me to go
>> find some exploit code on GitHub, run it before and after applying the
>> patch and conclude that the problem is fixed. Instead, what I should
>> do is confirm that the changes result in the recommended actions to be
>> taken at the appropriate times.
>
> To _not_ take that exploit code and run it to _verify_ that your patches
> work, would be foolish, right?
>

Oh, absolutely. But that presupposes access to both the affected
hardware and the exploit code.

> I can't believe we are having the argument of "Test that your patches
> actually work"...
>
> Ugh, these series are all now dropped from my patch queue until you all
> get your act together and get someone to verify the changes actually
> work.
>

Fair enough. If anyone needs these patches for their systems, they can
respond with a Tested-by:
Greg KH March 13, 2018, 1:25 p.m. UTC | #7
On Tue, Mar 13, 2018 at 01:01:43PM +0000, Ard Biesheuvel wrote:
> On 13 March 2018 at 10:38, Greg KH <greg@kroah.com> wrote:
> > On Tue, Mar 13, 2018 at 10:13:26AM +0000, Ard Biesheuvel wrote:
> >> On 13 March 2018 at 10:04, Greg KH <greg@kroah.com> wrote:
> >> > On Wed, Mar 07, 2018 at 06:24:09PM +0000, Ard Biesheuvel wrote:
> >> >> On 2 March 2018 at 16:54, Greg KH <greg@kroah.com> wrote:
> ...
> >> >> > Please test on the hardware that is affected, otherwise you do not know
> >> >> > if your patches do anything or not.
> >> >> >
> >> >>
> >> >> I don't think it is feasible to test these backports by confirming
> >> >> that they make the fundamental issue go away. We simply don't have the
> >> >> code to reproduce all the variants, and we have to rely on the
> >> >> information provided by ARM Ltd. regarding which cores are affected
> >> >> and which aren't.
> >> >
> >> > You really don't have the reproducers?  Please work with ARM to resolve
> >> > that, this should not be a non-tested set of patches.  That's really
> >> > worse than no patches at all, as if they were applied, that would
> >> > provide a false-sense of "all is fixed".
> >> >
> >>
> >> I know that on x86, the line between architecture and platform is
> >> blurry. That is not the case on ARM, though.
> >>
> >> Unlike platform firmware, the OS is built on top of an abstracted
> >> platform which is described by ARM's Architecture Reference Manual. If
> >> ARM Ltd. issues recommendations regarding what firmware PSCI methods
> >> to call when doing a context switch, or which barrier instruction to
> >> issue in certain circumstances, they do so because a certain class of
> >> hardware may require it in some cases. It is really not up to me to go
> >> find some exploit code on GitHub, run it before and after applying the
> >> patch and conclude that the problem is fixed. Instead, what I should
> >> do is confirm that the changes result in the recommended actions to be
> >> taken at the appropriate times.
> >
> > To _not_ take that exploit code and run it to _verify_ that your patches
> > work, would be foolish, right?
> >
>
> Oh, absolutely. But that presupposes access to both the affected
> hardware and the exploit code.

If you all don't have access to both, then someone is doing something
seriously wrong.  Go complain to ARM please, we all know they have both.

I just got done yelling at a whole bunch of vendors last week about this
whole mess at a very large meeting of a lot of different Linux-based
companies.  It's crazy that the dysfunction is still happening.

greg k-h