174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel

Message ID CAJZ5v0idFBHG=OKRiBqvsYPgUrWQsqjTKYXfKxfdG023qsCbkw@mail.gmail.com
State New
Headers show

Commit Message

Rafael J. Wysocki Jan. 8, 2017, 2:20 a.m.
On Sun, Jan 8, 2017 at 2:45 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Sun, Jan 8, 2017 at 2:01 AM, Borislav Petkov <bp@alien8.de> wrote:

>> On Sun, Jan 08, 2017 at 01:52:50AM +0100, Rafael J. Wysocki wrote:

>>> So we get the table, but apparently we crash when we attempt to put it.

>>

>> Right, except on 4.10-rc2 we don't crash but we freeze early. These are

>> the last lines:

>>

>> ...

>> [    0.004778] mce: CPU supports 7 MCE banks

>> [    0.004861] LVT offset 1 assigned for vector 0xf9

>> [    0.004945] Last level iTLB entries: 4KB 512, 2MB 1024, 4MB 512

>> [    0.005025] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 512, 1GB 0

>> [    0.005165] Freeing SMP alternatives memory: 24K

>> [    0.211154] ftrace: allocating 25022 entries in 98 pages

>> [    0.219614] smpboot: Max logical packages: 2

>> <EOF>

>>

>>> Let's try to check the obvious just to rule it out (see attached), but

>>> honestly I'm not sure what's going on in there.

>>

>> No change, same freeze.

>

> I was afraid that that would be the case.

>

> Can you try to comment out the acpi_put_table() in

> early_amd_iommu_init() and see if that makes any difference?


Well, there is a bug in early_amd_iommu_init() that may matter in
theory if the table checksum is incorrect.

Please see if the attached makes any difference.

Thanks,
Rafael

Comments

Borislav Petkov Jan. 8, 2017, 1:03 p.m. | #1
On Sun, Jan 08, 2017 at 03:20:20AM +0100, Rafael J. Wysocki wrote:
>  drivers/iommu/amd_iommu_init.c |    2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> Index: linux-pm/drivers/iommu/amd_iommu_init.c

> ===================================================================

> --- linux-pm.orig/drivers/iommu/amd_iommu_init.c

> +++ linux-pm/drivers/iommu/amd_iommu_init.c

> @@ -2230,7 +2230,7 @@ static int __init early_amd_iommu_init(v

>  	 */

>  	ret = check_ivrs_checksum(ivrs_base);

>  	if (ret)

> -		return ret;

> +		goto out;

>  

>  	amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);

>  	DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);


Good catch, this one needs to be applied regardless.

However, it doesn't fix my issue though.

But I think I have it - I went and applied the well-proven debugging
technique of sprinkling printks around. Here's what I'm seeing:

early_amd_iommu_init()
|-> acpi_put_table(ivrs_base);
|-> acpi_tb_put_table(table_desc);
|-> acpi_tb_invalidate_table(table_desc);
|-> acpi_tb_release_table(...)
|-> acpi_os_unmap_memory
|-> acpi_os_unmap_iomem
|-> acpi_os_map_cleanup
|-> synchronize_rcu_expedited	<-- the kernel/rcu/tree_exp.h version with CONFIG_PREEMPT_RCU=y

Now that function goes and sends IPIs, i.e., schedule_work()
but this is too early - we haven't even done workqueue_init().
Actually, from looking at the callstack, we do
kernel_init_freeable->native_smp_prepare_cpus() and workqueue_init()
comes next.

And this makes sense because the splat rIP points to __queue_work() but
we haven't done that yet.

So that acpi_put_table() is happening too early. Looks like AMD IOMMU
should not put the table but WTH do I know?!

In any case, commenting out:

        acpi_put_table(ivrs_base);
        ivrs_base = NULL;

and the end of early_amd_iommu_init() makes the box boot again.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng, Lv Jan. 9, 2017, 1:58 a.m. | #2
SGksDQoNCj4gRnJvbTogbGludXgtYWNwaS1vd25lckB2Z2VyLmtlcm5lbC5vcmcgW21haWx0bzps
aW51eC1hY3BpLW93bmVyQHZnZXIua2VybmVsLm9yZ10gT24gQmVoYWxmIE9mIEJvcmlzbGF2DQo+
IFBldGtvdg0KPiBTdWJqZWN0OiBSZTogMTc0Y2M3MTg3ZTZmIEFDUElDQTogVGFibGVzOiBCYWNr
IHBvcnQgYWNwaV9nZXRfdGFibGVfd2l0aF9zaXplKCkgYW5kDQo+IGVhcmx5X2FjcGlfb3NfdW5t
YXBfbWVtb3J5KCkgZnJvbSBMaW51eCBrZXJuZWwNCj4gDQo+IE9uIFN1biwgSmFuIDA4LCAyMDE3
IGF0IDAzOjIwOjIwQU0gKzAxMDAsIFJhZmFlbCBKLiBXeXNvY2tpIHdyb3RlOg0KPiA+ICBkcml2
ZXJzL2lvbW11L2FtZF9pb21tdV9pbml0LmMgfCAgICAyICstDQo+ID4gIDEgZmlsZSBjaGFuZ2Vk
LCAxIGluc2VydGlvbigrKSwgMSBkZWxldGlvbigtKQ0KPiA+DQo+ID4gSW5kZXg6IGxpbnV4LXBt
L2RyaXZlcnMvaW9tbXUvYW1kX2lvbW11X2luaXQuYw0KPiA+ID09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0NCj4gPiAtLS0g
bGludXgtcG0ub3JpZy9kcml2ZXJzL2lvbW11L2FtZF9pb21tdV9pbml0LmMNCj4gPiArKysgbGlu
dXgtcG0vZHJpdmVycy9pb21tdS9hbWRfaW9tbXVfaW5pdC5jDQo+ID4gQEAgLTIyMzAsNyArMjIz
MCw3IEBAIHN0YXRpYyBpbnQgX19pbml0IGVhcmx5X2FtZF9pb21tdV9pbml0KHYNCj4gPiAgCSAq
Lw0KPiA+ICAJcmV0ID0gY2hlY2tfaXZyc19jaGVja3N1bShpdnJzX2Jhc2UpOw0KPiA+ICAJaWYg
KHJldCkNCj4gPiAtCQlyZXR1cm4gcmV0Ow0KPiA+ICsJCWdvdG8gb3V0Ow0KPiA+DQo+ID4gIAlh
bWRfaW9tbXVfdGFyZ2V0X2l2aGRfdHlwZSA9IGdldF9oaWdoZXN0X3N1cHBvcnRlZF9pdmhkX3R5
cGUoaXZyc19iYXNlKTsNCj4gPiAgCURVTVBfcHJpbnRrKCJVc2luZyBJVkhEIHR5cGUgJSN4XG4i
LCBhbWRfaW9tbXVfdGFyZ2V0X2l2aGRfdHlwZSk7DQo+IA0KPiBHb29kIGNhdGNoLCB0aGlzIG9u
ZSBuZWVkcyB0byBiZSBhcHBsaWVkIHJlZ2FyZGxlc3MuDQo+IA0KPiBIb3dldmVyLCBpdCBkb2Vz
bid0IGZpeCBteSBpc3N1ZSB0aG91Z2guDQo+IA0KPiBCdXQgSSB0aGluayBJIGhhdmUgaXQgLSBJ
IHdlbnQgYW5kIGFwcGxpZWQgdGhlIHdlbGwtcHJvdmVuIGRlYnVnZ2luZw0KPiB0ZWNobmlxdWUg
b2Ygc3ByaW5rbGluZyBwcmludGtzIGFyb3VuZC4gSGVyZSdzIHdoYXQgSSdtIHNlZWluZzoNCj4g
DQo+IGVhcmx5X2FtZF9pb21tdV9pbml0KCkNCj4gfC0+IGFjcGlfcHV0X3RhYmxlKGl2cnNfYmFz
ZSk7DQo+IHwtPiBhY3BpX3RiX3B1dF90YWJsZSh0YWJsZV9kZXNjKTsNCj4gfC0+IGFjcGlfdGJf
aW52YWxpZGF0ZV90YWJsZSh0YWJsZV9kZXNjKTsNCj4gfC0+IGFjcGlfdGJfcmVsZWFzZV90YWJs
ZSguLi4pDQo+IHwtPiBhY3BpX29zX3VubWFwX21lbW9yeQ0KPiB8LT4gYWNwaV9vc191bm1hcF9p
b21lbQ0KPiB8LT4gYWNwaV9vc19tYXBfY2xlYW51cA0KPiB8LT4gc3luY2hyb25pemVfcmN1X2V4
cGVkaXRlZAk8LS0gdGhlIGtlcm5lbC9yY3UvdHJlZV9leHAuaCB2ZXJzaW9uIHdpdGggQ09ORklH
X1BSRUVNUFRfUkNVPXkNCj4gDQo+IE5vdyB0aGF0IGZ1bmN0aW9uIGdvZXMgYW5kIHNlbmRzIElQ
SXMsIGkuZS4sIHNjaGVkdWxlX3dvcmsoKQ0KPiBidXQgdGhpcyBpcyB0b28gZWFybHkgLSB3ZSBo
YXZlbid0IGV2ZW4gZG9uZSB3b3JrcXVldWVfaW5pdCgpLg0KPiBBY3R1YWxseSwgZnJvbSBsb29r
aW5nIGF0IHRoZSBjYWxsc3RhY2ssIHdlIGRvDQo+IGtlcm5lbF9pbml0X2ZyZWVhYmxlLT5uYXRp
dmVfc21wX3ByZXBhcmVfY3B1cygpIGFuZCB3b3JrcXVldWVfaW5pdCgpDQo+IGNvbWVzIG5leHQu
DQo+IA0KPiBBbmQgdGhpcyBtYWtlcyBzZW5zZSBiZWNhdXNlIHRoZSBzcGxhdCBySVAgcG9pbnRz
IHRvIF9fcXVldWVfd29yaygpIGJ1dA0KPiB3ZSBoYXZlbid0IGRvbmUgdGhhdCB5ZXQuDQo+IA0K
PiBTbyB0aGF0IGFjcGlfcHV0X3RhYmxlKCkgaXMgaGFwcGVuaW5nIHRvbyBlYXJseS4gTG9va3Mg
bGlrZSBBTUQgSU9NTVUNCj4gc2hvdWxkIG5vdCBwdXQgdGhlIHRhYmxlIGJ1dCBXVEggZG8gSSBr
bm93PyENCj4gDQo+IEluIGFueSBjYXNlLCBjb21tZW50aW5nIG91dDoNCj4gDQo+ICAgICAgICAg
YWNwaV9wdXRfdGFibGUoaXZyc19iYXNlKTsNCj4gICAgICAgICBpdnJzX2Jhc2UgPSBOVUxMOw0K
PiANCj4gYW5kIHRoZSBlbmQgb2YgZWFybHlfYW1kX2lvbW11X2luaXQoKSBtYWtlcyB0aGUgYm94
IGJvb3QgYWdhaW4uDQoNClNvIHBsZWFzZSBoZWxwIHRvIGNvbW1lbnQgb3V0IHRoZXNlIDIgbGlu
ZXMgKHdpdGggZGVzY3JpcHRpb25zIGFuZCBkbyBub3QgZGVsZXRlIHRoZW0pLg0KVW50aWwgYWNw
aV9vc191bm1hcF9tZW1vcnkoKSBpcyBhYmxlIHRvIGhhbmRsZSBzdWNoIGFuIGVhcmx5IGNhc2Uu
DQoNClRoYW5rcyBhbmQgYmVzdCByZWdhcmRzDQpMdg0KDQo+IA0KPiAtLQ0KPiBSZWdhcmRzL0dy
dXNzLA0KPiAgICAgQm9yaXMuDQo+IA0KPiBHb29kIG1haWxpbmcgcHJhY3RpY2VzIGZvciA0MDA6
IGF2b2lkIHRvcC1wb3N0aW5nIGFuZCB0cmltIHRoZSByZXBseS4NCj4gLS0NCj4gVG8gdW5zdWJz
Y3JpYmUgZnJvbSB0aGlzIGxpc3Q6IHNlbmQgdGhlIGxpbmUgInVuc3Vic2NyaWJlIGxpbnV4LWFj
cGkiIGluDQo+IHRoZSBib2R5IG9mIGEgbWVzc2FnZSB0byBtYWpvcmRvbW9Admdlci5rZXJuZWwu
b3JnDQo+IE1vcmUgbWFqb3Jkb21vIGluZm8gYXQgIGh0dHA6Ly92Z2VyLmtlcm5lbC5vcmcvbWFq
b3Jkb21vLWluZm8uaHRtbA0K
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng, Lv Jan. 9, 2017, 2:36 a.m. | #3
SGksDQoNCj4gRnJvbTogbGludXgtYWNwaS1vd25lckB2Z2VyLmtlcm5lbC5vcmcgW21haWx0bzps
aW51eC1hY3BpLW93bmVyQHZnZXIua2VybmVsLm9yZ10gT24gQmVoYWxmIE9mIFpoZW5nLA0KPiBM
dg0KPiBTdWJqZWN0OiBSRTogMTc0Y2M3MTg3ZTZmIEFDUElDQTogVGFibGVzOiBCYWNrIHBvcnQg
YWNwaV9nZXRfdGFibGVfd2l0aF9zaXplKCkgYW5kDQo+IGVhcmx5X2FjcGlfb3NfdW5tYXBfbWVt
b3J5KCkgZnJvbSBMaW51eCBrZXJuZWwNCj4gDQo+IEhpLA0KPiANCj4gPiBGcm9tOiBsaW51eC1h
Y3BpLW93bmVyQHZnZXIua2VybmVsLm9yZyBbbWFpbHRvOmxpbnV4LWFjcGktb3duZXJAdmdlci5r
ZXJuZWwub3JnXSBPbiBCZWhhbGYgT2YNCj4gQm9yaXNsYXYNCj4gPiBQZXRrb3YNCj4gPiBTdWJq
ZWN0OiBSZTogMTc0Y2M3MTg3ZTZmIEFDUElDQTogVGFibGVzOiBCYWNrIHBvcnQgYWNwaV9nZXRf
dGFibGVfd2l0aF9zaXplKCkgYW5kDQo+ID4gZWFybHlfYWNwaV9vc191bm1hcF9tZW1vcnkoKSBm
cm9tIExpbnV4IGtlcm5lbA0KPiA+DQo+ID4gT24gU3VuLCBKYW4gMDgsIDIwMTcgYXQgMDM6MjA6
MjBBTSArMDEwMCwgUmFmYWVsIEouIFd5c29ja2kgd3JvdGU6DQo+ID4gPiAgZHJpdmVycy9pb21t
dS9hbWRfaW9tbXVfaW5pdC5jIHwgICAgMiArLQ0KPiA+ID4gIDEgZmlsZSBjaGFuZ2VkLCAxIGlu
c2VydGlvbigrKSwgMSBkZWxldGlvbigtKQ0KPiA+ID4NCj4gPiA+IEluZGV4OiBsaW51eC1wbS9k
cml2ZXJzL2lvbW11L2FtZF9pb21tdV9pbml0LmMNCj4gPiA+ID09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0NCj4gPiA+IC0t
LSBsaW51eC1wbS5vcmlnL2RyaXZlcnMvaW9tbXUvYW1kX2lvbW11X2luaXQuYw0KPiA+ID4gKysr
IGxpbnV4LXBtL2RyaXZlcnMvaW9tbXUvYW1kX2lvbW11X2luaXQuYw0KPiA+ID4gQEAgLTIyMzAs
NyArMjIzMCw3IEBAIHN0YXRpYyBpbnQgX19pbml0IGVhcmx5X2FtZF9pb21tdV9pbml0KHYNCj4g
PiA+ICAJICovDQo+ID4gPiAgCXJldCA9IGNoZWNrX2l2cnNfY2hlY2tzdW0oaXZyc19iYXNlKTsN
Cj4gPiA+ICAJaWYgKHJldCkNCj4gPiA+IC0JCXJldHVybiByZXQ7DQo+ID4gPiArCQlnb3RvIG91
dDsNCj4gPiA+DQo+ID4gPiAgCWFtZF9pb21tdV90YXJnZXRfaXZoZF90eXBlID0gZ2V0X2hpZ2hl
c3Rfc3VwcG9ydGVkX2l2aGRfdHlwZShpdnJzX2Jhc2UpOw0KPiA+ID4gIAlEVU1QX3ByaW50aygi
VXNpbmcgSVZIRCB0eXBlICUjeFxuIiwgYW1kX2lvbW11X3RhcmdldF9pdmhkX3R5cGUpOw0KPiA+
DQo+ID4gR29vZCBjYXRjaCwgdGhpcyBvbmUgbmVlZHMgdG8gYmUgYXBwbGllZCByZWdhcmRsZXNz
Lg0KPiA+DQo+ID4gSG93ZXZlciwgaXQgZG9lc24ndCBmaXggbXkgaXNzdWUgdGhvdWdoLg0KPiA+
DQo+ID4gQnV0IEkgdGhpbmsgSSBoYXZlIGl0IC0gSSB3ZW50IGFuZCBhcHBsaWVkIHRoZSB3ZWxs
LXByb3ZlbiBkZWJ1Z2dpbmcNCj4gPiB0ZWNobmlxdWUgb2Ygc3ByaW5rbGluZyBwcmludGtzIGFy
b3VuZC4gSGVyZSdzIHdoYXQgSSdtIHNlZWluZzoNCj4gPg0KPiA+IGVhcmx5X2FtZF9pb21tdV9p
bml0KCkNCj4gPiB8LT4gYWNwaV9wdXRfdGFibGUoaXZyc19iYXNlKTsNCj4gPiB8LT4gYWNwaV90
Yl9wdXRfdGFibGUodGFibGVfZGVzYyk7DQo+ID4gfC0+IGFjcGlfdGJfaW52YWxpZGF0ZV90YWJs
ZSh0YWJsZV9kZXNjKTsNCj4gPiB8LT4gYWNwaV90Yl9yZWxlYXNlX3RhYmxlKC4uLikNCj4gPiB8
LT4gYWNwaV9vc191bm1hcF9tZW1vcnkNCj4gPiB8LT4gYWNwaV9vc191bm1hcF9pb21lbQ0KPiA+
IHwtPiBhY3BpX29zX21hcF9jbGVhbnVwDQo+ID4gfC0+IHN5bmNocm9uaXplX3JjdV9leHBlZGl0
ZWQJPC0tIHRoZSBrZXJuZWwvcmN1L3RyZWVfZXhwLmggdmVyc2lvbiB3aXRoIENPTkZJR19QUkVF
TVBUX1JDVT15DQo+ID4NCj4gPiBOb3cgdGhhdCBmdW5jdGlvbiBnb2VzIGFuZCBzZW5kcyBJUElz
LCBpLmUuLCBzY2hlZHVsZV93b3JrKCkNCj4gPiBidXQgdGhpcyBpcyB0b28gZWFybHkgLSB3ZSBo
YXZlbid0IGV2ZW4gZG9uZSB3b3JrcXVldWVfaW5pdCgpLg0KPiA+IEFjdHVhbGx5LCBmcm9tIGxv
b2tpbmcgYXQgdGhlIGNhbGxzdGFjaywgd2UgZG8NCj4gPiBrZXJuZWxfaW5pdF9mcmVlYWJsZS0+
bmF0aXZlX3NtcF9wcmVwYXJlX2NwdXMoKSBhbmQgd29ya3F1ZXVlX2luaXQoKQ0KPiA+IGNvbWVz
IG5leHQuDQo+ID4NCj4gPiBBbmQgdGhpcyBtYWtlcyBzZW5zZSBiZWNhdXNlIHRoZSBzcGxhdCBy
SVAgcG9pbnRzIHRvIF9fcXVldWVfd29yaygpIGJ1dA0KPiA+IHdlIGhhdmVuJ3QgZG9uZSB0aGF0
IHlldC4NCj4gPg0KPiA+IFNvIHRoYXQgYWNwaV9wdXRfdGFibGUoKSBpcyBoYXBwZW5pbmcgdG9v
IGVhcmx5LiBMb29rcyBsaWtlIEFNRCBJT01NVQ0KPiA+IHNob3VsZCBub3QgcHV0IHRoZSB0YWJs
ZSBidXQgV1RIIGRvIEkga25vdz8hDQo+ID4NCj4gPiBJbiBhbnkgY2FzZSwgY29tbWVudGluZyBv
dXQ6DQo+ID4NCj4gPiAgICAgICAgIGFjcGlfcHV0X3RhYmxlKGl2cnNfYmFzZSk7DQo+ID4gICAg
ICAgICBpdnJzX2Jhc2UgPSBOVUxMOw0KPiA+DQo+ID4gYW5kIHRoZSBlbmQgb2YgZWFybHlfYW1k
X2lvbW11X2luaXQoKSBtYWtlcyB0aGUgYm94IGJvb3QgYWdhaW4uDQo+IA0KPiBTbyBwbGVhc2Ug
aGVscCB0byBjb21tZW50IG91dCB0aGVzZSAyIGxpbmVzICh3aXRoIGRlc2NyaXB0aW9ucyBhbmQg
ZG8gbm90IGRlbGV0ZSB0aGVtKS4NCj4gVW50aWwgYWNwaV9vc191bm1hcF9tZW1vcnkoKSBpcyBh
YmxlIHRvIGhhbmRsZSBzdWNoIGFuIGVhcmx5IGNhc2UuDQoNCklNTywgc3luY2hyb25pemVfcmN1
X2V4cGVkaXRlZCgpIHNob3VsZCBiZSBpbXByb3ZlZDoNCklmIHJjdV9pbml0KCkgaXNuJ3QgY2Fs
bGVkIG9yIHRoZXJlIGlzIG5vdGhpbmcgdG8gc3luY2hyb25pemUsIHNjaGVkdWxlX3dvcmsoKSBz
aG91bGRuJ3QgYmUgaW52b2tlZC4NCg0KVGhhbmtzIGFuZCBiZXN0IHJlZ2FyZHMNCkx2DQoNCj4g
DQo+IFRoYW5rcyBhbmQgYmVzdCByZWdhcmRzDQo+IEx2DQo+IA0KPiA+DQo+ID4gLS0NCj4gPiBS
ZWdhcmRzL0dydXNzLA0KPiA+ICAgICBCb3Jpcy4NCj4gPg0KPiA+IEdvb2QgbWFpbGluZyBwcmFj
dGljZXMgZm9yIDQwMDogYXZvaWQgdG9wLXBvc3RpbmcgYW5kIHRyaW0gdGhlIHJlcGx5Lg0K
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng, Lv Jan. 9, 2017, 5:21 a.m. | #4
Hi, Borislav

> From: Zheng, Lv

> Subject: RE: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and

> early_acpi_os_unmap_memory() from Linux kernel

> 

> Hi,

> 

> > From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Zheng,

> > Lv

> > Subject: RE: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and

> > early_acpi_os_unmap_memory() from Linux kernel

> >

> > Hi,

> >

> > > From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of

> > Borislav

> > > Petkov

> > > Subject: Re: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and

> > > early_acpi_os_unmap_memory() from Linux kernel

> > >

> > > On Sun, Jan 08, 2017 at 03:20:20AM +0100, Rafael J. Wysocki wrote:

> > > >  drivers/iommu/amd_iommu_init.c |    2 +-

> > > >  1 file changed, 1 insertion(+), 1 deletion(-)

> > > >

> > > > Index: linux-pm/drivers/iommu/amd_iommu_init.c

> > > > ===================================================================

> > > > --- linux-pm.orig/drivers/iommu/amd_iommu_init.c

> > > > +++ linux-pm/drivers/iommu/amd_iommu_init.c

> > > > @@ -2230,7 +2230,7 @@ static int __init early_amd_iommu_init(v

> > > >  	 */

> > > >  	ret = check_ivrs_checksum(ivrs_base);

> > > >  	if (ret)

> > > > -		return ret;

> > > > +		goto out;

> > > >

> > > >  	amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);

> > > >  	DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);

> > >

> > > Good catch, this one needs to be applied regardless.

> > >

> > > However, it doesn't fix my issue though.

> > >

> > > But I think I have it - I went and applied the well-proven debugging

> > > technique of sprinkling printks around. Here's what I'm seeing:

> > >

> > > early_amd_iommu_init()

> > > |-> acpi_put_table(ivrs_base);

> > > |-> acpi_tb_put_table(table_desc);

> > > |-> acpi_tb_invalidate_table(table_desc);

> > > |-> acpi_tb_release_table(...)

> > > |-> acpi_os_unmap_memory

> > > |-> acpi_os_unmap_iomem

> > > |-> acpi_os_map_cleanup

> > > |-> synchronize_rcu_expedited	<-- the kernel/rcu/tree_exp.h version with CONFIG_PREEMPT_RCU=y

> > >

> > > Now that function goes and sends IPIs, i.e., schedule_work()

> > > but this is too early - we haven't even done workqueue_init().

> > > Actually, from looking at the callstack, we do

> > > kernel_init_freeable->native_smp_prepare_cpus() and workqueue_init()

> > > comes next.

> > >

> > > And this makes sense because the splat rIP points to __queue_work() but

> > > we haven't done that yet.

> > >

> > > So that acpi_put_table() is happening too early. Looks like AMD IOMMU

> > > should not put the table but WTH do I know?!

> > >

> > > In any case, commenting out:

> > >

> > >         acpi_put_table(ivrs_base);

> > >         ivrs_base = NULL;

> > >

> > > and the end of early_amd_iommu_init() makes the box boot again.

> >

> > So please help to comment out these 2 lines (with descriptions and do not delete them).

> > Until acpi_os_unmap_memory() is able to handle such an early case.

> 

> IMO, synchronize_rcu_expedited() should be improved:

> If rcu_init() isn't called or there is nothing to synchronize, schedule_work() shouldn't be invoked.



Hmm, I think this problem has its history.
In APEI code, NMI handlers cannot utilize spinlock.
So the developers there used RCU to synchronize NMI handlers before the register region is unmapped.
At that time, there might not be pre-map/post-unmap code prepared in the APEI drivers.
So the APEI developers relied on map/unmap logics implemented in the ACPICA register read/write APIs where the OSL map/unmap is invoked.

That's why the RCU code is in acpi_os_xxx().
If:
1. there is map/unmap/read/write operations available for APEI developers to invoke;
2. RCU synchronization is invoked before invoking the last unmap operation;
3. map/unmap/read/write order is strictly ensured inside of the APEI drivers.
Then we can remove RCU stuffs from acpi_os_xxx.
And the root cause of this issue can be fixed.

I'm not sure if this approach is possible, but can give it a try.
Before that I think all such acpi_put_table() should just be commented out.

Thanks and best regards
Lv

> 

> Thanks and best regards

> Lv

> 

> >

> > Thanks and best regards

> > Lv

> >

> > >

> > > --

> > > Regards/Gruss,

> > >     Boris.

> > >

> > > Good mailing practices for 400: avoid top-posting and trim the reply.
Borislav Petkov Jan. 9, 2017, 9:33 a.m. | #5
+ Paul for comment.

Leaving in the rest for him.

On Mon, Jan 09, 2017 at 02:36:33AM +0000, Zheng, Lv wrote:
> Hi,

> 

> > From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Zheng,

> > Lv

> > Subject: RE: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and

> > early_acpi_os_unmap_memory() from Linux kernel

> > 

> > Hi,

> > 

> > > From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of

> > Borislav

> > > Petkov

> > > Subject: Re: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and

> > > early_acpi_os_unmap_memory() from Linux kernel

> > >

> > > On Sun, Jan 08, 2017 at 03:20:20AM +0100, Rafael J. Wysocki wrote:

> > > >  drivers/iommu/amd_iommu_init.c |    2 +-

> > > >  1 file changed, 1 insertion(+), 1 deletion(-)

> > > >

> > > > Index: linux-pm/drivers/iommu/amd_iommu_init.c

> > > > ===================================================================

> > > > --- linux-pm.orig/drivers/iommu/amd_iommu_init.c

> > > > +++ linux-pm/drivers/iommu/amd_iommu_init.c

> > > > @@ -2230,7 +2230,7 @@ static int __init early_amd_iommu_init(v

> > > >  	 */

> > > >  	ret = check_ivrs_checksum(ivrs_base);

> > > >  	if (ret)

> > > > -		return ret;

> > > > +		goto out;

> > > >

> > > >  	amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);

> > > >  	DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);

> > >

> > > Good catch, this one needs to be applied regardless.

> > >

> > > However, it doesn't fix my issue though.

> > >

> > > But I think I have it - I went and applied the well-proven debugging

> > > technique of sprinkling printks around. Here's what I'm seeing:

> > >

> > > early_amd_iommu_init()

> > > |-> acpi_put_table(ivrs_base);

> > > |-> acpi_tb_put_table(table_desc);

> > > |-> acpi_tb_invalidate_table(table_desc);

> > > |-> acpi_tb_release_table(...)

> > > |-> acpi_os_unmap_memory

> > > |-> acpi_os_unmap_iomem

> > > |-> acpi_os_map_cleanup

> > > |-> synchronize_rcu_expedited	<-- the kernel/rcu/tree_exp.h version with CONFIG_PREEMPT_RCU=y

> > >

> > > Now that function goes and sends IPIs, i.e., schedule_work()

> > > but this is too early - we haven't even done workqueue_init().

> > > Actually, from looking at the callstack, we do

> > > kernel_init_freeable->native_smp_prepare_cpus() and workqueue_init()

> > > comes next.

> > >

> > > And this makes sense because the splat rIP points to __queue_work() but

> > > we haven't done that yet.

> > >

> > > So that acpi_put_table() is happening too early. Looks like AMD IOMMU

> > > should not put the table but WTH do I know?!

> > >

> > > In any case, commenting out:

> > >

> > >         acpi_put_table(ivrs_base);

> > >         ivrs_base = NULL;

> > >

> > > and the end of early_amd_iommu_init() makes the box boot again.

> > 

> > So please help to comment out these 2 lines (with descriptions and do not delete them).

> > Until acpi_os_unmap_memory() is able to handle such an early case.

> 

> IMO, synchronize_rcu_expedited() should be improved:

> If rcu_init() isn't called or there is nothing to synchronize, schedule_work() shouldn't be invoked.


-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Jan. 9, 2017, 10:25 p.m. | #6
Hi Paul,

On Mon, Jan 9, 2017 at 11:18 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Mon, Jan 09, 2017 at 10:33:29AM +0100, Borislav Petkov wrote:

>> + Paul for comment.

>>

>> Leaving in the rest for him.

>>

>> On Mon, Jan 09, 2017 at 02:36:33AM +0000, Zheng, Lv wrote:

>> > Hi,

>> >

>> > > From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Zheng,

>> > > Lv

>> > > Subject: RE: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and

>> > > early_acpi_os_unmap_memory() from Linux kernel

>> > >

>> > > Hi,

>> > >

>> > > > From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of

>> > > Borislav

>> > > > Petkov

>> > > > Subject: Re: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and

>> > > > early_acpi_os_unmap_memory() from Linux kernel

>> > > >

>> > > > On Sun, Jan 08, 2017 at 03:20:20AM +0100, Rafael J. Wysocki wrote:

>> > > > >  drivers/iommu/amd_iommu_init.c |    2 +-

>> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)

>> > > > >

>> > > > > Index: linux-pm/drivers/iommu/amd_iommu_init.c

>> > > > > ===================================================================

>> > > > > --- linux-pm.orig/drivers/iommu/amd_iommu_init.c

>> > > > > +++ linux-pm/drivers/iommu/amd_iommu_init.c

>> > > > > @@ -2230,7 +2230,7 @@ static int __init early_amd_iommu_init(v

>> > > > >        */

>> > > > >       ret = check_ivrs_checksum(ivrs_base);

>> > > > >       if (ret)

>> > > > > -             return ret;

>> > > > > +             goto out;

>> > > > >

>> > > > >       amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);

>> > > > >       DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);

>> > > >

>> > > > Good catch, this one needs to be applied regardless.

>> > > >

>> > > > However, it doesn't fix my issue though.

>> > > >

>> > > > But I think I have it - I went and applied the well-proven debugging

>> > > > technique of sprinkling printks around. Here's what I'm seeing:

>> > > >

>> > > > early_amd_iommu_init()

>> > > > |-> acpi_put_table(ivrs_base);

>> > > > |-> acpi_tb_put_table(table_desc);

>> > > > |-> acpi_tb_invalidate_table(table_desc);

>> > > > |-> acpi_tb_release_table(...)

>> > > > |-> acpi_os_unmap_memory

>> > > > |-> acpi_os_unmap_iomem

>> > > > |-> acpi_os_map_cleanup

>> > > > |-> synchronize_rcu_expedited   <-- the kernel/rcu/tree_exp.h version with CONFIG_PREEMPT_RCU=y

>> > > >

>> > > > Now that function goes and sends IPIs, i.e., schedule_work()

>> > > > but this is too early - we haven't even done workqueue_init().

>> > > > Actually, from looking at the callstack, we do

>> > > > kernel_init_freeable->native_smp_prepare_cpus() and workqueue_init()

>> > > > comes next.

>> > > >

>> > > > And this makes sense because the splat rIP points to __queue_work() but

>> > > > we haven't done that yet.

>> > > >

>> > > > So that acpi_put_table() is happening too early. Looks like AMD IOMMU

>> > > > should not put the table but WTH do I know?!

>> > > >

>> > > > In any case, commenting out:

>> > > >

>> > > >         acpi_put_table(ivrs_base);

>> > > >         ivrs_base = NULL;

>> > > >

>> > > > and the end of early_amd_iommu_init() makes the box boot again.

>> > >

>> > > So please help to comment out these 2 lines (with descriptions and do not delete them).

>> > > Until acpi_os_unmap_memory() is able to handle such an early case.

>> >

>> > IMO, synchronize_rcu_expedited() should be improved:

>> > If rcu_init() isn't called or there is nothing to synchronize, schedule_work() shouldn't be invoked.

>

> Indeed it should!

>

> Does the (untested) patch below fix things for you?

>

> If so, does this need to go into 4.10?  (My default workflow would get

> it into 4.11 or 4.12, so please speak up if you need it.)


Yes it should go into 4.10 (if it fixes the problem) as the reported
regression was introduced in 4.10-rc1.

>

> ------------------------------------------------------------------------

>

> commit 1b7feb708241f1662cfd529118468c9f9c0b1449

> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Date:   Mon Jan 9 14:10:50 2017 -0800

>

>     rcu: Make synchronize_rcu_expedited() safe for early boot

>

>     The synchronize_rcu_expedited() function does not check for early-boot

>     use, which can result in failures if it is invoked before the scheduler

>     has started.  Given that the rcupdate.rcu_expedited kernel parameter

>     causes all calls to synchronize_rcu() to be directed instead to

>     synchronize_rcu_expedited(), a usage restriction does not make sense.

>

>     This commit therefore adds a rcu_scheduler_active check to

>     synchronize_rcu_expedited(), so that it is a no-op before the scheduler

>     starts.  This behavior is correct because there is only a single CPU

>     running during that time.

>

>     Reported-by: Lv Zheng <lv.zheng@intel.com>

>     Reported-by: Borislav Petkov <bp@alien8.de>

>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


When applying this, please also add:

Fixes: 174cc7187e6f ("ACPICA: Tables: Back port
acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux
kernel")
Link: https://lkml.kernel.org/r/4034dde8-ffc1-18e2-f40c-00cf37471793@intel.com

>

> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h

> index dfc3ba5a429e..a6c3d86480de 100644

> --- a/kernel/rcu/tree_exp.h

> +++ b/kernel/rcu/tree_exp.h

> @@ -690,6 +690,8 @@ void synchronize_rcu_expedited(void)

>  {

>         struct rcu_state *rsp = rcu_state_p;

>

> +       if (!rcu_scheduler_active)

> +               return;

>         _synchronize_rcu_expedited(rsp, sync_rcu_exp_handler);

>  }

>  EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);


Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Jan. 9, 2017, 10:57 p.m. | #7
On Mon, Jan 09, 2017 at 11:41:15PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> Subject: [PATCH] IOMMU / AMD: Fix error code path in early_amd_iommu_init()

> 

> Prevent early_amd_iommu_init() from leaking memory mapped via

> acpi_get_table() if check_ivrs_checksum() returns an error.

> 

> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---

>  drivers/iommu/amd_iommu_init.c |    2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> Index: linux-pm/drivers/iommu/amd_iommu_init.c

> ===================================================================

> --- linux-pm.orig/drivers/iommu/amd_iommu_init.c

> +++ linux-pm/drivers/iommu/amd_iommu_init.c

> @@ -2230,7 +2230,7 @@ static int __init early_amd_iommu_init(v

>  	 */

>  	ret = check_ivrs_checksum(ivrs_base);

>  	if (ret)

> -		return ret;

> +		goto out;

>  

>  	amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);

>  	DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);


Reviewed-by: Borislav Petkov <bp@suse.de>


-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Jan. 9, 2017, 11:42 p.m. | #8
On Tue, Jan 10, 2017 at 12:32 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Jan 10, 2017 at 12:15:01AM +0100, Borislav Petkov wrote:

>> On Mon, Jan 09, 2017 at 02:18:31PM -0800, Paul E. McKenney wrote:

>> > @@ -690,6 +690,8 @@ void synchronize_rcu_expedited(void)

>> >  {

>> >     struct rcu_state *rsp = rcu_state_p;

>> >

>> > +   if (!rcu_scheduler_active)

>> > +           return;

>> >     _synchronize_rcu_expedited(rsp, sync_rcu_exp_handler);

>> >  }

>> >  EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);

>>

>> That doesn't work and it is because of those damn what goes before what

>> boot sequence issues :-\

>>

>> We have:

>>

>> rest_init()

>> |-> rcu_scheduler_starting()  ---> that sets rcu_scheduler_active = 1;

>> |-> kernel_thread(kernel_init, NULL, CLONE_FS);

>> |-> kernel_init()

>> |-> kernel_init_freeable()

>> |-> native_smp_prepare_cpus(setup_max_cpus)

>> |-> default_setup_apic_routing

>> |-> enable_IR_x2apic

>> |-> irq_remapping_prepare

>> |-> amd_iommu_prepare

>> |-> iommu_go_to_state

>> |-> acpi_put_table(ivrs_base);

>> |-> acpi_tb_put_table(table_desc);

>> |-> acpi_tb_invalidate_table(table_desc);

>> |-> acpi_tb_release_table(...)

>> |-> acpi_os_unmap_memory

>> |-> acpi_os_unmap_iomem

>> |-> acpi_os_map_cleanup

>> |-> synchronize_rcu_expedited()

>>

>> Now here we have rcu_scheduler_active already set so the test doesn't

>> hit and we hang.

>>

>> So we must do it differently.

>

> Yeah, there is a window just as the scheduler is starting where things don't

> work.

>

> We could move rcu_scheduler_starting() later, as long as there

> is no chance of preemption or context switch before it is invoked.

> Would that help in this case, or are we already context switching before

> acpi_os_map_cleanup() is invoked?


In the particular AMD IOMMU case it doesn't look like we are, but we
do in other cases.

> (If we are already context switching,

> short-circuiting synchronize_rcu_expedited() would be a bug.)


It may be easier to make the caller avoid RCU synchronization
altogether if that's not necessary and the caller should actually be
able to figure out when that's the case.

The patch from Lv at https://patchwork.kernel.org/patch/9504277/ goes
in the right direction IMO, but I'm not yet convinced that this is the
right one.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney Jan. 9, 2017, 11:52 p.m. | #9
On Tue, Jan 10, 2017 at 12:40:39AM +0100, Borislav Petkov wrote:
> On Mon, Jan 09, 2017 at 03:32:04PM -0800, Paul E. McKenney wrote:

> > We could move rcu_scheduler_starting() later, as long as there

> > is no chance of preemption or context switch before it is invoked.

> > Would that help in this case, or are we already context switching before

> > acpi_os_map_cleanup() is invoked?  (If we are already context switching,

> > short-circuiting synchronize_rcu_expedited() would be a bug.)

> 

> Hmm, how about the below?

> 

> It would still happen before

> 

>         /*

>          * The boot idle thread must execute schedule()

>          * at least once to get things moving:

>          */

>         init_idle_bootup_task(current);

>         schedule_preempt_disabled();

> 

> in rest_init() and right after native_smp_prepare_cpus() which is where

> we're splatting.

> 

> Lemme run it.

> 

> Even if it works, we would have to stress-test this seriously...


Yeah, the call to wait_for_completion() at the beginning of
kernel_init_freeable() makes me extremely nervous.  Even if it does
happen to work, this looks like an accident waiting to happen.

Is it possible to instead move the ACPI initialization to follow the
workqueue initialization?

							Thanx, Paul

> ---

> diff --git a/init/main.c b/init/main.c

> index b0c9d6facef9..9be221cc87c3 100644

> --- a/init/main.c

> +++ b/init/main.c

> @@ -385,7 +385,6 @@ static noinline void __ref rest_init(void)

>  {

>  	int pid;

> 

> -	rcu_scheduler_starting();

>  	/*

>  	 * We need to spawn init first so that it obtains pid 1, however

>  	 * the init task will end up wanting to create kthreads, which, if

> @@ -1019,6 +1018,8 @@ static noinline void __init kernel_init_freeable(void)

> 

>  	smp_prepare_cpus(setup_max_cpus);

> 

> +	rcu_scheduler_starting();

> +

>  	workqueue_init();

> 

>  	do_pre_smp_initcalls();

> 

> 

> -- 

> Regards/Gruss,

>     Boris.

> 

> Good mailing practices for 400: avoid top-posting and trim the reply.

> 


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Jan. 9, 2017, 11:52 p.m. | #10
On Tue, Jan 10, 2017 at 12:40:39AM +0100, Borislav Petkov wrote:
> Lemme run it.


Well, it boots but I get:

[    0.291447] ------------[ cut here ]------------
[    0.291702] WARNING: CPU: 0 PID: 1 at kernel/rcu/tree.c:3993 rcu_scheduler_starting+0x5c/0x70
[    0.292107] Modules linked in:
[    0.292277] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc3+ #21
[    0.292540] Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.08 01/28/2016
[    0.292893] Call Trace:
[    0.293072]  ? dump_stack+0x46/0x63
[    0.293285]  ? __warn+0xec/0x110
[    0.293487]  ? rcu_scheduler_starting+0x5c/0x70
[    0.293735]  ? kernel_init_freeable+0x58/0x19a
[    0.293976]  ? rest_init+0x80/0x80
[    0.294153]  ? kernel_init+0xa/0x100
[    0.294334]  ? ret_from_fork+0x22/0x30
[    0.294525] ---[ end trace 4c0fe009ed4dc740 ]---

TBH, I like Rafael's suggestion in the other mail to stick with fixing
this in ACPI, especially this is an ACPI problem, not RCU. Well,
more or less: RCU could be taught to *not* do schedule_work() if
workqueue_init() hasn't happened yet but that's a tangential.

So, I'm going to bed. When I wake up, I want to see working fixes!

:-)))

Thanks dudes!

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney Jan. 10, 2017, 12:21 a.m. | #11
On Tue, Jan 10, 2017 at 12:42:47AM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 10, 2017 at 12:32 AM, Paul E. McKenney

> <paulmck@linux.vnet.ibm.com> wrote:

> > On Tue, Jan 10, 2017 at 12:15:01AM +0100, Borislav Petkov wrote:

> >> On Mon, Jan 09, 2017 at 02:18:31PM -0800, Paul E. McKenney wrote:

> >> > @@ -690,6 +690,8 @@ void synchronize_rcu_expedited(void)

> >> >  {

> >> >     struct rcu_state *rsp = rcu_state_p;

> >> >

> >> > +   if (!rcu_scheduler_active)

> >> > +           return;

> >> >     _synchronize_rcu_expedited(rsp, sync_rcu_exp_handler);

> >> >  }

> >> >  EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);

> >>

> >> That doesn't work and it is because of those damn what goes before what

> >> boot sequence issues :-\

> >>

> >> We have:

> >>

> >> rest_init()

> >> |-> rcu_scheduler_starting()  ---> that sets rcu_scheduler_active = 1;

> >> |-> kernel_thread(kernel_init, NULL, CLONE_FS);

> >> |-> kernel_init()

> >> |-> kernel_init_freeable()

> >> |-> native_smp_prepare_cpus(setup_max_cpus)

> >> |-> default_setup_apic_routing

> >> |-> enable_IR_x2apic

> >> |-> irq_remapping_prepare

> >> |-> amd_iommu_prepare

> >> |-> iommu_go_to_state

> >> |-> acpi_put_table(ivrs_base);

> >> |-> acpi_tb_put_table(table_desc);

> >> |-> acpi_tb_invalidate_table(table_desc);

> >> |-> acpi_tb_release_table(...)

> >> |-> acpi_os_unmap_memory

> >> |-> acpi_os_unmap_iomem

> >> |-> acpi_os_map_cleanup

> >> |-> synchronize_rcu_expedited()

> >>

> >> Now here we have rcu_scheduler_active already set so the test doesn't

> >> hit and we hang.

> >>

> >> So we must do it differently.

> >

> > Yeah, there is a window just as the scheduler is starting where things don't

> > work.

> >

> > We could move rcu_scheduler_starting() later, as long as there

> > is no chance of preemption or context switch before it is invoked.

> > Would that help in this case, or are we already context switching before

> > acpi_os_map_cleanup() is invoked?

> 

> In the particular AMD IOMMU case it doesn't look like we are, but we

> do in other cases.

> 

> > (If we are already context switching,

> > short-circuiting synchronize_rcu_expedited() would be a bug.)

> 

> It may be easier to make the caller avoid RCU synchronization

> altogether if that's not necessary and the caller should actually be

> able to figure out when that's the case.

> 

> The patch from Lv at https://patchwork.kernel.org/patch/9504277/ goes

> in the right direction IMO, but I'm not yet convinced that this is the

> right one.


From the RCU end, I could force expedited grace periods to translate to
normal grace periods during that window of time, and then make sure that
RCU's grace-period kthreads are spawned beforehand.  Looking into this...

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng, Lv Jan. 10, 2017, 12:44 a.m. | #12
Hi,

Can the attached patch makes something different?

Thanks and best regards
Lv

> From: Borislav Petkov [mailto:bp@alien8.de]

> Subject: Re: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and

> early_acpi_os_unmap_memory() from Linux kernel

> 

> On Tue, Jan 10, 2017 at 12:40:39AM +0100, Borislav Petkov wrote:

> > Lemme run it.

> 

> Well, it boots but I get:

> 

> [    0.291447] ------------[ cut here ]------------

> [    0.291702] WARNING: CPU: 0 PID: 1 at kernel/rcu/tree.c:3993 rcu_scheduler_starting+0x5c/0x70

> [    0.292107] Modules linked in:

> [    0.292277] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc3+ #21

> [    0.292540] Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.08 01/28/2016

> [    0.292893] Call Trace:

> [    0.293072]  ? dump_stack+0x46/0x63

> [    0.293285]  ? __warn+0xec/0x110

> [    0.293487]  ? rcu_scheduler_starting+0x5c/0x70

> [    0.293735]  ? kernel_init_freeable+0x58/0x19a

> [    0.293976]  ? rest_init+0x80/0x80

> [    0.294153]  ? kernel_init+0xa/0x100

> [    0.294334]  ? ret_from_fork+0x22/0x30

> [    0.294525] ---[ end trace 4c0fe009ed4dc740 ]---

> 

> TBH, I like Rafael's suggestion in the other mail to stick with fixing

> this in ACPI, especially this is an ACPI problem, not RCU. Well,

> more or less: RCU could be taught to *not* do schedule_work() if

> workqueue_init() hasn't happened yet but that's a tangential.

> 

> So, I'm going to bed. When I wake up, I want to see working fixes!

> 

> :-)))

> 

> Thanks dudes!

> 

> --

> Regards/Gruss,

>     Boris.

> 

> Good mailing practices for 400: avoid top-posting and trim the reply.
Zheng, Lv Jan. 10, 2017, 5:41 a.m. | #13
Hi, Rafael and Paul

> From: Paul E. McKenney [mailto:paulmck@linux.vnet.ibm.com]

> Subject: Re: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and

> early_acpi_os_unmap_memory() from Linux kernel

> 

> On Tue, Jan 10, 2017 at 02:27:16AM +0100, Rafael J. Wysocki wrote:

> > On Tue, Jan 10, 2017 at 12:52 AM, Borislav Petkov <bp@alien8.de> wrote:

> > > On Tue, Jan 10, 2017 at 12:40:39AM +0100, Borislav Petkov wrote:

> > >> Lemme run it.

> > >

> > > Well, it boots but I get:

> > >

> > > [    0.291447] ------------[ cut here ]------------

> > > [    0.291702] WARNING: CPU: 0 PID: 1 at kernel/rcu/tree.c:3993 rcu_scheduler_starting+0x5c/0x70

> > > [    0.292107] Modules linked in:

> > > [    0.292277] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc3+ #21

> > > [    0.292540] Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.08 01/28/2016

> > > [    0.292893] Call Trace:

> > > [    0.293072]  ? dump_stack+0x46/0x63

> > > [    0.293285]  ? __warn+0xec/0x110

> > > [    0.293487]  ? rcu_scheduler_starting+0x5c/0x70

> > > [    0.293735]  ? kernel_init_freeable+0x58/0x19a

> > > [    0.293976]  ? rest_init+0x80/0x80

> > > [    0.294153]  ? kernel_init+0xa/0x100

> > > [    0.294334]  ? ret_from_fork+0x22/0x30

> > > [    0.294525] ---[ end trace 4c0fe009ed4dc740 ]---

> > >

> > > TBH, I like Rafael's suggestion in the other mail to stick with fixing

> > > this in ACPI, especially this is an ACPI problem, not RCU. Well,

> > > more or less: RCU could be taught to *not* do schedule_work() if

> > > workqueue_init() hasn't happened yet but that's a tangential.

> > >

> > > So, I'm going to bed. When I wake up, I want to see working fixes!

> > >

> > > :-)))

> >

> > Well, if the https://patchwork.kernel.org/patch/9504277/ patch from Lv

> > worked, the attached one should work too (please test), but it can be

> > justified in a slightly more convincing way.

> >

> > Namely, the idea is that acpi_os_read/write_memory() should never be

> > used before invoking acpi_os_initialize() and since those are the only

> > places where the list of memory regions is walked under RCU without

> > extra locking, it is safe to skip the RCU synchronization until that

> > happens.

> >

> > Thanks,

> > Rafael

> 

> Makes sense to me!


Also looks good to me.

> 

> It looks like I can make the grace-period-free boot-time window

> for CONFIG_PREEMPT=y kernels quite a bit narrower, but it does not

> look like something suitable for jamming into 4.10.


OK, we can have this fixed in ACPI layer first.

Thanks
Lv

> 

> 							Thanx, Paul

> 

> > ---

> >  drivers/acpi/osl.c |    8 +++++++-

> >  1 file changed, 7 insertions(+), 1 deletion(-)

> >

> > Index: linux-pm/drivers/acpi/osl.c

> > ===================================================================

> > --- linux-pm.orig/drivers/acpi/osl.c

> > +++ linux-pm/drivers/acpi/osl.c

> > @@ -378,7 +378,9 @@ static void acpi_os_drop_map_ref(struct

> >  static void acpi_os_map_cleanup(struct acpi_ioremap *map)

> >  {

> >  	if (!map->refcount) {

> > -		synchronize_rcu_expedited();

> > +		if (acpi_os_initialized)

> > +			synchronize_rcu_expedited();

> > +

> >  		acpi_unmap(map->phys, map->virt);

> >  		kfree(map);

> >  	}

> > @@ -671,6 +673,8 @@ acpi_os_read_memory(acpi_physical_addres

> >  	bool unmap = false;

> >  	u64 dummy;

> >

> > +	WARN_ON_ONCE(!acpi_os_initialized);

> > +

> >  	rcu_read_lock();

> >  	virt_addr = acpi_map_vaddr_lookup(phys_addr, size);

> >  	if (!virt_addr) {

> > @@ -716,6 +720,8 @@ acpi_os_write_memory(acpi_physical_addre

> >  	unsigned int size = width / 8;

> >  	bool unmap = false;

> >

> > +	WARN_ON_ONCE(!acpi_os_initialized);

> > +

> >  	rcu_read_lock();

> >  	virt_addr = acpi_map_vaddr_lookup(phys_addr, size);

> >  	if (!virt_addr) {


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney Jan. 10, 2017, 5:51 a.m. | #14
On Tue, Jan 10, 2017 at 05:41:45AM +0000, Zheng, Lv wrote:
> Hi, Rafael and Paul

> 

> > From: Paul E. McKenney [mailto:paulmck@linux.vnet.ibm.com]

> > Subject: Re: 174cc7187e6f ACPICA: Tables: Back port acpi_get_table_with_size() and

> > early_acpi_os_unmap_memory() from Linux kernel

> > 

> > On Tue, Jan 10, 2017 at 02:27:16AM +0100, Rafael J. Wysocki wrote:

> > > On Tue, Jan 10, 2017 at 12:52 AM, Borislav Petkov <bp@alien8.de> wrote:

> > > > On Tue, Jan 10, 2017 at 12:40:39AM +0100, Borislav Petkov wrote:

> > > >> Lemme run it.

> > > >

> > > > Well, it boots but I get:

> > > >

> > > > [    0.291447] ------------[ cut here ]------------

> > > > [    0.291702] WARNING: CPU: 0 PID: 1 at kernel/rcu/tree.c:3993 rcu_scheduler_starting+0x5c/0x70

> > > > [    0.292107] Modules linked in:

> > > > [    0.292277] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc3+ #21

> > > > [    0.292540] Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.08 01/28/2016

> > > > [    0.292893] Call Trace:

> > > > [    0.293072]  ? dump_stack+0x46/0x63

> > > > [    0.293285]  ? __warn+0xec/0x110

> > > > [    0.293487]  ? rcu_scheduler_starting+0x5c/0x70

> > > > [    0.293735]  ? kernel_init_freeable+0x58/0x19a

> > > > [    0.293976]  ? rest_init+0x80/0x80

> > > > [    0.294153]  ? kernel_init+0xa/0x100

> > > > [    0.294334]  ? ret_from_fork+0x22/0x30

> > > > [    0.294525] ---[ end trace 4c0fe009ed4dc740 ]---

> > > >

> > > > TBH, I like Rafael's suggestion in the other mail to stick with fixing

> > > > this in ACPI, especially this is an ACPI problem, not RCU. Well,

> > > > more or less: RCU could be taught to *not* do schedule_work() if

> > > > workqueue_init() hasn't happened yet but that's a tangential.

> > > >

> > > > So, I'm going to bed. When I wake up, I want to see working fixes!

> > > >

> > > > :-)))

> > >

> > > Well, if the https://patchwork.kernel.org/patch/9504277/ patch from Lv

> > > worked, the attached one should work too (please test), but it can be

> > > justified in a slightly more convincing way.

> > >

> > > Namely, the idea is that acpi_os_read/write_memory() should never be

> > > used before invoking acpi_os_initialize() and since those are the only

> > > places where the list of memory regions is walked under RCU without

> > > extra locking, it is safe to skip the RCU synchronization until that

> > > happens.

> > >

> > > Thanks,

> > > Rafael

> > 

> > Makes sense to me!

> 

> Also looks good to me.

> 

> > 

> > It looks like I can make the grace-period-free boot-time window

> > for CONFIG_PREEMPT=y kernels quite a bit narrower, but it does not

> > look like something suitable for jamming into 4.10.

> 

> OK, we can have this fixed in ACPI layer first.


Definitely.

I have the RCU changes written in ink on paper and they definitely are
-not- something that goes into 4.10.  4.11 at the earliest, and if no
one asks for it in 4.11, it goes into 4.12.  Serious testing needed.  ;-)

							Thanx, Paul

> Thanks

> Lv

> 

> > 

> > 							Thanx, Paul

> > 

> > > ---

> > >  drivers/acpi/osl.c |    8 +++++++-

> > >  1 file changed, 7 insertions(+), 1 deletion(-)

> > >

> > > Index: linux-pm/drivers/acpi/osl.c

> > > ===================================================================

> > > --- linux-pm.orig/drivers/acpi/osl.c

> > > +++ linux-pm/drivers/acpi/osl.c

> > > @@ -378,7 +378,9 @@ static void acpi_os_drop_map_ref(struct

> > >  static void acpi_os_map_cleanup(struct acpi_ioremap *map)

> > >  {

> > >  	if (!map->refcount) {

> > > -		synchronize_rcu_expedited();

> > > +		if (acpi_os_initialized)

> > > +			synchronize_rcu_expedited();

> > > +

> > >  		acpi_unmap(map->phys, map->virt);

> > >  		kfree(map);

> > >  	}

> > > @@ -671,6 +673,8 @@ acpi_os_read_memory(acpi_physical_addres

> > >  	bool unmap = false;

> > >  	u64 dummy;

> > >

> > > +	WARN_ON_ONCE(!acpi_os_initialized);

> > > +

> > >  	rcu_read_lock();

> > >  	virt_addr = acpi_map_vaddr_lookup(phys_addr, size);

> > >  	if (!virt_addr) {

> > > @@ -716,6 +720,8 @@ acpi_os_write_memory(acpi_physical_addre

> > >  	unsigned int size = width / 8;

> > >  	bool unmap = false;

> > >

> > > +	WARN_ON_ONCE(!acpi_os_initialized);

> > > +

> > >  	rcu_read_lock();

> > >  	virt_addr = acpi_map_vaddr_lookup(phys_addr, size);

> > >  	if (!virt_addr) {

> 


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Jan. 11, 2017, 9:21 a.m. | #15
On Mon, Jan 09, 2017 at 09:51:29PM -0800, Paul E. McKenney wrote:
> Definitely.


Btw, we have more breakage from RCU expedited using workqueues:
https://bugzilla.kernel.org/show_bug.cgi?id=192111

I've added you to CC but let me have other bug reporters confirm reverting

  8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")

does fix the issue for them too.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney Jan. 11, 2017, 9:51 a.m. | #16
On Wed, Jan 11, 2017 at 10:21:06AM +0100, Borislav Petkov wrote:
> On Mon, Jan 09, 2017 at 09:51:29PM -0800, Paul E. McKenney wrote:

> > Definitely.

> 

> Btw, we have more breakage from RCU expedited using workqueues:

> https://bugzilla.kernel.org/show_bug.cgi?id=192111

> 

> I've added you to CC but let me have other bug reporters confirm reverting

> 

>   8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")

> 

> does fix the issue for them too.


Yes, you could make RCU expedited grace periods go back to using the
requesting task, and that would allow expedited grace periods to run early
in the boot process.  But that causes problems with signals and the like
unless you revert a few other patches.  The bugzilla is interesting --
it looks like ACPI was in some cases doing early-boot grace-period waits
some time back?

I have a limping prototype RCU patch that should avoid this problem.

If all goes well, I will send it out late tomorrow evening, Pacific Time.

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Jan. 11, 2017, 10:03 a.m. | #17
On Wed, Jan 11, 2017 at 01:51:56AM -0800, Paul E. McKenney wrote:
> Yes, you could make RCU expedited grace periods go back to using the

> requesting task, and that would allow expedited grace periods to run early

> in the boot process.  But that causes problems with signals and the like

> unless you revert a few other patches.  The bugzilla is interesting --

> it looks like ACPI was in some cases doing early-boot grace-period waits

> some time back?


I think this and https://bugzilla.suse.com/show_bug.cgi?id=1017783 is an
example of a bunch of toshiba schlaptops which cause the issue. So it
looks like ACPI is doing something very early on those which tickles the
issue to happen.

But this is ACPI - anything can happen!

> I have a limping prototype RCU patch that should avoid this problem.

>

> If all goes well, I will send it out late tomorrow evening, Pacific Time.


Attach it to the bugzilla too, pls, because the people there trigger the
issue.

I have the respective(?) SUSE bug and I can ask people there to run it
too.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney Jan. 11, 2017, 10:22 a.m. | #18
On Wed, Jan 11, 2017 at 11:03:23AM +0100, Borislav Petkov wrote:
> On Wed, Jan 11, 2017 at 01:51:56AM -0800, Paul E. McKenney wrote:

> > Yes, you could make RCU expedited grace periods go back to using the

> > requesting task, and that would allow expedited grace periods to run early

> > in the boot process.  But that causes problems with signals and the like

> > unless you revert a few other patches.  The bugzilla is interesting --

> > it looks like ACPI was in some cases doing early-boot grace-period waits

> > some time back?

> 

> I think this and https://bugzilla.suse.com/show_bug.cgi?id=1017783 is an

> example of a bunch of toshiba schlaptops which cause the issue. So it

> looks like ACPI is doing something very early on those which tickles the

> issue to happen.

> 

> But this is ACPI - anything can happen!


;-) ;-) ;-)

> > I have a limping prototype RCU patch that should avoid this problem.

> >

> > If all goes well, I will send it out late tomorrow evening, Pacific Time.

> 

> Attach it to the bugzilla too, pls, because the people there trigger the

> issue.

> 

> I have the respective(?) SUSE bug and I can ask people there to run it

> too.


That would be very good!  Thinking good thoughts for the ongoing tests...

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch hide | download patch | download mbox

---
 drivers/iommu/amd_iommu_init.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-pm/drivers/iommu/amd_iommu_init.c
===================================================================
--- linux-pm.orig/drivers/iommu/amd_iommu_init.c
+++ linux-pm/drivers/iommu/amd_iommu_init.c
@@ -2230,7 +2230,7 @@  static int __init early_amd_iommu_init(v
 	 */
 	ret = check_ivrs_checksum(ivrs_base);
 	if (ret)
-		return ret;
+		goto out;
 
 	amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);
 	DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);