Message ID | 1597850436-116171-18-git-send-email-john.garry@huawei.com |
---|---|
State | New |
Series | blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs |
On Wed, 2020-08-19 at 23:20 +0800, John Garry wrote:
> From: Kashyap Desai <kashyap.desai@broadcom.com>
>
> Fusion adapters can steer completions to individual queues, and
> we now have support for shared host-wide tags.
> So we can enable multiqueue support for fusion adapters.
>
> Once driver enable shared host-wide tags, cpu hotplug feature is also
> supported as it was enabled using below patchsets -
> commit bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
> offline")
>
> Currently driver has provision to disable host-wide tags using
> "host_tagset_enable" module parameter.
>
> Once we do not have any major performance regression using host-wide
> tags, we will drop the hand-crafted interrupt affinity settings.
>
> Performance is also meeting the expecatation - (used both none and
> mq-deadline scheduler)
> 24 Drive SSD on Aero with/without this patch can get 3.1M IOPs
> 3 VDs consist of 8 SAS SSD on Aero with/without this patch can get 3.1M
> IOPs.
>
> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> Signed-off-by: John Garry <john.garry@huawei.com>

Reverting this commit fixed an issue where a Dell PowerEdge R6415 server with
megaraid_sas is unable to boot.

c1:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
    DeviceName: Integrated RAID
    Subsystem: Dell PERC H730P Mini
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 48
    NUMA node: 3
    Region 0: I/O ports at c000 [size=256]
    Region 1: Memory at a5500000 (64-bit, non-prefetchable) [size=64K]
    Region 3: Memory at a5400000 (64-bit, non-prefetchable) [size=1M]
    Expansion ROM at <ignored> [disabled]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
        DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 512 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s <2us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s (ok), Width x8 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range BC, TimeoutDis+, NROPrPrP-, LTR-
            10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS-, TPHComp-, ExtTPHComp-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
            AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
            EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
    Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Address: 0000000000000000 Data: 0000
        Masking: 00000000 Pending: 00000000
    Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
        Vector table: BAR=1 offset=0000e000
        PBA: BAR=1 offset=0000f000
    Capabilities: [100 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 04000001 c000000f c1080000 4ba9007a
    Capabilities: [1e0 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
        LaneErrStat: 0
    Capabilities: [1c0 v1] Power Budgeting <?>
    Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 0
        ARICtl: MFVC- ACS-, Function Group: 0
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas

[ 26.330282][ T567] megasas: 07.714.04.00-rc1
[ 26.355663][ T611] ahci 0000:87:00.2: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
[ 26.364585][ T611] ahci 0000:87:00.2: flags: 64bit ncq sntf ilck pm led clo only pmp fbs pio slum part
[ 26.376125][ T289] megaraid_sas 0000:c1:00.0: FW now in Ready state
[ 26.382534][ T289] megaraid_sas 0000:c1:00.0: 63 bit DMA mask and 32 bit consistent mask
[ 26.391537][ T289] megaraid_sas 0000:c1:00.0: firmware supports msix : (96)
[ 26.431767][ T611] scsi host1: ahci
[ 26.492580][ T611] ata1: SATA max UDMA/133 abar m4096@0xc0a02000 port 0xc0a02100 irq 60
[ 26.701197][ T283] bnxt_en 0000:84:00.0 eth0: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet found at mem ad210000, node addr 4c:d9:8f:4a:20:e6
[ 26.714352][ T283] bnxt_en 0000:84:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 26.743738][ T24] tg3 0000:81:00.0 eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 4c:d9:8f:65:3f:32
[ 26.754974][ T24] tg3 0000:81:00.0 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[ 26.765523][ T24] tg3 0000:81:00.0 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[ 26.774074][ T24] tg3 0000:81:00.0 eth1: dma_rwctrl[00000001] dma_mask[64-bit]
[ 26.842518][ T620] ata1: SATA link down (SStatus 0 SControl 300)
[ 26.945741][ T289] megaraid_sas 0000:c1:00.0: requested/available msix 49/49
[ 26.952912][ T289] megaraid_sas 0000:c1:00.0: current msix/online cpus : (49/48)
[ 26.960401][ T289] megaraid_sas 0000:c1:00.0: RDPQ mode : (disabled)
[ 26.966876][ T289] megaraid_sas 0000:c1:00.0: Current firmware supports maximum commands: 928 LDIO threshold: 0
[ 27.079361][ T289] megaraid_sas 0000:c1:00.0: Performance mode :Latency (latency index = 1)
[ 27.085381][ T283] bnxt_en 0000:84:00.1 eth2: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet found at mem ad200000, node addr 4c:d9:8f:4a:20:e7
[ 27.087824][ T289] megaraid_sas 0000:c1:00.0: FW supports sync cache : No
[ 27.100959][ T283] bnxt_en 0000:84:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 27.107835][ T289] megaraid_sas 0000:c1:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[ 27.130978][ T24] tg3 0000:81:00.1 eth3: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 4c:d9:8f:65:3f:33
[ 27.142919][ T24] tg3 0000:81:00.1 eth3: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[ 27.146042][ T571] bnxt_en 0000:84:00.1 enp132s0f1np1: renamed from eth2
[ 27.153456][ T24] tg3 0000:81:00.1 eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[ 27.153467][ T24] tg3 0000:81:00.1 eth3: dma_rwctrl[00000001] dma_mask[64-bit]
[ 27.200900][ T289] megaraid_sas 0000:c1:00.0: FW provided supportMaxExtLDs: 1 max_lds: 64
[ 27.209174][ T289] megaraid_sas 0000:c1:00.0: controller type : MR(2048MB)
[ 27.216260][ T289] megaraid_sas 0000:c1:00.0: Online Controller Reset(OCR) : Enabled
[ 27.224105][ T289] megaraid_sas 0000:c1:00.0: Secure JBOD support : No
[ 27.230720][ T289] megaraid_sas 0000:c1:00.0: NVMe passthru support : No
[ 27.237527][ T289] megaraid_sas 0000:c1:00.0: FW provided TM TaskAbort/Reset timeout : 0 secs/0 secs
[ 27.246754][ T289] megaraid_sas 0000:c1:00.0: JBOD sequence map support : No
[ 27.253906][ T289] megaraid_sas 0000:c1:00.0: PCI Lane Margining support : No
[ 27.341447][ T289] megaraid_sas 0000:c1:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
[ 27.351729][ T289] megaraid_sas 0000:c1:00.0: INIT adapter done
[ 27.357742][ T289] megaraid_sas 0000:c1:00.0: JBOD sequence map is disabled megasas_setup_jbod_map 5709
[ 27.367832][ T289] megaraid_sas 0000:c1:00.0: pci id : (0x1000)/(0x005d)/(0x1028)/(0x1f47)
[ 27.376287][ T289] megaraid_sas 0000:c1:00.0: unevenspan support : yes
[ 27.382925][ T289] megaraid_sas 0000:c1:00.0: firmware crash dump : no
[ 27.389547][ T289] megaraid_sas 0000:c1:00.0: JBOD sequence map : disabled
[ 27.397816][ T289] megaraid_sas 0000:c1:00.0: Max firmware commands: 927 shared with nr_hw_queues = 48
[ 27.407232][ T289] scsi host0: Avago SAS based MegaRAID driver
[ 27.430212][ T586] bnxt_en 0000:84:00.0 enp132s0f0np0: renamed from eth0
[ 27.781038][ T603] tg3 0000:81:00.0 eno1: renamed from eth1
[ 28.194046][ T552] tg3 0000:81:00.1 eno2: renamed from eth3
[ 251.961152][ T330] INFO: task systemd-udevd:567 blocked for more than 122 seconds.
[ 251.968876][ T330] Not tainted 5.10.0-rc1-next-20201102 #1
[ 251.975003][ T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 251.983546][ T330] task:systemd-udevd state:D stack:27224 pid: 567 ppid: 506 flags:0x00004324
[ 251.992620][ T330] Call Trace:
[ 251.995784][ T330] __schedule+0x71d/0x1b60
[ 252.000067][ T330] ? __sched_text_start+0x8/0x8
[ 252.004798][ T330] schedule+0xbf/0x270
[ 252.008735][ T330] schedule_timeout+0x3fc/0x590
[ 252.013464][ T330] ? usleep_range+0x120/0x120
[ 252.018008][ T330] ? wait_for_completion+0x156/0x250
[ 252.023176][ T330] ? lock_downgrade+0x700/0x700
[ 252.027886][ T330] ? rcu_read_unlock+0x40/0x40
[ 252.032530][ T330] ? do_raw_spin_lock+0x121/0x290
[ 252.037412][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
[ 252.043268][ T330] ? _raw_spin_unlock_irq+0x1f/0x30
[ 252.048331][ T330] wait_for_completion+0x15e/0x250
[ 252.053323][ T330] ? wait_for_completion_interruptible+0x320/0x320
[ 252.059687][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
[ 252.065543][ T330] ? _raw_spin_unlock_irq+0x1f/0x30
[ 252.070606][ T330] __flush_work+0x42a/0x900
[ 252.074989][ T330] ? queue_delayed_work_on+0x90/0x90
[ 252.080139][ T330] ? __queue_work+0x463/0xf40
[ 252.084700][ T330] ? init_pwq+0x320/0x320
[ 252.088891][ T330] ? queue_work_on+0x5e/0x80
[ 252.093364][ T330] ? trace_hardirqs_on+0x1c/0x150
[ 252.098255][ T330] work_on_cpu+0xe7/0x130
[ 252.102461][ T330] ? flush_delayed_work+0xc0/0xc0
[ 252.107342][ T330] ? __mutex_unlock_slowpath+0xd4/0x670
[ 252.112764][ T330] ? work_debug_hint+0x30/0x30
[ 252.117391][ T330] ? pci_device_shutdown+0x80/0x80
[ 252.122378][ T330] ? cpumask_next_and+0x57/0x80
[ 252.127094][ T330] pci_device_probe+0x500/0x5c0
[ 252.131824][ T330] ? pci_device_remove+0x1f0/0x1f0
[ 252.136805][ T330] really_probe+0x207/0xad0
[ 252.141191][ T330] ? device_driver_attach+0x120/0x120
[ 252.146428][ T330] driver_probe_device+0x1f1/0x370
[ 252.151424][ T330] device_driver_attach+0xe5/0x120
[ 252.156399][ T330] __driver_attach+0xf0/0x260
[ 252.160953][ T330] bus_for_each_dev+0x117/0x1a0
[ 252.165669][ T330] ? subsys_dev_iter_exit+0x10/0x10
[ 252.170731][ T330] bus_add_driver+0x399/0x560
[ 252.175289][ T330] driver_register+0x189/0x310
[ 252.179919][ T330] ? 0xffffffffc05c1000
[ 252.183960][ T330] megasas_init+0x117/0x1000 [megaraid_sas]
[ 252.189713][ T330] do_one_initcall+0xf6/0x510
[ 252.194267][ T330] ? perf_trace_initcall_level+0x490/0x490
[ 252.199940][ T330] ? kasan_unpoison_shadow+0x30/0x40
[ 252.205104][ T330] ? __kasan_kmalloc.constprop.11+0xc1/0xd0
[ 252.210859][ T330] ? do_init_module+0x49/0x6c0
[ 252.215500][ T330] ? kmem_cache_alloc_trace+0x11f/0x1e0
[ 252.220925][ T330] ? kasan_unpoison_shadow+0x30/0x40
[ 252.226068][ T330] do_init_module+0x1ed/0x6c0
[ 252.230608][ T330] load_module+0x4a59/0x5d20
[ 252.235081][ T330] ? layout_and_allocate+0x2770/0x2770
[ 252.240404][ T330] ? __vmalloc_node+0x8d/0x100
[ 252.245046][ T330] ? kernel_read_file+0x485/0x5a0
[ 252.249934][ T330] ? kernel_read_file+0x305/0x5a0
[ 252.254839][ T330] ? __x64_sys_fsconfig+0x970/0x970
[ 252.259903][ T330] ? __do_sys_finit_module+0xff/0x180
[ 252.265153][ T330] __do_sys_finit_module+0xff/0x180
[ 252.270216][ T330] ? __do_sys_init_module+0x1d0/0x1d0
[ 252.275465][ T330] ? __fget_files+0x1c3/0x2e0
[ 252.280010][ T330] do_syscall_64+0x33/0x40
[ 252.284304][ T330] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 252.290054][ T330] RIP: 0033:0x7fbb3e2fa78d
[ 252.294348][ T330] Code: Unable to access opcode bytes at RIP 0x7fbb3e2fa763.
[ 252.301584][ T330] RSP: 002b:00007ffe572e8d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 252.309855][ T330] RAX: ffffffffffffffda RBX: 000055c7795d90f0 RCX: 00007fbb3e2fa78d
[ 252.317703][ T330] RDX: 0000000000000000 RSI: 00007fbb3ee6c82d RDI: 0000000000000006
[ 252.325553][ T330] RBP: 00007fbb3ee6c82d R08: 0000000000000000 R09: 00007ffe572e8e40
[ 252.333402][ T330] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[ 252.341257][ T330] R13: 000055c7795930e0 R14: 0000000000020000 R15: 0000000000000000
[ 252.349117][ T330]
[ 252.349117][ T330] Showing all locks held in the system:
[ 252.356770][ T330] 3 locks held by kworker/3:1/289:
[ 252.361759][ T330] #0: ffff8881001eb938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ec/0x1610
[ 252.371976][ T330] #1: ffffc90004ee7e00 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x820/0x1610
[ 252.382803][ T330] #2: ffff8881430380e0 (&shost->scan_mutex){+.+.}-{3:3}, at: scsi_scan_host_selected+0xde/0x260
[ 252.393199][ T330] 1 lock held by khungtaskd/330:
[ 252.397993][ T330] #0: ffffffff9d4d3760 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.52+0x0/0x30
[ 252.408296][ T330] 1 lock held by systemd-journal/420:
[ 252.413562][ T330] 1 lock held by systemd-udevd/567:
[ 252.418619][ T330] #0: ffff8881207ac218 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x37/0x120
[ 252.428159][ T330]
[ 252.430355][ T330] =============================================
[ 252.430355][ T330]

> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 39 +++++++++++++++++++++
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 29 ++++++++-------
>  2 files changed, 55 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
> b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 861f7140f52e..6960922d0d7f 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -37,6 +37,7 @@
>  #include <linux/poll.h>
>  #include <linux/vmalloc.h>
>  #include <linux/irq_poll.h>
> +#include <linux/blk-mq-pci.h>
>
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_cmnd.h>
> @@ -113,6 +114,10 @@ unsigned int enable_sdev_max_qd;
>  module_param(enable_sdev_max_qd, int, 0444);
>  MODULE_PARM_DESC(enable_sdev_max_qd, "Enable sdev max qd as can_queue.
> Default: 0");
>
> +int host_tagset_enable = 1;
> +module_param(host_tagset_enable, int, 0444);
> +MODULE_PARM_DESC(host_tagset_enable, "Shared host tagset enable/disable
> Default: enable(1)");
> +
>  MODULE_LICENSE("GPL");
>  MODULE_VERSION(MEGASAS_VERSION);
>  MODULE_AUTHOR("megaraidlinux.pdl@broadcom.com");
> @@ -3119,6 +3124,19 @@ megasas_bios_param(struct scsi_device *sdev, struct
> block_device *bdev,
>  	return 0;
>  }
>
> +static int megasas_map_queues(struct Scsi_Host *shost)
> +{
> +	struct megasas_instance *instance;
> +
> +	instance = (struct megasas_instance *)shost->hostdata;
> +
> +	if (shost->nr_hw_queues == 1)
> +		return 0;
> +
> +	return blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
> +			instance->pdev, instance->low_latency_index_start);
> +}
> +
>  static void megasas_aen_polling(struct work_struct *work);
>
>  /**
> @@ -3427,6 +3445,7 @@ static struct scsi_host_template megasas_template = {
>  	.eh_timed_out = megasas_reset_timer,
>  	.shost_attrs = megaraid_host_attrs,
>  	.bios_param = megasas_bios_param,
> +	.map_queues = megasas_map_queues,
>  	.change_queue_depth = scsi_change_queue_depth,
>  	.max_segment_size = 0xffffffff,
>  };
> @@ -6808,6 +6827,26 @@ static int megasas_io_attach(struct megasas_instance
> *instance)
>  	host->max_lun = MEGASAS_MAX_LUN;
>  	host->max_cmd_len = 16;
>
> +	/* Use shared host tagset only for fusion adaptors
> +	 * if there are managed interrupts (smp affinity enabled case).
> +	 * Single msix_vectors in kdump, so shared host tag is also disabled.
> +	 */
> +
> +	host->host_tagset = 0;
> +	host->nr_hw_queues = 1;
> +
> +	if ((instance->adapter_type != MFI_SERIES) &&
> +		(instance->msix_vectors > instance->low_latency_index_start) &&
> +		host_tagset_enable &&
> +		instance->smp_affinity_enable) {
> +		host->host_tagset = 1;
> +		host->nr_hw_queues = instance->msix_vectors -
> +			instance->low_latency_index_start;
> +	}
> +
> +	dev_info(&instance->pdev->dev,
> +		"Max firmware commands: %d shared with nr_hw_queues = %d\n",
> +		instance->max_fw_cmds, host->nr_hw_queues);
>  	/*
>  	 * Notify the mid-layer about the new controller
>  	 */
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index 0824410f78f8..a4251121f173 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -359,24 +359,29 @@ megasas_get_msix_index(struct megasas_instance
> *instance,
>  {
>  	int sdev_busy;
>
> -	/* nr_hw_queue = 1 for MegaRAID */
> -	struct blk_mq_hw_ctx *hctx =
> -		scmd->device->request_queue->queue_hw_ctx[0];
> -
> -	sdev_busy = atomic_read(&hctx->nr_active);
> +	/* TBD - if sml remove device_busy in future, driver
> +	 * should track counter in internal structure.
> +	 */
> +	sdev_busy = atomic_read(&scmd->device->device_busy);
>
>  	if (instance->perf_mode == MR_BALANCED_PERF_MODE &&
> -	    sdev_busy > (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH))
> +	    sdev_busy > (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH)) {
>  		cmd->request_desc->SCSIIO.MSIxIndex =
>  			mega_mod64((atomic64_add_return(1, &instance-
> >high_iops_outstanding) /
>  				MR_HIGH_IOPS_BATCH_COUNT), instance-
> >low_latency_index_start);
> -	else if (instance->msix_load_balance)
> +	} else if (instance->msix_load_balance) {
>  		cmd->request_desc->SCSIIO.MSIxIndex =
>  			(mega_mod64(atomic64_add_return(1, &instance-
> >total_io_count),
>  				instance->msix_vectors));
> -	else
> +	} else if (instance->host->nr_hw_queues > 1) {
> +		u32 tag = blk_mq_unique_tag(scmd->request);
> +
> +		cmd->request_desc->SCSIIO.MSIxIndex =
> blk_mq_unique_tag_to_hwq(tag) +
> +			instance->low_latency_index_start;
> +	} else {
>  		cmd->request_desc->SCSIIO.MSIxIndex =
>  			instance->reply_map[raw_smp_processor_id()];
> +	}
>  }
>
>  /**
> @@ -956,9 +961,6 @@ megasas_alloc_cmds_fusion(struct megasas_instance
> *instance)
>  	if (megasas_alloc_cmdlist_fusion(instance))
>  		goto fail_exit;
>
> -	dev_info(&instance->pdev->dev, "Configured max firmware commands: %d\n",
> -		 instance->max_fw_cmds);
> -
>  	/* The first 256 bytes (SMID 0) is not used. Don't add to the cmd list
> */
>  	io_req_base = fusion->io_request_frames +
> MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE;
>  	io_req_base_phys = fusion->io_request_frames_phys +
> MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE;
> @@ -1102,8 +1104,9 @@ megasas_ioc_init_fusion(struct megasas_instance
> *instance)
>  			MR_HIGH_IOPS_QUEUE_COUNT) && cur_intr_coalescing)
>  		instance->perf_mode = MR_BALANCED_PERF_MODE;
>
> -	dev_info(&instance->pdev->dev, "Performance mode :%s\n",
> -		MEGASAS_PERF_MODE_2STR(instance->perf_mode));
> +	dev_info(&instance->pdev->dev, "Performance mode :%s (latency index =
> %d)\n",
> +		MEGASAS_PERF_MODE_2STR(instance->perf_mode),
> +		instance->low_latency_index_start);
>
>  	instance->fw_sync_cache_support = (scratch_pad_1 &
>  		MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0;
> On Wed, 2020-08-19 at 23:20 +0800, John Garry wrote:
> > From: Kashyap Desai <kashyap.desai@broadcom.com>
> >
> > Fusion adapters can steer completions to individual queues, and we now
> > have support for shared host-wide tags.
> > So we can enable multiqueue support for fusion adapters.
> >
> > Once driver enable shared host-wide tags, cpu hotplug feature is also
> > supported as it was enabled using below patchsets - commit
> > bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
> > offline")
> >
> > Currently driver has provision to disable host-wide tags using
> > "host_tagset_enable" module parameter.
> >
> > Once we do not have any major performance regression using host-wide
> > tags, we will drop the hand-crafted interrupt affinity settings.
> >
> > Performance is also meeting the expecatation - (used both none and
> > mq-deadline scheduler)
> > 24 Drive SSD on Aero with/without this patch can get 3.1M IOPs
> > 3 VDs consist of 8 SAS SSD on Aero with/without this patch can get
> > 3.1M IOPs.
> >
> > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
> > Signed-off-by: Hannes Reinecke <hare@suse.com>
> > Signed-off-by: John Garry <john.garry@huawei.com>
>
> Reverting this commit fixed an issue that Dell Power Edge R6415 server
> with
> megaraid_sas is unable to boot.

I will take a look at this. BTW, can you try keeping the same patch but using
the module parameter "host_tagset_enable=0"?

Kashyap

On 02/11/2020 14:17, Qian Cai wrote:
> [ 251.961152][ T330] INFO: task systemd-udevd:567 blocked for more than 122 seconds.
> [ 251.968876][ T330] Not tainted 5.10.0-rc1-next-20201102 #1
> [ 251.975003][ T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 251.983546][ T330] task:systemd-udevd state:D stack:27224 pid: 567 ppid: 506 flags:0x00004324
> [ 251.992620][ T330] Call Trace:
> [ 251.995784][ T330] __schedule+0x71d/0x1b60
> [ 252.000067][ T330] ? __sched_text_start+0x8/0x8
> [ 252.004798][ T330] schedule+0xbf/0x270
> [ 252.008735][ T330] schedule_timeout+0x3fc/0x590
> [ 252.013464][ T330] ? usleep_range+0x120/0x120
> [ 252.018008][ T330] ? wait_for_completion+0x156/0x250
> [ 252.023176][ T330] ? lock_downgrade+0x700/0x700
> [ 252.027886][ T330] ? rcu_read_unlock+0x40/0x40
> [ 252.032530][ T330] ? do_raw_spin_lock+0x121/0x290
> [ 252.037412][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
> [ 252.043268][ T330] ? _raw_spin_unlock_irq+0x1f/0x30
> [ 252.048331][ T330] wait_for_completion+0x15e/0x250
> [ 252.053323][ T330] ? wait_for_completion_interruptible+0x320/0x320
> [ 252.059687][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
> [ 252.065543][ T330] ? _raw_spin_unlock_irq+0x1f/0x30
> [ 252.070606][ T330] __flush_work+0x42a/0x900
> [ 252.074989][ T330] ? queue_delayed_work_on+0x90/0x90
> [ 252.080139][ T330] ? __queue_work+0x463/0xf40
> [ 252.084700][ T330] ? init_pwq+0x320/0x320
> [ 252.088891][ T330] ? queue_work_on+0x5e/0x80
> [ 252.093364][ T330] ? trace_hardirqs_on+0x1c/0x150
> [ 252.098255][ T330] work_on_cpu+0xe7/0x130
> [ 252.102461][ T330] ? flush_delayed_work+0xc0/0xc0
> [ 252.107342][ T330] ? __mutex_unlock_slowpath+0xd4/0x670
> [ 252.112764][ T330] ? work_debug_hint+0x30/0x30
> [ 252.117391][ T330] ? pci_device_shutdown+0x80/0x80
> [ 252.122378][ T330] ? cpumask_next_and+0x57/0x80
> [ 252.127094][ T330] pci_device_probe+0x500/0x5c0
> [ 252.131824][ T330] ? pci_device_remove+0x1f0/0x1f0

Is CONFIG_DEBUG_TEST_DRIVER_REMOVE enabled? I figure it is, with this call.

Or please share the .config

Cheers,
John

> [ 252.136805][ T330] really_probe+0x207/0xad0
> [ 252.141191][ T330] ? device_driver_attach+0x120/0x120
> [ 252.146428][ T330] driver_probe_device+0x1f1/0x370
> [ 252.151424][ T330] device_driver_attach+0xe5/0x120
> [ 252.156399][ T330] __driver_attach+0xf0/0x260
> [ 252.160953][ T330] bus_for_each_dev+0x117/0x1a0
> [ 252.165669][ T330] ? subsys_dev_iter_exit+0x10/0x10
> [ 252.170731][ T330] bus_add_driver+0x399/0x560
> [ 252.175289][ T330] driver_register+0x189/0x310
> [ 252.179919][ T330] ? 0xffffffffc05c1000
> [ 252.183960][ T330] megasas_init+0x117/0x1000 [megaraid_sas]
> [ 252.189713][ T330] do_one_initcall+0xf6/0x510
> [ 252.194267][ T330] ? perf_trace_initcall_level+0x490/0x490
> [ 252.199940][ T330] ? kasan_unpoison_shadow+0x30/0x40
> [ 252.205104][ T330] ? __kasan_kmalloc.constprop.11+0xc1/0xd0
> [ 252.210859][ T330] ? do_init_module+0x49/0x6c0
> [ 252.215500][ T330] ? kmem_cache_alloc_trace+0x11f/0x1e0
> [ 252.220925][ T330] ? kasan_unpoison_shadow+0x30/0x40
> [ 252.226068][ T330] do_init_module+0x1ed/0x6c0
> [ 252.230608][ T330] load_module+0x4a59/0x5d20
> [ 252.235081][ T330] ? layout_and_allocate+0x2770/0x2770
> [ 252.240404][ T330] ? __vmalloc_node+0x8d/0x100
> [ 252.245046][ T330] ? kernel_read_file+0x485/0x5a0
> [ 252.249934][ T330] ? kernel_read_file+0x305/0x5a0
> [ 252.254839][ T330] ? __x64_sys_fsconfig+0x970/0x970
> [ 252.259903][ T330] ? __do_sys_finit_module+0xff/0x180
> [ 252.265153][ T330] __do_sys_finit_module+0xff/0x180
> [ 252.270216][ T330] ? __do_sys_init_module+0x1d0/0x1d0
> [ 252.275465][ T330] ? __fget_files+0x1c3/0x2e0
> [ 252.280010][ T330] do_syscall_64+0x33/0x40
> [ 252.284304][ T330] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 252.290054][ T330] RIP: 0033:0x7fbb3e2fa78d
> [ 252.294348][ T330] Code: Unable to access opcode bytes at RIP 0x7fbb3e2fa763.
> [ 252.301584][ T330] RSP: 002b:00007ffe572e8d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [ 252.309855][ T330] RAX: ffffffffffffffda RBX: 000055c7795d90f0 RCX: 00007fbb3e2fa78d
> [ 252.317703][ T330] RDX: 0000000000000000 RSI: 00007fbb3ee6c82d RDI: 0000000000000006
> [ 252.325553][ T330] RBP: 00007fbb3ee6c82d R08: 0000000000000000 R09: 00007ffe572e8e40
> [ 252.333402][ T330] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
> [ 252.341257][ T330] R13: 000055c7795930e0 R14: 0000000000020000 R15: 0000000000000000
> [ 252.349117][ T330]

On Mon, 2020-11-02 at 14:51 +0000, John Garry wrote:
> On 02/11/2020 14:17, Qian Cai wrote:
> > [ 251.961152][ T330] INFO: task systemd-udevd:567 blocked for more than
> > 122 seconds.
> > [ 251.968876][ T330] Not tainted 5.10.0-rc1-next-20201102 #1
> > [ 251.975003][ T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [ 251.983546][ T330] task:systemd-udevd state:D stack:27224 pid: 567
> > ppid: 506 flags:0x00004324
> > [ 251.992620][ T330] Call Trace:
> > [ 251.995784][ T330] __schedule+0x71d/0x1b60
> > [ 252.000067][ T330] ? __sched_text_start+0x8/0x8
> > [ 252.004798][ T330] schedule+0xbf/0x270
> > [ 252.008735][ T330] schedule_timeout+0x3fc/0x590
> > [ 252.013464][ T330] ? usleep_range+0x120/0x120
> > [ 252.018008][ T330] ? wait_for_completion+0x156/0x250
> > [ 252.023176][ T330] ? lock_downgrade+0x700/0x700
> > [ 252.027886][ T330] ? rcu_read_unlock+0x40/0x40
> > [ 252.032530][ T330] ? do_raw_spin_lock+0x121/0x290
> > [ 252.037412][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
> > [ 252.043268][ T330] ? _raw_spin_unlock_irq+0x1f/0x30
> > [ 252.048331][ T330] wait_for_completion+0x15e/0x250
> > [ 252.053323][ T330] ? wait_for_completion_interruptible+0x320/0x320
> > [ 252.059687][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
> > [ 252.065543][ T330] ? _raw_spin_unlock_irq+0x1f/0x30
> > [ 252.070606][ T330] __flush_work+0x42a/0x900
> > [ 252.074989][ T330] ? queue_delayed_work_on+0x90/0x90
> > [ 252.080139][ T330] ? __queue_work+0x463/0xf40
> > [ 252.084700][ T330] ? init_pwq+0x320/0x320
> > [ 252.088891][ T330] ? queue_work_on+0x5e/0x80
> > [ 252.093364][ T330] ? trace_hardirqs_on+0x1c/0x150
> > [ 252.098255][ T330] work_on_cpu+0xe7/0x130
> > [ 252.102461][ T330] ? flush_delayed_work+0xc0/0xc0
> > [ 252.107342][ T330] ? __mutex_unlock_slowpath+0xd4/0x670
> > [ 252.112764][ T330] ? work_debug_hint+0x30/0x30
> > [ 252.117391][ T330] ? pci_device_shutdown+0x80/0x80
> > [ 252.122378][ T330] ? cpumask_next_and+0x57/0x80
> > [ 252.127094][ T330] pci_device_probe+0x500/0x5c0
> > [ 252.131824][ T330] ? pci_device_remove+0x1f0/0x1f0
>
> Is CONFIG_DEBUG_TEST_DRIVER_REMOVE enabled? I figure it is, with this call.
>
> Or please share the .config

No. https://cailca.coding.net/public/linux/mm/git/files/master/x86.config

>
> Cheers,
> John
>
> > [ 252.136805][ T330] really_probe+0x207/0xad0
> > [ 252.141191][ T330] ? device_driver_attach+0x120/0x120
> > [ 252.146428][ T330] driver_probe_device+0x1f1/0x370
> > [ 252.151424][ T330] device_driver_attach+0xe5/0x120
> > [ 252.156399][ T330] __driver_attach+0xf0/0x260
> > [ 252.160953][ T330] bus_for_each_dev+0x117/0x1a0
> > [ 252.165669][ T330] ? subsys_dev_iter_exit+0x10/0x10
> > [ 252.170731][ T330] bus_add_driver+0x399/0x560
> > [ 252.175289][ T330] driver_register+0x189/0x310
> > [ 252.179919][ T330] ? 0xffffffffc05c1000
> > [ 252.183960][ T330] megasas_init+0x117/0x1000 [megaraid_sas]
> > [ 252.189713][ T330] do_one_initcall+0xf6/0x510
> > [ 252.194267][ T330] ? perf_trace_initcall_level+0x490/0x490
> > [ 252.199940][ T330] ? kasan_unpoison_shadow+0x30/0x40
> > [ 252.205104][ T330] ? __kasan_kmalloc.constprop.11+0xc1/0xd0
> > [ 252.210859][ T330] ? do_init_module+0x49/0x6c0
> > [ 252.215500][ T330] ? kmem_cache_alloc_trace+0x11f/0x1e0
> > [ 252.220925][ T330] ? kasan_unpoison_shadow+0x30/0x40
> > [ 252.226068][ T330] do_init_module+0x1ed/0x6c0
> > [ 252.230608][ T330] load_module+0x4a59/0x5d20
> > [ 252.235081][ T330] ? layout_and_allocate+0x2770/0x2770
> > [ 252.240404][ T330] ? __vmalloc_node+0x8d/0x100
> > [ 252.245046][ T330] ? kernel_read_file+0x485/0x5a0
> > [ 252.249934][ T330] ? kernel_read_file+0x305/0x5a0
> > [ 252.254839][ T330] ? __x64_sys_fsconfig+0x970/0x970
> > [ 252.259903][ T330] ? __do_sys_finit_module+0xff/0x180
> > [ 252.265153][ T330] __do_sys_finit_module+0xff/0x180
> > [ 252.270216][ T330] ? __do_sys_init_module+0x1d0/0x1d0
> > [ 252.275465][ T330] ? __fget_files+0x1c3/0x2e0
> > [ 252.280010][ T330] do_syscall_64+0x33/0x40
> > [ 252.284304][ T330] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [ 252.290054][ T330] RIP: 0033:0x7fbb3e2fa78d
> > [ 252.294348][ T330] Code: Unable to access opcode bytes at RIP
> > 0x7fbb3e2fa763.
> > [ 252.301584][ T330] RSP: 002b:00007ffe572e8d18 EFLAGS: 00000246 ORIG_RAX:
> > 0000000000000139
> > [ 252.309855][ T330] RAX: ffffffffffffffda RBX: 000055c7795d90f0 RCX:
> > 00007fbb3e2fa78d
> > [ 252.317703][ T330] RDX: 0000000000000000 RSI: 00007fbb3ee6c82d RDI:
> > 0000000000000006
> > [ 252.325553][ T330] RBP: 00007fbb3ee6c82d R08: 0000000000000000 R09:
> > 00007ffe572e8e40
> > [ 252.333402][ T330] R10: 0000000000000006 R11: 0000000000000246 R12:
> > 0000000000000000
> > [ 252.341257][ T330] R13: 000055c7795930e0 R14: 0000000000020000 R15:
> > 0000000000000000
> > [ 252.349117][ T330]

On Mon, 2020-11-02 at 20:01 +0530, Kashyap Desai wrote:
> > On Wed, 2020-08-19 at 23:20 +0800, John Garry wrote:
> > > From: Kashyap Desai <kashyap.desai@broadcom.com>
> > >
> > > Fusion adapters can steer completions to individual queues, and we now
> > > have support for shared host-wide tags.
> > > So we can enable multiqueue support for fusion adapters.
> > >
> > > Once driver enable shared host-wide tags, cpu hotplug feature is also
> > > supported as it was enabled using below patchsets - commit
> > > bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
> > > offline")
> > >
> > > Currently driver has provision to disable host-wide tags using
> > > "host_tagset_enable" module parameter.
> > >
> > > Once we do not have any major performance regression using host-wide
> > > tags, we will drop the hand-crafted interrupt affinity settings.
> > >
> > > Performance is also meeting the expecatation - (used both none and
> > > mq-deadline scheduler)
> > > 24 Drive SSD on Aero with/without this patch can get 3.1M IOPs
> > > 3 VDs consist of 8 SAS SSD on Aero with/without this patch can get
> > > 3.1M IOPs.
> > >
> > > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
> > > Signed-off-by: Hannes Reinecke <hare@suse.com>
> > > Signed-off-by: John Garry <john.garry@huawei.com>
> >
> > Reverting this commit fixed an issue that Dell Power Edge R6415 server
> > with
> > megaraid_sas is unable to boot.
>
> I will take a look at this. BTW, can you try keeping same PATCH but use
> module parameter "host_tagset_enable =0"

Yes, that also works.

On 02/11/2020 15:18, Qian Cai wrote: > On Mon, 2020-11-02 at 14:51 +0000, John Garry wrote: >> On 02/11/2020 14:17, Qian Cai wrote: >>> [ 251.961152][ T330] INFO: task systemd-udevd:567 blocked for more than >>> 122 seconds. >>> [ 251.968876][ T330] Not tainted 5.10.0-rc1-next-20201102 #1 >>> [ 251.975003][ T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> disables this message. >>> [ 251.983546][ T330] task:systemd-udevd state:D stack:27224 pid: 567 >>> ppid: 506 flags:0x00004324 >>> [ 251.992620][ T330] Call Trace: >>> [ 251.995784][ T330] __schedule+0x71d/0x1b60 >>> [ 252.000067][ T330] ? __sched_text_start+0x8/0x8 >>> [ 252.004798][ T330] schedule+0xbf/0x270 >>> [ 252.008735][ T330] schedule_timeout+0x3fc/0x590 >>> [ 252.013464][ T330] ? usleep_range+0x120/0x120 >>> [ 252.018008][ T330] ? wait_for_completion+0x156/0x250 >>> [ 252.023176][ T330] ? lock_downgrade+0x700/0x700 >>> [ 252.027886][ T330] ? rcu_read_unlock+0x40/0x40 >>> [ 252.032530][ T330] ? do_raw_spin_lock+0x121/0x290 >>> [ 252.037412][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0 >>> [ 252.043268][ T330] ? _raw_spin_unlock_irq+0x1f/0x30 >>> [ 252.048331][ T330] wait_for_completion+0x15e/0x250 >>> [ 252.053323][ T330] ? wait_for_completion_interruptible+0x320/0x320 >>> [ 252.059687][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0 >>> [ 252.065543][ T330] ? _raw_spin_unlock_irq+0x1f/0x30 >>> [ 252.070606][ T330] __flush_work+0x42a/0x900 >>> [ 252.074989][ T330] ? queue_delayed_work_on+0x90/0x90 >>> [ 252.080139][ T330] ? __queue_work+0x463/0xf40 >>> [ 252.084700][ T330] ? init_pwq+0x320/0x320 >>> [ 252.088891][ T330] ? queue_work_on+0x5e/0x80 >>> [ 252.093364][ T330] ? trace_hardirqs_on+0x1c/0x150 >>> [ 252.098255][ T330] work_on_cpu+0xe7/0x130 >>> [ 252.102461][ T330] ? flush_delayed_work+0xc0/0xc0 >>> [ 252.107342][ T330] ? __mutex_unlock_slowpath+0xd4/0x670 >>> [ 252.112764][ T330] ? work_debug_hint+0x30/0x30 >>> [ 252.117391][ T330] ? 
pci_device_shutdown+0x80/0x80 >>> [ 252.122378][ T330] ? cpumask_next_and+0x57/0x80 >>> [ 252.127094][ T330] pci_device_probe+0x500/0x5c0 >>> [ 252.131824][ T330] ? pci_device_remove+0x1f0/0x1f0 >> >> Is CONFIG_DEBUG_TEST_DRIVER_REMOVE enabled? I figure it is, with this call. >> >> Or please share the .config > > No. https://cailca.coding.net/public/linux/mm/git/files/master/x86.config > thanks, FWIW, I just tested another megaraid sas card on linux-next 02 Nov with vanilla arm64 defconfig and no special commandline param, and found no issue: dmesg | grep mega [30.031739] megasas: 07.714.04.00-rc1 [30.039749] megaraid_sas 0000:08:00.0: Adding to iommu group 0 [30.053247] megaraid_sas 0000:08:00.0: BAR:0x0 BAR's base_addr(phys):0x0000080010000000 mapped virt_addr:0x(____ptrval____) [30.053251] megaraid_sas 0000:08:00.0: FW now in Ready state [30.065162] megaraid_sas 0000:08:00.0: 63 bit DMA mask and 63 bit consistent mask [30.081197] megaraid_sas 0000:08:00.0: firmware supports msix : (128) [30.096349] megaraid_sas 0000:08:00.0: requested/available msix 128/128 [30.110277] megaraid_sas 0000:08:00.0: current msix/online cpus: (128/128) [30.124917] megaraid_sas 0000:08:00.0: RDPQ mode : (enabled) [30.136821] megaraid_sas 0000:08:00.0: Current firmware supports maximum commands: 4077 LDIO threshold: 0 [30.208538] megaraid_sas 0000:08:00.0: Performance mode :Latency (latency index = 1) [30.224838] megaraid_sas 0000:08:00.0: FW supports sync cache : Yes [30.238021] megaraid_sas 0000:08:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009 [30.311960] megaraid_sas 0000:08:00.0: FW provided supportMaxExtLDs: 1 max_lds: 64 [30.327885] megaraid_sas 0000:08:00.0: controller type : MR(2048MB) [30.341066] megaraid_sas 0000:08:00.0: Online Controller Reset(OCR) : Enabled [30.356076] megaraid_sas 0000:08:00.0: Secure JBOD support: Yes [30.368710] megaraid_sas 0000:08:00.0: NVMe passthru support : Yes [30.381708] megaraid_sas 0000:08:00.0: FW provided TM 
TaskAbort/Reset timeout : 6 secs/60 secs [30.399825] megaraid_sas 0000:08:00.0: JBOD sequence map support : Yes [30.413552] megaraid_sas 0000:08:00.0: PCI Lane Margining support : No [30.452059] megaraid_sas 0000:08:00.0: NVME page size : (4096) [30.465079] megaraid_sas 0000:08:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000 [30.485208] megaraid_sas 0000:08:00.0: INIT adapter done [30.496609] megaraid_sas 0000:08:00.0: pci id : (0x1000)/(0x0016)/(0x19e5)/(0xd215) [30.512931] megaraid_sas 0000:08:00.0: unevenspan support : no [30.525199] megaraid_sas 0000:08:00.0: firmware crash dump: no [30.537649] megaraid_sas 0000:08:00.0: JBOD sequence map : enabled [30.550743] megaraid_sas 0000:08:00.0: Max firmware commands: 4076 shared with nr_hw_queues = 127 john@ubuntu:~$ lspci -s 08:00.0 -v 08:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID Tri-Mode SAS3508 (rev 01) Subsystem: Huawei Technologies Co., Ltd. MegaRAID Tri-Mode SAS3508 Flags: bus master, fast devsel, latency 0, IRQ 41, NUMA node 0 Memory at 80010000000 (64-bit, prefetchable) [size=1M] Memory at 80010100000 (64-bit, prefetchable) [size=1M] Memory at e9400000 (32-bit, non-prefetchable) [size=1M] I/O ports at 1000 [size=256] Expansion ROM at e9500000 [disabled] [size=1M] Capabilities: <access denied> Kernel driver in use: megaraid_sas I have no x86 system to test that x86 config, though. How about v5.10-rc2 for this issue? Thanks, John >> >> Cheers, >> John >> >>> [ 252.136805][ T330] really_probe+0x207/0xad0 >>> [ 252.141191][ T330] ? device_driver_attach+0x120/0x120 >>> [ 252.146428][ T330] driver_probe_device+0x1f1/0x370 >>> [ 252.151424][ T330] device_driver_attach+0xe5/0x120 >>> [ 252.156399][ T330] __driver_attach+0xf0/0x260 >>> [ 252.160953][ T330] bus_for_each_dev+0x117/0x1a0 >>> [ 252.165669][ T330] ? subsys_dev_iter_exit+0x10/0x10 >>> [ 252.170731][ T330] bus_add_driver+0x399/0x560 >>> [ 252.175289][ T330] driver_register+0x189/0x310 >>> [ 252.179919][ T330] ? 
0xffffffffc05c1000 >>> [ 252.183960][ T330] megasas_init+0x117/0x1000 [megaraid_sas] >>> [ 252.189713][ T330] do_one_initcall+0xf6/0x510 >>> [ 252.194267][ T330] ? perf_trace_initcall_level+0x490/0x490 >>> [ 252.199940][ T330] ? kasan_unpoison_shadow+0x30/0x40 >>> [ 252.205104][ T330] ? __kasan_kmalloc.constprop.11+0xc1/0xd0 >>> [ 252.210859][ T330] ? do_init_module+0x49/0x6c0 >>> [ 252.215500][ T330] ? kmem_cache_alloc_trace+0x11f/0x1e0 >>> [ 252.220925][ T330] ? kasan_unpoison_shadow+0x30/0x40 >>> [ 252.226068][ T330] do_init_module+0x1ed/0x6c0 >>> [ 252.230608][ T330] load_module+0x4a59/0x5d20 >>> [ 252.235081][ T330] ? layout_and_allocate+0x2770/0x2770 >>> [ 252.240404][ T330] ? __vmalloc_node+0x8d/0x100 >>> [ 252.245046][ T330] ? kernel_read_file+0x485/0x5a0 >>> [ 252.249934][ T330] ? kernel_read_file+0x305/0x5a0 >>> [ 252.254839][ T330] ? __x64_sys_fsconfig+0x970/0x970 >>> [ 252.259903][ T330] ? __do_sys_finit_module+0xff/0x180 >>> [ 252.265153][ T330] __do_sys_finit_module+0xff/0x180 >>> [ 252.270216][ T330] ? __do_sys_init_module+0x1d0/0x1d0 >>> [ 252.275465][ T330] ? __fget_files+0x1c3/0x2e0 >>> [ 252.280010][ T330] do_syscall_64+0x33/0x40 >>> [ 252.284304][ T330] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [ 252.290054][ T330] RIP: 0033:0x7fbb3e2fa78d >>> [ 252.294348][ T330] Code: Unable to access opcode bytes at RIP >>> 0x7fbb3e2fa763. >>> [ 252.301584][ T330] RSP: 002b:00007ffe572e8d18 EFLAGS: 00000246 ORIG_RAX: >>> 0000000000000139 >>> [ 252.309855][ T330] RAX: ffffffffffffffda RBX: 000055c7795d90f0 RCX: >>> 00007fbb3e2fa78d >>> [ 252.317703][ T330] RDX: 0000000000000000 RSI: 00007fbb3ee6c82d RDI: >>> 0000000000000006 >>> [ 252.325553][ T330] RBP: 00007fbb3ee6c82d R08: 0000000000000000 R09: >>> 00007ffe572e8e40 >>> [ 252.333402][ T330] R10: 0000000000000006 R11: 0000000000000246 R12: >>> 0000000000000000 >>> [ 252.341257][ T330] R13: 000055c7795930e0 R14: 0000000000020000 R15: >>> 0000000000000000 >>> [ 252.349117][ T330] > > . >
On Tue, 2020-11-03 at 10:54 +0000, John Garry wrote: > I have no x86 system to test that x86 config, though. How about > v5.10-rc2 for this issue? v5.10-rc2 is also broken here. [ 251.941451][ T330] INFO: task systemd-udevd:551 blocked for more than 122 seconds. [ 251.949176][ T330] Not tainted 5.10.0-rc2 #3 [ 251.954094][ T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 251.962633][ T330] task:systemd-udevd state:D stack:27160 pid: 551 ppid: 506 flags:0x00000324 [ 251.971707][ T330] Call Trace: [ 251.974871][ T330] __schedule+0x71d/0x1b50 [ 251.979155][ T330] ? kcore_callback+0x1d/0x1d [ 251.983709][ T330] schedule+0xbf/0x270 [ 251.987640][ T330] schedule_timeout+0x3fc/0x590 [ 251.992370][ T330] ? usleep_range+0x120/0x120 [ 251.996910][ T330] ? wait_for_completion+0x156/0x250 [ 252.002080][ T330] ? lock_downgrade+0x700/0x700 [ 252.006792][ T330] ? rcu_read_unlock+0x40/0x40 [ 252.011435][ T330] ? do_raw_spin_lock+0x121/0x290 [ 252.016324][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0 [ 252.022178][ T330] ? _raw_spin_unlock_irq+0x1f/0x30 [ 252.027235][ T330] wait_for_completion+0x15e/0x250 [ 252.032226][ T330] ? wait_for_completion_interruptible+0x2f0/0x2f0 [ 252.038590][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0 [ 252.044443][ T330] ? _raw_spin_unlock_irq+0x1f/0x30 [ 252.049502][ T330] __flush_work+0x42a/0x900 [ 252.053882][ T330] ? queue_delayed_work_on+0x90/0x90 [ 252.059025][ T330] ? __queue_work+0x463/0xf40 [ 252.063583][ T330] ? init_pwq+0x320/0x320 [ 252.067777][ T330] ? queue_work_on+0x5e/0x80 [ 252.072249][ T330] ? trace_hardirqs_on+0x1c/0x150 [ 252.077138][ T330] work_on_cpu+0xe7/0x130 [ 252.081347][ T330] ? flush_delayed_work+0xc0/0xc0 [ 252.086231][ T330] ? __mutex_unlock_slowpath+0xd4/0x670 [ 252.091655][ T330] ? work_debug_hint+0x30/0x30 [ 252.096284][ T330] ? pci_device_shutdown+0x80/0x80 [ 252.101274][ T330] ? cpumask_next_and+0x57/0x80 [ 252.105990][ T330] pci_device_probe+0x500/0x5c0 [ 252.110703][ T330] ? 
pci_device_remove+0x1f0/0x1f0 [ 252.115697][ T330] really_probe+0x207/0xad0 [ 252.120065][ T330] ? device_driver_attach+0x120/0x120 [ 252.125317][ T330] driver_probe_device+0x1f1/0x370 [ 252.130291][ T330] device_driver_attach+0xe5/0x120 [ 252.135281][ T330] __driver_attach+0xf0/0x260 [ 252.139827][ T330] bus_for_each_dev+0x117/0x1a0 [ 252.144552][ T330] ? subsys_dev_iter_exit+0x10/0x10 [ 252.149609][ T330] bus_add_driver+0x399/0x560 [ 252.154166][ T330] driver_register+0x189/0x310 [ 252.158795][ T330] ? 0xffffffffc05c5000 [ 252.162838][ T330] megasas_init+0x117/0x1000 [megaraid_sas] [ 252.168593][ T330] do_one_initcall+0xf6/0x510 [ 252.173143][ T330] ? perf_trace_initcall_level+0x490/0x490 [ 252.178809][ T330] ? kasan_unpoison_shadow+0x30/0x40 [ 252.183973][ T330] ? __kasan_kmalloc.constprop.11+0xc1/0xd0 [ 252.189728][ T330] ? do_init_module+0x49/0x6c0 [ 252.194370][ T330] ? kmem_cache_alloc_trace+0x12e/0x2a0 [ 252.199780][ T330] ? kasan_unpoison_shadow+0x30/0x40 [ 252.204942][ T330] do_init_module+0x1ed/0x6c0 [ 252.209479][ T330] load_module+0x4a25/0x5cf0 [ 252.213950][ T330] ? layout_and_allocate+0x2770/0x2770 [ 252.219271][ T330] ? __vmalloc_node+0x8d/0x100 [ 252.223913][ T330] ? kernel_read_file+0x485/0x5a0 [ 252.228796][ T330] ? kernel_read_file+0x305/0x5a0 [ 252.233696][ T330] ? __ia32_sys_fsconfig+0x6a0/0x6a0 [ 252.238841][ T330] ? __do_sys_finit_module+0xff/0x180 [ 252.244093][ T330] __do_sys_finit_module+0xff/0x180 [ 252.249155][ T330] ? __do_sys_init_module+0x1d0/0x1d0 [ 252.254403][ T330] ? __fget_files+0x1c3/0x2e0 [ 252.258940][ T330] do_syscall_64+0x33/0x40 [ 252.263234][ T330] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 252.268984][ T330] RIP: 0033:0x7f7cf6a4878d [ 252.273276][ T330] Code: Unable to access opcode bytes at RIP 0x7f7cf6a48763. 
[ 252.280499][ T330] RSP: 002b:00007ffcfa94b978 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 252.288781][ T330] RAX: ffffffffffffffda RBX: 000055e01f48b730 RCX: 00007f7cf6a4878d [ 252.296628][ T330] RDX: 0000000000000000 RSI: 00007f7cf75ba82d RDI: 0000000000000006 [ 252.304482][ T330] RBP: 00007f7cf75ba82d R08: 0000000000000000 R09: 00007ffcfa94baa0 [ 252.312331][ T330] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000 [ 252.320167][ T330] R13: 000055e01f433530 R14: 0000000000020000 R15: 0000000000000000 [ 252.328052][ T330] [ 252.328052][ T330] Showing all locks held in the system: [ 252.335722][ T330] 3 locks held by kworker/3:1/289: [ 252.340697][ T330] #0: ffff8881001eb338 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ec/0x1610 [ 252.350906][ T330] #1: ffffc90004ef7e00 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x820/0x1610 [ 252.361725][ T330] #2: ffff88810dc600e0 (&shost->scan_mutex){+.+.}-{3:3}, at: scsi_scan_host_selected+0xde/0x260 [ 252.372132][ T330] 1 lock held by khungtaskd/330: [ 252.376933][ T330] #0: ffffffffb42d2de0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.52+0x0/0x30 [ 252.387234][ T330] 1 lock held by systemd-journal/398: [ 252.392489][ T330] 1 lock held by systemd-udevd/551: [ 252.397550][ T330] #0: ffff888109a49218 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x37/0x120 [ 252.407085][ T330] [ 252.409285][ T330] ============================================= [ 252.409285][ T330]
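The "Showing all locks held in the system" section is easier to read once it is grouped per task. Below is a small sketch that does that grouping; the function names (`locks_by_task`, `holders_of`) and the regular expressions are illustrative assumptions written against the format of this particular dump, not a general lockdep parser. In this report, the kworker that runs the `work_on_cpu()` item (the `wfc.work` completion) is also the holder of `shost->scan_mutex`, taken in `scsi_scan_host_selected()`, while systemd-udevd holds only `dev->mutex`. Reading that as "the probe is stuck or very slow inside the SCSI scan" is an inference from the log, not something stated in the thread.

```python
import re

# "3 locks held by kworker/3:1/289:" -> task "kworker/3:1", pid 289
HOLDER = re.compile(r"(\d+) locks? held by (\S+)/(\d+):(?=\s|$)")
# "#2: ffff88810dc600e0 (&shost->scan_mutex){+.+.}-{3:3}, at: func+0x.."
LOCK = re.compile(
    r"#(\d+): [0-9a-f]+ \((.+?)\)\{[^}]*\}-\{\d+:\d+\}, at: (\S+)"
)

SAMPLE = """\
[  252.335722][  T330] 3 locks held by kworker/3:1/289:
[  252.340697][  T330]  #0: ffff8881001eb338 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ec/0x1610
[  252.350906][  T330]  #1: ffffc90004ef7e00 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x820/0x1610
[  252.361725][  T330]  #2: ffff88810dc600e0 (&shost->scan_mutex){+.+.}-{3:3}, at: scsi_scan_host_selected+0xde/0x260
[  252.372132][  T330] 1 lock held by khungtaskd/330:
[  252.376933][  T330]  #0: ffffffffb42d2de0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.52+0x0/0x30
[  252.387234][  T330] 1 lock held by systemd-journal/398:
[  252.392489][  T330] 1 lock held by systemd-udevd/551:
[  252.397550][  T330]  #0: ffff888109a49218 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x37/0x120
"""


def locks_by_task(report):
    """Map 'task/pid' to a list of (lock_name, acquired_at) tuples."""
    holders = list(HOLDER.finditer(report))
    result = {}
    for i, holder in enumerate(holders):
        # A task's lock lines run until the next "locks held by" header.
        end = holders[i + 1].start() if i + 1 < len(holders) else len(report)
        segment = report[holder.end():end]
        task = f"{holder.group(2)}/{holder.group(3)}"
        result[task] = [(m.group(2), m.group(3)) for m in LOCK.finditer(segment)]
    return result


def holders_of(locks, needle):
    """Return the tasks holding any lock whose name contains `needle`."""
    return [
        task
        for task, held in locks.items()
        if any(needle in name for name, _ in held)
    ]


if __name__ == "__main__":
    parsed = locks_by_task(SAMPLE)
    for task, held in parsed.items():
        print(task, [name for name, _ in held])
    print("scan_mutex held by:", holders_of(parsed, "scan_mutex"))
```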
On Tue, 2020-11-03 at 08:04 -0500, Qian Cai wrote: > On Tue, 2020-11-03 at 10:54 +0000, John Garry wrote: > > I have no x86 system to test that x86 config, though. How about > > v5.10-rc2 for this issue? > > v5.10-rc2 is also broken here. John, Kashyap, any update on this? If this is going to take a while to fix it proper, should I send a patch to revert this or at least disable the feature by default for megaraid_sas in the meantime, so it no longer breaks the existing systems out there? > > [ 251.941451][ T330] INFO: task systemd-udevd:551 blocked for more than 122 > seconds. > [ 251.949176][ T330] Not tainted 5.10.0-rc2 #3 > [ 251.954094][ T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 251.962633][ T330] task:systemd-udevd state:D stack:27160 pid: 551 > ppid: 506 flags:0x00000324 > [ 251.971707][ T330] Call Trace: > [ 251.974871][ T330] __schedule+0x71d/0x1b50 > [ 251.979155][ T330] ? kcore_callback+0x1d/0x1d > [ 251.983709][ T330] schedule+0xbf/0x270 > [ 251.987640][ T330] schedule_timeout+0x3fc/0x590 > [ 251.992370][ T330] ? usleep_range+0x120/0x120 > [ 251.996910][ T330] ? wait_for_completion+0x156/0x250 > [ 252.002080][ T330] ? lock_downgrade+0x700/0x700 > [ 252.006792][ T330] ? rcu_read_unlock+0x40/0x40 > [ 252.011435][ T330] ? do_raw_spin_lock+0x121/0x290 > [ 252.016324][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0 > [ 252.022178][ T330] ? _raw_spin_unlock_irq+0x1f/0x30 > [ 252.027235][ T330] wait_for_completion+0x15e/0x250 > [ 252.032226][ T330] ? wait_for_completion_interruptible+0x2f0/0x2f0 > [ 252.038590][ T330] ? lockdep_hardirqs_on_prepare+0x27c/0x3d0 > [ 252.044443][ T330] ? _raw_spin_unlock_irq+0x1f/0x30 > [ 252.049502][ T330] __flush_work+0x42a/0x900 > [ 252.053882][ T330] ? queue_delayed_work_on+0x90/0x90 > [ 252.059025][ T330] ? __queue_work+0x463/0xf40 > [ 252.063583][ T330] ? init_pwq+0x320/0x320 > [ 252.067777][ T330] ? queue_work_on+0x5e/0x80 > [ 252.072249][ T330] ? 
trace_hardirqs_on+0x1c/0x150 > [ 252.077138][ T330] work_on_cpu+0xe7/0x130 > [ 252.081347][ T330] ? flush_delayed_work+0xc0/0xc0 > [ 252.086231][ T330] ? __mutex_unlock_slowpath+0xd4/0x670 > [ 252.091655][ T330] ? work_debug_hint+0x30/0x30 > [ 252.096284][ T330] ? pci_device_shutdown+0x80/0x80 > [ 252.101274][ T330] ? cpumask_next_and+0x57/0x80 > [ 252.105990][ T330] pci_device_probe+0x500/0x5c0 > [ 252.110703][ T330] ? pci_device_remove+0x1f0/0x1f0 > [ 252.115697][ T330] really_probe+0x207/0xad0 > [ 252.120065][ T330] ? device_driver_attach+0x120/0x120 > [ 252.125317][ T330] driver_probe_device+0x1f1/0x370 > [ 252.130291][ T330] device_driver_attach+0xe5/0x120 > [ 252.135281][ T330] __driver_attach+0xf0/0x260 > [ 252.139827][ T330] bus_for_each_dev+0x117/0x1a0 > [ 252.144552][ T330] ? subsys_dev_iter_exit+0x10/0x10 > [ 252.149609][ T330] bus_add_driver+0x399/0x560 > [ 252.154166][ T330] driver_register+0x189/0x310 > [ 252.158795][ T330] ? 0xffffffffc05c5000 > [ 252.162838][ T330] megasas_init+0x117/0x1000 [megaraid_sas] > [ 252.168593][ T330] do_one_initcall+0xf6/0x510 > [ 252.173143][ T330] ? perf_trace_initcall_level+0x490/0x490 > [ 252.178809][ T330] ? kasan_unpoison_shadow+0x30/0x40 > [ 252.183973][ T330] ? __kasan_kmalloc.constprop.11+0xc1/0xd0 > [ 252.189728][ T330] ? do_init_module+0x49/0x6c0 > [ 252.194370][ T330] ? kmem_cache_alloc_trace+0x12e/0x2a0 > [ 252.199780][ T330] ? kasan_unpoison_shadow+0x30/0x40 > [ 252.204942][ T330] do_init_module+0x1ed/0x6c0 > [ 252.209479][ T330] load_module+0x4a25/0x5cf0 > [ 252.213950][ T330] ? layout_and_allocate+0x2770/0x2770 > [ 252.219271][ T330] ? __vmalloc_node+0x8d/0x100 > [ 252.223913][ T330] ? kernel_read_file+0x485/0x5a0 > [ 252.228796][ T330] ? kernel_read_file+0x305/0x5a0 > [ 252.233696][ T330] ? __ia32_sys_fsconfig+0x6a0/0x6a0 > [ 252.238841][ T330] ? __do_sys_finit_module+0xff/0x180 > [ 252.244093][ T330] __do_sys_finit_module+0xff/0x180 > [ 252.249155][ T330] ? 
__do_sys_init_module+0x1d0/0x1d0 > [ 252.254403][ T330] ? __fget_files+0x1c3/0x2e0 > [ 252.258940][ T330] do_syscall_64+0x33/0x40 > [ 252.263234][ T330] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 252.268984][ T330] RIP: 0033:0x7f7cf6a4878d > [ 252.273276][ T330] Code: Unable to access opcode bytes at RIP > 0x7f7cf6a48763. > [ 252.280499][ T330] RSP: 002b:00007ffcfa94b978 EFLAGS: 00000246 ORIG_RAX: > 0000000000000139 > [ 252.288781][ T330] RAX: ffffffffffffffda RBX: 000055e01f48b730 RCX: > 00007f7cf6a4878d > [ 252.296628][ T330] RDX: 0000000000000000 RSI: 00007f7cf75ba82d RDI: > 0000000000000006 > [ 252.304482][ T330] RBP: 00007f7cf75ba82d R08: 0000000000000000 R09: > 00007ffcfa94baa0 > [ 252.312331][ T330] R10: 0000000000000006 R11: 0000000000000246 R12: > 0000000000000000 > [ 252.320167][ T330] R13: 000055e01f433530 R14: 0000000000020000 R15: > 0000000000000000 > [ 252.328052][ T330] > [ 252.328052][ T330] Showing all locks held in the system: > [ 252.335722][ T330] 3 locks held by kworker/3:1/289: > [ 252.340697][ T330] #0: ffff8881001eb338 ((wq_completion)events){+.+.}- > {0:0}, at: process_one_work+0x7ec/0x1610 > [ 252.350906][ T330] #1: ffffc90004ef7e00 > ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x820/0x1610 > [ 252.361725][ T330] #2: ffff88810dc600e0 (&shost->scan_mutex){+.+.}-{3:3}, > at: scsi_scan_host_selected+0xde/0x260 > [ 252.372132][ T330] 1 lock held by khungtaskd/330: > [ 252.376933][ T330] #0: ffffffffb42d2de0 (rcu_read_lock){....}-{1:2}, at: > rcu_lock_acquire.constprop.52+0x0/0x30 > [ 252.387234][ T330] 1 lock held by systemd-journal/398: > [ 252.392489][ T330] 1 lock held by systemd-udevd/551: > [ 252.397550][ T330] #0: ffff888109a49218 (&dev->mutex){....}-{3:3}, at: > device_driver_attach+0x37/0x120 > [ 252.407085][ T330] > [ 252.409285][ T330] ============================================= > [ 252.409285][ T330] >
> > > > v5.10-rc2 is also broken here. > > John, Kashyap, any update on this? If this is going to take a while to fix > it > proper, should I send a patch to revert this or at least disable the > feature by > default for megaraid_sas in the meantime, so it no longer breaks the > existing > systems out there? I am trying to get similar h/w to try out. All my current h/w works fine. Give me couple of days' time. If this is not obviously common issue and need time, we will go with module parameter disable method. I will let you know. Kashyap
On 04/11/2020 16:07, Kashyap Desai wrote:
>>> v5.10-rc2 is also broken here.
>>
>> John, Kashyap, any update on this? If this is going to take a while to fix
>> it proper, should I send a patch to revert this or at least disable the
>> feature by default for megaraid_sas in the meantime, so it no longer breaks
>> the existing systems out there?
>
> I am trying to get similar h/w to try out. All my current h/w works fine.
> Give me couple of days' time.
> If this is not obviously common issue and need time, we will go with module
> parameter disable method.
> I will let you know.

Hi Kashyap,

Please also consider disabling it just for this card, so that any other possible issues are still unearthed on other cards. Unfortunately, I don't have this card or any x86 machine to test with, so I can't assist there.

BTW, just to be clear, did you try the same .config as Qian Cai?

Thanks,
John
On Wed, Nov 4, 2020 at 11:38 PM John Garry <john.garry@huawei.com> wrote: > > On 04/11/2020 16:07, Kashyap Desai wrote: > >>> > >>> v5.10-rc2 is also broken here. > >> > >> John, Kashyap, any update on this? If this is going to take a while to fix > >> it > >> proper, should I send a patch to revert this or at least disable the > >> feature by > >> default for megaraid_sas in the meantime, so it no longer breaks the > >> existing > >> systems out there? > > > > I am trying to get similar h/w to try out. All my current h/w works fine. > > Give me couple of days' time. > > If this is not obviously common issue and need time, we will go with module > > parameter disable method. > > I will let you know. > > Hi Kashyap, > > Please also consider just disabling for this card, so any other possible > issues are unearthed on other cards. I don't have this card or any x86 > machine to test it unfortunately to assist. > > BTW, just to be clear, did you try the same .config as Qian Cai? > > Thanks, > John I am able to hit the boot hang and similar kind of stack traces as reported by Qian with shared .config on x86 machine. In my case the system boots after a hang of 40-45 mins. Qian, is it true for you as well ? With module parameter -"host_tagset_enable=0", the issue is not seen. Below is snippet of the dmesg logs/traces which are observed during system bootup and after wait of 40-45 mins drives attached to megaraid_sas adapter are discovered: ======================================== [ 1969.502913] INFO: task systemd-udevd:906 can't die for more than 1720 seconds. [ 1969.597725] task:systemd-udevd state:D stack:13456 pid: 906 ppid: 858 flags:0x00000324 [ 1969.597730] Call Trace: [ 1969.597734] __schedule+0x263/0x7f0 [ 1969.597737] ? __lock_acquire+0x576/0xaf0 [ 1969.597739] ? wait_for_completion+0x7b/0x110 [ 1969.597741] schedule+0x4c/0xc0 [ 1969.597743] schedule_timeout+0x244/0x2e0 [ 1969.597745] ? find_held_lock+0x2d/0x90 [ 1969.597748] ? 
wait_for_completion+0xa6/0x110 [ 1969.597750] ? wait_for_completion+0x7b/0x110 [ 1969.597752] ? lockdep_hardirqs_on_prepare+0xd4/0x170 [ 1969.597753] ? wait_for_completion+0x7b/0x110 [ 1969.597755] wait_for_completion+0xae/0x110 [ 1969.597757] __flush_work+0x269/0x4b0 [ 1969.597760] ? init_pwq+0xf0/0xf0 [ 1969.597763] work_on_cpu+0x9c/0xd0 [ 1969.597765] ? work_is_static_object+0x10/0x10 [ 1969.597768] ? pci_device_shutdown+0x30/0x30 [ 1969.597770] pci_device_probe+0x197/0x1b0 [ 1969.597773] really_probe+0xda/0x410 [ 1969.597776] driver_probe_device+0xd9/0x140 [ 1969.597778] device_driver_attach+0x4a/0x50 [ 1969.597780] __driver_attach+0x83/0x140 [ 1969.597782] ? device_driver_attach+0x50/0x50 [ 1969.597784] ? device_driver_attach+0x50/0x50 [ 1969.597787] bus_for_each_dev+0x74/0xc0 [ 1969.597789] bus_add_driver+0x14b/0x1f0 [ 1969.597791] ? 0xffffffffc04fb000 [ 1969.597793] driver_register+0x66/0xb0 [ 1969.597795] ? 0xffffffffc04fb000 [ 1969.597801] megasas_init+0xe7/0x1000 [megaraid_sas] [ 1969.597803] do_one_initcall+0x62/0x300 [ 1969.597806] ? do_init_module+0x1d/0x200 [ 1969.597808] ? kmem_cache_alloc_trace+0x296/0x2d0 [ 1969.597811] do_init_module+0x55/0x200 [ 1969.597813] load_module+0x15f2/0x17b0 [ 1969.597816] ? __do_sys_finit_module+0xad/0x110 [ 1969.597818] __do_sys_finit_module+0xad/0x110 [ 1969.597820] do_syscall_64+0x33/0x40 [ 1969.597823] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1969.597825] RIP: 0033:0x7f66340262bd [ 1969.597827] Code: Unable to access opcode bytes at RIP 0x7f6634026293. 
[ 1969.597828] RSP: 002b:00007ffca1011f48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 1969.597831] RAX: ffffffffffffffda RBX: 000055f6720cf370 RCX: 00007f66340262bd [ 1969.597833] RDX: 0000000000000000 RSI: 00007f6634b9880d RDI: 0000000000000006 [ 1969.597835] RBP: 00007f6634b9880d R08: 0000000000000000 R09: 00007ffca1012070 [ 1969.597836] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000 [ 1969.597838] R13: 000055f6720cce70 R14: 0000000000020000 R15: 0000000000000000 [ 1969.597859] Showing all locks held in the system: [ 1969.597862] 2 locks held by kworker/0:0/5: [ 1969.597863] #0: ffff9af800194b38 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1e6/0x5e0 [ 1969.597872] #1: ffffbf3bc01f3e70 ((kfence_timer).work){+.+.}-{0:0}, at: process_one_work+0x1e6/0x5e0 [ 1969.597890] 3 locks held by kworker/0:1/7: [ 1969.597960] 1 lock held by khungtaskd/643: [ 1969.597962] #0: ffffffffa624cb60 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.54+0x0/0x30 [ 1969.597982] 1 lock held by systemd-udevd/906: [ 1969.597983] #0: ffff9af984a1c218 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x50 [ 1969.598010] ============================================= [ 1983.242512] random: fast init done [ 2071.928411] sd 0:2:0:0: [sda] 1951399936 512-byte logical blocks: (999 GB/931 GiB) [ 2071.928480] sd 0:2:2:0: [sdc] 1756889088 512-byte logical blocks: (900 GB/838 GiB) [ 2071.928537] sd 0:2:1:0: [sdb] 285474816 512-byte logical blocks: (146 GB/136 GiB) [ 2071.928580] sd 0:2:0:0: [sda] Write Protect is off [ 2071.928625] sd 0:2:0:0: [sda] Mode Sense: 1f 00 00 08 [ 2071.928629] sd 0:2:2:0: [sdc] Write Protect is off [ 2071.928669] sd 0:2:1:0: [sdb] Write Protect is off [ 2071.928706] sd 0:2:1:0: [sdb] Mode Sense: 1f 00 00 08 [ 2071.928844] sd 0:2:2:0: [sdc] Mode Sense: 1f 00 00 08 [ 2071.928848] sd 0:2:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA ================================ I am working on it and 
need some time for debugging.

BTW, did anyone try the "shared host tagset" patchset, with the .config shared by Qian, on other adapters that are not really multiqueue at the HW level but whose driver exposes multiple hardware queues (similar to megaraid_sas)?

Thanks,
Sumit
On Sat, 2020-11-07 at 00:55 +0530, Sumit Saxena wrote:
> I am able to hit the boot hang and similar kind of stack traces as
> reported by Qian with shared .config on x86 machine.
> In my case the system boots after a hang of 40-45 mins. Qian, is it
> true for you as well ?

I don't know. I never waited that long.
On 07/11/2020 00:17, Qian Cai wrote:
> On Sat, 2020-11-07 at 00:55 +0530, Sumit Saxena wrote:
>> I am able to hit the boot hang and similar kind of stack traces as
>> reported by Qian with shared .config on x86 machine.
>> In my case the system boots after a hang of 40-45 mins. Qian, is it
>> true for you as well ?
> I don't know. I had never waited for that long.

Hi Qian,

By any chance, do you have an equivalent arm64 .config that enables the same RH config options?

I suppose I could try to do this myself, but an authentic version would be nicer.

Thanks,
John
On Mon, 2020-11-09 at 08:49 +0000, John Garry wrote:
> On 07/11/2020 00:17, Qian Cai wrote:
> > On Sat, 2020-11-07 at 00:55 +0530, Sumit Saxena wrote:
> > > I am able to hit the boot hang and similar kind of stack traces as
> > > reported by Qian with shared .config on x86 machine.
> > > In my case the system boots after a hang of 40-45 mins. Qian, is it
> > > true for you as well ?
> > I don't know. I had never waited for that long.
>
> Hi Qian,
>
> By chance do have an equivalent arm64 .config, enabling the same RH
> config options?
>
> I suppose I could try do this myself also, but an authentic version
> would be nicer.

The closest one I have here is:

https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config

It only selects the Thunder X2 platform, though, and you would need to manually select CONFIG_MEGARAID_SAS=m to start with. None of the arm64 systems here have megaraid_sas.
On 09/11/2020 13:39, Qian Cai wrote:
>> I suppose I could try do this myself also, but an authentic version
>> would be nicer.
> The closest one I have here is:
> https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config
>
> but it only selects the Thunder X2 platform and needs to manually select
> CONFIG_MEGARAID_SAS=m to start with, but none of arm64 systems here have
> megaraid_sas.

Thanks, I'm confident I can fix it up to get it going on my Huawei arm64 D06CS.

That board has a megaraid sas card. In addition, it also has hisi_sas HW, another storage controller for which we enabled the same feature that is causing the problem.

I'll report back when I can.

Thanks,
John
On 09/11/2020 14:05, John Garry wrote:
> On 09/11/2020 13:39, Qian Cai wrote:
>>> I suppose I could try do this myself also, but an authentic version
>>> would be nicer.
>> The closest one I have here is:
>> https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config
>>
>> but it only selects the Thunder X2 platform and needs to manually select
>> CONFIG_MEGARAID_SAS=m to start with, but none of arm64 systems here have
>> megaraid_sas.
>
> Thanks, I'm confident I can fix it up to get it going on my Huawei arm64
> D06CS.
>
> So that board has a megaraid sas card. In addition, it also has hisi_sas
> HW, which is another storage controller which we enabled this same
> feature which is causing the problem.
>
> I'll report back when I can.

So I had to hack that arm64 config a bit to get it booting:

https://github.com/hisilicon/kernel-dev/commits/private-topic-sas-5.10-megaraid-hang

Boot is OK on my board that has no megaraid sas card but does include hisi_sas HW (which enables the equivalent option that is exposing the problem).
But the board with the megaraid sas card boots very slowly, specifically around the megaraid sas probe:

: ttyS0 at MMIO 0x3f00002f8 (irq = 17, base_baud = 115200) is a 16550A
[ 50.023726][ T1] printk: console [ttyS0] enabled
[ 50.412597][ T1] megasas: 07.714.04.00-rc1
[ 50.436614][ T5] megaraid_sas 0000:08:00.0: FW now in Ready state
[ 50.450079][ T5] megaraid_sas 0000:08:00.0: 63 bit DMA mask and 63 bit consistent mask
[ 50.467811][ T5] megaraid_sas 0000:08:00.0: firmware supports msix : (128)
[ 50.845995][ T5] megaraid_sas 0000:08:00.0: requested/available msix 128/128
[ 50.861476][ T5] megaraid_sas 0000:08:00.0: current msix/online cpus : (128/128)
[ 50.877616][ T5] megaraid_sas 0000:08:00.0: RDPQ mode : (enabled)
[ 50.891018][ T5] megaraid_sas 0000:08:00.0: Current firmware supports maximum commands: 4077 LDIO threshold: 0
[ 51.262942][ T5] megaraid_sas 0000:08:00.0: Performance mode :Latency (latency index = 1)
[ 51.280749][ T5] megaraid_sas 0000:08:00.0: FW supports sync cache : Yes
[ 51.295451][ T5] megaraid_sas 0000:08:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[ 51.387474][ T5] megaraid_sas 0000:08:00.0: FW provided supportMaxExtLDs: 1 max_lds: 64
[ 51.404931][ T5] megaraid_sas 0000:08:00.0: controller type : MR(2048MB)
[ 51.419616][ T5] megaraid_sas 0000:08:00.0: Online Controller Reset(OCR) : Enabled
[ 51.436132][ T5] megaraid_sas 0000:08:00.0: Secure JBOD support : Yes
[ 51.450265][ T5] megaraid_sas 0000:08:00.0: NVMe passthru support : Yes
[ 51.464757][ T5] megaraid_sas 0000:08:00.0: FW provided TM TaskAbort/Reset timeout : 6 secs/60 secs
[ 51.484379][ T5] megaraid_sas 0000:08:00.0: JBOD sequence map support : Yes
[ 51.499607][ T5] megaraid_sas 0000:08:00.0: PCI Lane Margining support : No
[ 51.547610][ T5] megaraid_sas 0000:08:00.0: NVME page size : (4096)
[ 51.608635][ T5] megaraid_sas 0000:08:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
[ 51.630285][ T5] megaraid_sas 0000:08:00.0: INIT adapter done
[ 51.649854][ T5] megaraid_sas 0000:08:00.0: pci id : (0x1000)/(0x0016)/(0x19e5)/(0xd215)
[ 51.667873][ T5] megaraid_sas 0000:08:00.0: unevenspan support : no
[ 51.681646][ T5] megaraid_sas 0000:08:00.0: firmware crash dump : no
[ 51.695596][ T5] megaraid_sas 0000:08:00.0: JBOD sequence map : enabled
[ 51.711521][ T5] megaraid_sas 0000:08:00.0: Max firmware commands: 4076 shared with nr_hw_queues = 127
[ 51.733056][ T5] scsi host0: Avago SAS based MegaRAID driver
[ 65.304363][ T5] scsi 0:0:0:0: Direct-Access ATA SAMSUNG MZ7KH1T9 404Q PQ: 0 ANSI: 6
[ 65.392401][ T5] scsi 0:0:1:0: Direct-Access ATA SAMSUNG MZ7KH1T9 404Q PQ: 0 ANSI: 6
[ 79.508307][ T5] scsi 0:0:65:0: Enclosure HUAWEI Expander 12Gx16 131 PQ: 0 ANSI: 6
[ 183.965109][ C14] random: fast init done

Notice the 14 and 104 second delays. It does boot fully and get to the console, though.

I'll wait to see whether I hit further issues, which you guys seem to experience after a while.

Thanks,
John
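The delays mentioned above can be read straight off the printk timestamps. As a rough sketch, the helper below (the name `find_gaps` and the 5-second threshold are arbitrary choices for illustration) lists every pair of consecutive log entries that are further apart than a threshold. Run on the tail of the log quoted above, it reports gaps of about 13.6 s and 14.1 s around the two disk discoveries and about 104.5 s before "random: fast init done".

```python
import re

# Matches printk timestamps such as "[ 51.733056]"; tags like "[ T5]" or
# "[ C14]" are skipped because they contain no decimal number.
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")

SAMPLE_LOG = """\
[ 51.733056][ T5] scsi host0: Avago SAS based MegaRAID driver
[ 65.304363][ T5] scsi 0:0:0:0: Direct-Access ATA SAMSUNG MZ7KH1T9 404Q PQ: 0 ANSI: 6
[ 65.392401][ T5] scsi 0:0:1:0: Direct-Access ATA SAMSUNG MZ7KH1T9 404Q PQ: 0 ANSI: 6
[ 79.508307][ T5] scsi 0:0:65:0: Enclosure HUAWEI Expander 12Gx16 131 PQ: 0 ANSI: 6
[ 183.965109][ C14] random: fast init done
"""


def find_gaps(log, threshold=5.0):
    """Return (previous, next, delta) for consecutive timestamps that are
    at least `threshold` seconds apart."""
    stamps = [float(s) for s in STAMP.findall(log)]
    return [
        (prev, nxt, round(nxt - prev, 3))
        for prev, nxt in zip(stamps, stamps[1:])
        if nxt - prev >= threshold
    ]


if __name__ == "__main__":
    for prev, nxt, delta in find_gaps(SAMPLE_LOG):
        print(f"{prev:>12.6f} -> {nxt:>12.6f}  gap {delta:.3f} s")
```

The same function works on a flattened, single-line log like the ones in this archive, since it only looks at the bracketed timestamps and not at line breaks.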
On Tue, Nov 10, 2020 at 11:12 PM John Garry <john.garry@huawei.com> wrote:
>
> On 09/11/2020 14:05, John Garry wrote:
> > On 09/11/2020 13:39, Qian Cai wrote:
> >>> I suppose I could try do this myself also, but an authentic version
> >>> would be nicer.
> >> The closest one I have here is:
> >> https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config
> >>
> >> but it only selects the Thunder X2 platform and needs to manually select
> >> CONFIG_MEGARAID_SAS=m to start with, but none of arm64 systems here have
> >> megaraid_sas.
> >
> > Thanks, I'm confident I can fix it up to get it going on my Huawei arm64
> > D06CS.
> >
> > So that board has a megaraid sas card. In addition, it also has hisi_sas
> > HW, which is another storage controller which we enabled this same
> > feature which is causing the problem.
> >
> > I'll report back when I can.
>
> So I had to hack that arm64 config a bit to get it booting:
> https://github.com/hisilicon/kernel-dev/commits/private-topic-sas-5.10-megaraid-hang
>
> Boot is ok on my board without the megaraid sas card, but includes
> hisi_sas HW (which enables the equivalent option which is exposing the
> problem).
>
> But the board with the megaraid sas boots very slowly, specifically
> around the megaraid sas probe:
>
> : ttyS0 at MMIO 0x3f00002f8 (irq = 17, base_baud = 115200) is a 16550A
> [ 50.023726][ T1] printk: console [ttyS0] enabled
> [ 50.412597][ T1] megasas: 07.714.04.00-rc1
> [ 50.436614][ T5] megaraid_sas 0000:08:00.0: FW now in Ready state
> [ 50.450079][ T5] megaraid_sas 0000:08:00.0: 63 bit DMA mask and 63
> bit consistent mask
> [ 50.467811][ T5] megaraid_sas 0000:08:00.0: firmware supports msix
> : (128)
> [ 50.845995][ T5] megaraid_sas 0000:08:00.0: requested/available
> msix 128/128
> [ 50.861476][ T5] megaraid_sas 0000:08:00.0: current msix/online
> cpus : (128/128)
> [ 50.877616][ T5] megaraid_sas 0000:08:00.0: RDPQ mode : (enabled)
> [ 50.891018][ T5] megaraid_sas 0000:08:00.0: Current firmware
> supports maximum commands: 4077 LDIO threshold: 0
> [ 51.262942][ T5] megaraid_sas 0000:08:00.0: Performance mode
> :Latency (latency index = 1)
> [ 51.280749][ T5] megaraid_sas 0000:08:00.0: FW supports sync cache
> : Yes
> [ 51.295451][ T5] megaraid_sas 0000:08:00.0:
> megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
> [ 51.387474][ T5] megaraid_sas 0000:08:00.0: FW provided
> supportMaxExtLDs: 1 max_lds: 64
> [ 51.404931][ T5] megaraid_sas 0000:08:00.0: controller type
> : MR(2048MB)
> [ 51.419616][ T5] megaraid_sas 0000:08:00.0: Online Controller
> Reset(OCR) : Enabled
> [ 51.436132][ T5] megaraid_sas 0000:08:00.0: Secure JBOD support
> : Yes
> [ 51.450265][ T5] megaraid_sas 0000:08:00.0: NVMe passthru support
> : Yes
> [ 51.464757][ T5] megaraid_sas 0000:08:00.0: FW provided TM
> TaskAbort/Reset timeout : 6 secs/60 secs
> [ 51.484379][ T5] megaraid_sas 0000:08:00.0: JBOD sequence map
> support : Yes
> [ 51.499607][ T5] megaraid_sas 0000:08:00.0: PCI Lane Margining
> support : No
> [ 51.547610][ T5] megaraid_sas 0000:08:00.0: NVME page size
> : (4096)
> [ 51.608635][ T5] megaraid_sas 0000:08:00.0:
> megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
> [ 51.630285][ T5] megaraid_sas 0000:08:00.0: INIT adapter done
> [ 51.649854][ T5] megaraid_sas 0000:08:00.0: pci id
> : (0x1000)/(0x0016)/(0x19e5)/(0xd215)
> [ 51.667873][ T5] megaraid_sas 0000:08:00.0: unevenspan support : no
> [ 51.681646][ T5] megaraid_sas 0000:08:00.0: firmware crash dump : no
> [ 51.695596][ T5] megaraid_sas 0000:08:00.0: JBOD sequence map
> : enabled
> [ 51.711521][ T5] megaraid_sas 0000:08:00.0: Max firmware commands:
> 4076 shared with nr_hw_queues = 127
> [ 51.733056][ T5] scsi host0: Avago SAS based MegaRAID driver
> [ 65.304363][ T5] scsi 0:0:0:0: Direct-Access ATA SAMSUNG
> MZ7KH1T9 404Q PQ: 0 ANSI: 6
> [ 65.392401][ T5] scsi 0:0:1:0: Direct-Access ATA SAMSUNG
> MZ7KH1T9 404Q PQ: 0 ANSI: 6
> [ 79.508307][ T5] scsi 0:0:65:0: Enclosure HUAWEI
> Expander 12Gx16 131 PQ: 0 ANSI: 6
> [ 183.965109][ C14] random: fast init done
>
> Notice the 14 and 104 second delays.
>
> But does boot fully to get to the console. I'll wait for further issues,
> which you guys seem to experience after a while.
>
> Thanks,
> John

The "megaraid_sas" driver calls “scsi_scan_host()” to discover SCSI devices. In this failure case, scsi_scan_host() is taking a long time to complete, hence causing the delay in system boot.
With "host_tagset" enabled, scsi_scan_host() takes around 20 mins.
With "host_tagset" disabled, scsi_scan_host() takes up to 5-8 mins.

The scan time depends upon the number of SCSI channels and the number of devices per SCSI channel exposed by the LLD.
The megaraid_sas driver exposes 4 channels and 128 drives per channel.

Each target scan takes 2 seconds (in case of failure with host_tagset enabled). That's why driver load completes after ~20 minutes. See below:

[ 299.725271] kobject: 'target18:0:96': free name
[ 301.681267] kobject: 'target18:0:97' (00000000987c7f11): kobject_cleanup, parent 0000000000000000
[ 301.681269] kobject: 'target18:0:97' (00000000987c7f11): calling ktype release
[ 301.681273] kobject: 'target18:0:97': free name
[ 303.575268] kobject: 'target18:0:98' (00000000a8c34149): kobject_cleanup, parent 0000000000000000

In Qian's kernel .config, async SCSI scan is disabled, so in the failure case the SCSI scan type is synchronous.
Below is the stack trace when scsi_scan_host() hangs:

[<0>] __wait_rcu_gp+0x134/0x170
[<0>] synchronize_rcu.part.80+0x53/0x60
[<0>] blk_free_flush_queue+0x12/0x30
[<0>] blk_mq_hw_sysfs_release+0x21/0x70
[<0>] kobject_release+0x46/0x150
[<0>] blk_mq_release+0xb4/0xf0
[<0>] blk_release_queue+0xc4/0x130
[<0>] kobject_release+0x46/0x150
[<0>] scsi_device_dev_release_usercontext+0x194/0x3f0
[<0>] execute_in_process_context+0x22/0xa0
[<0>] device_release+0x2e/0x80
[<0>] kobject_release+0x46/0x150
[<0>] scsi_alloc_sdev+0x2e7/0x310
[<0>] scsi_probe_and_add_lun+0x410/0xbd0
[<0>] __scsi_scan_target+0xf2/0x530
[<0>] scsi_scan_channel.part.7+0x51/0x70
[<0>] scsi_scan_host_selected+0xd4/0x140
[<0>] scsi_scan_host+0x198/0x1c0

This issue hits when lock related debugging is enabled in the kernel config.
The following kernel .config parameters (possibly only a subset of this list) are required to hit the issue:

CONFIG_PREEMPT_COUNT=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_TRACE_IRQFLAGS_NMI=y
CONFIG_DEBUG_KOBJECT=y
CONFIG_PROVE_RCU=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y

When scsi_scan_host() hangs, there are no outstanding IOs in the megaraid_sas driver-firmware stack, as the SCSI "host_busy" counter and the megaraid_sas driver's internal counter are both "0".

Key takeaways:
1. The issue is observed when lock related debugging is enabled, so it is seen in a debug environment.
2. The issue seems to be related to the generic shared "host_tagset" code whenever some kind of kernel debugging is enabled. We do not see an immediate reason to hide this issue by disabling the "host_tagset" feature.

John,
The issue may hit on the ARM platform too, using Qian's .config file, with other adapters (e.g. hisi_sas) as well. So I feel disabling “host_tagset” in the megaraid_sas driver will not help. It requires debugging from the “Entire Shared host tag feature” perspective, as the scsi_scan_host() wait time gets worse when "host_tagset" is enabled. Also, I am doing parallel debugging and if I find anything useful, I will share.

Qian,
I need full dmesg logs from your setup with megaraid_sas.host_tagset_enable=1 and megaraid_sas.host_tagset_enable=0. Please wait for a long time. I just want to make sure that whatever you observe is the same as mine.

Thanks,
Sumit
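As a rough cross-check of the numbers quoted above (4 channels, 128 drives per channel, about 2 seconds per target scan), the total synchronous scan time can be estimated with a few lines of C. This is only a back-of-the-envelope sketch: estimate_scan_seconds() and its parameter names are hypothetical and not part of the kernel or the driver, and it assumes every target ID is probed sequentially at a fixed cost.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel API): total time for a synchronous
 * scan if every target ID on every channel is probed one after another
 * at a fixed per-target cost.
 */
unsigned long estimate_scan_seconds(unsigned int channels,
				    unsigned int targets_per_channel,
				    unsigned int seconds_per_target)
{
	/* A scan over zero channels or zero targets makes no sense here. */
	assert(channels > 0 && targets_per_channel > 0);

	return (unsigned long)channels * targets_per_channel *
	       seconds_per_target;
}
```

With the values from this thread, estimate_scan_seconds(4, 128, 2) evaluates to 1024 seconds, i.e. roughly 17 minutes, which is in the same range as the ~20 minutes reported.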
On Wed, Nov 11, 2020 at 12:57:59PM +0530, Sumit Saxena wrote:
> On Tue, Nov 10, 2020 at 11:12 PM John Garry <john.garry@huawei.com> wrote:
> >
> > On 09/11/2020 14:05, John Garry wrote:
> > > On 09/11/2020 13:39, Qian Cai wrote:
> > >>> I suppose I could try do this myself also, but an authentic version
> > >>> would be nicer.
> > >> The closest one I have here is:
> > >> https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config
> > >>
> > >> but it only selects the Thunder X2 platform and needs to manually select
> > >> CONFIG_MEGARAID_SAS=m to start with, but none of arm64 systems here have
> > >> megaraid_sas.
> > >
> > > Thanks, I'm confident I can fix it up to get it going on my Huawei arm64
> > > D06CS.
> > >
> > > So that board has a megaraid sas card. In addition, it also has hisi_sas
> > > HW, which is another storage controller which we enabled this same
> > > feature which is causing the problem.
> > >
> > > I'll report back when I can.
> >
> > So I had to hack that arm64 config a bit to get it booting:
> > https://github.com/hisilicon/kernel-dev/commits/private-topic-sas-5.10-megaraid-hang
> >
> > Boot is ok on my board without the megaraid sas card, but includes
> > hisi_sas HW (which enables the equivalent option which is exposing the
> > problem).
> >
> > But the board with the megaraid sas boots very slowly, specifically
> > around the megaraid sas probe:
> >
> > : ttyS0 at MMIO 0x3f00002f8 (irq = 17, base_baud = 115200) is a 16550A
> > [ 50.023726][ T1] printk: console [ttyS0] enabled
> > [ 50.412597][ T1] megasas: 07.714.04.00-rc1
> > [ 50.436614][ T5] megaraid_sas 0000:08:00.0: FW now in Ready state
> > [ 50.450079][ T5] megaraid_sas 0000:08:00.0: 63 bit DMA mask and 63
> > bit consistent mask
> > [ 50.467811][ T5] megaraid_sas 0000:08:00.0: firmware supports msix
> > : (128)
> > [ 50.845995][ T5] megaraid_sas 0000:08:00.0: requested/available
> > msix 128/128
> > [ 50.861476][ T5] megaraid_sas 0000:08:00.0: current msix/online
> > cpus : (128/128)
> > [ 50.877616][ T5] megaraid_sas 0000:08:00.0: RDPQ mode : (enabled)
> > [ 50.891018][ T5] megaraid_sas 0000:08:00.0: Current firmware
> > supports maximum commands: 4077 LDIO threshold: 0
> > [ 51.262942][ T5] megaraid_sas 0000:08:00.0: Performance mode
> > :Latency (latency index = 1)
> > [ 51.280749][ T5] megaraid_sas 0000:08:00.0: FW supports sync cache
> > : Yes
> > [ 51.295451][ T5] megaraid_sas 0000:08:00.0:
> > megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
> > [ 51.387474][ T5] megaraid_sas 0000:08:00.0: FW provided
> > supportMaxExtLDs: 1 max_lds: 64
> > [ 51.404931][ T5] megaraid_sas 0000:08:00.0: controller type
> > : MR(2048MB)
> > [ 51.419616][ T5] megaraid_sas 0000:08:00.0: Online Controller
> > Reset(OCR) : Enabled
> > [ 51.436132][ T5] megaraid_sas 0000:08:00.0: Secure JBOD support
> > : Yes
> > [ 51.450265][ T5] megaraid_sas 0000:08:00.0: NVMe passthru support
> > : Yes
> > [ 51.464757][ T5] megaraid_sas 0000:08:00.0: FW provided TM
> > TaskAbort/Reset timeout : 6 secs/60 secs
> > [ 51.484379][ T5] megaraid_sas 0000:08:00.0: JBOD sequence map
> > support : Yes
> > [ 51.499607][ T5] megaraid_sas 0000:08:00.0: PCI Lane Margining
> > support : No
> > [ 51.547610][ T5] megaraid_sas 0000:08:00.0: NVME page size
> > : (4096)
> > [ 51.608635][ T5] megaraid_sas 0000:08:00.0:
> > megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
> > [ 51.630285][ T5] megaraid_sas 0000:08:00.0: INIT adapter done
> > [ 51.649854][ T5] megaraid_sas 0000:08:00.0: pci id
> > : (0x1000)/(0x0016)/(0x19e5)/(0xd215)
> > [ 51.667873][ T5] megaraid_sas 0000:08:00.0: unevenspan support : no
> > [ 51.681646][ T5] megaraid_sas 0000:08:00.0: firmware crash dump : no
> > [ 51.695596][ T5] megaraid_sas 0000:08:00.0: JBOD sequence map
> > : enabled
> > [ 51.711521][ T5] megaraid_sas 0000:08:00.0: Max firmware commands:
> > 4076 shared with nr_hw_queues = 127
> > [ 51.733056][ T5] scsi host0: Avago SAS based MegaRAID driver
> > [ 65.304363][ T5] scsi 0:0:0:0: Direct-Access ATA SAMSUNG
> > MZ7KH1T9 404Q PQ: 0 ANSI: 6
> > [ 65.392401][ T5] scsi 0:0:1:0: Direct-Access ATA SAMSUNG
> > MZ7KH1T9 404Q PQ: 0 ANSI: 6
> > [ 79.508307][ T5] scsi 0:0:65:0: Enclosure HUAWEI
> > Expander 12Gx16 131 PQ: 0 ANSI: 6
> > [ 183.965109][ C14] random: fast init done
> >
> > Notice the 14 and 104 second delays.
> >
> > But does boot fully to get to the console. I'll wait for further issues,
> > which you guys seem to experience after a while.
> >
> > Thanks,
> > John
> "megaraid_sas" driver calls “scsi_scan_host()” to discover SCSI
> devices. In this failure case, scsi_scan_host() is taking a long time
> to complete, hence causing delay in system boot.
> With "host_tagset" enabled, scsi_scan_host() takes around 20 mins.
> With "host_tagset" disabled, scsi_scan_host() takes upto 5-8 mins.
>
> The scan time depends upon the number of scsi channels and devices per
> scsi channel is exposed by LLD.
> megaraid_sas driver exposes 4 channels and 128 drives per channel.
>
> Each target scan takes 2 seconds (in case of failure with host_tagset
> enabled). That's why driver load completes after ~20 minutes. See
> below:
>
> [ 299.725271] kobject: 'target18:0:96': free name
> [ 301.681267] kobject: 'target18:0:97' (00000000987c7f11):
> kobject_cleanup, parent 0000000000000000
> [ 301.681269] kobject: 'target18:0:97' (00000000987c7f11): calling
> ktype release
> [ 301.681273] kobject: 'target18:0:97': free name
> [ 303.575268] kobject: 'target18:0:98' (00000000a8c34149):
> kobject_cleanup, parent 0000000000000000
>
> In Qian's kernel .config, async scsi scan is disabled so in failure
> case SCSI scan type is synchronous.
> Below is the stack trace when scsi_scan_host() hangs:
>
> [<0>] __wait_rcu_gp+0x134/0x170
> [<0>] synchronize_rcu.part.80+0x53/0x60
> [<0>] blk_free_flush_queue+0x12/0x30

Can this issue disappear by applying the following change?

diff --git a/block/blk-flush.c b/block/blk-flush.c
index e32958f0b687..b1fe6176d77f 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -469,9 +469,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 	INIT_LIST_HEAD(&fq->flush_queue[1]);
 	INIT_LIST_HEAD(&fq->flush_data_in_flight);
 
-	lockdep_register_key(&fq->key);
-	lockdep_set_class(&fq->mq_flush_lock, &fq->key);
-
 	return fq;
 
 fail_rq:
@@ -486,7 +483,6 @@ void blk_free_flush_queue(struct blk_flush_queue *fq)
 	if (!fq)
 		return;
 
-	lockdep_unregister_key(&fq->key);
 	kfree(fq->flush_rq);
 	kfree(fq);
 }

Thanks,
Ming
>
> Can this issue disappear by applying the following change?

This change fixes the issue for me.

Qian,
Please try after applying changes suggested by Ming.

Thanks,
Sumit

>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index e32958f0b687..b1fe6176d77f 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -469,9 +469,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
>  	INIT_LIST_HEAD(&fq->flush_queue[1]);
>  	INIT_LIST_HEAD(&fq->flush_data_in_flight);
>
> -	lockdep_register_key(&fq->key);
> -	lockdep_set_class(&fq->mq_flush_lock, &fq->key);
> -
>  	return fq;
>
>  fail_rq:
> @@ -486,7 +483,6 @@ void blk_free_flush_queue(struct blk_flush_queue *fq)
>  	if (!fq)
>  		return;
>
> -	lockdep_unregister_key(&fq->key);
>  	kfree(fq->flush_rq);
>  	kfree(fq);
>  }
>
>
> Thanks,
> Ming
>
>
> In Qian's kernel .config, async scsi scan is disabled so in failure
> case SCSI scan type is synchronous.
> Below is the stack trace when scsi_scan_host() hangs:
>
> [<0>] __wait_rcu_gp+0x134/0x170
> [<0>] synchronize_rcu.part.80+0x53/0x60
> [<0>] blk_free_flush_queue+0x12/0x30
> [<0>] blk_mq_hw_sysfs_release+0x21/0x70

this is per blk_mq_hw_ctx

> [<0>] kobject_release+0x46/0x150
> [<0>] blk_mq_release+0xb4/0xf0
> [<0>] blk_release_queue+0xc4/0x130
> [<0>] kobject_release+0x46/0x150
> [<0>] scsi_device_dev_release_usercontext+0x194/0x3f0
> [<0>] execute_in_process_context+0x22/0xa0
> [<0>] device_release+0x2e/0x80
> [<0>] kobject_release+0x46/0x150
> [<0>] scsi_alloc_sdev+0x2e7/0x310
> [<0>] scsi_probe_and_add_lun+0x410/0xbd0
> [<0>] __scsi_scan_target+0xf2/0x530
> [<0>] scsi_scan_channel.part.7+0x51/0x70
> [<0>] scsi_scan_host_selected+0xd4/0x140
> [<0>] scsi_scan_host+0x198/0x1c0
>
> This issue hits when lock related debugging is enabled in kernel config.
> kernel .config parameters(may be subset of this list) are required to
> hit the issue:
>
> CONFIG_PREEMPT_COUNT=y *
> CONFIG_UNINLINE_SPIN_UNLOCK=y *
> CONFIG_LOCK_STAT=y
> CONFIG_DEBUG_RT_MUTEXES=y *
> CONFIG_DEBUG_SPINLOCK=y *
> CONFIG_DEBUG_MUTEXES=y *
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y *
> CONFIG_DEBUG_RWSEMS=y *
> CONFIG_DEBUG_LOCK_ALLOC=y *
> CONFIG_LOCKDEP=y *
> CONFIG_DEBUG_LOCKDEP=y
> CONFIG_TRACE_IRQFLAGS=y *
> CONFIG_TRACE_IRQFLAGS_NMI=y
> CONFIG_DEBUG_KOBJECT=y
> CONFIG_PROVE_RCU=y *
> CONFIG_PREEMPTIRQ_TRACEPOINTS=y *

(* marks the options that I had enabled)

>
> When scsi_scan_host() hangs, there are no outstanding IOs with
> megaraid_sas driver-firmware stack as SCSI "host_busy" counter and
> megaraid_sas driver's internal counter are "0".
> Key takeaways:
> 1. Issue is observed when lock related debugging is enabled so issue
> is seen in debug environment.
> 2. Issue seems to be related to generic shared "host_tagset" code
> whenever some kind of kernel debugging is enabled. We do not see an
> immediate reason to hide this issue through disabling the
> "host_tagset" feature.
>
> John,
> Issue may hit on ARM platform too using Qian's .config file with other
> adapters (e.g. hisi_sas) as well. So I feel disabling “host_tagset” in
> megaraid_sas driver will not help. It requires debugging from the
> “Entire Shared host tag feature” perspective as scsi_scan_host()
> waittime aggravates when "host_tagset" is enabled. Also, I am doing
> parallel debugging and if I find anything useful, I will share.

So isn't this really just related to how many HW queues we expose, with the larger number scaling up the time? For megaraid sas, it's 1->128 for my arm64 platform when host_tagset_enable=1.

As a hack, I tried this (while keeping host_tagset_enable=1):

@@ -6162,11 +6168,15 @@ static int megasas_init_fw(struct megasas_instance *instance)
 		else
 			instance->low_latency_index_start = 1;

-		num_msix_req = num_online_cpus() + instance->low_latency_index_start;
+		num_msix_req = 6 + instance->low_latency_index_start;

(6 is an arbitrary small number)

And boot time is nearly the same as with host_tagset_enable=0.

For hisi_sas, the max HW queue number is only ever 16. In addition, we don't scan each channel/id/lun for hisi_sas, as it has a scan handler.

>
> Qian,
> I need full dmesg logs from your setup with
> megaraid_sas.host_tagset_enable=1 and
> megaraid_sas.host_tagset_enable=0. Please wait for a long time. I just
> want to make sure that whatever you observe is the same as mine.
>

Thanks,
John
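To make the scaling argument concrete, here is a small userspace sketch in C. It is a model under stated assumptions, not driver code: model_nr_hw_queues() mirrors the nr_hw_queues selection from the patch in this thread, while model_rcu_waits() assumes that every released request queue frees one flush queue per hw queue and that each free waits for one RCU grace period (as the stack trace above suggests when lockdep is enabled). Both function names are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical model of the nr_hw_queues choice made in
 * megasas_io_attach() by the patch under discussion.
 */
unsigned int model_nr_hw_queues(int is_fusion, unsigned int msix_vectors,
				unsigned int low_latency_index_start,
				int host_tagset_enable,
				int smp_affinity_enable)
{
	if (is_fusion && msix_vectors > low_latency_index_start &&
	    host_tagset_enable && smp_affinity_enable)
		return msix_vectors - low_latency_index_start;

	return 1; /* single hw queue, shared host tagset not used */
}

/*
 * Assumption: each released request queue tears down one flush queue
 * per hw queue, and each teardown waits for one RCU grace period.
 */
unsigned long model_rcu_waits(unsigned long released_queues,
			      unsigned int nr_hw_queues)
{
	return released_queues * nr_hw_queues;
}
```

With 128 MSI-X vectors and a latency index of 1, the model gives 127 hw queues, matching the "nr_hw_queues = 127" line in the boot log; if the hack above ends up with 7 vectors, it gives 6. Under the stated assumption, scanning 4 x 128 = 512 target IDs would then mean 512 * 127 = 65024 grace-period waits instead of 512 with a single hw queue.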
On Wed, 2020-11-11 at 17:27 +0800, Ming Lei wrote:
> Can this issue disappear by applying the following change?

This makes the system boot again as well.

>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index e32958f0b687..b1fe6176d77f 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -469,9 +469,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node,
> int cmd_size,
>  	INIT_LIST_HEAD(&fq->flush_queue[1]);
>  	INIT_LIST_HEAD(&fq->flush_data_in_flight);
>
> -	lockdep_register_key(&fq->key);
> -	lockdep_set_class(&fq->mq_flush_lock, &fq->key);
> -
>  	return fq;
>
>  fail_rq:
> @@ -486,7 +483,6 @@ void blk_free_flush_queue(struct blk_flush_queue *fq)
>  	if (!fq)
>  		return;
>
> -	lockdep_unregister_key(&fq->key);
>  	kfree(fq->flush_rq);
>  	kfree(fq);
>  }
>
>
> Thanks,
> Ming
On Wed, Nov 11, 2020 at 09:42:17AM -0500, Qian Cai wrote:
> On Wed, 2020-11-11 at 17:27 +0800, Ming Lei wrote:
> > Can this issue disappear by applying the following change?
>
> This makes the system boot again as well.

OK, it actually isn't necessary to register a new lock key for each hctx (blk_flush_queue) instance, and the current approach is really overkill because there can be lots of hw queues in one system.

The original lockdep warning can be avoided simply by setting one nvme_loop specific lock class. If nvme_loop is backed by another nvme_loop, we can still avoid the warning by killing the direct end io chain, or by assigning another lock class.

Will prepare a formal patch tomorrow.

Thanks,
Ming
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 861f7140f52e..6960922d0d7f 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -37,6 +37,7 @@
 #include <linux/poll.h>
 #include <linux/vmalloc.h>
 #include <linux/irq_poll.h>
+#include <linux/blk-mq-pci.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -113,6 +114,10 @@ unsigned int enable_sdev_max_qd;
 module_param(enable_sdev_max_qd, int, 0444);
 MODULE_PARM_DESC(enable_sdev_max_qd, "Enable sdev max qd as can_queue. Default: 0");
 
+int host_tagset_enable = 1;
+module_param(host_tagset_enable, int, 0444);
+MODULE_PARM_DESC(host_tagset_enable, "Shared host tagset enable/disable Default: enable(1)");
+
 MODULE_LICENSE("GPL");
 MODULE_VERSION(MEGASAS_VERSION);
 MODULE_AUTHOR("megaraidlinux.pdl@broadcom.com");
@@ -3119,6 +3124,19 @@ megasas_bios_param(struct scsi_device *sdev, struct block_device *bdev,
 	return 0;
 }
 
+static int megasas_map_queues(struct Scsi_Host *shost)
+{
+	struct megasas_instance *instance;
+
+	instance = (struct megasas_instance *)shost->hostdata;
+
+	if (shost->nr_hw_queues == 1)
+		return 0;
+
+	return blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
+			instance->pdev, instance->low_latency_index_start);
+}
+
 static void megasas_aen_polling(struct work_struct *work);
 
 /**
@@ -3427,6 +3445,7 @@ static struct scsi_host_template megasas_template = {
 	.eh_timed_out = megasas_reset_timer,
 	.shost_attrs = megaraid_host_attrs,
 	.bios_param = megasas_bios_param,
+	.map_queues = megasas_map_queues,
 	.change_queue_depth = scsi_change_queue_depth,
 	.max_segment_size = 0xffffffff,
 };
@@ -6808,6 +6827,26 @@ static int megasas_io_attach(struct megasas_instance *instance)
 	host->max_lun = MEGASAS_MAX_LUN;
 	host->max_cmd_len = 16;
 
+	/* Use shared host tagset only for fusion adaptors
+	 * if there are managed interrupts (smp affinity enabled case).
+	 * Single msix_vectors in kdump, so shared host tag is also disabled.
+	 */
+
+	host->host_tagset = 0;
+	host->nr_hw_queues = 1;
+
+	if ((instance->adapter_type != MFI_SERIES) &&
+		(instance->msix_vectors > instance->low_latency_index_start) &&
+		host_tagset_enable &&
+		instance->smp_affinity_enable) {
+		host->host_tagset = 1;
+		host->nr_hw_queues = instance->msix_vectors -
+			instance->low_latency_index_start;
+	}
+
+	dev_info(&instance->pdev->dev,
+		"Max firmware commands: %d shared with nr_hw_queues = %d\n",
+		instance->max_fw_cmds, host->nr_hw_queues);
 	/*
 	 * Notify the mid-layer about the new controller
 	 */
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 0824410f78f8..a4251121f173 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -359,24 +359,29 @@ megasas_get_msix_index(struct megasas_instance *instance,
 {
 	int sdev_busy;
 
-	/* nr_hw_queue = 1 for MegaRAID */
-	struct blk_mq_hw_ctx *hctx =
-		scmd->device->request_queue->queue_hw_ctx[0];
-
-	sdev_busy = atomic_read(&hctx->nr_active);
+	/* TBD - if sml remove device_busy in future, driver
+	 * should track counter in internal structure.
+	 */
+	sdev_busy = atomic_read(&scmd->device->device_busy);
 
 	if (instance->perf_mode == MR_BALANCED_PERF_MODE &&
-	    sdev_busy > (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH))
+	    sdev_busy > (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH)) {
 		cmd->request_desc->SCSIIO.MSIxIndex =
 			mega_mod64((atomic64_add_return(1, &instance->high_iops_outstanding) /
 					MR_HIGH_IOPS_BATCH_COUNT), instance->low_latency_index_start);
-	else if (instance->msix_load_balance)
+	} else if (instance->msix_load_balance) {
 		cmd->request_desc->SCSIIO.MSIxIndex =
 			(mega_mod64(atomic64_add_return(1, &instance->total_io_count),
 				instance->msix_vectors));
-	else
+	} else if (instance->host->nr_hw_queues > 1) {
+		u32 tag = blk_mq_unique_tag(scmd->request);
+
+		cmd->request_desc->SCSIIO.MSIxIndex = blk_mq_unique_tag_to_hwq(tag) +
+			instance->low_latency_index_start;
+	} else {
 		cmd->request_desc->SCSIIO.MSIxIndex =
 			instance->reply_map[raw_smp_processor_id()];
+	}
 }
 
 /**
@@ -956,9 +961,6 @@ megasas_alloc_cmds_fusion(struct megasas_instance *instance)
 	if (megasas_alloc_cmdlist_fusion(instance))
 		goto fail_exit;
 
-	dev_info(&instance->pdev->dev, "Configured max firmware commands: %d\n",
-		 instance->max_fw_cmds);
-
 	/* The first 256 bytes (SMID 0) is not used. Don't add to the cmd list */
 	io_req_base = fusion->io_request_frames + MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE;
 	io_req_base_phys = fusion->io_request_frames_phys + MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE;
@@ -1102,8 +1104,9 @@ megasas_ioc_init_fusion(struct megasas_instance *instance)
 		MR_HIGH_IOPS_QUEUE_COUNT) && cur_intr_coalescing)
 		instance->perf_mode = MR_BALANCED_PERF_MODE;
 
-	dev_info(&instance->pdev->dev, "Performance mode :%s\n",
-		MEGASAS_PERF_MODE_2STR(instance->perf_mode));
+	dev_info(&instance->pdev->dev, "Performance mode :%s (latency index = %d)\n",
+		MEGASAS_PERF_MODE_2STR(instance->perf_mode),
+		instance->low_latency_index_start);
 
 	instance->fw_sync_cache_support = (scratch_pad_1 &
 		MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0;
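
Note on the new MSI-x selection branch in megasas_get_msix_index(): when nr_hw_queues > 1, the reply queue is derived from the blk-mq unique tag, offset by low_latency_index_start. The following userspace sketch in C illustrates that arithmetic. It is a model, not kernel code: all model_* names are hypothetical, and the 16-bit split between hw queue index and per-queue tag is an assumption meant to mirror how blk_mq_unique_tag() packs its value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: upper 16 bits carry the hw queue index, lower 16 the tag. */
#define MODEL_UNIQUE_TAG_BITS 16u
#define MODEL_UNIQUE_TAG_MASK ((1u << MODEL_UNIQUE_TAG_BITS) - 1u)

/* Pack a hw queue index and a per-queue tag into one 32-bit value. */
uint32_t model_unique_tag(uint16_t hwq, uint16_t tag)
{
	return ((uint32_t)hwq << MODEL_UNIQUE_TAG_BITS) |
	       (tag & MODEL_UNIQUE_TAG_MASK);
}

/* Recover the hw queue index from the packed value. */
uint16_t model_unique_tag_to_hwq(uint32_t unique_tag)
{
	return (uint16_t)(unique_tag >> MODEL_UNIQUE_TAG_BITS);
}

/*
 * Reply queue index as in the new branch of the patch: hw queue index
 * plus the number of vectors reserved ahead of the I/O queues.
 */
uint16_t model_msix_index(uint32_t unique_tag,
			  uint16_t low_latency_index_start)
{
	return (uint16_t)(model_unique_tag_to_hwq(unique_tag) +
			  low_latency_index_start);
}
```

For example, a request on hw queue 5 with low_latency_index_start = 1 maps to MSI-x index 6 in this model, which keeps each hw queue on its own completion vector and skips the vectors below low_latency_index_start.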