@@ -1209,6 +1209,13 @@ struct lpfc_hba {
uint64_t ktime_seg10_min;
uint64_t ktime_seg10_max;
#endif
+
+ struct hlist_node cpuhp; /* used for cpuhp per hba callback */
+ struct timer_list cpuhp_poll_timer;
+ struct list_head poll_list; /* slowpath eq polling list */
+#define LPFC_POLL_HB 1 /* slowpath heartbeat interval in msecs */
+#define LPFC_POLL_FASTPATH 0 /* called from fastpath */
+#define LPFC_POLL_SLOWPATH 1 /* called from slowpath */
};
static inline struct Scsi_Host *
@@ -215,6 +215,12 @@ irqreturn_t lpfc_sli_fp_intr_handler(int, void *);
irqreturn_t lpfc_sli4_intr_handler(int, void *);
irqreturn_t lpfc_sli4_hba_intr_handler(int, void *);
+inline void lpfc_sli4_cleanup_poll_list(struct lpfc_hba *phba);
+int lpfc_sli4_poll_eq(struct lpfc_queue *q, uint8_t path);
+void lpfc_sli4_poll_hbtimer(struct timer_list *t);
+void lpfc_sli4_start_polling(struct lpfc_queue *q);
+void lpfc_sli4_stop_polling(struct lpfc_queue *q);
+
void lpfc_read_rev(struct lpfc_hba *, LPFC_MBOXQ_t *);
void lpfc_sli4_swap_str(struct lpfc_hba *, LPFC_MBOXQ_t *);
void lpfc_config_ring(struct lpfc_hba *, int, LPFC_MBOXQ_t *);
@@ -40,6 +40,7 @@
#include <linux/irq.h>
#include <linux/bitops.h>
#include <linux/crash_dump.h>
+#include <linux/cpuhotplug.h>
#include <scsi/scsi.h>
#include <scsi/scsi_device.h>
@@ -66,9 +67,13 @@
#include "lpfc_version.h"
#include "lpfc_ids.h"
+static enum cpuhp_state lpfc_cpuhp_state;
/* Used when mapping IRQ vectors in a driver centric manner */
static uint32_t lpfc_present_cpu;
+static void __lpfc_cpuhp_remove(struct lpfc_hba *phba);
+static void lpfc_cpuhp_remove(struct lpfc_hba *phba);
+static void lpfc_cpuhp_add(struct lpfc_hba *phba);
static void lpfc_get_hba_model_desc(struct lpfc_hba *, uint8_t *, uint8_t *);
static int lpfc_post_rcv_buf(struct lpfc_hba *);
static int lpfc_sli4_queue_verify(struct lpfc_hba *);
@@ -3387,6 +3392,8 @@ lpfc_online(struct lpfc_hba *phba)
if (phba->cfg_xri_rebalancing)
lpfc_create_multixri_pools(phba);
+ lpfc_cpuhp_add(phba);
+
lpfc_unblock_mgmt_io(phba);
return 0;
}
@@ -3545,6 +3552,7 @@ lpfc_offline(struct lpfc_hba *phba)
spin_unlock_irq(shost->host_lock);
}
lpfc_destroy_vport_work_array(phba, vports);
+ __lpfc_cpuhp_remove(phba);
if (phba->cfg_xri_rebalancing)
lpfc_destroy_multixri_pools(phba);
@@ -9160,6 +9168,8 @@ lpfc_sli4_queue_destroy(struct lpfc_hba *phba)
}
spin_unlock_irq(&phba->hbalock);
+ lpfc_sli4_cleanup_poll_list(phba);
+
/* Release HBA eqs */
if (phba->sli4_hba.hdwq)
lpfc_sli4_release_hdwq(phba);
@@ -10962,6 +10972,170 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
return;
}
+/**
+ * lpfc_cpuhp_get_eq - collect the eqs that need polling for an offline cpu
+ *
+ * @phba: pointer to lpfc hba data structure.
+ * @cpu: cpu going offline
+ * @eqlist: list to be filled with the eqs whose irq vector will be
+ *          shut down when @cpu goes offline
+ */
+static void
+lpfc_cpuhp_get_eq(struct lpfc_hba *phba, unsigned int cpu,
+ struct list_head *eqlist)
+{
+ struct lpfc_vector_map_info *map;
+ const struct cpumask *maskp;
+ struct lpfc_queue *eq;
+ unsigned int i;
+ cpumask_t tmp;
+ u16 idx;
+
+ for (idx = 0; idx < phba->cfg_irq_chann; idx++) {
+ maskp = pci_irq_get_affinity(phba->pcidev, idx);
+ if (!maskp)
+ continue;
+ /*
+ * If the irq is not affinitized to the cpu going
+ * offline, then we don't need to poll the eq
+ * attached to it.
+ */
+ if (!cpumask_and(&tmp, maskp, cpumask_of(cpu)))
+ continue;
+ /* Get the cpus that are online and affinitized
+ * to this irq vector. If more than one such cpu
+ * exists, cpuhp is not going to shut down this
+ * vector. Since this cpu has not gone offline
+ * yet, it still counts, hence the > 1 test.
+ */
+ cpumask_and(&tmp, maskp, cpu_online_mask);
+ if (cpumask_weight(&tmp) > 1)
+ continue;
+
+ /* Now that we have an irq to shut down, get the eq
+ * mapped to this irq. Note: multiple hdwq's in
+ * the software can share an eq, but eventually
+ * only one eq will be mapped to this vector.
+ */
+ for_each_possible_cpu(i) {
+ map = &phba->sli4_hba.cpu_map[i];
+ if (!(map->irq == pci_irq_vector(phba->pcidev, idx)))
+ continue;
+ eq = phba->sli4_hba.hdwq[map->hdwq].hba_eq;
+ list_add(&eq->_poll_list, eqlist);
+ /* one match is enough; other cpus on this vector share the eq */
+ break;
+ }
+ }
+}
+
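+/* Detach this hba instance from the cpuhp state without invoking the
+ * callbacks, then wait out any RCU readers and stop the poll timer.
+ */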
+static void __lpfc_cpuhp_remove(struct lpfc_hba *phba)
+{
+ if (phba->sli_rev != LPFC_SLI_REV4)
+ return;
+
+ cpuhp_state_remove_instance_nocalls(lpfc_cpuhp_state,
+ &phba->cpuhp);
+ /*
+ * unregistering the instance doesn't stop the polling
+ * timer. Wait for the poll timer to retire.
+ */
+ synchronize_rcu();
+ del_timer_sync(&phba->cpuhp_poll_timer);
+}
+
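+/* Unregister the cpuhp instance on the unset path. If the port is
+ * already offline, lpfc_offline() has done the removal already.
+ */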
+static void lpfc_cpuhp_remove(struct lpfc_hba *phba)
+{
+ if (phba->pport->fc_flag & FC_OFFLINE_MODE)
+ return;
+
+ __lpfc_cpuhp_remove(phba);
+}
+
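+/* Re-register this hba with the cpuhp state when the port comes online
+ * and, if any eqs are still queued for slowpath polling, restart the
+ * heartbeat timer.
+ */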
+static void lpfc_cpuhp_add(struct lpfc_hba *phba)
+{
+ if (phba->sli_rev != LPFC_SLI_REV4)
+ return;
+
+ rcu_read_lock();
+
+ if (!list_empty(&phba->poll_list)) {
+ timer_setup(&phba->cpuhp_poll_timer, lpfc_sli4_poll_hbtimer, 0);
+ mod_timer(&phba->cpuhp_poll_timer,
+ jiffies + msecs_to_jiffies(LPFC_POLL_HB));
+ }
+
+ rcu_read_unlock();
+
+ cpuhp_state_add_instance_nocalls(lpfc_cpuhp_state,
+ &phba->cpuhp);
+}
+
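+/* Common early-exit checks for the hotplug callbacks. Returns true
+ * with *retval set when the callback should not proceed (driver is
+ * unloading, or the adapter is not SLI4).
+ */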
+static int __lpfc_cpuhp_checks(struct lpfc_hba *phba, int *retval)
+{
+ if (phba->pport->load_flag & FC_UNLOADING) {
+ *retval = -EAGAIN;
+ return true;
+ }
+
+ if (phba->sli_rev != LPFC_SLI_REV4) {
+ *retval = 0;
+ return true;
+ }
+
+ /* proceed with the hotplug */
+ return false;
+}
+
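+/* cpuhp teardown callback: any eq whose irq vector would be left
+ * without an online cpu in its affinity mask is switched to
+ * timer-driven polling so its completions keep being serviced.
+ */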
+static int lpfc_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+ struct lpfc_hba *phba = hlist_entry_safe(node, struct lpfc_hba, cpuhp);
+ struct lpfc_queue *eq, *next;
+ LIST_HEAD(eqlist);
+ int retval;
+
+ if (!phba) {
+ WARN_ONCE(!phba, "cpu: %u. phba:NULL", raw_smp_processor_id());
+ return 0;
+ }
+
+ if (__lpfc_cpuhp_checks(phba, &retval))
+ return retval;
+
+ lpfc_cpuhp_get_eq(phba, cpu, &eqlist);
+
+ /* start polling on these eq's */
+ list_for_each_entry_safe(eq, next, &eqlist, _poll_list) {
+ list_del_init(&eq->_poll_list);
+ lpfc_sli4_start_polling(eq);
+ }
+
+ return 0;
+}
+
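+/* cpuhp bring-up callback: eqs on the poll list whose hdwq maps back
+ * to the returning cpu are switched back to interrupt mode.
+ */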
+static int lpfc_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+ struct lpfc_hba *phba = hlist_entry_safe(node, struct lpfc_hba, cpuhp);
+ struct lpfc_queue *eq, *next;
+ unsigned int n;
+ int retval;
+
+ if (!phba) {
+ WARN_ONCE(!phba, "cpu: %u. phba:NULL", raw_smp_processor_id());
+ return 0;
+ }
+
+ if (__lpfc_cpuhp_checks(phba, &retval))
+ return retval;
+
+ list_for_each_entry_safe(eq, next, &phba->poll_list, _poll_list) {
+ n = lpfc_find_cpu_handle(phba, eq->hdwq, LPFC_FIND_BY_HDWQ);
+ if (n == cpu)
+ lpfc_sli4_stop_polling(eq);
+ }
+
+ return 0;
+}
+
/**
* lpfc_sli4_enable_msix - Enable MSI-X interrupt mode to SLI-4 device
* @phba: pointer to lpfc hba data structure.
@@ -11367,6 +11541,9 @@ lpfc_sli4_hba_unset(struct lpfc_hba *phba)
/* Wait for completion of device XRI exchange busy */
lpfc_sli4_xri_exchange_busy_wait(phba);
+ /* per-phba callback de-registration for hotplug event */
+ lpfc_cpuhp_remove(phba);
+
/* Disable PCI subsystem interrupt */
lpfc_sli4_disable_intr(phba);
@@ -12632,6 +12809,9 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const struct pci_device_id *pid)
/* Enable RAS FW log support */
lpfc_sli4_ras_setup(phba);
+ INIT_LIST_HEAD(&phba->poll_list);
+ cpuhp_state_add_instance_nocalls(lpfc_cpuhp_state, &phba->cpuhp);
+
return 0;
out_free_sysfs_attr:
@@ -13450,11 +13630,24 @@ lpfc_init(void)
/* Initialize in case vector mapping is needed */
lpfc_present_cpu = num_present_cpus();
+ error = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+ "lpfc/sli4:online",
+ lpfc_cpu_online, lpfc_cpu_offline);
+ if (error < 0)
+ goto cpuhp_failure;
+ lpfc_cpuhp_state = error;
+
error = pci_register_driver(&lpfc_driver);
- if (error) {
- fc_release_transport(lpfc_transport_template);
- fc_release_transport(lpfc_vport_transport_template);
- }
+ if (error)
+ goto unwind;
+
+ return error;
+
+unwind:
+ cpuhp_remove_multi_state(lpfc_cpuhp_state);
+cpuhp_failure:
+ fc_release_transport(lpfc_transport_template);
+ fc_release_transport(lpfc_vport_transport_template);
return error;
}
@@ -13471,6 +13664,7 @@ lpfc_exit(void)
{
misc_deregister(&lpfc_mgmt_dev);
pci_unregister_driver(&lpfc_driver);
+ cpuhp_remove_multi_state(lpfc_cpuhp_state);
fc_release_transport(lpfc_transport_template);
fc_release_transport(lpfc_vport_transport_template);
idr_destroy(&lpfc_hba_index);
@@ -485,7 +485,8 @@ lpfc_sli4_eq_flush(struct lpfc_hba *phba, struct lpfc_queue *eq)
}
static int
-lpfc_sli4_process_eq(struct lpfc_hba *phba, struct lpfc_queue *eq)
+lpfc_sli4_process_eq(struct lpfc_hba *phba, struct lpfc_queue *eq,
+ uint8_t rearm)
{
struct lpfc_eqe *eqe;
int count = 0, consumed = 0;
@@ -519,8 +520,8 @@ lpfc_sli4_process_eq(struct lpfc_hba *phba, struct lpfc_queue *eq)
eq->queue_claimed = 0;
rearm_and_exit:
- /* Always clear and re-arm the EQ */
- phba->sli4_hba.sli4_write_eq_db(phba, eq, consumed, LPFC_QUEUE_REARM);
+ /* Always clear the EQ; re-arm only when requested. */
+ phba->sli4_hba.sli4_write_eq_db(phba, eq, consumed, rearm);
return count;
}
@@ -7894,7 +7895,7 @@ lpfc_sli4_process_missed_mbox_completions(struct lpfc_hba *phba)
if (mbox_pending)
/* process and rearm the EQ */
- lpfc_sli4_process_eq(phba, fpeq);
+ lpfc_sli4_process_eq(phba, fpeq, LPFC_QUEUE_REARM);
else
/* Always clear and re-arm the EQ */
sli4_hba->sli4_write_eq_db(phba, fpeq, 0, LPFC_QUEUE_REARM);
@@ -10055,10 +10056,13 @@ lpfc_sli_issue_iocb(struct lpfc_hba *phba, uint32_t ring_number,
struct lpfc_iocbq *piocb, uint32_t flag)
{
struct lpfc_sli_ring *pring;
+ struct lpfc_queue *eq;
unsigned long iflags;
int rc;
if (phba->sli_rev == LPFC_SLI_REV4) {
+ eq = phba->sli4_hba.hdwq[piocb->hba_wqidx].hba_eq;
+
pring = lpfc_sli4_calc_ring(phba, piocb);
if (unlikely(pring == NULL))
return IOCB_ERROR;
@@ -10066,6 +10070,8 @@ lpfc_sli_issue_iocb(struct lpfc_hba *phba, uint32_t ring_number,
spin_lock_irqsave(&pring->ring_lock, iflags);
rc = __lpfc_sli_issue_iocb(phba, ring_number, piocb, flag);
spin_unlock_irqrestore(&pring->ring_lock, iflags);
+
+ lpfc_sli4_poll_eq(eq, LPFC_POLL_FASTPATH);
} else {
/* For now, SLI2/3 will still use hbalock */
spin_lock_irqsave(&phba->hbalock, iflags);
@@ -14245,7 +14251,7 @@ lpfc_sli4_hba_intr_handler(int irq, void *dev_id)
lpfc_sli4_mod_hba_eq_delay(phba, fpeq, LPFC_MAX_AUTO_EQ_DELAY);
/* process and rearm the EQ */
- ecount = lpfc_sli4_process_eq(phba, fpeq);
+ ecount = lpfc_sli4_process_eq(phba, fpeq, LPFC_QUEUE_REARM);
if (unlikely(ecount == 0)) {
fpeq->EQ_no_entry++;
@@ -14305,6 +14311,147 @@ lpfc_sli4_intr_handler(int irq, void *dev_id)
return (hba_handled == true) ? IRQ_HANDLED : IRQ_NONE;
} /* lpfc_sli4_intr_handler */
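+/* Slowpath heartbeat: poll every eq on the RCU-protected poll list
+ * and reschedule the timer for as long as the list stays non-empty.
+ */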
+void lpfc_sli4_poll_hbtimer(struct timer_list *t)
+{
+ struct lpfc_hba *phba = from_timer(phba, t, cpuhp_poll_timer);
+ struct lpfc_queue *eq;
+ int i = 0;
+
+ rcu_read_lock();
+
+ list_for_each_entry_rcu(eq, &phba->poll_list, _poll_list)
+ i += lpfc_sli4_poll_eq(eq, LPFC_POLL_SLOWPATH);
+ if (!list_empty(&phba->poll_list))
+ mod_timer(&phba->cpuhp_poll_timer,
+ jiffies + msecs_to_jiffies(LPFC_POLL_HB));
+
+ rcu_read_unlock();
+}
+
+inline int lpfc_sli4_poll_eq(struct lpfc_queue *eq, uint8_t path)
+{
+ struct lpfc_hba *phba = eq->phba;
+ int i = 0;
+
+ /*
+ * Unlocking an irq is one of the entry points to check
+ * for re-schedule, but we are good for the io submission
+ * path as the midlayer does a get_cpu to glue us in.
+ * Flush out the invalidate queue so we can see the
+ * updated value of the mode flag.
+ */
+ smp_rmb();
+
+ if (READ_ONCE(eq->mode) == LPFC_EQ_POLL)
+ /* We will likely not get the completion for the caller
+ * during this iteration, but that's fine: future io's
+ * arriving on this eq should be able to pick it up.
+ * Lone io's are handled by the polling timer function,
+ * which currently fires every 1 msec.
+ */
+ i = lpfc_sli4_process_eq(phba, eq, LPFC_QUEUE_NOARM);
+
+ return i;
+}
+
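+/* Arm the heartbeat timer on the empty-to-non-empty transition, then
+ * publish the eq on the RCU-protected poll list.
+ */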
+static inline void lpfc_sli4_add_to_poll_list(struct lpfc_queue *eq)
+{
+ struct lpfc_hba *phba = eq->phba;
+
+ if (list_empty(&phba->poll_list)) {
+ timer_setup(&phba->cpuhp_poll_timer, lpfc_sli4_poll_hbtimer, 0);
+ /* kickstart slowpath processing for this eq */
+ mod_timer(&phba->cpuhp_poll_timer,
+ jiffies + msecs_to_jiffies(LPFC_POLL_HB));
+ }
+
+ list_add_rcu(&eq->_poll_list, &phba->poll_list);
+ synchronize_rcu();
+}
+
+static inline void lpfc_sli4_remove_from_poll_list(struct lpfc_queue *eq)
+{
+ struct lpfc_hba *phba = eq->phba;
+
+ /* Disable slowpath processing for this eq; the caller
+ * kick starts the eq by re-arming it as soon as possible.
+ */
+ list_del_rcu(&eq->_poll_list);
+ synchronize_rcu();
+
+ if (list_empty(&phba->poll_list))
+ del_timer_sync(&phba->cpuhp_poll_timer);
+}
+
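+/* Called from queue teardown: empty the poll list and wait for any
+ * RCU readers walking it to finish.
+ */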
+inline void lpfc_sli4_cleanup_poll_list(struct lpfc_hba *phba)
+{
+ struct lpfc_queue *eq, *next;
+
+ list_for_each_entry_safe(eq, next, &phba->poll_list, _poll_list)
+ list_del(&eq->_poll_list);
+
+ INIT_LIST_HEAD(&phba->poll_list);
+ synchronize_rcu();
+}
+
+static inline void
+__lpfc_sli4_switch_eqmode(struct lpfc_queue *eq, uint8_t mode)
+{
+ if (mode == eq->mode)
+ return;
+ /*
+ * Currently this function is only called during a hotplug
+ * event and the cpu on which this function is executing
+ * is going offline. By now the hotplug has instructed
+ * the scheduler to remove this cpu from the cpu active mask,
+ * so we don't need to worry about being put aside by the
+ * scheduler for a high priority process. Interrupts could
+ * still come in, but they are known to retire ASAP.
+ */
+
+ /* Disable polling in the fastpath */
+ WRITE_ONCE(eq->mode, mode);
+ /* flush out the store buffer */
+ smp_wmb();
+
+ /*
+ * When enabling polling, add this eq to the polling list
+ * and start polling. For a grace period both the interrupt
+ * handler and the poller will try to process the eq, _but_
+ * that's fine: the queue_claimed synchronization mechanism
+ * deals with it. This is just a draining phase for the
+ * interrupt handler (not the eq's), as the barrier above
+ * guarantees that all CPUs have seen the new LPFC_EQ_POLL
+ * state, which effectively disables re-arming of the EQ.
+ * The whole idea is that eq interrupts die off eventually,
+ * as we are no longer re-arming the EQ's.
+ */
+ mode ? lpfc_sli4_add_to_poll_list(eq) :
+ lpfc_sli4_remove_from_poll_list(eq);
+}
+
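+/* Switch an eq into polled mode for the cpu offline hotplug callback;
+ * once on the poll list it is serviced by the slowpath heartbeat
+ * timer instead of its (soon to be shut down) irq.
+ */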
+void lpfc_sli4_start_polling(struct lpfc_queue *eq)
+{
+ __lpfc_sli4_switch_eqmode(eq, LPFC_EQ_POLL);
+}
+
+void lpfc_sli4_stop_polling(struct lpfc_queue *eq)
+{
+ struct lpfc_hba *phba = eq->phba;
+
+ __lpfc_sli4_switch_eqmode(eq, LPFC_EQ_INTERRUPT);
+
+ /* Kick start any pending io's in h/w.
+ * Once we switch back to interrupt processing on an eq,
+ * the io completion path only re-arms the eq when it
+ * receives a completion. But since the eq is in the
+ * disarmed state it never receives one, which would
+ * create a deadlock. Re-arm it here to break that cycle.
+ */
+ phba->sli4_hba.sli4_write_eq_db(phba, eq, 0, LPFC_QUEUE_REARM);
+}
+
/**
* lpfc_sli4_queue_free - free a queue structure and associated memory
* @queue: The queue structure to free.
@@ -14379,6 +14526,7 @@ lpfc_sli4_queue_alloc(struct lpfc_hba *phba, uint32_t page_size,
return NULL;
INIT_LIST_HEAD(&queue->list);
+ INIT_LIST_HEAD(&queue->_poll_list);
INIT_LIST_HEAD(&queue->wq_list);
INIT_LIST_HEAD(&queue->wqfull_list);
INIT_LIST_HEAD(&queue->page_list);
@@ -19698,6 +19846,8 @@ lpfc_sli4_issue_wqe(struct lpfc_hba *phba, struct lpfc_sli4_hdw_queue *qp,
lpfc_sli_ringtxcmpl_put(phba, pring, pwqe);
spin_unlock_irqrestore(&pring->ring_lock, iflags);
+
+ lpfc_sli4_poll_eq(qp->hba_eq, LPFC_POLL_FASTPATH);
return 0;
}
@@ -19718,6 +19868,8 @@ lpfc_sli4_issue_wqe(struct lpfc_hba *phba, struct lpfc_sli4_hdw_queue *qp,
}
lpfc_sli_ringtxcmpl_put(phba, pring, pwqe);
spin_unlock_irqrestore(&pring->ring_lock, iflags);
+
+ lpfc_sli4_poll_eq(qp->hba_eq, LPFC_POLL_FASTPATH);
return 0;
}
@@ -19746,6 +19898,8 @@ lpfc_sli4_issue_wqe(struct lpfc_hba *phba, struct lpfc_sli4_hdw_queue *qp,
}
lpfc_sli_ringtxcmpl_put(phba, pring, pwqe);
spin_unlock_irqrestore(&pring->ring_lock, iflags);
+
+ lpfc_sli4_poll_eq(qp->hba_eq, LPFC_POLL_FASTPATH);
return 0;
}
return WQE_ERROR;
@@ -133,6 +133,23 @@ struct lpfc_rqb {
struct lpfc_queue {
struct list_head list;
struct list_head wq_list;
+
+ /*
+ * If interrupts are in effect on _all_ the eq's, the footprint
+ * of the polling code is zero (except for mode). This memory
+ * is checked for every io to see if the io needs to be polled,
+ * and on completion to check whether the eq needs to be
+ * re-armed. Keep it in the same cacheline as the queue ptr to
+ * avoid cpu fetch stalls. Using one byte for mode would leave
+ * a 7-byte hole, so fill it with other frequently used members.
+ */
+ uint16_t last_cpu; /* most recent cpu */
+ uint16_t hdwq;
+ uint8_t qe_valid;
+ uint8_t mode; /* interrupt or polling */
+#define LPFC_EQ_INTERRUPT 0
+#define LPFC_EQ_POLL 1
+
struct list_head wqfull_list;
enum lpfc_sli4_queue_type type;
enum lpfc_sli4_queue_subtype subtype;
@@ -239,10 +256,8 @@ struct lpfc_queue {
struct delayed_work sched_spwork;
uint64_t isr_timestamp;
- uint16_t hdwq;
- uint16_t last_cpu; /* most recent cpu */
- uint8_t qe_valid;
struct lpfc_queue *assoc_qp;
+ struct list_head _poll_list;
void **q_pgs; /* array to index entries per page */
};