[5.15,104/207] tcp: fix page frag corruption on page fault

From: Paolo Abeni <pabeni@redhat.com>

commit dacb5d8875cc6cd3a553363b4d6f06760fcbe70c upstream.

Steffen reported a TCP stream corruption for HTTP requests
served by the apache web-server using a cifs mount-point
and memory mapping the relevant file.

The root cause is quite similar to the one addressed by
commit 20eb4f29b602 ("net: fix sk_page_frag() recursion from
memory reclaim"). Here the nested access to the task page frag
is caused by a page fault on the (mmapped) user-space memory
buffer coming from the cifs file.

The page fault handler performs an smb transaction on a different
socket, inside the same process context. Since sk->sk_allocation
for such socket does not prevent the usage of the task_frag,
the nested allocation modifies "under the hood" the page frag
in use by the outer sendmsg call, corrupting the stream.
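
The nesting described above can be illustrated with a small user-space
model. This is only a sketch, not kernel code: struct frag_model,
nested_send() and outer_send() are hypothetical names, and the model
assumes the outer send fixes its copy destination before the faulting
copy but records the fragment offset only after the copy returns.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct page_frag. */
struct frag_model {
	char   page[64];
	size_t offset;
};

/* Inner send, run from the simulated page-fault handler. */
static void nested_send(struct frag_model *frag)
{
	memcpy(frag->page + frag->offset, "SMB!", 4);
	frag->offset += 4;
}

/*
 * Outer send. The copy destination is chosen before the copy starts,
 * while the descriptor offset is read from the frag after the copy.
 * Returns 1 if the recorded descriptor points at the outer payload.
 */
static int outer_send(struct frag_model *task_frag,
		      struct frag_model *fault_frag)
{
	char *dst = task_frag->page + task_frag->offset;
	size_t recorded;

	nested_send(fault_frag);   /* "page fault" during the user copy */
	memcpy(dst, "HTTP", 4);    /* copy lands at the original destination */

	recorded = task_frag->offset;  /* may have been moved by the nested send */
	task_frag->offset += 4;
	return memcmp(task_frag->page + recorded, "HTTP", 4) == 0;
}
```

When both arguments are the same frag, the recorded offset no longer
matches the place the payload was copied to, which is the kind of
stream corruption reported; with a separate frag for the nested send
the outer payload is intact.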

The overall relevant stack trace looks like the following:

httpd 78268 [001] 3461630.850950:      probe:tcp_sendmsg_locked:
        ffffffff91461d91 tcp_sendmsg_locked+0x1
        ffffffff91462b57 tcp_sendmsg+0x27
        ffffffff9139814e sock_sendmsg+0x3e
        ffffffffc06dfe1d smb_send_kvec+0x28
        [...]
        ffffffffc06cfaf8 cifs_readpages+0x213
        ffffffff90e83c4b read_pages+0x6b
        ffffffff90e83f31 __do_page_cache_readahead+0x1c1
        ffffffff90e79e98 filemap_fault+0x788
        ffffffff90eb0458 __do_fault+0x38
        ffffffff90eb5280 do_fault+0x1a0
        ffffffff90eb7c84 __handle_mm_fault+0x4d4
        ffffffff90eb8093 handle_mm_fault+0xc3
        ffffffff90c74f6d __do_page_fault+0x1ed
        ffffffff90c75277 do_page_fault+0x37
        ffffffff9160111e page_fault+0x1e
        ffffffff9109e7b5 copyin+0x25
        ffffffff9109eb40 _copy_from_iter_full+0xe0
        ffffffff91462370 tcp_sendmsg_locked+0x5e0
        ffffffff91462370 tcp_sendmsg_locked+0x5e0
        ffffffff91462b57 tcp_sendmsg+0x27
        ffffffff9139815c sock_sendmsg+0x4c
        ffffffff913981f7 sock_write_iter+0x97
        ffffffff90f2cc56 do_iter_readv_writev+0x156
        ffffffff90f2dff0 do_iter_write+0x80
        ffffffff90f2e1c3 vfs_writev+0xa3
        ffffffff90f2e27c do_writev+0x5c
        ffffffff90c042bb do_syscall_64+0x5b
        ffffffff916000ad entry_SYSCALL_64_after_hwframe+0x65

The cifs filesystem rightfully sets sk_allocation to GFP_NOFS, so
we can avoid the nesting by using the sk page frag for allocations
lacking the __GFP_FS flag. Do not define an additional mm-helper
for that, as this is strictly tied to the sk page frag usage.
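
A minimal sketch of such a check, written as a user-space model. The
M_GFP_* bit values and may_use_task_frag() are illustrative
assumptions, not the kernel's definitions, and the inclusion of the
direct-reclaim and memalloc bits reflects my understanding of the
pre-existing check rather than anything stated in this message.

```c
#include <stdbool.h>

/* Illustrative bit values only; the real kernel GFP bits differ. */
#define M_GFP_DIRECT_RECLAIM 0x1u
#define M_GFP_MEMALLOC       0x2u
#define M_GFP_FS             0x4u
#define M_GFP_IO             0x8u

/* Simplified models of the composite flags. */
#define M_GFP_KERNEL (M_GFP_DIRECT_RECLAIM | M_GFP_IO | M_GFP_FS)
#define M_GFP_NOFS   (M_GFP_DIRECT_RECLAIM | M_GFP_IO)

/*
 * The task frag is used only when the socket may block for reclaim,
 * is not a memalloc socket, and is allowed to recurse into the
 * filesystem. Anything else falls back to the per-socket frag.
 */
static bool may_use_task_frag(unsigned int sk_allocation)
{
	const unsigned int mask = M_GFP_DIRECT_RECLAIM | M_GFP_MEMALLOC |
				  M_GFP_FS;

	return (sk_allocation & mask) == (M_GFP_DIRECT_RECLAIM | M_GFP_FS);
}
```

Under this model a GFP_KERNEL-style socket keeps using the task frag,
while a GFP_NOFS socket such as the cifs one gets the per-socket frag,
so a nested send from the fault handler cannot touch the outer frag.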

v1 -> v2:
 - use a stricter sk_page_frag() check instead of reordering the
   code (Eric)

Reported-by: Steffen Froemer <sfroemer@redhat.com>
Fixes: 5640f7685831 ("net: use a per task frag allocator")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/sock.h |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Message ID	20211206145613.844832146@linuxfoundation.org
State	Superseded
Headers
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, stable@vger.kernel.org, Steffen Froemer <sfroemer@redhat.com>, Paolo Abeni <pabeni@redhat.com>, Eric Dumazet <edumazet@google.com>, "David S. Miller" <davem@davemloft.net>
Subject: [PATCH 5.15 104/207] tcp: fix page frag corruption on page fault
Date: Mon, 6 Dec 2021 15:55:58 +0100
In-Reply-To: <20211206145610.172203682@linuxfoundation.org>
References: <20211206145610.172203682@linuxfoundation.org>
List-ID: <stable.vger.kernel.org>
