Message ID | 20200819051945.1797088-1-hch@lst.de
State      | New
Series     | net: bypass ->sendpage for slab pages
On 2020/8/22 05:14, David Miller wrote:
> From: Christoph Hellwig <hch@lst.de>
> Date: Thu, 20 Aug 2020 06:37:44 +0200
>
>> If you look at who uses sendpage outside the networking layer itself
>> you see that it is basically block drivers and file systems.  These
>> have no way to control what memory they get passed and have to deal
>> with everything someone throws at them.
>
> I see nvme doing virt_to_page() on several things when it calls into
> kernel_sendpage().
>
> This is the kind of stuff I want cleaned up, and which your patch
> will not trap nor address.
>
> In nvme it sometimes seems to check for sendpage validity:
>
>         /* can't zcopy slab pages */
>         if (unlikely(PageSlab(page))) {
>                 ret = sock_no_sendpage(queue->sock, page, offset, len,
>                                 flags);
>         } else {
>                 ret = kernel_sendpage(queue->sock, page, offset, len,
>                                 flags);
>         }
>
> Yet elsewhere it does not and just blindly calls:
>
>         ret = kernel_sendpage(queue->sock, virt_to_page(pdu),
>                         offset_in_page(pdu) + req->offset, len, flags);
>
> This pdu seems to come from a page frag allocation.
>
> That's the target side.  On the host side:
>
>         ret = kernel_sendpage(cmd->queue->sock, page, cmd->offset,
>                         left, flags);
>
> No page slab check or anything like that.
>
> I'm hesitant to put in the kernel_sendpage() patch, because it provides
> a disincentive to fix up code like this.

Hi David and Christoph,

It has been quiet for a while; what should we do next about the
kernel_sendpage() related issue?

Will Christoph's series or mine be considered a proper fix, or should I
wait for some other, better idea to show up?  Either is OK for me, as
long as the problem gets fixed.

Thanks in advance.

Coly Li
On Fri, Sep 18, 2020 at 04:37:24PM +0800, Coly Li wrote:
> Hi David and Christoph,
>
> It has been quiet for a while; what should we do next about the
> kernel_sendpage() related issue?
>
> Will Christoph's series or mine be considered a proper fix, or should I
> wait for some other, better idea to show up?  Either is OK for me, as
> long as the problem gets fixed.

I think for all the network storage stuff we really need a "send out a
page" helper, and the nvmet bits that Dave pointed to look to me like
they actually are currently broken.

Given that Dave doesn't want to change the kernel_sendpage() semantics,
I'll resend with a new helper instead.  Any preferences for a name?
safe_sendpage?  kernel_sendpage_safe?  kernel_send_one_page?
diff --git a/net/socket.c b/net/socket.c
index dbbe8ea7d395da..b4e65688915fe3 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -3638,7 +3638,11 @@ EXPORT_SYMBOL(kernel_getpeername);
 int kernel_sendpage(struct socket *sock, struct page *page, int offset,
 		    size_t size, int flags)
 {
-	if (sock->ops->sendpage)
+	/* sendpage manipulates the refcount of the page being sent, which
+	 * does not work for Slab pages, or for tails of non-__GFP_COMP
+	 * high order pages.
+	 */
+	if (sock->ops->sendpage && !PageSlab(page) && page_count(page) > 0)
 		return sock->ops->sendpage(sock, page, offset, size, flags);
 
 	return sock_no_sendpage(sock, page, offset, size, flags);
Sending Slab pages or tail pages into ->sendpage causes really strange,
delayed oopses.  Prevent this right in the networking code instead of
requiring every driver to guess the exact conditions under which
sendpage works.

Based on a patch from Coly Li <colyli@suse.de>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/socket.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)