From patchwork Fri Apr 24 16:20:47 2020
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 220603
From: Christian Brauner
To: Jens Axboe, Greg Kroah-Hartman, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-api@vger.kernel.org
Cc: Jonathan Corbet, Serge Hallyn, "Rafael J. Wysocki", Tejun Heo, "David S.
Miller", Christian Brauner, Saravana Kannan, Jan Kara, David Howells, Seth Forshee, David Rheinsberg, Tom Gundersen, Christian Kellner, Dmitry Vyukov, Stéphane Graber, linux-doc@vger.kernel.org, netdev@vger.kernel.org, Steve Barber, Dylan Reid, Filipe Brandenburger, Kees Cook, Benjamin Elder, Akihiro Suda
Subject: [PATCH v3 2/7] loopfs: implement loopfs
Date: Fri, 24 Apr 2020 18:20:47 +0200
Message-Id: <20200424162052.441452-3-christian.brauner@ubuntu.com>
In-Reply-To: <20200424162052.441452-1-christian.brauner@ubuntu.com>
References: <20200424162052.441452-1-christian.brauner@ubuntu.com>
X-Mailing-List: netdev@vger.kernel.org

This implements loopfs, a loop device filesystem. It takes inspiration from the binderfs filesystem I implemented about two years ago, with which we have had good experiences overall. Parts of it are also based on [3], but it is mostly a new, imho cleaner approach.

Loopfs allows the creation of private loop device instances for applications for various use-cases. It covers the use-case that was expressed on-list and in-person to get programmatic access to private loop devices for image building in sandboxes. An illustration of this is provided in [4].

Loopfs is also intended to provide loop devices to privileged and unprivileged containers, which has been a frequent request from various major tools (Chromium, Kubernetes, LXD, Moby/Docker, systemd). I'm providing a non-exhaustive list of issues and requests (cf. [5]) around this feature mainly to illustrate that I'm not making the use-cases up. Currently none of this can be done safely since handing a loop device from the host into a container means that the container can see anything that the host is doing with that loop device and what other containers are doing with that device too.
And (bind-)mounting devtmpfs inside of containers is not secure at all, so it is not an option either (though sometimes done out of despair, apparently).

The workloads people run in containers are supposed to be indistinguishable from workloads run on the host, and the tools inside the container should not need to be aware that they are running inside a container, apart from the containerization tools themselves. This is especially true when running older distros in containers that existed before containers were as ubiquitous as they are today. With loopfs, users can call mount -o loop and in a correctly set up container things work the same way they would on the host.

The filesystem representation allows us to do this in a very simple way. At container setup, a container manager can mount a private instance of loopfs somewhere, e.g. at /dev/loopfs, and then bind-mount or symlink /dev/loopfs/loop-control to /dev/loop-control, pre-allocate and symlink the number of standard devices into their standard location, and have a service file or rules in place that symlink additionally allocated loop devices through losetup into place as well. With the new syscall interception logic this is also possible for unprivileged containers. In these cases, when a user calls mount -o loop, it will be possible to completely set up the loop device in the container. The final mount syscall is handled through syscall interception, which we already implemented and released in earlier kernels (see [1] and [2]) and which is actively used in production workloads. The mount is often rewritten to a fuse binary to provide safe access for unprivileged containers.

Loopfs also allows the creation of hidden/detached dynamic loop devices and associated mounts, which was also an often-issued request. With the old mount api this can be achieved by creating a temporary loopfs, stashing a file descriptor to the mount point and the loop-control device, and immediately unmounting the loopfs instance.
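
The old-mount-api sequence for a detached instance can be sketched in userspace C roughly as follows. This is a minimal sketch under assumptions, not code from this series: loopfs_open_detached() is a hypothetical helper name, the filesystem type string "loop" is taken from loop_fs_type in this patch, and a successful run needs a kernel built with BLK_DEV_LOOPFS plus sufficient privileges in the mounting user namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mount.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of this patch): mount a temporary loopfs
 * instance at @mnt, stash fds for the mount point and its loop-control
 * node, then detach the mount so it is no longer reachable by path.
 *
 * Returns 0 and fills *mnt_fd and *ctl_fd on success, or a negative
 * errno value on failure (in which case the outputs are left untouched).
 */
int loopfs_open_detached(const char *mnt, int *mnt_fd, int *ctl_fd)
{
	int dfd, cfd = -1, err = 0;

	assert(mnt && mnt_fd && ctl_fd);

	/* "loop" is the filesystem name registered by this patch. */
	if (mount("none", mnt, "loop", 0, NULL) < 0)
		return -errno;

	/* Stash a descriptor for the mount point itself. */
	dfd = open(mnt, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (dfd < 0) {
		err = -errno;
	} else {
		/* Stash a descriptor for this instance's loop-control. */
		cfd = openat(dfd, "loop-control", O_RDWR | O_CLOEXEC);
		if (cfd < 0)
			err = -errno;
	}

	/* Detach right away; the open fds keep the instance alive. */
	if (umount2(mnt, MNT_DETACH) < 0 && !err)
		err = -errno;

	if (err) {
		if (cfd >= 0)
			close(cfd);
		if (dfd >= 0)
			close(dfd);
		return err;
	}

	*mnt_fd = dfd;
	*ctl_fd = cfd;
	return 0;
}
```

On success, an ioctl(ctl_fd, LOOP_CTL_GET_FREE) on the returned loop-control descriptor would be expected to allocate a device inside the now-detached instance. On a kernel without loopfs, or for a mount point that does not exist, the mount() step fails and the helper simply returns a negative errno.
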
With the new mount api a detached mount can be created directly (i.e. a mount not visible anywhere in the filesystem). New loop devices can then be allocated and configured. They can be mounted through /proc/self// with the old mount api or by using the fd directly with the new mount api. Combined with a mount namespace this allows for loop devices that are fully cleaned up automatically on program crash. This ties back to various use-cases and is illustrated in [4].

The filesystem representation requires the standard boilerplate filesystem code we know from other tiny filesystems. All of the loopfs code is hidden under a config option that defaults to false. This means specifically that none of the code even exists when users do not have any use-case for loopfs. In addition, the loopfs code does not alter how loop devices behave at all, i.e. there are no changes to any existing workloads, and I've taken care to ifdef all loopfs-specific things out.

Each loopfs mount is a separate instance. As such, loop devices created in one instance are independent of loop devices created in another instance. This specifically entails that loop devices are only visible in the loopfs instance they belong to.

The number of loop devices available in loopfs instances is hierarchically limited through /proc/sys/user/max_loop_devices via the ucount infrastructure (thanks to David Rheinsberg for pointing out that missing piece). An administrator could e.g. set echo 3 > /proc/sys/user/max_loop_devices, at which point any loopfs instance mounted by uid x can only create 3 loop devices no matter how many loopfs instances they mount. This limit applies hierarchically to all user namespaces.

In addition, loopfs has a "max" mount option which allows setting a limit on the number of loop devices for a given loopfs instance.
This is mainly to cover use-cases where a single loopfs mount is shared as a bind-mount between multiple parties that are prevented from creating other loopfs mounts, and it is equivalent to the semantics of the binderfs and devpts "max" mount option.

Note that in __loop_clr_fd() we now need to check not just whether bdev is valid but also whether bdev->bd_disk is valid. This wasn't necessary before because in order to call LOOP_CLR_FD the loop device would need to be open and thus bdev->bd_disk was guaranteed to be allocated. For loopfs loop devices we allow callers to simply unlink them, just as we do for binderfs binder devices, and we also need to account for the case where a loopfs superblock is shut down while backing files might still be associated with some loop devices. In such cases no bd_disk device will be attached to bdev. This is not in itself noteworthy; it's more about documenting the "why" of the added bdev->bd_disk check for posterity.

[1]: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
[2]: fb3c5386b382 ("seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE")
[3]: https://lore.kernel.org/lkml/1401227936-15698-1-git-send-email-seth.forshee@canonical.com
[4]: https://gist.github.com/brauner/dcaf15e6977cc1bfadfb3965f126c02f
[5]: https://github.com/kubernetes-sigs/kind/issues/1333
     https://github.com/kubernetes-sigs/kind/issues/1248
     https://lists.freedesktop.org/archives/systemd-devel/2017-August/039453.html
     https://chromium.googlesource.com/chromiumos/docs/+/master/containers_and_vms.md#loop-mount
     https://gitlab.com/gitlab-com/support-forum/issues/3732
     https://github.com/moby/moby/issues/27886
     https://twitter.com/_AkihiroSuda_/status/1249664478267854848
     https://serverfault.com/questions/701384/loop-device-in-a-linux-container
     https://discuss.linuxcontainers.org/t/providing-access-to-loop-and-other-devices-in-containers/1352
     https://discuss.concourse-ci.org/t/exposing-dev-loop-devices-in-privileged-mode/813

Cc: Jens Axboe
Cc: Steve Barber
Cc: Filipe Brandenburger
Cc: Kees Cook
Cc: Benjamin Elder
Cc: Seth Forshee
Cc: Stéphane Graber
Cc: Tom Gundersen
Cc: Tejun Heo
Cc: Christian Kellner
Cc: Greg Kroah-Hartman
Cc: "David S. Miller"
Cc: Dylan Reid
Cc: David Rheinsberg
Cc: Akihiro Suda
Cc: Dmitry Vyukov
Cc: "Rafael J. Wysocki"
Reviewed-by: Serge Hallyn
Signed-off-by: Christian Brauner
---
/* v2 */
- David Rheinsberg / Christian Brauner:
  - Correctly clean up loop devices that are in use after the loopfs instance has been shut down. This is important for some use-cases that David pointed out where they effectively create a loopfs instance, allocate devices and drop unnecessary references to it.
- Christian Brauner:
  - Replace lo_loopfs_i inode member in struct loop_device with a custom struct lo_info pointer which is only allocated for loopfs loop devices.

/* v3 */
- Christian Brauner:
  - Fix loopfs_access() to not care about non-loopfs devices.
  - Stash refcounted sbinfo in lo_info to simplify retrieval of user namespace. This way each loopfs instance just takes a single reference to the user namespace that is dropped when the last loop device is removed. This puts us on the safe side. (Thanks to Serge for making me aware of this issue.)
- David Rheinsberg / Serge Hallyn:
  - Remove "max" mount option.
--- MAINTAINERS | 5 + drivers/block/Kconfig | 4 + drivers/block/Makefile | 1 + drivers/block/loop.c | 182 +++++++++++--- drivers/block/loop.h | 7 + drivers/block/loopfs/Makefile | 3 + drivers/block/loopfs/loopfs.c | 431 +++++++++++++++++++++++++++++++++ drivers/block/loopfs/loopfs.h | 37 +++ include/linux/user_namespace.h | 3 + include/uapi/linux/magic.h | 1 + kernel/ucount.c | 3 + 11 files changed, 646 insertions(+), 31 deletions(-) create mode 100644 drivers/block/loopfs/Makefile create mode 100644 drivers/block/loopfs/loopfs.c create mode 100644 drivers/block/loopfs/loopfs.h diff --git a/MAINTAINERS b/MAINTAINERS index b816a453b10e..560b37a65bce 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9957,6 +9957,11 @@ W: http://www.avagotech.com/support/ F: drivers/message/fusion/ F: drivers/scsi/mpt3sas/ +LOOPFS FILE SYSTEM +M: Christian Brauner +S: Supported +F: drivers/block/loopfs/ + LSILOGIC/SYMBIOS/NCR 53C8XX and 53C1010 PCI-SCSI drivers M: Matthew Wilcox L: linux-scsi@vger.kernel.org diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig index 025b1b77b11a..d7ff37d795ad 100644 --- a/drivers/block/Kconfig +++ b/drivers/block/Kconfig @@ -214,6 +214,10 @@ config BLK_DEV_LOOP Most users will answer N here. 
+config BLK_DEV_LOOPFS + bool "Loopback device virtual filesystem support" + depends on BLK_DEV_LOOP=y + config BLK_DEV_LOOP_MIN_COUNT int "Number of loop devices to pre-create at init time" depends on BLK_DEV_LOOP diff --git a/drivers/block/Makefile b/drivers/block/Makefile index 795facd8cf19..7052be26aa8b 100644 --- a/drivers/block/Makefile +++ b/drivers/block/Makefile @@ -36,6 +36,7 @@ obj-$(CONFIG_XEN_BLKDEV_BACKEND) += xen-blkback/ obj-$(CONFIG_BLK_DEV_DRBD) += drbd/ obj-$(CONFIG_BLK_DEV_RBD) += rbd.o obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX) += mtip32xx/ +obj-$(CONFIG_BLK_DEV_LOOPFS) += loopfs/ obj-$(CONFIG_BLK_DEV_RSXX) += rsxx/ obj-$(CONFIG_ZRAM) += zram/ diff --git a/drivers/block/loop.c b/drivers/block/loop.c index da693e6a834e..0c99ee0b42a8 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -81,6 +81,10 @@ #include "loop.h" +#ifdef CONFIG_BLK_DEV_LOOPFS +#include "loopfs/loopfs.h" +#endif + #include static DEFINE_IDR(loop_index_idr); @@ -1115,6 +1119,24 @@ loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer, return err; } +static void loop_remove(struct loop_device *lo) +{ + del_gendisk(lo->lo_disk); + blk_cleanup_queue(lo->lo_queue); + blk_mq_free_tag_set(&lo->tag_set); + put_disk(lo->lo_disk); +#ifdef CONFIG_BLK_DEV_LOOPFS + loopfs_remove(lo); +#endif + kfree(lo); +} + +static inline void __loop_remove(struct loop_device *lo) +{ + idr_remove(&loop_index_idr, lo->lo_number); + loop_remove(lo); +} + static int __loop_clr_fd(struct loop_device *lo, bool release) { struct file *filp = NULL; @@ -1164,7 +1186,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release) } set_capacity(lo->lo_disk, 0); loop_sysfs_exit(lo); - if (bdev) { + if (bdev && bdev->bd_disk) { bd_set_size(bdev, 0); /* let user-space know about this change */ kobject_uevent(&disk_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE); @@ -1174,7 +1196,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release) module_put(THIS_MODULE); 
blk_mq_unfreeze_queue(lo->lo_queue); - partscan = lo->lo_flags & LO_FLAGS_PARTSCAN && bdev; + partscan = lo->lo_flags & LO_FLAGS_PARTSCAN && bdev && bdev->bd_disk; lo_number = lo->lo_number; loop_unprepare_queue(lo); out_unlock: @@ -1213,7 +1235,12 @@ static int __loop_clr_fd(struct loop_device *lo, bool release) lo->lo_flags = 0; if (!part_shift) lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN; - lo->lo_state = Lo_unbound; +#ifdef CONFIG_BLK_DEV_LOOPFS + if (loopfs_wants_remove(lo)) + __loop_remove(lo); + else +#endif + lo->lo_state = Lo_unbound; mutex_unlock(&loop_ctl_mutex); /* @@ -1259,6 +1286,74 @@ static int loop_clr_fd(struct loop_device *lo) return __loop_clr_fd(lo, false); } +#ifdef CONFIG_BLK_DEV_LOOPFS +int loopfs_rundown_locked(struct loop_device *lo) +{ + int ret; + + if (WARN_ON_ONCE(!loopfs_device(lo))) + return -EINVAL; + + ret = mutex_lock_killable(&loop_ctl_mutex); + if (ret) + return ret; + + if (lo->lo_state != Lo_unbound || atomic_read(&lo->lo_refcnt) > 0) { + ret = -EBUSY; + } else { + /* + * Since the device is unbound it has no associated backing + * file and we can safely set Lo_rundown to prevent it from + * being found. Actual cleanup happens during inode eviction. + */ + lo->lo_state = Lo_rundown; + ret = 0; + } + + mutex_unlock(&loop_ctl_mutex); + return ret; +} + +/** + * loopfs_evict_locked() - remove loop device or mark inactive + * @lo: loopfs loop device + * + * This function will remove a loop device. If it has no users + * and is bound the backing file will be cleaned up. If the loop + * device has users it will be marked for auto cleanup. + * This function is only called when a loopfs instance is shutdown + * when all references to it from this loopfs instance have been + * dropped. If there are still any references to it cleanup will + * happen in lo_release(). 
+ */ +void loopfs_evict_locked(struct loop_device *lo) +{ + struct lo_loopfs *lo_info; + struct inode *lo_inode; + + WARN_ON_ONCE(!loopfs_device(lo)); + + mutex_lock(&loop_ctl_mutex); + lo_info = lo->lo_info; + lo_inode = lo_info->lo_inode; + lo_info->lo_inode = NULL; + lo_info->lo_flags |= LOOPFS_FLAGS_INACTIVE; + + if (atomic_read(&lo->lo_refcnt) > 0) { + lo->lo_flags |= LO_FLAGS_AUTOCLEAR; + } else { + lo->lo_state = Lo_rundown; + lo->lo_disk->private_data = NULL; + lo_inode->i_private = NULL; + + mutex_unlock(&loop_ctl_mutex); + __loop_clr_fd(lo, false); + return; + } + mutex_unlock(&loop_ctl_mutex); +} +#endif /* CONFIG_BLK_DEV_LOOPFS */ + static int loop_set_status(struct loop_device *lo, const struct loop_info64 *info) { @@ -1842,7 +1937,7 @@ static void lo_release(struct gendisk *disk, fmode_t mode) if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) { if (lo->lo_state != Lo_bound) - goto out_unlock; + goto out_remove; lo->lo_state = Lo_rundown; mutex_unlock(&loop_ctl_mutex); /* @@ -1860,6 +1955,12 @@ static void lo_release(struct gendisk *disk, fmode_t mode) blk_mq_unfreeze_queue(lo->lo_queue); } +out_remove: +#ifdef CONFIG_BLK_DEV_LOOPFS + if (lo->lo_state != Lo_bound && loopfs_wants_remove(lo)) + __loop_remove(lo); +#endif + out_unlock: mutex_unlock(&loop_ctl_mutex); } @@ -2006,7 +2107,7 @@ static const struct blk_mq_ops loop_mq_ops = { .complete = lo_complete_rq, }; -static int loop_add(struct loop_device **l, int i) +static int loop_add(struct loop_device **l, int i, struct inode *inode) { struct loop_device *lo; struct gendisk *disk; @@ -2096,7 +2197,17 @@ static int loop_add(struct loop_device **l, int i) disk->private_data = lo; disk->queue = lo->lo_queue; sprintf(disk->disk_name, "loop%d", i); + add_disk(disk); + +#ifdef CONFIG_BLK_DEV_LOOPFS + err = loopfs_add(lo, inode, disk_devt(disk)); + if (err) { + __loop_remove(lo); + goto out; + } +#endif + *l = lo; return lo->lo_number; @@ -2112,36 +2223,41 @@ static int loop_add(struct loop_device **l, int i) return 
err; } -static void loop_remove(struct loop_device *lo) -{ - del_gendisk(lo->lo_disk); - blk_cleanup_queue(lo->lo_queue); - blk_mq_free_tag_set(&lo->tag_set); - put_disk(lo->lo_disk); - kfree(lo); -} +struct find_free_cb_data { + struct loop_device **l; + struct inode *inode; +}; static int find_free_cb(int id, void *ptr, void *data) { struct loop_device *lo = ptr; - struct loop_device **l = data; + struct find_free_cb_data *cb_data = data; - if (lo->lo_state == Lo_unbound) { - *l = lo; - return 1; - } - return 0; + if (lo->lo_state != Lo_unbound) + return 0; + +#ifdef CONFIG_BLK_DEV_LOOPFS + if (!loopfs_access(cb_data->inode, lo)) + return 0; +#endif + + *cb_data->l = lo; + return 1; } -static int loop_lookup(struct loop_device **l, int i) +static int loop_lookup(struct loop_device **l, int i, struct inode *inode) { struct loop_device *lo; int ret = -ENODEV; if (i < 0) { int err; + struct find_free_cb_data cb_data = { + .l = &lo, + .inode = inode, + }; - err = idr_for_each(&loop_index_idr, &find_free_cb, &lo); + err = idr_for_each(&loop_index_idr, &find_free_cb, &cb_data); if (err == 1) { *l = lo; ret = lo->lo_number; @@ -2152,6 +2268,11 @@ static int loop_lookup(struct loop_device **l, int i) /* lookup and return a specific i */ lo = idr_find(&loop_index_idr, i); if (lo) { +#ifdef CONFIG_BLK_DEV_LOOPFS + if (!loopfs_access(inode, lo)) + return -EACCES; +#endif + *l = lo; ret = lo->lo_number; } @@ -2166,9 +2287,9 @@ static struct kobject *loop_probe(dev_t dev, int *part, void *data) int err; mutex_lock(&loop_ctl_mutex); - err = loop_lookup(&lo, MINOR(dev) >> part_shift); + err = loop_lookup(&lo, MINOR(dev) >> part_shift, NULL); if (err < 0) - err = loop_add(&lo, MINOR(dev) >> part_shift); + err = loop_add(&lo, MINOR(dev) >> part_shift, NULL); if (err < 0) kobj = NULL; else @@ -2192,15 +2313,15 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd, ret = -ENOSYS; switch (cmd) { case LOOP_CTL_ADD: - ret = loop_lookup(&lo, parm); + ret = 
loop_lookup(&lo, parm, file_inode(file)); if (ret >= 0) { ret = -EEXIST; break; } - ret = loop_add(&lo, parm); + ret = loop_add(&lo, parm, file_inode(file)); break; case LOOP_CTL_REMOVE: - ret = loop_lookup(&lo, parm); + ret = loop_lookup(&lo, parm, file_inode(file)); if (ret < 0) break; if (lo->lo_state != Lo_unbound) { @@ -2212,14 +2333,13 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd, break; } lo->lo_disk->private_data = NULL; - idr_remove(&loop_index_idr, lo->lo_number); - loop_remove(lo); + __loop_remove(lo); break; case LOOP_CTL_GET_FREE: - ret = loop_lookup(&lo, -1); + ret = loop_lookup(&lo, -1, file_inode(file)); if (ret >= 0) break; - ret = loop_add(&lo, -1); + ret = loop_add(&lo, -1, file_inode(file)); } mutex_unlock(&loop_ctl_mutex); @@ -2307,7 +2427,7 @@ static int __init loop_init(void) /* pre-create number of devices given by config or max_loop */ mutex_lock(&loop_ctl_mutex); for (i = 0; i < nr; i++) - loop_add(&lo, i); + loop_add(&lo, i, NULL); mutex_unlock(&loop_ctl_mutex); printk(KERN_INFO "loop: module loaded\n"); diff --git a/drivers/block/loop.h b/drivers/block/loop.h index af75a5ee4094..2b3cd5bac71e 100644 --- a/drivers/block/loop.h +++ b/drivers/block/loop.h @@ -17,6 +17,10 @@ #include #include +#ifdef CONFIG_BLK_DEV_LOOPFS +#include "loopfs/loopfs.h" +#endif + /* Possible states of device */ enum { Lo_unbound, @@ -62,6 +66,9 @@ struct loop_device { struct request_queue *lo_queue; struct blk_mq_tag_set tag_set; struct gendisk *lo_disk; +#ifdef CONFIG_BLK_DEV_LOOPFS + struct lo_loopfs *lo_info; +#endif }; struct loop_cmd { diff --git a/drivers/block/loopfs/Makefile b/drivers/block/loopfs/Makefile new file mode 100644 index 000000000000..87ec703b662e --- /dev/null +++ b/drivers/block/loopfs/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0-only +loopfs-y := loopfs.o +obj-$(CONFIG_BLK_DEV_LOOPFS) += loopfs.o diff --git a/drivers/block/loopfs/loopfs.c b/drivers/block/loopfs/loopfs.c new file mode 100644 index 
000000000000..09cd5a919ea2 --- /dev/null +++ b/drivers/block/loopfs/loopfs.c @@ -0,0 +1,431 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../loop.h" +#include "loopfs.h" + +#define FIRST_INODE 1 +#define SECOND_INODE 2 +#define INODE_OFFSET 3 + +struct loopfs_info { + kuid_t root_uid; + kgid_t root_gid; + struct dentry *control_dentry; + struct user_namespace *user_ns; + atomic_t users; +}; + +static inline struct loopfs_info *LOOPFS_SB(const struct super_block *sb) +{ + return sb->s_fs_info; +} + +struct super_block *loopfs_i_sb(const struct inode *inode) +{ + if (inode && inode->i_sb->s_magic == LOOPFS_SUPER_MAGIC) + return inode->i_sb; + + return NULL; +} + +bool loopfs_device(const struct loop_device *lo) +{ + return lo->lo_info != NULL; +} + +struct user_namespace *loopfs_ns(const struct loop_device *lo) +{ + if (loopfs_device(lo)) + return lo->lo_info->sbi->user_ns; + return &init_user_ns; +} + +bool loopfs_access(const struct inode *first, struct loop_device *lo) +{ + struct inode *second = NULL; + + if (loopfs_device(lo)) { + second = lo->lo_info->lo_inode; + if (!second) + return false; /* loopfs already gone */ + } + return loopfs_i_sb(first) == loopfs_i_sb(second); +} + +bool loopfs_wants_remove(const struct loop_device *lo) +{ + return loopfs_device(lo) && + (lo->lo_info->lo_flags & LOOPFS_FLAGS_INACTIVE); +} + +/** + * loopfs_add - allocate inode from super block of a loopfs mount + * @lo: loop device for which we are creating a new device entry + * @ref_inode: inode from which the super block will be taken + * @device_nr: device number of the associated disk device + * + * This function creates a new device node for @lo. + * Minor numbers are limited and tracked globally. The + * function will stash a struct loop_device for the specific loop + * device in i_private of the inode. 
+ * It will go on to allocate a new inode from the super block of the + * filesystem mount, stash a struct loop_device in its i_private field + * and attach a dentry to that inode. + * + * Return: 0 on success, negative errno on failure + */ +int loopfs_add(struct loop_device *lo, struct inode *ref_inode, dev_t device_nr) +{ + int ret; + char name[DISK_NAME_LEN]; + struct super_block *sb; + struct loopfs_info *info; + struct dentry *root, *dentry; + struct inode *inode; + struct lo_loopfs *lo_info; + + sb = loopfs_i_sb(ref_inode); + if (!sb) + return 0; + + if (MAJOR(device_nr) != LOOP_MAJOR) + return -EINVAL; + + lo_info = kzalloc(sizeof(struct lo_loopfs), GFP_KERNEL); + if (!lo_info) { + /* Nothing to undo yet; the err: path would dereference NULL. */ + return -ENOMEM; + } + + info = LOOPFS_SB(sb); + lo_info->lo_ucount = inc_ucount(sb->s_user_ns, + info->root_uid, UCOUNT_LOOP_DEVICES); + if (!lo_info->lo_ucount) { + ret = -ENOSPC; + goto err; + } + + if (snprintf(name, sizeof(name), "loop%d", lo->lo_number) >= sizeof(name)) { + ret = -EINVAL; + goto err; + } + + inode = new_inode(sb); + if (!inode) { + ret = -ENOMEM; + goto err; + } + + /* + * The i_fop field will be set to the correct fops by the device layer + * when the loop device in this loopfs instance is opened. 
+ */ + inode->i_ino = MINOR(device_nr) + INODE_OFFSET; + inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); + inode->i_uid = info->root_uid; + inode->i_gid = info->root_gid; + init_special_inode(inode, S_IFBLK | 0600, device_nr); + + root = sb->s_root; + inode_lock(d_inode(root)); + /* look it up */ + dentry = lookup_one_len(name, root, strlen(name)); + if (IS_ERR(dentry)) { + inode_unlock(d_inode(root)); + iput(inode); + ret = PTR_ERR(dentry); + goto err; + } + + if (d_really_is_positive(dentry)) { + /* already exists */ + dput(dentry); + inode_unlock(d_inode(root)); + iput(inode); + ret = -EEXIST; + goto err; + } + + d_instantiate(dentry, inode); + fsnotify_create(d_inode(root), dentry); + inode_unlock(d_inode(root)); + + lo_info->lo_inode = inode; + lo->lo_info = lo_info; + atomic_inc(&info->users); + lo->lo_info->sbi = info; + inode->i_private = lo; + + return 0; + +err: + if (lo_info->lo_ucount) + dec_ucount(lo_info->lo_ucount, UCOUNT_LOOP_DEVICES); + kfree(lo_info); + return ret; +} + +void loopfs_remove(struct loop_device *lo) +{ + struct lo_loopfs *lo_info = lo->lo_info; + struct loopfs_info *sbi; + struct inode *inode; + struct super_block *sb; + struct dentry *root, *dentry; + + if (!lo_info) + return; + + inode = lo_info->lo_inode; + if (!inode || !S_ISBLK(inode->i_mode) || imajor(inode) != LOOP_MAJOR) + goto out; + + sb = loopfs_i_sb(inode); + lo_info->lo_inode = NULL; + + /* + * The root dentry is always the parent dentry since we don't allow + * creation of directories. 
+ */ + root = sb->s_root; + + inode_lock(d_inode(root)); + dentry = d_find_any_alias(inode); + if (dentry && simple_positive(dentry)) { + simple_unlink(d_inode(root), dentry); + d_delete(dentry); + } + dput(dentry); + inode_unlock(d_inode(root)); + +out: + if (lo_info->lo_ucount) + dec_ucount(lo_info->lo_ucount, UCOUNT_LOOP_DEVICES); + sbi = lo_info->sbi; + if (atomic_dec_and_test(&sbi->users)) { + put_user_ns(sbi->user_ns); + kfree(sbi); + } + kfree(lo->lo_info); + lo->lo_info = NULL; +} + +/** + * loopfs_loop_ctl_create - create a new loop-control device + * @sb: super block of the loopfs mount + * + * This function creates a new loop-control device node in the loopfs mount + * referred to by @sb. + * + * Return: 0 on success, negative errno on failure + */ +static int loopfs_loop_ctl_create(struct super_block *sb) +{ + struct dentry *dentry; + struct inode *inode = NULL; + struct dentry *root = sb->s_root; + struct loopfs_info *info = sb->s_fs_info; + + if (info->control_dentry) + return 0; + + inode = new_inode(sb); + if (!inode) + return -ENOMEM; + + inode->i_ino = SECOND_INODE; + inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); + init_special_inode(inode, S_IFCHR | 0600, + MKDEV(MISC_MAJOR, LOOP_CTRL_MINOR)); + /* + * The i_fop field will be set to the correct fops by the device layer + * when the loop-control device in this loopfs instance is opened. 
+ */ + inode->i_uid = info->root_uid; + inode->i_gid = info->root_gid; + + dentry = d_alloc_name(root, "loop-control"); + if (!dentry) { + iput(inode); + return -ENOMEM; + } + + info->control_dentry = dentry; + d_add(dentry, inode); + + return 0; +} + +static inline bool is_loopfs_control_device(const struct dentry *dentry) +{ + return LOOPFS_SB(dentry->d_sb)->control_dentry == dentry; +} + +static int loopfs_rename(struct inode *old_dir, struct dentry *old_dentry, + struct inode *new_dir, struct dentry *new_dentry, + unsigned int flags) +{ + if (is_loopfs_control_device(old_dentry) || + is_loopfs_control_device(new_dentry)) + return -EPERM; + + return simple_rename(old_dir, old_dentry, new_dir, new_dentry, flags); +} + +static int loopfs_unlink(struct inode *dir, struct dentry *dentry) +{ + int ret; + struct loop_device *lo; + + if (is_loopfs_control_device(dentry)) + return -EPERM; + + lo = d_inode(dentry)->i_private; + ret = loopfs_rundown_locked(lo); + if (ret) + return ret; + + return simple_unlink(dir, dentry); +} + +static const struct inode_operations loopfs_dir_inode_operations = { + .lookup = simple_lookup, + .rename = loopfs_rename, + .unlink = loopfs_unlink, +}; + +static void loopfs_evict_inode(struct inode *inode) +{ + struct loop_device *lo = inode->i_private; + + clear_inode(inode); + + if (lo && S_ISBLK(inode->i_mode) && imajor(inode) == LOOP_MAJOR) { + loopfs_evict_locked(lo); + inode->i_private = NULL; + } +} + +static const struct super_operations loopfs_super_ops = { + .evict_inode = loopfs_evict_inode, + .statfs = simple_statfs, +}; + +static int loopfs_fill_super(struct super_block *sb, struct fs_context *fc) +{ + struct loopfs_info *info; + struct inode *inode = NULL; + + sb->s_blocksize = PAGE_SIZE; + sb->s_blocksize_bits = PAGE_SHIFT; + + sb->s_iflags &= ~SB_I_NODEV; + sb->s_iflags |= SB_I_NOEXEC; + sb->s_magic = LOOPFS_SUPER_MAGIC; + sb->s_op = &loopfs_super_ops; + sb->s_time_gran = 1; + + sb->s_fs_info = kzalloc(sizeof(struct 
loopfs_info), GFP_KERNEL); + if (!sb->s_fs_info) + return -ENOMEM; + info = sb->s_fs_info; + + info->root_gid = make_kgid(sb->s_user_ns, 0); + if (!gid_valid(info->root_gid)) + info->root_gid = GLOBAL_ROOT_GID; + info->root_uid = make_kuid(sb->s_user_ns, 0); + if (!uid_valid(info->root_uid)) + info->root_uid = GLOBAL_ROOT_UID; + info->user_ns = get_user_ns(sb->s_user_ns); + atomic_set(&info->users, 1); + + inode = new_inode(sb); + if (!inode) + return -ENOMEM; + + inode->i_ino = FIRST_INODE; + inode->i_fop = &simple_dir_operations; + inode->i_mode = S_IFDIR | 0755; + inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); + inode->i_op = &loopfs_dir_inode_operations; + set_nlink(inode, 2); + + sb->s_root = d_make_root(inode); + if (!sb->s_root) + return -ENOMEM; + + return loopfs_loop_ctl_create(sb); +} + +static int loopfs_fs_context_get_tree(struct fs_context *fc) +{ + return get_tree_nodev(fc, loopfs_fill_super); +} + +static void loopfs_fs_context_free(struct fs_context *fc) +{ + struct loopfs_info *sbi = fc->s_fs_info; + + fc->s_fs_info = NULL; + if (sbi && atomic_dec_and_test(&sbi->users)) { + put_user_ns(sbi->user_ns); + kfree(sbi); + } +} + +static const struct fs_context_operations loopfs_fs_context_ops = { + .free = loopfs_fs_context_free, + .get_tree = loopfs_fs_context_get_tree, +}; + +static int loopfs_init_fs_context(struct fs_context *fc) +{ + fc->ops = &loopfs_fs_context_ops; + return 0; +} + +static void loopfs_kill_sb(struct super_block *sb) +{ + struct loopfs_info *sbi = sb->s_fs_info; + + sb->s_fs_info = NULL; + if (atomic_dec_and_test(&sbi->users)) { + put_user_ns(sbi->user_ns); + kfree(sbi); + } + + kill_litter_super(sb); +} + +static struct file_system_type loop_fs_type = { + .name = "loop", + .init_fs_context = loopfs_init_fs_context, + .kill_sb = loopfs_kill_sb, + .fs_flags = FS_USERNS_MOUNT, +}; + +int __init init_loopfs(void) +{ + init_user_ns.ucount_max[UCOUNT_LOOP_DEVICES] = 255; + return 
register_filesystem(&loop_fs_type); +} + +module_init(init_loopfs); +MODULE_AUTHOR("Christian Brauner "); +MODULE_DESCRIPTION("Loop device filesystem"); diff --git a/drivers/block/loopfs/loopfs.h b/drivers/block/loopfs/loopfs.h new file mode 100644 index 000000000000..225d844f5a01 --- /dev/null +++ b/drivers/block/loopfs/loopfs.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_LOOPFS_FS_H +#define _LINUX_LOOPFS_FS_H + +#include +#include +#include +#include + +struct loop_device; + +#ifdef CONFIG_BLK_DEV_LOOPFS + +#define LOOPFS_FLAGS_INACTIVE (1 << 0) + +struct lo_loopfs { + struct loopfs_info *sbi; + struct ucounts *lo_ucount; + struct inode *lo_inode; + int lo_flags; +}; + +extern struct super_block *loopfs_i_sb(const struct inode *inode); +extern bool loopfs_device(const struct loop_device *lo); +extern struct user_namespace *loopfs_ns(const struct loop_device *lo); +extern bool loopfs_access(const struct inode *first, struct loop_device *lo); +extern int loopfs_add(struct loop_device *lo, struct inode *ref_inode, + dev_t device_nr); +extern void loopfs_remove(struct loop_device *lo); +extern bool loopfs_wants_remove(const struct loop_device *lo); +extern void loopfs_evict_locked(struct loop_device *lo); +extern int loopfs_rundown_locked(struct loop_device *lo); + +#endif + +#endif /* _LINUX_LOOPFS_FS_H */ diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 6ef1c7109fc4..04a4891765c0 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -49,6 +49,9 @@ enum ucount_type { #ifdef CONFIG_INOTIFY_USER UCOUNT_INOTIFY_INSTANCES, UCOUNT_INOTIFY_WATCHES, +#endif +#ifdef CONFIG_BLK_DEV_LOOPFS + UCOUNT_LOOP_DEVICES, #endif UCOUNT_COUNTS, }; diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h index d78064007b17..0817d093a012 100644 --- a/include/uapi/linux/magic.h +++ b/include/uapi/linux/magic.h @@ -75,6 +75,7 @@ #define BINFMTFS_MAGIC 0x42494e4d #define 
DEVPTS_SUPER_MAGIC 0x1cd1 #define BINDERFS_SUPER_MAGIC 0x6c6f6f70 +#define LOOPFS_SUPER_MAGIC 0x6c6f6f71 #define FUTEXFS_SUPER_MAGIC 0xBAD1DEA #define PIPEFS_MAGIC 0x50495045 #define PROC_SUPER_MAGIC 0x9fa0 diff --git a/kernel/ucount.c b/kernel/ucount.c index 11b1596e2542..fb0f6394a8bb 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -73,6 +73,9 @@ static struct ctl_table user_table[] = { #ifdef CONFIG_INOTIFY_USER UCOUNT_ENTRY("max_inotify_instances"), UCOUNT_ENTRY("max_inotify_watches"), +#endif +#ifdef CONFIG_BLK_DEV_LOOPFS + UCOUNT_ENTRY("max_loop_devices"), #endif { } }; From patchwork Fri Apr 24 16:20:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 220600 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF5E8C55196 for ; Fri, 24 Apr 2020 16:22:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CE0A320700 for ; Fri, 24 Apr 2020 16:22:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728838AbgDXQWo (ORCPT ); Fri, 24 Apr 2020 12:22:44 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:58612 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728325AbgDXQWB (ORCPT ); Fri, 24 Apr 2020 12:22:01 -0400 Received: from ip5f5af183.dynamic.kabel-deutschland.de ([95.90.241.131] helo=wittgenstein.fritz.box) by youngberry.canonical.com with esmtpsa 
(TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jS15R-0004dV-QH; Fri, 24 Apr 2020 16:21:57 +0000
From: Christian Brauner
To: Jens Axboe , Greg Kroah-Hartman , linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-api@vger.kernel.org
Cc: Jonathan Corbet , Serge Hallyn , "Rafael J. Wysocki" , Tejun Heo , "David S. Miller" , Christian Brauner , Saravana Kannan , Jan Kara , David Howells , Seth Forshee , David Rheinsberg , Tom Gundersen , Christian Kellner , Dmitry Vyukov , =?utf-8?q?St=C3=A9phane_Gr?= =?utf-8?q?aber?= , linux-doc@vger.kernel.org, netdev@vger.kernel.org, Steve Barber , Dylan Reid , Filipe Brandenburger , Kees Cook , Benjamin Elder , Akihiro Suda
Subject: [PATCH v3 3/7] loop: use ns_capable for some loop operations
Date: Fri, 24 Apr 2020 18:20:48 +0200
Message-Id: <20200424162052.441452-4-christian.brauner@ubuntu.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200424162052.441452-1-christian.brauner@ubuntu.com>
References: <20200424162052.441452-1-christian.brauner@ubuntu.com>
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

The LOOP_GET_STATUS, LOOP_SET_STATUS, and LOOP_SET_BLOCK_SIZE
operations are now allowed in non-initial user namespaces. Most other
operations were already possible before.

Cc: Jens Axboe
Cc: Seth Forshee
Cc: Tom Gundersen
Cc: Tejun Heo
Cc: Christian Kellner
Cc: Greg Kroah-Hartman
Cc: "David S. Miller"
Cc: David Rheinsberg
Cc: Dmitry Vyukov
Cc: "Rafael J. Wysocki"
Reviewed-by: Serge Hallyn
Signed-off-by: Christian Brauner
---
/* v2 */
- Christian Brauner :
  - Adapted loop_capable() based on changes in the loopfs
    implementation patchset. Otherwise it is functionally equivalent to
    the v1 version.
/* v3 */ unchanged --- drivers/block/loop.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 0c99ee0b42a8..40705f5aeabd 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1352,6 +1352,16 @@ void loopfs_evict_locked(struct loop_device *lo) } mutex_unlock(&loop_ctl_mutex); } + +static bool loop_capable(const struct loop_device *lo, int cap) +{ + return ns_capable(loopfs_ns(lo), cap); +} +#else /* !CONFIG_BLK_DEV_LOOPFS */ +static inline bool loop_capable(const struct loop_device *lo, int cap) +{ + return capable(cap); +} #endif /* CONFIG_BLK_DEV_LOOPFS */ static int @@ -1368,7 +1378,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info) return err; if (lo->lo_encrypt_key_size && !uid_eq(lo->lo_key_owner, uid) && - !capable(CAP_SYS_ADMIN)) { + !loop_capable(lo, CAP_SYS_ADMIN)) { err = -EPERM; goto out_unlock; } @@ -1499,7 +1509,7 @@ loop_get_status(struct loop_device *lo, struct loop_info64 *info) memcpy(info->lo_crypt_name, lo->lo_crypt_name, LO_NAME_SIZE); info->lo_encrypt_type = lo->lo_encryption ? 
lo->lo_encryption->number : 0; - if (lo->lo_encrypt_key_size && capable(CAP_SYS_ADMIN)) { + if (lo->lo_encrypt_key_size && loop_capable(lo, CAP_SYS_ADMIN)) { info->lo_encrypt_key_size = lo->lo_encrypt_key_size; memcpy(info->lo_encrypt_key, lo->lo_encrypt_key, lo->lo_encrypt_key_size); @@ -1723,7 +1733,7 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode, return loop_clr_fd(lo); case LOOP_SET_STATUS: err = -EPERM; - if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) { + if ((mode & FMODE_WRITE) || loop_capable(lo, CAP_SYS_ADMIN)) { err = loop_set_status_old(lo, (struct loop_info __user *)arg); } @@ -1732,7 +1742,7 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode, return loop_get_status_old(lo, (struct loop_info __user *) arg); case LOOP_SET_STATUS64: err = -EPERM; - if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) { + if ((mode & FMODE_WRITE) || loop_capable(lo, CAP_SYS_ADMIN)) { err = loop_set_status64(lo, (struct loop_info64 __user *) arg); } @@ -1742,7 +1752,7 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode, case LOOP_SET_CAPACITY: case LOOP_SET_DIRECT_IO: case LOOP_SET_BLOCK_SIZE: - if (!(mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN)) + if (!(mode & FMODE_WRITE) && !loop_capable(lo, CAP_SYS_ADMIN)) return -EPERM; /* Fall through */ default: From patchwork Fri Apr 24 16:20:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 220602 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2643C5519A for ; Fri, 
24 Apr 2020 16:22:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CD49F20700 for ; Fri, 24 Apr 2020 16:22:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728671AbgDXQWO (ORCPT ); Fri, 24 Apr 2020 12:22:14 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:58648 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728538AbgDXQWF (ORCPT ); Fri, 24 Apr 2020 12:22:05 -0400 Received: from ip5f5af183.dynamic.kabel-deutschland.de ([95.90.241.131] helo=wittgenstein.fritz.box) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jS15W-0004dV-Nd; Fri, 24 Apr 2020 16:22:02 +0000 From: Christian Brauner To: Jens Axboe , Greg Kroah-Hartman , linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-api@vger.kernel.org Cc: Jonathan Corbet , Serge Hallyn , "Rafael J. Wysocki" , Tejun Heo , "David S. Miller" , Christian Brauner , Saravana Kannan , Jan Kara , David Howells , Seth Forshee , David Rheinsberg , Tom Gundersen , Christian Kellner , Dmitry Vyukov , =?utf-8?q?St=C3=A9phane_Gr?= =?utf-8?q?aber?= , linux-doc@vger.kernel.org, netdev@vger.kernel.org, Steve Barber , Dylan Reid , Filipe Brandenburger , Kees Cook , Benjamin Elder , Akihiro Suda Subject: [PATCH v3 6/7] loopfs: start attaching correct namespace during loop_add() Date: Fri, 24 Apr 2020 18:20:51 +0200 Message-Id: <20200424162052.441452-7-christian.brauner@ubuntu.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200424162052.441452-1-christian.brauner@ubuntu.com> References: <20200424162052.441452-1-christian.brauner@ubuntu.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Tag loop devices with the namespace the loopfs instance was mounted in. 
This has the consequence that loopfs devices carry the correct sysfs
permissions for all their core files. All other device files will
continue to be correctly owned by the initial namespace.

Here is sample output:

    root@b1:~# mount -t loop loop /mnt
    root@b1:~# ln -sf /mnt/loop-control /dev/loop-control
    root@b1:~# losetup -f
    /dev/loop8
    root@b1:~# ln -sf /mnt/loop8 /dev/loop8
    root@b1:~# ls -al /sys/class/block/loop8
    lrwxrwxrwx 1 root root 0 Apr  7 13:06 /sys/class/block/loop8 -> ../../devices/virtual/block/loop8
    root@b1:~# ls -al /sys/class/block/loop8/
    total 0
    drwxr-xr-x  9 root   root       0 Apr  7 13:06 .
    drwxr-xr-x 18 nobody nogroup    0 Apr  7 13:07 ..
    -r--r--r--  1 root   root    4096 Apr  7 13:06 alignment_offset
    lrwxrwxrwx  1 nobody nogroup    0 Apr  7 13:07 bdi -> ../../bdi/7:8
    -r--r--r--  1 root   root    4096 Apr  7 13:06 capability
    -r--r--r--  1 root   root    4096 Apr  7 13:06 dev
    -r--r--r--  1 root   root    4096 Apr  7 13:06 discard_alignment
    -r--r--r--  1 root   root    4096 Apr  7 13:06 events
    -r--r--r--  1 root   root    4096 Apr  7 13:06 events_async
    -rw-r--r--  1 root   root    4096 Apr  7 13:06 events_poll_msecs
    -r--r--r--  1 root   root    4096 Apr  7 13:06 ext_range
    -r--r--r--  1 root   root    4096 Apr  7 13:06 hidden
    drwxr-xr-x  2 nobody nogroup    0 Apr  7 13:07 holders
    -r--r--r--  1 root   root    4096 Apr  7 13:06 inflight
    drwxr-xr-x  2 nobody nogroup    0 Apr  7 13:07 integrity
    drwxr-xr-x  3 nobody nogroup    0 Apr  7 13:07 mq
    drwxr-xr-x  2 root   root       0 Apr  7 13:06 power
    drwxr-xr-x  3 nobody nogroup    0 Apr  7 13:07 queue
    -r--r--r--  1 root   root    4096 Apr  7 13:06 range
    -r--r--r--  1 root   root    4096 Apr  7 13:06 removable
    -r--r--r--  1 root   root    4096 Apr  7 13:06 ro
    -r--r--r--  1 root   root    4096 Apr  7 13:06 size
    drwxr-xr-x  2 nobody nogroup    0 Apr  7 13:07 slaves
    -r--r--r--  1 root   root    4096 Apr  7 13:06 stat
    lrwxrwxrwx  1 nobody nogroup    0 Apr  7 13:07 subsystem -> ../../../../class/block
    drwxr-xr-x  2 root   root       0 Apr  7 13:06 trace
    -rw-r--r--  1 root   root    4096 Apr  7 13:06 uevent
    root@b1:~#

Cc: Jens Axboe
Reviewed-by: Serge Hallyn
Signed-off-by: Christian Brauner
--- /* v2 */ unchanged - Christian Brauner : - Adapted commit message otherwise unchanged. /* v3 */ unchanged --- drivers/block/loop.c | 3 +++ drivers/block/loopfs/loopfs.c | 6 ++++++ drivers/block/loopfs/loopfs.h | 1 + 3 files changed, 10 insertions(+) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 40705f5aeabd..a5fe05cba896 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -2207,6 +2207,9 @@ static int loop_add(struct loop_device **l, int i, struct inode *inode) disk->private_data = lo; disk->queue = lo->lo_queue; sprintf(disk->disk_name, "loop%d", i); +#ifdef CONFIG_BLK_DEV_LOOPFS + loopfs_init(disk, inode); +#endif add_disk(disk); diff --git a/drivers/block/loopfs/loopfs.c b/drivers/block/loopfs/loopfs.c index 09cd5a919ea2..9fa60c1bcc05 100644 --- a/drivers/block/loopfs/loopfs.c +++ b/drivers/block/loopfs/loopfs.c @@ -74,6 +74,12 @@ bool loopfs_wants_remove(const struct loop_device *lo) (lo->lo_info->lo_flags & LOOPFS_FLAGS_INACTIVE); } +void loopfs_init(struct gendisk *disk, struct inode *inode) +{ + if (loopfs_i_sb(inode)) + disk->user_ns = loopfs_i_sb(inode)->s_user_ns; +} + /** * loopfs_add - allocate inode from super block of a loopfs mount * @lo: loop device for which we are creating a new device entry diff --git a/drivers/block/loopfs/loopfs.h b/drivers/block/loopfs/loopfs.h index 225d844f5a01..7ca1b872b36e 100644 --- a/drivers/block/loopfs/loopfs.h +++ b/drivers/block/loopfs/loopfs.h @@ -31,6 +31,7 @@ extern void loopfs_remove(struct loop_device *lo); extern bool loopfs_wants_remove(const struct loop_device *lo); extern void loopfs_evict_locked(struct loop_device *lo); extern int loopfs_rundown_locked(struct loop_device *lo); +extern void loopfs_init(struct gendisk *disk, struct inode *inode); #endif From patchwork Fri Apr 24 16:20:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 220601 Return-Path: 
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4D65C5519D for ; Fri, 24 Apr 2020 16:22:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BCB9E20700 for ; Fri, 24 Apr 2020 16:22:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728757AbgDXQWS (ORCPT ); Fri, 24 Apr 2020 12:22:18 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:58670 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728561AbgDXQWH (ORCPT ); Fri, 24 Apr 2020 12:22:07 -0400 Received: from ip5f5af183.dynamic.kabel-deutschland.de ([95.90.241.131] helo=wittgenstein.fritz.box) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jS15Y-0004dV-8F; Fri, 24 Apr 2020 16:22:04 +0000 From: Christian Brauner To: Jens Axboe , Greg Kroah-Hartman , linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-api@vger.kernel.org Cc: Jonathan Corbet , Serge Hallyn , "Rafael J. Wysocki" , Tejun Heo , "David S. 
Miller" , Christian Brauner , Saravana Kannan , Jan Kara , David Howells , Seth Forshee , David Rheinsberg , Tom Gundersen , Christian Kellner , Dmitry Vyukov , =?utf-8?q?St=C3=A9phane_Gr?= =?utf-8?q?aber?= , linux-doc@vger.kernel.org, netdev@vger.kernel.org, Steve Barber , Dylan Reid , Filipe Brandenburger , Kees Cook , Benjamin Elder , Akihiro Suda Subject: [PATCH v3 7/7] loopfs: only show devices in their correct instance Date: Fri, 24 Apr 2020 18:20:52 +0200 Message-Id: <20200424162052.441452-8-christian.brauner@ubuntu.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200424162052.441452-1-christian.brauner@ubuntu.com> References: <20200424162052.441452-1-christian.brauner@ubuntu.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Since loopfs devices belong to a loopfs instance they have no business polluting the host's devtmpfs mount and should not propagate out of the namespace they belong to. Cc: Jens Axboe Cc: Greg Kroah-Hartman Reviewed-by: Serge Hallyn Signed-off-by: Christian Brauner --- /* v2 */ unchanged /* v3 */ unchanged --- block/partitions/core.c | 1 + drivers/base/devtmpfs.c | 4 ++-- drivers/block/loopfs/loopfs.c | 4 +++- include/linux/device.h | 3 +++ 4 files changed, 9 insertions(+), 3 deletions(-) diff --git a/block/partitions/core.c b/block/partitions/core.c index bc1ded1331b1..5761f5c38588 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -416,6 +416,7 @@ struct hd_struct *add_partition(struct gendisk *disk, int partno, pdev->class = &block_class; pdev->type = &part_type; pdev->parent = ddev; + pdev->no_devnode = ddev->no_devnode; err = blk_alloc_devt(p, &devt); if (err) diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c index c9017e0584c0..77371ceb88fa 100644 --- a/drivers/base/devtmpfs.c +++ b/drivers/base/devtmpfs.c @@ -111,7 +111,7 @@ int devtmpfs_create_node(struct device *dev) const char *tmp = NULL; struct req req; - if 
(!thread) + if (!thread || dev->no_devnode) return 0; req.mode = 0; @@ -138,7 +138,7 @@ int devtmpfs_delete_node(struct device *dev) const char *tmp = NULL; struct req req; - if (!thread) + if (!thread || dev->no_devnode) return 0; req.name = device_get_devnode(dev, NULL, NULL, NULL, &tmp); diff --git a/drivers/block/loopfs/loopfs.c b/drivers/block/loopfs/loopfs.c index 9fa60c1bcc05..1bcb0b44c910 100644 --- a/drivers/block/loopfs/loopfs.c +++ b/drivers/block/loopfs/loopfs.c @@ -76,8 +76,10 @@ bool loopfs_wants_remove(const struct loop_device *lo) void loopfs_init(struct gendisk *disk, struct inode *inode) { - if (loopfs_i_sb(inode)) + if (loopfs_i_sb(inode)) { disk->user_ns = loopfs_i_sb(inode)->s_user_ns; + disk_to_dev(disk)->no_devnode = true; + } } /** diff --git a/include/linux/device.h b/include/linux/device.h index ac8e37cd716a..c69ef1c5a0ef 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -523,6 +523,8 @@ struct dev_links_info { * sync_state() callback. * @dma_coherent: this particular device is dma coherent, even if the * architecture supports non-coherent devices. + * @no_devnode: whether device nodes associated with this device are kept out + * of devtmpfs (e.g. due to separate filesystem) * * At the lowest level, every device in a Linux system is represented by an * instance of struct device. The device structure contains the information @@ -622,6 +624,7 @@ struct device { defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) bool dma_coherent:1; #endif + bool no_devnode:1; }; static inline struct device *kobj_to_dev(struct kobject *kobj)
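
Since loopfs device nodes no longer appear in devtmpfs with this series, a userspace tool may want to check which filesystem a given node actually lives on. The following is a minimal sketch of such a check, assuming a kernel that carries this series and the LOOPFS_SUPER_MAGIC value from the include/uapi/linux/magic.h hunk in patch 2/7. The helper names is_loopfs_magic() and path_on_loopfs() are hypothetical and appear nowhere in the series; only statfs(2) is an existing interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/vfs.h>

/*
 * Value copied from the include/uapi/linux/magic.h hunk in patch 2/7.
 * Assumption: the running kernel carries this series; other kernels
 * never report this magic.
 */
#ifndef LOOPFS_SUPER_MAGIC
#define LOOPFS_SUPER_MAGIC 0x6c6f6f71
#endif

/* Hypothetical helper: pure comparison, usable without any mount. */
static bool is_loopfs_magic(unsigned long f_type)
{
	return f_type == LOOPFS_SUPER_MAGIC;
}

/*
 * Hypothetical helper: returns 1 if @path lives on a loopfs mount,
 * 0 if it lives on some other filesystem (e.g. devtmpfs), and -1 if
 * statfs(2) fails, in which case errno is set by statfs(2).
 */
static int path_on_loopfs(const char *path)
{
	struct statfs st;

	assert(path != NULL);
	if (statfs(path, &st) < 0)
		return -1;
	return is_loopfs_magic((unsigned long)st.f_type) ? 1 : 0;
}
```

statfs(2) follows symlinks, so with a setup like the one in the 6/7 sample output a symlink such as /dev/loop8 pointing at /mnt/loop8 should be reported according to the filesystem of its target, not of /dev.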