[v5,0/5] cramfs refresh for embedded usage

Message ID 20171006024531.8885-1-nicolas.pitre@linaro.org

Nicolas Pitre Oct. 6, 2017, 2:45 a.m. UTC
This series brings a nice refresh to the cramfs filesystem, adding the
following capabilities:

- Direct memory access, bypassing the block and/or MTD layers entirely.

- Ability to store individual data blocks uncompressed.

- Ability to locate individual data blocks anywhere in the filesystem.

The end result is a very tight filesystem that can be accessed directly
from ROM without any other subsystem underneath. This also allows for
user space XIP which is a very important feature for tiny embedded
systems.

This series is also available based on v4.13 via git here:

  http://git.linaro.org/people/nicolas.pitre/linux xipcramfs

Why cramfs?

  Because cramfs is very simple and small. With CONFIG_CRAMFS_BLOCK=n and
  CONFIG_CRAMFS_PHYSMEM=y the cramfs driver may use as little as 3704 bytes
  of code. That's many times smaller than squashfs. And the runtime memory
  usage is also much less with cramfs than squashfs. It packs very tightly
  already compared to romfs which has no compression support. And the cramfs
  format was simple to extend, allowing for both compressed and uncompressed
  blocks within the same file.
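
  To illustrate how compressed and uncompressed blocks can coexist in one
  file, here is a minimal sketch of decoding a per-block pointer word. The
  cover letter does not spell out the on-disk encoding, so the flag bit
  positions, the shift value and every name below (BLK_FLAG_*, blk_info,
  decode_blkptr) are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: the top two bits of each 32-bit block pointer are flags. */
#define BLK_FLAG_UNCOMPRESSED  (UINT32_C(1) << 31)  /* block is stored as-is */
#define BLK_FLAG_DIRECT_PTR    (UINT32_C(1) << 30)  /* pointer gives the block start */
#define BLK_FLAGS              (BLK_FLAG_UNCOMPRESSED | BLK_FLAG_DIRECT_PTR)
#define BLK_DIRECT_PTR_SHIFT   2                    /* assumed: direct pointers count 4-byte units */

struct blk_info {
	bool uncompressed;  /* true when the block needs no decompression */
	bool direct;        /* true when offset is the block's own start */
	uint32_t offset;    /* byte offset within the image */
};

/* Split a raw block pointer word into its flags and a byte offset. */
static struct blk_info decode_blkptr(uint32_t raw)
{
	struct blk_info b;

	b.uncompressed = (raw & BLK_FLAG_UNCOMPRESSED) != 0;
	b.direct = (raw & BLK_FLAG_DIRECT_PTR) != 0;
	b.offset = raw & ~BLK_FLAGS;
	if (b.direct)
		b.offset <<= BLK_DIRECT_PTR_SHIFT;
	return b;
}
```

  Under these assumptions, a block flagged both uncompressed and direct
  could be placed page-aligned anywhere in the image and mapped for XIP,
  while unflagged pointers keep their legacy meaning, which is what keeps
  existing images readable.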

Why not access ROM via MTD?

  The MTD layer is nice and flexible. It also represents a huge overhead
  considering its core with no other enabled options weighs 19KB.
  That's many times the size of the cramfs code for something that
  essentially boils down to a glorified argument parser and a call to
  memremap() in this case.  And if someone still wants to use cramfs via
  MTD then it is already possible with mtdblock.

Why not use DAX?

  DAX stands for "Direct Access" and is a generic kernel layer helping
  with the necessary tasks involved with XIP. It is tailored for large
  writable filesystems and relies on the presence of an MMU. It also has
  the following shortcoming: "The DAX code does not work correctly on
  architectures which have virtually mapped caches such as ARM, MIPS and
  SPARC." That makes it unsuitable for a large portion of the intended
  targets for this series. And due to the read-only nature of cramfs, it is
  possible to achieve the intended result with a much simpler approach making
  DAX somewhat overkill in this context.

A cramfs image can't exceed 272MB. In practice it is likely to be much,
much smaller. Since this series targets small memory systems, even in the
MMU case there is always plenty of vmalloc space left to map the whole
image, and even a 272MB memremap() wouldn't be a problem. If it is, then
your system probably already has large resources to manage and you're
unlikely to be using cramfs in the first place.
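
The 272MB figure can be reproduced with a little arithmetic. The sketch
below assumes the classic cramfs inode layout, with a 26-bit offset field
counted in 4-byte units and a 24-bit file size field; those widths are not
stated in this message, and the names are illustrative only. The last file
may start just below the 256MB mark and extend past it, which gives an
upper bound of about 256MB + 16MB.

```c
#include <stdint.h>

#define OFFSET_BITS 26  /* assumed: inode offset field width, in 4-byte units */
#define SIZE_BITS   24  /* assumed: inode size field width, in bytes */

/* Upper bound on image size: highest start offset plus largest file. */
static uint64_t cramfs_max_image_bytes(void)
{
	uint64_t max_start = (UINT64_C(1) << OFFSET_BITS) * 4;  /* 256MB */
	uint64_t max_file = UINT64_C(1) << SIZE_BITS;           /* 16MB */

	return max_start + max_file;
}
```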

Of course, while this cramfs remains backward compatible with existing
filesystem images, a newer mkcramfs version is necessary to take advantage
of the extended data layout. I created a version of mkcramfs that
detects ELF files and marks text+rodata segments for XIP and compresses the
rest of those ELF files automatically.

So here it is. I'm also willing to step up as cramfs maintainer, given
that there has been no sign of any maintenance activity for years.


Changes from v4:

- Remove fault handler with vma splitting in favor of VM_MIXEDMAP at mmap
  time for much simpler code. Thanks to Christoph Hellwig for review and
  suggestion.
- Additional code cleanups, mostly from Christoph's suggestions.

Changes from v3:

- Rebased on v4.13.
- Made direct access depend on cramfs not being modular due to unexported
  vma handling functions.
- Solicit comments from mm people explicitly.

Changes from v2:

- Plugged a few races in cramfs_vmasplit_fault(). Thanks to Al Viro for
  highlighting them.
- Fixed some checkpatch warnings.

Changes from v1:

- Improved mmap() support by adding the ability to partially populate a
  mapping and lazily split the pages that cannot be mapped directly into
  a separate vma at fault time (thanks to Chris Brandt for testing).
- Clarified the documentation some more.


diffstat:

 Documentation/filesystems/cramfs.txt |  42 +++
 MAINTAINERS                          |   4 +-
 fs/cramfs/Kconfig                    |  38 +-
 fs/cramfs/README                     |  31 +-
 fs/cramfs/inode.c                    | 554 +++++++++++++++++++++++++----
 include/uapi/linux/cramfs_fs.h       |  26 +-
 init/do_mounts.c                     |   8 +
 7 files changed, 625 insertions(+), 78 deletions(-)

Comments

Christoph Hellwig Oct. 6, 2017, 6:39 a.m. UTC | #1
This is still missing a proper API for accessing the file system.
As said before, specifying a physical address on the mount command
line is an absolute no-no.

Either work with the mtd folks to get the mtd core down to an absolute
minimum suitable for you, or figure out a way to specify fs nodes
through DT or similar.

Chris Brandt Oct. 6, 2017, 4:07 p.m. UTC | #2
On Friday, October 06, 2017, Christoph Hellwig wrote:
> This is still missing a proper API for accessing the file system,
> as said before specifying a physical address in the mount command
> line is an absolute no-no.
>
> Either work with the mtd folks to get the mtd core down to an absolute
> minimum suitable for you, or figure out a way to specify fs nodes
> through DT or similar.


On my system, the QSPI Flash is memory mapped and set up by the boot 
loader. In order to test the upstream kernel, I use a squashfs image and 
mtd-rom.

So, 0x18000000 is the physical address of flash as it is seen by the 
CPU.

Is there any benefit to doing something similar to this?

	/* File System */
	/* Requires CONFIG_MTD_ROM=y */
	qspi@18000000 {
		compatible = "mtd-rom";
		probe-type = "map_rom";
		reg = <0x18000000 0x4000000>;	/* 64 MB */
		bank-width = <4>;
		device-width = <1>;

		#address-cells = <1>;
		#size-cells = <1>;

		partition@800000 {
			label = "user";
			reg = <0x0800000 0x800000>; /* 8MB @ 0x18800000 */
			read-only;
		};
	};


Of course this basically ioremaps the entire space on probe, but I think
what you really want to do is just ioremap pages at a time (maybe..I 
might not be following your code correctly)
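
For reference, the "8MB @ 0x18800000" comment in the node above follows
from adding the partition offset to the parent's base address. A minimal
sketch of that resolution, with a bounds check against the parent window,
might look like this. The struct and function names are hypothetical and
do not correspond to any kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* A physical address window: start address and length in bytes. */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Resolve a partition, given as an offset and length relative to its
 * parent, into an absolute CPU-visible window. Returns false when the
 * partition does not fit inside the parent.
 */
static bool partition_phys(struct region parent, uint64_t off, uint64_t len,
			   struct region *out)
{
	if (off > parent.size || len > parent.size - off)
		return false;
	out->base = parent.base + off;
	out->size = len;
	return true;
}
```

With the values from the node above (parent at 0x18000000 spanning
0x4000000 bytes, partition at offset 0x800000 of length 0x800000), the
resolved window starts at 0x18800000.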


Chris

Nicolas Pitre Oct. 6, 2017, 4:30 p.m. UTC | #3
On Fri, 6 Oct 2017, Chris Brandt wrote:

> On Friday, October 06, 2017, Christoph Hellwig wrote:
> > This is still missing a proper API for accessing the file system,
> > as said before specifying a physical address in the mount command
> > line is an absolute no-no.
> >
> > Either work with the mtd folks to get the mtd core down to an absolute
> > minimum suitable for you, or figure out a way to specify fs nodes
> > through DT or similar.
>
> On my system, the QSPI Flash is memory mapped and set up by the boot
> loader. In order to test the upstream kernel, I use a squashfs image and
> mtd-rom.
>
> So, 0x18000000 is the physical address of flash as it is seen by the
> CPU.
>
> Is there any benefit to doing something similar to this?
>
> 	/* File System */
> 	/* Requires CONFIG_MTD_ROM=y */
> 	qspi@18000000 {
> 		compatible = "mtd-rom";
> 		probe-type = "map_rom";
> 		reg = <0x18000000 0x4000000>;	/* 64 MB */
> 		bank-width = <4>;
> 		device-width = <1>;
>
> 		#address-cells = <1>;
> 		#size-cells = <1>;
>
> 		partition@800000 {
> 			label = "user";
> 			reg = <0x0800000 0x800000>; /* 8MB @ 0x18800000 */
> 			read-only;
> 		};
> 	};
>
>
> Of course this basically ioremaps the entire space on probe, but I think
> what you really want to do is just ioremap pages at a time (maybe..I
> might not be following your code correctly)


No need to ioremap pages individually. That creates unneeded
overhead, both in terms of code execution and TLB thrashing. With a
single map, the ARM code at least is smart enough to use large MMU
descriptors when possible, covering a large region with a single TLB
entry. And if you're interested in XIP cramfs then you do have huge
vmalloc space to spare anyway.
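
To put rough numbers on that, the sketch below counts how many
translation entries are needed to cover a region with 4KB pages versus
1MB sections (the section size used by ARM short-descriptor page tables).
The helper name is illustrative, and a real mapping would also depend on
the alignment of the region.

```c
#include <stdint.h>

#define SMALL_PAGE  (UINT64_C(4) << 10)  /* 4KB page */
#define SECTION     (UINT64_C(1) << 20)  /* 1MB ARM section */

/* Number of entries of a given size needed to cover a region (rounded up). */
static uint64_t entries_needed(uint64_t region_bytes, uint64_t entry_bytes)
{
	return (region_bytes + entry_bytes - 1) / entry_bytes;
}
```

For the 64MB flash window quoted above, that is 16384 page-sized entries
against 64 section-sized ones.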

As to the requirement for a different interface than a raw physical 
address: I'm investigating factoring out the MTD partition parsing code 
so it could be used with or without the rest of MTD. Incidentally, the 
person who wrote the very first incarnation of MTD partitioning 17 years 
ago was actually me, so with luck I might be able to figure out 
something sensible.


Nicolas