Message ID: 20170606232450.30278-1-nicolas.pitre@linaro.org
Series: scheduler tinification
* Nicolas Pitre <nicolas.pitre@linaro.org> wrote:

> Many embedded systems don't need the full scheduler support. Most of the
> time, user space is tightly controlled and many of the scheduler facilities
> are simply unused.

Sorry, NAK:

>  23 files changed, 3190 insertions(+), 2897 deletions(-)

That's a lot of extra code plus churn for a code base that is already pretty
#ifdef heavy.

Also, the savings are marginal, even with significant functionality disabled:

>    text    data     bss     dec     hex filename
>   28623    3404     128   32155    7d9b kernel/sched/built-in.o
>
> With this series and dl and rt classes disabled:
>
>    text    data     bss     dec     hex filename
>   20734    3334      40   24108    5e2c kernel/sched/built-in.o

With 1GHz + 1GB RAM SoCs being well below $10 in bulk, we worry about code
complexity, predictability, testability, and behavioral and ABI uniformity a
lot more than about the last 10-20k of kernel text footprint...

So I think the 'tiny' efforts are fundamentally misguided, shooting for an
ever-shrinking market of RAM/ROM-starved products whose share drops every
month. We want to _remove_ kernel options and reduce complexity, not increase
it.

So unless there are convincing counter-arguments, or Linus overrules me, this
NAK is pretty firm.

I'd love to see scheduler complexity reduction patches though; the "CPP count"
of the scheduler code base is pretty damn high:

  triton:~/tip> git grep -h '^#[^ ]' kernel/sched/ | cut -d' ' -f1 | sort | uniq -c | sort -n | tail -10
      2 #ifdef CONFIG_SCHED_DEBUG
      4 #endif /*
     19 #if
     26 #ifndef
     27 #undef
     97 #else
    161 #define
    199 #include
    317 #ifdef
    361 #endif

Thanks,

	Ingo
On Wed, 7 Jun 2017, Ingo Molnar wrote:

> * Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>
> > Many embedded systems don't need the full scheduler support. Most of the
> > time, user space is tightly controlled and many of the scheduler facilities
> > are simply unused.
>
> Sorry, NAK:
>
> >  23 files changed, 3190 insertions(+), 2897 deletions(-)
>
> That's a lot of extra code plus churn for a code base that is already pretty
> #ifdef heavy.
>
> Also, the savings are marginal, even with significant functionality disabled:
>
> >    text    data     bss     dec     hex filename
> >   28623    3404     128   32155    7d9b kernel/sched/built-in.o
> >
> > With this series and dl and rt classes disabled:
> >
> >    text    data     bss     dec     hex filename
> >   20734    3334      40   24108    5e2c kernel/sched/built-in.o
>
> With 1GHz + 1GB RAM SoCs being well below $10 in bulk, we worry about code
> complexity, predictability, testability, and behavioral and ABI uniformity
> a lot more than about the last 10-20k of kernel text footprint...
>
> So I think the 'tiny' efforts are fundamentally misguided, shooting for an
> ever-shrinking market of RAM/ROM-starved products whose share drops every
> month.

I'm rather seeing the opposite: an ever-growing market of internet-connected,
coin-cell-battery-powered tiny devices where the amount of RAM is counted in
kilobytes rather than megabytes.

Let me repeat some background as to what my fundamental motivation is, and
then maybe you'll understand why I'm doing this.

What is the biggest buzzword in the IT industry besides AI right now? It is
IOT. Most IOT targets are so small that people are rewriting new operating
systems from scratch for them. Lots of fragmentation already exists. We're
talking about systems with less than one megabyte of RAM, sometimes much
less. Still, those things are being connected to the internet. And this is
going to be a total security nightmare.

I wish to be able to leverage the Linux ecosystem for as much of the IOT
space as possible to avoid the worst of those nightmares. The Linux ecosystem
has a *lot* of knowledgeable people around it, a lot of testing
infrastructure and tooling available already, etc. If a security issue turns
up on Linux, it has a greater chance of being caught early, or fixed quickly
otherwise, and finding people with the right knowledge is easier on Linux
than it could be on any RTOS out there.

Still with me so far?

Yes, we have tools that can automatically reduce the kernel size. We can use
LTO with the compiler, etc. LTO is pretty good already: it can typically
reduce the kernel size by 20%. If all system calls are disabled except for a
few ones, then LTO can get rid of another 20%. The minimal kernel I get is
still 400-500 KB in size. That's still too big.

There is this 120 KB of VFS code that is always there even though there is no
real filesystem at all configured in the kernel. There is that other 100 KB
of core driver support code despite the fact that the set of drivers I'm
using is very simple and makes no use of most of that core driver code. Etc.
There comes a point where there is no option but to explicitly trim out parts
of the kernel, as such decisions cannot be automated, hence this patch
series.

Bringing the scheduler under 20KB in size is therefore very useful in that
context. Alternatively I could push for a parallel implementation as I did
with the TTY layer, where I obtained a 6x size reduction.
But in the scheduler case I obtained only a 2x size reduction, so I thought
it could be more profitable to get about the same saving by reworking the
existing code instead, and eventually contributing a very bare scheduler
class that would be a smaller alternative to the fair scheduler for
deployments where that makes sense. Unless you actually changed your mind
about alternative whole-scheduler implementations, that is...

For Linux to be suitable for small IoT, it has to be small, damn small. My
target is 256 KB of RAM. And if you look at the kind of application those
256-KB systems are doing, it's basically one main task, typically acquiring
sensor data and sending it with some encrypted protocol over a wireless
network to the internet, and possibly accepting commands back.

So what do you need from the OS to achieve that? A few system calls, a
minimal scheduler, minimal memory management, a minimal filesystem structure
and a minimal network stack. And your user app. So why not have each of those
blocks created using the existing Linux syscall interface and internal API?
At that point, it should be possible to take your standard full-featured
Linux workstation and develop your user app on it, run it there using all the
existing native debugging tools, etc. In the end you just pick the mini
version of everything for the final target and you're done. And you don't
have to learn a whole new OS, development environment, programming model,
etc.

Next on my list would be a cache-less, completely serialized VFS bypass that
has only what's needed to make the link between the read/write syscalls, a
filesystem driver and a block driver while preserving the existing kernel
APIs. And by being really small, the maintenance cost of a "parallel"
implementation isn't very high, certainly much less than trying to maintain a
single code path that can scale to both extremes in that case.

PS: As far as I remember, Linus didn't condemn the idea last time I brought
up this topic in his presence. I therefore hope we can find ways to allow
Linux usage in the largest computing device deployment to come.


Nicolas
> Next on my list would be a cache-less, completely serialized VFS bypass
> that has only what's needed to make the link between the read/write
> syscalls, a filesystem driver and a block driver while preserving the
> existing kernel APIs. And by being really small, the maintenance cost of
> a "parallel" implementation isn't very high, certainly much less than
> trying to maintain a single code path that can scale to both extremes
> in that case.

So once you've rewritten the tty layer, the device drivers, the VFS and
removed most of the syscalls, why even pretend it's Linux any more? It's
something else, and that something else is totally architecturally
incompatible with Linux.

That's btw a good thing - trying to fit Linux directly into such a tiny
device isn't sensible, because the core assumptions you make about
scalability are just totally different.

IMHO it would be far, far better to just borrow the bits that look handy,
and the bits of the ABI you need, and put them together as a new OS kernel.
When you look at tiny hardware, even core bits of the Linux architecture
like the wait queues are just not sensible uses of memory and cause
fragmentation. The dcache is completely insane in that environment, the
scheduler is total overkill and the networking is easy to DoS in a tiny
memory. The device layer assumes a dynamic, hot-pluggable device
architecture - and that's extremely expensive and nonsensical for most
µcontrollers.

It's easy to put a Unix-like OS in 256K of RAM and a pile of flash. It's
going to be pretty easy to put all the major bits of the Linux API into it.
You can run 2.11BSD with only 256K of writable memory (you need more in
your PDP-11 to run it, but if you look, all of that in a µcontroller would
live in flash).

Alan
On Wed, 7 Jun 2017, Alan Cox wrote:

> > Next on my list would be a cache-less, completely serialized VFS bypass
> > that has only what's needed to make the link between the read/write
> > syscalls, a filesystem driver and a block driver while preserving the
> > existing kernel APIs. And by being really small, the maintenance cost of
> > a "parallel" implementation isn't very high, certainly much less than
> > trying to maintain a single code path that can scale to both extremes
> > in that case.
>
> So once you've rewritten the tty layer, the device drivers, the VFS and
> removed most of the syscalls, why even pretend it's Linux any more? It's
> something else, and that something else is totally architecturally
> incompatible with Linux.

You got at least one thing wrong. One huge benefit is to leverage existing
device drivers, of which Linux has plenty. So there is no point rewriting
device drivers.

Then, if most syscalls are removed, *of course* you won't be able to boot a
standard "Linux" distro on it. But that's not the point either. However,
compatibility is preserved the other way around, i.e. user space from this
Linux subset should just work as is on a full Linux kernel. And it would
still be a Linux code base, i.e. architecturally compatible with Linux at
the source level.

> That's btw a good thing - trying to fit Linux directly into such a tiny
> device isn't sensible, because the core assumptions you make about
> scalability are just totally different.

For a couple of core components that's true, hence my approach with the TTY
layer. But many other parts aren't that bad. And given that a small system
can't afford that many bells and whistles, it is not as if the whole of
Linux would be rewritten anyway.

> IMHO it would be far, far better to just borrow the bits that look handy,
> and the bits of the ABI you need, and put them together as a new OS
> kernel.

Hasn't that been attempted and failed already? One nasty effect of such an
approach is effectively the creation of a fork: you completely lose the
community leverage and gravitational effect, create fragmentation, fixes
are not propagated across, etc.

> When you look at tiny hardware, even core bits of the Linux architecture
> like the wait queues are just not sensible uses of memory and cause
> fragmentation. The dcache is completely insane in that environment, the
> scheduler is total overkill and the networking is easy to DoS in a tiny
> memory. The device layer assumes a dynamic, hot-pluggable device
> architecture - and that's extremely expensive and nonsensical for most
> µcontrollers.

Why do you think I'm proposing scheduler patches? And TTY patches before
that, and having plans for the VFS? Obviously, all those things could be
reimplemented for small scale in a new and separate tiny OS. But what if
those things could just live in the Linux source tree alongside their big
cousins and be swapped in according to your needs? Why couldn't the
arguments served to the embedded people for years about joining the
mainline effort be extended to this use case as well?

> It's easy to put a Unix-like OS in 256K of RAM and a pile of flash. It's
> going to be pretty easy to put all the major bits of the Linux API into
> it. You can run 2.11BSD with only 256K of writable memory (you need more
> in your PDP-11 to run it, but if you look, all of that in a µcontroller
> would live in flash).

Would be nice if that could share the same source code whenever possible,
and also the same source tree, no?


Nicolas
> You got at least one thing wrong. One huge benefit is to leverage existing
> device drivers, of which Linux has plenty. So there is no point rewriting
> device drivers.

So you want to keep a common interface for some of the common driver APIs.
Several people have managed that.

> > IMHO it would be far, far better to just borrow the bits that look
> > handy, and the bits of the ABI you need, and put them together as a new
> > OS kernel.
>
> Hasn't that been attempted and failed already? One nasty effect of such
> an approach is effectively the creation of a fork: you completely lose
> the community leverage and gravitational effect, create fragmentation,
> fixes are not propagated across, etc.

Almost nothing can be shared though, and for the drivers you want to
re-use: if you can re-use them, you can share the code for that.

> Why do you think I'm proposing scheduler patches? And TTY patches before
> that, and having plans for the VFS? Obviously, all those things could be
> reimplemented for small scale in a new and separate tiny OS. But what if
> those things could just live in the Linux source tree alongside their big
> cousins and be swapped in according to your needs? Why couldn't the
> arguments served to the embedded people for years about joining the
> mainline effort be extended to this use case as well?

I don't think it works like that. The overhead of the duplication and
trying to keep them aligned rapidly exceeds the value they give. The moment
you try and do the job well you also

> > It's easy to put a Unix-like OS in 256K of RAM and a pile of flash.
> > It's going to be pretty easy to put all the major bits of the Linux API
> > into it. You can run 2.11BSD with only 256K of writable memory (you
> > need more in your PDP-11 to run it, but if you look, all of that in a
> > µcontroller would live in flash).
>
> Would be nice if that could share the same source code whenever possible,
> and also the same source tree, no?

But that will never work. The fundamental architecture of a tiny system is
different because the scaling rules and underlying algorithms are
different. Wait queues don't work sanely on tiny devices, TCP queues need a
totally different architecture, scheduling is quite different, memory
management is totally different, things like the dcache, which is fairly
fundamental to the VFS internals, make no sense, and the locking model for
file systems makes no sense because you can't use all that expensive
scaling. Even the device core, which is designed for dynamically managed
trees of devices with hotplug, discovery and power management hierarchies,
is basically a large, resource-expensive paperweight. It goes on and on.

Add any desire to do hard real time or meet things like ASIL-B to that and
you hit a brick wall pretty damned quick.

When you proposed the tty changes I was dubious; now that you are talking
about basically writing a new OS kernel in the same git tree that shares
the drivers, it looks even less sensible from a Linux perspective.

Alan
Also, let me make it clear at the outset that we do care about RAM
footprint all the time, and I've applied countless data structure and .text
reducing patches to the kernel. But there's a cost/benefit analysis to be
made, and this series fails that test in my view, because it increases the
complexity of an already complex code base:

* Nicolas Pitre <nicolas.pitre@linaro.org> wrote:

> Most IOT targets are so small that people are rewriting new operating
> systems from scratch for them. Lots of fragmentation already exists.

Let me offer a speculative if somewhat cynical prediction: 90% of those
ghastly IOT hardware hacks won't survive the market. The remaining 10% will
be successful financially despite being ghastly hardware hacks, and will
eventually, in the next iteration or so, get a proper OS.

As users ask for more features, the hardware capabilities will increase
dramatically, and home-grown microcontroller-derived code plus minimal OSes
will be replaced by a 'real' OS. Because both developers and users will
demand IPv6 compatibility, or Bluetooth connectivity, or storage support,
or any random range of features we have in the Linux kernel.

With the stroke of a pen from the CFO: "yes, we can spend more on our next
hardware design!" the problem goes away, overnight, and nobody will look
back at the hardware hack that had only 1MB of RAM.

> [...] We're talking about systems with less than one megabyte of RAM,
> sometimes much less.

Two data points:

Firstly, by the time any Linux kernel change I commit today gets to a
typical distro it's at least 0.5-1 years, 2 years for it to get widely used
by hardware shops - 5 years to get used by enterprises. More latency in
more conservative places.

Secondly, I don't see Moore's Law reversing:

  http://nerdfever.com/wp-content/uploads/2015/06/2015-06_Moravec_MIPS.png

If you combine those two time frames, the consequence is this: even taking
the 1MB size at face value (which I don't: a networking-enabled system can
probably not function very well with just 1MB of RAM), the RAM-starved 1 MB
system today will effectively be a 2 MB system in 2 years.

And yes, I don't claim Moore's Law will go on forever and I'm
oversimplifying - maybe things are slowing down and it will only be 1.5 MB
- but the point remains: the importance of your 20kb .text savings will
become a 10-15k .text savings in just 2 years. In 8 years today's 1 MB
system will be a 32 MB system if that trend holds up.

You can already fit a mostly full Linux system into 32 MB just fine, i.e.
the problem has solved itself just by waiting a bit or by increasing the
hardware capabilities a bit. But the kernel complexity you introduce with
this series stays with us! It will be an additional cost added to many
scheduler commits going forward. It's an added cost for all the other
usecases.

Also, it's not like 20k of .text savings will magically enable Linux to fit
into 1MB of RAM - it won't. The smallest still-practical, more or less
generic Linux system in existence today is around 16 MB. You can shrink it
more, but the effort increases exponentially once you go below a natural
minimum size.

> [...] Still, those things are being connected to the internet. [...]

So while I believe small size has its value, I think it's far more
important to be able to _trust_ those devices than to squeeze the last
kilobyte out of the kernel.
In that sense these qualities:

 - reducing complexity,
 - reducing actual line count,
 - increasing testability,
 - increasing reviewability,
 - offering behavioral and ABI uniformity

are more important than 1% of the RAM of a very, very RAM-starved system
which likely won't use Linux to begin with...

So while the "complexity vs. kernel size" trade-off will obviously always
be a judgement call, for the scheduler it's not really an open question
what we need to do at this stage: we need to reduce complexity and #ifdef
variants, not increase them.

Thanks,

	Ingo
> As users ask for more features, the hardware capabilities will increase
> dramatically, and home-grown microcontroller-derived code plus minimal
> OSes will be replaced by a 'real' OS. Because both developers and users
> will demand IPv6 compatibility, or Bluetooth connectivity, or storage
> support, or any random range of features we have in the Linux kernel.

There are already tiny OSes with that feature set, but they don't feel
Unixish and aren't quite so fun to program.

> Even taking the 1MB size at face value (which I don't: a
> networking-enabled system can probably not function very well with just
> 1MB of RAM), the RAM-starved 1 MB system today will effectively be a 2 MB
> system in 2 years.

Probably not - I may be wrong, but power, and what you can and can't put on
the same die, are likely to mean that small-RAM devices are here for a
while, and in fact the CFO will be ordering the engineers to get it into
less RAM to save 20 cents a unit.

> And yes, I don't claim Moore's Law will go on forever and I'm
> oversimplifying - maybe things are slowing down and it will only be 1.5
> MB - but the point remains: the importance of your 20kb .text savings
> will become a 10-15k .text savings in just 2 years. In 8 years today's 1
> MB system will be a 32 MB system if that trend holds up.

Power means it's more likely IMHO that today's 256K RAM system will in a
few years be either a 64K RAM system or have tons of persistent memory.

Alan
On Thu, 8 Jun 2017, Ingo Molnar wrote:

> Also, let me make it clear at the outset that we do care about RAM
> footprint all the time, and I've applied countless data structure and
> .text reducing patches to the kernel. But there's a cost/benefit analysis
> to be made, and this series fails that test in my view, because it
> increases the complexity of an already complex code base:
>
> * Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>
> > Most IOT targets are so small that people are rewriting new operating
> > systems from scratch for them. Lots of fragmentation already exists.
>
> Let me offer a speculative if somewhat cynical prediction: 90% of those
> ghastly IOT hardware hacks won't survive the market. The remaining 10%
> will be successful financially despite being ghastly hardware hacks, and
> will eventually, in the next iteration or so, get a proper OS.

Your prediction is based on a false premise. There is simply no money to be
made with IoT hardware, especially in the low end. Those little devices
will be given away for free, because the money is in the service
subscription. So the hardware has to be, and will be, extremely cheap to
produce. If a serious bug turns up in one of those devices, my own cynical
prediction is that no one will bother with field upgradability: they will
ask you to throw the device away while they ship you a replacement (field
upgradability implies at least twice the flash memory size, and that comes
with a cost, so some will gamble that obsolescence will happen before a
serious bug turns up).

> As users ask for more features, the hardware capabilities will increase
> dramatically, and home-grown microcontroller-derived code plus minimal
> OSes will be replaced by a 'real' OS. Because both developers and users
> will demand IPv6 compatibility, or Bluetooth connectivity, or storage
> support, or any random range of features we have in the Linux kernel.

The "Cloud" is taking care of most of that. For the rest, your cellphone or
IoT gateway will take over. IPv6 stacks are already used in tiny
microcontrollers with as little as 32KB of RAM.

> With the stroke of a pen from the CFO: "yes, we can spend more on our
> next hardware design!" the problem goes away, overnight, and nobody will
> look back at the hardware hack that had only 1MB of RAM.

Of course hobbyists can already get a Raspberry Pi Zero and run a
full-featured Linux distro on it... for a mere 5 bucks. That comes with
512MB of RAM, so my patches certainly don't make a difference there.

But it's not that simple. First there is a fundamental constraint, which is
power consumption. If you want your device to run for months (some will
hope years) from the same tiny battery then you just cannot afford SDRAM.
So we're talking static RAM here. And to keep costs down, because you want
to give away your thingies by the millions for free, it usually means
single-chip designs with on-chip sub-megabyte static RAM. And in that field
the 256KB mark is located towards the high end of the spectrum. Many
IPv6-capable chips available today have less than that.

And the thing is: people already manage to do an awful lot of stuff in such
a constrained device. Some probably did a good job of it, but most of them
likely suck, and we don't know about their bugs because we have no idea
what's running inside. And because it is rather easy to write a new OS from
scratch for such a small environment (and who didn't dream of writing his
own OS, right?), about every company in that field did so.
That's not counting most open source ones, which usually are close to
single-person projects. So you get a lot of fragmentation, very, very
little peer review, and no incentive for proper maintenance because the
cost saving simply isn't significant enough.

It is just like asteroids. Some of them collapse to form bigger objects
like planets, while others have too weak a gravitational field to gather
more matter. My vision is about leveraging the Linux gravitational power to
bring the tiny embedded space together because, on its own, the tiny
embedded space simply doesn't have enough community power to organize
itself.

Of course there are important parts of Linux that couldn't be reused as is
in such a setup, but many other things still can be reused, with either
some modifications or a tiny parallel subsystem substitution. Technically,
it is always possible to find ways to make it low on maintenance and
beneficial to the wider community. But first and foremost you have to agree
with the fundamental principle of gathering more people around a common
codebase to make it better for everyone, and not suggest that they stick to
themselves. If you agree to that then we can move back to a technical
discussion.

> > [...] We're talking about systems with less than one megabyte of RAM,
> > sometimes much less.
>
> Two data points:
>
> Firstly, by the time any Linux kernel change I commit today gets to a
> typical distro it's at least 0.5-1 years, 2 years for it to get widely
> used by hardware shops - 5 years to get used by enterprises. More latency
> in more conservative places.

Don't forget that you are also merging patches today from the Android folks
that were deployed into actual products years ago. So the enterprise distro
comparison simply has no commonalities here.

> Secondly, I don't see Moore's Law reversing:
>
>   http://nerdfever.com/wp-content/uploads/2015/06/2015-06_Moravec_MIPS.png
>
> If you combine those two time frames, the consequence is this: even
> taking the 1MB size at face value (which I don't: a networking-enabled
> system can probably not function very well with just 1MB of RAM), the
> RAM-starved 1 MB system today will effectively be a 2 MB system in 2
> years.

As surprising as it might be, IPv6 stacks requiring only a few dozen
kilobytes of memory do exist. Not so surprisingly though, some people think
that the existing stacks simply suck and are rewriting yet another one ...
because they think their own will be better, of course. So there *is* still
a huge market for sub-megabyte systems. I was also counting on Moore's law
so that, by the time Linux actually has the ability to be tailored for such
systems, typical SRAM in those 10-cent microcontrollers will be 512KB
instead of 128 or 32.

> You can already fit a mostly full Linux system into 32 MB just fine, i.e.
> the problem has solved itself just by waiting a bit or by increasing the
> hardware capabilities a bit.

You just can't procure SDRAM chips smaller than 32MB on the market anymore.
That's why Linux didn't get any pressure to fit into anything smaller for
quite a while. But I've heard of some people having use cases for thousands
if not millions of Linux VMs on a single server, and they're looking at
10MB VMs or smaller for their application.

> But the kernel complexity you introduce with this series stays with us!
> It will be an additional cost added to many scheduler commits going
> forward. It's an added cost for all the other usecases.

OK, let's talk about that a bit.
How is sched/core.c with its 7387 lines not overly complex already? How is
my moving of rt-related code to rt.c and dl-related code to dl.c not
helping things? Isn't it easier to understand the 3500 lines of code in
futex.c when half of it, i.e. the PI-specific code, is split into a
separate file? I ask you.

If you want to pick only those patches for now then please be my guest. At
least the first two patches of the series should be mergeable without even
a doubt.

As to the actual complexity I'm introducing... this is just about not
compiling some files in and stubbing calls to them out (a sketch of the
pattern follows this message). Isn't it a sign of good isolation when you
can stub the dl class out with only 9 insertions and 6 deletions to
sched/core.c? I'm not saying the complexity is nonexistent here, but just
the _ability_ to remove a scheduler class enforces code abstractions, which
should be a good thing maintenance-wise, no?

> Also, it's not like 20k of .text savings will magically enable Linux to
> fit into 1MB of RAM - it won't. The smallest still-practical, more or
> less generic Linux system in existence today is around 16 MB. You can
> shrink it more, but the effort increases exponentially once you go below
> a natural minimum size.

Again, I'm not after a tiny-and-generic Linux target. I'm after a
tiny-and-heavily-tailored Linux subset that shares the same ABI and API as
the generic Linux. Once you start compiling out pieces of the core kernel,
it obviously isn't generic anymore, but the potential for size reduction
becomes much bigger.

Anyway... as I said, you have to agree with the high-level goal and
principle of leveraging the Linux codebase to gather the tiny embedded
people around it. The tiny embedded community simply will never take hold
otherwise. If we cannot agree on that then any other point of discussion is
moot. In which case I'll simply drop this project entirely and move on.


Nicolas
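(For concreteness, the "stubbing out" pattern described above can be
sketched roughly as follows. This is an illustration only, not code from
the posted series; CONFIG_SCHED_DL is a hypothetical option name here and
the exact symbols involved may differ. The Makefile simply stops building
the class:

	# kernel/sched/Makefile, hypothetical option name
	obj-$(CONFIG_SCHED_DL) += deadline.o

and a shared header turns the few entry points core.c needs into empty
inline stubs, so core.c itself stays nearly #ifdef-free:

	/* e.g. in kernel/sched/sched.h */
	#ifdef CONFIG_SCHED_DL
	extern void init_dl_rq(struct dl_rq *dl_rq);
	#else
	/* deadline.c is not built; callers fall through to empty stubs */
	static inline void init_dl_rq(struct dl_rq *dl_rq) { }
	#endif

With that shape, removing a whole scheduler class becomes a Kconfig
decision rather than a source change, which is how the per-class cost to
core.c can stay as small as the 9 insertions and 6 deletions cited above.)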
* Nicolas Pitre <nicolas.pitre@linaro.org> wrote:

> > But the kernel complexity you introduce with this series stays with us!
> > It will be an additional cost added to many scheduler commits going
> > forward. It's an added cost for all the other usecases.
>
> OK, let's talk about that a bit. How is sched/core.c with its 7387 lines
> not overly complex already? How is my moving of rt-related code to rt.c
> and dl-related code to dl.c not helping things? Isn't it easier to
> understand the 3500 lines of code in futex.c when half of it, i.e. the
> PI-specific code, is split into a separate file? I ask you.
>
> If you want to pick only those patches for now then please be my guest.
> At least the first two patches of the series should be mergeable without
> even a doubt.

That's a strawman argument - I was reacting to the combined effect of your
series:

> > >  23 files changed, 3190 insertions(+), 2897 deletions(-)

A subset of the patches might be fine, and note that in fact I already
picked a patch from your series that made sense; I committed this patch of
yours three days ago:

  f5832c1998af: sched/core: Omit building stop_sched_class when !SMP

I'll pick others as well, as long as they don't complicate the code. Please
send a revised series that only does unambiguous complexity
reduction/cleanups.

Thanks,

	Ingo
* Nicolas Pitre <nicolas.pitre@linaro.org> wrote:

> > With the stroke of a pen from the CFO: "yes, we can spend more on our
> > next hardware design!" the problem goes away, overnight, and nobody
> > will look back at the hardware hack that had only 1MB of RAM.
>
> Of course hobbyists can already get a Raspberry Pi Zero and run a
> full-featured Linux distro on it... for a mere 5 bucks. That comes with
> 512MB of RAM, so my patches certainly don't make a difference there.

Note that those mere 5 bucks are probably 50 cents or less in bulk.
Perfectly fine economics for many types of 'throw away IoT hardware'
products.

> But it's not that simple. First there is a fundamental constraint, which
> is power consumption. If you want your device to run for months (some
> will hope years) from the same tiny battery then you just cannot afford
> SDRAM. So we're talking static RAM here. And to keep costs down, because
> you want to give away your thingies by the millions for free, it usually
> means single-chip designs with on-chip sub-megabyte static RAM. And in
> that field the 256KB mark is located towards the high end of the
> spectrum. Many IPv6-capable chips available today have less than that.
>
> And the thing is: people already manage to do an awful lot of stuff in
> such a constrained device. Some probably did a good job of it, but most
> of them likely suck, and we don't know about their bugs because we have
> no idea what's running inside.

Ok, let me put it this way: there's no way in hell I see a viable Linux
kernel running (no matter how stripped down) in 32K or even 64K of RAM.
256K is a stretch as well - but that RAM size you claim to be already 'high
end', so it probably wouldn't be used as a standardized solution anyway...

Today a Linux 'allnoconfig' kernel, i.e. a kernel with no device drivers
and no filesystems whatsoever and with everything optional turned off
(including all networking!), is over 2MB of text+data (on x86, which has a
compressed instruction set - it would possibly be larger on simpler CPUs):

  triton:~/tip> size vmlinux
     text    data     bss      dec  filename
   926056  208624 1215904  2350584  vmlinux

A series that shrinks the .text size of the allnoconfig core Linux kernel
from 1MB to 9.9MB in isolation is not proof. There will literally have to
be two orders of magnitude more patches than that to reach the 32K size
envelope, if I (very) optimistically assume that the difficulty of
shrinking code is constant (which it most certainly is not).

I.e. the whole stated premise of the series is wildly unrealistic AFAICS:
the series does not make Linux more usable at all on that category of
devices (Linux is totally inadequate there because it's way too large), it
only increases its complexity.

But you can prove me wrong: show me a Linux kernel for a real device that
fits into 32KB of RAM (or even 256 KB) and _then_ I'll consider the
cost/benefit equation. Until that happens I consider most forms of
additional complexity on the non-hardware-dependent side of the kernel a
net negative.

Thanks,

	Ingo
On Sun, 11 Jun 2017, Ingo Molnar wrote:

> * Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>
> > If you want to pick only those patches for now then please be my guest.
> > At least the first two patches of the series should be mergeable
> > without even a doubt.
>
> That's a strawman argument - I was reacting to the combined effect of
> your series:
>
> > > >  23 files changed, 3190 insertions(+), 2897 deletions(-)

As I mentioned, the bulk of that count comes from moving rt and dl code out
of sched/core.c into their respective .c files:

    sched/deadline: move dl related code out of sched/core.c

    ... to sched/deadline.c. This helps making sched/core.c smaller and
    hopefully easier to understand and maintain. This also will help
    configuring the deadline scheduling class out of the kernel build.

    Signed-off-by: Nicolas Pitre <nico@linaro.org>

     kernel/sched/core.c     | 335 +---------------------------------------
     kernel/sched/deadline.c | 336 ++++++++++++++++++++++++++++++++++++++++
     kernel/sched/sched.h    |  14 ++
     3 files changed, 356 insertions(+), 329 deletions(-)

    sched/rt: move rt related code out of sched/core.c

    ... to sched/rt.c. This helps making sched/core.c smaller and hopefully
    easier to understand and maintain. This also will make it easier to
    configure the realtime scheduling class out of the kernel build.

    Signed-off-by: Nicolas Pitre <nico@linaro.org>

     kernel/sched/core.c  | 315 --------------------------------------------
     kernel/sched/rt.c    | 310 +++++++++++++++++++++++++++++++++++++++++++
     kernel/sched/sched.h |   5 +
     3 files changed, 315 insertions(+), 315 deletions(-)

I also untangled the futex code so the PI support is gathered in a file of
its own:

    futex: make PI support optional

    Split out the priority inheritance support to a file of its own to make
    futex.c easier to understand and, hopefully, to maintain. This also
    makes it possible to compile out the PI support when RT task support is
    not available.

    Signed-off-by: Nicolas Pitre <nico@linaro.org>

     include/linux/futex.h |    7 +-
     init/Kconfig          |    7 +-
     kernel/futex.c        | 2829 ++++++++++---------------------------------
     kernel/futex_pi.c     | 1563 ++++++++++++++++++++++++
     4 files changed, 2233 insertions(+), 2173 deletions(-)

Granted, I made a mistake in this last description above: it should have
said "RT mutex support" instead of "RT task support". But those 3 patches
are making the code easier to understand, I'd say.

> A subset of the patches might be fine, and note that in fact I already
> picked a patch from your series that made sense; I committed this patch
> of yours three days ago:
>
>   f5832c1998af: sched/core: Omit building stop_sched_class when !SMP

Good. That was patch #2/8. Why did you skip over #1/8 "cpuset/sched: cpuset
makes sense for SMP only"? It is the same kind of simple cleanup as the one
you did apply.

> I'll pick others as well, as long as they don't complicate the code.
> Please send a revised series that only does unambiguous complexity
> reduction/cleanups.

Tell me from the above which patches would qualify and I'll repost them.


Nicolas
On Sun, 11 Jun 2017, Ingo Molnar wrote:

> * Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>
> > But it's not that simple. First there is a fundamental constraint,
> > which is power consumption. If you want your device to run for months
> > (some will hope years) from the same tiny battery then you just cannot
> > afford SDRAM. So we're talking static RAM here. And to keep costs down,
> > because you want to give away your thingies by the millions for free,
> > it usually means single-chip designs with on-chip sub-megabyte static
> > RAM. And in that field the 256KB mark is located towards the high end
> > of the spectrum. Many IPv6-capable chips available today have less than
> > that.
> >
> > And the thing is: people already manage to do an awful lot of stuff in
> > such a constrained device. Some probably did a good job of it, but most
> > of them likely suck, and we don't know about their bugs because we have
> > no idea what's running inside.
>
> Ok, let me put it this way: there's no way in hell I see a viable Linux
> kernel running (no matter how stripped down) in 32K or even 64K of RAM.
> 256K is a stretch as well - but that RAM size you claim to be already
> 'high end', so it probably wouldn't be used as a standardized solution
> anyway...

I never pretended to make Linux runnable in 32KB of RAM, therefore we
strongly agree here. I did mention that some 32KB chips are IPv6 capable,
but just to give you a different perspective, given that you're more
acquainted with multi-gigabyte systems. And as you did mention Moore's law
previously: even if 256KB of RAM is somewhat high-end today in that space,
it should become pretty common in the near future. The test board in front
of me has 384KB of SRAM, and bigger ones exist.

> Today a Linux 'allnoconfig' kernel, i.e. a kernel with no device drivers
> and no filesystems whatsoever and with everything optional turned off
> (including all networking!), is over 2MB of text+data (on x86, which has
> a compressed instruction set - it would possibly be larger on simpler
> CPUs):
>
>   triton:~/tip> size vmlinux
>      text    data     bss      dec  filename
>    926056  208624 1215904  2350584  vmlinux

On ARM, allnoconfig produces:

     text    data     bss     dec     hex filename
   548144   95480   24252  667876   a30e4 vmlinux

But more realistically, the test system I'm using currently runs the kernel
XIP from flash, so the text size is an indirect metric. It uses external
RAM, as the 384KB of SRAM still doesn't allow for a successful boot. But
here's what I get once booted:

  / # free
               total       used       free     shared    buffers     cached
  Mem:          7936       1624       6312          0          0        492
  -/+ buffers/cache:       1132       6804
  / # uname -a
  Linux (none) 4.12.0-rc4-00013-g32352a9367 #35 PREEMPT Sun Jun 11 10:45:02 EDT 2017 armv7ml GNU/Linux

I could make user space XIP from flash as well, but right now it is just
some initramfs living in RAM. Obviously you can't use the native Linux
networking stack in such small systems, but a few IPv6 stacks have already
been made to work in a few kilobytes.

> A series that shrinks the .text size of the allnoconfig core Linux kernel
> from 1MB to 9.9MB in isolation is not proof.

I assume you meant 0.9MB. It is no proof of course. But I'm following the
well-known and proven "release early, release often" mantra here... unless
this is no longer promoted?

> There will literally have to be two orders of magnitude more patches than
> that to reach the 32K size envelope, if I (very) optimistically assume
> that the difficulty of shrinking code is constant (which it most
> certainly is not).
Once again, my goal is _not_ 32KB. And I don't intend to shrink code. Most
of the time I just want to _remove_ code - compiling it out, to be precise.
The goal of this series is all about compiling out code. And to achieve
that with the scheduler, I simply moved some code to different source files
and did not include those source files in the final build. That keeps the
number of #ifdefs to a minimum, but it makes a big diffstat due to the code
movement.

In the TTY layer case, I found out that writing a simplistic parallel
equivalent that doesn't have to scale to server-class systems and remains
compatible with existing drivers allowed a 6x factor in size reduction. The
same strategy could be employed with the VFS, where any kind of file
caching doesn't make sense in a tiny system. Don't worry, I'm not looking
forward to using BTRFS in 256KB of RAM either.

To give you an idea, here's the size repartition from that booting kernel
above:

  $ size */built-in.o
     text    data     bss     dec     hex filename
   290669   41864    3616  336149   52115 drivers/built-in.o
   173275    1189    5472  179936   2bee0 fs/built-in.o
    10135   14084      84   24303    5eef init/built-in.o
   198624   22000   25160  245784   3c018 kernel/built-in.o
    79064     133      53   79250   13592 lib/built-in.o
    97034    6328    3532  106894   1a18e mm/built-in.o
     2135       0       0    2135     857 security/built-in.o
   146046       0       0  146046   23a7e usr/built-in.o
        0       0       0       0       0 virt/built-in.o

That's without LTO (because with LTO there's no way to size individual
parts) and without syscall trimming. From previous experiments, LTO brings
a 20% reduction in the final build size, and LTO with syscall trimming
together provide about a 40% reduction. One nice thing about LTO is that
part of the 75KB of lib code automatically gets discarded when not
referenced, etc. This is not always the case for most of the core driver
infrastructure, despite most of it not being used in my case. But there are
pieces of the kernel that can't automatically be eliminated, such as
scheduler classes, because the compiler just can't tell whether they'll be
used at run time.

Some "memory hogs" (relatively speaking) might need a tiny version to cope
with a handful of processes max and a few static drivers. As Alan said,
wait queues as they are right now consume a lot of memory. But since
they're well defined and encapsulated already, it is possible to provide a
light alternative implemented in a way that uses much less memory, with the
side effect of being much less scalable (see the sketch after this
message). But scalability is not a huge concern when you have only 256KB of
RAM.

So it is a combination of strategies that will make the 256KB goal
possible. And as you can see from the free output above, this is not _that_
far off already.

> But you can prove me wrong: show me a Linux kernel for a real device that
> fits into 32KB of RAM (or even 256 KB) and _then_ I'll consider the
> cost/benefit equation.

Your insisting on 32KB in this discussion is simply disingenuous. So you
are basically saying that you want me to work another year on this project
"behind closed doors" and come out with "a final solution" before you tell
me if my approach is worthy of your consideration? Thanks but no thanks. As
I said elsewhere, the value in this proposal is mainline inclusion in an
ongoing process, otherwise there is no gain over those small OSes out
there, and my time is more valuable than that.


Nicolas
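(A rough sketch of what such a light wait queue alternative could look
like, purely as an illustration of the trade-off being described - this is
not code from any posted series, the tiny_waitq names are hypothetical, and
the waiter/waker races that a real implementation would have to close are
deliberately ignored for brevity:

	#include <linux/sched.h>

	/*
	 * One pointer per wait point, versus wait_queue_head_t's
	 * spinlock plus doubly linked list head, plus a per-sleeper
	 * wait_queue_entry_t. Supports at most one sleeping task.
	 */
	struct tiny_waitq {
		struct task_struct *waiter;
	};

	static inline void tiny_wait(struct tiny_waitq *q)
	{
		q->waiter = current;
		set_current_state(TASK_UNINTERRUPTIBLE);
		schedule();		/* sleep until tiny_wake() */
		q->waiter = NULL;
	}

	static inline void tiny_wake(struct tiny_waitq *q)
	{
		if (q->waiter)
			wake_up_process(q->waiter);
	}

The point is exactly the kind of swap meant above: a few bytes per object
instead of a few dozen, at the cost of scalability that a 256KB system
cannot exploit anyway.)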
* Nicolas Pitre <nicolas.pitre@linaro.org> wrote:

> > A series that shrinks the .text size of the allnoconfig core Linux
> > kernel from 1MB to 9.9MB in isolation is not proof.
>
> I assume you meant 0.9MB.

0.992 MB actually, if we apply the ~8k .text savings. 0.9MB would imply
100k of savings on an allnoconfig kernel.

> It is no proof of course. But I'm following the well-known and proven
> "release early, release often" mantra here... unless this is no longer
> promoted?

I'm following that same pattern: I gave you negative review feedback as
early as possible. Fragmentation of the scheduler ABI increases complexity
and has knock-on costs - and the kernel size reductions for the usecase you
cited are still 1-2 orders of magnitude away from making a practical
difference.

> > There will literally have to be two orders of magnitude more patches
> > than that to reach the 32K size envelope, if I (very) optimistically
> > assume that the difficulty of shrinking code is constant (which it most
> > certainly is not).
>
> Once again, my goal is _not_ 32KB.
>
> And I don't intend to shrink code. Most of the time I just want to
> _remove_ code - compiling it out, to be precise. The goal of this series
> is all about compiling out code. And to achieve that with the scheduler,
> I simply moved some code to different source files and did not include
> those source files in the final build. That keeps the number of #ifdefs
> to a minimum, but it makes a big diffstat due to the code movement.

So I'm fine with most of the code movement - let's try this series without
any of the more controversial bits, which should make future arguments
easier.

Thanks,

	Ingo
On Tue, 13 Jun 2017, Ingo Molnar wrote:

> > I simply moved some code to different source files and did not include
> > those source files in the final build. That keeps the number of #ifdefs
> > to a minimum, but it makes a big diffstat due to the code movement.
>
> So I'm fine with most of the code movement - let's try this series
> without any of the more controversial bits, which should make future
> arguments easier.

You should then be able to merge patches #1 to #5 already (you already have
#2), as the more controversial ones are at the end.


Nicolas