Message ID | 20170323210304.2181-1-nicolas.pitre@linaro.org |
---|---|
Series | minitty: a minimal TTY layer alternative for embedded systems |
Hi Nicolas,

On Thu, Mar 23, 2017 at 05:03:01PM -0400, Nicolas Pitre wrote:
> Here's some numbers using a minimal ARM config.
>
> When CONFIG_TTY=y, the following files are linked into the kernel:
>
>    text    data     bss     dec     hex filename
>    8796     128       0    8924    22dc drivers/tty/n_tty.o
>   12846     276      44   13166    336e drivers/tty/serial/serial_core.o
>    4852     489      49    5390    150e drivers/tty/sysrq.o
>    1376       0       0    1376     560 drivers/tty/tty_buffer.o
>   13571     172     132   13875    3633 drivers/tty/tty_io.o
>    3072       0       0    3072     c00 drivers/tty/tty_ioctl.o
>    2457       2     120    2579     a13 drivers/tty/tty_ldisc.o
>    1328       0       0    1328     530 drivers/tty/tty_ldsem.o
>     316       0       0     316     13c drivers/tty/tty_mutex.o
>    2516       0       0    2516     9d4 drivers/tty/tty_port.o
>   51130    1067     345   52542    cd3e (TOTALS)
>
> With CONFIG_TTY=n and CONFIG_MINITTY_SERIAL=y, the above is replaced by:
>
>    text    data     bss     dec     hex filename
>    8776       8     108    8892    22bc drivers/tty/serial/minitty_serial.o

tty_baudrate.o is missing here. It is included in serial_core.o when CONFIG_TTY=y.

baruch

-- 
http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -

meta-comment, any reason you didn't cc: linux-serial@vger as well?

On Thu, Mar 23, 2017 at 05:03:01PM -0400, Nicolas Pitre wrote:
> Many embedded systems don't need the full TTY layer support. Most of the
> time, the TTY layer is only a conduit for outputting debugging messages
> over a serial port. The TTY layer also implements many features that are
> very unlikely to ever be used in such a setup. There is great potential
> for both code and dynamic memory size reduction on small systems. This is
> what this patch series is achieving.
>
> The existing TTY code is quite large and complex. Trying to shrink it is
> rather risky as the potential for breakage is non-negligible. Therefore,
> the approach used here consists in the creation of the minimal code that
> interfaces with the existing UART drivers and provides TTY-like character
> devices to user space. When the regular TTY layer is disabled, then the
> minitty layer replacement is proposed by Kconfig.
>
> Of course, making it "mini" means there are limitations to what it does:
>
> - This supports serial ports only. No VT's, no PTY's.
>
> - The default n_tty line discipline is hardcoded and no other line
>   disciplines are supported.
>
> - The line discipline features are not all implemented. Notably, XON/XOFF
>   is currently not implemented (although this might not require a lot of
>   code to do it).
>
> - Hung-up state is not implemented.
>
> - No error handling on RX bytes other than counting them.
>
> - Behavior in the presence of overflows is most likely different from the
>   full TTY code.
>
> - Job control is currently not supported (this may change in the future and
>   be configurable).
>
> But again, most small embedded systems simply don't need those things.

This is true, and I like the overall idea, but I don't like all of the code duplication. Also, who is going to maintain this? I'm not going to be able to even build it, let alone test it, for the systems I normally use, and now you have tagged me as maintaining it for forever :(

> Here's some numbers using a minimal ARM config.
>
> When CONFIG_TTY=y, the following files are linked into the kernel:
>
>    text    data     bss     dec     hex filename
>    8796     128       0    8924    22dc drivers/tty/n_tty.o
>   12846     276      44   13166    336e drivers/tty/serial/serial_core.o
>    4852     489      49    5390    150e drivers/tty/sysrq.o
>    1376       0       0    1376     560 drivers/tty/tty_buffer.o
>   13571     172     132   13875    3633 drivers/tty/tty_io.o
>    3072       0       0    3072     c00 drivers/tty/tty_ioctl.o
>    2457       2     120    2579     a13 drivers/tty/tty_ldisc.o
>    1328       0       0    1328     530 drivers/tty/tty_ldsem.o
>     316       0       0     316     13c drivers/tty/tty_mutex.o
>    2516       0       0    2516     9d4 drivers/tty/tty_port.o
>   51130    1067     345   52542    cd3e (TOTALS)
>
> With CONFIG_TTY=n and CONFIG_MINITTY_SERIAL=y, the above is replaced by:
>
>    text    data     bss     dec     hex filename
>    8776       8     108    8892    22bc drivers/tty/serial/minitty_serial.o
>
> That's it! And the runtime buffer usage is much less as well.

Is there some way to just reorganize the existing code to get you almost this same size? Make ptys and other line disciplines options to select, and slim down the io path by removing features there.

And that serial core looks huge compared to what you are showing is really needed; any way to slim that down by just making features in it configurable?

Again, I like the idea, but worry that with this change, we would have two different tty layers we have to maintain for the next 20+ years, and we have a hard time keeping one stable and working today :)

thanks,

greg k-h

On Fri, 24 Mar 2017, Baruch Siach wrote:

> Hi Nicolas,
>
> On Thu, Mar 23, 2017 at 05:03:01PM -0400, Nicolas Pitre wrote:
> > Here's some numbers using a minimal ARM config.
> >
> > When CONFIG_TTY=y, the following files are linked into the kernel:
> >
> >    text    data     bss     dec     hex filename
> >    8796     128       0    8924    22dc drivers/tty/n_tty.o
> >   12846     276      44   13166    336e drivers/tty/serial/serial_core.o
> >    4852     489      49    5390    150e drivers/tty/sysrq.o
> >    1376       0       0    1376     560 drivers/tty/tty_buffer.o
> >   13571     172     132   13875    3633 drivers/tty/tty_io.o
> >    3072       0       0    3072     c00 drivers/tty/tty_ioctl.o
> >    2457       2     120    2579     a13 drivers/tty/tty_ldisc.o
> >    1328       0       0    1328     530 drivers/tty/tty_ldsem.o
> >     316       0       0     316     13c drivers/tty/tty_mutex.o
> >    2516       0       0    2516     9d4 drivers/tty/tty_port.o
> >   51130    1067     345   52542    cd3e (TOTALS)
> >
> > With CONFIG_TTY=n and CONFIG_MINITTY_SERIAL=y, the above is replaced by:
> >
> >    text    data     bss     dec     hex filename
> >    8776       8     108    8892    22bc drivers/tty/serial/minitty_serial.o
>
> tty_baudrate.o is missing here. It is included in serial_core.o when
> CONFIG_TTY=y.

It is also included when CONFIG_MINITTY_SERIAL=y, so for comparison purpose I didn't list common files here.

Nicolas

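The totals in the two `size` listings quoted above can be re-derived with a few lines of C. This is only an arithmetic check on the figures posted in the thread, not part of the patch series; the struct and function names (`obj_size`, `total_size`, `reduction_tenths`) are made up for illustration, and files common to both configurations (such as tty_baudrate.o) are left out, as in the original comparison.

```c
#include <stddef.h>

/* One row of `size` output: section sizes in bytes. */
struct obj_size {
    const char *name;
    unsigned text, data, bss;
};

/* Figures copied from the CONFIG_TTY=y listing in the thread. */
static const struct obj_size full_tty[] = {
    { "drivers/tty/n_tty.o",                   8796, 128,   0 },
    { "drivers/tty/serial/serial_core.o",     12846, 276,  44 },
    { "drivers/tty/sysrq.o",                   4852, 489,  49 },
    { "drivers/tty/tty_buffer.o",              1376,   0,   0 },
    { "drivers/tty/tty_io.o",                 13571, 172, 132 },
    { "drivers/tty/tty_ioctl.o",               3072,   0,   0 },
    { "drivers/tty/tty_ldisc.o",               2457,   2, 120 },
    { "drivers/tty/tty_ldsem.o",               1328,   0,   0 },
    { "drivers/tty/tty_mutex.o",                316,   0,   0 },
    { "drivers/tty/tty_port.o",                2516,   0,   0 },
};

/* Figures copied from the CONFIG_MINITTY_SERIAL=y listing. */
static const struct obj_size mini_tty[] = {
    { "drivers/tty/serial/minitty_serial.o",   8776,   8, 108 },
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Sum text + data + bss over all objects (the "dec" column). */
static unsigned total_size(const struct obj_size *objs, size_t n)
{
    unsigned sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += objs[i].text + objs[i].data + objs[i].bss;
    return sum;
}

/* Size ratio full/mini in tenths, using integer math (59 means 5.9x). */
static unsigned reduction_tenths(unsigned full, unsigned mini)
{
    return full * 10 / mini;
}
```

Calling `total_size(full_tty, ARRAY_LEN(full_tty))` reproduces the 52542-byte total, `total_size(mini_tty, ARRAY_LEN(mini_tty))` gives 8892, and `reduction_tenths()` on those two values yields 59, i.e. a ratio of roughly 5.9 for the listed objects.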
On Fri, 24 Mar 2017, Greg Kroah-Hartman wrote:

> meta-comment, any reason you didn't cc: linux-serial@vger as well?

I didn't realize such a list even existed. I looked up "TTY LAYER" in the maintainer file.

> On Thu, Mar 23, 2017 at 05:03:01PM -0400, Nicolas Pitre wrote:
> > Many embedded systems don't need the full TTY layer support. Most of the
> > time, the TTY layer is only a conduit for outputting debugging messages
> > over a serial port. The TTY layer also implements many features that are
> > very unlikely to ever be used in such a setup. There is great potential
> > for both code and dynamic memory size reduction on small systems. This is
> > what this patch series is achieving.
> >
> > The existing TTY code is quite large and complex. Trying to shrink it is
> > rather risky as the potential for breakage is non-negligible. Therefore,
> > the approach used here consists in the creation of the minimal code that
> > interfaces with the existing UART drivers and provides TTY-like character
> > devices to user space. When the regular TTY layer is disabled, then the
> > minitty layer replacement is proposed by Kconfig.
> >
> > Of course, making it "mini" means there are limitations to what it does:
> >
> > - This supports serial ports only. No VT's, no PTY's.
> >
> > - The default n_tty line discipline is hardcoded and no other line
> >   disciplines are supported.
> >
> > - The line discipline features are not all implemented. Notably, XON/XOFF
> >   is currently not implemented (although this might not require a lot of
> >   code to do it).
> >
> > - Hung-up state is not implemented.
> >
> > - No error handling on RX bytes other than counting them.
> >
> > - Behavior in the presence of overflows is most likely different from the
> >   full TTY code.
> >
> > - Job control is currently not supported (this may change in the future and
> >   be configurable).
> >
> > But again, most small embedded systems simply don't need those things.
>
> This is true, and I like the overall idea, but I don't like all of the
> code duplication. Also, who is going to maintain this? I'm not going
> to be able to even build it, let alone test it, for the systems I
> normally use, and now you have tagged me as maintaining it for forever
> :(

I'll maintain it. Will put the needed entry in MAINTAINERS.

Why do you say you won't be able to build it? I didn't try but it is meant to build with any serial driver.

> > Here's some numbers using a minimal ARM config.
> >
> > When CONFIG_TTY=y, the following files are linked into the kernel:
> >
> >    text    data     bss     dec     hex filename
> >    8796     128       0    8924    22dc drivers/tty/n_tty.o
> >   12846     276      44   13166    336e drivers/tty/serial/serial_core.o
> >    4852     489      49    5390    150e drivers/tty/sysrq.o
> >    1376       0       0    1376     560 drivers/tty/tty_buffer.o
> >   13571     172     132   13875    3633 drivers/tty/tty_io.o
> >    3072       0       0    3072     c00 drivers/tty/tty_ioctl.o
> >    2457       2     120    2579     a13 drivers/tty/tty_ldisc.o
> >    1328       0       0    1328     530 drivers/tty/tty_ldsem.o
> >     316       0       0     316     13c drivers/tty/tty_mutex.o
> >    2516       0       0    2516     9d4 drivers/tty/tty_port.o
> >   51130    1067     345   52542    cd3e (TOTALS)
> >
> > With CONFIG_TTY=n and CONFIG_MINITTY_SERIAL=y, the above is replaced by:
> >
> >    text    data     bss     dec     hex filename
> >    8776       8     108    8892    22bc drivers/tty/serial/minitty_serial.o
> >
> > That's it! And the runtime buffer usage is much less as well.
>
> Is there some way to just reorganize the existing code to get you almost
> this same size? Make ptys and other line disciplines options to select,
> and slim down the io path by removing features there.
>
> And that serial core looks huge compared to what you are showing is
> really needed; any way to slim that down by just making features in it
> configurable?
>
> Again, I like the idea, but worry that with this change, we would have
> two different tty layers we have to maintain for the next 20+ years, and
> we have a hard time keeping one stable and working today :)

That's the crux of the argument: touching the current TTY layer is NOT going to help keeping it stable. Here, not only did I remove features, but the ones I kept were reimplemented to be much smaller and potentially less scalable and performant too. The ultimate goal here is to have the smallest code possible with very simple locking and not necessarily the most scalable code. That in itself is contradictory with the regular TTY code and warrants a separate implementation. And because it is so small, it is much easier to understand and much easier to maintain.

Where code sharing made sense, I did factor out common parts already, such as the baudrate handling. I intend to do the same to add job control support.

Nicolas

On Fri, Mar 24, 2017 at 08:31:45AM -0400, Nicolas Pitre wrote:
> On Fri, 24 Mar 2017, Greg Kroah-Hartman wrote:
>
> > meta-comment, any reason you didn't cc: linux-serial@vger as well?
>
> I didn't realize such a list even existed. I looked up "TTY LAYER" in
> the maintainer file.

Ah, didn't notice the list wasn't included there, I'll go fix that...

> > Again, I like the idea, but worry that with this change, we would have
> > two different tty layers we have to maintain for the next 20+ years, and
> > we have a hard time keeping one stable and working today :)
>
> That's the crux of the argument: touching the current TTY layer is NOT
> going to help keeping it stable. Here, not only did I remove features,
> but the ones I kept were reimplemented to be much smaller and
> potentially less scalable and performant too. The ultimate goal here is
> to have the smallest code possible with very simple locking and not
> necessarily the most scalable code. That in itself is contradictory with
> the regular TTY code and warrants a separate implementation. And because
> it is so small, it is much easier to understand and much easier to
> maintain.

So, what you are really saying here is "the current tty layer is too messy, too complex, too big, and not understandable, so I'm going to route around it by rewriting the whole thing just for my single-use-case because I don't want to touch it."

That's a horrid thing to do.

Factoring things out is great. Routing around the existing working code just because you want something "simpler" is not great. Refactor and fix things up so you do understand it, because by ignoring it, you are going to end up making the same mistakes that have already been fixed with the existing 20+ years of tty layer development.

So please, take what we have, refactor, and carve things up so that the _same_ code paths are being used for both "big and little" tty layers. That way _everyone_ benefits, no need to have totally separate code paths, and totally different files that different people maintain.

> Where code sharing made sense, I did factor out common parts already,
> such as the baudrate handling. I intend to do the same to add job
> control support.

The first two patches were great, I like those. Keep that work up, just make it so that a single line discipline attached to a serial port, without the pty stuff, works just fine and is tiny. I don't see why that can't be possible.

thanks,

greg k-h

On Fri, 24 Mar 2017, Greg Kroah-Hartman wrote:

> On Fri, Mar 24, 2017 at 08:31:45AM -0400, Nicolas Pitre wrote:
> > That's the crux of the argument: touching the current TTY layer is NOT
> > going to help keeping it stable. Here, not only did I remove features,
> > but the ones I kept were reimplemented to be much smaller and
> > potentially less scalable and performant too. The ultimate goal here is
> > to have the smallest code possible with very simple locking and not
> > necessarily the most scalable code. That in itself is contradictory with
> > the regular TTY code and warrants a separate implementation. And because
> > it is so small, it is much easier to understand and much easier to
> > maintain.
>
> So, what you are really saying here is "the current tty layer is too
> messy, too complex, too big, and not understandable, so I'm going to
> route around it by rewriting the whole thing just for my single-use-case
> because I don't want to touch it."

That's not exactly what I'm saying.

Yes, the current TTY code is big. It has to be, given that it is extremely flexible, it can scale up and still be robust, and it covers a large number of use cases. Because of those characteristics, it fundamentally cannot be made small. You just can't have it all.

I'm not saying that the current code is not understandable. I spent a considerable amount of my time understanding it, first and foremost to get to know what I'm talking about, and to find ways to shrink its memory footprint initially. It is certainly complex because of the flexibility and robustness it provides. My code most likely wouldn't perform as well in the presence of multiple high-throughput channels, for example. But that's not my concern. I'm concerned about small embedded systems where 85% of that code is useless. In some cases the ability to change baudrate is also unneeded, so I intend to make that part configurable too.

But in the end there is simply no way I could achieve the same footprint reduction with the existing code. This is clearly impossible. For example, my code performs line discipline handling in the very same buffer where the RX interrupt is storing new data. The existing TTY code has up to 3 buffering layers because of the needed modularisation to support swappable line discipline modules, etc. It is simply unreasonable to expect that the latter can be turned into the former without either breaking things or severely restricting its scope.

Let's be honest here: the existing code _could_ possibly be reduced of course. That would require a lot of effort to gain maybe a 50% reduction? What I'm looking at with my proposal here is a 6x reduction factor and I'm still not done with it. There is no way I could do that with the existing code.

Let me give you some background as to what my fundamental motivation is, and then maybe you'll understand why I'm doing this.

What is the biggest buzzword in the IT industry right now? It is IOT. Most IOT targets are so small that people are rewriting new operating systems from scratch for them. Lots of fragmentation already exists. We're talking about systems with less than one megabyte of RAM, sometimes much less. Still, those things are being connected to the internet. And this is going to be a total security nightmare.

I wish to be able to leverage the Linux ecosystem for as much of the IOT space as possible to avoid the worst of those nightmares. The Linux ecosystem has a *lot* of knowledgeable people around it, a lot of testing infrastructure and tooling available already, etc. If a security issue turns up on Linux, it has a greater chance of being caught early, or fixed quickly otherwise, and finding people with the right knowledge is easier on Linux than it could be on any RTOS out there. Still with me so far?

Yes, we have tools that can automatically reduce the kernel size. We can use LTO with the compiler, etc. LTO is pretty good already. It can typically reduce the kernel size by 20%. If all system calls are disabled except for a few ones, then LTO can get rid of another 20%. The minimal kernel I get is still 400-500 KB in size. That's still too big.

Part of the size is this 60 KB of TTY + serial driver code just to send some debugging messages out or do simple shell interactions! Now with this mini TTY and one of the existing UART drivers I'm down to 20 KB and there is still room for more reduction. There is also this 120 KB of VFS code that is always there even though there is no real filesystem at all configured in the kernel. There is that other 100 KB of core driver support code despite the fact that the set of drivers I'm using is very simple and basic. Etc.

For Linux to be suitable, it has to be small, damn small. My target is 256 KB of RAM.

And if you look at the kind of application those 256 KB systems are doing, it's basically one main task typically acquiring sensor data and sending it in some encrypted protocol over a wireless network on the internet, and possibly accepting commands back. So what do you need from the OS to achieve that? A few system calls, a minimal scheduler, minimal memory management, minimal filesystem structure and minimal network stack. And your user app.

So, why not have each of those blocks be created using the existing Linux syscall interface and internal API? At that point, it should be possible to take your standard full-featured Linux workstation and develop your user app on it, run it there using all the existing native debugging tools, etc. Also, it should be possible to swap some of those kernel blocks for the tiny alternative in your kernel config and still be able to boot such a kernel on your PC workstation and validate them there, test them with the existing fuzzers, etc. That's what I have here with this mini TTY implementation. In the end you just take the mini version of everything for the final target and you're done. And you don't have to learn a whole new development environment and programming model, etc.

I hope you'd agree with me that for such a goal, I cannot just try to shrink the existing code. There has to be a parallel implementation of some blocks alongside the main one that preserves the existing API but that provides much less scalability and fewer features. Next on my list would be a cache-less, completely serialized VFS alternative that has only what's needed to make the link between the read/write syscalls, a filesystem driver and a block driver.

And by being really small, the maintenance cost of a parallel implementation isn't very high, certainly much less than trying to maintain a single version that can scale to both extremes.

Hence this series, which I hope could be the beginning of a trend for allowing Linux into the largest computing device deployment to come.

Nicolas

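The point made above about doing line discipline handling "in the very same buffer where the RX interrupt is storing new data" can be illustrated with a small user-space sketch. This is not the minitty code: the names (`rx_line`, `rx_push`, `rx_read`), the 64-byte buffer and the handling of only erase and CR-to-NL translation are all assumptions chosen to keep the example short, and a real driver would also need locking between the interrupt handler and the reader.

```c
#include <stddef.h>
#include <string.h>

#define RX_BUF_SIZE 64          /* hypothetical size, for illustration only */

/* A single buffer shared by the receive path and the line editing logic. */
struct rx_line {
    char buf[RX_BUF_SIZE];
    size_t len;                 /* bytes currently stored */
    int complete;               /* nonzero once a full line is available */
    unsigned dropped;           /* bytes discarded because of overflow */
};

/*
 * Called for every received byte (in a driver: from the RX interrupt).
 * Canonical-mode editing is applied directly to the receive buffer, so
 * no second buffer is needed for the line discipline.
 */
static void rx_push(struct rx_line *rx, char c)
{
    if (rx->complete) {         /* previous line not consumed yet */
        rx->dropped++;
        return;
    }
    if (c == '\b' || c == 0x7f) {   /* erase: drop the last stored byte */
        if (rx->len > 0)
            rx->len--;
        return;
    }
    if (c == '\r')              /* translate CR to NL, like ICRNL */
        c = '\n';
    /* Keep the last slot free so a terminating newline always fits. */
    if (c != '\n' && rx->len >= RX_BUF_SIZE - 1) {
        rx->dropped++;
        return;
    }
    rx->buf[rx->len++] = c;
    if (c == '\n')
        rx->complete = 1;
}

/*
 * Copy a completed line out to the caller and reset the buffer.
 * Returns 0 when no full line is available; a line longer than cap
 * is truncated.
 */
static size_t rx_read(struct rx_line *rx, char *dst, size_t cap)
{
    size_t n;

    if (!rx->complete)
        return 0;
    n = rx->len < cap ? rx->len : cap;
    memcpy(dst, rx->buf, n);
    rx->len = 0;
    rx->complete = 0;
    return n;
}
```

Feeding the bytes of `"helx\blo\r"` through `rx_push()` and then calling `rx_read()` returns the six bytes `hello\n`: the erase character was applied in place, and only one buffer was ever allocated. The trade-off is the one discussed in the thread, since input arriving while a completed line is still unread is simply counted and dropped.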
On 24 March 2017 at 13:53, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> On Fri, Mar 24, 2017 at 08:31:45AM -0400, Nicolas Pitre wrote:
>> On Fri, 24 Mar 2017, Greg Kroah-Hartman wrote:
>>
>> > meta-comment, any reason you didn't cc: linux-serial@vger as well?
>>
>> I didn't realize such a list even existed. I looked up "TTY LAYER" in
>> the maintainer file.
>
> Ah, didn't notice the list wasn't included there, I'll go fix that...
>
>> > Again, I like the idea, but worry that with this change, we would have
>> > two different tty layers we have to maintain for the next 20+ years, and
>> > we have a hard time keeping one stable and working today :)
>>
>> That's the crux of the argument: touching the current TTY layer is NOT
>> going to help keeping it stable. Here, not only did I remove features,
>> but the ones I kept were reimplemented to be much smaller and
>> potentially less scalable and performant too. The ultimate goal here is
>> to have the smallest code possible with very simple locking and not
>> necessarily the most scalable code. That in itself is contradictory with
>> the regular TTY code and warrants a separate implementation. And because
>> it is so small, it is much easier to understand and much easier to
>> maintain.
>
> So, what you are really saying here is "the current tty layer is too
> messy, too complex, too big, and not understandable, so I'm going to
> route around it by rewriting the whole thing just for my single-use-case
> because I don't want to touch it."
>
> That's a horrid thing to do.
>
> Factoring things out is great. Routing around the existing working code
> just because you want something "simpler" is not great. Refactor and
> fix things up so you do understand it, because by ignoring it, you are
> going to end up making the same mistakes that have already been fixed
> with the existing 20+ years of tty layer development.
>
> So please, take what we have, refactor, and carve things up so that the
> _same_ code paths are being used for both "big and little" tty layers.
> That way _everyone_ benefits, no need to have totally separate code
> paths, and totally different files that different people maintain.
>

As I understand it, the memory saving is not only due to having less code, but also due to the fact that functionality that exists as distinct layers in the full featured TTY stack is collapsed into a single layer, requiring substantially less memory for buffers.

I guess you could call collapsing layers like this 'routing around it', but the point is that the reason for doing so is not that the code is too complex or too big, but simply that the flexibility offered by a deep stack is fundamentally irreconcilable with a shallow one that is hardwired for a serial debug port.

>> Where code sharing made sense, I did factor out common parts already,
>> such as the baudrate handling. I intend to do the same to add job
>> control support.
>
> The first two patches were great, I like those. Keep that work up, just
> make it so that a single line discipline attached to a serial port,
> without the pty stuff, works just fine and is tiny. I don't see why
> that can't be possible.
>
> thanks,
>
> greg k-h