[00/26] x86: video: Speed up the framebuffer

Message ID 20200519231058.19945-1-sjg@chromium.org

Message

Simon Glass May 19, 2020, 11:10 p.m. UTC
Some architectures use a cached framebuffer and flush the cache as needed
so that changes are visible. This is supported by U-Boot.

However, x86 uses an uncached framebuffer with a 'write-combining' feature
to speed up writes. Reads are permitted but they are extremely expensive.

Unfortunately, reading from the frame buffer is quite common, e.g. to
scroll it. This makes scrolling very slow.

This series adds a new feature which supports copying modified parts of
the frame buffer to the uncached hardware buffer. This speeds up scrolling
dramatically on x86 so the extra complexity cost seems worth it.
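
As a rough illustration of the copy idea (a minimal sketch, not code from
this series): drawing and scrolling happen in a cached buffer, and only the
modified range is then written to the hardware buffer, so the uncached
memory is never read. The names fb_pair, fb_sync_copy() and fb_scroll_up()
are hypothetical, and a plain array stands in for the hardware framebuffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration only; not the API added by this series */
struct fb_pair {
	uint8_t *fb;       /* cached working buffer: cheap to read and write */
	uint8_t *copy_fb;  /* stand-in for the uncached hardware buffer */
	size_t size;       /* size of each buffer in bytes */
};

/*
 * Copy the modified byte range [from, to) of the cached buffer to the same
 * offset in the hardware buffer. The hardware buffer is only written,
 * never read. Returns 0 on success, -1 if the range is out of bounds.
 */
static int fb_sync_copy(struct fb_pair *p, const uint8_t *from,
			const uint8_t *to)
{
	size_t offset;

	if (from < p->fb || to > p->fb + p->size || from > to)
		return -1;

	offset = (size_t)(from - p->fb);
	memcpy(p->copy_fb + offset, from, (size_t)(to - from));

	return 0;
}

/*
 * Scroll up by 'rows' lines of 'line_length' bytes each. All reads hit the
 * cached buffer; the result is then pushed to the hardware buffer.
 */
static int fb_scroll_up(struct fb_pair *p, size_t line_length, size_t rows)
{
	size_t bytes = line_length * rows;

	if (bytes > p->size)
		return -1;

	memmove(p->fb, p->fb + bytes, p->size - bytes);
	memset(p->fb + p->size - bytes, 0, bytes);	/* blank the new lines */

	return fb_sync_copy(p, p->fb, p->fb + p->size);
}
```

A caller would modify the cached buffer and then call fb_sync_copy() with
the range it touched; a full-screen scroll simply passes the whole buffer.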

In an extreme case, the time to print the environment on minnowboard with
1280x1024 and CONFIG_CONSOLE_SCROLL_LINES disabled is reduced
significantly, from 13 seconds to 300ms.


Simon Glass (26):
  x86: fsp: Reinit the FPU after FSP meminit
  console: Add a way to output to serial only
  video: Show an error when a vidconsole function fails
  sandbox: video: Allow selection of rotated console
  video: Split out expression parts into variables
  video: Adjust rotated console to start at right edge
  video: Drop unnecessary #ifdef around vid_console_color()
  video: Add a comment for struct video_uc_platdata
  video: Add support for copying to a hardware framebuffer
  video: Set up the copy framebuffer when enabled
  video: Clear the copy framebuffer when clearing the screen
  video: Add helpers for vidconsole for the copy framebuffer
  video: Update normal console to support copy buffer
  video: Update truetype console to support copy buffer
  video: Update rotated console to support copy buffer
  video: Update the copy framebuffer when writing bitmaps
  video: Add comments to struct sandbox_sdl_plat
  video: sandbox: Add support for the copy framebuffer
  video: pci: Set up the copy framebuffer
  x86: fsp: video: Allocate a frame buffer when needed
  video: Correctly handle multiple framebuffers
  x86: video: Support copy framebuffer with probed devices
  x86: chromebook_samus: Enable the copy framebuffer
  x86: chromebook_link: Enable the copy framebuffer
  x86: minnowmax: Enable the copy framebuffer
  x86: minnowmax: Drop screen resolution to 1024x768

 arch/x86/cpu/i386/cpu.c            |   5 ++
 arch/x86/include/asm/u-boot-x86.h  |   8 +++
 arch/x86/lib/fsp/fsp_graphics.c    |  12 ++++
 arch/x86/lib/fsp2/fsp_meminit.c    |   1 +
 common/console.c                   |  28 ++++++--
 configs/chromebook_link_defconfig  |   2 +-
 configs/chromebook_samus_defconfig |   2 +-
 configs/minnowmax_defconfig        |   4 +-
 configs/sandbox_defconfig          |   1 +
 drivers/pci/pci_rom.c              |  22 +++++-
 drivers/video/Kconfig              |  31 +++++++++
 drivers/video/broadwell_igd.c      |  16 ++++-
 drivers/video/console_normal.c     |  26 +++++++-
 drivers/video/console_rotate.c     | 103 ++++++++++++++++++++---------
 drivers/video/console_truetype.c   |  43 ++++++++----
 drivers/video/ivybridge_igd.c      |  26 ++++++--
 drivers/video/sandbox_sdl.c        |  10 ++-
 drivers/video/vesa.c               |  30 ++++++++-
 drivers/video/vidconsole-uclass.c  |  44 +++++++++++-
 drivers/video/video-uclass.c       |  93 +++++++++++++++++++++++++-
 drivers/video/video_bmp.c          |  16 ++++-
 include/console.h                  |  13 ++++
 include/dm/test.h                  |  14 +++-
 include/video.h                    |  41 ++++++++++++
 include/video_console.h            |  51 +++++++++++++-
 test/dm/video.c                    |  60 ++++++++++-------
 26 files changed, 600 insertions(+), 102 deletions(-)

Comments

Bin Meng May 22, 2020, 3:29 p.m. UTC | #1
Hi Simon,

On Wed, May 20, 2020 at 7:11 AM Simon Glass <sjg at chromium.org> wrote:
>
> Some architectures use a cached framebuffer and flush the cache as needed
> so that changes are visible. This is supported by U-Boot.
>
> However x86 uses an uncached framebuffer with a 'write-combining' feature
> to speed up writes. Reads are permitted but they are extremely expensive.
>

Is it possible to use a cached framebuffer on x86?

> Unfortunately, reading from the frame buffer is quite common, e.g. to
> scroll it. This makes scrolling very slow.
>
> This series adds a new feature which supports copying modified parts of
> the frame buffer to the uncached hardware buffer. This speeds up scrolling
> dramatically on x86 so the extra complexity cost seems worth it.
>
> In an extreme case, the time to print the environment on minnowboard with
> 1280x1024 and CONFIG_CONSOLE_SCROLL_LINES disabled is reduced
> significantly, from 13 seconds to 300ms.
>

Thanks for the series. The improvements sound great! I will get a
minnowmax board to test this series soon.

Regards,
Bin
Simon Glass May 22, 2020, 4:33 p.m. UTC | #2
Hi Bin,

On Fri, 22 May 2020 at 09:30, Bin Meng <bmeng.cn at gmail.com> wrote:
>
> Hi Simon,
>
> On Wed, May 20, 2020 at 7:11 AM Simon Glass <sjg at chromium.org> wrote:
> >
> > Some architectures use a cached framebuffer and flush the cache as needed
> > so that changes are visible. This is supported by U-Boot.
> >
> > However x86 uses an uncached framebuffer with a 'write-combining' feature
> > to speed up writes. Reads are permitted but they are extremely expensive.
> >
>
> Is it possible to use a cached framebuffer on x86?

It might be possible on newer chips. I see so much conflicting stuff
about flushing the cache, though. The write-through cache is slow for
reads. The write-back cache never writes unless you flush.

>
> > Unfortunately, reading from the frame buffer is quite common, e.g. to
> > scroll it. This makes scrolling very slow.
> >
> > This series adds a new feature which supports copying modified parts of
> > the frame buffer to the uncached hardware buffer. This speeds up scrolling
> > dramatically on x86 so the extra complexity cost seems worth it.
> >
> > In an extreme case, the time to print the environment on minnowboard with
> > 1280x1024 and CONFIG_CONSOLE_SCROLL_LINES disabled is reduced
> > significantly, from 13 seconds to 300ms.
> >
>
> Thanks for the series. The improvements sound great! I will get a
> minnowmax board to test this series soon.

Yes, it should work on that.

Regards,
Simon
Bin Meng May 22, 2020, 11:09 p.m. UTC | #3
Hi Simon,

On Sat, May 23, 2020 at 12:34 AM Simon Glass <sjg at chromium.org> wrote:
>
> Hi Bin,
>
> On Fri, 22 May 2020 at 09:30, Bin Meng <bmeng.cn at gmail.com> wrote:
> >
> > Hi Simon,
> >
> > On Wed, May 20, 2020 at 7:11 AM Simon Glass <sjg at chromium.org> wrote:
> > >
> > > Some architectures use a cached framebuffer and flush the cache as needed
> > > so that changes are visible. This is supported by U-Boot.
> > >
> > > However x86 uses an uncached framebuffer with a 'write-combining' feature
> > > to speed up writes. Reads are permitted but they are extremely expensive.
> > >
> >
> > Is it possible to use a cached framebuffer on x86?
>
> It might be possible on newer chips. I see so much conflicting stuff
> about flushing the cache, though. The write-through cache is slow for
> reads. The write-back cache never writes unless you flush.

But you said "This is supported by U-Boot." So if we use cached frame
buffers on x86, does the video driver already handle the cache coherency
for us? What conflicting stuff did you see?

>
> >
> > > Unfortunately, reading from the frame buffer is quite common, e.g. to
> > > scroll it. This makes scrolling very slow.
> > >
> > > This series adds a new feature which supports copying modified parts of
> > > the frame buffer to the uncached hardware buffer. This speeds up scrolling
> > > dramatically on x86 so the extra complexity cost seems worth it.
> > >
> > > In an extreme case, the time to print the environment on minnowboard with
> > > 1280x1024 and CONFIG_CONSOLE_SCROLL_LINES disabled is reduced
> > > significantly, from 13 seconds to 300ms.
> > >
> >
> > Thanks for the series. The improvements sound great! I will get a
> > minnowmax board to test this series soon.
>
> Yes it should work on that.
>

Regards,
Bin
Simon Glass May 22, 2020, 11:13 p.m. UTC | #4
Hi Bin,

On Fri, 22 May 2020 at 17:09, Bin Meng <bmeng.cn at gmail.com> wrote:
>
> Hi Simon,
>
> On Sat, May 23, 2020 at 12:34 AM Simon Glass <sjg at chromium.org> wrote:
> >
> > Hi Bin,
> >
> > On Fri, 22 May 2020 at 09:30, Bin Meng <bmeng.cn at gmail.com> wrote:
> > >
> > > Hi Simon,
> > >
> > > On Wed, May 20, 2020 at 7:11 AM Simon Glass <sjg at chromium.org> wrote:
> > > >
> > > > Some architectures use a cached framebuffer and flush the cache as needed
> > > > so that changes are visible. This is supported by U-Boot.
> > > >
> > > > However x86 uses an uncached framebuffer with a 'write-combining' feature
> > > > to speed up writes. Reads are permitted but they are extremely expensive.
> > > >
> > >
> > > Is it possible to use a cached framebuffer on x86?
> >
> > It might be possible on newer chips. I see so much conflicting stuff
> > about flushing the cache, though. The write-through cache is slow for
> > reads. The write-back cache never writes unless you flush.
>
> But you said "This is supported by U-Boot." So if we use cached frame
> buffers on x86, does the video driver already handle the cache coherency
> for us? What conflicting stuff did you see?

It is supported by U-Boot on ARM.

If you enable write-back caching, you see nothing on the display. I
did try that, but no dice.

Regards,
Simon

[..]