mbox series

[PULL,00/26] migration and friends queue

Message ID 20200925120655.295142-1-dgilbert@redhat.com
Headers show
Series migration and friends queue | expand

Message

Dr. David Alan Gilbert Sept. 25, 2020, 12:06 p.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The following changes since commit 8c1c07929feae876202ba26f07a540c5115c18cd:

  Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging (2020-09-24 18:48:45 +0100)

are available in the Git repository at:

  git://github.com/dagrh/qemu.git tags/pull-migration-20200925a

for you to fetch changes up to e12a0edafeb5019aac74114b62a4703f79c5c693:

  virtiofsd: Add -o allow_direct_io|no_allow_direct_io options (2020-09-25 12:45:58 +0100)

----------------------------------------------------------------
Migration and virtiofsd pull

Chuan Zheng's Dirtyrate and TLS changes, with small fixes from Dov and
Luarent and Peter.
Small virtiofs changes from Harry, Stefan, Vivek and Jiachen.
One HMP/monitor rework from me.

----------------------------------------------------------------
Chuan Zheng (18):
      migration/dirtyrate: setup up query-dirtyrate framwork
      migration/dirtyrate: add DirtyRateStatus to denote calculation status
      migration/dirtyrate: Add RamblockDirtyInfo to store sampled page info
      migration/dirtyrate: Add dirtyrate statistics series functions
      migration/dirtyrate: move RAMBLOCK_FOREACH_MIGRATABLE into ram.h
      migration/dirtyrate: Record hash results for each sampled page
      migration/dirtyrate: Compare page hash results for recorded sampled page
      migration/dirtyrate: skip sampling ramblock with size below MIN_RAMBLOCK_SIZE
      migration/dirtyrate: Implement set_sample_page_period() and is_sample_period_valid()
      migration/dirtyrate: Implement calculate_dirtyrate() function
      migration/dirtyrate: Implement qmp_cal_dirty_rate()/qmp_get_dirty_rate() function
      migration/dirtyrate: Add trace_calls to make it easier to debug
      migration/tls: save hostname into MigrationState
      migration/tls: extract migration_tls_client_create for common-use
      migration/tls: add tls_hostname into MultiFDSendParams
      migration/tls: extract cleanup function for common-use
      migration/tls: add support for multifd tls-handshake
      migration/tls: add trace points for multifd-tls

Dov Murik (1):
      migration: Truncate state file in xen-save-devices-state

Dr. David Alan Gilbert (1):
      monitor: Use LOCK_GUARD macros

Harry G. Coin (1):
      virtiofsd: document cache=auto default

Jiachen Zhang (1):
      virtiofsd: Add -o allow_direct_io|no_allow_direct_io options

Laurent Vivier (1):
      migration: increase max-bandwidth to 128 MiB/s (1 Gib/s)

Peter Xu (2):
      migration: Properly destroy variables on incoming side
      migration: Rework migrate_send_rp_req_pages() function

Vivek Goyal (1):
      virtiofsd: Used glib "shared" thread pool

 docs/tools/virtiofsd.rst              |   1 +
 migration/channel.c                   |   1 +
 migration/dirtyrate.c                 | 426 ++++++++++++++++++++++++++++++++++
 migration/dirtyrate.h                 |  69 ++++++
 migration/meson.build                 |   2 +-
 migration/migration.c                 |  36 ++-
 migration/migration.h                 |   9 +-
 migration/multifd.c                   | 124 ++++++++--
 migration/multifd.h                   |   2 +
 migration/postcopy-ram.c              |  24 +-
 migration/ram.c                       |  11 +-
 migration/ram.h                       |  10 +
 migration/savevm.c                    |   3 +-
 migration/tls.c                       |  28 ++-
 migration/tls.h                       |   6 +
 migration/trace-events                |  12 +
 monitor/misc.c                        |  44 ++--
 qapi/migration.json                   |  67 ++++++
 tools/virtiofsd/fuse_virtio.c         |   2 +-
 tools/virtiofsd/helper.c              |   4 +
 tools/virtiofsd/passthrough_ll.c      |  20 +-
 tools/virtiofsd/passthrough_seccomp.c |   2 +
 22 files changed, 797 insertions(+), 106 deletions(-)
 create mode 100644 migration/dirtyrate.c
 create mode 100644 migration/dirtyrate.h

Comments

no-reply@patchew.org Sept. 25, 2020, 12:35 p.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/20200925120655.295142-1-dgilbert@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200925120655.295142-1-dgilbert@redhat.com
Subject: [PULL 00/26] migration and friends queue

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20200925120655.295142-1-dgilbert@redhat.com -> patchew/20200925120655.295142-1-dgilbert@redhat.com
Switched to a new branch 'test'
786afbf virtiofsd: Add -o allow_direct_io|no_allow_direct_io options
b4e3d2f virtiofsd: Used glib "shared" thread pool
5c6cf53 virtiofsd: document cache=auto default
b0a5d54 monitor: Use LOCK_GUARD macros
88b31e8 migration/tls: add trace points for multifd-tls
42dd844 migration/tls: add support for multifd tls-handshake
0dd99f4 migration/tls: extract cleanup function for common-use
e951359 migration/tls: add tls_hostname into MultiFDSendParams
652dea8 migration/tls: extract migration_tls_client_create for common-use
d70fc72 migration/tls: save hostname into MigrationState
cb0c3a3 migration: increase max-bandwidth to 128 MiB/s (1 Gib/s)
b4f2d50 migration: Truncate state file in xen-save-devices-state
7448415 migration/dirtyrate: Add trace_calls to make it easier to debug
aa44e3a migration/dirtyrate: Implement qmp_cal_dirty_rate()/qmp_get_dirty_rate() function
9a1f005 migration/dirtyrate: Implement calculate_dirtyrate() function
f1bfba7 migration/dirtyrate: Implement set_sample_page_period() and is_sample_period_valid()
cb3b7ff migration/dirtyrate: skip sampling ramblock with size below MIN_RAMBLOCK_SIZE
a82b646 migration/dirtyrate: Compare page hash results for recorded sampled page
7d0f4d2 migration/dirtyrate: Record hash results for each sampled page
8a90389 migration/dirtyrate: move RAMBLOCK_FOREACH_MIGRATABLE into ram.h
539c0d7 migration/dirtyrate: Add dirtyrate statistics series functions
421892b migration/dirtyrate: Add RamblockDirtyInfo to store sampled page info
cab1614 migration/dirtyrate: add DirtyRateStatus to denote calculation status
83826ff migration/dirtyrate: setup up query-dirtyrate framwork
c58f028 migration: Rework migrate_send_rp_req_pages() function
012226f migration: Properly destroy variables on incoming side

=== OUTPUT BEGIN ===
1/26 Checking commit 012226fcacb7 (migration: Properly destroy variables on incoming side)
2/26 Checking commit c58f0286fdde (migration: Rework migrate_send_rp_req_pages() function)
WARNING: Block comments use a leading /* on a separate line
#32: FILE: migration/migration.c:317:
+/* Request one page from the source VM at the given start address.

total: 0 errors, 1 warnings, 89 lines checked

Patch 2/26 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/26 Checking commit 83826ffcaf74 (migration/dirtyrate: setup up query-dirtyrate framwork)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#20: 
new file mode 100644

total: 0 errors, 1 warnings, 71 lines checked

Patch 3/26 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
4/26 Checking commit cab161480dd1 (migration/dirtyrate: add DirtyRateStatus to denote calculation status)
5/26 Checking commit 421892b2ce34 (migration/dirtyrate: Add RamblockDirtyInfo to store sampled page info)
6/26 Checking commit 539c0d7abcf0 (migration/dirtyrate: Add dirtyrate statistics series functions)
7/26 Checking commit 8a9038992bc0 (migration/dirtyrate: move RAMBLOCK_FOREACH_MIGRATABLE into ram.h)
ERROR: Macros with multiple statements should be enclosed in a do - while loop
#67: FILE: migration/ram.h:42:
+#define RAMBLOCK_FOREACH_NOT_IGNORED(block)            \
+    INTERNAL_RAMBLOCK_FOREACH(block)                   \
+        if (ramblock_is_ignored(block)) {} else

ERROR: trailing statements should be on next line
#69: FILE: migration/ram.h:44:
+        if (ramblock_is_ignored(block)) {} else

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#71: FILE: migration/ram.h:46:
+#define RAMBLOCK_FOREACH_MIGRATABLE(block)             \
+    INTERNAL_RAMBLOCK_FOREACH(block)                   \
+        if (!qemu_ram_is_migratable(block)) {} else

ERROR: trailing statements should be on next line
#73: FILE: migration/ram.h:48:
+        if (!qemu_ram_is_migratable(block)) {} else

ERROR: braces {} are necessary for all arms of this statement
#73: FILE: migration/ram.h:48:
+        if (!qemu_ram_is_migratable(block)) {} else
[...]
+        if (!qemu_ram_is_migratable(block)) {} else
[...]

total: 5 errors, 0 warnings, 45 lines checked

Patch 7/26 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

8/26 Checking commit 7d0f4d22333b (migration/dirtyrate: Record hash results for each sampled page)
9/26 Checking commit a82b6464e8aa (migration/dirtyrate: Compare page hash results for recorded sampled page)
10/26 Checking commit cb3b7ff83ca3 (migration/dirtyrate: skip sampling ramblock with size below MIN_RAMBLOCK_SIZE)
11/26 Checking commit f1bfba70ecdd (migration/dirtyrate: Implement set_sample_page_period() and is_sample_period_valid())
12/26 Checking commit 9a1f0050a876 (migration/dirtyrate: Implement calculate_dirtyrate() function)
13/26 Checking commit aa44e3a85c97 (migration/dirtyrate: Implement qmp_cal_dirty_rate()/qmp_get_dirty_rate() function)
14/26 Checking commit 74484157cf26 (migration/dirtyrate: Add trace_calls to make it easier to debug)
15/26 Checking commit b4f2d50c15a8 (migration: Truncate state file in xen-save-devices-state)
16/26 Checking commit cb0c3a374335 (migration: increase max-bandwidth to 128 MiB/s (1 Gib/s))
17/26 Checking commit d70fc7217cc7 (migration/tls: save hostname into MigrationState)
18/26 Checking commit 652dea809a4b (migration/tls: extract migration_tls_client_create for common-use)
19/26 Checking commit e951359067b9 (migration/tls: add tls_hostname into MultiFDSendParams)
20/26 Checking commit 0dd99f4db197 (migration/tls: extract cleanup function for common-use)
21/26 Checking commit 42dd8447efd6 (migration/tls: add support for multifd tls-handshake)
22/26 Checking commit 88b31e8e1583 (migration/tls: add trace points for multifd-tls)
23/26 Checking commit b0a5d54e6e3e (monitor: Use LOCK_GUARD macros)
24/26 Checking commit 5c6cf5345c0f (virtiofsd: document cache=auto default)
25/26 Checking commit b4e3d2f9afca (virtiofsd: Used glib "shared" thread pool)
26/26 Checking commit 786afbf48ae9 (virtiofsd: Add -o allow_direct_io|no_allow_direct_io options)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200925120655.295142-1-dgilbert@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Dr. David Alan Gilbert Sept. 25, 2020, 2:31 p.m. UTC | #2
* no-reply@patchew.org (no-reply@patchew.org) wrote:
> Patchew URL: https://patchew.org/QEMU/20200925120655.295142-1-dgilbert@redhat.com/
> 
> 
> 
> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:
> 
> Type: series
> Message-id: 20200925120655.295142-1-dgilbert@redhat.com
> Subject: [PULL 00/26] migration and friends queue
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> git rev-parse base > /dev/null || exit 0
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> git config --local diff.algorithm histogram
> ./scripts/checkpatch.pl --mailback base..
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> From https://github.com/patchew-project/qemu
>  * [new tag]         patchew/20200925120655.295142-1-dgilbert@redhat.com -> patchew/20200925120655.295142-1-dgilbert@redhat.com
> Switched to a new branch 'test'
> 786afbf virtiofsd: Add -o allow_direct_io|no_allow_direct_io options
> b4e3d2f virtiofsd: Used glib "shared" thread pool
> 5c6cf53 virtiofsd: document cache=auto default
> b0a5d54 monitor: Use LOCK_GUARD macros
> 88b31e8 migration/tls: add trace points for multifd-tls
> 42dd844 migration/tls: add support for multifd tls-handshake
> 0dd99f4 migration/tls: extract cleanup function for common-use
> e951359 migration/tls: add tls_hostname into MultiFDSendParams
> 652dea8 migration/tls: extract migration_tls_client_create for common-use
> d70fc72 migration/tls: save hostname into MigrationState
> cb0c3a3 migration: increase max-bandwidth to 128 MiB/s (1 Gib/s)
> b4f2d50 migration: Truncate state file in xen-save-devices-state
> 7448415 migration/dirtyrate: Add trace_calls to make it easier to debug
> aa44e3a migration/dirtyrate: Implement qmp_cal_dirty_rate()/qmp_get_dirty_rate() function
> 9a1f005 migration/dirtyrate: Implement calculate_dirtyrate() function
> f1bfba7 migration/dirtyrate: Implement set_sample_page_period() and is_sample_period_valid()
> cb3b7ff migration/dirtyrate: skip sampling ramblock with size below MIN_RAMBLOCK_SIZE
> a82b646 migration/dirtyrate: Compare page hash results for recorded sampled page
> 7d0f4d2 migration/dirtyrate: Record hash results for each sampled page
> 8a90389 migration/dirtyrate: move RAMBLOCK_FOREACH_MIGRATABLE into ram.h
> 539c0d7 migration/dirtyrate: Add dirtyrate statistics series functions
> 421892b migration/dirtyrate: Add RamblockDirtyInfo to store sampled page info
> cab1614 migration/dirtyrate: add DirtyRateStatus to denote calculation status
> 83826ff migration/dirtyrate: setup up query-dirtyrate framwork
> c58f028 migration: Rework migrate_send_rp_req_pages() function
> 012226f migration: Properly destroy variables on incoming side
> 
> === OUTPUT BEGIN ===
> 1/26 Checking commit 012226fcacb7 (migration: Properly destroy variables on incoming side)
> 2/26 Checking commit c58f0286fdde (migration: Rework migrate_send_rp_req_pages() function)
> WARNING: Block comments use a leading /* on a separate line
> #32: FILE: migration/migration.c:317:
> +/* Request one page from the source VM at the given start address.
> 
> total: 0 errors, 1 warnings, 89 lines checked

This is just changing the text of an existing comment that's of slightly
the wrong format.


> 
> Patch 2/26 has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 3/26 Checking commit 83826ffcaf74 (migration/dirtyrate: setup up query-dirtyrate framwork)
> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> #20: 
> new file mode 100644
> 
> total: 0 errors, 1 warnings, 71 lines checked
> 
> Patch 3/26 has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 4/26 Checking commit cab161480dd1 (migration/dirtyrate: add DirtyRateStatus to denote calculation status)
> 5/26 Checking commit 421892b2ce34 (migration/dirtyrate: Add RamblockDirtyInfo to store sampled page info)
> 6/26 Checking commit 539c0d7abcf0 (migration/dirtyrate: Add dirtyrate statistics series functions)
> 7/26 Checking commit 8a9038992bc0 (migration/dirtyrate: move RAMBLOCK_FOREACH_MIGRATABLE into ram.h)
> ERROR: Macros with multiple statements should be enclosed in a do - while loop
> #67: FILE: migration/ram.h:42:
> +#define RAMBLOCK_FOREACH_NOT_IGNORED(block)            \
> +    INTERNAL_RAMBLOCK_FOREACH(block)                   \
> +        if (ramblock_is_ignored(block)) {} else

This is just moving an existing macro and not changing it.

> ERROR: trailing statements should be on next line
> #69: FILE: migration/ram.h:44:
> +        if (ramblock_is_ignored(block)) {} else
> 
> ERROR: Macros with multiple statements should be enclosed in a do - while loop
> #71: FILE: migration/ram.h:46:
> +#define RAMBLOCK_FOREACH_MIGRATABLE(block)             \
> +    INTERNAL_RAMBLOCK_FOREACH(block)                   \
> +        if (!qemu_ram_is_migratable(block)) {} else
> 
> ERROR: trailing statements should be on next line
> #73: FILE: migration/ram.h:48:
> +        if (!qemu_ram_is_migratable(block)) {} else
> 
> ERROR: braces {} are necessary for all arms of this statement
> #73: FILE: migration/ram.h:48:
> +        if (!qemu_ram_is_migratable(block)) {} else
> [...]
> +        if (!qemu_ram_is_migratable(block)) {} else
> [...]
> 
> total: 5 errors, 0 warnings, 45 lines checked
> 
> Patch 7/26 has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 
> 8/26 Checking commit 7d0f4d22333b (migration/dirtyrate: Record hash results for each sampled page)
> 9/26 Checking commit a82b6464e8aa (migration/dirtyrate: Compare page hash results for recorded sampled page)
> 10/26 Checking commit cb3b7ff83ca3 (migration/dirtyrate: skip sampling ramblock with size below MIN_RAMBLOCK_SIZE)
> 11/26 Checking commit f1bfba70ecdd (migration/dirtyrate: Implement set_sample_page_period() and is_sample_period_valid())
> 12/26 Checking commit 9a1f0050a876 (migration/dirtyrate: Implement calculate_dirtyrate() function)
> 13/26 Checking commit aa44e3a85c97 (migration/dirtyrate: Implement qmp_cal_dirty_rate()/qmp_get_dirty_rate() function)
> 14/26 Checking commit 74484157cf26 (migration/dirtyrate: Add trace_calls to make it easier to debug)
> 15/26 Checking commit b4f2d50c15a8 (migration: Truncate state file in xen-save-devices-state)
> 16/26 Checking commit cb0c3a374335 (migration: increase max-bandwidth to 128 MiB/s (1 Gib/s))
> 17/26 Checking commit d70fc7217cc7 (migration/tls: save hostname into MigrationState)
> 18/26 Checking commit 652dea809a4b (migration/tls: extract migration_tls_client_create for common-use)
> 19/26 Checking commit e951359067b9 (migration/tls: add tls_hostname into MultiFDSendParams)
> 20/26 Checking commit 0dd99f4db197 (migration/tls: extract cleanup function for common-use)
> 21/26 Checking commit 42dd8447efd6 (migration/tls: add support for multifd tls-handshake)
> 22/26 Checking commit 88b31e8e1583 (migration/tls: add trace points for multifd-tls)
> 23/26 Checking commit b0a5d54e6e3e (monitor: Use LOCK_GUARD macros)
> 24/26 Checking commit 5c6cf5345c0f (virtiofsd: document cache=auto default)
> 25/26 Checking commit b4e3d2f9afca (virtiofsd: Used glib "shared" thread pool)
> 26/26 Checking commit 786afbf48ae9 (virtiofsd: Add -o allow_direct_io|no_allow_direct_io options)
> === OUTPUT END ===
> 
> Test command exited with code: 1
> 
> 
> The full log is available at
> http://patchew.org/logs/20200925120655.295142-1-dgilbert@redhat.com/testing.checkpatch/?type=message.
> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-devel@redhat.com
Peter Maydell Sept. 25, 2020, 4:22 p.m. UTC | #3
On Fri, 25 Sep 2020 at 13:09, Dr. David Alan Gilbert (git)
<dgilbert@redhat.com> wrote:
>

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

>

> The following changes since commit 8c1c07929feae876202ba26f07a540c5115c18cd:

>

>   Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging (2020-09-24 18:48:45 +0100)

>

> are available in the Git repository at:

>

>   git://github.com/dagrh/qemu.git tags/pull-migration-20200925a

>

> for you to fetch changes up to e12a0edafeb5019aac74114b62a4703f79c5c693:

>

>   virtiofsd: Add -o allow_direct_io|no_allow_direct_io options (2020-09-25 12:45:58 +0100)

>

> ----------------------------------------------------------------

> Migration and virtiofsd pull

>

> Chuan Zheng's Dirtyrate and TLS changes, with small fixes from Dov and

> Luarent and Peter.

> Small virtiofs changes from Harry, Stefan, Vivek and Jiachen.

> One HMP/monitor rework from me.

>



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2
for any user-visible changes.

-- PMM
Vivek Goyal Sept. 29, 2020, 9:53 p.m. UTC | #4
On Fri, Sep 25, 2020 at 01:06:55PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
> 
> Due to the commit 65da4539803373ec4eec97ffc49ee90083e56efd, the O_DIRECT
> open flag of guest applications will be discarded by virtiofsd. While
> this behavior makes it consistent with the virtio-9p scheme when guest
> applications use direct I/O, we no longer have any chance to bypass the
> host page cache.
> 
> Therefore, we add a flag 'allow_direct_io' to lo_data. If '-o
>  no_allow_direct_io' option is added, or none of '-o allow_direct_io' or
>  '-o no_allow_direct_io' is added, the 'allow_direct_io' will be set to
>  0, and virtiofsd discards O_DIRECT as before. If '-o allow_direct_io'
> is added to the starting command-line, 'allow_direct_io' will be set to
> 1, so that the O_DIRECT flags will be retained and host page cache can
> be bypassed.

Hi Jiachen,

Curious that in what cases you want to bypass host page cache.

Thanks
Vivek

> 
> Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Message-Id: <20200824105957.61265-1-zhangjiachen.jaycee@bytedance.com>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tools/virtiofsd/helper.c         |  4 ++++
>  tools/virtiofsd/passthrough_ll.c | 20 ++++++++++++++------
>  2 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 7bc5d7dc5a..85770d63f1 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -178,6 +178,10 @@ void fuse_cmdline_help(void)
>             "                               (0 leaves rlimit unchanged)\n"
>             "                               default: min(1000000, fs.file-max - 16384)\n"
>             "                                        if the current rlimit is lower\n"
> +           "    -o allow_direct_io|no_allow_direct_io\n"
> +           "                               retain/discard O_DIRECT flags passed down\n"
> +           "                               to virtiofsd from guest applications.\n"
> +           "                               default: no_allow_direct_io\n"
>             );
>  }
>  
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 784330e0e4..0b229ebd57 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -151,6 +151,7 @@ struct lo_data {
>      int timeout_set;
>      int readdirplus_set;
>      int readdirplus_clear;
> +    int allow_direct_io;
>      struct lo_inode root;
>      GHashTable *inodes; /* protected by lo->mutex */
>      struct lo_map ino_map; /* protected by lo->mutex */
> @@ -179,6 +180,8 @@ static const struct fuse_opt lo_opts[] = {
>      { "cache=always", offsetof(struct lo_data, cache), CACHE_ALWAYS },
>      { "readdirplus", offsetof(struct lo_data, readdirplus_set), 1 },
>      { "no_readdirplus", offsetof(struct lo_data, readdirplus_clear), 1 },
> +    { "allow_direct_io", offsetof(struct lo_data, allow_direct_io), 1 },
> +    { "no_allow_direct_io", offsetof(struct lo_data, allow_direct_io), 0 },
>      FUSE_OPT_END
>  };
>  static bool use_syslog = false;
> @@ -1516,7 +1519,8 @@ static void lo_releasedir(fuse_req_t req, fuse_ino_t ino,
>      fuse_reply_err(req, 0);
>  }
>  
> -static void update_open_flags(int writeback, struct fuse_file_info *fi)
> +static void update_open_flags(int writeback, int allow_direct_io,
> +                              struct fuse_file_info *fi)
>  {
>      /*
>       * With writeback cache, kernel may send read requests even
> @@ -1541,10 +1545,13 @@ static void update_open_flags(int writeback, struct fuse_file_info *fi)
>  
>      /*
>       * O_DIRECT in guest should not necessarily mean bypassing page
> -     * cache on host as well. If somebody needs that behavior, it
> -     * probably should be a configuration knob in daemon.
> +     * cache on host as well. Therefore, we discard it by default
> +     * ('-o no_allow_direct_io'). If somebody needs that behavior,
> +     * the '-o allow_direct_io' option should be set.
>       */
> -    fi->flags &= ~O_DIRECT;
> +    if (!allow_direct_io) {
> +        fi->flags &= ~O_DIRECT;
> +    }
>  }
>  
>  static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
> @@ -1576,7 +1583,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out;
>      }
>  
> -    update_open_flags(lo->writeback, fi);
> +    update_open_flags(lo->writeback, lo->allow_direct_io, fi);
>  
>      fd = openat(parent_inode->fd, name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
>                  mode);
> @@ -1786,7 +1793,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>      fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
>               fi->flags);
>  
> -    update_open_flags(lo->writeback, fi);
> +    update_open_flags(lo->writeback, lo->allow_direct_io, fi);
>  
>      sprintf(buf, "%i", lo_fd(req, ino));
>      fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
> @@ -2823,6 +2830,7 @@ int main(int argc, char *argv[])
>          .debug = 0,
>          .writeback = 0,
>          .posix_lock = 0,
> +        .allow_direct_io = 0,
>          .proc_self_fd = -1,
>      };
>      struct lo_map_elem *root_elem;
> -- 
> 2.26.2
>
Jiachen Zhang Sept. 30, 2020, 2:14 a.m. UTC | #5
On Wed, Sep 30, 2020 at 5:53 AM Vivek Goyal <vgoyal@redhat.com> wrote:

> On Fri, Sep 25, 2020 at 01:06:55PM +0100, Dr. David Alan Gilbert (git)

> wrote:

> > From: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>

> >

> > Due to the commit 65da4539803373ec4eec97ffc49ee90083e56efd, the O_DIRECT

> > open flag of guest applications will be discarded by virtiofsd. While

> > this behavior makes it consistent with the virtio-9p scheme when guest

> > applications use direct I/O, we no longer have any chance to bypass the

> > host page cache.

> >

> > Therefore, we add a flag 'allow_direct_io' to lo_data. If '-o

> >  no_allow_direct_io' option is added, or none of '-o allow_direct_io' or

> >  '-o no_allow_direct_io' is added, the 'allow_direct_io' will be set to

> >  0, and virtiofsd discards O_DIRECT as before. If '-o allow_direct_io'

> > is added to the starting command-line, 'allow_direct_io' will be set to

> > 1, so that the O_DIRECT flags will be retained and host page cache can

> > be bypassed.

>

> Hi Jiachen,

>

> Curious that in what cases you want to bypass host page cache.

>

> Thanks

> Vivek

>


Hi Vivek,

Some apps like DBMS may allocate their own file cache in userspace, so they
may want to bypass kernel page cache by using O_DIRECT. When these apps are
running in guest and access virtio-fs files, we'd better obey these needs.
This can also eliminate the host memory usage.

Another case is when we perform file I/O benchmarks on different storage
devices (like HDD and SSD), it's not that fair to introduce ram caches. By
using "cache=none" with "allow_direct_io", we can bypass guest and host page
caches, and access the storage devices directly.

Best wishes,
Jiachen


> >

> > Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>

> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

> > Message-Id: <20200824105957.61265-1-zhangjiachen.jaycee@bytedance.com>

> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> > ---

> >  tools/virtiofsd/helper.c         |  4 ++++

> >  tools/virtiofsd/passthrough_ll.c | 20 ++++++++++++++------

> >  2 files changed, 18 insertions(+), 6 deletions(-)

> >

> > diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c

> > index 7bc5d7dc5a..85770d63f1 100644

> > --- a/tools/virtiofsd/helper.c

> > +++ b/tools/virtiofsd/helper.c

> > @@ -178,6 +178,10 @@ void fuse_cmdline_help(void)

> >             "                               (0 leaves rlimit

> unchanged)\n"

> >             "                               default: min(1000000,

> fs.file-max - 16384)\n"

> >             "                                        if the current

> rlimit is lower\n"

> > +           "    -o allow_direct_io|no_allow_direct_io\n"

> > +           "                               retain/discard O_DIRECT

> flags passed down\n"

> > +           "                               to virtiofsd from guest

> applications.\n"

> > +           "                               default:

> no_allow_direct_io\n"

> >             );

> >  }

> >

> > diff --git a/tools/virtiofsd/passthrough_ll.c

> b/tools/virtiofsd/passthrough_ll.c

> > index 784330e0e4..0b229ebd57 100644

> > --- a/tools/virtiofsd/passthrough_ll.c

> > +++ b/tools/virtiofsd/passthrough_ll.c

> > @@ -151,6 +151,7 @@ struct lo_data {

> >      int timeout_set;

> >      int readdirplus_set;

> >      int readdirplus_clear;

> > +    int allow_direct_io;

> >      struct lo_inode root;

> >      GHashTable *inodes; /* protected by lo->mutex */

> >      struct lo_map ino_map; /* protected by lo->mutex */

> > @@ -179,6 +180,8 @@ static const struct fuse_opt lo_opts[] = {

> >      { "cache=always", offsetof(struct lo_data, cache), CACHE_ALWAYS },

> >      { "readdirplus", offsetof(struct lo_data, readdirplus_set), 1 },

> >      { "no_readdirplus", offsetof(struct lo_data, readdirplus_clear), 1

> },

> > +    { "allow_direct_io", offsetof(struct lo_data, allow_direct_io), 1 },

> > +    { "no_allow_direct_io", offsetof(struct lo_data, allow_direct_io),

> 0 },

> >      FUSE_OPT_END

> >  };

> >  static bool use_syslog = false;

> > @@ -1516,7 +1519,8 @@ static void lo_releasedir(fuse_req_t req,

> fuse_ino_t ino,

> >      fuse_reply_err(req, 0);

> >  }

> >

> > -static void update_open_flags(int writeback, struct fuse_file_info *fi)

> > +static void update_open_flags(int writeback, int allow_direct_io,

> > +                              struct fuse_file_info *fi)

> >  {

> >      /*

> >       * With writeback cache, kernel may send read requests even

> > @@ -1541,10 +1545,13 @@ static void update_open_flags(int writeback,

> struct fuse_file_info *fi)

> >

> >      /*

> >       * O_DIRECT in guest should not necessarily mean bypassing page

> > -     * cache on host as well. If somebody needs that behavior, it

> > -     * probably should be a configuration knob in daemon.

> > +     * cache on host as well. Therefore, we discard it by default

> > +     * ('-o no_allow_direct_io'). If somebody needs that behavior,

> > +     * the '-o allow_direct_io' option should be set.

> >       */

> > -    fi->flags &= ~O_DIRECT;

> > +    if (!allow_direct_io) {

> > +        fi->flags &= ~O_DIRECT;

> > +    }

> >  }

> >

> >  static void lo_create(fuse_req_t req, fuse_ino_t parent, const char

> *name,

> > @@ -1576,7 +1583,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t

> parent, const char *name,

> >          goto out;

> >      }

> >

> > -    update_open_flags(lo->writeback, fi);

> > +    update_open_flags(lo->writeback, lo->allow_direct_io, fi);

> >

> >      fd = openat(parent_inode->fd, name, (fi->flags | O_CREAT) &

> ~O_NOFOLLOW,

> >                  mode);

> > @@ -1786,7 +1793,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t

> ino, struct fuse_file_info *fi)

> >      fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n",

> ino,

> >               fi->flags);

> >

> > -    update_open_flags(lo->writeback, fi);

> > +    update_open_flags(lo->writeback, lo->allow_direct_io, fi);

> >

> >      sprintf(buf, "%i", lo_fd(req, ino));

> >      fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);

> > @@ -2823,6 +2830,7 @@ int main(int argc, char *argv[])

> >          .debug = 0,

> >          .writeback = 0,

> >          .posix_lock = 0,

> > +        .allow_direct_io = 0,

> >          .proc_self_fd = -1,

> >      };

> >      struct lo_map_elem *root_elem;

> > --

> > 2.26.2

> >

>

>
<div dir="ltr"><div dir="ltr"><br></div><div class="gmail_quote"><div class="gmail_attr" dir="ltr">On Wed, Sep 30, 2020 at 5:53 AM Vivek Goyal &lt;<a href="mailto:vgoyal@redhat.com">vgoyal@redhat.com</a>&gt; wrote:<br></div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">On Fri, Sep 25, 2020 at 01:06:55PM +0100, Dr. David Alan Gilbert (git) wrote:<br>
&gt; From: Jiachen Zhang &lt;<a target="_blank" href="mailto:zhangjiachen.jaycee@bytedance.com">zhangjiachen.jaycee@bytedance.com</a>&gt;<br>
&gt; <br>
&gt; Due to the commit 65da4539803373ec4eec97ffc49ee90083e56efd, the O_DIRECT<br>
&gt; open flag of guest applications will be discarded by virtiofsd. While<br>
&gt; this behavior makes it consistent with the virtio-9p scheme when guest<br>
&gt; applications use direct I/O, we no longer have any chance to bypass the<br>
&gt; host page cache.<br>
&gt; <br>
&gt; Therefore, we add a flag &#39;allow_direct_io&#39; to lo_data. If &#39;-o<br>
&gt;  no_allow_direct_io&#39; option is added, or none of &#39;-o allow_direct_io&#39; or<br>
&gt;  &#39;-o no_allow_direct_io&#39; is added, the &#39;allow_direct_io&#39; will be set to<br>
&gt;  0, and virtiofsd discards O_DIRECT as before. If &#39;-o allow_direct_io&#39;<br>
&gt; is added to the starting command-line, &#39;allow_direct_io&#39; will be set to<br>
&gt; 1, so that the O_DIRECT flags will be retained and host page cache can<br>
&gt; be bypassed.<br>
<br>
Hi Jiachen,<br>
<br>
Curious that in what cases you want to bypass host page cache.<br>
<br>
Thanks<br>
Vivek<br></blockquote><div> <br></div><div>Hi Vivek,</div><div><br></div><div>Some apps like DBMS may allocate their own file cache in userspace, so they<br>may want to bypass kernel page cache by using O_DIRECT. When these apps are<br>running in guest and access virtio-fs files, we&#39;d better obey these needs.<br>This can also eliminate the host memory usage.<br><br>Another case is when we perform file I/O benchmarks on different storage<br>devices (like HDD and SSD), it&#39;s not that fair to introduce ram caches. By<br>using &quot;cache=none&quot; with &quot;allow_direct_io&quot;, we can bypass guest and host page<br>caches, and access the storage devices directly.<br></div><div><br></div><div>Best wishes,</div><div>Jiachen</div><div> </div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">
&gt; <br>
&gt; Signed-off-by: Jiachen Zhang &lt;<a target="_blank" href="mailto:zhangjiachen.jaycee@bytedance.com">zhangjiachen.jaycee@bytedance.com</a>&gt;<br>
&gt; Reviewed-by: Stefan Hajnoczi &lt;<a target="_blank" href="mailto:stefanha@redhat.com">stefanha@redhat.com</a>&gt;<br>
&gt; Message-Id: &lt;<a target="_blank" href="mailto:20200824105957.61265-1-zhangjiachen.jaycee@bytedance.com">20200824105957.61265-1-zhangjiachen.jaycee@bytedance.com</a>&gt;<br>
&gt; Signed-off-by: Dr. David Alan Gilbert &lt;<a target="_blank" href="mailto:dgilbert@redhat.com">dgilbert@redhat.com</a>&gt;<br>
&gt; ---<br>
&gt;  tools/virtiofsd/helper.c         |  4 ++++<br>
&gt;  tools/virtiofsd/passthrough_ll.c | 20 ++++++++++++++------<br>
&gt;  2 files changed, 18 insertions(+), 6 deletions(-)<br>
&gt; <br>
&gt; diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c<br>
&gt; index 7bc5d7dc5a..85770d63f1 100644<br>
&gt; --- a/tools/virtiofsd/helper.c<br>
&gt; +++ b/tools/virtiofsd/helper.c<br>
&gt; @@ -178,6 +178,10 @@ void fuse_cmdline_help(void)<br>
&gt;             &quot;                               (0 leaves rlimit unchanged)\n&quot;<br>
&gt;             &quot;                               default: min(1000000, fs.file-max - 16384)\n&quot;<br>
&gt;             &quot;                                        if the current rlimit is lower\n&quot;<br>
&gt; +           &quot;    -o allow_direct_io|no_allow_direct_io\n&quot;<br>
&gt; +           &quot;                               retain/discard O_DIRECT flags passed down\n&quot;<br>
&gt; +           &quot;                               to virtiofsd from guest applications.\n&quot;<br>
&gt; +           &quot;                               default: no_allow_direct_io\n&quot;<br>
&gt;             );<br>
&gt;  }<br>
&gt;  <br>
&gt; diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c<br>
&gt; index 784330e0e4..0b229ebd57 100644<br>
&gt; --- a/tools/virtiofsd/passthrough_ll.c<br>
&gt; +++ b/tools/virtiofsd/passthrough_ll.c<br>
&gt; @@ -151,6 +151,7 @@ struct lo_data {<br>
&gt;      int timeout_set;<br>
&gt;      int readdirplus_set;<br>
&gt;      int readdirplus_clear;<br>
&gt; +    int allow_direct_io;<br>
&gt;      struct lo_inode root;<br>
&gt;      GHashTable *inodes; /* protected by lo-&gt;mutex */<br>
&gt;      struct lo_map ino_map; /* protected by lo-&gt;mutex */<br>
&gt; @@ -179,6 +180,8 @@ static const struct fuse_opt lo_opts[] = {<br>
&gt;      { &quot;cache=always&quot;, offsetof(struct lo_data, cache), CACHE_ALWAYS },<br>
&gt;      { &quot;readdirplus&quot;, offsetof(struct lo_data, readdirplus_set), 1 },<br>
&gt;      { &quot;no_readdirplus&quot;, offsetof(struct lo_data, readdirplus_clear), 1 },<br>
&gt; +    { &quot;allow_direct_io&quot;, offsetof(struct lo_data, allow_direct_io), 1 },<br>
&gt; +    { &quot;no_allow_direct_io&quot;, offsetof(struct lo_data, allow_direct_io), 0 },<br>
&gt;      FUSE_OPT_END<br>
&gt;  };<br>
&gt;  static bool use_syslog = false;<br>
&gt; @@ -1516,7 +1519,8 @@ static void lo_releasedir(fuse_req_t req, fuse_ino_t ino,<br>
&gt;      fuse_reply_err(req, 0);<br>
&gt;  }<br>
&gt;  <br>
&gt; -static void update_open_flags(int writeback, struct fuse_file_info *fi)<br>
&gt; +static void update_open_flags(int writeback, int allow_direct_io,<br>
&gt; +                              struct fuse_file_info *fi)<br>
&gt;  {<br>
&gt;      /*<br>
&gt;       * With writeback cache, kernel may send read requests even<br>
&gt; @@ -1541,10 +1545,13 @@ static void update_open_flags(int writeback, struct fuse_file_info *fi)<br>
&gt;  <br>
&gt;      /*<br>
&gt;       * O_DIRECT in guest should not necessarily mean bypassing page<br>
&gt; -     * cache on host as well. If somebody needs that behavior, it<br>
&gt; -     * probably should be a configuration knob in daemon.<br>
&gt; +     * cache on host as well. Therefore, we discard it by default<br>
&gt; +     * (&#39;-o no_allow_direct_io&#39;). If somebody needs that behavior,<br>
&gt; +     * the &#39;-o allow_direct_io&#39; option should be set.<br>
&gt;       */<br>
&gt; -    fi-&gt;flags &amp;= ~O_DIRECT;<br>
&gt; +    if (!allow_direct_io) {<br>
&gt; +        fi-&gt;flags &amp;= ~O_DIRECT;<br>
&gt; +    }<br>
&gt;  }<br>
&gt;  <br>
&gt;  static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,<br>
&gt; @@ -1576,7 +1583,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,<br>
&gt;          goto out;<br>
&gt;      }<br>
&gt;  <br>
&gt; -    update_open_flags(lo-&gt;writeback, fi);<br>
&gt; +    update_open_flags(lo-&gt;writeback, lo-&gt;allow_direct_io, fi);<br>
&gt;  <br>
&gt;      fd = openat(parent_inode-&gt;fd, name, (fi-&gt;flags | O_CREAT) &amp; ~O_NOFOLLOW,<br>
&gt;                  mode);<br>
&gt; @@ -1786,7 +1793,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)<br>
&gt;      fuse_log(FUSE_LOG_DEBUG, &quot;lo_open(ino=%&quot; PRIu64 &quot;, flags=%d)\n&quot;, ino,<br>
&gt;               fi-&gt;flags);<br>
&gt;  <br>
&gt; -    update_open_flags(lo-&gt;writeback, fi);<br>
&gt; +    update_open_flags(lo-&gt;writeback, lo-&gt;allow_direct_io, fi);<br>
&gt;  <br>
&gt;      sprintf(buf, &quot;%i&quot;, lo_fd(req, ino));<br>
&gt;      fd = openat(lo-&gt;proc_self_fd, buf, fi-&gt;flags &amp; ~O_NOFOLLOW);<br>
&gt; @@ -2823,6 +2830,7 @@ int main(int argc, char *argv[])<br>
&gt;          .debug = 0,<br>
&gt;          .writeback = 0,<br>
&gt;          .posix_lock = 0,<br>
&gt; +        .allow_direct_io = 0,<br>
&gt;          .proc_self_fd = -1,<br>
&gt;      };<br>
&gt;      struct lo_map_elem *root_elem;<br>
&gt; -- <br>
&gt; 2.26.2<br>
&gt; <br>
<br>
</blockquote></div></div>