diff mbox series

[RFC,v4,07/11] fs: statx add write zeroes unmap attribute

Message ID 20250421021509.2366003-8-yi.zhang@huaweicloud.com
State New
Headers show
Series fallocate: introduce FALLOC_FL_WRITE_ZEROES flag | expand

Commit Message

Zhang Yi April 21, 2025, 2:15 a.m. UTC
From: Zhang Yi <yi.zhang@huawei.com>

Add a new attribute flag to statx to determine whether a bdev or a file
supports the unmap write zeroes command.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 block/bdev.c              | 4 ++++
 fs/ext4/inode.c           | 9 ++++++---
 include/uapi/linux/stat.h | 1 +
 3 files changed, 11 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig May 5, 2025, 1:22 p.m. UTC | #1
On Mon, Apr 21, 2025 at 10:15:05AM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Add a new attribute flag to statx to determine whether a bdev or a file
> supports the unmap write zeroes command.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
>  block/bdev.c              | 4 ++++
>  fs/ext4/inode.c           | 9 ++++++---
>  include/uapi/linux/stat.h | 1 +
>  3 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 4844d1e27b6f..29b0e5feb138 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -1304,6 +1304,10 @@ void bdev_statx(struct path *path, struct kstat *stat,
>  			queue_atomic_write_unit_max_bytes(bd_queue));
>  	}
>  
> +	if (bdev_write_zeroes_unmap(bdev))
> +		stat->attributes |= STATX_ATTR_WRITE_ZEROES_UNMAP;
> +	stat->attributes_mask |= STATX_ATTR_WRITE_ZEROES_UNMAP;

Hmm, shouldn't this always be set by stat?  But I might just be
really confused what attributes_mask is, and might in fact have
misapplied it in past patches of my own..

Also shouldn't the patches to report the flag go into the bdev/ext4
patches that actually implement the feature for the respective files
to keep bisectability?
Darrick J. Wong May 5, 2025, 2:29 p.m. UTC | #2
On Mon, May 05, 2025 at 03:22:08PM +0200, Christoph Hellwig wrote:
> On Mon, Apr 21, 2025 at 10:15:05AM +0800, Zhang Yi wrote:
> > From: Zhang Yi <yi.zhang@huawei.com>
> > 
> > Add a new attribute flag to statx to determine whether a bdev or a file
> > supports the unmap write zeroes command.
> > 
> > Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > ---
> >  block/bdev.c              | 4 ++++
> >  fs/ext4/inode.c           | 9 ++++++---
> >  include/uapi/linux/stat.h | 1 +
> >  3 files changed, 11 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/bdev.c b/block/bdev.c
> > index 4844d1e27b6f..29b0e5feb138 100644
> > --- a/block/bdev.c
> > +++ b/block/bdev.c
> > @@ -1304,6 +1304,10 @@ void bdev_statx(struct path *path, struct kstat *stat,
> >  			queue_atomic_write_unit_max_bytes(bd_queue));
> >  	}
> >  
> > +	if (bdev_write_zeroes_unmap(bdev))
> > +		stat->attributes |= STATX_ATTR_WRITE_ZEROES_UNMAP;
> > +	stat->attributes_mask |= STATX_ATTR_WRITE_ZEROES_UNMAP;
> 
> Hmm, shouldn't this always be set by stat?  But I might just be
> really confused what attributes_mask is, and might in fact have
> misapplied it in past patches of my own..

attributes_mask contains attribute flags known to the filesystem,
whereas attributes contains flags actually set on the file.
"known_attributes" would have been a better name, but that's water under
the bridge. :P

> Also shouldn't the patches to report the flag go into the bdev/ext4
> patches that actually implement the feature for the respective files
> to keep bisectability?

/I/ think so...

--D
Zhang Yi May 6, 2025, 4:28 a.m. UTC | #3
On 2025/5/5 22:29, Darrick J. Wong wrote:
> On Mon, May 05, 2025 at 03:22:08PM +0200, Christoph Hellwig wrote:
>> On Mon, Apr 21, 2025 at 10:15:05AM +0800, Zhang Yi wrote:
>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>
>>> Add a new attribute flag to statx to determine whether a bdev or a file
>>> supports the unmap write zeroes command.
>>>
>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>> ---
>>>  block/bdev.c              | 4 ++++
>>>  fs/ext4/inode.c           | 9 ++++++---
>>>  include/uapi/linux/stat.h | 1 +
>>>  3 files changed, 11 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/bdev.c b/block/bdev.c
>>> index 4844d1e27b6f..29b0e5feb138 100644
>>> --- a/block/bdev.c
>>> +++ b/block/bdev.c
>>> @@ -1304,6 +1304,10 @@ void bdev_statx(struct path *path, struct kstat *stat,
>>>  			queue_atomic_write_unit_max_bytes(bd_queue));
>>>  	}
>>>  
>>> +	if (bdev_write_zeroes_unmap(bdev))
>>> +		stat->attributes |= STATX_ATTR_WRITE_ZEROES_UNMAP;
>>> +	stat->attributes_mask |= STATX_ATTR_WRITE_ZEROES_UNMAP;
>>
>> Hmm, shouldn't this always be set by stat?  But I might just be
>> really confused what attributes_mask is, and might in fact have
>> misapplied it in past patches of my own..
> 
> attributes_mask contains attribute flags known to the filesystem,
> whereas attributes contains flags actually set on the file.
> "known_attributes" would have been a better name, but that's water under
> the bridge. :P
> 
>> Also shouldn't the patches to report the flag go into the bdev/ext4
>> patches that actually implement the feature for the respective files
>> to keep bisectability?
> 
> /I/ think so...
> 

OK, since this statx reporting flag is not strongly tied to
FALLOC_FL_WRITE_ZEROES in vfs_fallocate(), I'll split this patch into
three separate patches.

Thanks,
Yi.
Christoph Hellwig May 6, 2025, 4:39 a.m. UTC | #4
On Tue, May 06, 2025 at 12:28:54PM +0800, Zhang Yi wrote:
> OK, since this statx reporting flag is not strongly tied to
> FALLOC_FL_WRITE_ZEROES in vfs_fallocate(), I'll split this patch into
> three separate patches.

I don't think that is the right thing to do do.  Keep the flag addition
here, and then report it in the ext4 and bdev patches adding
FALLOC_FL_WRITE_ZEROES as the reporting should be consistent with
the added support.
Christoph Hellwig May 6, 2025, 5:02 a.m. UTC | #5
On Mon, May 05, 2025 at 07:29:45AM -0700, Darrick J. Wong wrote:
> attributes_mask contains attribute flags known to the filesystem,
> whereas attributes contains flags actually set on the file.
> "known_attributes" would have been a better name, but that's water under
> the bridge. :P

Oooh.  I think I was very confused at what this patch does, and what
it does seems confused as well.

The patch adds a new flag to the STATX_ATTR_* namespace, which
historically was used for persistent on-disk flags like immutable,
not the STATX_* namespace where I assumed it, and which has no
support mask.  Which seems really odd for a pure kernel feature.
Then again it seems to follow STATX_ATTR_WRITE_ATOMIC which seems
just as wrongly place unless I'm missing something?
Darrick J. Wong May 6, 2025, 5:36 a.m. UTC | #6
On Tue, May 06, 2025 at 07:02:39AM +0200, Christoph Hellwig wrote:
> On Mon, May 05, 2025 at 07:29:45AM -0700, Darrick J. Wong wrote:
> > attributes_mask contains attribute flags known to the filesystem,
> > whereas attributes contains flags actually set on the file.
> > "known_attributes" would have been a better name, but that's water under
> > the bridge. :P
> 
> Oooh.  I think I was very confused at what this patch does, and what
> it does seems confused as well.
> 
> The patch adds a new flag to the STATX_ATTR_* namespace, which
> historically was used for persistent on-disk flags like immutable,
> not the STATX_* namespace where I assumed it, and which has no
> support mask.  Which seems really odd for a pure kernel feature.
> Then again it seems to follow STATX_ATTR_WRITE_ATOMIC which seems
> just as wrongly place unless I'm missing something?

I think STATX_* (i.e. not STATX_ATTR_*) flags have two purposes: 1) to
declare that specific fields in struct statx actually have meaning, most
notably in scenarios where zeroes are valid field contents; and 2) if
filling out the field is expensive, userspace can elect not to have it
filled by leaving the bit unset.  I don't know how userspace is supposed
to figure out which fields are expensive.

STATX_ATTR_* are supposed to be reflect persistent inode state.  I think
STATX_ATTR_WRITE_ATOMIC is a (now unremovable) artifact of the era when
we were going to have a new iflag and feature bit for all the new
forcealign functionality.  For XFS it's not necessary anymore because we
always have software fallback and the statx::atomic_write_* fields being
nonzero is sufficient to detect the functionality.

(I'm confused about the whole premise of /this/ patch -- it's a "fast
zeroing" fallocate flag that causes the *device* to unmap, so that the
filesystem can preallocate and avoid unwritten extent conversions?
What happens if the block device is thinp and it runs out of space?
That seems antithetical to fallocate...)

--D
Christoph Hellwig May 6, 2025, 5:47 a.m. UTC | #7
On Mon, May 05, 2025 at 10:36:54PM -0700, Darrick J. Wong wrote:
> I think STATX_* (i.e. not STATX_ATTR_*) flags have two purposes: 1) to
> declare that specific fields in struct statx actually have meaning, most
> notably in scenarios where zeroes are valid field contents; and 2) if
> filling out the field is expensive, userspace can elect not to have it
> filled by leaving the bit unset.  I don't know how userspace is supposed
> to figure out which fields are expensive.

Yes.

> (I'm confused about the whole premise of /this/ patch -- it's a "fast
> zeroing" fallocate flag that causes the *device* to unmap, so that the
> filesystem can preallocate and avoid unwritten extent conversions?

Yes.

> What happens if the block device is thinp and it runs out of space?
> That seems antithetical to fallocate...)

While the origin posix_fallocate was about space preallocatіon, these
days fallocate seems to be more about extent layout and/or fast
zeroing.

I'm not a huge fan of either this or the hardware atomics as they
force a FTL layer world view which is quite ingrained but also
rather stupid, but some folks really want to go down there full
throttle, so..
Zhang Yi May 6, 2025, 11:16 a.m. UTC | #8
On 2025/5/6 12:39, Christoph Hellwig wrote:
> On Tue, May 06, 2025 at 12:28:54PM +0800, Zhang Yi wrote:
>> OK, since this statx reporting flag is not strongly tied to
>> FALLOC_FL_WRITE_ZEROES in vfs_fallocate(), I'll split this patch into
>> three separate patches.
> 
> I don't think that is the right thing to do do.  Keep the flag addition
> here, and then report it in the ext4 and bdev patches adding
> FALLOC_FL_WRITE_ZEROES as the reporting should be consistent with
> the added support.
> 

Sorry, but I don't understand your suggestion. The
STATX_ATTR_WRITE_ZEROES_UNMAP attribute only indicate whether the bdev
and the block device that under the specified file support unmap write
zeroes commoand. It does not reflect whether the bdev and the
filesystems support FALLOC_FL_WRITE_ZEROES. The implementation of
FALLOC_FL_WRITE_ZEROES doesn't fully rely on the unmap write zeroes
commoand now, users simply refer to this attribute flag to determine
whether to use FALLOC_FL_WRITE_ZEROES when preallocating a file.
So, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES doesn't
have strong relations, why do you suggested to put this into the ext4
and bdev patches that adding FALLOC_FL_WRITE_ZEROES?

Thanks,
Yi.
Zhang Yi May 6, 2025, 11:25 a.m. UTC | #9
On 2025/5/6 13:47, Christoph Hellwig wrote:
> On Mon, May 05, 2025 at 10:36:54PM -0700, Darrick J. Wong wrote:
>> I think STATX_* (i.e. not STATX_ATTR_*) flags have two purposes: 1) to
>> declare that specific fields in struct statx actually have meaning, most
>> notably in scenarios where zeroes are valid field contents; and 2) if
>> filling out the field is expensive, userspace can elect not to have it
>> filled by leaving the bit unset.  I don't know how userspace is supposed
>> to figure out which fields are expensive.
> 
> Yes.
> 

IIUC, it seems I was misled by STATX_ATTR_WRITE_ATOMIC, adding this
STATX_ATTR_WRITE_ZEROES_UNMAP attribute flag is incorrect. The right
approach should be to add STATX_WRITE_ZEROES_UNMAP, setting it in the
result_mask if the request_mask includes this flag and
bdev_write_zeroes_unmap(bdev) returns true. Something like below. Is
my understanding right?

diff --git a/block/bdev.c b/block/bdev.c
index 4ba48b8735e7..e1367f30dbce 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1303,9 +1303,9 @@ void bdev_statx(const struct path *path, struct kstat *stat, u32 request_mask)
                        queue_atomic_write_unit_max_bytes(bd_queue));
        }

+       if (request_mask & STATX_WRITE_ZEROES_UNMAP &&
+           bdev_write_zeroes_unmap(bdev))
+               stat->result_mask |= STATX_WRITE_ZEROES_UNMAP;

        stat->blksize = bdev_io_min(bdev);

Thanks,
Yi.
Christoph Hellwig May 6, 2025, 12:10 p.m. UTC | #10
On Tue, May 06, 2025 at 07:25:06PM +0800, Zhang Yi wrote:
> +       if (request_mask & STATX_WRITE_ZEROES_UNMAP &&
> +           bdev_write_zeroes_unmap(bdev))
> +               stat->result_mask |= STATX_WRITE_ZEROES_UNMAP;

That would be my expectation.  But then again this area seems to
confuse me a lot, so maybe we'll get Christian or Dave to chim in.
Christoph Hellwig May 6, 2025, 12:11 p.m. UTC | #11
On Tue, May 06, 2025 at 07:16:56PM +0800, Zhang Yi wrote:
> Sorry, but I don't understand your suggestion. The
> STATX_ATTR_WRITE_ZEROES_UNMAP attribute only indicate whether the bdev
> and the block device that under the specified file support unmap write
> zeroes commoand. It does not reflect whether the bdev and the
> filesystems support FALLOC_FL_WRITE_ZEROES. The implementation of
> FALLOC_FL_WRITE_ZEROES doesn't fully rely on the unmap write zeroes
> commoand now, users simply refer to this attribute flag to determine
> whether to use FALLOC_FL_WRITE_ZEROES when preallocating a file.
> So, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES doesn't
> have strong relations, why do you suggested to put this into the ext4
> and bdev patches that adding FALLOC_FL_WRITE_ZEROES?

So what is the point of STATX_ATTR_WRITE_ZEROES_UNMAP?
Darrick J. Wong May 6, 2025, 3:55 p.m. UTC | #12
On Tue, May 06, 2025 at 02:10:12PM +0200, Christoph Hellwig wrote:
> On Tue, May 06, 2025 at 07:25:06PM +0800, Zhang Yi wrote:
> > +       if (request_mask & STATX_WRITE_ZEROES_UNMAP &&
> > +           bdev_write_zeroes_unmap(bdev))
> > +               stat->result_mask |= STATX_WRITE_ZEROES_UNMAP;
> 
> That would be my expectation.  But then again this area seems to
> confuse me a lot, so maybe we'll get Christian or Dave to chim in.

Um... does STATX_WRITE_ZEROES_UNMAP protect a field somewhere?
It might be nice to expose the request alignment granularity/max
size/etc.  Or does this flag exist solely to support discovering that
FALLOC_FL_WRITE_ZEROES is supported?  In which case, why not discover
its existence by calling fallocate(fd, WRITE_ZEROES, 0, 0) like the
other modes?

--D
Zhang Yi May 7, 2025, 7:33 a.m. UTC | #13
On 2025/5/6 20:11, Christoph Hellwig wrote:
> On Tue, May 06, 2025 at 07:16:56PM +0800, Zhang Yi wrote:
>> Sorry, but I don't understand your suggestion. The
>> STATX_ATTR_WRITE_ZEROES_UNMAP attribute only indicate whether the bdev
>> and the block device that under the specified file support unmap write
>> zeroes commoand. It does not reflect whether the bdev and the
>> filesystems support FALLOC_FL_WRITE_ZEROES. The implementation of
>> FALLOC_FL_WRITE_ZEROES doesn't fully rely on the unmap write zeroes
>> commoand now, users simply refer to this attribute flag to determine
>> whether to use FALLOC_FL_WRITE_ZEROES when preallocating a file.
>> So, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES doesn't
>> have strong relations, why do you suggested to put this into the ext4
>> and bdev patches that adding FALLOC_FL_WRITE_ZEROES?
> 
> So what is the point of STATX_ATTR_WRITE_ZEROES_UNMAP?

My idea is not to strictly limiting the use of FALLOC_FL_WRITE_ZEROES to
only bdev or files where bdev_unmap_write_zeroes() returns true. In
other words, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES
are not consistent, they are two independent features. Even if some
devices STATX_ATTR_WRITE_ZEROES_UNMAP are not set, users should still be
allowed to call fallcoate(FALLOC_FL_WRITE_ZEROES). This is because some
devices and drivers currently cannot reliably ascertain whether they
support the unmap write zero command; however, certain devices, such as
specific cloud storage devices, do support it. Users of these devices
may also wish to use FALLOC_FL_WRITE_ZEROES to expedite the zeroing
process.

Therefore, I think that the current point of
STATX_ATTR_WRITE_ZEROES_UNMAP (possibly STATX_WRITE_ZEROES_UNMAP) should
be to just indicate whether a bdev or file supports the unmap write zero
command (i.e., whether bdev_unmap_write_zeroes() returns true). If we
use standard SCSI and NVMe storage devices, and the
STATX_ATTR_WRITE_ZEROES_UNMAP attribute is set, users can be assured
that FALLOC_FL_WRITE_ZEROES is fast and can choose to use
fallocate(FALLOC_FL_WRITE_ZEROES) immediately.

Would you prefer to make STATX_ATTR_WRITE_ZEROES_UNMAP and
FALLOC_FL_WRITE_ZEROES consistent, which means
fallcoate(FALLOC_FL_WRITE_ZEROES) will return -EOPNOTSUPP if the block
device doesn't set STATX_ATTR_WRITE_ZEROES_UNMAP ?

If so, I'd suggested we need to:
1) Remove STATX_ATTR_WRITE_ZEROES_UNMAP since users can check the
   existence by calling fallocate(FALLOC_FL_WRITE_ZEROES) directly, this
   statx flag seems useless.
2) Make the BLK_FEAT_WRITE_ZEROES_UNMAP sysfs interface to RW, allowing
   users to adjust the block device's support state according to the
   real situation.

Thanks,
Yi.
Zhang Yi May 7, 2025, 8:23 a.m. UTC | #14
On 2025/5/6 23:55, Darrick J. Wong wrote:
> On Tue, May 06, 2025 at 02:10:12PM +0200, Christoph Hellwig wrote:
>> On Tue, May 06, 2025 at 07:25:06PM +0800, Zhang Yi wrote:
>>> +       if (request_mask & STATX_WRITE_ZEROES_UNMAP &&
>>> +           bdev_write_zeroes_unmap(bdev))
>>> +               stat->result_mask |= STATX_WRITE_ZEROES_UNMAP;
>>
>> That would be my expectation.  But then again this area seems to
>> confuse me a lot, so maybe we'll get Christian or Dave to chim in.
> 
> Um... does STATX_WRITE_ZEROES_UNMAP protect a field somewhere?
> It might be nice to expose the request alignment granularity/max
> size/etc.

I think that simply returning the support state is sufficient at the
moment. __blkdev_issue_write_zeroes() will send write zeroes through
multiple iterations, and there are no specific restrictions on the
parameters provided by users.

> Or does this flag exist solely to support discovering that
> FALLOC_FL_WRITE_ZEROES is supported?  In which case, why not discover
> its existence by calling fallocate(fd, WRITE_ZEROES, 0, 0) like the
> other modes?
> 

Current STATX_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES are
inconsistent, we allow users to call fallocate(FALLOC_FL_WRITE_ZEROES) on
files that STATX_WRITE_ZEROES_UNMAP is not set. Users can check whether
the device supports unmap write zeroes through STATX_WRITE_ZEROES_UNMAP
and then decide to call fallocate(FALLOC_FL_WRITE_ZEROES) if it is
supported. Please see this explanation for details.

  https://lore.kernel.org/linux-fsdevel/20250421021509.2366003-1-yi.zhang@huaweicloud.com/T/#mc1618822bc27d486296216fc1643d5531fee03e1

However, removing STATX_WRITE_ZEROES_UNMAP also seems good to me(Perhaps
it would be better.).It means we do not allow to call
fallocate(FALLOC_FL_WRITE_ZEROES) if the device does not explicitly
support unmap write zeroes.

Thanks,
Yi.
Darrick J. Wong May 7, 2025, 9:03 p.m. UTC | #15
On Wed, May 07, 2025 at 03:33:23PM +0800, Zhang Yi wrote:
> On 2025/5/6 20:11, Christoph Hellwig wrote:
> > On Tue, May 06, 2025 at 07:16:56PM +0800, Zhang Yi wrote:
> >> Sorry, but I don't understand your suggestion. The
> >> STATX_ATTR_WRITE_ZEROES_UNMAP attribute only indicate whether the bdev
> >> and the block device that under the specified file support unmap write
> >> zeroes commoand. It does not reflect whether the bdev and the
> >> filesystems support FALLOC_FL_WRITE_ZEROES. The implementation of
> >> FALLOC_FL_WRITE_ZEROES doesn't fully rely on the unmap write zeroes
> >> commoand now, users simply refer to this attribute flag to determine
> >> whether to use FALLOC_FL_WRITE_ZEROES when preallocating a file.
> >> So, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES doesn't
> >> have strong relations, why do you suggested to put this into the ext4
> >> and bdev patches that adding FALLOC_FL_WRITE_ZEROES?
> > 
> > So what is the point of STATX_ATTR_WRITE_ZEROES_UNMAP?
> 
> My idea is not to strictly limiting the use of FALLOC_FL_WRITE_ZEROES to
> only bdev or files where bdev_unmap_write_zeroes() returns true. In
> other words, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES
> are not consistent, they are two independent features. Even if some
> devices STATX_ATTR_WRITE_ZEROES_UNMAP are not set, users should still be
> allowed to call fallcoate(FALLOC_FL_WRITE_ZEROES). This is because some
> devices and drivers currently cannot reliably ascertain whether they
> support the unmap write zero command; however, certain devices, such as
> specific cloud storage devices, do support it. Users of these devices
> may also wish to use FALLOC_FL_WRITE_ZEROES to expedite the zeroing
> process.
> 
> Therefore, I think that the current point of
> STATX_ATTR_WRITE_ZEROES_UNMAP (possibly STATX_WRITE_ZEROES_UNMAP) should
> be to just indicate whether a bdev or file supports the unmap write zero
> command (i.e., whether bdev_unmap_write_zeroes() returns true). If we
> use standard SCSI and NVMe storage devices, and the
> STATX_ATTR_WRITE_ZEROES_UNMAP attribute is set, users can be assured
> that FALLOC_FL_WRITE_ZEROES is fast and can choose to use
> fallocate(FALLOC_FL_WRITE_ZEROES) immediately.
> 
> Would you prefer to make STATX_ATTR_WRITE_ZEROES_UNMAP and
> FALLOC_FL_WRITE_ZEROES consistent, which means
> fallcoate(FALLOC_FL_WRITE_ZEROES) will return -EOPNOTSUPP if the block
> device doesn't set STATX_ATTR_WRITE_ZEROES_UNMAP ?
> 
> If so, I'd suggested we need to:
> 1) Remove STATX_ATTR_WRITE_ZEROES_UNMAP since users can check the
>    existence by calling fallocate(FALLOC_FL_WRITE_ZEROES) directly, this
>    statx flag seems useless.
> 2) Make the BLK_FEAT_WRITE_ZEROES_UNMAP sysfs interface to RW, allowing
>    users to adjust the block device's support state according to the
>    real situation.

Sounds fine to me... ;)

--D

> Thanks,
> Yi.
> 
>
Christoph Hellwig May 8, 2025, 5:01 a.m. UTC | #16
On Wed, May 07, 2025 at 03:33:23PM +0800, Zhang Yi wrote:
> On 2025/5/6 20:11, Christoph Hellwig wrote:
> > On Tue, May 06, 2025 at 07:16:56PM +0800, Zhang Yi wrote:
> >> Sorry, but I don't understand your suggestion. The
> >> STATX_ATTR_WRITE_ZEROES_UNMAP attribute only indicate whether the bdev
> >> and the block device that under the specified file support unmap write
> >> zeroes commoand. It does not reflect whether the bdev and the
> >> filesystems support FALLOC_FL_WRITE_ZEROES. The implementation of
> >> FALLOC_FL_WRITE_ZEROES doesn't fully rely on the unmap write zeroes
> >> commoand now, users simply refer to this attribute flag to determine
> >> whether to use FALLOC_FL_WRITE_ZEROES when preallocating a file.
> >> So, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES doesn't
> >> have strong relations, why do you suggested to put this into the ext4
> >> and bdev patches that adding FALLOC_FL_WRITE_ZEROES?
> > 
> > So what is the point of STATX_ATTR_WRITE_ZEROES_UNMAP?
> 
> My idea is not to strictly limiting the use of FALLOC_FL_WRITE_ZEROES to
> only bdev or files where bdev_unmap_write_zeroes() returns true. In
> other words, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES
> are not consistent, they are two independent features. Even if some
> devices STATX_ATTR_WRITE_ZEROES_UNMAP are not set, users should still be
> allowed to call fallcoate(FALLOC_FL_WRITE_ZEROES). This is because some
> devices and drivers currently cannot reliably ascertain whether they
> support the unmap write zero command; however, certain devices, such as
> specific cloud storage devices, do support it. Users of these devices
> may also wish to use FALLOC_FL_WRITE_ZEROES to expedite the zeroing
> process.

What are those "cloud storage devices" where you set it reliably,
i.e.g what drivers?

> Therefore, I think that the current point of
> STATX_ATTR_WRITE_ZEROES_UNMAP (possibly STATX_WRITE_ZEROES_UNMAP) should
> be to just indicate whether a bdev or file supports the unmap write zero
> command (i.e., whether bdev_unmap_write_zeroes() returns true). If we
> use standard SCSI and NVMe storage devices, and the
> STATX_ATTR_WRITE_ZEROES_UNMAP attribute is set, users can be assured
> that FALLOC_FL_WRITE_ZEROES is fast and can choose to use
> fallocate(FALLOC_FL_WRITE_ZEROES) immediately.

That's breaking the abstracton again.  An attribute must say something
about the specific file, not about some underlying semi-related feature.

> Would you prefer to make STATX_ATTR_WRITE_ZEROES_UNMAP and
> FALLOC_FL_WRITE_ZEROES consistent, which means
> fallcoate(FALLOC_FL_WRITE_ZEROES) will return -EOPNOTSUPP if the block
> device doesn't set STATX_ATTR_WRITE_ZEROES_UNMAP ?

Not sure where the block device comes from here, both of these operate
on a file.

> If so, I'd suggested we need to:
> 1) Remove STATX_ATTR_WRITE_ZEROES_UNMAP since users can check the
>    existence by calling fallocate(FALLOC_FL_WRITE_ZEROES) directly, this
>    statx flag seems useless.

Yes, that was my inital thought.

> 2) Make the BLK_FEAT_WRITE_ZEROES_UNMAP sysfs interface to RW, allowing
>    users to adjust the block device's support state according to the
>    real situation.

No, it's a feature and not a flag.
Zhang Yi May 8, 2025, 12:17 p.m. UTC | #17
On 2025/5/8 13:01, Christoph Hellwig wrote:
> On Wed, May 07, 2025 at 03:33:23PM +0800, Zhang Yi wrote:
>> On 2025/5/6 20:11, Christoph Hellwig wrote:
>>> On Tue, May 06, 2025 at 07:16:56PM +0800, Zhang Yi wrote:
>>>> Sorry, but I don't understand your suggestion. The
>>>> STATX_ATTR_WRITE_ZEROES_UNMAP attribute only indicate whether the bdev
>>>> and the block device that under the specified file support unmap write
>>>> zeroes commoand. It does not reflect whether the bdev and the
>>>> filesystems support FALLOC_FL_WRITE_ZEROES. The implementation of
>>>> FALLOC_FL_WRITE_ZEROES doesn't fully rely on the unmap write zeroes
>>>> commoand now, users simply refer to this attribute flag to determine
>>>> whether to use FALLOC_FL_WRITE_ZEROES when preallocating a file.
>>>> So, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES doesn't
>>>> have strong relations, why do you suggested to put this into the ext4
>>>> and bdev patches that adding FALLOC_FL_WRITE_ZEROES?
>>>
>>> So what is the point of STATX_ATTR_WRITE_ZEROES_UNMAP?
>>
>> My idea is not to strictly limiting the use of FALLOC_FL_WRITE_ZEROES to
>> only bdev or files where bdev_unmap_write_zeroes() returns true. In
>> other words, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES
>> are not consistent, they are two independent features. Even if some
>> devices STATX_ATTR_WRITE_ZEROES_UNMAP are not set, users should still be
>> allowed to call fallcoate(FALLOC_FL_WRITE_ZEROES). This is because some
>> devices and drivers currently cannot reliably ascertain whether they
>> support the unmap write zero command; however, certain devices, such as
>> specific cloud storage devices, do support it. Users of these devices
>> may also wish to use FALLOC_FL_WRITE_ZEROES to expedite the zeroing
>> process.
> 
> What are those "cloud storage devices" where you set it reliably,
> i.e.g what drivers?

I don't have these 'cloud storage devices' now, but Ted had mentioned
those cloud-emulated block devices such as Google's Persistent Desk or
Amazon's Elastic Block Device in. I'm not sure if they can accurately
report the BLK_FEAT_WRITE_ZEROES_UNMAP feature, maybe Ted can give more
details.

https://lore.kernel.org/linux-fsdevel/20250106161732.GG1284777@mit.edu/

> 
>> Therefore, I think that the current point of
>> STATX_ATTR_WRITE_ZEROES_UNMAP (possibly STATX_WRITE_ZEROES_UNMAP) should
>> be to just indicate whether a bdev or file supports the unmap write zero
>> command (i.e., whether bdev_unmap_write_zeroes() returns true). If we
>> use standard SCSI and NVMe storage devices, and the
>> STATX_ATTR_WRITE_ZEROES_UNMAP attribute is set, users can be assured
>> that FALLOC_FL_WRITE_ZEROES is fast and can choose to use
>> fallocate(FALLOC_FL_WRITE_ZEROES) immediately.
> 
> That's breaking the abstracton again.  An attribute must say something
> about the specific file, not about some underlying semi-related feature.

OK.

> 
>> Would you prefer to make STATX_ATTR_WRITE_ZEROES_UNMAP and
>> FALLOC_FL_WRITE_ZEROES consistent, which means
>> fallcoate(FALLOC_FL_WRITE_ZEROES) will return -EOPNOTSUPP if the block
>> device doesn't set STATX_ATTR_WRITE_ZEROES_UNMAP ?
> 
> Not sure where the block device comes from here, both of these operate
> on a file.

I am referring to the block device on which the filesystem is mounted.
The support status of the file is directly dependent on this block
device.

> 
>> If so, I'd suggested we need to:
>> 1) Remove STATX_ATTR_WRITE_ZEROES_UNMAP since users can check the
>>    existence by calling fallocate(FALLOC_FL_WRITE_ZEROES) directly, this
>>    statx flag seems useless.
> 
> Yes, that was my inital thought.
> 
>> 2) Make the BLK_FEAT_WRITE_ZEROES_UNMAP sysfs interface to RW, allowing
>>    users to adjust the block device's support state according to the
>>    real situation.
> 
> No, it's a feature and not a flag.
> 

I am a bit confused about the feature and the flag, I checked the other
features, and it appears that features such as BLK_FEAT_ROTATIONAL allow
to be modified, is this flexibility due to historical reasons or for the
convenience of testing?

Think about this again, I suppose we should keep the
BLK_FEAT_WRITE_ZEROES_UNMAP as read-only and add a new flag,
BLK_FALG_WRITE_ZEROES_UNMAP_DISABLED, to disable the
FALLOC_FL_WRITE_ZEROES. Since the Write Zeroes does not guarantee
performance, and some devices may claim to support **UNMAP** Write Zeroes
but exhibit extremely slow write-zeroes speeds. Users may want be able to
disable it. Thoughts?

Thanks,
Yi.
Theodore Ts'o May 8, 2025, 8:24 p.m. UTC | #18
On Thu, May 08, 2025 at 08:17:14PM +0800, Zhang Yi wrote:
> On 2025/5/8 13:01, Christoph Hellwig wrote:
> >>
> >> My idea is not to strictly limiting the use of FALLOC_FL_WRITE_ZEROES to
> >> only bdev or files where bdev_unmap_write_zeroes() returns true. In
> >> other words, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES
> >> are not consistent, they are two independent features. Even if some
> >> devices STATX_ATTR_WRITE_ZEROES_UNMAP are not set, users should still be
> >> allowed to call fallcoate(FALLOC_FL_WRITE_ZEROES). This is because some
> >> devices and drivers currently cannot reliably ascertain whether they
> >> support the unmap write zero command; however, certain devices, such as
> >> specific cloud storage devices, do support it. Users of these devices
> >> may also wish to use FALLOC_FL_WRITE_ZEROES to expedite the zeroing
> >> process.
> > 
> > What are those "cloud storage devices" where you set it reliably,
> > i.e.g what drivers?
> 
> I don't have these 'cloud storage devices' now, but Ted had mentioned
> those cloud-emulated block devices such as Google's Persistent Desk or
> Amazon's Elastic Block Device in. I'm not sure if they can accurately
> report the BLK_FEAT_WRITE_ZEROES_UNMAP feature, maybe Ted can give more
> details.
> 
> https://lore.kernel.org/linux-fsdevel/20250106161732.GG1284777@mit.edu/

There's nothing really exotic about what I was referring to in terms
of "cloud storage devices".  Perhaps a better way of describing them
is to consider devices such as dm-thin, or a Ceph Block Device, which
is being exposed as a SCSI or NVME device.

The distinction I was trying to make is performance-related.  Suppose
you call WRITE_ZEROS on a 14TB region.  After the WRITES_ZEROS
complete, a read anywhere on that 14TB region will return zeros.
That's easy.  But the question is when you call WRITE_ZEROS, will the
storage device (a) go away for a day or more before it completes (which
would be the case if it is a traditional spinning rust platter), or
(b) will it be basically instaneous, because all dm-thin or a Ceph Block
Device needs to do is to delete one or more entries in its mapping
table.

The problem is two-fold.  First, there's no way for the kernel to know
whether a storage device will behave as (a) or (b), because SCSI and
other storage specifications say that performance is out of scope.
They only talk about the functional results (afterwards, if yout try
to read from the region, you will get zeros), and are utterly silent
about how long it migt take.  The second problem is that if you are an
application program, there is no way you will be willing to call
fallocate(WRITE_ZEROS, 14TB) if you don't know whether the disk will
go away for a day or whether it will be instaneous.

But because there is no way for the kernel to know whether WRITE_ZEROS
will be fast or not, how would you expect the kernel to expose
STATX_ATTR_WRITE_ZEROES_UNMAP?  Cristoph's formulation "breaking the
abstraction" perfectly encapsulate the SCSI specification's position
on the matter, and I agree it's a valid position.  It's just not
terribly useful for the application programmer.

Things which some programs/users might want to know or rely upon, but which is normally quite impossible are: 

* Will the write zero / discard operation take a "reasonable" amount
  of time?  (Yes, not necessarilly well defined, but we know it when
  we see it, and hours or days is generally not reasonable.)

* Is the operation reliable --- i.e., is the device allowed to
  randomly decide that it won't actually zero the requested blocks (as
  is the case of discard) whenever it feels like it.

* Is the operation guaranteed to make the data irretreviable even in
  face of an attacker with low-level access to the device.  (And this
  is also not necessarily well defined; does the attacker have access
  to a scanning electronic microscope, or can do a liquid nitrogen
  destructive access of the flash device?)

The UFS (Universal Flash Storage) spec comes the closest to providing
commands that distinguish between these various cases, but for most
storage specifications, like SCSI, it is absolutely requires peaking
behind the abstraction barrier defined by the specification, and so
ultimately, the kernel can't know.

About the best you can do is to require manual configuration; perhaps a
config file at the database or userspace cluster file system level
because the system adminsitrator knows --- maybe because the hyperscale
cloud provider has leaned on the storage vendor to tell them under
NDA, storage specs be damned or they won't spend $$$ millions with
that storage vendor ---  or because the database administrator discovers
that using fallocate(WRITE_ZEROS) causes performance to tank, so they
manually disable the use of WRITE_ZEROS.

Could this be done in the kernel?  Sure.  We could have a file, say,
/sys/block/sdXX/queue/write_zeros where the write_zeros file is
writeable, and so the administrator can force-disable WRITES_ZERO by
writing 0 into the file.  And could this be queried via a STATX
attribute?  I suppose, although to be honest, I'm used to doing this
by looking at the sysfs files.  For example, just recently I coded up
the following:

static int is_rotational (const char *device_name EXT2FS_ATTR((unused)))
{
	int		rotational = -1;
#ifdef __linux__
	char		path[1024];
	struct stat	st;
	FILE		*f;

	if ((stat(device_name, &st) < 0) || !S_ISBLK(st.st_mode))
		return -1;

	snprintf(path, sizeof(path), "/sys/dev/block/%d:%d/queue/rotational",
		major(st.st_rdev), minor(st.st_rdev));
	f = fopen(path, "r");
	if (!f) {
		snprintf(path, sizeof(path),
			"/sys/dev/block/%d:%d/../queue/rotational",
			major(st.st_rdev), minor(st.st_rdev));
		f = fopen(path, "r");
	}
	if (f) {
		if (fscanf(f, "%d", &rotational) != 1)
			rotational = -1;
		fclose(f);
	}
#endif
	return rotational;
}

Easy-peasy!   Who needs statx?   :-)


						- Ted
Zhang Yi May 9, 2025, 12:35 p.m. UTC | #19
On 2025/5/9 4:24, Theodore Ts'o wrote:
> On Thu, May 08, 2025 at 08:17:14PM +0800, Zhang Yi wrote:
>> On 2025/5/8 13:01, Christoph Hellwig wrote:
>>>>
>>>> My idea is not to strictly limiting the use of FALLOC_FL_WRITE_ZEROES to
>>>> only bdev or files where bdev_unmap_write_zeroes() returns true. In
>>>> other words, STATX_ATTR_WRITE_ZEROES_UNMAP and FALLOC_FL_WRITE_ZEROES
>>>> are not consistent, they are two independent features. Even if some
>>>> devices STATX_ATTR_WRITE_ZEROES_UNMAP are not set, users should still be
>>>> allowed to call fallcoate(FALLOC_FL_WRITE_ZEROES). This is because some
>>>> devices and drivers currently cannot reliably ascertain whether they
>>>> support the unmap write zero command; however, certain devices, such as
>>>> specific cloud storage devices, do support it. Users of these devices
>>>> may also wish to use FALLOC_FL_WRITE_ZEROES to expedite the zeroing
>>>> process.
>>>
>>> What are those "cloud storage devices" where you set it reliably,
>>> i.e.g what drivers?
>>
>> I don't have these 'cloud storage devices' now, but Ted had mentioned
>> those cloud-emulated block devices such as Google's Persistent Desk or
>> Amazon's Elastic Block Device in. I'm not sure if they can accurately
>> report the BLK_FEAT_WRITE_ZEROES_UNMAP feature, maybe Ted can give more
>> details.
>>
>> https://lore.kernel.org/linux-fsdevel/20250106161732.GG1284777@mit.edu/
> 
> There's nothing really exotic about what I was referring to in terms
> of "cloud storage devices".  Perhaps a better way of describing them
> is to consider devices such as dm-thin, or a Ceph Block Device, which
> is being exposed as a SCSI or NVME device.

OK, then correctly reporting the BLK_FEAT_WRITE_ZEROES_UNMAP feature
should no longer be a major problem. It seems that we do not need to
pay much attention to enabling this feature manually.

> 
> The distinction I was trying to make is performance-related.  Suppose
> you call WRITE_ZEROS on a 14TB region.  After the WRITES_ZEROS
> complete, a read anywhere on that 14TB region will return zeros.
> That's easy.  But the question is when you call WRITE_ZEROS, will the
> storage device (a) go away for a day or more before it completes (which
> would be the case if it is a traditional spinning rust platter), or
> (b) will it be basically instaneous, because all dm-thin or a Ceph Block
> Device needs to do is to delete one or more entries in its mapping
> table.

Yes.

> 
> The problem is two-fold.  First, there's no way for the kernel to know
> whether a storage device will behave as (a) or (b), because SCSI and
> other storage specifications say that performance is out of scope.
> They only talk about the functional results (afterwards, if yout try
> to read from the region, you will get zeros), and are utterly silent
> about how long it migt take.  The second problem is that if you are an
> application program, there is no way you will be willing to call
> fallocate(WRITE_ZEROS, 14TB) if you don't know whether the disk will
> go away for a day or whether it will be instaneous.
> 
> But because there is no way for the kernel to know whether WRITE_ZEROS
> will be fast or not, how would you expect the kernel to expose
> STATX_ATTR_WRITE_ZEROES_UNMAP?  Cristoph's formulation "breaking the
> abstraction" perfectly encapsulate the SCSI specification's position
> on the matter, and I agree it's a valid position.  It's just not
> terribly useful for the application programmer.

Yes.

> 
> Things which some programs/users might want to know or rely upon, but which is normally quite impossible are: 
> 
> * Will the write zero / discard operation take a "reasonable" amount
>   of time?  (Yes, not necessarilly well defined, but we know it when
>   we see it, and hours or days is generally not reasonable.)
> 
> * Is the operation reliable --- i.e., is the device allowed to
>   randomly decide that it won't actually zero the requested blocks (as
>   is the case of discard) whenever it feels like it.
> 
> * Is the operation guaranteed to make the data irretreviable even in
>   face of an attacker with low-level access to the device.  (And this
>   is also not necessarily well defined; does the attacker have access
>   to a scanning electronic microscope, or can do a liquid nitrogen
>   destructive access of the flash device?)

Yes.

> 
> The UFS (Universal Flash Storage) spec comes the closest to providing
> commands that distinguish between these various cases, but for most
> storage specifications, like SCSI, it is absolutely requires peaking
> behind the abstraction barrier defined by the specification, and so
> ultimately, the kernel can't know.
> 
> About the best you can do is to require manual configuration; perhaps a
> config file at the database or userspace cluster file system level
> because the system adminsitrator knows --- maybe because the hyperscale
> cloud provider has leaned on the storage vendor to tell them under
> NDA, storage specs be damned or they won't spend $$$ millions with
> that storage vendor ---  or because the database administrator discovers
> that using fallocate(WRITE_ZEROS) causes performance to tank, so they
> manually disable the use of WRITE_ZEROS.

Yes, this is indeed what we should consider.

> 
> Could this be done in the kernel?  Sure.  We could have a file, say,
> /sys/block/sdXX/queue/write_zeros where the write_zeros file is
> writeable, and so the administrator can force-disable WRITES_ZERO by
> writing 0 into the file.  And could this be queried via a STATX
> attribute?  I suppose, although to be honest, I'm used to doing this
> by looking at the sysfs files.  For example, just recently I coded up
> the following:
> 
> static int is_rotational (const char *device_name EXT2FS_ATTR((unused)))
> {
> 	int		rotational = -1;
> #ifdef __linux__
> 	char		path[1024];
> 	struct stat	st;
> 	FILE		*f;
> 
> 	if ((stat(device_name, &st) < 0) || !S_ISBLK(st.st_mode))
> 		return -1;
> 
> 	snprintf(path, sizeof(path), "/sys/dev/block/%d:%d/queue/rotational",
> 		major(st.st_rdev), minor(st.st_rdev));
> 	f = fopen(path, "r");
> 	if (!f) {
> 		snprintf(path, sizeof(path),
> 			"/sys/dev/block/%d:%d/../queue/rotational",
> 			major(st.st_rdev), minor(st.st_rdev));
> 		f = fopen(path, "r");
> 	}
> 	if (f) {
> 		if (fscanf(f, "%d", &rotational) != 1)
> 			rotational = -1;
> 		fclose(f);
> 	}
> #endif
> 	return rotational;
> }
> 
> Easy-peasy!   Who needs statx?   :-)
> 

Yes. as I replied earlier, I'm going to implement this with a new flag,
BLK_FALG_WRITE_ZEROES_UNMAP_DISABLED, similar to the existing
BLK_FLAG_WRITE_CACHE_DISABLED. Make
/sys/block/<disk>/queue/write_zeroes_unmap to read-write. Regarding
whether to rename it to 'write_zeroes', I need to reconsider, as the
naming aligns perfectly with FALLOC_FL_WRITE_ZEROES, but the **UNMAP**
semantics cannot be adequately expressed.

Thank you for your detailed explanation and suggestions!

Best regards.
Yi.
diff mbox series

Patch

diff --git a/block/bdev.c b/block/bdev.c
index 4844d1e27b6f..29b0e5feb138 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1304,6 +1304,10 @@  void bdev_statx(struct path *path, struct kstat *stat,
 			queue_atomic_write_unit_max_bytes(bd_queue));
 	}
 
+	if (bdev_write_zeroes_unmap(bdev))
+		stat->attributes |= STATX_ATTR_WRITE_ZEROES_UNMAP;
+	stat->attributes_mask |= STATX_ATTR_WRITE_ZEROES_UNMAP;
+
 	stat->blksize = bdev_io_min(bdev);
 
 	blkdev_put_no_open(bdev);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 94c7d2d828a6..38caf2f39c6d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5653,6 +5653,7 @@  int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
 	struct inode *inode = d_inode(path->dentry);
 	struct ext4_inode *raw_inode;
 	struct ext4_inode_info *ei = EXT4_I(inode);
+	struct block_device *bdev = inode->i_sb->s_bdev;
 	unsigned int flags;
 
 	if ((request_mask & STATX_BTIME) &&
@@ -5672,8 +5673,6 @@  int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
 
 		stat->result_mask |= STATX_DIOALIGN;
 		if (dio_align == 1) {
-			struct block_device *bdev = inode->i_sb->s_bdev;
-
 			/* iomap defaults */
 			stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
 			stat->dio_offset_align = bdev_logical_block_size(bdev);
@@ -5695,6 +5694,9 @@  int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
 		generic_fill_statx_atomic_writes(stat, awu_min, awu_max);
 	}
 
+	if (S_ISREG(inode->i_mode) && bdev_write_zeroes_unmap(bdev))
+		stat->attributes |= STATX_ATTR_WRITE_ZEROES_UNMAP;
+
 	flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
 	if (flags & EXT4_APPEND_FL)
 		stat->attributes |= STATX_ATTR_APPEND;
@@ -5714,7 +5716,8 @@  int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
 				  STATX_ATTR_ENCRYPTED |
 				  STATX_ATTR_IMMUTABLE |
 				  STATX_ATTR_NODUMP |
-				  STATX_ATTR_VERITY);
+				  STATX_ATTR_VERITY |
+				  STATX_ATTR_WRITE_ZEROES_UNMAP);
 
 	generic_fillattr(idmap, request_mask, inode, stat);
 	return 0;
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index f78ee3670dd5..279ce7e34df7 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -251,6 +251,7 @@  struct statx {
 #define STATX_ATTR_VERITY		0x00100000 /* [I] Verity protected file */
 #define STATX_ATTR_DAX			0x00200000 /* File is currently in DAX state */
 #define STATX_ATTR_WRITE_ATOMIC		0x00400000 /* File supports atomic write operations */
+#define STATX_ATTR_WRITE_ZEROES_UNMAP	0x00800000 /* File supports unmap write zeroes */
 
 
 #endif /* _UAPI_LINUX_STAT_H */