Redfs support for rhel kernel-5.14.0-503.35.1.el9_5 #3

Open
wants to merge 134 commits into base: redfs-rhel9_5-513.35.1

Conversation

openunix
Collaborator

These are the cherry-picks based on branch redfs-ubuntu-noble-6.8.0-58.60-updates@2a889c7f6036.

jankara and others added 30 commits April 22, 2025 14:55
Use invalidate_lock instead of fuse's private i_mmap_sem. The intended
purpose is exactly the same. By this conversion we fix a long standing
race between hole punching and read(2) / readahead(2) paths that can
lead to stale page cache contents.

CC: Miklos Szeredi <[email protected]>
Reviewed-by: Miklos Szeredi <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
(cherry picked from commit 8bcbbe9)
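As a hedged illustration of the pattern this conversion enables (not the actual fuse hunks; the helper name is made up), the hole-punch side now serializes against read/readahead through the VFS invalidate_lock:

#include <linux/fs.h>
#include <linux/mm.h>

/* Illustrative sketch only: hold mapping->invalidate_lock, which replaces
 * fuse's private i_mmap_sem, while the page cache is truncated and the
 * punch-hole request would be sent to the server. */
static void punch_hole_sketch(struct inode *inode, loff_t offset, loff_t len)
{
        struct address_space *mapping = inode->i_mapping;

        filemap_invalidate_lock(mapping);
        truncate_pagecache_range(inode, offset, offset + len - 1);
        /* ... send the FUSE_FALLOCATE (punch hole) request here ... */
        filemap_invalidate_unlock(mapping);
}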
There is a potential race between fuse_read_interrupt() and
fuse_request_end().

TASK1
  in fuse_read_interrupt(): delete req->intr_entry (while holding
  fiq->lock)

TASK2
  in fuse_request_end(): req->intr_entry is empty -> skip fiq->lock
  wake up TASK3

TASK3
  request is freed

TASK1
  in fuse_read_interrupt(): dereference req->in.h.unique ***BAM***

Fix by always grabbing fiq->lock if the request was ever interrupted
(FR_INTERRUPTED set) thereby serializing with concurrent
fuse_read_interrupt() calls.

FR_INTERRUPTED is set before the request is queued on fiq->interrupts.
Dequeing the request is done with list_del_init() but FR_INTERRUPTED is not
cleared in this case.

Reported-by: lijiazi <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit e1e71c1)
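A hedged sketch of the serialization described above (a fragment, not the exact fuse_request_end() code; fuse_i.h types are assumed): once FR_INTERRUPTED has ever been set, fiq->lock is taken unconditionally before the interrupt entry is removed, so a concurrent fuse_read_interrupt() cannot still be dereferencing the request when it is freed.

/* Illustrative fragment only, as it would sit in fuse_request_end(). */
if (test_bit(FR_INTERRUPTED, &req->flags)) {
        /* Serialize with fuse_read_interrupt() even if req->intr_entry
         * already looks empty. */
        spin_lock(&fiq->lock);
        list_del_init(&req->intr_entry);
        spin_unlock(&fiq->lock);
}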
Callers of fuse_writeback_range() assume that the file is ready for
modification by the server in the supplied byte range after the call
returns.

If there's a write that extends the file beyond the end of the supplied
range, then the file needs to be extended to at least the end of the range,
but currently that's not done.

There are at least two cases where this can cause problems:

 - copy_file_range() will return short count if the file is not extended
   up to end of the source range.

 - FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file,
   hence the region may not be fully allocated.

Fix by flushing writes from the start of the range up to the end of the
file.  This could be optimized if the writes are non-extending, etc, but
it's probably not worth the trouble.

Fixes: a2bc923 ("fuse: fix copy_file_range() in the writeback case")
Fixes: 6b1bdb5 ("fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)")
Cc: <[email protected]>  # v5.2
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 59bda8e)
The struct fuse_conn argument is not used and can be removed.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit a9667ac)
In writeback cache mode mtime/ctime updates are cached, and flushed to the
server using the ->write_inode() callback.

Closing the file will result in a dirty inode being immediately written,
but in other cases the inode can remain dirty after all references are
dropped.  This results in the inode being written back from reclaim, which
can deadlock on a regular allocation while the request is being served.

The usual mechanisms (GFP_NOFS/PF_MEMALLOC*) don't work for FUSE, because
serving a request involves unrelated userspace process(es).

Instead do the same as for dirty pages: make sure the inode is written
before the last reference is gone.

 - fallocate(2)/copy_file_range(2): these call file_update_time() or
   file_modified(), so flush the inode before returning from the call

 - unlink(2), link(2) and rename(2): these call fuse_update_ctime(), so
   flush the ctime directly from this helper

Reported-by: chenguanyou <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 5c791fe)
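A hedged sketch of the idea (illustrative only; the actual fuse call sites and helper names differ): flush the dirty inode synchronously from the paths listed above, so it is never left to be written back from reclaim.

#include <linux/fs.h>
#include <linux/writeback.h>

/* Illustrative sketch only: push cached mtime/ctime updates to the
 * server right away, before the last reference to the inode is dropped. */
static int flush_times_sketch(struct inode *inode)
{
        return write_inode_now(inode, true);
}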
Fuse ->release() is otherwise asynchronous for the reason that it can
happen in contexts unrelated to close/munmap.

Inode is already written back from fuse_flush().  Add it to
fuse_vma_close() as well to make sure inode dirtying from mmaps also get
written out before the file is released.

Also add error handling.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 36ea233)
Add missing inode lock annotation; found by syzbot.

Reported-and-tested-by: [email protected]
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit bda9a71)
Due to the introduction of kmap_local_*, the storage of slots used for
short-term mapping has changed from per-CPU to per-thread.  kmap_atomic()
disables preemption, while kmap_local_*() only disables migration.

There is no need to disable preemption in the several kmap_atomic places used
in fuse.

Link: https://lwn.net/Articles/836144/
Signed-off-by: Peng Hao <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 5fe0fc9)
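A hedged before/after sketch of this kind of conversion (illustrative, not the actual fuse hunks): kmap_local_page() gives a short-lived, per-thread mapping and only disables migration, so the copy no longer needs to run with preemption disabled.

#include <linux/highmem.h>
#include <linux/string.h>

/* before: kmap_atomic() disables preemption for the mapping's lifetime */
static void copy_to_page_old(struct page *page, const void *src, size_t len)
{
        void *addr = kmap_atomic(page);

        memcpy(addr, src, len);
        kunmap_atomic(addr);
}

/* after: kmap_local_page() only disables migration */
static void copy_to_page_new(struct page *page, const void *src, size_t len)
{
        void *addr = kmap_local_page(page);

        memcpy(addr, src, len);
        kunmap_local(addr);
}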
'ia->io=io' has been set in fuse_io_alloc.

Signed-off-by: Peng Hao <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit b5d9758)
Logically it belongs there since attributes are invalidated due to the
updated ctime.  This is a cleanup and should not change behavior.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 371e8fd)
Use list_first_entry_or_null() instead of list_empty() + list_entry().

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 84840ef)
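A hedged before/after sketch of the cleanup (function names are illustrative and fuse_i.h types are assumed; fuse's pending queue is used only as an example):

#include <linux/list.h>

/* before: emptiness check plus list_entry() */
static struct fuse_req *first_pending_old(struct fuse_iqueue *fiq)
{
        struct fuse_req *req = NULL;

        if (!list_empty(&fiq->pending))
                req = list_entry(fiq->pending.next, struct fuse_req, list);
        return req;
}

/* after: one call, NULL when the list is empty */
static struct fuse_req *first_pending_new(struct fuse_iqueue *fiq)
{
        return list_first_entry_or_null(&fiq->pending, struct fuse_req, list);
}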
Rename didn't decrement/clear nlink on overwritten target inode.

Create a common helper fuse_entry_unlinked() that handles this for unlink,
rmdir and rename.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit cefd1b8)
The fuse_iget() call in create_new_entry() already updated the inode with
all the new attributes and incremented the attribute version.

Incrementing the nlink will result in the wrong count.  This wasn't noticed
because the attributes were invalidated right after this.

Updating ctime is still needed for the writeback case when the ctime is not
refreshed.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 97f044f)
Only invalidate attributes that the operation might have changed.

Introduce two constants for common combinations of changed attributes:

  FUSE_STATX_MODIFY: file contents are modified but not size

  FUSE_STATX_MODSIZE: size and/or file contents modified

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit fa5eee5)
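A hedged sketch of what such constants could look like; the masks below are illustrative assumptions derived from the descriptions above, not copied from the fuse headers.

#include <linux/stat.h>

/* Illustrative definitions only. */
#define FUSE_STATX_MODIFY       (STATX_MTIME | STATX_CTIME)
#define FUSE_STATX_MODSIZE      (FUSE_STATX_MODIFY | STATX_SIZE)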
The attribute version in fuse_inode should be updated whenever the
attributes might have changed on the server.  In case of cached writes this
is not the case, so updating the attr_version is unnecessary and could
possibly affect performance.

Open code the remaining part of fuse_write_update_size().

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 8c56e03)
This function already updates the attr_version in fuse_inode, regardless of
whether the size was changed or not.

Rename the helper to fuse_write_update_attr() to reflect the more generic
nature.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 27ae449)
Extend the fuse_write_update_attr() helper to invalidate cached attributes
after a write.

This has already been done in all cases except in fuse_notify_store(), so
this is mostly a cleanup.

fuse_direct_write_iter() calls fuse_direct_IO() which already calls
fuse_write_update_attr(), so don't repeat that again in the former.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit d347739)
A READ request returning a short count is taken as indication of EOF, and
the cached file size is modified accordingly.

Fix the attribute version checking to allow for changes to fc->attr_version
on other inodes.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 484ce65)
It's safe to call file_update_time() if writeback cache is not enabled,
since S_NOCMTIME is set in this case.  This part is purely a cleanup.

__fuse_copy_file_range() also calls fuse_write_update_attr() only in the
writeback cache case.  This is inconsistent with other callers, where it's
called unconditionally.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 20235b4)
There are two instances of "bool is_wb = fc->writeback_cache" where the
actual use mostly involves checking "is_wb && S_ISREG(inode->i_mode)".

Clean up these cases by storing the second condition in the local variable.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit c15016b)
In case of writeback_cache fuse_fillattr() would revert the queried
attributes to the cached version.

Move this to fuse_change_attributes() in order to manage the writeback
logic in a central helper.  This will be necessary for patches that follow.

Only fuse_do_getattr() -> fuse_fillattr() uses the attributes after calling
fuse_change_attributes(), so this should not change behavior.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 04d82db)
If writeback_cache is enabled, then the size, mtime and ctime attributes of
regular files are always valid in the kernel's cache.  They are retrieved
from userspace only when the inode is freshly looked up.

Add a more generic "cache_mask", that indicates which attributes are
currently valid in cache.

This patch doesn't change behavior.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 4b52f05)
When deciding to send a GETATTR request take into account the cache mask
(which attributes are always valid).  The cache mask takes precedence over
the invalid mask.

This results in the GETATTR request not being sent unnecessarily.

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit ec85537)
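A hedged sketch of the decision rule (function and parameter names are illustrative): attributes covered by the cache mask are always valid and therefore never force a GETATTR round trip; only the remaining invalid bits do.

#include <linux/types.h>

/* Illustrative sketch only. */
static bool getattr_needed_sketch(u32 request_mask, u32 inval_mask,
                                  u32 cache_mask)
{
        return request_mask & inval_mask & ~cache_mask;
}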
fuse_update_attributes() refreshes metadata for internal use.

Each use needs a particular set of attributes to be refreshed, but
currently that cannot be expressed and all but atime are refreshed.

Add a mask argument, which lets fuse_update_get_attr() decide, based on
the cache_mask and the inval_mask, whether a GETATTR call is needed or not.

Reported-by: Yongji Xie <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit c6c745b)
It is possible to trigger a crash by splicing anon pipe bufs to the fuse
device.

The reason for this is that anon_pipe_buf_release() will reuse buf->page if
the refcount is 1, but that page might have already been stolen and its
flags modified (e.g. PG_lru added).

This happens in the unlikely case of fuse_dev_splice_write() getting around
to calling pipe_buf_release() after a page has been stolen, added to the
page cache and removed from the page cache.

Fix by calling pipe_buf_release() right after the page was inserted into
the page cache.  In this case the page has an elevated refcount so any
release function will know that the page isn't reusable.

Reported-by: Frank Dinoff <[email protected]>
Link: https://lore.kernel.org/r/CAAmZXrsGg2xsP1CK+cbuEMumtrqdvD-NKnWzhNcvn71RV3c1yw@mail.gmail.com/
Fixes: dd3bb14 ("fuse: support splice() writing to fuse device")
Cc: <[email protected]> # v2.6.35
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 712a951)
Checking buf->flags should be done before the pipe_buf_release() is called
on the pipe buffer, since releasing the buffer might modify the flags.

This is exactly what page_cache_pipe_buf_release() does, and which results
in the same VM_BUG_ON_PAGE(PageLRU(page)) that the original patch was
trying to fix.

Reported-by: Justin Forbes <[email protected]>
Fixes: 712a951 ("fuse: fix page stealing")
Cc: <[email protected]> # v2.6.35
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 4734417)
The maximum acceptable value of the lend parameter in
filemap_write_and_wait_range() is LLONG_MAX rather than -1.  There is
also some logic depending on the LLONG_MAX check in write_cache_pages().
So let's pass LLONG_MAX to filemap_write_and_wait_range() in
fuse_writeback_range() instead.

Fixes: 59bda8e ("fuse: flush extending writes")
Signed-off-by: Xie Yongji <[email protected]>
Cc: <[email protected]> # v5.15
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit e388164)
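A hedged sketch of the fixed call (the real fuse_writeback_range() does more than this; only the range change is shown): pass LLONG_MAX instead of -1 as lend, so the "flush to end of file" case is recognized.

#include <linux/fs.h>
#include <linux/limits.h>
#include <linux/pagemap.h>

/* Illustrative sketch only: flush and wait from 'start' to end of file. */
static int writeback_to_eof_sketch(struct inode *inode, loff_t start)
{
        return filemap_write_and_wait_range(inode->i_mapping, start,
                                            LLONG_MAX);
}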
This is in preparation for the per inode DAX checking that follows.

Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Vivek Goyal <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit cecd491)
We add 'always', 'never', and 'inode' (default).  '-o dax' continues to
operate as before, which is equivalent to 'always'.

The following behavior is consistent with that on ext4/xfs:

 - The default behavior (when neither '-o dax' nor
   '-o dax=always|never|inode' option is specified) is equal to 'inode'
   mode, while 'dax=inode' won't be printed among the mount option list.

 - The 'inode' mode is only advisory.  It will silently fall back to 'never'
   mode if the fuse server doesn't support it.

Also note that as of this commit, 'inode' mode is actually equal to
'always' mode; the per inode DAX flag is introduced in the following
patch.

Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Vivek Goyal <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 780b1b9)
Expand the fuse protocol to support per inode DAX.

A FUSE_HAS_INODE_DAX flag is added, indicating whether the fuse
server/client supports per inode DAX.  It can be conveyed in both the
FUSE_INIT request and reply.

A FUSE_ATTR_DAX flag is added, indicating whether DAX shall be enabled
for the corresponding file.  It is conveyed in the FUSE_LOOKUP reply.

Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Vivek Goyal <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 98046f7)
luis-henrix and others added 30 commits April 24, 2025 09:25
When mounting a user-space filesystem using io_uring, the initialization
of the rings is done separately on the server side.  If for some reason
(e.g. a server bug) this step is not performed it will be impossible to
unmount the filesystem if there are already requests waiting.

This issue is easily reproduced with the libfuse passthrough_ll example,
if the queue depth is set to '0' and a request is queued before trying to
unmount the filesystem.  When trying to force the unmount, fuse_abort_conn()
will try to wake up all tasks waiting in fc->blocked_waitq, but because the
rings were never initialized, fuse_uring_ready() will never return 'true'.

Fixes: 3393ff9 ("fuse: block request allocation until io-uring init is complete")
Signed-off-by: Luis Henriques <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Miklos Szeredi <[email protected]>
Reviewed-by: Bernd Schubert <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
(cherry picked from commit d550114)
(cherry picked from commit 3a7a4e7)
task-A (application) might be in request_wait_answer and
try to remove the request while it has FR_PENDING set.

task-B (a fuse-server io-uring task) might handle this
request with FUSE_IO_URING_CMD_COMMIT_AND_FETCH when
fetching the next request, and access the req from
the pending list in fuse_uring_ent_assign_req().
That code path was not protected by fiq->lock and so
might race with task-A.

For scaling reasons we had better not use fiq->lock, but
instead add a handler to remove canceled requests from the queue.

This also removes the usage of fiq->lock from
fuse_uring_add_req_to_ring_ent() altogether, as it was
there only to protect against this race, and was incomplete.

A comment is also added explaining why FR_PENDING is not cleared.

Fixes: c090c8a ("fuse: Add io-uring sqe commit and fetch support")
Cc: <[email protected]> # v6.14
Reported-by: Joanne Koong <[email protected]>
Closes: https://lore.kernel.org/all/CAJnrk1ZgHNb78dz-yfNTpxmW7wtT88A=m-zF0ZoLXKLUHRjNTw@mail.gmail.com/
Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 09098e6)
(cherry picked from commit bbe1041)
Function fuse_uring_create() is used only from dev_uring.c and does not
need to be exposed in the header file.  Furthermore, it has the wrong
signature.

While there, also remove the 'struct fuse_ring' forward declaration.

Signed-off-by: Luis Henriques <[email protected]>
Reviewed-by: Bernd Schubert <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 841c7b8)
(cherry picked from commit 54d9a86)
fuse_notify_inval_entry and fuse_notify_delete were using fixed allocations
of FUSE_NAME_MAX to hold the file name.  Such large buffers are often not
needed, as file names are usually smaller, so use the actual file name
size for the allocation.

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Jingbo Xu <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 2412085)
(cherry picked from commit 65e7afa)
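A hedged sketch of the allocation change (helper name is illustrative): size the temporary name buffer from the length carried in the notify message instead of always allocating FUSE_NAME_MAX bytes.

#include <linux/slab.h>

/* Illustrative sketch only: +1 for the terminating NUL. */
static char *notify_name_alloc_sketch(unsigned int namelen)
{
        return kzalloc(namelen + 1, GFP_KERNEL);
}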
...when calling fuse_iget().

Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 253e524)
(cherry picked from commit 4f1ba11)
Function fuse_direntplus_link() might call fuse_iget() to initialize a new
fuse_inode and change its attributes.  If fi->attr_version is always
initialized to 0, then even if the attributes returned by the FUSE_READDIR
request are stale, fuse_change_attributes will still apply the stale
attributes to the inode, because the new fi->attr_version is 0.  This wrong
behaviour may cause file size inconsistency even when there are no changes
from the server side.

To reproduce the issue, consider the following 2 programs (A and B) are
running concurrently,

        A                                               B
----------------------------------      --------------------------------
{ /fusemnt/dir/f is a file path in a fuse mount, the size of f is 0. }

readdir(/fusemnt/dir) start
//Daemon set size 0 to f direntry
                                        fallocate(f, 1024)
                                        stat(f) // B see size 1024
                                        echo 2 > /proc/sys/vm/drop_caches
readdir(/fusemnt/dir) reply to kernel
Kernel set 0 to the I_NEW inode

                                        stat(f) // B see size 0

In the above case, only program B is modifying the file size, however, B
observes file size changing between the 2 'readonly' stat() calls. To fix
this issue, we should make sure readdirplus still follows the rule of
attr_version staleness checking even if the fi->attr_version is lost due to
inode eviction.

To identify this situation, the new fc->evict_ctr is used to record whether
the eviction of inodes occurs during the readdirplus request processing.
If it does, the result of readdirplus may be inaccurate; otherwise, the
result of readdirplus can be trusted. Although this may still lead to
incorrect invalidation, considering the relatively low frequency of
evict occurrences, it should be acceptable.

Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/[email protected]/

Reported-by: Jiachen Zhang <[email protected]>
Suggested-by: Miklos Szeredi <[email protected]>
Signed-off-by: Zhang Tianci <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit fabde09)
A readdir buffer of 4K might be just enough to read a single file
name at a time - increase the buffer size to the max_pages.

Reviewed-by: Bernd Schubert <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Bernd Schubert <[email protected]>
(cherry picked from commit d77db6e)
…MAX_PAGES)

Introduce the capability to dynamically configure the max pages limit
(FUSE_MAX_MAX_PAGES) through a sysctl. This allows system administrators
to dynamically set the maximum number of pages that can be used for
servicing requests in fuse.

Previously, this is gated by FUSE_MAX_MAX_PAGES which is statically set
to 256 pages. One result of this is that the buffer size for a write
request is limited to 1 MiB on a 4k-page system.

The default value for this sysctl is the original limit (256 pages).

$ sysctl -a | grep max_pages_limit
fs.fuse.max_pages_limit = 256

$ sysctl -n fs.fuse.max_pages_limit
256

$ echo 1024 | sudo tee /proc/sys/fs/fuse/max_pages_limit
1024

$ sysctl -n fs.fuse.max_pages_limit
1024

$ echo 65536 | sudo tee /proc/sys/fs/fuse/max_pages_limit
tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument

$ echo 0 | sudo tee /proc/sys/fs/fuse/max_pages_limit
tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument

$ echo 65535 | sudo tee /proc/sys/fs/fuse/max_pages_limit
65535

$ sysctl -n fs.fuse.max_pages_limit
65535

Signed-off-by: Joanne Koong <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Sweet Tea Dorminy <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 2b3933b)
(cherry picked from commit b023958)
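A hedged sketch of how such a sysctl is typically wired up (identifiers are illustrative and the bounds are inferred from the output above, not copied from the fuse sources; registration details vary by kernel version):

#include <linux/sysctl.h>

static unsigned int max_pages_limit_sketch = 256;       /* default */
static unsigned int limit_min = 1;
static unsigned int limit_max = 65535;

static struct ctl_table fuse_sysctl_sketch[] = {
        {
                .procname     = "max_pages_limit",
                .data         = &max_pages_limit_sketch,
                .maxlen       = sizeof(max_pages_limit_sketch),
                .mode         = 0644,
                .proc_handler = proc_douintvec_minmax,
                .extra1       = &limit_min,
                .extra2       = &limit_max,
        },
};

/* Registered once at init time, e.g.:
 *      register_sysctl("fs/fuse", fuse_sysctl_sketch);
 */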
No need to take the lock, we can do this atomically.
fuse-io-uring and virtiofs especially benefit from it,
as they don't need the fiq lock at all.

Signed-off-by: Bernd Schubert <[email protected]>
(cherry picked from commit 47b2694)
This is especially needed for better ftrace analysis,
for example to build histograms.  So far the request unique
was missing, because it was added after the first trace message.

IDs/req-unique might not come up perfectly sequentially
anymore, but especially with cloned devices or io-uring this
did not work perfectly anyway.

Signed-off-by: Bernd Schubert <[email protected]>
(cherry picked from commit 4415892)
I've been timing various fuse operations and it's quite annoying to do
with kprobes.  Add two tracepoints for sending and ending fuse requests
to make it easier to debug and time various operations.

Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: Bernd Schubert <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 396b209)
(cherry picked from commit 6e77e0e)
fuse_uring_send_next_to_ring() can just call into fuse_uring_send()
and avoid code duplication.

Signed-off-by: Bernd Schubert <[email protected]>
(cherry picked from commit 9efaa8d)
Rename trace_fuse_request_send to trace_fuse_request_enqueue
Add trace_fuse_request_send
Add trace_fuse_request_bg_enqueue
Add trace_fuse_request_enqueue

This helps to track entire request time and time in different
queues.

Signed-off-by: Bernd Schubert <[email protected]>
(cherry picked from commit 4a7f142)
Our file system has a translation capability for S3-to-posix.
The current value of 1kiB is enough to cover S3 keys, but
does not allow encoding of %xx escape characters.
The limit is increased to (PATH_MAX - 1), as we need
3 x 1024 and that is close to PATH_MAX (4kB) already.
-1 is used as the terminating null is not included in the
length calculation.

Testing large file names was hard with libfuse/example file systems,
so I created a new memfs that does not have a 255 file name length
limitation.
libfuse/libfuse#1077

The connection is initialized with FUSE_NAME_LOW_MAX, which
is set to the previous value of FUSE_NAME_MAX of 1024. With
FUSE_MIN_READ_BUFFER of 8192 that is enough for two file names
+ fuse headers.
When FUSE_INIT reply sets max_pages to a value > 1 we know
that fuse daemon supports request buffers of at least 2 pages
(+ header) and can therefore hold 2 x PATH_MAX file names - operations
like rename or link that need two file names are no issue then.

Signed-off-by: Bernd Schubert <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 27992ef)
(cherry picked from commit 573e7ab)
open_by_handle_at(2) can fail with -ESTALE with a valid handle returned
by a previous name_to_handle_at(2) for evicted fuse inodes, which is
especially common when entry_valid_timeout is 0, e.g. when the fuse
daemon is in "cache=none" mode.

The time sequence is like:

	name_to_handle_at(2)	# succeed
	evict fuse inode
	open_by_handle_at(2)	# fail

The root cause is that, with 0 entry_valid_timeout, the dput() called in
name_to_handle_at(2) will trigger iput -> evict(), which will send
FUSE_FORGET to the daemon.  The following open_by_handle_at(2) will send
a new FUSE_LOOKUP request upon inode cache miss since the previous inode
eviction.  Then the fuse daemon may fail the FUSE_LOOKUP request with
-ENOENT as the cached metadata of the requested inode has already been
cleaned up during the previous FUSE_FORGET.  The returned -ENOENT is
treated as -ESTALE when open_by_handle_at(2) returns.

This confuses the application somehow, as open_by_handle_at(2) fails
when the previous name_to_handle_at(2) succeeds.  The returned errno is
also confusing as the requested file is not deleted and already there.
It is reasonable to fail name_to_handle_at(2) early in this case, after
which the application can fallback to open(2) to access files.

Since this issue typically appears when entry_valid_timeout is 0 which
is configured by the fuse daemon, the fuse daemon is the right person to
explicitly disable the export when required.

Also, considering that FUSE_EXPORT_SUPPORT actually indicates support for
lookups of "." and "..", and that there are existing fuse daemons supporting
export without FUSE_EXPORT_SUPPORT set, for compatibility we add a new
INIT flag for this purpose.

Reviewed-by: Amir Goldstein <[email protected]>
Signed-off-by: Jingbo Xu <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit e022f6a)
(cherry picked from commit 52da885)
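A hedged userspace illustration of the failing sequence described above (paths are placeholders and error handling is trimmed; open_by_handle_at() additionally requires CAP_DAC_READ_SEARCH):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
        int mount_id, mount_fd, fd;

        fh->handle_bytes = MAX_HANDLE_SZ;
        if (name_to_handle_at(AT_FDCWD, "/fusemnt/dir/f", fh, &mount_id, 0) < 0) {
                /* with the change above this can now fail early */
                perror("name_to_handle_at");
                return 1;
        }

        /* ... the fuse inode may be evicted here (FUSE_FORGET sent) ... */

        mount_fd = open("/fusemnt", O_RDONLY | O_DIRECTORY);
        fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
        if (fd < 0)
                perror("open_by_handle_at");    /* previously -ESTALE here */
        return 0;
}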
This makes the code a bit easier to read and makes it easier to add more
conditions for when an exclusive lock is needed.

Signed-off-by: Bernd Schubert <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 699cf82)
(cherry picked from commit f400249)
fuse_finish_open() is called from fuse_open_common() and from
fuse_create_open().  In the latter case, the O_TRUNC flag is always
cleared in finish_open() before calling into fuse_finish_open().

Move the bits that update attribute cache post O_TRUNC open into a
helper and call this helper from fuse_open_common() directly.

Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 0c9d708)
(cherry picked from commit 110fb13)
This removes the need to pass the isdir argument to fuse_put_file().

Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit e26ee4e)
(cherry picked from commit cbabbdd)
fuse_open_common() has a lot of code relevant only for regular files and
O_TRUNC in particular.

Copy the little bit of remaining code into fuse_dir_open() and stop using
this common helper for directory open.

Also split out fuse_dir_finish_open() from fuse_finish_open() before we add
inode io modes to fuse_finish_open().

Suggested-by: Miklos Szeredi <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 7de64d5)
(cherry picked from commit 8dcafbe)
In preparation for inode io modes, a server open response could fail due to
conflicting inode io modes.

Allow returning an error from fuse_finish_open() and handle the error in
the callers.

fuse_finish_open() is used as the callback of finish_open(), so that
FMODE_OPENED will not be set if fuse_finish_open() fails.

Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit d2c487f)
(cherry picked from commit c1531c7)
The fuse inode io mode is determined by the mode of its open files/mmaps
and parallel dio opens and expressed in the value of fi->iocachectr:

 > 0 - caching io: files open in caching mode or mmap on direct_io file
 < 0 - parallel dio: direct io mode with parallel dio writes enabled
== 0 - direct io: no files open in caching mode and no files mmaped

Note that iocachectr value of 0 might become positive or negative,
while non-parallel dio is getting processed.

direct_io mmap uses page cache, so first mmap will mark the file as
ff->io_opened and increment fi->iocachectr to enter the caching io mode.

If the server opens the file in caching mode while it is already open
for parallel dio or vice versa the open fails.

This allows executing parallel dio when inode is not in caching mode
and no mmaps have been performed on the inode in question.

Signed-off-by: Bernd Schubert <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit cb098dd)
(cherry picked from commit 0a64c3c)
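A hedged sketch of the counter convention described above (function names are illustrative, fuse_i.h types are assumed, and the real code also handles locking, waiting and mmap):

#include <linux/errno.h>

/* Illustrative sketch only: > 0 means caching io, < 0 means parallel dio,
 * and the two modes exclude each other. */
static int get_io_cache_sketch(struct fuse_inode *fi)
{
        if (fi->iocachectr < 0)
                return -ETXTBSY;        /* parallel dio writes in flight */
        fi->iocachectr++;               /* enter or extend caching io mode */
        return 0;
}

static int deny_io_cache_sketch(struct fuse_inode *fi)
{
        if (fi->iocachectr > 0)
                return -ETXTBSY;        /* caching opens or mmaps exist */
        fi->iocachectr--;               /* enter or extend parallel dio mode */
        return 0;
}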
So far this is just a helper to move complex locking logic out of
fuse_direct_write_iter.  It is especially needed by the next patch in the
series, which adds the fuse inode cache IO mode and with it even more
locking complexity.

Signed-off-by: Bernd Schubert <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 9bbb671)
(cherry picked from commit a709f7b)
Instead of denying caching mode on parallel dio open, deny caching
open only while parallel dio are in-progress and wait for in-progress
parallel dio writes before entering inode caching io mode.

This allows executing parallel dio when inode is not in caching mode
even if shared mmap is allowed, but no mmaps have been performed on
the inode in question.

An mmap on direct_io file now waits for all in-progress parallel dio
writes to complete, so parallel dio writes together with
FUSE_DIRECT_IO_ALLOW_MMAP is enabled by this commit.

Signed-off-by: Bernd Schubert <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 205c1d8)
(cherry picked from commit afe3358)
In some cases fi->writepages may be empty, and there is no need to check
fi->writepages under the spin_lock, which may have an impact on performance
due to lock contention.  For example, in scenarios where multiple readers
read the same file without any writers, or where the page cache is not
enabled.

Also remove the outdated comment, since commit 6b2fb79 ("fuse:
optimize writepages search") has optimized the situation by replacing the
list with an rb-tree.

Signed-off-by: yangyun <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit ac5cffe)
(cherry picked from commit ba1236c)
This may be a typo.  The comment says shared locks are
not allowed when this bit is set.  If a shared lock is used, the
wait in `fuse_file_cached_io_open` may never end.

Fixes: 205c1d8 ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP")
CC: [email protected] # v6.9
Signed-off-by: yangyun <[email protected]>
Reviewed-by: Bernd Schubert <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 2f3d8ff)
(cherry picked from commit bbddfd7)
Due to user buffer misalignment we actually need one more page,
i.e. 1025 instead of 1024; this will be handled differently later.
For now we just bump up the max.

(cherry picked from commit 3f71501)
This is to allow copying into the buffer from the application
without the need to copy in ring context (and, with that, without
the need for the ring task to be active in kernel space).

Signed-off-by: Bernd Schubert <[email protected]>
(cherry picked from commit 43d1a63)
(cherry picked from commit ea01f94)
If pinned pages are used the application can write into these
pages and io_uring_cmd_complete_in_task() is not needed.

Signed-off-by: Bernd Schubert <[email protected]>
(cherry picked from commit 5f0264c)
Add an smp_rmb() before checking list states in fuse_uring_destruct() to
ensure proper ordering between list modifications and emptiness checks.

During connection teardown lists are checked without holding a lock,
and without this barrier the CPU executing fuse_uring_destruct()
might see inconsistent list states, leading to false WARN_ON triggers
even though the lists have been properly emptied.

The smp_rmb() ensures we see the final consistent state of all lists
after teardown operations complete on other CPUs.

This fixes occasional false WARN_ON triggers during connection
teardown.

Signed-off-by: Bernd Schubert <[email protected]>
(cherry picked from commit 2a889c7)