  1. May 16, 2019
    • afs: Fix application of the results of an inline bulk status fetch · 39db9815
      David Howells authored
      
      Fix afs_do_lookup() such that when it does an inline bulk status fetch op,
      it updates inodes that are already extant (something that afs_iget()
      doesn't do) and caches permits for each inode created (thereby avoiding a
      follow-up FS.FetchStatus call to determine this).
      
      Extant inodes need looking up in advance so that their cb_break counters
      before and after the operation can be compared.  To this end, the inode
      pointers are cached so that they don't need looking up again after the op.
      
      Fixes: 5cf9dd55 ("afs: Prospectively look up extra files when doing a single lookup")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Pass pre-fetch server and volume break counts into afs_iget5_set() · b8359153
      David Howells authored
      
      Pass the server and volume break counts from before the status fetch
      operation that queried the attributes of a file into afs_iget5_set() so
      that the new vnode's break counters can be initialised appropriately.
      
      This allows detection of a volume or server break that happened whilst we
      were fetching the status or setting up the vnode.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix unlink to handle YFS.RemoveFile2 better · a38a7558
      David Howells authored
      
      Make use of the status update for the target file that the YFS.RemoveFile2
      RPC op returns to correctly update the vnode as to whether the file was
      actually deleted or just had nlink reduced.
      
      Fixes: 30062bd1 ("afs: Implement YFS support in the fs client")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiry · 61c347ba
      David Howells authored
      
      Fix afs_validate() to clear AFS_VNODE_CB_PROMISED on a vnode if we detect
      any condition that causes the callback promise to be broken implicitly,
      including server break (cb_s_break), volume break (cb_v_break) or callback
      expiry.
      
      Fixes: ae3b7361 ("afs: Fix validation/callback interaction")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Make vnode->cb_interest RCU safe · f642404a
      David Howells authored
      
      Use RCU-based freeing for afs_cb_interest struct objects and use RCU on
      vnode->cb_interest.  Use that change to allow afs_check_validity() to use
      read_seqbegin_or_lock() instead of read_seqlock_excl().
      
      This also requires the caller of afs_check_validity() to hold the RCU read
      lock across the call.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Split afs_validate() so first part can be used under LOOKUP_RCU · c925bd0a
      David Howells authored
      
      Split afs_validate() so that the part that decides if the vnode is still
      valid can be used under LOOKUP_RCU conditions from afs_d_revalidate().
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Don't save callback version and type fields · 7c712458
      David Howells authored
      
      Don't save callback version and type fields as the version is about the
      format of the callback information and the type is relative to the
      particular RPC call.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • uapi, fsopen: use square brackets around "fscontext" [ver #2] · 1cdc415f
      Christian Brauner authored
      
      Make the name of the anon inode fd "[fscontext]" instead of "fscontext".
      This is minor but most core-kernel anon inode fds already carry square
      brackets around their name:
      
      [eventfd]
      [eventpoll]
      [fanotify]
      [io_uring]
      [pidfd]
      [signalfd]
      [timerfd]
      [userfaultfd]
      
      For the sake of consistency, let's do the same for the fscontext anon inode
      fd that comes with the new mount API.
      
      Signed-off-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • afs: Fix double inc of vnode->cb_break · fd711586
      David Howells authored
      
      When __afs_break_callback() clears the CB_PROMISED flag, it increments
      vnode->cb_break to trigger a future refetch of the status and callback -
      however it also calls afs_clear_permits(), which also increments
      vnode->cb_break.
      
      Fix this by removing the increment from afs_clear_permits().
      
      Whilst we're at it, fix the conditional call to afs_put_permits() as the
      function checks to see if the argument is NULL, so the check is redundant.
      
      Fixes: be080a6f ("afs: Overhaul permit caching")
      Signed-off-by: David Howells <dhowells@redhat.com>
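
      The double increment can be illustrated with a minimal user-space
      sketch.  The toy_* names and the struct are hypothetical stand-ins, not
      the kernel's definitions; the sketch only assumes that breaking a
      promised callback bumps cb_break once and then clears the cached permits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the vnode fields involved; not struct afs_vnode. */
struct toy_vnode {
	unsigned int cb_break;  /* bumped when the callback promise is broken */
	bool cb_promised;       /* models the AFS_VNODE_CB_PROMISED flag */
	int permits_cached;     /* models the cached permit list */
};

/* Before the fix: clearing the permits also bumped cb_break. */
static void toy_clear_permits_old(struct toy_vnode *v)
{
	v->permits_cached = 0;
	v->cb_break++;
}

/* After the fix: clearing the permits leaves cb_break alone. */
static void toy_clear_permits_new(struct toy_vnode *v)
{
	v->permits_cached = 0;
}

/*
 * Models the callback-break path and returns how far cb_break advanced.
 * 'fixed' selects the old or new permit-clearing behaviour.
 */
static unsigned int toy_break_callback(struct toy_vnode *v, bool fixed)
{
	unsigned int before;

	assert(v != NULL);
	before = v->cb_break;
	if (v->cb_promised) {
		v->cb_promised = false;
		v->cb_break++;
		if (fixed)
			toy_clear_permits_new(v);
		else
			toy_clear_permits_old(v);
	}
	return v->cb_break - before;
}
```

      With the old helper a single break advances the counter by two; with
      the new one it advances by one, which is all that is needed to trigger
      a refetch.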
    • afs: Fix application of status and callback to be under same lock · a58823ac
      David Howells authored
      
      When applying the status and callback in the response of an operation,
      apply them in the same critical section so that there's no race between
      checking the callback state and checking status-dependent state (such as
      the data version).
      
      Fix this by:
      
       (1) Allocating a joint {status,callback} record (afs_status_cb) before
           calling the RPC function for each vnode for which the RPC reply
           contains a status or a status plus a callback.  A flag is set in the
           record to indicate if a callback was actually received.
      
       (2) These records are passed into the RPC functions to be filled in.  The
           afs_decode_status() and yfs_decode_status() functions are removed and
           the cb_lock is no longer taken.
      
       (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
           update the vnode.
      
       (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
           the vnode.
      
       (5) vnodes, expected data-version numbers and callback break counters
           (cb_break) no longer need to be passed to the reply delivery
           functions.
      
           Note that, for the moment, the file locking functions still need
           access to both the call and the vnode at the same time.
      
       (6) afs_vnode_commit_status() is now given the cb_break value and the
           expected data_version, and the task of applying the status and the
           callback to the vnode is now done here.
      
           This is done under a single taking of vnode->cb_lock.
      
       (7) afs_pages_written_back() is now called by afs_store_data() rather than
           by the reply delivery function.
      
           afs_pages_written_back() has been moved to before the call point and
           is now given the first and last page numbers rather than a pointer to
           the call.
      
       (8) The indicator from YFS.RemoveFile2 as to whether the target file
           actually got removed (status.abort_code == VNOVNODE) rather than
           merely dropping a link is now checked in afs_unlink rather than in
           xdr_decode_YFSFetchStatus().
      
      Supplementary fixes:
      
       (*) afs_cache_permit() now gets the caller_access mask from the
           afs_status_cb object rather than picking it out of the vnode's status
           record.  afs_fetch_status() returns caller_access through its argument
           list for this purpose also.
      
       (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
           than a read lock and now sets the callback inside the same critical
           section.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix lock-wait/callback-break double locking · c7226e40
      David Howells authored
      
      __afs_break_callback() holds vnode->lock around its call of
      afs_lock_may_be_available() - which also takes that lock.
      
      Fix this by not taking the lock in __afs_break_callback().
      
      Also, there's no point checking the granted_locks and pending_locks queues;
      it's sufficient to check lock_state, so move that check out of
      afs_lock_may_be_available() into __afs_break_callback() to replace the
      queue checks.
      
      Fixes: e8d6c554 ("AFS: implement file locking")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Always get the reply time · 4571577f
      David Howells authored
      
      Always ask for the reply time from AF_RXRPC as it's used to calculate the
      callback expiry time and lock expiry times, so it's needed by most FS
      operations.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set · d9052dda
      David Howells authored
      
      Don't invalidate the callback promise on a directory if the
      AFS_VNODE_DIR_VALID flag is not set (which indicates that the directory
      contents are invalid, due to edit failure, callback break, page reclaim).
      
      The directory will be reloaded next time the directory is accessed, so
      clearing the callback flag at this point may race with a reload of the
      directory and cancel its recorded callback promise.
      
      Fixes: f3ddee8d ("afs: Fix directory handling")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix order-1 allocation in afs_do_lookup() · 87182759
      David Howells authored
      
      afs_do_lookup() will do an order-1 allocation to allocate status records if
      there are more than 39 vnodes to stat.
      
      Fix this by allocating an array of {status,callback} records for each vnode
      we want to examine using vmalloc() if larger than a page.
      
      This not only gets rid of the order-1 allocation, but makes it easier to
      grow beyond 50 records for YFS servers.  It also allows us to move to
      {status,callback} tuples for other calls too and makes it easier to lock
      across the application of the status and the callback to the vnode.
      
      Fixes: 5cf9dd55 ("afs: Prospectively look up extra files when doing a single lookup")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix calculation of callback expiry time · 78107055
      David Howells authored
      
      Fix the calculation of the expiry time of a callback promise, as obtained
      from operations like FS.FetchStatus and FS.FetchData.
      
      The time should be based on the timestamp of the first DATA packet in the
      reply and the calculation needs to turn the ktime_t timestamp into a
      time64_t.
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: David Howells <dhowells@redhat.com>
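
      The conversion described above can be sketched in user space as
      follows.  This is an illustration under stated assumptions, not the
      kernel code: toy_expiry_at() and TOY_NSEC_PER_SEC are hypothetical
      names, the reply timestamp is taken to be a nanosecond count (as
      ktime_t is), and the expiry field in the reply is assumed to be a
      relative number of seconds.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000LL

/*
 * Turn a nanosecond reply timestamp (the ktime_t analogue) into whole
 * seconds (the time64_t analogue) and add the relative expiry to get the
 * absolute time at which the callback promise lapses.
 */
static int64_t toy_expiry_at(int64_t reply_time_ns, uint32_t expiry_secs)
{
	assert(reply_time_ns >= 0);
	return reply_time_ns / TOY_NSEC_PER_SEC + (int64_t)expiry_secs;
}
```

      The sub-second part of the timestamp is discarded by the division, so
      the result errs on the side of expiring slightly early rather than
      late.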
    • afs: Get rid of afs_call::reply[] · ffba718e
      David Howells authored
      
      Replace the afs_call::reply[] array with a bunch of typed members so that
      the compiler can use type-checking on them.  It's also easier for the eye
      to see what's going on.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Make dynamic root population wait uninterruptibly for proc_cells_lock · 3b05e528
      David Howells authored
      
      Make dynamic root population wait uninterruptibly for proc_cells_lock.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Don't pass the vnode pointer through into the inline bulk status op · fefb2483
      David Howells authored
      
      Don't pass the vnode pointer through into the inline bulk status op.  We
      want to process the status records outside of it anyway.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Make some RPC operations non-interruptible · 20b8391f
      David Howells authored
      
      Make certain RPC operations non-interruptible, including:
      
       (*) Set attributes
       (*) Store data
      
           We don't want to get interrupted during a flush on close, flush on
           unlock, writeback or an inode update, leaving us in a state where we
           still need to do the writeback or update.
      
       (*) Extend lock
       (*) Release lock
      
           We don't want to get lock extension interrupted as the file locks on
           the server are time-limited.  Interruption during lock release is less
           of an issue since the lock is time-limited, but it's better to
           complete the release to avoid a several-minute wait to recover it.
      
           *Setting* the lock isn't a problem if it's interrupted since we can
            just return to the user and tell them they were interrupted - at
            which point they can elect to retry.
      
       (*) Silly unlink
      
           We want to remove silly unlink files if we can, rather than leaving
           them for the salvager to clear up.
      
      Note that whilst these calls are no longer interruptible, they do have
      timeouts on them, so if the server stops responding the call will fail with
      something like ETIME or ECONNRESET.
      
      Without this, the following:
      
      	kAFS: Unexpected error from FS.StoreData -512
      
      appears in dmesg when a pending store data gets interrupted and some
      processes may just hang.
      
      Additionally, make the code that checks/updates the server record ignore
      failure due to interruption if the main call is uninterruptible and if the
      server has an address list.  The next op will check it again since the
      expiration time on the old list has passed.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Allow the kernel to mark a call as being non-interruptible · b960a34b
      David Howells authored
      
      Allow kernel services using AF_RXRPC to indicate that a call should be
      non-interruptible.  This allows kafs to make things like lock-extension and
      writeback data storage calls non-interruptible.
      
      If this is set, signals will be ignored for operations on that call where
      possible - such as waiting to get a call channel on an rxrpc connection.
      
      It doesn't prevent UDP sendmsg from being interrupted, but that will be
      handled by packet retransmission.
      
      rxrpc_kernel_recv_data() isn't affected by this since that never waits,
      preferring instead to return -EAGAIN and leave the waiting to the caller.
      
      Userspace initiated calls can't be set to be uninterruptible at this time.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix error propagation from server record check/update · 0ab4c959
      David Howells authored
      
      afs_check/update_server_record() should be setting fc->error rather than
      fc->ac.error as they're called from within the cursor iteration function.
      
      afs_fs_cursor::error is where the error code of the attempt to call the
      operation on multiple servers is integrated and is the final result,
      whereas afs_addr_cursor::error is used to hold the error from individual
      iterations of the call loop.  (Note there's also an afs_vl_cursor which
      also wraps afs_addr_cursor for accessing VL servers rather than file
      servers).
      
      Fix this by setting fc->error in afs_check/update_server_record() so
      that any error incurred whilst talking to the VL server correctly
      propagates to the final result.
      
      This results in:
      
      	kAFS: Unexpected error from FS.StoreData -512
      
      being seen, even though the store-data op is non-interruptible.  The error
      is actually coming from the server record update getting interrupted.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix the maximum lifespan of VL and probe calls · 94f699c9
      David Howells authored
      
      If an older AFS server doesn't support an operation, it may accept the call
      and then sit on it forever, happily responding to pings that make kafs
      think that the call is still alive.
      
      Fix this by setting the maximum lifespan of Volume Location service calls
      in particular and probe calls in general so that they don't run on
      endlessly if they're not supported.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix "kAFS: AFS vnode with undefined type 0" · 51eba999
      David Howells authored
      
      Under some circumstances afs_select_fileserver() can return without setting
      an error in fc->error.  The problem is in the no_more_servers segment where
      the accumulated errors from attempts to contact various servers are
      integrated into an afs_error-type variable 'e'.  The resultant error code
      is, however, then abandoned.
      
      Fix this by getting the error out of e.error and putting it in 'error' so
      that the next part will store it into fc->error.
      
      Not doing this causes a report like the following:
      
          kAFS: AFS vnode with undefined type 0
          kAFS: A=0 m=0 s=0 v=0
          kAFS: vnode 20000025:1:1
      
      because the code following the server selection loop then sees what it
      thinks is a successful invocation because fc.error is 0.  However, it can't
      apply the status record because it's all zeros.
      
      The report is followed on the first instance with a trace looking something
      like:
      
           dump_stack+0x67/0x8e
           afs_inode_init_from_status.isra.2+0x21b/0x487
           afs_fetch_status+0x119/0x1df
           afs_iget+0x130/0x295
           afs_get_tree+0x31d/0x595
           vfs_get_tree+0x1f/0xe8
           fc_mount+0xe/0x36
           afs_d_automount+0x328/0x3c3
           follow_managed+0x109/0x20a
           lookup_fast+0x3bf/0x3f8
           do_last+0xc3/0x6a4
           path_openat+0x1af/0x236
           do_filp_open+0x51/0xae
           ? _raw_spin_unlock+0x24/0x2d
           ? __alloc_fd+0x1a5/0x1b7
           do_sys_open+0x13b/0x1e8
           do_syscall_64+0x7d/0x1b3
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 4584ae96 ("afs: Fix missing net error handling")
      Signed-off-by: David Howells <dhowells@redhat.com>
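
      The lost-error pattern described above can be sketched in user space.
      Everything here is hypothetical scaffolding (the toy_* names, and the
      "last non-zero error wins" accumulation rule in particular); it only
      shows why an accumulated error that is never copied into 'error'
      leaves the cursor looking successful.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_error  { int error; };  /* stands in for the accumulator 'e' */
struct toy_cursor { int error; };  /* stands in for the cursor's final result */

/* Hypothetical accumulation rule: remember the most recent failure. */
static void toy_accumulate(struct toy_error *e, int err)
{
	if (err != 0)
		e->error = err;
}

/*
 * Models the "no more servers" exit path.  Without the fix the accumulated
 * code is abandoned and 0 is stored; with it, the code is handed on.
 */
static int toy_no_more_servers(struct toy_cursor *fc, const int *errs,
			       size_t n, bool fixed)
{
	struct toy_error e = { 0 };
	int error = 0;
	size_t i;

	assert(fc != NULL && (errs != NULL || n == 0));
	for (i = 0; i < n; i++)
		toy_accumulate(&e, errs[i]);
	if (fixed)
		error = e.error;  /* the step that was missing */
	fc->error = error;
	return fc->error;
}
```

      A caller that sees 0 from the unfixed path goes on to apply a status
      record that was never filled in, which matches the all-zeros report
      shown above.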
    • io_uring: use wait_event_interruptible for cq_wait conditional wait · fdb288a6
      Jackie Liu authored
      
      The previous patch ensured that io_cqring_events contains the
      smp_rmb memory barrier, so we can now use wait_event_interruptible
      to keep the code simple.
      
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: adjust smp_rmb inside io_cqring_events · dc6ce4bc
      Jackie Liu authored
      
      smp_rmb is required whenever io_cqring_events is used, so keep the
      smp_rmb inside the io_cqring_events function.
      
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
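
      The idea of folding the barrier into the helper can be approximated in
      user space with C11 atomics.  This is a sketch, not the io_uring code:
      struct toy_cq and toy_cq_events() are hypothetical, and an acquire
      fence is used as a rough stand-in for smp_rmb rather than an exact
      equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy completion ring indices; both are free-running counters. */
struct toy_cq {
	_Atomic unsigned int head;
	_Atomic unsigned int tail;
};

/*
 * Number of pending completion events.  The fence lives inside the helper
 * so that every caller gets the ordering without having to remember it.
 */
static unsigned int toy_cq_events(struct toy_cq *cq)
{
	assert(cq != NULL);
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&cq->tail, memory_order_relaxed) -
	       atomic_load_explicit(&cq->head, memory_order_relaxed);
}
```

      Because the subtraction is done on unsigned counters, the result stays
      correct when the indices wrap around.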
    • io_uring: fix infinite wait in kthread_park() on io_finish_async() · 2bbcd6d3
      Roman Penyaev authored
      
      This fixes a couple of races that lead to an infinite wait for park
      completion, with the following backtraces:
      
        [20801.303319] Call Trace:
        [20801.303321]  ? __schedule+0x284/0x650
        [20801.303323]  schedule+0x33/0xc0
        [20801.303324]  schedule_timeout+0x1bc/0x210
        [20801.303326]  ? schedule+0x3d/0xc0
        [20801.303327]  ? schedule_timeout+0x1bc/0x210
        [20801.303329]  ? preempt_count_add+0x79/0xb0
        [20801.303330]  wait_for_completion+0xa5/0x120
        [20801.303331]  ? wake_up_q+0x70/0x70
        [20801.303333]  kthread_park+0x48/0x80
        [20801.303335]  io_finish_async+0x2c/0x70
        [20801.303336]  io_ring_ctx_wait_and_kill+0x95/0x180
        [20801.303338]  io_uring_release+0x1c/0x20
        [20801.303339]  __fput+0xad/0x210
        [20801.303341]  task_work_run+0x8f/0xb0
        [20801.303342]  exit_to_usermode_loop+0xa0/0xb0
        [20801.303343]  do_syscall_64+0xe0/0x100
        [20801.303349]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        [20801.303380] Call Trace:
        [20801.303383]  ? __schedule+0x284/0x650
        [20801.303384]  schedule+0x33/0xc0
        [20801.303386]  io_sq_thread+0x38a/0x410
        [20801.303388]  ? __switch_to_asm+0x40/0x70
        [20801.303390]  ? wait_woken+0x80/0x80
        [20801.303392]  ? _raw_spin_lock_irqsave+0x17/0x40
        [20801.303394]  ? io_submit_sqes+0x120/0x120
        [20801.303395]  kthread+0x112/0x130
        [20801.303396]  ? kthread_create_on_node+0x60/0x60
        [20801.303398]  ret_from_fork+0x35/0x40
      
       o kthread_park() waits for park completion, so the io_sq_thread() loop
         should check kthread_should_park() along with kthread_should_stop();
         otherwise, if kthread_park() is called before prepare_to_wait(),
         the following schedule() never returns:

         CPU#0                    CPU#1

         io_sq_thread_stop():     io_sq_thread():

                                     while(!kthread_should_stop() && !ctx->sqo_stop) {

            ctx->sqo_stop = 1;
            kthread_park()

                                         prepare_to_wait();
                                         if (kthread_should_stop()) {
                                         }
                                         schedule();   <<< nobody checks park flag,
                                                       <<< so schedule and never return
      
       o if the flag ctx->sqo_stop is observed by the io_sq_thread() loop,
         it is quite possible that the kthread_should_park() check and the
         following kthread_parkme() are never reached, because kthread_park()
         has not yet been called; a few moments later it is called and
         waits there for park completion, which never happens because the
         kthread has already exited:

         CPU#0                    CPU#1

         io_sq_thread_stop():     io_sq_thread():

            ctx->sqo_stop = 1;
                                     while(!kthread_should_stop() && !ctx->sqo_stop) {
                                         <<< observe sqo_stop and exit the loop
                                     }

                                     if (kthread_should_park())
                                         kthread_parkme();  <<< never called, since was
                                                            <<< never parked

            kthread_park()           <<< waits forever for park completion
      
      In the current patch we quit the loop only on the kthread_should_park()
      check (kthread_park() is synchronous, so kthread_should_stop() is
      never observed), and we abandon the ->sqo_stop flag, since it is racy.
      At the end of io_sq_thread() we unconditionally call kthread_parkme(),
      since we've exited the loop by the park flag.
      
      Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
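
      The second interleaving above can be replayed deterministically with a
      small single-threaded model.  This is only a sketch: the toy_* names
      and the struct are hypothetical, real threads and the kthread API are
      replaced by explicit steps, and "parked" simply records whether the
      thread reached its kthread_parkme() analogue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sq {
	bool sqo_stop;     /* the racy flag that the patch removes */
	bool should_park;  /* models kthread_should_park() */
	bool in_loop;      /* thread is still inside its main loop */
	bool parked;       /* thread reached the kthread_parkme() analogue */
};

/* One scheduling slice of the thread with the old exit logic. */
static void toy_thread_run_old(struct toy_sq *s)
{
	assert(s != NULL);
	if (!s->in_loop || !s->sqo_stop)
		return;              /* already gone, or keep looping */
	s->in_loop = false;          /* left the loop because of sqo_stop */
	if (s->should_park)
		s->parked = true;    /* only parks if a park was already requested */
}

/* One scheduling slice of the thread with the new exit logic. */
static void toy_thread_run_new(struct toy_sq *s)
{
	assert(s != NULL);
	if (!s->in_loop || !s->should_park)
		return;              /* the park flag is the only way out */
	s->in_loop = false;
	s->parked = true;            /* unconditional park after the loop */
}

/*
 * Replays: stopper signals, thread runs, stopper requests the park, thread
 * runs again.  Returns true if the park request would complete.
 */
static bool toy_replay(bool fixed)
{
	struct toy_sq s = { false, false, true, false };
	void (*run)(struct toy_sq *) =
		fixed ? toy_thread_run_new : toy_thread_run_old;

	if (!fixed)
		s.sqo_stop = true;   /* the fixed code has no such flag */
	run(&s);
	s.should_park = true;        /* kthread_park() analogue is called */
	run(&s);
	return s.parked;
}
```

      In the old model the thread leaves on sqo_stop before any park request
      exists, so the later request is never satisfied; in the new model the
      thread cannot leave until the request has been made.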
    • afs: Fix cell DNS lookup · d5c32c89
      David Howells authored
      
      Currently, once configured, AFS cells are looked up in the DNS at regular
      intervals - which is a waste of resources if those cells aren't being
      used.  It also leads to a problem where cells preloaded, but not
      configured, before the network is brought up end up effectively statically
      configured with no VL servers and are unable to get any.
      
      Fix this by not doing the DNS lookup until the first time a cell is
      touched.  The lookup is waited for if we don't have any cached records yet;
      otherwise the DNS lookup to maintain the record is done in the background.
      
      This has the downside that the first time you touch a cell, you now have to
      wait for the upcall to do the required DNS lookups rather than them already
      being cached.
      
      Further, the record is not replaced if the old record has at least one
      server in it and the new record doesn't have any.
      
      Fixes: 0a5143f2 ("afs: Implement VL server rotation")
      Signed-off-by: David Howells <dhowells@redhat.com>
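
      The replacement rule in the last paragraph of the description can be
      written as a small predicate.  This is a sketch with hypothetical names
      (struct toy_vllist, toy_should_replace()); the only behaviour taken
      from the description is that a record with at least one server is not
      displaced by a new record that has none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cell's list of VL servers. */
struct toy_vllist {
	unsigned int nr_servers;
};

/*
 * Decide whether a freshly looked-up record should replace the old one.
 * A missing old record is always replaced (an assumption of this sketch).
 */
static bool toy_should_replace(const struct toy_vllist *old,
			       const struct toy_vllist *fresh)
{
	assert(fresh != NULL);
	if (old == NULL)
		return true;
	if (old->nr_servers >= 1 && fresh->nr_servers == 0)
		return false;  /* keep the usable record */
	return true;
}
```

      Keeping the old record in that one case means a failed or empty DNS
      result cannot strand a cell that was previously reachable.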
  2. May 15, 2019