Skip to content
Snippets Groups Projects
  1. May 21, 2019
  2. Apr 26, 2019
    • Al Viro's avatar
      fsnotify(): switch to passing const struct qstr * for file_name · 25b229df
      Al Viro authored
      
      Note that in fnsotify_move() and fsnotify_link() we are guaranteed
      that dentry->d_name won't change during the fsnotify() evaluation
      (by having the parent directory locked exclusive), so we don't
      need to fetch dentry->d_name.name in the callers.  In fsnotify_dirent()
      the same stability of dentry->d_name is also true, but it's a bit
      more convoluted - there is one callchain (devpts_pty_new() ->
      fsnotify_create() -> fsnotify_dirent()) where the parent is _not_
      locked, but on devpts ->d_name of everything is unchanging; it
      has neither explicit nor implicit renames.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      25b229df
  3. Apr 25, 2019
  4. Apr 04, 2019
    • Ondrej Mosnacek's avatar
      kernfs: fix xattr name handling in LSM helpers · 1537ad15
      Ondrej Mosnacek authored
      
      The implementation of kernfs_security_xattr_*() helpers reuses the
      kernfs_node_xattr_*() functions, which take the suffix of the xattr name
      and extract full xattr name from it using xattr_full_name(). However,
      this function relies on the fact that the suffix passed to xattr
      handlers from VFS is always constructed from the full name by just
      incerementing the pointer. This doesn't necessarily hold for the callers
      of kernfs_security_xattr_*(), so their usage will easily lead to
      out-of-bounds access.
      
      Fix this by moving the xattr name reconstruction to the VFS xattr
      handlers and replacing the kernfs_security_xattr_*() helpers with more
      general kernfs_xattr_*() helpers that take full xattr name and allow
      accessing all kernfs node's xattrs.
      
      Reported-by: default avatarkernel test robot <rong.a.chen@intel.com>
      Fixes: b230d5ab ("LSM: add new hook for kernfs node initialization")
      Fixes: ec882da5 ("selinux: implement the kernfs_init_security hook")
      Signed-off-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      1537ad15
  5. Mar 21, 2019
    • Ondrej Mosnacek's avatar
      kernfs: initialize security of newly created nodes · e19dfdc8
      Ondrej Mosnacek authored
      Use the new security_kernfs_init_security() hook to allow LSMs to
      possibly assign a non-default security context to a newly created kernfs
      node based on the attributes of the new node and also its parent node.
      
      This fixes an issue with cgroupfs under SELinux, where newly created
      cgroup subdirectories/files would not inherit its parent's context if
      it had been set explicitly to a non-default value (other than the genfs
      context specified by the policy). This can be reproduced as follows (on
      Fedora/RHEL):
      
          # mkdir /sys/fs/cgroup/unified/test
          # # Need permissive to change the label under Fedora policy:
          # setenforce 0
          # chcon -t container_file_t /sys/fs/cgroup/unified/test
          # ls -lZ /sys/fs/cgroup/unified
          total 0
          -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.controllers
          -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.depth
          -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.descendants
          -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.procs
          -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.stat
          -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.subtree_control
          -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.threads
          drwxr-xr-x.  2 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 init.scope
          drwxr-xr-x. 26 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:21 system.slice
          drwxr-xr-x.  3 root root system_u:object_r:container_file_t:s0 0 Jan 29 03:15 test
          drwxr-xr-x.  3 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 user.slice
          # mkdir /sys/fs/cgroup/unified/test/subdir
      
      Actual result:
      
          # ls -ldZ /sys/fs/cgroup/unified/test/subdir
          drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir
      
      Expected result:
      
          # ls -ldZ /sys/fs/cgroup/unified/test/subdir
          drwxr-xr-x. 2 root root unconfined_u:object_r:container_file_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir
      
      Link: https://github.com/SELinuxProject/selinux-kernel/issues/39
      
      
      
      Signed-off-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Acked-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      e19dfdc8
    • Ondrej Mosnacek's avatar
      LSM: add new hook for kernfs node initialization · b230d5ab
      Ondrej Mosnacek authored
      
      This patch introduces a new security hook that is intended for
      initializing the security data for newly created kernfs nodes, which
      provide a way of storing a non-default security context, but need to
      operate independently from mounts (and therefore may not have an
      associated inode at the moment of creation).
      
      The main motivation is to allow kernfs nodes to inherit the context of
      the parent under SELinux, similar to the behavior of
      security_inode_init_security(). Other LSMs may implement their own logic
      for handling the creation of new nodes.
      
      This patch also adds helper functions to <linux/kernfs.h> for
      getting/setting security xattrs of a kernfs node so that LSMs hooks are
      able to do their job. Other important attributes should be accessible
      direcly in the kernfs_node fields (in case there is need for more, then
      new helpers should be added to kernfs.h along with the patch that needs
      them).
      
      Signed-off-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Acked-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      [PM: more manual merge fixes]
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      b230d5ab
    • Ondrej Mosnacek's avatar
      kernfs: use simple_xattrs for security attributes · 0ac6075a
      Ondrej Mosnacek authored
      
      Replace the special handling of security xattrs with simple_xattrs, as
      is already done for the trusted xattrs. This simplifies the code and
      allows LSMs to use more than just a single xattr to do their business.
      
      Signed-off-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Acked-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      [PM: manual merge fixes]
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      0ac6075a
    • Ondrej Mosnacek's avatar
      kernfs: do not alloc iattrs in kernfs_xattr_get · d0c9c153
      Ondrej Mosnacek authored
      
      This is a read-only operation, so we can simply return -ENODATA if
      kn->iattr is NULL.
      
      Signed-off-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Acked-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      [PM: minor merge fixes]
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      d0c9c153
    • Ondrej Mosnacek's avatar
      kernfs: clean up struct kernfs_iattrs · 05895219
      Ondrej Mosnacek authored
      
      Right now, kernfs_iattrs embeds the whole struct iattr, even though it
      doesn't really use half of its fields... This both leads to wasting
      space and makes the code look awkward. Let's just list the few fields
      we need directly in struct kernfs_iattrs.
      
      Signed-off-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Acked-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      [PM: merged a number of chunks manually due to fuzz]
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      05895219
  6. Mar 06, 2019
    • Johannes Weiner's avatar
      fs: kernfs: add poll file operation · 147e1a97
      Johannes Weiner authored
      Patch series "psi: pressure stall monitors", v3.
      
      Android is adopting psi to detect and remedy memory pressure that
      results in stuttering and decreased responsiveness on mobile devices.
      
      Psi gives us the stall information, but because we're dealing with
      latencies in the millisecond range, periodically reading the pressure
      files to detect stalls in a timely fashion is not feasible.  Psi also
      doesn't aggregate its averages at a high enough frequency right now.
      
      This patch series extends the psi interface such that users can
      configure sensitive latency thresholds and use poll() and friends to be
      notified when these are breached.
      
      As high-frequency aggregation is costly, it implements an aggregation
      method that is optimized for fast, short-interval averaging, and makes
      the aggregation frequency adaptive, such that high-frequency updates
      only happen while monitored stall events are actively occurring.
      
      With these patches applied, Android can monitor for, and ward off,
      mounting memory shortages before they cause problems for the user.  For
      example, using memory stall monitors in userspace low memory killer
      daemon (lmkd) we can detect mounting pressure and kill less important
      processes before device becomes visibly sluggish.
      
      In our memory stress testing psi memory monitors produce roughly 10x
      less false positives compared to vmpressure signals.  Having ability to
      specify multiple triggers for the same psi metric allows other parts of
      Android framework to monitor memory state of the device and act
      accordingly.
      
      The new interface is straightforward.  The user opens one of the
      pressure files for writing and writes a trigger description into the
      file descriptor that defines the stall state - some or full, and the
      maximum stall time over a given window of time.  E.g.:
      
              /* Signal when stall time exceeds 100ms of a 1s window */
              char trigger[] = "full 100000 1000000";
              fd = open("/proc/pressure/memory");
              write(fd, trigger, sizeof(trigger));
              while (poll() >= 0) {
                      ...
              }
              close(fd);
      
      When the monitored stall state is entered, psi adapts its aggregation
      frequency according to what the configured time window requires in order
      to emit event signals in a timely fashion.  Once the stalling subsides,
      aggregation reverts back to normal.
      
      The trigger is associated with the open file descriptor.  To stop
      monitoring, the user only needs to close the file descriptor and the
      trigger is discarded.
      
      Patches 1-4 prepare the psi code for polling support.  Patch 5
      implements the adaptive polling logic, the pressure growth detection
      optimized for short intervals, and hooks up write() and poll() on the
      pressure files.
      
      The patches were developed in collaboration with Johannes Weiner.
      
      This patch (of 5):
      
      Kernfs has a standardized poll/notification mechanism for waking all
      pollers on all fds when a filesystem node changes.  To allow polling for
      custom events, add a .poll callback that can override the default.
      
      This is in preparation for pollable cgroup pressure files which have
      per-fd trigger configurations.
      
      Link: http://lkml.kernel.org/r/20190124211518.244221-2-surenb@google.com
      
      
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarSuren Baghdasaryan <surenb@google.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      147e1a97
  7. Feb 28, 2019
    • David Howells's avatar
      kernfs, sysfs, cgroup, intel_rdt: Support fs_context · 23bf1b6b
      David Howells authored
      
      Make kernfs support superblock creation/mount/remount with fs_context.
      
      This requires that sysfs, cgroup and intel_rdt, which are built on kernfs,
      be made to support fs_context also.
      
      Notes:
      
       (1) A kernfs_fs_context struct is created to wrap fs_context and the
           kernfs mount parameters are moved in here (or are in fs_context).
      
       (2) kernfs_mount{,_ns}() are made into kernfs_get_tree().  The extra
           namespace tag parameter is passed in the context if desired
      
       (3) kernfs_free_fs_context() is provided as a destructor for the
           kernfs_fs_context struct, but for the moment it does nothing except
           get called in the right places.
      
       (4) sysfs doesn't wrap kernfs_fs_context since it has no parameters to
           pass, but possibly this should be done anyway in case someone wants to
           add a parameter in future.
      
       (5) A cgroup_fs_context struct is created to wrap kernfs_fs_context and
           the cgroup v1 and v2 mount parameters are all moved there.
      
       (6) cgroup1 parameter parsing error messages are now handled by invalf(),
           which allows userspace to collect them directly.
      
       (7) cgroup1 parameter cleanup is now done in the context destructor rather
           than in the mount/get_tree and remount functions.
      
      Weirdies:
      
       (*) cgroup_do_get_tree() calls cset_cgroup_from_root() with locks held,
           but then uses the resulting pointer after dropping the locks.  I'm
           told this is okay and needs commenting.
      
       (*) The cgroup refcount web.  This really needs documenting.
      
       (*) cgroup2 only has one root?
      
      Add a suggestion from Thomas Gleixner in which the RDT enablement code is
      placed into its own function.
      
      [folded a leak fix from Andrey Vagin]
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc: Tejun Heo <tj@kernel.org>
      cc: Li Zefan <lizefan@huawei.com>
      cc: Johannes Weiner <hannes@cmpxchg.org>
      cc: cgroups@vger.kernel.org
      cc: fenghua.yu@intel.com
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      23bf1b6b
  8. Feb 08, 2019
    • Ayush Mittal's avatar
      kernfs: Allocating memory for kernfs_iattrs with kmem_cache. · 26e28d68
      Ayush Mittal authored
      
      Creating a new cache for kernfs_iattrs.
      Currently, memory is allocated with kzalloc() which
      always gives aligned memory. On ARM, this is 64 byte aligned.
      To avoid the wastage of memory in aligning the size requested,
      a new cache for kernfs_iattrs is created.
      
      Size of struct kernfs_iattrs is 80 Bytes.
      On ARM, it will come in kmalloc-128 slab.
      and it will come in kmalloc-192 slab if debug info is enabled.
      Extra bytes taken 48 bytes.
      
      Total number of objects created : 4096
      Total saving = 48*4096 = 192 KB
      
      After creating new slab(When debug info is enabled) :
      sh-3.2# cat /proc/slabinfo
      ...
      kernfs_iattrs_cache   4069   4096    128   32    1 : tunables    0    0    0 : slabdata    128    128      0
      ...
      
      All testing has been done on ARM target.
      
      Signed-off-by: default avatarAyush Mittal <ayush.m@samsung.com>
      Signed-off-by: default avatarVaneet Narang <v.narang@samsung.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      26e28d68
  9. Jan 17, 2019
    • Al Viro's avatar
      kill kernfs_pin_sb() · 6d7fbce7
      Al Viro authored
      
      unused now and impossible to use safely anyway.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      6d7fbce7
    • Al Viro's avatar
      fix cgroup_do_mount() handling of failure exits · 399504e2
      Al Viro authored
      
      same story as with last May fixes in sysfs (7b745a4e
      "unfuck sysfs_mount()"); new_sb is left uninitialized
      in case of early errors in kernfs_mount_ns() and papering
      over it by treating any error from kernfs_mount_ns() as
      equivalent to !new_ns ends up conflating the cases when
      objects had never been transferred to a superblock with
      ones when that has happened and resulting new superblock
      had been dropped.  Easily fixed (same way as in sysfs
      case).  Additionally, there's a superblock leak on
      kernfs_node_dentry() failure *and* a dentry leak inside
      kernfs_node_dentry() itself - the latter on probably
      impossible errors, but the former not impossible to trigger
      (as the matter of fact, injecting allocation failures
      at that point *does* trigger it).
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      399504e2
  10. Nov 27, 2018
    • Radu Rendec's avatar
      kernfs: Improve kernfs_notify() poll notification latency · 03c0a920
      Radu Rendec authored
      
      kernfs_notify() does two notifications: poll and fsnotify. Originally,
      both notifications were done from scheduled work context and all that
      kernfs_notify() did was schedule the work.
      
      This patch simply moves the poll notification from the scheduled work
      handler to kernfs_notify(). The fsnotify notification still needs to be
      done from scheduled work context because it can sleep (it needs to lock
      a mutex).
      
      If the poll notification is time critical (the notified thread needs to
      wake as quickly as possible), it's better to do it from kernfs_notify()
      directly. One example is calling sysfs_notify_dirent() from a hardware
      interrupt handler to wake up a thread and handle the interrupt in user
      space.
      
      Signed-off-by: default avatarRadu Rendec <radu.rendec@gmail.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03c0a920
  11. Oct 26, 2018
    • Johannes Weiner's avatar
      mm: zero-seek shrinkers · 4b85afbd
      Johannes Weiner authored
      The page cache and most shrinkable slab caches hold data that has been
      read from disk, but there are some caches that only cache CPU work, such
      as the dentry and inode caches of procfs and sysfs, as well as the subset
      of radix tree nodes that track non-resident page cache.
      
      Currently, all these are shrunk at the same rate: using DEFAULT_SEEKS for
      the shrinker's seeks setting tells the reclaim algorithm that for every
      two page cache pages scanned it should scan one slab object.
      
      This is a bogus setting.  A virtual inode that required no IO to create is
      not twice as valuable as a page cache page; shadow cache entries with
      eviction distances beyond the size of memory aren't either.
      
      In most cases, the behavior in practice is still fine.  Such virtual
      caches don't tend to grow and assert themselves aggressively, and usually
      get picked up before they cause problems.  But there are scenarios where
      that's not true.
      
      Our database workloads suffer from two of those.  For one, their file
      workingset is several times bigger than available memory, which has the
      kernel aggressively create shadow page cache entries for the non-resident
      parts of it.  The workingset code does tell the VM that most of these are
      expendable, but the VM ends up balancing them 2:1 to cache pages as per
      the seeks setting.  This is a huge waste of memory.
      
      These workloads also deal with tens of thousands of open files and use
      /proc for introspection, which ends up growing the proc_inode_cache to
      absurdly large sizes - again at the cost of valuable cache space, which
      isn't a reasonable trade-off, given that proc inodes can be re-created
      without involving the disk.
      
      This patch implements a "zero-seek" setting for shrinkers that results in
      a target ratio of 0:1 between their objects and IO-backed caches.  This
      allows such virtual caches to grow when memory is available (they do
      cache/avoid CPU work after all), but effectively disables them as soon as
      IO-backed objects are under pressure.
      
      It then switches the shrinkers for procfs and sysfs metadata, as well as
      excess page cache shadow nodes, to the new zero-seek setting.
      
      Link: http://lkml.kernel.org/r/20181009184732.762-5-hannes@cmpxchg.org
      
      
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarDomas Mituzas <dmituzas@fb.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarRik van Riel <riel@surriel.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4b85afbd
  12. Sep 16, 2018
  13. Jul 21, 2018
  14. Jul 07, 2018
  15. Jun 05, 2018
    • Deepa Dinamani's avatar
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani authored
      
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following cocinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But, this version was sufficient for my usecase.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      
      Signed-off-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
  16. May 21, 2018
  17. Apr 23, 2018
  18. Feb 11, 2018
    • Linus Torvalds's avatar
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds authored
      
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      
      Scripted-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  19. Jan 19, 2018
  20. Nov 27, 2017
    • Al Viro's avatar
      fs: annotate ->poll() instances · 076ccb76
      Al Viro authored
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      076ccb76
    • Linus Torvalds's avatar
      Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds authored
      
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      
      Requested-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1751e8a6
  21. Sep 01, 2017
  22. Aug 28, 2017
  23. Jul 29, 2017
  24. Mar 17, 2017
    • Vaibhav Jain's avatar
      kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file() · 966fa72a
      Vaibhav Jain authored
      
      Recently started seeing a kernel oops when a module tries removing a
      memory mapped sysfs bin_attribute. On closer investigation the root
      cause seems to be kernfs_release_file() trying to call
      kernfs_op.release() callback that's NULL for such sysfs
      bin_attributes. The oops occurs when kernfs_release_file() is called from
      kernfs_drain_open_files() to cleanup any open handles with active
      memory mappings.
      
      The patch fixes this by checking for flag KERNFS_HAS_RELEASE before
      calling kernfs_release_file() in function kernfs_drain_open_files().
      
      On ppc64-le arch with cxl module the oops back-trace is of the
      form below:
      [  861.381126] Unable to handle kernel paging request for instruction fetch
      [  861.381360] Faulting instruction address: 0x00000000
      [  861.381428] Oops: Kernel access of bad area, sig: 11 [#1]
      ....
      [  861.382481] NIP: 0000000000000000 LR: c000000000362c60 CTR:
      0000000000000000
      ....
      Call Trace:
      [c000000f1680b750] [c000000000362c34] kernfs_drain_open_files+0x104/0x1d0 (unreliable)
      [c000000f1680b790] [c00000000035fa00] __kernfs_remove+0x260/0x2c0
      [c000000f1680b820] [c000000000360da0] kernfs_remove_by_name_ns+0x60/0xe0
      [c000000f1680b8b0] [c0000000003638f4] sysfs_remove_bin_file+0x24/0x40
      [c000000f1680b8d0] [c00000000062a164] device_remove_bin_file+0x24/0x40
      [c000000f1680b8f0] [d000000009b7b22c] cxl_sysfs_afu_remove+0x144/0x170 [cxl]
      [c000000f1680b940] [d000000009b7c7e4] cxl_remove+0x6c/0x1a0 [cxl]
      [c000000f1680b990] [c00000000052f694] pci_device_remove+0x64/0x110
      [c000000f1680b9d0] [c0000000006321d4] device_release_driver_internal+0x1f4/0x2b0
      [c000000f1680ba20] [c000000000525cb0] pci_stop_bus_device+0xa0/0xd0
      [c000000f1680ba60] [c000000000525e80] pci_stop_and_remove_bus_device+0x20/0x40
      [c000000f1680ba90] [c00000000004a6c4] pci_hp_remove_devices+0x84/0xc0
      [c000000f1680bad0] [c00000000004a688] pci_hp_remove_devices+0x48/0xc0
      [c000000f1680bb10] [c0000000009dfda4] eeh_reset_device+0xb0/0x290
      [c000000f1680bbb0] [c000000000032b4c] eeh_handle_normal_event+0x47c/0x530
      [c000000f1680bc60] [c000000000032e64] eeh_handle_event+0x174/0x350
      [c000000f1680bd10] [c000000000033228] eeh_event_handler+0x1e8/0x1f0
      [c000000f1680bdc0] [c0000000000d384c] kthread+0x14c/0x190
      [c000000f1680be30] [c00000000000b5a0] ret_from_kernel_thread+0x5c/0xbc
      
      Fixes: f83f3c51 ("kernfs: fix locking around kernfs_ops->release() callback")
      Signed-off-by: default avatarVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      966fa72a
  25. Mar 03, 2017
    • David Howells's avatar
      statx: Add a system call to make enhanced file info available · a528d35e
      David Howells authored
      
      Add a system call to make extended file information available, including
      file creation and some attribute flags where available through the
      underlying filesystem.
      
      The getattr inode operation is altered to take two additional arguments: a
      u32 request_mask and an unsigned int flags that indicate the
      synchronisation mode.  This change is propagated to the vfs_getattr*()
      function.
      
      Functions like vfs_stat() are now inline wrappers around new functions
      vfs_statx() and vfs_statx_fd() to reduce stack usage.
      
      ========
      OVERVIEW
      ========
      
      The idea was initially proposed as a set of xattrs that could be retrieved
      with getxattr(), but the general preference proved to be for a new syscall
      with an extended stat structure.
      
      A number of requests were gathered for features to be included.  The
      following have been included:
      
       (1) Make the fields a consistent size on all arches and make them large.
      
       (2) Spare space, request flags and information flags are provided for
           future expansion.
      
       (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
           __s64).
      
       (4) Creation time: The SMB protocol carries the creation time, which could
           be exported by Samba, which will in turn help CIFS make use of
           FS-Cache as that can be used for coherency data (stx_btime).
      
           This is also specified in NFSv4 as a recommended attribute and could
           be exported by NFSD [Steve French].
      
       (5) Lightweight stat: Ask for just those details of interest, and allow a
           netfs (such as NFS) to approximate anything not of interest, possibly
           without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
           Dilger] (AT_STATX_DONT_SYNC).
      
       (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
           its cached attributes are up to date [Trond Myklebust]
           (AT_STATX_FORCE_SYNC).
      
      And the following have been left out for future extension:
      
       (7) Data version number: Could be used by userspace NFS servers [Aneesh
           Kumar].
      
           Can also be used to modify fill_post_wcc() in NFSD which retrieves
           i_version directly, but has just called vfs_getattr().  It could get
           it from the kstat struct if it used vfs_xgetattr() instead.
      
           (There's disagreement on the exact semantics of a single field, since
           not all filesystems do this the same way).
      
       (8) BSD stat compatibility: Including more fields from the BSD stat such
           as creation time (st_btime) and inode generation number (st_gen)
           [Jeremy Allison, Bernd Schubert].
      
       (9) Inode generation number: Useful for FUSE and userspace NFS servers
           [Bernd Schubert].
      
           (This was asked for but later deemed unnecessary with the
           open-by-handle capability available and caused disagreement as to
           whether it's a security hole or not).
      
      (10) Extra coherency data may be useful in making backups [Andreas Dilger].
      
           (No particular data were offered, but things like last backup
           timestamp, the data version number and the DOS archive bit would come
           into this category).
      
      (11) Allow the filesystem to indicate what it can/cannot provide: A
           filesystem can now say it doesn't support a standard stat feature if
           that isn't available, so if, for instance, inode numbers or UIDs don't
           exist or are fabricated locally...
      
           (This requires a separate system call - I have an fsinfo() call idea
           for this).
      
      (12) Store a 16-byte volume ID in the superblock that can be returned in
           struct xstat [Steve French].
      
           (Deferred to fsinfo).
      
      (13) Include granularity fields in the time data to indicate the
           granularity of each of the times (NFSv4 time_delta) [Steve French].
      
           (Deferred to fsinfo).
      
      (14) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
           Note that the Linux IOC flags are a mess and filesystems such as Ext4
           define flags that aren't in linux/fs.h, so translation in the kernel
           may be a necessity (or, possibly, we provide the filesystem type too).
      
           (Some attributes are made available in stx_attributes, but the general
           feeling was that the IOC flags were to ext[234]-specific and shouldn't
           be exposed through statx this way).
      
      (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
           Michael Kerrisk].
      
           (Deferred, probably to fsinfo.  Finding out if there's an ACL or
           seclabal might require extra filesystem operations).
      
      (16) Femtosecond-resolution timestamps [Dave Chinner].
      
           (A __reserved field has been left in the statx_timestamp struct for
           this - if there proves to be a need).
      
      (17) A set multiple attributes syscall to go with this.
      
      ===============
      NEW SYSTEM CALL
      ===============
      
      The new system call is:
      
      	int ret = statx(int dfd,
      			const char *filename,
      			unsigned int flags,
      			unsigned int mask,
      			struct statx *buffer);
      
      The dfd, filename and flags parameters indicate the file to query, in a
      similar way to fstatat().  There is no equivalent of lstat() as that can be
      emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags.  There is
      also no equivalent of fstat() as that can be emulated by passing a NULL
      filename to statx() with the fd of interest in dfd.
      
      Whether or not statx() synchronises the attributes with the backing store
      can be controlled by OR'ing a value into the flags argument (this typically
      only affects network filesystems):
      
       (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
           respect.
      
       (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
           its attributes with the server - which might require data writeback to
           occur to get the timestamps correct.
      
       (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
           network filesystem.  The resulting values should be considered
           approximate.
      
      mask is a bitmask indicating the fields in struct statx that are of
      interest to the caller.  The user should set this to STATX_BASIC_STATS to
      get the basic set returned by stat().  It should be noted that asking for
      more information may entail extra I/O operations.
      
      buffer points to the destination for the data.  This must be 256 bytes in
      size.
      
      ======================
      MAIN ATTRIBUTES RECORD
      ======================
      
      The following structures are defined in which to return the main attribute
      set:
      
      	struct statx_timestamp {
      		__s64	tv_sec;
      		__s32	tv_nsec;
      		__s32	__reserved;
      	};
      
      	struct statx {
      		__u32	stx_mask;
      		__u32	stx_blksize;
      		__u64	stx_attributes;
      		__u32	stx_nlink;
      		__u32	stx_uid;
      		__u32	stx_gid;
      		__u16	stx_mode;
      		__u16	__spare0[1];
      		__u64	stx_ino;
      		__u64	stx_size;
      		__u64	stx_blocks;
      		__u64	__spare1[1];
      		struct statx_timestamp	stx_atime;
      		struct statx_timestamp	stx_btime;
      		struct statx_timestamp	stx_ctime;
      		struct statx_timestamp	stx_mtime;
      		__u32	stx_rdev_major;
      		__u32	stx_rdev_minor;
      		__u32	stx_dev_major;
      		__u32	stx_dev_minor;
      		__u64	__spare2[14];
      	};
      
      The defined bits in request_mask and stx_mask are:
      
      	STATX_TYPE		Want/got stx_mode & S_IFMT
      	STATX_MODE		Want/got stx_mode & ~S_IFMT
      	STATX_NLINK		Want/got stx_nlink
      	STATX_UID		Want/got stx_uid
      	STATX_GID		Want/got stx_gid
      	STATX_ATIME		Want/got stx_atime{,_ns}
      	STATX_MTIME		Want/got stx_mtime{,_ns}
      	STATX_CTIME		Want/got stx_ctime{,_ns}
      	STATX_INO		Want/got stx_ino
      	STATX_SIZE		Want/got stx_size
      	STATX_BLOCKS		Want/got stx_blocks
      	STATX_BASIC_STATS	[The stuff in the normal stat struct]
      	STATX_BTIME		Want/got stx_btime{,_ns}
      	STATX_ALL		[All currently available stuff]
      
      stx_btime is the file creation time, stx_mask is a bitmask indicating the
      data provided and __spares*[] are where as-yet undefined fields can be
      placed.
      
      Time fields are structures with separate seconds and nanoseconds fields
      plus a reserved field in case we want to add even finer resolution.  Note
      that times will be negative if before 1970; in such a case, the nanosecond
      fields will also be negative if not zero.
      
      The bits defined in the stx_attributes field convey information about a
      file, how it is accessed, where it is and what it does.  The following
      attributes map to FS_*_FL flags and are the same numerical value:
      
      	STATX_ATTR_COMPRESSED		File is compressed by the fs
      	STATX_ATTR_IMMUTABLE		File is marked immutable
      	STATX_ATTR_APPEND		File is append-only
      	STATX_ATTR_NODUMP		File is not to be dumped
      	STATX_ATTR_ENCRYPTED		File requires key to decrypt in fs
      
      Within the kernel, the supported flags are listed by:
      
      	KSTAT_ATTR_FS_IOC_FLAGS
      
      [Are any other IOC flags of sufficient general interest to be exposed
      through this interface?]
      
      New flags include:
      
      	STATX_ATTR_AUTOMOUNT		Object is an automount trigger
      
      These are for the use of GUI tools that might want to mark files specially,
      depending on what they are.
      
      Fields in struct statx come in a number of classes:
      
       (0) stx_dev_*, stx_blksize.
      
           These are local system information and are always available.
      
       (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino,
           stx_size, stx_blocks.
      
           These will be returned whether the caller asks for them or not.  The
           corresponding bits in stx_mask will be set to indicate whether they
           actually have valid values.
      
           If the caller didn't ask for them, then they may be approximated.  For
           example, NFS won't waste any time updating them from the server,
           unless as a byproduct of updating something requested.
      
           If the values don't actually exist for the underlying object (such as
           UID or GID on a DOS file), then the bit won't be set in the stx_mask,
           even if the caller asked for the value.  In such a case, the returned
           value will be a fabrication.
      
           Note that there are instances where the type might not be valid, for
           instance Windows reparse points.
      
       (2) stx_rdev_*.
      
           This will be set only if stx_mode indicates we're looking at a
           blockdev or a chardev, otherwise will be 0.
      
       (3) stx_btime.
      
           Similar to (1), except this will be set to 0 if it doesn't exist.
      
      =======
      TESTING
      =======
      
      The following test program can be used to test the statx system call:
      
      	samples/statx/test-statx.c
      
      Just compile and run, passing it paths to the files you want to examine.
      The file is built automatically if CONFIG_SAMPLES is enabled.
      
      Here's some example output.  Firstly, an NFS directory that crosses to
      another FSID.  Note that the AUTOMOUNT attribute is set because transiting
      this directory will cause d_automount to be invoked by the VFS.
      
      	[root@andromeda ~]# /tmp/test-statx -A /warthog/data
      	statx(/warthog/data) = 0
      	results=7ff
      	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
      	Device: 00:26           Inode: 1703937     Links: 125
      	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
      	Access: 2016-11-24 09:02:12.219699527+0000
      	Modify: 2016-11-17 10:44:36.225653653+0000
      	Change: 2016-11-17 10:44:36.225653653+0000
      	Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)
      
      Secondly, the result of automounting on that directory.
      
      	[root@andromeda ~]# /tmp/test-statx /warthog/data
      	statx(/warthog/data) = 0
      	results=7ff
      	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
      	Device: 00:27           Inode: 2           Links: 125
      	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
      	Access: 2016-11-24 09:02:12.219699527+0000
      	Modify: 2016-11-17 10:44:36.225653653+0000
      	Change: 2016-11-17 10:44:36.225653653+0000
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      a528d35e
  26. Mar 02, 2017
  27. Feb 25, 2017
  28. Feb 21, 2017
    • Tejun Heo's avatar
      kernfs: fix locking around kernfs_ops->release() callback · f83f3c51
      Tejun Heo authored
      
      The release callback may be called from two places - file release
      operation and kernfs open file draining.  kernfs_open_file->mutex is
      used to synchronize the two callsites.  This unfortunately leads to
      possible circular locking because of->mutex is used to protect the
      usual kernfs operations which may use locking constructs which are
      held while removing and thus draining kernfs files.
      
      @of->mutex is for synchronizing concurrent kernfs access operations
      and all we need here is synchronization between the releaes and drain
      paths.  As the drain path has to grab kernfs_open_file_mutex anyway,
      let's use the mutex to synchronize the release operation instead.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-and-tested-by: default avatarTony Lindgren <tony@atomide.com>
      Fixes: 0e67db2f ("kernfs: add kernfs_ops->open/release() callbacks")
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f83f3c51
Loading