- Mar 08, 2019
-
-
Gustavo A. R. Silva authored
Use kvzalloc() instead of kvmalloc() and memset(). Also, make use of the struct_size() helper instead of the open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle. Link: http://lkml.kernel.org/r/20190131214221.GA28930@embeddedor Signed-off-by:
Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mathieu Malaterre authored
There is a plan to build the kernel with -Wimplicit-fallthrough and this place in the code produced a warning (W=1). This commit remove the following warning: ipc/sem.c:1683:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Link: http://lkml.kernel.org/r/20190114203608.18218-1-malat@debian.org Signed-off-by:
Mathieu Malaterre <malat@debian.org> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 28, 2019
-
-
David Howells authored
Convert the mqueue filesystem to use the filesystem context stuff. Notes: (1) The relevant ipc namespace is selected in when the context is initialised (and it defaults to the current task's ipc namespace). The caller can override this before calling vfs_get_tree(). (2) Rather than simply calling kern_mount_data(), mq_init_ns() and mq_internal_mount() create a context, adjust it and then do the rest of the mount procedure. (3) The lazy mqueue mounting on creation of a new namespace is retained from a previous patch, but the avoidance of sget() if no superblock yet exists is reverted and the superblock is again keyed on the namespace pointer. Yes, there was a performance gain in not searching the superblock hash, but it's only paid once per ipc namespace - and only if someone uses mqueue within that namespace, so I'm not sure it's worth it, especially as calling sget() allows avoidance of recursion. Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Feb 06, 2019
-
-
Arnd Bergmann authored
A lot of system calls that pass a time_t somewhere have an implementation using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have been reworked so that this implementation can now be used on 32-bit architectures as well. The missing step is to redefine them using the regular SYSCALL_DEFINEx() to get them out of the compat namespace and make it possible to build them on 32-bit architectures. Any system call that ends in 'time' gets a '32' suffix on its name for that version, while the others get a '_time32' suffix, to distinguish them from the normal version, which takes a 64-bit time argument in the future. In this step, only 64-bit architectures are changed, doing this rename first lets us avoid touching the 32-bit architectures twice. Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
- Jan 25, 2019
-
-
Arnd Bergmann authored
The behavior of these system calls is slightly different between architectures, as determined by the CONFIG_ARCH_WANT_IPC_PARSE_VERSION symbol. Most architectures that implement the split IPC syscalls don't set that symbol and only get the modern version, but alpha, arm, microblaze, mips-n32, mips-n64 and xtensa expect the caller to pass the IPC_64 flag. For the architectures that so far only implement sys_ipc(), i.e. m68k, mips-o32, powerpc, s390, sh, sparc, and x86-32, we want the new behavior when adding the split syscalls, so we need to distinguish between the two groups of architectures. The method I picked for this distinction is to have a separate system call entry point: sys_old_*ctl() now uses ipc_parse_version, while sys_*ctl() does not. The system call tables of the five architectures are changed accordingly. As an additional benefit, we no longer need the configuration specific definition for ipc_parse_version(), it always does the same thing now, but simply won't get called on architectures with the modern interface. A small downside is that on architectures that do set ARCH_WANT_IPC_PARSE_VERSION, we now have an extra set of entry points that are never called. They only add a few bytes of bloat, so it seems better to keep them compared to adding yet another Kconfig symbol. I considered adding new syscall numbers for the IPC_64 variants for consistency, but decided against that for now. Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
- Jan 18, 2019
-
-
Arnd Bergmann authored
The sys_ipc() and compat_ksys_ipc() functions are meant to only be used from the system call table, not called by another function. Introduce ksys_*() interfaces for this purpose, as we have done for many other system calls. Link: https://lore.kernel.org/lkml/20190116131527.2071570-3-arnd@arndb.de Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Reviewed-by:
Heiko Carstens <heiko.carstens@de.ibm.com> [heiko.carstens@de.ibm.com: compile fix for !CONFIG_COMPAT] Signed-off-by:
Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- Oct 31, 2018
-
-
Waiman Long authored
For SysV semaphores, the semmni value is the last part of the 4-element sem number array. To make semmni behave in a similar way to msgmni and shmmni, we can't directly use the _minmax handler. Instead, a special sem specific handler is added to check the last argument to make sure that it is limited to the [0, IPCMNI] range. An error will be returned if this is not the case. Link: http://lkml.kernel.org/r/1536352137-12003-3-git-send-email-longman@redhat.com Signed-off-by:
Waiman Long <longman@redhat.com> Reviewed-by:
Davidlohr Bueso <dave@stgolabs.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Waiman Long authored
Patch series "ipc: IPCMNI limit check for *mni & increase that limit", v9. The sysctl parameters msgmni, shmmni and semmni have an inherent limit of IPC_MNI (32k). However, users may not be aware of that because they can write a value much higher than that without getting any error or notification. Reading the parameters back will show the newly written values which are not real. The real IPCMNI limit is now enforced to make sure that users won't put in an unrealistic value. The first 2 patches enforce the limits. There are also users out there requesting increase in the IPCMNI value. The last 2 patches attempt to do that by using a boot kernel parameter "ipcmni_extend" to increase the IPCMNI limit from 32k to 8M if the users really want the extended value. This patch (of 4): A user can write arbitrary integer values to msgmni and shmmni sysctl parameters without getting error, but the actual limit is really IPCMNI (32k). This can mislead users as they think they can get a value that is not real. The right limits are now set for msgmni and shmmni so that the users will become aware if they set a value outside of the acceptable range. Link: http://lkml.kernel.org/r/1536352137-12003-2-git-send-email-longman@redhat.com Signed-off-by:
Waiman Long <longman@redhat.com> Acked-by:
Luis R. Rodriguez <mcgrof@kernel.org> Reviewed-by:
Davidlohr Bueso <dave@stgolabs.net> Cc: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Oct 05, 2018
-
-
Kees Cook authored
This uses ERR_CAST() instead of an open-coded cast, as it is casting across structure pointers, which upsets __randomize_layout: ipc/shm.c: In function `shm_lock': ipc/shm.c:209:9: note: randstruct: casting between randomized structure pointer types (ssa): `struct shmid_kernel' and `struct kern_ipc_perm' return (void *)ipcp; ^~~~~~~~~~~~ Link: http://lkml.kernel.org/r/20180919180722.GA15073@beast Fixes: 82061c57 ("ipc: drop ipc_lock()") Signed-off-by:
Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Oct 03, 2018
-
-
Eric W. Biederman authored
Linus recently observed that if we did not worry about the padding member in struct siginfo it is only about 48 bytes, and 48 bytes is much nicer than 128 bytes for allocating on the stack and copying around in the kernel. The obvious thing of only adding the padding when userspace is including siginfo.h won't work as there are sigframe definitions in the kernel that embed struct siginfo. So split siginfo in two; kernel_siginfo and siginfo. Keeping the traditional name for the userspace definition. While the version that is used internally to the kernel and ultimately will not be padded to 128 bytes is called kernel_siginfo. The definition of struct kernel_siginfo I have put in include/signal_types.h A set of buildtime checks has been added to verify the two structures have the same field offsets. To make it easy to verify the change kernel_siginfo retains the same size as siginfo. The reduction in size comes in a following change. Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com>
-
- Sep 04, 2018
-
-
Davidlohr Bueso authored
When getting rid of the general ipc_lock(), this was missed furthermore, making the comment around the ipc object validity check bogus. Under EIDRM conditions, callers will in turn not see the error and continue with the operation. Link: http://lkml.kernel.org/r/20180824030920.GD3677@linux-r8p5 Link: http://lkml.kernel.org/r/20180823024051.GC13343@shao2-debian Fixes: 82061c57 ("ipc: drop ipc_lock()") Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Reported-by:
kernel test robot <rong.a.chen@intel.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Aug 27, 2018
-
-
Arnd Bergmann authored
Christoph Hellwig suggested a slightly different path for handling backwards compatibility with the 32-bit time_t based system calls: Rather than simply reusing the compat_sys_* entry points on 32-bit architectures unchanged, we get rid of those entry points and the compat_time types by renaming them to something that makes more sense on 32-bit architectures (which don't have a compat mode otherwise), and then share the entry points under the new name with the 64-bit architectures that use them for implementing the compatibility. The following types and interfaces are renamed here, and moved from linux/compat_time.h to linux/time32.h: old new --- --- compat_time_t old_time32_t struct compat_timeval struct old_timeval32 struct compat_timespec struct old_timespec32 struct compat_itimerspec struct old_itimerspec32 ns_to_compat_timeval() ns_to_old_timeval32() get_compat_itimerspec64() get_old_itimerspec32() put_compat_itimerspec64() put_old_itimerspec32() compat_get_timespec64() get_old_timespec32() compat_put_timespec64() put_old_timespec32() As we already have aliases in place, this patch addresses only the instances that are relevant to the system call interface in particular, not those that occur in device drivers and other modules. Those will get handled separately, while providing the 64-bit version of the respective interfaces. I'm not renaming the timex, rusage and itimerval structures, as we are still debating what the new interface will look like, and whether we will need a replacement at all. This also doesn't change the names of the syscall entry points, which can be done more easily when we actually switch over the 32-bit architectures to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix. Suggested-by:
Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/ Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
- Aug 22, 2018
-
-
Manfred Spraul authored
ipc_getref has still a return value of type "int", matching the atomic_t interface of atomic_inc_not_zero()/atomic_add_unless(). ipc_getref now uses refcount_inc_not_zero, which has a return value of type "bool". Therefore, update the return code to avoid implicit conversions. Link: http://lkml.kernel.org/r/20180712185241.4017-13-manfred@colorfullife.com Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
The varable names got a mess, thus standardize them again: id: user space id. Called semid, shmid, msgid if the type is known. Most functions use "id" already. idx: "index" for the idr lookup Right now, some functions use lid, ipc_addid() already uses idx as the variable name. seq: sequence number, to avoid quick collisions of the user space id key: user space key, used for the rhash tree Link: http://lkml.kernel.org/r/20180712185241.4017-12-manfred@colorfullife.com Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
Now that we know that rhashtable_init() will not fail, we can get rid of a lot of the unnecessary cleanup paths when the call errored out. [manfred@colorfullife.com: variable name added to util.h to resolve checkpatch warning] Link: http://lkml.kernel.org/r/20180712185241.4017-11-manfred@colorfullife.com Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
In sysvipc we have an ids->tables_initialized regarding the rhashtable, introduced in 0cfb6aee ("ipc: optimize semget/shmget/msgget for lots of keys") It's there, specifically, to prevent nil pointer dereferences, from using an uninitialized api. Considering how rhashtable_init() can fail (probably due to ENOMEM, if anything), this made the overall ipc initialization capable of failure as well. That alone is ugly, but fine, however I've spotted a few issues regarding the semantics of tables_initialized (however unlikely they may be): - There is inconsistency in what we return to userspace: ipc_addid() returns ENOSPC which is certainly _wrong_, while ipc_obtain_object_idr() returns EINVAL. - After we started using rhashtables, ipc_findkey() can return nil upon !tables_initialized, but the caller expects nil for when the ipc structure isn't found, and can therefore call into ipcget() callbacks. Now that rhashtable initialization cannot fail, we can properly get rid of the hack altogether. [manfred@colorfullife.com: commit id extended to 12 digits] Link: http://lkml.kernel.org/r/20180712185241.4017-10-manfred@colorfullife.com Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
ipc/util.c contains multiple functions to get the ipc object pointer given an id number. There are two sets of function: One set verifies the sequence counter part of the id number, other functions do not check the sequence counter. The standard for function names in ipc/util.c is - ..._check() functions verify the sequence counter - ..._idr() functions do not verify the sequence counter ipc_lock() is an exception: It does not verify the sequence counter value, but this is not obvious from the function name. Furthermore, shm.c is the only user of this helper. Thus, we can simply move the logic into shm_lock() and get rid of the function altogether. [manfred@colorfullife.com: most of changelog] Link: http://lkml.kernel.org/r/20180712185241.4017-7-manfred@colorfullife.com Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
The comment that explains ipc_obtain_object_check is wrong: The function checks the sequence number, not the reference counter. Note that checking the reference counter would be meaningless: The reference counter is decreased without holding any locks, thus an object with kern_ipc_perm.deleted=true may disappear at the end of the next rcu grace period. Link: http://lkml.kernel.org/r/20180712185241.4017-6-manfred@colorfullife.com Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Reviewed-by:
Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
Both the comment and the name of ipcctl_pre_down_nolock() are misleading: The function must be called while holdling the rw semaphore. Therefore the patch renames the function to ipcctl_obtain_check(): This name matches the other names used in util.c: - "obtain" function look up a pointer in the idr, without acquiring the object lock. - The caller is responsible for locking. - _check means that the sequence number is checked. Link: http://lkml.kernel.org/r/20180712185241.4017-5-manfred@colorfullife.com Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Reviewed-by:
Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
ipc_addid() is impossible to use: - for certain failures, the caller must not use ipc_rcu_putref(), because the reference counter is not yet initialized. - for other failures, the caller must use ipc_rcu_putref(), because parallel operations could be ongoing already. The patch cleans that up, by initializing the refcount early, and by modifying all callers. The issues is related to the finding of syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an issue with reading kern_ipc_perm.seq, here both read and write to already released memory could happen. Link: http://lkml.kernel.org/r/20180712185241.4017-4-manfred@colorfullife.com Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
ipc_addid() initializes kern_ipc_perm.seq after having called idr_alloc() (within ipc_idr_alloc()). Thus a parallel semop() or msgrcv() that uses ipc_obtain_object_check() may see an uninitialized value. The patch moves the initialization of kern_ipc_perm.seq before the calls of idr_alloc(). Notes: 1) This patch has a user space visible side effect: If /proc/sys/kernel/*_next_id is used (i.e.: checkpoint/restore) and if semget()/msgget()/shmget() fails in the final step of adding the id to the rhash tree, then .._next_id is cleared. Before the patch, is remained unmodified. There is no change of the behavior after a successful ..get() call: It always clears .._next_id, there is no impact to non checkpoint/restore code as that code does not use .._next_id. 2) The patch correctly documents that after a call to ipc_idr_alloc(), the full tear-down sequence must be used. The callers of ipc_addid() do not fullfill that, i.e. more bugfixes are required. The patch is a squash of a patch from Dmitry and my own changes. Link: http://lkml.kernel.org/r/20180712185241.4017-3-manfred@colorfullife.com Reported-by:
<syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com> Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
ipc_addid() initializes kern_ipc_perm.id after having called ipc_idr_alloc(). Thus a parallel semctl() or msgctl() that uses e.g. MSG_STAT may use this unitialized value as the return code. The patch moves all accesses to kern_ipc_perm.id under the spin_lock(). The issues is related to the finding of syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an issue with kern_ipc_perm.seq Link: http://lkml.kernel.org/r/20180712185241.4017-2-manfred@colorfullife.com Signed-off-by:
Manfred Spraul <manfred@colorfullife.com> Reviewed-by:
Davidlohr Bueso <dbueso@suse.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Aug 02, 2018
-
-
Jane Chu authored
Commit 05ea8860 ("mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct") adds a new ->pagesize() function to hugetlb_vm_ops, intended to cover all hugetlbfs backed files. With System V shared memory model, if "huge page" is specified, the "shared memory" is backed by hugetlbfs files, but the mappings initiated via shmget/shmat have their original vm_ops overwritten with shm_vm_ops, so we need to add a ->pagesize function to shm_vm_ops. Otherwise, vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs backed vma, result in below BUG: fs/hugetlbfs/inode.c 443 if (unlikely(page_mapped(page))) { 444 BUG_ON(truncate_op); resulting in hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated ------------[ cut here ]------------ kernel BUG at fs/hugetlbfs/inode.c:444! Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ... CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2 RIP: 0010:remove_inode_hugepages+0x3db/0x3e2 .... Call Trace: hugetlbfs_evict_inode+0x1e/0x3e evict+0xdb/0x1af iput+0x1a2/0x1f7 dentry_unlink_inode+0xc6/0xf0 __dentry_kill+0xd8/0x18d dput+0x1b5/0x1ed __fput+0x18b/0x216 ____fput+0xe/0x10 task_work_run+0x90/0xa7 exit_to_usermode_loop+0xdd/0x116 do_syscall_64+0x187/0x1ae entry_SYSCALL_64_after_hwframe+0x150/0x0 [jane.chu@oracle.com: relocate comment] Link: http://lkml.kernel.org/r/20180731044831.26036-1-jane.chu@oracle.com Link: http://lkml.kernel.org/r/20180727211727.5020-1-jane.chu@oracle.com Fixes: 05ea8860 ("mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct") Signed-off-by:
Jane Chu <jane.chu@oracle.com> Suggested-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Acked-by:
Davidlohr Bueso <dave@stgolabs.net> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jul 27, 2018
-
-
Davidlohr Bueso authored
In order for load/store tearing prevention to work, _all_ accesses to the variable in question need to be done around READ and WRITE_ONCE() macros. Ensure everyone does so for q->status variable for semtimedop(). Link: http://lkml.kernel.org/r/20180717052654.676-1-dave@stgolabs.net Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jul 12, 2018
-
-
Al Viro authored
Acked-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... so that it could set both ->f_flags and ->f_mode, without callers having to set ->f_flags manually. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Jun 22, 2018
-
-
NeilBrown authored
Due to the use of rhashtables in net namespaces, rhashtable.h is included in lots of the kernel, so a small changes can required a large recompilation. This makes development painful. This patch splits out rhashtable-types.h which just includes the major type declarations, and does not include (non-trivial) inline code. rhashtable.h is no longer included by anything in the include/ directory. Common include files only include rhashtable-types.h so a large recompilation is only triggered when that changes. Acked-by:
Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
NeilBrown <neilb@suse.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jun 14, 2018
-
-
Souptick Joarder authored
Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. Commit 1c8f4220 ("mm: change return type to vm_fault_t") Link: http://lkml.kernel.org/r/20180425043413.GA21467@jordon-HP-15-Notebook-PC Signed-off-by:
Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by:
Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Acked-by:
Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
Both smatch and coverity are reporting potential issues with spectre variant 1 with the 'semnum' index within the sma->sems array, ie: ipc/sem.c:388 sem_lock() warn: potential spectre issue 'sma->sems' ipc/sem.c:641 perform_atomic_semop_slow() warn: potential spectre issue 'sma->sems' ipc/sem.c:721 perform_atomic_semop() warn: potential spectre issue 'sma->sems' Avoid any possible speculation by using array_index_nospec() thus ensuring the semnum value is bounded to [0, sma->sem_nsems). With the exception of sem_lock() all of these are slowpaths. Link: http://lkml.kernel.org/r/20180423171131.njs4rfm2yzyeg6do@linux-n805 Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Reported-by:
Dan Carpenter <dan.carpenter@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jun 12, 2018
-
-
Kees Cook authored
The kvmalloc() function has a 2-factor argument form, kvmalloc_array(). This patch replaces cases of: kvmalloc(a * b, gfp) with: kvmalloc_array(a * b, gfp) as well as handling cases of: kvmalloc(a * b * c, gfp) with: kvmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kvmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kvmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kvmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kvmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kvmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kvmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kvmalloc( - sizeof(char) * COUNT + COUNT , ...) | kvmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kvmalloc + kvmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kvmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kvmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kvmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kvmalloc(C1 * C2 * C3, ...) | kvmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kvmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kvmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kvmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kvmalloc(sizeof(THING) * C2, ...) | kvmalloc(sizeof(TYPE) * C2, ...) | kvmalloc(C1 * C2 * C3, ...) | kvmalloc(C1 * C2, ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kvmalloc + kvmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kvmalloc + kvmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by:
Kees Cook <keescook@chromium.org>
-
- May 26, 2018
-
-
Davidlohr Bueso authored
shmat()'s SHM_REMAP option forbids passing a nil address for; this is in fact the very first thing we check for. Andrea reported that for SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check, but we need to check again if the address was rounded down to nil. As of this patch, such cases will return -EINVAL. Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805 Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Reported-by:
Andrea Arcangeli <aarcange@redhat.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
Patch series "ipc/shm: shmat() fixes around nil-page". These patches fix two issues reported[1] a while back by Joe and Andrea around how shmat(2) behaves with nil-page. The first reverts a commit that it was incorrectly thought that mapping nil-page (address=0) was a no no with MAP_FIXED. This is not the case, with the exception of SHM_REMAP; which is address in the second patch. I chose two patches because it is easier to backport and it explicitly reverts bogus behaviour. Both patches ought to be in -stable and ltp testcases need updated (the added testcase around the cve can be modified to just test for SHM_RND|SHM_REMAP). [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805 This patch (of 2): Commit 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection") worked on the idea that we should not be mapping as root addr=0 and MAP_FIXED. However, it was reported that this scenario is in fact valid, thus making the patch both bogus and breaks userspace as well. For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem initialization[1]. [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347 Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net Fixes: 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection") Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Reported-by:
Joe Lawrence <joe.lawrence@redhat.com> Reported-by:
Andrea Arcangeli <aarcange@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 20, 2018
-
-
Arnd Bergmann authored
32-bit architectures implementing 64BIT_TIME and COMPAT_32BIT_TIME need to have the traditional semtimedop() behavior with 32-bit timestamps for sys_ipc() by calling compat_ksys_semtimedop(), while those that are not yet converted need to keep using ksys_semtimedop() like 64-bit architectures do. Note that I chose to not implement a new SEMTIMEDOP64 function that corresponds to the new sys_semtimedop() with 64-bit timeouts. The reason here is that sys_ipc() should no longer be used for new system calls, and libc should just call the semtimedop syscall directly. One open question remain to whether we want to completely avoid the sys_ipc() system call for architectures that do not yet have all the individual calls as they get converted to 64-bit time_t. Doing that would require adding several extra system calls on m68k, mips, powerpc, s390, sh, sparc, and x86-32. Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
Arnd Bergmann authored
Three ipc syscalls (mq_timedsend, mq_timedreceive and and semtimedop) take a timespec argument. After we move 32-bit architectures over to useing 64-bit time_t based syscalls, we need seperate entry points for the old 32-bit based interfaces. This changes the #ifdef guards for the existing 32-bit compat syscalls to check for CONFIG_COMPAT_32BIT_TIME instead, which will then be enabled on all existing 32-bit architectures. Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
Arnd Bergmann authored
This is a preparatation for changing over __kernel_timespec to 64-bit times, which involves assigning new system call numbers for mq_timedsend(), mq_timedreceive() and semtimedop() for compatibility with future y2038 proof user space. The existing ABIs will remain available through compat code. Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
Arnd Bergmann authored
The shmid64_ds/semid64_ds/msqid64_ds data structures have been extended to contain extra fields for storing the upper bits of the time stamps, this patch does the other half of the job and and fills the new fields on 32-bit architectures as well as 32-bit tasks running on a 64-bit kernel in compat mode. There should be no change for native 64-bit tasks. Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
Arnd Bergmann authored
In some places, we still used get_seconds() instead of ktime_get_real_seconds(), and I'm changing the remaining ones now to all use ktime_get_real_seconds() so we use the full available range for timestamps instead of overflowing the 'unsigned long' return value in year 2106 on 32-bit kernels. Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
- Apr 14, 2018
-
-
Eric Biggers authored
syzbot reported a use-after-free of shm_file_data(file)->file->f_op in shm_get_unmapped_area(), called via sys_remap_file_pages(). Unfortunately it couldn't generate a reproducer, but I found a bug which I think caused it. When remap_file_pages() is passed a full System V shared memory segment, the memory is first unmapped, then a new map is created using the ->vm_file. Between these steps, the shm ID can be removed and reused for a new shm segment. But, shm_mmap() only checks whether the ID is currently valid before calling the underlying file's ->mmap(); it doesn't check whether it was reused. Thus it can use the wrong underlying file, one that was already freed. Fix this by making the "outer" shm file (the one that gets put in ->vm_file) hold a reference to the real shm file, and by making __shm_open() require that the file associated with the shm ID matches the one associated with the "outer" file. Taking the reference to the real shm file is needed to fully solve the problem, since otherwise sfd->file could point to a freed file, which then could be reallocated for the reused shm ID, causing the wrong shm segment to be mapped (and without the required permission checks). Commit 1ac0b6de ("ipc/shm: handle removed segments gracefully in shm_mmap()") almost fixed this bug, but it didn't go far enough because it didn't consider the case where the shm ID is reused. The following program usually reproduces this bug: #include <stdlib.h> #include <sys/shm.h> #include <sys/syscall.h> #include <unistd.h> int main() { int is_parent = (fork() != 0); srand(getpid()); for (;;) { int id = shmget(0xF00F, 4096, IPC_CREAT|0700); if (is_parent) { void *addr = shmat(id, NULL, 0); usleep(rand() % 50); while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0)); } else { usleep(rand() % 50); shmctl(id, IPC_RMID, NULL); } } } It causes the following NULL pointer dereference due to a 'struct file' being used while it's being freed. (I couldn't actually get a KASAN use-after-free splat like in the syzbot report. But I think it's possible with this bug; it would just take a more extraordinary race...) BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 RIP: 0010:d_inode include/linux/dcache.h:519 [inline] RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724 [...] Call Trace: file_accessed include/linux/fs.h:2063 [inline] shmem_mmap+0x25/0x40 mm/shmem.c:2149 call_mmap include/linux/fs.h:1789 [inline] shm_mmap+0x34/0x80 ipc/shm.c:465 call_mmap include/linux/fs.h:1789 [inline] mmap_region+0x309/0x5b0 mm/mmap.c:1712 do_mmap+0x294/0x4a0 mm/mmap.c:1483 do_mmap_pgoff include/linux/mm.h:2235 [inline] SYSC_remap_file_pages mm/mmap.c:2853 [inline] SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769 do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ebiggers@google.com: add comment] Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com Reported-by:
<syzbot+d11f321e7f1923157eac80aa990b446596f46439@syzkaller.appspotmail.com> Fixes: c8d78c18 ("mm: replace remap_file_pages() syscall with emulation") Signed-off-by:
Eric Biggers <ebiggers@google.com> Acked-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by:
Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 11, 2018
-
-
Andrew Morton authored
This was added by the recent "ipc/shm.c: add split function to shm_vm_ops", but it is not necessary. Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
There is a permission discrepancy when consulting msq ipc object metadata between /proc/sysvipc/msg (0444) and the MSG_STAT shmctl command. The later does permission checks for the object vs S_IRUGO. As such there can be cases where EACCESS is returned via syscall but the info is displayed anyways in the procfs files. While this might have security implications via info leaking (albeit no writing to the msq metadata), this behavior goes way back and showing all the objects regardless of the permissions was most likely an overlook - so we are stuck with it. Furthermore, modifying either the syscall or the procfs file can cause userspace programs to break (ie ipcs). Some applications require getting the procfs info (without root privileges) and can be rather slow in comparison with a syscall -- up to 500x in some reported cases for shm. This patch introduces a new MSG_STAT_ANY command such that the msq ipc object permissions are ignored, and only audited instead. In addition, I've left the lsm security hook checks in place, as if some policy can block the call, then the user has no other choice than just parsing the procfs file. Link: http://lkml.kernel.org/r/20180215162458.10059-4-dave@stgolabs.net Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Reported-by:
Robert Kettler <robert.kettler@outlook.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-