dev_cgroup->exceptions is protected with devcgroup_mutex for writes
and RCU for reads; however, RCU usage isn't correct.
* dev_exception_clean() doesn't use RCU variant of list_del() and
kfree(). The function can race with may_access() and may_access()
may end up dereferencing already freed memory. Use list_del_rcu()
and kfree_rcu() instead.
* may_access() may be called only with RCU read locked but doesn't use
RCU safe traversal over ->exceptions. Use list_for_each_entry_rcu().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: stable@vger.kernel.org
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
In 4cef7299b4 ("device_cgroup: add proper checking when changing
default behavior") the cgroup parent usage is unchecked. root will not
have a parent and trying to use device.{allow,deny} will cause problems.
For some reason my stressing scripts didn't test the root directory so I
didn't catch it on my regular tests.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
BugLink: http://bugs.launchpad.net/bugs/1056078
Profile replacement can cause long chains of profiles to build up when
the profile being replaced is pinned. When the pinned profile is finally
freed, it puts the reference to its replacement, which may in turn nest
another call to free_profile on the stack. Because this may happen for
each profile in the replacedby chain this can result in a recusion that
causes the stack to overflow.
Break this nesting by directly walking the chain of replacedby profiles
(ie. use iteration instead of recursion to free the list). This results
in at most 2 levels of free_profile being called, while freeing a
replacedby chain.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
replace_fd() began with "eats a reference, tries to insert into
descriptor table" semantics; at some point I'd switched it to
much saner current behaviour ("try to insert into descriptor
table, grabbing a new reference if inserted; caller should do
fput() in any case"), but forgot to update the callers.
Mea culpa...
[Spotted by Pavel Roskin, who has really weird system with pipe-fed
coredumps as part of what he considers a normal boot ;-)]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull module signing support from Rusty Russell:
"module signing is the highlight, but it's an all-over David Howells frenzy..."
Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.
* 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
X.509: Fix indefinite length element skip error handling
X.509: Convert some printk calls to pr_devel
asymmetric keys: fix printk format warning
MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
MODSIGN: Make mrproper should remove generated files.
MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
MODSIGN: Use the same digest for the autogen key sig as for the module sig
MODSIGN: Sign modules during the build process
MODSIGN: Provide a script for generating a key ID from an X.509 cert
MODSIGN: Implement module signature checking
MODSIGN: Provide module signing public keys to the kernel
MODSIGN: Automatically generate module signing keys if missing
MODSIGN: Provide Kconfig options
MODSIGN: Provide gitignore and make clean rules for extra files
MODSIGN: Add FIPS policy
module: signature checking hook
X.509: Add a crypto key parser for binary (DER) X.509 certificates
MPILIB: Provide a function to read raw data into an MPI
X.509: Add an ASN.1 decoder
X.509: Add simple ASN.1 grammar compiler
...
Merge patches from Andrew Morton:
"A few misc things and very nearly all of the MM tree. A tremendous
amount of stuff (again), including a significant rbtree library
rework."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (160 commits)
sparc64: Support transparent huge pages.
mm: thp: Use more portable PMD clearing sequenece in zap_huge_pmd().
mm: Add and use update_mmu_cache_pmd() in transparent huge page code.
sparc64: Document PGD and PMD layout.
sparc64: Eliminate PTE table memory wastage.
sparc64: Halve the size of PTE tables
sparc64: Only support 4MB huge pages and 8KB base pages.
memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning
mm: memcg: clean up mm_match_cgroup() signature
mm: document PageHuge somewhat
mm: use %pK for /proc/vmallocinfo
mm, thp: fix mlock statistics
mm, thp: fix mapped pages avoiding unevictable list on mlock
memory-hotplug: update memory block's state and notify userspace
memory-hotplug: preparation to notify memory block's state at memory hot remove
mm: avoid section mismatch warning for memblock_type_name
make GFP_NOTRACK definition unconditional
cma: decrease cc.nr_migratepages after reclaiming pagelist
CMA: migrate mlocked pages
kpageflags: fix wrong KPF_THP on non-huge compound pages
...
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:
| effect | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
This patch removes reserved_vm counter from mm_struct. Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.
Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull asm-generic updates from Arnd Bergmann:
"This has three changes for asm-generic that did not really fit into
any other branch as normal asm-generic changes do. One is a fix for a
build warning, the other two are more interesting:
* A patch from Mark Brown to allow using the common clock
infrastructure on all architectures, so we can use the clock API in
architecture independent device drivers.
* The UAPI split patches from David Howells for the asm-generic
files. There are other architecture specific series that are going
through the arch maintainer tree and that depend on this one.
There may be a few small merge conflicts between Mark's patch and the
following arch header file split patches. In each case the solution
will be to keep the new "generic-y += clkdev.h" line, even if it ends
up being the only line in the Kbuild file."
* tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
UAPI: (Scripted) Disintegrate include/asm-generic
asm-generic: Add default clkdev.h
asm-generic: xor: mark static functions as __maybe_unused
Give the key type the opportunity to preparse the payload prior to the
instantiation and update routines being called. This is done with the
provision of two new key type operations:
int (*preparse)(struct key_preparsed_payload *prep);
void (*free_preparse)(struct key_preparsed_payload *prep);
If the first operation is present, then it is called before key creation (in
the add/update case) or before the key semaphore is taken (in the update and
instantiate cases). The second operation is called to clean up if the first
was called.
preparse() is given the opportunity to fill in the following structure:
struct key_preparsed_payload {
char *description;
void *type_data[2];
void *payload;
const void *data;
size_t datalen;
size_t quotalen;
};
Before the preparser is called, the first three fields will have been cleared,
the payload pointer and size will be stored in data and datalen and the default
quota size from the key_type struct will be stored into quotalen.
The preparser may parse the payload in any way it likes and may store data in
the type_data[] and payload fields for use by the instantiate() and update()
ops.
The preparser may also propose a description for the key by attaching it as a
string to the description field. This can be used by passing a NULL or ""
description to the add_key() system call or the key_create_or_update()
function. This cannot work with request_key() as that required the description
to tell the upcall about the key to be created.
This, for example permits keys that store PGP public keys to generate their own
name from the user ID and public key fingerprint in the key.
The instantiate() and update() operations are then modified to look like this:
int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
int (*update)(struct key *key, struct key_preparsed_payload *prep);
and the new payload data is passed in *prep, whether or not it was preparsed.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pull IMA bugfix (security subsystem) from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
ima: fix bug in argument order
The original model of device_cgroup is having a whitelist where all the
allowed devices are listed. The problem with this approach is that is
impossible to have the case of allowing everything but few devices.
The reason for that lies in the way the whitelist is handled internally:
since there's only a whitelist, the "all devices" entry would have to be
removed and replaced by the entire list of possible devices but the ones
that are being denied. Since dev_t is 32 bits long, representing the allowed
devices as a bitfield is not memory efficient.
This patch replaces the "whitelist" by a "exceptions" list and the default
policy is kept as "deny_all" variable in dev_cgroup structure.
The current interface determines that whenever "a" is written to devices.allow
or devices.deny, the entry masking all devices will be added or removed,
respectively. This behavior is kept and it's what will determine the default
policy:
# cat devices.list
a *:* rwm
# echo a >devices.deny
# cat devices.list
# echo a >devices.allow
# cat devices.list
a *:* rwm
The interface is also preserved. For example, if one wants to block only access
to /dev/null:
# ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 Jul 24 16:17 /dev/null
# echo a >devices.allow
# echo "c 1:3 rwm" >devices.deny
# cat /dev/null
cat: /dev/null: Operation not permitted
# echo >/dev/null
bash: /dev/null: Operation not permitted
mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 r" >devices.allow
# cat /dev/null
# echo >/dev/null
bash: /dev/null: Operation not permitted
mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 rw" >devices.allow
# echo >/dev/null
# cat /dev/null
# mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 rwm" >devices.allow
# echo >/dev/null
# cat /dev/null
# mknod /tmp/null c 1 3
#
Note that I didn't rename the functions/variables in this patch, but in the
next one to make reviewing easier.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull security subsystem updates from James Morris:
"Highlights:
- Integrity: add local fs integrity verification to detect offline
attacks
- Integrity: add digital signature verification
- Simple stacking of Yama with other LSMs (per LSS discussions)
- IBM vTPM support on ppc64
- Add new driver for Infineon I2C TIS TPM
- Smack: add rule revocation for subject labels"
Fixed conflicts with the user namespace support in kernel/auditsc.c and
security/integrity/ima/ima_policy.c.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits)
Documentation: Update git repository URL for Smack userland tools
ima: change flags container data type
Smack: setprocattr memory leak fix
Smack: implement revoking all rules for a subject label
Smack: remove task_wait() hook.
ima: audit log hashes
ima: generic IMA action flag handling
ima: rename ima_must_appraise_or_measure
audit: export audit_log_task_info
tpm: fix tpm_acpi sparse warning on different address spaces
samples/seccomp: fix 31 bit build on s390
ima: digital signature verification support
ima: add support for different security.ima data types
ima: add ima_inode_setxattr/removexattr function and calls
ima: add inode_post_setattr call
ima: replace iint spinblock with rwlock/read_lock
ima: allocating iint improvements
ima: add appraise action keywords and default rules
ima: integrity appraisal extension
vfs: move ima_file_free before releasing the file
...
Pull vfs update from Al Viro:
- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c
(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).
A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.
- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).
- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.
- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.
- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."
Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...