Pull the big VFS changes from Al Viro:
"This one is *big* and changes quite a few things around VFS. What's in there:
- the first of two really major architecture changes - death to open
intents.
The former is finally there; it was very long in making, but with
Miklos getting through really hard and messy final push in
fs/namei.c, we finally have it. Unlike his variant, this one
doesn't introduce struct opendata; what we have instead is
->atomic_open() taking preallocated struct file * and passing
everything via its fields.
Instead of returning struct file *, it returns -E... on error, 0
on success and 1 in "deal with it yourself" case (e.g. symlink
found on server, etc.).
See comments before fs/namei.c:atomic_open(). That made a lot of
goodies finally possible and quite a few are in that pile:
->lookup(), ->d_revalidate() and ->create() do not get struct
nameidata * anymore; ->lookup() and ->d_revalidate() get lookup
flags instead, ->create() gets "do we want it exclusive" flag.
With the introduction of new helper (kern_path_locked()) we are rid
of all struct nameidata instances outside of fs/namei.c; it's still
visible in namei.h, but not for long. Come the next cycle,
declaration will move either to fs/internal.h or to fs/namei.c
itself. [me, miklos, hch]
- The second major change: behaviour of final fput(). Now we have
__fput() done without any locks held by caller *and* not from deep
in call stack.
That obviously lifts a lot of constraints on the locking in there.
Moreover, it's legal now to call fput() from atomic contexts (which
has immediately simplified life for aio.c). We also don't need
anti-recursion logics in __scm_destroy() anymore.
There is a price, though - the damn thing has become partially
asynchronous. For fput() from normal process we are guaranteed
that pending __fput() will be done before the caller returns to
userland, exits or gets stopped for ptrace.
For kernel threads and atomic contexts it's done via
schedule_work(), so theoretically we might need a way to make sure
it's finished; so far only one such place had been found, but there
might be more.
There's flush_delayed_fput() (do all pending __fput()) and there's
__fput_sync() (fput() analog doing __fput() immediately). I hope
we won't need them often; see warnings in fs/file_table.c for
details. [me, based on task_work series from Oleg merged last
cycle]
- sync series from Jan
- large part of "death to sync_supers()" work from Artem; the only
bits missing here are exofs and ext4 ones. As far as I understand,
those are going via the exofs and ext4 trees resp.; once they are
in, we can put ->write_super() to the rest, along with the thread
calling it.
- preparatory bits from unionmount series (from dhowells).
- assorted cleanups and fixes all over the place, as usual.
This is not the last pile for this cycle; there's at least jlayton's
ESTALE work and fsfreeze series (the latter - in dire need of fixes,
so I'm not sure it'll make the cut this cycle). I'll probably throw
symlink/hardlink restrictions stuff from Kees into the next pile, too.
Plus there's a lot of misc patches I hadn't thrown into that one -
it's large enough as it is..."
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits)
ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file()
btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file()
switch dentry_open() to struct path, make it grab references itself
spufs: shift dget/mntget towards dentry_open()
zoran: don't bother with struct file * in zoran_map
ecryptfs: don't reinvent the wheels, please - use struct completion
don't expose I_NEW inodes via dentry->d_inode
tidy up namei.c a bit
unobfuscate follow_up() a bit
ext3: pass custom EOF to generic_file_llseek_size()
ext4: use core vfs llseek code for dir seeks
vfs: allow custom EOF in generic_file_llseek code
vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
vfs: Remove unnecessary flushing of block devices
vfs: Make sys_sync writeout also block device inodes
vfs: Create function for iterating over block devices
vfs: Reorder operations during sys_sync
quota: Move quota syncing to ->sync_fs method
quota: Split dquot_quota_sync() to writeback and cache flushing part
vfs: Move noop_backing_dev_info check from sync into writeback
...
Pull SELinux regression fixes from James Morris.
Andrew Morton has a box that hit that open perms problem.
I also renamed the "epollwakeup" selinux name for the new capability to
be "block_suspend", to match the rename done by commit d9914cf661
("PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND").
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
SELinux: do not check open perms if they are not known to policy
SELinux: include definition of new capabilities
The kernel has added CAP_WAKE_ALARM and CAP_EPOLLWAKEUP. We need to
define these in SELinux so they can be mediated by policy.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
avc_add_callback now just used for registering reset functions
in initcalls, and the callback functions just did reset operations.
So, reducing the arguments to only one event is enough now.
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
We no longer need the distinction. We only need data after we decide to do an
audit. So turn the "late" audit data into just "data" and remove what we
currently have as "data".
Signed-off-by: Eric Paris <eparis@redhat.com>
We pay a rather large overhead initializing the common_audit_data.
Since we only need this information if we actually emit an audit
message there is little need to set it up in the hot path. This patch
splits the functionality of avc_has_perm() into avc_has_perm_noaudit(),
avc_audit_required() and slow_avc_audit(). But we take care of setting
up to audit between required() and the actual audit call. Thus saving
measurable time in a hot path.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Eric Paris <eparis@redhat.com>
Because Fedora shipped userspace based on my development tree we now
have policy version 27 in the wild defining only default user, role, and
range. Thus to add default_type we need a policy.28.
Signed-off-by: Eric Paris <eparis@redhat.com>
When new objects are created we have great and flexible rules to
determine the type of the new object. We aren't quite as flexible or
mature when it comes to determining the user, role, and range. This
patch adds a new ability to specify the place a new objects user, role,
and range should come from. For users and roles it can come from either
the source or the target of the operation. aka for files the user can
either come from the source (the running process and todays default) or
it can come from the target (aka the parent directory of the new file)
examples always are done with
directory context: system_u:object_r:mnt_t:s0-s0:c0.c512
process context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[no rule]
unconfined_u:object_r:mnt_t:s0 test_none
[default user source]
unconfined_u:object_r:mnt_t:s0 test_user_source
[default user target]
system_u:object_r:mnt_t:s0 test_user_target
[default role source]
unconfined_u:unconfined_r:mnt_t:s0 test_role_source
[default role target]
unconfined_u:object_r:mnt_t:s0 test_role_target
[default range source low]
unconfined_u:object_r:mnt_t:s0 test_range_source_low
[default range source high]
unconfined_u:object_r:mnt_t:s0:c0.c1023 test_range_source_high
[default range source low-high]
unconfined_u:object_r:mnt_t:s0-s0:c0.c1023 test_range_source_low-high
[default range target low]
unconfined_u:object_r:mnt_t:s0 test_range_target_low
[default range target high]
unconfined_u:object_r:mnt_t:s0:c0.c512 test_range_target_high
[default range target low-high]
unconfined_u:object_r:mnt_t:s0-s0:c0.c512 test_range_target_low-high
Signed-off-by: Eric Paris <eparis@redhat.com>
Instead of declaring the entire selinux_audit_data on the stack when we
start an operation on declare it on the stack if we are going to use it.
We know it's usefulness at the end of the security decision and can declare
it there.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus found that the gigantic size of the common audit data caused a big
perf hit on something as simple as running stat() in a loop. This patch
requires LSMs to declare the LSM specific portion separately rather than
doing it in a union. Thus each LSM can be responsible for shrinking their
portion and don't have to pay a penalty just because other LSMs have a
bigger space requirement.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:
perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
Signed-off-by: David Howells <dhowells@redhat.com>
selinux/xfrm.h needs to #include net/flow.h or else suffer:
In file included from security/selinux/ss/services.c:69:0:
security/selinux/include/xfrm.h: In function 'selinux_xfrm_notify_policyload':
security/selinux/include/xfrm.h:53:14: error: 'flow_cache_genid' undeclared (first use in this function)
security/selinux/include/xfrm.h:53:14: note: each undeclared identifier is reported only once for each function it appears in
Signed-off-by: David Howells <dhowells@redhat.com>
My @hp.com will no longer be valid starting August 5, 2011 so an update is
necessary. My new email address is employer independent so we don't have
to worry about doing this again any time soon.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no point in counting hits - we can calculate it from the number
of lookups and misses.
This makes the avc statistics a bit smaller, and makes the code
generation better too.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the security modules can decide whether they support the
dcache RCU walk or not it's possible to make selinux a bit more
RCU friendly. The SELinux AVC and security server access decision
code is RCU safe. A specific piece of the LSM audit code may not
be RCU safe.
This patch makes the VFS RCU walk retry if it would hit the non RCU
safe chunk of code. It will normally just work under RCU. This is
done simply by passing the VFS RCU state as a flag down into the
avc_audit() code and returning ECHILD there if it would have an issue.
Based-on-patch-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the security modules can decide whether they support the
dcache RCU walk or not it's possible to make selinux a bit more
RCU friendly. The SELinux AVC and security server access decision
code is RCU safe. A specific piece of the LSM audit code may not
be RCU safe.
This patch makes the VFS RCU walk retry if it would hit the non RCU
safe chunk of code. It will normally just work under RCU. This is
done simply by passing the VFS RCU state as a flag down into the
avc_audit() code and returning ECHILD there if it would have an issue.
Based-on-patch-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>