Commit Graph

286582 Commits

Author SHA1 Message Date
Eric Paris 16c174bd95 audit: check current inode and containing object when filtering on major and minor
The audit system has the ability to filter on the major and minor number of
the device containing the inode being operated upon.  Lets say that
/dev/sda1 has major,minor 8,1 and that we mount /dev/sda1 on /boot.  Now lets
say we add a watch with a filter on 8,1.  If we proceed to open an inode
inside /boot, such as /vboot/vmlinuz, we will match the major,minor filter.

Lets instead assume that one were to use a tool like debugfs and were to
open /dev/sda1 directly and to modify it's contents.  We might hope that
this would also be logged, but it isn't.  The rules will check the
major,minor of the device containing /dev/sda1.  In other words the rule
would match on the major/minor of the tmpfs mounted at /dev.

I believe these rules should trigger on either device.  The man page is
devoid of useful information about the intended semantics.  It only seems
logical that if you want to know everything that happened on a major,minor
that would include things that happened to the device itself...

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17 16:16:55 -05:00
Eric Paris 3035c51e8a audit: drop the meaningless and format breaking word 'user'
userspace audit messages look like so:

type=USER msg=audit(1271170549.415:24710): user pid=14722 uid=0 auid=500 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 msg=''

That third field just says 'user'.  That's useless and doesn't follow the
key=value pair we are trying to enforce.  We already know it came from the
user based on the record type.  Kill that word.  Die.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17 16:16:54 -05:00
Eric Paris 5195d8e217 audit: dynamically allocate audit_names when not enough space is in the names array
This patch does 2 things.  First it reduces the number of audit_names
allocated in every audit context from 20 to 5.  5 should be enough for all
'normal' syscalls (rename being the worst).  Some syscalls can still touch
more the 5 inodes such as mount.  When rpc filesystem is mounted it will
create inodes and those can exceed 5.  To handle that problem this patch will
dynamically allocate audit_names if it needs more than 5.  This should
decrease the typicall memory usage while still supporting all the possible
kernel operations.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17 16:16:54 -05:00
Eric Paris 5ef30ee53b audit: make filetype matching consistent with other filters
Every other filter that matches part of the inodes list collected by audit
will match against any of the inodes on that list.  The filetype matching
however had a strange way of doing things.  It allowed userspace to
indicated if it should match on the first of the second name collected by
the kernel.  Name collection ordering seems like a kernel internal and
making userspace rules get that right just seems like a bad idea.  As it
turns out the userspace audit writers had no idea it was doing this and
thus never overloaded the value field.  The kernel always checked the first
name collected which for the tested rules was always correct.

This patch just makes the filetype matching like the major, minor, inode,
and LSM rules in that it will match against any of the names collected.  It
also changes the rule validation to reject the old unused rule types.

Noone knew it was there.  Noone used it.  Why keep around the extra code?

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17 16:16:54 -05:00
Christoph Hellwig d060646436 xfs: cleanup xfs_file_aio_write
With all the size field updates out of the way xfs_file_aio_write can
be further simplified by pushing all iolock handling into
xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
the generic generic_write_sync helper for synchronous writes.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:12:33 -06:00
Christoph Hellwig 5bf1f26227 xfs: always return with the iolock held from xfs_file_aio_write_checks
While xfs_iunlock is fine with 0 lockflags the calling conventions are much
cleaner if xfs_file_aio_write_checks never returns without the iolock held.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:11:07 -06:00
Christoph Hellwig 2813d682e8 xfs: remove the i_new_size field in struct xfs_inode
Now that we use the VFS i_size field throughout XFS there is no need for the
i_new_size field any more given that the VFS i_size field gets updated
in ->write_end before unlocking the page, and thus is always uptodate when
writeback could see a page.  Removing i_new_size also has the advantage that
we will never have to trim back di_size during a failed buffered write,
given that it never gets updated past i_size.

Note that currently the generic direct I/O code only updates i_size after
calling our end_io handler, which requires a small workaround to make
sure di_size actually makes it to disk.  I hope to fix this properly in
the generic code.

A downside is that we lose the support for parallel non-overlapping O_DIRECT
appending writes that recently was added.  I don't think keeping the complex
and fragile i_new_size infrastructure for this is a good tradeoff - if we
really care about parallel appending writers we should investigate turning
the iolock into a range lock, which would also allow for parallel
non-overlapping buffered writers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:10:19 -06:00
Christoph Hellwig ce7ae151dd xfs: remove the i_size field in struct xfs_inode
There is no fundamental need to keep an in-memory inode size copy in the XFS
inode.  We already have the on-disk value in the dinode, and the separate
in-memory copy that we need for regular files only in the XFS inode.

Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
VFS inode i_size field for regular files.  Switch code that was directly
accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
we are limited to regular files direct access of the VFS inode i_size field.

This also allows dropping some fairly complicated code in the write path
which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
that is getting updated inside ->write_end.

Note that we do not bother resetting the VFS i_size when truncating a file
that gets freed to zero as there is no point in doing so because the VFS inode
is no longer in use at this point.  Just relax the assert in xfs_ifree to
only check the on-disk size instead.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:08:53 -06:00
Christoph Hellwig f392e6319a xfs: replace i_pin_wait with a bit waitqueue
Replace i_pin_wait, which is only used during synchronous inode flushing
with a bit waitqueue.  This trades off a much smaller inode against
slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit)
bytes in the XFS inode.

Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:07:54 -06:00
Christoph Hellwig 474fce0675 xfs: replace i_flock with a sleeping bitlock
We almost never block on i_flock, the exception is synchronous inode
flushing.  Instead of bloating the inode with a 16/24-byte completion
that we abuse as a semaphore just implement it as a bitlock that uses
a bit waitqueue for the rare sleeping path.  This primarily is a
tradeoff between a much smaller inode and a faster non-blocking
path vs faster wakeups, and we are much better off with the former.

A small downside is that we will lose lockdep checking for i_flock, but
given that it's always taken inside the ilock that should be acceptable.

Note that for example the inode writeback locking is implemented in a
very similar way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:06:45 -06:00
Christoph Hellwig 49e4c70e52 xfs: make i_flags an unsigned long
To be used for bit wakeup i_flags needs to be an unsigned long or we'll
run into trouble on big endian systems.  Because of the 1-byte i_update
field right after it this actually causes a fairly large size increase
on its own (4 or 8 bytes), but that increase will be more than offset
by the next two patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:03:50 -06:00
Christoph Hellwig 8096b1ebb5 xfs: remove the if_ext_max field in struct xfs_ifork
We spent a lot of effort to maintain this field, but it always equals to the
fork size divided by the constant size of an extent.  The prime use of it is
to assert that the two stay in sync.  Just divide the fork size by the extent
size in the few places that we actually use it and remove the overhead
of maintaining it.  Also introduce a few helpers to consolidate the places
where we actually care about the value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17 15:02:28 -06:00
Dan Carpenter 10ec1bb7e9 inetpeer: initialize ->redirect_genid in inet_getpeer()
kmemcheck complains that ->redirect_genid doesn't get initialized.
Presumably it should be set to zero.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17 15:52:12 -05:00
Michał Mirosław 65e9d2faab net: fix NULL-deref in WARN() in skb_gso_segment()
Bug was introduced in commit c8f44affb7.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17 15:51:23 -05:00
Ben Hutchings 36c9247449 net: WARN if skb_checksum_help() is called on skb requiring segmentation
skb_checksum_help() has never done anything useful with skbs that
require segmentation.  Setting skb->ip_summed = CHECKSUM_NONE makes
them invalid and provokes a later WARNing in skb_gso_segment().

Passing such an skb to skb_checksum_help() indicates a bug, so we
should warn about it immediately.  Move the warning from
skb_gso_segment() into a shared function, and add gso_type and
gso_size to it.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17 15:49:03 -05:00
Linus Torvalds 5e5997849a Merge branch 'for-linus' of git://git.kernel.dk/linux-block
* 'for-linus' of git://git.kernel.dk/linux-block:
  cfq-iosched: fix use-after-free of cfqq
2012-01-17 12:41:10 -08:00
Jens Axboe 54b466e44b cfq-iosched: fix use-after-free of cfqq
With the changes in life time management between the cfq IO contexts
and the cfq queues, we now risk having cfqd->active_queue being
freed when cfq_slice_expired() is being called. cfq_preempt_queue()
caches this queue and uses it after calling said function, causing
a use-after-free condition. This triggers the following oops,
when cfqq_type() attempts to dereference it:

BUG: unable to handle kernel paging request at ffff8800746c4f0c
IP: [<ffffffff81266d59>] cfqq_type+0xb/0x20
PGD 18d4063 PUD 1fe15067 PMD 1ffb9067 PTE 80000000746c4160
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU 3
Modules linked in:

Pid: 1, comm: init Not tainted 3.2.0-josef+ #367 Bochs Bochs
RIP: 0010:[<ffffffff81266d59>]  [<ffffffff81266d59>] cfqq_type+0xb/0x20
RSP: 0018:ffff880079c11778  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff880076f3df08 RCX: 0000000000000000
RDX: 0000000000000006 RSI: ffff880074271888 RDI: ffff8800746c4f08
RBP: ffff880079c11778 R08: 0000000000000078 R09: 0000000000000001
R10: 09f911029d74e35b R11: 09f911029d74e35b R12: ffff880076f337f0
R13: ffff8800746c4f08 R14: ffff8800746c4f08 R15: 0000000000000002
FS:  00007f62fd44f700(0000) GS:ffff88007cd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8800746c4f0c CR3: 0000000076c21000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process init (pid: 1, threadinfo ffff880079c10000, task ffff880079c0a040)
Stack:
 ffff880079c117c8 ffffffff812683d8 ffff880079c117a8 ffffffff8125de43
 ffff8800744fcf48 ffff880074b43e98 ffff8800770c8828 ffff880074b43e98
 0000000000000003 0000000000000000 ffff880079c117f8 ffffffff81254149
Call Trace:
 [<ffffffff812683d8>] cfq_insert_request+0x3f5/0x47c
 [<ffffffff8125de43>] ? blk_recount_segments+0x20/0x31
 [<ffffffff81254149>] __elv_add_request+0x1ca/0x200
 [<ffffffff8125aa99>] blk_queue_bio+0x2ef/0x312
 [<ffffffff81258f7b>] generic_make_request+0x9f/0xe0
 [<ffffffff8125907b>] submit_bio+0xbf/0xca
 [<ffffffff81136ec7>] submit_bh+0xdf/0xfe
 [<ffffffff81176d04>] ext3_bread+0x50/0x99
 [<ffffffff811785b3>] dx_probe+0x38/0x291
 [<ffffffff81178864>] ext3_dx_find_entry+0x58/0x219
 [<ffffffff81178ad5>] ext3_find_entry+0xb0/0x406
 [<ffffffff8110c4d5>] ? cache_alloc_debugcheck_after.isra.46+0x14d/0x1a0
 [<ffffffff8110cfbd>] ? kmem_cache_alloc+0xef/0x191
 [<ffffffff8117a330>] ext3_lookup+0x39/0xe1
 [<ffffffff81119461>] d_alloc_and_lookup+0x45/0x6c
 [<ffffffff8111ac41>] do_lookup+0x1e4/0x2f5
 [<ffffffff8111aef6>] link_path_walk+0x1a4/0x6ef
 [<ffffffff8111b557>] path_lookupat+0x59/0x5ea
 [<ffffffff8127406c>] ? __strncpy_from_user+0x30/0x5a
 [<ffffffff8111bce0>] do_path_lookup+0x23/0x59
 [<ffffffff8111cfd6>] user_path_at_empty+0x53/0x99
 [<ffffffff8107b37b>] ? remove_wait_queue+0x51/0x56
 [<ffffffff8111d02d>] user_path_at+0x11/0x13
 [<ffffffff811141f5>] vfs_fstatat+0x3a/0x64
 [<ffffffff8111425a>] vfs_stat+0x1b/0x1d
 [<ffffffff81114359>] sys_newstat+0x1a/0x33
 [<ffffffff81060e12>] ? task_stopped_code+0x42/0x42
 [<ffffffff815d6712>] system_call_fastpath+0x16/0x1b
Code: 89 e6 48 89 c7 e8 fa ca fe ff 85 c0 74 06 4c 89 2b 41 b6 01 5b 44 89 f0 41 5c 41 5d 41 5e 5d c3 55 48 89 e5 66 66 66 66 90 31 c0 <8b> 57 04 f6 c6 01 74 0b 83 e2 20 83 fa 01 19 c0 83 c0 02 5d c3
RIP  [<ffffffff81266d59>] cfqq_type+0xb/0x20
 RSP <ffff880079c11778>
CR2: ffff8800746c4f0c

Get rid of the caching of cfqd->active_queue, and reorder the
check so that it happens before we expire the active queue.

Thanks to Tejun for pin pointing the error location.

Reported-by: Chris Mason <chris.mason@oracle.com>
Tested-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-01-17 21:26:11 +01:00
Ulrich Drepper ce79dac861 x86, opcode: ANDN and Group 17 in x86-opcode-map.txt
The Intel documentation at

http://software.intel.com/file/36945

shows the ANDN opcode and Group 17 with encoding f2 and f3 encoding
respectively.  The current version of x86-opcode-map.txt shows them
with f3 and f4.  Unless someone can point to documentation which shows
the currently used encoding the following patch be applied.

Signed-off-by: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/CAOPLpQdq5SuVo9=023CYhbFLAX9rONyjmYq7jJkqc5xwctW5eA@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-01-17 12:11:54 -08:00
Linus Torvalds 00b1d444af Merge branch 'stable/for-linus-fixes-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/for-linus-fixes-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/balloon: Move the registration from device to subsystem.
2012-01-17 11:56:29 -08:00
Thomas Renninger 5e7590d40d ACPI processor: Remove unneeded cpuidle_unregister_driver call
Since commit 46bcfad7a8 registering
and unregistering cpuidle is done in processor_idle.c.
Unregistering via:
acpi_bus_unregister_driver(&acpi_processor_driver)
   -> acpi_processor_remove()
      -> acpi_processor_power_exit()

Remove not needed cpuidle_unregister_driver() call from
acpi_processor_exit

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-01-17 14:35:52 -05:00
Thomas Renninger 5c2a9f06a9 intel idle: Make idle driver more robust
kvm -cpu host passes the original cpuid info to the guest.

Latest kvm version seem to return true for mwait_leaf cpuid
function on recent Intel CPUs. But it does not return mwait
C-states (mwait_substates), instead zero is returned.

While real CPUs seem to always return non-zero values, the intel
idle driver should not get active in kvm (mwait_substates == 0)
case and bail out.
Otherwise a Null pointer exception will happen later when the
cpuidle subsystem tries to get active:
[0.984807] BUG: unable to handle kernel NULL pointer dereference at (null)
[0.984807] IP: [<(null)>] (null)
...
[0.984807][<ffffffff8143cf34>] ? cpuidle_idle_call+0xb4/0x340
[0.984807][<ffffffff8159e7bc>] ? __atomic_notifier_call_chain+0x4c/0x70
[0.984807][<ffffffff81001198>] ? cpu_idle+0x78/0xd0

Reference:
https://bugzilla.novell.com/show_bug.cgi?id=726296

Cc: stable@vger.kernel.org
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Bruno Friedmann <bruno@ioda-net.ch>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-01-17 14:19:59 -05:00
David Howells 95e3ec1149 intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle
Fix the following warning:

drivers/idle/intel_idle.c: In function 'intel_idle_cpuidle_devices_init':
drivers/idle/intel_idle.c:518:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

By making get_driver_data() return a long instead of an int.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-01-17 14:19:59 -05:00
Masanari Iida 2e92c7ad8f ACPI: kernel-parameters.txt : Add intel_idle.max_cstate
Add missing intel_idle.max_cstate in kernel-parameters.txt

Signed-off-by Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-01-17 14:19:58 -05:00
Yanmin Zhang 63ff07beae intel_idle: remove redundant local_irq_disable() call
irq disabling happens earlier in process_32.c:cpu_idle.  Basically,
cpuidle_state->enter is called, cpu irq is disabled.  cpuidle_state->enter
would turn on irq when exiting.

intel_idle doesn't follow this assumption.  Although it doesn't cause real
issue, it misleads developers.  Remove the call to local_irq_disable() at
entry.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Mingming Zhang <mingmingx.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-01-17 14:19:58 -05:00
Linus Torvalds 8364919c56 Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze
* 'next' of git://git.monstr.eu/linux-2.6-microblaze:
  USB: EHCI: Don't use NO_IRQ in xilinx ehci driver
  microblaze: Add topology init
2012-01-17 10:49:06 -08:00