Commit Graph

13722 Commits

Author SHA1 Message Date
David Howells 4b2b0b9753 ROMFS: Advance destination buffer pointer when reading from a blockdev
RomFS should advance the destination buffer pointer when reading data from a
blockdev source (the data may be split over multiple blocks, each requiring its
own sb_read() call).  Without this, all the data is copied to the beginning of
the output buffer.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-24 13:28:31 -07:00
David Howells 84baf74bf2 ROMFS: romfs_lookup() shouldn't be doing a partial name comparison
romfs_lookup() should be using a routine akin to strcmp() on the backing store,
rather than one akin to strncmp().  If it uses the latter, it's liable to match
/bin/shutdown when looking up /bin/sh.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-24 13:28:31 -07:00
Linus Torvalds a4277bf122 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Fix potential inode allocation soft lockup in Orlov allocator
  ext4: Make the extent validity check more paranoid
  jbd: use SWRITE_SYNC_PLUG when writing synchronous revoke records
  jbd2: use SWRITE_SYNC_PLUG when writing synchronous revoke records
  ext4: really print the find_group_flex fallback warning only once
2009-04-24 08:37:40 -07:00
Linus Torvalds ff91fad2db Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  eCryptfs: Larger buffer for encrypted symlink targets
  eCryptfs: Lock lower directory inode mutex during lookup
  eCryptfs: Remove ecryptfs_unlink_sigs warnings
  eCryptfs: Fix data corruption when using ecryptfs_passthrough
  eCryptfs: Print FNEK sig properly in /proc/mounts
  eCryptfs: NULL pointer dereference in ecryptfs_send_miscdev()
  eCryptfs: Copy lower inode attrs before dentry instantiation
2009-04-24 08:32:44 -07:00
Linus Torvalds 58be18c4de Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] update default configuration.
  [S390] omit frame pointers on s390 when possible
  [S390] Use tape_generic_offline directly.
  [S390] /proc/stat idle field for idle cpus
  [S390] appldata: avoid deadlock with appldata_mem
  [S390] ipl: fix compile breakage
2009-04-24 08:28:27 -07:00
Linus Torvalds 12bac708e6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: Ensure that the inode goal block settings are updated
  GFS2: Fix bug in block allocation
  bitops: Add __ffs64 bitop
2009-04-24 08:27:02 -07:00
Linus Torvalds 97c68d00db Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cfq-iosched: cache prio_tree root in cfqq->p_root
  cfq-iosched: fix bug with aliased request and cooperation detection
  cfq-iosched: clear ->prio_trees[] on cfqd alloc
  block: fix intermittent dm timeout based oops
  umem: fix request_queue lock warning
  block: simplify I/O stat accounting
  pktcdvd.h should include mempool.h
  cfq-iosched: use the default seek distance when there aren't enough seek samples
  cfq-iosched: make seek_mean converge more quickly
  block: make blk_abort_queue() ignore non-request based devices
  block: include empty disks in /proc/diskstats
  bio: use bio_kmalloc() in copy/map functions
  bio: fix bio_kmalloc()
  block: fix queue bounce limit setting
  block: fix SG_IO vector request data length handling
  scatterlist: make sure sg_miter_next() doesn't return 0 sized mappings
2009-04-24 07:48:24 -07:00
Oleg Nesterov 437f7fdb60 check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/
write_lock(&current->fs->lock) guarantees we can't wrongly miss
LSM_UNSAFE_SHARE, this is what we care about. Use rcu_read_lock()
instead of ->siglock to iterate over the sub-threads. We must see
all CLONE_THREAD|CLONE_FS threads which didn't pass exit_fs(), it
takes fs->lock too.

With or without this patch we can miss the freshly cloned thread
and set LSM_UNSAFE_SHARE, we don't care.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
[ Fixed lock/unlock typo  - Hugh ]
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-24 07:39:45 -07:00
Oleg Nesterov 8c652f96d3 do_execve() must not clear fs->in_exec if it was set by another thread
If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec
unconditionally. This is wrong if we race with our sub-thread which
also does do_execve:

	Two threads T1 and T2 and another process P, all share the same
	->fs.

	T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since
	->fs is shared, we set LSM_UNSAFE but not ->in_exec.

	P exits and decrements fs->users.

	T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not
	shared, we set fs->in_exec.

	T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and
	return to the user-space.

	T1 does clone(CLONE_FS /* without CLONE_THREAD */).

	T2 continues without LSM_UNSAFE_SHARE while ->fs is shared with
	another process.

Change check_unsafe_exec() to return res = 1 if we set ->in_exec, and change
do_execve() to clear ->in_exec depending on res.

When do_execve() suceeds, it is safe to clear ->in_exec unconditionally.
It can be set only if we don't share ->fs with another process, and since
we already killed all sub-threads either ->in_exec == 0 or we are the
only user of this ->fs.

Also, we do not need fs->lock to clear fs->in_exec.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-24 07:39:45 -07:00
Martin Schwidefsky e1c805309d [S390] /proc/stat idle field for idle cpus
The cpu idle field in the output of /proc/stat is too small for cpus
that have been idle for more than a tick. Add the architecture hook
arch_idle_time that allows to add the not accounted idle time of a
sleeping cpu without waking the cpu.

The s390 implementation of arch_idle_time uses the already existing
s390_idle_data per_cpu variable to find the sleep time of a neighboring
idle cpu.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-04-23 13:58:17 +02:00
Steven Whitehouse d9ba7615bf GFS2: Ensure that the inode goal block settings are updated
GFS2 has a goal block associated with each inode indicating the
search start position for future block allocations (in fact there
are two, but thats for backward compatibility with GFS1 as they
are set to identical locations in GFS2).

In some circumstances, depending on the ordering of updates to
the inode it was possible for the goal block settings to not
be updated on disk. This patch ensures that the goal block will
always get updated, thus reducing the potential for searching
the same (already allocated) blocks again when looking for free
space during block allocation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-04-23 10:07:37 +01:00
Steven Whitehouse d8bd504ab8 GFS2: Fix bug in block allocation
The new bitfit algorithm was counting from the wrong end of
64 bit words in the bitfield. This fixes it by using __ffs64
instead of fls64

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-04-23 10:07:16 +01:00
Theodore Ts'o b5451f7b26 ext4: Fix potential inode allocation soft lockup in Orlov allocator
If the Orlov allocator is having trouble finding an appropriate block
group, the fallback code could loop forever, causing a soft lockup
warning in find_group_orlov():

BUG: soft lockup - CPU#0 stuck for 61s! [cp:11728]
     ...
Pid: 11728, comm: cp Not tainted (2.6.30-rc1-dirty #77) Lenovo          
EIP: 0060:[<c021650e>] EFLAGS: 00000246 CPU: 0
EIP is at ext4_get_group_desc+0x54/0x9d
    ...
Call Trace:
 [<c0218021>] find_group_orlov+0x2ee/0x334
 [<c0120a5f>] ? sched_clock+0x8/0xb
 [<c02188e3>] ext4_new_inode+0x2cf/0xb1a

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-04-22 21:00:36 -04:00
Theodore Ts'o e84a26ce17 ext4: Make the extent validity check more paranoid
Instead of just checking that the extent block number is greater or
equal than s_first_data_block, make sure it it is not pointing into
the block group descriptors, since that is clearly wrong.  This helps
prevent filesystem from getting very badly corrupted in case an extent
block is corrupted.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-04-22 20:52:25 -04:00
Tyler Hicks 3a6b42cadc eCryptfs: Larger buffer for encrypted symlink targets
When using filename encryption with eCryptfs, the value of the symlink
in the lower filesystem is encrypted and stored as a Tag 70 packet.
This results in a longer symlink target than if the target value wasn't
encrypted.

Users were reporting these messages in their syslog:

[ 45.653441] ecryptfs_parse_tag_70_packet: max_packet_size is [56]; real
packet size is [51]
[ 45.653444] ecryptfs_decode_and_decrypt_filename: Could not parse tag
70 packet from filename; copying through filename as-is

This was due to bufsiz, one the arguments in readlink(), being used to
when allocating the buffer passed to the lower inode's readlink().
That symlink target may be very large, but when decoded and decrypted,
could end up being smaller than bufsize.

To fix this, the buffer passed to the lower inode's readlink() will
always be PATH_MAX in size when filename encryption is enabled.  Any
necessary truncation occurs after the decoding and decrypting.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-04-22 17:02:46 -05:00
Tyler Hicks ca8e34f2b0 eCryptfs: Lock lower directory inode mutex during lookup
This patch locks the lower directory inode's i_mutex before calling
lookup_one_len() to find the appropriate dentry in the lower filesystem.
This bug was found thanks to the warning set in commit 2f9092e1.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-04-22 16:27:12 -05:00
Tyler Hicks e77cc8d243 eCryptfs: Remove ecryptfs_unlink_sigs warnings
A feature was added to the eCryptfs umount helper to automatically
unlink the keys used for an eCryptfs mount from the kernel keyring upon
umount.  This patch keeps the unrecognized mount option warnings for
ecryptfs_unlink_sigs out of the logs.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-04-22 04:08:46 -05:00
Tyler Hicks 13a791b4e6 eCryptfs: Fix data corruption when using ecryptfs_passthrough
ecryptfs_passthrough is a mount option that allows eCryptfs to allow
data to be written to non-eCryptfs files in the lower filesystem.  The
passthrough option was causing data corruption due to it not always
being treated as a non-eCryptfs file.

The first 8 bytes of an eCryptfs file contains the decrypted file size.
This value was being written to the non-eCryptfs files, too.  Also,
extra 0x00 characters were being written to make the file size a
multiple of PAGE_CACHE_SIZE.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-04-22 03:54:13 -05:00
Tyler Hicks 3a5203ab3c eCryptfs: Print FNEK sig properly in /proc/mounts
The filename encryption key signature is not properly displayed in
/proc/mounts.  The "ecryptfs_sig=" mount option name is displayed for
all global authentication tokens, included those for filename keys.

This patch checks the global authentication token flags to determine if
the key is a FEKEK or FNEK and prints the appropriate mount option name
before the signature.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-04-22 03:54:13 -05:00
Tyler Hicks 57ea34d199 eCryptfs: NULL pointer dereference in ecryptfs_send_miscdev()
If data is NULL, msg_ctx->msg is set to NULL and then dereferenced
afterwards.  ecryptfs_send_raw_message() is the only place that
ecryptfs_send_miscdev() is called with data being NULL, but the only
caller of that function (ecryptfs_process_helo()) is never called.  In
short, there is currently no way to trigger the NULL pointer
dereference.

This patch removes the two unused functions and modifies
ecryptfs_send_miscdev() to remove the NULL dereferences.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-04-22 03:54:13 -05:00
Tyler Hicks ae6e84596e eCryptfs: Copy lower inode attrs before dentry instantiation
Copies the lower inode attributes to the upper inode before passing the
upper inode to d_instantiate().  This is important for
security_d_instantiate().

The problem was discovered by a user seeing SELinux denials like so:

type=AVC msg=audit(1236812817.898:47): avc:  denied  { 0x100000 } for
pid=3584 comm="httpd" name="testdir" dev=ecryptfs ino=943872
scontext=root:system_r:httpd_t:s0
tcontext=root:object_r:httpd_sys_content_t:s0 tclass=file

Notice target class is file while testdir is really a directory,
confusing the permission translation (0x100000) due to the wrong i_mode.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2009-04-22 03:54:12 -05:00
Tejun Heo a9e9dc24bb bio: use bio_kmalloc() in copy/map functions
Impact: remove possible deadlock condition

There is no reason to use mempool backed allocation for map functions.
Also, because kern mapping is used inside LLDs (e.g. for EH), using
mempool backed allocation can lead to deadlock under extreme
conditions (mempool already consumed by the time a request reached EH
and requests are blocked on EH).

Switch copy/map functions to bio_kmalloc().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-04-22 08:35:10 +02:00
Tejun Heo 451a9ebf65 bio: fix bio_kmalloc()
Impact: fix bio_kmalloc() and its destruction path

bio_kmalloc() was broken in two ways.

* bvec_alloc_bs() first allocates bvec using kmalloc() and then
  ignores it and allocates again like non-kmalloc bvecs.

* bio_kmalloc_destructor() didn't check for and free bio integrity
  data.

This patch fixes the above problems.  kmalloc patch is separated out
from bio_alloc_bioset() and allocates the requested number of bvecs as
inline bvecs.

* bio_alloc_bioset() no longer takes NULL @bs.  None other than
  bio_kmalloc() used it and outside users can't know how it was
  allocated anyway.

* Define and use BIO_POOL_NONE so that pool index check in
  bvec_free_bs() triggers if inline or kmalloc allocated bvec gets
  there.

* Relocate destructors on top of each allocation function so that how
  they're used is more clear.

Jens Axboe suggested allocating bvecs inline.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-04-22 08:35:10 +02:00
Linus Torvalds ccc5ff94c6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: fix btrfs fallocate oops and deadlock
  Btrfs: use the right node in reada_for_balance
  Btrfs: fix oops on page->mapping->host during writepage
  Btrfs: add a priority queue to the async thread helpers
  Btrfs: use WRITE_SYNC for synchronous writes
2009-04-21 14:12:58 -07:00
Akinobu Mita c12ddba093 hugetlbfs: return negative error code for bad mount option
This fixes the following BUG:

  # mount -o size=MM -t hugetlbfs none /huge
  hugetlbfs: Bad value 'MM' for mount option 'size=MM'
  ------------[ cut here ]------------
  kernel BUG at fs/super.c:996!

Due to

	BUG_ON(!mnt->mnt_sb);

in vfs_kern_mount().

Also, remove unused #include <linux/quotaops.h>

Cc: William Irwin <wli@holomorphy.com>
Cc: <stable@kernel.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-21 13:41:48 -07:00