RomFS should advance the destination buffer pointer when reading data from a
blockdev source (the data may be split over multiple blocks, each requiring its
own sb_read() call). Without this, all the data is copied to the beginning of
the output buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
romfs_lookup() should be using a routine akin to strcmp() on the backing store,
rather than one akin to strncmp(). If it uses the latter, it's liable to match
/bin/shutdown when looking up /bin/sh.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix potential inode allocation soft lockup in Orlov allocator
ext4: Make the extent validity check more paranoid
jbd: use SWRITE_SYNC_PLUG when writing synchronous revoke records
jbd2: use SWRITE_SYNC_PLUG when writing synchronous revoke records
ext4: really print the find_group_flex fallback warning only once
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] update default configuration.
[S390] omit frame pointers on s390 when possible
[S390] Use tape_generic_offline directly.
[S390] /proc/stat idle field for idle cpus
[S390] appldata: avoid deadlock with appldata_mem
[S390] ipl: fix compile breakage
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: cache prio_tree root in cfqq->p_root
cfq-iosched: fix bug with aliased request and cooperation detection
cfq-iosched: clear ->prio_trees[] on cfqd alloc
block: fix intermittent dm timeout based oops
umem: fix request_queue lock warning
block: simplify I/O stat accounting
pktcdvd.h should include mempool.h
cfq-iosched: use the default seek distance when there aren't enough seek samples
cfq-iosched: make seek_mean converge more quickly
block: make blk_abort_queue() ignore non-request based devices
block: include empty disks in /proc/diskstats
bio: use bio_kmalloc() in copy/map functions
bio: fix bio_kmalloc()
block: fix queue bounce limit setting
block: fix SG_IO vector request data length handling
scatterlist: make sure sg_miter_next() doesn't return 0 sized mappings
write_lock(¤t->fs->lock) guarantees we can't wrongly miss
LSM_UNSAFE_SHARE, this is what we care about. Use rcu_read_lock()
instead of ->siglock to iterate over the sub-threads. We must see
all CLONE_THREAD|CLONE_FS threads which didn't pass exit_fs(), it
takes fs->lock too.
With or without this patch we can miss the freshly cloned thread
and set LSM_UNSAFE_SHARE, we don't care.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
[ Fixed lock/unlock typo - Hugh ]
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec
unconditionally. This is wrong if we race with our sub-thread which
also does do_execve:
Two threads T1 and T2 and another process P, all share the same
->fs.
T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since
->fs is shared, we set LSM_UNSAFE but not ->in_exec.
P exits and decrements fs->users.
T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not
shared, we set fs->in_exec.
T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and
return to the user-space.
T1 does clone(CLONE_FS /* without CLONE_THREAD */).
T2 continues without LSM_UNSAFE_SHARE while ->fs is shared with
another process.
Change check_unsafe_exec() to return res = 1 if we set ->in_exec, and change
do_execve() to clear ->in_exec depending on res.
When do_execve() suceeds, it is safe to clear ->in_exec unconditionally.
It can be set only if we don't share ->fs with another process, and since
we already killed all sub-threads either ->in_exec == 0 or we are the
only user of this ->fs.
Also, we do not need fs->lock to clear fs->in_exec.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cpu idle field in the output of /proc/stat is too small for cpus
that have been idle for more than a tick. Add the architecture hook
arch_idle_time that allows to add the not accounted idle time of a
sleeping cpu without waking the cpu.
The s390 implementation of arch_idle_time uses the already existing
s390_idle_data per_cpu variable to find the sleep time of a neighboring
idle cpu.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
GFS2 has a goal block associated with each inode indicating the
search start position for future block allocations (in fact there
are two, but thats for backward compatibility with GFS1 as they
are set to identical locations in GFS2).
In some circumstances, depending on the ordering of updates to
the inode it was possible for the goal block settings to not
be updated on disk. This patch ensures that the goal block will
always get updated, thus reducing the potential for searching
the same (already allocated) blocks again when looking for free
space during block allocation.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The new bitfit algorithm was counting from the wrong end of
64 bit words in the bitfield. This fixes it by using __ffs64
instead of fls64
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Instead of just checking that the extent block number is greater or
equal than s_first_data_block, make sure it it is not pointing into
the block group descriptors, since that is clearly wrong. This helps
prevent filesystem from getting very badly corrupted in case an extent
block is corrupted.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When using filename encryption with eCryptfs, the value of the symlink
in the lower filesystem is encrypted and stored as a Tag 70 packet.
This results in a longer symlink target than if the target value wasn't
encrypted.
Users were reporting these messages in their syslog:
[ 45.653441] ecryptfs_parse_tag_70_packet: max_packet_size is [56]; real
packet size is [51]
[ 45.653444] ecryptfs_decode_and_decrypt_filename: Could not parse tag
70 packet from filename; copying through filename as-is
This was due to bufsiz, one the arguments in readlink(), being used to
when allocating the buffer passed to the lower inode's readlink().
That symlink target may be very large, but when decoded and decrypted,
could end up being smaller than bufsize.
To fix this, the buffer passed to the lower inode's readlink() will
always be PATH_MAX in size when filename encryption is enabled. Any
necessary truncation occurs after the decoding and decrypting.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
This patch locks the lower directory inode's i_mutex before calling
lookup_one_len() to find the appropriate dentry in the lower filesystem.
This bug was found thanks to the warning set in commit 2f9092e1.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
A feature was added to the eCryptfs umount helper to automatically
unlink the keys used for an eCryptfs mount from the kernel keyring upon
umount. This patch keeps the unrecognized mount option warnings for
ecryptfs_unlink_sigs out of the logs.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
ecryptfs_passthrough is a mount option that allows eCryptfs to allow
data to be written to non-eCryptfs files in the lower filesystem. The
passthrough option was causing data corruption due to it not always
being treated as a non-eCryptfs file.
The first 8 bytes of an eCryptfs file contains the decrypted file size.
This value was being written to the non-eCryptfs files, too. Also,
extra 0x00 characters were being written to make the file size a
multiple of PAGE_CACHE_SIZE.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
The filename encryption key signature is not properly displayed in
/proc/mounts. The "ecryptfs_sig=" mount option name is displayed for
all global authentication tokens, included those for filename keys.
This patch checks the global authentication token flags to determine if
the key is a FEKEK or FNEK and prints the appropriate mount option name
before the signature.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
If data is NULL, msg_ctx->msg is set to NULL and then dereferenced
afterwards. ecryptfs_send_raw_message() is the only place that
ecryptfs_send_miscdev() is called with data being NULL, but the only
caller of that function (ecryptfs_process_helo()) is never called. In
short, there is currently no way to trigger the NULL pointer
dereference.
This patch removes the two unused functions and modifies
ecryptfs_send_miscdev() to remove the NULL dereferences.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Copies the lower inode attributes to the upper inode before passing the
upper inode to d_instantiate(). This is important for
security_d_instantiate().
The problem was discovered by a user seeing SELinux denials like so:
type=AVC msg=audit(1236812817.898:47): avc: denied { 0x100000 } for
pid=3584 comm="httpd" name="testdir" dev=ecryptfs ino=943872
scontext=root:system_r:httpd_t:s0
tcontext=root:object_r:httpd_sys_content_t:s0 tclass=file
Notice target class is file while testdir is really a directory,
confusing the permission translation (0x100000) due to the wrong i_mode.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Impact: remove possible deadlock condition
There is no reason to use mempool backed allocation for map functions.
Also, because kern mapping is used inside LLDs (e.g. for EH), using
mempool backed allocation can lead to deadlock under extreme
conditions (mempool already consumed by the time a request reached EH
and requests are blocked on EH).
Switch copy/map functions to bio_kmalloc().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Impact: fix bio_kmalloc() and its destruction path
bio_kmalloc() was broken in two ways.
* bvec_alloc_bs() first allocates bvec using kmalloc() and then
ignores it and allocates again like non-kmalloc bvecs.
* bio_kmalloc_destructor() didn't check for and free bio integrity
data.
This patch fixes the above problems. kmalloc patch is separated out
from bio_alloc_bioset() and allocates the requested number of bvecs as
inline bvecs.
* bio_alloc_bioset() no longer takes NULL @bs. None other than
bio_kmalloc() used it and outside users can't know how it was
allocated anyway.
* Define and use BIO_POOL_NONE so that pool index check in
bvec_free_bs() triggers if inline or kmalloc allocated bvec gets
there.
* Relocate destructors on top of each allocation function so that how
they're used is more clear.
Jens Axboe suggested allocating bvecs inline.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix btrfs fallocate oops and deadlock
Btrfs: use the right node in reada_for_balance
Btrfs: fix oops on page->mapping->host during writepage
Btrfs: add a priority queue to the async thread helpers
Btrfs: use WRITE_SYNC for synchronous writes
This fixes the following BUG:
# mount -o size=MM -t hugetlbfs none /huge
hugetlbfs: Bad value 'MM' for mount option 'size=MM'
------------[ cut here ]------------
kernel BUG at fs/super.c:996!
Due to
BUG_ON(!mnt->mnt_sb);
in vfs_kern_mount().
Also, remove unused #include <linux/quotaops.h>
Cc: William Irwin <wli@holomorphy.com>
Cc: <stable@kernel.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>