Pull overflow updates from Kees Cook:
"This adds the new overflow checking helpers and adds them to the
2-factor argument allocators. And this adds the saturating size
helpers and does a treewide replacement for the struct_size() usage.
Additionally this adds the overflow testing modules to make sure
everything works.
I'm still working on the treewide replacements for allocators with
"simple" multiplied arguments:
*alloc(a * b, ...) -> *alloc_array(a, b, ...)
and
*zalloc(a * b, ...) -> *calloc(a, b, ...)
as well as the more complex cases, but that's separable from this
portion of the series. I expect to have the rest sent before -rc1
closes; there are a lot of messy cases to clean up.
Summary:
- Introduce arithmetic overflow test helper functions (Rasmus)
- Use overflow helpers in 2-factor allocators (Kees, Rasmus)
- Introduce overflow test module (Rasmus, Kees)
- Introduce saturating size helper functions (Matthew, Kees)
- Treewide use of struct_size() for allocators (Kees)"
* tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
treewide: Use struct_size() for devm_kmalloc() and friends
treewide: Use struct_size() for vmalloc()-family
treewide: Use struct_size() for kmalloc()-family
device: Use overflow helpers for devm_kmalloc()
mm: Use overflow helpers in kvmalloc()
mm: Use overflow helpers in kmalloc_array*()
test_overflow: Add memory allocation overflow tests
overflow.h: Add allocation size calculation helpers
test_overflow: Report test failures
test_overflow: macrofy some more, do more tests for free
lib: add runtime test of check_*_overflow functions
compiler.h: enable builtin overflow checkers and add fallback code
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
void *entry[];
};
instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
uses. It was done via automatic conversion with manual review for the
"CHECKME" non-standard cases noted below, using the following Coccinelle
script:
// pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
// sizeof *pkey_cache->table, GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@
- alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
// mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@
- alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
// Same pattern, but can't trivially locate the trailing element name,
// or variable name.
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
expression SOMETHING, COUNT, ELEMENT;
@@
- alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
+ alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)
Signed-off-by: Kees Cook <keescook@chromium.org>
ceph_con_workfn() validates con->state before calling try_read() and
then try_write(). However, try_read() temporarily releases con->mutex,
notably in process_message() and ceph_con_in_msg_alloc(), opening the
window for ceph_con_close() to sneak in, close the connection and
release con->sock. When try_write() is called on the assumption that
con->state is still valid (i.e. not STANDBY or CLOSED), a NULL sock
gets passed to the networking stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: selinux_socket_sendmsg+0x5/0x20
Make sure con->state is valid at the top of try_write() and add an
explicit BUG_ON for this, similar to try_read().
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/23706
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
If we go without an established session for a while, backoff delay will
climb to 30 seconds. The keepalive timeout is also 30 seconds, so it's
pretty easily hit after a prolonged hunting for a monitor: we don't get
a chance to send out a keepalive in time, which means we never get back
a keepalive ack in time, cutting an established session and attempting
to connect to a different monitor every 30 seconds:
[Sun Apr 1 23:37:05 2018] libceph: mon0 10.80.20.99:6789 session established
[Sun Apr 1 23:37:36 2018] libceph: mon0 10.80.20.99:6789 session lost, hunting for new mon
[Sun Apr 1 23:37:36 2018] libceph: mon2 10.80.20.103:6789 session established
[Sun Apr 1 23:38:07 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
[Sun Apr 1 23:38:07 2018] libceph: mon1 10.80.20.100:6789 session established
[Sun Apr 1 23:38:37 2018] libceph: mon1 10.80.20.100:6789 session lost, hunting for new mon
[Sun Apr 1 23:38:37 2018] libceph: mon2 10.80.20.103:6789 session established
[Sun Apr 1 23:39:08 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
The regular keepalive interval is 10 seconds. After ->hunting is
cleared in finish_hunting(), call __schedule_delayed() to ensure we
send out a keepalive after 10 seconds.
Cc: stable@vger.kernel.org # 4.7+
Link: http://tracker.ceph.com/issues/23537
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
This means that if we do some backoff, then authenticate, and are
healthy for an extended period of time, a subsequent failure won't
leave us starting our hunting sequence with a large backoff.
Mirrors ceph.git commit d466bc6e66abba9b464b0b69687cf45c9dccf383.
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Pull ceph updates from Ilya Dryomov:
"The big ticket items are:
- support for rbd "fancy" striping (myself).
The striping feature bit is now fully implemented, allowing mapping
v2 images with non-default striping patterns. This completes
support for --image-format 2.
- CephFS quota support (Luis Henriques and Zheng Yan).
This set is based on the new SnapRealm code in the upcoming v13.y.z
("Mimic") release. Quota handling will be rejected on older
filesystems.
- memory usage improvements in CephFS (Chengguang Xu).
Directory specific bits have been split out of ceph_file_info and
some effort went into improving cap reservation code to avoid OOM
crashes.
Also included a bunch of assorted fixes all over the place from
Chengguang and others"
* tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client: (67 commits)
ceph: quota: report root dir quota usage in statfs
ceph: quota: add counter for snaprealms with quota
ceph: quota: cache inode pointer in ceph_snap_realm
ceph: fix root quota realm check
ceph: don't check quota for snap inode
ceph: quota: update MDS when max_bytes is approaching
ceph: quota: support for ceph.quota.max_bytes
ceph: quota: don't allow cross-quota renames
ceph: quota: support for ceph.quota.max_files
ceph: quota: add initial infrastructure to support cephfs quotas
rbd: remove VLA usage
rbd: fix spelling mistake: "reregisteration" -> "reregistration"
ceph: rename function drop_leases() to a more descriptive name
ceph: fix invalid point dereference for error case in mdsc destroy
ceph: return proper bool type to caller instead of pointer
ceph: optimize memory usage
ceph: optimize mds session register
libceph, ceph: add __init attribution to init funcitons
ceph: filter out used flags when printing unused open flags
ceph: don't wait on writeback when there is no more dirty pages
...
This patch adds the infrastructure required to support cephfs quotas as it
is currently implemented in the ceph fuse client. Cephfs quotas can be
set on any directory, and can restrict the number of bytes or the number
of files stored beneath that point in the directory hierarchy.
Quotas are set using the extended attributes 'ceph.quota.max_files' and
'ceph.quota.max_bytes', and can be removed by setting these attributes to
'0'.
Link: http://tracker.ceph.com/issues/22372
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add __init attribution to the functions which are called only once
during initiating/registering operations and deleting unnecessary
symbol exports.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Some of message types are missing in ceph_msg_type_name(),
so just adding them for better understanding of output information.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Some of dout format do not include newline in the end,
fix for the files which are in fs/ceph and net/ceph directories,
and changing printk to dout for printing debug info in super.c
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
rbd needs this for null copyups -- if copyup data is all zeroes, we
want to save some I/O and network bandwidth. See rbd_obj_issue_copyup()
in the next commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
In preparation for rbd "fancy" striping, introduce ceph_bvec_iter for
working with bio_vec array data buffers. The wrappers are trivial, but
make it look similar to ceph_bio_iter.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The reason we clone bios is to be able to give each object request
(and consequently each ceph_osd_data/ceph_msg_data item) its own
pointer to a (list of) bio(s). The messenger then initializes its
cursor with cloned bio's ->bi_iter, so it knows where to start reading
from/writing to. That's all the cloned bios are used for: to determine
each object request's starting position in the provided data buffer.
Introduce ceph_bio_iter to do exactly that -- store position within bio
list (i.e. pointer to bio) + position within that bio (i.e. bvec_iter).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
- make it void
- xlen (object extent length) out parameter should be u32 because only
a single stripe unit is mapped at a time
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
bl, stripeno and objsetno should be u64 -- otherwise large enough files
get corrupted. How large depends on file layout:
- 4M-objects layout (default): any file over 16P
- 64K-objects layout (smallest possible object size): any file over 512T
Only CephFS is affected, rbd doesn't use ceph_calc_file_object_mapping()
yet. Fortunately, CephFS has a max_file_size configurable, the default
for which is way below both of the above numbers.
Reimplement the logic from scratch with no layout validation -- it's
done on the MDS side.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Prefer the direct use of octal for permissions.
Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
and some typing.
Miscellanea:
o Whitespace neatening around these conversions.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When parsing string option, in order to avoid memory leak we need to
carefully free it first in case of specifying same option several times.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Should check result of kstrndup() in case of memory allocation failure.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Pull ceph updates from Ilya Dryomov:
"We have a set of file locking improvements from Zheng, rbd rw/ro state
handling code cleanup from myself and some assorted CephFS fixes from
Jeff.
rbd now defaults to single-major=Y, lifting the limit of ~240 rbd
images per host for everyone"
* tag 'ceph-for-4.15-rc1' of git://github.com/ceph/ceph-client:
rbd: default to single-major device number scheme
libceph: don't WARN() if user tries to add invalid key
rbd: set discard_alignment to zero
ceph: silence sparse endianness warning in encode_caps_cb
ceph: remove the bump of i_version
ceph: present consistent fsid, regardless of arch endianness
ceph: clean up spinlocking and list handling around cleanup_cap_releases()
rbd: get rid of rbd_mapping::read_only
rbd: fix and simplify rbd_ioctl_set_ro()
ceph: remove unused and redundant variable dropping
ceph: mark expected switch fall-throughs
ceph: -EINVAL on decoding failure in ceph_mdsc_handle_fsmap()
ceph: disable cached readdir after dropping positive dentry
ceph: fix bool initialization/comparison
ceph: handle 'session get evicted while there are file locks'
ceph: optimize flock encoding during reconnect
ceph: make lock_to_ceph_filelock() static
ceph: keep auth cap when inode has flocks or posix locks
Pull get_user_pages_fast() conversion from Al Viro:
"A bunch of places switched to get_user_pages_fast()"
* 'work.get_user_pages_fast' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ceph: use get_user_pages_fast()
pvr2fs: use get_user_pages_fast()
atomisp: use get_user_pages_fast()
st: use get_user_pages_fast()
via_dmablit(): use get_user_pages_fast()
fsl_hypervisor: switch to get_user_pages_fast()
rapidio: switch to get_user_pages_fast()
vchiq_2835_arm: switch to get_user_pages_fast()