We used to infer alignment of IOs within a page based on the file offset,
which assumed they matched. This broke with direct IO that was not aligned
to pages (e.g., 512-byte aligned IO). We were also trusting the alignment
specified in the OSD reply, which could have been adjusted by the server.
Explicitly specify the page alignment when setting up OSD IO requests.
Signed-off-by: Sage Weil <sage@newdream.net>
If the client gets out of sync with the server message sequence number, we
normally skip low seq messages (ones we already received). The skip code
was also incrementing the expected seq, such that all subsequent messages
also appeared old and got skipped, and an eventual timeout on the osd
connection. This resulted in some lagging requests and console messages
like
[233480.882885] ceph: skipping osd22 10.138.138.13:6804 seq 2016, expected 2017
[233480.882919] ceph: skipping osd22 10.138.138.13:6804 seq 2017, expected 2018
[233480.882963] ceph: skipping osd22 10.138.138.13:6804 seq 2018, expected 2019
[233480.883488] ceph: skipping osd22 10.138.138.13:6804 seq 2019, expected 2020
[233485.219558] ceph: skipping osd22 10.138.138.13:6804 seq 2020, expected 2021
[233485.906595] ceph: skipping osd22 10.138.138.13:6804 seq 2021, expected 2022
[233490.379536] ceph: skipping osd22 10.138.138.13:6804 seq 2022, expected 2023
[233495.523260] ceph: skipping osd22 10.138.138.13:6804 seq 2023, expected 2024
[233495.923194] ceph: skipping osd22 10.138.138.13:6804 seq 2024, expected 2025
[233500.534614] ceph: tid 6023602 timed out on osd22, will reset osd
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sage Weil <sage@newdream.net>
This only happened when parse_extra_token was not passed
to ceph_parse_option() (hence, only happened in rbd).
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
These facilitate preallocation of pages so that we can encode into the pagelist
in an atomic context.
Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
The rados block device (rbd), based on osdblk, creates a block device
that is backed by objects stored in the Ceph distributed object storage
cluster. Each device consists of a single metadata object and data
striped over many data objects.
The rbd driver supports read-only snapshots.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph. This
is mostly a matter of moving files around. However, a few key pieces
of the interface change as well:
- ceph_client becomes ceph_fs_client and ceph_client, where the latter
captures the mon and osd clients, and the fs_client gets the mds client
and file system specific pieces.
- Mount option parsing and debugfs setup is correspondingly broken into
two pieces.
- The mon client gets a generic handler callback for otherwise unknown
messages (mds map, in this case).
- The basic supported/required feature bits can be expanded (and are by
ceph_fs_client).
No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.
Signed-off-by: Sage Weil <sage@newdream.net>
Several other ethtool functions leave heap uncleared (potentially) by
drivers. Some interfaces appear safe (eeprom, etc), in that the sizes
are well controlled. In some situations (e.g. unchecked error conditions),
the heap will remain unchanged in areas before copying back to userspace.
Note that these are less of an issue since these all require CAP_NET_ADMIN.
Cc: stable@kernel.org
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling ETHTOOL_GRXCLSRLALL with a large rule_cnt will allocate kernel
heap without clearing it. For the one driver (niu) that implements it,
it will leave the unused portion of heap unchanged and copy the full
contents back to userspace.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 5ed3bc7288.
It turns-out that not all drivers are calling ieee80211_tx_status from a
compatible context. Revert this for now and try again later...
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We never delete the addBA response timer, which
is typically fine, but if the station it belongs
to is deleted very quickly after starting the BA
session, before the peer had a chance to reply,
the timer may fire after the station struct has
been freed already. Therefore, we need to delete
the timer in a suitable spot -- best when the
session is being stopped (which will happen even
then) in which case the delete will be a no-op
most of the time.
I've reproduced the scenario and tested the fix.
This fixes the crash reported at
http://mid.gmane.org/4CAB6F96.6090701@candelatech.com
Cc: stable@kernel.org
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
caif_connect() might dereference a netdevice after dev_put() it.
It also doesnt check dev_get_by_index() return value and could
dereference a NULL pointer.
Fix it, using RCU to avoid taking a reference.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_headroom() is unsigned so "skb_headroom(skb) + toff" is also
unsigned and can't be less than zero. This test was added in 66d50d25:
"u32: negative offset fix" It was supposed to fix a regression.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
L2CAP doesn't permit change like MTU, FCS, TxWindow values while the
connection is alive, we can only set that before the
connection/configuration process. That can lead to bugs in the L2CAP
operation.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
vlan: dont drop packets from unknown vlans in promiscuous mode
Phonet: Correct header retrieval after pskb_may_pull
um: Proper Fix for f25c80a4: remove duplicate structure field initialization
ip_gre: Fix dependencies wrt. ipv6.
net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out()
iwl3945: queue the right work if the scan needs to be aborted
mac80211: fix use-after-free
The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids
array and attempts to ensure that only a supported hmac entry is
returned. The current code fails to do this properly - if the last id
in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the
id integer remains set after exiting the loop, and the address of an
out-of-bounds entry will be returned and subsequently used in the parent
function, causing potentially ugly memory corruption. This patch resets
the id integer to 0 on encountering an invalid id so that NULL will be
returned after finishing the loop if no valid ids are found.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two user-controlled allocations in SCTP are subsequently dereferenced as
sockaddr structs, without checking if the dereferenced struct members fall
beyond the end of the allocated chunk. There doesn't appear to be any
information leakage here based on how these members are used and
additional checking, but it's still worth fixing.
[akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion]
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent patch to allow IGMPv2 responses to IGMPv3 queries
bypasses length checks for valid query lengths, incorrectly
resets the v2_seen timer, and does not support IGMPv1.
The following patch responds with a v2 report as required
by IGMPv2 while correcting the other problems introduced
by the patch.
Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit e81963b180.
LRO is now deprecated in favour of GRO, and only a few drivers use it,
so it is desirable to build it as a module in distribution kernels.
The original change to prevent building it as a module was made in an
attempt to avoid the case where some dependents are set to y and some
to m, and INET_LRO can be set to m rather than y. However, the
Kconfig system will reliably set INET_LRO=y in this case.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the condition (3rd arg) passed to sk_wait_event() in
sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory()
causes the following soft lockup in tcp_sendmsg() when the global tcp
memory pool has exhausted.
>>> snip <<<
localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
localhost kernel: CPU 3:
localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
localhost kernel:
localhost kernel: Call Trace:
localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190
localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90
localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83
>>> snip <<<
What is happening is, that the sk_wait_event() condition passed from
sk_stream_wait_memory() evaluates to true for the case of tcp global memory
exhaustion. This is because both sk_stream_memory_free() and vm_wait are true
which causes sk_wait_event() to *not* call schedule_timeout().
Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping.
This causes the caller to again try allocation, which again fails and again
calls sk_stream_wait_memory(), and so on.
[ Bug introduced by commit c1cbe4b7ad
("[NET]: Avoid atomic xchg() for non-error case") -DaveM ]
Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>