Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is a relatively common idiom (8 instances) to first look up an IDR
entry, and then remove it from the tree if it is found, possibly doing
further operations upon the entry afterwards. If we change idr_remove()
to return the removed object, all of these users can save themselves a
walk of the IDR tree.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Firewire was using is_compat_task to check whether it was in a compat
ioctl or a non-compat ioctl. Use is_compat_syscall instead so it works
properly on all architectures.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Found by the UC-KLEE tool: A user could supply less input to
firewire-cdev ioctls than write- or write/read-type ioctl handlers
expect. The handlers used data from uninitialized kernel stack then.
This could partially leak back to the user if the kernel subsequently
generated fw_cdev_event_'s (to be read from the firewire-cdev fd)
which notably would contain the _u64 closure field which many of the
ioctl argument structures contain.
The fact that the handlers would act on random garbage input is a
lesser issue since all handlers must check their input anyway.
The fix simply always null-initializes the entire ioctl argument buffer
regardless of the actual length of expected user input. That is, a
runtime overhead of memset(..., 40) is added to each firewirew-cdev
ioctl() call. [Comment from Clemens Ladisch: This part of the stack is
most likely to be already in the cache.]
Remarks:
- There was never any leak from kernel stack to the ioctl output
buffer itself. IOW, it was not possible to read kernel stack by a
read-type or write/read-type ioctl alone; the leak could at most
happen in combination with read()ing subsequent event data.
- The actual expected minimum user input of each ioctl from
include/uapi/linux/firewire-cdev.h is, in bytes:
[0x00] = 32, [0x05] = 4, [0x0a] = 16, [0x0f] = 20, [0x14] = 16,
[0x01] = 36, [0x06] = 20, [0x0b] = 4, [0x10] = 20, [0x15] = 20,
[0x02] = 20, [0x07] = 4, [0x0c] = 0, [0x11] = 0, [0x16] = 8,
[0x03] = 4, [0x08] = 24, [0x0d] = 20, [0x12] = 36, [0x17] = 12,
[0x04] = 20, [0x09] = 24, [0x0e] = 4, [0x13] = 40, [0x18] = 4.
Reported-by: David Ramos <daramos@stanford.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
An idr related patch introduced the following sparse warning:
drivers/firewire/core-cdev.c:488:33: warning: incorrect type in initializer (different base types)
drivers/firewire/core-cdev.c:488:33: expected bool [unsigned] [usertype] preload
drivers/firewire/core-cdev.c:488:33: got restricted gfp_t
So let's convert from gfp_t bitfield to Boolean explicitly and safely.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Commit 18d627113b (firewire: prevent dropping of completed iso packet
header data) was intended to be an obvious bug fix, but libdc1394 and
FlyCap2 depend on the old behaviour by ignoring all returned information
and thus not noticing that not all packets have been received yet. The
result was that the video frame buffers would be saved before they
contained the correct data.
Reintroduce the old behaviour for old clients.
Tested-by: Stepan Salenikovich <stepan.salenikovich@gmail.com>
Tested-by: Josep Bosch <jep250@gmail.com>
Cc: <stable@vger.kernel.org> # 3.4+
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Convert to the much saner new idr interface.
v2: Stefan pointed out that add_client_resource() may be called from
non-process context. Preload iff @gfp_mask contains __GFP_WAIT.
Also updated to include minor upper limit check.
[tim.gardner@canonical.com: fix accidentally orphaned 'minor'[
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix two bugs of the /dev/fw* character device concerning the
FW_CDEV_IOC_GET_INFO ioctl with nonzero fw_cdev_get_info.bus_reset.
(Practically all /dev/fw* clients issue this ioctl right after opening
the device.)
Both bugs are caused by sizeof(struct fw_cdev_event_bus_reset) being 36
without natural alignment and 40 with natural alignment.
1) Memory corruption, affecting i386 userland on amd64 kernel:
Userland reserves a 36 bytes large buffer, kernel writes 40 bytes.
This has been first found and reported against libraw1394 if
compiled with gcc 4.7 which happens to order libraw1394's stack such
that the bug became visible as data corruption.
2) Information leak, affecting all kernel architectures except i386:
4 bytes of random kernel stack data were leaked to userspace.
Hence limit the respective copy_to_user() to the 32-bit aligned size of
struct fw_cdev_event_bus_reset.
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: stable@kernel.org
Seen with recent libdc1394: If a client mmap()s the buffer of an
isochronous reception buffer with PROT_READ|PROT_WRITE instead of just
PROT_READ, firewire-core sets the wrong DMA mapping direction during
buffer initialization.
The fix is to split fw_iso_buffer_init() into allocation and DMA mapping
and to perform the latter after both buffer and DMA context were
allocated. Buffer allocation and context allocation may happen in any
order, but we need the context type (reception or transmission) in order
to set the DMA direction of the buffer.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:
perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
Signed-off-by: David Howells <dhowells@redhat.com>
Extend the kernel and userspace APIs to allow reporting all currently
completed isochronous packets, even if the next interrupt packet has not
yet been reached. This is required to determine the status of the
packets at the end of a paused or stopped stream, and useful for more
precise synchronization of audio streams.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
The buffer for the header data of completed iso packets has a fixed
size, so it is possible to configure a stream with a big interval
between interrupt packets or with big headers so that this buffer would
overflow. Previously, ohci.c would drop any data that would not fit,
but this could make unsuspecting applications believe that fewer than
the actual number of packets have completed.
Instead of dropping data, add calls to flush_iso_completion() so that
there are as many events as needed to report all of the data.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Associate all log messages from firewire-core with the respective card
because some people have more than one card. E.g.
firewire_ohci 0000:04:00.0: added OHCI v1.10 device as card 0, 8 IR + 8 IT contexts, quirks 0x0
firewire_ohci 0000:05:00.0: added OHCI v1.10 device as card 1, 8 IR + 8 IT contexts, quirks 0x0
firewire_core: created device fw0: GUID 0814438400000389, S800
firewire_core: phy config: new root=ffc1, gap_count=5
firewire_core: created device fw1: GUID 0814438400000388, S800
firewire_core: created device fw2: GUID 0001d202e06800d1, S800
turns into
firewire_ohci 0000:04:00.0: added OHCI v1.10 device as card 0, 8 IR + 8 IT contexts, quirks 0x0
firewire_ohci 0000:05:00.0: added OHCI v1.10 device as card 1, 8 IR + 8 IT contexts, quirks 0x0
firewire_core 0000:04:00.0: created device fw0: GUID 0814438400000389, S800
firewire_core 0000:04:00.0: phy config: new root=ffc1, gap_count=5
firewire_core 0000:05:00.0: created device fw1: GUID 0814438400000388, S800
firewire_core 0000:04:00.0: created device fw2: GUID 0001d202e06800d1, S800
This increases the module size slightly; to keep this in check, turn the
former printk wrapper macros into functions. Their implementation is
largely copied from driver core's dev_printk counterparts.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Clemens points out that we need to use compat_ptr() in order to safely
cast from u64 to addresses of a 32-bit usermode client.
Before, our conversion went wrong
- in practice if the client cast from pointer to integer such that
sign-extension happened, (libraw1394 and libdc1394 at least were not
doing that, IOW were not affected)
or
- in theory on s390 (which doesn't have FireWire though) and on the
tile architecture, regardless of what the client does.
The bug would usually be observed as the initial get_info ioctl failing
with "Bad address" (EFAULT).
Reported-by: Carl Karsten <carl@personnelware.com>
Reported-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Between open(2) of a /dev/fw* and the first FW_CDEV_IOC_GET_INFO
ioctl(2) on it, the kernel already queues FW_CDEV_EVENT_BUS_RESET events
to be read(2) by the client. The get_info ioctl is practically always
issued right away after open, hence this condition only occurs if the
client opens during a bus reset, especially during a rapid series of bus
resets.
The problem with this condition is twofold:
- These bus reset events carry the (as yet undocumented) @closure
value of 0. But it is not the kernel's place to choose closures;
they are privat to the client. E.g., this 0 value forced from the
kernel makes it unsafe for clients to dereference it as a pointer to
a closure object without NULL pointer check.
- It is impossible for clients to determine the relative order of bus
reset events from get_info ioctl(2) versus those from read(2),
except in one way: By comparison of closure values. Again, such a
procedure imposes complexity on clients and reduces freedom in use
of the bus reset closure.
So, change the ABI to suppress queuing of bus reset events before the
first FW_CDEV_IOC_GET_INFO ioctl was issued by the client.
Note, this ABI change cannot be version-controlled. The kernel cannot
distinguish old from new clients before the first FW_CDEV_IOC_GET_INFO
ioctl.
We will try to back-merge this change into currently maintained stable/
longterm series, and we only document the new behaviour. The old
behavior is now considered a kernel bug, which it basically is.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: <stable@kernel.org>
On Jun 27 Linus Torvalds wrote:
> The correct error code for "I don't understand this ioctl" is ENOTTY.
> The naming may be odd, but you should think of that error value as a
> "unrecognized ioctl number, you're feeding me random numbers that I
> don't understand and I assume for historical reasons that you tried to
> do some tty operation on me".
[...]
> The EINVAL thing goes way back, and is a disaster. It predates Linux
> itself, as far as I can tell. You'll find lots of man-pages that have
> this line in it:
>
> EINVAL Request or argp is not valid.
>
> and it shows up in POSIX etc. And sadly, it generally shows up
> _before_ the line that says
>
> ENOTTY The specified request does not apply to the kind of object
> that the descriptor d references.
>
> so a lot of people get to the EINVAL, and never even notice the ENOTTY.
[...]
> At least glibc (and hopefully other C libraries) use a _string_ that
> makes much more sense: strerror(ENOTTY) is "Inappropriate ioctl for
> device"
So let's correct this in the <linux/firewire-cdev.h> ABI while it is
still young, relative to distributor adoption.
Side note: We return -ENOTTY not only on _IOC_TYPE or _IOC_NR mismatch,
but also on _IOC_SIZE mismatch. An ioctl with an unsupported size of
argument structure can be seen as an unsupported version of that ioctl.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: <stable@kernel.org>
The struct sbp2_logical_unit.work items can all be executed in parallel
but are not reentrant. Furthermore, reconnect or re-login work must be
executed in a WQ_MEM_RECLAIM workqueue.
Hence replace the old single-threaded firewire-sbp2 workqueue by a
concurrency-managed but non-reentrant workqueue with rescuer.
firewire-core already maintains one, hence use this one.
In earlier versions of this change, I observed occasional failures of
parallel INQUIRY to an Initio INIC-2430 FireWire 800 to dual IDE bridge.
More testing indicates that parallel INQUIRY is not actually a problem,
but too quick successions of logout and login + INQUIRY, e.g. a quick
sequence of cable plugout and plugin, can result in failed INQUIRY.
This does not seem to be something that should or could be addressed by
serialization.
Another dual-LU device to which I currently have access to, an
OXUF924DSB FireWire 800 to dual SATA bridge with firmware from MacPower,
has been successfully tested with this too.
This change is beneficial to environments with two or more FireWire
storage devices, especially if they are located on the same bus.
Management tasks that should be performed as soon and as quickly as
possible, especially reconnect, are no longer held up by tasks on other
devices that may take a long time, especially login with INQUIRY and sd
or sr driver probe.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>