Commit Graph

1800 Commits

Author SHA1 Message Date
Prarit Bhargava
9e3961a097 kernel: add panic_on_warn
There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system.  Sometimes this is easy to do, other times (such as
in the case of a remote admin) it is not trivial to send new images to
the user.

A much easier method would be a switch to change the WARN() over to a
panic.  This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a panic_on_warn kernel parameter and
/proc/sys/kernel/panic_on_warn calls panic() in the
warn_slowpath_common() path.  The function will still print out the
location of the warning.

An example of the panic_on_warn output:

The first line below is from the WARN_ON() to output the WARN_ON()'s
location.  After that the panic() output is displayed.

    WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 30 PID: 11698 Comm: insmod Tainted: G        W  OE  3.17.0+ #57
    Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
     0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
     0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
     ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
    Call Trace:
     [<ffffffff81665190>] dump_stack+0x46/0x58
     [<ffffffff8165e2ec>] panic+0xd0/0x204
     [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
     [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
     [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
     [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
     [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
     [<ffffffff81002144>] do_one_initcall+0xd4/0x210
     [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
     [<ffffffff810f8889>] load_module+0x16a9/0x1b30
     [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
     [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
     [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
     [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17

Successfully tested by me.

hpa said: There is another very valid use for this: many operators would
rather a machine shuts down than being potentially compromised either
functionally or security-wise.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:10 -08:00
Linus Torvalds
3eb5b893eb Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MPX support from Thomas Gleixner:
 "This enables support for x86 MPX.

  MPX is a new debug feature for bound checking in user space.  It
  requires kernel support to handle the bound tables and decode the
  bound violating instruction in the trap handler"

* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  asm-generic: Remove asm-generic arch_bprm_mm_init()
  mm: Make arch_unmap()/bprm_mm_init() available to all architectures
  x86: Cleanly separate use of asm-generic/mm_hooks.h
  x86 mpx: Change return type of get_reg_offset()
  fs: Do not include mpx.h in exec.c
  x86, mpx: Add documentation on Intel MPX
  x86, mpx: Cleanup unused bound tables
  x86, mpx: On-demand kernel allocation of bounds tables
  x86, mpx: Decode MPX instruction to get bound violation information
  x86, mpx: Add MPX-specific mmap interface
  x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: Add MPX to disabled features
  ia64: Sync struct siginfo with general version
  mips: Sync struct siginfo with general version
  mpx: Extend siginfo structure to include bound violation information
  x86, mpx: Rename cfg_reg_u and status_reg
  x86: mpx: Give bndX registers actual names
  x86: Remove arbitrary instruction size limit in instruction decoder
2014-12-10 09:34:43 -08:00
Linus Torvalds
86c6a2fddf Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.

     This instruments might_sleep() checks to catch places that nest
     blocking primitives - such as mutex usage in a wait loop.  Such
     bugs can result in hard to debug races/hangs.

     Another category of invalid nesting that this facility will detect
     is the calling of blocking functions from within schedule() ->
     sched_submit_work() -> blk_schedule_flush_plug().

     There's some potential for false positives (if secondary blocking
     primitives themselves are not ready yet for this facility), but the
     kernel will warn once about such bugs per bootup, so the warning
     isn't much of a nuisance.

     This feature comes with a number of fixes, for problems uncovered
     with it, so no messages are expected normally.

   - Another round of sched/numa optimizations and refinements, for
     CONFIG_NUMA_BALANCING=y.

   - Another round of sched/dl fixes and refinements.

  Plus various smaller fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched: Add missing rcu protection to wake_up_all_idle_cpus
  sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
  sched/numa: Init numa balancing fields of init_task
  sched/deadline: Remove unnecessary definitions in cpudeadline.h
  sched/cpupri: Remove unnecessary definitions in cpupri.h
  sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
  sched/fair: Fix stale overloaded status in the busiest group finding logic
  sched: Move p->nr_cpus_allowed check to select_task_rq()
  sched/completion: Document when to use wait_for_completion_io_*()
  sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
  sched/fair: Kill task_struct::numa_entry and numa_group::task_list
  sched: Refactor task_struct to use numa_faults instead of numa_* pointers
  sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
  sched/deadline: Reschedule from switched_from_dl() after a successful pull
  sched/deadline: Push task away if the deadline is equal to curr during wakeup
  sched/deadline: Add deadline rq status print
  sched/deadline: Fix artificial overrun introduced by yield_task_dl()
  sched/rt: Clean up check_preempt_equal_prio()
  sched/core: Use dl_bw_of() under rcu_read_lock_sched()
  sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
  ...
2014-12-09 21:21:34 -08:00
Linus Torvalds
5706ffd045 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events update from Ingo Molnar:
 "On the kernel side there's few changes, the one that stands out is
  PEBS machine state sampling support on x86, by Stephane Eranian.

  On the tooling side:

  User visible tooling changes:

   - Don't open the DWARF info multiple times, keeping instead a dwfl
     handle in struct dso, greatly speeding up 'perf report' on powerpc.
     (Sukadev Bhattiprolu)

   - Introduce PARSE_OPT_DISABLED option flag and use it to avoid
     showing undersired options in tools that provides frontends to
     'perf record', like sched, kvm, etc (Namhyung Kim)

   - Fallback to kallsyms when using the minimal 'ELF' loader (Arnaldo
     Carvalho de Melo)

   - Fix annotation with kcore (Adrian Hunter)

   - Support source line numbers in annotate using a hotkey (Andi Kleen)

   - Callchain improvements including:
     * Enable printing the srcline in the history
     * Make get_srcline fall back to sym+offset (Andi Kleen)

   - TUI hist_entry browser fixes, including showing missing overhead
     value for first level callchain.  Detected comparing the output of
     --stdio/--gui (that matched) with --tui, that had this problem.
     (Namhyung Kim)

   - Support handling complete branch stacks as histograms (Andi Kleen)

  Tooling infrastructure changes:

   - Prep work for supporting per-pkg and snapshot counters in 'perf
     stat' (Jiri Olsa)

   - 'perf stat' refactorings, moving stuff from it to evsel.c to use in
     per-pkg/snapshot format changes (Jiri Olsa)

   - Add per-pkg format file parsing (Matt Fleming)

   - Clean up libelf feature support code (Namhyung Kim)

   - Add gzip decompression support for kernel modules (Namhyung Kim)

   - More prep patches for Intel PT, including a a thread stack and more
     stuff made available via the database export mechanism (Adrian
     Hunter)

   - More Intel PT work, including a facility to export sample data
     (comms, threads, symbol names, etc) in a database friendly way,
     with an script to use this to create a postgresql database.
     (Adrian Hunter)

   - Make sure that thread->mg->machine points to the machine where the
     thread exists (it was being set only for the kmaps kernel modules
     case, do it as well for the mmaps) and use it to shorten function
     signatures (Arnaldo Carvalho de Melo)

  ... and lots of other fixes and smaller improvements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits)
  perf report: In branch stack mode use address history sorting
  perf report: Add --branch-history option
  perf callchain: Support handling complete branch stacks as histograms
  perf stat: Add support for snapshot counters
  perf stat: Add support for per-pkg counters
  perf tools: Remove perf_evsel__read interface
  perf stat: Use read_counter in read_counter_aggr
  perf stat: Make read_counter work over the thread dimension
  perf stat: Use perf_evsel__read_cb in read_counter
  perf tools: Add snapshot format file parsing
  perf tools: Add per-pkg format file parsing
  perf evsel: Introduce perf_evsel__read_cb function
  perf evsel: Introduce perf_counts_values__scale function
  perf evsel: Introduce perf_evsel__compute_deltas function
  perf tools: Allow to force redirect pr_debug to stderr.
  perf tools: Fix segfault due to invalid kernel dso access
  perf callchain: Make get_srcline fall back to sym+offset
  perf symbols: Move bfd_demangle stubbing to its only user
  perf callchain: Enable printing the srcline in the history
  perf tools: Collapse first level callchain entry if it has sibling
  ...
2014-12-09 20:55:37 -08:00
Linus Torvalds
b64bb1d758 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "Here's the usual mixed bag of arm64 updates, also including some
  related EFI changes (Acked by Matt) and the MMU gather range cleanup
  (Acked by you).

  Changes include:
   - support for alternative instruction patching from Andre
   - seccomp from Akashi
   - some AArch32 instruction emulation, required by the Android folks
   - optimisations for exception entry/exit code, cmpxchg, pcpu atomics
   - mmu_gather range calculations moved into core code
   - EFI updates from Ard, including long-awaited SMBIOS support
   - /proc/cpuinfo fixes to align with the format used by arch/arm/
   - a few non-critical fixes across the architecture"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits)
  arm64: remove the unnecessary arm64_swiotlb_init()
  arm64: add module support for alternatives fixups
  arm64: perf: Prevent wraparound during overflow
  arm64/include/asm: Fixed a warning about 'struct pt_regs'
  arm64: Provide a namespace to NCAPS
  arm64: bpf: lift restriction on last instruction
  arm64: Implement support for read-mostly sections
  arm64: compat: align cacheflush syscall with arch/arm
  arm64: add seccomp support
  arm64: add SIGSYS siginfo for compat task
  arm64: add seccomp syscall for compat task
  asm-generic: add generic seccomp.h for secure computing mode 1
  arm64: ptrace: allow tracer to skip a system call
  arm64: ptrace: add NT_ARM_SYSTEM_CALL regset
  arm64: Move some head.text functions to executable section
  arm64: jump labels: NOP out NOP -> NOP replacement
  arm64: add support to dump the kernel page tables
  arm64: Add FIX_HOLE to permanent fixed addresses
  arm64: alternatives: fix pr_fmt string for consistency
  arm64: vmlinux.lds.S: don't discard .exit.* sections at link-time
  ...
2014-12-09 13:12:47 -08:00
Linus Torvalds
a4a26e8e92 Merge tag 'nios2-v3.19-rc1' of git://git.rocketboards.org/linux-socfpga-next
Pull Altera Nios II processor support from Ley Foon Tan:
 "Here is the Linux port for Nios II processor (from Altera) arch/nios2/
  tree for v3.19.

  The patchset has been discussed on the kernel mailing lists since
  April and has gone through 6 revisions of review.  The additional
  changes since then have been mostly further cleanups and fixes when
  merged with other trees.

  The arch code is in arch/nios2 and one asm-generic change (acked by
  Arnd)"

Arnd Bergmann says:
 "I've reviewed the architecture port in the past and it looks good in
  its latest version"

Acked-by: Arnd Bergmann <arnd@arndb.de>

* tag 'nios2-v3.19-rc1' of git://git.rocketboards.org/linux-socfpga-next: (40 commits)
  nios2: Make NIOS2_CMDLINE_IGNORE_DTB depend on CMDLINE_BOOL
  nios2: Add missing NR_CPUS to Kconfig
  nios2: asm-offsets: Remove unused definition TI_TASK
  nios2: Remove write-only struct member from nios2_timer
  nios2: Remove unused extern declaration of shm_align_mask
  nios2: include linux/type.h in io.h
  nios2: move include asm-generic/io.h to end of file
  nios2: remove include asm-generic/iomap.h from io.h
  nios2: remove unnecessary space before define
  nios2: fix error handling of irq_of_parse_and_map
  nios2: Use IS_ENABLED instead of #ifdefs to check config symbols
  nios2: Build infrastructure
  Documentation: Add documentation for Nios2 architecture
  MAINTAINERS: Add nios2 maintainer
  nios2: ptrace support
  nios2: Module support
  nios2: Nios2 registers
  nios2: Miscellaneous header files
  nios2: Cpuinfo handling
  nios2: Time keeping
  ...
2014-12-09 12:45:41 -08:00
Linus Torvalds
140dfc9299 Merge tag 'dm-3.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:

 - Significant DM thin-provisioning performance improvements to meet
   performance requirements that were requested by the Gluster
   distributed filesystem.

   Specifically, dm-thinp now takes care to aggregate IO that will be
   issued to the same thinp block before issuing IO to the underlying
   devices.  This really helps improve performance on HW RAID6 devices
   that have a writeback cache because it avoids RMW in the HW RAID
   controller.

 - Some stable fixes: fix leak in DM bufio if integrity profiles were
   enabled, use memzero_explicit in DM crypt to avoid any potential for
   information leak, and a DM cache fix to properly mark a cache block
   dirty if it was promoted to the cache via the overwrite optimization.

 - A few simple DM persistent data library fixes

 - DM cache multiqueue policy block promotion improvements.

 - DM cache discard improvements that take advantage of range
   (multiblock) discard support in the DM bio-prison.  This allows for
   much more efficient bulk discard processing (e.g.  when mkfs.xfs
   discards the entire device).

 - Some small optimizations in DM core and RCU deference cleanups

 - DM core changes to suspend/resume code to introduce the new internal
   suspend/resume interface that the DM thin-pool target now uses to
   suspend/resume active thin devices when the thin-pool must
   suspend/resume.

   This avoids forcing userspace to track all active thin volumes in a
   thin-pool when the thin-pool is suspended for the purposes of
   metadata or data space resize.

* tag 'dm-3.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (49 commits)
  dm crypt: use memzero_explicit for on-stack buffer
  dm space map metadata: fix sm_bootstrap_get_count()
  dm space map metadata: fix sm_bootstrap_get_nr_blocks()
  dm bufio: fix memleak when using a dm_buffer's inline bio
  dm cache: fix spurious cell_defer when dealing with partial block at end of device
  dm cache: dirty flag was mistakenly being cleared when promoting via overwrite
  dm cache: only use overwrite optimisation for promotion when in writeback mode
  dm cache: discard block size must be a multiple of cache block size
  dm cache: fix a harmless race when working out if a block is discarded
  dm cache: when reloading a discard bitset allow for a different discard block size
  dm cache: fix some issues with the new discard range support
  dm array: if resizing the array is a noop set the new root to the old one
  dm: use rcu_dereference_protected instead of rcu_dereference
  dm thin: fix pool_io_hints to avoid looking at max_hw_sectors
  dm thin: suspend/resume active thin devices when reloading thin-pool
  dm: enhance internal suspend and resume interface
  dm thin: do not allow thin device activation while pool is suspended
  dm: add presuspend_undo hook to target_type
  dm: return earlier from dm_blk_ioctl if target doesn't implement .ioctl
  dm thin: remove stale 'trim' message in block comment above pool_message
  ...
2014-12-08 21:10:03 -08:00
Ley Foon Tan
4fdace8d4f Add ELF machine define for Nios2
Signed-off-by: Ley Foon Tan <lftan@altera.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2014-12-08 12:55:57 +08:00
Masahiro Yamada
d0747f10ed uapi: fix to export linux/vm_sockets.h
A typo "header=y" was introduced by commit 7071cf7fc4 ("uapi: add
missing network related headers to kbuild").

Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-04 15:28:40 -08:00
AKASHI Takahiro
766a85d7bc arm64: ptrace: add NT_ARM_SYSTEM_CALL regset
This regeset is intended to be used to get and set a system call number
while tracing.
There was some discussion about possible approaches to do so:

(1) modify x8 register with ptrace(PTRACE_SETREGSET) indirectly,
    and update regs->syscallno later on in syscall_trace_enter(), or
(2) define a dedicated regset for this purpose as on s390, or
(3) support ptrace(PTRACE_SET_SYSCALL) as on arch/arm

Thinking of the fact that user_pt_regs doesn't expose 'syscallno' to
tracer as well as that secure_computing() expects a changed syscall number,
especially case of -1, to be visible before this function returns in
syscall_trace_enter(), (1) doesn't work well.
We will take (2) since it looks much cleaner.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-11-28 10:19:49 +00:00
Jussi Laako
d42472ecff ALSA: pcm: Add big-endian DSD sample formats and fix XMOS DSD sample format
This patch fixes XMOS DSD sample format to DSD_U32_BE and also adds
DSD_U16_BE and DSD_U32_BE sample formats.

Signed-off-by: Jussi Laako <jussi@sonarnerd.net>
Acked-by: Jurgen Kramer <gtmkramer@xs4all.nl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-11-21 15:13:28 +01:00
Mike Snitzer
ffcc393641 dm: enhance internal suspend and resume interface
Rename dm_internal_{suspend,resume} to dm_internal_{suspend,resume}_fast
-- dm-stats will continue using these methods to avoid all the extra
suspend/resume logic that is not needed in order to quickly flush IO.

Introduce dm_internal_suspend_noflush() variant that actually calls the
mapped_device's target callbacks -- otherwise target-specific hooks are
avoided (e.g. dm-thin's thin_presuspend and thin_postsuspend).  Common
code between dm_internal_{suspend_noflush,resume} and
dm_{suspend,resume} was factored out as __dm_{suspend,resume}.

Update dm_internal_{suspend_noflush,resume} to always take and release
the mapped_device's suspend_lock.  Also update dm_{suspend,resume} to be
aware of potential for DM_INTERNAL_SUSPEND_FLAG to be set and respond
accordingly by interruptibly waiting for the DM_INTERNAL_SUSPEND_FLAG to
be cleared.  Add lockdep annotation to dm_suspend() and dm_resume().

The existing DM_SUSPEND_FLAG remains unchanged.
DM_INTERNAL_SUSPEND_FLAG is set by dm_internal_suspend_noflush() and
cleared by dm_internal_resume().

Both DM_SUSPEND_FLAG and DM_INTERNAL_SUSPEND_FLAG may be set if a device
was already suspended when dm_internal_suspend_noflush() was called --
this can be thought of as a "nested suspend".  A "nested suspend" can
occur with legacy userspace dm-thin code that might suspend all active
thin volumes before suspending the pool for resize.

But otherwise, in the normal dm-thin-pool suspend case moving forward:
the thin-pool will have DM_SUSPEND_FLAG set and all active thins from
that thin-pool will have DM_INTERNAL_SUSPEND_FLAG set.

Also add DM_INTERNAL_SUSPEND_FLAG to status report.  This new
DM_INTERNAL_SUSPEND_FLAG state is being reported to assist with
debugging (e.g. 'dmsetup info' will report an internally suspended
device accordingly).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
2014-11-19 12:31:17 -05:00
Mike Snitzer
d67ee213fa dm: add presuspend_undo hook to target_type
The DM thin-pool target now must undo the changes performed during
pool_presuspend() so introduce presuspend_undo hook in target_type.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
2014-11-19 11:24:59 -05:00
Dave Hansen
fe3d197f84 x86, mpx: On-demand kernel allocation of bounds tables
This is really the meat of the MPX patch set.  If there is one patch to
review in the entire series, this is the one.  There is a new ABI here
and this kernel code also interacts with userspace memory in a
relatively unusual manner.  (small FAQ below).

Long Description:

This patch adds two prctl() commands to provide enable or disable the
management of bounds tables in kernel, including on-demand kernel
allocation (See the patch "on-demand kernel allocation of bounds tables")
and cleanup (See the patch "cleanup unused bound tables"). Applications
do not strictly need the kernel to manage bounds tables and we expect
some applications to use MPX without taking advantage of this kernel
support. This means the kernel can not simply infer whether an application
needs bounds table management from the MPX registers.  The prctl() is an
explicit signal from userspace.

PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
require kernel's help in managing bounds tables.

PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't
want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
won't allocate and free bounds tables even if the CPU supports MPX.

PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
directory out of a userspace register (bndcfgu) and then cache it into
a new field (->bd_addr) in  the 'mm_struct'.  PR_MPX_DISABLE_MANAGEMENT
will set "bd_addr" to an invalid address.  Using this scheme, we can
use "bd_addr" to determine whether the management of bounds tables in
kernel is enabled.

Also, the only way to access that bndcfgu register is via an xsaves,
which can be expensive.  Caching "bd_addr" like this also helps reduce
the cost of those xsaves when doing table cleanup at munmap() time.
Unfortunately, we can not apply this optimization to #BR fault time
because we need an xsave to get the value of BNDSTATUS.

==== Why does the hardware even have these Bounds Tables? ====

MPX only has 4 hardware registers for storing bounds information.
If MPX-enabled code needs more than these 4 registers, it needs to
spill them somewhere. It has two special instructions for this
which allow the bounds to be moved between the bounds registers
and some new "bounds tables".

They are similar conceptually to a page fault and will be raised by
the MPX hardware during both bounds violations or when the tables
are not present. This patch handles those #BR exceptions for
not-present tables by carving the space out of the normal processes
address space (essentially calling the new mmap() interface indroduced
earlier in this patch set.) and then pointing the bounds-directory
over to it.

The tables *need* to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register pointing to
memory is dereferenced. Any direct kernel involvement (like a syscall)
to access the tables would obviously destroy performance.

==== Why not do this in userspace? ====

This patch is obviously doing this allocation in the kernel.
However, MPX does not strictly *require* anything in the kernel.
It can theoretically be done completely from userspace. Here are
a few ways this *could* be done. I don't think any of them are
practical in the real-world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so
   that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*. An X-GB virtual
   area needs 4*X GB of virtual space, plus 2GB for the bounds
   directory. If we were to preallocate them for the 128TB of
   user virtual address space, we would need to reserve 512TB+2GB,
   which is larger than the entire virtual address space today.
   This means they can not be reserved ahead of time. Also, a
   single process's pre-popualated bounds directory consumes 2GB
   of virtual *AND* physical memory. IOW, it's completely
   infeasible to prepopulate bounds directories.

Q: Can we preallocate bounds table space at the same time memory
   is allocated which might contain pointers that might eventually
   need bounds tables?
A: This would work if we could hook the site of each and every
   memory allocation syscall. This can be done for small,
   constrained applications. But, it isn't practical at a larger
   scale since a given app has no way of controlling how all the
   parts of the app might allocate memory (think libraries). The
   kernel is really the only place to intercept these calls.

Q: Could a bounds fault be handed to userspace and the tables
   allocated there in a signal handler instead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of safe async
   handler functions and even if mmap() would work it still
   requires locking or nasty tricks to keep track of the
   allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.

Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Qiaowei Ren
ee1b58d36a mpx: Extend siginfo structure to include bound violation information
This patch adds new fields about bound violation into siginfo
structure. si_lower and si_upper are respectively lower bound
and upper bound when bound violation is caused.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151819.1908C900@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Stephane Eranian
60e2364e60 perf: Add ability to sample machine state on interrupt
Enable capture of interrupted machine state for each sample.

Registers to sample are passed per event in the sample_regs_intr bitmask.

To sample interrupt machine state, the PERF_SAMPLE_INTR_REGS must be passed in
sample_type.

The list of available registers is arch dependent and provided by asm/perf_regs.h

Registers are laid out as u64 in the order of the bit order of sample_intr_regs.

This patch also adds a new ABI version PERF_ATTR_SIZE_VER4 because we extend
the perf_event_attr struct with a new u64 field.

Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: cebbert.lkml@gmail.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/1411559322-16548-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:57 +01:00
Chen Hanxiao
f622b429da sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
Remove question mark:

 s/New utsname group?/New utsname namespace

Unified style for IPC:

 s/New ipcs/New ipc namespace

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/1415091082-15093-1-git-send-email-chenhanxiao@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:58:53 +01:00
Gregory Fong
66f1c44887 bridge: include in6.h in if_bridge.h for struct in6_addr
if_bridge.h uses struct in6_addr ip6, but wasn't including the in6.h
header.  Thomas Backlund originally sent a patch to do this, but this
revealed a redefinition issue: https://lkml.org/lkml/2013/1/13/116

The redefinition issue should have been fixed by the following Linux
commits:
ee262ad827 inet: defines IPPROTO_* needed for module alias generation
cfd280c912 net: sync some IP headers with glibc

and the following glibc commit:
6c82a2f8d7c8e21e39237225c819f182ae438db3 Coordinate IPv6 definitions for Linux and glibc

so actually include the header now.

Reported-by: Colin Guthrie <colin@mageia.org>
Reported-by: Christiaan Welvaart <cjw@daneel.dyndns.org>
Reported-by: Thomas Backlund <tmb@mageia.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 17:13:34 -05:00
stephen hemminger
7071cf7fc4 uapi: add missing network related headers to kbuild
The makefile for sanitizing kernel headers uses the kbuild file
to determine which files to do. Several networking related headers
were missing. Without these headers iproute2 build would break.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 12:33:29 -05:00
Linus Torvalds
f5fa363026 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Various scheduler fixes all over the place: three SCHED_DL fixes,
  three sched/numa fixes, two generic race fixes and a comment fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/dl: Fix preemption checks
  sched: Update comments for CLONE_NEWNS
  sched: stop the unbound recursion in preempt_schedule_context()
  sched/fair: Fix division by zero sysctl_numa_balancing_scan_size
  sched/fair: Care divide error in update_task_scan_period()
  sched/numa: Fix unsafe get_task_struct() in task_numa_assign()
  sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer()
  sched/deadline: Don't replenish from a !SCHED_DEADLINE entity
  sched: Fix race between task_group and sched_task_group
2014-10-31 14:05:35 -07:00
Linus Torvalds
5656b408ff Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, plus on the kernel side:

   - a revert for a newly introduced PMU driver which isn't complete yet
     and where we ran out of time with fixes (to be tried again in
     v3.19) - this makes up for a large chunk of the diffstat.

   - compilation warning fixes

   - a printk message fix

   - event_idx usage fixes/cleanups"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf probe: Trivial typo fix for --demangle
  perf tools: Fix report -F dso_from for data without branch info
  perf tools: Fix report -F dso_to for data without branch info
  perf tools: Fix report -F symbol_from for data without branch info
  perf tools: Fix report -F symbol_to for data without branch info
  perf tools: Fix report -F mispredict for data without branch info
  perf tools: Fix report -F in_tx for data without branch info
  perf tools: Fix report -F abort for data without branch info
  perf tools: Make CPUINFO_PROC an array to support different kernel versions
  perf callchain: Use global caching provided by libunwind
  perf/x86/intel: Revert incomplete and undocumented Broadwell client support
  perf/x86: Fix compile warnings for intel_uncore
  perf: Fix typos in sample code in the perf_event.h header
  perf: Fix and clean up initialization of pmu::event_idx
  perf: Fix bogus kernel printk
  perf diff: Add missing hists__init() call at tool start
2014-10-31 14:01:47 -07:00
Linus Torvalds
7f474df0a7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
 - workarounds for a couple of misbehaving Elan Touchscreens, by Adel
   Gadllah
 - fix for TransducerSerialNumber field implementation, by Jason Gerecke
 - a couple of new HID usages (added by HUT), by Olivier Gay

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: input: Fix TransducerSerialNumber implementation
  HID: add keyboard input assist hid usages
  HID: usbhid: enable always-poll quirk for Elan Touchscreen 016f
  HID: usbhid: enable always-poll quirk for Elan Touchscreen 009b
2014-10-29 11:52:35 -07:00
Andy Lutomirski
b438b1ab35 perf: Fix typos in sample code in the perf_event.h header
struct perf_event_mmap_page has members called "index" and
"cap_user_rdpmc".  Spell them correctly in the examples.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/320ba26391a8123cc16e5f02d24d34bd404332fd.1412313343.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:51:02 +01:00
Chen Hanxiao
fcd964dda5 sched: Update comments for CLONE_NEWNS
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/1412674147-8941-1-git-send-email-chenhanxiao@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:46:08 +01:00
Linus Torvalds
f7e87a44ef Merge tag 'media/v3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
 "A series of driver fixes:
   - a few compilation fixes with randconfigs
   - one potential compilation breakage on userspace due to the usage of
     a gcc extension
   - several warnings fixed
   - some other random driver fixes"

* tag 'media/v3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (22 commits)
  [media] s5p-jpeg: Avoid -Wuninitialized warning in s5p_jpeg_parse_hdr
  [media] s5p-fimc: Only build suspend/resume for PM
  [media] s5p-jpeg: Only build suspend/resume for PM
  [media] Remove references to non-existent PLAT_S5P symbol
  [media] videobuf-dma-contig: set vm_pgoff to be zero to pass the sanity check in vm_iomap_memory()
  [media] tw68: remove bogus I2C_ALGOBIT dependency
  [media] usbvision-video: two use after frees
  [media] tw68: remove deprecated IRQF_DISABLED
  [media] xc5000: use after free in release()
  [media] em28xx-input: NULL dereference on error
  [media] wl128x: fix fmdbg compiler warning
  Revert "[media] v4l2-dv-timings: fix a sparse warning"
  [media] hackrf: harmless off by one in debug code
  [media] cx23885: initialize config structs for T9580
  [media] v4l: uvcvideo: Fix buffer completion size check
  [media] vivid: fix buffer overrun
  [media] saa7146: Create a device name before it's used
  [media] em28xx: fix uninitialized variable warning
  [media] vivid: fix Kconfig FB dependency
  [media] anysee: make sure loading modules is const
  ...
2014-10-27 15:05:40 -07:00