Commit Graph

3658 Commits

Author SHA1 Message Date
Vlastimil Babka edf14cdbf9 mm, printk: introduce new format string for flags
In mm we use several kinds of flags bitfields that are sometimes printed
for debugging purposes, or exported to userspace via sysfs.  To make
them easier to interpret independently on kernel version and config, we
want to dump also the symbolic flag names.  So far this has been done
with repeated calls to pr_cont(), which is unreliable on SMP, and not
usable for e.g.  sysfs export.

To get a more reliable and universal solution, this patch extends
printk() format string for pointers to handle the page flags (%pGp),
gfp_flags (%pGg) and vma flags (%pGv).  Existing users of
dump_flag_names() are converted and simplified.

It would be possible to pass flags by value instead of pointer, but the
%p format string for pointers already has extensions for various kernel
structures, so it's a good fit, and the extra indirection in a
non-critical path is negligible.

[linux@rasmusvillemoes.dk: lots of good implementation suggestions]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Linus Torvalds e71c2c1eeb Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Main kernel side changes:

   - Big reorganization of the x86 perf support code.  The old code grew
     organically deep inside arch/x86/kernel/cpu/perf* and its naming
     became somewhat messy.

     The new location is under arch/x86/events/, using the following
     cleaner hierarchy of source code files:

       perf/x86: Move perf_event.c .................. => x86/events/core.c
       perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
       perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
       perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
       perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
       perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
       perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
       perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
       perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
       perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
       perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
       perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
       perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
       perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
       perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c
       perf/x86: Move perf_event_intel_uncore_snb.c   => x86/events/intel/uncore_snb.c
       perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
       perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
       perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
       perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
       perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c

     (Borislav Petkov)

   - Update various x86 PMU constraint and hw support details (Stephane
     Eranian)

   - Optimize kprobes for BPF execution (Martin KaFai Lau)

   - Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
     Gleixner)

   - Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)

   - Various fixes and smaller cleanups.

  There are lots of perf tooling updates as well.  A few highlights:

  perf report/top:

     - Hierarchy histogram mode for 'perf top' and 'perf report',
       showing multiple levels, one per --sort entry: (Namhyung Kim)

       On a mostly idle system:

         # perf top --hierarchy -s comm,dso

       Then expand some levels and use 'P' to take a snapshot:

         # cat perf.hist.0
         -  92.32%         perf
               58.20%         perf
               22.29%         libc-2.22.so
                5.97%         [kernel]
                4.18%         libelf-0.165.so
                1.69%         [unknown]
         -   4.71%         qemu-system-x86
                3.10%         [kernel]
                1.60%         qemu-system-x86_64 (deleted)
         +   2.97%         swapper
         #

     - Add 'L' hotkey to dynamicly set the percent threshold for
       histogram entries and callchains, i.e.  dynamicly do what the
       --percent-limit command line option to 'top' and 'report' does.
       (Namhyung Kim)

  perf mem:

     - Allow specifying events via -e in 'perf mem record', also listing
       what events can be specified via 'perf mem record -e list' (Jiri
       Olsa)

  perf record:

     - Add 'perf record' --all-user/--all-kernel options, so that one
       can tell that all the events in the command line should be
       restricted to the user or kernel levels (Jiri Olsa), i.e.:

         perf record -e cycles:u,instructions:u

       is equivalent to:

         perf record --all-user -e cycles,instructions

     - Make 'perf record' collect CPU cache info in the perf.data file header:

         $ perf record usleep 1
         [ perf record: Woken up 1 times to write data ]
         [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
         $ perf report --header-only -I | tail -10 | head -8
         # CPU cache info:
         #  L1 Data                 32K [0-1]
         #  L1 Instruction          32K [0-1]
         #  L1 Data                 32K [2-3]
         #  L1 Instruction          32K [2-3]
         #  L2 Unified             256K [0-1]
         #  L2 Unified             256K [2-3]
         #  L3 Unified            4096K [0-3]

       Will be used in 'perf c2c' and eventually in 'perf diff' to
       allow, for instance running the same workload in multiple
       machines and then when using 'diff' show the hardware difference.
       (Jiri Olsa)

     - Improved support for Java, using the JVMTI agent library to do
       jitdumps that then will be inserted in synthesized
       PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
       ELF files stored in ~/.debug and keyed with build-ids, to allow
       symbol resolution and even annotation with source line info, see
       the changeset comments to see how to use it (Stephane Eranian)

  perf script/trace:

     - Decode data_src values (e.g.  perf.data files generated by 'perf
       mem record') in 'perf script': (Jiri Olsa)

         # perf script
           perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No <SNIP>
                                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     - Improve support to 'data_src', 'weight' and 'addr' fields in
       'perf script' (Jiri Olsa)

     - Handle empty print fmts in 'perf script -s' i.e. when running
       python or perl scripts (Taeung Song)

  perf stat:

     - 'perf stat' now shows shadow metrics (insn per cycle, etc) in
       interval mode too.  E.g:

         # perf stat -I 1000 -e instructions,cycles sleep 1
         #         time   counts unit events
            1.000215928  519,620      instructions     #  0.69 insn per cycle
            1.000215928  752,003      cycles
         <SNIP>

     - Port 'perf kvm stat' to PowerPC (Hemant Kumar)

     - Implement CSV metrics output in 'perf stat' (Andi Kleen)

  perf BPF support:

     - Support converting data from bpf events in 'perf data' (Wang Nan)

     - Print bpf-output events in 'perf script': (Wang Nan).

         # perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
         # perf script
            usleep  4882 21384.532523:   evt:  ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
             BPF output: 0000: 52 61 69 73 65 20 61 20  Raise a
                         0008: 42 50 46 20 65 76 65 6e  BPF even
                         0010: 74 21 00 00              t!..
             BPF string: "Raise a BPF event!"
         #

     - Add API to set values of map entries in a BPF object, be it
       individual map slots or ranges (Wang Nan)

     - Introduce support for the 'bpf-output' event (Wang Nan)

     - Add glue to read perf events in a BPF program (Wang Nan)

     - Improve support for bpf-output events in 'perf trace' (Wang Nan)

  ... and tons of other changes as well - see the shortlog and git log
  for details!"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
  perf stat: Add --metric-only support for -A
  perf stat: Implement --metric-only mode
  perf stat: Document CSV format in manpage
  perf hists browser: Check sort keys before hot key actions
  perf hists browser: Allow thread filtering for comm sort key
  perf tools: Add sort__has_comm variable
  perf tools: Recalc total periods using top-level entries in hierarchy
  perf tools: Remove nr_sort_keys field
  perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
  perf tools: Remove hist_entry->fmt field
  perf tools: Fix command line filters in hierarchy mode
  perf tools: Add more sort entry check functions
  perf tools: Fix hist_entry__filter() for hierarchy
  perf jitdump: Build only on supported archs
  tools lib traceevent: Add '~' operation within arg_num_eval()
  perf tools: Omit unnecessary cast in perf_pmu__parse_scale
  perf tools: Pass perf_hpp_list all the way through setup_sort_list
  perf tools: Fix perf script python database export crash
  perf jitdump: DWARF is also needed
  perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
  ...
2016-03-14 17:58:53 -07:00
Ingo Molnar 6cbe9e4a22 Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 10:28:27 +01:00
Dan Williams d77a117e68 list: kill list_force_poison()
Given we have uninitialized list_heads being passed to list_add() it
will always be the case that those uninitialized values randomly trigger
the poison value.  Especially since a list_add() operation will seed the
stack with the poison value for later stack allocations to trip over.

For example, see these two false positive reports:

  list_add attempted on force-poisoned entry
  WARNING: at lib/list_debug.c:34
  [..]
  NIP [c00000000043c390] __list_add+0xb0/0x150
  LR [c00000000043c38c] __list_add+0xac/0x150
  Call Trace:
    __list_add+0xac/0x150 (unreliable)
    __down+0x4c/0xf8
    down+0x68/0x70
    xfs_buf_lock+0x4c/0x150 [xfs]

  list_add attempted on force-poisoned entry(0000000000000500),
   new->next == d0000000059ecdb0, new->prev == 0000000000000500
  WARNING: at lib/list_debug.c:33
  [..]
  NIP [c00000000042db78] __list_add+0xa8/0x140
  LR [c00000000042db74] __list_add+0xa4/0x140
  Call Trace:
    __list_add+0xa4/0x140 (unreliable)
    rwsem_down_read_failed+0x6c/0x1a0
    down_read+0x58/0x60
    xfs_log_commit_cil+0x7c/0x600 [xfs]

Fixes: commit 5c2c2587b1 ("mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Tested-by: Eryu Guan <eguan@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09 15:43:42 -08:00
Ingo Molnar 39a1142dbb Merge tag 'v4.5-rc6' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:55:22 +01:00
Thomas Gleixner 3712bba1a2 cpumask: Export cpumask_any_but()
Almost every cpumask function is exported, just not the one I need to make the
Intel uncore driver modular.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.878299859@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:20 +01:00
Jason Andryuk a68075908a lib/ucs2_string: Correct ucs2 -> utf8 conversion
The comparisons should be >= since 0x800 and 0x80 require an additional bit
to store.

For the 3 byte case, the existing shift would drop off 2 more bits than
intended.

For the 2 byte case, there should be 5 bits bits in byte 1, and 6 bits in
byte 2.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Matthew Garrett <mjg59@coreos.com>
Cc: "Lee, Chun-Yi" <jlee@suse.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-02-16 12:49:05 +00:00
Ingo Molnar 4682c211a8 Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:

 * Prevent accidental deletion of EFI variables through efivarfs that
   may brick machines. We use a whitelist of known-safe variables to
   allow things like installing distributions to work out of the box, and
   instead restrict vendor-specific variable deletion by making
   non-whitelist variables immutable (Peter Jones)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 13:14:57 +01:00
Linus Torvalds 60f40585c9 Merge tag 'driver-core-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
 "Here is one driver core, well klist, fix for 4.5-rc4.

  It fixes a problem found in the scsi device list traversal that
  probably also could be triggered by other subsystems.

  The fix has been in linux-next for a while with no reported problems"

* tag 'driver-core-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  klist: fix starting point removed bug in klist iterators
2016-02-14 12:34:53 -08:00
Ingo Molnar e2d6f8a5f5 Merge branch 'linus' into locking/core, to resolve conflicts
Conflicts:
	kernel/locking/lockdep.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-13 08:30:07 +01:00
Jason A. Donenfeld 7eb3912994 vsprintf: kptr_restrict is okay in IRQ when 2
The kptr_restrict flag, when set to 1, only prints the kernel address
when the user has CAP_SYSLOG.  When it is set to 2, the kernel address
is always printed as zero.  When set to 1, this needs to check whether
or not we're in IRQ.

However, when set to 2, this check is unneccessary, and produces
confusing results in dmesg.  Thus, only make sure we're not in IRQ when
mode 1 is used, but not mode 2.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11 18:35:48 -08:00
Yang Shi 7707535ab9 ubsan: cosmetic fix to Kconfig text
When enabling UBSAN_SANITIZE_ALL, the kernel image size gets increased
significantly (~3x).  So, it sounds better to have some note in Kconfig.

And, fixed a typo.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11 18:35:48 -08:00
Linus Torvalds 9aece75c13 Merge branch 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
 "Workqueue fixes for v4.5-rc3.

   - Remove a spurious triggering of flush dependency warning.

   - Officially break local execution guarantee of unbound work items
     and add a debug feature to flush out usages which depend on it.

   - Work around CPU -> NODE mapping becoming invalid on CPU offline.

  The branch is young but pushing out early as stable kernels are being
  affected"

* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
  workqueue: implement "workqueue.debug_force_rr_cpu" debug feature
  workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs
  Revert "workqueue: make sure delayed work run in local cpu"
  workqueue: skip flush dependency checks for legacy workqueues
2016-02-10 11:04:05 -08:00
Peter Jones 73500267c9 lib/ucs2_string: Add ucs2 -> utf8 helper functions
This adds ucs2_utf8size(), which tells us how big our ucs2 string is in
bytes, and ucs2_as_utf8, which translates from ucs2 to utf8..

Signed-off-by: Peter Jones <pjones@redhat.com>
Tested-by: Lee, Chun-Yi <jlee@suse.com>
Acked-by: Matthew Garrett <mjg59@coreos.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-02-10 13:19:03 +00:00
Tejun Heo f303fccb82 workqueue: implement "workqueue.debug_force_rr_cpu" debug feature
Workqueue used to guarantee local execution for work items queued
without explicit target CPU.  The guarantee is gone now which can
break some usages in subtle ways.  To flush out those cases, this
patch implements a debug feature which forces round-robin CPU
selection for all such work items.

The debug feature defaults to off and can be enabled with a kernel
parameter.  The default can be flipped with a debug config option.

If you hit this commit during bisection, please refer to 041bd12e27
("Revert "workqueue: make sure delayed work run in local cpu"") for
more information and ping me.

Signed-off-by: Tejun Heo <tj@kernel.org>
2016-02-09 17:59:38 -05:00
Ingo Molnar 3aa6b46c6d Merge branch 'locking/urgent' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 12:03:15 +01:00
Arnd Bergmann 975db45e9c locking/static_keys: Avoid nested functions
clang does not support nested functions inside of an array definition:

  lib/test_static_keys.c:105:16: error: function definition is not allowed here
                          .test_key       = test_key_func(&old_true_key, static_key_true),
  lib/test_static_keys.c:50:20: note: expanded from macro 'test_key_func'
          ({bool func(void) { return branch(key); } func; })

That code appears to be a little too clever, so this simplifies it
a bit by defining functions outside of the array.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1454942223-2781480-1-git-send-email-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 10:27:29 +01:00
Masahiro Yamada 4ba6a2b28f scatterlist: fix a typo in comment block of sg_miter_stop()
Fix the doubled "started" and tidy up the following sentences.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-08 10:15:17 -08:00
James Bottomley 00cd29b799 klist: fix starting point removed bug in klist iterators
The starting node for a klist iteration is often passed in from
somewhere way above the klist infrastructure, meaning there's no
guarantee the node is still on the list.  We've seen this in SCSI where
we use bus_find_device() to iterate through a list of devices.  In the
face of heavy hotplug activity, the last device returned by
bus_find_device() can be removed before the next call.  This leads to

Dec  3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at include/linux/kref.h:47 klist_iter_init_node+0x3d/0x50()
Dec  3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
Dec  3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Not tainted 4.4.0-rc1+ #2
Dec  3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
Dec  3 13:22:02 localhost kernel: ffffffff81a20e77 ffff880613acfd18 ffffffff81321eef 0000000000000000
Dec  3 13:22:02 localhost kernel: ffff880613acfd50 ffffffff8107ca52 ffff88061176b198 0000000000000000
Dec  3 13:22:02 localhost kernel: ffffffff814542b0 ffff880610cfb100 ffff88061176b198 ffff880613acfd60
Dec  3 13:22:02 localhost kernel: Call Trace:
Dec  3 13:22:02 localhost kernel: [<ffffffff81321eef>] dump_stack+0x44/0x55
Dec  3 13:22:02 localhost kernel: [<ffffffff8107ca52>] warn_slowpath_common+0x82/0xc0
Dec  3 13:22:02 localhost kernel: [<ffffffff814542b0>] ? proc_scsi_show+0x20/0x20
Dec  3 13:22:02 localhost kernel: [<ffffffff8107cb4a>] warn_slowpath_null+0x1a/0x20
Dec  3 13:22:02 localhost kernel: [<ffffffff8167225d>] klist_iter_init_node+0x3d/0x50
Dec  3 13:22:02 localhost kernel: [<ffffffff81421d41>] bus_find_device+0x51/0xb0
Dec  3 13:22:02 localhost kernel: [<ffffffff814545ad>] scsi_seq_next+0x2d/0x40
[...]

And an eventual crash. It can actually occur in any hotplug system
which has a device finder and a starting device.

We can fix this globally by making sure the starting node for
klist_iter_init_node() is actually a member of the list before using it
(and by starting from the beginning if it isn't).

Reported-by: Ewan D. Milne <emilne@redhat.com>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-07 22:18:47 -08:00
Eric Dumazet d7ce369243 dump_stack: avoid potential deadlocks
Some servers experienced fatal deadlocks because of a combination of
bugs, leading to multiple cpus calling dump_stack().

The checksumming bug was fixed in commit 34ae6a1aa0 ("ipv6: update
skb->csum when CE mark is propagated").

The second problem is a faulty locking in dump_stack()

CPU1 runs in process context and calls dump_stack(), grabs dump_lock.

   CPU2 receives a TCP packet under softirq, grabs socket spinlock, and
   call dump_stack() from netdev_rx_csum_fault().

   dump_stack() spins on atomic_cmpxchg(&dump_lock, -1, 2), since
   dump_lock is owned by CPU1

While dumping its stack, CPU1 is interrupted by a softirq, and happens
to process a packet for the TCP socket locked by CPU2.

CPU1 spins forever in spin_lock() : deadlock

Stack trace on CPU1 looked like :

    NMI backtrace for cpu 1
    RIP: _raw_spin_lock+0x25/0x30
    ...
    Call Trace:
      <IRQ>
      tcp_v6_rcv+0x243/0x620
      ip6_input_finish+0x11f/0x330
      ip6_input+0x38/0x40
      ip6_rcv_finish+0x3c/0x90
      ipv6_rcv+0x2a9/0x500
      process_backlog+0x461/0xaa0
      net_rx_action+0x147/0x430
      __do_softirq+0x167/0x2d0
      call_softirq+0x1c/0x30
      do_softirq+0x3f/0x80
      irq_exit+0x6e/0xc0
      smp_call_function_single_interrupt+0x35/0x40
      call_function_single_interrupt+0x6a/0x70
      <EOI>
      printk+0x4d/0x4f
      printk_address+0x31/0x33
      print_trace_address+0x33/0x3c
      print_context_stack+0x7f/0x119
      dump_trace+0x26b/0x28e
      show_trace_log_lvl+0x4f/0x5c
      show_stack_log_lvl+0x104/0x113
      show_stack+0x42/0x44
      dump_stack+0x46/0x58
      netdev_rx_csum_fault+0x38/0x3c
      __skb_checksum_complete_head+0x6e/0x80
      __skb_checksum_complete+0x11/0x20
      tcp_rcv_established+0x2bd5/0x2fd0
      tcp_v6_do_rcv+0x13c/0x620
      sk_backlog_rcv+0x15/0x30
      release_sock+0xd2/0x150
      tcp_recvmsg+0x1c1/0xfc0
      inet_recvmsg+0x7d/0x90
      sock_recvmsg+0xaf/0xe0
      ___sys_recvmsg+0x111/0x3b0
      SyS_recvmsg+0x5c/0xb0
      system_call_fastpath+0x16/0x1b

Fixes: b58d977432 ("dump_stack: serialize the output from dump_stack()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-05 18:10:40 -08:00
Matthew Wilcox 46437f9a55 radix-tree: fix race in gang lookup
If the indirect_ptr bit is set on a slot, that indicates we need to redo
the lookup.  Introduce a new function radix_tree_iter_retry() which
forces the loop to retry the lookup by setting 'slot' to NULL and
turning the iterator back to point at the problematic entry.

This is a pretty rare problem to hit at the moment; the lookup has to
race with a grow of the radix tree from a height of 0.  The consequences
of hitting this race are that gang lookup could return a pointer to a
radix_tree_node instead of a pointer to whatever the user had inserted
in the tree.

Fixes: cebbd29e1c ("radix-tree: rewrite gang lookup using iterator")
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-03 08:28:43 -08:00
Vitaly Kuznetsov 72676bb53f lib/test-string_helpers.c: fix and improve string_get_size() tests
Recently added commit 564b026fbd ("string_helpers: fix precision loss
for some inputs") fixed precision issues for string_get_size() and broke
tests.

Fix and improve them: test both STRING_UNITS_2 and STRING_UNITS_10 at a
time, better failure reporting, test small an huge values.

Fixes: 564b026fbd ("string_helpers: fix precision loss for some inputs")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-03 08:28:43 -08:00
Christian Borntraeger 0b6ec8c0a3 debugobjects: Allow bigger number of early boot objects
On my bigger s390 systems  I always get "Out of memory.
ODEBUG disabled". Since the number of objects is needed at
compile time, we can not change the size dynamically before
the caches etc are available. Doubling the size seems to
do the trick. Since it is init data it will be freed anyway,
this should be ok.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: http://lkml.kernel.org/r/1453905478-13409-1-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-27 15:40:59 +01:00
Linus Torvalds 048ccca8c1 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
 "Initial roundup of 4.5 merge window patches

   - Remove usage of ib_query_device and instead store attributes in
     ib_device struct

   - Move iopoll out of block and into lib, rename to irqpoll, and use
     in several places in the rdma stack as our new completion queue
     polling library mechanism.  Update the other block drivers that
     already used iopoll to use the new mechanism too.

   - Replace the per-entry GID table locks with a single GID table lock

   - IPoIB multicast cleanup

   - Cleanups to the IB MR facility

   - Add support for 64bit extended IB counters

   - Fix for netlink oops while parsing RDMA nl messages

   - RoCEv2 support for the core IB code

   - mlx4 RoCEv2 support

   - mlx5 RoCEv2 support

   - Cross Channel support for mlx5

   - Timestamp support for mlx5

   - Atomic support for mlx5

   - Raw QP support for mlx5

   - MAINTAINERS update for mlx4/mlx5

   - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates

   - Add support for remote invalidate to the iSER driver (pushed
     through the RDMA tree due to dependencies, acknowledged by nab)

   - Update to NFSoRDMA (pushed through the RDMA tree due to
     dependencies, acknowledged by Bruce)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
  IB/mlx5: Unify CQ create flags check
  IB/mlx5: Expose Raw Packet QP to user space consumers
  {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
  IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
  IB/mlx5: Add Raw Packet QP query functionality
  IB/mlx5: Add create and destroy functionality for Raw Packet QP
  IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
  IB/mlx5: Allocate a Transport Domain for each ucontext
  net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
  net/mlx5_core: Add RQ and SQ event handling
  net/mlx5_core: Export transport objects
  IB/mlx5: Expose CQE version to user-space
  IB/mlx5: Add CQE version 1 support to user QPs and SRQs
  IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
  IB/sa: Fix netlink local service GFP crash
  IB/srpt: Remove redundant wc array
  IB/qib: Improve ipoib UD performance
  IB/mlx4: Advertise RoCE v2 support
  IB/mlx4: Create and use another QP1 for RoCEv2
  IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
  ...
2016-01-23 18:45:06 -08:00
Linus Torvalds 48162a203e Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:

  API:
   - A large number of bug fixes for the af_alg interface, credit goes
     to Dmitry Vyukov for discovering and reporting these issues.

  Algorithms:
   - sw842 needs to select crc32.
   - The soft dependency on crc32c is now in the correct spot.

  Drivers:
   - The atmel AES driver needs HAS_DMA.
   - The atmel AES driver was a missing break statement, fortunately
     it's only a debug function.
   - A number of bug fixes for the Intel qat driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (24 commits)
  crypto: algif_skcipher - sendmsg SG marking is off by one
  crypto: crc32c - Fix crc32c soft dependency
  crypto: algif_skcipher - Load TX SG list after waiting
  crypto: atmel-aes - Add missing break to atmel_aes_reg_name
  crypto: algif_skcipher - Fix race condition in skcipher_check_key
  crypto: algif_hash - Fix race condition in hash_check_key
  crypto: CRYPTO_DEV_ATMEL_AES should depend on HAS_DMA
  lib: sw842: select crc32
  crypto: af_alg - Forbid bind(2) when nokey child sockets are present
  crypto: algif_skcipher - Remove custom release parent function
  crypto: algif_hash - Remove custom release parent function
  crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path
  crypto: qat - update init_esram for C3xxx dev type
  crypto: qat - fix timeout issues
  crypto: qat - remove to call get_sram_bar_id for qat_c3xxx
  crypto: algif_skcipher - Add key check exception for cipher_null
  crypto: skcipher - Add crypto_skcipher_has_setkey
  crypto: algif_hash - Require setkey before accept(2)
  crypto: hash - Add crypto_ahash_has_setkey
  crypto: algif_skcipher - Add nokey compatibility path
  ...
2016-01-22 11:58:43 -08:00