Commit Graph

16218 Commits

Author SHA1 Message Date
Linus Torvalds
1fe0135b9e Merge tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:

 - Revert two cpuidle commits added during the 3.8 development cycle
   that turn out to have introduced a significant performance regression
   as requested by Jeremy Eder.

 - The recent patches that made the freezer less heavy-weight introduced
   a regression causing user-space-driven hibernation using the ioctl()
   interface to block indefinitely when the hibernate process executes
   try_to_freeze().  Fix from Colin Cross addresses this by adding a
   process flag to mark the hibernate/suspend process to inform the
   freezer that that process should be ignored.

 - One of the recent cpufreq reverts uncovered a problem in the core
   causing the cpufreq driver module refcount to become negative after a
   system suspend-resume cycle.  Fix from Rafael J Wysocki.

 - The evaluation of the ACPI battery _BIX method has never worked
   correctly, because the commit that added support for it forgot to
   take the "Revision" field in the return package into account.  As a
   result, the reading of battery info doesn't work at all on some
   systems, which is addressed by a fix from Lan Tianyu.

* tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes
  ACPI / battery: Fix parsing _BIX return value
  cpufreq: Fix cpufreq driver module refcount balance after suspend/resume
  Revert "cpuidle: Quickly notice prediction failure for repeat mode"
  Revert "cpuidle: Quickly notice prediction failure in general case"
2013-08-02 12:21:32 -07:00
Linus Torvalds
19788a9008 Merge branch 'akpm' (patches from Andrew Morton)
Merge more patches from Andrew Morton:
 "A bunch of fixes.

  Plus Joe's printk move and rework.  It's not a -rc3 thing but now
  would be a nice time to offload it, while things are quiet.  I've been
  sitting on it all for a couple of weeks, no issues"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  vmpressure: make sure there are no events queued after memcg is offlined
  vmpressure: do not check for pending work to prevent from new work
  vmpressure: change vmpressure::sr_lock to spinlock
  printk: rename struct log to struct printk_log
  printk: use pointer for console_cmdline indexing
  printk: move braille console support into separate braille.[ch] files
  printk: add console_cmdline.h
  printk: move to separate directory for easier modification
  drivers/rtc/rtc-twl.c: fix: rtcX/wakealarm attribute isn't created
  mm: zbud: fix condition check on allocation size
  thp, mm: avoid PageUnevictable on active/inactive lru lists
  mm/swap.c: clear PageActive before adding pages onto unevictable list
  arch/x86/platform/ce4100/ce4100.c: include reboot.h
  mm: sched: numa: fix NUMA balancing when !SCHED_DEBUG
  rapidio: fix use after free in rio_unregister_scan()
  .gitignore: ignore *.lz4 files
  MAINTAINERS: dynamic debug: Jason's not there...
  dmi_scan: add comments on dmi_present() and the loop in dmi_scan_machine()
  ocfs2/refcounttree: add the missing NULL check of the return value of find_or_create_page()
  mm: mempolicy: fix mbind_range() && vma_adjust() interaction
2013-07-31 17:52:04 -07:00
Joe Perches
62e32ac350 printk: rename struct log to struct printk_log
Rename the struct to enable moving portions of
printk.c to separate files.

The rename changes output of /proc/vmcoreinfo.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31 14:41:03 -07:00
Joe Perches
23475408c6 printk: use pointer for console_cmdline indexing
Make the code a bit more compact by always using a pointer for the active
console_cmdline.

Move overly indented code to correct indent level.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31 14:41:03 -07:00
Joe Perches
bbeddf52ad printk: move braille console support into separate braille.[ch] files
Create files with prototypes and static inlines for braille support.  Make
braille_console functions return 1 on success.

Corrected CONFIG_A11Y_BRAILLE_CONSOLE=n _braille_console_setup
return value to NULL.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31 14:41:03 -07:00
Joe Perches
d197c43d04 printk: add console_cmdline.h
Add an include file for the console_cmdline struct so that the braille
console driver can be separated.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31 14:41:03 -07:00
Joe Perches
b9ee979e9d printk: move to separate directory for easier modification
Make it easier to break up printk into bite-sized chunks.

Remove printk path/filename from comment.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31 14:41:03 -07:00
Dave Kleikamp
10e84b97ed mm: sched: numa: fix NUMA balancing when !SCHED_DEBUG
Commit 3105b86a9f ("mm: sched: numa: Control enabling and disabling of
NUMA balancing if !SCHED_DEBUG") defined numabalancing_enabled to
control the enabling and disabling of automatic NUMA balancing, but it
is never used.

I believe the intention was to use this in place of sched_feat_numa(NUMA).

Currently, if SCHED_DEBUG is not defined, sched_feat_numa(NUMA) will
never be changed from the initial "false".

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31 14:41:02 -07:00
Linus Torvalds
06693f305e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix association failures not triggering a connect-failure event in
    cfg80211, from Johannes Berg.

 2) Eliminate a potential NULL deref with older iptables tools when
    configuring xt_socket rules, from Eric Dumazet.

 3) Missing RTNL locking in wireless regulatory code, from Johannes
    Berg.

 4) Fix OOPS caused by firmware loading races in ath9k_htc, from Alexey
    Khoroshilov.

 5) Fix usb URB leak in usb_8dev CAN driver, also from Alexey
    Khoroshilov.

 6) VXLAN namespace teardown fails to unregister devices, from Stephen
    Hemminger.

 7) Fix multicast settings getting dropped by firmware in qlcnic driver,
    from Sucheta Chakraborty.

 8) Add sysctl range enforcement for tcp_syn_retries, from Michal Tesar.

 9) Fix a nasty bug in bridging where an active timer would get
    reinitialized with a setup_timer() call.  From Eric Dumazet.

10) Fix use after free in new mlx5 driver, from Dan Carpenter.

11) Fix freed pointer reference in ipv6 multicast routing on namespace
    cleanup, from Hannes Frederic Sowa.

12) Some usbnet drivers report TSO and SG in their feature set, but the
    usbnet layer doesn't really support them.  From Eric Dumazet.

13) Fix crash on EEH errors in tg3 driver, from Gavin Shan.

14) Drop cb_lock when requesting modules in genetlink, from Stanislaw
    Gruszka.

15) Kernel stack leaks in cbq scheduler and af_key pfkey messages, from
    Dan Carpenter.

16) FEC driver erroneously signals NETDEV_TX_BUSY on transmit leading to
    endless loops, from Uwe Kleine-König.

17) Fix hangs from loading mvneta driver, from Arnaud Patard.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
  mlx5: fix error return code in mlx5_alloc_uuars()
  mvneta: Try to fix mvneta when compiled as module
  mvneta: Fix hang when loading the mvneta driver
  atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
  genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACE
  af_key: more info leaks in pfkey messages
  net/fec: Don't let ndo_start_xmit return NETDEV_TX_BUSY without link
  net_sched: Fix stack info leak in cbq_dump_wrr().
  igb: fix vlan filtering in promisc mode when not in VT mode
  ixgbe: Fix Tx Hang issue with lldpad on 82598EB
  genetlink: release cb_lock before requesting additional module
  net: fec: workaround stop tx during errata ERR006358
  qlcnic: Fix diagnostic interrupt test for 83xx adapters.
  qlcnic: Fix setting Guest VLAN
  qlcnic: Fix operation type and command type.
  qlcnic: Fix initialization of work function.
  Revert "atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring"
  atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
  net/tg3: Fix warning from pci_disable_device()
  net/tg3: Fix kernel crash
  ...
2013-07-31 12:56:18 -07:00
Colin Cross
2b44c4db2e freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes
Calling freeze_processes sets a global flag that will cause any
process that calls try_to_freeze to enter the refrigerator.  It
skips sending a signal to the current task, but if the current
task ever hits try_to_freeze, all threads will be frozen and the
system will deadlock.

Set a new flag, PF_SUSPEND_TASK, on the task that calls
freeze_processes.  The flag notifies the freezer that the thread
is involved in suspend and should not be frozen.  Also add a
WARN_ON in thaw_processes if the caller does not have the
PF_SUSPEND_TASK flag set to catch if a different task calls
thaw_processes than the one that called freeze_processes, leaving
a task with PF_SUSPEND_TASK permanently set on it.

Threads that spawn off a task with PF_SUSPEND_TASK set (which
swsusp does) will also have PF_SUSPEND_TASK set, preventing them
from freezing while they are helping with suspend, but they need
to be dead by the time suspend is triggered, otherwise they may
run when userspace is expected to be frozen.  Add a WARN_ON in
thaw_processes if more than one thread has the PF_SUSPEND_TASK
flag set.

Reported-and-tested-by: Michael Leun <lkml20130126@newton.leun.net>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-30 14:05:06 +02:00
Rafael J. Wysocki
148519120c Revert "cpuidle: Quickly notice prediction failure for repeat mode"
Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
repeat mode), because it has been identified as the source of a
significant performance regression in v3.8 and later as explained by
Jeremy Eder:

  We believe we've identified a particular commit to the cpuidle code
  that seems to be impacting performance of variety of workloads.
  The simplest way to reproduce is using netperf TCP_RR test, so
  we're using that, on a pair of Sandy Bridge based servers.  We also
  have data from a large database setup where performance is also
  measurably/positively impacted, though that test data isn't easily
  share-able.

  Included below are test results from 3 test kernels:

  kernel       reverts
  -----------------------------------------------------------
  1) vanilla   upstream (no reverts)

  2) perfteam2 reverts e11538d1f0

  3) test      reverts 69a37beabf
                       e11538d1f0

  In summary, netperf TCP_RR numbers improve by approximately 4%
  after reverting 69a37beabf.  When
  69a37beabf is included, C0 residency
  never seems to get above 40%.  Taking that patch out gets C0 near
  100% quite often, and performance increases.

  The below data are histograms representing the %c0 residency @
  1-second sample rates (using turbostat), while under netperf test.

  - If you look at the first 4 histograms, you can see %c0 residency
    almost entirely in the 30,40% bin.
  - The last pair, which reverts 69a37beabf,
    shows %c0 in the 80,90,100% bins.

  Below each kernel name are netperf TCP_RR trans/s numbers for the
  particular kernel that can be disclosed publicly, comparing the 3
  test kernels.  We ran a 4th test with the vanilla kernel where
  we've also set /dev/cpu_dma_latency=0 to show overall impact
  boosting single-threaded TCP_RR performance over 11% above
  baseline.

  3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
  TCP_RR trans/s 54323.78

  -----------------------------------------------------------
  3.10-rc2 vanilla RX (no reverts)
  TCP_RR trans/s 48192.47

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    59]:
  ***********************************************************
     40.0000 -    50.0000 [     1]: *
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    11]: ***********
     40.0000 -    50.0000 [    49]:
  *************************************************
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  -----------------------------------------------------------
  3.10-rc2 perfteam2 RX (reverts commit
  e11538d1f0)
  TCP_RR trans/s 49698.69

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     1]: *
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    59]:
  ***********************************************************
     40.0000 -    50.0000 [     0]:
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [     2]: **
     40.0000 -    50.0000 [    58]:
  **********************************************************
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  -----------------------------------------------------------
  3.10-rc2 test RX (reverts 69a37beabf
  and e11538d1f0)
  TCP_RR trans/s 47766.95

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     1]: *
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    27]: ***************************
     40.0000 -    50.0000 [     2]: **
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     2]: **
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [    28]: ****************************

  Sender:
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    11]: ***********
     40.0000 -    50.0000 [     0]:
     50.0000 -    60.0000 [     1]: *
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     3]: ***
     80.0000 -    90.0000 [     7]: *******
     90.0000 -   100.0000 [    38]: **************************************

  These results demonstrate gaining back the tendency of the CPU to
  stay in more responsive, performant C-states (and thus yield
  measurably better performance), by reverting commit
  69a37beabf.

Requested-by: Jeremy Eder <jeder@redhat.com>
Tested-by: Len Brown <len.brown@intel.com>
Cc: 3.8+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-29 13:32:29 +02:00
Linus Torvalds
6803f37e09 Merge tag 'trace-fixes-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "Oleg is working on fixing a very tight race between opening a event
  file and deleting that event at the same time (both must be done as
  root).

  I also found a bug while testing Oleg's patches which has to do with a
  race with kprobes using the function tracer.

  There's also a deadlock fix that was introduced with the previous
  fixes"

* tag 'trace-fixes-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Remove locking trace_types_lock from tracing_reset_all_online_cpus()
  ftrace: Add check for NULL regs if ops has SAVE_REGS set
  tracing: Kill trace_cpu struct/members
  tracing: Change tracing_fops/snapshot_fops to rely on tracing_get_cpu()
  tracing: Change tracing_entries_fops to rely on tracing_get_cpu()
  tracing: Change tracing_stats_fops to rely on tracing_get_cpu()
  tracing: Change tracing_buffers_fops to rely on tracing_get_cpu()
  tracing: Change tracing_pipe_fops() to rely on tracing_get_cpu()
  tracing: Introduce trace_create_cpu_file() and tracing_get_cpu()
2013-07-28 18:10:39 -07:00
Francesco Fusco
d738ce8fdc sysctl: range checking in do_proc_dointvec_ms_jiffies_conv
When (integer) sysctl values are expressed in ms and have to be
represented internally as jiffies. The msecs_to_jiffies function
returns an unsigned long, which gets assigned to the integer.
This patch prevents the value to be assigned if bigger than
INT_MAX, done in a similar way as in cba9f3 ("Range checking in
do_proc_dointvec_(userhz_)jiffies_conv").

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-26 14:22:10 -07:00
Steven Rostedt (Red Hat)
09d8091c02 tracing: Remove locking trace_types_lock from tracing_reset_all_online_cpus()
Commit a82274151a "tracing: Protect ftrace_trace_arrays list in trace_events.c"
added taking the trace_types_lock mutex in trace_events.c as there were
several locations that needed it for protection. Unfortunately, it also
encapsulated a call to tracing_reset_all_online_cpus() which also takes
the trace_types_lock, causing a deadlock.

This happens when a module has tracepoints and has been traced. When the
module is removed, the trace events module notifier will grab the
trace_types_lock, do a bunch of clean ups, and also clears the buffer
by calling tracing_reset_all_online_cpus. This doesn't happen often
which explains why it wasn't caught right away.

Commit a82274151a was marked for stable, which means this must be
sent to stable too.

Link: http://lkml.kernel.org/r/51EEC646.7070306@broadcom.com

Reported-by: Arend van Spril <arend@broadcom.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Cc: Alexander Z Lam <azl@google.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-26 08:57:32 -04:00
Steven Rostedt (Red Hat)
195a8afc7a ftrace: Add check for NULL regs if ops has SAVE_REGS set
If a ftrace ops is registered with the SAVE_REGS flag set, and there's
already a ops registered to one of its functions but without the
SAVE_REGS flag, there's a small race window where the SAVE_REGS ops gets
added to the list of callbacks to call for that function before the
callback trampoline gets set to save the regs.

The problem is, the function is not currently saving regs, which opens
a small race window where the ops that is expecting regs to be passed
to it, wont. This can cause a crash if the callback were to reference
the regs, as the SAVE_REGS guarantees that regs will be set.

To fix this, we add a check in the loop case where it checks if the ops
has the SAVE_REGS flag set, and if so, it will ignore it if regs is
not set.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-24 11:22:54 -04:00
Oleg Nesterov
9c01fe4593 tracing: Kill trace_cpu struct/members
After the previous changes trace_array_cpu->trace_cpu and
trace_array->trace_cpu becomes write-only. Remove these members
and kill "struct trace_cpu" as well.

As a side effect this also removes memset(per_cpu_memory, 0).
It was not needed, alloc_percpu() returns zero-filled memory.

Link: http://lkml.kernel.org/r/20130723152613.GA23741@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-24 11:22:53 -04:00
Oleg Nesterov
6484c71cbc tracing: Change tracing_fops/snapshot_fops to rely on tracing_get_cpu()
tracing_open() and tracing_snapshot_open() are racy, the memory
inode->i_private points to can be already freed.

Convert these last users of "inode->i_private == trace_cpu" to
use "i_private = trace_array" and rely on tracing_get_cpu().

v2: incorporate the fix from Steven, tracing_release() must not
    blindly dereference file->private_data unless we know that
    the file was opened for reading.

Link: http://lkml.kernel.org/r/20130723152610.GA23737@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-24 11:22:53 -04:00
Oleg Nesterov
0bc392ee46 tracing: Change tracing_entries_fops to rely on tracing_get_cpu()
tracing_open_generic_tc() is racy, the memory inode->i_private
points to can be already freed.

1. Change its last user, tracing_entries_fops, to use
   tracing_*_generic_tr() instead.

2. Change debugfs_create_file("buffer_size_kb", data) callers
   to pass "data = tr".

3. Change tracing_entries_read() and tracing_entries_write() to
   use tracing_get_cpu().

4. Kill the no longer used tracing_open_generic_tc() and
   tracing_release_generic_tc().

Link: http://lkml.kernel.org/r/20130723152606.GA23730@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-24 11:22:52 -04:00
Oleg Nesterov
4d3435b8a4 tracing: Change tracing_stats_fops to rely on tracing_get_cpu()
tracing_open_generic_tc() is racy, the memory inode->i_private
points to can be already freed.

1. Change one of its users, tracing_stats_fops, to use
   tracing_*_generic_tr() instead.

2. Change trace_create_cpu_file("stats", data) to pass "data = tr".

3. Change tracing_stats_read() to use tracing_get_cpu().

Link: http://lkml.kernel.org/r/20130723152603.GA23727@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-24 11:22:52 -04:00
Oleg Nesterov
46ef2be0d1 tracing: Change tracing_buffers_fops to rely on tracing_get_cpu()
tracing_buffers_open() is racy, the memory inode->i_private points
to can be already freed.

Change debugfs_create_file("trace_pipe_raw", data) caller to pass
"data = tr", tracing_buffers_open() can use tracing_get_cpu().

Change debugfs_create_file("snapshot_raw_fops", data) caller too,
this file uses tracing_buffers_open/release.

Link: http://lkml.kernel.org/r/20130723152600.GA23720@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-24 11:22:51 -04:00
Oleg Nesterov
15544209cb tracing: Change tracing_pipe_fops() to rely on tracing_get_cpu()
tracing_open_pipe() is racy, the memory inode->i_private points to
can be already freed.

Change debugfs_create_file("trace_pipe", data) callers to to pass
"data = tr", tracing_open_pipe() can use tracing_get_cpu().

Link: http://lkml.kernel.org/r/20130723152557.GA23717@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-24 11:22:51 -04:00
Oleg Nesterov
649e9c70da tracing: Introduce trace_create_cpu_file() and tracing_get_cpu()
Every "file_operations" used by tracing_init_debugfs_percpu is buggy.
f_op->open/etc does:

	1. struct trace_cpu *tc = inode->i_private;
	   struct trace_array *tr = tc->tr;

	2. trace_array_get(tr) or fail;

	3. do_something(tc);

But tc (and tr) can be already freed before trace_array_get() is called.
And it doesn't matter whether this file is per-cpu or it was created by
init_tracer_debugfs(), free_percpu() or kfree() are equally bad.

Note that even 1. is not safe, the freed memory can be unmapped. But even
if it was safe trace_array_get() can wrongly succeed if we also race with
the next new_instance_create() which can re-allocate the same tr, or tc
was overwritten and ->tr points to the valid tr. In this case 3. uses the
freed/reused memory.

Add the new trivial helper, trace_create_cpu_file() which simply calls
trace_create_file() and encodes "cpu" in "struct inode". Another helper,
tracing_get_cpu() will be used to read cpu_nr-or-RING_BUFFER_ALL_CPUS.

The patch abuses ->i_cdev to encode the number, it is never used unless
the file is S_ISCHR(). But we could use something else, say, i_bytes or
even ->d_fsdata. In any case this hack is hidden inside these 2 helpers,
it would be trivial to change them if needed.

This patch only changes tracing_init_debugfs_percpu() to use the new
trace_create_cpu_file(), the next patches will change file_operations.

Note: tracing_get_cpu(inode) is always safe but you can't trust the
result unless trace_array_get() was called, without trace_types_lock
which acts as a barrier it can wrongly return RING_BUFFER_ALL_CPUS.

Link: http://lkml.kernel.org/r/20130723152554.GA23710@redhat.com

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-24 11:22:13 -04:00
Linus Torvalds
c2468d32f5 Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
 "This contains two patches, both of which aren't fixes per-se but I
  think it'd be better to fast-track them.

  One removes bcache_subsys_id which was added without proper review
  through the block tree.  Fortunately, bcache cgroup code is
  unconditionally disabled, so this was never exposed to userland.  The
  cgroup subsys_id is removed.  Kent will remove the affected (disabled)
  code through bcache branch.

  The other simplifies task_group_path_from_hierarchy().  The function
  doesn't currently have in-kernel users but there are external code and
  development going on dependent on the function and making the function
  available for 3.11 would make things go smoother"

* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: replace task_cgroup_path_from_hierarchy() with task_cgroup_path()
  cgroup: remove bcache_subsys_id which got added stealthily
2013-07-23 15:48:35 -07:00
David Howells
42577ca8c3 Fix __wait_on_atomic_t() to call the action func if the counter != 0
Fix __wait_on_atomic_t() so that it calls the action func if the counter != 0
rather than if the counter is 0 so as to be analogous to __wait_on_bit().

Thanks to Yacine who found this by visual inspection.

This will affect FS-Cache in that it will could fail to sleep correctly when
trying to clean up after a netfs cookie is withdrawn.

Reported-by: Yacine Belkadi <yacine.belkadi.1@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
cc: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-23 15:46:48 -07:00
Linus Torvalds
b3a3a9c441 Merge tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes and cleanups from Steven Rostedt:
 "This contains fixes, optimizations and some clean ups

  Some of the fixes need to go back to 3.10.  They are minor, and deal
  mostly with incorrect ref counting in accessing event files.

  There was a couple of optimizations that should have perf perform a
  bit better when accessing trace events.

  And some various clean ups.  Some of the clean ups are necessary to
  help in a fix to a theoretical race between opening a event file and
  deleting that event"

* tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open()
  tracing: Kill trace_array->waiter
  tracing: Do not (ab)use trace_seq in event_id_read()
  tracing: Simplify the iteration logic in f_start/f_next
  tracing: Add ref_data to function and fgraph tracer structs
  tracing: Miscellaneous fixes for trace_array ref counting
  tracing: Fix error handling to ensure instances can always be removed
  tracing/kprobe: Wait for disabling all running kprobe handlers
  tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()
  tracing/syscall: Avoid perf_trace_buf_*() if sys_data->perf_events is empty
  tracing/function: Avoid perf_trace_buf_*() if event_function.perf_events is empty
  tracing: Typo fix on ring buffer comments
  tracing: Use trace_seq_puts()/trace_seq_putc() where possible
  tracing: Use correct config guard CONFIG_STACK_TRACER
2013-07-22 19:07:24 -07:00