Commit Graph

13044 Commits

Author SHA1 Message Date
Steffen Klassert
9612090527 padata: Fix cpu hotplug
We don't remove the cpu that went offline from our cpumasks
on cpu hotplug. This got lost somewhere along the line, so
restore it. This fixes a hang of the padata instance on cpu
hotplug.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-29 19:52:46 +08:00
Steffen Klassert
13614e0fb1 padata: Use the online cpumask as the default
We use the active cpumask to determine the superset of cpus
to use for parallelization. However, the active cpumask is
for internal usage of the scheduler and therefore not the
appropriate cpumask for these purposes. So use the online
cpumask instead.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-29 19:52:46 +08:00
Steffen Klassert
107f8bdac9 padata: Add a reference to the api documentation
Add a reference to the padata api documentation at Documentation/padata.txt

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-29 19:52:45 +08:00
Linus Torvalds
95211279c5 Merge branch 'akpm' (Andrew's patch-bomb)
Merge first batch of patches from Andrew Morton:
 "A few misc things and all the MM queue"

* emailed from Andrew Morton <akpm@linux-foundation.org>: (92 commits)
  memcg: avoid THP split in task migration
  thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE
  memcg: clean up existing move charge code
  mm/memcontrol.c: remove unnecessary 'break' in mem_cgroup_read()
  mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event()
  mm/memcontrol.c: s/stealed/stolen/
  memcg: fix performance of mem_cgroup_begin_update_page_stat()
  memcg: remove PCG_FILE_MAPPED
  memcg: use new logic for page stat accounting
  memcg: remove PCG_MOVE_LOCK flag from page_cgroup
  memcg: simplify move_account() check
  memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
  memcg: kill dead prev_priority stubs
  memcg: remove PCG_CACHE page_cgroup flag
  memcg: let css_get_next() rely upon rcu_read_lock()
  cgroup: revert ss_id_lock to spinlock
  idr: make idr_get_next() good for rcu_read_lock()
  memcg: remove unnecessary thp check in page stat accounting
  memcg: remove redundant returns
  memcg: enum lru_list lru
  ...
2012-03-22 09:04:48 -07:00
Hugh Dickins
ca464d69b1 memcg: let css_get_next() rely upon rcu_read_lock()
Remove lock and unlock around css_get_next()'s call to idr_get_next().
memcg iterators (only users of css_get_next) already did rcu_read_lock(),
and its comment demands that; but add a WARN_ON_ONCE to make sure of it.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:01 -07:00
Hugh Dickins
42aee6c495 cgroup: revert ss_id_lock to spinlock
Commit c1e2ee2dc4 ("memcg: replace ss->id_lock with a rwlock") has now
been seen to cause the unfair behavior we should have expected from
converting a spinlock to an rwlock: softlockup in cgroup_mkdir(), whose
get_new_cssid() is waiting for the wlock, while there are 19 tasks using
the rlock in css_get_next() to get on with their memcg workload (in an
artificial test, admittedly).  Yet lib/idr.c was made suitable for RCU
way back: revert that commit, restoring ss->id_lock to a spinlock.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:01 -07:00
David Rientjes
05af2e104a mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat()
sync_mm_rss() can only be used for current to avoid race conditions in
iterating and clearing its per-task counters.  Remove the task argument
for it and its helper function, __sync_task_rss_stat(), to avoid thinking
it can be used safely for anything other than current.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:59 -07:00
Mel Gorman
cc9a6c8776 cpuset: mm: reduce large amounts of memory barrier related damage v3
Commit c0ff7453bb ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") wins a super prize for the largest number of
memory barriers entered into fast paths for one commit.

[get|put]_mems_allowed is incredibly heavy with pairs of full memory
barriers inserted into a number of hot paths.  This was detected while
investigating at large page allocator slowdown introduced some time
after 2.6.32.  The largest portion of this overhead was shown by
oprofile to be at an mfence introduced by this commit into the page
allocator hot path.

For extra style points, the commit introduced the use of yield() in an
implementation of what looks like a spinning mutex.

This patch replaces the full memory barriers on both read and write
sides with a sequence counter with just read barriers on the fast path
side.  This is much cheaper on some architectures, including x86.  The
main bulk of the patch is the retry logic if the nodemask changes in a
manner that can cause a false failure.

While updating the nodemask, a check is made to see if a false failure
is a risk.  If it is, the sequence number gets bumped and parallel
allocators will briefly stall while the nodemask update takes place.

In a page fault test microbenchmark, oprofile samples from
__alloc_pages_nodemask went from 4.53% of all samples to 1.15%.  The
actual results were

                             3.3.0-rc3          3.3.0-rc3
                             rc3-vanilla        nobarrier-v2r1
    Clients   1 UserTime       0.07 (  0.00%)   0.08 (-14.19%)
    Clients   2 UserTime       0.07 (  0.00%)   0.07 (  2.72%)
    Clients   4 UserTime       0.08 (  0.00%)   0.07 (  3.29%)
    Clients   1 SysTime        0.70 (  0.00%)   0.65 (  6.65%)
    Clients   2 SysTime        0.85 (  0.00%)   0.82 (  3.65%)
    Clients   4 SysTime        1.41 (  0.00%)   1.41 (  0.32%)
    Clients   1 WallTime       0.77 (  0.00%)   0.74 (  4.19%)
    Clients   2 WallTime       0.47 (  0.00%)   0.45 (  3.73%)
    Clients   4 WallTime       0.38 (  0.00%)   0.37 (  1.58%)
    Clients   1 Flt/sec/cpu  497620.28 (  0.00%) 520294.53 (  4.56%)
    Clients   2 Flt/sec/cpu  414639.05 (  0.00%) 429882.01 (  3.68%)
    Clients   4 Flt/sec/cpu  257959.16 (  0.00%) 258761.48 (  0.31%)
    Clients   1 Flt/sec      495161.39 (  0.00%) 517292.87 (  4.47%)
    Clients   2 Flt/sec      820325.95 (  0.00%) 850289.77 (  3.65%)
    Clients   4 Flt/sec      1020068.93 (  0.00%) 1022674.06 (  0.26%)
    MMTests Statistics: duration
    Sys Time Running Test (seconds)             135.68    132.17
    User+Sys Time Running Test (seconds)         164.2    160.13
    Total Elapsed Time (seconds)                123.46    120.87

The overall improvement is small but the System CPU time is much
improved and roughly in correlation to what oprofile reported (these
performance figures are without profiling so skew is expected).  The
actual number of page faults is noticeably improved.

For benchmarks like kernel builds, the overall benefit is marginal but
the system CPU time is slightly reduced.

To test the actual bug the commit fixed I opened two terminals.  The
first ran within a cpuset and continually ran a small program that
faulted 100M of anonymous data.  In a second window, the nodemask of the
cpuset was continually randomised in a loop.

Without the commit, the program would fail every so often (usually
within 10 seconds) and obviously with the commit everything worked fine.
With this patch applied, it also worked fine so the fix should be
functionally equivalent.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:59 -07:00
Konstantin Khlebnikov
c3f0327f8e mm: add rss counters consistency check
Warn about non-zero rss counters at final mmdrop.

This check will prevent reoccurences of bugs such as that fixed in "mm:
fix rss count leakage during migration".

I didn't hide this check under CONFIG_VM_DEBUG because it rather small and
rss counters cover whole page-table management, so this is a good
invariant.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:55 -07:00
Linus Torvalds
e2a0883e40 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
  yet."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
  ext4: initialization of ext4_li_mtx needs to be done earlier
  debugfs-related mode_t whack-a-mole
  hfsplus: add an ioctl to bless files
  hfsplus: change finder_info to u32
  hfsplus: initialise userflags
  qnx4: new helper - try_extent()
  qnx4: get rid of qnx4_bread/qnx4_getblk
  take removal of PF_FORKNOEXEC to flush_old_exec()
  trim includes in inode.c
  um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
  um: embed ->stub_pages[] into mmu_context
  gadgetfs: list_for_each_safe() misuse
  ocfs2: fix leaks on failure exits in module_init
  ecryptfs: make register_filesystem() the last potential failure exit
  ntfs: forgets to unregister sysctls on register_filesystem() failure
  logfs: missing cleanup on register_filesystem() failure
  jfs: mising cleanup on register_filesystem() failure
  make configfs_pin_fs() return root dentry on success
  configfs: configfs_create_dir() has parent dentry in dentry->d_parent
  configfs: sanitize configfs_create()
  ...
2012-03-21 13:36:41 -07:00
Linus Torvalds
3556485f15 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates for 3.4 from James Morris:
 "The main addition here is the new Yama security module from Kees Cook,
  which was discussed at the Linux Security Summit last year.  Its
  purpose is to collect miscellaneous DAC security enhancements in one
  place.  This also marks a departure in policy for LSM modules, which
  were previously limited to being standalone access control systems.
  Chromium OS is using Yama, and I believe there are plans for Ubuntu,
  at least.

  This patchset also includes maintenance updates for AppArmor, TOMOYO
  and others."

Fix trivial conflict in <net/sock.h> due to the jumo_label->static_key
rename.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
  AppArmor: Fix location of const qualifier on generated string tables
  TOMOYO: Return error if fails to delete a domain
  AppArmor: add const qualifiers to string arrays
  AppArmor: Add ability to load extended policy
  TOMOYO: Return appropriate value to poll().
  AppArmor: Move path failure information into aa_get_name and rename
  AppArmor: Update dfa matching routines.
  AppArmor: Minor cleanup of d_namespace_path to consolidate error handling
  AppArmor: Retrieve the dentry_path for error reporting when path lookup fails
  AppArmor: Add const qualifiers to generated string tables
  AppArmor: Fix oops in policy unpack auditing
  AppArmor: Fix error returned when a path lookup is disconnected
  KEYS: testing wrong bit for KEY_FLAG_REVOKED
  TOMOYO: Fix mount flags checking order.
  security: fix ima kconfig warning
  AppArmor: Fix the error case for chroot relative path name lookup
  AppArmor: fix mapping of META_READ to audit and quiet flags
  AppArmor: Fix underflow in xindex calculation
  AppArmor: Fix dropping of allowed operations that are force audited
  AppArmor: Add mising end of structure test to caps unpacking
  ...
2012-03-21 13:25:04 -07:00
Linus Torvalds
b8716614a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "* sha512 bug fixes (already in your tree).
  * SHA224/SHA384 AEAD support in caam.
  * X86-64 optimised version of Camellia.
  * Tegra AES support.
  * Bulk algorithm registration interface to make driver registration easier.
  * padata race fixes.
  * Misc fixes."

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (31 commits)
  padata: Fix race on sequence number wrap
  padata: Fix race in the serialization path
  crypto: camellia - add assembler implementation for x86_64
  crypto: camellia - rename camellia.c to camellia_generic.c
  crypto: camellia - fix checkpatch warnings
  crypto: camellia - rename camellia module to camellia_generic
  crypto: tcrypt - add more camellia tests
  crypto: testmgr - add more camellia test vectors
  crypto: camellia - simplify key setup and CAMELLIA_ROUNDSM macro
  crypto: twofish-x86_64/i586 - set alignmask to zero
  crypto: blowfish-x86_64 - set alignmask to zero
  crypto: serpent-sse2 - combine ablk_*_init functions
  crypto: blowfish-x86_64 - use crypto_[un]register_algs
  crypto: twofish-x86_64-3way - use crypto_[un]register_algs
  crypto: serpent-sse2 - use crypto_[un]register_algs
  crypto: serpent-sse2 - remove dead code from serpent_sse2_glue.c::serpent_sse2_init()
  crypto: twofish-x86 - Remove dead code from twofish_glue_3way.c::init()
  crypto: In crypto_add_alg(), 'exact' wants to be initialized to 0
  crypto: caam - fix gcc 4.6 warning
  crypto: Add bulk algorithm registration interface
  ...
2012-03-21 13:20:43 -07:00
Linus Torvalds
c207f3a431 Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6
Pull irq_domain support for all architectures from Grant Likely:
 "Generialize powerpc's irq_host as irq_domain

  This branch takes the PowerPC irq_host infrastructure (reverse mapping
  from Linux IRQ numbers to hardware irq numbering), generalizes it,
  renames it to irq_domain, and makes it available to all architectures.

  Originally the plan has been to create an all-new irq_domain
  implementation which addresses some of the powerpc shortcomings such
  as not handling 1:1 mappings well, but doing that proved to be far
  more difficult and invasive than generalizing the working code and
  refactoring it in-place.  So, this branch rips out the 'new'
  irq_domain and replaces it with the modified powerpc version (in a
  fully bisectable way of course).  It converts all users over to the
  new API and makes irq_domain selectable on any architecture.

  No architecture is forced to enable irq_domain, but the infrastructure
  is required for doing OpenFirmware style irq translations.  It will
  even work on SPARC even though SPARC has it's own mechanism for
  translating irqs at boot time.  MIPS, microblaze, embedded x86 and c6x
  are converted too.

  The resulting irq_domain code is probably still too verbose and can be
  optimized more, but that can be done incrementally and is a task for
  follow-on patches."

* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6: (31 commits)
  dt: fix twl4030 for non-dt compile on x86
  mfd: twl-core: Add IRQ_DOMAIN dependency
  devicetree: Add empty of_platform_populate() for !CONFIG_OF_ADDRESS (sparc)
  irq_domain: Centralize definition of irq_dispose_mapping()
  irq_domain/mips: Allow irq_domain on MIPS
  irq_domain/x86: Convert x86 (embedded) to use common irq_domain
  ppc-6xx: fix build failure in flipper-pic.c and hlwd-pic.c
  irq_domain/microblaze: Convert microblaze to use irq_domains
  irq_domain/powerpc: Replace custom xlate functions with library functions
  irq_domain/powerpc: constify irq_domain_ops
  irq_domain/c6x: Use library of xlate functions
  irq_domain/c6x: constify irq_domain structures
  irq_domain/c6x: Convert c6x to use generic irq_domain support.
  irq_domain: constify irq_domain_ops
  irq_domain: Create common xlate functions that device drivers can use
  irq_domain: Remove irq_domain_add_simple()
  irq_domain: Remove 'new' irq_domain in favour of the ppc one
  mfd: twl-core.c: Fix the number of interrupts managed by twl4030
  of/address: add empty static inlines for !CONFIG_OF
  irq_domain: Add support for base irq and hwirq in legacy mappings
  ...
2012-03-21 10:27:19 -07:00
Linus Torvalds
c7c66c0cb0 Merge tag 'pm-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates for 3.4 from Rafael Wysocki:
 "Assorted extensions and fixes including:

  * Introduction of early/late suspend/hibernation device callbacks.
  * Generic PM domains extensions and fixes.
  * devfreq updates from Axel Lin and MyungJoo Ham.
  * Device PM QoS updates.
  * Fixes of concurrency problems with wakeup sources.
  * System suspend and hibernation fixes."

* tag 'pm-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (43 commits)
  PM / Domains: Check domain status during hibernation restore of devices
  PM / devfreq: add relation of recommended frequency.
  PM / shmobile: Make MTU2 driver use pm_genpd_dev_always_on()
  PM / shmobile: Make CMT driver use pm_genpd_dev_always_on()
  PM / shmobile: Make TMU driver use pm_genpd_dev_always_on()
  PM / Domains: Introduce "always on" device flag
  PM / Domains: Fix hibernation restore of devices, v2
  PM / Domains: Fix handling of wakeup devices during system resume
  sh_mmcif / PM: Use PM QoS latency constraint
  tmio_mmc / PM: Use PM QoS latency constraint
  PM / QoS: Make it possible to expose PM QoS latency constraints
  PM / Sleep: JBD and JBD2 missing set_freezable()
  PM / Domains: Fix include for PM_GENERIC_DOMAINS=n case
  PM / Freezer: Remove references to TIF_FREEZE in comments
  PM / Sleep: Add more wakeup source initialization routines
  PM / Hibernate: Enable usermodehelpers in hibernate() error path
  PM / Sleep: Make __pm_stay_awake() delete wakeup source timers
  PM / Sleep: Fix race conditions related to wakeup source timer function
  PM / Sleep: Fix possible infinite loop during wakeup source destruction
  PM / Hibernate: print physical addresses consistently with other parts of kernel
  ...
2012-03-21 10:15:51 -07:00
Linus Torvalds
9f3938346a Merge branch 'kmap_atomic' of git://github.com/congwang/linux
Pull kmap_atomic cleanup from Cong Wang.

It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().

Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.

* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
  feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
  highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
  drbd: remove the second argument of k[un]map_atomic()
  zcache: remove the second argument of k[un]map_atomic()
  gma500: remove the second argument of k[un]map_atomic()
  dm: remove the second argument of k[un]map_atomic()
  tomoyo: remove the second argument of k[un]map_atomic()
  sunrpc: remove the second argument of k[un]map_atomic()
  rds: remove the second argument of k[un]map_atomic()
  net: remove the second argument of k[un]map_atomic()
  mm: remove the second argument of k[un]map_atomic()
  lib: remove the second argument of k[un]map_atomic()
  power: remove the second argument of k[un]map_atomic()
  kdb: remove the second argument of k[un]map_atomic()
  udf: remove the second argument of k[un]map_atomic()
  ubifs: remove the second argument of k[un]map_atomic()
  squashfs: remove the second argument of k[un]map_atomic()
  reiserfs: remove the second argument of k[un]map_atomic()
  ocfs2: remove the second argument of k[un]map_atomic()
  ntfs: remove the second argument of k[un]map_atomic()
  ...
2012-03-21 09:40:26 -07:00
Linus Torvalds
69a7aebcf0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "It's indeed trivial -- mostly documentation updates and a bunch of
  typo fixes from Masanari.

  There are also several linux/version.h include removals from Jesper."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
  kcore: fix spelling in read_kcore() comment
  constify struct pci_dev * in obvious cases
  Revert "char: Fix typo in viotape.c"
  init: fix wording error in mm_init comment
  usb: gadget: Kconfig: fix typo for 'different'
  Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
  writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
  writeback: fix typo in the writeback_control comment
  Documentation: Fix multiple typo in Documentation
  tpm_tis: fix tis_lock with respect to RCU
  Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
  Doc: Update numastat.txt
  qla4xxx: Add missing spaces to error messages
  compiler.h: Fix typo
  security: struct security_operations kerneldoc fix
  Documentation: broken URL in libata.tmpl
  Documentation: broken URL in filesystems.tmpl
  mtd: simplify return logic in do_map_probe()
  mm: fix comment typo of truncate_inode_pages_range
  power: bq27x00: Fix typos in comment
  ...
2012-03-20 21:12:50 -07:00
Al Viro
38eff28926 constify path argument of trace_seq_path()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:40 -04:00
Al Viro
66b3fad3f4 constify path argument of audit_log_d_path()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:40 -04:00
Al Viro
48fde701af switch open-coded instances of d_make_root() to new helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:35 -04:00
Linus Torvalds
e45836fafe Merge branch 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue changes from Tejun Heo:
 "This contains only one commit which cleans up UP allocation path a
  bit."

* 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: use percpu allocator for cwq on UP
2012-03-20 18:13:22 -07:00
Linus Torvalds
0d9cabdcce Merge branch 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
 "Out of the 8 commits, one fixes a long-standing locking issue around
  tasklist walking and others are cleanups."

* 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list
  cgroup: Remove wrong comment on cgroup_enable_task_cg_list()
  cgroup: remove cgroup_subsys argument from callbacks
  cgroup: remove extra calls to find_existing_css_set
  cgroup: replace tasklist_lock with rcu_read_lock
  cgroup: simplify double-check locking in cgroup_attach_proc
  cgroup: move struct cgroup_pidlist out from the header file
  cgroup: remove cgroup_attach_task_current_cg()
2012-03-20 18:11:21 -07:00
Oleg Nesterov
b6e238dcee exit_signal: fix the "parent has changed security domain" logic
exit_notify() changes ->exit_signal if the parent already did exec.
This doesn't really work, we are not going to send the signal now
if there is another live thread or the exiting task is traced. The
parent can exec before the last dies or the tracer detaches.

Move this check into do_notify_parent() which actually sends the
signal.

The user-visible change is that we do not change ->exit_signal,
and thus the exiting task is still "clone children" for
do_wait()->eligible_child(__WCLONE). Hopefully this is fine, the
current logic is racy anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-20 14:16:50 -07:00
Oleg Nesterov
e636825346 exit_signal: simplify the "we have changed execution domain" logic
exit_notify() checks "tsk->self_exec_id != tsk->parent_exec_id"
to handle the "we have changed execution domain" case.

We can change do_thread() to always set ->exit_signal = SIGCHLD
and remove this check to simplify the code.

We could change setup_new_exec() instead, this looks more logical
because it increments ->self_exec_id. But note that de_thread()
already resets ->exit_signal if it changes the leader, let's keep
both changes close to each other.

Note that we change ->exit_signal lockless, this changes the rules.
Thereafter ->exit_signal is not stable under tasklist but this is
fine, the only possible change is OLDSIG -> SIGCHLD. This can race
with eligible_child() but the race is harmless. We can race with
reparent_leader() which changes our ->exit_signal in parallel, but
it does the same change to SIGCHLD.

The noticeable user-visible change is that the execing task is not
"visible" to do_wait()->eligible_child(__WCLONE) right after exec.
To me this looks more logical, and this is consistent with mt case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-20 14:16:50 -07:00
Oleg Nesterov
5f8aadd8b9 CLONE_PARENT shouldn't allow to set ->exit_signal
The child must not control its ->exit_signal, it is the parent who
decides which signal the child should use for notification.

This means that CLONE_PARENT should not use "clone_flags & CSIGNAL",
the forking task is the sibling of the new process and their parent
doesn't control exit_signal in this case.

This patch uses ->exit_signal of the forking process, but perhaps
we should simply use SIGCHLD.

We read group_leader->exit_signal lockless, this can race with the
ORIGINAL_SIGNAL -> SIGCHLD transition, but this is fine.

Potentially this change allows to kill self_exec_id/parent_exec_id.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-20 14:16:50 -07:00
Linus Torvalds
161f7a7161 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer changes for v3.4 from Ingo Molnar

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  ntp: Fix integer overflow when setting time
  math: Introduce div64_long
  cs5535-clockevt: Allow the MFGPT IRQ to be shared
  cs5535-clockevt: Don't ignore MFGPT on SMP-capable kernels
  x86/time: Eliminate unused irq0_irqs counter
  clocksource: scx200_hrt: Fix the build
  x86/tsc: Reduce the TSC sync check time for core-siblings
  timer: Fix bad idle check on irq entry
  nohz: Remove ts->Einidle checks before restarting the tick
  nohz: Remove update_ts_time_stat from tick_nohz_start_idle
  clockevents: Leave the broadcast device in shutdown mode when not needed
  clocksource: Load the ACPI PM clocksource asynchronously
  clocksource: scx200_hrt: Convert scx200 to use clocksource_register_hz
  clocksource: Get rid of clocksource_calc_mult_shift()
  clocksource: dbx500: convert to clocksource_register_hz()
  clocksource: scx200_hrt:  use pr_<level> instead of printk
  time: Move common updates to a function
  time: Reorder so the hot data is together
  time: Remove most of xtime_lock usage in timekeeping.c
  ntp: Add ntp_lock to replace xtime_locking
  ...
2012-03-20 10:32:09 -07:00