Commit Graph

96 Commits

Author SHA1 Message Date
Tejun Heo
3a101d0548 sched: adjust when cpu_active and cpuset configurations are updated during cpu on/offlining
Currently, when a cpu goes down, cpu_active is cleared before
CPU_DOWN_PREPARE starts and cpuset configuration is updated from a
default priority cpu notifier.  When a cpu is coming up, it's set
before CPU_ONLINE but cpuset configuration again is updated from the
same cpu notifier.

For cpu notifiers, this presents an inconsistent state.  Threads which
a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be
migrated to other cpus because the cpu is no more inactive.

Fix it by updating cpu_active in the highest priority cpu notifier and
cpuset configuration in the second highest when a cpu is coming up.
Down path is updated similarly.  This guarantees that all other cpu
notifiers see consistent cpu_active and cpuset configuration.

cpuset_track_online_cpus() notifier is converted to
cpuset_update_active_cpus() which just updates the configuration and
now called from cpuset_cpu_[in]active() notifiers registered from
sched_init_smp().  If cpuset is disabled, cpuset_update_active_cpus()
degenerates into partition_sched_domains() making separate notifier
for !CONFIG_CPUSETS unnecessary.

This problem is triggered by cmwq.  During CPU_DOWN_PREPARE, hotplug
callback creates a kthread and kthread_bind()s it to the target cpu,
and the thread is expected to run on that cpu.

* Ingo's test discovered __cpuinit/exit markups were incorrect.
  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Menage <menage@google.com>
2010-06-08 21:40:36 +02:00
Daniel J Blueman
5c113fbeed fix cpu_chain section mismatch...
In commit e9fb7631eb ("cpu-hotplug: introduce cpu_notify(),
__cpu_notify(), cpu_notify_nofail()") the new helper functions access
cpu_chain.  As a result, it shouldn't be marked __cpuinitdata (via
section mismatch warning).

Alternatively, the helper functions should be forced inline, or marked
__ref or __cpuinit.  In the meantime, this patch silences the warning
the trivial way.

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-01 09:22:50 -07:00
Rafael J. Wysocki
e9a5f426b8 CPU: Avoid using unititialized error variable in disable_nonboot_cpus()
If there's only one CPU online when disable_nonboot_cpus() is called,
the error variable will not be initialized and that may lead to
erroneous behavior.  Fix this issue by initializing error in
disable_nonboot_cpus() as appropriate.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-30 09:06:00 -07:00
Linus Torvalds
00b9b0af58 Avoid warning when CPU hotplug isn't enabled
Commit e9fb7631eb ("cpu-hotplug: introduce cpu_notify(),
__cpu_notify(), cpu_notify_nofail()") also introduced this annoying
warning:

  kernel/cpu.c:157: warning: 'cpu_notify_nofail' defined but not used

when CONFIG_HOTPLUG_CPU wasn't set.

So move that helper inside the #ifdef CONFIG_HOTPLUG_CPU region, and
simplify it while at it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 10:32:08 -07:00
Lai Jiangshan
79a6cdeb7e cpuhotplug: do not need cpu_hotplug_begin() when CONFIG_HOTPLUG_CPU=n
Since when CONFIG_HOTPLUG_CPU=n, get_online_cpus() do nothing, so we don't
need cpu_hotplug_begin() either.

This patch moves cpu_hotplug_begin()/cpu_hotplug_done() into the code
block of CONFIG_HOTPLUG_CPU=y.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:48 -07:00
Akinobu Mita
e6bde73b07 cpu-hotplug: return better errno on cpu hotplug failure
Currently, onlining or offlining a CPU failure by one of the cpu notifiers
error always cause -EINVAL error.  (i.e.  writing 0 or 1 to
/sys/devices/system/cpu/cpuX/online gets EINVAL)

To get better error reporting rather than always getting -EINVAL, This
changes cpu_notify() to return -errno value with notifier_to_errno() and
fix the callers.  Now that cpu notifiers can return encapsulate errno
value.

Currently, all cpu hotplug notifiers return NOTIFY_OK, NOTIFY_BAD, or
NOTIFY_DONE.  So cpu_notify() can returns 0 or -EPERM with this change for
now.

(notifier_to_errno(NOTIFY_OK) == 0, notifier_to_errno(NOTIFY_DONE) == 0,
notifier_to_errno(NOTIFY_BAD) == -EPERM)

Forthcoming patches convert several cpu notifiers to return encapsulate
errno value with notifier_from_errno().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:47 -07:00
Akinobu Mita
e9fb7631eb cpu-hotplug: introduce cpu_notify(), __cpu_notify(), cpu_notify_nofail()
No functional change.  These are just wrappers of
raw_cpu_notifier_call_chain.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:47 -07:00
Haicheng Li
4eaf3f6439 mem-hotplug: fix potential race while building zonelist for new populated zone
Add global mutex zonelists_mutex to fix the possible race:

     CPU0                                  CPU1                    CPU2
(1) zone->present_pages += online_pages;
(2)                                       build_all_zonelists();
(3)                                                               alloc_page();
(4)                                                               free_page();
(5) build_all_zonelists();
(6)   __build_all_zonelists();
(7)     zone->pageset = alloc_percpu();

In step (3,4), zone->pageset still points to boot_pageset, so bad
things may happen if 2+ nodes are in this state. Even if only 1 node
is accessing the boot_pageset, (3) may still consume too much memory
to fail the memory allocations in step (7).

Besides, atomic operation ensures alloc_percpu() in step (7) will never fail
since there is a new fresh memory block added in step(6).

[haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <andi.kleen@intel.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Haicheng Li
1f522509c7 mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset
For each new populated zone of hotadded node, need to update its pagesets
with dynamically allocated per_cpu_pageset struct for all possible CPUs:

    1) Detach zone->pageset from the shared boot_pageset
       at end of __build_all_zonelists().

    2) Use mutex to protect zone->pageset when it's still
       shared in onlined_pages()

Otherwises, multiple zones of different nodes would share same boot strapping
boot_pageset for same CPU, which will finally cause below kernel panic:

  ------------[ cut here ]------------
  kernel BUG at mm/page_alloc.c:1239!
  invalid opcode: 0000 [#1] SMP
  ...
  Call Trace:
   [<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0
   [<ffffffff81162e67>] alloc_pages_current+0x87/0xd0
   [<ffffffff81128407>] __page_cache_alloc+0x67/0x70
   [<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260
   [<ffffffff81132751>] ra_submit+0x21/0x30
   [<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0
   [<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0
   [<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670
   [<ffffffff81266cfa>] nfs_file_read+0xca/0x130
   [<ffffffff8117b20a>] do_sync_read+0xfa/0x140
   [<ffffffff8117bf75>] vfs_read+0xb5/0x1a0
   [<ffffffff8117c151>] sys_read+0x51/0x80
   [<ffffffff8103c032>] system_call_fastpath+0x16/0x1b
  RIP  [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900
   RSP <ffff88000d1e78a8>
  ---[ end trace 4bda28328b9990db ]

[akpm@linux-foundation.org: merge fix]
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <andi.kleen@intel.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:01 -07:00
minskey guo
cf23422b9d cpu/mem hotplug: enable CPUs online before local memory online
Enable users to online CPUs even if the CPUs belongs to a numa node which
doesn't have onlined local memory.

The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
in system boot/init period, or at the time of local memory online.  For a
numa node without onlined local memory, its zonelists are not initialized
at present.  As a result, any memory allocation operations executed by
CPUs within this node will fail.  In fact, an out-of-memory error is
triggered when attempt to online CPUs before memory comes to online.

This patch tries to create zonelists for such numa nodes, so that the
memory allocation for this node can be fallback'ed to other nodes.

[akpm@linux-foundation.org: remove unneeded export]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: minskey guo<chaohong.guo@intel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:00 -07:00
Tejun Heo
3fc1f1e27a stop_machine: reimplement using cpu_stop
Reimplement stop_machine using cpu_stop.  As cpu stoppers are
guaranteed to be available for all online cpus,
stop_machine_create/destroy() are no longer necessary and removed.

With resource management and synchronization handled by cpu_stop, the
new implementation is much simpler.  Asking the cpu_stop to execute
the stop_cpu() state machine on all online cpus with cpu hotplug
disabled is enough.

stop_machine itself doesn't need to manage any global resources
anymore, so all per-instance information is rolled into struct
stop_machine_data and the mutex and all static data variables are
removed.

The previous implementation created and destroyed RT workqueues as
necessary which made stop_machine() calls highly expensive on very
large machines.  According to Dimitri Sivanich, preventing the dynamic
creation/destruction makes booting faster more than twice on very
large machines.  cpu_stop resources are preallocated for all online
cpus and should have the same effect.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
2010-05-06 18:49:20 +02:00
Ingo Molnar
b257c14ceb Merge branch 'linus' into sched/core
Merge reason: merge the latest fixes, update to -rc4.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-15 09:36:16 +02:00
Oleg Nesterov
6a1bdc1b57 sched: _cpu_down(): Don't play with current->cpus_allowed
_cpu_down() changes the current task's affinity and then recovers it at
the end. The problems are well known: we can't restore old_allowed if it
was bound to the now-dead-cpu, and we can race with the userspace which
can change cpu-affinity during unplug.

_cpu_down() should not play with current->cpus_allowed at all. Instead,
take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
removes the dying cpu from cpu_online_mask.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091023.GA9148@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:12:03 +02:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Chen Gong
87d5e0236d kernel/cpu.c: delete deprecated definition in cpu_up()
Additional_cpus is only supported for IA64 now.  X86_64 should not be
included.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:28 -08:00
Frans Pop
9d3cfc4c1d sched: Correct printk whitespace in warning from cpu down task check
Due to an incorrect line break the output currently contains tabs.
Also remove trailing space.

The actual output that logcheck sent me looked like this:
 Task events/1 (pid = 10) is on cpu 1^I^I^I^I(state = 1, flags = 84208040)

After this patch it becomes:
 Task events/1 (pid = 10) is on cpu 1 (state = 1, flags = 84208040)

Signed-off-by: Frans Pop <elendilplanet.nl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <201001251456.34996.elendil@planet.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-28 06:59:55 +01:00
Peter Zijlstra
11854247e2 sched: Fix incorrect sanity check
We moved to migrate on wakeup, which means that sleeping tasks could
still be present on offline cpus. Amend the check to only test running
tasks.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-28 06:59:51 +01:00
Xiaotian Feng
9ee349ad6d sched: Fix set_cpu_active() in cpu_down()
Sachin found cpu hotplug test failures on powerpc, which made
the kernel hang on his POWER box.

The problem is that we fail to re-activate a cpu when a
hot-unplug fails. Fix this by moving the de-activation into
_cpu_down after doing the initial checks.

Remove the synchronize_sched() calls and rely on those implied
by rebuilding the sched domains using the new mask.

Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Tested-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.500272612@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-16 19:01:53 +01:00
Linus Torvalds
702a7c7609 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
  sched: Remove forced2_migrations stats
  sched: Fix memory leak in two error corner cases
  sched: Fix build warning in get_update_sysctl_factor()
  sched: Update normalized values on user updates via proc
  sched: Make tunable scaling style configurable
  sched: Fix missing sched tunable recalculation on cpu add/remove
  sched: Fix task priority bug
  sched: cgroup: Implement different treatment for idle shares
  sched: Remove unnecessary RCU exclusion
  sched: Discard some old bits
  sched: Clean up check_preempt_wakeup()
  sched: Move update_curr() in check_preempt_wakeup() to avoid redundant call
  sched: Sanitize fork() handling
  sched: Clean up ttwu() rq locking
  sched: Remove rq->clock coupling from set_task_cpu()
  sched: Consolidate select_task_rq() callers
  sched: Remove sysctl.sched_features
  sched: Protect sched_rr_get_param() access to task->sched_class
  sched: Protect task->cpus_allowed access in sched_getaffinity()
  sched: Fix balance vs hotplug race
  ...

Fixed up conflicts in kernel/sysctl.c (due to sysctl cleanup)
2009-12-12 11:34:10 -08:00
Peter Zijlstra
6ad4c18884 sched: Fix balance vs hotplug race
Since (e761b77: cpu hotplug, sched: Introduce cpu_active_map and redo
sched domain managment) we have cpu_active_mask which is suppose to rule
scheduler migration and load-balancing, except it never (fully) did.

The particular problem being solved here is a crash in try_to_wake_up()
where select_task_rq() ends up selecting an offline cpu because
select_task_rq_fair() trusts the sched_domain tree to reflect the
current state of affairs, similarly select_task_rq_rt() trusts the
root_domain.

However, the sched_domains are updated from CPU_DEAD, which is after the
cpu is taken offline and after stop_machine is done. Therefore it can
race perfectly well with code assuming the domains are right.

Cure this by building the domains from cpu_active_mask on
CPU_DOWN_PREPARE.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-06 21:10:56 +01:00
Mike Travis
feae3203d7 timers, init: Limit the number of per cpu calibration bootup messages
Limit the number of per cpu calibration messages by only
printing out results for the first cpu to boot.

Also, don't print "CPUx is down" as this is expected, and we
don't need 4096 reminders... ;-)

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091118002219.889552000@alcatraz.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 10:18:42 +01:00
Linus Torvalds
227423904c Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pat: Fix cacheflush address in change_page_attr_set_clr()
  mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set
  x86: Fix earlyprintk=dbgp for machines without NX
  x86, pat: Sanity check remap_pfn_range for RAM region
  x86, pat: Lookup the protection from memtype list on vm_insert_pfn()
  x86, pat: Add lookup_memtype to get the current memtype of a paddr
  x86, pat: Use page flags to track memtypes of RAM pages
  x86, pat: Generalize the use of page flag PG_uncached
  x86, pat: Add rbtree to do quick lookup in memtype tracking
  x86, pat: Add PAT reserve free to io_mapping* APIs
  x86, pat: New i/f for driver to request memtype for IO regions
  x86, pat: ioremap to follow same PAT restrictions as other PAT users
  x86, pat: Keep identity maps consistent with mmaps even when pat_disabled
  x86, mtrr: make mtrr_aps_delayed_init static bool
  x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
  generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled
  x86: Fix an incorrect argument of reserve_bootmem()
  x86: Fix system crash when loading with "reservetop" parameter
2009-09-15 09:19:38 -07:00
Shane Wang
69575d3886 x86, intel_txt: clean up the impact on generic code, unbreak non-x86
Move tboot.h from asm to linux to fix the build errors of intel_txt
patch on non-X86 platforms. Remove the tboot code from generic code
init/main.c and kernel/cpu.c.

Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-01 18:25:07 -07:00
Suresh Siddha
d0af9eed5a x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
SDM Vol 3a section titled "MTRR considerations in MP systems" specifies
the need for synchronizing the logical cpu's while initializing/updating
MTRR.

Currently Linux kernel does the synchronization of all cpu's only when
a single MTRR register is programmed/updated. During an AP online
(during boot/cpu-online/resume)  where we initialize all the MTRR/PAT registers,
we don't follow this synchronization algorithm.

This can lead to scenarios where during a dynamic cpu online, that logical cpu
is initializing MTRR/PAT with cache disabled (cr0.cd=1) etc while other logical
HT sibling continue to run (also with cache disabled because of cr0.cd=1
on its sibling).

Starting from Westmere, VMX transitions with cr0.cd=1 don't work properly
(because of some VMX performance optimizations) and the above scenario
(with one logical cpu doing VMX activity and another logical cpu coming online)
can result in system crash.

Fix the MTRR initialization by doing rendezvous of all the cpus. During
boot and resume, we delay the MTRR/PAT init for APs till all the
logical cpu's come online and the rendezvous process at the end of AP's bringup,
will initialize the MTRR/PAT for all AP's.

For dynamic single cpu online, we synchronize all the logical cpus and
do the MTRR/PAT init on the AP that is coming online.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-21 16:25:55 -07:00
Joseph Cihula
86886e55b2 x86, intel_txt: Intel TXT Sx shutdown support
Support for graceful handling of sleep states (S3/S4/S5) after an Intel(R) TXT launch.

Without this patch, attempting to place the system in one of the ACPI sleep
states (S3/S4/S5) will cause the TXT hardware to treat this as an attack and
will cause a system reset, with memory locked.  Not only may the subsequent
memory scrub take some time, but the platform will be unable to enter the
requested power state.

This patch calls back into the tboot so that it may properly and securely clean
up system state and clear the secrets-in-memory flag, after which it will place
the system into the requested sleep state using ACPI information passed by the kernel.

 arch/x86/kernel/smpboot.c     |    2 ++
 drivers/acpi/acpica/hwsleep.c |    3 +++
 kernel/cpu.c                  |    7 ++++++-
 3 files changed, 11 insertions(+), 1 deletion(-)

Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-21 11:50:04 -07:00