Commit Graph

782 Commits

Author SHA1 Message Date
Nick Piggin 4827bbb06e i386: remove bogus comment about memory barrier
The comment being removed by this patch is incorrect and misleading.

In the following situation:

	1. load  ...
	2. store 1 -> X
	3. wmb
	4. rmb
	5. load  a <- Y
	6. store ...

4 will only ensure ordering of 1 with 5.
3 will only ensure ordering of 2 with 6.

Further, a CPU with strictly in-order stores will still only provide that
2 and 6 are ordered (effectively, it is the same as a weakly ordered CPU
with wmb after every store).

In all cases, 5 may still be executed before 2 is visible to other CPUs!

The additional piece of the puzzle that mb() provides is the store/load
ordering, which fundamentally cannot be achieved with any combination of
rmb()s and wmb()s.

This can be an unexpected result if one expected any sort of global ordering
guarantee to barriers (eg. that the barriers themselves are sequentially
consistent with other types of barriers).  However sfence or lfence barriers
need only provide an ordering partial ordering of memory operations -- Consider
that wmb may be implemented as nothing more than inserting a special barrier
entry in the store queue, or, in the case of x86, it can be a noop as the store
queue is in order. And an rmb may be implemented as a directive to prevent
subsequent loads only so long as their are no previous outstanding loads (while
there could be stores still in store queues).

I can actually see the occasional load/store being reordered around lfence on
my core2. That doesn't prove my above assertions, but it does show the comment
is wrong (unless my program is -- can send it out by request).

So:
   mb() and smp_mb() always have and always will require a full mfence
   or lock prefixed instruction on x86.  And we should remove this comment.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Paul McKenney <paulmck@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-29 09:13:59 -07:00
Len Brown 5a16eff86d Pull bugzilla-1641 into release branch 2007-08-24 22:19:05 -04:00
Rolf Eike Beer 5ca2481424 PCI: Document pci_iomap()
This useful interface is hardly mentioned anywhere in the in-tree
documentation.

Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-08-22 14:48:40 -07:00
Len Brown 61ec7567db ACPI: boot correctly with "nosmp" or "maxcpus=0"
In MPS mode, "nosmp" and "maxcpus=0" boot a UP kernel with IOAPIC disabled.
However, in ACPI mode, these parameters didn't completely disable
the IO APIC initialization code and boot failed.

init/main.c:
	Disable the IO_APIC if "nosmp" or "maxcpus=0"
	undefine disable_ioapic_setup() when it doesn't apply.

i386:
	delete ioapic_setup(), it was a duplicate of parse_noapic()
	delete undefinition of disable_ioapic_setup()

x86_64:
	rename disable_ioapic_setup() to parse_noapic() to match i386
	define disable_ioapic_setup() in header to match i386

http://bugzilla.kernel.org/show_bug.cgi?id=1641

Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-08-21 00:33:35 -04:00
Daniel Gollub 0328ecef90 x86_64: Fix to keep watchdog disabled by default for i386/x86_64
Fixed wrong expression which enabled watchdogs even if nmi_watchdog kernel
parameter wasn't set. This regression got slightly introduced with commit
b7471c6da9.

Introduced NMI_DISABLED (-1) which allows to switch the value of NMI_DEFAULT
without breaking the APIC NMI watchdog code (again).

Fixes:
   https://bugzilla.novell.com/show_bug.cgi?id=298084
   http://bugzilla.kernel.org/show_bug.cgi?id=7839
And likely some more nmi_watchdog=0 related issues.

Signed-off-by: Daniel Gollub <dgollub@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-18 10:25:25 -07:00
Satyam Sharma 62be90012c i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
Use cpu_relax() in the busy loops, as atomic_read() doesn't automatically
imply volatility for i386 and x86_64. x86_64 doesn't have this issue because
it open-codes the while loop in smpboot.c:smp_callin() itself that already
uses cpu_relax().

For i386, however, smpboot.c:smp_callin() calls wait_for_init_deassert()
which is buggy for mach-default and mach-es7000 cases.

[ I test-built a kernel -- smp_callin() itself got inlined in its only
  callsite, smpboot.c:start_secondary() -- and the relevant piece of
  code disassembles to the following:

0xc1019704 <start_secondary+12>:        mov    0xc144c4c8,%eax
0xc1019709 <start_secondary+17>:        test   %eax,%eax
0xc101970b <start_secondary+19>:        je     0xc1019709 <start_secondary+17>

  init_deasserted (at 0xc144c4c8) gets fetched into %eax only once and
  then we loop over the test of the stale value in the register only,
  so these look like real bugs to me. With the fix below, this becomes:

0xc1019706 <start_secondary+14>:        pause
0xc1019708 <start_secondary+16>:        cmpl   $0x0,0xc144c4c8
0xc101970f <start_secondary+23>:        je     0xc1019706 <start_secondary+14>

  which looks nice and healthy. ]

Thanks to Heiko Carstens for noticing this.

Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-18 09:54:44 -07:00
Andi Kleen d3f7eae182 i386: Use global flag to disable broken local apic timer on AMD CPUs.
The Averatec 2370 and some other Turion laptop BIOS seems to program the
ENABLE_C1E MSR inconsistently between cores. This confuses the lapic
use heuristics because when C1E is enabled anywhere it seems to affect
the complete chip.

Use a global flag instead of a per cpu flag to handle this.
If any CPU has C1E enabled disabled lapic use.

Thanks to Cal Peake for debugging.

Cc: tglx@linutronix.de
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11 15:58:13 -07:00
Andi Kleen ab144f5ec6 i386: Make patching more robust, fix paravirt issue
Commit 19d36ccdc3 "x86: Fix alternatives
and kprobes to remap write-protected kernel text" uses code which is
being patched for patching.

In particular, paravirt_ops does patching in two stages: first it
calls paravirt_ops.patch, then it fills any remaining instructions
with nop_out().  nop_out calls text_poke() which calls
lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
that call site is one of the places we patch.

If we always do patching as one single call to text_poke(), we only
need make sure we're not patching the memcpy in text_poke itself.
This means the prototype to paravirt_ops.patch needs to change, to
marshal the new code into a buffer rather than patching in place as it
does now.  It also means all patching goes through text_poke(), which
is known to be safe (apply_alternatives is also changed to make a
single patch).

AK: fix compilation on x86-64 (bad rusty!)
AK: fix boot on x86-64 (sigh)
AK: merged with other patches

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11 15:58:13 -07:00
Muli Ben-Yehuda 73c59afc65 finish i386 and x86-64 sysdata conversion
This patch finishes the i386 and x86-64 ->sysdata conversion and hopefully
also fixes Riku's and Andy's observed bugs.  It is based on Yinghai Lu's
and Andy Whitcroft's patches (thanks!) with some changes:

- introduce pci_scan_bus_with_sysdata() and use it instead of
  pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will
  allocate the sysdata structure and then call pci_scan_bus().
- always allocate pci_sysdata dynamically. The whole point of this
  sysdata work is to make it easy to do root-bus specific things
  (e.g., support PCI domains and IOMMU's). I dislike using a default
  struct pci_sysdata in some places and a dynamically allocated
  pci_sysdata elsewhere - the potential for someone indavertantly
  changing the default structure is too high.
- this patch only makes the minimal changes necessary, i.e., the NUMA node is
  always initialized to -1. Patches to do the right thing with regards
  to the NUMA node can build on top of this (either add a 'node'
  parameter to pci_scan_bus_with_sysdata() or just update the node
  when it becomes known).

The patch was compile tested with various configurations (e.g., NUMAQ,
VISWS) and run-time tested on i386 and x86-64.  Unfortunately none of my
machines exhibited the bugs so caveat emptor.

Andy, could you please see if this fixes the NUMA issues you've seen?
Riku, does this fix "pci=noacpi" on your laptop?

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: <riku.seppala@kymp.net>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11 15:47:42 -07:00
Andrew Morton 57d4810ea0 revert "x86, serial: convert legacy COM ports to platform devices"
Revert 7e92b4fc34.  It broke Sébastien Dugué's
machine and Jeff said (persuasively)

  This seems like it will break decades-long-working stuff, in favor of
  breaking new ground in our favorite area, "trusting the BIOS."

  It's just not worth it for serial ports, IMO.  Serial ports are something
  that just shouldn't break at this late stage in the game.  My new Intel
  platform boxes don't even have serial ports, so I question the value of
  messing with serial port probing even more...  because...  just wait a year,
  and your box won't have a serial port either!  :)

  I certainly don't object to the use of platform devices (or isa_driver),
  but the probe change seems questionable.  That's sorta analagous to
  rewriting the floppy driver probe routine.  Sure you could do it...  but why
  risk all that damage and go through debugging all over again?

  It seems clear from this report that we cannot, should not, trust BIOS for
  something (a) so simple and (b) that has been working for over a decade.

Much discussion ensued and we've decided to have another go at all of this.

Cc: Sébastien Dugué <sebastien.dugue@bull.net>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jeff Garzik <jeff@garzik.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Cc: Sascha Sommer <saschasommer@freenet.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:38 -07:00
Stephane Eranian a583f1b542 remove unused TIF_NOTIFY_RESUME flag
Remove unused TIF_NOTIFY_RESUME flag for all processor architectures.  The
flag was not used excecpt on IA-64 where the patch replaces it with
TIF_PERFMON_WORK.

Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:38 -07:00
Rafael J. Wysocki b0cb1a19d0 Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION
Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid
confusion (among other things, with CONFIG_SUSPEND introduced in the
next patch).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-29 16:45:38 -07:00
H. Peter Anvin 238b706da1 [x86 setup] Make struct ist_info cross-architecture, and use in setup code
Make "struct ist_info" valid on both i386 and x86-64, and use the
structure by name in the setup code.  Additionally, "Intel SpeedStep
IST" is redundant, refer to it as IST consistently.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2007-07-25 12:02:21 -07:00
H. Peter Anvin f77b1ab383 [x86 setup] Fix typos in struct efi_info
Fix missing letters in the structure members of struct efi_info.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2007-07-25 12:02:21 -07:00
Len Brown e8b2fd0122 ACPI: Kconfig: remove CONFIG_ACPI_SLEEP from source
As it was a synonym for (CONFIG_ACPI && CONFIG_X86),
the ifdefs for it were more clutter than they were worth.

For ia64, just add a few stubs in anticipation of future
S3 or S4 support.

Signed-off-by: Len Brown <len.brown@intel.com>
2007-07-25 01:29:39 -04:00
Andi Kleen e05aff854c i386: Use patchable lock prefix in set_64bit
Previously lock was unconditionally used, but shouldn't be needed on
UP systems.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:38 -07:00
Juergen Beisert f25f64ed5b x86: Replace NSC/Cyrix specific chipset access macros by inlined functions.
Due to index register access ordering problems, when using macros a line
like this fails (and does nothing):

	setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);

With inlined functions this line will work as expected.

Note about a side effect: Seems on Geode GX1 based systems the
"suspend on halt power saving feature" was never enabled due to this
wrong macro expansion. With inlined functions it will be enabled, but
this will stop the TSC when the CPU runs into a HLT instruction.
Kernel output something like this:
	Clocksource tsc unstable (delta = -472746897 ns)

This is the 3rd version of this patch.

 - Adding missed arch/i386/kernel/cpu/mtrr/state.c
	Thanks to Andres Salomon
 - Adding some big fat comments into the new header file
 	Suggested by Andi Kleen

AK: fixed x86-64 compilation

Signed-off-by: Juergen Beisert <juergen@kreuzholzen.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:38 -07:00
Andi Kleen 8f4e956b31 x86: Stop MCEs and NMIs during code patching
When a machine check or NMI occurs while multiple byte code is patched
the CPU could theoretically see an inconsistent instruction and crash.
Prevent this by temporarily disabling MCEs and returning early in the
NMI handler.

Based on discussion with Mathieu Desnoyers.

Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:37 -07:00
Andi Kleen 19d36ccdc3 x86: Fix alternatives and kprobes to remap write-protected kernel text
Reenable kprobes and alternative patching when the kernel text is write
protected by DEBUG_RODATA

Add a general utility function to change write protected text.  The new
function remaps the code using vmap to write it and takes care of CPU
synchronization.  It also does CLFLUSH to make icache recovery faster.

There are some limitations on when the function can be used, see the
comment.

This is a newer version that also changes the paravirt_ops code.
text_poke also supports multi byte patching now.

Contains bug fixes from Zach Amsden and suggestions from Mathieu
Desnoyers.

Cc: Jan Beulich <jbeulich@novell.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:37 -07:00
Muli Ben-Yehuda 08f1c192c3 x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata
This patch introduces struct pci_sysdata to x86 and x86-64, and
converts the existing two users (NUMA, Calgary) to use it.

This lays the groundwork for having other users of sysdata, such as
the PCI domains work.

The Calgary bits are tested, the NUMA bits just look ok.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:14 -07:00
Andres Salomon f62e518484 i386: basic infrastructure support for AMD geode-class machines
This builds upon the existing geode infrastructure, but adds southbridge
support, some GPIO functions, and a header file (asm-i386/geode.h) with some
useful GX/LX detection tests.

The majority of this code was written by Jordan Crouse.

Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:14 -07:00
Thomas Gleixner a2900975ef i386: move PIT function declarations and constants to correct header file
setup_pit_timer is declared in asm-i386/timer.h.  Move it to the pit header
file, so it can be used by x86_64 as well.

Move also the PIT constants.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:14 -07:00
Robert P. J. Day 48dd9343d0 i386: replace hard-coded constant with appropriate macro from kernel.h
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:13 -07:00
Andreas Mohr 267eb01a62 i386: add cpu_relax() to cmos_lock()
Add cpu_relax() to cmos_lock() inline function for faster operation on SMT
CPUs and less power consumption on others in case of lock contention (which
probably doesn't happen too often, so admittedly this patch is not too
exciting).

[akpm@linux-foundation.org: Include the header file for cpu_relax()]
Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:13 -07:00
Rafael J. Wysocki 1c10070a55 i386: do not restore reserved memory after hibernation
On some systems the ACPI NVS area is located in the first 1 MB of RAM and
it is overwritten by the i386 code during the restore after hibernation.
This confuses the ACPI platform firmware that doesn't update the AC adapter
status appropriately as a result
(http://bugzilla.kernel.org/show_bug.cgi?id=7995).

The solution is to register the reserved memory in the first 1 MB as
'nosave', so that swsusp doesn't touch it during the restore.  Also, this
has been done on x86_64 for a long time now, so this patch makes the i386
restore code behave like the x86_64 one.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:12 -07:00