IRQ_WORK is used by GHES, but it is selected by PERF_EVENT.
For now PERF_EVENT is selected by x86 by default, but
in concept, IRQ_WORK should be selected by GHES, not by others.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bit 0 of the support parameter to the OSC call should be set in order to
indicate that the OS supports the WHEA mechanism. Stuart Hayes tracked
an APEI issue on some Dell platforms down to this.
Reported-by: Stuart Hayes <Stuart_Hayes@Dell.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'apei-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI, APEI, EINJ Param support is disabled by default
APEI GHES: 32-bit buildfix
ACPI: APEI build fix
ACPI, APEI, GHES: Add hardware memory error recovery support
HWPoison: add memory_failure_queue()
ACPI, APEI, GHES, Error records content based throttle
ACPI, APEI, GHES, printk support for recoverable error via NMI
lib, Make gen_pool memory allocator lockless
lib, Add lock-less NULL terminated single list
Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
ACPI, APEI, Add WHEA _OSC support
ACPI, APEI, Add APEI bit support in generic _OSC call
ACPI, APEI, GHES, Support disable GHES at boot time
ACPI, APEI, GHES, Prevent GHES to be built as module
ACPI, APEI, Use apei_exec_run_optional in APEI EINJ and ERST
ACPI, APEI, Add apei_exec_run_optional
ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic
ACPI, APEI, ERST, Fix erst-dbg long record reading issue
ACPI, APEI, ERST, Prevent erst_dbg from loading if ERST is disabled
Some trivial conflicts due to other various merges
adding to the end of common lists sooner than this one.
arch/ia64/Kconfig
arch/powerpc/Kconfig
arch/x86/Kconfig
lib/Kconfig
lib/Makefile
Signed-off-by: Len Brown <len.brown@intel.com>
EINJ parameter support is only usable for some specific BIOS.
Originally, it is expected to have no harm for BIOS does not support
it. But now, we found it will cause issue (memory overwriting) for
some BIOS. So param support is disabled by default and only enabled
when newly added module parameter named "param_extension" is
explicitly specified.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression
drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression
ghes.c:(.text+0x46289): undefined reference to `__udivdi3'
in function ghes_estatus_cache_add().
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Len Brown <len.brown@intel.com>
as GHES is optional...
When # CONFIG_ACPI_APEI_GHES is not set:
(.init.text+0x4c22): undefined reference to `ghes_disable'
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Len Brown <len.brown@intel.com>
memory_failure_queue() is called when recoverable memory errors are
notified by firmware to do the recovery work.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
printk is used by GHES to report hardware errors. Ratelimit is
enforced on the printk to avoid too many hardware error reports in
kernel log. Because there may be thousands or even millions of
corrected hardware errors during system running.
Currently, a simple scheme is used. That is, the total number of
hardware error reporting is ratelimited. This may cause some issues
in practice.
For example, there are two kinds of hardware errors occurred in
system. One is corrected memory error, because the fault memory
address is accessed frequently, there may be hundreds error report
per-second. The other is corrected PCIe AER error, it will be
reported once per-second. Because they share one ratelimit control
structure, it is highly possible that only memory error is reported.
To avoid the above issue, an error record content based throttle
algorithm is implemented in the patch. Where after the first
successful reporting, all error records that are same are throttled for
some time, to let other kinds of error records have the opportunity to
be reported.
In above example, the memory errors will be throttled for some time,
after being printked. Then the PCIe AER error will be printked
successfully.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Some APEI GHES recoverable errors are reported via NMI, but printk is
not safe in NMI context.
To solve the issue, a lock-less memory allocator is used to allocate
memory in NMI handler, save the error record into the allocated
memory, put the error record into a lock-less list. On the other
hand, an irq_work is used to delay the operation from NMI context to
IRQ context. The irq_work IRQ handler will remove nodes from
lock-less list, printk the error record and do some further processing
include recovery operation, then free the memory.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (28 commits)
ACPI: delete stale reference in kernel-parameters.txt
ACPI: add missing _OSI strings
ACPI: remove NID_INVAL
thermal: make THERMAL_HWMON implementation fully internal
thermal: split hwmon lookup to a separate function
thermal: hide CONFIG_THERMAL_HWMON
ACPI print OSI(Linux) warning only once
ACPI: DMI workaround for Asus A8N-SLI Premium and Asus A8N-SLI DELUX
ACPI / Battery: propagate sysfs error in acpi_battery_add()
ACPI / Battery: avoid acpi_battery_add() use-after-free
ACPI: introduce "acpi_rsdp=" parameter for kdump
ACPI: constify ops structs
ACPI: fix CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS
ACPI: fix 80 char overflow
ACPI / Battery: Resolve the race condition in the sysfs_remove_battery()
ACPI / Battery: Add the check before refresh sysfs in the battery_notify()
ACPI / Battery: Add the hibernation process in the battery_notify()
ACPI / Battery: Rename acpi_battery_quirks2 with acpi_battery_quirks
ACPI / Battery: Change 16-bit signed negative battery current into correct value
ACPI / Battery: Add the power unit macro
...
Linux supports some optional features, but it should notify the BIOS about
them via the _OSI method. Currently Linux doesn't notify any, which might
make such features not work because the BIOS doesn't know about them.
Jarosz has a system which needs this to make ACPI processor aggregator
device work.
Reported-by: "Jarosz, Sebastian" <sebastian.jarosz@intel.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'pstore-efi' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
efivars: Introduce PSTORE_EFI_ATTRIBUTES
efivars: Use string functions in pstore_write
efivars: introduce utf16_strncmp
efivars: String functions
efi: Add support for using efivars as a pstore backend
pstore: Allow the user to explicitly choose a backend
pstore: Make "part" unsigned
pstore: Add extra context for writes and erases
pstore: Extend API for more flexibility in new backends
* 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (135 commits)
drm/radeon/kms: fix DP training for DPEncoderService revision bigger than 1.1
drm/radeon/kms: add missing vddci setting on NI+
drm/radeon: Add a rmb() in IH processing
drm/radeon: ATOM Endian fix for atombios_crtc_program_pll()
drm/radeon: Fix the definition of RADEON_BUF_SWAP_32BIT
drm/radeon: Do an MMIO read on interrupts when not uisng MSIs
drm/radeon: Writeback endian fixes
drm/radeon: Remove a bunch of useless _iomem casts
drm/gem: add support for private objects
DRM: clean up and document parsing of video= parameter
DRM: Radeon: Fix section mismatch.
drm: really make debug levels match in edid failure code
drm/radeon/kms: fix i2c map for rv250/280
drm/nouveau/gr: disable fifo access and idle before suspend ctx unload
drm/nouveau: pass flag to engine fini() method on suspend
drm/nouveau: replace nv04_graph_fifo_access() use with direct reg bashing
drm/nv40/gr: rewrite/split context takedown functions
drm/nouveau: detect disabled device in irq handler and return IRQ_NONE
drm/nouveau: ignore connector type when deciding digital/analog on DVI-I
drm/nouveau: Add a quirk for Gigabyte NX86T
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
fs: Merge split strings
treewide: fix potentially dangerous trailing ';' in #defined values/expressions
uwb: Fix misspelling of neighbourhood in comment
net, netfilter: Remove redundant goto in ebt_ulog_packet
trivial: don't touch files that are removed in the staging tree
lib/vsprintf: replace link to Draft by final RFC number
doc: Kconfig: `to be' -> `be'
doc: Kconfig: Typo: square -> squared
doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
drivers/net: static should be at beginning of declaration
drivers/media: static should be at beginning of declaration
drivers/i2c: static should be at beginning of declaration
XTENSA: static should be at beginning of declaration
SH: static should be at beginning of declaration
MIPS: static should be at beginning of declaration
ARM: static should be at beginning of declaration
rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
Update my e-mail address
PCIe ASPM: forcedly -> forcibly
gma500: push through device driver tree
...
Fix up trivial conflicts:
- arch/arm/mach-ep93xx/dma-m2p.c (deleted)
- drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
- drivers/net/r8169.c (just context changes)
We'll never have a negative part, so just make this an unsigned int.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
EFI only provides small amounts of individual storage, and conventionally
puts metadata in the storage variable name. Rather than add a metadata
header to the (already limited) variable storage, it's easier for us to
modify pstore to pass all the information we need to construct a unique
variable name to the appropriate functions.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some pstore implementations may not have a static context, so extend the
API to pass the pstore_info struct to all calls and allow for a context
pointer.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>