* topic/huang-d3cold-v7:
PCI/PM: add PCIe runtime D3cold support
PCI: do not call pci_set_power_state with PCI_D3cold
PCI/PM: add runtime PM support to PCIe port
ACPI/PM: specify lowest allowed state for device sleep state
This patch adds runtime D3cold support and corresponding ACPI platform
support. This patch only enables runtime D3cold support; it does not
enable D3cold support during system suspend/hibernate.
D3cold is the deepest power saving state for a PCIe device, where its main
power is removed. While it is in D3cold, you can't access the device at
all, not even its configuration space (which is still accessible in D3hot).
Therefore the PCI PM registers can not be used to transition into/out of
the D3cold state; that must be done by platform logic such as ACPI _PR3.
To support wakeup from D3cold, a system may provide auxiliary power, which
allows a device to request wakeup using a Beacon or the sideband WAKE#
signal. WAKE# is usually connected to platform logic such as ACPI GPE.
This is quite different from other power saving states, where devices
request wakeup via a PME message on the PCIe link.
Some devices, such as those in plug-in slots, have no direct platform
logic. For example, there is usually no ACPI _PR3 for them. D3cold
support for these devices can be done via the PCIe Downstream Port leading
to the device. When the PCIe port is powered on/off, the device is powered
on/off too. Wakeup events from the device will be notified to the
corresponding PCIe port.
For more information about PCIe D3cold and corresponding ACPI support,
please refer to:
- PCI Express Base Specification Revision 2.0
- Advanced Configuration and Power Interface Specification Revision 5.0
[bhelgaas: changelog]
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Originally-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This patch adds runtime PM support to PCIe port. This is needed by
PCIe D3cold support, where PCIe device without ACPI node may be
powered on/off by PCIe port.
Because runtime suspend is broken for some chipsets, a black list is
used to disable runtime PM support for these chipsets.
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Get rid of these:
drivers/pci/pcie/portdrv_core.c: In function 'pcie_port_device_register':
drivers/pci/pcie/portdrv_core.c:275:16: warning: 'cap_mask' may be used
uninitialized in this function [-Wuninitialized]
drivers/pci/pcie/portdrv_core.c:240:6: note: 'cap_mask' was declared here
In some cases, 'cap_mask' may be not set in pcie_port_platform_notify,
holding a garbage value.
Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Since 3.2.12 and 3.3, some systems are failing to boot with a BUG_ON.
Some other systems using the pata_jmicron driver fail to boot because no
disks are detected. Passing pcie_aspm=force on the kernel command line
works around it.
The cause: commit 4949be1682 ("PCI: ignore pre-1.1 ASPM quirking when
ASPM is disabled") changed the behaviour of pcie_aspm_sanity_check() to
always return 0 if aspm is disabled, in order to avoid cases where we
changed ASPM state on pre-PCIe 1.1 devices.
This skipped the secondary function of pcie_aspm_sanity_check which was
to avoid us enabling ASPM on devices that had non-PCIe children, causing
trouble later on. Move the aspm_disabled check so we continue to honour
that scenario.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=42979 and
http://bugs.debian.org/665420
Reported-by: Romain Francoise <romain@orebokech.com> # kernel panic
Reported-by: Chris Holland <bandidoirlandes@gmail.com> # disk detection trouble
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: stable@vger.kernel.org
Tested-by: Hatem Masmoudi <hatem.masmoudi@gmail.com> # Dell Latitude E5520
Tested-by: janek <jan0x6c@gmail.com> # pata_jmicron with JMB362/JMB363
[jn: with more symptoms in log message]
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull PCI changes (including maintainer change) from Jesse Barnes:
"This pull has some good cleanups from Bjorn and Yinghai, as well as
some more code from Yinghai to better handle resource re-allocation
when enabled.
There's also a new initcall_debug feature from Arjan which will print
out quirk timing information to help identify slow quirks for fixing
or refinement (Yinghai sent in a few patches to do just that once the
new debug code landed).
Beyond that, I'm handing off PCI maintainership to Bjorn Helgaas.
He's been a core PCI and Linux contributor for some time now, and has
kindly volunteered to take over. I just don't feel I have the time
for PCI review and work that it deserves lately (I've taken on some
other projects), and haven't been as responsive lately as I'd like, so
I approached Bjorn asking if he'd like to manage things. He's going
to give it a try, and I'm confident he'll do at least as well as I
have in keeping the tree managed, patches flowing, and keeping things
stable."
Fix up some fairly trivial conflicts due to other cleanups (mips device
resource fixup cleanups clashing with list handling cleanup, ppc iseries
removal clashing with pci_probe_only cleanup etc)
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci: (112 commits)
PCI: Bjorn gets PCI hotplug too
PCI: hand PCI maintenance over to Bjorn Helgaas
unicore32/PCI: move <asm-generic/pci-bridge.h> include to asm/pci.h
sparc/PCI: convert devtree and arch-probed bus addresses to resource
powerpc/PCI: allow reallocation on PA Semi
powerpc/PCI: convert devtree bus addresses to resource
powerpc/PCI: compute I/O space bus-to-resource offset consistently
arm/PCI: don't export pci_flags
PCI: fix bridge I/O window bus-to-resource conversion
x86/PCI: add spinlock held check to 'pcibios_fwaddrmap_lookup()'
PCI / PCIe: Introduce command line option to disable ARI
PCI: make acpihp use __pci_remove_bus_device instead
PCI: export __pci_remove_bus_device
PCI: Rename pci_remove_behind_bridge to pci_stop_and_remove_behind_bridge
PCI: Rename pci_remove_bus_device to pci_stop_and_remove_bus_device
PCI: print out PCI device info along with duration
PCI: Move "pci reassigndev resource alignment" out of quirks.c
PCI: Use class for quirk for usb host controller fixup
PCI: Use class for quirk for ti816x class fixup
PCI: Use class for quirk for intel e100 interrupt fixup
...
Right now we won't touch ASPM state if ASPM is disabled, except in the case
where we find a device that appears to be too old to reliably support ASPM.
Right now we'll clear it in that case, which is almost certainly the wrong
thing to do. The easiest way around this is just to disable the blacklisting
when ASPM is disabled.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add a parameter to avoid using MSI/MSI-X for PCIe native hotplug; it's
known to be buggy on some platforms.
In my environment, while shutting down, following stack trace is shown
sometimes.
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 1081, comm: reboot Not tainted 3.2.0 #1
Call Trace:
<IRQ> [<ffffffff810cec1d>] __report_bad_irq+0x3d/0xe0
[<ffffffff810cee1c>] note_interrupt+0x15c/0x210
[<ffffffff810cc485>] handle_irq_event_percpu+0xb5/0x210
[<ffffffff810cc621>] handle_irq_event+0x41/0x70
[<ffffffff810cf675>] handle_fasteoi_irq+0x55/0xc0
[<ffffffff81015356>] handle_irq+0x46/0xb0
[<ffffffff814fbe9d>] do_IRQ+0x5d/0xe0
[<ffffffff814f146e>] common_interrupt+0x6e/0x6e
[<ffffffff8106b040>] ? __do_softirq+0x60/0x210
[<ffffffff8108aeb1>] ? hrtimer_interrupt+0x151/0x240
[<ffffffff814fb5ec>] call_softirq+0x1c/0x30
[<ffffffff810152d5>] do_softirq+0x65/0xa0
[<ffffffff8106ae9d>] irq_exit+0xbd/0xe0
[<ffffffff814fbf8e>] smp_apic_timer_interrupt+0x6e/0x99
[<ffffffff814f9e5e>] apic_timer_interrupt+0x6e/0x80
<EOI> [<ffffffff814f0fb1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[<ffffffff812629fc>] pci_bus_write_config_word+0x6c/0x80
[<ffffffff81266fc2>] pci_intx+0x52/0xa0
[<ffffffff8127de3d>] pci_intx_for_msi+0x1d/0x30
[<ffffffff8127e4fb>] pci_msi_shutdown+0x7b/0x110
[<ffffffff81269d34>] pci_device_shutdown+0x34/0x50
[<ffffffff81326c4f>] device_shutdown+0x2f/0x140
[<ffffffff8107b981>] kernel_restart_prepare+0x31/0x40
[<ffffffff8107b9e6>] kernel_restart+0x16/0x60
[<ffffffff8107bbfd>] sys_reboot+0x1ad/0x220
[<ffffffff814f4b90>] ? do_page_fault+0x1e0/0x460
[<ffffffff811942d0>] ? __sync_filesystem+0x90/0x90
[<ffffffff8105c9aa>] ? __cond_resched+0x2a/0x40
[<ffffffff814ef090>] ? _cond_resched+0x30/0x40
[<ffffffff81169e17>] ? iterate_supers+0xb7/0xd0
[<ffffffff814f9382>] system_call_fastpath+0x16/0x1b
handlers:
[<ffffffff8138a0f0>] usb_hcd_irq
[<ffffffff8138a0f0>] usb_hcd_irq
[<ffffffff8138a0f0>] usb_hcd_irq
Disabling IRQ #16
An un-wanted interrupt is generated when PCI driver switches from
MSI/MSI-X to INTx while shutting down the device. The interrupt does
not happen if MSI/MSI-X is not used on the device.
I confirmed that this problem does not happen if pcie_hp=nomsi was
specified and hotplug operation worked fine as usual.
v2: Automatically disable MSI/MSI-X against following device:
PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02)
v3: Based on the review comment, combile the if statements.
v4: Removed module parameter.
Move some code to build pciehp as a module.
Move device specific code to driver/pci/quirks.c.
v5: Drop a device specific code until getting a vendor statement.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Distributions may wish to provide different defaults for PCIE ASPM
depending on their target audience. Provide a configuration option for
choosing the default policy.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Right now we forcibly clear ASPM state on all devices if the BIOS indicates
that the feature isn't supported. Based on the Microsoft presentation
"PCI Express In Depth for Windows Vista and Beyond", I'm starting to think
that this may be an error. The implication is that unless the platform
grants full control via _OSC, Windows will not touch any PCIe features -
including ASPM. In that case clearing ASPM state would be an error unless
the platform has granted us that control.
This patch reworks the ASPM disabling code such that the actual clearing
of state is triggered by a successful handoff of PCIe control to the OS.
The general ASPM code undergoes some changes in order to ensure that the
ability to clear the bits isn't overridden by ASPM having already been
disabled. Further, this theoretically now allows for situations where
only a subset of PCIe roots hand over control, leaving the others in the
BIOS state.
It's difficult to know for sure that this is the right thing to do -
there's zero public documentation on the interaction between all of these
components. But enough vendors enable ASPM on platforms and then set this
bit that it seems likely that they're expecting the OS to leave them alone.
Measured to save around 5W on an idle Thinkpad X220.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The land of PCI power management is a land of sorrow and ugliness,
especially in the area of signaling events by devices. There are
devices that set their PME Status bits, but don't really bother
to send a PME message or assert PME#. There are hardware vendors
who don't connect PME# lines to the system core logic (they know
who they are). There are PCI Express Root Ports that don't bother
to trigger interrupts when they receive PME messages from the devices
below. There are ACPI BIOSes that forget to provide _PRW methods for
devices capable of signaling wakeup. Finally, there are BIOSes that
do provide _PRW methods for such devices, but then don't bother to
call Notify() for those devices from the corresponding _Lxx/_Exx
GPE-handling methods. In all of these cases the kernel doesn't have
a chance to receive a proper notification that it should wake up a
device, so devices stay in low-power states forever. Worse yet, in
some cases they continuously send PME Messages that are silently
ignored, because the kernel simply doesn't know that it should clear
the device's PME Status bit.
This problem was first observed for "parallel" (non-Express) PCI
devices on add-on cards and Matthew Garrett addressed it by adding
code that polls PME Status bits of such devices, if they are enabled
to signal PME, to the kernel. Recently, however, it has turned out
that PCI Express devices are also affected by this issue and that it
is not limited to add-on devices, so it seems necessary to extend
the PME polling to all PCI devices, including PCI Express and planar
ones. Still, it would be wasteful to poll the PME Status bits of
devices that are known to receive proper PME notifications, so make
the kernel (1) poll the PME Status bits of all PCI and PCIe devices
enabled to signal PME and (2) disable the PME Status polling for
devices for which correct PME notifications are received.
Tested-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: remove printks about disabled bridge windows
PCI: fold pci_calc_resource_flags() into decode_bar()
PCI: treat mem BAR type "11" (reserved) as 32-bit, not 64-bit, BAR
PCI: correct pcie_set_readrq write size
PCI: pciehp: change wait time for valid configuration access
x86/PCI: Preserve existing pci=bfsort whitelist for Dell systems
PCI: ARI is a PCIe v2 feature
x86/PCI: quirks: Use pci_dev->revision
PCI: Make the struct pci_dev * argument of pci_fixup_irqs const.
PCI hotplug: cpqphp: use pci_dev->vendor
PCI hotplug: cpqphp: use pci_dev->subsystem_{vendor|device}
x86/PCI: config space accessor functions should not ignore the segment argument
PCI: Assign values to 'pci_obff_signal_type' enumeration constants
x86/PCI: reduce severity of host bridge window conflict warnings
PCI: enumerate the PCI device only removed out PCI hieratchy of OS when re-scanning PCI
PCI: PCIe AER: add aer_recover_queue
x86/PCI: select direct access mode for mmconfig option
PCI hotplug: Rename is_ejectable which also exists in dock.c
In addition to native PCIe AER, now APEI (ACPI Platform Error
Interface) GHES (Generic Hardware Error Source) can be used to report
PCIe AER errors too. To add support to APEI GHES PCIe AER recovery,
aer_recover_queue is added to export the recovery function in native
PCIe AER driver.
Recoverable PCIe AER errors are reported via NMI in APEI GHES. Then
APEI GHES uses irq_work to delay the error processing into an IRQ
handler. But PCIe AER recovery can be very time-consuming, so
aer_recover_queue, which can be used in IRQ handler, delays the real
recovery action into the process context, that is, work queue.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Merriam-Webster tells us that the word exists. However ...
* Google suggests `forcibly' because it doesn't recognize `forcedly'.
* Google lists 494 thousand results for `forcedly'.
* Google lists 13.7 million results for `forcibly'.
* Linus's repo contains 1 occurrence of `forcedly' ( 0 after my change).
* Linus's repo contains 60 occurrences of `forcibly' (61 after my change).
Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
If it was preempted, and the variable aer_mask_override is changed
after the spin_unlock_irqrestore it will write an uninitialized
variable by the pci_write_config_dword() function.
Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit 2f671e2d allowed us to clear ASPM state when the FADT
tells us it isn't supported, but we don't put this into effect
if the aspm_policy is set to POLICY_POWERSAVE. Enable the
state to be cleared regardless of policy.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: Disable ASPM when _OSC control is not granted for PCIe services
PCI: Changing ASPM policy, via /sys, to POWERSAVE could cause NMIs
PCI: PCIe links may not get configured for ASPM under POWERSAVE mode
PCI/ACPI: Report ASPM support to BIOS if not disabled from command line
The AER error information printing support is implemented in
drivers/pci/pcie/aer/aer_print.c. So some string constants, functions
and macros definitions can be re-used without being exported.
The original PCIe AER error information printing function is not
re-used directly because the overall format is quite different. And
changing the original printing format may make some original users'
scripts broken.
Signed-off-by: Huang Ying <ying.huang@intel.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
When printing PCIe AER error information, each line is prefixed with
PCIe device and driver information. In original implementation, the
prefix is generated when each line is printed. In fact, all lines
share the same prefix. So this patch pre-generated the prefix, and
use that one when each line is printed.
In addition to common prefix can be pre-generated, the trailing white
spaces in string constants and NULLs in char * array constants can be
removed too. These can reduce the object file size further.
The size of object file before and after changing is as follow:
text data bss dec
before: 3038 0 0 3038
after: 2118 0 0 2118
Signed-off-by: Huang Ying <ying.huang@intel.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>