Commit Graph

80 Commits

Author SHA1 Message Date
Rafael J. Wysocki 5b889bf237 PCI: Fix build if quirks are not enabled
After commit b9c3b26641 ("PCI: support
device-specific reset methods") the kernel build is broken if
CONFIG_PCI_QUIRKS is unset.

Fix this by moving pci_dev_specific_reset() to drivers/pci/quirks.c and
providing an empty replacement for !CONFIG_PCI_QUIRKS builds.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-31 12:00:45 -08:00
Dexuan Cui b9c3b26641 PCI: support device-specific reset methods
Add a new type of quirk for resetting devices at pci_dev_reset time.
This is necessary to handle device with nonstandard reset procedures,
especially useful for guest drivers.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-16 13:37:50 -08:00
Allen Kay ae21ee65e8 PCI: acs p2p upsteram forwarding enabling
Note: dom0 checking in v4 has been separated out into 2/2.

This patch enables P2P upstream forwarding in ACS capable PCIe switches.
It solves two potential problems in virtualization environment where a PCIe
device is assigned to a guest domain using a HW iommu such as VT-d:

1) Unintentional failure caused by guest physical address programmed
   into the device's DMA that happens to match the memory address range
   of other downstream ports in the same PCIe switch.  This causes the PCI
   transaction to go to the matching downstream port instead of go to the
   root complex to get translated by VT-d as it should be.

2) Malicious guest software intentionally attacks another downstream
   PCIe device by programming the DMA address into the assigned device
   that matches memory address range of the downstream PCIe port.

We are in process of implementing device filtering software in KVM/XEN
management software to allow device assignment of PCIe devices behind a PCIe
switch only if it has ACS capability and with the P2P upstream forwarding bits
enabled.  This patch is intended to work for both KVM and Xen environments.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Reviewed-by: Mathew Wilcox <willy@linux.intel.com>
Reviewed-by: Chris Wright <chris@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:25 -08:00
Eric W. Biederman 0ba379ec0f PCI: Simplify hotplug mch quirk.
There is a very old quirk for the intel E7502 E7320 and E7525 memory
controller hubs that disables usage of msi interrupts on pcie hotplug
bridges of those devices, and disables changing the affinity of irqs.

Today all we have to do to disable msi on a specific device is to set
dev->no_msi, which is much more straightforward than the previous
logic.

The re-running of this fixup after pci hotplug happens below these
devices is totally bogus.  All of the state we change is pure software
state and we don't change the hardware at all.  Which means hotplug on
the lower devices doesn't have a chance to change this state.  So we
can safely remove the special case from the pciehp driver and the pcie
portdriver.

I suspect the special case was someone's expermental debug code that
slipped in. Certainly it isn't mentioned in commit
6fb8880a61510295aece04a542767161f624dffe aka BKrev:
41966101LJ_ogfOU0m2aE6teZfQnuQ where the code first appears.

Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 14:06:49 -07:00
Michael S. Tsirkin 711d57796f PCI: expose function reset capability in sysfs
Some devices allow an individual function to be reset without affecting
other functions in the same device: that's what pci_reset_function does.
For devices that have this support, expose reset attribite in sysfs.

This is useful e.g. for virtualization, where a qemu userspace
process wants to reset the device when the guest is reset,
to emulate machine reboot as closely as possible.

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:24 -07:00
Chris Wright 6faf17f6f1 PCI SR-IOV: correct broken resource alignment calculations
An SR-IOV capable device includes an SR-IOV PCIe capability which
describes the Virtual Function (VF) BAR requirements.  A typical SR-IOV
device can support multiple VFs whose BARs must be in a contiguous region,
effectively an array of VF BARs.  The BAR reports the size requirement
for a single VF.  We calculate the full range needed by simply multiplying
the VF BAR size with the number of possible VFs and create a resource
spanning the full range.

This all seems sane enough except it artificially inflates the alignment
requirement for the VF BAR.  The VF BAR need only be aligned to the size
of a single BAR not the contiguous range of VF BARs.  This can cause us
to fail to allocate resources for the BAR despite the fact that we
actually have enough space.

This patch adds a thin PCI specific layer over the generic
resource_alignment() function which is aware of the special nature of
VF BARs and does sorting and allocation based on the smaller alignment
requirement.

I recognize that while resource_alignment is generic, it's basically a
PCI helper.  An alternative to this patch is to add PCI VF BAR specific
information to struct resource.  I opted for the extra layer rather than
adding such PCI specific information to struct resource.  This does
have the slight downside that we don't cache the BAR size and re-read
for each alignment query (happens a small handful of times during boot
for each VF BAR).

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Yu Zhao <yu.zhao@intel.com>
Cc: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-08-30 08:37:25 -07:00
Yu Zhao e277d2fc79 PCI: handle Virtual Function ATS enabling
The SR-IOV spec requires that the Smallest Translation Unit and
the Invalidate Queue Depth fields in the Virtual Function ATS
capability are hardwired to 0. If a function is a Virtual Function,
then and set its Physical Function's STU before enabling the ATS.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-05-18 11:25:58 +01:00
Yu Zhao 302b4215da PCI: support the ATS capability
The PCIe ATS capability makes the Endpoint be able to request the
DMA address translation from the IOMMU and cache the translation
in the device side, thus alleviate IOMMU pressure and improve the
hardware performance in the I/O virtualization environment.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-05-18 11:25:54 +01:00
Linus Torvalds e76e5b2c66 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
  PCI: fix HT MSI mapping fix
  PCI: don't enable too much HT MSI mapping
  x86/PCI: make pci=lastbus=255 work when acpi is on
  PCI: save and restore PCIe 2.0 registers
  PCI: update fakephp for bus_id removal
  PCI: fix kernel oops on bridge removal
  PCI: fix conflict between SR-IOV and config space sizing
  powerpc/PCI: include pci.h in powerpc MSI implementation
  PCI Hotplug: schedule fakephp for feature removal
  PCI Hotplug: rename legacy_fakephp to fakephp
  PCI Hotplug: restore fakephp interface with complete reimplementation
  PCI: Introduce /sys/bus/pci/devices/.../rescan
  PCI: Introduce /sys/bus/pci/devices/.../remove
  PCI: Introduce /sys/bus/pci/rescan
  PCI: Introduce pci_rescan_bus()
  PCI: do not enable bridges more than once
  PCI: do not initialize bridges more than once
  PCI: always scan child buses
  PCI: pci_scan_slot() returns newly found devices
  PCI: don't scan existing devices
  ...

Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
2009-04-01 09:47:12 -07:00
Rafael J. Wysocki 0128a89cf7 PCI PM: Move pci_restore_standard_config to pci-driver.c
Move pci_restore_standard_config() from pci.c to pci-driver.c and
make it static.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-30 21:46:55 +02:00
Alex Chiang 705b1aaa82 PCI: Introduce /sys/bus/pci/rescan
This interface allows the user to force a rescan of all PCI buses
in system, and rediscover devices that have been removed earlier.

pci_bus_attrs implementation from Trent Piepho.

Thanks to Vegard Nossum for discovering locking issues with the
sysfs interface.

Cc: Trent Piepho <xyzzy@speakeasy.org>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 14:57:58 -07:00
Yu Zhao 74bb1bcc7d PCI: handle SR-IOV Virtual Function Migration
Add or remove a Virtual Function after receiving a Migrate In or Out
Request.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:28 -07:00
Yu Zhao dd7cc44d0b PCI: add SR-IOV API for Physical Function driver
Add or remove the Virtual Function when the SR-IOV is enabled or
disabled by the device driver. This can happen anytime rather than
only at the device probe stage.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:26 -07:00
Yu Zhao 480b93b783 PCI: centralize device setup code
Move the device setup stuff into pci_setup_device() which will be used
to setup the Virtual Function later.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:25 -07:00
Yu Zhao a28724b0fb PCI: reserve bus range for SR-IOV device
Reserve the bus number range used by the Virtual Function when
pcibios_assign_all_busses() returns true.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:24 -07:00
Yu Zhao 8c5cdb6adc PCI: restore saved SR-IOV state
Restore the volatile registers in the SR-IOV capability after the
D3->D0 transition.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:24 -07:00
Yu Zhao d1b054da8f PCI: initialize and release SR-IOV capability
If a device has the SR-IOV capability, initialize it (set the ARI
Capable Hierarchy in the lowest numbered PF if necessary; calculate
the System Page Size for the VF MMIO, probe the VF Offset, Stride
and BARs). A lock for the VF bus allocation is also initialized if
a PF is the lowest numbered PF.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:22 -07:00
Yuji Shimada 32a9a682be PCI: allow assignment of memory resources with a specified alignment
This patch allows memory resources to be assigned with a specified
alignment at boot-time or run-time. The patch is useful when we use PCI
pass-through, because page-aligned memory resources are required to
securely share PCI resources with guest drivers.

If you want to assign the resource at boot time, please set
"pci=resource_alignment=" boot parameter.

This is format of "pci=resource_alignment=" boot parameter:

        [<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...]
                Specifies alignment and device to reassign
                aligned memory resources.
                If <order of align> is not specified, PAGE_SIZE is
                used as alignment.
                PCI-PCI bridge can be specified, if resource
                windows need to be expanded.

This is example:

        pci=resource_alignment=20@07:00.0;18@0f:00.0;00:1d.7

If you want to assign the resource at run-time, please set
"/sys/bus/pci/resource_alignment" file, and hot-remove the device and
hot-add the device.  For this purpose, fakephp or PCI hotplug interfaces
can be used.

The format of "/sys/bus/pci/resource_alignment" file is the same with
boot parameter. You can use "," instead of ";".

For example:

        # cd /sys/bus/pci
        # echo -n 20@12:00.0 > resource_alignment
        # echo 1 > devices/0000:12:00.0/remove
        # echo 1 > rescan

Reviewed-by: Alex Chiang <achiang@hp.com>
Reviewed-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:15 -07:00
Randy Dunlap b33bfdef24 PCI: fix struct pci_platform_pm_ops kernel-doc
Fix struct pci_platform_pm_ops kernel-doc notation.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-02-13 12:02:47 -08:00
Rafael J. Wysocki aa8c6c9374 PCI PM: Restore standard config registers of all devices early
There is a problem in our handling of suspend-resume of PCI devices that
many of them have their standard config registers restored with
interrupts enabled and they are put into the full power state with
interrupts enabled as well.  This may lead to the following scenario:
  * an interrupt vector is shared between two or more devices
  * one device is resumed earlier and generates an interrupt
  * the interrupt handler of another device tries to handle it and
    attempts to access the device the config space of which hasn't been
    restored yet and/or which still is in a low power state
  * the system crashes as a result

To prevent this from happening we should restore the standard
configuration registers of all devices with interrupts disabled and we
should put them into the D0 power state right after that.
Unfortunately, this cannot be done using the existing
pci_set_power_state(), because it can sleep.  Also, to do it we have to
make sure that the config spaces of all devices were actually saved
during suspend.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-16 12:57:58 -08:00
Rafael J. Wysocki 734104292f PCI PM: Avoid touching devices behind bridges in unknown state
It generally is better to avoid accessing devices behind bridges that
may not be in the D0 power state, because in that case the bridges'
secondary buses may not be accessible.  For this reason, during the
early phase of resume (ie. with interrupts disabled), before
restoring the standard config registers of a device, check the power
state of the bridge the device is behind and postpone the restoration
of the device's config space, as well as any other operations that
would involve accessing the device, if that state is not D0.

In such cases the restoration of the device's config space will be
retried during the "normal" phase of resume (ie. with interrupts
enabled), so that the bridge can be put into D0 before that happens.

Also, save standard configuration registers of PCI devices during the
"normal" phase of suspend (ie. with interrupts enabled), so that the
bridges the devices are behind can be put into low power states (we
don't put bridges into low power states at the moment, but we may
want to do it in the future and it seems reasonable to design for
that).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:16:05 -08:00
Rafael J. Wysocki fa58d305d9 PCI PM: Add suspend counterpart of pci_reenable_device
PCI devices without drivers are not disabled during suspend and
hibernation, but they are enabled during resume, with the help of
pci_reenable_device(), so there is an unbalanced execution of
pcibios_enable_device() in the resume code path.

To correct this introduce function pci_disable_enabled_device()
that will disable the argument device, if it is enabled when the
function is being run, without updating the device's pci_dev
structure and use it in the suspend code path to balance the
pci_reenable_device() executed during resume.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:14:40 -08:00
Stephen Hemminger 287d19ce2e PCI: revise VPD access interface
Change PCI VPD API which was only used by sysfs to something usable
in drivers.
   * move iteration over multiple words to the low level
   * use conventional types for arguments
   * add exportable wrapper

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:17 -08:00
Jesse Barnes eb9c39d031 PCI: set device wakeup capable flag if platform support is present
When PCI devices are initialized, we check whether they support PCI PM
caps and set the device can_wakeup flag if so.  However, some devices
may have platform provided wakeup events rather than PCI PME signals, so
we need to set can_wakeup in that case too.  Doing so should allow
wakeups from many more devices, especially on cost constrained systems.

Reported-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Joseph Chan <JosephChan@via.com.tw>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:07 -08:00
Yu Zhao 876e501ab2 PCI: factor pci_bus_add_child() from pci_bus_add_devices()
This patch splits a new function, pci_bus_add_child(), from
pci_bus_add_devices(). The new function can be used to register PCI
buses to the device core.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:06 -08:00