Merge branch 'master' into gfs2

This commit is contained in:
Steven Whitehouse
2006-09-28 08:29:59 -04:00
1941 changed files with 87478 additions and 33031 deletions

View File

@@ -1,13 +1,12 @@
What: devfs
Date: July 2005
Date: July 2005 (scheduled), finally removed in kernel v2.6.18
Contact: Greg Kroah-Hartman <gregkh@suse.de>
Description:
devfs has been unmaintained for a number of years, has unfixable
races, contains a naming policy within the kernel that is
against the LSB, and can be replaced by using udev.
The files fs/devfs/*, include/linux/devfs_fs*.h will be removed,
The files fs/devfs/*, include/linux/devfs_fs*.h were removed,
along with the the assorted devfs function calls throughout the
kernel tree.
Users:

View File

@@ -0,0 +1,88 @@
What: /sys/power/
Date: August 2006
Contact: Rafael J. Wysocki <rjw@sisk.pl>
Description:
The /sys/power directory will contain files that will
provide a unified interface to the power management
subsystem.
What: /sys/power/state
Date: August 2006
Contact: Rafael J. Wysocki <rjw@sisk.pl>
Description:
The /sys/power/state file controls the system power state.
Reading from this file returns what states are supported,
which is hard-coded to 'standby' (Power-On Suspend), 'mem'
(Suspend-to-RAM), and 'disk' (Suspend-to-Disk).
Writing to this file one of these strings causes the system to
transition into that state. Please see the file
Documentation/power/states.txt for a description of each of
these states.
What: /sys/power/disk
Date: August 2006
Contact: Rafael J. Wysocki <rjw@sisk.pl>
Description:
The /sys/power/disk file controls the operating mode of the
suspend-to-disk mechanism. Reading from this file returns
the name of the method by which the system will be put to
sleep on the next suspend. There are four methods supported:
'firmware' - means that the memory image will be saved to disk
by some firmware, in which case we also assume that the
firmware will handle the system suspend.
'platform' - the memory image will be saved by the kernel and
the system will be put to sleep by the platform driver (e.g.
ACPI or other PM registers).
'shutdown' - the memory image will be saved by the kernel and
the system will be powered off.
'reboot' - the memory image will be saved by the kernel and
the system will be rebooted.
The suspend-to-disk method may be chosen by writing to this
file one of the accepted strings:
'firmware'
'platform'
'shutdown'
'reboot'
It will only change to 'firmware' or 'platform' if the system
supports that.
What: /sys/power/image_size
Date: August 2006
Contact: Rafael J. Wysocki <rjw@sisk.pl>
Description:
The /sys/power/image_size file controls the size of the image
created by the suspend-to-disk mechanism. It can be written a
string representing a non-negative integer that will be used
as an upper limit of the image size, in bytes. The kernel's
suspend-to-disk code will do its best to ensure the image size
will not exceed this number. However, if it turns out to be
impossible, the kernel will try to suspend anyway using the
smallest image possible. In particular, if "0" is written to
this file, the suspend image will be as small as possible.
Reading from this file will display the current image size
limit, which is set to 500 MB by default.
What: /sys/power/pm_trace
Date: August 2006
Contact: Rafael J. Wysocki <rjw@sisk.pl>
Description:
The /sys/power/pm_trace file controls the code which saves the
last PM event point in the RTC across reboots, so that you can
debug a machine that just hangs during suspend (or more
commonly, during resume). Namely, the RTC is only used to save
the last PM event point if this file contains '1'. Initially
it contains '0' which may be changed to '1' by writing a
string representing a nonzero integer into it.
To use this debugging feature you should attempt to suspend
the machine, then reboot it and run
dmesg -s 1000000 | grep 'hash matches'
CAUTION: Using it will cause your machine's real-time (CMOS)
clock to be set to a random invalid time after a resume.

View File

@@ -43,59 +43,52 @@
<para>A Universal Serial Bus (USB) is used to connect a host,
such as a PC or workstation, to a number of peripheral
devices. USB uses a tree structure, with the host at the
devices. USB uses a tree structure, with the host as the
root (the system's master), hubs as interior nodes, and
peripheral devices as leaves (and slaves).
peripherals as leaves (and slaves).
Modern PCs support several such trees of USB devices, usually
one USB 2.0 tree (480 Mbit/sec each) with
a few USB 1.1 trees (12 Mbit/sec each) that are used when you
connect a USB 1.1 device directly to the machine's "root hub".
</para>
<para>That master/slave asymmetry was designed in part for
ease of use. It is not physically possible to assemble
(legal) USB cables incorrectly: all upstream "to-the-host"
connectors are the rectangular type, matching the sockets on
root hubs, and the downstream type are the squarish type
(or they are built in to the peripheral).
Software doesn't need to deal with distributed autoconfiguration
since the pre-designated master node manages all that.
At the electrical level, bus protocol overhead is reduced by
eliminating arbitration and moving scheduling into host software.
<para>That master/slave asymmetry was designed-in for a number of
reasons, one being ease of use. It is not physically possible to
assemble (legal) USB cables incorrectly: all upstream "to the host"
connectors are the rectangular type (matching the sockets on
root hubs), and all downstream connectors are the squarish type
(or they are built into the peripheral).
Also, the host software doesn't need to deal with distributed
auto-configuration since the pre-designated master node manages all that.
And finally, at the electrical level, bus protocol overhead is reduced by
eliminating arbitration and moving scheduling into the host software.
</para>
<para>USB 1.0 was announced in January 1996, and was revised
<para>USB 1.0 was announced in January 1996 and was revised
as USB 1.1 (with improvements in hub specification and
support for interrupt-out transfers) in September 1998.
USB 2.0 was released in April 2000, including high speed
transfers and transaction translating hubs (used for USB 1.1
USB 2.0 was released in April 2000, adding high-speed
transfers and transaction-translating hubs (used for USB 1.1
and 1.0 backward compatibility).
</para>
<para>USB support was added to Linux early in the 2.2 kernel series
shortly before the 2.3 development forked off. Updates
from 2.3 were regularly folded back into 2.2 releases, bringing
new features such as <filename>/sbin/hotplug</filename> support,
more drivers, and more robustness.
The 2.5 kernel series continued such improvements, and also
worked on USB 2.0 support,
higher performance,
better consistency between host controller drivers,
API simplification (to make bugs less likely),
and providing internal "kerneldoc" documentation.
<para>Kernel developers added USB support to Linux early in the 2.2 kernel
series, shortly before 2.3 development forked. Updates from 2.3 were
regularly folded back into 2.2 releases, which improved reliability and
brought <filename>/sbin/hotplug</filename> support as well more drivers.
Such improvements were continued in the 2.5 kernel series, where they added
USB 2.0 support, improved performance, and made the host controller drivers
(HCDs) more consistent. They also simplified the API (to make bugs less
likely) and added internal "kerneldoc" documentation.
</para>
<para>Linux can run inside USB devices as well as on
the hosts that control the devices.
Because the Linux 2.x USB support evolved to support mass market
platforms such as Apple Macintosh or PC-compatible systems,
it didn't address design concerns for those types of USB systems.
So it can't be used inside mass-market PDAs, or other peripherals.
USB device drivers running inside those Linux peripherals
But USB device drivers running inside those peripherals
don't do the same things as the ones running inside hosts,
and so they've been given a different name:
they're called <emphasis>gadget drivers</emphasis>.
This document does not present gadget drivers.
so they've been given a different name:
<emphasis>gadget drivers</emphasis>.
This document does not cover gadget drivers.
</para>
</chapter>
@@ -103,17 +96,14 @@
<chapter id="host">
<title>USB Host-Side API Model</title>
<para>Within the kernel,
host-side drivers for USB devices talk to the "usbcore" APIs.
There are two types of public "usbcore" APIs, targetted at two different
layers of USB driver. Those are
<emphasis>general purpose</emphasis> drivers, exposed through
driver frameworks such as block, character, or network devices;
and drivers that are <emphasis>part of the core</emphasis>,
which are involved in managing a USB bus.
Such core drivers include the <emphasis>hub</emphasis> driver,
which manages trees of USB devices, and several different kinds
of <emphasis>host controller driver (HCD)</emphasis>,
<para>Host-side drivers for USB devices talk to the "usbcore" APIs.
There are two. One is intended for
<emphasis>general-purpose</emphasis> drivers (exposed through
driver frameworks), and the other is for drivers that are
<emphasis>part of the core</emphasis>.
Such core drivers include the <emphasis>hub</emphasis> driver
(which manages trees of USB devices) and several different kinds
of <emphasis>host controller drivers</emphasis>,
which control individual busses.
</para>
@@ -122,21 +112,21 @@
<itemizedlist>
<listitem><para>USB supports four kinds of data transfer
(control, bulk, interrupt, and isochronous). Two transfer
types use bandwidth as it's available (control and bulk),
while the other two types of transfer (interrupt and isochronous)
<listitem><para>USB supports four kinds of data transfers
(control, bulk, interrupt, and isochronous). Two of them (control
and bulk) use bandwidth as it's available,
while the other two (interrupt and isochronous)
are scheduled to provide guaranteed bandwidth.
</para></listitem>
<listitem><para>The device description model includes one or more
"configurations" per device, only one of which is active at a time.
Devices that are capable of high speed operation must also support
full speed configurations, along with a way to ask about the
"other speed" configurations that might be used.
Devices that are capable of high-speed operation must also support
full-speed configurations, along with a way to ask about the
"other speed" configurations which might be used.
</para></listitem>
<listitem><para>Configurations have one or more "interface", each
<listitem><para>Configurations have one or more "interfaces", each
of which may have "alternate settings". Interfaces may be
standardized by USB "Class" specifications, or may be specific to
a vendor or device.</para>
@@ -162,7 +152,7 @@
</para></listitem>
<listitem><para>The Linux USB API supports synchronous calls for
control and bulk messaging.
control and bulk messages.
It also supports asynchnous calls for all kinds of data transfer,
using request structures called "URBs" (USB Request Blocks).
</para></listitem>
@@ -463,14 +453,25 @@
file in your Linux kernel sources.
</para>
<para>Otherwise the main use for this file from programs
is to poll() it to get notifications of usb devices
as they're plugged or unplugged.
To see what changed, you'd need to read the file and
compare "before" and "after" contents, scan the filesystem,
or see its hotplug event.
</para>
<para>This file, in combination with the poll() system call, can
also be used to detect when devices are added or removed:
<programlisting>int fd;
struct pollfd pfd;
fd = open("/proc/bus/usb/devices", O_RDONLY);
pfd = { fd, POLLIN, 0 };
for (;;) {
/* The first time through, this call will return immediately. */
poll(&amp;pfd, 1, -1);
/* To see what's changed, compare the file's previous and current
contents or scan the filesystem. (Scanning is more precise.) */
}</programlisting>
Note that this behavior is intended to be used for informational
and debug purposes. It would be more appropriate to use programs
such as udev or HAL to initialize a device or start a user-mode
helper program, for instance.
</para>
</sect1>
<sect1>

View File

@@ -358,7 +358,8 @@ Here is a list of some of the different kernel trees available:
quilt trees:
- USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de>
kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
- x86-64, partly i386, Andi Kleen <ak@suse.de>
ftp.firstfloor.org:/pub/ak/x86_64/quilt/
Bug Reporting
-------------

View File

@@ -2543,6 +2543,9 @@ Your cooperation is appreciated.
64 = /dev/usb/rio500 Diamond Rio 500
65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de)
66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD)
67 = /dev/usb/adutux0 1st Ontrak ADU device
...
76 = /dev/usb/adutux10 10th Ontrak ADU device
96 = /dev/usb/hiddev0 1st USB HID device
...
111 = /dev/usb/hiddev15 16th USB HID device

View File

@@ -6,6 +6,21 @@ be removed from this file.
---------------------------
What: /sys/devices/.../power/state
dev->power.power_state
dpm_runtime_{suspend,resume)()
When: July 2007
Why: Broken design for runtime control over driver power states, confusing
driver-internal runtime power management with: mechanisms to support
system-wide sleep state transitions; event codes that distinguish
different phases of swsusp "sleep" transitions; and userspace policy
inputs. This framework was never widely used, and most attempts to
use it were broken. Drivers should instead be exposing domain-specific
interfaces either to kernel or to userspace.
Who: Pavel Machek <pavel@suse.cz>
---------------------------
What: RAW driver (CONFIG_RAW_DRIVER)
When: December 2005
Why: declared obsolete since kernel 2.6.3
@@ -55,6 +70,18 @@ Who: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
---------------------------
What: sys_sysctl
When: January 2007
Why: The same information is available through /proc/sys and that is the
interface user space prefers to use. And there do not appear to be
any existing user in user space of sys_sysctl. The additional
maintenance overhead of keeping a set of binary names gets
in the way of doing a good job of maintaining this interface.
Who: Eric Biederman <ebiederm@xmission.com>
---------------------------
What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl])
When: November 2005
Files: drivers/pcmcia/: pcmcia_ioctl.c
@@ -202,14 +229,6 @@ Who: Nick Piggin <npiggin@suse.de>
---------------------------
What: Support for the MIPS EV96100 evaluation board
When: September 2006
Why: Does no longer build since at least November 15, 2003, apparently
no userbase left.
Who: Ralf Baechle <ralf@linux-mips.org>
---------------------------
What: Support for the Momentum / PMC-Sierra Jaguar ATX evaluation board
When: September 2006
Why: Does no longer build since quite some time, and was never popular,
@@ -294,3 +313,24 @@ Why: The frame diverter is included in most distribution kernels, but is
It is not clear if anyone is still using it.
Who: Stephen Hemminger <shemminger@osdl.org>
---------------------------
What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment
When: Oktober 2008
Why: The stacking of class devices makes these values misleading and
inconsistent.
Class devices should not carry any of these properties, and bus
devices have SUBSYTEM and DRIVER as a replacement.
Who: Kay Sievers <kay.sievers@suse.de>
---------------------------
What: i2c-isa
When: December 2006
Why: i2c-isa is a non-sense and doesn't fit in the device driver
model. Drivers relying on it are better implemented as platform
drivers.
Who: Jean Delvare <khali@linux-fr.org>
---------------------------

View File

@@ -1124,11 +1124,15 @@ debugging information is displayed on console.
NMI switch that most IA32 servers have fires unknown NMI up, for example.
If a system hangs up, try pressing the NMI switch.
[NOTE]
This function and oprofile share a NMI callback. Therefore this function
cannot be enabled when oprofile is activated.
And NMI watchdog will be disabled when the value in this file is set to
non-zero.
nmi_watchdog
------------
Enables/Disables the NMI watchdog on x86 systems. When the value is non-zero
the NMI watchdog is enabled and will continuously test all online cpus to
determine whether or not they are still functioning properly.
Because the NMI watchdog shares registers with oprofile, by disabling the NMI
watchdog, oprofile may have more registers to utilize.
2.4 /proc/sys/vm - The virtual memory subsystem

View File

@@ -7,9 +7,12 @@ Supported adapters:
* VIA Technologies, Inc. VT82C686A/B
Datasheet: Sometimes available at the VIA website
* VIA Technologies, Inc. VT8231, VT8233, VT8233A, VT8235, VT8237R
* VIA Technologies, Inc. VT8231, VT8233, VT8233A
Datasheet: available on request from VIA
* VIA Technologies, Inc. VT8235, VT8237R, VT8237A, VT8251
Datasheet: available on request and under NDA from VIA
Authors:
Kyösti Mälkki <kmalkki@cc.hut.fi>,
Mark D. Studebaker <mdsxyz123@yahoo.com>,
@@ -39,6 +42,8 @@ Your lspci -n listing must show one of these :
device 1106:8235 (VT8231 function 4)
device 1106:3177 (VT8235)
device 1106:3227 (VT8237R)
device 1106:3337 (VT8237A)
device 1106:3287 (VT8251)
If none of these show up, you should look in the BIOS for settings like
enable ACPI / SMBus or even USB.

View File

@@ -6,9 +6,12 @@ This module is a very simple fake I2C/SMBus driver. It implements four
types of SMBus commands: write quick, (r/w) byte, (r/w) byte data, and
(r/w) word data.
You need to provide a chip address as a module parameter when loading
this driver, which will then only react to SMBus commands to this address.
No hardware is needed nor associated with this module. It will accept write
quick commands to all addresses; it will respond to the other commands (also
to all addresses) by reading from or writing to an array in memory. It will
quick commands to one address; it will respond to the other commands (also
to one address) by reading from or writing to an array in memory. It will
also spam the kernel logs for every command it handles.
A pointer register with auto-increment is implemented for all byte
@@ -21,6 +24,11 @@ The typical use-case is like this:
3. load the target sensors chip driver module
4. observe its behavior in the kernel log
PARAMETERS:
int chip_addr:
The SMBus address to emulate a chip at.
CAVEATS:
There are independent arrays for byte/data and word/data commands. Depending
@@ -33,6 +41,9 @@ If the hardware for your driver has banked registers (e.g. Winbond sensors
chips) this module will not work well - although it could be extended to
support that pretty easily.
Only one chip address is supported - although this module could be
extended to support more.
If you spam it hard enough, printk can be lossy. This module really wants
something like relayfs.

View File

@@ -421,6 +421,11 @@ more details, with real examples.
The second argument is optional, and if supplied will be used
if first argument is not supported.
as-instr
as-instr checks if the assembler reports a specific instruction
and then outputs either option1 or option2
C escapes are supported in the test instruction
cc-option
cc-option is used to check if $(CC) supports a given option, and not
supported to use an optional second option.

View File

@@ -573,8 +573,6 @@ running once the system is up.
gscd= [HW,CD]
Format: <io>
gt96100eth= [NET] MIPS GT96100 Advanced Communication Controller
gus= [HW,OSS]
Format: <io>,<irq>,<dma>,<dma16>
@@ -1240,7 +1238,11 @@ running once the system is up.
bootloader. This is currently used on
IXP2000 systems where the bus has to be
configured a certain way for adjunct CPUs.
noearly [X86] Don't do any early type 1 scanning.
This might help on some broken boards which
machine check when some devices' config space
is read. But various workarounds are disabled
and some IOMMU drivers will not work.
pcmv= [HW,PCMCIA] BadgePAD 4
pd. [PARIDE]
@@ -1363,6 +1365,14 @@ running once the system is up.
reserve= [KNL,BUGS] Force the kernel to ignore some iomem area
reservetop= [IA-32]
Format: nn[KMG]
Reserves a hole at the top of the kernel virtual
address space.
reset_devices [KNL] Force drivers to reset the underlying device
during initialization.
resume= [SWSUSP]
Specify the partition device for software suspend

View File

@@ -192,6 +192,17 @@ or, for backwards compatibility, the option value. E.g.,
arp_interval
Specifies the ARP link monitoring frequency in milliseconds.
The ARP monitor works by periodically checking the slave
devices to determine whether they have sent or received
traffic recently (the precise criteria depends upon the
bonding mode, and the state of the slave). Regular traffic is
generated via ARP probes issued for the addresses specified by
the arp_ip_target option.
This behavior can be modified by the arp_validate option,
below.
If ARP monitoring is used in an etherchannel compatible mode
(modes 0 and 2), the switch should be configured in a mode
that evenly distributes packets across all links. If the
@@ -213,6 +224,54 @@ arp_ip_target
maximum number of targets that can be specified is 16. The
default value is no IP addresses.
arp_validate
Specifies whether or not ARP probes and replies should be
validated in the active-backup mode. This causes the ARP
monitor to examine the incoming ARP requests and replies, and
only consider a slave to be up if it is receiving the
appropriate ARP traffic.
Possible values are:
none or 0
No validation is performed. This is the default.
active or 1
Validation is performed only for the active slave.
backup or 2
Validation is performed only for backup slaves.
all or 3
Validation is performed for all slaves.
For the active slave, the validation checks ARP replies to
confirm that they were generated by an arp_ip_target. Since
backup slaves do not typically receive these replies, the
validation performed for backup slaves is on the ARP request
sent out via the active slave. It is possible that some
switch or network configurations may result in situations
wherein the backup slaves do not receive the ARP requests; in
such a situation, validation of backup slaves must be
disabled.
This option is useful in network configurations in which
multiple bonding hosts are concurrently issuing ARPs to one or
more targets beyond a common switch. Should the link between
the switch and target fail (but not the switch itself), the
probe traffic generated by the multiple bonding instances will
fool the standard ARP monitor into considering the links as
still up. Use of the arp_validate option can resolve this, as
the ARP monitor will only consider ARP requests and replies
associated with its own instance of bonding.
This option was added in bonding version 3.1.0.
downdelay
Specifies the time, in milliseconds, to wait before disabling

View File

@@ -1,7 +1,6 @@
DCCP protocol
============
Last updated: 10 November 2005
Contents
========
@@ -42,8 +41,11 @@ Socket options
DCCP_SOCKOPT_PACKET_SIZE is used for CCID3 to set default packet size for
calculations.
DCCP_SOCKOPT_SERVICE sets the service. This is compulsory as per the
specification. If you don't set it you will get EPROTO.
DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of
service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
the socket will fall back to 0 (which means that no meaningful service code
is present). Connecting sockets set at most one service option; for
listening sockets, multiple service codes can be specified.
Notes
=====

View File

@@ -116,6 +116,9 @@ FURTHER NOTES ON NO-MMU MMAP
(*) A list of all the mappings on the system is visible through /proc/maps in
no-MMU mode.
(*) A list of all the mappings in use by a process is visible through
/proc/<pid>/maps in no-MMU mode.
(*) Supplying MAP_FIXED or a requesting a particular mapping address will
result in an error.
@@ -125,6 +128,49 @@ FURTHER NOTES ON NO-MMU MMAP
error will result if they don't. This is most likely to be encountered
with character device files, pipes, fifos and sockets.
==========================
INTERPROCESS SHARED MEMORY
==========================
Both SYSV IPC SHM shared memory and POSIX shared memory is supported in NOMMU
mode. The former through the usual mechanism, the latter through files created
on ramfs or tmpfs mounts.
=======
FUTEXES
=======
Futexes are supported in NOMMU mode if the arch supports them. An error will
be given if an address passed to the futex system call lies outside the
mappings made by a process or if the mapping in which the address lies does not
support futexes (such as an I/O chardev mapping).
=============
NO-MMU MREMAP
=============
The mremap() function is partially supported. It may change the size of a
mapping, and may move it[*] if MREMAP_MAYMOVE is specified and if the new size
of the mapping exceeds the size of the slab object currently occupied by the
memory to which the mapping refers, or if a smaller slab object could be used.
MREMAP_FIXED is not supported, though it is ignored if there's no change of
address and the object does not need to be moved.
Shared mappings may not be moved. Shareable mappings may not be moved either,
even if they are not currently shared.
The mremap() function must be given an exact match for base address and size of
a previously mapped object. It may not be used to create holes in existing
mappings, move parts of existing mappings or resize parts of mappings. It must
act on a complete mapping.
[*] Not currently supported.
============================================
PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT
============================================

View File

@@ -0,0 +1,253 @@
The PCI Express Advanced Error Reporting Driver Guide HOWTO
T. Long Nguyen <tom.l.nguyen@intel.com>
Yanmin Zhang <yanmin.zhang@intel.com>
07/29/2006
1. Overview
1.1 About this guide
This guide describes the basics of the PCI Express Advanced Error
Reporting (AER) driver and provides information on how to use it, as
well as how to enable the drivers of endpoint devices to conform with
PCI Express AER driver.
1.2 Copyright © Intel Corporation 2006.
1.3 What is the PCI Express AER Driver?
PCI Express error signaling can occur on the PCI Express link itself
or on behalf of transactions initiated on the link. PCI Express
defines two error reporting paradigms: the baseline capability and
the Advanced Error Reporting capability. The baseline capability is
required of all PCI Express components providing a minimum defined
set of error reporting requirements. Advanced Error Reporting
capability is implemented with a PCI Express advanced error reporting
extended capability structure providing more robust error reporting.
The PCI Express AER driver provides the infrastructure to support PCI
Express Advanced Error Reporting capability. The PCI Express AER
driver provides three basic functions:
- Gathers the comprehensive error information if errors occurred.
- Reports error to the users.
- Performs error recovery actions.
AER driver only attaches root ports which support PCI-Express AER
capability.
2. User Guide
2.1 Include the PCI Express AER Root Driver into the Linux Kernel
The PCI Express AER Root driver is a Root Port service driver attached
to the PCI Express Port Bus driver. If a user wants to use it, the driver
has to be compiled. Option CONFIG_PCIEAER supports this capability. It
depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and
CONFIG_PCIEAER = y.
2.2 Load PCI Express AER Root Driver
There is a case where a system has AER support in BIOS. Enabling the AER
Root driver and having AER support in BIOS may result unpredictable
behavior. To avoid this conflict, a successful load of the AER Root driver
requires ACPI _OSC support in the BIOS to allow the AER Root driver to
request for native control of AER. See the PCI FW 3.0 Specification for
details regarding OSC usage. Currently, lots of firmwares don't provide
_OSC support while they use PCI Express. To support such firmwares,
forceload, a parameter of type bool, could enable AER to continue to
be initiated although firmwares have no _OSC support. To enable the
walkaround, pls. add aerdriver.forceload=y to kernel boot parameter line
when booting kernel. Note that forceload=n by default.
2.3 AER error output
When a PCI-E AER error is captured, an error message will be outputed to
console. If it's a correctable error, it is outputed as a warning.
Otherwise, it is printed as an error. So users could choose different
log level to filter out correctable error messages.
Below shows an example.
+------ PCI-Express Device Error -----+
Error Severity : Uncorrected (Fatal)
PCIE Bus Error type : Transaction Layer
Unsupported Request : First
Requester ID : 0500
VendorID=8086h, DeviceID=0329h, Bus=05h, Device=00h, Function=00h
TLB Header:
04000001 00200a03 05010000 00050100
In the example, 'Requester ID' means the ID of the device who sends
the error message to root port. Pls. refer to pci express specs for
other fields.
3. Developer Guide
To enable AER aware support requires a software driver to configure
the AER capability structure within its device and to provide callbacks.
To support AER better, developers need understand how AER does work
firstly.
PCI Express errors are classified into two types: correctable errors
and uncorrectable errors. This classification is based on the impacts
of those errors, which may result in degraded performance or function
failure.
Correctable errors pose no impacts on the functionality of the
interface. The PCI Express protocol can recover without any software
intervention or any loss of data. These errors are detected and
corrected by hardware. Unlike correctable errors, uncorrectable
errors impact functionality of the interface. Uncorrectable errors
can cause a particular transaction or a particular PCI Express link
to be unreliable. Depending on those error conditions, uncorrectable
errors are further classified into non-fatal errors and fatal errors.
Non-fatal errors cause the particular transaction to be unreliable,
but the PCI Express link itself is fully functional. Fatal errors, on
the other hand, cause the link to be unreliable.
When AER is enabled, a PCI Express device will automatically send an
error message to the PCIE root port above it when the device captures
an error. The Root Port, upon receiving an error reporting message,
internally processes and logs the error message in its PCI Express
capability structure. Error information being logged includes storing
the error reporting agent's requestor ID into the Error Source
Identification Registers and setting the error bits of the Root Error
Status Register accordingly. If AER error reporting is enabled in Root
Error Command Register, the Root Port generates an interrupt if an
error is detected.
Note that the errors as described above are related to the PCI Express
hierarchy and links. These errors do not include any device specific
errors because device specific errors will still get sent directly to
the device driver.
3.1 Configure the AER capability structure
AER aware drivers of PCI Express component need change the device
control registers to enable AER. They also could change AER registers,
including mask and severity registers. Helper function
pci_enable_pcie_error_reporting could be used to enable AER. See
section 3.3.
3.2. Provide callbacks
3.2.1 callback reset_link to reset pci express link
This callback is used to reset the pci express physical link when a
fatal error happens. The root port aer service driver provides a
default reset_link function, but different upstream ports might
have different specifications to reset pci express link, so all
upstream ports should provide their own reset_link functions.
In struct pcie_port_service_driver, a new pointer, reset_link, is
added.
pci_ers_result_t (*reset_link) (struct pci_dev *dev);
Section 3.2.2.2 provides more detailed info on when to call
reset_link.
3.2.2 PCI error-recovery callbacks
The PCI Express AER Root driver uses error callbacks to coordinate
with downstream device drivers associated with a hierarchy in question
when performing error recovery actions.
Data struct pci_driver has a pointer, err_handler, to point to
pci_error_handlers who consists of a couple of callback function
pointers. AER driver follows the rules defined in
pci-error-recovery.txt except pci express specific parts (e.g.
reset_link). Pls. refer to pci-error-recovery.txt for detailed
definitions of the callbacks.
Below sections specify when to call the error callback functions.
3.2.2.1 Correctable errors
Correctable errors pose no impacts on the functionality of
the interface. The PCI Express protocol can recover without any
software intervention or any loss of data. These errors do not
require any recovery actions. The AER driver clears the device's
correctable error status register accordingly and logs these errors.
3.2.2.2 Non-correctable (non-fatal and fatal) errors
If an error message indicates a non-fatal error, performing link reset
at upstream is not required. The AER driver calls error_detected(dev,
pci_channel_io_normal) to all drivers associated within a hierarchy in
question. for example,
EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort.
If Upstream port A captures an AER error, the hierarchy consists of
Downstream port B and EndPoint.
A driver may return PCI_ERS_RESULT_CAN_RECOVER,
PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_NEED_RESET, depending on
whether it can recover or the AER driver calls mmio_enabled as next.
If an error message indicates a fatal error, kernel will broadcast
error_detected(dev, pci_channel_io_frozen) to all drivers within
a hierarchy in question. Then, performing link reset at upstream is
necessary. As different kinds of devices might use different approaches
to reset link, AER port service driver is required to provide the
function to reset link. Firstly, kernel looks for if the upstream
component has an aer driver. If it has, kernel uses the reset_link
callback of the aer driver. If the upstream component has no aer driver
and the port is downstream port, we will use the aer driver of the
root port who reports the AER error. As for upstream ports,
they should provide their own aer service drivers with reset_link
function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
to mmio_enabled.
3.3 helper functions
3.3.1 int pci_find_aer_capability(struct pci_dev *dev);
pci_find_aer_capability locates the PCI Express AER capability
in the device configuration space. If the device doesn't support
PCI-Express AER, the function returns 0.
3.3.2 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
pci_enable_pcie_error_reporting enables the device to send error
messages to root port when an error is detected. Note that devices
don't enable the error reporting by default, so device drivers need
call this function to enable it.
3.3.3 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
pci_disable_pcie_error_reporting disables the device to send error
messages to root port when an error is detected.
3.3.4 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable
error status register.
3.4 Frequent Asked Questions
Q: What happens if a PCI Express device driver does not provide an
error recovery handler (pci_driver->err_handler is equal to NULL)?
A: The devices attached with the driver won't be recovered. If the
error is fatal, kernel will print out warning messages. Please refer
to section 3 for more information.
Q: What happens if an upstream port service driver does not provide
callback reset_link?
A: Fatal error recovery will fail if the errors are reported by the
upstream ports who are attached by the service driver.
Q: How does this infrastructure deal with driver that is not PCI
Express aware?
A: This infrastructure calls the error callback functions of the
driver when an error happens. But if the driver is not aware of
PCI Express, the device might not report its own errors to root
port.
Q: What modifications will that driver need to make it compatible
with the PCI Express AER Root driver?
A: It could call the helper functions to enable AER in devices and
cleanup uncorrectable status register. Pls. refer to section 3.3.

File diff suppressed because it is too large Load Diff

View File

@@ -52,3 +52,18 @@ suspend image will be as small as possible.
Reading from this file will display the current image size limit, which
is set to 500 MB by default.
/sys/power/pm_trace controls the code which saves the last PM event point in
the RTC across reboots, so that you can debug a machine that just hangs
during suspend (or more commonly, during resume). Namely, the RTC is only
used to save the last PM event point if this file contains '1'. Initially it
contains '0' which may be changed to '1' by writing a string representing a
nonzero integer into it.
To use this debugging feature you should attempt to suspend the machine, then
reboot it and run
dmesg -s 1000000 | grep 'hash matches'
CAUTION: Using it will cause your machine's real-time (CMOS) clock to be
set to a random invalid time after a resume.

View File

@@ -41,11 +41,6 @@ Board-specific code:
|
.. more boards here ...
It should also be noted that each board is required to have some certain
headers. At the time of this writing, io.h is the only thing that needs
to be provided for each board, and can generally just reference generic
functions (with the exception of isa_port2addr).
Next, for companion chips:
.
`-- arch
@@ -104,12 +99,13 @@ and then populate that with sub-directories for each member of the family.
Both the Solution Engine and the hp6xx boards are an example of this.
After you have setup your new arch/sh/boards/ directory, remember that you
also must add a directory in include/asm-sh for headers localized to this
board. In order to interoperate seamlessly with the build system, it's best
to have this directory the same as the arch/sh/boards/ directory name,
though if your board is again part of a family, the build system has ways
of dealing with this, and you can feel free to name the directory after
the family member itself.
should also add a directory in include/asm-sh for headers localized to this
board (if there are going to be more than one). In order to interoperate
seamlessly with the build system, it's best to have this directory the same
as the arch/sh/boards/ directory name, though if your board is again part of
a family, the build system has ways of dealing with this (via incdir-y
overloading), and you can feel free to name the directory after the family
member itself.
There are a few things that each board is required to have, both in the
arch/sh/boards and the include/asm-sh/ heirarchy. In order to better
@@ -122,6 +118,7 @@ might look something like:
* arch/sh/boards/vapor/setup.c - Setup code for imaginary board
*/
#include <linux/init.h>
#include <asm/rtc.h> /* for board_time_init() */
const char *get_system_type(void)
{
@@ -152,79 +149,57 @@ int __init platform_setup(void)
}
Our new imaginary board will also have to tie into the machvec in order for it
to be of any use. Currently the machvec is slowly on its way out, but is still
required for the time being. As such, let us take a look at what needs to be
done for the machvec assignment.
to be of any use.
machvec functions fall into a number of categories:
- I/O functions to IO memory (inb etc) and PCI/main memory (readb etc).
- I/O remapping functions (ioremap etc)
- some initialisation functions
- a 'heartbeat' function
- some miscellaneous flags
- I/O mapping functions (ioport_map, ioport_unmap, etc).
- a 'heartbeat' function.
- PCI and IRQ initialization routines.
- Consistent allocators (for boards that need special allocators,
particularly for allocating out of some board-specific SRAM for DMA
handles).
The tree can be built in two ways:
- as a fully generic build. All drivers are linked in, and all functions
go through the machvec
- as a machine specific build. In this case only the required drivers
will be linked in, and some macros may be redefined to not go through
the machvec where performance is important (in particular IO functions).
There are machvec functions added and removed over time, so always be sure to
consult include/asm-sh/machvec.h for the current state of the machvec.
There are three ways in which IO can be performed:
- none at all. This is really only useful for the 'unknown' machine type,
which us designed to run on a machine about which we know nothing, and
so all all IO instructions do nothing.
- fully custom. In this case all IO functions go to a machine specific
set of functions which can do what they like
- a generic set of functions. These will cope with most situations,
and rely on a single function, mv_port2addr, which is called through the
machine vector, and converts an IO address into a memory address, which
can be read from/written to directly.
The kernel will automatically wrap in generic routines for undefined function
pointers in the machvec at boot time, as machvec functions are referenced
unconditionally throughout most of the tree. Some boards have incredibly
sparse machvecs (such as the dreamcast and sh03), whereas others must define
virtually everything (rts7751r2d).
Thus adding a new machine involves the following steps (I will assume I am
adding a machine called vapor):
Adding a new machine is relatively trivial (using vapor as an example):
- add a new file include/asm-sh/vapor/io.h which contains prototypes for
If the board-specific definitions are quite minimalistic, as is the case for
the vast majority of boards, simply having a single board-specific header is
sufficient.
- add a new file include/asm-sh/vapor.h which contains prototypes for
any machine specific IO functions prefixed with the machine name, for
example vapor_inb. These will be needed when filling out the machine
vector.
This is the minimum that is required, however there are ample
opportunities to optimise this. In particular, by making the prototypes
inline function definitions, it is possible to inline the function when
building machine specific versions. Note that the machine vector
functions will still be needed, so that a module built for a generic
setup can be loaded.
Note that these prototypes are generated automatically by setting
__IO_PREFIX to something sensible. A typical example would be:
- add a new file arch/sh/boards/vapor/mach.c. This contains the definition
of the machine vector. When building the machine specific version, this
will be the real machine vector (via an alias), while in the generic
version is used to initialise the machine vector, and then freed, by
making it initdata. This should be defined as:
#define __IO_PREFIX vapor
#include <asm/io_generic.h>
struct sh_machine_vector mv_vapor __initmv = {
.mv_name = "vapor",
}
ALIAS_MV(vapor)
somewhere in the board-specific header. Any boards being ported that still
have a legacy io.h should remove it entirely and switch to the new model.
- finally add a file arch/sh/boards/vapor/io.c, which contains
definitions of the machine specific io functions.
- Add machine vector definitions to the board's setup.c. At a bare minimum,
this must be defined as something like:
A note about initialisation functions. Three initialisation functions are
provided in the machine vector:
- mv_arch_init - called very early on from setup_arch
- mv_init_irq - called from init_IRQ, after the generic SH interrupt
initialisation
- mv_init_pci - currently not used
struct sh_machine_vector mv_vapor __initmv = {
.mv_name = "vapor",
};
ALIAS_MV(vapor)
Any other remaining functions which need to be called at start up can be
added to the list using the __initcalls macro (or module_init if the code
can be built as a module). Many generic drivers probe to see if the device
they are targeting is present, however this may not always be appropriate,
so a flag can be added to the machine vector which will be set on those
machines which have the hardware in question, reducing the probe to a
single conditional.
- finally add a file arch/sh/boards/vapor/io.c, which contains definitions of
the machine specific io functions (if there are enough to warrant it).
3. Hooking into the Build System
================================
@@ -303,4 +278,3 @@ which will in turn copy the defconfig for this board, run it through
oldconfig (prompting you for any new options since the time of creation),
and start you on your way to having a functional kernel for your new
board.

View File

@@ -0,0 +1,33 @@
Notes on register bank usage in the kernel
==========================================
Introduction
------------
The SH-3 and SH-4 CPU families traditionally include a single partial register
bank (selected by SR.RB, only r0 ... r7 are banked), whereas other families
may have more full-featured banking or simply no such capabilities at all.
SR.RB banking
-------------
In the case of this type of banking, banked registers are mapped directly to
r0 ... r7 if SR.RB is set to the bank we are interested in, otherwise ldc/stc
can still be used to reference the banked registers (as r0_bank ... r7_bank)
when in the context of another bank. The developer must keep the SR.RB value
in mind when writing code that utilizes these banked registers, for obvious
reasons. Userspace is also not able to poke at the bank1 values, so these can
be used rather effectively as scratch registers by the kernel.
Presently the kernel uses several of these registers.
- r0_bank, r1_bank (referenced as k0 and k1, used for scratch
registers when doing exception handling).
- r2_bank (used to track the EXPEVT/INTEVT code)
- Used by do_IRQ() and friends for doing irq mapping based off
of the interrupt exception vector jump table offset
- r6_bank (global interrupt mask)
- The SR.IMASK interrupt handler makes use of this to set the
interrupt priority level (used by local_irq_enable())
- r7_bank (current)

View File

@@ -29,6 +29,7 @@ Currently, these files are in /proc/sys/vm:
- drop-caches
- zone_reclaim_mode
- min_unmapped_ratio
- min_slab_ratio
- panic_on_oom
==============================================================
@@ -138,7 +139,6 @@ This is value ORed together of
1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages
8 = Also do a global slab reclaim pass
zone_reclaim_mode is set during bootup to 1 if it is determined that pages
from remote zones will cause a measurable performance reduction. The
@@ -162,18 +162,13 @@ Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset
configurations.
It may be advisable to allow slab reclaim if the system makes heavy
use of files and builds up large slab caches. However, the slab
shrink operation is global, may take a long time and free slabs
in all nodes of the system.
=============================================================
min_unmapped_ratio:
This is available only on NUMA kernels.
A percentage of the file backed pages in each zone. Zone reclaim will only
A percentage of the total pages in each zone. Zone reclaim will only
occur if more than this percentage of pages are file backed and unmapped.
This is to insure that a minimal amount of local pages is still available for
file I/O even if the node is overallocated.
@@ -182,6 +177,24 @@ The default is 1 percent.
=============================================================
min_slab_ratio:
This is available only on NUMA kernels.
A percentage of the total pages in each zone. On Zone reclaim
(fallback from the local zone occurs) slabs will be reclaimed if more
than this percentage of pages in a zone are reclaimable slab pages.
This insures that the slab growth stays under control even in NUMA
systems that rarely perform global reclaim.
The default is 5 percent.
Note that slab reclaim is triggered in a per zone / node fashion.
The process of reclaiming slab memory is currently not node specific
and may not be fast.
=============================================================
panic_on_oom
This enables or disables panic on out-of-memory feature. If this is set to 1,

Some files were not shown because too many files have changed in this diff Show More