Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
	drivers/net/pcmcia/pcnet_cs.c
	net/caif/caif_socket.c
This commit is contained in:
David S. Miller
2010-10-06 19:39:31 -07:00
730 changed files with 7837 additions and 4315 deletions
+4 -4
View File
@@ -3554,12 +3554,12 @@ E: cvance@nai.com
D: portions of the Linux Security Module (LSM) framework and security modules
N: Petr Vandrovec
E: vandrove@vc.cvut.cz
E: petr@vandrovec.name
D: Small contributions to ncpfs
D: Matrox framebuffer driver
S: Chudenicka 8
S: 10200 Prague 10, Hostivar
S: Czech Republic
S: 21513 Conradia Ct
S: Cupertino, CA 95014
S: USA
N: Thibaut Varene
E: T-Bone@parisc-linux.org
@@ -46,7 +46,6 @@
<sect1><title>Atomic and pointer manipulation</title>
!Iarch/x86/include/asm/atomic.h
!Iarch/x86/include/asm/unaligned.h
</sect1>
<sect1><title>Delaying, scheduling, and timer routines</title>
-1
View File
@@ -57,7 +57,6 @@
</para>
<sect1><title>String Conversions</title>
!Ilib/vsprintf.c
!Elib/vsprintf.c
</sect1>
<sect1><title>String Manipulation</title>
@@ -1961,6 +1961,12 @@ machines due to caching.
</sect1>
</chapter>
<chapter id="apiref">
<title>Mutex API reference</title>
!Iinclude/linux/mutex.h
!Ekernel/mutex.c
</chapter>
<chapter id="references">
<title>Further reading</title>
+45
View File
@@ -0,0 +1,45 @@
CFQ ioscheduler tunables
========================
slice_idle
----------
This specifies how long CFQ should idle for next request on certain cfq queues
(for sequential workloads) and service trees (for random workloads) before
queue is expired and CFQ selects next queue to dispatch from.
By default slice_idle is a non-zero value. That means by default we idle on
queues/service trees. This can be very helpful on highly seeky media like
single spindle SATA/SAS disks where we can cut down on overall number of
seeks and see improved throughput.
Setting slice_idle to 0 will remove all the idling on queues/service tree
level and one should see an overall improved throughput on faster storage
devices like multiple SATA/SAS disks in hardware RAID configuration. The down
side is that isolation provided from WRITES also goes down and notion of
IO priority becomes weaker.
So depending on storage and workload, it might be useful to set slice_idle=0.
In general I think for SATA/SAS disks and software RAID of SATA/SAS disks
keeping slice_idle enabled should be useful. For any configurations where
there are multiple spindles behind single LUN (Host based hardware RAID
controller or for storage arrays), setting slice_idle=0 might end up in better
throughput and acceptable latencies.
CFQ IOPS Mode for group scheduling
===================================
Basic CFQ design is to provide priority based time slices. Higher priority
process gets bigger time slice and lower priority process gets smaller time
slice. Measuring time becomes harder if storage is fast and supports NCQ and
it would be better to dispatch multiple requests from multiple cfq queues in
request queue at a time. In such scenario, it is not possible to measure time
consumed by single queue accurately.
What is possible though is to measure number of requests dispatched from a
single queue and also allow dispatch from multiple cfq queue at the same time.
This effectively becomes the fairness in terms of IOPS (IO operations per
second).
If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches
to IOPS mode and starts providing fairness in terms of number of requests
dispatched. Note that this mode switching takes effect only for group
scheduling. For non-cgroup users nothing should change.
@@ -217,6 +217,7 @@ Details of cgroup files
CFQ sysfs tunable
=================
/sys/block/<disk>/queue/iosched/group_isolation
-----------------------------------------------
If group_isolation=1, it provides stronger isolation between groups at the
expense of throughput. By default group_isolation is 0. In general that
@@ -243,6 +244,33 @@ By default one should run with group_isolation=0. If that is not sufficient
and one wants stronger isolation between groups, then set group_isolation=1
but this will come at cost of reduced throughput.
/sys/block/<disk>/queue/iosched/slice_idle
------------------------------------------
On a faster hardware CFQ can be slow, especially with sequential workload.
This happens because CFQ idles on a single queue and single queue might not
drive deeper request queue depths to keep the storage busy. In such scenarios
one can try setting slice_idle=0 and that would switch CFQ to IOPS
(IO operations per second) mode on NCQ supporting hardware.
That means CFQ will not idle between cfq queues of a cfq group and hence be
able to driver higher queue depth and achieve better throughput. That also
means that cfq provides fairness among groups in terms of IOPS and not in
terms of disk time.
/sys/block/<disk>/queue/iosched/group_idle
------------------------------------------
If one disables idling on individual cfq queues and cfq service trees by
setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
on the group in an attempt to provide fairness among groups.
By default group_idle is same as slice_idle and does not do anything if
slice_idle is enabled.
One can experience an overall throughput drop if you have created multiple
groups and put applications in that group which are not driving enough
IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
on individual groups and throughput should improve.
What works
==========
- Currently only sync IO queues are support. All the buffered writes are
+14 -8
View File
@@ -109,17 +109,19 @@ use numbers 2000-2063 to identify GPIOs in a bank of I2C GPIO expanders.
If you want to initialize a structure with an invalid GPIO number, use
some negative number (perhaps "-EINVAL"); that will never be valid. To
test if a number could reference a GPIO, you may use this predicate:
test if such number from such a structure could reference a GPIO, you
may use this predicate:
int gpio_is_valid(int number);
A number that's not valid will be rejected by calls which may request
or free GPIOs (see below). Other numbers may also be rejected; for
example, a number might be valid but unused on a given board.
Whether a platform supports multiple GPIO controllers is currently a
platform-specific implementation issue.
example, a number might be valid but temporarily unused on a given board.
Whether a platform supports multiple GPIO controllers is a platform-specific
implementation issue, as are whether that support can leave "holes" in the space
of GPIO numbers, and whether new controllers can be added at runtime. Such issues
can affect things including whether adjacent GPIO numbers are both valid.
Using GPIOs
-----------
@@ -480,12 +482,16 @@ To support this framework, a platform's Kconfig will "select" either
ARCH_REQUIRE_GPIOLIB or ARCH_WANT_OPTIONAL_GPIOLIB
and arrange that its <asm/gpio.h> includes <asm-generic/gpio.h> and defines
three functions: gpio_get_value(), gpio_set_value(), and gpio_cansleep().
They may also want to provide a custom value for ARCH_NR_GPIOS.
ARCH_REQUIRE_GPIOLIB means that the gpio-lib code will always get compiled
It may also provide a custom value for ARCH_NR_GPIOS, so that it better
reflects the number of GPIOs in actual use on that platform, without
wasting static table space. (It should count both built-in/SoC GPIOs and
also ones on GPIO expanders.
ARCH_REQUIRE_GPIOLIB means that the gpiolib code will always get compiled
into the kernel on that architecture.
ARCH_WANT_OPTIONAL_GPIOLIB means the gpio-lib code defaults to off and the user
ARCH_WANT_OPTIONAL_GPIOLIB means the gpiolib code defaults to off and the user
can enable it and build it into the kernel optionally.
If neither of these options are selected, the platform does not support
+3 -4
View File
@@ -91,12 +91,11 @@ name The chip name.
I2C devices get this attribute created automatically.
RO
update_rate The rate at which the chip will update readings.
update_interval The interval at which the chip will update readings.
Unit: millisecond
RW
Some devices have a variable update rate. This attribute
can be used to change the update rate to the desired
frequency.
Some devices have a variable update rate or interval.
This attribute can be used to change it to the desired value.
************
+5
View File
@@ -345,5 +345,10 @@ documentation, in <filename>, for the functions listed.
section titled <section title> from <filename>.
Spaces are allowed in <section title>; do not quote the <section title>.
!C<filename> is replaced by nothing, but makes the tools check that
all DOC: sections and documented functions, symbols, etc. are used.
This makes sense to use when you use !F/!P only and want to verify
that all documentation is included.
Tim.
*/ <twaugh@redhat.com>
+2 -1
View File
@@ -9,7 +9,7 @@ firstly, there's nothing wrong with semaphores. But if the simpler
mutex semantics are sufficient for your code, then there are a couple
of advantages of mutexes:
- 'struct mutex' is smaller on most architectures: .e.g on x86,
- 'struct mutex' is smaller on most architectures: E.g. on x86,
'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
A smaller structure size means less RAM footprint, and better
CPU-cache utilization.
@@ -136,3 +136,4 @@ the APIs of 'struct mutex' have been streamlined:
void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
int mutex_lock_interruptible_nested(struct mutex *lock,
unsigned int subclass);
int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
File diff suppressed because it is too large Load Diff
+302
View File
@@ -0,0 +1,302 @@
Linux* Driver for Intel(R) Network Connection
===============================================================
Intel Gigabit Linux driver.
Copyright(c) 1999 - 2010 Intel Corporation.
Contents
========
- Identifying Your Adapter
- Command Line Parameters
- Additional Configurations
- Support
Identifying Your Adapter
========================
The e1000e driver supports all PCI Express Intel(R) Gigabit Network
Connections, except those that are 82575, 82576 and 82580-based*.
* NOTE: The Intel(R) PRO/1000 P Dual Port Server Adapter is supported by
the e1000 driver, not the e1000e driver due to the 82546 part being used
behind a PCI Express bridge.
For more information on how to identify your adapter, go to the Adapter &
Driver ID Guide at:
http://support.intel.com/support/go/network/adapter/idguide.htm
For the latest Intel network drivers for Linux, refer to the following
website. In the search field, enter your adapter name or type, or use the
networking link on the left to search for your adapter:
http://support.intel.com/support/go/network/adapter/home.htm
Command Line Parameters
=======================
The default value for each parameter is generally the recommended setting,
unless otherwise noted.
NOTES: For more information about the InterruptThrottleRate,
RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay
parameters, see the application note at:
http://www.intel.com/design/network/applnots/ap450.htm
InterruptThrottleRate
---------------------
Valid Range: 0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative,
4=simplified balancing)
Default Value: 3
The driver can limit the amount of interrupts per second that the adapter
will generate for incoming packets. It does this by writing a value to the
adapter that is based on the maximum amount of interrupts that the adapter
will generate per second.
Setting InterruptThrottleRate to a value greater or equal to 100
will program the adapter to send out a maximum of that many interrupts
per second, even if more packets have come in. This reduces interrupt
load on the system and can lower CPU utilization under heavy load,
but will increase latency as packets are not processed as quickly.
The driver has two adaptive modes (setting 1 or 3) in which
it dynamically adjusts the InterruptThrottleRate value based on the traffic
that it receives. After determining the type of incoming traffic in the last
timeframe, it will adjust the InterruptThrottleRate to an appropriate value
for that traffic.
The algorithm classifies the incoming traffic every interval into
classes. Once the class is determined, the InterruptThrottleRate value is
adjusted to suit that traffic type the best. There are three classes defined:
"Bulk traffic", for large amounts of packets of normal size; "Low latency",
for small amounts of traffic and/or a significant percentage of small
packets; and "Lowest latency", for almost completely small packets or
minimal traffic.
In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
latency" or "Lowest latency" class, the InterruptThrottleRate is increased
stepwise to 20000. This default mode is suitable for most applications.
For situations where low latency is vital such as cluster or
grid computing, the algorithm can reduce latency even more when
InterruptThrottleRate is set to mode 1. In this mode, which operates
the same as mode 3, the InterruptThrottleRate will be increased stepwise to
70000 for traffic in class "Lowest latency".
In simplified mode the interrupt rate is based on the ratio of Tx and
Rx traffic. If the bytes per second rate is approximately equal the
interrupt rate will drop as low as 2000 interrupts per second. If the
traffic is mostly transmit or mostly receive, the interrupt rate could
be as high as 8000.
Setting InterruptThrottleRate to 0 turns off any interrupt moderation
and may improve small packet latency, but is generally not suitable
for bulk throughput traffic.
NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and
RxAbsIntDelay parameters. In other words, minimizing the receive
and/or transmit absolute delays does not force the controller to
generate more interrupts than what the Interrupt Throttle Rate
allows.
NOTE: When e1000e is loaded with default settings and multiple adapters
are in use simultaneously, the CPU utilization may increase non-
linearly. In order to limit the CPU utilization without impacting
the overall throughput, we recommend that you load the driver as
follows:
modprobe e1000e InterruptThrottleRate=3000,3000,3000
This sets the InterruptThrottleRate to 3000 interrupts/sec for
the first, second, and third instances of the driver. The range
of 2000 to 3000 interrupts per second works on a majority of
systems and is a good starting point, but the optimal value will
be platform-specific. If CPU utilization is not a concern, use
RX_POLLING (NAPI) and default driver settings.
RxIntDelay
----------
Valid Range: 0-65535 (0=off)
Default Value: 0
This value delays the generation of receive interrupts in units of 1.024
microseconds. Receive interrupt reduction can improve CPU efficiency if
properly tuned for specific network traffic. Increasing this value adds
extra latency to frame reception and can end up decreasing the throughput
of TCP traffic. If the system is reporting dropped receives, this value
may be set too high, causing the driver to run out of available receive
descriptors.
CAUTION: When setting RxIntDelay to a value other than 0, adapters may
hang (stop transmitting) under certain network conditions. If
this occurs a NETDEV WATCHDOG message is logged in the system
event log. In addition, the controller is automatically reset,
restoring the network connection. To eliminate the potential
for the hang ensure that RxIntDelay is set to 0.
RxAbsIntDelay
-------------
Valid Range: 0-65535 (0=off)
Default Value: 8
This value, in units of 1.024 microseconds, limits the delay in which a
receive interrupt is generated. Useful only if RxIntDelay is non-zero,
this value ensures that an interrupt is generated after the initial
packet is received within the set amount of time. Proper tuning,
along with RxIntDelay, may improve traffic throughput in specific network
conditions.
TxIntDelay
----------
Valid Range: 0-65535 (0=off)
Default Value: 8
This value delays the generation of transmit interrupts in units of
1.024 microseconds. Transmit interrupt reduction can improve CPU
efficiency if properly tuned for specific network traffic. If the
system is reporting dropped transmits, this value may be set too high
causing the driver to run out of available transmit descriptors.
TxAbsIntDelay
-------------
Valid Range: 0-65535 (0=off)
Default Value: 32
This value, in units of 1.024 microseconds, limits the delay in which a
transmit interrupt is generated. Useful only if TxIntDelay is non-zero,
this value ensures that an interrupt is generated after the initial
packet is sent on the wire within the set amount of time. Proper tuning,
along with TxIntDelay, may improve traffic throughput in specific
network conditions.
Copybreak
---------
Valid Range: 0-xxxxxxx (0=off)
Default Value: 256
Driver copies all packets below or equaling this size to a fresh Rx
buffer before handing it up the stack.
This parameter is different than other parameters, in that it is a
single (not 1,1,1 etc.) parameter applied to all driver instances and
it is also available during runtime at
/sys/module/e1000e/parameters/copybreak
SmartPowerDownEnable
--------------------
Valid Range: 0-1
Default Value: 0 (disabled)
Allows PHY to turn off in lower power states. The user can set this parameter
in supported chipsets.
KumeranLockLoss
---------------
Valid Range: 0-1
Default Value: 1 (enabled)
This workaround skips resetting the PHY at shutdown for the initial
silicon releases of ICH8 systems.
IntMode
-------
Valid Range: 0-2 (0=legacy, 1=MSI, 2=MSI-X)
Default Value: 2
Allows changing the interrupt mode at module load time, without requiring a
recompile. If the driver load fails to enable a specific interrupt mode, the
driver will try other interrupt modes, from least to most compatible. The
interrupt order is MSI-X, MSI, Legacy. If specifying MSI (IntMode=1)
interrupts, only MSI and Legacy will be attempted.
CrcStripping
------------
Valid Range: 0-1
Default Value: 1 (enabled)
Strip the CRC from received packets before sending up the network stack. If
you have a machine with a BMC enabled but cannot receive IPMI traffic after
loading or enabling the driver, try disabling this feature.
WriteProtectNVM
---------------
Valid Range: 0-1
Default Value: 1 (enabled)
Set the hardware to ignore all write/erase cycles to the GbE region in the
ICHx NVM (non-volatile memory). This feature can be disabled by the
WriteProtectNVM module parameter (enabled by default) only after a hardware
reset, but the machine must be power cycled before trying to enable writes.
Note: the kernel boot option iomem=relaxed may need to be set if the kernel
config option CONFIG_STRICT_DEVMEM=y, if the root user wants to write the
NVM from user space via ethtool.
Additional Configurations
=========================
Jumbo Frames
------------
Jumbo Frames support is enabled by changing the MTU to a value larger than
the default of 1500. Use the ifconfig command to increase the MTU size.
For example:
ifconfig eth<x> mtu 9000 up
This setting is not saved across reboots.
Notes:
- The maximum MTU setting for Jumbo Frames is 9216. This value coincides
with the maximum Jumbo Frames size of 9234 bytes.
- Using Jumbo Frames at 10 or 100 Mbps is not supported and may result in
poor performance or loss of link.
- Some adapters limit Jumbo Frames sized packets to a maximum of
4096 bytes and some adapters do not support Jumbo Frames.
Ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. We
strongly recommend downloading the latest version of Ethtool at:
http://sourceforge.net/projects/gkernel.
Speed and Duplex
----------------
Speed and Duplex are configured through the Ethtool* utility. For
instructions, refer to the Ethtool man page.
Enabling Wake on LAN* (WoL)
---------------------------
WoL is configured through the Ethtool* utility. For instructions on
enabling WoL with Ethtool, refer to the Ethtool man page.
WoL will be enabled on the system during the next shut down or reboot.
For this driver version, in order to enable WoL, the e1000e driver must be
loaded when shutting down or rebooting the system.
In most cases Wake On LAN is only supported on port A for multiple port
adapters. To verify if a port supports Wake on LAN run ethtool eth<X>.
Support
=======
For general information, go to the Intel support website at:
www.intel.com/support/
or the Intel Wired Networking project hosted by Sourceforge at:
http://sourceforge.net/projects/e1000
If an issue is identified with the released source code on the supported
kernel with a supported adapter, email the specific information related
to the issue to e1000-devel@lists.sf.net
+3 -37
View File
@@ -1,19 +1,16 @@
Linux* Base Driver for Intel(R) Network Connection
==================================================
November 24, 2009
Intel Gigabit Linux driver.
Copyright(c) 1999 - 2010 Intel Corporation.
Contents
========
- In This Release
- Identifying Your Adapter
- Known Issues/Troubleshooting
- Support
In This Release
===============
This file describes the ixgbevf Linux* Base Driver for Intel Network
Connection.
@@ -33,7 +30,7 @@ Identifying Your Adapter
For more information on how to identify your adapter, go to the Adapter &
Driver ID Guide at:
http://support.intel.com/support/network/sb/CS-008441.htm
http://support.intel.com/support/go/network/adapter/idguide.htm
Known Issues/Troubleshooting
============================
@@ -57,34 +54,3 @@ or the Intel Wired Networking project hosted by Sourceforge at:
If an issue is identified with the released source code on the supported
kernel with a supported adapter, email the specific information related
to the issue to e1000-devel@lists.sf.net
License
=======
Intel 10 Gigabit Linux driver.
Copyright(c) 1999 - 2009 Intel Corporation.
This program is free software; you can redistribute it and/or modify it
under the terms and conditions of the GNU General Public License,
version 2, as published by the Free Software Foundation.
This program is distributed in the hope it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution in
the file called "COPYING".
Trademarks
==========
Intel, Itanium, and Pentium are trademarks or registered trademarks of
Intel Corporation or its subsidiaries in the United States and other
countries.
* Other names and brands may be claimed as the property of others.
+1 -1
View File
@@ -13,7 +13,7 @@ regulators (where voltage output is controllable) and current sinks (where
current limit is controllable).
(C) 2008 Wolfson Microelectronics PLC.
Author: Liam Girdwood <lg@opensource.wolfsonmicro.com>
Author: Liam Girdwood <lrg@slimlogic.co.uk>
Nomenclature
@@ -296,6 +296,7 @@ Conexant 5051
Conexant 5066
=============
laptop Basic Laptop config (default)
hp-laptop HP laptops, e g G60
dell-laptop Dell laptops
dell-vostro Dell Vostro
olpc-xo-1_5 OLPC XO 1.5
+380
View File
@@ -0,0 +1,380 @@
Concurrency Managed Workqueue (cmwq)
September, 2010 Tejun Heo <tj@kernel.org>
Florian Mickler <florian@mickler.org>
CONTENTS
1. Introduction
2. Why cmwq?
3. The Design
4. Application Programming Interface (API)
5. Example Execution Scenarios
6. Guidelines
1. Introduction
There are many cases where an asynchronous process execution context
is needed and the workqueue (wq) API is the most commonly used
mechanism for such cases.
When such an asynchronous execution context is needed, a work item
describing which function to execute is put on a queue. An
independent thread serves as the asynchronous execution context. The
queue is called workqueue and the thread is called worker.
While there are work items on the workqueue the worker executes the
functions associated with the work items one after the other. When
there is no work item left on the workqueue the worker becomes idle.
When a new work item gets queued, the worker begins executing again.
2. Why cmwq?
In the original wq implementation, a multi threaded (MT) wq had one
worker thread per CPU and a single threaded (ST) wq had one worker
thread system-wide. A single MT wq needed to keep around the same
number of workers as the number of CPUs. The kernel grew a lot of MT
wq users over the years and with the number of CPU cores continuously
rising, some systems saturated the default 32k PID space just booting
up.
Although MT wq wasted a lot of resource, the level of concurrency
provided was unsatisfactory. The limitation was common to both ST and
MT wq albeit less severe on MT. Each wq maintained its own separate
worker pool. A MT wq could provide only one execution context per CPU
while a ST wq one for the whole system. Work items had to compete for
those very limited execution contexts leading to various problems
including proneness to deadlocks around the single execution context.
The tension between the provided level of concurrency and resource
usage also forced its users to make unnecessary tradeoffs like libata
choosing to use ST wq for polling PIOs and accepting an unnecessary
limitation that no two polling PIOs can progress at the same time. As
MT wq don't provide much better concurrency, users which require
higher level of concurrency, like async or fscache, had to implement
their own thread pool.
Concurrency Managed Workqueue (cmwq) is a reimplementation of wq with
focus on the following goals.
* Maintain compatibility with the original workqueue API.
* Use per-CPU unified worker pools shared by all wq to provide
flexible level of concurrency on demand without wasting a lot of
resource.
* Automatically regulate worker pool and level of concurrency so that
the API users don't need to worry about such details.
3. The Design
In order to ease the asynchronous execution of functions a new
abstraction, the work item, is introduced.
A work item is a simple struct that holds a pointer to the function
that is to be executed asynchronously. Whenever a driver or subsystem
wants a function to be executed asynchronously it has to set up a work
item pointing to that function and queue that work item on a
workqueue.
Special purpose threads, called worker threads, execute the functions
off of the queue, one after the other. If no work is queued, the
worker threads become idle. These worker threads are managed in so
called thread-pools.
The cmwq design differentiates between the user-facing workqueues that
subsystems and drivers queue work items on and the backend mechanism
which manages thread-pool and processes the queued work items.
The backend is called gcwq. There is one gcwq for each possible CPU
and one gcwq to serve work items queued on unbound workqueues.
Subsystems and drivers can create and queue work items through special
workqueue API functions as they see fit. They can influence some
aspects of the way the work items are executed by setting flags on the
workqueue they are putting the work item on. These flags include
things like CPU locality, reentrancy, concurrency limits and more. To
get a detailed overview refer to the API description of
alloc_workqueue() below.
When a work item is queued to a workqueue, the target gcwq is
determined according to the queue parameters and workqueue attributes
and appended on the shared worklist of the gcwq. For example, unless
specifically overridden, a work item of a bound workqueue will be
queued on the worklist of exactly that gcwq that is associated to the
CPU the issuer is running on.
For any worker pool implementation, managing the concurrency level
(how many execution contexts are active) is an important issue. cmwq
tries to keep the concurrency at a minimal but sufficient level.
Minimal to save resources and sufficient in that the system is used at
its full capacity.
Each gcwq bound to an actual CPU implements concurrency management by
hooking into the scheduler. The gcwq is notified whenever an active
worker wakes up or sleeps and keeps track of the number of the
currently runnable workers. Generally, work items are not expected to
hog a CPU and consume many cycles. That means maintaining just enough
concurrency to prevent work processing from stalling should be
optimal. As long as there are one or more runnable workers on the
CPU, the gcwq doesn't start execution of a new work, but, when the
last running worker goes to sleep, it immediately schedules a new
worker so that the CPU doesn't sit idle while there are pending work
items. This allows using a minimal number of workers without losing
execution bandwidth.
Keeping idle workers around doesn't cost other than the memory space
for kthreads, so cmwq holds onto idle ones for a while before killing
them.
For an unbound wq, the above concurrency management doesn't apply and
the gcwq for the pseudo unbound CPU tries to start executing all work
items as soon as possible. The responsibility of regulating
concurrency level is on the users. There is also a flag to mark a
bound wq to ignore the concurrency management. Please refer to the
API section for details.
Forward progress guarantee relies on that workers can be created when
more execution contexts are necessary, which in turn is guaranteed
through the use of rescue workers. All work items which might be used
on code paths that handle memory reclaim are required to be queued on
wq's that have a rescue-worker reserved for execution under memory
pressure. Else it is possible that the thread-pool deadlocks waiting
for execution contexts to free up.
4. Application Programming Interface (API)
alloc_workqueue() allocates a wq. The original create_*workqueue()
functions are deprecated and scheduled for removal. alloc_workqueue()
takes three arguments - @name, @flags and @max_active. @name is the
name of the wq and also used as the name of the rescuer thread if
there is one.
A wq no longer manages execution resources but serves as a domain for
forward progress guarantee, flush and work item attributes. @flags
and @max_active control how work items are assigned execution
resources, scheduled and executed.
@flags:
WQ_NON_REENTRANT
By default, a wq guarantees non-reentrance only on the same
CPU. A work item may not be executed concurrently on the same
CPU by multiple workers but is allowed to be executed
concurrently on multiple CPUs. This flag makes sure
non-reentrance is enforced across all CPUs. Work items queued
to a non-reentrant wq are guaranteed to be executed by at most
one worker system-wide at any given time.
WQ_UNBOUND
Work items queued to an unbound wq are served by a special
gcwq which hosts workers which are not bound to any specific
CPU. This makes the wq behave as a simple execution context
provider without concurrency management. The unbound gcwq
tries to start execution of work items as soon as possible.
Unbound wq sacrifices locality but is useful for the following
cases.
* Wide fluctuation in the concurrency level requirement is
expected and using bound wq may end up creating large number
of mostly unused workers across different CPUs as the issuer
hops through different CPUs.
* Long running CPU intensive workloads which can be better
managed by the system scheduler.
WQ_FREEZEABLE
A freezeable wq participates in the freeze phase of the system
suspend operations. Work items on the wq are drained and no
new work item starts execution until thawed.
WQ_RESCUER
All wq which might be used in the memory reclaim paths _MUST_
have this flag set. This reserves one worker exclusively for
the execution of this wq under memory pressure.
WQ_HIGHPRI
Work items of a highpri wq are queued at the head of the
worklist of the target gcwq and start execution regardless of
the current concurrency level. In other words, highpri work
items will always start execution as soon as execution
resource is available.
Ordering among highpri work items is preserved - a highpri
work item queued after another highpri work item will start
execution after the earlier highpri work item starts.
Although highpri work items are not held back by other
runnable work items, they still contribute to the concurrency
level. Highpri work items in runnable state will prevent
non-highpri work items from starting execution.
This flag is meaningless for unbound wq.
WQ_CPU_INTENSIVE
Work items of a CPU intensive wq do not contribute to the
concurrency level. In other words, runnable CPU intensive
work items will not prevent other work items from starting
execution. This is useful for bound work items which are
expected to hog CPU cycles so that their execution is
regulated by the system scheduler.
Although CPU intensive work items don't contribute to the
concurrency level, start of their executions is still
regulated by the concurrency management and runnable
non-CPU-intensive work items can delay execution of CPU
intensive work items.
This flag is meaningless for unbound wq.
WQ_HIGHPRI | WQ_CPU_INTENSIVE
This combination makes the wq avoid interaction with
concurrency management completely and behave as a simple
per-CPU execution context provider. Work items queued on a
highpri CPU-intensive wq start execution as soon as resources
are available and don't affect execution of other work items.
@max_active:
@max_active determines the maximum number of execution contexts per
CPU which can be assigned to the work items of a wq. For example,
with @max_active of 16, at most 16 work items of the wq can be
executing at the same time per CPU.
Currently, for a bound wq, the maximum limit for @max_active is 512
and the default value used when 0 is specified is 256. For an unbound
wq, the limit is higher of 512 and 4 * num_possible_cpus(). These
values are chosen sufficiently high such that they are not the
limiting factor while providing protection in runaway cases.
The number of active work items of a wq is usually regulated by the
users of the wq, more specifically, by how many work items the users
may queue at the same time. Unless there is a specific need for
throttling the number of active work items, specifying '0' is
recommended.
Some users depend on the strict execution ordering of ST wq. The
combination of @max_active of 1 and WQ_UNBOUND is used to achieve this
behavior. Work items on such wq are always queued to the unbound gcwq
and only one work item can be active at any given time thus achieving
the same ordering property as ST wq.
5. Example Execution Scenarios
The following example execution scenarios try to illustrate how cmwq
behave under different configurations.
Work items w0, w1, w2 are queued to a bound wq q0 on the same CPU.
w0 burns CPU for 5ms then sleeps for 10ms then burns CPU for 5ms
again before finishing. w1 and w2 burn CPU for 5ms then sleep for
10ms.
Ignoring all other tasks, works and processing overhead, and assuming
simple FIFO scheduling, the following is one highly simplified version
of possible sequences of events with the original wq.
TIME IN MSECS EVENT
0 w0 starts and burns CPU
5 w0 sleeps
15 w0 wakes up and burns CPU
20 w0 finishes
20 w1 starts and burns CPU
25 w1 sleeps
35 w1 wakes up and finishes
35 w2 starts and burns CPU
40 w2 sleeps
50 w2 wakes up and finishes
And with cmwq with @max_active >= 3,
TIME IN MSECS EVENT
0 w0 starts and burns CPU
5 w0 sleeps
5 w1 starts and burns CPU
10 w1 sleeps
10 w2 starts and burns CPU
15 w2 sleeps
15 w0 wakes up and burns CPU
20 w0 finishes
20 w1 wakes up and finishes
25 w2 wakes up and finishes
If @max_active == 2,
TIME IN MSECS EVENT
0 w0 starts and burns CPU
5 w0 sleeps
5 w1 starts and burns CPU
10 w1 sleeps
15 w0 wakes up and burns CPU
20 w0 finishes
20 w1 wakes up and finishes
20 w2 starts and burns CPU
25 w2 sleeps
35 w2 wakes up and finishes
Now, let's assume w1 and w2 are queued to a different wq q1 which has
WQ_HIGHPRI set,
TIME IN MSECS EVENT
0 w1 and w2 start and burn CPU
5 w1 sleeps
10 w2 sleeps
10 w0 starts and burns CPU
15 w0 sleeps
15 w1 wakes up and finishes
20 w2 wakes up and finishes
25 w0 wakes up and burns CPU
30 w0 finishes
If q1 has WQ_CPU_INTENSIVE set,
TIME IN MSECS EVENT
0 w0 starts and burns CPU
5 w0 sleeps
5 w1 and w2 start and burn CPU
10 w1 sleeps
15 w2 sleeps
15 w0 wakes up and burns CPU
20 w0 finishes
20 w1 wakes up and finishes
25 w2 wakes up and finishes
6. Guidelines
* Do not forget to use WQ_RESCUER if a wq may process work items which
are used during memory reclaim. Each wq with WQ_RESCUER set has one
rescuer thread reserved for it. If there is dependency among
multiple work items used during memory reclaim, they should be
queued to separate wq each with WQ_RESCUER.
* Unless strict ordering is required, there is no need to use ST wq.
* Unless there is a specific need, using 0 for @max_active is
recommended. In most use cases, concurrency level usually stays
well under the default limit.
* A wq serves as a domain for forward progress guarantee (WQ_RESCUER),
flush and work item attributes. Work items which are not involved
in memory reclaim and don't need to be flushed as a part of a group
of work items, and don't require any special attribute, can use one
of the system wq. There is no difference in execution
characteristics between using a dedicated wq and a system wq.
* Unless work items are expected to consume a huge amount of CPU
cycles, using a bound wq is usually beneficial due to the increased
level of locality in wq operations and work item execution.
+52 -26
View File
@@ -962,6 +962,13 @@ W: http://www.fluff.org/ben/linux/
S: Maintained
F: arch/arm/mach-s3c6410/
ARM/S5P ARM ARCHITECTURES
M: Kukjin Kim <kgene.kim@samsung.com>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
L: linux-samsung-soc@vger.kernel.org (moderated for non-subscribers)
S: Maintained
F: arch/arm/mach-s5p*/
ARM/SHMOBILE ARM ARCHITECTURE
M: Paul Mundt <lethal@linux-sh.org>
M: Magnus Damm <magnus.damm@gmail.com>
@@ -1227,7 +1234,7 @@ F: drivers/auxdisplay/
F: include/linux/cfag12864b.h
AVR32 ARCHITECTURE
M: Haavard Skinnemoen <hskinnemoen@atmel.com>
M: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
W: http://www.atmel.com/products/AVR32/
W: http://avr32linux.org/
W: http://avrfreaks.net/
@@ -1235,7 +1242,7 @@ S: Supported
F: arch/avr32/
AVR32/AT32AP MACHINE SUPPORT
M: Haavard Skinnemoen <hskinnemoen@atmel.com>
M: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
S: Supported
F: arch/avr32/mach-at32ap/
@@ -2213,6 +2220,12 @@ W: http://acpi4asus.sf.net
S: Maintained
F: drivers/platform/x86/eeepc-laptop.c
EFIFB FRAMEBUFFER DRIVER
L: linux-fbdev@vger.kernel.org
M: Peter Jones <pjones@redhat.com>
S: Maintained
F: drivers/video/efifb.c
EFS FILESYSTEM
W: http://aeschi.ch.eu.org/efs/
S: Orphan
@@ -2671,9 +2684,14 @@ S: Maintained
F: drivers/media/video/gspca/
HARDWARE MONITORING
M: Jean Delvare <khali@linux-fr.org>
M: Guenter Roeck <guenter.roeck@ericsson.com>
L: lm-sensors@lm-sensors.org
W: http://www.lm-sensors.org/
S: Orphan
T: quilt kernel.org/pub/linux/kernel/people/jdelvare/linux-2.6/jdelvare-hwmon/
T: quilt kernel.org/pub/linux/kernel/people/groeck/linux-staging/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git
S: Maintained
F: Documentation/hwmon/
F: drivers/hwmon/
F: include/linux/hwmon*.h
@@ -2811,11 +2829,6 @@ S: Maintained
F: arch/x86/kernel/hpet.c
F: arch/x86/include/asm/hpet.h
HPET: ACPI
M: Bob Picco <bob.picco@hp.com>
S: Maintained
F: drivers/char/hpet.c
HPFS FILESYSTEM
M: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
W: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
@@ -3070,16 +3083,27 @@ L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/ixp2000/
INTEL ETHERNET DRIVERS (e100/e1000/e1000e/igb/igbvf/ixgb/ixgbe)
INTEL ETHERNET DRIVERS (e100/e1000/e1000e/igb/igbvf/ixgb/ixgbe/ixgbevf)
M: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
M: Jesse Brandeburg <jesse.brandeburg@intel.com>
M: Bruce Allan <bruce.w.allan@intel.com>
M: Alex Duyck <alexander.h.duyck@intel.com>
M: Carolyn Wyborny <carolyn.wyborny@intel.com>
M: Don Skidmore <donald.c.skidmore@intel.com>
M: Greg Rose <gregory.v.rose@intel.com>
M: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>
M: Alex Duyck <alexander.h.duyck@intel.com>
M: John Ronciak <john.ronciak@intel.com>
L: e1000-devel@lists.sourceforge.net
W: http://e1000.sourceforge.net/
S: Supported
F: Documentation/networking/e100.txt
F: Documentation/networking/e1000.txt
F: Documentation/networking/e1000e.txt
F: Documentation/networking/igb.txt
F: Documentation/networking/igbvf.txt
F: Documentation/networking/ixgb.txt
F: Documentation/networking/ixgbe.txt
F: Documentation/networking/ixgbevf.txt
F: drivers/net/e100.c
F: drivers/net/e1000/
F: drivers/net/e1000e/
@@ -3087,6 +3111,7 @@ F: drivers/net/igb/
F: drivers/net/igbvf/
F: drivers/net/ixgb/
F: drivers/net/ixgbe/
F: drivers/net/ixgbevf/
INTEL PRO/WIRELESS 2100 NETWORK CONNECTION SUPPORT
L: linux-wireless@vger.kernel.org
@@ -3434,7 +3459,7 @@ F: drivers/s390/kvm/
KEXEC
M: Eric Biederman <ebiederm@xmission.com>
W: http://ftp.kernel.org/pub/linux/kernel/people/horms/kexec-tools/
W: http://kernel.org/pub/linux/utils/kernel/kexec/
L: kexec@lists.infradead.org
S: Maintained
F: include/linux/kexec.h
@@ -3795,9 +3820,8 @@ W: http://www.syskonnect.com
S: Supported
MATROX FRAMEBUFFER DRIVER
M: Petr Vandrovec <vandrove@vc.cvut.cz>
L: linux-fbdev@vger.kernel.org
S: Maintained
S: Orphan
F: drivers/video/matrox/matroxfb_*
F: include/linux/matroxfb.h
@@ -3921,10 +3945,8 @@ F: Documentation/serial/moxa-smartio
F: drivers/char/mxser.*
MSI LAPTOP SUPPORT
M: Lennart Poettering <mzxreary@0pointer.de>
M: Lee, Chun-Yi <jlee@novell.com>
L: platform-driver-x86@vger.kernel.org
W: https://tango.0pointer.de/mailman/listinfo/s270-linux
W: http://0pointer.de/lennart/tchibo.html
S: Maintained
F: drivers/platform/x86/msi-laptop.c
@@ -3941,8 +3963,10 @@ S: Supported
F: drivers/mfd/
MULTIMEDIA CARD (MMC), SECURE DIGITAL (SD) AND SDIO SUBSYSTEM
S: Orphan
M: Chris Ball <cjb@laptop.org>
L: linux-mmc@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc.git
S: Maintained
F: drivers/mmc/
F: include/linux/mmc/
@@ -3964,7 +3988,7 @@ F: drivers/char/isicom.c
F: include/linux/isicom.h
MUSB MULTIPOINT HIGH SPEED DUAL-ROLE CONTROLLER
M: Felipe Balbi <felipe.balbi@nokia.com>
M: Felipe Balbi <balbi@ti.com>
L: linux-usb@vger.kernel.org
T: git git://gitorious.org/usb/usb.git
S: Maintained
@@ -3984,8 +4008,8 @@ S: Maintained
F: drivers/net/natsemi.c
NCP FILESYSTEM
M: Petr Vandrovec <vandrove@vc.cvut.cz>
S: Maintained
M: Petr Vandrovec <petr@vandrovec.name>
S: Odd Fixes
F: fs/ncpfs/
NCR DUAL 700 SCSI DRIVER (MICROCHANNEL)
@@ -4262,7 +4286,7 @@ S: Maintained
F: drivers/char/hw_random/omap-rng.c
OMAP USB SUPPORT
M: Felipe Balbi <felipe.balbi@nokia.com>
M: Felipe Balbi <balbi@ti.com>
M: David Brownell <dbrownell@users.sourceforge.net>
L: linux-usb@vger.kernel.org
L: linux-omap@vger.kernel.org
@@ -4839,6 +4863,7 @@ RCUTORTURE MODULE
M: Josh Triplett <josh@freedesktop.org>
M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
S: Supported
T: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
F: Documentation/RCU/torture.txt
F: kernel/rcutorture.c
@@ -4863,6 +4888,7 @@ M: Dipankar Sarma <dipankar@in.ibm.com>
M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
W: http://www.rdrop.com/users/paulmck/rclock/
S: Supported
T: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
F: Documentation/RCU/
F: include/linux/rcu*
F: include/linux/srcu*
@@ -4870,12 +4896,10 @@ F: kernel/rcu*
F: kernel/srcu*
X: kernel/rcutorture.c
REAL TIME CLOCK DRIVER
REAL TIME CLOCK DRIVER (LEGACY)
M: Paul Gortmaker <p_gortmaker@yahoo.com>
S: Maintained
F: Documentation/rtc.txt
F: drivers/rtc/
F: include/linux/rtc.h
F: drivers/char/rtc.c
REAL TIME CLOCK (RTC) SUBSYSTEM
M: Alessandro Zummo <a.zummo@towertech.it>
@@ -5112,8 +5136,10 @@ S: Maintained
F: drivers/mmc/host/sdricoh_cs.c
SECURE DIGITAL HOST CONTROLLER INTERFACE (SDHCI) DRIVER
S: Orphan
M: Chris Ball <cjb@laptop.org>
L: linux-mmc@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc.git
S: Maintained
F: drivers/mmc/host/sdhci.*
SECURE DIGITAL HOST CONTROLLER INTERFACE, OPEN FIRMWARE BINDINGS (SDHCI-OF)
+1 -1
View File
@@ -1,7 +1,7 @@
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 36
EXTRAVERSION = -rc3
EXTRAVERSION = -rc7
NAME = Sheep on Meth
# *DOCUMENTATION*
+2 -2
View File
@@ -32,8 +32,9 @@ config HAVE_OPROFILE
config KPROBES
bool "Kprobes"
depends on KALLSYMS && MODULES
depends on MODULES
depends on HAVE_KPROBES
select KALLSYMS
help
Kprobes allows you to trap at almost any kernel address and
execute a callback function. register_kprobe() establishes
@@ -45,7 +46,6 @@ config OPTPROBES
def_bool y
depends on KPROBES && HAVE_OPTPROBES
depends on !PREEMPT
select KALLSYMS_ALL
config HAVE_EFFICIENT_UNALIGNED_ACCESS
bool
+2
View File
@@ -43,6 +43,8 @@ extern void smp_imb(void);
/* ??? Ought to use this in arch/alpha/kernel/signal.c too. */
#ifndef CONFIG_SMP
#include <linux/sched.h>
extern void __load_new_mm_context(struct mm_struct *);
static inline void
flush_icache_user_range(struct vm_area_struct *vma, struct page *page,

Some files were not shown because too many files have changed in this diff Show More