Merge branch 'master' into for-next

This commit is contained in:
Jiri Kosina
2010-06-16 18:08:13 +02:00
3859 changed files with 293546 additions and 105994 deletions
+1
View File
@@ -28,6 +28,7 @@ modules.builtin
*.gz
*.bz2
*.lzma
*.lzo
*.patch
*.gcno
+7
View File
@@ -0,0 +1,7 @@
filesystems/dnotify_test
laptops/dslm
timers/hpet_example
vm/hugepage-mmap
vm/hugepage-shm
vm/map_hugetlb
@@ -0,0 +1,20 @@
What: /sys/class/power/ds2760-battery.*/charge_now
Date: May 2010
KernelVersion: 2.6.35
Contact: Daniel Mack <daniel@caiaq.de>
Description:
This file is writeable and can be used to set the current
coloumb counter value inside the battery monitor chip. This
is needed for unavoidable corrections of aging batteries.
A userspace daemon can monitor the battery charging logic
and once the counter drops out of considerable bounds, take
appropriate action.
What: /sys/class/power/ds2760-battery.*/charge_full
Date: May 2010
KernelVersion: 2.6.35
Contact: Daniel Mack <daniel@caiaq.de>
Description:
This file is writeable and can be used to set the assumed
battery 'full level'. As batteries age, this value has to be
amended over time.
@@ -0,0 +1,7 @@
What: /sys/devices/system/node/nodeX/compact
Date: February 2010
Contact: Mel Gorman <mel@csn.ul.ie>
Description:
When this file is written to, all memory within that node
will be compacted. When it completes, memory will be freed
into blocks which have as many contiguous pages as possible
@@ -0,0 +1,15 @@
What: /sys/firmware/sfi/tables/
Date: May 2010
Contact: Len Brown <lenb@kernel.org>
Description:
SFI defines a number of small static memory tables
so the kernel can get platform information from firmware.
The tables are defined in the latest SFI specification:
http://simplefirmware.org/documentation
While the tables are used by the kernel, user-space
can observe them this way:
# cd /sys/firmware/sfi/tables
# cat $TABLENAME > $TABLENAME.bin
+46 -33
View File
@@ -639,6 +639,36 @@ is planned to completely remove virt_to_bus() and bus_to_virt() as
they are entirely deprecated. Some ports already do not provide these
as it is impossible to correctly support them.
Handling Errors
DMA address space is limited on some architectures and an allocation
failure can be determined by:
- checking if dma_alloc_coherent returns NULL or dma_map_sg returns 0
- checking the returned dma_addr_t of dma_map_single and dma_map_page
by using dma_mapping_error():
dma_addr_t dma_handle;
dma_handle = dma_map_single(dev, addr, size, direction);
if (dma_mapping_error(dev, dma_handle)) {
/*
* reduce current DMA mapping usage,
* delay and try again later or
* reset driver.
*/
}
Networking drivers must call dev_kfree_skb to free the socket buffer
and return NETDEV_TX_OK if the DMA mapping fails on the transmit hook
(ndo_start_xmit). This means that the socket buffer is just dropped in
the failure case.
SCSI drivers must return SCSI_MLQUEUE_HOST_BUSY if the DMA mapping
fails in the queuecommand hook. This means that the SCSI subsystem
passes the command to the driver again later.
Optimizing Unmap State Space Consumption
On many platforms, dma_unmap_{single,page}() is simply a nop.
@@ -703,42 +733,25 @@ to "Closing".
1) Struct scatterlist requirements.
Struct scatterlist must contain, at a minimum, the following
members:
Don't invent the architecture specific struct scatterlist; just use
<asm-generic/scatterlist.h>. You need to enable
CONFIG_NEED_SG_DMA_LENGTH if the architecture supports IOMMUs
(including software IOMMU).
struct page *page;
unsigned int offset;
unsigned int length;
2) ARCH_KMALLOC_MINALIGN
The base address is specified by a "page+offset" pair.
Architectures must ensure that kmalloc'ed buffer is
DMA-safe. Drivers and subsystems depend on it. If an architecture
isn't fully DMA-coherent (i.e. hardware doesn't ensure that data in
the CPU cache is identical to data in main memory),
ARCH_KMALLOC_MINALIGN must be set so that the memory allocator
makes sure that kmalloc'ed buffer doesn't share a cache line with
the others. See arch/arm/include/asm/cache.h as an example.
Previous versions of struct scatterlist contained a "void *address"
field that was sometimes used instead of page+offset. As of Linux
2.5., page+offset is always used, and the "address" field has been
deleted.
2) More to come...
Handling Errors
DMA address space is limited on some architectures and an allocation
failure can be determined by:
- checking if dma_alloc_coherent returns NULL or dma_map_sg returns 0
- checking the returned dma_addr_t of dma_map_single and dma_map_page
by using dma_mapping_error():
dma_addr_t dma_handle;
dma_handle = dma_map_single(dev, addr, size, direction);
if (dma_mapping_error(dev, dma_handle)) {
/*
* reduce current DMA mapping usage,
* delay and try again later or
* reset driver.
*/
}
Note that ARCH_KMALLOC_MINALIGN is about DMA memory alignment
constraints. You don't need to worry about the architecture data
alignment constraints (e.g. the alignment constraints about 64-bit
objects).
Closing
+6 -6
View File
@@ -389,7 +389,7 @@
</para>
<para>
If your driver supports memory management (it should!), you'll
need to set that up at load time as well. How you intialize
need to set that up at load time as well. How you initialize
it depends on which memory manager you're using, TTM or GEM.
</para>
<sect3>
@@ -399,7 +399,7 @@
aperture space for graphics devices. TTM supports both UMA devices
and devices with dedicated video RAM (VRAM), i.e. most discrete
graphics devices. If your device has dedicated RAM, supporting
TTM is desireable. TTM also integrates tightly with your
TTM is desirable. TTM also integrates tightly with your
driver specific buffer execution function. See the radeon
driver for examples.
</para>
@@ -443,7 +443,7 @@
likely eventually calling ttm_bo_global_init and
ttm_bo_global_release, respectively. Also like the previous
object, ttm_global_item_ref is used to create an initial reference
count for the TTM, which will call your initalization function.
count for the TTM, which will call your initialization function.
</para>
</sect3>
<sect3>
@@ -557,7 +557,7 @@ void intel_crt_init(struct drm_device *dev)
CRT connector and encoder combination is created. A device
specific i2c bus is also created, for fetching EDID data and
performing monitor detection. Once the process is complete,
the new connector is regsitered with sysfs, to make its
the new connector is registered with sysfs, to make its
properties available to applications.
</para>
<sect4>
@@ -581,12 +581,12 @@ void intel_crt_init(struct drm_device *dev)
<para>
For each encoder, CRTC and connector, several functions must
be provided, depending on the object type. Encoder objects
need should provide a DPMS (basically on/off) function, mode fixup
need to provide a DPMS (basically on/off) function, mode fixup
(for converting requested modes into native hardware timings),
and prepare, set and commit functions for use by the core DRM
helper functions. Connector helpers need to provide mode fetch and
validity functions as well as an encoder matching function for
returing an ideal encoder for a given connector. The core
returning an ideal encoder for a given connector. The core
connector functions include a DPMS callback, (deprecated)
save/restore routines, detection, mode probing, property handling,
and cleanup functions.
+1 -1
View File
@@ -269,7 +269,7 @@ static void board_hwcontrol(struct mtd_info *mtd, int cmd)
information about the device.
</para>
<programlisting>
int __init board_init (void)
static int __init board_init (void)
{
struct nand_chip *this;
int err = 0;
+1 -1
View File
@@ -58,7 +58,7 @@ MPEG stream embedded, sliced VBI data format in this specification.
</contrib>
<affiliation>
<address>
<email>awalls@radix.net</email>
<email>awalls@md.metrocast.net</email>
</address>
</affiliation>
</author>
@@ -53,8 +53,10 @@ input</refpurpose>
automatically, similar to sensing the video standard. To do so, applications
call <constant> VIDIOC_QUERY_DV_PRESET</constant> with a pointer to a
&v4l2-dv-preset; type. Once the hardware detects a preset, that preset is
returned in the preset field of &v4l2-dv-preset;. When detection is not
possible or fails, the value V4L2_DV_INVALID is returned.</para>
returned in the preset field of &v4l2-dv-preset;. If the preset could not be
detected because there was no signal, or the signal was unreliable, or the
signal did not map to a supported preset, then the value V4L2_DV_INVALID is
returned.</para>
</refsect1>
<refsect1>
+13 -16
View File
@@ -13,7 +13,7 @@ Reporting (AER) driver and provides information on how to use it, as
well as how to enable the drivers of endpoint devices to conform with
PCI Express AER driver.
1.2 Copyright © Intel Corporation 2006.
1.2 Copyright (C) Intel Corporation 2006.
1.3 What is the PCI Express AER Driver?
@@ -71,15 +71,11 @@ console. If it's a correctable error, it is outputed as a warning.
Otherwise, it is printed as an error. So users could choose different
log level to filter out correctable error messages.
Below shows an example.
+------ PCI-Express Device Error -----+
Error Severity : Uncorrected (Fatal)
PCIE Bus Error type : Transaction Layer
Unsupported Request : First
Requester ID : 0500
VendorID=8086h, DeviceID=0329h, Bus=05h, Device=00h, Function=00h
TLB Header:
04000001 00200a03 05010000 00050100
Below shows an example:
0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
0000:50:00.0: [20] Unsupported Request (First)
0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100
In the example, 'Requester ID' means the ID of the device who sends
the error message to root port. Pls. refer to pci express specs for
@@ -112,7 +108,7 @@ but the PCI Express link itself is fully functional. Fatal errors, on
the other hand, cause the link to be unreliable.
When AER is enabled, a PCI Express device will automatically send an
error message to the PCIE root port above it when the device captures
error message to the PCIe root port above it when the device captures
an error. The Root Port, upon receiving an error reporting message,
internally processes and logs the error message in its PCI Express
capability structure. Error information being logged includes storing
@@ -198,8 +194,9 @@ to reset link, AER port service driver is required to provide the
function to reset link. Firstly, kernel looks for if the upstream
component has an aer driver. If it has, kernel uses the reset_link
callback of the aer driver. If the upstream component has no aer driver
and the port is downstream port, we will use the aer driver of the
root port who reports the AER error. As for upstream ports,
and the port is downstream port, we will perform a hot reset as the
default by setting the Secondary Bus Reset bit of the Bridge Control
register associated with the downstream port. As for upstream ports,
they should provide their own aer service drivers with reset_link
function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
@@ -253,11 +250,11 @@ cleanup uncorrectable status register. Pls. refer to section 3.3.
4. Software error injection
Debugging PCIE AER error recovery code is quite difficult because it
Debugging PCIe AER error recovery code is quite difficult because it
is hard to trigger real hardware errors. Software based error
injection can be used to fake various kinds of PCIE errors.
injection can be used to fake various kinds of PCIe errors.
First you should enable PCIE AER software error injection in kernel
First you should enable PCIe AER software error injection in kernel
configuration, that is, following item should be in your .config.
CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJECT=m
+12
View File
@@ -18,6 +18,8 @@ kernel patches.
2b: Passes allnoconfig, allmodconfig
2c: Builds successfully when using O=builddir
3: Builds on multiple CPU architectures by using local cross-compile tools
or some other build farm.
@@ -95,3 +97,13 @@ kernel patches.
25: If any ioctl's are added by the patch, then also update
Documentation/ioctl/ioctl-number.txt.
26: If your modified source code depends on or uses any of the kernel
APIs or features that are related to the following kconfig symbols,
then test multiple builds with the related kconfig symbols disabled
and/or =m (if that option is available) [not all of these at the
same time, just various/random combinations of them]:
CONFIG_SMP, CONFIG_SYSFS, CONFIG_PROC_FS, CONFIG_INPUT, CONFIG_PCI,
CONFIG_BLOCK, CONFIG_PM, CONFIG_HOTPLUG, CONFIG_MAGIC_SYSRQ,
CONFIG_NET, CONFIG_INET=n (but latter with CONFIG_NET=y)
+5
View File
@@ -130,6 +130,8 @@ Linux kernel master tree:
ftp.??.kernel.org:/pub/linux/kernel/...
?? == your country code, such as "us", "uk", "fr", etc.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git
Linux kernel mailing list:
linux-kernel@vger.kernel.org
[mail majordomo@vger.kernel.org to subscribe]
@@ -160,3 +162,6 @@ How to NOT write kernel driver by Arjan van de Ven:
Kernel Janitor:
http://janitor.kernelnewbies.org/
GIT, Fast Version Control System:
http://git-scm.com/
+59
View File
@@ -0,0 +1,59 @@
APEI Error INJection
~~~~~~~~~~~~~~~~~~~~
EINJ provides a hardware error injection mechanism
It is very useful for debugging and testing of other APEI and RAS features.
To use EINJ, make sure the following are enabled in your kernel
configuration:
CONFIG_DEBUG_FS
CONFIG_ACPI_APEI
CONFIG_ACPI_APEI_EINJ
The user interface of EINJ is debug file system, under the
directory apei/einj. The following files are provided.
- available_error_type
Reading this file returns the error injection capability of the
platform, that is, which error types are supported. The error type
definition is as follow, the left field is the error type value, the
right field is error description.
0x00000001 Processor Correctable
0x00000002 Processor Uncorrectable non-fatal
0x00000004 Processor Uncorrectable fatal
0x00000008 Memory Correctable
0x00000010 Memory Uncorrectable non-fatal
0x00000020 Memory Uncorrectable fatal
0x00000040 PCI Express Correctable
0x00000080 PCI Express Uncorrectable fatal
0x00000100 PCI Express Uncorrectable non-fatal
0x00000200 Platform Correctable
0x00000400 Platform Uncorrectable non-fatal
0x00000800 Platform Uncorrectable fatal
The format of file contents are as above, except there are only the
available error type lines.
- error_type
This file is used to set the error type value. The error type value
is defined in "available_error_type" description.
- error_inject
Write any integer to this file to trigger the error
injection. Before this, please specify all necessary error
parameters.
- param1
This file is used to set the first error parameter value. Effect of
parameter depends on error_type specified. For memory error, this is
physical memory address.
- param2
This file is used to set the second error parameter value. Effect of
parameter depends on error_type specified. For memory error, this is
physical memory address mask.
For more information about EINJ, please refer to ACPI specification
version 4.0, section 17.5.
+79 -2
View File
@@ -12,6 +12,8 @@ Introduction
of the s3c2410 GPIO system, please read the Samsung provided
data-sheet/users manual to find out the complete list.
See Documentation/arm/Samsung/GPIO.txt for the core implemetation.
GPIOLIB
-------
@@ -24,8 +26,60 @@ GPIOLIB
listed below will be removed (they may be marked as __deprecated
in the near future).
- s3c2410_gpio_getpin
- s3c2410_gpio_setpin
The following functions now either have a s3c_ specific variant
or are merged into gpiolib. See the definitions in
arch/arm/plat-samsung/include/plat/gpio-cfg.h:
s3c2410_gpio_setpin() gpio_set_value() or gpio_direction_output()
s3c2410_gpio_getpin() gpio_get_value() or gpio_direction_input()
s3c2410_gpio_getirq() gpio_to_irq()
s3c2410_gpio_cfgpin() s3c_gpio_cfgpin()
s3c2410_gpio_getcfg() s3c_gpio_getcfg()
s3c2410_gpio_pullup() s3c_gpio_setpull()
GPIOLIB conversion
------------------
If you need to convert your board or driver to use gpiolib from the exiting
s3c2410 api, then here are some notes on the process.
1) If your board is exclusively using an GPIO, say to control peripheral
power, then it will require to claim the gpio with gpio_request() before
it can use it.
It is recommended to check the return value, with at least WARN_ON()
during initialisation.
2) The s3c2410_gpio_cfgpin() can be directly replaced with s3c_gpio_cfgpin()
as they have the same arguments, and can either take the pin specific
values, or the more generic special-function-number arguments.
3) s3c2410_gpio_pullup() changs have the problem that whilst the
s3c2410_gpio_pullup(x, 1) can be easily translated to the
s3c_gpio_setpull(x, S3C_GPIO_PULL_NONE), the s3c2410_gpio_pullup(x, 0)
are not so easy.
The s3c2410_gpio_pullup(x, 0) case enables the pull-up (or in the case
of some of the devices, a pull-down) and as such the new API distinguishes
between the UP and DOWN case. There is currently no 'just turn on' setting
which may be required if this becomes a problem.
4) s3c2410_gpio_setpin() can be replaced by gpio_set_value(), the old call
does not implicitly configure the relevant gpio to output. The gpio
direction should be changed before using gpio_set_value().
5) s3c2410_gpio_getpin() is replaceable by gpio_get_value() if the pin
has been set to input. It is currently unknown what the behaviour is
when using gpio_get_value() on an output pin (s3c2410_gpio_getpin
would return the value the pin is supposed to be outputting).
6) s3c2410_gpio_getirq() should be directly replacable with the
gpio_to_irq() call.
The s3c2410_gpio and gpio_ calls have always operated on the same gpio
numberspace, so there is no problem with converting the gpio numbering
between the calls.
Headers
@@ -54,6 +108,11 @@ PIN Numbers
eg S3C2410_GPA(0) or S3C2410_GPF(1). These defines are used to tell
the GPIO functions which pin is to be used.
With the conversion to gpiolib, there is no longer a direct conversion
from gpio pin number to register base address as in earlier kernels. This
is due to the number space required for newer SoCs where the later
GPIOs are not contiguous.
Configuring a pin
-----------------
@@ -71,6 +130,8 @@ Configuring a pin
which would turn GPA(0) into the lowest Address line A0, and set
GPE(8) to be connected to the SDIO/MMC controller's SDDAT1 line.
The s3c_gpio_cfgpin() call is a functional replacement for this call.
Reading the current configuration
---------------------------------
@@ -82,6 +143,9 @@ Reading the current configuration
The return value will be from the same set of values which can be
passed to s3c2410_gpio_cfgpin().
The s3c_gpio_getcfg() call should be a functional replacement for
this call.
Configuring a pull-up resistor
------------------------------
@@ -95,6 +159,10 @@ Configuring a pull-up resistor
Where the to value is zero to set the pull-up off, and 1 to enable
the specified pull-up. Any other values are currently undefined.
The s3c_gpio_setpull() offers similar functionality, but with the
ability to encode whether the pull is up or down. Currently there
is no 'just on' state, so up or down must be selected.
Getting the state of a PIN
--------------------------
@@ -106,6 +174,9 @@ Getting the state of a PIN
This will return either zero or non-zero. Do not count on this
function returning 1 if the pin is set.
This call is now implemented by the relevant gpiolib calls, convert
your board or driver to use gpiolib.
Setting the state of a PIN
--------------------------
@@ -117,6 +188,9 @@ Setting the state of a PIN
Which sets the given pin to the value. Use 0 to write 0, and 1 to
set the output to 1.
This call is now implemented by the relevant gpiolib calls, convert
your board or driver to use gpiolib.
Getting the IRQ number associated with a PIN
--------------------------------------------
@@ -128,6 +202,9 @@ Getting the IRQ number associated with a PIN
Note, not all pins have an IRQ.
This call is now implemented by the relevant gpiolib calls, convert
your board or driver to use gpiolib.
Authour
-------
+14 -1
View File
@@ -8,10 +8,16 @@ Introduction
The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported
by the 's3c2410' architecture of ARM Linux. Currently the S3C2410,
S3C2412, S3C2413, S3C2440, S3C2442 and S3C2443 devices are supported.
S3C2412, S3C2413, S3C2416 S3C2440, S3C2442, S3C2443 and S3C2450 devices
are supported.
Support for the S3C2400 and S3C24A0 series are in progress.
The S3C2416 and S3C2450 devices are very similar and S3C2450 support is
included under the arch/arm/mach-s3c2416 directory. Note, whilst core
support for these SoCs is in, work on some of the extra peripherals
and extra interrupts is still ongoing.
Configuration
-------------
@@ -209,6 +215,13 @@ GPIO
Newer kernels carry GPIOLIB, and support is being moved towards
this with some of the older support in line to be removed.
As of v2.6.34, the move towards using gpiolib support is almost
complete, and very little of the old calls are left.
See Documentation/arm/Samsung-S3C24XX/GPIO.txt for the S3C24XX specific
support and Documentation/arm/Samsung/GPIO.txt for the core Samsung
implementation.
Clock Management
----------------
+42
View File
@@ -0,0 +1,42 @@
Samsung GPIO implementation
===========================
Introduction
------------
This outlines the Samsung GPIO implementation and the architecture
specfic calls provided alongisde the drivers/gpio core.
S3C24XX (Legacy)
----------------
See Documentation/arm/Samsung-S3C24XX/GPIO.txt for more information
about these devices. Their implementation is being brought into line
with the core samsung implementation described in this document.
GPIOLIB integration
-------------------
The gpio implementation uses gpiolib as much as possible, only providing
specific calls for the items that require Samsung specific handling, such
as pin special-function or pull resistor control.
GPIO numbering is synchronised between the Samsung and gpiolib system.
PIN configuration
-----------------
Pin configuration is specific to the Samsung architecutre, with each SoC
registering the necessary information for the core gpio configuration
implementation to configure pins as necessary.
The s3c_gpio_cfgpin() and s3c_gpio_setpull() provide the means for a
driver or machine to change gpio configuration.
See arch/arm/plat-samsung/include/plat/gpio-cfg.h for more information
on these functions.
+23 -10
View File
@@ -13,9 +13,10 @@ Introduction
- S3C24XX: See Documentation/arm/Samsung-S3C24XX/Overview.txt for full list
- S3C64XX: S3C6400 and S3C6410
- S5PC6440
S5PC100 and S5PC110 support is currently being merged
- S5P6440
- S5P6442
- S5PC100
- S5PC110 / S5PV210
S3C24XX Systems
@@ -35,7 +36,10 @@ Configuration
unifying all the SoCs into one kernel.
s5p6440_defconfig - S5P6440 specific default configuration
s5p6442_defconfig - S5P6442 specific default configuration
s5pc100_defconfig - S5PC100 specific default configuration
s5pc110_defconfig - S5PC110 specific default configuration
s5pv210_defconfig - S5PV210 specific default configuration
Layout
@@ -50,18 +54,27 @@ Layout
specific information. It contains the base clock, GPIO and device definitions
to get the system running.
plat-s3c is the s3c24xx/s3c64xx platform directory, although it is currently
involved in other builds this will be phased out once the relevant code is
moved elsewhere.
plat-s3c24xx is for s3c24xx specific builds, see the S3C24XX docs.
plat-s3c64xx is for the s3c64xx specific bits, see the S3C24XX docs.
plat-s5p is for s5p specific builds, and contains common support for the
S5P specific systems. Not all S5Ps use all the features in this directory
due to differences in the hardware.
plat-s5p is for s5p specific builds, more to be added.
Layout changes
--------------
The old plat-s3c and plat-s5pc1xx directories have been removed, with
support moved to either plat-samsung or plat-s5p as necessary. These moves
where to simplify the include and dependency issues involved with having
so many different platform directories.
It was decided to remove plat-s5pc1xx as some of the support was already
in plat-s5p or plat-samsung, with the S5PC110 support added with S5PV210
the only user was the S5PC100. The S5PC100 specific items where moved to
arch/arm/mach-s5pc100.
[ to finish ]
Port Contributors
+133 -18
View File
@@ -17,6 +17,9 @@ HOWTO
You can do a very simple testing of running two dd threads in two different
cgroups. Here is what you can do.
- Enable Block IO controller
CONFIG_BLK_CGROUP=y
- Enable group scheduling in CFQ
CONFIG_CFQ_GROUP_IOSCHED=y
@@ -54,32 +57,52 @@ cgroups. Here is what you can do.
Various user visible config options
===================================
CONFIG_BLK_CGROUP
- Block IO controller.
CONFIG_DEBUG_BLK_CGROUP
- Debug help. Right now some additional stats file show up in cgroup
if this option is enabled.
CONFIG_CFQ_GROUP_IOSCHED
- Enables group scheduling in CFQ. Currently only 1 level of group
creation is allowed.
CONFIG_DEBUG_CFQ_IOSCHED
- Enables some debugging messages in blktrace. Also creates extra
cgroup file blkio.dequeue.
Config options selected automatically
=====================================
These config options are not user visible and are selected/deselected
automatically based on IO scheduler configuration.
CONFIG_BLK_CGROUP
- Block IO controller. Selected by CONFIG_CFQ_GROUP_IOSCHED.
CONFIG_DEBUG_BLK_CGROUP
- Debug help. Selected by CONFIG_DEBUG_CFQ_IOSCHED.
Details of cgroup files
=======================
- blkio.weight
- Specifies per cgroup weight.
- Specifies per cgroup weight. This is default weight of the group
on all the devices until and unless overridden by per device rule.
(See blkio.weight_device).
Currently allowed range of weights is from 100 to 1000.
- blkio.weight_device
- One can specify per cgroup per device rules using this interface.
These rules override the default value of group weight as specified
by blkio.weight.
Following is the format.
#echo dev_maj:dev_minor weight > /path/to/cgroup/blkio.weight_device
Configure weight=300 on /dev/sdb (8:16) in this cgroup
# echo 8:16 300 > blkio.weight_device
# cat blkio.weight_device
dev weight
8:16 300
Configure weight=500 on /dev/sda (8:0) in this cgroup
# echo 8:0 500 > blkio.weight_device
# cat blkio.weight_device
dev weight
8:0 500
8:16 300
Remove specific weight for /dev/sda in this cgroup
# echo 8:0 0 > blkio.weight_device
# cat blkio.weight_device
dev weight
8:16 300
- blkio.time
- disk time allocated to cgroup per device in milliseconds. First
two fields specify the major and minor number of the device and
@@ -92,13 +115,105 @@ Details of cgroup files
third field specifies the number of sectors transferred by the
group to/from the device.
- blkio.io_service_bytes
- Number of bytes transferred to/from the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
specifies the number of bytes.
- blkio.io_serviced
- Number of IOs completed to/from the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
specifies the number of IOs.
- blkio.io_service_time
- Total amount of time between request dispatch and request completion
for the IOs done by this cgroup. This is in nanoseconds to make it
meaningful for flash devices too. For devices with queue depth of 1,
this time represents the actual service time. When queue_depth > 1,
that is no longer true as requests may be served out of order. This
may cause the service time for a given IO to include the service time
of multiple IOs when served out of order which may result in total
io_service_time > actual time elapsed. This time is further divided by
the type of operation - read or write, sync or async. First two fields
specify the major and minor number of the device, third field
specifies the operation type and the fourth field specifies the
io_service_time in ns.
- blkio.io_wait_time
- Total amount of time the IOs for this cgroup spent waiting in the
scheduler queues for service. This can be greater than the total time
elapsed since it is cumulative io_wait_time for all IOs. It is not a
measure of total time the cgroup spent waiting but rather a measure of
the wait_time for its individual IOs. For devices with queue_depth > 1
this metric does not include the time spent waiting for service once
the IO is dispatched to the device but till it actually gets serviced
(there might be a time lag here due to re-ordering of requests by the
device). This is in nanoseconds to make it meaningful for flash
devices too. This time is further divided by the type of operation -
read or write, sync or async. First two fields specify the major and
minor number of the device, third field specifies the operation type
and the fourth field specifies the io_wait_time in ns.
- blkio.io_merged
- Total number of bios/requests merged into requests belonging to this
cgroup. This is further divided by the type of operation - read or
write, sync or async.
- blkio.io_queued
- Total number of requests queued up at any given instant for this
cgroup. This is further divided by the type of operation - read or
write, sync or async.
- blkio.avg_queue_size
- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
The average queue size for this cgroup over the entire time of this
cgroup's existence. Queue size samples are taken each time one of the
queues of this cgroup gets a timeslice.
- blkio.group_wait_time
- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
This is the amount of time the cgroup had to wait since it became busy
(i.e., went from 0 to 1 request queued) to get a timeslice for one of
its queues. This is different from the io_wait_time which is the
cumulative total of the amount of time spent by each IO in that cgroup
waiting in the scheduler queue. This is in nanoseconds. If this is
read when the cgroup is in a waiting (for timeslice) state, the stat
will only report the group_wait_time accumulated till the last time it
got a timeslice and will not include the current delta.
- blkio.empty_time
- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
This is the amount of time a cgroup spends without any pending
requests when not being served, i.e., it does not include any time
spent idling for one of the queues of the cgroup. This is in
nanoseconds. If this is read when the cgroup is in an empty state,
the stat will only report the empty_time accumulated till the last
time it had a pending request and will not include the current delta.
- blkio.idle_time
- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
This is the amount of time spent by the IO scheduler idling for a
given cgroup in anticipation of a better request than the exising ones
from other queues/cgroups. This is in nanoseconds. If this is read
when the cgroup is in an idling state, the stat will only report the
idle_time accumulated till the last idle period and will not include
the current delta.
- blkio.dequeue
- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This
- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. This
gives the statistics about how many a times a group was dequeued
from service tree of the device. First two fields specify the major
and minor number of the device and third field specifies the number
of times a group was dequeued from a particular device.
- blkio.reset_stats
- Writing an int to this file will result in resetting all the stats
for that cgroup.
CFQ sysfs tunable
=================
/sys/block/<disk>/queue/iosched/group_isolation
+1 -1
View File
@@ -339,7 +339,7 @@ To mount a cgroup hierarchy with all available subsystems, type:
The "xxx" is not interpreted by the cgroup code, but will appear in
/proc/mounts so may be any useful identifying string that you like.
To mount a cgroup hierarchy with just the cpuset and numtasks
To mount a cgroup hierarchy with just the cpuset and memory
subsystems, type:
# mount -t cgroup -o cpuset,memory hier1 /dev/cgroup

Some files were not shown because too many files have changed in this diff Show More