You've already forked linux-apfs
mirror of
https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
Merge branches 'sched/rt' and 'sched/urgent' into sched/core
This commit is contained in:
@@ -3786,14 +3786,11 @@ S: The Netherlands
|
||||
|
||||
N: David Woodhouse
|
||||
E: dwmw2@infradead.org
|
||||
D: ARCnet stuff, Applicom board driver, SO_BINDTODEVICE,
|
||||
D: some Alpha platform porting from 2.0, Memory Technology Devices,
|
||||
D: Acquire watchdog timer, PC speaker driver maintenance,
|
||||
D: JFFS2 file system, Memory Technology Device subsystem,
|
||||
D: various other stuff that annoyed me by not working.
|
||||
S: c/o Red Hat Engineering
|
||||
S: Rustat House
|
||||
S: 60 Clifton Road
|
||||
S: Cambridge. CB1 7EG
|
||||
S: c/o Intel Corporation
|
||||
S: Pipers Way
|
||||
S: Swindon. SN3 1RJ
|
||||
S: England
|
||||
|
||||
N: Chris Wright
|
||||
|
||||
@@ -33,10 +33,12 @@ o Gnu make 3.79.1 # make --version
|
||||
o binutils 2.12 # ld -v
|
||||
o util-linux 2.10o # fdformat --version
|
||||
o module-init-tools 0.9.10 # depmod -V
|
||||
o e2fsprogs 1.29 # tune2fs
|
||||
o e2fsprogs 1.41.4 # e2fsck -V
|
||||
o jfsutils 1.1.3 # fsck.jfs -V
|
||||
o reiserfsprogs 3.6.3 # reiserfsck -V 2>&1|grep reiserfsprogs
|
||||
o xfsprogs 2.6.0 # xfs_db -V
|
||||
o squashfs-tools 4.0 # mksquashfs -version
|
||||
o btrfs-progs 0.18 # btrfsck
|
||||
o pcmciautils 004 # pccardctl -V
|
||||
o quota-tools 3.09 # quota -V
|
||||
o PPP 2.4.0 # pppd --version
|
||||
|
||||
@@ -483,17 +483,25 @@ values. To do the latter, you can stick the following in your .emacs file:
|
||||
(* (max steps 1)
|
||||
c-basic-offset)))
|
||||
|
||||
(add-hook 'c-mode-common-hook
|
||||
(lambda ()
|
||||
;; Add kernel style
|
||||
(c-add-style
|
||||
"linux-tabs-only"
|
||||
'("linux" (c-offsets-alist
|
||||
(arglist-cont-nonempty
|
||||
c-lineup-gcc-asm-reg
|
||||
c-lineup-arglist-tabs-only))))))
|
||||
|
||||
(add-hook 'c-mode-hook
|
||||
(lambda ()
|
||||
(let ((filename (buffer-file-name)))
|
||||
;; Enable kernel mode for the appropriate files
|
||||
(when (and filename
|
||||
(string-match "~/src/linux-trees" filename))
|
||||
(string-match (expand-file-name "~/src/linux-trees")
|
||||
filename))
|
||||
(setq indent-tabs-mode t)
|
||||
(c-set-style "linux")
|
||||
(c-set-offset 'arglist-cont-nonempty
|
||||
'(c-lineup-gcc-asm-reg
|
||||
c-lineup-arglist-tabs-only))))))
|
||||
(c-set-style "linux-tabs-only")))))
|
||||
|
||||
This will make emacs go better with the kernel coding style for C
|
||||
files below ~/src/linux-trees.
|
||||
|
||||
@@ -5,7 +5,7 @@
|
||||
|
||||
This document describes the DMA API. For a more gentle introduction
|
||||
phrased in terms of the pci_ equivalents (and actual examples) see
|
||||
DMA-mapping.txt
|
||||
Documentation/PCI/PCI-DMA-mapping.txt.
|
||||
|
||||
This API is split into two pieces. Part I describes the API and the
|
||||
corresponding pci_ API. Part II describes the extensions to the API
|
||||
@@ -170,16 +170,15 @@ Returns: 0 if successful and a negative error if not.
|
||||
u64
|
||||
dma_get_required_mask(struct device *dev)
|
||||
|
||||
After setting the mask with dma_set_mask(), this API returns the
|
||||
actual mask (within that already set) that the platform actually
|
||||
requires to operate efficiently. Usually this means the returned mask
|
||||
This API returns the mask that the platform requires to
|
||||
operate efficiently. Usually this means the returned mask
|
||||
is the minimum required to cover all of memory. Examining the
|
||||
required mask gives drivers with variable descriptor sizes the
|
||||
opportunity to use smaller descriptors as necessary.
|
||||
|
||||
Requesting the required mask does not alter the current mask. If you
|
||||
wish to take advantage of it, you should issue another dma_set_mask()
|
||||
call to lower the mask again.
|
||||
wish to take advantage of it, you should issue a dma_set_mask()
|
||||
call to set the mask to the value returned.
|
||||
|
||||
|
||||
Part Id - Streaming DMA mappings
|
||||
|
||||
@@ -41,6 +41,12 @@ GPL version 2.
|
||||
</abstract>
|
||||
|
||||
<revhistory>
|
||||
<revision>
|
||||
<revnumber>0.7</revnumber>
|
||||
<date>2008-12-23</date>
|
||||
<authorinitials>hjk</authorinitials>
|
||||
<revremark>Added generic platform drivers and offset attribute.</revremark>
|
||||
</revision>
|
||||
<revision>
|
||||
<revnumber>0.6</revnumber>
|
||||
<date>2008-12-05</date>
|
||||
@@ -312,6 +318,16 @@ interested in translating it, please email me
|
||||
pointed to by addr.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<filename>offset</filename>: The offset, in bytes, that has to be
|
||||
added to the pointer returned by <function>mmap()</function> to get
|
||||
to the actual device memory. This is important if the device's memory
|
||||
is not page aligned. Remember that pointers returned by
|
||||
<function>mmap()</function> are always page aligned, so it is good
|
||||
style to always add this offset.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>
|
||||
@@ -594,6 +610,78 @@ framework to set up sysfs files for this region. Simply leave it alone.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="using_uio_pdrv">
|
||||
<title>Using uio_pdrv for platform devices</title>
|
||||
<para>
|
||||
In many cases, UIO drivers for platform devices can be handled in a
|
||||
generic way. In the same place where you define your
|
||||
<varname>struct platform_device</varname>, you simply also implement
|
||||
your interrupt handler and fill your
|
||||
<varname>struct uio_info</varname>. A pointer to this
|
||||
<varname>struct uio_info</varname> is then used as
|
||||
<varname>platform_data</varname> for your platform device.
|
||||
</para>
|
||||
<para>
|
||||
You also need to set up an array of <varname>struct resource</varname>
|
||||
containing addresses and sizes of your memory mappings. This
|
||||
information is passed to the driver using the
|
||||
<varname>.resource</varname> and <varname>.num_resources</varname>
|
||||
elements of <varname>struct platform_device</varname>.
|
||||
</para>
|
||||
<para>
|
||||
You now have to set the <varname>.name</varname> element of
|
||||
<varname>struct platform_device</varname> to
|
||||
<varname>"uio_pdrv"</varname> to use the generic UIO platform device
|
||||
driver. This driver will fill the <varname>mem[]</varname> array
|
||||
according to the resources given, and register the device.
|
||||
</para>
|
||||
<para>
|
||||
The advantage of this approach is that you only have to edit a file
|
||||
you need to edit anyway. You do not have to create an extra driver.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="using_uio_pdrv_genirq">
|
||||
<title>Using uio_pdrv_genirq for platform devices</title>
|
||||
<para>
|
||||
Especially in embedded devices, you frequently find chips where the
|
||||
irq pin is tied to its own dedicated interrupt line. In such cases,
|
||||
where you can be really sure the interrupt is not shared, we can take
|
||||
the concept of <varname>uio_pdrv</varname> one step further and use a
|
||||
generic interrupt handler. That's what
|
||||
<varname>uio_pdrv_genirq</varname> does.
|
||||
</para>
|
||||
<para>
|
||||
The setup for this driver is the same as described above for
|
||||
<varname>uio_pdrv</varname>, except that you do not implement an
|
||||
interrupt handler. The <varname>.handler</varname> element of
|
||||
<varname>struct uio_info</varname> must remain
|
||||
<varname>NULL</varname>. The <varname>.irq_flags</varname> element
|
||||
must not contain <varname>IRQF_SHARED</varname>.
|
||||
</para>
|
||||
<para>
|
||||
You will set the <varname>.name</varname> element of
|
||||
<varname>struct platform_device</varname> to
|
||||
<varname>"uio_pdrv_genirq"</varname> to use this driver.
|
||||
</para>
|
||||
<para>
|
||||
The generic interrupt handler of <varname>uio_pdrv_genirq</varname>
|
||||
will simply disable the interrupt line using
|
||||
<function>disable_irq_nosync()</function>. After doing its work,
|
||||
userspace can reenable the interrupt by writing 0x00000001 to the UIO
|
||||
device file. The driver already implements an
|
||||
<function>irq_control()</function> to make this possible, you must not
|
||||
implement your own.
|
||||
</para>
|
||||
<para>
|
||||
Using <varname>uio_pdrv_genirq</varname> not only saves a few lines of
|
||||
interrupt handler code. You also do not need to know anything about
|
||||
the chip's internal registers to create the kernel part of the driver.
|
||||
All you need to know is the irq number of the pin the chip is
|
||||
connected to.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="userspace_driver" xreflabel="Writing a driver in user space">
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
[ NOTE: The virt_to_bus() and bus_to_virt() functions have been
|
||||
superseded by the functionality provided by the PCI DMA
|
||||
interface (see Documentation/DMA-mapping.txt). They continue
|
||||
superseded by the functionality provided by the PCI DMA interface
|
||||
(see Documentation/PCI/PCI-DMA-mapping.txt). They continue
|
||||
to be documented below for historical purposes, but new code
|
||||
must not use them. --davidm 00/12/12 ]
|
||||
|
||||
|
||||
@@ -392,6 +392,10 @@ int main(int argc, char *argv[])
|
||||
goto err;
|
||||
}
|
||||
}
|
||||
if (!maskset && !tid && !containerset) {
|
||||
usage();
|
||||
goto err;
|
||||
}
|
||||
|
||||
do {
|
||||
int i;
|
||||
|
||||
@@ -186,8 +186,9 @@ a virtual address mapping (unlike the earlier scheme of virtual address
|
||||
do not have a corresponding kernel virtual address space mapping) and
|
||||
low-memory pages.
|
||||
|
||||
Note: Please refer to DMA-mapping.txt for a discussion on PCI high mem DMA
|
||||
aspects and mapping of scatter gather lists, and support for 64 bit PCI.
|
||||
Note: Please refer to Documentation/PCI/PCI-DMA-mapping.txt for a discussion
|
||||
on PCI high mem DMA aspects and mapping of scatter gather lists, and support
|
||||
for 64 bit PCI.
|
||||
|
||||
Special handling is required only for cases where i/o needs to happen on
|
||||
pages at physical memory addresses beyond what the device can support. In these
|
||||
@@ -953,14 +954,14 @@ elevator_allow_merge_fn called whenever the block layer determines
|
||||
results in some sort of conflict internally,
|
||||
this hook allows it to do that.
|
||||
|
||||
elevator_dispatch_fn fills the dispatch queue with ready requests.
|
||||
elevator_dispatch_fn* fills the dispatch queue with ready requests.
|
||||
I/O schedulers are free to postpone requests by
|
||||
not filling the dispatch queue unless @force
|
||||
is non-zero. Once dispatched, I/O schedulers
|
||||
are not allowed to manipulate the requests -
|
||||
they belong to generic dispatch queue.
|
||||
|
||||
elevator_add_req_fn called to add a new request into the scheduler
|
||||
elevator_add_req_fn* called to add a new request into the scheduler
|
||||
|
||||
elevator_queue_empty_fn returns true if the merge queue is empty.
|
||||
Drivers shouldn't use this, but rather check
|
||||
@@ -990,7 +991,7 @@ elevator_activate_req_fn Called when device driver first sees a request.
|
||||
elevator_deactivate_req_fn Called when device driver decides to delay
|
||||
a request by requeueing it.
|
||||
|
||||
elevator_init_fn
|
||||
elevator_init_fn*
|
||||
elevator_exit_fn Allocate and free any elevator specific storage
|
||||
for a queue.
|
||||
|
||||
|
||||
@@ -0,0 +1,63 @@
|
||||
Queue sysfs files
|
||||
=================
|
||||
|
||||
This text file will detail the queue files that are located in the sysfs tree
|
||||
for each block device. Note that stacked devices typically do not export
|
||||
any settings, since their queue merely functions are a remapping target.
|
||||
These files are the ones found in the /sys/block/xxx/queue/ directory.
|
||||
|
||||
Files denoted with a RO postfix are readonly and the RW postfix means
|
||||
read-write.
|
||||
|
||||
hw_sector_size (RO)
|
||||
-------------------
|
||||
This is the hardware sector size of the device, in bytes.
|
||||
|
||||
max_hw_sectors_kb (RO)
|
||||
----------------------
|
||||
This is the maximum number of kilobytes supported in a single data transfer.
|
||||
|
||||
max_sectors_kb (RW)
|
||||
-------------------
|
||||
This is the maximum number of kilobytes that the block layer will allow
|
||||
for a filesystem request. Must be smaller than or equal to the maximum
|
||||
size allowed by the hardware.
|
||||
|
||||
nomerges (RW)
|
||||
-------------
|
||||
This enables the user to disable the lookup logic involved with IO merging
|
||||
requests in the block layer. Merging may still occur through a direct
|
||||
1-hit cache, since that comes for (almost) free. The IO scheduler will not
|
||||
waste cycles doing tree/hash lookups for merges if nomerges is 1. Defaults
|
||||
to 0, enabling all merges.
|
||||
|
||||
nr_requests (RW)
|
||||
----------------
|
||||
This controls how many requests may be allocated in the block layer for
|
||||
read or write requests. Note that the total allocated number may be twice
|
||||
this amount, since it applies only to reads or writes (not the accumulated
|
||||
sum).
|
||||
|
||||
read_ahead_kb (RW)
|
||||
------------------
|
||||
Maximum number of kilobytes to read-ahead for filesystems on this block
|
||||
device.
|
||||
|
||||
rq_affinity (RW)
|
||||
----------------
|
||||
If this option is enabled, the block layer will migrate request completions
|
||||
to the CPU that originally submitted the request. For some workloads
|
||||
this provides a significant reduction in CPU cycles due to caching effects.
|
||||
|
||||
scheduler (RW)
|
||||
--------------
|
||||
When read, this file will display the current and available IO schedulers
|
||||
for this block device. The currently active IO scheduler will be enclosed
|
||||
in [] brackets. Writing an IO scheduler name to this file will switch
|
||||
control of this block device to that new IO scheduler. Note that writing
|
||||
an IO scheduler name to this file will attempt to load that IO scheduler
|
||||
module, if it isn't already present in the system.
|
||||
|
||||
|
||||
|
||||
Jens Axboe <jens.axboe@oracle.com>, February 2009
|
||||
@@ -1,7 +1,8 @@
|
||||
CGROUPS
|
||||
-------
|
||||
|
||||
Written by Paul Menage <menage@google.com> based on Documentation/cpusets.txt
|
||||
Written by Paul Menage <menage@google.com> based on
|
||||
Documentation/cgroups/cpusets.txt
|
||||
|
||||
Original copyright statements from cpusets.txt:
|
||||
Portions Copyright (C) 2004 BULL SA.
|
||||
@@ -68,7 +69,7 @@ On their own, the only use for cgroups is for simple job
|
||||
tracking. The intention is that other subsystems hook into the generic
|
||||
cgroup support to provide new attributes for cgroups, such as
|
||||
accounting/limiting the resources which processes in a cgroup can
|
||||
access. For example, cpusets (see Documentation/cpusets.txt) allows
|
||||
access. For example, cpusets (see Documentation/cgroups/cpusets.txt) allows
|
||||
you to associate a set of CPUs and a set of memory nodes with the
|
||||
tasks in each cgroup.
|
||||
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
Memory Resource Controller(Memcg) Implementation Memo.
|
||||
Last Updated: 2008/12/15
|
||||
Base Kernel Version: based on 2.6.28-rc8-mm.
|
||||
Last Updated: 2009/1/19
|
||||
Base Kernel Version: based on 2.6.29-rc2.
|
||||
|
||||
Because VM is getting complex (one of reasons is memcg...), memcg's behavior
|
||||
is complex. This is a document for memcg's internal behavior.
|
||||
Please note that implementation details can be changed.
|
||||
|
||||
(*) Topics on API should be in Documentation/controllers/memory.txt)
|
||||
(*) Topics on API should be in Documentation/cgroups/memory.txt)
|
||||
|
||||
0. How to record usage ?
|
||||
2 objects are used.
|
||||
@@ -340,3 +340,23 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
|
||||
# mount -t cgroup none /cgroup -t cpuset,memory,cpu,devices
|
||||
|
||||
and do task move, mkdir, rmdir etc...under this.
|
||||
|
||||
9.7 swapoff.
|
||||
Besides management of swap is one of complicated parts of memcg,
|
||||
call path of swap-in at swapoff is not same as usual swap-in path..
|
||||
It's worth to be tested explicitly.
|
||||
|
||||
For example, test like following is good.
|
||||
(Shell-A)
|
||||
# mount -t cgroup none /cgroup -t memory
|
||||
# mkdir /cgroup/test
|
||||
# echo 40M > /cgroup/test/memory.limit_in_bytes
|
||||
# echo 0 > /cgroup/test/tasks
|
||||
Run malloc(100M) program under this. You'll see 60M of swaps.
|
||||
(Shell-B)
|
||||
# move all tasks in /cgroup/test to /cgroup
|
||||
# /sbin/swapoff -a
|
||||
# rmdir /test/cgroup
|
||||
# kill malloc task.
|
||||
|
||||
Of course, tmpfs v.s. swapoff test should be tested, too.
|
||||
@@ -251,7 +251,7 @@ NFS/RDMA Setup
|
||||
|
||||
Instruct the server to listen on the RDMA transport:
|
||||
|
||||
$ echo rdma 2050 > /proc/fs/nfsd/portlist
|
||||
$ echo rdma 20049 > /proc/fs/nfsd/portlist
|
||||
|
||||
- On the client system
|
||||
|
||||
@@ -263,7 +263,7 @@ NFS/RDMA Setup
|
||||
Regardless of how the client was built (module or built-in), use this
|
||||
command to mount the NFS/RDMA server:
|
||||
|
||||
$ mount -o rdma,port=2050 <IPoIB-server-name-or-address>:/<export> /mnt
|
||||
$ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt
|
||||
|
||||
To verify that the mount is using RDMA, run "cat /proc/mounts" and check
|
||||
the "proto" field for the given mount.
|
||||
|
||||
@@ -1371,292 +1371,8 @@ auto_msgmni default value is 1.
|
||||
2.4 /proc/sys/vm - The virtual memory subsystem
|
||||
-----------------------------------------------
|
||||
|
||||
The files in this directory can be used to tune the operation of the virtual
|
||||
memory (VM) subsystem of the Linux kernel.
|
||||
|
||||
vfs_cache_pressure
|
||||
------------------
|
||||
|
||||
Controls the tendency of the kernel to reclaim the memory which is used for
|
||||
caching of directory and inode objects.
|
||||
|
||||
At the default value of vfs_cache_pressure=100 the kernel will attempt to
|
||||
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
|
||||
swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
|
||||
to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100
|
||||
causes the kernel to prefer to reclaim dentries and inodes.
|
||||
|
||||
dirty_background_bytes
|
||||
----------------------
|
||||
|
||||
Contains the amount of dirty memory at which the pdflush background writeback
|
||||
daemon will start writeback.
|
||||
|
||||
If dirty_background_bytes is written, dirty_background_ratio becomes a function
|
||||
of its value (dirty_background_bytes / the amount of dirtyable system memory).
|
||||
|
||||
dirty_background_ratio
|
||||
----------------------
|
||||
|
||||
Contains, as a percentage of the dirtyable system memory (free pages + mapped
|
||||
pages + file cache, not including locked pages and HugePages), the number of
|
||||
pages at which the pdflush background writeback daemon will start writing out
|
||||
dirty data.
|
||||
|
||||
If dirty_background_ratio is written, dirty_background_bytes becomes a function
|
||||
of its value (dirty_background_ratio * the amount of dirtyable system memory).
|
||||
|
||||
dirty_bytes
|
||||
-----------
|
||||
|
||||
Contains the amount of dirty memory at which a process generating disk writes
|
||||
will itself start writeback.
|
||||
|
||||
If dirty_bytes is written, dirty_ratio becomes a function of its value
|
||||
(dirty_bytes / the amount of dirtyable system memory).
|
||||
|
||||
dirty_ratio
|
||||
-----------
|
||||
|
||||
Contains, as a percentage of the dirtyable system memory (free pages + mapped
|
||||
pages + file cache, not including locked pages and HugePages), the number of
|
||||
pages at which a process which is generating disk writes will itself start
|
||||
writing out dirty data.
|
||||
|
||||
If dirty_ratio is written, dirty_bytes becomes a function of its value
|
||||
(dirty_ratio * the amount of dirtyable system memory).
|
||||
|
||||
dirty_writeback_centisecs
|
||||
-------------------------
|
||||
|
||||
The pdflush writeback daemons will periodically wake up and write `old' data
|
||||
out to disk. This tunable expresses the interval between those wakeups, in
|
||||
100'ths of a second.
|
||||
|
||||
Setting this to zero disables periodic writeback altogether.
|
||||
|
||||
dirty_expire_centisecs
|
||||
----------------------
|
||||
|
||||
This tunable is used to define when dirty data is old enough to be eligible
|
||||
for writeout by the pdflush daemons. It is expressed in 100'ths of a second.
|
||||
Data which has been dirty in-memory for longer than this interval will be
|
||||
written out next time a pdflush daemon wakes up.
|
||||
|
||||
highmem_is_dirtyable
|
||||
--------------------
|
||||
|
||||
Only present if CONFIG_HIGHMEM is set.
|
||||
|
||||
This defaults to 0 (false), meaning that the ratios set above are calculated
|
||||
as a percentage of lowmem only. This protects against excessive scanning
|
||||
in page reclaim, swapping and general VM distress.
|
||||
|
||||
Setting this to 1 can be useful on 32 bit machines where you want to make
|
||||
random changes within an MMAPed file that is larger than your available
|
||||
lowmem without causing large quantities of random IO. Is is safe if the
|
||||
behavior of all programs running on the machine is known and memory will
|
||||
not be otherwise stressed.
|
||||
|
||||
legacy_va_layout
|
||||
----------------
|
||||
|
||||
If non-zero, this sysctl disables the new 32-bit mmap mmap layout - the kernel
|
||||
will use the legacy (2.4) layout for all processes.
|
||||
|
||||
lowmem_reserve_ratio
|
||||
---------------------
|
||||
|
||||
For some specialised workloads on highmem machines it is dangerous for
|
||||
the kernel to allow process memory to be allocated from the "lowmem"
|
||||
zone. This is because that memory could then be pinned via the mlock()
|
||||
system call, or by unavailability of swapspace.
|
||||
|
||||
And on large highmem machines this lack of reclaimable lowmem memory
|
||||
can be fatal.
|
||||
|
||||
So the Linux page allocator has a mechanism which prevents allocations
|
||||
which _could_ use highmem from using too much lowmem. This means that
|
||||
a certain amount of lowmem is defended from the possibility of being
|
||||
captured into pinned user memory.
|
||||
|
||||
(The same argument applies to the old 16 megabyte ISA DMA region. This
|
||||
mechanism will also defend that region from allocations which could use
|
||||
highmem or lowmem).
|
||||
|
||||
The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is
|
||||
in defending these lower zones.
|
||||
|
||||
If you have a machine which uses highmem or ISA DMA and your
|
||||
applications are using mlock(), or if you are running with no swap then
|
||||
you probably should change the lowmem_reserve_ratio setting.
|
||||
|
||||
The lowmem_reserve_ratio is an array. You can see them by reading this file.
|
||||
-
|
||||
% cat /proc/sys/vm/lowmem_reserve_ratio
|
||||
256 256 32
|
||||
-
|
||||
Note: # of this elements is one fewer than number of zones. Because the highest
|
||||
zone's value is not necessary for following calculation.
|
||||
|
||||
But, these values are not used directly. The kernel calculates # of protection
|
||||
pages for each zones from them. These are shown as array of protection pages
|
||||
in /proc/zoneinfo like followings. (This is an example of x86-64 box).
|
||||
Each zone has an array of protection pages like this.
|
||||
|
||||
-
|
||||
Node 0, zone DMA
|
||||
pages free 1355
|
||||
min 3
|
||||
low 3
|
||||
high 4
|
||||
:
|
||||
:
|
||||
numa_other 0
|
||||
protection: (0, 2004, 2004, 2004)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
pagesets
|
||||
cpu: 0 pcp: 0
|
||||
:
|
||||
-
|
||||
These protections are added to score to judge whether this zone should be used
|
||||
for page allocation or should be reclaimed.
|
||||
|
||||
In this example, if normal pages (index=2) are required to this DMA zone and
|
||||
pages_high is used for watermark, the kernel judges this zone should not be
|
||||
used because pages_free(1355) is smaller than watermark + protection[2]
|
||||
(4 + 2004 = 2008). If this protection value is 0, this zone would be used for
|
||||
normal page requirement. If requirement is DMA zone(index=0), protection[0]
|
||||
(=0) is used.
|
||||
|
||||
zone[i]'s protection[j] is calculated by following expression.
|
||||
|
||||
(i < j):
|
||||
zone[i]->protection[j]
|
||||
= (total sums of present_pages from zone[i+1] to zone[j] on the node)
|
||||
/ lowmem_reserve_ratio[i];
|
||||
(i = j):
|
||||
(should not be protected. = 0;
|
||||
(i > j):
|
||||
(not necessary, but looks 0)
|
||||
|
||||
The default values of lowmem_reserve_ratio[i] are
|
||||
256 (if zone[i] means DMA or DMA32 zone)
|
||||
32 (others).
|
||||
As above expression, they are reciprocal number of ratio.
|
||||
256 means 1/256. # of protection pages becomes about "0.39%" of total present
|
||||
pages of higher zones on the node.
|
||||
|
||||
If you would like to protect more pages, smaller values are effective.
|
||||
The minimum value is 1 (1/1 -> 100%).
|
||||
|
||||
page-cluster
|
||||
------------
|
||||
|
||||
page-cluster controls the number of pages which are written to swap in
|
||||
a single attempt. The swap I/O size.
|
||||
|
||||
It is a logarithmic value - setting it to zero means "1 page", setting
|
||||
it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
|
||||
|
||||
The default value is three (eight pages at a time). There may be some
|
||||
small benefits in tuning this to a different value if your workload is
|
||||
swap-intensive.
|
||||
|
||||
overcommit_memory
|
||||
-----------------
|
||||
|
||||
Controls overcommit of system memory, possibly allowing processes
|
||||
to allocate (but not use) more memory than is actually available.
|
||||
|
||||
|
||||
0 - Heuristic overcommit handling. Obvious overcommits of
|
||||
address space are refused. Used for a typical system. It
|
||||
ensures a seriously wild allocation fails while allowing
|
||||
overcommit to reduce swap usage. root is allowed to
|
||||
allocate slightly more memory in this mode. This is the
|
||||
default.
|
||||
|
||||
1 - Always overcommit. Appropriate for some scientific
|
||||
applications.
|
||||
|
||||
2 - Don't overcommit. The total address space commit
|
||||
for the system is not permitted to exceed swap plus a
|
||||
configurable percentage (default is 50) of physical RAM.
|
||||
Depending on the percentage you use, in most situations
|
||||
this means a process will not be killed while attempting
|
||||
to use already-allocated memory but will receive errors
|
||||
on memory allocation as appropriate.
|
||||
|
||||
overcommit_ratio
|
||||
----------------
|
||||
|
||||
Percentage of physical memory size to include in overcommit calculations
|
||||
(see above.)
|
||||
|
||||
Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100)
|
||||
|
||||
swapspace = total size of all swap areas
|
||||
physmem = size of physical memory in system
|
||||
|
||||
nr_hugepages and hugetlb_shm_group
|
||||
----------------------------------
|
||||
|
||||
nr_hugepages configures number of hugetlb page reserved for the system.
|
||||
|
||||
hugetlb_shm_group contains group id that is allowed to create SysV shared
|
||||
memory segment using hugetlb page.
|
||||
|
||||
hugepages_treat_as_movable
|
||||
--------------------------
|
||||
|
||||
This parameter is only useful when kernelcore= is specified at boot time to
|
||||
create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages
|
||||
are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero
|
||||
value written to hugepages_treat_as_movable allows huge pages to be allocated
|
||||
from ZONE_MOVABLE.
|
||||
|
||||
Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge
|
||||
pages pool can easily grow or shrink within. Assuming that applications are
|
||||
not running that mlock() a lot of memory, it is likely the huge pages pool
|
||||
can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value
|
||||
into nr_hugepages and triggering page reclaim.
|
||||
|
||||
laptop_mode
|
||||
-----------
|
||||
|
||||
laptop_mode is a knob that controls "laptop mode". All the things that are
|
||||
controlled by this knob are discussed in Documentation/laptops/laptop-mode.txt.
|
||||
|
||||
block_dump
|
||||
----------
|
||||
|
||||
block_dump enables block I/O debugging when set to a nonzero value. More
|
||||
information on block I/O debugging is in Documentation/laptops/laptop-mode.txt.
|
||||
|
||||
swap_token_timeout
|
||||
------------------
|
||||
|
||||
This file contains valid hold time of swap out protection token. The Linux
|
||||
VM has token based thrashing control mechanism and uses the token to prevent
|
||||
unnecessary page faults in thrashing situation. The unit of the value is
|
||||
second. The value would be useful to tune thrashing behavior.
|
||||
|
||||
drop_caches
|
||||
-----------
|
||||
|
||||
Writing to this will cause the kernel to drop clean caches, dentries and
|
||||
inodes from memory, causing that memory to become free.
|
||||
|
||||
To free pagecache:
|
||||
echo 1 > /proc/sys/vm/drop_caches
|
||||
To free dentries and inodes:
|
||||
echo 2 > /proc/sys/vm/drop_caches
|
||||
To free pagecache, dentries and inodes:
|
||||
echo 3 > /proc/sys/vm/drop_caches
|
||||
|
||||
As this is a non-destructive operation and dirty objects are not freeable, the
|
||||
user should run `sync' first.
|
||||
Please see: Documentation/sysctls/vm.txt for a description of these
|
||||
entries.
|
||||
|
||||
|
||||
2.5 /proc/sys/dev - Device specific parameters
|
||||
@@ -2311,6 +2027,34 @@ increase the likelihood of this process being killed by the oom-killer. Valid
|
||||
values are in the range -16 to +15, plus the special value -17, which disables
|
||||
oom-killing altogether for this process.
|
||||
|
||||
The process to be killed in an out-of-memory situation is selected among all others
|
||||
based on its badness score. This value equals the original memory size of the process
|
||||
and is then updated according to its CPU time (utime + stime) and the
|
||||
run time (uptime - start time). The longer it runs the smaller is the score.
|
||||
Badness score is divided by the square root of the CPU time and then by
|
||||
the double square root of the run time.
|
||||
|
||||
Swapped out tasks are killed first. Half of each child's memory size is added to
|
||||
the parent's score if they do not share the same memory. Thus forking servers
|
||||
are the prime candidates to be killed. Having only one 'hungry' child will make
|
||||
parent less preferable than the child.
|
||||
|
||||
/proc/<pid>/oom_score shows process' current badness score.
|
||||
|
||||
The following heuristics are then applied:
|
||||
* if the task was reniced, its score doubles
|
||||
* superuser or direct hardware access tasks (CAP_SYS_ADMIN, CAP_SYS_RESOURCE
|
||||
or CAP_SYS_RAWIO) have their score divided by 4
|
||||
* if oom condition happened in one cpuset and checked task does not belong
|
||||
to it, its score is divided by 8
|
||||
* the resulting score is multiplied by two to the power of oom_adj, i.e.
|
||||
points <<= oom_adj when it is positive and
|
||||
points >>= -(oom_adj) otherwise
|
||||
|
||||
The task with the highest badness score is then selected and its children
|
||||
are killed, process itself will be killed in an OOM situation when it does
|
||||
not have children or some of them disabled oom like described above.
|
||||
|
||||
2.13 /proc/<pid>/oom_score - Display current oom-killer score
|
||||
-------------------------------------------------------------
|
||||
|
||||
|
||||
@@ -0,0 +1,87 @@
|
||||
This describes the interface for the ADT7475 driver:
|
||||
|
||||
(there are 4 fans, numbered fan1 to fan4):
|
||||
|
||||
fanX_input Read the current speed of the fan (in RPMs)
|
||||
fanX_min Read/write the minimum speed of the fan. Dropping
|
||||
below this sets an alarm.
|
||||
|
||||
(there are three PWMs, numbered pwm1 to pwm3):
|
||||
|
||||
pwmX Read/write the current duty cycle of the PWM. Writes
|
||||
only have effect when auto mode is turned off (see
|
||||
below). Range is 0 - 255.
|
||||
|
||||
pwmX_enable Fan speed control method:
|
||||
|
||||
0 - No control (fan at full speed)
|
||||
1 - Manual fan speed control (using pwm[1-*])
|
||||
2 - Automatic fan speed control
|
||||
|
||||
pwmX_auto_channels_temp Select which channels affect this PWM
|
||||
|
||||
1 - TEMP1 controls PWM
|
||||
2 - TEMP2 controls PWM
|
||||
4 - TEMP3 controls PWM
|
||||
6 - TEMP2 and TEMP3 control PWM
|
||||
7 - All three inputs control PWM
|
||||
|
||||
pwmX_freq Read/write the PWM frequency in Hz. The number
|
||||
should be one of the following:
|
||||
|
||||
11 Hz
|
||||
14 Hz
|
||||
22 Hz
|
||||
29 Hz
|
||||
35 Hz
|
||||
44 Hz
|
||||
58 Hz
|
||||
88 Hz
|
||||
|
||||
pwmX_auto_point1_pwm Read/write the minimum PWM duty cycle in automatic mode
|
||||
|
||||
pwmX_auto_point2_pwm Read/write the maximum PWM duty cycle in automatic mode
|
||||
|
||||
(there are three temperature settings numbered temp1 to temp3):
|
||||
|
||||
tempX_input Read the current temperature. The value is in milli
|
||||
degrees of Celsius.
|
||||
|
||||
tempX_max Read/write the upper temperature limit - exceeding this
|
||||
will cause an alarm.
|
||||
|
||||
tempX_min Read/write the lower temperature limit - exceeding this
|
||||
will cause an alarm.
|
||||
|
||||
tempX_offset Read/write the temperature adjustment offset
|
||||
|
||||
tempX_crit Read/write the THERM limit for remote1.
|
||||
|
||||
tempX_crit_hyst Set the temperature value below crit where the
|
||||
fans will stay on - this helps drive the temperature
|
||||
low enough so it doesn't stay near the edge and
|
||||
cause THERM to keep tripping.
|
||||
|
||||
tempX_auto_point1_temp Read/write the minimum temperature where the fans will
|
||||
turn on in automatic mode.
|
||||
|
||||
tempX_auto_point2_temp Read/write the maximum temperature over which the fans
|
||||
will run in automatic mode. tempX_auto_point1_temp
|
||||
and tempX_auto_point2_temp together define the
|
||||
range of automatic control.
|
||||
|
||||
tempX_alarm Read a 1 if the max/min alarm is set
|
||||
tempX_fault Read a 1 if either temp1 or temp3 diode has a fault
|
||||
|
||||
(There are two voltage settings, in1 and in2):
|
||||
|
||||
inX_input Read the current voltage on VCC. Value is in
|
||||
millivolts.
|
||||
|
||||
inX_min read/write the minimum voltage limit.
|
||||
Dropping below this causes an alarm.
|
||||
|
||||
inX_max read/write the maximum voltage limit.
|
||||
Exceeding this causes an alarm.
|
||||
|
||||
inX_alarm Read a 1 if the max/min alarm is set.
|
||||
@@ -13,18 +13,21 @@ Author:
|
||||
Description
|
||||
-----------
|
||||
|
||||
This driver provides support for the accelerometer found in various HP laptops
|
||||
sporting the feature officially called "HP Mobile Data Protection System 3D" or
|
||||
"HP 3D DriveGuard". It detect automatically laptops with this sensor. Known models
|
||||
(for now the HP 2133, nc6420, nc2510, nc8510, nc84x0, nw9440 and nx9420) will
|
||||
have their axis automatically oriented on standard way (eg: you can directly
|
||||
play neverball). The accelerometer data is readable via
|
||||
This driver provides support for the accelerometer found in various HP
|
||||
laptops sporting the feature officially called "HP Mobile Data
|
||||
Protection System 3D" or "HP 3D DriveGuard". It detect automatically
|
||||
laptops with this sensor. Known models (for now the HP 2133, nc6420,
|
||||
nc2510, nc8510, nc84x0, nw9440 and nx9420) will have their axis
|
||||
automatically oriented on standard way (eg: you can directly play
|
||||
neverball). The accelerometer data is readable via
|
||||
/sys/devices/platform/lis3lv02d.
|
||||
|
||||
Sysfs attributes under /sys/devices/platform/lis3lv02d/:
|
||||
position - 3D position that the accelerometer reports. Format: "(x,y,z)"
|
||||
calibrate - read: values (x, y, z) that are used as the base for input class device operation.
|
||||
write: forces the base to be recalibrated with the current position.
|
||||
calibrate - read: values (x, y, z) that are used as the base for input
|
||||
class device operation.
|
||||
write: forces the base to be recalibrated with the current
|
||||
position.
|
||||
rate - reports the sampling rate of the accelerometer device in HZ
|
||||
|
||||
This driver also provides an absolute input class device, allowing
|
||||
@@ -39,11 +42,12 @@ the accelerometer are converted into a "standard" organisation of the axes
|
||||
* When the laptop is horizontal the position reported is about 0 for X and Y
|
||||
and a positive value for Z
|
||||
* If the left side is elevated, X increases (becomes positive)
|
||||
* If the front side (where the touchpad is) is elevated, Y decreases (becomes negative)
|
||||
* If the front side (where the touchpad is) is elevated, Y decreases
|
||||
(becomes negative)
|
||||
* If the laptop is put upside-down, Z becomes negative
|
||||
|
||||
If your laptop model is not recognized (cf "dmesg"), you can send an email to the
|
||||
authors to add it to the database. When reporting a new laptop, please include
|
||||
the output of "dmidecode" plus the value of /sys/devices/platform/lis3lv02d/position
|
||||
in these four cases.
|
||||
If your laptop model is not recognized (cf "dmesg"), you can send an
|
||||
email to the authors to add it to the database. When reporting a new
|
||||
laptop, please include the output of "dmidecode" plus the value of
|
||||
/sys/devices/platform/lis3lv02d/position in these four cases.
|
||||
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user