Mirror of https://github.com/linux-apfs/linux-apfs.git (synced 2026-05-01 15:00:59 -07:00)
Merge branches 'topic/fix/asoc', 'topic/fix/hda', 'topic/fix/misc' and 'topic/pci-ioremap-bar' into for-linus
@@ -66,6 +66,7 @@ Kenneth W Chen <kenneth.w.chen@intel.com>
Koushik <raghavendra.koushik@neterion.com>
Leonid I Ananiev <leonid.i.ananiev@intel.com>
Linas Vepstas <linas@austin.ibm.com>
Mark Brown <broonie@sirena.org.uk>
Matthieu CASTET <castet.matthieu@free.fr>
Michael Buesch <mb@bu3sch.de>
Michael Buesch <mbuesch@freenet.de>
@@ -1653,14 +1653,14 @@ S: Chapel Hill, North Carolina 27514-4818
S: USA

N: Dave Jones
E: davej@codemonkey.org.uk
E: davej@redhat.com
W: http://www.codemonkey.org.uk
D: x86 errata/setup maintenance.
D: AGPGART driver.
D: Assorted VIA x86 support.
D: 2.5 AGPGART overhaul.
D: CPUFREQ maintenance.
D: Backport/Forwardport merge monkey.
D: Various Janitor work.
S: United Kingdom
D: Fedora kernel maintainence.
D: Misc/Other.
S: 314 Littleton Rd, Westford, MA 01886, USA

N: Martin Josfsson
E: gandalf@wlug.westbo.se
@@ -1105,7 +1105,7 @@ static struct block_device_operations opt_fops = {
</listitem>
<listitem>
<para>
Function names as strings (__FUNCTION__).
Function names as strings (__func__).
</para>
</listitem>
<listitem>
@@ -236,10 +236,8 @@ software system can set different pages for controlling accesses to the
MSI-X structure. The implementation of MSI support requires the PCI
subsystem, not a device driver, to maintain full control of the MSI-X
table/MSI-X PBA (Pending Bit Array) and MMIO address space of the MSI-X
table/MSI-X PBA. A device driver is prohibited from requesting the MMIO
address space of the MSI-X table/MSI-X PBA. Otherwise, the PCI subsystem
will fail enabling MSI-X on its hardware device when it calls the function
pci_enable_msix().
table/MSI-X PBA. A device driver should not access the MMIO address
space of the MSI-X table/MSI-X PBA.

5.3.2 API pci_enable_msix

@@ -163,6 +163,10 @@ need pass only as many optional fields as necessary:
o class and classmask fields default to 0
o driver_data defaults to 0UL.

Note that driver_data must match the value used by any of the pci_device_id
entries defined in the driver. This makes the driver_data field mandatory
if all the pci_device_id entries have a non-zero driver_data value.

Once added, the driver probe routine will be invoked for any unclaimed
PCI devices listed in its (newly updated) pci_ids list.

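Reviewer sketch (not part of the commit): because the fields written to a driver's new_id file are positional, supplying driver_data means spelling out every field before it. The helper below is hypothetical; it only formats the line. It assumes the field order vendor, device, subvendor, subdevice, class, classmask, driver_data, with subvendor and subdevice left at PCI_ANY_ID (ffffffff) and class and classmask at 0. The IDs and the driver name in the usage note are placeholders.

```shell
# Hypothetical helper, not a kernel tool: build a full seven-field new_id line
# so that a non-zero driver_data value can be passed.
# $1: vendor ID (hex), $2: device ID (hex), $3: driver_data (hex, default 0)
pci_new_id_line() {
    vendor=$1
    device=$2
    data=${3:-0}
    # subvendor/subdevice = PCI_ANY_ID, class/classmask = 0
    printf '%s %s ffffffff ffffffff 0 0 %s\n' "$vendor" "$device" "$data"
}
```

Possible use, as root, with placeholder values: pci_new_id_line 8086 10f5 1 > /sys/bus/pci/drivers/<driver>/new_id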
@@ -203,22 +203,17 @@ to mmio_enabled.

3.3 helper functions

3.3.1 int pci_find_aer_capability(struct pci_dev *dev);
pci_find_aer_capability locates the PCI Express AER capability
in the device configuration space. If the device doesn't support
PCI-Express AER, the function returns 0.

3.3.2 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
3.3.1 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
pci_enable_pcie_error_reporting enables the device to send error
messages to root port when an error is detected. Note that devices
don't enable the error reporting by default, so device drivers need
call this function to enable it.

3.3.3 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
3.3.2 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
pci_disable_pcie_error_reporting disables the device to send error
messages to root port when an error is detected.

3.3.4 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
3.3.3 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable
error status register.

@@ -0,0 +1,99 @@
The cgroup freezer is useful to batch job management system which start
and stop sets of tasks in order to schedule the resources of a machine
according to the desires of a system administrator. This sort of program
is often used on HPC clusters to schedule access to the cluster as a
whole. The cgroup freezer uses cgroups to describe the set of tasks to
be started/stopped by the batch job management system. It also provides
a means to start and stop the tasks composing the job.

The cgroup freezer will also be useful for checkpointing running groups
of tasks. The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state. Once the tasks are quiescent another task can
walk /proc or invoke a kernel interface to gather information about the
quiesced tasks. Checkpointed tasks can be restarted later should a
recoverable error occur. This also allows the checkpointed tasks to be
migrated between nodes in a cluster by copying the gathered information
to another node and restarting the tasks there.

Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
and resuming tasks in userspace. Both of these signals are observable
from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
blocked, or ignored it can be seen by waiting or ptracing parent tasks.
SIGCONT is especially unsuitable since it can be caught by the task. Any
programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
demonstrate this problem using nested bash shells:

$ echo $$
16644
$ bash
$ echo $$
16690

From a second, unrelated bash shell:
$ kill -SIGSTOP 16690
$ kill -SIGCONT 16990

<at this point 16990 exits and causes 16644 to exit too>

This happens because bash can observe both signals and choose how it
responds to them.

Another example of a program which catches and responds to these
signals is gdb. In fact any program designed to use ptrace is likely to
have a problem with this method of stopping and resuming tasks.

In contrast, the cgroup freezer uses the kernel freezer code to
prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as
expected.

The freezer subsystem in the container filesystem defines a file named
freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
cgroup. Subsequently writing "THAWED" will unfreeze the tasks in the cgroup.
Reading will return the current state.

* Examples of usage :

# mkdir /containers/freezer
# mount -t cgroup -ofreezer freezer /containers
# mkdir /containers/0
# echo $some_pid > /containers/0/tasks

to get status of the freezer subsystem :

# cat /containers/0/freezer.state
THAWED

to freeze all tasks in the container :

# echo FROZEN > /containers/0/freezer.state
# cat /containers/0/freezer.state
FREEZING
# cat /containers/0/freezer.state
FROZEN

to unfreeze all tasks in the container :

# echo THAWED > /containers/0/freezer.state
# cat /containers/0/freezer.state
THAWED

This is the basic mechanism which should do the right thing for user space task
in a simple scenario.

It's important to note that freezing can be incomplete. In that case we return
EBUSY. This means that some tasks in the cgroup are busy doing something that
prevents us from completely freezing the cgroup at this time. After EBUSY,
the cgroup will remain partially frozen -- reflected by freezer.state reporting
"FREEZING" when read. The state will remain "FREEZING" until one of these
things happens:

1) Userspace cancels the freezing operation by writing "THAWED" to
the freezer.state file
2) Userspace retries the freezing operation by writing "FROZEN" to
the freezer.state file (writing "FREEZING" is not legal
and returns EIO)
3) The tasks that blocked the cgroup from entering the "FROZEN"
state disappear from the cgroup's set of tasks.
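Reviewer sketch (not part of the commit): the retry and cancel options listed at the end of the freezer text can be combined into a small loop. freeze_cgroup is a hypothetical helper name, and the attempt count and one-second delay are arbitrary assumptions; only the freezer.state file and the FROZEN/THAWED values come from the text above.

```shell
# Hypothetical helper, not part of the kernel or any standard tool.
# $1: cgroup directory containing freezer.state (for example /containers/0)
# $2: maximum number of attempts (assumed default: 5)
freeze_cgroup() {
    cg=$1
    tries=${2:-5}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # The write may fail with EBUSY; ignore that and inspect the state.
        echo FROZEN > "$cg/freezer.state" 2>/dev/null || true
        state=$(cat "$cg/freezer.state")
        if [ "$state" = "FROZEN" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    # Give up: cancel the partial freeze by writing THAWED.
    echo THAWED > "$cg/freezer.state"
    return 1
}
```

Possible use, as root: freeze_cgroup /containers/0 10 || echo "cgroup could not be frozen"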
@@ -112,14 +112,22 @@ the per cgroup LRU.

2.2.1 Accounting details

All mapped pages (RSS) and unmapped user pages (Page Cache) are accounted.
RSS pages are accounted at the time of page_add_*_rmap() unless they've already
been accounted for earlier. A file page will be accounted for as Page Cache;
it's mapped into the page tables of a process, duplicate accounting is carefully
avoided. Page Cache pages are accounted at the time of add_to_page_cache().
The corresponding routines that remove a page from the page tables or removes
a page from Page Cache is used to decrement the accounting counters of the
cgroup.
All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
(some pages which never be reclaimable and will not be on global LRU
are not accounted. we just accounts pages under usual vm management.)

RSS pages are accounted at page_fault unless they've already been accounted
for earlier. A file page will be accounted for as Page Cache when it's
inserted into inode (radix-tree). While it's mapped into the page tables of
processes, duplicate accounting is carefully avoided.

A RSS page is unaccounted when it's fully unmapped. A PageCache page is
unaccounted when it's removed from radix-tree.

At page migration, accounting information is kept.

Note: we just account pages-on-lru because our purpose is to control amount
of used pages. not-on-lru pages are tend to be out-of-control from vm view.

2.3 Shared Page Accounting

@@ -48,7 +48,7 @@ hooks, beyond what is already present, required to manage dynamic
job placement on large systems.

Cpusets use the generic cgroup subsystem described in
Documentation/cgroup.txt.
Documentation/cgroups/cgroups.txt.

Requests by a task, using the sched_setaffinity(2) system call to
include CPUs in its CPU affinity mask, and using the mbind(2) and
@@ -96,6 +96,11 @@ errors=remount-ro(*) Remount the filesystem read-only on an error.
errors=continue Keep going on a filesystem error.
errors=panic Panic and halt the machine if an error occurs.

data_err=ignore(*) Just print an error message if an error occurs
in a file data buffer in ordered mode.
data_err=abort Abort the journal if an error occurs in a file
data buffer in ordered mode.

grpid Give objects the same group ID as their creator.
bsdgroups

@@ -1384,15 +1384,18 @@ causes the kernel to prefer to reclaim dentries and inodes.
dirty_background_ratio
----------------------

Contains, as a percentage of total system memory, the number of pages at which
the pdflush background writeback daemon will start writing out dirty data.
Contains, as a percentage of the dirtyable system memory (free pages + mapped
pages + file cache, not including locked pages and HugePages), the number of
pages at which the pdflush background writeback daemon will start writing out
dirty data.

dirty_ratio
-----------------

Contains, as a percentage of total system memory, the number of pages at which
a process which is generating disk writes will itself start writing out dirty
data.
Contains, as a percentage of the dirtyable system memory (free pages + mapped
pages + file cache, not including locked pages and HugePages), the number of
pages at which a process which is generating disk writes will itself start
writing out dirty data.

dirty_writeback_centisecs
-------------------------
@@ -2412,24 +2415,29 @@ will be dumped when the <pid> process is dumped. coredump_filter is a bitmask
of memory types. If a bit of the bitmask is set, memory segments of the
corresponding memory type are dumped, otherwise they are not dumped.

The following 4 memory types are supported:
The following 7 memory types are supported:
- (bit 0) anonymous private memory
- (bit 1) anonymous shared memory
- (bit 2) file-backed private memory
- (bit 3) file-backed shared memory
- (bit 4) ELF header pages in file-backed private memory areas (it is
effective only if the bit 2 is cleared)
- (bit 5) hugetlb private memory
- (bit 6) hugetlb shared memory

Note that MMIO pages such as frame buffer are never dumped and vDSO pages
are always dumped regardless of the bitmask status.

Default value of coredump_filter is 0x3; this means all anonymous memory
segments are dumped.
Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
effected by bit 5-6.

Default value of coredump_filter is 0x23; this means all anonymous memory
segments and hugetlb private memory are dumped.

If you don't want to dump all shared memory segments attached to pid 1234,
write 1 to the process's proc file.
write 0x21 to the process's proc file.

$ echo 0x1 > /proc/1234/coredump_filter
$ echo 0x21 > /proc/1234/coredump_filter

When a new process is created, the process inherits the bitmask status from its
parent. It is useful to set up coredump_filter before the program runs.
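Reviewer sketch (not part of the commit): the coredump_filter values quoted in this hunk follow directly from the bit list. The hypothetical helper below ORs together the bit numbers it is given and prints the mask in hex, which makes it easy to see where 0x23 (bits 0, 1 and 5) and 0x21 (bits 0 and 5) come from.

```shell
# Hypothetical helper: build a coredump_filter mask from bit numbers.
# Arguments: one or more bit positions (0-6 in the list above).
coredump_mask() {
    mask=0
    for bit in "$@"; do
        mask=$((mask | (1 << bit)))
    done
    printf '0x%x\n' "$mask"
}
```

Possible use, reusing the pid from the text: coredump_mask 0 5 > /proc/1234/coredump_filter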
@@ -86,6 +86,15 @@ norm_unmount (*) commit on unmount; the journal is committed
fast_unmount do not commit on unmount; this option makes
unmount faster, but the next mount slower
because of the need to replay the journal.
bulk_read read more in one go to take advantage of flash
media that read faster sequentially
no_bulk_read (*) do not bulk-read
no_chk_data_crc skip checking of CRCs on data nodes in order to
improve read performance. Use this option only
if the flash media is highly reliable. The effect
of this option is that corruption of the contents
of a file can go unnoticed.
chk_data_crc (*) do not skip checking CRCs on data nodes


Quick usage instructions
@@ -101,6 +101,7 @@ parameter is applicable:
X86-64 X86-64 architecture is enabled.
More X86-64 boot options can be found in
Documentation/x86_64/boot-options.txt .
X86 Either 32bit or 64bit x86 (same as X86-32+X86-64)

In addition, the following text indicates that the option:

@@ -690,7 +691,7 @@ and is between 256 and 4096 characters. It is defined in the file
See Documentation/block/as-iosched.txt and
Documentation/block/deadline-iosched.txt for details.

elfcorehdr= [X86-32, X86_64]
elfcorehdr= [IA64,PPC,SH,X86-32,X86_64]
Specifies physical address of start of kernel core
image elf header. Generally kexec loader will
pass this option to capture kernel.
@@ -796,6 +797,8 @@ and is between 256 and 4096 characters. It is defined in the file
Defaults to the default architecture's huge page size
if not specified.

hlt [BUGS=ARM,SH]

i8042.debug [HW] Toggle i8042 debug mode
i8042.direct [HW] Put keyboard port into non-translated mode
i8042.dumbkbd [HW] Pretend that controller can only read data from
@@ -1211,6 +1214,10 @@ and is between 256 and 4096 characters. It is defined in the file
mem=nopentium [BUGS=X86-32] Disable usage of 4MB pages for kernel
memory.

memchunk=nn[KMG]
[KNL,SH] Allow user to override the default size for
per-device physically contiguous DMA buffers.

memmap=exactmap [KNL,X86-32,X86_64] Enable setting of an exact
E820 memory map, as specified by the user.
Such memmap=exactmap lines can be constructed based on
@@ -1393,6 +1400,8 @@ and is between 256 and 4096 characters. It is defined in the file

nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects.

nodsp [SH] Disable hardware DSP at boot time.

noefi [X86-32,X86-64] Disable EFI runtime services support.

noexec [IA-64]
@@ -1409,13 +1418,15 @@ and is between 256 and 4096 characters. It is defined in the file
noexec32=off: disable non-executable mappings
read implies executable mappings

nofpu [SH] Disable hardware FPU at boot time.

nofxsr [BUGS=X86-32] Disables x86 floating point extended
register save and restore. The kernel will only save
legacy floating-point registers on task switch.

noclflush [BUGS=X86] Don't use the CLFLUSH instruction

nohlt [BUGS=ARM]
nohlt [BUGS=ARM,SH]

no-hlt [BUGS=X86-32] Tells the kernel that the hlt
instruction doesn't work correctly and not to
@@ -1578,7 +1589,7 @@ and is between 256 and 4096 characters. It is defined in the file
See also Documentation/paride.txt.

pci=option[,option...] [PCI] various PCI subsystem options:
off [X86-32] don't probe for the PCI bus
off [X86] don't probe for the PCI bus
bios [X86-32] force use of PCI BIOS, don't access
the hardware directly. Use this if your machine
has a non-standard PCI host bridge.
@@ -1586,9 +1597,9 @@ and is between 256 and 4096 characters. It is defined in the file
hardware access methods are allowed. Use this
if you experience crashes upon bootup and you
suspect they are caused by the BIOS.
conf1 [X86-32] Force use of PCI Configuration
conf1 [X86] Force use of PCI Configuration
Mechanism 1.
conf2 [X86-32] Force use of PCI Configuration
conf2 [X86] Force use of PCI Configuration
Mechanism 2.
noaer [PCIE] If the PCIEAER kernel config parameter is
enabled, this kernel boot option can be used to
@@ -1608,37 +1619,37 @@ and is between 256 and 4096 characters. It is defined in the file
this option if the kernel is unable to allocate
IRQs or discover secondary PCI buses on your
motherboard.
rom [X86-32] Assign address space to expansion ROMs.
rom [X86] Assign address space to expansion ROMs.
Use with caution as certain devices share
address decoders between ROMs and other
resources.
norom [X86-32,X86_64] Do not assign address space to
norom [X86] Do not assign address space to
expansion ROMs that do not already have
BIOS assigned address ranges.
irqmask=0xMMMM [X86-32] Set a bit mask of IRQs allowed to be
irqmask=0xMMMM [X86] Set a bit mask of IRQs allowed to be
assigned automatically to PCI devices. You can
make the kernel exclude IRQs of your ISA cards
this way.
pirqaddr=0xAAAAA [X86-32] Specify the physical address
pirqaddr=0xAAAAA [X86] Specify the physical address
of the PIRQ table (normally generated
by the BIOS) if it is outside the
F0000h-100000h range.
lastbus=N [X86-32] Scan all buses thru bus #N. Can be
lastbus=N [X86] Scan all buses thru bus #N. Can be
useful if the kernel is unable to find your
secondary buses and you want to tell it
explicitly which ones they are.
assign-busses [X86-32] Always assign all PCI bus
assign-busses [X86] Always assign all PCI bus
numbers ourselves, overriding
whatever the firmware may have done.
usepirqmask [X86-32] Honor the possible IRQ mask stored
usepirqmask [X86] Honor the possible IRQ mask stored
in the BIOS $PIR table. This is needed on
some systems with broken BIOSes, notably
some HP Pavilion N5400 and Omnibook XE3
notebooks. This will have no effect if ACPI
IRQ routing is enabled.
noacpi [X86-32] Do not use ACPI for IRQ routing
noacpi [X86] Do not use ACPI for IRQ routing
or for PCI scanning.
use_crs [X86-32] Use _CRS for PCI resource
use_crs [X86] Use _CRS for PCI resource
allocation.
routeirq Do IRQ routing for all PCI devices.
This is normally done in pci_enable_device(),
@@ -1667,6 +1678,12 @@ and is between 256 and 4096 characters. It is defined in the file
reserved for the CardBus bridge's memory
window. The default value is 64 megabytes.

pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power
Management.
off Disable ASPM.
force Enable ASPM even on devices that claim not to support it.
WARNING: Forcing ASPM on may cause system lockups.

pcmv= [HW,PCMCIA] BadgePAD 4

pd. [PARIDE]
@@ -50,10 +50,12 @@ Connecting a function (probe) to a marker is done by providing a probe (function
to call) for the specific marker through marker_probe_register() and can be
activated by calling marker_arm(). Marker deactivation can be done by calling
marker_disarm() as many times as marker_arm() has been called. Removing a probe
is done through marker_probe_unregister(); it will disarm the probe and make
sure there is no caller left using the probe when it returns. Probe removal is
preempt-safe because preemption is disabled around the probe call. See the
"Probe example" section below for a sample probe module.
is done through marker_probe_unregister(); it will disarm the probe.
marker_synchronize_unregister() must be called before the end of the module exit
function to make sure there is no caller left using the probe. This, and the
fact that preemption is disabled around the probe call, make sure that probe
removal and module unload are safe. See the "Probe example" section below for a
sample probe module.

The marker mechanism supports inserting multiple instances of the same marker.
Markers can be put in inline functions, inlined static functions, and
File diff suppressed because it is too large
@@ -95,7 +95,9 @@ On all - write a character to /proc/sysrq-trigger. e.g.:

'p' - Will dump the current registers and flags to your console.

'q' - Will dump a list of all running timers.
'q' - Will dump per CPU lists of all armed hrtimers (but NOT regular
timer_list timers) and detailed information about all
clockevent devices.

'r' - Turns off keyboard raw mode and sets it to XLATE.

@@ -0,0 +1,101 @@
Using the Linux Kernel Tracepoints

Mathieu Desnoyers


This document introduces Linux Kernel Tracepoints and their use. It provides
examples of how to insert tracepoints in the kernel and connect probe functions
to them and provides some examples of probe functions.


* Purpose of tracepoints

A tracepoint placed in code provides a hook to call a function (probe) that you
can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or
"off" (no probe is attached). When a tracepoint is "off" it has no effect,
except for adding a tiny time penalty (checking a condition for a branch) and
space penalty (adding a few bytes for the function call at the end of the
instrumented function and adds a data structure in a separate section). When a
tracepoint is "on", the function you provide is called each time the tracepoint
is executed, in the execution context of the caller. When the function provided
ends its execution, it returns to the caller (continuing from the tracepoint
site).

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters,
which prototypes are described in a tracepoint declaration placed in a header
file.

They can be used for tracing and performance accounting.


* Usage

Two elements are required for tracepoints :

- A tracepoint definition, placed in a header file.
- The tracepoint statement, in C code.

In order to use tracepoints, you should include linux/tracepoint.h.

In include/trace/subsys.h :

#include <linux/tracepoint.h>

DEFINE_TRACE(subsys_eventname,
TPPTOTO(int firstarg, struct task_struct *p),
TPARGS(firstarg, p));

In subsys/file.c (where the tracing statement must be added) :

#include <trace/subsys.h>

void somefct(void)
{
...
trace_subsys_eventname(arg, task);
...
}

Where :
- subsys_eventname is an identifier unique to your event
- subsys is the name of your subsystem.
- eventname is the name of the event to trace.
- TPPTOTO(int firstarg, struct task_struct *p) is the prototype of the function
called by this tracepoint.
- TPARGS(firstarg, p) are the parameters names, same as found in the prototype.

Connecting a function (probe) to a tracepoint is done by providing a probe
(function to call) for the specific tracepoint through
register_trace_subsys_eventname(). Removing a probe is done through
unregister_trace_subsys_eventname(); it will remove the probe sure there is no
caller left using the probe when it returns. Probe removal is preempt-safe
because preemption is disabled around the probe call. See the "Probe example"
section below for a sample probe module.

The tracepoint mechanism supports inserting multiple instances of the same
tracepoint, but a single definition must be made of a given tracepoint name over
all the kernel to make sure no type conflict will occur. Name mangling of the
tracepoints is done using the prototypes to make sure typing is correct.
Verification of probe type correctness is done at the registration site by the
compiler. Tracepoints can be put in inline functions, inlined static functions,
and unrolled loops as well as regular functions.

The naming scheme "subsys_event" is suggested here as a convention intended
to limit collisions. Tracepoint names are global to the kernel: they are
considered as being the same whether they are in the core kernel image or in
modules.


* Probe / tracepoint example

See the example provided in samples/tracepoints/src

Compile them with your kernel.

Run, as root :
modprobe tracepoint-example (insmod order is not important)
modprobe tracepoint-probe-example
cat /proc/tracepoint-example (returns an expected error)
rmmod tracepoint-example tracepoint-probe-example
dmesg
@@ -36,7 +36,7 @@ $ mount -t debugfs debugfs /debug
$ echo mmiotrace > /debug/tracing/current_tracer
$ cat /debug/tracing/trace_pipe > mydump.txt &
Start X or whatever.
$ echo "X is up" > /debug/tracing/marker
$ echo "X is up" > /debug/tracing/trace_marker
$ echo none > /debug/tracing/current_tracer
Check for lost events.

@@ -59,9 +59,8 @@ The 'cat' process should stay running (sleeping) in the background.
Load the driver you want to trace and use it. Mmiotrace will only catch MMIO
accesses to areas that are ioremapped while mmiotrace is active.

[Unimplemented feature:]
During tracing you can place comments (markers) into the trace by
$ echo "X is up" > /debug/tracing/marker
$ echo "X is up" > /debug/tracing/trace_marker
This makes it easier to see which part of the (huge) trace corresponds to
which action. It is recommended to place descriptive markers about what you
do.
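Reviewer sketch (not part of the commit): since the hunk recommends descriptive markers, a small wrapper can make them habitual. The function name mark and the TRACE_MARKER variable are hypothetical; the default path assumes debugfs is mounted at /debug, as in the commands above.

```shell
# Assumption: debugfs is mounted at /debug; override TRACE_MARKER otherwise.
TRACE_MARKER=${TRACE_MARKER:-/debug/tracing/trace_marker}

# Hypothetical helper: write all arguments as one marker line into the trace.
mark() {
    printf '%s\n' "$*" > "$TRACE_MARKER"
}
```

Possible use during a tracing session: mark "X is up"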
File diff suppressed because it is too large
Some files were not shown because too many files have changed in this diff