mirror of https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
Merge branch 'master' into upstream
@@ -0,0 +1,96 @@
#
# This list is used by git-shortlog to fix a few botched name translations
# in the git archive, either because the author's full name was messed up
# and/or not always written the same way, making contributions from the
# same person appearing not to be so or badly displayed.
#
# repo-abbrev: /pub/scm/linux/kernel/git/
#

Aaron Durbin <adurbin@google.com>
Adam Oldham <oldhamca@gmail.com>
Adam Radford <aradford@gmail.com>
Adrian Bunk <bunk@stusta.de>
Alan Cox <alan@lxorguk.ukuu.org.uk>
Alan Cox <root@hraefn.swansea.linux.org.uk>
Aleksey Gorelov <aleksey_gorelov@phoenix.com>
Al Viro <viro@ftp.linux.org.uk>
Al Viro <viro@zenIV.linux.org.uk>
Andreas Herrmann <aherrman@de.ibm.com>
Andrew Morton <akpm@osdl.org>
Andrew Vasquez <andrew.vasquez@qlogic.com>
Andy Adamson <andros@citi.umich.edu>
Arnaud Patard <arnaud.patard@rtp-net.org>
Arnd Bergmann <arnd@arndb.de>
Axel Dyks <xl@xlsigned.net>
Ben Gardner <bgardner@wabtec.com>
Ben M Cahill <ben.m.cahill@intel.com>
Björn Steinbrink <B.Steinbrink@gmx.de>
Brian Avery <b.avery@hp.com>
Brian King <brking@us.ibm.com>
Christoph Hellwig <hch@lst.de>
Corey Minyard <minyard@acm.org>
David Brownell <david-b@pacbell.net>
David Woodhouse <dwmw2@shinybook.infradead.org>
Domen Puncer <domen@coderock.org>
Douglas Gilbert <dougg@torque.net>
Ed L. Cashin <ecashin@coraid.com>
Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Felipe W Damasio <felipewd@terra.com.br>
Felix Kuhling <fxkuehl@gmx.de>
Felix Moeller <felix@derklecks.de>
Filipe Lautert <filipe@icewall.org>
Franck Bui-Huu <vagabon.xyz@gmail.com>
Frank Zago <fzago@systemfabricworks.com>
Greg Kroah-Hartman <greg@echidna.(none)>
Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman <greg@kroah.com>
Henk Vergonet <Henk.Vergonet@gmail.com>
Henrik Kretzschmar <henne@nachtwindheim.de>
Herbert Xu <herbert@gondor.apana.org.au>
Jacob Shin <Jacob.Shin@amd.com>
James Bottomley <jejb@mulgrave.(none)>
James Bottomley <jejb@titanic.il.steeleye.com>
James E Wilson <wilson@specifix.com>
James Ketrenos <jketreno@io.(none)>
Jean Tourrilhes <jt@hpl.hp.com>
Jeff Garzik <jgarzik@pretzel.yyz.us>
Jens Axboe <axboe@suse.de>
Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
John Stultz <johnstul@us.ibm.com>
Juha Yrjola <at solidboot.com>
Juha Yrjola <juha.yrjola@nokia.com>
Juha Yrjola <juha.yrjola@solidboot.com>
Kay Sievers <kay.sievers@vrfy.org>
Kenneth W Chen <kenneth.w.chen@intel.com>
Koushik <raghavendra.koushik@neterion.com>
Leonid I Ananiev <leonid.i.ananiev@intel.com>
Linas Vepstas <linas@austin.ibm.com>
Matthieu CASTET <castet.matthieu@free.fr>
Michel Dänzer <michel@tungstengraphics.com>
Mitesh shah <mshah@teja.com>
Morten Welinder <terra@gnome.org>
Morten Welinder <welinder@anemone.rentec.com>
Morten Welinder <welinder@darter.rentec.com>
Morten Welinder <welinder@troll.com>
Nguyen Anh Quynh <aquynh@gmail.com>
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Patrick Mochel <mochel@digitalimplant.org>
Peter A Jonsson <pj@ludd.ltu.se>
Praveen BP <praveenbp@ti.com>
Rajesh Shah <rajesh.shah@intel.com>
Ralf Baechle <ralf@linux-mips.org>
Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Rémi Denis-Courmont <rdenis@simphalempin.com>
Rudolf Marek <R.Marek@sh.cvut.cz>
Rui Saraiva <rmps@joel.ist.utl.pt>
Sachin P Sant <ssant@in.ibm.com>
Sam Ravnborg <sam@mars.ravnborg.org>
Simon Kelley <simon@thekelleys.org.uk>
Stéphane Witzmann <stephane.witzmann@ubpmes.univ-bpclermont.fr>
Stephen Hemminger <shemminger@osdl.org>
Tejun Heo <htejun@gmail.com>
Thomas Graf <tgraf@suug.ch>
Tony Luck <tony.luck@intel.com>
Tsuneo Yoshioka <Tsuneo.Yoshioka@f-secure.com>
Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
+10 -7
@@ -78,7 +78,8 @@ Identifying GPIOs
 -----------------
 GPIOs are identified by unsigned integers in the range 0..MAX_INT. That
 reserves "negative" numbers for other purposes like marking signals as
-"not available on this board", or indicating faults.
+"not available on this board", or indicating faults. Code that doesn't
+touch the underlying hardware treats these integers as opaque cookies.

 Platforms define how they use those integers, and usually #define symbols
 for the GPIO lines so that board-specific setup code directly corresponds
@@ -139,10 +140,10 @@ issues including wire-OR and output latencies.
 The get/set calls have no error returns because "invalid GPIO" should have
 been reported earlier in gpio_set_direction(). However, note that not all
 platforms can read the value of output pins; those that can't should always
-return zero. Also, these calls will be ignored for GPIOs that can't safely
-be accessed wihtout sleeping (see below).
+return zero. Also, using these calls for GPIOs that can't safely be accessed
+without sleeping (see below) is an error.

-Platform-specific implementations are encouraged to optimise the two
+Platform-specific implementations are encouraged to optimize the two
 calls to access the GPIO value in cases where the GPIO number (and for
 output, value) are constant. It's normal for them to need only a couple
 of instructions in such cases (reading or writing a hardware register),
@@ -239,7 +240,8 @@ options are part of the IRQ interface, e.g. IRQF_TRIGGER_FALLING, as are
 system wakeup capabilities.

 Non-error values returned from irq_to_gpio() would most commonly be used
-with gpio_get_value().
+with gpio_get_value(), for example to initialize or update driver state
+when the IRQ is edge-triggered.
@@ -260,9 +262,10 @@ pullups (or pulldowns) so that the on-chip ones should not be used.
 There are other system-specific mechanisms that are not specified here,
 like the aforementioned options for input de-glitching and wire-OR output.
 Hardware may support reading or writing GPIOs in gangs, but that's usually
-configuration dependednt: for GPIOs sharing the same bank. (GPIOs are
+configuration dependent: for GPIOs sharing the same bank. (GPIOs are
 commonly grouped in banks of 16 or 32, with a given SOC having several such
-banks.) Code relying on such mechanisms will necessarily be nonportable.
+banks.) Some systems can trigger IRQs from output GPIOs. Code relying on
+such mechanisms will necessarily be nonportable.

 Dynamic definition of GPIOs is not currently supported; for example, as
 a side effect of configuring an add-on board with some GPIO expanders.
@@ -0,0 +1,68 @@
timer_stats - timer usage statistics
------------------------------------

timer_stats is a debugging facility to make the timer (ab)usage in a Linux
system visible to kernel and userspace developers. It is not intended for
production usage as it adds significant overhead to the (hr)timer code and the
(hr)timer data structures.

timer_stats should be used by kernel and userspace developers to verify that
their code does not make undue use of timers. This helps to avoid unnecessary
wakeups, which should be avoided to optimize power consumption.

It can be enabled by CONFIG_TIMER_STATS in the "Kernel hacking" configuration
section.

timer_stats collects information about the timer events which are fired in a
Linux system over a sample period:

- the pid of the task (process) which initialized the timer
- the name of the process which initialized the timer
- the function where the timer was initialized
- the callback function which is associated with the timer
- the number of events (callbacks)

timer_stats adds an entry to /proc: /proc/timer_stats

This entry is used to control the statistics functionality and to read out the
sampled information.

The timer_stats functionality is inactive on bootup.

To activate a sample period, issue:
        # echo 1 >/proc/timer_stats

To stop a sample period, issue:
        # echo 0 >/proc/timer_stats

The statistics can be retrieved by:
        # cat /proc/timer_stats

The readout of /proc/timer_stats automatically disables sampling. The sampled
information is kept until a new sample period is started. This allows multiple
readouts.

Sample output of /proc/timer_stats:

Timerstats sample period: 3.888770 s
  12,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
  15,     1 swapper          hcd_submit_urb (rh_timer_func)
   4,   959 kedac            schedule_timeout (process_timeout)
   1,     0 swapper          page_writeback_init (wb_timer_fn)
  28,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
  22,  2948 IRQ 4            tty_flip_buffer_push (delayed_work_timer_fn)
   3,  3100 bash             schedule_timeout (process_timeout)
   1,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
   1,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
   1,     1 swapper          neigh_table_init_no_netlink (neigh_periodic_timer)
   1,  2292 ip               __netdev_watchdog_up (dev_watchdog)
   1,    23 events/1         do_cache_clean (delayed_work_timer_fn)
90 total events, 30.0 events/sec

The first column is the number of events, the second column the pid, the third
column is the name of the process. The fourth column shows the function which
initialized the timer and, in parentheses, the callback function which was
executed on expiry.

Thomas, Ingo
@@ -0,0 +1,249 @@
High resolution timers and dynamic ticks design notes
-----------------------------------------------------

Further information can be found in the paper of the OLS 2006 talk "hrtimers
and beyond". The paper is part of the OLS 2006 Proceedings Volume 1, which can
be found on the OLS website:
http://www.linuxsymposium.org/2006/linuxsymposium_procv1.pdf

The slides to this talk are available from:
http://tglx.de/projects/hrtimers/ols2006-hrtimers.pdf

The slides contain five figures (pages 2, 15, 18, 20, 22), which illustrate the
changes in the time(r) related Linux subsystems. Figure #1 (p. 2) shows the
design of the Linux time(r) system before hrtimers and other building blocks
got merged into mainline.

Note: the paper and the slides talk about "clock event source", while we
switched to the name "clock event devices" in the meantime.

The design contains the following basic building blocks:

- hrtimer base infrastructure
- timeofday and clock source management
- clock event management
- high resolution timer functionality
- dynamic ticks


hrtimer base infrastructure
---------------------------

The hrtimer base infrastructure was merged into the 2.6.16 kernel. Details of
the base implementation are covered in Documentation/hrtimers/hrtimer.txt. See
also figure #2 (OLS slides p. 15).

The main differences from the timer wheel, which holds the armed timer_list type
timers, are:
- time ordered enqueueing into a rb-tree
- independent of ticks (the processing is based on nanoseconds)


timeofday and clock source management
-------------------------------------

John Stultz's Generic Time Of Day (GTOD) framework moves a large portion of
code out of the architecture-specific areas into a generic management
framework, as illustrated in figure #3 (OLS slides p. 18). The architecture
specific portion is reduced to the low level hardware details of the clock
sources, which are registered in the framework and selected on a quality based
decision. The low level code provides hardware setup and readout routines and
initializes data structures, which are used by the generic time keeping code to
convert the clock ticks to nanosecond based time values. All other time keeping
related functionality is moved into the generic code. The GTOD base patch got
merged into the 2.6.18 kernel.

Further information about the Generic Time Of Day framework is available in the
OLS 2005 Proceedings Volume 1:
http://www.linuxsymposium.org/2005/linuxsymposium_procv1.pdf

The paper "We Are Not Getting Any Younger: A New Approach to Time and
Timers" was written by J. Stultz, D.V. Hart, & N. Aravamudan.

Figure #3 (OLS slides p.18) illustrates the transformation.


clock event management
----------------------

While clock sources provide read access to the monotonically increasing time
value, clock event devices are used to schedule the next event
interrupt(s). The next event is currently defined to be periodic, with its
period defined at compile time. The setup and selection of the event device
for various event driven functionalities is hardwired into the architecture
dependent code. This results in duplicated code across all architectures and
makes it extremely difficult to change the configuration of the system to use
event interrupt devices other than those already built into the
architecture. Another implication of the current design is that it is necessary
to touch all the architecture-specific implementations in order to provide new
functionality like high resolution timers or dynamic ticks.

The clock events subsystem tries to address this problem by providing a generic
solution to manage clock event devices and their usage for the various clock
event driven kernel functionalities. The goal of the clock event subsystem is
to minimize the clock event related architecture dependent code to the pure
hardware related handling and to allow easy addition and utilization of new
clock event devices. It also minimizes the duplicated code across the
architectures as it provides generic functionality down to the interrupt
service handler, which is almost inherently hardware dependent.

Clock event devices are registered either by the architecture dependent boot
code or at module insertion time. Each clock event device fills a data
structure with clock-specific property parameters and callback functions. The
clock event management decides, by using the specified property parameters, the
set of system functions a clock event device will be used to support. This
includes the distinction of per-CPU and per-system global event devices.

System-level global event devices are used for the Linux periodic tick. Per-CPU
event devices are used to provide local CPU functionality such as process
accounting, profiling, and high resolution timers.

The management layer assigns one or more of the following functions to a clock
event device:
- system global periodic tick (jiffies update)
- cpu local update_process_times
- cpu local profiling
- cpu local next event interrupt (non periodic mode)

The clock event device delegates the selection of those timer interrupt related
functions completely to the management layer. The clock management layer stores
a function pointer in the device description structure, which has to be called
from the hardware level handler. This removes a lot of duplicated code from the
architecture specific timer interrupt handlers and hands the control over the
clock event devices and the assignment of timer interrupt related functionality
to the core code.

The clock event layer API is rather small. Aside from the clock event device
registration interface, it provides functions to schedule the next event
interrupt, a clock event device notification service and support for suspend
and resume.

The framework adds about 700 lines of code, which results in a 2KB increase of
the kernel binary size. The conversion of i386 removes about 100 lines of
code; the binary size decrease is in the range of 400 bytes. We believe that the
increase of flexibility and the avoidance of duplicated code across
architectures justify the slight increase of the binary size.

The conversion of an architecture has no functional impact, but allows it to
utilize the high resolution and dynamic tick functionalities without any change
to the clock event device and timer interrupt code. After the conversion, the
enabling of high resolution timers and dynamic ticks is simply provided by
adding the kernel/time/Kconfig file to the architecture specific Kconfig and
adding the dynamic tick specific calls to the idle routine (a total of 3 lines
added to the idle function and the Kconfig file).

Figure #4 (OLS slides p.20) illustrates the transformation.


high resolution timer functionality
-----------------------------------

During system boot it is not possible to use the high resolution timer
functionality, and making it possible would be difficult and would serve no
useful function. The initialization of the clock event device framework, the
clock source framework (GTOD) and hrtimers themselves has to be done, and
appropriate clock sources and clock event devices have to be registered, before
the high resolution functionality can work. Up to the point where hrtimers are
initialized, the system works in the usual low resolution periodic mode. The
clock source and the clock event device layers provide notification functions
which inform hrtimers about availability of new hardware. hrtimers validates
the usability of the registered clock sources and clock event devices before
switching to high resolution mode. This also ensures that a kernel which is
configured for high resolution timers can run on a system which lacks the
necessary hardware support.

The high resolution timer code does not support SMP machines which have only
global clock event devices. The support of such hardware would involve IPI
calls when an interrupt happens. The overhead would be much larger than the
benefit. This is the reason why we currently disable high resolution and
dynamic ticks on i386 SMP systems which stop the local APIC in C3 power
state. A workaround is available as an idea, but the problem has not been
tackled yet.

The time ordered insertion of timers provides all the infrastructure to decide
whether the event device has to be reprogrammed when a timer is added. The
decision is made per timer base and synchronized across per-cpu timer bases in
a support function. The design allows the system to utilize separate per-CPU
clock event devices for the per-CPU timer bases, but currently only one
reprogrammable clock event device per-CPU is utilized.

When the timer interrupt happens, the next event interrupt handler is called
from the clock event distribution code and moves expired timers from the
red-black tree to a separate doubly linked list and invokes the softirq
handler. An additional mode field in the hrtimer structure allows the system to
execute callback functions directly from the next event interrupt handler. This
is restricted to code which can safely be executed in the hard interrupt
context. This applies, for example, to the common case of a wakeup function as
used by nanosleep. The advantage of executing the handler in the interrupt
context is the avoidance of up to two context switches: from the interrupted
context to the softirq, and from there to the task which is woken up by the
expired timer.

Once a system has switched to high resolution mode, the periodic tick is
switched off. This disables the per system global periodic clock event device,
e.g. the PIT on i386 SMP systems.

The periodic tick functionality is provided by a per-cpu hrtimer. The callback
function is executed in the next event interrupt context and updates jiffies
and calls update_process_times and profiling. The implementation of the hrtimer
based periodic tick is designed to be extended with dynamic tick functionality.
This allows a single clock event device to be used to schedule high resolution
timer and periodic events (jiffies tick, profiling, process accounting) on UP
systems. This has been proved to work with the PIT on i386 and the Incrementer
on PPC.

The softirq for running the hrtimer queues and executing the callbacks has been
separated from the tick bound timer softirq to allow accurate delivery of high
resolution timer signals which are used by itimer and POSIX interval
timers. The execution of this softirq can still be delayed by other softirqs,
but the overall latencies have been significantly improved by this separation.

Figure #5 (OLS slides p.22) illustrates the transformation.


dynamic ticks
-------------

Dynamic ticks are the logical consequence of the hrtimer based periodic tick
replacement (sched_tick). The functionality of the sched_tick hrtimer is
extended by three functions:

- hrtimer_stop_sched_tick
- hrtimer_restart_sched_tick
- hrtimer_update_jiffies

hrtimer_stop_sched_tick() is called when a CPU goes into idle state. The code
evaluates the next scheduled timer event (from both hrtimers and the timer
wheel) and, in case the next event is further away than the next tick, it
reprograms the sched_tick to this future event, to allow longer idle sleeps
without needless interruption by the periodic tick. The function is also
called when an interrupt happens during the idle period which does not cause a
reschedule. The call is necessary as the interrupt handler might have armed a
new timer whose expiry time is before the time which was identified as the
nearest event in the previous call to hrtimer_stop_sched_tick.

hrtimer_restart_sched_tick() is called when the CPU leaves the idle state before
it calls schedule(). hrtimer_restart_sched_tick() resumes the periodic tick,
which is kept active until the next call to hrtimer_stop_sched_tick().

hrtimer_update_jiffies() is called from irq_enter() when an interrupt happens
in the idle period to make sure that jiffies are up to date and the interrupt
handler does not have to deal with a possibly stale jiffy value.

The dynamic tick feature provides statistical values which are exported to
userspace via /proc/stats and can be made available for enhanced power
management control.

The implementation leaves room for further development like full tickless
systems, where the time slice is controlled by the scheduler, variable
frequency profiling, and a complete removal of jiffies in the future.


Aside from the current initial submission of i386 support, the patchset has
already been extended to x86_64 and ARM. Initial (work in progress) support is
also available for MIPS and PowerPC.

Thomas, Ingo
@@ -48,14 +48,9 @@ following:
 The SMBus controller is function 3 in device 1f. Class 0c05 is SMBus Serial
 Controller.

-If you do NOT see the 24x3 device at function 3, and you can't figure out
-any way in the BIOS to enable it,
-
 The ICH chips are quite similar to Intel's PIIX4 chip, at least in the
 SMBus controller.

-See the file i2c-piix4 for some additional information.
-

 Process Call Support
 --------------------
@@ -74,6 +69,61 @@ SMBus 2.0 Support
|
|||||||
|
|
||||||
The 82801DB (ICH4) and later chips support several SMBus 2.0 features.
|
The 82801DB (ICH4) and later chips support several SMBus 2.0 features.
|
||||||
|
|
||||||
|
|
||||||
|
Hidden ICH SMBus
|
||||||
|
----------------
|
||||||
|
|
||||||
|
If your system has an Intel ICH south bridge, but you do NOT see the
|
||||||
|
SMBus device at 00:1f.3 in lspci, and you can't figure out any way in the
|
||||||
|
BIOS to enable it, it means it has been hidden by the BIOS code. Asus is
|
||||||
|
well known for first doing this on their P4B motherboard, and many other
|
||||||
|
boards after that. Some vendor machines are affected as well.
|
||||||
|
|
||||||
|
The first thing to try is the "i2c_ec" ACPI driver. It could be that the
|
||||||
|
SMBus was hidden on purpose because it'll be driven by ACPI. If the
|
||||||
|
i2c_ec driver works for you, just forget about the i2c-i801 driver and
|
||||||
|
don't try to unhide the ICH SMBus. Even if i2c_ec doesn't work, you
|
||||||
|
better make sure that the SMBus isn't used by the ACPI code. Try loading
|
||||||
|
the "fan" and "thermal" drivers, and check in /proc/acpi/fan and
|
||||||
|
/proc/acpi/thermal_zone. If you find anything there, it's likely that
|
||||||
|
the ACPI is accessing the SMBus and it's safer not to unhide it. Only
|
||||||
|
+once you are certain that ACPI isn't using the SMBus, you can attempt
+to unhide it.
+
+In order to unhide the SMBus, we need to change the value of a PCI
+register before the kernel enumerates the PCI devices. This is done in
+drivers/pci/quirks.c, where all affected boards must be listed (see
+function asus_hides_smbus_hostbridge.) If the SMBus device is missing,
+and you think there's something interesting on the SMBus (e.g. a
+hardware monitoring chip), you need to add your board to the list.
+
+The motherboard is identified using the subvendor and subdevice IDs of the
+host bridge PCI device. Get yours with "lspci -n -v -s 00:00.0":
+
+00:00.0 Class 0600: 8086:2570 (rev 02)
+        Subsystem: 1043:80f2
+        Flags: bus master, fast devsel, latency 0
+        Memory at fc000000 (32-bit, prefetchable) [size=32M]
+        Capabilities: [e4] #09 [2106]
+        Capabilities: [a0] AGP version 3.0
+
+Here the host bridge ID is 2570 (82865G/PE/P), the subvendor ID is 1043
+(Asus) and the subdevice ID is 80f2 (P4P800-X). You can find the symbolic
+names for the bridge ID and the subvendor ID in include/linux/pci_ids.h,
+and then add a case for your subdevice ID at the right place in
+drivers/pci/quirks.c. Then please give it very good testing, to make sure
+that the unhidden SMBus doesn't conflict with e.g. ACPI.
+
+If it works, proves useful (i.e. there are usable chips on the SMBus)
+and seems safe, please submit a patch for inclusion into the kernel.
+
+Note: There's a useful script in lm_sensors 2.10.2 and later, named
+unhide_ICH_SMBus (in prog/hotplug), which uses the fakephp driver to
+temporarily unhide the SMBus without having to patch and recompile your
+kernel. It's very convenient if you just want to check if there's
+anything interesting on your hidden ICH SMBus.
+
+
 **********************
 The lm_sensors project gratefully acknowledges the support of Texas
 Instruments in the initial development of this driver.
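The board check described in this hunk (match the Intel host bridge, then the Asus subvendor ID, then one case per subdevice ID) can be sketched in plain C. This is a hypothetical userspace illustration, not the code in drivers/pci/quirks.c: struct host_bridge_ids, board_hides_smbus() and the macro names are invented here, and only the IDs from the lspci example (8086:2570, subsystem 1043:80f2) come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* IDs taken from the lspci example in the text. */
#define PCI_VENDOR_INTEL    0x8086
#define PCI_DEVICE_82865_HB 0x2570	/* 82865G/PE/P host bridge */
#define PCI_SUBVENDOR_ASUS  0x1043

/* Hypothetical stand-in for the host bridge fields the check needs. */
struct host_bridge_ids {
	uint16_t vendor;
	uint16_t device;
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
};

/* Return true if this board is listed as hiding its ICH SMBus. */
bool board_hides_smbus(const struct host_bridge_ids *hb)
{
	if (hb->vendor != PCI_VENDOR_INTEL ||
	    hb->subsystem_vendor != PCI_SUBVENDOR_ASUS)
		return false;

	switch (hb->device) {
	case PCI_DEVICE_82865_HB:
		switch (hb->subsystem_device) {
		case 0x80f2:	/* P4P800-X */
			return true;
		default:
			return false;
		}
	default:
		return false;
	}
}
```

Supporting another board in this sketch means adding one more case label under the matching host bridge ID, which mirrors the "add a case for your subdevice ID" instruction in the text.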
@@ -19,6 +19,7 @@ It currently supports the following devices:
 * (type=4) Analog Devices ADM1032 evaluation board
 * (type=5) Analog Devices evaluation boards: ADM1025, ADM1030, ADM1031
 * (type=6) Barco LPT->DVI (K5800236) adapter
+* (type=7) One For All JP1 parallel port adapter
 
 These devices use different pinout configurations, so you have to tell
 the driver what you have, using the type module parameter. There is no
@@ -157,3 +158,17 @@ many more, using /dev/velleman.
 http://home.wanadoo.nl/hihihi/libk8005.htm
 http://struyve.mine.nu:8080/index.php?block=k8000
 http://sourceforge.net/projects/libk8005/
+
+
+One For All JP1 parallel port adapter
+-------------------------------------
+
+The JP1 project revolves around a set of remote controls which expose
+the I2C bus their internal configuration EEPROM lives on via a 6 pin
+jumper in the battery compartment. More details can be found at:
+
+http://www.hifi-remote.com/jp1/
+
+Details of the simple parallel port hardware can be found at:
+
+http://www.hifi-remote.com/jp1/hardware.shtml
@@ -6,7 +6,7 @@ Supported adapters:
   Datasheet: Publicly available at the Intel website
 * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges
   Datasheet: Only available via NDA from ServerWorks
-* ATI IXP southbridges IXP200, IXP300, IXP400
+* ATI IXP200, IXP300, IXP400 and SB600 southbridges
   Datasheet: Not publicly available
 * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge
   Datasheet: Publicly available at the SMSC website http://www.smsc.com
@@ -13,6 +13,9 @@ Supported adapters:
 * VIA Technologies, Inc. VT8235, VT8237R, VT8237A, VT8251
   Datasheet: available on request and under NDA from VIA
 
+* VIA Technologies, Inc. CX700
+  Datasheet: available on request and under NDA from VIA
+
 Authors:
 Kyösti Mälkki <kmalkki@cc.hut.fi>,
 Mark D. Studebaker <mdsxyz123@yahoo.com>,
@@ -44,6 +47,7 @@ Your lspci -n listing must show one of these :
 device 1106:3227 (VT8237R)
 device 1106:3337 (VT8237A)
 device 1106:3287 (VT8251)
+device 1106:8324 (CX700)
 
 If none of these show up, you should look in the BIOS for settings like
 enable ACPI / SMBus or even USB.
@@ -51,3 +55,6 @@ enable ACPI / SMBus or even USB.
 Except for the oldest chips (VT82C596A/B, VT82C686A and most probably
 VT8231), this driver supports I2C block transactions. Such transactions
 are mainly useful to read from and write to EEPROMs.
+
+The CX700 additionally appears to support SMBus PEC, although this driver
+doesn't implement it yet.
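To check an lspci -n listing against the VIA IDs in these hunks, a small lookup table is enough. The sketch below is hypothetical: struct via_smbus_id and via_smbus_lookup() are not part of i2c-viapro, and the table holds only the four IDs visible here (1106:3227, 1106:3337, 1106:3287 and 1106:8324), not the driver's full list.

```c
#include <stddef.h>
#include <stdio.h>

/* The four PCI IDs listed in these hunks (not the driver's full table). */
struct via_smbus_id {
	unsigned int vendor;
	unsigned int device;
	const char *name;
};

static const struct via_smbus_id via_ids[] = {
	{ 0x1106, 0x3227, "VT8237R" },
	{ 0x1106, 0x3337, "VT8237A" },
	{ 0x1106, 0x3287, "VT8251" },
	{ 0x1106, 0x8324, "CX700" },
};

/*
 * Look up a "vendor:device" token as printed by lspci -n, e.g. "1106:8324".
 * Returns the matching table entry, or NULL if two hex fields cannot be
 * parsed or the ID is not listed.
 */
const struct via_smbus_id *via_smbus_lookup(const char *token)
{
	unsigned int vendor, device;
	size_t i;

	if (sscanf(token, "%4x:%4x", &vendor, &device) != 2)
		return NULL;

	for (i = 0; i < sizeof(via_ids) / sizeof(via_ids[0]); i++)
		if (via_ids[i].vendor == vendor && via_ids[i].device == device)
			return &via_ids[i];
	return NULL;
}
```

For example, via_smbus_lookup("1106:8324")->name is "CX700", while an ID that is not in the table yields NULL.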
@@ -129,6 +129,12 @@ Technical changes:
   structure, those name member should be initialized to a driver name
   string. i2c_driver itself has no name member anymore.
 
+* [Driver model] Instead of shutdown or reboot notifiers, provide a
+  shutdown() method in your driver.
+
+* [Power management] Use the driver model suspend() and resume()
+  callbacks instead of the obsolete pm_register() calls.
+
 Coding policy:
 
 * [Copyright] Use (C), not (c), for copyright.
@@ -97,7 +97,7 @@ SMBus Write Word Data
 =====================
 
 This is the opposite operation of the Read Word Data command. 16 bits
-of data is read from a device, from a designated register that is
+of data is written to a device, to the designated register that is
 specified through the Comm byte.
 
 S Addr Wr [A] Comm [A] DataLow [A] DataHigh [A] P
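Between the S (start) and P (stop) conditions of a Write Word Data transaction, the master transmits four bytes, and each [A] acknowledge bit comes from the device. A hypothetical helper that lays out those bytes, assuming a 7-bit slave address and the DataLow-before-DataHigh order shown in the transaction line (smbus_write_word_frame() is invented for illustration and is not a kernel or i2c-tools function):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill buf with the bytes a master transmits for
 *   S Addr Wr [A] Comm [A] DataLow [A] DataHigh [A] P
 * S, P and the [A] bits are bus conditions, not data bytes, so only four
 * bytes are produced. Returns 4, or 0 if addr7 does not fit in 7 bits.
 */
size_t smbus_write_word_frame(uint8_t addr7, uint8_t comm, uint16_t value,
			      uint8_t buf[4])
{
	if (addr7 > 0x7f)
		return 0;

	buf[0] = (uint8_t)(addr7 << 1);		/* R/W bit 0 means write (Wr) */
	buf[1] = comm;				/* register chosen by the Comm byte */
	buf[2] = (uint8_t)(value & 0xff);	/* DataLow goes first */
	buf[3] = (uint8_t)(value >> 8);		/* then DataHigh */
	return 4;
}
```

For a device at 0x48, writing 0x1234 to register 0x01 gives the bytes 0x90 0x01 0x34 0x12.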
@@ -21,20 +21,26 @@ The driver structure
 
 Usually, you will implement a single driver structure, and instantiate
 all clients from it. Remember, a driver structure contains general access
-routines, a client structure specific information like the actual I2C
-address.
+routines, and should be zero-initialized except for fields with data you
+provide. A client structure holds device-specific information like the
+driver model device node, and its I2C address.
 
 static struct i2c_driver foo_driver = {
   .driver = {
     .name = "foo",
   },
-  .attach_adapter = &foo_attach_adapter,
-  .detach_client = &foo_detach_client,
-  .command = &foo_command /* may be NULL */
+  .attach_adapter = foo_attach_adapter,
+  .detach_client = foo_detach_client,
+  .shutdown = foo_shutdown, /* optional */
+  .suspend = foo_suspend, /* optional */
+  .resume = foo_resume, /* optional */
+  .command = foo_command, /* optional */
 }
 
-The name field must match the driver name, including the case. It must not
-contain spaces, and may be up to 31 characters long.
+The name field is the driver name, and must not contain spaces. It
+should match the module name (if the driver can be compiled as a module),
+although you can use MODULE_ALIAS (passing "foo" in this example) to add
+another name for the module.
 
 All other fields are for call-back functions which will be explained
 below.
@@ -43,11 +49,18 @@ below.
 Extra client data
 =================
 
-The client structure has a special `data' field that can point to any
-structure at all. You can use this to keep client-specific data. You
+Each client structure has a special `data' field that can point to any
+structure at all. You should use this to keep device-specific data,
+especially in drivers that handle multiple I2C or SMBUS devices. You
 do not always need this, but especially for `sensors' drivers, it can
 be very useful.
 
+/* store the value */
+void i2c_set_clientdata(struct i2c_client *client, void *data);
+
+/* retrieve the value */
+void *i2c_get_clientdata(struct i2c_client *client);
+
 An example structure is below.
 
 struct foo_data {
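The per-client `data' pointer lets one driver keep separate state for each device it handles. Because i2c_set_clientdata() and i2c_get_clientdata() only exist inside the kernel, the sketch below mimics the pattern in userspace; struct mock_client, mock_set_clientdata(), mock_get_clientdata(), foo_store_reading() and the foo_data fields are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct i2c_client, reduced to what matters here. */
struct mock_client {
	unsigned short addr;	/* I2C address of this device */
	void *data;		/* driver-private, per-device state */
};

void mock_set_clientdata(struct mock_client *client, void *data)
{
	client->data = data;
}

void *mock_get_clientdata(struct mock_client *client)
{
	return client->data;
}

/* Per-device state a sensors driver might keep (fields are illustrative). */
struct foo_data {
	int temp_milli_c;	/* last temperature reading */
	int valid;		/* nonzero once temp_milli_c holds a reading */
};

/* Store a reading in the state of whichever device 'client' refers to. */
void foo_store_reading(struct mock_client *client, int temp_milli_c)
{
	struct foo_data *data = mock_get_clientdata(client);

	data->temp_milli_c = temp_milli_c;
	data->valid = 1;
}
```

With two clients, each pointing at its own foo_data, a reading stored through one client leaves the other device's state untouched.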
@@ -493,6 +506,33 @@ by `__init_data'. Hose functions and structures can be removed after
 kernel booting (or module loading) is completed.
 
+
+Power Management
+================
+
+If your I2C device needs special handling when entering a system low
+power state -- like putting a transceiver into a low power mode, or
+activating a system wakeup mechanism -- do that in the suspend() method.
+The resume() method should reverse what the suspend() method does.
+
+These are standard driver model calls, and they work just like they
+would for any other driver stack. The calls can sleep, and can use
+I2C messaging to the device being suspended or resumed (since their
+parent I2C adapter is active when these calls are issued, and IRQs
+are still enabled).
+
+
+System Shutdown
+===============
+
+If your I2C device needs special handling when the system shuts down
+or reboots (including kexec) -- like turning something off -- use a
+shutdown() method.
+
+Again, this is a standard driver model call, working just like it
+would for any other driver stack: the calls can sleep, and can use
+I2C messaging.
+
 
 Command function
 ================
 
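The rule that resume() should reverse what suspend() does can be illustrated with a simulated device. This is a hypothetical userspace sketch: struct foo_device, its mode register and FOO_MODE_LOW_POWER are invented, and the simplified foo_suspend()/foo_resume() signatures are not the real callback prototypes.

```c
#include <stdint.h>

/* Hypothetical low-power value for an imaginary transceiver's mode register. */
#define FOO_MODE_LOW_POWER 0x00

/* Simulated device: one mode register plus state saved by the driver. */
struct foo_device {
	uint8_t mode_reg;	/* stands in for a hardware register */
	uint8_t saved_mode;	/* what resume must put back */
};

/* suspend: remember the current mode, then enter low-power mode. */
int foo_suspend(struct foo_device *dev)
{
	dev->saved_mode = dev->mode_reg;
	dev->mode_reg = FOO_MODE_LOW_POWER;
	return 0;
}

/* resume: undo exactly what foo_suspend() did. */
int foo_resume(struct foo_device *dev)
{
	dev->mode_reg = dev->saved_mode;
	return 0;
}
```

Saving the pre-suspend state in the driver, rather than assuming a fixed "normal" mode on resume, keeps the two calls exact inverses of each other.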
@@ -104,6 +104,9 @@ loader, and have no meaning to the kernel directly.
 Do not modify the syntax of boot loader parameters without extreme
 need or coordination with <Documentation/i386/boot.txt>.
 
+There are also arch-specific kernel-parameters not documented here.
+See for example <Documentation/x86_64/boot-options.txt>.
+
 Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
 a trailing = on the name of any parameter states that that parameter will
 be entered as an environment variable, whereas its absence indicates that
@@ -361,6 +364,11 @@ and is between 256 and 4096 characters. It is defined in the file
                 clocksource is not available, it defaults to PIT.
                 Format: { pit | tsc | cyclone | pmtmr }
 
+code_bytes      [IA32] How many bytes of object code to print in an
+                oops report.
+                Range: 0 - 8192
+                Default: 64
+
 disable_8254_timer
 enable_8254_timer
                 [IA32/X86_64] Disable/Enable interrupt 0 timer routing
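A sketch of how a value for the new code_bytes option might be parsed, using the documented range (0 - 8192) and default (64). parse_code_bytes() is hypothetical, and clamping over-large values to 8192 is an assumption; the entry itself only states the range.

```c
#include <stdlib.h>

#define CODE_BYTES_DEFAULT 64
#define CODE_BYTES_MAX     8192

/*
 * Return how many bytes of object code to print in an oops report.
 * A missing or non-numeric value gives the default; values above the
 * documented range are clamped to its upper end (an assumption).
 */
unsigned long parse_code_bytes(const char *val)
{
	char *end;
	unsigned long n;

	if (val == NULL || *val == '\0')
		return CODE_BYTES_DEFAULT;

	n = strtoul(val, &end, 0);
	if (end == val)			/* no digits were consumed */
		return CODE_BYTES_DEFAULT;
	return n > CODE_BYTES_MAX ? CODE_BYTES_MAX : n;
}
```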
@@ -601,6 +609,10 @@ and is between 256 and 4096 characters. It is defined in the file
                 highmem otherwise. This also works to reduce highmem
                 size on bigger boxes.
 
+highres=        [KNL] Enable/disable high resolution timer mode.
+                Valid parameters: "on", "off"
+                Default: "on"
+
 hisax=          [HW,ISDN]
                 See Documentation/isdn/README.HiSax.
 
@@ -1070,6 +1082,10 @@ and is between 256 and 4096 characters. It is defined in the file
                 in certain environments such as networked servers or
                 real-time systems.
 
+nohz=           [KNL] Boottime enable/disable dynamic ticks
+                Valid arguments: on, off
+                Default: on
+
 noirqbalance    [IA-32,SMP,KNL] Disable kernel irq balancing
 
 noirqdebug      [IA-32] Disables the code which attempts to detect and
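Both highres= and nohz= take "on" or "off" and default to on. A hypothetical parser for that pattern follows; parse_on_off() is invented for illustration, and keeping the default for unrecognised values is an assumption rather than something the entries specify. It compares with strcmp(), so matching is case sensitive, in line with the note that all kernel parameters are case sensitive.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Parse an "on"/"off" boot option value. A missing value or anything
 * other than exactly "on" or "off" leaves the default in place.
 */
bool parse_on_off(const char *val, bool default_on)
{
	if (val == NULL)
		return default_on;
	if (strcmp(val, "on") == 0)
		return true;
	if (strcmp(val, "off") == 0)
		return false;
	return default_on;
}
```

With the documented default, parse_on_off("off", true) disables the feature, while "ON" is not recognised and leaves it enabled.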
@@ -1334,6 +1334,9 @@ platforms are moved over to use the flattened-device-tree model.
     fsl-usb2-mph compatible controllers. Either this property or
     "port0" (or both) must be defined for "fsl-usb2-mph" compatible
     controllers.
+  - dr_mode : indicates the working mode for "fsl-usb2-dr" compatible
+    controllers. Can be "host", "peripheral", or "otg". Default to
+    "host" if not defined for backward compatibility.
 
   Recommended properties :
   - interrupts : <a b> where a is the interrupt number and b is a
@@ -1367,6 +1370,7 @@ platforms are moved over to use the flattened-device-tree model.
         #size-cells = <0>;
         interrupt-parent = <700>;
         interrupts = <26 1>;
+        dr_mode = "otg";
         phy = "ulpi";
     };
 
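The dr_mode rule (one of "host", "peripheral" or "otg", defaulting to "host" when the property is missing) maps naturally onto an enum. parse_dr_mode() and enum dr_mode below are hypothetical, and returning DR_MODE_INVALID for an unknown string is an assumption, since the binding does not say how such values are handled.

```c
#include <string.h>

enum dr_mode { DR_MODE_HOST, DR_MODE_PERIPHERAL, DR_MODE_OTG, DR_MODE_INVALID };

/*
 * Map the "dr_mode" property of an "fsl-usb2-dr" node to an enum.
 * prop is NULL when the property is absent, which means "host"
 * for backward compatibility.
 */
enum dr_mode parse_dr_mode(const char *prop)
{
	if (prop == NULL || strcmp(prop, "host") == 0)
		return DR_MODE_HOST;
	if (strcmp(prop, "peripheral") == 0)
		return DR_MODE_PERIPHERAL;
	if (strcmp(prop, "otg") == 0)
		return DR_MODE_OTG;
	return DR_MODE_INVALID;
}
```

For the example node with dr_mode = "otg", this returns DR_MODE_OTG; an older tree without the property gets DR_MODE_HOST.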
@@ -1,7 +1,7 @@
-MPC52xx Device Tree Bindings
+MPC5200 Device Tree Bindings
 ----------------------------
 
-(c) 2006 Secret Lab Technologies Ltd
+(c) 2006-2007 Secret Lab Technologies Ltd
 Grant Likely <grant.likely at secretlab.ca>
 
 ********** DRAFT ***********
@@ -20,11 +20,11 @@ described in Documentation/powerpc/booting-without-of.txt), or passed
 by Open Firmare (IEEE 1275) compatible firmware using an OF compatible
 client interface API.
 
-This document specifies the requirements on the device-tree for mpc52xx
+This document specifies the requirements on the device-tree for mpc5200
 based boards. These requirements are above and beyond the details
 specified in either the OpenFirmware spec or booting-without-of.txt
 
-All new mpc52xx-based boards are expected to match this document. In
+All new mpc5200-based boards are expected to match this document. In
 cases where this document is not sufficient to support a new board port,
 this document should be updated as part of adding the new board support.
 
@@ -32,26 +32,26 @@ II - Philosophy
 ===============
 The core of this document is naming convention. The whole point of
 defining this convention is to reduce or eliminate the number of
-special cases required to support a 52xx board. If all 52xx boards
-follow the same convention, then generic 52xx support code will work
+special cases required to support a 5200 board. If all 5200 boards
+follow the same convention, then generic 5200 support code will work
 rather than coding special cases for each new board.
 
 This section tries to capture the thought process behind why the naming
 convention is what it is.
 
-1. Node names
--------------
+1. names
+---------
 There is strong convention/requirements already established for children
 of the root node. 'cpus' describes the processor cores, 'memory'
 describes memory, and 'chosen' provides boot configuration. Other nodes
 are added to describe devices attached to the processor local bus.
-Following convention already established with other system-on-chip
-processors, MPC52xx boards must have an 'soc5200' node as a child of the
-root node.
 
-The soc5200 node holds child nodes for all on chip devices. Child nodes
-are typically named after the configured function. ie. the FEC node is
-named 'ethernet', and a PSC in uart mode is named 'serial'.
+Following convention already established with other system-on-chip
+processors, 5200 device trees should use the name 'soc5200' for the
+parent node of on chip devices, and the root node should be its parent.
+
+Child nodes are typically named after the configured function. ie.
+the FEC node is named 'ethernet', and a PSC in uart mode is named 'serial'.
 
 2. device_type property
 -----------------------
@@ -66,28 +66,47 @@ exactly.
 Since device_type isn't enough to match devices to drivers, there also
 needs to be a naming convention for the compatible property. Compatible
 is an list of device descriptions sorted from specific to generic. For
-the mpc52xx, the required format for each compatible value is
-<chip>-<device>[-<mode>]. At the minimum, the list shall contain two
-items; the first specifying the exact chip, and the second specifying
-mpc52xx for the chip.
+the mpc5200, the required format for each compatible value is
+<chip>-<device>[-<mode>]. The OS should be able to match a device driver
+to the device based solely on the compatible value. If two drivers
+match on the compatible list; the 'most compatible' driver should be
+selected.
 
-ie. ethernet on mpc5200b: compatible = "mpc5200b-ethernet\0mpc52xx-ethernet"
+The split between the MPC5200 and the MPC5200B leaves a bit of a
+connundrum. How should the compatible property be set up to provide
+maximum compatability information; but still acurately describe the
+chip? For the MPC5200; the answer is easy. Most of the SoC devices
+originally appeared on the MPC5200. Since they didn't exist anywhere
+else; the 5200 compatible properties will contain only one item;
+"mpc5200-<device>".
 
-The idea here is that most drivers will match to the most generic field
-in the compatible list (mpc52xx-*), but can also test the more specific
-field for enabling bug fixes or extra features.
+The 5200B is almost the same as the 5200, but not quite. It fixes
+silicon bugs and it adds a small number of enhancements. Most of the
+devices either provide exactly the same interface as on the 5200. A few
+devices have extra functions but still have a backwards compatible mode.
+To express this infomation as completely as possible, 5200B device trees
+should have two items in the compatible list;
+"mpc5200b-<device>\0mpc5200-<device>". It is *strongly* recommended
+that 5200B device trees follow this convention (instead of only listing
+the base mpc5200 item).
+
+If another chip appear on the market with one of the mpc5200 SoC
+devices, then the compatible list should include mpc5200-<device>.
+
+ie. ethernet on mpc5200: compatible = "mpc5200-ethernet"
+    ethernet on mpc5200b: compatible = "mpc5200b-ethernet\0mpc5200-ethernet"
 
 Modal devices, like PSCs, also append the configured function to the
 end of the compatible field. ie. A PSC in i2s mode would specify
-"mpc52xx-psc-i2s", not "mpc52xx-i2s". This convention is chosen to
+"mpc5200-psc-i2s", not "mpc5200-i2s". This convention is chosen to
 avoid naming conflicts with non-psc devices providing the same
-function. For example, "mpc52xx-spi" and "mpc52xx-psc-spi" describe
+function. For example, "mpc5200-spi" and "mpc5200-psc-spi" describe
 the mpc5200 simple spi device and a PSC spi mode respectively.
 
 If the soc device is more generic and present on other SOCs, the
 compatible property can specify the more generic device type also.
 
-ie. mscan: compatible = "mpc5200-mscan\0mpc52xx-mscan\0fsl,mscan";
+ie. mscan: compatible = "mpc5200-mscan\0fsl,mscan";
 
 At the time of writing, exact chip may be either 'mpc5200' or
 'mpc5200b'.
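A compatible property is stored as consecutive NUL-terminated strings, most specific first, so "mpc5200b-ethernet\0mpc5200-ethernet" holds two entries. The sketch below shows one way to read "the 'most compatible' driver should be selected": prefer the driver whose string appears earliest in the list. compat_index() and best_driver() are hypothetical helpers, not the kernel's OF matching code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the position of 'compat' in a compatible property, or -1.
 * 'prop' holds NUL-terminated strings back to back, most specific first;
 * 'len' is the property length including the final NUL.
 */
int compat_index(const char *prop, size_t len, const char *compat)
{
	size_t off = 0;
	int index = 0;

	while (off < len) {
		const char *s = prop + off;
		const char *nul = memchr(s, '\0', len - off);
		size_t slen = nul ? (size_t)(nul - s) : len - off;

		if (slen == strlen(compat) && memcmp(s, compat, slen) == 0)
			return index;
		off += slen + 1;	/* skip this string and its terminator */
		index++;
	}
	return -1;
}

/*
 * Pick the driver whose match string appears earliest in the property.
 * Returns an index into 'drivers', or -1 if no driver matches.
 */
int best_driver(const char *prop, size_t len,
		const char *const drivers[], int ndrivers)
{
	int best = -1, best_pos = -1, i;

	for (i = 0; i < ndrivers; i++) {
		int pos = compat_index(prop, len, drivers[i]);

		if (pos >= 0 && (best_pos < 0 || pos < best_pos)) {
			best = i;
			best_pos = pos;
		}
	}
	return best;
}
```

On an mpc5200b ethernet node, a driver matching "mpc5200-ethernet" still binds, but a driver that knows "mpc5200b-ethernet" wins when both are present.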
@@ -96,7 +115,7 @@ Device drivers should always try to match as generically as possible.
 
 III - Structure
 ===============
-The device tree for an mpc52xx board follows the structure defined in
+The device tree for an mpc5200 board follows the structure defined in
 booting-without-of.txt with the following additional notes:
 
 0) the root node
@@ -115,7 +134,7 @@ Typical memory description node; see booting-without-of.
 
 3) The soc5200 node
 -------------------
-This node describes the on chip SOC peripherals. Every mpc52xx based
+This node describes the on chip SOC peripherals. Every mpc5200 based
 board will have this node, and as such there is a common naming
 convention for SOC devices.
 
@@ -125,71 +144,111 @@ name type description
|
|||||||
device_type string must be "soc"
|
device_type string must be "soc"
|
||||||
ranges int should be <0 baseaddr baseaddr+10000>
|
ranges int should be <0 baseaddr baseaddr+10000>
|
||||||
reg int must be <baseaddr 10000>
|
reg int must be <baseaddr 10000>
|
||||||
|
compatible string mpc5200: "mpc5200-soc"
|
||||||
|
mpc5200b: "mpc5200b-soc\0mpc5200-soc"
|
||||||
|
system-frequency int Fsystem frequency; source of all
|
||||||
|
other clocks.
|
||||||
|
bus-frequency int IPB bus frequency in HZ. Clock rate
|
||||||
|
used by most of the soc devices.
|
||||||
|
#interrupt-cells int must be <3>.
|
||||||
|
|
||||||
Recommended properties:
|
Recommended properties:
|
||||||
name type description
|
name type description
|
||||||
---- ---- -----------
|
---- ---- -----------
|
||||||
compatible string should be "<chip>-soc\0mpc52xx-soc"
|
model string Exact model of the chip;
|
||||||
ie. "mpc5200b-soc\0mpc52xx-soc"
|
ie: model="fsl,mpc5200"
|
||||||
#interrupt-cells int must be <3>. If it is not defined
|
revision string Silicon revision of chip
|
||||||
here then it must be defined in every
|
ie: revision="M08A"
|
||||||
soc device node.
|
|
||||||
bus-frequency int IPB bus frequency in HZ. Clock rate
|
The 'model' and 'revision' properties are *strongly* recommended. Having
|
||||||
used by most of the soc devices.
|
them presence acts as a bit of a safety net for working around as yet
|
||||||
Defining it here avoids needing it
|
undiscovered bugs on one version of silicon. For example, device drivers
|
||||||
added to every device node.
|
can use the model and revision properties to decide if a bug fix should
|
||||||
|
be turned on.
|
||||||
|
|
||||||
4) soc5200 child nodes
|
4) soc5200 child nodes
|
||||||
----------------------
|
----------------------
|
||||||
Any on chip SOC devices available to Linux must appear as soc5200 child nodes.
|
Any on chip SOC devices available to Linux must appear as soc5200 child nodes.
|
||||||
|
|
||||||
Note: in the tables below, '*' matches all <chip> values. ie.
|
Note: The tables below show the value for the mpc5200. A mpc5200b device
|
||||||
*-pic would translate to "mpc5200-pic\0mpc52xx-pic"
|
tree should use the "mpc5200b-<device>\0mpc5200-<device> form.
|
||||||
|
|
||||||
Required soc5200 child nodes:
|
Required soc5200 child nodes:
|
||||||
name device_type compatible Description
|
name device_type compatible Description
|
||||||
---- ----------- ---------- -----------
|
---- ----------- ---------- -----------
|
||||||
cdm@<addr> cdm *-cmd Clock Distribution
|
cdm@<addr> cdm mpc5200-cmd Clock Distribution
|
||||||
pic@<addr> interrupt-controller *-pic need an interrupt
|
pic@<addr> interrupt-controller mpc5200-pic need an interrupt
|
||||||
controller to boot
|
controller to boot
|
||||||
bestcomm@<addr> dma-controller *-bestcomm 52xx pic also requires
|
bestcomm@<addr> dma-controller mpc5200-bestcomm 5200 pic also requires
|
||||||
the bestcomm device
|
the bestcomm device
|
||||||
|
|
||||||
Recommended soc5200 child nodes; populate as needed for your board
|
Recommended soc5200 child nodes; populate as needed for your board
|
||||||
name device_type compatible Description
|
name device_type compatible Description
|
||||||
---- ----------- ---------- -----------
|
---- ----------- ---------- -----------
|
||||||
gpt@<addr> gpt *-gpt General purpose timers
|
gpt@<addr> gpt mpc5200-gpt General purpose timers
|
||||||
rtc@<addr> rtc *-rtc Real time clock
|
rtc@<addr> rtc mpc5200-rtc Real time clock
|
||||||
mscan@<addr> mscan *-mscan CAN bus controller
|
mscan@<addr> mscan mpc5200-mscan CAN bus controller
|
||||||
pci@<addr> pci *-pci PCI bridge
|
pci@<addr> pci mpc5200-pci PCI bridge
|
||||||
serial@<addr> serial *-psc-uart PSC in serial mode
|
serial@<addr> serial mpc5200-psc-uart PSC in serial mode
|
||||||
i2s@<addr> sound *-psc-i2s PSC in i2s mode
|
i2s@<addr> sound mpc5200-psc-i2s PSC in i2s mode
|
||||||
ac97@<addr> sound *-psc-ac97 PSC in ac97 mode
|
ac97@<addr> sound mpc5200-psc-ac97 PSC in ac97 mode
|
||||||
spi@<addr> spi *-psc-spi PSC in spi mode
|
spi@<addr> spi mpc5200-psc-spi PSC in spi mode
|
||||||
irda@<addr> irda *-psc-irda PSC in IrDA mode
|
irda@<addr> irda mpc5200-psc-irda PSC in IrDA mode
|
||||||
spi@<addr> spi *-spi MPC52xx spi device
|
spi@<addr> spi mpc5200-spi MPC5200 spi device
|
||||||
ethernet@<addr> network *-fec MPC52xx ethernet device
|
ethernet@<addr> network mpc5200-fec MPC5200 ethernet device
|
||||||
ata@<addr> ata *-ata IDE ATA interface
|
ata@<addr> ata mpc5200-ata IDE ATA interface
|
||||||
i2c@<addr> i2c *-i2c I2C controller
|
i2c@<addr> i2c mpc5200-i2c I2C controller
|
||||||
usb@<addr> usb-ohci-be *-ohci,ohci-be USB controller
|
usb@<addr> usb-ohci-be mpc5200-ohci,ohci-be USB controller
|
||||||
xlb@<addr> xlb *-xlb XLB arbritrator
|
xlb@<addr> xlb mpc5200-xlb XLB arbritrator
|
||||||
|
|
||||||
|
Important child node properties
|
||||||
|
name type description
|
||||||
|
---- ---- -----------
|
||||||
|
cell-index int When multiple devices are present, is the
|
||||||
|
index of the device in the hardware (ie. There
|
||||||
|
are 6 PSC on the 5200 numbered PSC1 to PSC6)
|
||||||
|
PSC1 has 'cell-index = <0>'
|
||||||
|
PSC4 has 'cell-index = <3>'
|
||||||
|
|
||||||
|
5) General Purpose Timer nodes (child of soc5200 node)
|
||||||
|
On the mpc5200 and 5200b, GPT0 has a watchdog timer function. If the board
|
||||||
|
design supports the internal wdt, then the device node for GPT0 should
|
||||||
|
include the empty property 'has-wdt'.
|
||||||
|
|
||||||
|
6) PSC nodes (child of soc5200 node)
|
||||||
|
PSC nodes can define the optional 'port-number' property to force assignment
|
||||||
|
order of serial ports. For example, PSC5 might be physically connected to
|
||||||
|
the port labeled 'COM1' and PSC1 wired to 'COM1'. In this case, PSC5 would
|
||||||
|
have a "port-number = <0>" property, and PSC1 would have "port-number = <1>".
|
||||||
|
|
||||||
|
PSC in i2s mode: The mpc5200 and mpc5200b PSCs are not compatible when in
|
||||||
|
i2s mode. An 'mpc5200b-psc-i2s' node cannot include 'mpc5200-psc-i2s' in the
|
||||||
|
compatible field.
|
||||||
|
|
||||||
IV - Extra Notes
|
IV - Extra Notes
|
||||||
================
|
================
|
||||||
|
|
||||||
1. Interrupt mapping
|
1. Interrupt mapping
|
||||||
--------------------
|
--------------------
|
||||||
The mpc52xx pic driver splits hardware IRQ numbers into two levels. The
|
The mpc5200 pic driver splits hardware IRQ numbers into two levels. The
|
||||||
split reflects the layout of the PIC hardware itself, which groups
|
split reflects the layout of the PIC hardware itself, which groups
|
||||||
interrupts into one of three groups; CRIT, MAIN or PERP. Also, the
|
interrupts into one of three groups; CRIT, MAIN or PERP. Also, the
|
||||||
Bestcomm dma engine has it's own set of interrupt sources which are
|
Bestcomm dma engine has it's own set of interrupt sources which are
|
||||||
cascaded off of peripheral interrupt 0, which the driver interprets as a
|
cascaded off of peripheral interrupt 0, which the driver interprets as a
|
||||||
fourth group, SDMA.
|
fourth group, SDMA.
|
||||||
|
|
||||||
The interrupts property for device nodes using the mpc5200 pic consists
of three cells: <L1 L2 level>

	L1    := [CRIT=0, MAIN=1, PERP=2, SDMA=3]
	L2    := interrupt number; directly mapped from the value in the
	         "ICTL PerStat, MainStat, CritStat Encoded Register"
	level := [LEVEL_HIGH=0, EDGE_RISING=1, EDGE_FALLING=2, LEVEL_LOW=3]

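The three-cell format is a direct lookup on each cell, which the following Python sketch makes explicit. The tables are copied from the definitions above. The concrete specifiers used in the usage lines ([2, 3, 0] and MAIN interrupt 8) are illustrative values only, not interrupts of any particular device.

```python
# Cell encodings copied from the <L1 L2 level> definitions above.
L1_GROUPS = {0: "CRIT", 1: "MAIN", 2: "PERP", 3: "SDMA"}
SENSE_LEVELS = {0: "LEVEL_HIGH", 1: "EDGE_RISING", 2: "EDGE_FALLING", 3: "LEVEL_LOW"}

# Reverse lookups used when building a specifier from symbolic names.
L1_BY_NAME = {name: value for value, name in L1_GROUPS.items()}
SENSE_BY_NAME = {name: value for value, name in SENSE_LEVELS.items()}


def decode_mpc5200_irq(cells):
    """Decode a three-cell <L1 L2 level> interrupts specifier."""
    if len(cells) != 3:
        raise ValueError("mpc5200 pic interrupt specifiers have three cells")
    l1, l2, level = cells
    if l1 not in L1_GROUPS:
        raise ValueError(f"unknown L1 group {l1}")
    if level not in SENSE_LEVELS:
        raise ValueError(f"unknown sense level {level}")
    return L1_GROUPS[l1], l2, SENSE_LEVELS[level]


def encode_mpc5200_irq(group, number, sense):
    """Build the <L1 L2 level> cells from symbolic names."""
    return [L1_BY_NAME[group], number, SENSE_BY_NAME[sense]]


print(decode_mpc5200_irq([2, 3, 0]))                  # ('PERP', 3, 'LEVEL_HIGH')
print(encode_mpc5200_irq("MAIN", 8, "EDGE_FALLING"))  # [1, 8, 2]
```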
2. Shared registers
-------------------
Some SoC devices share registers between them, e.g. the i2c devices use
a single clock control register, and almost all devices are affected by
the port_config register. Devices that need to manipulate shared registers
should look to the parent SoC node. The SoC node is responsible
for arbitrating all shared register access.

@@ -180,40 +180,81 @@ PCI
pci=lastbus=NUMBER	Scan up to NUMBER buses, no matter what the mptable says.
pci=noacpi		Don't use ACPI to set up PCI interrupt routing.

IOMMU (input/output memory management unit)

 Currently four x86-64 PCI-DMA mapping implementations exist:

   1. <arch/x86_64/kernel/pci-nommu.c>: use no hardware/software IOMMU at all
      (e.g. because you have < 3 GB memory).
      Kernel boot message: "PCI-DMA: Disabling IOMMU"

   2. <arch/x86_64/kernel/pci-gart.c>: AMD GART based hardware IOMMU.
      Kernel boot message: "PCI-DMA: using GART IOMMU"

   3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used
      e.g. if there is no hardware IOMMU in the system and it is needed because
      you have >3GB memory or told the kernel to use it (iommu=soft).
      Kernel boot message: "PCI-DMA: Using software bounce buffering
      for IO (SWIOTLB)"

   4. <arch/x86_64/pci-calgary.c> : IBM Calgary hardware IOMMU. Used in IBM
      pSeries and xSeries servers. This hardware IOMMU supports DMA address
      mapping with memory protection, etc.
      Kernel boot message: "PCI-DMA: Using Calgary IOMMU"

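Since each implementation announces itself with a distinct boot message, the one in use can be identified from kernel log text. The Python sketch below only knows the four messages quoted above; how the log lines are obtained (for example, by reading saved dmesg output) is left to the caller, and the case-insensitive substring match is a choice of this sketch because the GART message uses a lowercase "using".

```python
# Boot messages and source files exactly as listed in the text above.
PCI_DMA_IMPLEMENTATIONS = [
    ("PCI-DMA: Disabling IOMMU", "pci-nommu.c", "no IOMMU"),
    ("PCI-DMA: using GART IOMMU", "pci-gart.c", "AMD GART hardware IOMMU"),
    ("PCI-DMA: Using software bounce buffering", "pci-swiotlb.c", "software IOMMU (SWIOTLB)"),
    ("PCI-DMA: Using Calgary IOMMU", "pci-calgary.c", "IBM Calgary hardware IOMMU"),
]


def identify_pci_dma(log_lines):
    """Return (source file, description) for the first PCI-DMA message seen.

    Returns None when none of the documented messages appears.
    """
    for line in log_lines:
        lowered = line.lower()
        for message, source, description in PCI_DMA_IMPLEMENTATIONS:
            if message.lower() in lowered:
                return source, description
    return None


print(identify_pci_dma(["PCI-DMA: Using Calgary IOMMU"]))
# ('pci-calgary.c', 'IBM Calgary hardware IOMMU')
```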
 iommu=[<size>][,noagp][,off][,force][,noforce][,leak[=<nr_of_leak_pages>]]
	[,memaper[=<order>]][,merge][,forcesac][,fullflush][,nomerge]
	[,noaperture][,calgary]

  General iommu options:
    off                Don't initialize and use any kind of IOMMU.
    noforce            Don't force hardware IOMMU usage when it is not needed
                       (default).
    force              Force the use of the hardware IOMMU even when it is
                       not actually needed (e.g. because < 3 GB memory).
    soft               Use software bounce buffering (SWIOTLB) (default for
                       Intel machines). This can be used to prevent the usage
                       of an available hardware IOMMU.

  iommu options only relevant to the AMD GART hardware IOMMU:
    <size>             Set the size of the remapping area in bytes.
    allowed            Override the "iommu off" workarounds for specific chipsets.
    fullflush          Flush IOMMU on each allocation (default).
    nofullflush        Don't use IOMMU fullflush.
    leak               Turn on simple iommu leak tracing (only when
                       CONFIG_IOMMU_LEAK is on). Default number of leak pages
                       is 20.
    memaper[=<order>]  Allocate its own aperture over RAM with size 32MB<<order
                       (default: order=1, i.e. 64MB).
    merge              Do scatter-gather (SG) merging. Implies "force"
                       (experimental).
    nomerge            Don't do scatter-gather (SG) merging.
    noaperture         Ask the IOMMU not to touch the aperture for AGP.
    forcesac           Force single-address cycle (SAC) mode for masks <40bits
                       (experimental).
    noagp              Don't initialize the AGP driver and use full aperture.
    allowdac           Allow double-address cycle (DAC) mode, i.e. DMA >4GB.
                       DAC is used with 32-bit PCI to push a 64-bit address in
                       two cycles. When off, all DMA over >4GB is forced through
                       an IOMMU or software bounce buffering.
    nodac              Forbid DAC mode, i.e. DMA >4GB.
    panic              Always panic when IOMMU overflows.
    calgary            Use the Calgary IOMMU if it is available.

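The iommu= value is a comma-separated list mixing a bare <size>, plain keywords and keyword=value pairs. The following Python sketch splits such a string and computes the memaper aperture size from the documented 32MB<<order formula. It is an illustration, not the kernel's parser. Accepting <size> only as decimal or 0x-prefixed hex is an assumption of the sketch, and the only documented implication it applies is "merge implies force".

```python
MB = 1024 * 1024


def parse_iommu_option(value):
    """Split an iommu= value into (size, flags, args).

    size:  the leading numeric <size> field in bytes, or None.
    flags: set of bare keywords such as "force" or "nomerge".
    args:  dict of keyword=value pairs such as memaper=2 or leak=40.
    """
    size, flags, args = None, set(), {}
    for token in filter(None, value.split(",")):
        if token[0].isdigit():
            size = int(token, 0)  # decimal or 0x-prefixed (assumption)
        elif "=" in token:
            key, val = token.split("=", 1)
            args[key] = int(val, 0)
        else:
            flags.add(token)
    if "merge" in flags:  # documented above: "merge" implies "force"
        flags.add("force")
    return size, flags, args


def memaper_bytes(order=1):
    """Aperture size for memaper[=<order>]: 32MB << order (default order=1)."""
    return (32 * MB) << order


size, flags, args = parse_iommu_option("memaper=2,nomerge,leak=40")
print(flags)                                        # {'nomerge'}
print(memaper_bytes(args.get("memaper", 1)) // MB)  # 128
print(memaper_bytes() // MB)                        # 64
```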
  iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
  implementation:
    swiotlb=<pages>[,force]
    <pages>            Prereserve that many 128K pages for the software IO
                       bounce buffering.
    force              Force all IO through the software TLB.

  Settings for the IBM Calgary hardware IOMMU currently found in IBM
  pSeries and xSeries machines:

    calgary=[64k,128k,256k,512k,1M,2M,4M,8M]
    calgary=[translate_empty_slots]
    calgary=[disable=<PCI bus number>]
    panic              Always panic when IOMMU overflows

    64k,...,8M - Set the size of each PCI slot's translation table
    when using the Calgary IOMMU. This is the size of the translation
@@ -234,14 +275,14 @@ IOMMU

Debugging

oops=panic	Always panic on oopses. Default is to just kill the process,
		but there is a small probability of deadlocking the machine.
		This will also cause panics on machine check exceptions.
		Useful together with panic=30 to trigger a reboot.

kstack=N	Print N words from the kernel stack in oops dumps.

pagefaulttrace	Dump all page faults. Only useful for extreme debugging
		and will create a lot of output.

call_trace=[old|both|newfallback|new]
@@ -251,15 +292,8 @@ Debugging
		newfallback: use new unwinder but fall back to old if it gets
			stuck (default)

Miscellaneous

noreplacement	Don't replace instructions with more appropriate ones
		for the CPU. This may be useful on asymmetric MP systems
		where some CPUs have fewer capabilities than others.

@@ -2,7 +2,7 @@ Firmware support for CPU hotplug under Linux/x86-64
---------------------------------------------------

Linux/x86-64 supports CPU hotplug now. For various reasons Linux wants to
know in advance, at boot time, the maximum number of CPUs that could be
plugged into the system. ACPI 3.0 currently has no official way to supply
this information from the firmware to the operating system.

@@ -9,9 +9,9 @@ zombie. While the thread is in user space the kernel stack is empty
except for the thread_info structure at the bottom.

In addition to the per thread stacks, there are specialized stacks
associated with each CPU. These stacks are only used while the kernel
is in control on that CPU; when a CPU returns to user space the
specialized stacks contain no useful data. The main CPU stacks are:

* Interrupt stack. IRQSTACKSIZE

@@ -32,17 +32,17 @@ x86_64 also has a feature which is not available on i386, the ability
to automatically switch to a new stack for designated events such as
double fault or NMI, which makes it easier to handle these unusual
events on x86_64. This feature is called the Interrupt Stack Table
(IST). There can be up to 7 IST entries per CPU. The IST code is an
index into the Task State Segment (TSS). The IST entries in the TSS
point to dedicated stacks; each stack can be a different size.

An IST is selected by a non-zero value in the IST field of an
interrupt-gate descriptor. When an interrupt occurs and the hardware
loads such a descriptor, the hardware automatically sets the new stack
pointer based on the IST value, then invokes the interrupt handler. If
software wants to allow nested IST interrupts then the handler must
adjust the IST values on entry to and exit from the interrupt handler.
(This is occasionally done, e.g. for debug exceptions.)

Events with different IST codes (i.e. with different stacks) can be
nested. For example, a debug interrupt can safely be interrupted by an
@@ -58,17 +58,17 @@ The currently assigned IST stacks are :-

	Used for interrupt 12 - Stack Fault Exception (#SS).

	This allows the CPU to recover from invalid stack segments. Rarely
	happens.

* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).

	Used for interrupt 8 - Double Fault Exception (#DF).

	Invoked when handling one exception causes another exception. Happens
	when the kernel is very confused (e.g. kernel stack pointer corrupt).
	Using a separate stack allows the kernel to recover from it well enough
	in many cases to still output an oops.

* NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE).

@@ -0,0 +1,70 @@

Configurable sysfs parameters for the x86-64 machine check code.

Machine checks report internal hardware error conditions detected
by the CPU. Uncorrected errors typically cause a machine check
(often with panic), corrected ones cause a machine check log entry.

Machine checks are organized in banks (normally associated with
a hardware subsystem) and subevents in a bank. The exact meaning
of the banks and subevents is CPU specific.

mcelog knows how to decode them.

When you see the "Machine check errors logged" message in the system
log then mcelog should be run to collect and decode machine check entries
from /dev/mcelog. Normally mcelog should be run regularly from a cronjob.

Each CPU has a directory in /sys/devices/system/machinecheck/machinecheckN
(N = CPU number).

The directory contains some configurable entries:

Entries:

bankNctl
(N = bank number)
	64-bit hex bitmask enabling/disabling specific subevents for bank N.
	When a bit in the bitmask is zero then the respective
	subevent will not be reported.
	By default all events are enabled.
	Note that the BIOS maintains another mask to disable specific events
	per bank. This is not visible here.

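Because a zero bit in bankNctl suppresses the matching subevent, working out which subevents are silenced is plain bit arithmetic. The Python sketch below assumes that the file contents are the mask written as hex digits (with or without "0x"). The helper names are hypothetical, and the sketch does not read or write sysfs itself.

```python
MASK_64 = (1 << 64) - 1
ALL_ENABLED = MASK_64  # the documented default: every subevent enabled


def parse_bank_ctl(text):
    """Parse bankNctl contents, assumed to be a hex mask (0x prefix optional)."""
    return int(text.strip(), 16) & MASK_64


def disabled_subevents(bank_ctl, width=64):
    """Return the subevent bit positions that will NOT be reported."""
    return [bit for bit in range(width) if not (bank_ctl >> bit) & 1]


def with_subevent_disabled(bank_ctl, bit):
    """Return a new 64-bit mask with one subevent's enable bit cleared."""
    return bank_ctl & ~(1 << bit) & MASK_64


mask = with_subevent_disabled(ALL_ENABLED, 3)
print(hex(mask))                 # 0xfffffffffffffff7
print(disabled_subevents(mask))  # [3]
```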
The following entries appear for each CPU, but they are truly shared
between all CPUs.

check_interval
	How often to poll for corrected machine check errors, in seconds
	(note: the output is hexadecimal). Default: 5 minutes.

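Because check_interval is shown in hexadecimal, the documented 5-minute default (300 seconds) reads as 12c. A minimal Python conversion, assuming the file contains only hex digits, optionally prefixed with "0x", and a trailing newline:

```python
DEFAULT_CHECK_INTERVAL = 5 * 60  # documented default: 5 minutes, in seconds


def check_interval_seconds(text):
    """Convert the hexadecimal check_interval text to seconds.

    int(..., 16) accepts the digits with or without a leading "0x".
    """
    return int(text.strip(), 16)


print(hex(DEFAULT_CHECK_INTERVAL))      # 0x12c
print(check_interval_seconds("12c\n"))  # 300
```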
tolerant
	Tolerance level. When a machine check exception occurs for an
	uncorrected machine check the kernel can take different actions.
	Since machine check exceptions can happen at any time it is sometimes
	risky for the kernel to kill a process because it defies
	normal kernel locking rules. The tolerance level configures
	how hard the kernel tries to recover even at some risk of deadlock.

	0: always panic,
	1: panic if deadlock possible,
	2: try to avoid panic,
	3: never panic or exit (for testing only)

	Default: 1

	Note that this only makes a difference if the CPU allows recovery
	from a machine check exception. Current x86 CPUs generally do not.

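Since tolerant is shared between all CPUs, a tool that changes it only needs to target one machinecheckN directory. The Python sketch below builds the write request for machinecheck0 and validates the level against the documented table. Writing the level as decimal text is an assumption of this sketch, and the sysfs root is a parameter so the request can be inspected without touching a real system. Performing the write needs root.

```python
from pathlib import Path

# Tolerance levels as documented above; 1 is the default.
TOLERANT_LEVELS = {
    0: "always panic",
    1: "panic if deadlock possible",
    2: "try to avoid panic",
    3: "never panic or exit (for testing only)",
}


def tolerant_write_request(level, sysfs_root="/sys/devices/system/machinecheck"):
    """Return (path, text) for setting the shared tolerance level.

    The entry appears under every machinecheckN directory but is shared,
    so this sketch only targets machinecheck0.
    """
    if level not in TOLERANT_LEVELS:
        raise ValueError(f"tolerant must be one of {sorted(TOLERANT_LEVELS)}")
    return Path(sysfs_root) / "machinecheck0" / "tolerant", f"{level}\n"


def apply_write_request(request):
    """Perform the write (requires root on a real system)."""
    path, text = request
    path.write_text(text)


print(tolerant_write_request(2))
```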
trigger
	Program to run when a machine check event is detected.
	This is an alternative to running mcelog regularly from cron
	and allows events to be detected faster.

TBD document entries for AMD threshold interrupt configuration

For more details about the x86 machine check architecture
see the Intel and AMD architecture manuals from their developer websites.

For more details about the architecture
see http://one.firstfloor.org/~andi/mce.pdf
@@ -3,26 +3,26 @@

Virtual memory map with 4 level page tables:

0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole
ffff810000000000 - ffffc0ffffffffff (=46 bits) direct mapping of all phys. memory
ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole
ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space
... unused hole ...
ffffffff80000000 - ffffffff82800000 (=40 MB) kernel text mapping, from phys 0
... unused hole ...
ffffffff88000000 - fffffffffff00000 (=1919 MB) module mapping space

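The size annotations in the map can be checked with simple arithmetic, and the sign-extension hole corresponds to the rule that bits 48-63 of a valid address must copy bit 47. The Python sketch below verifies each row. Treating rows that end in ...fff as inclusive bounds and the kernel text and module rows as exclusive bounds is this sketch's reading of the table, not something the table states. The 46-bit direct mapping works out to 2^46 bytes (64 TiB).

```python
MB = 1 << 20

# (start, end, documented size, name) copied from the table above.
ROWS = [
    (0x0000000000000000, 0x00007FFFFFFFFFFF, 1 << 47, "user space"),
    (0xFFFF800000000000, 0xFFFF80FFFFFFFFFF, 1 << 40, "guard hole"),
    (0xFFFF810000000000, 0xFFFFC0FFFFFFFFFF, 1 << 46, "direct mapping"),
    (0xFFFFC10000000000, 0xFFFFC1FFFFFFFFFF, 1 << 40, "hole"),
    (0xFFFFC20000000000, 0xFFFFE1FFFFFFFFFF, 1 << 45, "vmalloc/ioremap"),
    (0xFFFFFFFF80000000, 0xFFFFFFFF82800000, 40 * MB, "kernel text"),
    (0xFFFFFFFF88000000, 0xFFFFFFFFFFF00000, 1919 * MB, "module mapping"),
]


def region_size(start, end):
    """Row size; ends of the form ...fff are read as inclusive bounds."""
    return end - start + 1 if end & 0xFFF == 0xFFF else end - start


def is_canonical(addr):
    """True if bits 47..63 are all zero or all one (48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == (1 << 17) - 1


for start, end, documented, name in ROWS:
    assert is_canonical(start) and is_canonical(end), name
    assert region_size(start, end) == documented, name
print("all rows match")
```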
The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).

vmalloc space is lazily synchronized into the different PML4 pages of
the processes using the page fault handler, with init_level4_pgt as
reference.

Current X86-64 implementations only support 40 bits of address space,
but we support up to 46 bits. This expands into MBZ (must be zero) space
in the page tables.

-Andi Kleen, Jul 2004