Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
@@ -229,6 +229,6 @@ KernelVersion: 4.1
 Contact: linux-mtd@lists.infradead.org
 Description:
 For a partition, the offset of that partition from the start
-of the master device in bytes. This attribute is absent on
-main devices, so it can be used to distinguish between
-partitions and devices that aren't partitions.
+of the parent (another partition or a flash device) in bytes.
+This attribute is absent on flash devices, so it can be used
+to distinguish them from partitions.

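Note on the hunk above: since the attribute is described as absent on flash devices, a user-space tool could tell a partition from a flash device by testing whether the attribute file exists. The sketch below is only an illustration. It assumes the attribute appears as a file named "offset" inside a per-device directory supplied by the caller; that file name and the helper mtd_is_partition() are assumptions made for this example, not something the hunk states.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if mtd_dir contains an "offset" attribute (treated here as
 * "this is a partition"), 0 if it does not, and -1 if the path would
 * not fit in the local buffer. mtd_dir is a caller-supplied directory;
 * its real location on a given system is an assumption of this sketch.
 */
static int mtd_is_partition(const char *mtd_dir)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/offset", mtd_dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return access(path, F_OK) == 0 ? 1 : 0;
}
```

A caller would pass the directory of one MTD device and branch on the result; a missing directory is reported the same way as a missing attribute.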
@@ -1,22 +1,24 @@
-Dynamic DMA mapping Guide
-=========================
+=========================
+Dynamic DMA mapping Guide
+=========================

-David S. Miller <davem@redhat.com>
-Richard Henderson <rth@cygnus.com>
-Jakub Jelinek <jakub@redhat.com>
+:Author: David S. Miller <davem@redhat.com>
+:Author: Richard Henderson <rth@cygnus.com>
+:Author: Jakub Jelinek <jakub@redhat.com>

 This is a guide to device driver writers on how to use the DMA API
 with example pseudo-code. For a concise description of the API, see
 DMA-API.txt.

-CPU and DMA addresses
+CPU and DMA addresses
+=====================

 There are several kinds of addresses involved in the DMA API, and it's
 important to understand the differences.

 The kernel normally uses virtual addresses. Any address returned by
 kmalloc(), vmalloc(), and similar interfaces is a virtual address and can
-be stored in a "void *".
+be stored in a ``void *``.

 The virtual memory system (TLB, page tables, etc.) translates virtual
 addresses to CPU physical addresses, which are stored as "phys_addr_t" or
@@ -37,7 +39,7 @@ be restricted to a subset of that space. For example, even if a system
 supports 64-bit addresses for main memory and PCI BARs, it may use an IOMMU
 so devices only need to use 32-bit DMA addresses.

-Here's a picture and some examples:
+Here's a picture and some examples::

 CPU CPU Bus
 Virtual Physical Address
@@ -98,15 +100,16 @@ microprocessor architecture. You should use the DMA API rather than the
 bus-specific DMA API, i.e., use the dma_map_*() interfaces rather than the
 pci_map_*() interfaces.

-First of all, you should make sure
+First of all, you should make sure::

-#include <linux/dma-mapping.h>
+#include <linux/dma-mapping.h>

 is in your driver, which provides the definition of dma_addr_t. This type
 can hold any valid DMA address for the platform and should be used
 everywhere you hold a DMA address returned from the DMA mapping functions.

-What memory is DMA'able?
+What memory is DMA'able?
+========================

 The first piece of information you must know is what kernel memory can
 be used with the DMA mapping facilities. There has been an unwritten
@@ -143,7 +146,8 @@ What about block I/O and networking buffers? The block I/O and
 networking subsystems make sure that the buffers they use are valid
 for you to DMA from/to.

-DMA addressing limitations
+DMA addressing limitations
+==========================

 Does your device have any DMA addressing limitations? For example, is
 your device only capable of driving the low order 24-bits of address?
@@ -166,7 +170,7 @@ style to do this even if your device holds the default setting,
 because this shows that you did think about these issues wrt. your
 device.

-The query is performed via a call to dma_set_mask_and_coherent():
+The query is performed via a call to dma_set_mask_and_coherent()::

 int dma_set_mask_and_coherent(struct device *dev, u64 mask);

@@ -175,12 +179,12 @@ If you have some special requirements, then the following two separate
 queries can be used instead:

 The query for streaming mappings is performed via a call to
-dma_set_mask():
+dma_set_mask()::

 int dma_set_mask(struct device *dev, u64 mask);

 The query for consistent allocations is performed via a call
-to dma_set_coherent_mask():
+to dma_set_coherent_mask()::

 int dma_set_coherent_mask(struct device *dev, u64 mask);

@@ -209,7 +213,7 @@ of your driver reports that performance is bad or that the device is not
 even detected, you can ask them for the kernel messages to find out
 exactly why.

-The standard 32-bit addressing device would do something like this:
+The standard 32-bit addressing device would do something like this::

 if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
 dev_warn(dev, "mydev: No suitable DMA available\n");
@@ -225,7 +229,7 @@ than 64-bit addressing. For example, Sparc64 PCI SAC addressing is
 more efficient than DAC addressing.

 Here is how you would handle a 64-bit capable device which can drive
-all 64-bits when accessing streaming DMA:
+all 64-bits when accessing streaming DMA::

 int using_dac;

@@ -239,7 +243,7 @@ all 64-bits when accessing streaming DMA:
 }

 If a card is capable of using 64-bit consistent allocations as well,
-the case would look like this:
+the case would look like this::

 int using_dac, consistent_using_dac;

@@ -260,7 +264,7 @@ uses consistent allocations, one would have to check the return value from
 dma_set_coherent_mask().

 Finally, if your device can only drive the low 24-bits of
-address you might do something like:
+address you might do something like::

 if (dma_set_mask(dev, DMA_BIT_MASK(24))) {
 dev_warn(dev, "mydev: 24-bit DMA addressing not available\n");
@@ -280,7 +284,7 @@ only provide the functionality which the machine can handle. It
 is important that the last call to dma_set_mask() be for the
 most specific mask.

-Here is pseudo-code showing how this might be done:
+Here is pseudo-code showing how this might be done::

 #define PLAYBACK_ADDRESS_BITS DMA_BIT_MASK(32)
 #define RECORD_ADDRESS_BITS DMA_BIT_MASK(24)
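Note on the masks used above: the following user-space sketch shows what DMA_BIT_MASK(24) and DMA_BIT_MASK(32) evaluate to and how a buffer range can be checked against such a mask. The macro body is written to mirror the usual kernel definition and should be treated as an assumption here; dma_addr_fits() is a hypothetical helper made up for this illustration, not a kernel interface.

```c
#include <stdint.h>

/* Assumed to match the shape of the kernel macro; user-space copy. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Return 1 if every byte of [addr, addr + size) is addressable under
 * mask, 0 otherwise. An empty range or one that wraps around the
 * 64-bit address space is rejected.
 */
static int dma_addr_fits(uint64_t addr, uint64_t size, uint64_t mask)
{
	uint64_t last = addr + size - 1;

	if (size == 0 || last < addr)
		return 0;
	return last <= mask;
}
```

With a 24-bit mask the highest reachable byte is 0xFFFFFF, which is where a 16MB addressing limit comes from: a one-page buffer ending exactly there fits, while one byte more does not.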
@@ -308,7 +312,8 @@ A sound card was used as an example here because this genre of PCI
 devices seems to be littered with ISA chips given a PCI front end,
 and thus retaining the 16MB DMA addressing limitations of ISA.

-Types of DMA mappings
+Types of DMA mappings
+=====================

 There are two types of DMA mappings:

@@ -336,12 +341,14 @@ There are two types of DMA mappings:
 to memory is immediately visible to the device, and vice
 versa. Consistent mappings guarantee this.

-IMPORTANT: Consistent DMA memory does not preclude the usage of
-proper memory barriers. The CPU may reorder stores to
+.. important::
+
+Consistent DMA memory does not preclude the usage of
+proper memory barriers. The CPU may reorder stores to
 consistent memory just as it may normal memory. Example:
 if it is important for the device to see the first word
 of a descriptor updated before the second, you must do
-something like:
+something like::

 desc->word0 = address;
 wmb();
@@ -377,16 +384,17 @@ Also, systems with caches that aren't DMA-coherent will work better
 when the underlying buffers don't share cache lines with other data.


-Using Consistent DMA mappings.
+Using Consistent DMA mappings
+=============================

 To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
-you should do:
+you should do::

 dma_addr_t dma_handle;

 cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, gfp);

-where device is a struct device *. This may be called in interrupt
+where device is a ``struct device *``. This may be called in interrupt
 context with the GFP_ATOMIC flag.

 Size is the length of the region you want to allocate, in bytes.
@@ -415,7 +423,7 @@ exists (for example) to guarantee that if you allocate a chunk
 which is smaller than or equal to 64 kilobytes, the extent of the
 buffer you receive will not cross a 64K boundary.

-To unmap and free such a DMA region, you call:
+To unmap and free such a DMA region, you call::

 dma_free_coherent(dev, size, cpu_addr, dma_handle);

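Note on the 64K guarantee quoted above: "does not cross a boundary" means the first and last byte of the buffer fall in the same boundary-sized window. The user-space sketch below expresses that check. crosses_boundary() is a hypothetical helper for illustration only; it assumes the boundary is a power of two and the size is non-zero.

```c
#include <stdint.h>

/*
 * Return 1 if [start, start + size) spans more than one
 * boundary-sized window, 0 if it stays inside a single window.
 * Assumes boundary is a power of two and size > 0.
 */
static int crosses_boundary(uint64_t start, uint64_t size, uint64_t boundary)
{
	uint64_t last = start + size - 1;
	uint64_t window_mask = ~(boundary - 1);

	return (start & window_mask) != (last & window_mask);
}
```

The same test applies to any power-of-two boundary, for example a 4096-byte one.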
@@ -430,7 +438,7 @@ a kmem_cache, but it uses dma_alloc_coherent(), not __get_free_pages().
 Also, it understands common hardware constraints for alignment,
 like queue heads needing to be aligned on N byte boundaries.

-Create a dma_pool like this:
+Create a dma_pool like this::

 struct dma_pool *pool;

@@ -444,7 +452,7 @@ pass 0 for boundary; passing 4096 says memory allocated from this pool
 must not cross 4KByte boundaries (but at that time it may be better to
 use dma_alloc_coherent() directly instead).

-Allocate memory from a DMA pool like this:
+Allocate memory from a DMA pool like this::

 cpu_addr = dma_pool_alloc(pool, flags, &dma_handle);

@@ -452,7 +460,7 @@ flags are GFP_KERNEL if blocking is permitted (not in_interrupt nor
 holding SMP locks), GFP_ATOMIC otherwise. Like dma_alloc_coherent(),
 this returns two values, cpu_addr and dma_handle.

-Free memory that was allocated from a dma_pool like this:
+Free memory that was allocated from a dma_pool like this::

 dma_pool_free(pool, cpu_addr, dma_handle);

@@ -460,7 +468,7 @@ where pool is what you passed to dma_pool_alloc(), and cpu_addr and
 dma_handle are the values dma_pool_alloc() returned. This function
 may be called in interrupt context.

-Destroy a dma_pool by calling:
+Destroy a dma_pool by calling::

 dma_pool_destroy(pool);

@@ -468,11 +476,12 @@ Make sure you've called dma_pool_free() for all memory allocated
 from a pool before you destroy the pool. This function may not
 be called in interrupt context.

-DMA Direction
+DMA Direction
+=============

 The interfaces described in subsequent portions of this document
 take a DMA direction argument, which is an integer and takes on
-one of the following values:
+one of the following values::

 DMA_BIDIRECTIONAL
 DMA_TO_DEVICE
@@ -521,14 +530,15 @@ packets, map/unmap them with the DMA_TO_DEVICE direction
 specifier. For receive packets, just the opposite, map/unmap them
 with the DMA_FROM_DEVICE direction specifier.

-Using Streaming DMA mappings
+Using Streaming DMA mappings
+============================

 The streaming DMA mapping routines can be called from interrupt
 context. There are two versions of each map/unmap, one which will
 map/unmap a single memory region, and one which will map/unmap a
 scatterlist.

-To map a single region, you do:
+To map a single region, you do::

 struct device *dev = &my_dev->dev;
 dma_addr_t dma_handle;
@@ -545,7 +555,7 @@ To map a single region, you do:
 goto map_error_handling;
 }

-and to unmap it:
+and to unmap it::

 dma_unmap_single(dev, dma_handle, size, direction);

@@ -563,7 +573,7 @@ Using CPU pointers like this for single mappings has a disadvantage:
 you cannot reference HIGHMEM memory in this way. Thus, there is a
 map/unmap interface pair akin to dma_{map,unmap}_single(). These
 interfaces deal with page/offset pairs instead of CPU pointers.
-Specifically:
+Specifically::

 struct device *dev = &my_dev->dev;
 dma_addr_t dma_handle;
@@ -593,7 +603,7 @@ error as outlined under the dma_map_single() discussion.
 You should call dma_unmap_page() when the DMA activity is finished, e.g.,
 from the interrupt which told you that the DMA transfer is done.

-With scatterlists, you map a region gathered from several regions by:
+With scatterlists, you map a region gathered from several regions by::

 int i, count = dma_map_sg(dev, sglist, nents, direction);
 struct scatterlist *sg;
@@ -617,16 +627,18 @@ Then you should loop count times (note: this can be less than nents times)
 and use sg_dma_address() and sg_dma_len() macros where you previously
 accessed sg->address and sg->length as shown above.

-To unmap a scatterlist, just call:
+To unmap a scatterlist, just call::

 dma_unmap_sg(dev, sglist, nents, direction);

 Again, make sure DMA activity has already finished.

-PLEASE NOTE: The 'nents' argument to the dma_unmap_sg call must be
-the _same_ one you passed into the dma_map_sg call,
-it should _NOT_ be the 'count' value _returned_ from the
-dma_map_sg call.
+.. note::
+
+The 'nents' argument to the dma_unmap_sg call must be
+the _same_ one you passed into the dma_map_sg call,
+it should _NOT_ be the 'count' value _returned_ from the
+dma_map_sg call.

 Every dma_map_{single,sg}() call should have its dma_unmap_{single,sg}()
 counterpart, because the DMA address space is a shared resource and
@@ -638,11 +650,11 @@ properly in order for the CPU and device to see the most up-to-date and
 correct copy of the DMA buffer.

 So, firstly, just map it with dma_map_{single,sg}(), and after each DMA
-transfer call either:
+transfer call either::

 dma_sync_single_for_cpu(dev, dma_handle, size, direction);

-or:
+or::

 dma_sync_sg_for_cpu(dev, sglist, nents, direction);

@@ -650,17 +662,19 @@ as appropriate.

 Then, if you wish to let the device get at the DMA area again,
 finish accessing the data with the CPU, and then before actually
-giving the buffer to the hardware call either:
+giving the buffer to the hardware call either::

 dma_sync_single_for_device(dev, dma_handle, size, direction);

-or:
+or::

 dma_sync_sg_for_device(dev, sglist, nents, direction);

 as appropriate.

-PLEASE NOTE: The 'nents' argument to dma_sync_sg_for_cpu() and
+.. note::
+
+The 'nents' argument to dma_sync_sg_for_cpu() and
 dma_sync_sg_for_device() must be the same passed to
 dma_map_sg(). It is _NOT_ the count returned by
 dma_map_sg().
@@ -671,7 +685,7 @@ dma_map_*() call till dma_unmap_*(), then you don't have to call the
 dma_sync_*() routines at all.

 Here is pseudo code which shows a situation in which you would need
-to use the dma_sync_*() interfaces.
+to use the dma_sync_*() interfaces::

 my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
 {
@@ -747,7 +761,8 @@ is planned to completely remove virt_to_bus() and bus_to_virt() as
 they are entirely deprecated. Some ports already do not provide these
 as it is impossible to correctly support them.

-Handling Errors
+Handling Errors
+===============

 DMA address space is limited on some architectures and an allocation
 failure can be determined by:
@@ -755,7 +770,7 @@ failure can be determined by:
 - checking if dma_alloc_coherent() returns NULL or dma_map_sg returns 0

 - checking the dma_addr_t returned from dma_map_single() and dma_map_page()
-by using dma_mapping_error():
+by using dma_mapping_error()::

 dma_addr_t dma_handle;

@@ -773,7 +788,8 @@ failure can be determined by:
 of a multiple page mapping attempt. These example are applicable to
 dma_map_page() as well.

-Example 1:
+Example 1::
+
 dma_addr_t dma_handle1;
 dma_addr_t dma_handle2;

@@ -802,8 +818,12 @@ Example 1:
 dma_unmap_single(dma_handle1);
 map_error_handling1:

-Example 2: (if buffers are allocated in a loop, unmap all mapped buffers when
-mapping error is detected in the middle)
+Example 2::
+
+/*
+* if buffers are allocated in a loop, unmap all mapped buffers when
+* mapping error is detected in the middle
+*/

 dma_addr_t dma_addr;
 dma_addr_t array[DMA_BUFFERS];
@@ -846,7 +866,8 @@ SCSI drivers must return SCSI_MLQUEUE_HOST_BUSY if the DMA mapping
 fails in the queuecommand hook. This means that the SCSI subsystem
 passes the command to the driver again later.

-Optimizing Unmap State Space Consumption
+Optimizing Unmap State Space Consumption
+========================================

 On many platforms, dma_unmap_{single,page}() is simply a nop.
 Therefore, keeping track of the mapping address and length is a waste
@@ -858,7 +879,7 @@ Actually, instead of describing the macros one by one, we'll
 transform some example code.

 1) Use DEFINE_DMA_UNMAP_{ADDR,LEN} in state saving structures.
-Example, before:
+Example, before::

 struct ring_state {
 struct sk_buff *skb;
@@ -866,7 +887,7 @@ transform some example code.
 __u32 len;
 };

-after:
+after::

 struct ring_state {
 struct sk_buff *skb;
@@ -875,23 +896,23 @@ transform some example code.
 };

 2) Use dma_unmap_{addr,len}_set() to set these values.
-Example, before:
+Example, before::

 ringp->mapping = FOO;
 ringp->len = BAR;

-after:
+after::

 dma_unmap_addr_set(ringp, mapping, FOO);
 dma_unmap_len_set(ringp, len, BAR);

 3) Use dma_unmap_{addr,len}() to access these values.
-Example, before:
+Example, before::

 dma_unmap_single(dev, ringp->mapping, ringp->len,
 DMA_FROM_DEVICE);

-after:
+after::

 dma_unmap_single(dev,
 dma_unmap_addr(ringp, mapping),
@@ -902,7 +923,8 @@ It really should be self-explanatory. We treat the ADDR and LEN
 separately, because it is possible for an implementation to only
 need the address in order to perform the unmap operation.

-Platform Issues
+Platform Issues
+===============

 If you are just writing drivers for Linux and do not maintain
 an architecture port for the kernel, you can safely skip down
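Note on the unmap-state macros discussed above: the point of DEFINE_DMA_UNMAP_{ADDR,LEN} and the dma_unmap_{addr,len}() accessors is that the saved state can compile away on platforms where unmapping is a nop. The user-space model below sketches one way such macros can be arranged. It is not the kernel header: the switch MODEL_NEED_DMA_MAP_STATE, the dma_addr_t typedef, and the void pointer standing in for struct sk_buff are all assumptions made so the example builds on its own.

```c
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* stand-in for the kernel type */

#ifndef MODEL_NEED_DMA_MAP_STATE
#define MODEL_NEED_DMA_MAP_STATE 1	/* hypothetical switch for this model */
#endif

#if MODEL_NEED_DMA_MAP_STATE
/* State is needed: the macros expand to real members and accesses. */
#define DEFINE_DMA_UNMAP_ADDR(name)		dma_addr_t name
#define DEFINE_DMA_UNMAP_LEN(name)		uint32_t name
#define dma_unmap_addr(ptr, name)		((ptr)->name)
#define dma_unmap_addr_set(ptr, name, val)	(((ptr)->name) = (val))
#define dma_unmap_len(ptr, name)		((ptr)->name)
#define dma_unmap_len_set(ptr, name, val)	(((ptr)->name) = (val))
#else
/* State is not needed: members vanish and accessors become no-ops. */
#define DEFINE_DMA_UNMAP_ADDR(name)
#define DEFINE_DMA_UNMAP_LEN(name)
#define dma_unmap_addr(ptr, name)		(0)
#define dma_unmap_addr_set(ptr, name, val)	do { } while (0)
#define dma_unmap_len(ptr, name)		(0)
#define dma_unmap_len_set(ptr, name, val)	do { } while (0)
#endif

struct ring_state {
	void *skb;	/* struct sk_buff * in a real driver */
	DEFINE_DMA_UNMAP_ADDR(mapping);
	DEFINE_DMA_UNMAP_LEN(len);
};
```

Because the address and the length have separate macros, a model like this could keep one and drop the other, which is the reason the text gives for treating ADDR and LEN separately.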
@@ -928,12 +950,13 @@ to "Closing".
 alignment constraints (e.g. the alignment constraints about 64-bit
 objects).

-Closing
+Closing
+=======

 This document, and the API itself, would not be in its current
 form without the feedback and suggestions from numerous individuals.
 We would like to specifically mention, in no particular order, the
-following people:
+following people::

 Russell King <rmk@arm.linux.org.uk>
 Leo Dagum <dagum@barrel.engr.sgi.com>

+320
-242
File diff suppressed because it is too large
@@ -1,19 +1,20 @@
-DMA with ISA and LPC devices
-============================
+============================
+DMA with ISA and LPC devices
+============================

-Pierre Ossman <drzeus@drzeus.cx>
+:Author: Pierre Ossman <drzeus@drzeus.cx>

 This document describes how to do DMA transfers using the old ISA DMA
 controller. Even though ISA is more or less dead today the LPC bus
 uses the same DMA system so it will be around for quite some time.

-Part I - Headers and dependencies
----------------------------------
+Headers and dependencies
+------------------------

-To do ISA style DMA you need to include two headers:
+To do ISA style DMA you need to include two headers::

-#include <linux/dma-mapping.h>
-#include <asm/dma.h>
+#include <linux/dma-mapping.h>
+#include <asm/dma.h>

 The first is the generic DMA API used to convert virtual addresses to
 bus addresses (see Documentation/DMA-API.txt for details).
@@ -23,8 +24,8 @@ this is not present on all platforms make sure you construct your
 Kconfig to be dependent on ISA_DMA_API (not ISA) so that nobody tries
 to build your driver on unsupported platforms.

-Part II - Buffer allocation
----------------------------
+Buffer allocation
+-----------------

 The ISA DMA controller has some very strict requirements on which
 memory it can access so extra care must be taken when allocating
@@ -42,13 +43,13 @@ requirements you pass the flag GFP_DMA to kmalloc.

 Unfortunately the memory available for ISA DMA is scarce so unless you
 allocate the memory during boot-up it's a good idea to also pass
-__GFP_REPEAT and __GFP_NOWARN to make the allocator try a bit harder.
+__GFP_RETRY_MAYFAIL and __GFP_NOWARN to make the allocator try a bit harder.

 (This scarcity also means that you should allocate the buffer as
 early as possible and not release it until the driver is unloaded.)

-Part III - Address translation
-------------------------------
+Address translation
+-------------------

 To translate the virtual address to a bus address, use the normal DMA
 API. Do _not_ use isa_virt_to_phys() even though it does the same
@@ -61,8 +62,8 @@ Note: x86_64 had a broken DMA API when it came to ISA but has since
 been fixed. If your arch has problems then fix the DMA API instead of
 reverting to the ISA functions.

-Part IV - Channels
-------------------
+Channels
+--------

 A normal ISA DMA controller has 8 channels. The lower four are for
 8-bit transfers and the upper four are for 16-bit transfers.
@@ -80,8 +81,8 @@ The ability to use 16-bit or 8-bit transfers is _not_ up to you as a
 driver author but depends on what the hardware supports. Check your
 specs or test different channels.

-Part V - Transfer data
-----------------------
+Transfer data
+-------------

 Now for the good stuff, the actual DMA transfer. :)

@@ -112,37 +113,37 @@ Once the DMA transfer is finished (or timed out) you should disable
 the channel again. You should also check get_dma_residue() to make
 sure that all data has been transferred.

-Example:
+Example::

-int flags, residue;
+int flags, residue;

-flags = claim_dma_lock();
+flags = claim_dma_lock();

-clear_dma_ff();
+clear_dma_ff();

-set_dma_mode(channel, DMA_MODE_WRITE);
-set_dma_addr(channel, phys_addr);
-set_dma_count(channel, num_bytes);
+set_dma_mode(channel, DMA_MODE_WRITE);
+set_dma_addr(channel, phys_addr);
+set_dma_count(channel, num_bytes);

-dma_enable(channel);
+dma_enable(channel);

-release_dma_lock(flags);
+release_dma_lock(flags);

-while (!device_done());
+while (!device_done());

-flags = claim_dma_lock();
+flags = claim_dma_lock();

-dma_disable(channel);
+dma_disable(channel);

-residue = dma_get_residue(channel);
-if (residue != 0)
-printk(KERN_ERR "driver: Incomplete DMA transfer!"
-" %d bytes left!\n", residue);
+residue = dma_get_residue(channel);
+if (residue != 0)
+printk(KERN_ERR "driver: Incomplete DMA transfer!"
+" %d bytes left!\n", residue);

-release_dma_lock(flags);
+release_dma_lock(flags);

-Part VI - Suspend/resume
-------------------------
+Suspend/resume
+--------------

 It is the driver's responsibility to make sure that the machine isn't
 suspended while a DMA transfer is in progress. Also, all DMA settings

@@ -1,5 +1,6 @@
-DMA attributes
-==============
+==============
+DMA attributes
+==============

 This document describes the semantics of the DMA attributes that are
 defined in linux/dma-mapping.h.
@@ -108,6 +109,7 @@ This is a hint to the DMA-mapping subsystem that it's probably not worth
 the time to try to allocate memory to in a way that gives better TLB
 efficiency (AKA it's not worth trying to build the mapping out of larger
 pages). You might want to specify this if:
+
 - You know that the accesses to this memory won't thrash the TLB.
 You might know that the accesses are likely to be sequential or
 that they aren't sequential but it's unlikely you'll ping-pong
@@ -121,11 +123,12 @@ pages). You might want to specify this if:
 the mapping to have a short lifetime then it may be worth it to
 optimize allocation (avoid coming up with large pages) instead of
 getting the slight performance win of larger pages.
+
 Setting this hint doesn't guarantee that you won't get huge pages, but it
 means that we won't try quite as hard to get them.

-NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
-though ARM64 patches will likely be posted soon.
+.. note:: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
+though ARM64 patches will likely be posted soon.

 DMA_ATTR_NO_WARN
 ----------------
@@ -142,10 +145,10 @@ problem at all, depending on the implementation of the retry mechanism.
 So, this provides a way for drivers to avoid those error messages on calls
 where allocation failures are not a problem, and shouldn't bother the logs.

-NOTE: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC.
+.. note:: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC.

 DMA_ATTR_PRIVILEGED
--------------------------------
+-------------------

 Some advanced peripherals such as remote processors and GPUs perform
 accesses to DMA buffers in both privileged "supervisor" and unprivileged

+44
-32
@@ -1,9 +1,8 @@
+=====================
+The Linux IPMI Driver
+=====================

-The Linux IPMI Driver
----------------------
-Corey Minyard
-<minyard@mvista.com>
-<minyard@acm.org>
+:Author: Corey Minyard <minyard@mvista.com> / <minyard@acm.org>

 The Intelligent Platform Management Interface, or IPMI, is a
 standard for controlling intelligent devices that monitor a system.
@@ -141,7 +140,7 @@ Addressing
 ----------

 The IPMI addressing works much like IP addresses, you have an overlay
-to handle the different address types. The overlay is:
+to handle the different address types. The overlay is::

 struct ipmi_addr
 {
@@ -153,7 +152,7 @@ to handle the different address types. The overlay is:
 The addr_type determines what the address really is. The driver
 currently understands two different types of addresses.

-"System Interface" addresses are defined as:
+"System Interface" addresses are defined as::

 struct ipmi_system_interface_addr
 {
@@ -166,7 +165,7 @@ straight to the BMC on the current card. The channel must be
 IPMI_BMC_CHANNEL.

 Messages that are destined to go out on the IPMB bus use the
-IPMI_IPMB_ADDR_TYPE address type. The format is
+IPMI_IPMB_ADDR_TYPE address type. The format is::

 struct ipmi_ipmb_addr
 {
@@ -184,16 +183,16 @@ spec.
 Messages
 --------

-Messages are defined as:
+Messages are defined as::

-struct ipmi_msg
-{
+struct ipmi_msg
+{
 unsigned char netfn;
 unsigned char lun;
 unsigned char cmd;
 unsigned char *data;
 int data_len;
-};
+};

 The driver takes care of adding/stripping the header information. The
 data portion is just the data to be send (do NOT put addressing info
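Note on the message layout above: as an illustration of "the data portion is just the data", the user-space sketch below fills in a request that carries no data at all. The struct is a local mirror of the listing quoted in the hunk, not the kernel header, and the names ipmi_msg_sketch, NETFN_APP_REQUEST, CMD_GET_DEVICE_ID and make_get_device_id() are invented for this example. The numeric values used are the IPMI Application network function (0x06) and the Get Device ID command (0x01).

```c
#include <stddef.h>

/* Local mirror of the layout quoted in the hunk; not the kernel header. */
struct ipmi_msg_sketch {
	unsigned char netfn;
	unsigned char lun;
	unsigned char cmd;
	unsigned char *data;
	int data_len;
};

#define NETFN_APP_REQUEST 0x06	/* IPMI "Application" request netfn */
#define CMD_GET_DEVICE_ID 0x01	/* Get Device ID: no request data */

/* Build a Get Device ID request; addressing is supplied separately. */
static struct ipmi_msg_sketch make_get_device_id(void)
{
	struct ipmi_msg_sketch msg = {
		.netfn = NETFN_APP_REQUEST,
		.lun = 0,
		.cmd = CMD_GET_DEVICE_ID,
		.data = NULL,
		.data_len = 0,
	};

	return msg;
}
```

No address bytes appear anywhere in the message; in the scheme the document describes, those travel in the separate address structure.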
@@ -208,7 +207,7 @@ block of data, even when receiving messages. Otherwise the driver
 will have no place to put the message.

 Messages coming up from the message handler in kernelland will come in
-as:
+as::

 struct ipmi_recv_msg
 {
@@ -246,6 +245,7 @@ and the user should not have to care what type of SMI is below them.


 Watching For Interfaces
+^^^^^^^^^^^^^^^^^^^^^^^

 When your code comes up, the IPMI driver may or may not have detected
 if IPMI devices exist. So you might have to defer your setup until
@@ -256,6 +256,7 @@ and tell you when they come and go.


 Creating the User
+^^^^^^^^^^^^^^^^^

 To use the message handler, you must first create a user using
 ipmi_create_user. The interface number specifies which SMI you want
@@ -272,6 +273,7 @@ closing the device automatically destroys the user.


 Messaging
+^^^^^^^^^

 To send a message from kernel-land, the ipmi_request_settime() call does
 pretty much all message handling. Most of the parameter are
@@ -321,6 +323,7 @@ though, since it is tricky to manage your own buffers.


 Events and Incoming Commands
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The driver takes care of polling for IPMI events and receiving
 commands (commands are messages that are not responses, they are
@@ -367,7 +370,7 @@ in the system. It discovers interfaces through a host of different
 methods, depending on the system.

 You can specify up to four interfaces on the module load line and
-control some module parameters:
+control some module parameters::

 modprobe ipmi_si.o type=<type1>,<type2>....
 ports=<port1>,<port2>... addrs=<addr1>,<addr2>...
@@ -437,7 +440,7 @@ default is one. Setting to 0 is useful with the hotmod, but is
 obviously only useful for modules.

 When compiled into the kernel, the parameters can be specified on the
-kernel command line as:
+kernel command line as::

 ipmi_si.type=<type1>,<type2>...
 ipmi_si.ports=<port1>,<port2>... ipmi_si.addrs=<addr1>,<addr2>...
@@ -474,16 +477,22 @@ The driver supports a hot add and remove of interfaces. This way,
|
||||
interfaces can be added or removed after the kernel is up and running.
|
||||
This is done using /sys/modules/ipmi_si/parameters/hotmod, which is a
|
||||
write-only parameter. You write a string to this interface. The string
|
||||
has the format:
|
||||
has the format::
|
||||
|
||||
<op1>[:op2[:op3...]]
|
||||
The "op"s are:
|
||||
|
||||
The "op"s are::
|
||||
|
||||
add|remove,kcs|bt|smic,mem|i/o,<address>[,<opt1>[,<opt2>[,...]]]
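As a rough sketch of the format above, the helper below assembles one "op" from its four mandatory fields. The function name ``hotmod_op`` and the address 0xca2 are illustrative assumptions, not part of the driver; the resulting string is what would be written to the hotmod parameter file.

```shell
# Hypothetical helper: build one hotmod "op" string.
# $1 = add|remove, $2 = kcs|bt|smic, $3 = mem|i/o, $4 = address
hotmod_op() {
    printf '%s,%s,%s,%s\n' "$1" "$2" "$3" "$4"
}

# Example with an assumed KCS interface at I/O port 0xca2.
hotmod_op add kcs i/o 0xca2
```

Several ops can be joined with ":" before writing them, as the format line shows.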
|
||||
You can specify more than one interface on the line. The "opt"s are:
|
||||
|
||||
You can specify more than one interface on the line. The "opt"s are::
|
||||
|
||||
rsp=<regspacing>
|
||||
rsi=<regsize>
|
||||
rsh=<regshift>
|
||||
irq=<irq>
|
||||
ipmb=<ipmb slave addr>
|
||||
|
||||
and these have the same meanings as discussed above. Note that you
|
||||
can also use this on the kernel command line for a more compact format
|
||||
for specifying an interface. Note that when removing an interface,
|
||||
@@ -496,7 +505,7 @@ The SMBus Driver (SSIF)
|
||||
The SMBus driver allows up to 4 SMBus devices to be configured in the
|
||||
system. By default, the driver will only register with something it
|
||||
finds in DMI or ACPI tables. You can change this
|
||||
at module load time (for a module) with:
|
||||
at module load time (for a module) with::
|
||||
|
||||
modprobe ipmi_ssif.o
|
||||
addr=<i2caddr1>[,<i2caddr2>[,...]]
|
||||
@@ -535,7 +544,7 @@ the smb_addr parameter unless you have DMI or ACPI data to tell the
|
||||
driver what to use.
|
||||
|
||||
When compiled into the kernel, the addresses can be specified on the
|
||||
kernel command line as:
|
||||
kernel command line as::
|
||||
|
||||
ipmi_ssif.addr=<i2caddr1>[,<i2caddr2>[...]]
|
||||
ipmi_ssif.adapter=<adapter1>[,<adapter2>[...]]
|
||||
@@ -565,9 +574,9 @@ Some users need more detailed information about a device, like where
|
||||
the address came from or the raw base device for the IPMI interface.
|
||||
You can use the IPMI smi_watcher to catch the IPMI interfaces as they
|
||||
come or go, and to grab the information, you can use the function
|
||||
ipmi_get_smi_info(), which returns the following structure:
|
||||
ipmi_get_smi_info(), which returns the following structure::
|
||||
|
||||
struct ipmi_smi_info {
|
||||
struct ipmi_smi_info {
|
||||
enum ipmi_addr_src addr_src;
|
||||
struct device *dev;
|
||||
union {
|
||||
@@ -575,7 +584,7 @@ struct ipmi_smi_info {
|
||||
void *acpi_handle;
|
||||
} acpi_info;
|
||||
} addr_info;
|
||||
};
|
||||
};
|
||||
|
||||
Currently special info only for SI_ACPI address sources is
|
||||
returned. Others may be added as necessary.
|
||||
@@ -590,7 +599,7 @@ Watchdog
|
||||
|
||||
A watchdog timer is provided that implements the Linux-standard
|
||||
watchdog timer interface. It has three module parameters that can be
|
||||
used to control it:
|
||||
used to control it::
|
||||
|
||||
modprobe ipmi_watchdog timeout=<t> pretimeout=<t> action=<action type>
|
||||
preaction=<preaction type> preop=<preop type> start_now=x
|
||||
@@ -635,7 +644,7 @@ watchdog device is closed. The default value of nowayout is true
|
||||
if the CONFIG_WATCHDOG_NOWAYOUT option is enabled, or false if not.
|
||||
|
||||
When compiled into the kernel, the kernel command line is available
|
||||
for configuring the watchdog:
|
||||
for configuring the watchdog::
|
||||
|
||||
ipmi_watchdog.timeout=<t> ipmi_watchdog.pretimeout=<t>
|
||||
ipmi_watchdog.action=<action type>
|
||||
@@ -675,6 +684,7 @@ also get a bunch of OEM events holding the panic string.
|
||||
|
||||
|
||||
The field settings of the events are:
|
||||
|
||||
* Generator ID: 0x21 (kernel)
|
||||
* EvM Rev: 0x03 (this event is formatted in IPMI 1.0 format)
|
||||
* Sensor Type: 0x20 (OS critical stop sensor)
|
||||
@@ -683,18 +693,20 @@ The field settings of the events are:
|
||||
* Event Data 1: 0xa1 (Runtime stop in OEM bytes 2 and 3)
|
||||
* Event data 2: second byte of panic string
|
||||
* Event data 3: third byte of panic string
|
||||
|
||||
See the IPMI spec for the details of the event layout. This event is
|
||||
always sent to the local management controller. It will handle routing
|
||||
the message to the right place.
|
||||
|
||||
Other OEM events have the following format:
|
||||
Record ID (bytes 0-1): Set by the SEL.
|
||||
Record type (byte 2): 0xf0 (OEM non-timestamped)
|
||||
byte 3: The slave address of the card saving the panic
|
||||
byte 4: A sequence number (starting at zero)
|
||||
The rest of the bytes (11 bytes) are the panic string. If the panic string
|
||||
is longer than 11 bytes, multiple messages will be sent with increasing
|
||||
sequence numbers.
|
||||
|
||||
* Record ID (bytes 0-1): Set by the SEL.
|
||||
* Record type (byte 2): 0xf0 (OEM non-timestamped)
|
||||
* byte 3: The slave address of the card saving the panic
|
||||
* byte 4: A sequence number (starting at zero)
|
||||
The rest of the bytes (11 bytes) are the panic string. If the panic string
|
||||
is longer than 11 bytes, multiple messages will be sent with increasing
|
||||
sequence numbers.
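The chunking described above can be sketched as follows. This is only an illustration of the 11-byte split with increasing sequence numbers, not the driver's code; the function name ``split_panic_string`` is hypothetical, and the sketch assumes a single-line ASCII string.

```shell
# Hypothetical helper: print "sequence: chunk" for each 11-byte piece
# of the string given in $1 (assumed to be one line of ASCII text).
split_panic_string() {
    rest=$1
    n=0
    while [ -n "$rest" ]; do
        # First 11 characters go into this message.
        chunk=$(printf '%s' "$rest" | cut -c1-11)
        printf '%d: %s\n' "$n" "$chunk"
        # Everything from character 12 onward is left for later messages.
        rest=$(printf '%s' "$rest" | cut -c12-)
        n=$(( n + 1 ))
    done
}

split_panic_string "abcdefghijklmnopqrstuvwxyz"
```

A 26-byte string therefore needs three messages, numbered 0 through 2.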
|
||||
|
||||
Because you cannot send OEM events using the standard interface, this
|
||||
function will attempt to find an SEL and add the events there. It
|
||||
|
||||
@@ -1,8 +1,11 @@
|
||||
ChangeLog:
|
||||
Started by Ingo Molnar <mingo@redhat.com>
|
||||
Update by Max Krasnyansky <maxk@qualcomm.com>
|
||||
|
||||
================
|
||||
SMP IRQ affinity
|
||||
================
|
||||
|
||||
ChangeLog:
|
||||
- Started by Ingo Molnar <mingo@redhat.com>
|
||||
- Update by Max Krasnyansky <maxk@qualcomm.com>
|
||||
|
||||
|
||||
/proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify
|
||||
which target CPUs are permitted for a given IRQ source. It's a bitmask
|
||||
@@ -16,50 +19,52 @@ will be set to the default mask. It can then be changed as described above.
|
||||
Default mask is 0xffffffff.
|
||||
|
||||
Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
|
||||
it to CPU4-7 (this is an 8-CPU SMP box):
|
||||
it to CPU4-7 (this is an 8-CPU SMP box)::
|
||||
|
||||
[root@moon 44]# cd /proc/irq/44
|
||||
[root@moon 44]# cat smp_affinity
|
||||
ffffffff
|
||||
[root@moon 44]# cd /proc/irq/44
|
||||
[root@moon 44]# cat smp_affinity
|
||||
ffffffff
|
||||
|
||||
[root@moon 44]# echo 0f > smp_affinity
|
||||
[root@moon 44]# cat smp_affinity
|
||||
0000000f
|
||||
[root@moon 44]# ping -f h
|
||||
PING hell (195.4.7.3): 56 data bytes
|
||||
...
|
||||
--- hell ping statistics ---
|
||||
6029 packets transmitted, 6027 packets received, 0% packet loss
|
||||
round-trip min/avg/max = 0.1/0.1/0.4 ms
|
||||
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
|
||||
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||
44: 1068 1785 1785 1783 0 0 0 0 IO-APIC-level eth1
|
||||
[root@moon 44]# echo 0f > smp_affinity
|
||||
[root@moon 44]# cat smp_affinity
|
||||
0000000f
|
||||
[root@moon 44]# ping -f h
|
||||
PING hell (195.4.7.3): 56 data bytes
|
||||
...
|
||||
--- hell ping statistics ---
|
||||
6029 packets transmitted, 6027 packets received, 0% packet loss
|
||||
round-trip min/avg/max = 0.1/0.1/0.4 ms
|
||||
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
|
||||
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||
44: 1068 1785 1785 1783 0 0 0 0 IO-APIC-level eth1
|
||||
|
||||
As can be seen from the line above IRQ44 was delivered only to the first four
|
||||
processors (0-3).
|
||||
Now let's restrict that IRQ to CPU(4-7).
|
||||
|
||||
[root@moon 44]# echo f0 > smp_affinity
|
||||
[root@moon 44]# cat smp_affinity
|
||||
000000f0
|
||||
[root@moon 44]# ping -f h
|
||||
PING hell (195.4.7.3): 56 data bytes
|
||||
..
|
||||
--- hell ping statistics ---
|
||||
2779 packets transmitted, 2777 packets received, 0% packet loss
|
||||
round-trip min/avg/max = 0.1/0.5/585.4 ms
|
||||
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
|
||||
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||
44: 1068 1785 1785 1783 1784 1069 1070 1069 IO-APIC-level eth1
|
||||
::
|
||||
|
||||
[root@moon 44]# echo f0 > smp_affinity
|
||||
[root@moon 44]# cat smp_affinity
|
||||
000000f0
|
||||
[root@moon 44]# ping -f h
|
||||
PING hell (195.4.7.3): 56 data bytes
|
||||
..
|
||||
--- hell ping statistics ---
|
||||
2779 packets transmitted, 2777 packets received, 0% packet loss
|
||||
round-trip min/avg/max = 0.1/0.5/585.4 ms
|
||||
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
|
||||
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||
44: 1068 1785 1785 1783 1784 1069 1070 1069 IO-APIC-level eth1
|
||||
|
||||
This time around IRQ44 was delivered only to the last four processors.
|
||||
i.e. the counters for CPU0-3 did not change.
|
||||
|
||||
Here is an example of limiting that same irq (44) to cpus 1024 to 1031:
|
||||
Here is an example of limiting that same irq (44) to cpus 1024 to 1031::
|
||||
|
||||
[root@moon 44]# echo 1024-1031 > smp_affinity_list
|
||||
[root@moon 44]# cat smp_affinity_list
|
||||
1024-1031
|
||||
[root@moon 44]# echo 1024-1031 > smp_affinity_list
|
||||
[root@moon 44]# cat smp_affinity_list
|
||||
1024-1031
|
||||
|
||||
Note that to do this with a bitmask would require 32 bitmasks of zero
|
||||
to follow the pertinent one.
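For small CPU numbers, the hex value written to smp_affinity can be derived from a CPU range with shell arithmetic. The helper below is a minimal sketch; the name ``cpu_range_to_mask`` is hypothetical, and it only works for CPU numbers up to 62 because shell arithmetic is typically limited to 64-bit signed integers. Larger ranges are better expressed through smp_affinity_list.

```shell
# Hypothetical helper: print the hex bitmask covering CPUs $1 through $2.
# Assumes 0 <= $1 <= $2 <= 62.
cpu_range_to_mask() {
    first=$1
    last=$2
    count=$(( last - first + 1 ))
    # "count" one-bits, shifted up to start at bit "first".
    printf '%x\n' $(( ((1 << count) - 1) << first ))
}

cpu_range_to_mask 0 3   # prints f
cpu_range_to_mask 4 7   # prints f0
```

The output is not zero-padded, which is fine for writing: the examples above write ``0f`` and read back ``0000000f``.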
|
||||
|
||||
@@ -1,4 +1,6 @@
|
||||
irq_domain interrupt number mapping library
|
||||
===============================================
|
||||
The irq_domain interrupt number mapping library
|
||||
===============================================
|
||||
|
||||
The current design of the Linux kernel uses a single large number
|
||||
space where each separate IRQ source is assigned a different number.
|
||||
@@ -36,7 +38,9 @@ irq_domain also implements translation from an abstract irq_fwspec
|
||||
structure to hwirq numbers (Device Tree and ACPI GSI so far), and can
|
||||
be easily extended to support other IRQ topology data sources.
|
||||
|
||||
=== irq_domain usage ===
|
||||
irq_domain usage
|
||||
================
|
||||
|
||||
An interrupt controller driver creates and registers an irq_domain by
|
||||
calling one of the irq_domain_add_*() functions (each mapping method
|
||||
has a different allocator function, more on that later). The function
|
||||
@@ -62,15 +66,21 @@ If the driver has the Linux IRQ number or the irq_data pointer, and
|
||||
needs to know the associated hwirq number (such as in the irq_chip
|
||||
callbacks) then it can be directly obtained from irq_data->hwirq.
|
||||
|
||||
=== Types of irq_domain mappings ===
|
||||
Types of irq_domain mappings
|
||||
============================
|
||||
|
||||
There are several mechanisms available for reverse mapping from hwirq
|
||||
to Linux irq, and each mechanism uses a different allocation function.
|
||||
Which reverse map type should be used depends on the use case. Each
|
||||
of the reverse map types is described below:
|
||||
|
||||
==== Linear ====
|
||||
irq_domain_add_linear()
|
||||
irq_domain_create_linear()
|
||||
Linear
|
||||
------
|
||||
|
||||
::
|
||||
|
||||
irq_domain_add_linear()
|
||||
irq_domain_create_linear()
|
||||
|
||||
The linear reverse map maintains a fixed size table indexed by the
|
||||
hwirq number. When a hwirq is mapped, an irq_desc is allocated for
|
||||
@@ -89,9 +99,13 @@ accepts a more general abstraction 'struct fwnode_handle'.
|
||||
|
||||
The majority of drivers should use the linear map.
|
||||
|
||||
==== Tree ====
|
||||
irq_domain_add_tree()
|
||||
irq_domain_create_tree()
|
||||
Tree
|
||||
----
|
||||
|
||||
::
|
||||
|
||||
irq_domain_add_tree()
|
||||
irq_domain_create_tree()
|
||||
|
||||
The irq_domain maintains a radix tree map from hwirq numbers to Linux
|
||||
IRQs. When an hwirq is mapped, an irq_desc is allocated and the
|
||||
@@ -109,8 +123,12 @@ accepts a more general abstraction 'struct fwnode_handle'.
|
||||
|
||||
Very few drivers should need this mapping.
|
||||
|
||||
==== No Map ===-
|
||||
irq_domain_add_nomap()
|
||||
No Map
|
||||
------
|
||||
|
||||
::
|
||||
|
||||
irq_domain_add_nomap()
|
||||
|
||||
The No Map mapping is to be used when the hwirq number is
|
||||
programmable in the hardware. In this case it is best to program the
|
||||
@@ -121,10 +139,14 @@ Linux IRQ number into the hardware.
|
||||
|
||||
Most drivers cannot use this mapping.
|
||||
|
||||
==== Legacy ====
|
||||
irq_domain_add_simple()
|
||||
irq_domain_add_legacy()
|
||||
irq_domain_add_legacy_isa()
|
||||
Legacy
|
||||
------
|
||||
|
||||
::
|
||||
|
||||
irq_domain_add_simple()
|
||||
irq_domain_add_legacy()
|
||||
irq_domain_add_legacy_isa()
|
||||
|
||||
The Legacy mapping is a special case for drivers that already have a
|
||||
range of irq_descs allocated for the hwirqs. It is used when the
|
||||
@@ -163,14 +185,17 @@ that the driver using the simple domain call irq_create_mapping()
|
||||
before any irq_find_mapping() since the latter will actually work
|
||||
for the static IRQ assignment case.
|
||||
|
||||
==== Hierarchy IRQ domain ====
|
||||
Hierarchy IRQ domain
|
||||
--------------------
|
||||
|
||||
On some architectures, there may be multiple interrupt controllers
|
||||
involved in delivering an interrupt from the device to the target CPU.
|
||||
Let's look at a typical interrupt delivering path on x86 platforms:
|
||||
Let's look at a typical interrupt delivering path on x86 platforms::
|
||||
|
||||
Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU
|
||||
Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU
|
||||
|
||||
There are three interrupt controllers involved:
|
||||
|
||||
1) IOAPIC controller
|
||||
2) Interrupt remapping controller
|
||||
3) Local APIC controller
|
||||
@@ -180,7 +205,8 @@ hardware architecture, an irq_domain data structure is built for each
|
||||
interrupt controller and those irq_domains are organized into hierarchy.
|
||||
When building irq_domain hierarchy, the irq_domain near to the device is
|
||||
child and the irq_domain near to CPU is parent. So a hierarchy structure
|
||||
as below will be built for the example above.
|
||||
as below will be built for the example above::
|
||||
|
||||
CPU Vector irq_domain (root irq_domain to manage CPU vectors)
|
||||
^
|
||||
|
|
||||
@@ -190,6 +216,7 @@ as below will be built for the example above.
|
||||
IOAPIC irq_domain (manage IOAPIC delivery entries/pins)
|
||||
|
||||
There are four major interfaces to use hierarchy irq_domain:
|
||||
|
||||
1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt
|
||||
controller related resources to deliver these interrupts.
|
||||
2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller
|
||||
@@ -199,7 +226,8 @@ There are four major interfaces to use hierarchy irq_domain:
|
||||
4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
|
||||
to stop delivering the interrupt.
|
||||
|
||||
Following changes are needed to support hierarchy irq_domain.
|
||||
Following changes are needed to support hierarchy irq_domain:
|
||||
|
||||
1) a new field 'parent' is added to struct irq_domain; it's used to
|
||||
maintain irq_domain hierarchy information.
|
||||
2) a new field 'parent_data' is added to struct irq_data; it's used to
|
||||
@@ -223,6 +251,7 @@ software architecture.
|
||||
|
||||
For an interrupt controller driver to support hierarchy irq_domain, it
|
||||
needs to:
|
||||
|
||||
1) Implement irq_domain_ops.alloc and irq_domain_ops.free
|
||||
2) Optionally implement irq_domain_ops.activate and
|
||||
irq_domain_ops.deactivate.
|
||||
|
||||
@@ -1,4 +1,6 @@
|
||||
===============
|
||||
What is an IRQ?
|
||||
===============
|
||||
|
||||
An IRQ is an interrupt request from a device.
|
||||
Currently they can come in over a pin, or over a packet.
|
||||
|
||||
@@ -1,3 +1,4 @@
|
||||
===================
|
||||
Linux IOMMU Support
|
||||
===================
|
||||
|
||||
@@ -9,11 +10,11 @@ This guide gives a quick cheat sheet for some basic understanding.
|
||||
|
||||
Some Keywords
|
||||
|
||||
DMAR - DMA remapping
|
||||
DRHD - DMA Remapping Hardware Unit Definition
|
||||
RMRR - Reserved memory Region Reporting Structure
|
||||
ZLR - Zero length reads from PCI devices
|
||||
IOVA - IO Virtual address.
|
||||
- DMAR - DMA remapping
|
||||
- DRHD - DMA Remapping Hardware Unit Definition
|
||||
- RMRR - Reserved memory Region Reporting Structure
|
||||
- ZLR - Zero length reads from PCI devices
|
||||
- IOVA - IO Virtual address.
|
||||
|
||||
Basic stuff
|
||||
-----------
|
||||
@@ -33,7 +34,7 @@ devices that need to access these regions. OS is expected to setup
|
||||
unity mappings for these regions for these devices to access these regions.
|
||||
|
||||
How is IOVA generated?
|
||||
---------------------
|
||||
----------------------
|
||||
|
||||
Well-behaved drivers call pci_map_*() before sending a command to a device
|
||||
that needs to perform DMA. Once DMA is completed and mapping is no longer
|
||||
@@ -82,14 +83,14 @@ in ACPI.
|
||||
ACPI: DMAR (v001 A M I OEMDMAR 0x00000001 MSFT 0x00000097) @ 0x000000007f5b5ef0
|
||||
|
||||
When DMAR is being processed and initialized by ACPI, prints DMAR locations
|
||||
and any RMRR's processed.
|
||||
and any RMRR's processed::
|
||||
|
||||
ACPI DMAR:Host address width 36
|
||||
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
|
||||
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
|
||||
ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
|
||||
ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
|
||||
ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
|
||||
ACPI DMAR:Host address width 36
|
||||
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
|
||||
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
|
||||
ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
|
||||
ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
|
||||
ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
|
||||
|
||||
When DMAR is enabled for use, you will notice..
|
||||
|
||||
@@ -98,10 +99,12 @@ PCI-DMA: Using DMAR IOMMU
|
||||
Fault reporting
|
||||
---------------
|
||||
|
||||
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
|
||||
DMAR:[fault reason 05] PTE Write access is not set
|
||||
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
|
||||
DMAR:[fault reason 05] PTE Write access is not set
|
||||
::
|
||||
|
||||
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
|
||||
DMAR:[fault reason 05] PTE Write access is not set
|
||||
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
|
||||
DMAR:[fault reason 05] PTE Write access is not set
|
||||
|
||||
TBD
|
||||
----
|
||||
|
||||
+34
-31
@@ -1,5 +1,9 @@
|
||||
Linux 2.4.2 Secure Attention Key (SAK) handling
|
||||
18 March 2001, Andrew Morton
|
||||
=========================================
|
||||
Linux Secure Attention Key (SAK) handling
|
||||
=========================================
|
||||
|
||||
:Date: 18 March 2001
|
||||
:Author: Andrew Morton
|
||||
|
||||
An operating system's Secure Attention Key is a security tool which is
|
||||
provided as protection against trojan password capturing programs. It
|
||||
@@ -13,7 +17,7 @@ this sequence. It is only available if the kernel was compiled with
|
||||
sysrq support.
|
||||
|
||||
The proper way of generating a SAK is to define the key sequence using
|
||||
`loadkeys'. This will work whether or not sysrq support is compiled
|
||||
``loadkeys``. This will work whether or not sysrq support is compiled
|
||||
into the kernel.
|
||||
|
||||
SAK works correctly when the keyboard is in raw mode. This means that
|
||||
@@ -25,64 +29,63 @@ What key sequence should you use? Well, CTRL-ALT-DEL is used to reboot
|
||||
the machine. CTRL-ALT-BACKSPACE is magical to the X server. We'll
|
||||
choose CTRL-ALT-PAUSE.
|
||||
|
||||
In your rc.sysinit (or rc.local) file, add the command
|
||||
In your rc.sysinit (or rc.local) file, add the command::
|
||||
|
||||
echo "control alt keycode 101 = SAK" | /bin/loadkeys
|
||||
|
||||
And that's it! Only the superuser may reprogram the SAK key.
|
||||
|
||||
|
||||
NOTES
|
||||
=====
|
||||
.. note::
|
||||
|
||||
1: Linux SAK is said to be not a "true SAK" as is required by
|
||||
systems which implement C2 level security. This author does not
|
||||
know why.
|
||||
1. Linux SAK is said to be not a "true SAK" as is required by
|
||||
systems which implement C2 level security. This author does not
|
||||
know why.
|
||||
|
||||
|
||||
2: On the PC keyboard, SAK kills all applications which have
|
||||
/dev/console opened.
|
||||
2. On the PC keyboard, SAK kills all applications which have
|
||||
/dev/console opened.
|
||||
|
||||
Unfortunately this includes a number of things which you don't
|
||||
actually want killed. This is because these applications are
|
||||
incorrectly holding /dev/console open. Be sure to complain to your
|
||||
Linux distributor about this!
|
||||
Unfortunately this includes a number of things which you don't
|
||||
actually want killed. This is because these applications are
|
||||
incorrectly holding /dev/console open. Be sure to complain to your
|
||||
Linux distributor about this!
|
||||
|
||||
You can identify processes which will be killed by SAK with the
|
||||
command
|
||||
You can identify processes which will be killed by SAK with the
|
||||
command::
|
||||
|
||||
# ls -l /proc/[0-9]*/fd/* | grep console
|
||||
l-wx------ 1 root root 64 Mar 18 00:46 /proc/579/fd/0 -> /dev/console
|
||||
|
||||
Then:
|
||||
Then::
|
||||
|
||||
# ps aux|grep 579
|
||||
root 579 0.0 0.1 1088 436 ? S 00:43 0:00 gpm -t ps/2
|
||||
|
||||
So `gpm' will be killed by SAK. This is a bug in gpm. It should
|
||||
be closing standard input. You can work around this by finding the
|
||||
initscript which launches gpm and changing it thusly:
|
||||
So ``gpm`` will be killed by SAK. This is a bug in gpm. It should
|
||||
be closing standard input. You can work around this by finding the
|
||||
initscript which launches gpm and changing it thusly:
|
||||
|
||||
Old:
|
||||
Old::
|
||||
|
||||
daemon gpm
|
||||
|
||||
New:
|
||||
New::
|
||||
|
||||
daemon gpm < /dev/null
|
||||
|
||||
Vixie cron also seems to have this problem, and needs the same treatment.
|
||||
Vixie cron also seems to have this problem, and needs the same treatment.
|
||||
|
||||
Also, one prominent Linux distribution has the following three
|
||||
lines in its rc.sysinit and rc scripts:
|
||||
Also, one prominent Linux distribution has the following three
|
||||
lines in its rc.sysinit and rc scripts::
|
||||
|
||||
exec 3<&0
|
||||
exec 4>&1
|
||||
exec 5>&2
|
||||
|
||||
These commands cause *all* daemons which are launched by the
|
||||
initscripts to have file descriptors 3, 4 and 5 attached to
|
||||
/dev/console. So SAK kills them all. A workaround is to simply
|
||||
delete these lines, but this may cause system management
|
||||
applications to malfunction - test everything well.
|
||||
These commands cause **all** daemons which are launched by the
|
||||
initscripts to have file descriptors 3, 4 and 5 attached to
|
||||
/dev/console. So SAK kills them all. A workaround is to simply
|
||||
delete these lines, but this may cause system management
|
||||
applications to malfunction - test everything well.
|
||||
|
||||
|
||||
@@ -1,7 +1,10 @@
|
||||
SM501 Driver
|
||||
============
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
Copyright 2006, 2007 Simtec Electronics
|
||||
============
|
||||
SM501 Driver
|
||||
============
|
||||
|
||||
:Copyright: |copy| 2006, 2007 Simtec Electronics
|
||||
|
||||
The Silicon Motion SM501 multimedia companion chip is a multifunction device
|
||||
which may provide numerous interfaces including USB host controller, USB gadget,
|
||||
|
||||
+107
-83
@@ -1,10 +1,15 @@
|
||||
============================
|
||||
A block layer cache (bcache)
|
||||
============================
|
||||
|
||||
Say you've got a big slow raid 6, and an ssd or three. Wouldn't it be
|
||||
nice if you could use them as cache... Hence bcache.
|
||||
|
||||
Wiki and git repositories are at:
|
||||
http://bcache.evilpiepirate.org
|
||||
http://evilpiepirate.org/git/linux-bcache.git
|
||||
http://evilpiepirate.org/git/bcache-tools.git
|
||||
|
||||
- http://bcache.evilpiepirate.org
|
||||
- http://evilpiepirate.org/git/linux-bcache.git
|
||||
- http://evilpiepirate.org/git/bcache-tools.git
|
||||
|
||||
It's designed around the performance characteristics of SSDs - it only allocates
|
||||
in erase block sized buckets, and it uses a hybrid btree/log to track cached
|
||||
@@ -37,17 +42,19 @@ to be flushed.
|
||||
|
||||
Getting started:
|
||||
You'll need make-bcache from the bcache-tools repository. Both the cache device
|
||||
and backing device must be formatted before use.
|
||||
and backing device must be formatted before use::
|
||||
|
||||
make-bcache -B /dev/sdb
|
||||
make-bcache -C /dev/sdc
|
||||
|
||||
make-bcache has the ability to format multiple devices at the same time - if
|
||||
you format your backing devices and cache device at the same time, you won't
|
||||
have to manually attach:
|
||||
have to manually attach::
|
||||
|
||||
make-bcache -B /dev/sda /dev/sdb -C /dev/sdc
|
||||
|
||||
bcache-tools now ships udev rules, and bcache devices are known to the kernel
|
||||
immediately. Without udev, you can manually register devices like this:
|
||||
immediately. Without udev, you can manually register devices like this::
|
||||
|
||||
echo /dev/sdb > /sys/fs/bcache/register
|
||||
echo /dev/sdc > /sys/fs/bcache/register
|
||||
@@ -60,16 +67,16 @@ slow devices as bcache backing devices without a cache, and you can choose to ad
|
||||
a caching device later.
|
||||
See 'ATTACHING' section below.
|
||||
|
||||
The devices show up as:
|
||||
The devices show up as::
|
||||
|
||||
/dev/bcache<N>
|
||||
|
||||
As well as (with udev):
|
||||
As well as (with udev)::
|
||||
|
||||
/dev/bcache/by-uuid/<uuid>
|
||||
/dev/bcache/by-label/<label>
|
||||
|
||||
To get started:
|
||||
To get started::
|
||||
|
||||
mkfs.ext4 /dev/bcache0
|
||||
mount /dev/bcache0 /mnt
|
||||
@@ -81,13 +88,13 @@ Cache devices are managed as sets; multiple caches per set isn't supported yet
|
||||
but will allow for mirroring of metadata and dirty data in the future. Your new
|
||||
cache set shows up as /sys/fs/bcache/<UUID>
|
||||
|
||||
ATTACHING
|
||||
Attaching
|
||||
---------
|
||||
|
||||
After your cache device and backing device are registered, the backing device
|
||||
must be attached to your cache set to enable caching. Attaching a backing
|
||||
device to a cache set is done thusly, with the UUID of the cache set in
|
||||
/sys/fs/bcache:
|
||||
/sys/fs/bcache::
|
||||
|
||||
echo <CSET-UUID> > /sys/block/bcache0/bcache/attach
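Since each cache set shows up as a UUID-named directory under /sys/fs/bcache, the value for <CSET-UUID> can be looked up there. The sketch below lists those directories; the function name ``list_cache_sets`` is hypothetical, and the optional directory argument exists only so the helper can be tried against a test directory.

```shell
# Hypothetical helper: print the cache set UUIDs found under a bcache
# sysfs directory (defaults to /sys/fs/bcache).
list_cache_sets() {
    dir=${1:-/sys/fs/bcache}
    # UUIDs have five dash-separated groups; other entries such as
    # "register" do not match this pattern.
    for entry in "$dir"/*-*-*-*-*; do
        if [ -d "$entry" ]; then
            basename "$entry"
        fi
    done
}

list_cache_sets
```

If no cache set is registered, the helper prints nothing.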
|
||||
|
||||
@@ -97,7 +104,7 @@ your bcache devices. If a backing device has data in a cache somewhere, the
|
||||
important if you have writeback caching turned on.
|
||||
|
||||
If you're booting up and your cache device is gone and never coming back, you
|
||||
can force run the backing device:
|
||||
can force run the backing device::
|
||||
|
||||
echo 1 > /sys/block/sdb/bcache/running
|
||||
|
||||
@@ -110,7 +117,7 @@ but all the cached data will be invalidated. If there was dirty data in the
|
||||
cache, don't expect the filesystem to be recoverable - you will have massive
|
||||
filesystem corruption, though ext4's fsck does work miracles.
|
||||
|
||||
ERROR HANDLING
|
||||
Error Handling
|
||||
--------------
|
||||
|
||||
Bcache tries to transparently handle IO errors to/from the cache device without
|
||||
@@ -134,25 +141,27 @@ the backing devices to passthrough mode.
|
||||
read some of the dirty data, though.
|
||||
|
||||
|
||||
HOWTO/COOKBOOK
|
||||
Howto/cookbook
|
||||
--------------
|
||||
|
||||
A) Starting a bcache with a missing caching device
|
||||
|
||||
If registering the backing device doesn't help, it's already there, you just need
|
||||
to force it to run without the cache:
|
||||
to force it to run without the cache::
|
||||
|
||||
host:~# echo /dev/sdb1 > /sys/fs/bcache/register
|
||||
[ 119.844831] bcache: register_bcache() error opening /dev/sdb1: device already registered
|
||||
|
||||
Next, you try to register your caching device if it's present. However
|
||||
if it's absent, or registration fails for some reason, you can still
|
||||
start your bcache without its cache, like so:
|
||||
start your bcache without its cache, like so::
|
||||
|
||||
host:/sys/block/sdb/sdb1/bcache# echo 1 > running
|
||||
|
||||
Note that this may cause data loss if you were running in writeback mode.
|
||||
|
||||
|
||||
B) Bcache does not find its cache
|
||||
B) Bcache does not find its cache::
|
||||
|
||||
host:/sys/block/md5/bcache# echo 0226553a-37cf-41d5-b3ce-8b1e944543a8 > attach
|
||||
[ 1933.455082] bcache: bch_cached_dev_attach() Couldn't find uuid for md5 in set
|
||||
@@ -160,7 +169,8 @@ B) Bcache does not find its cache
|
||||
[ 1933.478179] : cache set not found
|
||||
|
||||
In this case, the caching device was simply not registered at boot
|
||||
or disappeared and came back, and needs to be (re-)registered:
|
||||
or disappeared and came back, and needs to be (re-)registered::
|
||||
|
||||
host:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register
|
||||
|
||||
|
||||
@@ -180,7 +190,8 @@ device is still available at an 8KiB offset. So either via a loopdev
|
||||
of the backing device created with --offset 8K, or any value defined by
|
||||
--data-offset when you originally formatted bcache with `make-bcache`.
|
||||
|
||||
For example:
|
||||
For example::
|
||||
|
||||
losetup -o 8192 /dev/loop0 /dev/your_bcache_backing_dev
|
||||
|
||||
This should present your unmodified backing device data in /dev/loop0
|
||||
@@ -191,33 +202,38 @@ cache device without loosing data.
|
||||
|
||||
E) Wiping a cache device
|
||||
|
||||
host:~# wipefs -a /dev/sdh2
|
||||
16 bytes were erased at offset 0x1018 (bcache)
|
||||
they were: c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81
|
||||
::
|
||||
|
||||
After you boot back with bcache enabled, you recreate the cache and attach it:
|
||||
host:~# make-bcache -C /dev/sdh2
|
||||
UUID: 7be7e175-8f4c-4f99-94b2-9c904d227045
|
||||
Set UUID: 5bc072a8-ab17-446d-9744-e247949913c1
|
||||
version: 0
|
||||
nbuckets: 106874
|
||||
block_size: 1
|
||||
bucket_size: 1024
|
||||
nr_in_set: 1
|
||||
nr_this_dev: 0
|
||||
first_bucket: 1
|
||||
[ 650.511912] bcache: run_cache_set() invalidating existing data
|
||||
[ 650.549228] bcache: register_cache() registered cache device sdh2
|
||||
host:~# wipefs -a /dev/sdh2
|
||||
16 bytes were erased at offset 0x1018 (bcache)
|
||||
they were: c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81
|
||||
|
||||
start backing device with missing cache:
|
||||
host:/sys/block/md5/bcache# echo 1 > running
|
||||
After you boot back with bcache enabled, you recreate the cache and attach it::
|
||||
|
||||
attach new cache:
|
||||
host:/sys/block/md5/bcache# echo 5bc072a8-ab17-446d-9744-e247949913c1 > attach
|
||||
[ 865.276616] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1
|
||||
host:~# make-bcache -C /dev/sdh2
|
||||
UUID: 7be7e175-8f4c-4f99-94b2-9c904d227045
|
||||
Set UUID: 5bc072a8-ab17-446d-9744-e247949913c1
|
||||
version: 0
|
||||
nbuckets: 106874
|
||||
block_size: 1
|
||||
bucket_size: 1024
|
||||
nr_in_set: 1
|
||||
nr_this_dev: 0
|
||||
first_bucket: 1
|
||||
[ 650.511912] bcache: run_cache_set() invalidating existing data
|
||||
[ 650.549228] bcache: register_cache() registered cache device sdh2
|
||||
|
||||
start backing device with missing cache::
|
||||
|
||||
host:/sys/block/md5/bcache# echo 1 > running
|
||||
|
||||
attach new cache::
|
||||
|
||||
host:/sys/block/md5/bcache# echo 5bc072a8-ab17-446d-9744-e247949913c1 > attach
|
||||
[ 865.276616] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1
|
||||
|
||||
|
||||
F) Remove or replace a caching device
|
||||
F) Remove or replace a caching device::
|
||||
|
||||
host:/sys/block/sda/sda7/bcache# echo 1 > detach
|
||||
[ 695.872542] bcache: cached_dev_detach_finish() Caching disabled for sda7
|
||||
@@ -226,13 +242,15 @@ F) Remove or replace a caching device
|
||||
wipefs: error: /dev/nvme0n1p4: probing initialization failed: Device or resource busy
|
||||
Ooops, it's disabled, but not unregistered, so it's still protected
|
||||
|
||||
We need to go and unregister it:
|
||||
We need to go and unregister it::
|
||||
|
||||
host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# ls -l cache0
|
||||
lrwxrwxrwx 1 root root 0 Feb 25 18:33 cache0 -> ../../../devices/pci0000:00/0000:00:1d.0/0000:70:00.0/nvme/nvme0/nvme0n1/nvme0n1p4/bcache/
|
||||
host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# echo 1 > stop
|
||||
kernel: [ 917.041908] bcache: cache_set_free() Cache set b7ba27a1-2398-4649-8ae3-0959f57ba128 unregistered
|
||||
|
||||
Now we can wipe it:
|
||||
Now we can wipe it::
|
||||
|
||||
host:~# wipefs -a /dev/nvme0n1p4
|
||||
/dev/nvme0n1p4: 16 bytes were erased at offset 0x00001018 (bcache): c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81
|
||||
|
||||
@@ -252,40 +270,44 @@ if there are any active backing or caching devices left on it:
|
||||
|
||||
1) Is it present in /dev/bcache* ? (there are times where it won't be)
|
||||
|
||||
If so, it's easy:
|
||||
If so, it's easy::
|
||||
|
||||
host:/sys/block/bcache0/bcache# echo 1 > stop
|
||||
|
||||
2) But if your backing device is gone, this won't work:
|
||||
2) But if your backing device is gone, this won't work::
|
||||
|
||||
host:/sys/block/bcache0# cd bcache
|
||||
bash: cd: bcache: No such file or directory
|
||||
|
||||
In this case, you may have to unregister the dmcrypt block device that
|
||||
references this bcache to free it up:
|
||||
In this case, you may have to unregister the dmcrypt block device that
|
||||
references this bcache to free it up::
|
||||
|
||||
host:~# dmsetup remove oldds1
|
||||
bcache: bcache_device_free() bcache0 stopped
|
||||
bcache: cache_set_free() Cache set 5bc072a8-ab17-446d-9744-e247949913c1 unregistered
|
||||
|
||||
This causes the backing bcache to be removed from /sys/fs/bcache and
|
||||
then it can be reused. This would be true of any block device stacking
|
||||
where bcache is a lower device.
|
||||
This causes the backing bcache to be removed from /sys/fs/bcache and
|
||||
then it can be reused. This would be true of any block device stacking
|
||||
where bcache is a lower device.
|
||||
|
||||
3) In other cases, you can also look in /sys/fs/bcache/:
|
||||
3) In other cases, you can also look in /sys/fs/bcache/::
|
||||
|
||||
host:/sys/fs/bcache# ls -l */{cache?,bdev?}
|
||||
lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/bdev1 -> ../../../devices/virtual/block/dm-1/bcache/
|
||||
lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/cache0 -> ../../../devices/virtual/block/dm-4/bcache/
|
||||
lrwxrwxrwx 1 root root 0 Mar 5 09:39 5bc072a8-ab17-446d-9744-e247949913c1/cache0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/ata10/host9/target9:0:0/9:0:0:0/block/sdl/sdl2/bcache/
|
||||
host:/sys/fs/bcache# ls -l */{cache?,bdev?}
|
||||
lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/bdev1 -> ../../../devices/virtual/block/dm-1/bcache/
|
||||
lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/cache0 -> ../../../devices/virtual/block/dm-4/bcache/
|
||||
lrwxrwxrwx 1 root root 0 Mar 5 09:39 5bc072a8-ab17-446d-9744-e247949913c1/cache0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/ata10/host9/target9:0:0/9:0:0:0/block/sdl/sdl2/bcache/
|
||||
|
||||
The device names will show which UUID is relevant, cd in that directory
|
||||
and stop the cache::
|
||||
|
||||
The device names will show which UUID is relevant, cd in that directory
|
||||
and stop the cache:
|
||||
host:/sys/fs/bcache/5bc072a8-ab17-446d-9744-e247949913c1# echo 1 > stop
|
||||
|
||||
This will free up bcache references and let you reuse the partition for
|
||||
other purposes.
|
||||
This will free up bcache references and let you reuse the partition for
|
||||
other purposes.
|
||||
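The lookup in step 3 can be scripted. The following is a minimal sketch, not part of bcache itself: `find_cache_set` is a hypothetical helper name, and it assumes the `cache?`/`bdev?` symlinks end in `<device>/bcache/` as in the listing above. The sysfs root is passed as an argument so the function can be tried against a scratch directory first.

```shell
# find_cache_set ROOT DEVNAME
# Print the cache set UUID whose cacheN/bdevN symlink points at DEVNAME.
# ROOT is normally /sys/fs/bcache. Hypothetical helper, for illustration only.
find_cache_set() {
    for link in "$1"/*/cache[0-9]* "$1"/*/bdev[0-9]*; do
        # Skip unmatched globs and anything that is not a symlink
        [ -L "$link" ] || continue
        case "$(readlink "$link")" in
            */"$2"/bcache|*/"$2"/bcache/)
                # The directory holding the symlink is named after the UUID
                basename "$(dirname "$link")"
                return 0
                ;;
        esac
    done
    return 1
}
```

With the listing shown above, `find_cache_set /sys/fs/bcache sdl2` would print the UUID of the directory to cd into before writing to `stop`.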
|
||||
|
||||
|
||||
TROUBLESHOOTING PERFORMANCE
|
||||
Troubleshooting performance
|
||||
---------------------------
|
||||
|
||||
Bcache has a bunch of config options and tunables. The defaults are intended to
|
||||
@@ -301,11 +323,13 @@ want for getting the best possible numbers when benchmarking.
|
||||
raid stripe size to get the disk multiples that you would like.
|
||||
|
||||
For example: If you have a 64k stripe size, then the following offset
|
||||
would provide alignment for many common RAID5 data spindle counts:
|
||||
would provide alignment for many common RAID5 data spindle counts::
|
||||
|
||||
64k * 2*2*2*3*3*5*7 bytes = 161280k
|
||||
|
||||
That space is wasted, but for only 157.5MB you can grow your RAID 5
|
||||
volume to the following data-spindle counts without re-aligning:
|
||||
volume to the following data-spindle counts without re-aligning::
|
||||
|
||||
3,4,5,6,7,8,9,10,12,14,15,18,20,21 ...
|
||||
|
||||
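The alignment arithmetic above can be checked with a few lines of shell. This is only a sketch of the calculation in the text; the function names `offset_k` and `aligned_counts` are made up for illustration.

```shell
# Product of the factors used in the example: 2*2*2*3*3*5*7 = 2520
multiple=$((2*2*2*3*3*5*7))

# offset_k STRIPE_K: data offset in KiB for a stripe size given in KiB
offset_k() {
    echo $(( $1 * multiple ))
}

# aligned_counts LO HI: data-spindle counts in [LO, HI] that divide the
# multiple evenly, i.e. counts that stay aligned without re-aligning
aligned_counts() {
    n=$1
    out=""
    while [ "$n" -le "$2" ]; do
        if [ $(( multiple % n )) -eq 0 ]; then
            out="$out${out:+,}$n"
        fi
        n=$((n + 1))
    done
    echo "$out"
}
```

`offset_k 64` prints 161280 (KiB, which is 157.5 MB), and `aligned_counts 3 21` prints the same spindle counts listed above.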
- Bad write performance
|
||||
@@ -313,9 +337,9 @@ want for getting the best possible numbers when benchmarking.
|
||||
If write performance is not what you expected, you probably wanted to be
|
||||
running in writeback mode, which isn't the default (not due to a lack of
|
||||
maturity, but simply because in writeback mode you'll lose data if something
|
||||
happens to your SSD)
|
||||
happens to your SSD)::
|
||||
|
||||
# echo writeback > /sys/block/bcache0/bcache/cache_mode
|
||||
# echo writeback > /sys/block/bcache0/bcache/cache_mode
|
||||
|
||||
- Bad performance, or traffic not going to the SSD that you'd expect
|
||||
|
||||
@@ -325,13 +349,13 @@ want for getting the best possible numbers when benchmarking.
|
||||
accessed data out of your cache.
|
||||
|
||||
But if you want to benchmark reads from cache, and you start out with fio
|
||||
writing an 8 gigabyte test file - so you want to disable that.
|
||||
writing an 8 gigabyte test file - so you want to disable that::
|
||||
|
||||
# echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
|
||||
# echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
|
||||
|
||||
To set it back to the default (4 mb), do
|
||||
To set it back to the default (4 mb), do::
|
||||
|
||||
# echo 4M > /sys/block/bcache0/bcache/sequential_cutoff
|
||||
# echo 4M > /sys/block/bcache0/bcache/sequential_cutoff
|
||||
|
||||
- Traffic's still going to the spindle/still getting cache misses
|
||||
|
||||
@@ -344,10 +368,10 @@ want for getting the best possible numbers when benchmarking.
|
||||
throttles traffic if the latency exceeds a threshold (it does this by
|
||||
cranking down the sequential bypass).
|
||||
|
||||
You can disable this if you need to by setting the thresholds to 0:
|
||||
You can disable this if you need to by setting the thresholds to 0::
|
||||
|
||||
# echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
|
||||
# echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us
|
||||
# echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
|
||||
# echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us
|
||||
|
||||
The default is 2000 us (2 milliseconds) for reads, and 20000 for writes.
|
||||
|
||||
@@ -369,7 +393,7 @@ want for getting the best possible numbers when benchmarking.
|
||||
a fix for the issue there).
|
||||
|
||||
|
||||
SYSFS - BACKING DEVICE
|
||||
Sysfs - backing device
|
||||
----------------------
|
||||
|
||||
Available at /sys/block/<bdev>/bcache, /sys/block/bcache*/bcache and
|
||||
@@ -454,7 +478,8 @@ writeback_running
|
||||
still be added to the cache until it is mostly full; only meant for
|
||||
benchmarking. Defaults to on.
|
||||
|
||||
SYSFS - BACKING DEVICE STATS:
|
||||
Sysfs - backing device stats
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
There are directories with these numbers for a running total, as well as
|
||||
versions that decay over the past day, hour and 5 minutes; they're also
|
||||
@@ -463,14 +488,11 @@ aggregated in the cache set directory as well.
|
||||
bypassed
|
||||
Amount of IO (both reads and writes) that has bypassed the cache
|
||||
|
||||
cache_hits
|
||||
cache_misses
|
||||
cache_hit_ratio
|
||||
cache_hits, cache_misses, cache_hit_ratio
|
||||
Hits and misses are counted per individual IO as bcache sees them; a
|
||||
partial hit is counted as a miss.
|
||||
|
||||
cache_bypass_hits
|
||||
cache_bypass_misses
|
||||
cache_bypass_hits, cache_bypass_misses
|
||||
Hits and misses for IO that is intended to skip the cache are still counted,
|
||||
but broken out here.
|
||||
|
||||
@@ -482,7 +504,8 @@ cache_miss_collisions
|
||||
cache_readaheads
|
||||
Count of times readahead occurred.
|
||||
|
||||
SYSFS - CACHE SET:
|
||||
Sysfs - cache set
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
Available at /sys/fs/bcache/<cset-uuid>
|
||||
|
||||
@@ -520,8 +543,7 @@ flash_vol_create
|
||||
Echoing a size to this file (in human readable units, k/M/G) creates a thinly
|
||||
provisioned volume backed by the cache set.
|
||||
|
||||
io_error_halflife
|
||||
io_error_limit
|
||||
io_error_halflife, io_error_limit
|
||||
These determine how many errors we accept before disabling the cache.
|
||||
Each error is decayed by the half life (in # ios). If the decaying count
|
||||
reaches io_error_limit dirty data is written out and the cache is disabled.
|
||||
@@ -545,7 +567,8 @@ unregister
|
||||
Detaches all backing devices and closes the cache devices; if dirty data is
|
||||
present it will disable writeback caching and wait for it to be flushed.
|
||||
|
||||
SYSFS - CACHE SET INTERNAL:
|
||||
Sysfs - cache set internal
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This directory also exposes timings for a number of internal operations, with
|
||||
separate files for average duration, average frequency, last occurrence and max
|
||||
@@ -574,7 +597,8 @@ cache_read_races
|
||||
trigger_gc
|
||||
Writing to this file forces garbage collection to run.
|
||||
|
||||
SYSFS - CACHE DEVICE:
|
||||
Sysfs - Cache device
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Available at /sys/block/<cdev>/bcache
|
||||
|
||||
|
||||
@@ -1,12 +1,8 @@
|
||||
===============================================================
|
||||
== BT8XXGPIO driver ==
|
||||
== ==
|
||||
== A driver for a selfmade cheap BT8xx based PCI GPIO-card ==
|
||||
== ==
|
||||
== For advanced documentation, see ==
|
||||
== http://www.bu3sch.de/btgpio.php ==
|
||||
===============================================================
|
||||
===================================================================
|
||||
A driver for a selfmade cheap BT8xx based PCI GPIO-card (bt8xxgpio)
|
||||
===================================================================
|
||||
|
||||
For advanced documentation, see http://www.bu3sch.de/btgpio.php
|
||||
|
||||
A generic digital 24-port PCI GPIO card can be built out of an ordinary
|
||||
Brooktree bt848, bt849, bt878 or bt879 based analog TV tuner card. The
|
||||
@@ -17,9 +13,8 @@ The bt8xx chip does have 24 digital GPIO ports.
|
||||
These ports are accessible via 24 pins on the SMD chip package.
|
||||
|
||||
|
||||
==============================================
|
||||
== How to physically access the GPIO pins ==
|
||||
==============================================
|
||||
How to physically access the GPIO pins
|
||||
======================================
|
||||
|
||||
There are several ways to access these pins. One might unsolder the whole chip
|
||||
and put it on a custom PCI board, or one might only unsolder each individual
|
||||
@@ -27,7 +22,7 @@ GPIO pin and solder that to some tiny wire. As the chip package really is tiny
|
||||
there are some advanced soldering skills needed in any case.
|
||||
|
||||
The physical pinouts are drawn in the following ASCII art.
|
||||
The GPIO pins are marked with G00-G23
|
||||
The GPIO pins are marked with G00-G23::
|
||||
|
||||
G G G G G G G G G G G G G G G G G G
|
||||
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
|
||||
|
||||
+35
-30
@@ -1,18 +1,16 @@
|
||||
=======================================================================
|
||||
README for btmrvl driver
|
||||
=======================================================================
|
||||
|
||||
=============
|
||||
btmrvl driver
|
||||
=============
|
||||
|
||||
All commands are used via debugfs interface.
|
||||
|
||||
=====================
|
||||
Set/get driver configurations:
|
||||
Set/get driver configurations
|
||||
=============================
|
||||
|
||||
Path: /debug/btmrvl/config/
|
||||
|
||||
gpiogap=[n]
|
||||
hscfgcmd
|
||||
These commands are used to configure the host sleep parameters.
|
||||
gpiogap=[n], hscfgcmd
|
||||
These commands are used to configure the host sleep parameters::
|
||||
bit 8:0 -- Gap
|
||||
bit 16:8 -- GPIO
|
||||
|
||||
@@ -23,7 +21,8 @@ hscfgcmd
|
||||
where Gap is the gap in milliseconds between wakeup signal and
|
||||
wakeup event, or 0xff for special host sleep setting.
|
||||
|
||||
Usage:
|
||||
Usage::
|
||||
|
||||
# Use SDIO interface to wake up the host and set GAP to 0x80:
|
||||
echo 0xff80 > /debug/btmrvl/config/gpiogap
|
||||
echo 1 > /debug/btmrvl/config/hscfgcmd
|
||||
@@ -32,15 +31,16 @@ hscfgcmd
|
||||
echo 0x03ff > /debug/btmrvl/config/gpiogap
|
||||
echo 1 > /debug/btmrvl/config/hscfgcmd
|
||||
|
||||
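The values written to gpiogap in the usage above can be composed from their two fields. This is a rough sketch that assumes the low byte holds the gap and the next byte holds the GPIO number, which is how 0xff80 (GPIO 0xff, gap 0x80) and 0x03ff (GPIO 3, gap 0xff) decompose; `gpiogap_value` is a hypothetical helper name, not something the driver provides.

```shell
# gpiogap_value GPIO GAP
# Pack the GPIO number (0xff selects the SDIO interface, per the usage
# comment above) and the gap into the 16-bit value expected by gpiogap.
gpiogap_value() {
    printf '0x%04x\n' $(( (($1 & 0xff) << 8) | ($2 & 0xff) ))
}
```

For example, `gpiogap_value 0xff 0x80` prints 0xff80, which could then be written with `echo "$(gpiogap_value 0xff 0x80)" > /debug/btmrvl/config/gpiogap`.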
psmode=[n]
|
||||
pscmd
|
||||
psmode=[n], pscmd
|
||||
These commands are used to enable/disable auto sleep mode
|
||||
|
||||
where the option is:
|
||||
where the option is::
|
||||
|
||||
1 -- Enable auto sleep mode
|
||||
0 -- Disable auto sleep mode
|
||||
|
||||
Usage:
|
||||
Usage::
|
||||
|
||||
# Enable auto sleep mode
|
||||
echo 1 > /debug/btmrvl/config/psmode
|
||||
echo 1 > /debug/btmrvl/config/pscmd
|
||||
@@ -50,15 +50,16 @@ pscmd
|
||||
echo 1 > /debug/btmrvl/config/pscmd
|
||||
|
||||
|
||||
hsmode=[n]
|
||||
hscmd
|
||||
hsmode=[n], hscmd
|
||||
These commands are used to enable host sleep or wake up firmware
|
||||
|
||||
where the option is:
|
||||
where the option is::
|
||||
|
||||
1 -- Enable host sleep
|
||||
0 -- Wake up firmware
|
||||
|
||||
Usage:
|
||||
Usage::
|
||||
|
||||
# Enable host sleep
|
||||
echo 1 > /debug/btmrvl/config/hsmode
|
||||
echo 1 > /debug/btmrvl/config/hscmd
|
||||
@@ -68,12 +69,13 @@ hscmd
|
||||
echo 1 > /debug/btmrvl/config/hscmd
|
||||
|
||||
|
||||
======================
|
||||
Get driver status:
|
||||
Get driver status
|
||||
=================
|
||||
|
||||
Path: /debug/btmrvl/status/
|
||||
|
||||
Usage:
|
||||
Usage::
|
||||
|
||||
cat /debug/btmrvl/status/<args>
|
||||
|
||||
where the args are:
|
||||
@@ -90,14 +92,17 @@ hsstate
|
||||
txdnldrdy
|
||||
This command displays the value of Tx download ready flag.
|
||||
|
||||
|
||||
=====================
|
||||
Issuing a raw hci command
|
||||
=========================
|
||||
|
||||
Use hcitool to issue raw hci command, refer to hcitool manual
|
||||
|
||||
Usage: Hcitool cmd <ogf> <ocf> [Parameters]
|
||||
Usage::
|
||||
|
||||
Hcitool cmd <ogf> <ocf> [Parameters]
|
||||
|
||||
Interface Control Command::
|
||||
|
||||
Interface Control Command
|
||||
hcitool cmd 0x3f 0x5b 0xf5 0x01 0x00 --Enable All interface
|
||||
hcitool cmd 0x3f 0x5b 0xf5 0x01 0x01 --Enable Wlan interface
|
||||
hcitool cmd 0x3f 0x5b 0xf5 0x01 0x02 --Enable BT interface
|
||||
@@ -105,13 +110,13 @@ Use hcitool to issue raw hci command, refer to hcitool manual
|
||||
hcitool cmd 0x3f 0x5b 0xf5 0x00 0x01 --Disable Wlan interface
|
||||
hcitool cmd 0x3f 0x5b 0xf5 0x00 0x02 --Disable BT interface
|
||||
|
||||
=======================================================================
|
||||
SD8688 firmware
|
||||
===============
|
||||
|
||||
Images:
|
||||
|
||||
SD8688 firmware:
|
||||
|
||||
/lib/firmware/sd8688_helper.bin
|
||||
/lib/firmware/sd8688.bin
|
||||
- /lib/firmware/sd8688_helper.bin
|
||||
- /lib/firmware/sd8688.bin
|
||||
|
||||
|
||||
The images can be downloaded from:
|
||||
|
||||
@@ -1,17 +1,27 @@
|
||||
[ NOTE: The virt_to_bus() and bus_to_virt() functions have been
|
||||
==========================================================
|
||||
How to access I/O mapped memory from within device drivers
|
||||
==========================================================
|
||||
|
||||
:Author: Linus
|
||||
|
||||
.. warning::
|
||||
|
||||
The virt_to_bus() and bus_to_virt() functions have been
|
||||
superseded by the functionality provided by the PCI DMA interface
|
||||
(see Documentation/DMA-API-HOWTO.txt). They continue
|
||||
to be documented below for historical purposes, but new code
|
||||
must not use them. --davidm 00/12/12 ]
|
||||
must not use them. --davidm 00/12/12
|
||||
|
||||
[ This is a mail message in response to a query on IO mapping, thus the
|
||||
strange format for a "document" ]
|
||||
::
|
||||
|
||||
[ This is a mail message in response to a query on IO mapping, thus the
|
||||
strange format for a "document" ]
|
||||
|
||||
The AHA-1542 is a bus-master device, and your patch makes the driver give the
|
||||
controller the physical address of the buffers, which is correct on x86
|
||||
(because all bus master devices see the physical memory mappings directly).
|
||||
|
||||
However, on many setups, there are actually _three_ different ways of looking
|
||||
However, on many setups, there are actually **three** different ways of looking
|
||||
at memory addresses, and in this case we actually want the third, the
|
||||
so-called "bus address".
|
||||
|
||||
@@ -38,7 +48,7 @@ because the memory and the devices share the same address space, and that is
|
||||
not generally necessarily true on other PCI/ISA setups.
|
||||
|
||||
Now, just as an example, on the PReP (PowerPC Reference Platform), the
|
||||
CPU sees a memory map something like this (this is from memory):
|
||||
CPU sees a memory map something like this (this is from memory)::
|
||||
|
||||
0-2 GB "real memory"
|
||||
2 GB-3 GB "system IO" (inb/out and similar accesses on x86)
|
||||
@@ -52,7 +62,7 @@ So when the CPU wants any bus master to write to physical memory 0, it
|
||||
has to give the master address 0x80000000 as the memory address.
|
||||
|
||||
So, for example, depending on how the kernel is actually mapped on the
|
||||
PPC, you can end up with a setup like this:
|
||||
PPC, you can end up with a setup like this::
|
||||
|
||||
physical address: 0
|
||||
virtual address: 0xC0000000
|
||||
@@ -61,7 +71,7 @@ PPC, you can end up with a setup like this:
|
||||
where all the addresses actually point to the same thing. It's just seen
|
||||
through different translations..
|
||||
|
||||
Similarly, on the Alpha, the normal translation is
|
||||
Similarly, on the Alpha, the normal translation is::
|
||||
|
||||
physical address: 0
|
||||
virtual address: 0xfffffc0000000000
|
||||
@@ -70,7 +80,7 @@ Similarly, on the Alpha, the normal translation is
|
||||
(but there are also Alphas where the physical address and the bus address
|
||||
are the same).
|
||||
|
||||
Anyway, the way to look up all these translations, you do
|
||||
Anyway, the way to look up all these translations, you do::
|
||||
|
||||
#include <asm/io.h>
|
||||
|
||||
@@ -81,8 +91,8 @@ Anyway, the way to look up all these translations, you do
|
||||
|
||||
Now, when do you need these?
|
||||
|
||||
You want the _virtual_ address when you are actually going to access that
|
||||
pointer from the kernel. So you can have something like this:
|
||||
You want the **virtual** address when you are actually going to access that
|
||||
pointer from the kernel. So you can have something like this::
|
||||
|
||||
/*
|
||||
* this is the hardware "mailbox" we use to communicate with
|
||||
@@ -104,7 +114,7 @@ pointer from the kernel. So you can have something like this:
|
||||
...
|
||||
|
||||
on the other hand, you want the bus address when you have a buffer that
|
||||
you want to give to the controller:
|
||||
you want to give to the controller::
|
||||
|
||||
/* ask the controller to read the sense status into "sense_buffer" */
|
||||
mbox.bufstart = virt_to_bus(&sense_buffer);
|
||||
@@ -112,7 +122,7 @@ you want to give to the controller:
|
||||
mbox.status = 0;
|
||||
notify_controller(&mbox);
|
||||
|
||||
And you generally _never_ want to use the physical address, because you can't
|
||||
And you generally **never** want to use the physical address, because you can't
|
||||
use that from the CPU (the CPU only uses translated virtual addresses), and
|
||||
you can't use it from the bus master.
|
||||
|
||||
@@ -124,8 +134,10 @@ be remapped as measured in units of pages, a.k.a. the pfn (the memory
|
||||
management layer doesn't know about devices outside the CPU, so it
|
||||
shouldn't need to know about "bus addresses" etc).
|
||||
|
||||
NOTE NOTE NOTE! The above is only one part of the whole equation. The above
|
||||
only talks about "real memory", that is, CPU memory (RAM).
|
||||
.. note::
|
||||
|
||||
The above is only one part of the whole equation. The above
|
||||
only talks about "real memory", that is, CPU memory (RAM).
|
||||
|
||||
There is a completely different type of memory too, and that's the "shared
|
||||
memory" on the PCI or ISA bus. That's generally not RAM (although in the case
|
||||
@@ -137,20 +149,22 @@ whatever, and there is only one way to access it: the readb/writeb and
|
||||
related functions. You should never take the address of such memory, because
|
||||
there is really nothing you can do with such an address: it's not
|
||||
conceptually in the same memory space as "real memory" at all, so you cannot
|
||||
just dereference a pointer. (Sadly, on x86 it _is_ in the same memory space,
|
||||
just dereference a pointer. (Sadly, on x86 it **is** in the same memory space,
|
||||
so on x86 it actually works to just dereference a pointer, but it's not
|
||||
portable).
|
||||
|
||||
For such memory, you can do things like
|
||||
For such memory, you can do things like:
|
||||
|
||||
- reading::
|
||||
|
||||
- reading:
|
||||
/*
|
||||
* read first 32 bits from ISA memory at 0xC0000, aka
|
||||
* C000:0000 in DOS terms
|
||||
*/
|
||||
unsigned int signature = isa_readl(0xC0000);
|
||||
|
||||
- remapping and writing:
|
||||
- remapping and writing::
|
||||
|
||||
/*
|
||||
* remap framebuffer PCI memory area at 0xFC000000,
|
||||
* size 1MB, so that we can access it: We can directly
|
||||
@@ -165,7 +179,8 @@ For such memory, you can do things like
|
||||
/* unmap when we unload the driver */
|
||||
iounmap(baseptr);
|
||||
|
||||
- copying and clearing:
|
||||
- copying and clearing::
|
||||
|
||||
/* get the 6-byte Ethernet address at ISA address E000:0040 */
|
||||
memcpy_fromio(kernel_buffer, 0xE0040, 6);
|
||||
/* write a packet to the driver */
|
||||
@@ -181,10 +196,10 @@ happy that your driver works ;)
|
||||
Note that kernel versions 2.0.x (and earlier) mistakenly called the
|
||||
ioremap() function "vremap()". ioremap() is the proper name, but I
|
||||
didn't think straight when I wrote it originally. People who have to
|
||||
support both can do something like:
|
||||
support both can do something like::
|
||||
|
||||
/* support old naming silliness */
|
||||
#if LINUX_VERSION_CODE < 0x020100
|
||||
#if LINUX_VERSION_CODE < 0x020100
|
||||
#define ioremap vremap
|
||||
#define iounmap vfree
|
||||
#endif
|
||||
@@ -196,13 +211,10 @@ And the above sounds worse than it really is. Most real drivers really
|
||||
don't do all that complex things (or rather: the complexity is not so
|
||||
much in the actual IO accesses as in error handling and timeouts etc).
|
||||
It's generally not hard to fix drivers, and in many cases the code
|
||||
actually looks better afterwards:
|
||||
actually looks better afterwards::
|
||||
|
||||
unsigned long signature = *(unsigned int *) 0xC0000;
|
||||
vs
|
||||
unsigned long signature = readl(0xC0000);
|
||||
|
||||
I think the second version actually is more readable, no?
|
||||
|
||||
Linus
|
||||
|
||||
|
||||
+52
-40
@@ -1,7 +1,8 @@
|
||||
Cache and TLB Flushing
|
||||
Under Linux
|
||||
==================================
|
||||
Cache and TLB Flushing Under Linux
|
||||
==================================
|
||||
|
||||
David S. Miller <davem@redhat.com>
|
||||
:Author: David S. Miller <davem@redhat.com>
|
||||
|
||||
This document describes the cache/tlb flushing interfaces called
|
||||
by the Linux VM subsystem. It enumerates over each interface,
|
||||
@@ -28,7 +29,7 @@ Therefore when software page table changes occur, the kernel will
|
||||
invoke one of the following flush methods _after_ the page table
|
||||
changes occur:
|
||||
|
||||
1) void flush_tlb_all(void)
|
||||
1) ``void flush_tlb_all(void)``
|
||||
|
||||
The most severe flush of all. After this interface runs,
|
||||
any previous page table modification whatsoever will be
|
||||
@@ -37,7 +38,7 @@ changes occur:
|
||||
This is usually invoked when the kernel page tables are
|
||||
changed, since such translations are "global" in nature.
|
||||
|
||||
2) void flush_tlb_mm(struct mm_struct *mm)
|
||||
2) ``void flush_tlb_mm(struct mm_struct *mm)``
|
||||
|
||||
This interface flushes an entire user address space from
|
||||
the TLB. After running, this interface must make sure that
|
||||
@@ -49,8 +50,8 @@ changes occur:
|
||||
page table operations such as what happens during
|
||||
fork, and exec.
|
||||
|
||||
3) void flush_tlb_range(struct vm_area_struct *vma,
|
||||
unsigned long start, unsigned long end)
|
||||
3) ``void flush_tlb_range(struct vm_area_struct *vma,
|
||||
unsigned long start, unsigned long end)``
|
||||
|
||||
Here we are flushing a specific range of (user) virtual
|
||||
address translations from the TLB. After running, this
|
||||
@@ -69,7 +70,7 @@ changes occur:
|
||||
call flush_tlb_page (see below) for each entry which may be
|
||||
modified.
|
||||
|
||||
4) void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
|
||||
4) ``void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)``
|
||||
|
||||
This time we need to remove the PAGE_SIZE sized translation
|
||||
from the TLB. The 'vma' is the backing structure used by
|
||||
@@ -87,8 +88,8 @@ changes occur:
|
||||
|
||||
This is used primarily during fault processing.
|
||||
|
||||
5) void update_mmu_cache(struct vm_area_struct *vma,
|
||||
unsigned long address, pte_t *ptep)
|
||||
5) ``void update_mmu_cache(struct vm_area_struct *vma,
|
||||
unsigned long address, pte_t *ptep)``
|
||||
|
||||
At the end of every page fault, this routine is invoked to
|
||||
tell the architecture specific code that a translation
|
||||
@@ -100,7 +101,7 @@ changes occur:
|
||||
translations for software managed TLB configurations.
|
||||
The sparc64 port currently does this.
|
||||
|
||||
6) void tlb_migrate_finish(struct mm_struct *mm)
|
||||
6) ``void tlb_migrate_finish(struct mm_struct *mm)``
|
||||
|
||||
This interface is called at the end of an explicit
|
||||
process migration. This interface provides a hook
|
||||
@@ -112,7 +113,7 @@ changes occur:
|
||||
|
||||
Next, we have the cache flushing interfaces. In general, when Linux
|
||||
is changing an existing virtual-->physical mapping to a new value,
|
||||
the sequence will be in one of the following forms:
|
||||
the sequence will be in one of the following forms::
|
||||
|
||||
1) flush_cache_mm(mm);
|
||||
change_all_page_tables_of(mm);
|
||||
@@ -143,7 +144,7 @@ and have no dependency on translation information.
|
||||
|
||||
Here are the routines, one by one:
|
||||
|
||||
1) void flush_cache_mm(struct mm_struct *mm)
|
||||
1) ``void flush_cache_mm(struct mm_struct *mm)``
|
||||
|
||||
This interface flushes an entire user address space from
|
||||
the caches. That is, after running, there will be no cache
|
||||
@@ -152,7 +153,7 @@ Here are the routines, one by one:
|
||||
This interface is used to handle whole address space
|
||||
page table operations such as what happens during exit and exec.
|
||||
|
||||
2) void flush_cache_dup_mm(struct mm_struct *mm)
|
||||
2) ``void flush_cache_dup_mm(struct mm_struct *mm)``
|
||||
|
||||
This interface flushes an entire user address space from
|
||||
the caches. That is, after running, there will be no cache
|
||||
@@ -164,8 +165,8 @@ Here are the routines, one by one:
|
||||
This option is separate from flush_cache_mm to allow some
|
||||
optimizations for VIPT caches.
|
||||
|
||||
3) void flush_cache_range(struct vm_area_struct *vma,
|
||||
unsigned long start, unsigned long end)
|
||||
3) ``void flush_cache_range(struct vm_area_struct *vma,
|
||||
unsigned long start, unsigned long end)``
|
||||
|
||||
Here we are flushing a specific range of (user) virtual
|
||||
addresses from the cache. After running, there will be no
|
||||
@@ -181,7 +182,7 @@ Here are the routines, one by one:
|
||||
call flush_cache_page (see below) for each entry which may be
|
||||
modified.
|
||||
|
||||
4) void flush_cache_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn)
|
||||
4) ``void flush_cache_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn)``
|
||||
|
||||
This time we need to remove a PAGE_SIZE sized range
|
||||
from the cache. The 'vma' is the backing structure used by
|
||||
@@ -202,7 +203,7 @@ Here are the routines, one by one:
|
||||
|
||||
This is used primarily during fault processing.
|
||||
|
||||
5) void flush_cache_kmaps(void)
|
||||
5) ``void flush_cache_kmaps(void)``
|
||||
|
||||
This routine need only be implemented if the platform utilizes
|
||||
highmem. It will be called right before all of the kmaps
|
||||
@@ -214,8 +215,8 @@ Here are the routines, one by one:
|
||||
|
||||
This routine should be implemented in asm/highmem.h
|
||||
|
||||
6) void flush_cache_vmap(unsigned long start, unsigned long end)
|
||||
void flush_cache_vunmap(unsigned long start, unsigned long end)
|
||||
6) ``void flush_cache_vmap(unsigned long start, unsigned long end)``
|
||||
``void flush_cache_vunmap(unsigned long start, unsigned long end)``
|
||||
|
||||
Here in these two interfaces we are flushing a specific range
|
||||
of (kernel) virtual addresses from the cache. After running,
|
||||
@@ -243,8 +244,10 @@ size). This setting will force the SYSv IPC layer to only allow user
|
||||
processes to mmap shared memory at addresses which are a multiple of
|
||||
this value.
|
||||
|
||||
NOTE: This does not fix shared mmaps, check out the sparc64 port for
|
||||
one way to solve this (in particular SPARC_FLAG_MMAPSHARED).
|
||||
.. note::
|
||||
|
||||
This does not fix shared mmaps, check out the sparc64 port for
|
||||
one way to solve this (in particular SPARC_FLAG_MMAPSHARED).
|
||||
|
||||
Next, you have to solve the D-cache aliasing issue for all
|
||||
other cases. Please keep in mind the fact that, for a given page
|
||||
@@ -255,8 +258,8 @@ physical page into its address space, by implication the D-cache
|
||||
aliasing problem has the potential to exist since the kernel already
|
||||
maps this page at its virtual address.
|
||||
|
||||
void copy_user_page(void *to, void *from, unsigned long addr, struct page *page)
|
||||
void clear_user_page(void *to, unsigned long addr, struct page *page)
|
||||
``void copy_user_page(void *to, void *from, unsigned long addr, struct page *page)``
|
||||
``void clear_user_page(void *to, unsigned long addr, struct page *page)``
|
||||
|
||||
These two routines store data in user anonymous or COW
|
||||
pages. It allows a port to efficiently avoid D-cache alias
|
||||
@@ -276,14 +279,16 @@ maps this page at its virtual address.
|
||||
If D-cache aliasing is not an issue, these two routines may
|
||||
simply call memcpy/memset directly and do nothing more.
|
||||
|
||||
void flush_dcache_page(struct page *page)
|
||||
``void flush_dcache_page(struct page *page)``
|
||||
|
||||
Any time the kernel writes to a page cache page, _OR_
|
||||
the kernel is about to read from a page cache page and
|
||||
user space shared/writable mappings of this page potentially
|
||||
exist, this routine is called.
|
||||
|
||||
NOTE: This routine need only be called for page cache pages
|
||||
.. note::
|
||||
|
||||
This routine need only be called for page cache pages
|
||||
which can potentially ever be mapped into the address
|
||||
space of a user process. So for example, VFS layer code
|
||||
handling vfs symlinks in the page cache need not call
|
||||
@@ -322,18 +327,19 @@ maps this page at its virtual address.
|
||||
made of this flag bit, and if set the flush is done and the flag
|
||||
bit is cleared.
|
||||
|
||||
IMPORTANT NOTE: It is often important, if you defer the flush,
|
||||
.. important::
|
||||
|
||||
It is often important, if you defer the flush,
|
||||
that the actual flush occurs on the same CPU
|
||||
as did the cpu stores into the page to make it
|
||||
dirty. Again, see sparc64 for examples of how
|
||||
to deal with this.
|
||||
|
||||
void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
|
||||
unsigned long user_vaddr,
|
||||
void *dst, void *src, int len)
|
||||
void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
|
||||
unsigned long user_vaddr,
|
||||
void *dst, void *src, int len)
|
||||
``void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
|
||||
unsigned long user_vaddr, void *dst, void *src, int len)``
|
||||
``void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
|
||||
unsigned long user_vaddr, void *dst, void *src, int len)``
|
||||
|
||||
When the kernel needs to copy arbitrary data in and out
|
||||
of arbitrary user pages (f.e. for ptrace()) it will use
|
||||
these two routines.
|
||||
@@ -344,8 +350,9 @@ maps this page at its virtual address.
|
||||
likely that you will need to flush the instruction cache
|
||||
for copy_to_user_page().
|
||||
|
||||
void flush_anon_page(struct vm_area_struct *vma, struct page *page,
|
||||
unsigned long vmaddr)
|
||||
``void flush_anon_page(struct vm_area_struct *vma, struct page *page,
|
||||
unsigned long vmaddr)``
|
||||
|
||||
When the kernel needs to access the contents of an anonymous
|
||||
page, it calls this function (currently only
|
||||
get_user_pages()). Note: flush_dcache_page() deliberately
|
||||
@@ -354,7 +361,8 @@ maps this page at its virtual address.
|
||||
architectures). For incoherent architectures, it should flush
|
||||
the cache of the page at vmaddr.
|
||||
|
||||
void flush_kernel_dcache_page(struct page *page)
|
||||
``void flush_kernel_dcache_page(struct page *page)``
|
||||
|
||||
When the kernel needs to modify a user page it has obtained
|
||||
with kmap, it calls this function after all modifications are
|
||||
complete (but before kunmapping it) to bring the underlying
|
||||
@@ -366,14 +374,16 @@ maps this page at its virtual address.
|
||||
the kernel cache for page (using page_address(page)).
|
||||
|
||||
|
||||
void flush_icache_range(unsigned long start, unsigned long end)
|
||||
``void flush_icache_range(unsigned long start, unsigned long end)``
|
||||
|
||||
When the kernel stores into addresses that it will execute
|
||||
out of (eg when loading modules), this function is called.
|
||||
|
||||
If the icache does not snoop stores then this routine will need
|
||||
to flush it.
|
||||
|
||||
void flush_icache_page(struct vm_area_struct *vma, struct page *page)
|
||||
``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``
|
||||
|
||||
All the functionality of flush_icache_page can be implemented in
|
||||
flush_dcache_page and update_mmu_cache. In the future, the hope
|
||||
is to remove this interface completely.
|
||||
@@ -387,7 +397,8 @@ the kernel trying to do I/O to vmap areas must manually manage
|
||||
coherency. It must do this by flushing the vmap range before doing
|
||||
I/O and invalidating it after the I/O returns.
|
||||
|
||||
void flush_kernel_vmap_range(void *vaddr, int size)
|
||||
``void flush_kernel_vmap_range(void *vaddr, int size)``
|
||||
|
||||
flushes the kernel cache for a given virtual address range in
|
||||
the vmap area. This is to make sure that any data the kernel
|
||||
modified in the vmap range is made visible to the physical
|
||||
@@ -395,7 +406,8 @@ I/O and invalidating it after the I/O returns.
|
||||
Note that this API does *not* also flush the offset map alias
|
||||
of the area.
|
||||
|
||||
void invalidate_kernel_vmap_range(void *vaddr, int size) invalidates
|
||||
``void invalidate_kernel_vmap_range(void *vaddr, int size)``

invalidates the cache for a given virtual address range in the vmap area
|
||||
which prevents the processor from making the cache stale by
|
||||
speculatively reading data while the I/O was occurring to the
|
||||
|
||||
+235
-217
File diff suppressed because it is too large
@@ -1,9 +1,9 @@
|
||||
================
|
||||
CIRCULAR BUFFERS
|
||||
================
|
||||
================
|
||||
Circular Buffers
|
||||
================
|
||||
|
||||
By: David Howells <dhowells@redhat.com>
|
||||
Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
||||
:Author: David Howells <dhowells@redhat.com>
|
||||
:Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
||||
|
||||
|
||||
Linux provides a number of features that can be used to implement circular
|
||||
@@ -20,7 +20,7 @@ producer and just one consumer. It is possible to handle multiple producers by
|
||||
serialising them, and to handle multiple consumers by serialising them.
|
||||
|
||||
|
||||
Contents:
|
||||
.. Contents:
|
||||
|
||||
(*) What is a circular buffer?
|
||||
|
||||
@@ -31,8 +31,8 @@ Contents:
|
||||
- The consumer.
|
||||
|
||||
|
||||
==========================
|
||||
WHAT IS A CIRCULAR BUFFER?
|
||||
|
||||
What is a circular buffer?
|
||||
==========================
|
||||
|
||||
First of all, what is a circular buffer? A circular buffer is a buffer of
|
||||
@@ -60,9 +60,7 @@ buffer, provided that neither index overtakes the other. The implementer must
|
||||
be careful, however, as a region more than one unit in size may wrap the end of
|
||||
the buffer and be broken into two segments.
|
||||
|
||||
|
||||
============================
|
||||
MEASURING POWER-OF-2 BUFFERS
|
||||
Measuring power-of-2 buffers
|
||||
============================
|
||||
|
||||
Calculation of the occupancy or the remaining capacity of an arbitrarily sized
|
||||
@@ -71,13 +69,13 @@ modulus (divide) instruction. However, if the buffer is of a power-of-2 size,
|
||||
then a much quicker bitwise-AND instruction can be used instead.
|
||||
|
||||
Linux provides a set of macros for handling power-of-2 circular buffers. These
|
||||
can be made use of by:
|
||||
can be made use of by::
|
||||
|
||||
#include <linux/circ_buf.h>
|
||||
|
||||
The macros are:
|
||||
|
||||
(*) Measure the remaining capacity of a buffer:
|
||||
(#) Measure the remaining capacity of a buffer::
|
||||
|
||||
CIRC_SPACE(head_index, tail_index, buffer_size);
|
||||
|
||||
@@ -85,7 +83,7 @@ The macros are:
|
||||
can be inserted.
|
||||
|
||||
|
||||
(*) Measure the maximum consecutive immediate space in a buffer:
|
||||
(#) Measure the maximum consecutive immediate space in a buffer::
|
||||
|
||||
CIRC_SPACE_TO_END(head_index, tail_index, buffer_size);
|
||||
|
||||
@@ -94,14 +92,14 @@ The macros are:
|
||||
beginning of the buffer.
|
||||
|
||||
|
||||
(*) Measure the occupancy of a buffer:
|
||||
(#) Measure the occupancy of a buffer::
|
||||
|
||||
CIRC_CNT(head_index, tail_index, buffer_size);
|
||||
|
||||
This returns the number of items currently occupying a buffer[2].
|
||||
|
||||
|
||||
(*) Measure the non-wrapping occupancy of a buffer:
|
||||
(#) Measure the non-wrapping occupancy of a buffer::
|
||||
|
||||
CIRC_CNT_TO_END(head_index, tail_index, buffer_size);
|
||||
|
||||
@@ -112,7 +110,7 @@ The macros are:
|
||||
Each of these macros will nominally return a value between 0 and buffer_size-1,
|
||||
however:
|
||||
|
||||
[1] CIRC_SPACE*() are intended to be used in the producer. To the producer
|
||||
(1) CIRC_SPACE*() are intended to be used in the producer. To the producer
|
||||
they will return a lower bound as the producer controls the head index,
|
||||
but the consumer may still be depleting the buffer on another CPU and
|
||||
moving the tail index.
|
||||
@@ -120,7 +118,7 @@ however:
|
||||
To the consumer it will show an upper bound as the producer may be busy
|
||||
depleting the space.
|
||||
|
||||
[2] CIRC_CNT*() are intended to be used in the consumer. To the consumer they
|
||||
(2) CIRC_CNT*() are intended to be used in the consumer. To the consumer they
|
||||
will return a lower bound as the consumer controls the tail index, but the
|
||||
producer may still be filling the buffer on another CPU and moving the
|
||||
head index.
|
||||
@@ -128,14 +126,12 @@ however:
|
||||
To the producer it will show an upper bound as the consumer may be busy
|
||||
emptying the buffer.
|
||||
|
||||
[3] To a third party, the order in which the writes to the indices by the
|
||||
(3) To a third party, the order in which the writes to the indices by the
|
||||
producer and consumer become visible cannot be guaranteed as they are
|
||||
independent and may be made on different CPUs - so the result in such a
|
||||
situation will merely be a guess, and may even be negative.
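
As a rough illustration of the macros listed above, the following userspace sketch shows the bitwise-AND arithmetic for a power-of-2 buffer. It re-declares CIRC_CNT() and CIRC_SPACE() locally for demonstration only (kernel code gets them from <linux/circ_buf.h>); the helpers demo_cnt() and demo_space() are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>

/*
 * Local copies for illustration only; kernel code should include
 * <linux/circ_buf.h> rather than redefine these.
 */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Hypothetical helper: number of items currently in the buffer. */
static unsigned int demo_cnt(unsigned int head, unsigned int tail,
			     unsigned int size)
{
	return CIRC_CNT(head, tail, size);
}

/* Hypothetical helper: number of items that can still be inserted. */
static unsigned int demo_space(unsigned int head, unsigned int tail,
			       unsigned int size)
{
	return CIRC_SPACE(head, tail, size);
}
```

With an 8-slot buffer, head at 5 and tail at 2, demo_cnt() gives 3 and demo_space() gives 4. The two always sum to size - 1 because one slot is kept free to tell a full buffer from an empty one.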
|
||||
|
||||
|
||||
===========================================
|
||||
USING MEMORY BARRIERS WITH CIRCULAR BUFFERS
|
||||
Using memory barriers with circular buffers
|
||||
===========================================
|
||||
|
||||
By using memory barriers in conjunction with circular buffers, you can avoid
|
||||
@@ -152,10 +148,10 @@ time, and only one thing should be emptying a buffer at any one time, but the
|
||||
two sides can operate simultaneously.
|
||||
|
||||
|
||||
THE PRODUCER
|
||||
The producer
|
||||
------------
|
||||
|
||||
The producer will look something like this:
|
||||
The producer will look something like this::
|
||||
|
||||
spin_lock(&producer_lock);
|
||||
|
||||
@@ -193,10 +189,10 @@ ordering between the read of the index indicating that the consumer has
|
||||
vacated a given element and the write by the producer to that same element.
|
||||
|
||||
|
||||
THE CONSUMER
|
||||
The Consumer
|
||||
------------
|
||||
|
||||
The consumer will look something like this:
|
||||
The consumer will look something like this::
|
||||
|
||||
spin_lock(&consumer_lock);
|
||||
|
||||
@@ -235,8 +231,7 @@ prevents the compiler from tearing the store, and enforces ordering
|
||||
against previous accesses.
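
The producer/consumer pairing described in this document can be approximated in userspace with C11 atomics. This is only a sketch under stated assumptions: release stores and acquire loads from <stdatomic.h> stand in for the kernel's barrier primitives, the spinlocks are left out because exactly one producer and one consumer are assumed, and struct ring, ring_put() and ring_get() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8u	/* must be a power of 2 */

struct ring {
	int buf[RING_SIZE];
	atomic_uint head;	/* written only by the producer */
	atomic_uint tail;	/* written only by the consumer */
};

/* Producer side: returns false when no space is left. */
static bool ring_put(struct ring *r, int item)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Acquire pairs with the consumer's release store to tail. */
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	/* Same test as CIRC_SPACE(head, tail, RING_SIZE) == 0. */
	if (((tail - (head + 1)) & (RING_SIZE - 1)) == 0)
		return false;

	r->buf[head] = item;
	/* Release: the item is written before the new head is visible. */
	atomic_store_explicit(&r->head, (head + 1) & (RING_SIZE - 1),
			      memory_order_release);
	return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_get(struct ring *r, int *item)
{
	/* Acquire pairs with the producer's release store to head. */
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

	/* Same test as CIRC_CNT(head, tail, RING_SIZE) == 0. */
	if (((head - tail) & (RING_SIZE - 1)) == 0)
		return false;

	*item = r->buf[tail];
	/* Release: the item is read before the slot is handed back. */
	atomic_store_explicit(&r->tail, (tail + 1) & (RING_SIZE - 1),
			      memory_order_release);
	return true;
}
```

Each index is written by one side only, so the only ordering needed is between an element access and the index update that publishes it.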
|
||||
|
||||
|
||||
===============
|
||||
FURTHER READING
|
||||
Further reading
|
||||
===============
|
||||
|
||||
See also Documentation/memory-barriers.txt for a description of Linux's memory
|
||||
|
||||
+106
-83
@@ -1,12 +1,16 @@
|
||||
The Common Clk Framework
|
||||
Mike Turquette <mturquette@ti.com>
|
||||
========================
|
||||
The Common Clk Framework
|
||||
========================
|
||||
|
||||
:Author: Mike Turquette <mturquette@ti.com>
|
||||
|
||||
This document endeavours to explain the common clk framework details,
|
||||
and how to port a platform over to this framework. It is not yet a
|
||||
detailed explanation of the clock api in include/linux/clk.h, but
|
||||
perhaps someday it will include that information.
|
||||
|
||||
Part 1 - introduction and interface split
|
||||
Introduction and interface split
|
||||
================================
|
||||
|
||||
The common clk framework is an interface to control the clock nodes
|
||||
available on various devices today. This may come in the form of clock
|
||||
@@ -35,10 +39,11 @@ is defined in struct clk_foo and pointed to within struct clk_core. This
|
||||
allows for easy navigation between the two discrete halves of the common
|
||||
clock interface.
|
||||
|
||||
Part 2 - common data structures and api
|
||||
Common data structures and api
|
||||
==============================
|
||||
|
||||
Below is the common struct clk_core definition from
|
||||
drivers/clk/clk.c, modified for brevity:
|
||||
drivers/clk/clk.c, modified for brevity::
|
||||
|
||||
struct clk_core {
|
||||
const char *name;
|
||||
@@ -59,7 +64,7 @@ struct clk. That api is documented in include/linux/clk.h.
|
||||
|
||||
Platforms and devices utilizing the common struct clk_core use the struct
|
||||
clk_ops pointer in struct clk_core to perform the hardware-specific parts of
|
||||
the operations defined in clk-provider.h:
|
||||
the operations defined in clk-provider.h::
|
||||
|
||||
struct clk_ops {
|
||||
int (*prepare)(struct clk_hw *hw);
|
||||
@@ -95,19 +100,20 @@ the operations defined in clk-provider.h:
|
||||
struct dentry *dentry);
|
||||
};
|
||||
|
||||
Part 3 - hardware clk implementations
|
||||
Hardware clk implementations
|
||||
============================
|
||||
|
||||
The strength of the common struct clk_core comes from its .ops and .hw pointers
|
||||
which abstract the details of struct clk from the hardware-specific bits, and
|
||||
vice versa. To illustrate consider the simple gateable clk implementation in
|
||||
drivers/clk/clk-gate.c:
|
||||
drivers/clk/clk-gate.c::
|
||||
|
||||
struct clk_gate {
|
||||
struct clk_hw hw;
|
||||
void __iomem *reg;
|
||||
u8 bit_idx;
|
||||
...
|
||||
};
|
||||
struct clk_gate {
|
||||
struct clk_hw hw;
|
||||
void __iomem *reg;
|
||||
u8 bit_idx;
|
||||
...
|
||||
};
|
||||
|
||||
struct clk_gate contains struct clk_hw hw as well as hardware-specific
|
||||
knowledge about which register and bit controls this clk's gating.
|
||||
@@ -115,7 +121,7 @@ Nothing about clock topology or accounting, such as enable_count or
|
||||
notifier_count, is needed here. That is all handled by the common
|
||||
framework code and struct clk_core.
|
||||
|
||||
Let's walk through enabling this clk from driver code:
|
||||
Let's walk through enabling this clk from driver code::
|
||||
|
||||
struct clk *clk;
|
||||
clk = clk_get(NULL, "my_gateable_clk");
|
||||
@@ -123,70 +129,71 @@ Let's walk through enabling this clk from driver code:
|
||||
clk_prepare(clk);
|
||||
clk_enable(clk);
|
||||
|
||||
The call graph for clk_enable is very simple:
|
||||
The call graph for clk_enable is very simple::
|
||||
|
||||
clk_enable(clk);
|
||||
clk->ops->enable(clk->hw);
|
||||
[resolves to...]
|
||||
clk_gate_enable(hw);
|
||||
[resolves struct clk gate with to_clk_gate(hw)]
|
||||
clk_gate_set_bit(gate);
|
||||
clk_enable(clk);
|
||||
clk->ops->enable(clk->hw);
|
||||
[resolves to...]
|
||||
clk_gate_enable(hw);
|
||||
[resolves struct clk gate with to_clk_gate(hw)]
|
||||
clk_gate_set_bit(gate);
|
||||
|
||||
And the definition of clk_gate_set_bit:
|
||||
And the definition of clk_gate_set_bit::
|
||||
|
||||
static void clk_gate_set_bit(struct clk_gate *gate)
|
||||
{
|
||||
u32 reg;
|
||||
static void clk_gate_set_bit(struct clk_gate *gate)
|
||||
{
|
||||
u32 reg;
|
||||
|
||||
reg = __raw_readl(gate->reg);
|
||||
reg |= BIT(gate->bit_idx);
|
||||
writel(reg, gate->reg);
|
||||
}
|
||||
reg = __raw_readl(gate->reg);
|
||||
reg |= BIT(gate->bit_idx);
|
||||
writel(reg, gate->reg);
|
||||
}
|
||||
|
||||
Note that to_clk_gate is defined as:
|
||||
Note that to_clk_gate is defined as::
|
||||
|
||||
#define to_clk_gate(_hw) container_of(_hw, struct clk_gate, hw)
|
||||
#define to_clk_gate(_hw) container_of(_hw, struct clk_gate, hw)
|
||||
|
||||
This pattern of abstraction is used for every clock hardware
|
||||
representation.
|
||||
|
||||
Part 4 - supporting your own clk hardware
|
||||
Supporting your own clk hardware
|
||||
================================
|
||||
|
||||
When implementing support for a new type of clock it is only necessary to
|
||||
include the following header:
|
||||
include the following header::
|
||||
|
||||
#include <linux/clk-provider.h>
|
||||
#include <linux/clk-provider.h>
|
||||
|
||||
To construct a clk hardware structure for your platform you must define
|
||||
the following:
|
||||
the following::
|
||||
|
||||
struct clk_foo {
|
||||
struct clk_hw hw;
|
||||
... hardware specific data goes here ...
|
||||
};
|
||||
struct clk_foo {
|
||||
struct clk_hw hw;
|
||||
... hardware specific data goes here ...
|
||||
};
|
||||
|
||||
To take advantage of your data you'll need to support valid operations
|
||||
for your clk:
|
||||
for your clk::
|
||||
|
||||
struct clk_ops clk_foo_ops {
|
||||
.enable = &clk_foo_enable;
|
||||
.disable = &clk_foo_disable;
|
||||
};
|
||||
struct clk_ops clk_foo_ops {
|
||||
.enable = &clk_foo_enable;
|
||||
.disable = &clk_foo_disable;
|
||||
};
|
||||
|
||||
Implement the above functions using container_of:
|
||||
Implement the above functions using container_of::
|
||||
|
||||
#define to_clk_foo(_hw) container_of(_hw, struct clk_foo, hw)
|
||||
#define to_clk_foo(_hw) container_of(_hw, struct clk_foo, hw)
|
||||
|
||||
int clk_foo_enable(struct clk_hw *hw)
|
||||
{
|
||||
struct clk_foo *foo;
|
||||
int clk_foo_enable(struct clk_hw *hw)
|
||||
{
|
||||
struct clk_foo *foo;
|
||||
|
||||
foo = to_clk_foo(hw);
|
||||
foo = to_clk_foo(hw);
|
||||
|
||||
... perform magic on foo ...
|
||||
... perform magic on foo ...
|
||||
|
||||
return 0;
|
||||
};
|
||||
return 0;
|
||||
};
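
The fragments above are schematic rather than compilable C. A minimal userspace sketch of the same pattern, with valid initializer syntax, might look like the following. It makes several assumptions: struct clk_hw and struct clk_ops are cut-down stand-ins for the real definitions in <linux/clk-provider.h>, container_of() is re-implemented with offsetof(), and the enabled field is a hypothetical placeholder for hardware-specific state.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; the real types live in <linux/clk-provider.h>. */
struct clk_hw {
	int unused;
};

struct clk_ops {
	int  (*enable)(struct clk_hw *hw);
	void (*disable)(struct clk_hw *hw);
};

/* Simplified container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct clk_foo {
	int enabled;		/* hypothetical hardware-specific state */
	struct clk_hw hw;	/* placed second to exercise the offset math */
};

#define to_clk_foo(_hw) container_of(_hw, struct clk_foo, hw)

static int clk_foo_enable(struct clk_hw *hw)
{
	struct clk_foo *foo = to_clk_foo(hw);

	foo->enabled = 1;	/* stands in for "perform magic on foo" */
	return 0;
}

static void clk_foo_disable(struct clk_hw *hw)
{
	to_clk_foo(hw)->enabled = 0;
}

/* Designated initializers take '=' before the brace and ',' separators. */
static const struct clk_ops clk_foo_ops = {
	.enable  = clk_foo_enable,
	.disable = clk_foo_disable,
};
```

Calling clk_foo_ops.enable(&foo.hw) on a struct clk_foo foo reaches the enclosing structure through to_clk_foo(). This is the same hop the framework makes when it only holds a struct clk_hw pointer.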
|
||||
|
||||
Below is a matrix detailing which clk_ops are mandatory based upon the
|
||||
hardware capabilities of that clock. A cell marked as "y" means
|
||||
@@ -194,41 +201,56 @@ mandatory, a cell marked as "n" implies that either including that
|
||||
callback is invalid or otherwise unnecessary. Empty cells are either
|
||||
optional or must be evaluated on a case-by-case basis.
|
||||
|
||||
                              clock hardware characteristics
                -----------------------------------------------------------
                | gate | change rate | single parent | multiplexer | root |
                |------|-------------|---------------|-------------|------|
.prepare        |      |             |               |             |      |
.unprepare      |      |             |               |             |      |
                |      |             |               |             |      |
.enable         | y    |             |               |             |      |
.disable        | y    |             |               |             |      |
.is_enabled     | y    |             |               |             |      |
                |      |             |               |             |      |
.recalc_rate    |      | y           |               |             |      |
.round_rate     |      | y [1]       |               |             |      |
.determine_rate |      | y [1]       |               |             |      |
.set_rate       |      | y           |               |             |      |
                |      |             |               |             |      |
.set_parent     |      |             | n             | y           | n    |
.get_parent     |      |             | n             | y           | n    |
                |      |             |               |             |      |
.recalc_accuracy|      |             |               |             |      |
                |      |             |               |             |      |
.init           |      |             |               |             |      |
                -----------------------------------------------------------
[1] either one of round_rate or determine_rate is required.
|
||||
.. table:: clock hardware characteristics

   +----------------+------+-------------+---------------+-------------+------+
   |                | gate | change rate | single parent | multiplexer | root |
   +================+======+=============+===============+=============+======+
   |.prepare        |      |             |               |             |      |
   +----------------+------+-------------+---------------+-------------+------+
   |.unprepare      |      |             |               |             |      |
   +----------------+------+-------------+---------------+-------------+------+
   +----------------+------+-------------+---------------+-------------+------+
   |.enable         | y    |             |               |             |      |
   +----------------+------+-------------+---------------+-------------+------+
   |.disable        | y    |             |               |             |      |
   +----------------+------+-------------+---------------+-------------+------+
   |.is_enabled     | y    |             |               |             |      |
   +----------------+------+-------------+---------------+-------------+------+
   +----------------+------+-------------+---------------+-------------+------+
   |.recalc_rate    |      | y           |               |             |      |
   +----------------+------+-------------+---------------+-------------+------+
   |.round_rate     |      | y [1]_      |               |             |      |
   +----------------+------+-------------+---------------+-------------+------+
   |.determine_rate |      | y [1]_      |               |             |      |
   +----------------+------+-------------+---------------+-------------+------+
   |.set_rate       |      | y           |               |             |      |
   +----------------+------+-------------+---------------+-------------+------+
   +----------------+------+-------------+---------------+-------------+------+
   |.set_parent     |      |             | n             | y           | n    |
   +----------------+------+-------------+---------------+-------------+------+
   |.get_parent     |      |             | n             | y           | n    |
   +----------------+------+-------------+---------------+-------------+------+
   +----------------+------+-------------+---------------+-------------+------+
   |.recalc_accuracy|      |             |               |             |      |
   +----------------+------+-------------+---------------+-------------+------+
   +----------------+------+-------------+---------------+-------------+------+
   |.init           |      |             |               |             |      |
   +----------------+------+-------------+---------------+-------------+------+

.. [1] either one of round_rate or determine_rate is required.
|
||||
|
||||
Finally, register your clock at run-time with a hardware-specific
|
||||
registration function. This function simply populates struct clk_foo's
|
||||
data and then passes the common struct clk parameters to the framework
|
||||
with a call to:
|
||||
with a call to::
|
||||
|
||||
clk_register(...)
|
||||
clk_register(...)
|
||||
|
||||
See the basic clock types in drivers/clk/clk-*.c for examples.
|
||||
See the basic clock types in ``drivers/clk/clk-*.c`` for examples.
|
||||
|
||||
Part 5 - Disabling clock gating of unused clocks
|
||||
Disabling clock gating of unused clocks
|
||||
=======================================
|
||||
|
||||
Sometimes during development it can be useful to be able to bypass the
|
||||
default disabling of unused clocks. For example, if drivers aren't enabling
|
||||
@@ -239,7 +261,8 @@ are sorted out.
|
||||
To bypass this disabling, include "clk_ignore_unused" in the bootargs to the
|
||||
kernel.
|
||||
|
||||
Part 6 - Locking
|
||||
Locking
|
||||
=======
|
||||
|
||||
The common clock framework uses two global locks, the prepare lock and the
|
||||
enable lock.
|
||||
|
||||
Some files were not shown because too many files have changed in this diff