Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - most of the rest of MM
 - a small number of misc things
 - lib/ updates
 - checkpatch
 - autofs updates
 - ipc/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (126 commits)
  ipc: optimize semget/shmget/msgget for lots of keys
  ipc/sem: play nicer with large nsops allocations
  ipc/sem: drop sem_checkid helper
  ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
  ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
  ipc: convert ipc_namespace.count from atomic_t to refcount_t
  kcov: support compat processes
  sh: defconfig: cleanup from old Kconfig options
  mn10300: defconfig: cleanup from old Kconfig options
  m32r: defconfig: cleanup from old Kconfig options
  drivers/pps: use surrounding "if PPS" to remove numerous dependency checks
  drivers/pps: aesthetic tweaks to PPS-related content
  cpumask: make cpumask_next() out-of-line
  kmod: move #ifdef CONFIG_MODULES wrapper to Makefile
  kmod: split off umh headers into its own file
  MAINTAINERS: clarify kmod is just a kernel module loader
  kmod: split out umh code into its own file
  test_kmod: flip INT checks to be consistent
  test_kmod: remove paranoid UINT_MAX check on uint range processing
  vfat: deduplicate hex2bin()
  ...
@@ -13,8 +13,12 @@ Optional properties:
|
||||
|
||||
Example:
|
||||
pps {
|
||||
compatible = "pps-gpio";
|
||||
gpios = <&gpio2 6 0>;
|
||||
pinctrl-names = "default";
|
||||
pinctrl-0 = <&pinctrl_pps>;
|
||||
|
||||
gpios = <&gpio1 26 GPIO_ACTIVE_HIGH>;
|
||||
assert-falling-edge;
|
||||
|
||||
compatible = "pps-gpio";
|
||||
status = "okay";
|
||||
};
|
||||
|
||||
+23
-21
@@ -48,12 +48,12 @@ problem:
|
||||
time_pps_create().
|
||||
|
||||
This implies that the source has a /dev/... entry. This assumption is
|
||||
ok for the serial and parallel port, where you can do something
|
||||
OK for the serial and parallel port, where you can do something
|
||||
useful besides(!) the gathering of timestamps as it is the central
|
||||
task for a PPS-API. But this assumption does not work for a single
|
||||
task for a PPS API. But this assumption does not work for a single
|
||||
purpose GPIO line. In this case even basic file-related functionality
|
||||
(like read() and write()) makes no sense at all and should not be a
|
||||
precondition for the use of a PPS-API.
|
||||
precondition for the use of a PPS API.
|
||||
|
||||
The problem can be simply solved if you consider that a PPS source is
|
||||
not always connected with a GPS data source.
|
||||
@@ -88,13 +88,13 @@ Coding example
|
||||
--------------
|
||||
|
||||
To register a PPS source into the kernel you should define a struct
|
||||
pps_source_info_s as follows:
|
||||
pps_source_info as follows:
|
||||
|
||||
static struct pps_source_info pps_ktimer_info = {
|
||||
.name = "ktimer",
|
||||
.path = "",
|
||||
.mode = PPS_CAPTUREASSERT | PPS_OFFSETASSERT | \
|
||||
PPS_ECHOASSERT | \
|
||||
.mode = PPS_CAPTUREASSERT | PPS_OFFSETASSERT |
|
||||
PPS_ECHOASSERT |
|
||||
PPS_CANWAIT | PPS_TSFMT_TSPEC,
|
||||
.echo = pps_ktimer_echo,
|
||||
.owner = THIS_MODULE,
|
||||
@@ -108,13 +108,13 @@ initialization routine as follows:
|
||||
|
||||
The pps_register_source() prototype is:
|
||||
|
||||
int pps_register_source(struct pps_source_info_s *info, int default_params)
|
||||
int pps_register_source(struct pps_source_info *info, int default_params)
|
||||
|
||||
where "info" is a pointer to a structure that describes a particular
|
||||
PPS source, "default_params" tells the system what the initial default
|
||||
parameters for the device should be (it is obvious that these parameters
|
||||
must be a subset of ones defined in the struct
|
||||
pps_source_info_s which describe the capabilities of the driver).
|
||||
pps_source_info which describe the capabilities of the driver).
|
||||
|
||||
Once you have registered a new PPS source into the system you can
|
||||
signal an assert event (for example in the interrupt handler routine)
|
||||
@@ -142,8 +142,10 @@ If the SYSFS filesystem is enabled in the kernel it provides a new class:
|
||||
Every directory is the ID of a PPS source defined in the system and
|
||||
inside you find several files:
|
||||
|
||||
$ ls /sys/class/pps/pps0/
|
||||
assert clear echo mode name path subsystem@ uevent
|
||||
$ ls -F /sys/class/pps/pps0/
|
||||
assert dev mode path subsystem@
|
||||
clear echo name power/ uevent
|
||||
|
||||
|
||||
Inside each "assert" and "clear" file you can find the timestamp and a
|
||||
sequence number:
|
||||
@@ -154,32 +156,32 @@ sequence number:
|
||||
Where before the "#" is the timestamp in seconds; after it is the
|
||||
sequence number. Other files are:
|
||||
|
||||
* echo: reports if the PPS source has an echo function or not;
|
||||
* echo: reports if the PPS source has an echo function or not;
|
||||
|
||||
* mode: reports available PPS functioning modes;
|
||||
* mode: reports available PPS functioning modes;
|
||||
|
||||
* name: reports the PPS source's name;
|
||||
* name: reports the PPS source's name;
|
||||
|
||||
* path: reports the PPS source's device path, that is the device the
|
||||
PPS source is connected to (if it exists).
|
||||
* path: reports the PPS source's device path, that is the device the
|
||||
PPS source is connected to (if it exists).
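
The "assert" and "clear" files described above hold a timestamp in seconds, a "#", and a sequence number. As a minimal sketch of how a userland tool might split such a value, assuming the fractional part is always printed with nine digits (as in the ppstest output shown in this document): the struct pps_event type and the pps_parse_event() helper are hypothetical names used only for illustration and are not part of the PPS API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one "<sec>.<nsec>#<sequence>" value. */
struct pps_event {
    long long sec;          /* whole seconds of the timestamp */
    long nsec;              /* fractional part, assumed to be nanoseconds */
    unsigned long sequence; /* event sequence number after the "#" */
};

/*
 * Parse a string such as "1186592700.388931295#365".
 * Returns 0 on success, -1 if the text does not match the expected layout.
 */
static int pps_parse_event(const char *text, struct pps_event *ev)
{
    if (sscanf(text, "%lld.%9ld#%lu",
               &ev->sec, &ev->nsec, &ev->sequence) != 3)
        return -1;
    return 0;
}
```

A caller would read one line from /sys/class/pps/pps0/assert into a buffer and hand it to the helper; error handling for the file access itself is left out of the sketch.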
|
||||
|
||||
|
||||
Testing the PPS support
|
||||
-----------------------
|
||||
|
||||
In order to test the PPS support even without specific hardware you can use
|
||||
the ktimer driver (see the client subsection in the PPS configuration menu)
|
||||
the pps-ktimer driver (see the client subsection in the PPS configuration menu)
|
||||
and the userland tools available in your distribution's pps-tools package,
|
||||
http://linuxpps.org , or https://github.com/ago/pps-tools .
|
||||
http://linuxpps.org , or https://github.com/redlab-i/pps-tools.
|
||||
|
||||
Once you have enabled the compilation of ktimer just modprobe it (if
|
||||
Once you have enabled the compilation of pps-ktimer just modprobe it (if
|
||||
not statically compiled):
|
||||
|
||||
# modprobe ktimer
|
||||
# modprobe pps-ktimer
|
||||
|
||||
and then run ppstest as follows:
|
||||
|
||||
$ ./ppstest /dev/pps0
|
||||
$ ./ppstest /dev/pps1
|
||||
trying PPS source "/dev/pps1"
|
||||
found PPS source "/dev/pps1"
|
||||
ok, found 1 source(s), now start fetching data...
|
||||
@@ -187,7 +189,7 @@ and the run ppstest as follow:
|
||||
source 0 - assert 1186592700.388931295, sequence: 365 - clear 0.000000000, sequence: 0
|
||||
source 0 - assert 1186592701.389032765, sequence: 366 - clear 0.000000000, sequence: 0
|
||||
|
||||
Please, note that to compile userland programs you need the file timepps.h .
|
||||
Please note that to compile userland programs, you need the file timepps.h.
|
||||
This is available in the pps-tools repository mentioned above.
|
||||
|
||||
|
||||
|
||||
@@ -193,6 +193,39 @@ Example::
|
||||
for (node = rb_first(&mytree); node; node = rb_next(node))
|
||||
printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
|
||||
|
||||
Cached rbtrees
|
||||
--------------
|
||||
|
||||
Computing the leftmost (smallest) node is quite a common task for binary
search trees, such as for traversals or for users relying on a particular
order for their own logic. To this end, users can use 'struct rb_root_cached'
to reduce O(logN) rb_first() calls to a simple pointer fetch, avoiding
potentially expensive tree iterations. This is done at negligible runtime
overhead for maintenance, albeit with a larger memory footprint.
|
||||
|
||||
Similar to the rb_root structure, cached rbtrees are initialized to be
|
||||
empty via:
|
||||
|
||||
struct rb_root_cached mytree = RB_ROOT_CACHED;
|
||||
|
||||
A cached rbtree is simply a regular rb_root with an extra pointer to cache the
leftmost node. This allows rb_root_cached to exist wherever rb_root does,
which permits augmented trees to be supported and requires only a few extra
interfaces:
|
||||
|
||||
struct rb_node *rb_first_cached(struct rb_root_cached *tree);
|
||||
void rb_insert_color_cached(struct rb_node *, struct rb_root_cached *, bool);
|
||||
void rb_erase_cached(struct rb_node *node, struct rb_root_cached *);
|
||||
|
||||
Both insert and erase calls have their respective counterparts for augmented
trees:
|
||||
|
||||
void rb_insert_augmented_cached(struct rb_node *node, struct rb_root_cached *,
|
||||
bool, struct rb_augment_callbacks *);
|
||||
void rb_erase_augmented_cached(struct rb_node *, struct rb_root_cached *,
|
||||
struct rb_augment_callbacks *);
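
The bool argument of the cached insert calls above tells the tree whether the new node is the new leftmost node. The idea can be sketched in user space as follows. This is a simplified illustration on a plain, unbalanced binary search tree, not the kernel rbtree code; struct node, struct root_cached, insert_cached() and first_cached() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Plain binary search tree node; no red-black balancing in this sketch. */
struct node {
    int key;
    struct node *left, *right;
};

/* Root plus a cached pointer to the smallest node (NULL when empty). */
struct root_cached {
    struct node *root;
    struct node *leftmost;
};

/* Insert n and refresh the cache when n becomes the smallest node. */
static void insert_cached(struct root_cached *tree, struct node *n)
{
    struct node **link = &tree->root;
    int leftmost = 1; /* stays set only if we never descend to the right */

    n->left = n->right = NULL;
    while (*link) {
        if (n->key < (*link)->key) {
            link = &(*link)->left;
        } else {
            link = &(*link)->right;
            leftmost = 0;
        }
    }
    *link = n;
    if (leftmost)
        tree->leftmost = n;
}

/* O(1) replacement for walking down the left spine of the tree. */
static struct node *first_cached(const struct root_cached *tree)
{
    return tree->leftmost;
}
```

Erasing is not shown; an erase helper would have to move the cached pointer to the next node in order whenever the cached node itself is removed.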
|
||||
|
||||
|
||||
Support for Augmented rbtrees
|
||||
-----------------------------
|
||||
|
||||
|
||||
@@ -0,0 +1,384 @@
|
||||
Heterogeneous Memory Management (HMM)
|
||||
|
||||
Transparently allow any component of a program to use any memory region of said
program with a device, without using a device-specific memory allocator. This is
becoming a requirement to simplify the use of advanced heterogeneous computing,
where GPUs, DSPs or FPGAs are used to perform various computations.

This document is divided as follows. In the first section I expose the problems
related to the use of a device-specific allocator. In the second section I
expose the hardware limitations that are inherent to many platforms. The third
section gives an overview of the HMM design. The fourth section explains how CPU
page-table mirroring works and what HMM's purpose is in this context. The fifth
section deals with how device memory is represented inside the kernel. The sixth
section presents the new migration helper that allows leveraging the device DMA
engine. Finally, the last section covers memory cgroup (memcg) and rss
accounting.


1) Problems of using device specific memory allocator:
2) System bus, device memory characteristics
3) Share address space and migration
4) Address space mirroring implementation and API
5) Represent and manage device memory from core kernel point of view
6) Migrate to and from device memory
7) Memory cgroup (memcg) and rss accounting
|
||||
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
1) Problems of using device specific memory allocator:
|
||||
|
||||
Devices with a large amount of on-board memory (several gigabytes), like GPUs,
have historically managed their memory through dedicated driver-specific APIs.
This creates a disconnect between memory allocated and managed by the device
driver and regular application memory (private anonymous, shared memory or
regular file-backed memory). From here on I will refer to this aspect as split
address space. I use shared address space to refer to the opposite situation,
i.e. one in which any memory region can be used by a device transparently.

The address space is split because the device can only access memory allocated
through the device-specific API. This implies that all memory objects in a
program are not equal from the device's point of view, which complicates large
programs that rely on a wide set of libraries.

Concretely, this means that code that wants to leverage a device like a GPU
needs to copy objects between generically allocated memory (malloc, mmap
private/shared) and memory allocated through the device driver API (this still
ends up as an mmap, but of the device file).

For flat data-sets (array, grid, image, ...) this isn't too hard to achieve, but
complex data-sets (list, tree, ...) are hard to get right. Duplicating a complex
data-set requires re-mapping all the pointer relations between each of its
elements. This is error prone, and the program gets harder to debug because of
the duplicate data-set.

Split address space also means that libraries can not transparently use data
they are getting from the core program or from other libraries, and thus each
library might have to duplicate its input data-set using a specific memory
allocator. Large projects suffer from this and waste resources because of the
various memory copies.

Duplicating each library API to accept as input or output memory allocated by
each device-specific allocator is not a viable option. It would lead to a
combinatorial explosion in the library entry points.

Finally, with the advance of high-level language constructs (in C++ but in other
languages too) it is now possible for the compiler to leverage GPUs or other
devices without the programmer's knowledge. Some compiler-identified patterns
are only doable with a shared address space. It is also more reasonable to use
a shared address space for all the other patterns.
|
||||
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
2) System bus, device memory characteristics
|
||||
|
||||
The system bus cripples shared address space due to a few limitations. Most
system buses only allow basic memory access from the device to main memory; even
cache coherency is often optional. Access to device memory from the CPU is even
more limited; more often than not it is not cache coherent.

If we only consider the PCIE bus, then a device can access main memory (often
through an IOMMU) and be cache coherent with the CPUs. However, it only allows
a limited set of atomic operations from the device on main memory. This is worse
in the other direction: the CPUs can only access a limited range of the device
memory and can not perform atomic operations on it. Thus device memory can not
be considered like regular memory from the kernel's point of view.

Another crippling factor is the limited bandwidth (~32GBytes/s with PCIE 4.0
and 16 lanes). This is about 32 times less than the fastest GPU memory
(1 TBytes/s). The final limitation is latency: access to main memory from the
device has an order of magnitude higher latency than when the device accesses
its own memory.

Some platforms are developing new system buses or additions/modifications to
PCIE to address some of those limitations (OpenCAPI, CCIX). They mainly allow
two-way cache coherency between CPU and device and allow all atomic operations
the architecture supports. Sadly, not all platforms are following this trend and
some major architectures are left without hardware solutions to those problems.

So for shared address space to make sense, not only must we allow the device to
access any memory, but we must also permit any memory to be migrated to device
memory while the device is using it (blocking CPU access while it happens).
|
||||
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
3) Share address space and migration
|
||||
|
||||
HMM intends to provide two main features. The first one is to share the address
space by duplicating the CPU page table into the device page table, so that the
same address points to the same memory, and this for any valid main memory
address in the process address space.

To achieve this, HMM offers a set of helpers to populate the device page table
while keeping track of CPU page table updates. Device page table updates are
not as easy as CPU page table updates. To update the device page table you must
allocate a buffer (or use a pool of pre-allocated buffers) and write GPU-specific
commands in it to perform the update (unmap, cache invalidations and flush,
...). This can not be done through common code for all devices. Hence HMM
provides helpers to factor out everything that can be, while leaving the gory
details to the device driver.

The second mechanism HMM provides is a new kind of ZONE_DEVICE memory that
allows allocating a struct page for each page of the device memory. Those pages
are special because the CPU can not map them. They however allow migrating
main memory to device memory using the existing migration mechanism, and
everything looks as if the page was swapped out to disk from the CPU's point of
view. Using a struct page gives the easiest and cleanest integration with
existing mm mechanisms. Again, here HMM only provides helpers, first to hotplug
new ZONE_DEVICE memory for the device memory and second to perform migration.
The policy decision of what and when to migrate is left to the device driver.

Note that any CPU access to a device page triggers a page fault and a migration
back to main memory, i.e. when a page backing a given address A is migrated from
a main memory page to a device page, then any CPU access to address A triggers a
page fault and initiates a migration back to main memory.


With these two features, HMM not only allows a device to mirror a process
address space and keeps both the CPU and device page tables synchronized, but
also allows leveraging device memory by migrating the part of the data-set that
is actively used by a device.
|
||||
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
4) Address space mirroring implementation and API
|
||||
|
||||
The main objective of address space mirroring is to allow duplicating a range of
the CPU page table into a device page table, with HMM helping to keep both
synchronized. A device driver that wants to mirror a process address space must
start with the registration of an hmm_mirror struct:

 int hmm_mirror_register(struct hmm_mirror *mirror,
                         struct mm_struct *mm);
 int hmm_mirror_register_locked(struct hmm_mirror *mirror,
                                struct mm_struct *mm);

The locked variant is to be used when the driver is already holding the mmap_sem
of the mm in write mode. The mirror struct has a set of callbacks that are used
to propagate CPU page table updates:

 struct hmm_mirror_ops {
     /* update() - synchronize page tables
      *
      * @mirror: pointer to struct hmm_mirror
      * @action: type of update that occurred to the CPU page table
      * @start: virtual start address of the range to update
      * @end: virtual end address of the range to update
      *
      * This callback ultimately originates from mmu_notifiers when the CPU
      * page table is updated. The device driver must update its page table
      * in response to this callback. The action argument tells what action
      * to perform.
      *
      * The device driver must not return from this callback until the device
      * page tables are completely updated (TLBs flushed, etc); this is a
      * synchronous call.
      */
      void (*update)(struct hmm_mirror *mirror,
                     enum hmm_update action,
                     unsigned long start,
                     unsigned long end);
 };

The device driver must perform the update on the range according to the action
(turn the range read only, fully unmap it, ...). Once the driver callback
returns, the device must be done with the update.
|
||||
|
||||
|
||||
When the device driver wants to populate a range of virtual addresses it can use
either:

 int hmm_vma_get_pfns(struct vm_area_struct *vma,
                      struct hmm_range *range,
                      unsigned long start,
                      unsigned long end,
                      hmm_pfn_t *pfns);
 int hmm_vma_fault(struct vm_area_struct *vma,
                   struct hmm_range *range,
                   unsigned long start,
                   unsigned long end,
                   hmm_pfn_t *pfns,
                   bool write,
                   bool block);

The first one (hmm_vma_get_pfns()) will only fetch present CPU page table
entries and will not trigger a page fault on missing or non-present entries.
The second one does trigger a page fault on missing entries, or on read-only
entries if the write parameter is true. Page faults use the generic mm page
fault code path just like a CPU page fault.

Both functions copy CPU page table entries into their pfns array argument. Each
entry in that array corresponds to an address in the virtual range. HMM provides
a set of flags to help the driver identify special CPU page table entries.

Locking with the update() callback is the most important aspect the driver must
respect in order to keep things properly synchronized. The usage pattern is:

 int driver_populate_range(...)
 {
      struct hmm_range range;
      ...
 again:
      ret = hmm_vma_get_pfns(vma, &range, start, end, pfns);
      if (ret)
          return ret;
      take_lock(driver->update);
      if (!hmm_vma_range_done(vma, &range)) {
          release_lock(driver->update);
          goto again;
      }

      // Use pfns array content to update device page table

      release_lock(driver->update);
      return 0;
 }
|
||||
|
||||
The driver->update lock is the same lock that the driver takes inside its
update() callback. That lock must be held before calling hmm_vma_range_done() to
avoid any race with a concurrent CPU page table update.
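
The snapshot, validate and retry shape of the pattern above can be illustrated outside the kernel. The following is only a single-threaded simulation under stated assumptions: a generation counter stands in for whatever HMM tracks internally, the racing CPU update is injected by hand, the lock calls are reduced to comments, and every name in it (range_snapshot, cpu_update(), populate_range(), ...) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-range state remembered by a snapshot. */
struct range_snapshot {
    unsigned long seq; /* generation seen when the snapshot was taken */
};

static unsigned long cpu_pt_generation; /* bumped on each simulated CPU update */
static unsigned long cpu_pt_value;      /* pretend CPU page table content */
static unsigned long device_pt_value;   /* pretend device page table content */
static int pending_invalidations;       /* injected racing updates still to fire */

/* Simulates a CPU page table update that invalidates older snapshots. */
static void cpu_update(unsigned long v)
{
    cpu_pt_value = v;
    cpu_pt_generation++;
}

/* Plays the role of fetching the entries: copy content, remember generation. */
static unsigned long snapshot(struct range_snapshot *r)
{
    r->seq = cpu_pt_generation;
    return cpu_pt_value;
}

/* Plays the role of the validation step done under the driver lock. */
static bool snapshot_still_valid(const struct range_snapshot *r)
{
    return r->seq == cpu_pt_generation;
}

/* Retries until a snapshot survives validation; returns the attempt count. */
static int populate_range(void)
{
    struct range_snapshot r;
    int attempts = 0;

    for (;;) {
        unsigned long v = snapshot(&r);

        attempts++;
        /* Injected race: an update lands between snapshot and validation. */
        if (pending_invalidations > 0) {
            pending_invalidations--;
            cpu_update(v + 1);
        }
        /* a real driver would take its update lock here */
        if (snapshot_still_valid(&r)) {
            device_pt_value = v; /* safe: nothing changed since the snapshot */
            /* ... and release the lock here */
            return attempts;
        }
        /* stale snapshot: release the lock and start over */
    }
}
```

The point of the sketch is the ordering: the snapshot is only trusted after it has been validated, and validation happens in the same critical section that the update path would use.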
|
||||
|
||||
HMM implements all this on top of the mmu_notifier API because we wanted a
simpler API and also to be able to perform optimizations later on, like doing
concurrent device updates in multi-device scenarios.

HMM also serves as an adapter for the impedance mismatch between how CPU page
table updates are done (by CPU writes to the page table and TLB flushes) and how
devices update their own page tables. A device update is a multi-step process.
First, appropriate commands are written to a buffer, then this buffer is
scheduled for execution on the device. It is only once the device has executed
the commands in the buffer that the update is done. Creating and scheduling the
update command buffer can happen concurrently for multiple devices. Waiting for
each device to report commands as executed is serialized (there is no point in
doing this concurrently).
|
||||
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
5) Represent and manage device memory from core kernel point of view
|
||||
|
||||
Several different designs were tried to support device memory. The first one
used a device-specific data structure to keep information about migrated memory,
and HMM hooked itself into various places of mm code to handle any access to
addresses that were backed by device memory. It turns out that this ended up
replicating most of the fields of struct page and also needed many kernel code
paths to be updated to understand this new kind of memory.

Most kernel code paths never try to access the memory behind a page, but only
care about the struct page contents. Because of this, HMM switched to directly
using struct page for device memory, which left most kernel code paths unaware
of the difference. We only need to make sure that no one ever tries to map those
pages from the CPU side.

HMM provides a set of helpers to register and hotplug device memory as a new
region needing a struct page. This is offered through a very simple API:

 struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
                                   struct device *device,
                                   unsigned long size);
 void hmm_devmem_remove(struct hmm_devmem *devmem);

The hmm_devmem_ops is where most of the important things are:

 struct hmm_devmem_ops {
     void (*free)(struct hmm_devmem *devmem, struct page *page);
     int (*fault)(struct hmm_devmem *devmem,
                  struct vm_area_struct *vma,
                  unsigned long addr,
                  struct page *page,
                  unsigned flags,
                  pmd_t *pmdp);
 };

The first callback (free()) happens when the last reference on a device page is
dropped. This means the device page is now free and no longer used by anyone.
The second callback happens whenever the CPU tries to access a device page,
which it can not do. This second callback must trigger a migration back to
system memory.
|
||||
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
6) Migrate to and from device memory
|
||||
|
||||
Because the CPU can not access device memory, migration must use the device DMA
engine to perform copies from and to device memory. For this we need a new
migration helper:

 int migrate_vma(const struct migrate_vma_ops *ops,
                 struct vm_area_struct *vma,
                 unsigned long mentries,
                 unsigned long start,
                 unsigned long end,
                 unsigned long *src,
                 unsigned long *dst,
                 void *private);

Unlike other migration functions, it works on a range of virtual addresses.
There are two reasons for that. First, device DMA copy has a high setup overhead
cost and thus batching multiple pages is needed, as otherwise the migration
overhead makes the whole exercise pointless. The second reason is that the
driver triggers such migrations based on the range of addresses the device is
actively accessing.

The migrate_vma_ops struct defines two callbacks. The first one
(alloc_and_copy()) controls destination memory allocation and the copy
operation. The second one (finalize_and_map()) is there to allow the device
driver to perform cleanup operations after migration.

 struct migrate_vma_ops {
     void (*alloc_and_copy)(struct vm_area_struct *vma,
                            const unsigned long *src,
                            unsigned long *dst,
                            unsigned long start,
                            unsigned long end,
                            void *private);
     void (*finalize_and_map)(struct vm_area_struct *vma,
                              const unsigned long *src,
                              const unsigned long *dst,
                              unsigned long start,
                              unsigned long end,
                              void *private);
 };

It is important to stress that these migration helpers allow for holes in the
virtual address range. Some pages in the range might not be migrated for all
the usual reasons (page is pinned, page is locked, ...). This helper does not
fail but just skips over those pages.

The alloc_and_copy() callback might also decide not to migrate all pages in the
range (for reasons under the callback's control). For those, the callback just
has to leave the corresponding dst entry empty.
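
The "leave the dst entry empty" convention can be sketched as follows. This is a user-space illustration only: the entry encoding (bit 0 meaning "can be migrated", 0 meaning "empty"), the skip policy, and the names SKETCH_MIGRATE, sketch_alloc_and_copy() and sketch_count_migrated() are all assumptions made up for the example and do not reflect the real src/dst encoding.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MIGRATE (1UL << 0) /* hypothetical flag: entry can be migrated */

/*
 * Hypothetical callback in the spirit of alloc_and_copy(): it fills dst for
 * the entries it chooses to migrate and leaves every other entry at 0.
 */
static void sketch_alloc_and_copy(const unsigned long *src, unsigned long *dst,
                                  size_t npages)
{
    for (size_t i = 0; i < npages; i++) {
        dst[i] = 0;
        if (!(src[i] & SKETCH_MIGRATE))
            continue;          /* hole in the range: pinned, locked, ... */
        if (i % 4 == 3)
            continue;          /* arbitrary driver policy: skip this page */
        dst[i] = 0x1000UL + i; /* pretend handle of the destination page */
    }
}

/* Counts the entries that actually received a destination. */
static size_t sketch_count_migrated(const unsigned long *dst, size_t npages)
{
    size_t n = 0;

    for (size_t i = 0; i < npages; i++)
        if (dst[i])
            n++;
    return n;
}
```

Both kinds of gaps look the same to whoever consumes dst afterwards, which is why the helper can skip pages instead of failing the whole range.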
|
||||
|
||||
Finally, the migration of the struct page might fail (for file-backed pages) for
various reasons (failure to freeze reference, or update page cache, ...). If
that happens, then finalize_and_map() can catch any pages that were not
migrated. Note that those pages were still copied to a new page and thus we
wasted bandwidth, but this is considered a rare event and a price that we are
willing to pay to keep all the code simpler.
|
||||
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
7) Memory cgroup (memcg) and rss accounting
|
||||
|
||||
For now device memory is accounted as any regular page in rss counters (either
anonymous if the device page is used for anonymous memory, file if the device
page is used for a file-backed page, or shmem if the device page is used for
shared memory). This is a deliberate choice so that existing applications, which
might start using device memory without knowing about it, keep running
unimpacted.

A drawback is that the OOM killer might kill an application using a lot of
device memory and not a lot of regular system memory, and thus not free much
system memory. We want to gather more real-world experience on how applications
and the system react under memory pressure in the presence of device memory
before deciding to account device memory differently.


The same decision was made for memory cgroup. Device memory pages are accounted
against the same memory cgroup a regular page would be accounted to. This does
simplify migration to and from device memory. This also means that migration
back from device memory to regular memory can not fail because it would
go above the memory cgroup limit. We might revisit this choice later on, once we
get more experience in how device memory is used and its impact on memory
resource control.


Note that device memory can never be pinned, either by a device driver or
through GUP, and thus such memory is always freed upon process exit, or when the
last reference is dropped in the case of shared memory or file-backed memory.
|
||||
+18
-1
@@ -7457,6 +7457,13 @@ S: Maintained
|
||||
F: tools/testing/selftests/
|
||||
F: Documentation/dev-tools/kselftest*
|
||||
|
||||
KERNEL USERMODE HELPER
|
||||
M: "Luis R. Rodriguez" <mcgrof@kernel.org>
|
||||
L: linux-kernel@vger.kernel.org
|
||||
S: Maintained
|
||||
F: kernel/umh.c
|
||||
F: include/linux/umh.h
|
||||
|
||||
KERNEL VIRTUAL MACHINE (KVM)
|
||||
M: Paolo Bonzini <pbonzini@redhat.com>
|
||||
M: Radim Krčmář <rkrcmar@redhat.com>
|
||||
@@ -7632,7 +7639,7 @@ F: include/linux/kmemleak.h
|
||||
F: mm/kmemleak.c
|
||||
F: mm/kmemleak-test.c
|
||||
|
||||
KMOD MODULE USERMODE HELPER
|
||||
KMOD KERNEL MODULE LOADER - USERMODE HELPER
|
||||
M: "Luis R. Rodriguez" <mcgrof@kernel.org>
|
||||
L: linux-kernel@vger.kernel.org
|
||||
S: Maintained
|
||||
@@ -7790,6 +7797,13 @@ M: Sasha Levin <alexander.levin@verizon.com>
|
||||
S: Maintained
|
||||
F: tools/lib/lockdep/
|
||||
|
||||
HMM - Heterogeneous Memory Management
|
||||
M: Jérôme Glisse <jglisse@redhat.com>
|
||||
L: linux-mm@kvack.org
|
||||
S: Maintained
|
||||
F: mm/hmm*
|
||||
F: include/linux/hmm*
|
||||
|
||||
LIBNVDIMM BLK: MMIO-APERTURE DRIVER
|
||||
M: Ross Zwisler <ross.zwisler@linux.intel.com>
|
||||
L: linux-nvdimm@lists.01.org
|
||||
@@ -10727,8 +10741,11 @@ W: http://wiki.enneenne.com/index.php/LinuxPPS_support
|
||||
L: linuxpps@ml.enneenne.com (subscribers-only)
|
||||
S: Maintained
|
||||
F: Documentation/pps/
|
||||
F: Documentation/devicetree/bindings/pps/pps-gpio.txt
|
||||
F: Documentation/ABI/testing/sysfs-pps
|
||||
F: drivers/pps/
|
||||
F: include/linux/pps*.h
|
||||
F: include/uapi/linux/pps.h
|
||||
|
||||
PPTP DRIVER
|
||||
M: Dmitry Kozlov <xeb@mail.ru>
|
||||
|
||||
@@ -65,13 +65,14 @@ extern void * memchr(const void *, int, size_t);
|
||||
aligned values. The DEST and COUNT parameters must be even for
|
||||
correct operation. */
|
||||
|
||||
#define __HAVE_ARCH_MEMSETW
|
||||
extern void * __memsetw(void *dest, unsigned short, size_t count);
|
||||
|
||||
#define memsetw(s, c, n) \
|
||||
(__builtin_constant_p(c) \
|
||||
? __constant_c_memset((s),0x0001000100010001UL*(unsigned short)(c),(n)) \
|
||||
: __memsetw((s),(c),(n)))
|
||||
#define __HAVE_ARCH_MEMSET16
|
||||
extern void * __memset16(void *dest, unsigned short, size_t count);
|
||||
static inline void *memset16(uint16_t *p, uint16_t v, size_t n)
|
||||
{
|
||||
if (__builtin_constant_p(v))
|
||||
return __constant_c_memset(p, 0x0001000100010001UL * v, n * 2);
|
||||
return __memset16(p, v, n * 2);
|
||||
}
|
||||
|
||||
#endif /* __KERNEL__ */
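
The memset16() wrapper above multiplies the 16-bit value by 0x0001000100010001 so that one 64-bit word carries four copies of the pattern, and it passes n * 2 because the count is in 16-bit units while the underlying routines take bytes. A small portable sketch of the replication step (replicate16() is a hypothetical helper name, not part of this patch):

```c
#include <assert.h>
#include <stdint.h>

/* Copy a 16-bit value into all four 16-bit lanes of a 64-bit word. */
static uint64_t replicate16(uint16_t v)
{
    /* Each 1 in the constant sits at the bottom of one lane, so the
     * product places v in every lane without carries between lanes. */
    return UINT64_C(0x0001000100010001) * v;
}
```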
|
||||
|
||||
|
||||
@@ -34,7 +34,7 @@ static inline void scr_memsetw(u16 *s, u16 c, unsigned int count)
|
||||
if (__is_ioaddr(s))
|
||||
memsetw_io((u16 __iomem *) s, c, count);
|
||||
else
|
||||
memsetw(s, c, count);
|
||||
memset16(s, c, count / 2);
|
||||
}
|
||||
|
||||
/* Do not trust that the usage will be correct; analyze the arguments. */
|
||||
|
||||
@@ -20,7 +20,7 @@
|
||||
.globl memset
|
||||
.globl __memset
|
||||
.globl ___memset
|
||||
.globl __memsetw
|
||||
.globl __memset16
|
||||
.globl __constant_c_memset
|
||||
|
||||
.ent ___memset
|
||||
@@ -110,8 +110,8 @@ EXPORT_SYMBOL(___memset)
|
||||
EXPORT_SYMBOL(__constant_c_memset)
|
||||
|
||||
.align 5
|
||||
.ent __memsetw
|
||||
__memsetw:
|
||||
.ent __memset16
|
||||
__memset16:
|
||||
.prologue 0
|
||||
|
||||
inswl $17,0,$1 /* E0 */
|
||||
@@ -123,8 +123,8 @@ __memsetw:
|
||||
or $1,$4,$17 /* E0 */
|
||||
br __constant_c_memset /* .. E1 */
|
||||
|
||||
.end __memsetw
|
||||
EXPORT_SYMBOL(__memsetw)
|
||||
.end __memset16
|
||||
EXPORT_SYMBOL(__memset16)
|
||||
|
||||
memset = ___memset
|
||||
__memset = ___memset
|
||||
|
||||
@@ -24,6 +24,20 @@ extern void * memchr(const void *, int, __kernel_size_t);
|
||||
#define __HAVE_ARCH_MEMSET
|
||||
extern void * memset(void *, int, __kernel_size_t);
|
||||
|
||||
#define __HAVE_ARCH_MEMSET32
|
||||
extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
|
||||
static inline void *memset32(uint32_t *p, uint32_t v, __kernel_size_t n)
|
||||
{
|
||||
return __memset32(p, v, n * 4);
|
||||
}
|
||||
|
||||
#define __HAVE_ARCH_MEMSET64
|
||||
extern void *__memset64(uint64_t *, uint32_t low, __kernel_size_t, uint32_t hi);
|
||||
static inline void *memset64(uint64_t *p, uint64_t v, __kernel_size_t n)
|
||||
{
|
||||
return __memset64(p, v, n * 8, v >> 32);
|
||||
}
|
||||
|
||||
extern void __memzero(void *ptr, __kernel_size_t n);
|
||||
|
||||
#define memset(p,v,n) \
|
||||
|
||||
@@ -87,6 +87,8 @@ EXPORT_SYMBOL(__raw_writesl);
|
||||
EXPORT_SYMBOL(strchr);
|
||||
EXPORT_SYMBOL(strrchr);
|
||||
EXPORT_SYMBOL(memset);
|
||||
EXPORT_SYMBOL(__memset32);
|
||||
EXPORT_SYMBOL(__memset64);
|
||||
EXPORT_SYMBOL(memcpy);
|
||||
EXPORT_SYMBOL(memmove);
|
||||
EXPORT_SYMBOL(memchr);
|
||||
|
||||
+18
-6
@@ -28,7 +28,7 @@ UNWIND( .fnstart )
|
||||
1: orr r1, r1, r1, lsl #8
|
||||
orr r1, r1, r1, lsl #16
|
||||
mov r3, r1
|
||||
cmp r2, #16
|
||||
7: cmp r2, #16
|
||||
blt 4f
|
||||
|
||||
#if ! CALGN(1)+0
|
||||
@@ -41,7 +41,7 @@ UNWIND( .fnend )
|
||||
UNWIND( .fnstart )
|
||||
UNWIND( .save {r8, lr} )
|
||||
mov r8, r1
|
||||
mov lr, r1
|
||||
mov lr, r3
|
||||
|
||||
2: subs r2, r2, #64
|
||||
stmgeia ip!, {r1, r3, r8, lr} @ 64 bytes at a time.
|
||||
@@ -73,11 +73,11 @@ UNWIND( .fnend )
|
||||
UNWIND( .fnstart )
|
||||
UNWIND( .save {r4-r8, lr} )
|
||||
mov r4, r1
|
||||
mov r5, r1
|
||||
mov r5, r3
|
||||
mov r6, r1
|
||||
mov r7, r1
|
||||
mov r7, r3
|
||||
mov r8, r1
|
||||
mov lr, r1
|
||||
mov lr, r3
|
||||
|
||||
cmp r2, #96
|
||||
tstgt ip, #31
|
||||
@@ -114,7 +114,7 @@ UNWIND( .fnstart )
|
||||
tst r2, #4
|
||||
strne r1, [ip], #4
|
||||
/*
|
||||
* When we get here, we've got less than 4 bytes to zero. We
|
||||
* When we get here, we've got less than 4 bytes to set. We
|
||||
* may have an unaligned pointer as well.
|
||||
*/
|
||||
5: tst r2, #2
|
||||
@@ -135,3 +135,15 @@ UNWIND( .fnstart )
|
||||
UNWIND( .fnend )
|
||||
ENDPROC(memset)
|
||||
ENDPROC(mmioset)
|
||||
|
||||
ENTRY(__memset32)
|
||||
UNWIND( .fnstart )
|
||||
mov r3, r1 @ copy r1 to r3 and fall into memset64
|
||||
UNWIND( .fnend )
|
||||
ENDPROC(__memset32)
|
||||
ENTRY(__memset64)
|
||||
UNWIND( .fnstart )
|
||||
mov ip, r0 @ preserve r0 as return value
|
||||
b 7b @ jump into the middle of memset
|
||||
UNWIND( .fnend )
|
||||
ENDPROC(__memset64)
|
||||
|
||||
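Note on the hunks above: on 32-bit ARM the 64-bit fill value is passed as two 32-bit halves, and __memset32 copies r1 into r3 before falling into __memset64, so a 32-bit fill looks like the special case where both halves are equal. The sketch below models that relationship in portable C. The names fill_words_sketch, memset32_sketch and memset64_words_sketch are hypothetical, the buffer is treated as an array of 32-bit words, and placing the low half first assumes little-endian layout; none of this is the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the shared fill loop: store lo and hi
 * alternately, with the length given in bytes as in the
 * __memset32()/__memset64() prototypes in the diff.
 */
static uint32_t *fill_words_sketch(uint32_t *dest, uint32_t lo,
				   size_t bytes, uint32_t hi)
{
	/* Assumption: the length is a whole number of 32-bit words. */
	assert(bytes % 4 == 0);

	for (size_t i = 0; i < bytes / 4; i++)
		dest[i] = (i & 1) ? hi : lo;
	return dest;
}

/* 32-bit fill: both halves carry the same value (cf. "mov r3, r1"). */
static uint32_t *memset32_sketch(uint32_t *p, uint32_t v, size_t n)
{
	return fill_words_sketch(p, v, n * 4, v);
}

/*
 * 64-bit fill over a word buffer: n counts 64-bit elements, so the
 * buffer must hold 2 * n words. Low half first assumes little-endian.
 */
static uint32_t *memset64_words_sketch(uint32_t *words, uint64_t v, size_t n)
{
	return fill_words_sketch(words, (uint32_t)v, n * 8,
				 (uint32_t)(v >> 32));
}
```

Sharing one loop between the two widths is the design choice the assembly seems to make by jumping into the middle of memset at label 7.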
@@ -690,7 +690,7 @@ void __init smp_init_cpus(void)
|
||||
acpi_parse_gic_cpu_interface, 0);
|
||||
|
||||
if (cpu_count > nr_cpu_ids)
|
||||
pr_warn("Number of cores (%d) exceeds configured maximum of %d - clipping\n",
|
||||
pr_warn("Number of cores (%d) exceeds configured maximum of %u - clipping\n",
|
||||
cpu_count, nr_cpu_ids);
|
||||
|
||||
if (!bootcpu_valid) {
|
||||
|
||||
@@ -17,6 +17,9 @@ config FRV
|
||||
select HAVE_DEBUG_STACKOVERFLOW
|
||||
select ARCH_NO_COHERENT_DMA_MMAP
|
||||
|
||||
config CPU_BIG_ENDIAN
|
||||
def_bool y
|
||||
|
||||
config ZONE_DMA
|
||||
bool
|
||||
default y
|
||||
|
||||
@@ -23,6 +23,9 @@ config H8300
|
||||
select HAVE_ARCH_HASH
|
||||
select CPU_NO_EFFICIENT_FFS
|
||||
|
||||
config CPU_BIG_ENDIAN
|
||||
def_bool y
|
||||
|
||||
config RWSEM_GENERIC_SPINLOCK
|
||||
def_bool y
|
||||
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
CONFIG_EXPERIMENTAL=y
|
||||
CONFIG_SYSVIPC=y
|
||||
CONFIG_IKCONFIG=y
|
||||
CONFIG_IKCONFIG_PROC=y
|
||||
@@ -40,7 +39,6 @@ CONFIG_NETFILTER_XT_MATCH_REALM=m
|
||||
CONFIG_NETFILTER_XT_MATCH_SCTP=m
|
||||
CONFIG_NETFILTER_XT_MATCH_STRING=m
|
||||
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
|
||||
CONFIG_IP_NF_QUEUE=m
|
||||
CONFIG_IP_NF_IPTABLES=m
|
||||
CONFIG_IP_NF_MATCH_ADDRTYPE=m
|
||||
CONFIG_IP_NF_MATCH_ECN=m
|
||||
@@ -48,7 +46,6 @@ CONFIG_IP_NF_MATCH_TTL=m
|
||||
CONFIG_IP_NF_FILTER=m
|
||||
CONFIG_IP_NF_TARGET_REJECT=m
|
||||
CONFIG_IP_NF_TARGET_LOG=m
|
||||
CONFIG_IP_NF_TARGET_ULOG=m
|
||||
CONFIG_IP_NF_MANGLE=m
|
||||
CONFIG_IP_NF_TARGET_ECN=m
|
||||
CONFIG_IP_NF_TARGET_TTL=m
|
||||
@@ -106,7 +103,6 @@ CONFIG_SENSORS_SMSC47M1=m
|
||||
CONFIG_SENSORS_W83781D=m
|
||||
CONFIG_SENSORS_W83L785TS=m
|
||||
CONFIG_SENSORS_W83627HF=m
|
||||
CONFIG_VIDEO_OUTPUT_CONTROL=m
|
||||
CONFIG_EXT2_FS=y
|
||||
CONFIG_EXT2_FS_XATTR=y
|
||||
CONFIG_EXT2_FS_POSIX_ACL=y
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
CONFIG_EXPERIMENTAL=y
|
||||
CONFIG_SYSVIPC=y
|
||||
CONFIG_BSD_PROCESS_ACCT=y
|
||||
CONFIG_IKCONFIG=y
|
||||
@@ -30,7 +29,6 @@ CONFIG_IP_PNP=y
|
||||
CONFIG_IP_PNP_DHCP=y
|
||||
# CONFIG_IPV6 is not set
|
||||
CONFIG_MTD=y
|
||||
CONFIG_MTD_PARTITIONS=y
|
||||
CONFIG_MTD_REDBOOT_PARTS=y
|
||||
CONFIG_MTD_BLOCK=y
|
||||
CONFIG_MTD_CFI=m
|
||||
@@ -63,7 +61,6 @@ CONFIG_SERIAL_M32R_SIO_CONSOLE=y
|
||||
CONFIG_SERIAL_M32R_PLDSIO=y
|
||||
CONFIG_HW_RANDOM=y
|
||||
CONFIG_DS1302=y
|
||||
CONFIG_VIDEO_OUTPUT_CONTROL=m
|
||||
CONFIG_FB=y
|
||||
CONFIG_FIRMWARE_EDID=y
|
||||
CONFIG_FB_S1D13XXX=y
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
CONFIG_EXPERIMENTAL=y
|
||||
CONFIG_SYSVIPC=y
|
||||
CONFIG_BSD_PROCESS_ACCT=y
|
||||
CONFIG_IKCONFIG=y
|
||||
@@ -29,7 +28,6 @@ CONFIG_IP_PNP=y
|
||||
CONFIG_IP_PNP_DHCP=y
|
||||
# CONFIG_IPV6 is not set
|
||||
CONFIG_MTD=y
|
||||
CONFIG_MTD_PARTITIONS=y
|
||||
CONFIG_MTD_REDBOOT_PARTS=y
|
||||
CONFIG_MTD_BLOCK=y
|
||||
CONFIG_MTD_CFI=m
|
||||
@@ -62,7 +60,6 @@ CONFIG_SERIAL_M32R_SIO_CONSOLE=y
|
||||
CONFIG_SERIAL_M32R_PLDSIO=y
|
||||
CONFIG_HW_RANDOM=y
|
||||
CONFIG_DS1302=y
|
||||
CONFIG_VIDEO_OUTPUT_CONTROL=m
|
||||
CONFIG_FB=y
|
||||
CONFIG_FIRMWARE_EDID=y
|
||||
CONFIG_FB_S1D13XXX=y
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
CONFIG_EXPERIMENTAL=y
|
||||
CONFIG_BSD_PROCESS_ACCT=y
|
||||
CONFIG_IKCONFIG=y
|
||||
CONFIG_LOG_BUF_SHIFT=14
|
||||
@@ -39,7 +38,6 @@ CONFIG_NETDEVICES=y
|
||||
# CONFIG_VT is not set
|
||||
CONFIG_SERIAL_M32R_SIO_CONSOLE=y
|
||||
CONFIG_HW_RANDOM=y
|
||||
CONFIG_VIDEO_OUTPUT_CONTROL=m
|
||||
CONFIG_EXT2_FS=y
|
||||
CONFIG_EXT3_FS=y
|
||||
CONFIG_NFS_FS=y
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
CONFIG_EXPERIMENTAL=y
|
||||
CONFIG_SYSVIPC=y
|
||||
CONFIG_IKCONFIG=y
|
||||
CONFIG_IKCONFIG_PROC=y
|
||||
@@ -31,9 +30,7 @@ CONFIG_IP_PNP_DHCP=y
|
||||
# CONFIG_IPV6 is not set
|
||||
# CONFIG_STANDALONE is not set
|
||||
CONFIG_MTD=y
|
||||
CONFIG_MTD_PARTITIONS=y
|
||||
CONFIG_MTD_REDBOOT_PARTS=y
|
||||
CONFIG_MTD_CHAR=y
|
||||
CONFIG_MTD_BLOCK=y
|
||||
CONFIG_BLK_DEV_LOOP=y
|
||||
CONFIG_BLK_DEV_NBD=m
|
||||
@@ -50,7 +47,6 @@ CONFIG_NETDEVICES=y
|
||||
# CONFIG_VT is not set
|
||||
CONFIG_SERIAL_M32R_SIO_CONSOLE=y
|
||||
CONFIG_HW_RANDOM=y
|
||||
CONFIG_VIDEO_OUTPUT_CONTROL=m
|
||||
CONFIG_EXT2_FS=y
|
||||
CONFIG_EXT3_FS=y
|
||||
CONFIG_ISO9660_FS=y
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
CONFIG_EXPERIMENTAL=y
|
||||
CONFIG_SYSVIPC=y
|
||||
CONFIG_IKCONFIG=y
|
||||
CONFIG_IKCONFIG_PROC=y
|
||||
@@ -29,9 +28,7 @@ CONFIG_IP_PNP_DHCP=y
|
||||
# CONFIG_IPV6 is not set
|
||||
# CONFIG_STANDALONE is not set
|
||||
CONFIG_MTD=y
|
||||
CONFIG_MTD_PARTITIONS=y
|
||||
CONFIG_MTD_REDBOOT_PARTS=y
|
||||
CONFIG_MTD_CHAR=y
|
||||
CONFIG_MTD_BLOCK=y
|
||||
CONFIG_BLK_DEV_LOOP=y
|
||||
CONFIG_BLK_DEV_NBD=m
|
||||
@@ -48,7 +45,6 @@ CONFIG_NETDEVICES=y
|
||||
# CONFIG_VT is not set
|
||||
CONFIG_SERIAL_M32R_SIO_CONSOLE=y
|
||||
CONFIG_HW_RANDOM=y
|
||||
CONFIG_VIDEO_OUTPUT_CONTROL=m
|
||||
CONFIG_EXT2_FS=y
|
||||
CONFIG_EXT3_FS=y
|
||||
CONFIG_ISO9660_FS=y
|
||||
|
||||
Some files were not shown because too many files have changed in this diff.