Extentions to RapidIO switch support:
1. modify switch route operation declarations to allow using single
switch-specific file for family of switches that share the same route
table operations.
2. add standard route table operations for switches that that support
route table manipulation registers as defined in the Rev.1.3 of RapidIO
specification.
3. add clear-route-table operation for switches
4. add CPSxx and TSIxxx families of RapidIO switches
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This changes notifier_from_errno(0) to be NOTIFY_OK instead of
NOTIFY_STOP_MASK | NOTIFY_OK.
Currently, the notifiers which return encapsulated errno value have to
do something like this:
err = do_something(); // returns -errno
if (err)
return notifier_from_errno(err);
else
return NOTIFY_OK;
This change makes the above code simple:
err = do_something(); // returns -errno
return return notifier_from_errno(err);
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No functional changes, just s/atomic_t count/int nr_threads/.
With the recent changes this counter has a single user, get_nr_threads()
And, none of its callers need the really accurate number of threads, not
to mention each caller obviously races with fork/exit. It is only used to
report this value to the user-space, except first_tid() uses it to avoid
the unnecessary while_each_thread() loop in the unlikely case.
It is a bit sad we need a word in struct signal_struct for this, perhaps
we can change get_nr_threads() to approximate the number of threads using
signal->live and kill ->nr_threads later.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that task->signal can't go away get_nr_threads() doesn't need
->siglock to read signal->count.
Also, make it inline, move into sched.h, and convert 2 other proc users of
signal->count to use this (now trivial) helper.
Henceforth get_nr_threads() is the only valid user of signal->count, we
are ready to turn it into "int nr_threads" or, perhaps, kill it.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have a lot of problems with accessing task_struct->signal, it can
"disappear" at any moment. Even current can't use its ->signal safely
after exit_notify(). ->siglock helps, but it is not convenient, not
always possible, and sometimes it makes sense to use task->signal even
after this task has already dead.
This patch adds the reference counter, sigcnt, into signal_struct. This
reference is owned by task_struct and it is dropped in
__put_task_struct(). Perhaps it makes sense to export
get/put_signal_struct() later, but currently I don't see the immediate
reason.
Rename __cleanup_signal() to free_signal_struct() and unexport it. With
the previous changes it does nothing except kmem_cache_free().
Change __exit_signal() to not clear/free ->signal, it will be freed when
the last reference to any thread in the thread group goes away.
Note:
- when the last thead exits signal->tty can point to nowhere, see
the next patch.
- with or without this patch signal_struct->count should go away,
or at least it should be "int nr_threads" for fs/proc. This will
be addressed later.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change zap_other_threads() to return the number of other sub-threads found
on ->thread_group list.
Other changes are cosmetic:
- change the code to use while_each_thread() helper
- remove the obsolete comment about SIGKILL/SIGSTOP
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that nobody ever changes subprocess_info->cred we can kill this member
and related code. ____call_usermodehelper() always runs in the context of
freshly forked kernel thread, it has the proper ->cred copied from its
parent kthread, keventd.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
call_usermodehelper_keys() uses call_usermodehelper_setkeys() to change
subprocess_info->cred in advance. Now that we have info->init() we can
change this code to set tgcred->session_keyring in context of execing
kernel thread.
Note: since currently call_usermodehelper_keys() is never called with
UMH_NO_WAIT, call_usermodehelper_keys()->key_get() and umh_keys_cleanup()
are not really needed, we could rely on install_session_keyring_to_cred()
which does key_get() on success.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The first patch in this series introduced an init function to the
call_usermodehelper api so that processes could be customized by caller.
This patch takes advantage of that fact, by customizing the helper in
do_coredump to create the pipe and set its core limit to one (for our
recusrsion check). This lets us clean up the previous uglyness in the
usermodehelper internals and factor call_usermodehelper out entirely.
While I'm at it, we can also modify the helper setup to look for a core
limit value of 1 rather than zero for our recursion check
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
About 6 months ago, I made a set of changes to how the core-dump-to-a-pipe
feature in the kernel works. We had reports of several races, including
some reports of apps bypassing our recursion check so that a process that
was forked as part of a core_pattern setup could infinitely crash and
refork until the system crashed.
We fixed those by improving our recursion checks. The new check basically
refuses to fork a process if its core limit is zero, which works well.
Unfortunately, I've been getting grief from maintainer of user space
programs that are inserted as the forked process of core_pattern. They
contend that in order for their programs (such as abrt and apport) to
work, all the running processes in a system must have their core limits
set to a non-zero value, to which I say 'yes'. I did this by design, and
think thats the right way to do things.
But I've been asked to ease this burden on user space enough times that I
thought I would take a look at it. The first suggestion was to make the
recursion check fail on a non-zero 'special' number, like one. That way
the core collector process could set its core size ulimit to 1, and enable
the kernel's recursion detection. This isn't a bad idea on the surface,
but I don't like it since its opt-in, in that if a program like abrt or
apport has a bug and fails to set such a core limit, we're left with a
recursively crashing system again.
So I've come up with this. What I've done is modify the
call_usermodehelper api such that an extra parameter is added, a function
pointer which will be called by the user helper task, after it forks, but
before it exec's the required process. This will give the caller the
opportunity to get a call back in the processes context, allowing it to do
whatever it needs to to the process in the kernel prior to exec-ing the
user space code. In the case of do_coredump, this callback is ues to set
the core ulimit of the helper process to 1. This elimnates the opt-in
problem that I had above, as it allows the ulimit for core sizes to be set
to the value of 1, which is what the recursion check looks for in
do_coredump.
This patch:
Create new function call_usermodehelper_fns() and allow it to assign both
an init and cleanup function, as we'll as arbitrary data.
The init function is called from the context of the forked process and
allows for customization of the helper process prior to calling exec. Its
return code gates the continuation of the process, or causes its exit.
Also add an arbitrary data pointer to the subprocess_info struct allowing
for data to be passed from the caller to the new process, and the
subsequent cleanup process
Also, use this patch to cleanup the cleanup function. It currently takes
an argp and envp pointer for freeing, which is ugly. Lets instead just
make the subprocess_info structure public, and pass that to the cleanup
and init routines
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems). Part of the reason is that
the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at
node 0 for newly created tasks.
This patch changes the rotor to be initialized to a random node number of
the cpuset.
[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have observed several workloads running on multi-node systems where
memory is assigned unevenly across the nodes in the system. There are
numerous reasons for this but one is the round-robin rotor in
cpuset_mem_spread_node().
For example, a simple test that writes a multi-page file will allocate
pages on nodes 0 2 4 6 ... Odd nodes are skipped. (Sometimes it
allocates on odd nodes & skips even nodes).
An example is shown below. The program "lfile" writes a file consisting
of 10 pages. The program then mmaps the file & uses get_mempolicy(...,
MPOL_F_NODE) to determine the nodes where the file pages were allocated.
The output is shown below:
# ./lfile
allocated on nodes: 2 4 6 0 1 2 6 0 2
There is a single rotor that is used for allocating both file pages & slab
pages. Writing the file allocates both a data page & a slab page
(buffer_head). This advances the RR rotor 2 nodes for each page
allocated.
A quick confirmation seems to confirm this is the cause of the uneven
allocation:
# echo 0 >/dev/cpuset/memory_spread_slab
# ./lfile
allocated on nodes: 6 7 8 9 0 1 2 3 4 5
This patch introduces a second rotor that is used for slab allocations.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FILE_MAPPED per memcg of migrated file cache is not properly updated,
because our hook in page_add_file_rmap() can't know to which memcg
FILE_MAPPED should be counted.
Basically, this patch is for fixing the bug but includes some big changes
to fix up other messes.
Now, at migrating mapped file, events happen in following sequence.
1. allocate a new page.
2. get memcg of an old page.
3. charge ageinst a new page before migration. But at this point,
no changes to new page's page_cgroup, no commit for the charge.
(IOW, PCG_USED bit is not set.)
4. page migration replaces radix-tree, old-page and new-page.
5. page migration remaps the new page if the old page was mapped.
6. Here, the new page is unlocked.
7. memcg commits the charge for newpage, Mark the new page's page_cgroup
as PCG_USED.
Because "commit" happens after page-remap, we can count FILE_MAPPED
at "5", because we should avoid to trust page_cgroup->mem_cgroup.
if PCG_USED bit is unset.
(Note: memcg's LRU removal code does that but LRU-isolation logic is used
for helping it. When we overwrite page_cgroup->mem_cgroup, page_cgroup is
not on LRU or page_cgroup->mem_cgroup is NULL.)
We can lose file_mapped accounting information at 5 because FILE_MAPPED
is updated only when mapcount changes 0->1. So we should catch it.
BTW, historically, above implemntation comes from migration-failure
of anonymous page. Because we charge both of old page and new page
with mapcount=0, we can't catch
- the page is really freed before remap.
- migration fails but it's freed before remap
or .....corner cases.
New migration sequence with memcg is:
1. allocate a new page.
2. mark PageCgroupMigration to the old page.
3. charge against a new page onto the old page's memcg. (here, new page's pc
is marked as PageCgroupUsed.)
4. page migration replaces radix-tree, page table, etc...
5. At remapping, new page's page_cgroup is now makrked as "USED"
We can catch 0->1 event and FILE_MAPPED will be properly updated.
And we can catch SWAPOUT event after unlock this and freeing this
page by unmap() can be caught.
7. Clear PageCgroupMigration of the old page.
So, FILE_MAPPED will be correctly updated.
Then, for what MIGRATION flag is ?
Without it, at migration failure, we may have to charge old page again
because it may be fully unmapped. "charge" means that we have to dive into
memory reclaim or something complated. So, it's better to avoid
charge it again. Before this patch, __commit_charge() was working for
both of the old/new page and fixed up all. But this technique has some
racy condtion around FILE_MAPPED and SWAPOUT etc...
Now, the kernel use MIGRATION flag and don't uncharge old page until
the end of migration.
I hope this change will make memcg's page migration much simpler. This
page migration has caused several troubles. Worth to add a flag for
simplification.
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds support for moving charge of file pages, which include
normal file, tmpfs file and swaps of tmpfs file. It's enabled by setting
bit 1 of <target cgroup>/memory.move_charge_at_immigrate.
Unlike the case of anonymous pages, file pages(and swaps) in the range
mmapped by the task will be moved even if the task hasn't done page fault,
i.e. they might not be the task's "RSS", but other task's "RSS" that maps
the same file. And mapcount of the page is ignored(the page can be moved
even if page_mapcount(page) > 1). So, conditions that the page/swap
should be met to be moved is that it must be in the range mmapped by the
target task and it must be charged to the old cgroup.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few architectures, like OMAP, allow you to set a debouncing time for the
gpio before generating the IRQ. Teach gpiolib about that.
Mark said:
: This would be generally useful for embedded systems, especially where
: the interrupt concerned is a wake source. It allows drivers to avoid
: spurious interrupts from noisy sources so if the hardware supports it
: the driver can avoid having to explicitly wait for the signal to become
: stable and software has to cope with fewer events. We've lived without
: it for quite some time, though.
David said:
: I looked at adding debounce support to the generic GPIO calls (and thus
: gpiolib) some time back, but decided against it. I forget why at this
: time (check list archives) but it wasn't because of lack of utility in
: certain contexts.
:
: One thing to watch out for is just how variable the hardware capabilities
: are. Atmel GPIOs have something like a fixed number of 32K clock cycles
: for debounce, twl4030 had something odd, OMAPs were more like the Atmel
: chips but with a different clock. In some cases debouncing had to be
: ganged, not per-GPIO. And so forth.
Signed-off-by: Felipe Balbi <felipe.balbi@nokia.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: David Brownell <david-b@pacbell.net>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gpiolib doesn't need to modify the names and I assume most initializers
use string constants that shouldn't be modified anyhow.
[akpm@linux-foundation.org: fix drivers/gpio/cs5535-gpio.c]
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Kevin Wells <kevin.wells@nxp.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>