The timeouts in IPMI message handling are in the 1-5 second range, so a
1 second timeout is reasonable. This should help reduce power
consumption on idle systems.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some odd systems may have multiple BMCs, and we want to be able to support
them. Let's make the assumption that if a system legitimately has
multiple BMCs then each BMC's SI will be of the same type, and also that
we won't see multiple SIs of the same type unless we have multiple BMCs.
If these hold true then we should register all SIs of the same type.
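Here is a minimal sketch of the registration rule above. It assumes the discovered SIs are already sorted so that `infos[0]` is the preferred one, and it marks every SI of that same interface type for registration. The `si_info` struct and the `register_same_type()` helper are hypothetical. Only the KCS/SMIC/BT interface types come from IPMI itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* IPMI system interface types. */
enum si_type { SI_KCS, SI_SMIC, SI_BT };

/* Hypothetical record for one discovered SI. */
struct si_info {
    enum si_type type;
    bool registered;
};

/*
 * infos[0] is assumed to be the preferred SI. Register every SI that
 * shares its type, on the assumption that same-type SIs belong to
 * different BMCs. Returns the number of SIs registered.
 */
static size_t register_same_type(struct si_info *infos, size_t n)
{
    size_t count = 0;

    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        infos[i].registered = (infos[i].type == infos[0].type);
        if (infos[i].registered)
            count++;
    }
    return count;
}
```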
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we're not currently in the middle of a transaction, and if we have
interrupts, there's no real reason to poll the controller more frequently
than the core IPMI code does. Set the interrupt_disabled flag
appropriately as the interrupt state changes, and make the timeout code
reset itself only if the transaction is incomplete or we have no
interrupts.
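The following is a hypothetical sketch of the timer decision described above. The `smi_state` struct and both helpers are illustrative names, not the driver's code. Only the `interrupt_disabled` flag is named in the text.

```c
#include <stdbool.h>

/* Hypothetical per-interface state. */
struct smi_state {
    bool irq_available;       /* an IRQ was configured for this SI */
    bool interrupt_disabled;  /* tracks the current interrupt state */
    bool in_transaction;      /* a message is still being processed */
};

/* Keep interrupt_disabled in sync whenever interrupts are toggled. */
static void set_interrupts(struct smi_state *s, bool enabled)
{
    s->interrupt_disabled = !enabled;
}

/*
 * Re-arm the driver's own poll timer only if the transaction is
 * incomplete or no interrupt can report progress; otherwise the core
 * IPMI code polls often enough.
 */
static bool timer_should_rearm(const struct smi_state *s)
{
    bool have_irq = s->irq_available && !s->interrupt_disabled;

    return s->in_transaction || !have_irq;
}
```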
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The IPMI spec provides an ordering for SI discovery. Change the driver to
match, with the exception of preferring SMBIOS to SPMI, as HP machines (at
least) contain accurate information in the former but not the latter.
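One way to express a preference order is a `qsort()` comparator over an enum of address sources. In this sketch, only the "SMBIOS before SPMI" preference comes from the text. The `SRC_*` names and the positions of the other sources are assumptions, not the order given in the IPMI specification.

```c
#include <stdlib.h>

/* Hypothetical sources; lower value = preferred. Only SMBIOS < SPMI is
 * taken from the text, the rest of the order is illustrative. */
enum si_source { SRC_USER, SRC_SMBIOS, SRC_SPMI, SRC_PCI, SRC_DEFAULT };

struct si_found {
    enum si_source src;
    int order;              /* discovery order, used to break ties */
};

static int si_cmp(const void *a, const void *b)
{
    const struct si_found *x = a, *y = b;

    if (x->src != y->src)
        return (int)x->src - (int)y->src;
    return x->order - y->order;     /* same source: keep discovery order */
}

static void sort_by_preference(struct si_found *v, size_t n)
{
    qsort(v, n, sizeof(*v), si_cmp);
}
```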
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Only register one SI per BMC. Use any user-provided devices first,
followed by the first device with an IRQ, followed by the first device
discovered.
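The selection order above can be sketched as three passes over the candidates in discovery order. The `si_candidate` struct and `pick_si()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one discovered SI. */
struct si_candidate {
    bool user_specified;    /* provided by the user, e.g. module options */
    int irq;                /* 0 means no interrupt */
};

/* Candidates are in discovery order. Returns the chosen index, or -1. */
static int pick_si(const struct si_candidate *c, size_t n)
{
    /* 1. Any user-provided device wins. */
    for (size_t i = 0; i < n; i++)
        if (c[i].user_specified)
            return (int)i;
    /* 2. Otherwise, the first device with an IRQ. */
    for (size_t i = 0; i < n; i++)
        if (c[i].irq > 0)
            return (int)i;
    /* 3. Otherwise, the first device discovered. */
    return n > 0 ? 0 : -1;
}
```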
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The IPMI spec indicates that we should only make use of one SI per BMC,
so separate device discovery and registration to make that possible.
[thenzl@redhat.com: fix mutex use]
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch from a char* to an enum to identify the address source of SIs,
making it easier to handle them appropriately during registration.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes the
purpose of the operation clearer, whereas the latter looks like a
no-op.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
type T;
T x;
identifier f;
@@
T f (...) { <+...
- ERR_PTR(PTR_ERR(x))
+ x
...+> }
@@
expression x;
@@
- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// </smpl>
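The userspace re-implementation below shows why the rewrite is safe. It is modeled on the kernel's `include/linux/err.h` helpers, but these versions and the `foo`/`bar` types are stand-ins for illustration. `ERR_CAST()` returns its argument unchanged, so it behaves like `ERR_PTR(PTR_ERR(x))` while stating the intent: forwarding an error pointer of one type as a pointer of another type.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    /* The top MAX_ERRNO addresses encode negative errno values. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

struct foo { int x; };
struct bar { int y; };

static struct foo the_foo = { 7 };
static struct bar the_bar;

static struct foo *get_foo(bool fail)
{
    return fail ? ERR_PTR(-ENOMEM) : &the_foo;
}

static struct bar *get_bar(bool fail)
{
    struct foo *f = get_foo(fail);

    if (IS_ERR(f))
        return ERR_CAST(f);     /* was: ERR_PTR(PTR_ERR(f)) */
    the_bar.y = f->x;
    return &the_bar;
}
```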
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ipc/sem.c begins with a 15-year-old description of bugs in the initial
implementation in Linux-1.0. The patch replaces that with a top-level
description of the current code.
A TODO could be derived from this text:
The Open Group man page for semop() does not mandate FIFO. Thus there is
no need for the semaphore array's list of pending operations.
If
- this list is removed,
- the per-array spinlock is removed (possible if there is no list to
  protect), and
- sem_otime is moved into the individual semaphores and calculated on
  demand during semctl(),
then the array would be read-mostly, which would significantly improve
scaling for applications that use semaphore arrays with lots of entries.
The price would be expensive semctl() calls, which would have to take
every per-semaphore lock (sem_base[i].lock is a hypothetical field):
    for (i = 0; i < sma->sem_nsems; i++)
        spin_lock(&sma->sem_base[i].lock);
    <do stuff>
    for (i = 0; i < sma->sem_nsems; i++)
        spin_unlock(&sma->sem_base[i].lock);
I'm not sure the complexity is worth the effort, so here is the
documentation of the current behavior first.
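The TODO design can be modeled in userspace with one C11 `atomic_flag` spinlock per semaphore. In this model, a single-semaphore operation takes only its own lock. A semctl()-style query takes every lock in index order, which avoids deadlock between two such callers, and derives the array's sem_otime on demand. All type and function names here are hypothetical, and sleeping/waiting is not modeled.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NSEMS 4     /* hypothetical array size */

/* Hypothetical layout: no array-wide lock, per-semaphore lock + otime. */
struct sem_item {
    atomic_flag lock;
    int semval;
    long sem_otime;
};

struct sem_set {
    struct sem_item sem_base[NSEMS];
};

static void lock_item(struct sem_item *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;   /* spin */
}

static void unlock_item(struct sem_item *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

static void sem_set_init(struct sem_set *sma)
{
    for (size_t i = 0; i < NSEMS; i++) {
        atomic_flag_clear(&sma->sem_base[i].lock);
        sma->sem_base[i].semval = 0;
        sma->sem_base[i].sem_otime = 0;
    }
}

/* Fast path: a single-semaphore semop() touches only its own lock. */
static void semop_one(struct sem_set *sma, size_t num, int op, long now)
{
    struct sem_item *s = &sma->sem_base[num];

    lock_item(s);
    s->semval += op;
    s->sem_otime = now;
    unlock_item(s);
}

/* Slow path: lock everything in index order, compute otime on demand. */
static long semctl_otime(struct sem_set *sma)
{
    long newest = 0;

    for (size_t i = 0; i < NSEMS; i++)
        lock_item(&sma->sem_base[i]);
    for (size_t i = 0; i < NSEMS; i++)
        if (sma->sem_base[i].sem_otime > newest)
            newest = sma->sem_base[i].sem_otime;
    for (size_t i = 0; i < NSEMS; i++)
        unlock_item(&sma->sem_base[i]);
    return newest;
}
```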
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The wake-up part of semtimedop() consists of two steps:
- the right tasks must be identified.
- they must be woken up.
Right now, both steps run while the array spinlock is held. This patch
reorders the code and moves the actual wake_up_process() calls after the
point where the spinlock is dropped.
The patch also sets sem->sem_otime in a single place: it does not make
sense to set the last modification time multiple times.
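The two-step split can be illustrated with a single-threaded model. A boolean stands in for the array spinlock. Step 1 unlinks the completable tasks into a local list under the "lock", and step 2 wakes them after the lock is dropped. The struct names, `wake_task()`, and the `woken_under_lock` counter are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 16

struct task {
    bool can_complete;      /* outcome of step 1's check */
    bool woken;
};

struct sem_array_model {
    bool locked;                        /* stands in for the spinlock */
    struct task *pending[MAX_TASKS];    /* sleeping tasks, FIFO order */
    size_t npending;
    long sem_otime;                     /* last modification time */
    size_t woken_under_lock;            /* stays 0 if the split is right */
};

static void wake_task(struct sem_array_model *sma, struct task *t)
{
    if (sma->locked)
        sma->woken_under_lock++;
    t->woken = true;
}

static void update_and_wake(struct sem_array_model *sma, long now)
{
    struct task *to_wake[MAX_TASKS];
    size_t nwake = 0, keep = 0;

    sma->locked = true;                         /* spin_lock() */
    /* Step 1: identify tasks and unlink them from the pending list. */
    for (size_t i = 0; i < sma->npending; i++) {
        struct task *t = sma->pending[i];

        if (t->can_complete)
            to_wake[nwake++] = t;
        else
            sma->pending[keep++] = t;
    }
    sma->npending = keep;
    if (nwake > 0)
        sma->sem_otime = now;                   /* set once, not per task */
    sma->locked = false;                        /* spin_unlock() */

    /* Step 2: wake the collected tasks with the lock already dropped. */
    for (size_t i = 0; i < nwake; i++)
        wake_task(sma, to_wake[i]);
}
```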
[akpm@linux-foundation.org: repair kerneldoc]
[akpm@linux-foundation.org: fix uninitialised retval]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following series of patches tries to fix the spinlock contention
reported by Chris Mason; his benchmark exposes problems in the current
code:
- In the worst case, the algorithm used by update_queue() is O(N^2).
  Bulk wake-up calls can hit this worst case. The patch series fixes
  that.
  Note that the benchmark app doesn't expose the problem; it should be
  fixed anyway, because real-world apps might do the wake-ups in an
  order other than perfect FIFO.
- The part of the code that runs within the semaphore array spinlock is
significantly larger than necessary.
The patch series fixes that. This change is responsible for the main
improvement.
- The cacheline with the spinlock is also used for a variable that is
read in the hot path (sem_base) and for a variable that is unnecessarily
written to multiple times (sem_otime). The last step of the series
cacheline-aligns the spinlock.
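The last step can be sketched with C11 `alignas`. Putting the lock on its own cache line means that writes to the lock no longer invalidate the line that hot-path readers of `sem_base` are using. The struct below is illustrative, not the kernel's `struct sem_array`, and the 64-byte line size is an assumption.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE_SIZE 64   /* assumed cache-line size */

/* Illustrative layout only. */
struct sem_array_sketch {
    void *sem_base;         /* read on every operation (hot path) */
    long sem_otime;         /* written on modifications */
    /* Start the lock on a fresh cache line, away from sem_base. */
    alignas(CACHELINE_SIZE) int lock;   /* stand-in for spinlock_t */
};
```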
This patch:
The SysV semaphore code allows multiple operations on the semaphores of
an array to be performed as a single atomic operation. After a
modification, update_queue() checks which of the waiting tasks can
complete.
The algorithm that is used to identify the tasks is O(N^2) in the worst
case. For some cases, it is simple to avoid the O(N^2).
The patch adds detection logic for some cases, especially for an array
where all sleeping tasks are single-sembuf operations and a multi-sembuf
operation is used to wake up multiple tasks.
A big database application uses that approach.
This version of the patch also fixes wakeup after semctl(,,SETALL,),
which the initial version broke.
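The gain can be shown with a simplified userspace model, not the kernel's update_queue(). The naive rule rescans the queue from the head after every completed altering operation. In the detected special case, every waiter is a single decrement. A completed decrement can only lower a value, so no earlier waiter that already failed can now succeed, and one pass is enough. Wait-for-zero operations, which a decrement to zero could unblock, are deliberately excluded from this model. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter: sleeps on one decrement of one semaphore. */
struct waiter {
    int sem_num;
    int sem_op;     /* always < 0 in this model */
    bool done;
};

/* Apply the operation unless it would drive the value negative. */
static bool try_op(int *semval, struct waiter *w)
{
    if (semval[w->sem_num] + w->sem_op < 0)
        return false;
    semval[w->sem_num] += w->sem_op;
    w->done = true;
    return true;
}

/* Naive: restart from the head after each success. Returns attempts. */
static size_t update_queue_restart(int *semval, struct waiter *q, size_t n)
{
    size_t attempts = 0;
again:
    for (size_t i = 0; i < n; i++) {
        if (q[i].done)
            continue;
        attempts++;
        if (try_op(semval, &q[i]))
            goto again;
    }
    return attempts;
}

/* Special case (all single decrements): a single pass suffices. */
static size_t update_queue_single_decrements(int *semval, struct waiter *q,
                                             size_t n)
{
    size_t attempts = 0;

    for (size_t i = 0; i < n; i++) {
        if (q[i].done)
            continue;
        attempts++;
        try_op(semval, &q[i]);
    }
    return attempts;
}
```

With K blocked waiters queued ahead of M wakeable ones, the restarting scan makes M*(K+1)+K attempts, while the single pass makes K+M. Both versions wake the same set of tasks.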
[akpm@linux-foundation.org: make do_smart_update() static]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently idr_remove_all will fail with a use-after-free error if
idr::layers is bigger than 2, which on 32-bit systems corresponds to
more than 1024 items. This is due to stepping back too many levels during
backtracking. For simplicity, let's assume that IDR_BITS=1, so we have 2
nodes at each level below the root node and each leaf node stores two IDs.
(In reality, for 32-bit systems IDR_BITS=5, with 32 nodes at each sub-root
level and 32 IDs in each leaf node.) The current order in which the nodes
are freed is shown in parentheses:
layer
  1              a(7)
  2        b(3)        c(5)
  3     d(1)  e(2)  f(4)  g(6)
Up to step 4 things go fine, but at step 5 node c is freed, whereas node
g should be freed first. Since node c contains the pointer to node g,
we'll have a use-after-free error at step 6.
How many levels we step back after visiting the leaf nodes is currently
determined by the msb of the ID we are currently visiting:
Step
1.   Node d with IDs 0,1 is freed, current ID is advanced to 2.
     The msb of the current ID is bit 1. This means we need to step
     back 1 level to node b and take the next sibling, node e.
2-3. Node e with IDs 2,3 is freed, current ID is 4, msb is bit 2.
     This means we need to step back 2 levels to node a, freeing
     node b on the way.
4-5. Node f with IDs 4,5 is freed, current ID is 6, msb is still
     bit 2. This means we again need to step back 2 levels to node
     a and free c on the way.
6.   We should visit node g, but its pointer is not available as
     node c was freed.
The fix changes how we determine the number of levels to step back.
Instead of deducing this merely from the msb of the current ID, the fix
checks whether advancing the ID causes an overflow into a bit position
corresponding to a given layer. In the above example, an overflow from
bit 0 to bit 1 means stepping back 1 level, an overflow from bit 1 to
bit 2 means stepping back 2 levels, and so on.
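Both rules can be compared in a small model. Each leaf covers `1 << idr_bits` IDs. The corrected rule climbs one level for each layer boundary that the carry from advancing the cursor reaches, measured with fls() of `old ^ new`. The old rule looks only at the msb of the new ID. `fls_u32()`, `levels_to_step_back()` and `levels_old_rule()` are hypothetical helpers, not the kernel's idr code.

```c
#include <stdint.h>

/* 1-based index of the most significant set bit; 0 for x == 0. */
static int fls_u32(uint32_t x)
{
    int r = 0;

    while (x) {
        r++;
        x >>= 1;
    }
    return r;
}

/* Fixed rule: levels to climb after freeing the leaf starting at old_id. */
static int levels_to_step_back(uint32_t old_id, int idr_bits)
{
    uint32_t new_id = old_id + (1u << idr_bits);
    int n = idr_bits, levels = 0;

    /* One level per layer boundary that the addition carried into. */
    while (n < fls_u32(old_id ^ new_id)) {
        levels++;
        n += idr_bits;
    }
    return levels;
}

/* Old, buggy rule: look only at the msb of the new ID. */
static int levels_old_rule(uint32_t old_id, int idr_bits)
{
    uint32_t new_id = old_id + (1u << idr_bits);
    int n = idr_bits, levels = 0;

    while (n < fls_u32(new_id)) {
        levels++;
        n += idr_bits;
    }
    return levels;
}
```

For the IDR_BITS=1 tree, the fixed rule gives 1, 2, 1 and 3 levels after leaves d, e, f and g. The last value means climbing past the root, so the walk is done. The old rule gives 2 after leaf f, which is the premature step back that frees c.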
The fix was tested with IDs up to 1 << 20, which corresponds to 4 layers
on 32 bit systems.
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: <stable@kernel.org> [2.6.34.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When CONFIG_HOTPLUG_CPU=n, get_online_cpus() does nothing, so we don't
need cpu_hotplug_begin() either.
This patch moves cpu_hotplug_begin()/cpu_hotplug_done() into the
CONFIG_HOTPLUG_CPU=y code block.
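The resulting layout follows the usual real-implementation-or-empty-stub pattern, sketched here as userspace C. The `hotplug_writers` counter is a hypothetical stand-in for the real writer-side locking, and `hotplug_compiled_in()` exists only to make the configuration observable.

```c
#ifdef CONFIG_HOTPLUG_CPU
static int hotplug_writers;     /* hypothetical stand-in for real locking */

static void cpu_hotplug_begin(void)
{
    /* In the kernel, the writer excludes get_online_cpus() readers here. */
    hotplug_writers++;
}

static void cpu_hotplug_done(void)
{
    hotplug_writers--;
}
#else
/* No hotplug: get_online_cpus() is a no-op, so the writer side is too. */
static inline void cpu_hotplug_begin(void) { }
static inline void cpu_hotplug_done(void) { }
#endif

static int hotplug_compiled_in(void)
{
#ifdef CONFIG_HOTPLUG_CPU
    return 1;
#else
    return 0;
#endif
}
```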
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I used this module to test the series of modifications to the CPU
notifier code.
Example1: inject CPU offline error (-1 == -EPERM)
# modprobe cpu-notifier-error-inject cpu_down_prepare_error=-1
# echo 0 > /sys/devices/system/cpu/cpu1/online
bash: echo: write error: Operation not permitted
Example2: inject CPU online error (-2 == -ENOENT)
# modprobe cpu-notifier-error-inject cpu_up_prepare_error=-2
# echo 1 > /sys/devices/system/cpu/cpu1/online
bash: echo: write error: No such file or directory
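The injection idea can be modeled in userspace. A notifier callback returns the configured error for the matching CPU_*_PREPARE action, and a non-zero result aborts the transition, which is why the write to the `online` file fails. The two variables mirror the module parameters used in the examples. The enum, callback and `cpu_down_model()` are hypothetical. The real kernel callback would encode the errno in a notifier return value rather than return it directly.

```c
#include <errno.h>

enum cpu_action { CPU_UP_PREPARE, CPU_DOWN_PREPARE };

/* Mirror the cpu_up_prepare_error / cpu_down_prepare_error parameters. */
static int cpu_up_prepare_error;
static int cpu_down_prepare_error;

/* Notifier callback: report the injected error for this action. */
static int err_inject_callback(enum cpu_action action)
{
    switch (action) {
    case CPU_UP_PREPARE:
        return cpu_up_prepare_error;
    case CPU_DOWN_PREPARE:
        return cpu_down_prepare_error;
    }
    return 0;
}

/* Offline path model: a failing PREPARE notifier aborts the operation. */
static int cpu_down_model(void)
{
    int err = err_inject_callback(CPU_DOWN_PREPARE);

    if (err)
        return err;     /* surfaces as the errno of the failed write */
    return 0;
}
```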
[akpm@linux-foundation.org: fix Kconfig help text]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>