Commit 7ea0ed2b5b ("ipmi: Make the message handler easier to use for
SMI interfaces") changed handle_new_recv_msgs() to call handle_one_recv_msg()
for a smi_msg while the smi_msg is still connected to waiting_rcv_msgs list.
That could lead to following list corruption problems:
1) low-level function treats smi_msg as not connected to list
handle_one_recv_msg() could end up calling smi_send(), which
assumes the msg is not connected to list.
For example, the following sequence could corrupt list by
doing list_add_tail() for the entry still connected to other list.
handle_new_recv_msgs()
msg = list_entry(waiting_rcv_msgs)
handle_one_recv_msg(msg)
handle_ipmb_get_msg_cmd(msg)
smi_send(msg)
spin_lock(xmit_msgs_lock)
list_add_tail(msg)
spin_unlock(xmit_msgs_lock)
2) race between multiple handle_new_recv_msgs() instances
handle_new_recv_msgs() once releases waiting_rcv_msgs_lock before calling
handle_one_recv_msg() then retakes the lock and list_del() it.
If others call handle_new_recv_msgs() during the window shown below
list_del() will be done twice for the same smi_msg.
handle_new_recv_msgs()
spin_lock(waiting_rcv_msgs_lock)
msg = list_entry(waiting_rcv_msgs)
spin_unlock(waiting_rcv_msgs_lock)
|
| handle_one_recv_msg(msg)
|
spin_lock(waiting_rcv_msgs_lock)
list_del(msg)
spin_unlock(waiting_rcv_msgs_lock)
Fixes: 7ea0ed2b5b ("ipmi: Make the message handler easier to use for SMI interfaces")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
[Added a comment to describe why this works.]
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org # 3.19
Tested-by: Ye Feng <yefeng.yl@alibaba-inc.com>
Unlike everywhere else in the IPMI specification, the I2C address
specified in the SPMI table is not shifted to the left one bit with
the LSB zero. Instead it is not shifted with the MSB zero.
Reported-by: Sanjeev <singhsan@codeaurora.org>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Commit d61a3ead26 ("[PATCH] IPMI: reserve I/O ports separately")
changed the way I/O ports were reserved and includes this comment in
log:
Some BIOSes reserve disjoint I/O regions in their ACPI tables for the IPMI
controller. This causes problems when trying to register the entire I/O
region. Therefore we must register each I/O port separately.
There is a similar problem with memio regions on an arm64 platform
(AMD Seattle). Where I see:
ipmi message handler version 39.2
ipmi_si AMDI0300:00: probing via device tree
ipmi_si AMDI0300:00: ipmi_si: probing via ACPI
ipmi_si AMDI0300:00: [mem 0xe0010000] regsize 1 spacing 4 irq 23
ipmi_si: Adding ACPI-specified kcs state machine
IPMI System Interface driver.
ipmi_si: Trying ACPI-specified kcs state machine at mem \
address 0xe0010000, slave address 0x0, irq 23
ipmi_si: Could not set up I/O space
The problem is that the ACPI core registers disjoint regions for the
platform device:
e0010000-e0010000 : AMDI0300:00
e0010004-e0010004 : AMDI0300:00
and the ipmi_si driver tries to register one region e0010000-e0010004.
Based on a patch from Mark Salter <msalter@redhat.com>, who also wrote
all the above text.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Tested-by: Mark Salter <msalter@redhat.com>
Commit 1717f2096b ("panic, x86: Fix re-entrance problem due to panic
on NMI") introduced nmi_panic() which prevents concurrent and recursive
execution of panic(). It also saves registers for the crash dump on x86
by later commit 58c5661f21 ("panic, x86: Allow CPUs to save registers
even if looping in NMI context").
ipmi_watchdog driver can call panic() from NMI handler, so replace it
with nmi_panic().
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Extend the tryacpi module parameter to turn off acpi_ipmi_probe such
that hard-coded options (type, ports, address, etc.) have complete
control over the smi_info data structures setup by the driver.
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Under some circumstances, the IPMI state machine could return
a call without delay option but the driver would still do a long
delay because the result wasn't checked. Instead of calling
the state machine after transaction done, just go back to the
top of the processing to start over.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Enclosing '#include <linux/acpi.h>' within '#ifdef CONFIG_ACPI' is
unnecessary, since it has its own conditional compile for CONFIG_ACPI.
Commit 0fbcf4af7c ("ipmi: Convert the IPMI SI ACPI handling to a
platform device") exposed this as a problem for platforms that do not
support ACPI when it introduced a call to ACPI_PTR() macro outside of
the CONFIG_ACPI conditional compile. This would have been perfectly
acceptable if acpi.h were not conditionally excluded for the non-acpi
platform, because the conditional compile within acpi.h defines
ACPI_PTR() to return NULL when compiled for non acpi platforms.
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Fixed commit reference in header to conform to standard.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
We call cleanup_one_si from ipmi_pci_remove, which calls ->addr_source_cleanup,
which gets set to point to ipmi_pci_cleanup, which does a pci_disable_device.
On return from this, we do a second pci_disable_device, which
results in the trace below.
ipmi_si 0000:00:16.0: disabling already-disabled device
Call Trace:
[<ffffffff818ce54c>] dump_stack+0x45/0x57
[<ffffffff810525f7>] warn_slowpath_common+0x97/0xe0
[<ffffffff810526f6>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81497ca1>] pci_disable_device+0xb1/0xc0
[<ffffffffa00851a5>] ipmi_pci_remove+0x25/0x30 [ipmi_si]
[<ffffffff8149a696>] pci_device_remove+0x46/0xc0
[<ffffffff8156801f>] __device_release_driver+0x7f/0xf0
[<ffffffff81568978>] driver_detach+0xb8/0xc0
[<ffffffff81567e50>] bus_remove_driver+0x50/0xa0
[<ffffffff8156914e>] driver_unregister+0x2e/0x60
[<ffffffff8149a3e5>] pci_unregister_driver+0x25/0x90
[<ffffffffa0085804>] cleanup_ipmi_si+0xd4/0xf0 [ipmi_si]
[<ffffffff810c727a>] SyS_delete_module+0x12a/0x200
[<ffffffff818d4d72>] system_call_fastpath+0x12/0x17
Signed-off-by: Dave Jones <dsj@fb.com>
Lots of char arrays could be set as const since they contain only literal
char arrays.
We could in the same time make const some struct members who are pointer
to those const char arrays.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
We encountered a panic on boot in ipmi_si on a dell per320 due to an
uninitialized timer as follows.
static int smi_start_processing(void *send_info,
ipmi_smi_t intf)
{
/* Try to claim any interrupts. */
if (new_smi->irq_setup)
new_smi->irq_setup(new_smi);
--> IRQ arrives here and irq handler tries to modify uninitialized timer
which triggers BUG_ON(!timer->function) in __mod_timer().
Call Trace:
<IRQ>
[<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si]
[<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si]
[<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si]
[<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350
[<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si]
[<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170
[<ffffffff810f245e>] handle_edge_irq+0xde/0x180
[<ffffffff8100fc59>] handle_irq+0x49/0xa0
[<ffffffff8154643c>] do_IRQ+0x6c/0xf0
[<ffffffff8100ba53>] ret_from_intr+0x0/0x11
/* Set up the timer that drives the interface. */
setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi);
The following patch fixes the problem.
To: Openipmi-developer@lists.sourceforge.net
To: Corey Minyard <minyard@acm.org>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org # Applies cleanly to 3.10-, needs small rework before
In order to allow panic actions to be processed, the ipmi watchdog
driver sets a new timeout value on panic. The 255s timeout
was designed to allow kdump and others actions on panic, as in
http://lkml.iu.edu/hypermail/linux/kernel/0711.3/0258.html
This is counter-intuitive for a end-user who sets watchdog timeout
value to something like 30s and who expects BMC to reset the system
within 30s of a panic.
This commit allows user to configure the timeout on panic.
Signed-off-by: Jean-Yves Faye <jean-yves.faye@c-s.fr>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
The policy for drivers is to have MODULE_DEVICE_TABLE() just after the
struct used in it. For clarity.
Suggested-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
The IPMI driver would let the final timeout just happen, but it could
easily just stop the timer. If the timer stop fails that's ok, that
should be rare.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
The timer and thread were not being started for internal messages,
so in interrupt mode if something hung the timer would never go
off and clean things up. Factor out the internal message sending
and start the timer for those messages, too.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Tested-by: Gouji, Masayuki <gouji.masayuki@jp.fujitsu.com>
Cc: stable@vger.kernel.org
This patch replaces timeval with timespec64 as 32 bit 'struct timeval'
will not give current time beyond 2038.
The patch changes the code to use ktime_get_real_ts64() which returns
a 'struct timespec64' instead of do_gettimeofday() which returns a
'struct timeval'
This patch also alters the format string in pr_info() for now.tv_sec
to incorporate 'long long' on 32 bit architectures.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
They are broken on some platforms, this gives people a chance to work
around it until the firmware is fixed.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Fix autoloading ipmi modules when using device tree.
Signed-off-by: Brijesh Singh <brijeshkumar.singh@amd.com>
Moved this change up into the CONFIG_OF section to account
for changes to the probing code.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
It appears that some BMCs support interrupts but don't support setting
the irq enable bits. The interrupts are just always on. Sigh.
Add code to compensate.
The new code was very similar to another functions, so this also
factors out the common code into other functions.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Tested-by: Henrik Korkuc <henrik@kirneh.eu>
Received handlers defined as ipmi_recv_hndl member of struct
ipmi_user_hndl can take a spinlock. This means that if the kernel
panics while holding the lock, a deadlock may happen on the lock
while flushing queued messages in the panic context.
Calling the receive handler doesn't make much meanings in the panic
context, simply skip it to avoid possible deadlocks.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
When processing queued messages in the panic context, IPMI driver
tries to do it without any locking to avoid deadlocks. However,
this means we can touch a corrupted list if the kernel panicked
while manipulating the list. Fortunately, current `add-tail and
del-from-head' style implementation won't touch the corrupted part,
but it is inherently risky.
To get rid of the risk, this patch re-initializes the message lists
on panic if the related spinlock has already been acquired. As the
result, we may lose queued messages, but it's not so painful.
Dropping messages on the received message list is also less
problematic because no one can respond the received messages.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Fixed a comment typo.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
When flushing queued messages in run-to-completion mode,
smi_event_handler() is recursively called.
flush_messages()
smi_event_handler()
handle_transaction_done()
deliver_recv_msg()
ipmi_smi_msg_received()
smi_recv_tasklet()
sender()
flush_messages()
smi_event_handler()
...
The depth of the recursive call depends on the number of queued
messages, so it can cause a stack overflow if many messages have
been queued.
To solve this problem, this patch removes flush_messages()
from sender()@ipmi_si_intf.c. Instead, add flush_messages() to
caller side of sender() if needed. Additionally, to implement this,
add new handler flush_messages to struct ipmi_smi_handlers.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Fixed up a comment and some spacing issues.
Signed-off-by: Corey Minyard <cminyard@mvista.com>