Pull powerpc updates from Ben Herrenschmidt:
"This is the powerpc new goodies for 3.17. The short story:
The biggest bit is Michael removing all of pre-POWER4 processor
support from the 64-bit kernel. POWER3 and rs64. This gets rid of a
ton of old cruft that has been bitrotting in a long while. It was
broken for quite a few versions already and nobody noticed. Nobody
uses those machines anymore. While at it, he cleaned up a bunch of
old dusty cabinets, getting rid of a skeletton or two.
Then, we have some base VFIO support for KVM, which allows assigning
of PCI devices to KVM guests, support for large 64-bit BARs on
"powernv" platforms, support for HMI (Hardware Management Interrupts)
on those same platforms, some sparse-vmemmap improvements (for memory
hotplug),
There is the usual batch of Freescale embedded updates (summary in the
merge commit) and fixes here or there, I think that's it for the
highlights"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (102 commits)
powerpc/eeh: Export eeh_iommu_group_to_pe()
powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API
powerpc: Reduce scariness of interrupt frames in stack traces
powerpc: start loop at section start of start in vmemmap_populated()
powerpc: implement vmemmap_free()
powerpc: implement vmemmap_remove_mapping() for BOOK3S
powerpc: implement vmemmap_list_free()
powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE
powerpc/book3s: Fix endianess issue for HMI handling on napping cpus.
powerpc/book3s: handle HMIs for cpus in nap mode.
powerpc/powernv: Invoke opal call to handle hmi.
powerpc/book3s: Add basic infrastructure to handle HMI in Linux.
powerpc/iommu: Fix comments with it_page_shift
powerpc/powernv: Handle compound PE in config accessors
powerpc/powernv: Handle compound PE for EEH
powerpc/powernv: Handle compound PE
powerpc/powernv: Split ioda_eeh_get_state()
powerpc/powernv: Allow to freeze PE
powerpc/powernv: Enable M64 aperatus for PHB3
powerpc/eeh: Aux PE data for error log
...
The function is used by VFIO driver, which might be built as a
dynamic module. So it should be exported.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull timer and time updates from Thomas Gleixner:
"A rather large update of timers, timekeeping & co
- Core timekeeping code is year-2038 safe now for 32bit machines.
Now we just need to fix all in kernel users and the gazillion of
user space interfaces which rely on timespec/timeval :)
- Better cache layout for the timekeeping internal data structures.
- Proper nanosecond based interfaces for in kernel users.
- Tree wide cleanup of code which wants nanoseconds but does hoops
and loops to convert back and forth from timespecs. Some of it
definitely belongs into the ugly code museum.
- Consolidation of the timekeeping interface zoo.
- A fast NMI safe accessor to clock monotonic for tracing. This is a
long standing request to support correlated user/kernel space
traces. With proper NTP frequency correction it's also suitable
for correlation of traces accross separate machines.
- Checkpoint/restart support for timerfd.
- A few NOHZ[_FULL] improvements in the [hr]timer code.
- Code move from kernel to kernel/time of all time* related code.
- New clocksource/event drivers from the ARM universe. I'm really
impressed that despite an architected timer in the newer chips SoC
manufacturers insist on inventing new and differently broken SoC
specific timers.
[ Ed. "Impressed"? I don't think that word means what you think it means ]
- Another round of code move from arch to drivers. Looks like most
of the legacy mess in ARM regarding timers is sorted out except for
a few obnoxious strongholds.
- The usual updates and fixlets all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
timekeeping: Fixup typo in update_vsyscall_old definition
clocksource: document some basic timekeeping concepts
timekeeping: Use cached ntp_tick_length when accumulating error
timekeeping: Rework frequency adjustments to work better w/ nohz
timekeeping: Minor fixup for timespec64->timespec assignment
ftrace: Provide trace clocks monotonic
timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
seqcount: Add raw_write_seqcount_latch()
seqcount: Provide raw_read_seqcount()
timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
timekeeping: Create struct tk_read_base and use it in struct timekeeper
timekeeping: Restructure the timekeeper some more
clocksource: Get rid of cycle_last
clocksource: Move cycle_last validation to core code
clocksource: Make delta calculation a function
wireless: ath9k: Get rid of timespec conversions
drm: vmwgfx: Use nsec based interfaces
drm: i915: Use nsec based interfaces
timekeeping: Provide ktime_get_raw()
hangcheck-timer: Use ktime_get_ns()
...
Some new functions are exposed for use by the IOMMU code but
won't build when CONFIG_IOMMU_API isn't set, so shield them
appropriately.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some people see things like "Exception: 501" in stack traces in dmesg
and assume that means that something has gone badly wrong, when in
fact "Exception: 501" just means a device interrupt was taken.
This changes "Exception" to "interrupt" to make it clearer that we
are just recording the fact of a change in control flow rather than
some error condition.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
(NOTE: This patch depends on upstream HMI handling patchset at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-July/119731.html)
The current HMI handling on napping cpus does not take care of endianess
issue. On LE host kernel when we wake up from nap due to HMI interrupt we
would checkstop while jumping into opal call. There is a similar issue in
case of fast sleep wakeup where the code invokes opal_resync_tb opal call
without handling LE issue. This patch fixes that as well.
With this patch applied, HMIs handling on LE host kernel works fine.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
HMIs are thread specific and can come while thread is in sleep/nap mode.
Hence with SMT=off mode we can receive HMIs on sleeping threads. For
interrupt received in nap mode, cpu wakes up at system reset vector, clears
the interrupt and go back to nap mode again. But HMIs are sticky and they
keep happening until we clear reason bits from HMER. Hence add a special
check for HMI in reset vector (through power7_wakeup_* functions) and
invoke opal call to handle HMI.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Handle Hypervisor Maintenance Interrupt (HMI) in Linux. This patch implements
basic infrastructure to handle HMI in Linux host. The design is to invoke
opal handle hmi in real mode for recovery and set irq_pending when we hit HMI.
During check_irq_replay pull opal hmi event and print hmi info on console.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is a couple of commented debug prints which still use
IOMMU_PAGE_SHIFT() which is not defined for POWERPC anymore, replace
them with it_page_shift.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch allows PE (struct eeh_pe) instance to have auxillary data,
whose size is configurable on basis of platform. For PowerNV, the
auxillary data will be used to cache PHB diag-data for that PE
(frozen PE or fenced PHB). In turn, we can retrieve the diag-data
at any later points.
It's useful for the case of VFIO PCI devices where the error log
should be cached, and then be retrieved by the guest at later point.
Also, it can avoid PHB diag-data overwritting if another frozen PE
reported and the previous diag-data isn't fetched by guest.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
pr_warn() is equal to pr_warning(), but the former is a bit more
formal according to commit fc62f2f ("kernel.h: add pr_warn for
symmetry to dev_warn, netdev_warn").
The patch replaces pr_warning() with pr_warn().
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch prints 4 PCIE or AER config registers each line, which
is part of the EEH log so that it looks a bit more compact.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
According to the experiment I did, PCI config access is blocked
on P7IOC frozen PE by hardware, but PHB3 doesn't do that. That
means we always get 0xFF's while dumping PCI config space of the
frozen PE on P7IOC. We don't have the problem on PHB3. So we have
to enable I/O prioir to collecting error log. Otherwise, meaningless
0xFF's are always returned.
The patch fixes it by EEH flag (EEH_ENABLE_IO_FOR_LOG), which is
selectively set to indicate the case for: P7IOC on PowerNV platform,
pSeries platform.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are multiple global EEH flags. Almost each flag has its own
accessor, which doesn't make sense. The patch refactors EEH flag
accessors so that they look unified:
eeh_add_flag(): Add EEH flag
eeh_clear_flag(): Clear EEH flag
eeh_has_flag(): Check if one specific flag has been set
eeh_enabled(): Check if EEH functionality has been enabled
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Function eeh_iommu_group_to_pe() iterates each PCI device to check
the binding IOMMU group with get_iommu_table_base(), which possibly
fetches pdev->dev.archdata.dma_data.dma_offset. It's (0x1 << 59)
for "bypass" cases.
The patch fixes the issue by iterating devices hooked to the IOMMU
group and fetch IOMMU table there.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
pci_get_slot() is called with hold of PCI bus semaphore and it's not
safe to be called in interrupt context. However, we possibly checks
EEH error and calls the function in interrupt context. To avoid using
pci_get_slot(), we turn into device tree for fetching location code.
Otherwise, we might run into WARN_ON() as following messages indicate:
WARNING: at drivers/pci/search.c:223
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72
task: c000000001367af0 ti: c000000001444000 task.ti: c000000001444000
NIP: c000000000497b70 LR: c000000000037530 CTR: 000000003003d114
REGS: c000000001446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+)
MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 48002422 XER: 20000000
CFAR: c00000000003752c SOFTE: 0
:
NIP [c000000000497b70] .pci_get_slot+0x40/0x110
LR [c000000000037530] .eeh_pe_loc_get+0x150/0x190
Call Trace:
.of_get_property+0x30/0x60 (unreliable)
.eeh_pe_loc_get+0x150/0x190
.eeh_dev_check_failure+0x1b4/0x550
.eeh_check_failure+0x90/0xf0
.lpfc_sli_check_eratt+0x504/0x7c0 [lpfc]
.lpfc_poll_eratt+0x64/0x100 [lpfc]
.call_timer_fn+0x64/0x190
.run_timer_softirq+0x2cc/0x3e0
Cc: stable@vger.kernel.org
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The sysfs entries are lost because of commit 2213fb1 ("powerpc/eeh:
Skip eeh sysfs when eeh is disabled"). That commit added condition
to create sysfs entries with EEH_ENABLED, which isn't populated
when trying to create sysfs entries on PowerNV platform during system
boot time. The patch fixes the issue by:
* Reoder EEH initialization functions so that they're same on
PowerNV/pSeries.
* Cache PE's primary bus by PowerNV platform instead of EEH core
to avoid kernel crash caused by the function reorder. Another
benefit with this is to avoid one eeh_probe_mode_dev() in EEH
core.
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch exports functions to be used by new VFIO ioctl command,
which will be introduced in subsequent patch, to support EEH
functinality for VFIO PCI devices.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We must not handle EEH error on devices which are passed to somebody
else. Instead, we expect that the frozen device owner detects an EEH
error and recovers from it.
This avoids EEH error handling on passed through devices so the device
owner gets a chance to handle them.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Scott writes:
Highlights include e6500 hardware threading support, an e6500 TLB erratum
workaround, corenet error reporting, support for a new board, and some
minor fixes.
Pull tracing updates from Steven Rostedt:
"This pull request has a lot of work done. The main thing is the
changes to the ftrace function callback infrastructure. It's
introducing a way to allow different functions to call directly
different trampolines instead of all calling the same "mcount" one.
The only user of this for now is the function graph tracer, which
always had a different trampoline, but the function tracer trampoline
was called and did basically nothing, and then the function graph
tracer trampoline was called. The difference now, is that the
function graph tracer trampoline can be called directly if a function
is only being traced by the function graph trampoline. If function
tracing is also happening on the same function, the old way is still
done.
The accounting for this takes up more memory when function graph
tracing is activated, as it needs to keep track of which functions it
uses. I have a new way that wont take as much memory, but it's not
ready yet for this merge window, and will have to wait for the next
one.
Another big change was the removal of the ftrace_start/stop() calls
that were used by the suspend/resume code that stopped function
tracing when entering into suspend and resume paths. The stop of
ftrace was done because there was some function that would crash the
system if one called smp_processor_id()! The stop/start was a big
hammer to solve the issue at the time, which was when ftrace was first
introduced into Linux. Now ftrace has better infrastructure to debug
such issues, and I found the problem function and labeled it with
"notrace" and function tracing can now safely be activated all the way
down into the guts of suspend and resume
Other changes include clean ups of uprobe code, clean up of the
trace_seq() code, and other various small fixes and clean ups to
ftrace and tracing"
* tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits)
ftrace: Add warning if tramp hash does not match nr_trampolines
ftrace: Fix trampoline hash update check on rec->flags
ring-buffer: Use rb_page_size() instead of open coded head_page size
ftrace: Rename ftrace_ops field from trampolines to nr_trampolines
tracing: Convert local function_graph functions to static
ftrace: Do not copy old hash when resetting
tracing: let user specify tracing_thresh after selecting function_graph
ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on()
tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST
s390/ftrace: remove check of obsolete variable function_trace_stop
arm64, ftrace: Remove check of obsolete variable function_trace_stop
Blackfin: ftrace: Remove check of obsolete variable function_trace_stop
metag: ftrace: Remove check of obsolete variable function_trace_stop
microblaze: ftrace: Remove check of obsolete variable function_trace_stop
MIPS: ftrace: Remove check of obsolete variable function_trace_stop
parisc: ftrace: Remove check of obsolete variable function_trace_stop
sh: ftrace: Remove check of obsolete variable function_trace_stop
sparc64,ftrace: Remove check of obsolete variable function_trace_stop
tile: ftrace: Remove check of obsolete variable function_trace_stop
ftrace: x86: Remove check of obsolete variable function_trace_stop
...
The general idea is that each core will release all of its
threads into the secondary thread startup code, which will
eventually wait in the secondary core holding area, for the
appropriate bit in the PACA to be set. The kick_cpu function
pointer will set that bit in the PACA, and thus "release"
the core/thread to boot. We also need to do a few things that
U-Boot normally does for CPUs (like enable branch prediction).
Signed-off-by: Andy Fleming <afleming@freescale.com>
[scottwood@freescale.com: various changes, including only enabling
threads if Linux wants to kick them]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Pull powerpc fixes from Ben Herrenschmidt:
"Here are 3 more small powerpc fixes that should still go into .16.
One is a recent regression (MMCR2 business), the other is a trivial
endian fix without which FW updates won't work on LE in IBM machines,
and the 3rd one turns a BUG_ON into a WARN_ON which is definitely a
LOT more friendly especially when the whole thing is about retrieving
error logs ..."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix endianness of flash_block_list in rtas_flash
powerpc/powernv: Change BUG_ON to WARN_ON in elog code
powerpc/perf: Fix MMCR2 handling for EBB
I spent ten minutes scratching my head, trying to work out where we
enabled relocation on interrupts for guest kernels. Expand the doco to
make it clear.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>