Commit Graph

4670 Commits

Author SHA1 Message Date
Ricky Wu
31b081066e misc: rtsx: init value of aspm_enabled
Make sure the ASPM state stays in sync with pcr->aspm_enabled by
initializing the value of pcr->aspm_enabled.

Cc: stable@vger.kernel.org
Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
Link: https://lore.kernel.org/r/20210122081906.19100-1-ricky_wu@realtek.com
Fixes: d928061c31 ("misc: rtsx: modify en/disable aspm function")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-22 11:04:53 +01:00
Oded Gabbay
2dc4a6d791 habanalabs: disable FW events on device removal
When device is removed, we need to make sure the F/W won't send us
any more events because during the remove process we disable the
interrupts.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-21 20:30:22 +02:00
Oded Gabbay
f8abaf379b habanalabs: fix backward compatibility of idle check
Need to take the lower 32 bits of the driver's 64-bit idle mask and put
it in the legacy 32-bit variable that the userspace reads to know the
idle mask.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-21 20:30:22 +02:00
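
The truncation described in this commit can be sketched in plain C. This is only an illustration under stated assumptions: the struct and field names below are hypothetical stand-ins, not the driver's actual userspace interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the information block that userspace reads. */
struct demo_idle_info {
    uint32_t busy_mask_legacy; /* legacy 32-bit field kept for old userspace */
    uint64_t busy_mask_full;   /* full 64-bit idle mask */
};

/* Store the full mask and mirror its lower 32 bits into the legacy field. */
static void demo_fill_idle_info(struct demo_idle_info *info, uint64_t mask)
{
    info->busy_mask_full = mask;
    info->busy_mask_legacy = (uint32_t)(mask & 0xFFFFFFFFu);
}
```

Old userspace that only knows the 32-bit field keeps seeing the same bits it saw before the mask was widened.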
Ofir Bitton
9354f1b421 habanalabs: zero pci counters packet before submit to FW
Driver does not zero some pci counters packets before sending
to FW. This causes an out of sync PI/CI between driver and FW.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-21 20:30:22 +02:00
Oded Gabbay
9488307a55 habanalabs: prevent soft lockup during unmap
When using a deep learning framework such as TensorFlow or PyTorch, there
are tens of thousands of host memory mappings. When the user frees
all those mappings at the same time, the process of unmapping and
unpinning them can take a long time, which may cause a soft lockup
bug.

To prevent this, we need to free the core to do other things during
the unmapping process. For now, we chose to do it every 32K unmappings
(each unmap is a single 4K page).

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-12 15:00:10 +02:00
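
The "yield every 32K unmappings" idea can be sketched as a userspace loop. This is a simplified model, not the driver code: the callbacks are hypothetical, and the commit message does not say which kernel primitive is used to give up the core.

```c
#include <assert.h>
#include <stddef.h>

/* "Every 32K unmappings", per the commit message. */
#define DEMO_UNMAP_YIELD_INTERVAL (32u * 1024u)

typedef void (*demo_unmap_fn)(size_t index);
typedef void (*demo_yield_fn)(void);

/*
 * Unmap nr_mappings entries, reaching a yield point after each full
 * batch of DEMO_UNMAP_YIELD_INTERVAL entries. Either callback may be
 * NULL. Returns the number of yield points reached.
 */
static size_t demo_unmap_all(size_t nr_mappings, demo_unmap_fn unmap_one,
                             demo_yield_fn yield)
{
    size_t yields = 0;

    for (size_t i = 0; i < nr_mappings; i++) {
        if (unmap_one)
            unmap_one(i);

        if ((i + 1) % DEMO_UNMAP_YIELD_INTERVAL == 0) {
            if (yield)
                yield(); /* let other work run on this core */
            yields++;
        }
    }
    return yields;
}
```

With 4K pages, one batch of 32K unmappings covers 128 MiB of host memory between yield points.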
Oded Gabbay
aa6df6533b habanalabs: fix reset process in case of failures
There are some points in the reset process where if the code fails
for some reason, and the system admin tries to initiate the reset
process again we will get a kernel panic.

This is because there aren't any protections in different fini
functions that are called during the reset process.

The protections that are added in this patch make sure that if the fini
functions are called multiple times, without calling init functions
between them, there won't be double release of already released
resources.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-12 14:59:52 +02:00
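
One common way to make a fini function safe to call repeatedly is a guard flag, sketched below in userspace C. The commit message does not describe the exact protections added, so the names and the flag-based approach here are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical module state; release_count exists only for demonstration. */
struct demo_module {
    bool initialized;
    void *resource;
    int release_count;
};

/* Returns 0 on success, -1 if the allocation fails. */
static int demo_init(struct demo_module *m)
{
    if (m->initialized)
        return 0;

    m->resource = malloc(64);
    if (!m->resource)
        return -1;

    m->initialized = true;
    return 0;
}

/* Safe to call any number of times without an init call in between. */
static void demo_fini(struct demo_module *m)
{
    if (!m->initialized)
        return; /* already released, nothing to do */

    free(m->resource);
    m->resource = NULL;
    m->initialized = false;
    m->release_count++;
}
```

A second call to demo_fini() returns early, so the resource is never released twice.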
Oded Gabbay
a9d4ef6434 habanalabs: fix dma_addr passed to dma_mmap_coherent
When doing dma_alloc_coherent in the driver, we add a certain hard-coded
offset to the DMA address before returning to the caller. This
offset is needed when our device uses this DMA address to perform
outbound transactions to the host.

However, if we want to map the DMA'able memory to the user via
dma_mmap_coherent(), we need to pass the original dma address, without
this offset. Otherwise, we will get an erroneous mapping.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-12 14:59:36 +02:00
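
The address bookkeeping can be illustrated with two small helpers. The offset value below is made up for the example, and the helper names are hypothetical; only the add-then-subtract relationship comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hard-coded offset; the real value is device specific. */
#define DEMO_DEVICE_DMA_OFFSET 0x8000000000ULL

/* Address handed to the device for outbound transactions to the host. */
static uint64_t demo_to_device_addr(uint64_t raw_dma_addr)
{
    return raw_dma_addr + DEMO_DEVICE_DMA_OFFSET;
}

/* Original address to pass when mapping the memory to userspace. */
static uint64_t demo_to_mmap_addr(uint64_t device_addr)
{
    return device_addr - DEMO_DEVICE_DMA_OFFSET;
}
```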
Andy Shevchenko
afded6d83a misc: pvpanic: Check devm_ioport_map() for NULL
Inconveniently, devm_ioport_map() and devm_ioremap_resource()
report errors differently: the former simply returns a NULL pointer,
while the latter returns an error pointer.

Due to this, we have to check each of them separately.

Fixes: f104060813 ("misc: pvpanic: Combine ACPI and platform drivers")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20201228184313.57610-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-07 20:28:01 +01:00
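
The difference between the two conventions can be shown with a userspace re-implementation of the kernel's error-pointer helpers. The ERR_PTR, PTR_ERR and IS_ERR names mirror the kernel's, but the definitions below are simplified stand-ins, and the two check functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Check for an API that returns NULL on failure. */
static int demo_check_null_style(const void *p)
{
    return p ? 0 : -ENOMEM;
}

/* Check for an API that returns an error pointer on failure. */
static int demo_check_errptr_style(const void *p)
{
    return IS_ERR(p) ? (int)PTR_ERR(p) : 0;
}
```

Note that IS_ERR(NULL) is false, so an IS_ERR() check alone would miss a failure reported as NULL. This is why each return value needs its own kind of check.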
Dinghao Liu
b000700d6d habanalabs: Fix memleak in hl_device_reset
When kzalloc() fails, we should execute hl_mmu_fini()
to release the MMU module. It's the same when
hl_ctx_init() fails.

Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-29 23:23:12 +02:00
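
The fix follows the usual goto-based error unwinding pattern, sketched here in userspace C. The functions are hypothetical stand-ins for hl_mmu_init()/hl_mmu_fini() and the allocation; the fail_alloc parameter exists only to simulate an allocation failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device state for the demonstration. */
struct demo_dev {
    int mmu_ready;
    void *ctx;
};

static int demo_mmu_init(struct demo_dev *d)
{
    d->mmu_ready = 1;
    return 0;
}

static void demo_mmu_fini(struct demo_dev *d)
{
    d->mmu_ready = 0;
}

/* If a later step fails, undo the MMU setup before returning. */
static int demo_reset_tail(struct demo_dev *d, int fail_alloc)
{
    int rc = demo_mmu_init(d);

    if (rc)
        return rc;

    d->ctx = fail_alloc ? NULL : calloc(1, 32);
    if (!d->ctx) {
        rc = -ENOMEM;
        goto err_mmu_fini;
    }
    return 0;

err_mmu_fini:
    demo_mmu_fini(d);
    return rc;
}
```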
Oded Gabbay
097c62b6f0 habanalabs: fix order of status check
When the device is in reset or needs to be reset, the disabled property
is don't-care.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:39 +02:00
Oded Gabbay
fcaebc7354 habanalabs: register to pci shutdown callback
We need to make sure our device is idle when rebooting a virtual
machine. This is done in the driver level.

The firmware will later handle FLR but we want to be extra safe and
stop the devices until the FLR is handled.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:39 +02:00
Alon Mizrahi
a3fd283063 habanalabs: add validation cs counter, fix misplaced counters
Up until now validation errors were counted in the parsing field
of the cs_counters struct, so we added a new counter and increased
it when needed.

In addition, there were some locations where only one of the counters
was updated (ctx or aggregate) so add the second one to be updated
as well.

Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:39 +02:00
Oded Gabbay
98e8781f00 habanalabs/gaudi: retry loading TPC f/w on -EINTR
If loading the firmware file for the TPC f/w was interrupted, try
to do it again, up to 5 times.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:39 +02:00
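
A bounded retry loop of this kind might look like the sketch below. It assumes "up to 5 times" means five attempts in total, which the commit message does not make explicit; the loader callback and the fake loader are hypothetical and exist only to exercise the loop.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MAX_LOAD_ATTEMPTS 5 /* assumed: total attempts, not extra retries */

typedef int (*demo_load_fn)(void *ctx);

/* Call load() again only while it reports -EINTR, up to the attempt limit. */
static int demo_load_with_retry(demo_load_fn load, void *ctx)
{
    int rc = -EINTR;

    for (int attempt = 0; attempt < DEMO_MAX_LOAD_ATTEMPTS; attempt++) {
        rc = load(ctx);
        if (rc != -EINTR)
            break; /* success, or an error that a retry will not fix */
    }
    return rc;
}

/* Fake loader that is interrupted a fixed number of times, then succeeds. */
struct demo_fake_loader {
    int failures_left;
    int calls;
};

static int demo_fake_load(void *ctx)
{
    struct demo_fake_loader *f = ctx;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return -EINTR;
    }
    return 0;
}
```

Any error other than -EINTR is returned immediately, so only interrupted loads are retried.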
Oded Gabbay
377182a3cc habanalabs: adjust pci controller init to new firmware
When the firmware security is enabled, the pcie_aux_dbi_reg_addr
register in the PCI controller is blocked. Therefore, ignore
the result of writing to this register and assume it worked. Also
remove the prints on errors in the internal ELBI write function.

If the security is enabled, the firmware is responsible for setting
this register correctly so we won't have any problem.

If the security is disabled, the write will work (unless something
is totally broken at the PCI level and then the whole sequence
will fail).

In addition, remove a write to register pcie_aux_dbi_reg_addr+4,
which was never actually needed.

Moreover, PCIE_DBI registers are blocked to access from host when
firmware security is enabled. Use a different register to flush the
writes.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:39 +02:00
Oded Gabbay
90ffe170a3 habanalabs: update comment in hl_boot_if.h
Hard-reset flag is updated in many stages of the boot sequence of the
firmware.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:38 +02:00
Oded Gabbay
13d0ee10b5 habanalabs/gaudi: enhance reset message
Print the initiator who performs the hard-reset for easier debugging.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:38 +02:00
Ofir Bitton
6bbb77b9e6 habanalabs: full FW hard reset support
Driver must fetch FW hard reset capability at every FW boot stage:
preboot, CPU boot, CPU application.
If hard reset is triggered, driver will take into consideration
only the last capability received.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:38 +02:00
Oded Gabbay
0024c09485 habanalabs/gaudi: disable CGM at HW initialization
In case the clock gating was enabled in preboot we need to disable it
at the H/W initialization stage before touching the MME/TPC registers.
Otherwise, the ASIC can get stuck. If the security is enabled in
the firmware level, the CGM is always disabled and the driver can't
enable it.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:38 +02:00
Tomer Tayar
7a585dfc32 habanalabs: Revise comment to align with mirror list name
hw_queues_mirror was renamed to cs_mirror, so revise accordingly a
comment that refers to this list.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:38 +02:00
Alon Mizrahi
72ab9ca52d habanalabs/gaudi: do not set EB in collective slave queues
We don't need to set EB on signal packets from collective slave
queues as it degrades performance. Because the slaves are the network
queues, the engine barrier doesn't actually guarantee that the
packet has been sent.

Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:38 +02:00
Ofir Bitton
9c9013cbd8 habanalabs: preboot hard reset support
FW hard reset capability indication is now moved to preboot stage.
Driver will check if HW is dirty only after it validated preboot
is up. If HW is dirty, driver will perform a hard reset according
to the FW capability.
In addition, FW defines a new message which the driver needs to send in
order to initiate a hard reset.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:38 +02:00
Alon Mizrahi
6585489e80 habanalabs: remove generic gaudi get_pll_freq function
As we only fetch the CPU_PLL frequency in gaudi, we don't need a
generic get_pll_frequency function which takes a pll index as input

Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:38 +02:00
Alon Mizrahi
4783489951 habanalabs: fetch PSOC PLL frequency from F/W in goya
When the F/W security is enabled, goya needs to fetch the PSOC pll
frequency through a dedicated interface

Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:37 +02:00
Tomer Tayar
105b5ca9b1 habanalabs: Fix a missing-braces warning
Fix a compilation "missing braces around initializer" warning.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 08:47:37 +02:00
Linus Torvalds
8a5be36b93 Merge tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:

 - Switch to the generic C VDSO, as well as some cleanups of our VDSO
   setup/handling code.

 - Support for KUAP (Kernel User Access Prevention) on systems using the
   hashed page table MMU, using memory protection keys.

 - Better handling of PowerVM SMT8 systems where all threads of a core
   do not share an L2, allowing the scheduler to make better scheduling
   decisions.

 - Further improvements to our machine check handling.

 - Show registers when unwinding interrupt frames during stack traces.

 - Improvements to our pseries (PowerVM) partition migration code.

 - Several series from Christophe refactoring and cleaning up various
   parts of the 32-bit code.

 - Other smaller features, fixes & cleanups.

Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh
Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling,
Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King,
Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar,
Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz,
Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour,
Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu
Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver
O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej
Siewior, Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe
Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu.

* tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits)
  powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
  powerpc: Add config fragment for disabling -Werror
  powerpc/configs: Add ppc64le_allnoconfig target
  powerpc/powernv: Rate limit opal-elog read failure message
  powerpc/pseries/memhotplug: Quieten some DLPAR operations
  powerpc/ps3: use dma_mapping_error()
  powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
  powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
  powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()
  KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp
  KVM: PPC: fix comparison to bool warning
  KVM: PPC: Book3S: Assign boolean values to a bool variable
  powerpc: Inline setup_kup()
  powerpc/64s: Mark the kuap/kuep functions non __init
  KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering
  powerpc/xive: Improve error reporting of OPAL calls
  powerpc/xive: Simplify xive_do_source_eoi()
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG
  ...
2020-12-17 13:34:25 -08:00