Commit Graph

810291 Commits

Author SHA1 Message Date
Thierry Reding 77a0b09dd9 drm/tegra: vic: Load firmware on demand
Loading the firmware requires an allocation of IOVA space to make sure
that the VIC's Falcon microcontroller can read the firmware if address
translation via the SMMU is enabled.

However, the allocation currently happens at a time where the geometry
of an IOMMU domain may not have been initialized yet. This happens for
example on Tegra186 and later where an ARM SMMU is used. Domains which
are created by the ARM SMMU driver postpone the geometry setup until a
device is attached to the domain. This is because IOMMU domains aren't
attached to a specific IOMMU instance at allocation time and hence the
input address space, which defines the geometry, is not known yet.

Work around this by postponing the firmware load until it is needed at
the time where a channel is opened to the VIC. At this time the shared
IOMMU domain's geometry has been properly initialized.

As a byproduct this allows the Tegra DRM to be created in the absence
of VIC firmware, since the VIC initialization no longer fails if the
firmware can't be found.

Based on an earlier patch by Dmitry Osipenko <digetx@gmail.com>.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
2019-02-07 18:28:59 +01:00
Thierry Reding 8e5d19c625 drm/tegra: Store parent pointer in Tegra DRM clients
Tegra DRM clients need access to their parent, so store a pointer to it
upon registration. It's technically possible to get at this by going via
the host1x client's parent and getting the driver data, but that's quite
complicated and not very transparent. It's much more straightforward and
natural to let the children know about their parent.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
2019-02-07 18:28:59 +01:00
Thierry Reding e1f338c0f8 gpu: host1x: Optimize CDMA push buffer memory usage
The host1x CDMA push buffer is terminated by a special opcode (RESTART)
that tells the CDMA to wrap around to the beginning of the push buffer.
To accomodate the RESTART opcode, an extra 4 bytes are allocated on top
of the 512 * 8 = 4096 bytes needed for the 512 slots (1 slot = 2 words)
that are used for other commands passed to CDMA. This requires that two
memory pages are allocated, but most of the second page (4092 bytes) is
never used.

Decrease the number of slots to 511 so that the RESTART opcode fits
within the page. Adjust the push buffer wraparound code to take into
account push buffer sizes that are not a power of two.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-02-07 18:28:59 +01:00
Thierry Reding 0e43b8da15 gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND
The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to
the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an
absolute address. This can cause SMMU faults if the CDMA fetches past a
pushbuffer's IOMMU mapping.

Properly setting the DMAEND prevents the CDMA from fetching beyond that
address and avoid such issues. This is currently not observed because a
whole (almost) page of essentially scratch space absorbs any excessive
prefetching by CDMA. However, changing the number of slots in the push
buffer can trigger these SMMU faults.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-02-07 18:28:58 +01:00
Thierry Reding 8de896eb20 gpu: host1x: Support 40-bit addressing on Tegra186
The host1x and clients instantiated on Tegra186 support addressing 40
bits of memory.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-02-07 18:28:58 +01:00
Thierry Reding 38fabcc953 gpu: host1x: Restrict IOVA space to DMA mask
On Tegra186 and later, the ARM SMMU provides an input address space that
is 48 bits wide. However, memory clients can only address up to 40 bits.
If the geometry is used as-is, allocations of IOVA space can end up in a
region that is not addressable by the memory clients.

To fix this, restrict the IOVA space to the DMA mask of the host1x
device.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-02-07 18:28:57 +01:00
Thierry Reding 67a82dbc0a gpu: host1x: Support 40-bit addressing
Tegra186 and later support 40 bits of address space. Additional
registers need to be programmed to store the full 40 bits of push
buffer addresses.

Since command stream gathers can also reside in buffers in a 40-bit
address space, a new variant of the GATHER opcode is also introduced.
It takes two parameters: the first parameter contains the lower 32
bits of the address and the second parameter contains bits 32 to 39.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-02-07 18:28:35 +01:00
Thierry Reding 5a5fccbd8c gpu: host1x: Introduce support for wide opcodes
The CDMA push buffer can currently only handle opcodes that take a
single word parameter. However, the host1x implementation on Tegra186
and later supports opcodes that require multiple words as parameters.

Unfortunately the way the push buffer is structured, these wide opcodes
cannot simply be composed of two regular opcodes because that could
result in the wide opcode being split across the end of the push buffer
and the final RESTART opcode required to wrap the push buffer around
would break the wide opcode.

One way to fix this would be to remove the concept of slots to simplify
push buffer operations. However, that's not entirely trivial and should
be done in a separate patch. For now, simply use a different function
to push four-word opcodes into the push buffer. Technically only three
words are pushed, with the fourth word used as padding to preserve the
2-word alignment required by the slots abstraction. The fourth word is
always a NOP opcode.

Additional care must be taken when the end of the push buffer is
reached. If a four-word opcode doesn't fit into the push buffer without
being split by the boundary, NOP opcodes will be introduced and the new
wide opcode placed at the beginning of the push buffer.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-02-07 18:28:35 +01:00
Thierry Reding de5469c21f gpu: host1x: Program the channel stream ID
When processing command streams, make sure the host1x's stream ID is
programmed for the channel so that addresses are properly translated
through the SMMU.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-02-07 18:28:33 +01:00
Thierry Reding 6841482b82 gpu: host1x: Set up stream ID table
In order to enable the MMIO path stream ID protection provided by the
incarnation of host1x found in Tegra186 and later, the host1x must be
provided with the list of stream ID register offsets for each of its
clients. Some clients (such as VIC) have multiple stream ID registers
that are assumed to be contiguous. The host1x is programmed with the
base offset and a limit which provide the range of registers that the
host1x needs to monitor for writes.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-02-04 08:35:55 +01:00
Thierry Reding f67524caf4 gpu: host1x: Represent host1x bus devices in debugfs
This new debugfs file represents the state of host1x bus devices,
specifying the list of subdevices and marking which ones have
successfully registered.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-02-04 08:35:54 +01:00
Arnd Bergmann 0747a672a3 gpu: host1x: Use completion instead of semaphore
In this usage, the two are completely equivalent, but the completion
documents better what is going on, and we generally try to avoid
semaphores these days.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-02-04 08:35:54 +01:00
Alban Bedel db5adf4d6d drm/tegra: hdmi: Fix audio to work with any pixel clock rate
The audio setting implementation was limited to a few specific pixel
clocks. This prevented HDMI audio from working on several test devices
as they need a pixel clock that is not supported by this implementation.

Fix this by implementing the algorithm provided in the TRM using fixed
point arithmetic. This allows the driver to cope with any sane pixel
clock rate.

Signed-off-by: Alban Bedel <alban.bedel@avionic-design.de>
[treding@nvidia.com: fix uninitialized variable warning]
Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-01-16 13:11:45 +01:00
Thierry Reding e3c702dcc7 drm/tegra: hdmi: Reuse common HDA format parser
Eliminate some duplicate code by reusing the HDA format parser already
used by the SOR.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-01-16 13:10:33 +01:00
Thierry Reding fad7b80643 drm/tegra: hda: Extract HDA format parsing code
This code can be reused for HDMI, so extract it into a reusable
function.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-01-16 13:10:20 +01:00
Thierry Reding cd54fb96e5 drm/tegra: sor: Parse more data from HDA format
The HDA format data passed to the SOR from the HDA codec contains more
information than just the rate and number of channels. Parse all the
fields and store them in an internal structure for subsequent use.

While at it, also fix an off-by-one error in the number of channels.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-01-16 13:10:03 +01:00
Thierry Reding f25d0a68be drm/tegra: Refactor CEC support
Most of the CEC support code already lives in the "output" library code.
Move registration and unregistration to the library code as well to make
use of the same code with HDMI on Tegra210 and later via the SOR.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-01-16 13:09:32 +01:00
Linus Torvalds bfeffd1552 Linux 5.0-rc1 2019-01-06 17:08:20 -08:00
Linus Torvalds 85e1ffbd42 Merge tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:

 - improve boolinit.cocci and use_after_iter.cocci semantic patches

 - fix alignment for kallsyms

 - move 'asm goto' compiler test to Kconfig and clean up jump_label
   CONFIG option

 - generate asm-generic wrappers automatically if arch does not
   implement mandatory UAPI headers

 - remove redundant generic-y defines

 - misc cleanups

* tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: rename generated .*conf-cfg to *conf-cfg
  kbuild: remove unnecessary stubs for archheader and archscripts
  kbuild: use assignment instead of define ... endef for filechk_* rules
  arch: remove redundant UAPI generic-y defines
  kbuild: generate asm-generic wrappers if mandatory headers are missing
  arch: remove stale comments "UAPI Header export list"
  riscv: remove redundant kernel-space generic-y
  kbuild: change filechk to surround the given command with { }
  kbuild: remove redundant target cleaning on failure
  kbuild: clean up rule_dtc_dt_yaml
  kbuild: remove UIMAGE_IN and UIMAGE_OUT
  jump_label: move 'asm goto' support test to Kconfig
  kallsyms: lower alignment on ARM
  scripts: coccinelle: boolinit: drop warnings on named constants
  scripts: coccinelle: check for redeclaration
  kconfig: remove unused "file" field of yylval union
  nds32: remove redundant kernel-space generic-y
  nios2: remove unneeded HAS_DMA define
2019-01-06 16:33:10 -08:00
Linus Torvalds ac5eed2b41 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling updates form Ingo Molnar:
 "A final batch of perf tooling changes: mostly fixes and small
  improvements"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
  perf session: Add comment for perf_session__register_idle_thread()
  perf thread-stack: Fix thread stack processing for the idle task
  perf thread-stack: Allocate an array of thread stacks
  perf thread-stack: Factor out thread_stack__init()
  perf thread-stack: Allow for a thread stack array
  perf thread-stack: Avoid direct reference to the thread's stack
  perf thread-stack: Tidy thread_stack__bottom() usage
  perf thread-stack: Simplify some code in thread_stack__process()
  tools gpio: Allow overriding CFLAGS
  tools power turbostat: Override CFLAGS assignments and add LDFLAGS to build command
  tools thermal tmon: Allow overriding CFLAGS assignments
  tools power x86_energy_perf_policy: Override CFLAGS assignments and add LDFLAGS to build command
  perf c2c: Increase the HITM ratio limit for displayed cachelines
  perf c2c: Change the default coalesce setup
  perf trace beauty ioctl: Beautify USBDEVFS_ commands
  perf trace beauty: Export function to get the files for a thread
  perf trace: Wire up ioctl's USBDEBFS_ cmd table generator
  perf beauty ioctl: Add generator for USBDEVFS_ ioctl commands
  tools headers uapi: Grab a copy of usbdevice_fs.h
  perf trace: Store the major number for a file when storing its pathname
  ...
2019-01-06 16:30:14 -08:00
Linus Torvalds 574823bfab Change mincore() to count "mapped" pages rather than "cached" pages
The semantics of what "in core" means for the mincore() system call are
somewhat unclear, but Linux has always (since 2.3.52, which is when
mincore() was initially done) treated it as "page is available in page
cache" rather than "page is mapped in the mapping".

The problem with that traditional semantic is that it exposes a lot of
system cache state that it really probably shouldn't, and that users
shouldn't really even care about.

So let's try to avoid that information leak by simply changing the
semantics to be that mincore() counts actual mapped pages, not pages
that might be cheaply mapped if they were faulted (note the "might be"
part of the old semantics: being in the cache doesn't actually guarantee
that you can access them without IO anyway, since things like network
filesystems may have to revalidate the cache before use).

In many ways the old semantics were somewhat insane even aside from the
information leak issue.  From the very beginning (and that beginning is
a long time ago: 2.3.52 was released in March 2000, I think), the code
had a comment saying

  Later we can get more picky about what "in core" means precisely.

and this is that "later".  Admittedly it is much later than is really
comfortable.

NOTE! This is a real semantic change, and it is for example known to
change the output of "fincore", since that program literally does a
mmmap without populating it, and then doing "mincore()" on that mapping
that doesn't actually have any pages in it.

I'm hoping that nobody actually has any workflow that cares, and the
info leak is real.

We may have to do something different if it turns out that people have
valid reasons to want the old semantics, and if we can limit the
information leak sanely.

Cc: Kevin Easton <kevin@guarana.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Masatake YAMATO <yamato@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-06 13:43:02 -08:00
Linus Torvalds 94bd8a05cd Fix 'acccess_ok()' on alpha and SH
Commit 594cc251fd ("make 'user_access_begin()' do 'access_ok()'")
broke both alpha and SH booting in qemu, as noticed by Guenter Roeck.

It turns out that the bug wasn't actually in that commit itself (which
would have been surprising: it was mostly a no-op), but in how the
addition of access_ok() to the strncpy_from_user() and strnlen_user()
functions now triggered the case where those functions would test the
access of the very last byte of the user address space.

The string functions actually did that user range test before too, but
they did it manually by just comparing against user_addr_max().  But
with user_access_begin() doing the check (using "access_ok()"), it now
exposed problems in the architecture implementations of that function.

For example, on alpha, the access_ok() helper macro looked like this:

  #define __access_ok(addr, size) \
        ((get_fs().seg & (addr | size | (addr+size))) == 0)

and what it basically tests is of any of the high bits get set (the
USER_DS masking value is 0xfffffc0000000000).

And that's completely wrong for the "addr+size" check.  Because it's
off-by-one for the case where we check to the very end of the user
address space, which is exactly what the strn*_user() functions do.

Why? Because "addr+size" will be exactly the size of the address space,
so trying to access the last byte of the user address space will fail
the __access_ok() check, even though it shouldn't.  As a result, the
user string accessor functions failed consistently - because they
literally don't know how long the string is going to be, and the max
access is going to be that last byte of the user address space.

Side note: that alpha macro is buggy for another reason too - it re-uses
the arguments twice.

And SH has another version of almost the exact same bug:

  #define __addr_ok(addr) \
        ((unsigned long __force)(addr) < current_thread_info()->addr_limit.seg)

so far so good: yes, a user address must be below the limit.  But then:

  #define __access_ok(addr, size)         \
        (__addr_ok((addr) + (size)))

is wrong with the exact same off-by-one case: the case when "addr+size"
is exactly _equal_ to the limit is actually perfectly fine (think "one
byte access at the last address of the user address space")

The SH version is actually seriously buggy in another way: it doesn't
actually check for overflow, even though it did copy the _comment_ that
talks about overflow.

So it turns out that both SH and alpha actually have completely buggy
implementations of access_ok(), but they happened to work in practice
(although the SH overflow one is a serious serious security bug, not
that anybody likely cares about SH security).

This fixes the problems by using a similar macro on both alpha and SH.
It isn't trying to be clever, the end address is based on this logic:

        unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;

which basically says "add start and length, and then subtract one unless
the length was zero".  We can't subtract one for a zero length, or we'd
just hit an underflow instead.

For a lot of access_ok() users the length is a constant, so this isn't
actually as expensive as it initially looks.

Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-06 13:25:45 -08:00
Linus Torvalds baa6707381 Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypt updates from Ted Ts'o:
 "Add Adiantum support for fscrypt"

* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
  fscrypt: add Adiantum support
2019-01-06 12:21:11 -08:00
Linus Torvalds 215240462a Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:
 "Fix a number of ext4 bugs"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix special inode number checks in __ext4_iget()
  ext4: track writeback errors using the generic tracking infrastructure
  ext4: use ext4_write_inode() when fsyncing w/o a journal
  ext4: avoid kernel warning when writing the superblock to a dead device
  ext4: fix a potential fiemap/page fault deadlock w/ inline_data
  ext4: make sure enough credits are reserved for dioread_nolock writes
2019-01-06 12:19:23 -08:00
Linus Torvalds e2b745f469 Merge tag 'dma-mapping-4.21-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
 "Fix various regressions introduced in this cycles:

   - fix dma-debug tracking for the map_page / map_single
     consolidatation

   - properly stub out DMA mapping symbols for !HAS_DMA builds to avoid
     link failures

   - fix AMD Gart direct mappings

   - setup the dma address for no kernel mappings using the remap
     allocator"

* tag 'dma-mapping-4.21-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING for remapped allocations
  x86/amd_gart: fix unmapping of non-GART mappings
  dma-mapping: remove a few unused exports
  dma-mapping: properly stub out the DMA API for !CONFIG_HAS_DMA
  dma-mapping: remove dmam_{declare,release}_coherent_memory
  dma-mapping: implement dmam_alloc_coherent using dmam_alloc_attrs
  dma-mapping: implement dma_map_single_attrs using dma_map_page_attrs
2019-01-06 11:47:26 -08:00