The code should call ocfs2_free_alloc_context() to free meta_ac &
data_ac before calling ocfs2_run_deallocs(). Because
ocfs2_run_deallocs() will acquire the system inode's i_mutex hold by
meta_ac. So try to release the lock before ocfs2_run_deallocs().
Fixes: af1310367f41 ("ocfs2: fix sparse file & data ordering issue in direct io.")
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Acked-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When doing append direct write in an already allocated cluster, and fast
path in ocfs2_dio_get_block() is triggered, function
ocfs2_dio_end_io_write() will be skipped as there is no context
allocated.
As a result, the disk file size will not be changed as it should be.
The solution is to skip fast path when we are about to change file size.
Fixes: af1310367f41 ("ocfs2: fix sparse file & data ordering issue in direct io.")
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Acked-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the current implementation of unaligned aio+dio, lock order behave as
follow:
in user process context:
-> call io_submit()
-> get i_mutex
<== window1
-> get ip_unaligned_aio
-> submit direct io to block device
-> release i_mutex
-> io_submit() return
in dio work queue context(the work queue is created in __blockdev_direct_IO):
-> release ip_unaligned_aio
<== window2
-> get i_mutex
-> clear unwritten flag & change i_size
-> release i_mutex
There is a limitation to the thread number of dio work queue. 256 at
default. If all 256 thread are in the above 'window2' stage, and there
is a user process in the 'window1' stage, the system will became
deadlock. Since the user process hold i_mutex to wait ip_unaligned_aio
lock, while there is a direct bio hold ip_unaligned_aio mutex who is
waiting for a dio work queue thread to be schedule. But all the dio
work queue thread is waiting for i_mutex lock in 'window2'.
This case only happened in a test which send a large number(more than
256) of aio at one io_submit() call.
My design is to remove ip_unaligned_aio lock. Change it to a sync io
instead. Just like ip_unaligned_aio lock, serialize the unaligned aio
dio.
[akpm@linux-foundation.org: remove OCFS2_IOCB_UNALIGNED_IO, per Junxiao Bi]
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up ocfs2_file_write_iter & ocfs2_prepare_inode_for_write:
* remove append dio check: it will be checked in ocfs2_direct_IO()
* remove file hole check: file hole is supported for now
* remove inline data check: it will be checked in ocfs2_direct_IO()
* remove the full_coherence check when append dio: we will get the
inode_lock in ocfs2_dio_get_block, there is no need to fall back to
buffer io to ensure the coherence semantics.
Now the drop dio procedure is gone. :)
[akpm@linux-foundation.org: remove unused label]
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are mainly three issues in the direct io code path after commit
24c40b329e ("ocfs2: implement ocfs2_direct_IO_write"):
* Does not support sparse file.
* Does not support data ordering. eg: when write to a file hole, it
will alloc extent first. If system crashed before io finished, data
will corrupt.
* Potential risk when doing aio+dio. The -EIOCBQUEUED return value is
likely to be ignored by ocfs2_direct_IO_write().
To resolve above problems, re-design direct io code with following ideas:
* Use buffer io to fill in holes. And this will make better
performance also.
* Clear unwritten after direct write finished. So we can make sure
meta data changes after data write to disk. (Unwritten extent is
invisible to user, from user's view, meta data is not changed when
allocate an unwritten extent.)
* Clear ocfs2_direct_IO_write(). Do all ending work in end_io.
This patch has passed fs,dio,ltp-aiodio.part1,ltp-aiodio.part2,ltp-aiodio.part4
test cases of ltp.
For performance improvement, see following test result:
ocfs2 cluster size 1MB, ocfs2 volume is mounted on /mnt/.
The original way:
+ rm /mnt/test.img -f
+ dd if=/dev/zero of=/mnt/test.img bs=4K count=1048576 oflag=direct
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 1707.83 s, 2.5 MB/s
+ rm /mnt/test.img -f
+ dd if=/dev/zero of=/mnt/test.img bs=256K count=16384 oflag=direct
16384+0 records in
16384+0 records out
4294967296 bytes (4.3 GB) copied, 582.705 s, 7.4 MB/s
After this patch:
+ rm /mnt/test.img -f
+ dd if=/dev/zero of=/mnt/test.img bs=4K count=1048576 oflag=direct
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 64.6412 s, 66.4 MB/s
+ rm /mnt/test.img -f
+ dd if=/dev/zero of=/mnt/test.img bs=256K count=16384 oflag=direct
16384+0 records in
16384+0 records out
4294967296 bytes (4.3 GB) copied, 34.7611 s, 124 MB/s
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
There is still one issue in the direct write procedure.
phase 1: alloc extent with UNWRITTEN flag
phase 2: submit direct data to disk, add zero page to page cache
phase 3: clear UNWRITTEN flag when data has been written to disk
When there are 2 direct write A(0~3KB),B(4~7KB) writing to the same
cluster 0~7KB (cluster size 8KB). Write request A arrive phase 2 first,
it will zero the region (4~7KB). Before request A enter to phase 3,
request B arrive phase 2, it will zero region (0~3KB). This is just like
request B steps request A.
To resolve this issue, we should let request B knows this cluster is already
under zero, to prevent it from steps the previous write request.
This patch will add function ocfs2_unwritten_check() to do this job. It
will record all clusters that are under direct write(it will be recorded
in the 'ip_unwritten_list' member of inode info), and prevent the later
direct write writing to the same cluster to do the zero work again.
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
Direct io needs to get the physical address from write_begin, to map the
user page. This patch is to change the arg 'phys' of
ocfs2_write_cluster to a pointer, so it can be retrieved to write_begin.
And we can retrieve it to the direct io procedure.
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
Append direct io do not change i_size in get block phase. It only move
to orphan when starting write. After data is written to disk, it will
delete itself from orphan and update i_size. So skip i_size change
section in write_begin for direct io.
And when there is no extents alloc, no meta data changes needed for
direct io (since write_begin start trans for 2 reason: alloc extents &
change i_size. Now none of them needed). So we can skip start trans
procedure.
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
Direct io data will not appear in buffer. The w_target_page member will
not be filled by direct io. So avoid to use it when it's NULL. Unlinke
buffer io and mmap, direct io will call write_begin with more than 1
page a time. So the target_index is not sufficient to describe the
actual data. change it to a range start at target_index, end in
end_index.
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
There is a problem in ocfs2's direct io implement: if system crashed
after extents allocated, and before data return, we will get a extent
with dirty data on disk. This problem violate the journal=order
semantics, which means meta changes take effect after data written to
disk. To resolve this issue, direct write can use the UNWRITTEN flag to
describe a extent during direct data writeback. The direct write
procedure should act in the following order:
phase 1: alloc extent with UNWRITTEN flag
phase 2: submit direct data to disk, add zero page to page cache
phase 3: clear UNWRITTEN flag when data has been written to disk
This patch is to change the 'c_unwritten' member of
ocfs2_write_cluster_desc to 'c_clear_unwritten'. Means whether to clear
the unwritten flag. It do not care if a extent is allocated or not.
And use 'c_new' to specify a newly allocated extent. So the direct io
procedure can use c_clear_unwritten to control the UNWRITTEN bit on
extent.
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patchset: fix ocfs2 direct io code patch to support sparse file and data
ordering semantics
The idea is to use buffer io(more precisely use the interface
ocfs2_write_begin_nolock & ocfs2_write_end_nolock) to do the zero work
beyond block size. And clear UNWRITTEN flag until direct io data has
been written to disk, which can prevent data corruption when system
crashed during direct write.
And we will also archive a better performance: eg. dd direct write new
file with block size 4KB: before this patchset:
2.5 MB/s
after this patchset:
66.4 MB/s
This patch (of 8):
To support direct io in ocfs2_write_begin_nolock &
ocfs2_write_end_nolock.
Remove unused args filp & flags. Add new arg type. The type is one of
buffer/direct/mmap. Indicate 3 way to perform write. buffer/mmap type
has implemented. direct type will be implemented later.
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block fixes from Jens Axboe:
"Final round of fixes for this merge window - some of this has come up
after the initial pull request, and some of it was put in a post-merge
branch before the merge window.
This contains:
- Fix for a bad check for an error on dma mapping in the mtip32xx
driver, from Alexey Khoroshilov.
- A set of fixes for lightnvm, from Javier, Matias, and Wenwei.
- An NVMe completion record corruption fix from Marta, ensuring that
we read things in the right order.
- Two writeback fixes from Tejun, marked for stable@ as well.
- A blk-mq sw queue iterator fix from Thomas, fixing an oops for
sparse CPU maps. They hit this in the hot plug/unplug rework"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: avoid cqe corruption when update at the same time as read
writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode
writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()
blk-mq: Use proper cpumask iterator
mtip32xx: fix checks for dma mapping errors
lightnvm: do not load L2P table if not supported
lightnvm: do not reserve lun on l2p loading
nvme: lightnvm: return ppa completion status
lightnvm: add a bitmap of luns
lightnvm: specify target's logical address area
null_blk: add lightnvm null_blk device to the nullb_list
Pull MTD updates from Brian Norris:
"NAND:
- Add sunxi_nand randomizer support
- begin refactoring NAND ecclayout structs
- fix pxa3xx_nand dmaengine usage
- brcmnand: fix support for v7.1 controller
- add Qualcomm NAND controller driver
SPI NOR:
- add new ls1021a, ls2080a support to Freescale QuadSPI
- add new flash ID entries
- support bottom-block protection for Winbond flash
- support Status Register Write Protect
- remove broken QPI support for Micron SPI flash
JFFS2:
- improve post-mount CRC scan efficiency
General:
- refactor bcm63xxpart parser, to later extend for NAND
- add writebuf size parameter to mtdram
Other minor code quality improvements"
* tag 'for-linus-20160324' of git://git.infradead.org/linux-mtd: (72 commits)
mtd: nand: remove kerneldoc for removed function parameter
mtd: nand: Qualcomm NAND controller driver
dt/bindings: qcom_nandc: Add DT bindings
mtd: nand: don't select chip in nand_chip's block_bad op
mtd: spi-nor: support lock/unlock for a few Winbond chips
mtd: spi-nor: add TB (Top/Bottom) protect support
mtd: spi-nor: add SPI_NOR_HAS_LOCK flag
mtd: spi-nor: use BIT() for flash_info flags
mtd: spi-nor: disallow further writes to SR if WP# is low
mtd: spi-nor: make lock/unlock bounds checks more obvious and robust
mtd: spi-nor: silently drop lock/unlock for already locked/unlocked region
mtd: spi-nor: wait for SR_WIP to clear on initial unlock
mtd: nand: simplify nand_bch_init() usage
mtd: mtdswap: remove useless if (!mtd->ecclayout) test
mtd: create an mtd_oobavail() helper and make use of it
mtd: kill the ecclayout->oobavail field
mtd: nand: check status before reporting timeout
mtd: bcm63xxpart: give width specifier an 'int', not 'size_t'
mtd: mtdram: Add parameter for setting writebuf size
mtd: nand: pxa3xx_nand: kill unused field 'drcmr_cmd'
...
Pull UBI/UBIFS updates from Richard Weinberger:
"This contains cleanups and a maintainer update for UBI and UBIFS"
* tag 'upstream-4.6-rc1' of git://git.infradead.org/linux-ubifs:
ubifs: Remove unused header
MAINTAINERS: Update UBIFS entry
mtd: ubi: Add logging functions ubi_msg, ubi_warn and ubi_err
ubifs: Add logging functions for ubifs_msg, ubifs_err and ubifs_warn
Pull more nfsd updates from Bruce Fields:
"Apologies for the previous request, which omitted the top 8 commits
from my for-next branch (including the SCSI layout commits). Thanks
to Trond for spotting my error!"
This actually includes the new layout types, so here's that part of
the pull message repeated:
"Support for a new pnfs layout type from Christoph Hellwig. The new
layout type is a variant of the block layout which uses SCSI features
to offer improved fencing and device identification.
Note this pull request also includes the client side of SCSI layout,
with Trond's permission"
* tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux:
nfsd: use short read as well as i_size to set eof
nfsd: better layoutupdate bounds-checking
nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK
nfsd: add SCSI layout support
nfsd: move some blocklayout code
nfsd: add a new config option for the block layout driver
nfs/blocklayout: add SCSI layout support
nfs4.h: add SCSI layout definitions
Pull nfsd updates from Bruce Fields:
"Various bugfixes, a RDMA update from Chuck Lever, and support for a
new pnfs layout type from Christoph Hellwig. The new layout type is a
variant of the block layout which uses SCSI features to offer improved
fencing and device identification.
(Also: note this pull request also includes the client side of SCSI
layout, with Trond's permission.)"
* tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux:
sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
nfsd: recover: fix memory leak
nfsd: fix deadlock secinfo+readdir compound
nfsd4: resfh unused in nfsd4_secinfo
svcrdma: Use new CQ API for RPC-over-RDMA server send CQs
svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs
svcrdma: Remove close_out exit path
svcrdma: Hook up the logic to return ERR_CHUNK
svcrdma: Use correct XID in error replies
svcrdma: Make RDMA_ERROR messages work
rpcrdma: Add RPCRDMA_HDRLEN_ERR
svcrdma: svc_rdma_post_recv() should close connection on error
svcrdma: Close connection when a send error occurs
nfsd: Lower NFSv4.1 callback message size limit
svcrdma: Do not send Write chunk XDR pad with inline content
svcrdma: Do not write xdr_buf::tail in a Write chunk
svcrdma: Find client-provided write and reply chunks once per reply
nfsd: Update NFS server comments related to RDMA support
nfsd: Fix a memory leak when meeting unsupported state_protect_how4
nfsd4: fix bad bounds checking
Use the result of a local read to determine when to set the eof flag. This
allows us to return the location of the end of the file atomically at the
time of the read.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
[bfields: add some documentation]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
API:
- Fix kzalloc error path crash in ecryptfs added by skcipher
conversion. Note the subject of the commit is screwed up and the
correct subject is actually in the body.
Drivers:
- A number of fixes to the marvell cesa hashing code.
- Remove bogus nested irqsave that clobbers the saved flags in ccp"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvell/cesa - forward devm_ioremap_resource() error code
crypto: marvell/cesa - initialize hash states
crypto: marvell/cesa - fix memory leak
crypto: ccp - fix lock acquisition code
eCryptfs: Use skcipher and shash
Merge third patch-bomb from Andrew Morton:
- more ocfs2 changes
- a few hotfixes
- Andy's compat cleanups
- misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
panic, ipmi, kgdb, profile, kfifo, ubsan, etc.
- many rapidio updates: fixes, new drivers.
- kcov: kernel code coverage feature. Like gcov, but not
"prohibitively expensive".
- extable code consolidation for various archs
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
ia64/extable: use generic search and sort routines
x86/extable: use generic search and sort routines
s390/extable: use generic search and sort routines
alpha/extable: use generic search and sort routines
kernel/...: convert pr_warning to pr_warn
drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
memremap: add MEMREMAP_WC flag
memremap: don't modify flags
kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
ipc/sem: make semctl setting sempid consistent
ubsan: fix tree-wide -Wmaybe-uninitialized false positives
kfifo: fix sparse complaints
scripts/gdb: account for changes in module data structure
scripts/gdb: add cmdline reader command
scripts/gdb: add version command
kernel: add kcov code coverage
profile: hide unused functions when !CONFIG_PROC_FS
hpwdt: use nmi_panic() when kernel panics in NMI handler
...
Since commit e22553e2a2 ("eventfd: don't take the spinlock in
eventfd_poll", 2015-02-17), eventfd is reading ctx->count outside
ctx->wqh.lock.
However, things aren't as simple as the read barrier in eventfd_poll
would suggest. In fact, the read barrier, besides lacking a comment, is
not paired in any obvious manner with another read barrier, and it is
pointless because it is sitting between a write (deep in poll_wait) and
the read of ctx->count. The read barrier is acting just as a compiler
barrier, for which we can use READ_ONCE instead. This is what the code
change in this patch does.
The documentation change is just as important, however. The question,
posed by Andrea Arcangeli, is then why the thing is safe on
architectures where spin_unlock does not imply a store-load memory
barrier. The answer is that it's safe because writes of ctx->count use
the same lock as poll_wait, and hence an acquire barrier implicit in
poll_wait provides the necessary synchronization between eventfd_poll
and callers of wake_up_locked_poll. This is sort of mentioned in the
commit message with respect to eventfd_ctx_read ("eventfd_read is
similar, it will do a single decrement with the lock held") but it
applies to all other callers too. It's tricky enough that it should be
documented in the code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Mason <clm@fb.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:
- The fs.suid_dumpable sysctl is set to 2.
- The kernel.core_pattern sysctl's value starts with "/". (Systems
where kernel.core_pattern starts with "|/" are not affected.)
- Unprivileged user namespace creation is permitted. (This is
true on Linux >=3.8, but some distributions disallow it by
default using a distro patch.)
Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.
To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FAT has long supported its own default file name encoding config
setting, separate from CONFIG_NLS_DEFAULT.
However, if UTF-8 encoded file names are desired FAT character set
should not be set to utf8 since this would make file names case
sensitive even if case insensitive matching is requested. Instead,
"utf8" mount options should be provided to enable UTF-8 file names in
FAT file system.
Unfortunately, there was no possibility to set the default value of this
option so on UTF-8 system "utf8" mount option had to be added manually
to most FAT mounts.
This patch adds config option to set such default value.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ext4 treats directory offsets differently for 32-bit and 64-bit callers.
Check the caller type using in_compat_syscall, not is_compat_task. This
changes behavior on SPARC slightly.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>