Pull nfsd update from Bruce Fields:
"Included this time:
- more nfsd containerization work from Stanislav Kinsbursky: we're
not quite there yet, but should be by 3.9.
- NFSv4.1 progress: implementation of basic backchannel security
negotiation and the mandatory BACKCHANNEL_CTL operation. See
http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues
for remaining TODO's
- Fixes for some bugs that could be triggered by unusual compounds.
Our xdr code wasn't designed with v4 compounds in mind, and it
shows. A more thorough rewrite is still a todo.
- If you've ever seen "RPC: multiple fragments per record not
supported" logged while using some sort of odd userland NFS client,
that should now be fixed.
- Further work from Jeff Layton on our mechanism for storing
information about NFSv4 clients across reboots.
- Further work from Bryan Schumaker on his fault-injection mechanism
(which allows us to discard selective NFSv4 state, to excercise
rarely-taken recovery code paths in the client.)
- The usual mix of miscellaneous bugs and cleanup.
Thanks to everyone who tested or contributed this cycle."
* 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits)
nfsd4: don't leave freed stateid hashed
nfsd4: free_stateid can use the current stateid
nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
nfsd: warn on odd reply state in nfsd_vfs_read
nfsd4: fix oops on unusual readlike compound
nfsd4: disable zero-copy on non-final read ops
svcrpc: fix some printks
NFSD: Correct the size calculation in fault_inject_write
NFSD: Pass correct buffer size to rpc_ntop
nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
nfsd: simplify service shutdown
nfsd: replace boolean nfsd_up flag by users counter
nfsd: simplify NFSv4 state init and shutdown
nfsd: introduce helpers for generic resources init and shutdown
nfsd: make NFSd service structure allocated per net
nfsd: make NFSd service boot time per-net
nfsd: per-net NFSd up flag introduced
nfsd: move per-net startup code to separated function
nfsd: pass net to __write_ports() and down
nfsd: pass net to nfsd_set_nrthreads()
...
Pull new F2FS filesystem from Jaegeuk Kim:
"Introduce a new file system, Flash-Friendly File System (F2FS), to
Linux 3.8.
Highlights:
- Add initial f2fs source codes
- Fix an endian conversion bug
- Fix build failures on random configs
- Fix the power-off-recovery routine
- Minor cleanup, coding style, and typos patches"
From the Kconfig help text:
F2FS is based on Log-structured File System (LFS), which supports
versatile "flash-friendly" features. The design has been focused on
addressing the fundamental issues in LFS, which are snowball effect
of wandering tree and high cleaning overhead.
Since flash-based storages show different characteristics according to
the internal geometry or flash memory management schemes aka FTL, F2FS
and tools support various parameters not only for configuring on-disk
layout, but also for selecting allocation and cleaning algorithms.
and there's an article by Neil Brown about it on lwn.net:
http://lwn.net/Articles/518988/
* tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
f2fs: fix tracking parent inode number
f2fs: cleanup the f2fs_bio_alloc routine
f2fs: introduce accessor to retrieve number of dentry slots
f2fs: remove redundant call to f2fs_put_page in delete entry
f2fs: make use of GFP_F2FS_ZERO for setting gfp_mask
f2fs: rewrite f2fs_bio_alloc to make it simpler
f2fs: fix a typo in f2fs documentation
f2fs: remove unused variable
f2fs: move error condition for mkdir at proper place
f2fs: remove unneeded initialization
f2fs: check read only condition before beginning write out
f2fs: remove unneeded memset from init_once
f2fs: show error in case of invalid mount arguments
f2fs: fix the compiler warning for uninitialized use of variable
f2fs: resolve build failures
f2fs: adjust kernel coding style
f2fs: fix endian conversion bugs reported by sparse
f2fs: remove unneeded version.h header file from f2fs.h
f2fs: update the f2fs document
f2fs: update Kconfig and Makefile
...
It is currently impossible to examine the state of seccomp for a given
process. While attaching with gdb and attempting "call
prctl(PR_GET_SECCOMP,...)" will work with some situations, it is not
reliable. If the process is in seccomp mode 1, this query will kill the
process (prctl not allowed), if the process is in mode 2 with prctl not
allowed, it will similarly be killed, and in weird cases, if prctl is
filtered to return errno 0, it can look like seccomp is disabled.
When reviewing the state of running processes, there should be a way to
externally examine the seccomp mode. ("Did this build of Chrome end up
using seccomp?" "Did my distro ship ssh with seccomp enabled?")
This adds the "Seccomp" line to /proc/$pid/status.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Morris <jmorris@namei.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During c/r sessions we've found that there is no way at the moment to
fetch some VMA associated flags, such as mlock() and madvise().
This leads us to a problem -- we don't know if we should call for mlock()
and/or madvise() after restore on the vma area we're bringing back to
life.
This patch intorduces a new field into "smaps" output called VmFlags,
where all set flags associated with the particular VMA is shown as two
letter mnemonics.
[ Strictly speaking for c/r we only need mlock/madvise bits but it has been
said that providing just a few flags looks somehow inconsistent. So all
flags are here now. ]
This feature is made available on CONFIG_CHECKPOINT_RESTORE=n kernels, as
other applications may start to use these fields.
The data is encoded in a somewhat awkward two letters mnemonic form, to
encourage userspace to be prepared for fields being added or removed in
the future.
[a.p.zijlstra@chello.nl: props to use for_each_set_bit]
[sfr@canb.auug.org.au: props to use array instead of struct]
[akpm@linux-foundation.org: overall redesign and simplification]
[akpm@linux-foundation.org: remove unneeded braces per sfr, avoid using bloaty for_each_set_bit()]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So far FAT either offsets time stamps by sys_tz.minuteswest or leaves them
as they are (when tz=UTC mount option is used). However in some cases it
is useful if one can specify time stamp offset on his own (e.g. when time
zone of the camera connected is different from time zone of the computer,
or when HW clock is in UTC and thus sys_tz.minuteswest == 0).
So provide a mount option time_offset= which allows user to specify offset
in minutes that should be applied to time stamps on the filesystem.
akpm: this code would work incorrectly when used via `mount -o remount',
because cached inodes would not be updated. But fatfs's fat_remount() is
basically a no-op anyway.
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ext4 update from Ted Ts'o:
"There are two major features for this merge window. The first is
inline data, which allows small files or directories to be stored in
the in-inode extended attribute area. (This requires that the file
system use inodes which are at least 256 bytes or larger; 128 byte
inodes do not have any room for in-inode xattrs.)
The second new feature is SEEK_HOLE/SEEK_DATA support. This is
enabled by the extent status tree patches, and this infrastructure
will be used to further optimize ext4 in the future.
Beyond that, we have the usual collection of code cleanups and bug
fixes."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (63 commits)
ext4: zero out inline data using memset() instead of empty_zero_page
ext4: ensure Inode flags consistency are checked at build time
ext4: Remove CONFIG_EXT4_FS_XATTR
ext4: remove unused variable from ext4_ext_in_cache()
ext4: remove redundant initialization in ext4_fill_super()
ext4: remove redundant code in ext4_alloc_inode()
ext4: use sync_inode_metadata() when syncing inode metadata
ext4: enable ext4 inline support
ext4: let fallocate handle inline data correctly
ext4: let ext4_truncate handle inline data correctly
ext4: evict inline data out if we need to strore xattr in inode
ext4: let fiemap work with inline data
ext4: let ext4_rename handle inline dir
ext4: let empty_dir handle inline dir
ext4: let ext4_delete_entry() handle inline data
ext4: make ext4_delete_entry generic
ext4: let ext4_find_entry handle inline data
ext4: create a new function search_dir
ext4: let ext4_readdir handle inline data
ext4: let add_dir_entry handle inline data properly
...
Pull x86 EFI update from Peter Anvin:
"EFI tree, from Matt Fleming. Most of the patches are the new efivarfs
filesystem by Matt Garrett & co. The balance are support for EFI
wallclock in the absence of a hardware-specific driver, and various
fixes and cleanups."
* 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
efivarfs: Make efivarfs_fill_super() static
x86, efi: Check table header length in efi_bgrt_init()
efivarfs: Use query_variable_info() to limit kmalloc()
efivarfs: Fix return value of efivarfs_file_write()
efivarfs: Return a consistent error when efivarfs_get_inode() fails
efivarfs: Make 'datasize' unsigned long
efivarfs: Add unique magic number
efivarfs: Replace magic number with sizeof(attributes)
efivarfs: Return an error if we fail to read a variable
efi: Clarify GUID length calculations
efivarfs: Implement exclusive access for {get,set}_variable
efivarfs: efivarfs_fill_super() ensure we clean up correctly on error
efivarfs: efivarfs_fill_super() ensure we free our temporary name
efivarfs: efivarfs_fill_super() fix inode reference counts
efivarfs: efivarfs_create() ensure we drop our reference on inode on error
efivarfs: efivarfs_file_read ensure we free data in error paths
x86-64/efi: Use EFI to deal with platform wall clock (again)
x86/kernel: remove tboot 1:1 page table creation code
x86, efi: 1:1 pagetable mapping for virtual EFI calls
x86, mm: Include the entire kernel memory map in trampoline_pgd
...
Pull xfs update from Ben Myers:
"There is plenty going on, including the cleanup of xfssyncd, metadata
verifiers, CRC infrastructure for the log, tracking of inodes with
speculative allocation, a cleanup of xfs_fs_subr.c, fixes for
XFS_IOC_ZERO_RANGE, and important fix related to log replay (only
update the last_sync_lsn when a transaction completes), a fix for
deadlock on AGF buffers, documentation and comment updates, and a few
more cleanups and fixes.
Details:
- remove the xfssyncd mess
- only update the last_sync_lsn when a transaction completes
- zero allocation_args on the kernel stack
- fix AGF/alloc workqueue deadlock
- silence uninitialised f.file warning
- Update inode alloc comments
- Update mount options documentation
- report projid32bit feature in geometry call
- speculative preallocation inode tracking
- fix attr tree double split corruption
- fix broken error handling in xfs_vm_writepage
- drop buffer io reference when a bad bio is built
- add more attribute tree trace points
- growfs infrastructure changes for 3.8
- fs/xfs/xfs_fs_subr.c die die die
- add CRC infrastructure
- add CRC checks to the log
- Remove description of nodelaylog mount option from xfs.txt
- inode allocation should use unmapped buffers
- byte range granularity for XFS_IOC_ZERO_RANGE
- fix direct IO nested transaction deadlock
- fix stray dquot unlock when reclaiming dquots
- fix sparse reported log CRC endian issue"
Fix up trivial conflict in fs/xfs/xfs_fsops.c due to the same patch
having been applied twice (commits eaef854335 and 1375cb65e8: "xfs:
growfs: don't read garbage for new secondary superblocks") with later
updates to the affected code in the XFS tree.
* tag 'for-linus-v3.8-rc1' of git://oss.sgi.com/xfs/xfs: (78 commits)
xfs: fix sparse reported log CRC endian issue
xfs: fix stray dquot unlock when reclaiming dquots
xfs: fix direct IO nested transaction deadlock.
xfs: byte range granularity for XFS_IOC_ZERO_RANGE
xfs: inode allocation should use unmapped buffers.
xfs: Remove the description of nodelaylog mount option from xfs.txt
xfs: add CRC checks to the log
xfs: add CRC infrastructure
xfs: convert buffer verifiers to an ops structure.
xfs: connect up write verifiers to new buffers
xfs: add pre-write metadata buffer verifier callbacks
xfs: add buffer pre-write callback
xfs: Add verifiers to dir2 data readahead.
xfs: add xfs_da_node verification
xfs: factor and verify attr leaf reads
xfs: factor dir2 leaf read
xfs: factor out dir2 data block reading
xfs: factor dir2 free block reading
xfs: verify dir2 block format buffers
xfs: factor dir2 block read operations
...
In f2fs_fs.h, one f2fs inode contains 923 data block pointers, while
f2fs documentation says it is 929. Fix this inconsistence.
Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
This adds a document describing the mount options, proc entries, usage, and
design of Flash-Friendly File System, namely F2FS.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Ted has sent out a RFC about removing this feature. Eric and Jan
confirmed that both RedHat and SUSE enable this feature in all their
product. David also said that "As far as I know, it's enabled in all
Android kernels that use ext4." So it seems OK for us.
And what's more, as inline data depends its implementation on xattr,
and to be frank, I don't run any test again inline data enabled while
xattr disabled. So I think we should add inline data and remove this
config option in the same release.
[ The savings if you disable CONFIG_EXT4_FS_XATTR is only 27k, which
isn't much in the grand scheme of things. Since no one seems to be
testing this configuration except for some automated compile farms, on
balance we are better removing this config option, and so that it is
effectively always enabled. -- tytso ]
Cc: David Brown <davidb@codeaurora.org>
Cc: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
nodelaylog mount option is removed by commit 93b8a585. But there still be
the description about it in the xfs document. This patch removes it.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Our server rejects compounds containing more than one write operation.
It's unclear whether this is really permitted by the spec; with 4.0,
it's possibly OK, with 4.1 (which has clearer limits on compound
parameters), it's probably not OK. No client that we're aware of has
ever done this, but in theory it could be useful.
The source of the limitation: we need an array of iovecs to pass to the
write operation. In the worst case that array of iovecs could have
hundreds of elements (the maximum rwsize divided by the page size), so
it's too big to put on the stack, or in each compound op. So we instead
keep a single such array in the compound argument.
We fill in that array at the time we decode the xdr operation.
But we decode every op in the compound before executing any of them. So
once we've used that array we can't decode another write.
If we instead delay filling in that array till the time we actually
perform the write, we can reuse it.
Another option might be to switch to decoding compound ops one at a
time. I considered doing that, but it has a number of other side
effects, and I'd rather fix just this one problem for now.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This is mostly a revert of 01dc52ebdf ("oom: remove deprecated oom_adj")
from Davidlohr Bueso.
It reintroduces /proc/pid/oom_adj for backwards compatibility with earlier
kernels. It simply scales the value linearly when /proc/pid/oom_score_adj
is written.
The major difference is that its scheduled removal is no longer included
in Documentation/feature-removal-schedule.txt. We do warn users with a
single printk, though, to suggest the more powerful and supported
/proc/pid/oom_score_adj interface.
Reported-by: Artem S. Tashkinov <t.artem@lycos.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Once inode64 is the default allocation mode now, kernel documentation should be
updated to match this behaviour.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Pull nfsd update from J Bruce Fields:
"Another relatively quiet cycle. There was some progress on my
remaining 4.1 todo's, but a couple of them were just of the form
"check that we do X correctly", so didn't have much affect on the
code.
Other than that, a bunch of cleanup and some bugfixes (including an
annoying NFSv4.0 state leak and a busy-loop in the server that could
cause it to peg the CPU without making progress)."
* 'for-3.7' of git://linux-nfs.org/~bfields/linux: (46 commits)
UAPI: (Scripted) Disintegrate include/linux/sunrpc
UAPI: (Scripted) Disintegrate include/linux/nfsd
nfsd4: don't allow reclaims of expired clients
nfsd4: remove redundant callback probe
nfsd4: expire old client earlier
nfsd4: separate session allocation and initialization
nfsd4: clean up session allocation
nfsd4: minor free_session cleanup
nfsd4: new_conn_from_crses should only allocate
nfsd4: separate connection allocation and initialization
nfsd4: reject bad forechannel attrs earlier
nfsd4: enforce per-client sessions/no-sessions distinction
nfsd4: set cl_minorversion at create time
nfsd4: don't pin clientids to pseudoflavors
nfsd4: fix bind_conn_to_session xdr comment
nfsd4: cast readlink() bug argument
NFSD: pass null terminated buf to kstrtouint()
nfsd: remove duplicate init in nfsd4_cb_recall
nfsd4: eliminate redundant nfs4_free_stateid
fs/nfsd/nfs4idmap.c: adjust inconsistent IS_ERR and PTR_ERR
...
Pull NFS client updates from Trond Myklebust:
"Features include:
- Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
Aside from the issues discussed at the LKS, distros are shipping
NFSv4.1 with all the trimmings.
- Fix fdatasync()/fsync() for the corner case of a server reboot.
- NFSv4 OPEN access fix: finally distinguish correctly between
open-for-read and open-for-execute permissions in all situations.
- Ensure that the TCP socket is closed when we're in CLOSE_WAIT
- More idmapper bugfixes
- Lots of pNFS bugfixes and cleanups to remove unnecessary state and
make the code easier to read.
- In cases where a pNFS read or write fails, allow the client to
resume trying layoutgets after two minutes of read/write-
through-mds.
- More net namespace fixes to the NFSv4 callback code.
- More net namespace fixes to the NFSv3 locking code.
- More NFSv4 migration preparatory patches.
Including patches to detect network trunking in both NFSv4 and
NFSv4.1
- pNFS block updates to optimise LAYOUTGET calls."
* tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits)
pnfsblock: cleanup nfs4_blkdev_get
NFS41: send real read size in layoutget
NFS41: send real write size in layoutget
NFS: track direct IO left bytes
NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
NFSv4 set open access operation call flag in nfs4_init_opendata_res
NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
NFSv4 reduce attribute requests for open reclaim
NFSv4: nfs4_open_done first must check that GETATTR decoded a file type
NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
NFSv4.1: Deal with wraparound issues when updating the layout stateid
NFSv4.1: Always set the layout stateid if this is the first layoutget
NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
NFSv4: don't put ACCESS in OPEN compound if O_EXCL
NFSv4: don't check MAY_WRITE access bit in OPEN
NFS: Set key construction data for the legacy upcall
NFSv4.1: don't do two EXCHANGE_IDs on mount
NFS: nfs41_walk_client_list(): re-lock before iterating
...
This is to complete part of the Userspace API (UAPI) disintegration for which
the preparatory patches were pulled recently. After these patches, userspace
headers will be segregated into:
include/uapi/linux/.../foo.h
for the userspace interface stuff, and:
include/linux/.../foo.h
for the strictly kernel internal stuff.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>