Commit Graph

397775 Commits

Author SHA1 Message Date
Jozsef Kadlecsik
169faa2e19 netfilter: ipset: Validate the set family and not the set type family at swapping
This closes netfilter bugzilla #843, reported by Quentin Armitage.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16 20:36:05 +02:00
Jozsef Kadlecsik
0f1799ba1a netfilter: ipset: Consistent userspace testing with nomatch flag
The "nomatch" commandline flag should invert the matching at testing,
similarly to the --return-nomatch flag of the "set" match of iptables.
Until now it worked with the elements with "nomatch" flag only. From
now on it works with elements without the flag too, i.e:

 # ipset n test hash:net
 # ipset a test 10.0.0.0/24 nomatch
 # ipset t test 10.0.0.1
 10.0.0.1 is NOT in set test.
 # ipset t test 10.0.0.1 nomatch
 10.0.0.1 is in set test.

 # ipset a test 192.168.0.0/24
 # ipset t test 192.168.0.1
 192.168.0.1 is in set test.
 # ipset t test 192.168.0.1 nomatch
 192.168.0.1 is NOT in set test.

 Before the patch the results were

 ...
 # ipset t test 192.168.0.1
 192.168.0.1 is in set test.
 # ipset t test 192.168.0.1 nomatch
 192.168.0.1 is in set test.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16 20:35:55 +02:00
Jozsef Kadlecsik
55524c219a netfilter: ipset: Skip really non-first fragments for IPv6 when getting port/protocol
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16 20:33:44 +02:00
Phil Oester
d830f0fa1d netfilter: nf_nat_proto_icmpv6:: fix wrong comparison in icmpv6_manip_pkt
In commit 58a317f1 (netfilter: ipv6: add IPv6 NAT support), icmpv6_manip_pkt
was added with an incorrect comparison of ICMP codes to types.  This causes
problems when using NAT rules with the --random option.  Correct the
comparison.

This closes netfilter bugzilla #851, reported by Alexander Neumann.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-13 11:58:48 +02:00
Michal Kubeček
c13a84a830 netfilter: nf_conntrack: use RCU safe kfree for conntrack extensions
Commit 68b80f11 (netfilter: nf_nat: fix RCU races) introduced
RCU protection for freeing extension data when reallocation
moves them to a new location. We need the same protection when
freeing them in nf_ct_ext_free() in order to prevent a
use-after-free by other threads referencing a NAT extension data
via bysource list.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-13 11:58:40 +02:00
Ariel Elior
c19d65c95c bnx2x: Fix configuration of doorbell block
As part of VF RSS feature doorbell block was configured not to use dpm, but
a small part of configuration was left out, preventing the driver from sending
tx messages to the device. This patch adds the missing configuration.

Reported-by: Eric Dumazet <eric.dumazet@gmil.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-09 17:06:14 -04:00
Linus Torvalds
300893b08f Merge tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs
Pull xfs updates from Ben Myers:
 "For 3.12-rc1 there are a number of bugfixes in addition to work to
  ease usage of shared code between libxfs and the kernel, the rest of
  the work to enable project and group quotas to be used simultaneously,
  performance optimisations in the log and the CIL, directory entry file
  type support, fixes for log space reservations, some spelling/grammar
  cleanups, and the addition of user namespace support.

   - introduce readahead to log recovery
   - add directory entry file type support
   - fix a number of spelling errors in comments
   - introduce new Q_XGETQSTATV quotactl for project quotas
   - add USER_NS support
   - log space reservation rework
   - CIL optimisations
  - kernel/userspace libxfs rework"

* tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits)
  xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
  xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
  Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
  xfs: finish removing IOP_* macros.
  xfs: inode log reservations are too small
  xfs: check correct status variable for xfs_inobt_get_rec() call
  xfs: inode buffers may not be valid during recovery readahead
  xfs: check LSN ordering for v5 superblocks during recovery
  xfs: btree block LSN escaping to disk uninitialised
  XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
  xfs: fix bad dquot buffer size in log recovery readahead
  xfs: don't account buffer cancellation during log recovery readahead
  xfs: check for underflow in xfs_iformat_fork()
  xfs: xfs_dir3_sfe_put_ino can be static
  xfs: introduce object readahead to log recovery
  xfs: Simplify xfs_ail_min() with list_first_entry_or_null()
  xfs: Register hotcpu notifier after initialization
  xfs: add xfs sb v4 support for dirent filetype field
  xfs: Add write support for dirent filetype field
  xfs: Add read-only support for dirent filetype field
  ...
2013-09-09 11:19:09 -07:00
Olof Johansson
45150c43b1 direct-io: Use return from cmpxchg to decide of assignment happened
Not using the return value can in the generic case be racy, so it's
in general good practice to check the return value instead.

This also resolved the warning caused on ARM and other architectures:

  fs/direct-io.c: In function 'sb_init_dio_done_wq':
  fs/direct-io.c:557:2: warning: value computed is not used [-Wunused-value]

Signed-off-by: Olof Johansson <olof@lixom.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: H Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-09 10:47:42 -07:00
Linus Torvalds
ef9a61bef9 Merge tag 'for-linus-20130909' of git://git.infradead.org/linux-mtd
Pull mtd updates from David Woodhouse:
 - factor out common code from MTD tests
 - nand-gpio cleanup and portability to non-ARM
 - m25p80 support for 4-byte addressing chips, other new chips
 - pxa3xx cleanup and support for new platforms
 - remove obsolete alauda, octagon-5066 drivers
 - erase/write support for bcm47xxsflash
 - improve detection of ECC requirements for NAND, controller setup
 - NFC acceleration support for atmel-nand, read/write via SRAM
 - etc

* tag 'for-linus-20130909' of git://git.infradead.org/linux-mtd: (184 commits)
  mtd: chips: Add support for PMC SPI Flash chips in m25p80.c
  mtd: ofpart: use for_each_child_of_node() macro
  mtd: mtdswap: replace strict_strtoul() with kstrtoul()
  mtd cs553x_nand: use kzalloc() instead of memset
  mtd: atmel_nand: fix error return code in atmel_nand_probe()
  mtd: bcm47xxsflash: writing support
  mtd: bcm47xxsflash: implement erasing support
  mtd: bcm47xxsflash: convert to module_platform_driver instead of init/exit
  mtd: bcm47xxsflash: convert kzalloc to avoid invalid access
  mtd: remove alauda driver
  mtd: nand: mxc_nand: mark 'const' properly
  mtd: maps: cfi_flagadm: add missing __iomem annotation
  mtd: spear_smi: add missing __iomem annotation
  mtd: r852: Staticize local symbols
  mtd: nandsim: Staticize local symbols
  mtd: impa7: add missing __iomem annotation
  mtd: sm_ftl: Staticize local symbols
  mtd: m25p80: add support for mr25h10
  mtd: m25p80: make CONFIG_M25PXX_USE_FAST_READ safe to enable
  mtd: m25p80: Pass flags through CAT25_INFO macro
  ...
2013-09-09 10:33:19 -07:00
Linus Torvalds
b5f0998cae Merge tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire updates from Stefan Richter:

 - Fix a regression since 3.2 inclusive: The subsystem workqueue
   deadlocked between transaction completion handling and bus reset
   handling if the worker pool could not be increased in time.

 - janitorial updates

* tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: ohci: Fix deadlock at bus reset
  firewire: ohci: Change module_pci_driver to module_init/module_exit
  firewire: ohci: beautify some macro definitions
  firewire: ohci: change confusing name of a struct member
  firewire: core: typecast from gfp_t to bool more safely
  firewire: WQ_NON_REENTRANT is meaningless and going away
2013-09-09 10:32:03 -07:00
Linus Torvalds
64c353864e Merge branch 'for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull DMA mapping update from Marek Szyprowski:
 "This contains an addition of Device Tree support for reserved memory
  regions (Contiguous Memory Allocator is one of the drivers for it) and
  changes required by the KVM extensions for PowerPC architectue"

* 'for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  ARM: init: add support for reserved memory defined by device tree
  drivers: of: add initialization code for dma reserved memory
  drivers: of: add function to scan fdt nodes given by path
  drivers: dma-contiguous: clean source code and prepare for device tree
2013-09-09 10:26:33 -07:00
Linus Torvalds
d8cacd3a25 Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio update from Rusty Russell:
 "More console fixes; these are the theoretical ones which didn't get
  CC:stable.  But for that reason, I did a merge with master partway
  through to avoid an unnecessary conflict.

  Also: a fun lguest bug turns out if you don't clear the TF flag when
  trapping Bad Things happen to the guest kernel as the stack
  overflows..."

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio_pci: pm: Use CONFIG_PM_SLEEP instead of CONFIG_PM
  lguest: fix GPF in guest when using gdb.
  lguest: fix guest kernel stack overflow when TF bit set.
  lguest: fix BUG_ON() in invalid guest page table.
  virtio: console: prevent use-after-free of port name in port unplug
  virtio: console: cleanup an error message
  virtio: console: fix locking around send_sigio_to_port()
  virtio: console: add locking in port unplug path
  virtio: console: add locks around buffer removal in port unplug path
  tools/lguest: offer VIRTIO_F_ANY_LAYOUT for net device.
  virtio tools: add .gitignore
  lguest: Point to the right directory for the lguest launcher
2013-09-09 10:20:54 -07:00
Linus Torvalds
d75671e36e Merge tag 'vfio-v3.12-rc0' of git://github.com/awilliam/linux-vfio
Pull VFIO update from Alex Williamson:
 "VFIO updates include safer default file flags for VFIO device fds, an
  external user interface exported to allow other modules to hold
  references to VFIO groups, a fix to test for extended config space on
  PCIe and PCI-x, and new hot reset interfaces for PCI devices which
  allows the user to do PCI bus/slot resets when all of the devices
  affected by the reset are owned by the user.

  For this last feature, the PCI bus reset interface, I depend on
  changes already merged from Bjorn's PCI pull request.  I therefore
  merged my tree up to commit cb3e433, which I think was the correct
  action, but as Stephen Rothwell noted, I failed to provide a commit
  message indicating why the merge was required.  Sorry for that.
  Thanks, Alex"

* tag 'vfio-v3.12-rc0' of git://github.com/awilliam/linux-vfio:
  vfio: fix documentation
  vfio-pci: PCI hot reset interface
  vfio-pci: Test for extended config space
  vfio-pci: Use fdget() rather than eventfd_fget()
  vfio: Add O_CLOEXEC flag to vfio device fd
  vfio: use get_unused_fd_flags(0) instead of get_unused_fd()
  vfio: add external user support
2013-09-09 10:19:36 -07:00
Linus Torvalds
bf97293eb8 Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - Fix NFSv4 recovery so that it doesn't recover lost locks in cases
     such as lease loss due to a network partition, where doing so may
     result in data corruption.  Add a kernel parameter to control
     choice of legacy behaviour or not.
   - Performance improvements when 2 processes are writing to the same
     file.
   - Flush data to disk when an RPCSEC_GSS session timeout is imminent.
   - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
     NFS clients from being able to manipulate our lease and file
     locking state.
   - Allow sharing of RPCSEC_GSS caches between different rpc clients.
   - Fix the broken NFSv4 security auto-negotiation between client and
     server.
   - Fix rmdir() to wait for outstanding sillyrename unlinks to complete
   - Add a tracepoint framework for debugging NFSv4 state recovery
     issues.
   - Add tracing to the generic NFS layer.
   - Add tracing for the SUNRPC socket connection state.
   - Clean up the rpc_pipefs mount/umount event management.
   - Merge more patches from Chuck in preparation for NFSv4 migration
     support"

* tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
  NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
  NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
  NFSv4: Allow security autonegotiation for submounts
  NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
  NFSv4: Fix security auto-negotiation
  NFS: Clean up nfs_parse_security_flavors()
  NFS: Clean up the auth flavour array mess
  NFSv4.1 Use MDS auth flavor for data server connection
  NFS: Don't check lock owner compatability unless file is locked (part 2)
  NFS: Don't check lock owner compatibility in writes unless file is locked
  nfs4: Map NFS4ERR_WRONG_CRED to EPERM
  nfs4.1: Add SP4_MACH_CRED write and commit support
  nfs4.1: Add SP4_MACH_CRED stateid support
  nfs4.1: Add SP4_MACH_CRED secinfo support
  nfs4.1: Add SP4_MACH_CRED cleanup support
  nfs4.1: Add state protection handler
  nfs4.1: Minimal SP4_MACH_CRED implementation
  SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
  SUNRPC: Add an identifier for struct rpc_clnt
  SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
  ...
2013-09-09 09:19:15 -07:00
Linus Torvalds
16d70e1529 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse bugfixes from Miklos Szeredi:
 "Just a bunch of bugfixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: use list_for_each_entry() for list traversing
  fuse: readdir: check for slash in names
  fuse: hotfix truncate_pagecache() issue
  fuse: invalidate inode attributes on xattr modification
  fuse: postpone end_page_writeback() in fuse_writepage_locked()
2013-09-09 09:18:23 -07:00
Linus Torvalds
6c337ad6cc Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull GFS2 updates from Steven Whitehouse:
 "This is possibly the smallest ever set of GFS2 patches for a merge
  window.  Also, most of them are bug fixes this time.

  Two of my three patches (moving gfs2_sync_meta and merging the two
  writepage implementations) are clean ups with the third (taking the
  glock ref in examine_bucket) being a fix for a difficult to hit race
  condition.

  The removal of an unused memory barrier is a clean up from Bob
  Peterson, and the "spectator" relates to a rarely used mount option.
  Ben Marzinski's patch fixes a corner case where the incorrect inode
  flags were being set, resulting in incorrect behaviour on fsync"

* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: dirty inode correctly in gfs2_write_end
  GFS2: Don't flag consistency error if first mounter is a spectator
  GFS2: Remove unnecessary memory barrier
  GFS2: Merge ordered and writeback writepage
  GFS2: Take glock reference in examine_bucket()
  GFS2: Move gfs2_sync_meta to lops.c
2013-09-09 09:16:51 -07:00
Linus Torvalds
6cccc7d301 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "This includes both the first pile of Ceph patches (which I sent to
  torvalds@vger, sigh) and a few new patches that add support for
  fscache for Ceph.  That includes a few fscache core fixes that David
  Howells asked go through the Ceph tree.  (Thanks go to Milosz Tanski
  for putting this feature together)

  This first batch of patches (included here) had (has) several
  important RBD bug fixes, hole punch support, several different
  cleanups in the page cache interactions, improvements in the truncate
  code (new truncate mutex to avoid shenanigans with i_mutex), and a
  series of fixes in the synchronous striping read/write code.

  On top of that is a random collection of small fixes all across the
  tree (error code checks and error path cleanup, obsolete wq flags,
  etc)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits)
  ceph: use d_invalidate() to invalidate aliases
  ceph: remove ceph_lookup_inode()
  ceph: trivial buildbot warnings fix
  ceph: Do not do invalidate if the filesystem is mounted nofsc
  ceph: page still marked private_2
  ceph: ceph_readpage_to_fscache didn't check if marked
  ceph: clean PgPrivate2 on returning from readpages
  ceph: use fscache as a local presisent cache
  fscache: Netfs function for cleanup post readpages
  FS-Cache: Fix heading in documentation
  CacheFiles: Implement interface to check cache consistency
  FS-Cache: Add interface to check consistency of a cached object
  rbd: fix null dereference in dout
  rbd: fix buffer size for writes to images with snapshots
  libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc
  rbd: fix I/O error propagation for reads
  ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem
  ceph: allow sync_read/write return partial successed size of read/write.
  ceph: fix bugs about handling short-read for sync read mode.
  ceph: remove useless variable revoked_rdcache
  ...
2013-09-09 09:13:22 -07:00
Linus Torvalds
255ae3fbd2 Merge tag 'metag-for-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
Pull metag architecture changes from James Hogan:
 - Device tree updates for TZ1090 GPIO drivers merged via GPIO tree.
 - Add driver for ImgTec PDC irqchip as found in TZ1090 SoC.
 - Add linux-metag mailing list to MAINTAINERS file.

* tag 'metag-for-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  irq-imgpdc: add ImgTec PDC irqchip driver
  MAINTAINERS: add linux-metag mailing list
  metag: tz1090: instantiate gpio-tz1090-pdc
  metag: tz1090: select and instantiate gpio-tz1090
  metag: tz1090: select and instantiate irq-imgpdc
2013-09-09 09:09:44 -07:00
Linus Torvalds
89c5a9461d Merge tag 'arc-v3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC changes from Vineet Gupta:

 - ARC MM changes:
    - preparation for MMUv4 (accomodate new PTE bits, new cmds)
    - Rework the ASID allocation algorithm to remove asid-mm reverse map
 - Boilerplate code consolidation in Exception Handlers
 - Disable FRAME_POINTER for ARC
 - Unaligned Access Emulation for Big-Endian from Noam
 - Bunch of fixes (udelay, missing accessors) from Mischa

* tag 'arc-v3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: fix new Section mismatches in build (post __cpuinit cleanup)
  Kconfig.debug: Add FRAME_POINTER anti-dependency for ARC
  ARC: Fix __udelay calculation
  ARC: remove console_verbose() from setup_arch()
  ARC: Add read*_relaxed to asm/io.h
  ARC: Handle un-aligned user space access in BE.
  ARC: [ASID] Track ASID allocation cycles/generations
  ARC: [ASID] activate_mm() == switch_mm()
  ARC: [ASID] get_new_mmu_context() to conditionally allocate new ASID
  ARC: [ASID] Refactor the TLB paranoid debug code
  ARC: [ASID] Remove legacy/unused debug code
  ARC: No need to flush the TLB in early boot
  ARC: MMUv4 preps/3 - Abstract out TLB Insert/Delete
  ARC: MMUv4 preps/2 - Reshuffle PTE bits
  ARC: MMUv4 preps/1 - Fold PTE K/U access flags
  ARC: Code cosmetics (Nothing semantical)
  ARC: Entry Handler tweaks: Optimize away redundant IRQ_DISABLE_SAVE
  ARC: Exception Handlers Code consolidation
  ARC: Add some .gitignore entries
2013-09-09 09:05:33 -07:00
Linus Torvalds
833ae40b51 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu fixes from Greg Ungerer:
 "Just a small collection of cleanups and fixes this time, no big
  changes.  The most interresting are to make the m68k and m68knommu
  consistently use CONFIG_IOMAP, clean out some unused board config
  options and flush the cache on signal stack creation"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: remove 16 unused boards in Kconfig.machine
  m68k: define 'VM_DATA_DEFAULT_FLAGS' no matter whether has 'NOMMU' or not
  m68knommu: user generic iomap to support ioread*/iowrite*
  m68k/coldfire: flush cache when creating the signal stack frame
  m68knommu: Mark functions only called from setup_arch() __init
2013-09-09 09:04:46 -07:00
Linus Torvalds
20e029d791 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
 "This pile contains mostly fixes and improvements for issues identified
  by Richard W M Jones while adding UML as backend to libguestfs"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Add irq chip um/mask handlers
  um: prctl: Do not include linux/ptrace.h
  um: Run UML in it's own session.
  um: Cleanup SIGTERM handling
  um: ubd: Introduce submit_request()
  um: ubd: Add REQ_FLUSH suppport
  um: Implement probe_kernel_read()
  um: hostfs: Fix writeback
2013-09-09 09:03:46 -07:00
Linus Torvalds
e5c832d555 vfs: fix dentry RCU to refcounting possibly sleeping dput()
This is the fix that the last two commits indirectly led up to - making
sure that we don't call dput() in a bad context on the dentries we've
looked up in RCU mode after the sequence count validation fails.

This basically expands d_rcu_to_refcount() into the callers, and then
fixes the callers to delay the dput() in the failure case until _after_
we've dropped all locks and are no longer in an RCU-locked region.

The case of 'complete_walk()' was trivial, since its failure case did
the unlock_rcu_walk() directly after the call to d_rcu_to_refcount(),
and as such that is just a pure expansion of the function with a trivial
movement of the resulting dput() to after 'unlock_rcu_walk()'.

In contrast, the unlazy_walk() case was much more complicated, because
not only does convert two different dentries from RCU to be reference
counted, but it used to not call unlock_rcu_walk() at all, and instead
just returned an error and let the caller clean everything up in
"terminate_walk()".

Happily, one of the dentries in question (called "parent" inside
unlazy_walk()) is the dentry of "nd->path", which terminate_walk() wants
a refcount to anyway for the non-RCU case.

So what the new and improved unlazy_walk() does is to first turn that
dentry into a refcounted one, and once that is set up, the error cases
can continue to use the terminate_walk() helper for cleanup, but for the
non-RCU case.  Which makes it possible to drop out of RCU mode if we
actually hit the sequence number failure case.

Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-08 18:13:49 -07:00
Aaron Lu
9e266ece21 virtio_pci: pm: Use CONFIG_PM_SLEEP instead of CONFIG_PM
The virtio_pci_freeze/restore are defined under CONFIG_PM but is used
by SET_SYSTEM_SLEEP_PM_OPS macro, which is defined under
CONFIG_PM_SLEEP. So if CONFIG_PM_SLEEP is not cofigured but
CONFIG_PM_RUNTIME is, the following warning message appeared:

drivers/virtio/virtio_pci.c:770:12: warning: ‘virtio_pci_freeze’ defined but not used [-Wunused-function]
 static int virtio_pci_freeze(struct device *dev)
            ^
drivers/virtio/virtio_pci.c:790:12: warning: ‘virtio_pci_restore’ defined but not used [-Wunused-function]
 static int virtio_pci_restore(struct device *dev)
            ^
Fix it by changing CONFIG_PM to CONFIG_PM_SLEEP.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-09-09 10:02:53 +09:30
Linus Torvalds
0d98439ea3 vfs: use lockred "dead" flag to mark unrecoverably dead dentries
This simplifies the RCU to refcounting code in particular.

I was originally intending to leave this for later, but walking through
all the dput() logic (see previous commit), I realized that the dput()
"might_sleep()" check was misleadingly weak.  And I removed it as
misleading, both for performance profiling and for debugging.

However, the might_sleep() debugging case is actually true: the final
dput() can indeed sleep, if the inode of the dentry that you are
releasing ends up sleeping at iput time (see dentry_iput()).  So the
problem with the might_sleep() in dput() wasn't that it wasn't true, it
was that it wasn't actually testing and triggering on the interesting
case.

In particular, just about *any* dput() can indeed sleep, if you happen
to race with another thread deleting the file in question, and you then
lose the race to the be the last dput() for that file.  But because it's
a very rare race, the debugging code would never trigger it in practice.

Why is this problematic? The new d_rcu_to_refcount() (see commit
15570086b5: "vfs: reimplement d_rcu_to_refcount() using
lockref_get_or_lock()") does a dput() for the failure case, and it does
it under the RCU lock.  So potentially sleeping really is a bug.

But there's no way I'm going to fix this with the previous complicated
"lockref_get_or_lock()" interface.  And rather than revert to the old
and crufty nested dentry locking code (which did get this right by
delaying the reference count updates until they were verified to be
safe), let's make forward progress.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-08 13:46:52 -07:00
Linus Torvalds
8aab6a2733 vfs: reorganize dput() memory accesses
This is me being a bit OCD after all the dentry optimization work this
merge window: profiles end up showing 'dput()' as a rather expensive
operation, and there were two unrelated bad reasons for that.

The first reason was reading d_lockref.count for debugging purposes,
which touches the lockref cacheline (for reads) before really need to.
More importantly, the debugging test in question is _wrong_, and has
hidden bugs.  It's true that we can only sleep when the count goes down
to zero, but the test as-is hides the much more subtle bug that happens
if we race with somebody else deleting the file.

Anyway we _will_ touch that cacheline, but let's do it for a write and
in the right routine (ie in "lockref_put_or_lock()") which annotates the
costs better.  So remove the misleading debug code.

The other was an unnecessary access to the cacheline that contains the
d_lru list, just to check whether we already were on the LRU list or
not.  This is exactly what we have d_flags for, so that we can avoid
touching extra cache lines for the common case.  So just add another bit
for "is this dentry on the LRU".

Finally, mark the tests properly likely/unlikely, so that the common
fast-paths are dense in the instruction stream.

This makes the profiles look much saner.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-08 13:26:18 -07:00