Commit Graph

504966 Commits

Author SHA1 Message Date
Thadeu Lima de Souza Cascardo 045c47ca30 blk-throttle: check stats_cpu before reading it from sysfs
When reading blkio.throttle.io_serviced in a recently created blkio
cgroup, it's possible to race against the creation of a throttle policy,
which delays the allocation of stats_cpu.

Like other functions in the throttle code, just checking for a NULL
stats_cpu prevents the following oops caused by that race.

[ 1117.285199] Unable to handle kernel paging request for data at address 0x7fb4d0020
[ 1117.285252] Faulting instruction address: 0xc0000000003efa2c
[ 1137.733921] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1137.733945] SMP NR_CPUS=2048 NUMA PowerNV
[ 1137.734025] Modules linked in: bridge stp llc kvm_hv kvm binfmt_misc autofs4
[ 1137.734102] CPU: 3 PID: 5302 Comm: blkcgroup Not tainted 3.19.0 #5
[ 1137.734132] task: c000000f1d188b00 ti: c000000f1d210000 task.ti: c000000f1d210000
[ 1137.734167] NIP: c0000000003efa2c LR: c0000000003ef9f0 CTR: c0000000003ef980
[ 1137.734202] REGS: c000000f1d213500 TRAP: 0300   Not tainted  (3.19.0)
[ 1137.734230] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 42008884  XER: 20000000
[ 1137.734325] CFAR: 0000000000008458 DAR: 00000007fb4d0020 DSISR: 40000000 SOFTE: 0
GPR00: c0000000003ed3a0 c000000f1d213780 c000000000c59538 0000000000000000
GPR04: 0000000000000800 0000000000000000 0000000000000000 0000000000000000
GPR08: ffffffffffffffff 00000007fb4d0020 00000007fb4d0000 c000000000780808
GPR12: 0000000022000888 c00000000fdc0d80 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 000001003e120200 c000000f1d5b0cc0 0000000000000200 0000000000000000
GPR24: 0000000000000001 c000000000c269e0 0000000000000020 c000000f1d5b0c80
GPR28: c000000000ca3a08 c000000000ca3dec c000000f1c667e00 c000000f1d213850
[ 1137.734886] NIP [c0000000003efa2c] .tg_prfill_cpu_rwstat+0xac/0x180
[ 1137.734915] LR [c0000000003ef9f0] .tg_prfill_cpu_rwstat+0x70/0x180
[ 1137.734943] Call Trace:
[ 1137.734952] [c000000f1d213780] [d000000005560520] 0xd000000005560520 (unreliable)
[ 1137.734996] [c000000f1d2138a0] [c0000000003ed3a0] .blkcg_print_blkgs+0xe0/0x1a0
[ 1137.735039] [c000000f1d213960] [c0000000003efb50] .tg_print_cpu_rwstat+0x50/0x70
[ 1137.735082] [c000000f1d2139e0] [c000000000104b48] .cgroup_seqfile_show+0x58/0x150
[ 1137.735125] [c000000f1d213a70] [c0000000002749dc] .kernfs_seq_show+0x3c/0x50
[ 1137.735161] [c000000f1d213ae0] [c000000000218630] .seq_read+0xe0/0x510
[ 1137.735197] [c000000f1d213bd0] [c000000000275b04] .kernfs_fop_read+0x164/0x200
[ 1137.735240] [c000000f1d213c80] [c0000000001eb8e0] .__vfs_read+0x30/0x80
[ 1137.735276] [c000000f1d213cf0] [c0000000001eb9c4] .vfs_read+0x94/0x1b0
[ 1137.735312] [c000000f1d213d90] [c0000000001ebb38] .SyS_read+0x58/0x100
[ 1137.735349] [c000000f1d213e30] [c000000000009218] syscall_exit+0x0/0x98
[ 1137.735383] Instruction dump:
[ 1137.735405] 7c6307b4 7f891800 409d00b8 60000000 60420000 3d420004 392a63b0 786a1f24
[ 1137.735471] 7d49502a e93e01c8 7d495214 7d2ad214 <7cead02a> e9090008 e9490010 e9290018

And here is one code that allows to easily reproduce this, although this
has first been found by running docker.

void run(pid_t pid)
{
	int n;
	int status;
	int fd;
	char *buffer;
	buffer = memalign(BUFFER_ALIGN, BUFFER_SIZE);
	n = snprintf(buffer, BUFFER_SIZE, "%d\n", pid);
	fd = open(CGPATH "/test/tasks", O_WRONLY);
	write(fd, buffer, n);
	close(fd);
	if (fork() > 0) {
		fd = open("/dev/sda", O_RDONLY | O_DIRECT);
		read(fd, buffer, 512);
		close(fd);
		wait(&status);
	} else {
		fd = open(CGPATH "/test/blkio.throttle.io_serviced", O_RDONLY);
		n = read(fd, buffer, BUFFER_SIZE);
		close(fd);
	}
	free(buffer);
	exit(0);
}

void test(void)
{
	int status;
	mkdir(CGPATH "/test", 0666);
	if (fork() > 0)
		wait(&status);
	else
		run(getpid());
	rmdir(CGPATH "/test");
}

int main(int argc, char **argv)
{
	int i;
	for (i = 0; i < NR_TESTS; i++)
		test();
	return 0;
}

Reported-by: Ricardo Marin Matinata <rmm@br.ibm.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-02-20 22:11:58 -08:00
Linus Torvalds 3d883483dc Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull more thermal managament updates from Zhang Rui:
 "Specifics:

   - Exynos thermal driver refactoring.  Several cleanups, code
     optimization, unused symbols removal, and unused feature removal in
     Exynos thermal driver.  Thanks Lukasz for this effort.

   - Exynos thermal driver support to OF thermal.  After the code
     refactoring, the driver earned the support to OF thermal.  Chip
     thermal data were moved from driver code to DTS, reducing the code
     footprint.  Thanks Lukasz for this.

   - After receiving the OF thermal support, the exynos thermal driver
     now must allow modular build.  Thanks Arnd for detecting, reporting
     and fixing this.

   - Exynos thermal driver support to Exynos 7 SoC.  Thanks Abhilash for
     this.

   - Accurate temperature reporting on Rockchip thermal driver, thanks
     to Caesar.

   - Fix on how OF thermal enables its zones, thanks Lukasz for fixing.

   - Fixes in OF thermal examples under Documentation/.  Thanks Srinivas
     for fixing"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  thermal: exynos: Add TMU support for Exynos7 SoC
  dts: Documentation: Add documentation for Exynos7 SoC thermal bindings
  cpufreq: exynos: allow modular build
  thermal: Fix examples in DT documentation
  thermal: exynos: Correct sanity check at exynos_report_trigger() function
  thermal: Kconfig: Remove config for not used EXYNOS_THERMAL_CORE
  thermal: exynos: Remove exynos_tmu_data.c file
  thermal: rockchip: make temperature reporting much more accurate
  thermal: exynos: Remove exynos_thermal_common.[c|h] files
  thermal: samsung: core: Exynos TMU rework to use device tree for configuration
  dts: Documentation: Update exynos-thermal.txt example for Exynos5440
  dts: Documentation: Extending documentation entry for exynos-thermal
  cpufreq: exynos: Use device tree to determine if cpufreq cooling should be registered
  thermal: exynos: Modify exynos thermal code to use device tree for cpu cooling configuration
  thermal: exynos: Provide thermal_exynos.h file to be included in device tree files
  thermal: exynos: cosmetic: Correct comment format
  thermal: of: Enable thermal_zoneX when sensor is correctly added
2015-02-19 17:51:22 -08:00
David Vrabel e3a1f6cac1 x86: pte_protnone() and pmd_protnone() must check entry is not present
Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if
_PAGE_PRESENT is clear.  Make pte_protnone() and pmd_protnone() check
for this.

This fixes a 64-bit Xen PV guest regression introduced by 8a0516ed8b
("mm: convert p[te|md]_numa users to p[te|md]_protnone_numa").  Any
userspace process would endlessly fault.

In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL set
by the hypervisor.  This meant that any fault on a present userspace
entry (e.g., a write to a read-only mapping) would be misinterpreted as
a NUMA hinting fault and the fault would not be correctly handled,
resulting in the access endlessly faulting.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-19 15:04:49 -08:00
Linus Torvalds 2b9fb532d4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This pull is mostly cleanups and fixes:

   - The raid5/6 cleanups from Zhao Lei fixup some long standing warts
     in the code and add improvements on top of the scrubbing support
     from 3.19.

   - Josef has round one of our ENOSPC fixes coming from large btrfs
     clusters here at FB.

   - Dave Sterba continues a long series of cleanups (thanks Dave), and
     Filipe continues hammering on corner cases in fsync and others

  This all was held up a little trying to track down a use-after-free in
  btrfs raid5/6.  It's not clear yet if this is just made easier to
  trigger with this pull or if its a new bug from the raid5/6 cleanups.
  Dave Sterba is the only one to trigger it so far, but he has a
  consistent way to reproduce, so we'll get it nailed shortly"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (68 commits)
  Btrfs: don't remove extents and xattrs when logging new names
  Btrfs: fix fsync data loss after adding hard link to inode
  Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group
  Btrfs: account for large extents with enospc
  Btrfs: don't set and clear delalloc for O_DIRECT writes
  Btrfs: only adjust outstanding_extents when we do a short write
  btrfs: Fix out-of-space bug
  Btrfs: scrub, fix sleep in atomic context
  Btrfs: fix scheduler warning when syncing log
  Btrfs: Remove unnecessary placeholder in btrfs_err_code
  btrfs: cleanup init for list in free-space-cache
  btrfs: delete chunk allocation attemp when setting block group ro
  btrfs: clear bio reference after submit_one_bio()
  Btrfs: fix scrub race leading to use-after-free
  Btrfs: add missing cleanup on sysfs init failure
  Btrfs: fix race between transaction commit and empty block group removal
  btrfs: add more checks to btrfs_read_sys_array
  btrfs: cleanup, rename a few variables in btrfs_read_sys_array
  btrfs: add checks for sys_chunk_array sizes
  btrfs: more superblock checks, lower bounds on devices and sectorsize/nodesize
  ...
2015-02-19 14:36:00 -08:00
Linus Torvalds 4533f6e27a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph changes from Sage Weil:
 "On the RBD side, there is a conversion to blk-mq from Christoph,
  several long-standing bug fixes from Ilya, and some cleanup from
  Rickard Strandqvist.

  On the CephFS side there is a long list of fixes from Zheng, including
  improved session handling, a few IO path fixes, some dcache management
  correctness fixes, and several blocking while !TASK_RUNNING fixes.

  The core code gets a few cleanups and Chaitanya has added support for
  TCP_NODELAY (which has been used on the server side for ages but we
  somehow missed on the kernel client).

  There is also an update to MAINTAINERS to fix up some email addresses
  and reflect that Ilya and Zheng are doing most of the maintenance for
  RBD and CephFS these days.  Do not be surprised to see a pull request
  come from one of them in the future if I am unavailable for some
  reason"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
  MAINTAINERS: update Ceph and RBD maintainers
  libceph: kfree() in put_osd() shouldn't depend on authorizer
  libceph: fix double __remove_osd() problem
  rbd: convert to blk-mq
  ceph: return error for traceless reply race
  ceph: fix dentry leaks
  ceph: re-send requests when MDS enters reconnecting stage
  ceph: show nocephx_require_signatures and notcp_nodelay options
  libceph: tcp_nodelay support
  rbd: do not treat standalone as flatten
  ceph: fix atomic_open snapdir
  ceph: properly mark empty directory as complete
  client: include kernel version in client metadata
  ceph: provide seperate {inode,file}_operations for snapdir
  ceph: fix request time stamp encoding
  ceph: fix reading inline data when i_size > PAGE_SIZE
  ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)
  ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps)
  ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)
  rbd: fix error paths in rbd_dev_refresh()
  ...
2015-02-19 14:14:42 -08:00
Linus Torvalds 89d3fa45b4 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal managament updates from Zhang Rui:
 "Specifics:

   - Abstract the code and introduce helper functions for all int340x
     thermal drivers.  From: Srinivas Pandruvada.

   - Reorganize the ACPI LPAT table support code so that it can be
     shared for both ACPI PMIC driver and int340x thermal driver.

   - Add support for Braswell in intel_soc_dts thermal driver.

   - a couple of small fixes/cleanups for step_wise governor and int340x
     thermal driver"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  Thermal/int340x_thermal: remove unused uuids.
  thermal: step_wise: spelling fixes
  thermal: int340x: fix sparse warning
  Thermal/int340x: LPAT conversion for temperature
  ACPI / PMIC: Use common LPAT table handling functions
  ACPI / LPAT: Common table processing functions
  thermal: Intel SoC DTS: Add Braswell support
  Thermal/int340x/int3402: Provide notification support
  Thermal/int340x/processor_thermal: Add thermal zone support
  Thermal/int340x/int3403: Use int340x thermal API
  Thermal/int340x/int3402: Use int340x thermal API
  Thermal/int340x: Add common thermal zone handler
2015-02-19 11:28:36 -08:00
Linus Torvalds 477ea11696 Merge tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull two EDAC fixes from Borislav Petkov:

 - A fix to sb_edac for proper detection on SNB machines

 - A fix to amd64_edac to not explode on Numascale machines with more
   than 16 memory controllers, from Daniel J Blueman.

* tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, amd64_edac: Prevent OOPS with >16 memory controllers
  sb_edac: Fix detection on SNB machines
2015-02-19 11:18:14 -08:00
Linus Torvalds 6ed3e57fd2 Merge tag 'platform-drivers-x86-v3.20-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull platform driver update from Darren Hart:
 "This includes a significant update to the toshiba_acpi driver,
  bringing it to feature parity with the Windows driver, followed by
  some needed cleanups.

  The other changes are mostly minor updates, quirks, sparse fixes, or
  cleanups.

  Details:

   - toshiba_acpi:
       Add support for missing features from the Windows driver, bump the
       sysfs version, and clean up the driver.

   - thinkpad_acpi:
       BIOS string versions, unhandled hkey events.

   - msamsung-laptop:
       Add native backlight quirk, enable better lid handling.

   - intel_scu_ipc:
       Read resources from PCI configuration

   - other:
       Fix sparse warnings, general cleanups"

* tag 'platform-drivers-x86-v3.20-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: (34 commits)
  toshiba_acpi: Cleanup GPL header
  toshiba_acpi: Cleanup comment blocks and capitalization
  toshiba_acpi: Make use of DEVICE_ATTR_{RO, RW} macros
  toshiba_acpi: Drop the toshiba_ prefix from sysfs function names
  toshiba_acpi: Move sysfs function and struct declarations further down
  Documentation/ABI: Add file describing the sysfs entries for toshiba_acpi
  toshiba_acpi: Clean file according to coding style
  toshiba_acpi: Bump version number to 0.21
  toshiba_acpi: Add support to enable/disable USB 3
  toshiba_acpi: Add support for Panel Power ON
  toshiba_acpi: Add support for Keyboard functions mode
  toshiba_acpi: Add fan entry to sysfs
  toshiba_acpi: Add version entry to sysfs
  thinkpad_acpi: support new BIOS version string pattern
  thinkpad_acpi: unhandled hkey event
  toshiba_acpi: Make toshiba_eco_mode_available more robust
  classmate-laptop: Fix sparse warning (0 as NULL)
  Sony-laptop: Fix sparse warning (make undeclared var static)
  thinkpad_acpi.c: Fix sparse warning (make undeclared var static)
  samsung-laptop.c: Prefer kstrtoint over single variable sscanf
  ...
2015-02-19 10:56:51 -08:00
Linus Torvalds b11a278397 Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kconfig updates from Michal Marek:
 "Yann E Morin was supposed to take over kconfig maintainership, but
  this hasn't happened.  So I'm sending a few kconfig patches that I
  collected:

   - Fix for missing va_end in kconfig
   - merge_config.sh displays used if given too few arguments
   - s/boolean/bool/ in Kconfig files for consistency, with the plan to
     only support bool in the future"

* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kconfig: use va_end to match corresponding va_start
  merge_config.sh: Display usage if given too few arguments
  kconfig: use bool instead of boolean for type definition attributes
2015-02-19 10:36:45 -08:00
Linus Torvalds 7734334337 Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull misc kbuild changes from Michal Marek:
 "Just a few non-critical kbuild changes:

   - builddeb adds the actual distribution name in the changelog
   - documentation fixes"

* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: trivial - fix the help doc of CONFIG_CC_OPTIMIZE_FOR_SIZE
  kbuild: Update documentation of clean-files and clean-dirs
  builddeb: Try to determine distribution
  builddeb: Update year and git repository URL in debian/copyright
2015-02-19 10:31:37 -08:00
Sage Weil 0f5417cea6 MAINTAINERS: update Ceph and RBD maintainers
- add Ilya, drop Yehuda as an RBD maintainer
- add Zheng as a Ceph maintainer
- update Yehuda and Sage's emails

Signed-off-by: Sage Weil <sage@redhat.com>
2015-02-19 10:11:20 -08:00
Linus Torvalds 27a22ee4c7 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - several cleanups in kbuild

 - serialize multiple *config targets so that 'make defconfig kvmconfig'
   works

 - The cc-ifversion macro got support for an else-branch

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild,gcov: simplify kernel/gcov/Makefile more
  kbuild: allow cc-ifversion to have the argument for false condition
  kbuild,gcov: simplify kernel/gcov/Makefile
  kbuild,gcov: remove unnecessary workaround
  kbuild: do not add $(call ...) to invoke cc-version or cc-fullversion
  kbuild: fix cc-ifversion macro
  kbuild: drop $(version_h) from MRPROPER_FILES
  kbuild: use mixed-targets when two or more config targets are given
  kbuild: remove redundant line from bounds.h/asm-offsets.h
  kbuild: merge bounds.h and asm-offsets.h rules
  kbuild: Drop support for clean-rule
2015-02-19 10:07:08 -08:00
Ilya Dryomov b28ec2f37e libceph: kfree() in put_osd() shouldn't depend on authorizer
a255651d4c ("ceph: ensure auth ops are defined before use") made
kfree() in put_osd() conditional on the authorizer.  A mechanical
mistake most likely - fix it.

Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19 14:27:51 +03:00
Ilya Dryomov 7eb71e0351 libceph: fix double __remove_osd() problem
It turns out it's possible to get __remove_osd() called twice on the
same OSD.  That doesn't sit well with rb_erase() - depending on the
shape of the tree we can get a NULL dereference, a soft lockup or
a random crash at some point in the future as we end up touching freed
memory.  One scenario that I was able to reproduce is as follows:

            <osd3 is idle, on the osd lru list>
<con reset - osd3>
con_fault_finish()
  osd_reset()
                              <osdmap - osd3 down>
                              ceph_osdc_handle_map()
                                <takes map_sem>
                                kick_requests()
                                  <takes request_mutex>
                                  reset_changed_osds()
                                    __reset_osd()
                                      __remove_osd()
                                  <releases request_mutex>
                                <releases map_sem>
    <takes map_sem>
    <takes request_mutex>
    __kick_osd_requests()
      __reset_osd()
        __remove_osd() <-- !!!

A case can be made that osd refcounting is imperfect and reworking it
would be a proper resolution, but for now Sage and I decided to fix
this by adding a safe guard around __remove_osd().

Fixes: http://tracker.ceph.com/issues/8087

Cc: Sage Weil <sage@redhat.com>
Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc53e: libceph: assert both regular and lingering lists in __remove_osd()
Cc: stable@vger.kernel.org # 3.9+: cc9f1f518c: libceph: change from BUG to WARN for __remove_osd() asserts
Cc: stable@vger.kernel.org # 3.9+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19 14:27:50 +03:00
Christoph Hellwig 7ad18afad0 rbd: convert to blk-mq
This converts the rbd driver to use the blk-mq infrastructure.  Except
for switching to a per-request work item this is almost mechanical.

This was tested by Alexandre DERUMIER in November, and found to give
him 120000 iops, although the only comparism available was an old
3.10 kernel which gave 80000iops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <elder@linaro.org>
[idryomov@gmail.com: context, blk_mq_init_queue() EH]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-02-19 14:27:42 +03:00
Yan, Zheng 4d41cef279 ceph: return error for traceless reply race
When we receives traceless reply for request that created new inode,
we re-send a lookup request to MDS get information of the newly created
inode. (VFS expects FS' callback return an inode in create case)
This breaks one request into two requests. Other client may modify or
move to the new inode in the middle.

When the race happens, ceph_handle_notrace_create() unconditionally
links the dentry for 'create' operation to the inode returned by lookup.
This may confuse VFS when the inode is a directory (VFS does not allow
multiple linkages for directory inode).

This patch makes ceph_handle_notrace_create() when it detect a race.
This event should be rare and it happens only when we talk to old MDS.
Recent MDS does not send traceless reply for request that creates new
inode.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:40 +03:00
Yan, Zheng 5cba372c0f ceph: fix dentry leaks
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:40 +03:00
Yan, Zheng 3de22be677 ceph: re-send requests when MDS enters reconnecting stage
So that MDS can check if any request is already completed and process
completed requests in clientreplay stage. When completed requests are
processed in clientreplay stage, MDS can avoid sending traceless
replies.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:40 +03:00
Ilya Dryomov 2a0b61cefc ceph: show nocephx_require_signatures and notcp_nodelay options
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19 13:31:40 +03:00
Chaitanya Huilgol ba988f87f5 libceph: tcp_nodelay support
TCP_NODELAY socket option set on connection sockets,
disables Nagle’s algorithm and improves latency characteristics.
tcp_nodelay(default)/notcp_nodelay option flags provided to
enable/disable setting the socket option.

Signed-off-by: Chaitanya Huilgol <chaitanya.huilgol@sandisk.com>
[idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments]
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19 13:31:40 +03:00
Ilya Dryomov cf32bd9c86 rbd: do not treat standalone as flatten
If the clone is resized down to 0, it becomes standalone.  If such
resize is carried over while an image is mapped we would detect this
and call rbd_dev_parent_put() which means "let go of all parent state,
including the spec(s) of parent images(s)".  This leads to a mismatch
between "rbd info" and sysfs parent fields, so a fix is in order.

    # rbd create --image-format 2 --size 1 foo
    # rbd snap create foo@snap
    # rbd snap protect foo@snap
    # rbd clone foo@snap bar
    # DEV=$(rbd map bar)
    # rbd resize --allow-shrink --size 0 bar
    # rbd resize --size 1 bar
    # rbd info bar | grep parent
            parent: rbd/foo@snap

Before:

    # cat /sys/bus/rbd/devices/0/parent
    (no parent image)

After:

    # cat /sys/bus/rbd/devices/0/parent
    pool_id 0
    pool_name rbd
    image_id 10056b8b4567
    image_name foo
    snap_id 2
    snap_name snap
    overlap 0

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19 13:31:39 +03:00
Yan, Zheng bf91c31508 ceph: fix atomic_open snapdir
ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value
and creates snapdir inode if it's -ENOENT

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:39 +03:00
Yan, Zheng 2f92b3d0a9 ceph: properly mark empty directory as complete
ceph_add_cap() calls __check_cap_issue(), which clears directory
inode' complete flag. so we should set the complete flag for empty
directory should be set after calling ceph_add_cap().

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:39 +03:00
Yan, Zheng a6a5ce4f0d client: include kernel version in client metadata
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:39 +03:00
Yan, Zheng 38c48b5f0a ceph: provide seperate {inode,file}_operations for snapdir
remove all unsupported operations from {inode,file}_operations.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:39 +03:00