Commit Graph

141 Commits

Author SHA1 Message Date
Mikulas Patocka
ff236289a2 loop: fix crash when using unassigned loop device
commit ef7e7c82e0 upstream.

When the loop module is loaded, it creates 8 loop devices /dev/loop[0-7].
The devices have no request routine and thus, when they are used without
being assigned, a crash happens.

For example, these commands cause crash (assuming there are no used loop
devices):

Kernel Fault: Code=26 regs=000000007f420980 (Addr=0000000000000010)
CPU: 1 PID: 50 Comm: kworker/1:1 Not tainted 3.11.0 #1
Workqueue: ksnaphd do_metadata [dm_snapshot]
task: 000000007fcf4078 ti: 000000007f420000 task.ti: 000000007f420000
[  116.319988]
     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111100001111 Not tainted
r00-03  000000ff0804ff0f 00000000408bf5d0 00000000402d8204 000000007b7ff6c0
r04-07  00000000408a95d0 000000007f420950 000000007b7ff6c0 000000007d06c930
r08-11  000000007f4205c0 0000000000000001 000000007f4205c0 000000007f4204b8
r12-15  0000000000000010 0000000000000000 0000000000000000 0000000000000000
r16-19  000000001108dd48 000000004061cd7c 000000007d859800 000000000800000f
r20-23  0000000000000000 0000000000000008 0000000000000000 0000000000000000
r24-27  00000000ffffffff 000000007b7ff6c0 000000007d859800 00000000408a95d0
r28-31  0000000000000000 000000007f420950 000000007f420980 000000007f4208e8
sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000303000
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  117.549988]
IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d82fc 00000000402d8300
 IIR: 53820020    ISR: 0000000000000000  IOR: 0000000000000010
 CPU:        1   CR30: 000000007f420000 CR31: ffffffffffffffff
 ORIG_R28: 0000000000000001
 IAOQ[0]: generic_make_request+0x11c/0x1a0
 IAOQ[1]: generic_make_request+0x120/0x1a0
 RP(r2): generic_make_request+0x24/0x1a0
Backtrace:
 [<00000000402d83f0>] submit_bio+0x70/0x140
 [<0000000011087c4c>] dispatch_io+0x234/0x478 [dm_mod]
 [<0000000011087f44>] sync_io+0xb4/0x190 [dm_mod]
 [<00000000110883bc>] dm_io+0x2c4/0x310 [dm_mod]
 [<00000000110bfcd0>] do_metadata+0x28/0xb0 [dm_snapshot]
 [<00000000401591d8>] process_one_work+0x160/0x460
 [<0000000040159bc0>] worker_thread+0x300/0x478
 [<0000000040161a70>] kthread+0x118/0x128
 [<0000000040104020>] end_fault_vector+0x20/0x28
 [<0000000040177220>] task_tick_fair+0x420/0x4d0
 [<00000000401aa048>] invoke_rcu_core+0x50/0x60
 [<00000000401ad5b8>] rcu_check_callbacks+0x210/0x8d8
 [<000000004014aaa0>] update_process_times+0xa8/0xc0
 [<00000000401ab86c>] rcu_process_callbacks+0x4b4/0x598
 [<0000000040142408>] __do_softirq+0x250/0x2c0
 [<00000000401789d0>] find_busiest_group+0x3c0/0xc70
[  119.379988]
Kernel panic - not syncing: Kernel Fault
Rebooting in 1 seconds..

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-04 10:56:20 -08:00
Mikulas Patocka
6be5caa68a loop: fix crash if blk_alloc_queue fails
commit 3ec981e30f upstream.

loop: fix crash if blk_alloc_queue fails

If blk_alloc_queue fails, loop_add cleans up, but it doesn't clean up the
identifier allocated with idr_alloc. That causes crash on module unload in
idr_for_each(&loop_index_idr, &loop_exit_cb, NULL); where we attempt to
remove non-existed device with that id.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000380
IP: [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
PGD 43d399067 PUD 43d0ad067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: loop(-) dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_loop dm_mod ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_userspace cpufreq_stats cpufreq_ondemand cpufreq_conservative cpufreq_powersave spadfs fuse hid_generic usbhid hid raid0 md_mod dmi_sysfs nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc lm85 hwmon_vid snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore acpi_cpufreq ohci_hcd freq_table tg3 ehci_pci mperf ehci_hcd kvm_amd kvm sata_svw serverworks libphy libata ide_core k10temp usbcore hwmon microcode ptp pcspkr pps_core e100 skge mii usb_common i2c_piix4 floppy evdev rtc_cmos i2c_core processor but!
 ton unix
CPU: 7 PID: 2735 Comm: rmmod Tainted: G        W    3.10.15-devel #15
Hardware name: empty empty/S3992-E, BIOS 'V1.06   ' 06/09/2009
task: ffff88043d38e780 ti: ffff88043d21e000 task.ti: ffff88043d21e000
RIP: 0010:[<ffffffff812057c9>]  [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
RSP: 0018:ffff88043d21fe10  EFLAGS: 00010282
RAX: ffffffffa05102e0 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88043ea82800 RDI: 0000000000000000
RBP: ffff88043d21fe48 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000000ff
R13: 0000000000000080 R14: 0000000000000000 R15: ffff88043ea82800
FS:  00007ff646534700(0000) GS:ffff880447000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000380 CR3: 000000043e9bf000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffffffff8100aba4 0000000000000092 ffff88043d21fe48 ffff88043ea82800
 00000000000000ff ffff88043d21fe98 0000000000000000 ffff88043d21fe60
 ffffffffa05102b4 0000000000000000 ffff88043d21fe70 ffffffffa05102ec
Call Trace:
 [<ffffffff8100aba4>] ? native_sched_clock+0x24/0x80
 [<ffffffffa05102b4>] loop_remove+0x14/0x40 [loop]
 [<ffffffffa05102ec>] loop_exit_cb+0xc/0x10 [loop]
 [<ffffffff81217b74>] idr_for_each+0x104/0x190
 [<ffffffffa05102e0>] ? loop_remove+0x40/0x40 [loop]
 [<ffffffff8109adc5>] ? trace_hardirqs_on_caller+0x105/0x1d0
 [<ffffffffa05135dc>] loop_exit+0x34/0xa58 [loop]
 [<ffffffff810a98ea>] SyS_delete_module+0x13a/0x260
 [<ffffffff81221d5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f
Code: f0 4c 8b 6d f8 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 4c 8d af 80 00 00 00 41 54 53 48 89 fb 48 83 ec 18 <48> 83 bf 80 03 00
00 00 74 4d e8 98 fe ff ff 31 f6 48 c7 c7 20
RIP  [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
 RSP <ffff88043d21fe10>
CR2: 0000000000000380
---[ end trace 64ec069ec70f1309 ]---

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-04 10:56:18 -08:00
Mikulas Patocka
12e6e9c2a5 block: fix a probe argument to blk_register_region
commit a207f59376 upstream.

The probe function is supposed to return NULL on failure (as we can see in
kobj_lookup: kobj = probe(dev, index, data); ... if (kobj) return kobj;

However, in loop and brd, it returns negative error from ERR_PTR.

This causes a crash if we simulate disk allocation failure and run
less -f /dev/loop0 because the negative number is interpreted as a pointer:

BUG: unable to handle kernel NULL pointer dereference at 00000000000002b4
IP: [<ffffffff8118b188>] __blkdev_get+0x28/0x450
PGD 23c677067 PUD 23d6d1067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: loop hpfs nvidia(PO) ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_stats cpufreq_ondemand cpufreq_userspace cpufreq_powersave cpufreq_conservative hid_generic spadfs usbhid hid fuse raid0 snd_usb_audio snd_pcm_oss snd_mixer_oss md_mod snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib dmi_sysfs snd_rawmidi nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd soundcore lm85 hwmon_vid ohci_hcd ehci_pci ehci_hcd serverworks sata_svw libata acpi_cpufreq freq_table mperf ide_core usbcore kvm_amd kvm tg3 i2c_piix4 libphy microcode e100 usb_common ptp skge i2c_core pcspkr k10temp evdev floppy hwmon pps_core mii rtc_cmos button processor unix [last unloaded: nvidia]
CPU: 1 PID: 6831 Comm: less Tainted: P        W  O 3.10.15-devel #18
Hardware name: empty empty/S3992-E, BIOS 'V1.06   ' 06/09/2009
task: ffff880203cc6bc0 ti: ffff88023e47c000 task.ti: ffff88023e47c000
RIP: 0010:[<ffffffff8118b188>]  [<ffffffff8118b188>] __blkdev_get+0x28/0x450
RSP: 0018:ffff88023e47dbd8  EFLAGS: 00010286
RAX: ffffffffffffff74 RBX: ffffffffffffff74 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88023e47dc18 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023f519658
R13: ffffffff8118c300 R14: 0000000000000000 R15: ffff88023f519640
FS:  00007f2070bf7700(0000) GS:ffff880247400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002b4 CR3: 000000023da1d000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 0000000000000002 0000001d00000000 000000003e47dc50 ffff88023f519640
 ffff88043d5bb668 ffffffff8118c300 ffff88023d683550 ffff88023e47de60
 ffff88023e47dc98 ffffffff8118c10d 0000001d81605698 0000000000000292
Call Trace:
 [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
 [<ffffffff8118c10d>] blkdev_get+0x1dd/0x370
 [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
 [<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50
 [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
 [<ffffffff8118c365>] blkdev_open+0x65/0x80
 [<ffffffff8114d12e>] do_dentry_open.isra.18+0x23e/0x2f0
 [<ffffffff8114d214>] finish_open+0x34/0x50
 [<ffffffff8115e122>] do_last.isra.62+0x2d2/0xc50
 [<ffffffff8115eb58>] path_openat.isra.63+0xb8/0x4d0
 [<ffffffff81115a8e>] ? might_fault+0x4e/0xa0
 [<ffffffff8115f4f0>] do_filp_open+0x40/0x90
 [<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50
 [<ffffffff8116db85>] ? __alloc_fd+0xa5/0x1f0
 [<ffffffff8114e45f>] do_sys_open+0xef/0x1d0
 [<ffffffff8114e559>] SyS_open+0x19/0x20
 [<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f
Code: 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 89 d6 41 55 41 54 4c 8d 67 18 53 48 83 ec 18 89 75 cc e9 f2 00 00 00 0f 1f 44 00 00 <48> 8b 80 40 03 00 00 48 89 df 4c 8b 68 58 e8 d5
a4 07 00 44 89
RIP  [<ffffffff8118b188>] __blkdev_get+0x28/0x450
 RSP <ffff88023e47dbd8>
CR2: 00000000000002b4
---[ end trace bb7f32dbf02398dc ]---

The brd change should be backported to stable kernels starting with 2.6.25.
The loop change should be backported to stable kernels starting with 2.6.22.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 11:11:50 -08:00
Al Viro
db2a144bed block_device_operations->release() should return void
The value passed is 0 in all but "it can never happen" cases (and those
only in a couple of drivers) *and* it would've been lost on the way
out anyway, even if something tried to pass something meaningful.
Just don't bother.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-07 02:16:21 -04:00
Linus Torvalds
20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
Linus Torvalds
27387dd8c6 Merge tag 'for-linus-20130409' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "I've got a few smaller fixes queued up for 3.9 that should go in.  The
  major one is the loop regression, the others are nice fixes on their
  own though.  It contains:

   - Fix for unitialized var in the block sysfs code, courtesy of Arnd
     and gcc-4.8.

   - Two fixes for mtip32xx, fixing probe and command timeout.  Also a
     debug measure that could have waited for 3.10, but it's driver
     only, so I let it slip in.

   - Revert the loop partition cleanup fix, it could cause a deadlock on
     auto-teardown as part of umount.  The fix is clear, but at this
     point we just want to revert it and get a real fix in for 3.10."

* tag 'for-linus-20130409' of git://git.kernel.dk/linux-block:
  Revert "loop: cleanup partitions when detaching loop device"
  mtip32xx: fix two smatch warnings
  mtip32xx: Add debugfs entry device_status
  mtip32xx: return 0 from pci probe in case of rebuild
  mtip32xx: recovery from command timeout
  block: avoid using uninitialized value in from queue_var_store
2013-04-09 12:05:41 -07:00
Al Viro
03d95eb2f2 lift sb_start_write() out of ->write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:12:56 -04:00
Jens Axboe
c2fccc1c9f Revert "loop: cleanup partitions when detaching loop device"
This reverts commit 8761a3dc1f.

There are situations where the destruction path is called
with the bdev->bd_mutex already held, which then deadlocks in
loop_clr_fd(). The normal partition cleanup does a trylock()
on the mutex, but it'd be nice to have a more bullet proof
method in loop. So punt this more involved fix to the next
merge window, and just back out this buggy fix for now.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-08 10:12:11 +02:00
Anatol Pomozov
c1681bf8a7 loop: prevent bdev freeing while device in use
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".

But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
  bd_set_size+0x10/0xa0
  loop_clr_fd+0x1f8/0x420 [loop]
  lo_ioctl+0x200/0x7e0 [loop]
  lo_compat_ioctl+0x47/0xe0 [loop]
  compat_blkdev_ioctl+0x341/0x1290
  do_filp_open+0x42/0xa0
  compat_sys_ioctl+0xc1/0xf20
  do_sys_open+0x16e/0x1d0
  sysenter_dispatch+0x7/0x1a

To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().

The issue is reprodusible on current Linus head and v3.3. Here is the test:

  dd if=/dev/zero of=loop.file bs=1M count=1
  while [ true ]; do
    losetup /dev/loop0 loop.file
    echo 2 > /proc/sys/vm/drop_caches
    losetup -d /dev/loop0
  done

[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
  time we call loop_set_fd() we check that loop_device->lo_state is
  Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
  it will get EBUSY.  And if we try to loop_clr_fd() on unbound loop
  device we'll get ENXIO.

  loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
  loop_device->lo_ctl_mutex. ]

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-01 15:48:47 -07:00
Phillip Susi
8761a3dc1f loop: cleanup partitions when detaching loop device
Any partitions added by user space to the loop device were being
left in place after detaching the loop device.  This was because
the detach path issued a BLKRRPART to clean up partitions if
LO_FLAGS_PARTSCAN was set, meaning that the partitions were auto
scanned on attach.  Replace this BLKRRPART with code that
unconditionally cleans up partitions on detach instead.

Signed-off-by: Phillip Susi <psusi@ubuntu.com>

Modified by Jens to export delete_partition().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22 12:21:53 -06:00
Wei Yongjun
183cfb5720 loop: fix error return code in loop_add()
Fix to return a negative error code from the error handling
case, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22 08:59:19 -06:00
Linus Torvalds
f042fea0da Merge branch 'for-3.9/drivers' of git://git.kernel.dk/linux-block
Pull block driver bits from Jens Axboe:
 "After the block IO core bits are in, please grab the driver updates
  from below as well.  It contains:

   - Fix ancient regression in dac960.  Nobody must be using that
     anymore...

   - Some good fixes from Guo Ghao for loop, fixing both potential
     oopses and deadlocks.

   - Improve mtip32xx for NUMA systems, by being a bit more clever in
     distributing work.

   - Add IBM RamSan 70/80 driver.  A second round of fixes for that is
     pending, that will come in through for-linus during the 3.9 cycle
     as per usual.

   - A few xen-blk{back,front} fixes from Konrad and Roger.

   - Other minor fixes and improvements."

* 'for-3.9/drivers' of git://git.kernel.dk/linux-block:
  loopdev: ignore negative offset when calculate loop device size
  loopdev: remove an user triggerable oops
  loopdev: move common code into loop_figure_size()
  loopdev: update block device size in loop_set_status()
  loopdev: fix a deadlock
  xen-blkback: use balloon pages for persistent grants
  xen-blkfront: drop the use of llist_for_each_entry_safe
  xen/blkback: Don't trust the handle from the frontend.
  xen-blkback: do not leak mode property
  block: IBM RamSan 70/80 driver fixes
  rsxx: add slab.h include to dma.c
  drivers/block/mtip32xx: add missing GENERIC_HARDIRQS dependency
  block: remove new __devinit/exit annotations on ramsam driver
  block: IBM RamSan 70/80 device driver
  drivers/block/mtip32xx/mtip32xx.c:1726:5: sparse: symbol 'mtip_send_trim' was not declared. Should it be static?
  drivers/block/mtip32xx/mtip32xx.c:4029:1: sparse: symbol 'mtip_workq_sdbf0' was not declared. Should it be static?
  dac960: return success instead of -ENOTTY
  mtip32xx: add trim support
  mtip32xx: Add workqueue and NUMA support
  block: delete super ancient PC-XT driver for 1980's hardware
2013-02-28 13:16:07 -08:00
Tejun Heo
c718aa652d block/loop: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:15 -08:00
Tejun Heo
9d60916677 block/loop: don't use idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being
deprecated.  Drop its usage.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:13 -08:00
Al Viro
3dadecce20 switch vfs_getattr() to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:08 -05:00
Guo Chao
b7a1da695f loopdev: ignore negative offset when calculate loop device size
Negative offset may cause loop device size larger than backing file
size.

 $ fallocate -l 1M a
 $ losetup --offset 0xffffffffffff0000 /dev/loop0 a
 $ blockdev --getsize64 /dev/loop0
 1114112
 $ ls -l a
 -rw-r--r-- 1 root root 1048576 Jan 23 12:46 a
 $ cat /dev/loop0
 cat: /dev/loop0: Input/output error

It makes no sense to do that. Only apply offset when it's positive.

Fix a typo in the comment by the way.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: M. Hindess <hindessm@uk.ibm.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22 10:43:22 +01:00
Guo Chao
b1a6650406 loopdev: remove an user triggerable oops
When loopdev is built as module and we pass an invalid parameter,
loop_init() will return directly without deregister misc device, which
will cause an oops when insert loop module next time because we left some
garbage in the misc device list.

Test case:
sudo modprobe loop max_part=1024
(failed due to invalid parameter)
sudo modprobe loop
(oops)

Clean up nicely to avoid such oops.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: M. Hindess <hindessm@uk.ibm.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22 10:43:22 +01:00
Guo Chao
7b0576a3d8 loopdev: move common code into loop_figure_size()
Update block device size in accord with gendisk size and let userspace
know the change in loop_figure_size(). This is a clean up to remove
common code of loop_figure_size()'s two callers.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: M. Hindess <hindessm@uk.ibm.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22 10:43:22 +01:00
Guo Chao
541c742a75 loopdev: update block device size in loop_set_status()
Loop device driver sometimes fails to impose the size limit on the
device. Keep issuing following two commands:

losetup --offset 7517244416 --sizelimit 3224971264 /dev/loop0 backed_file
blockdev --getsize64 /dev/loop0

blockdev reports file size instead of sizelimit several out of 100 times.

The problems are:

	- losetup set up the device in two ioctl:
		  LOOP_SET_FD and LOOP_SET_STATUS64.

	- LOOP_SET_STATUS64 only update size of gendisk.

Block device size will be updated lazily when device comes to use. If udev
rushes in between the two ioctl, it will bring in a block device whose
size is backing file size. If the device is not released after
LOOP_SET_STATUS64 ioctl, blockdev will not see the updated size.

Update block size in LOOP_SET_STATUS64 ioctl.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Reported-by: M. Hindess <hindessm@uk.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22 10:43:22 +01:00
Guo Chao
5370019dc2 loopdev: fix a deadlock
bd_mutex and lo_ctl_mutex can be held in different order.

Path #1:

blkdev_open
 blkdev_get
  __blkdev_get (hold bd_mutex)
   lo_open (hold lo_ctl_mutex)

Path #2:

blkdev_ioctl
 lo_ioctl (hold lo_ctl_mutex)
  lo_set_capacity (hold bd_mutex)

Lockdep does not report it, because path #2 actually holds a subclass of
lo_ctl_mutex.  This subclass seems creep into the code by mistake.  The
patch author actually just mentioned it in the changelog, see commit
f028f3b2 ("loop: fix circular locking in loop_clr_fd()"), also see:

	http://marc.info/?l=linux-kernel&m=123806169129727&w=2

Path #2 hold bd_mutex to call bd_set_size(), I've protected it
with i_mutex in a previous patch, so drop bd_mutex at this site.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: M. Hindess <hindessm@uk.ibm.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22 10:43:21 +01:00
Linus Torvalds
9228ff9038 Merge branch 'for-3.8/drivers' of git://git.kernel.dk/linux-block
Pull block driver update from Jens Axboe:
 "Now that the core bits are in, here are the driver bits for 3.8.  The
  branch contains:

   - A huge pile of drbd bits that were dumped from the 3.7 merge
     window.  Following that, it was both made perfectly clear that
     there is going to be no more over-the-wall pulls and how the
     situation on individual pulls can be improved.

   - A few cleanups from Akinobu Mita for drbd and cciss.

   - Queue improvement for loop from Lukas.  This grew into adding a
     generic interface for waiting/checking an even with a specific
     lock, allowing this to be pulled out of md and now loop and drbd is
     also using it.

   - A few fixes for xen back/front block driver from Roger Pau Monne.

   - Partition improvements from Stephen Warren, allowing partiion UUID
     to be used as an identifier."

* 'for-3.8/drivers' of git://git.kernel.dk/linux-block: (609 commits)
  drbd: update Kconfig to match current dependencies
  drbd: Fix drbdsetup wait-connect, wait-sync etc... commands
  drbd: close race between drbd_set_role and drbd_connect
  drbd: respect no-md-barriers setting also when changed online via disk-options
  drbd: Remove obsolete check
  drbd: fixup after wait_even_lock_irq() addition to generic code
  loop: Limit the number of requests in the bio list
  wait: add wait_event_lock_irq() interface
  xen-blkfront: free allocated page
  xen-blkback: move free persistent grants code
  block: partition: msdos: provide UUIDs for partitions
  init: reduce PARTUUID min length to 1 from 36
  block: store partition_meta_info.uuid as a string
  cciss: use check_signature()
  cciss: cleanup bitops usage
  drbd: use copy_highpage
  drbd: if the replication link breaks during handshake, keep retrying
  drbd: check return of kmalloc in receive_uuids
  drbd: Broadcast sync progress no more often than once per second
  drbd: don't try to clear bits once the disk has failed
  ...
2012-12-17 13:39:11 -08:00
Lukas Czerner
7b5a35225b loop: Limit the number of requests in the bio list
Currently there is not limitation of number of requests in the loop bio
list. This can lead into some nasty situations when the caller spawns
tons of bio requests taking huge amount of memory. This is even more
obvious with discard where blkdev_issue_discard() will submit all bios
for the range and wait for them to finish afterwards. On really big loop
devices and slow backing file system this can lead to OOM situation as
reported by Dave Chinner.

With this patch we will wait in loop_make_request() if the number of
bios in the loop bio list would exceed 'nr_congestion_on'.
We'll wake up the process as we process the bios form the list. Some
threshold hysteresis is in place to avoid high frequency oscillation.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-11-30 11:48:05 +01:00
Dave Chinner
a1ecac3b06 loop: Make explicit loop device destruction lazy
xfstests has always had random failures of tests due to loop devices
failing to be torn down and hence leaving filesytems that cannot be
unmounted. This causes test runs to immediately stop.

Over the past 6 or 7 years we've added hacks like explicit unmount
-d commands for loop mounts, losetup -d after unmount -d fails, etc,
but still the problems persist.  Recently, the frequency of loop
related failures increased again to the point that xfstests 259 will
reliably fail with a stray loop device that was not torn down.

That is despite the fact the test is above as simple as it gets -
loop 5 or 6 times running mkfs.xfs with different paramters:

        lofile=$(losetup -f)
        losetup $lofile "$testfile"
        "$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null || echo "mkfs failed!"
        sync
        losetup -d $lofile

And losteup -d $lofile is failing with EBUSY on 1-3 of these loops
every time the test is run.

Turns out that blkid is running simultaneously with losetup -d, and
so it sees an elevated reference count and returns EBUSY.  But why
is blkid running? It's obvious, isn't it? udev has decided to try
and find out what is on the block device as a result of a creation
notification. And it is racing with mkfs, so might still be scanning
the device when mkfs finishes and we try to tear it down.

So, make losetup -d force autoremove behaviour. That is, when the
last reference goes away, tear down the device. xfstests wants it
*gone*, not causing random teardown failures when we know that all
the operations the tests have specifically run on the device have
completed and are no longer referencing the loop device.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-30 08:37:31 +01:00
Eric W. Biederman
e4849737f7 userns: Convert loop to use kuid_t instead of uid_t
Cc: Jens Axboe <jaxboe@fusionio.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:20 -07:00
Silva Paulo
68d740d79c blk: fix wrong idr_pre_get() error check in loop.c
The idr_pre_get() function never returns a value < 0.  It returns 0 (no
memory) or 1 (OK).

Reported-by: Silva Paulo <psdasilva@yahoo.com>
[ Rewrote Silva's patch, but attributing it to Silva anyway  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-14 15:39:58 -07:00