Commit Graph

361374 Commits

Author SHA1 Message Date
Kent Overstreet 2f477877f8 block: Remove some unnecessary bi_vcnt usage
More prep work for immutable bvecs/effecient bio splitting - usage of
bi_vcnt has to be auditing, so getting rid of all the unnecessary usage
makes that easier.

Plus, bio_segments() is really what this code wanted, as it respects the
current value of bi_idx.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Eric Moore <Eric.Moore@lsi.com>
CC: "James E.J. Bottomley" <JBottomley@parallels.com>
CC: linux-scsi@vger.kernel.org
2013-03-23 14:15:31 -07:00
Kent Overstreet 4f2ac93c17 block: Remove bi_idx references
For immutable bvecs, all bi_idx usage needs to be audited - so here
we're removing all the unnecessary uses.

Most of these are places where it was being initialized on a bio that
was just allocated, a few others are conversions to standard macros.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
2013-03-23 14:15:31 -07:00
Kent Overstreet 5b83636ae3 block: Change bio_split() to respect the current value of bi_idx
In the current code bio_split() won't be seeing partially completed bios
so this doesn't change any behaviour, but this makes the code a bit
clearer as to what bio_split() actually requires.

The immediate purpose of the patch is removing unnecessary bi_idx
references, but the end goal is to allow partial completed bios to be
submitted, which along with immutable biovecs enables effecient bio
splitting.

Some of the callers were (double) checking that bios could be split, so
update their checks too.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
CC: Neil Brown <neilb@suse.de>
CC: Martin K. Petersen <martin.petersen@oracle.com>
2013-03-23 14:15:30 -07:00
Kent Overstreet aa8b57aa3d block: Use bio_sectors() more consistently
Bunch of places in the code weren't using it where they could be -
this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
into a struct bvec_iter.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: "Ed L. Cashin" <ecashin@coraid.com>
CC: Nick Piggin <npiggin@kernel.dk>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Jim Paris <jim@jtan.com>
CC: Geoff Levand <geoff@infradead.org>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Neil Brown <neilb@suse.de>
CC: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Ed Cashin <ecashin@coraid.com>
2013-03-23 14:15:30 -07:00
Kent Overstreet f73a1c7d11 block: Add bio_end_sector()
Just a little convenience macro - main reason to add it now is preparing
for immutable bio vecs, it'll reduce the size of the patch that puts
bi_sector/bi_size/bi_idx into a struct bvec_iter.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Neil Brown <neilb@suse.de>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: linux-s390@vger.kernel.org
CC: Chris Mason <chris.mason@fusionio.com>
CC: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2013-03-23 14:15:29 -07:00
Kent Overstreet fb9e353476 md: Convert md_trim_bio() to use bio_advance()
Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
Acked-by: NeilBrown <neilb@suse.de>
2013-03-23 14:15:28 -07:00
Kent Overstreet f79ea41614 block: Refactor blk_update_request()
Converts it to use bio_advance(), simplifying it quite a bit in the
process.

Note that req_bio_endio() now always calls bio_advance() - which means
it always loops over the biovec, not just on partial completions. Don't
expect it to affect performance, but worth noting.

Tested it by forcing partial updates, and dumping before and after on
various bio/bvec fields when doing a partial update.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
2013-03-23 14:15:28 -07:00
Kent Overstreet 054bdf646e block: Add bio_advance()
This is prep work for immutable bio vecs; we first want to centralize
where bvecs are modified.

Next two patches convert some existing code to use this function.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
2013-03-23 14:15:27 -07:00
Kent Overstreet 9f060e2231 block: Convert integrity to bvec_alloc_bs()
This adds a pointer to the bvec array to struct bio_integrity_payload,
instead of the bvecs always being inline; then the bvecs are allocated
with bvec_alloc_bs().

Changed bvec_alloc_bs() and bvec_free_bs() to take a pointer to a
mempool instead of the bioset, so that bio integrity can use a different
mempool for its bvecs, and thus avoid a potential deadlock.

This is eventually for immutable bio vecs - immutable bvecs aren't
useful if we still have to copy them, hence the need for the pointer.
Less code is always nice too, though.

Also, bio_integrity_alloc() was using fs_bio_set if no bio_set was
specified. This was wrong - using the bio_set doesn't protect us from
memory allocation failures, because we just used kmalloc for the
bio_integrity_payload. But it does introduce the possibility of
deadlock, if for some reason we weren't supposed to be using fs_bio_set.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Martin K. Petersen <martin.petersen@oracle.com>
2013-03-23 14:15:27 -07:00
Kent Overstreet 6fda981caf block: Fix a buffer overrun in bio_integrity_split()
bio_integrity_split() seemed to be confusing pointers and arrays -
bip_vec in bio_integrity_payload was an array appended to the end of the
payload, so the bio_vecs in struct bio_pair should have come after the
bio_integrity_payload they're for.

Fix it by making bip_vec a pointer to the inline vecs - a later patch is
going to make more use of this pointer.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Martin K. Petersen <martin.petersen@oracle.com>
2013-03-23 14:15:26 -07:00
Kent Overstreet df2cb6daa4 block: Avoid deadlocks with bio allocation by stacking drivers
Previously, if we ever try to allocate more than once from the same bio
set while running under generic_make_request() (i.e. a stacking block
driver), we risk deadlock.

This is because of the code in generic_make_request() that converts
recursion to iteration; any bios we submit won't actually be submitted
(so they can complete and eventually be freed) until after we return -
this means if we allocate a second bio, we're blocking the first one
from ever being freed.

Thus if enough threads call into a stacking block driver at the same
time with bios that need multiple splits, and the bio_set's reserve gets
used up, we deadlock.

This can be worked around in the driver code - we could check if we're
running under generic_make_request(), then mask out __GFP_WAIT when we
go to allocate a bio, and if the allocation fails punt to workqueue and
retry the allocation.

But this is tricky and not a generic solution. This patch solves it for
all users by inverting the previously described technique. We allocate a
rescuer workqueue for each bio_set, and then in the allocation code if
there are bios on current->bio_list we would be blocking, we punt them
to the rescuer workqueue to be submitted.

This guarantees forward progress for bio allocations under
generic_make_request() provided each bio is submitted before allocating
the next, and provided the bios are freed after they complete.

Note that this doesn't do anything for allocation from other mempools.
Instead of allocating per bio data structures from a mempool, code
should use bio_set's front_pad.

Tested it by forcing the rescue codepath to be taken (by disabling the
first GFP_NOWAIT) attempt, and then ran it with bcache (which does a lot
of arbitrary bio splitting) and verified that the rescuer was being
invoked.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Muthukumar Ratty <muthur@gmail.com>
2013-03-23 14:15:26 -07:00
Kent Overstreet 57fb233f07 block: Reorder struct bio_set
This is prep work for the next patch, which embeds a struct bio_list in
struct bio_set.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
2013-03-23 14:15:25 -07:00
Linus Torvalds a937536b86 Linux 3.9-rc3 2013-03-17 15:59:32 -07:00
David Rientjes 6c4d3bc99b perf,x86: fix link failure for non-Intel configs
Commit 1d9d8639c0 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") introduces a link failure since
perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL:

	arch/x86/power/built-in.o: In function `restore_processor_state':
	(.text+0x45c): undefined reference to `perf_restore_debug_store'

Fix it by defining the dummy function appropriately.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-17 15:59:15 -07:00
Linus Torvalds 2a6e06b2ae perf,x86: fix wrmsr_on_cpu() warning on suspend/resume
Commit 1d9d8639c0 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") fixed a crash when doing PEBS performance profiling
after resuming, but in using init_debug_store_on_cpu() to restore the
DS_AREA mtrr it also resulted in a new WARN_ON() triggering.

init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU
cross-calls to do the MSR update.  Which is not really valid at the
early resume stage, and the warning is quite reasonable.  Now, it all
happens to _work_, for the simple reason that smp_call_function_single()
ends up just doing the call directly on the CPU when the CPU number
matches, but we really should just do the wrmsr() directly instead.

This duplicates the wrmsr() logic, but hopefully we can just remove the
wrmsr_on_cpu() version eventually.

Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-17 15:44:43 -07:00
Linus Torvalds 08637024ab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Eric's rcu barrier patch fixes a long standing problem with our
  unmount code hanging on to devices in workqueue helpers.  Liu Bo
  nailed down a difficult assertion for in-memory extent mappings."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix warning of free_extent_map
  Btrfs: fix warning when creating snapshots
  Btrfs: return as soon as possible when edquot happens
  Btrfs: return EIO if we have extent tree corruption
  btrfs: use rcu_barrier() to wait for bdev puts at unmount
  Btrfs: remove btrfs_try_spin_lock
  Btrfs: get better concurrency for snapshot-aware defrag work
2013-03-17 11:04:14 -07:00
Liu Bo 3b2775942d Btrfs: fix warning of free_extent_map
Users report that an extent map's list is still linked when it's actually
going to be freed from cache.

The story is that

a) when we're going to drop an extent map and may split this large one into
smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
that it's on the list to be logged, then the smaller ems split from it will also
be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.

b) we'll keep ems from unlinking the list and freeing when they are flagged with
EXTENT_FLAG_LOGGING, because the log code holds one reference.

The end result is the warning, but the truth is that we set the flag
EXTENT_FLAG_LOGGING only during fsync.

So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.

Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-15 21:51:49 -04:00
Linus Torvalds e20437852d Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild fix from Michal Marek:
 "One fix for for make headers_install/headers_check to not require make
  3.81.  The requirement has been accidentally introduced in 3.7."

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: fix make headers_check with make 3.80
2013-03-15 18:06:55 -07:00
Linus Torvalds 236595879b Merge tag 'for-3.9-rc3' of git://openrisc.net/jonas/linux
Pull OpenRISC bug fixes from Jonas Bonn:

 - The GPIO descriptor work has exposed how broken the non-GPIOLIB bits
   for OpenRISC were.  We now require GPIOLIB as this is the preferred
   way forward.

 - The system.h split introduced a bug in llist.h for arches using
   asm-generic/cmpxchg.h directly, which is currently only OpenRISC.
   The patch here moves two defines from asm-generic/atomic.h to
   asm-generic/cmpxchg.h to make things work as they should.

 - The VIRT_TO_BUS selector was added for OpenRISC, but OpenRISC does
   not have the virt_to_bus methods, so there's a patch to remove it
   again.

* tag 'for-3.9-rc3' of git://openrisc.net/jonas/linux:
  openrisc: remove HAVE_VIRT_TO_BUS
  asm-generic: move cmpxchg*_local defs to cmpxchg.h
  openrisc: require gpiolib
2013-03-15 18:05:37 -07:00
Linus Torvalds 9e1a0aab60 Merge tag 'char-misc-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg Kroah-Hartman:
 "Here are some tiny fixes for the w1 drivers and the final removal
  patch for getting rid of CONFIG_EXPERIMENTAL (all users of it are now
  gone from your tree, this just drops the Kconfig item itself.)

  All have been in the linux-next tree for a while"

* tag 'char-misc-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  final removal of CONFIG_EXPERIMENTAL
  w1: fix oops when w1_search is called from netlink connector
  w1-gpio: fix unused variable warning
  w1-gpio: remove erroneous __exit and __exit_p()
  ARM: w1-gpio: fix erroneous gpio requests
2013-03-15 18:04:38 -07:00
Linus Torvalds 5cd8846c3b Merge tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "A collection of small fixes, as expected for the middle rc:
   - A couple of fixes for potential NULL dereferences and out-of-range
     array accesses revealed by static code parsers
   - A fix for the wrong error handling detected by trinity
   - A regression fix for missing audio on some MacBooks
   - CA0132 DSP loader fixes
   - Fix for EAPD control of IDT codecs on machines w/o speaker
   - Fix a regression in the HD-audio widget list parser code
   - Workaround for the NuForce UDH-100 USB audio"

* tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix missing EAPD/GPIO setup for Cirrus codecs
  sound: sequencer: cap array index in seq_chn_common_event()
  ALSA: hda/ca0132 - Remove extra setting of dsp_state.
  ALSA: hda/ca0132 - Check download state of DSP.
  ALSA: hda/ca0132 - Check if dspload_image succeeded.
  ALSA: hda - Disable IDT eapd_switch if there are no internal speakers
  ALSA: hda - Fix snd_hda_get_num_raw_conns() to return a correct value
  ALSA: usb-audio: add a workaround for the NuForce UDH-100
  ALSA: asihpi - fix potential NULL pointer dereference
  ALSA: seq: Fix missing error handling in snd_seq_timer_open()
2013-03-15 17:35:49 -07:00
Linus Torvalds c7f17deb31 Merge branch 'fixes-for-3.9' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull DMA-mapping fix from Marek Szyprowski:
 "An important fix for all ARM architectures which use ZONE_DMA.
  Without it dma_alloc_* calls with GFP_ATOMIC flag might have allocated
  buffers outsize DMA zone."

* 'fixes-for-3.9' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  ARM: DMA-mapping: add missing GFP_DMA flag for atomic buffer allocation
2013-03-15 17:35:03 -07:00
Linus Torvalds de1893f640 Merge tag 'mfd-fixes-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes
Pull MFD fixes from Samuel Ortiz:
 "This is the first batch of MFD fixes for 3.9.

  With this one we have:

   - An ab8500 build failure fix.
   - An ab8500 device tree parsing fix.
   - A fix for twl4030_madc remove routine to work properly (when
     built-in).
   - A fix for properly registering palmas interrupt handler.
   - A fix for omap-usb init routine to actually write into the
     hostconfig register.
   - A couple of warning fixes for ab8500-gpadc and tps65912"

* tag 'mfd-fixes-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes:
  mfd: twl4030-madc: Remove __exit_p annotation
  mfd: ab8500: Kill "reg" property from binding
  mfd: ab8500-gpadc: Complain if we fail to enable vtvout LDO
  mfd: wm831x: Don't forward declare enum wm831x_auxadc
  mfd: twl4030-audio: Fix argument type for twl4030_audio_disable_resource()
  mfd: tps65912: Declare and use tps65912_irq_exit()
  mfd: palmas: Provide irq flags through DT/platform data
  mfd: Make AB8500_CORE select POWER_SUPPLY to fix build error
  mfd: omap-usb-host: Actually update hostconfig
2013-03-15 17:34:01 -07:00
Linus Torvalds 92fbb1c917 Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
 "Bug fixes for pmbus, ltc2978, and lineage-pem drivers

  Added specific maintainer for some hwmon drivers"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (pmbus/ltc2978) Fix temperature reporting
  hwmon: (pmbus) Fix krealloc() misuse in pmbus_add_attribute()
  hwmon: (lineage-pem) Add missing terminating entry for pem_[input|fan]_attributes
  MAINTAINERS: Add maintainer for MAX6697, INA209, and INA2XX drivers
2013-03-15 17:33:13 -07:00
Stephane Eranian 1d9d8639c0 perf,x86: fix kernel crash with PEBS/BTS after suspend/resume
This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. Turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps it power-on/resume value of 0 causing any PEBS
measurement to crash when running on CPU0.

The workaround is to add a hook in the actual resume code to restore
the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
the DS_AREA will be restored twice but this is harmless.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-15 09:26:35 -07:00