Commit Graph

32133 Commits

Author SHA1 Message Date
Trond Myklebust
c05eecf636 SUNRPC: Don't allow low priority tasks to pre-empt higher priority ones
Currently, the priority queues attempt to be 'fair' to lower priority
tasks by scheduling them after a certain number of higher priority tasks
have run. The problem is that both the transport send queue and
the NFSv4.1 session slot queue have strong ordering requirements.

This patch therefore removes the fairness code in favour of strong
ordering of task priorities.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:53 +01:00
Trond Myklebust
62ae082d88 NFSv4: Reorder the XDR structures to put sequence at the top, not bottom
Pre-condition for optimising the slot allocation and reintroducing FIFO
behaviour.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:52 +01:00
Trond Myklebust
8fe72bac8d NFSv4: Clean up handling of privileged operations
Privileged rpc calls are those that are run by the state recovery thread,
in cases where we're trying to recover the system after a server reboot
or a network partition. In those cases, we want to fence off all other
rpc calls (see nfs4_begin_drain_session()) so that they don't end up
using stateids or clientids that are in the process of being recovered.

Prior to this patch, we had to set up special callback functions in
order to declare an rpc call as being privileged.
By adding a new field to the sequence arguments, this patch simplifies
things considerably, and allows us to declare the rpc call as privileged
before it is run.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:50 +01:00
Trond Myklebust
76e697ba7e NFSv4.1: Move slot table and session struct definitions to nfs4session.h
Clean up. Gather NFSv4.1 slot definitions in fs/nfs/nfs4session.h.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:46 +01:00
Trond Myklebust
c34309a45e NFS: Remove unused function slot_idx
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:46 +01:00
Trond Myklebust
87dda67e73 NFSv4.1: Allow SEQUENCE to resize the slot table on the fly
Instead of an array of slots, use a singly linked list of slots that
can be dynamically appended to or shrunk.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:42 +01:00
Trond Myklebust
97e548a93d NFSv4.1: Support dynamic resizing of the session slot table
Allow the server to control the size of the session slot table
by adjusting the value of sr_target_max_slots in the reply to the
SEQUENCE operation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:42 +01:00
Trond Myklebust
da0507b7c9 NFSv4.1: Reset the sequence number for slots that have been deallocated
When the server tells us that it is dynamically resizing the session
replay cache, we should reset the sequence number for those slots
that have been deallocated.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:17 +01:00
Trond Myklebust
464ee9f966 NFSv4.1: Ensure that the client tracks the server target_highest_slotid
Dynamic slot allocation in NFSv4.1 depends on the client being able to
track the server's target value for the highest slotid in the
slot table.  See the reference in Section 2.10.6.1 of RFC5661.

To avoid ordering problems in the case where 2 SEQUENCE replies contain
conflicting updates to this target value, we also introduce a generation
counter, to track whether or not an RPC containing a SEQUENCE operation
was launched before or after the last update.

Also rename the nfs4_slot_table target_max_slots field to
'target_highest_slotid' to avoid confusion with a slot
table size or number of slots.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:29:47 +01:00
Trond Myklebust
2b2fa71723 NFSv4.1: Simplify struct nfs4_sequence_args too
Replace the session pointer + slotid with a pointer to the
allocated slot.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-26 17:49:52 -05:00
Trond Myklebust
df2fabffba NFSv4.1: Label each entry in the session slot tables with its slot number
Instead of doing slot table pointer gymnastics every time we want to
know which slot we're using.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-26 17:49:51 -05:00
Trond Myklebust
e3725ec015 NFSv4.1: Shrink struct nfs4_sequence_res by moving the session pointer
Move the session pointer into the slot table, then have struct nfs4_slot
point to that slot table.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-26 17:49:04 -05:00
Trond Myklebust
933602e368 NFSv4.1: Shrink struct nfs4_sequence_res by moving sr_renewal_time
Store the renewal time inside the session slot instead.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-21 09:29:53 -05:00
Trond Myklebust
ae72ae6760 NFSv4.1: Don't confuse CREATE_SESSION arguments and results
Don't store the target request and response sizes in the same
variables used to store the server's replies to those targets.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-21 09:29:51 -05:00
Sasha Levin
d9b482c8ba hashtable: introduce a small and naive hashtable
This hashtable implementation is using hlist buckets to provide a simple
hashtable to prevent it from getting reimplemented all over the kernel.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
[ Merging this now, so that subsystems can start applying Sasha's
  patches that use this   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-02 12:44:51 -07:00
Linus Torvalds
8c23f406c6 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fix from Marcelo Tosatti.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: fix vcpu->mmio_fragments overflow
2012-11-01 08:27:02 -07:00
Xiao Guangrong
87da7e66a4 KVM: x86: fix vcpu->mmio_fragments overflow
After commit b3356bf0db (KVM: emulator: optimize "rep ins" handling),
the pieces of io data can be collected and write them to the guest memory
or MMIO together

Unfortunately, kvm splits the mmio access into 8 bytes and store them to
vcpu->mmio_fragments. If the guest uses "rep ins" to move large data, it
will cause vcpu->mmio_fragments overflow

The bug can be exposed by isapc (-M isapc):

[23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ ......]
[23154.858083] Call Trace:
[23154.859874]  [<ffffffffa04f0e17>] kvm_get_cr8+0x1d/0x28 [kvm]
[23154.861677]  [<ffffffffa04fa6d4>] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
[23154.863604]  [<ffffffffa04f5a1a>] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]

Actually, we can use one mmio_fragment to store a large mmio access then
split it when we pass the mmio-exit-info to userspace. After that, we only
need two entries to store mmio info for the cross-mmio pages access

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-31 20:36:30 -02:00
Linus Torvalds
2df4f26167 Merge tag 'md-3.7-fixes' of git://neil.brown.name/md
Pull md fixes from NeilBrown:
 "Some fixes for md in 3.7
   - one recently introduced crash for dm-raid10 with discard
   - one bug in new functionality that has been around for a few
     releases.
   - minor bug in md's 'faulty' personality

  and UAPI disintegration for md."

* tag 'md-3.7-fixes' of git://neil.brown.name/md:
  MD RAID10: Fix oops when creating RAID10 arrays via dm-raid.c
  md/raid1: Fix assembling of arrays containing Replacements.
  md faulty: use disk_stack_limits()
  UAPI: (Scripted) Disintegrate include/linux/raid
2012-10-30 19:48:48 -07:00
Mikulas Patocka
1bf11c5353 percpu-rw-semaphores: use rcu_read_lock_sched
Use rcu_read_lock_sched / rcu_read_unlock_sched / synchronize_sched
instead of rcu_read_lock / rcu_read_unlock / synchronize_rcu.

This is an optimization. The RCU-protected region is very small, so
there will be no latency problems if we disable preempt in this region.

So we use rcu_read_lock_sched / rcu_read_unlock_sched that translates
to preempt_disable / preempt_disable. It is smaller (and supposedly
faster) than preemptible rcu_read_lock / rcu_read_unlock.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-28 10:59:36 -07:00
Mikulas Patocka
5c1eabe685 percpu-rw-semaphores: use light/heavy barriers
This patch introduces new barrier pair light_mb() and heavy_mb() for
percpu rw semaphores.

This patch fixes a bug in percpu-rw-semaphores where a barrier was
missing in percpu_up_write.

This patch improves performance on the read path of
percpu-rw-semaphores: on non-x86 cpus, there was a smp_mb() in
percpu_up_read. This patch changes it to a compiler barrier and removes
the "#if defined(X86) ..." condition.

From: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-28 10:59:36 -07:00
Linus Torvalds
e657e078d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "This is what we usually expect at this stage of the game, lots of
  little things, mostly in drivers.  With the occasional 'oops didn't
  mean to do that' kind of regressions in the core code."

 1) Uninitialized data in __ip_vs_get_timeouts(), from Arnd Bergmann

 2) Reject invalid ACK sequences in Fast Open sockets, from Jerry Chu.

 3) Lost error code on return from _rtl_usb_receive(), from Christian
    Lamparter.

 4) Fix reset resume on USB rt2x00, from Stanislaw Gruszka.

 5) Release resources on error in pch_gbe driver, from Veaceslav Falico.

 6) Default hop limit not set correctly in ip6_template_metrics[], fix
    from Li RongQing.

 7) Gianfar PTP code requests wrong kind of resource during probe, fix
    from Wei Yang.

 8) Fix VHOST net driver on big-endian, from Michael S Tsirkin.

 9) Mallenox driver bug fixes from Jack Morgenstein, Or Gerlitz, Moni
    Shoua, Dotan Barak, and Uri Habusha.

10) usbnet leaks memory on TX path, fix from Hemant Kumar.

11) Use socket state test, rather than presence of FIN bit packet, to
    determine FIONREAD/SIOCINQ value.  Fix from Eric Dumazet.

12) Fix cxgb4 build failure, from Vipul Pandya.

13) Provide a SYN_DATA_ACKED state to complement SYN_FASTOPEN in socket
    info dumps.  From Yuchung Cheng.

14) Fix leak of security path in kfree_skb_partial().  Fix from Eric
    Dumazet.

15) Handle RX FIFO overflows more resiliently in pch_gbe driver, from
    Veaceslav Falico.

16) Fix MAINTAINERS file pattern for networking drivers, from Jean
    Delvare.

17) Add iPhone5 IDs to IPHETH driver, from Jay Purohit.

18) VLAN device type change restriction is too strict, and should not
    trigger for the automatically generated vlan0 device.  Fix from Jiri
    Pirko.

19) Make PMTU/redirect flushing work properly again in ipv4, from
    Steffen Klassert.

20) Fix memory corruptions by using kfree_rcu() in netlink_release().
    From Eric Dumazet.

21) More qmi_wwan device IDs, from Bjørn Mork.

22) Fix unintentional change of SNAT/DNAT hooks in generic NAT
    infrastructure, from Elison Niven.

23) Fix 3.6.x regression in xt_TEE netfilter module, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  tilegx: fix some issues in the SW TSO support
  qmi_wwan/cdc_ether: move Novatel 551 and E362 to qmi_wwan
  net: usb: Fix memory leak on Tx data path
  net/mlx4_core: Unmap UAR also in the case of error flow
  net/mlx4_en: Don't use vlan tag value as an indication for vlan presence
  net/mlx4_en: Fix double-release-range in tx-rings
  bas_gigaset: fix pre_reset handling
  vhost: fix mergeable bufs on BE hosts
  gianfar_ptp: use iomem, not ioports resource tree in probe
  ipv6: Set default hoplimit as zero.
  NET_VENDOR_TI: make available for am33xx as well
  pch_gbe: fix error handling in pch_gbe_up()
  b43: Fix oops on unload when firmware not found
  mwifiex: clean up scan state on error
  mwifiex: return -EBUSY if specific scan request cannot be honored
  brcmfmac: fix potential NULL dereference
  Revert "ath9k_hw: Updated AR9003 tx gain table for 5GHz"
  ath9k_htc: Add PID/VID for a Ubiquiti WiFiStation
  rt2x00: usb: fix reset resume
  rtlwifi: pass rx setup error code to caller
  ...
2012-10-26 15:00:48 -07:00
Linus Torvalds
490916d6ba Merge tag 'staging-3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg Kroah-Hartman:
 "Here are some staging driver fixes for your 3.7-rc tree.

  Nothing major here, a number of iio driver fixups that were causing
  problems, some comedi driver bugfixes, and a bunch of tidspbridge
  warning squashing and other regressions fixed from the 3.6 release.

  All have been in the linux-next releases for a bit.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'staging-3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (32 commits)
  staging: tidspbridge: delete unused mmu functions
  staging: tidspbridge: ioremap physical address of the stack segment in shm
  staging: tidspbridge: ioremap dsp sync addr
  staging: tidspbridge: change type to __iomem for per and core addresses
  staging: tidspbridge: drop const from custom mmu implementation
  staging: tidspbridge: request the right irq for mmu
  staging: ipack: add missing include (implicit declaration of function 'kfree')
  staging: ramster: depends on NET
  staging: omapdrm: fix allocation size for page addresses array
  staging: zram: Fix handling of incompressible pages
  Staging: android: binder: Allow using highmem for binder buffers
  Staging: android: binder: Fix memory leak on thread/process exit
  staging: comedi: ni_labpc: fix possible NULL deref during detach
  staging: comedi: das08: fix possible NULL deref during detach
  staging: comedi: amplc_pc263: fix possible NULL deref during detach
  staging: comedi: amplc_pc236: fix possible NULL deref during detach
  staging: comedi: amplc_pc236: fix invalid register access during detach
  staging: comedi: amplc_dio200: fix possible NULL deref during detach
  staging: comedi: 8255_pci: fix possible NULL deref during detach
  staging: comedi: ni_daq_700: fix dio subdevice regression
  ...
2012-10-26 10:25:31 -07:00
Linus Torvalds
299650cad6 Merge tag 'driver-core-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg Kroah-Hartman:
 "Here are a number of firmware core fixes for 3.7, and some other minor
  fixes.  And some documentation updates thrown in for good measure.

  All have been in the linux-next tree for a while.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'driver-core-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Documentation:Chinese translation of Documentation/arm64/memory.txt
  Documentation:Chinese translation of Documentation/arm64/booting.txt
  Documentation:Chinese translation of Documentation/IRQ.txt
  firmware loader: document kernel direct loading
  sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat()
  dynamic_debug: Remove unnecessary __used
  firmware loader: sync firmware cache by async_synchronize_full_domain
  firmware loader: let direct loading back on 'firmware_buf'
  firmware loader: fix one reqeust_firmware race
  firmware loader: cancel uncache work before caching firmware
2012-10-26 10:24:51 -07:00
Linus Torvalds
f76ddd9807 Merge tag 'char-misc-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg Kroah-Hartman:
 "Here are some driver fixes for 3.7.  They include extcon driver fixes,
  a hyper-v bugfix, and two other minor driver fixes.

  All of these have been in the linux-next releases for a while.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'char-misc-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  sonypi: suspend/resume callbacks should be conditionally compiled on CONFIG_PM_SLEEP
  Drivers: hv: Cleanup error handling in vmbus_open()
  extcon : register for cable interest by cable name
  extcon: trivial: kfree missed from remove path
  extcon: driver model release call not needed
  extcon: MAX77693: Add platform data for MUIC device to initialize registers
  extcon: max77693: Use max77693_update_reg for rmw operations
  extcon: Fix kerneldoc for extcon_set_cable_state and extcon_set_cable_state_
  extcon: adc-jack: Add missing MODULE_LICENSE
  extcon: adc-jack: Fix checking return value of request_any_context_irq
  extcon: Fix return value in extcon_register_interest()
  extcon: unregister compat link on cleanup
  extcon: Unregister compat class at module unload to fix oops
  extcon: optimising the check_mutually_exclusive function
  extcon: standard cable names definition and declaration changed
  extcon-max8997: remove usage of ret in max8997_muic_handle_charger_type_detach
  extcon: Remove duplicate inclusion of extcon.h header file
2012-10-26 10:24:19 -07:00
Linus Torvalds
622f202a4c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This fixes a couple of nasty page table initialization bugs which were
  causing kdump regressions.  A clean rearchitecturing of the code is in
  the works - meanwhile these are reverts that restore the
  best-known-working state of the kernel.

  There's also EFI fixes and other small fixes."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, mm: Undo incorrect revert in arch/x86/mm/init.c
  x86: efi: Turn off efi_enabled after setup on mixed fw/kernel
  x86, mm: Find_early_table_space based on ranges that are actually being mapped
  x86, mm: Use memblock memory loop instead of e820_RAM
  x86, mm: Trim memory in memblock to be page aligned
  x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
  x86/efi: Fix oops caused by incorrect set_memory_uc() usage
  x86-64: Fix page table accounting
  Revert "x86/mm: Fix the size calculation of mapping tables"
  MAINTAINERS: Add EFI git repository location
2012-10-26 09:35:46 -07:00