linux

mirror of https://github.com/armbian/linux.git synced 2026-01-06 10:13:00 -08:00

Author	SHA1	Message	Date
Ingo Molnar	2d84e023cb	Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull the v3.5 RCU tree from Paul E. McKenney: 1) A set of improvements and fixes to the RCU_FAST_NO_HZ feature (with more on the way for 3.6). Posted to LKML: https://lkml.org/lkml/2012/4/23/324 (commits 1-3 and 5), https://lkml.org/lkml/2012/4/16/611 (commit 4), https://lkml.org/lkml/2012/4/30/390 (commit 6), and https://lkml.org/lkml/2012/5/4/410 (commit 7, combined with the other commits for the convenience of the tester). 2) Changes to make rcu_barrier() avoid disrupting execution of CPUs that have no RCU callbacks. Posted to LKML: https://lkml.org/lkml/2012/4/23/322. 3) A couple of commits that improve the efficiency of the interaction between preemptible RCU and the scheduler, these two being all that survived an abortive attempt to allow preemptible RCU's __rcu_read_lock() to be inlined. The full set was posted to LKML at https://lkml.org/lkml/2012/4/14/143, and the first and third patches of that set remain. 4) Lai Jiangshan's algorithmic implementation of SRCU, which includes call_srcu() and srcu_barrier(). A major feature of this new implementation is that synchronize_srcu() no longer disturbs the execution of other CPUs. This work is based on earlier implementations by Peter Zijlstra and Paul E. McKenney. Posted to LKML: https://lkml.org/lkml/2012/2/22/82. 5) A number of miscellaneous bug fixes and improvements which were posted to LKML at: https://lkml.org/lkml/2012/4/23/353 with subsequent updates posted to LKML. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-05-14 08:41:46 +02:00
Linus Torvalds	4a873f5399	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David S. Miller: 1) Since we do RCU lookups on ipv4 FIB entries, we have to test if the entry is dead before returning it to our caller. 2) openvswitch locking and packet validation fixes from Ansis Atteka, Jesse Gross, and Pravin B Shelar. 3) Fix PM resume locking in IGB driver, from Benjamin Poirier. 4) Fix VLAN header handling in vhost-net and macvtap, from Basil Gor. 5) Revert a bogus network namespace isolation change that was causing regressions on S390 networking devices. 6) If bonding decides to process and handle a LACPDU frame, we shouldn't bump the rx_dropped counter. From Jiri Bohac. 7) Fix mis-calculation of available TX space in r8169 driver when doing TSO, which can lead to crashes and/or hung device. From Julien Ducourthial. 8) SCTP does not validate cached routes properly in all cases, from Nicolas Dichtel. 9) Link status interrupt needs to be handled in ks8851 driver, from Stephen Boyd. 10) Use capable(), not cap_raised(), in connector/userns netlink code. From Eric W. Biederman via Andrew Morton. 11) Fix pktgen OOPS on module unload, from Eric Dumazet. 12) iwlwifi under-estimates SKB truesizes, also from Eric Dumazet. 13) Cure division by zero in SFC driver, from Ben Hutchings. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) ks8851: Update link status during link change interrupt macvtap: restore vlan header on user read vhost-net: fix handle_rx buffer size bonding: don't increase rx_dropped after processing LACPDUs connector/userns: replace netlink uses of cap_raised() with capable() sctp: check cached dst before using it pktgen: fix crash at module unload Revert "net: maintain namespace isolation between vlan and real device" ehea: fix losing of NEQ events when one event occurred early igb: fix rtnl race in PM resume path ipv4: Do not use dead fib_info entries. 
r8169: fix unsigned int wraparound with TSO sfc: Fix division by zero when using one RX channel and no SR-IOV openvswitch: Validation of IPv6 set port action uses IPv4 header net: compare_ether_addr[_64bits]() has no ordering cdc_ether: Ignore bogus union descriptor for RNDIS devices bnx2x: bug fix when loading after SAN boot e1000: Silence sparse warnings by correcting type igb, ixgbe: netdev_tx_reset_queue incorrectly called from tx init path openvswitch: Release rtnl_lock if ovs_vport_cmd_build_info() failed. ...	2012-05-12 12:57:01 -07:00
Paul E. McKenney	dc36be4419	Merge branches 'barrier.2012.05.09a', 'fixes.2012.04.26a', 'inline.2012.05.02b' and 'srcu.2012.05.07b' into HEAD barrier: Reduce the amount of disturbance by rcu_barrier() to the rest of the system. This branch also includes improvements to RCU_FAST_NO_HZ, which are included here due to conflicts. fixes: Miscellaneous fixes. inline: Remaining changes from an abortive attempt to inline preemptible RCU's __rcu_read_lock(). These are (1) making exit_rcu() avoid unnecessary work and (2) avoiding having preemptible RCU record a blocked thread when the scheduler declines to do a context switch. srcu: Lai Jiangshan's algorithmic implementation of SRCU, including call_srcu().	2012-05-11 10:14:21 -07:00
Nicolas Dichtel	e0268868ba	sctp: check cached dst before using it dst_check() will take care of SA (and obsolete field), hence IPsec rekeying scenario is taken into account. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Vlad Yaseivch <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:15:47 -04:00
David S. Miller	59b9997bab	Revert "net: maintain namespace isolation between vlan and real device" This reverts commit `8a83a00b07`. It causes regressions for S390 devices, because it does an unconditional DST drop on SKBs for vlans and the QETH device needs the neighbour entry hung off the DST for certain things on transmit. Arnd can't remember exactly why he even needed this change. Conflicts: drivers/net/macvlan.c net/8021q/vlan_dev.c net/core/dev.c Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:03:34 -04:00
Paul E. McKenney	21e52e1566	rcu: Make RCU_FAST_NO_HZ handle timer migration The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a CPU goes offline, in which case it assumes that the CPU will have to come out of dyntick-idle mode (cancelling the timer) in order to go offline. This is important because when RCU_FAST_NO_HZ permits a CPU to enter dyntick-idle mode despite having RCU callbacks pending, it posts a timer on that CPU to force a wakeup on that CPU. This wakeup ensures that the CPU will eventually handle the end of the grace period, including invoking its RCU callbacks. However, Pascal Chapperon's test setup shows that the timer handler rcu_idle_gp_timer_func() really does get invoked in some cases. This is problematic because this can cause the CPU that entered dyntick-idle mode despite still having RCU callbacks pending to remain in dyntick-idle mode indefinitely, which means that its RCU callbacks might never be invoked. This situation can result in grace-period delays or even system hangs, which matches Pascal's observations of slow boot-up and shutdown (https://lkml.org/lkml/2012/4/5/142). See also the bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=806548 This commit therefore causes the "should never be invoked" timer handler rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up the CPU for which the timer was intended, allowing that CPU to invoke its RCU callbacks in a timely manner. Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-05-09 14:26:56 -07:00
Johannes Berg	1c430a727f	net: compare_ether_addr[_64bits]() has no ordering Neither compare_ether_addr() nor compare_ether_addr_64bits() (as it can fall back to the former) have comparison semantics like memcmp() where the sign of the return value indicates sort order. We had a bug in the wireless code due to a blind memcmp replacement because of this. A cursory look suggests that the wireless bug was the only one due to this semantic difference. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-07 19:21:29 -04:00
Linus Torvalds	18b15fcde7	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_mid_powerbtn: mark irq as IRQF_NO_SUSPEND arch/x86/platform/geode/net5501.c: change active_low to 0 for LED driver x86, relocs: Remove an unused variable asm-generic: Use __BITS_PER_LONG in statfs.h x86/amd: Re-enable CPU topology extensions in case BIOS has disabled it	2012-05-06 12:19:38 -07:00
Linus Torvalds	59068e369b	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull an ACPI patch from Len Brown: "It fixes a D3 issue new in 3.4-rc1." By Lin Ming via Len Brown: * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: ACPI: Fix D3hot v D3cold confusion	2012-05-05 10:06:06 -07:00
Lin Ming	1cc0c998fd	ACPI: Fix D3hot v D3cold confusion Before this patch, ACPI_STATE_D3 incorrectly referenced D3hot in some places, but D3cold in other places. After this patch, ACPI_STATE_D3 always means ACPI_STATE_D3_COLD; and all references to D3hot use ACPI_STATE_D3_HOT. ACPI's _PR3 method is used to enter both D3hot and D3cold states. What distinguishes D3hot from D3cold is the presence of _PR3 (Power Resources for D3hot). If these resources are all ON, then the state is D3hot. If _PR3 is not present, or all _PR0 resources for the devices are OFF, then the state is D3cold. This patch applies after Linux-3.4-rc1. A future syntax cleanup may remove ACPI_STATE_D3 to emphasize that it always means ACPI_STATE_D3_COLD. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Aaron Lu <aaron.lu@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>	2012-05-05 01:19:52 -04:00
Linus Torvalds	4f988f152e	seqlock: add 'raw_seqcount_begin()' function The normal read_seqcount_begin() function will wait for any current writers to exit their critical region by looping until the sequence count is even. That "wait for sequence count to stabilize" is the right thing to do if the read-locker will just retry the whole operation on contention: no point in doing a potentially expensive reader sequence if we know at the beginning that we'll just end up re-doing it all. HOWEVER. Some users don't actually retry the operation, but instead will abort and do the operation with proper locking. So the sequence count case may be the optimistic quick case, but in the presence of writers you may want to do full locking in order to guarantee forward progress. The prime example of this would be the RCU name lookup. And in that case, you may well be better off without the "retry early", and are in a rush to instead get to the failure handling. Thus this "raw" interface that just returns the sequence number without testing it - it just forces the low bit to zero so that read_seqcount_retry() will always fail such an "active concurrent writer" scenario. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-04 15:13:54 -07:00
Linus Torvalds	2f62427862	Fix __read_seqcount_begin() to use ACCESS_ONCE for sequence value read We really need to use an ACCESS_ONCE() on the sequence value read in __read_seqcount_begin(), because otherwise the compiler might end up reloading the value in between the test and the return of it. As a result, it might end up returning an odd value (which means that a write is in progress). If the reader is then fast enough that that odd value is still the current one when the read_seqcount_retry() is done, we might end up with a "successful" read sequence, even despite the concurrent write being active. In practice this probably never really happens - there just isn't anything else going on around the read of the sequence count, and the common case is that we end up having a read barrier immediately afterwards. So the code sequence in which gcc might decide to reload from memory is small, and there's no reason to believe it would ever actually do the reload. But if the compiler ever were to decide to do so, it would be incredibly annoying to debug. Let's just make sure. Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-04 14:46:02 -07:00
Linus Torvalds	c42f1d4b52	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Transfer padding was wrong for full-speed USB in ASIX driver, fix from Ingo van Lil. 2) Propagate the negative packet offset fix into the PowerPC BPF JIT. From Jan Seiffert. 3) dl2k driver's private ioctls were letting unprivileged tasks make MII writes and other ugly bits like that. Fix from Jeff Mahoney. 4) Fix TX VLAN and RX packet drops in ucc_geth, from Joakim Tjernlund. 5) OOPS and network namespace fixes in IPVS from Hans Schillstrom and Julian Anastasov. 6) Fix races and sleeping in locked context bugs in drop_monitor, from Neil Horman. 7) Fix link status indication in smsc95xx driver, from Paolo Pisati. 8) Fix bridge netfilter OOPS, from Peter Huang. 9) L2TP sendmsg can return on error conditions with the socket lock held, oops. Fix from Sasha Levin. 10) udp_diag should return meaningful values for socket memory usage, from Shan Wei. 11) Eric Dumazet is so awesome he gets his own section: Socket memory cgroup code (I never should have applied those patches, grumble...) made erroneous changes to sk_sockets_allocated_read_positive(). It was changed to use percpu_counter_sum_positive (which requires BH disabling) instead of percpu_counter_read_positive (which does not). Revert back to avoid crashes and lockdep warnings. Adjust the default tcp_adv_win_scale and tcp_rmem[2] values to fix throughput regressions. This is necessary as a result of our more precise skb->truesize tracking. Fix SKB leak in netem packet scheduler. 12) New device IDs for various bluetooth devices, from Manoj Iyer, AceLan Kao, and Steven Harms. 13) Fix command completion race in ipw2200, from Stanislav Yakovlev. 14) Fix rtlwifi oops on unload, from Larry Finger. 15) Fix hard_mtu when adjusting hard_header_len in smsc95xx driver. From Stephane Fillod. 16) ehea driver registers its IRQ before all the necessary state is setup, resulting in crashes. 
Fix from Thadeu Lima de Souza Cascardo. 17) Fix PHY connection failures in davinci_emac driver, from Anatolij Gustschin. 18) Missing break; in switch statement in bluetooth's hci_cmd_complete_evt(). Fix from Szymon Janc. 19) Fix queue programming in iwlwifi, from Johannes Berg. 20) Interrupt throttling defaults not being actually programmed into the hardware, fix from Jeff Kirsher and Ying Cai. 21) TLAN driver SKB encoding in descriptor busted on 64-bit, fix from Benjamin Poirier. 22) Fix blind status block RX producer pointer deref in TG3 driver, from Matt Carlson. 23) Promisc and multicast are busted on ehea, fixes from Thadeu Lima de Souza Cascardo. 24) Fix crashes in 6lowpan, from Alexander Smirnov. 25) tcp_complete_cwr() needs to be careful to not rewind the CWND to ssthresh if ssthresh has the "infinite" value. Fix from Yuchung Cheng. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits) sungem: Fix WakeOnLan tcp: change tcp_adv_win_scale and tcp_rmem[2] net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg drop_monitor: prevent init path from scheduling on the wrong cpu usbnet: fix failure handling in usbnet_probe usbnet: fix leak of transfer buffer of dev->interrupt ucc_geth: Add 16 bytes to max TX frame for VLANs net: ucc_geth, increase no. of HW RX descriptors netem: fix possible skb leak sky2: fix receive length error in mixed non-VLAN/VLAN traffic sky2: propogate rx hash when packet is copied net: fix two typos in skbuff.h cxgb3: Don't call cxgb_vlan_mode until q locks are initialized ixgbe: fix calling skb_put on nonlinear skb assertion bug ixgbe: Fix a memory leak in IEEE DCB igbvf: fix the bug when initializing the igbvf smsc75xx: enable mac to detect speed/duplex from phy smsc75xx: declare smsc75xx's MII as GMII capable smsc75xx: fix phy interrupt acknowledge smsc75xx: fix phy init reset loop ...	2012-05-03 17:10:39 -07:00
Paul E. McKenney	9dd8fb16c3	rcu: Make exit_rcu() more precise and consolidate When running preemptible RCU, if a task exits in an RCU read-side critical section having blocked within that same RCU read-side critical section, the task must be removed from the list of tasks blocking a grace period (perhaps the current grace period, perhaps the next grace period, depending on timing). The exit() path invokes exit_rcu() to do this cleanup. However, the current implementation of exit_rcu() needlessly does the cleanup even if the task did not block within the current RCU read-side critical section, which wastes time and needlessly increases the size of the state space. Fix this by only doing the cleanup if the current task is actually on the list of tasks blocking some grace period. While we are at it, consolidate the two identical exit_rcu() functions into a single function. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Conflicts: kernel/rcupdate.c	2012-05-02 14:48:27 -07:00
Paul E. McKenney	616c310e83	rcu: Move PREEMPT_RCU preemption to switch_to() invocation Currently, PREEMPT_RCU readers are enqueued upon entry to the scheduler. This is inefficient because enqueuing is required only if there is a context switch, and entry to the scheduler does not guarantee a context switch. The commit therefore moves the enqueuing to immediately precede the call to switch_to() from the scheduler. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-02 14:43:23 -07:00
John W. Linville	076e7779c0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2012-05-01 14:14:05 -04:00
Eric Dumazet	d961949660	net: fix two typos in skbuff.h fix kernel doc typos in function names Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-01 09:40:19 -04:00
Linus Torvalds	e7a7c9ab41	Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of SAS and SATA fixes; there are one or two longstanding bug fixes, but most of this is regression fixes." * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] libfc: update mfs boundry checking [SCSI] Revert "[SCSI] libsas: fix sas port naming" [SCSI] libsas: fix false positive 'device attached' conditions [SCSI] libsas, libata: fix start of life for a sas ata_port [SCSI] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready [SCSI] libsas: unify domain_device sas_rphy lifetimes [SCSI] libsas: fix sas_get_port_device regression [SCSI] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys [SCSI] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work [SCSI] libata: Pass correct DMA device to scsi host [SCSI] scsi_lib: use correct DMA device in __scsi_alloc_queue	2012-04-30 15:33:50 -07:00
Matthew Garrett	41b3254c93	efi: Add new variable attributes More recent versions of the UEFI spec have added new attributes for variables. Add them. Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-04-30 15:30:18 -07:00
H. Peter Anvin	f5c2347ee2	asm-generic: Use __BITS_PER_LONG in statfs.h <asm-generic/statfs.h> is exported to userspace, so using BITS_PER_LONG is invalid. We need to use __BITS_PER_LONG instead. This is kernel bugzilla 43165. Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1335465916-16965-1-git-send-email-hpa@linux.intel.com Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: <stable@vger.kernel.org>	2012-04-30 12:55:15 -07:00
Lai Jiangshan	931ea9d1a6	rcu: Implement per-domain single-threaded call_srcu() state machine This commit implements an SRCU state machine in support of call_srcu(). The state machine is preemptible, light-weight, and single-threaded, minimizing synchronization overhead. In particular, there is no longer any need for synchronize_srcu() to be guarded by a mutex. Expedited processing is handled, at least in the absence of concurrent grace-period operations on that same srcu_struct structure, by having the synchronize_srcu_expedited() thread take on the role of the workqueue thread for one iteration. There is a reasonable probability that a given SRCU callback will be invoked on the same CPU that registered it, however, there is no guarantee. Concurrent SRCU grace-period primitives can cause callbacks to be executed elsewhere, even in absence of CPU-hotplug operations. Callbacks execute in process context, but under the influence of local_bh_disable(), so it is illegal to sleep in an SRCU callback function. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-04-30 10:48:25 -07:00
Lai Jiangshan	966f58c2f6	rcu: Remove unused srcu_barrier() The old srcu_barrier() macro is now unused. This commit removes it so that it may be used for the SRCU flavor of rcu_barrier(), which will in turn be needed to allow the upcoming call_srcu() to be used from within modules. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-04-30 10:48:23 -07:00
Lai Jiangshan	b52ce066c5	rcu: Implement a variant of Peter's SRCU algorithm This commit implements a variant of Peter's algorithm, which may be found at https://lkml.org/lkml/2012/2/1/119. o Make the checking lock-free to enable parallel checking. Parallel checking is required when (1) the original checking task is preempted for a long time, (2) synchronize_srcu_expedited() starts during an ongoing SRCU grace period, or (3) we wish to avoid acquiring a lock. o Since the checking is lock-free, we avoid a mutex in state machine for call_srcu(). o Remove the SRCU_REF_MASK and remove the coupling with the flipping. This might allow us to remove the preempt_disable() in future versions, though such removal will need great care because it rescinds the one-old-reader-per-CPU guarantee. o Remove a smp_mb(), simplify the comments and make the smp_mb() pairs more intuitive. Inspired-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-04-30 10:48:22 -07:00
Lai Jiangshan	440253c17f	rcu: Increment upper bit only for srcu_read_lock() The purpose of the upper bit of SRCU's per-CPU counters is to guarantee that no reasonable series of srcu_read_lock() and srcu_read_unlock() operations can return the value of the counter to its original value. This guarantee is required only after the index has been switched to the other set of counters, so at most one srcu_read_lock() can affect a given CPU's counter. The number of srcu_read_unlock() operations on a given counter is limited to the number of tasks in the system, which given the Linux kernel's current structure is limited to far less than 2^30 on 32-bit systems and far less than 2^62 on 64-bit systems. (Something about a limited number of bytes in the kernel's address space.) Therefore, if srcu_read_lock() increments the upper bits, then srcu_read_unlock() need not do so. In this case, an srcu_read_lock() and an srcu_read_unlock() will flip the lower bit of the upper field of the counter. An unreasonably large additional number of srcu_read_unlock() operations would be required to return the counter to its initial value, thus preserving the guarantee. This commit takes this approach, which further allows it to shrink the size of the upper field to one bit, making the number of srcu_read_unlock() operations required to return the counter to its initial value even more unreasonable than before. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-04-30 10:48:20 -07:00
Paul E. McKenney	cef50120b6	rcu: Direct algorithmic SRCU implementation The current implementation of synchronize_srcu_expedited() can cause severe OS jitter due to its use of synchronize_sched(), which in turn invokes try_stop_cpus(), which causes each CPU to be sent an IPI. This can result in severe performance degradation for real-time workloads and especially for short-iteration-length HPC workloads. Furthermore, because only one instance of try_stop_cpus() can be making forward progress at a given time, only one instance of synchronize_srcu_expedited() can make forward progress at a time, even if they are all operating on distinct srcu_struct structures. This commit, inspired by an earlier implementation by Peter Zijlstra (https://lkml.org/lkml/2012/1/31/211) and by further offline discussions, takes a strictly algorithmic bits-in-memory approach. This has the disadvantage of requiring one explicit memory-barrier instruction in each of srcu_read_lock() and srcu_read_unlock(), but on the other hand completely dispenses with OS jitter and furthermore allows SRCU to be used freely by CPUs that RCU believes to be idle or offline. The update-side implementation handles the single read-side memory barrier by rechecking the per-CPU counters after summing them and by running through the update-side state machine twice. This implementation has passed moderate rcutorture testing on both x86 and Power. Also updated to use this_cpu_ptr() instead of per_cpu_ptr(), as suggested by Peter Zijlstra. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2012-04-30 10:48:19 -07:00
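
The two seqlock commits in the log above (4f988f152e and 2f62427862) describe a reader that takes one snapshot of the sequence count, clears its low bit, and lets the later retry check fail if a writer was active. The following is a minimal userspace sketch of that idea in C11, not the kernel code: every `sketch_*` name is hypothetical, a C11 atomic load stands in for ACCESS_ONCE() (it is likewise a single read that the compiler may not repeat), and all operations use the default sequentially consistent ordering rather than the kernel's barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's seqcount_t. */
struct sketch_seqcount {
    atomic_uint sequence;   /* odd while a writer is in its critical section */
};

/* Writer side: the count becomes odd on entry and even again on exit. */
static void sketch_write_begin(struct sketch_seqcount *s)
{
    atomic_fetch_add(&s->sequence, 1);
}

static void sketch_write_end(struct sketch_seqcount *s)
{
    atomic_fetch_add(&s->sequence, 1);
}

/*
 * "Raw" begin: read the count exactly once and do not wait for writers.
 * Clearing the low bit means a snapshot taken during a write can never
 * equal the live count, so the retry check below is bound to fail.
 */
static unsigned sketch_raw_begin(struct sketch_seqcount *s)
{
    unsigned seq = atomic_load(&s->sequence);   /* single read, never reloaded */
    return seq & ~1u;
}

/* Returns true when the reader must discard what it read. */
static bool sketch_read_retry(struct sketch_seqcount *s, unsigned start)
{
    return atomic_load(&s->sequence) != start;
}

/* Single-threaded walk through the three interesting cases. Returns 1. */
static int sketch_demo(void)
{
    struct sketch_seqcount s;
    atomic_init(&s.sequence, 0);

    unsigned quiet = sketch_raw_begin(&s);      /* no writer: snapshot 0 */
    assert(!sketch_read_retry(&s, quiet));      /* read is valid */

    sketch_write_begin(&s);                     /* count is now 1 (odd) */
    unsigned busy = sketch_raw_begin(&s);       /* low bit cleared: 0 */
    assert(sketch_read_retry(&s, busy));        /* 1 != 0: reader falls back */

    sketch_write_end(&s);                       /* count is now 2 */
    assert(sketch_read_retry(&s, busy));        /* stale snapshot still fails */
    return 1;
}
```

Calling `sketch_demo()` from a test or a `main()` exercises the quiet, active-writer, and completed-writer cases in turn. A reader in the style described by the commit message would treat a true result from `sketch_read_retry()` as the cue to redo the operation under a real lock instead of looping.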


49638 Commits