Steffen Klassert says:
====================
Just one patch this time.
1) Drop packets when the matching SA is in larval state and add a
statistic counter for that. From Fan Du.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
By default the SH scheduler rejects connections that are hashed onto a
realserver of weight 0. This patch adds a flag to make SH choose a
different realserver in this case, instead of rejecting the connection.
The patch also adds a flag to make SH include the source port (TCP, UDP,
SCTP) in the hash as well as the source address. This basically allows
for deterministic round-robin load balancing (i.e., where any director
in a cluster of directors with identical config will send the same
packet the same way).
The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options
can be set per service. They are set using a new option to ipvsadm.
Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Add macros for single bit definitions of a specific type. These are
similar to the BIT() macro that already exists, but with a few
exceptions:
1. The namespace is such that they can be used in uapi definitions.
2. The type is set with the _AC() macro to allow it to be used in
assembly.
3. The type is explicitly specified to be UL or ULL.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-nbca8p7cg6jyjoit7klh3o91@git.kernel.org
This small patch adds the definition of ARPHRD_NETLINK which can for
example be used by netlink monitoring devices as device type. So that
sockaddr_ll can pick it up and based on that choose the correct packet
dissector.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently send all mappings to the iommu in PAGE_SIZE chunks,
which prevents the iommu from enabling support for larger page sizes.
We still need to pin pages, which means we step through them in
PAGE_SIZE chunks, but we can batch up contiguous physical memory
chunks to allow the iommu the opportunity to use larger pages. The
approach here is a bit different that the one currently used for
legacy KVM device assignment. Rather than looking at the vma page
size and using that as the maximum size to pass to the iommu, we
instead simply look at whether the next page is physically
contiguous. This means we might ask the iommu to map a 4MB region,
while legacy KVM might limit itself to a maximum of 2MB.
Splitting our mapping path also allows us to be smarter about locked
memory because we can more easily unwind if the user attempts to
exceed the limit. Therefore, rather than assuming that a mapping
will result in locked memory, we test each page as it is pinned to
determine whether it locks RAM vs an mmap'd MMIO region. This should
result in better locking granularity and less locked page fudge
factors in userspace.
The unmap path uses the same algorithm as legacy KVM. We don't want
to track the pfn for each mapping ourselves, but we need the pfn in
order to unpin pages. We therefore ask the iommu for the iova to
physical address translation, ask it to unpin a page, and see how many
pages were actually unpinned. iommus supporting large pages will
often return something bigger than a page here, which we know will be
physically contiguous and we can unpin a batch of pfns. iommus not
supporting large mappings won't see an improvement in batching here as
they only unmap a page at a time.
With this change, we also make a clarification to the API for mapping
and unmapping DMA. We can only guarantee unmaps at the same
granularity as used for the original mapping. In other words,
unmapping a subregion of a previous mapping is not guaranteed and may
result in a larger or smaller unmapping than requested. The size
field in the unmapping structure is updated to reflect this.
Previously this was unmodified on mapping, always returning the the
requested unmap size. This is now updated to return the actual unmap
size on success, allowing userspace to appropriately track mappings.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This has been replaced by the new and much better VIDIOC_DBG_G_CHIP_INFO.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
xt_socket module can be a nice replacement to conntrack module
in some cases (SYN filtering for example)
But it lacks the ability to match the 3rd packet of TCP
handshake (ACK coming from the client).
Add a XT_SOCKET_NOWILDCARD flag to disable the wildcard mechanism.
The wildcard is the legacy socket match behavior, that ignores
LISTEN sockets bound to INADDR_ANY (or ipv6 equivalent)
iptables -I INPUT -p tcp --syn -j SYN_CHAIN
iptables -I INPUT -m socket --nowildcard -j ACCEPT
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* linus: (1465 commits)
ARM: tegra30: clocks: Fix pciex clock registration
lseek(fd, n, SEEK_END) does *not* go to eof - n
Linux 3.10-rc6
smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().
powerpc: Fix missing/delayed calls to irq_work
powerpc: Fix emulation of illegal instructions on PowerNV platform
powerpc: Fix stack overflow crash in resume_kernel when ftracing
snd_pcm_link(): fix a leak...
use can_lookup() instead of direct checks of ->i_op->lookup
move exit_task_namespaces() outside of exit_notify()
fput: task_work_add() can fail if the caller has passed exit_task_work()
xfs: don't shutdown log recovery on validation errors
xfs: ensure btree root split sets blkno correctly
xfs: fix implicit padding in directory and attr CRC formats
xfs: don't emit v5 superblock warnings on write
mei: me: clear interrupts on the resume path
mei: nfc: fix nfc device freeing
mei: init: Flush scheduled work before resetting the device
sctp: fully initialize sctp_outq in sctp_outq_init
netiucv: Hold rtnl between name allocation and device registration.
...
VFIO implements platform independent stuff such as
a PCI driver, BAR access (via read/write on a file descriptor
or direct mapping when possible) and IRQ signaling.
The platform dependent part includes IOMMU initialization
and handling. This implements an IOMMU driver for VFIO
which does mapping/unmapping pages for the guest IO and
provides information about DMA window (required by a POWER
guest).
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In previous discussions, I tried to find some reasonable heuristics
for delayed ACK, however this seems not possible, according to Eric:
"ACKS might also be delayed because of bidirectional
traffic, and is more controlled by the application
response time. TCP stack can not easily estimate it."
"ACK can be incredibly useful to recover from losses in
a short time.
The vast majority of TCP sessions are small lived, and we
send one ACK per received segment anyway at beginning or
retransmits to let the sender smoothly increase its cwnd,
so an auto-tuning facility wont help them that much."
and according to David:
"ACKs are the only information we have to detect loss.
And, for the same reasons that TCP VEGAS is fundamentally
broken, we cannot measure the pipe or some other
receiver-side-visible piece of information to determine
when it's "safe" to stretch ACK.
And even if it's "safe", we should not do it so that losses are
accurately detected and we don't spuriously retransmit.
The only way to know when the bandwidth increases is to
"test" it, by sending more and more packets until drops happen.
That's why all successful congestion control algorithms must
operate on explicited tested pieces of information.
Similarly, it's not really possible to universally know if
it's safe to stretch ACK or not."
It still makes sense to enable or disable quick ack mode like
what TCP_QUICK_ACK does.
Similar to TCP_QUICK_ACK option, but for people who can't
modify the source code and still wants to control
TCP delayed ACK behavior. As David suggested, this should belong
to per-path scope, since different pathes may want different
behaviors.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netlink_diag.h is in include/uapi/linux but not in the Kbuild necessary
to cause it to be exported by make headers_install.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add gre vport implementation. Most of gre protocol processing
is pushed to gre module. It make use of gre demultiplexer
therefore it can co-exist with linux device based gre tunnels.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/wireless/ath/ath9k/Kconfig
drivers/net/xen-netback/netback.c
net/batman-adv/bat_iv_ogm.c
net/wireless/nl80211.c
The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.
The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().
Batman conflict resolution provided by Antonio Quartulli, basically
keep everything in both conflict hunks.
The nl80211 conflict is a little more involved. In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported. Meanwhile in 'net-next' the handlers were converted
to use pre and post doit handlers which use a flag to determine
whether to hold the RTNL mutex around the operation.
However, the dump handlers to not use this logic. Instead they have
to explicitly do the locking. There were apparent bugs in the
conversion of nl80211_dump_wiphy() in that we were not dropping the
RTNL mutex in all the return paths, and it seems we very much should
be doing so. So I fixed that whilst handling the overlapping changes.
To simplify the initial returns, I take the RTNL mutex after we try
to allocate 'tb'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add defines for 5 and 10 MHz channel width and fix channel
handling functions accordingly.
Also check for and report the WIPHY_FLAG_SUPPORTS_5_10_MHZ
capability.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
[fix spelling in comment]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Files tipc.h and tipc_config.h were moved to uapi directory, but
the corresponding comments were not updated at the same time.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross says:
====================
A few miscellaneous improvements and cleanups before the GRE tunnel
integration series. Intended for net-next/3.11.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
when user runs command btrfs dev del the raid requisite error if any
goes to the /var/log/messages, its not good idea to clutter messages
with these user (knowledge) errors, further user don't have to review
the system messages to know problem with the cli it should be dropped
to the user as part of the cli return.
to bring this feature created a set of the ERROR defined
BTRFS_ERROR_DEV* error codes and created their error string.
I expect this enum to be added with other error which we might
want to communicate to the user land
v3:
moved the code with in the file no logical change
v1->v2:
introduce error codes for the device mgmt usage
v1:
adds a parameter in the ioctl arg struct to carry the error string
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
btrfs_qgroup_wait_for_completion waits until the currently running qgroup
operation completes. It returns immediately when no rescan process is in
progress. This is useful to automate things around the rescan process (e.g.
testing).
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>