Due to MQ support we may allocate a whole bunch of rx queues but
never use them. With this patch we'll save the space used by
the receive buffers until they are actually in use:
sh-4.2# free -h
             total       used       free     shared    buffers     cached
Mem:          490M        35M       455M         0B         0B       4.1M
-/+ buffers/cache:        31M       459M
Swap:           0B         0B         0B
sh-4.2# ethtool -L eth0 combined 8
sh-4.2# free -h
             total       used       free     shared    buffers     cached
Mem:          490M       162M       327M         0B         0B       4.1M
-/+ buffers/cache:       158M       331M
Swap:           0B         0B         0B
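As a rough illustration of the idea, here is a user-space sketch (not the virtio_net code). It posts receive buffers to a queue only once that queue becomes active. The names struct rx_queue, set_active_queues(), total_posted(), MAX_QUEUES and BUFS_PER_QUEUE are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUES     8    /* hypothetical: queues allocated for MQ */
#define BUFS_PER_QUEUE 256  /* hypothetical: buffers posted per queue */

/* Hypothetical model of one receive queue. */
struct rx_queue {
	bool filled;   /* have receive buffers been posted yet? */
	size_t nbufs;  /* number of buffers currently posted */
};

/* Post buffers only to the first 'active' queues, and only once. */
static void set_active_queues(struct rx_queue *rq, int active)
{
	for (int i = 0; i < active && i < MAX_QUEUES; i++) {
		if (!rq[i].filled) {
			rq[i].nbufs = BUFS_PER_QUEUE;
			rq[i].filled = true;
		}
	}
}

/* Total number of buffers posted across all allocated queues. */
static size_t total_posted(const struct rx_queue *rq)
{
	size_t total = 0;

	for (int i = 0; i < MAX_QUEUES; i++)
		total += rq[i].nbufs;
	return total;
}
```

Calling set_active_queues(rq, 8) after set_active_queues(rq, 1) posts buffers only to the seven newly activated queues. The memory cost therefore appears only when the queues are enabled, as in the `ethtool -L` run above.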
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that we've adjusted all the code, we can simply set switcher_addr to
wherever it needs to go below the fixmaps, rather than asserting that
it should be so.
With large NR_CPUS and PAE, people were hitting the "mapping switcher
would thwack fixmap" message.
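Here is a minimal sketch of the address computation. It assumes the Switcher is simply placed, page-aligned, directly below the start of the fixmap area. pick_switcher_addr() and the constant values are hypothetical, and the real lguest code may round or pad differently.

```c
#define PAGE_SIZE 4096UL  /* hypothetical: 4 KiB pages */

/* Place the Switcher directly below the fixmap area, rounded down to a
 * page boundary, instead of assuming it owns the top PGD entry. */
static unsigned long pick_switcher_addr(unsigned long fixaddr_start,
					unsigned long switcher_pages)
{
	unsigned long addr = fixaddr_start - switcher_pages * PAGE_SIZE;

	return addr & ~(PAGE_SIZE - 1);  /* round down to a page boundary */
}
```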
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's always the same, so there's no need to put in the PTE every time we're
about to run. Keep a flag to track whether the pagetable has the
Switcher entries allocated, and when allocating always initialize the
Switcher text PTE.
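Here is a sketch of the flag approach using a hypothetical per-pagetable structure. struct pgdir_model, ensure_switcher_ptes() and the 512-entry PTE page are illustrative only; they are not the lguest definitions.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SWITCHER_PTES 512  /* hypothetical size of the Switcher PTE page */

/* Hypothetical per-pagetable state. */
struct pgdir_model {
	bool switcher_mapped;          /* Switcher PTEs allocated yet? */
	unsigned long *switcher_ptes;  /* Switcher PTE page, once allocated */
};

/* Allocate the Switcher PTE page on first use and initialize the
 * Switcher text PTE at that point; later calls return immediately, so
 * the text PTE is not rewritten before every guest run. */
static bool ensure_switcher_ptes(struct pgdir_model *pd,
				 unsigned long text_pte)
{
	if (pd->switcher_mapped)
		return true;

	pd->switcher_ptes = calloc(SWITCHER_PTES, sizeof(*pd->switcher_ptes));
	if (!pd->switcher_ptes)
		return false;

	pd->switcher_ptes[0] = text_pte;  /* the text PTE never changes */
	pd->switcher_mapped = true;
	return true;
}
```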
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently use the whole top PGD entry for the switcher, so we
simply share a fixed page of PTEs between all guests (actually, it's
one per Host CPU, to ensure isolation between guests).
Change to a scheme where every guest has its own mappings.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want a separate find_pte() function so we can call it for populating the
switcher PTE entries.
We can also use it in page_writable().
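To illustrate why a standalone lookup helps, here is a toy two-level page-table walk. find_pte() returns a pointer to the PTE, or NULL if the top-level entry is empty, and page_writable() reuses it. The table layout, the flag values and the page_writable() body are assumptions for this example, not lguest's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_TABLE 1024   /* hypothetical 32-bit, two-level layout */
#define PAGE_SHIFT     12
#define ENTRY_PRESENT  0x1UL  /* hypothetical flag bits */
#define ENTRY_RW       0x2UL

/* Top-level entries point to pages of PTEs (NULL if not present). */
struct pte_page  { unsigned long pte[PTRS_PER_TABLE]; };
struct pgd_table { struct pte_page *pgd[PTRS_PER_TABLE]; };

/* Return a pointer to the PTE for vaddr, or NULL if no PTE page exists. */
static unsigned long *find_pte(struct pgd_table *t, uint32_t vaddr)
{
	struct pte_page *pp = t->pgd[vaddr >> 22];

	if (!pp)
		return NULL;
	return &pp->pte[(vaddr >> PAGE_SHIFT) & (PTRS_PER_TABLE - 1)];
}

/* A page is writable if its PTE exists and is present and read-write. */
static int page_writable(struct pgd_table *t, uint32_t vaddr)
{
	unsigned long *pte = find_pte(t, vaddr);

	return pte && (*pte & ENTRY_PRESENT) && (*pte & ENTRY_RW);
}
```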
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit neater: we can immediately return if a PTE/PGD/PMD entry
is invalid (which also kills the guest). It means we don't risk using
invalid entries as we reshuffle the code.
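Here is a toy sketch of the early-return pattern. Each level is checked as soon as it is read, and the walk stops at the first invalid entry, so no later code can use it. struct guest_model, check_entry() and walk_entries() are hypothetical. In lguest an invalid entry kills the guest; here that is modelled by setting a dead flag.

```c
#include <stdbool.h>

/* Hypothetical guest state: an invalid entry kills the guest. */
struct guest_model {
	bool dead;
};

/* Return false (and kill the guest) if any of the bad bits are set. */
static bool check_entry(struct guest_model *g, unsigned long entry,
			unsigned long bad_bits)
{
	if (entry & bad_bits) {
		g->dead = true;
		return false;
	}
	return true;
}

/* Check PGD, PMD and PTE in order and return as soon as one is bad.
 * Returns 0 if all are valid, else the failing level (1 = PGD,
 * 2 = PMD, 3 = PTE). */
static int walk_entries(struct guest_model *g, unsigned long pgd,
			unsigned long pmd, unsigned long pte,
			unsigned long bad_bits)
{
	if (!check_entry(g, pgd, bad_bits))
		return 1;
	if (!check_entry(g, pmd, bad_bits))
		return 2;
	if (!check_entry(g, pte, bad_bits))
		return 3;
	return 0;
}
```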
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
i.e. SHARED_SWITCHER_PAGES == 1. It is well under a page, and it's a
minor simplification: it's nice to have *one* simplification in a
patch series!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently assume that the Switcher occupies the top pgd; we want to remove
this assumption, so check that vaddr is OK, rather than checking the pgd
index.
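Here is a sketch of a range-based check. It assumes the Switcher occupies the half-open range [switcher_addr, switcher_addr + size), and the function name and parameters are hypothetical. An address-range comparison keeps working wherever switcher_addr ends up. A pgd-index check only works while the Switcher owns the top pgd.

```c
#include <stdbool.h>

/* True if vaddr lies within [switcher_addr, switcher_addr + size).
 * Subtracting first avoids overflow when the range ends at the top
 * of the address space. */
static bool vaddr_in_switcher(unsigned long vaddr,
			      unsigned long switcher_addr,
			      unsigned long size)
{
	return vaddr >= switcher_addr && vaddr - switcher_addr < size;
}
```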
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently use the whole top PGD entry for the switcher, but that's
hitting the fixmap in some configurations (mainly, large NR_CPUS).
Introduce a variable, currently set to the constant.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Returning EMFILE (process has too many open files) is the wrong way to
indicate that a port is already open by another process. Use EBUSY for that.
This does change what we report to userspace, but I believe userspace
can look at it this way: it gets EBUSY, a new error code, instead of
EMFILE. It's still an error, and that's not changing.
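Here is a minimal sketch of the intended behaviour, using a hypothetical struct port_model and port_open(). It is not the virtio_console code. It only illustrates returning -EBUSY for a port that another opener already holds.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical port state. */
struct port_model {
	bool in_use;  /* already opened by some process? */
};

/* A port held by another process is "busy", not "too many open files". */
static int port_open(struct port_model *p)
{
	if (p->in_use)
		return -EBUSY;
	p->in_use = true;
	return 0;
}
```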
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch adds queue steering to virtio-scsi. When a target is sent
multiple requests, we always drive them to the same queue so that FIFO
processing order is kept. However, if a target was idle, we can choose
a queue arbitrarily. In this case the queue is chosen according to the
current VCPU, so the driver expects the number of request queues to be
equal to the number of VCPUs. This makes it easy and fast to select
the queue, and also lets the driver optimize the IRQ affinity for the
virtqueues (each virtqueue's affinity is set to the CPU that "owns"
the queue).
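The steering rule described above can be sketched as a single-threaded model. The names (struct target_state_model, pick_queue(), complete_request()) are hypothetical. A real driver would need atomics or locking around the in-flight counter. The `cpu % nqueues` fallback is an assumption for the case where queues and VCPUs do not match.

```c
/* Hypothetical per-target steering state. */
struct target_state_model {
	unsigned int reqs;  /* requests in flight for this target */
	int vq;             /* queue used while reqs > 0 */
};

/* While the target has requests in flight, keep using its queue so
 * FIFO order is preserved; if it was idle, pick the queue owned by
 * the submitting CPU. */
static int pick_queue(struct target_state_model *t, int cpu, int nqueues)
{
	if (t->reqs++ == 0)
		t->vq = cpu % nqueues;
	return t->vq;
}

/* Called when a request for this target completes. */
static void complete_request(struct target_state_model *t)
{
	t->reqs--;
}
```

Keeping the per-target counter means an idle target can move freely to the current CPU's queue. A busy target stays put until all of its outstanding requests complete.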
The speedup comes from improving cache locality and giving CPU affinity
to the virtqueues, which is why this scheme was selected. Assuming that
the thread that is sending requests to the device is I/O-bound, it is
likely to be sleeping at the time the ISR is executed, and thus executing
the ISR on the same processor that sent the requests is cheap.
However, the kernel will not execute the ISR on the "best" processor
unless you explicitly set the affinity. This is because in practice
you will have many such I/O-bound processes and thus many otherwise
idle processors. Then the kernel will execute the ISR on a random
processor, rather than the one that is sending requests to the device.
The alternative to per-CPU virtqueues is per-target virtqueues. To
achieve the same locality, we could dynamically choose the virtqueue's
affinity based on the CPU of the last task that sent a request. This
is less appealing because we do not set the affinity directly: we only
provide a hint to the irqbalance daemon running in userspace. Dynamically
changing the affinity only works if userspace applies the hint
fast enough.
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Asias He <asias@redhat.com>
Tested-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_scsi_target_state is now empty. We will find new uses for it in
the next few patches, so this patch does not drop it completely.
As James suggested, we use the target_alloc and target_destroy entries
in the host template to allocate and destroy the virtio_scsi_target_state
of each target, and attach this struct to scsi_target->hostdata. Now
we can get at it from the sdev with scsi_target(sdev)->hostdata.
No messing around with fixed-size arrays and bulk memory allocation,
and no need to pass in the maximum target size as a parameter, because
everything should now happen dynamically.
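Here is a user-space sketch of the allocate-in-target_alloc, free-in-target_destroy pattern. struct scsi_target_model and struct tgt_state_model are hypothetical stand-ins for scsi_target and virtio_scsi_target_state. calloc() and free() stand in for the kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the SCSI midlayer and driver structures. */
struct scsi_target_model { void *hostdata; };
struct tgt_state_model   { unsigned int reqs; };

/* Allocate per-target state when the target appears. */
static int target_alloc(struct scsi_target_model *starget)
{
	struct tgt_state_model *ts = calloc(1, sizeof(*ts));

	if (!ts)
		return -ENOMEM;
	starget->hostdata = ts;
	return 0;
}

/* Free it when the target goes away; no fixed-size array is needed. */
static void target_destroy(struct scsi_target_model *starget)
{
	free(starget->hostdata);
	starget->hostdata = NULL;
}
```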
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some header files were split or moved to uapi/ without
updating MAINTAINERS.
Signed-off-by: Amos Kong <kongjianjun@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_balloon.h exports "u16" and "u64" to userspace. Use "__u16" and
"__u64" instead.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Check the correct return value from
vringh_notify_enable_kern(). It returns false if
more packets are available, not true.
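To show why the sense of the return value matters, here is a toy ring model. Its notify_enable() follows the convention described above: it returns false if more buffers turned up while notifications were being re-enabled. struct ring_model, notify_enable() and drain() are hypothetical, and the `late` field simulates buffers that arrive during that race window.

```c
#include <stdbool.h>

/* Hypothetical ring model. */
struct ring_model {
	int avail;    /* buffers ready to process */
	int late;     /* buffers that arrive while notify is re-enabled */
	bool notify;  /* are notifications enabled? */
};

/* Returns false if more buffers are available after enabling. */
static bool notify_enable(struct ring_model *r)
{
	r->notify = true;
	r->avail += r->late;  /* simulate buffers racing with the enable */
	r->late = 0;
	return r->avail == 0;
}

/* Process everything; only stop once enabling notify reports "empty". */
static int drain(struct ring_model *r)
{
	int handled = 0;

	for (;;) {
		while (r->avail > 0) {
			r->avail--;
			handled++;
		}
		if (notify_enable(r))  /* true: nothing pending, safe to sleep */
			break;
		r->notify = false;     /* false: more work arrived, keep going */
	}
	return handled;
}
```

Testing for `true` in place of `false` would stop processing exactly when buffers are still pending. Those buffers could then sit unprocessed until the next notification.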
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>