* git://git.infradead.org/iommu-2.6:
intel-iommu: Fix off-by-one in RMRR setup
intel-iommu: Add domain check in domain_remove_one_dev_info
intel-iommu: Remove Host Bridge devices from identity mapping
intel-iommu: Use coherent DMA mask when requested
intel-iommu: Dont cache iova above 32bit
intel-iommu: Speed up processing of the identity_mapping function
intel-iommu: Check for identity mapping candidate using system dma mask
intel-iommu: Only unlink device domains from iommu
intel-iommu: Enable super page (2MiB, 1GiB, etc.) support
intel-iommu: Flush unmaps at domain_exit
intel-iommu: Remove obsolete comment from detect_intel_iommu
intel-iommu: fix VT-d PMR disable for TXT on S3 resume
There are no externally-visible changes with this. In the loop in the
internal __domain_mapping() function, we simply detect if we are mapping:
- size >= 2MiB, and
- virtual address aligned to 2MiB, and
- physical address aligned to 2MiB, and
- on hardware that supports superpages.
(and likewise for larger superpages).
We automatically use a superpage for such mappings. We never have to
worry about *breaking* superpages, since we trust that we will always
*unmap* the same range that was mapped. So all we need to do is ensure
that dma_pte_clear_range() will also cope with superpages.
Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so
it can return a PTE at the appropriate level rather than always
extending the page tables all the way down to level 1. Again, this is
simplified by the fact that we should never encounter existing small
pages when we're creating a mapping; any old mapping that used the same
virtual range will have been entirely removed and its obsolete page
tables freed.
Provide an 'intel_iommu=sp_off' argument on the command line as a
chicken bit. Not that it should ever be required.
==
The original commit seen in the iommu-2.6.git was Youquan's
implementation (and completion) of my own half-baked code which I'd
typed into an email. Followed by half a dozen subsequent 'fixes'.
I've taken the unusual step of rewriting history and collapsing the
original commits in order to keep the main history simpler, and make
life easier for the people who are going to have to backport this to
older kernels. And also so I can give it a more coherent commit comment
which (hopefully) gives a better explanation of what's going on.
The original sequence of commits leading to identical code was:
Youquan Song (3):
intel-iommu: super page support
intel-iommu: Fix superpage alignment calculation error
intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()
David Woodhouse (4):
intel-iommu: Precalculate superpage support for dmar_domain
intel-iommu: Fix hardware_largepage_caps()
intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Fix build warnings in physmap.h:
include/linux/mtd/physmap.h:25: warning: 'struct platform_device' declared inside parameter list
include/linux/mtd/physmap.h:25: warning: its scope is only this definition or declaration, which is probably not what you want
include/linux/mtd/physmap.h:26: warning: 'struct platform_device' declared inside parameter list
include/linux/mtd/physmap.h:27: warning: 'struct platform_device' declared inside parameter list
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Add an API that tells the other side that callbacks
should be delayed until a lot of work has been done.
Implement using the new event_idx feature.
Note: it might seem advantageous to let the drivers
ask for a callback after a specific capacity has
been reached. However, as a single head can
free many entries in the descriptor table,
we don't really have a clue about capacity
until get_buf is called. The API is the simplest
to implement at the moment, we'll see what kind of
hints drivers can pass when there's more than one
user of the feature.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With the new used_event and avail_event and features, both
host and guest need similar logic to check whether events are
enabled, so it helps to put the common code in the header.
Note that Xen has similar logic for notification hold-off
in include/xen/interface/io/ring.h with req_event and req_prod
corresponding to event_idx + 1 and new_idx respectively.
+1 comes from the fact that req_event and req_prod in Xen start at 1,
while event index in virtio starts at 0.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Define a new feature bit for the guest and host to utilize
an event index (like Xen) instead if a flag bit to enable/disable
interrupts and kicks.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Thomas Gleixner reports that we now have a boot crash triggered by
CONFIG_CPUMASK_OFFSTACK=y:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c11ae035>] find_next_bit+0x55/0xb0
Call Trace:
[<c11addda>] cpumask_any_but+0x2a/0x70
[<c102396b>] flush_tlb_mm+0x2b/0x80
[<c1022705>] pud_populate+0x35/0x50
[<c10227ba>] pgd_alloc+0x9a/0xf0
[<c103a3fc>] mm_init+0xec/0x120
[<c103a7a3>] mm_alloc+0x53/0xd0
which was introduced by commit de03c72cfc ("mm: convert
mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of
mm_init() vs mm_init_cpumask
Thomas wrote a patch to just fix the ordering of initialization, but I
hate the new double allocation in the fork path, so I ended up instead
doing some more radical surgery to clean it all up.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux: (22 commits)
nfsd: make local functions static
NFSD: Remove unused variable from nfsd4_decode_bind_conn_to_session()
NFSD: Check status from nfsd4_map_bcts_dir()
NFSD: Remove setting unused variable in nfsd_vfs_read()
nfsd41: error out on repeated RECLAIM_COMPLETE
nfsd41: compare request's opcnt with session's maxops at nfsd4_sequence
nfsd v4.1 lOCKT clientid field must be ignored
nfsd41: add flag checking for create_session
nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly
nfsd4: fix wrongsec handling for PUTFH + op cases
nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller
nfsd4: introduce OPDESC helper
nfsd4: allow fh_verify caller to skip pseudoflavor checks
nfsd: distinguish functions of NFSD_MAY_* flags
svcrpc: complete svsk processing on cb receive failure
svcrpc: take advantage of tcp autotuning
SUNRPC: Don't wait for full record to receive tcp data
svcrpc: copy cb reply instead of pages
svcrpc: close connection if client sends short packet
svcrpc: note network-order types in svc_process_calldir
...
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm kcopyd: return client directly and not through a pointer
dm kcopyd: reserve fewer pages
dm io: use fixed initial mempool size
dm kcopyd: alloc pages from the main page allocator
dm kcopyd: add gfp parm to alloc_pl
dm kcopyd: remove superfluous page allocation spinlock
dm kcopyd: preallocate sub jobs to avoid deadlock
dm kcopyd: avoid pointless job splitting
dm mpath: do not fail paths after integrity errors
dm table: reject devices without request fns
dm table: allow targets to support discards internally
* 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
SUNRPC: Support for RPC over AF_LOCAL transports
SUNRPC: Remove obsolete comment
SUNRPC: Use AF_LOCAL for rpcbind upcalls
SUNRPC: Clean up use of curly braces in switch cases
NFS: Revert NFSROOT default mount options
SUNRPC: Rename xs_encode_tcp_fragment_header()
nfs,rcu: convert call_rcu(nfs_free_delegation_callback) to kfree_rcu()
nfs41: Correct offset for LAYOUTCOMMIT
NFS: nfs_update_inode: print current and new inode size in debug output
NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors
NFSv4: Handle expired stateids when the lease is still valid
SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI EC: remove redundant code
ACPI: Add D3 cold state
ACPI: processor: fix processor_physically_present in UP kernel
ACPI: Split out custom_method functionality into an own driver
ACPI: Cleanup custom_method debug stuff
ACPI EC: enable MSI workaround for Quanta laptops
ACPICA: Update to version 20110413
ACPICA: Execute an orphan _REG method under the EC device
ACPICA: Move ACPI_NUM_PREDEFINED_REGIONS to a more appropriate place
ACPICA: Update internal address SpaceID for DataTable regions
ACPICA: Add more methods eligible for NULL package element removal
ACPICA: Split all internal Global Lock functions to new file - evglock
ACPI: EC: add another DMI check for ASUS hardware
ACPI EC: remove dead code
ACPICA: Fix code divergence of global lock handling
ACPICA: Use acpi_os_create_lock interface
ACPI: osl, add acpi_os_create_lock interface
ACPI:Fix goto flows in thermal-sys
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param
x86 idle: deprecate "no-hlt" cmdline param
x86 idle APM: deprecate CONFIG_APM_CPU_IDLE
x86 idle floppy: deprecate disable_hlt()
x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it
x86 idle: clarify AMD erratum 400 workaround
idle governor: Avoid lock acquisition to read pm_qos before entering idle
cpuidle: menu: fixed wrapping timers at 4.294 seconds
NFSv4.1 LAYOUTRETURN implementation
Currently, does not support layout-type payload encoding.
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
[call pnfs_return_layout right before pnfs_destroy_layout]
[remove assert_spin_locked from pnfs_clear_lseg_list]
[remove wait parameter from the layoutreturn path.]
[remove return_type field from nfs4_layoutreturn_args]
[remove range from nfs4_layoutreturn_args]
[no need to send layoutcommit from _pnfs_return_layout]
[don't wait on sync layoutreturn]
[fix layout stateid in layoutreturn args]
[fixed NULL deref in _pnfs_return_layout]
[removed recaim member of nfs4_layoutreturn_args]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Non-rpc layout driver such as for objects and blocks
implement their own I/O path and error handling logic.
Therefore bypass NFS-based error handling for these layout drivers.
[fix lseg ref-count bugs, and null de-refs]
[Fall out from: non-rpc layout drivers]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[get rid of PNFS_USE_RPC_CODE]
[get rid of __nfs4_write_done_cb]
[revert useless change in nfs4_write_done_cb]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
* Add the pnfs_osd_xdr.h header
* defintions the pnfs_osd_layout structure including all it's
sub-types and constants.
* Declare the pnfs_osd_xdr_decode_layout API + all needed
inline helpers.
* Define the pnfs_osd_deviceaddr structure and all its subtypes and
constants.
* Declare API for decoding of a pnfs_osd_deviceaddr from XDR stream.
* Define the pnfs_osd_ioerr structure, its substructures and constants.
* Declare API for encoding of a pnfs_osd_ioerr into XDR stream.
* Define the pnfs_osd_layoutupdate structure and its substructures.
* Declare API for encoding of a pnfs_osd_layoutupdate into XDR stream.
[Remove server definitions]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Return client directly from dm_kcopyd_client_create, not through a
parameter, making it consistent with dm_io_client_create.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Reserve just the minimum of pages needed to process one job.
Because we allocate pages from page allocator, we don't need to reserve
a large number of pages. The maximum job size is SUB_JOB_SIZE and we
calculate the number of reserved pages based on this.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Replace the arbitrary calculation of an initial io struct mempool size
with a constant.
The code calculated the number of reserved structures based on the request
size and used a "magic" multiplication constant of 4. This patch changes
it to reserve a fixed number - itself still chosen quite arbitrarily.
Further testing might show if there is a better number to choose.
Note that if there is no memory pressure, we can still allocate an
arbitrary number of "struct io" structures. One structure is enough to
process the whole request.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>