* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code
But leave TTM code alone, something is fishy there with global vm_ops
being used.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (42 commits)
Btrfs: hash the btree inode during fill_super
Btrfs: relocate file extents in clusters
Btrfs: don't rename file into dummy directory
Btrfs: check size of inode backref before adding hardlink
Btrfs: fix releasepage to avoid unlocking extents we haven't locked
Btrfs: Fix test_range_bit for whole file extents
Btrfs: fix errors handling cached state in set/clear_extent_bit
Btrfs: fix early enospc during balancing
Btrfs: deal with NULL space info
Btrfs: account for space used by the super mirrors
Btrfs: fix extent entry threshold calculation
Btrfs: remove dead code
Btrfs: fix bitmap size tracking
Btrfs: don't keep retrying a block group if we fail to allocate a cluster
Btrfs: make balance code choose more wisely when relocating
Btrfs: fix arithmetic error in clone ioctl
Btrfs: add snapshot/subvolume destroy ioctl
Btrfs: change how subvolumes are organized
Btrfs: do not reuse objectid of deleted snapshot/subvol
Btrfs: speed up snapshot dropping
...
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...
The extent relocation code copy file extents one by one when
relocating data block group. This is inefficient if file
extents are small. This patch makes the relocation code copy
file extents in clusters. So we can can make better use of
read-ahead.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
A recent change enforces only one access point to each subvolume. The first
directory entry (the one added when the subvolume/snapshot was created) is
treated as valid access point, all other subvolume links are linked to dummy
empty directories. The dummy directories are temporary inodes that only in
memory, so we can not rename file into them.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
For every hardlink in btrfs, there is a corresponding inode back
reference. All inode back references for hardlinks in a given
directory are stored in single b-tree item. The size of b-tree item
is limited by the size of b-tree leaf, so we can only create limited
number of hardlinks to a given file in a directory.
The original code lacks of the check, it oops if the number of
hardlinks goes over the limit. This patch fixes the issue by adding
check to btrfs_link and btrfs_rename.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
During releasepage, we try to drop any extent_state structs for the
bye offsets of the page we're releaseing. But the code was incorrectly
telling clear_extent_bit to delete the state struct unconditionallly.
Normally this would be fine because we have the page locked, but other
parts of btrfs will lock down an entire extent, the most common place
being IO completion.
releasepage was deleting the extent state without first locking the extent,
which may result in removing a state struct that another process had
locked down. The fix here is to leave the NODATASUM and EXTENT_LOCKED
bits alone in releasepage.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
If test_range_bit finds an extent that goes all the way to (u64)-1, it
can incorrectly wrap the u64 instead of treaing it like the end of
the address space.
This just adds a check for the highest possible offset so we don't wrap.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Both set and clear_extent_bit allow passing a cached
state struct to reduce rbtree search times. clear_extent_bit
was improperly bypassing some of the checks around making sure
the extent state fields were correct for a given operation.
The fix used here (from Yan Zheng) is to use the hit_next
goto target instead of jumping all the way down to start clearing
bits without making sure the cached state was exactly correct
for the operation we were doing.
This also fixes up the setting of the start variable for both
ops in the case where we find an overlapping extent that
begins before the range we want to change. In both cases
we were incorrectly going backwards from the original
requested change.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We now do extra checks before a balance to make sure
there is room for the balance to take place. One of
the checks was testing to see if we were trying to
balance away the last block group of a given type.
If there is no space available for new chunks, we
should not try and balance away the last block group
of a give type. But, the code wasn't checking for
available chunk space, and so it was exiting too soon.
The fix here is to combine some of the checks and make
sure we try to allocate new chunks when we're balancing
the last block group.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
After a balance it is briefly possible for the space info
field in the inode to be NULL. This adds some checks
to make sure things properly deal with the NULL value.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
trivial: fix typo in aic7xxx comment
trivial: fix comment typo in drivers/ata/pata_hpt37x.c
trivial: typo in kernel-parameters.txt
trivial: fix typo in tracing documentation
trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
trivial: remove unnecessary semicolons
trivial: Fix duplicated word "options" in comment
trivial: kbuild: remove extraneous blank line after declaration of usage()
trivial: improve help text for mm debug config options
trivial: doc: hpfall: accept disk device to unload as argument
trivial: doc: hpfall: reduce risk that hpfall can do harm
trivial: SubmittingPatches: Fix reference to renumbered step
trivial: fix typos "man[ae]g?ment" -> "management"
trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
trivial: fix missing printk space in amd_k7_smp_check
trivial: fix typo s/ketymap/keymap/ in comment
trivial: fix typo "to to" in multiple files
trivial: fix typos in comments s/DGBU/DBGU/
...
As we get closer to proper -ENOSPC handling in btrfs, we need more accurate
space accounting for the space info's. Currently we exclude the free space for
the super mirrors, but the space they take up isn't accounted for in any of the
counters. This patch introduces bytes_super, which keeps track of the amount
of bytes used for a super mirror in the block group cache and space info. This
makes sure that our free space caclucations will be completely accurate.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
There is a slight problem with the extent entry threshold calculation for the
free space cache. We only adjust the threshold down as we add bitmaps, but
never actually adjust the threshold up as we add bitmaps. This means we could
fragment the free space so badly that we end up using all bitmaps to describe
the free space, use all the free space which would result in the bitmaps being
freed, but then go to add free space again as we delete things and immediately
add bitmaps since the extent threshold would still be 0. Now as we free
bitmaps the extent threshold will be ratcheted up to allow more extent entries
to be added.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch removes a bunch of dead code from the snapshot removal stuff. It
was confusing me when doing the metadata ENOSPC stuff so I killed it.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When we first go to add free space, we allocate a new info and set the offset
and bytes to the space we are adding. This is fine, except we actually set the
size of a bitmap as we set the bits in it, so if we add space to a bitmap, we'd
end up counting the same space twice. This isn't a huge deal, it just makes
the allocator behave weirdly since it will think that a bitmap entry has more
space than it ends up actually having. I used a BUG_ON() to catch when this
problem happened, and with this patch I no longer get the BUG_ON().
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The box can get locked up in the allocator if we happen upon a block group
under these conditions:
1) During a commit, so caching threads cannot make progress
2) Our block group currently is in the middle of being cached
3) Our block group currently has plenty of free space in it
4) Our block group is so fragmented that it ends up having no free space chunks
larger than min_bytes calculated by btrfs_find_space_cluster.
What happens is we try and do btrfs_find_space_cluster, which fails because it
is unable to find enough free space chunks that are large than min_bytes and
are close enough together. Since the block group is not cached we do a
wait_block_group_cache_progress, which waits for the number of bytes we need,
except the block group already has _plenty_ of free space, its just severely
fragmented, so we loop and try again, ad infinitum. This patch keeps us from
waiting on the block group to finish caching if we failed to find a free space
cluster before. It also makes sure that we don't even try to find a free space
cluster if we are on our last loop in the allocator, since we will have tried
everything at this point at it is futile.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Currently, we can panic the box if the first block group we go to move is of a
type where there is no space left to move those extents. For example, if we
fill the disk up with data, and then we try to balance and we have no room to
move the data nor room to allocate new chunks, we will panic. Change this by
checking to see if we have room to move this chunk around, and if not, return
-ENOSPC and move on to the next chunk. This will make sure we remove block
groups that are moveable, like if we have alot of empty metadata block groups,
and then that way we make room to be able to balance our data chunks as well.
Tested this with an fs that would panic on btrfs-vol -b normally, but no longer
panics with this patch.
V1->V2:
-actually search for a free extent on the device to make sure we can allocate a
chunk if need be.
-fix btrfs_shrink_device to make sure we actually try to relocate all the
chunks, and then if we can't return -ENOSPC so if we are doing a btrfs-vol -r
we don't remove the device with data still on it.
-check to make sure the block group we are going to relocate isn't the last one
in that particular space
-fix a bug in btrfs_shrink_device where we would change the device's size and
not fix it if we fail to do our relocate
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Fix an arithmetic error that was breaking extents cloned via the clone
ioctl starting in the second half of a file.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch adds snapshot/subvolume destroy ioctl. A subvolume that isn't being
used and doesn't contains links to other subvolumes can be destroyed.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>