"trans->transid" is cpu endian but we want to store the data as little
endian. "item->ctime.nsec" is only 32 bits, not 64.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
This is the kernel portion of btrfs send/receive
Conflicts:
fs/btrfs/Makefile
fs/btrfs/backref.h
fs/btrfs/ctree.c
fs/btrfs/ioctl.c
fs/btrfs/ioctl.h
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This patch introduces the BTRFS_IOC_SEND ioctl that is
required for send. It allows btrfs-progs to implement
full and incremental sends. Patches for btrfs-progs will
follow.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
This patch introduces uuids for subvolumes. Each
subvolume has it's own uuid. In case it was snapshotted,
it also contains parent_uuid. In case it was received,
it also contains received_uuid.
It also introduces subvolume ctime/otime/stime/rtime. The
first two are comparable to the times found in inodes. otime
is the origin/creation time and ctime is the change time.
stime/rtime are only valid on received subvolumes.
stime is the time of the subvolume when it was
sent. rtime is the time of the subvolume when it was
received.
Additionally to the times, we have a transid for each
time. They are updated at the same place as the times.
btrfs receive uses stransid and rtransid to find out
if a received subvolume changed in the meantime.
If an older kernel mounts a filesystem with the
extented fields, all fields become invalid. The next
mount with a new kernel will detect this and reset the
fields.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Each ordered operation has a free callback, and this was called with the
worker spinlock held. Josef made the free callback also call iput,
which we can't do with the spinlock.
This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list. We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org
In support of the recently added capability to remount with lzo
compression, provide a helper function to check the compression
INCOMPAT flags when remounting with lzo compression, and set
the flags if necessary.
Also, implement the new helper function when defragmenting with
explicit lzo compression and when setting the default subvolume.
Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Often no exact match is wanted but just the next lower or
higher item. There's a lot of duplicated code throughout
btrfs to deal with the corner cases. This patch adds a
helper function that can facilitate searching.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Lift the EXDEV condition and allow different root trees for files being
cloned, then pass source inode's root when searching for extents.
Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from
one filesystem are mounted separately.
Signed-off-by: David Sterba <dsterba@suse.cz>
While testing with my buffer read fio jobs[1], I find that btrfs does not
perform well enough.
Here is a scenario in fio jobs:
We have 4 threads, "t1 t2 t3 t4", starting to buffer read a same file,
and all of them will race on add_to_page_cache_lru(), and if one thread
successfully puts its page into the page cache, it takes the responsibility
to read the page's data.
And what's more, reading a page needs a period of time to finish, in which
other threads can slide in and process rest pages:
t1 t2 t3 t4
add Page1
read Page1 add Page2
| read Page2 add Page3
| | read Page3 add Page4
| | | read Page4
-----|------------|-----------|-----------|--------
v v v v
bio bio bio bio
Now we have four bios, each of which holds only one page since we need to
maintain consecutive pages in bio. Thus, we can end up with far more bios
than we need.
Here we're going to
a) delay the real read-page section and
b) try to put more pages into page cache.
With that said, we can make each bio hold more pages and reduce the number
of bios we need.
Here is some numbers taken from fio results:
w/o patch w patch
------------- -------- ---------------
READ: 745MB/s +25% 934MB/s
[1]:
[global]
group_reporting
thread
numjobs=4
bs=32k
rw=read
ioengine=sync
directory=/mnt/btrfs/
[READ]
filename=foobar
size=2000M
invalidate=1
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
For backref walking, we've introduce delayed ref's sequence. However,
it changes our preallocation behavior.
The story is that when we preallocate an extent and then mark it written
piece by piece, the ideal case should be that we don't need to COW the
extent, which is why we use 'preallocate'.
But we may not make use of preallocation, since when we check for cross refs on
the extent, we may have two ref entries which have the same content except
the sequence value, and we recognize them as cross refs and do COW to allocate
another extent.
So we end up with several pieces of space instead of an whole extent.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
There is a small window where an eb can have no IO bits set on it, which
could potentially result in extent_buffer_under_io() returning false when we
want it to return true, which could result in not fun things happening. So
in order to protect this case we need to hold the refs_lock when we make
this transition to make sure we get reliable results out of
extent_buffer_udner_io(). Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
This sounds sort of impossible but it is the only thing I can think of and
at the very least it is theoretically possible so here it goes.
If we are in try_release_extent_buffer we will check that the ref count on
the extent buffer is 1 and not under IO, and then go down and clear the tree
ref. If between this check and clearing the tree ref somebody else comes in
and grabs a ref on the eb and the marks it dirty before
try_release_extent_buffer() does it's tree ref clear we can end up with a
dirty eb that will be freed while it is still dirty which will result in a
panic. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
I noticed while looking at an extent_buffer race that we will
unconditionally return 1 if we get down to release_extent_buffer after
clearing the tree ref. However we can easily race in here and get a ref on
the eb and not actually free the eb. So make release_extent_buffer return 1
if it free'd the eb and 0 if not so we can be a little kinder to the vm.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Code is added to suppress the I/O stats printing at mount time if all
statistic values are zero.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
People complained about the annoying kernel log message
"btrfs: no dev_stats entry found ... (OK on first mount after mkfs)"
everytime a filesystem is mounted for the first time after running
mkfs. Since the distribution of the btrfs-progs is not synchronized
to the kernel version, mkfs like it is now will be used also in the
future. Then this message is not useful to find errors, it is just
annoying. This commit removes the printk().
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and
btrfs_foo() functions, which read and write specific fields in the
extent buffer.
The total number of set/get functions is ~200, but in fact we only
need 8 functions: 2 for u8 field, 2 for u16, 2 for u32 and 2 for u64.
It results in redunction of ~37K bytes.
text data bss dec hex filename
629661 12489 216 642366 9cd3e fs/btrfs/btrfs.o.orig
592637 12489 216 605342 93c9e fs/btrfs/btrfs.o
Signed-off-by: Li Zefan <lizefan@huawei.com>