linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Nick Piggin	b0018361c3	block: bd_start_claiming cleanup I don't like the subtle multi-context code in bd_claim (ie. detects where it has been called based on bd_claiming). It seems clearer to just require a new function to finish a 2-part claim. Also improve commentary in bd_start_claiming as to how it should be used. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Nick Piggin	cf3425707e	block: bd_start_claiming fix module refcount bd_start_claiming has an unbalanced module_put introduced in `6b4517a79`. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Nick Piggin	3322e79a38	fs: convert simple fs to new truncate Convert simple filesystems: ramfs, configfs, sysfs, block_dev to new truncate sequence. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:15:47 -04:00
Christoph Hellwig	7ea8085910	drop unused dentry argument to ->fsync Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:05:02 -04:00
Linus Torvalds	e8bebe2f71	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits) fix handling of offsets in cris eeprom.c, get rid of fake on-stack files get rid of home-grown mutex in cris eeprom.c switch ecryptfs_write() to struct inode , kill on-stack fake files switch ecryptfs_get_locked_page() to struct inode simplify access to ecryptfs inodes in ->readpage() and friends AFS: Don't put struct file on the stack Ban ecryptfs over ecryptfs logfs: replace inode uid,gid,mode initialization with helper function ufs: replace inode uid,gid,mode initialization with helper function udf: replace inode uid,gid,mode init with helper ubifs: replace inode uid,gid,mode initialization with helper function sysv: replace inode uid,gid,mode initialization with helper function reiserfs: replace inode uid,gid,mode initialization with helper function ramfs: replace inode uid,gid,mode initialization with helper function omfs: replace inode uid,gid,mode initialization with helper function bfs: replace inode uid,gid,mode initialization with helper function ocfs2: replace inode uid,gid,mode initialization with helper function nilfs2: replace inode uid,gid,mode initialization with helper function minix: replace inode uid,gid,mode init with helper ext4: replace inode uid,gid,mode init with helper ... Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)	2010-05-21 19:37:45 -07:00
Josef Bacik	18e9e5104f	Introduce freeze_super and thaw_super for the fsfreeze ioctl Currently the way we do freezing is by passing sb>s_bdev to freeze_bdev and then letting it do all the work. But freezing is more of an fs thing, and doesn't really have much to do with the bdev at all, all the work gets done with the super. In btrfs we do not populate s_bdev, since we can have multiple bdev's for one fs and setting s_bdev makes removing devices from a pool kind of tricky. This means that freezing a btrfs filesystem fails, which causes us to corrupt with things like tux-on-ice which use the fsfreeze mechanism. So instead of populating sb->s_bdev with a random bdev in our pool, I've broken the actual fs freezing stuff into freeze_super and thaw_super. These just take the super_block that we're freezing and does the appropriate work. It's basically just copy and pasted from freeze_bdev. I've then converted freeze_bdev over to use the new super helpers. I've tested this with ext4 and btrfs and verified everything continues to work the same as before. The only new gotcha is multiple calls to the fsfreeze ioctl will return EBUSY if the fs is already frozen. I thought this was a better solution than adding a freeze counter to the super_block, but if everybody hates this idea I'm open to suggestions. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-21 18:31:18 -04:00
Al Viro	d3f2147307	Move grabbing s_umount to callers of grab_super() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-21 18:31:17 -04:00
Jens Axboe	7407cf355f	Merge branch 'master' into for-2.6.35 Conflicts: fs/block_dev.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-29 09:36:24 +02:00
Dmitry Monakhov	fbd9b09a17	blkdev: generalize flags for blkdev_issue_fn functions The patch just convert all blkdev_issue_xxx function to common set of flags. Wait/allocation semantics preserved. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-28 19:47:36 +02:00
Tejun Heo	6b4517a791	block: implement bd_claiming and claiming block Currently, device claiming for exclusive open is done after low level open - disk->fops->open() - has completed successfully. This means that exclusive open attempts while a device is already exclusively open will fail only after disk->fops->open() is called. cdrom driver issues commands during open() which means that O_EXCL open attempt can unintentionally inject commands to in-progress command stream for burning thus disturbing burning process. In most cases, this doesn't cause problems because the first command to be issued is TUR which most devices can process in the middle of burning. However, depending on how a device replies to TUR during burning, cdrom driver may end up issuing further commands. This can't be resolved trivially by moving bd_claim() before doing actual open() because that means an open attempt which will end up failing could interfere other legit O_EXCL open attempts. ie. unconfirmed open attempts can fail others. This patch resolves the problem by introducing claiming block which is started by bd_start_claiming() and terminated either by bd_claim() or bd_abort_claiming(). bd_claim() from inside a claiming block is guaranteed to succeed and once a claiming block is started, other bd_start_claiming() or bd_claim() attempts block till the current claiming block is terminated. bd_claim() can still be used standalone although now it always synchronizes against claiming blocks, so the existing users will keep working without any change. blkdev_open() and open_bdev_exclusive() are converted to use claiming blocks so that exclusive open attempts from these functions don't interfere with the existing exclusive open. This problem was discovered while investigating bko#15403. https://bugzilla.kernel.org/show_bug.cgi?id=15403 The burning problem itself can be resolved by updating userspace probing tools to always open w/ O_EXCL. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Matthias-Christian Ott <ott@mirix.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-27 10:57:54 +02:00
Tejun Heo	1a3cbbc5a5	block: factor out bd_may_claim() Factor out bd_may_claim() from bd_claim(), add comments and apply a couple of cosmetic edits. This is to prepare for further updates to claim path. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-27 10:57:54 +02:00
Anton Blanchard	b8af67e268	fs/block_dev.c: fix performance regression in O_DIRECT\|O_SYNC writes to block devices We are seeing a large regression in database performance on recent kernels. The database opens a block device with O_DIRECT\|O_SYNC and a number of threads write to different regions of the file at the same time. A simple test case is below. I haven't defined DEVICE since getting it wrong will destroy your data :) On an 3 disk LVM with a 64k chunk size we see about 17MB/sec and only a few threads in IO wait: procs -----io---- -system-- -----cpu------ r b bi bo in cs us sy id wa st 0 3 0 16170 656 2259 0 0 86 14 0 0 2 0 16704 695 2408 0 0 92 8 0 0 2 0 17308 744 2653 0 0 86 14 0 0 2 0 17933 759 2777 0 0 89 10 0 Most threads are blocking in vfs_fsync_range, which has: mutex_lock(&mapping->host->i_mutex); err = fop->fsync(file, dentry, datasync); if (!ret) ret = err; mutex_unlock(&mapping->host->i_mutex); commit `148f948ba8` (vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers some explanation of what is going on: Use these new helpers for syncing from generic VFS functions. This makes O_SYNC writes to block devices acquire i_mutex for syncing. If we really care about this, we can make block_fsync() drop the i_mutex and reacquire it before it returns. Thanks Jan for such a good commit message! As well as dropping i_mutex, Christoph suggests we should remove the call to sync_blockdev(): > sync_blockdev is an overcomplicated alias for filemap_write_and_wait on > the block device inode, which is exactly what we did just before calling > into ->fsync The patch below incorporates both suggestions. With it the testcase improves from 17MB/s to 68M/sec: procs -----io---- -system-- -----cpu------ r b bi bo in cs us sy id wa st 0 7 0 65536 1000 3878 0 0 70 30 0 0 34 0 69632 1016 3921 0 1 46 53 0 0 57 0 69632 1000 3921 0 0 55 45 0 0 53 0 69640 754 4111 0 0 81 19 0 Testcase: #define _GNU_SOURCE #include <stdio.h> #include <pthread.h> #include <unistd.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define NR_THREADS 64 #define BUFSIZE (64 * 1024) #define DEVICE "/dev/mapper/XXXXXX" #define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1)) static int fd; static void doit(void arg) { unsigned long offset = (long)arg; char b, buf; b = malloc(BUFSIZE + 1024); buf = (char )ALIGN((unsigned long)b, 1024); memset(buf, 0, BUFSIZE); while (1) pwrite(fd, buf, BUFSIZE, offset); } int main(int argc, char argv[]) { int flags = O_RDWR\|O_DIRECT; int i; unsigned long offset = 0; if (argc > 1 && !strcmp(argv[1], "O_SYNC")) flags \|= O_SYNC; fd = open(DEVICE, flags); if (fd == -1) { perror("open"); exit(1); } for (i = 0; i < NR_THREADS-1; i++) { pthread_t tid; pthread_create(&tid, NULL, doit, (void )offset); offset += BUFSIZE; } doit((void )offset); return 0; } Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-24 11:31:26 -07:00
Andrew Morton	b1dd3b2843	vfs: rename block_fsync() to blkdev_fsync() Requested by hch, for consistency now it is exported. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Anton Blanchard <anton@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-07 08:38:04 -07:00
Anton Blanchard	55ab3a1ff8	raw: fsync method is now required Commit `148f948ba8` (vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke the raw driver. We now call through generic_file_aio_write -> generic_write_sync -> vfs_fsync_range. vfs_fsync_range has: if (!fop \|\| !fop->fsync) { ret = -EINVAL; goto out; } But drivers/char/raw.c doesn't set an fsync method. We have two options: fix it or remove the raw driver completely. I'm happy to do either, the fact this has been broken for so long suggests it is rarely used. The patch below adds an fsync method to the raw driver. My knowledge of the block layer is pretty sketchy so this could do with a once over. If we instead decide to remove the raw driver, this patch might still be useful as a backport to 2.6.33 and 2.6.32. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-07 08:38:04 -07:00
Jun'ichi Nomura	4b06e5b9ad	freeze_bdev: don't deactivate successfully frozen MS_RDONLY sb Thanks Thomas and Christoph for testing and review. I removed 'smp_wmb()' before up_write from the previous patch, since up_write() should have necessary ordering constraints. (I.e. the change of s_frozen is visible to others after up_write) I'm quite sure the change is harmless but if you are uncomfortable with Tested-by/Reviewed-by on the modified patch, please remove them. If MS_RDONLY, freeze_bdev should just up_write(s_umount) instead of deactivate_locked_super(). Also, keep sb->s_frozen consistent so that remount can check the frozen state. Otherwise a crash reported here can happen: http://lkml.org/lkml/2010/1/16/37 http://lkml.org/lkml/2010/1/28/53 This patch should be applied for 2.6.32 stable series, too. Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Thomas Backlund <tmb@mandriva.org> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-02-07 03:06:21 -05:00
Jens Axboe	2058297d2d	Merge branch 'for-linus' into for-2.6.33 Conflicts: block/cfq-iosched.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-11-03 21:14:39 +01:00
Christoph Hellwig	ab0a9735e0	blkdev: flush disk cache on ->fsync Currently there is no barrier support in the block device code. That means we cannot guarantee any sort of data integerity when using the block device node with dis kwrite caches enabled. Using the raw block device node is a typical use case for virtualization (and I assume databases, too). This patch changes block_fsync to issue a cache flush and thus make fsync on block device nodes actually useful. Note that in mainline we would also need to add such code to the ->aio_write method for O_SYNC handling, but assuming that Jan's patch series for the O_SYNC rewrite goes in it will also call into ->fsync for 2.6.32. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-29 14:14:04 +01:00
Neil Brown	960cc0f4fe	block: use after free bug in __blkdev_get commit `0762b8bde9` (from 14 months ago) introduced a use-after-free bug which has just recently started manifesting in my md testing. I tried git bisect to find out what caused the bug to start manifesting, and it could have been the recent change to blk_unregister_queue (`48c0d4d4c0`) but the results were inconclusive. This patch certainly fixes my symptoms and looks correct as the two calls are now in the same order as elsewhere in that function. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-26 15:27:11 +01:00
Christoph Hellwig	4504230a71	freeze_bdev: grab active reference to frozen superblocks Currently we held s_umount while a filesystem is frozen, despite that we might return to userspace and unlock it from a different process. Instead grab an active reference to keep the file system busy and add an explicit check for frozen filesystems in remount and reject the remount instead of blocking on s_umount. Add a new get_active_super helper to super.c for use by freeze_bdev that grabs an active reference to a superblock from a given block device. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-09-24 07:47:41 -04:00
Christoph Hellwig	4fadd7bb20	freeze_bdev: kill bd_mount_sem Now that we have the freeze count there is not much reason for bd_mount_sem anymore. The actual freeze/thaw operations are serialized using the bd_fsfreeze_mutex, and the only other place we take bd_mount_sem is get_sb_bdev which tries to prevent mounting a filesystem while the block device is frozen. Instead of add a check for bd_fsfreeze_count and return -EBUSY if a filesystem is frozen. While that is a change in user visible behaviour a failing mount is much better for this case rather than having the mount process stuck uninterruptible for a long time. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-09-24 07:47:39 -04:00
Alexey Dobriyan	83d5cde47d	const: make block_device_operations const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:25 -07:00
Jens Axboe	2c96ce9f20	fs: remove bdev->bd_inode_backing_dev_info It has been unused since it was introduced in: commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac Author: Andrew Morton <akpm@osdl.org> Date: Fri May 21 00:46:17 2004 -0700 [PATCH] block device layer: separate backing_dev_info infrastructure So lets just kill it. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:16:18 +02:00
Christoph Hellwig	eef9938067	vfs: Rename generic_file_aio_write_nolock generic_file_aio_write_nolock() is now used only by block devices and raw character device. Filesystems should use __generic_file_aio_write() in case generic_file_aio_write() doesn't suit them. So rename the function to blkdev_aio_write() and move it to fs/blockdev.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:15 +02:00
Alan Jenkins	dddac6a7b4	PM / Hibernate: Replace bdget call with simple atomic_inc of i_count Create bdgrab(). This function copies an existing reference to a block_device. It is safe to call from any context. Hibernation code wishes to copy a reference to the active swap device. Right now it calls bdget() under a spinlock, but this is wrong because bdget() can sleep. It doesn't need a full bdget() because we already hold a reference to active swap devices (and the spinlock protects against swapoff). Fixes http://bugzilla.kernel.org/show_bug.cgi?id=13827 Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2009-07-29 21:07:55 +02:00
Jan Kara	60b0680fa2	vfs: Rename fsync_super() to sync_filesystem() (version 4) Rename the function so that it better describe what it really does. Also remove the unnecessary include of buffer_head.h. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:04 -04:00

... 2 3 4 5 6 ...

215 Commits