Commit Graph

35272 Commits

Author SHA1 Message Date
Andreas Rohner 2cc88f3a5f nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl
With this ioctl the segment usage entries in the SUFILE can be updated
from userspace.

This is useful, because it allows the userspace GC to modify and update
segment usage entries for specific segments, which enables it to avoid
unnecessary write operations.

If a segment needs to be cleaned, but there is no or very little
reclaimable space in it, the cleaning operation basically degrades to a
useless moving operation.  In the end the only thing that changes is the
location of the data and a timestamp in the segment usage information.
With this ioctl the GC can skip the cleaning and update the segment
usage entries directly instead.

This is basically a shortcut to cleaning the segment.  It is still
necessary to read the segment summary information, but the writing of
the live blocks can be skipped if it's not worth it.

[konishi.ryusuke@lab.ntt.co.jp: add description of NILFS_IOCTL_SET_SUINFO ioctl]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:25 -07:00
Andreas Rohner 00e9ffcd27 nilfs2: add nilfs_sufile_set_suinfo to update segment usage
Introduce nilfs_sufile_set_suinfo(), which expects an array of
nilfs_suinfo_update structures and updates the segment usage information
accordingly.

This is basically a helper function for the newly introduced
NILFS_IOCTL_SET_SUINFO ioctl.

[konishi.ryusuke@lab.ntt.co.jp: use put_bh() instead of brelse() because we know bh != NULL]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:25 -07:00
Fabian Frederick 5f356fd4d7 fs/coda/inode.c: add __init to init_inodecache()
init_inodecache is only called by __init init_coda

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:25 -07:00
Fabian Frederick dac52fc182 BEFS: logging cleanup
Summary:
 - all printk(KERN_foo converted to pr_foo()
 - add pr_fmt and remove redundant prefixes
 - convert befs_() to va_format (based on patch by Joe Perches)
 - remove non standard %Lu
 - use __func__ for all debugging

[akpm@linux-foundation.org: fix printk warnings, reported by Fengguang]
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Cc: Fengguang Wu <fengguang.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:25 -07:00
Fabian Frederick 91a52ab7d6 fs/befs/linuxvfs.c: add __init to befs_init_inodecache()
init_inodecache is only called by __init init_befs_fs.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:24 -07:00
Fabian Frederick 2d4319ef57 befs: replace kmalloc/memset 0 by kzalloc
Use kzalloc for clean fs_info allocation like other filesystems.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:24 -07:00
Fabian Frederick 84ee353df0 fs/minix/inode.c: add __init to init_inodecache()
init_inodecache is only called by __init init_minix_fs.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:24 -07:00
Luis Henriques b003f96502 binfmt_misc: add missing 'break' statement
A missing 'break' statement in bm_status_write() results in a user program
receiving '3' when doing the following:

  write(fd, "-1", 2);

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:16 -07:00
Fabian Frederick 7a42d4b6a1 fs/efs/super.c: add __init to init_inodecache()
init_inodecache is only called by __init init_efs_fs.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:16 -07:00
Josh Triplett 69369a7003 fs, kernel: permit disabling the uselib syscall
uselib hasn't been used since libc5; glibc does not use it.  Support
turning it off.

When disabled, also omit the load_elf_library implementation from
binfmt_elf.c, which only uselib invokes.

bloat-o-meter:
add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785)
function                                     old     new   delta
padzero                                       39      36      -3
uselib_flags                                  20       -     -20
sys_uselib                                   168       -    -168
SyS_uselib                                   168       -    -168
load_elf_library                             426       -    -426

The new CONFIG_USELIB defaults to `y'.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:05 -07:00
Wang YanQing 8f6c5ffc89 kernel/groups.c: remove return value of set_groups
After commit 6307f8fee2 ("security: remove dead hook task_setgroups"),
set_groups will always return zero, so we could just remove return value
of set_groups.

This patch reduces code size, and simplfies code to use set_groups,
because we don't need to check its return value any more.

[akpm@linux-foundation.org: remove obsolete claims from set_groups() comment]
Signed-off-by: Wang YanQing <udknight@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:05 -07:00
Fabian Frederick 6af9f7bf3c sys_sysfs: Add CONFIG_SYSFS_SYSCALL
sys_sysfs is an obsolete system call no longer supported by libc.

 - This patch adds a default CONFIG_SYSFS_SYSCALL=y

 - Option can be turned off in expert mode.

 - cond_syscall added to kernel/sys_ni.c

[akpm@linux-foundation.org: tweak Kconfig help text]
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:05 -07:00
Dave Hansen 5509a5d27b drop_caches: add some documentation and info message
There is plenty of anecdotal evidence and a load of blog posts
suggesting that using "drop_caches" periodically keeps your system
running in "tip top shape".  Perhaps adding some kernel documentation
will increase the amount of accurate data on its use.

If we are not shrinking caches effectively, then we have real bugs.
Using drop_caches will simply mask the bugs and make them harder to
find, but certainly does not fix them, nor is it an appropriate
"workaround" to limit the size of the caches.  On the contrary, there
have been bug reports on issues that turned out to be misguided use of
cache dropping.

Dropping caches is a very drastic and disruptive operation that is good
for debugging and running tests, but if it creates bug reports from
production use, kernel developers should be aware of its use.

Add a bit more documentation about it, a syslog message to track down
abusers, and vmstat drop counters to help analyze problem reports.

[akpm@linux-foundation.org: checkpatch fixes]
[hannes@cmpxchg.org: add runtime suppression control]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:04 -07:00
Sasha Levin 67f9fd91f9 mm: remove read_cache_page_async()
This patch removes read_cache_page_async() which wasn't really needed
anywhere and simplifies the code around it a bit.

read_cache_page_async() is useful when we want to read a page into the
cache without waiting for it to complete.  This happens when the
appropriate callback 'filler' doesn't complete its read operation and
releases the page lock immediately, and instead queues a different
completion routine to do that.  This never actually happened anywhere in
the code.

read_cache_page_async() had 3 different callers:

- read_cache_page() which is the sync version, it would just wait for
  the requested read to complete using wait_on_page_read().

- JFFS2 would call it from jffs2_gc_fetch_page(), but the filler
  function it supplied doesn't do any async reads, and would complete
  before the filler function returns - making it actually a sync read.

- CRAMFS would call it using the read_mapping_page_async() wrapper, with
  a similar story to JFFS2 - the filler function doesn't do anything that
  reminds async reads and would always complete before the filler function
  returns.

To sum it up, the code in mm/filemap.c never took advantage of having
read_cache_page_async().  While there are filler callbacks that do async
reads (such as the block one), we always called it with the
read_cache_page().

This patch adds a mandatory wait for read to complete when adding a new
page to the cache, and removes read_cache_page_async() and its wrappers.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:04 -07:00
Johannes Weiner 91b0abe36a mm + fs: store shadow entries in page cache
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page.  As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently.  At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.

Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate.  Reclaim will check
for this flag before installing shadow pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
Johannes Weiner 0cd6144aad mm + fs: prepare for non-page entries in page cache radix trees
shmem mappings already contain exceptional entries where swap slot
information is remembered.

To be able to store eviction information for regular page cache, prepare
every site dealing with the radix trees directly to handle entries other
than pages.

The common lookup functions will filter out non-page entries and return
NULL for page cache holes, just as before.  But provide a raw version of
the API which returns non-page entries as well, and switch shmem over to
use it.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:00 -07:00
Johannes Weiner e7b563bb2a mm: filemap: move radix tree hole searching here
The radix tree hole searching code is only used for page cache, for
example the readahead code trying to get a a picture of the area
surrounding a fault.

It sufficed to rely on the radix tree definition of holes, which is
"empty tree slot".  But this is about to change, though, as shadow page
descriptors will be stored in the page cache after the actual pages get
evicted from memory.

Move the functions over to mm/filemap.c and make them native page cache
operations, where they can later be adapted to handle the new definition
of "page cache hole".

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:00 -07:00
Johannes Weiner 55881bc76f fs: cachefiles: use add_to_page_cache_lru()
This code used to have its own lru cache pagevec up until a0b8cab3 ("mm:
remove lru parameter from __pagevec_lru_add and remove parts of pagevec
API").  Now it's just add_to_page_cache() followed by lru_cache_add(),
might as well use add_to_page_cache_lru() directly.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:00 -07:00
Joonsoo Kim 9119a41e90 mm, hugetlb: unify region structure handling
Currently, to track reserved and allocated regions, we use two different
ways, depending on the mapping.  For MAP_SHARED, we use
address_mapping's private_list and, while for MAP_PRIVATE, we use a
resv_map.

Now, we are preparing to change a coarse grained lock which protect a
region structure to fine grained lock, and this difference hinder it.
So, before changing it, unify region structure handling, consistently
using a resv_map regardless of the kind of mapping.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:59 -07:00
Dan Carpenter 45d4f85504 fs/direct-io.c: remove some left over checks
We know that "ret > 0" is true here.  These tests were left over from
commit 02afc27fae ('direct-io: Handle O_(D)SYNC AIO') and aren't
needed any more.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:57 -07:00
Gu Zheng 2b665e276c fs/direct-io.c: remove redundant comparison
The return value of bio_get_nr_vecs() cannot be bigger than
BIO_MAX_PAGES, so we can remove redundant the comparison between
nr_pages and BIO_MAX_PAGES.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:57 -07:00
Wengang Wang 9c339255cb ocfs2: pass "new" parameter to ocfs2_init_xattr_bucket
This patch fixes the following crash:

  kernel BUG at fs/ocfs2/uptodate.c:530!
  Modules linked in: ocfs2(F) ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bridge xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd sunrpc 8021q garp stp llc bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iTCO_wdt iTCO_vendor_support dcdbas coretemp freq_table mperf microcode pcspkr serio_raw bnx2 lpc_ich mfd_core i5k_amb i5000_edac edac_core e1000e sg shpchp ext4(F) jbd2(F) mbcache(F) dm_round_robin(F) sr_mod(F) cdrom(F) usb_storage(F) sd_mod(F) crc_t10dif(F) pata_acpi(F) ata_generic(F) ata_piix(F) mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F) radeon(F)
   ttm(F) drm_kms_helper(F) drm(F) hwmon(F) i2c_algo_bit(F) i2c_core(F) dm_multipath(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
  CPU 5
  Pid: 21303, comm: xattr-test Tainted: GF       W    3.8.13-30.el6uek.x86_64 #2 Dell Inc. PowerEdge 1950/0M788G
  RIP: ocfs2_set_new_buffer_uptodate+0x51/0x60 [ocfs2]
  Process xattr-test (pid: 21303, threadinfo ffff880017aca000, task ffff880016a2c480)
  Call Trace:
    ocfs2_init_xattr_bucket+0x8a/0x120 [ocfs2]
    ocfs2_cp_xattr_bucket+0xbb/0x1b0 [ocfs2]
    ocfs2_extend_xattr_bucket+0x20a/0x2f0 [ocfs2]
    ocfs2_add_new_xattr_bucket+0x23e/0x4b0 [ocfs2]
    ocfs2_xattr_set_entry_index_block+0x13c/0x3d0 [ocfs2]
    ocfs2_xattr_block_set+0xf9/0x220 [ocfs2]
    __ocfs2_xattr_set_handle+0x118/0x710 [ocfs2]
    ocfs2_xattr_set+0x691/0x880 [ocfs2]
    ocfs2_xattr_user_set+0x46/0x50 [ocfs2]
    generic_setxattr+0x96/0xa0
    __vfs_setxattr_noperm+0x7b/0x170
    vfs_setxattr+0xbc/0xc0
    setxattr+0xde/0x230
    sys_fsetxattr+0xc6/0xf0
    system_call_fastpath+0x16/0x1b
  Code: 41 80 0c 24 01 48 89 df e8 7d f0 ff ff 4c 89 e6 48 89 df e8 a2 fe ff ff 48 89 df e8 3a f0 ff ff 48 8b 1c 24 4c 8b 64 24 08 c9 c3 <0f> 0b eb fe 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 66 66
  RIP  ocfs2_set_new_buffer_uptodate+0x51/0x60 [ocfs2]

It hit the BUG_ON() in ocfs2_set_new_buffer_uptodate():

    void ocfs2_set_new_buffer_uptodate(struct ocfs2_caching_info *ci,
                                       struct buffer_head *bh)
    {
          /* This should definitely *not* exist in our cache */
          if (ocfs2_buffer_cached(ci, bh))
                  printk(KERN_ERR "bh->b_blocknr: %lu @ %p\n", bh->b_blocknr, bh);
          BUG_ON(ocfs2_buffer_cached(ci, bh));

          set_buffer_uptodate(bh);

          ocfs2_metadata_cache_io_lock(ci);
          ocfs2_set_buffer_uptodate(ci, bh);
          ocfs2_metadata_cache_io_unlock(ci);
    }

The problem here is:

We cached a block, but the buffer_head got reused.  When we are to pick
up this block again, a new buffer_head created with UPTODATE flag
cleared.  ocfs2_buffer_uptodate() returned false since no UPTODATE is
set on the buffer_head.  so we set this block to cache as a NEW block,
then it failed at asserting block is not in cache.

The fix is to add a new parameter indicating the bucket is a new
allocated or not to ocfs2_init_xattr_bucket().
ocfs2_init_xattr_bucket() assert block not cached accordingly.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:57 -07:00
jiangyiwen 43b10a2037 ocfs2: avoid system inode ref confusion by adding mutex lock
The following case may lead to the same system inode ref in confusion.

A thread                            B thread
ocfs2_get_system_file_inode
->get_local_system_inode
->_ocfs2_get_system_file_inode
                                    because of *arr == NULL,
                                    ocfs2_get_system_file_inode
                                    ->get_local_system_inode
                                    ->_ocfs2_get_system_file_inode
gets first ref thru
_ocfs2_get_system_file_inode,
gets second ref thru igrab and
set *arr = inode
                                    at the moment, B thread also gets
                                    two refs, so lead to one more
                                    inode ref.

So add mutex lock to avoid multi thread set two inode ref once at the
same time.

Signed-off-by: jiangyiwen <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:57 -07:00
jiangyiwen 7dc3e83901 ocfs2: iput inode alloc when failed locally
In ocfs2_info_handle_freeinode() and ocfs2_test_inode_bit() func, after
calls ocfs2_get_system_file_inode() to get inode ref, if calls
ocfs2_info_scan_inode_alloc() or ocfs2_inode_lock() failed, we should
iput inode alloc to avoid leaking the inode.

Signed-off-by: jiangyiwen <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:57 -07:00
Tariq Saeed da8ded405d ocfs2/o2net: o2net_listen_data_ready should do nothing if socket state is not TCP_LISTEN
Orabug: 17330860

When accepting an incomming connection o2net_accept_one clones a child
data socket from the parent listening socket.  It then proceeds to setup
the child with callback o2net_data_ready() and sk_user_data to NULL.  If
data arrives in this window, o2net_listen_data_ready will be called with
some non-deterministic value in sk_user_data (not inherited).  We panic
when we page fault on sk_user_data -- in parent it is
sock_def_readable().

The fix is to recognize that this is a data socket being set up by
looking at the socket state and do nothing.

Signed-off-by: Tariq Saseed <tariq.x.saeed@oracle.com>
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:56 -07:00