Commit Graph

35119 Commits

Author SHA1 Message Date
Linus Torvalds
238ccbb050 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (22 commits)
  Input: ALPS - add interleaved protocol support (Dell E6x00 series)
  Input: keyboard - don't override beep with a bell
  Input: altera_ps2 - fix test of unsigned in altera_ps2_probe()
  Input: add mc13783 touchscreen driver
  Input: ep93xx_keypad - update driver to new core support
  Input: wacom - separate pen from express keys on Graphire
  Input: wacom - add defines for data packet report IDs
  Input: wacom - add support for new LCD tablets
  Input: wacom - add defines for packet lengths of various devices
  Input: wacom - ensure the device is initialized properly upon resume
  Input: at32psif - do not sleep in atomic context
  Input: i8042 - add Gigabyte M1022M to the noloop list
  Input: i8042 - allow installing platform filters for incoming data
  Input: i8042 - fix locking in interrupt routine
  Input: ALPS - do not set REL_X/REL_Y capabilities on the touchpad
  Input: document use of input_event() function
  Input: sa1111ps2 - annotate probe() and remove() methods
  Input: ambakmi - annotate probe() and remove() methods
  Input: gscps2 - fix probe() and remove() annotations
  Input: altera_ps2 - add annotations to probe and remove methods
  ...
2009-12-16 10:31:44 -08:00
Linus Torvalds
9b2831704e Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (33 commits)
  sh: Fix test of unsigned in se7722_irq_demux()
  sh: mach-ecovec24: Add FSI sound support
  sh: mach-ecovec24: Add mt9t112 camera support
  sh: mach-ecovec24: Add tw9910 support
  sh: MSIOF/mmc_spi platform data for the Ecovec24 board
  sh: ms7724se: Add ak4642 support
  sh: Fix up FPU build for SH5
  sh: Remove old early serial console code V2
  sh: sh5 scif pdata (sh5-101/sh5-103)
  sh: sh4a scif pdata (sh7757/sh7763/sh7770/sh7780/sh7785/sh7786/x3)
  sh: sh4a scif pdata (sh7343/sh7366/sh7722/sh7723/sh7724)
  sh: sh4 scif pdata (sh7750/sh7760/sh4-202)
  sh: sh3 scif pdata (sh7705/sh770x/sh7710/sh7720)
  sh: sh2a scif pdata (sh7201/sh7203/sh7206/mxg)
  sh: sh2 scif pdata (sh7616)
  sh-sci: Extend sh-sci driver with early console V2
  sh: Stub in P3 ioremap support for nommu parts.
  sh: wire up vmallocinfo support in ioremap() implementations.
  sh: Make the unaligned trap handler always obey notification levels.
  sh: Couple kernel and user write page perm bits for CONFIG_X2TLB
  ...
2009-12-16 10:29:52 -08:00
Linus Torvalds
7949456b1b Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
  ppc440spe-adma: adds updated ppc440spe adma driver
  iop-adma.c: use resource_size()
  dmaengine: clarify the meaning of the DMA_CTRL_ACK flag
  sh: stylistic improvements for the DMA driver
  dmaengine: fix dmatest to verify minimum transfer length and test buffer size
  sh: DMA driver has to specify its alignment requirements
  Add COH 901 318 DMA block driver v5
2009-12-16 10:28:56 -08:00
Linus Torvalds
60d9aa758c Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (90 commits)
  jffs2: Fix long-standing bug with symlink garbage collection.
  mtd: OneNAND: Fix test of unsigned in onenand_otp_walk()
  mtd: cfi_cmdset_0002, fix lock imbalance
  Revert "mtd: move mxcnd_remove to .exit.text"
  mtd: m25p80: add support for Macronix MX25L4005A
  kmsg_dump: fix build for CONFIG_PRINTK=n
  mtd: nandsim: add support for 4KiB pages
  mtd: mtdoops: refactor as a kmsg_dumper
  mtd: mtdoops: make record size configurable
  mtd: mtdoops: limit the maximum mtd partition size
  mtd: mtdoops: keep track of used/unused pages in an array
  mtd: mtdoops: several minor cleanups
  core: Add kernel message dumper to call on oopses and panics
  mtd: add ARM pismo support
  mtd: pxa3xx_nand: Fix PIO data transfer
  mtd: nand: fix multi-chip suspend problem
  mtd: add support for switching old SST chips into QRY mode
  mtd: fix M29W800D dev_id and uaddr
  mtd: don't use PF_MEMALLOC
  mtd: Add bad block table overrides to Davinci NAND driver
  ...

Fixed up conflicts (mostly trivial) in
	drivers/mtd/devices/m25p80.c
	drivers/mtd/maps/pcmciamtd.c
	drivers/mtd/nand/pxa3xx_nand.c
	kernel/printk.c
2009-12-16 10:23:43 -08:00
Linus Torvalds
a79960e576 Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6:
  implement early_io{re,un}map for ia64
  Revert "Intel IOMMU: Avoid memory allocation failures in dma map api calls"
  intel-iommu: ignore page table validation in pass through mode
  intel-iommu: Fix oops with intel_iommu=igfx_off
  intel-iommu: Check for an RMRR which ends before it starts.
  intel-iommu: Apply BIOS sanity checks for interrupt remapping too.
  intel-iommu: Detect DMAR in hyperspace at probe time.
  dmar: Fix build failure without NUMA, warn on bogus RHSA tables and don't abort
  iommu: Allocate dma-remapping structures using numa locality info
  intr_remap: Allocate intr-remapping table using numa locality info
  dmar: Allocate queued invalidation structure using numa locality info
  dmar: support for parsing Remapping Hardware Static Affinity structure
2009-12-16 10:11:38 -08:00
Linus Torvalds
6a5df38f5f Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (116 commits)
  V4L/DVB (13698): pms: replace asm/uaccess.h to linux/uaccess.h
  V4L/DVB (13690): radio/si470x: #include <sched.h>
  V4L/DVB (13688): au8522: modify the attributes of local filter coefficients
  V4L/DVB (13687): cx231xx: use NULL when pointer is needed
  V4L/DVB: Davinci VPFE Capture: remove unused #include <linux/version.h>
  V4L/DVB (13685): Correct code taking the size of a pointer
  V4L/DVB (13684): Fix some cut-and-paste noise in dib0090.h
  V4L/DVB (13683): sanio-ms: clean up init, exit and id_table
  V4L/DVB (13682): dib8000: make some constant static
  V4L/DVB: lgs8gxx: Use shifts rather than multiply/divide when possible
  V4L/DVB (13680b): DocBook/media: create links for included sources
  V4L/DVB (13680a): DocBook/media: copy images after building HTML
  V4L/DVB (13678): Add support for yet another DvbWorld, TeVii and Prof USB devices
  V4L/DVB (13676): configurable IRQ mode on NetUP Dual DVB-S2 CI; IRQ from CAM processing (CI interface works faster)
  V4L/DVB (13674): stv090x: Add DiSEqC envelope mode
  V4L/DVB (13673): lnbp21: Implement 22 kHz tone control
  V4L/DVB (13671): sh_mobile_ceu_camera: Remove frame size page alignment
  V4L/DVB (13670): soc-camera: Add mt9t112 camera driver
  V4L/DVB (13669): tw9910: Add sync polarity support
  V4L/DVB (13668): tw9910: remove cropping
  ...
2009-12-16 10:09:16 -08:00
Linus Torvalds
9cfc86249f Merge branch 'akpm'
* akpm: (173 commits)
  genalloc: use bitmap_find_next_zero_area
  ia64: use bitmap_find_next_zero_area
  sparc: use bitmap_find_next_zero_area
  mlx4: use bitmap_find_next_zero_area
  isp1362-hcd: use bitmap_find_next_zero_area
  iommu-helper: use bitmap library
  bitmap: introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area
  qnx4: use hweight8
  qnx4fs: remove remains of the (defunct) write support
  resource: constify arg to resource_size() and resource_type()
  gru: send cross partition interrupts using the gru
  gru: function to generate chipset IPI values
  gru: update driver version number
  gru: improve GRU TLB dropin statistics
  gru: fix GRU interrupt race at deallocate
  gru: add hugepage support
  gru: fix bug in allocation of kernel contexts
  gru: update GRU structures to match latest hardware spec
  gru: check for correct GRU chiplet assignment
  gru: remove stray local_irq_enable
  ...
2009-12-16 10:06:39 -08:00
Akinobu Mita
a66022c457 iommu-helper: use bitmap library
Use bitmap library and kill some unused iommu helper functions.

1. s/iommu_area_free/bitmap_clear/

2. s/iommu_area_reserve/bitmap_set/

3. Use bitmap_find_next_zero_area instead of find_next_zero_area

  This cannot be simple substitution because find_next_zero_area
  doesn't check the last bit of the limit in bitmap

4. Remove iommu_area_free, iommu_area_reserve, and find_next_zero_area

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:18 -08:00
Akinobu Mita
c1a2a962a2 bitmap: introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area
This introduces new bitmap functions:

bitmap_set: Set specified bit area
bitmap_clear: Clear specified bit area
bitmap_find_next_zero_area: Find free bit area

These are mostly stolen from iommu helper. The differences are:

- Use find_next_bit instead of doing test_bit for each bit

- Rewrite bitmap_set and bitmap_clear

  Instead of setting or clearing for each bit.

- Check the last bit of the limit

  iommu-helper doesn't want to find such area

- The return value if there is no zero area

  find_next_zero_area in iommu helper: returns -1
  bitmap_find_next_zero_area: return >= bitmap size

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Lothar Wassmann <LW@KARO-electronics.de>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:18 -08:00
Jean Delvare
f65380c07b resource: constify arg to resource_size() and resource_type()
resource_size() doesn't change the resource it operates on, so the res
parameter can be marked const.  Same for resource_type().

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:17 -08:00
Christoph Hellwig
5fe878ae7f direct-io: cleanup blockdev_direct_IO locking
Currently the locking in blockdev_direct_IO is a mess, we have three
different locking types and very confusing checks for some of them.  The
most complicated one is DIO_OWN_LOCKING for reads, which happens to not
actually be used.

This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read
case is unused anyway, and the write side is almost identical to
DIO_NO_LOCKING.  The difference is that DIO_NO_LOCKING always sets the
create argument for the get_blocks callback to zero, but we can easily
move that to the actual get_blocks callbacks.  There are four users of the
DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is
fine with the new version, ocfs2 only errors out if create were ever set,
and we can remove this dead code now, the block device code only ever uses
create for an error message if we are fully beyond the device which can
never happen, and last but not least XFS will need the new behavour for
writes.

Now we can replace the lock_type variable with a flags one, where no flag
means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first
flag.  Separate out the check for not allowing to fill holes into a
separate flag, although for now both flags always get set at the same
time.

Also revamp the documentation of the locking scheme to actually make
sense.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alex Elder <aelder@sgi.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:13 -08:00
Shaohua Li
fac046ad0b aio: remove unused field
Don't know the reason, but it appears ki_wait field of iocb never gets used.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:13 -08:00
Amerigo Wang
06a7f71124 kexec: premit reduction of the reserved memory size
Implement shrinking the reserved memory for crash kernel, if it is more
than enough.

For example, if you have already reserved 128M, now you just want 100M,
you can do:

# echo $((100*1024*1024)) > /sys/kernel/kexec_crash_size

Note, you can only do this before loading the crash kernel.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:13 -08:00
Amerigo Wang
9cf18e1dd7 ipc: HARD_MSGMAX should be higher not lower on 64bit
We have HARD_MSGMAX lower on 64bit than on 32bit, since usually 64bit
machines have more memory than 32bit machines.

Making it higher on 64bit seems reasonable, and keep the original number
on 32bit.

Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:10 -08:00
Manfred Spraul
b97e820fff ipc/sem.c: add a per-semaphore pending list
Based on Nick's findings:

sysv sem has the concept of semaphore arrays that consist out of multiple
semaphores.  Atomic operations that affect multiple semaphores are
supported.

The patch is the first step for optimizing simple, single semaphore
operations: In addition to the global list of all pending operations, a
2nd, per-semaphore list with the simple operations is added.

Note: this patch does not make sense by itself, the new list is used
nowhere.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:10 -08:00
Oleg Nesterov
ad09750b51 signals: kill force_sig_specific()
Kill force_sig_specific(), this trivial wrapper has no callers.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:09 -08:00
Oleg Nesterov
614c517d7c signals: SEND_SIG_NOINFO should be considered as SI_FROMUSER()
No changes in compiled code. The patch adds the new helper, si_fromuser()
and changes check_kill_permission() to use this helper.

The real effect of this patch is that from now we "officially" consider
SEND_SIG_NOINFO signal as "from user-space" signals. This is already true
if we look at the code which uses SEND_SIG_NOINFO, except __send_signal()
has another opinion - see the next patch.

The naming of these special SEND_SIG_XXX siginfo's is really bad
imho.  From __send_signal()'s pov they mean

	SEND_SIG_NOINFO		from user
	SEND_SIG_PRIV		from kernel
	SEND_SIG_FORCED		no info

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Reviewed-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:08 -08:00
Oleg Nesterov
2f0edac555 ptrace: change tracehook_report_syscall_exit() to handle stepping
Suggested by Roland.

Change tracehook_report_syscall_exit() to look at step flag and send the
trap signal if needed.

This change affects ia64, microblaze, parisc, powerpc, sh.  They pass
nonzero "step" argument to tracehook but since it was ignored the tracee
reports via ptrace_notify(), this is not right and not consistent.

	- PTRACE_SETSIGINFO doesn't work

	- if the tracer resumes the tracee with signr != 0 the new signal
	  is generated rather than delivering it

	- If PT_TRACESYSGOOD is set the tracee reports the wrong exit_code

I don't have a powerpc machine, but I think this test-case should see the
difference:

	#include <unistd.h>
	#include <sys/ptrace.h>
	#include <sys/wait.h>
	#include <assert.h>
	#include <stdio.h>

	int main(void)
	{
		int pid, status;

		if (!(pid = fork())) {
			assert(ptrace(PTRACE_TRACEME) == 0);
			kill(getpid(), SIGSTOP);

			getppid();

			return 0;
		}

		assert(pid == wait(&status));
		assert(ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD) == 0);

		assert(ptrace(PTRACE_SYSCALL, pid, 0,0) == 0);
		assert(pid == wait(&status));

		assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
		assert(pid == wait(&status));

		if (status == 0x57F)
			return 0;

		printf("kernel bug: status=%X shouldn't have 0x80\n", status);
		return 1;
	}

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:08 -08:00
Oleg Nesterov
85ec7fd9f8 ptrace: introduce user_single_step_siginfo() helper
Suggested by Roland.

Currently there is no way to synthesize a single-stepping trap in the
arch-independent manner.  This patch adds the default helper which fills
siginfo_t, arch/ can can override it.

Architetures which implement user_enable_single_step() should add
user_single_step_siginfo() also.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:08 -08:00
Oleg Nesterov
c6a47cc2cc ptrace: cleanup ptrace_init_task()->ptrace_link() path
No functional changes.

ptrace_init_task() looks confusing, as if we always auto-attach when "bool
ptrace" argument is true, while in fact we attach only if current is
traced.

Make the code more explicit and kill now unused ptrace_link().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:08 -08:00
Daisuke Nishimura
57f9fd7d25 memcg: cleanup mem_cgroup_move_parent()
mem_cgroup_move_parent() calls try_charge first and cancel_charge on
failure.  IMHO, charge/uncharge(especially charge) is high cost operation,
so we should avoid it as far as possible.

This patch tries to delay try_charge in mem_cgroup_move_parent() by
re-ordering checks it does.

And this patch renames mem_cgroup_move_account() to
__mem_cgroup_move_account(), changes the return value of
__mem_cgroup_move_account() from int to void, and adds a new
wrapper(mem_cgroup_move_account()), which checks whether a @pc is valid
for moving account and calls __mem_cgroup_move_account().

This patch removes the last caller of trylock_page_cgroup(), so removes
its definition too.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:07 -08:00
KAMEZAWA Hiroyuki
d8046582d5 memcg: make memcg's file mapped consistent with global VM
In global VM, FILE_MAPPED is used but memcg uses MAPPED_FILE.  This makes
grep difficult.  Replace memcg's MAPPED_FILE with FILE_MAPPED

And in global VM, mapped shared memory is accounted into FILE_MAPPED.
But memcg doesn't. fix it.
Note:
  page_is_file_cache() just checks SwapBacked or not.
  So, we need to check PageAnon.

Cc: Balbir Singh <balbir@in.ibm.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:07 -08:00
KAMEZAWA Hiroyuki
569b846df5 memcg: coalesce uncharge during unmap/truncate
In massive parallel enviroment, res_counter can be a performance
bottleneck.  One strong techinque to reduce lock contention is reducing
calls by coalescing some amount of calls into one.

Considering charge/uncharge chatacteristic,
	- charge is done one by one via demand-paging.
	- uncharge is done by
		- in chunk at munmap, truncate, exit, execve...
		- one by one via vmscan/paging.

It seems we have a chance to coalesce uncharges for improving scalability
at unmap/truncation.

This patch is a for coalescing uncharge.  For avoiding scattering memcg's
structure to functions under /mm, this patch adds memcg batch uncharge
information to the task.  A reason for per-task batching is for making use
of caller's context information.  We do batched uncharge (deleyed
uncharge) when truncation/unmap occurs but do direct uncharge when
uncharge is called by memory reclaim (vmscan.c).

The degree of coalescing depends on callers
  - at invalidate/trucate... pagevec size
  - at unmap ....ZAP_BLOCK_SIZE
(memory itself will be freed in this degree.)
Then, we'll not coalescing too much.

On x86-64 8cpu server, I tested overheads of memcg at page fault by
running a program which does map/fault/unmap in a loop. Running
a task per a cpu by taskset and see sum of the number of page faults
in 60secs.

[without memcg config]
  40156968  page-faults              #      0.085 M/sec   ( +-   0.046% )
  27.67 cache-miss/faults
[root cgroup]
  36659599  page-faults              #      0.077 M/sec   ( +-   0.247% )
  31.58 miss/faults
[in a child cgroup]
  18444157  page-faults              #      0.039 M/sec   ( +-   0.133% )
  69.96 miss/faults
[child with this patch]
  27133719  page-faults              #      0.057 M/sec   ( +-   0.155% )
  47.16 miss/faults

We can see some amounts of improvement.
(root cgroup doesn't affected by this patch)
Another patch for "charge" will follow this and above will be improved more.

Changelog(since 2009/10/02):
 - renamed filed of memcg_batch (as pages to bytes, memsw to memsw_bytes)
 - some clean up and commentary/description updates.
 - added initialize code to copy_process(). (possible bug fix)

Changelog(old):
 - fixed !CONFIG_MEM_CGROUP case.
 - rebased onto the latest mmotm + softlimit fix patches.
 - unified patch for callers
 - added commetns.
 - make ->do_batch as bool.
 - removed css_get() at el. We don't need it.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:07 -08:00
Alexey Dobriyan
e3c96f53ac reiserfs: don't compile procfs.o at all if no support
* small define cleanup in header
* fix #ifdeffery in procfs.c via Kconfig

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:06 -08:00
Alexey Dobriyan
904e812931 reiserfs: remove /proc/fs/reiserfs/version
/proc/fs/reiserfs/version is on the way of removing ->read_proc interface.
 It's empty however, so simply remove it instead of doing dummy
conversion.  It's hard to see what information userspace can extract from
empty file.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:06 -08:00