SD/MMC cards tend to support an erase operation. In addition, eMMC v4.4
cards can support secure erase, trim and secure trim operations that are
all variants of the basic erase command.
SD/MMC device attributes "erase_size" and "preferred_erase_size" have been
added.
"erase_size" is the minimum size, in bytes, of an erase operation. For
MMC, "erase_size" is the erase group size reported by the card. Note that
"erase_size" does not apply to trim or secure trim operations where the
minimum size is always one 512 byte sector. For SD, "erase_size" is 512
if the card is block-addressed, 0 otherwise.
SD/MMC cards can erase an arbitrarily large area up to and
including the whole card. When erasing a large area it may
be desirable to do it in smaller chunks for three reasons:
1. A single erase command will make all other I/O on the card
wait. This is not a problem if the whole card is being erased, but
erasing one partition will make I/O for another partition on the
same card wait for the duration of the erase - which could be a
several minutes.
2. To be able to inform the user of erase progress.
3. The erase timeout becomes too large to be very useful.
Because the erase timeout contains a margin which is multiplied by
the size of the erase area, the value can end up being several
minutes for large areas.
"erase_size" is not the most efficient unit to erase (especially for SD
where it is just one sector), hence "preferred_erase_size" provides a good
chunk size for erasing large areas.
For MMC, "preferred_erase_size" is the high-capacity erase size if a card
specifies one, otherwise it is based on the capacity of the card.
For SD, "preferred_erase_size" is the allocation unit size specified by
the card.
"preferred_erase_size" is in bytes.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ben Gardiner <bengardiner@nanometrics.ca>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 83ba7b071f ("writeback: simplify the write back thread queue")
broke writeback_in_progress() as in that commit we started to remove work
items from the list at the moment we start working on them and not at the
moment they are finished. Thus if the flusher thread was doing some work
but there was no other work queued, writeback_in_progress() returned
false. This could in particular cause unnecessary queueing of background
writeback from balance_dirty_pages() or writeout work from
writeback_sb_if_idle().
This patch fixes the problem by introducing a bit in the bdi state which
indicates that the flusher thread is processing some work and uses this
bit for writeback_in_progress() test.
NOTE: Both callsites of writeback_in_progress() (namely,
writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
need a different information than what writeback_in_progress() provides.
They would need to know whether *the kind of writeback they are going to
submit* is already queued. But this information isn't that simple to
provide so let's fix writeback_in_progress() for the time being.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Unify the logic for kupdate and non-kupdate cases. There won't be
starvation because the inodes requeued into b_more_io will later be
spliced _after_ the remaining inodes in b_io, hence won't stand in the way
of other inodes in the next run.
It avoids unnecessary redirty_tail() calls, hence the update of
i_dirtied_when. The timestamp update is undesirable because it could
later delay the inode's periodic writeback, or may exclude the inode from
the data integrity sync operation (which checks timestamp to avoid extra
work and livelock).
===
How the redirty_tail() comes about:
It was a long story.. This redirty_tail() was introduced with
wbc.more_io. The initial patch for more_io actually does not have the
redirty_tail(), and when it's merged, several 100% iowait bug reports
arised:
reiserfs:
http://lkml.org/lkml/2007/10/23/93
jfs:
commit 29a424f283
JFS: clear PAGECACHE_TAG_DIRTY for no-write pages
ext2:
http://www.spinics.net/linux/lists/linux-ext4/msg04762.html
They are all old bugs hidden in various filesystems that become "visible"
with the more_io patch. At the time, the ext2 bug is thought to be
"trivial", so not fixed. Instead the following updated more_io patch with
redirty_tail() is merged:
http://www.spinics.net/linux/lists/linux-ext4/msg04507.html
This will in general prevent 100% on ext2 and possibly other unknown FS bugs.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid delaying writeback for an expire inode with lots of dirty pages, but
no active dirtier at the moment. Previously we only do that for the
kupdate case.
Any filesystem that does delayed allocation or unwritten extent conversion
after IO completion will cause this - for example, XFS.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reducing the number of times balance_dirty_pages calls global_page_state
reduces the cache references and so improves write performance on a
variety of workloads.
'perf stats' of simple fio write tests shows the reduction in cache
access. Where the test is fio 'write,mmap,600Mb,pre_read' on AMD AthlonX2
with 3Gb memory (dirty_threshold approx 600 Mb) running each test 10
times, dropping the fasted & slowest values then taking the average &
standard deviation
average (s.d.) in millions (10^6)
2.6.31-rc8 648.6 (14.6)
+patch 620.1 (16.5)
Achieving this reduction is by dropping clip_bdi_dirty_limit as it rereads
the counters to apply the dirty_threshold and moving this check up into
balance_dirty_pages where it has already read the counters.
Also by rearrange the for loop to only contain one copy of the limit tests
allows the pdflush test after the loop to use the local copies of the
counters rather than rereading them.
In the common case with no throttling it now calls global_page_state 5
fewer times and bdi_stat 2 fewer.
Fengguang:
This patch slightly changes behavior by replacing clip_bdi_dirty_limit()
with the explicit check (nr_reclaimable + nr_writeback >= dirty_thresh) to
avoid exceeding the dirty limit. Since the bdi dirty limit is mostly
accurate we don't need to do routinely clip. A simple dirty limit check
would be enough.
The check is necessary because, in principle we should throttle everything
calling balance_dirty_pages() when we're over the total limit, as said by
Peter.
We now set and clear dirty_exceeded not only based on bdi dirty limits,
but also on the global dirty limit. The global limit check is added in
place of clip_bdi_dirty_limit() for safety and not intended as a behavior
change. The bdi limits should be tight enough to keep all dirty pages
under the global limit at most time; occasional small exceeding should be
OK though. The change makes the logic more obvious: the global limit is
the ultimate goal and shall be always imposed.
We may now start background writeback work based on outdated conditions.
That's safe because the bdi flush thread will (and have to) double check
the states. It reduces overall overheads because the test based on old
states still have good chance to be right.
[akpm@linux-foundation.org] fix uninitialized dirty_exceeded
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kconfig dependency warning for PC8736x_GPIO by restricting it to
X86_32.
warning: (SCx200_GPIO && SCx200 || PC8736x_GPIO && X86) selects NSC_GPIO which has unmet direct dependencies (X86_32)
NSC_GPIO is X86_32 only. The other driver (SCx200_GPIO) that selects
NSC_GPIO is X86_32 only (indirectly, since SCx200 depends on X86_32), so
limit this driver also.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a fatal kernel-doc error due to a #define coming between a function's
kernel-doc notation and the function signature. (kernel-doc cannot handle
this)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ACPI_PREEMPTION_POINT() logic was introduced in commit 8bd108d
(ACPICA: add preemption point after each opcode parse). The follow up
commits abe1dfab6, 138d15692, c084ca70 tried to fix the preemption logic
back and forth, but nobody noticed that the usage of
in_atomic_preempt_off() in that context is wrong.
The check which guards the call of cond_resched() is:
if (!in_atomic_preempt_off() && !irqs_disabled())
in_atomic_preempt_off() is not intended for general use as the comment
above the macro definition clearly says:
* Check whether we were atomic before we did preempt_disable():
* (used by the scheduler, *after* releasing the kernel lock)
On a CONFIG_PREEMPT=n kernel the usage of in_atomic_preempt_off() works by
accident, but with CONFIG_PREEMPT=y it's just broken.
The whole purpose of the ACPI_PREEMPTION_POINT() is to reduce the latency
on a CONFIG_PREEMPT=n kernel, so make ACPI_PREEMPTION_POINT() depend on
CONFIG_PREEMPT=n and remove the in_atomic_preempt_off() check.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16210
[akpm@linux-foundation.org: fix build]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Len Brown <lenb@kernel.org>
Cc: Francois Valenduc <francois.valenduc@tvcablenet.be>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current kfifo scatterlist implementation will not work with chained
scatterlists. It assumes that struct scatterlist arrays are allocated
contiguously, which is not the case when chained scatterlists (struct
sg_table) are in use.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
isofs: Fix lseek() to position beyond 4 GB
vfs: remove unused MNT_STRICTATIME
vfs: show unreachable paths in getcwd and proc
vfs: only add " (deleted)" where necessary
vfs: add prepend_path() helper
vfs: __d_path: dont prepend the name of the root dentry
ia64: perfmon: add d_dname method
vfs: add helpers to get root and pwd
cachefiles: use path_get instead of lone dget
fs/sysv/super.c: add support for non-PDP11 v7 filesystems
V7: Adjust sanity checks for some volumes
Add v7 alias
v9fs: fixup for inode_setattr being removed
Manual merge to take Al's version of the fs/sysv/super.c file: it merged
cleanly, but Al had removed an unnecessary header include, so his side
was better.
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
Squashfs: fix checkpatch.pl warnings
Squashfs: fix filename typo
Squashfs: update Kconfig and documentation for LZO
Squashfs: fix block size use in LZO decompressor
Squashfs: Add LZO compression support
squashfs: fix filename in header comment
Squashfs: Make XATTR config name consistent with other file systems
squashfs: fix compiler inline warning
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
exofs: Fix groups code when num_devices is not divisible by group_width
exofs: Remove useless optimization
exofs: exofs_file_fsync and exofs_file_flush correctness
exofs: Remove superfluous dependency on buffer_head and writeback
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits)
ceph: generalize mon requests, add pool op support
ceph: only queue async writeback on cap revocation if there is dirty data
ceph: do not ignore osd_idle_ttl mount option
ceph: constify dentry_operations
ceph: whitespace cleanup
ceph: add flock/fcntl lock support
ceph: define on-wire types, constants for file locking support
ceph: add CEPH_FEATURE_FLOCK to the supported feature bits
ceph: support v2 reconnect encoding
ceph: support v2 client_caps encoding
ceph: move AES iv definition to shared header
ceph: fix decoding of pool snap info
ceph: make ->sync_fs not wait if wait==0
ceph: warn on missing snap realm
ceph: print useful error message when crush rule not found
ceph: use %pU to print uuid (fsid)
ceph: sync header defs with server code
ceph: clean up header guards
ceph: strip misleading/obsolete version, feature info
ceph: specify supported features in super.h
...
* 'msm-video' of git://codeaurora.org/quic/kernel/dwalker/linux-msm:
video: msm: Fix section mismatch in mddi.c.
drivers: video: msm: drop some unused variables
* 'ixp4xx' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6:
IXP4xx: Fix LL debugging on little-endian CPU.
IXP4xx: Fix sparse warnings in I/O primitives.
IXP4xx: Make mdio_bus struct static in the Ethernet driver.
IXP4xx: Fix ixp4xx_crypto little-endian operation.
IXP4xx: Prevent HSS transmitter lockup by disabling FRaMe signals.
ixp4xx/vulcan: add PCI support
ixp4xx: base support for Arcom Vulcan
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (226 commits)
ARM: 6323/1: cam60: don't use __init for cam60_spi_{flash_platform_data,partitions}
ARM: 6324/1: cam60: move cam60_spi_devices to .init.data
ARM: 6322/1: imx/pca100: Fix name of spi platform data
ARM: 6321/1: fix syntax error in main Kconfig file
ARM: 6297/1: move U300 timer to dynamic clock lookup
ARM: 6296/1: clock U300 intcon and timer properly
ARM: 6295/1: fix U300 apb_pclk split
ARM: 6306/1: fix inverted MMC card detect in U300
ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS operations can broadcast a faulty ASID
ARM: 6294/1: etm: do a dummy read from OSSRR during initialization
ARM: 6292/1: coresight: add ETM management registers
ARM: 6288/1: ftrace: document mcount formats
ARM: 6287/1: ftrace: clean up mcount assembly indentation
ARM: 6286/1: fix Thumb-2 decompressor broken by "Auto calculate ZRELADDR"
ARM: 6281/1: video/imxfb.c: allow usage without BACKLIGHT_CLASS_DEVICE
ARM: 6280/1: imx: Fix build failure when including <mach/gpio.h> without <linux/spinlock.h>
ARM: S5PV210: Fix on missing s3c-sdhci card detection method for hsmmc3
ARM: S5P: Fix on missing S5P_DEV_FIMC in plat-s5p/Kconfig
ARM: S5PV210: Override FIMC driver name on Aquila board
ARM: S5PC100: enable FIMC on SMDKC100
...
Fix up conflicts in arch/arm/mach-{s5pc100,s5pv210}/cpu.c due to
different subsystem 'setname' calls, and trivial port types in
include/linux/serial_core.h
Fix checkstack error:
lib/decompress_bunzip2.c: In function `get_next_block':
lib/decompress_bunzip2.c:511: warning: the frame size of 1932 bytes is larger than 1024 bytes
byteCount, symToByte, and mtfSymbol cannot be declared static or allocated
dynamically so place them in the bunzip_data struct.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add four examples to the kernel sample directory.
It shows how to handle:
- a byte stream fifo
- a integer type fifo
- a dynamic record sized fifo
- the fifo DMA functions
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>