Merge remote-tracking branch 'origin/upstream/linux-linaro-lsk-v3.10-android+android-common-3.10' into develop-3.10

黄涛 (Huang Tao)
2013-12-10 12:14:30 +08:00
118 changed files with 1678 additions and 537 deletions

View File

@@ -0,0 +1,136 @@
Small Task Packing in the big.LITTLE MP Reference Patch Set
What is small task packing?
----
Simply that the scheduler will fit as many small tasks on a single CPU
as possible before using other CPUs. A small task is defined as one
whose tracked load is less than 90% of a NICE_0 task. This is a change
from the usual behaviour, since the scheduler will normally use an idle
CPU for a waking task unless that task is considered cache hot.
How is it implemented?
----
Since all small tasks must wake up relatively frequently, the main
requirement for packing small tasks is to select a partly-busy CPU when
waking rather than looking for an idle CPU. We use the tracked load of
the CPU runqueue to determine how heavily loaded each CPU is and the
tracked load of the task to determine if it will fit on the CPU. We
always start with the lowest-numbered CPU in a sched domain and stop
looking when we find a CPU with enough space for the task.
Some further tweaks are necessary to suppress load balancing when the
CPU is not fully loaded; otherwise the scheduler attempts to spread
tasks evenly across the domain.
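A toy userspace model of that first-fit choice is sketched below; the
CPU count, the loads and the 'full' limit are invented for the example
and none of the names correspond to real scheduler symbols:

  #include <stdio.h>

  #define NR_LITTLE_CPUS 3

  /* First-fit packing: loads use the same 0..1023 scale as the tracked
   * load and "full" plays the role of packing_limit. */
  static int pick_cpu(const int cpu_load[], int task_load, int full)
  {
          int cpu;

          /* start at the lowest-numbered CPU, stop at the first with space */
          for (cpu = 0; cpu < NR_LITTLE_CPUS; cpu++)
                  if (cpu_load[cpu] + task_load <= full)
                          return cpu;
          return -1;      /* nothing fits: fall back to the usual idle search */
  }

  int main(void)
  {
          int load[NR_LITTLE_CPUS] = { 410, 0, 0 };       /* CPU0 already busy */

          printf("task of load 100 -> CPU%d\n", pick_cpu(load, 100, 650));
          printf("task of load 300 -> CPU%d\n", pick_cpu(load, 300, 650));
          return 0;
  }

With these made-up numbers the first task packs onto CPU0 and the
second overflows onto CPU1.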
How does it interact with the HMP patches?
----
Firstly, we only enable packing on the little domain. The intent is
that the big domain spreads tasks amongst the available CPUs,
one task per CPU. The little domain, however, attempts to use as
little power as possible while servicing its tasks.
Secondly, since we offload big tasks onto little CPUs in order to try
to devote one CPU to each task, we have a threshold above which we do
not try to pack a task and instead will select an idle CPU if possible.
This maintains maximum forward progress for busy tasks temporarily
demoted from big CPUs.
Can the behaviour be tuned?
----
Yes. The load level of a 'full' CPU can be modified in the source and
is exposed through sysfs as /sys/kernel/hmp/packing_limit so that it
can be changed at runtime. The packing behaviour itself is provided by
CONFIG_SCHED_HMP_LITTLE_PACKING and can be disabled at runtime using
/sys/kernel/hmp/packing_enable.
The definition of a small task is hard-coded as 90% of NICE_0_LOAD
and cannot be modified at runtime.
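As a minimal sketch of the runtime tuning, assuming only the two sysfs
attributes named above, the knobs can also be set from a small C helper
instead of echoing values into sysfs by hand:

  #include <stdio.h>

  /* Write a single integer to a sysfs attribute, returning 0 on success. */
  static int write_sysfs(const char *path, int value)
  {
          FILE *f = fopen(path, "w");

          if (!f) {
                  perror(path);
                  return -1;
          }
          fprintf(f, "%d\n", value);
          return fclose(f);
  }

  int main(void)
  {
          write_sysfs("/sys/kernel/hmp/packing_enable", 1);   /* turn packing on */
          write_sysfs("/sys/kernel/hmp/packing_limit", 450);  /* value used below */
          return 0;
  }

Writing these attributes will normally require root.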
Why do I need to tune it?
----
The optimal configuration is likely to be different depending upon the
design and manufacturing of your SoC.
In the main, there are two system effects from enabling small task
packing.
1. The CPU operating point may increase.
2. The wakeup latency of tasks may increase.
There are also likely to be secondary effects from loading one CPU
rather than spreading tasks.
Note that all of these system effects are dependent upon the workload
under consideration.
CPU Operating Point
----
The primary impact of loading one CPU with a number of light tasks is to
increase the compute requirement of that CPU since it is no longer idle
as often. Increased compute requirement causes an increase in the
frequency of the CPU through CPUfreq.
Consider this example:
We have a system with 3 CPUs which can operate at any frequency between
350MHz and 1GHz. The system has 6 tasks which would each produce 10%
load at 1GHz. The scheduler has frequency-invariant load scaling
enabled. Our DVFS governor aims for 80% utilization at the chosen
frequency.
Without task packing, these tasks will be spread out amongst all CPUs
such that each CPU has two. This produces roughly 20% load per CPU,
and the frequency of the package will remain at 350MHz.
With task packing set to the default packing_limit, all of these tasks
will sit on one CPU and require a package frequency of ~750MHz to reach
80% utilization. (0.75 = 0.6 / 0.8).
When a package operates on a single frequency domain, all CPUs in that
package share frequency and voltage.
Depending upon the SoC implementation there can be a significant amount
of energy lost to leakage from idle CPUs. The decision about how
loaded a CPU must be to be considered 'full' is therefore controllable
through sysfs (/sys/kernel/hmp/packing_limit) and directly in the code.
Continuing the example, let's set packing_limit to 450, which means we
will pack tasks until the total load of all running tasks >= 450. In
practice, this is very similar to a 55% idle 1GHz CPU.
Now we are only able to place 4 tasks on CPU0, and two will overflow
onto CPU1. CPU0 will have a load of 40% and CPU1 will have a load of
20%. In order to still hit 80% utilization, CPU0 now only needs to
operate at (0.4 / 0.8 = 0.5) 500MHz rather than the ~750MHz of the
fully packed case, and CPU2 is no longer needed and can be
power-gated.
In order to use less energy, the saving from power-gating CPU2 must be
more than the energy spent running CPU0 for the extra cycles. This
depends upon the SoC implementation.
This is obviously a contrived example requiring all the tasks to
be runnable at the same time, but it illustrates the point.
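For reference, the arithmetic above is just the load measured at 1GHz
divided by the 80% utilization target, clamped to the 350MHz..1GHz
range; a few lines of C reproduce the two package frequencies quoted in
the example:

  #include <stdio.h>

  /* Frequency (MHz) needed so that 'load', measured at 1GHz, becomes
   * 'target' utilization, clamped to the package's operating range. */
  static double freq_mhz(double load, double target, double fmin, double fmax)
  {
          double f = 1000.0 * load / target;

          if (f < fmin)
                  f = fmin;
          if (f > fmax)
                  f = fmax;
          return f;
  }

  int main(void)
  {
          /* spread: two 10% tasks per CPU */
          printf("no packing: %.0f MHz\n", freq_mhz(0.2, 0.8, 350.0, 1000.0));
          /* packed: all six tasks (60%) on one CPU */
          printf("packed    : %.0f MHz\n", freq_mhz(0.6, 0.8, 350.0, 1000.0));
          return 0;
  }

The unclamped requirement in the spread case is only 250MHz, which is
why the package sits at its 350MHz floor.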
Wakeup Latency
----
This is an unavoidable consequence of trying to pack tasks together
rather than giving them a CPU each. If you cannot find an acceptable
level of wakeup latency, you should turn packing off.
Cyclictest is a good test application for determining the added latency
when configuring packing.
Why is it turned off for the VersatileExpress V2P_CA15A7 CoreTile?
----
Simply, this core tile only has power gating for the whole A7 package.
When small task packing is enabled, all our low-energy use cases
normally fit onto one A7 CPU. We therefore end up with two mostly-idle
CPUs and one mostly-busy CPU. This decreases the amount of time
available where the whole package is idle and can be turned off.

View File

@@ -1,6 +1,6 @@
VERSION = 3
PATCHLEVEL = 10
SUBLEVEL = 19
SUBLEVEL = 21
EXTRAVERSION =
NAME = TOSSUG Baby Fish

View File

@@ -1513,6 +1513,17 @@ config SCHED_HMP
There is currently no support for migration of task groups, hence
!SCHED_AUTOGROUP. Furthermore, normal load-balancing must be disabled
between cpus of different type (DISABLE_CPU_SCHED_DOMAIN_BALANCE).
When turned on, this option adds sys/kernel/hmp directory which
contains the following files:
up_threshold - the load average threshold used for up migration
(0 - 1023)
down_threshold - the load average threshold used for down migration
(0 - 1023)
hmp_domains - a list of cpumasks for the present HMP domains,
starting with the 'biggest' and ending with the
'smallest'.
Note that both the threshold files can be written at runtime to
control scheduler behaviour.
config SCHED_HMP_PRIO_FILTER
bool "(EXPERIMENTAL) Filter HMP migrations by task priority"
@@ -1547,28 +1558,24 @@ config HMP_VARIABLE_SCALE
bool "Allows changing the load tracking scale through sysfs"
depends on SCHED_HMP
help
When turned on, this option exports the thresholds and load average
period value for the load tracking patches through sysfs.
When turned on, this option exports the load average period value
for the load tracking patches through sysfs.
The values can be modified to change the rate of load accumulation
and the thresholds used for HMP migration.
The load_avg_period_ms is the time in ms to reach a load average of
0.5 for an idle task of 0 load average ratio that start a busy loop.
The up_threshold and down_threshold is the value to go to a faster
CPU or to go back to a slower cpu.
The {up,down}_threshold are devided by 1024 before being compared
to the load average.
For examples, with load_avg_period_ms = 128 and up_threshold = 512,
used for HMP migration. 'load_avg_period_ms' is the time in ms to
reach a load average of 0.5 for an idle task of 0 load average
ratio which becomes 100% busy.
For example, with load_avg_period_ms = 128 and up_threshold = 512,
a running task with a load of 0 will be migrated to a bigger CPU after
128ms, because after 128ms its load_avg_ratio is 0.5 and the real
up_threshold is 0.5.
This patch has the same behavior as changing the Y of the load
average computation to
(1002/1024)^(LOAD_AVG_PERIOD/load_avg_period_ms)
but it remove intermadiate overflows in computation.
but removes intermediate overflows in computation.
config HMP_FREQUENCY_INVARIANT_SCALE
bool "(EXPERIMENTAL) Frequency-Invariant Tracked Load for HMP"
depends on HMP_VARIABLE_SCALE && CPU_FREQ
depends on SCHED_HMP && CPU_FREQ
help
Scales the current load contribution in line with the frequency
of the CPU that the task was executed on.
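The 128ms example in the HMP_VARIABLE_SCALE help text can be
sanity-checked with a short userspace calculation; this is a simplified
model of the tracked load that ignores the 1024us accumulation
granularity and assumes the kernel's usual LOAD_AVG_PERIOD of 32:

  #include <math.h>
  #include <stdio.h>

  #define LOAD_AVG_PERIOD 32      /* assumed kernel default (~32ms half-life) */

  int main(void)
  {
          double load_avg_period_ms = 128.0;
          /* per-ms decay factor after the patch, as given in the help text */
          double y = pow(1002.0 / 1024.0, LOAD_AVG_PERIOD / load_avg_period_ms);
          int t;

          for (t = 32; t <= 256; t *= 2) {
                  /* load of a task that went from idle (0) to 100% busy at t=0 */
                  double ratio = 1.0 - pow(y, t);
                  printf("t = %3d ms  load_avg_ratio ~ %.2f\n", t, ratio);
          }
          return 0;
  }

Linked with -lm, this prints roughly 0.16, 0.29, 0.50 and 0.75: the
task crosses an up_threshold of 512 (0.5 once divided by 1024) at about
128ms, matching the example in the help text.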

View File

@@ -313,6 +313,17 @@ out:
return err;
}
static phys_addr_t kvm_kaddr_to_phys(void *kaddr)
{
if (!is_vmalloc_addr(kaddr)) {
BUG_ON(!virt_addr_valid(kaddr));
return __pa(kaddr);
} else {
return page_to_phys(vmalloc_to_page(kaddr)) +
offset_in_page(kaddr);
}
}
/**
* create_hyp_mappings - duplicate a kernel virtual address range in Hyp mode
* @from: The virtual kernel start address of the range
@@ -324,16 +335,27 @@ out:
*/
int create_hyp_mappings(void *from, void *to)
{
unsigned long phys_addr = virt_to_phys(from);
phys_addr_t phys_addr;
unsigned long virt_addr;
unsigned long start = KERN_TO_HYP((unsigned long)from);
unsigned long end = KERN_TO_HYP((unsigned long)to);
/* Check for a valid kernel memory mapping */
if (!virt_addr_valid(from) || !virt_addr_valid(to - 1))
return -EINVAL;
start = start & PAGE_MASK;
end = PAGE_ALIGN(end);
return __create_hyp_mappings(hyp_pgd, start, end,
__phys_to_pfn(phys_addr), PAGE_HYP);
for (virt_addr = start; virt_addr < end; virt_addr += PAGE_SIZE) {
int err;
phys_addr = kvm_kaddr_to_phys(from + virt_addr - start);
err = __create_hyp_mappings(hyp_pgd, virt_addr,
virt_addr + PAGE_SIZE,
__phys_to_pfn(phys_addr),
PAGE_HYP);
if (err)
return err;
}
return 0;
}
/**

View File

@@ -122,7 +122,15 @@ static void tc2_pm_down(u64 residency)
} else
BUG();
gic_cpu_if_down();
/*
* If the CPU is committed to power down, make sure
* the power controller will be in charge of waking it
* up upon IRQ, ie IRQ lines are cut from GIC CPU IF
* to the CPU by disabling the GIC CPU IF to prevent wfi
* from completing execution behind power controller back
*/
if (!skip_wfi)
gic_cpu_if_down();
if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
arch_spin_unlock(&tc2_pm_lock);

View File

@@ -3,6 +3,7 @@
#include <asm/page.h> /* for __va, __pa */
#include <arch/io.h>
#include <asm-generic/iomap.h>
#include <linux/kernel.h>
struct cris_io_operations

View File

@@ -319,7 +319,7 @@ struct thread_struct {
regs->loadrs = 0; \
regs->r8 = get_dumpable(current->mm); /* set "don't zap registers" flag */ \
regs->r12 = new_sp - 16; /* allocate 16 byte scratch area */ \
if (unlikely(!get_dumpable(current->mm))) { \
if (unlikely(get_dumpable(current->mm) != SUID_DUMP_USER)) { \
/* \
* Zap scratch regs to avoid leaking bits between processes with different \
* uid/privileges. \

View File

@@ -454,7 +454,15 @@ static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
if (copy_vsx_to_user(&frame->mc_vsregs, current))
return 1;
msr |= MSR_VSX;
}
} else if (!ctx_has_vsx_region)
/*
* With a small context structure we can't hold the VSX
* registers, hence clear the MSR value to indicate the state
* was not saved.
*/
msr &= ~MSR_VSX;
#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* save spe registers */

View File

@@ -1530,12 +1530,12 @@ static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
dn = dev->of_node;
if (!dn) {
strcat(buf, "\n");
strcpy(buf, "\n");
return strlen(buf);
}
cp = of_get_property(dn, "compatible", NULL);
if (!cp) {
strcat(buf, "\n");
strcpy(buf, "\n");
return strlen(buf);
}

View File

@@ -258,7 +258,7 @@ static bool slice_scan_available(unsigned long addr,
slice = GET_HIGH_SLICE_INDEX(addr);
*boundary_addr = (slice + end) ?
((slice + end) << SLICE_HIGH_SHIFT) : SLICE_LOW_TOP;
return !!(available.high_slices & (1u << slice));
return !!(available.high_slices & (1ul << slice));
}
}

View File

@@ -57,5 +57,5 @@ config PPC_MPC5200_BUGFIX
config PPC_MPC5200_LPBFIFO
tristate "MPC5200 LocalPlus bus FIFO driver"
depends on PPC_MPC52xx
depends on PPC_MPC52xx && PPC_BESTCOMM
select PPC_BESTCOMM_GEN_BD

View File

@@ -151,13 +151,23 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
rid_end = pe->rid + 1;
}
/* Associate PE in PELT */
/*
* Associate PE in PELT. We need add the PE into the
* corresponding PELT-V as well. Otherwise, the error
* originated from the PE might contribute to other
* PEs.
*/
rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
bcomp, dcomp, fcomp, OPAL_MAP_PE);
if (rc) {
pe_err(pe, "OPAL error %ld trying to setup PELT table\n", rc);
return -ENXIO;
}
rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number,
pe->pe_number, OPAL_ADD_PE_TO_DOMAIN);
if (rc)
pe_warn(pe, "OPAL error %d adding self to PELTV\n", rc);
opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number,
OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);

View File

@@ -35,7 +35,6 @@ static u8 *ctrblk;
static char keylen_flag;
struct s390_aes_ctx {
u8 iv[AES_BLOCK_SIZE];
u8 key[AES_MAX_KEY_SIZE];
long enc;
long dec;
@@ -441,30 +440,36 @@ static int cbc_aes_set_key(struct crypto_tfm *tfm, const u8 *in_key,
return aes_set_key(tfm, in_key, key_len);
}
static int cbc_aes_crypt(struct blkcipher_desc *desc, long func, void *param,
static int cbc_aes_crypt(struct blkcipher_desc *desc, long func,
struct blkcipher_walk *walk)
{
struct s390_aes_ctx *sctx = crypto_blkcipher_ctx(desc->tfm);
int ret = blkcipher_walk_virt(desc, walk);
unsigned int nbytes = walk->nbytes;
struct {
u8 iv[AES_BLOCK_SIZE];
u8 key[AES_MAX_KEY_SIZE];
} param;
if (!nbytes)
goto out;
memcpy(param, walk->iv, AES_BLOCK_SIZE);
memcpy(param.iv, walk->iv, AES_BLOCK_SIZE);
memcpy(param.key, sctx->key, sctx->key_len);
do {
/* only use complete blocks */
unsigned int n = nbytes & ~(AES_BLOCK_SIZE - 1);
u8 *out = walk->dst.virt.addr;
u8 *in = walk->src.virt.addr;
ret = crypt_s390_kmc(func, param, out, in, n);
ret = crypt_s390_kmc(func, &param, out, in, n);
if (ret < 0 || ret != n)
return -EIO;
nbytes &= AES_BLOCK_SIZE - 1;
ret = blkcipher_walk_done(desc, walk, nbytes);
} while ((nbytes = walk->nbytes));
memcpy(walk->iv, param, AES_BLOCK_SIZE);
memcpy(walk->iv, param.iv, AES_BLOCK_SIZE);
out:
return ret;
@@ -481,7 +486,7 @@ static int cbc_aes_encrypt(struct blkcipher_desc *desc,
return fallback_blk_enc(desc, dst, src, nbytes);
blkcipher_walk_init(&walk, dst, src, nbytes);
return cbc_aes_crypt(desc, sctx->enc, sctx->iv, &walk);
return cbc_aes_crypt(desc, sctx->enc, &walk);
}
static int cbc_aes_decrypt(struct blkcipher_desc *desc,
@@ -495,7 +500,7 @@ static int cbc_aes_decrypt(struct blkcipher_desc *desc,
return fallback_blk_dec(desc, dst, src, nbytes);
blkcipher_walk_init(&walk, dst, src, nbytes);
return cbc_aes_crypt(desc, sctx->dec, sctx->iv, &walk);
return cbc_aes_crypt(desc, sctx->dec, &walk);
}
static struct crypto_alg cbc_aes_alg = {

View File

@@ -933,7 +933,7 @@ static ssize_t show_idle_count(struct device *dev,
idle_count = ACCESS_ONCE(idle->idle_count);
if (ACCESS_ONCE(idle->clock_idle_enter))
idle_count++;
} while ((sequence & 1) || (idle->sequence != sequence));
} while ((sequence & 1) || (ACCESS_ONCE(idle->sequence) != sequence));
return sprintf(buf, "%llu\n", idle_count);
}
static DEVICE_ATTR(idle_count, 0444, show_idle_count, NULL);
@@ -951,7 +951,7 @@ static ssize_t show_idle_time(struct device *dev,
idle_time = ACCESS_ONCE(idle->idle_time);
idle_enter = ACCESS_ONCE(idle->clock_idle_enter);
idle_exit = ACCESS_ONCE(idle->clock_idle_exit);
} while ((sequence & 1) || (idle->sequence != sequence));
} while ((sequence & 1) || (ACCESS_ONCE(idle->sequence) != sequence));
idle_time += idle_enter ? ((idle_exit ? : now) - idle_enter) : 0;
return sprintf(buf, "%llu\n", idle_time >> 12);
}

View File

@@ -190,7 +190,7 @@ cputime64_t s390_get_idle_time(int cpu)
sequence = ACCESS_ONCE(idle->sequence);
idle_enter = ACCESS_ONCE(idle->clock_idle_enter);
idle_exit = ACCESS_ONCE(idle->clock_idle_exit);
} while ((sequence & 1) || (idle->sequence != sequence));
} while ((sequence & 1) || (ACCESS_ONCE(idle->sequence) != sequence));
return idle_enter ? ((idle_exit ?: now) - idle_enter) : 0;
}

View File

@@ -248,6 +248,15 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
return ret;
}
static int is_ftrace_caller(unsigned long ip)
{
if (ip == (unsigned long)(&ftrace_call) ||
ip == (unsigned long)(&ftrace_regs_call))
return 1;
return 0;
}
/*
* A breakpoint was added to the code address we are about to
* modify, and this is the handle that will just skip over it.
@@ -257,10 +266,13 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
*/
int ftrace_int3_handler(struct pt_regs *regs)
{
unsigned long ip;
if (WARN_ON_ONCE(!regs))
return 0;
if (!ftrace_location(regs->ip - 1))
ip = regs->ip - 1;
if (!ftrace_location(ip) && !is_ftrace_caller(ip))
return 0;
regs->ip += MCOUNT_INSN_SIZE - 1;

View File

@@ -430,7 +430,7 @@ static enum ucode_state request_microcode_amd(int cpu, struct device *device,
snprintf(fw_name, sizeof(fw_name), "amd-ucode/microcode_amd_fam%.2xh.bin", c->x86);
if (request_firmware(&fw, (const char *)fw_name, device)) {
pr_err("failed to load file %s\n", fw_name);
pr_debug("failed to load file %s\n", fw_name);
goto out;
}

View File

@@ -378,9 +378,9 @@ static void amd_e400_idle(void)
* The switch back from broadcast mode needs to be
* called with interrupts disabled.
*/
local_irq_disable();
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
local_irq_enable();
local_irq_disable();
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
local_irq_enable();
} else
default_idle();
}

View File

@@ -4207,7 +4207,10 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
case OpMem8:
ctxt->memop.bytes = 1;
if (ctxt->memop.type == OP_REG) {
ctxt->memop.addr.reg = decode_register(ctxt, ctxt->modrm_rm, 1);
int highbyte_regs = ctxt->rex_prefix == 0;
ctxt->memop.addr.reg = decode_register(ctxt, ctxt->modrm_rm,
highbyte_regs);
fetch_register_operand(&ctxt->memop);
}
goto mem_common;

View File

@@ -2229,6 +2229,7 @@ void blk_start_request(struct request *req)
if (unlikely(blk_bidi_rq(req)))
req->next_rq->resid_len = blk_rq_bytes(req->next_rq);
BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags));
blk_add_timer(req);
}
EXPORT_SYMBOL(blk_start_request);

Some files were not shown because too many files have changed in this diff.