mirror of
https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
1d68d2612c2e7309166fa43d8e27eb163435527f
298 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1d68d2612c |
ACPI / EC: Add event clearing variation support.
We've been suffering from the uncertainty of the SCI_EVT clearing timing.
This patch implements 3 of 4 possible modes to handle SCI_EVT clearing
variations. The old behavior is kept in this patch.
Status: QR_EC is re-checked as early as possible after checking previous
SCI_EVT. This always leads to 2 QR_EC transactions per SCI_EVT
indication and the target may implement event queue which returns
0x00 indicating "no outstanding event".
This is proven to be a conflict against Windows behavior, but is
still kept in this patch to make the EC driver robust to the
possible regressions that may occur on Samsung platforms.
Query: QR_EC is re-checked after the target has handled the QR_EC query
request command pushed by the host.
Event: QR_EC is re-checked after the target has noticed the query event
response data pulled by the host.
This timing is not determined by any IRQs, so we may need to use a
guard period in this mode, which may explain the existence of the
ec_guard() code used by the old EC driver where the re-check timing
is implemented in the similar way as this mode.
Method: QR_EC is re-checked as late as possible after completing the _Qxx
evaluation. The target may implement SCI_EVT like a level triggered
interrupt.
It is proven on kernel bugzilla 94411 that Windows has all
_Qxx evaluations parallelized. Thus, unless further evidence
requires it, we needn't implement this mode as it conflicts with
the _Qxx parallelism requirement.
Note that, according to the reports, there are platforms that cannot be
handled using the "Status" mode without enabling the
EC_FLAGS_QUERY_HANDSHAKE quirk. But they can be handled with the other
modes according to the tests (kernel bugzilla 97381).
The following log entry can be used to confirm the differences of the 3
modes as it should appear at the different positions for the 3 modes:
Command(QR_EC) unblocked
Status: appearing after
EC_SC(W) = 0x84
Query: appearing after
EC_DATA(R) = 0xXX
where XX is the event number used to determine _QXX
Event: appearing after first
EC_SC(R) = 0xX0 SCI_EVT=x BURST=0 CMD=0 IBF=0 OBF=0
that is next to the following log entry:
Command(QR_EC) completed by hardware
Link: https://bugzilla.kernel.org/show_bug.cgi?id=94411
Link: https://bugzilla.kernel.org/show_bug.cgi?id=97381
Link: https://bugzilla.kernel.org/show_bug.cgi?id=98111
Reported-and-tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Reported-and-tested-by: Tigran Gabrielyan <tigrangab@gmail.com>
Reported-and-tested-by: Adrien D <ghbdtn@openmailbox.org>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
||
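The three implemented modes differ only in the earliest point of a QR_EC transaction at which SCI_EVT may be re-checked. A minimal sketch of that rule, assuming hypothetical names (`evt_timing`, `qr_ec_stage`, `sci_evt_recheck_allowed`) that are not taken from the driver:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: when the firmware is assumed to clear SCI_EVT. */
enum evt_timing { EVT_TIMING_STATUS, EVT_TIMING_QUERY, EVT_TIMING_EVENT };

/* Progress of one QR_EC transaction, in the order the steps occur. */
enum qr_ec_stage {
    STAGE_SCI_EVT_SEEN,  /* host read EC_SC with SCI_EVT=1 */
    STAGE_CMD_WRITTEN,   /* host wrote QR_EC (0x84) to EC_SC */
    STAGE_DATA_READ,     /* host read the event number from EC_DATA */
    STAGE_COMPLETED      /* first EC_SC read after the data was pulled */
};

/* Returns true once the given mode allows SCI_EVT to be re-checked. */
static bool sci_evt_recheck_allowed(enum evt_timing mode, enum qr_ec_stage stage)
{
    switch (mode) {
    case EVT_TIMING_STATUS:
        return stage >= STAGE_CMD_WRITTEN;
    case EVT_TIMING_QUERY:
        return stage >= STAGE_DATA_READ;
    case EVT_TIMING_EVENT:
        return stage >= STAGE_COMPLETED;
    }
    return false;
}
```

The stage boundaries mirror the positions of the "Command(QR_EC) unblocked" log entry listed in the commit message. The "Method" mode is left out because the patch does not implement it.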
|
|
9d8993be2d |
ACPI / EC: Convert event handling work queue into loop style.
During the period that a work queue is scheduled (queued up for run) but hasn't been run, a second schedule_work() could fail. This may not lead to the loss of queries because QR_EC is always ensured to be submitted after the work queue has been in the running state. The event handling work queue can be changed into the loop style to allow us to control the code in a more flexible way: 1. Makes it possible to add event=0x00 termination condition in the loop. 2. Increases the throughput of the QR_EC transactions as the 2nd+ QR_EC transactions may be handled in the same work item used for the 1st QR_EC transaction, thus the delay caused by the 2nd+ work item scheduling can be eliminated. Except for the logging message changes and the throughput improvement, this patch is just a functional no-op. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Tested-by: Tigran Gabrielyan <tigrangab@gmail.com> Tested-by: Adrien D <ghbdtn@openmailbox.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
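The loop style described above can be pictured with a small user-space model. This is only a sketch: `fake_ec`, `fake_query` and `event_work` are hypothetical names, and the event=0x00 termination is the condition the commit says the loop makes possible, not something this patch is stated to add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the EC: a queue of pending event numbers. */
struct fake_ec {
    const unsigned char *events;
    size_t count;
    size_t next;
};

/* Stand-in for one QR_EC transaction; 0x00 means "no outstanding event". */
static unsigned char fake_query(struct fake_ec *ec)
{
    return ec->next < ec->count ? ec->events[ec->next++] : 0x00;
}

/* One work item drains all pending events and returns how many it handled. */
static size_t event_work(struct fake_ec *ec)
{
    size_t handled = 0;

    for (;;) {
        unsigned char event = fake_query(ec);

        if (event == 0x00)
            break;      /* nothing left, stop without rescheduling */
        handled++;      /* a real driver would evaluate _Qxx here */
    }
    return handled;
}
```

Because the second and later queries run inside the same work item, no additional scheduling delay is paid for them, which is the throughput gain the message refers to.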
|
|
f8b8eb7153 |
ACPI / EC: Cleanup transaction state transition.
This patch collects transaction state transition code into one function. We then could have a single function to maintain transaction transition related behaviors. No functional changes. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Tested-by: Tigran Gabrielyan <tigrangab@gmail.com> Tested-by: Adrien D <ghbdtn@openmailbox.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
|
|
3174abcfea |
ACPI / EC: Remove non-root-caused busy polling quirks.
{ Update to correct 1 patch subject in the description }
We have fixed a lot of race issues in the EC driver recently.
The following commit introduces MSI udelay()/msleep() quirk to MSI laptops
to make EC firmware working for bug 12011 without root causing any EC
driver race issues:
Commit:
|
||
|
|
15de603b04 |
ACPI / EC: Add module params for polling modes.
We have 2 polling modes in the EC driver: 1. busy polling: originally used for the MSI quirks. udelay() is used to perform register access guarding. 2. wait polling: normal code path uses wait_event_timeout() and it can be woken up as soon as the transaction is completed in the interrupt mode. It also contains the register access guarding logic in case the interrupt doesn't arrive and the EC driver is about to advance the transaction in task context (the polling mode). The wait polling is useful for interrupt mode to allow other tasks to use the CPU during the wait. But for the polling mode, the busy polling takes less time than the wait polling, because if no interrupt arrives, the wait polling has to wait the minimal HZ interval. We have a new use case for using the busy polling mode. Some GPIO drivers initialize PIN configuration which causes a GPIO multiplexed EC GPE to be disabled out of the GPE register's control. Busy polling mode is useful here as it takes less time than the wait polling. But the guarding logic prevents it from responding even faster. We should spin on the EC status rather than on a nop execution that lasts a fixed period. This patch introduces 2 module params for the polling mode switch and the guard time, so that users can use the busy polling mode without the guarding in case the guarding is not necessary. This is an example to use the 2 module params for this purpose: acpi.ec_busy_polling acpi.ec_polling_guard=0 We've tested the patch on a test platform. The platform suffers from this kind of GPIO PIN issue. The GPIO driver resets all PIN configuration and after that, EC interrupt cannot arrive because of the multiplexing. Then the platform suffers from a long delay carried out by the wait_event_timeout() as all further EC transactions will run in the polling mode.
We switched the EC driver to use the busy polling mechanism instead of the wait timeout polling mechanism and the delay is still high: [ 44.283005] calling PNP0C0B:00+ @ 1305, parent: platform [ 44.417548] call PNP0C0B:00+ returned 0 after 131323 usecs And this patch can significantly reduce the delay: [ 44.502625] calling PNP0C0B:00+ @ 1308, parent: platform [ 44.503760] call PNP0C0B:00+ returned 0 after 1103 usecs Tested-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
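The effect of the guard time on busy polling can be illustrated with a simulated clock. The following is a sketch under stated assumptions: `busy_poll_us` is a hypothetical helper, each status read is assumed to cost 1 us, and the numbers in the usage note are arbitrary rather than measured.

```c
#include <assert.h>

/*
 * Simulated busy polling (hypothetical model, not driver code).
 * The EC becomes ready at ready_at_us; guard_us is the guard period
 * inserted between two consecutive register accesses.
 * Returns the simulated time at which completion is observed.
 */
static unsigned long busy_poll_us(unsigned long ready_at_us,
                                  unsigned long guard_us)
{
    unsigned long now = 0;

    for (;;) {
        now += 1;                /* assumed cost of one EC_SC read */
        if (now >= ready_at_us)
            return now;          /* status shows the transaction is done */
        now += guard_us;         /* guard before touching the register again */
    }
}
```

With a guard of 0 the loop notices completion as soon as the status changes, while a non-zero guard can overshoot by up to one guard period, e.g. `busy_poll_us(100, 0)` gives 100 and `busy_poll_us(100, 500)` gives 502 in this model.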
|
|
d8d031a605 |
ACPI / EC: Fix and clean up register access guarding logics.
In the polling mode, EC driver shouldn't access the EC registers too
frequently. Though this statement is concluded from the non-root caused
bugs (see links below), we've maintained the register access guarding
logics in the current EC driver. The guarding logics are scattered here and
there, which makes it hard to root cause real timing issues. This patch collects
the guarding logics into one single function so that all hidden logics
related to this can be seen clearly.
The current guarding related code also has several issues:
1. Per-transaction timestamp prevents inter-transaction guarding from being
implemented in the same place. We have an inter-transaction udelay() in
acpi_ec_transaction_unblocked(), this logic can be merged into ec_poll()
if we can use per-device timestamp. This patch completes such merge to
form a new ec_guard() function and collects all guarding related hidden
logics in it.
One hidden logic is: there is no inter-transaction guarding performed
for non MSI quirk (wait polling mode), this patch skips
inter-transaction guarding before wait_event_timeout() for the wait
polling mode to reveal the hidden logic.
The other hidden logic is: there is msleep() inter-transaction guarding
performed when the GPE storming is observed. As after merging this
commit:
Commit:
|
||
|
|
373783e6e9 |
ACPI / EC: Remove irqs_disabled() check.
The following commit merges polling and interrupt modes for EC driver:
Commit:
|
||
|
|
5ab82a11e5 |
ACPI / EC: Remove storming threshold enlarging quirk.
This patch removes the storming threshold enlarging quirk. After applying the following commit, we can notice that no no-op GPE handling invocations can be observed, thus it is unlikely that the no-op counts can exceed the storming threshold: Commit: |
||
|
|
7c0b2595da |
ACPI / EC: Update acpi_ec_is_gpe_raised() with new GPE status flag.
This patch updates acpi_ec_is_gpe_raised() according to the following
commit:
Commit:
|
||
|
|
6b5eab5469 |
ACPI / EC: fix NULL pointer dereference in acpi_ec_remove_query_handler()
Use list_for_each_entry_safe for iterating because handler may be freed in the loop. BUG: unable to handle kernel NULL pointer dereference at 000000000000002c IP: [<ffffffff814d69c8>] acpi_ec_put_query_handler+0x7/0x1a Call Trace: acpi_ec_remove_query_handler+0x87/0x97 acpi_smbus_hc_remove+0x2a/0x44 [sbshc] acpi_device_remove+0x7b/0x9a __device_release_driver+0x7e/0x110 driver_detach+0xb0/0xc0 bus_remove_driver+0x54/0xe0 driver_unregister+0x2b/0x60 acpi_bus_unregister_driver+0x10/0x12 acpi_smb_hc_driver_exit+0x10/0x12 [sbshc] SyS_delete_module+0x1b8/0x210 system_call_fastpath+0x12/0x6a Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
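The fix relies on the usual "remember the next element before freeing the current one" pattern that list_for_each_entry_safe() provides. A user-space sketch of the same idea on a hand-rolled singly linked list (all names here are hypothetical; the kernel code uses its own list API):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical query handler node. */
struct handler {
    unsigned char query_bit;
    struct handler *next;
};

/* Pushes a handler at the front of the list; returns the new head. */
static struct handler *add_handler(struct handler *head, unsigned char bit)
{
    struct handler *h = malloc(sizeof(*h));

    if (!h)
        return head;
    h->query_bit = bit;
    h->next = head;
    return h;
}

/* Removes every handler matching bit; returns the new head. */
static struct handler *remove_handlers(struct handler *head, unsigned char bit)
{
    struct handler **link = &head;
    struct handler *cur = head;
    struct handler *next;

    while (cur) {
        next = cur->next;        /* saved before cur may be freed */
        if (cur->query_bit == bit) {
            *link = next;        /* unlink, then free */
            free(cur);
        } else {
            link = &cur->next;
        }
        cur = next;              /* never dereferences freed memory */
    }
    return head;
}

/* Counts the handlers left in the list. */
static int count_handlers(const struct handler *head)
{
    int n = 0;

    for (; head; head = head->next)
        n++;
    return n;
}
```

An unsafe loop would read `cur->next` after `free(cur)`, which is the kind of use-after-free that can surface as the NULL pointer dereference shown in the trace.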
|
|
1c832b3e85 |
ACPI / EC: Call acpi_walk_dep_device_list() after installing EC opregion handler
On some machines (e.g. Microsoft Surface 3), the ACPI battery depends on the EC operation region and has a _DEP method which contains the EC. The current code doesn't support such devices: their dep_unmet is not decreased after the EC opregion handler is installed. This blocks the battery device from being attached to its driver. This patch fixes the issue. Link: https://bugzilla.kernel.org/show_bug.cgi?id=90161 Reported-and-tested-by: Lompik <lompik@voila.fr> Tested-by: Valentin Lab <valentin.lab_bugzilla.kernel.org@kalysto.org> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
|
|
770970f0b4 |
ACPI / EC: Add GPE reference counting debugging messages.
This patch enhances debugging with the GPE reference count messages added. This kind of log entries can be used by the platform validators to validate if there is an EC transaction broken because of firmware/driver bugs. No functional changes. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
|
|
3535a3c126 |
ACPI / EC: Cleanup logging/debugging splitter support.
This patch refines logging/debugging splitter support so that when DEBUG is
disabled, splitters won't be visible in the kernel logs while they are
still available for developers when DEBUG is enabled.
This patch also refines the splitters to mark the following handling
process boundaries:
+++++: boundary of driver starting/stopping
boundary of IRQ storming
=====: boundary of transaction advancement
*****: boundary of EC command
boundary of EC query
#####: boundary of EC _Qxx evaluation
The following 2 log entries are originally logged using pr_info() in order
to be used as the boot/suspend/resume log entries for the EC device, this
patch also restores them to pr_info() logging level:
ACPI : EC: EC started
ACPI : EC: EC stopped
In this patch, one log entry around "Polling quirk" is converted into
ec_dbg_raw() which doesn't contain the boundary marker.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
||
|
|
92e4b1bcd6 |
ACPI / EC: Remove non-standard log emphasis
Remove unusual pr_info() visual emphasis introduced in
|
||
|
|
37d11391c2 |
Revert "ACPI / EC: Add query flushing support"
Revert commit
|
||
|
|
e06bf91b59 |
Revert "ACPI / EC: Add GPE reference counting debugging messages"
Revert commit |
||
|
|
b5bca896ef |
ACPI / EC: Add GPE reference counting debugging messages
This patch enhances debugging with the GPE reference count messages added. No functional changes. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
|
|
f252cb09e1 |
ACPI / EC: Add query flushing support
This patch implements the QR_EC flushing support.
Grace periods are implemented from the detection of an SCI_EVT to the
submission/completion of the QR_EC transaction. During this period, all
EC command transactions are allowed to be submitted.
Note that query periods and event periods are intentionally distinguished to
allow further improvements.
1. Query period: from the detection of an SCI_EVT to the submission of the
QR_EC command. This period is used for storming prevention, as currently
QR_EC is deferred to a work queue rather than directly issued from the
IRQ context even if there are no other transactions pending, so a malicious
SCI_EVT GPE can act like "level triggered" to trigger a GPE storm. We
need to be prepared for this. And in the future, we may change it to be
a part of the advance_transaction() where we will try QR_EC submission
in appropriate positions to avoid such GPE storming.
2. Event period: from the detection of an SCI_EVT to the completion of the
QR_EC command. We may extend it to the completion of _Qxx evaluation.
This is actually a grace period for event flushing, but we only flush
queries due to the reason stated in known issue 1. That's also why we
use EC_FLAGS_EVENT_xxx. During this period, QR_EC transactions need to
pass the flushable submission check.
In this patch, the following flags are implemented:
1. EC_FLAGS_EVENT_ENABLED: this is derived from the old
EC_FLAGS_QUERY_PENDING flag which can block SCI_EVT handlings.
With this flag, the logics implemented by the original flag are
extended:
1. Old logic: unless both of the flags are set, the event poller will
not be scheduled, and
2. New logic: as soon as both of the flags are set, the event poller will
be scheduled.
2. EC_FLAGS_EVENT_DETECTED: this is also derived from the old
EC_FLAGS_QUERY_PENDING flag which can block SCI_EVT detection. It thus
can be used to indicate the storming prevention period for query
submission.
acpi_ec_submit_request()/acpi_ec_complete_request() are invoked to
implement this period so that acpi_set_gpe() can be invoked under the
"reference count > 0" condition.
3. EC_FLAGS_EVENT_PENDING: this is newly added to indicate the grace period
for event flushing (query flushing for now).
acpi_ec_submit_request()/acpi_ec_complete_request() are invoked to
implement this period so that the flushing process can wait until the
event handling (query transaction for now) to be completed.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=82611
Link: https://bugzilla.kernel.org/show_bug.cgi?id=77431
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Tested-by: Ortwin Glück <odi@odi.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
||
|
|
e1d4d90fc0 |
ACPI / EC: Refine command storm prevention support
This patch refines EC command storm prevention support. Current command storming code is wrong: when the storming condition is detected, it only flags the condition without doing anything for the current command but performing storming prevention for the follow-up commands. So: 1. The first command which suffers from the storming still suffers from storming. 2. The follow-up commands which may not suffer from the storming are unconditionally forced into the storming prevention mode. Ideally, we should only enable storm prevention immediately after detection for the current command so that the next command can try the power/performance efficient interrupt mode again. This patch improves the command storm prevention by disabling GPE right after the detection and re-enabling it right before completing the command transaction using the GPE storming prevention APIs. This thus deploys the following GPE handling model: 1. acpi_enable_gpe()/acpi_disable_gpe() for reference count changes: This set of APIs is used for EC usage reference counting. 2. acpi_set_gpe(ACPI_GPE_ENABLE)/acpi_set_gpe(ACPI_GPE_DISABLE): This set of APIs is used for preventing GPE storm. They must be invoked when the reference count > 0. Note that the storming prevention should always happen when there is an outstanding request, or the GPE enabling value will be messed up by races. This patch also adds BUG_ON() to enforce this rule to prevent future bugs. The msleep(1) used after completing a transaction is useless now as this sounds like a guard time only useful for platforms that need the EC_FLAGS_MSI quirks while we have fixed GPE race issues using the previous raw handler mode enabling. It is kept to avoid regressions. A separate patch which deletes EC_FLAGS_MSI quirks should take care of deleting it. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
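The per-command storm prevention described above can be modelled in a few lines. This is a sketch, not the driver's code: `ec_sim`, `STORM_THRESHOLD` and the three functions are hypothetical, and the boolean `gpe_enabled` stands in for the effect of acpi_set_gpe().

```c
#include <assert.h>
#include <stdbool.h>

#define STORM_THRESHOLD 8u   /* hypothetical value, not from the commit */

struct ec_sim {
    unsigned int refcount;   /* outstanding requests */
    unsigned int noop_irqs;  /* no-op interrupts seen in this transaction */
    bool gpe_enabled;
    bool storming;
};

static void start_transaction(struct ec_sim *ec)
{
    ec->refcount++;
    ec->noop_irqs = 0;
}

/* Called for every interrupt that did not advance the transaction. */
static void noop_irq(struct ec_sim *ec)
{
    assert(ec->refcount > 0);    /* mirrors the "reference count > 0" rule */
    if (!ec->storming && ++ec->noop_irqs > STORM_THRESHOLD) {
        ec->storming = true;
        ec->gpe_enabled = false; /* disable right after detection */
    }
}

static void complete_transaction(struct ec_sim *ec)
{
    assert(ec->refcount > 0);
    if (ec->storming) {
        ec->storming = false;
        ec->gpe_enabled = true;  /* re-enable right before completion */
    }
    ec->refcount--;
}
```

The point of the design is that the storm flag never outlives the command that triggered it, so the next command starts again in interrupt mode.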
|
|
9887d22add |
ACPI / EC: Add command flushing support.
This patch implements the EC command flushing support. During the grace period indicated by EC_FLAGS_STARTED and EC_FLAGS_STOPPED, all submitted EC command transactions can be completed and new submissions are prevented before suspending so that the EC hardware can be ensured to be in the idle state when the system is resumed. There is a good indicator for flush support: All acpi_ec_submit_request() is invoked after checking driver state with acpi_ec_started() except the first one. This means all code paths can be flushed as fast as possible by discarding the requests occurred after the flush operation. The reference increased for such kind of code path is wrapped by acpi_ec_submit_flushable_request(). Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Ortwin Glück <odi@odi.ch> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
|
|
ad479e7f47 |
ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag
By using the 2 flags, we can indicate an intermediate state where the current transactions should be completed while the new transactions should be dropped. The comparison of the old flag and the new flags (old => new): about to set BLOCKED => STOPPED set / STARTED set; BLOCKED set => STOPPED clear / STARTED clear; BLOCKED clear => STOPPED clear / STARTED set. A new period can be indicated by the 2 flags. The new period is between the point where we are about to set BLOCKED and the point when the BLOCKED is set. The new flags facilitate us with acpi_ec_started() check to allow the EC transaction to be submitted during the new period. This period thus can be used as a grace period for the EC transaction flushing. The only functional change after applying this patch is: 1. The GPE enabling/disabling is protected by the EC specific lock. We can do this because of recent ACPICA GPE API enhancement. This is reasonable as the GPE disabling/enabling state should only be determined by the EC driver's state machine which is protected by the EC spinlock. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Tested-by: Ortwin Glück <odi@odi.ch> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
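The mapping between the old BLOCKED flag and the two new flags can be written down as a small classifier. A sketch with hypothetical names and bit values (`FLAG_STARTED`, `FLAG_STOPPED`, `ec_phase_of`), intended only to restate the comparison in the message:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for the two flags. */
enum { FLAG_STARTED = 1u << 0, FLAG_STOPPED = 1u << 1 };

enum ec_phase {
    EC_PHASE_RUNNING,   /* old: BLOCKED clear */
    EC_PHASE_STOPPING,  /* old: about to set BLOCKED (the new grace period) */
    EC_PHASE_STOPPED,   /* old: BLOCKED set */
    EC_PHASE_INVALID    /* STOPPED without STARTED is not listed in the message */
};

static enum ec_phase ec_phase_of(unsigned int flags)
{
    bool started = (flags & FLAG_STARTED) != 0;
    bool stopped = (flags & FLAG_STOPPED) != 0;

    if (started && !stopped)
        return EC_PHASE_RUNNING;
    if (started && stopped)
        return EC_PHASE_STOPPING;
    if (!started && !stopped)
        return EC_PHASE_STOPPED;
    return EC_PHASE_INVALID;
}
```

The STOPPING phase is the one a single BLOCKED flag could not express: both flags are set between the decision to stop and the point where the driver is actually stopped.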
|
|
a8d4fc227f |
ACPI / EC: Update revision due to raw handler mode.
The bug fixes around GPE races have been done to the EC driver by the previous commits. This patch increases the revision to 3 to indicate the behavior differences between the old and the new drivers. The copyright/authorship notices are also updated. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
|
|
9e295ac14d |
ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp.
Timeout in the ec_poll() doesn't refer to the last register access time. It thus can win the competition against the acpi_ec_gpe_handler() if a transaction takes longer than 1ms but individual register accesses are less than 1ms. In some cases, it can make the following silicon bug easier to be triggered: GPE EN is not wired to the GPE trigger line, so when GPE STS is already set when 1 is written to GPE EN, no GPE can be triggered. This patch adds register access timestamp reference support for ec_poll() to reduce the number of ec_poll() invocations. Reported-by: Venkat Raghavulu <venkat.raghavulu@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
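Basing the poll decision on the last register access rather than on the start of the wait can be sketched as follows. `poll_due` and its tick arguments are hypothetical; the kernel has its own jiffies helpers for this kind of comparison.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Returns true when at least `interval` ticks have passed since the last
 * EC register access. Unsigned subtraction keeps the comparison correct
 * when the tick counter wraps around.
 */
static bool poll_due(unsigned long now, unsigned long last_access,
                     unsigned long interval)
{
    return now - last_access >= interval;
}
```

If the interrupt handler touched a register recently, `last_access` is fresh and the poller backs off, which reduces how often polling competes with the GPE handler.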
|
|
ca37bfdfbc |
ACPI / EC: Fix several GPE handling issues by deploying ACPI_GPE_DISPATCH_RAW_HANDLER mode.
This patch switches EC driver into ACPI_GPE_DISPATCH_RAW_HANDLER mode where
the GPE lock is not held for acpi_ec_gpe_handler() and the ACPICA internal
GPE enabling/disabling/clearing operations are bypassed so that further
improvements are possible with the GPE APIs.
There are 2 strong reasons for deploying raw GPE handler mode in the EC
driver:
1. Some hardware logics can control their interrupts via their own
registers, so their interrupts can be disabled/enabled/acknowledged
without using the super IRQ controller provided functions. While there
is no means (EC commands) for the EC driver to achieve this.
2. During suspending, the EC driver is still working for a while to
complete the platform firmware provided functionalities using ec_poll()
after all GPEs are disabled (see acpi_ec_block_transactions()), which
means the EC driver will drive the EC GPE out of the GPE core's control.
Without deploying the raw GPE handler mode, we can see many races between
the EC driver and the GPE core due to the above restrictions:
1. There is a race condition due to ACPICA internal GPE
disabling/clearing/enabling logics in acpi_ev_gpe_dispatch():
Originally EC GPE is disabled (EN=0), cleared (STS=0) before invoking a
GPE handler and re-enabled (EN=1) after invoking a GPE handler in
acpi_ev_gpe_dispatch(). When re-enabling appears, GPE may be flagged
(STS=1).
=================================================================
(event pending A)
=================================================================
acpi_ev_gpe_dispatch()                          ec_poll()
  EN=0
  STS=0
  acpi_ec_gpe_handler()
*****************************************************************
(event handling A)
    Lock(EC)
    advance_transaction()
      EC_SC read
=================================================================
(event pending B)
=================================================================
      EC_SC handled
    Unlock(EC)
*****************************************************************
*****************************************************************
(event handling B)
                                                Lock(EC)
                                                advance_transaction()
                                                  EC_SC read
=================================================================
(event pending C)
=================================================================
                                                  EC_SC handled
                                                Unlock(EC)
*****************************************************************
  EN=1
This race condition is the root cause of different issues on different
silicon variations.
A. Silicon variation A:
On some platforms, GPE will be triggered due to "writing 1 to EN when
STS=1". This is because both EN and STS lines are wired to the GPE
trigger line.
1. Issue 1:
We can see no-op acpi_ec_gpe_handler() invoked on such platforms.
This is because:
a. event pending B: An event can arrive after ACPICA's GPE
clearing performed in acpi_ev_gpe_dispatch(), this event may
fail to be detected by EC_SC read that is performed before its
arrival;
b. event handling B: The event can be handled in ec_poll() because
EC lock is released after acpi_ec_gpe_handler() invocation;
c. There is no code in ec_poll() to clear STS but the GPE can
still be triggered by the EN=1 write performed in
acpi_ev_finish_gpe(), this leads to a no-op EC GPE handler
invocation;
d. As no-op GPE handler invocations are counted by the EC driver
to trigger the command storming conditions, the wrong no-op
GPE handler invocations thus can easily trigger wrong command
storming conditions.
Note 1:
If we removed GPE disabling/enabling code from
acpi_ev_gpe_dispatch(), we could still see no-op GPE handlers
triggered by the event arriving after the GPE clearing and before
the GPE handling on both silicon variation A and B. This can only
occur if the CPU is very slow (timing slice between STS=0 write
and EC_SC read should be short enough before hardware sets another
GPE indication). Thus this is very rare and is not what we need to
fix.
B. Silicon variation B:
On other platforms, GPE may not be triggered due to "writing 1 to EN
when STS=1". This is because only STS line is wired to the GPE
trigger line.
2. Issue 2:
We can see GPE loss on such platforms. This is because:
a. event pending B vs. event handling A: An event can arrive after
ACPICA's GPE handling performed in acpi_ev_gpe_dispatch(), or
event pending C vs. event handling B: An event can arrive after
Linux's GPE handling performed in ec_poll(),
these events may fail to be detected by EC_SC read that is
performed before their arrival;
b. The GPE cannot be triggered by EN=1 write performed in
acpi_ev_finish_gpe();
c. If no polling mechanism is implemented in the driver for the
pending event (for example, SCI_EVT), this event is lost due to
no GPE being triggered.
Note 2:
On most platforms, there might be another rule that GPE may not be
triggered due to "writing 1 to STS when STS=1 and EN=1".
Then on silicon variation B, an even worse case is if the issue 2
event loss happens, further events may never trigger GPE again on
such platforms due to being blocked by the current STS=1. Unless
someone clears STS, all events have to be polled.
2. There is a race condition due to lacking in GPE status checking in EC
driver:
Originally, GPE status is checked in ACPICA core but not checked in
the GPE handler. Thus since the status checking and handling is not
locked, it can be interrupted by another handling path.
=================================================================
(event pending A)
=================================================================
acpi_ev_gpe_detect()                            ec_poll()
  if (EN==1 && STS==1)
*****************************************************************
(event handling A)
                                                Lock(EC)
                                                advance_transaction()
                                                  EC_SC read
                                                  EC_SC handled
                                                Unlock(EC)
*****************************************************************
  acpi_ev_gpe_dispatch()
    EN=0
    STS=0
    acpi_ec_gpe_handler()
*****************************************************************
(event handling B)
      Lock(EC)
      advance_transaction()
        EC_SC read
      Unlock(EC)
*****************************************************************
3. Issue 3:
We can see no-op acpi_ec_gpe_handler() invoked on both silicon
variation A and B. This is because:
a. event pending A: An event can arrive to trigger an EC GPE and
ACPICA checks it and is about to invoke the EC GPE handler;
b. event handling A: The event can be handled in ec_poll() because
EC lock is not held after the GPE status checking;
c. event handling B: Then when the EC GPE handler is invoked, it
becomes a no-op GPE handler invocation.
d. As no-op GPE handler invocations are counted by the EC driver
to trigger the command storming conditions, the wrong no-op
GPE handler invocations thus can easily trigger wrong command
storming conditions.
Note 3:
This no-op GPE handler invocation is rare because the time between
the IRQ arrival and the acpi_ec_gpe_handler() invocation is less than
the timeout value waited in ec_poll(). So most of the no-op GPE
handler invocations are caused by the reason described in issue 1.
3. There is a race condition due to ACPICA internal GPE clearing logic in
acpi_enable_gpe():
During runtime, acpi_enable_gpe() can be invoked by the EC storming
prevention code. When it is invoked, GPE may be flagged (STS=1).
=================================================================
(event pending A)
=================================================================
acpi_ev_gpe_dispatch()                          acpi_ec_transaction()
  EN=0
  STS=0
  acpi_ec_gpe_handler()
*****************************************************************
(event handling A)
    Lock(EC)
    advance_transaction()
      EC_SC read
      EC_SC handled
    Unlock(EC)
*****************************************************************
  EN=1 ?
                                                Lock(EC)
                                                Unlock(EC)
=================================================================
(event pending B)
=================================================================
                                                acpi_enable_gpe()
                                                  STS=0
                                                  EN=1
4. Issue 4:
We can see GPE loss on both silicon variation A and B platforms.
This is because:
a. event pending B: An event can arrive right before ACPICA's GPE
clearing performed in acpi_enable_gpe();
b. If the GPE is cleared when GPE is disabled, then EN=1 write in
acpi_enable_gpe() cannot trigger this GPE;
c. If no polling mechanism is implemented in the driver for this
event (for example, SCI_EVT), this event is lost due to no GPE
being triggered.
Note 4:
Currently we don't have this issue, but after we switch the EC
driver into ACPI_GPE_DISPATCH_RAW_HANDLER mode, we need to take care
of handling this because the EN=1 write in acpi_ev_gpe_dispatch()
will be abandoned.
There might be more race issues for the current GPE handler usages. This is
because the EC IRQ's enabling/disabling/checking/clearing/handling
operations should be locked by a single lock that is under the EC driver's
control to achieve the serialization. Which means we need to invoke GPE
APIs with EC driver's lock held and all ACPICA internal GPE operations
related to the GPE handler should be abandoned. Invoking GPE APIs inside of
the EC driver lock and bypassing ACPICA internal GPE operations requires
the ACPI_GPE_DISPATCH_RAW_HANDLER mode where the same lock used by the APIs
is released prior to invoking the handlers. Otherwise, we can see dead
locks due to circular locking dependencies (see Reference below).
This patch then switches the EC driver into the
ACPI_GPE_DISPATCH_RAW_HANDLER mode so that it can perform correct GPE
operations using the GPE APIs:
1. Bypasses EN modifications performed in acpi_ev_gpe_dispatch() by
using acpi_install_gpe_raw_handler() and invoking all GPE APIs with EC
spin lock held. This can fix issue 1 as it makes a non frequent GPE
enabling/disabling environment.
2. Bypasses STS clearing performed in acpi_enable_gpe() by replacing
acpi_enable_gpe()/acpi_disable_gpe() with acpi_set_gpe(). This can fix
issue 4. And this can also help to fix issue 1 as it makes a no sudden
GPE clearing environment when GPE is frequently enabled/disabled.
3. Ensures STS acknowledged before handling by invoking acpi_clear_gpe()
in advance_transaction(). This can finally fix issue 1 even in a
frequent GPE enabling/disabling environment. And this can also finally
fix issue 3 when issue 2 is fixed.
Note 3:
GPE clearing is edge triggered W1C, which means we can clear it right
before handling it. Since all EC GPE indications are handled in
advance_transaction() by previous commits, we can now move GPE clearing
into it to implement the correct GPE clearing.
Note 4:
We can use acpi_set_gpe(), which is not shared-GPE safe, instead of
acpi_enable_gpe()/acpi_disable_gpe() because EC GPE is not shared by
other hardware, which is mentioned in the ACPI specification 5.0, 12.6
Interrupt Model: "OSPM driver treats this as an edge event (the EC SCI
cannot be shared)". So we can stop using shared GPE safer APIs
acpi_enable_gpe()/acpi_disable_gpe() in the EC driver. Otherwise
cleanups need to be made in acpi_ev_enable_gpe() to bypass the GPE
clearing logic before keeping acpi_enable_gpe().
This patch also invokes advance_transaction() when GPE is re-enabled in the
task context which:
1. Ensures EN=1 can trigger GPE by checking and handling EC status register
right after EN=1 writes. This can fix issue 2.
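The re-enabling step described above can be sketched as a user-space model. It assumes that an EN=1 write alone may not raise a GPE for an already latched STS bit, which is why the status is checked by hand. struct sim_gpe_regs, sim_advance_transaction() and sim_reenable_gpe() are hypothetical names for illustration and do not reproduce the driver's actual functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register model; the field names are illustrative only. */
struct sim_gpe_regs {
	bool lock_held;   /* stands in for the EC driver lock */
	bool en;          /* GPE enable bit (EN) */
	bool sts;         /* GPE status bit (STS) */
	bool ec_sc_event; /* EC_SC reports something to handle */
	int handled;      /* number of indications handled so far */
};

/* Acknowledge STS first, then read and handle EC_SC. */
static void sim_advance_transaction(struct sim_gpe_regs *r)
{
	assert(r->lock_held); /* must run under the EC lock */
	if (r->sts)
		r->sts = false;
	if (r->ec_sc_event) {
		r->ec_sc_event = false;
		r->handled++;
	}
}

/* Task-context re-enable: write EN=1, then handle a GPE that is already raised. */
static void sim_reenable_gpe(struct sim_gpe_regs *r)
{
	r->lock_held = true;   /* Lock(EC) */
	r->en = true;          /* EN=1 */
	if (r->sts)
		sim_advance_transaction(r);
	r->lock_held = false;  /* Unlock(EC) */
}
```

If STS was latched while the GPE was disabled, the indication is handled immediately after the EN=1 write instead of waiting for an interrupt that may never come.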
After applying this patch, without considering frequent GPE enabling:
=================================================================
(event pending A)
=================================================================
acpi_ec_gpe_handler() ec_poll()
*****************************************************************
(event handling A)
Lock(EC)
advance_transaction()
if STS==1
STS=0
EC_SC read
=================================================================
(event pending B)
=================================================================
EC_SC handled
Unlock(EC)
*****************************************************************
*****************************************************************
(event handling B)
Lock(EC)
advance_transaction()
if STS==1
STS=0
EC_SC read
=================================================================
(event pending C)
=================================================================
EC_SC handled
Unlock(EC)
*****************************************************************
The event pending for issue 1 (event pending B) can arrive as the next GPE
because of the previous IRQ context STS=0 write. If it is handled by
ec_poll() (event handling B), it is also acknowledged by ec_poll(), so the
event pending for issue 2 (event pending C) can properly arrive as the next
GPE after the task context STS=0 write. So no GPE will be lost or left
untriggered because of GPE clearing performed at the wrong position. And
since all GPE handling is performed after a locked GPE status check, we
should rarely see no-op GPE handler invocations due to issues 1 and 3. We
may still see no-op GPE handler invocations due to "Note 1", but as that is
inevitable, it needn't be fixed.
After applying this patch, with frequent GPE enabling considered:
=================================================================
(event pending A)
=================================================================
acpi_ec_gpe_handler() acpi_ec_transaction()
*****************************************************************
(event handling A)
Lock(EC)
advance_transaction()
if STS==1
STS=0
EC_SC read
=================================================================
(event pending B)
=================================================================
EC_SC handled
Unlock(EC)
*****************************************************************
*****************************************************************
(event handling B)
Lock(EC)
EN=1
if STS==1
advance_transaction()
if STS==1
STS=0
EC_SC read
=================================================================
(event pending C)
=================================================================
EC_SC handled
Unlock(EC)
*****************************************************************
The event pending for issue 2 can be manually handled by
advance_transaction(). After the STS=0 write performed in the manually
triggered advance_transaction(), the GPE can always arrive. So no GPE will
be lost due to frequent GPE disabling/enabling performed in the driver, as
in issue 4.
Note 5:
Ideally, when an EN=1 write occurs, an IRQ thread should be woken up to
handle the GPE if the GPE was raised. But this requires the IRQ thread to
contain the poller code for all EC GPE indications, while currently some of
the indications are handled in the user tasks. It is then very hard for the
code to determine whether a user task should be invoked or the poller work
item should be scheduled. So for now we have to invoke advance_transaction()
directly, which leaves us with a restriction on GPE re-enabling:
it must be performed in the task context to avoid starving the GPEs.
In conclusion, the EC GPE is always handled serially after
deploying the raw GPE handler mode:
Lock(EC)
if (STS==1)
STS=0
EC_SC read
EC_SC handled
Unlock(EC)
The EC driver specific lock is responsible for serializing the EC GPE
handling processes, so that the EC can handle its GPE from both IRQ and task
contexts and the next IRQ is guaranteed to arrive after this process.
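The serialized sequence can be sketched as a single entry point shared by both contexts. This is a user-space illustration under stated assumptions: struct sim_ec, enum sim_ctx and the sim_* helpers are hypothetical, and the lock is a plain flag that only asserts the handler is never entered while held. It does not model real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical caller contexts. */
enum sim_ctx { SIM_CTX_IRQ, SIM_CTX_TASK };

/* Hypothetical EC model; the lock is a flag, not a real spin lock. */
struct sim_ec {
	bool locked;
	bool sts;          /* GPE status bit (STS) */
	int ec_sc;         /* indications currently visible in EC_SC */
	int handled_irq;   /* handling passes completed in IRQ context */
	int handled_task;  /* handling passes completed in task context */
};

static void sim_lock(struct sim_ec *ec)
{
	assert(!ec->locked); /* re-entry would mean the handling is not serialized */
	ec->locked = true;
}

static void sim_unlock(struct sim_ec *ec)
{
	assert(ec->locked);
	ec->locked = false;
}

/* The EC raises an indication and the edge latches STS. */
static void sim_ec_raise(struct sim_ec *ec)
{
	ec->ec_sc++;
	ec->sts = true;
}

/* Lock(EC); if (STS==1) { STS=0; EC_SC read; EC_SC handled; } Unlock(EC) */
static bool sim_handle_gpe(struct sim_ec *ec, enum sim_ctx ctx)
{
	bool handled = false;

	sim_lock(ec);
	if (ec->sts) {
		ec->sts = false;      /* STS=0 */
		if (ec->ec_sc > 0) {  /* EC_SC read */
			ec->ec_sc = 0;    /* EC_SC handled */
			if (ctx == SIM_CTX_IRQ)
				ec->handled_irq++;
			else
				ec->handled_task++;
			handled = true;
		}
	}
	sim_unlock(ec);
	return handled;
}
```

Because the status check happens under the lock, a caller that finds STS==0 returns without doing any work, whichever context it runs in.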
Note 6:
We have many EC_FLAGS_MSI quirk users in the current driver. They all seem
to suffer from unexpected loss of the GPE triggering source, which was
falsely root-caused as a timing issue. Since the EC communication protocol
already defines flow control, timing shouldn't be the root cause, and
this fix might be fixing the root cause of the old bugs.
Link: https://lkml.org/lkml/2014/11/4/974
Link: https://lkml.org/lkml/2014/11/18/316
Link: https://www.spinics.net/lists/linux-acpi/msg54340.html
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
||
|
|
550b3aac5a |
ACPI / EC: Cleanup QR_EC related code
The QR_EC related code is duplicated in several places. This patch merges it into acpi_ec_query(), which invokes acpi_ec_transaction(), where the EC mutex and the global lock are already held. After doing so, the query handler traversal still needs to be locked by the EC mutex after invoking acpi_ec_transaction().
Note that EC event handling is sequential. We fetch one event from the firmware event queue and process it, repeating until 0x00 or an error is returned. So we don't need to hold the mutex for the whole acpi_ec_clear() process to determine whether we should continue to drain. For the same reason, we don't need to hold the mutex for the whole procedure from the QR_EC transaction to the query handler traversal.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |