The patch removes 455 occurrences of FAIL_ON_WARNINGS from moz.build files, and
adds 78 instances of ALLOW_COMPILER_WARNINGS. About half of those 78 are in
code we control and which should be removable with a little effort.
When the parent process has trouble deserializing an IPC message from the content
process, it originally killed that content process. This doesn't result in us
creating a crash report (and doing so is difficult if the FatalError is hit
off main thread). We now crash the parent process if we hit such a FatalError
in the parent process. This will hopefully give us an idea of how frequent
these FatalErrors are, since up until now we've been getting no crash data
for them.
This is motivated by three separate but related problems:
1. Our concept of recursion depth is broken for things that run from AfterProcessNextEvent observers (e.g. Promises). We decrement the recursionDepth counter before firing observers, so a Promise callback running at the lowest event loop depth has a recursion depth of 0 (whereas a regular nsIRunnable would be 1). This is a problem because it's impossible to distinguish a Promise running after a sync XHR's onreadystatechange handler from a top-level event (since the former runs with depth 2 - 1 = 1, and the latter runs with just 1).
2. The nsIThreadObserver mechanism that is used by a lot of code to run "after" the current event is a poor fit for anything that runs script. First, the order the observers fire in is the order they were added, not anything fixed by spec. Additionally, running script can cause the event loop to spin, which is a big source of pain here (bholley has some nasty bug caused by this).
3. We run Promises from different points in the code for workers and main thread. The latter runs from XPConnect's nsIThreadObserver callbacks, while the former runs from a hardcoded call to run Promises in the worker event loop. What workers do is particularly problematic because it means we can't get the right recursion depth no matter what we do to nsThread.
The solve this, this patch does the following:
1. Consolidate some handling of microtasks and all handling of stable state from appshell and WorkerPrivate into CycleCollectedJSRuntime.
2. Make the recursionDepth counter only available to CycleCollectedJSRuntime (and its consumers) and remove it from the nsIThreadInternal and nsIThreadObserver APIs.
3. Adjust the recursionDepth counter so that microtasks run with the recursionDepth of the task they are associated with.
4. Introduce the concept of metastable state to replace appshell's RunBeforeNextEvent. Metastable state is reached after every microtask or task is completed. This provides the semantics that bent and I want for IndexedDB, where transactions autocommit at the end of a microtask and do not "spill" from one microtask into a subsequent microtask. This differs from appshell's RunBeforeNextEvent in two ways:
a) It fires between microtasks, which was the motivation for starting this.
b) It no longer ensures that we're at the same event loop depth in the native event queue. bent decided we don't care about this.
5. Reorder stable state to happen after microtasks such as Promises, per HTML. Right now we call the regular thread observers, including appshell, before the main thread observer (XPConnect), so stable state tasks happen before microtasks.
This allows us to send a sync fork request to the Nuwa process when we need one but there is no
spare process available. After an app is launched, the request to fork a spare process is still
handled asynchronously.
This also moves the initialization of the sandbox TargetServices to earlier in
plugin-container.cpp content_process_main, because it needs to happen before
xul.dll loads.
The current situation looks like this: Firefox launches the plugin-container
with two environment variables set:
LD_LIBRARY_PATH=$FIREFOX_DIR:$LD_LIBRARY_PATH
LD_PRELOAD=$FIREFOX_DIR/libmozgtk2.so:$LD_PRELOAD
libxul.so has a dependency on libmozgtk.so (without "2"), but libmozgtk2.so
has a SONAME of libmozgtk.so, so ld.so recognizes libmozgtk2.so as a
dependency of libxul.so, and uses it instead of the actual libmozgtk.so,
making the plugin-container use Gtk+2 instead of Gtk+3 to load Gtk+2 plugins.
Now, ASan sets things up in shared libraries such that they needs a symbol
from the executable binary. So in the case of plugin-container, the
plugin-container executable itself contains some ASan symbols such as
__asan_init_v3. libmozgtk2.so, OTOH, contains an undefined weak reference to
that symbol, like all other Firefox shared libraries.
Since libmozgtk2.so is LD_PRELOADed, it is loaded _before_ the
plugin-container executable, and __asan_init_v3 can't be resolved.
Disabling ASan for libmozgtk2.so would be a possibility, but the build system
doesn't really know how to do that, and filtering out -fsanitize=address
can be fragile.
The alternative possibility, implemented here, is to change the library
loading strategy, renaming libmozgtk2.so to gtk2/libmozgtk.so, and setting
the following environment variable when Firefox launches the plugin-container:
LD_LIBRARY_PATH=$FIREFOX_DIR/gtk2:$FIREFOX_DIR:$LD_LIBRARY_PATH
There are a variety of ways that the parent and child process ensure that
the child process exits quickly in opt builds, but for AddressSanitizer
builds we want to let the child process to run to completion, so that we
can get a LeakSanitizer report.
This requires adding some addition LSan suppressions, because running
LSan in child processes detects some new leaks.
The bulk of this commit was generated by running:
run-clang-tidy.py \
-checks='-*,llvm-namespace-comment' \
-header-filter=^/.../mozilla-central/.* \
-fix