NO BUG - Update build system docs for Python and moz.build files

DONTBUILD (NPOTB)
This commit is contained in:
Gregory Szorc 2013-09-30 14:32:07 +02:00
parent d12c994044
commit 58fd1367e3
5 changed files with 334 additions and 182 deletions


@ -22,62 +22,67 @@ harmony to build the source tree. We begin with a graphic overview.
Phase 1: Configuration
======================
Phase 1 centers around the ``configure`` script, which is a bash shell script.
The file is generated from a file called ``configure.in`` which is written in M4
and processed using Autoconf 2.13 to create the final configure script.
You don't have to worry about how you obtain a ``configure`` file: the build
system does this for you.
The primary job of ``configure`` is to determine characteristics of the system
and compiler, apply options passed into it, and validate everything looks OK to
build. The primary output of the ``configure`` script is an executable file
in the object directory called ``config.status``. ``configure`` also produces
some additional files (like ``autoconf.mk``). However, the most important file
in terms of architecture is ``config.status``.
The existence of a ``config.status`` file may be familiar to those who have worked
with Autoconf before. However, Mozilla's ``config.status`` is different from almost
any other ``config.status`` you've ever seen: it's written in Python! Instead of
having our ``configure`` script produce a shell script, we have it generating
Python.
Now is as good a time as any to mention that Python is prevalent in our build
system. If we need to write code for the build system, we do it in Python.
That's just how we roll. For more, see :ref:`python`.
``config.status`` contains 2 parts: data structures representing the output of
``configure`` and a command-line interface for preparing/configuring/generating
an appropriate build backend. (A build backend is merely a tool used to build
the tree - like GNU Make or Tup). These data structures essentially describe
the current state of the system and what the existing build configuration looks
like. For example, it defines which compiler to use, how to invoke it, which
application features are enabled, etc. You are encouraged to open up
``config.status`` to have a look for yourself!
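As a rough illustration only (the real file is produced by ``configure``; every name below is hypothetical), the generated script pairs a data section with a small command-line interface:

```python
# Hypothetical sketch of the overall shape of a generated config.status.
# All names and values below are illustrative, not the actual contents.
import argparse

# Data structures describing the configured state of the system.
substs = {
    'CC': '/usr/bin/clang',
    'CFLAGS': ['-O2', '-g'],
}
defines = {'XP_UNIX': True}

def config_status(backend='RecursiveMake'):
    """Pretend to generate the requested build backend from the data above."""
    return 'generated %s backend for CC=%s' % (backend, substs['CC'])

def main(argv):
    # The command-line interface half of config.status.
    parser = argparse.ArgumentParser(description='Generate a build backend')
    parser.add_argument('--backend', default='RecursiveMake')
    args = parser.parse_args(argv)
    return config_status(args.backend)
```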
Once we have emitted a ``config.status`` file, we pass into the realm of
phase 2.
Phase 2: Build Backend Preparation and the Build Definition
===========================================================
Once ``configure`` has determined what the current build configuration is,
we need to apply this to the source tree so we can actually build.
What essentially happens is the automatically-produced ``config.status`` Python
script is executed as soon as ``configure`` has generated it. ``config.status``
is charged with the task of telling a tool how to build the tree. To do this,
``config.status`` must first scan the build system definition.
The build system definition consists of various ``moz.build`` files in the tree.
There is roughly one ``moz.build`` file per directory or per set of related directories.
Each ``moz.build`` file defines how its part of the build config works. For
example, it says *I want these C++ files compiled* or *look for additional
information in these directories.* ``config.status`` starts with the ``moz.build``
file from the root directory and then descends into referenced ``moz.build``
files by following ``DIRS`` variables or similar.
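A minimal hypothetical ``moz.build`` file might look like this. In the real sandbox the ``UPPERCASE`` names are pre-declared and type-checked, so this only sketches the style:

```python
# A hypothetical moz.build file (names are illustrative).

# Descend into these child directories, each with its own moz.build.
DIRS = ['base', 'components']

# Compile these C++ source files as part of this directory's build.
CPP_SOURCES = [
    'nsFoo.cpp',
    'nsBar.cpp',
]
```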
As the ``moz.build`` files are read, data structures describing the overall
build system definition are emitted. These data structures are then fed into a
build backend, which performs actions such as writing out files to
be read by a build tool; e.g. a ``make`` backend will write a
``Makefile``.
When ``config.status`` runs, you'll see the following output::
Reticulating splines...
Finished reading 1096 moz.build files into 1276 descriptors in 2.40s
@ -85,14 +90,18 @@ When config.status runs, you'll see the following output::
2188 total backend files. 0 created; 1 updated; 2187 unchanged
Total wall time: 5.03s; CPU time: 3.79s; Efficiency: 75%
What this is saying is that a total of *1096* ``moz.build`` files were read.
Altogether, *1276* data structures describing the build configuration were
derived from them. It took *2.40s* wall time to just read these files and
produce the data structures. The *1276* data structures were fed into the
build backend which then determined it had to manage *2188* files derived
from those data structures. Most of them already existed and didn't need
to be changed. However, *1* was updated as a result of the new configuration.
The whole process took *5.03s*, although only *3.79s* was
CPU time. That likely means we spent roughly *25%* of the time waiting on
I/O.
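For the record, the *Efficiency* figure is just CPU time as a fraction of wall time:

```python
# Efficiency = CPU time / wall time, using the numbers above.
wall_time = 5.03
cpu_time = 3.79
efficiency = cpu_time / wall_time  # roughly 0.75
print('Efficiency: %.0f%%' % (100 * efficiency))  # prints "Efficiency: 75%"
```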
For more on how ``moz.build`` files work, see :ref:`mozbuild-files`.
Phase 3: Invocation of the Build Backend
========================================


@ -17,10 +17,12 @@ Important Concepts
build-overview
Mozconfig Files <mozconfigs>
mozbuild-files
Profile Guided Optimization <pgo>
slow
environment-variables
build-targets
python
test_manifests
mozbuild
@ -33,7 +35,6 @@ Mozilla build system.
:maxdepth: 1
mozbuild/index
mozbuild/frontend
mozbuild/dumbmake


@ -0,0 +1,116 @@
.. _mozbuild-files:
===============
moz.build Files
===============
``moz.build`` files are the mechanism by which tree metadata (notably
the build configuration) is defined.
Directories in the tree contain ``moz.build`` files which declare
functionality for their respective part of the tree. This includes
things such as the list of C++ files to compile, where to find tests,
etc.
``moz.build`` files are actually Python scripts. However, their
execution is governed by special rules. This is explained below.
moz.build Python Sandbox
========================
As mentioned above, ``moz.build`` files are Python scripts. However,
they are executed in a special Python *sandbox* that significantly
changes and limits the execution environment. The environment is so
different, it's doubtful most ``moz.build`` files would execute without
error if executed by a vanilla Python interpreter (e.g. ``python
moz.build``).
The following properties make execution of ``moz.build`` files special:
1. The execution environment exposes a limited subset of Python.
2. There is a special set of global symbols and an enforced naming
convention of symbols.
The limited subset of Python is actually an extremely limited subset.
Only a few symbols from ``__builtins__`` are exposed. These include
``True``, ``False``, and ``None``. Global functions like ``import``,
``print``, and ``open`` aren't available. Without these, ``moz.build``
files can do very little. *This is by design*.
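The general technique (a sketch of the idea, not the actual ``MozbuildSandbox`` implementation) resembles executing the script against a hand-built globals dictionary:

```python
# Sketch of executing a script with a restricted set of globals.
# This mimics the technique, not the real MozbuildSandbox code.
ALLOWED_BUILTINS = {'True': True, 'False': False, 'None': None}

def exec_restricted(source):
    # Supplying our own __builtins__ hides open, print, __import__, etc.
    sandbox_globals = {'__builtins__': ALLOWED_BUILTINS}
    exec(source, sandbox_globals)
    return sandbox_globals

result = exec_restricted("DIRS = ['dom', 'layout']")
```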
The execution sandbox treats all ``UPPERCASE`` variables specially. Any
``UPPERCASE`` variable must be known to the sandbox before the script
executes. Any attempt to read or write to an unknown ``UPPERCASE``
variable will result in an exception being raised. Furthermore, the
types of all ``UPPERCASE`` variables are strictly enforced. Attempts to
assign an incompatible type to an ``UPPERCASE`` variable will result in
an exception being raised.
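A minimal sketch of this enforcement, with made-up variable names, might look like:

```python
# Sketch of strict UPPERCASE variable handling; names are illustrative.
VARIABLES = {
    'DIRS': list,          # known variable name -> required type
    'CPP_SOURCES': list,
}

class StrictGlobals(dict):
    """Reject unknown UPPERCASE names and wrong types, like the sandbox."""
    def __setitem__(self, name, value):
        if name.isupper():
            if name not in VARIABLES:
                raise KeyError('unknown UPPERCASE variable: %s' % name)
            if not isinstance(value, VARIABLES[name]):
                raise TypeError('%s must be a %s'
                                % (name, VARIABLES[name].__name__))
        dict.__setitem__(self, name, value)

g = StrictGlobals()
g['DIRS'] = ['dom']   # OK: known name, correct type
g['helper'] = 42      # lowercase names are unrestricted
```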
The strictness of behavior with ``UPPERCASE`` variables is a very
intentional design decision. By ensuring strict behavior, any operation
involving an ``UPPERCASE`` variable is guaranteed to have well-defined
side-effects. Previously, when the build configuration was defined in
``Makefiles``, assignments to variables that did nothing would go
unnoticed. ``moz.build`` files fix this problem by eliminating the
potential for false promises.
In the sandbox, all ``UPPERCASE`` variables are globals and all
non-``UPPERCASE`` variables are locals. After a ``moz.build`` file has
completed execution, only the globals are used to retrieve state.
The set of variables and functions available to the Python sandbox is
defined by the :py:mod:`mozbuild.frontend.sandbox_symbols` module. The
data structures in this module are consumed by the
:py:class:`mozbuild.frontend.reader.MozbuildSandbox` class to construct
the sandbox. There are tests to ensure that the symbols exposed
to an empty sandbox are all defined in the ``sandbox_symbols`` module.
This module also contains documentation for each symbol, so nothing can
sneak into the sandbox without being explicitly defined and documented.
Reading and Traversing moz.build Files
======================================
The process responsible for reading ``moz.build`` files simply starts at
a root ``moz.build`` file, processes it, emits the globals namespace to
a consumer, and then proceeds to process additional referenced
``moz.build`` files from the original file. The consumer then examines
the globals/``UPPERCASE`` variables set as part of execution and then
converts the data therein to Python class instances.
The executed Python sandbox is essentially represented as a dictionary
of all the special ``UPPERCASE`` variables populated during its
execution.
The code for reading ``moz.build`` files lives in
:py:mod:`mozbuild.frontend.reader`. The evaluated Python sandboxes are
passed into :py:mod:`mozbuild.frontend.emitter`, which converts them to
classes defined in :py:mod:`mozbuild.frontend.data`. Each class in this
module defines a domain-specific component of tree metadata; e.g. there
will be separate classes that represent a JavaScript file vs a compiled
C++ file or test manifests. This means downstream consumers of this data
can filter on class types to only consume what they are interested in.
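Assuming hypothetical class names (the real ones live in :py:mod:`mozbuild.frontend.data`), a consumer filtering by type might look like:

```python
# Sketch of filtering emitted tree-metadata objects by class type.
# Class names here are illustrative, not the real frontend.data classes.
class TreeMetadata(object):
    pass

class CompiledSources(TreeMetadata):
    def __init__(self, files):
        self.files = files

class TestManifest(TreeMetadata):
    def __init__(self, path):
        self.path = path

emitted = [CompiledSources(['foo.cpp']), TestManifest('test.ini')]

# A backend that only cares about compilation filters on class type.
to_compile = [o for o in emitted if isinstance(o, CompiledSources)]
```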
There is no well-defined mapping between ``moz.build`` file instances
and the number of :py:mod:`mozbuild.frontend.data` classes derived from
each. Depending on the content of the ``moz.build`` file, there may be 1
object derived or 100.
The purpose of the ``emitter`` layer between low-level sandbox execution
and metadata representation is to facilitate a unified normalization and
verification step. There are multiple downstream consumers of the
``moz.build``-derived data and many will perform the same actions. This
logic can be complicated, so we have a component dedicated to it.
Other Notes
===========
:py:class:`mozbuild.frontend.reader.BuildReader` and
:py:class:`mozbuild.frontend.emitter.TreeMetadataEmitter` have a
stream-based API courtesy of generators. When you hook them up properly,
the :py:mod:`mozbuild.frontend.data` classes are emitted before all
``moz.build`` files have been read. This means that downstream errors
are raised soon after sandbox execution.
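The streaming behavior can be sketched with two chained generators (all names here are illustrative, not the real API):

```python
# Sketch of the generator-based pipeline between reader and emitter.
def read_mozbuild_files(paths):
    for path in paths:
        # In reality each file is executed in a sandbox; here we fake it.
        yield {'path': path, 'DIRS': []}

def emit(sandboxes):
    for sandbox in sandboxes:
        # A data object is emitted as soon as one sandbox finishes,
        # without waiting for the remaining files to be read.
        yield 'metadata for %s' % sandbox['path']

pipeline = emit(read_mozbuild_files(['moz.build', 'dom/moz.build']))
first = next(pipeline)  # available before the second file is "read"
```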
Lots of the code for evaluating Python sandboxes is applicable to
non-Mozilla systems. In theory, it could be extracted into a standalone
and generic package. However, until there is a need, there will
likely be some tightly coupled bits.


@ -1,137 +0,0 @@
=================
mozbuild.frontend
=================
The mozbuild.frontend package is of sufficient importance and complexity
to warrant its own README file. If you are looking for documentation on
how the build system gets started, you've come to the right place.
Overview
========
Tree metadata (including the build system) is defined by a collection of
files in the source tree called *mozbuild* files. These typically are files
named *moz.build*. But, the actual name can vary.
Each *mozbuild* file defines basic metadata about the part of the tree
(typically directory scope) it resides in. This includes build system
configuration, such as the list of C++ files to compile or headers to install
or libraries to link together.
*mozbuild* files are actually Python scripts. However, their execution
is governed by special rules. This will be explained later.
Once a *mozbuild* file has executed, it is converted into a set of static
data structures.
The set of all data structures from all relevant *mozbuild* files
constitute all of the metadata from the tree.
How *mozbuild* Files Work
=========================
As stated above, *mozbuild* files are actually Python scripts. However,
their behavior is very different from what you would expect if you executed
the file using the standard Python interpreter from the command line.
There are two properties that make execution of *mozbuild* files special:
1. They are evaluated in a sandbox which exposes a limited subset of Python
2. There is a special set of global variables which hold the output from
execution.
The limited subset of Python is actually an extremely limited subset.
Only a few built-ins are exposed. These include *True*, *False*, and
*None*. Global functions like *import*, *print*, and *open* aren't defined.
Without these, *mozbuild* files can do very little. This is by design.
The side-effects of the execution of a *mozbuild* file are used to define
the build configuration. Specifically, variables set during the execution
of a *mozbuild* file are examined and their values are used to populate
data structures.
The enforced convention is that all UPPERCASE names inside a sandbox are
reserved and it is the value of these variables post-execution that is
examined. Furthermore, the set of allowed UPPERCASE variable names and
their types is statically defined. If you attempt to reference or assign
to an UPPERCASE variable name that isn't known to the build system or
attempt to assign a value of the wrong type (e.g. a string when it wants a
list), an error will be raised during execution of the *mozbuild* file.
This strictness is to ensure that assignment to all UPPERCASE variables
actually does something. If things weren't this way, *mozbuild* files
might think they were doing something but in reality wouldn't be. We don't
want to create false promises, so we validate behavior strictly.
If a variable is not UPPERCASE, you can do anything you want with it,
provided it isn't a function or other built-in. In other words, normal
Python rules apply.
All of the logic for loading and evaluating *mozbuild* files is in the
*reader* module. Of specific interest is the *MozbuildSandbox* class. The
*BuildReader* class is also important, as it is in charge of
instantiating *MozbuildSandbox* instances and traversing a tree of linked
*mozbuild* files. Unless you are a core component of the build system,
*BuildReader* is probably the only class you care about in this module.
The set of variables and functions *exported* to the sandbox is defined by
the *sandbox_symbols* module. These data structures are actually used to
populate MozbuildSandbox instances. And, there are tests to ensure that the
sandbox doesn't add new symbols without those symbols being added to the
module. And, since the module contains documentation, this ensures the
documentation is up to date (at least in terms of symbol membership).
How Sandboxes are Converted into Data Structures
================================================
The output of a *mozbuild* file execution is essentially a dict of all
the special UPPERCASE variables populated during its execution. While these
dicts are data structures, they aren't the final data structures that
represent the build configuration.
We feed the *mozbuild* execution output (actually *reader.MozbuildSandbox*
instances) into a *TreeMetadataEmitter* class instance. This class is
defined in the *emitter* module. *TreeMetadataEmitter* converts the
*MozbuildSandbox* instances into instances of the *TreeMetadata*-derived
classes from the *data* module.
All the classes in the *data* module define a domain-specific
component of the tree metadata, including build configuration. File compilation
and IDL generation are separate classes, for example. The only thing these
classes have in common is that they inherit from *TreeMetadata*, which is
merely an abstract base class.
The set of all emitted *TreeMetadata* instances (converted from executed
*mozbuild* files) constitutes the aggregate tree metadata. This is
the authoritative definition of the build system, etc. and is what's used by
all downstream consumers, such as build backends. There is no monolithic
class or data structure. Instead, the tree metadata is modeled as a collection
of *TreeMetadata* instances.
There is no defined mapping between the number of
*MozbuildSandbox*/*moz.build* instances and *TreeMetadata* instances.
Some *mozbuild* files will emit only 1 *TreeMetadata* instance. Some
will emit 7. Some may even emit 0!
The purpose of this *emitter* layer between the raw *mozbuild* execution
result and *TreeMetadata* is to facilitate additional normalization and
verification of the output. There are multiple downstream consumers of
this data and there is common functionality shared between them. An
abstraction layer that provides high-level filtering is a useful feature.
Thus *TreeMetadataEmitter* exists.
Other Notes
===========
*reader.BuildReader* and *emitter.TreeMetadataEmitter* have a nice
stream-based API courtesy of generators. When you hook them up properly,
*TreeMetadata* instances can be consumed before all *mozbuild* files have
been read. This means that errors down the pipe can trigger before all
upstream tasks (such as executing and converting) are complete. This should
reduce the turnaround time in the event of errors. This likely translates to
a more rapid pace for implementing backends, which require lots of iterative
runs through the entire system.
Lots of code in this sub-module is applicable to other systems, not just
Mozilla's. However, some of the code is tightly coupled. If there is a will
to extract the generic bits for re-use in other projects, that can and should
be done.

build/docs/python.rst Normal file

@ -0,0 +1,163 @@
.. _python:
===========================
Python and the Build System
===========================
The Python programming language is used significantly in the build
system. If we need to write code for the build system or for a tool
related to the build system, Python is typically the first choice.
Python Requirements
===================
The tree requires Python 2.7.3 or greater but not Python 3 to build.
All Python packages not in the Python distribution are included in the
source tree. So all you should need is a vanilla Python install and you
should be good to go.
Only CPython (the Python distribution available from www.python.org) is
supported.
We require Python 2.7.3 (and not say 2.7.2) to build because Python
2.7.3 contains numerous bug fixes, especially around the area of Unicode
handling. These bug fixes are extremely annoying and have to be worked
around. The build maintainers were tired of doing this, so the minimum
version requirement was upped (bug 870420).
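Expressed as a check (a sketch, not actual build system code), the requirement is:

```python
import sys

def python_is_supported(version_info):
    # Require at least Python 2.7.3, but not the Python 3 series.
    return (2, 7, 3) <= tuple(version_info[:3]) < (3, 0)

supported = python_is_supported(sys.version_info)
```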
We intend to eventually support Python 3. This will come by way of dual
2.7/3.x compatibility because a single flag day conversion to 3.x will
be too cumbersome given the amount of Python that would need to be converted.
We will not know which 3.x minor release we are targeting until this
effort is underway. This is tracked in bug 636155.
Compiled Python Packages
========================
There are some features of the build that rely on compiled Python packages
(packages containing C source). These features are currently all
optional because not every system contains the Python development
headers required to build these extensions.
We recommend you have the Python development headers installed (``mach
bootstrap`` should do this for you) so you can take advantage of these
features.
Issues with OS X System Python
==============================
The Python that ships with OS X has historically been littered with
subtle bugs and suboptimalities. Furthermore, OS X up through 10.8 doesn't
ship with Python 2.7.3 (10.8 ships with 2.7.2).
OS X 10.8 and below users will be required to install a new Python
distribution. This may not be necessary for OS X 10.9+. However, we
still recommend installing a separate Python because of the history with
OS X's system Python issues.
We recommend installing Python through Homebrew or MacPorts. If you run
``mach bootstrap``, this should be done for you.
Virtualenvs
===========
The build system relies heavily on
`virtualenvs <http://www.virtualenv.org/en/latest/>`_. Virtualenvs are
standalone and isolated Python environments. The problem a virtualenv
solves is that of dependencies across multiple Python components. If two
components on a system relied on different versions of a package, there
could be a conflict. Instead of managing multiple versions of a package
simultaneously, Python and virtualenvs take the route that it is easier
to just keep them separate so there is no potential for conflicts.
Very early in the build process, a virtualenv is created inside the
:term:`object directory`. The virtualenv is configured such that it can
find all the Python packages in the source tree. The code for this lives
in :py:mod:`mozbuild.virtualenv`.
Deficiencies
------------
There are numerous deficiencies with the way virtualenvs are handled in
the build system.
* mach reinvents the virtualenv.
There is code in ``build/mach_bootstrap.py`` that configures ``sys.path``
much the same way the virtualenv does. There are various bugs tracking
this. However, no clear solution has yet been devised. It's not a huge
problem and thus not a huge priority.
* They aren't preserved across copies and packaging.
If you attempt to copy an entire tree from one machine to another or
from one directory to another, chances are the virtualenv will fall
apart. It would be nice if we could preserve it somehow. Instead of
actually solving portable virtualenvs, all we really need to solve is
encapsulating the logic for populating the virtualenv along with all
dependent files in the appropriate place.
* .pyc files written to source directory.
We rely heavily on ``.pth`` files in our virtualenv. A ``.pth`` file
is a special file that contains a list of paths. Python will take the
set of listed paths encountered in ``.pth`` files and add them to
``sys.path``.
When Python compiles a ``.py`` file to bytecode, it writes out a
``.pyc`` file so it doesn't have to perform this compilation again.
It puts these ``.pyc`` files alongside the original ``.py`` file. Python
provides very little control for determining where these ``.pyc`` files
go, even in Python 3 (which offers custom importers).
With ``.pth`` files pointing back to directories in the source tree
and not the object directory, ``.pyc`` files are created in the source
tree. This is bad because when Python imports a module, it first looks
for a ``.pyc`` file before the ``.py`` file. If there is a ``.pyc``
file but no ``.py`` file, it will happily import the module. This
wreaks havoc during file moves, refactoring, etc.
There are various proposals for fixing this. See bug 795995.
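The ``.pth`` mechanism described above is easy to demonstrate (the directory names here are throwaway temporaries, not anything in the tree):

```python
# Demonstrate .pth processing: site.addsitedir() reads .pth files in a
# directory and appends each listed path to sys.path.
import os
import site
import sys
import tempfile

libdir = tempfile.mkdtemp()
extra = os.path.join(libdir, 'extra')
os.mkdir(extra)

# A .pth file is just a list of paths, one per line.
with open(os.path.join(libdir, 'mozilla.pth'), 'w') as f:
    f.write(extra + '\n')

site.addsitedir(libdir)  # processes mozilla.pth, extending sys.path
```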
Common Issues with Python
=========================
Upgrading your Python distribution breaks the virtualenv
--------------------------------------------------------
If you upgrade the Python distribution (e.g. from Python 2.7.3 to
2.7.5), chances are parts of the virtualenv will break.
This commonly manifests as a cryptic ``Cannot import XXX`` exception.
More often than not, the module being imported contains binary/compiled
components.
If you upgrade or reinstall your Python distribution, we recommend
clobbering your build.
Packages installed at the system level conflict with build system's
-------------------------------------------------------------------
It is common for people to install Python packages using ``sudo`` (e.g.
``sudo pip install psutil``) or with the system's package manager
(e.g. ``apt-get install python-mysql``).
A problem with this is that packages installed at the system level may
conflict with the package provided by the source tree. As of bug 907902
and changeset f18eae7c3b27 (September 16, 2013), this should no longer
be an issue since the virtualenv created as part of the build doesn't
add the system's ``site-packages`` directory to ``sys.path``. However,
poorly installed packages may still find a way to creep into the mix and
interfere with our virtualenv.
As a general principle, we recommend against using your system's package
manager or using ``sudo`` to install Python packages. Instead, create
virtualenvs and isolated Python environments for all of your Python
projects.
Python on $PATH is not appropriate
----------------------------------
Tools like ``mach`` will look for Python by performing ``/usr/bin/env
python`` or equivalent. Please be sure the appropriate Python 2.7.3+
path is on ``$PATH``. On OS X, this likely means you'll need to modify your
shell's init script to put something ahead of ``/usr/bin``.