NO BUG - Update build system docs for Python and moz.build files

DONTBUILD (NPOTB)
This commit is contained in:
Gregory Szorc 2013-09-30 14:32:07 +02:00
parent d12c994044
commit 58fd1367e3
5 changed files with 334 additions and 182 deletions


@ -22,62 +22,67 @@ harmony to build the source tree. We begin with a graphic overview.
Phase 1: Configuration
======================
Phase 1 centers around the ``configure`` script, which is a bash shell script.
The file is generated from a file called ``configure.in`` which is written in M4
and processed using Autoconf 2.13 to create the final configure script.
You don't have to worry about how you obtain a ``configure`` file: the build
system does this for you.
The primary job of ``configure`` is to determine characteristics of the system
and compiler, apply options passed into it, and validate everything looks OK to
build. The primary output of the ``configure`` script is an executable file
in the object directory called ``config.status``. ``configure`` also produces
some additional files (like ``autoconf.mk``). However, the most important file
in terms of architecture is ``config.status``.
The existence of a ``config.status`` file may be familiar to those who have worked
with Autoconf before. However, Mozilla's ``config.status`` is different from almost
any other ``config.status`` you've ever seen: it's written in Python! Instead of
having our ``configure`` script produce a shell script, we have it generating
Python.
Now is as good a time as any to mention that Python is prevalent in our build
system. If we need to write code for the build system, we do it in Python.
That's just how we roll. For more, see :ref:`python`.
``config.status`` contains 2 parts: data structures representing the output of
``configure`` and a command-line interface for preparing/configuring/generating
an appropriate build backend. (A build backend is merely a tool used to build
the tree - like GNU Make or Tup). These data structures essentially describe
the current state of the system and what the existing build configuration looks
like. For example, it defines which compiler to use, how to invoke it, which
application features are enabled, etc. You are encouraged to open up
``config.status`` to have a look for yourself!
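As a rough illustration only (the real file is produced by ``configure``; every name below is hypothetical), the generated script pairs a data section with a small command-line interface:

```python
# Hypothetical sketch of the overall shape of a generated config.status.
# All names and values below are illustrative, not the actual contents.
import argparse

# Data structures describing the configured state of the system.
substs = {
    'CC': '/usr/bin/clang',
    'CFLAGS': ['-O2', '-g'],
}
defines = {'XP_UNIX': True}

def config_status(backend='RecursiveMake'):
    """Pretend to generate the requested build backend from the data above."""
    return 'generated %s backend for CC=%s' % (backend, substs['CC'])

def main(argv):
    # The command-line interface half of config.status.
    parser = argparse.ArgumentParser(description='Generate a build backend')
    parser.add_argument('--backend', default='RecursiveMake')
    args = parser.parse_args(argv)
    return config_status(args.backend)
```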
Once we have emitted a ``config.status`` file, we pass into the realm of
phase 2.
Phase 2: Build Backend Preparation and the Build Definition
===========================================================
Once ``configure`` has determined what the current build configuration is,
we need to apply this to the source tree so we can actually build.
What essentially happens is the automatically-produced ``config.status`` Python
script is executed as soon as ``configure`` has generated it. ``config.status``
is charged with the task of telling a tool how to build the tree. To do this,
``config.status`` must first scan the build system definition.
The build system definition consists of various ``moz.build`` files in the tree.
There is roughly one ``moz.build`` file per directory or per set of related directories.
Each ``moz.build`` file defines how its part of the build config works. For
example, it says *I want these C++ files compiled* or *look for additional
information in these directories.* ``config.status`` starts with the ``moz.build``
file from the root directory and then descends into referenced ``moz.build``
files by following ``DIRS`` variables or similar.
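A minimal hypothetical ``moz.build`` file might look like this. In the real sandbox the ``UPPERCASE`` names are pre-declared and type-checked, so this only sketches the style:

```python
# A hypothetical moz.build file (names are illustrative).

# Descend into these child directories, each with its own moz.build.
DIRS = ['base', 'components']

# Compile these C++ source files as part of this directory's build.
CPP_SOURCES = [
    'nsFoo.cpp',
    'nsBar.cpp',
]
```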
As the ``moz.build`` files are read, data structures describing the overall
build system definition are emitted. These data structures are then fed into a
build backend, which performs actions such as writing out files to
be read by a build tool; e.g. a ``make`` backend will write a
``Makefile``.
When ``config.status`` runs, you'll see the following output::
Reticulating splines...
Finished reading 1096 moz.build files into 1276 descriptors in 2.40s
@ -85,14 +90,18 @@ When config.status runs, you'll see the following output::
2188 total backend files. 0 created; 1 updated; 2187 unchanged
Total wall time: 5.03s; CPU time: 3.79s; Efficiency: 75%
What this is saying is that a total of *1096* ``moz.build`` files were read.
Altogether, *1276* data structures describing the build configuration were
derived from them. It took *2.40s* wall time to just read these files and
produce the data structures. The *1276* data structures were fed into the
build backend which then determined it had to manage *2188* files derived
from those data structures. Most of them already existed and didn't need
to be changed. However, *1* was updated as a result of the new configuration.
The whole process took *5.03s*, although only *3.79s* was
CPU time. That likely means we spent roughly *25%* of the time waiting on
I/O.
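For the record, the *Efficiency* figure is just CPU time as a fraction of wall time:

```python
# Efficiency = CPU time / wall time, using the numbers above.
wall_time = 5.03
cpu_time = 3.79
efficiency = cpu_time / wall_time  # roughly 0.75
print('Efficiency: %.0f%%' % (100 * efficiency))  # prints "Efficiency: 75%"
```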
For more on how ``moz.build`` files work, see :ref:`mozbuild-files`.
Phase 3: Invocation of the Build Backend
========================================


@ -17,10 +17,12 @@ Important Concepts
build-overview
Mozconfig Files <mozconfigs>
mozbuild-files
Profile Guided Optimization <pgo>
slow
environment-variables
build-targets
python
test_manifests
mozbuild
@ -33,7 +35,6 @@ Mozilla build system.
:maxdepth: 1
mozbuild/index
mozbuild/frontend
mozbuild/dumbmake


@ -0,0 +1,116 @@
.. _mozbuild-files:
===============
moz.build Files
===============
``moz.build`` files are the mechanism by which tree metadata (notably
the build configuration) is defined.
Directories in the tree contain ``moz.build`` files which declare
functionality for their respective part of the tree. This includes
things such as the list of C++ files to compile, where to find tests,
etc.
``moz.build`` files are actually Python scripts. However, their
execution is governed by special rules. This is explained below.
moz.build Python Sandbox
========================
As mentioned above, ``moz.build`` files are Python scripts. However,
they are executed in a special Python *sandbox* that significantly
changes and limits the execution environment. The environment is so
different, it's doubtful most ``moz.build`` files would execute without
error if executed by a vanilla Python interpreter (e.g. ``python
moz.build``).
The following properties make execution of ``moz.build`` files special:
1. The execution environment exposes a limited subset of Python.
2. There is a special set of global symbols and an enforced naming
convention of symbols.
The limited subset of Python is actually an extremely limited subset.
Only a few symbols from ``__builtins__`` are exposed. These include
``True``, ``False``, and ``None``. Global functions like ``import``,
``print``, and ``open`` aren't available. Without these, ``moz.build``
files can do very little. *This is by design*.
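The general technique (a sketch of the idea, not the actual ``MozbuildSandbox`` implementation) resembles executing the script against a hand-built globals dictionary:

```python
# Sketch of executing a script with a restricted set of globals.
# This mimics the technique, not the real MozbuildSandbox code.
ALLOWED_BUILTINS = {'True': True, 'False': False, 'None': None}

def exec_restricted(source):
    # Supplying our own __builtins__ hides open, print, __import__, etc.
    sandbox_globals = {'__builtins__': ALLOWED_BUILTINS}
    exec(source, sandbox_globals)
    return sandbox_globals

result = exec_restricted("DIRS = ['dom', 'layout']")
```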
The execution sandbox treats all ``UPPERCASE`` variables specially. Any
``UPPERCASE`` variable must be known to the sandbox before the script
executes. Any attempt to read or write to an unknown ``UPPERCASE``
variable will result in an exception being raised. Furthermore, the
types of all ``UPPERCASE`` variables are strictly enforced. Attempts to
assign an incompatible type to an ``UPPERCASE`` variable will result in
an exception being raised.
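A minimal sketch of this enforcement, with made-up variable names, might look like:

```python
# Sketch of strict UPPERCASE variable handling; names are illustrative.
VARIABLES = {
    'DIRS': list,          # known variable name -> required type
    'CPP_SOURCES': list,
}

class StrictGlobals(dict):
    """Reject unknown UPPERCASE names and wrong types, like the sandbox."""
    def __setitem__(self, name, value):
        if name.isupper():
            if name not in VARIABLES:
                raise KeyError('unknown UPPERCASE variable: %s' % name)
            if not isinstance(value, VARIABLES[name]):
                raise TypeError('%s must be a %s'
                                % (name, VARIABLES[name].__name__))
        dict.__setitem__(self, name, value)

g = StrictGlobals()
g['DIRS'] = ['dom']   # OK: known name, correct type
g['helper'] = 42      # lowercase names are unrestricted
```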
The strictness of behavior with ``UPPERCASE`` variables is a very
intentional design decision. By ensuring strict behavior, any operation
involving an ``UPPERCASE`` variable is guaranteed to have well-defined
side-effects. Previously, when the build configuration was defined in
``Makefiles``, assignments to variables that did nothing would go
unnoticed. ``moz.build`` files fix this problem by eliminating the
potential for false promises.
In the sandbox, all ``UPPERCASE`` variables are globals and all
non-``UPPERCASE`` variables are locals. After a ``moz.build`` file has
completed execution, only the globals are used to retrieve state.
The set of variables and functions available to the Python sandbox is
defined by the :py:mod:`mozbuild.frontend.sandbox_symbols` module. The
data structures in this module are consumed by the
:py:class:`mozbuild.frontend.reader.MozbuildSandbox` class to construct
the sandbox. There are tests to ensure that the symbols exposed
to an empty sandbox are all defined in the ``sandbox_symbols`` module.
This module also contains documentation for each symbol, so nothing can
sneak into the sandbox without being explicitly defined and documented.
Reading and Traversing moz.build Files
======================================
The process responsible for reading ``moz.build`` files simply starts at
a root ``moz.build`` file, processes it, emits the globals namespace to
a consumer, and then proceeds to process additional referenced
``moz.build`` files from the original file. The consumer then examines
the globals/``UPPERCASE`` variables set as part of execution and then
converts the data therein to Python class instances.
The executed Python sandbox is essentially represented as a dictionary
of all the special ``UPPERCASE`` variables populated during its
execution.
The code for reading ``moz.build`` files lives in
:py:mod:`mozbuild.frontend.reader`. The evaluated Python sandboxes are
passed into :py:mod:`mozbuild.frontend.emitter`, which converts them to
classes defined in :py:mod:`mozbuild.frontend.data`. Each class in this
module defines a domain-specific component of tree metadata; e.g. there
will be separate classes that represent a JavaScript file vs a compiled
C++ file or test manifests. This means downstream consumers of this data
can filter on class types to only consume what they are interested in.
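Assuming hypothetical class names (the real ones live in :py:mod:`mozbuild.frontend.data`), a consumer filtering by type might look like:

```python
# Sketch of filtering emitted tree-metadata objects by class type.
# Class names here are illustrative, not the real frontend.data classes.
class TreeMetadata(object):
    pass

class CompiledSources(TreeMetadata):
    def __init__(self, files):
        self.files = files

class TestManifest(TreeMetadata):
    def __init__(self, path):
        self.path = path

emitted = [CompiledSources(['foo.cpp']), TestManifest('test.ini')]

# A backend that only cares about compilation filters on class type.
to_compile = [o for o in emitted if isinstance(o, CompiledSources)]
```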
There is no well-defined mapping between ``moz.build`` file instances
and the number of :py:mod:`mozbuild.frontend.data` classes derived from
each. Depending on the content of the ``moz.build`` file, there may be 1
object derived or 100.
The purpose of the ``emitter`` layer between low-level sandbox execution
and metadata representation is to facilitate a unified normalization and
verification step. There are multiple downstream consumers of the
``moz.build``-derived data and many will perform the same actions. This
logic can be complicated, so we have a component dedicated to it.
Other Notes
===========
:py:class:`mozbuild.frontend.reader.BuildReader` and
:py:class:`mozbuild.frontend.emitter.TreeMetadataEmitter` have a
stream-based API courtesy of generators. When you hook them up properly,
the :py:mod:`mozbuild.frontend.data` classes are emitted before all
``moz.build`` files have been read. This means that downstream errors
are raised soon after sandbox execution.
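The streaming behavior can be sketched with two chained generators (all names here are illustrative, not the real API):

```python
# Sketch of the generator-based pipeline between reader and emitter.
def read_mozbuild_files(paths):
    for path in paths:
        # In reality each file is executed in a sandbox; here we fake it.
        yield {'path': path, 'DIRS': []}

def emit(sandboxes):
    for sandbox in sandboxes:
        # A data object is emitted as soon as one sandbox finishes,
        # without waiting for the remaining files to be read.
        yield 'metadata for %s' % sandbox['path']

pipeline = emit(read_mozbuild_files(['moz.build', 'dom/moz.build']))
first = next(pipeline)  # available before the second file is "read"
```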
Lots of the code for evaluating Python sandboxes is applicable to
non-Mozilla systems. In theory, it could be extracted into a standalone
and generic package. However, until there is a need, there will
likely be some tightly coupled bits.


@ -1,137 +0,0 @@
=================
mozbuild.frontend
=================
The mozbuild.frontend package is of sufficient importance and complexity
to warrant its own README file. If you are looking for documentation on
how the build system gets started, you've come to the right place.
Overview
========
Tree metadata (including the build system) is defined by a collection of
files in the source tree called *mozbuild* files. These typically are files
named *moz.build*. But, the actual name can vary.
Each *mozbuild* file defines basic metadata about the part of the tree
(typically directory scope) it resides in. This includes build system
configuration, such as the list of C++ files to compile or headers to install
or libraries to link together.
*mozbuild* files are actually Python scripts. However, their execution
is governed by special rules. This will be explained later.
Once a *mozbuild* file has executed, it is converted into a set of static
data structures.
The set of all data structures from all relevant *mozbuild* files
constitute all of the metadata from the tree.
How *mozbuild* Files Work
=========================
As stated above, *mozbuild* files are actually Python scripts. However,
their behavior is very different from what you would expect if you executed
the file using the standard Python interpreter from the command line.
There are two properties that make execution of *mozbuild* files special:
1. They are evaluated in a sandbox which exposes a limited subset of Python
2. There is a special set of global variables which hold the output from
execution.
The limited subset of Python is actually an extremely limited subset.
Only a few built-ins are exposed. These include *True*, *False*, and
*None*. Global functions like *import*, *print*, and *open* aren't defined.
Without these, *mozbuild* files can do very little. This is by design.
The side-effects of the execution of a *mozbuild* file are used to define
the build configuration. Specifically, variables set during the execution
of a *mozbuild* file are examined and their values are used to populate
data structures.
The enforced convention is that all UPPERCASE names inside a sandbox are
reserved and it is the value of these variables post-execution that is
examined. Furthermore, the set of allowed UPPERCASE variable names and
their types is statically defined. If you attempt to reference or assign
to an UPPERCASE variable name that isn't known to the build system or
attempt to assign a value of the wrong type (e.g. a string when it wants a
list), an error will be raised during execution of the *mozbuild* file.
This strictness is to ensure that assignment to all UPPERCASE variables
actually does something. If things weren't this way, *mozbuild* files
might think they were doing something but in reality wouldn't be. We don't
want to create false promises, so we validate behavior strictly.
If a variable is not UPPERCASE, you can do anything you want with it,
provided it isn't a function or other built-in. In other words, normal
Python rules apply.
All of the logic for loading and evaluating *mozbuild* files is in the
*reader* module. Of specific interest is the *MozbuildSandbox* class. The
*BuildReader* class is also important, as it is in charge of
instantiating *MozbuildSandbox* instances and traversing a tree of linked
*mozbuild* files. Unless you are a core component of the build system,
*BuildReader* is probably the only class you care about in this module.
The set of variables and functions *exported* to the sandbox is defined by
the *sandbox_symbols* module. These data structures are actually used to
populate MozbuildSandbox instances. And, there are tests to ensure that the
sandbox doesn't add new symbols without those symbols being added to the
module. And, since the module contains documentation, this ensures the
documentation is up to date (at least in terms of symbol membership).
How Sandboxes are Converted into Data Structures
================================================
The output of a *mozbuild* file execution is essentially a dict of all
the special UPPERCASE variables populated during its execution. While these
dicts are data structures, they aren't the final data structures that
represent the build configuration.
We feed the *mozbuild* execution output (actually *reader.MozbuildSandbox*
instances) into a *TreeMetadataEmitter* class instance. This class is
defined in the *emitter* module. *TreeMetadataEmitter* converts the
*MozbuildSandbox* instances into instances of the *TreeMetadata*-derived
classes from the *data* module.
All the classes in the *data* module define a domain-specific
component of the tree metadata, including build configuration. File compilation
and IDL generation are separate classes, for example. The only thing these
classes have in common is that they inherit from *TreeMetadata*, which is
merely an abstract base class.
The set of all emitted *TreeMetadata* instances (converted from executed
*mozbuild* files) constitutes the aggregate tree metadata. This is
the authoritative definition of the build system, etc. and is what's used by
all downstream consumers, such as build backends. There is no monolithic
class or data structure. Instead, the tree metadata is modeled as a collection
of *TreeMetadata* instances.
There is no defined mapping between the number of
*MozbuildSandbox*/*moz.build* instances and *TreeMetadata* instances.
Some *mozbuild* files will emit only 1 *TreeMetadata* instance. Some
will emit 7. Some may even emit 0!
The purpose of this *emitter* layer between the raw *mozbuild* execution
result and *TreeMetadata* is to facilitate additional normalization and
verification of the output. There are multiple downstream consumers of
this data and there is common functionality shared between them. An
abstraction layer that provides high-level filtering is a useful feature.
Thus *TreeMetadataEmitter* exists.
Other Notes
===========
*reader.BuildReader* and *emitter.TreeMetadataEmitter* have a nice
stream-based API courtesy of generators. When you hook them up properly,
*TreeMetadata* instances can be consumed before all *mozbuild* files have
been read. This means that errors down the pipe can trigger before all
upstream tasks (such as executing and converting) are complete. This should
reduce the turnaround time in the event of errors. This likely translates to
a more rapid pace for implementing backends, which require lots of iterative
runs through the entire system.
Lots of code in this sub-module is applicable to other systems, not just
Mozilla's. However, some of the code is tightly coupled. If there is a will
to extract the generic bits for re-use in other projects, that can and should
be done.

build/docs/python.rst Normal file

@ -0,0 +1,163 @@
.. _python:
===========================
Python and the Build System
===========================
The Python programming language is used significantly in the build
system. If we need to write code for the build system or for a tool
related to the build system, Python is typically the first choice.
Python Requirements
===================
The tree requires Python 2.7.3 or greater but not Python 3 to build.
All Python packages not in the Python distribution are included in the
source tree. So all you should need is a vanilla Python install and you
should be good to go.
Only CPython (the Python distribution available from www.python.org) is
supported.
We require Python 2.7.3 (and not say 2.7.2) to build because Python
2.7.3 contains numerous bug fixes, especially around the area of Unicode
handling. These bug fixes are extremely annoying and have to be worked
around. The build maintainers were tired of doing this, so the minimum
version requirement was upped (bug 870420).
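Expressed as a check (a sketch, not actual build system code), the requirement is:

```python
import sys

def python_is_supported(version_info):
    # Require at least Python 2.7.3, but not the Python 3 series.
    return (2, 7, 3) <= tuple(version_info[:3]) < (3, 0)

supported = python_is_supported(sys.version_info)
```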
We intend to eventually support Python 3. This will come by way of dual
2.7/3.x compatibility because a single flag day conversion to 3.x will
be too cumbersome given the amount of Python that would need to be converted.
We will not know which 3.x minor release we are targeting until this
effort is underway. This is tracked in bug 636155.
Compiled Python Packages
========================
There are some features of the build that rely on compiled Python packages
(packages containing C source). These features are currently all
optional because not every system contains the Python development
headers required to build these extensions.
We recommend you have the Python development headers installed (``mach
bootstrap`` should do this for you) so you can take advantage of these
features.
Issues with OS X System Python
==============================
The Python that ships with OS X has historically been littered with
subtle bugs and suboptimalities. Furthermore, OS X up through 10.8 doesn't
ship with Python 2.7.3 (10.8 ships with 2.7.2).
OS X 10.8 and below users will be required to install a new Python
distribution. This may not be necessary for OS X 10.9+. However, we
still recommend installing a separate Python because of the history with
OS X's system Python issues.
We recommend installing Python through Homebrew or MacPorts. If you run
``mach bootstrap``, this should be done for you.
Virtualenvs
===========
The build system relies heavily on
`virtualenvs <http://www.virtualenv.org/en/latest/>`_. Virtualenvs are
standalone and isolated Python environments. The problem a virtualenv
solves is that of dependencies across multiple Python components. If two
components on a system relied on different versions of a package, there
could be a conflict. Instead of managing multiple versions of a package
simultaneously, Python and virtualenvs take the route that it is easier
to just keep them separate so there is no potential for conflicts.
Very early in the build process, a virtualenv is created inside the
:term:`object directory`. The virtualenv is configured such that it can
find all the Python packages in the source tree. The code for this lives
in :py:mod:`mozbuild.virtualenv`.
Deficiencies
------------
There are numerous deficiencies with the way virtualenvs are handled in
the build system.
* mach reinvents the virtualenv.
There is code in ``build/mach_bootstrap.py`` that configures ``sys.path``
much the same way the virtualenv does. There are various bugs tracking
this. However, no clear solution has yet been devised. It's not a huge
problem and thus not a huge priority.
* They aren't preserved across copies and packaging.
If you attempt to copy an entire tree from one machine to another or
from one directory to another, chances are the virtualenv will fall
apart. It would be nice if we could preserve it somehow. Instead of
actually solving portable virtualenvs, all we really need to solve is
encapsulating the logic for populating the virtualenv along with all
dependent files in the appropriate place.
* .pyc files written to source directory.
We rely heavily on ``.pth`` files in our virtualenv. A ``.pth`` file
is a special file that contains a list of paths. Python will take the
set of listed paths encountered in ``.pth`` files and add them to
``sys.path``.
When Python compiles a ``.py`` file to bytecode, it writes out a
``.pyc`` file so it doesn't have to perform this compilation again.
It puts these ``.pyc`` files alongside the original ``.py`` file. Python
provides very little control for determining where these ``.pyc`` files
go, even in Python 3 (which offers custom importers).
With ``.pth`` files pointing back to directories in the source tree
and not the object directory, ``.pyc`` files are created in the source
tree. This is bad because when Python imports a module, it first looks
for a ``.pyc`` file before the ``.py`` file. If there is a ``.pyc``
file but no ``.py`` file, it will happily import the module. This
wreaks havoc during file moves, refactoring, etc.
There are various proposals for fixing this. See bug 795995.
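The ``.pth`` mechanism described above is easy to demonstrate (the directory names here are throwaway temporaries, not anything in the tree):

```python
# Demonstrate .pth processing: site.addsitedir() reads .pth files in a
# directory and appends each listed path to sys.path.
import os
import site
import sys
import tempfile

libdir = tempfile.mkdtemp()
extra = os.path.join(libdir, 'extra')
os.mkdir(extra)

# A .pth file is just a list of paths, one per line.
with open(os.path.join(libdir, 'mozilla.pth'), 'w') as f:
    f.write(extra + '\n')

site.addsitedir(libdir)  # processes mozilla.pth, extending sys.path
```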
Common Issues with Python
=========================
Upgrading your Python distribution breaks the virtualenv
--------------------------------------------------------
If you upgrade the Python distribution (e.g. from Python 2.7.3 to
2.7.5), chances are parts of the virtualenv will break.
This commonly manifests as a cryptic ``Cannot import XXX`` exception.
More often than not, the module being imported contains binary/compiled
components.
If you upgrade or reinstall your Python distribution, we recommend
clobbering your build.
Packages installed at the system level conflict with build system's
-------------------------------------------------------------------
It is common for people to install Python packages using ``sudo`` (e.g.
``sudo pip install psutil``) or with the system's package manager
(e.g. ``apt-get install python-mysql``).
A problem with this is that packages installed at the system level may
conflict with the package provided by the source tree. As of bug 907902
and changeset f18eae7c3b27 (September 16, 2013), this should no longer
be an issue since the virtualenv created as part of the build doesn't
add the system's ``site-packages`` directory to ``sys.path``. However,
poorly installed packages may still find a way to creep into the mix and
interfere with our virtualenv.
As a general principle, we recommend against using your system's package
manager or using ``sudo`` to install Python packages. Instead, create
virtualenvs and isolated Python environments for all of your Python
projects.
Python on $PATH is not appropriate
----------------------------------
Tools like ``mach`` will look for Python by performing ``/usr/bin/env
python`` or equivalent. Please be sure the appropriate Python 2.7.3+
path is on ``$PATH``. On OS X, this likely means you'll need to modify your
shell's init script to put something ahead of ``/usr/bin``.