********
Tutorial
********

Introduction
============

UWrap is a tool providing two essential capabilities: a tree matching language,
which identifies tree nodes and structure, and a wrapping language, which
associates an input tree with another set of related nodes. This tutorial goes
through the basics of the language, using an Ada application as a means to
generate a tree.


Displaying names
================

The initial step is to retrieve an Ada application that we're going to feed to
uwrap. The project <link to ada tutorial repo> will do. Once you get this
project, go to its directory and run:

.. code-block:: text

   $ uwrap -l ada -P prj.gpr -w names.wrp src/test.ads

The ``-l`` switch specifies the input language. ``-P`` refers to a GNAT project
listing the available sources. The ``-w`` switch identifies the wrapping script
to execute. Finally, a list of source files, here ``src/test.ads``, provides
the files to load, parse and wrap.

Executing this command should list all entities declared within
``src/test.ads``. The file ``names.wrp`` looks like this:

.. code-block:: text

   match DefiningName ()
   wrap standard.out (it & "\n");

The above represents a UWrap command. It's composed of three parts: a matching
expression, the identification of a node to wrap, and then a wrapping operation.

.. code-block:: text

   [match <matching expression>]
   [wrap <wrapping operation>];

Each section surrounded by ``[]`` in the description above can be omitted. Here,
we're omitting the "node to wrap" section, which is automatically set to ``it``.

UWrap traverses the tree node by node, depth first, examining the parent
before its children. Understanding the order of the traversal is key to writing
more complex programs, where one iteration may depend on either the previous
ones or the next (there's provision in UWrap to refer to objects not yet
created, but let's not get ahead of ourselves).

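
The traversal order described above can be pictured with a small Python
sketch. This is only an analogy, not UWrap's implementation: the ``Node``
class, the node kinds and the left-to-right order among siblings are
assumptions made for the illustration.

.. code-block:: python

   from dataclasses import dataclass, field


   @dataclass
   class Node:
       """Hypothetical tree node; only the kind and the children matter here."""
       kind: str
       children: list = field(default_factory=list)


   def preorder(node):
       """Yield the parent first, then each child subtree in turn."""
       yield node
       for child in node.children:
           yield from preorder(child)


   tree = Node("CompilationUnit", [
       Node("PackageDecl", [Node("DefiningName"), Node("SubpDecl")]),
   ])

   visited = [n.kind for n in preorder(tree)]
   print(visited)

Each parent appears in ``visited`` before any of its children, which is the
property a command can rely on when it depends on an earlier iteration.
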
The matching expression ``match DefiningName ()`` matches any node that can be of
the form ``DefiningName``. In the case of Ada, the name is checked against the
type of the node as described by the libadalang library. A good way to
understand the structure of the Ada tree is to use GPS and open the libadalang
view, which displays the tree of Ada nodes as well as their types.

``wrap`` introduces the name of a template to use to wrap the node. In this
case, ``standard.out`` is a pre-existing template provided by the standard UWrap
library.

``standard.out`` has a single field that corresponds to the text to display at
the end of the wrapping process. In this case, the expression ``it & "\n"``
converts ``it`` to a string - which for Ada nodes means extracting the text of
the node - and then concatenates it with an end of line.

It's useful to understand that ``standard.out`` is not an operation that
displays text immediately. Instead, it creates a template instance - or
wrapper - around the node, after the template ``standard.out``. Once the
wrapping operations are over, UWrap browses through all the created wrappers
and displays the contents of those created after ``standard.out``. This has
various consequences that we'll talk about later on.

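
To make the "wrap now, display later" idea concrete, here is a minimal Python
sketch. It is an analogy under stated assumptions, not the way UWrap is
implemented: the ``Out`` class and the ``wrap_out`` function are hypothetical
names standing in for a wrapper created after ``standard.out``.

.. code-block:: python

   class Out:
       """Hypothetical wrapper holding the text to display later."""

       def __init__(self, text):
           self.text = text


   wrappers = []


   def wrap_out(text):
       """Record a wrapper; nothing is printed at this point."""
       wrappers.append(Out(text))


   # First phase: the traversal only creates wrappers.
   for name in ["Pkg", "Proc", "Param"]:
       wrap_out(name + "\n")

   # Second phase: once wrapping is over, the wrappers are displayed.
   output = "".join(w.text for w in wrappers)
   print(output, end="")

Because the text is stored rather than printed, later steps could in principle
still inspect or change a wrapper before the final display.
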

Identifying unaliased parameters
================================

Let's look at a more complex matching case. Implementing a coding standard is
a good way to illustrate some of the basic capabilities. In this example, we
have a coding rule stating that parameters should not be marked ``access`` if
they are only referred to through ``.all`` notation.

Open the code under tutorial/access, and run the test:

.. code-block:: text

   $ uwrap -l ada -P prj.gpr -w access.wrp src/test.adb

You should see:

.. code-block:: text

   test.adb:0:18: V2: access object should be out or in out

Indeed, the ``V2`` parameter is only referenced through ``.all``; in other
words, it is never referenced other than through a ``.all`` dereference. This
contradicts the coding standard rule.

Let's open access.wrp and see how this is done:

.. code-block:: text

   match param: ParamSpec (x" access ") and parent (subp: SubpBody()) do
      match not subp.child (
         Identifier ()
         and not parent (DefiningName())
         and not parent (ExplicitDeref())
         and p_referenced_decl ().filter (param))
      wrap standard.out
         ("\e<sloc>\e<it.child (DefiningName())> access object should be out or in out\n");
   end;

This looks a lot more involved than the previous one, right? Thankfully, it's
not that complicated, and with a little bit of trial and error, with the help
of the GPS libadalang view, it's relatively easy to develop this kind of
matcher. Let's look at it step by step.

First:

.. code-block:: text

   match param: ParamSpec (x" access ") and parent (subp: SubpBody ())

We want to match only parameter specifications, hence ``ParamSpec``.
Furthermore, the potentially problematic specifications are those in access
mode. There are various ways to detect these. Here, we're using textual
matching: ``(x" access ")``. Notice the ``x`` in front of the string, which
signals that it should be interpreted as a regular expression. So any parameter
specification that contains the text " access " will potentially be flagged.
This is rather weak (e.g. it doesn't match ``ACCESS`` or ``:access``) but it
will do for the purpose of the demonstration.

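
The weakness mentioned above is easy to reproduce with any regular expression
engine. The sketch below uses Python's ``re`` module as a stand-in, assuming
(as the text suggests) that the pattern is searched anywhere in the node text;
the sample parameter texts are made up for the illustration.

.. code-block:: python

   import re

   # Same pattern as in the tutorial: the word surrounded by single spaces.
   pattern = re.compile(" access ")

   # Hypothetical parameter texts mapped to whether we expect a match.
   samples = {
       "V1 : access Integer": True,
       "V2 : ACCESS Integer": False,   # different casing
       "V3 :access Integer": False,    # no space before the keyword
   }

   results = {text: pattern.search(text) is not None for text in samples}
   for text, found in results.items():
       print(f"{text!r}: {found}")

A more robust script would rely on the structure of the tree rather than on
the text of the node.
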
We are going to need the value of that node later in the analysis, so we're
capturing it under the ``param`` name. The expression
``param: ParamSpec (x" access ")`` checks ``it`` against the ``ParamSpec``
predicate, then stores ``it`` in ``param`` if it matches.

We then need to concentrate only on subprogram bodies - our analysis is going
to cover the usage of these parameters. The second part of the condition uses
a ``parent`` predicate, which looks at all parent nodes, from the direct parent
to the root of the tree. We want to check that there's a ``SubpBody`` node in
the parent chain. We capture that value in the ``subp`` variable, which can be
done either on the return of ``parent``, or in the condition of ``parent``, on
the return of the predicate ``SubpBody ()`` (the second option is used here).

The ``do`` keyword introduces a list of sub-commands. If the top command
matches, then the sub-commands are executed. Sub-commands can be used to
describe more complex logic (there may be more than one command) or just for
organization purposes. Here, it clearly separates the parameter that we check
from the analysis of its usage, but it is not strictly necessary (we could have
written a single, larger match expression instead).

The matching block looks like this:

.. code-block:: text

   match not subp.child (
      Identifier ()
      and not parent (DefiningName())
      and not parent (ExplicitDeref())
      and p_referenced_decl ().filter (param))

Now we need to look at all the nodes underneath the subprogram declaring this
parameter. We're re-using the node captured before under ``subp`` and, through
dot notation, querying all of its children. We're looking specifically for
a node that:

* Is an identifier: ``Identifier ()``
* Isn't a declaration, i.e. not a child of a defining name: ``not parent (DefiningName ())``
* Isn't a dereference, i.e. not a child of an explicit dereference: ``not parent (ExplicitDeref ())``
* Is a reference to the parameter ``param`` initially captured: ``p_referenced_decl ().filter (param)``

A few notes here:

* ``p_referenced_decl`` is a standard libadalang property query. It does not
  operate on declarations, which is the reason why we have to guard against
  ``DefiningName`` beforehand.
* Within a browsing predicate such as ``child`` or ``parent``, the value of
  ``it`` is switched to the sub-nodes being browsed. So in that second child
  query, ``p_referenced_decl`` operates on the child being analyzed, not the
  top level node, which is a parameter specification. This is the reason why we
  had to capture the value in the top level match, then re-inject it in the
  ``filter`` call on ``p_referenced_decl`` for comparison.

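
The four conditions above can be mimicked on a toy tree in Python. This is a
loose analogy, not libadalang or UWrap code: the ``Node`` class, the
``refers_to`` attribute (standing in for ``p_referenced_decl``) and the tiny
hand-built tree are all assumptions made for the illustration.

.. code-block:: python

   class Node:
       """Hypothetical tree node with a link back to its parent."""

       def __init__(self, kind, text="", refers_to=None, children=()):
           self.kind = kind
           self.text = text
           self.refers_to = refers_to  # declaration this identifier resolves to
           self.parent = None
           self.children = list(children)
           for child in self.children:
               child.parent = self


   def descendants(node):
       """Yield every node below ``node``, parents before children."""
       for child in node.children:
           yield child
           yield from descendants(child)


   def has_parent(node, kind):
       """True if any ancestor of ``node`` has the given kind."""
       ancestor = node.parent
       while ancestor is not None:
           if ancestor.kind == kind:
               return True
           ancestor = ancestor.parent
       return False


   def used_as_access(subp, param):
       """True if some identifier uses ``param`` other than through .all."""
       return any(
           n.kind == "Identifier"
           and not has_parent(n, "DefiningName")
           and not has_parent(n, "ExplicitDeref")
           and n.refers_to is param
           for n in descendants(subp)
       )


   v1 = Node("ParamSpec", "V1 : access Integer")
   v2 = Node("ParamSpec", "V2 : access Integer")
   body = Node("SubpBody", children=[
       v1,
       v2,
       Node("CallStmt", children=[Node("Identifier", "V1", refers_to=v1)]),
       Node("ExplicitDeref", children=[Node("Identifier", "V2", refers_to=v2)]),
   ])

   flagged = [p.text.split()[0] for p in (v1, v2) if not used_as_access(body, p)]
   print(flagged)

Only the parameter whose every use sits under an explicit dereference ends up
in ``flagged``, mirroring the ``match not subp.child (...)`` condition.
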
If any node of the form above is found, we're good: there is indeed a reference
to this parameter as an access value, and access mode can be justified. If not,
we create a message wrapper:

.. code-block:: text

   wrap standard.out
      ("\e<sloc>\e<it.child (DefiningName())> access object should be out or in out\n");

The above demonstrates the usage of the ``\e<>`` expression in strings. ``\e<``
introduces an expression section, which allows a long string to include pieces
computed directly from the environment without having to concatenate them.
This can be particularly useful when working with multi-line strings (opened
and closed by ``"""``).

Advanced Ada users may have already noticed that this implementation is a bit
naive. It may be useful to consider more situations, for example cases where
the dereference is implicit. The point of this tutorial isn't to show full Ada
awareness, but rather to demonstrate how to write a relatively non-trivial
analysis in a few lines of code.


Generating an Ada wrapper
=========================

So far, we have only looked at the matching language, and only for the purpose
of displaying messages on the standard output. While this is a perfectly
honorable usage, UWrap is designed with wrapping in mind. For that purpose, it
comes with a standard runtime that facilitates wrapping around the Ada language.

Open the code under tutorial/wrap_names, and run the test:

.. code-block:: text

   $ uwrap -l ada -P prj.gpr -w wrap_names.wrp src/some_package.ads

This should generate Ada files in the local directory. These files contain
function wrappers - every function calls its counterpart declared in the input
specification, but under different types, parameters and subprogram names.

Let's open wrap_names.wrp and see how this is done:

.. code-block:: text

   import ada.wrappers;

   walk wrap_ada_specs ();

   match DefiningName (x"Some_(.*)")
   wrap w_DefiningName ("My_\1");

   match DefiningName (x"Some_(?<a>.*)") and parent (ParamSpec ())
   wrap w_DefiningName ("A_Param_\e<a>");

First, you'll notice ``import ada.wrappers``, which references a module from
the standard UWrap library. As in languages such as Java, a UWrap script always
has access to all of its standard library. As a matter of fact, we've been
using it when writing ``standard.out`` before, using the ``out`` template of
the module ``standard``. An ``import`` clause makes it possible to refer to the
entities declared in that module without a prefix.

The next command is:

.. code-block:: text

   walk wrap_ada_specs ();

This command is an unconditional execution of the template ``wrap_ada_specs``,
without storing the resulting object. This is what we need here, as we're only
interested in executing the commands declared in the ``wrap_ada_specs``
template, not in storing its result or recording that it has been applied.

The role of ``wrap_ada_specs`` is to further explore the current node and place
many default wrappers on it, in order to support the generation of the overall
Ada code. This is a good demonstration of some of the most advanced
capabilities of UWrap - you can open the file [link to include/templates] for
more information. Note that as of today, it is primarily designed to be used
in conjunction with ``-fdump-ada-spec``, and only supports the subset of
specification features that are generated by this option.

This line on its own is already functioning wrapper code, which takes
a specification and creates a wrapper around it without changing anything. The
next command alters the way the default wrapper works:

.. code-block:: text

   match DefiningName (x"Some_(.*)")
   wrap w_DefiningName ("My_\1");

The matcher here uses a regular expression with a capture group - we're
matching any ``DefiningName`` that has ``Some_`` in its name followed by zero
or more characters. These trailing characters are captured as the first group,
to be re-used later on with the ``\1`` string reference.

We then wrap with ``w_DefiningName``, providing the value ``"My_\1"``,
essentially replacing ``Some_`` with ``My_`` and ignoring any character before
``Some_``. ``w_DefiningName`` is a template defined in ``ada.wrappers`` which
gets analyzed at the end of the wrapping process to generate a new name for
a given entity.

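
The renaming described above can be reproduced with Python's ``re`` module,
used here only as a stand-in for UWrap's regular expressions. The ``rename``
helper and the sample names are hypothetical.

.. code-block:: python

   import re


   def rename(name):
       """Return "My_" plus whatever follows "Some_", or the name unchanged."""
       match = re.search(r"Some_(.*)", name)
       if match is None:
           return name
       # \1 refers to the first captured group, as in the UWrap string.
       return match.expand(r"My_\1")


   print(rename("Some_Function"))
   print(rename("X_Some_Type"))
   print(rename("Other"))

In the second call the text before ``Some_`` is dropped, because the new name
is built only from the captured group.
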
Writing our own wrapping with ``w_DefiningName`` has the effect of overriding
the default behavior of the standard wrappers. Indeed, there is also a command
to wrap ``DefiningName`` with ``w_DefiningName`` in ``wrap_ada_specs``. However,
wrapping operations are evaluated from last to first, with the rule that a given
template can wrap a given node only once. So for the entities where our
specific rule matches, no other ``w_DefiningName`` wrapping operation will
apply, and in particular none of those declared in ``wrap_ada_specs``.

This effect is more visible when considering the two wrapping operations in
this file:

.. code-block:: text

   match DefiningName (x"Some_(.*)")
   wrap w_DefiningName ("My_\1");

   match DefiningName (x"Some_(?<a>.*)") and parent (ParamSpec ())
   wrap w_DefiningName ("A_Param_\e<a>");

In this sequence, we first evaluate whether we are on a defining name that is
a child of a parameter and matches ``Some_``. If that's the case, we wrap the
name to ``"A_Param_\e<a>"`` and the command above is not executed. If we're
not on a parameter with a matching name, then we check whether the command
above can be executed. And if not, the one in ``wrap_ada_specs`` will be.

Also note the alternative syntax used to capture a name in a regexp in the
second command. In wrapping programs, many regexps often need to work together,
with many pieces to match. It can be difficult to track the group numbers, so
the form ``x"(?<some name>some pattern)"`` names a given group, for re-use in
expressions later on.

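
The "last to first, one wrapper per template and node" rule and the named
group can be sketched together in Python. This is an analogy under
assumptions, not UWrap's engine: the rule table, the dictionary-based nodes
and the catch-all first rule (standing in for the default in
``wrap_ada_specs``) are hypothetical, and Python spells named groups
``(?P<a>...)`` and ``\g<a>`` where the tutorial uses ``(?<a>...)`` and
``\e<a>``.

.. code-block:: python

   import re

   # Rules in declaration order: (pattern, extra condition, replacement).
   rules = [
       # Stand-in for the default wrapper: keep the name unchanged.
       (re.compile(r"(.*)"), lambda node: True, r"\1"),
       (re.compile(r"Some_(.*)"), lambda node: True, r"My_\1"),
       (re.compile(r"Some_(?P<a>.*)"),
        lambda node: node["parent"] == "ParamSpec",
        r"A_Param_\g<a>"),
   ]


   def wrap_name(node):
       """Apply the last matching rule only, as a given template wraps once."""
       for pattern, condition, template in reversed(rules):
           match = pattern.search(node["name"])
           if match and condition(node):
               return match.expand(template)
       return node["name"]


   print(wrap_name({"name": "Some_Value", "parent": "ParamSpec"}))
   print(wrap_name({"name": "Some_Function", "parent": "SubpSpec"}))
   print(wrap_name({"name": "Other", "parent": "SubpSpec"}))

The most specific rule is declared last so that it is tried first, and earlier
rules only apply when the later ones do not match.
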

Wrapping C strings into Ada Strings
===================================

Renaming Ada entities is a fun exercise, but let's look at a real life example.
The initial motivation behind UWrap was to provide a platform to automatically
massage the output of the C to Ada binding generator ``-fdump-ada-spec``
(although arguably there are many more use cases for it now). Bindings generated
by ``-fdump-ada-spec`` are extremely useful in the sense that they provide
a binary accurate translation from C to Ada. However, no decision on the
semantics of the binding can be made automatically, and C being very low level,
the result is a very low level binding which feels like C even in Ada.

The UWrap use case here is to provide a relatively easy way to describe the
decisions to take in order to develop a thicker binding. One of the most common
of these decisions is whether a C string should remain a pointer to char, or be
converted to an Ada String - which involves a potentially expensive operation
(a copy) but greatly improves usability.

Let's have a look. Open the code under tutorial/c_strings and run the following:

.. code-block:: text

   $ uwrap -l ada -P prj.gpr -w c_strings.wrp src/test_h.ads

``test_h.ads`` is a pregenerated output of ``-fdump-ada-spec``. You'll notice
that this project also has the original C code. The resulting wrapping code is
an Ada package that calls the originally bound C code, replacing C strings with
Ada strings in a few places. Let's look at the wrapper code:

.. code-block:: text

   import ada.wrappers;
   import ada.transformations;

   walk wrap_ada_specs ();

   match DefiningName (x"(.*)_h")
   wrap w_DefiningName ("\1_Wrapped");

   match ParamSpec()
      and p_type_expression ("Interfaces.C.Strings.chars_ptr")
      and not p_defining_name ().filter ("leaveMeAlone")
   walk chars_into_string ();

   match SubpDecl
      (f_subp_spec
         (x"^function"
          and p_returns ().filter ("Interfaces.C.Strings.chars_ptr")))
   walk chars_into_string ();

As before, we use ``ada.wrappers`` to invoke ``wrap_ada_specs``. This time
however, we also use ``ada.transformations``. This module provides a number of
preset templates that can perform complex modifications on the generated bound
code. Note that it's perfectly fine to describe the detailed behavior of these
transformations yourself. However, this requires a deep understanding of the
way Ada wrapping is set up, while the provided transformations work off the
shelf. They can also serve as a base to develop custom ones. A description of
the way they work goes beyond the scope of this tutorial, and will be covered
by the full UWrap documentation.

Also note ``normalize_ada_name``, which can be used when wrapping with
``w_DefiningName``. This is a standard function that changes the style of an
identifier to match the most common Ada convention, e.g. changing
"anEntityName" to "An_Entity_Name".

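
As an illustration of the kind of renaming involved, here is a small Python
sketch that turns a camelCase identifier into the ``An_Entity_Name`` style. It
is not the UWrap implementation of ``normalize_ada_name``, whose exact rules
are not described in this tutorial; the function name and the splitting rule
below are assumptions.

.. code-block:: python

   import re


   def normalize_ada_name_sketch(name):
       """Split on lower-to-upper transitions, then capitalize each part."""
       # Insert "_" between a lowercase letter or digit and an uppercase letter.
       spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
       # Uppercase the first character of each part, leave the rest untouched.
       return "_".join(part[:1].upper() + part[1:] for part in spaced.split("_"))


   print(normalize_ada_name_sketch("anEntityName"))
   print(normalize_ada_name_sketch("leaveMeAlone"))

Names that already contain underscores keep them, and only the first letter of
each part is changed.
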
The command on parameters reads:

.. code-block:: text

   match ParamSpec()
      and p_type_expression ().filter ("Interfaces.C.Strings.chars_ptr")
      and not p_defining_name ().filter ("leaveMeAlone")
   walk chars_into_string ();

This matches a parameter specification, then looks at the property
``p_type_expression``, which is the type of the parameter. Here, we're
performing a textual check against the full name of the C string type, which
corresponds to the pattern generated by ``-fdump-ada-spec``. We're then also
describing a condition under which we don't want to apply this transformation:
when the defining name of the parameter is exactly ``leaveMeAlone``. If all
these conditions match, then ``walk chars_into_string ()`` applies the preset
transformation from C string to Ada string.

To modify a returned type, a transformation needs to be applied directly on the
subprogram itself. This is the role of the following code:

.. code-block:: text

   match SubpDecl
      (f_subp_spec
         (x"^function"
          and p_returns ().filter ("Interfaces.C.Strings.chars_ptr")))
   walk chars_into_string ();

For illustration purposes, the style of this condition is different from the
previous one (both are possible, and can be more or less convenient depending
on the context). Here, we match a subprogram declaration through the
``SubpDecl`` predicate. Inside the parentheses, we're describing what this
``SubpDecl`` should look like - it should have a field ``f_subp_spec``. Inside
that field predicate, we're describing what this field should look like. Its
text should start with "function" (so it is a function), and it must have
a property ``p_returns`` which textually matches
"Interfaces.C.Strings.chars_ptr" (it returns a C string). We can then walk over
the template ``chars_into_string``, which is versatile enough to handle both
parameters and functions.

Careful readers may have noticed the usage of the predicate ``f_subp_spec``,
with a different prefix than the ``p_`` one seen before. This is actually
similar to a property check such as ``p_referenced_decl``, except that we mean
"match a node that has such a field, and whose field matches a specific value".
Properties and fields are features of langkit and libadalang, on which the
input tree of UWrap currently relies.


Going further
=============

The UWrap documentation is still a work in progress, and some of its semantics
are still being refined. The language offers many more capabilities, such as
template definition, containers, template types, control over the iteration,
creation of arbitrary subnodes, matching over the created templates, lambdas,
reductions, etc. A good way to get a glimpse of them is to check out the
`core testsuite <https://github.com/AdaCore/uwrap/tree/master/testsuite/tests/core>`_
of the language.

On top of these, a number of Ada transformations are already implemented,
making it possible to transform returned integers into exceptions, access
parameters into returned values or out modes or arrays, etc. A good way to get
an idea of how these work is to look at the `fdump-ada-spec specific testsuite
<https://github.com/AdaCore/uwrap/tree/master/testsuite/tests/fdump-ada-spec>`_, directly
at the `runtime implementation <https://github.com/AdaCore/uwrap/tree/master/include/ada>`_
of the transformations and Ada wrappers, or at the usage made to `bind cuda
<https://github.com/AdaCore/cuda/blob/master/api/cuda.wrp>`_.

At the time of writing, a lot of work is still necessary to stabilize the
language, its processing and error recovery. Performance has not been
optimized yet and a few shortcuts may lead to long processing times on
particularly large input files or complex wrappers. Feel free to open issues
on the GitHub tracker to report any problem or suggestion!