mirror of
https://github.com/AdaCore/xmlada.git
synced 2026-02-12 12:30:28 -08:00
KC07-033 git-svn-id: svn+ssh://svn.eu.adacore.com/Dev/trunk/xmlada@182161 936e1b1b-40f2-da11-902a-00137254ae57
69 lines
2.8 KiB
ReStructuredText
69 lines
2.8 KiB
ReStructuredText
.. _The_Input_module:
|
|
|
|
****************
|
|
The Input module
|
|
****************
|
|
|
|
This module provides a set of packages with a common interface to access the
|
|
characters contained in a stream. Various implementations are provided to
|
|
access files and manipulate standard Ada strings.
|
|
|
|
A top-level tagged type is provided that must be extended for the various
|
|
streams. It is assumed that the pointer to the current character in the stream
|
|
can only go forward, and never backward. As a result, it is possible to
|
|
implement this package for sockets or other strings where it isn't even
|
|
possible to go backward. This also means that one doesn't have to provide
|
|
buffers in such cases, and thus that it is possible to provide memory-efficient
|
|
readers.
|
|
|
|
Two predefined readers are available, namely `String_Input` to read characters
|
|
from a standard Ada string, and `File_Input` to read characters from a standard
|
|
text file.
|
|
|
|
They all provide the following primite operations:
|
|
|
|
`Open`
|
|
|
|
Although this operation isn't exactly overriden, since its parameters
|
|
depend on the type of stream you want to read from, it is nice to
|
|
use a standard name for this constructor.
|
|
|
|
`Close`
|
|
This terminates the stream reader and free any associated memory. It
|
|
is no longer possible to read from the stream afterwards.
|
|
|
|
`Next_Char`
|
|
Return the next Unicode character in the stream. Note this character doesn't
|
|
have to be associated specifically with a single byte, but that it depends on
|
|
the encoding chosen for the stream (see the unicode module documentation for
|
|
more information).
|
|
|
|
The next time this function is called, it returns the following character
|
|
from the stream.
|
|
|
|
`Eof`
|
|
This function should return True when the reader has already returned the
|
|
last character from the stream. Note that it is not guarantee that a second
|
|
call to Eof will also return True.
|
|
|
|
It is the responsability of this stream reader to correctly call the decoding
|
|
functions in the unicode module so as to return one single valid unicode
|
|
character. No further processing is done on the result of `Next_Char`. Note
|
|
that the standard `File_Input` and `String_Input` streams can automatically
|
|
detect the encoding to use for a file, based on a header read directly from the
|
|
file.
|
|
|
|
Based on the first four bytes of the stream (assuming this is valid XML), they
|
|
will automatically detect whether the file was encoded as Utf8, Utf16,... If
|
|
you are writing your own input streams, consider adding this automatic
|
|
detection as well.
|
|
|
|
However, it is always possible to override the default through a call to
|
|
`Set_Encoding`. This allows you to specify both the character set (Latin1, ...)
|
|
and the character encoding scheme (Utf8,...).
|
|
|
|
The user is also encouraged to set the identifiers for the stream they are
|
|
parsing, through calls to `Set_System_Id` and `Set_Public_Id`. These are used
|
|
when reporting error messages.
|
|
|