Mono.Posix
1.0.5000.0
2.0.0.0
4.0.0.0
This type is safe for multithreaded operations.
System.Text.Encoding
A System.Text.Encoding for Unix filenames.
Unix filenames are an interesting construct, as they have no
encoding. The operating system kernel only maintains a sequence of bytes
for the filename, with no encoding implied. This makes it non-trivial (or
impossible) to determine what encoding a filename is in: it could be
UTF-8, ASCII, Shift-JIS, or some binary data inserted by a freak
touch(1) accident (try touch "$(printf "test\xffname")"
within a bash(1) shell for an example).
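To see why such a name cannot simply be decoded as UTF-8, here is a minimal C# sketch. It checks the bytes produced by that touch(1) command against a strict UTF-8 decoder from System.Text. The class and helper names are hypothetical and are not part of Mono.Posix.

```csharp
using System.Text;

// Hypothetical demo class; not part of Mono.Posix.
static class FilenameBytesDemo
{
    // Returns true only if the byte sequence is well-formed UTF-8:
    // a decoder constructed with throwOnInvalidBytes = true throws instead
    // of silently substituting U+FFFD.
    public static bool IsStrictUtf8(byte[] bytes)
    {
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // The bytes created by: touch "$(printf "test\xffname")"
    public static readonly byte[] TouchedName =
        { 0x74, 0x65, 0x73, 0x74, 0xFF, 0x6E, 0x61, 0x6D, 0x65 };
}
```

IsStrictUtf8(TouchedName) returns false, because the byte 0xFF can never occur in well-formed UTF-8.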
On the other hand, developers and users expect filenames to be
strings, and the System.String type is a UTF-16 encoded
string. This consequently requires that all filesystem byte sequences be
converted into some UTF-16 encoded string so that files can be used
sensibly.
All filename strings provided to/from the
Mono.Unix and Mono.Unix.Native types are
passed through UnixEncoding.
UnixEncoding does the following:
-
When unmarshaling a filename from unmanaged to managed code,
UnixEncoding will first try to decode the string as a UTF-8
string.
If the UTF-8 decode fails, each "invalid" byte will be
represented as the sequence of Mono.Unix.UnixEncoding.EscapeByte
followed by the "offending" byte cast to a char.
-
When marshaling a filename from managed to unmanaged code, the
filename will be encoded using UTF-8 unless
Mono.Unix.UnixEncoding.EscapeByte is encountered,
in which case the EscapeByte character will be skipped
and the following character will be marshaled as a byte.
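The two rules above can be illustrated with a self-contained round trip. The following C# sketch is not Mono's UnixEncoding source; the class and method names are hypothetical. It makes three assumptions: EscapeByte is U+0000 (as documented for the EscapeByte field), each byte that does not start a well-formed UTF-8 sequence is escaped on its own, and a strict System.Text.UTF8Encoding decides what counts as well-formed.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical re-implementation of the escaping rules; not Mono's code.
static class UnixFilenameSketch
{
    // Assumed to match Mono.Unix.UnixEncoding.EscapeByte (U+0000).
    public const char EscapeChar = '\u0000';

    // Throws on malformed input instead of substituting U+FFFD.
    static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // Expected UTF-8 sequence length from the lead byte; 0 = never a valid lead.
    static int SequenceLength(byte lead) =>
        lead < 0x80 ? 1 :
        lead >= 0xC2 && lead <= 0xDF ? 2 :
        lead >= 0xE0 && lead <= 0xEF ? 3 :
        lead >= 0xF0 && lead <= 0xF4 ? 4 : 0;

    // Unmanaged -> managed: decode UTF-8, escaping every byte that does not
    // start a well-formed sequence as EscapeChar followed by (char)byte.
    public static string Decode(byte[] bytes)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < bytes.Length)
        {
            int len = SequenceLength(bytes[i]);
            if (len > 0 && i + len <= bytes.Length)
            {
                try
                {
                    // The strict decoder also rejects overlong forms,
                    // encoded surrogates and code points above U+10FFFF.
                    sb.Append(Strict.GetString(bytes, i, len));
                    i += len;
                    continue;
                }
                catch (DecoderFallbackException) { }
            }
            sb.Append(EscapeChar).Append((char)bytes[i]);
            i++;
        }
        return sb.ToString();
    }

    // Managed -> unmanaged: UTF-8 encode, except that EscapeChar is skipped
    // and the character after it is written as a single raw byte.
    // (A trailing EscapeChar with nothing after it is simply UTF-8 encoded.)
    public static byte[] Encode(string name)
    {
        var output = new List<byte>();
        int start = 0; // start of the pending run of ordinary characters
        for (int i = 0; i < name.Length; i++)
        {
            if (name[i] == EscapeChar && i + 1 < name.Length)
            {
                output.AddRange(Strict.GetBytes(name.Substring(start, i - start)));
                output.Add((byte)name[i + 1]); // Decode only escapes chars <= 0xFF
                i++;                           // skip the escaped character
                start = i + 1;
            }
        }
        output.AddRange(Strict.GetBytes(name.Substring(start)));
        return output.ToArray();
    }
}
```

Decoding the nine bytes from the touch(1) example yields "test\u0000\u00FFname", and Encode turns that string back into the same nine bytes. Escaping byte by byte keeps the mapping reversible, and that reversibility is what lets every on-disk name survive the trip through a UTF-16 string.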
The upshot of all this is that Mono.Unix and
Mono.Unix.Native can list, access, and open all files on your
filesystem, regardless of encoding.
The downside is that all such support is only within the
Mono.Unix and Mono.Unix.Native namespaces. You won't be
able to pass non-Unicode filenames as command-line arguments.
In short, it's a Glorious Hack. Rejoice. Or something.
What this means:
-
Any filename on disk, in any encoding (or lack thereof), can be
found and used with the Mono.Unix and Mono.Unix.Native
types.
-
You don't need to specify the encoding of filenames (which could be
wrong anyway, since a directory may contain files in more than one
encoding).
-
Printing or otherwise saving/displaying the filename may be
incorrect, since it contains extra escaping that's relevant only to
the Mono.Unix and Mono.Unix.Native classes. I'm not
losing any sleep over this, because if the encoding is unknown, the
strings couldn't be displayed correctly anyway...
-
You may not be able to use the
System.IO classes to access a file
obtained via Mono.Unix and Mono.Unix.Native classes.
This is because System.IO
doesn't know about UnixEncoding and the escape mechanism it uses.
I don't consider this to be a problem, as the System.IO classes
couldn't open these files anyway: System.IO never returned them,
and they were effectively invisible to normal Mono programs. They still are.
If the filename contains Mono.Unix.UnixEncoding.EscapeByte,
then you won't be able to use System.IO with that file. If the
filename doesn't contain EscapeByte, it can be used with
System.IO.
-
You still can't specify filenames in arbitrary encodings on the
mono command line. Mono will still try to decode these as either
UTF-8 strings or as an encoding listed in the
MONO_EXTERNAL_ENCODINGS environment variable.
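The System.IO rule in the list above boils down to a single check on the string. Below is a minimal C# sketch that assumes EscapeByte is U+0000. The class and helper names are hypothetical and are not part of Mono.Posix.

```csharp
// Hypothetical helper class; not part of Mono.Posix.
static class SystemIOCompat
{
    // Assumed to match Mono.Unix.UnixEncoding.EscapeByte (U+0000).
    const char EscapeChar = '\u0000';

    // A name obtained through UnixEncoding can be handed to System.IO only
    // if it carries no escape sequences, i.e. every byte of the on-disk
    // name decoded cleanly as UTF-8. IndexOf(char) is an ordinal search.
    public static bool CanUseWithSystemIO(string unixEncodedName) =>
        unixEncodedName.IndexOf(EscapeChar) < 0;
}
```

With this check, "caf\u00E9.txt" passes, while the escaped name "test\u0000\u00FFname" from the touch(1) example does not.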
Questions & Answers
-
Q
Why UTF-8? Why not use some other encoding?
-
A
Because UTF-8 is sane and should always be used. :-)
-
Q
Seriously?
-
A
Ha ha only serious. Plus, a directory can contain files in
more than one encoding, so expecting the developer to provide the
right encoding for each file would require the developer to be
clairvoyant.
Plus, using UTF-8 allows any Unicode character to be used in a
filename (which could be considered a bad thing, depending).
-
Q
What is Mono.Unix.UnixEncoding.EscapeByte?
-
A
U+0000. Since this is the terminating null, it by definition cannot
appear within a Unix filename, so it's a sane choice.
-
Q
Why not use byte[] instead of strings for filenames in
Mono.Unix.Native methods such as open, etc.?
-
A
Because byte[] is fugly to work with, it would need to
be offered in addition to the string versions, which would double all
the file-related APIs. Do you really want to explain the difference
between these APIs?
public static int open (string pathname, OpenFlags flags);
public static int open (byte[] pathname, OpenFlags flags);
(Hint: if you do want to explain the difference between these,
you're masochistic.)
Furthermore, what type should a string-typed structure member
be? If it's a byte[], developers
will still need a way to convert it to a string for debugging and
display to the user, but the developer can't know what encoding to use
(it could be anything), so this becomes an impossible problem.
UnixEncoding may be a Glorious Hack, but at least it leaves the
API usage unambiguous.
-
Q
.NET doesn't have these limitations! Why does Mono?
-
A
Because Windows stores all filenames on disk as Unicode (and has
since Windows NT 3.1 and/or the introduction of Long Filenames in
Windows 95), so it doesn't need to worry (as much) about the
arbitrary filename encoding problem. Short filenames might be in a
local encoding, but CIFS uses Unicode, so you can't be accessing
non-Unicode filenames over a network share.
-
Q
Why doesn't Mono do this (or something like it) so that System.IO
can read and process all files?
-
A
Priorities. :-)
Plus, I thought it would be easy for Mono to do this, but after
implementing this type, I'm not sure the other
maintainers would wish to deal with the issues of arbitrary
filename encodings.
Plus, most current Linux distros default to using UTF-8 already,
so (hopefully) this won't be an issue for too much longer
(10 years?).
Constructor
1.0.5000.0
2.0.0.0
4.0.0.0
Constructs a new instance of the
UnixEncoding class.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Boolean
To be added.
To be added.
To be added.
To be added.
Field
1.0.5000.0
2.0.0.0
4.0.0.0
System.Char
The character which precedes all characters which need
escaping during managed->unmanaged marshaling.
This character is U+0000.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Int32
To be added.
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Int32
To be added.
To be added.
To be added.
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Byte[]
To be added.
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Int32
To be added.
To be added.
To be added.
To be added.
To be added.
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Int32
To be added.
To be added.
To be added.
To be added.
To be added.
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Int32
To be added.
To be added.
To be added.
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Int32
To be added.
To be added.
To be added.
To be added.
To be added.
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Text.Decoder
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Text.Encoder
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Int32
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Int32
To be added.
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Int32
To be added.
To be added.
To be added.
To be added.
Method
1.0.5000.0
2.0.0.0
4.0.0.0
System.Byte[]
To be added.
To be added.
To be added.
Field
1.0.5000.0
2.0.0.0
4.0.0.0
System.Text.Encoding
A default UnixEncoding instance.
This member can be used instead of constructing a new
UnixEncoding instance.