#9124: mailbox now accepts binary input and uses binary internally

Although this patch contains API changes and is rather weighty for an
RC phase, the mailbox module was essentially unusable without the patch
since it would produce UnicodeErrors when handling non-ASCII input
at arbitrary and somewhat mysterious places, and any non-trivial amount
of email processing will encounter messages with non-ASCII bytes.
The release manager approved the patch application.

The changes allow binary input, and reject non-ASCII string input early
with a useful message instead of failing mysteriously later.  Binary
is used internally for reading and writing the mailbox files.  StringIO
and text file input are deprecated.

Initial patch by Victor Stinner, validated and expanded by R. David Murray.
R. David Murray
2011-01-30 06:21:28 +00:00
parent b02f7c00ae
commit b7deff1ddc
4 changed files with 388 additions and 141 deletions
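
The behavior described in the commit message can be sketched as follows. This is an illustration only, not part of the patch: the helper name `demo_binary_input`, the scratch `Maildir` location, and the sample message are assumptions made for the example, and it assumes a Python version that includes this change (3.2 or later).

```python
import mailbox
import os
import tempfile


def demo_binary_input():
    """Add a bytes message and a non-ASCII str message to a scratch Maildir."""
    # Hypothetical sample: a header carrying a raw Latin-1 byte (0xE1).
    raw = "From: foo\nSubject: h\u00e1z\n\n0\n".encode("latin-1")
    with tempfile.TemporaryDirectory() as tmp:
        box = mailbox.Maildir(os.path.join(tmp, "Maildir"), create=True)
        key = box.add(raw)                  # binary input is stored as given
        stored = box.get_bytes(key)
        try:
            box.add(raw.decode("latin-1"))  # non-ASCII str input
            error = None
        except ValueError as exc:           # rejected at add() time
            error = str(exc)
        box.close()
    return raw, stored, error


raw, stored, error = demo_binary_input()
print(stored == raw)
print(error)
```

The point of the sketch is where the failure happens: the non-ASCII string is refused by `add()` itself rather than at some later processing step.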


@@ -81,13 +81,16 @@ Maildir, mbox, MH, Babyl, and MMDF.
       it.

       Parameter *message* may be a :class:`Message` instance, an
-      :class:`email.Message.Message` instance, a string, or a file-like object
-      (which should be open in text mode). If *message* is an instance of the
+      :class:`email.Message.Message` instance, a string, a byte string, or a
+      file-like object (which should be open in binary mode). If *message* is
+      an instance of the
       appropriate format-specific :class:`Message` subclass (e.g., if it's an
       :class:`mboxMessage` instance and this is an :class:`mbox` instance), its
       format-specific information is used. Otherwise, reasonable defaults for
       format-specific information are used.

+      .. versionchanged:: 3.2 support for binary input
+

    .. method:: remove(key)
               __delitem__(key)
@@ -108,8 +111,9 @@ Maildir, mbox, MH, Babyl, and MMDF.
       :exc:`KeyError` exception if no message already corresponds to *key*.

       As with :meth:`add`, parameter *message* may be a :class:`Message`
-      instance, an :class:`email.Message.Message` instance, a string, or a
-      file-like object (which should be open in text mode). If *message* is an
+      instance, an :class:`email.Message.Message` instance, a string, a byte
+      string, or a file-like object (which should be open in binary mode). If
+      *message* is an
       instance of the appropriate format-specific :class:`Message` subclass
       (e.g., if it's an :class:`mboxMessage` instance and this is an
       :class:`mbox` instance), its format-specific information is
@@ -171,10 +175,20 @@ Maildir, mbox, MH, Babyl, and MMDF.
       raise a :exc:`KeyError` exception if no such message exists.


+   .. method:: get_bytes(key)
+
+      Return a byte representation of the message corresponding to *key*, or
+      raise a :exc:`KeyError` exception if no such message exists.
+
+      .. versionadded:: 3.2
+
+
    .. method:: get_string(key)

       Return a string representation of the message corresponding to *key*, or
-      raise a :exc:`KeyError` exception if no such message exists.
+      raise a :exc:`KeyError` exception if no such message exists. The
+      message is processed through :class:`email.message.Message` to
+      convert it to a 7bit clean representation.


    .. method:: get_file(key)
@@ -184,9 +198,11 @@ Maildir, mbox, MH, Babyl, and MMDF.
       file-like object behaves as if open in binary mode. This file should be
       closed once it is no longer needed.

-      .. versionadded:: 3.2
-         The file-like object supports the context manager protocol, so that
-         you can use a :keyword:`with` statement to automatically close it.
+      .. versionchanged:: 3.2
+         The file object really is a binary file; previously it was incorrectly
+         returned in text mode. Also, the file-like object now supports the
+         context manager protocol: you can use a :keyword:`with` statement to
+         automatically close it.

       .. note::
@@ -746,9 +762,11 @@ Maildir, mbox, MH, Babyl, and MMDF.
    If *message* is omitted, the new instance is created in a default, empty state.
    If *message* is an :class:`email.Message.Message` instance, its contents are
    copied; furthermore, any format-specific information is converted insofar as
-   possible if *message* is a :class:`Message` instance. If *message* is a string
+   possible if *message* is a :class:`Message` instance. If *message* is a string,
+   a byte string,
    or a file, it should contain an :rfc:`2822`\ -compliant message, which is read
-   and parsed.
+   and parsed. Files should be open in binary mode, but text mode files
+   are accepted for backward compatibility.

    The format-specific state and behaviors offered by subclasses vary, but in
    general it is only the properties that are not specific to a particular
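
The three accessors touched by the documentation changes above can be compared side by side. The following is a rough sketch rather than code from the patch: `demo_accessors`, the scratch `Maildir`, and the sample message are hypothetical, and the exact text returned by `get_string()` is left to the email package.

```python
import mailbox
import os
import tempfile


def demo_accessors():
    """Store one 8-bit message and read it back three ways."""
    # Hypothetical sample: a Subject header with a raw Latin-1 byte.
    raw = "Subject: h\u00e1z\n\n0\n".encode("latin-1")
    with tempfile.TemporaryDirectory() as tmp:
        box = mailbox.Maildir(os.path.join(tmp, "Maildir"), create=True)
        key = box.add(raw)
        as_bytes = box.get_bytes(key)   # bytes, line endings normalized to \n
        as_text = box.get_string(key)   # str, re-serialized by the email package
        with box.get_file(key) as f:    # binary file-like object, closed on exit
            from_file = f.read()
        box.close()
    return raw, as_bytes, as_text, from_file


raw, as_bytes, as_text, from_file = demo_accessors()
print(as_bytes == raw)
print(as_text.isascii())
print(type(from_file).__name__)
```

`get_bytes()` is the accessor to reach for when the original octets matter; `get_string()` trades fidelity for a 7bit clean string, and `get_file()` yields data that still uses the platform line separator.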

File diff suppressed because it is too large


@@ -7,8 +7,10 @@ import email
 import email.message
 import re
 import io
+import tempfile
 from test import support
 import unittest
+import textwrap
 import mailbox
 import glob
 try:
@@ -48,6 +50,8 @@ class TestBase(unittest.TestCase):
 class TestMailbox(TestBase):

+    maxDiff = None
+
     _factory = None # Overridden by subclasses to reuse tests
     _template = 'From: foo\n\n%s'
@@ -69,14 +73,108 @@ class TestMailbox(TestBase):
         self.assertEqual(len(self._box), 2)
         keys.append(self._box.add(email.message_from_string(_sample_message)))
         self.assertEqual(len(self._box), 3)
-        keys.append(self._box.add(io.StringIO(_sample_message)))
+        keys.append(self._box.add(io.BytesIO(_bytes_sample_message)))
         self.assertEqual(len(self._box), 4)
         keys.append(self._box.add(_sample_message))
         self.assertEqual(len(self._box), 5)
+        keys.append(self._box.add(_bytes_sample_message))
+        self.assertEqual(len(self._box), 6)
+        with self.assertWarns(DeprecationWarning):
+            keys.append(self._box.add(
+                io.TextIOWrapper(io.BytesIO(_bytes_sample_message))))
+        self.assertEqual(len(self._box), 7)
         self.assertEqual(self._box.get_string(keys[0]), self._template % 0)
-        for i in (1, 2, 3, 4):
+        for i in (1, 2, 3, 4, 5, 6):
             self._check_sample(self._box[keys[i]])
+
+    _nonascii_msg = textwrap.dedent("""\
+            From: foo
+            Subject: Falinaptár házhozszállítással. Már rendeltél?
+
+            0
+            """)
+
+    def test_add_invalid_8bit_bytes_header(self):
+        key = self._box.add(self._nonascii_msg.encode('latin1'))
+        self.assertEqual(len(self._box), 1)
+        self.assertEqual(self._box.get_bytes(key),
+            self._nonascii_msg.encode('latin1'))
+
+    def test_invalid_nonascii_header_as_string(self):
+        subj = self._nonascii_msg.splitlines()[1]
+        key = self._box.add(subj.encode('latin1'))
+        self.assertEqual(self._box.get_string(key),
+            'Subject: =?unknown-8bit?b?RmFsaW5hcHThciBo4Xpob3pzeuFsbO104XNz'
+            'YWwuIE3hciByZW5kZWx06Ww/?=\n\n')
+
+    def test_add_nonascii_header_raises(self):
+        with self.assertRaisesRegex(ValueError, "ASCII-only"):
+            self._box.add(self._nonascii_msg)
+
+    _non_latin_bin_msg = textwrap.dedent("""\
+        From: foo@bar.com
+        To: báz
+        Subject: Maintenant je vous présente mon collègue, le pouf célèbre
+        \tJean de Baddie
+        Mime-Version: 1.0
+        Content-Type: text/plain; charset="utf-8"
+        Content-Transfer-Encoding: 8bit
+
+        Да, они летят.
+        """).encode('utf-8')
+
+    def test_add_8bit_body(self):
+        key = self._box.add(self._non_latin_bin_msg)
+        self.assertEqual(self._box.get_bytes(key),
+                         self._non_latin_bin_msg)
+        with self._box.get_file(key) as f:
+            self.assertEqual(f.read(),
+                             self._non_latin_bin_msg.replace(b'\n',
+                                os.linesep.encode()))
+        self.assertEqual(self._box[key].get_payload(),
+                "Да, они летят.\n")
+
+    def test_add_binary_file(self):
+        with tempfile.TemporaryFile('wb+') as f:
+            f.write(_bytes_sample_message)
+            f.seek(0)
+            key = self._box.add(f)
+        # See issue 11062
+        if not isinstance(self._box, mailbox.Babyl):
+            self.assertEqual(self._box.get_bytes(key).split(b'\n'),
+                _bytes_sample_message.split(b'\n'))
+
+    def test_add_binary_nonascii_file(self):
+        with tempfile.TemporaryFile('wb+') as f:
+            f.write(self._non_latin_bin_msg)
+            f.seek(0)
+            key = self._box.add(f)
+        # See issue 11062
+        if not isinstance(self._box, mailbox.Babyl):
+            self.assertEqual(self._box.get_bytes(key).split(b'\n'),
+                self._non_latin_bin_msg.split(b'\n'))
+
+    def test_add_text_file_warns(self):
+        with tempfile.TemporaryFile('w+') as f:
+            f.write(_sample_message)
+            f.seek(0)
+            with self.assertWarns(DeprecationWarning):
+                key = self._box.add(f)
+        # See issue 11062
+        if not isinstance(self._box, mailbox.Babyl):
+            self.assertEqual(self._box.get_bytes(key).split(b'\n'),
+                _bytes_sample_message.split(b'\n'))
+
+    def test_add_StringIO_warns(self):
+        with self.assertWarns(DeprecationWarning):
+            key = self._box.add(io.StringIO(self._template % "0"))
+        self.assertEqual(self._box.get_string(key), self._template % "0")
+
+    def test_add_nonascii_StringIO_raises(self):
+        with self.assertWarns(DeprecationWarning):
+            with self.assertRaisesRegex(ValueError, "ASCII-only"):
+                self._box.add(io.StringIO(self._nonascii_msg))

     def test_remove(self):
         # Remove messages using remove()
         self._test_remove_or_delitem(self._box.remove)
@@ -154,12 +252,21 @@ class TestMailbox(TestBase):
         self.assertEqual(msg0.get_payload(), '0')
         self._check_sample(self._box.get_message(key1))

+    def test_get_bytes(self):
+        # Get bytes representations of messages
+        key0 = self._box.add(self._template % 0)
+        key1 = self._box.add(_sample_message)
+        self.assertEqual(self._box.get_bytes(key0),
+            (self._template % 0).encode('ascii'))
+        self.assertEqual(self._box.get_bytes(key1), _bytes_sample_message)
+
     def test_get_string(self):
         # Get string representations of messages
         key0 = self._box.add(self._template % 0)
         key1 = self._box.add(_sample_message)
         self.assertEqual(self._box.get_string(key0), self._template % 0)
-        self.assertEqual(self._box.get_string(key1), _sample_message)
+        self.assertEqual(self._box.get_string(key1).split('\n'),
+                         _sample_message.split('\n'))

     def test_get_file(self):
         # Get file representations of messages
@@ -169,9 +276,9 @@ class TestMailbox(TestBase):
             data0 = file.read()
         with self._box.get_file(key1) as file:
             data1 = file.read()
-        self.assertEqual(data0.replace(os.linesep, '\n'),
+        self.assertEqual(data0.decode('ascii').replace(os.linesep, '\n'),
                          self._template % 0)
-        self.assertEqual(data1.replace(os.linesep, '\n'),
+        self.assertEqual(data1.decode('ascii').replace(os.linesep, '\n'),
                          _sample_message)

     def test_iterkeys(self):
@@ -405,11 +512,12 @@ class TestMailbox(TestBase):
     def test_dump_message(self):
         # Write message representations to disk
         for input in (email.message_from_string(_sample_message),
-                      _sample_message, io.StringIO(_sample_message)):
-            output = io.StringIO()
+                      _sample_message, io.BytesIO(_bytes_sample_message)):
+            output = io.BytesIO()
             self._box._dump_message(input, output)
-            self.assertEqual(output.getvalue(), _sample_message)
-        output = io.StringIO()
+            self.assertEqual(output.getvalue(),
+                _bytes_sample_message.replace(b'\n', os.linesep.encode()))
+        output = io.BytesIO()
         self.assertRaises(TypeError,
                           lambda: self._box._dump_message(None, output))
@@ -439,6 +547,7 @@ class TestMailboxSuperclass(TestBase):
         self.assertRaises(NotImplementedError, lambda: box.__getitem__(''))
         self.assertRaises(NotImplementedError, lambda: box.get_message(''))
         self.assertRaises(NotImplementedError, lambda: box.get_string(''))
+        self.assertRaises(NotImplementedError, lambda: box.get_bytes(''))
         self.assertRaises(NotImplementedError, lambda: box.get_file(''))
         self.assertRaises(NotImplementedError, lambda: '' in box)
         self.assertRaises(NotImplementedError, lambda: box.__contains__(''))
@@ -640,9 +749,9 @@ class TestMaildir(TestMailbox):
"Host name mismatch: '%s' should be '%s'" %
(groups[4], hostname))
previous_groups = groups
-            tmp_file.write(_sample_message)
+            tmp_file.write(_bytes_sample_message)
             tmp_file.seek(0)
-            self.assertEqual(tmp_file.read(), _sample_message)
+            self.assertEqual(tmp_file.read(), _bytes_sample_message)
             tmp_file.close()
         file_count = len(os.listdir(os.path.join(self._path, "tmp")))
         self.assertEqual(file_count, repetitions,
@@ -787,6 +896,12 @@ class _TestMboxMMDF(TestMailbox):
         self.assertEqual(self._box[key].get_from(), 'foo@bar blah')
         self.assertEqual(self._box[key].get_payload(), '0')

+    def test_add_from_bytes(self):
+        # Add a byte string starting with 'From ' to the mailbox
+        key = self._box.add(b'From foo@bar blah\nFrom: foo\n\n0')
+        self.assertEqual(self._box[key].get_from(), 'foo@bar blah')
+        self.assertEqual(self._box[key].get_payload(), '0')
+
     def test_add_mbox_or_mmdf_message(self):
         # Add an mboxMessage or MMDFMessage
         for class_ in (mailbox.mboxMessage, mailbox.MMDFMessage):
@@ -817,7 +932,7 @@ class _TestMboxMMDF(TestMailbox):
         self._box._file.seek(0)
         contents = self._box._file.read()
         self._box.close()
-        with open(self._path, 'r', newline='') as f:
+        with open(self._path, 'rb') as f:
             self.assertEqual(contents, f.read())
         self._box = self._factory(self._path)
@@ -1087,6 +1202,15 @@ class TestMessage(TestBase):
         self._post_initialize_hook(msg)
         self._check_sample(msg)

+    def test_initialize_with_binary_file(self):
+        # Initialize based on contents of binary file
+        with open(self._path, 'wb+') as f:
+            f.write(_bytes_sample_message)
+            f.seek(0)
+            msg = self._factory(f)
+            self._post_initialize_hook(msg)
+            self._check_sample(msg)
+
     def test_initialize_with_nothing(self):
         # Initialize without arguments
         msg = self._factory()
@@ -1363,6 +1487,14 @@ class TestMessageConversion(TestBase):
             msg_plain = mailbox.Message(msg)
             self._check_sample(msg_plain)

+    def test_x_from_bytes(self):
+        # Convert all formats to Message
+        for class_ in (mailbox.Message, mailbox.MaildirMessage,
+                       mailbox.mboxMessage, mailbox.MHMessage,
+                       mailbox.BabylMessage, mailbox.MMDFMessage):
+            msg = class_(_bytes_sample_message)
+            self._check_sample(msg)
+
     def test_x_to_invalid(self):
         # Convert all formats to an invalid format
         for class_ in (mailbox.Message, mailbox.MaildirMessage,
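
Outside the test suite, the message constructors exercised by `test_x_from_bytes` and `test_initialize_with_binary_file` might be used roughly like this. The sample message `RAW` and the variable names are assumptions for the example; only `mailbox.Message`, `mailbox.mboxMessage` and `io.BytesIO` come from the standard library.

```python
import io
import mailbox

# Hypothetical sample message, given as a byte string.
RAW = b"From: foo\nSubject: test\n\nbody\n"

# A byte string is parsed directly.
from_bytes = mailbox.Message(RAW)

# A binary file-like object is read and parsed; format-specific subclasses
# such as mboxMessage accept the same kinds of input.
from_file = mailbox.mboxMessage(io.BytesIO(RAW))

print(from_bytes["Subject"])
print(from_file["From"])
```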
@@ -1908,6 +2040,8 @@ H4sICM2D1UIAA3RleHQAC8nILFYAokSFktSKEoW0zJxUPa7wzJIMhZLyfIWczLzUYj0uAHTs
 --NMuMz9nt05w80d4+--
 """

+_bytes_sample_message = _sample_message.encode('ascii')
+
 _sample_headers = {
     "Return-Path":"<gkj@gregorykjohnson.com>",
     "X-Original-To":"gkj+person@localhost",

@@ -16,6 +16,12 @@ Core and Builtins
 Library
 -------

+- Issue #9124: mailbox now accepts binary input and reads and writes mailbox
+  files in binary mode, using the email package's binary support to parse
+  arbitrary email messages. StringIO and text file input is deprecated,
+  and string input fails early if non-ASCII characters are used, where
+  previously it would fail when the email was processed in a later step.
+
 - Issue #10845: Mitigate the incompatibility between the multiprocessing
   module on Windows and the use of package, zipfile or directory execution
   by special casing main modules that actually *are* called __main__.py.
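
The deprecation mentioned in the Issue #9124 entry can be observed with a small experiment. This is a sketch under assumptions: the helper `add_and_collect_warnings` and the scratch `Maildir` are invented for the example, and it assumes the installed Python still issues the `DeprecationWarning` that the tests in this commit check for.

```python
import io
import mailbox
import os
import tempfile
import warnings


def add_and_collect_warnings(message):
    """Add *message* to a scratch Maildir; return (stored bytes, warning categories)."""
    with tempfile.TemporaryDirectory() as tmp:
        box = mailbox.Maildir(os.path.join(tmp, "Maildir"), create=True)
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # record warnings hidden by default
            key = box.add(message)
        stored = box.get_bytes(key)
        box.close()
    return stored, [w.category for w in caught]


# Text input still works, but is expected to warn.
text_stored, text_categories = add_and_collect_warnings(
    io.StringIO("From: foo\n\n0\n"))
# Binary input is the supported path.
bytes_stored, bytes_categories = add_and_collect_warnings(
    io.BytesIO(b"From: foo\n\n0\n"))

print(DeprecationWarning in text_categories)
print(DeprecationWarning in bytes_categories)
```

Code that currently passes `io.StringIO` objects or text-mode files can usually switch to `io.BytesIO` or a file opened with `'rb'` and store the same bytes without the warning.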