36 Commits

Author SHA1 Message Date
Greg Ward
4d9d2563f5 #17445: difflib: add diff_bytes(), to compare bytes rather than str
Some applications (e.g. traditional Unix diff, version control
systems) neither know nor care about the encodings of the files they
are comparing. They are textual, but to the diff utility they are just
bytes. This worked fine under Python 2, because all of the hardcoded
strings in difflib.py are ASCII, so could safely be combined with
old-style u'' strings. But it stopped working in 3.x.

The solution is to use surrogate escapes for a lossless
bytes->str->bytes roundtrip. That means {unified,context}_diff() can
continue to just handle strings without worrying about bytes. Callers
who have to deal with bytes will need to change to using diff_bytes().

Use case: Mercurial's test runner uses difflib to compare current hg
output with known good output. But Mercurial's output is just bytes,
since it can contain:
  * file contents (arbitrary unknown encoding)
  * filenames (arbitrary unknown encoding)
  * usernames and commit messages (usually UTF-8, but not guaranteed
    because old versions of Mercurial did not enforce it)
  * user messages (locale encoding)

Since the output of any given hg command can include text in multiple
encodings, it is hopeless to try to treat it as decodable Unicode
text. It's just bytes, all the way down.

This is an elaboration of a patch by Terry Reedy.
2015-04-20 20:21:21 -04:00
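
The commit message above describes a lossless bytes->str->bytes roundtrip via surrogate escapes, with diff_bytes() as the entry point for callers holding bytes. A minimal sketch of both ideas follows. The sample lines and the roundtrip() helper are hypothetical and only illustrate the technique; they are not taken from the patch.

```python
import difflib

# Hypothetical sample data: the same word in two different encodings.
old_lines = [b"caf\xe9\n", b"same\n"]      # Latin-1 bytes
new_lines = [b"caf\xc3\xa9\n", b"same\n"]  # UTF-8 bytes


def roundtrip(raw):
    """Decode bytes to str with surrogate escapes, then encode back."""
    text = raw.decode("ascii", "surrogateescape")
    return text.encode("ascii", "surrogateescape")


# Non-ASCII bytes survive the trip unchanged, whatever their encoding.
lossless = all(roundtrip(line) == line for line in old_lines + new_lines)

# diff_bytes() wraps a str-based diff function and yields bytes lines.
byte_diff = list(
    difflib.diff_bytes(
        difflib.unified_diff,
        old_lines,
        new_lines,
        fromfile=b"old.txt",
        tofile=b"new.txt",
    )
)
```

Note that the file names are passed as bytes as well, since diff_bytes() expects every input to be bytes.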
Berker Peksag
102029dfd6 Issue #2052: Add charset parameter to HtmlDiff.make_file(). 2015-03-15 01:18:47 +02:00
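
As a rough illustration of the charset parameter mentioned above, the sketch below asks HtmlDiff.make_file() for a page declared as ISO-8859-1. The input lines and descriptions are made-up sample data.

```python
import difflib

# Hypothetical sample data.
old_lines = ["caf\u00e9\n"]
new_lines = ["cafe\n"]

# charset is keyword-only and defaults to "utf-8".
html = difflib.HtmlDiff().make_file(
    old_lines,
    new_lines,
    fromdesc="old",
    todesc="new",
    charset="iso-8859-1",
)
```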
Georg Brandl
794e9bf1fe merge with 3.4 2014-10-29 10:27:06 +01:00
Georg Brandl
525d355984 Fixing broken links in doc, part 3: the rest 2014-10-29 10:26:56 +01:00
Terry Jan Reedy
386b2b18f5 Merge with 3.4. Closes #21232. 2014-04-18 17:00:50 -04:00
Terry Jan Reedy
bddecc3861 Issue #21232: Replace .splitlines arg '1' with 'keepends=True'. 2014-04-18 17:00:19 -04:00
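
For context, a small sketch of the spelling this commit switches to: keepends=True keeps the line terminators that the difflib functions expect on their input lines. The text values are hypothetical sample data.

```python
import difflib

# Hypothetical sample data.
old_text = "one\ntwo\n"
new_text = "one\nthree\n"

# keepends=True preserves the trailing "\n" on every line.
old_lines = old_text.splitlines(keepends=True)
new_lines = new_text.splitlines(keepends=True)

delta = list(difflib.ndiff(old_lines, new_lines))
```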
Andrew Kuchling
c51da2b8a0 #14332: provide a better explanation of junk in difflib docs
Initial patch by Alba Magallanes.
2014-03-19 16:43:06 -04:00
Andrew Kuchling
2e3743cd30 #13437: link to the source code for a few more modules 2014-03-19 16:23:01 -04:00
Serhiy Storchaka
fbc1c26803 Issue #19795: Improved markup of True/False constants. 2013-11-29 12:17:13 +02:00
Serhiy Storchaka
bfdcd436f0 Issue #18758: Fixed and improved cross-references. 2013-10-13 23:09:14 +03:00
R David Murray
96433f8e34 #18601: fix error made when difflib example was converted to use 'with'. 2013-07-30 15:37:11 -04:00
Raymond Hettinger
1929983518 Beautify and modernize the SequenceMatcher example 2011-04-09 19:41:31 -07:00
Raymond Hettinger
dbb677a894 Beautify and modernize the SequenceMatcher example 2011-04-09 19:41:00 -07:00
Éric Araujo
a3dd56b6cf Use with statement where it improves the documentation (closes #10461) 2011-03-11 17:42:48 +01:00
Terry Reedy
17a59252e8 Issue 10534, difflib: tweak doc; test new SequenceMatcher instance attributes; avoid unneeded lists of SM.b2j keys and items in .__chain_b. Do not backport. 2010-12-15 20:18:10 +00:00
Georg Brandl
500be24a64 Fix indentation. 2010-12-03 19:56:42 +00:00
Terry Reedy
74a7c67db1 2010-12-03 18:57:42 +00:00
Terry Reedy
dc9b17d922 Add version-added note twice for new difflib SequenceMatcher autojunk parameter. 2010-11-27 20:52:14 +00:00
Terry Reedy
99f9637de8 Issue 2986: Add autojunk parameter to SequenceMatcher to turn off heuristic. Patch by Terry Reedy, Eli Bendersky, and Simon Cross. 2010-11-25 06:12:34 +00:00
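
A minimal sketch of what turning the heuristic off looks like, assuming made-up input strings. When the second sequence has 200 or more items, SequenceMatcher by default treats very frequent items as "popular" and leaves them out of its matching index; autojunk=False disables that.

```python
import difflib

# Hypothetical sample data: a long, highly repetitive second sequence.
b = "ab" * 150          # 300 items, only two distinct values
a = "xx" + b

default_matcher = difflib.SequenceMatcher(None, a, b)
exact_matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)

# Items the heuristic set aside; empty when autojunk is off.
popular_default = default_matcher.bpopular
popular_exact = exact_matcher.bpopular

ratio_default = default_matcher.ratio()
ratio_exact = exact_matcher.ratio()
```

Comparing ratio_default with ratio_exact shows how much the heuristic can change the similarity score on repetitive input.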
Georg Brandl
8e9eb95c40 #8686: remove potentially confusing wording that does not add any value. 2010-10-17 09:23:05 +00:00
R. David Murray
b2416e54b1 Merged revisions 80004 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r80004 | r.david.murray | 2010-04-12 12:35:19 -0400 (Mon, 12 Apr 2010) | 13 lines

  Issue #7585: use tab between components in unified and context diff headers.

  Instead of spaces between the filename and date (or whatever the string
  is that follows the filename, if any) use tabs.  This is what the unix
  'diff' command does, for example, and difflib was intended to follow
  the 'standard' way of doing diffs.  This improves compatibility with
  patch tools.  The docs and examples are also changed to recommend that
  the date format used be the ISO 8601 format, which is what modern diff
  tools emit by default.

  Patch by Anatoly Techtonik.
........
2010-04-12 16:58:02 +00:00
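
The header format described in the commit above can be seen with a short sketch. The file names, contents, and ISO 8601 timestamps below are hypothetical sample values.

```python
import difflib

# Hypothetical sample data.
old_lines = ["spam\n"]
new_lines = ["eggs\n"]

diff = list(
    difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="old.txt",
        tofile="new.txt",
        fromfiledate="2010-04-12T12:35:19",
        tofiledate="2010-04-12T16:58:02",
    )
)

# The file name and the date are separated by a tab, as Unix diff does.
from_header, to_header = diff[0], diff[1]
```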
Raymond Hettinger
58c8c262f8 Add another example to the seealso section. 2009-04-27 21:01:21 +00:00
Georg Brandl
c2a4f4fb67 Update signature style for optional arguments, part 3. 2009-04-10 09:03:43 +00:00
Georg Brandl
e6bcc9145e Remove many "versionchanged" items that didn't use the official markup,
but just some text embedded in the docs.

Also remove paragraph about implicit relative imports from tutorial.
2008-05-12 18:05:20 +00:00
Benjamin Peterson
e41251e864 Merged revisions 62490 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r62490 | benjamin.peterson | 2008-04-24 20:29:10 -0500 (Thu, 24 Apr 2008) | 2 lines

  reformat some documentation of classes so methods and attributes are under the class directive
........
2008-04-25 01:59:09 +00:00