Senthil Kumaran
2c4810efa2
#17403 : urllib.parse.robotparser normalizes the urls before adding to ruleline.
This helps in handling certain types of invalid urls in a conservative manner.
2013-05-29 05:58:47 -07:00
Georg Brandl
2bd953e291
Merged revisions 83238 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r83238 | georg.brandl | 2010-07-29 19:55:01 +0200 (Do, 29 Jul 2010) | 1 line
#4108 : the first default entry (User-agent: *) wins.
........
2010-08-01 20:59:03 +00:00
Senthil Kumaran
a4f79f97db
Merged revisions 83209 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r83209 | senthil.kumaran | 2010-07-28 21:57:56 +0530 (Wed, 28 Jul 2010) | 3 lines
Fix Issue6325 - robotparse to honor urls with query strings.
........
2010-07-28 16:35:35 +00:00
Skip Montanaro
1ef19f0de1
Close issue 3437 - missing state change when Allow lines are processed.
Adds test cases which use Allow: as well.
2008-07-27 00:49:02 +00:00
Benjamin Peterson
0522a9f1eb
#1778443 robotparser fixes from Aristotelis Mikropoulos
2008-07-12 23:41:19 +00:00
Skip Montanaro
b8bdbc04e7
Get rid of _test(), _main(), _debug() and _check(). Tests are no longer
needed (better set available in Lib/test/test_robotparser.py). Clean up a
few PEP 8 nits (compound statements on a single line, whitespace around
operators).
2008-04-28 03:27:53 +00:00
Skip Montanaro
1a41313684
fixes 813986
2007-08-28 23:22:52 +00:00
Georg Brandl
4ffc8f5107
Patch #1555098 : use str.join() instead of repeated string
concatenation in robotparser.
2007-03-13 09:41:31 +00:00
Martin v. Löwis
31bd529f53
Patch #1014237 : Consistently return booleans throughout.
2004-08-23 20:42:35 +00:00
Raymond Hettinger
bac788a3cd
Replace str.find() != -1 with the more readable "in" operator.
2004-05-04 09:21:43 +00:00
Raymond Hettinger
2d95f1ad57
SF patch #911431 : robot.txt must be robots.txt
(Contributed by George Yoshida.)
2004-03-13 20:27:23 +00:00
Guido van Rossum
68468eba63
Get rid of many apply() calls.
2003-02-27 20:14:51 +00:00
Neal Norwitz
5aee504ccb
Remove import of re, it is not used
2002-05-31 14:14:06 +00:00
Raymond Hettinger
aef22fb9cd
Patch 560023 adding docstrings. 2.2 Candidate (after verifying modules were not updated after 2.2).
2002-05-29 16:18:42 +00:00
Tim Peters
bc0e910826
Convert a pile of obvious "yes/no" functions to return bool.
2002-04-04 22:55:58 +00:00
Martin v. Löwis
73f570ba08
Correctly set default entry in all cases.
2002-03-18 10:43:18 +00:00
Martin v. Löwis
d22368ffb3
Patch #499513 : use readline() instead of readlines(). Removed the
unnecessary redirection limit code which is already in FancyURLopener.
2002-03-18 10:41:20 +00:00
Martin v. Löwis
1c63f6e489
Correct various errors:
- Use substring search, not re search for user-agent and paths.
- Consider * entry last. Unquote, then requote URLs.
- Treat empty Disallow as "allow everything".
Add test cases. Fixes #523041
2002-02-28 15:24:47 +00:00
Andrew M. Kuchling
e7abf97903
Remove unused import (PyChecker)
2001-08-13 14:43:43 +00:00
Tim Peters
0e6d213177
Whitespace normalization.
2001-02-15 23:56:39 +00:00
Skip Montanaro
5bba231d1e
The bulk of the credit for these changes goes to Bastian Kleineidam
* restores urllib as the file fetcher (closes bug #132000 )
* allows checking URLs with empty paths (closes patches #103511 and 103721)
* properly handle user agents with versions (e.g., SpamMeister/1.5)
* added several more tests
2001-02-12 20:58:30 +00:00
Eric S. Raymond
141971f22a
String method conversion.
2001-02-09 08:40:40 +00:00
Tim Peters
dfc538acae
Whitespace normalization.
2001-01-21 04:49:16 +00:00
Skip Montanaro
e99d5ea25b
added __all__ lists to a number of Python modules
added test script and expected output file as well
this closes patch 103297.
__all__ attributes will be added to other modules without first submitting
a patch, just adding the necessary line to the test script to verify
more-or-less correct implementation.
2001-01-20 19:54:20 +00:00
Skip Montanaro
663f6c2ad2
rewrite of robotparser.py by Bastian Kleineidam. Closes patch 102229.
2001-01-20 15:59:25 +00:00