37 Commits

Author SHA1 Message Date
Miss Islington (bot)
58a1a76bae bpo-35922: Fix RobotFileParser when robots.txt has no relevant crawl delay or request rate (GH-11791)
Co-Authored-By: Tal Einat <taleinat+github@gmail.com>
(cherry picked from commit 8047e0e1c6)

Co-authored-by: Rémi Lapeyre <remi.lapeyre@henki.fr>
2019-06-16 00:07:54 -07:00
Christopher Beacham
5db5c0669e bpo-21475: Support the Sitemap extension in robotparser (GH-6883) 2018-05-16 10:52:07 -04:00
Michael Lazar
bd08a0af2d bpo-32861: urllib.robotparser fix incomplete __str__ methods. (GH-5711)
The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra
newlines that were being appended to the end of the string.
2018-05-14 17:10:41 +03:00
Berker Peksag
3df02dbc8e bpo-31325: Fix usage of namedtuple in RobotFileParser.parse() (#4529) 2017-11-23 15:40:26 -08:00
Antoine Pitrou
a6a4dc816d bpo-31370: Remove support for threads-less builds (#3385)
* Remove Setup.config
* Always define WITH_THREAD for compatibility.
2017-09-07 18:56:24 +02:00
Berker Peksag
9a7bbb2e3f Issue #25400: RobotFileParser now correctly returns default values for crawl_delay and request_rate
Initial patch by Peter Wirtz.
2016-09-18 20:17:58 +03:00
Berker Peksag
2a8d7f1c47 Issue #28151: Use pythontest.net in test_robotparser 2016-09-18 11:21:57 +03:00
Berker Peksag
a3c1728bb6 Use HTTP in testPythonOrg 2016-09-11 15:46:47 +03:00
Berker Peksag
966ad74bf9 Unskip testPythonOrg in test_robotparser
We should probably use pythontest.net for this.
2016-09-11 15:27:07 +03:00
Berker Peksag
2a9f5edeeb Wrap testPasswordProtectedSite with @reap_threads 2016-09-11 15:17:53 +03:00
Berker Peksag
4da0fd06ce Issue #25497: Rewrite test_robotparser to use a class based design 2016-09-11 14:53:16 +03:00
Serhiy Storchaka
e437a10d15 Issue #23277: Remove unused imports in tests. 2016-04-24 21:41:02 +03:00
Berker Peksag
960e848f0d Issue #16099: RobotFileParser now supports Crawl-delay and Request-rate
extensions.

Patch by Nikolay Bogoychev.
2015-10-08 12:27:06 +03:00
Berker Peksag
ad324f6bcc Issue #20753: Skip PasswordProtectedSiteTestCase when Python is built without threads. 2014-06-29 15:54:56 +03:00
Senthil Kumaran
601d6ec693 issue20753 - robotparser tests should not rely upon external resource when not required.
Specifically, it was relying on a URL which gave HTTP 403 and used it to assert
its methods; this change undoes that and provides a local HTTP server with
similar properties.

Patch contributed by Vajrasky Kok.
2014-06-25 02:58:15 -07:00
Zachary Ware
66f2928479 Issue #18492: Allow all resources when tests are not run by regrtest.py.
This changeset also includes cleanup allowed by this behavior change.
2014-06-02 16:01:29 -05:00
Georg Brandl
89e5671be7 #20719: Disable the robotparser python.org test until the gzip encoding issue can be sorted. 2014-02-23 08:45:15 +01:00
Senthil Kumaran
c70a6ae49b #17403: urllib.parse.robotparser normalizes the urls before adding to ruleline.
This helps in handling certain types of invalid urls in a conservative manner.
2013-05-29 05:54:31 -07:00
Ezio Melotti
0fb37ea34d #17066: test_robotparser now works with unittest test discovery. Patch by Zachary Ware. 2013-03-12 07:49:12 +02:00
Antoine Pitrou
95531ea2f1 Avoid failing in test_robotparser when mueblesmoraleda.com is flaky and
an overzealous DNS service (e.g. OpenDNS) redirects to a placeholder
Web site.
2011-07-08 19:43:51 +02:00
Antoine Pitrou
8bc09039ed Improve transient_internet() again to detect more network errors,
and use it in test_robotparser. Fixes #8574.
2010-09-07 21:09:09 +00:00
Georg Brandl
0a0fc07d37 #4108: the first default entry (User-agent: *) wins. 2010-07-29 17:55:01 +00:00
Senthil Kumaran
3f8ab965f7 Fix Issue6325 - robotparse to honor urls with query strings. 2010-07-28 16:27:56 +00:00
Florent Xicluna
41fe615539 (partially)
Merged revisions 79534,79537,79539,79558,79606 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r79534 | florent.xicluna | 2010-03-31 23:21:54 +0200 (mer, 31 mar 2010) | 2 lines

  Fix test for xml.etree when using a non-ascii path.  And use check_warnings instead of catch_warnings.
........
  r79537 | florent.xicluna | 2010-03-31 23:40:32 +0200 (mer, 31 mar 2010) | 2 lines

  Fix typo
........
  r79539 | florent.xicluna | 2010-04-01 00:01:03 +0200 (jeu, 01 avr 2010) | 2 lines

  Replace catch_warnings with check_warnings when it makes sense.  Use assertRaises context manager to simplify some tests.
........
  r79558 | florent.xicluna | 2010-04-01 20:17:09 +0200 (jeu, 01 avr 2010) | 2 lines

  #7092: Fix some -3 warnings, and fix Lib/platform.py when the path contains a double-quote.
........
  r79606 | florent.xicluna | 2010-04-02 19:26:42 +0200 (ven, 02 avr 2010) | 2 lines

  Backport some robotparser test and skip the test if the external resource is not available.
........
2010-04-02 18:52:12 +00:00
Antoine Pitrou
1bfd0cc5d4 Furniture is not very reliable these days (buildbot failures). 2010-04-02 17:12:12 +00:00
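
Several commits in this history touch the Crawl-delay, Request-rate and Sitemap handling of urllib.robotparser (Issue #16099, Issue #25400, bpo-21475, bpo-35922). The following is a minimal sketch of that behaviour, not code taken from any of these commits. The robots.txt content, the "figtree" and "otherbot" agent names and the example.com URLs are made up for illustration, and site_maps() assumes Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed locally so no network access is needed.
ROBOTS_TXT = """\
User-agent: figtree
Crawl-delay: 3
Request-rate: 9/30
Disallow: /tmp

User-agent: *
Disallow: /private

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An agent with its own entry gets the values declared for it.
figtree_delay = parser.crawl_delay("figtree")
figtree_rate = parser.request_rate("figtree")  # named tuple: requests, seconds

# An agent that only matches the "*" entry gets None, because that
# entry declares neither Crawl-delay nor Request-rate.
other_delay = parser.crawl_delay("otherbot")
other_rate = parser.request_rate("otherbot")

# Sitemap lines are collected independently of any User-agent entry.
sitemaps = parser.site_maps()

# The "*" entry still controls what an unlisted agent may fetch.
private_allowed = parser.can_fetch("otherbot", "http://www.example.com/private/x")
public_allowed = parser.can_fetch("otherbot", "http://www.example.com/index.html")

print(figtree_delay, figtree_rate, other_delay, other_rate)
print(sitemaps, private_allowed, public_allowed)
```

Feeding parse() a list of lines mirrors how the test suite can exercise the parser without relying on an external site, which is the concern raised in several of the test-related commits above.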