Commit Graph

35 Commits

Author SHA1 Message Date
Guido van Rossum
909bc18188 Recover from failed saves; when a file turns out to be a directory,
create a directory and move the original file to the index.html.
1999-01-03 13:06:00 +00:00
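The recovery described in this commit can be sketched roughly as follows. This is a minimal illustration in modern Python, not the code from the commit; the helper name `make_dirs` and the `.tmp` suffix are assumptions made for the example.

```python
import os

def make_dirs(directory):
    """Hypothetical helper: create `directory`, demoting any plain file in the way."""
    if not directory or os.path.isdir(directory):
        return
    # Make sure every parent exists (and is a directory) first.
    make_dirs(os.path.dirname(directory))
    if os.path.isfile(directory):
        # An earlier save stored a page under this name, but it is really
        # a directory: move the saved page to <directory>/index.html.
        tmp = directory + ".tmp"  # assumed temporary name
        os.rename(directory, tmp)
        os.mkdir(directory)
        os.rename(tmp, os.path.join(directory, "index.html"))
    else:
        os.mkdir(directory)
```

Calling `make_dirs` on the parent of a path before opening the file for writing lets a save succeed even when an ancestor was previously mirrored as a plain file.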
Guido van Rossum
a42c1ee21d Added note() message to Page class -- this was used but didn't exist.
(The alternative would be to call self.checker.note() but since
self.checker might be None that's not quite right.)
1998-08-06 21:31:13 +00:00
Guido van Rossum
b77a68e6b1 Rewrite to support multiple suckers, each with their own thread. 1998-07-08 03:05:22 +00:00
Guido van Rossum
125700addb Instead of printing, use self.message() or self.note(). 1998-07-08 03:04:39 +00:00
Guido van Rossum
0a13f7f23a # This is a new module I wrote over the weekend. Again, you missed the
# checkin email because my PC doesn't have the "Mail" command.

Add threading (now that it works).  Also some small adaptations to
Unix again.
1998-06-15 14:49:16 +00:00
Guido van Rossum
e3bd82117f Primitive GUI for websucker. 1998-06-15 12:35:19 +00:00
Guido van Rossum
d328a9b5f4 Fix the way a trailing / is changed to /index.html so that it
doesn't depend on the value of os.sep.  (I.e. ported to Windows :-)
1998-06-15 12:34:41 +00:00
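One way to make the trailing-slash rule independent of os.sep is to apply it while the path is still a URL path, where the separator is always "/", and only then convert to a local path. The sketch below is written in modern Python; the helper `url_to_local_path` and its `root` parameter are hypothetical and are not taken from the commit.

```python
import os
from urllib.parse import urlparse

def url_to_local_path(url, root="."):
    """Hypothetical helper: map a URL to a path in a local mirror tree."""
    parts = urlparse(url)
    path = parts.path or "/"
    # Decide on index.html in URL space, where the separator is
    # always "/", whatever os.sep happens to be.
    if path.endswith("/"):
        path += "index.html"
    # Split on "/" and rejoin with the platform's own separator.
    segments = [s for s in path.split("/") if s]
    return os.path.join(root, parts.netloc, *segments)
```

Because the check uses the literal "/", the same URL maps to the same file name on Unix and Windows; only the separators in the returned local path differ.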
Guido van Rossum
6eb9d32c43 sort the urls in the todo list 1998-06-15 12:33:02 +00:00
Guido van Rossum
bee64533d6 Use a try-except so that the pickle file is written even when we die
because of an unexpected exception.
1998-04-27 19:35:15 +00:00
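The pattern this commit describes can be illustrated as below. Note that the commit mentions try-except, while this sketch uses try/finally, which is a substitution made for brevity. The names `run_and_checkpoint`, `work`, `state`, and `dumpfile` are assumptions for the example.

```python
import pickle

def run_and_checkpoint(work, state, dumpfile):
    """Hypothetical driver: run work(state) and always save state afterwards."""
    try:
        work(state)
    finally:
        # Runs on a normal return and on any exception alike, so the
        # checkpoint is written even when the run dies unexpectedly.
        with open(dumpfile, "wb") as f:
            pickle.dump(state, f)
```

The exception still propagates to the caller after the checkpoint is written, so the failure is not hidden.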
Guido van Rossum
986abac1ba Give in to tabnanny 1998-04-06 14:29:28 +00:00
Guido van Rossum
88b02cf346 Use a better way to bind the checkext instance variable to a check
button widget, not involving a __getattr__() method but a callback on
the widget.
1998-03-05 20:12:18 +00:00
Guido van Rossum
1a7eae919a Adapt to new webchecker structure. Due to better structure of
getpage(), much less duplicate code is needed -- we only need to
override readhtml().
1998-02-21 20:08:39 +00:00
Guido van Rossum
00756bd4a6 Major overhaul. Don't use global variable (e.g. verbose); use
instance variables.  Make all global functions methods, for easy
overriding.  Restructure getpage() for easy overriding.  Add
save_pickle() method and load_pickle() global function to make it
easier for other programs to emulate the toplevel interface.
1998-02-21 20:02:09 +00:00
Guido van Rossum
f326134e5c Map .shtml to text/html. 1997-10-07 14:56:42 +00:00
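In modern Python, a mapping like this one can be registered through the standard mimetypes module. Whether the commit used this module is not stated in the log, so treat the snippet as an assumption-laden illustration only.

```python
import mimetypes

# Register .shtml (server-parsed HTML) as ordinary HTML so that
# such pages are treated like any other HTML page.
mimetypes.add_type("text/html", ".shtml")

ctype, encoding = mimetypes.guess_type("page.shtml")
```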
Guido van Rossum
d57548023f A variant on webchecker that creates a mirror copy of a remote site. 1997-10-06 18:54:25 +00:00
Guido van Rossum
2237b73baf Several changes:
- Change the code that looks for robots.txt to always look in /, even
if the "root" path is somewhere deep down below.

- Add link processing in <AREA> tags.

- Change safeclose() to avoid crashing when the file has no geturl()
method.
1997-10-06 18:54:01 +00:00
Guido van Rossum
68bdad1015 Tiny script to play with it on a Mac. 1997-05-28 16:09:02 +00:00
Guido van Rossum
29f6533c7f Scroll to top of info window when done. 1997-05-09 03:19:29 +00:00
Guido van Rossum
89efda363f Avoid the fancy handler for error 401 (request authentication). 1997-05-07 15:00:56 +00:00
Guido van Rossum
af310c1d00 Restructured Checker class to get rid of 'ext' table.
Links are now either in 'todo' or 'done', and ext links
are handled more like local links except that no further
links are gathered (and sometimes they aren't checked,
e.g. for mailto and news URLs).  The -x option reverses
its meaning: it disables checking of ext links (they are
moved to 'done' without checking).  A new 'errors' table
collects pages with bad links as we go -- redundant,
but useful for the GUI version which needs to report
this as we go.  Some new methods, including reset().
New checkpoint format.

Adapted the GUI to the changes in the Checker class.
Added Quit and "Start over" buttons, and a checkbox
to disable checking external links.  The details
window now also shows bad links emanating from the
selected page.  Miscellaneous small changes.
1997-02-02 23:30:32 +00:00
Guido van Rossum
4f6ecdaacf Add root URL entry box, separate start/stop/step buttons.
If the user selects an item in 'To check', start checking there.
1997-02-01 05:17:29 +00:00
Guido van Rossum
6133ec656e Process <img> and <frame> tags. Don't bother skipping second href. 1997-02-01 05:16:08 +00:00
Guido van Rossum
de99d310cc Check in another copy of tktools.py... 1997-01-31 18:58:53 +00:00
Guido van Rossum
06981c328d Tk interface to webchecker. Not fully featured yet, but usable. 1997-01-31 18:58:12 +00:00
Guido van Rossum
0b0b5f0279 Spin off checking of external page in a subroutine.
Increase MAXPAGE to 150K.
Add back printing of __doc__ for usage message.
1997-01-31 18:57:23 +00:00