HarvestMan Downloads
HarvestMan Packages
You can download HarvestMan packages here. The packages are listed in reverse chronological order (latest releases first).
Package Information
The latest release is 1.4.6 final. The previous release is 1.4.5 final.
Release 1.4.6 final (Released: Sep 09 2005)
- HarvestMan 1.4.6 final (Source tar.bz2) (99225 bytes) (Updated Sep 09 2005 IST 14.45)
- Win32 installer for HarvestMan 1.4.6 (win32 executable) (1804921 bytes) (Updated Sep 09 2005 IST 17.15)
Release 1.4.5 final (Released: Aug 19 2005)
Release 1.4 final (Uploaded: Dec 17 2004)
The final version of HarvestMan (HarvestMan-1.4) can be downloaded here.
(NOTE: The py2exe install can be done by using the file py2exesetup.bat)
Sample XML configuration file for all releases starting from 1.4.5 final
From version 1.4.5 onwards, HarvestMan does not require HTML Tidy, so tidy is no
longer bundled with HarvestMan or supported from this version on. Documentation for
this release will be available after the final 1.4.5 release.
HarvestMan & HTML Tidy
Please note that the Tidy library supplied with HarvestMan is only the Python wrapper for HTML Tidy. For it to work, your system must contain the original HTML Tidy libraries. More information can be found at the Html Tidy Project Page.
However, an additional HarvestMan package has been added here (Dec 19 2004) which contains a pre-built version of libtidy for Linux x86 systems. The install script of HarvestMan will automatically copy this library and configure it for you. It is built on Fedora Core 2.0 using gcc-3.3.3. It might work for other Unix systems, but it is not guaranteed. However, it should work for most Linux x86 systems.
Tidylib INSTALLATION
Update (Dec 19 2004)
The Python tidy library supplied on this page requires the tidylib library, which can be obtained from the HTML Tidy project page at http://tidy.sourceforge.net/. However, the package below has been updated with a tidylib library built on Linux x86 (Fedora Core 2), which can be found under the "pvt_ctypes" directory. After unarchiving the file as described below, copy the tidylib library (named libtidy-0.99.so.0.0.0) to "/usr/lib" and run the following as "root".
% cd /usr/lib
% ln -s libtidy-0.99.so.0.0.0 libtidy-0.99.so.0
% ln -s libtidy-0.99.so.0 libtidy.so
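For readers who would rather script this step, the two links can also be created from Python. The following is only a sketch: the function name link_tidy_library is hypothetical and not part of HarvestMan, it is written for a current Python 3 rather than the Python 2.3 era interpreter this page targets, and the demo runs in a scratch directory with an empty placeholder file instead of touching /usr/lib.

```python
import os
import tempfile


def link_tidy_library(lib_dir, real_name="libtidy-0.99.so.0.0.0"):
    """Create the libtidy symlink chain inside lib_dir (hypothetical helper).

    Mirrors the two `ln -s` commands: both links are relative and live
    in the same directory as the real library file.
    """
    chain = [
        ("libtidy-0.99.so.0", real_name),        # soname link -> real file
        ("libtidy.so", "libtidy-0.99.so.0"),     # generic name -> soname link
    ]
    for link_name, target in chain:
        link_path = os.path.join(lib_dir, link_name)
        # Only replace an existing symlink; a regular file with the same
        # name is left alone and os.symlink will raise FileExistsError.
        if os.path.islink(link_path):
            os.remove(link_path)
        os.symlink(target, link_path)
    # Resolve the whole chain so the caller can see where it ends up.
    return os.path.realpath(os.path.join(lib_dir, "libtidy.so"))


if __name__ == "__main__":
    # Dry run in a scratch directory with a placeholder file, not /usr/lib.
    with tempfile.TemporaryDirectory() as scratch:
        open(os.path.join(scratch, "libtidy-0.99.so.0.0.0"), "wb").close()
        print(link_tidy_library(scratch))
```

Pointing lib_dir at "/usr/lib" would reproduce the commands above and, like them, needs root privileges.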
Download the file 'tidy.tar.gz', then gunzip and untar it inside the 'HarvestMan' directory. This will create a directory named 'tidy' inside the HarvestMan folder. When you set up HarvestMan with distutils, this directory will also be copied to your Python installation folder. The archive contains tidy shared libraries for the Windows and Linux platforms, so it should work on both. HarvestMan won't work with tidy on other platforms.
- Html Tidy Libraries for HarvestMan, with libtidy (348436 bytes) (Updated Dec 19 2004)
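Where gunzip and tar are not at hand (for example on Windows), the unpacking step can be done with Python's standard tarfile module. This is a minimal sketch, assuming the archive really does contain a top-level 'tidy' directory as described above. The function name unpack_tidy is hypothetical, and the code targets a current Python 3.

```python
import os
import tarfile


def unpack_tidy(archive_path, harvestman_dir):
    """Extract tidy.tar.gz inside harvestman_dir and return the 'tidy' path.

    Hypothetical helper: it assumes the archive holds a top-level 'tidy'
    directory and raises RuntimeError if that directory does not appear.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(harvestman_dir)
    tidy_dir = os.path.join(harvestman_dir, "tidy")
    if not os.path.isdir(tidy_dir):
        raise RuntimeError("archive did not create a 'tidy' directory")
    return tidy_dir
```

A call such as unpack_tidy("tidy.tar.gz", "HarvestMan") would correspond to running gunzip and tar inside the HarvestMan directory. Only extract archives obtained from a source you trust.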
Other Html Tidy Problems
The Python HTML Tidy wrapper requires the ctypes module to work. The tidy package includes a private installation of ctypes, and uses it if it does not find a ctypes module installed on your system. However, the private ctypes version requires the presence of the library libffi, specifically libffi-2.00-beta.
If you get an error like the one below when you run HarvestMan with the tidy option set, you need to get libffi-2.00-beta.
File "/usr/lib/python2.3/site-packages/HarvestMan/tidy/pvt_ctypes/ctypes.zip/ctypes/__init__.py", line 13, in ?
ImportError: libffi-2.00-beta.so: cannot open shared object file: No such file or directory
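Before running HarvestMan, it may help to check whether the dynamic loader can find the library named in that error. The snippet below is a rough sketch, assuming a Python 3 interpreter whose own ctypes module already works (it does not use the private ctypes copy shipped in the tidy package). The helper name can_load is hypothetical.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    name = "libffi-2.00-beta.so"  # file name taken from the ImportError above
    print(name, "loadable:", can_load(name))
```

If this reports that the library is not loadable, the ImportError shown above is the likely result when the tidy option is set.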
libffi-2.00-beta can be downloaded as part of libffi-3.4.3-2. This can be obtained from,
Downloading and installing the above library should fix the libffi problem with tidy.
HarvestMan Documentation
Download documentation on HarvestMan. This contains the main documentation, the change log for the latest release and the changes.txt for all releases.
- HarvestMan Documentation, tarred & gzipped (39894 bytes)
- The HarvestMan Web Crawler