README for the SilvaReferenceChecker
Introduction
Silva Reference Checker is an auxiliary product for checking references on a site maintained with the Silva CMS.
If You have not installed Silva, get it from:
http://www.infrae.com/download/Silva/
This product checks both internal references, like included images or relative links to other documents, and external links. Its primary purpose is the internal reference check; the external reference check is currently just an add-on and may be less useful than other link checkers.
The reference checker only checks references; it provides no means to fix broken ones.
Additionally, it only checks references in the content; if there are broken links in the layout outside of the content, the reference checker will not complain, as it does not even see these references. This is not considered a missing feature of this product; please program Your layout templates carefully and they will not break.
Version information
This product has been tested with Silva 1.2, and it should run with any release of the Silva 1.2.x series; it also seems to work with both the 1.3 and 1.4 lines.
Infrae generously hosts the CVS repository for this product, which is available as:
cvs -z3 -d:pserver:[email protected]:/cvs/infrae co -d SilvaReferenceChecker greenhouse/SilvaReferenceChecker
Preparing the installation
Before installing, read BUGS.txt and decide whether You still want to give it a try.
Please check whether the Python installation used to run Zope supports "socket.ssl"; if it does not, the reference checker cannot check https URLs. You do not need this feature to use the reference checker; the only consequence is that all https:// links remain unchecked.
The simplest way to check this is to start the Python interpreter interactively and type "from socket import ssl". If this raises an ImportError, the interpreter does not support secure sockets. If SSL is installed on Your machine but Python does not use it, and You built Python from source, You may be able to fix the location of the SSL includes and libraries in the Modules/Setup file and recompile the interpreter to get SSL support.
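If You want to script this check, the following minimal sketch (not part of this product) reports whether the running interpreter can do SSL. It accepts either the old "socket.ssl" function used in the interactive check above or, as on Python 3 where "socket.ssl" no longer exists, the separate "ssl" module. The function name has_ssl_support is made up for this example.

```python
import socket


def has_ssl_support():
    """Return True if this interpreter can open secure sockets."""
    # Python 2 era interpreters exposed SSL as socket.ssl (the check above).
    if hasattr(socket, "ssl"):
        return True
    # Newer interpreters provide SSL through the separate ssl module instead.
    try:
        import ssl  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("SSL support:", has_ssl_support())
```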
Alternatively, You may consider using the lms (Link Monitoring Server). In that case You do not need SSL support in this installation, as the lms, not Zope, checks the links over SSL.
You can reuse any existing lms server; if You want to run Your own, get it from:
http://www.gocept.com/angebot/opensource/lms
Installation
- Unpack the tar-ball in the "Products" directory.
- In the newly created directory "SilvaReferenceChecker", copy "config.py.in" to "config.py". If You want to use lms, edit this file: uncomment the two lines importing the interface to the lms and set "lms_url" and "lms_client_id" to values valid for the lms server You want to use.
- Restart Zope.
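For the lms setup, the relevant part of "config.py" might end up looking roughly like the fragment below. Only the names lms_url and lms_client_id come from this README; the two import lines are the ones shipped (commented out) in "config.py.in" and are not reproduced here. Both values are hypothetical placeholders, and treating them as plain strings is an assumption.

```python
# config.py (fragment): uncomment the two lms import lines from
# config.py.in above this point; they are not reproduced here.

# Hypothetical placeholder: URL of the lms server You want to use.
lms_url = "http://lms.example.com/"

# Hypothetical placeholder: the client id that lms server knows Your site by.
lms_client_id = "my-silva-site"
```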
Visit Your Silva Root and select "Silva Reference Checker" from the add drop-down list.
Actually, You may place reference checkers anywhere inside a Silva Root; a reference checker will then only check the objects inside its container (and its subcontainers). On the other hand, a reference checker stays away from a subcontainer that already contains another reference checker, so You can keep the load of a single check manageable by placing additional reference checkers in subcontainers that have grown large.
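The scoping rule described above can be pictured with the following sketch. It is a simplified model, not the product's code: Container, has_checker and objects_to_check are hypothetical names, and real Silva containers are traversed through Zope rather than plain Python lists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    """Hypothetical stand-in for a Silva container (folder or publication)."""
    name: str
    objects: List[str] = field(default_factory=list)        # non-container content
    children: List["Container"] = field(default_factory=list)
    has_checker: bool = False                                # a reference checker lives here


def objects_to_check(container, is_start=True):
    """Yield the paths a checker placed in `container` would look at.

    Subcontainers holding their own reference checker are skipped,
    since that checker takes care of them.
    """
    if container.has_checker and not is_start:
        return
    for obj in container.objects:
        yield f"{container.name}/{obj}"
    for child in container.children:
        for path in objects_to_check(child, is_start=False):
            yield f"{container.name}/{path}"


big = Container("big", ["a"], has_checker=True)
root = Container("root", ["index"], [Container("docs", ["x", "y"]), big], has_checker=True)
print(list(objects_to_check(root)))  # ['root/index', 'root/docs/x', 'root/docs/y']
print(list(objects_to_check(big)))   # ['big/a']
```

In this model the checker in "root" skips the "big" subcontainer, which carries its own checker, while that checker covers "big" alone.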
If You have a "deep virtual host root" inside a Silva Root, You should place a separate reference checker in this root. (A "deep virtual host root" is a subcontainer serving as the document root of an Apache server running in front of Zope.) Please report bugs, as "deep virtual host root" support has not been tested very well yet.
ZMI Configuration
Before doing anything else, please check the "Configuration" tab of a reference checker. Most probably the values there need adjustment; otherwise You may get a lot of false positives.
You may want to change the following values:
- "Path absolute internal references"
This tells the reference checker how to resolve relative links (links without a leading "http://...") that start with a slash.
The following values are supported:
  - "Zope Root" -- if "/" is the Zope Root (especially if You are running Zope without a front end server).
  - "Silva Root" -- if "/" is the Silva Root (default).
  - "This Container" -- if the container the reference checker is located in is the public root.
    This option is meant for "deep virtual host" setups: if a front end server points to a folder inside the Silva Root, You should place a separate reference checker inside this folder and check this option.
- "Path relative internal references"
This tells the reference checker how to resolve relative links (links without a leading "http://...") that do not start with a slash.
Which value is correct depends on what the <base> tag of Your public rendering template says.
The following values are supported:
  - "Current Object" : resolve relative to the current object.
    This is the right value if You have a base tag like::
      <base tal:attributes="href python:'%s/' % here.absolute_url()" />
    or no <base> tag at all. (This is the case for the default layout.)
  - "Current Container" : resolve relative to the nearest container.
    This is the right value if You have a base tag like::
      <base tal:attributes="href python:'%s/' % here.container_url()" />
    This is the default, as I would recommend using such a tag. (It avoids problems with relative links in the "index"; see below.)
  - "Silva Root" : resolve relative to the Silva Root.
    This is the right value if You have a base tag like::
      <base tal:attributes="href python:'%s/' % here.silva_root()" />
    or::
      <base tal:attributes="href python:'%s/' % here.get_root_root()" />
    (I do not know if this value is really useful.)
Note 1: The option "Current Object" may lead to a lot of references classified as "Weird" in the "index" of a container. These references may nevertheless be fine when the index is viewed through the URL of its container.
Note 2: I would expect odd results if path-relative references are resolved to "Silva Root" while path-absolute ones are set to "This Container". This combination is simply not supported yet.
If none of the settings suits Your needs, You have to hack the source code to get the option You need; sorry.
- "Show unchecked references"
Check this if You want to see references that were not checked because of the configuration (see "Never check the URLs matching" below). Unchecking this suppresses the display of references that have not been checked.
- "Show warnings"
Uncheck this if You do not want to list external references that returned an HTTP status code of "3xx" (redirects).
- "Always consider URLs external if matching:"
If some parts of Your site are not served by Zope, links to these parts will be reported as "missing" by the reference checker, unless the checker is told that these links are actually external links.
For this You can add a list of patterns here (one per line) matching the links that point to resources external to Zope. Note that the patterns are interpreted as regular expressions.
Usually You can copy this information from the front end server configuration.
If all of the content is served by Zope, leave this section empty.
- "Never check the URLs matching"
This disables the check of certain external references that match a given pattern. References matching any of the patterns will be listed with a status of "unchecked" (unless listing of unchecked references is disabled, in which case they are not listed at all).
The rationale behind this option is to disable checks against servers that are known to be slow, are unreachable from the editorial server, require authorization, etc., or for which the link checker simply misbehaves due to bugs.
Note that the patterns are interpreted as regular expressions. For example, to skip checking any http or https references to the host www.refusenik.org, You may want to use a pattern like::
^https?://www\.refusenik\.org(/.*)?
By default this field contains the value "^https://.*" if Your Python installation does not support SSL, and is empty otherwise.
- "Seconds after which an external link will be considered as timeout"
This sets the timeout for opening connections to external servers. Note that this changes the default timeout for all sockets handled by the process running the Zope server, so setting this value too small may affect other resources (see also BUGS.txt :-/).
If Your Python version does not support this (it is not available in Python 2.1.3), this option will not show up at all.
- "Number of threads running in parallel for the external link check"
The external reference check via the ZMI runs in parallel so that slow servers do not block the complete check. Set the value to "something reasonable"; finding it takes some trial and error. Basically, setting the value too low results in a very slow checker, while setting it too high consumes too many resources and slows the server down for other tasks, or, in the worst case, makes it run out of memory.
Note that neither the check of internal references nor the check of references via the SMI is affected by this value.
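To see how the two "Path ... internal references" settings interact, the sketch below mimics browser-style link resolution with the standard library's urllib.parse.urljoin. It is an illustration under stated assumptions, not the checker's actual code: the Zope paths (/silva as Silva Root, /silva/folder as the container holding the checker), the function names, and the assumption that the object is not itself a container are all made up for the example.

```python
import posixpath
from urllib.parse import urljoin

SILVA_ROOT = "/silva/"  # hypothetical Zope path of the Silva Root

# "Path absolute internal references": where a leading "/" points to.
ABSOLUTE_ROOTS = {
    "Zope Root": "/",
    "Silva Root": SILVA_ROOT,
    "This Container": "/silva/folder/",  # hypothetical: checker sits in /silva/folder
}


def relative_base(obj_path, mode):
    """Base for links without a leading slash ("Path relative internal references").

    Assumes obj_path is a non-container object such as a document.
    """
    if mode == "Current Object":      # base tag built from here.absolute_url()
        return obj_path + "/"
    if mode == "Current Container":   # base tag built from here.container_url()
        return posixpath.dirname(obj_path) + "/"
    if mode == "Silva Root":
        return SILVA_ROOT
    raise ValueError("unknown mode: %r" % mode)


def resolve(link, obj_path, absolute_mode="Silva Root", relative_mode="Current Container"):
    """Return the Zope path that a relative link found in obj_path points to."""
    if link.startswith("/"):
        # Re-anchor the path-absolute link below the configured root.
        return urljoin(ABSOLUTE_ROOTS[absolute_mode], link.lstrip("/"))
    return urljoin(relative_base(obj_path, relative_mode), link)


doc = "/silva/folder/doc"
print(resolve("img.png", doc))                                  # /silva/folder/img.png
print(resolve("img.png", doc, relative_mode="Current Object"))  # /silva/folder/doc/img.png
print(resolve("/about", doc, absolute_mode="Zope Root"))        # /about
```

The defaults mirror the defaults listed above. The example also shows why the relative settings can disagree: with "Current Object" the same link ends up one level deeper than with "Current Container", presumably the kind of mismatch Note 1 refers to for a container's "index".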
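The two pattern fields ("Always consider URLs external if matching" and "Never check the URLs matching") can be pictured with the following simplified sketch. It is an assumption, not the product's implementation: the classify function, the sample "always external" patterns and the use of re.search (rather than, say, re.match) are made up for illustration; only the refusenik.org pattern is taken from above.

```python
import re

# Hypothetical field contents: one regular expression per line.
ALWAYS_EXTERNAL = """
^/static/
^/cgi-bin/
"""
NEVER_CHECK = r"""
^https?://www\.refusenik\.org(/.*)?
"""


def compile_patterns(field_text):
    """Compile every non-empty line of a configuration field."""
    return [re.compile(line.strip()) for line in field_text.splitlines() if line.strip()]


def classify(url, always_external, never_check):
    """Return "internal", "external" or "unchecked" for a reference (simplified)."""
    is_external = "://" in url or any(p.search(url) for p in always_external)
    if not is_external:
        return "internal"    # resolved inside Zope
    if any(p.search(url) for p in never_check):
        return "unchecked"   # excluded from checking
    return "external"        # handed to the external check


external = compile_patterns(ALWAYS_EXTERNAL)
never = compile_patterns(NEVER_CHECK)
for url in ["images/logo.png", "/static/app.css",
            "https://www.refusenik.org/page", "http://www.example.com/"]:
    print(url, "->", classify(url, external, never))
```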
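The timeout and thread options can be illustrated together. The product uses its own ThreadPool (see Credits); the sketch below swaps in the standard library's concurrent.futures.ThreadPoolExecutor instead and is only an approximation. check_url and check_all are hypothetical names, and urlopen follows redirects itself, so unlike the checker it does not report 3xx codes. Note how socket.setdefaulttimeout changes the timeout for every socket the process opens afterwards, which is the side effect warned about above.

```python
import http.client
import socket
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.request import urlopen


def check_url(url):
    """Fetch one URL and return (url, result): an HTTP status code or an error text."""
    try:
        with urlopen(url) as response:
            return url, response.status
    except HTTPError as exc:  # server answered with an error status
        return url, exc.code
    except (OSError, ValueError, http.client.HTTPException) as exc:
        # Unreachable host, timeout, malformed URL or broken HTTP response.
        return url, "error: %s" % exc


def check_all(urls, threads=10, timeout=30.0, checker=check_url):
    """Check all URLs using at most `threads` worker threads."""
    # Process-wide setting: affects every socket created from now on,
    # not just the ones used by this check.
    socket.setdefaulttimeout(timeout)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(checker, urls))
```

For example, check_all(urls, threads=5, timeout=20) checks the list with five workers; more workers help when the list is dominated by slow servers, at the cost of more memory and open connections.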
Installation to access the checker via the SMI
To allow editors to check their documents with the reference checker while editing them, one has to add some patches to the Silva installation.
To do this:
- Copy all files in the "patches" subdirectory of this product to the "views/edit/VersionedContent" subdirectory of the Silva product.
- Edit "macro_tab_edit.pt" in that directory; directly after::
    <metal:block fill-slot="middleground" [...some definitions here...] >
    <div class="middleground">
  insert something like::
    <tal:linkchecker condition="view/has_linkchecker" replace="structure view/render_linkchecking" />
- Restart the Zope server, unless it is running in debug mode.
This should add a "check references" button in the upper right corner of the edit tab of every versioned content object, provided there is a reference checker with the id "service_references".
Clicking this button should open a popup that starts the reference check "in the background", i.e. it refreshes every 10 seconds until all references of the current document are checked. (This prevents editors from hammering on the "check" button and triggering a lot of checks that bomb the Zope server when one check is somewhat slow.)
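The popup behaviour can be modelled as a simple polling loop: each refresh only asks how far the single background check has got and never starts a new one. The sketch below is hypothetical (the popup itself is a page template, not this Python code); get_progress stands for whatever reports the number of checked and total references, and sleep is injectable so the loop can be tested without waiting.

```python
import time


def poll_until_done(get_progress, interval=10.0, sleep=time.sleep):
    """Poll get_progress() every `interval` seconds until all references are checked.

    get_progress is a hypothetical callable returning (checked, total).
    Returns the number of polls it took.
    """
    polls = 0
    while True:
        polls += 1
        checked, total = get_progress()  # read-only: never triggers a new check
        if checked >= total:
            return polls
        sleep(interval)  # default of 10 seconds, as in the popup
```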
The presentation of the results does not currently meet the high standards of the Silva UI design... I hope it is readable, at least.
Checking References of Extension Products
The Silva Reference Checker does not check references of content types defined in extension products. If these content types should be checked too, they should register a reference retriever, as described in API.txt. If an extension content type inherits from Silva Document, the retriever registered for Silva Document is inherited unless overridden.
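The inheritance rule can be illustrated with a class-hierarchy lookup. This is a hypothetical sketch, not the registration API from API.txt (which is not reproduced here): RetrieverRegistry, Document and document_references are made-up names, with Document standing in for Silva Document.

```python
class RetrieverRegistry:
    """Hypothetical registry mapping content classes to reference retrievers."""

    def __init__(self):
        self._retrievers = {}

    def register(self, content_class, retriever):
        self._retrievers[content_class] = retriever

    def lookup(self, obj):
        """Return the retriever for obj, searching its class hierarchy."""
        # Walking the MRO makes a subclass inherit its base class's retriever
        # unless it registered one of its own.
        for cls in type(obj).__mro__:
            if cls in self._retrievers:
                return self._retrievers[cls]
        return None  # no retriever: the content type is not checked


class Document:  # stand-in for Silva Document
    pass


class FancyDocument(Document):  # extension type without its own retriever
    pass


def document_references(doc):
    return ["images/logo.png"]  # hypothetical: would extract links from doc


registry = RetrieverRegistry()
registry.register(Document, document_references)
retriever = registry.lookup(FancyDocument())  # finds document_references via Document
```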
Contact / Bug reports / Patches
Lacking a better place, I recommend the "Silva General" mailing list at [email protected] for discussion.
Please report bugs or feature requests to (crobbenhaar at web dot de) for the time being. Feature requests without patches are, however, dropped silently with high probability.
If people really start submitting patches to me, I may consider moving this product to a more accessible place like SourceForge, but for now I have not.
License
SilvaReferenceChecker is released under the BSD license.
See LICENSE.txt.
Credits
The initial revision of the reference checker has been hammered into shape by Albrecht Schmiedel, based on a first inoperable draft from Clemens Klein-Robbenhaar. Clemens Klein-Robbenhaar also supplied the patches to access reference checks via the SMI.
The ThreadPool implementation used here is due to Tim Lesher.