This is very alpha stuff, made by a Python newbie and middling Zope oldbie.
:)

KebasData fetches a URL and parses the page for the pattern you specify. If no start and end patterns
are given, the whole page is gulped. If they are specified, only the part of the page between them is used.

To fetch data daily, set up a crontab entry or use Xron.


I would like to collect the URLs and regex patterns everybody uses and ship them with the tarball,
so mail me your URLs and regex patterns and I'll include them in KebasData.tar.gz.

This is really an exercise in making a Zope Product, so pointers, comments, advice on coding style, etc.
are welcome.

The lowdown
------------
url: the URL you want to retrieve.
pattern: the regular expression that matches the data you want in the page.
start pattern: a unique string in the page that marks where the search starts.
If left blank, the search starts at the first character of the page.
end pattern: a unique string that marks where the search ends. If left blank, the search runs to the last character of the page.
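The four parameters above can be sketched in Python roughly as follows. This is a minimal sketch, not KebasData's actual code: the function names `fetch_page` and `extract` are hypothetical, and it assumes that the markers themselves are excluded from the searched region, that a marker which cannot be found falls back to the page boundary, that a bare host such as `userfriendly.org/static` gets `http://` prepended, and that matching is done with `re.findall`.

```python
import re
import urllib.request


def fetch_page(url):
    """Download a page and return it as text (hypothetical helper)."""
    # Assumption: URLs without a scheme are treated as plain http.
    if "://" not in url:
        url = "http://" + url
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract(page, pattern, start_pattern="", end_pattern=""):
    """Return every match of `pattern` between the start and end markers."""
    # Blank start pattern: begin at the first character of the page.
    start = 0
    if start_pattern:
        pos = page.find(start_pattern)
        if pos != -1:
            # Assumption: the marker itself is not part of the searched text.
            start = pos + len(start_pattern)
    # Blank end pattern: run to the last character of the page.
    end = len(page)
    if end_pattern:
        pos = page.find(end_pattern, start)
        if pos != -1:
            end = pos
    # findall returns strings, or tuples when the pattern has 2+ groups.
    return re.findall(pattern, page[start:end])
```

Under these assumptions, `extract(fetch_page("userfriendly.org/static"), r"<img.*?>", "<!--Start Current Strip-->", "<!--End Strip-->")` would return only the image tags between the two markers.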

For example, to get the daily UserFriendly strip:
url: userfriendly.org/static
pattern: <img.*?>
start pattern: <!--Start Current Strip-->
end pattern: <!--End Strip-->
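To see what these settings pick out, here is a small offline run of the same idea. The HTML below is made up for illustration and is not the real userfriendly.org page; it only contains the two marker comments so the slicing and matching can be followed step by step.

```python
import re

# Made-up stand-in for the userfriendly.org/static page.
page = """<html><body>
<img src="/images/logo.gif">
<!--Start Current Strip-->
<a href="/cartoons/"><img src="strip.gif" alt="Strip"></a>
<!--End Strip-->
<img src="/images/ad.gif">
</body></html>"""

start_marker = "<!--Start Current Strip-->"
end_marker = "<!--End Strip-->"

# Keep only the text between the two markers.
start = page.find(start_marker) + len(start_marker)
end = page.find(end_marker, start)
strip = page[start:end]

# The non-greedy .*? stops at the first ">", so each tag is matched separately.
matches = re.findall(r"<img.*?>", strip)
print(matches)
```

The logo and ad images fall outside the markers, so only the strip's `<img>` tag is matched.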


To get the top news from cnn.com/CNNI:

url: www.cnn.com/CNNI
pattern: <a href=(.*?)>(.*?)</a>
start pattern: <!-- Profil
end pattern: <!-- FN and SI Content -->

This gives you a list of tuples, which you can then access, whole or item by item, from a DTML Method
(myid is the id of your KebasData object):
<dtml-var expr="myid.match"> gives all the items grabbed/matched.
<dtml-in expr="myid.match"> do stuff</dtml-in> iterates through the list.
<dtml-var expr="myid.match[0][0]"> gives the first group of the first match.
etc.
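The DTML expressions above are ordinary list and tuple operations on `match`. As a rough Python equivalent, assuming `match` holds what `re.findall` returns for the CNN pattern, here is the same access on a made-up fragment (not real CNN markup):

```python
import re

# Made-up stand-in for the text between the CNN start and end markers.
fragment = "<a href=/WORLD/>World news</a> <a href=/TECH/>Tech news</a>"

# Two groups in the pattern, so each match is an (href, title) tuple.
match = re.findall(r"<a href=(.*?)>(.*?)</a>", fragment)

print(match)                # the whole list, like <dtml-var expr="myid.match">
for href, title in match:   # one tuple per pass, like <dtml-in expr="myid.match">
    print(href, title)
print(match[0][0])          # first group of the first match, like myid.match[0][0]
```

Here `match[0][1]` would be the title of the first link, and `match[1]` the whole second tuple.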


This view helps you determine how to manipulate the retrieved data.
You get a list of items, or a list of tuples if your regular expression uses grouping.
For example:
 <img.*?> will return a list of data
 <img(.*?)alt(.*?)> will return a list of tuples.

You can access the data as you would any Python list.
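The list-versus-tuples behaviour matches Python's `re.findall`. Assuming KebasData uses `re.findall` internally (an assumption, not confirmed above), the sketch below shows both cases on a made-up HTML line, plus one subtlety: a pattern with exactly one group returns a list of plain strings (the group's text), not one-element tuples.

```python
import re

# Made-up HTML for illustration.
html = '<img src="a.gif" alt="A"> <img src="b.gif" alt="B">'

# No groups: each item is the full matched text.
no_groups = re.findall(r"<img.*?>", html)

# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"<img(.*?)alt(.*?)>", html)

# Exactly one group: each item is just that group's text, not a tuple.
one_group = re.findall(r"<img(.*?)>", html)

print(no_groups)
print(two_groups)
print(one_group)
```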

