
This is very alpha stuff, made by a Python newbie and Zope in-the-middle oldbie
:)

KebasData fetches a URL and parses the page for the pattern you specify.  If no start and end patterns are given, the whole page is gulped.  If start and end patterns are specified, only the part of the page between them is used.

To get data daily, you can set up a crontab entry or use Xron.
In either case, use the get_matched() method.

I would like to compile all the URLs and regex patterns used by everybody and include them with the tarball.
So mail me your URLs and regex patterns, and I'll include them with KebasData.tar.gz.

This is really an exercise in making a Zope Product, so pointers, comments, tips on good coding style, etc.
are requested.

The lowdown
------------
url: the URL you want to retrieve.
pattern: the regular expression pattern that will match data in the page.
start pattern: a unique string in the page that determines your start location.
If left blank, the start is the first character in the page.
end pattern: a unique string in the page that determines your end location.
If left blank, the end is the last character in the page.
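
How these fields fit together can be sketched in plain Python. This is only an illustration of the behaviour described above, not the actual KebasData code: the helper name extract_matches and the sample page are made up, and dropping the marker strings themselves from the searched text is an assumption.

```python
import re


def extract_matches(page, pattern, start_pattern="", end_pattern=""):
    """Return every regex match found between the two marker strings.

    Hypothetical helper for illustration only; not part of KebasData.
    """
    start = 0
    end = len(page)
    if start_pattern:
        found = page.find(start_pattern)
        if found != -1:
            # Assumption: searching begins just after the start marker.
            start = found + len(start_pattern)
    if end_pattern:
        found = page.find(end_pattern, start)
        if found != -1:
            end = found
    # Assumption: re.DOTALL is used so that ".*?" may span line breaks.
    return re.findall(pattern, page[start:end], re.DOTALL)


# Made-up page content, loosely modelled on the userfriendly example.
sample_page = (
    '<img src="logo.gif">'
    '<!--Start Current Strip-->'
    '<img src="strip.gif">'
    '<!--End Strip-->'
    '<img src="ad.gif">'
)

# No start/end pattern: the whole page is searched.
whole = extract_matches(sample_page, r"<img.*?>")

# With start/end patterns: only the part between the markers is searched.
part = extract_matches(
    sample_page, r"<img.*?>", "<!--Start Current Strip-->", "<!--End Strip-->"
)

print(whole)
print(part)
```

With no markers all three image tags are returned; with the markers only the strip image is left.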

render_method: a DTML Method used to render the matches.
If you have a DTML Method with id "use_this", then put use_this in the
render_method field.
Check out the userfriendly example.

Starting with KebasData-0.0.1a2, you can pass a URL to get_matched() and do stuff with the results.

E.g., to get the daily userfriendly strip:
url: userfriendly.org/static
pattern: &lt;img.*?&gt;
start pattern: &lt;!--Start Current Strip--&gt; 
end pattern: &lt;!--End Strip--&gt;
render_method: show

Somewhere in your acquisition path, create a DTML Method called show with this code inside:

---------------start of show-----------------
&lt;dtml-var standard_html_header&gt;
&lt;dtml-var match&gt;
&lt;dtml-var standard_html_footer&gt;
---------------end---------------------------

You can then access the strip at "http://server.com/userfriendly"
or with &lt;dtml-var "userfriendly.view()"&gt;

To get the daily dilbert strip:
url: http://www.dilbert.com/
pattern: &lt;a href=(.*?)&gt;(.*?)&lt;/a&gt;
start pattern: &lt;!--COMIC STRIP BEGIN--&gt;
end pattern: &lt;!--COMIC STRIP END--&gt;
render_method: dilbert_method

Create a DTML Method with id dilbert_method:
-----------------start----------------------
&lt;img src=&lt;dtml-var "_.string.replace(match[0][0],'/comic','http://www.dilbert.com/comic')"&gt;&gt;
--------end---------------------------------

You can access the strip at http://server.com/dilbert/
or &lt;dtml-var "dilbert.view()"&gt;
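
What the dilbert_method expression does with match[0][0] can be tried out in plain Python. This is a rough sketch only: the HTML fragment and its image path are invented for the example, and it assumes the matches behave like the result of Python's re.findall.

```python
import re

# Invented fragment standing in for the text between the two comic strip markers.
fragment = '<a href="/comics/dilbert/archive/today.gif">Today</a>'

# Two groups in the pattern, so each match is a (href, link text) tuple.
match = re.findall(r'<a href=(.*?)>(.*?)</a>', fragment)

# Same idea as the DTML expression: turn the relative path into a full URL.
src = match[0][0].replace('/comic', 'http://www.dilbert.com/comic')
img_tag = '<img src=%s>' % src

print(match)
print(img_tag)
```

Here the quotes around the href value are captured as part of the first group, so they end up around the src value as well.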


To get top news from cnn.com/CNNI:

url: www.cnn.com/CNNI
pattern: &lt;a href=(.*?)&gt;(.*?)&lt;/a&gt;
start pattern: &lt;!-- Profil
end pattern: &lt;!-- FN and SI Content --&gt; 

This will give you a list of tuples.  You can then access any tuple or item from a DTML Method.
&lt;dtml-var expr="myid.match"&gt; will give all items grabbed/matched.
&lt;dtml-in expr="myid.match"&gt; will iterate through the list.
&lt;dtml-var expr="myid.match[0][0]"&gt; will give the first group of the first match.
etc.


This view helps you determine how you can manipulate the data retrieved.
You get a list of items, or a list of tuples if your regular expression uses grouping.
E.g.:
	&lt;img.*?&gt; will return a list of strings
	&lt;img(.*?)alt(.*?)&gt; will return a list of tuples.

You can access data as you would any list.
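
The difference between the two result shapes is the same one you see with Python's re.findall, which is assumed here to be a fair stand-in for what KebasData returns. A small sketch with made-up HTML:

```python
import re

# Made-up HTML for illustration.
html = '<img src="a.gif" alt="first"><img src="b.gif" alt="second">'

# No groups in the pattern: each item is the whole matched string.
plain = re.findall(r'<img.*?>', html)

# Two groups in the pattern: each item is a tuple with one entry per group.
grouped = re.findall(r'<img(.*?)alt(.*?)>', html)

print(plain[0])       # first whole match
print(grouped[0])     # first tuple
print(grouped[0][0])  # first group of the first match

# Plain list access works in both cases.
for before_alt, after_alt in grouped:
    print(before_alt.strip(), after_alt)
```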


