Hi

As an exercise I am trying to implement a simple web proxy server, similar to hidemyass.com. My program is a work in progress:-

1) gets a web page from the internet
2) uses HTMLTree::XMLParser to turn the HTML into XML-compliant markup
3) walks the tree (REXML) and saves each link (<a href="http...">) in a DB table
4) replaces each href with a link to my server carrying an ID that indexes the DB table (rough sketch at the end of this post)
5) uses render_text to display the new HTML at the client

Issues solved:-
* Company proxy server
* Standard (href) links

Things to solve:-
* Buttons (haven't analysed yet)
* Forms (haven't analysed yet)
* CSS (don't know how to solve)
* JavaScript (I don't even understand the implications of this yet)
* Redirected web pages (I have seen a library routine for this; sketch below)
* Browser Location field not behaving as I would like (map.connect, perhaps?)
* Slow to insert a few hundred links into the DB (~50 links per second) (is there a bulk load for MySQL? see the sketch below)

Gotchas:-
* HTML is not XML (solved with HTMLTree::XMLParser)
* hrefs might not contain a complete URL (solved with URI.parse)
* Bad HTML without a preamble and with two root nodes (HEAD and BODY) (not solved yet)
* URI.parse has a small bug (see my earlier post to this group)

The program works for many cases, but there are problems. CNN.com works but the formatting is wrong. This might be because the page is not being processed with its CSS; I haven't analysed it yet. I am wondering if I have to pass the retrieved page (step 1) through some kind of pseudo web client to get the CSS and JavaScript functionality.

I hope that this group can give me a few pointers on where to go from here.

Peter
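
Roughly what steps 3) and 4) look like, as a minimal sketch: it assumes the REXML document produced in step 2), and "ProxiedLink" plus the "/fetch?id=" route are placeholder names for whatever model and route you actually use.

require 'rexml/document'
require 'uri'

# Walk the document, resolve each href against the page's base URL,
# store it, and point the href back at the proxy.
def rewrite_links(doc, base_url)
  REXML::XPath.each(doc, '//a[@href]') do |anchor|
    href = anchor.attributes['href']
    begin
      absolute = URI.join(base_url, href).to_s   # handles relative hrefs
    rescue URI::InvalidURIError
      next                                       # leave malformed hrefs alone
    end
    link = ProxiedLink.create(:url => absolute)  # hypothetical AR model
    anchor.attributes['href'] = "/fetch?id=#{link.id}"
  end
  doc
end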
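
For the redirected-pages item, one plain Net::HTTP approach is to follow Location headers by hand; a minimal sketch (the limit of 5 is arbitrary, and an absolute Location header is assumed):

require 'net/http'
require 'uri'

# Fetch a URL, following HTTP redirects up to a fixed limit.
def fetch_following_redirects(url, limit = 5)
  raise 'too many redirects' if limit == 0
  response = Net::HTTP.get_response(URI.parse(url))
  case response
  when Net::HTTPRedirection
    fetch_following_redirects(response['location'], limit - 1)
  else
    response.body
  end
end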
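
On the slow inserts: MySQL accepts a multi-row INSERT, so a few hundred links can go in with one statement instead of one INSERT per link. A rough sketch, assuming a proxied_links table with a url column (adjust names to your schema); quoting goes through the connection so the URLs are escaped:

# Insert all URLs in a single multi-row INSERT statement.
def bulk_insert_links(urls)
  conn = ActiveRecord::Base.connection
  values = urls.map { |u| "(#{conn.quote(u)})" }.join(', ')
  conn.execute("INSERT INTO proxied_links (url) VALUES #{values}")
end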