[opencms-dev] finding all links filtered by pattern in a local site

Christoph Kukulies kuku at physik.rwth-aachen.de
Thu Oct 30 11:02:19 CET 2014


I have been fighting with a JSP for a while now. It should collect all 
links in a site recursively, following a link only if it matches a pattern,
i.e., if it is a local link into my site (so that I do not have to crawl 
the whole world wide web :) )

Problem: I have imported a site that was created using WordPress, and all 
pages are invoked by http://localhost/sitename/?page_id=<number>.

I would like to collect all pages matching the pattern http://localhost* and 
recurse through their contents, gathering every matching link
into one Set for further processing.
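
A minimal sketch of such a crawl in plain Java, under stated assumptions: 
the class name LocalLinkCrawler and the fetcher parameter are hypothetical, 
links are pulled out with a simple regex rather than a real HTML parser, 
and "matches the pattern" is reduced to a prefix test on the absolute URL 
(relative links are ignored). It is meant as a starting point, not a 
finished crawler.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LocalLinkCrawler {
    // Matches href="..." or href='...'. A regex is a simplification here,
    // not a substitute for a real HTML parser.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /**
     * Collects the start URL plus every URL reachable from it that begins
     * with the given prefix. The fetcher (an assumption of this sketch) maps
     * a URL to its HTML, or returns null if the page cannot be read.
     */
    static Set<String> crawl(String start, String prefix, Function<String, String> fetcher) {
        Set<String> seen = new LinkedHashSet<>();   // also guards against link cycles
        Deque<String> queue = new ArrayDeque<>();   // work list instead of real recursion
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue;                           // unreadable page: keep the URL, skip its links
            }
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String link = m.group(1);
                int hash = link.indexOf('#');       // drop the fragment so #anchors do not create duplicates
                if (hash >= 0) {
                    link = link.substring(0, hash);
                }
                // Only follow local links, and only the first time we see them.
                if (link.startsWith(prefix) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return seen;
    }
}
```

Called as LocalLinkCrawler.crawl("http://localhost/sitename/", 
"http://localhost", fetcher), this returns the Set of local pages. The 
fetcher is left open on purpose: it could read each URL over HTTP or load 
the page content some other way. The work list avoids deep recursion, and 
the Set doubles as the "already visited" check.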


Has anyone written something similar?


Side question:
What defines "out" as the writer that writes to the page, by the way? Using 
out.println() within a method or function declaration
(between <%! %> tags) doesn't find "out"; it is reported as undeclared.
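
For context: in a JSP, "out" is an implicit object of type 
javax.servlet.jsp.JspWriter. It exists only as a local variable inside the 
service method generated for the page, while code between <%! %> becomes 
members of the page class, so "out" is not visible there. One common way 
around this is to pass the writer in as a parameter. The sketch below 
shows the idea in plain Java; the names LinkPrinter and printLinks are 
hypothetical, and java.io.Writer (the superclass of JspWriter) is used so 
that the example runs without the servlet API.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Collection;

final class LinkPrinter {
    // In a JSP, a method like this would sit between <%! and %>.
    // The writer is a parameter because the implicit "out" is only a local
    // variable of the generated service method, not a field of the page class.
    static void printLinks(Writer out, Collection<String> links) throws IOException {
        for (String link : links) {
            out.write(link);
            out.write("<br/>\n");
        }
    }
}
```

From a scriptlet, where "out" is in scope, the call would then look like 
<% printLinks(out, links); %>, with links being whatever collection the 
page has built up.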



-- 
Chris Christoph P. U. Kukulies kukulies (at) rwth-aachen.de


