[opencms-dev] getting recognized by robots and crawlers

Fri Oct 21 18:00:07 CEST 2005

One reason I'm doing all this "getting rid" stuff now is that
my customer wants his site to be better recognized by robots.

At the moment the site gets practically no visits.

I see Google and other robots visiting the site and reading robots.txt,
and after that nothing happens.

When I run wget -r on the site, only index.html is fetched and
nothing else. The content seems to be totally hidden from the outside world.
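
A recursive fetch like this depends on two things: robots.txt has to
permit it (wget -r honours robots.txt by default), and index.html has to
contain plain <a href> links that can be followed. Both can be checked
offline with a minimal Python sketch. The helper names and the sample
robots.txt and HTML below are made up for illustration and are not taken
from the actual site.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, the links a crawler can follow."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def crawlable_links(html):
    """Return the plain <a href> targets found in an HTML page."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical inputs, for illustration only.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /\n"
SAMPLE_INDEX = (
    '<html><body><a href="/about.html">About</a>'
    '<script>go("/x")</script></body></html>'
)

if __name__ == "__main__":
    print(allowed(SAMPLE_ROBOTS, "wget", "http://example.com/index.html"))
    print(crawlable_links(SAMPLE_INDEX))
```

If allowed() returns False for the real robots.txt, or crawlable_links()
comes back empty for the real index.html (for example because navigation
is built by JavaScript), that might explain why the fetch stops after the
first page.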

What would be the way to proceed from here? What I want to achieve is
that all relevant paths are traversed by robots and that documents (PDF)
are searched through as well (OpenCms search module, htdig, or that other
thing with 'L' whose name I forget at the moment?).

Another point, and probably another reason why the site hardly shows up
in search results - I have told the customer's admins already - is that
there is no PTR resource record (reverse lookup) in their DNS server.
I assume that search engines keep their hands off sites that do not
reverse-map (often sites with questionable content, to put it
mildly :)

I made sure that they added one now, but it has not yet propagated
downstream to other DNS servers.
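
To see whether the new PTR record is visible from a given machine yet,
something like the following Python sketch could be used. The function
names are my own, and 192.0.2.10 is only a documentation placeholder, not
the customer's address. Note that socket.gethostbyaddr asks the local
resolver, so a local hosts file entry can mask what DNS really returns.

```python
import ipaddress
import socket


def ptr_name(ip):
    """Return the DNS name under which the PTR record for ip is stored."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return the host name the local resolver reports for ip, or None."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        # No reverse mapping is visible from this resolver (yet).
        return None


if __name__ == "__main__":
    address = "192.0.2.10"  # placeholder, replace with the server's IP
    print(ptr_name(address))
    print(reverse_lookup(address))
```

A result of None from reverse_lookup() only says that this resolver does
not see a reverse mapping at the moment; running it from a few different
networks gives a rough idea of how far the record has spread.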

Help appreciated.

Thank you.

--
Chris Christoph P. U. Kukulies kukulies (at) rwth-aachen.de