[opencms-dev] search engines and static vs. dynamic contents

Code Create, Bernd Wolfsegger bw at code-create.com
Wed Nov 16 20:23:34 CET 2005


It is possible. In the JSP, set the header from the resource's last-modified date:

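// HTTP dates use the 24-hour clock (HH, not hh). The value is only valid if
// locale is an English locale and the formatted time is actually in GMT.
String lastModified = DateTransform.formatDateTime(cms.getCmsObject().readResource(cms.getRequestContext().getUri()).getDateLastModified(), "EEE, dd MMM yyyy HH:mm:ss", locale) + " GMT";
response.setHeader("Last-Modified", lastModified);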
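
For reference, HTTP expects this value as a GMT date with English day and month names and a 24-hour clock. The following is only a minimal sketch of that formatting using java.text; it does not use DateTransform, and the class and method names (HttpDateSketch, toHttpDate) are made up for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class HttpDateSketch {

    /**
     * Formats epoch milliseconds (such as the value returned by
     * getDateLastModified()) as an HTTP date in GMT.
     * Hypothetical helper, not part of OpenCms.
     */
    public static String toHttpDate(long epochMillis) {
        // Locale.US gives English day/month names; HH is the 24-hour clock.
        SimpleDateFormat fmt =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
        // Convert to GMT explicitly instead of relying on the server's time zone.
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(epochMillis));
    }
}
```

With something like this, the header line could become response.setHeader("Last-Modified", HttpDateSketch.toHttpDate(millis)), where millis is the resource's last-modified time.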

and, to be sure, also add a meta tag to the page:


<meta http-equiv="Last-Modified" content="<%= lastModified%>">
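
A crawler that already knows the page may send an If-Modified-Since header and can then be answered with 304 Not Modified instead of the full page. The comparison could look roughly like the sketch below. It is an assumption on my side, not something OpenCms does for you in this form, and the class and method names are hypothetical.

```java
public class ConditionalGetSketch {

    /**
     * Decides whether a 304 Not Modified response would be appropriate.
     *
     * @param ifModifiedSinceMillis value of the If-Modified-Since request header
     *        in epoch milliseconds, or -1 if the header is absent (this is what
     *        HttpServletRequest.getDateHeader returns for a missing header)
     * @param lastModifiedMillis last-modified time of the resource in epoch
     *        milliseconds
     */
    public static boolean isNotModified(long ifModifiedSinceMillis,
                                        long lastModifiedMillis) {
        if (ifModifiedSinceMillis < 0) {
            return false; // no conditional request, send the full page
        }
        // HTTP dates have one-second resolution, so compare whole seconds.
        return lastModifiedMillis / 1000 <= ifModifiedSinceMillis / 1000;
    }
}
```

In the JSP this would be called with request.getDateHeader("If-Modified-Since") and the resource's last-modified date; when it returns true, the page could call response.setStatus(HttpServletResponse.SC_NOT_MODIFIED) and skip rendering.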

Kind regards, Bernd

On Wednesday, 16. November 2005 21:09, Code Create, Bernd Wolfsegger wrote:
> Well,
>
> as for: 3) Make sure the "Last Modified" is set correctly:
>
> I found that OpenCms generated sites do have an updated "Modified" date
> every time they are requested ...
> Is it somehow possible to change this behaviour?
> Does anybody know if it has an effect to set the Last Modified Header in
> the response manually in the JSP to the OpenCms last edited date?
>
> Any opinions ? :-)
>
> Kind regards, Bernd
>
> On Wednesday, 16. November 2005 01:34, Doychi wrote:
> > On 1:35:11 2005-11-16 "Code Create, Bernd Wolfsegger"
> > <bw at code-create.com> wrote:
> > <snip>
> >
> > > Well, any experts here? :)
> >
> > I wouldn't call myself an expert, but I've worked with Verity's K2 for a
> > number of years and what follows are some suggestions that generally get
> > thrown around to simplify the job of indexing/crawling documents.
> >
> > > I don't think that a robot can do more than make HTTP requests etc.
> > > (Anything else would be a security issue.) And that is really something
> > > different from accessing the server's file system.
> >
> > Got it in one.
> >
> > > You have a problem with such dynamically generated sites, where you
> > > have a "controller" JSP (always the same URL) and thousands of GET
> > > parameters to determine which content to show.
> > > But that's not the case with OpenCms. The URLs look exactly like
> > > static content URLs. No difference.
> >
> > <snip>
> >
> > Assumption:  The site has already been spidered/indexed once by the search
> > engine.
> >
> > Get parameters are a problem, but not the only one.  It depends a little
> > on how the search engine spiders the site.  Some will only check that the
> > "last modified" time of the root page is newer than when the spider last
> > found the page and if it isn't new then it won't process the page, and
> > won't check any pages further down the tree.  Others will check every
> > page they already know about to see if it is newer and then spider from
> > the newer pages.  The second method is safer and I suspect most search
> > engines are using this method now.
> >
> > If you want to make your site highly accessible to spiders I would
> > recommend:
> >
> > 1) Don't use JavaScript in links that you want the spider to follow.
> > Some engines will be able to follow SOME JavaScript links, but due to the
> > flexibility of JavaScript to mangle links I wouldn't trust it.
> >
> > 2) Don't use GET/POST parameters to change the information on a page.
> > Again some search engines have options to allow the GET parameters to be
> > used in identifying pages for indexing, but again I wouldn't recommend
> > it.
> >
> > 3) Make sure the "Last Modified" is set correctly.  This does two things:
> > it prevents the spider from having to process pages it doesn't have to,
> > reducing the load on your servers, and it also ensures that the content is
> > indexed if it is new.
> >
> > Anyway, I hope this helps.
> >
> > --
> > Doychi
> > spdoychiam at doychi-dina.ath.cx
> >
> >

-- 

[  Code Create
[  Web Content Management and Presentation


[  Bernd Wolfsegger
[  Sun Certified Programmer for Java(TM) 2 Platform


[  Lohmeyerstrasse 13
[  10587 Berlin
[  Germany
[  Fon +49 (0)30 26555788
[  Fax +49 (0)30 2651835
[  Mobile +49 (0)163 6505622

[  bw at code-create.com
[  http://www.code-create.com/



