[opencms-dev] search engines and static vs. dynamic contents

Alexander Kandzior alex at opencms.org
Wed Nov 16 23:05:24 CET 2005


Believe me, you don't want to do this. 

For example, if your template changes (but not the content), your code may
cause browser clients to keep using their cached copy of the page, because
the server answers their If-Modified-Since request with an HTTP 304 (Not
Modified) response. This will likely happen even if the navigation or other
elements used on the JSP have changed, since only the content is used for
the date last modified calculation.
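To make the failure concrete, here is a minimal Java sketch of how a server decides whether to answer a conditional GET with 304. It is not OpenCms code: the class and method names are hypothetical, and dates are plain epoch milliseconds (the form returned by the Servlet API's `HttpServletRequest.getDateHeader("If-Modified-Since")`, which yields -1 when the header is absent).

```java
import java.time.Instant;

/** Hypothetical sketch of the 304 decision for a conditional GET (not the OpenCms API). */
class ConditionalGet {

    /**
     * Returns true if the server may answer 304 Not Modified.
     * HTTP dates have one-second resolution, so both values are compared in whole seconds.
     */
    static boolean isNotModified(long ifModifiedSinceMillis, long lastModifiedMillis) {
        if (ifModifiedSinceMillis < 0) {
            return false; // -1 means the client sent no If-Modified-Since header
        }
        return lastModifiedMillis / 1000 <= ifModifiedSinceMillis / 1000;
    }

    public static void main(String[] args) {
        long contentEdited  = Instant.parse("2005-11-01T10:00:00Z").toEpochMilli();
        long templateEdited = Instant.parse("2005-11-15T10:00:00Z").toEpochMilli();
        // The browser last fetched the page on Nov 10, between the two edits.
        long browserCopy    = Instant.parse("2005-11-10T10:00:00Z").toEpochMilli();

        // Content date only: the stale page keeps being served from the browser cache.
        System.out.println("content only -> 304? " + isNotModified(browserCopy, contentEdited));
        // Newest of content and template: the changed page is sent again.
        long pageDate = Math.max(contentEdited, templateEdited);
        System.out.println("content + template -> 304? " + isNotModified(browserCopy, pageDate));
    }
}
```

Running it prints `content only -> 304? true` followed by `content + template -> 304? false`: using only the content date tells the browser nothing changed, although the template did.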

Internally, OpenCms does a fairly complete calculation of the possible date
last modified of a requested resource, based on all included elements in the
request. If one element used in the JSP changes, the date last modified for
the complete page is adjusted. This also works in tandem with the Flex
cache. For example, if there is one element included that has cache set to
"never", this will cause the date last modified to be set to the current
date, since no caching of the element, and hence no caching of the page that
uses this element, is possible or wanted.
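The aggregation described above can be sketched as follows. This is a minimal Java illustration under stated assumptions, not the actual OpenCms implementation: the `Element` record, its `cacheNever` flag (standing in for a Flex cache setting of "never") and the method names are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch (not the OpenCms API) of a page date derived from all included elements. */
class PageDateLastModified {

    /** One included element: its own modification time and whether it must never be cached. */
    record Element(String name, long dateLastModified, boolean cacheNever) {}

    /**
     * The page is as new as its newest element. If any element must never be cached,
     * the whole page is treated as modified "now", so a 304 can never be sent for it.
     * An empty list yields 0 (the epoch).
     */
    static long pageDateLastModified(List<Element> elements, long now) {
        long newest = 0L;
        for (Element e : elements) {
            if (e.cacheNever()) {
                return now;
            }
            newest = Math.max(newest, e.dateLastModified());
        }
        return newest;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<Element> cacheable = List.of(
                new Element("content.html", 1_000L, false),
                new Element("template.jsp", 5_000L, false),
                new Element("navigation.jsp", 3_000L, false));
        System.out.println(pageDateLastModified(cacheable, now)); // prints 5000

        List<Element> withNever = List.of(
                new Element("content.html", 1_000L, false),
                new Element("clock.jsp", 2_000L, true));
        System.out.println(pageDateLastModified(withNever, now) == now); // prints true
    }
}
```

Taking the maximum means a change to any included element (template, navigation or content) moves the page date forward, which is exactly what a content-only date misses.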

Best Regards,
Alex.

Alexander Kandzior
Alkacon Software - The OpenCms Experts
http://www.alkacon.com

 

> -----Original Message-----
> From: opencms-dev-bounces at opencms.org 
> [mailto:opencms-dev-bounces at opencms.org] On Behalf Of Code 
> Create, Bernd Wolfsegger
> Sent: Wednesday, November 16, 2005 7:10 PM
> To: doychi-lists at doychi-dina.ath.cx; The OpenCms mailing list
> Subject: Re: [opencms-dev] search engines and static vs. 
> dynamic contents
> 
> Well,
> 
> as for: 3) Make sure the "Last Modified" is set correctly:
> 
> I found that OpenCms generated sites do have an updated 
> "Modified" date every time they are requested ...
> Is it somehow possible to change this behaviour?
> Does anybody know if it has an effect to set the Last 
> Modified Header in the response manually in the JSP to the 
> OpenCms last edited date?
> 
> Any opinions ? :-)
> 
> Kind regards, Bernd
> 
> On Wednesday, 16. November 2005 01:34, Doychi wrote:
> > On 1:35:11 2005-11-16 "Code Create, Bernd Wolfsegger" 
> > <bw at code-create.com>
> > wrote:
> > <snip>
> >
> > > Well, any experts here? :)
> >
> > I wouldn't call myself an expert, but I've worked with Verity's K2 for
> > a number of years and what follows are some suggestions that generally
> > get thrown around to simplify the job of indexing/crawling documents.
> >
> > > I don't think that a robot can do more than make HTTP requests etc.
> > > (anything else would be a security issue), and that is really
> > > something different from accessing the server's file system.
> >
> > Got it in one.
> >
> > > You have a problem with such dynamically generated sites, where you
> > > have a "controller" JSP (always the same URL) and thousands of GET
> > > parameters to determine which content to show.
> > > But that's not the case with OpenCms. The URLs look exactly like
> > > static content URLs. No difference.
> >
> > <snip>
> >
> > Assumption: The site has already been spidered/indexed once by the
> > search engine.
> >
> > GET parameters are a problem, but not the only one. It depends a
> > little on how the search engine spiders the site. Some will only
> > check that the "last modified" time of the root page is newer than
> > when the spider last found the page; if it isn't newer, they won't
> > process the page and won't check any pages further down the tree.
> > Others will check every page they already know about to see if it is
> > newer and then spider from the newer pages. The second method is
> > safer and I suspect most search engines are using this method now.
> >
> > If you want to make your site highly accessible to spiders I would
> > recommend:
> >
> > 1) Don't use JavaScript in links that you want the spider to follow.
> > Some engines will be able to follow SOME JavaScript links, but due to
> > the flexibility of JavaScript to mangle links I wouldn't trust it.
> >
> > 2) Don't use GET/POST parameters to change the information on a page.
> > Again, some search engines have options to allow the GET parameters to
> > be used in identifying pages for indexing, but again I wouldn't
> > recommend it.
> >
> > 3) Make sure the "Last Modified" is set correctly. This does two
> > things: it prevents the spider from having to process pages it doesn't
> > have to, reducing the load on your servers, and it also ensures that
> > the content is indexed if it is new.
> >
> > Anyway, I hope this helps.
> >
> > --
> > Doychi
> > spdoychiam at doychi-dina.ath.cx
> >
> >
> > _______________________________________________
> > This mail is sent to you from the opencms-dev mailing list. To change
> > your list options, or to unsubscribe from the list, please visit
> > http://mail.opencms.org/mailman/listinfo/opencms-dev
> 
> -- 
> 
> [  Code Create
> [  Web Content Management and Presentation
> 
> 
> [  Bernd Wolfsegger
> [  Sun Certified Programmer for Java(TM) 2 Platform
> 
> 
> [  Lohmeyerstrasse 13
> [  10587 Berlin
> [  Germany
> [  Fon +49 (0)30 26555788
> [  Fax +49 (0)30 2651835
> [  Mobile +49 (0)163 6505622
> 
> [  bw at code-create.com
> [  http://www.code-create.com/
> 
> 



