[opencms-dev] search engines and static vs. dynamic contents

Code Create, Bernd Wolfsegger bw at code-create.com
Thu Nov 17 11:31:23 CET 2005


Hi Alex and others involved in this topic,

well, it seems the complete story is more complex than I imagined :)

On Thursday, 17. November 2005 01:05, Alexander Kandzior wrote:
> Believe me, you don't want to do this.
>
> For example, if your template changes (but not the content) your code may
> cause browser clients to keep their cached copy of the page, because an
> If-Modified-Since request is answered with an HTTP 304 status code. This
> will likely happen even if the navigation or other elements used on the JSP
> have changed, since only the content is used for the date last modified
> calculation.

O.k., but I think changes in the content are more important than changes in 
the layout.
If I make substantial changes to the generation code, I just touch the content 
(site) folder and all resources have an up-to-date Last Modified.

>
> Internally, OpenCms does a quite complete calculation of the possible date
> last modified of a requested resource, based on all included elements in
> the request. If one element used in the JSP changes, the date last modified
> for the complete page is adjusted. This also works in tandem with the Flex
> cache. For example if there is one element included that has cache set to
> "never", this will cause the date-last-modified to be set to the current
> date, since no caching of the element and hence no caching of the page that
> uses this element is possible / wanted.
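
As far as I understand the calculation described above, it can be sketched roughly like this. This is only my reading, not OpenCms code: the class `PageDateSketch`, the `Element` type and the method `pageLastModified` are made-up names for illustration, and I assume "adjusted" means that the newest element date wins.

```java
import java.time.Instant;
import java.util.List;

public class PageDateSketch {

    /** Hypothetical stand-in for one element included in a page. */
    public static final class Element {
        final Instant lastModified;
        final boolean neverCached; // cache setting "never"

        public Element(Instant lastModified, boolean neverCached) {
            this.lastModified = lastModified;
            this.neverCached = neverCached;
        }
    }

    /**
     * Returns the newest date of all included elements, or the current
     * time as soon as one element must never be cached.
     * For an empty list the epoch is returned (an arbitrary choice here).
     */
    public static Instant pageLastModified(List<Element> elements, Instant now) {
        Instant newest = Instant.EPOCH;
        for (Element e : elements) {
            if (e.neverCached) {
                return now; // page cannot be cached, so it always counts as "new"
            }
            if (e.lastModified.isAfter(newest)) {
                newest = e.lastModified;
            }
        }
        return newest;
    }
}
```

With a "frame" JSP set to "never", such a calculation would always yield the current time, which matches the behaviour I see.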

Hm, that's o.k. I don't want the pages to be browser / server cached, 
independent of the Flex cache settings.
In fact, the "frame" JSP I use is not Flex cached (cache set to "never"); only 
the included "content" JSPs, the navigation etc. are. That is because the 
"frame" JSP contains code that has to be executed every time the page is 
requested.

So, is there a way to keep the whole pages from being browser / server cached, 
independent of the Flex cache settings (because they differ for main and sub 
elements), without making the search engines think my pages are always new 
and revisit them again and again?

Regarding browser / server caching, does anybody know how cache headers like 
"Expires", "Pragma" and "Cache-Control" interact with Last-Modified?

RFC 2616 is rather complex :)
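
To make the question more concrete, here is a minimal sketch of how I currently read RFC 2616. It is not OpenCms API: the class `ConditionalGetSketch` and its two methods are made-up names, and it assumes Java 8 or newer for `java.time`. The idea is to send a stable Last-Modified together with headers that force revalidation, and to answer an If-Modified-Since request with 304 only when the content date is not newer.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.Map;

public class ConditionalGetSketch {

    /** Returns true if answering with 304 Not Modified would be appropriate. */
    public static boolean isNotModified(String ifModifiedSince, Instant lastModified) {
        if (ifModifiedSince == null) {
            return false; // unconditional request: send the full page
        }
        try {
            Instant since = ZonedDateTime
                .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
            // HTTP dates only have one-second resolution, so drop the milliseconds
            return !lastModified.truncatedTo(ChronoUnit.SECONDS).isAfter(since);
        } catch (DateTimeParseException e) {
            return false; // unparsable header: treat the request as unconditional
        }
    }

    /** Headers that keep a validator but ask caches to revalidate every time. */
    public static Map<String, String> responseHeaders(Instant lastModified) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Last-Modified",
            DateTimeFormatter.RFC_1123_DATE_TIME.format(lastModified.atZone(ZoneOffset.UTC)));
        headers.put("Cache-Control", "no-cache"); // HTTP/1.1: revalidate before reuse
        headers.put("Pragma", "no-cache");        // same intent for HTTP/1.0 caches
        headers.put("Expires", "0");              // invalid date, treated as already expired
        return headers;
    }
}
```

If this reading is right, every request still reaches the server, so the "frame" code runs each time, while a crawler sending If-Modified-Since gets a 304 as long as the content date has not changed. Corrections welcome.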


Kind regards, Bernd


>
> Best Regards,
> Alex.
>
> Alexander Kandzior
> Alkacon Software - The OpenCms Experts
> http://www.alkacon.com
>
> > -----Original Message-----
> > From: opencms-dev-bounces at opencms.org
> > [mailto:opencms-dev-bounces at opencms.org] On Behalf Of Code
> > Create, Bernd Wolfsegger
> > Sent: Wednesday, November 16, 2005 7:10 PM
> > To: doychi-lists at doychi-dina.ath.cx; The OpenCms mailing list
> > Subject: Re: [opencms-dev] search engines and static vs.
> > dynamic contents
> >
> > Well,
> >
> > as for: 3) Make sure the "Last Modified" is set correctly:
> >
> > I found that OpenCms generated sites do have an updated
> > "Modified" date every time they are requested ...
> > Is it somehow possible to change this behaviour?
> > Does anybody know whether it has any effect to set the
> > Last-Modified header in the response manually in the JSP to the
> > OpenCms last edited date?
> >
> > Any opinions ? :-)
> >
> > Kind regards, Bernd
> >
> > On Wednesday, 16. November 2005 01:34, Doychi wrote:
> > > On 1:35:11 2005-11-16 "Code Create, Bernd Wolfsegger"
> > > <bw at code-create.com>
> > > wrote:
> > > <snip>
> > >
> > > > Well, any experts here? :)
> > >
> > > I wouldn't call myself an expert, but I've worked with Verity's K2 for
> > > a number of years and what follows are some suggestions that generally
> > > get thrown around to simplify the job of indexing/crawling documents.
> > >
> > > > I don't think that a robot can do more than make http requests etc..
> > > > (Anything else would be a security case) And that is really
> > > > something different from accessing the servers file system.
> > >
> > > Got it in one.
> > >
> > > > You have a problem with such dynamically generated sites, where you
> > > > have a "controller" JSP (always the same URL) and thousands of GET
> > > > parameters to determine which content to show.
> > > > But that's not the case with OpenCms. The URLs look exactly like
> > > > static content URLs. No difference.
> > >
> > > <snip>
> > >
> > > Assumption:  The site has already been spidered/indexed once by the
> > > search engine.
> > >
> > > GET parameters are a problem, but not the only one.  It depends a
> > > little on how the search engine spiders the site.  Some will only
> > > check that the "last modified" time of the root page is newer than
> > > when the spider last found the page, and if it isn't newer then it won't
> > > process the page, and won't check any pages further down the tree.
> > > Others will check every page they already know about to see if it is
> > > newer and then spider from the newer pages.  The second method is
> > > safer and I suspect most search engines are using this method now.
> > >
> > > If you want to make your site highly accessible to spiders I would
> > > recommend:
> > >
> > > 1) Don't use JavaScript in links that you want the spider to follow.
> > > Some engines will be able to follow SOME JavaScript links, but due to
> > > the flexibility of JavaScript to mangle links I wouldn't trust it.
> > >
> > > 2) Don't use GET/POST parameters to change the information on a page.
> > > Again some search engines have options to allow the GET parameters to
> > > be used in identifying pages for indexing, but again I wouldn't
> > > recommend it.
> > >
> > > 3) Make sure the "Last Modified" is set correctly.  This does two
> > > things: it prevents the spider from having to process pages it doesn't
> > > have to, reducing the load on your servers, and also ensures that the
> > > content is indexed if it is new.
> > >
> > > Any way I hope this helps.
> > >
> > > --
> > > Doychi
> > > spdoychiam at doychi-dina.ath.cx
> > >
> > >
> > > _______________________________________________
> > > This mail is sent to you from the opencms-dev mailing list. To change
> > > your list options, or to unsubscribe from the list, please visit
> > > http://mail.opencms.org/mailman/listinfo/opencms-dev
> >
> > --
> >
> > [  Code Create
> > [  Web Content Management and Presentation
> >
> >
> > [  Bernd Wolfsegger
> > [  Sun Certified Programmer for Java(TM) 2 Platform
> >
> >
> > [  Lohmeyerstrasse 13
> > [  10587 Berlin
> > [  Germany
> > [  Fon +49 (0)30 26555788
> > [  Fax +49 (0)30 2651835
> > [  Mobile +49 (0)163 6505622
> >
> > [  bw at code-create.com
> > [  http://www.code-create.com/
> >
> >
> >

-- 

[  Code Create
[  Web Content Management and Presentation


[  Bernd Wolfsegger
[  Sun Certified Programmer for Java(TM) 2 Platform


[  Lohmeyerstrasse 13
[  10587 Berlin
[  Germany
[  Fon +49 (0)30 26555788
[  Fax +49 (0)30 2651835
[  Mobile +49 (0)163 6505622

[  bw at code-create.com
[  http://www.code-create.com/



