Re: [opencms-dev] Search on static/exported content: looking for a solution

M Butcher mbutcher at grcomputing.net
Fri Mar 12 21:02:02 CET 2004


Hartmann, Waehrisch & Feykes GmbH wrote:
> You could use ht://dig, but it acts as a spider and indexes each page as a
> whole. The Lucene module has the advantage that it indexes the different
> parts of a page (headline, description, keywords, body, path and more) in
> different index fields. Content that comes from templates, such as menus and
> news feeds (if you have any), is left out.
> You could try to access the Lucene index directly with a CGI script. I think
> this is well documented on Lucene's homepage.
> Or maybe you can find a ready-made Perl solution on the net.
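
As a rough illustration of the per-field indexing described above, here is a toy sketch in Python. It is not the Lucene module's code and does not read Lucene index files; the class name FieldedIndex, the field names and the sample paths are all made up for the example. It only shows why a hit in the headline can be told apart from a hit in the body.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldedIndex:
    """Toy inverted index that keeps a separate term list per field."""

    def __init__(self):
        # field name -> term -> set of document paths
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, path, fields):
        """Index one page; `fields` maps a field name to its text."""
        for field, text in fields.items():
            for term in tokenize(text):
                self._postings[field][term].add(path)

    def search(self, field, term):
        """Return the sorted paths whose `field` contains `term`."""
        return sorted(self._postings[field].get(term.lower(), set()))


if __name__ == "__main__":
    index = FieldedIndex()
    # Hypothetical pages, for illustration only.
    index.add("/products/index.html",
              {"headline": "Product overview", "body": "All products ship with search"})
    index.add("/news/search.html",
              {"headline": "Search released", "body": "Overview of the release"})
    print(index.search("headline", "search"))
    print(index.search("body", "search"))
```

A spider that indexes the page as a whole would return both pages for "search" with no way to weight the headline match.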

In fact, IIRC, there is a Perl implementation of Lucene that can read 
Lucene indexes -- I think it will work against those index files 
generated by the Java version. Using perl regexes might be a nice way to 
rewrite URLs as well.

http://jakarta.apache.org/lucene/docs/resources.html
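
Regexes aside, the rewrite itself is small. A minimal sketch in Python rather than Perl, assuming the index stores the full VFS path (as Stephan describes further down) and that the exported files sit under a single URL prefix. VFS_ROOT and EXPORT_PREFIX are placeholder values, not taken from any real OpenCms configuration:

```python
import re

# Assumed values for illustration; adjust to the actual installation.
VFS_ROOT = "/sites/default"   # hypothetical root to strip from indexed paths
EXPORT_PREFIX = "/export"     # hypothetical URL prefix of the exported files


def vfs_to_static_url(vfs_path):
    """Map a VFS path from the index to the URL of the exported file."""
    # Drop the assumed VFS root, but only when it is a whole path component.
    relative = re.sub(r"^" + re.escape(VFS_ROOT) + r"(?=/|$)", "", vfs_path)
    # Prepend the export prefix and collapse any doubled slashes.
    return re.sub(r"/{2,}", "/", EXPORT_PREFIX + "/" + relative)


if __name__ == "__main__":
    print(vfs_to_static_url("/sites/default/products/index.html"))
    # prints /export/products/index.html
```

The same two substitutions would carry over to Perl more or less directly if the search page ends up as a Perl CGI script.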

Matt



> 
> 
> ----- Original Message ----- 
> From: "Thomas Hartwig" <TH at ivu.de>
> To: <opencms-dev at opencms.org>
> Sent: Friday, March 12, 2004 12:51 PM
> Subject: Re: [opencms-dev] Search on static/exported content: looking for a
> solution
> 
> 
> Hi Stephan,
> 
> thanks again.
> 
> I know about the drawbacks of CGI programming. But our client specified a
> pure Apache server as a requirement in the project document.
> (I don't know why!)
> 
> So the better solution seems to be to use one of the CGI-based search
> engines available on the web?
> 
> Best Regards, Tom
> 
> 
>>-----Original Message-----
>>From: Hartmann, Waehrisch & Feykes GmbH [SMTP:hartmann at waehrisch-feykes.de]
>>Sent: Friday, March 12, 2004 12:27
>>To: opencms-dev at opencms.org
>>Subject: Re: [opencms-dev] Search on static/exported content: looking for a
>>solution
>>
>>You could modify the module to index only files that are marked for export,
>>or, if export is the default, files whose export attribute is not set to
>>false.
>>The path stored in the index is just the full VFS path of a page inside
>>OpenCms, so you can easily prepend your export path location to it in the
>>page that displays the search results.
>>Version 1.5 of the module also contains a mechanism to update the index when
>>you publish a project or single resources. But it is not well tested and
>>you'll have to activate it by hand.
>>What do you mean by "lucene based cgi script"? A real CGI script written
>>in Perl, or a bash script that launches a Java VM on each request, will
>>result in long response times and bad performance overall, I think. So you
>>should really consider using Tomcat and mod_jk2 for your search page; if not
>>full OpenCms, then at least a "lucene based servlet".
>>
>>Bye,
>>Stephan
>>
>>----- Original Message ----- 
>>From: "Thomas Hartwig" <TH at ivu.de>
>>To: <opencms-dev at opencms.org>
>>Sent: Friday, March 12, 2004 11:55 AM
>>Subject: Re: [opencms-dev] Search on static/exported content: looking
>>for a solution
>>
>>
>>Thanks for your fast reply,
>>
>>but I'm not sure whether I explained clearly enough what we have to do.
>>
>>We are building a two-phase OpenCms framework project for a client.
>>
>>In the first phase, called the design phase, the client uses this framework
>>within OpenCms to fill in a specially prepared HTML structure.
>>When this work is finished, all marked project files have to be
>>published/exported.
>>These files are 'deployed' to the runtime environment, which is served by a
>>pure Apache server.
>>
>>This Apache server is the official access point to the client's sites in
>>what we call the public phase, and it serves static files exclusively.
>>On these sites, visitors must also be able to search the HTML content of the
>>static files. Because of the limited functionality of a pure Apache web
>>server, a search module cannot use anything dynamic except CGI. But online
>>search has to be at least somewhat dynamic.
>>
>>Our idea was to use Lucene during the design phase to index all HTML files
>>marked for export, so that indexing happens on the filesystem (export
>>path?) during publishing/export.
>>In the public phase we would then use some Lucene-based CGI scripts to run
>>the online search against these indexes.
>>
>>So my question was whether Lucene covers this special case.
>>
>>Sorry for the long text, but I can't afford to spend time on evaluations
>>that lead nowhere at the moment.
>>
>>Special Thanks, Tom
>>
>>
>>
>>-----Original Message-----
>>From: Hartmann, Waehrisch & Feykes GmbH
>>[mailto:hartmann at waehrisch-feykes.de]
>>Sent: Thursday, March 11, 2004 10:22
>>To: opencms-dev at opencms.org
>>Subject: Re: [opencms-dev] Search on static/exported content:
>>looking for a solution
>>
>>
>>The Lucene search module indexes all resources in the virtual file system,
>>no matter whether they will be exported or not. The links in your search
>>results will point to the path in the VFS (let's say the dynamic path). If
>>the link to a search result points to a file in OpenCms' VFS and this
>>resource is exported, OpenCms will redirect to the static URL of this
>>resource.
>>You may have to change some settings in opencms.properties (the ones for the
>>static export path) to make it work with your Apache, I think.
>>
>>Bye,
>>Stephan
>>
>>----- Original Message ----- 
>>From: "Thomas Hartwig" <TH at ivu.de>
>>To: <opencms-dev at opencms.org>
>>Sent: Thursday, March 11, 2004 9:47 AM
>>Subject: [opencms-dev] Search on static/exported content: looking for a
>>solution
>>
>>
>>Hi List,
>>
>>
>>I am new to OpenCms and am looking for a search engine that also indexes
>>all files exported by the static OpenCms export mechanism.
>>
>>It would also be sufficient for me to index only the exported HTML or PDF
>>files (those marked by the export flag set to "true"), because I mapped the
>>OpenCms FS 'export' directory to the document root of an Apache server,
>>which is the public access point for our sites. So any dynamic behavior
>>except CGI is forbidden.
>>
>>I found some entries on this topic in the mailing list archive, but no
>>satisfying answer.
>>
>>Does the Lucene solution found in the OpenCms sandbox address this case? Do
>>I simply have to set the <project> tag value in the Lucene registry to
>>"online" for everything to work?
>>I use OpenCms version 5.0.1.
>>
>>Best regards,
>>
>>Tom :-)
>>
>>
>>
>>_______________________________________________
>>This mail is send to you from the opencms-dev mailing list
>>To change your list options, or to unsubscribe from the list, please visit
>>http://mail.opencms.org/mailman/listinfo/opencms-dev



