[opencms-dev] PDF + SearchEngine

Ivan Jelenic ivan.jelenic at nbs.yu
Tue Aug 12 09:39:01 CEST 2003


Hi,

If you want to use the Lucene search engine for PDF files, you first need to
convert the PDF files into text files that Lucene can index. There are free
classes available for that. For example, you can use PDFBox. The project is
hosted at SourceForge:

http://sourceforge.net/projects/pdfbox

Example:

java org.pdfbox.Main  <file.PDF> <output-for lucene>
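
If there are many PDFs in the galleries, the conversion can be run in a loop.
Below is only a rough sketch of that idea, not something taken from PDFBox or
OpenCms: the class name PdfBatchConvert, its helper methods, and the
"<name>.txt" output naming are all made up for illustration, and it assumes
the PDFBox jar is passed in as the classpath. It simply runs the command
shown above once for every PDF found under a directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: runs the converter from the message for each PDF in a tree.
class PdfBatchConvert {

    // Assumed naming scheme: Manual.PDF -> Manual.PDF.txt next to the original.
    static Path outputFor(Path pdf) {
        return pdf.resolveSibling(pdf.getFileName().toString() + ".txt");
    }

    // Case-insensitive check on the file extension.
    static boolean isPdf(Path p) {
        return p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    // Builds: java -cp <classpath> org.pdfbox.Main <file.PDF> <output>
    // (class name and argument order are taken from the message above).
    static List<String> commandFor(Path pdf, String classpath) {
        return Arrays.asList("java", "-cp", classpath, "org.pdfbox.Main",
                pdf.toString(), outputFor(pdf).toString());
    }

    // Collects all regular files below root whose name ends in .pdf.
    static List<Path> findPdfs(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .filter(PdfBatchConvert::isPdf)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java PdfBatchConvert <directory> <pdfbox-classpath>");
            return;
        }
        Path root = Paths.get(args[0]);
        String classpath = args[1];
        for (Path pdf : findPdfs(root)) {
            // Start the converter as a child process and wait for it to finish.
            Process p = new ProcessBuilder(commandFor(pdf, classpath)).inheritIO().start();
            if (p.waitFor() != 0) {
                System.err.println("conversion failed for " + pdf);
            }
        }
    }
}
```

The resulting text files can then be given to the Lucene indexer like any
other plain-text document.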

I hope this helps.
Best regards, Ivan.

----- Original Message ----- 
From: "Apostoly Guillaume" <ApostolyG at mail.europcar.com>
To: <opencms-dev at opencms.org>
Sent: Tuesday, August 12, 2003 9:01 AM
Subject: RE: [opencms-dev] PDF + SearchEngine


> I know that HtDig can, using filters (which convert .doc or .pdf to HTML
> while the search engine is browsing the site). I'm not sure about Lucene.
>
> -----Original Message-----
> From: Björn Schlueter [mailto:bschlueter at lenord.de]
> Sent: Tuesday, 12 August 2003 07:42
> To: <opencms-dev at opencms.org>
> Subject: [opencms-dev] PDF + SearchEngine
>
>
> Hello,
>
> Is there a chance that a search engine (ht:dig or Lucene) can search through
> PDF files that are placed in the galleries?
>
> That would be a neat feature. If anybody knows how to do this, please let
> me know!
>
> Regards
>
> Björn
>
> _______________________________________________
> This mail is send to you from the opencms-dev mailing list
> To change your list options, or to unsubscribe from the list, please visit
> http://mail.opencms.org/mailman/listinfo/opencms-dev




