net.grcomputing.opencms.search.lucene
Class TaggedPlainDocument

java.lang.Object
  |
  +--net.grcomputing.opencms.search.lucene.TaggedPlainDocument
All Implemented Interfaces:
I_DocumentConstants, I_DocumentFactory

public class TaggedPlainDocument
extends java.lang.Object
implements I_DocumentConstants, I_DocumentFactory

This class serves as a document factory for OpenCMS resources. It produces Lucene Document objects that contain the correct fields for indexing OpenCMS resources. Unlike some of the other Lucene implementations, this one is tightly coupled to the OpenCMS API, which lets it take advantage of properties, security settings, etc.

This class handles pages with tagged data, e.g. XML, HTML, and their derivatives.

Author:
Matt Butcher mbutcher@grcomputing.net
See Also:
http://grcomputing.net

Field Summary
 
Fields inherited from interface net.grcomputing.opencms.search.lucene.I_DocumentConstants
FIELD_BODY, FIELD_DESC, FIELD_INITIAL_ADD, FIELD_KEYWORDS, FIELD_LAST_MOD, FIELD_PATH, FIELD_TITLE
 
Constructor Summary
TaggedPlainDocument()
           
 
Method Summary
 Document Document(CmsObject cmso, CmsFile f)
          Takes a tagged Plain instance and builds a Lucene Document suitable for index generation.
 Document Document(CmsObject cmso, CmsFile f, java.util.HashMap p)
          Takes a Plain instance of tagged content and builds a Lucene Document suitable for index generation.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TaggedPlainDocument

public TaggedPlainDocument()

Method Detail

Document

public Document Document(CmsObject cmso,
                         CmsFile f,
                         java.util.HashMap p)
                  throws CmsException
Takes a Plain instance of tagged content and builds a Lucene Document suitable for index generation. This is for Plain documents that contain tagged data -- that is, HTML, XML, and their derivatives. Like the default PageDocument parser, this one uses the fast tag stripper, which simply strips all of the tags out of a document. Information stored in element attributes will not make it into the index. All parsed character data will make it in, even if it is JavaScript or CSS.

Specified by:
Document in interface I_DocumentFactory
Throws:
CmsException - if it cannot work with the CmsFile or CmsObject.
See Also:
FastTagStripper
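
The indexing consequences described above (attribute values are lost, script and style text is kept) can be illustrated with a minimal sketch. This is not the actual FastTagStripper code; the class name TagStripSketch and the method stripTags are hypothetical, and the sketch only assumes that stripping means discarding everything between '<' and '>'.

```java
public class TagStripSketch {

    // Hypothetical stand-in for a fast tag stripper (NOT the real
    // FastTagStripper): discards every character from '<' through the
    // matching '>' and keeps everything else unchanged.
    public static String stripTags(String markup) {
        StringBuilder text = new StringBuilder();
        boolean insideTag = false;
        for (int i = 0; i < markup.length(); i++) {
            char c = markup.charAt(i);
            if (c == '<') {
                insideTag = true;            // start of a tag: stop copying
            } else if (c == '>' && insideTag) {
                insideTag = false;           // end of a tag: resume copying
            } else if (!insideTag) {
                text.append(c);              // character data is kept
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/docs\" title=\"Manual\">Read</a> the manual.</p>"
                    + "<script>var x = 1;</script>";
        // The href and title values vanish with their tag; the script body survives.
        System.out.println(stripTags(html)); // prints "Read the manual.var x = 1;"
    }
}
```

Under this assumption, text such as a link's title attribute would never be searchable, while inline JavaScript or CSS would be indexed as ordinary body text.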

Document

public Document Document(CmsObject cmso,
                         CmsFile f)
                  throws CmsException
Takes a tagged Plain instance and builds a Lucene Document suitable for index generation. Convenience method that omits the HashMap parameter p.

Specified by:
Document in interface I_DocumentFactory
Throws:
CmsException - if it cannot work with the CmsFile or CmsObject.


Copyright © 2003 Matt Butcher of Global Resources for Computing. Reproduction and modification of this document are allowed in accordance with the GPL v2. Refer to COPYING.txt for information on acceptable use.