[opencms-dev] Lucene - Searchable fields

M Butcher mbutcher at grcomputing.net
Thu Nov 27 04:35:02 CET 2003


For a Page object, it is something like this (from PageDocument.java):

=======================================================================
cmso is a CmsObject and f is a CmsFile, which you can get from several 
methods of CmsObject (such as readFile()).
=======================================================================

         /*
          * Now we move on to the contents of the Page. We need to get the
          * body file (from CmsXmlControlFile), then get the
          * <TEMPLATE><![CDATA[...]]></TEMPLATE> section from it. That content
          * is HTML, so we need to parse it and keep just the text.
          */
         CmsXmlControlFile xCntrl = new CmsXmlControlFile(cmso, f);
         String contentsName = xCntrl.getElementTemplate("body");

         CmsFile contents = cmso.readFile(contentsName);
         CmsXmlTemplateFile xcContents = new CmsXmlTemplateFile(cmso, contents);

         // We want all existing bodies, so get all body selector
         // (section) names.
         Iterator itSelect = xcContents.getAllSections().iterator();
         String cdata;

         // For each body, do whatever you need to do.
         while (itSelect.hasNext()) {
             cdata = xcContents.getTemplateContent(
                     null,
                     null,
                     (String) itSelect.next());
             // cdata now holds the (HTML) contents of this body section.
         }
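The comment above says each body section is HTML and has to be reduced to plain text. PageDocument.java's own parsing is not shown in this thread, so the following is only one possible approach, assuming plain JDK classes are acceptable: the Swing HTML parser (javax.swing.text.html.parser.ParserDelegator) reports text nodes through HTMLEditorKit.ParserCallback.handleText, and we keep only those. The class and method names (HtmlText, toPlainText) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not part of OpenCms): strips tags from an HTML
// fragment such as the cdata of one body section.
public class HtmlText {

    // Returns the text nodes of the fragment, trimmed and joined by
    // single spaces; tags and attributes are dropped.
    public static String toPlainText(String html) {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                String chunk = new String(data).trim();
                if (chunk.isEmpty()) {
                    return;
                }
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(chunk);
            }
        };
        try {
            // true = ignore any charset declared inside the HTML itself
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected for a StringReader, but parse() declares it.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Inside the loop this would be used as `String text = HtmlText.toPlainText(cdata);`. Because every text node is separated by a space, markup in the middle of a word (e.g. `wor<b>ld</b>`) comes out split; for indexing or highlighting that is usually tolerable.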

=======================================================================
To get the bodies of JSPs or plain text documents, things are much 
easier. From PlainDocument.java:

         CmsFile file = cmso.readFile(absPath);
         String c = new String(file.getContents());
         if (c.length() > 0) {
             // Do whatever you need to do...
             // c now holds the contents of the file.
         } else {
             A_OpenCms.log(A_OpenCms.C_OPENCMS_INFO,
                 "File " + file.getName() + " has no contents.");
         }
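One caveat with this snippet: `new String(byte[])` decodes with the JVM's platform default charset, which may not match the encoding the file was saved in. A minimal sketch of decoding with an explicit charset, assuming the caller already knows the file's encoding (how to look that up in OpenCms is not covered in this thread); PlainBody, decodeBody and hasContents are hypothetical names:

```java
import java.nio.charset.Charset;

// Hypothetical helper (not part of OpenCms).
public class PlainBody {

    // Decode raw file bytes with an explicit charset instead of the
    // platform default used by new String(byte[]).
    public static String decodeBody(byte[] raw, Charset encoding) {
        if (raw == null) {
            return "";
        }
        return new String(raw, encoding);
    }

    // Mirrors the c.length() > 0 check in the snippet above.
    public static boolean hasContents(String body) {
        return body != null && body.length() > 0;
    }
}
```

With this, the first line of the snippet would become something like `String c = PlainBody.decodeBody(cmso.readFile(absPath).getContents(), encoding);`, where `encoding` is whatever charset the site content actually uses.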

Hope that helps.

Matt
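Once the body text has been fetched this way (and, for a Page, reduced from HTML to text), the highlighting Trevor asks about in the quoted message can run over it. The following is a naive, regex-based stand-in for illustration only, not any particular highlighting library; SimpleHighlighter and highlight are hypothetical names, and the query terms are assumed to be plain words.

```java
import java.util.regex.Pattern;

// Hypothetical, naive highlighter: not Lucene's and not OpenCms's.
public class SimpleHighlighter {

    // Wraps whole-word, case-insensitive matches of any term in <b>...</b>.
    public static String highlight(String text, String[] terms) {
        StringBuilder alternatives = new StringBuilder();
        for (String term : terms) {
            if (term == null || term.isEmpty()) {
                continue;
            }
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            // Pattern.quote so regex metacharacters in a term (e.g. '.')
            // are matched literally
            alternatives.append(Pattern.quote(term));
        }
        if (alternatives.length() == 0) {
            return text;
        }
        Pattern pattern = Pattern.compile(
                "\\b(?:" + alternatives + ")\\b", Pattern.CASE_INSENSITIVE);
        // $0 is the whole match, so the original casing is preserved
        return pattern.matcher(text).replaceAll("<b>$0</b>");
    }
}
```

For example, `highlight("Lucene indexes text; lucene rocks", new String[] {"lucene"})` yields `<b>Lucene</b> indexes text; <b>lucene</b> rocks`. The input is treated as plain text and is not HTML-escaped, so body text containing `<` or `&` should be escaped before the result is put on a page. Lucene's analyzers may stem or otherwise normalize terms, so a literal match like this can miss some hits the search itself found.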

Trevor Lee wrote:
> Hi Matt,
> 
> I'm trying to integrate a text-highlighting tool with the search results,
> which requires the body to be retrieved from the Document object.
> 
> Which OpenCms API method should I use, given the Document object, to
> retrieve the body? (Assuming I don't change the *Document.java files as
> you suggested.)
> 
> Thanks : )
> 
> Cheers
> Trevor
> 
> -----Original Message-----
> From: opencms-dev-admin at opencms.org
> [mailto:opencms-dev-admin at opencms.org]On Behalf Of M Butcher
> Sent: Wednesday, November 26, 2003 5:28 PM
> To: opencms-dev at opencms.org
> Subject: Re: [opencms-dev] Lucene - Searchable fields
> 
> 
> 
> "body" is the correct field name. The body is put in an UnStored Field,
> which, according to the Javadocs for Lucene:
> 
> "Constructs a String-valued Field that is tokenized and indexed, but
> that is not stored in the index."
> 
> Body was implemented this way because there is no point in storing the
> entire body of a document in the index when it can easily be fetched
> through the OpenCms API.... I guess I should say, because _I_ couldn't
> see a point in storing the entire body. You may be able to think of a
> perfectly good reason. ;-)
> 
> If, for some reason, you decide you need to change from an unstored to
> some other type of Field, you would need to change each of the
> *Document.java classes that index bodies (e.g. no need to touch
> BodylessDocument.java) and recompile (etc.). Shouldn't be bad, though,
> since it only requires a one-line change per file.
> 
> Matt
> 
> Trevor Lee wrote:
> 
>>Hi,
>>
>>I was wondering what the searchable fields are.
>>
>>From the simple_search.jsp:
>>Last modified date is last_modified (i.e. doc.get("last_modified"))
>>title is title
>>
>>What is the corresponding field for body?
>>I've tried doc.get("body") and doc.get("text"), and both seem to return
>>null. But doc.get("title") works fine for the corresponding doc object.
>>
>>If anyone has any ideas it would be much appreciated.
>>
>>Cheers
>>Trevor
>>
>>
>>_______________________________________________
>>This mail is sent to you from the opencms-dev mailing list
>>To change your list options, or to unsubscribe from the list, please visit
>>http://mail.opencms.org/mailman/listinfo/opencms-dev
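To illustrate why doc.get("body") returns null in the exchange above while doc.get("title") works: an UnStored field is tokenized and indexed, so searches on it match, but its original value is not kept, so there is nothing for get() to return. The class below is a toy model of that behaviour, not Lucene's API; ToyDocument and its methods are hypothetical. In the Lucene 1.x API of the time, the stored, tokenized counterpart of Field.UnStored is Field.Text, so the one-line change Matt mentions would presumably swap one for the other (we have not checked the *Document.java sources).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only (NOT Lucene): every field is tokenized and indexed,
// but only fields added with store=true keep their original value.
public class ToyDocument {
    private final Map<String, Set<String>> index = new HashMap<>();
    private final Map<String, String> storedValues = new HashMap<>();

    // Lower-case, split on non-word characters, and index the terms.
    public void add(String field, String value, boolean store) {
        Set<String> terms = new HashSet<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        index.put(field, terms);
        if (store) {
            storedValues.put(field, value);
        }
    }

    // Searching works for stored and unstored fields alike.
    public boolean matches(String field, String term) {
        Set<String> terms = index.get(field);
        return terms != null && terms.contains(term.toLowerCase(Locale.ROOT));
    }

    // Like doc.get(...): returns null for fields that were not stored.
    public String get(String field) {
        return storedValues.get(field);
    }
}
```

Adding "title" with store=true and "body" with store=false reproduces Trevor's observation: the body still matches a search for one of its words, yet get("body") is null, which is why Matt's answer is to fetch the body again through the OpenCms API.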




