[opencms-dev] Lucene - Searchable fields

Trevor Lee Trevor.Lee at 4Loop.com.au
Fri Nov 28 05:31:02 CET 2003


Hi Matt,

Thanks a bundle for your help

Cheers
Trevor

-----Original Message-----
From: opencms-dev-admin at opencms.org
[mailto:opencms-dev-admin at opencms.org]On Behalf Of M Butcher
Sent: Thursday, November 27, 2003 2:42 PM
To: opencms-dev at opencms.org
Subject: Re: [opencms-dev] Lucene - Searchable fields



For a Page object, it is something like this (from PageDocument.java):

=======================================================================
cmso is a CmsObject and f is a CmsFile, which you can get from several
CmsObject methods (such as readFile()).
=======================================================================

        /*
         * Now we move on to the contents of the Page. What we need to do is
         * get the body file (from CmsXmlControlFile), then get the
         * <TEMPLATE><![CDATA[...]]></TEMPLATE> sections from it. That content
         * is HTML, so we need to parse it and keep just the text.
         * (Requires java.util.Iterator plus the OpenCms classes used below.)
         */
        // The control file names the body file that holds the page content.
        CmsXmlControlFile xCntrl = new CmsXmlControlFile(cmso, f);
        String contentsName = xCntrl.getElementTemplate("body");

        CmsFile contents = cmso.readFile(contentsName);
        CmsXmlTemplateFile xcContents = new CmsXmlTemplateFile(cmso, contents);

        // A body file can hold several sections (one per selector name);
        // we want all of them, so iterate over every selector.
        Iterator itSelect = xcContents.getAllSections().iterator();

        // For each body section, DO WHATEVER you need.
        while (itSelect.hasNext()) {
            String selector = (String) itSelect.next();
            String cdata = xcContents.getTemplateContent(null, null, selector);
            // cdata now holds the (HTML) content of this section.
        }
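The comment in the code above notes that the section content is HTML and still has to be reduced to plain text. How PageDocument.java does that is not shown here; as one possible sketch using only the JDK, the Swing HTML parser (ParserDelegator with an HTMLEditorKit.ParserCallback) can collect the text runs between tags. The class and method names are hypothetical, and the parser is lenient rather than a full HTML5 implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Sketch (hypothetical helper): reduce an HTML fragment to its text using the JDK parser. */
public class HtmlText {
    public static String extractText(String html) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback collector = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of character data between tags;
                // separate runs with a space so words do not run together.
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        try {
            // true = ignore any charset declared inside the HTML; we already have a String.
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            // A StringReader should not fail, but parse() declares IOException.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractText("<p>Hello <b>OpenCms</b> world</p>"));
    }
}
```

In the loop above, each cdata string would be passed through something like extractText before being indexed.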

=======================================================================
Getting the body of a JSP or plain-text document is much easier. From
PlainDocument.java:

         CmsFile file = cmso.readFile(absPath);
         // new String(byte[]) decodes with the platform's default charset.
         String c = new String(file.getContents());
         if (c.length() > 0) {
             // c now has the contents of the file.
             // DO WHATEVER YOU NEED TO DO...
         } else {
             A_OpenCms.log(A_OpenCms.C_OPENCMS_INFO,
                 "File " + file.getName() + " has no contents.");
         }
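The body has to be re-read through the OpenCms API like this because, as explained further down the thread, the indexer adds it as a Lucene UnStored field (in Lucene 1.x, Field.UnStored rather than a stored type such as Field.Text): it is tokenized and searchable, but its value is not kept, so doc.get("body") on a hit returns null. The toy model below is plain Java, not Lucene's API, and its names are hypothetical; it only illustrates the stored versus unstored distinction.

```java
import java.util.*;

/** Toy model (NOT the Lucene API) of indexed-but-unstored vs. stored fields. */
public class StoredFieldDemo {
    // Inverted index built from every field: term -> ids of documents containing it.
    private final Map<String, Set<Integer>> invertedIndex = new HashMap<>();
    // Raw values, kept only for fields marked as stored, per document id.
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    /** Adds a document; only field names in "stored" keep their original value. */
    public int addDocument(Map<String, String> fields, Set<String> stored) {
        int id = storedFields.size();
        Map<String, String> kept = new HashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            // Every field is tokenized and indexed, stored or not.
            for (String token : e.getValue().toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    invertedIndex.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
                }
            }
            if (stored.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        storedFields.add(kept);
        return id;
    }

    /** Ids of documents whose indexed fields contain the term. */
    public Set<Integer> search(String term) {
        return invertedIndex.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    /** Analogue of doc.get(name): null for fields that were indexed but not stored. */
    public String get(int id, String name) {
        return storedFields.get(id).get(name);
    }

    public static void main(String[] args) {
        StoredFieldDemo index = new StoredFieldDemo();
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "Lucene notes");
        doc.put("body", "Searchable fields in OpenCms");
        int id = index.addDocument(doc, Collections.singleton("title"));
        System.out.println(index.search("opencms")); // [0]  (body text is searchable)
        System.out.println(index.get(id, "title"));  // Lucene notes
        System.out.println(index.get(id, "body"));   // null (body was not stored)
    }
}
```

Storing the body instead would make get() return it, at the cost of keeping a full copy of every document in the index, which is the trade-off discussed below.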

Hope that helps.

Matt

Trevor Lee wrote:
> Hi Matt,
>
> I'm trying to integrate a text-highlighting tool with the search results,
> which requires the body to be retrieved from the Document object.
>
> Given the Document object, which OpenCms API method should I use to
> retrieve the body? (Assuming I don't change the *Document.java files as
> you suggested.)
>
> Thanks : )
>
> Cheers
> Trevor
>
> -----Original Message-----
> From: opencms-dev-admin at opencms.org
> [mailto:opencms-dev-admin at opencms.org]On Behalf Of M Butcher
> Sent: Wednesday, November 26, 2003 5:28 PM
> To: opencms-dev at opencms.org
> Subject: Re: [opencms-dev] Lucene - Searchable fields
>
>
>
> "body" is the correct field name. The body is put in an UnStored Field,
> which the Lucene Javadocs describe as follows:
>
> "Constructs a String-valued Field that is tokenized and indexed, but
> that is not stored in the index."
>
> Body was implemented this way because there is no point in storing the
> entire body of a document in the index when it can easily be fetched
> through the OpenCms API.... I guess I should say, because _I_ couldn't
> see a point in storing the entire body. You may be able to think of a
> perfectly good reason. ;-)
>
> If, for some reason, you decide you need to change from an unstored to
> some other type of Field, you would need to change each of the
> *Document.java classes that index bodies (e.g. no need to touch
> BodylessDocument.java) and recompile (etc.). Shouldn't be bad, though,
> since it only requires a one-line change per file.
>
> Matt
>
> Trevor Lee wrote:
>
>>Hi,
>>
>>I was wondering what the searchable fields are?
>>
>>From the simple_search.jsp:
>>Last modified date is last_modified (i.e. doc.get("last_modified"))
>>title is title
>>
>>what is the corresponding value for body?
>>I've tried doc.get("body") and doc.get("text") and both seem to return
>>null. But doc.get("title") works OK for the corresponding doc object.
>>
>>If anyone has any ideas it would be much appreciated.
>>
>>Cheers
>>Trevor
>>
>>
>>_______________________________________________
>>This mail is sent to you from the opencms-dev mailing list
>>To change your list options, or to unsubscribe from the list, please visit
>>http://mail.opencms.org/mailman/listinfo/opencms-dev
>
>
>

