[opencms-dev] Lucene - Searchable fields
Trevor Lee
Trevor.Lee at 4Loop.com.au
Fri Nov 28 05:31:02 CET 2003
Hi Matt,
Thanks a bundle for your help.
Cheers
Trevor
-----Original Message-----
From: opencms-dev-admin at opencms.org
[mailto:opencms-dev-admin at opencms.org]On Behalf Of M Butcher
Sent: Thursday, November 27, 2003 2:42 PM
To: opencms-dev at opencms.org
Subject: Re: [opencms-dev] Lucene - Searchable fields
For a Page object, it is something like this (from PageDocument.java):
=======================================================================
cmso is a CmsObject and f is a CmsFile, which you can get from a bunch
of methods of CmsObject (like readFile() ).
=======================================================================
/*
 * Now we move on to the contents of the Page. What we need to do is
 * get the body file (from CmsXmlControlFile), then get the
 * <TEMPLATE><![CDATA[]]></TEMPLATE> from it. That info is in HTML,
 * so we need to parse that out and get just the text.
 */
CmsXmlControlFile xCntrl = new CmsXmlControlFile(cmso, f);
String contentsName = xCntrl.getElementTemplate("body");
CmsFile contents = cmso.readFile(contentsName);
CmsXmlTemplateFile xcContents = new CmsXmlTemplateFile(cmso, contents);

// We want all existing bodies, so get all body selector names.
Iterator itSelect = xcContents.getAllSections().iterator();
String cdata;
// For each body, DO WHATEVER.
while (itSelect.hasNext()) {
    cdata = xcContents.getTemplateContent(null, null, (String) itSelect.next());
    // cdata now has the contents of this body.
}
=======================================================================
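The comment above mentions parsing the HTML to get just the text. A minimal sketch of that step follows, assuming a regex-based strip is good enough for indexing or highlighting purposes. The class name BodyText and the method toPlainText are hypothetical and are not part of OpenCms or Lucene; a real HTML parser would cope better with malformed markup.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenCms: reduces an HTML fragment to plain text.
public class BodyText {
    // Drop <script> and <style> elements together with their contents.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Any remaining tag, including ones that span several lines.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String toPlainText(String html) {
        if (html == null) {
            return "";
        }
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode only a handful of common entities; "&amp;" goes last so that
        // "&amp;lt;" ends up as the literal text "&lt;" rather than "<".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("<h1>News</h1><p>Fish &amp; chips</p>"));
    }
}
```

Inside the while loop above, each cdata value could be passed through something like BodyText.toPlainText(cdata) before it is used.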
To get the bodies of JSPs or plain text documents, things are much
easier. From PlainDocument.java:
String c = new String(cmso.readFile(absPath).getContents());
if (c.length() > 0) {
    // DO WHATEVER YOU NEED TO DO...
    // c now has the contents of the file.
} else {
    A_OpenCms.log(A_OpenCms.C_OPENCMS_INFO,
        "File " + f.getName() + " has no contents.");
}
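Since the goal in this thread is to highlight search hits in the fetched body, here is a rough sketch of what could be done with the string once it has been read. It is a naive whole-word, case-insensitive matcher and not Lucene's own highlighting; it does not apply the analyzer used at index time, so stemmed or otherwise normalised terms would be missed. The class name SimpleHighlighter, the method highlight and the choice of <b> tags are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenCms or Lucene.
public class SimpleHighlighter {

    // Wraps every whole-word, case-insensitive occurrence of a term in <b>...</b>.
    public static String highlight(String text, String[] terms) {
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (term == null || term.length() == 0) {
                continue;
            }
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            // Quote the term so regex metacharacters in it are taken literally.
            alternation.append(Pattern.quote(term));
        }
        if (alternation.length() == 0) {
            return text;
        }
        Pattern p = Pattern.compile("\\b(" + alternation + ")\\b",
                                    Pattern.CASE_INSENSITIVE);
        // $1 re-inserts the matched text, so the original casing is preserved.
        return p.matcher(text).replaceAll("<b>$1</b>");
    }

    public static void main(String[] args) {
        String body = "OpenCms uses Lucene for search";
        System.out.println(highlight(body, new String[] {"lucene", "search"}));
    }
}
```

The text passed in would be the c (or cdata) value obtained above, ideally after any HTML has been stripped, and the terms would come from the user's query.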
Hope that helps.
Matt
Trevor Lee wrote:
> Hi Matt,
>
> I'm trying to integrate a text-highlighting tool with the search results,
> which requires the body to be retrieved from the Document object.
>
> Which method of the OpenCms API should be used to retrieve the body, given
> the Document object? (Assuming I don't change the *Document.java files as
> you suggested.)
>
> Thanks : )
>
> Cheers
> Trevor
>
> -----Original Message-----
> From: opencms-dev-admin at opencms.org
> [mailto:opencms-dev-admin at opencms.org]On Behalf Of M Butcher
> Sent: Wednesday, November 26, 2003 5:28 PM
> To: opencms-dev at opencms.org
> Subject: Re: [opencms-dev] Lucene - Searchable fields
>
>
>
> "body" is the correct field name. The body is put in an UnStored Field,
> which, according to the Javadocs for Lucene:
>
> "Constructs a String-valued Field that is tokenized and indexed, but
> that is not stored in the index."
>
> Body was implemented this way because there is no point in storing the
> entire body of a document in the index when it can easily be fetched
> through the OpenCms API.... I guess I should say, because _I_ couldn't
> see a point in storing the entire body. You may be able to think of a
> perfectly good reason. ;-)
>
> If, for some reason, you decide you need to change from an unstored to
> some other type of Field, you would need to change each of the
> *Document.java classes that index bodies (e.g. no need to touch
> BodylessDocument.java) and recompile (etc.). Shouldn't be bad, though,
> since it only requires a one-line change per file.
>
> Matt
>
> Trevor Lee wrote:
>
>>Hi,
>>
>>I was wondering what the searchable fields are?
>>
>>From the simple_search.jsp:
>>Last modified date is last_modified (ie doc.get("last_modified"))
>>title is title
>>
>>what is the corresponding value for body?
>>I've tried doc.get("body") and doc.get("text") and both seem to return
>>null. But the doc.get("title") works ok for the corresponding doc object.
>>
>>If anyone has any ideas it would be much appreciated.
>>
>>Cheers
>>Trevor
>>
>>
>>_______________________________________________
>>This mail is sent to you from the opencms-dev mailing list
>>To change your list options, or to unsubscribe from the list, please visit
>>http://mail.opencms.org/mailman/listinfo/opencms-dev