[opencms-dev] a solution of double-byte content search (based on lucene and M. Butcher's modu)

石 羽森 shi_yusen at hotmail.com
Mon Jul 5 16:53:01 CEST 2004


Hi there,

The following is a way to get double-byte content search working. Try it if 
you are on a CJK project.

1. Install net.grcomputing.opencms.search.lucene 1.5, which contains 
lucene-1.3-final.jar. Lucene 1.3 fixed StandardTokenizer's handling of CJK 
characters (Chinese, Japanese and Korean ideograms), so that each ideogram 
becomes a separate token instead of a whole run being merged into one.

2. Modify the content branch of CmsNewExplorerFileList.java as follows:

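// Look up the search form for the current filter and read the raw query.
CmsSearchFormObject searchForm =
    (CmsSearchFormObject) ((Hashtable) session.getValue("ocms_search.allfilter")).get(currentFilter);

String query = searchForm.getValue01();
SearchHelper search = new SearchHelper(cms);

// Run the query through StandardAnalyzer so that CJK characters are tokenized.
// If parsing fails, fall back to the raw query string; the original empty
// catch left tempquery null and tempquery.toString() then threw a
// NullPointerException.
String luceneQuery = query;
try {
    Analyzer analyzer = new StandardAnalyzer();
    Query tempquery = QueryParser.parse(query, "", analyzer);
    luceneQuery = tempquery.toString();
} catch (Exception e) {
    // the query could not be parsed; luceneQuery keeps the raw input
}
Hits hits = search.doSimpleSearch(luceneQuery);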

int i, j = hits.length();

if(j == 0) {
    content.append("<h2>Your search found no matches. Please try again.</h2>");
} else {
    float score;
    Document doc;
    String tLastMod;
    if(j == 1)
        content.append("<h2 class=\"search-matches\">Your search found 1 match.</h2>");
    else
        content.append("<h2 class=\"search-matches\">Your search found " + Integer.toString(j) + " matches.</h2>");

    // For each hit, get the Document and print out some information
    // (including a link) about each item that matches.
    for(i = 0; i<j; ++i) {
        score = hits.score(i);
        doc = hits.doc(i);
        String lms = doc.get("last_modified");
        if(lms != null && !"".equals(lms))
            tLastMod = DateField.stringToDate(lms).toString();
        else tLastMod = "unknown";

        //tLastMod = "unknown";
        content.append("<p class=\"search-hit\"><b class=\"search-hit-title\">"
            + "<a href=\"" + cms.link(doc.get("abs_path")) + "\" class=\"search-hit-link\">"
            + doc.get("title") + "</a></b><br><i class=\"search-hit-score\">");
        //out.print(score); // Score is between 0.0 and 1.0
        content.append("</i> " + doc.get("description")
            + " <br><span class=\"smalltext\">(Last modified: " + tLastMod + ")</span></p>");
    }
}

3. Set searchbylucene in registry.xml to on.

4. Compile and update opencms.jar.

5. Restart Tomcat.

Now you can search for double-byte phrases.
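
To show why the per-ideogram tokenizing from step 1 makes phrase search work, here is a small stand-alone sketch in plain Java, with no Lucene or OpenCms on the classpath. It is not Lucene's StandardTokenizer. The class CjkPhraseSketch, its methods, and the set of scripts treated as CJK are all made up for illustration. The idea it demonstrates: once every ideogram is its own token, a phrase such as 全文检索 is just four tokens that have to appear next to each other in the document.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is NOT Lucene's StandardTokenizer.
class CjkPhraseSketch {

    // Scripts treated as CJK here (an assumed, simplified set).
    static boolean isCjk(int codePoint) {
        UnicodeScript s = UnicodeScript.of(codePoint);
        return s == UnicodeScript.HAN || s == UnicodeScript.HIRAGANA
            || s == UnicodeScript.KATAKANA || s == UnicodeScript.HANGUL;
    }

    // Each CJK character becomes its own token; runs of other
    // letters/digits are kept together as one lower-cased word.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isCjk(cp) && Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
                continue;
            }
            if (word.length() > 0) {   // flush the pending non-CJK word
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (isCjk(cp)) {
                tokens.add(new String(Character.toChars(cp)));
            }
        }
        if (word.length() > 0) tokens.add(word.toString());
        return tokens;
    }

    // A phrase matches when its tokens occur consecutively in the document.
    static boolean containsPhrase(String document, String phrase) {
        List<String> doc = tokenize(document);
        List<String> query = tokenize(phrase);
        return !query.isEmpty() && Collections.indexOfSubList(doc, query) >= 0;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source ASCII-only. The document reads
        // "OpenCms" followed by six Han characters; the last four are the
        // phrase meaning "full-text search".
        String document = "OpenCms \u652f\u6301\u5168\u6587\u68c0\u7d22";
        String phrase = "\u5168\u6587\u68c0\u7d22";
        String reordered = "\u68c0\u7d22\u5168\u6587";

        System.out.println(tokenize(document));
        System.out.println(containsPhrase(document, phrase));
        System.out.println(containsPhrase(document, reordered));
    }
}
```

With tokens that merge a whole run of ideograms, the six-character run in the sample document would be a single token and the four-character phrase would not match it. With one token per ideogram, the phrase matches and the reordered phrase does not.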

Shi Yusen
shiys at langhua.cn
