[opencms-dev] Lucene-search: stop words aren't displayed in searchresultlist

Jonathan Woods jonathan.woods at scintillance.com
Tue May 30 05:47:52 CEST 2006


Jason -

I can tell this problem is in my near future too.  Is it really necessary to
create a patch?  I was hoping to specify an analyser class in
opencms-search.xml and get round the problem that way.

Jon

-----Original Message-----
From: opencms-dev-bounces at opencms.org
[mailto:opencms-dev-bounces at opencms.org] On Behalf Of Jason Trump
Sent: 30 May 2006 02:19
To: The OpenCms mailing list
Subject: RE: [opencms-dev] Lucene-search: stop words aren't displayed in
searchresultlist

I ran into this exact problem.  The code that generates the highlights uses
the Lucene analyzer to normalize whitespace (removing carriage returns and
such, I think), which has the unpleasant side effect of stripping out words
like "and", "or", and "the".

I have a patch to the source code which replaces the lucene analyzer
approach with a regular expression match that normalizes the whitespace, but
leaves the text intact.  If you create an entry in bugzilla and email me the
bug number, I'll attach my patch to the bug and maybe they'll accept it.
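
A rough sketch of the idea, not the actual patch: the class and method names
below are made up for illustration, and the stop-word list is a tiny stand-in
rather than Lucene's real one.  It contrasts a regex that only collapses
whitespace with an analyzer-like pass that also drops stop words.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of OpenCms or Lucene.
public class ExcerptNormalizer {

    // Any run of whitespace (spaces, tabs, CR, LF) collapses to one space.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Illustrative stop-word list only; assumed, not Lucene's actual list.
    private static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("and", "or", "the"));

    /** Regex approach: normalizes whitespace and leaves every word intact. */
    public static String normalizeWhitespace(String text) {
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    /** Stand-in for the analyzer approach: also drops stop words. */
    public static String stripStopWords(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : normalizeWhitespace(text).split(" ")) {
            if (token.isEmpty() || STOP_WORDS.contains(token.toLowerCase())) {
                continue; // skip empty tokens and stop words
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Salt  and\r\npepper,\tor   the\nlike";
        System.out.println(normalizeWhitespace(raw)); // Salt and pepper, or the like
        System.out.println(stripStopWords(raw));      // Salt pepper, like
    }
}
```

The first line of output is what a readable excerpt needs; the second shows
the kind of text the analyzer-based normalization ends up producing.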

jason

> -----Original Message-----
> From: opencms-dev-bounces at opencms.org
> [mailto:opencms-dev-bounces at opencms.org] On Behalf Of Christian Steinert
> Sent: Sunday, May 28, 2006 9:56 AM
> To: The OpenCms mailing list
> Subject: [opencms-dev] Lucene-search: stop words aren't displayed in 
> searchresultlist
> 
> Dear List,
> 
> I stumbled over a strange problem when starting to work with the search
> function.
> The search itself is working fine, but the preview snippets in the 
> result listing will not contain words like "and" and "or", which of 
> course makes things impossible to read.
> 
> Did anyone experience something similar? Or is there anyone for whom it
> just worked instead?
> I mean, I understand that Lucene will not *search* for these words, but
> is there a way to get them *displayed* in the preview anyway?
> 
> any ideas or any working code/config-combinations are appreciated.
> 
> thanks a lot.
> christian
> 
> 
> 
> 
> 
> My search config is rather trivial; in particular, I did not change any
> analyzers or indexers. Anyway, I'll just post my search config and
> search code here (though neither is heavily modified from its standard
> version...)
> 
> 
> search code ====
> <%@ page buffer="none" import="org.opencms.main.*, org.opencms.search.*,
> org.opencms.file.*, org.opencms.jsp.*, java.util.*" %><%
> 
>     // Create a JSP action element
>     org.opencms.jsp.CmsJspActionElement cms = new 
> CmsJspActionElement(pageContext, request, response);
> 
>     // Get the search manager
>     CmsSearchManager searchManager = OpenCms.getSearchManager(); %>
> 
> <jsp:useBean id="search" scope="request"
> class="org.opencms.search.CmsSearch">
> <!--    <jsp:setProperty name = "search" property="matchesPerPage"
> param="matchesperpage"/>-->
> <!--    <jsp:setProperty name = "search" property="displayPages"
> param="displaypages"/>-->
>     <jsp:setProperty name = "search" property="matchesPerPage"
> value="10"/>
>     <jsp:setProperty name = "search" property="displayPages"
> value="10"/>
>     <jsp:setProperty name = "search" property="*"/>
>     <%
>         search.init(cms.getCmsObject());
>         search.setField( new String[]{ 
> "title","keywords","description","content" } );
>     %>
> </jsp:useBean>
> 
> <html>
> <head>
> <title>Search result</title>
> </head>
> 
> <body>
> <h1>Search result</h1>
> 
> <%
>     int pageno = 1;
>     String srchPageParam = request.getParameter("searchPage");
> 
>     if (srchPageParam != null) {
>         pageno = Integer.parseInt(srchPageParam);
>     }
> 
>     int itemsPerPage = search.getMatchesPerPage();
>     List result = search.getSearchResult();
>     // result is checked for null further down, so guard before calling size()
>     int resultSize = (result == null) ? 0 : result.size();
>     int firstResultNr = ((pageno - 1) * itemsPerPage) + 1;
>     int lastResultNr = firstResultNr + resultSize - 1;
>     int totalResultCount = search.getSearchResultCount();
> 
>     String fields = search.getFields();
>     if (fields == null) {
>         fields = request.getParameter("fields");
>     }
> 
>         if (result == null  && search.getLastException() != null) {
>           %>
>           <h3>Error</h3>
>           <%= search.getLastException().toString() %>
>           <%
>         } else if ( totalResultCount==0 ) {
>           %><p>There are no documents matching your query <strong><%=
> search.getQuery() %></strong>.</p>
>             <p>Suggestions:</p>
>             <ul><li>Check for possible spelling errors in your search,</li>
>             <li>Try searching for different or less specific terms.</li></ul>
>           <%
> 
>         } else {
>           //ListIterator iterator = result.listIterator();
>           %><p>Showing results <%=firstResultNr %> to <%=lastResultNr%>
> of <%=totalResultCount%> for <strong><%= search.getQuery() %></strong></p>
>           <%
>             //while (iterator.hasNext()) {
>             for (int i=0;i<result.size();i++){
>               CmsSearchResult entry = (CmsSearchResult)result.get(i); 
> //(CmsSearchResult)iterator.next();
>           %>
>           <p><a href="<%=
> cms.link(cms.getRequestContext().removeSiteRoot(entry.getPath()))
> %>"><%= entry.getTitle() %></a><br />
>             <%--
>               entry.getKeywords();
>               entry.getDescription()
>               entry.getDateLastModified()
>             --%>
>             <%= entry.getExcerpt() %>
>           </p>
>           <%
>           }
>         }
> 
>         %><p><%
>       if (search.getPreviousUrl() != null) {
>             %><a href="<%= cms.link(search.getPreviousUrl())
> %>&fields=<%= fields %>">&lt;&lt; Previous</a> <%
>            }
>       Map pageLinks = search.getPageLinks();
>       Iterator i=pageLinks.keySet().iterator();
>       while (i.hasNext()) {
>         int pageNumber = ((Integer)i.next()).intValue();
>         String pageLink = cms.link((String)pageLinks.get(new
> Integer(pageNumber)));
>         if (pageNumber != pageno) {
>                   %><a href="<%= pageLink %>&fields=<%= fields %>"><%= 
> pageNumber %></a> <%
>         } else {
>                   %><span class="currentpage"><%= pageNumber %></span> <%
>         }
>     }
>     if (search.getNextUrl()!= null) {
>                 %><a href="<%= cms.link(search.getNextUrl())
> %>&fields=<%= fields %>">Next &gt;&gt;</a><%
>     }
> %></p>
> </body>
> </html>
> 
> 
> search-config.xml ====
> <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE opencms SYSTEM 
> "http://www.opencms.org/dtd/6.0/opencms-search.dtd">
> 
> <opencms>
>     <search>
>         <cache>8</cache>
>         <directory>index</directory>
>         <timeout>60000</timeout>
>         <excerpt>1024</excerpt>
> 
>         <highlighter>org.opencms.search.documents.CmsTermHighlighterHtml</highlighter>
>         <documenttypes>
>             <documenttype>
>                 <name>generic</name>
> 
> <class>org.opencms.search.documents.CmsDocumentGeneric</class>
>                 <mimetypes/>
>                 <resourcetypes>
>                     <resourcetype>*</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>             <documenttype>
>                 <name>html</name>
> 
> <class>org.opencms.search.documents.CmsDocumentHtml</class>
>                 <mimetypes>
>                     <mimetype>text/html</mimetype>
>                 </mimetypes>
>                 <resourcetypes>
>                     <resourcetype>plain</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>             <documenttype>
>                 <name>image</name>
> 
> <class>org.opencms.search.documents.CmsDocumentGeneric</class>
>                 <mimetypes/>
>                 <resourcetypes>
>                     <resourcetype>image</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>             <documenttype>
>                 <name>jsp</name>
> 
> <class>org.opencms.search.documents.CmsDocumentPlainText</class>
>                 <mimetypes/>
>                 <resourcetypes>
>                     <resourcetype>jsp</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>             <documenttype>
>                 <name>msexcel</name>
> 
> <class>org.opencms.search.documents.CmsDocumentMsExcel</class>
>                 <mimetypes>
>                     <mimetype>application/vnd.ms-excel</mimetype>
>                 </mimetypes>
>                 <resourcetypes>
>                     <resourcetype>binary</resourcetype>
>                     <resourcetype>plain</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>             <documenttype>
>                 <name>mspowerpoint</name>
> 
> <class>org.opencms.search.documents.CmsDocumentMsPowerPoint</class>
>                 <mimetypes>
>                     <mimetype>application/vnd.ms-powerpoint</mimetype>
>                 </mimetypes>
>                 <resourcetypes>
>                     <resourcetype>binary</resourcetype>
>                     <resourcetype>plain</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>             <documenttype>
>                 <name>msword</name>
> 
> <class>org.opencms.search.documents.CmsDocumentMsWord</class>
>                 <mimetypes>
>                     <mimetype>application/msword</mimetype>
>                 </mimetypes>
>                 <resourcetypes>
>                     <resourcetype>binary</resourcetype>
>                     <resourcetype>plain</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>             <documenttype>
>                 <name>pdf</name>
>                 <class>org.opencms.search.documents.CmsDocumentPdf</class>
>                 <mimetypes>
>                     <mimetype>application/pdf</mimetype>
>                 </mimetypes>
>                 <resourcetypes>
>                     <resourcetype>binary</resourcetype>
>                     <resourcetype>plain</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>             <documenttype>
>                 <name>rtf</name>
>                 <class>org.opencms.search.documents.CmsDocumentRtf</class>
>                 <mimetypes>
>                     <mimetype>text/rtf</mimetype>
>                     <mimetype>application/rtf</mimetype>
>                 </mimetypes>
>                 <resourcetypes>
>                     <resourcetype>binary</resourcetype>
>                     <resourcetype>plain</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>             <documenttype>
>                 <name>text</name>
> 
> <class>org.opencms.search.documents.CmsDocumentPlainText</class>
>                 <mimetypes>
>                     <mimetype>text/html</mimetype>
>                     <mimetype>text/plain</mimetype>
>                 </mimetypes>
>                 <resourcetypes>
>                     <resourcetype>plain</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>             <documenttype>
>                 <name>xmlcontent</name>
> 
> <class>org.opencms.search.documents.CmsDocumentXmlContent</class>
>                 <mimetypes/>
>                 <resourcetypes>
>                     <resourcetype>*</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>             <documenttype>
>                 <name>xmlpage</name>
> 
> <class>org.opencms.search.documents.CmsDocumentXmlPage</class>
>                 <mimetypes>
>                     <mimetype>text/html</mimetype>
>                 </mimetypes>
>                 <resourcetypes>
>                     <resourcetype>xmlpage</resourcetype>
>                 </resourcetypes>
>             </documenttype>
> 
> 
>             <documenttype>
>                 <name>ba_audio</name>
> 
> <class>org.opencms.search.documents.CmsDocumentPlainText</class>
>                 <mimetypes>
>                     <mimetype>text/plain</mimetype>
>                 </mimetypes>
>                 <resourcetypes>
>                     <resourcetype>ba_audio</resourcetype>
>                 </resourcetypes>
>             </documenttype>
>         </documenttypes>
>         <analyzers>
>             <analyzer>
> 
> <class>org.apache.lucene.analysis.de.GermanAnalyzer</class>
>                 <locale>de</locale>
>             </analyzer>
>             <analyzer>
> 
> <class>org.apache.lucene.analysis.standard.StandardAnalyzer</class>
>                 <locale>en</locale>
>             </analyzer>
>             <analyzer>
> 
> <class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
>                 <stemmer>French</stemmer>
>                 <locale>fr</locale>
>             </analyzer>
>             <analyzer>
> 
> <class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
>                 <stemmer>Italian</stemmer>
>                 <locale>it</locale>
>             </analyzer>
>         </analyzers>
>         <indexes>
>             <index>
>                 <name>de</name>
>                 <rebuild>auto</rebuild>
>                 <project>Online</project>
>                 <locale>de</locale>
>                 <sources>
>                     <source>de</source>
>                 </sources>
>             </index>
>             <index>
>                 <name>en</name>
>                 <rebuild>auto</rebuild>
>                 <project>Online</project>
>                 <locale>en</locale>
>                 <sources>
>                     <source>en</source>
>                 </sources>
>             </index>
>         </indexes>
>         <indexsources>
>             <indexsource>
>                 <name>de</name>
>                 <indexer class="org.opencms.search.CmsVfsIndexer"/>
>                 <resources>
>                     <resource>/sites/default/de/about/</resource>
>                     <resource>/sites/default/de/archives/</resource>
>                 </resources>
>                 <documenttypes-indexed>
>                     <name>html</name>
>                     <name>image</name>
>                     <name>msexcel</name>
>                     <name>mspowerpoint</name>
>                     <name>msword</name>
>                     <name>pdf</name>
>                     <name>rtf</name>
>                     <name>xmlcontent</name>
>                     <name>xmlpage</name>
>                 </documenttypes-indexed>
>             </indexsource>
>             <indexsource>
>                 <name>en</name>
>                 <indexer class="org.opencms.search.CmsVfsIndexer"/>
>                 <resources>
>                     <resource>/sites/default/en/about/</resource>
>                     <resource>/sites/default/en/archives/</resource>
>                 </resources>
>                 <documenttypes-indexed>
>                     <name>xmlpage</name>
>                     <name>xmlcontent</name>
>                     <name>rtf</name>
>                     <name>pdf</name>
>                     <name>msword</name>
>                     <name>mspowerpoint</name>
>                     <name>msexcel</name>
>                     <name>image</name>
>                     <name>html</name>
>                     <name>ba_article</name>
>                 </documenttypes-indexed>
>             </indexsource>
>         </indexsources>
>     </search>
> </opencms>
> 
> 
> 
> _______________________________________________
> This mail is sent to you from the opencms-dev mailing list. To change
> your list options, or to unsubscribe from the list, please visit
> http://lists.opencms.org/mailman/listinfo/opencms-dev
