[opencms-dev] Lucene-search: stop words aren't displayed in searchresultlist
Jason Trump
jason.trump at brulant.com
Tue May 30 16:21:16 CEST 2006
That's a good idea, and it would avoid the need for a patch. But using
the analyzer to normalize text for highlights struck me as an approach
that is likely to keep producing problems, so a patch might be good for
the product anyway.
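
For illustration, here is a minimal sketch of the difference (not the
actual patch, and not OpenCms code). The class and method names are made
up, and the stop word list is a hypothetical stand-in for what a Lucene
analyzer would filter out; the only real technique shown is collapsing
whitespace with a regular expression so that every word survives.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

public class ExcerptNormalizer {

    // Any run of whitespace (spaces, tabs, CR, LF) is treated as one separator.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical stop word list, used only to imitate what an analyzer drops.
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("and", "or", "the"));

    /** Regex approach: collapses whitespace and leaves every word intact. */
    public static String normalizeWhitespace(String text) {
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    /** Imitation of analyzer-style normalization: stop word tokens are lost. */
    public static String analyzerLike(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : WHITESPACE.split(text.trim())) {
            if (STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                continue; // this is the side effect that makes excerpts unreadable
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Bread and\r\n  butter,\tor the jam";
        System.out.println(normalizeWhitespace(raw));
        System.out.println(analyzerLike(raw));
    }
}
```

With the input above, the regex version yields "Bread and butter, or the
jam", while the analyzer-like version yields "Bread butter, jam", which
is the kind of excerpt Christian described.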
jt
> -----Original Message-----
> From: opencms-dev-bounces at opencms.org
> [mailto:opencms-dev-bounces at opencms.org] On Behalf Of Jonathan Woods
> Sent: Monday, May 29, 2006 11:48 PM
> To: 'The OpenCms mailing list'
> Subject: RE: [opencms-dev] Lucene-search: stop words aren't
> displayed in searchresultlist
>
> Jason -
>
> I can tell this problem is in my near future too. Is it really necessary
> to create a patch? I was hoping to specify an analyser class in
> opencms-search.xml and get round the problem that way.
>
> Jon
>
> -----Original Message-----
> From: opencms-dev-bounces at opencms.org
> [mailto:opencms-dev-bounces at opencms.org] On Behalf Of Jason Trump
> Sent: 30 May 2006 02:19
> To: The OpenCms mailing list
> Subject: RE: [opencms-dev] Lucene-search: stop words aren't displayed
> in searchresultlist
>
> I ran into this exact problem. The code that generates the highlights is
> using the lucene analyzer to normalize whitespace (remove carriage returns
> and such, I think), which has the unpleasant side effect of stripping out
> words like "and", "or", and "the".
>
> I have a patch to the source code which replaces the lucene analyzer
> approach with a regular expression match that normalizes the whitespace
> but leaves the text intact. If you create an entry in bugzilla and email me
> the bug number, I'll attach my patch to the bug and maybe they'll accept it.
>
> jason
>
> > -----Original Message-----
> > From: opencms-dev-bounces at opencms.org
> > [mailto:opencms-dev-bounces at opencms.org] On Behalf Of Christian Steinert
> > Sent: Sunday, May 28, 2006 9:56 AM
> > To: The OpenCms mailing list
> > Subject: [opencms-dev] Lucene-search: stop words aren't displayed in
> > searchresultlist
> >
> > Dear List,
> >
> > I stumbled over a strange problem when starting to work with the search
> > function.
> > The search itself is working fine, but the preview snippets in the
> > result listing will not contain words like "and" and "or", which of
> > course makes things impossible to read.
> >
> > Did anyone experience something similar? Or is there anyone for whom it
> > just worked instead?
> > I mean, I understand that lucene will not *search* for these words, but
> > is there a way to get them *displayed* in the preview anyway?
> >
> > any ideas or any working code/config-combinations are appreciated.
> >
> > thanks a lot.
> > christian
> >
> >
> >
> >
> >
> > My search config is rather trivial; in particular, I did not change any
> > analyzers or indexers. Anyway, I'll just post my search code and search
> > config here (though neither is much modified from its standard
> > version...)
> >
> >
> > search code ====
> > <%@ page buffer="none" import="org.opencms.main.*, org.opencms.search.*,
> > org.opencms.file.*, org.opencms.jsp.*, java.util.*" %><%
> >
> > // Create a JSP action element
> > org.opencms.jsp.CmsJspActionElement cms = new
> > CmsJspActionElement(pageContext, request, response);
> >
> > // Get the search manager
> > CmsSearchManager searchManager = OpenCms.getSearchManager(); %>
> >
> > <jsp:useBean id="search" scope="request"
> > class="org.opencms.search.CmsSearch">
> > <!-- <jsp:setProperty name = "search" property="matchesPerPage"
> > param="matchesperpage"/>-->
> > <!-- <jsp:setProperty name = "search" property="displayPages"
> > param="displaypages"/>-->
> > <jsp:setProperty name = "search" property="matchesPerPage"
> > value="10"/>
> > <jsp:setProperty name = "search" property="displayPages"
> > value="10"/>
> > <jsp:setProperty name = "search" property="*"/>
> > <%
> > search.init(cms.getCmsObject());
> > search.setField( new String[]{
> > "title","keywords","description","content" } );
> > %>
> > </jsp:useBean>
> >
> > <html>
> > <head>
> > <title>Search result</title>
> > </head>
> >
> > <body>
> > <h1>Search result</h1>
> >
> > <%
> > int pageno = 1;
> > String srchPageParam = request.getParameter("searchPage");
> >
> > if (srchPageParam!=null) {
> > pageno = Integer.parseInt(srchPageParam);
> > }
> >
> > int itemsPerPage = search.getMatchesPerPage();
> > List result = search.getSearchResult();
> > int firstResultNr = ((pageno-1)*itemsPerPage)+1;
> > // result is checked for null further down, so guard before calling size()
> > int lastResultNr = firstResultNr+(result == null ? 0 : result.size())-1;
> > int totalResultCount = search.getSearchResultCount();
> >
> > String fields = search.getFields();
> > if (fields==null) {
> > fields = request.getParameter("fields");
> > }
> >
> > if (result == null && search.getLastException() != null) {
> > %>
> > <h3>Error</h3>
> > <%= search.getLastException().toString() %>
> > <%
> > } else if ( totalResultCount==0 ) {
> > %><p>There are no documents matching your query <strong><%=
> > search.getQuery() %></strong>.</p>
> > <p>Suggestions:</p>
> > <ul><li>Check for possible spelling errors in your search,</li>
> > <li>Try searching for different or less specific terms.</li></ul>
> > <%
> >
> > } else {
> > //ListIterator iterator = result.listIterator();
> > %><p>Showing results <%=firstResultNr %> to <%=lastResultNr%>
> > of <%=totalResultCount%> for <strong><%= search.getQuery() %></strong></p>
> > <%
> > //while (iterator.hasNext()) {
> > for (int i=0;i<result.size();i++){
> > CmsSearchResult entry = (CmsSearchResult)result.get(i);
> > //(CmsSearchResult)iterator.next();
> > %>
> > <p><a href="<%=
> > cms.link(cms.getRequestContext().removeSiteRoot(entry.getPath()))
> > %>"><%= entry.getTitle() %></a><br />
> > <%--
> > entry.getKeywords();
> > entry.getDescription()
> > entry.getDateLastModified()
> > --%>
> > <%= entry.getExcerpt() %>
> > </p>
> > <%
> > }
> > }
> >
> > %><p><%
> > if (search.getPreviousUrl() != null) {
> > %><a href="<%= cms.link(search.getPreviousUrl())
> > %>&fields=<%= fields %>">&lt;&lt; Previous</a> <%
> > }
> > Map pageLinks = search.getPageLinks();
> > Iterator i=pageLinks.keySet().iterator();
> > while (i.hasNext()) {
> > int pageNumber = ((Integer)i.next()).intValue();
> > String pageLink = cms.link((String)pageLinks.get(new
> > Integer(pageNumber)));
> > if (pageNumber != pageno) {
> > %><a href="<%= pageLink %>&fields=<%= fields %>"><%=
> > pageNumber %></a> <%
> > } else {
> > %><span class="currentpage"><%= pageNumber %></span> <%
> > }
> > }
> > if (search.getNextUrl()!= null) {
> > %><a href="<%= cms.link(search.getNextUrl())
> > %>&fields=<%= fields %>">Next &gt;&gt;</a><%
> > }
> > %></p>
> > </body>
> > </html>
> >
> >
> > search-config.xml ====
> > <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE opencms SYSTEM
> > "http://www.opencms.org/dtd/6.0/opencms-search.dtd">
> >
> > <opencms>
> > <search>
> > <cache>8</cache>
> > <directory>index</directory>
> > <timeout>60000</timeout>
> > <excerpt>1024</excerpt>
> >
> >
> > <highlighter>org.opencms.search.documents.CmsTermHighlighterHtml</highlighter>
> > <documenttypes>
> > <documenttype>
> > <name>generic</name>
> >
> > <class>org.opencms.search.documents.CmsDocumentGeneric</class>
> > <mimetypes/>
> > <resourcetypes>
> > <resourcetype>*</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > <documenttype>
> > <name>html</name>
> >
> > <class>org.opencms.search.documents.CmsDocumentHtml</class>
> > <mimetypes>
> > <mimetype>text/html</mimetype>
> > </mimetypes>
> > <resourcetypes>
> > <resourcetype>plain</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > <documenttype>
> > <name>image</name>
> >
> > <class>org.opencms.search.documents.CmsDocumentGeneric</class>
> > <mimetypes/>
> > <resourcetypes>
> > <resourcetype>image</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > <documenttype>
> > <name>jsp</name>
> >
> > <class>org.opencms.search.documents.CmsDocumentPlainText</class>
> > <mimetypes/>
> > <resourcetypes>
> > <resourcetype>jsp</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > <documenttype>
> > <name>msexcel</name>
> >
> > <class>org.opencms.search.documents.CmsDocumentMsExcel</class>
> > <mimetypes>
> > <mimetype>application/vnd.ms-excel</mimetype>
> > </mimetypes>
> > <resourcetypes>
> > <resourcetype>binary</resourcetype>
> > <resourcetype>plain</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > <documenttype>
> > <name>mspowerpoint</name>
> >
> > <class>org.opencms.search.documents.CmsDocumentMsPowerPoint</class>
> > <mimetypes>
> > <mimetype>application/vnd.ms-powerpoint</mimetype>
> > </mimetypes>
> > <resourcetypes>
> > <resourcetype>binary</resourcetype>
> > <resourcetype>plain</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > <documenttype>
> > <name>msword</name>
> >
> > <class>org.opencms.search.documents.CmsDocumentMsWord</class>
> > <mimetypes>
> > <mimetype>application/msword</mimetype>
> > </mimetypes>
> > <resourcetypes>
> > <resourcetype>binary</resourcetype>
> > <resourcetype>plain</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > <documenttype>
> > <name>pdf</name>
> > <class>org.opencms.search.documents.CmsDocumentPdf</class>
> > <mimetypes>
> > <mimetype>application/pdf</mimetype>
> > </mimetypes>
> > <resourcetypes>
> > <resourcetype>binary</resourcetype>
> > <resourcetype>plain</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > <documenttype>
> > <name>rtf</name>
> > <class>org.opencms.search.documents.CmsDocumentRtf</class>
> > <mimetypes>
> > <mimetype>text/rtf</mimetype>
> > <mimetype>application/rtf</mimetype>
> > </mimetypes>
> > <resourcetypes>
> > <resourcetype>binary</resourcetype>
> > <resourcetype>plain</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > <documenttype>
> > <name>text</name>
> >
> > <class>org.opencms.search.documents.CmsDocumentPlainText</class>
> > <mimetypes>
> > <mimetype>text/html</mimetype>
> > <mimetype>text/plain</mimetype>
> > </mimetypes>
> > <resourcetypes>
> > <resourcetype>plain</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > <documenttype>
> > <name>xmlcontent</name>
> >
> > <class>org.opencms.search.documents.CmsDocumentXmlContent</class>
> > <mimetypes/>
> > <resourcetypes>
> > <resourcetype>*</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > <documenttype>
> > <name>xmlpage</name>
> >
> > <class>org.opencms.search.documents.CmsDocumentXmlPage</class>
> > <mimetypes>
> > <mimetype>text/html</mimetype>
> > </mimetypes>
> > <resourcetypes>
> > <resourcetype>xmlpage</resourcetype>
> > </resourcetypes>
> > </documenttype>
> >
> >
> > <documenttype>
> > <name>ba_audio</name>
> >
> > <class>org.opencms.search.documents.CmsDocumentPlainText</class>
> > <mimetypes>
> > <mimetype>text/plain</mimetype>
> > </mimetypes>
> > <resourcetypes>
> > <resourcetype>ba_audio</resourcetype>
> > </resourcetypes>
> > </documenttype>
> > </documenttypes>
> > <analyzers>
> > <analyzer>
> >
> > <class>org.apache.lucene.analysis.de.GermanAnalyzer</class>
> > <locale>de</locale>
> > </analyzer>
> > <analyzer>
> >
> > <class>org.apache.lucene.analysis.standard.StandardAnalyzer</class>
> > <locale>en</locale>
> > </analyzer>
> > <analyzer>
> >
> > <class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
> > <stemmer>French</stemmer>
> > <locale>fr</locale>
> > </analyzer>
> > <analyzer>
> >
> > <class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
> > <stemmer>Italian</stemmer>
> > <locale>it</locale>
> > </analyzer>
> > </analyzers>
> > <indexes>
> > <index>
> > <name>de</name>
> > <rebuild>auto</rebuild>
> > <project>Online</project>
> > <locale>de</locale>
> > <sources>
> > <source>de</source>
> > </sources>
> > </index>
> > <index>
> > <name>en</name>
> > <rebuild>auto</rebuild>
> > <project>Online</project>
> > <locale>en</locale>
> > <sources>
> > <source>en</source>
> > </sources>
> > </index>
> > </indexes>
> > <indexsources>
> > <indexsource>
> > <name>de</name>
> > <indexer class="org.opencms.search.CmsVfsIndexer"/>
> > <resources>
> > <resource>/sites/default/de/about/</resource>
> > <resource>/sites/default/de/archives/</resource>
> > </resources>
> > <documenttypes-indexed>
> > <name>html</name>
> > <name>image</name>
> > <name>msexcel</name>
> > <name>mspowerpoint</name>
> > <name>msword</name>
> > <name>pdf</name>
> > <name>rtf</name>
> > <name>xmlcontent</name>
> > <name>xmlpage</name>
> > </documenttypes-indexed>
> > </indexsource>
> > <indexsource>
> > <name>en</name>
> > <indexer class="org.opencms.search.CmsVfsIndexer"/>
> > <resources>
> > <resource>/sites/default/en/about/</resource>
> > <resource>/sites/default/en/archives/</resource>
> > </resources>
> > <documenttypes-indexed>
> > <name>xmlpage</name>
> > <name>xmlcontent</name>
> > <name>rtf</name>
> > <name>pdf</name>
> > <name>msword</name>
> > <name>mspowerpoint</name>
> > <name>msexcel</name>
> > <name>image</name>
> > <name>html</name>
> > <name>ba_article</name>
> > </documenttypes-indexed>
> > </indexsource>
> > </indexsources>
> > </search>
> > </opencms>
> >
> >
> >
> > _______________________________________________
> > This mail is sent to you from the opencms-dev mailing list
> > To change your list options, or to unsubscribe from the list, please visit
> > http://lists.opencms.org/mailman/listinfo/opencms-dev
>