[opencms-dev] Sort search results weighting popularity and age

Jonathan Woods jonathan.woods at scintillance.com
Thu May 18 11:07:11 CEST 2006


Alessandro -

I guess I was concentrating on the 'search' part, rather than retrieval.
Your steps might go something like the following (I'm using some of them in
my current development):

1.  First, you need some way of recording accesses - I haven't thought about
how best to go about that.

2.  Then you need to make sure that at index time, extra Lucene Field(s) are
added to the Lucene Documents placed in the index, and that these Fields
reflect the number of accesses (and anything else you want).  As you
probably know, the way to influence how a document gets indexed is to write
a Lucene Document factory class implementing I_CmsDocumentFactory (and maybe
even extending or delegating to CmsDocumentXmlContent or its superclass
A_CmsVfsDocument); then the name of your factory class needs to appear in
opencms-search.xml, in a /opencms/search/documenttypes/documenttype node.
Your doc factory class can add and boost Fields as it sees fit to model the
significance of various factors, though ordering and relevance are best
controlled mainly at search time.

3.  And finally, you need a collector which takes notice of Lucene stuff
rather than just VFS resources and their properties/attributes.  This is the
thing which would be of most general use to OpenCms folk, I think: the power
of Lucene combined with the flexibility of collectors.  That's what I was
banging on about below.  The only trouble is that collectors deal with lists
of CmsResources rather than, say, Lucene Hits.  I'm just about to add
something to my implementation which makes Hits available too, one of the
advantages being that you don't need to read in (potentially large) document
contents just to get out the hit information.
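
For illustration only, here is a minimal sketch of how a collector (step 3)
might combine the search relevance score with an access count and content age
when ordering results. Everything in it is an assumption rather than OpenCms or
Lucene API: ScoredHit and PopularityAgeRanker are hypothetical names, relevance
is assumed to be already normalised to 0..1, and the log-based popularity boost
and half-life age decay are just one plausible weighting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one search hit; NOT an OpenCms or Lucene class. */
final class ScoredHit {
    final String path;       // VFS path of the resource
    final double relevance;  // text relevance score, assumed normalised to 0..1
    final long accessCount;  // recorded accesses (step 1), assumed >= 0
    final double ageInDays;  // days since the publish date, assumed >= 0

    ScoredHit(String path, double relevance, long accessCount, double ageInDays) {
        this.path = path;
        this.relevance = relevance;
        this.accessCount = accessCount;
        this.ageInDays = ageInDays;
    }
}

/** Hypothetical ranker: relevance * popularity boost * age decay. */
final class PopularityAgeRanker {
    private final double halfLifeDays;

    PopularityAgeRanker(double halfLifeDays) {
        if (halfLifeDays <= 0) {
            throw new IllegalArgumentException("halfLifeDays must be positive");
        }
        this.halfLifeDays = halfLifeDays;
    }

    /** Combined score for a single hit. */
    double score(ScoredHit hit) {
        // Logarithm so that very popular pages do not swamp text relevance.
        double popularityBoost = 1.0 + Math.log10(1.0 + hit.accessCount);
        // The score halves every halfLifeDays days of age.
        double ageDecay = Math.pow(0.5, hit.ageInDays / halfLifeDays);
        return hit.relevance * popularityBoost * ageDecay;
    }

    /** Returns a new list sorted by combined score, best first. */
    List<ScoredHit> rank(List<ScoredHit> hits) {
        List<ScoredHit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(score(b), score(a)));
        return sorted;
    }

    public static void main(String[] args) {
        PopularityAgeRanker ranker = new PopularityAgeRanker(30.0);
        List<ScoredHit> hits = Arrays.asList(
            new ScoredHit("/news/old.html", 0.9, 5, 365.0),
            new ScoredHit("/news/fresh.html", 0.6, 20, 2.0));
        for (ScoredHit hit : ranker.rank(hits)) {
            System.out.println(hit.path + " -> " + ranker.score(hit));
        }
    }
}
```

In a real collector the relevance, access count and age would come from the
Lucene hit and the extra Fields added at index time; the 30-day half-life is an
arbitrary starting value to tune.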

Jon

-----Original Message-----
From: opencms-dev-bounces at opencms.org
[mailto:opencms-dev-bounces at opencms.org] On Behalf Of Alessandro Magnolo
Sent: 18 May 2006 09:47
To: The OpenCms mailing list
Subject: Re: [opencms-dev] Sort search results weighting popularity and age

On 5/17/06, Jonathan Woods <jonathan.woods at scintillance.com> wrote:
> Nice idea.  I've not done it, but I believe the search engine part 
> would best be done by writing your own collector implementation - i.e. 
> a class which implements I_CmsResourceCollector and is referred to 
> underneath the node /opencms/vfs/resources/collectors in opencms-vfs.xml.

mmmh... Are you sure that I_CmsResourceCollector is used by the search
engine?
AFAIK, it is used just to get the complete list of XML resources in a
folder.

Instead, I would like to do a free text search in the whole site, using the
Lucene index, and then take into account the date/popularity of each file when
sorting the results.

I'd like to work on this feature and then release it for inclusion in
OpenCms, but I need some guidance on where I should put the effort.

Is there a separate mailing list for OpenCms developers, where a discussion
on this subject could take place? (here it's mostly user-related questions).

regards,
Alessandro Magnolo

> -----Original Message-----
> From: opencms-dev-bounces at opencms.org
> [mailto:opencms-dev-bounces at opencms.org] On Behalf Of Alessandro 
> Magnolo
> Sent: 17 May 2006 11:04
> To: The OpenCms mailing list
> Subject: [opencms-dev] Sort search results weighting popularity and 
> age
>
> I would like to sort search results in OpenCms weighting the page 
> popularity (how many times the page was viewed) and age (how old the
> content is).
> Has anybody done this before?
>
> In OpenCms we have the publish date, so we know the age.
> Regarding popularity, maybe it's possible to retrieve this information 
> from the flex cache... is it?
>
> Then, the next step would be to make the search engine weigh these 
> values in the results list (together with relevancy as currently
> defined).
> Any suggestion on how to do it?
>
> regards,
> Alessandro Magnolo
>
> _______________________________________________
> This mail is sent to you from the opencms-dev mailing list To change 
> your list options, or to unsubscribe from the list, please visit 
> http://lists.opencms.org/mailman/listinfo/opencms-dev
