[opencms-dev] Mapping OpenCMS structured content XML field to a SOLR field

Rüdiger Kurz r.kurz at alkacon.com
Thu Feb 14 15:25:41 CET 2013


Hi Arturo,

happy to hear that you were able to solve the problem. Just a small
addition: since values like E0000001 do not need any linguistic
analysis, you might want to index them as plain String values. To do
this, change your mapping within the XSD to:

<searchsetting element="numeroExpediente" searchcontent="true">
   <solrfield targetfield="numexp" sourcefield="*_exact" />
</searchsetting>

After touching/publishing you will have fields like:
numexp_<locale>_exact

This will improve your search performance, and you can also make use of
field-type-specific Solr features such as faceting or date handling
(sourcefield="*_dt").
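
If you then want to filter on the new field from your JSP, the field name
has to include the locale and the _exact suffix. The following Java
snippet is only a sketch of how such a filter parameter could be
assembled. The class and method names (ExactFieldQuery, exactField,
escape, filterQuery) are made up for illustration and are not part of the
OpenCms API, the locale "ca" and the value are taken from your example,
and the escaping is a simplified hand-written version (SolrJ ships
ClientUtils.escapeQueryChars for this purpose):

```java
final class ExactFieldQuery {

    // Characters with a special meaning in the Solr query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    private ExactFieldQuery() {
    }

    /** Builds the locale-specific field name, e.g. numexp_ca_exact. */
    static String exactField(String targetField, String locale) {
        return targetField + "_" + locale + "_exact";
    }

    /** Backslash-escapes special characters and whitespace in a value. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds one filter parameter such as fq=numexp_ca_exact:E0000001. */
    static String filterQuery(String targetField, String locale, String value) {
        return "fq=" + exactField(targetField, locale) + ":" + escape(value);
    }

    public static void main(String[] args) {
        String query = "fq=type:contrato";
        String text = "E0000001"; // hypothetical user input
        if (!text.isEmpty()) {
            query += "&" + filterQuery("numexp", "ca", text);
        }
        // prints fq=type:contrato&fq=numexp_ca_exact:E0000001
        System.out.println(query);
    }
}
```

A string built this way could take the place of the "&fq=numexp:" + text
part of your original query. URL encoding of the value is not covered by
this sketch.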

In any case, it is very good that you have added the missing locale to
your schema.xml, not least for text-based search, auto-completion,
spell-checking, or whatever other language-specific Solr features you
would like to use.

greetings
Rüdiger

On 14.02.2013 13:11, Arturo Martín Lladó wrote:
> Hi,
>
> 2013/2/14 Rüdiger Kurz <r.kurz at alkacon.com>:
>> Arturo,
>>
>> the mapping you defined in the XSD is not expected to create a field
>> 'numexp', even though you declared it explicitly in the schema.xml,
>> because XML contents can hold multilingual content. Instead, several
>> dynamic fields named numexp_<locale> should be created in the index, but
>> those are missing in your case as well.
>>
>> Having a closer look at the schema.xml you will find a dynamic field for
>> each language: *_en,*_de,*_el,*_es,*_fi,*_fr,*_hu,*_it
>>
>> Only languages that are defined as a locale in both opencms-system.xml
>> and schema.xml can be indexed correctly. If the locale is missing in
>> either the schema or the system configuration, indexing will fail.
>> Please check this.
>>
>
> Thank you VERY much!
>
> That's it! Right now I'm working with the locale "ca" (i.e. Catalan).
> The locale was defined in opencms-system.xml, but there was no field or
> type defined for it in the Solr schema.xml file. This is what I had to
> do in schema.xml:
>
> 1. Create the new text_ca type:
>
>      <!-- Catalan -->
>      <fieldType name="text_ca" class="solr.TextField" positionIncrementGap="100">
>        <analyzer>
>          <tokenizer class="solr.StandardTokenizerFactory"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"
>                  words="lang/stopwords_ca.txt" format="snowball"
>                  enablePositionIncrements="true"/>
>          <!-- Unfortunately, this class does not exist:
>               <filter class="solr.CatalanLightStemFilterFactory"/> -->
>          <!-- more aggressive:
>               <filter class="solr.SnowballPorterFilterFactory" language="Catalan"/> -->
>        </analyzer>
>      </fieldType>
>
> 2. Create the new text_ca field:
>
> <!-- Catchall for Catalan text fields -->
> <field name="text_ca" type="text_ca" indexed="true" stored="false" multiValued="true"/>
>
> (A copy-and-paste warning for you: I can see the comment "Catchall for
> German text fields" on all the fields below "text_de" :)
>
> 3. Create the new dynamic field for the "ca" locale:
>
> <dynamicField name="*_ca" type="text_ca" indexed="true" stored="true" multiValued="true"/>
>
> 4. Copy the new field to its generic "collector" text field:
>
>   <copyField source="*_ca"      dest="text_ca"/>
>
> 5. Restart Tomcat, touch the files and rebuild the Solr indexes.
>
> Now I can see the fields and query using them:
>
> <response>
>
> [...]
>
> <arr name="numexp_es"><str>E0000001</str></arr>
> <arr name="numexp_ca"><str>E0000001</str></arr>
>
> [...]
>
> </response>
>
> Again, thank you VERY much, Rüdiger :-)
>
> Regards.
>
>> Another reason for the missing field could be that you did not touch/publish
>> the expected resource after changing the XSD. The resource must have been
>> touched and/or published; otherwise the corresponding document will not
>> have the new field. Alternatively, to be really sure, you can rebuild
>> the whole index.
>>
>> regards
>> Rüdiger
>>
>> On 14.02.2013 11:01, Arturo Martín Lladó wrote:
>>
>>> Hi,
>>>
>>> This is the XML output of the following URL:
>>>
>>> http://localhost:8080/XXX/opencms/handleSolrSelect?fq=con_locales:*&fq=parent-folders:*&fq=path:/sites/default/ajuntament/.content/contrato/contrato_00002.html&fl=*,score
>>>
>>> <response>
>>>   <lst name="responseHeader">
>>>     <int name="status">0</int>
>>>     <int name="QTime">4</int>
>>>     <lst name="params">
>>>       <str name="q">*:*</str>
>>>       <str name="fl">*,score</str>
>>>       <str name="qt">edismax</str>
>>>       <int name="rows">10</int>
>>>       <arr name="fq">
>>>         <str>con_locales:*</str>
>>>         <str>parent-folders:*</str>
>>>         <str>path:/sites/default/ajuntament/.content/contrato/contrato_00002.html</str>
>>>       </arr>
>>>       <long name="start">0</long>
>>>     </lst>
>>>   </lst>
>>>   <result name="response" numFound="1" start="0">
>>>     <doc>
>>>       <str name="id">2c61e8eb-6153-11e2-bfbe-d1cbafdd7d70</str>
>>>       <str name="contentblob">[B:[B@4e2d1d1e</str>
>>>       <str name="path">/sites/default/ajuntament/.content/contrato/contrato_00002.html</str>
>>>       <str name="type">contrato</str>
>>>       <str name="suffix">.html</str>
>>>       <date name="created">2013-01-18T09:41:03Z</date>
>>>       <date name="lastmodified">2013-01-30T15:59:08Z</date>
>>>       <date name="contentdate">2013-02-13T14:51:16.803Z</date>
>>>       <date name="relased">1970-01-01T00:00:00Z</date>
>>>       <date name="expired">292278994-08-17T07:12:55.807Z</date>
>>>       <arr name="res_locales"><str>ca</str></arr>
>>>       <arr name="con_locales"><str>ca</str></arr>
>>>       <str name="template_prop">/system/modules/es.tresdigits.alcudiaweb/templates/basica2columnas.jsp</str>
>>>       <str name="default-file_prop">index.html</str>
>>>       <str name="notification-interval_prop">0</str>
>>>       <str name="NavPos_prop">2.0</str>
>>>       <str name="enable-notification_prop">false</str>
>>>       <str name="locale_prop">ca</str>
>>>       <str name="NavText_prop">Principal</str>
>>>       <str name="Title_prop">Títol del contracte 2</str>
>>>       <arr name="category">
>>>         <str>ca/</str>
>>>         <str>ca/tipoContrato/</str>
>>>         <str>ca/tipoContrato/tipo02/</str>
>>>         <str>ca/tipoDocumentoContratacion/</str>
>>>         <str>ca/tipoDocumentoContratacion/tipoDocumentoContratacion02/</str>
>>>         <str>ca/tipoFaseContratacion/</str>
>>>         <str>ca/tipoFaseContratacion/tipoFaseContratacion02/</str>
>>>       </arr>
>>>       <arr name="ca_excerpt"><str>E000002
>>> Acta del contrato
>>> Títol del contracte 2
>>> </str></arr>
>>>       <date name="timestamp">2013-02-14T09:44:02.904Z</date>
>>>       <float name="score">1.0</float>
>>>       <str name="link">/alcudiaweb/opencms/ajuntament/.content/contrato/contrato_00002.html</str>
>>>     </doc>
>>>   </result>
>>> </response>
>>>
>>> Still no presence of the "numexp" field.
>>> Hope it helps.
>>>
>>> Kind regards,
>>>
>>> Arturo Martín Lladó
>>>
>>> 2013/2/11 Rüdiger Kurz <r.kurz at alkacon.com>:
>>>>
>>>> Hi,
>>>>
>>>> please paste the following URL into a browser window (you need to be
>>>> logged in to OpenCms in that browser to have access to the OpenCms
>>>> Offline Solr index):
>>>>
>>>>
>>>> http://localhost:8080/opencms/opencms/handleSolrSelect?fq=con_locales:*&fq=parent-folders:*&fq=path:/the/full/path/to/the/resource/you/expect&fl=*,score
>>>>
>>>> and send the XML result to the mailing list.
>>>>
>>>> regards
>>>> Rüdiger
>>>>
>>>> On 11.02.2013 19:32, Ramon Gavira wrote:
>>>>
>>>>> Hi Arturo.
>>>>>
>>>>> We have the same problem. Did you get it to work?
>>>>>
>>>>> -----Original Message-----
>>>>> From: opencms-dev-bounces at opencms.org
>>>>> [mailto:opencms-dev-bounces at opencms.org]
>>>>> On behalf of Arturo Martín Lladó
>>>>> Sent: Wednesday, January 23, 2013 12:54
>>>>> To: opencms-dev at opencms.org
>>>>> Subject: [opencms-dev] Mapping OpenCMS structured content XML field to a
>>>>> SOLR field
>>>>>
>>>>> Hi,
>>>>>
>>>>> We are trying to map an OpenCMS structured content XML field to a SOLR
>>>>> field
>>>>> in order to perform a search using that field as a filter.
>>>>>
>>>>> The XML field is described this way in the XSD file:
>>>>>
>>>>> <xsd:complexType name="OpenCmsContrato">
>>>>>        <xsd:sequence>
>>>>>        [...]
>>>>>            <xsd:element name="numeroExpediente" type="OpenCmsString" minOccurs="1" maxOccurs="1" />
>>>>>        [...]
>>>>>        </xsd:sequence>
>>>>>        <xsd:attribute name="language" type="OpenCmsLocale" use="required"/>
>>>>> </xsd:complexType>
>>>>>
>>>>> And these are the search settings for the element, defined in the same
>>>>> XSD
>>>>> file:
>>>>>
>>>>> <xsd:annotation>
>>>>>        <xsd:appinfo>
>>>>>        [...]
>>>>>            <searchsettings>
>>>>>                <searchsetting element="numeroExpediente" searchcontent="true">
>>>>>                    <solrfield targetfield="numexp" />
>>>>>                </searchsetting>
>>>>>            </searchsettings>
>>>>>        [...]
>>>>>        </xsd:appinfo>
>>>>> </xsd:annotation>
>>>>>
>>>>> The target SOLR field "numexp" is defined this way in SOLR's schema.xml
>>>>> file:
>>>>>
>>>>> <fields>
>>>>>        <field name="numexp" type="string" indexed="true" stored="true" />
>>>>>        [...]
>>>>> </fields>
>>>>>
>>>>> And this is the way we perform the query to SOLR on a JSP file:
>>>>>
>>>>> CmsSearchManager manager = OpenCms.getSearchManager();
>>>>> CmsSolrIndex index = manager.getIndexSolr("Solr Online");
>>>>>
>>>>> String query = "fq=type:contrato";
>>>>>
>>>>> if (!"".equals(text))
>>>>>        query += "&fq=numexp:" + text;
>>>>>
>>>>> CmsSolrResultList listFiles = index.search(cmso, query);
>>>>>
>>>>> When we execute this code, we get listFiles.size() = 0, but when we
>>>>> change the filter field to the predefined SOLR field "content", this
>>>>> way:
>>>>>
>>>>> if (!"".equals(text))
>>>>>        query += "&fq=content:" + text;
>>>>>
>>>>> we get the expected result.
>>>>>
>>>>> With the CmsSearchResource object we get using the "content" SOLR field
>>>>> as
>>>>> filter, we are able to iterate over the fields of its inner
>>>>> I_CmsSearchDocument, getting this list as result:
>>>>>
>>>>> id
>>>>> contentblob
>>>>> path
>>>>> type
>>>>> suffix
>>>>> created
>>>>> lastmodified
>>>>> contentdate
>>>>> relased
>>>>> expired
>>>>> res_locales
>>>>> con_locales
>>>>> template_prop
>>>>> default-file_prop
>>>>> notification-interval_prop
>>>>> NavPos_prop
>>>>> enable-notification_prop
>>>>> locale_prop
>>>>> NavText_prop
>>>>> Title_prop
>>>>> category
>>>>> ca_excerpt
>>>>> timestamp
>>>>> score
>>>>> link
>>>>>
>>>>> No presence of the "numexp" field on the list. Why? Are we missing any
>>>>> step?
>>>>> Do we have to configure something else in order to make the mapping
>>>>> work?
>>>>>
>>>>> Thank you in advance.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Arturo.
>>>>
>>>> --
>>>> Kind Regards,
>>>> Rüdiger.
>>>>
>>>> -------------------
>>>>
>>>> Rüdiger Kurz
>>>>
>>>> Alkacon Software GmbH  - The OpenCms Experts
>>>> http://www.alkacon.com - http://www.opencms.org
>>>>
>>>> _______________________________________________
>>>> This mail is sent to you from the opencms-dev mailing list
>>>> To change your list options, or to unsubscribe from the list, please
>>>> visit
>>>> http://lists.opencms.org/cgi-bin/mailman/listinfo/opencms-dev
>>>>
>>>>
>>>>

-- 
Kind Regards,
Rüdiger.

-------------------

Rüdiger Kurz

Alkacon Software GmbH  - The OpenCms Experts
http://www.alkacon.com - http://www.opencms.org


