[opencms-dev] Mapping OpenCMS structured content XML field to a SOLR field

Arturo Martín Lladó arturo.martin.llado at gmail.com
Thu Feb 14 13:11:21 CET 2013


Hi,

2013/2/14 Rüdiger Kurz <r.kurz at alkacon.com>:
> Arturo,
>
> the mapping you defined in the XSD should not create a field 'numexp', even
> if you declared it explicitly in the schema.xml, since XML contents can hold
> multilingual content. Instead, several dynamic fields named numexp_<locale>
> will be created within the index, but those are also missing in your case.
>
> If you take a closer look at the schema.xml, you will find a dynamic field
> for each language: *_en, *_de, *_el, *_es, *_fi, *_fr, *_hu, *_it
>
> Only languages that are defined as a locale both in opencms-system.xml and
> in schema.xml can be indexed correctly. If the locale is missing from either
> the schema or the system configuration, indexing will fail. Please check this.
>
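The two conditions described in the reply above (a mapped field is indexed once per content locale as <targetfield>_<locale>, and a locale can only be indexed when schema.xml has a matching "*_<locale>" dynamic field) can be checked with a small Java sketch. The class and method names are hypothetical, and the locale and pattern lists are passed in by hand rather than read from opencms-system.xml or schema.xml.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: which localized fields a mapping yields, and which locales lack a dynamic field. */
final class LocaleFieldCheck {

    private LocaleFieldCheck() {
    }

    /** Builds "<targetField>_<locale>" for every content locale, e.g. numexp_ca. */
    static List<String> localizedFieldNames(String targetField, List<String> contentLocales) {
        List<String> names = new ArrayList<>();
        for (String locale : contentLocales) {
            names.add(targetField + "_" + locale);
        }
        return names;
    }

    /** Returns the system locales for which no "*_<locale>" dynamic field pattern is declared. */
    static Set<String> localesWithoutDynamicField(List<String> systemLocales, List<String> dynamicFieldPatterns) {
        Set<String> missing = new LinkedHashSet<>();
        for (String locale : systemLocales) {
            if (!dynamicFieldPatterns.contains("*_" + locale)) {
                missing.add(locale);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Dynamic fields listed in the reply above; "ca" is not among them.
        List<String> schema = List.of("*_en", "*_de", "*_el", "*_es", "*_fi", "*_fr", "*_hu", "*_it");
        System.out.println(localizedFieldNames("numexp", List.of("ca")));             // prints [numexp_ca]
        System.out.println(localesWithoutDynamicField(List.of("ca", "es"), schema));  // prints [ca]
    }
}
```

In the situation from this thread, "ca" is configured in OpenCms but has no "*_ca" pattern, so numexp_ca cannot be indexed until the schema is extended.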

Thank you VERY much!

That's it! Right now I'm working with the locale "ca" (i.e. Catalan).
The locale was defined in opencms-system.xml, but no field or type was
defined for it in the Solr schema.xml file. This is what I had to do in
schema.xml:

1. Create the new text_ca field type:

    <!-- Catalan -->
    <fieldType name="text_ca" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="lang/stopwords_ca.txt" format="snowball"
                enablePositionIncrements="true"/>
        <!-- Unfortunately, this class does not exist:
             <filter class="solr.CatalanLightStemFilterFactory"/> -->
        <!-- More aggressive:
             <filter class="solr.SnowballPorterFilterFactory" language="Catalan"/> -->
      </analyzer>
    </fieldType>

2. Create the new text_ca field:

<field name="text_ca"             type="text_ca"      indexed="true"
       stored="false" multiValued="true"/><!-- Catchall for Catalan text fields -->

(A copy-and-paste note for you: in the shipped schema.xml, the comment
"Catchall for German text fields" is repeated on all the fields below
"text_de" :)

3. Create the new dynamic field for the "ca" locale:

<dynamicField name="*_ca"         type="text_ca"      indexed="true"
              stored="true" multiValued="true" />

4. Copy the new dynamic fields into the generic "collector" text field:

<copyField source="*_ca"      dest="text_ca"/>

5. Restart Tomcat, touch the affected files, and rebuild the Solr indexes.
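Steps 3 and 4 depend on Solr matching an indexed field such as numexp_ca against the "*_ca" dynamic field pattern and then copying it into text_ca. The following is a simplified Java model of that lookup, not Solr's own code: it supports only a single leading or trailing "*", lets the longest matching pattern win (the precedence rule stated in the comments of Solr's example schema.xml), and assumes each copyField source has one destination. The "*_es" entries are illustrative assumptions, not copied from the OpenCms schema.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

/** Simplified, hypothetical model of dynamicField and copyField resolution in schema.xml. */
final class DynamicFieldDemo {

    private DynamicFieldDemo() {
    }

    /** True if a pattern with one leading or trailing '*' matches the field name. */
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName);
    }

    /** Field type of the longest matching dynamicField pattern, or empty if none matches. */
    static Optional<String> resolveType(String fieldName, Map<String, String> dynamicFields) {
        return dynamicFields.keySet().stream()
                .filter(pattern -> matches(pattern, fieldName))
                .max(Comparator.comparingInt(String::length))
                .map(dynamicFields::get);
    }

    /** Destinations of all copyField rules whose source pattern matches the field name. */
    static List<String> copyTargets(String fieldName, Map<String, String> copyFields) {
        return copyFields.entrySet().stream()
                .filter(rule -> matches(rule.getKey(), fieldName))
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> dynamicFields = new LinkedHashMap<>();
        dynamicFields.put("*_es", "text_es"); // assumed existing entry
        dynamicFields.put("*_ca", "text_ca"); // added in step 3
        Map<String, String> copyFields = new LinkedHashMap<>();
        copyFields.put("*_es", "text_es");    // assumed existing entry
        copyFields.put("*_ca", "text_ca");    // added in step 4

        System.out.println(resolveType("numexp_ca", dynamicFields)); // prints Optional[text_ca]
        System.out.println(copyTargets("numexp_ca", copyFields));    // prints [text_ca]
        System.out.println(resolveType("numexp_pt", dynamicFields)); // prints Optional.empty
    }
}
```

A field name with no matching pattern (numexp_pt above) is the situation the thread started with: without a "*_ca" entry, numexp_ca had nowhere to go.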

Now I can see the fields and query using them:

<response>

[...]

<arr name="numexp_es"><str>E0000001</str></arr>
<arr name="numexp_ca"><str>E0000001</str></arr>

[...]

</response>
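With the locale-specific fields in place, the JSP filter from the original question needs to target numexp_ca (more generally numexp_<locale>) instead of numexp. Below is a minimal Java sketch of building that query string, assuming the same "fq=...&fq=..." format that the original code passes to index.search(cmso, query). The class and method names are hypothetical. escape() uses the same character set as SolrJ's ClientUtils.escapeQueryChars but is a standalone reimplementation. It does not URL-encode the value, and this sketch does not verify whether index.search expects that.

```java
/** Hypothetical helper that builds the filter query with a locale-specific field name. */
final class LocalizedFilterQuery {

    private LocalizedFilterQuery() {
    }

    /** Backslash-escapes Solr query syntax characters and whitespace in a user-supplied value. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&;/".indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds e.g. "fq=type:contrato&fq=numexp_ca:E0000001"; the field filter is skipped for empty input. */
    static String build(String type, String field, String locale, String text) {
        String query = "fq=type:" + type;
        if (text != null && !text.isEmpty()) {
            query += "&fq=" + field + "_" + locale + ":" + escape(text);
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(build("contrato", "numexp", "ca", "E0000001"));
        // prints fq=type:contrato&fq=numexp_ca:E0000001
    }
}
```

Passing the locale in explicitly keeps the field name in step with whichever content locale is being searched.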

Again, thank you VERY much, Rüdiger :-)

Regards.

> Another reason for the missing field could be that you did not touch/publish
> the expected resource after changing the XSD. The resource must have been
> touched and/or published; otherwise, the corresponding document will not
> have the new field. Alternatively, to be really sure, you can rebuild the
> whole index.
>
> regards
> Rüdiger
>
> On 14.02.2013 11:01, Arturo Martín Lladó wrote:
>
>> Hi,
>>
>> This is the XML output of the following URL:
>>
>> http://localhost:8080/XXX/opencms/handleSolrSelect?fq=con_locales:*&fq=parent-folders:*&fq=path:/sites/default/ajuntament/.content/contrato/contrato_00002.html&fl=*,score
>>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">4</int>
>>     <lst name="params">
>>       <str name="q">*:*</str>
>>       <str name="fl">*,score</str>
>>       <str name="qt">edismax</str>
>>       <int name="rows">10</int>
>>       <arr name="fq">
>>         <str>con_locales:*</str>
>>         <str>parent-folders:*</str>
>>         <str>path:/sites/default/ajuntament/.content/contrato/contrato_00002.html</str>
>>       </arr>
>>       <long name="start">0</long>
>>     </lst>
>>   </lst>
>>   <result name="response" numFound="1" start="0">
>>     <doc>
>>       <str name="id">2c61e8eb-6153-11e2-bfbe-d1cbafdd7d70</str>
>>       <str name="contentblob">[B:[B@4e2d1d1e</str>
>>       <str name="path">/sites/default/ajuntament/.content/contrato/contrato_00002.html</str>
>>       <str name="type">contrato</str>
>>       <str name="suffix">.html</str>
>>       <date name="created">2013-01-18T09:41:03Z</date>
>>       <date name="lastmodified">2013-01-30T15:59:08Z</date>
>>       <date name="contentdate">2013-02-13T14:51:16.803Z</date>
>>       <date name="relased">1970-01-01T00:00:00Z</date>
>>       <date name="expired">292278994-08-17T07:12:55.807Z</date>
>>       <arr name="res_locales"><str>ca</str></arr>
>>       <arr name="con_locales"><str>ca</str></arr>
>>       <str name="template_prop">/system/modules/es.tresdigits.alcudiaweb/templates/basica2columnas.jsp</str>
>>       <str name="default-file_prop">index.html</str>
>>       <str name="notification-interval_prop">0</str>
>>       <str name="NavPos_prop">2.0</str>
>>       <str name="enable-notification_prop">false</str>
>>       <str name="locale_prop">ca</str>
>>       <str name="NavText_prop">Principal</str>
>>       <str name="Title_prop">Títol del contracte 2</str>
>>       <arr name="category">
>>         <str>ca/</str>
>>         <str>ca/tipoContrato/</str>
>>         <str>ca/tipoContrato/tipo02/</str>
>>         <str>ca/tipoDocumentoContratacion/</str>
>>         <str>ca/tipoDocumentoContratacion/tipoDocumentoContratacion02/</str>
>>         <str>ca/tipoFaseContratacion/</str>
>>         <str>ca/tipoFaseContratacion/tipoFaseContratacion02/</str>
>>       </arr>
>>       <arr name="ca_excerpt"><str>E000002
>> Acta del contrato
>> Títol del contracte 2
>> </str></arr>
>>       <date name="timestamp">2013-02-14T09:44:02.904Z</date>
>>       <float name="score">1.0</float>
>>       <str name="link">/alcudiaweb/opencms/ajuntament/.content/contrato/contrato_00002.html</str>
>>     </doc>
>>   </result>
>> </response>
>>
>> The "numexp" field is still missing.
>> I hope this helps.
>>
>> Kind regards,
>>
>> Arturo Martín Lladó
>>
>> 2013/2/11 Rüdiger Kurz <r.kurz at alkacon.com>:
>>>
>>> Hi,
>>>
>>> please paste the following URL into a browser window (you need to be
>>> logged into OpenCms in that browser to have access to the OpenCms
>>> Offline Solr index):
>>>
>>>
>>> http://localhost:8080/opencms/opencms/handleSolrSelect?fq=con_locales:*&fq=parent-folders:*&fq=path:/the/full/path/to/the/resource/you/expect&fl=*,score
>>>
>>> and send the XML result to the mailing list.
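The diagnostic URL above combines three fq filter queries with an fl field list. Below is a minimal Java sketch that assembles the same URL for any resource path, URL-encoding each parameter value. The class name is hypothetical, and the base URL (host, port and context path) is the example value and will differ per installation. It needs Java 10 or later for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the handleSolrSelect diagnostic URL for one resource. */
final class SolrSelectUrl {

    private SolrSelectUrl() {
    }

    /** Same filters as the URL above: any content locale, any parent folder, exactly this path. */
    static String forResource(String baseUrl, String resourcePath) {
        return baseUrl + "/handleSolrSelect"
                + "?fq=" + enc("con_locales:*")
                + "&fq=" + enc("parent-folders:*")
                + "&fq=" + enc("path:" + resourcePath)
                + "&fl=" + enc("*,score");
    }

    /** URL-encodes a query parameter value as UTF-8. */
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(forResource("http://localhost:8080/opencms/opencms",
                "/sites/default/ajuntament/.content/contrato/contrato_00002.html"));
    }
}
```

Encoding the values matters mainly for paths containing spaces or other reserved characters; a browser usually tolerates the unencoded form shown above.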
>>>
>>> regards
>>> Rüdiger
>>>
>>> On 11.02.2013 19:32, Ramon Gavira wrote:
>>>
>>>> Hi Arturo.
>>>>
>>>> We have the same problem. Did you get it to work?
>>>>
>>>> -----Original Message-----
>>>> From: opencms-dev-bounces at opencms.org
>>>> [mailto:opencms-dev-bounces at opencms.org]
>>>> On behalf of Arturo Martín Lladó
>>>> Sent: Wednesday, January 23, 2013 12:54
>>>> To: opencms-dev at opencms.org
>>>> Subject: [opencms-dev] Mapping OpenCMS structured content XML field to a SOLR field
>>>>
>>>> Hi,
>>>>
>>>> We are trying to map an OpenCms structured content XML field to a Solr
>>>> field in order to run searches that use that field as a filter.
>>>>
>>>> The XML field is described this way in the XSD file:
>>>>
>>>> <xsd:complexType name="OpenCmsContrato">
>>>>       <xsd:sequence>
>>>>       [...]
>>>>           <xsd:element name="numeroExpediente" type="OpenCmsString" minOccurs="1" maxOccurs="1" />
>>>>       [...]
>>>>       </xsd:sequence>
>>>>       <xsd:attribute name="language" type="OpenCmsLocale" use="required"/>
>>>> </xsd:complexType>
>>>>
>>>> And these are the search settings for the element, defined in the same
>>>> XSD file:
>>>>
>>>> <xsd:annotation>
>>>>       <xsd:appinfo>
>>>>       [...]
>>>>           <searchsettings>
>>>>               <searchsetting element="numeroExpediente" searchcontent="true">
>>>>                   <solrfield targetfield="numexp" />
>>>>               </searchsetting>
>>>>           </searchsettings>
>>>>       [...]
>>>>       </xsd:appinfo>
>>>> </xsd:annotation>
>>>>
>>>> The target Solr field "numexp" is defined this way in Solr's schema.xml
>>>> file:
>>>>
>>>> <fields>
>>>>       <field name="numexp"                 type="string" indexed="true"  stored="true" />
>>>>       [...]
>>>> </fields>
>>>>
>>>> And this is how we run the Solr query from a JSP file:
>>>>
>>>> // assumed: cmso is the current CmsObject and text is the user input
>>>> CmsSearchManager manager = OpenCms.getSearchManager();
>>>> CmsSolrIndex index = manager.getIndexSolr("Solr Online");
>>>>
>>>> // restrict the results to resources of type "contrato"
>>>> String query = "fq=type:contrato";
>>>>
>>>> // add the field filter only when the user entered a value
>>>> if (!"".equals(text)) {
>>>>     query += "&fq=numexp:" + text;
>>>> }
>>>>
>>>> CmsSolrResultList listFiles = index.search(cmso, query);
>>>>
>>>> When we execute this code, listFiles.size() is 0, but when we change
>>>> the filter field to the predefined Solr field "content", like this:
>>>>
>>>> if (!"".equals(text)) {
>>>>     query += "&fq=content:" + text;
>>>> }
>>>>
>>>> we get the expected result.
>>>>
>>>> Using the CmsSearchResource object returned with the "content" filter, we
>>>> can iterate over the fields of its inner I_CmsSearchDocument, which gives
>>>> this list:
>>>>
>>>> id
>>>> contentblob
>>>> path
>>>> type
>>>> suffix
>>>> created
>>>> lastmodified
>>>> contentdate
>>>> relased
>>>> expired
>>>> res_locales
>>>> con_locales
>>>> template_prop
>>>> default-file_prop
>>>> notification-interval_prop
>>>> NavPos_prop
>>>> enable-notification_prop
>>>> locale_prop
>>>> NavText_prop
>>>> Title_prop
>>>> category
>>>> ca_excerpt
>>>> timestamp
>>>> score
>>>> link
>>>>
>>>> The "numexp" field is not in the list. Why? Are we missing a step?
>>>> Do we have to configure anything else to make the mapping work?
>>>>
>>>> Thank you in advance.
>>>>
>>>> Regards,
>>>>
>>>> Arturo.
>>>> _______________________________________________
>>>
>>>
>>>
>>>
>>> --
>>> Kind Regards,
>>> Rüdiger.
>>>
>>> -------------------
>>>
>>> Rüdiger Kurz
>>>
>>> Alkacon Software GmbH  - The OpenCms Experts
>>> http://www.alkacon.com - http://www.opencms.org
>>>
>>> _______________________________________________
>>> This mail is sent to you from the opencms-dev mailing list
>>> To change your list options, or to unsubscribe from the list, please
>>> visit
>>> http://lists.opencms.org/cgi-bin/mailman/listinfo/opencms-dev
>>>
>>>
>>>
>
> --
> Rüdiger Kurz
>
> -------------------
>
>
> Alkacon Software GmbH - The OpenCms Experts
> Rüdiger Kurz
> An der Wachsfabrik 13
> 50996 Koeln, DE
>
> Tel: +49 (0)2236 3826-16
> Fax: +49 (0)2236 3826-20
> Email: r.kurz at alkacon.com
>
> http://www.alkacon.com
> http://www.opencms.org
>
> Managing Director: Alexander Kandzior, Amtsgericht Köln, HRB 54613



-- 
====================
Arturo Martín Lladó
====================


