[opencms-dev] Mapping OpenCMS structured content XML field to a SOLR field

Arturo Martín Lladó arturo.martin.llado at gmail.com
Fri Feb 22 10:07:28 CET 2013


Hi, Rüdiger:

Regarding the "exact" field configuration: I previously had this mapping
for the field:

    <searchsetting element="tipoContrato" searchcontent="true">
        <solrfield targetfield="tipoContrato"/>
    </searchsetting>

and the field appeared in the XML result returned by
handleSolrSelect (as tipoContrato_es and tipoContrato_ca).
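
For reference, the check URL I use has the same shape as the one from your mail of 11 February (quoted below). A minimal Java sketch that assembles it; the class and method names are made up for illustration, and the base URL and resource path are the placeholders from that mail, not my real ones:

```java
public class SolrSelectUrl {

    /**
     * Assembles a handleSolrSelect URL that returns all stored fields
     * of a single resource. Hypothetical helper, not part of OpenCms.
     * Note: the value is appended as-is, without URL encoding.
     */
    static String build(String baseUrl, String resourcePath) {
        return baseUrl + "/handleSolrSelect"
            + "?fq=con_locales:*"
            + "&fq=parent-folders:*"
            + "&fq=path:" + resourcePath
            + "&fl=*,score";
    }

    public static void main(String[] args) {
        System.out.println(build(
            "http://localhost:8080/opencms/opencms",
            "/the/full/path/to/the/resource/you/expect"));
    }
}
```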

Now I need to perform exact searches over this field, so I configured
it as you recommended:

    <searchsetting element="tipoContrato" searchcontent="true">
        <solrfield targetfield="tipoContrato" sourcefield="*_exact"/>
    </searchsetting>

After publishing the XSD, restarting Tomcat, touching the related
files (rewriting their content) and rebuilding the Solr index, the fields
tipoContrato_es_exact and tipoContrato_ca_exact still do not appear in
the XML result.
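
To be explicit about which names I am looking for, here is the naming rule as I understand it, as a small Java sketch. It assumes the pattern numexp_<locale>_exact from your earlier mail (quoted below) applies to any target field; the class and method are hypothetical and not OpenCms API:

```java
public class ExpectedSolrFields {

    /**
     * Builds the field name expected in the index, assuming the rule
     * targetfield + "_" + locale + suffix taken from the sourcefield.
     * Hypothetical helper, not part of OpenCms.
     */
    static String fieldName(String targetField, String locale, String sourceField) {
        // "*_exact" becomes "_exact"; a missing sourcefield adds no suffix
        String suffix = (sourceField == null) ? "" : sourceField.replace("*", "");
        return targetField + "_" + locale + suffix;
    }

    public static void main(String[] args) {
        for (String locale : new String[] {"es", "ca"}) {
            System.out.println(fieldName("tipoContrato", locale, "*_exact"));
        }
    }
}
```

With the old mapping (no sourcefield) the same rule gives tipoContrato_es and tipoContrato_ca, which is what I saw before the change.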

Am I missing something here? Do I need to configure something else in
Solr's schema.xml?
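
For context, the exact search would be a filter query of the same shape as the one in my first mail (quoted below), only against the _exact field. A rough sketch, assuming the field name follows the <targetfield>_<locale>_exact pattern; the class is hypothetical and the value "tipo02" is only a placeholder:

```java
public class ExactFilterQuery {

    /** Wraps a value in double quotes, escaping backslashes and quotes. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Builds the query string; the filter on the _exact field is only
     * added when a value is given. The value is not URL-encoded here.
     */
    static String build(String locale, String value) {
        String query = "fq=type:contrato";
        if (!"".equals(value)) {
            query += "&fq=tipoContrato_" + locale + "_exact:" + quote(value);
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(build("ca", "tipo02"));
    }
}
```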

Kind regards,

2013/2/14 Arturo Martín Lladó <arturo.martin.llado at gmail.com>:
> Thank you again, for this tip and for your help, Rüdiger :-)
>
> Kind regards.
>
> 2013/2/14 Rüdiger Kurz <r.kurz at alkacon.com>:
>> Hi Arturo,
>>
>> happy to hear that you were able to solve the problem. Just a small
>> addition: since values like E0000001 don't need any language-specific
>> analysis, you might want to index them as plain String values. To do this
>> you have to modify your mapping within the XSD to:
>>
>>
>> <searchsetting element="numeroExpediente" searchcontent="true">
>>   <solrfield targetfield="numexp" sourcefield="*_exact" />
>> </searchsetting>
>>
>> After touching/publishing you will have fields like:
>> numexp_<locale>_exact
>>
>> This will increase your search performance and you can also make use of
>> field type specific Solr features like faceting or date handling
>> (sourcefield="*_dt").
>>
>> In any case it is very good that you have added the missing locale to your
>> schema.xml, not least for text-based search, auto-completion,
>> spell-checking, or whatever other language-specific Solr features you want
>> to use.
>>
>> greetings
>> Rüdiger
>>
>> Am 14.02.2013 13:11, schrieb Arturo Martín Lladó:
>>
>>> Hi,
>>>
>>> 2013/2/14 Rüdiger Kurz <r.kurz at alkacon.com>:
>>>>
>>>> Arturo,
>>>>
>>>> the mapping you defined in the XSD should not create a field 'numexp',
>>>> even if you declared it explicitly in the schema.xml, since XML contents
>>>> can hold multilingual content. Instead, several dynamic fields named
>>>> numexp_<locale> will be created within the index, but those are also
>>>> missing in your case.
>>>>
>>>> If you take a closer look at the schema.xml, you will find a dynamic
>>>> field for each language: *_en,*_de,*_el,*_es,*_fi,*_fr,*_hu,*_it
>>>>
>>>> Only languages that are defined as a locale both in opencms-system.xml
>>>> and in schema.xml can be indexed correctly. If the locale is missing in
>>>> either the schema or the system configuration, indexing will fail.
>>>> Please check this.
>>>>
>>>
>>> Thank you VERY much!
>>>
>>> That's it! Right now I'm working with the locale "ca" (i.e. Catalan).
>>> The locale was defined in opencms-system.xml, but there was no field or
>>> type defined for it in Solr's schema.xml file. This is what I had to do
>>> in schema.xml:
>>>
>>> 1. Create the new text_ca type:
>>>
>>>      <!-- Catalan -->
>>>      <fieldType name="text_ca" class="solr.TextField" positionIncrementGap="100">
>>>        <analyzer>
>>>          <tokenizer class="solr.StandardTokenizerFactory"/>
>>>          <filter class="solr.LowerCaseFilterFactory"/>
>>>          <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ca.txt" format="snowball" enablePositionIncrements="true"/>
>>>          <!-- Unfortunately, this class does not exist: <filter class="solr.CatalanLightStemFilterFactory"/> -->
>>>          <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Catalan"/> -->
>>>        </analyzer>
>>>      </fieldType>
>>>
>>> 2. Create the new text_ca field:
>>>
>>> <field name="text_ca" type="text_ca" indexed="true" stored="false" multiValued="true"/>
>>> <!-- Catchall for Catalan text fields -->
>>>
>>> (A copy-and-paste warning for you: the comment "Catchall for German text
>>> fields" is repeated on the rest of the fields below "text_de" :)
>>>
>>> 3. Create the new dynamic field for the "ca" locale:
>>>
>>> <dynamicField name="*_ca" type="text_ca" indexed="true" stored="true" multiValued="true" />
>>>
>>> 4. Copy the new field to its generic "collector" text field:
>>>
>>>   <copyField source="*_ca"      dest="text_ca"/>
>>>
>>> 5. Restart Tomcat, touch the files and rebuild the Solr indexes.
>>>
>>> Now I can see the fields and query using them:
>>>
>>> <response>
>>>
>>> [...]
>>>
>>> <arr name="numexp_es"><str>E0000001</str></arr>
>>> <arr name="numexp_ca"><str>E0000001</str></arr>
>>>
>>> [...]
>>>
>>> </response>
>>>
>>> Again, thank you VERY much, Rüdiger :-)
>>>
>>> Regards.
>>>
>>>> Another reason for the missing field could be that you did not
>>>> touch/publish the resource in question after changing the XSD. That
>>>> resource must have been touched and/or published; otherwise the
>>>> corresponding document will not have the new field. Alternatively, to be
>>>> really sure, you can rebuild the whole index.
>>>>
>>>> regards
>>>> Rüdiger
>>>>
>>>> Am 14.02.2013 11:01, schrieb Arturo Martín Lladó:
>>>>
>>>>> Hi,
>>>>>
>>>>> This is the XML output of the following URL:
>>>>>
>>>>>
>>>>> http://localhost:8080/XXX/opencms/handleSolrSelect?fq=con_locales:*&fq=parent-folders:*&fq=path:/sites/default/ajuntament/.content/contrato/contrato_00002.html&fl=*,score
>>>>>
>>>>> <response>
>>>>>   <lst name="responseHeader">
>>>>>     <int name="status">0</int>
>>>>>     <int name="QTime">4</int>
>>>>>     <lst name="params">
>>>>>       <str name="q">*:*</str>
>>>>>       <str name="fl">*,score</str>
>>>>>       <str name="qt">edismax</str>
>>>>>       <int name="rows">10</int>
>>>>>       <arr name="fq">
>>>>>         <str>con_locales:*</str>
>>>>>         <str>parent-folders:*</str>
>>>>>         <str>path:/sites/default/ajuntament/.content/contrato/contrato_00002.html</str>
>>>>>       </arr>
>>>>>       <long name="start">0</long>
>>>>>     </lst>
>>>>>   </lst>
>>>>>   <result name="response" numFound="1" start="0">
>>>>>     <doc>
>>>>>       <str name="id">2c61e8eb-6153-11e2-bfbe-d1cbafdd7d70</str>
>>>>>       <str name="contentblob">[B:[B at 4e2d1d1e</str>
>>>>>       <str name="path">/sites/default/ajuntament/.content/contrato/contrato_00002.html</str>
>>>>>       <str name="type">contrato</str>
>>>>>       <str name="suffix">.html</str>
>>>>>       <date name="created">2013-01-18T09:41:03Z</date>
>>>>>       <date name="lastmodified">2013-01-30T15:59:08Z</date>
>>>>>       <date name="contentdate">2013-02-13T14:51:16.803Z</date>
>>>>>       <date name="relased">1970-01-01T00:00:00Z</date>
>>>>>       <date name="expired">292278994-08-17T07:12:55.807Z</date>
>>>>>       <arr name="res_locales"><str>ca</str></arr>
>>>>>       <arr name="con_locales"><str>ca</str></arr>
>>>>>       <str name="template_prop">/system/modules/es.tresdigits.alcudiaweb/templates/basica2columnas.jsp</str>
>>>>>       <str name="default-file_prop">index.html</str>
>>>>>       <str name="notification-interval_prop">0</str>
>>>>>       <str name="NavPos_prop">2.0</str>
>>>>>       <str name="enable-notification_prop">false</str>
>>>>>       <str name="locale_prop">ca</str>
>>>>>       <str name="NavText_prop">Principal</str>
>>>>>       <str name="Title_prop">Títol del contracte 2</str>
>>>>>       <arr name="category">
>>>>>         <str>ca/</str>
>>>>>         <str>ca/tipoContrato/</str>
>>>>>         <str>ca/tipoContrato/tipo02/</str>
>>>>>         <str>ca/tipoDocumentoContratacion/</str>
>>>>>         <str>ca/tipoDocumentoContratacion/tipoDocumentoContratacion02/</str>
>>>>>         <str>ca/tipoFaseContratacion/</str>
>>>>>         <str>ca/tipoFaseContratacion/tipoFaseContratacion02/</str>
>>>>>       </arr>
>>>>>       <arr name="ca_excerpt"><str>E000002
>>>>> Acta del contrato
>>>>> Títol del contracte 2
>>>>> </str></arr>
>>>>>       <date name="timestamp">2013-02-14T09:44:02.904Z</date>
>>>>>       <float name="score">1.0</float>
>>>>>       <str name="link">/alcudiaweb/opencms/ajuntament/.content/contrato/contrato_00002.html</str>
>>>>>     </doc>
>>>>>   </result>
>>>>> </response>
>>>>>
>>>>> Still no presence of the "numexp" field.
>>>>> Hope it helps.
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Arturo Martín Lladó
>>>>>
>>>>> 2013/2/11 Rüdiger Kurz <r.kurz at alkacon.com>:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> please paste the following URL into a browser window (you need to be
>>>>>> logged into OpenCms in that browser to have access to the OpenCms
>>>>>> Offline Solr index):
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://localhost:8080/opencms/opencms/handleSolrSelect?fq=con_locales:*&fq=parent-folders:*&fq=path:/the/full/path/to/the/resource/you/expect&fl=*,score
>>>>>>
>>>>>> and send the XML result to the mailing list.
>>>>>>
>>>>>> regards
>>>>>> Rüdiger
>>>>>>
>>>>>> Am 11.02.2013 19:32, schrieb Ramon Gavira:
>>>>>>
>>>>>>> Hi Arturo.
>>>>>>>
>>>>>>> We have the same problem; did you get it to work?
>>>>>>>
>>>>>>> -----Original message-----
>>>>>>> From: opencms-dev-bounces at opencms.org
>>>>>>> [mailto:opencms-dev-bounces at opencms.org]
>>>>>>> On behalf of Arturo Martín Lladó
>>>>>>> Sent: Wednesday, 23 January 2013 12:54
>>>>>>> To: opencms-dev at opencms.org
>>>>>>> Subject: [opencms-dev] Mapping OpenCMS structured content XML field to
>>>>>>> a SOLR field
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are trying to map an OpenCMS structured content XML field to a SOLR
>>>>>>> field in order to perform a search using that field as a filter.
>>>>>>>
>>>>>>> The XML field is described this way in the XSD file:
>>>>>>>
>>>>>>> <xsd:complexType name="OpenCmsContrato">
>>>>>>>        <xsd:sequence>
>>>>>>>        [...]
>>>>>>>            <xsd:element name="numeroExpediente" type="OpenCmsString" minOccurs="1" maxOccurs="1" />
>>>>>>>        [...]
>>>>>>>        </xsd:sequence>
>>>>>>>        <xsd:attribute name="language" type="OpenCmsLocale" use="required"/>
>>>>>>> </xsd:complexType>
>>>>>>>
>>>>>>> And these are the search settings for the element, defined in the same
>>>>>>> XSD file:
>>>>>>>
>>>>>>> <xsd:annotation>
>>>>>>>        <xsd:appinfo>
>>>>>>>        [...]
>>>>>>>            <searchsettings>
>>>>>>>                <searchsetting element="numeroExpediente" searchcontent="true">
>>>>>>>                    <solrfield targetfield="numexp" />
>>>>>>>                </searchsetting>
>>>>>>>            </searchsettings>
>>>>>>>        [...]
>>>>>>>        </xsd:appinfo>
>>>>>>> </xsd:annotation>
>>>>>>>
>>>>>>> The target SOLR field "numexp" is defined this way in SOLR's
>>>>>>> schema.xml file:
>>>>>>>
>>>>>>> <fields>
>>>>>>>        <field name="numexp" type="string" indexed="true" stored="true" />
>>>>>>>        [...]
>>>>>>> </fields>
>>>>>>>
>>>>>>> And this is the way we perform the query to SOLR on a JSP file:
>>>>>>>
>>>>>>> CmsSearchManager manager = OpenCms.getSearchManager();
>>>>>>> CmsSolrIndex index = manager.getIndexSolr("Solr Online");
>>>>>>>
>>>>>>> String query = "fq=type:contrato";
>>>>>>>
>>>>>>> if (!"".equals(text))
>>>>>>>        query += "&fq=numexp:" + text;
>>>>>>>
>>>>>>> CmsSolrResultList listFiles = index.search(cmso, query);
>>>>>>>
>>>>>>> When we execute this code, we get listFiles.size() = 0, but when we
>>>>>>> change the filter field to the predefined SOLR field "content", this
>>>>>>> way:
>>>>>>>
>>>>>>> if (!"".equals(text))
>>>>>>>        query += "&fq=content:" + text;
>>>>>>>
>>>>>>> we get the expected result.
>>>>>>>
>>>>>>> With the CmsSearchResource object we get using the "content" SOLR
>>>>>>> field as filter, we are able to iterate over the fields of its inner
>>>>>>> I_CmsSearchDocument, getting this list as result:
>>>>>>>
>>>>>>> id
>>>>>>> contentblob
>>>>>>> path
>>>>>>> type
>>>>>>> suffix
>>>>>>> created
>>>>>>> lastmodified
>>>>>>> contentdate
>>>>>>> relased
>>>>>>> expired
>>>>>>> res_locales
>>>>>>> con_locales
>>>>>>> template_prop
>>>>>>> default-file_prop
>>>>>>> notification-interval_prop
>>>>>>> NavPos_prop
>>>>>>> enable-notification_prop
>>>>>>> locale_prop
>>>>>>> NavText_prop
>>>>>>> Title_prop
>>>>>>> category
>>>>>>> ca_excerpt
>>>>>>> timestamp
>>>>>>> score
>>>>>>> link
>>>>>>>
>>>>>>> No presence of the "numexp" field on the list. Why? Are we missing any
>>>>>>> step? Do we have to configure something else in order to make the
>>>>>>> mapping work?
>>>>>>>
>>>>>>> Thank you in advance.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Arturo.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Kind Regards,
>>>>>> Rüdiger.
>>>>>>
>>>>>> -------------------
>>>>>>
>>>>>> Rüdiger Kurz
>>>>>>
>>>>>> Alkacon Software GmbH  - The OpenCms Experts
>>>>>> http://www.alkacon.com - http://www.opencms.org
>>>>>>
>>>>>> _______________________________________________
>>>>>> This mail is sent to you from the opencms-dev mailing list
>>>>>> To change your list options, or to unsubscribe from the list, please
>>>>>> visit
>>>>>> http://lists.opencms.org/cgi-bin/mailman/listinfo/opencms-dev
>>>>>>
>>>>>>
>>>>>>
>>
>> --
>> Kind Regards,
>> Rüdiger.
>>
>> -------------------
>>
>> Rüdiger Kurz
>>
>> Alkacon Software GmbH  - The OpenCms Experts
>> http://www.alkacon.com - http://www.opencms.org
>
>
>
> --
> Arturo Martín Lladó



-- 
Arturo Martín Lladó


