[opencms-dev] Mapping OpenCMS structured content XML field to a SOLR field
Arturo Martín Lladó
arturo.martin.llado at gmail.com
Thu Feb 14 16:20:37 CET 2013
Thank you again, for this tip and for your help, Rüdiger :-)
Kind regards.
2013/2/14 Rüdiger Kurz <r.kurz at alkacon.com>:
> Hi Arturo,
>
> happy to hear that you were able to solve the problem. Just a small
> addition: since values like E0000001 don't need any linguistic analysis,
> you might want to index them as plain String values. To do this, change
> your mapping in the XSD to:
>
>
> <searchsetting element="numeroExpediente" searchcontent="true">
> <solrfield targetfield="numexp" sourcefield="*_exact" />
> </searchsetting>
>
> After touching/publishing you will have fields like:
> numexp_<locale>_exact
>
> This will increase your search performance and you can also make use of
> field type specific Solr features like faceting or date handling
> (sourcefield="*_dt").
>
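A rough sketch of the field naming described above, in Java. It only mirrors the two cases mentioned in this thread (numexp_<locale> without a sourcefield, numexp_<locale>_exact with sourcefield="*_exact"); the class and method names are hypothetical and this is not OpenCms code.

```java
// Sketch only: reproduces the naming pattern from the thread, not OpenCms internals.
final class SolrFieldNames {

    // Hypothetical helper. sourcePattern is the "sourcefield" attribute of the
    // <solrfield> mapping, e.g. "*_exact"; pass null when the attribute is absent.
    static String localizedField(String targetField, String locale, String sourcePattern) {
        // Assumption taken from the thread:
        //   no sourcefield       -> numexp_<locale>
        //   sourcefield="*_exact" -> numexp_<locale>_exact
        String suffix = "";
        if (sourcePattern != null && sourcePattern.startsWith("*")) {
            suffix = sourcePattern.substring(1); // "*_exact" -> "_exact"
        }
        return targetField + "_" + locale + suffix;
    }

    public static void main(String[] args) {
        System.out.println(localizedField("numexp", "ca", null));
        System.out.println(localizedField("numexp", "ca", "*_exact"));
    }
}
```

A filter on the exact value would then target the derived name, for example fq=numexp_ca_exact:E0000001.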
> In any case it is very good that you have added the missing locale to your
> schema.xml, not least for text-based search / auto-completion /
> spell-checking, or whatever other language-specific Solr features you want
> to use.
>
> greetings
> Rüdiger
>
> On 14.02.2013 13:11, Arturo Martín Lladó wrote:
>
>> Hi,
>>
>> 2013/2/14 Rüdiger Kurz <r.kurz at alkacon.com>:
>>>
>>> Arturo,
>>>
>>> the mapping you defined in the XSD is not expected to create a field
>>> 'numexp', even though you declared it explicitly in the schema.xml,
>>> because XML contents can hold multilingual content. Instead, several
>>> dynamic fields named numexp_<locale> should be created in the index,
>>> but those are also missing in your case.
>>>
>>> Having a closer look at the schema.xml you will find a dynamic field for
>>> each language: *_en,*_de,*_el,*_es,*_fi,*_fr,*_hu,*_it
>>>
>>> Only languages that are defined as a locale in both opencms-system.xml
>>> and schema.xml can be indexed correctly. If the locale is missing in
>>> either the schema or the system configuration, indexing will fail.
>>> Please check this.
>>>
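The rule above (a locale has to be present both in opencms-system.xml and as a dynamic field in schema.xml) can be checked by hand. As a rough Java sketch of that comparison, assuming the two lists have already been read out of the configuration files; the class and method names are hypothetical and no OpenCms or Solr API is involved:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: compares two hand-collected lists, it does not parse any config file.
final class LocaleCheck {

    // Returns the locales that have no matching "*_<locale>" dynamic field.
    static Set<String> localesMissingInSchema(Collection<String> systemLocales,
                                              Collection<String> dynamicFieldPatterns) {
        Set<String> missing = new TreeSet<>();
        for (String locale : systemLocales) {
            if (!dynamicFieldPatterns.contains("*_" + locale)) {
                missing.add(locale);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Dynamic fields listed in this thread for the default schema.xml.
        List<String> patterns = Arrays.asList(
            "*_en", "*_de", "*_el", "*_es", "*_fi", "*_fr", "*_hu", "*_it");
        // Example locale list; an assumption for illustration.
        List<String> locales = Arrays.asList("en", "es", "ca");
        System.out.println(localesMissingInSchema(locales, patterns));
    }
}
```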
>>
>> Thank you VERY much!
>>
>> That's it! Right now I'm working with the locale "ca" (i.e. Catalan).
>> The locale was defined in opencms-system.xml, but there was no field or
>> type defined for it in the Solr schema.xml file. This is what I had to
>> do in schema.xml:
>>
>> 1. Create the new text_ca type:
>>
>> <!-- Catalan -->
>> <fieldType name="text_ca" class="solr.TextField"
>> positionIncrementGap="100">
>> <analyzer>
>> <tokenizer class="solr.StandardTokenizerFactory"/>
>> <filter class="solr.LowerCaseFilterFactory"/>
>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="lang/stopwords_ca.txt" format="snowball"
>> enablePositionIncrements="true"/>
>> <!-- Unfortunately, this class does not exist: <filter
>> class="solr.CatalanLightStemFilterFactory"/> -->
>> <!-- more aggressive: <filter
>> class="solr.SnowballPorterFilterFactory" language="Catalan"/> -->
>> </analyzer>
>> </fieldType>
>>
>> 2. Create the new text_ca field:
>>
>> <field name="text_ca" type="text_ca" indexed="true"
>> stored="false" multiValued="true"/><!-- Catchall for Catalan text
>> fields -->
>>
>> (copy&paste warning here for you: I can see "Catchall for German text
>> fields" commented for the rest of fields below "text_de" :)
>>
>> 3. Create the new dynamic field for the "ca" locale:
>>
>> <dynamicField name="*_ca" type="text_ca" indexed="true"
>> stored="true" multiValued="true" />
>>
>> 4. Copy the new field to its generic "collector" text field:
>>
>> <copyField source="*_ca" dest="text_ca"/>
>>
>> 5. Restart Tomcat, touch the files and rebuild the Solr indexes.
>>
>> Now I can see the fields and query using them:
>>
>> <response>
>>
>> [...]
>>
>> <arr name="numexp_es"><str>E0000001</str></arr>
>> <arr name="numexp_ca"><str>E0000001</str></arr>
>>
>> [...]
>>
>> </response>
>>
>> Again, thank you VERY much, Rüdiger :-)
>>
>> Regards.
>>
>>> Another reason for the missing field could be that you did not
>>> touch/publish the expected resource after changing the XSD. The
>>> resource must have been touched and/or published; otherwise the
>>> corresponding document will not have the new field. Alternatively, to
>>> be really sure, you can rebuild the whole index.
>>>
>>> regards
>>> Rüdiger
>>>
>>> On 14.02.2013 11:01, Arturo Martín Lladó wrote:
>>>
>>>> Hi,
>>>>
>>>> This is the XML output of the following URL:
>>>>
>>>>
>>>> http://localhost:8080/XXX/opencms/handleSolrSelect?fq=con_locales:*&fq=parent-folders:*&fq=path:/sites/default/ajuntament/.content/contrato/contrato_00002.html&fl=*,score
>>>>
>>>> <response>
>>>>   <lst name="responseHeader">
>>>>     <int name="status">0</int>
>>>>     <int name="QTime">4</int>
>>>>     <lst name="params">
>>>>       <str name="q">*:*</str>
>>>>       <str name="fl">*,score</str>
>>>>       <str name="qt">edismax</str>
>>>>       <int name="rows">10</int>
>>>>       <arr name="fq">
>>>>         <str>con_locales:*</str>
>>>>         <str>parent-folders:*</str>
>>>>         <str>path:/sites/default/ajuntament/.content/contrato/contrato_00002.html</str>
>>>>       </arr>
>>>>       <long name="start">0</long>
>>>>     </lst>
>>>>   </lst>
>>>>   <result name="response" numFound="1" start="0">
>>>>     <doc>
>>>>       <str name="id">2c61e8eb-6153-11e2-bfbe-d1cbafdd7d70</str>
>>>>       <str name="contentblob">[B:[B@4e2d1d1e</str>
>>>>       <str name="path">/sites/default/ajuntament/.content/contrato/contrato_00002.html</str>
>>>>       <str name="type">contrato</str>
>>>>       <str name="suffix">.html</str>
>>>>       <date name="created">2013-01-18T09:41:03Z</date>
>>>>       <date name="lastmodified">2013-01-30T15:59:08Z</date>
>>>>       <date name="contentdate">2013-02-13T14:51:16.803Z</date>
>>>>       <date name="relased">1970-01-01T00:00:00Z</date>
>>>>       <date name="expired">292278994-08-17T07:12:55.807Z</date>
>>>>       <arr name="res_locales"><str>ca</str></arr>
>>>>       <arr name="con_locales"><str>ca</str></arr>
>>>>       <str name="template_prop">/system/modules/es.tresdigits.alcudiaweb/templates/basica2columnas.jsp</str>
>>>>       <str name="default-file_prop">index.html</str>
>>>>       <str name="notification-interval_prop">0</str>
>>>>       <str name="NavPos_prop">2.0</str>
>>>>       <str name="enable-notification_prop">false</str>
>>>>       <str name="locale_prop">ca</str>
>>>>       <str name="NavText_prop">Principal</str>
>>>>       <str name="Title_prop">Títol del contracte 2</str>
>>>>       <arr name="category">
>>>>         <str>ca/</str>
>>>>         <str>ca/tipoContrato/</str>
>>>>         <str>ca/tipoContrato/tipo02/</str>
>>>>         <str>ca/tipoDocumentoContratacion/</str>
>>>>         <str>ca/tipoDocumentoContratacion/tipoDocumentoContratacion02/</str>
>>>>         <str>ca/tipoFaseContratacion/</str>
>>>>         <str>ca/tipoFaseContratacion/tipoFaseContratacion02/</str>
>>>>       </arr>
>>>>       <arr name="ca_excerpt"><str>E000002
>>>> Acta del contrato
>>>> Títol del contracte 2
>>>> </str></arr>
>>>>       <date name="timestamp">2013-02-14T09:44:02.904Z</date>
>>>>       <float name="score">1.0</float>
>>>>       <str name="link">/alcudiaweb/opencms/ajuntament/.content/contrato/contrato_00002.html</str>
>>>>     </doc>
>>>>   </result>
>>>> </response>
>>>>
>>>> Still no presence of the "numexp" field.
>>>> Hope it helps.
>>>>
>>>> Kind regards,
>>>>
>>>> Arturo Martín Lladó
>>>>
>>>> 2013/2/11 Rüdiger Kurz <r.kurz at alkacon.com>:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> please paste the following URL into a browser window (you need to be
>>>>> logged into OpenCms in that browser to have access to the OpenCms
>>>>> Offline Solr index):
>>>>>
>>>>>
>>>>>
>>>>> http://localhost:8080/opencms/opencms/handleSolrSelect?fq=con_locales:*&fq=parent-folders:*&fq=path:/the/full/path/to/the/resource/you/expect&fl=*,score
>>>>>
>>>>> and send the XML result to the mailing list.
>>>>>
>>>>> regards
>>>>> Rüdiger
>>>>>
>>>>> On 11.02.2013 19:32, Ramon Gavira wrote:
>>>>>
>>>>>> Hi Arturo.
>>>>>>
>>>>>> We have the same problem. Did you get it to work?
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: opencms-dev-bounces at opencms.org
>>>>>> [mailto:opencms-dev-bounces at opencms.org]
>>>>>> On behalf of Arturo Martín Lladó
>>>>>> Sent: Wednesday, 23 January 2013 12:54
>>>>>> To: opencms-dev at opencms.org
>>>>>> Subject: [opencms-dev] Mapping OpenCMS structured content XML field to
>>>>>> a SOLR field
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We are trying to map an OpenCMS structured content XML field to a SOLR
>>>>>> field in order to perform a search using that field as a filter.
>>>>>>
>>>>>> The XML field is described this way in the XSD file:
>>>>>>
>>>>>> <xsd:complexType name="OpenCmsContrato">
>>>>>> <xsd:sequence>
>>>>>> [...]
>>>>>> <xsd:element name="numeroExpediente" type="OpenCmsString"
>>>>>> minOccurs="1" maxOccurs="1" />
>>>>>> [...]
>>>>>> </xsd:sequence>
>>>>>> <xsd:attribute name="language" type="OpenCmsLocale"
>>>>>> use="required"/>
>>>>>> </xsd:complexType>
>>>>>>
>>>>>> And these are the search settings for the element, defined in the same
>>>>>> XSD file:
>>>>>>
>>>>>> <xsd:annotation>
>>>>>> <xsd:appinfo>
>>>>>> [...]
>>>>>> <searchsettings>
>>>>>> <searchsetting element="numeroExpediente"
>>>>>> searchcontent="true">
>>>>>> <solrfield targetfield="numexp" />
>>>>>> </searchsetting>
>>>>>> </searchsettings>
>>>>>> [...]
>>>>>> </xsd:appinfo>
>>>>>> </xsd:annotation>
>>>>>>
>>>>>> The target SOLR field "numexp" is defined this way in SOLR's schema.xml
>>>>>> file:
>>>>>>
>>>>>> <fields>
>>>>>> <field name="numexp" type="string"
>>>>>> indexed="true" stored="true" />
>>>>>> [...]
>>>>>> </fields>
>>>>>>
>>>>>> And this is the way we perform the query to SOLR on a JSP file:
>>>>>>
>>>>>> CmsSearchManager manager = OpenCms.getSearchManager();
>>>>>> CmsSolrIndex index = manager.getIndexSolr("Solr Online");
>>>>>>
>>>>>> String query = "fq=type:contrato";
>>>>>>
>>>>>> if (!"".equals(text))
>>>>>> query += "&fq=numexp:" + text;
>>>>>>
>>>>>> CmsSolrResultList listFiles = index.search(cmso, query);
>>>>>>
>>>>>> When we execute this code, we get listFiles.size() = 0, but when we
>>>>>> change the filter field to the predefined SOLR field "content", this
>>>>>> way:
>>>>>>
>>>>>> if (!"".equals(text))
>>>>>> query += "&fq=content:" + text;
>>>>>>
>>>>>> we get the expected result.
>>>>>>
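The concatenation above puts the user-supplied text into the filter query unescaped, so characters such as ":" or "*" would be read as query syntax. A minimal Java sketch of escaping the value before building the same query string; the class and its methods are hypothetical and hand-rolled for illustration (SolrJ ships a comparable utility, ClientUtils.escapeQueryChars), and it does not address how the "&"-separated parameter string itself is parsed:

```java
// Sketch only: hand-rolled escaping for illustration, not part of OpenCms or Solr.
final class FilterQuery {

    // Characters with special meaning in the Lucene/Solr query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Prefixes every special character and every whitespace character with a backslash.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds the same parameter string as the JSP snippet, for a given field name.
    static String build(String field, String text) {
        String query = "fq=type:contrato";
        if (text != null && !text.isEmpty()) {
            query += "&fq=" + field + ":" + escape(text);
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(build("numexp_ca", "E0000001"));
    }
}
```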
>>>>>> With the CmsSearchResource object we get using the "content" SOLR field
>>>>>> as filter, we are able to iterate over the fields of its inner
>>>>>> I_CmsSearchDocument, getting this list as result:
>>>>>>
>>>>>> id
>>>>>> contentblob
>>>>>> path
>>>>>> type
>>>>>> suffix
>>>>>> created
>>>>>> lastmodified
>>>>>> contentdate
>>>>>> relased
>>>>>> expired
>>>>>> res_locales
>>>>>> con_locales
>>>>>> template_prop
>>>>>> default-file_prop
>>>>>> notification-interval_prop
>>>>>> NavPos_prop
>>>>>> enable-notification_prop
>>>>>> locale_prop
>>>>>> NavText_prop
>>>>>> Title_prop
>>>>>> category
>>>>>> ca_excerpt
>>>>>> timestamp
>>>>>> score
>>>>>> link
>>>>>>
>>>>>> No presence of the "numexp" field on the list. Why? Are we missing any
>>>>>> step? Do we have to configure something else in order to make the
>>>>>> mapping work?
>>>>>>
>>>>>> Thank you in advance.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Arturo.
>>>>>
>>>>> --
>>>>> Kind Regards,
>>>>> Rüdiger.
>>>>>
>>>>> -------------------
>>>>>
>>>>> Rüdiger Kurz
>>>>>
>>>>> Alkacon Software GmbH - The OpenCms Experts
>>>>> http://www.alkacon.com - http://www.opencms.org
>>>>>
>>>>> _______________________________________________
>>>>> This mail is sent to you from the opencms-dev mailing list
>>>>> To change your list options, or to unsubscribe from the list, please
>>>>> visit
>>>>> http://lists.opencms.org/cgi-bin/mailman/listinfo/opencms-dev
>>>>>
>>>>>
>>>>>
>
--
Arturo Martín Lladó