[opencms-dev] Developed an XML Indexer for Lucene but getting error - EOF
M Butcher
mbutcher at grcomputing.net
Mon Mar 15 21:39:01 CET 2004
What is throwing the exception, the XML parser or the indexer? Last
week, I was working on my XSLT code and created some code that looks
almost exactly like yours (except I created a Transformer instead of an
XMLReader) and it worked fine -- perhaps the problem is in whatever gets
handed to the IndexManager.
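
One way to narrow that down is to run the parse step on its own, outside
the IndexManager, and look at what the parser is actually given. Below is
a minimal sketch using only the standard JAXP/SAX classes. The class name
ParseCheck and the method describeParse are hypothetical, and the byte
array stands in for whatever f.getContents() returns. An empty byte array
is one plausible way to end up with an end-of-file complaint from the
parser, so the sketch reports that case separately; whether that is what
is happening here is an assumption to verify.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ParseCheck {

    /** Parses the bytes in isolation and reports which stage, if any, failed. */
    static String describeParse(byte[] contents) {
        // Hypothetical guard: report missing content before the parser sees it.
        if (contents == null || contents.length == 0) {
            return "empty input";
        }
        try {
            XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DefaultHandler handler = new DefaultHandler();
            xr.setContentHandler(handler);
            // DefaultHandler rethrows fatal errors, so they surface as SAXException.
            xr.setErrorHandler(handler);
            xr.parse(new InputSource(new ByteArrayInputStream(contents)));
            return "parsed ok";
        } catch (SAXException e) {
            return "parser error: " + e.getMessage();
        } catch (ParserConfigurationException e) {
            return "parser setup error: " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeParse("<a><b>text</b></a>".getBytes(StandardCharsets.UTF_8)));
        System.out.println(describeParse(new byte[0]));
        System.out.println(describeParse("<a><b>text</b>".getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the check reports "empty input" for the files that fail, the problem
is upstream of the parser; if it reports "parsed ok", the indexer side is
the next place to look.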
Matt
Alex ! wrote:
> Ok, so I think I'm almost done, but now when the cron runs (yes, it has
> mysteriously begun working!), I get the following error about a
> premature end of file. Any ideas? The way I am retrieving the file
> contents is as follows:
>
> in = new ByteArrayInputStream(f.getContents());
> is = new InputSource(in);
> xr.parse(is);
>
> where:
>   private XMLReader xr;
>   private InputStream in;
>   private InputSource is;
>
>
> Error output from the OpenCms log:
>
> [13.03.2004 06:58:10] <opencms_cronscheduler> Starting job for
> com.opencms.core.CmsCronEntry{58 6 * * * admin Administrators
> net.grcomputing.opencms.search.lucene.CronIndexManager
> createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml}
>
> [13.03.2004 06:58:10] <opencms_info>
> =====IndexManager=============================================================
>
> [13.03.2004 06:58:10] <opencms_info> Analyzer:
> org.apache.lucene.analysis.standard.StandardAnalyzer
> [13.03.2004 06:58:10] <opencms_info> Extension map exists to handle XML
> [13.03.2004 06:58:10] <opencms_info> Page DocumentFactory loaded
> [13.03.2004 06:58:10] <opencms_info> IndexManager: indexing /test/
> [13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error
> processing file test_xml.xml: com.opencms.core.CmsException: 0 Unknown
> exception. Detailed error: Premature end of file..
> [13.03.2004 06:58:10] <opencms_info> IndexManager: indexing /test/xml/
> [13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error
> processing file article5.xml: com.opencms.core.CmsException: 0 Unknown
> exception. Detailed error: Premature end of file..
> [13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error
> processing file article7.xml: com.opencms.core.CmsException: 0 Unknown
> exception. Detailed error: Premature end of file..
> [13.03.2004 06:58:10] <opencms_info> IndexManager: 4 documents are being
> processed
> [13.03.2004 06:58:10] <opencms_info> IndexManager: Index has been
> optimized.
> [13.03.2004 06:58:10] <opencms_info> Done
> =====IndexManager=============================================================
>
> [13.03.2004 06:58:10] <opencms_cronscheduler> Successful launch of job
> com.opencms.core.CmsCronEntry{58 6 * * * admin Administrators
> net.grcomputing.opencms.search.lucene.CronIndexManager
> createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml}
> Message: CronIndexManager rebuilt the Lucene index on Sat Mar 13
> 06:58:10 GMT 2004
>
>
> Thanks, Alex
>
>
>> From: M Butcher <mbutcher at grcomputing.net>
>> Reply-To: opencms-dev at opencms.org
>> To: opencms-dev at opencms.org
>> Subject: Re: [opencms-dev] Developed an XML Indexer for Lucene but
>> getting error
>> Date: Mon, 08 Mar 2004 10:03:43 -0700
>>
>>
>> Alex,
>>
>> I can't tell, from the stack trace, what is going on. Judging from
>> where the exception is located, it looks like a problem with content
>> defs... but that doesn't make sense....
>>
>> When you finish it, please do send it to Stephan and me. It sounds like
>> a very useful addition to the existing indexing tools.
>>
>> Matt
>>
>> Alex ! wrote:
>>
>>> Hi,
>>>
>>> This one's probably for Matt/Stefan.
>>>
>>> I have written an XML indexer for the Lucene module (almost
>>> finished), which will basically take an XML file, parse it, and then
>>> add its elements and their contents to the Lucene index, instead of
>>> stripping the element tags and then including the remaining content as
>>> a single searchable body (as is currently available).
>>>
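
The approach described above (one searchable field per element rather
than one stripped body) can be sketched with a SAX handler that collects
element names and their text. This is only an illustration under
assumptions: ElementCollector and collect are hypothetical names, not the
classes in the jar, it uses current Java syntax, and the result is a
plain map rather than a Lucene Document, so turning each entry into a
field is left to the indexer. Elements with no text are skipped, on the
assumption that empty values are not wanted in the index.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCollector extends DefaultHandler {

    // Element name -> collected text, in document order.
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Simplification: text that precedes a child element is discarded here;
        // a fuller version would keep a stack of buffers.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks, so append each one.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            // Repeated elements are joined so no content is silently dropped.
            fields.merge(qName, value, (a, b) -> a + " " + b);
        }
        text.setLength(0);
    }

    /** Parses the XML bytes and returns the non-empty element texts by name. */
    static Map<String, String> collect(byte[] xml) {
        ElementCollector handler = new ElementCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new ByteArrayInputStream(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.fields;
    }

    public static void main(String[] args) {
        String xml = "<article><title>Hello</title><body>Some text</body></article>";
        System.out.println(collect(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Keeping the collection step separate from the Lucene calls makes it
possible to inspect the names and values before any of them reach the
index writer.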
>>> Everything is now compiled (into a separate jar, just 2 class files),
>>> and the cron job runs but gives the following error:
>>>
>>> [07.03.2004 14:20:10] <opencms_cronscheduler> Starting job for
>>> com.opencms.core.CmsCronEntry{20 14 * * * admin Administrators
>>> net.grcomputing.opencms.search.lucene.CronIndexManager
>>> createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/uk_lucene_registry.xml}
>>>
>>>
>>> [07.03.2004 14:20:10] <opencms_info>
>>> =====IndexManager=============================================================
>>>
>>>
>>> [07.03.2004 14:20:10] <opencms_info> Analyzer:
>>> org.apache.lucene.analysis.standard.StandardAnalyzer
>>> [07.03.2004 14:20:10] <opencms_info> Extension map exists to handle XML
>>> [07.03.2004 14:20:10] <opencms_info> Page DocumentFactory loaded
>>> [07.03.2004 14:20:10] <opencms_info> IndexManager: indexing /test/
>>> [07.03.2004 14:20:11] <opencms_info> Created XMLDocumentHandlerSAX
>>> [07.03.2004 14:20:11] <opencms_info> Return Document
>>> [07.03.2004 14:20:11] <opencms_cronscheduler> Error running job for
>>> com.opencms.core.CmsCronEntry{20 14 * * * admin Administrators
>>> net.grcomputing.opencms.search.lucene.CronIndexManager
>>> createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml}
>>> Error: java.lang.NullPointerException
>>>     at org.apache.lucene.index.FieldInfos.add(FieldInfos.java:90)
>>>     at org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:92)
>>>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:257)
>>>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:244)
>>>     at net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown Source)
>>>     at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
>>>     at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
>>>     at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
>>>     at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
>>>
>>>
>>> My registry entry for the XML files looks like this (contained in
>>> an external registry file):
>>>
>>> <!-- For XML Files :) -->
>>> <docFactory enabled="true" type="plain">
>>>   <fileType name="XML">
>>>     <extension>.xml</extension>
>>>     <class>com.mydomain.opencms.lucene.xmlindexing.XMLDocument</class>
>>>   </fileType>
>>> </docFactory>
>>>
>>> Your help would be much appreciated.
>>>
>>> (Should I send you the source to correct and include in your next
>>> patch/update?)
>>>
>>> Many Thanks
>>>
>>> Alex
>>>
>>> _______________________________________________
>>> This mail is send to you from the opencms-dev mailing list
>>> To change your list options, or to unsubscribe from the list, please
>>> visit
>>> http://mail.opencms.org/mailman/listinfo/opencms-dev