[opencms-dev] Developed an XML Indexer for Lucene but getting error - EOF

M Butcher mbutcher at grcomputing.net
Mon Mar 15 21:39:01 CET 2004


What is throwing the exception, the XML parser or the indexer? Last 
week, I was working on my XSLT code and created some code that looks 
almost exactly like yours (except I created a Transformer instead of an 
XMLReader) and it worked fine -- perhaps the problem is in whatever gets 
handed to the IndexManager.
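
One way to tell the two apart is to parse the same bytes on their own, 
outside the indexer. The sketch below is only an illustration: the class 
name ParseCheck and the method diagnose are made up here, and it assumes 
the JAXP parser that ships with the JDK. It checks for empty contents 
first, since an empty stream is a common cause of a "Premature end of 
file" message from that parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper; not part of OpenCms or the Lucene module. */
public class ParseCheck {

    /**
     * Parses the given bytes in isolation and reports what happened.
     * A null or empty array is reported without invoking the parser.
     */
    public static String diagnose(byte[] contents) {
        if (contents == null || contents.length == 0) {
            return "empty input: nothing to parse";
        }
        try {
            XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // DefaultHandler ignores content events and rethrows fatal errors.
            DefaultHandler handler = new DefaultHandler();
            xr.setContentHandler(handler);
            xr.setErrorHandler(handler);
            xr.parse(new InputSource(new ByteArrayInputStream(contents)));
            return "parsed OK";
        } catch (SAXException e) {
            // The XML itself is malformed or truncated.
            return "parser error: " + e.getMessage();
        } catch (IOException | ParserConfigurationException e) {
            return "setup or I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(diagnose(new byte[0]));                      // empty contents
        System.out.println(diagnose("<a><b>text</b></a>".getBytes()));  // well-formed
        System.out.println(diagnose("<a><b>text</b>".getBytes()));      // truncated
    }
}
```

If f.getContents() turns out to be zero bytes for the failing files, the 
problem is upstream of the parser rather than in the XML handling.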

Matt

Alex ! wrote:
> OK, so I think I'm almost done, but now when the cron job runs (yes, it 
> has mysteriously begun working!), I get the following error about a 
> premature end of file. Any ideas? The way I am retrieving the file 
> contents is as follows:
> 
>             in = new ByteArrayInputStream(f.getContents());
>             is = new InputSource(in);
>             xr.parse(is);
> 
> where:
>     private XMLReader xr;
>     private InputStream in;
>     private InputSource is;
> 
> 
> Error output from OCMS log:
> 
> [13.03.2004 06:58:10] <opencms_cronscheduler> Starting job for 
> com.opencms.core.CmsCronEntry{58 6 * * * admin Administrators 
> net.grcomputing.opencms.search.lucene.CronIndexManager 
> createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml} 
> 
> [13.03.2004 06:58:10] <opencms_info>
> =====IndexManager============================================================= 
> 
> [13.03.2004 06:58:10] <opencms_info> Analyzer: 
> org.apache.lucene.analysis.standard.StandardAnalyzer
> [13.03.2004 06:58:10] <opencms_info> Extension map exists to handle XML
> [13.03.2004 06:58:10] <opencms_info> Page DocumentFactory loaded
> [13.03.2004 06:58:10] <opencms_info> IndexManager: indexing /test/
> [13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error 
> processing file test_xml.xml: com.opencms.core.CmsException: 0 Unknown 
> exception. Detailed error: Premature end of file..
> [13.03.2004 06:58:10] <opencms_info> IndexManager: indexing /test/xml/
> [13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error 
> processing file article5.xml: com.opencms.core.CmsException: 0 Unknown 
> exception. Detailed error: Premature end of file..
> [13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error 
> processing file article7.xml: com.opencms.core.CmsException: 0 Unknown 
> exception. Detailed error: Premature end of file..
> [13.03.2004 06:58:10] <opencms_info> IndexManager: 4 documents are being 
> processed
> [13.03.2004 06:58:10] <opencms_info> IndexManager:  Index has been 
> optimized.
> [13.03.2004 06:58:10] <opencms_info> Done
> =====IndexManager============================================================= 
> 
> [13.03.2004 06:58:10] <opencms_cronscheduler> Successful launch of job 
> com.opencms.core.CmsCronEntry{58 6 * * * admin Administrators 
> net.grcomputing.opencms.search.lucene.CronIndexManager 
> createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml} 
> Message: CronIndexManager rebuilt the Lucene index on Sat Mar 13 
> 06:58:10 GMT 2004
> 
> 
> Thanks, Alex
> 
> 
>> From: M Butcher <mbutcher at grcomputing.net>
>> Reply-To: opencms-dev at opencms.org
>> To: opencms-dev at opencms.org
>> Subject: Re: [opencms-dev] Developed an XML Indexer for Lucene but 
>> getting error
>> Date: Mon, 08 Mar 2004 10:03:43 -0700
>>
>>
>> Alex,
>>
>> I can't tell, from the stack trace, what is going on. Judging from 
>> where the exception is located, it looks like a problem with content 
>> defs... but that doesn't make sense....
>>
>> When you finish it, please do send it to Stephan and me. It sounds like 
>> a very useful addition to the existing indexing tools.
>>
>> Matt
>>
>> Alex ! wrote:
>>
>>> Hi,
>>>
>>> This one's probably for Matt/Stefan.
>>>
>>> I have written an XML indexer for the Lucene module (almost 
>>> finished), which will basically take an XML file, parse it, and then 
>>> add its elements and their contents to the Lucene index, instead of 
>>> stripping the element tags and then including the remaining content as 
>>> a single searchable body (as is currently available).
>>>
>>> Everything is now compiled (into a separate jar, just 2 class files), 
>>> and the cron job runs but gives the following error:
>>>
>>> [07.03.2004 14:20:10] <opencms_cronscheduler> Starting job for 
>>> com.opencms.core.CmsCronEntry{20 14 * * * admin Administrators 
>>> net.grcomputing.opencms.search.lucene.CronIndexManager 
>>> createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/uk_lucene_registry.xml} 
>>>
>>>
>>> [07.03.2004 14:20:10] <opencms_info>
>>> =====IndexManager============================================================= 
>>>
>>>
>>> [07.03.2004 14:20:10] <opencms_info> Analyzer: 
>>> org.apache.lucene.analysis.standard.StandardAnalyzer
>>> [07.03.2004 14:20:10] <opencms_info> Extension map exists to handle XML
>>> [07.03.2004 14:20:10] <opencms_info> Page DocumentFactory loaded
>>> [07.03.2004 14:20:10] <opencms_info> IndexManager: indexing /test/
>>> [07.03.2004 14:20:11] <opencms_info> Created XMLDocumentHandlerSAX
>>> [07.03.2004 14:20:11] <opencms_info> Return Document
>>> [07.03.2004 14:20:11] <opencms_cronscheduler> Error running job for 
>>> com.opencms.core.CmsCronEntry{20 14 * * * admin Administrators 
>>> net.grcomputing.opencms.search.lucene.CronIndexManager 
>>> createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml} 
>>> Error: java.lang.NullPointerException
>>>     at org.apache.lucene.index.FieldInfos.add(FieldInfos.java:90)
>>>     at org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:92)
>>>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:257)
>>>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:244)
>>>     at net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown Source)
>>>     at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
>>>     at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
>>>     at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
>>>     at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
>>>
>>>
>>> My registry entry for the XML files looks like this (contained in an 
>>> external registry file):
>>>
>>>       <!-- For XML Files :) -->
>>>       <docFactory enabled="true" type="plain">
>>>          <fileType name="XML">
>>>            <extension>.xml</extension>
>>>            <class>com.mydomain.opencms.lucene.xmlindexing.XMLDocument</class>
>>>          </fileType>
>>>       </docFactory>
>>>
>>> Your help would be much appreciated.
>>>
>>> (should I send you the source to correct and include in your next 
>>> patch/update?)
>>>
>>> Many Thanks
>>>
>>> Alex
>>>
>>> _______________________________________________
>>> This mail is send to you from the opencms-dev mailing list
>>> To change your list options, or to unsubscribe from the list, please 
>>> visit
>>> http://mail.opencms.org/mailman/listinfo/opencms-dev



