[opencms-dev] Developed an XML Indexer for Lucene but getting error - EOF

Alex ! kingofkingston at hotmail.com
Sat Mar 13 08:08:02 CET 2004


Ok, so I think I'm almost done, but now when the cron runs (yes, it has 
mysteriously begun working!), I get the following error about a premature 
end of file. Any ideas? The way I am retrieving the file contents is as 
follows:

			in = new ByteArrayInputStream(f.getContents());
			is = new InputSource(in);
			xr.parse(is);

where: 	private XMLReader xr;
	private InputStream in;
	private InputSource is;
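
One thing that may be worth checking: SAX parsers commonly report "Premature end of file" when they are handed a stream with zero bytes in it, so the byte array coming back from f.getContents() could simply be empty. The sketch below parses a byte array the same way as above but rejects empty content before the parser sees it. The class XmlBytesCheck and the CountingHandler are hypothetical stand-ins (not part of OpenCms or the Lucene module), and obtaining the XMLReader through JAXP's SAXParserFactory is an assumption about the setup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class XmlBytesCheck {

    /** Counts start tags; stands in for a real indexing handler. */
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    /** Parses the given bytes and returns the number of elements seen. */
    static int parse(byte[] contents)
            throws SAXException, IOException, ParserConfigurationException {
        // Fail early with a clear message instead of letting the parser
        // complain about a premature end of file.
        if (contents == null || contents.length == 0) {
            throw new IOException("no content to parse (0 bytes)");
        }
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CountingHandler handler = new CountingHandler();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);
        // Use a fresh stream for every parse; a consumed stream reads as empty.
        xr.parse(new InputSource(new ByteArrayInputStream(contents)));
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<a><b/></a>".getBytes("UTF-8"))); // prints 2
    }
}
```

If the guard fires for test_xml.xml and the articles, the problem would be in how the file object was read rather than in the parsing code itself.
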


Error output from OCMS log:

[13.03.2004 06:58:10] <opencms_cronscheduler> Starting job for 
com.opencms.core.CmsCronEntry{58 6 * * * admin Administrators 
net.grcomputing.opencms.search.lucene.CronIndexManager 
createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml}
[13.03.2004 06:58:10] <opencms_info>
=====IndexManager=============================================================
[13.03.2004 06:58:10] <opencms_info> Analyzer: 
org.apache.lucene.analysis.standard.StandardAnalyzer
[13.03.2004 06:58:10] <opencms_info> Extension map exists to handle XML
[13.03.2004 06:58:10] <opencms_info> Page DocumentFactory loaded
[13.03.2004 06:58:10] <opencms_info> IndexManager: indexing /test/
[13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error processing 
file test_xml.xml: com.opencms.core.CmsException: 0 Unknown exception. 
Detailed error: Premature end of file..
[13.03.2004 06:58:10] <opencms_info> IndexManager: indexing /test/xml/
[13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error processing 
file article5.xml: com.opencms.core.CmsException: 0 Unknown exception. 
Detailed error: Premature end of file..
[13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error processing 
file article7.xml: com.opencms.core.CmsException: 0 Unknown exception. 
Detailed error: Premature end of file..
[13.03.2004 06:58:10] <opencms_info> IndexManager: 4 documents are being 
processed
[13.03.2004 06:58:10] <opencms_info> IndexManager:  Index has been 
optimized.
[13.03.2004 06:58:10] <opencms_info> Done
=====IndexManager=============================================================
[13.03.2004 06:58:10] <opencms_cronscheduler> Successful launch of job 
com.opencms.core.CmsCronEntry{58 6 * * * admin Administrators 
net.grcomputing.opencms.search.lucene.CronIndexManager 
createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml} 
Message: CronIndexManager rebuilt the Lucene index on Sat Mar 13 06:58:10 
GMT 2004


Thanks, Alex


>From: M Butcher <mbutcher at grcomputing.net>
>Reply-To: opencms-dev at opencms.org
>To: opencms-dev at opencms.org
>Subject: Re: [opencms-dev] Developed an XML Indexer for Lucene but getting 
>error
>Date: Mon, 08 Mar 2004 10:03:43 -0700
>
>
>Alex,
>
>I can't tell, from the stack trace, what is going on. Judging from where 
>the exception is located, it looks like a problem with content defs... but 
>that doesn't make sense....
>
>When you finish it, please do send it to Stephan and I. It sounds like a 
>very useful addition to the existing indexing tools.
>
>Matt
>
>Alex ! wrote:
>>Hi,
>>
>>This one's probably for Matt/Stefan.
>>
>>I have written an XML Indexer for the lucene module (almost finished), 
>>which will basically take an xml file, parse it, and then add its elements 
>>and their contents to the lucene index, instead of stripping the element 
>>tags and then including the remaining content as a single searchable body 
>>(as is currently available).
>>
>>Everything is now compiled (into a separate jar, just 2 class files), the 
>>cron job runs but gives the following error:
>>
>>[07.03.2004 14:20:10] <opencms_cronscheduler> Starting job for 
>>com.opencms.core.CmsCronEntry{20 14 * * * admin Administrators 
>>net.grcomputing.opencms.search.lucene.CronIndexManager 
>>createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/uk_lucene_registry.xml}
>>
>>[07.03.2004 14:20:10] <opencms_info>
>>=====IndexManager=============================================================
>>
>>[07.03.2004 14:20:10] <opencms_info> Analyzer: 
>>org.apache.lucene.analysis.standard.StandardAnalyzer
>>[07.03.2004 14:20:10] <opencms_info> Extension map exists to handle XML
>>[07.03.2004 14:20:10] <opencms_info> Page DocumentFactory loaded
>>[07.03.2004 14:20:10] <opencms_info> IndexManager: indexing /test/
>>[07.03.2004 14:20:11] <opencms_info> Created XMLDocumentHandlerSAX
>>[07.03.2004 14:20:11] <opencms_info> Return Document
>>[07.03.2004 14:20:11] <opencms_cronscheduler> Error running job for 
>>com.opencms.core.CmsCronEntry{20 14 * * * admin Administrators 
>>net.grcomputing.opencms.search.lucene.CronIndexManager 
>>createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml} 
>>Error: java.lang.NullPointerException
>>     at org.apache.lucene.index.FieldInfos.add(FieldInfos.java:90)
>>     at 
>>org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:92)
>>     at 
>>org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:257)
>>     at 
>>org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:244)
>>     at 
>>net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown 
>>Source)
>>     at 
>>net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown 
>>Source)
>>     at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown 
>>Source)
>>     at 
>>net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown 
>>Source)
>>     at 
>>com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
>>
>>
>>My registry entry for the xml files looks like this (contained in an 
>>external registry file):
>>
>>       <!-- For XML Files :) -->
>>       <docFactory enabled="true" type="plain">
>>          <fileType name="XML">
>>            <extension>.xml</extension>
>>            
>><class>com.mydomain.opencms.lucene.xmlindexing.XMLDocument</class>
>>          </fileType>
>>       </docFactory>
>>
>>Your help would be much appreciated.
>>
>>(should I send you the source to correct and include in your next 
>>patch/update?)
>>
>>Many Thanks
>>
>>Alex
>>
>>_________________________________________________________________
>>Find a cheaper internet access deal - choose one to suit you. 
>>http://www.msn.co.uk/internetaccess
>>
>>_______________________________________________
>>This mail is send to you from the opencms-dev mailing list
>>To change your list options, or to unsubscribe from the list, please visit
>>http://mail.opencms.org/mailman/listinfo/opencms-dev
>
