[opencms-dev] Developed an XML Indexer for Lucene but getting error - EOF
Alex !
kingofkingston at hotmail.com
Sat Mar 13 08:08:02 CET 2004
OK, so I think I'm almost done, but now when the cron job runs (yes, it has
mysteriously begun working!), I get the following error about a premature
end of file. Any ideas? The way I am retrieving the file contents is as
follows:
in = new ByteArrayInputStream(f.getContents());
is = new InputSource(in);
xr.parse(is);
where:
private XMLReader xr;
private InputStream in;
private InputSource is;
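For reference, here is a minimal, self-contained sketch of the same pattern (byte array to ByteArrayInputStream to InputSource to XMLReader). It is only an illustration: the class XmlBytesParser, the ElementCollector handler and the parse helper are hypothetical names, not part of OpenCms or the Lucene module, and the byte array stands in for whatever f.getContents() returns. Xerces typically reports "Premature end of file" when the input is empty or the stream has already been read, so the sketch rejects empty content up front and builds a fresh stream for every parse. Whether that is the cause here is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
public class XmlBytesParser {

    // Records element names in document order; a stand-in for a real indexing handler.
    static class ElementCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.add(qName);
        }
    }

    // Parses the given bytes with SAX and returns the element names that were seen.
    static List<String> parse(byte[] contents) throws Exception {
        // Empty input is what usually produces "Premature end of file", so fail early
        // with a clearer message instead of handing it to the parser.
        if (contents == null || contents.length == 0) {
            throw new IllegalArgumentException("no content to parse");
        }
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ElementCollector handler = new ElementCollector();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);
        // Create a new stream for each parse; a stream that was already consumed
        // looks like an empty document to the parser.
        xr.parse(new InputSource(new ByteArrayInputStream(contents)));
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<article><title>Hello</title><body>Text</body></article>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(xml)); // prints [article, title, body]
    }
}
```

Logging the length of the byte array just before the parse call is a cheap way to tell an empty resource apart from a malformed one.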
Error output from the OpenCms log:
[13.03.2004 06:58:10] <opencms_cronscheduler> Starting job for
com.opencms.core.CmsCronEntry{58 6 * * * admin Administrators
net.grcomputing.opencms.search.lucene.CronIndexManager
createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml}
[13.03.2004 06:58:10] <opencms_info>
=====IndexManager=============================================================
[13.03.2004 06:58:10] <opencms_info> Analyzer:
org.apache.lucene.analysis.standard.StandardAnalyzer
[13.03.2004 06:58:10] <opencms_info> Extension map exists to handle XML
[13.03.2004 06:58:10] <opencms_info> Page DocumentFactory loaded
[13.03.2004 06:58:10] <opencms_info> IndexManager: indexing /test/
[13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error processing
file test_xml.xml: com.opencms.core.CmsException: 0 Unknown exception.
Detailed error: Premature end of file..
[13.03.2004 06:58:10] <opencms_info> IndexManager: indexing /test/xml/
[13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error processing
file article5.xml: com.opencms.core.CmsException: 0 Unknown exception.
Detailed error: Premature end of file..
[13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error processing
file article7.xml: com.opencms.core.CmsException: 0 Unknown exception.
Detailed error: Premature end of file..
[13.03.2004 06:58:10] <opencms_info> IndexManager: 4 documents are being
processed
[13.03.2004 06:58:10] <opencms_info> IndexManager: Index has been
optimized.
[13.03.2004 06:58:10] <opencms_info> Done
=====IndexManager=============================================================
[13.03.2004 06:58:10] <opencms_cronscheduler> Successful launch of job
com.opencms.core.CmsCronEntry{58 6 * * * admin Administrators
net.grcomputing.opencms.search.lucene.CronIndexManager
createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml}
Message: CronIndexManager rebuilt the Lucene index on Sat Mar 13 06:58:10
GMT 2004
Thanks,
Alex
>From: M Butcher <mbutcher at grcomputing.net>
>Reply-To: opencms-dev at opencms.org
>To: opencms-dev at opencms.org
>Subject: Re: [opencms-dev] Developed an XML Indexer for Lucene but getting
>error
>Date: Mon, 08 Mar 2004 10:03:43 -0700
>
>
>Alex,
>
>I can't tell, from the stack trace, what is going on. Judging from where
>the exception is located, it looks like a problem with content defs... but
>that doesn't make sense....
>
>When you finish it, please do send it to Stephan and me. It sounds like a
>very useful addition to the existing indexing tools.
>
>Matt
>
>Alex ! wrote:
>>Hi,
>>
>>This one's probably for Matt/Stefan.
>>
>>I have written an XML Indexer for the Lucene module (almost finished),
>>which will basically take an XML file, parse it, and then add its elements
>>and their contents to the Lucene index, instead of stripping the element
>>tags and then including the remaining content as a single searchable body
>>(as is currently available).
>>
>>Everything is now compiled (into a separate jar, just 2 class files); the
>>cron job runs but gives the following error:
>>
>>[07.03.2004 14:20:10] <opencms_cronscheduler> Starting job for
>>com.opencms.core.CmsCronEntry{20 14 * * * admin Administrators
>>net.grcomputing.opencms.search.lucene.CronIndexManager
>>createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/uk_lucene_registry.xml}
>>
>>[07.03.2004 14:20:10] <opencms_info>
>>=====IndexManager=============================================================
>>
>>[07.03.2004 14:20:10] <opencms_info> Analyzer:
>>org.apache.lucene.analysis.standard.StandardAnalyzer
>>[07.03.2004 14:20:10] <opencms_info> Extension map exists to handle XML
>>[07.03.2004 14:20:10] <opencms_info> Page DocumentFactory loaded
>>[07.03.2004 14:20:10] <opencms_info> IndexManager: indexing /test/
>>[07.03.2004 14:20:11] <opencms_info> Created XMLDocumentHandlerSAX
>>[07.03.2004 14:20:11] <opencms_info> Return Document
>>[07.03.2004 14:20:11] <opencms_cronscheduler> Error running job for
>>com.opencms.core.CmsCronEntry{20 14 * * * admin Administrators
>>net.grcomputing.opencms.search.lucene.CronIndexManager
>>createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml}
>>Error: java.lang.NullPointerException
>>  at org.apache.lucene.index.FieldInfos.add(FieldInfos.java:90)
>>  at org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:92)
>>  at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:257)
>>  at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:244)
>>  at net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown Source)
>>  at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
>>  at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
>>  at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
>>  at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
>>
>>
>>My registry entry for the XML files looks like this (contained in an
>>external registry file):
>>
>>  <!-- For XML Files :) -->
>>  <docFactory enabled="true" type="plain">
>>    <fileType name="XML">
>>      <extension>.xml</extension>
>>      <class>com.mydomain.opencms.lucene.xmlindexing.XMLDocument</class>
>>    </fileType>
>>  </docFactory>
>>
>>Your help would be much appreciated.
>>
>>(should I send you the source to correct and include in your next
>>patch/update?)
>>
>>Many Thanks
>>
>>Alex
>>
>>_________________________________________________________________
>>Find a cheaper internet access deal - choose one to suit you.
>>http://www.msn.co.uk/internetaccess
>>
>>_______________________________________________
>>This mail is send to you from the opencms-dev mailing list
>>To change your list options, or to unsubscribe from the list, please visit
>>http://mail.opencms.org/mailman/listinfo/opencms-dev
>