[opencms-dev] Developed an XML Indexer for Lucene but getting error - EOF

Alex ! kingofkingston at hotmail.com
Tue Mar 16 21:40:02 CET 2004


I only have 3 XML files in the test dir I'm trying to index. One of those 
files is the one I am using in my JSP, and it works fine; see the code 
snippet below. The XMLDocument constructor throws no exceptions, nor does 
XMLDocument.Document(cmso,f).

It's got to be the IndexManager. Maybe the document I am producing is not 
what it is expecting? But then why the EOF?

Inside my jsp:


<%
    CmsJspActionElement cmsJspAE = new CmsJspActionElement(pageContext, request, response);
    CmsObject cmso = cmsJspAE.getCmsObject();
    CmsFile f = cmso.readFile("/test/test_xml.xml");

    String thepath = f.getAbsolutePath();
    out.println("<br>" + thepath + "<br><br>");

    XMLDocument xmldoc = null;
    Document thisdoc = null;

    try
    {
        xmldoc = new XMLDocument();
        out.println("<br>" + xmldoc.getFactoryName() + "<br>");
        thisdoc = xmldoc.Document(cmso, f);
    }
    catch (Exception e)
    {
        throw new CmsException(e.getMessage(), e.getCause());
    }
    String outdoc = thisdoc.toString();
    out.println("Lucene Document: <br><br>" + outdoc);
%>
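
One possible explanation worth ruling out (an assumption, not something the 
log confirms): a SAX parser typically reports "Premature end of file" when it 
is handed zero bytes, so the error would appear if f.getContents() came back 
empty in the cron run even though the same file has content when read through 
cmso.readFile() in the JSP. The sketch below uses only the JDK's built-in 
parser, not the OpenCms or Lucene module classes; EmptyInputCheck and 
parseError are hypothetical names used for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EmptyInputCheck {

    /**
     * Parses the given bytes with a SAX reader.
     * Returns null if parsing succeeds, otherwise the parser's error message.
     */
    public static String parseError(byte[] contents) {
        try {
            XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            xr.setContentHandler(new DefaultHandler());
            // Same pattern as in the thread: byte[] -> InputStream -> InputSource
            xr.parse(new InputSource(new ByteArrayInputStream(contents)));
            return null;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        byte[] xml = "<article><title>t</title></article>".getBytes(StandardCharsets.UTF_8);

        // Zero bytes: the parser hits end of input before any root element
        System.out.println("empty input -> " + parseError(empty));
        // Well-formed input: no error message
        System.out.println("valid input -> " + parseError(xml));
    }
}
```

If the empty case reproduces the message, logging f.getContents().length 
inside Document(cmso, f) during the cron run would show whether the 
IndexManager is passing in a file without its contents.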


>From: M Butcher <mbutcher at grcomputing.net>
>Reply-To: opencms-dev at opencms.org
>To: opencms-dev at opencms.org
>Subject: Re: [opencms-dev] Developed an XML Indexer for Lucene but getting 
>error - EOF
>Date: Tue, 16 Mar 2004 11:09:57 -0700
>
>Alex ! wrote:
>>OK, Matt. So I had some input from my colleague, changed the XMLDocument 
>>class (seems it wasn't done in the best way!) and now tried calling 
>>XMLDocument.Document(cmso,f) directly from a JSP - and it works: it returns 
>>a Lucene document, which I test by outputting it to screen using the 
>>Document.toString() method as before.
>>
>>But... the cron still returns the same premature end of file exception.
>
>On the same document? Do you know what is throwing the exception? Is it the 
>XMLDocument constructor or the IndexManager?
>
>>
>>
>>Alex
>>
>>
>>>From: "Alex !" <kingofkingston at hotmail.com>
>>>Reply-To: opencms-dev at opencms.org
>>>To: opencms-dev at opencms.org
>>>Subject: Re: [opencms-dev] Developed an XML Indexer for Lucene but 
>>>getting error - EOF
>>>Date: Mon, 15 Mar 2004 22:28:34 +0000
>>>
>>>It seems to be the indexer. I have a class XMLDocument (implements 
>>>I_FileDocumentFactory), which is based on bodyless document. Here I set 
>>>up the XMLReader and instantiate an XMLDocumentHandlerSAX class (extends 
>>>DefaultHandler).
>>>
>>>After some thorough debugging and testing, it seems to be the indexer, as 
>>>I can call the XMLDocumentHandlerSAX from within a JSP and it works, 
>>>returning a Lucene Document that I then print to screen using 
>>>Document.toString(). It all looks OK, although I haven't tried indexing it 
>>>myself (I was counting on the module doing this).
>>>
>>>Could it be the XMLDocument class? Here is what it looks like:
>>>
>>>public class XMLDocument implements I_FileDocumentFactory
>>>{
>>>     public static String FACTORY_NAME = "XML DocumentFactory";
>>>     private XMLDocumentHandlerSAX saxhdlr = null;
>>>     private XMLReader xr = null;
>>>     private InputStream in = null;
>>>     private InputSource is = null;
>>>
>>>     public XMLDocument() { }
>>>
>>>     public String getFactoryName() {
>>>        return FACTORY_NAME;
>>>     }
>>>
>>>     public Document Document(CmsObject cmso, CmsFile f) throws CmsException
>>>     {
>>>         try
>>>         {
>>>             // assign the field: declaring a local here would shadow it and
>>>             // leave the field null at the return statement below
>>>             saxhdlr = new XMLDocumentHandlerSAX(cmso, f);
>>>
>>>             in = new ByteArrayInputStream(f.getContents());
>>>             is = new InputSource(in);
>>>
>>>             //in = (InputStream)(new ByteArrayInputStream(f.getContents()));
>>>             //is = new InputSource(in);
>>>
>>>             //is = new InputSource (new StringReader (xmlText));
>>>
>>>             xr = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
>>>             xr.setContentHandler(saxhdlr);
>>>             xr.setFeature("http://xml.org/sax/features/validation", false);
>>>             xr.setFeature("http://apache.org/xml/features/continue-after-fatal-error", true);
>>>             xr.parse(is);
>>>
>>>         }
>>>         catch (Exception e)
>>>         {
>>>             throw new CmsException(e.getMessage(), e.getCause());
>>>         }
>>>         return saxhdlr.getDocument();
>>>     }
>>>
>>>     public Document Document(CmsObject cmso, CmsFile f, HashMap h) throws CmsException
>>>     {
>>>         return Document(cmso,f);
>>>     }
>>>}
>>>
>>>
>>>It seems the handler class returns what it should, so it is either the 
>>>XMLDocument class or the indexer which is complaining. Should I send you 
>>>the two src files? They're about as complete as they are going to get...
>>>
>>>Cheers
>>>
>>>Alex
>>>
>>>
>>>>From: M Butcher <mbutcher at grcomputing.net>
>>>>Reply-To: opencms-dev at opencms.org
>>>>To: opencms-dev at opencms.org
>>>>Subject: Re: [opencms-dev] Developed an XML Indexer for Lucene but 
>>>>getting error - EOF
>>>>Date: Mon, 15 Mar 2004 13:53:17 -0700
>>>>
>>>>What is throwing the exception, the XML parser or the indexer? Last 
>>>>week, I was working on my XSLT code and created some code that looks 
>>>>almost exactly like yours (except I created a Transformer instead of an 
>>>>XMLReader) and it worked fine -- perhaps the problem is in whatever gets 
>>>>handed to the IndexManager.
>>>>
>>>>Matt
>>>>
>>>>Alex ! wrote:
>>>>
>>>>>OK, so I think I'm almost done, but now when the cron runs (yes, it has 
>>>>>mysteriously begun working!), I get the following error, a 
>>>>>premature end of file. Any ideas? The way I am retrieving the file 
>>>>>contents is as follows:
>>>>>
>>>>>             in = new ByteArrayInputStream(f.getContents());
>>>>>             is = new InputSource(in);
>>>>>             xr.parse(is);
>>>>>
>>>>>where:     private XMLReader xr
>>>>>     private InputStream in
>>>>>     private InputSource is
>>>>>
>>>>>
>>>>>Error output from OCMS log:
>>>>>
>>>>>[13.03.2004 06:58:10] <opencms_cronscheduler> Starting job for 
>>>>>com.opencms.core.CmsCronEntry{58 6 * * * admin Administrators 
>>>>>net.grcomputing.opencms.search.lucene.CronIndexManager 
>>>>>createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml}
>>>>>
>>>>>
>>>>>[13.03.2004 06:58:10] <opencms_info>
>>>>>=====IndexManager=============================================================
>>>>>
>>>>>
>>>>>[13.03.2004 06:58:10] <opencms_info> Analyzer: 
>>>>>org.apache.lucene.analysis.standard.StandardAnalyzer
>>>>>[13.03.2004 06:58:10] <opencms_info> Extension map exists to handle XML
>>>>>[13.03.2004 06:58:10] <opencms_info> Page DocumentFactory loaded
>>>>>[13.03.2004 06:58:10] <opencms_info> IndexManager: indexing /test/
>>>>>[13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error 
>>>>>processing file test_xml.xml: com.opencms.core.CmsException: 0 Unknown 
>>>>>exception. Detailed error: Premature end of file..
>>>>>[13.03.2004 06:58:10] <opencms_info> IndexManager: indexing /test/xml/
>>>>>[13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error 
>>>>>processing file article5.xml: com.opencms.core.CmsException: 0 Unknown 
>>>>>exception. Detailed error: Premature end of file..
>>>>>[13.03.2004 06:58:10] <opencms_critical> IndexManager: CMS Error 
>>>>>processing file article7.xml: com.opencms.core.CmsException: 0 Unknown 
>>>>>exception. Detailed error: Premature end of file..
>>>>>[13.03.2004 06:58:10] <opencms_info> IndexManager: 4 documents are 
>>>>>being processed
>>>>>[13.03.2004 06:58:10] <opencms_info> IndexManager:  Index has been 
>>>>>optimized.
>>>>>[13.03.2004 06:58:10] <opencms_info> Done
>>>>>=====IndexManager=============================================================
>>>>>
>>>>>
>>>>>[13.03.2004 06:58:10] <opencms_cronscheduler> Successful launch of job 
>>>>>com.opencms.core.CmsCronEntry{58 6 * * * admin Administrators 
>>>>>net.grcomputing.opencms.search.lucene.CronIndexManager 
>>>>>createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml} 
>>>>>Message: CronIndexManager rebuilt the Lucene index on Sat Mar 13 
>>>>>06:58:10 GMT 2004
>>>>>
>>>>>
>>>>>Thanks alex
>>>>>
>>>>>
>>>>>>From: M Butcher <mbutcher at grcomputing.net>
>>>>>>Reply-To: opencms-dev at opencms.org
>>>>>>To: opencms-dev at opencms.org
>>>>>>Subject: Re: [opencms-dev] Developed an XML Indexer for Lucene but 
>>>>>>getting error
>>>>>>Date: Mon, 08 Mar 2004 10:03:43 -0700
>>>>>>
>>>>>>
>>>>>>Alex,
>>>>>>
>>>>>>I can't tell, from the stack trace, what is going on. Judging from 
>>>>>>where the exception is located, it looks like a problem with content 
>>>>>>defs... but that doesn't make sense....
>>>>>>
>>>>>>When you finish it, please do send it to Stephan and me. It sounds like 
>>>>>>a very useful addition to the existing indexing tools.
>>>>>>
>>>>>>Matt
>>>>>>
>>>>>>Alex ! wrote:
>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>this one's probably for Matt/Stefan.
>>>>>>>
>>>>>>>I have written an XML Indexer for the Lucene module (almost 
>>>>>>>finished), which will basically take an XML file, parse it, and then 
>>>>>>>add its elements and their contents to the Lucene index, instead of 
>>>>>>>stripping the element tags and then including the remaining content as 
>>>>>>>a single searchable body (as is currently available).
>>>>>>>
>>>>>>>Everything is now compiled (into a separate jar, just 2 class files); 
>>>>>>>the cron job runs but gives the following error:
>>>>>>>
>>>>>>>[07.03.2004 14:20:10] <opencms_cronscheduler> Starting job for 
>>>>>>>com.opencms.core.CmsCronEntry{20 14 * * * admin Administrators 
>>>>>>>net.grcomputing.opencms.search.lucene.CronIndexManager 
>>>>>>>createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/uk_lucene_registry.xml}
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>[07.03.2004 14:20:10] <opencms_info>
>>>>>>>=====IndexManager=============================================================
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>[07.03.2004 14:20:10] <opencms_info> Analyzer: 
>>>>>>>org.apache.lucene.analysis.standard.StandardAnalyzer
>>>>>>>[07.03.2004 14:20:10] <opencms_info> Extension map exists to handle 
>>>>>>>XML
>>>>>>>[07.03.2004 14:20:10] <opencms_info> Page DocumentFactory loaded
>>>>>>>[07.03.2004 14:20:10] <opencms_info> IndexManager: indexing /test/
>>>>>>>[07.03.2004 14:20:11] <opencms_info> Created XMLDocumentHandlerSAX
>>>>>>>[07.03.2004 14:20:11] <opencms_info> Return Document
>>>>>>>[07.03.2004 14:20:11] <opencms_cronscheduler> Error running job for 
>>>>>>>com.opencms.core.CmsCronEntry{20 14 * * * admin Administrators 
>>>>>>>net.grcomputing.opencms.search.lucene.CronIndexManager 
>>>>>>>createIndex=true,registry=C:/dev/java/tomcat-4.1.27/webapps/opencms/WEB-INF/config/epfolio_uk_lucene_registry.xml} 
>>>>>>>Error: java.lang.NullPointerException
>>>>>>>     at org.apache.lucene.index.FieldInfos.add(FieldInfos.java:90)
>>>>>>>     at org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:92)
>>>>>>>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:257)
>>>>>>>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:244)
>>>>>>>     at net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown Source)
>>>>>>>     at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
>>>>>>>     at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
>>>>>>>     at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
>>>>>>>     at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
>>>>>>>
>>>>>>>
>>>>>>>My registry entry for the XML files looks like this (contained in 
>>>>>>>the external registry file):
>>>>>>>
>>>>>>>       <!-- For XML Files :) -->
>>>>>>>       <docFactory enabled="true" type="plain">
>>>>>>>          <fileType name="XML">
>>>>>>>            <extension>.xml</extension>
>>>>>>>            <class>com.mydomain.opencms.lucene.xmlindexing.XMLDocument</class>
>>>>>>>          </fileType>
>>>>>>>       </docFactory>
>>>>>>>
>>>>>>>Your help would be much appreciated.
>>>>>>>
>>>>>>>(should I send you the source to correct and include in your next 
>>>>>>>patch/update?)
>>>>>>>
>>>>>>>Many Thanks
>>>>>>>
>>>>>>>Alex
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>This mail is send to you from the opencms-dev mailing list
>>>>>>>To change your list options, or to unsubscribe from the list, please 
>>>>>>>visit
>>>>>>>http://mail.opencms.org/mailman/listinfo/opencms-dev