[opencms-dev] lucene in OpenCms 5 for Excel

Shi Yusen shiys at langhua.cn
Sun Jan 9 14:40:45 CET 2005


Hi Peter,

Try the following code to index MS Excel files:

package net.grcomputing.opencms.search.lucene;

import com.opencms.core.CmsException;
import com.opencms.file.CmsFile;
import com.opencms.file.CmsObject;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;

import jxl.Cell;
import jxl.CellType;
import jxl.Sheet;
import jxl.Workbook;
import jxl.read.biff.BiffException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

/**
 * Creates an org.apache.lucene.document.Document from an MS Excel file
 * so that the text (label) cells of all its sheets can be indexed.
 *
 * Title: ExcelDocument
 * Company: Beijing Langhua Ltd.
 *
 * @author Shi Yusen
 * @link    http://www.langhua.cn/
 * @version 1.0
 */
public class ExcelDocument extends BodylessDocument {

	public static final String FACTORY_NAME = "MS Excel DocumentFactory";

	public ExcelDocument() {
	}

	public String getFactoryName() {
		return FACTORY_NAME;
	}

	public Document Document(CmsObject cmso, CmsFile f) throws CmsException {
		// Collect the text of all label cells. A StringBuffer avoids the
		// "null" prefix that "+=" on a String initialized to null produces.
		StringBuffer bodyText = new StringBuffer();
		Document doc = super.Document(cmso, f);

		f = cmso.readFile(f.getAbsolutePath());

		InputStream in = new ByteArrayInputStream(f.getContents());
		Workbook wb = null;

		try {
			wb = Workbook.getWorkbook(in);
			Sheet[] sheets = wb.getSheets();
			for (int i = 0; i < sheets.length; i++) {
				for (int j = 0; j < sheets[i].getRows(); j++) {
					Cell[] cells = sheets[i].getRow(j);
					for (int k = 0; k < cells.length; k++) {
						// Only LABEL (plain text) cells are indexed; numeric,
						// date, boolean and formula cells are skipped.
						if (cells[k].getType().equals(CellType.LABEL)) {
							bodyText.append(cells[k].getContents()).append("\n");
						}
					}
				}
			}
		} catch (BiffException e) {
			throw new CmsException(e.getMessage());
		} catch (IOException e) {
			throw new CmsException(e.getMessage());
		} finally {
			// Free the memory jxl allocated for the parsed workbook.
			if (wb != null) {
				wb.close();
			}
		}

		if (bodyText.length() > 0) {
			String text = bodyText.toString();
			doc.add(Field.Text(FIELD_BODY, text));
			doc.add(Field.UnStored(FIELD_BULK, text));
		}

		return doc;
	}

	public Document Document(CmsObject cmso, CmsFile f, HashMap h)
		throws CmsException {
		return Document(cmso, f);
	}
}
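The sheet/row/cell traversal and the label-only filter can be tried on their own, without jxl, OpenCms or Lucene. The sketch below is a minimal model under stated assumptions: LabelTextExtractor, SimpleCell and Kind are hypothetical names invented for illustration and are not part of jxl, and a workbook is modeled as an array of sheets, each an array of rows of cells. It visits sheets, then rows, then cells, keeps only label contents one per line, and returns null when no label was found so an empty body can be skipped before any Lucene fields are added.

```java
/**
 * Hypothetical, jxl-free model of the extraction loop: only the traversal
 * order and the label-only filter mirror the ExcelDocument class.
 */
public class LabelTextExtractor {

    /** Simplified cell types (assumption); jxl's real CellType has more. */
    public enum Kind { LABEL, NUMBER, EMPTY }

    /** A hypothetical cell: a type tag plus its displayed contents. */
    public static final class SimpleCell {
        final Kind kind;
        final String contents;

        public SimpleCell(Kind kind, String contents) {
            this.kind = kind;
            this.contents = contents;
        }
    }

    /**
     * Appends the contents of every LABEL cell followed by a newline,
     * visiting sheets, then rows, then cells in order.
     * Returns null when no label cell exists.
     */
    public static String extract(SimpleCell[][][] sheets) {
        StringBuilder body = new StringBuilder();
        for (SimpleCell[][] sheet : sheets) {
            for (SimpleCell[] row : sheet) {
                for (SimpleCell cell : row) {
                    if (cell.kind == Kind.LABEL) {
                        body.append(cell.contents).append('\n');
                    }
                }
            }
        }
        return body.length() == 0 ? null : body.toString();
    }

    public static void main(String[] args) {
        SimpleCell[][][] workbook = {
            { // sheet 1: one label and one number, then a second label
                { new SimpleCell(Kind.LABEL, "Name"), new SimpleCell(Kind.NUMBER, "42") },
                { new SimpleCell(Kind.LABEL, "Beijing") }
            },
            { // sheet 2: nothing to index
                { new SimpleCell(Kind.EMPTY, "") }
            }
        };
        // Prints "Name" and "Beijing" on separate lines; "42" is skipped.
        System.out.print(extract(workbook));
    }
}
```

As in the class above, numbers are dropped; indexing them as well would only require widening the type check.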

Regards,

Shi Yusen / Beijing Langhua Ltd.


-----Original Message-----
From: Peter Korn [mailto:peter_korn at gmx.de]
Sent: January 7, 2005 18:02
To: opencms-dev at opencms.org
Subject: [opencms-dev] lucene in OpenCms 5 for Excel


Hi, 

Is it possible to use Lucene in OpenCms 5 for Excel files?

Thanks,

Peter


