[opencms-dev] CmsCollector performance

Shi Yusen shiys at langhua.cn
Wed Mar 5 15:28:24 CET 2008


Sorry that we cannot open-source the performance module. We have to make a living.

Anyway, here are the steps to build your own module:
1. Add some new filters in CmsResourceFilter. For example, I added two
new filter methods: addTopLatest(int top) and addPagedLatest(int
startRow, int rowsInPage).

2. Add new filter branches in
org.opencms.db.CmsDriverManager.readResources(...)

3. Add new functions that implement the new filter branches in
org.opencms.db.oracle.CmsVfsDriver

4. Add new query strings to /org/opencms/db/oracle/query.properties. For
example, you can use
# patterns for statements to select resources/folders (= selections without content)
# THINGS TO KNOW: don't select the project-ID attrib. of the structure table per default!
# There are cases, where the project-ID attrib. of the project-resources tab. is used
# as the project-ID!
C_ORACLE_RESOURCES_SELECT_ATTRIBS_LEVEL1=\
    STRUCTURE_ID,\
	RESOURCE_ID,\
	RESOURCE_PATH,\
	STRUCTURE_STATE,\
	DATE_RELEASED,\
	DATE_EXPIRED,\
	STRUCTURE_VERSION,\
	RESOURCE_ID_2,\
	RESOURCE_TYPE,\
	RESOURCE_FLAGS,\
	RESOURCE_STATE,\
	DATE_CREATED,\
	DATE_LASTMODIFIED,\
	USER_CREATED,\
	USER_LASTMODIFIED,\
	LOCKED_IN_PROJECT,\
	RESOURCE_SIZE,\
	DATE_CONTENT,\
	SIBLING_COUNT,\
	RESOURCE_VERSION

# patterns for statements to select resources/folders (= selections without content)
# THINGS TO KNOW: don't select the project-ID attrib. of the structure table per default!
# There are cases, where the project-ID attrib. of the project-resources tab. is used
# as the project-ID!
C_ORACLE_RESOURCES_SELECT_ATTRIBS_LEVEL2=\
    CMS_${PROJECT}_STRUCTURE.STRUCTURE_ID AS STRUCTURE_ID,\
	CMS_${PROJECT}_STRUCTURE.RESOURCE_ID AS RESOURCE_ID,\
	CMS_${PROJECT}_STRUCTURE.RESOURCE_PATH AS RESOURCE_PATH,\
	CMS_${PROJECT}_STRUCTURE.STRUCTURE_STATE AS STRUCTURE_STATE,\
	CMS_${PROJECT}_STRUCTURE.DATE_RELEASED AS DATE_RELEASED,\
	CMS_${PROJECT}_STRUCTURE.DATE_EXPIRED AS DATE_EXPIRED,\
	CMS_${PROJECT}_STRUCTURE.STRUCTURE_VERSION AS STRUCTURE_VERSION,\
	CMS_${PROJECT}_RESOURCES.RESOURCE_ID AS RESOURCE_ID_2,\
	CMS_${PROJECT}_RESOURCES.RESOURCE_TYPE AS RESOURCE_TYPE,\
	CMS_${PROJECT}_RESOURCES.RESOURCE_FLAGS AS RESOURCE_FLAGS,\
	CMS_${PROJECT}_RESOURCES.RESOURCE_STATE AS RESOURCE_STATE,\
	CMS_${PROJECT}_RESOURCES.DATE_CREATED AS DATE_CREATED,\
	CMS_${PROJECT}_RESOURCES.DATE_LASTMODIFIED AS DATE_LASTMODIFIED,\
	CMS_${PROJECT}_RESOURCES.USER_CREATED AS USER_CREATED,\
	CMS_${PROJECT}_RESOURCES.USER_LASTMODIFIED AS USER_LASTMODIFIED,\
	CMS_${PROJECT}_RESOURCES.PROJECT_LASTMODIFIED AS LOCKED_IN_PROJECT,\
	CMS_${PROJECT}_RESOURCES.RESOURCE_SIZE AS RESOURCE_SIZE,\
	CMS_${PROJECT}_RESOURCES.DATE_CONTENT AS DATE_CONTENT,\
	CMS_${PROJECT}_RESOURCES.SIBLING_COUNT AS SIBLING_COUNT,\
	CMS_${PROJECT}_RESOURCES.RESOURCE_VERSION AS RESOURCE_VERSION

#
# General subtree selection statement
#
C_ORACLE_RESOURCES_READ_TREE_PAGED=\
SELECT \
    ${C_ORACLE_RESOURCES_SELECT_ATTRIBS_LEVEL1} \
FROM (\
    SELECT \
        ${C_ORACLE_RESOURCES_SELECT_ATTRIBS_LEVEL1},\
        ROWNUM AS ROW_NUMBER \
    FROM (\
        SELECT \
            ${C_ORACLE_RESOURCES_SELECT_ATTRIBS_LEVEL2},\
            CMS_${PROJECT}_RESOURCES.PROJECT_LASTMODIFIED \
        FROM \
	        ${C_RESOURCES_SELECT_TABLES} \
        WHERE \
	        ${C_JOIN_RESOURCE_STRUCTURE}

#
# Resources order by DATE_LASTMODIFIED AND ROWNUM
#
C_ORACLE_RESOURCES_PAGED_ORDER_BY_DATELASTMODIFIED=\
	        ORDER BY CMS_${PROJECT}_RESOURCES.DATE_LASTMODIFIED DESC\
	    )\
	) \
WHERE ROW_NUMBER >=? AND ROW_NUMBER <?

to get paged latest resources.
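
The statement is split into an opening and a closing property, presumably so
that the driver can append its own conditions (path, resource type and so on)
between the two halves. Here is a minimal Java sketch of that assembly. The
class and method names are hypothetical, the SQL is a shortened stand-in for
the resolved properties, and startRow is assumed to be 1-based because
Oracle's ROWNUM starts at 1.

```java
import java.util.List;

// Hypothetical helper, not OpenCms code.
final class PagedQuerySketch {

    // Shortened stand-in for C_ORACLE_RESOURCES_READ_TREE_PAGED after macro
    // expansion (assumption: the online project tables are used).
    static final String READ_TREE_PAGED =
        "SELECT STRUCTURE_ID FROM ("
        + "SELECT STRUCTURE_ID, ROWNUM AS ROW_NUMBER FROM ("
        + "SELECT S.STRUCTURE_ID AS STRUCTURE_ID"
        + " FROM CMS_ONLINE_STRUCTURE S, CMS_ONLINE_RESOURCES R"
        + " WHERE S.RESOURCE_ID = R.RESOURCE_ID";

    // Stand-in for C_ORACLE_RESOURCES_PAGED_ORDER_BY_DATELASTMODIFIED: it
    // closes both subqueries opened above.
    static final String PAGED_ORDER_BY =
        " ORDER BY R.DATE_LASTMODIFIED DESC))"
        + " WHERE ROW_NUMBER >= ? AND ROW_NUMBER < ?";

    /** Joins the opening half, extra AND conditions, and the closing half. */
    static String buildQuery(List<String> extraConditions) {
        StringBuilder sql = new StringBuilder(READ_TREE_PAGED);
        for (String condition : extraConditions) {
            sql.append(" AND ").append(condition);
        }
        return sql.append(PAGED_ORDER_BY).toString();
    }

    /** Values for the last two '?' markers: lower bound inclusive, upper exclusive. */
    static int[] bindValues(int startRow, int rowsInPage) {
        if (startRow < 1 || rowsInPage < 1) {
            throw new IllegalArgumentException("startRow and rowsInPage must be >= 1");
        }
        return new int[] {startRow, startRow + rowsInPage};
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(List.of("S.RESOURCE_PATH LIKE ?")));
        int[] bounds = bindValues(21, 20); // second page of 20 rows
        System.out.println(bounds[0] + " .. " + bounds[1]);
    }
}
```

Any '?' markers in the appended conditions come before the two row bounds, so
they have to be bound first on the PreparedStatement.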

Use
#
# General subtree selection statement
#
C_ORACLE_RESOURCES_READ_TREE=\
SELECT \
    ${C_ORACLE_RESOURCES_SELECT_ATTRIBS_LEVEL1} \
FROM (\
    SELECT \
        ${C_ORACLE_RESOURCES_SELECT_ATTRIBS_LEVEL2},\
        CMS_${PROJECT}_RESOURCES.PROJECT_LASTMODIFIED \
    FROM \
	    ${C_RESOURCES_SELECT_TABLES} \
    WHERE \
	    ${C_JOIN_RESOURCE_STRUCTURE}

#
# Resources order by DATE_LASTMODIFIED
#
C_ORACLE_RESOURCES_ORDER_BY_DATELASTMODIFIED=\
	    ORDER BY CMS_${PROJECT}_RESOURCES.DATE_LASTMODIFIED DESC\
	) \
WHERE ROWNUM<=?

to get top latest resources.
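
The ORDER BY sits inside the subquery because Oracle assigns ROWNUM before it
applies the ORDER BY of the same query block, so a flat "WHERE ROWNUM<=? ORDER
BY ..." would sort an arbitrary set of rows rather than return the newest
ones. The following in-memory Java analogy illustrates the difference. No
database is involved, and the class name and the sample values are made up
for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory analogy only; not OpenCms code.
final class RownumOrderSketch {

    /** Like a flat "WHERE ROWNUM <= n ORDER BY x DESC": cut first, sort after. */
    static List<Integer> limitThenSort(List<Integer> rows, int n) {
        return rows.stream()
            .limit(n)
            .sorted(Comparator.reverseOrder())
            .collect(Collectors.toList());
    }

    /** Like the nested form: sort in the inner query, cut in the outer one. */
    static List<Integer> sortThenLimit(List<Integer> rows, int n) {
        return rows.stream()
            .sorted(Comparator.reverseOrder())
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-ins for DATE_LASTMODIFIED values in arbitrary storage order.
        List<Integer> lastModified = List.of(3, 9, 1, 7, 5);
        System.out.println(limitThenSort(lastModified, 2)); // [9, 3]
        System.out.println(sortThenLimit(lastModified, 2)); // [9, 7]
    }
}
```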

Filter the resources in SQL rather than in Java. That's the trick.
Simple, right?
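
Steps 1 and 2 could be sketched roughly as below. This is not the real
CmsResourceFilter or CmsDriverManager code: the class, its fields and the
openingKey()/closingKey() methods are hypothetical, and only the two add...
method names and the property keys are taken from this mail. The stock key
C_RESOURCES_READ_TREE is an assumed name for the unmodified query.

```java
// Hypothetical stand-in for the extended filter; not the real CmsResourceFilter.
final class LatestFilterSketch {

    /** Filter without any row limit. */
    static final LatestFilterSketch ALL = new LatestFilterSketch(0, 0, 0);

    private final int top;        // > 0 when addTopLatest was used
    private final int startRow;   // > 0 when addPagedLatest was used
    private final int rowsInPage; // page size for addPagedLatest

    private LatestFilterSketch(int top, int startRow, int rowsInPage) {
        this.top = top;
        this.startRow = startRow;
        this.rowsInPage = rowsInPage;
    }

    /** Returns a new filter limited to the 'top' most recently modified rows. */
    LatestFilterSketch addTopLatest(int top) {
        if (top < 1) {
            throw new IllegalArgumentException("top must be >= 1");
        }
        return new LatestFilterSketch(top, 0, 0);
    }

    /** Returns a new filter for one page of the most recently modified rows. */
    LatestFilterSketch addPagedLatest(int startRow, int rowsInPage) {
        if (startRow < 1 || rowsInPage < 1) {
            throw new IllegalArgumentException("startRow and rowsInPage must be >= 1");
        }
        return new LatestFilterSketch(0, startRow, rowsInPage);
    }

    /** Branch a driver manager could take: which opening query to load. */
    String openingKey() {
        if (startRow > 0) {
            return "C_ORACLE_RESOURCES_READ_TREE_PAGED";
        }
        if (top > 0) {
            return "C_ORACLE_RESOURCES_READ_TREE";
        }
        return "C_RESOURCES_READ_TREE"; // assumed name of the stock query
    }

    /** Matching closing fragment, or an empty string when no limit is set. */
    String closingKey() {
        if (startRow > 0) {
            return "C_ORACLE_RESOURCES_PAGED_ORDER_BY_DATELASTMODIFIED";
        }
        if (top > 0) {
            return "C_ORACLE_RESOURCES_ORDER_BY_DATELASTMODIFIED";
        }
        return "";
    }
}
```

For example, LatestFilterSketch.ALL.addPagedLatest(21, 20) selects the paged
pair of properties, and LatestFilterSketch.ALL.addTopLatest(20) selects the
top-N pair.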

Good luck,

Shi Yusen/Beijing Langhua Ltd.


On Wed, 2008-03-05 at 05:28 -0800, marcio.camurati wrote:
> Shi Yusen,
> 
> The speed is really very nice!
> 
> Can you give more information about this project? Do you use only simple
> collectors like allInFolderDateReleasedDesc or allInSubTreeDateReleasedDesc
> to get this information from the OpenCms resources?
> 
> At the end you said not to worry about the SQL problem. Does this problem
> really exist if we use the OpenCms core, and is it really necessary to
> modify the SQL operations to resolve the performance problem that
> Martin posted?
> 
> Best regards,
> Marcio Camurati
> 
> 
> Shi Yusen wrote:
> > 
> > Here is a website we almost completed with more than 50k pages now, and
> > about 15k-20k more annually. You'll find it's quite fast.
> > http://www2.scnjw.com/scnjw/index.html
> > 
> > CentOS + Squid + Apache + Tomcat + OpenCms 7.0.3 + Oracle.
> > 
> > Don't worry about OpenCms performance. For the sql problem, you can
> > write your own performance module to improve the sql operation. It's not
> > difficult.
> > 
> > Regards,
> > 
> > Shi Yusen/Beijing Langhua Ltd.
> > 
> > 
> > On Wed, 2008-03-05 at 03:47 -0800, marcio.camurati wrote:
> >> Hi Martin,
> >> 
> >> I read your post and see your scenario in our future project, which is
> >> only just beginning. Performance is one of our concerns about using OpenCms
> >> to manage the content of the site. Do you have any good news about your
> >> performance problem, or does it continue?
> >> 
> >> It would help if you could say more about it or about the solution you found.
> >> 
> >> Best regards,
> >> Marcio Camurati
> >> 
> >> 
> >> Martin Bednář wrote:
> >> > 
> >> > I have problems with very poor performance of collectors, especially with
> >> > allInFolderDateReleasedDesc and allInSubTreeDateReleasedDesc. I have a
> >> > site with more than 10000 articles categorized in folders, so I have this
> >> > structure:
> >> > 
> >> > /Categories
> >> > /Categories/Cat1
> >> > /Categories/Cat2
> >> > /Categories/CatX
> >> > ...
> >> > 
> >> > In Cat1 ... CatX I have a page which shows the articles of the whole
> >> > category (paged by 20 articles).
> >> > On the homepage I have the 20 newest articles from all categories.
> >> > I use something like:
> >> > <cms:contentload
> >> >    collector="allInSubTreeDateReleasedDesc" 
> >> > param="/Categories/|magArticle|20"
> >> >    editable="true">
> >> > 
> >> > respectively in cat1
> >> > <cms:contentload
> >> >    collector="allInSubTreeDateReleasedDesc" 
> >> > param="/Categories/Cat1/|magArticle"
> >> >    pageSize="20" pageIndex="1" editable="true">
> >> > 
> >> > Performance is very poor. I looked at the source code and saw that the CMS
> >> > works (for my homepage, for example) in this way:
> >> > Load the /Categories resource from the DB and create a CmsResource
> >> > Load all 10000 resources under /Categories from the DB and create
> >> > CmsResource objects for them
> >> > Sort all 10000 CmsResources by release date
> >> > Throw away 9980 unneeded objects !!!
> >> > Return 20 CmsResources
> >> > 
> >> > It's really crazy.
> >> > 
> >> > Is there a way to optimize it?
> >> > 
> >> > Why is the data not sorted on the SQL server, with only 20 items returned
> >> > in the recordset?
> >> > 
> >> > I think it's a real performance problem, a waste of CPU and memory.
> >> > 
> >> > This operation takes about 2.3 min on my server (CPU 2x quad-core 2.4 GHz,
> >> > 4 GB RAM, 8x HDD on a 3ware card in RAID 6); on my old server with an AMD64
> >> > Opteron and 4 HDDs in software RAID it takes about 14 minutes!
> >> > 
> >> > Martin
> >> > 
> >> > 
> >> > _______________________________________________
> >> > This mail is sent to you from the opencms-dev mailing list
> >> > To change your list options, or to unsubscribe from the list, please visit
> >> > http://lists.opencms.org/mailman/listinfo/opencms-dev
> >> > 
> >> 
> >> -- 
> >> View this message in context:
> >> http://www.nabble.com/CmsCollector-performance-tp14931100p15848425.html
> >> Sent from the OpenCMS - Dev mailing list archive at Nabble.com.
> >> 
> >> 
> > 
> > 
> > 
> 
> -- 
> View this message in context: http://www.nabble.com/CmsCollector-performance-tp14931100p15850039.html
> Sent from the OpenCMS - Dev mailing list archive at Nabble.com.
> 
> 



