[opencms-dev] Adding checksums to OpenCms 8

Sebastian Himberger sebastian.himberger at gmx.de
Thu Oct 1 03:43:21 CEST 2009


Hi,

Yes, I think Andreas' idea is a good one! I also think there should be a
small tool to add checksums to all resources that don't have one yet.
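
Such a tool could be little more than a loop over the content entries
whose checksum is still missing. Below is a rough sketch, not working
OpenCms code: ContentRow is a made-up in-memory stand-in for a stored
content entry, and the hash function is passed in so that the algorithm
stays replaceable. Because only the checksum is written, the modification
dates stay untouched.

```java
import java.util.List;
import java.util.function.Function;

/** Sketch of a backfill tool; all names here are hypothetical, not OpenCms APIs. */
final class ChecksumBackfill {

    /** In-memory stand-in for one content entry; hash == null means "no checksum yet". */
    static final class ContentRow {
        final byte[] content;
        final long dateLastModified;
        String hash;

        ContentRow(byte[] content, long dateLastModified, String hash) {
            this.content = content;
            this.dateLastModified = dateLastModified;
            this.hash = hash;
        }
    }

    /**
     * Fills in the checksum for every row that lacks one and returns the
     * number of rows that were updated. Rows that already carry a checksum
     * are left alone, and dateLastModified is never written.
     */
    static int backfill(List<ContentRow> rows, Function<byte[], String> hasher) {
        int updated = 0;
        for (ContentRow row : rows) {
            if (row.hash == null) {
                row.hash = hasher.apply(row.content);
                updated++;
            }
        }
        return updated;
    }
}
```

A second run over the same rows would update nothing, so the tool could be
run repeatedly without harm.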

All the best,
Sebastian

Christian Steinert schrieb:
> Sebastian Himberger wrote:
>> Hi Christian,
>>
>> that's a good idea. Maybe something like a checkIntegrity(I_CmsReport)
>> method or something along those lines.
>>   
> I like Andreas' comment on this. He is right: if checking is enabled,
> it should be enough to just read each file, and if something is wrong,
> an appropriate special kind of CmsException could be thrown.
>> I'm not sure about the properties and other metadata. If we use NULL
>> fields it would of course be possible to make this configurable.
> I think NULL fields are best for this since not everybody may want
> this kind of validation.
>
> It would be great if the upgrade wizard or a small separate tool could
> generate checksums for existing content.
> Of course, an admin could already use the present tools to touch all
> resources through the explorer and cause the content to be re-written,
> which would then re-calculate the checksums, but this would mess up
> the modification dates for all resources.
>
>>  I don't
>> think we need another integrity mechanism for the structure but
>> properties might be interesting. Although I think adding it to the
>> content would already provide a huge improvement.
>>   
> Agreed. This also depends a little on the direction in which OpenCms
> is going. There seems to be a slight push towards XML content and
> away from properties, although properties will probably be around for
> a long time and maybe forever. Checksums on VFS structures might
> really be the wrong thing to even try at the high level of abstraction
> at which OpenCms uses its storage, so whoever wants this level of
> integrity would need to use a sufficiently capable DB underneath.
>
> Best Regards
> Christian
>> Christian Steinert schrieb:
>>   
>>> Generally this sounds like a very good idea, and I agree that it seems
>>> best to add this at the driver level, although higher layers should have
>>> some way of requesting checksum validity information (not necessarily
>>> the checksum values themselves, since this is maybe too much of an
>>> implementation detail and, for example, the checksum algorithm might
>>> change over time).
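
One way to read this: callers ask for a status and never see the hash
itself. A minimal sketch, assuming made-up names (IntegrityStatus,
IntegrityCheck) that are not part of OpenCms. The hash function is a
parameter, so the algorithm could be swapped without touching callers:

```java
import java.util.function.Function;

/** Hypothetical result type: what a higher layer sees instead of the raw checksum. */
enum IntegrityStatus {
    VALID,     // a checksum is stored and matches the content
    CORRUPT,   // a checksum is stored but does not match the content
    UNCHECKED  // no checksum is stored for this content
}

/** Hypothetical helper a driver could use to answer validity requests. */
final class IntegrityCheck {

    static IntegrityStatus check(byte[] content, String storedHash,
                                 Function<byte[], String> hasher) {
        if (storedHash == null) {
            return IntegrityStatus.UNCHECKED;
        }
        return storedHash.equals(hasher.apply(content))
                ? IntegrityStatus.VALID
                : IntegrityStatus.CORRUPT;
    }
}
```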
>>>
>>> But would the checksums be restricted to file content, or are there also
>>> considerations to add checksums for properties and/or general file
>>> system structures? I find it hard to estimate the possible performance
>>> impact of checksumming this kind of information, too, so I don't know
>>> whether that is a good idea. Some OS filesystems are of course capable
>>> of doing checksums on all metadata as well, but they have very optimized
>>> data structures and I/O behavior, which might be impossible to achieve when
>>> sitting inside a Java VM + servlet container and on top of various
>>> different databases.
>>> Nonetheless, I at least wanted to raise the point.
>>>
>>> Best regards
>>> Christian
>>>   
>>>     
>>>> Hi List,
>>>>
>>>> I recently had a customer who stored about 10000 JPEGs inside OpenCms
>>>> (with MySQL). Due to hard disk degradation in a RAID 1 array, some of the
>>>> data became invalid (slowly over time, of course), resulting in corrupt
>>>> images. Although backups were in place (with checksums to verify
>>>> everything), the slow degradation made it extremely difficult to find the
>>>> corrupt images. The only way was to read backups from various stages and
>>>> compare checksums and last modification dates. I've read a lot about
>>>> data integrity, and since OpenCms stores all the binary data in the DB, I
>>>> think it might be worth adding integrity features to the database
>>>> structure.
>>>>
>>>> I would suggest adding at least a field FILE_CONTENT_HASH to the
>>>> CMS_CONTENTS table, which is filled in during file writes and updates.
>>>> The field could be NULLable, with NULL indicating that no checksum is
>>>> available. This would also allow disabling checksum generation in favor
>>>> of write performance. Maybe we could implement a hook in the driver
>>>> structure to perform validations on read (using a Java interface).
>>>> Additional checks could be performed using a scheduled task or custom
>>>> modules. Eventually it would be nice to have the checksum available in
>>>> the CmsFile objects, but I don't think this is a requirement for a first
>>>> step. I don't know if this should also be applied to properties. More
>>>> security is of course always good, but I really would want to keep the
>>>> changes to a minimum at first.
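
To make the proposal a bit more concrete, here is a rough sketch of the
two pieces: computing the value for FILE_CONTENT_HASH on write, and a
read hook expressed as a Java interface. The names ContentValidator and
Sha256ContentValidator are made up for illustration and are not OpenCms
APIs. SHA-256 is an assumption as well, since no algorithm has been
chosen in this thread.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical hook a driver could call after reading content; not an OpenCms API. */
interface ContentValidator {
    /** Returns true if the content matches the stored hash, or if no hash was stored. */
    boolean isValid(byte[] content, String storedHash);
}

/** Sketch of a validator that uses SHA-256 (an assumed algorithm choice). */
final class Sha256ContentValidator implements ContentValidator {

    /** Computes the lowercase hex digest that would be written to FILE_CONTENT_HASH. */
    static String hash(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean isValid(byte[] content, String storedHash) {
        if (storedHash == null) {
            return true; // NULL column: no checksum available, nothing to verify
        }
        return storedHash.equals(hash(content));
    }
}
```

With checksum generation disabled, the write path would simply store
NULL, and the read hook would then accept the content without checking.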
>>>>
>>>> What's your take on it?
>>>>
>>>> Best regards,
>>>> Sebastian
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> This mail is sent to you from the opencms-dev mailing list
>>>> To change your list options, or to unsubscribe from the list, please visit
>>>> http://lists.opencms.org/mailman/listinfo/opencms-dev



