[opencms-dev] Adding checksums to OpenCms 8

Sebastian Himberger sebastian.himberger at gmx.de
Wed Sep 30 17:03:35 CEST 2009


Hi Christian,

That's a good idea. Maybe something like a checkIntegrity(I_CmsReport)
method, or something along those lines.

I'm not sure about the properties and other metadata. If we use NULLable
fields, it would of course be possible to make this configurable. I don't
think we need another integrity mechanism for the structure, but
properties might be interesting. Still, I think adding it to the
content alone would already be a huge improvement.
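
To make this a bit more concrete, here is a rough sketch of how a
NULLable content hash and an integrity check could fit together. It is
not OpenCms code: the class ContentChecksum, the StoredContent holder and
the Status enum are made-up names, SHA-256 is just one possible choice of
algorithm, and a plain Map stands in for the CMS_CONTENTS table. The
check returns the affected paths instead of writing to an I_CmsReport so
that the example stays self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only, not part of OpenCms.
final class ContentChecksum {

    /** Validity information for higher layers; the hash itself stays hidden. */
    enum Status { OK, CORRUPT, NO_CHECKSUM }

    /** Content bytes plus the value that would be stored in FILE_CONTENT_HASH. */
    static final class StoredContent {
        final byte[] content;
        final String hash; // null means no checksum was generated on write

        StoredContent(byte[] content, String hash) {
            this.content = content;
            this.hash = hash;
        }
    }

    /** Returns the lowercase hex SHA-256 digest (algorithm is an assumption). */
    static String hash(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Recomputes the hash on read and compares it with the stored value. */
    static Status verify(StoredContent stored) {
        if (stored.hash == null) {
            return Status.NO_CHECKSUM;
        }
        return hash(stored.content).equals(stored.hash) ? Status.OK : Status.CORRUPT;
    }

    /** Scans all entries and returns the paths whose content fails the check. */
    static List<String> checkIntegrity(Map<String, StoredContent> contents) {
        List<String> corrupt = new ArrayList<>();
        for (Map.Entry<String, StoredContent> entry : contents.entrySet()) {
            if (verify(entry.getValue()) == Status.CORRUPT) {
                corrupt.add(entry.getKey());
            }
        }
        return corrupt;
    }

    public static void main(String[] args) {
        byte[] image = "image bytes".getBytes(StandardCharsets.UTF_8);
        byte[] damaged = "image bytez".getBytes(StandardCharsets.UTF_8);

        Map<String, StoredContent> table = new LinkedHashMap<>();
        table.put("/images/ok.jpg", new StoredContent(image, hash(image)));
        table.put("/images/bad.jpg", new StoredContent(damaged, hash(image)));
        table.put("/images/legacy.jpg", new StoredContent(image, null));

        System.out.println("Corrupt resources: " + checkIntegrity(table));
    }
}
```

Entries with a null hash are skipped rather than reported as corrupt,
which is what would allow switching checksum generation off in favor of
write performance. The same verify() call could also be used by a
scheduled task.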

All the best,
Sebastian


Christian Steinert schrieb:
> Generally this sounds like a very good idea, and I agree that it seems
> best to add this at the driver level, although higher layers should have
> some way of requesting checksum validity information (not necessarily
> the checksum values themselves, since this is maybe too much of an
> implementation detail and, for example, the checksum algorithm might
> change over time).
>
> But would the checksums be restricted to file content, or are there also
> considerations to add checksums for properties and/or general file
> system structures? I find it hard to estimate the possible performance
> impact of checksumming this kind of information too, so I don't know
> whether that is a good idea. Some OS filesystems are of course capable
> of doing checksums on all metadata as well, but they have highly optimized
> data structures and I/O behavior, which might be impossible to achieve when
> sitting inside a Java VM + servlet container and on top of various
> different databases.
> Nonetheless, I at least wanted to raise the point.
>
> Best regards
> Christian
>   
>> Hi List,
>>
>> I recently had a customer who stored about 10,000 JPEGs inside OpenCms
>> (with MySQL). Due to hard disk degradation in a RAID 1 array, some of the
>> data became invalid (slowly over time, of course), resulting in corrupt
>> images. Although backups were in place (with checksums to verify
>> everything), the slow degradation made it extremely difficult to find the
>> corrupt images. The only way was to read backups from various stages and
>> compare checksums and last modification dates. I've read a lot about
>> data integrity, and since OpenCms stores all the binary data in the DB, I
>> think it might be worth adding additional features to the database
>> structure.
>>
>> I would suggest adding at least a field FILE_CONTENT_HASH to the
>> CMS_CONTENTS table, which is filled in during file writes and updates.
>> The field could be NULLable, with NULL indicating that no checksum is
>> available. This would also allow disabling checksum generation in favor
>> of write performance. Maybe we could implement a hook in the driver
>> structure to perform validations on read (using a Java interface).
>> Additional checks could be performed using a scheduled task or custom
>> modules. Eventually it would be nice to have the checksum available in
>> the CmsFile objects, but I don't think this is a requirement for a first
>> step. I don't know if this should also be applied to properties. More
>> security is of course always good, but I would really like to keep the
>> changes to a minimum at first.
>>
>> What's your take on it?
>>
>> Best regards,
>> Sebastian
>>
>>
>>
>>
>> _______________________________________________
>> This mail is sent to you from the opencms-dev mailing list
>> To change your list options, or to unsubscribe from the list, please visit
>> http://lists.opencms.org/mailman/listinfo/opencms-dev



