[opencms-dev] Adding checksums to OpenCms 8

Sebastian Himberger sebastian.himberger at gmx.de
Tue Sep 29 19:39:49 CEST 2009


Hi List,

I recently had a customer who stored about 10000 JPEGs inside OpenCms
(with MySQL). Due to hard disk degradation in a RAID 1 array, some of
the data became invalid (slowly, over time of course), resulting in
corrupt images. Although backups were in place (with checksums to verify
everything), the slow degradation made it extremely difficult to find
the corrupt images. The only way was to read backups from various stages
and compare checksums and last modification dates. I've read a lot
about data integrity, and since OpenCms stores all binary data in the
DB, I think it would be worth adding integrity checking to the database
structure.

I would suggest adding at least a FILE_CONTENT_HASH field to the
CMS_CONTENTS table, which is filled in on every file write and update.
The field could be NULLable, indicating that no checksum is available.
This would also make it possible to disable checksum generation in
favor of write performance. Maybe we could implement a hook in the
driver structure to perform validations on read (using a Java
interface). Additional checks could be performed by a scheduled task or
custom modules. Eventually it would be nice to have the checksum
available in the CmsFile objects, but I don't think this is a
requirement for a first step. I don't know whether this should also be
applied to properties. More security is of course always good, but I
would really like to keep the changes to a minimum at first.
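
A rough Java sketch of the proposed write-time hash plus read hook could
look like the following. It is only an illustration under assumptions,
not existing OpenCms code: ContentValidator, Sha256ContentValidator and
CorruptContentException are hypothetical names, SHA-256 is just one
possible digest, and the FILE_CONTENT_HASH value is modelled as a hex
string handed in by the caller rather than read through the real driver.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * HYPOTHETICAL read hook for the VFS driver; not part of the OpenCms API.
 * A driver would call it after loading FILE_CONTENT from CMS_CONTENTS.
 */
interface ContentValidator {
    /**
     * @param resourceId identifier used only for the error message
     * @param content    the bytes just read from the database
     * @param storedHash value of the proposed FILE_CONTENT_HASH column, or null if none was stored
     * @throws CorruptContentException if the content no longer matches the stored hash
     */
    void validate(String resourceId, byte[] content, String storedHash);
}

/** Hypothetical unchecked exception signalling that stored content failed verification. */
class CorruptContentException extends RuntimeException {
    CorruptContentException(String message) {
        super(message);
    }
}

/** Example validator using SHA-256 (an assumed choice; any strong digest would do). */
final class Sha256ContentValidator implements ContentValidator {

    /** Hex-encoded SHA-256 of the content, i.e. what would be written to FILE_CONTENT_HASH. */
    static String hash(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to provide SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void validate(String resourceId, byte[] content, String storedHash) {
        if (storedHash == null) {
            // NULL column: checksum generation was disabled for this write, nothing to verify.
            return;
        }
        if (!storedHash.equalsIgnoreCase(hash(content))) {
            throw new CorruptContentException("checksum mismatch for " + resourceId);
        }
    }
}

/** Small demo: hash on write, verify on read, detect a single flipped bit. */
class ChecksumDemo {
    public static void main(String[] args) {
        byte[] image = "stand-in for JPEG bytes".getBytes(StandardCharsets.UTF_8);
        String storedHash = Sha256ContentValidator.hash(image); // computed on write

        ContentValidator validator = new Sha256ContentValidator();
        validator.validate("image-1", image, storedHash); // intact content passes silently

        byte[] degraded = image.clone();
        degraded[0] ^= 0x01; // simulate silent disk corruption of one bit
        try {
            validator.validate("image-1", degraded, storedHash);
            System.out.println("corruption not detected");
        } catch (CorruptContentException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

Keeping the hook as a plain interface means the same validate() call
could be reused by a scheduled task that sweeps all rows periodically,
which is the kind of check that could catch slow degradation before it
spreads across several backup generations.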

What's your take on it?

Best regards,
Sebastian
