[opencms-dev] Adding checksums to OpenCms 8
Christian Steinert
christian_steinert at web.de
Wed Sep 30 23:51:56 CEST 2009
Sebastian Himberger wrote:
> Hi Christian,
>
> that's a good idea. Maybe something like a checkIntegrity(I_CmsReport)
> method or something along those lines.
>
I like Andreas' comment on this. He is right: if checking is enabled, it
should be enough to just read each file, and if something is wrong, an
appropriate special kind of CmsException could be thrown.
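
To make the read-time check concrete, here is a minimal sketch in plain
Java. It is not OpenCms code: ContentIntegrityException, ContentChecksum
and verifyOnRead are made-up names, SHA-256 is just one possible
algorithm, and the exception is unchecked only to keep the example short.
A real implementation would presumably live in the driver layer and throw
a subclass of the existing exception type.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical exception type for this sketch; not part of the OpenCms API. */
class ContentIntegrityException extends RuntimeException {
    ContentIntegrityException(String message) {
        super(message);
    }
}

/** Hypothetical helper illustrating verify-on-read; all names are assumptions. */
final class ContentChecksum {

    private ContentChecksum() {
    }

    /** Returns the SHA-256 digest of the content as a lowercase hex string. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns the content unchanged if it matches the stored hash.
     * A null stored hash means "no checksum recorded" and skips the check.
     */
    static byte[] verifyOnRead(String path, byte[] content, String storedHash) {
        if (storedHash != null && !storedHash.equals(sha256Hex(content))) {
            throw new ContentIntegrityException("Checksum mismatch for " + path);
        }
        return content;
    }
}
```

With something like this in the read path, a periodic integrity check
would just be a loop that reads every file and reports the paths for which
the exception is thrown.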
> I'm not sure about the properties and other metadata. If we use nullable
> fields it would of course be possible to make this configurable.
I think nullable fields are best for this, since not everybody may want
this kind of validation.
It would be great if the upgrade wizard or a small separate tool could
generate checksums for existing content.
Of course, an admin could already touch all resources through the
explorer with the present tools and cause the content to be rewritten,
which would then recalculate the checksums, but this would mess up the
last-modified dates of all resources.
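
A separate backfill tool could be quite small. The following is only a
sketch over an in-memory stand-in for the content rows: ContentRow and
ChecksumBackfill are hypothetical names, SHA-256 is an assumed algorithm,
and a real tool would go through the database driver instead of a List.
The point is that it only fills in missing hashes and never writes the
modification date.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical in-memory stand-in for one content row; not an OpenCms class. */
final class ContentRow {
    final byte[] content;
    final long dateLastModified;
    String contentHash; // null means "no checksum recorded yet"

    ContentRow(byte[] content, long dateLastModified, String contentHash) {
        this.content = content;
        this.dateLastModified = dateLastModified;
        this.contentHash = contentHash;
    }
}

/** Hypothetical backfill helper; all names are assumptions for illustration. */
final class ChecksumBackfill {

    private ChecksumBackfill() {
    }

    /** Returns the SHA-256 digest of the content as a lowercase hex string. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    /**
     * Computes a hash for every row that has none and returns how many rows
     * were updated. Existing hashes and modification dates are left alone.
     */
    static int backfill(List<ContentRow> rows) {
        int updated = 0;
        for (ContentRow row : rows) {
            if (row.contentHash == null) {
                row.contentHash = sha256Hex(row.content);
                updated++;
            }
        }
        return updated;
    }
}
```

One caveat: a backfill records whatever bytes are currently stored, so
content that is already corrupt would get a "valid" checksum. It only
protects against degradation from that point on.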
> I don't
> think we need another integrity mechanism for the structure but
> properties might be interesting. Although I think adding it to the
> content would already provide a huge improvement.
>
Agreed. This also depends a little on the direction in which OpenCms is
going. There seems to be a slight push towards XML content and away from
properties, although properties will probably be around for a long time,
maybe forever. Checksums on VFS structures might really be the wrong
thing to even attempt at the high level of abstraction at which OpenCms
uses its storage, so whoever wants this level of integrity would need to
use a sufficiently capable DB underneath.
Best Regards
Christian
>
> Christian Steinert schrieb:
>
>> Generally this sounds like a very good idea, and I agree that it seems
>> best to add this at the driver level, although higher layers should have
>> some way of requesting checksum validity information (not necessarily
>> the checksum values themselves, since this is maybe too much of an
>> implementation detail and, for example, the checksum algorithm might
>> change over time).
>>
>> But would the checksums be restricted to file content, or are there also
>> considerations to add checksums for properties and/or for general file
>> system structures? I find it hard to estimate the possible performance
>> impact of checksumming this kind of information too, so I don't know
>> whether that is a good idea. Some OS filesystems are of course capable
>> of doing checksums on all metadata as well, but they have highly optimized
>> data structures and I/O behavior, which might be impossible to achieve
>> when sitting inside a Java VM and servlet container and on top of
>> various different databases.
>> Nonetheless, I at least wanted to raise the point.
>>
>> Best Regards
>> Christian
>>
>>
>>> Hi List,
>>>
>>> I recently had a customer who stored about 10000 JPEGs inside OpenCms
>>> (with MySQL). Due to hard disk degradation in a RAID1-Array some of the
>>> data became invalid (slowly over time of course) resulting in corrupt
>>> images. Although backups were in place (with checksums to verify
>>> everything) the slow degradation made it extremely difficult to find the
>>> corrupt images. The only way was to read backups from various stages and
>>> compare checksums and last modification dates. I've read a lot about
>>> data integrity and since OpenCms stores all the binary data in the DB I
>>> think it might be worth it to add additional features to the database
>>> structure.
>>>
>>> I would suggest adding at least a field FILE_CONTENT_HASH to the
>>> CMS_CONTENTS table which is filled in during file writes and updates.
>>> The field could be NULLable, indicating that no checksum is available.
>>> This would also allow checksum generation to be disabled in favor of
>>> write performance. Maybe we could implement a hook in the driver
>>> structure to perform validations on read (using a Java interface).
>>> Additional checks could be performed using a scheduled task or custom
>>> modules. Eventually it would be nice to have the checksum available in
>>> the CmsFile objects, but I don't think this is a requirement for a first
>>> step. I don't know if this should also be applied to properties. More
>>> security is of course always good, but I really would want to keep the
>>> changes to a minimum at first.
>>>
>>> What's your take on it?
>>>
>>> Best regards,
>>> Sebastian
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> This mail is sent to you from the opencms-dev mailing list
>>> To change your list options, or to unsubscribe from the list, please visit
>>> http://lists.opencms.org/mailman/listinfo/opencms-dev