[opencms-dev] Adding checksums to OpenCms 8
Christian Steinert
christian_steinert at web.de
Thu Oct 1 18:06:28 CEST 2009
Sebastian Himberger wrote:
> Hi,
>
> yes I think Andreas' idea is a good one!
One thing though - there should be some way of recovering even a
semi-broken file in case of emergency. De-activating checksums would be
a possibility, but this means that checksums should only be verified
if checking is enabled, even if checksums were stored in the DB at
some point before.
Otherwise, if some files are semi-corrupted, there is no way to recover
at least what is still left of them, because trying to read them will
result in an exception :)
But for such rare emergencies it should be completely fine to just
disable the checksumming and then try to retrieve the files again.
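
To make the behaviour described above concrete, here is a minimal Java
sketch. It is only an illustration under stated assumptions, not OpenCms
code: the class ChecksumSketch, the exception ChecksumMismatchException,
the checkingEnabled flag and the choice of SHA-256 are all hypothetical.
A null stored hash stands for a NULL DB field, i.e. "no checksum available".

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; none of these names exist in OpenCms.
final class ChecksumSketch {

    /** Hypothetical exception signalling corrupted content. */
    static class ChecksumMismatchException extends RuntimeException {
        ChecksumMismatchException(String message) {
            super(message);
        }
    }

    /** Returns the hex-encoded SHA-256 digest (the algorithm is an assumption). */
    static String hash(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Verifies the content only if checking is enabled AND a checksum was stored.
     * With checking disabled, even damaged content is handed back to the caller.
     */
    static byte[] read(byte[] content, String storedHash, boolean checkingEnabled) {
        if (checkingEnabled && storedHash != null && !storedHash.equals(hash(content))) {
            throw new ChecksumMismatchException("stored checksum does not match content");
        }
        return content;
    }

    public static void main(String[] args) {
        byte[] original = "some file content".getBytes(StandardCharsets.UTF_8);
        String stored = hash(original); // would be computed when the file is written

        // Simulate slow degradation: one byte differs from what was written.
        byte[] damaged = original.clone();
        damaged[0] ^= 0x01;

        try {
            read(damaged, stored, true);
        } catch (ChecksumMismatchException e) {
            System.out.println("checking enabled: " + e.getMessage());
        }

        // Emergency recovery: checking is switched off, the stored hash is ignored.
        byte[] recovered = read(damaged, stored, false);
        System.out.println("checking disabled: got " + recovered.length + " bytes back");
    }
}
```

The point of the sketch is that the stored hash alone never triggers a
failure; the switch decides whether verification happens at all.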
Best Regards
Christian
> I also think there should be a
> small tool to add checksums to all resources which don't have one yet.
>
> All the best,
> Sebastian
>
> Christian Steinert schrieb:
>
>> Sebastian Himberger wrote:
>>
>>> Hi Christian,
>>>
>>> that's a good idea. Maybe something like a checkIntegrity(I_CmsReport)
>>> or something like this.
>>>
>>>
>> I like Andreas' comment on this - he is right, if checking is enabled
>> it should be enough to just read each file and if something is wrong,
>> then an appropriate special kind of CMSException could be fired.
>>
>>> I don't know regarding the properties and other metadata. If we use NULL
>>> fields it would of course be possible to make this configurable.
>>>
>> I think NULL fields are best for this since not everybody may want
>> this kind of validation.
>>
>> It would be great if the upgrade wizard or a small separate tool could
>> generate checksums for existing content.
>> Of course, an Admin could - already with the present tools - also
>> touch all resources through the explorer and cause the content to be
>> re-written, which would then re-calculate the checksums, but this
>> would mess up the update dates for all resources.
>>
>>
>>> I don't
>>> think we need another integrity mechanism for the structure but
>>> properties might be interesting. Although I think adding it to the
>>> content would already provide a huge improvement.
>>>
>>>
>> Agreed. This also depends a little on the direction in which OpenCms
>> is going. It seems that there is a slight push towards getting more
>> into xml content and away from properties, although properties will
>> probably be around for a long time and maybe forever. Checksums on VFS
>> structures might really be the wrong thing to even try at the high
>> level of abstraction at which OpenCms is using its storage, so that
>> whoever wants this level of integrity would need to use a sufficiently
>> capable DB underneath.
>>
>> Best Regards
>> Christian
>>
>>> Christian Steinert schrieb:
>>>
>>>
>>>> Generally this sounds like a very good idea and I agree that it seems
>>>> best to add this at the driver level although higher layers should have
>>>> some way of requesting checksum validity information (not necessarily
>>>> the checksums values themselves, since this is maybe too much of an
>>>> implementation detail and, for example, the checksum algorithm might
>>>> change over time).
>>>>
>>>> But would the checksums be restricted to file content or are there also
>>>> considerations to add checksums for properties and/or to general file
>>>> system structures? I find it hard to estimate the possible performance
>>>> impact of checksumming this kind of information, too, so I don't know
>>>> whether that is a good idea. Some OS-filesystems are of course capable
>>>> of doing checksums on all metadata as well, but they have very optimized
>>>> data structures and I/O behavior which might be impossible to do when
>>>> sitting inside of a Java VM+Servlet Container and on top of various
>>>> different databases.
>>>> Nonetheless, I at least wanted to raise the point.
>>>>
>>>> Best Regards
>>>> Christian
>>>>
>>>>
>>>>
>>>>> Hi List,
>>>>>
>>>>> I recently had a customer who stored about 10000 JPEGs inside OpenCms
>>>>> (with MySQL). Due to hard disk degradation in a RAID1-Array some of the
>>>>> data became invalid (slowly over time of course) resulting in corrupt
>>>>> images. Although backups were in place (with checksums to verify
>>>>> everything) the slow degradation made it extremely difficult to find the
>>>>> corrupt images. The only way was to read backups from various stages and
>>>>> compare checksums and last modification dates. I've read a lot about
>>>>> data integrity and since OpenCms stores all the binary data in the DB I
>>>>> think it might be worth it to add additional features to the database
>>>>> structure.
>>>>>
>>>>> I would suggest adding at least a field FILE_CONTENT_HASH to the
>>>>> CMS_CONTENTS table which is filled in during file writes and updates.
>>>>> The field could be NULLable indicating that no checksum is available.
>>>>> This would also allow disabling checksum generation in favor of
>>>>> write performance. Maybe we could implement a hook in the driver
>>>>> structure to perform validations on read (using a Java interface).
>>>>> Additional checks could be performed using a scheduled task or custom
>>>>> modules. Eventually it would be nice to have the checksum available in
>>>>> the CmsFile objects but I don't think this is a requirement for a first
>>>>> step. I don't know if this should also be applied to properties. More
>>>>> security is of course always good but I really would want to keep the
>>>>> changes to a minimum at first.
>>>>>
>>>>> What's your take on it?
>>>>>
>>>>> Best regards,
>>>>> Sebastian
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> This mail is sent to you from the opencms-dev mailing list
>>>>> To change your list options, or to unsubscribe from the list, please visit
>>>>> http://lists.opencms.org/mailman/listinfo/opencms-dev