[opencms-dev] Adding checksums to OpenCms 8

Christian Steinert christian_steinert at web.de
Thu Oct 1 18:06:28 CEST 2009


Sebastian Himberger wrote:
> Hi,
>
> yes I think Andreas' idea is a good one! 
One thing though - there should be some way of recovering even a 
semi-broken file in case of emergency. De-activating checksums would be 
a possibility, but this means that checksums should only be checked 
if checking is enabled - even if checksums were stored in the DB 
at some point before.

Otherwise, if some files are semi-corrupted, there is no way to recover 
at least what is still left of them, because trying to read them will 
result in an exception  :)

But for such rare emergencies it should be completely fine to just 
disable the checksumming and then try to retrieve the files again.
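
To make the rule concrete, here is a minimal sketch of the read path I have 
in mind. Everything in it is an assumption for illustration: the class 
ChecksumSketch, the exception ChecksumMismatchException and the 
checkingEnabled flag are hypothetical names and not existing OpenCms API, 
and SHA-256 is just one possible algorithm. A null stored hash stands for 
the NULL field discussed further down in the thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch only; none of these names exist in OpenCms.
final class ChecksumSketch {

    /** Hypothetical exception signalling that content and stored checksum differ. */
    public static class ChecksumMismatchException extends RuntimeException {
        public ChecksumMismatchException(String message) {
            super(message);
        }
    }

    /** Returns the hex-encoded SHA-256 digest of the content (algorithm is an assumption). */
    public static String hash(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns the content as read. The checksum is verified only if checking
     * is enabled AND a checksum was stored (storedHash != null).
     */
    public static byte[] read(byte[] content, String storedHash, boolean checkingEnabled) {
        if (checkingEnabled && storedHash != null && !storedHash.equals(hash(content))) {
            throw new ChecksumMismatchException("stored checksum does not match content");
        }
        return content;
    }

    public static void main(String[] args) {
        byte[] original = "image data".getBytes(StandardCharsets.UTF_8);
        String stored = hash(original);
        byte[] damaged = "image dat?".getBytes(StandardCharsets.UTF_8);

        // With checking disabled the damaged bytes can still be retrieved.
        System.out.println(read(damaged, stored, false).length + " bytes recovered");
    }
}
```

With checkingEnabled set to false, the damaged content is handed back as-is 
even though a checksum is stored, which is exactly the emergency case above.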

Best Regards
Christian

> I also think there should be a
> small tool to add checksums to all resources which don't have one yet.
>
> All the best,
> Sebastian
>
> Christian Steinert schrieb:
>   
>> Sebastian Himberger wrote:
>>     
>>> Hi Christian,
>>>
>>> that's a good idea. Maybe something like a checkIntegrity(I_CmsReport)
>>> method or something like this.
>>>   
>>>       
>> I like Andreas' comment on this - he is right: if checking is enabled,
>> it should be enough to just read each file, and if something is wrong,
>> an appropriate special kind of CmsException could be thrown.
>>     
>>> I don't know regarding the properties and other metadata. If we use NULL
>>> fields it would of course be possible to make this configurable.
>>>       
>> I think NULL fields are best for this since not everybody may want
>> this kind of validation.
>>
>> It would be great if the upgrade wizard or a small separate tool could
>> generate checksums for existing content. 
>> Of course, an admin could - already with the present tools - also
>> touch all resources through the explorer and cause the content to be
>> re-written, which would then re-calculate the checksums, but this
>> would mess up the update dates for all resources.
>>
>>     
>>>  I don't
>>> think we need another integrity mechanism for the structure but
>>> properties might be interesting. Although I think adding it to the
>>> content would already provide a huge improvement.
>>>   
>>>       
>> Agreed. This also depends a little on the direction in which OpenCms
>> is going. There seems to be a slight push towards XML content
>> and away from properties, although properties will
>> probably be around for a long time and maybe forever. Checksums on VFS
>> structures might really be the wrong thing to even try at the high
>> level of abstraction at which OpenCms uses its storage, so
>> whoever wants this level of integrity would need to use a sufficiently
>> capable DB underneath.
>>
>> Best Regards
>> Christian
>>     
>>> Christian Steinert schrieb:
>>>   
>>>       
>>>> Generally this sounds like a very good idea and I agree that it seems 
>>>> best to add this at the driver level, although higher layers should have 
>>>> some way of requesting checksum validity information (not necessarily 
>>>> the checksum values themselves, since this is maybe too much of an 
>>>> implementation detail and, for example, the checksum algorithm might 
>>>> change over time).
>>>>
>>>> But would the checksums be restricted to file content, or are there also 
>>>> plans to add checksums for properties and/or general file 
>>>> system structures? I find it hard to estimate the possible performance 
>>>> impact of checksumming this kind of information, too, so I don't know 
>>>> whether that is a good idea. Some OS-filesystems are of course capable 
>>>> of doing checksums on all metadata as well, but they have very optimized 
>>>> data structures and I/O behavior which might be impossible to do when 
>>>> sitting inside of a Java VM+Servlet Container and on top of various 
>>>> different databases.
>>>> Nonetheless, I at least wanted to raise the point.
>>>>
>>>> Best Regards
>>>> Christian
>>>>   
>>>>     
>>>>         
>>>>> Hi List,
>>>>>
>>>>> I recently had a customer who stored about 10000 JPEGs inside OpenCms
>>>>> (with MySQL). Due to hard disk degradation in a RAID1-Array some of the
>>>>> data became invalid (slowly over time of course) resulting in corrupt
>>>>> images. Although backups were in place (with checksums to verify
>>>>> everything) the slow degradation made it extremely difficult to find the
>>>>> corrupt images. The only way was to read backups from various stages and
>>>>> compare checksums and last modification dates. I've read a lot about
>>>>> data integrity and since OpenCms stores all the binary data in the DB I
>>>>> think it might be worth it to add additional features to the database
>>>>> structure.
>>>>>
>>>>> I would suggest adding at least a field FILE_CONTENT_HASH to the
>>>>> CMS_CONTENTS table which is filled in during file writes and updates.
>>>>> The field could be NULLable indicating that no checksum is available.
>>>>> This would also allow disabling checksum generation in favor of
>>>>> write performance. Maybe we could implement a hook in the driver
>>>>> structure to perform validations on read (using a Java interface).
>>>>> Additional checks could be performed using a scheduled task or custom
>>>>> modules. Eventually it would be nice to have the checksum available in
>>>>> the CmsFile objects but I don't think this is a requirement for a first
>>>>> step. I don't know if this should also be applied to properties. More
>>>>> security is of course always good but I really would want to keep the
>>>>> changes to a minimum at first.
>>>>>
>>>>> What's your take on it?
>>>>>
>>>>> Best regards,
>>>>> Sebastian
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> This mail is sent to you from the opencms-dev mailing list
>>>>> To change your list options, or to unsubscribe from the list, please visit
>>>>> http://lists.opencms.org/mailman/listinfo/opencms-dev
