SemCrypt - Semantic-based Encrypted XML Document Processing

Project duration: January 2005 - June 2007
Funded by: Bundesministerium für Verkehr, Innovation und Technologie (FIT-IT)

Project partners
Vienna University of Technology
EC3 Networks GmbH

Project team
Katharina Grün (DKE)
Michael Schrefl (DKE)
Michael Karlinger (DKE)
Georg Nitsche (DKE)

Publications

M. Schrefl, J. Dorn, K. Grün:
SemCrypt - Ensuring Privacy of Electronic Documents through Semantic-based Encrypted Query Processing
In: Karl Aberer, Michael J. Franklin, Shojiro Nishio (Eds.): Proceedings of the International Workshop on Privacy Data Management (PDM 2005), in conjunction with the 21st International Conference on Data Engineering (ICDE 2005), Tokyo, Japan, April 8-9, 2005, IEEE Computer Society Press, 10 pages, ISBN 0-7695-2285-8, p. 1191, 2005.

K. Grün, M. Karlinger, M. Schrefl:
Schema-aware Labelling of XML Documents for Efficient Query and Update Processing in SemCrypt
In: International Journal of Computer Systems: Science & Engineering, Vol. 21, No. 1, January 2006, CRL Publishing Ltd., ISSN 0267-6192, pp. 65-82, 2006.

K. Grün:
A Generic Framework for Querying and Updating Secondary XML Index Structures
In: Proceedings of the SIGMOD 2007 Ph.D. Workshop on Innovative Database Research (IDAR 2007), Beijing, China, June 10, 2007, pp. 27-32, 2007.

K. Grün, M. Schrefl:
Exploiting the Structure of Update Fragments for Efficient XML Index Maintenance
In: Guozhu Dong, Xuemin Lin, Wei Wang, Yun Yang, Jeffrey Xu Yu (Eds.): Advances in Data and Web Management, Proceedings of the Joint 9th Asia-Pacific Web Conference (APWeb 2007) and the 8th International Conference on Web-Age Information Management (WAIM 2007), HuangShan, China, June 16-18, 2007, Springer Verlag Deutschland, Lecture Notes in Computer Science (LNCS) series, Vol. 4505, ISBN 978-3-540-72483-4, pp. 471-478, 2007.

K. Grün, M. Schrefl:
Extensible Indexing in XML Databases
Institute report 08.01, August 2008.

K. Grün, M. Karlinger, M. Schrefl:
SemCrypt - Secure XML Processing in Outsourced Databases
Institute report 08.02, September 2008.

W. Dorninger:
Securing Remote Data Stores - Design and Implementation of an Encrypted Data Store
(Master Thesis, 2005)
Diploma thesis, supervisor: o. Univ.-Prof. Dr. Michael Schrefl, under the guidance of Mag. Katharina Grün and Mag. Michael Karlinger, carried out at the University of Linz, Institute of Business Informatics - Data & Knowledge Engineering, December 2005.

P. Lasinger:
Indexing Encrypted XML Documents in the SemCrypt Database Management System
(Master Thesis, 2006)
Diploma thesis, supervisor: o. Univ.-Prof. Dr. Michael Schrefl, under the guidance of Mag. Katharina Grün and Mag. Michael Karlinger, carried out at the University of Linz, Institute of Business Informatics - Data & Knowledge Engineering, July 2006.

K. Grün:
Flexible and Selective Indexing in XML Databases
(PhD Thesis, 2008)

Motivation

Outsourcing IT services to external Service Providers is a growing market and a popular alternative to maintaining services in-house. By specializing in particular services, Service Providers can increase the quality and decrease the costs of their services. An important IT service is providing and administering a data store in which individuals or companies can file documents and then query and update them without having to worry about, e.g., IT infrastructure, availability of data stores or back-ups. When sensitive data is stored at such an external data store, the so-called Storage Provider needs to ensure that neither intruders nor its own staff can access the data. Currently, service level agreements are ultimately the only way to guarantee data privacy in this storage model. The data owner thus needs to trust the Storage Provider to fulfill the contract and not to misuse the data.

Description

Project SemCrypt explores techniques for building a secure external data store that allows the stored documents to be queried and updated efficiently. SemCrypt protects data from unauthorized access at the Storage Provider by storing only encrypted documents. Encryption and decryption are not performed at the external data store, but within a trusted domain of the data owner. In order to query and update the encrypted documents, SemCrypt makes use of the properties of XML, which is the required data format of the documents. XML documents contain not only data but also information about the data structure. To process queries and updates, new techniques have been developed which utilize these structural semantics and combine them with special access structures. These techniques enable direct access to encrypted data without a time-consuming decryption of the whole data store.

Challenges

Designing a secure XML database system (DBS) poses several challenges.
To ensure data privacy and prevent security risks, the system must guarantee both storage and communication security. The physical storage structure must reveal neither the document content nor the document structure to the Storage Provider. To be widely applicable, the DBS should not depend on specific encryption techniques. Repeatedly encrypting the same plaintext fragment or markup must yield different ciphertexts in order to prevent statistical attacks. Regarding client-server communication, repeated transmission of the same data has to be avoided, again in order to prevent statistical attacks. The DBS needs to support querying and updating of both the content and the structure of documents, i.e. navigating within documents and constraining values and types. Finally, regarding the system's overall performance, the data volume transferred from the server to answer queries has to be minimized.

Achievements

SemCrypt tackles the above-mentioned challenges as follows. The physical storage structure guarantees data privacy and security through encryption. To identify encrypted fragments, query and update processing techniques exploit the structural semantics of XML, i.e. schema information, which is captured by the document structure. Document processing is based on a schema-aware labeling scheme which not only identifies each node of a document by a unique node label, but also allows many operations to be executed directly on node labels without accessing encrypted data. To efficiently process queries that constrain node values, SemCrypt provides an index framework that supports indexing the content and structure of arbitrary document fragments (cf. project SCIENS). Query processing is based on a query algebra that enables query optimization and index selection. Prior work on XML processing presents isolated solutions for these problems, which cannot easily be combined due to conflicting assumptions.
SemCrypt not only extends these techniques, but also integrates them into a multi-layered architecture.

Application Scenarios

The developed techniques allow SemCrypt to be adapted to diverse application scenarios. Individuals can store private documents at external data stores. Companies can outsource their data stores and share sensitive documents, e.g., for eGovernment, eFinance or Human Resource Management applications. When different people access the same documents, additional techniques are necessary to guarantee that each person can only access the document fragments to which he or she is entitled. Developing and integrating such techniques is an integral part of the project.

By providing secure data stores in different application scenarios, SemCrypt will increase the popularity of outsourcing data stores to external Storage Providers. In particular, small and medium-sized businesses will profit from outsourcing the cost- and resource-intensive administration of data stores without putting the privacy of their data at risk.
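
Illustration

The idea that many operations can be executed directly on node labels, without touching encrypted data, can be illustrated with a small sketch. The sketch below does not reproduce SemCrypt's schema-aware labeling scheme, whose details are not given on this page. It uses plain Dewey-style labels (the label of a node is the label of its parent plus the node's position among its siblings) as a stand-in. All function names, the label format and the store layout are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# A Dewey-style label: (1, 2, 1) stands for "1.2.1".
# Assumption: this is NOT SemCrypt's schema-aware scheme, only a stand-in.
Label = Tuple[int, ...]


def parse_label(text: str) -> Label:
    """Turn the string '1.2.1' into the tuple (1, 2, 1)."""
    return tuple(int(part) for part in text.split("."))


def format_label(label: Label) -> str:
    """Turn the tuple (1, 2, 1) back into the string '1.2.1'."""
    return ".".join(str(part) for part in label)


def is_ancestor(ancestor: Label, node: Label) -> bool:
    """An ancestor's label is a proper prefix of the node's label."""
    return len(ancestor) < len(node) and node[:len(ancestor)] == ancestor


def is_parent(parent: Label, node: Label) -> bool:
    """A parent's label is the node's label without its last component."""
    return len(node) == len(parent) + 1 and node[:-1] == parent


def in_document_order(labels: List[Label]) -> List[Label]:
    """Tuple comparison places a prefix before its extensions,
    which matches document (preorder) order for Dewey labels."""
    return sorted(labels)


def descendants(labels: List[Label], root: Label) -> List[Label]:
    """Select all descendants of `root` by comparing labels only."""
    return in_document_order([l for l in labels if is_ancestor(root, l)])


# Hypothetical store: label -> opaque encrypted fragment.
# The byte strings are placeholders; nothing is decrypted in this sketch.
encrypted_store: Dict[Label, bytes] = {
    parse_label("1"): b"<ciphertext>",
    parse_label("1.1"): b"<ciphertext>",
    parse_label("1.2"): b"<ciphertext>",
    parse_label("1.2.1"): b"<ciphertext>",
    parse_label("1.2.2"): b"<ciphertext>",
    parse_label("1.3"): b"<ciphertext>",
}

if __name__ == "__main__":
    wanted = descendants(list(encrypted_store), parse_label("1.2"))
    # Only the fragments behind these labels would need to be fetched.
    print([format_label(label) for label in wanted])  # ['1.2.1', '1.2.2']
```

In this sketch the structural part of a query (here, the descendants of node 1.2) is answered from labels alone, so only the matching encrypted fragments would have to be transferred and decrypted within the trusted domain.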