Database Encryption
Luc Bouganim, INRIA Rocquencourt, Le Chesnay, FRANCE, Luc.Bouganim@inria.fr
Yanli GUO, INRIA Rocquencourt, Le Chesnay, FRANCE, yanli.guo@inria.fr
Encryption Algorithm and Mode of Operation
Independently of the encryption strategy, the security of the encrypted data depends on the encryption algorithm, the encryption key size and the protection of the key. Even with a strong algorithm such as AES, the cipher text can still disclose plain text information if an inappropriate mode of operation is chosen. For example, if the encryption algorithm is used in electronic codebook (ECB) mode, identical plain text blocks are encrypted into identical cipher text blocks, thus disclosing repetitive patterns. In the database context, repetitive patterns are common since many records may have the same attribute values, so much care should be taken when choosing the encryption mode. Moreover, simple solutions that work in other contexts (e.g., using counter mode with an initialization vector based on the data address) may fail in the database context because data can be updated: in the previous example, performing an exclusive OR between the old and new versions of the encrypted data discloses the exclusive OR between the old and new versions of the plain text data. All the specificities of the database context should be taken into account to guide the choice of an adequate encryption algorithm and mode of operation: repetitive patterns, updates, and huge volumes of encrypted data. Moreover, the protection should be strong enough since the data may remain valid for a very long time (several years). Thus, state-of-the-art encryption algorithms and modes of operation should be used, without any concession.
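The update problem with an address-based initialization vector can be illustrated with a minimal sketch. It is not an implementation of AES in counter mode: HMAC-SHA256 is used here as a stand-in pseudo-random function so that the example needs only the Python standard library, and the names keystream, encrypt_at, the key and the address are hypothetical. The point is only that two versions of a value encrypted at the same address share the same keystream.

```python
import hashlib
import hmac


def keystream(key: bytes, address: int, length: int) -> bytes:
    """Toy counter-mode keystream: PRF(key, address || counter), block by block.

    HMAC-SHA256 stands in for the block cipher (an assumption of this sketch).
    """
    out = b""
    counter = 0
    while len(out) < length:
        block_input = address.to_bytes(8, "big") + counter.to_bytes(8, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive OR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_at(key: bytes, address: int, plaintext: bytes) -> bytes:
    # The IV depends only on where the data is stored, never on its version.
    return xor(plaintext, keystream(key, address, len(plaintext)))


key = b"k" * 32                  # hypothetical key, fixed for the demo
address = 4096                   # hypothetical storage address of the value
old_plain = b"salary=030000"
new_plain = b"salary=045000"     # same length, written at the same address

old_cipher = encrypt_at(key, address, old_plain)
new_cipher = encrypt_at(key, address, new_plain)

# An observer holding both cipher text versions cancels the keystream out.
leak = xor(old_cipher, new_cipher)
print(leak == xor(old_plain, new_plain))
```

An attacker who sees the stored value before and after the update therefore learns which bytes changed and, if one version is known or guessable, the other version as well.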
Key Management
Key management refers to the way cryptographic keys are generated and managed throughout their life. Because cryptography is based on keys that encrypt and decrypt data, the database protection solution is only as good as the protection of the keys. The location of encryption keys and their access restrictions are thus particularly important. Since the problem is quite independent of the encryption level, the following text assumes database-level encryption.
For database-level encryption, an easy solution is to store the keys in a restricted database table or file, potentially encrypted by a master key (itself stored somewhere on the database server). But all administrators with privileged access could also access these keys and decrypt any data within the system without ever being detected.
To overcome this problem, specialized tamper-resistant cryptographic chipsets, called hardware security modules (HSM), can be used to provide secure storage for encryption keys [14][16]. Generally, the encryption keys are stored on the server encrypted by a master key which is stored in the HSM. At encryption/decryption time, the encrypted keys are dynamically decrypted by the HSM (using the master key) and removed from the server memory as soon as the cryptographic operations are performed, as shown in Figure 2.a.
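The key hierarchy described above (data keys stored encrypted on the server, master key kept inside the HSM) can be sketched as follows. This is a simulation under stated assumptions, not a real HSM interface: ToyHSM, wrap, unwrap, encrypt_value and decrypt_value are hypothetical names, and a toy HMAC-based stream cipher replaces the key-wrapping and encryption algorithms a real product would use.

```python
import hashlib
import hmac
import secrets


def _xor_stream(key, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (HMAC-SHA256 in counter mode); a stand-in, not AES."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = nonce + counter.to_bytes(4, "big")
        stream += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class ToyHSM:
    """Simulated HSM: the master key is created inside and has no getter."""

    def __init__(self) -> None:
        self.__master_key = secrets.token_bytes(32)

    def wrap(self, data_key: bytes) -> bytes:
        nonce = secrets.token_bytes(16)
        return nonce + _xor_stream(self.__master_key, nonce, data_key)

    def unwrap(self, wrapped: bytes) -> bytes:
        nonce, body = wrapped[:16], wrapped[16:]
        return _xor_stream(self.__master_key, nonce, body)


def encrypt_value(hsm: ToyHSM, wrapped_key: bytes, plaintext: bytes) -> bytes:
    key = bytearray(hsm.unwrap(wrapped_key))  # clear key is in server RAM here
    try:
        nonce = secrets.token_bytes(16)
        return nonce + _xor_stream(key, nonce, plaintext)
    finally:
        for i in range(len(key)):             # best-effort erase after use
            key[i] = 0


def decrypt_value(hsm: ToyHSM, wrapped_key: bytes, blob: bytes) -> bytes:
    key = bytearray(hsm.unwrap(wrapped_key))
    try:
        return _xor_stream(key, blob[:16], blob[16:])
    finally:
        for i in range(len(key)):
            key[i] = 0


hsm = ToyHSM()
# Only the wrapped form of the data key is stored on the database server.
wrapped_key = hsm.wrap(secrets.token_bytes(32))
blob = encrypt_value(hsm, wrapped_key, b"4111-1111")
print(decrypt_value(hsm, wrapped_key, blob))
```

Even in this sketch, the clear data key exists in the server process between unwrap and the erase loop, which is the window of exposure that remains with this architecture.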
An alternative solution is to move security-related tasks to distinct software running on a (physically) distinct server, called the security server, as shown in Figure 2.b. The security server then manages users, roles, privileges, encryption policies and encryption keys (potentially relying on an HSM). Within the DBMS, a security module communicates with the security server in order to authenticate users, check privileges and encrypt or decrypt data. Encryption keys can then be linked to a user or to a user's privileges. A clear distinction is also made between the role of the DBA, administering the database resources, and the role of the SA (Security Administrator), administering security parameters. The gain in confidence comes from the fact that an attack requires collusion between the DBA and the SA.
While adding a security server and/or an HSM minimizes the exposure of the encryption keys, it does not fully protect the database. Indeed, encryption keys, as well as decrypted data, still appear (briefly) in the database server memory and can be the target of attackers.
Applications
For several years, most DBMS manufacturers have provided native encryption capabilities that enable application developers to include additional measures of data security through selective encryption of stored data. Such native capabilities take the form of encryption toolkits or packages (Oracle8i/9i [15]), functions that can be embedded in SQL statements (IBM DB2 [5]), or extensions of SQL (Sybase [18] and SQL Server 2005 [14]). To limit performance overhead, selective encryption can generally be done at the column level but may involve changing the database schema to accommodate binary data resulting from the encryption process [14].
SQL Server 2008 [14] introduces transparent data encryption (TDE), which is actually very similar to storage-level encryption. The whole database is protected by a single key (DEK, for Database Encryption Key), itself protected by more complex means, including the possibility of using an HSM. TDE performs all of the cryptographic operations at the I/O level, but within the database system, and removes any need for application developers to create custom code to encrypt and decrypt data.
TDE (same name as in SQL Server but different functionalities) has been introduced in Oracle10g/11g, greatly enlarging the possibilities of using cryptography within the DBMS [16]. Encryption keys can now be managed by an HSM or be stored in an external file named wallet, which is encrypted using an administratively defined password. Selective encryption can be done at the column granularity or larger (tablespace, i.e., a set of data files corresponding to one or several tables and indexes). To prevent the analysis of encrypted data, Oracle proposes to include in the encryption process a salt, a random 16-byte string stored with each encrypted attribute value. An interesting, but rather dangerous, feature is the possibility of using an encryption mode that preserves equality (typically a CBC mode with a constant initialization vector), thus allowing, for instance, the use of indexes for equality predicates by encrypting the searched value.
The database-level encryption with security server approach mentioned above is proposed by IBM DB2 with the Data Encryption Expert (DEE [5]) and by third-party vendors like Protegrity [6], RSA BSAFE [17] and SafeNet [19] (appliance-based solution). The third-party vendors' products can adapt to most DBMS engines (Oracle, IBM DB2, SQL Server and Sybase).
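The trade-off between salted and equality-preserving encryption can be illustrated with a minimal sketch. It does not reproduce Oracle's implementation or the CBC construction: a toy HMAC-based stream cipher is used only to make cipher texts comparable with the standard library, and it must not be used to protect real data. The names toy_encrypt, encrypt_deterministic, encrypt_salted and the sample rows are hypothetical.

```python
import hashlib
import hmac
import secrets


def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Toy cipher: XOR with an HMAC-SHA256 keystream derived from (key, iv)."""
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


CONSTANT_IV = bytes(16)  # same IV for every value: equality is preserved


def encrypt_deterministic(key: bytes, value: bytes) -> bytes:
    return toy_encrypt(key, CONSTANT_IV, value)


def encrypt_salted(key: bytes, value: bytes) -> bytes:
    salt = secrets.token_bytes(16)  # random 16-byte salt stored with the value
    return salt + toy_encrypt(key, salt, value)


key = secrets.token_bytes(32)
rows = {1: b"Paris", 2: b"Lyon", 3: b"Paris"}

# Equality-preserving column: an index can be built on the cipher text itself.
det_column = {rid: encrypt_deterministic(key, v) for rid, v in rows.items()}
index = {}
for rid, cipher in det_column.items():
    index.setdefault(cipher, []).append(rid)

# Equality predicate: encrypt the searched value, then probe the index.
matches = index.get(encrypt_deterministic(key, b"Paris"), [])

# Salted column: equal plain texts no longer produce equal cipher texts.
salted_column = {rid: encrypt_salted(key, v) for rid, v in rows.items()}
print(matches, salted_column[1] == salted_column[3])
```

The same property that makes the index usable also lets anyone who reads the column see which rows hold equal values, which is why the feature is described as dangerous.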
Encryption Scheme
While all existing commercial database products adopt classical encryption algorithms for database encryption, specific encryption schemes have attracted much attention in the academic field, specifically in the Database as a Service paradigm. In this paradigm, database service providers offer their customers seamless mechanisms to create, store, and access their databases at the host site [1]. In this context, the database server may manage encrypted data without having access to the encryption keys (similar to application-level encryption).
Privacy homomorphic (PH) encryption is a form of encryption where one can perform some specific algebraic operations on the plain text by performing (possibly different) algebraic operations on the cipher text. The first application of PH to aggregation queries in relational databases is explored in [7], but the homomorphic encryption function used is insecure against cipher text-only attacks. The scheme in [8] supports complex aggregate queries and nested queries, but it may reveal information about the input distribution, which can be exploited. The order preserving encryption scheme (OPES) [9] allows indexes to be built directly on cipher text. OPES can handle, without decryption, any interesting SQL query type. Unfortunately, OPES has been shown to be insecure in [10], whose authors introduced the fast comparison encryption (FCE) scheme for the database-level encryption strategy. FCE can be used for fast comparison through a partial decryption technique. It encrypts plain text byte by byte, allowing fast comparison starting from the most significant byte and stopping as soon as a difference is found.
An alternative proposal is to use classical encryption algorithms and to store additional auxiliary fuzzy information next to the cipher text in order to allow partial query processing on encrypted data [1][3]. Such auxiliary information should not reveal plain text content, thus a trade-off exists between security and efficiency: increasing the precision of auxiliary information increases the performance since more processing can be done on encrypted data, but it also increases the risk of data disclosure.
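To make the idea of computing on cipher text concrete, the sketch below uses the Paillier cryptosystem, a well-known additively homomorphic scheme. It is a swapped-in illustration, not the scheme of [7] or [8]: multiplying Paillier cipher texts yields an encryption of the sum of the plain texts, so a server can compute a SUM aggregate without the key. The primes are deliberately tiny and the names encrypt, decrypt and server_sum are hypothetical; this is not secure as written.

```python
import math
import secrets

# Toy parameters: two small primes, far too short for real use.
P, Q = 104729, 1299709
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                               # valid because G = N + 1


def encrypt(m: int) -> int:
    """Paillier encryption: c = G^m * r^N mod N^2 for a random unit r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Paillier decryption: m = L(c^LAM mod N^2) * MU mod N, L(x) = (x - 1) / N."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N * MU) % N


def server_sum(ciphertexts: list) -> int:
    """Runs without the private values: multiplying cipher texts adds plain texts."""
    total = 1
    for c in ciphertexts:
        total = (total * c) % N2
    return total


salaries = [1200, 3400, 560]
encrypted = [encrypt(s) for s in salaries]  # done by the data owner
encrypted_sum = server_sum(encrypted)       # done by the untrusted server
print(decrypt(encrypted_sum) == sum(salaries))
```

Here the operation on cipher text (modular multiplication) differs from the operation it induces on plain text (addition), which matches the "possibly different" wording of the definition above.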
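The trade-off can be sketched with one possible form of auxiliary information, a coarse bucket identifier stored next to each cipher text. This is an assumption made for illustration rather than a description of the schemes in [1][3]; the names BUCKET_WIDTH, bucket_of, server_filter and client_range_query are hypothetical, and the toy cipher only stands in for a classical encryption algorithm.

```python
import hashlib
import hmac
import secrets

BUCKET_WIDTH = 1000  # hypothetical precision knob: smaller is faster but leakier


def bucket_of(value: int) -> int:
    """Auxiliary fuzzy information: only the coarse range of the value."""
    return value // BUCKET_WIDTH


def toy_encrypt(key: bytes, value: int) -> bytes:
    nonce = secrets.token_bytes(16)
    pad = hmac.new(key, nonce, hashlib.sha256).digest()[:8]
    body = bytes(a ^ b for a, b in zip(value.to_bytes(8, "big"), pad))
    return nonce + body


def toy_decrypt(key: bytes, blob: bytes) -> int:
    nonce, body = blob[:16], blob[16:]
    pad = hmac.new(key, nonce, hashlib.sha256).digest()[:8]
    return int.from_bytes(bytes(a ^ b for a, b in zip(body, pad)), "big")


def server_filter(table, low_bucket: int, high_bucket: int):
    """Runs on the server: sees bucket identifiers and cipher texts, never the key."""
    return [blob for bucket, blob in table if low_bucket <= bucket <= high_bucket]


def client_range_query(key: bytes, table, low: int, high: int):
    """Runs on the client: decrypts the candidates and removes false positives."""
    candidates = server_filter(table, bucket_of(low), bucket_of(high))
    values = [toy_decrypt(key, blob) for blob in candidates]
    return len(candidates), [v for v in values if low <= v <= high]


key = secrets.token_bytes(32)
salaries = [1200, 2500, 2900, 3400, 7800]
server_table = [(bucket_of(s), toy_encrypt(key, s)) for s in salaries]

shipped, result = client_range_query(key, server_table, 2600, 3500)
print(shipped, result)
```

With a smaller BUCKET_WIDTH the server ships fewer false positives to the client, but each bucket identifier then narrows down the plain text value more precisely.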
New Database Encryption Strategies
Existing architectures that include database encryption are not fully satisfactory since, as mentioned above, encryption keys appear in plain text in the RAM of the server or of the client machine where the application runs. An HSM acts as safe storage that reduces the risk by limiting the exposure of the keys during their lifetime. Research is being conducted to make better use of HSMs, avoiding exposing encryption keys during the whole process. Two architectures can be considered: server-HSM, when the HSM is shared by all users and is located on the server; and client-HSM, when the HSM is dedicated to a single user and is located near the user, potentially on the client machine. These two architectures are pictured in Figure 3.
Logically, the server-HSM is nothing more than database-level encryption with a security server embedded in the HSM. The HSM now manages users, privileges, encryption policies and keys. It has the same advantages as the database-level encryption with security server approach but does not expose encryption keys at any moment (since encryption/decryption is done within the HSM). Moreover, the security server cannot be tampered with since it is fully embedded in the tamper-resistant HSM. With this approach, the only data that appear in plain text are the query results that are delivered to the users. The main difficulty of this approach is its complexity, since a complex piece of software must be embedded in an HSM with restricted computation resources (due to security constraints).