Best practices for implementing encryption and managing keys.
Encryption can be used to protect data in a Greenplum Database system in the following ways:
- Connections between clients and the master database can be encrypted with SSL. This is enabled by setting the
ssl
server configuration parameter toon
and editing thepg_hba.conf
file. See "Encrypting Client/Server Connections" in the Greenplum Database Administrator Guide for information about enabling SSL in Greenplum Database. - Greenplum Database 4.2.1 and above allow SSL encryption of data in transit between the Greenplum parallel file distribution server,
gpfdist
, and segment hosts. SeeEncrypting gpfdist Connections for more information. - Network communications between hosts in the Greenplum Database cluster can be encrypted using IPsec. An authenticated, encrypted VPN is established between every pair of hosts in the cluster. Check your operating system documentation for IPsec support, or consider a third-party solution such as that provided by Zettaset.
- The
pgcrypto
module of encryption/decryption functions protects data at rest in the database. Encryption at the column level protects sensitive information, such as passwords, Social Security numbers, or credit card numbers. See Encrypting Data in Tables using PGP for an example.
Best Practices
- Encryption ensures that data can be seen only by users who have the key required to decrypt the data.
- Encrypting and decrypting data has a performance cost; only encrypt data that requires encryption.
- Do performance testing before implementing any encryption solution in a production system.
- Server certificates in a production Greenplum Database system should be signed by a certificate authority (CA) so that clients can authenticate the server. The CA may be local if all clients are local to the organization.
- Client connections to Greenplum Database should use SSL encryption whenever the connection goes through an insecure link.
- A symmetric encryption scheme, where the same key is used to both encrypt and decrypt, has better performance than an asymmetric scheme and should be used when the key can be shared safely.
- Use functions from the pgcrypto module to encrypt data on disk. The data is encrypted and decrypted in the database process, so it is important to secure the client connection with SSL to avoid transmitting unencrypted data.
- Use the gpfdists protocol to secure ETL data as it is loaded into or unloaded from the database. See Encrypting gpfdist Connections.
Key Management
Whether you are using symmetric (single private key) or asymmetric (public and private key) cryptography, it is important to store the master or private key securely. There are many options for storing encryption keys, for example, on a file system, key vault, encrypted USB, trusted platform module (TPM), or hardware security module (HSM).
Consider the following questions when planning for key management:
- Where will the keys be stored?
- When should keys expire?
- How are keys protected?
- How are keys accessed?
- How can keys be recovered and revoked?
The Open Web Application Security Project (OWASP) provides a very comprehensive guide to securing encryption keys.
Encrypting Data at Rest with pgcrypto
The pgcrypto module for Greenplum Database provides functions for encrypting data at rest in the database. Administrators can encrypt columns with sensitive information, such as social security numbers or credit card numbers, to provide an extra layer of protection. Database data stored in encrypted form cannot be read by users who do not have the encryption key, and the data cannot be read directly from disk.
pgcrypto is installed by default when you install Greenplum Database. You must explicitly enable pgcrypto in each database in which you want to use the module.
pgcrypto allows PGP encryption using symmetric and asymmetric encryption. Symmetric encryption encrypts and decrypts data using the same key and is faster than asymmetric encryption. It is the preferred method in an environment where exchanging secret keys is not an issue. With asymmetric encryption, a public key is used to encrypt data and a private key is used to decrypt data. This is slower then symmetric encryption and it requires a stronger key.
Using pgcrypto always comes at the cost of performance and maintainability. It is important to use encryption only with the data that requires it. Also, keep in mind that you cannot search encrypted data by indexing the data.
Before you implement in-database encryption, consider the following PGP limitations.
- No support for signing. That also means that it is not checked whether the encryption sub-key belongs to the master key.
- No support for encryption key as master key. This practice is generally discouraged, so this limitation should not be a problem.
- No support for several subkeys. This may seem like a problem, as this is common practice. On the other hand, you should not use your regular GPG/PGP keys with pgcrypto, but create new ones, as the usage scenario is rather different.
Greenplum Database is compiled with zlib by default; this allows PGP encryption functions to compress data before encrypting. When compiled with OpenSSL, more algorithms will be available.
Because pgcrypto functions run inside the database server, the data and passwords move between pgcrypto and the client application in clear-text. For optimal security, you should connect locally or use SSL connections and you should trust both the system and database administrators.
pgcrypto configures itself according to the findings of the main PostgreSQL configure script.
When compiled with zlib
, pgcrypto encryption functions are able to compress data before encrypting.
Pgcrypto has various levels of encryption ranging from basic to advanced built-in functions. The following table shows the supported encryption algorithms.
Value Functionality | Built-in | With OpenSSL |
---|---|---|
MD5 | yes | yes |
SHA1 | yes | yes |
SHA224/256/384/512 | yes | yes 1. |
Other digest algorithms | no | yes 2 |
Blowfish | yes | yes |
AES | yes | yes3 |
DES/3DES/CAST5 | no | yes |
Raw Encryption | yes | yes |
PGP Symmetric-Key | yes | yes |
PGP Public Key | yes | yes |
Creating PGP Keys
To use PGP asymmetric encryption in Greenplum Database, you must first create public and private keys and install them.
This section assumes you are installing Greenplum Database on a Linux machine with the Gnu Privacy Guard (gpg
) command line tool. VMware recommends using the latest version of GPG to create keys. Download and install Gnu Privacy Guard (GPG) for your operating system from https://www.gnupg.org/download/. On the GnuPG website you will find installers for popular Linux distributions and links for Windows and Mac OS X installers.
As root, execute the following command and choose option 1 from the menu:
Respond to the prompts and follow the instructions, as shown in this example:
List the PGP keys by entering the following command:
2027CC30 is the public key and will be used to encrypt data in the database. 4FD2EFBB is the private (secret) key and will be used to decrypt data.
Export the keys using the following commands:
See the pgcrypto documentation for for more information about PGP encryption functions.
Encrypting Data in Tables using PGP
This section shows how to encrypt data inserted into a column using the PGP keys you generated.
Dump the contents of the
public.key
file and then copy it to the clipboard:Enable the
pgcrypto
extension:Create a table called
userssn
and insert some sensitive data, social security numbers for Bob and Alice, in this example. Paste the public.key contents after "dearmor(".Verify that the
ssn
column is encrypted.Extract the public.key ID from the database:
This shows that the PGP key ID used to encrypt the
ssn
column is 9D4D255F4FD2EFBB. It is recommended to perform this step whenever a new key is created and then store the ID for tracking.You can use this key to see which key pair was used to encrypt the data:
Note: Different keys may have the same ID. This is rare, but is a normal event. The client application should try to decrypt with each one to see which fits — like handling
ANYKEY
. See pgp_key_id() in the pgcrypto documentation.Decrypt the data using the private key.
If you created a key with passphrase, you may have to enter it here. However for the purpose of this example, the passphrase is blank.
Encrypting gpfdist Connections
The gpfdists
protocol is a secure version of the gpfdist
protocol that securely identifies the file server and the Greenplum Database and encrypts the communications between them. Using gpfdists
protects against eavesdropping and man-in-the-middle attacks.
The gpfdists
protocol implements client/server SSL security with the following notable features:
- Client certificates are required.
- Multilingual certificates are not supported.
- A Certificate Revocation List (CRL) is not supported.
- The TLSv1 protocol is used with the
TLS_RSA_WITH_AES_128_CBC_SHA
encryption algorithm. These SSL parameters cannot be changed. - SSL renegotiation is supported.
- The SSL ignore host mismatch parameter is set to false.
- Private keys containing a passphrase are not supported for the
gpfdist
file server (server.key) or for the Greenplum Database (client.key). - It is the user's responsibility to issue certificates that are appropriate for the operating system in use. Generally, converting certificates to the required format is supported, for example using the SSL Converter at https://www.sslshopper.com/ssl-converter.html.
A gpfdist
server started with the --ssl
option can only communicate with the gpfdists
protocol. A gpfdist
server started without the --ssl
option can only communicate with the gpfdist
protocol. For more detail about gpfdist
refer to the Greenplum Database Administrator Guide.
There are two ways to enable the gpfdists
protocol:
- Run
gpfdist
with the--ssl
option and then use thegpfdists
protocol in theLOCATION
clause of aCREATE EXTERNAL TABLE
statement. - Use a YAML control file with the SSL option set to true and run
gpload
. Runninggpload
starts thegpfdist
server with the--ssl
option and then uses thegpfdists
protocol.
When using gpfdists, the following client certificates must be located in the $PGDATA/gpfdists
directory on each segment:
- The client certificate file,
client.crt
- The client private key file,
client.key
- The trusted certificate authorities,
root.crt
Important: Do not protect the private key with a passphrase. The server does not prompt for a passphrase for the private key, and loading data fails with an error if one is required.
When using gpload
with SSL you specify the location of the server certificates in the YAML control file. When using gpfdist
with SSL, you specify the location of the server certificates with the --ssl option.
The following example shows how to securely load data into an external table. The example creates a readable external table named ext_expenses
from all files with the txt
extension, using the gpfdists
protocol. The files are formatted with a pipe (|
) as the column delimiter and an empty space as null.
Run
gpfdist
with the--ssl
option on the segment hosts.Log into the database and execute the following command:
Parent topic: Greenplum Database Best Practices
1 SHA2 algorithms were added to OpenSSL in version 0.9.8. For older versions, pgcrypto will use built-in code2 Any digest algorithm OpenSSL supports is automatically picked up. This is not possible with ciphers, which need to be supported explicitly.3 AES is included in OpenSSL since version 0.9.7. For older versions, pgcrypto will use built-in code.
Content feedback and comments