Tanzu Greenplum Platform Extension Framework 7.0.0-beta.1

Accessing Azure, Google Cloud Storage, and S3-Compatible Object Stores

Last Updated February 14, 2025

PXF is installed with connectors to Azure Blob Storage, Azure Data Lake Storage Gen2, Google Cloud Storage, and the AWS S3, MinIO, and Dell ECS S3-compatible object stores.

Prerequisites

Before working with object store data using the VMware Tanzu Greenplum Platform Extension Framework (PXF), ensure that:

Connectors, Data Formats, and Profiles

The PXF object store connectors provide built-in profiles to support the following data formats:

  • Text
  • CSV
  • Avro
  • JSON
  • ORC
  • Parquet
  • AvroSequenceFile
  • SequenceFile

The PXF connectors to Azure expose the following profiles to read, and in many cases write, these supported data formats.

ADL support has been deprecated as of PXF 7.0.0. Use the ABFSS profile instead.

| Data Format | Azure Blob Storage | Azure Data Lake Storage Gen2 | Supported Operations |
|---|---|---|---|
| delimited single line plain text | wasbs:text | abfss:text | Read, Write |
| delimited single line comma-separated values of plain text | wasbs:csv | abfss:csv | Read, Write |
| multi-byte or multi-character delimited single line csv | wasbs:csv | abfss:csv | Read |
| delimited text with quoted linefeeds | wasbs:text:multi | abfss:text:multi | Read |
| fixed width single line text | wasbs:fixedwidth | abfss:fixedwidth | Read, Write |
| Avro | wasbs:avro | abfss:avro | Read, Write |
| JSON | wasbs:json | abfss:json | Read, Write |
| ORC | wasbs:orc | abfss:orc | Read, Write |
| Parquet | wasbs:parquet | abfss:parquet | Read, Write |
| AvroSequenceFile | wasbs:AvroSequenceFile | abfss:AvroSequenceFile | Read, Write |
| SequenceFile | wasbs:SequenceFile | abfss:SequenceFile | Read, Write |

Similarly, the PXF connectors to Google Cloud Storage and S3-compatible object stores expose these profiles:

| Data Format | Google Cloud Storage | AWS S3, MinIO, or Dell ECS | Supported Operations |
|---|---|---|---|
| delimited single line plain text | gs:text | s3:text | Read, Write |
| delimited single line comma-separated values of plain text | gs:csv | s3:csv | Read, Write |
| multi-byte or multi-character delimited single line csv | gs:csv | s3:csv | Read |
| delimited text with quoted linefeeds | gs:text:multi | s3:text:multi | Read |
| fixed width single line text | gs:fixedwidth | s3:fixedwidth | Read, Write |
| Avro | gs:avro | s3:avro | Read, Write |
| JSON | gs:json | s3:json | Read |
| ORC | gs:orc | s3:orc | Read, Write |
| Parquet | gs:parquet | s3:parquet | Read, Write |
| AvroSequenceFile | gs:AvroSequenceFile | s3:AvroSequenceFile | Read, Write |
| SequenceFile | gs:SequenceFile | s3:SequenceFile | Read, Write |

You provide the profile name when you specify the pxf protocol on a CREATE EXTERNAL TABLE command to create a Greenplum Database external table that references a file or directory in the specific object store.

Sample CREATE EXTERNAL TABLE Commands

When you create an external table that references a file or directory in an object store, you must specify a SERVER in the LOCATION URI.

The following command creates an external table that references a text file on S3. It specifies the profile named s3:text and the server configuration named s3srvcfg:

CREATE EXTERNAL TABLE pxf_s3_text(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://S3_BUCKET/pxf_examples/pxf_s3_simple.txt?PROFILE=s3:text&SERVER=s3srvcfg')
FORMAT 'TEXT' (delimiter=E',');
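
Once the external table is defined, you can query it like any other Greenplum Database table. The following is a minimal sketch, not a verified result: it assumes that S3_BUCKET in the table definition was replaced with a real bucket name, that pxf_examples/pxf_s3_simple.txt exists in that bucket, and that the s3srvcfg server configuration holds valid credentials.

```sql
-- Read every row that PXF returns from the S3 text file.
SELECT * FROM pxf_s3_text;

-- Aggregate inside the database; PXF still reads the file from S3 for each query.
SELECT location, sum(total_sales) AS total_sales
FROM pxf_s3_text
GROUP BY location
ORDER BY location;
```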

The following command creates an external table that references a text file on Azure Blob Storage. It specifies the profile named wasbs:text and the server configuration named wasbssrvcfg. You would provide the Azure Blob Storage container identifier and your Azure Blob Storage account name.

CREATE EXTERNAL TABLE pxf_wasbs_text(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://AZURE_CONTAINER@YOUR_AZURE_BLOB_STORAGE_ACCOUNT_NAME.blob.core.windows.net/path/to/blob/file?PROFILE=wasbs:text&SERVER=wasbssrvcfg')
FORMAT 'TEXT';

The following command creates an external table that references a text file on Azure Data Lake Storage Gen2. It specifies the profile named abfss:text and the server configuration named abfsssrvcfg. You would provide your Azure Data Lake Storage Gen2 account name.

CREATE EXTERNAL TABLE pxf_abfss_text(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://YOUR_ABFSS_ACCOUNT_NAME.dfs.core.windows.net/path/to/file?PROFILE=abfss:text&SERVER=abfsssrvcfg')
FORMAT 'TEXT';

The following command creates an external table that references a JSON file on Google Cloud Storage. It specifies the profile named gs:json and the server configuration named gcssrvcfg:

CREATE EXTERNAL TABLE pxf_gsc_json(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://dir/subdir/file.json?PROFILE=gs:json&SERVER=gcssrvcfg')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
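
The profile tables earlier in this topic list both Read and Write support for s3:text. As an illustration of the write side, the following sketch creates a writable external table with the same profile and server configuration as the S3 read example. The table name pxf_s3_text_write, the target directory pxf_examples/pxf_s3_write, and the inserted row are hypothetical values chosen for illustration; the sketch also assumes that the target path names a directory rather than a single file.

```sql
-- Hypothetical table name and target directory (assumptions, not from the original text).
-- S3_BUCKET is a placeholder for your bucket name, as in the read example.
CREATE WRITABLE EXTERNAL TABLE pxf_s3_text_write(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://S3_BUCKET/pxf_examples/pxf_s3_write?PROFILE=s3:text&SERVER=s3srvcfg')
FORMAT 'TEXT' (delimiter=',');

-- Write one made-up row through PXF to the object store.
INSERT INTO pxf_s3_text_write VALUES ('Frankfurt', 'Mar', 777, 3956.98);
```

A writable external table accepts INSERT only. To read the data back, define a separate readable external table whose LOCATION references the same directory.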