Tanzu Greenplum Platform Extension Framework 6.9

Accessing Azure, Google Cloud Storage, and S3-Compatible Object Stores

Last Updated February 14, 2025

PXF is installed with connectors to Azure Blob Storage, Azure Data Lake, Google Cloud Storage, AWS, MinIO, and Dell ECS S3-compatible object stores.

Prerequisites

Before working with object store data using PXF, ensure that:

Connectors, Data Formats, and Profiles

The PXF object store connectors provide built-in profiles to support the following data formats:

  • Text
  • CSV
  • Avro
  • JSON
  • ORC
  • Parquet
  • AvroSequenceFile
  • SequenceFile

The PXF connectors to Azure expose the following profiles to read, and in many cases write, these supported data formats:

Data Format | Azure Blob Storage | Azure Data Lake | Supported Operations
delimited single line plain text | wasbs:text | adl:text | Read, Write
delimited single line comma-separated values of plain text | wasbs:csv | adl:csv | Read, Write
multi-byte or multi-character delimited single line csv | wasbs:csv | adl:csv | Read
delimited text with quoted linefeeds | wasbs:text:multi | adl:text:multi | Read
fixed width single line text | wasbs:fixedwidth | adl:fixedwidth | Read, Write
Avro | wasbs:avro | adl:avro | Read, Write
JSON | wasbs:json | adl:json | Read, Write
ORC | wasbs:orc | adl:orc | Read, Write
Parquet | wasbs:parquet | adl:parquet | Read, Write
AvroSequenceFile | wasbs:AvroSequenceFile | adl:AvroSequenceFile | Read, Write
SequenceFile | wasbs:SequenceFile | adl:SequenceFile | Read, Write

Similarly, the PXF connectors to Google Cloud Storage and S3-compatible object stores expose these profiles:

Data Format | Google Cloud Storage | AWS S3, MinIO, or Dell ECS | Supported Operations
delimited single line plain text | gs:text | s3:text | Read, Write
delimited single line comma-separated values of plain text | gs:csv | s3:csv | Read, Write
multi-byte or multi-character delimited single line comma-separated values csv | gs:csv | s3:csv | Read
delimited text with quoted linefeeds | gs:text:multi | s3:text:multi | Read
fixed width single line text | gs:fixedwidth | s3:fixedwidth | Read, Write
Avro | gs:avro | s3:avro | Read, Write
JSON | gs:json | s3:json | Read
ORC | gs:orc | s3:orc | Read, Write
Parquet | gs:parquet | s3:parquet | Read, Write
AvroSequenceFile | gs:AvroSequenceFile | s3:AvroSequenceFile | Read, Write
SequenceFile | gs:SequenceFile | s3:SequenceFile | Read, Write

You provide the profile name when you specify the pxf protocol on a CREATE EXTERNAL TABLE command to create a Greenplum Database external table that references a file or directory in the specific object store.

Sample CREATE EXTERNAL TABLE Commands

Note: When you create an external table that references a file or directory in an object store, you must specify a SERVER in the LOCATION URI.

The following command creates an external table that references a text file on S3. It specifies the profile named s3:text and the server configuration named s3srvcfg:

CREATE EXTERNAL TABLE pxf_s3_text(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://S3_BUCKET/pxf_examples/pxf_s3_simple.txt?PROFILE=s3:text&SERVER=s3srvcfg')
FORMAT 'TEXT' (delimiter=E',');
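
Because the s3:text profile also supports write operations, a writable counterpart to the command above might look like the following minimal sketch. The table name pxf_s3_text_write and the target directory pxf_examples/pxf_s3_write_dir are hypothetical; S3_BUCKET and s3srvcfg are the same placeholders used above. The sketch assumes that a writable external table references a directory rather than a single file, and the inserted row is illustrative only.

```sql
-- Sketch only: pxf_s3_text_write and pxf_s3_write_dir are hypothetical names.
CREATE WRITABLE EXTERNAL TABLE pxf_s3_text_write(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://S3_BUCKET/pxf_examples/pxf_s3_write_dir?PROFILE=s3:text&SERVER=s3srvcfg')
FORMAT 'TEXT' (delimiter=',');

-- Rows inserted here are written as comma-delimited text under the target directory.
-- The values are illustrative, not taken from a real data set.
INSERT INTO pxf_s3_text_write VALUES ('Frankfurt', 'Mar', 777, 3956.98);
```

A writable external table accepts INSERT only. To read the data back, define a separate readable external table over the same directory.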

The following command creates an external table that references a text file on Azure Blob Storage. It specifies the profile named wasbs:text and the server configuration named wasbssrvcfg. Replace AZURE_CONTAINER with your Azure Blob Storage container identifier and YOUR_AZURE_BLOB_STORAGE_ACCOUNT_NAME with your Azure Blob Storage account name:

CREATE EXTERNAL TABLE pxf_wasbs_text(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://AZURE_CONTAINER@YOUR_AZURE_BLOB_STORAGE_ACCOUNT_NAME.blob.core.windows.net/path/to/blob/file?PROFILE=wasbs:text&SERVER=wasbssrvcfg')
FORMAT 'TEXT';

The following command creates an external table that references a text file on Azure Data Lake. It specifies the profile named adl:text and the server configuration named adlsrvcfg. Replace YOUR_ADL_ACCOUNT_NAME with your Azure Data Lake account name:

CREATE EXTERNAL TABLE pxf_adl_text(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://YOUR_ADL_ACCOUNT_NAME.azuredatalakestore.net/path/to/file?PROFILE=adl:text&SERVER=adlsrvcfg')
FORMAT 'TEXT';

The following command creates an external table that references a JSON file on Google Cloud Storage. It specifies the profile named gs:json and the server configuration named gcssrvcfg:

CREATE EXTERNAL TABLE pxf_gsc_json(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://dir/subdir/file.json?PROFILE=gs:json&SERVER=gcssrvcfg')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
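
Other non-text formats follow the same pattern as the JSON command: only the profile name changes, while the FORMAT clause keeps the CUSTOM formatter. The following is a minimal sketch for a Parquet file on Google Cloud Storage. The table name pxf_gcs_parquet and the path dir/subdir/file.parquet are hypothetical, and the sketch assumes the same gcssrvcfg server configuration as the command above.

```sql
-- Sketch only: the table name and the object path are hypothetical.
CREATE EXTERNAL TABLE pxf_gcs_parquet(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://dir/subdir/file.parquet?PROFILE=gs:parquet&SERVER=gcssrvcfg')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

-- Once defined, the external table can be queried like any other table.
SELECT location, sum(total_sales) AS sales
FROM pxf_gcs_parquet
GROUP BY location;
```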