Tanzu Greenplum Platform Extension Framework 6.10

About Column Projection in PXF

Last Updated February 14, 2025

PXF supports column projection, and it is always enabled. With column projection, only the columns required by a SELECT query on an external table are returned from the external data source. This process can improve query performance, and can also reduce the amount of data that is transferred to Greenplum Database.

Note: Some external data sources do not support column projection. If a query accesses a data source that does not support column projection, the query is instead run without it, and the data is filtered after it is transferred to Greenplum Database.

Column projection is automatically enabled for the pxf external table protocol. PXF accesses external data sources using different connectors, and column projection support is also determined by the specific connector implementation. The following PXF connector and profile combinations support column projection on read operations:

Data SourceConnectorProfile(s)
External SQL databaseJDBC Connectorjdbc
HiveHive Connectorhive (accessing tables stored as Text, Parquet, RCFile, and ORC), hive:rc, hive:orc
HadoopHDFS Connectorhdfs:orc, hdfs:parquet
Network File SystemFile Connectorfile:orc, file:parquet
Amazon S3S3-Compatible Object Store Connectorss3:orc, s3:parquet
Amazon S3 using S3 SelectS3-Compatible Object Store Connectorss3:parquet, s3:text
Google Cloud StorageGCS Object Store Connectorgs:orc, gs:parquet
Azure Blob StorageAzure Object Store Connectorwasbs:orc, wasbs:parquet
Azure Data LakeAzure Object Store Connectoradl:orc, adl:parquet

Note: PXF may deactivate column projection in cases where it cannot successfully serialize a query filter; for example, when the WHERE clause resolves to a boolean type.

To summarize, all of the following criteria must be met for column projection to occur:

  • The external data source that you are accessing must support column projection. For example, Hive supports column projection for ORC-format data, and certain SQL databases support column projection.
  • The underlying PXF connector and profile implementation must also support column projection. For example, the PXF Hive and JDBC connector profiles identified above support column projection, as do the PXF connectors that support reading Parquet data.
  • PXF must be able to serialize the query filter.