The VMware Tanzu Greenplum Platform Extension Framework (PXF) for Red Hat Enterprise Linux, CentOS, and Oracle Enterprise Linux has been updated and distributed independently of Greenplum Database since version 5.13.0. Version 5.16.0 is the first independent release that includes an Ubuntu distribution, version 6.3.0 is the first that includes a Red Hat Enterprise Linux 8.x distribution, and version 6.9.0 is the first that includes a Red Hat Enterprise Linux 9.x distribution.
You must download and install the PXF package to obtain the most recent version of this component.
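The release milestones above can be encoded as a small lookup. The following Python sketch is illustrative only: the names `FIRST_INDEPENDENT_RELEASE`, `parse_version`, and `independent_distributions` are assumptions of this example and are not part of PXF, and the lookup reflects only the "first independent release" milestones, not the full compatibility matrix.

```python
# First independent PXF release that ships each distribution family,
# taken from the milestones stated in the introduction.
FIRST_INDEPENDENT_RELEASE = {
    "RHEL/CentOS/OEL": (5, 13, 0),
    "Ubuntu": (5, 16, 0),
    "RHEL 8.x": (6, 3, 0),
    "RHEL 9.x": (6, 9, 0),
}


def parse_version(text):
    """Turn '6.3' or '6.3.0' into a 3-element tuple of ints."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad so (6, 3) compares like (6, 3, 0)
    return tuple(parts[:3])


def independent_distributions(pxf_version):
    """List distribution families whose first independent release is <= pxf_version."""
    version = parse_version(pxf_version)
    return [
        name
        for name, first in FIRST_INDEPENDENT_RELEASE.items()
        if version >= first
    ]
```

Comparing integer tuples rather than strings avoids ordering mistakes such as treating `6.10.2` as older than `6.9.0`.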
Supported Platforms
The independent PXF 6.x distribution is compatible with these operating system platform versions and Greenplum Database versions:
PXF Version | OS Version | Greenplum Version |
---|---|---|
5.13+, 6.0+ | RHEL 7.x, CentOS 7.x, OEL 7.x | 5.21.2+, 6.x |
5.16+, 6.0+ | RHEL 7.x, CentOS 7.x, OEL 7.x, Ubuntu 18.04 LTS | 6.x |
6.3+ | RHEL 7.x, CentOS 7.x, OEL 7.x, Ubuntu 18.04 LTS, RHEL 8.x | 6.20+ |
6.8+ | RHEL 7.x, CentOS 7.x, OEL 7.x, Ubuntu 18.04 LTS, RHEL 8.x | 6.20+, 7.x |
6.9+ | RHEL 7.x, CentOS 7.x, OEL 7.x, Ubuntu 18.04 LTS, RHEL 8.x, RHEL 9.x | 6.26+ |
PXF is compatible with these Java and Hadoop component versions:
PXF Version | Java Versions | Hadoop Versions | Hive Server Versions | HBase Server Version |
---|---|---|---|---|
6.10.0, 6.9.x, 6.8.0, 6.7.0, 6.6.0, 6.5.x, 6.4.x, 6.3.x, 6.2.x, 6.1.0, 6.0.x | 8, 11 | 2.x, 3.1+ | 1.x, 2.x, 3.1+ | 1.3.2 |
5.16.x, 5.15.x, 5.14, 5.13 | 8, 11 | 2.x, 3.1+ | 1.x, 2.x, 3.1+ | 1.3.2 |
Upgrading to Version 6.x
Release 6.10.2
Release Date: July 12, 2024
New and Changed Features
PXF 6.10.2 includes these new and changed features:
- PXF now supports querying and migrating data encoded in GB18030.
- The Spring Framework library dependency has been updated to version 5.3.34.
- The `gp-common-go-libs` library dependency has been updated to version v1.0.20.
Resolved Issues
PXF 6.10.2 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2024‑22262 | Updates Spring Framework to version 5.3.34. (Resolved by PR-1105.) |
CVE‑2023‑45288 | Updates golang.org/x/net to version 0.23.0. (Resolved by PR-1105.) |
Release 6.10.1
Release Date: March 27, 2024
Changed Features
PXF 6.10.1 includes these changes:
- PXF improves performance when reading multi-line JSON files that contain multiple JSON objects on a single line.
- The Spring Framework library dependency has been updated to version 5.3.33.
- The Tomcat library dependency has been updated to version 9.0.87.
- The `gp-common-go-libs` library dependency has been updated to version v1.0.16.
Resolved Issues
PXF 6.10.1 resolves these issues:
Issue # | Summary |
---|---|
33272 | Resolves slowness when reading multi-line JSON files that have multiple JSON objects on a single line. (Resolved by PR-1100.) |
CVE‑2024‑22243 | Updates Spring Framework to version 5.3.33. (Resolved by PR-1105.) |
CVE‑2024‑22259 | Updates Spring Framework to version 5.3.33. (Resolved by PR-1105.) |
CVE‑2024‑24549 | Updates Tomcat to version 9.0.87. (Resolved by PR-1108.) |
CVE‑2024‑27289 | Updates github.com/jackc/pgx/v4 to version v4.18.2. (Resolved by PR-1102.) |
947 | Resolves an issue where a Greenplum query would fail with the error, “transfer closed with outstanding read data remaining”, due to race conditions between Tomcat and Spring MVC async error handling. (Resolved by PR-1105.) |
Release 6.10.0
Release Date: March 4, 2024
New and Changed Features
PXF 6.10.0 includes these new features and changes:
- The PXF JDBC Connector adds support for reading and writing the `UUID` data type.
- PXF is now bundled with the PostgreSQL JDBC driver version 42.7.2.
- For both the PXF external table and the FDW extensions, the PXF JDBC Connector now returns an error on `timestamp` data in which the year has more than 4 digits when date_wide_range is not enabled.
- The PXF JDBC Connector now fails when reading `date` values in which the year has more than 4 digits when using Java 11.
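The year-width rule described above can be modeled in a few lines. This Python sketch is only an illustration of the rule as stated in these notes; the function name `check_year` and the `ValueError` it raises are hypothetical and do not reflect the actual PXF implementation or its error text.

```python
def check_year(year, date_wide_range=False):
    """Accept a year unless it has more than 4 digits and wide range is off.

    Hypothetical helper: models the rule in the release note, not PXF code.
    """
    digits = len(str(abs(year)))
    if digits > 4 and not date_wide_range:
        raise ValueError(
            f"year {year} has more than 4 digits; enable the wide date range option"
        )
    return year
```

With the wide-range option enabled, a year such as 12345 passes through unchanged; without it, the same value is rejected instead of being silently misread.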
Resolved Issues
PXF 6.10.0 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2024‑1597 | Updates the postgresql JDBC JAR file to version 42.7.2. |
N/A | Resolves an issue with the PXF JDBC Connector where the behavior of the PXF external table extension differed from the behavior of the PXF FDW extension when reading timestamp data in which the year had more than 4 digits. (Resolved by PR-1081.) |
N/A | Resolves an issue where, when using Java 11, the PXF JDBC connector was not correctly reading date values with more than 4 digits. (Resolved by PR-1096.) |
Release 6.9.1
Release Date: February 6, 2024
Changed Features
PXF 6.9.1 includes these changes:
- PXF bundles a newer `gp-common-go-libs` supporting library along with its dependencies.
- The Spring library dependency is updated to version 2.7.18.
Resolved Issues
PXF 6.9.1 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2023‑48795 | Updates golang.org/x/crypto. |
CVE‑2023‑41080 | Updates bundled Tomcat version. |
CVE‑2023‑42795 | Updates bundled Tomcat version. |
CVE‑2023‑45648 | Updates bundled Tomcat version. |
CVE‑2023‑46589 | Updates bundled Tomcat version. |
CVE‑2023‑34055 | Updates Spring Boot to version 2.7.18. |
Release 6.9.0
Release Date: December 8, 2023
New and Changed Features
PXF 6.9.0 includes these new features and changes:
- PXF now supports Red Hat Enterprise Linux 64-bit 9.x for VMware Greenplum version 6.26.x.
- PXF improves error reporting by `pxf register` when installing the control file fails.
- The `snappy-java` library dependency is updated to version 1.1.10.4.
- The `golang.org/x/net` library dependency is updated to 0.17.0.
- The Go standard library dependency is updated to version 1.21.3.
Resolved Issues
PXF 6.9.0 resolves these issues:
Issue # | Summary |
---|---|
33104 | Resolves an issue where PXF was not reporting an error when pxf register failed to install the control file. (Resolved by PR-1047.) |
Release 6.8.0
Release Date: September 28, 2023
New and Changed Features
PXF 6.8.0 includes these new features and changes:
- PXF introduces a new property to the pxf-site.xml per-server configuration file. PXF uses this property, named `pxf.service.kerberos.ticket-renew-window`, to identify how much of a Kerberos ticket lifespan should elapse before it generates a new ticket.
- The PXF Hadoop and Object Store Connectors (including S3-Select) now support predicate pushdown for the `NUMERIC` data type when reading non-Parquet data formats.
- The PXF JDBC Connector now supports predicate pushdown for the `CHAR`, `VARCHAR`, and `NUMERIC` data types.
- PXF command line output and log messages replace the term master with the term coordinator.
- The `azure-storage` library dependency is updated to version 5.5.0.
- PXF now supports Red Hat Enterprise Linux 64-bit 8.x for VMware Greenplum version 7.x.
- PXF removes support for the MapR Hadoop distribution.
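To make the ticket renew window concrete, here is a minimal Python sketch. It assumes that the property value is a fraction between 0 and 1 of the ticket lifespan; that interpretation, along with the function name `renew_after`, is an assumption of this example rather than something these notes state.

```python
from datetime import datetime, timedelta


def renew_after(ticket_start, ticket_end, renew_window):
    """Return the moment after which a new ticket would be generated.

    renew_window is ASSUMED to be the fraction (0.0 to 1.0) of the ticket
    lifespan that must elapse first; this is a hypothetical model.
    """
    if not 0.0 <= renew_window <= 1.0:
        raise ValueError("renew_window must be between 0.0 and 1.0")
    lifespan = ticket_end - ticket_start
    return ticket_start + lifespan * renew_window


# Example: a 10-hour ticket with half of its lifespan as the window.
start = datetime(2024, 1, 1, 0, 0)
threshold = renew_after(start, start + timedelta(hours=10), 0.5)
```

Under this model, a smaller window regenerates tickets more often, and a window of 1.0 waits for the full lifespan to elapse.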
Release 6.7.0
Release Date: July 14, 2023
This version of the VMware Greenplum Platform Extension Framework documentation replaces the term master with the term coordinator.
New and Changed Features
PXF 6.7.0 includes these new features and changes:
- PXF introduces support for reading data that contains a multi-byte delimiter or a delimiter with multiple characters. Refer to About Reading Data Containing Multi-Byte or Multi-Character Delimiters for more information and for examples.
- PXF introduces the following new features for JSON data type support:
    - Assuming that the Greenplum Database data type is chosen properly, adds support for reading the original precision of numeric types.
    - Adds support for reading one-dimensional arrays of JSON primitive types.
    - Adds support for writing JSON primitive types and one-dimensional arrays.

  Refer to Reading and Writing JSON Data for more information.
- PXF enhances its message logging for write operations.
- PXF exposes a new PXF Service application property, `server.address`, to specify the listen address for the PXF Service.
- The default PXF Service listen address is changed from `0.0.0.0` to `localhost`. If you have previously configured PXF to run on non-Greenplum hosts, or you wish to retain the previous listen address, you must re-configure the listen address.
- The PXF JDBC Connector can now read/write a date that contains more than four alphanumeric characters in the year. Specify the `DATE_WIDE_RANGE` external table option or the `jdbc.date.wideRange` `jdbc-site.xml` server configuration file property to use this new feature.
- The Spring library dependency is updated to version 2.7.12.
- The `snappy-java` library dependency is updated to version 1.1.10.1.
- PXF v6.7.0 introduces these changes and features related to numeric precision overflow detection and action when writing to ORC files:
    - PXF logs a warning when it encounters an overflow condition and the decimal overflow option is set to `ignore`.
    - PXF now attempts to round numeric data to meet both precision and scale requirements before writing to an ORC file when the data exceeds the maximum supported precision of 38 and overflows. PXF previously rounded the data to meet only the precision requirement, or wrote a `NULL` value if it failed to round. See Resolved Issues.
    - PXF uses the new `pxf.orc.write.decimal.overflow` property in the `pxf-site.xml` server configuration file to govern its action when numeric data that it writes to an ORC file exceeds the maximum supported precision of 38 and overflows. During upgrade, be sure to perform step 8 if you want to change the default value (`round`) of this property for an existing PXF server configuration.
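The idea of rounding to satisfy both precision and scale can be sketched with Python's `decimal` module. This is a simplified model, not PXF's implementation: the function name `fit_decimal`, the choice of half-up rounding, and returning `None` when the value still does not fit are all assumptions made for illustration. What PXF does when rounding is not enough is governed by the overflow property and is not modeled here.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

MAX_PRECISION = 38  # maximum supported precision named in these notes


def fit_decimal(value, scale, precision=MAX_PRECISION):
    """Round value to `scale` fractional digits and check total precision.

    Returns the rounded Decimal, or None when the integer part still needs
    more than (precision - scale) digits after rounding. Hypothetical helper.
    """
    with localcontext() as ctx:
        ctx.prec = 200  # generous working precision so quantize() does not fail
        quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 -> Decimal('0.01')
        rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    # adjusted() is the exponent of the most significant digit.
    integer_digits = max(rounded.adjusted() + 1, 0)
    if integer_digits + scale > precision:
        return None
    return rounded
```

Rounding can itself push a value over the limit: with a precision of 5 and a scale of 2, `999.995` rounds to `1000.00`, which needs six digits, so this sketch reports that it does not fit.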
Resolved Issues
PXF 6.7.0 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2023‑34455 | Updates snappy-java to version 1.1.10.1. (Resolved by PR-989.) |
32890 | Adds enhanced message logging for write operations. (Partially resolved by PR-979.) |
32723 | Resolves an issue where PXF wrote a NULL value to an ORC file when numeric data had a precision greater than 38. (Resolved by PR-978.) |
974 | Resolves an issue where the default PXF listen address was not limited to local traffic. (Resolved by PR-976.) |
N/A | Reduces the frequency of partial file transfer errors by updating the Spring library dependency to version 2.7.12. (Resolved by PR-983.) |
Release 6.6.0
Release Date: April 10, 2023
New and Changed Features
PXF 6.6.0 includes these new features and changes:
- PXF introduces new `*:fixedwidth` profiles to support fixed-width text data. Refer to Reading and Writing Fixed-Width Text Data for more information and for examples.
- PXF v6.6.0 introduces these changes and features related to precision overflow detection and action when writing to Parquet files:
    - PXF now always logs a warning when it encounters an overflow condition.
    - PXF now attempts to round numeric data before writing to a Parquet file when the data exceeds the maximum supported precision of 38 and overflows. PXF previously wrote a `NULL` value. See Resolved Issues.
    - PXF uses the new `pxf.parquet.write.decimal.overflow` property in the `pxf-site.xml` server configuration file to govern its action when numeric data that it writes to a Parquet file exceeds the maximum supported precision of 38 and overflows. During upgrade, be sure to perform step 7 if you want to change the default value (`round`) of this property for an existing PXF server configuration.
- PXF v6.6.0 deprecates the `DATA-SCHEMA` external table option (used with `SequenceFile` profiles) and replaces it with the option named `DATA_SCHEMA`.
Resolved Issues
PXF 6.6.0 resolves these issues:
Issue # | Summary |
---|---|
32723 | Partially resolves an issue where PXF wrote a NULL value to a Parquet file when numeric data had a precision greater than 38. (Resolved by PR-940.) |
32715 | Partially resolves an issue where PXF returned an ArrayIndexOutOfBoundsException when it wrote a numeric value with precision greater than 38 to a Parquet file. (Resolved by PR-940.) |
Release 6.5.1
Release Date: March 20, 2023
Changed Features
PXF 6.5.1 includes these changes:
- PXF is updated to work with the external table framework in Greenplum 7 (Beta 2).
- PXF bundles a newer `gp-common-go-libs` supporting library along with its dependencies to resolve several CVEs (see Resolved Issues).
Resolved Issues
PXF 6.5.1 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2022‑41723 | Updates golang.org/x/net. |
CVE‑2021‑43565 | Updates golang.org/x/crypto/ssh. |
CVE‑2022‑27664 | Updates golang.org/x/net/http2. |
CVE‑2022‑27191 | Updates golang.org/x/crypto/ssh. |
CVE‑2022‑32149 | Updates golang.org/x/text/language. |
CVE‑2022‑30632 | Updates golang.org/x/net/http/httpguts. |
CVE‑2020‑29652 | Updates golang.org/x/crypto/ssh. |
CVE‑2021‑33194 | Updates golang.org/x/net/html. |
CVE‑2021‑38561 | Updates golang.org/x/text/language. |
CVE‑2022‑29526 | Updates golang.org/x/sys/unix. |
Release 6.5.0
Release Date: December 22, 2022
New and Changed Features
PXF 6.5.0 includes these new features and changes:
- PXF improves support for reading JSON records that span multiple lines both when the data includes special characters, and when the data is compressed with a splittable codec like BZip2.
- PXF introduces a new `CREATE EXTERNAL TABLE` option for the `*:json` profiles named `SPLIT_BY_FILE` that you can use to specify how PXF splits the data it reads. The default value is `false`; PXF creates multiple splits for each file, which it processes in parallel. When set to `true`, PXF creates and processes a single split per file.
- PXF adds support for specifying parallel execution parameters when you use the JDBC Connector to access an Oracle database. Refer to About Setting Oracle Parallel Query Session Parameters for more information.
- PXF is now bundled with the PostgreSQL JDBC driver version 42.4.3.
- PXF adds support for reading and writing Parquet `LIST` types. Refer to the PXF Parquet Data Type Mapping documentation for more information about the data types supported and the data type mappings.
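As an illustration of where an option such as `SPLIT_BY_FILE` appears, the following Python sketch assembles a PXF external table definition as a string. The helper names `pxf_location` and `external_table_ddl`, the table name, the columns, and the file path are all hypothetical; the sketch assumes the `pxf://<path>?PROFILE=<profile>&<OPTION>=<value>` location form and the `pxfwritable_import` formatter used for readable PXF external tables.

```python
def pxf_location(path, profile, **options):
    """Build a pxf:// LOCATION URI with a profile and custom options."""
    params = [("PROFILE", profile)] + list(options.items())
    query = "&".join(f"{key}={value}" for key, value in params)
    return f"pxf://{path}?{query}"


def external_table_ddl(table, columns, location):
    """Render a readable external table statement (illustrative only)."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"  LOCATION ('{location}')\n"
        "  FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');"
    )


# Hypothetical table that reads each JSON file as a single split.
location = pxf_location("data/records.json", "hdfs:json", SPLIT_BY_FILE="true")
ddl = external_table_ddl("json_records", [("id", "int"), ("name", "text")], location)
```

Leaving the option out keeps the default behavior, in which PXF creates multiple splits per file.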
Resolved Issues
PXF 6.5.0 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2022‑41946 | Updates the postgresql JDBC JAR file to version 42.4.3. |
32387 | PXF did not support reading or writing Parquet LIST types. PXF 6.5.0 includes native support for reading and writing LISTS of certain Parquet types. (Resolved by PR-885 and PR-876.) |
32353 | When PXF read a JSON file containing multi-line records and the external table definition specified both a *:json profile and an IDENTIFIER, PXF could both return wrong results when the data included special characters, and return duplicate rows when the data was compressed with a splittable codec. These issues are resolved. (Resolved by PR-879.) |
N/A | Resolves an issue where, in certain error conditions, PXF failed to close a connection to an external data source. PXF now closes the connection. (Resolved by PR-897.) |
N/A | Resolves an out-of-buffer data access issue by adding additional buffer boundary checks to the PXF extension to guard against invalid reads. (Resolved by PR-885.) |
N/A | Resolves an issue where PXF may have returned incomplete or incorrect results when it did not project a boolean column that was included in a WHERE clause but was not also present in the SELECT list. (Resolved by PR-875.) |
Release 6.4.2
Release Date: September 20, 2022
Changed Features
PXF 6.4.2 includes these changes:
- Reading a JSON file containing multi-line records may be less performant due to the fix implemented for resolved issue 32353.
- Removes a string length check in the PXF extension that was added in version 6.3.2, and instead logs a message.
Resolved Issues
PXF 6.4.2 resolves these issues:
Issue # | Summary |
---|---|
32439 | Resolves an issue where PXF returned the error expected column <N> to have length <M>, actual length is 0 when it read an ORC or Parquet data file that contained a string that included ASCII NULL-bytes by removing a string length check. (Resolved by PR-870.) |
32353 | Resolves an issue where PXF returned incomplete data when it read a JSON file containing multi-line records, and the external table definition specified both a *:json profile and an IDENTIFIER. (Resolved by PR-858.) |
Release 6.4.0
Release Date: August 19, 2022
Changed Features
PXF 6.4.0 includes these changes:
- Adds support for writing ORC primitive types and one-dimensional arrays.
- Introduces a new configuration property named `pxf.orc.write.timezone.utc` to govern how PXF writes ORC timestamp values to the external data store. By default, PXF writes timestamp values using the UTC time zone.
- Adds support for using a `PreparedStatement` when reading with the JDBC Connector, using the `jdbc.read.prepared-statement` property in `jdbc-site.xml`.
- Updates the `aws-java-sdk-s3` dependency to version 1.12.261 to resolve CVE-2022-31159.
- Updates the `snappy-java` dependency to version 1.1.8.4.
- Updates the `postgresql` dependency to version 42.4.1 to resolve CVE-2022-31197.
Release 6.3.2
Release Date: July 21, 2022
Changed Features
PXF 6.3.2 includes these changes:
- Improves the error messages returned when PXF encounters `UnsupportedOperationException`s.
- Adds data buffer boundary checks to the PXF extension to guard against invalid reads.
- Updates Spring to version 2.5.12 to resolve CVE-2022-22965. For more information about this vulnerability, including impacted product suites and release lines, please refer to VMSA-2022-0010.
- Updates the bundled Log4j library to version 2.17.2 (pulled in with the Spring update to version 2.5.12).
- Updates Hadoop libraries to version 2.10.2 to resolve CVE-2021-37404.
- Updates the bundled version of the ORC library to version 1.6.13 to obtain the fix for ORC-1065.
Resolved Issues
PXF 6.3.2 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2022‑22965 | Updates Spring to version 2.5.12. (Resolved by PR-789.) |
CVE‑2021‑37404 | Updates Hadoop to version 2.10.2. (Resolved by PR-819.) |
32264 | Resolves an ORC split generation failed error encountered when the HiveORC profile was used to read an ORC file by updating the bundled ORC library to version 1.6.13 to pull in the fix for ORC-1065. (Resolved by PR-815.) |
Release 6.3.1
Release Date: April 27, 2022
Resolved Issues
PXF 6.3.1 resolves these issues:
Issue # | Summary |
---|---|
32177 | Resolves an issue where PXF returned a NullPointerException while reading from a Hive table when the hive:orc profile and the VECTORIZE=true option were specified, and some of the table data contained repeating values. (Resolved by PR-794.) |
32149 | Resolves an issue where the PXF post-installation script failed when the PXF rpm was installed with the --prefix option (install to a custom location). (Resolved by PR-788.) |
Release 6.3.0
Release Date: March 18, 2022
New and Changed Features
PXF 6.3.0 includes these new and changed features:
- Distributes a Broadcom Support Portal download package for Greenplum 6 for Red Hat Enterprise Linux 64-bit 8.x.
- Bundles version 42.3.3 of the `postgresql` JDBC JAR file to mitigate CVE-2022-21724.
- Introduces support for reading the `date`, `decimal`, `local-timestamp-millis`, `local-timestamp-micros`, `time-millis`, `time-micros`, `timestamp-millis`, `timestamp-micros`, and `uuid` Avro logical types.
- Supports upgrading when you use `gpupgrade` to upgrade from Greenplum 5 to Greenplum 6. The PXF package now includes two new scripts, `pxf-pre-gpupgrade` and `pxf-post-gpupgrade`, that you use during this upgrade process.
- Supports Kerberos constrained delegation (also known as S4U2proxy) for accessing HDFS and Hive. This feature allows you to configure the PXF service principal in Microsoft Active Directory or a Red Hat IPA Server, and to direct the service to act as a delegate for obtaining Kerberos tickets from end users to the HDFS namenode service principal. With Kerberos constrained delegation, configuring the PXF service principal to be a Hadoop proxy user is no longer required to access a Kerberos-secured Hadoop cluster. This feature is deactivated by default. PXF introduces a new configuration property named `pxf.service.kerberos.constrained-delegation` to activate this feature.
Resolved Issues
PXF 6.3.0 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2022‑21724 | Updates the postgresql JDBC JAR file to version 42.3.3. (Resolved by PR-760.) |
31992 | Resolves an issue where PXF returned duplicate rows when the hdfs:json profile was used to read a JSON file with multi-line records and the data contained multi-byte characters. (Resolved by PR-738.) |
31112 | Resolves an issue where PXF required that its service principal be configured as a Hadoop proxy user to access a Kerberos-secured Hadoop cluster. (Resolved by PR-707.) |
N/A | Resolves an issue where PXF did not close Hive Metastore connections in a timely manner, which eventually resulted in the exhaustion of the Metastore connection pool. (Resolved by PR-756.) |
Release 6.2.3
Release Date: February 1, 2022
Changed Features
PXF 6.2.3 includes these changes:
- PXF bundles version 2.17.1 of the `log4j2` library to mitigate CVE-2021-44832.
- PXF updates the version of `go` that it uses to build the `pxf` CLI tool to version 1.17.6 to mitigate CVE-2021-44716.
- PXF now writes early startup messages that were previously directed to `stdout/stderr` and ignored to the file `$PXF_LOG_DIR/pxf_app.out`.
- PXF introduces a performance improvement when it iterates over a list of fragments.
Resolved Issues
PXF 6.2.3 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2021‑44832 | Updates the bundled log4j2 library to version 2.17.1. (Resolved by PR-735.) |
CVE‑2021‑44716 | Updates the go library to version 1.17.6. (Resolved by PR-740.) |
Release 6.2.2
Release Date: December 22, 2021
Changed Features
PXF 6.2.2 includes these changes:
- PXF bundles version 2.17.0 of the `log4j2` library to mitigate CVE-2021-45105.
- PXF downgrades the bundled version of Spring Boot to resolve issue 31927.
Resolved Issues
PXF 6.2.2 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2021‑45105 | Updates the bundled log4j2 library to version 2.17.0. (Resolved by PR-733.) |
31927 | Resolves an issue where the PXF C extension reported a partial file transfer error when a data-less response that the PXF server sent to Greenplum Database failed to include a zero-length chunk. PXF 6.2.2 downgrades the bundled version of Spring Boot to 2.4.3, which does not exhibit the error behavior. (Resolved by PR-732.) |
Release 6.2.1
Release Date: December 17, 2021
Changed Features
PXF 6.2.1 includes these changes:
- PXF bundles version 2.16.0 of the `log4j2` library to mitigate CVE-2021-44228 and CVE-2021-45046.
- PXF now returns an `UnsupportedOperationException` when it accesses a Hive transactional table.
- PXF now supports the `SKIP_HEADER_COUNT` option for external tables that specify a `*:text:multi` profile.
- When reading from a MySQL database, PXF now uses a `jdbc.statement.fetchSize` default value of `-2147483648` (`Integer.MIN_VALUE`). This setting enables the MySQL JDBC driver to stream the results from a MySQL server, lessening the memory requirements when reading large data sets.
- The PXF Hive connector now uses the `hive-site.xml` `hive.metastore.failure.retries` property setting to identify the maximum number of times to retry a failed connection to the Hive MetaStore. The default value is one retry. Addressing Hive MetaStore Connection Errors describes when and how to configure this property.
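The MySQL fetch size default is Java's `Integer.MIN_VALUE`, the smallest 32-bit signed integer. The Python sketch below shows that value and one way a per-database default might be chosen. The function name `default_fetch_size`, the URL-prefix check, and the fallback value of 1000 are assumptions of this example and do not describe how PXF detects MySQL or what it uses for other databases.

```python
# Java's Integer.MIN_VALUE: the smallest 32-bit signed integer.
JAVA_INTEGER_MIN_VALUE = -(2 ** 31)


def default_fetch_size(jdbc_url, fallback=1000):
    """Pick a default fetch size from a JDBC URL (hypothetical logic).

    For MySQL the special value asks the driver to stream rows instead of
    buffering the whole result set; `fallback` is an illustrative placeholder.
    """
    if jdbc_url.startswith("jdbc:mysql:"):
        return JAVA_INTEGER_MIN_VALUE
    return fallback
```

An explicit `jdbc.statement.fetchSize` setting would take the place of whatever default applies.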
Resolved Issues
PXF 6.2.1 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2021‑45046 | Updates the bundled log4j2 library to version 2.16.0. (Resolved by PR-727.) |
CVE‑2021‑44228 | Updates the bundled log4j2 library to version 2.15.0. (Resolved by PR-723.) |
31955 | Resolves an issue where PXF failed to access a Hive table due to a MetaStore connection issue. PXF now includes retry logic for the MetaStore connection based on the hive.metastore.failure.retries property setting in the hive-site.xml file. (Resolved by PR‑726.) |
31948 | Resolves an issue where PXF ran out of memory when it read a large data set from a MySQL database. PXF now uses a jdbc.statement.fetchSize default value of -2147483648 (Integer.MIN_VALUE) when it accesses MySQL, which streams the results from a MySQL server to PXF. (Resolved by PR‑721.) |
31906 | Resolves an issue where PXF returned 0 rows when a query was performed on a Hive transactional table instead of reporting that transactional tables are unsupported. PXF now more clearly identifies the problem by returning an UnsupportedOperationException and the error: PXF does not support Hive transactional tables. (Resolved by PR-719.) |
31791 | Resolves an issue where PXF ignored the SKIP_HEADER_COUNT custom option when it read from an external data source via an external table that specified a *:text:multi profile. PXF now recognizes and implements this option for *:text:multi profiles. (Resolved by PR-710.) |
Release 6.2.0
Release Date: September 13, 2021
New and Changed Features
PXF 6.2.0 includes these new and changed features:
- PXF adds support for reading a JSON array into a Greenplum Database text array (`TEXT[]`). Refer to Working with JSON Data for additional information.
- PXF adds support for reading lists of certain ORC scalar types into a Greenplum Database array of native type. Refer to the PXF ORC data type mapping documentation for more information about the data type mapping.
- PXF bundles newer versions of ORC, Spring Boot, and other dependent libraries.
- PXF improves its message logging by:
    - Better aligning the log message text.
    - Also logging the affected fragment when it encounters a read error.
- PXF introduces a new property to the pxf-site.xml per-server configuration file. PXF uses this property, `pxf.sasl.connection.retries`, to specify the maximum number of times that it retries a SASL connection request to an external data source after a refused connection returns a `GSS initiate failed` error.
- PXF introduces a new PXF Service application property, `pxf.fragmenter-cache.expiration`, to specify the amount of time after which an entry expires and is removed from the fragment cache.
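A bounded retry such as the one `pxf.sasl.connection.retries` configures can be sketched as follows. This is a generic illustration, not PXF code: the exception class `GssInitiateFailed` and the function `connect_with_retries` are hypothetical, and the sketch assumes that the limit counts retries made after the first failed attempt.

```python
class GssInitiateFailed(Exception):
    """Hypothetical stand-in for a refused connection reporting 'GSS initiate failed'."""


def connect_with_retries(connect, max_retries):
    """Call connect(); on GssInitiateFailed, retry up to max_retries more times.

    Any other exception propagates immediately without a retry.
    """
    retries_used = 0
    while True:
        try:
            return connect()
        except GssInitiateFailed:
            if retries_used >= max_retries:
                raise  # retry budget exhausted; surface the last failure
            retries_used += 1
```

With a limit of 2, a connection that fails twice and then succeeds is returned on the third attempt; with a limit of 1, the second failure is raised to the caller.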
Resolved Issues
PXF 6.2.0 resolves these issues:
Issue # | Summary |
---|---|
N/A | Resolves an issue when using the jdbc profile to write data to a Hive table. The Hive JDBC driver always returned 0 when running an update, and PXF would return an error even if the INSERT ran correctly. (Resolved by PR-662.) |
31675 | Resolves a fragment cache issue that appeared when an external table was re-created within the same transaction in a stored procedure, and the new external table referenced a different LOCATION. (Resolved by PR-691.) |
31657 | Queries on an external table intermittently failed in some Kerberos-secured environments because the Hadoop NameNode erroneously detected a replay attack during Kerberos authentication. This issue is resolved by PR-688. |
31571 | PXF did not support ORC lists. PXF 6.2.0 includes support for reading lists of certain ORC scalar types into a Greenplum Database array of native type. (Resolved by PR-675.) |
31326 | PXF did not support reading a JSON array into a Greenplum Database array-type column. PXF 6.2.0 includes support for reading a JSON array into a text array (TEXT[]). (Resolved by PR-646.) |
683 | Resolves an issue where PXF incorrectly cast an enum value from the external data source to a string. (Resolved by PR-696.) |
Release 6.1.0
Release Date: June 24, 2021
New and Changed Features
PXF 6.1.0 includes these new and changed features:
- PXF now natively supports reading and writing Avro arrays.
- PXF adds support for reading JSON objects, such as embedded arrays, as `text`. The data returned by PXF is a valid JSON string that you can manipulate with the existing Greenplum Database JSON functions and operators.
- PXF improves its error reporting by displaying the exception class when there is no error message available.
- PXF introduces a new property that you can use to configure the connection timeout for data upload/write operations to an external datastore. This property is named `pxf.connection.upload-timeout`, and is located in the pxf-application.properties file.
- PXF now uses the `pxf.connection.timeout` configuration property to set the connection timeout only for read operations. If you previously set this property to specify the write timeout, you should now use `pxf.connection.upload-timeout` instead.
- PXF bundles a newer `gp-common-go-libs` supporting library along with its dependencies.
Resolved Issues
PXF 6.1.0 resolves these issues:
Issue # | Summary |
---|---|
31389 | Resolves an issue where certain pxf cluster commands returned the error connect: no such file or directory when the current working directory contained a directory with the same name as the hostname. This issue was resolved by upgrading a dependent library. (Resolved by PR-633.) |
31317 | PXF did not support writing Avro arrays. PXF 6.1.0 includes native support for reading and writing Avro arrays. (Resolved by PR-636.) |
Release 6.0.1
Release Date: May 11, 2021
Resolved Issues
PXF 6.0.1 resolves these issues:
Issue # | Summary |
---|---|
N/A | Resolves an issue where PXF returned wrong results for batches of ORC data that were shorter than the default batch size. (Resolved by PR-630.) |
N/A | Resolves an issue where PXF threw a NullPointerException when it encountered a repeating ORC column value of type string. (Resolved by PR-627.) |
178013439 | Resolves an issue where using the profile HiveVectorizedORC did not result in vectorized execution. (Resolved by PR-624.) |
31409 | Resolves an issue where PXF intermittently failed with the error ERROR: PXF server error(500) : Failed to initialize HiveResolver when it accessed Hive tables STORED AS ORC. (Resolved by PR-626.) |
Release 6.0.0
Release Date: March 29, 2021
New and Changed Features
PXF 6.0.0 includes these new and changed features:
Architecture and Bundled Libraries
- PXF 6.0.0 is built on the Spring Boot framework:
    - PXF distributes a single JAR file that includes all of its dependencies.
    - PXF no longer installs and uses a standalone Tomcat server; it uses the Tomcat version 9.0.43 embedded in the PXF Spring Boot application.
- PXF bundles the `postgresql-42.2.14.jar` PostgreSQL driver JAR file.
- PXF library dependencies have changed with new, updated, and removed libraries.
- The PXF API has changed. If you are upgrading from PXF 5.x, you must update the PXF extension in each database in which it is registered as described in Upgrading from PXF 5.
- PXF 6 moves fragment allocation from its C extension to the PXF Service running on each segment host.
- The PXF Service now also runs on the Greenplum Database master and standby master hosts. If you used PXF 5.x to access Kerberos-secured HDFS, you must now generate principals and keytabs for the master and standby master as described in Upgrading from PXF 5.
Files, Configuration, and Commands
- PXF 6 uses the `$PXF_BASE` environment variable to identify its runtime configuration directory; it no longer uses `$PXF_CONF` for this purpose.
- By default, PXF installs its executables and runtime configuration into the same directory, `$PXF_HOME`, and `PXF_BASE=$PXF_HOME`. See About the PXF Installation and Configuration Directories for the new installation file layout.
- You can relocate the `$PXF_BASE` runtime configuration directory to a different directory after you install PXF by running the new `pxf [cluster] prepare` command as described in Relocating $PXF_BASE.
- PXF template server configuration files now reside in `$PXF_HOME/templates`; they were previously located in the `$PXF_CONF/templates` directory.
- The `pxf [cluster] register` command now copies only the PXF `pxf.control` extension file to the Greenplum Database installation. Run this command after your first installation of PXF, and/or after you upgrade your Greenplum Database installation.
- PXF 6 no longer requires initialization, and deprecates the `init` and `reset` commands. `pxf [cluster] init` is now equivalent to `pxf [cluster] register`, and `pxf [cluster] reset` is a no-op.
- PXF 6 includes new and changed configuration; see About the PXF Configuration Files for more information:
  - PXF 6 integrates with Apache Log4j 2; the PXF logging configuration file is now named `pxf-log4j2.xml`, and is in `xml` format.
  - PXF 6 adds a new configuration file for the PXF server application, `pxf-application.properties`; this file includes:
    - New properties to configure the PXF streaming thread pool.
    - New `pxf.log.level` property to set the PXF logging level.
    - Configuration properties moved from the PXF 5 `pxf-env.sh` file and renamed:

      pxf-env.sh Property Name | pxf-application.properties Property Name |
      ---|---|
      PXF_MAX_THREADS | pxf.max.threads |

  - PXF 6 adds new configuration environment variables to `pxf-env.sh` to simplify the registration of external library dependencies:

    New Property Name | Description |
    ---|---|
    PXF_LOADER_PATH | Additional directories and JARs for PXF to class-load. |
    LD_LIBRARY_PATH | Additional directories and native libraries for PXF to load. |

    See Registering PXF Library Dependencies for more information.
  - PXF 6 deprecates the `PXF_FRAGMENTER_CACHE` configuration property; fragment metadata caching is no longer configurable and is now always activated.
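
As an illustration of the property rename listed above, the following minimal sketch scans the text of a PXF 5 `pxf-env.sh` file and emits the equivalent `pxf-application.properties` lines. The function name `migrate_env_settings` and the parsing rules are assumptions made for this example; they are not part of PXF. The sketch handles simple `NAME=value` and `export NAME=value` lines only and does not strip trailing inline comments.

```python
import re

# Rename taken from the table above:
# PXF 5 pxf-env.sh name -> PXF 6 pxf-application.properties name.
RENAMED_PROPERTIES = {"PXF_MAX_THREADS": "pxf.max.threads"}

# Matches simple assignments such as `export PXF_MAX_THREADS=200`
# or `PXF_MAX_THREADS="200"`. Commented-out lines do not match.
_ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*?)\s*$")


def migrate_env_settings(pxf_env_text):
    """Return pxf-application.properties lines for renamed pxf-env.sh settings.

    Hypothetical helper for illustration; PXF does not ship this function.
    """
    migrated = []
    for line in pxf_env_text.splitlines():
        match = _ASSIGNMENT.match(line)
        if not match:
            continue  # comments, blank lines, anything that is not an assignment
        name = match.group(1)
        value = match.group(2).strip("\"'")  # drop surrounding quotes, if any
        if name in RENAMED_PROPERTIES:
            migrated.append(f"{RENAMED_PROPERTIES[name]}={value}")
    return migrated
```

For example, passing the text `export PXF_MAX_THREADS=200` returns a single line, `pxf.max.threads=200`, which you could then review and add to `pxf-application.properties` by hand.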
Profiles
- PXF 6 introduces new profile names and deprecates some older profile names. The old profile names still work, but you are strongly encouraged to switch to the new profile names:

  New Profile Name | Old/Deprecated Profile Name |
  ---|---|
  hive | Hive |
  hive:rc | HiveRC |
  hive:orc | HiveORC |
  hive:orc | HiveVectorizedORC¹ |
  hive:text | HiveText |
  jdbc | Jdbc |
  hbase | HBase |

  ¹ To use the `HiveVectorizedORC` profile in PXF 6, specify the `hive:orc` profile name with the new `VECTORIZE=true` custom option.
- PXF adds support for natively reading an ORC file located in Hadoop, an object store, or a network file system. See the Hadoop ORC and Object Store ORC documentation for prerequisites and usage information.
- PXF adds support for reading and writing text data in comma-separated value form located in Hadoop, an object store, or a network file system through a separate `CSV` profile. See the Hadoop Text and Object Store Text documentation for usage information.
- PXF supports predicate pushdown on `VARCHAR` data types.
- PXF supports predicate pushdown for the `IN` operator when you specify one of the `*:parquet` profiles to read a Parquet file.
- PXF supports specifying a codec short name (alias) rather than the Java class name when you create a writable external table for a `*:text`, `*:csv`, or `*:SequenceFile` profile that includes a `COMPRESSION_CODEC`.
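
The profile renames above lend themselves to a mechanical rewrite of existing external table `LOCATION` URLs. The sketch below is one possible approach, not a PXF utility: the function name `modernize_location` is hypothetical, the example URLs are invented, and the code assumes the profile is passed as a `PROFILE=<name>` query parameter spelled exactly as in the table.

```python
import re

# Deprecated -> new profile names, copied from the table above.
NEW_PROFILE_NAMES = {
    "Hive": "hive",
    "HiveRC": "hive:rc",
    "HiveORC": "hive:orc",
    "HiveVectorizedORC": "hive:orc",
    "HiveText": "hive:text",
    "Jdbc": "jdbc",
    "HBase": "hbase",
}

# Finds the PROFILE parameter that follows `?` or `&` in the URL.
_PROFILE = re.compile(r"(?<=[?&])PROFILE=([^&]+)")


def modernize_location(location):
    """Rewrite a deprecated profile name in a LOCATION URL (illustrative only)."""
    match = _PROFILE.search(location)
    if not match or match.group(1) not in NEW_PROFILE_NAMES:
        return location  # nothing to change
    old_name = match.group(1)
    updated = (
        location[: match.start(1)]
        + NEW_PROFILE_NAMES[old_name]
        + location[match.end(1):]
    )
    # Per the footnote, vectorized ORC reads now need an explicit custom option.
    if old_name == "HiveVectorizedORC" and "VECTORIZE=" not in updated:
        updated += "&VECTORIZE=true"
    return updated
```

With a hypothetical URL such as `pxf://default.sales?PROFILE=HiveVectorizedORC`, the function returns `pxf://default.sales?PROFILE=hive:orc&VECTORIZE=true`. URLs that already use a new profile name are returned unchanged.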
Monitoring
- PXF now supports monitoring of the PXF Service process at runtime. Refer to About PXF Service Runtime Monitoring for more information.
Logging
- PXF improves the display of error messages in the `psql` client, in some cases including a `HINT` that provides possible error resolution actions.
- When PXF is configured to auto-terminate on detection of an out of memory condition, it now logs messages to `$PXF_LOGDIR/pxf-oom.log` rather than `catalina.out`.
Removed Features
PXF version 6.0.0 removes:
- The `THREAD-SAFE` external table custom option (deprecated since 5.10.0).
- The `PXF_USER_IMPERSONATION`, `PXF_PRINCIPAL`, and `PXF_KEYTAB` configuration properties in `pxf-env.sh` (deprecated since 5.10.0).
- The `jdbc.user.impersonation` configuration property in `jdbc-site.xml` (deprecated since 5.10.0).
- The Hadoop profile names `HdfsTextSimple`, `HdfsTextMulti`, `Avro`, `Json`, `Parquet`, and `SequenceWritable` (deprecated since 5.0.1).
Resolved Issues
PXF 6.0.0 resolves these issues:
Issue # | Summary |
---|---|
30987 | Resolves an issue where PXF returned an out of memory error while running a query on a Hive table backed by a large number of files when it could not enlarge a string buffer during the fragmentation process. PXF 6.0.0 moves fragment distribution logic and fragment allocation to the PXF Service running on each segment host. |
Deprecated Features
Deprecated features may be removed in a future major release of PXF. PXF version 6.x deprecates:
- The `DATA-SCHEMA` external table option (deprecated since PXF version 6.6.0).
- The `PXF_FRAGMENTER_CACHE` configuration property (deprecated since PXF version 6.0.0).
- The `pxf [cluster] init` commands (deprecated since PXF version 6.0.0).
- The `pxf [cluster] reset` commands (deprecated since PXF version 6.0.0).
- The Hive profile names `Hive`, `HiveText`, `HiveRC`, `HiveORC`, and `HiveVectorizedORC` (deprecated since PXF version 6.0.0). Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for the new profile names.
- The `HBase` profile name (now `hbase`) (deprecated since PXF version 6.0.0).
- The `Jdbc` profile name (now `jdbc`) (deprecated since PXF version 6.0.0).
- Specifying a `COMPRESSION_CODEC` using the Java class name; use the codec short name instead.
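
To illustrate the last item, the sketch below replaces a Java codec class name in a `LOCATION` URL with a short name. The class names are standard Hadoop codec classes, but the alias spellings (`default`, `gzip`, `bzip2`) are assumptions for this example; confirm the aliases your PXF version accepts in the profile documentation before relying on them. The function name `shorten_codec` is hypothetical.

```python
# Hadoop codec class name -> assumed PXF short name (alias).
# Verify these aliases against the PXF documentation for your profile.
CODEC_SHORT_NAMES = {
    "org.apache.hadoop.io.compress.DefaultCodec": "default",
    "org.apache.hadoop.io.compress.GzipCodec": "gzip",
    "org.apache.hadoop.io.compress.BZip2Codec": "bzip2",
}


def shorten_codec(location):
    """Replace a COMPRESSION_CODEC class name with its short name (illustrative)."""
    for class_name, alias in CODEC_SHORT_NAMES.items():
        location = location.replace(
            f"COMPRESSION_CODEC={class_name}",
            f"COMPRESSION_CODEC={alias}",
        )
    return location
```

URLs that already use a short name, or that name a codec class not in the mapping, pass through unchanged.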
Known Issues and Limitations
PXF 6.x has these known issues and limitations:
Issue # | Description |
---|---|
178013439 | (Resolved in 6.0.1) Using the deprecated `HiveVectorizedORC` profile does not result in vectorized execution. Workaround: Use the `hive:orc` profile with the option `VECTORIZE=true`. |
31409 | (Resolved in 6.0.1) PXF can intermittently fail with the following error when it accesses Hive tables `STORED AS ORC`: `ERROR: PXF server error(500) : Failed to initialize HiveResolver`. Workaround: Use vectorized query execution by adding the `VECTORIZE=true` custom option to the `LOCATION` URL. (Note that PXF does not support predicate pushdown, complex types, and the timestamp data type with ORC vectorized execution.) |
168957894 | The PXF Hive Connector does not support using the `hive[:*]` profiles to access Hive 3 managed (CRUD and insert-only transactional, and temporary) tables. Workaround: Use the PXF JDBC Connector to access Hive 3 managed tables. |