This document contains release information about VMware Tanzu Greenplum Streaming Server version 1.x releases. The Tanzu Greenplum Streaming Server (GPSS) is included in certain VMware Tanzu Greenplum 5.x, 6.x, and 7.x distributions. GPSS is also updated and distributed independently of VMware Greenplum. You may need to download and install the GPSS distribution from the Broadcom Support Portal to obtain the most recent version of this component.
Supported Platforms
VMware Greenplum Streaming Server 1.x is compatible with these Operating System and VMware Greenplum versions:
GPSS Version | OS Version | VMware Greenplum Version |
---|---|---|
all | RHEL 6.x, CentOS 6.x, RHEL 7.x, CentOS 7.x | 5.17.0+ (up to GPSS 1.10.4), 6.x |
1.6.0+ | Ubuntu 18.04 LTS | 6.x |
1.7.0+ | OEL 7.x, RHEL 8.x | 6.x |
1.7.0 to 1.10.4 | Photon 3 | 6.x |
1.10.3+ | RHEL 8.7+, Rocky Linux 8.7+, OEL 8.7+ using Red Hat Compatible Kernel (RHCK) | 7.x |
1.10.4+ | RHEL 9, Rocky Linux 9, OEL 9.x using Red Hat Compatible Kernel (RHCK) | 6.x, 7.x |
Starting with version 1.11, Greenplum Streaming Server no longer supports VMware Greenplum 5.x or Photon 3.
Release 1.11
Release 1.11.4
Release Date: January 09, 2025
Greenplum Streaming Server 1.11.4 includes resolved issues.
Resolved Issues
Greenplum Streaming Server 1.11.4 resolves these issues:
- N/A
- Resolves an issue where greenplum_fdw used the backend version of libpq. greenplum_fdw now links statically to the frontend version of libpq, so connections initiated by Greenplum FDW are correctly identified as remote.
- N/A
- Resolves an authentication failure in gp2gp when the local and remote GPDB clusters have the same hostname and port. GPSS now uses IP addresses instead of hostnames to connect to endpoints on the remote GPDB segments.
- N/A
- Resolves unexpected output from the dryrun command when handling SQL queries that contain the % character.
- N/A
- Resolves a "relation doesn't exist" error by automatically creating the history table during gpsscli dryrun.
- N/A
- Resolves a mismatch between GPSS Prometheus metrics and the actual job status after a job is restored.
Release 1.11.3
Release Date: August 05, 2024
Greenplum Streaming Server 1.11.3 includes changes and resolves issues.
Changed Features
Greenplum Streaming Server 1.11.3 includes these changes:
Greenplum Streaming Server introduces an optimized progress log policy that creates one log per job per day with daily rotation.
The gpsscli progress command now displays progress information inline and adds a --scrolling option to keep the previous output format.
Greenplum Streaming Server adds a space between columns in gpsscli list output for better readability.
Greenplum Streaming Server improves version verification by ignoring patch version differences when checking the gpss executable and related extensions.
Greenplum Streaming Server now supports RabbitMQ jobs with SSL in stream mode, with the following updates:
- Upgraded rabbitmq-stream-go-client to version 1.3.0.
- Upgraded rabbitmq-server to 3.12.10-1 in Docker images.
- Updated SourceReader to use the new SetSaslConfiguration for passwordless SSL authentication.
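The patch-insensitive version check can be pictured with a short Python sketch. It assumes plain major.minor.patch version strings. The function names are hypothetical, and this is not GPSS's actual verification code.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a "major.minor.patch" string into integers (hypothetical helper)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def versions_compatible(executable: str, extension: str) -> bool:
    """Treat two versions as compatible when only the patch component differs."""
    return parse_version(executable)[:2] == parse_version(extension)[:2]
```

For example, an executable at 1.11.3 passes the check against an extension at 1.11.0 but fails it against 1.10.4.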
Resolved Issues
Greenplum Streaming Server 1.11.3 resolves these issues:
- 35370903
- Resolves a panic issue triggered by closing the Kafka consumer multiple times.
- N/A
- Resolves an issue where stopping a job manually was logged as an ERROR instead of INFO.
Release 1.11.2
Release Date: May 17, 2024
Greenplum Streaming Server 1.11.2 resolves a single issue.
Resolved Issues
Greenplum Streaming Server 1.11.2 resolves a single issue:
- N/A
- Resolves an issue where setting the COMMIT.MINIMAL_INTERVAL YAML configuration parameter when working with RabbitMQ data could result in data loss.
Release 1.11.1
Release Date: May 03, 2024
Greenplum Streaming Server 1.11.1 resolves a single issue.
Resolved Issues
Greenplum Streaming Server 1.11.1 resolves a single issue:
- 378679
- Resolves an issue where, when using Streaming Server Monitor from Greenplum Command Center, Streaming Server routines experienced memory leaks.
Release 1.11.0
Release Date: January 8, 2024
Greenplum Streaming Server 1.11.0 adds new features, includes changes, and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.11.0.
New and Changed Features
Greenplum Streaming Server 1.11.0 includes these new and changed features:
- Greenplum Streaming Server now supports unloading data from your VMware Greenplum to a file. See Unloading File Data from Greenplum for more details.
- The gpsscli utility introduces a new --daemon option to run Greenplum Streaming Server as a daemon. See the gpsscli reference page for more details.
- Greenplum Streaming Server now supports loading RabbitMQ data to multiple VMware Greenplum tables.
- You can now find the Greenplum Streaming Server job name in the application_name column of the pg_stat_activity system view in VMware Greenplum.
- Greenplum Streaming Server creates the directory $GPHOME/docs/cli_help/gpss during installation. This directory provides a quick start guide with useful information and examples for setting up load jobs, along with sample configuration files.
- Version 3 of the YAML configuration file is no longer Beta; it is promoted to a supported feature.
- The max_restart_times option in the YAML configuration file now accepts a new value, -1, which restarts the job indefinitely. You can use gpsscli stop to stop a job that is configured to restart indefinitely.
- Greenplum Streaming Server improves the performance of MERGE mode for heap tables by leveraging UPSERT, which was introduced in VMware Greenplum 7.0. Expect performance improvements if you are using GPSS with VMware Greenplum 7.0 or later.
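The following Python sketch gives a rough illustration of the new -1 value by deciding whether a stopped job should be restarted. The function and parameter names are hypothetical. Treating 0 as "never restart" is an assumption, not documented behavior.

```python
UNLIMITED_RESTARTS = -1  # the value described in the GPSS 1.11.0 notes


def should_restart(restarts_so_far: int, max_restarts: int) -> bool:
    """Return True if a stopped job should be restarted again.

    Hypothetical sketch: -1 means "restart indefinitely"; any other value is
    treated as an upper bound on the number of restarts (so 0 never restarts).
    """
    if max_restarts == UNLIMITED_RESTARTS:
        return True
    return restarts_so_far < max_restarts
```

In this sketch, a job configured with -1 keeps restarting until it is stopped manually, for example with gpsscli stop.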
Resolved Issues
Greenplum Streaming Server 1.11.0 resolves these issues:
- 33146
- Fixes a memory leak when loading Kafka data and running gpsscli monitor.
- N/A
- Resolves an issue where jobs with multiple output mappings errored out with "ParseConfigContent failed: yaml: unmarshal errors: line 77: cannot unmarshal !!map into []shared.ColumnMap".
- N/A
- Resolves an issue where the alert parameter in the YAML configuration file did not take effect for File and RabbitMQ jobs.
- N/A
- Resolves an issue where, if a job failed to start and was configured to retry, the retry caused GPSS to panic with the error "runtime error: invalid memory address or nil pointer dereference".
Release 1.10
VMware Greenplum Streaming Server version 1.10.x is the last version that supports VMware Greenplum 5.x.
Release 1.10.4
Release Date: November 1, 2023
Greenplum Streaming Server 1.10.4 includes changes and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.10.4.
Changed Features
Greenplum Streaming Server 1.10.4 includes these changes:
- Version 1.10.4 adds support for Red Hat Enterprise Linux 64-bit 9, Oracle Linux 64-bit 9 using the Red Hat Compatible Kernel (RHCK), and Rocky Linux 9 for VMware Greenplum version 6.x and 7.x.
- To alleviate possible data skew, GPSS changes the distribution key that it uses for its Kafka history tables.
Resolved Issues
Greenplum Streaming Server 1.10.4 resolves this issue:
- 33098
- Resolves an issue where GPSS lost retry information for jobs that were manually stopped and then restarted. As a result, GPSS returned the warning "retry job <jobname> is disabled, stop schedule" and exited the job when a primary VMware Greenplum segment went down. GPSS now explicitly retains the retry configuration for manually stopped jobs, enabling it to better tolerate a segment failure and mirror switchover.
Release 1.10.3
Release Date: September 21, 2023
Greenplum Streaming Server 1.10.3 includes changes and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.10.3.
Changed Features
Greenplum Streaming Server 1.10.3 includes these changes:
- The per-run, job-specific server log file now includes the YAML configuration used to submit the job and the job status at completion. You can find the YAML configuration in a log message prefaced with "start job". The job status log message is prefaced with "job finished".
- Version 1.10.3 adds support for Red Hat Enterprise Linux 64-bit 8.7+, Oracle Linux 64-bit 8.7+ using the Red Hat Compatible Kernel (RHCK), and Rocky Linux 8.7+ for VMware Greenplum version 7.0.0.
- Shadowed passwords are now supported for LDAP user accounts.
Resolved Issues
Greenplum Streaming Server 1.10.3 resolves these issues:
- 33015
- Resolves an issue where GPSS returned a "Resource temporarily unavailable" error due to a resource leak that occurred when it repeatedly retried a Kafka job that consumed illegal JSON. GPSS now ensures that it releases all connections to Kafka when it detects an offset gap.
- 32935
- The per-run, job-specific server log file did not include enough information about the job. GPSS version 1.10.3 adds the job YAML configuration and the job status at completion to the log file.
Release 1.10.2
Release Date: July 27, 2023
Greenplum Streaming Server 1.10.2 resolves an issue.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.10.2.
Resolved Issues
Greenplum Streaming Server 1.10.2 resolves this issue:
- 32960
- Resolves an issue where GPSS returned a "value out of range" error when the object identifier of the target VMware Greenplum table was larger than 2^32. GPSS now checks for the existence of the target table rather than attempting to access the table's object identifier.
Release 1.10.1
Release Date: June 9, 2023
Greenplum Streaming Server 1.10.1 includes changes and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.10.1.
Changed Features
Greenplum Streaming Server 1.10.1 includes these changes:
- The per-run, job-specific server log file name is changed from gpss-<jobname>_<timestamp>.log to gpss_<jobname>_<timestamp>.log.
- The user name and password are now optional components of the RabbitMQ SERVER and server (version 3 (Beta)) load configuration file properties.
- GPSS supports TLS encryption only when loading from a RabbitMQ queue. GPSS does not support TLS encryption when loading from a RabbitMQ stream.
- Version 1.10.1 adds support for Red Hat Enterprise Linux 64-bit 8.7+, Oracle Linux 64-bit 8.7+ using the Red Hat Compatible Kernel (RHCK), and Rocky Linux 8.7+ for VMware Greenplum version 7 Beta 4+.
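The renamed per-run log file can be sketched in Python as follows. The 1.10.0 notes say only that the timestamp specifies the day and time including milliseconds. The exact timestamp layout used here (YYYYMMDDhhmmss followed by three millisecond digits) is an assumption, as is the helper name.

```python
from datetime import datetime


def per_run_log_name(job_name: str, started: datetime) -> str:
    """Build a per-run log file name in the gpss_<jobname>_<timestamp>.log form.

    Hypothetical helper; the timestamp layout is an assumption, not the
    format GPSS is documented to use.
    """
    millis = started.microsecond // 1000
    stamp = f"{started:%Y%m%d%H%M%S}{millis:03d}"
    return f"gpss_{job_name}_{stamp}.log"
```

Only the separators reflect the documented change: an underscore now follows gpss where a hyphen used to be.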
Resolved Issues
Greenplum Streaming Server 1.10.1 resolves these issues:
- N/A
- Resolves an issue where, when loading from RabbitMQ, GPSS returned a vague error when the VMware Greenplum table specified in the load configuration file did not exist. The message now more accurately reflects the error condition.
- N/A
- Resolves an issue where GPSS did not direct certain log messages to the appropriate per-run, job-specific server log file. GPSS now correctly routes these messages.
- N/A
- Resolves an issue where certain messages in the per-run, job-specific server log file were missing the job identifier. These log messages now include the job id.
Release 1.10.0
Release Date: May 15, 2023
Greenplum Streaming Server 1.10.0 adds new features and includes changes.
This version of the VMware Greenplum Streaming Server documentation replaces the term master with the term coordinator.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.10.0.
New and Changed Features
Greenplum Streaming Server 1.10.0 includes these new and changed features:
GPSS updates the go library dependency to version 1.19.1.
GPSS introduces support for TLS encryption to RabbitMQ. Refer to Configuring gpss for TLS-Encrypted Communications with RabbitMQ for more information.
GPSS v1.10.0 includes these logging-related changes and new features:
- Version 1.10.0 changes the naming format of GPSS server log files. Previous versions of the server log file name included a date. The new naming format replaces the date with a timestamp that specifies the day and time including milliseconds. Refer to Managing GPSS Log Files for more information. (Upgrade actions may be required as described in Upgrading the Streaming Server.)
- Log messages that GPSS writes to server log files now include the job identifier (truncated to 8 characters).
- You can direct GPSS to automatically rotate the server log file on an hourly or daily basis by setting the new Logging:Rotate property in the gpss.json server configuration file. See Configuring Automatic Server Log File Rotation for more information about this new feature.
- You can direct GPSS to create per-run server log files for each job by setting the new Logging:SplitByJob property in the gpss.json server configuration file. Refer to Configuring Per-Run Server Log Files for more information.
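The hourly and daily rotation options, and the 8-character job identifier in log messages, can be illustrated with a small Python sketch. The policy strings "hourly" and "daily" and the helper names are hypothetical. They are not the documented values of Logging:Rotate.

```python
from datetime import datetime


def rotation_period_start(ts: datetime, policy: str) -> datetime:
    """Return the start of the hourly or daily period that contains ts."""
    if policy == "hourly":
        return ts.replace(minute=0, second=0, microsecond=0)
    if policy == "daily":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown rotation policy: {policy!r}")


def needs_rotation(last_write: datetime, now: datetime, policy: str) -> bool:
    """Rotate when the next write falls into a new period."""
    return rotation_period_start(last_write, policy) != rotation_period_start(now, policy)


def short_job_id(job_id: str) -> str:
    """Truncate a job identifier to 8 characters for log messages."""
    return job_id[:8]
```

Comparing period starts, rather than elapsed time, means a file opened at 09:59 rotates at 10:00 under an hourly policy instead of at 10:59.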
The GPSS gRPC Batch Data API exposes a new ConnectionRequest message field named SessionTimeout that allows the developer to specify the maximum amount of idle time before GPSS releases a connection to VMware Greenplum. If you choose to make use of this feature in your GPSS client application, upgrade actions are required as described in Upgrading the Streaming Server.
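The idea behind SessionTimeout is to release a connection once it has been idle longer than a configured limit. The Python sketch below illustrates that idea. It is not the gRPC API itself; the class, its methods, and the choice to expire at exactly the timeout are hypothetical.

```python
import time
from typing import Callable


class IdleSessionTracker:
    """Track the last use of a connection and report when it has idled too long."""

    def __init__(self, session_timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.session_timeout_s = session_timeout_s
        self._clock = clock
        self._last_used = clock()

    def touch(self) -> None:
        """Record activity on the connection, resetting the idle timer."""
        self._last_used = self._clock()

    def expired(self) -> bool:
        """Return True once the connection has been idle for the full timeout."""
        return self._clock() - self._last_used >= self.session_timeout_s
```

Injecting the clock keeps the sketch testable without sleeping. A server would check expired() periodically and close the connection when it returns True.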
Release 1.9
Release Date: March 10, 2023
Greenplum Streaming Server 1.9.0 adds new features, includes changes, and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.9.0.
New and Changed Features
Greenplum Streaming Server 1.9.0 includes these new and changed features:
- GPSS now invokes the user-defined functions or SQL commands that you provide in TEARDOWN_SQL (version 2) or teardown_statement (version 3 (Beta)) on both job success and failure. Previously, these functions or commands were invoked only when the job succeeded.
- The gpsscli dryrun command now supports the --property <template_var>=<value> option. This allows you to use property template variables in the load configuration file that you provide to the command.
- You can optionally provide the --name <jobname> option to the gpsscli dryrun command to name the dry run job.
- GPSS introduces a new ENCODING (version 2) / encoding (version 3 (Beta)) property to the load configuration file that allows you to specify the character set encoding for source data in the csv, custom, delimited, or json formats.
- GPSS introduces a new FILTER (version 2) / filter (version 3 (Beta)) property to the load configuration file that allows you to specify an output filter for a job. An output filter may be useful when you want to write different data to multiple VMware Greenplum output tables.
- GPSS introduces a new ALERT (version 2) / alert (version 3 (Beta)) property block to the load configuration file that allows you to register for a job stopped notification by specifying a command that GPSS runs when a job is stopped.
- GPSS introduces a new TRANSFORMER (version 2) / transformer (version 3 (Beta)) property block to the load configuration file that allows you to specify input and/or output transform functions for the data. An input transformer is a go plugin; an output transformer is a user-defined SQL function (UDF). GPSS supports specifying transforms only when loading from Kafka or RabbitMQ data sources.
- GPSS now supports reading Kafka and RabbitMQ messages that contain multiple lines when the lines are included in only one of the key or value input data (not both).
- The GPSS RabbitMQ data source is no longer Beta; it is promoted to a supported feature.
- The GPSS RabbitMQ data source now supports strong consistency for streams. Refer to Understanding RabbitMQ Message Offset Management for more information about how GPSS manages RabbitMQ offsets and message consistency.
Resolved Issues
Greenplum Streaming Server 1.9.0 resolves these issues:
- 32640
- Resolves an issue where idle SELECT VERSION() queries consumed connection resources.
Release 1.8
Release 1.8.1
Release Date: December 21, 2022
Greenplum Streaming Server 1.8.1 includes changes and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.8.1.
Changed Features
Greenplum Streaming Server 1.8.1 includes these changes:
- GPSS now names the external table that it creates for an S3 load job with the s3ext prefix. The prefix was previously S3ext.
Resolved Issues
Greenplum Streaming Server 1.8.1 resolves these issues:
- 32584
- Resolves an issue where the Greenplum Streaming Server returned the error "pq: password authentication failed for user" when a load job specified no password, because GPSS did not clear the configuration of the previous job.
- 32522
- Resolves an issue where the Greenplum Streaming Server exposed the shadow password string in the logs. GPSS now obscures the password in the log file.
- 32498
- Resolves a resource leak issue where, when a Kafka job failed, the Greenplum Streaming Server did not close the Kafka metadata consumer.
- N/A
- Resolves an issue where the Greenplum Streaming Server calculated the job identifier hash incorrectly when the RabbitMQ load configuration file specified a queue source.
Release 1.8.0
Release Date: September 9, 2022
Greenplum Streaming Server 1.8.0 adds new features, includes changes, and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.8.0.
New and Changed Features
Greenplum Streaming Server 1.8.0 includes these new and changed features:
GPSS Configuration
The gpss.json server configuration file now includes a Gpfdist:Certificate:DBClientShared property. Use this boolean property to instruct GPSS to reuse the Gpfdist SSL certificate for the control channel (client) connection to VMware Greenplum. Configuring SSL for the Control Channel provides the relevant configuration information.
General
- When ReuseTables is set to false, GPSS now creates each job's external table using the job name rather than a hash. This makes it easier to track external tables per job. About External Table Naming and Lifecycle describes how GPSS names external tables, and also provides information about their lifecycle.
- GPSS introduces new scheduling options that allow you to configure automatic stop and restart conditions for jobs. You specify the RUNNING_DURATION, AUTO_STOP_RESTART_INTERVAL, MAX_RESTART_TIMES, and QUIT_AT_EOF_AFTER (version 2) or running_duration, auto_stop_restart_interval, max_restart_times, and quit_at_eof_after (version 3 (Beta)) options in the SCHEDULE/schedule block of the load configuration file.
- GPSS enhances the delimited data format to support setting quote and escape characters and an end-of-line prefix string when you use the format to load data into VMware Greenplum.
Kafka Data Source
- GPSS changes the name of the version 3 (Beta) load configuration file window property to task.
- GPSS now records in the progress log file the total number of rows that it processes in a Kafka message. When loading jsonl, delimited, and csv format data, where a Kafka message can include multiple rows, the total_rows_read field identifies the Kafka messages read, and the new total_rows field identifies the total number of rows inserted and rejected.
- The Kafka data source exposes a new metadata field named timestamp. This int64-type field identifies the time that a message was written to the Kafka log.
- When SAVE_FAILING_BATCH is true, GPSS records the time that a record was inserted into the backup table in a new column named gpss_save_timestamp. Refer to Redirecting Data to a Backup Table when GPSS Encounters Expression Evaluation Errors for a discussion of the backup table schema.
- When RECOVER_FAILING_BATCH (Beta) is true, GPSS reports more information about the result of the operation, including the batch size and number of records recovered.
File Data Source
- The file data source now supports the delimited data format.
- The file data source can now load the stdout of a command into a VMware Greenplum table. You specify the command details in the new EXEC (version 2) or exec (version 3 (Beta)) block in the load configuration file.
- GPSS now supports initiating a dry run of a file job.
New RabbitMQ Data Source (Beta)
GPSS introduces Beta support for loading from a RabbitMQ data source. You can load messages from a RabbitMQ queue or stream into VMware Greenplum. Refer to Loading from RabbitMQ into Greenplum (Beta) for more information about using this new Beta feature, and rabbitmq-v3.yaml (Beta) and rabbitmq-v2.yaml (Beta) for more information about the supported load configuration file properties.
Resolved Issues
Greenplum Streaming Server 1.8.0 resolves these issues:
- 32278, 32180
- Resolves an issue where a VMware Greenplum cluster using pgbouncer to manage connections did not receive a client SSL certificate as expected. GPSS now exposes a DBClientShared server configuration property that you can use to instruct GPSS to present the Gpfdist certificate as the client SSL certificate to VMware Greenplum.
- 32096, 31802
- Resolves an issue where GPSS was unable to automatically stop a job based on run time. GPSS now exposes new job scheduling properties for this purpose.
- 32044
- Resolves an issue where the recovery of a failed batch (Beta) could not be adequately monitored. GPSS now records the time that a record is inserted into the backup table in a new column named gpss_save_timestamp. GPSS also reports more information during bad batch recovery operations.
- 32144
- Resolves an issue where external tables used by GPSS were difficult to locate. Now, when ReuseTables is false, GPSS names the external table using the job name instead of a hash of configuration properties.
- 182386619
- GPSS would incorrectly fall back (to earliest or latest offset) all Kafka partitions, even those without offset gaps. This issue is resolved; GPSS now falls back only those partitions that have experienced an offset gap and writes this information to the GPSS log.
- N/A
- Resolves an issue where GPSS did not reset MAX_RETRIES after a job was successfully submitted and running.
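The following Python sketch makes the 182386619 fix concrete. It selects only the partitions that have an offset gap and leaves the others untouched. It assumes a gap means that the next offset GPSS expects to read is older than the earliest offset Kafka still retains. The function name and inputs are hypothetical.

```python
def partitions_to_fall_back(
    expected_next: dict[int, int], earliest_available: dict[int, int]
) -> list[int]:
    """Return the partitions whose expected next offset Kafka no longer retains.

    Only these partitions move to the fallback offset; partitions without
    a gap keep their committed position.
    """
    return sorted(
        partition
        for partition, offset in expected_next.items()
        if offset < earliest_available[partition]
    )
```

Before the fix, GPSS behaved as if this function returned every partition. After the fix, only the partitions that actually have a gap are reset.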
Release 1.7
Release 1.7.2
Release Date: April 21, 2022
Greenplum Streaming Server 1.7.2 includes changes and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.2.
Changed Features
Greenplum Streaming Server 1.7.2 includes these changes:
- GPSS adds support for specifying backslash escape sequences when you set the following CSV options: delimiter, quote, and escape. GPSS supports the standard backslash escape sequences for backspace, form feed, newline, carriage return, and tab, as well as escape sequences that you specify in hexadecimal format (prefaced with \x). Refer to Backslash Escape Sequences in the PostgreSQL documentation for more information.
- To resolve issue 32168, GPSS version 1.7.2 introduces support for loading files or messages that contain one JSON record per line into VMware Greenplum. To use this feature, specify FORMAT: jsonl in version 2 load configuration files, or specify the json format with is_jsonl: true in version 3 (Beta) load configuration files.
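The escape handling for the delimiter, quote, and escape options can be sketched in Python. The sketch follows the PostgreSQL escape-string convention for the sequences listed above: \b, \f, \n, \r, \t, and \x followed by one or two hexadecimal digits. It treats any other escaped character literally and does not cover octal or Unicode escapes. The function name is hypothetical, and GPSS's parser may differ in edge cases.

```python
_SIMPLE_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
_HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_csv_option(value: str) -> str:
    """Decode backslash escape sequences in a CSV option value (sketch)."""
    out: list[str] = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch != "\\" or i + 1 == len(value):
            # Ordinary character, or a trailing lone backslash kept as-is.
            out.append(ch)
            i += 1
            continue
        nxt = value[i + 1]
        if nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt == "x":
            # \x takes one or two hexadecimal digits, as in PostgreSQL.
            j = i + 2
            while j < len(value) and j < i + 4 and value[j] in _HEX_DIGITS:
                j += 1
            if j == i + 2:
                out.append("x")  # no hex digits: the x is taken literally
            else:
                out.append(chr(int(value[i + 2 : j], 16)))
            i = j
        else:
            out.append(nxt)  # any other escaped character is taken literally
            i += 2
    return "".join(out)
```

For example, a delimiter written as \x7c decodes to the pipe character, and \t decodes to a tab.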
Resolved Issues
Greenplum Streaming Server 1.7.2 resolves these issues:
- 32168
- Resolves an issue where GPSS did not support loading multi-line JSON files into VMware Greenplum. GPSS 1.7.2 introduces support for loading JSON message or file data that contains a single JSON record per line.
- N/A
- Resolves an issue where GPSS did not support escape sequences that were specified in the CSV delimiter, quote, and escape options. GPSS now supports standard and hexadecimal-format backslash escape sequences.
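Loading data that contains one JSON record per line comes down to parsing each line independently. The following Python sketch shows that idea with the standard json module. Skipping blank lines and reporting the failing line number are choices made for the sketch, not documented GPSS behavior.

```python
import json
from typing import Any, Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one decoded JSON record per non-empty line."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # sketch choice: ignore blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON: {exc.msg}") from exc
```

Because each line is parsed on its own, one Kafka message or file can produce several rows. A single malformed line is reported without affecting how the other lines are parsed.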
Release 1.7.1
Release Date: March 31, 2022
Greenplum Streaming Server 1.7.1 resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.1.
Resolved Issues
Greenplum Streaming Server 1.7.1 resolves these issues:
- 32105
- Resolves an issue where GPSS incorrectly added an offset based on the VMware Greenplum local time zone to timestamp (without time zone) values that it loaded into a VMware Greenplum table.
- 181293923
- In some cases, GPSS returned the error "pq: missing data for column <name>" when loading a file containing CSV-format data. This issue is resolved; GPSS no longer adds a newline when one already exists at the end of the file.
Release 1.7.0
Release Date: March 18, 2022
Greenplum Streaming Server 1.7.0 adds new features, includes changes, and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.0.
New and Changed Features
Greenplum Streaming Server 1.7.0 includes these new and changed features:
OS and Platforms
- GPSS introduces support for Red Hat Enterprise Linux 8 and Photon 3 for VMware Greenplum 6, and now provides download packages for these operating system versions on Broadcom Support Portal.
- GPSS updates the version of go that it uses to build the CLI tools to version 1.17.6 to mitigate CVE-2021-44716.
GPSS Configuration
GPSS introduces a default timeout of 10 seconds for a gpss service instance to connect to VMware Greenplum, and a related environment variable named GPDB_CONNECT_TIMEOUT. You can set this environment variable to change the amount of time that GPSS waits to establish a connection to VMware Greenplum, as described in Running the Greenplum Streaming Server.
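The following Python sketch shows how the GPDB_CONNECT_TIMEOUT override relates to the 10-second default. The notes do not state the variable's value format. This sketch assumes a whole number of seconds and rejects non-positive values. The function name is hypothetical.

```python
import os
from collections.abc import Mapping

DEFAULT_CONNECT_TIMEOUT_S = 10  # default stated in the GPSS 1.7.0 notes


def connect_timeout_seconds(env: Mapping[str, str] = os.environ) -> int:
    """Return the connection timeout, preferring GPDB_CONNECT_TIMEOUT if set.

    Assumption: the variable holds a whole number of seconds.
    """
    raw = env.get("GPDB_CONNECT_TIMEOUT", "").strip()
    if not raw:
        return DEFAULT_CONNECT_TIMEOUT_S
    seconds = int(raw)
    if seconds <= 0:
        raise ValueError("GPDB_CONNECT_TIMEOUT must be a positive number of seconds")
    return seconds
```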
Authentication
- After it encounters an SSL connection failure on the control channel, GPSS will attempt to initiate a non-SSL connection on the channel.
- The gpss.json server configuration file now includes an Authentication property block. Use the configuration properties in this block to specify a user name and password for client authentication to the GPSS server. Refer to Configuring the Streaming Server for Client-to-Server Authentication for additional information about this new feature.
- GPSS adds the -U/--username and -P/--password options to the gpsscli subcommands to specify the user name and password for client authentication to the GPSS server.
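The SSL-then-non-SSL behavior on the control channel can be sketched in Python. The connect callable and the exception type that signals an SSL failure are hypothetical placeholders. A real client would use its database driver's own connection options and error types.

```python
from typing import Callable, TypeVar

Conn = TypeVar("Conn")


def connect_with_ssl_fallback(
    connect: Callable[[bool], Conn],
    ssl_failure: type[Exception] = ConnectionError,
) -> Conn:
    """Try an SSL connection first; on an SSL failure, retry without SSL."""
    try:
        return connect(True)
    except ssl_failure:
        return connect(False)
```

This pattern covers a user whose pg_hba.conf entry is hostnossl. The SSL attempt is rejected, and the second, non-SSL attempt succeeds.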
Kafka Data Source
- GPSS now saves the topic:partition:offset for each badly formatted Kafka message written to the error log; you can view this information when you run the SELECT * FROM gp_read_error_log('<exttbl>') command.
- GPSS adds the --skip-explain flag to the gpsscli start subcommand to skip the explain SQL check step of its internal processing.
- GPSS now supports loading from a single Kafka topic into multiple VMware Greenplum tables. Provide an OUTPUTS:TABLE (version 2) or targets:gpdb:tables:table (version 3 (Beta)) block for each table, and specify the properties that identify the data targeted to each.
- GPSS introduces a new data type named gp_json (Beta) to the dataflow extension. For additional information about using the gp_json data type, refer to the About the JSON Format and Column Type documentation.
File and Kafka Data Sources
- GPSS adds support for new CSV options for file and Kafka jobs. You can now specify the delimiter, quote, and null string values in the load configuration file. You can identify a list of columns whose values GPSS forces to be not null. You can also specify GPSS's behavior when it encounters missing trailing fields in a row of data. New version 2 property names include DELIMITER, QUOTE, NULL_STRING, ESCAPE, FORCE_NOT_NULL, and FILL_MISSING_FIELDS. New version 3 property names include delimiter, quote, null_string, escape, force_not_null, and fill_missing_fields.
- GPSS exposes new PREPARE_SQL and TEARDOWN_SQL (version 2) and prepare_statement and teardown_statement (version 3) load configuration file properties for Kafka and file data sources. You can use these properties to specify user-defined functions or SQL commands for GPSS to run before executing a job and/or at job completion.
Version 3 (Beta) Configuration
GPSS 1.7.0 adds, changes, and relocates property keywords in the version 3 (Beta) configuration file format. Refer to the gpsscli-v3.yaml (Beta), gpkafka-v3.yaml (Beta), and filesource-v3.yaml (Beta) reference pages for the new keywords and locations.
New S3 Data Source (Beta)
GPSS 1.7.0 introduces Beta support for a new data source, S3. This data source does not read directly from S3; instead, it uses the VMware Greenplum s3 protocol and external tables to read from S3 and write to Greenplum in parallel. Refer to Loading from S3 into Greenplum (Beta) for more information about using this new feature, and s3source-v3.yaml (Beta) for the supported load configuration file properties.
New Commands and Options
- GPSS adds the new gpsscli dryrun subcommand. When you invoke this command, GPSS performs a trial run of a Kafka or S3 job without actually writing to VMware Greenplum. You can use the command to help diagnose load job errors as described in Diagnosing an Error with a Trial Load.
- GPSS adds the -f/--force flag to the gpsscli remove subcommand to forcibly stop and remove one or more GPSS jobs.
Other Changes
- GPSS adds new Submitted and Success statuses for batch (file, s3) jobs. GPSS 1.7.0 also changes the Stopped status to signify that a job was stopped by the user. Refer to the gpsscli status reference page for a description of GPSS job statuses.
- GPSS 1.7.0 removes the Streaming Job API (Beta) documentation.
Resolved Issues
Greenplum Streaming Server 1.7.0 resolves these issues:
- CVE-2021-44716
- Updates the go library to version 1.17.6.
- N/A
- You can now specify an Avro schema file path for both the key and the value when you load Kafka data into VMware Greenplum.
- N/A
- Resolves an issue where GPSS erroneously inserted a \n after parsing 76 characters of Avro data when the load configuration file specified bytes_to_base64: true.
- 32022
- Resolves an issue where GPSS provided no way to run SQL commands before it initiates a job or after a job completes. GPSS now exposes new properties in version 2 and version 3 (Beta) load configuration files (PREPARE_SQL/TEARDOWN_SQL and prepare_statement/teardown_statement).
- 31886
- Resolves an issue where GPSS returned an authentication error when SSL was deactivated for the user (that is, when a hostnossl connection type entry was configured for the user in the pg_hba.conf file). GPSS now attempts to initiate a non-SSL connection when it encounters an SSL connection failure on the control channel.
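The 76-character symptom matches MIME-style base64 line wrapping (RFC 2045), which inserts a newline after every 76 output characters. The Python sketch below contrasts that wrapping with single-line base64, using the standard base64 module. It illustrates the symptom only and is not GPSS code.

```python
import base64


def to_base64_single_line(data: bytes) -> str:
    """Encode bytes as one unbroken base64 string (no line breaks)."""
    return base64.b64encode(data).decode("ascii")


def to_base64_mime(data: bytes) -> str:
    """Encode bytes MIME-style: a newline after every 76 output characters."""
    return base64.encodebytes(data).decode("ascii")
```

Apart from the newlines, both outputs are identical. A loader that expects a single base64 value per field wants the first form.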
Release 1.6
Release 1.6.0
Release Date: May 28, 2021
Greenplum Streaming Server 1.6.0 adds new features, includes changes, and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.6.0.
New and Changed Features
Greenplum Streaming Server 1.6.0 includes these new and changed features:
- GPSS adds the -c | --config flag/option to the gpss command to specify the JSON-formatted configuration file.
- The gpsscli --version command now displays the version of the GPSS server in addition to that of the client.
- The gpss.json server configuration file now includes a KeepAlive property block. Use the configuration properties in this block to specify timeout options for the gRPC connection between the GPSS client and the GPSS server.
- GPSS changes the format of front-end logs (messages written by commands to stdout) from CSV format to a more human-readable format. Relatedly, GPSS adds a --csv-log option to the commands to write the front-end logs in CSV format. GPSS also adds a --color option to commands to enable the use of color in message display.
- GPSS exposes a new load configuration property for Kafka data sources named IDLE_DURATION (version 2 configuration) and idle_duration_ms (version 3 configuration). Use this property to specify that GPSS use lazy load mode, waiting until data arrives before locking the target VMware Greenplum table.
- GPSS exposes a new load configuration property for Kafka data sources named SCHEMA_PATH_ON_GPDB (version 2 configuration) and schema_path_on_gpdb (version 3 configuration). Use this property to specify the path to the Avro .avsc file that contains the schema of the Kafka key or value data (but not both). This file must reside in the same location on all VMware Greenplum segment hosts.
- GPSS exposes a new load configuration property for Kafka data sources named FALLBACK_OFFSET (version 2 configuration) and fallback_offset (version 3 configuration). Use this property to specify whether and how GPSS automatically handles Kafka message offset mismatches.
- GPSS exposes new load configuration properties for Kafka data sources to support access to an SSL-secured schema registry. Refer to Accessing an SSL-Secured Schema Registry for more information.
- GPSS now supports acting as a high-level Kafka consumer when the Kafka client properties include a group.id setting.
- GPSS exposes a new load configuration property for Kafka data sources named CONSISTENCY (version 2 configuration) and consistency (version 3 configuration). Use this property to specify how GPSS manages Kafka message offsets when it acts as a high-level consumer. Refer to Understanding Kafka Message Offset Management for more information.
- GPSS 1.6.0 provides additional documentation about developing and using custom formatters with GPSS.
Beta Features
Greenplum Streaming Server 1.6.0 includes these new Beta features:
GPSS exposes a new load configuration property for Kafka data sources named RECOVER_FAILING_BATCH (version 2 configuration) and recover_failing_batch (version 3 configuration). Use this property in conjunction with SAVE_FAILING_BATCH to instruct GPSS to automatically reload the good data in the batch, and retain only the error data in the backup table.
Note: Enabling this feature may have severe performance implications when any data in the Kafka topic generates an expression error.
Note: This feature requires that GPSS has the VMware Greenplum privileges to create a function.
GPSS adds a new extension named dataflow. This extension includes a new data type, gp_jsonb (available for VMware Greenplum version 6.x only), and a new formatter, text_in. You must run CREATE EXTENSION dataflow; in each database in which you choose to use these types and formatters. For additional information about the gp_jsonb data type, see About the JSON Format and Column Type.
Resolved Issues
Greenplum Streaming Server 1.6.0 resolves these issues:
- 31458
- Resolves an issue where job progress information was available only via stdout. GPSS now supports consumer groups, which save message offsets to the Kafka topic.
- 31396
- Resolves an issue where the GPSS Ubuntu download package was missing certain dependent libraries. These libraries are now marked as required.
- 31359
- Resolves an issue where GPSS could not restart a job that had been stopped for a long period of time. GPSS now supports a FALLBACK_OFFSET load configuration property that instructs GPSS whether, and how, to automatically handle offset mismatches.
- 31315
- Resolves an issue where GPSS was unable to load data from Kafka when TLS-secured communication was required between the Kafka broker and the schema registry. GPSS now supports load configuration properties to specify the certificates and keys required for this communication.
- 31278
- Resolves an issue where GPSS was unable to load Avro data when the schema was not embedded in the .avro file. GPSS now supports the SCHEMA_PATH_ON_GPDB load configuration property to specify the .avsc schema file.
- 31277
- Addresses a request for a job timeout by supporting a new IDLE_DURATION load configuration property.
- 30723, 30711
- Resolves an issue where GPSS failed to load JSON-format data that included \u0000 by creating a new VMware Greenplum data type named gp_jsonb (Beta).
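JSON permits the \u0000 escape, but in PostgreSQL-derived databases the text and jsonb types generally cannot store a NUL character, which is one way such messages fail to load. The Python sketch below is a generic pre-check, not a description of how gp_jsonb works: it reports the paths of string values that decode to text containing NUL. The path notation is an arbitrary choice, and object keys are not checked in this sketch.

```python
import json
from typing import Any, Iterator


def nul_paths(value: Any, path: str = "$") -> Iterator[str]:
    """Yield paths of string values that contain a NUL character.

    Object keys are not inspected; only values are checked.
    """
    if isinstance(value, str):
        if "\x00" in value:
            yield path
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from nul_paths(item, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from nul_paths(item, f"{path}[{index}]")


# The doubled backslashes keep the \u0000 escapes in the JSON text itself.
message = '{"id": 7, "note": "abc\\u0000def", "tags": ["ok", "x\\u0000"]}'
print(list(nul_paths(json.loads(message))))  # ['$.note', '$.tags[1]']
```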
Release 1.5
Release 1.5.3
Release Date: April 15, 2021
Greenplum Streaming Server 1.5.3 resolves an issue.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.3.
Resolved Issues
Greenplum Streaming Server 1.5.3 resolves this issue:
- 31357
- Resolves an issue where GPSS did not correctly handle CUSTOM_OPTION properties specified in a load configuration file. GPSS now supports using the NAME and PARAMSTR properties to specify a custom formatter user-defined function.
Release 1.5.2
Release Date: March 5, 2021
Greenplum Streaming Server 1.5.2 resolves several issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.2.
Changed Features
Greenplum Streaming Server 1.5.2 includes this change:
- GPSS omits the end time in its output error hints. Resolved issue 31287 provides more information.
Resolved Issues
Greenplum Streaming Server 1.5.2 resolves these issues:
- N/A
- Resolves an issue where GPSS logged the message execInsert and err: nil because it did not check for an error before logging.
- 31287
- Resolves an issue where GPSS did not always display the correct end time in the output error hint; GPSS now removes the end time condition from the hint.
- 177153850
- Resolves an issue where a GPSS query returned a syntax error from VMware Greenplum because MATCH_COLUMNS was empty. GPSS now requires, and checks, that this property includes at least one column when you submit a load job that specifies UPDATE or MERGE mode.
- 177133400
- Resolves an issue where GPSS stopped a Kafka job unexpectedly, without returning an error, when it encountered a batch that contained only a control message.
- 177077055
- Resolves an issue where the --all option was incorrectly displayed in the help output of the gpsscli load command.
- 177077007
- GPSS consumed a large amount of memory caching Kafka messages when it ran many concurrent jobs that read from multiple partitions. This issue is resolved; GPSS now specifies a less aggressive default value for the librdkafka queued.max.messages.kbytes property when the user does not explicitly configure it.
- 177014072
- Resolves an issue where GPSS incorrectly returned the error gpkafka load show job progress fail, err: job progress is nil when it failed to start a Kafka job. GPSS now returns the more meaningful error gpkafka load start job failed in this situation.
- 176842005
- Resolves an issue where GPSS submitted a job with the wrong name when a gpsscli load *.yaml command operated on more than one load job.
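Resolved issue 177153850 adds a check that MATCH_COLUMNS is non-empty for UPDATE and MERGE jobs. The Python sketch below shows the shape of such a pre-submit check. The function name and the dict layout (a loosely parsed load configuration) are hypothetical, and treating INSERT as the default mode is an assumption of the sketch.

```python
# Hypothetical pre-submit check, not GPSS source: reject UPDATE or MERGE jobs
# whose MATCH_COLUMNS list is missing or empty. INSERT is assumed as the
# default mode when MODE is absent.
def validate_load_mode(output: dict) -> None:
    mode = str(output.get("MODE", "INSERT")).upper()
    match_columns = output.get("MATCH_COLUMNS") or []
    if mode in ("UPDATE", "MERGE") and len(match_columns) == 0:
        raise ValueError(f"MODE {mode} requires at least one column in MATCH_COLUMNS")


validate_load_mode({"MODE": "INSERT"})                          # accepted
validate_load_mode({"MODE": "MERGE", "MATCH_COLUMNS": ["id"]})  # accepted
try:
    validate_load_mode({"MODE": "UPDATE", "MATCH_COLUMNS": []})
except ValueError as err:
    print(err)  # MODE UPDATE requires at least one column in MATCH_COLUMNS
```

Failing before submission turns an opaque SQL syntax error from the database into a message that names the missing property.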
Release 1.5.1
Release Date: February 5, 2021
Greenplum Streaming Server 1.5.1 includes changes and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.1.
Changed Features
Greenplum Streaming Server 1.5.1 includes these changes:
- Version 1.5.1 is the first standalone GPSS release that includes a .deb installation package for Ubuntu 18.04 LTS systems.
- The gpsscli subcommands now consistently return zero (0) on success and non-zero when GPSS encounters an error.
- GPSS improves the error message that it returns when it encounters a mismatched extension or formatter version.
- GPSS bundles a patched version of the libserdes library to fix an issue that can arise when the SCHEMA_REGISTRY_ADDRS property value includes a trailing slash. See resolved issue 31137.
- GPSS now registers the gp_read_persistent_error_log() function when you register the GPSS extension in a database. Resolved issue 31201 provides more information.
- The progress log file name format has changed; the new format retains the complete job name rather than truncating it to 8 characters.
Resolved Issues
Greenplum Streaming Server 1.5.1 resolves these issues:
- 31201
- Resolves an issue where GPSS returned a permission denied for language c error when it attempted, at runtime, to register an internal function as the VMware Greenplum user that started GPSS, and this user did not have the privileges required to create such functions. GPSS now registers this internal function when you create the GPSS extension in a database.
- 31137
- Due to a bug in the dependent library libserdes, GPSS did not correctly handle a trailing slash when specified in the first address in a list of SCHEMA_REGISTRY_ADDRS. This issue is resolved; GPSS 1.5.1 bundles a patched version of the libserdes library that can handle such addresses.
- 176136800
- Resolves an issue where GPSS returned an error when it parsed the SAVE_FAILING_BATCH property and value in a (deprecated) version 1 load configuration file, even though the version 1 format does not support this property. GPSS now displays a warning message when it encounters a property that is not supported in a version 1 configuration file.
- 176068963
- GPSS reported an offset gap when it read Kafka messages using the read_committed isolation level, the job was restarted, and the topic retention period had expired. This issue is resolved; GPSS now records control message offsets.
- 175867685
- Resolves an issue where the -i | --edit-in-place option was displayed in the help output of subcommands that did not support the option. GPSS now correctly displays the option only for the gpsscli convert command.
- 175867670
- Resolves an issue where the gpsscli subcommands did not return consistent values. gpsscli now returns zero (0) on success and non-zero on failure.
- n/a
- Resolves an issue where GPSS did not correctly validate a filesource.yaml load configuration file before submitting the job.
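The release note does not describe the libserdes defect behind issue 31137 in detail. As an illustration only, the Python sketch below shows one common way a trailing slash causes trouble: appending a request path to the address produces a double slash. The host names are hypothetical, and /schemas/ids/1 assumes a Confluent-compatible schema registry endpoint.

```python
from typing import List


def naive_join(base: str, path: str) -> str:
    # Plain concatenation: "http://host:8081/" + "/schemas/ids/1" yields "//".
    return base + path


def normalize_registry_addrs(addrs: List[str]) -> List[str]:
    # Strip trailing slashes from every address in the list.
    return [addr.rstrip("/") for addr in addrs]


addrs = normalize_registry_addrs(["http://registry-a:8081/", "http://registry-b:8081"])
print(naive_join("http://registry-a:8081/", "/schemas/ids/1"))  # http://registry-a:8081//schemas/ids/1
print(naive_join(addrs[0], "/schemas/ids/1"))                   # http://registry-a:8081/schemas/ids/1
```

Normalizing every entry, not only the first, keeps the behavior the same regardless of which address in the list is tried.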
Release 1.5.0
Release Date: December 2, 2020
Greenplum Streaming Server 1.5.0 adds new features, includes changes, and resolves issues.
You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.0.
New and Changed Features
Greenplum Streaming Server 1.5.0 includes these new and changed features:
- The load configuration file ERROR_LIMIT property, previously mandatory, is now optional. The default value for the property is zero (0); GPSS deactivates error logging and stops a load operation upon encountering the first error.
- GPSS includes out-of-the-box Prometheus integration, enabling you to use the tool to monitor your gpss server instances. Refer to Monitoring GPSS Service Instances for more information on enabling and using this integration.
- New configuration properties in the gpss.json server configuration file include:
  - The DebugPort configuration property. You can use this property to identify the port number on which GPSS starts a debug server for the gpss server instance. Refer to Pulling Information from the Debug Server for more information.
  - The MinTLSVersion configuration property. You use this property to specify the minimum TLS version that GPSS requests on encrypted connections.
  - The Logging configuration property block. You can use these configuration properties to set the front-end and back-end logging levels for GPSS commands. See About GPSS Logging.
  - The JobStore configuration property block. Use the configuration property in this block to specify a local directory in which GPSS maintains job status information. This allows a GPSS server instance to (re)start any in-progress jobs when the instance first starts up. See About GPSS Job Management.
  - The Monitor configuration property block. You use this property to enable GPSS Prometheus integration.
- GPSS no longer generates and assigns a unique identifier as the job name when you invoke the gpsscli submit or gpsscli load commands without specifying the --name option. GPSS now assigns the base name of the load configuration file as the default job name.
- GPSS exposes a new load configuration property for Kafka data sources named PARTITIONS. Use this property to specify the partition numbers from which you want GPSS to load Kafka messages from the topic. (This property is not supported for the Kafka version 1 configuration file format.)
- GPSS supports specifying template parameters for load configuration file properties. When you specify the {{template_var}} value syntax in the file, GPSS replaces {{template_var}} with the value that you specify via the -p | --property template_var=value option when you submit or load the job.
- GPSS supports SSL encryption on the control channel between GPSS and the VMware Greenplum coordinator, and ships with an updated pq library to support this feature. See Configuring SSL for the Control Channel for configuration information.
- The gpsscli start, stop, and remove subcommands now support a --all flag. When you specify this flag, GPSS starts all submitted jobs, stops all running jobs, or removes all stopped jobs, respectively.
- The gpsscli submit and gpsscli load commands can now operate on one or more YAML load configuration files.
- GPSS exposes the new SAVE_FAILING_BATCH load configuration property. When you set this property to true, GPSS also writes loading data to a backup table. When GPSS encounters expression evaluation errors, this backup table aids in the recovery of the load operation. See Redirecting Data to a Backup Table when GPSS Encounters Expression Evaluation Errors for additional information. (This property is not supported for the Kafka version 1 configuration file format.)
- GPSS 1.5.0 introduces a new Beta feature, the version 3 load configuration file format. This format introduces a new YAML organization and keywords, and more closely aligns with the GPSS gRPC Streaming Job API. Refer to gpsscli-v3.yaml (Beta) for the version 3 syntax.
- GPSS 1.5.0 supports the persistent error log feature of VMware Greenplum when you are running against Greenplum version 5.26+ or 6.6+. For more details about the persistent error log, refer to the CREATE EXTERNAL TABLE SQL reference page in the VMware Greenplum documentation.
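The template parameter feature pairs {{template_var}} placeholders in a load configuration file with -p | --property template_var=value options. The Python sketch below models that substitution step under stated assumptions: the identifier rules for template_var, the error on a missing value, and the YAML fragment (modeled on a version 2 Kafka load configuration) are illustrative choices, not documented GPSS behavior.

```python
import re
from typing import Dict, Iterable

# Assumed placeholder syntax: {{name}}, where name is an identifier.
PLACEHOLDER = re.compile(r"\{\{([A-Za-z_][A-Za-z0-9_]*)\}\}")


def parse_property_flags(flags: Iterable[str]) -> Dict[str, str]:
    """Turn template_var=value strings (as given to -p | --property) into a dict."""
    props: Dict[str, str] = {}
    for flag in flags:
        name, sep, value = flag.partition("=")
        if not sep or not name:
            raise ValueError(f"expected template_var=value, got {flag!r}")
        props[name] = value  # everything after the first "=" is the value
    return props


def render(config_text: str, props: Dict[str, str]) -> str:
    """Replace each {{name}} placeholder; fail if no value was supplied."""
    def substitute(match) -> str:
        name = match.group(1)
        if name not in props:
            raise KeyError(f"no value supplied for template variable {name!r}")
        return props[name]
    return PLACEHOLDER.sub(substitute, config_text)


template = "KAFKA:\n  INPUT:\n    SOURCE:\n      TOPIC: {{topic}}\n"
print(render(template, parse_property_flags(["topic=orders"])))
```

Failing on an unknown placeholder, rather than leaving it in place, is a design choice of this sketch: it surfaces a forgotten -p option before the job is submitted.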
Resolved Issues
Greenplum Streaming Server 1.5.0 resolves these issues:
- 30332
- In some cases when GPSS reused external tables for jobs, it did not update the external table that it uses internally for load operations when the target Greenplum table definition was modified.
- 171299427
- Resolves an issue where GPSS was unable to cancel a batch write operation when it encountered an error, and left a lingering session.
Release 1.4
Release 1.4.3
Release Date: December 17, 2021
Greenplum Streaming Server 1.4.3 resolves an issue and includes related changes.
You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.3.
Changes
Greenplum Streaming Server 1.4.3 includes this change:
- After it encounters an SSL connection failure on the control channel, GPSS will attempt to initiate a non-SSL connection on the channel.
Resolved Issues
Greenplum Streaming Server 1.4.3 resolves this issue:
- 31886
- Resolves an issue where, after upgrade to version 1.4.2, GPSS returned an authentication error when SSL was deactivated for the user (i.e. there was a hostnossl connection type entry configured for the user in the pg_hba.conf file). GPSS now attempts to initiate a non-SSL connection when it encounters an SSL connection failure on the control channel.
Release 1.4.2
Release Date: November 2, 2020
Greenplum Streaming Server 1.4.2 resolves issues and includes related changes.
You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.2.
Changes
Greenplum Streaming Server 1.4.2 includes these changes:
- GPSS now specifies the SSL prefer mode on the control channel to the VMware Greenplum coordinator host. GPSS previously explicitly deactivated SSL on the channel.
Resolved Issues
Greenplum Streaming Server 1.4.2 resolves these issues:
- n/a
- Resolves an issue where GPSS recorded an incorrect count in the progress log file when the messages it received included offset gaps, such as with transaction control messages.
- 30776, 174685715
- Resolves an issue where gpsscli stop would not respond (hang).
- 174685711
- Resolves an issue where GPSS failed to load a large (>2GB) file. GPSS now transfers a file in multiple, smaller chunks when loading to Greenplum.
- 174984151
- GPSS sent an HTTP request to the Avro schema registry service on every segment on every commit; in some cases, this created and destroyed a large number of TCP connections in the process. GPSS resolves this issue by reading the schema a single time per session (as long as the schema remains unchanged).
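Resolved issue 174984151 replaces a per-commit registry request with a single read per session. The Python sketch below shows the general caching pattern, not GPSS internals: a per-session dictionary keyed by schema ID, with the registry lookup replaced by a hypothetical in-memory stand-in. Keying by ID assumes a Confluent-style registry, where a changed schema receives a new ID.

```python
from typing import Callable, Dict


class SessionSchemaCache:
    """Hypothetical per-session cache: fetch each schema ID from the registry once."""

    def __init__(self, fetch: Callable[[int], str]) -> None:
        self._fetch = fetch                # stands in for an HTTP GET to the registry
        self._schemas: Dict[int, str] = {}
        self.registry_calls = 0            # counts real fetches, for illustration

    def get(self, schema_id: int) -> str:
        if schema_id not in self._schemas:
            self.registry_calls += 1
            self._schemas[schema_id] = self._fetch(schema_id)
        return self._schemas[schema_id]


# Stand-in registry: a dict lookup instead of a network request.
fake_registry = {1: '{"type": "string"}', 2: '{"type": "long"}'}
cache = SessionSchemaCache(lambda schema_id: fake_registry[schema_id])

for _ in range(100):          # 100 commits that all use schema ID 1
    cache.get(1)
cache.get(2)                  # a new schema ID triggers one more fetch
print(cache.registry_calls)   # 2
```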
Release 1.4.1
Release Date: August 7, 2020
Greenplum Streaming Server 1.4.1 resolves issues and includes related changes.
You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.1.
Changes
Greenplum Streaming Server 1.4.1 includes these changes:
- GPSS bundles a patched version of the librdkafka library to fix an issue that can arise when the Kafka topic that GPSS loads includes messages with discontinuous offsets. See resolved issues 30797 and 30776.
- GPSS now always tracks Kafka job progress in a separate, CSV-format log file. See resolved issue 173603095 and Checking the Progress of a Load Operation.
- GPSS 1.4.1 changes the format and content of the server and client log file messages. The old log file format was delimited text, which could not be parsed when the text contained a newline. The log files are now CSV-format and include a header row. See resolved issue 173603029 and Examining GPSS Log Files.
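The switch to CSV matters because a CSV writer quotes a field that contains a newline, so a CSV reader can reassemble the record, whereas line-oriented parsing cannot. The Python sketch below demonstrates this with the standard csv module. The column names are hypothetical and need not match the header row that GPSS writes.

```python
import csv
import io

# Hypothetical log columns; the real GPSS header row may differ.
records = [
    {"timestamp": "2020-08-07 10:00:00", "level": "error",
     "message": "load failed:\nsecond line of the message"},
]

buf = io.StringIO(newline="")
writer = csv.DictWriter(buf, fieldnames=["timestamp", "level", "message"])
writer.writeheader()
writer.writerows(records)  # the embedded newline causes the field to be quoted
text = buf.getvalue()

# Line-oriented parsing sees 3 physical lines for 1 header plus 1 record...
print(len(text.splitlines()))  # 3

# ...while a CSV reader honors the quoting and recovers the record intact.
parsed = list(csv.DictReader(io.StringIO(text, newline="")))
print(len(parsed), repr(parsed[0]["message"]))
```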
Resolved Issues
Greenplum Streaming Server 1.4.1 resolves these issues:
- n/a
- When the schema registry service was down, GPSS appeared to hang during a Kafka load operation because it tried to access the registry multiple times for each Kafka message. This issue is resolved; GPSS now reports an error and stops retrying immediately when it detects that the schema registry is down.
- 30797, 30776
- Due to a bug in the dependent library librdkafka, a load job from Kafka would hang when there were aborted Kafka transactions in the topic, or when the messages were deleted before GPSS was able to consume them. This issue is resolved. GPSS 1.4.1 bundles a patched version of the librdkafka library and can now handle message offsets that are not continuous.
- 30760
- Certain merge/update operations failed with the error Cannot parallelize an UPDATE statement that updates the distribution columns because GPSS versions 1.3.5 through 1.4.0 used the Greenplum Postgres Planner by default, which does not support updating columns that are specified as the distribution key. GPSS 1.4.1 resolves this issue by not explicitly specifying a query planner/optimizer, but rather using the default that is configured in the Greenplum cluster.
- 173653147
- In some cases, gpsscli stop would hang when you invoked it to stop a Kafka load job that GPSS had previously retried. This issue is resolved.
- 173637940
- The GPSS utilities distributed in the VMware Greenplum 6.8.x and 6.9.0 Client and Loader Tools packages were missing the dependent library libserdes.so. This issue is resolved; the packages now include this library.
- 173637900
- The GPSS 1.4.1 Batch Data gRPC API fixes a parallel loading regression that manifested itself when the gpss.json server configuration file included the (default) ReuseTables: true property setting.
- 173603095
- Because GPSS tracked job progress only during gpsscli progress command execution, the progress information for jobs for which you did not run the command was lost. This issue is resolved. GPSS now always tracks job progress in a separate, CSV-format log file (with header row) named progress_jobname_jobid_date.log.
- 173603029
- GPSS log file messages with embedded newlines could not be parsed. This issue is resolved; GPSS changes the client and server log file format to CSV (with header row).
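Issues 30797 and 30776 concern Kafka offsets that are not continuous, for example when transaction control markers or aborted messages occupy offsets that are never delivered as data. The Python sketch below, which is illustrative rather than GPSS code, shows why a count derived from the first and last offset overstates the number of messages received, and how to list the gaps. The offsets are hypothetical.

```python
from typing import List, Tuple


def find_offset_gaps(offsets: List[int]) -> List[Tuple[int, int]]:
    """Return (first_missing, last_missing) ranges between consecutive offsets."""
    gaps = []
    for prev, cur in zip(offsets, offsets[1:]):
        if cur > prev + 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# Hypothetical partition: offsets 3 and 6 are never delivered as data messages.
received = [0, 1, 2, 4, 5, 7]

naive_count = received[-1] - received[0] + 1   # assumes continuous offsets
actual_count = len(received)                   # messages actually received
print(naive_count, actual_count, find_offset_gaps(received))  # 8 6 [(3, 3), (6, 6)]
```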
Release 1.4.0
Release Date: June 26, 2020
Greenplum Streaming Server 1.4.0 adds new features, includes changes, and resolves issues.
You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.0.
New and Changed Features
Greenplum Streaming Server 1.4.0 includes these new and changed features:
- GPSS supports loading from a file data source. You can now load data in Avro, binary, CSV, and JSON files into VMware Greenplum. See Loading File Data into Greenplum for more information.
- GPSS defines a new META load configuration property block. You can load the properties in this single JSON-format column into the target table, or use the properties in update or merge criteria for a load operation. The available META properties are data-source specific:
  - The Kafka data source exposes the following META properties: topic (text), partition (int), and offset (bigint).
  - The file data source exposes a single META property named filename (text).
- GPSS supports Avro data containing binary fields.
- GPSS implements a faster update in merge mode for large datasets when the load configuration specifies no UPDATE_COLUMNS. In this scenario, GPSS updates all MAPPING columns in each row.
- You can use GPSS to load data into a VMware Greenplum cluster that utilizes the PgBouncer connection pooler.
- The CentOS 7.x GPSS packages for Greenplum 6 support Oracle Enterprise Linux 7.
- GPSS uses a single thread and socket per partition by sharing a Kafka consumer between workers.
- GPSS bundles librdkafka version 1.4.2. This version provides support for controlling how GPSS reads Kafka messages written transactionally via the isolation.level property.
- GPSS 1.4 introduces the new Streaming Job API (Beta), a gRPC API that allows you to manage and submit streaming jobs to the server.
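The META block exposes source metadata as a single JSON-format column: topic (text), partition (int), and offset (bigint) for Kafka, and filename (text) for files. The release note does not specify how GPSS serializes this value, so the Python helpers below are hypothetical. They only illustrate the shape of such a value and check that the numbers fit the documented column types.

```python
import json

INT4_MIN, INT4_MAX = -2**31, 2**31 - 1   # range of int
INT8_MIN, INT8_MAX = -2**63, 2**63 - 1   # range of bigint


def kafka_meta(topic: str, partition: int, offset: int) -> str:
    """Build a META-style JSON value for a Kafka message (hypothetical helper)."""
    if not INT4_MIN <= partition <= INT4_MAX:
        raise ValueError("partition does not fit in int")
    if not INT8_MIN <= offset <= INT8_MAX:
        raise ValueError("offset does not fit in bigint")
    return json.dumps({"topic": topic, "partition": partition, "offset": offset})


def file_meta(filename: str) -> str:
    """Build a META-style JSON value for a file data source (hypothetical helper)."""
    return json.dumps({"filename": filename})


print(kafka_meta("orders", 0, 42))  # {"topic": "orders", "partition": 0, "offset": 42}
print(file_meta("/data/orders.csv"))
```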
Resolved Issues
Greenplum Streaming Server 1.4.0 resolves these issues:
- 172142789
- The GPSS Batch Data gRPC API fixes inaccurate TransferStats success and error counts for data load operations initiated in update mode.
Deprecated Features
Deprecated features may be removed in a future minor release of the Greenplum Streaming Server. GPSS 1.4.x deprecates:
- The gpkafka Version 1 configuration file format (deprecated since 1.4.0).
- The gpkafka.yaml (versions 1 and 2) POLL block, including the POLL:BATCHSIZE and POLL:TIMEOUT properties (deprecated since 1.3.5).
Removed Features
Deprecated features may be removed in a future minor release of the Greenplum Streaming Server. GPSS 1.4.x removes:
- The gpsscli history and gpkafka history commands (deprecated in 1.3.5).
Release 1.3
Release 1.3.1
Release Date: December 19, 2019
Greenplum Streaming Server version 1.3.1 is the first standalone release of GPSS. GPSS 1.3.1 is also included in the VMware Greenplum version 5.24 and 6.2 distributions.
Greenplum Streaming Server 1.3.1 is a maintenance release that resolves several issues.
Resolved Issues
Greenplum Streaming Server 1.3.1 resolves these issues:
- 169806983
- In some cases, reading from Kafka using the default MINIMAL_INTERVAL (0 seconds) caused GPSS to consume a large amount of CPU resources, even when no new messages existed in the Kafka topic. This issue is resolved.
- 169807372, 169831558
- GPSS 1.3.0 did not recognize internal history tables that were created with GPSS 1.2.6 and earlier. In some cases, this caused GPSS to load duplicate messages into VMware Greenplum. This issue is resolved.
Release 1.3.0
Release Date: November 1, 2019
Greenplum Streaming Server version 1.3.0 is included in the VMware Greenplum version 5.23 and 6.1 distributions.
Greenplum Streaming Server 1.3.0 is a minor release that includes new and changed features and resolves several issues.
New and Changed Features
Greenplum Streaming Server 1.3.0 includes these new and changed features:
- GPSS now supports log rotation, utilizing a mechanism that you can easily integrate with the Linux logrotate system. See Managing GPSS Log Files for more information.
- GPSS has added the new INPUT:FILTER load configuration property. This property enables you to specify a filter that GPSS applies to Kafka input data before loading it into VMware Greenplum.
- GPSS displays job progress by partition when you provide the --partition flag to the gpsscli progress command.
- GPSS enables you to load Kafka data that was emitted since a specific timestamp into VMware Greenplum. To use this feature, you provide the --force-reset-timestamp flag when you run gpsscli load, gpsscli start, or gpkafka load.
- GPSS now supports update and merge operations on data stored in a VMware Greenplum table. The load configuration file accepts MODE, MATCH_COLUMNS, UPDATE_COLUMNS, and UPDATE_CONDITION property values to direct these operations. Example: Merging Data from Kafka into Greenplum Using the Streaming Server provides an example merge scenario.
- GPSS supports Kerberos authentication to both Kafka and VMware Greenplum.
- GPSS supports SSL encryption between GPSS and Kafka.
- GPSS supports SSL encryption on the data channel between GPSS and VMware Greenplum.
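The merge support described above uses MATCH_COLUMNS to find existing rows, UPDATE_COLUMNS to choose what to overwrite, and UPDATE_CONDITION to gate the update. The Python sketch below models those semantics in memory. It is a simplification, not how GPSS executes SQL. Three choices are assumptions of the sketch: UPDATE_CONDITION is modeled as a Python callable, it gates only updates (GPSS may also apply it to inserts), and every non-match column is updated when no update columns are given.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def merge_rows(
    target: List[Row],
    incoming: List[Row],
    match_columns: List[str],
    update_columns: Optional[List[str]] = None,
    update_condition: Optional[Callable[[Row, Row], bool]] = None,
) -> None:
    """In-memory sketch of MERGE: update matched rows, insert unmatched ones."""
    if not match_columns:
        raise ValueError("MERGE requires at least one match column")
    index = {tuple(row[c] for c in match_columns): row for row in target}
    for new in incoming:
        key = tuple(new[c] for c in match_columns)
        old = index.get(key)
        if old is None:
            inserted = dict(new)           # no match: insert a copy of the row
            target.append(inserted)
            index[key] = inserted
        elif update_condition is None or update_condition(old, new):
            # Sketch assumption: with no update columns, update all non-match columns.
            columns = update_columns or [c for c in new if c not in match_columns]
            for c in columns:
                old[c] = new[c]


table = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
merge_rows(table, [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}],
           match_columns=["id"], update_columns=["amount"],
           update_condition=lambda old, new: new["amount"] > old["amount"])
print(table)
```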
Resolved Issues
Greenplum Streaming Server 1.3.0 is a minor release that resolves these issues:
- 168130147
- In some situations, specifying the --force-reset-earliest flag when loading data failed to read from the correct offset. This problem has been fixed. (Using the --force-reset-xxx flags outside of an offset mismatch scenario is discouraged.)
- 167997441
- GPSS did not save error data to the external table error log when it encountered an incorrectly formatted JSON or Avro message. This issue has been fixed; invoking gp_read_error_log() on the external table now displays the offending data.
- 164823612
- GPSS incorrectly treated Kafka jobs that specified the same Kafka topic, Greenplum output schema name, and output table name, but different database names, as the same job. This issue has been resolved. GPSS now includes the VMware Greenplum database name when constructing a job definition.
Beta Features
Greenplum Streaming Server 1.x includes these Beta features:
GPSS adds support for a RabbitMQ data source (introduced in 1.8.0, promoted to supported in 1.9.0).
GPSS adds support for an s3 data source (introduced in 1.7.0).
GPSS adds a new data type named gp_json to the dataflow extension (introduced in 1.7.0).
GPSS exposes a new load configuration property for Kafka data sources named RECOVER_FAILING_BATCH (version 2 configuration) and recover_failing_batch (version 3 configuration). Use this property in conjunction with SAVE_FAILING_BATCH to instruct GPSS to automatically reload the good data in the batch, and retain only the error data in the backup table.
Note: Enabling this feature may have severe performance implications when any data in the Kafka topic generates an expression error.
Note: This feature requires that GPSS has the VMware Greenplum privileges to create a function.
(Introduced in 1.6.0.)
GPSS adds a new extension named dataflow. This extension includes a new data type, gp_jsonb (available for VMware Greenplum version 6.x only), and a new formatter, text_in. (Introduced in 1.6.0.)
GPSS specifies a new version 3 load configuration file format. This format introduces a new YAML organization and keywords. (Introduced in 1.5.0.)
Deprecated Features
Deprecated features may be removed in a future release of the Greenplum Streaming Server. GPSS 1.x deprecates:
- Specifying the gpss.json configuration file to the gpss command standalone (deprecated since 1.6.0). Use the -c | --config option when you specify the file.
- The gpkafka Version 1 configuration file format (deprecated since 1.4.0).
- The gpkafka.yaml (versions 1 and 2) POLL block, including the POLL:BATCHSIZE and POLL:TIMEOUT properties (deprecated since 1.3.5).
Known Issues and Limitations
Greenplum Streaming Server 1.x has these known issues:
- 31998
- In some cases, an EXPLAIN INSERT command internally launched by GPSS on a Kafka job may take a long time to complete. You can work around this issue by specifying the --skip-explain flag to the gpsscli start command when you start the job.
- N/A
- In releases 1.11.0 and 1.11.1, setting the COMMIT.MINIMAL_INTERVAL YAML configuration parameter when working with RabbitMQ data could result in data loss. This issue is resolved in release 1.11.2.
- N/A
- The SAVE_FAILING_BATCH and PARTITIONS configuration properties are not supported when you use the version 1 configuration file format to load data.
- N/A
- The Greenplum Streaming Server may consume a very large amount of system memory when you use it to load a huge (hundreds of GBs) file, in some cases causing the Linux kernel to kill the GPSS server process. Do not use GPSS to load very large files; instead, use gpfdist.
- 30503
- Due to limitations in the VMware Greenplum external table framework, GPSS cannot log a data type conversion error that it encounters while evaluating a mapping expression. For example, if you use the expression EXPRESSION: (jdata->>'id')::int in your load configuration file, and the content of jdata->>'id' is a string that includes non-integer characters, the evaluation fails and GPSS terminates the load job. GPSS cannot log and propagate the error back to the user via gp_read_error_log().
  Workarounds for Kafka:
  - Set the SAVE_FAILING_BATCH load configuration property to true, and then manually load any data batch that included expression errors.
  - Skip the bad Kafka message by specifying a --force-reset-xxx flag on the job start or load command.
  - Correct the message and publish it to another Kafka topic before loading it into VMware Greenplum.
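For known issue 30503, one way to apply the "correct the message and publish it to another Kafka topic" workaround is to screen messages before they reach the topic that GPSS loads. The Python sketch below is a hypothetical producer-side check for the (jdata->>'id')::int example. It only approximates what an int cast accepts (optional surrounding whitespace, an optional sign, ASCII digits, 32-bit range), assumes a top-level JSON object, and does not reproduce every PostgreSQL parsing rule.

```python
import json
import re
from typing import Optional

# Approximation of text an ::int cast accepts: optional whitespace,
# an optional sign, and ASCII digits.
INT_TEXT = re.compile(r"\s*[+-]?[0-9]+\s*")
INT4_MIN, INT4_MAX = -2**31, 2**31 - 1


def castable_to_int(text: Optional[str]) -> bool:
    if text is None:
        return True  # a missing key yields SQL NULL, which casts without error
    if not INT_TEXT.fullmatch(text):
        return False
    return INT4_MIN <= int(text) <= INT4_MAX


def route(raw_message: str) -> str:
    """Return 'load' or 'quarantine' for a JSON object with an "id" field."""
    value = json.loads(raw_message).get("id")
    if value is None:
        text = None
    elif isinstance(value, str):
        text = value                 # ->> returns a JSON string unquoted
    else:
        text = json.dumps(value)     # numbers and booleans as their JSON text
    return "load" if castable_to_int(text) else "quarantine"


for msg in ['{"id": 42}', '{"id": "42abc"}', '{"id": 4.5}']:
    print(msg, route(msg))
```

Messages routed to "quarantine" could be corrected and republished, which keeps the load job from terminating on them.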