If you are using the Greenplum Streaming Server (GPSS) in your current Greenplum Database installation, you must perform the GPSS upgrade procedure when:
- You upgrade to a newer version of Greenplum Database, or
- You install a new standalone GPSS package on your ETL host or in your Greenplum Database installation.
The GPSS upgrade procedures describe how to upgrade GPSS in your Greenplum Database installation or on your ETL host. These procedures use GPSS.from to refer to your currently installed GPSS, and GPSS.new to refer to the GPSS installed when you upgrade to the new version of Greenplum Database or install a new GPSS package.
The GPSS upgrade procedure has two parts. You perform one procedure before, and one procedure after, you upgrade to a new version of Greenplum Database or GPSS:
- Step1: GPSS Pre-Upgrade Actions
- Upgrade to a new Greenplum Database version or install a new GPSS package.
- Step2: Upgrading GPSS
Step1: GPSS Pre-Upgrade Actions
Perform this procedure in your GPSS.from installation before you upgrade to a new version of Greenplum Database or GPSS:
Log in to the Greenplum Database coordinator host or the ETL host and set up your environment. For example:
$ ssh gpadmin@<gpcoord>
gpadmin@gpcoord$ . /usr/local/greenplum-db/greenplum_path.sh
Or:
$ ssh etluser@<etlhost>
etluser@etlhost$ . /usr/local/gpss/gpss_path.sh
Identify and note the current version (GPSS.from) of GPSS. For example:
$ gpss --version
Stop all gpss jobs that are in the Running state.
Stop all running gpss instances.
Upgrade to the new version of Greenplum Database or install a new version of GPSS, and then continue your GPSS upgrade with Step2: Upgrading GPSS.
Step2: Upgrading GPSS
After you upgrade to the new version of Greenplum Database or install the new version of GPSS in your Greenplum installation, perform the following procedure to upgrade the GPSS.new software:
Log in to the Greenplum Database coordinator host or the ETL host and set up your environment. For example, on the coordinator:
$ ssh gpadmin@<gpcoord>
gpadmin@gpcoord$ . /usr/local/greenplum-db/greenplum_path.sh
Identify and note the new version (GPSS.new) of GPSS. For example:
gpadmin@gpcoord$ gpss --version
If you are upgrading from GPSS version 1.3.0 or older:
GPSS 1.3.0 introduced a regression that caused it to no longer recognize history tables (internal tables that GPSS creates for each job) that were created with GPSS 1.2.6. This regression could cause GPSS to load duplicate Kafka messages into Greenplum. This issue is resolved in GPSS 1.3.1.
You are not required to perform any upgrade steps related to this issue; GPSS will automatically perform the required actions when you resubmit and restart a load job that you initiated with GPSS 1.3.0. GPSS's upgrade actions are dependent upon the GPSS version(s) from which you are upgrading, and are described below:
- If you are upgrading directly from GPSS 1.2.6 or older, GPSS performs no special upgrade actions.
- If you are upgrading from GPSS 1.3.0 and you previously submitted load jobs with both GPSS 1.2.6 or older and 1.3.0, GPSS copies the internal history table for each submitted job to a table with the correct name format, and uses those tables. GPSS also retains and renames the internal history table for each GPSS 1.3.0 job, adding the prefix deprecated_.
- If you first and only used GPSS 1.3.0 and are upgrading from this version, GPSS renames the internal history table for each restarted job.
If you are upgrading from GPSS version 1.3.1 or older:
- GPSS 1.3.2 changes the gpss.json configuration file:
  - The new file format allows you to specify unique SSL Certificates for GPSS and gpfdist. If you are using SSL to encrypt communication between GPSS and Kafka, Greenplum, or the GPSS client, you must update the gpss.json server configuration file to configure the correct Certificate block.
  - The ListenAddress:SSL property is removed. Ensure that you remove this property from all GPSS server configuration files.
- GPSS 1.3.2 renames gpkafka check to gpkafka history. If you have any scripts or programs that reference gpkafka check, you must replace these references with gpkafka history.
- GPSS 1.3.2 removes the ENCRYPTION property from the gpkafka.yaml job configuration file. Ensure that you remove this property from all job configuration files, and that you provide Kafka SSL configuration properties via the PROPERTY block in the file.
- GPSS 1.3.2 removes the LOCAL_HOSTNAME and LOCAL_PORT properties from the gpkafka.yaml job configuration file. You must remove these properties from all job configurations, and specify the gpfdist configuration for each job in one of the following ways:
  - If you are loading data with gpkafka load, provide the --config gpfdistconfig.json or --gpfdist-host hostaddr and --gpfdist-port portnum options when you run the command.
  - If you are loading data with the gpsscli job management commands, ensure that the gpss.json configuration file for the gpss server instance servicing the request specifies the desired Gpfdist:Host and Gpfdist:Port settings.
- GPSS 1.3.2 removes the --no-reuse flag from the gpsscli load and gpsscli start commands. If you have any scripts or programs that reference this flag, you must remove the references.
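The GPSS 1.3.2 changes listed above mostly amount to finding and removing old references in configuration files and scripts. The following Python sketch shows one way you might locate them before you edit anything. It is an illustration only and is not part of GPSS: the function names find_legacy_references and drop_listen_address_ssl are hypothetical, the scan is a plain line-by-line text match rather than a YAML parser, and it assumes that ListenAddress is a JSON object in gpss.json.

```python
import json
import re

# Patterns derived from the GPSS 1.3.2 upgrade notes; the labels are illustrative.
LEGACY_PATTERNS = {
    "ENCRYPTION property (removed)": re.compile(r"^\s*ENCRYPTION\s*:"),
    "LOCAL_HOSTNAME property (removed)": re.compile(r"^\s*LOCAL_HOSTNAME\s*:"),
    "LOCAL_PORT property (removed)": re.compile(r"^\s*LOCAL_PORT\s*:"),
    "gpkafka check (renamed to gpkafka history)": re.compile(r"\bgpkafka\s+check\b"),
    "--no-reuse flag (removed)": re.compile(r"(?<!\S)--no-reuse\b"),
}


def find_legacy_references(text):
    """Return (line_number, label) pairs for each legacy reference found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in LEGACY_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def drop_listen_address_ssl(config_text):
    """Return gpss.json text with the ListenAddress:SSL property removed, if present.

    Assumes ListenAddress is a JSON object; other settings are left untouched.
    """
    config = json.loads(config_text)
    listen = config.get("ListenAddress")
    if isinstance(listen, dict):
        listen.pop("SSL", None)
    return json.dumps(config, indent=4)
```

You could run find_legacy_references over the contents of each job configuration file and script, then review every reported line by hand. The sketch only reports matches; it does not add the Certificate block or the gpfdist settings that replace the removed properties.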
If you developed a client application with GPSS 1.3.5 or earlier and you want to use the new MaxErrorRows or Abort session capabilities added to the Close service that were introduced in GPSS 1.3.6, you must:
Edit the gpss.proto service definition and add the new CloseRequest field(s):
message CloseRequest {
  Session session = 1;
  int32 MaxErrorRows = 2;
  bool Abort = 3;
}
Re-generate the GPSS client classes.
Add code to utilize the new fields.
Re-compile and re-distribute your GPSS client application. Refer to Developing a Batch Data Client for supporting information.
If you are upgrading from GPSS version 1.4.x or older:
- GPSS 1.4.0 removes the gpsscli history and gpkafka history commands. If you have any scripts or programs that reference these commands, you must remove the references.
- GPSS 1.4.1 changes the client and server log file format to CSV. If you created any scripts that parsed the previous log file format, you must update that script logic.
- GPSS 1.4.1 adds a new, separate logfile to track Kafka job progress. If you created any scripts that relied on the existence of progress information in the client or server log files, you must update that script logic.
If you are upgrading from GPSS version 1.6.x or older and you have registered the dataflow extension in any database, you must drop and re-create the extension:
DROP EXTENSION dataflow;
CREATE EXTENSION dataflow;
If you are upgrading from GPSS version 1.7.x or older:
- GPSS 1.8.0 changes the name of the Kafka version 3 (Beta) load configuration file window property to task. If you have any Kafka load configuration files that specify window:, you must change the references to task:.
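If you have many version 3 (Beta) load configuration files, the window: to task: rename can be scripted. The following Python sketch is one possible approach, not a GPSS utility. The function name rename_window_to_task is hypothetical, and the sketch assumes that window appears as a mapping key at the start of its own line (after optional indentation). It edits the text directly instead of parsing the YAML, so that comments and layout are preserved.

```python
import re

# Match "window:" only when it is the key at the start of a line, so that
# keys such as "windowed:" or values that contain the word are left alone.
_WINDOW_KEY = re.compile(r"^([ \t]*)window([ \t]*):", re.MULTILINE)


def rename_window_to_task(yaml_text):
    """Return yaml_text with each line-leading window: key renamed to task:."""
    return _WINDOW_KEY.sub(r"\1task\2:", yaml_text)
```

Review the result before you resubmit a job. A key written in another form, for example as the first key of a list item, is not matched by this pattern and must be changed by hand.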
If you are upgrading from GPSS version 1.9.x or older:
- GPSS 1.10.0 changes the naming format of its server log files as described in the Version 1.10.0 release notes and adds a job_id field to the content of the server log file. You must update any scripts that you have written that rely on the log file naming format or the log file content of previous releases.
If you developed a client application with GPSS 1.9.x or earlier and you want to use the new session timeout capability added to the Connect service that was introduced in GPSS 1.10.0, you must:
Edit the gpss.proto service definition and add the new SessionTimeout field to the ConnectRequest message:
message ConnectRequest {
  string Host = 1;
  ...
  bool UseSSL = 6;
  int32 SessionTimeout = 7;
}
Re-generate the GPSS client classes.
Add code to utilize the new field.
Re-compile and re-distribute your GPSS client application. Refer to Developing a Batch Data Client for supporting information.
If you are upgrading from GPSS version 1.10.0:
- GPSS 1.10.1 changes the naming format of its per-run server log files as described in the Version 1.10.1 release notes. You must update any scripts that you have written that rely on the per-run server log file naming format introduced in version 1.10.0.
If you installed a new version of Greenplum Database, or you installed the GPSS gppkg or .tar.gz packages in your Greenplum installation, you must drop and re-create the GPSS extension in any Greenplum database in which you are using GPSS to load data. A database superuser or the database owner must run these SQL commands:
DROP EXTENSION gpss;
CREATE EXTENSION gpss;
(If the extension does not already exist, GPSS automatically creates it in a database the first time a Greenplum superuser or the database owner submits a load job to any table that resides in that database.)
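When GPSS loads data into several databases, you may prefer to script the drop and re-create step. The following Python sketch builds one psql command per database from the two SQL statements shown above. It is an illustration under stated assumptions: the function names are hypothetical, psql is assumed to be on the PATH of a user who is a superuser or the database owner, and you supply the list of database names yourself.

```python
import subprocess


def build_recreate_commands(databases, extension="gpss"):
    """Build one psql command line per database that drops and re-creates the extension."""
    sql = f"DROP EXTENSION {extension}; CREATE EXTENSION {extension};"
    return [["psql", "-d", db, "-c", sql] for db in databases]


def recreate_extension(databases, extension="gpss"):
    """Run the generated commands in order, stopping at the first failure."""
    for cmd in build_recreate_commands(databases, extension):
        # check=True raises CalledProcessError if psql exits with a non-zero status.
        subprocess.run(cmd, check=True)
```

Include only databases in which the extension is already registered, because DROP EXTENSION fails when the extension does not exist. Passing extension="dataflow" produces the equivalent statements for the dataflow extension.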
Restart your gpss instances.
Resubmit and restart your GPSS jobs.
For any Kafka job that you resubmit and restart, GPSS will consume Kafka messages from the offset associated with the latest timestamp recorded in the history table for the job.