title: gpsscli.yaml
gpsscli configuration file.
Synopsis
DATABASE: <db_name>
USER: <user_name>
PASSWORD: <password>
HOST: <coordinator_host>
PORT: <greenplum_port>
VERSION: <version_number>
<DATASOURCE>
    <DATASOURCE_specific_properties>
[SCHEDULE:
    RETRY_INTERVAL: <retry_time>
    MAX_RETRIES: <num_retries>
    RUNNING_DURATION: <run_time>
    AUTO_STOP_RESTART_INTERVAL: <restart_time>
    MAX_RESTART_TIMES: <num_restarts>
    QUIT_AT_EOF_AFTER: <clock_time>]
[ALERT:
    COMMAND: <command_to_run>
    WORKDIR: <directory>
    TIMEOUT: <alert_time>]
You may specify any property value with a template variable, which GPSS substitutes at runtime, using the following syntax:
<PROPERTY>: {{<template_var>}}
Description
You specify the configuration parameters for a VMware Tanzu Greenplum streaming server (GPSS) job in a YAML-formatted configuration file that you provide to the gpsscli submit command. There are two types of configuration parameters in this file: VMware Tanzu Greenplum connection parameters, and parameters specific to the data source from which you will load data into Greenplum.
This reference page uses the name gpsscli.yaml to refer to this file; you may choose your own name for the file.
GPSS currently supports loading data from Kafka and file data sources. Refer to Loading Kafka Data into Greenplum and Loading File Data into Greenplum for detailed information about using GPSS to load data into Tanzu Greenplum.
The gpsscli utility processes the YAML configuration file in order, using indentation (spaces) to determine the document hierarchy and the relationships between the sections. The use of white space in the file is significant, and keywords are case-sensitive.
Keywords and Values
VMware Tanzu Greenplum Options
- DATABASE: db_name
- The name of the Tanzu Greenplum database.
- USER: user_name
- The name of the Tanzu Greenplum user/role. This user_name must have permissions as described in Configuring Tanzu Greenplum Role Privileges.
- PASSWORD: password
- The password for the Tanzu Greenplum user/role. By default, the GPSS client passes the password to the GPSS server in clear text. When the password has a SHADOW: prefix, it represents a shadowed password string, and GPSS uses the Shadow:Key specified in its gpss.json configuration file, or a default key, to decode the password.
- HOST: coordinator_host
- The host name or IP address of the Tanzu Greenplum coordinator host.
- PORT: greenplum_port
- The port number of the Tanzu Greenplum server on the coordinator host.
- VERSION: version_number
- The version of the gpsscli configuration file. GPSS supports versions 1 and 2 of this format.
DATASOURCE: Options
- DATASOURCE
- The data source. GPSS currently supports KAFKA and FILE data sources; refer to gpkafka-v2.yaml and filesource-v2.yaml for configuration file format and parameters.
- DATASOURCE_specific_parameters
- Parameters specific to the data source.
Job SCHEDULE: Options
- SCHEDULE:
- Controls the frequency and interval of restarting jobs.
- RETRY_INTERVAL: retry_time
- The period of time that GPSS waits before retrying a failed job. You can specify the time interval in day (d), hour (h), minute (m), second (s), or millisecond (ms) integer units; do not mix units. The default retry interval is 5m (5 minutes).
- MAX_RETRIES: num_retries
- The maximum number of times that GPSS attempts to retry a failed job. The default is 0, do not retry. If you specify a negative value, GPSS retries the job indefinitely.
- RUNNING_DURATION: run_time
- The amount of time after which GPSS automatically stops a job. GPSS does not automatically stop a job by default.
- AUTO_STOP_RESTART_INTERVAL: restart_time
- The amount of time after which GPSS restarts a job that it stopped due to reaching RUNNING_DURATION.
- MAX_RESTART_TIMES: num_restarts
- The maximum number of times that GPSS restarts a job that it stopped due to reaching RUNNING_DURATION. The default is 0, do not restart the job. If you specify the value -1, GPSS restarts the job indefinitely. You may use gpsscli stop to stop the jobs from being restarted indefinitely.
- QUIT_AT_EOF_AFTER: clock_time
- The clock time after which GPSS stops a job every day when it encounters an EOF. By default, GPSS does not automatically stop a job that reaches EOF. GPSS never stops a job when the current time is before clock_time, even when GPSS encounters an EOF.
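The SCHEDULE: options above might be combined as in the following minimal sketch. The values are illustrative only, and the sketch assumes that RUNNING_DURATION and AUTO_STOP_RESTART_INTERVAL accept the same unit suffixes described for RETRY_INTERVAL. QUIT_AT_EOF_AFTER is left out because this page does not describe the clock_time format.

```yaml
# Illustrative values only; adjust for your own job.
SCHEDULE:
    RETRY_INTERVAL: 30s               # wait 30 seconds before retrying a failed job
    MAX_RETRIES: 3                    # retry a failed job at most three times
    RUNNING_DURATION: 4h              # assumed unit syntax: stop the job after 4 hours
    AUTO_STOP_RESTART_INTERVAL: 10m   # assumed unit syntax: restart 10 minutes after that stop
    MAX_RESTART_TIMES: -1             # restart indefinitely; use gpsscli stop to end the cycle
```

With these settings, a job that fails is retried up to three times at 30-second intervals, while a job that reaches RUNNING_DURATION is stopped and restarted repeatedly until you stop it.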
Job ALERT: Options
- ALERT:
- Controls notification when a job is stopped for any reason (success, completion, error, user-initiated stop).
- COMMAND: command_to_run
- The command (program or script) that the GPSS server runs on the GPSS server host, including arguments. You must specify the absolute path of the command, and the command must be executable by GPSS.
- If command_to_run is a script, you must specify the interpreter (for example, #!/bin/bash or #!/usr/bin/python3) in the script file.
- command_to_run has access to job-related environment variables that GPSS sets, including: $GPSSJOB_NAME, $GPSSJOB_STATUS, and $GPSSJOB_DETAIL.
- WORKDIR: directory
- The working directory for command_to_run. The default working directory is the directory from which you started the GPSS server process. If you specify a relative path, it is relative to the directory from which you started the GPSS server process.
- TIMEOUT: alert_time
- The maximum amount of time that command_to_run may run. GPSS starts the alert timer after a job stops, and forcibly stops command_to_run if it is still running after alert_time. You can specify the time interval in day (d), hour (h), minute (m), or second (s) integer units; do not mix units. The default alert timeout is -1s (no timeout).
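As a minimal sketch of the ALERT: options above, the following block runs a notification script whenever the job stops. The script path and working directory are hypothetical; substitute an executable that exists on your GPSS server host.

```yaml
# Hypothetical paths; the script must exist and be executable by GPSS.
ALERT:
    COMMAND: /usr/local/bin/notify_job_stopped.sh   # absolute path is required
    WORKDIR: /tmp                                   # working directory for the command
    TIMEOUT: 30s                                    # forcibly stop the command after 30 seconds
```

A script used this way would declare its interpreter (for example, #!/bin/bash) and could read $GPSSJOB_NAME, $GPSSJOB_STATUS, and $GPSSJOB_DETAIL to build its message.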
Template Variables
GPSS supports using template variables to specify property values in the load configuration file.
You specify a template variable value in the load configuration file as follows:
<PROPERTY>: {{<template_var>}}
For example:
MAX_RETRIES: {{numretries}}
GPSS substitutes the template variable with a value that you specify via the -p | --property <template_var=value> option to the gpsscli dryrun, gpsscli submit, gpsscli load, or gpkafka load command.
For example, if the command line specifies:
--property numretries=10
GPSS substitutes occurrences of {{numretries}} in the load configuration file with the value 10 before submitting the job, and uses that value while the job is running.
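Because any property value may be a template variable, the same mechanism can keep environment-specific settings out of the file. The sketch below uses hypothetical variable names (dbname, gphost, gppass) and assumes that the --property option can be given once for each variable.

```yaml
# Hypothetical template variable names; supply each value at submit time,
# for example: --property dbname=ops --property gphost=mdw-1 --property gppass=changeme
DATABASE: {{dbname}}
USER: gpadmin
PASSWORD: {{gppass}}
HOST: {{gphost}}
PORT: 15432
```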
Examples
Submit a job to load data into Tanzu Greenplum as defined in the load configuration file named loadit.yaml:
$ gpsscli submit loadit.yaml
Example Tanzu Greenplum configuration parameters in loadit.yaml:
DATABASE: ops
USER: gpadmin
PASSWORD: changeme
HOST: mdw-1
PORT: 15432
<DATASOURCE_block> ...
See Also
gpsscli load, gpsscli submit, gpkafka load, filesource-v2.yaml, gpkafka-v2.yaml