You can configure a Greenplum system to use proxies for interconnect communication to reduce the use of connections and ports during query processing.
The Greenplum interconnect (the networking layer) refers to the inter-process communication between segments and the network infrastructure on which this communication relies. For information about the Greenplum architecture and interconnect, see About the Greenplum Architecture.
In general, when running a query, a QD (query dispatcher) on the Greenplum coordinator creates connections to one or more QE (query executor) processes on segments, and a QE can create connections to other QEs. For a description of Greenplum query processing and parallel query processing, see About Greenplum Query Processing.
By default, connections between the QD on the coordinator and QEs on segment instances, and between QEs on different segment instances, each require a separate network port. You can configure a Greenplum system to use proxies when Greenplum communicates between the QD and QEs and between QEs on different segment instances. The interconnect proxies require only one network connection for Greenplum internal communication between two segment instances, so they consume fewer connections and ports than TCP mode and perform better than UDPIFC mode in a high-latency network.
To enable interconnect proxies for the Greenplum system, set these system configuration parameters.
- List the proxy ports with the parameter gp_interconnect_proxy_addresses. You must specify a proxy port for the coordinator, standby coordinator, and all segment instances.
- Set the parameter gp_interconnect_type to proxy.
When expanding a Greenplum Database system, you must deactivate interconnect proxies before adding new hosts and segment instances to the system, and you must update the gp_interconnect_proxy_addresses parameter with the newly added segment instances before you re-enable interconnect proxies.
Example
This example sets up a Greenplum system to use proxies for the Greenplum interconnect when running queries. The example sets the gp_interconnect_proxy_addresses parameter and tests the proxies before setting the gp_interconnect_type parameter for the Greenplum system.
- Setting the Interconnect Proxy Addresses
- Testing the Interconnect Proxies
- Setting Interconnect Proxies for the System
Setting the Interconnect Proxy Addresses
Set the gp_interconnect_proxy_addresses parameter to specify the proxy ports for the coordinator and segment instances. The value has the following format, and you must specify it as a single-quoted string.
<db_id>:<cont_id>:<seg_address>:<port>[, ... ]
For the coordinator, the standby coordinator, and each segment instance, the first three fields (db_id, cont_id, and seg_address) can be found in the gp_segment_configuration catalog table. The fourth field, port, is the proxy port for the Greenplum coordinator or a segment instance.
- db_id is the dbid column in the catalog table.
- cont_id is the content column in the catalog table.
- seg_address is the IP address or hostname corresponding to the address column in the catalog table.
- port is the TCP/IP port for the segment instance proxy that you specify.
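The value format described above can be illustrated with a small Python sketch. This is not part of Greenplum: the ProxyEntry type and the format_proxy_addresses helper are hypothetical names introduced here, and the sample rows reuse the addresses and ports from the gpconfig example on this page.

```python
from typing import Iterable, NamedTuple


class ProxyEntry(NamedTuple):
    """One instance entry (hypothetical helper type, not a Greenplum API)."""
    db_id: int    # dbid column of gp_segment_configuration
    cont_id: int  # content column of gp_segment_configuration
    address: str  # address column (hostname or IP address)
    port: int     # proxy port that you choose for this instance


def format_proxy_addresses(entries: Iterable[ProxyEntry]) -> str:
    """Join entries as <db_id>:<cont_id>:<seg_address>:<port>, comma separated."""
    parts = []
    for e in entries:
        # Reject values that cannot be TCP ports before building the string.
        if not 0 < e.port < 65536:
            raise ValueError(f"invalid proxy port {e.port} for dbid {e.db_id}")
        parts.append(f"{e.db_id}:{e.cont_id}:{e.address}:{e.port}")
    return ",".join(parts)


# Sample rows: a coordinator and a single segment instance.
entries = [
    ProxyEntry(1, -1, "192.168.180.50", 35432),
    ProxyEntry(2, 0, "192.168.180.54", 35000),
]
value = format_proxy_addresses(entries)
print(value)
```

The resulting string still has to be wrapped in single quotes when it is set as the parameter value.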
If a segment instance hostname is bound to a different IP address at runtime, you must run gpstop -u to reload the gp_interconnect_proxy_addresses value.
This is an example PL/Python function that displays or sets the segment instance proxy port values for the gp_interconnect_proxy_addresses parameter. To create and run the function, you must enable PL/Python in the database with the CREATE EXTENSION plpython3u command.
--
-- A PL/Python function to set up the interconnect proxy addresses.
-- Requires the Python module os.
--
-- Usage:
-- select * from my_setup_ic_proxy(500, ''); -- display IC proxy values for segments
-- select * from my_setup_ic_proxy(500, 'update proxy'); -- update the gp_interconnect_proxy_addresses parameter
--
-- The first argument, "delta", is used to calculate the proxy port with this formula:
--
-- proxy_port = postmaster_port + delta
--
-- If "delta" is 0, we automatically calculate one using the difference between
-- the max and min port values in gp_segment_configuration.
--
-- The second argument, "action", is used to update the gp_interconnect_proxy_addresses parameter.
-- The parameter is not updated unless "action" is 'update proxy'.
--
create or replace function my_setup_ic_proxy(delta int, action text)
returns table(dbid smallint, content smallint, address text, port int) as $$
    import os

    # PL/Python passes arguments as globals; declare delta global so that
    # it can be reassigned below.
    global delta

    results = []
    value = ''

    if delta != 0:
        # Perform validation of configuration with given delta.
        conflicting_ports = plpy.execute('''
            with proxy_ports
                     as (select address, port + %s from gp_segment_configuration),
                 segment_ports
                     as (select address, port from gp_segment_configuration)
            select * from proxy_ports intersect select * from segment_ports
        ''' % delta)
        if conflicting_ports.nrows() > 0:
            plpy.error('Chosen delta creates conflicting ports. Please choose a different delta or pass delta = 0 for automatic delta assignment')
    else:
        # Compute the delta ourselves
        delta = plpy.execute('select max(port) - min(port) + 1 as step from gp_segment_configuration')[0]['step']
        plpy.notice('Calculated a delta of %d' % delta)

    segs = plpy.execute('''SELECT dbid, content, port, address
                             FROM gp_segment_configuration
                            ORDER BY 1''')
    for seg in segs:
        dbid = seg['dbid']
        content = seg['content']
        port = seg['port']
        address = seg['address']

        # decide the proxy port
        port = port + delta

        # append to the result list
        results.append((dbid, content, address, port))

        # build the value for the GUC
        if value:
            value += ','
        value += '{}:{}:{}:{}'.format(dbid, content, address, port)

    if action.lower() == 'update proxy':
        plpy.notice('Setting config with gpconfig')
        gpconfig_cmd = " ".join(["gpconfig", "--skipvalidation", "-c", "gp_interconnect_proxy_addresses", "-v", "\"'{}'\"".format(value)])
        plpy.notice('Running command: {}'.format(gpconfig_cmd))
        rc = os.system(gpconfig_cmd)
        if rc != 0:
            plpy.error('Failed to set config with gpconfig. Please consult the gpconfig log file. rc = {}'.format(rc))
        plpy.notice('Reloading config with gpstop')
        rc = os.system('gpstop -u')
        if rc != 0:
            plpy.error('Failed to reload config with gpstop. Please consult the gpstop log file.')
    else:
        rerun_with_cmd = '''select * from my_setup_ic_proxy(%d, 'update proxy')''' % delta
        plpy.notice('''If the settings displayed below are acceptable, re-run the following command:''')
        plpy.notice(rerun_with_cmd)

    return results
$$ language plpython3u execute on coordinator;
Running this command lists the segment instance values for the gp_interconnect_proxy_addresses parameter, applying a port delta of 500. Adjust the delta according to the available ports in your environment.
select * from my_setup_ic_proxy(500, '');
When choosing an explicit delta, the resultant port assignments will be checked automatically for conflicts.
Once a suitable port delta has been specified, run the following command to set the parameter.
select my_setup_ic_proxy(500, 'update proxy');
Passing a delta of 0 makes the function automatically assign a port delta that results in a non-conflicting port assignment.
The following command will thus automatically generate a port assignment and apply that assignment.
select my_setup_ic_proxy(0, 'update proxy');
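The conflict check and the automatic delta used by the function can be reproduced outside the database. The following is a minimal sketch in plain Python that mirrors the two SQL queries in the function; the host names cdw, sdw1, and sdw2 and their ports are made-up sample data, and the helper names are hypothetical.

```python
from typing import List, Set, Tuple

Instance = Tuple[str, int]  # (address, postmaster port)


def automatic_delta(instances: List[Instance]) -> int:
    """Mirror of: select max(port) - min(port) + 1 from gp_segment_configuration."""
    ports = [port for _, port in instances]
    return max(ports) - min(ports) + 1


def conflicting_ports(instances: List[Instance], delta: int) -> Set[Instance]:
    """Return (address, port) pairs that are both a postmaster port and a proxy port."""
    existing = set(instances)
    proxied = {(addr, port + delta) for addr, port in instances}
    return existing & proxied


# Hypothetical cluster layout used only for illustration.
instances = [("cdw", 5432), ("sdw1", 6000), ("sdw1", 6500), ("sdw2", 6000)]

# With a delta of 500, the proxy port for sdw1:6000 lands on sdw1:6500.
print(conflicting_ports(instances, 500))

# The automatic delta pushes every proxy port above the highest postmaster port.
delta = automatic_delta(instances)
print(delta, conflicting_ports(instances, delta))
```

The automatic delta cannot collide with an existing postmaster port because every proxy port is at least min(port) + delta, which equals max(port) + 1. Like the function itself, this check only compares against ports recorded in gp_segment_configuration, not against ports used by other services on the hosts.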
As an alternative, you can run the gpconfig utility to set the gp_interconnect_proxy_addresses parameter. To set the value as a string, specify a single-quoted string that is enclosed in double quotes. The example Greenplum system consists of a coordinator and a single segment instance.
gpconfig --skipvalidation -c gp_interconnect_proxy_addresses -v "'1:-1:192.168.180.50:35432,2:0:192.168.180.54:35000'"
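If you invoke gpconfig from a script rather than a shell, the quoting changes slightly: the shell removes the outer double quotes, so gpconfig itself receives the value with only the single quotes. The sketch below, which uses a hypothetical gpconfig_argv helper, builds the equivalent argument list and checks it against the shell command shown above by using Python's shlex module.

```python
import shlex
from typing import List


def gpconfig_argv(value: str) -> List[str]:
    """Build the argument list equivalent to the shell command in this topic."""
    if "'" in value or '"' in value:
        raise ValueError("value must not contain quote characters")
    return [
        "gpconfig", "--skipvalidation",
        "-c", "gp_interconnect_proxy_addresses",
        # No double quotes here: there is no shell to strip them.
        "-v", f"'{value}'",
    ]


argv = gpconfig_argv("1:-1:192.168.180.50:35432,2:0:192.168.180.54:35000")

# The shell command from this topic, split the way a POSIX shell would split it.
doc_command = (
    "gpconfig --skipvalidation -c gp_interconnect_proxy_addresses "
    "-v \"'1:-1:192.168.180.50:35432,2:0:192.168.180.54:35000'\""
)
print(shlex.split(doc_command) == argv)
```

The list could then be passed to subprocess.run(argv, check=True) on the coordinator host, assuming gpconfig is on the PATH of the calling user.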
After setting the gp_interconnect_proxy_addresses parameter, reload the postgresql.conf file with the gpstop -u command. This command does not stop and restart the Greenplum system.
Testing the Interconnect Proxies
To test the proxy ports configured for the system, you can set the PGOPTIONS environment variable when you start a psql session in a command shell. This command sets the environment variable to enable interconnect proxies, starts psql, and logs into the database mytest.
PGOPTIONS="-c gp_interconnect_type=proxy" psql -d mytest
You can run queries in the psql session to test the system. For example, you can run a query that accesses all the primary segment instances. This query displays the segment IDs and the number of rows from the table sales on each segment instance.
# SELECT gp_segment_id, COUNT(*) FROM sales GROUP BY gp_segment_id ;
Setting Interconnect Proxies for the System
After you have tested the interconnect proxies for the system, set the server configuration parameter for the system with the gpconfig utility.
gpconfig -c gp_interconnect_type -v proxy
Reload the postgresql.conf file with the gpstop -u command. This command does not stop and restart the Greenplum system.