Tanzu Greenplum 7

Setting Up VMware vSphere Network

Last Updated March 11, 2025

Requirements

The minimum networking requirement for Greenplum on vSphere is connectivity between the Greenplum Segment Host Virtual Machines, which is used as the Greenplum Interconnect.

The interconnect is used by Greenplum for internal communications. It requires a number of static IP addresses that will be used for Greenplum Database Coordinator and Segment Hosts. For more information about Greenplum architecture, see the Greenplum Architecture Documentation.

Determine the number of required static IP addresses based on the number of virtual machines used for the Greenplum Coordinator and Segment Hosts in your environment. For example, a Greenplum Cluster with a Coordinator, a Standby Coordinator, and 10 Segment Hosts running 3 Greenplum Segments each requires 12 static IP addresses (1 + 1 + 10); the segment instances share their host's address, so the number of segments per host does not change the total. For more information about sizing a Greenplum Cluster on vSphere, see Planning VMware vSphere with Greenplum.
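
As a minimal sketch of this arithmetic (in Python, with illustrative host counts), the total is one address per Coordinator host plus one per Segment Host:

    def required_static_ips(segment_hosts: int, standby_coordinator: bool = True) -> int:
        """One address for the Coordinator, one for the Standby Coordinator (if
        deployed), and one per Segment Host; segment instances share their
        host's address."""
        return 1 + (1 if standby_coordinator else 0) + segment_hosts

    # The example above: Coordinator + Standby Coordinator + 10 Segment Hosts = 12
    print(required_static_ips(segment_hosts=10))  # prints 12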

Best Practices

The following recommendations help ensure that a Greenplum deployment achieves the highest performance, high availability, and security.

High Availability

When creating and configuring the distributed port groups in vSphere to use with Greenplum, the following configuration recommendations ensure high availability for Greenplum networking:

  • Use multiple, active/standby uplinks.
  • Separate the active/standby uplinks across different NICs.
  • Connect the active/standby uplinks to different switches, and interlink the switches.

Network Throughput

Greenplum deployments on vSphere typically use two networks: the Greenplum Interconnect, and the network between the vSphere compute hosts and the distributed storage system (for example vSAN, PureStorage, or PowerFlex). Both networks require high throughput for optimal Greenplum performance.

It is recommended that the Greenplum Interconnect has a minimum of 10 Gb/s of available throughput with low latency.

It is recommended that the networking used for storage has a minimum of 100 Gb/s of available throughput with low latency.
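
As a quick way to confirm that the interconnect path between two Segment Host virtual machines delivers the expected bandwidth, you can run an iperf3 test between them. The following Python sketch is illustrative only: it assumes iperf3 is installed and already running in server mode (iperf3 -s) on a peer host named sdw2; it is not part of the Greenplum tooling.

    import json
    import subprocess

    # Assumes `iperf3 -s` is already running on the peer segment host (here "sdw2").
    result = subprocess.run(
        ["iperf3", "-c", "sdw2", "-P", "4", "-t", "10", "--json"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)

    # Sum the parallel streams and convert to Gb/s.
    gbps = report["end"]["sum_received"]["bits_per_second"] / 1e9
    print(f"measured throughput: {gbps:.1f} Gb/s")
    if gbps < 10:
        print("below the 10 Gb/s recommended for the Greenplum Interconnect")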

Dedicated Networking

The networking used for the Greenplum Interconnect, and the networking between the vSphere compute hosts and the distributed storage system (for example vSAN, PureStorage, or PowerFlex), are used extensively during normal Greenplum operations. It is recommended that these networks be dedicated to Greenplum operations.

MTU

It is recommended that the MTU settings throughout the infrastructure used for Greenplum are consistent with the Greenplum configuration described in Configuring Your Systems.
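
One simple way to confirm that the MTU is consistent end to end is to send a non-fragmenting ICMP payload of the expected size between Segment Host virtual machines. The following Python sketch assumes Linux guests and jumbo frames (MTU 9000, hence an 8972-byte payload after 28 bytes of IP and ICMP headers); adjust the values to match your own MTU setting, and treat the host names as placeholders.

    import subprocess

    MTU = 9000                 # assumed jumbo-frame setting; use your actual MTU
    PAYLOAD = MTU - 28         # subtract the 20-byte IP header and 8-byte ICMP header
    PEERS = ["sdw1", "sdw2"]   # illustrative segment host names

    for host in PEERS:
        # -M do sets the "don't fragment" bit, so any MTU mismatch along the path fails fast.
        proc = subprocess.run(
            ["ping", "-M", "do", "-c", "3", "-s", str(PAYLOAD), host],
            capture_output=True, text=True,
        )
        status = "ok" if proc.returncode == 0 else "MTU mismatch or host unreachable"
        print(f"{host}: {status}")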

When Greenplum is deployed on vSphere, validate the Distributed Virtual Switch MTU configuration. Check the Distributed Virtual Switch settings to make sure that the Maximum Transmission Unit (MTU) is set to the expected value (a scripted check is sketched after these steps):

  1. On the VMware vSphere Client Home page, click Networking and navigate to your distributed switch.
  2. Navigate to Configure > Properties > Advanced.
  3. Verify that MTU is set to the expected value.
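
If you prefer to script this check, the following Python sketch reads the configured MTU of a distributed switch through pyVmomi, the Python bindings for the vSphere API. The vCenter address, credentials, and switch name are placeholders, and pyVmomi must be installed separately.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    EXPECTED_MTU = 9000          # assumed value; match your Greenplum MTU setting
    DVS_NAME = "greenplum-vds"   # illustrative distributed switch name

    ctx = ssl._create_unverified_context()   # lab use only; verify certificates in production
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="changeme", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.DistributedVirtualSwitch], True)
        for dvs in view.view:
            if dvs.name == DVS_NAME:
                mtu = dvs.config.maxMtu
                state = "ok" if mtu == EXPECTED_MTU else "unexpected"
                print(f"{dvs.name}: MTU {mtu} ({state})")
    finally:
        Disconnect(si)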

Because the distributed switch and the physical switches are configured separately, settings such as MTU, VLAN, and teaming can drift out of sync. To help identify such issues, VMware vSphere provides a vDS Health Check that detects network configuration inconsistencies between the distributed switch and the physical switches. For more information, see the VMware vSphere Documentation.

It is recommended to enable the vDS Health Check only during deployment and testing, and to turn it off once you have verified that there are no configuration inconsistencies.

To enable the VMware vSphere Distributed Switch Health Check from vCenter (a scripted equivalent is sketched after these steps):

  1. Click Menu > Networking.
  2. Select your distributed switch.
  3. Click Configure > Settings > Health Check.
  4. Click Edit on the right pane.
  5. For VLAN and MTU, select Enabled and leave Interval as default (1 minute).
  6. For Teaming and Failover, select Enabled and leave Interval as default (1 minute).
  7. Click OK.
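
The same settings can be applied programmatically. The following Python sketch uses pyVmomi and assumes the vSphere API's UpdateDVSHealthCheckConfig_Task method and its VMware DVS health check configuration objects; the vCenter details and switch name are placeholders, and the connection boilerplate matches the earlier MTU sketch.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    DVS_NAME = "greenplum-vds"   # illustrative distributed switch name

    ctx = ssl._create_unverified_context()   # lab use only
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="changeme", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.dvs.VmwareDistributedVirtualSwitch], True)
        dvs = next(d for d in view.view if d.name == DVS_NAME)

        # Enable both checks with the default 1-minute interval, mirroring the steps above.
        vlan_mtu = vim.dvs.VmwareDistributedVirtualSwitch.VlanMtuHealthCheckConfig(
            enable=True, interval=1)
        teaming = vim.dvs.VmwareDistributedVirtualSwitch.TeamingHealthCheckConfig(
            enable=True, interval=1)
        task = dvs.UpdateDVSHealthCheckConfig_Task(healthCheckConfig=[vlan_mtu, teaming])
        print(f"submitted health check reconfiguration task: {task.info.key}")
    finally:
        Disconnect(si)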

Any discrepancies identified should be resolved for optimal Greenplum performance and high availability.

The vDS Health Check has some known limitations:

  • It does not check the LAG ports.
  • It could cause network performance degradation if you have many uplinks, VLANs, and hosts.

Network Separation

The minimum requirement for Greenplum is an available network for the Greenplum Interconnect. However, a typical Greenplum system commonly uses, and is recommended to use, additional networks for other purposes (a sketch for verifying a virtual machine's network assignments follows this list):

  1. Internal Network - The minimum required network for Greenplum. For more information, see the Requirements section. Often, this network is not routable outside the Greenplum infrastructure.
  2. External Network - An externally routable network, often assigned as a separate virtual NIC on the Greenplum Coordinator / Standby Coordinator Host. Used for external user access to the Greenplum Database.
  3. DataOps Network - A separate NIC on each Greenplum Host assigned to a dedicated network for data operations such as backup, restore, disaster recovery, ETL, or otherwise any input and output of data separate from regular Greenplum Database access.
  4. Administration Network - A separate NIC on each Greenplum Host assigned to a dedicated network for administrative access.
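
To confirm that each Greenplum virtual machine's network adapters are attached to the intended distributed port groups, you can read the adapter backings through pyVmomi. The following Python sketch is read-only; the vCenter details and the VM name are placeholders.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    VM_NAME = "gp-sdw1"   # illustrative Greenplum Segment Host VM name

    ctx = ssl._create_unverified_context()   # lab use only
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="changeme", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        pg_view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
        pg_names = {pg.key: pg.name for pg in pg_view.view}

        vm_view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in vm_view.view if v.name == VM_NAME)
        for dev in vm.config.hardware.device:
            if isinstance(dev, vim.vm.device.VirtualEthernetCard) and isinstance(
                    dev.backing,
                    vim.vm.device.VirtualEthernetCard.DistributedVirtualPortBackingInfo):
                # Map the adapter to its distributed port group, e.g. greenplum-internal.
                print(dev.deviceInfo.label, "->", pg_names.get(dev.backing.port.portgroupKey))
    finally:
        Disconnect(si)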

Example Configuration

The following table summarizes a sample configuration of distributed port groups you may see in a typical vSphere cluster, with vSAN used for Greenplum.

Port Group | Description
greenplum-internal | Used for Greenplum cluster internal communications, including interconnect, dispatch, and mirroring. This network is usually air-gapped and does not have an internet connection. All IP addresses are statically assigned.
greenplum-external | Used for Greenplum cluster user connections from outside the cluster. The Greenplum coordinator host exposes $PGPORT on this network. It is also used to connect to the coordinator host and standby coordinator host for DBA or troubleshooting purposes. This network usually has DHCP enabled.
greenplum-dataops | Used for ETL and backup/restore operations for the Greenplum cluster.
vSphere-management | Used by vCenter to manage the ESXi hosts. It supports the VMkernel port for management.
vSphere-vmotion | Used for VMware vSphere HA and DRS. It supports the VMkernel port for vMotion.
vSphere-vsan | Used for vSAN connections. It supports the VMkernel port for vSAN.
vSphere-vcenter | Dedicated management network, including vCenter, DHCP, DNS, and NTP services. The IP addresses are usually statically assigned.
vSphere-vm | Used by non-Greenplum virtual machines to connect to the company network. It usually has internet connectivity. This network usually has DHCP enabled.

The table below summarizes an example distributed port group uplink configuration that achieves high availability. This example assumes a vSphere cluster in which each compute host has 4 available NIC ports (a scripted check of the uplink layout follows the table).

Port Group Name | uplink1 (vmnic0) | uplink2 (vmnic1) | uplink3 (vmnic2) | uplink4 (vmnic3)
vSphere-management | Active | Unused | Unused | Standby
vSphere-vmotion | Standby | Unused | Unused | Active
vSphere-vm | Standby | Unused | Unused | Active
vSphere-vcenter | Standby | Unused | Unused | Active
vSphere-vsan | Unused | Active | Standby | Unused
greenplum-internal | Unused | Standby | Active | Unused
greenplum-external | Active | Unused | Unused | Standby
greenplum-dataops | Standby | Unused | Unused | Active
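
To verify that the teaming policy actually configured on each distributed port group matches a plan like the one above, you can read the active and standby uplink lists through pyVmomi. The following Python sketch is read-only; the vCenter details are placeholders and the property path assumes VMware distributed port groups.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()   # lab use only
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="changeme", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
        for pg in view.view:
            policy = pg.config.defaultPortConfig.uplinkTeamingPolicy
            if policy is None or policy.uplinkPortOrder is None:
                continue   # skip port groups without an explicit teaming policy
            order = policy.uplinkPortOrder
            print(f"{pg.name}: active={list(order.activeUplinkPort)} "
                  f"standby={list(order.standbyUplinkPort)}")
    finally:
        Disconnect(si)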

About the Greenplum Interconnect

The interconnect is the networking layer of the Greenplum Database architecture.

The interconnect refers to the inter-process communication between segments and the network infrastructure on which this communication relies. The Greenplum interconnect relies on each Greenplum Segment Host having a unique, static IP address. For performance reasons, a 10-Gigabit or faster system is recommended.

By default, the interconnect uses User Datagram Protocol with flow control (UDPIFC) to send messages over the network. The Greenplum software performs packet verification beyond what is provided by UDP, so the reliability is equivalent to that of Transmission Control Protocol (TCP), while the performance and scalability exceed those of TCP. If the interconnect is changed to TCP, Greenplum Database has a scalability limit of 1000 segment instances. With UDPIFC as the default protocol for the interconnect, this limit does not apply.
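
To confirm which interconnect protocol a running cluster uses, you can query the gp_interconnect_type server configuration parameter. The following Python sketch uses the psycopg2 driver; the connection details are placeholders.

    import psycopg2  # third-party PostgreSQL driver, installed separately

    # Placeholder connection details; point this at your Greenplum coordinator host.
    conn = psycopg2.connect(host="cdw", port=5432, dbname="postgres", user="gpadmin")
    with conn, conn.cursor() as cur:
        cur.execute("SHOW gp_interconnect_type;")
        print("interconnect protocol:", cur.fetchone()[0])   # 'udpifc' by default
    conn.close()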

Interconnect Redundancy

The recommended best practice is to have a highly available interconnect so that network infrastructure failures do not disrupt the Greenplum workload. A sample physical solution uses multiple Ethernet switches on your network and multiple Ethernet ports on each Greenplum Segment Host, so that no single hardware failure disrupts the Greenplum workload.

Network Interface Configuration

A segment host typically has multiple network interfaces designated for Greenplum interconnect traffic. The coordinator host typically has additional external network interfaces in addition to the interfaces used for interconnect traffic.

Distribute interconnect network traffic across all available interfaces. This is done by assigning segment instances to a particular network interface and ensuring that the primary segments are evenly balanced over the available interfaces.

To do this, create a separate host address name for each network interface. For example, if a host has four network interfaces, it has four corresponding host addresses, each of which maps to one or more primary segment instances. The /etc/hosts file should contain not only the host name of each machine, but also all interface host addresses for all of the Greenplum Database hosts (coordinator, standby coordinator, segments, and ETL hosts).
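
As an illustration of this naming scheme, the short Python sketch below prints /etc/hosts-style entries for hosts with four interconnect NICs each. The host names, subnets, and the -1 through -4 suffix convention are assumptions for the example, not requirements.

    # Illustrative hosts and one subnet per NIC; adapt names and addresses to your environment.
    HOSTS = ["cdw", "scdw", "sdw1", "sdw2"]
    SUBNETS = ["192.168.1", "192.168.2", "192.168.3", "192.168.4"]

    for host_index, host in enumerate(HOSTS, start=1):
        for nic_index, subnet in enumerate(SUBNETS, start=1):
            # For example "192.168.2.3  sdw1-2": the address of sdw1's second interconnect NIC.
            print(f"{subnet}.{host_index}\t{host}-{nic_index}")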

With this configuration, the operating system automatically selects the best path to the destination. Greenplum Database automatically balances the network destinations to maximize parallelism.

Switch Configuration

This example applies when deploying Greenplum without virtualization. When using multiple 10 Gigabit Ethernet switches within your Greenplum Database cluster, evenly divide the number of subnets between each switch. In this example configuration, with two switches, NICs 1 and 2 on each host use switch 1 and NICs 3 and 4 on each host use switch 2. For the coordinator host, the host name bound to NIC 1 (and therefore using switch 1) is the effective coordinator host name for the cluster. Therefore, if deploying a warm standby coordinator for redundancy purposes, the standby coordinator should map to a NIC that uses a different switch than the primary coordinator.

About ETL Hosts for Data Loading

Greenplum supports fast, parallel data loading with its external tables feature. By using external tables in conjunction with Greenplum Database's parallel file server (gpfdist), administrators can achieve maximum parallelism and load bandwidth from their Greenplum Database system. Many production systems deploy designated ETL servers for data loading purposes. These machines run the Greenplum parallel file server (gpfdist), but not Greenplum Database instances.

One advantage of using the gpfdist file server program is that it ensures that all of the segments in your Greenplum Database system are fully utilized when reading from external table data files.

The gpfdist program can serve data to the segment instances at an average rate of about 350 MB/s for delimited text files and 200 MB/s for CSV files. To maximize the network bandwidth of your ETL systems, consider the following options when running gpfdist (a sketch of the second option follows this list):

  • If your ETL machine is configured with multiple network interface cards (NICs) as described in Network Interface Configuration, run one instance of gpfdist on your ETL host and then create your external table definition so that the host name of each NIC is declared in the LOCATION clause (see CREATE EXTERNAL TABLE in the Greenplum Database Reference Guide). This allows network traffic between your Greenplum segment hosts and your ETL host to use all NICs simultaneously.
  • Run multiple gpfdist instances on your ETL host and divide your external data files equally between each instance. For example, if you have an ETL system with two network interface cards (NICs), then you could run two gpfdist instances on that machine to maximize your load performance. You would then divide the external table data files evenly between the two gpfdist programs.
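
As a rough illustration of the second option, the following Python sketch creates a readable external table whose LOCATION clause names two gpfdist instances running on the same ETL host under two NIC-specific host names (etl1-1 and etl1-2, ports 8081 and 8082). The host names, ports, file patterns, column list, and connection details are all placeholders.

    import psycopg2  # third-party PostgreSQL driver, installed separately

    DDL = """
    CREATE EXTERNAL TABLE orders_ext (order_id int, amount numeric, order_date date)
    LOCATION (
        'gpfdist://etl1-1:8081/orders_part1*.txt',
        'gpfdist://etl1-2:8082/orders_part2*.txt'
    )
    FORMAT 'TEXT' (DELIMITER '|');
    """

    # Placeholder connection details; point this at your Greenplum coordinator host.
    conn = psycopg2.connect(host="cdw", port=5432, dbname="warehouse", user="gpadmin")
    with conn, conn.cursor() as cur:
        cur.execute(DDL)   # each gpfdist instance serves its own share of the data files
    conn.close()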