The Connector for VMware Greenplum and VMware GemFire is available as a separate download for VMware Greenplum 5.x and 6.x from the Broadcom Support Portal. Before you can use the connector, you must download the connector JAR file and copy the JAR to each VMware GemFire host.
To use the connector, you specify configuration details with gfsh commands or within a cache.xml file. Do not mix the use of gfsh for configuration with the use of a cache.xml file. To do an explicit mapping of fields, or to map only a subset of the fields, specify all configuration in a cache.xml file.
Downloading the Connector JAR File and Copying to GemFire Hosts
The connector download package is a .tar.gz file that contains the connector and javadoc JAR files as well as a license file.
Perform these steps to download and unpack the connector package:
1. Navigate to the VMware Greenplum product on the Broadcom Support Portal and select Connector for Greenplum and GemFire under the desired Greenplum release. The connector download file name format is gemfire-greenplum-<version>.tar.gz. For example: gemfire-greenplum-4.0.1.tar.gz. For more information about download prerequisites, troubleshooting, and instructions, see Download Broadcom products and software.
2. Make note of the directory to which the file was downloaded.
3. Follow the instructions in Verifying the VMware Greenplum Software Download in the VMware Greenplum documentation to verify the integrity of the Connector for Greenplum and GemFire software.
4. Unpack the .tar.gz file. For example:

   $ tar xzvf gemfire-greenplum-4.0.1.tar.gz

   Unpacking the file creates a directory named gemfire-greenplum-<version> in the current working directory. The directory contents include the connector and javadoc JAR files as well as a license file:

   gemfire-greenplum-4.0.1-javadoc.jar
   gemfire-greenplum-4.0.1.jar
   open_source_license_Connector_for_VMware_Greenplum_and_VMware_GemFire_4.0.1_GA.txt
5. For each host in the GemFire cluster, copy the connector JAR file to the path_to_product/extensions/ directory on the GemFire host, as sketched below. This step ensures that the connector JAR file is loaded on GemFire startup. Refer to Installing GemFire Extensions in the GemFire documentation for further information.
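For example, a minimal sketch of the copy in step 5, assuming GemFire is installed under /opt/gemfire on a host named gemfire-host1 (both the path and the host name are illustrative; substitute your own installation directory and host names):

$ scp gemfire-greenplum-4.0.1.jar user@gemfire-host1:/opt/gemfire/extensions/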
Using gfsh Commands to Specify Configuration
gfsh may be used to configure all aspects of the transfer and the mapping, as follows:
If domain objects are not on the classpath, configure PDX serialization with the GemFire configure pdx command after starting locators, but before starting servers. For example:

gfsh>configure pdx --read-serialized=true \
 --auto-serializable-classes=io.pivotal.gemfire.demo.entity.*
After starting servers, use the GemFire create jndi-binding command to specify all aspects of the data source. For example:

gfsh>create jndi-binding --name=datasource --type=SIMPLE \
 --jdbc-driver-class="org.postgresql.Driver" \
 --username="g2c_user" --password="changeme" \
 --connection-url="jdbc:postgresql://localhost:5432/gemfire_db"
After creating regions (a region-creation sketch follows this list), set up the gpfdist protocol by using configure gpfdist-protocol. For example:

gfsh>configure gpfdist-protocol --port=8000
Specify the mapping of the Greenplum Database table to the GemFire region with the create gpdb-mapping command. For example:

gfsh>create gpdb-mapping --region=/Child --data-source=datasource \
 --pdx-name="io.pivotal.gemfire.demo.entity.Child" --table=child --id=id,parent_id
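The steps above assume that the GemFire region (here, Child) has already been created. As a minimal sketch, a partitioned region of that name could be created with the standard gfsh create region command, between the jndi-binding and gpfdist steps; note that export is supported only from partitioned regions (see the caveats below):

gfsh>create region --name=Child --type=PARTITION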
Specifying Configuration with a cache.xml File
To provide configuration details within a cache.xml file, specify the correct xsi:schemaLocation attribute within the cache.xml file. For the v4.0.x connector, use:

http://schema.pivotal.io/gemfire/gpdb/gpdb-4.0.xsd
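As a minimal sketch, the opening cache element of such a cache.xml file might declare the schema locations as shown below. The cache namespace shown is the standard one for current GemFire/Geode releases, and the gpdb namespace URI and prefix are assumptions inferred from the schema URL above; confirm both against the connector documentation for your release:

<?xml version="1.0" encoding="UTF-8"?>
<cache xmlns="http://geode.apache.org/schema/cache"
       xmlns:gpdb="http://schema.pivotal.io/gemfire/gpdb"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://geode.apache.org/schema/cache
                           http://geode.apache.org/schema/cache/cache-1.0.xsd
                           http://schema.pivotal.io/gemfire/gpdb
                           http://schema.pivotal.io/gemfire/gpdb/gpdb-4.0.xsd"
       version="1.0">
  <!-- region definitions and Greenplum-to-region mapping configuration go here -->
</cache>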
Connector Requirements and Caveats
Export is supported from partitioned GemFire regions only. Data cannot be exported from replicated regions. Data can be imported to replicated regions.
The number of Greenplum Database segments must be greater than or equal to the number of GemFire servers. If there is a high ratio of Greenplum Database segments to GemFire servers, the Greenplum configuration parameter gp_external_max_segs may be used to limit Greenplum Database concurrency. See gp_external_max_segs for details on this parameter. An approach to finding the best setting begins with identifying a representative import operation:

- Measure the performance of the representative import operation with the default setting.
- Measure again with gp_external_max_segs set to half the total number of Greenplum Database segments. If there is no gain in performance, then the parameter does not need to be adjusted.
- Iterate with values of gp_external_max_segs that are half as much at each iteration, until there is no performance improvement or the value of gp_external_max_segs is the same as the number of GemFire servers.
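As an illustration, consider a hypothetical cluster with 64 Greenplum Database segments and 8 GemFire servers: measure the representative import with the default setting, then with gp_external_max_segs set to 32, then 16, then 8, stopping as soon as a halving yields no improvement. One way to apply a value cluster-wide, assuming shell access on the Greenplum master and that the parameter takes effect on a configuration reload, is the standard gpconfig utility:

$ gpconfig -c gp_external_max_segs -v 32
$ gpstop -u     # reload configuration files without a full cluster restart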
Upgrading Java Applications from Version 2.4 to Version 3.x
API changes implemented for version 3.0.0, which are also present in this connector version, require code revisions in all applications that use import or export functionality.
For this sample version 2.4 export operation, an upsert type of operation was implied:
// Version 2.4 API
long numberExported = GpdbService.createOperation(region).exportRegion();
Here is the equivalent version 3.x code to implement the upsert type of operation:
// Version 3.x API
ExportConfiguration exportConfig = ExportConfiguration.builder(region)
.setType(ExportType.UPSERT)
.build();
ExportResult result = GpdbService.exportRegion(exportConfig);
int numberExported = result.getExportedCount();
For this sample version 2.4 import operation,
// Version 2.4 API
long numberImported = GpdbService.createOperation(region).importRegion();
here is the equivalent version 3.x code to implement the import operation:
// Version 3.x API
ImportConfiguration importConfig = ImportConfiguration.builder(region)
.build();
ImportResult result = GpdbService.importRegion(importConfig);
int numberImported = result.getImportedCount();
Please note that the new result objects' counts are of type int instead of type long. This is for consistency, as the connector internally uses JDBC's executeQuery(), which supports int.