VMware Greenplum is a massively parallel processing database server designed to manage large-scale analytic data warehouses and business intelligence workloads. Apache NiFi is a framework that provides an interactive user interface through which you create and manage automated dataflows between systems. The VMware Greenplum Connector for Apache NiFi gives organizations a fast, simple way to build data ingestion pipelines for Greenplum Database without writing code.
You can use the web-based Apache NiFi user interface and built-in NiFi processors to set up a data pipeline that employs the Connector’s PutGreenplumRecord processor to load record-oriented data into Greenplum Database for subsequent analytics.
The Connector:
- Uses the drag-and-drop Apache NiFi user interface for component and data pipeline configuration.
- Supports CSV, Avro, Parquet, JSON, and XML input data formats using built-in NiFi Record Readers.
- Converts NiFi records into Greenplum tuples.
- Loads the tuples into Greenplum Database.
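The record-to-tuple step in the list above can be pictured with a small, purely conceptual Python sketch. It is not the Connector's implementation and uses no NiFi or Greenplum API. In a real pipeline, NiFi Record Readers and the PutGreenplumRecord processor do this work through configuration. The column list, sample data, and the `records_to_tuples` helper are all hypothetical names chosen for illustration.

```python
import csv
import io
import json


def records_to_tuples(records, columns):
    """Flatten dict-like records into tuples ordered by the target columns.

    Fields missing from a record become None.
    """
    return [tuple(rec.get(col) for col in columns) for rec in records]


# Hypothetical target table columns (an assumption for this sketch).
columns = ["id", "name", "amount"]

# Two input formats carrying the same logical record shape.
csv_text = "id,name,amount\n1,alice,10.5\n2,bob,7\n"
json_text = '[{"id": 3, "name": "carol", "amount": 2.25}]'

# Parse each format into a list of dict records, as a record reader would.
csv_records = list(csv.DictReader(io.StringIO(csv_text)))
json_records = json.loads(json_text)

# Convert all records into column-ordered tuples ready for loading.
tuples = records_to_tuples(csv_records + json_records, columns)
```

The sketch shows why a record abstraction is useful: once each input format is parsed into records with named fields, a single conversion step can produce column-ordered tuples regardless of the original format. Note that the CSV values remain strings here; any type conversion is out of scope for this illustration.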
The VMware Greenplum Connector for Apache NiFi uses the Greenplum Streaming Server to load data in parallel into Greenplum Database. This facilitates higher concurrency and throughput during data ingestion compared to a JDBC-based NiFi processor, with less load on the Greenplum Database master host.
Next Steps
- Install the Connector and register it with Apache NiFi.
- Review an introduction to the Apache NiFi user interface in Using the Apache NiFi User Interface.
- Examine the data load procedure described in Loading Data with the Connector.
- Try out the Loading CSV Data from the File System example.