Spring Cloud Data Flow for Cloud Foundry 1.12

Getting Started with Spring Cloud Data Flow for VMware Tanzu

Last Updated October 23, 2024

This topic shows you how to get started using Spring Cloud Data Flow for VMware Tanzu, with examples that use the service to quickly create a data pipeline.

To have read and write access to a Spring Cloud Data Flow for VMware Tanzu service instance, you must have the SpaceDeveloper role in the space where the service instance was created. If you have only the SpaceAuditor role in the space where the service instance was created, you have only read (not write) access to the service instance.
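For example, a user with the authority to manage space roles (such as an org manager) can grant the SpaceDeveloper role using the cf CLI. The user, org, and space names below are placeholders:

$ cf set-space-role user@example.com myorg dev SpaceDeveloper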

Consider installing the Spring Cloud Data Flow for VMware Tanzu and Service Instance Logs cf CLI plug-ins. For more information, see Using the Shell and Viewing Service Instance Logs.
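If your workstation can reach the Cloud Foundry community plug-in repository, installing the Spring Cloud Data Flow plug-in might look like the commands below. The repository and plug-in names shown here are assumptions; see the linked topics for the exact installation steps for your environment:

$ cf add-plugin-repo CF-Community https://plugins.cloudfoundry.org
$ cf install-plugin -r CF-Community "spring-cloud-dataflow-for-pcf"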

The examples in this topic use the Spring Cloud Data Flow for VMware Tanzu cf CLI plug-in.

Creating a data pipeline using the shell

Create a Spring Cloud Data Flow service instance (see the Creating an Instance section of the Managing Service Instances topic). If you use the default backing data services of VMware Tanzu SQL [MySQL] and RabbitMQ for VMware Tanzu, you can then import the Spring Cloud Data Flow OSS “RabbitMQ + Maven” stream app starters. For information about these apps, see the Spring website.
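A minimal sketch of creating the service instance with the cf CLI is shown below. The service offering and plan names (p-dataflow and standard) are assumptions; run cf marketplace to confirm the names available in your environment:

$ cf marketplace
$ cf create-service p-dataflow standard my-dataflow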

Start the Data Flow shell using the cf dataflow-shell command added by the Spring Cloud Data Flow for VMware Tanzu cf CLI plug-in:

$ cf dataflow-shell my-dataflow
Attaching shell to dataflow service my-dataflow in org myorg / space dev as user...
Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
Successfully targeted https://dataflow-9f45f80b-c6b6-43dd-a7d4-e43f14990ffd.apps.example.com
dataflow:>

Import the stream app starters using the Data Flow shell’s app import command:

dataflow:>app import https://dataflow.spring.io/rabbitmq-maven-latest
Successfully registered 65 applications from [source.sftp.metadata,
sink.throughput.metadata, ... sink.router.metadata, sink.mongodb]
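To confirm that the applications were registered, you can list them with the Data Flow shell's app list command (output omitted here):

dataflow:>app list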

With the app starters imported, you can use three of them (the http source, the splitter processor, and the log sink) to create a stream that consumes data via an HTTP POST request, processes it by splitting it into words, and outputs the results in logs.

Create the stream using the Data Flow shell’s stream create command:

dataflow:>stream create --name words --definition "http | splitter --expression=payload.split(' ') | log"
Created new stream 'words'

Next, deploy the stream using the stream deploy command:

dataflow:>stream deploy words
Deployment request has been sent for stream 'words'
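Deployment is asynchronous. You can check its progress with the Data Flow shell's stream list command, which shows each stream definition and its deployment status:

dataflow:>stream list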

Creating a data pipeline using the dashboard

Create a Spring Cloud Data Flow service instance (see the Creating an Instance section of the Managing Service Instances topic). If you use the default backing data services of VMware Tanzu SQL [MySQL] and RabbitMQ for VMware Tanzu, you can then import the Spring Cloud Data Flow OSS “RabbitMQ + Maven” stream app starters. For information about these apps, see the Spring website.

In Apps Manager, visit the Spring Cloud Data Flow service instance’s page and click Manage to access its dashboard.

[Screenshot: data-flow, Overview tab, Bound apps pane]

This will take you to the dashboard’s Apps tab, where you can import applications. Click Add Application(s).

Select Bulk import application coordinates from an HTTP URI location, then click the Stream Apps (RabbitMQ/Maven) link (or manually enter https://dataflow.spring.io/rabbitmq-maven-latest in the URI field). Finally, click Import the application(s).

[Screenshot: Data Flow, Apps pane, Bulk import coordinates from an HTTP URI location]

With the app starters imported, visit the Streams tab. You can use three of the imported starter applications (the http source, the splitter processor, and the log sink) to create a stream that consumes data via an HTTP POST request, processes it by splitting it into words, and outputs the results in logs.

Click Create stream(s) to enter the stream creation view. In the left sidebar, search for the http source application. Click it and drag it onto the canvas to begin defining a stream.

[Screenshot: Data Flow, Streams pane, Create a stream]

Search for and add the splitter processor application and log sink application.

[Screenshot: Data Flow, Streams pane, Adding Applications to a Stream]

Click the splitter application, then click the gear icon beside it to edit its properties. In the expression field, enter payload.split(' '). Click OK.

[Screenshot: Data Flow, Streams pane, Properties for SPLITTER dialog box]

Click and drag between the output and input ports on the applications to connect them and complete the stream.

[Screenshot: Data Flow, Streams pane, Create a stream]
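The completed canvas corresponds to the same stream definition DSL used in the shell example earlier in this topic:

http | splitter --expression=payload.split(' ') | log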

Click Create Stream. Type the name “words,” then click Create the stream.

[Screenshot: Data Flow, Streams pane, Create Stream dialog box]

The Streams tab now displays the new stream in the Definitions column. Click the deploy button for the words stream.

[Screenshot: Data Flow, Streams tab, the new stream in the Definitions column]

Click Deploy stream.

[Screenshot: Data Flow, Streams tab, the deployed words stream with its http source, splitter processor, and log sink]

Using the deployed data pipeline

You can run the cf apps command to see the applications deployed as part of the stream:

$ cf apps
Getting apps in org myorg / space dev as user...
OK

name                        requested state   instances   memory   disk   urls
RWSDZgk-words-http-v1       started           1/1         1G       1G     RWSDZgk-words-http-v1.apps.example.com
RWSDZgk-words-log-v1        started           1/1         1G       1G     RWSDZgk-words-log-v1.apps.example.com
RWSDZgk-words-splitter-v1   started           1/1         1G       1G     RWSDZgk-words-splitter-v1.apps.example.com

Run the cf logs command on the RWSDZgk-words-log-v1 application:

$ cf logs RWSDZgk-words-log-v1
Retrieving logs for app RWSDZgk-words-log-v1 in org myorg / space dev as user...

Then, in a separate terminal window, use the Data Flow shell (started using the cf dataflow-shell command) to send a POST request to the RWSDZgk-words-http-v1 application:

$ cf dataflow-shell my-dataflow
...
Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>http post --target https://RWSDZgk-words-http-v1.apps.example.com --data "This is a test"
> POST (text/plain) https://RWSDZgk-words-http-v1.apps.example.com This is a test
> 202 ACCEPTED
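If you prefer not to use the Data Flow shell, an equivalent request can be sent with curl. The route below is the one shown in the cf apps output above; yours will differ:

$ curl -X POST -H "Content-Type: text/plain" -d "This is a test" https://RWSDZgk-words-http-v1.apps.example.com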

Watch for the processed data in the RWSDZgk-words-log-v1 application’s logs:

2018-06-07T16:47:08.80-0500 [APP/PROC/WEB/0] OUT 2018-06-07 21:47:08.808  INFO 16 --- [plitter.words-1] RWSDZgk-words-log-v1                     : This
2018-06-07T16:47:08.81-0500 [APP/PROC/WEB/0] OUT 2018-06-07 21:47:08.810  INFO 16 --- [plitter.words-1] RWSDZgk-words-log-v1                     : is
2018-06-07T16:47:08.82-0500 [APP/PROC/WEB/0] OUT 2018-06-07 21:47:08.820  INFO 16 --- [plitter.words-1] RWSDZgk-words-log-v1                     : a
2018-06-07T16:47:08.82-0500 [APP/PROC/WEB/0] OUT 2018-06-07 21:47:08.822  INFO 16 --- [plitter.words-1] RWSDZgk-words-log-v1                     : test
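When you are finished with the example, you can undeploy the stream and delete its definition from the Data Flow shell:

dataflow:>stream undeploy --name words
dataflow:>stream destroy --name words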