Spring Cloud Data Flow for Cloud Foundry 1.14

Managing Data Flow Service Instances using cf CLI

Last Updated December 10, 2024

Here you will find information about managing Data Flow service instances using the Cloud Foundry Command Line Interface (cf CLI). You can also manage Data Flow service instances using Apps Manager.

To have read and write access to a Spring Cloud Data Flow for VMware Tanzu service instance, you must have the SpaceDeveloper role in the space where the service instance was created.
If you have only the SpaceAuditor role in the space where the service instance was created, you have only read (not write) access to the service instance.
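
If a user needs write access, a user with the appropriate permissions (such as an Org Manager) can grant the SpaceDeveloper role using the cf CLI. As a sketch (the username, org, and space names here are placeholders), you might run:

$ cf set-space-role user@example.com myorg development SpaceDeveloper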

Available parameters

When creating or updating a Spring Cloud Data Flow service instance, you can configure the service instance using parameters passed to the cf CLI commands. See the following sections for information about the supported parameters.

Setting the buildpack

Each Data Flow service instance can be given the name of a buildpack to use for deploying stream and task apps. You can set the buildpack for the service instance using a buildpack parameter given to cf create-service or cf update-service.

To create a service instance that uses a buildpack named custom-java-buildpack to deploy apps, you might run:

$ cf create-service p-dataflow standard data-flow -c '{"buildpack": "custom-java-buildpack"}'

Configuring app settings for Data Flow server and Skipper

You can configure settings for a service instance’s backing Data Flow server and Skipper apps using parameters given to cf create-service or cf update-service.

Parameter         Function
dataflow.disk     The disk used by the Data Flow server
dataflow.memory   The memory used by the Data Flow server
skipper.disk      The disk used by the Skipper backing app
skipper.memory    The memory used by the Skipper backing app

For all disk and memory settings, the default unit is mebibytes (MiB). You can use other units by naming the unit in the value string (for example, "1G", "512MB", "2GiB", or "3gb").

To create a service instance with a Skipper backing app that uses 4 GiB of disk space, you might run:

$ cf create-service p-dataflow standard data-flow -c '{"skipper": { "disk": "4GiB" } }'

Configuring domain for Data Flow server and Skipper

You can configure the domain used by a service instance’s backing Data Flow server and Skipper apps using a domain parameter given to cf create-service or cf update-service.

To create a service instance that uses the domain my-dataflow.example.com for its backing Data Flow server and Skipper apps, you might run:

$ cf create-service p-dataflow standard data-flow -c '{"domain": "my-dataflow.example.com"}'

Configuring Skipper health check

You can configure Spring Cloud Skipper settings for a service instance’s Skipper backing app by passing the settings as parameters to cf create-service or cf update-service. This can be used to configure the deployer health check timeout, for example.

To create a service instance that uses a health check timeout of five (5) minutes, you might run:

$ cf create-service p-dataflow standard data-flow -c '{"spring.cloud.skipper.server.strategies.healthcheck.timeout-in-millis": 300000}'

Activating caching for Maven artifacts

By default, a Data Flow server instance does not cache artifacts downloaded from a Maven repository, because this caching can overwhelm app containers and cause the service instance’s Data Flow or Skipper backing apps to crash.

If you need to, you can activate caching of Maven artifacts by setting a maven-cache parameter, passed to cf create-service or cf update-service, to true:

$ cf create-service p-dataflow standard data-flow -c '{"maven-cache": true}'

Setting dependent services

Each Data Flow service instance uses three dependent data services. Defaults for these services can be configured in the tile settings, and these defaults can be overridden for each individual service instance at create or update time.

The service offerings with the plan proxy are proxy services used by Spring Cloud Data Flow service instances. The Spring Cloud Data Flow service broker creates and deletes instances of these services automatically along with each Spring Cloud Data Flow service instance. Do not manually create or delete instances of these services.

General parameters used to configure dependent data services for a Data Flow service instance are listed below.

Parameter                      Function
relational-data-service.name   The name of the service to use for a relational database that stores Spring Cloud Data Flow metadata and task history
relational-data-service.plan   The name of the service plan to use for the relational database service
messaging-data-service.name    The name of the service to use for a RabbitMQ or Kafka server that facilitates event messaging
messaging-data-service.plan    The name of the service plan to use for the RabbitMQ or Kafka service
skipper-relational.name        The name of the service to use for a relational database used by the Skipper app
skipper-relational.plan        The name of the service plan to use for the Skipper app's relational database

To create a Data Flow service instance that uses VMware Tanzu for MySQL for the Data Flow and Skipper relational databases and uses VMware RabbitMQ for the event messaging service, use a command similar to the following:

$ cf create-service p-dataflow standard data-flow -c '{ "relational-data-service": { "name": "p.mysql", "plan": "med-db" }, "messaging-data-service": { "name": "p.rabbitmq", "plan": "high-vol" }, "skipper-relational": { "name": "p.mysql", "plan": "sm-db" } }'

Setting Composed Task Runner app URL

To run composed tasks, Spring Cloud Data Flow uses a task app called the Composed Task Runner (CTR). By default, Data Flow downloads this app from the Maven Central repository. A different default URL for this app can be configured in the tile settings, and this default can be overridden for each individual service instance at create or update time by passing a composed-task-runner-uri parameter to the cf create-service or cf update-service command.

To create a service instance that downloads the CTR app from https://example.com/ctr.jar, you might run:

$ cf create-service p-dataflow standard data-flow -c '{ "composed-task-runner-uri": "https://example.ctr.jar" }'

Binding arbitrary services

Each Data Flow service instance can optionally be bound to other service instances. For example, you can bind a Data Flow service instance to an existing Spring Cloud Services Config Server service instance. To bind a Data Flow service instance to one or more existing service instances, include the service instance names in a JSON array called services and pass the array to the cf create-service or cf update-service command.

To create a Data Flow service instance that is bound to an existing Spring Cloud Services Config Server service instance named my-config-server, use a command similar to the following:

$ cf create-service p-dataflow standard data-flow -c '{"services": ["my-config-server"] }'

When the service instance is created, the data-flow service instance is bound to the existing my-config-server service instance.
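
The services array can name more than one service instance. As a sketch, assuming a second existing service instance named my-service-registry (a hypothetical name used here for illustration), you might run:

$ cf update-service data-flow -c '{"services": ["my-config-server", "my-service-registry"] }'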

Using a Grafana dashboard

You can use Grafana to view metrics for Spring Cloud Data Flow apps and streams. To activate this, use settings under spring.cloud.dataflow.grafana-info, passed to cf create-service or cf update-service.

To create a service instance that sends metrics to a Grafana dashboard located at https://grafana.example.com:443, you might run:

$ cf create-service p-dataflow standard data-flow -c '{"spring.cloud.dataflow.grafana-info.url": "https://grafana.example.com:443"}'

Spring Cloud Data Flow does not provide a Grafana installation. You must provide your own Grafana installation to use Spring Cloud Data Flow with Grafana.

Setting Maven properties

Each Data Flow service instance can optionally specify Maven configuration properties. For the complete list of properties that can be specified, see Function Composition in the OSS Spring Cloud Data Flow documentation.

Maven configuration properties can be set for each Data Flow service instance using parameters given to cf create-service or cf update-service.

To set the maven.remote-repositories.repo1.url property, use a command similar to the following:

$ cf create-service p-dataflow standard data-flow -c '{"maven.remote-repositories.repo1.url": "https://repo.spring.io/libs-snapshot"}'

To configure a private Maven repository that requires authentication, you can provide a username and password, as shown here:

$ cf create-service p-dataflow standard data-flow -c '{"maven.remote-repositories.repo1.url":"https://my.private.maven/repo","maven.remote-repositories.repo1.auth.username":"user","maven.remote-repositories.repo1.auth.password":"password"}'

Configuring Wavefront

Spring Cloud Data Flow can integrate with Tanzu Observability by Wavefront to monitor deployed event-streaming and batch applications. Default values for Wavefront settings can be set in the tile configuration, and these default values can be overridden for each individual service instance at create or update time.

To configure Wavefront settings for a Data Flow service instance, pass a wavefront parameter to the cf create-service or cf update-service command. This parameter is a JSON object with the following fields.

Parameter   Function
uri         The URI of the Wavefront instance
api-token   The user API token to use for Wavefront
source      An arbitrary string used to identify the Data Flow service instance

To configure these settings for a new Data Flow service instance, use a command similar to the following:

$ cf create-service p-dataflow standard data-flow -c '{"wavefront": {"uri": "https://wavefront.example.com", "api-token": "EXAMPLE_API_TOKEN", "source": "my-dataflow-si"} }'

All Wavefront settings are optional. If you do not supply a value for any particular setting, the Data Flow service instance uses the default value for that setting (the value specified in the tile settings).
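
Keeping in mind that omitted settings use the tile defaults, you can update a single Wavefront setting on an existing service instance. As a sketch, to set only the API token (assuming the tile defaults suffice for uri and source), you might run:

$ cf update-service data-flow -c '{"wavefront": {"api-token": "NEW_EXAMPLE_API_TOKEN"}}'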

Limiting concurrent tasks

Each Data Flow service instance limits the number of tasks that can run concurrently (the default limit is 10).

To configure this limit, pass a concurrent-task-limit parameter to cf create-service or cf update-service:

$ cf create-service p-dataflow standard data-flow -c '{"concurrent-task-limit": 30}'

When the number of concurrent tasks reaches the specified limit, the Data Flow service instance no longer launches new tasks until the number of running tasks is again below the limit.

Activating task support only (no streams)

Each Data Flow service instance can be configured to run tasks only (with stream support deactivated).

To activate only task support for the service instance, pass a task-only parameter to cf create-service or cf update-service:

$ cf create-service p-dataflow standard data-flow -c '{"task-only": true}'

With task-only set to true, the Spring Cloud Skipper backing app (along with its associated relational database and messaging backing service instances) is not deployed for the service instance, and the service instance dashboard does not display the Streams tab. See Using the Dashboard.

Activating stream support only (no tasks)

Each Data Flow service instance can be configured to run streams only (with task support deactivated).

To activate only stream support for the service instance, pass a stream-only parameter to cf create-service or cf update-service:

$ cf create-service p-dataflow standard data-flow -c '{"stream-only": true}'

With stream-only set to true, the service instance dashboard does not display the Tasks tab. See Using the Dashboard.

Configuring Skipper memory allocation

When creating or updating a Data Flow service instance, you can set the memory allocation for the associated Spring Cloud Skipper server deployed to VMware Tanzu Platform for Cloud Foundry. The default memory allocation for Skipper is 2 GB.

To configure a value for Skipper’s memory allocation, pass a skipper parameter (a JSON object with a single memory key) to the cf create-service or cf update-service command:

$ cf create-service p-dataflow standard data-flow -c '{"skipper": { "memory": "8G" }}'

Using Scheduler

You can use the Scheduler service with Spring Cloud Data Flow to schedule task executions. For more information, see the Spring Cloud Data Flow OSS documentation about Scheduling Tasks. If you configure a Data Flow service instance to use Scheduler, the Data Flow broker creates a new Scheduler service instance in the Data Flow service instance’s backing space. This Scheduler service instance is then bound to the Data Flow server’s backing application.

To configure a Data Flow service instance to use Scheduler, pass a scheduler parameter to the cf create-service or cf update-service command. This parameter is a JSON object with the fields listed below.

Parameter       Function
name            The name of the Scheduler service offering to use. Only the Scheduler service, scheduler-for-pcf, is supported.
plan            The name of the service plan to use
instance-name   Optional. The name of the Scheduler service instance to create

To create a Data Flow service instance named mydf that uses a Scheduler service instance named mysched with the standard plan, run a command similar to the following:

$ cf create-service p-dataflow standard mydf -c '{"scheduler": {"name": "scheduler-for-pcf", "plan": "standard", "instance-name": "mysched"}}'

Creating an instance

Use the following procedure to create an instance.

  1. Begin by targeting the org and space.

    $ cf target -o myorg -s development
    api endpoint:   https://api.system.example.com
    api version:    2.75.0
    user:           user
    org:            myorg
    space:          development
    
  2. View the plan details for the Data Flow product using cf marketplace -s.

    $ cf marketplace
    Getting services from marketplace in org myorg / space development as user...
    OK
    
    service             plans    description
    p-dataflow          standard Deploys Spring Cloud Data Flow servers to orchestrate data pipelines
    p-dataflow-mysql    proxy    Proxies to the Spring Cloud Data Flow MySQL service instance
    p-dataflow-rabbitmq proxy    Proxies to the Spring Cloud Data Flow RabbitMQ service instance
    
    TIP:  Use 'cf marketplace -s SERVICE' to view descriptions of individual plans of a given service.
    
    $ cf marketplace -s p-dataflow
    Getting service plan information for service p-dataflow as user...
    OK
    
    service plan   description     free or paid
    standard       Standard Plan   free
    
  3. Create the service instance using cf create-service. To create a Data Flow service instance that sets the Maven maven.remote-repositories.repo1.url property to https://repo.spring.io/libs-snapshot, run a command similar to the following:

    $ cf create-service p-dataflow standard data-flow -c '{ "maven.remote-repositories.repo1.url": "https://repo.spring.io/libs-snapshot" }'
    Creating service instance data-flow in org myorg / space development as user...
    OK
    
    Create in progress. Use 'cf services' or 'cf service data-flow' to check operation status.
    
  4. As shown in the command output, you can use the cf services or cf service commands to check the status of the service instance. When the service instance is ready, the cf service command returns status create succeeded:

    $ cf service data-flow
    
    Service instance: data-flow
    Service: p-dataflow
    Bound apps:
    Tags:
    Plan: standard
    Description: Deploys Spring Cloud Data Flow servers to orchestrate data pipelines
    Documentation url: https://cloud.spring.io/spring-cloud-dataflow/
    Dashboard: https://p-dataflow.apps.example.com/instances/f09e5c77-e526-4f49-86d6-721c6b8e2fd9/dashboard
    
    Last Operation
    Status: create succeeded
    Message: Created
    Started: 2017-07-20T18:24:14Z
    Updated: 2017-07-20T18:26:17Z
    

Updating an instance

You can update settings on a Data Flow service instance using the cf CLI. The cf update-service command can be given a -c flag with a JSON object containing parameters used to configure the service instance.

  1. Begin by targeting the correct org and space.

    $ cf target -o myorg -s development
    api endpoint:   https://api.system.example.com
    api version:    2.75.0
    user:           user
    org:            myorg
    space:          development
    
  2. You can view all service instances in the space using cf services.

    $ cf services
    Getting services in org myorg / space development as user...
    OK
    
    name                                           service              plan      bound apps  last operation
    data-flow                                      p-dataflow           standard              create succeeded
    mysql-b3e76c87-c5ae-47e4-a83c-5fabf2fc4f11     p-dataflow-mysql     proxy                 create succeeded
    rabbitmq-b3e76c87-c5ae-47e4-a83c-5fabf2fc4f11  p-dataflow-rabbitmq  proxy                 create succeeded
    
  3. Update the service instance using cf update-service, passing any supported parameters with the -c flag:

    cf update-service SERVICE_NAME -c '{ "PARAMETER": "VALUE" }'
    

    Where:

    • SERVICE_NAME is the name of the service instance.
    • PARAMETER is a supported parameter.
    • VALUE is the value for the parameter.

    For more information, see Available Parameters.

    To upgrade a service instance to the latest version included in the tile, include the upgrade parameter with the value true. For example:

    $ cf update-service data-flow -c '{"upgrade": true}'
    Updating service instance data-flow as user...
    OK
    
    Update in progress. Use 'cf services' or 'cf service data-flow' to check operation status.
    
  4. As shown in the output from the cf update-service command, you can use the cf services or cf service commands to check the status of the service instance.

    When the Data Flow service instance has been updated, the cf service command returns a status update succeeded:

    $ cf service data-flow
    Showing info of service data-flow in org myorg / space dev as user...
    
    name:            data-flow
    service:         p-dataflow
    bound apps:
    tags:
    plan:            standard
    description:     Deploys Spring Cloud Data Flow servers to orchestrate data pipelines
    documentation:
    dashboard:       https://p-dataflow.apps.example.com/instances/1cf8ff5b-4a65-469d-bee7-36e6541ac241/dashboard
    
    Showing status of last operation from service data-flow...
    
    status:    update succeeded
    message:   Updated
    started:   2018-06-19T19:26:09Z
    updated:   2018-06-19T19:29:17Z
    

Deleting an instance

Deleting a Data Flow service instance results in deletion of all of its dependent service instances.

  1. Begin by targeting the org and space.

    $ cf target -o myorg -s development
    api endpoint:   https://api.system.example.com
    api version:    2.75.0
    user:           user
    org:            myorg
    space:          development
    
  2. You can view all service instances in the space by running:

    $ cf services
    Getting services in org myorg / space development as user...
    OK
    
    name                                           service              plan      bound apps  last operation
    data-flow                                      p-dataflow           standard              create succeeded
    mysql-b3e76c87-c5ae-47e4-a83c-5fabf2fc4f11     p-dataflow-mysql     proxy                 create succeeded
    rabbitmq-b3e76c87-c5ae-47e4-a83c-5fabf2fc4f11  p-dataflow-rabbitmq  proxy                 create succeeded
    
  3. Delete the Data Flow service instance using cf delete-service. When prompted, enter y to confirm the deletion.

    $ cf delete-service data-flow
    
    Really delete the service data-flow?>y
    Deleting service data-flow in org myorg / space development as user...
    OK
    
    Delete in progress. Use 'cf services' or 'cf service data-flow' to check operation status.
    

    The dependent service instances for the Data Flow server service instance are deleted first, and then the Data Flow server service instance is deleted.

  4. As shown in the output from the cf delete-service command, you can use the cf services or cf service commands to check the status of the service instance. When the Data Flow service instance and its dependent service instances have been deleted, the cf services command no longer lists the service instance:

    $ cf services
    Getting services in org myorg / space development as user...
    OK
    
    No services found