CSD Overview

Kostas Sakellis edited this page Feb 27, 2014 · 10 revisions

Custom Service Descriptors

Cloudera Manager (CM) 4.5 introduced parcels, a mechanism for distributing software to a managed cluster. Parcels only distribute software across the cluster; they do not start or manage processes. Cloudera Manager 5 introduces the ability to add your own managed service to CM through Custom Service Descriptors (CSDs). A third-party service that uses a CSD can leverage Cloudera Manager features such as monitoring, resource management, configuration, distribution, and life-cycle management. The service shows up in Cloudera Manager just like any other service, e.g. HDFS or HBase.

Note: This documentation assumes you have read and are familiar with basic operating principles of Cloudera Manager.

Guiding Principles

  • A CSD can be written by non-programmers using documentation and developer tooling.
  • The service descriptor language (SDL) should be declarative and not require a specialized programming language.
  • In Cloudera Manager, a service backed by a CSD should look and feel like a first-party service, e.g. HDFS.
  • A baseline of functionality, e.g. process-level monitoring, is provided to every CSD for free.
  • A CSD should work well with parcels but is not limited to use with parcels.
  • If a partner has their own way of installing their software, they can still use a CSD for configuration and process life-cycle management.

What exactly is a CSD?

A CSD is linked to one service type in Cloudera Manager and is packaged and distributed as a jar file. The jar is self-contained and encases all the description and logic needed to manage the service type in CM. For example, the Spark CSD layout is shown below:

$ jar -tf SPARK-1.0.jar 
descriptor/service.sdl
scripts/control.sh
images/icon.png

More examples including the Spark CSD are available in our git repo.

At the heart of the CSD is the service descriptor language (SDL) file, descriptor/service.sdl. This is a JSON file containing instructions for CM on how to manage the service. These instructions include:

  • the service types and associated role types
  • how to start the service/roles
  • parameters for both the service and role types
  • additional commands
  • configuration file generators

See the Service Descriptor Language Reference for more details.
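
To make the list above more concrete, here is a minimal sketch of a script that scaffolds a CSD source tree with a small descriptor. Everything in it is illustrative: the ECHO service, the ECHO_SERVER role, and the SDL field names (name, label, roles, startRunner, program, args) are assumptions and should be checked against the Service Descriptor Language Reference before use.

```shell
#!/bin/sh
# Sketch: scaffold a CSD source tree that mirrors the Spark layout shown earlier.
# The service name, role name, and SDL field names are hypothetical.
set -eu

csd_dir="${CSD_DIR:-./ECHO-1.0}"
mkdir -p "$csd_dir/descriptor" "$csd_dir/scripts" "$csd_dir/images"

# Minimal descriptor: one service type, one role type, one start command.
# Field names are assumptions; the SDL reference is the authority.
cat > "$csd_dir/descriptor/service.sdl" <<'EOF'
{
  "name": "ECHO",
  "label": "Echo",
  "description": "Hypothetical example service",
  "version": "1.0",
  "roles": [
    {
      "name": "ECHO_SERVER",
      "label": "Echo Server",
      "pluralLabel": "Echo Servers",
      "startRunner": {
        "program": "scripts/control.sh",
        "args": ["start"]
      }
    }
  ]
}
EOF

# Placeholder start script; a fuller version would dispatch on "$1".
printf '#!/bin/sh\necho "starting: $1"\n' > "$csd_dir/scripts/control.sh"
chmod +x "$csd_dir/scripts/control.sh"

# List the files that were created, relative to the CSD root.
(cd "$csd_dir" && find . -type f | sort)
```

The resulting directory would then be packaged as a jar so that its contents match the listing produced by jar -tf above.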

Alongside the service.sdl file is a scripts directory that contains executable scripts used for starting roles and running custom commands. These scripts are referenced by the service.sdl file through script runners. Scripts can be written in any language that can be executed on the cluster. In the above example, scripts/control.sh is written in bash and is used to start the Spark master and worker roles.
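
As a rough illustration of how one script can start more than one role, the sketch below dispatches on its first argument. It is not the actual Spark control.sh: the subcommand names (start_master, start_worker) and the assumption that the script runner passes the subcommand as the first argument are hypothetical, and the echo lines stand in for the real launch commands.

```shell
#!/bin/sh
# Hypothetical control-script sketch. The subcommand names, and the idea that
# the script runner passes the subcommand as the first argument, are assumptions.

control() {
  case "${1:-}" in
    start_master)
      # A real script would exec the master launcher here.
      echo "starting master"
      ;;
    start_worker)
      # A real script would exec the worker launcher here.
      echo "starting worker"
      ;;
    *)
      # Unknown or missing subcommand: report usage and fail.
      echo "usage: control.sh {start_master|start_worker}" >&2
      return 1
      ;;
  esac
}

control start_master   # prints "starting master"
control start_worker   # prints "starting worker"
```

Keeping all roles in one script with a single dispatch point means the descriptor only needs to vary the argument per role; separate scripts per role would work equally well.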

In addition, a CSD may also contain:

  • an icon
  • static configuration files

Naming

The name of the CSD jar file has the following format: <name>-<csd-version>-<extra>.jar. For example, a release build of version 1.0 of the Spark CSD would be SPARK-1.0.jar. The <extra> section of the file name is optional and is reserved for snapshot builds of the CSD when building with Maven. For example, a snapshot build of the Spark CSD would be SPARK-1.0-SNAPSHOT.jar. Cloudera Manager ignores the SNAPSHOT suffix.

Note: when we say version here we mean the CSD version, NOT the underlying service version. For example, version 1.0 of the Spark CSD, SPARK-1.0.jar, might control Spark 0.9.
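
The naming format can be illustrated with a small helper that splits a jar file name into its parts. This is only a sketch, not how Cloudera Manager parses names: the function name parse_csd_jar is hypothetical, and it assumes that neither <name> nor <csd-version> contains a hyphen.

```shell
#!/bin/sh
# Sketch: split a CSD jar file name of the form <name>-<csd-version>[-<extra>].jar.
# Assumes <name> and <csd-version> contain no hyphens; the helper is hypothetical.

parse_csd_jar() (
  base=${1##*/}          # drop any leading directory
  base=${base%.jar}      # drop the .jar suffix
  name=${base%%-*}       # text before the first hyphen
  rest=${base#"$name"}   # remainder after the name
  rest=${rest#-}
  version=${rest%%-*}    # text before the next hyphen
  extra=${rest#"$version"}
  extra=${extra#-}       # empty for release builds
  echo "name=$name version=$version extra=${extra:-none}"
)

parse_csd_jar SPARK-1.0.jar            # prints name=SPARK version=1.0 extra=none
parse_csd_jar SPARK-1.0-SNAPSHOT.jar   # prints name=SPARK version=1.0 extra=SNAPSHOT
```

Both names yield the same name and version, which matches the statement that the SNAPSHOT part does not affect how the CSD is identified.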

CSDs vs. Parcels

Both CSDs and parcels are mechanisms that a partner can use to integrate with Cloudera Manager. A parcel is essentially a tarball of the partner software with some metadata. Through CM, the parcel can be downloaded from a repository, distributed to all the nodes in the cluster, and made active. In effect, the tarball is copied to the cluster and unpacked. A parcel gives the partner no way to start or configure the processes it contains. Parcels alone are a good way to distribute plugins to services, such as the LZO plugin for CDH; in that case, there is no process that needs management.

CSDs pick up where parcels leave off. Once the software is distributed to the cluster, a partner can write a CSD that describes to Cloudera Manager how to administer their software - start/stop, configuration, resource management. A CSD is what provides the ability for a partner to have a service show up in the wizard and have a status page like other CM services.

If the software being integrated has a process management or configuration component, it is strongly recommended to build a parcel for software distribution and a CSD for process management. This provides the most turnkey solution for customers. If, however, a partner would like to use their own mechanism for distributing their service to the cluster, a CSD can still be used for process management.
