Releases: cdapio/cdap

Cask Data Application Platform 4.0.0

21 Dec 11:19

New Features

  • Added a transactional messaging system for reliable communication of messages between components. In CDAP 4.0.0, it replaces Kafka for publishing and subscribing to the audit logs that CDAP uses to compute data lineage. (CDAP-7211)
  • Added a pluggable extension mechanism for retrieving operational statistics in CDAP, with extensions provided for YARN, HDFS, HBase, and CDAP. (CDAP-7670) (CDAP-7703) (CDAP-7704)
  • Added RESTful endpoints for dynamically updating or resetting the log levels of worker, flow, and service programs. (CDAP-5479) (CDAP-7214)

Improvements

  • Added a menu option in Cloudera Manager, when running the CDAP CSD, for running utilities such as the HBaseQueueDebugger. (CDAP-5632)
  • Added support for impersonation in CDAP Explore (Hive) operations, such as enabling exploration of a dataset or running queries against it. (CDAP-6587)
  • Added support for client certificate-based authentication in the CDAP Authentication server. (CDAP-7287)
  • Merged various shell scripts into a single script, cdap, for interfacing with CDAP; it ships with both the SDK and Distributed CDAP. (CDAP-1280)
  • Updated the default CDAP Router port to 11015 to avoid conflicting with HiveServer2's default port. (CDAP-1696)
  • Fixed an issue with the CDAP scripts under Windows not correctly handling a JAVA_HOME path that contains spaces. CDAP SDK home directories with spaces in the path are not supported (due to issues with the product), and the scripts now exit if such a path is detected. (CDAP-3262)
  • For MapReduce programs using a PartitionedFileSet as input, the partition key corresponding to the input split is now exposed to the mapper. (CDAP-4322)
  • Fixed an issue where an exception from an HttpContentConsumer was being silently ignored. (CDAP-4901)
  • Added pagination to the search RESTful API, using the offset, limit, numCursors, and cursor parameters. (CDAP-5068)
  • Added the property program.container.dist.jars for specifying extra JARs to be localized to every program container and added to the classpaths of CDAP programs. (CDAP-6183)
  • Fixed an issue that allowed a FileSet to be created if its corresponding directory already existed. (CDAP-6425)
  • The namespace that integration test cases run against by default is now configurable. (CDAP-6572)
  • Added caching of user credentials in CDAP system services. (CDAP-6635)
  • Fixed an issue where WorkerContext did not properly implement the contract of the Transactional interface. Note that in certain cases this fix may cause incompatibilities with previous releases; see API Changes, CDAP-6837, for details. (CDAP-6837)
  • Updated more system services to respect the cdap-site parameter "master.service.memory.mb". (CDAP-6862)
  • Added support for concurrent runs of a Spark program. (CDAP-6885)
  • Added support for running CDAP on Apache HBase 1.2. (CDAP-6937)
  • Added support for Amazon EMR 4.6.0+ installation of CDAP via a bootstrap action script. (CDAP-6938)
  • Added support for enabling SSL between the CDAP Router and CDAP Master. (CDAP-6984)
  • Added the capability to clean up log files that do not have corresponding metadata. (CDAP-6995)
  • Added support in Spark Streaming programs for persisting checkpoints transactionally. (CDAP-7117)
  • Updated the Windows start scripts to match the new shell script functionality. (CDAP-7181)
  • Added the ability to specify an announce address and port for the CDAP AppFabric and Dataset services. Deprecated the properties app.bind.address and dataset.service.bind.address, replacing them with master.services.bind.address as the bind address for master services. Added the properties master.services.announce.address, app.announce.port, and dataset.service.announce.port for use as announce addresses that are different from the bind address. (CDAP-7192)
  • Improved CDAP Master logging of events related to programs that it launches. (CDAP-7208)
  • Fixed a NullPointerException being logged when closing a network connection. (CDAP-7240)
  • Upgraded the Apache Tephra version to 0.10-incubating. (CDAP-7284)
  • Added support for CDH 5.9. (CDAP-7291)
  • Programs now have more control over when and how transactions are executed. (CDAP-7319)
  • The Log HTTP Handler and Router have been fixed to allow the streaming of larger log files. (CDAP-7385)
  • Revised the documentation on the recommended setting for yarn.nodemanager.delete.debug-delay-sec. (CDAP-7393)
  • Removed from the documentation the requirement to run kinit before running the CDAP Upgrade Tool when upgrading a package installation of CDAP on a secure Hadoop cluster. (CDAP-7439)
  • Improved how MapReduce configures its inputs, so that failures surface immediately. (CDAP-7476)
  • Fixed an issue in MapReduce that caused the destroy() method to be skipped if committing any of the dataset outputs failed. (CDAP-7477)
  • DynamicPartitioner can now limit the number of open RecordWriters to one, if the output partition keys are grouped. (CDAP-7557)
  • Added support for specifying the Hive execution engine dynamically at runtime. (CDAP-7659)
  • Added the cluster.name property, which identifies a cluster and can be set in the cdap-site.xml file. (CDAP-7761)
  • Added a step in the CDAP Upgrade Tool to upgrade the specification of the MetadataDataset. (CDAP-7797)
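
The DynamicPartitioner change (CDAP-7557) rests on a simple idea: if records arrive grouped by output partition key, the writer for one key can be closed as soon as the next key appears, so at most one writer is ever open. The Java sketch below illustrates only that idea. The GroupedWriterDemo class is hypothetical, StringBuilders stand in for RecordWriters, and none of this is CDAP's DynamicPartitioner API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: StringBuilders stand in for RecordWriters,
// record[0] is the output partition key and record[1] is the value.
final class GroupedWriterDemo {

    // Largest number of writers open at the same time during the last call.
    static int maxOpenWriters = 0;

    /**
     * Writes (key, value) records that must already be grouped by key,
     * keeping at most one writer open: when the key changes, the previous
     * writer is closed (its contents committed) before a new one is opened.
     */
    static Map<String, String> writeGrouped(List<String[]> records) {
        Map<String, String> committed = new LinkedHashMap<>();
        String currentKey = null;
        StringBuilder currentWriter = null;
        int openWriters = 0;
        maxOpenWriters = 0;

        for (String[] record : records) {
            String key = record[0];
            if (!key.equals(currentKey)) {
                if (currentWriter != null) {
                    committed.put(currentKey, currentWriter.toString()); // close previous writer
                    openWriters--;
                }
                if (committed.containsKey(key)) {
                    // A closed key reappeared: input is not grouped, so one writer is not enough.
                    throw new IllegalStateException("records are not grouped by key: " + key);
                }
                currentKey = key;
                currentWriter = new StringBuilder(); // open a writer for the new key
                openWriters++;
                maxOpenWriters = Math.max(maxOpenWriters, openWriters);
            }
            currentWriter.append(record[1]).append('\n');
        }
        if (currentWriter != null) {
            committed.put(currentKey, currentWriter.toString()); // close the last writer
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"2016-12-01", "a"},
                new String[] {"2016-12-01", "b"},
                new String[] {"2016-12-02", "c"});
        Map<String, String> out = writeGrouped(records);
        System.out.println(out.keySet() + ", max open writers = " + maxOpenWriters);
    }
}
```

The precondition matters: if a key reappears after its writer was closed, a single open writer no longer suffices, which is why the sketch fails fast instead of silently overwriting earlier output.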

Bug Fixes

  • A MapReduce job using either a FileSet or PartitionedFileSet as input no longer fails if there are no input partitions. (CDAP-2945)
  • The Authentication server announce address is now configurable. (CDAP-4535)
  • Fixed a problem with downloading large (multi-gigabyte) CDAP Explore query results. (CDAP-5012)
  • Fixed an issue where the metadata of streams was not being updated when the stream's schema was altered. (CDAP-5061)
  • Fixed an issue where a warning was logged instead of an error when a MapReduce job failed in the CDAP SDK. (CDAP-5372)
  • Updated the default CDAP UI port to 11011 to avoid conflicting with Accumulo and Cloudera Manager's Activity Monitor. (CDAP-5897)
  • Authentication handler APIs have been updated to restrict which cdap-site.xml and cdap-security.xml properties are available to them. (CDAP-6398)
  • Fixed an issue with searching for an entity in Cask Tracker by metadata after a tag with the same prefix has been removed. (CDAP-6404)
  • Fixed an issue with misleading log messages from the RunRecord corrector. (CDAP-7031)
  • Significantly reduced the chance of a schedule misfire when the CPU cannot trigger a schedule within a certain time threshold. (CDAP-7116)
  • Fixed a problem with duplicate logs showing for a running program. (CDAP-7138)
  • With an incorrect ZooKeeper quorum configuration, the CDAP Upgrade Tool and other services such as Master, Router, and Kafka now time out with an error instead of hanging indefinitely. (CDAP-7154)
  • Fixed an issue in the CDAP Upgrade Tool to allow it to run on a CDAP instance with authorization enabled. (CDAP-7175)
  • Fixed an issue where macros were not being substituted for postaction plugins. (CDAP-7177)
  • Lineage information is now returned for deleted datasets. (CDAP-7204)
  • Fixed an issue with the FileBatchSource not working with Azure Blob Storage. (CDAP-7248)
  • Fixed an issue with CDAP Explore using Tez on Azure HDInsight. (CDAP-7249)
  • Fixed an issue where dataset usage was n...
Cask Data Application Platform 3.6.0

06 Oct 01:07

Improvements

  • Added support for concurrent runs of different versions of a service. A RouteConfig can be uploaded to configure the percentage of requests sent to each version. (CDAP-5771)
  • Improved the PartitionedFileSet to validate the schema of a partition key. Note that this breaks code that uses incorrect partition keys, which were previously silently ignored. (CDAP-7281)
  • All non-versioned endpoints are now directed to applications with a default version. Added test cases with a mixed usage of the new versioned endpoints and the corresponding non-versioned endpoints. (CDAP-7343)
  • Added an upgrade step that adds a default version ID to jobs and triggers in the Schedule Store. (CDAP-7366)
  • The Log HTTP Handler and Router have been fixed to allow the streaming of larger log files. (CDAP-7385)
  • Added an HTTP RESTful API to create applications with a version. (CDAP-7264)
  • Added an HTTP RESTful API to start or stop programs of a specific application version. (CDAP-7265)
  • Added an upgrade step that adds a default application version to existing applications. (CDAP-7266)
  • Added an HTTP RESTful API to store, fetch, and delete RouteConfigs for user service endpoint routing control. (CDAP-7268)
  • User services now include their application version in the payload when they announce themselves in Apache Twill. (CDAP-7272)
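
CDAP-5771 describes sending a configured percentage of service requests to each application version. As a hedged illustration of percentage-based routing in general (not CDAP's RouteConfig format or its router implementation), the Java sketch below picks a version from a hypothetical map of version to percentage, using a roll in the range 0 to 99. The WeightedVersionRouter class and its methods are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of percentage-based routing between service versions.
// The map stands in for a RouteConfig; this is not CDAP's actual format.
final class WeightedVersionRouter {

    /** Returns the version whose cumulative percentage range contains roll (0 <= roll < 100). */
    static String pick(Map<String, Integer> percentages, int roll) {
        int total = 0;
        for (int p : percentages.values()) {
            total += p;
        }
        if (total != 100) {
            throw new IllegalArgumentException("percentages must sum to 100, got " + total);
        }
        if (roll < 0 || roll >= 100) {
            throw new IllegalArgumentException("roll out of range: " + roll);
        }
        int cumulative = 0;
        for (Map.Entry<String, Integer> e : percentages.entrySet()) {
            cumulative += e.getValue();
            if (roll < cumulative) {
                return e.getKey();
            }
        }
        throw new AssertionError("unreachable when percentages sum to 100");
    }

    /** Routes one request using a uniformly random roll. */
    static String route(Map<String, Integer> percentages) {
        return pick(percentages, ThreadLocalRandom.current().nextInt(100));
    }

    public static void main(String[] args) {
        Map<String, Integer> config = new LinkedHashMap<>();
        config.put("v1", 80); // about 80% of requests
        config.put("v2", 20); // about 20% of requests
        System.out.println(route(config));
    }
}
```

A LinkedHashMap keeps the cumulative ranges in insertion order, so a given roll always maps to the same version, which makes the routing logic easy to test deterministically through pick().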

Bug Fixes

  • The unit test framework can now exclude the Scala library, so users can depend on their own version of it. (CDAP-3822)
  • Fixed an issue where dataset usage was not being recorded after an application was deleted. (CDAP-7250)
  • Fixed a problem with the documentation example links to the CDAP ETL Guide. (CDAP-7314)
  • Fixed a problem with upgrading CDAP using the CDAP Upgrade Tool. (CDAP-7321)
  • Fixed a problem with the upgrade tool while upgrading HBase coprocessors. (CDAP-7324)
  • Fixed a problem with the listing of applications not returning the application version correctly. (CDAP-7334)
  • Fixed a problem with the "Download All" logs option in the browser log viewer by having it fetch and stream the response to the client. (CDAP-7353)
  • Fixed a problem with NodeJS buffering a response before sending it to a client. (CDAP-7359)
  • Fixed a problem with log file corruption if the log saver container crashes due to being killed by YARN. (CDAP-7361)
  • Fixed a problem with the CDAP UI not handling "5xx" error codes correctly. (CDAP-7364)
  • Fixed Hydrator Studio in the Windows version of Chrome to allow users to open and edit a node configuration. (CDAP-7374)
  • Fixed an error in the "CDAP Introduction" tutorial's "Transforming Your Data" example of an application configuration. (CDAP-7386)
  • Fixed an issue that caused unit test failures when using org.hamcrest classes. (CDAP-7391)
  • Fixed an issue where the Java process corresponding to the MapReduce application master kept running even if the application was moved to the FINISHED state. (CDAP-7392)
  • Fixed a problem with Hydrator pipelines using a DBSource not working in an HDP cluster. (HYDRATOR-791)
  • Fixed a problem with Spark data pipelines not supporting argument values in excess of 64K characters. (HYDRATOR-948)

Cask Data Application Platform 3.5.1

15 Sep 16:26

Known Issues

  • If you are upgrading an authorization-enabled CDAP instance, you will need to give the cdap user ADMIN privileges on all existing CDAP namespaces. See the Administration Manual: Upgrading for your distribution for details. (CDAP-7175)

  • In CDAP 3.5.0, new kafka.server.* properties replace older properties such as kafka.log.dir, as described in the Administration Manual: Appendices: cdap-site.xml.

    If you are upgrading from CDAP 3.4.x to 3.5.x, and you have set a value for kafka.log.dir by using Cloudera Manager's safety-valve mechanism, you need to change to the new property kafka.server.log.dirs, as the deprecated kafka.log.dir is being ignored in favor of the new property. If you don't, your custom value will be replaced with the default value. (CDAP-7179)
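
The CDAP-7179 issue amounts to a deprecated key being silently ignored. The Java sketch below assumes the configuration has been loaded into a java.util.Properties object, which is only a stand-in for however you manage cdap-site.xml or the safety valve and is not a CDAP API. It shows how a migration check could carry a custom kafka.log.dir value over to kafka.server.log.dirs. The KafkaLogDirMigration class is hypothetical.

```java
import java.util.Properties;

// Hypothetical migration helper (not a CDAP tool): carries a custom value from
// the deprecated kafka.log.dir key over to kafka.server.log.dirs.
final class KafkaLogDirMigration {

    static final String OLD_KEY = "kafka.log.dir";
    static final String NEW_KEY = "kafka.server.log.dirs";

    /** Returns true if the properties were modified. */
    static boolean migrate(Properties props) {
        String oldValue = props.getProperty(OLD_KEY);
        if (oldValue == null) {
            return false; // nothing set under the deprecated key
        }
        if (props.getProperty(NEW_KEY) == null) {
            props.setProperty(NEW_KEY, oldValue); // preserve the custom value
        } else {
            System.err.println("Both keys set; keeping " + NEW_KEY + "=" + props.getProperty(NEW_KEY));
        }
        props.remove(OLD_KEY); // the deprecated key is ignored in 3.5.x anyway
        return true;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(OLD_KEY, "/data/kafka-logs");
        migrate(props);
        System.out.println(NEW_KEY + " = " + props.getProperty(NEW_KEY));
    }
}
```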

Improvements

  • Added the ability to specify an announce address and port for the appfabric and dataset services.

    Deprecated the properties app.bind.address and dataset.service.bind.address, replacing them with master.services.bind.address as the bind address for master services.

    Added the properties master.services.announce.address, app.announce.port, and dataset.service.announce.port for use as announce addresses that are different from the bind address. (CDAP-7192)

  • Upgraded the version of netty-http used in CDAP to version 0.15, resolving a problem with a NullPointerException being logged on the closing of a network connection. (CDAP-7240)

  • Snapshot sinks now let users specify a cleanPartitionsOlderThan property that cleans up any snapshots older than a given number of days. (HYDRATOR-578)
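
The selection behind a setting like cleanPartitionsOlderThan can be sketched with java.time. In this hedged example the SnapshotRetention class, the list of snapshot timestamps, and the olderThanDays parameter are illustrative assumptions; it is not the snapshot sink's actual implementation.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: choose which snapshot timestamps fall outside a
// retention window of olderThanDays days, relative to now.
final class SnapshotRetention {

    static List<Instant> expired(List<Instant> snapshots, int olderThanDays, Instant now) {
        Instant cutoff = now.minus(Duration.ofDays(olderThanDays));
        List<Instant> result = new ArrayList<>();
        for (Instant snapshot : snapshots) {
            if (snapshot.isBefore(cutoff)) {
                result.add(snapshot); // strictly older than the cutoff
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2016-09-15T00:00:00Z");
        List<Instant> snapshots = new ArrayList<>();
        snapshots.add(Instant.parse("2016-09-01T00:00:00Z"));
        snapshots.add(Instant.parse("2016-09-14T00:00:00Z"));
        System.out.println(expired(snapshots, 7, now));
    }
}
```

Passing now explicitly, instead of calling Instant.now() inside the method, keeps the selection deterministic and easy to test.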

Bug Fixes

  • PartitionConsumer appropriately drops partitions that have been deleted from a corresponding PartitionedFileSet. (CDAP-6215)
  • Fixed an issue with searching for an entity in Cask Tracker by metadata after a tag with the same prefix has been removed. (CDAP-6404)
  • Fixed a problem with duplicate logs showing for a running program. (CDAP-7138)
  • Fixed a bug in the upgrade tool to allow it to run on a CDAP instance with authorization enabled. (CDAP-7175)
  • Fixed an issue with uploading an application JAR or file to a stream through the CDAP UI. (CDAP-7178)
  • Fixed a problem with the property dataset.service.bind.address having no effect. (CDAP-7187)
  • Corrected the documentation to show how to set the schema on an existing table. (CDAP-7199)
  • Lineage information is now returned for deleted datasets. (CDAP-7204)
  • Fixed a problem with being unable to delete a namespace if a configured keytab file doesn't exist. (CDAP-7222)
  • Fixed a problem with a NullPointerException when the CDAP UI fetches a log. (CDAP-7235)
  • Prevented accidental grant of additional actions to a user as part of a grant operation when using Apache Sentry as the authorization provider. (CDAP-7237)
  • Fixed a problem with the FileBatchSource not working with Azure Blob Storage. (CDAP-7248)
  • Fixed a problem with CDAP Explore using Tez on Azure HDInsight. (CDAP-7249)
  • Fixed an issue where the Joiner plugin was failing in Hydrator pipelines executing in a Spark environment. (HYDRATOR-912)
  • Fixed a bug that caused the Database Source, Joiner, GroupByAggregate, and Deduplicate plugins to fail on certain versions of Spark. (HYDRATOR-922)
  • Fixed an error in the documentation of the HDFS Source and Sink with respect to the alias under high-availability. (HYDRATOR-932)
  • Fixed an issue preventing the adding of additional tags after an existing tag had been deleted. (TRACKER-217)

Cask Data Application Platform 3.5.0

23 Aug 01:44

New Features

  • All HBase Tables created through CDAP will now have a key cdap.version in the HTableDescriptor. (CDAP-2963)
  • Added the location of cdap-cli.sh to the PATH in distributed CDAP packages. (CDAP-3368)
  • Improved performance of the Dataset Service. (CDAP-3890)
  • Created pre-defined alert definitions in the CDAP Ambari Service. (CDAP-4106)
  • Added support for HA CDAP installations in the CDAP Ambari Service. (CDAP-4107)
  • Added support for Kerberos-enabled clusters via the CDAP Ambari Service. (CDAP-4109)
  • CDAP Auth Server is now supported in the CDAP Ambari Service on Ambari clusters which have Kerberos enabled. (CDAP-4110)
  • Added an authorization extension backed by Apache Sentry to enforce authorization on CDAP entities. (CDAP-4288)
  • Added a way to cache authorization policies so that not every authorization enforcement request has to make a remote call. Caching is configurable: it can be enabled by setting security.authorization.cache.enabled to true. The TTL for cache entries (security.authorization.cache.ttl.secs) and the refresh interval (security.authorization.cache.refresh.interval.secs) are also configurable. (CDAP-4913)
  • Provided access to Partitioner and Comparator classes to the MapReduceTaskContext by implementing ProgramLifeCycle. (CDAP-5740)
  • Added the ability to set YARN container resource requirements for all program types via preferences and runtime arguments. (CDAP-5770)
  • Added protection against deleting a partition of a file set while a query is reading that partition. (CDAP-6062)
  • CDAP namespaces can now be mapped to custom namespaces in storage providers. While creating a namespace, users can specify the Filesystem directory, HBase namespace and Hive database for that namespace. These settings cannot be changed once the namespace has been successfully created. (CDAP-6153)
  • Enabled authorization, lineage, and audit logging at the data operation level for all datasets. (CDAP-6168)
  • Added a new log viewer across CDAP, Cask Hydrator, and Cask Tracker, wherever appropriate. It provides easier navigation and debugging functionality for the logs of different entities. (CDAP-6174)
  • Added an indicator in the UI of the CDAP mode (distributed or standalone, secure or insecure). (CDAP-6235)
  • Added authorization to the Secure Key HTTP RESTful APIs. To create a secure key, a user needs WRITE privilege on the namespace in which the key is being created. Users can only view secure keys that they have access to. To delete a key, ADMIN privilege is required. (CDAP-6393)
  • Exposed the secure store APIs to programs. (CDAP-6456)
  • Added authorization for listing and viewing CDAP entities. (CDAP-6516)
  • Fixed an issue where the UI would ignore the configured port when connecting to the CDAP Router. (CDAP-7002)
  • Added an alpha feature: Hydrator Data Pipeline preview (CDAP SDK only). (HYDRATOR-156)
  • Added support for executing custom actions in the Cask Hydrator pipelines. (HYDRATOR-162)
  • Re-organized the bottom panel in Cask Hydrator to be in-context. Pipeline-level information is moved to a top panel and plugin-level information is moved to a modal dialog. (HYDRATOR-168)
  • Re-organized the left panel in Cask Hydrator studio view to have a maximum of four categories of plugin types: Source, Transform, Sink, and Actions. All other types are consolidated into one of these types. (HYDRATOR-379)
  • Implemented the Value Mapper plugin for Cask Hydrator plugins. This is a type of transform that maps string values of a field in the input record to another value. (HYDRATOR-501)
  • Added the XML Parser Transform plugin to Cask Hydrator plugins. This plugin uses XPath to extract fields from a complex XML Event. It is generally used in conjunction with the XML Reader Source Plugin. (HYDRATOR-502)
  • Added the XML Reader Source Plugin to Cask Hydrator plugins. This plugin allows users to read XML files stored on HDFS. (HYDRATOR-503)
  • Implemented the Cask Hydrator plugin for Row Denormalizer aggregator. This plugin converts raw data into de-normalized data based on a key column. De-normalized data can be easier and faster to query. (HYDRATOR-506)
  • Added the Cobol Copybook source plugin to Cask Hydrator plugins. This source plugin allows users to read and process mainframe files defined using COBOL Copybook. (HYDRATOR-507)
  • Added the Excel Reader Source plugin to Cask Hydrator Plugins. This plugin provides the ability to read data from one or more Excel file(s). (HYDRATOR-514)
  • Added macros to pipeline plugin configurations, allowing users to set macros for plugin properties whose values can be provided as runtime arguments while scheduling and running the pipeline. (HYDRATOR-629)
  • Added a new Run Configuration player for published pipeline views, allowing users to set runtime arguments while scheduling or running a pipeline. (HYDRATOR-634)
  • Added a Twitter source for Spark Streaming pipelines. (HYDRATOR-685)
  • Added the ability to edit user properties for a dataset directly in Cask Tracker. (TRACKER-96)
  • Added the Cask Tracker Meter to measure how active a dataset is in a cluster on a scale of zero to 100. (TRACKER-98)
  • Added the ability to add, remove, and manage a common dictionary of Preferred Tags in Cask Tracker and apply them to datasets. (TRACKER-100)
  • Added the ability to preview data directly in the Cask Tracker UI. (TRACKER-104)
  • Added the ability to view usage metrics about datasets in Cask Tracker. Users can view how many applications and programs are accessing each dataset using service endpoints and the Tracker UI. (TRACKER-105)
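
Macros (HYDRATOR-629) let a plugin property be filled in from runtime arguments. The Java sketch below assumes the ${name} macro syntax and handles only plain, non-nested references using java.util.regex; the MacroSubstitution class is hypothetical and is not CDAP's macro evaluator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: substitute ${name} macros in a plugin property with
// values taken from runtime arguments. Non-nested macros only.
final class MacroSubstitution {

    private static final Pattern MACRO = Pattern.compile("\\$\\{([^\\$\\{\\}]+)\\}");

    static String substitute(String property, Map<String, String> runtimeArgs) {
        Matcher m = MACRO.matcher(property);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = runtimeArgs.get(name);
            if (value == null) {
                throw new IllegalArgumentException("no runtime argument for macro: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> runtimeArgs = new HashMap<>();
        runtimeArgs.put("input.path", "/data/2016-08-23");
        System.out.println(substitute("${input.path}/*.csv", runtimeArgs));
    }
}
```

Failing on a missing argument, rather than leaving the macro text in place, surfaces configuration mistakes before the pipeline runs with a literal ${...} value.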

Improvements

  • Created a Docker-specific ENTRYPOINT script to support passing arguments. (CDAP-1545)
  • Improved the way that MapReduce failures are reported. (CDAP-4065)
  • CDAP now warns if either the app-fabric or router bind address is configured with a loopback address. (CDAP-4775)
  • The number of containers for the CDAP Explore service is no longer configurable and will be ignored upon specification. It will always be set to one (1). (CDAP-5000)
  • CDAP now publishes stdout and stderr logs for MapReduce containers. (CDAP-5336)
  • Added the ability to set the batch size for flowlet process methods via preferences and runtime arguments. (CDAP-5601)
  • Added support for long-running Spark jobs in a Kerberos-enabled cluster. (CDAP-5794)
  • Added support for starting extensions in distributed mode. (CDAP-5874)
  • Setting the JAVA_LIBRARY_PATH now causes CDAP Master to load Hadoop native libraries at startup. (CDAP-5959)
  • CDAP Upgrade tasks are now available in the CDAP Ambari Service. (CDAP-5969)
  • CDAP's Tephra dependency has been changed to depend on the Apache Incubator Tephra project. (CDAP-6034)
  • Improved the error message given on application deployment failure due to a missing Spark library. (CDAP-6206)
  • Added support in the log API for field suppression in JSON format. (CDAP-6216)
  • Added the ability to specify a CDAP Master's temporary directory. (CDAP-6246)
  • Introduced new experimental dataset APIs for updating a dataset's properties. (CDAP-6276)
  • Allowed specifying individual Java heap sizes for Java services in cdap-env.sh. (CDAP-6327)
  • Declared startup script contents as read-only to prevent them from being overridden by a user in cdap-env.sh. (CDAP-6350)
  • Added "Quick Links" for the CDAP UI, Cask Hydrator, and Cask Tracker in the Ambari 2.3+ UI. (CDAP-6361)
  • Added support for CDAP services over SSL in Ambari. (CDAP-6362)
  • Provided service dependencies for Ambari (requires Ambari 2.2+). ([CDAP-6363](https://issues.cask.c...
Cask Data Application Platform 3.3.7

21 Aug 17:49

Improvements

  • Improved program launch performance to avoid large CPU spikes when multiple programs are launched at the same time. (CDAP-7021)

Bug Fixes

  • Created a Docker-specific ENTRYPOINT script to easily support arguments. (CDAP-1545)
  • Fixed an issue that caused massive log messages when there were underlying HDFS issues. (CDAP-6643)
  • Fixed issues that prevented the log saver from performing cleanup when metadata is present for a nonexistent file. (CDAP-6829)
  • Made the log saver more resilient to errors while checkpointing. (CDAP-6852)
  • Improved performance in cube datasets when querying for more than one measure in a query. This will also improve metrics query performance. (CDAP-6860)

Cask Data Application Platform 3.3.6

21 Jul 19:23

Bug Fixes

  • Made the log saver process resilient to underlying HDFS exceptions. (CDAP-6465)
  • Fixed a problem with the CDAP Master leaking memory (due to a Twill ZooKeeper issue) whenever a program is launched. (CDAP-6486)
  • Fixed a performance issue with the log handler by setting a maximum limit for the reading of log events from Kafka before requiring reading the events from disk storage. (CDAP-6493)
  • Fixed a problem with the log saver slowing down when Kafka partitions become highly skewed. (CDAP-6545)

Cask Data Application Platform 3.4.3

02 Jul 00:04

Bug Fixes

  • Fixed an issue where configuration of the FileSource was failing while setting the properties for the FileInputFormat. (CDAP-6238)
  • Fixed a bug in HDFSSink so that it now emits a null character in UTF-8 encoding if a field is null. (CDAP-6255)
  • HDFSSink can now be used alongside other sinks in a Hydrator pipeline. (CDAP-6258)
  • Release 3.4.0 introduced infinite scrolling for the display of input and output schemas, but the version (1.2.2) of the infinite-scroll component it used had performance problems. The component has been downgraded to restore performance in Hydrator views. (CDAP-6302)
  • Fixed a bug where the program run record was not correctly reflected in CDAP if the corresponding YARN application failed to start. (CDAP-6311)

Cask Data Application Platform 3.3.5

01 Jul 22:22

Bug Fixes

  • Fixed a bug where the program run record was not correctly reflected in CDAP if the corresponding YARN application failed to start. (CDAP-6311)

Cask Data Application Platform 3.4.2

07 Jun 23:47

Bug Fixes

  • Fixed integration of the navigator app in the Tracker UI. The POST request body sent while deploying the navigator app used an older, deprecated property (the UI was using ‘metadataKafkaConfig’ instead of ‘auditKafkaConfig’). The navigator app can now be used in the Tracker UI. (CDAP-5998)
  • Fixed an issue with cloning a Hydrator pipeline that occurred when a user navigated from CDAP to Hydrator to clone a pipeline in the Hydrator UI. (CDAP-6096)
  • Fixed a NullPointerException in Spark when saving an RDD to a PartitionedFileSet dataset. (CDAP-6109)
  • Fixed the Hydrator Hive batch source so that it no longer throws a ClassNotFoundException. (CDAP-6041)
  • Fixed Hydrator CSVParser so that a nullable field is only set to null if the parsed value is an empty string, and the field is not a string or nullable string type. (CDAP-6044)
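
The CSVParser rule in CDAP-6044 is easy to misread, so here is a minimal Java sketch of it. The CsvNullRule class and its FieldType enum are hypothetical stand-ins for the plugin's schema types; the sketch only decides whether a parsed value becomes null and does not convert values to their target types.

```java
// Hypothetical sketch of the CSVParser null-handling rule from CDAP-6044;
// FieldType stands in for the plugin's schema types.
final class CsvNullRule {

    enum FieldType { STRING, INT, LONG, FLOAT, DOUBLE, BOOLEAN }

    /**
     * Returns null only when the field is nullable, the parsed value is the
     * empty string, and the field is not a string type; otherwise returns the
     * parsed value unchanged (type conversion is out of scope here).
     */
    static String normalize(String parsed, FieldType type, boolean nullable) {
        if (nullable && parsed.isEmpty() && type != FieldType.STRING) {
            return null;
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(normalize("", FieldType.INT, true));    // empty nullable int
        System.out.println(normalize("", FieldType.STRING, true)); // empty nullable string
    }
}
```

The key consequence is that an empty value in a nullable string field stays an empty string rather than becoming null, while an empty value in a nullable numeric or boolean field becomes null.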

Cask Data Application Platform 3.3.4

19 May 19:00

Bug Fixes

  • Explore jobs now properly use the latest (updated) delegation tokens. (CDAP-5793)
  • HDFS delegation tokens are now properly updated in HA mode. (CDAP-5844)
  • Delegation tokens are no longer cancelled upon completion of Explore-launched MapReduce and Spark jobs, as these tokens are shared by CDAP system services. (CDAP-5855)