Releases: cdapio/cdap
Cask Data Application Platform v3.2.0
New Features
- Added support for HBase 1.1.(CDAP-2556)
- Added a new API for creating an application from an artifact.(CDAP-2666)
- Added the ability to write to multiple outputs from a MapReduce job.(CDAP-2756)
- Added the ability to dynamically write to multiple partitions of a PartitionedFileSet dataset as the output of a MapReduce job.(CDAP-2757)
- Added a Stream and Dataset Widget to the CDAP-UI.(CDAP-3253)
- Added stream views, enabling reading from a single stream using various formats and schemas.(CDAP-3390)
- Added a Validator Transform that can be used to validate records based on a set of available validators and configured to write invalid records to an error dataset.(CDAP-3476)
- Added a service to manage the metadata of CDAP entities.(CDAP-3516)
- Added the publishing of metadata change notifications to Apache Kafka.(CDAP-3518)
- Added the ability to compute lineage of a CDAP dataset or stream in a given time window.(CDAP-3519)
- Added RESTful APIs for adding/retrieving/deleting of metadata for apps/programs/datasets/streams.(CDAP-3520)
- Added the ability to record a dataset or stream access by a CDAP program.(CDAP-3521)
- Added the capability to search CDAP entities based on their metadata.(CDAP-3522)
- Added RESTful APIs for searching CDAP entities based on business metadata.(CDAP-3523)
- Added a data store to manage business metadata of CDAP entities.(CDAP-3527)
- Added SSH port forwarding to the CDAP virtual machine.(CDAP-3549)
- Added a data store for recording data accesses by CDAP programs and computing lineage.(CDAP-3556)
- Added the ability to write to multiple sinks in ETL real-time and batch applications.(CDAP-3590)
- Added the ability for real-time ETL pipelines to write to multiple sinks.(CDAP-3591)
- Added the ability for batch ETL pipelines to write to multiple sinks.(CDAP-3592)
- For the CSV and TSV stream formats, a “mapping” setting can now be specified, mapping stream event columns to schema columns.(CDAP-3626)
- Added support for CDAP to work with HDP 2.3.(CDAP-3693)
Improvements
- Added documentation of the RESTful endpoint to retrieve the properties of a stream.(CDAP-1914)
- Added an interface to load a file into a stream from the CDAP-UI.(CDAP-2514)
- The CDAP-UI “Errors” pop-up in the main screen now displays the time and date for each error.(CDAP-2809)
- Updated the Cloudera Manager CSD to use support for logback.(CDAP-2872)
- Cleaned up the messages shown in the errors dropdown in the CDAP-UI.(CDAP-2950)
- Added a CDAP-CLI command to stop a workflow.(CDAP-3147)
- Added support for upgrading the Hadoop distribution or the HBase version that CDAP is running on.(CDAP-3179)
- Revised the documentation of the file cdap-default.xml, removed properties no longer in use, and corrected discrepancies between the documentation and the shipped XML file.(CDAP-3257)
- Improved the help provided in the CDAP-CLI for the setting of stream formats.(CDAP-3270)
- Upgraded netty-http version to 0.12.0.(CDAP-3275)
- Added an HTTP RESTful API to update the application configuration and artifact version.(CDAP-3282)
- Added a “clear” button in the CDAP-UI for cases where a user decides not to use a pre-populated schema.(CDAP-3332)
- Defined a directory structure to be used for predefined applications.(CDAP-3351)
- Added documentation in the source code on adding new commands and completers to the CDAP-CLI.(CDAP-3357)
- In the CDAP-UI, added visualization for Workflow tokens in Workflows.(CDAP-3393)
- HBaseQueueDebugger now shows the minimum queue event transaction write pointer both for each queue and for all queues.(CDAP-3419)
- Added an example cdap-env.sh to the shipped packages.(CDAP-3443)
- Added an example in the documentation explaining how to prune invalid transactions from the transaction manager.(CDAP-3464)
- Modified the CDAP upgrade tool to delete all adapters and the ETLBatch and ETLRealtime ApplicationTemplates.(CDAP-3490)
- Added the ability to persist the runtime arguments with which a program was run.(CDAP-3495)
- Added support for writing to Amazon S3 in Avro and Parquet formats from batch ETL applications.(CDAP-3550)
- Updated CDAP to use Tephra 0.6.2.(CDAP-3564)
- Updated the transaction debugger client to print checkpoint information.(CDAP-3610)
Bug Fixes
- Fixed an issue where failed dataset operations via Explore queries did not invalidate the associated transaction.(CDAP-1697)
- Fixed a problem where users got an incorrect message while creating a dataset in a non-existent namespace.(CDAP-1864)
- Fixed a problem with services returning the same message for all failures.(CDAP-1892)
- Fixed a problem where a dataset could be created in a non-existent namespace in standalone mode.(CDAP-1984)
- Fixed a problem with the CDAP-CLI creating file logs.(CDAP-2428)
- Fixed a problem with the CDAP-CLI not auto-completing when setting a stream format.(CDAP-2521)
- Fixed a problem in the CDAP-UI with buttons staying ‘in focus’ after clicking.(CDAP-2785)
- Fixed a problem with schedules not being deployed in suspended mode.(CDAP-2892)
- Fixed a problem where failure of a spark node would cause a workflow to restart indefinitely.(CDAP-3014)
- Fixed an issue with the CDAP standalone process periodically crashing with Out-of-Memory errors when writing to an Oracle table.(CDAP-3073)
- Fixed a problem with workflow runs not getting scheduled due to Quartz exceptions.(CDAP-3101)
- Fixed a problem with discrepancies between the documentation and the defaults actually used by CDAP.(CDAP-3121)
- Fixed a problem in the CDAP-UI with the clone button in an incorrect position when using Firefox.(CDAP-3200)
- Fixed a problem in the CDAP-UI with an incorrect tabbing order when using Firefox.(CDAP-3201)
- Fixed a problem when specifying the HBase version using the HBASE_VERSION environment variable.(CDAP-3219)
- Fixed a problem in the CDAP-UI with error pop-ups not having a default focus on a button.(CDAP-3233)
- Fixed a problem in the CDAP-UI with the default schema shown for streams.(CDAP-3243)
- Fixed a problem in the CDAP-UI with scrolling on the namespaces dropdown on certain pages.(CDAP-3260)
- Fixed a problem on CDAP distributed mode with the serializing of the metadata artifact causing a stack overflow.(CDAP-3261)
- Fixed a problem with the CDAP-UI not warning users if they exit or close their browser without saving.(CDAP-3305)
- Fixed a problem in the CDAP-UI with refreshing always returning to the overview page.(CDAP-3313)
- Fixed a problem with the table batch source requiring a row key to be set.(CDAP-3326)
- Fixed a problem with the application deployment for apps that contain Spark.(CDAP-3343)
- Fixed a problem with the display of ETL application metrics in the CDAP-UI.(CDAP-3349)
- Fixed a problem in the CDAP examples with the use of a runtime argument, min.pages.threshold.([CDAP-3355](https://issues.cask.co/browse/CDAP-335...
Cask Data Application Platform v3.1.2
- Improved the UI performance when rendering flow diagrams with a large number of nodes. (CDAP-3530)
- Fixed a bug that prevents stream events that are already processed from being re-processed in flows. (CDAP-3458)
- Fixed a bug that prevented the Explore service from working on clusters with secure Hive 0.14. (CDAP-3452)
- Fixed the readless increment co-processor to handle multiple readless increment columns in the same row. (CDAP-3449)
- Fixed a problem with the logback-container.xml not being copied into master services. (CDAP-3362)
Cask Data Application Platform v3.0.5
- Fixed a bug that prevents stream events that are already processed from being re-processed in flows. (CDAP-3458)
Cask Data Application Platform v3.0.4
Cask Data Application Platform v3.1.1
This is a bug-fix release.
Bugs Fixed
- CDAP-3259 - Removed a development script accidentally included in the 3.1.0 release.
- CDAP-3321 - Fixed a problem of being unable to enable SSL on the CDAP-UI.
- CDAP-3340 - Fixed a problem with the deployment of applications and the batch loading of events to a stream when using the CDAP CLI on Windows.
- CDAP-3362 - Fixed a problem of the logback-container.xml not being copied into the master services.
- CDAP-3377 - Fixed a problem in the CDAP-UI with shrinking the browser height when working with application templates.
- CDAP-3386 - Fixed a problem with Spark classes not being found when running a Spark program through a Workflow in Distributed mode on HDP 2.2.
- CDAP-3404 - Fixed an error in the installation documentation on enabling the CDAP Explore service.
- CDAP-3405 - Fixed a problem with the third step of the Getting Started example on cask.co/get-started.
- CDAP-3408 - Fixed a problem with starting the CDAP Explore service on CDH 5.2 and 5.3.
Cask Data Application Platform v3.1.0
New Features
MapR 4.1 Support, HDP 2.2 Support, CDH 5.4 Support
- CDAP-1614 - Added HBase 1.0 support.
- CDAP-2318 - Made CDAP work on the HDP 2.2 distribution.
- CDAP-2786 - Added support to CDAP 3.1.0 for the MapR 4.1 distro.
- CDAP-2798 - Added Hive 0.14 support.
- CDAP-2801 - Added CDH 5.4 Hive 1.1 support.
- CDAP-2836 - Added support for restart of specific CDAP System Services Instances.
- CDAP-2853 - Completed certification process for MapR on CDAP.
- CDAP-2879 - Added Hive 1.0 in Standalone.
- CDAP-2881 - Added support for HDP 2.2.x.
- CDAP-2891 - Documented cdap-env.sh and settings OPTS for HDP 2.2.
- CDAP-2898 - Added Hive 1.1 in Standalone.
- CDAP-2953 - Added HiveServer2 support in a secure cluster.
Spark
- CDAP-344 - Users can now run Spark in distributed mode.
- CDAP-1993 - Added ability to manipulate the SparkConf.
- CDAP-2700 - Added the ability for Spark programs to discover CDAP services in distributed mode.
- CDAP-2701 - Spark programs are able to collect Metrics in distributed mode.
- CDAP-2703 - Users are able to collect/view logs from Spark programs in distributed mode.
- CDAP-2705 - Added examples, guides and documentation for Spark in distributed mode. Includes a LogAnalysis application demonstrating parallel execution of the Spark and MapReduce programs using Workflows.
- CDAP-2923 - Added support for the WorkflowToken in the Spark programs.
- CDAP-2936 - Spark programs can now specify resource usage for the driver and executor processes in distributed mode.
Workflows
- CDAP-1983 - Added example application for processing and analyzing Wikipedia data using Workflows.
- CDAP-2709 - Added ability to add generic keys to the WorkflowToken.
- CDAP-2712 - Added ability to update the WorkflowToken in MapReduce and Spark programs.
- CDAP-2713 - Added ability to persist the WorkflowToken per run of the Workflow.
- CDAP-2714 - Added ability to query the WorkflowToken for the past as well as currently running Workflow runs.
- CDAP-2752 - Added ability for custom actions to access the CDAP datasets and services.
- CDAP-2894 - Added an API to retrieve the system properties (e.g. MapReduce counters in case of MapReduce program) from the WorkflowToken.
- CDAP-2923 - Added support for the WorkflowToken in the Spark programs.
- CDAP-2982 - Added verification that the Workflow contains all programs/custom actions with a unique name.
Datasets
- CDAP-347 - Users can use datasets in beforeSubmit and afterFinish.
- CDAP-585 - Changes to Spark program runner to use File dataset in Spark. Spark programs can now use file-based datasets.
- CDAP-2734 - Added PartitionedFileSet support for setting/getting properties at the Partition level.
- CDAP-2746 - PartitionedFileSets now record the creation time of each partition in the metadata.
- CDAP-2747 - PartitionedFileSets now index the creation time of partitions to allow selection of partitions that were created after a given time. Introduced BatchPartitionConsumer as a way to incrementally consume new data in a PartitionedFileSet.
- CDAP-2752 - Added ability for custom actions to access the CDAP datasets and services.
- CDAP-2758 - FileSets now support existing HDFS locations.
  Base paths that start with “/” are now treated as absolute in the file system. Previously, an absolute base path for a (Partitioned)FileSet was interpreted as relative to the namespace’s data directory. Newly created FileSets interpret absolute base paths as absolute in the file system.
  Introduced a new property for (Partitioned)FileSets named “data.external”. If true, the base path of the FileSet is assumed to be managed by some external process. That is, the FileSet will not attempt to create the directory, it will not delete any files when the FileSet is dropped or truncated, and it will not allow adding or deleting files or partitions. In other words, the FileSet is read-only.
- CDAP-2784 - Added support to write to PartitionedFileSet Partition metadata from MapReduce.
- CDAP-2822 - IndexedTable now supports scans on the indexed field.
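The base-path and “data.external” rules described under CDAP-2758 can be modeled in a few lines. The following is only an illustrative sketch in Python, not CDAP code: the function names, the operation names, and the example namespace data directory are all hypothetical, and only the “data.external” property key and the "/" rule come from the note itself.

```python
import posixpath

# Property key as named in the release note; everything else here is illustrative.
DATA_EXTERNAL = "data.external"


def resolve_base_path(base_path, namespace_data_dir):
    """Return the file system location for a FileSet base path.

    Base paths starting with "/" are treated as absolute in the file system;
    any other path is resolved against the namespace's data directory.
    """
    if base_path.startswith("/"):
        return posixpath.normpath(base_path)
    return posixpath.normpath(posixpath.join(namespace_data_dir, base_path))


def is_operation_allowed(properties, operation):
    """Model the read-only behavior of an external FileSet.

    The operation names are hypothetical labels, not CDAP API calls.
    """
    mutating = {"create_dir", "delete_files", "add_file",
                "add_partition", "drop_partition"}
    external = str(properties.get(DATA_EXTERNAL, "false")).lower() == "true"
    return not (external and operation in mutating)
```

Under these assumptions, `resolve_base_path("/data/logs", ns_dir)` ignores the namespace directory, while `resolve_base_path("logs", ns_dir)` resolves beneath it, and a FileSet with “data.external” set to true rejects every mutating operation while still allowing reads.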
Metrics
- CDAP-2975 - Added pre-split FactTables.
- CDAP-2326 - Added better unit-test coverage for Cube dataset.
- CDAP-1853 - Metrics processor scaling no longer needs a master services restart.
- CDAP-2844 - MapReduce metrics collection no longer uses counters, and instead reports directly to Kafka.
- CDAP-2701 - Spark programs are able to collect Metrics in distributed mode.
- CDAP-2466 - Added CLI for metrics search and query.
- CDAP-2236 - New CDAP UI switched over to using newer search/query APIs.
- CDAP-1998 - Removed deprecated Context - Query param in Metrics v3 API.
Miscellaneous New Features
- CDAP-332 - Added a RESTful endpoint for deleting Streams.
- CDAP-1483 - QueueAdmin now uses Id.Namespace instead of simply String.
- CDAP-1584 - CDAP CLI now shows the username in the CLI prompt.
- CDAP-2139 - Removed a duplicate Table of Contents on the Documentation Search page.
- CDAP-2515 - Added a metrics client for search and query by tags.
- CDAP-2582 - Documented the licenses of the shipped CDAP-UI components.
- CDAP-2595 - Added data modelling of flows.
- CDAP-2596 - Added data modelling of MapReduce.
- CDAP-2617 - Added the capability to get logs for a given time range from CLI.
- CDAP-2618 - Simplified the Cube sink configurations.
- CDAP-2670 - Added Parquet sink with time partitioned file dataset.
- CDAP-2739 - Added S3 batch source for ETLbatch.
- CDAP-2802 - Stopped using HiveConf.ConfVars.defaultValue, to support Hive >0.13.
- CDAP-2847 - Added ability to add custom filters to FileBatchSource.
- CDAP-2893 - Custom Transform now parses log formats for ETL.
- CDAP-2913 - Provided installation method for EMR.
- CDAP-2915 - Added an SQS realtime plugin for ETL.
- CDAP-3022 - Added Cloudfront format option to LogParserTransform.
- CDAP-3032 - Documented TestConfiguration class usage in unit-test framework.
Cask Data Application Platform v3.0.3
Bug fixes
- Fix Bower dependency error (CDAP-3010)
Cask Data Application Platform v2.8.2
Bug fixes
- Fix Bower dependency error (CDAP-3010)