Releases · cdapio/cdap

29 Aug 19:44

prinam

v4.3.0

af8d0d4

Summary

1. Data Pipelines:
- Support for conditional execution of parts of a pipeline
- Ability for pipelines to trigger other pipelines for cross-team, cross-pipeline inter-connectivity, and to build complex interconnected pipelines.
- Improved pipeline studio with redesigned nodes, undo/redo capability, metrics
- Automated upgrade of pipelines to newer CDAP versions
- Custom icons and labels for pipeline plugins
- Operational insights into pipelines

2. Data Preparation:
- Support for User Defined Directives (UDD), so users can write their own custom directives for cleansing/preparing data.
- Restricting Directive Usage and ability to alias Directives for your IT Administrators to control directive access

3. Governance & Security:
- Standardized authorization model
- Apache Ranger Integration for authorization of CDAP entities

4. Enhanced support for Apache Spark:
- PySpark Support so data scientists can develop their Spark logic in Python, while still taking advantage of enterprise integration capabilities of CDAP
- Spark Dataframe Support so Spark developers can access CDAP datasets as Spark DataFrames

5. New Frameworks and Tools:
- Microservices for real-time IoT use cases.
- Distributed Rules Engine - for Business Analysts to effectively manage rules for data transformation and data policy

New Features

Data Pipelines Enhancements

Added a new splitter transform plugin type that can send output to different ports. Also added a union splitter transform that will send records to different ports depending on which type in the union it is and a splitter transform that splits records based on whether the specified field is null. (CDAP-12033)
Added a way for pipeline plugins to emit alerts, and a new AlertPublisher plugin type that publishes those alerts. Added a plugin that publishes alerts to CDAP TMS and an Apache Kafka Alert Publisher plugin to publish alerts to a Kafka topic. (CDAP-12034)
Batch data pipelines now support condition plugin types, which can control the flow of execution of the pipeline. Condition plugins have access to stage statistics, such as the number of input records, output records, and error records generated by the stages that executed prior to the condition node. Also implemented an Apache Commons JEXL-based condition plugin, which is available by default for batch data pipelines. (CDAP-12108)
Plugin prepareRun and onFinish methods now run in a separate transaction per plugin so that pipelines with many plugins will not time out. (CDAP-12167)
All pipeline plugins now have access to the pipeline namespace and name through their context object. (CDAP-12191)
Added a feature that allows undoing and redoing of actions in pipeline Studio. (CDAP-9107)
Made pipeline nodes bigger to show the version and metrics on the node. (CDAP-12057)
Revamped pipeline connections, to allow dropping a connection anywhere on the node, and allow selecting and deleting multiple connections using the Delete key. (CDAP-12077)
Added an automated UI flow for users to upgrade pipelines to newer CDAP versions. (CDAP-10619)
Added visualization for pipelines in the UI. This helps visualize runs, logs/warnings, and the data flowing through each node for each run of the pipeline. (CDAP-11889)
Added support for plugins of plugins. This allows the parent plugin to expose some APIs that its own plugins will implement and extend. (CDAP-12111)
Added ability to support custom label and custom icons for pipeline plugins. (CDAP-12114)
BatchSource, BatchSink, BatchAggregator, BatchJoiner, and Transform plugins now have a way to get SettableArguments when preparing a run, which allows them to set arguments for the rest of the pipeline. (CDAP-10974)
Runtime arguments are now available to the script plugins such as Javascript and Python via the Context object. (CDAP-10653)
Added a method to PluginContext that will return macro evaluated plugin properties. (CDAP-12472)
Enhanced add field transform plugin to add multiple fields. (CDAP-12094)
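
The null-field splitter described in this section (CDAP-12033) can be pictured with a small sketch. This is not the plugin's code or the CDAP plugin API. It only illustrates the idea of routing each record to one of two output ports, and the port names "null" and "nonnull" are assumptions made for the example.

```python
from collections import defaultdict


def split_on_null(records, field, null_port="null", non_null_port="nonnull"):
    """Route each record to a port depending on whether `field` is missing or None.

    The port names are hypothetical; they are not taken from the CDAP plugin.
    """
    ports = defaultdict(list)
    for record in records:
        port = null_port if record.get(field) is None else non_null_port
        ports[port].append(record)
    return dict(ports)


records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3},  # field absent, treated like null in this sketch
]
routed = split_on_null(records, "email")
```

In a pipeline, each port would feed a different downstream stage, for example a sink for complete records and an error handler for the rest.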

Triggers

Added capabilities to trigger programs and data pipelines based on status of other programs and data pipelines. (CDAP-11912)
Added the capability to use plugin properties and runtime arguments from the triggering data pipeline as runtime arguments in the triggered data pipeline. (CDAP-12382)
Added composite AND and OR trigger. (CDAP-12232)
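
As a rough illustration of how composite AND and OR triggers combine program-status triggers, the sketch below evaluates a small trigger tree against a map of program statuses. The class names, the status string "COMPLETED", and the pipeline names are all hypothetical. This is not the CDAP trigger API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class StatusTrigger:
    """Satisfied when the named program has reached the given status."""
    program: str
    status: str

    def is_satisfied(self, statuses: Dict[str, str]) -> bool:
        return statuses.get(self.program) == self.status


@dataclass(frozen=True)
class AndTrigger:
    """Satisfied only when every child trigger is satisfied."""
    triggers: Tuple

    def is_satisfied(self, statuses: Dict[str, str]) -> bool:
        return all(t.is_satisfied(statuses) for t in self.triggers)


@dataclass(frozen=True)
class OrTrigger:
    """Satisfied when at least one child trigger is satisfied."""
    triggers: Tuple

    def is_satisfied(self, statuses: Dict[str, str]) -> bool:
        return any(t.is_satisfied(statuses) for t in self.triggers)


# Run the downstream pipeline once "ingest" has completed AND
# either of the two cleansing pipelines has completed.
trigger = AndTrigger((
    StatusTrigger("ingest", "COMPLETED"),
    OrTrigger((
        StatusTrigger("cleanse_a", "COMPLETED"),
        StatusTrigger("cleanse_b", "COMPLETED"),
    )),
))
```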

Data Preparation Enhancements

Added the ability for users to connect Data Preparation to their existing data in Apache Kafka. (CDAP-11618)
Added point and click interaction for performing various calculations on data in Data Prep. (CDAP-12092)
Added point and click interaction for applying custom transformations in Data Prep. (CDAP-12118)
Added point and click interaction to mask column data. (CDAP-9530)
Added point and click interaction to encode/decode column data. (CDAP-9532)
Added point and click interaction to parse Avro and Excel files. (CDAP-11869)
Added point and click interaction for replacing column names in bulk. (CDAP-11977)
Added point and click interaction for defining and incrementing variables. (CDAP-12091)

Spark Enhancements

Added capabilities to run PySpark programs in CDAP. (CDAP-4871)

Governance and Security Enhancements

Implemented the new authorization model for CDAP. The old authorization model is no longer supported. (CDAP-12134)
Added a new configuration, security.authorization.extension.jar.path, in cdap-site.xml, which can be used to add extra classpath entries that are available to CDAP security extensions. (CDAP-12317)
Removed automatic grant/revoke privileges on CDAP entity creation/deletion. (CDAP-12100)
Added support for authorization on Kerberos principal for impersonation. (CDAP-12367)
Modified the authorization model so that read/write on an entity will not depend on its parent. (CDAP-11839)
Deprecated createFilter() and added a new isVisible API in AuthorizationEnforcer. Deprecated the grant/revoke APIs for EntityId and added new ones for Authorizable, which support wildcard privileges. (CDAP-12135)
Removed version for artifacts for authorization policy to be consistent with applications. From 4.3 onwards CDAP does not support policies on artifact/application version. (CDAP-12283)

Other New Features

Added a wizard to allow configuring and deploying microservices in UI. (CDAP-11940)
Enabled GC logging for CDAP services. (CDAP-6329)
Added support for HDInsight 3.6. (CDAP-11448)
CSD now performs a version compatibility check with the active CDAP Parcel. (CDAP-4874)
Added live migration of metrics tables from pre 4.3 tables to 4.3 salted metrics tables. (CDAP-12348)
Added capability to salt the row key of the metrics tables so that writes are evenly distributed and there is no region hot spotting. (CDAP-12017)
Added a REST API to check the status of metrics processor. We can view the topic level processing stats using this endpoint. (CDAP-12068)
Added option to disable/enable metrics for a program through runtime arguments or preferences. This feature can also be used system wide by enabling/disabling metrics in cdap-site.xml. (CDAP-12070)
Added a global "CDAP" config to enable/disable metrics emission from user programs. By default, metrics are enabled. (CDAP-12290)
DatasetOutputCommitter's methods are now executed in the MapReduce ApplicationMaster, within OutputCommitter's commitJob/abortJob methods. The MapReduceContext.addOutput(Output.of(String, OutputFormatProvider)) API can no longer be used to add OutputFormatProviders that also implement the DatasetOutputCommi...
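
The row-key salting mentioned in this release (CDAP-12017) is a general technique for avoiding region hot spots. A minimal sketch follows, assuming a one-byte salt derived from a hash of the key. The function name, the choice of MD5, and the bucket count are assumptions for illustration and are not taken from CDAP's metrics table implementation.

```python
import hashlib


def salt_row_key(row_key: bytes, buckets: int = 16) -> bytes:
    """Prefix the key with a one-byte bucket id derived from a hash of the key.

    Keys that sort next to each other (for example, consecutive timestamps)
    land in different key ranges, so writes are spread across regions.
    `buckets` must be at most 256 so that the salt fits in a single byte.
    """
    if not 1 <= buckets <= 256:
        raise ValueError("buckets must be between 1 and 256")
    digest = hashlib.md5(row_key).digest()
    salt = digest[0] % buckets
    return bytes([salt]) + row_key


# Sequential keys that would otherwise all hit the same region.
keys = [f"metric.{i:08d}".encode() for i in range(1000)]
salted = [salt_row_key(k) for k in keys]
used_buckets = {s[0] for s in salted}
```

The trade-off is on the read side: a range scan over the original key order has to issue one scan per bucket and merge the results.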


14 Jul 02:50

sreevatsanraman

v4.1.2

0ba400d

Cask Data Application Platform - 4.1.2

Improvements

Reuse network connections for TMS client. (CDAP-12020)
Added a way to limit the frequency of retrieving the MapReduce task report, which could cause network load for very large jobs. (CDAP-11959)
Added the ability to configure the HBase client scanner cache for a dataset. (CDAP-11949)
Added a startup check for CDAP master to error out if the configurations for HBaseDDLExecutor extensions are provided but the extension jar cannot be loaded. (CDAP-11594)
Upgraded the IntelliJ IDEA IDE in the CDAP SDK VM to the 2017.1.3 release. (CDAP-11444)
Upgraded Eclipse IDE in CDAP SDK VM to Neon 3 release. (CDAP-11398)
Added the ability to denormalize data into individual records, by splitting on delimiter text or by array flattening, in the Dataprep UI as a point and click directive. (CDAP-9515)
Added the ability to apply some DataPrep directives on multiple columns, starting with Join columns and Swap columns. Multiple columns can be selected by checking the checkbox next to each column's name, then selecting a directive in the directive dropdown. (CDAP-9514)
Added the ability to format data (date-time, string formatting, etc.) in the Dataprep UI as a point and click directive. (CDAP-9507)
Added the ability to extract text using regex patterns in the Dataprep UI as a point and click directive. (CDAP-9523)
Added feature where macro arguments are also listed in the runtime arguments of preview mode, just like when running a new pipeline. (CDAP-9096)
Added feature where values of macro arguments are automatically populated and shown in the UI when running a pipeline, if those values exist as Preferences. (CDAP-9094)
Enabled GC logging for CDAP services. (CDAP-6329)

Bug Fixes

Fixed a bug where the UGI provider returned old and incorrect UGI information. (CDAP-11985)
Fixed a bug where the wrong user was sometimes used in Explore, which caused namespace deletion to fail. (CDAP-11955)
Fixed a bug where committed data could be removed during HBase table flush or compaction. (CDAP-11948)
Fixed an issue where a failed MapReduce run was marked as successful. (CDAP-11937)
Fixed a bug where Hydrator pipelines and other programs did not create datasets at runtime with the correct impersonated user. (CDAP-11880)
Fixed impersonation when upgrading datasets in UpgradeTool. (CDAP-11815)
Fixed an issue with retrieving workflow state if it contains an exception without a message. (CDAP-11795)
HBaseDDLExecutor implementation is now localized to the containers without adding it in the container classpath. (CDAP-11783)
Fixed delete button on action plugins to allow users to delete easily. (CDAP-10488)
Fixed a bug where an impersonated workflow did not create local datasets with the correct impersonated user. (CDAP-9456)
Fixed an issue in Explore preview where the UI was not displaying boolean values correctly. (CDAP-8963)
Fixed an issue where Workflow driver was getting restarted when it runs out of memory, causing the Workflow to be executed from start node again. (CDAP-5067)


07 Jun 08:19

sreevatsanraman

v4.2.0

e1875a7

Cask Data Application Platform v4.2.0

Summary

Spark Enhancements: Added support for Apache Spark 2.x. Users have an option to configure CDAP to use Spark 1.x or Spark 2.x on their cluster. Also added capability to run interactive Spark code within CDAP.
Enhanced Data Preparation: Added capabilities in data preparation to connect to the File System (Local and HDFS) and relational databases, browse and select their existing data, and import into Data Preparation for cleansing, preparing and transforming.
Event Driven Schedules: Added capabilities to start CDAP programs based on the availability of data partitions in HDFS and to impose run constraints to intelligently orchestrate CDAP Workflows.

New Features

Spark Enhancements

Added support for Spark 2.x. In environments where multiple Spark versions exist, CDAP must be configured to use one or the other (CDAP-7875)
Enable capabilities to run interactive Spark code within CDAP (CDAP-11409)
Added capabilities to run arbitrary Spark code in CDAP Pipelines (CDAP-11410)
Enhancements to speed up launching Spark programs (CDAP-11411)

Enhanced Data Preparation

Adds File System Browser Component to browse Local and HDFS File System from Data Preparation (CDAP-9290)
Adds Data Quality information to Data Preparation table. Currently, it shows the completeness of each column (CDAP-9517)
Added point-and-click interactions for applying directives such as parsing, splitting, find and replace, filling null or empty rows, copying and deleting columns in Data Preparation. They can be invoked by using the dropdown menu for each column (CDAP-9524)
Added point-and-click interaction for cleansing column names (CDAP-11333)
Added a point-and-click interaction to set all column names in Data Preparation (CDAP-11334)
Added the ability to ingest data one-time from Data Preparation to a CDAP Dataset (CDAP-11424)
Added macro support for Data Preparation directives (CDAP-9556)

Event Driven Schedules

Introduces a new, event-driven scheduling system that can start programs based on data availability in HDFS partitions (CDAP-7593)
Allow users to configure constraints for schedules, such as duration since last run and allowed time range for program execution (CDAP-11338)
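
The two schedule constraints named above (duration since last run, and an allowed time range) can be sketched as a simple check. This is an illustration of the idea only. The function and parameter names are hypothetical and do not reflect the CDAP scheduling API, and the time window here is assumed not to wrap past midnight.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def can_run(now: datetime,
            last_run: Optional[datetime],
            min_gap: timedelta,
            window_start: time,
            window_end: time) -> bool:
    """Return True only if both constraints allow the program to start.

    1. At least `min_gap` has passed since the last run (if there was one).
    2. The current time of day falls inside [window_start, window_end].
    """
    if last_run is not None and now - last_run < min_gap:
        return False
    return window_start <= now.time() <= window_end


# New partition data arrived at 10:00; the last run was at 09:30.
allowed = can_run(
    now=datetime(2017, 6, 7, 10, 0),
    last_run=datetime(2017, 6, 7, 9, 30),
    min_gap=timedelta(hours=1),
    window_start=time(8, 0),
    window_end=time(18, 0),
)
```

Here `allowed` is False because only 30 minutes have passed since the previous run, even though the data-availability event has fired.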

Other New Features

Added capability for CDAP Services to dynamically list available artifacts and dynamically load artifacts (CDAP-11498)
Added support for EMR 5.0 - 5.3 (CDAP-7873)
Added the ability for Data Preparation to handle byte arrays of data for processing binary data (CDAP-11486)
Added an API to Spark Streaming sources to provide number of streams being used by a streaming source (CDAP-11422)
Users can now upload, view, and use plugins of type 'sparksink' in Studio. (CDAP-11681)
Modified the log viewer to only show ERROR, WARN, and INFO levels of logs by default, instead of all logs as previously (CDAP-8668)

Bug fixes

Fixed a bug where the log level was always set to INFO at the root logger (CDAP-8289)
Fixed a bug where extra characters after an artifact version range were being ignored instead of being recognized as invalid (CDAP-7727)
Fixed a bug where users could not read from real Datasets while previewing CDAP Pipelines (CDAP-7884)
Fixed a bug that prevented users from adding extra classpath to Apache Spark drivers and executors (CDAP-9422)
Fixed a bug where impersonated workflow was not creating local datasets with the correct impersonated user (CDAP-9456)
Fixed a bug in Parquet and Avro File sinks that would cause them to fail if they received ByteBuffers instead of byte arrays. (CDAP-11417)
Fixed a bug where writes could only succeed in one MongoDB sink even when multiple MongoDB sinks were present in a pipeline (CDAP-11558)
Fixed a thread leakage bug in Spark (SPARK-20935) after Spark Streaming program completed (CDAP-11577)
Fixed a bug in TMS where fetching from the payload table raised an exception if the fetch had an empty result (CDAP-11588)
Fixed a bug in the Purchase example that could cause purchases to overwrite each other (CDAP-11643)
Fixed a bug that prevented the use of logback.xml in Apache Spark Streaming programs. (CDAP-11651)
Fixed an issue where pipeline metrics were not showing up in pipelines with a large number of nodes (CDAP-9284)
Fixed an issue with retrieving workflow state if it contained an exception without a message (CDAP-11795)
Fixed an issue with the CDAP Ambari service definition where the "cdap" headless user was not unique to the cluster (CDAP-11445)
Fixed the CDAP Upgrade tool to not fail when encountering a non-CDAP table that follows the CDAP naming convention (CDAP-4887)
Fixed an issue where the driver process of a CDAP Workflow was getting restarted when it ran out of memory, causing the Workflow to be executed again from the start node (CDAP-5067)
Fixed an issue with the detection of Apache Spark on HDP 2.5 and above, which caused excess noise on the console (CDAP-7429)
Fixed an issue with the YARN container allocation logic so that the correct container size is used. (CDAP-8888)
Fixed the stream container to terminate cleanly and cleaned up the CDAP Master's Apache Twill JAR files after master shutdown (CDAP-8911)
Fixed an issue where redeployment of an application with a deleted schedule would fail (CDAP-8918)
Fixed warnings about /opt/cdap/master/artifacts not being a directory in unit tests (CDAP-8961)
Fixed an issue where CDAP entity roles were not cleaned up when the entity was deleted (CDAP-9026)
Fixed an issue where cdap-security.xml was not written under Ambari unless security.enabled in cdap-site.xml was set to true (CDAP-9378)
Fixed the Azure Blob Store source to work with Avro and Parquet formats (CDAP-10475)
Fixed the Azure Blob Store source to work with CDAP FileSets (CDAP-11384)
Fixed the "value is" filter in the Data Preparation UI (CDAP-11557)
Fixed impersonation while upgrading datasets in the Upgrade tool (CDAP-11815)

Deprecations

Added property "metrics.processor.queue.size" with default value 20000 to limit the maximum size of the queue where the metrics processor temporarily stores newly fetched metrics in memory before persisting them. Added property "metrics.processor.max.delay.ms" with default value 3000 milliseconds to specify the maximum delay allowed between the latest metrics timestamp and the time when it is processed. The larger this value, the more often the Metrics Processor can sleep between fetching batches of metrics, but the delay between metrics emission and processing also increases. Deprecated the property "metrics.messaging.fetcher.limit". (CDAP-8327)


17 Apr 04:03

prinam

v4.1.1

c78aaf9

Cask Data Application Platform 4.1.1

Summary

Data Preparation: Point-and-click interactions and integration with the rest of CDAP including, but not limited to, namespaces, security, and pipelines.
Upgrade: Significant reduction in downtime during CDAP upgrades, by removing some data migration and doing required migration in the background after CDAP starts up.
Pipeline Previews: Added logs, better error messaging, ability to read from existing datasets, and a better stop experience.
Logs: Added a condensed view of logs for CDAP pipelines and programs that does not include logs emitted by the CDAP platform and libraries. The condensed view only contains lifecycle logs, logs emitted by the program or pipeline, and errors.
Schedules: Added the ability to update schedules without redeploying the application.

New Features

Data Preparation
................................

Users can now interact with and manage multiple workspaces in Data Preparation. (CDAP-9235)
Added point-and-click interactions for applying directives such as parsing, splitting, find and replace, filling null or empty rows, copying and deleting columns in Data Preparation. They can be invoked by using the dropdown menu for each column. (WRANGLER-77)

Logs
................................

Added option to the log viewer to only show "user" condensed logs. (CDAP-9117)
Logs for previews of CDAP pipelines are now available in the CDAP UI via the Logs button in Preview mode. (HYDRATOR-1316)

Schedules
................................

Added support for adding, deleting, updating, and retrieving workflow schedules. (CDAP-8902)

Other New Features
................................

Upgraded Apache Tephra dependency to the 0.11.0-incubating version. (CDAP-8872)
Users can now deploy CDAP pipelines with a single action plugin. This feature can be used to run external Apache Spark programs as CDAP pipelines. (CDAP-9141)

Added a sparkprogram plugin type that can be used to run arbitrary Spark code at the beginning or end of a pipeline. An external Spark program can be added by clicking the "plus" ("+") button in the CDAP UI, choosing Library, and specifying sparkprogram as the type. It is then available as an Action plugin in the CDAP Studio.
Added support for HDP 2.6. (CDAP-9250)
Added support for CDH 5.11.0. (CDAP-9281)
Added support that allows plugin developers to integrate with CDAP services by exposing CDAP service discovery capabilities in the plugin context. (CDAP-9311)

Improvements

Upgrade
................................

Added the running of HBase coprocessor upgrades concurrently on CDAP Datasets. (CDAP-9278)
Improved the CDAP upgrade process to minimize the downtime needed to upgrade, by performing data migration in the background. (CDAP-9282)

Pipeline Previews
................................

Simplified the status, next runtime of pipelines, total number of running pipelines, and drafts in the pipeline list view UI. (CDAP-9017)

Schedules
................................

Allow administrators to enable or disable updating schedules using the property "app.deploy.update.schedules" in cdap-site.xml. Users can override this to enable or disable updating schedules during deployment of an application using the same property specified in the configuration of the application. (CDAP-8942)
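
The precedence described above, where an application-level setting overrides the cdap-site.xml value of "app.deploy.update.schedules", can be sketched as a small lookup. The helper names and the fallback `default` are assumptions for illustration. The release notes do not state what the built-in default is.

```python
UPDATE_SCHEDULES_KEY = "app.deploy.update.schedules"


def _as_bool(value) -> bool:
    """Interpret a config value such as "true" or "false" as a boolean."""
    return str(value).strip().lower() == "true"


def should_update_schedules(site_config: dict, app_config: dict,
                            default: bool = True) -> bool:
    """Resolve the effective setting for one application deployment.

    The application's own configuration wins. Otherwise the cluster-wide
    value from cdap-site.xml applies. `default` is a hypothetical fallback
    used only when neither source sets the property.
    """
    if UPDATE_SCHEDULES_KEY in app_config:
        return _as_bool(app_config[UPDATE_SCHEDULES_KEY])
    if UPDATE_SCHEDULES_KEY in site_config:
        return _as_bool(site_config[UPDATE_SCHEDULES_KEY])
    return default


# The administrator disables schedule updates cluster-wide,
# but one application opts back in through its own configuration.
site = {UPDATE_SCHEDULES_KEY: "false"}
effective_for_app = should_update_schedules(site, {UPDATE_SCHEDULES_KEY: "true"})
effective_for_others = should_update_schedules(site, {})
```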

Other Improvements
................................

Added fetch size and transaction flush interval configurations to the Kafka Consumer Flowlet. (CDAP-7731)
Users can now see a contextual message with appropriate call(s) to action when no entities are found on the Overview page. (CDAP-8430)
Added new configurations to control the YARN application master container memory size, maximum heap memory size, and maximum non-heap memory size: twill.java.heap.memory.ratio, twill.yarn.am.memory.mb, and twill.yarn.am.reserved.memory.mb. (CDAP-8990)
Increased the default memory allocation for the CDAP Explore service container to 2048MB. (CDAP-9003)
Users can now grant and revoke privileges for UNIX groups and users when using Apache Sentry as the authorization extension for CDAP. (CDAP-9027)
Added a "cdap apply-pack [pack]" command to the "cdap" script that allows for upgrading of individual CDAP components. (CDAP-9077)

Bug Fixes

Upgrade
................................

Fixed an issue with the pipeline upgrade tool that caused it to skip CDAP 4.0.x pipelines. (CDAP-9185)

Pipeline Previews
................................

Fixed a bug where preview could not read from datasets in real space. (CDAP-7884)
When previewing a pipeline in the CDAP Studio, disabled all writes to sinks. Incoming data to sinks can be viewed in the preview tab of the sink, but is not written to the sink. (CDAP-8013)
Fixed an issue where preview of CDAP pipelines did not show data for successful stages if a particular stage failed. (CDAP-9333)

Logs
................................

Fixed a problem that caused duplicate logs to show up for a running pipeline. (CDAP-7138)
Fixed bug where the "Total Messages/Errors/Warnings" at the top of logviewer was showing incorrect values. (CDAP-9248)

Schedules
................................

Fixed an issue where redeployment of an application with a deleted schedule would fail. (CDAP-8918)

Other Bug Fixes
................................

Removed the requirement of being an admin to run the CDAP startup script for Windows. (CDAP-4213)
Made Plugin Endpoint invocation more robust. If a plugin's parent cannot instantiate the plugin needed for the invocation, CDAP will try the plugin's other parents before returning an error. (CDAP-5715)
Fixed an issue with namespace deletion which caused CDAP Application test cases to fail in a Windows environment. (CDAP-6348)
Fixed an issue where a few metrics were lost when a container was shut down. (CDAP-8862)
Fixed an issue with the YARN container allocation logic so that the correct container size is used. (CDAP-8888)
Improved the serializability of Tables and IndexedTables when used in Spark programs. (CDAP-8913)
Moved the "add plugin" behavior from a plugin's left panel to an "Add Entity" button in the CDAP Studio UI. (CDAP-8945)
Fixed an issue in the CDAP UI where navigating from a stream card to an overview and then to a detail page made the detail page show a spinner icon indefinitely. (CDAP-8950)
Fixed an issue with the Spark program runtime so that the Kryo serializer can be used. (CDAP-8980)
Fixed an issue where the HBase Queue Debugging Tool failed when authorization was enabled. (CDAP-9005)
Fixed an issue where users could not grant and revoke privileges for UNIX groups and users when using Apache Sentry as the authorization extension for CDAP. (CDAP-9029)
Fixed an issue where revoking privileges from a role caused the privilege to be revoked from all roles. (CDAP-9046)
Fixed an issue with the Window plugin so that it propagates schema properly. (CDAP-9086)
Fixed the Overview panel in home page of the CDAP UI to handle unknown entities appropriately. (CDAP-9087)
Added the retrying of local dataset operations when a failure happens. (CDAP-9114)
Fixed an issue with the binary format in the Kafka streaming source that prevented pipeline deployment. (CDAP-9142)
Fixed an issue that caused YARN containers to be killed due to excessive memory usage when impersonation is enabled. (CDAP-9160)
Fixed bug where navigation links were referencing default namespace instead of the current namespace. (CDAP-9216)
Improved error messages for the 'Get S...


10 Apr 23:10

prinam

v3.5.5

4a49268

Cask Data Application Platform 3.5.5

New Features

Authentication server announce address is now configurable with the property security.auth.server.announce.urls, which are comma-separated URLs in the form of protocol://host:port. The property security.auth.server.announce.address is now deprecated. It is only used if it is set but security.auth.server.announce.urls has not been set. (CDAP-4535)
security.auth.server.announce.address now takes a single address in the form of either host:port or host.
A default URL will be generated by the Authentication Server if either property is not set.
New configurations have been added to control the YARN application master container memory size, maximum heap memory size, and maximum non-heap memory size: twill.java.heap.memory.ratio, twill.yarn.am.memory.mb, and twill.yarn.am.reserved.memory.mb. (CDAP-8990)
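
The lookup order described above for the announce address (CDAP-4535) can be sketched as follows: use security.auth.server.announce.urls if set, fall back to the deprecated security.auth.server.announce.address, and otherwise use a generated default. This is an illustration only. The function name, the `default_url` argument, and the `scheme` and `default_port` used to expand the deprecated address are all assumptions, since the notes do not say how the Authentication Server builds those values.

```python
from urllib.parse import urlsplit

URLS_KEY = "security.auth.server.announce.urls"
ADDRESS_KEY = "security.auth.server.announce.address"  # deprecated


def resolve_announce_urls(conf: dict, default_url: str,
                          scheme: str = "http", default_port: int = 10009):
    """Return the list of announce URLs implied by the configuration.

    `scheme` and `default_port` are hypothetical values used only to turn
    the deprecated host or host:port form into a URL for this sketch.
    """
    urls = conf.get(URLS_KEY)
    if urls:
        resolved = []
        for raw in urls.split(","):
            parts = urlsplit(raw.strip())
            # Each entry must have the form protocol://host:port.
            if not parts.scheme or not parts.hostname or parts.port is None:
                raise ValueError(f"expected protocol://host:port, got {raw!r}")
            resolved.append(f"{parts.scheme}://{parts.hostname}:{parts.port}")
        return resolved

    address = conf.get(ADDRESS_KEY)
    if address:
        host, _, port = address.strip().partition(":")
        return [f"{scheme}://{host}:{int(port) if port else default_port}"]

    return [default_url]


conf = {
    URLS_KEY: "https://a.example.com:10009, http://b.example.com:10010",
    ADDRESS_KEY: "ignored.example.com",  # not used because the URLs are set
}
announce = resolve_announce_urls(conf, default_url="http://localhost:10009")
```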

Improvements

LogHandler endpoints now return a 404 status code if the entity (the run id) for which logs are requested does not exist. (CDAP-9084)

Bug Fixes

Fixed an issue where HBaseQueueDebugger failed when authorization was enabled. (CDAP-9005)
Fixed a memory leak issue with the Hadoop FileSystem object. (CDAP-9160)
Fixed an out-of-memory issue for the log saver by adding a limit on the maximum number of events in-memory. (CDAP-9085)
Fixed an issue with uncaught exceptions so that they are logged through the logger, allowing log collections for those exceptions. (CDAP-8997)

Deprecated and Removed Features

The property security.auth.server.announce.address is now deprecated. (CDAP-4535)


03 Mar 02:43

prinam

v3.5.4

452e308

Cask Data Application Platform 3.5.4

New Features

Added fetch size and transaction flush interval configurations to the Kafka Consumer Flowlet. (CDAP-7731)
Fixed an issue to make artifact, datasets, logs, and coprocessor JAR locations resilient to an HDFS Namenode HA upgrade. (CDAP-8343)

Improvements

Reduced non-informative stacktrace information in the log when a connection to the CDAP Router is closed prematurely. (CDAP-8250)
Improved the master process stop procedure to support fast failover when running with HA. Added a new kill command to force-kill CDAP processes. (CDAP-8565)

Bug Fixes

Fixed an issue where DefaultNamespaceEnsurer sometimes prevented CDAP Master shutdown. (CDAP-7090)
Fixed an issue with CDAP Standalone starting in a Microsoft Windows environment. (CDAP-7829)
Fixed the CDAP UpgradeTool to not rely on the existence of a 'default' namespace. (CDAP-8229)
Added back the CDAP UI health-check end point to determine the status of the CDAP UI service. (CDAP-8260)
Fixed an issue where a major compaction was not evicting invalid queue entries. (CDAP-8798)
Fixed an issue with transactions started after a snapshot restore having an incorrect invalid transaction list. (CDAP-8855)


27 Feb 07:15

prinam

v4.1.0

be2d596

Cask Data Application Platform 4.1.0

New Features

Secure Impersonation

Added support for fine-grained impersonation at the CDAP application, dataset, and stream level. (CDAP-8110)
Impersonated namespaces can be configured to disallow the impersonation of the namespace owner when running CDAP Explore queries. (CDAP-8355)

Replication and Resiliency

Provided SPI hooks that users can implement for performing HBase DDL operations. (CDAP-7685)
Added a tool to check a cluster's replication status. (CDAP-8025)
CDAP context methods will now be retried according to a program's retry policy. These are governed by these properties: (CDAP-8032)
- custom.action.retry.policy.base.delay.ms
- custom.action.retry.policy.max.delay.ms
- custom.action.retry.policy.max.retries
- custom.action.retry.policy.max.time.secs
- custom.action.retry.policy.type
- flow.retry.policy.base.delay.ms
- flow.retry.policy.max.delay.ms
- flow.retry.policy.max.retries
- flow.retry.policy.max.time.secs
- flow.retry.policy.type
- mapreduce.retry.policy.base.delay.ms
- mapreduce.retry.policy.max.delay.ms
- mapreduce.retry.policy.max.retries
- mapreduce.retry.policy.max.time.secs
- mapreduce.retry.policy.type
- service.retry.policy.base.delay.ms
- service.retry.policy.max.delay.ms
- service.retry.policy.max.retries
- service.retry.policy.max.time.secs
- service.retry.policy.type
- spark.retry.policy.base.delay.ms
- spark.retry.policy.max.delay.ms
- spark.retry.policy.max.retries
- spark.retry.policy.max.time.secs
- spark.retry.policy.type
- system.log.process.retry.policy.base.delay.ms
- system.log.process.retry.policy.max.retries
- system.log.process.retry.policy.max.time.secs
- system.log.process.retry.policy.type
- system.metrics.retry.policy.base.delay.ms
- system.metrics.retry.policy.max.retries
- system.metrics.retry.policy.max.time.secs
- system.metrics.retry.policy.type
- worker.retry.policy.base.delay.ms
- worker.retry.policy.max.delay.ms
- worker.retry.policy.max.retries
- worker.retry.policy.max.time.secs
- worker.retry.policy.type
- workflow.retry.policy.base.delay.ms
- workflow.retry.policy.max.delay.ms
- workflow.retry.policy.max.retries
- workflow.retry.policy.max.time.secs
- workflow.retry.policy.type
Added a master.manage.hbase.coprocessors setting that can be set to false on clusters where the CDAP coprocessors are deployed on every HBase node. (CDAP-8037)
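
The retry policy properties listed above (base delay, max delay, max retries, max time) suggest a bounded retry loop. The sketch below shows one plausible reading, exponential backoff capped at the maximum delay. The doubling behaviour, the function name, and the way the parameters interact are assumptions for illustration. The notes do not specify the semantics of each policy type.

```python
import time


def call_with_retries(operation, base_delay_ms, max_delay_ms, max_retries,
                      max_time_secs, sleep=time.sleep, clock=time.monotonic):
    """Call `operation`, retrying on failure with capped exponential backoff.

    Parameters mirror the *.retry.policy.* property suffixes:
    base.delay.ms, max.delay.ms, max.retries, max.time.secs.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + max_time_secs
    delay_ms = base_delay_ms
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            attempt += 1
            # Give up once the retry budget or the time budget is exhausted.
            if attempt > max_retries or clock() >= deadline:
                raise
            sleep(min(delay_ms, max_delay_ms) / 1000.0)
            delay_ms *= 2


# Example: an operation that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


sleeps = []
result = call_with_retries(flaky, base_delay_ms=100, max_delay_ms=1000,
                           max_retries=5, max_time_secs=60,
                           sleep=sleeps.append)
```

Injecting `sleep` keeps the example fast: the recorded delays grow from 100 ms to 200 ms without the code actually waiting.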

Enhancements to the New CDAP UI

Added the management of preferences at the application and program levels. (CDAP-8021)

The CDAP UI added dataset and stream detail and overviews. (CDAP-8217)
The CDAP UI added a "call-to-action" dialog after entity creation, so users can easily perform actions on the newly-created entities. (CDAP-8203)
Users can now view events and logs of programs in the new CDAP UI using the events and log view "fast-action" dialogs. (CDAP-8282, CDAP-8376)
Users now see on the CDAP UI homepage a "Just Added" section, listing and highlighting any entities added in the last five minutes. (CDAP-8398)
The CDAP UI added a duration timer to CDAP pipelines. (HYDRATOR-208)

Logs

Added a prototype implementation for a rolling HDFS log appender. (CDAP-7676, CDAP-9999)
Program context information, including namespace, program name, and program type, is now available in the MDC property of each ILoggingEvent emitted from a program container. (CDAP-7962)
Revised the CDAP Log Appender to use Logback's Appender interface. (CDAP-8108)
The log file cleaner thread will remove metadata and, for successfully deleted metadata entries, it will delete the corresponding log files. The log file cleaner thread will only remove the metadata entries for the old (pre-4.1.0) log format. (CDAP-8231)
Logs collected by the CDAP Log Appender will be stored at a common <cdap>/logs path, owned by the cdap user. For security, it is readable only by the cdap user. (CDAP-8261)
Added additional metrics about the status of the log framework: log.process.min.delay and log.process.max.delay. (CDAP-8428)

New CDAP Pipeline Plugins

The Kinesis Spark Streaming source plugin is available in its own repository at github.com/hydrator/kinesis-spark-streaming-source. (HYDRATOR-235)
Added a plugin for sampling data from a source, available at github.com/hydrator/sampling-aggregator. (HYDRATOR-552)
The HTTP Sink plugin (for posting data from a pipeline to an external endpoint) has been added at github.com/hydrator/http-sink. (HYDRATOR-585)
The Kinesis Source plugin now works in realtime pipelines. (HYDRATOR-954)
Added a Feature Generator plugin for a pipeline builder. (HYDRATOR-983)
Added a DynamoDB Sink plugin, available at github.com/hydrator/dynamodb-sink. (HYDRATOR-1049)
Added a DynamoDB Batch Source plugin, available at github.com/hydrator/dynamodb-source. (HYDRATOR-1050)
Added a "Fail This Pipeline" sink plugin in a repo at github.com/hydrator/failpipeline-sink; this is a sink where, if any records flow to the sink, the pipeline is marked as failed, triggering any post-actions that might be scheduled. (HYDRATOR-1073)
Added a plugin for fetching data from an external HTTP site and writing the response to HDFS, available at github.com/hydrator/httptohdfs-action. (HYDRATOR-1074)
Added a Realtime Stream Source plugin, available at github.com/hydrator/realtime-stream-source. (HYDRATOR-1172)
The Tokenizer plugin is now available in its own repository at github.com/hydrator/tokenizer-analytics. (HYDRATOR-1249)
The NGramTransform plugin is now available in its own repository at github.com/hydrator/ngram-analytics. (HYDRATOR-1250)
The DecisionTree Regression plugins are now available in their own repository at github.com/hydrator/decision-tree-analytics. (HYDRATOR-1251)
The SkipGram Feature Generator plugin is now available in its own repository at github.com/hydrator/skipgram-analytics. (HYDRATOR-1252)
The Naive Bayes Analytics plugin is now available in its own repository at github.com/hydrator/naive-bayes-analytics. (HYDRATOR-1253)
The HashingTF Feature Generator plugin is now available in its own repository at github.com/hydrator/hashing-tf-feature-generator. (HYDRATOR-1254)
The LogisticRegression plugins are now available in their own repository at github.com/hydrator/logistic-regression-analytics. (HYDRATOR-1255)
Added a new ErrorTransform plugin-type that can be placed after a pipeline stage to consume errors emitted by that stage. (HYDRATOR-1323)
Support added for Table datasets for lookups in plugins and pipelines. (HYDRATOR-1398)

Dataset Improvements

Added the ability to reuse an existing file system location and Hive table when creating a partitioned file set. (CDAP-7596)
Added the ability to configure the CDAP Explore database and table name for a dataset through dataset properties. (CDAP-7597)
Added a tool that pre-builds and loads the HBase coprocessors required by CDAP onto HDFS. (CDAP-7683)
Added control of group ownership and permissions through dataset properties. (CDAP-8070)

Other New Features

CDAP now uses environment variables in the spark-env.sh and properties in the `spark-d...

25 Jan 03:20

prinam

v4.0.1

b0a9ea5

Cask Data Application Platform 4.0.1

Improvement

Added a step in the CDAP Upgrade Tool to disable TMS (Transaction Messaging Service) message and payload tables. The TMS TwillRunnable will update the coprocessors of those tables if required and enable the tables. (CDAP-8047)

Bug Fixes

Fixed an issue where the CDAP service scripts could cause a terminal session to not echo characters. (CDAP-7694)
The CDAP Security service under CDAP Standalone is no longer forced to bind to localhost. (CDAP-7992)
To avoid transaction timeouts, log cleanup is now done in configurable batches (controlled by the property log.cleanup.max.num.files) instead of in a single transaction. (CDAP-8000)
Fixed a bug in the TMS (Transaction Messaging Service) message and payload table coprocessors by reading the CDAP configuration and TMS metadata tables in a separate thread instead of inline. (CDAP-8007)
Changed the default CDAP UI port to 11011 to match the CDAP 4.0.0 release. (CDAP-8023)
Removed an obsolete Update Dataset Specifications step in the CDAP Upgrade tool. This step was required only for upgrading from CDAP versions lower than 3.2 to CDAP Version 3.2. (CDAP-8086)
Provided a workaround for Scala bug SI-6240 (https://issues.scala-lang.org/browse/SI-6240) to allow concurrent execution of Spark programs in CDAP Workflows. (CDAP-8087)
Fixed the CDAP Hydrator detail view so that it can be rendered in older browsers. (CDAP-8088)
Fixed an issue where the number of records processed during a preview run of the realtime data pipeline was being incremented incorrectly. (CDAP-8094)
Fixed an issue with the flag used by the Node proxy to enable SSL between the CDAP UI and CDAP Router. (CDAP-8126)
Fixed an issue with the CDAP CLI where execute commands may be interpreted incorrectly. (CDAP-8137)
Fixed an issue in the template path used with the original CDAP UI when rendering a dataset detailed view. (CDAP-8148)
Fixed issues with the Ambari UI "Quick Links" and alerts definitions for SSL and non-default ports and the writing of the cdap-security.xml file when configured under the CDAP Ambari Service. (CDAP-8158)
Fixed an issue where runtime arguments were not being passed for the preview run correctly in the CDAP UI. (HYDRATOR-1212)
Fixed an issue where previews would not run in a non-default namespace. (HYDRATOR-1226)
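
The batched log cleanup noted under CDAP-8000 can be pictured with a minimal sketch. This is not CDAP's implementation: `cleanup_in_batches`, `delete_batch`, and `max_files_per_batch` are hypothetical names, with `max_files_per_batch` standing in for the value of log.cleanup.max.num.files and each call to `delete_batch` standing in for one bounded transaction.

```python
from typing import Callable, Iterable, List


def cleanup_in_batches(
    files: Iterable[str],
    max_files_per_batch: int,
    delete_batch: Callable[[List[str]], None],
) -> int:
    """Delete files in bounded batches and return the number of batches run.

    Hypothetical helper: each delete_batch call represents one transaction,
    so no single transaction has to cover every file.
    """
    if max_files_per_batch < 1:
        raise ValueError("max_files_per_batch must be at least 1")
    batch: List[str] = []
    batches = 0
    for path in files:
        batch.append(path)
        if len(batch) == max_files_per_batch:
            delete_batch(batch)  # one bounded unit of work
            batches += 1
            batch = []
    if batch:  # leftover files that did not fill a whole batch
        delete_batch(batch)
        batches += 1
    return batches


if __name__ == "__main__":
    count = cleanup_in_batches(
        [f"log-{i}" for i in range(7)], 3, lambda b: print("deleting", b)
    )
    print("batches:", count)
```

A smaller batch size keeps each unit of work short at the cost of more round trips.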

21 Jan 01:09

prinam

v3.5.3

4b105ce

Cask Data Application Platform 3.5.3

Improvements

Spark programs can now use a custom Kryo serializer. (CDAP-7647)

Bug Fixes

Fixed an issue where the CDAP service scripts could cause a terminal session to not echo characters. (CDAP-7694)
Removed an obsolete Update Dataset Specifications step in the CDAP Upgrade tool. This step was required only for upgrading from CDAP versions lower than 3.2 to CDAP Version 3.2. (CDAP-8086)
Provided a workaround for Scala bug SI-6240 (https://issues.scala-lang.org/browse/SI-6240) to allow concurrent execution of Spark programs in CDAP Workflows. (CDAP-8087)

23 Dec 22:40

awholegunch

v3.5.2

28d974c

Cask Data Application Platform 3.5.2

Known Issues

In CDAP 3.5.0, new kafka.server.* properties replace older properties such as kafka.log.dir, as described in the Administration Manual: Appendices: cdap-site.xml. (CDAP-7179)

If you are upgrading from CDAP 3.4.x to 3.5.x and you have set a value for kafka.log.dir by using Cloudera Manager's safety-valve mechanism, you need to change to the new property kafka.server.log.dirs, as the deprecated kafka.log.dir is being ignored in favor of the new property. If you don't, your custom value will be replaced with the default value.
When running in CDAP Standalone, the Cask Hydrator plugin NaiveBayesTrainer has a permgen memory leak that leads to an out-of-memory error if the plugin is used repeatedly, in as few as six runs. The only workaround is to reset the memory by restarting CDAP Standalone. (CDAP-7608)
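
For the kafka.log.dir issue (CDAP-7179), a small check along these lines might help spot a custom value that would be silently ignored after upgrading. It is a sketch only: it assumes the configuration is available as Hadoop-style XML (`<configuration>` containing `<property>` elements with `<name>` and `<value>`), and the helper names and the sample path `/data/kafka-logs` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Deprecated property -> replacement, as named in the known issue above.
DEPRECATED = {"kafka.log.dir": "kafka.server.log.dirs"}


def read_properties(xml_text: str) -> Dict[str, str]:
    """Parse Hadoop-style configuration XML into a name -> value mapping."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_ignored_settings(props: Dict[str, str]) -> List[str]:
    """Report deprecated properties that are set without their replacement."""
    warnings = []
    for old, new in DEPRECATED.items():
        if old in props and new not in props:
            warnings.append(f"{old}={props[old]} is ignored; set {new} instead")
    return warnings


if __name__ == "__main__":
    sample = (
        "<configuration><property>"
        "<name>kafka.log.dir</name><value>/data/kafka-logs</value>"
        "</property></configuration>"
    )
    for warning in find_ignored_settings(read_properties(sample)):
        print(warning)
```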

Improvements

Fixed an issue with the CDAP scripts under Windows not handling a JAVA_HOME path with spaces in it correctly. CDAP SDK home directories with spaces in the path are not supported (due to issues with the product) and the scripts now exit if such a path is detected. (CDAP-3262)
For MapReduce programs using a PartitionedFileSet as input, the partition key corresponding to the input split is now exposed to the mapper. (CDAP-4322)
Added the property program.container.dist.jars to set extra jars to be localized to every program container and to be added to classpaths of CDAP programs. (CDAP-6183)
The namespace that integration test cases run against by default has been made configurable. (CDAP-6572)
Improved the UpgradeTool to upgrade tables in namespaces with impersonation configured. (CDAP-6577)
Added support for concurrent runs of a Spark program. (CDAP-6885)
Added support for impersonation with CDAP Explore (Hive) operations, such as enabling exploring of a dataset or running queries against it. (CDAP-6587)
Added support for CDH 5.9. (CDAP-7291)
The Log HTTP Handler and Router have been fixed to allow the streaming of larger log files. (CDAP-7385)
Added support to LogSaver for impersonation. (CDAP-7387)
Added authorization for schedules in CDAP. (CDAP-7404)
Improved error handling upon failures in namespace creation. (CDAP-7529)
DynamicPartitioner can now limit the number of open RecordWriters to one, if the output partition keys are grouped. (CDAP-7557)
Added a property kafka.zookeeper.quorum to be used across all internal clients using Kafka. (CDAP-7682)
Added cluster.name as a property that identifies a cluster; this property can be set in the cdap-site.xml. (CDAP-7761)
Added the Windows Share Copy plugin to the Hydrator plugins. (HYDRATOR-979)
The SSH hostname and the command to be executed are now macro-enabled for the SSH action plugin. (HYDRATOR-997)
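
The idea behind the DynamicPartitioner improvement (CDAP-7557) is that when records arrive grouped by partition key, the writer for one partition can be closed before the next is opened. The sketch below illustrates only that idea and is not the CDAP API: `write_grouped` is a hypothetical function, in-memory lists stand in for RecordWriters, and raising an error on ungrouped input is an assumption made for the example.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def write_grouped(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Write (partition_key, value) records with one partition open at a time.

    Assumes records are grouped by key; a key that reappears after its
    partition was closed is rejected (an assumption of this sketch).
    """
    output: Dict[str, List[str]] = {}
    current_key: Optional[str] = None
    for key, value in records:
        if key != current_key:
            if key in output:
                raise ValueError(f"records for partition {key!r} are not grouped")
            # Switching keys: the previous "writer" is done, open the next one.
            output[key] = []
            current_key = key
        output[key].append(value)
    return output


if __name__ == "__main__":
    print(write_grouped([("2017-01", "a"), ("2017-01", "b"), ("2017-02", "c")]))
```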

Bug Fixes

Fixed an issue that prevented macros from being used with a secure KMS store. (CDAP-6981)
Significantly reduced the chance of a schedule misfire in the case where the CPU cannot trigger a schedule within a certain time threshold. (CDAP-7116)
Fixed an issue where macros were not being substituted for postaction plugins. (CDAP-7177)
Fixed an issue where dataset usage was not being recorded after an application was deleted. (CDAP-7250)
Fixed an issue that would cause MapReduce and Spark programs to fail if too many macros were being used. (CDAP-7318)
Fixed a problem with upgrading CDAP using the CDAP Upgrade Tool. (CDAP-7321)
Fixed a problem with the upgrade tool while upgrading HBase coprocessors. (CDAP-7324)
Fixed a problem with using "Download All" logs in the browser log viewer by having it fetch and stream the response to the client. (CDAP-7353)
Fixed a problem with NodeJS buffering a response before sending it to a client. (CDAP-7359)
Fixed a problem with log file corruption if the log saver container crashes due to being killed by YARN. (CDAP-7361)
Fixed a problem with the CDAP UI not handling "5xx" error codes correctly. (CDAP-7364)
Fixed Hydrator Studio in the Windows version of Chrome to allow users to open and edit a node configuration. (CDAP-7374)
Fixed an error in the "CDAP Introduction" tutorial's "Transforming Your Data" example of an application configuration. (CDAP-7386)
Fixed TestFramework classloading to support classes that depend on org.hamcrest. (CDAP-7391)
Fixed an issue where the Java process corresponding to the MapReduce application master kept running even if the application was moved to the FINISHED state. (CDAP-7392)
Fixed an issue where impersonation in flows did not work, by no longer reusing HBaseAdmin across different UGIs. (CDAP-7394)
Fixed an issue which prevented scheduled jobs from running on a namespace with impersonation. (CDAP-7396)
Fixed an issue which prevented an app in a namespace from being deleted if a program for the same app is running in a different namespace. (CDAP-7398)
Fixed an issue that prevented the CDAP UI from starting if the logback.xml was configured to log at the INFO or lower level. (CDAP-7403)
Avoided the caching of YarnClient in order to fix a problem that occurred in namespaces with impersonation configured. (CDAP-7420)
Fixed an issue that prevented HBaseQueueDebugger from running in an impersonated namespace. (CDAP-7433)
Fixed an error which prevented the downloading of large logs using the CDAP UI. (CDAP-7435)
Removed the requirement of running "kinit" prior to running either the Upgrade or Transaction Debugger tools of CDAP on a secure Hadoop cluster. (CDAP-7438, CDAP-7439)
Fixed an issue that prevented the CDAP Upgrade Tool from being run for a namespace with authorization turned on. (CDAP-7458)
Fixed logback-container.xml to work on clusters with multiple log directories configured for YARN. (CDAP-7473)
Fixed a problem in CDAP logging that caused system logs from Kafka to not be saved after an upgrade and for previously-saved logs to become inaccessible. (CDAP-7482)
Fixed cases where the MapReduce classloader was being closed prematurely. (CDAP-7500)
Fixed a problem that prevented the use of a logback.xml from an application jar. (CDAP-7527)
Fixed a problem in integration tests to allow JDBC connections against authorization-enabled and SSL-enabled CDAP instances. (CDAP-7548)
Improved the usability of ServiceManager in integration tests. The getServiceURL method now waits for the service to be discoverable before returning the service's URL. (CDAP-7566)
Fixed cases where Spark programs could not be started after a master failover or restart. (CDAP-7612)
The CDAP Ambari service was updated to use scripts for Auth Server/Router alerts in Ambari due to Ambari not supporting CDAP's /status endpoint with WEB check. (CDAP-7660)
Fixed a problem with Hydrator pipelines using a DBSource not working in an HDP cluster. (HYDRATOR-791)
Fixed a problem with Spark data pipelines not supporting argument values in excess of 64K characters. (HYDRATOR-948)
Fixed a problem that prevented the adding of a schema with hyphens in the Hydrator UI.
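
The ServiceManager change (CDAP-7566) amounts to waiting until a service is discoverable before returning its URL. A generic polling sketch of that pattern follows; it is not the CDAP test framework's code, and `wait_for_service_url`, `lookup`, and the timeout parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Optional


def wait_for_service_url(
    lookup: Callable[[], Optional[str]],
    timeout_seconds: float = 5.0,
    poll_interval_seconds: float = 0.1,
) -> str:
    """Poll lookup() until it returns a URL or the timeout elapses.

    lookup is a hypothetical discovery call that returns None while the
    service is not yet discoverable.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        url = lookup()
        if url is not None:
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError("service was not discoverable before the timeout")
        time.sleep(poll_interval_seconds)


if __name__ == "__main__":
    # Simulated discovery: unavailable twice, then a (made-up) URL appears.
    answers = iter([None, None, "http://localhost:1234/"])
    print(wait_for_service_url(lambda: next(answers), poll_interval_seconds=0.01))
```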
