Releases: Teradata/teradataml
teradataml 20.00.00.02
Teradata Python package for Advanced Analytics.
teradataml makes available to Python users a collection of analytic functions that reside on Teradata Vantage. This allows users to perform analytics on Teradata Vantage with no SQL coding. In addition, the teradataml library provides functions for scaling data manipulation and transformation, data filtering and sub-setting, and can be used in conjunction with other open-source python libraries.
For community support, please visit the Teradata Community.
For Teradata customer support, please visit Teradata Support.
Copyright 2024, Teradata. All Rights Reserved.
Table of Contents
Release Notes:
teradataml 20.00.00.02
- teradataml will no longer be supported with SQLAlchemy < 2.0.
- teradataml no longer shows the warnings from Vantage by default.
  - Users should set `display.suppress_vantage_runtime_warnings` to `False` to display warnings (see the sketch below).
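A minimal sketch of re-enabling the warnings, assuming `display` is exposed at the package level like the other display options:

```python
# Sketch only: show Vantage runtime warnings again for the current session.
from teradataml import display

display.suppress_vantage_runtime_warnings = False
```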
- New Features/Functionality
  - teradataml: SQLE Engine Analytic Functions
    - New Analytics Database Analytic Functions:
      - `TFIDF()`
      - `Pivoting()`
      - `UnPivoting()`
    - New Unbounded Array Framework (UAF) Functions:
      - `AutoArima()`
      - `DWT()`
      - `DWT2D()`
      - `FilterFactory1d()`
      - `IDWT()`
      - `IDWT2D()`
      - `IQR()`
      - `Matrix2Image()`
      - `SAX()`
      - `WindowDFFT()`
  - teradataml: Functions
    - `udf()` - Creates a user defined function (UDF) and returns a ColumnExpression (see the sketch below).
    - `set_session_param()` is added to set the database session parameters.
    - `unset_session_param()` is added to unset database session parameters.
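A minimal sketch of `udf()` producing a ColumnExpression that `DataFrame.assign()` accepts (as described above and in the Updates below); the decorator form, the import path, the `returns` argument and all table/column names are assumptions for illustration - see the Teradata Python Package User Guide for the exact signature.

```python
# Sketch only: a Python UDF projected as a new column.
from teradataml import DataFrame, udf              # import path assumed
from teradatasqlalchemy.types import VARCHAR

@udf(returns=VARCHAR(100))                         # decorator form and 'returns' keyword assumed
def mask_email(email):
    # Runs per row; keeps only the domain part of an email address.
    return "***@" + email.split("@")[-1]

df = DataFrame("customers")                        # placeholder table
df = df.assign(masked=mask_email("email"))         # assign() accepts the udf() ColumnExpression
```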
  - teradataml: DataFrame
    - `materialize()` - Persists the DataFrame into the database for the current session.
    - `create_temp_view()` - Creates a temporary view on the DataFrame for the session.
  - teradataml DataFrameColumn a.k.a. ColumnExpression
    - Date Time Functions
      - `DataFrameColumn.to_timestamp()` - Converts a string or integer value to a TIMESTAMP data type or TIMESTAMP WITH TIME ZONE data type.
      - `DataFrameColumn.extract()` - Extracts a date component to a numeric value.
      - `DataFrameColumn.to_interval()` - Converts a numeric or string value into an INTERVAL_DAY_TO_SECOND or INTERVAL_YEAR_TO_MONTH value.
    - String Functions
      - `DataFrameColumn.parse_url()` - Extracts a part from a URL.
    - Arithmetic Functions
      - `DataFrameColumn.log()` - Returns the logarithm value of the column with respect to 'base'.
  - teradataml: AutoML
    - New methods added for `AutoML()`, `AutoRegressor()` and `AutoClassifier()` (see the sketch below):
      - `evaluate()` - Performs evaluation on the data using the best model or the model of the user's choice from the leaderboard.
      - `load()` - Loads a saved model from the database.
      - `deploy()` - Saves the trained model inside the database.
      - `remove_saved_model()` - Removes a saved model from the database.
      - `model_hyperparameters()` - Returns the hyperparameters of fitted or loaded models.
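A minimal sketch of the new persistence methods listed above (`deploy()`, `load()`, `evaluate()`); the task type, table names, target column and model name are illustrative assumptions, so check the User Guide for the exact signatures.

```python
# Sketch only: train, persist, reload and evaluate an AutoML model.
from teradataml import DataFrame, AutoML

aml = AutoML(task_type="Regression")
aml.fit(DataFrame("housing_train"), "price")     # target column name assumed
aml.deploy("housing_best_model")                 # saves the trained model in the database

# Later, possibly in a new session: reload the saved model.
aml2 = AutoML(task_type="Regression")
aml2.load("housing_best_model")
print(aml2.model_hyperparameters())              # hyperparameters of the loaded model
print(aml2.evaluate(DataFrame("housing_test")))  # evaluation now requires data input
```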
- Updates
  - teradataml: AutoML
    - `AutoML()`, `AutoRegressor()`
      - New performance metrics added for task type regression, i.e., "MAPE", "MPE", "ME", "EV", "MPD" and "MGD".
    - `AutoML()`, `AutoRegressor()` and `AutoClassifier()`
      - New arguments added: `volatile`, `persist`.
      - `predict()` - Data input is now mandatory for generating predictions. Default model evaluation is now removed.
  - `DataFrameColumn.cast()`: Accepts 2 new arguments `format` and `timezone` (see the sketch after this list).
  - `DataFrame.assign()`: Accepts ColumnExpressions returned by `udf()`.
  - teradataml: Options
    - `set_config_params()`
      - The following arguments will be deprecated in the future: `ues_url`, `auth_token`.
  - Database Utility
    - `list_td_reserved_keywords()` - Accepts a list of strings as argument.
  - Updates to existing UAF Functions:
    - `ACF()` - `round_results` parameter removed as it was used for internal testing.
    - `BreuschGodfrey()` - Added default value 0.05 for parameter `significance_level`.
    - `GoldfeldQuandt()`
      - Removed parameters `weights` and `formula`.
      - Replaced parameter `orig_regr_paramcnt` with `const_term`.
      - Changed description for parameter `algorithm`. Please refer to the documentation for more details.
      - Note: This will break backward compatibility.
    - `HoltWintersForecaster()` - Default value of parameter `seasonal_periods` removed.
    - `IDFFT2()` - Removed parameter `output_fmt_row_major` as it is used for internal testing.
    - `Resample()` - Added parameter `output_fmt_index_style`.
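A minimal sketch of `DataFrameColumn.cast()` with the new `format` and `timezone` arguments; the table, column names and format string are placeholders, and `type_` as the keyword for the target type is an assumption.

```python
# Sketch only: cast a string column to TIMESTAMP using the new arguments.
from teradataml import DataFrame
from teradatasqlalchemy.types import TIMESTAMP

df = DataFrame("web_events")                       # placeholder table
df = df.assign(event_ts=df.event_time_str.cast(type_=TIMESTAMP,
                                               format="YYYY-MM-DD HH24:MI:SS",
                                               timezone="GMT"))
```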
- Bug Fixes
  - KNN `predict()` function can now predict on test data which does not contain the target column.
  - Metrics functions are supported on the Lake system.
  - The following OpensourceML functions from different sklearn modules are fixed:
    - `sklearn.ensemble`:
      - ExtraTreesClassifier - `apply()`
      - ExtraTreesRegressor - `apply()`
      - RandomForestClassifier - `apply()`
      - RandomForestRegressor - `apply()`
    - `sklearn.impute`:
      - SimpleImputer - `transform()`, `fit_transform()`, `inverse_transform()`
      - MissingIndicator - `transform()`, `fit_transform()`
    - `sklearn.kernel_approximation`:
      - Nystroem - `transform()`, `fit_transform()`
      - PolynomialCountSketch - `transform()`, `fit_transform()`
      - RBFSampler - `transform()`, `fit_transform()`
    - `sklearn.neighbors`:
      - KNeighborsTransformer - `transform()`, `fit_transform()`
      - RadiusNeighborsTransformer - `transform()`, `fit_transform()`
    - `sklearn.preprocessing`:
      - KernelCenterer - `transform()`
      - OneHotEncoder - `transform()`, `inverse_transform()`
  - OpensourceML returns teradataml objects for model attributes and functions instead of sklearn objects, so that the user can perform further operations like `score()`, `predict()` etc. on top of the returned objects.
  - AutoML `predict()` function now generates the correct ROC-AUC value for the positive class.
  - `deploy()` method of `Script` and `Apply` classes retries model deployment if there are any intermittent network issues.
teradataml 20.00.00.01
Teradata Python package for Advanced Analytics.
teradataml makes available to Python users a collection of analytic functions that reside on Teradata Vantage. This allows users to perform analytics on Teradata Vantage with no SQL coding. In addition, the teradataml library provides functions for scaling data manipulation and transformation, data filtering and sub-setting, and can be used in conjunction with other open-source python libraries.
For community support, please visit the Teradata Community.
For Teradata customer support, please visit Teradata Support.
Copyright 2024, Teradata. All Rights Reserved.
Table of Contents
Release Notes:
teradataml 20.00.00.01
- teradataml no longer supports Python versions less than 3.8.
- New Features/Functionality
  - Personal Access Token (PAT) support in teradataml
    - `set_auth_token()` - teradataml now supports authentication via PAT in addition to the OAuth 2.0 Device Authorization Grant (formerly known as the Device Flow).
      - It accepts the UES URL, the Personal Access Token (PAT) and the Private Key file generated from the VantageCloud Lake Console, and optional arguments `username` and `expiration_time` in seconds (see the sketch below).
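A minimal sketch of PAT-based authentication as described above; the argument names shown are assumptions based on these notes and all values are placeholders - consult the User Guide for the exact signature.

```python
# Sketch only: authenticate via Personal Access Token for VantageCloud Lake services.
from teradataml import set_auth_token

set_auth_token(ues_url="https://<ues-host>/open-analytics",  # UES URL from the Lake Console
               pat_token="<personal-access-token>",          # PAT generated in the Console
               pem_file="lake_private_key.pem",              # private key file from the Console
               username="alice",                             # optional
               expiration_time=3600)                         # optional, in seconds
```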
Updates
-
teradataml: SQLE Engine Analytic Functions
ANOVA()
- New arguments added:
group_name_column
,group_value_name
,group_names
,num_groups
for data containing group values and group names.
- New arguments added:
FTest()
- New arguments added:
sample_name_column
,sample_name_value
,first_sample_name
,second_sample_name
.
- New arguments added:
GLM()
- Supports stepwise regression and accept new arguments
stepwise_direction
,max_steps_num
andinitial_stepwise_columns
. - New arguments added:
attribute_data
,parameter_data
,iteration_mode
andpartition_column
.
- Supports stepwise regression and accept new arguments
GetFutileColumns()
- Arguments
category_summary_column
andthreshold_value
are now optional.
- Arguments
KMeans()
- New argument added:
initialcentroids_method
.
- New argument added:
NonLinearCombineFit()
- Argument
result_column
is now optional.
- Argument
ROC()
- Argument
positive_class
is now optional.
- Argument
SVMPredict()
- New argument added:
model_type
.
- New argument added:
ScaleFit()
- New arguments added:
ignoreinvalid_locationscale
,unused_attributes
,attribute_name_column
,attribute_value_column
. - Arguments
attribute_name_column
,attribute_value_column
andtarget_attributes
are supported for sparse input. - Arguments
attribute_data
,parameter_data
andpartition_column
are supported for partitioning.
- New arguments added:
ScaleTransform()
- New arguments added:
attribute_name_column
andattribute_value_column
support for sparse input.
- New arguments added:
TDGLMPredict()
- New arguments added:
family
andpartition_column
.
- New arguments added:
XGBoost()
- New argument
base_score
is added for initial prediction value for all data points.
- New argument
XGBoostPredict()
- New argument
detailed
is added for detailed information of each prediction.
- New argument
ZTest()
- New arguments added:
sample_name_column
,sample_value_column
,first_sample_name
andsecond_sample_name
.
- New arguments added:
-
teradataml: AutoML
AutoML()
,AutoRegressor()
andAutoClassifier()
- New argument
max_models
is added as an early stopping criterion to limit the maximum number of models to be trained.
- New argument
-
teradataml: DataFrame functions
DataFrame.agg()
- Accepts ColumnExpressions and list of ColumnExpressions as arguments.
-
teradataml: General Functions
- Data Transfer Utility
fastload()
- Improved error and warning table handling with below-mentioned new arguments.err_staging_db
err_tbl_name
warn_tbl_name
err_tbl_1_suffix
err_tbl_2_suffix
fastload()
- Change in behaviour ofsave_errors
argument.
Whensave_errors
is set toTrue
, error information will be available in two persistent tablesERR_1
andERR_2
.
Whensave_errors
is set toFalse
, error information will be available in single pandas dataframe.
- Garbage collector location is now configurable.
User can set configure.local_storage to a desired location.
- Data Transfer Utility
-
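A minimal sketch of the reworked `fastload()` error handling; the pandas DataFrame, table and database names are placeholders, and the way the new arguments are combined is an assumption based on these notes.

```python
# Sketch only: FastLoad a pandas DataFrame and keep error rows in persistent tables.
import pandas as pd
from teradataml import fastload

pdf = pd.read_csv("sales.csv")                   # placeholder local file
fastload(df=pdf,
         table_name="sales_stage",
         if_exists="replace",
         save_errors=True,                       # errors go to persistent ERR_1/ERR_2 tables
         err_staging_db="stage_db",              # new arguments from this release (usage assumed)
         err_tbl_1_suffix="_err1",
         err_tbl_2_suffix="_err2")
```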
- Bug Fixes
  - UAF functions now work if the database name has special characters.
  - OpensourceML can now read and process NULL/nan values.
  - Boolean values output will now be returned as a VARBYTE column with 0 or 1 values in OpensourceML.
  - Fixed bug for `Apply`'s `deploy()`.
  - Issue with volatile table creation is fixed where it is created in the right database, i.e., user's spool space, regardless of the temp database specified.
  - `ColumnTransformer` function now processes its arguments in the order they are passed.
teradataml 20.00.00.00
- New Features/Functionality
  - teradataml OpenML: Run Opensource packages through Teradata Vantage

    `OpenML` dynamically exposes opensource packages through Teradata Vantage. `OpenML` provides an interface object through which exposed classes and functions of opensource packages can be accessed with the same syntax and arguments. The following functionality is added in the current release:
    - `td_sklearn` - Interface object to run scikit-learn functions and classes through Teradata Vantage. Example usage below:

      ```python
      from teradataml import td_sklearn, DataFrame

      df_train = DataFrame("multi_model_classification")

      feature_columns = ["col1", "col2", "col3", "col4"]
      label_columns = ["label"]
      part_columns = ["partition_column_1", "partition_column_2"]

      linear_svc = td_sklearn.LinearSVC()
      ```
    - `OpenML` is supported in both Teradata Vantage Enterprise and Teradata Vantage Lake.
    - Argument Support:
      - Use of X and y arguments - Scikit-learn users are familiar with using `X` and `y` as argument names which take data as pandas DataFrames, numpy arrays or lists etc. However, in OpenML, teradataml DataFrames are passed for the arguments `X` and `y`.

        ```python
        df_x = df_train.select(feature_columns)
        df_y = df_train.select(label_columns)

        linear_svc = linear_svc.fit(X=df_x, y=df_y)
        ```
      - Additional support for data, feature_columns, label_columns and group_columns arguments - Apart from the traditional arguments, OpenML supports additional arguments - `data`, `feature_columns`, `label_columns` and `group_columns`. These are used as alternatives to `X`, `y` and `groups`.

        ```python
        linear_svc = linear_svc.fit(data=df_train, feature_columns=feature_columns,
                                    label_columns=label_columns)
        ```
      - Support for classification and regression metrics - Metrics functions for classification and regression in the `sklearn.metrics` module are supported. Support for other metrics functions will be added in future releases.
      - Distributed Modeling and partition_columns argument support - Existing scikit-learn supports only single model generation. However, OpenML supports both the single model use case and the distributed (multi) model use case. For this, the user has to additionally pass the `partition_columns` argument to the existing `fit()`, `predict()` or any other function to be run. This generates multiple models for multiple partitions, using the data in the corresponding partition.

        ```python
        df_x_1 = df_train.select(feature_columns + part_columns)

        linear_svc = linear_svc.fit(X=df_x_1, y=df_y, partition_columns=part_columns)
        ```
      - Support for load and deploy models - OpenML provides additional support for saving (deploying) the trained models. These models can be loaded later to perform operations like prediction, score etc. The following functions are provided by OpenML (a sketch follows this section):
        - `<obj>.deploy()` - Used to deploy/save the model created and/or trained by OpenML.
        - `td_sklearn.deploy()` - Used to deploy/save the model created and/or trained outside teradataml.
        - `td_sklearn.load()` - Used to load the saved models.
    - Refer to the Teradata Python Package User Guide for more details of this feature, arguments, usage, examples and supportability in both VantageCloud Enterprise and VantageCloud Lake.
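A minimal sketch of the deploy/load functions listed above, continuing from the `linear_svc`, `df_x_1` and `part_columns` objects in the earlier snippets; the `model_name` keyword is an assumption, so check the User Guide for the exact signature.

```python
# Sketch only: persist the trained OpenML model, then reload it in a later session.
from teradataml import td_sklearn

linear_svc.deploy(model_name="linear_svc_partitioned")      # 'model_name' keyword assumed

# Later, possibly in a new session: load the saved model and score with it.
loaded_svc = td_sklearn.load("linear_svc_partitioned")
predictions = loaded_svc.predict(X=df_x_1, partition_columns=part_columns)
```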
  - teradataml: AutoML - Automated end to end Machine Learning flow.

    AutoML is an approach to automate the process of building, training, and validating machine learning models. It involves automation of various aspects of the machine learning workflow, such as feature exploration, feature engineering, data preparation, model training and evaluation for a given dataset. The teradataml AutoML feature offers best model identification, model leaderboard generation, parallel execution, early stopping, model evaluation, model prediction, live logging, and customization of the default process. A sketch follows this section.
    - `AutoML`
      - AutoML is a generic algorithm that supports all three tasks, i.e. 'Regression', 'Binary Classification' and 'Multiclass Classification'.
      - Methods of AutoML
        - `__init__()` - Instantiate an object of AutoML with given parameters.
        - `fit()` - Perform fit on specified data and target column.
        - `leaderboard()` - Get the leaderboard for the AutoML. Presents diverse models, feature selection method, and performance metrics.
        - `leader()` - Show the best performing model and its details such as feature selection method, and performance metrics.
        - `predict()` - Perform prediction on the data using the best model or the model of the user's choice from the leaderboard.
        - `generate_custom_config()` - Generate the custom config JSON file required for a customized run of AutoML.
    - `AutoRegressor`
      - AutoRegressor is a special purpose AutoML feature to run regression specific tasks.
      - Methods of AutoRegressor
        - `__init__()` - Instantiate an object of AutoRegressor with given parameters.
        - `fit()` - Perform fit on specified data and target column.
        - `leaderboard()` - Get the leaderboard for the AutoRegressor. Presents diverse models, feature selection method, and performance metrics.
        - `leader()` - Show the best performing model and its details such as feature selection method, and performance metrics.
        - `predict()` - Perform prediction on the data using the best model or the model of the user's choice from the leaderboard.
        - `generate_custom_config()` - Generate the custom config JSON file required for a customized run of AutoRegressor.
    - `AutoClassifier`
      - AutoClassifier is a special purpose AutoML feature to run classification specific tasks.
      - Methods of AutoClassifier
        - `__init__()` - Instantiate an object of AutoClassifier with given parameters.
        - `fit()` - Perform fit on specified data and target column.
        - `leaderboard()` - Get the leaderboard for the AutoClassifier. Presents diverse models, feature selection method, and performance metrics.
        - `leader()` - Show the best performing model and its details such as feature selection method, and performance metrics.
        - `predict()` - Perform prediction on the data using the best model or the model of the user's choice from the leaderboard.
        - `generate_custom_config()` - Generate the custom config JSON file required for a customized run of AutoClassifier.
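A minimal sketch of the AutoClassifier flow described above; the table names, target column and the `verbose` argument are illustrative assumptions.

```python
# Sketch only: end-to-end AutoClassifier run on a hypothetical churn dataset.
from teradataml import DataFrame, AutoClassifier

train = DataFrame("churn_train")
clf = AutoClassifier(verbose=1)               # 'verbose' assumed to control live logging
clf.fit(train, "churned")                     # target column name assumed

print(clf.leaderboard())                      # models, feature selection methods, metrics
print(clf.leader())                           # best performing model details

scores = clf.predict(DataFrame("churn_new"))  # predictions with the best model
```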
  - teradataml: DataFrame
    - `fillna` - Replace the null values in a column with the value specified.
    - Data Manipulation
      - `cube()` - Analyzes data by grouping it into multiple dimensions.
      - `rollup()` - Analyzes a set of data across a single dimension with more than one level of detail.
      - `replace()` - Replaces the values for columns.
  - teradataml: Script and Apply
    - `deploy()` - Function deploys the model, generated after `execute_script()`, in the database or user environment in Lake. The function is available in both Script and Apply.
  - teradataml: DataFrameColumn
    - `fillna` - Replaces every occurrence of null value in the column with the value specified.
  - teradataml DataFrameColumn a.k.a. ColumnExpression
    - Date Time Functions (see the sketch after this list)
      - `DataFrameColumn.week_start()` - Returns the first date or timestamp of the week that begins immediately before the specified date or timestamp value in a column as a literal.
      - `DataFrameColumn.week_begin()` - It is an alias for `DataFrameColumn.week_start()` function.
      - `DataFrameColumn.week_end()` - Returns the last date or timestamp of the week that ends immediately after the specified date or timestamp value in a column as a literal.
      - `DataFrameColumn.month_start()` - Returns the first date or timestamp of the month that begins immediately before the specified date or timestamp value in a column or as a literal.
      - `DataFrameColumn.month_begin()` - It is an alias for `DataFrameColumn.month_start()` function.
      - `DataFrameColumn.month_end()` - Returns the last date or timestamp of the month that ends immediately after the specified date or timestamp value in a column or as a literal.
      - `DataFrameColumn.year_start()` - Returns the first date or timestamp of the year that begins immediately before the specified date or timestamp value in a column or as a literal.
      - `DataFrameColumn.year_begin()` - It is an alias for `DataFrameColumn.year_start()` function.
      - `DataFrameColumn.year_end()` - Returns the last date or timestamp of the year that ends immediately after the specified date or timestamp value in a column or as a literal.
      - `DataFrameColumn.quarter_start()` - Returns the first date or timestamp of the quarter that begins immediately before the specified date or timestamp value in a column as a literal.
      - `DataFrameColumn.quarter_begin()` - It is an alias for `DataFrameColumn.quarter_start()` function.
      - `DataFrameColumn.quarter_end()` - Returns the last date or timestamp of the quarter that ends immediately after the specified date or timestamp value in a column as a literal.
      - `DataFrameColumn.last_sunday()` - Returns the date or timestamp of Sunday that falls immediately before the specified date or timestamp value in a column as a literal.
      - `DataFrameColumn.last_monday()` - Returns the date o...
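A minimal sketch of the new date boundary helpers used through `DataFrame.assign()`; the table and column names are placeholders.

```python
# Sketch only: derive week and month boundaries from a DATE/TIMESTAMP column.
from teradataml import DataFrame

df = DataFrame("orders")                           # placeholder table
df = df.assign(week_from=df.order_date.week_start(),
               month_to=df.order_date.month_end())
print(df.head())
```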
teradataml 17.20.00.07
- New Features/Functionality
  - Open Analytics Framework (OpenAF) APIs:
    - Manage all user environments.
      - `create_env()`:
        - New argument `conda_env` is added to create a conda environment (see the sketch below).
      - `list_user_envs()`:
        - Users can list conda environment(s) by filtering with the new argument `conda_env`.
      - Conda environment(s) can be managed using APIs for installing, updating, and removing files/libraries.
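A minimal sketch of the new `conda_env` argument; the environment name, base version label and description are placeholders and the other argument names are assumptions.

```python
# Sketch only: create a conda-based user environment and list conda environments.
from teradataml import create_env, list_user_envs

env = create_env(env_name="scoring_conda",
                 base_env="python_3.9",      # assumed base version label
                 desc="Conda environment for scoring",
                 conda_env=True)             # new in this release
print(list_user_envs(conda_env=True))        # filter listing to conda environments
```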
- Bug Fixes
  - `columns` argument for `FillNa` function is made optional.
teradataml 17.20.00.06
- New Features/Functionality
  - teradataml DataFrameColumn a.k.a. ColumnExpression
    - `ColumnExpression.nulls_first()` - Displays NULL values first.
    - `ColumnExpression.nulls_last()` - Displays NULL values last.
    - Bit Byte Manipulation Functions
      - `DataFrameColumn.bit_and()` - Returns the logical AND operation on the bits from the column and corresponding bits from the argument.
      - `DataFrameColumn.bit_get()` - Returns the bit specified by the input argument from the column and returns either 0 or 1 to indicate the value of that bit.
      - `DataFrameColumn.bit_or()` - Returns the logical OR operation on the bits from the column and corresponding bits from the argument.
      - `DataFrameColumn.bit_xor()` - Returns the bitwise XOR operation on the binary representation of the column and corresponding bits from the argument.
      - `DataFrameColumn.bitand()` - It is an alias for `DataFrameColumn.bit_and()` function.
      - `DataFrameColumn.bitnot()` - Returns a bitwise complement on the binary representation of the column.
      - `DataFrameColumn.bitor()` - It is an alias for `DataFrameColumn.bit_or()` function.
      - `DataFrameColumn.bitwise_not()` - It is an alias for `DataFrameColumn.bitnot()` function.
      - `DataFrameColumn.bitwiseNOT()` - It is an alias for `DataFrameColumn.bitnot()` function.
      - `DataFrameColumn.bitxor()` - It is an alias for `DataFrameColumn.bit_xor()` function.
      - `DataFrameColumn.countset()` - Returns the count of the binary bits within the column that are either set to 1 or set to 0, depending on the input argument value.
      - `DataFrameColumn.getbit()` - It is an alias for `DataFrameColumn.bit_get()` function.
      - `DataFrameColumn.rotateleft()` - Returns an expression rotated to the left by the specified number of bits, with the most significant bits wrapping around to the right.
      - `DataFrameColumn.rotateright()` - Returns an expression rotated to the right by the specified number of bits, with the least significant bits wrapping around to the left.
      - `DataFrameColumn.setbit()` - Sets the value of the bit specified by the input argument to the value of the column.
      - `DataFrameColumn.shiftleft()` - Returns the expression when the value in the column is shifted by the specified number of bits to the left.
      - `DataFrameColumn.shiftright()` - Returns the expression when the column expression is shifted by the specified number of bits to the right.
      - `DataFrameColumn.subbitstr()` - Extracts a bit substring from the column expression based on the specified bit position.
      - `DataFrameColumn.to_byte()` - Converts a numeric data type to the Vantage byte representation (byte value) of the column expression value.
    - Regular Expression Functions (see the sketch after this list)
      - `DataFrameColumn.regexp_instr()` - Searches the string value in the column for a match to the value specified in the argument.
      - `DataFrameColumn.regexp_replace()` - Replaces the portions of the string value in a column that match the specified regex string with the replace string.
      - `DataFrameColumn.regexp_similar()` - Compares the value in the column to the value in the argument and returns an integer value.
      - `DataFrameColumn.regexp_substr()` - Extracts a substring from the column that matches a regular expression specified in the input argument.
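A minimal sketch combining a few of the new column functions; the table, columns and patterns are placeholders, and the positional argument order for `regexp_replace()` is an assumption.

```python
# Sketch only: regex masking and a bitwise AND on hypothetical log data.
from teradataml import DataFrame

df = DataFrame("web_logs")
df = df.assign(clean_path=df.url.regexp_replace("[0-9]+", "<id>"),   # mask numeric ids
               low_bits=df.status_flags.bit_and(7))                  # keep the three lowest bits
print(df.sort(df.clean_path.nulls_last()).head())                    # NULL paths sorted last (see Updates below)
```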
  - Open Analytics Framework (OpenAF) APIs:
    - Manage all user environments.
      - `create_env()`:
        - Users can create one or more user environments using the newly added argument `template` by providing specifications in a template JSON file. The new feature allows users to create a complete user environment, including file and library installation, in just a single function call.
    - UserEnv Class - Manage individual user environment.
      - Properties:
        - `models` - Supports listing of models in the user environment.
      - Methods:
        - `install_model()` - Install a model in the user environment.
        - `uninstall_model()` - Uninstall a model from the user environment.
        - `snapshot()` - Take a snapshot of the user environment.
  - teradataml: Bring Your Own Model
    - New Functions
      - `DataRobotPredict()` - Score the data in Vantage using the model trained externally in DataRobot and stored in Vantage.
- Updates
  - `DataFrame.describe()`
    - Method now accepts an argument `statistics`, which specifies the aggregate operation to be performed.
  - `DataFrame.sort()`
    - Method now accepts ColumnExpressions as well.
    - Enables sorting using NULLS FIRST and NULLS LAST.
  - `view_log()` downloads the Apply query logs based on query id.
  - Arguments which accept floating numbers will accept integers also for Analytics Database Analytic Functions.
  - Argument `ignore_nulls` added to `DataFrame.plot()` to ignore the null values while plotting the data.
  - `DataFrame.sample()` - Method supports column stratification.
- Bug Fixes
  - `DataFrameColumn.cast()` accepts all teradatasqlalchemy types.
  - Minor bug fix related to `DataFrame.merge()`.
teradataml 17.20.00.05
- New Features/Functionality
  - teradataml: Hyperparameter-Tuning - Technique to identify the best model parameters.

    Hyperparameter tuning is an optimization method to determine the optimal set of hyperparameters for the given dataset and learning model. The teradataml hyperparameter tuning feature offers best model identification, parallel execution, early stopping, best data identification, model evaluation, model prediction, live logging, input data hyper-parameterization, input data sampling, numerous scoring functions, and hyper-parameterization for non-model trainer functions. A sketch follows this section.
    - `GridSearch`
      - GridSearch is an exhaustive search algorithm that covers all possible parameter values to identify optimal hyperparameters.
      - Methods of GridSearch
        - `__init__()` - Instantiate an object of GridSearch for given model function and parameters.
        - `evaluate()` - Function to perform evaluation on the given teradataml DataFrame using the default model.
        - `fit()` - Function to perform hyperparameter tuning for given hyperparameters and model on a teradataml DataFrame.
        - `get_error_log()` - Useful to get the error log if model execution failed, using the model identifier.
        - `get_input_data()` - Useful to get the input data using the data identifier, when input data is also parameterized.
        - `get_model()` - Returns the trained model for the given model identifier.
        - `get_parameter_grid()` - Returns the hyperparameter space used for hyperparameter optimization.
        - `is_running()` - Returns the execution status of hyperparameter tuning.
        - `predict()` - Function to perform prediction on the given teradataml DataFrame using the default model.
        - `set_model()` - Function to update the default model.
      - Properties of GridSearch
        - `best_data_id` - Returns the best data identifier used for model training.
        - `best_model` - Returns the best trained model.
        - `best_model_id` - Returns the identifier for the best model.
        - `best_params_` - Returns the best set of hyperparameters.
        - `best_sampled_data_` - Returns the best sampled data used to train the best model.
        - `best_score_` - Returns the best trained model score.
        - `model_stats` - Returns the model evaluation reports.
        - `models` - Returns the metadata of all the models.
    - `RandomSearch`
      - The RandomSearch algorithm performs random sampling on the hyperparameter space to identify optimal hyperparameters.
      - Methods of RandomSearch
        - `__init__()` - Instantiate an object of RandomSearch for given model function and parameters.
        - `evaluate()` - Function to perform evaluation on the given teradataml DataFrame using the default model.
        - `fit()` - Function to perform hyperparameter tuning for given hyperparameters and model on a teradataml DataFrame.
        - `get_error_log()` - Useful to get the error log if model execution failed, using the model identifier.
        - `get_input_data()` - Useful to get the input data using the data identifier, when input data is also parameterized.
        - `get_model()` - Returns the trained model for the given model identifier.
        - `get_parameter_grid()` - Returns the hyperparameter space used for hyperparameter optimization.
        - `is_running()` - Returns the execution status of hyperparameter tuning.
        - `predict()` - Function to perform prediction on the given teradataml DataFrame using the default model.
        - `set_model()` - Function to update the default model.
      - Properties of RandomSearch
        - `best_data_id` - Returns the best data identifier used for model training.
        - `best_model` - Returns the best trained model.
        - `best_model_id` - Returns the identifier for the best model.
        - `best_params_` - Returns the best set of hyperparameters.
        - `best_sampled_data_` - Returns the best sampled data used to train the best model.
        - `best_score_` - Returns the best trained model score.
        - `model_stats` - Returns the model evaluation reports.
        - `models` - Returns the metadata of all the models.
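A minimal sketch of the GridSearch flow, using the SQLE `SVM` trainer as the model function; the parameter-grid format, the tuple convention for search values, and the column and hyperparameter names are assumptions - refer to the User Guide for the exact `params` specification.

```python
# Sketch only: grid search over two assumed SVM hyperparameters.
from teradataml import DataFrame, GridSearch, SVM

train = DataFrame("credit_train")                       # placeholder table
params = {"input_columns": ["income", "age", "balance"],
          "response_column": "defaulted",
          "iter_max": (100, 300),                       # candidate values (format assumed)
          "learning_rate": ("OPTIMAL", "CONSTANT")}

gs = GridSearch(func=SVM, params=params)                # constructor arguments assumed
gs.fit(data=train)                                      # trains all parameter combinations
print(gs.best_params_)                                  # best hyperparameter combination
print(gs.models)                                        # metadata of all trained models
```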
  - teradataml: DataFrame
    - New Functions
      - `DataFrame.plot()` - Generates the below types of plots on a teradataml DataFrame (see the sketch after this list):
        - line - Generates line plot.
        - bar - Generates bar plot.
        - scatter - Generates scatter plot.
        - corr - Generates correlation plot.
        - wiggle - Generates a wiggle plot.
        - mesh - Generates a mesh plot.
      - `DataFrame.itertuples()` - Iterate over teradataml DataFrame rows as namedtuples or lists.
  - teradataml: GeoDataFrame
    - New Functions
      - `GeoDataFrame.plot()` - Generates the below types of plots on a teradataml GeoDataFrame:
        - line - Generates line plot.
        - bar - Generates bar plot.
        - scatter - Generates scatter plot.
        - corr - Generates correlation plot.
        - wiggle - Generates a wiggle plot.
        - mesh - Generates a mesh plot.
        - geometry - Generates plot on geospatial data.
  - Plot:
    - `Axis` - Generates the axis for plot.
    - `Figure` - Generates the figure for plot.
    - `subplots` - Helps in generating multiple plots on a single `Figure`.
  - Bring Your Own Model (BYOM) Function:
    - `DataikuPredict` - Score the data in Vantage using the model trained externally in the Dataiku UI and stored in Vantage.
  - `async_run_status()` - Function to check the status of asynchronous run(s) using unique run id(s).
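A minimal sketch of `DataFrame.plot()`; the table and columns are placeholders, and everything beyond the `x`, `y` and `kind` arguments (including the `show()` call) is an assumption.

```python
# Sketch only: line plot of one column against another.
from teradataml import DataFrame

df = DataFrame("sensor_readings")                  # placeholder table
plot = df.plot(x=df.reading_ts, y=df.temperature, kind="line")
plot.show()                                        # 'show()' assumed as the display call
```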
  - teradataml DataFrameColumn a.k.a. ColumnExpression
    - Regular Arithmetic Functions
      - `DataFrameColumn.abs()` - Computes the absolute value.
      - `DataFrameColumn.ceil()` - Returns the ceiling value of the column.
      - `DataFrameColumn.ceiling()` - It is an alias for `DataFrameColumn.ceil()` function.
      - `DataFrameColumn.degrees()` - Converts radians value from the column to degrees.
      - `DataFrameColumn.exp()` - Raises e (the base of natural logarithms) to the power of the value in the column, where e = 2.71828182845905.
      - `DataFrameColumn.floor()` - Returns the largest integer equal to or less than the value in the column.
      - `DataFrameColumn.ln()` - Computes the natural logarithm of values in the column.
      - `DataFrameColumn.log10()` - Computes the base 10 logarithm.
      - `DataFrameColumn.mod()` - Returns the modulus of the column.
      - `DataFrameColumn.pmod()` - It is an alias for `DataFrameColumn.mod()` function.
      - `DataFrameColumn.nullifzero()` - Converts data from zero to null to avoid problems with division by zero.
      - `DataFrameColumn.pow()` - Computes the power of the column raised to expression or constant.
      - `DataFrameColumn.power()` - It is an alias for `DataFrameColumn.pow()` function.
      - `DataFrameColumn.radians()` - Converts degree value from the column to radians.
      - `DataFrameColumn.round()` - Returns the rounded off value.
      - `DataFrameColumn.sign()` - Returns the sign.
      - `DataFrameColumn.signum()` - It is an alias for `DataFrameColumn.sign()` function.
      - `DataFrameColumn.sqrt()` - Computes the square root of values in the column.
      - `DataFrameColumn.trunc()` - Provides the truncated value of columns.
      - `DataFrameColumn.width_bucket()` - Returns the number of the partition to which the column is assigned.
      - `DataFrameColumn.zeroifnull()` - Converts data from null to zero to avoid problems with null.
    - Trigonometric Functions
      - `DataFrameColumn.acos()` - Returns the arc-cosine value.
      - `DataFrameColumn.asin()` - Returns the arc-sine value.
      - `DataFrameColumn.atan()` - Returns the arc-tangent value.
      - `DataFrameColumn.atan2()` - Returns the arc-tangent value based on x and y coordinates.
      - `DataFrameColumn.cos()` - Returns the cosine value.
      - `DataFrameColumn.sin()` - Returns the sine value.
      - `DataFrameColumn.tan()` - Returns the tangent value.
    - Hyperbolic Functions
      - `DataFrameColumn.acosh()` - Returns the inverse hyperbolic cosine value.
      - `DataFrameColumn.asinh()` - Returns the inverse hyperbolic sine value.
      - `DataFrameColumn.atanh()` - Returns the inverse hyperbolic tangent value.
      - `DataFrameColumn.cosh()` - Returns the hyperbolic cosine value.
      - `DataFrameColumn.sinh()` - Returns the hyperbolic sine value.
      - `DataFrameColumn.tanh()` - Returns the hyperbolic tangent value.
    - String Functions
      - `DataFrameColumn.ascii()` - Returns the decimal representation of the first character in the column.
      - `DataFrameColumn.char2hexint()` - Returns the hexadecimal representation for a character string in a column.
      - `DataFrameColumn.chr()` - Returns the Latin ASCII character of a given numeric code value in the column.
      - `DataFrameColumn.char()` - It is an alias for `DataFrameColumn.chr()` function.
      - `DataFrameColumn.character_length()` - Returns the number of characters in the column.
      - `DataFrameColumn.char_length()` - It is an alias for `DataFrameColumn.character_length()` function.
      - `DataFrameColumn.edit_distance()` - Returns the minimum number of edit operations required to transform the string in a column into the string specified in the argument.
      - `DataFrameColumn.index()` - Returns the position in a column's string where the string specified in the argument starts.
      - `DataFrameColumn.initcap()` - Modifies a string column and returns the string with the first character of each word in uppercase.
      - `DataFrameColumn.instr()` - Searches the string in a column for occurrences of the search string passed as argument.
      - `DataFrameColumn.lcase()` - Returns a character string identical to string values ...
teradataml 17.20.00.04
- New Features/Functionality
  - teradataml is now compatible with SQLAlchemy 2.0.X
    - Important notes when the user has SQLAlchemy version >= 2.0:
      - Users will not be able to run the `execute()` method on the SQLAlchemy engine object returned by the `get_context()` and `create_context()` teradataml functions, because SQLAlchemy has removed support for the `execute()` method on the engine object. Thus, in user scripts where `get_context().execute()` and `create_context().execute()` are used, Teradata recommends replacing those with either the `execute_sql()` function exposed by teradataml or the `exec_driver_sql()` method on the `Connection` object returned by the `get_connection()` function in teradataml.

        ```python
        from teradataml import execute_sql
        execute_sql("DROP TABLE test_select")

        get_connection().exec_driver_sql("select sessionno from DBC.SessionInfoV where UserName = 'alice';")
        ```
      - Now `get_connection().execute()` accepts only an executable SQLAlchemy object. Refer to `sqlalchemy.engine.base.execute()` for more details.
  - New utility function `execute_sql()` is added to execute the SQL.
  - Extending compatibility for native MAC with ARM processors.
  - Added support for floor division (//) between two teradataml DataFrame Columns.
  - Analytics Database Analytic Functions:
    - `GLMPerSegment()`
    - `GLMPredictPerSegment()`
    - `OneClassSVM()`
    - `OneClassSVMPredict()`
    - `SVM()`
    - `SVMPredict()`
    - `TargetEncodingFit()`
    - `TargetEncodingTransform()`
    - `TrainTestSplit()`
    - `WordEmbeddings()`
    - `XGBoost()`
    - `XGBoostPredict()`
  - teradataml Options
    - Display Options
      - `display.geometry_column_length` - Option to display the default length of the geometry column in GeoDataFrame.
- Updates
  - `set_auth_token()` function can generate the client id automatically based on org_id when the user does not specify it.
  - Analytics Database Analytic Functions:
    - `ColumnTransformer()`
      - Does not allow list values for arguments `onehotencoding_fit_data` and `ordinalencoding_fit_data`.
    - `OrdinalEncodingFit()`
      - New arguments added: `category_data`, `target_column_names`, `categories_column`, `ordinal_values_column`.
      - Allows a list of values for arguments `target_column`, `start_value`, `default_value`.
    - `OneHotEncodingFit()`
      - New arguments added: `category_data`, `approach`, `target_columns`, `categories_column`, `category_counts`.
      - Allows a list of values for arguments `target_column`, `other_column`.
- Bug Fixes
  - `DataFrame.sample()` method output is now deterministic.
  - `copy_to_sql()` now preserves the rows of the table even when the view content is copied to the same table name.
  - `list_user_envs()` does not raise a warning when no user environments are found.
teradataml 17.20.00.03
Teradata Python package for Advanced Analytics.
teradataml makes available to Python users a collection of analytic functions that reside on Teradata Vantage. This allows users to perform analytics on Teradata Vantage with no SQL coding. In addition, the teradataml library provides functions for scaling data manipulation and transformation, data filtering and sub-setting, and can be used in conjunction with other open-source python libraries.
For community support, please visit the Teradata Community.
For Teradata customer support, please visit Teradata Support.
Copyright 2023, Teradata. All Rights Reserved.
Table of Contents
Release Notes:
teradataml 17.20.00.03
- Updates
  - DataFrame.join
    - New arguments `lprefix` and `rprefix` added.
    - Behavior of arguments `lsuffix` and `rsuffix` will be changed in future; use the new arguments instead.
    - New and old affix arguments can now be used independently.
  - Analytic functions can be imported regardless of context creation. The import-after-create-context constraint is now removed.
  - `ReadNOS` and `WriteNOS` now accept dictionary values for the `authorization` and `row_format` arguments.
  - `WriteNOS` supports writing CSV files to external store.
  - Following model cataloging APIs will be deprecated in future:
    - describe_model
    - delete_model
    - list_models
    - publish_model
    - retrieve_model
    - save_model
- Bug Fixes
  - `copy_to_sql()` bug related to NaT value has been fixed.
  - Tooltip on PyCharm IDE now points to SQLE.
  - `value` argument of `FillNa()`, a Vantage Analytic Library function, supports special characters.
  - `case` function accepts DataFrame column as value in `whens` argument.
Release Notes:
teradataml 17.20.00.02
- New Features/Functionality
  - teradataml: Open Analytics
    - New Functions
      - `set_auth_token()` - Sets the JWT token automatically for using Open AF APIs.
  - teradataml Options
    - Display Options
      - `display.suppress_vantage_runtime_warnings` - Suppresses the VantageRuntimeWarning raised by teradataml, when set to True.
- Updates
  - SimpleImputeFit function arguments `stats_columns` and `stats` are made optional.
  - New argument `table_format` is added to ReadNOS().
  - Argument `full_scan` is changed to `scan_pct` in ReadNOS().
- Bug Fixes
  - Minor bug fix related to read_csv.
  - APPLY and `DataFrame.apply()` support hash by and local order by.
  - Output column names are changed for DataFrame.dtypes and DataFrame.tdtypes.
Release Notes:
teradataml 17.20.00.01
- New Features/Functionality
  - teradataml: DataFrame
    - New Functions
      - `DataFrame.pivot()` - Rotate data from rows into columns to create easy-to-read DataFrames.
      - `DataFrame.unpivot()` - Rotate data from columns into rows to create easy-to-read DataFrames.
      - `DataFrame.drop_duplicate()` - Drop duplicate rows from a teradataml DataFrame.
    - New properties
      - `DataFrame.is_art` - Check whether the teradataml DataFrame is created on an Analytic Result Table, i.e., an ART table, or not.
  - teradataml: Unbounded Array Framework (UAF) Functions:
    - New Functions
      - New Functions Supported on Database Versions: 17.20.x.x
        - MODEL PREPARATION AND PARAMETER ESTIMATION functions:
          - `ACF()`
          - `ArimaEstimate()`
          - `ArimaValidate()`
          - `DIFF()`
          - `LinearRegr()`
          - `MultivarRegr()`
          - `PACF()`
          - `PowerTransform()`
          - `SeasonalNormalize()`
          - `Smoothma()`
          - `UNDIFF()`
          - `Unnormalize()`
        - SERIES FORECASTING functions:
          - `ArimaForecast()`
          - `DTW()`
          - `HoltWintersForecaster()`
          - `MAMean()`
          - `SimpleExp()`
        - DATA PREPARATION functions:
          - `BinaryMatrixOp()`
          - `BinarySeriesOp()`
          - `GenseriesFormula()`
          - `MatrixMultiply()`
          - `Resample()`
        - DIAGNOSTIC STATISTICAL TEST functions:
          - `BreuschGodfrey()`
          - `BreuschPaganGodfrey()`
          - `CumulPeriodogram()`
          - `DickeyFuller()`
          - `DurbinWatson()`
          - `FitMetrics()`
          - `GoldfeldQuandt()`
          - `Portman()`
          - `SelectionCriteria()`
          - `SignifPeriodicities()`
          - `SignifResidmean()`
          - `WhitesGeneral()`
        - TEMPORAL AND SPATIAL functions:
          - `Convolve()`
          - `Convolve2()`
          - `DFFT()`
          - `DFFT2()`
          - `DFFT2Conv()`
          - `DFFTConv()`
          - `GenseriesSinusoids()`
          - `IDFFT()`
          - `IDFFT2()`
          - `LineSpec()`
          - `PowerSpec()`
        - GENERAL UTILITY functions:
          - `ExtractResults()`
          - `InputValidator()`
          - `MInfo()`
          - `SInfo()`
          - `TrackingOp()`
    - New Features: Inputs to Unbounded Array Framework (UAF) functions (see the sketch after this list)
      - `TDAnalyticResult()` - Allows to prepare function output generated by UAF functions to be passed.
      - `TDGenSeries()` - Allows to generate a series, that can be passed to a UAF function.
      - `TDMatrix()` - Represents a Matrix in time series, that can be created from a teradataml DataFrame.
      - `TDSeries()` - Represents a Series in time series, that can be created from a teradataml DataFrame.
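A minimal sketch of wrapping a DataFrame as a `TDSeries` and passing it to a UAF function; the table, column names and the `max_lags` argument of `ACF()` are assumptions for illustration.

```python
# Sketch only: build a TDSeries over a hypothetical time series table and run ACF on it.
from teradataml import DataFrame, TDSeries, ACF

df = DataFrame("ocean_buoy_readings")                 # placeholder time series table
series = TDSeries(data=df,
                  id="buoy_id",                       # series identifier column
                  row_index="reading_ts",             # ordering column
                  row_index_style="TIMECODE",
                  payload_field="wave_height",
                  payload_content="REAL")
acf_out = ACF(data=series, max_lags=12)               # 'max_lags' assumed
print(acf_out.result)                                 # result DataFrame of the UAF function
```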
- Updates
  - Native Object Store (NOS) functions support authorization by specifying an authorization object.
  - `display_analytic_functions()` categorizes the analytic functions based on function type.
  - ColumnTransformer accepts multiple values for arguments nonlinearcombine_fit_data, onehotencoding_fit_data, ordinalencoding_fit_data.
- Bug Fixes
  - Redundant warnings thrown by teradataml are suppressed.
  - OpenAF is supported when the context is created with a JWT Token.
  - New argument "match_column_order" added to copy_to_sql, which allows DataFrame loading with any column order.
  - copy_to_sql updated to map data type timezone(tzinfo) to TIMESTAMP(timezone=True), instead of VARCHAR.
  - Improved performance for DataFrame.sum and DataFrameColumn.sum functions.
Release Notes:
teradataml 17.20.00.00
- New Features/Functionality
  - teradataml: Analytics Database Analytic Functions
    - New Functions
      - New Functions Supported on Database Versions: 17.20.x.x
        - `ANOVA()`
        - `ClassificationEvaluator()`
        - `ColumnTransformer()`
        - `DecisionForest()`
        - `GLM()`
        - `GetFutileColumns()`
        - `KMeans()`
        - `KMeansPredict()`
        - `NaiveBayesTextClassifierTrainer()`
        - `NonLinearCombineFit()`
        - `NonLinearCombineTransform()`
        - `OrdinalEncodingFit()`
        - `OrdinalEncodingTransform()`
        - `RandomProjectionComponents()`
        - `RandomProjectionFit()`
        - `RandomProjectionTransform()`
        - `RegressionEvaluator()`
        - `ROC()`
        - `SentimentExtractor()`
        - `Silhouette()`
        - `TDGLMPredict()`
        - `TextParser()`
        - `VectorDistance()`
    - Updates
      - `display_analytic_functions()` categorizes the analytic functions based on function type.
      - Users can provide range values for the columns argument.
  - teradataml: Open Analytics
    - Manage all user environments (see the sketch after this list).
      - `list_base_envs()` - List the available Python base versions.
      - `create_env()` - Create a new user environment.
      - `get_env()` - Get an existing user environment.
      - `list_user_envs()` - List the available user environments.
      - `remove_env()` - Delete a user environment.
      - `remove_all_envs()` - Delete all the user environments.
    - UserEnv Class - Manage individual user environment.
      - Properties
        - `files` - Get files in user environment.
        - `libs` - Get libraries in user environment.
      - Methods
        - `install_file()` - Install a file in user environment.
        - `remove_file()` - Remove a file in user environment.
        - `install_lib()` - Install a library in user environment.
        - `update_lib()` - Update a library in user environment.
        - `uninstall_lib()` - Uninstall a library in user environment.
        - `status()` - Check the status of:
          - file installation
          - library installation
          - library update
          - library uninstallation
        - `refresh()` - Refresh the environment details in the local client.
    - Apply Class - Execute a user script on VantageCloud Lake.
      - `__init__()` - Instantiate an object of Apply for script execution.
      - `install_file()` - Install a file in user environment.
      - `remove_file()` - Remove a file in user environment.
      - `set_data()` - Reset data and related arguments.
      - `execute_script()` - Executes Python script.
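A minimal sketch of the environment management APIs listed above; the environment name, base version label, library list and file name are placeholders, and the argument names are assumptions.

```python
# Sketch only: create a user environment, then install a library and a script into it.
from teradataml import create_env, list_base_envs

print(list_base_envs())                         # available Python base versions
env = create_env(env_name="demo_env",
                 base_env="python_3.8",         # assumed base version label
                 desc="Scoring environment")
env.install_lib(["numpy", "scikit-learn"])      # list-of-libraries format assumed
env.install_file("score.py")                    # placeholder user script
print(env.libs)                                 # libraries currently in the environment
```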
  - teradataml: DataFrame
    - New Functions
      - `DataFrame.apply()` - Execute a user defined Python function on VantageCloud Lake.
  - teradataml: Bring Your Own Model
    - New Functions
      - `ONNXPredict()` - Score using a model trained externally on ONNX and stored in Vantage.
  - teradataml: Options
    - New Functions
      - `set_config_params()` - New API to set all config params in one go.
    - New Configuration Options
      - For Open Analytics support.
        - ues_url - User Environment Service URL for ...