Fix compatibility issues with Featuretools (#41)

* tests pass
* upgraded sphinx
* trying to fix sphinx error
* added test
* removed comment
* removed ww.init()
* updated release notes
* fixed release note formatting
* fixed formatting
* changed entities to dataframes
* renamed normalize_entity to normalize_entityset
* fixed test
* added breaking change note to release notes
alteryx · Mar 9, 2022 · f05cb9d
1 parent d02756f
commit f05cb9d
Showing 6 changed files with 96 additions and 51 deletions.
diff --git a/README.md b/README.md
@@ -8,9 +8,9 @@ AutoNormalize is a Python library for automated datatable normalization. It allo
 
 ## Getting Started
 
-* [Install](#install)  
-* [Demos](#demos)  
-* [API Reference](#api-reference)  
+- [Install](#install)
+- [Demos](#demos)
+- [API Reference](#api-reference)
 
 ## Install
 
@@ -26,11 +26,11 @@ pip uninstall autonormalize
 
 ## Demos
 
-* [Blog Post](https://blog.featurelabs.com/automatic-dataset-normalization-for-feature-engineering-in-python/)
-* [Machine Learning Demo with Featuretools](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/AutoNormalize%20%2B%20FeatureTools%20Demo.ipynb)
-* [Kaggle Liquor Sales Dataset Demo](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Kaggle%20Liquor%20Sales%20Dataset%20Demo.ipynb)
-* [Demo with Editing Dependencies](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Editing%20Dependnecies%20Demo.ipynb)
-* [Kaggle Food Production Dataset Demo](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Kaggle%20Food%20%20Dataset%20Demo.ipynb)
+- [Blog Post](https://blog.featurelabs.com/automatic-dataset-normalization-for-feature-engineering-in-python/)
+- [Machine Learning Demo with Featuretools](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/AutoNormalize%20%2B%20FeatureTools%20Demo.ipynb)
+- [Kaggle Liquor Sales Dataset Demo](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Kaggle%20Liquor%20Sales%20Dataset%20Demo.ipynb)
+- [Demo with Editing Dependencies](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Editing%20Dependnecies%20Demo.ipynb)
+- [Kaggle Food Production Dataset Demo](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Kaggle%20Food%20%20Dataset%20Demo.ipynb)
 
 ## API Reference
 
@@ -44,19 +44,19 @@ Creates a normalized entityset from a dataframe.
 
 **Arguments:**
 
-* `df` (pd.Dataframe) : the dataframe containing data
+- `df` (pd.Dataframe) : the dataframe containing data
 
-* `accuracy` (0 < float <= 1.00; default = 0.98) : the accuracy threshold required in order to conclude a dependency (i.e. with accuracy = 0.98, 0.98 of the rows must hold true the dependency LHS --> RHS)
+- `accuracy` (0 < float <= 1.00; default = 0.98) : the accuracy threshold required in order to conclude a dependency (i.e. with accuracy = 0.98, 0.98 of the rows must hold true the dependency LHS --> RHS)
 
-* `index` (str, optional) : name of column that is intended index of df
+- `index` (str, optional) : name of column that is intended index of df
 
-* `name` (str, optional) : the name of created EntitySet
+- `name` (str, optional) : the name of created EntitySet
 
-* `time_index` (str, optional) : name of time column in the dataframe.
+- `time_index` (str, optional) : name of time column in the dataframe.
 
 **Returns:**
 
-* `entityset` (ft.EntitySet) : created entity set
+- `entityset` (ft.EntitySet) : created entity set
 
 ### `find_dependencies`
 
@@ -68,7 +68,7 @@ Finds dependencies within dataframe with the DFD search algorithm.
 
 **Returns:**
 
-* `dependencies` (Dependencies) : the dependencies found in the data within the contraints provided
+- `dependencies` (Dependencies) : the dependencies found in the data within the contraints provided
 
 ### `normalize_dataframe`
 
@@ -78,13 +78,13 @@ normalize_dataframe(df, dependencies)
 
 Normalizes dataframe based on the dependencies given. Keys for the newly created DataFrames can only be columns that are strings, ints, or categories. Keys are chosen according to the priority:
 
-1) shortest lenghts
-2) has "id" in some form in the name of an attribute
-3) has attribute furthest to left in the table
+1. shortest lenghts
+2. has "id" in some form in the name of an attribute
+3. has attribute furthest to left in the table
 
 **Returns:**
 
-* `new_dfs` (list[pd.DataFrame]) : list of new dataframes
+- `new_dfs` (list[pd.DataFrame]) : list of new dataframes
 
 <br />
 
@@ -98,25 +98,25 @@ Creates a normalized EntitySet from dataframe based on the dependencies given. K
 
 **Returns:**
 
-* `entityset` (ft.EntitySet) : created EntitySet
+- `entityset` (ft.EntitySet) : created EntitySet
 
 <br />
 
-### `normalize_entity`
+### `normalize_entityset`
 
 ```shell
-normalize_entity(es, accuracy=0.98)
+normalize_entityset(es, accuracy=0.98)
 ```
 
 Returns a new normalized `EntitySet` from an `EntitySet` with a single entity.
 
 **Arguments:**
 
-* `es` (ft.EntitySet) : EntitySet with a single entity to normalize
+- `es` (ft.EntitySet) : EntitySet with a single entity to normalize
 
 **Returns:**
 
-* `new_es` (ft.EntitySet) : new normalized EntitySet
+- `new_es` (ft.EntitySet) : new normalized EntitySet
 
 <br />
 

diff --git a/autonormalize/autonormalize.py b/autonormalize/autonormalize.py
@@ -85,24 +85,31 @@ def make_entityset(df, dependencies, name=None, time_index=None):
     normalize.normalize_dataframe(depdf)
     normalize.make_indexes(depdf)
 
-    entities = {}
+    dataframes = {}
     relationships = []
 
     stack = [depdf]
 
     while stack != []:
         current = stack.pop()
+        if (current.df.ww.schema is None):
+            current.df.ww.init(index=current.index[0], name=current.index[0])
+
+        current_df_name = current.df.ww.name
         if time_index in current.df.columns:
-            entities[current.index[0]] = (current.df, current.index[0], time_index)
+            dataframes[current_df_name] = (current.df, current.index[0], time_index)
         else:
-            entities[current.index[0]] = (current.df, current.index[0])
+            dataframes[current_df_name] = (current.df, current.index[0])
         for child in current.children:
+            if (child.df.ww.schema is None):
+                child.df.ww.init(index=child.index[0], name=child.index[0])
+            child_df_name = child.df.ww.name
             # add to stack
             # add relationship
             stack.append(child)
-            relationships.append((child.index[0], child.index[0], current.index[0], child.index[0]))
+            relationships.append((child_df_name, child.index[0], current_df_name, child.index[0]))
 
-    return ft.EntitySet(name, entities, relationships)
+    return ft.EntitySet(name, dataframes, relationships)
 
 
 def auto_entityset(df, accuracy=0.98, index=None, name=None, time_index=None):
@@ -141,9 +148,9 @@ def auto_normalize(df):
     return normalize_dataframe(df, find_dependencies(df))
 
 
-def normalize_entity(es, accuracy=0.98):
+def normalize_entityset(es, accuracy=0.98):
     """
-    Returns a new normalized EntitySet from an EntitySet with a single entity.
+    Returns a new normalized EntitySet from an EntitySet with a single dataframe.
 
     Arguments:
         es (ft.EntitySet) : EntitySet to normalize
@@ -152,13 +159,14 @@ def normalize_entity(es, accuracy=0.98):
     Returns:
         new_es (ft.EntitySet) : new normalized EntitySet
     """
-    # TO DO: add option to pass an EntitySet with more than one entity, and specify which one
+    # TO DO: add option to pass an EntitySet with more than one dataframe, and specify which one
     # to normalize while preserving existing relationships
 
-    if len(es.entities) > 1:
-        raise ValueError('There is more than one entity in this EntitySet')
-    if len(es.entities) == 0:
+    if len(es.dataframes) > 1:
+        raise ValueError('There is more than one dataframe in this EntitySet')
+    if len(es.dataframes) == 0:
         raise ValueError('This EntitySet is empty')
-    entity = es.entities[0]
-    new_es = auto_entityset(entity.df, accuracy, index=entity.index, name=es.id, time_index=entity.time_index)
+
+    df = es.dataframes[0]
+    new_es = auto_entityset(df, accuracy, index=df.ww.index, name=es.id, time_index=df.ww.time_index)
     return new_es
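
Review note: the `make_entityset` change above keys both the `dataframes` dict and the relationship tuples on the Woodwork dataframe name instead of on `index[0]`. The sketch below replays that stack traversal with plain Python objects so the resulting tuples are easy to inspect. `Node` and `collect` are hypothetical stand-ins introduced for illustration only; they are not part of autonormalize, and no Featuretools or Woodwork calls are made.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for one node of the dependency tree."""
    name: str                      # plays the role of df.ww.name
    index: str                     # plays the role of index[0]
    columns: List[str]             # plays the role of df.columns
    children: List["Node"] = field(default_factory=list)


def collect(root: Node, time_index: Optional[str] = None) -> Tuple[Dict[str, tuple], List[tuple]]:
    """Walk the tree depth-first, mirroring the loop in the diff."""
    dataframes: Dict[str, tuple] = {}
    relationships: List[tuple] = []
    stack = [root]
    while stack:
        current = stack.pop()
        # Attach the time index only to nodes that actually contain that column.
        if time_index in current.columns:
            dataframes[current.name] = (current.columns, current.index, time_index)
        else:
            dataframes[current.name] = (current.columns, current.index)
        for child in current.children:
            stack.append(child)
            # Same tuple order as the diff: child name, key, current name, key.
            relationships.append((child.name, child.index, current.name, child.index))
    return dataframes, relationships


customer = Node("customer_id", "customer_id", ["customer_id", "zip_code"])
session = Node("session_id", "session_id", ["session_id", "customer_id"], [customer])
product = Node("product_id", "product_id", ["product_id", "brand"])
root = Node(
    "transaction_id",
    "transaction_id",
    ["transaction_id", "session_id", "product_id", "transaction_time"],
    [session, product],
)

dataframes, relationships = collect(root, time_index="transaction_time")
```

In this toy tree the name and the index column coincide, as they do when `ww.init` is called with `name=current.index[0]`; the point of the change is that a dataframe initialised elsewhere with a different Woodwork name would still be registered under that name.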
diff --git a/autonormalize/tests/test_example.py b/autonormalize/tests/test_example.py
@@ -1,5 +1,8 @@
 import featuretools as ft
+import pandas as pd
+from unittest.mock import patch
 
+import pytest
 import autonormalize as an
 
 
@@ -21,3 +24,30 @@ def test_ft_mock_customer():
     assert set([str(rel) for rel in entityset.relationships]) == set(['<Relationship: transaction_id.session_id -> session_id.session_id>',
                                                                       '<Relationship: transaction_id.product_id -> product_id.product_id>',
                                                                       '<Relationship: session_id.customer_id -> customer_id.customer_id>'])
+
+
+@patch("autonormalize.autonormalize.auto_entityset")
+def test_normalize_entityset(auto_entityset):
+    df1 = pd.DataFrame({"test": [0, 1, 2]})
+    df2 = pd.DataFrame({"test": [0, 1, 2]})
+    accuracy = 0.98
+
+    es = ft.EntitySet()
+
+    error = "This EntitySet is empty"
+    with pytest.raises(ValueError, match=error):
+        an.normalize_entityset(es, accuracy)
+
+    es.add_dataframe(df1, "df")
+
+    df_out = es.dataframes[0]
+
+    an.normalize_entityset(es, accuracy)
+
+    auto_entityset.assert_called_with(df_out, accuracy, index=df_out.ww.index, name=es.id, time_index=df_out.ww.time_index)
+
+    es.add_dataframe(df2, "df2")
+
+    error = "There is more than one dataframe in this EntitySet"
+    with pytest.raises(ValueError, match=error):
+        an.normalize_entityset(es, accuracy)
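
Review note: the new test checks two things, the guard clauses and the delegation to `auto_entityset`. A minimal standalone sketch of the same pattern follows, using only `unittest.mock` from the standard library. `normalize_single` is a hypothetical stand-in that takes the delegate as a parameter rather than patching a module attribute, so it runs without Featuretools installed.

```python
from unittest.mock import MagicMock


def normalize_single(frames, delegate, accuracy=0.98):
    """Hypothetical stand-in: validate the input, then delegate to a collaborator."""
    if len(frames) > 1:
        raise ValueError("There is more than one dataframe in this EntitySet")
    if len(frames) == 0:
        raise ValueError("This EntitySet is empty")
    return delegate(frames[0], accuracy)


# The mock records how it was called, which is what assert_called_with inspects.
delegate = MagicMock(return_value="normalized")
result = normalize_single(["only_frame"], delegate, 0.98)
delegate.assert_called_with("only_frame", 0.98)


def raises_value_error(frames):
    """Return the error message if the guard fires, else None."""
    try:
        normalize_single(frames, delegate)
    except ValueError as exc:
        return str(exc)
    return None
```

Because the guards raise before the delegate is reached, the mock's call count stays at one after the empty and multi-dataframe cases.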
diff --git a/dev-requirements.txt b/dev-requirements.txt
@@ -3,9 +3,9 @@ codecov==2.1.8
 flake8==3.7.8
 autopep8==1.4.4
 isort==4.3.21
-nbsphinx==0.8.5
-pydata-sphinx-theme==0.4.0
-Sphinx==3.2.1
+nbsphinx==0.8.7
+pydata-sphinx-theme==0.7.1
+Sphinx==4.2.0
 nbconvert==6.0.2
 ipython==7.16.3
 pygments==2.8.1

diff --git a/docs/source/api_reference.rst b/docs/source/api_reference.rst
@@ -16,7 +16,7 @@ Autonormalize
    make_entityset
    auto_entityset
    auto_normalize
-   normalize_entity
+   normalize_entityset
 
 Dependencies
 ======================

diff --git a/docs/source/release_notes.rst b/docs/source/release_notes.rst
@@ -3,34 +3,41 @@
 Release Notes
 -------------
 
-.. Future Release
-  ==============
+Future Release
+==============
     * Enhancements
     * Fixes
+        * Fix compatibility issues with featuretools (:pr:`41`)
     * Changes
+        * Rename ``normalize_entity`` to ``normalize_entityset`` (:pr:`41`)
     * Documentation Changes
     * Testing Changes
 
-.. Thanks to the following people for contributing to this release:
+    Thanks to the following people for contributing to this release:
+    :user:`dvreed77`
+
+Breaking Changes
+++++++++++++++++
+    * :pr:`41`: The function ``normalize_entity`` has been renamed to ``normalize_entityset``.
 
 v1.0.1 Jan 7, 2022
 ==================
     * Documentation Changes
-      * Update release notes and release format (:pr:`37`)
-      * Updated sphinx documentation and guides (:pr:`35`)
+        * Update release notes and release format (:pr:`37`)
+        * Updated sphinx documentation and guides (:pr:`35`)
     * Testing Changes
-      * Updated tests to work with featuretools 1.0 (:pr:`35`)
+        * Updated tests to work with featuretools 1.0 (:pr:`35`)
 
-  Thanks to the following people for contributing to this release:
-  :user:`gsheni`, :user:`tuethan1999`
+    Thanks to the following people for contributing to this release:
+    :user:`gsheni`, :user:`tuethan1999`
 
 
 v1.0.0 Aug 15, 2019
 ===================
     * Initial Release
 
-  Thanks to the following people for contributing to this release:
-  :user:`allisonportis`
+    Thanks to the following people for contributing to this release:
+    :user:`allisonportis`
 
 .. command
 .. git log --pretty=oneline --abbrev-commit
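
Migration note for the breaking change recorded above: callers that must run against releases both before and after the rename could resolve the function by name at import time. This is only a sketch; `resolve_normalizer` is a hypothetical helper, and the two `SimpleNamespace` objects merely imitate a module exposing the old or the new name.

```python
from types import SimpleNamespace


def resolve_normalizer(module):
    """Prefer normalize_entityset; fall back to the pre-rename normalize_entity."""
    fn = getattr(module, "normalize_entityset", None)
    if fn is None:
        # Raises AttributeError if neither name exists, which is the desired failure.
        fn = getattr(module, "normalize_entity")
    return fn


# Stand-ins for an old and a new autonormalize module (assumption, for illustration).
old_api = SimpleNamespace(normalize_entity=lambda es, accuracy=0.98: ("old", accuracy))
new_api = SimpleNamespace(normalize_entityset=lambda es, accuracy=0.98: ("new", accuracy))

old_result = resolve_normalizer(old_api)(None)
new_result = resolve_normalizer(new_api)(None, accuracy=0.9)
```

In real code the argument would be the imported `autonormalize` module; pinning the dependency to a single release avoids the need for the shim altogether.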