A repository to create higher degree features from a column of numeric data.

Eric-Conn/CreatePolynomialFeatures

Polynomial Features Project

For this project we will create an algorithm that takes in a pandas dataframe, creates polynomial (higher-degree) features from a numeric column, and returns a new dataframe with the polynomial features.

Math Background

Given a numerical feature, we may want to create higher-degree features. So given a value $x$ and a degree $p$, we want an algorithm that returns the tuple $(x, x^2, x^3, \dots, x^p)$.
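
As a minimal sketch of this mapping for a single value (the helper name `powers_up_to` is hypothetical and is not part of the repository), the tuple can be built with a plain Python comprehension:

```python
def powers_up_to(x, p):
    """Return (x, x**2, ..., x**p) as a tuple (hypothetical helper)."""
    # One entry per degree from 1 up to and including p.
    return tuple(x**k for k in range(1, p + 1))


print(powers_up_to(2, 4))  # (2, 4, 8, 16)
```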

Now, given a dataframe $df1$, a polynomial degree $p$, and a numerical column that we wish to transform, we want an algorithm that returns a dataframe $df2$ containing all of the original features from $df1$ together with the new higher-degree features, up to and including degree $p$. The algorithm also keeps the column to be transformed and its transformed columns adjacent to each other in the resulting dataframe $df2$; in this implementation they are placed, as a group, after the remaining columns.
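
The contract described above can be illustrated on a toy dataframe. This is only a compact sketch under stated assumptions: the function name `add_powers` and the toy columns `a` and `b` are hypothetical, and the sketch is separate from the repository's own implementation in the Code section.

```python
import pandas as pd


def add_powers(df, column_name, p):
    """Hypothetical sketch: keep all columns, append powers 2..p of one column."""
    base = df[column_name]
    # The original column comes first so it stays adjacent to its powers.
    powers = {column_name: base}
    for k in range(2, p + 1):
        powers[f"{column_name}^{k}"] = base**k
    others = df.drop(columns=[column_name])
    return pd.concat([others, pd.DataFrame(powers, index=df.index)], axis=1)


toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": ["x", "y", "z"]})
result = add_powers(toy, "a", 3)
print(list(result.columns))  # ['b', 'a', 'a^2', 'a^3']
```

The transformed column and its powers end up grouped together after the untouched columns, which matches the column order shown in the Example section.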

Code

import numpy as np
import pandas as pd
# We want to construct a function that takes in a dataframe
# and returns a new one with the specified polynomial features.

# (df, column_name, highest degree we want) |---> new df with polynomial features




def create_poly_from_df(df, column_name, p):
    # Degree 1 adds no new features, so return the input unchanged.
    if p == 1:
        return df

    # Isolate the chosen feature. A Series is treated as the feature itself.
    if isinstance(df, pd.Series):
        data_col = df.values.copy()
    else:
        data_col = df[column_name].values.copy()

    features = []
    new_feature_names = []

    # Create each polynomial feature along with its column name.
    for i in range(1, p + 1):
        features.append(data_col**i)
        if i == 1:
            new_feature_names.append(column_name)
        else:
            new_feature_names.append(column_name + f'^{i}')

    # Our dataframe of polynomial features, indexed like the input.
    poly_df = pd.DataFrame(np.array(features).T, columns=new_feature_names, index=df.index)

    # For a Series there are no other columns to keep.
    if isinstance(df, pd.Series):
        return poly_df

    # Drop the original feature because it is included in poly_df,
    # then concatenate the remaining columns with the polynomial features.
    return pd.concat([df.drop(column_name, axis=1), poly_df], axis=1)

Example

We can apply our function to the sepal_length feature of the iris dataset.

iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
iris.head()
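
|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
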
create_poly_from_df(iris,'sepal_length',3).head()
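
|   | sepal_width | petal_length | petal_width | species | sepal_length | sepal_length^2 | sepal_length^3 |
|---|---|---|---|---|---|---|---|
| 0 | 3.5 | 1.4 | 0.2 | setosa | 5.1 | 26.01 | 132.651 |
| 1 | 3.0 | 1.4 | 0.2 | setosa | 4.9 | 24.01 | 117.649 |
| 2 | 3.2 | 1.3 | 0.2 | setosa | 4.7 | 22.09 | 103.823 |
| 3 | 3.1 | 1.5 | 0.2 | setosa | 4.6 | 21.16 | 97.336 |
| 4 | 3.6 | 1.4 | 0.2 | setosa | 5.0 | 25.00 | 125.000 |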

We see that our new dataframe has sepal_length, sepal_length squared, and sepal_length cubed, grouped together after the other columns.
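
As a quick arithmetic check, the squared and cubed values in the output can be reproduced directly with NumPy. The sketch below copies the five displayed rows by hand (so it assumes the displayed values are exact to the digits shown) and does not call the repository's function:

```python
import numpy as np

# First five sepal_length values and the outputs shown in the table above.
sepal_length = np.array([5.1, 4.9, 4.7, 4.6, 5.0])
squared = np.array([26.01, 24.01, 22.09, 21.16, 25.00])
cubed = np.array([132.651, 117.649, 103.823, 97.336, 125.000])

# allclose tolerates floating-point rounding in the powers.
assert np.allclose(sepal_length**2, squared)
assert np.allclose(sepal_length**3, cubed)
```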

We can also create higher-degree features, here up to degree 6, from a dataframe that contains only the sepal_length column.

create_poly_from_df(iris[['sepal_length']],'sepal_length',6).head()

|   | sepal_length | sepal_length^2 | sepal_length^3 | sepal_length^4 | sepal_length^5 | sepal_length^6 |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 26.01 | 132.651 | 676.5201 | 3450.25251 | 17596.287801 |
| 1 | 4.9 | 24.01 | 117.649 | 576.4801 | 2824.75249 | 13841.287201 |
| 2 | 4.7 | 22.09 | 103.823 | 487.9681 | 2293.45007 | 10779.215329 |
| 3 | 4.6 | 21.16 | 97.336 | 447.7456 | 2059.62976 | 9474.296896 |
| 4 | 5.0 | 25.00 | 125.000 | 625.0000 | 3125.00000 | 15625.000000 |
