
Polynomial Features Project

For this project we will create an algorithm that takes in a pandas dataframe, creates polynomial (higher-degree) features from one of its numeric columns, and returns a new dataframe containing those polynomial features.

Math Background

Given a numerical feature, we may want to create higher-degree features from it. That is, given a value $x$ and a degree $p$, we want an algorithm that returns the tuple $(x, x^2, x^3, \dots, x^p)$.
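As a quick illustration, here is a minimal pure-Python sketch of the map $x \mapsto (x, x^2, \dots, x^p)$ for a single value. The helper name `powers` is hypothetical and is not part of the project code.

```python
def powers(x, p):
    """Return the tuple (x, x**2, ..., x**p) for a positive integer degree p."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    # Exponents run from 1 up to and including p.
    return tuple(x ** k for k in range(1, p + 1))


print(powers(2, 3))  # (2, 4, 8)
```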

Now, given a dataframe $df1$, a polynomial degree $p$, and a numerical column that we wish to transform, we want an algorithm that returns a dataframe $df2$. This dataframe contains all of the original features of $df1$ together with the new higher-degree features, up to and including degree $p$. The algorithm also keeps the column being transformed and its higher-degree versions adjacent to each other in $df2$: for $p > 1$, the original column is moved to the end of the dataframe and immediately followed by its powers.
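The layout of $df2$ can be made concrete by listing only its column names. The sketch below uses a hypothetical helper, `poly_column_order`, and assumes the `name^k` naming scheme used in the code. It places the untouched columns first and the polynomial block at the end. For $p = 1$ the function returns the input unchanged, so the original order is kept.

```python
def poly_column_order(columns, column_name, p):
    """Column order of df2: untouched columns first, then the polynomial block."""
    if p == 1:
        # Degree 1 adds nothing, so the columns are left as they are.
        return list(columns)
    others = [c for c in columns if c != column_name]
    block = [column_name] + [f"{column_name}^{k}" for k in range(2, p + 1)]
    return others + block


cols = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
print(poly_column_order(cols, "sepal_length", 3))
# ['sepal_width', 'petal_length', 'petal_width', 'species',
#  'sepal_length', 'sepal_length^2', 'sepal_length^3']
```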

Code

import numpy as np
import pandas as pd

# We want to construct a function that takes in a dataframe
# and returns a new one with the specified polynomial features.

# (df, column_name, highest degree we want) |---> new df with poly features


def create_poly_from_df(df, column_name, p):
    if p < 1:
        raise ValueError("p must be a positive integer")

    # Treat a Series as a one-column dataframe, so the result is just
    # the block of polynomial features.
    if isinstance(df, pd.Series):
        df = df.to_frame(name=column_name)

    # Work on a copy so the caller's dataframe is never modified.
    df_copy = df.copy()

    # Degree 1 is the original feature, so there is nothing to add.
    if p == 1:
        return df_copy

    # Isolate the chosen feature.
    data_col = df_copy[column_name].values

    # Now run a loop that creates each polynomial feature and its name.
    features = []           # x, x^2, ..., x^p
    new_feature_names = []  # column_name, column_name^2, ..., column_name^p

    for i in range(1, p + 1):
        features.append(data_col ** i)  # the degree-i feature
        new_feature_names.append(column_name if i == 1 else f"{column_name}^{i}")

    # Dataframe of polynomial features, aligned to the index of the input.
    poly_df = pd.DataFrame(np.array(features).T,
                           columns=new_feature_names,
                           index=df_copy.index)

    # Drop the original feature, since it reappears as the degree-1
    # column of poly_df, then concatenate the polynomial block on the right.
    df_copy = df_copy.drop(columns=column_name)
    return pd.concat([df_copy, poly_df], axis=1)
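The loop above builds one power at a time. A possible alternative computes all the powers at once with NumPy broadcasting and produces the same column layout: the other columns first, then `column_name`, `column_name^2`, and so on. It is sketched here as a hypothetical `create_poly_vectorized`, not the project's own function. Unlike `create_poly_from_df`, it does not special-case `p == 1` or Series input.

```python
import numpy as np
import pandas as pd


def create_poly_vectorized(df, column_name, p):
    """Hypothetical vectorized variant: all powers computed in one NumPy expression."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    x = df[column_name].to_numpy()
    # Broadcasting: an (n, 1) column raised to a (p,) row of exponents
    # gives an (n, p) matrix whose k-th column is x**(k+1).
    degrees = np.arange(1, p + 1)
    powers = x[:, np.newaxis] ** degrees
    names = [column_name] + [f"{column_name}^{k}" for k in range(2, p + 1)]
    poly_df = pd.DataFrame(powers, columns=names, index=df.index)
    # Other columns first, then the adjacent polynomial block.
    return pd.concat([df.drop(columns=column_name), poly_df], axis=1)


small = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": ["x", "y", "z"]})
print(create_poly_vectorized(small, "a", 3))
```

The output has columns `b`, `a`, `a^2`, `a^3`. The trade-off is mainly readability versus avoiding a Python-level loop; for small `p` the difference is unlikely to matter.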

Example

We can apply our function to the sepal_length feature of the iris dataset.

iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

iris.head()

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	setosa
1	4.9	3.0	1.4	0.2	setosa
2	4.7	3.2	1.3	0.2	setosa
3	4.6	3.1	1.5	0.2	setosa
4	5.0	3.6	1.4	0.2	setosa

create_poly_from_df(iris,'sepal_length',3).head()

	sepal_width	petal_length	petal_width	species	sepal_length	sepal_length^2	sepal_length^3
0	3.5	1.4	0.2	setosa	5.1	26.01	132.651
1	3.0	1.4	0.2	setosa	4.9	24.01	117.649
2	3.2	1.3	0.2	setosa	4.7	22.09	103.823
3	3.1	1.5	0.2	setosa	4.6	21.16	97.336
4	3.6	1.4	0.2	setosa	5.0	25.00	125.000

The new dataframe has sepal_length, sepal_length squared, and sepal_length cubed as adjacent columns on the right, while the other features are unchanged.
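The new columns can be checked by hand. This small sketch recomputes the squared and cubed values for the first five rows. It uses only the sepal_length values printed by `iris.head()` above, so it does not need to download the dataset.

```python
import pandas as pd

# First five sepal_length values, copied from the iris.head() output above.
sepal_length = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0], name="sepal_length")

squared = sepal_length ** 2
cubed = sepal_length ** 3

# Rounding hides tiny floating-point error such as 26.009999999999998.
print(squared.round(3).tolist())  # [26.01, 24.01, 22.09, 21.16, 25.0]
print(cubed.round(3).tolist())    # [132.651, 117.649, 103.823, 97.336, 125.0]
```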

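We can also create higher-degree features. Here we pass a dataframe containing only the sepal_length column, so the result contains only the polynomial features.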

create_poly_from_df(iris[['sepal_length']],'sepal_length',6).head()

	sepal_length	sepal_length^2	sepal_length^3	sepal_length^4	sepal_length^5	sepal_length^6
0	5.1	26.01	132.651	676.5201	3450.25251	17596.287801
1	4.9	24.01	117.649	576.4801	2824.75249	13841.287201
2	4.7	22.09	103.823	487.9681	2293.45007	10779.215329
3	4.6	21.16	97.336	447.7456	2059.62976	9474.296896
4	5.0	25.00	125.000	625.0000	3125.00000	15625.000000
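The sixth powers can be checked against the cubes in the same row, since $x^6 = (x^3)^2$. For the first row:

$$5.1^6 = (5.1^3)^2 = 132.651^2 = 17596.287801$$

This matches the sepal_length^6 entry. It also shows how quickly the magnitudes grow with the degree: in this row the degree-6 column is $5.1^5 \approx 3450$ times larger than the degree-1 column.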

Eric-Conn/CreatePolynomialFeatures

Folders and files

Latest commit

History

Repository files navigation

Polynomial Features Project

Math Background

Code

Example

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages