machinelearning_tutorial
The BuildSim Learn project is one of many exciting projects that we have worked on. The goal of this project is to bridge machine learning and data mining techniques with modeling tasks, whether it is model calibration or design decision support.
In this tutorial, I will walk through some basics of machine learning as well as how to use machine learning and optimization to support super-fast design decision making.
A prerequisite of this tutorial is the installation of the BuildSimHubAPI Python library, the SciPy library, the Pandas library, and the Scikit-learn package.
The full example can be found in this link
Data is THE key to any study. If we want to surprise the client with super-fast responses, we need a high-quality dataset to train our machine learning models.
BuildSim Cloud offers three different ways to generate your unique simulation datasets.
The brute-force algorithm is straightforward: it tries out all the possible design combinations. For instance, the image below (credit to Professor Karaguzel from CMU) has six parameters. The climate parameter has 14 different values, the glazing type parameter has 11 different values, and so on. In total, the number of possible combinations in this study is: 14 x 11 x 2 x 3 x 4 x 2 = 7392.
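As a rough illustration of where that count comes from, the sketch below enumerates every combination with `itertools.product`. Only the six level counts come from the example above; the parameters are represented by placeholder indices rather than real design values.

```python
from itertools import product
from math import prod

# Number of candidate values per parameter, taken from the example above.
levels = [14, 11, 2, 3, 4, 2]

# The size of the design space is the product of the level counts.
total = prod(levels)

# Enumerate every combination; each entry is a tuple of level indices.
combinations = list(product(*(range(n) for n in levels)))

print(total)              # 7392
print(len(combinations))  # 7392
```

Adding one more parameter with, say, 5 values would multiply the total by 5, which is why the number of simulations grows so quickly.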
- Pros:
- Exhaustive search: it is the only algorithm that covers the entire solution space.
- Simple and easy to set up.
- Cons:
- The number of simulations grows quickly, often too quickly to be practical.
- It cannot support continuous values, for instance, LPD in the range of 6.0 – 12.0 W/m2.
- Example: link
import BuildSimHubAPI as bsh_api
# 1. set your folder key
project_api_key = 'f98aadb3-254f-428d-a321-82a6e4b9424c'
local_file_dir = '/Users/weilixu/Desktop/data/UnitTest/5ZoneAirCooled.idf'
bsh = bsh_api.BuildSimHubAPIClient()
# if the seed model is on the buildsim cloud - add model_api_key to the new_parametric_job function
new_pj = bsh.new_parametric_job(project_api_key)
measure_list = list()
# Define EEMs - for full list of measures and their behavior,
# please check: https://github.com/weilix88/buildsimhub_python_api/wiki/Parametric#energyefficientmeasures
wwr = bsh_api.measures.WindowWallRatio()
wwr_ratio = [0.6, 0.4]
wwr.set_datalist(wwr_ratio)
measure_list.append(wwr)
lpd = bsh_api.measures.LightLPD('ip')
lpdValue = [1.2, 0.9]
lpd.set_datalist(lpdValue)
measure_list.append(lpd)
heatEff = bsh_api.measures.HeatingEfficiency()
cop = [0.8, 0.86]
heatEff.set_datalist(cop)
measure_list.append(heatEff)
# Add EEMs to parametric job
new_pj.add_model_measures(measure_list)
# Start!
results = new_pj.submit_parametric_study_local(local_file_dir, track=True)
Monte Carlo sampling is a numerical method that estimates the solution of complex problems (e.g., repeatedly calculating EUIs) by repeated sampling from predefined distributions. The simplest and most widely used example is a coin flip.
In energy modeling, Monte Carlo is often used for sensitivity analysis. A few papers also use it for model calibration.
- Pros:
- The number of simulations does not heavily depend on the number of parameters.
- Works very well with both categorical data (Daylight sensor on/off) and numeric data (lighting power density from 6 W/m2 to 12 W/m2)
- Cons:
- The outcome is always an approximation: even if you toss a coin 100,000 times, you may still get 49.5% heads.
- It is difficult to estimate how many times you need to repeat the experiment (that is, how many simulations to run).
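The coin-flip point can be sketched in a few lines. This is only an illustration using Python's standard `random` module; the function name `head_fraction` and the seed are hypothetical choices, not part of BuildSim.

```python
import random

def head_fraction(n_tosses, seed):
    """Simulate n_tosses fair coin flips and return the share of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The estimate gets closer to 0.5 as the number of tosses grows,
# but it is rarely exactly 0.5.
for n in (100, 10_000, 100_000):
    print(n, head_fraction(n, seed=1))
```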
On BuildSim Cloud, we implemented the Latin hypercube algorithm, which is also a sampling method but requires far fewer simulations than a regular Monte Carlo algorithm.
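For readers unfamiliar with Latin hypercube sampling, here is a minimal standard-library sketch of the idea: each parameter range is cut into as many equal strata as there are samples, and every stratum is used exactly once. This is not BuildSim Cloud's implementation; the function `latin_hypercube` and the two example ranges (taken from the WWR and LPD limits in the script below) are illustrative assumptions.

```python
import random

def latin_hypercube(n_samples, bounds, seed=None):
    """Return n_samples points; in every dimension each of the
    n_samples equal-width strata is sampled exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        width = (high - low) / n_samples
        # one uniformly random point inside each stratum
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [list(point) for point in zip(*columns)]

# Hypothetical two-parameter study: WWR in [0.3, 0.6], LPD in [0.6, 1.2]
bounds = [(0.3, 0.6), (0.6, 1.2)]
samples = latin_hypercube(5, bounds, seed=42)
for point in samples:
    print(point)
```

Because every stratum of every parameter is covered, a small sample already spans the full range of each input, which is why fewer simulations are usually needed than with plain random sampling.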
- Examples: Link
import BuildSimHubAPI as bsh_api
# 1. set your folder key
project_key = 'f98aadb3-254f-428d-a321-82a6e4b9424c'
local_file_dir = "/Users/weilixu/Desktop/data/UnitTest/5ZoneAirCooled.idf"
number_of_simulation = 200
bsh = bsh_api.BuildSimHubAPIClient()
# if the seed model is on the buildsim cloud - add model_api_key to the new_parametric_job function
new_pj = bsh.new_parametric_job(project_key)
# Define EEMs
measure_list = list()
wwrn = bsh_api.measures.WindowWallRatio(orientation="N")
wwrn.set_min(0.3)
wwrn.set_max(0.6)
measure_list.append(wwrn)
wwrs = bsh_api.measures.WindowWallRatio(orientation="S")
wwrs.set_min(0.3)
wwrs.set_max(0.6)
measure_list.append(wwrs)
wwrw = bsh_api.measures.WindowWallRatio(orientation="W")
wwrw.set_min(0.3)
wwrw.set_max(0.6)
measure_list.append(wwrw)
wwre = bsh_api.measures.WindowWallRatio(orientation="E")
wwre.set_min(0.3)
wwre.set_max(0.6)
measure_list.append(wwre)
wallr = bsh_api.measures.WallRValue('ip')
wallr.set_min(20)
wallr.set_max(40)
measure_list.append(wallr)
lpd = bsh_api.measures.LightLPD('ip')
lpd.set_min(0.6)
lpd.set_max(1.2)
measure_list.append(lpd)
chillerEff = bsh_api.measures.CoolingChillerCOP()
chillerEff.set_min(3.5)
chillerEff.set_max(5.5)
measure_list.append(chillerEff)
heatEff = bsh_api.measures.HeatingEfficiency()
heatEff.set_min(0.8)
heatEff.set_max(0.95)
measure_list.append(heatEff)
# Add EEMs to parametric job
new_pj.add_model_measures(measure_list)
# Start!
results = new_pj.submit_parametric_study_local(local_file_dir, algorithm='montecarlo', size=number_of_simulation, track=True)
The one-set-at-a-time method provides flexibility in specifying the parametric inputs. This technique steps through the design combinations by index.
For example, suppose we have three sets of data that we want to test with simulations.
The first set is a window U-value of 2.4 W/m2-K, a window SHGC of 0.4, and a lighting power density of 8 W/m2. This set of data will be passed in for one simulation. Then the method will pass in the next set of parameters for the next simulation.
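The difference from brute force can be sketched with plain Python lists. Only the first set (2.4, 0.4, 8) comes from the text above; the other values are hypothetical placeholders.

```python
from itertools import product

# Value lists per parameter; index i of every list forms the i-th set.
window_u = [2.4, 2.0, 1.6]       # W/m2-K (2.0 and 1.6 are made up)
window_shgc = [0.4, 0.3, 0.25]   # (0.3 and 0.25 are made up)
lpd = [8, 7, 6]                  # W/m2 (7 and 6 are made up)

# One set at a time: pair the lists by index -> 3 simulations.
one_set_at_a_time = list(zip(window_u, window_shgc, lpd))

# Brute force for comparison: every combination -> 27 simulations.
full_factorial = list(product(window_u, window_shgc, lpd))

print(one_set_at_a_time[0])                         # (2.4, 0.4, 8)
print(len(one_set_at_a_time), len(full_factorial))  # 3 27
```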
- Pros:
- Full control over the number of simulations
- Works for both categorical and numeric data
- Full control over the sampling method (e.g., MCMC)
- Cons:
- Requires more effort to set up the sampling method
- Examples: link
import BuildSimHubAPI as bsh_api
# 1. set your folder key
project_api_key = 'faa5ac84-6f72-427c-a4e7-278e9c17830d'
local_file_dir = '/Users/weilixu/Desktop/data/UnitTest/5ZoneAirCooled.idf'
bsh = bsh_api.BuildSimHubAPIClient()
# if the seed model is on the buildsim cloud - add model_api_key to the new_parametric_job function
new_pj = bsh.new_parametric_job(project_api_key)
measure_list = list()
# Define EEMs
wwr = bsh_api.measures.WindowWallRatio()
wwr_ratio = [0.6, 0.4]
wwr.set_datalist(wwr_ratio)
measure_list.append(wwr)
lpd = bsh_api.measures.LightLPD('ip')
lpdValue = [1.2, 0.9]
lpd.set_datalist(lpdValue)
measure_list.append(lpd)
heatEff = bsh_api.measures.HeatingEfficiency()
cop = [0.8, 0.86]
heatEff.set_datalist(cop)
measure_list.append(heatEff)
# Add EEMs to parametric job
new_pj.add_model_measures(measure_list)
# Start!
results = new_pj.submit_parametric_study_local(local_file_dir, algorithm='opat', track=True)
Once the parametric run starts, you will likely see a message similar to this:
Submitting a parametric simulation job request...
Received server response
You can track the parametric using API key: f7d28f9b-c96b-437d-bd89
Building parametric sets generation
The API key shown in the message is very important for the rest of the tutorial. Copy and paste it into your data extraction script.
# In this example, we will extract the data from the cloud for our machine learning study
import BuildSimHubAPI as bsh_api
# 1. set your folder key
project_api_key = 'faa5ac84-6f72-427c-a4e7-278e9c17830d'
model_api_key = 'f7d28f9b-c96b-437d-bd89'
import pandas as pd

# We need a helper function that converts the data returned by BuildSim Cloud into a pandas DataFrame
def convert_parajson_pandas(result_dict):
    data = {}
    for key in result_dict:
        if key == "model":
            # each model entry is a comma-separated string of "name: value" pairs
            tempdict = {}
            for tempstr in result_dict["model"]:
                for item in tempstr.split(','):
                    pair = item.split(': ')
                    if pair[0] not in tempdict:
                        tempdict[pair[0]] = []
                    tempdict[pair[0]].append(pair[1])
            # one column per parameter
            for subkey in tempdict:
                data[subkey] = tempdict[subkey]
        elif key != 'model_plot':
            data[key] = result_dict[key]
    return pd.DataFrame(data)
# Let's start extracting.
bsh = bsh_api.BuildSimHubAPIClient()
results = bsh.parametric_results(project_api_key, model_api_key)
# In this example, we will train a regression model that predicts EUIs
eui_result = results.net_site_eui()
df = convert_parajson_pandas(eui_result)
print(df.to_string())
print('Minimum case:')
print(df.loc[df['value'].idxmin()])
'''
value Wall_R LPD ChillerCOP HeatingEff WWRS WWRW WWRE WWRN
0 11.69 20 0.6 5.5 0.95 0.3 0.3 0.6 0.32153813
1 12.79 28.96 0.72 4.48 0.9 0.45 0.58 0.48 0.48
2 13.61 23.22 0.94 4.71 0.94 0.34 0.32 0.36 0.43
3 13.09 24.87 0.74 4.23 0.8 0.43 0.39 0.43 0.4
4 13.20 25.21 0.78 4.56 0.83 0.43 0.35 0.57 0.46
5 13.58 37.87 0.8 3.82 0.84 0.4 0.54 0.55 0.59
6 12.23 27.14 0.64 4.76 0.88 0.36 0.38 0.52 0.42
7 12.28 21.12 0.68 5.02 0.93 0.51 0.4 0.47 0.3
Minimum case:
value 11.96 (kBtu/ft2)
Wall_R 32.14 (W/m2-K)
LPD 0.6 (W/ft2)
ChillerCOP 5.01 (COP)
HeatingEff 0.88
WWRS 0.31
WWRW 0.39
WWRE 0.39
WWRN 0.46
'''
The script prints all the data as a pandas data frame and also reports the case with the minimum value.
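One detail of the helper above is worth seeing in isolation: it splits each model description on `','` and `': '`, so every name after the first keeps a leading space and every value stays a string. The sketch below parses a single description while stripping names and converting values to floats. The input string is an assumed example of the format, and `parse_model_string` is a hypothetical name.

```python
def parse_model_string(model_str):
    """Turn 'Wall_R: 40, LPD: 0.6' into {'Wall_R': 40.0, 'LPD': 0.6}."""
    parsed = {}
    for item in model_str.split(','):
        name, value = item.split(':')
        parsed[name.strip()] = float(value)  # float() ignores surrounding spaces
    return parsed

example = "Wall_R: 40, LPD: 0.6, ChillerCOP: 5.5, HeatingEff: 0.95"

# Splitting without strip() leaves a leading space on later names.
raw_names = [item.split(': ')[0] for item in example.split(',')]
print(raw_names)                    # ['Wall_R', ' LPD', ' ChillerCOP', ' HeatingEff']
print(parse_model_string(example))
```

Keep this in mind whenever column names are compared against plain strings such as `'LPD'`.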
Now it is time to kick off our learning process!
Even for regression, there are many different algorithms. They can all make predictions, but depending on the characteristics of the data, their predictive performance may differ. Cross-validation is one of the techniques that helps us select a good algorithm. In BuildSim Learn, a script called cross_validation.py is a good fit for this task.
import BuildSimHubAPI as bsh_api
import BuildSimHubAPI.postprocess as pp
from sklearn.model_selection import cross_validate
from sklearn import linear_model
from sklearn.svm import SVR
# Parametric Study
# 1. set your folder key
project_api_key = 'f98aadb3-254f-428d-a321-82a6e4b9424c'
model_api_key = 'f7d28f9b-c96b-437d-bd89'
# hyper-parameter
# number of folds
cv_fold = 3
# CPU use for parallelism the calculation
cpu = 1
# insert algorithms here
algs_name = ['linear', 'svr']
algs = [linear_model.LinearRegression(), SVR(kernel='linear', C=10)]
# SCRIPT
bsh = bsh_api.BuildSimHubAPIClient()
results = bsh.parametric_results(project_api_key, model_api_key)
# Collect results
result_dict = results.net_site_eui()
result_unit = results.last_parameter_unit
# Plot
param_plot = pp.ParametricPlot(result_dict, result_unit)
df = param_plot.pandas_df()
# Training code starts from here
y = df.loc[:, 'Value']
x = df.loc[:, df.columns != 'Value']
# cv_fold sets the number of folds (3 in this example)
for i in range(len(algs)):
    train_result = cross_validate(algs[i], x, y, cv=cv_fold, n_jobs=cpu, return_train_score=False)
    print(algs_name[i])
    print(train_result['test_score'])
As the code shows, this cross-validation script uses the k-fold cross-validation algorithm. In the example, we set k to 3 folds and compare the performance of linear regression (least squares) and support vector regression.
Finish extracting: 1 to 101 , remaining: -1
linear:
[0.99887645 0.99904463 0.99882417]
svr:
[0.99706845 0.9934951 0.99621598]
It looks like both algorithms work for this case. The linear regression seems to handle this dataset slightly better than the SVR algorithm. Thus we will use linear regression to continue our tutorial.
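To make the per-fold scores above more concrete, here is a standard-library sketch of what k-fold cross-validation does: split the data into k parts, train on k-1 of them, and score (R²) on the part left out. It uses synthetic one-variable data and a closed-form least-squares line, and it assigns samples to folds by interleaving, which is a simplification rather than scikit-learn's exact splitting. All names and numbers here are illustrative assumptions.

```python
import random

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r2_score(ys, preds):
    """Coefficient of determination on held-out data."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def k_fold_scores(xs, ys, k):
    indices = list(range(len(xs)))
    scores = []
    for fold in range(k):
        test_idx = indices[fold::k]  # every k-th sample is held out
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        preds = [intercept + slope * xs[i] for i in test_idx]
        scores.append(r2_score([ys[i] for i in test_idx], preds))
    return scores

# Synthetic data: an EUI-like target that depends linearly on one input.
rng = random.Random(0)
xs = [0.6 + 0.6 * rng.random() for _ in range(60)]
ys = [10.0 + 5.0 * x + rng.gauss(0, 0.02) for x in xs]

scores = k_fold_scores(xs, ys, k=3)
print(scores)  # three R² values, one per fold
```

Scores that are both high and similar across folds, as in the output above, suggest the model generalizes rather than memorizing one part of the data.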
Now, let's build our training script. Hopefully, we can build a very simple linear regression model that is capable of predicting the EUIs (note that the EUI is displayed in kBtu/ft2).
# additional imports to the previous script
import numpy as np
from sklearn import linear_model
# We need a helper function to pretty-print the fitted model
def pretty_print_linear(coefs, names=None, sort=False):
    if names is None:
        names = ["X%s" % x for x in range(len(coefs))]
    lst = zip(coefs, names)
    if sort:
        lst = sorted(lst, key=lambda x: -np.abs(x[0]))
    return " + ".join("%s * %s" % (round(coef, 3), name)
                      for coef, name in lst)
# Following the previous script, we received the result data by calling net_site_eui()
df = convert_parajson_pandas(result_dict)
# Training code starts from here
y = df.loc[:, 'value'] # target value
x = df.loc[:, df.columns != 'value'] # parameters
column_head = list(x)
# train a regression model
alg = linear_model.LinearRegression() # linear regression model
alg.fit(x, y) # model training
# Print the results
print("Model training is completed!...")
print('Interpret value β0: '+str(alg.intercept_))
print('training score: ' + str(alg.score(x, y)))
print('Linear regression model: ' + str(alg.intercept_) + ' + ' + pretty_print_linear(alg.coef_))
print("#############################################################")
Execute this code; we will get a message like this:
Model training is completed!...
Interpret value β0: 11.74956999156323
training score: 0.9992059200343152
Linear regression model: 11.74956999156323 + -0.002 * X0 + 5.231 * X1 + -0.26 * X2 + -2.507 * X3 + 0.37 * X4 + 0.471 * X5 + 0.342 * X6 + 0.365 * X7
#############################################################
Nice! We now have a linear regression model that is clearly presented. In the dataset, we find that the lowest EUI is 11.96 kBtu/ft2. Next, let's see if our trained linear regression can give us a better answer!
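As a quick sanity check, the printed equation can be evaluated by hand. The sketch below plugs in row 1 of the data frame printed earlier, assuming that X0 to X7 follow the column order of that table (Wall_R, LPD, ChillerCOP, HeatingEff, WWRS, WWRW, WWRE, WWRN). The coefficients are the rounded values from the message above, so the result is approximate.

```python
# Intercept and rounded coefficients copied from the printed model.
intercept = 11.74956999156323
coefs = [-0.002, 5.231, -0.26, -2.507, 0.37, 0.471, 0.342, 0.365]
names = ['Wall_R', 'LPD', 'ChillerCOP', 'HeatingEff',
         'WWRS', 'WWRW', 'WWRE', 'WWRN']

# Row 1 of the printed data frame (simulated EUI: 12.79 kBtu/ft2).
row1 = [28.96, 0.72, 4.48, 0.9, 0.45, 0.58, 0.48, 0.48]

predicted = intercept + sum(c * v for c, v in zip(coefs, row1))
print(round(predicted, 2))  # 12.82, close to the simulated 12.79
```

The signs are also informative: LPD and the window-to-wall ratios raise the predicted EUI, while higher chiller COP and heating efficiency lower it.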
# we need the scipy optimizer and a timer
import time
import scipy.optimize as opt
# start the timer (in a full script, this belongs at the very top)
start = time.time()
# Some helper functions for generating boundaries and initial values
# measure_list is the list of measures with min/max set (e.g., from the Monte Carlo script)
def bounds(col_head):
    col_head = col_head.strip()
    for measure in measure_list:
        if measure.measure_name == col_head:
            return measure.get_boundary()
def col_max(col_head):
    col_head = col_head.strip()
    for measure in measure_list:
        if measure.measure_name == col_head:
            return measure.get_max()
# alg is the trained linear regression model
# define the objective function in the optimization (it must return a scalar)
def fun(x_pred):
    return alg.predict([x_pred])[0]
x = df.loc[:, df.columns != 'value']
column_head = list(x)
# define boundary (stored under a new name so the bounds() helper stays callable)
var_bounds = [bounds(col_head) for col_head in column_head]
# Optimization initial value - sets everything to maximum (as a 1-D array)
X = np.array([col_max(col_head) for col_head in column_head])
# optimize! SLSQP is named explicitly because it also accepts constraints
print("Start optimization: " + str(time.time() - start))
res = opt.minimize(fun, X, method='SLSQP', bounds=var_bounds, options={'disp': True})
end = time.time()
# print results
print()
print("#############################################################")
print("Optimization completed in : " + str(end - start) + " seconds")
print("Optimized")
print(column_head)
print(res.x)
target_val = alg.predict([res.x])
print('Predicted EUI: ' + str(target_val) + ' kBtu/ft2')
After the script completes:
#############################################################
Start optimization: 4.638588905334473
#############################################################
Optimization terminated successfully. (Exit mode 0)
Current function value: 11.470887577450602
Iterations: 2
Function evaluations: 20
Gradient evaluations: 2
#############################################################
Optimization completed in : 4.640115976333618 seconds
Optimized
['Wall_R', ' LPD', ' ChillerCOP', ' HeatingEff', ' WWRS', ' WWRW', ' WWRE', ' WWRN']
[40. 0.6 5.5 0.95 0.3 0.3 0.3 0.3 ]
Predicted EUI: [11.57088758] kBtu/ft2
The optimization uses our trained model as the objective function to find a design combination whose predicted EUI is even lower than all the cases in the dataset!
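The reported optimum is easy to cross-check, because a linear model restricted to simple bounds always reaches its minimum at a corner: take the lower bound wherever the coefficient is positive and the upper bound wherever it is negative. The sketch below assumes the rounded coefficients printed earlier and the min/max ranges from the Monte Carlo script, in the column order shown in the output above.

```python
# Intercept and rounded coefficients from the printed linear model.
intercept = 11.74956999156323
coefs = [-0.002, 5.231, -0.26, -2.507, 0.37, 0.471, 0.342, 0.365]

# (min, max) per column: Wall_R, LPD, ChillerCOP, HeatingEff, WWRS, WWRW, WWRE, WWRN
bounds = [(20, 40), (0.6, 1.2), (3.5, 5.5), (0.8, 0.95),
          (0.3, 0.6), (0.3, 0.6), (0.3, 0.6), (0.3, 0.6)]

# Positive coefficient -> lower bound minimizes; negative -> upper bound.
best = [low if c > 0 else high for c, (low, high) in zip(coefs, bounds)]
value = intercept + sum(c * v for c, v in zip(coefs, best))

print(best)             # [40, 0.6, 5.5, 0.95, 0.3, 0.3, 0.3, 0.3]
print(round(value, 2))  # 11.46 with the rounded coefficients
```

This reproduces the design found by the optimizer. A numerical optimizer becomes necessary once the model is non-linear or constraints couple the variables.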
Now, we should validate the lowest case generated in the optimization. This can be done using the BuildSimHub API library.
import BuildSimHubAPI as bsh_api
project_key = 'f98aadb3-254f-428d-a321-82a6e4b9424c'
local_file_dir = "/Users/weilixu/Desktop/data/UnitTest/5ZoneAirCooled.idf"
# Start script
bsh = bsh_api.BuildSimHubAPIClient()
# Upload the same model and get the model from the cloud
new_sj = bsh.new_simulation_job(project_key)
model = new_sj.create_model(local_file_dir, comment='Seed model for validation')
# Define EEMs
measure_list = list()
wwrn = bsh_api.measures.WindowWallRatio(orientation="N")
wwrn.set_data(0.3)
measure_list.append(wwrn)
wwrs = bsh_api.measures.WindowWallRatio(orientation="S")
wwrs.set_data(0.3)
measure_list.append(wwrs)
wwrw = bsh_api.measures.WindowWallRatio(orientation="W")
wwrw.set_data(0.3)
measure_list.append(wwrw)
wwre = bsh_api.measures.WindowWallRatio(orientation="E")
wwre.set_data(0.3)
measure_list.append(wwre)
wallr = bsh_api.measures.WallRValue('ip')
wallr.set_data(40)
measure_list.append(wallr)
lpd = bsh_api.measures.LightLPD('ip')
lpd.set_data(0.6)
measure_list.append(lpd)
chillerEff = bsh_api.measures.CoolingChillerCOP()
chillerEff.set_data(5.5)
measure_list.append(chillerEff)
heatEff = bsh_api.measures.HeatingEfficiency()
heatEff.set_data(0.95)
measure_list.append(heatEff)
# apply the measure
new_model_key = model.apply_measures(measure_list)
# start the simulation!
results = new_sj.run_model_simulation(new_model_key, track=True)
print(str(results.net_site_eui()) + ' (kBtu/ft2)')
The simulation is completed with the following message!
Submitting a model to the server...
1-760-13660
Applying measure to model: 1-760-13660
Wall_R: 40, LPD: 0.6, ChillerCOP: 5.5, HeatingEff: 0.95, WWRS: 0.3, WWRW: 0.3, WWRE: 0.3, WWRN: 0.3
Submitting the simulation request...
Received server response
Looking for available simulation engine... 1%
Looking for available simulation engine... 1%
Looking for available simulation engine... 1%
September... 67%
Writing Simulation Results... 89%
Writing Simulation Results... 89%
Writing Simulation Results... 89%
Writing Simulation Results... 89%
Writing Simulation Results... 89%
Simulation finished successfully
Completed! You can retrieve results using the key: 1-760-13661
11.6 (kBtu/ft2)
The predicted value is just 0.03 kBtu/ft2 away from the simulated value and, most importantly, the result is lower than all the cases in the pre-simulated dataset!
Predicting the lowest EUI may seem simple in the case above, because an experienced mechanical engineer could name the design combination that yields the lowest EUI in a second.
But what if the problem has a budget constraint? Then the optimum is no longer obvious. So, let's formulate the problem in Python!
# our problem is that we only have $140k for this project.
budget = 140000
## Cost problems
# Define helper functions to calculate the cost for each parameter
def window_wall_ratio_cost(val):
    return -13143.2 * val + 33255.32
def wall_r_cost(val):
    return 432 * val + 11230.2
def lpd_cost(val):
    return -8112 * val + 19885.4
def chiller_cost(val):
    return 8732.2 * val + 23222.22
def boiler_cost(val):
    return 9174.44 * val + 17837.99
# Define our constraint function
# SciPy 'ineq' constraints require fun(x) >= 0, so we return the remaining budget
# Note: the column names printed in this tutorial carry a leading space (e.g. ' LPD'),
# so compare against head.strip() if each measure should reach its own cost function.
# The output shown in this tutorial is consistent with the comparisons as written.
def cost(x_pred):
    cost = 0.0
    for i in range(len(column_head)):
        head = column_head[i]
        if head == 'Wall_R':
            cost += wall_r_cost(x_pred[i])
        elif head == 'LPD':
            cost += lpd_cost(x_pred[i])
        elif head == 'ChillerCOP':
            cost += chiller_cost(x_pred[i])
        elif head == 'HeatingEff':
            cost += boiler_cost(x_pred[i])
        else:
            cost += window_wall_ratio_cost(x_pred[i])
    return budget - cost
# Restart our optimization, this time with the cost constraint.
import time
var_bounds = [bounds(col_head) for col_head in column_head]
# initial value as a 1-D array
X = np.array([col_max(col_head) for col_head in column_head])
start = time.time()
res = opt.minimize(fun, X, bounds=var_bounds, options={'disp': True}, constraints={'type': 'ineq', 'fun': cost})
end = time.time()
print()
print("#############################################################")
print("Optimization completed in : " + str(end - start) + " seconds")
print("Optimized")
print(column_head)
print(res.x)
target_val = alg.predict([res.x])
# cost() returns the remaining budget, so subtract it to get the money spent
total_cost = budget - cost(res.x)
print('Predicted EUI: ' + str(target_val) + ' kBtu/ft2')
print('Cost: $' + str(total_cost) + ' compare to budget: $' + str(budget))
The produced message:
#############################################################
Start optimization: 4.914916753768921
#############################################################
Optimization terminated successfully. (Exit mode 0)
Current function value: 11.620641027687586
Iterations: 12
Function evaluations: 120
Gradient evaluations: 12
#############################################################
Optimization completed in : 4.922420024871826 seconds
Optimized
['Wall_R', ' LPD', ' ChillerCOP', ' HeatingEff', ' WWRS', ' WWRW', ' WWRE', ' WWRN']
[20. 0.6 5.5 0.95 0.32153813 0.3
0.6 0.3 ]
Predicted EUI: [11.72064103] kBtu/ft2
Cost: $140000.00000000058 compare to budget: $140000
The optimizer lands right on the $140k budget (the tiny excess in the printed cost is floating-point rounding) and finds the lowest EUI it can reach under that constraint in less than 0.01 seconds!
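The constraint convention used above is easy to get backwards, so here is a small self-contained sketch of it. With `scipy.optimize.minimize`, an `'ineq'` constraint is satisfied when its function returns a non-negative value, which is why the constraint returns `budget - cost`. The two-variable objective, the cost coefficients, and the budget below are made-up numbers for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

# Toy objective: a made-up EUI that drops as two measures are pushed
# from 0 (baseline) to 1 (most efficient).
def eui(x):
    return 10.0 - 2.0 * x[0] - 1.0 * x[1]

# Toy cost model: both measures cost money as they increase.
def cost(x):
    return 100.0 * x[0] + 80.0 * x[1]

budget = 140.0

# 'ineq' means fun(x) >= 0, i.e. cost(x) <= budget.
constraint = {'type': 'ineq', 'fun': lambda x: budget - cost(x)}

res = minimize(eui, x0=np.array([0.5, 0.5]), method='SLSQP',
               bounds=[(0.0, 1.0), (0.0, 1.0)], constraints=[constraint])

print(res.x)        # design chosen by the optimizer
print(res.fun)      # its objective value
print(cost(res.x))  # money spent
```

In this toy problem, the first measure saves more energy per dollar, so the optimizer should push it to its upper bound and spend the remaining budget on the second one, ending at roughly x = [1.0, 0.5] with the budget fully used.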
You can extract the optimized model and open up the BuildSim 3D to see the geometry.
import BuildSimHubAPI as bsh_api
project_key = 'f98aadb3-254f-428d-a321-82a6e4b9424c'
model_api_key = 'FILL_IN_OPTIMIZED_MODEL_KEY'
bsh = bsh_api.BuildSimHubAPIClient()
model = bsh.model_results(project_key, model_api_key)
model.bldg_geo()