January 2019 Chris Cameron
I am auditing the Microsoft Professional Program for Artificial Intelligence track.
The second of 10 courses is Introduction to Python for Data Science.
Since I know a lot of Python already, I will probably skim the first 3-4 chapters and focus on Matplotlib and Control Flow/Pandas, since I haven't used Matplotlib in years and I've never used Pandas.
nothing noteworthy
The justification for using numpy is being able to type arr1 / arr2**2 where arr1 and arr2 are numpy arrays. You could get the same result with plain lists if you wanted to (with a loop or a list comprehension, since lists don't support that arithmetic directly), but I understand that the performance and statistical usage of numpy arrays is important.
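To make the comparison concrete, here is a minimal sketch of the same calculation done with numpy arrays and with plain lists. The weight and height values are made up for illustration and are not from the course.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
weights = [65.0, 80.0, 72.0]
heights = [1.70, 1.80, 1.75]

# numpy: the expression is applied element by element.
arr1 = np.array(weights)
arr2 = np.array(heights)
ratio_np = arr1 / arr2**2

# Plain lists: the same arithmetic needs an explicit loop or comprehension.
ratio_list = [w / h**2 for w, h in zip(weights, heights)]

print(ratio_np)
print(ratio_list)
```

Writing weights / heights**2 directly on the two lists would raise a TypeError, which is the convenience argument for numpy in a nutshell.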
He does arr1 > 23 to get an array of booleans, then he does arr1[arr1 > 23] to get the specific values that were greater than 23.
neat
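A small sketch of that boolean filtering, using made-up numbers rather than the course's data:

```python
import numpy as np

arr1 = np.array([21, 25, 19, 30, 23])  # made-up values

mask = arr1 > 23       # array of booleans, one per element
selected = arr1[mask]  # keeps only the elements where mask is True

print(mask)      # [False  True False  True False]
print(selected)  # [25 30]
```

Note that 23 itself is dropped, because the comparison is strictly greater than.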
side note: it's pretty obvious (and a bit hilarious) these transcriptions were done by AI.
You can retrieve elements with arr2d[n][m] or arr2d[n,m]. This works with slices too, like arr2d[:,1:3].
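The two indexing styles and the slice can be sketched as follows. The array contents are arbitrary and only there to make the results easy to check.

```python
import numpy as np

# Arbitrary 3x4 array for illustration.
arr2d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

a = arr2d[1][2]        # take row 1, then element 2 of that row
b = arr2d[1, 2]        # same element in a single indexing step
cols = arr2d[:, 1:3]   # every row, columns 1 and 2

print(a, b)  # 7 7
print(cols)
```

The slice returns a 3x2 array holding the middle two columns.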
- numpy.mean, numpy.median
- numpy.corrcoef (correlation coefficient, I believe)
- numpy.std (standard deviation)
- numpy.sum
- numpy.sort
- numpy.round
- numpy.random.normal (random numbers)
- numpy.column_stack (creates a columnar 2d numpy array)
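Here is a rough sketch that strings the functions listed above together. The means, standard deviations, and sample size are assumptions picked for illustration, not values from the course.

```python
import numpy as np

np.random.seed(0)  # fixed seed so reruns give the same numbers

# Made-up distributions: (mean, standard deviation, sample count).
heights = np.round(np.random.normal(1.75, 0.20, 1000), 2)
weights = np.round(np.random.normal(60.0, 15.0, 1000), 2)

# Stack the two 1D arrays side by side into a (1000, 2) array.
data = np.column_stack((heights, weights))

mean_h = np.mean(data[:, 0])
median_h = np.median(data[:, 0])
std_h = np.std(data[:, 0])
total_w = np.sum(data[:, 1])
sorted_h = np.sort(data[:, 0])
corr = np.corrcoef(data[:, 0], data[:, 1])  # 2x2 correlation matrix

print(mean_h, median_h, std_h)
print(corr)
```

numpy.corrcoef returns a matrix rather than a single number; the off-diagonal entries hold the correlation between the two columns.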
- Shows Hans Rosling's GDP vs Life Expectancy bubble chart.
- import matplotlib.pyplot as plt (the pyplot subpackage)
- plt.plot(...), then plt.show() to display it
- plt.scatter(...)
- plt.hist()
- label your axes: plt.xlabel('Year'), plt.ylabel('Population')
- plt.title("blah")
- plt.yticks()
- plt.fill_between()
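Putting the plotting calls above together, a minimal sketch might look like this. The year and population numbers are rough illustrative values, and the Agg backend is an assumption made only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

# Rough illustrative values (population in billions), not course data.
year = [1950, 1970, 1990, 2010]
pop = [2.5, 3.7, 5.3, 6.9]

plt.plot(year, pop)
plt.fill_between(year, pop, 0, color="green", alpha=0.3)  # shade under the line
plt.xlabel("Year")
plt.ylabel("Population")
plt.title("World population (illustrative values)")
plt.yticks([0, 2, 4, 6, 8], ["0", "2B", "4B", "6B", "8B"])  # tick positions, tick labels

ax = plt.gca()  # current axes, kept only so the labels can be inspected
plt.show()      # opens a window on an interactive backend; shows nothing under Agg
```

Swapping plt.plot for plt.scatter(year, pop) or plt.hist(pop) gives the other two chart types mentioned in the notes.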
They bring up ggplot as an alternative plotting library.
They mention a gallery of nice plots that matplotlib supports.
Introduces or, and, if, else, elif.
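For completeness, a tiny sketch that uses all five keywords at once. The function name, room names, and area threshold are hypothetical and only serve the example.

```python
def describe_room(room, area):
    """Classify a room using if / elif / else combined with and / or."""
    if room == "kitchen" or room == "bathroom":
        return "wet room"
    elif area > 15 and room == "bedroom":
        return "large bedroom"
    else:
        return "other"


print(describe_room("kitchen", 10))  # wet room
print(describe_room("bedroom", 20))  # large bedroom
print(describe_room("bedroom", 10))  # other
```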
store data in a DataFrame
typically you import your DataFrame as opposed to declaring it manually (I wonder if this is true generally - this course has been pretty basic)
- import pandas as pd
- pd.read_csv()
- pd.read_csv('filepath', index_col=0) (this makes sure you don't use an index column as a data column by accident)
- df['column_header'] works with data frames
- df['column_header'] = [some list]
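A short sketch of the import-and-select steps above. To keep it self-contained, an in-memory string stands in for the CSV file; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; contents are made up for illustration.
csv_text = """code,country,population,area
BR,Brazil,200,8.5
IN,India,1252,3.3
"""

# index_col=0 turns the first column into row labels instead of data.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

countries = df["country"]        # select one column by its header
df["on_list"] = [True, False]    # add a column from a plain list

print(df.index.tolist())  # ['BR', 'IN']
print(df)
```

Without index_col=0, the code column would show up as ordinary data and pandas would number the rows 0, 1, and so on.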
- it's based on numpy, so you can do things like df['density'] = df['population'] / df['area'] * 1000000 to create new columns
- to access rows you have to use df.loc['rowname'] to find that row
- you can combine it either way, like df.loc['row']['col'] or df['col'].loc['row'] (note the square brackets: .loc is indexed, not called)
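The column arithmetic and the two lookup orders can be sketched like this. The row labels and figures are made up purely for illustration.

```python
import pandas as pd

# Made-up figures for illustration only.
df = pd.DataFrame(
    {"population": [10, 20], "area": [2000000, 5000000]},
    index=["A", "B"],
)

# Element-wise arithmetic on whole columns creates the new column.
df["density"] = df["population"] / df["area"] * 1000000

row = df.loc["A"]                 # one row, returned as a Series
v1 = df.loc["A"]["density"]       # row first, then column
v2 = df["density"].loc["A"]       # column first, then row

print(row)
print(v1, v2)
```

Both lookups return the same value; they only differ in whether the row or the column is selected first.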
Unfortunately all labs and exercises in this course were locked by the fee. Fortunately this course is so basic it's not worth the cost to unlock this stuff.