Install package with pip
pip install <package-name>
Example: pip install numpy
- For more pip commands, check out pip guidelines document
Install package with conda
conda install <package>
Example: conda install numpy
- For more conda commands, check out conda guidelines document
- Single Line Comment:
# sample text
- Multi Lines Comment:
""" Hello World! Nice to meet all of you cookie monsters! """
- bool from int (0, 1):
bool(0) / bool(1)
- X and Y
- X or Y
- if not X
- custom object boolean
- Define Nan, Infinite
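A minimal sketch of the custom-object boolean and NaN/infinity items above. The class name `Basket` is hypothetical and exists only for illustration.

```python
import math


class Basket:
    """Hypothetical container used only for illustration."""

    def __init__(self, items):
        self.items = list(items)

    def __bool__(self):
        # An instance is truthy only when it holds at least one item.
        return len(self.items) > 0


empty_basket = Basket([])
full_basket = Basket(["apple"])

nan_value = float("nan")   # also available as math.nan
inf_value = float("inf")   # also available as math.inf

print(bool(empty_basket), bool(full_basket))
print(math.isnan(nan_value), math.isinf(inf_value))
print(nan_value == nan_value)  # NaN never compares equal to itself
```

Because NaN is not equal to itself, `math.isnan()` is the reliable check rather than `==`.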
- Sum up an array:
sum(arr)
- Max, returns the maximum of two values:
max(a, b)
- Min, returns the minimum of two values:
min(a, b)
- Atan2:
import math; math.atan2(90, 15)
- Asin:
import math; math.asin(0.5)
- Sin:
import math; math.sin(1)
- Cos:
import math; math.cos(1)
- Factorial:
import math; math.factorial(1)
- Round up a number to a certain decimal point:
round(value, 1)
- Calculate percentile
- Power of a number:
pow(base_number, exponent_number)
- Square root of a number:
import math; math.sqrt(number)
- Ceiling
import math; value : int = math.ceil(invalue)
- Floor
import math; value : int = math.floor(invalue)
- Logarithm / Log
- Log to the base of 2:
- Numpy:
import numpy as np; np.log2(10)
- Math:
import math; math.log2(10)
- Plotting of log to the base of 2
- Log to the base of 10:
- Math:
import math; math.log10(10)
- Exclusive Or (XOR)
- Get maximum:
import sys; sys.maxsize
- Get minimum:
import sys; -sys.maxsize - 1
- Format floating value to n decimal:
"%.2f" % floating_var
/print("{:.2f}".format(a))
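The XOR and percentile items above have no snippet, so here is one possible sketch using only the standard library. It assumes the median (50th percentile) is enough to illustrate percentiles; other percentiles need a different call.

```python
import statistics

a, b = 0b1100, 0b1010
xor_bits = a ^ b                      # bitwise XOR on integers
xor_bool = True ^ False               # XOR also works on booleans
logical_xor = bool("text") != bool("")  # XOR on arbitrary truthy/falsy values

median = statistics.median([1, 2, 3, 4, 5])  # the 50th percentile

print(bin(xor_bits), xor_bool, logical_xor, median)
```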
Notes:
Difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified (immutable),
and bytearray() returns an object that can be modified (mutable).
- Numpy <> Bytes, Bytes <> Numpy
- Bytes -> String:
bytesobj.decode("utf-8")
- String -> Bytes:
strobj.encode("utf-8")
- Bytes -> Multimedia file (video/audio)
- Check bytes encoding
- To Bytes:
bytes(<value>)
- Get size of bytes object:
import sys;sys.getsizeof(bytesobject)
- Split bytes to chunks
- The effect is less overhead in transmitting tasks to worker processes and collecting results.
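One way to split bytes into chunks is plain slicing. The helper name `split_bytes` and the chunk size of 4 are assumptions made for this sketch.

```python
def split_bytes(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


payload = "hello world".encode("utf-8")
chunks = split_bytes(payload, 4)

print(chunks)
print(b"".join(chunks) == payload)  # joining the chunks restores the input
```

The last chunk may be shorter than `chunk_size` when the length is not an exact multiple.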
Notes:
Difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified (immutable),
and bytearray() returns an object that can be modified (mutable).
- Integer to Bytearray
- Native Array to Bytearray
- Numpy Array to Bytearray
- [Image as Bytearray](notes/cv/image_as bytearray.ipynb)
- Check bytes array encoding
- To ByteArray:
bytearray(<value>)
- Numpy basic
- numpy array with int random value:
np.random.randint(5, size=(2, 4))
- Check if numpy array has true value:
np.any(<np-array>)
- Get numpy shape:
nparray.shape
- Numpy array to list:
nparray.tolist()
- List to numpy array:
np.array(listarray)
- Change datatype:
nparray = nparray.astype(<dtype>)
Example:nparray = nparray.astype("uint8")
- Numpy NaN (Not A Number): Constant to act as a placeholder for any missing numerical values in the array:
np.nan (the aliases np.NaN and np.NAN were removed in NumPy 2.0)
- Numpy multiply by a value:
nparray = nparray * 255
- Numpy array to image
- Numpy array to Torch tensor:
torch.from_numpy(nparray)
- Numpy <> Binary File(.npy)
- Print numpy array without scientific notation (e-2)
- Set numpy print options to suppress scientific notation
np.set_printoptions(suppress=True, precision=10); print(predictions)
- Use of
numpy.where
- Get minimum value of numpy array:
np.amin(array)
- Get maximum value of numpy array:
np.amax(array)
- Calculate euclidean distance with numpy
- Opencv Numpy array to bytes
success, encoded_targetimage = cv2.imencode(".png", targetimage); encoded_targetimage.tobytes() # targetimage : np.ndarray
- Generate string with parameter
- Using f-string (formatted string literal):
print(f'Completed part {id}')
- Generate string with templates
- String formatting method:
print('Completed part %d' % part_id)
- Embed the repr() of a value with !r:
varname="world"; print(f"Hello {varname!r}")
- Check if string is empty, len = 0:
strvar = ""; if not strvar:
- Check if string contains digit:
any(chr.isdigit() for chr in str1) #return True if there's digit
- Check file extension: notes/string/check_file_extension.ipynb
- Capitalize a string:
strvar.capitalize()
- Uppercase a string:
strvar.upper()
- Lowercase a string:
strvar.lower()
- Capitalize the beginning of each word:
strvar.title()
- Get substring from a string:
strvar[<begin-index>:<end-index>]
/strvar[<begin-index>:]
/strvar[:<end-index>]
- Strip multiple white spaces into only one
- Remove white spaces in the beginning and end:
strvar.strip()
- Swap existing upper and lower case:
strvar.swapcase()
- Capitalize every first letter of a word:
strvar.title()
- Splitting string:
- Split a string based on separator:
strvar.split(separator)
Example:strvar.split("x")
- Split on white space:
strvar.split()
- If split with every character, do this instead:
[*"ABCDE"]
Result:["A", "B", "C", "D", "E"]
- Check if string starts with a substring:
strvar.startswith(<substring>)
- Check if string ends with a substring:
strvar.endswith(<substring>)
- Check if string have substring/specific character. Returns -1 if not found:
strvar.find(input : str)
,strvar.find(input: str, start_index : int)
- String get substring with index:
str[startindex:endindex]
- Replace string/character with intended string/character:
strout = strin.replace(" ", "_")
- Replace multiple characters with intended character
- Replace multiple string with intended string
- Generate string
- String to List/Dict:
import ast; ast.literal_eval(strinput) # safer than eval(strinput), which executes arbitrary code
- List to string:
<separators>.join(list) example: ', '.join(listbuffer)
- Generate unique identifer UUID
- Validate if a string is UUID
- Compare if both UUID are the same
- UUID to string:
str(uuidparam)
- string to UUID:
uuid.UUID(uuid_in_str)
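A short sketch covering the UUID items above (generate, validate, compare). The helper name `is_valid_uuid` is hypothetical.

```python
import uuid


def is_valid_uuid(text: str) -> bool:
    """Return True when text parses as a UUID (hypothetical helper)."""
    try:
        uuid.UUID(text)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


new_id = uuid.uuid4()            # random UUID
as_text = str(new_id)            # UUID to string
round_trip = uuid.UUID(as_text)  # string to UUID

print(is_valid_uuid(as_text), is_valid_uuid("not-a-uuid"))
print(new_id == round_trip)      # UUID objects compare by value
```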
-
datetime: datetime.ipynb
- get current local date and time:
datetime.now()
- get utc date and time:
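from datetime import datetime, timezone; datetime.now(timezone.utc)
(datetime.utcnow() is deprecated since Python 3.12)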
- time to str and reverse
-
datetime.timedelta
from datetime import datetime, timedelta; timediff : timedelta = datetime.now() - before; print(timediff.microseconds); print(timediff.seconds); print(timediff.days)
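The "time to str and reverse" item can be sketched with `strftime` and `strptime`. The fixed date and the format string are arbitrary choices for the example.

```python
from datetime import datetime, timedelta

stamp = datetime(2024, 1, 15, 8, 30, 0)
fmt = "%Y-%m-%d %H:%M:%S"

as_text = stamp.strftime(fmt)            # datetime to str
parsed = datetime.strptime(as_text, fmt)  # str back to datetime

later = stamp + timedelta(hours=2, minutes=15)
timediff = later - stamp                 # subtraction yields a timedelta

print(as_text, parsed == stamp, timediff.total_seconds())
```

Note that `timediff.seconds` only holds the seconds part within a day, while `total_seconds()` covers the whole span.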
- List of str to int:
list(map(int, arr))
- List with range of values:
list(range(...))
- Split str to list of str:
arr.split(" ")
- Find if a value in a list:
if value in mylist:
/if value not in mylist:
- Copy by value (so later changes do not affect the original array):
array.copy()
- Sort an array in place:
arr.sort()
/ Return a sorted array: sorted(arr)
- Sort an array in reverse:
arr.sort(reverse = True)
/ Return a sorted array: sorted(arr, reverse = True)
- Get index of a value (raises ValueError when not found):
arr.index(value)
- Add one more value to existing list:
arr.append(value)
- Insert at index:
arr.insert(index, value)
- Extend list with values in another list:
arr.extend(arr2)
- Remove an item (the first item found) from the list:
arr.remove(item)
- Remove item by index:
del arr[index]
or del arr[index_start:index_end]
- Check for empty list:
arr = []; if not arr: #empty list
- Clear a list:
arr.clear()
- Get list subset:
list[start:stop:step]
example: array[0::2]
a[0:6:2]
- Check all items in a list(subset) if exist in another list, returns boolean:
set(b).issubset(v)
- Check unordered list to have the same items, returns boolean:
set(a) == set(b)
- Change values of list with List Comprehension:
[func(a) for a in sample_list]
- Iteration of list with index:
for index, value in enumerate(inlist):
- Enumerate with a beginning index (the start index is the second parameter):
for index, value in enumerate(inlist, 2):
- Iteration over two lists:
[<operation> for item1, item2 in zip(list1, list2)]
- Count occurence of items in list:
array.count(val)
- Get maximum value in a list of numbers (even strings):
max(samplelist)
- Get argument of minimum / maximum value
- Reverse a list:
list(reversed([1, 2, 3, 4]))
/listinput.reverse()
- list to string:
",".join(bufferlist)
- Remove a value in list by index (raises IndexError if the index is not valid):
returnedvalue = listarray.pop(index)
- Remove last value:
listarray.pop()
- List Counter
Build list
- Build list of same values:
['100'] * 20 # 20 items of the value '100'
- Build multiple list into one:
lista + listb + listc
- Build list by breaking down every character of a string:
[*'abcdef']
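The argmin/argmax and list counter items above can be sketched with built-ins and `collections.Counter`. The sample list is made up.

```python
from collections import Counter

scores = [3, 9, 4, 9, 1]

# Index of the maximum / minimum value; ties resolve to the first index.
argmax = max(range(len(scores)), key=scores.__getitem__)
argmin = min(range(len(scores)), key=scores.__getitem__)

counts = Counter(scores)          # value -> number of occurrences
top = counts.most_common(1)       # most frequent value with its count

print(argmax, argmin, counts[9], top)
```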
- Define dict with str keys
- Define dict from two lists:
dict(zip(list1, list2))
- Add new key value pair:
dict.update({"key2":"value2"})
- Remove key<> value pair by referring to specific key
- Get keys as list:
list(lut.keys())
- Get values as list:
list(lut.values())
- Create dict from list:
{i: 0 for i in arr}
- Remove existing key:
del keyvalue['key']
- Remove key<>value:
value = keyvalue.pop(key, alternative-value-if-key-not-present)
- Handling missing items in dict
- Iteration to dict to get keys and values
- Save/load dictionary to/from a file
- Revert or inverse a dictionary mapping:
inv_map = {v: k for k, v in my_map.items()}
- Copy by value:
sampledict.copy()
- Decompose/unpack dictionary when passing as argument
- Use case: class declaration
- Reverse key value pair to build inverse key value pair with zip
- Dictionary to decide class to call with class as value
- Change order of key value based on the key/value items
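A sketch for the dict items above that come without code: missing keys, iteration, unpacking as arguments, and reordering. The function `describe` and the sample data are assumptions.

```python
from collections import defaultdict

lut = {"b": 2, "a": 3}

# Handling missing items: default value instead of KeyError.
missing = lut.get("z", 0)
groups = defaultdict(list)
groups["x"].append(1)             # the list is created on first access

# Iterate over keys and values.
pairs = [(key, value) for key, value in lut.items()]

# Unpack a dict into keyword arguments.
def describe(a, b):
    return f"a={a}, b={b}"

text = describe(**lut)

# Reorder by key or by value (dicts keep insertion order).
by_key = dict(sorted(lut.items()))
by_value = dict(sorted(lut.items(), key=lambda kv: kv[1]))

print(missing, dict(groups), pairs, text, by_key, by_value)
```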
- Set initialization:
setsample = {1,2,3,4,5}
/setsample = set()
- Add item:
setsample.add(<value>)
example: setsample.add((1,2))
(has to be a tuple, not a list)
- Add multiple items:
setsample.update(<another-set>)
- Set with multiple-value input as set
- Remove and return an arbitrary value (sets are unordered, so there is no index):
setsample.pop()
- Remove value by value:
setsample.remove(<value>)
- Check if value exist in set:
if value in setsample:
- Build a tuple:
var : tuple[bool, str | None] = tuple([True, "abc"])
- List to tuple:
tuple([1,2])
- To identify if any items in the iterables has True/1 values:
any(sample_list) #returns single value True/False
- Zip multiple iterables
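For the zip item above, a small sketch with made-up lists. It also shows `itertools.zip_longest`, which is an addition not mentioned in the list.

```python
from itertools import zip_longest

names = ["a", "b", "c"]
ids = [1, 2, 3]
flags = [True, False]

rows = list(zip(names, ids, flags))              # stops at the shortest input
padded = list(zip_longest(names, flags))          # pads the shorter input with None
unzipped_names, unzipped_ids = zip(*zip(names, ids))  # "unzip" back into tuples

print(rows, padded, unzipped_names, unzipped_ids)
```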
- Import:
import polars as pl
- Get header of dataframe:
df.columns
- View first n rows:
df.head(n)
- View random rows:
df.sample(n)
- Get number of rows:
row_count = df.height
- Dataframe from dict
- Read in csv:
pl.read_csv(...)
- read in csv changing column file type (older Polars releases name this parameter dtypes)
data_pl = pl.read_csv('file.csv', schema_overrides={'col1': pl.Utf8, 'col2': pl.Utf8})
- read_csv without header (to prevent value be the header name)
pl.read_csv(datapath, has_header = False)
- Write to csv:
write_csv(file : str, has_header: bool = True, separator : str = ",")
- Read excel:
pl.read_excel(source : str |..., sheet_name : str, engine = "openpyxl")
- Assign column name to dataframe:
df.columns = column_name
- Create empty data frame:
pl.DataFrame()
- Dataframe from dict:
df = pl.from_dict({"name": name_list, "id": id_list})
- Change header:
outdf = df.rename({"foo": "apple"}) # foo is previous title, apple is new title
- Get unique values of one/a few columns:
df[['column_name']].unique()
- Conversion
- Column to list:
df["a"].to_list()
- Check if dataframe is empty:
df.is_empty()
- Reorder column:
df = df[['PRODUCT', 'PROGRAM', 'MFG_AREA']]
- Drop Column:
df.drop("<column-name>")
/df.drop(["<column-name1>", "<column-name2>"])
- Rename column name:
df = df.rename(dict(zip(["column_name1_ori", "column_name2_ori"], ["column_name1", "column_name2"])))
- Casting:
out = df.select(pl.col("<col-name>").cast(pl.Int32))
- Fill null with value:
df.with_columns(pl.col("b").fill_null(99))
- Remove rows with conditions using filter
- Sort column value by order
- Concatenate dataframe
- default concatenate on rows:
pl.concat([df1, df2])
equivalent to pl.concat([df1, df2], how="vertical")
/ pl.concat([df1, df2], how="diagonal") also accepts frames whose columns differ, filling the gaps with null
- Group By
- Drop duplicates whole /subset
- Change sequence of columns in dataframe:
df = df[['PRODUCT', 'PROGRAM', 'MFG_AREA']]
- Add a new column with list:
df.with_columns(pl.Series(name="column-name", values=prediction_list))
- Apply function to a column:
df=df.with_columns([(pl.col("<column-name>").map_elements(<function-to-apply>).alias("<new-column-name>"))])
- Drop nulls:
df = df.drop_nulls()
- Drop a row if all values are null:
df.filter(~pl.all_horizontal(pl.all().is_null()))
- Replace column values
- Apply function to value:
- String operations
- Apply uppercase to column with string
- Replace value in column
- String remove whitespace front and back and in between
df = df.select(pl.col(pl.Utf8).str.strip_chars()); df = df.select(pl.col(pl.Utf8).str.replace_all(" ", "")) # str.replace only replaces the first match
- Check if duplicated value in a column
- Check if any value in a Boolean Series is true:
df.select(pl.col("a").is_duplicated())['a'].any()
- Dataframe basic
- Get # rows and columns
- Get summary/infos about dataframe
- Get data types
- Dataframe/Series Min, Max, Median, General Description
- Get rows name (index) and columns name (column)
- Get a glimpse of dataframe
- Get subset of a dataframe by rows/by columns
- Get rows by finding matching values from a specific column
- Check if a column name exist in dataframe -
if 'code' in df.columns:
- Iteration of each rows in a dataframe
-
Check if dataframe is empty:
df.empty #return boolean
-
Build dataframe with columns name
column_list = ["a", "b"]; df = pd.DataFrame(columns = column_list)
-
Build a new dataframe from a subset of columns from another dataframe
-
Get subset of dataframe, sample columns with specific criteria
- Sample by percentage
- Sample by # of rows specified
- Sample by matching to a value
-
Column to list:
df.columns.tolist()
-
Sample rows:
df = df.sample(frac=1).reset_index(drop=True)
-
- Concatenate by adding rows
-
Reset index without creating new (index) column:
df.reset_index(drop=True)
-
Assign df by copy instead of reference -
df.copy()
-
Shuffle rows of df:
df = df.sample(frac=1).reset_index(drop=True)
-
Bytes to dataframe
from io import BytesIO; import pandas as pd; data = BytesIO(bytesdata); df = pd.read_csv(data)
-
Sort values according to particular column
df = df.sort_values(by=['frame'])
- Series to value
- Series/Dataframe to numpy array:
input.to_numpy()
- Series iteration:
for index, item in seriesf.items():
- Series to dict:
seriesf.to_dict()
- Create new column and assign value according to another column
- Assign values by lambda and df.assign
- Dataframe append rows
- Drop duplicates for df / subset, keep one copy and remove all
- Remove/drop rows where specific column matched value
- Remove specific columns with column name
- Drop rows by index
- Drop rows/columns with np.NaN:
df3 = df3.dropna(axis = 0) # axis = 0 drops rows, axis = 1 drops columns
- pivot table:
:TODO
- Drawback: Not able to do filtering selection
- Merge two dataframes based on certain column values
- Filter with function isin()
- Filter df with item not in list
- Filter with function query()
- Find with loc
df.loc[df['address'].eq('[email protected]')] #filter with one value
df.loc[df.a.eq(123) & df.b.eq("helloworld")] #filter with one value in multiple columns
df.loc[df.a.isin(valuelist)] #filter with a few values in a list
- Filter by substring:
df.loc[df['folder'].eq(folderkey) & df['id'].str.contains(videokey)]
- Assign value to specific column(s) by matching value
- Get a subset of dataframe by rows -
df.iloc[<from_rows>:<to_rows>, :]
- Count items and filter by counter values
- Retrieve columns name which match specific str
- Read in excel with specific sheet name:
pd.read_excel(<url>, sheet_name = "Sheet1", engine = "openpyxl")
- Note: Install engine by
pip install openpyxl
- Read number of sheets in excel
- Save excel:
df.to_excel('file_name', index = False)
- Write to multiple sheets
- Read csv with other delimiter
pd.read_csv(<path-to-file>, delimiter = '\x01')
- Read csv with bad lines
pd.read_csv(<path-to-file>, on_bad_lines='skip')
- Note:
pd.read_csv(<path>, error_bad_lines = False)
deprecated
- Read csv with encoding
pd.read_csv('file name', encoding = 'utf-8')
- Save to csv
df.to_csv('file name', index = False)
- Note: Put
index = False
is important to prevent an extra column of index being saved.
- Save to csv with encoding
df.to_csv('file name', encoding = 'utf-8')
- Write list/dict to csv file (Note: to not affected by the comma in the collection)
Pandas Parquet In/Out
- Read in parquet:
pd.read_parquet(...)
- Write to parquet:
df.to_parquet(...)
Note: Pickle has security risks and is slow in serialization (even compared to csv and json). Don't use it.
- Read in pickle to dataframe:
df = pd.read_pickle(<file_name>) # ends with .pkl
- Save to pickle:
df.to_pickle(<file_name>)
- Fixed Random Seed Number (Generate same pattern) :
random.seed(integer_value)
Randomize every time: random.seed()
- Generate random floating value within 0- 1:
import random; random.random()
- Generate random integer within (min, max). Both bound included:
from random import randint; randint(0, 100) #within 0 and 100
- Generate random floating value:
from random import random; random()
- Randomly choosing an item out from a list:
import random; random.choice([123, 456, 378])
- Generate list with random number:
import random; random.sample(range(10, 30), 5)
- Example shown where 5 unique random numbers are drawn from 10 to 29 (range(10, 30) excludes 30)
- Shuffle an array:
random.shuffle(array)
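A quick sketch of how a fixed seed reproduces the same sequence, with shuffle and sample alongside. The seed value 42 is arbitrary.

```python
import random

random.seed(42)
first = [random.randint(0, 100) for _ in range(3)]

random.seed(42)                   # same seed, same sequence
second = [random.randint(0, 100) for _ in range(3)]

deck = list(range(5))
random.shuffle(deck)              # shuffles in place and returns None

picks = random.sample(range(10, 30), 5)  # 5 unique values from 10..29

print(first == second, deck, picks)
```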
- Native Catching Exception
- Catch multiple error:
except (CompileError, ProgrammingError) as e:
- Traceback
- Suppress and log error
- ValueError: argument of the correct data type but an inappropriate value
- TypeError: the data type of an object is incorrect
- IndexError: Raised when a sequence subscript is out of range
- KeyError: When key cannot be found
- ZeroDivisionError: when a number is divided by zero
- OSError: error from an os-specific function
- FileNotFoundError: when a file or directory is requested but doesn’t exist
- IsADirectoryError: when removing a file but it turns out is a directory with
os.remove(file)
- NotImplementedError: commonly raised when an abstract method is not implemented in a derived class
- NameError: reference to some name (variable, function, class) that hasn’t been defined
- AttributeError: reference to certain attribute in a class which does not exist
- ImportError: Trouble loading a module
- Submodule
- ModuleNotFoundError: the module trying to import can’t be found or try to import something from a module that doesn’t exist in the module
- AssertionError: Raise when run
assert
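The catching, traceback, and suppress items above can be sketched as follows. The function names `safe_divide` and `format_failure` are hypothetical.

```python
import contextlib
import logging
import traceback


def safe_divide(a, b):
    """Catch several exception types in one except clause and log them."""
    try:
        return a / b
    except (ZeroDivisionError, TypeError) as e:
        logging.error("division failed: %s", e)
        return None


def format_failure():
    """Return the traceback of a handled error as a string."""
    try:
        int("abc")
    except ValueError:
        return traceback.format_exc()


quotient = safe_divide(4, 2)
failed = safe_divide(1, 0)
trace_text = format_failure()

# Suppress a specific error without a try/except block.
lookup = {}
with contextlib.suppress(KeyError):
    lookup["missing"]

print(quotient, failed)
print(trace_text)
```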
-
The character used by the operating system to separate pathname components:
os.sep
-
Iterate through a path to get files/folders of all the subpaths
-
Write file:
f.write(str)
-
print without new line:
print(..., end="")
-
Get environment path (second param is optional):
import os; os.getenv(<PATH_NAME> : str, <alternative-return-value>: str)
-
- Set variable:
os.environ['redis'] = "localhost:6379"
- Get value with key:
import os; os.environ["HOMEDIR"]
- Get value with default value:
database_url = os.environ.get("DATABASE_URL", "default-value")
-
Check if path is a folder:
os.path.isdir(<path>)
-
Get root path:
root_path = os.path.expanduser("~/.root")
-
Get file size in bytes:
from pathlib import Path; outsize : int = Path(inputfilepath).stat().st_size
import os; outsize : int = os.path.getsize(inputfilepath)
-
Create folder:
os.mkdir(<path>)
-
Create folders recursively:
os.makedirs(<path>)
-
Rename file:
os.rename(<filepath-from>, <filepath-to>)
/os.rename(<dirpath-from>, <dirpath-to>)
-
Get folder path out of given path with filename:
os.path.dirname(<path-to-file>)
-
Expand home directory:
os.path.expanduser('~')
-
Get current working directory (where the main script was launched from):
os.getcwd()
-
Get the list of all files and directories in the specified directory (does not expand to items in the child folder):
os.listdir(<path>)
-
Get current file path (getcwd will point to the running script(main) path, this will get individually py path):
os.path.dirname(os.path.abspath(__file__))
-
Get filename from path:
os.path.basename(configfilepath)
-
Split extension from rest of path(Including .):
filename, ext = os.path.splitext(path)
-
Append certain path:
sys.path.append(<path>)
-
Check if path exist:
os.path.exists(<path>)
-
Remove a file:
os.remove(<path>)
-
Get size of current file in byte:
os.path.getsize(<path>)
or from pathlib import Path; Path(<path>).stat().st_size
-
Removes an empty directory:
os.rmdir(<path>)
-
Deletes a directory and all its contents:
import shutil;shutil.rmtree(<path-to-directory>)
-
open(<path-to-file>, mode)
- r: open a text file for reading
- w: open a text file for writing
- a: open a text file for appending
- b: open to read/write as bytes
Reading a file has 3 functions:
- read() or read(size): read all / size as one string.
- readline(): read a single line from a text file and return the line as a string.
- readlines(): read all the lines of the text file into a list of strings.
- write(<param> : str): write in param. Need to explicitly add \n to split lines.
- .close(): close the file object
- closed: check if the file object is closed
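A sketch tying together the file and path items above: create folders, write, read back, and walk all subpaths. It works inside a temporary directory; the folder and file names are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "a", "b")
    os.makedirs(nested)                      # create folders recursively

    target = os.path.join(nested, "notes.txt")
    with open(target, "w", encoding="utf-8") as f:
        f.write("first line\n")              # newline must be added explicitly
        f.write("second line\n")

    with open(target, "r", encoding="utf-8") as f:
        lines = f.readlines()                # list of lines, newline included

    # Iterate through every subfolder and collect file paths relative to root.
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))

print(lines, found, f.closed)
```

Using `with` closes the file automatically, so an explicit `.close()` is not needed here.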
-
- Get system input
- Check operating system:
import platform; platform.system()
- Check if port is open/close
- Measure Time Performance with time.time() / time.perf_counter()
- Add delay to execution of the program by pausing:
import time;time.sleep(seconds)
- Note: stops the execution of current thread only
- Point to a later time from now
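A minimal sketch for measuring elapsed time and pointing to a later time. The 0.01 second sleep and the 5 minute offset are arbitrary.

```python
import time
from datetime import datetime, timedelta

start = time.perf_counter()          # monotonic clock suited to benchmarking
time.sleep(0.01)                     # pauses the current thread only
elapsed = time.perf_counter() - start

deadline = datetime.now() + timedelta(minutes=5)  # a point 5 minutes from now

print(f"{elapsed:.4f} s", deadline)
```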
- Matplotlib
- Effective way to view object address and object
- Reserved methods in class
- The magic variable *args and **kwargs: Quick Review Elaborated Notes
- Check if object is of specified type:
isinstance(obj, MyClass)
/isinstance(obj, (type1, type2) : tuple)
- Deep Copy, Shallow Copy
- Copy list by value:
list_cp = list_ori[:]
(Note: list_cp = list_ori
copies by reference)
- Define dataclass @dataclass
- dataclass 1
- dataclass 2
- dataclass 3
- Compare normal class definition with dataclass definition
- Layout output of dict for dataclass class
- dict as constructor input
- Enum
- Enum get key:
obj.name
- Enum get value:
obj.value
- Implement Enum in Python
- Compare enum:
value == EnumObject.OPTION1
- Enum with string
- str to enum
- Get all the values of enum:
[e.value for e in Directions]
- Serialize class object
- [Function/Module with error handling](notes/class/function_with error_handling.ipynb)
- Identify if function did not return object. TLDR: if not test1()
- Compare class object
- Static Variable
__dict__
: return all attributes of an object (only those defined in init): obj.__dict__
__str__
: return string representation of the obj: def __str__(self):
__eq__
: compare the instances of the class: def __eq__(self, other):
__repr__
: represent a class's objects as a string. Call object with repr(obj)
__call__
: to make class instance callable: classinstance(variable)
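A sketch combining several class items above: Enum with string values, str to enum, a dataclass, and a dict as constructor input. `Direction` and `Waypoint` are hypothetical names.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class Direction(Enum):
    NORTH = "north"
    SOUTH = "south"


@dataclass
class Waypoint:
    name: str
    heading: Direction
    distance_km: float = 0.0


# dict as constructor input; Direction("north") converts str to enum by value.
wp = Waypoint(**{"name": "camp", "heading": Direction("north")})
same = Waypoint("camp", Direction.NORTH)

print(wp == same)                       # dataclass generates __eq__
print(wp.heading.name, wp.heading.value)
print([e.value for e in Direction])
print(asdict(wp))                       # dataclass to dict
```

`Direction["SOUTH"]` looks a member up by key, while `Direction("south")` looks it up by value.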
- Find matching word/character 1
- Introduction of functions in re library
- Square brackets for upper and lower case
[Ww]oodchuck
- Find matching word/character 2
- Optional character with
?
- Optional 0 or more character with
*
- Optional 1 or more character with
+
- Any character with
.
- Find matching word/character 3
- Whitespace character find with
\s
- Non-whitespace character find with
\S
- Find matching word/character 4
- Caret before square bracket:
^[]
to indicate beginning
- Dollar sign after square bracket:
[]$
to indicate ending
- Negation
- Disjunction
- To match a series of patterns with parenthesis.
- Extract hashtags
- Extract numbers from string
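The regex items above can be tried out with the `re` module. The sample sentence is made up for the sketch.

```python
import re

text = "Woodchuck and woodchuck ate 12 nuts in 3 days #hungry #forest_life"

chucks = re.findall(r"[Ww]oodchuck", text)         # square brackets for either case
hashtags = re.findall(r"#(\w+)", text)             # capture the word after each #
numbers = [int(n) for n in re.findall(r"\d+", text)]
starts_upper = bool(re.search(r"^[A-Z]", text))    # caret anchors to the beginning
colours = re.findall(r"colou?r", "color colour")   # ? makes the u optional

print(chucks, hashtags, numbers, starts_upper, colours)
```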
- yield instead of return link1 yield, iterators, generators
- Produce a new iterable with map()
- Generate a new iterable with Boolean-return function with filter()
- Produce a single cumulative value from iterable with reduce()
- Condition checking with any()
- Multiple function declaration with singledispatch
- Lambda function:
x = lambda a, b : a * b
Note: Functional style can be replaced with list comprehension or generator expressions
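A compact sketch of the functional items above (map, filter, reduce, yield, singledispatch). The names `countdown` and `describe` are hypothetical.

```python
from functools import reduce, singledispatch

values = [1, 2, 3, 4]

doubled = list(map(lambda x: x * 2, values))          # new iterable per item
evens = list(filter(lambda x: x % 2 == 0, values))    # keep items where True
total = reduce(lambda acc, x: acc + x, values, 0)     # single cumulative value


def countdown(n):
    """Generator: yields values lazily instead of returning a list."""
    while n > 0:
        yield n
        n -= 1


@singledispatch
def describe(value):
    return "object"            # fallback for unregistered types


@describe.register
def _(value: int):
    return "int"


@describe.register
def _(value: str):
    return "str"


print(doubled, evens, total, list(countdown(3)))
print(describe(1), describe("a"), describe(1.5))
```

As the note above says, `[x * 2 for x in values]` and `[x for x in values if x % 2 == 0]` give the same results as the map and filter calls.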
- Number swapping
- Reverse value
- Fibonacci Memoization
- Number of Pairs, Number of 3 value combination
- Factorial
- Tree
- from abc import ABC
- from abc import ABCMeta
- Difference between importing ABC and ABCMeta
- TLDR: ABC is a wrapper around ABCMeta; both serve the same purpose, and the former is easier to write.
- Unnamed arguments
- Named arguments:
:TODO
- Filename as argument
- Read from config file
- How to comment on config file(*.ini): Put
#
sign in front of an empty line
- Using .env Files for Environment Variables in Python Applications
When to use configparser? When to use .env?
#### TLDR:
Use .env to save string-variable values which must never be exposed in a code versioning platform/docker
### use .env
- the . of filename make it hidden
- already excluded in preset .gitignore
- Nearly every programming language has a package or library that can be used to read environment variables from the .env file instead of from your local environment.
- when the .env file is not found, variables are read from the host environment instead (for docker environment)
### use configparser
- import with more built-in variable types (int, string, boolean) and checks to perform upon the value
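To illustrate the typed getters that configparser offers, here is a small sketch. The section name, keys, and values are invented, and the config is read from a string rather than a real .ini file.

```python
import configparser

sample_ini = """
# lines starting with # are comments
[database]
host = localhost
port = 5432
debug = yes
"""

config = configparser.ConfigParser()
config.read_string(sample_ini)       # config.read(<path>) would load a file

host = config["database"]["host"]                # always a str
port = config.getint("database", "port")         # parsed as int
debug = config.getboolean("database", "debug")   # yes/no, on/off, true/false
timeout = config.getfloat("database", "timeout", fallback=30.0)  # missing key

print(host, port, debug, timeout)
```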
- Dataframe - column-major, Numpy - row-major
- Memory Profiling
- Execution Time Profiling with line_profiler
Difference of pool(from multiprocessing) from thread:
pool spins up different processes while thread stay in the same process
The goal of pool (multiprocessing) is to maximize the use of cpu cores.
- Basic:
import logging, sys; logger = logging.getLogger(__name__); logging.basicConfig(stream=sys.stdout, level=logging.INFO)
- Logging Levels: DEBUG, INFO, WARNING, ERROR, CRITICAL (WARN and FATAL are aliases of WARNING and CRITICAL)
- Advanced configuration log to stdout
- Advanced configuration log to file
- Log with variables:
logging.error(f"Keys {a} is missing")
- Log exception
- Logging write to both stdout and file
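One possible way to log to both stdout and a file, including an exception with its traceback. The logger name and the temporary log path are assumptions for the sketch.

```python
import logging
import os
import sys
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")

logger = logging.getLogger("cheatsheet_demo")
logger.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

# One handler per destination; both receive every record.
for handler in (logging.StreamHandler(sys.stdout), logging.FileHandler(log_path)):
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.info("service started")
try:
    1 / 0
except ZeroDivisionError:
    logger.exception("division failed")   # logs at ERROR level with traceback

for handler in logger.handlers:
    handler.flush()
with open(log_path, encoding="utf-8") as f:
    log_text = f.read()
```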
- Factory method
- Abstract Factory
- Monkey Patching
- Singleton: A singleton is a class with only one instance.
- Decorator
- Class Method @classmethod: take
cls
as first parameter (have access to internal fields and methods)
- Static Method @staticmethod: can take no parameters, basically just a function
- When to use @classmethod, @staticmethod
- Class method can modify the class state; it is bound to the class and takes cls as its first parameter.
def test(cls, ...): cls.variable = ...
- Static method cannot modify the class state; it is bound to the class but does not know about the class or instance.
def test(variable): ...
- dataclass @dataclass
- Abstract class with ABCMeta and @abstractmethod
- Property Setting
- @property to prevent setting value
- Native Verbose Method
- Using built-in property function
- Using decorator
- getter: @property
- setter: @{variable}.setter
- deleter: @{variable}.deleter
- @lru_cache
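A sketch of the decorator items above: `@property` with a setter, a read-only property, and `@lru_cache`. The `Temperature` class and `fib` function are hypothetical examples.

```python
from functools import lru_cache


class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def celsius(self):                 # getter
        return self._celsius

    @celsius.setter
    def celsius(self, value):          # setter with validation
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    @property
    def kelvin(self):                  # no setter, so assignment is blocked
        return self._celsius + 273.15


@lru_cache(maxsize=None)
def fib(n):
    """Memoized Fibonacci: each n is computed once and then cached."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


t = Temperature(20.0)
t.celsius = 25.0
print(t.celsius, t.kelvin, fib(30))
```

Assigning to `t.kelvin` raises AttributeError because no setter is defined for it.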
- Simple check, raise AssertionError if wrong:
format: assert condition, message when error raised
assert a == 20, "val a == 20"
assert isinstance(a, int)
- List
- Tuple
- Set
# Prior to python 3.9
from typing import List, Tuple, Set
items: List[str]
values : Tuple[int, str, str]
products : Set[bytes]
# python 3.9 onwards, no import needed
items: list[str]
values : tuple[int, str, str]
products : set[bytes]
- Any :
from typing import Any; variable : Any
- Union / Optional
- Simplification of Union from python 3.10 onwards:
var1 : str | None
- Literal
- Annotated
- Before python 3.9:
from typing_extensions import Annotated
- Python 3.9 onwards:
from typing import Annotated
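A sketch of the Union/Optional, Literal, and Annotated items, assuming Python 3.9 or later. The function names and the metadata string are made up; type hints are not enforced at runtime.

```python
from typing import Annotated, Literal, Optional, Union

Mode = Literal["r", "w", "a"]                 # only these literal values are valid
PositiveInt = Annotated[int, "must be > 0"]   # int plus free-form metadata


def open_mode(mode: Mode = "r") -> str:
    return mode


def find_user(user_id: PositiveInt) -> Optional[str]:
    """Optional[str] means str or None."""
    users = {1: "john"}
    return users.get(user_id)


def to_text(value: Union[int, str]) -> str:
    """Union[int, str] accepts either type."""
    return str(value)


print(open_mode("w"), find_user(1), find_user(2), to_text(5))
```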
- Jinja: Template Engine for Python. TLDR: Use text-based templates to transform
Hello, {{ user_name }}!
to Hello, John!
- Kill after x amount of time if process not complete
- Context Managers
- Open url with webbrowser module
- In script
- In command:
python -m webbrowser -t "https://www.python.org"
- Declare redis:
r = redis.Redis(host='127.0.0.1', port=6379)
- Set key:
r.set('counter', 1)
- Get key:
value : bytes | None = r.get('counter') # returns bytes by default; use int(value) to convert
- Check if key exist:
r.exists(<key>)
- Set expiry and check ttl
- Get IP from domain name:
import socket;socket.gethostbyname("www.google.com");
- Get host name of the machine:
socket.gethostname()
- Password hashing with library bcrypt - saltround
- Password hashing with passlib backed with bcrypt (Used in Fast API)
- Form Data
- Send image via UploadFile
- Client upload file to FastAPI Uploadfile and get response
- Return content from url and write image
- Connect to db with sqlalchemy
- Silence the log:
create_engine(..., echo = False)
- SQLAlchemy query with name and value insertion
- Postgres connect to AWS RDS
- Local Node
Save and load image between REST and Postgres. Obsolete: large files (including image) should be saved to storage
Save and load video between REST and Postgres. Obsolete: large files (including image) should be saved to storage
-
What is a bucket in S3
A bucket is a container for objects stored in Amazon S3, which can contain files and folders. You can store any number of objects in a bucket and can have up to 100 buckets in your account
- Check if cuda is available -
import torch; torch.cuda.is_available()
- Softmax
Torch Tensor Creation
- Create tensor of zeros with shape like another tensor:
torch.zeros_like(another_tensor)
- Create tensor of zeros with shape (tuple):
torch.zeros(shape_in_tuple)
- Create tensor of ones with shape like another tensor:
torch.ones_like(another_tensor)
- Create tensor of ones with shape (tuple):
torch.ones(shape_in_tuple)
- Create tensor of random floating value between 0-1 with shape like another tensor:
torch.rand_like(another_tensor, dtype = torch.float)
- Create tensor of random floating value between 0-1 with shape (tuple):
torch.rand(shape_in_tuple)
Torch Tensor Info Extraction
- Given a single-value torch.tensor, get the value with item():
buffer = torch.tensor(4); id = buffer.item()
- Given torch.tensor, get the argmax of each row -
torch.argmax(buffer, dim=<(int)dimension_to_reduce>)
- Tensor to cuda -
inputs = inputs.to("cuda:0")
or inputs = inputs.cuda()
- Tensor to cpu -
inputs = inputs.to("cpu")
or inputs = inputs.cpu()
- Tensor shape -
tensor.shape
- Tensor data types -
tensor.dtype
- Device tensor is stored on -
tensor.device
- Torch tensor(single value) to value:
tensorarray.item()
- Retrieve subset of torch tensor by row index:
tensor[<row_number>, :]
/tensor[<row_number_from>:<row_number_to>, :]
- Retrieve subset of torch tensor by column index:
tensor[:, <column_number_from>:<column_number_to>]
Torch Tensor Conversion
- List to torch tensor -
torch.tensor(listimp)
- Numpy array to torch tensor -
torch.from_numpy(np_array)
- Torch tensor to numpy:
tensorarray.numpy()
- Image to torch tensor
- Torch tensor to image
Torch Tensor Operation
- Torch tensor value change by indexing and conditions
- Concatenate tensor according to dimension (0 for adding rows, 1 for adding columns):
torch.cat([<tensor_1>, <tensor_2>, ...], dim = <dimension_number>)
Dataset Loader, Iterator
torch.utils.data.Dataset
: stores the samples and their corresponding labels
torch.utils.data.DataLoader
: wraps an iterable around the Dataset to enable easy access to the samples
Torch Tensor In/Out
- Save torch tensor to file:
torch.save(x : torch.tensor, tensorfile :str)
- Load torch tensor from file:
torch.load(tensorfile :str)
-
-
Fashion MNIST Torch
Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes.
-
- Send model to cuda -
model.to('cuda:0')
or model.cuda()
- Overview of DatasetDict
- DatasetDict from Pandas Dataframe
- Get image shape:
img.shape
(Important: shape[0] = height, shape[1] = width)
- Create a color image:
image = np.zeros((h,w,3), np.uint8)
- Read/Write image:
- As byte
- As Bytearray
- As base64
- From imageio (save with numpy array)
- Read only 3 channels:
im3d = imageio.imread('path/to/some/singlechannelimage.png', pilmode='RGB')
- Read image
- Read image from url
- Read in image with Pillow
- Pillow read in image from np.array:
im = Image.fromarray(nprrayimage)
- Read in image from imageio
- Pause to display image or wait for an input:
cv2.waitKey(0)
- Save an image:
cv2.imwrite(pathtoimg : str, img : numpy.ndarray)
- Show an image in window:
cv2.imshow(windowname : str, frame : np.array)
- Show an image in Jupyter notebook
from IPython.display import Image; Image(filename=pathtoimg) # pathtoimg : str
- Crop image
- numpy array:
image[y0:y1, x0: x1, :]
- Flip image:
frame = cv2.flip(frame, flipcode : int)
- Positive flip code for flip on y axis (left right flip)
- 0 for flip on x axis (up down)
- Negative for flipping around both axes
Computer Vision - Filter
- Blur with averaging mask:
cv2.blur(img,(5,5))
- GaussianBlur:
blur = cv2.GaussianBlur(img,(5,5),0)
- Note: Kernel size
(5, 5)
to be positive and odd. Read more here on how kernel size influence the degree of blurring.
- Blurring region of image
Computer Vision - Video Stream
- Play video in jupyter notebook/lab
- Get total number of frames in the video:
int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
- change video frame current count to desired frame count
- Concat multiple video streams to show side by side: 2 video streams 3 video streams
- Save stream to video output
- opencv method (faces problems when replaying the generated video on AWS cloud services)
- imageio method
- Read in video stream from a file
- Read in stream from camera
- video arrays (in opencv) -> bytes -> np.array -> video arrays (in opencv)
- Merge audio with video
- Check if video comes with audio
- Split audio from video
Computer Vision - Other
- Overlay image
- Resizing frame:
outframe = cv2.resize(frame, (w, h))
- Set color to rectangle region
- RGB value to color text
- Color to gray image:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
- Draw circle:
image = cv2.circle(image, center_coordinates : tuple, example: (50, 100), radius : int, color : tuple, example: (255, 255, 255), thickness : int)
- bgr to rgb channel:
img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
- Draw rectangle and point on image with mouse activity
- Draw rectangle on image in jupyter
- Write text on image
- Remove background
- Weighted blend two image with
cv2.addWeighted
- Add channel to image
- Draw contour
- Audio of .wav -> .flac
- Get sampling rate of an audio file
- Audio file <> Numpy Array
- Play audio file in Jupyter
- Read an audio file:
array, sampling_rate = librosa.load(audiopath)
- Get duration of audio:
librosa.get_duration(path=x)
- Show waveform
- Show spectrogram, log mel spectrogram