GitHub - whitepawglobal/bite-size-python: Bite-size code to facilitate your daily data science usage 🍪

Bite-Size Python

Package Installation

Install package with pip

pip install <package-name>. Example: pip install numpy

For more pip commands, check out pip guidelines document

Install package with conda

conda install <package>. Example: conda install numpy

For more conda commands, check out conda guidelines document

Basic
Intermediate
Advanced
Software Development
Machine Learning
Medium Posts

Basic

Comment

Single Line Comment: # sample text

Multi Lines Comment:

 """
 Hello World!
 Nice to meet all of you cookie monsters!
 """

Boolean Operator

bool from int (0, 1): bool(0) / bool(1)
X and Y
X or Y
if not X
custom object boolean
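
The "custom object boolean" item can be sketched as follows. The class name `Basket` is hypothetical and exists only for illustration; Python calls `__bool__` (falling back to `__len__`) whenever an object is used in a truth test.

```python
class Basket:
    """Hypothetical container used only for illustration."""

    def __init__(self, items=None):
        self.items = list(items or [])

    def __bool__(self):
        # Truthy only when the basket holds at least one item.
        return len(self.items) > 0


empty_basket = Basket()
full_basket = Basket(["cookie"])

print(bool(0), bool(1))                        # False True
print(bool(empty_basket), bool(full_basket))   # False True
print(not empty_basket)                        # True
```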

Maths

Define Nan, Infinite
Sum up an array: sum(arr)
Max, returns maximum between two values: max(a, b)
Min, returns minimum between two values: min(a, b)
Atan2: import math; math.atan2(90, 15)
Asin: import math; math.asin(0.5)
Sin: import math; math.sin(1)
Cos:import math; math.cos(1)
Factorial: import math; math.factorial(1)
Round up a number to a certain decimal point: round(value, 1)
Calculate percentile
Power of a number: pow(base_number, exponent_number)
Square root of a number: import math; math.sqrt(number)

Ceiling

import math
value : int = math.ceil(invalue)

Floor

import math
value : int = math.floor(invalue)

Logarithm / Log
- Log to the base of 2:
  - Numpy: import numpy as np; np.log2(10)
  - Math: import math; math.log2(10)
  - Plotting of log to the base of 2
- Log to the base of 10:
  - Math: import math; math.log10(10)
Exclusive Or (XOR)
- Swap two numbers with XOR

Math-others

Unique combination pair

Sorting

Data Types

Integer

Get maximum: import sys; sys.maxsize
Get minimum: import sys; -sys.maxsize - 1

Floating Value (float, double)

Format floating value to n decimal: "%.2f" % floating_var / print("{:.2f}".format(a))

Bytes

Notes:

Difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified (immutable), 
and bytearray() returns an object that can be modified (mutable).

Numpy <> Bytes, Bytes <> Numpy
Bytes -> String: bytesobj.decode("utf-8")
String -> Bytes: strobj.encode("utf-8")
Bytes -> Multimedia file (video/audio)
Check bytes encoding
To Bytes: bytes(<value>)
Get size of bytes object: import sys;sys.getsizeof(bytesobject)
Split bytes to chunks
- The effect is less overhead in transmitting tasks to worker processes and collecting results.
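
The string/bytes conversions and the "Split bytes to chunks" item could look like the sketch below. The helper name `split_bytes` and the chunk size are assumptions made for this example.

```python
def split_bytes(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


payload = "cookie monster".encode("utf-8")    # String -> Bytes
chunks = split_bytes(payload, 4)
print(chunks)  # [b'cook', b'ie m', b'onst', b'er']

restored = b"".join(chunks).decode("utf-8")   # Bytes -> String
print(restored)  # cookie monster
```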

ByteArray

Notes:

Difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified (immutable), 
and bytearray() returns an object that can be modified (mutable).

Integer to Bytearray
Native Array to Bytearray
Numpy Array to Bytearray
[Image as Bytearray](notes/cv/image_as bytearray.ipynb)
Check bytes array encoding
To ByteArray: bytearray(<value>)
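
As a rough illustration of the mutability note and the "Integer to Bytearray" item, the sketch below assumes a 2-byte big-endian encoding; choose the length and byte order that your protocol requires.

```python
number = 1024
# int.to_bytes needs an explicit length and byte order.
as_bytearray = bytearray(number.to_bytes(2, byteorder="big"))
print(as_bytearray)  # bytearray(b'\x04\x00')

native = bytearray([72, 105])
native[1] = 111      # in-place modification is allowed
print(native)        # bytearray(b'Ho')

frozen = bytes([72, 105])
mutation_error = None
try:
    frozen[1] = 111  # bytes is immutable
except TypeError as error:
    mutation_error = error
print(type(mutation_error).__name__)  # TypeError
```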

Numpy

Numpy basic
- numpy array with int random value: np.random.randint(5, size=(2, 4))
Check if numpy array has true value: np.any(<np-array>)
Get numpy shape: nparray.shape
Numpy array to list: nparray.tolist()
List to numpy array: np.array(listarray)
Change datatype: nparray = nparray.astype(<dtype>) Example: nparray = nparray.astype("uint8")
Numpy NaN (Not A Number): Constant to act as a placeholder for any missing numerical values in the array: np.nan (preferred spelling; the np.NaN alias was removed in NumPy 2.0, so avoid the capitalized aliases np.NaN / np.NAN)
Numpy multiply by a value: nparray = nparray * 255
Numpy array to image
Numpy array to Torch tensor: torch.from_numpy(nparray)
Numpy <> Binary File(.npy)
Print numpy array without scientific notation (e-2)

Set numpy print options to suppress scientific notation

np.set_printoptions(suppress=True, precision=10)
print(predictions)

Use of numpy.where
Get minimum value of numpy array: np.amin(array)
Get maximum value of numpy array: np.amax(array)
Calculate euclidean distance with numpy

Opencv Numpy array to bytes

targetimage : np.array
success, encoded_targetimage = cv2.imencode(".png", targetimage)
encoded_targetimage.tobytes()

String

Generate string with parameter
- Using f-string: print(f'Completed part {id}')
- Generate string with templates
- String formatting method: print('Completed part %d' % part_id)
- create string with its repr (quoted) form: varname="world"; print(f"Hello {varname!r}")
Check if string is empty, len = 0: strvar = ""; if not strvar:
Check if string contains digit: any(chr.isdigit() for chr in str1) #return True if there's digit
Check file extension: notes/string/check_file_extension.ipynb
Capitalize a string: strvar.capitalize()
Uppercase a string: strvar.upper()
Lowercase a string: strvar.lower()
Capitalize the beginning of each word: strvar.title()
Get substring from a string: strvar[<begin-index>:<end-index>] / strvar[<begin-index>:] / strvar[:<end-index>]
Strip multiple white spaces into only one
Remove white spaces in the beginning and end: strvar.strip()
Swap existing upper and lower case: strvar.swapcase()
Capitalize every first letter of a word: strvar.title()
Splitting string:
- Split a string based on separator: strvar.split(separator) Example: strvar.split("x")
- Split on white space: strvar.split()
- If split with every character, do this instead: [*"ABCDE"] Result: ["A", "B", "C", "D", "E"]
Check if string starts with a substring: strvar.startswith(<substring>)
Check if string ends with a substring: strvar.endswith(<substring>)
Check if string contains a substring/specific character. Returns -1 if not found: strvar.find(input : str), strvar.find(input: str, start_index : int)
String get substring with index: str[startindex:endindex]
Replace string/character with intended string/character: strout = strin.replace(" ", "_")
Replace multiple characters with intended character
Replace multiple string with intended string
Generate string
String to List/Dict: import ast; ast.literal_eval(strinput) (prefer this over eval(strinput), which executes arbitrary code)
List to string: <separators>.join(list) example: ', '.join(listbuffer)
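
Three of the items above ("Strip multiple white spaces into only one", "Replace multiple characters" and "Replace multiple string") have no inline snippet. One possible approach with the standard library is sketched here; the sample strings and the `replacements` mapping are made up for illustration.

```python
import re

messy = "  bite   size \t python  "
# split() with no argument splits on any whitespace run and drops the ends.
single_spaced = " ".join(messy.split())
print(single_spaced)  # bite size python

# Replace several characters at once with str.translate.
table = str.maketrans({"-": "_", " ": "_", ".": "_"})
translated = "a-b c.d".translate(table)
print(translated)  # a_b_c_d

# Replace several substrings in one pass with a regular expression.
replacements = {"cat": "dog", "red": "blue"}
pattern = re.compile("|".join(map(re.escape, replacements)))
swapped = pattern.sub(lambda m: replacements[m.group(0)], "red cat")
print(swapped)  # blue dog
```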

Unique Identifier (UUID)

Generate unique identifier UUID
Validate if a string is UUID
Compare if both UUID are the same
UUID to string: str(uuidparam)
string to UUID: uuid.UUID(uuid_in_str)
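
A small sketch of generating, validating and comparing UUIDs with the standard `uuid` module. The helper `is_valid_uuid` is a hypothetical name; note that `uuid.UUID` is lenient and also accepts forms such as hex strings without hyphens.

```python
import uuid


def is_valid_uuid(text: str) -> bool:
    """Return True when text parses as a UUID."""
    try:
        uuid.UUID(text)
    except ValueError:
        return False
    return True


new_id = uuid.uuid4()              # random UUID
as_text = str(new_id)              # UUID to string
round_trip = uuid.UUID(as_text)    # string to UUID

print(is_valid_uuid(as_text))       # True
print(is_valid_uuid("not-a-uuid"))  # False
print(round_trip == new_id)         # True
```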

Datetime

date, datetime create
datetime: datetime.ipynb
- get current local date and time: datetime.now()
- get utc date and time: datetime.now(timezone.utc) (datetime.utcnow() is deprecated since Python 3.12)
- time to str and reverse
find differences of two datetime: use divmod
date, datetime comparison

datetime.timedelta

from datetime import datetime, timedelta
# `before` is a datetime captured earlier, e.g. before = datetime.now()
timediff : timedelta = datetime.now() - before
print(timediff.microseconds)
print(timediff.seconds)
print(timediff.days)
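
The "use divmod" hint for datetime differences can be sketched like this. The two timestamps are arbitrary examples. Note that `timedelta.seconds` holds only the part of the difference below one day, so `total_seconds()` is used for the full span.

```python
from datetime import datetime

start = datetime(2024, 1, 1, 8, 0, 0)
end = datetime(2024, 1, 2, 9, 30, 15)
difference = end - start

total_seconds = int(difference.total_seconds())
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(hours, minutes, seconds)              # 25 30 15
print(difference.days, difference.seconds)  # 1 5415
```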

Convert datetime between different time zone

Data Structure

List

List of str to int: list(map(int, arr))
List with range of values: list(range(...))
Split str to list of str: arr.split(" ")
Find if a value in a list: if value in mylist: / if value not in mylist:
Copy by value: array.copy() (to not impact changes in array after changing)
Sort an array in place: arr.sort() / Return a sorted array: sorted(arr)
- Sort an array in reverse: arr.sort(reverse = True) / Return a sorted array: sorted(arr, reverse = True)
Get index of a value: arr.index(value) (When not found will raise ValueError)
Add one more value to existing list: arr.append(value)
Insert at index: arr.insert(index, value)
Extend list with values in another list: arr.extend(arr2)
Remove an item (the first item found) from the list: arr.remove(item)
Remove item by index: del arr[index] or del arr[index-start: index-end]
Check for empty list: arr = []; if not arr: #empty list
Clear a list: arr.clear()
Get list subset: list[start:stop:step] example array[0::2] a[0:6:2]
Check all items in a list(subset) if exist in another list, returns boolean: set(b).issubset(v)
Check unordered list to have the same items, returns boolean: set(a) == set(b)
Change values of list with List Comprehension: [func(a) for a in sample_list]
Iteration of list with index: for index, value in enumerate(inlist):
- Enumerate with a beginning index: for index, value in enumerate(inlist, 2): (Index comes as second parameter)
Iteration over two lists: [<operation> for item1, item2 in zip(list1, list2)]
Count occurrence of items in list: array.count(val)
Get maximum value in a list of numbers (even strings): max(samplelist)
Get argument of minimum / maximum value
Reverse a list: list(reversed([1, 2, 3, 4])) / listinput.reverse()
list to string: ",".join(bufferlist)
Remove a value in list by index: returnedvalue = listarray.pop(index) (Note: raises IndexError if index is not valid)
- Remove last value: listarray.pop()
  - Stack Implementation with list
  - Queue Implementation with list
List Counter
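
A minimal sketch of the stack, queue and counter items. The list above mentions a queue built on a list; this example swaps in `collections.deque` instead, because `list.pop(0)` has to shift every remaining element.

```python
from collections import Counter, deque

# Stack (last in, first out) with a plain list.
stack = []
stack.append("a")
stack.append("b")
top = stack.pop()
print(top)  # b

# Queue (first in, first out) with deque.
queue = deque(["a", "b"])
queue.append("c")
first = queue.popleft()
print(first)  # a

# Count occurrences of every item.
counts = Counter(["x", "y", "x"])
print(counts["x"], counts.most_common(1))  # 2 [('x', 2)]
```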

Build list

Build list of same values: ['100'] * 20 # 20 items of the value '100'
Build multiple list into one: lista + listb + listc
Build list by breaking down every character of a string: [*'abcdef']

Dictionary

Define dict with str keys
Define dict from two lists: dict(zip(list1, list2))
Add new key value pair: dict.update({"key2":"value2"})
Remove key<> value pair by referring to specific key
Get keys as list: list(lut.keys())
Get values as list: list(lut.values())
Create dict from list: {i: 0 for i in arr}
Remove existing key: del keyvalue['key']
Remove key<>value: value = keyvalue.pop(key, alternative-value-if-key-not-present)
Handling missing items in dict
Iteration to dict to get keys and values
Save/load dictionary to/from a file
Revert or inverse a dictionary mapping: inv_map = {v: k for k, v in my_map.items()}
Copy by value: sampledict.copy()
Decompose/unpack dictionary when passing as argument
- Use case: class declaration
Reverse key value pair to build inverse key value pair with zip
Dictionary to decide class to call with class as value
Change order of key value based on the key/value items

[collections.defaultdict]

defaultdict introduction
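
The items "Handling missing items in dict", "Decompose/unpack dictionary when passing as argument" and `defaultdict` might be used as in the sketch below. The function `label` and all sample data are hypothetical.

```python
from collections import defaultdict

lut = {"a": 1}
missing = lut.get("b", 0)            # default instead of KeyError
print(missing)                       # 0
lut.setdefault("c", []).append(2)    # create the key on first use
print(lut)                           # {'a': 1, 'c': [2]}

# defaultdict creates the default value automatically.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)
print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}


def label(name, size):
    return f"{name}:{size}"


# ** unpacks the dictionary into keyword arguments.
labelled = label(**{"name": "cookie", "size": 3})
print(labelled)  # cookie:3
```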

Set

Set initialization: setsample = {1,2,3,4,5}/ setsample = set()
Add item: setsample.add(<value>) example setsample.add((1,2)) (has to be tuple, not list)
Add multiple items: setsample.update(<another-set>)
Set with multiple-value input as set
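Remove and return an arbitrary value (sets are unordered, so there is no index): setsample.pop()
Remove value by value: setsample.remove(<value>) (raises KeyError if absent; setsample.discard(<value>) does not)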
Check if value exist in set: if value in setsample:
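
A short sketch of the set operations listed above, using made-up values. It also shows why `pop()` takes no index: a set has no ordering.

```python
setsample = {1, 2, 3}
setsample.add((1, 2))        # tuples are hashable, lists are not
setsample.update({4, 5})     # add several items at once
setsample.remove(3)          # raises KeyError if 3 is absent
setsample.discard(99)        # no error when the value is absent
removed = setsample.pop()    # removes and returns an arbitrary element

print(3 in setsample)        # False
print(len(setsample))        # 4
```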

Tuple

Build a tuple: var : tuple[bool, str | None] = tuple([True, "abc"])
List to tuple: tuple([1,2])

Named Tuple

NamedTuple
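
For reference, a minimal `typing.NamedTuple` sketch. The `Point` class is hypothetical.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float = 0.0   # fields may have defaults


p = Point(1.5)
print(p.x, p.y, p[0])      # 1.5 0.0 1.5
x, y = p                   # unpacks like a regular tuple
moved = p._replace(y=2.0)  # returns a new tuple; p is unchanged
print(moved)               # Point(x=1.5, y=2.0)
```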

Applicable to Python Iterables (List, Set,...)

To identify if any items in the iterables has True/1 values: any(sample_list) #returns single value True/False
Zip multiple iterables

JSON

Polars

Import: import polars as pl

View

Get header of dataframe: df.columns
View first n rows: df.head(n)
View random rows: df.sample(n)
Get number of rows: row_count = df.height (or df.select(pl.len()).item(); pl.count() is deprecated in recent Polars releases)

File IO

Dataframe from dict
Read in csv: pl.read_csv(...)
- read in csv changing column file type
  - data_pl = pl.read_csv('file.csv', schema_overrides={'col1': pl.Utf8, 'col2': pl.Utf8}) (older Polars releases name this parameter dtypes)
- read_csv without header (to prevent the first row from becoming the header)
  - pl.read_csv(datapath, has_header = False)
Write to csv: write_csv(file : str, has_header: bool = True, separator : str = ",")
Read excel: pl.read_excel(source : str |..., sheet_name : str, engine = "openpyxl")

Data Manipulation

Assign column name to dataframe: df.columns = column_name
Create empty data frame: pl.DataFrame()
Dataframe from dict: df = pl.from_dict({"name": name_list, "id": id_list})
Change header: outdf = df.rename({"foo": "apple"}) # foo is previous title, apple is new title
Get unique values of one/a few columns: df[['column_name']].unique()
Conversion
Column to list: df["a"].to_list()
Check if dataframe is empty: df.is_empty()
Reorder column: df = df[['PRODUCT', 'PROGRAM', 'MFG_AREA']]
Drop Column: df.drop("<column-name>") / df.drop(["<column-name1>", "<column-name2>"])
Rename column name: df = df.rename(dict(zip(["column_name1_ori", "column_name2_ori"], ["column_name1", "column_name2"])))
Casting: out = df.select(pl.col("<col-name>").cast(pl.Int32))
Fill null with value: df.with_columns(pl.col("b").fill_null(99))
Remove rows with conditions using filter
Sort column value by order
Concatenate dataframe
- default concatenates on rows: pl.concat([df1, df2]) is equivalent to pl.concat([df1, df2], how="vertical"); use pl.concat([df1, df2], how="diagonal") when the frames do not share the same columns
Group By
Drop duplicates whole /subset
Change sequence of columns in dataframe: df = df[['PRODUCT', 'PROGRAM', 'MFG_AREA']]
Add a new column with list: df.with_columns(pl.Series(name="column-name", values=prediction_list))
Apply function to a column: df=df.with_columns([(pl.col("<column-name>").map_elements(<function-to-apply>).alias("<new-column-name>"))])
Drop nulls: df = df.drop_nulls()
- Drop a row if all value is null: df.filter(~pl.all_horizontal(pl.all().is_null()))
Replace column values
Apply function to value:
- String operations
- Apply uppercase to column with string
- Replace value in column
- String remove whitespace front and back and in between
```
df = df.select(pl.col(pl.Utf8).str.strip_chars())
df = df.select(pl.col(pl.Utf8).str.replace(" ", ""))
```
Check if duplicated value in a column

Series

Check if any value in a Boolean Series is true: df.select(pl.col("a").is_duplicated())['a'].any()

Note

Pandas to Polars Cheatsheet

Modin

Install

pip install "modin[all]"

Import

import modin.pandas as pd

Pandas

Panda Infos

Dataframe basic
- Get # rows and columns
- Get summary/infos about dataframe
Get data types
Dataframe/Series Min, Max, Median, General Description
Get rows name (index) and columns name (column)
Get a glimpse of dataframe
Get subset of a dataframe by rows/by columns
Get rows by finding matching values from a specific column
Check if a column name exist in dataframe - if 'code' in df.columns:
Iteration of each rows in a dataframe

Panda Operations

Check if dataframe is empty: df.empty #return boolean
Get dataframe from list

Build dataframe with columns name

column_list = ["a", "b"]
df = pd.DataFrame(columns = column_list)

Build a new dataframe from a subset of columns from another dataframe
Get subset of dataframe, sample columns with specific criteria
- Sample by percentage
- Sample by # of rows specified
- Sample by matching to a value
Apply
Column to list: df.columns.tolist()
Sample rows: df = df.sample(frac=1).reset_index(drop=True)
Referring to dataframe column by key or by string
Concatenate dataframe
- Concatenate by adding rows
Append string to all rows of a column
Reset index without creating new (index) column: df.reset_index(drop=True)
Assign df by copy instead of reference - df.copy()
Shuffle rows of df: df = df.sample(frac=1).reset_index(drop=True)
Pandas with multiple index

Bytes to dataframe

  from io import BytesIO
  import pandas as pd

  data = BytesIO(bytesdata)
  df = pd.read_csv(data)

Sort values according to particular column df = df.sort_values(by=['frame'])
Rearrange dataframe column sequence

Panda Type

Panda Series

Series to value
Series/Dataframe to numpy array: input.to_numpy()
Series iteration: for index, item in seriesf.items():
Series to dict: seriesf.to_dict()

Panda Assign values

Panda Remove/drop values

Drop duplicates for df / subset, keep one copy and remove all
Remove/drop rows where specific column matched value
Remove specific columns with column name
Drop rows by index
Drop rows/columns with np.NaN: df3 = df3.dropna(axis = 0) #rows / df3 = df3.dropna(axis = 1) #columns

Panda SQL-like functions

pivot table: :TODO
- Drawback: Not able to do filtering selection
Merge two dataframes based on certain column values

Panda Filtering

Filter with function isin()
Filter df with item not in list
Filter with function query()
Find with loc
- df.loc[df['address'].eq('[email protected]')] #filter with one value
- df.loc[df.a.eq(123) & df.b.eq("helloworld")] #filter with one value in multiple columns
- df.loc[df.a.isin(valuelist)] #filter with a few values in a list
- Filter by substring: df.loc[df['folder'].eq(folderkey) & df['id'].str.contains(videokey)]
Assign value to specific column(s) by matching value
Get a subset of dataframe by rows - df.iloc[<from_rows>:<to_rows>, :]
Count items and filter by counter values
Retrieve columns name which match specific str

Panda Excel In/Out

Read in excel with specific sheet name: pd.read_excel(<url>, sheet_name = "Sheet1", engine = "openpyxl")
- Note: Install engine by pip install openpyxl
Read number of sheets in excel
Save excel: df.to_excel('file_name', index = False)
Write to multiple sheets

Panda CSV In/Out

Read csv with other delimiter pd.read_csv(<path-to-file>, delimiter = '\x01')
Read csv with bad lines pd.read_csv(<path-to-file>, on_bad_lines='skip')
- Note: pd.read_csv(<path>, error_bad_lines = False) deprecated
Read csv with encoding pd.read_csv('file name', encoding = 'utf-8')
Save to csv df.to_csv('file name', index = False)
- Note: Put index = False is important to prevent an extra column of index being saved.
Save to csv with encoding df.to_csv('file name', encoding = 'utf-8')
Write list/dict to csv file (Note: to not affected by the comma in the collection)

Panda Parquet In/Out

Read in parquet: pd.read_parquet(...)
Write to parquet: df.to_parquet(...)

Panda Pickle In/Out

Note: Pickle carries a security risk (loading untrusted data can execute arbitrary code) and is slow to serialize, even compared with csv and json. Avoid it.

Read in pickle to dataframe: df = pd.read_pickle(<file_name>) # ends with .pkl
Save to pickle: df.to_pickle(<file_name>)

Panda Dataframe Others

Random dataframe and database table generator

Random

Fixed random seed (generates the same sequence): random.seed(integer_value). Re-seed for a different sequence every run: random.seed()
Generate random floating value within 0-1: import random; random.random()
Generate random integer within (min, max). Both bounds included: from random import randint; randint(0, 100) #within 0 and 100
Generate random floating value: from random import random; random()
Randomly choosing an item out from a list: import random; random.choice([123, 456, 378])
- Weighted Sampling
Generate list with random number: import random; random.sample(range(10, 30), 5)
- Example shown where 5 distinct random numbers are drawn from 10 to 29 (range(10, 30) excludes 30)
Shuffle an array: random.shuffle(array)
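
The seeding behaviour and the "Weighted Sampling" item can be sketched as below. The labels and weights are made up; `random.choices` samples with replacement.

```python
import random

# The same seed reproduces the same sequence.
random.seed(42)
first_run = [random.randint(0, 100) for _ in range(3)]
random.seed(42)
second_run = [random.randint(0, 100) for _ in range(3)]
print(first_run == second_run)  # True

# Weighted sampling: "common" is nine times as likely as "rare".
picks = random.choices(["rare", "common"], weights=[1, 9], k=5)
print(len(picks))  # 5
```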

Intermediate

Error Handling

Native Catching Exception
- Catch multiple error: except (CompileError, ProgrammingError) as e:
Traceback
Suppress and log error
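
A minimal sketch combining the three items above: catching multiple exception types, capturing a traceback as text, and suppressing an error while logging it. The function `safe_divide` is hypothetical.

```python
import logging
import traceback

logger = logging.getLogger(__name__)


def safe_divide(a, b):
    """Return a / b, or None when the division cannot be performed."""
    try:
        return a / b
    except (ZeroDivisionError, TypeError) as error:
        # Suppress the error but keep a record of it.
        logger.error("division failed: %s", error)
        return None


try:
    int("abc")
except ValueError:
    # format_exc() returns the current traceback as a string.
    trace_text = traceback.format_exc()

print(safe_divide(4, 2))  # 2.0
print(safe_divide(1, 0))  # None
```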

Types of Built-In Exceptions

ValueError: argument of the correct data type but an inappropriate value
TypeError: the data type of an object is incorrect
IndexError: Raised when a sequence subscript is out of range
KeyError: When key cannot be found
ZeroDivisionError: when a number is divided by zero
OSError: error from an os-specific function
FileNotFoundError: when a file or directory is requested but doesn’t exist
IsADirectoryError: raised when os.remove(file) is called on a path that turns out to be a directory
NotImplementedError: commonly raised when an abstract method is not implemented in a derived class
NameError: reference to some name (variable, function, class) that hasn’t been defined
AttributeError: reference to certain attribute in a class which does not exist
ImportError: Trouble loading a module
- Submodule
  - ModuleNotFoundError: the module being imported cannot be found (importing a name that does not exist inside an existing module raises plain ImportError)
AssertionError: Raised when an assert statement fails

File System

The character used by the operating system to separate pathname components: os.sep
Iterate through a path to get files/folders of all the subpaths
Write file: f.write(str)
print without new line: print(..., end="")
Get environment path (second param is optional): import os; os.getenv(<PATH_NAME> : str, <alternative-return-value>: str)
Get and set environment path
- Set variable: os.environ['redis'] = "localhost:6379"
- Get value with key: import os; os.environ["HOMEDIR"]
- Get value with default value: database_url = os.environ.get("DATABASE_URL", "default-value")
Flush out print
Check if path is a folder: os.path.isdir(<path>)
Get root path: root_path = os.path.expanduser("~/.root")
Get file size
- from pathlib import Path; outsize : int = Path(inputfilepath).stat().st_size
- import os; outsize : int = os.path.getsize(inputfilepath)
Create folder: os.mkdir(<path>)
Create folders recursively: os.makedirs(<path>)
Rename file: os.rename(<filepath-from>, <filepath-to>) / os.rename(<dirpath-from>, <dirpath-to>)
Get folder path out of given path with filename: os.path.dirname(<path-to-file>)
Expand home directory: os.path.expanduser('~')
Get current working directory (where the script was launched from): os.getcwd()
Get the list of all files and directories in the specified directory (does not expand to items in the child folder): os.listdir(<path>)
Get current file path (getcwd will point to the running script(main) path, this will get individually py path): os.path.dirname(os.path.abspath(__file__))
Get filename from path: os.path.basename(configfilepath)
Split extension from rest of path(Including .): filename, ext = os.path.splitext(path)
Append certain path: sys.path.append(<path>)
Check if path exist: os.path.exists(<path>)
Remove a file: os.remove()
Get size of current file in byte: os.path.getsize(<path>) or from pathlib import Path; Path(<path>).stat().st_size
Removes an empty directory: os.rmdir()
Deletes a directory and all its contents: import shutil;shutil.rmtree(<path-to-directory>)
Copy a file to another path
Unzip file
Readfile
```
open(<path-to-file>, mode)
```
- r: Open for text file for reading text
- w: Open a text file for writing text
- a: Open a text file for appending text
- b: Open to read/write as bytes

Read file has 3 functions:
- read() or read(size): read all / size as one string.
- readline(): read a single line from a text file and return the line as a string.
- readlines(): read all the lines of the text file into a list of strings.
- write(<param> : str): write in param. Need to explicitly add \n to split line.
- close(): close the file object
- check if the file object is closed: f.closed
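
The modes and methods above can be tried with the sketch below. It writes to a temporary folder so nothing is left behind; the file name `cookies.txt` is arbitrary. A `with` block closes the file automatically, so an explicit `close()` is not needed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "cookies.txt")

    with open(path, "w", encoding="utf-8") as f:   # w: write (truncates)
        f.write("first\n")
        f.write("second\n")

    with open(path, "a", encoding="utf-8") as f:   # a: append
        f.write("third\n")

    with open(path, "r", encoding="utf-8") as f:   # r: read
        first_line = f.readline()    # one line, newline included
        remaining = f.readlines()    # the rest as a list of lines

    file_closed = f.closed           # True once the with block exits
    size_in_bytes = os.path.getsize(path)

print(repr(first_line))  # 'first\n'
print(remaining)         # ['second\n', 'third\n']
print(file_closed)       # True
```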

System

Get system input
Check operating system: import platform; platform.system()
Check if port is open/close

Time

Measure Time Performance with time.time() / time.perf_counter()
Add delay to execution of the program by pausing: import time;time.sleep(seconds)
- Note: stops the execution of current thread only
Point to a later time from now

Plotting

Matplotlib
- Plot with lines & dots

Advanced

Class

Effective way to view object address and object
Reserved methods in class
The magic variable *args and **kwargs: Quick Review Elaborated Notes
Check if object is of specified type: isinstance(obj, MyClass) / isinstance(obj, (type1, type2) : tuple)
Deep Copy, Shallow Copy
- Copy list by value: list_cp = list_ori[:] (Note: list_cp = list_ori copy by reference)
Define dataclass @dataclass
- dataclass 1
- dataclass 2
- dataclass 3
  - Compare normal class definition with dataclass definition
  - Layout output of dict for dataclass class
- dict as constructor input
Enum
- Enum get key: obj.name
- Enum get value: obj.value
- Implement Enum in Python
  - Compare enum: value == EnumObject.OPTION1
  - Enum with string
  - str to enum
- Get all the values of enum: [e.value for e in Directions]
Serialize class object
[Function/Module with error handling](notes/class/function_with error_handling.ipynb)
Identify if function did not return object. TLDR: if not test1()
Compare class object
Static Variable
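
A compact sketch of the Enum and dataclass items: key/value access, str to enum, listing values, and a dict as constructor input. `Direction` and `Step` are hypothetical names.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class Direction(Enum):
    NORTH = "n"
    SOUTH = "s"


@dataclass
class Step:
    direction: Direction
    distance: int = 1


# dict as constructor input; Direction("n") converts a value to an enum member.
step = Step(**{"direction": Direction("n"), "distance": 3})

print(step.direction.name, step.direction.value)  # NORTH n
print(Direction["SOUTH"] is Direction.SOUTH)      # True (lookup by key)
print([d.value for d in Direction])               # ['n', 's']
print(asdict(step)["distance"])                   # 3
print(step == Step(Direction.NORTH, 3))           # True (dataclass adds __eq__)
```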

Magic Method

__dict__ return all attributes of an object(only those defined in init): obj.__dict__
__str__ return string representation of the obj: def __str__(self):
__eq__ compare the instances of the class: def __eq__(self, other):
- Define eq function in class 1
- Define eq function in class 2
__repr__: represent a class's objects as a string. Call object with repr(obj)
__call__: to make class instance callable classinstance(variable)
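
The magic methods above, gathered in one hypothetical class `Cookie` as a rough illustration:

```python
class Cookie:
    def __init__(self, flavor, size):
        self.flavor = flavor
        self.size = size

    def __repr__(self):
        # Unambiguous form, ideally valid Python to recreate the object.
        return f"Cookie(flavor={self.flavor!r}, size={self.size!r})"

    def __str__(self):
        # Friendly form used by print() and str().
        return f"{self.size} cm {self.flavor} cookie"

    def __eq__(self, other):
        if not isinstance(other, Cookie):
            return NotImplemented
        return self.__dict__ == other.__dict__

    def __call__(self, bites):
        # Makes the instance callable: cookie(2)
        return max(self.size - bites, 0)


cookie = Cookie("oat", 5)
print(str(cookie))                 # 5 cm oat cookie
print(repr(cookie))                # Cookie(flavor='oat', size=5)
print(cookie == Cookie("oat", 5))  # True
print(cookie(2))                   # 3
print(cookie.__dict__)             # {'flavor': 'oat', 'size': 5}
```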

Regular Expression (Regex)

Find matching word/character 1
- Introduction of functions in re library
- Square brackets for upper and lower case [Ww]oodchuck
Find matching word/character 2
- Optional character with ?
- Optional 0 or more character with *
- Optional 1 or more character with +
- Any character with .
Find matching word/character 3
- Whitespace character find with \s
- Non-whitespace character find with \S
Find matching word/character 4
- Caret before square bracket:^[] to indicate beginning
- Dollar sign after square bracket:[]$ to indicate ending
Negation
Disjunction
- To match a series of patterns with parenthesis.
Extract hashtags
Extract numbers from string
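
A few of the patterns above applied with the `re` module. The sample sentence is made up, and the number pattern is one possible choice that handles only unsigned integers and simple decimals.

```python
import re

text = "Baked 12 cookies and 3.5 cakes #baking #Yum"

hashtags = re.findall(r"#(\w+)", text)        # capture the word after '#'
numbers = re.findall(r"\d+(?:\.\d+)?", text)  # integers and decimals
print(hashtags)  # ['baking', 'Yum']
print(numbers)   # ['12', '3.5']

# Square brackets match either case of the first letter.
has_woodchuck = bool(re.search(r"[Ww]oodchuck", "a Woodchuck"))
print(has_woodchuck)  # True

# '?' makes the preceding character optional.
spellings = re.findall(r"colou?r", "color colour")
print(spellings)  # ['color', 'colour']
```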

Data Structure - Processing iterables with a functional style

yield instead of return: yield, iterators, generators
Produce a new iterable with map()
Generate a new iterable with Boolean-return function with filter()
Produce a single cumulative value from iterable with reduce()
Condition checking with any()
Multiple function declaration with singledispatch
Lambda function: x = lambda a, b : a * b
Note: Functional style can be replaced with list comprehension or generator expressions
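
The functional tools listed above in one short sketch. The function names `countdown` and `kind_of` are hypothetical.

```python
from functools import reduce, singledispatch

values = [1, 2, 3, 4]
doubled = list(map(lambda v: v * 2, values))        # [2, 4, 6, 8]
evens = list(filter(lambda v: v % 2 == 0, values))  # [2, 4]
total = reduce(lambda acc, v: acc + v, values, 0)   # 10
has_big = any(v > 3 for v in values)                # True


def countdown(start):
    """Generator: yields values lazily instead of returning a list."""
    while start > 0:
        yield start
        start -= 1


@singledispatch
def kind_of(value):
    return "object"          # fallback for unregistered types


@kind_of.register
def _(value: int):
    return "integer"


@kind_of.register
def _(value: str):
    return "string"


print(list(countdown(3)))                      # [3, 2, 1]
print(kind_of(1), kind_of("a"), kind_of(1.5))  # integer string object
```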

Algorithm

Inheritance

from abc import ABC
from abc import ABCMeta
Difference between importing ABC or ABCMeta
- TLDR: ABC is a helper class whose metaclass is ABCMeta. Both serve the same purpose; ABC is simply easier to write.
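
A minimal sketch of an abstract base class built on `ABC`. `Shape` and `Square` are hypothetical. Instantiating a class that still has abstract methods raises `TypeError`.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Subclasses must implement this."""


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


instantiation_error = None
try:
    Shape()   # abstract classes cannot be instantiated
except TypeError as error:
    instantiation_error = error

print(Square(3).area())                    # 9
print(type(instantiation_error).__name__)  # TypeError
```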

Passing variables in from command line

Unnamed arguments
Named arguments: :TODO
Filename as argument
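
One common way to handle both unnamed (positional) and named arguments is the standard `argparse` module, sketched below. The argument names are assumptions, and an explicit list is passed to `parse_args` so the snippet runs without a real command line; in a script you would call `parser.parse_args()` with no arguments.

```python
import argparse

parser = argparse.ArgumentParser(description="Hypothetical demo parser")
parser.add_argument("filename")                        # unnamed (positional)
parser.add_argument("--count", type=int, default=1)    # named, with default
parser.add_argument("--verbose", action="store_true")  # named flag

# Equivalent to: python script.py data.csv --count 3
args = parser.parse_args(["data.csv", "--count", "3"])
print(args.filename, args.count, args.verbose)  # data.csv 3 False
```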

Environment Setting

Read from config file
- How to comment on config file(*.ini): Put a # sign at the start of its own line
Using .env Files for Environment Variables in Python Applications

When to use configparser? When to use .env?
#### TLDR:
Use .env to store string values (such as secrets) that must never be exposed in a code versioning platform or Docker image

### use .env
- the . of filename make it hidden
- already excluded in preset .gitignore
- Nearly every programming language has a package or library that can be used to read environment variables from the .env file instead of from your local environment. 
- when no .env file is found, load_dotenv loads nothing, so lookups fall back to variables already set in the host environment (useful in Docker)

### use configparser
- ships with typed getters (int, float, boolean, string) and lets you run checks on the values
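
A small sketch of the typed getters that make `configparser` convenient. The section and key names are invented, and the INI text is read from a string so the example needs no file on disk.

```python
import configparser

ini_text = """
[server]
# a comment must sit on its own line
host = localhost
port = 8080
debug = yes
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

host = config.get("server", "host")            # str
port = config.getint("server", "port")         # int
debug = config.getboolean("server", "debug")   # bool ("yes" -> True)
timeout = config.getfloat("server", "timeout", fallback=2.5)  # missing key

print(host, port, debug, timeout)  # localhost 8080 True 2.5
```

Values read from environment variables, by contrast, are always strings and need manual conversion.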

XML Parser

Read from xml file

URL

Download URL to local file and checksum

Performance

Multiprocessing

Difference of pool(from multiprocessing) from thread:
pool spins up different processes while threads stay in the same process
  
The goal of pool (multiprocessing) is to maximize the use of cpu cores.

Create workers according to number of cores

Logging

Built-In Logging

Basic:

import logging
import sys

logger = logging.getLogger(__name__)
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

Logging Levels: DEBUG, INFO, WARNING, ERROR, CRITICAL (WARN and FATAL are legacy aliases)
Advanced configuration log to stdout
Advanced configuration log to file
Log with variables: logging.error(f"Keys {a} is missing")
Log exception
Logging write to both stdout and file
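
Writing to both stdout and a file, plus logging an exception, could look like the sketch below. The logger name `bite_size_demo` and the temporary log path are assumptions for the example.

```python
import logging
import os
import sys
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")

logger = logging.getLogger("bite_size_demo")
logger.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

# One handler per destination; both share the same format.
for handler in (logging.StreamHandler(sys.stdout), logging.FileHandler(log_path)):
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.info("started")
try:
    1 / 0
except ZeroDivisionError:
    # logger.exception logs at ERROR level and appends the traceback.
    logger.exception("division failed")
```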

Logging Others

Logging with module icecream

Design Patterns

Factory method
Abstract Factory
Monkey Patching
Singleton: A singleton is a class with only one instance.
Decorator

Built-in Decorators

Class Method @classmethod: take cls as first parameter (have access to internal fields and methods)
Static Method @staticmethod: can take no parameters, basically just a function
- When to use @classmethod, @staticmethod
  - A class method can modify class state. It is bound to the class and receives cls as its first parameter.
    def test(cls): cls.variable = ?
  - A static method cannot modify class state through its arguments. It is bound to the class but receives neither the class nor the instance.
    def test(variable): ...
dataclass @dataclass
- dataclass hello world
Abstract class with ABCMeta and @abstractmethod
Property Setting
@property to prevent setting value
1. Native Verbose Method
2. Using built-in property function
3. Using decorator
- getter: @property
- setter: @{variable}.setter
- deleter: @{variable}.deleter
@lru_cache
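
The built-in decorators above in one hedged sketch. The class `Oven` and the function `fibonacci` are hypothetical examples, not taken from the linked notes.

```python
from functools import lru_cache


class Oven:
    default_unit = "C"

    def __init__(self, temperature):
        self._temperature = temperature

    @property
    def temperature(self):            # getter
        return self._temperature

    @temperature.setter
    def temperature(self, value):     # setter with validation
        if value < 0:
            raise ValueError("temperature must not be negative")
        self._temperature = value

    @classmethod
    def set_unit(cls, unit):          # can modify class state
        cls.default_unit = unit

    @staticmethod
    def to_fahrenheit(celsius):       # plain function living on the class
        return celsius * 9 / 5 + 32


@lru_cache(maxsize=None)
def fibonacci(n):
    # Cached results turn the exponential recursion into linear work.
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)


oven = Oven(180)
oven.temperature = 200
Oven.set_unit("F")
print(oven.temperature, Oven.default_unit)  # 200 F
print(Oven.to_fahrenheit(100))              # 212.0
print(fibonacci(30))                        # 832040
```

Omitting the setter would make `temperature` read-only, which is the "prevent setting value" use of `@property`.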

Testing

Simple check, raise AssertionError if wrong: format: assert condition, message when error raised
- assert a == 20, "val a == 20"
- assert isinstance(a, int)

Module typing: Type hint & annotations

List
Tuple

Set

# Prior to python 3.9

from typing import List, Tuple, Set

items: List[str]
values : Tuple[int, str, str]
products : Set[bytes]

# python 3.9 onwards
# no need import 

items: list[str]
values : tuple[int, str, str]
products : set[bytes]

Any : from typing import Any; variable : Any
Union / Optional
- Simplification of Union from python 3.10 onwards: var1 : str | None
Literal
Annotated
- Before python 3.9: from typing_extensions import Annotated
- Python 3.9 onwards: from typing import Annotated
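
A brief sketch of `Union`, `Optional`, `Literal` and `Annotated`, assuming Python 3.9 or later. The aliases `Mode` and `Port` and both functions are hypothetical. Type hints are not enforced at runtime; they serve type checkers and readers.

```python
from typing import Annotated, Literal, Optional, Union

Mode = Literal["r", "w", "a"]               # only these exact values
Port = Annotated[int, "valid range 1-65535"]  # int plus metadata


def open_mode(mode: Mode) -> str:
    return f"mode={mode}"


def parse_port(raw: Union[int, str]) -> Optional[Port]:
    """Return the port as int, or None when it is invalid."""
    try:
        port = int(raw)
    except ValueError:
        return None
    return port if 1 <= port <= 65535 else None


print(open_mode("r"))      # mode=r
print(parse_port("8080"))  # 8080
print(parse_port("abc"))   # None
print(parse_port(0))       # None
```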

Pydantic : Data parsing and validation library

BaseModel to correctly declare type
Pydantic Settings
- Checking HTTPUrl

Email Validation

Basic checking of domain existence and email constructed correctly

Keywords

Others

Jinja: Template Engine for Python. TLDR: Use text-based templates to transform Hello, {{ user_name }}! to Hello, John!
Kill after x amount of time if process not complete
Context Managers
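
A context manager can be written with `contextlib.contextmanager`, as in this rough sketch. The `timer` helper and the `timings` dictionary are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label, results):
    """Store the elapsed time of the with-block under results[label]."""
    start = time.perf_counter()
    try:
        yield                      # the body of the with-block runs here
    finally:
        # Runs even if the body raises, like __exit__.
        results[label] = time.perf_counter() - start


timings = {}
with timer("sum", timings):
    total = sum(range(1000))

print(total)             # 499500
print("sum" in timings)  # True
```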

Webbrowser

Open url with webbrowser module
- In script
- In command: python -m webbrowser -t "https://www.python.org"

Redis

Declare redis: r = redis.Redis(host='127.0.0.1', port=6379)
Set key: r.set('counter', 1)
Get key: value = r.get('counter') # returns bytes (or None) unless decode_responses=True; convert with int(value)
Check if key exist: r.exists(<key>)
Set expiry and check ttl

Networking

Get IP from domain name: import socket;socket.gethostbyname("www.google.com");
Get host name of the machine: socket.gethostname()

Concurrency

Built-in Concurrency Library: Asyncio

Simple example with asyncio
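
A minimal asyncio sketch: two coroutines run concurrently and `asyncio.gather` returns their results in the order the coroutines were passed in, not the order they finished. The coroutine `fetch` and its delays are made up.

```python
import asyncio


async def fetch(name, delay):
    await asyncio.sleep(delay)   # stands in for real I/O
    return f"{name} done"


async def main():
    # Both coroutines wait at the same time.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


results = asyncio.run(main())
print(results)  # ['a done', 'b done']
```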

Hashing

Web

Webhook

Software Development

REST

FastAPI

Requests

Get data from url

Database

Connect to db with sqlalchemy
- Silence the log: create_engine(..., echo = False)
- SQLAlchemy query with name and value insertion

PostgreSQL

Postgres connect to AWS RDS
Local Node
~~Save and load image between REST and Postgres~~ Obsolete: large files (including image) should be saved to storage
~~Save and load video between REST and Postgres~~ Obsolete: large files (including image) should be saved to storage

Cloud

AWS

Postgres connect to AWS RDS

S3: Scalable Storage

List name of buckets
List objects in a specific bucket
Upload file with function upload_file or upload_fileobj
- Upload video file
- Upload video file with progress counter
Upload multipart
Upload multipart with multiple workers
Get object from S3 with boto
Download s3 public from url with requests
Create subfolder in bucket and upload file
Delete S3 object, objects and/or folder

Note: What is a bucket in S3

A bucket is a container for objects stored in Amazon S3, which can contain files and folders. You can store any number of objects in a bucket and can have up to 100 buckets in your account

Machine Learning

Pytorch

Check if cuda is available - import torch; torch.cuda.is_available()
Softmax

Torch Tensor

Torch Tensor Creation

Create tensor of zeros with shape like another tensor: torch.zeros_like(another_tensor)
Create tensor of zeros with shape (tuple): torch.zeros(shape_in_tuple)
Create tensor of ones with shape like another tensor: torch.ones_like(another_tensor)
Create tensor of ones with shape (tuple): torch.ones(shape_in_tuple)
Create tensor of random floating value between 0-1 with shape like another tensor:
torch.rand_like(another_tensor, dtype = torch.float)
Create tensor of random floating value between 0-1 with shape (tuple):
torch.rand(shape_in_tuple)

Torch Tensor Info Extraction

Given torch.tensor buffer = tensor(4), get the value by - id = buffer.item()
Given torch.tensor, get the argmax of each row - torch.argmax(buffer, dim=<(int)dimension_to_reduce>)
Tensor to cuda - inputs = inputs.to("cuda:0") or inputs = inputs.cuda()
Tensor to cpu - inputs = inputs.to("cpu") or inputs = inputs.cpu()
Tensor shape - tensor.shape
Tensor data types - tensor.dtype
Device tensor is stored on - tensor.device
Torch tensor(single value) to value: tensorarray.item()
Retrieve subset of torch tensor by row index: tensor[<row_number>, :] / tensor[<row_number_from>:<row_number_to>, :]
Retrieve subset of torch tensor by column index: tensor[:, <column_number_from>:<column_number_to>]

Torch Tensor Conversion

List to torch tensor - torch.tensor(listimp)
Numpy array to torch tensor - torch.from_numpy(np_array)
Torch tensor to numpy: tensorarray.numpy() (for a tensor on the GPU, move it to the CPU first: tensorarray.cpu().numpy())
Image to torch tensor
Torch tensor to image
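
A minimal sketch of the list and NumPy conversions above, assuming a CPU tensor. The variable names are illustrative. Note that `torch.from_numpy` and `.numpy()` share memory with the source array, while `torch.tensor` copies the data.

```python
import numpy as np
import torch

values = [[1, 2], [3, 4]]

from_list = torch.tensor(values)          # copies the list data
np_array = np.array(values, dtype=np.float32)
from_numpy = torch.from_numpy(np_array)   # shares memory with np_array
back_to_numpy = from_numpy.numpy()        # shares memory as well (CPU tensor)

# Because memory is shared, this write is visible through all three views.
np_array[0, 0] = 10.0
print(from_numpy[0, 0].item(), back_to_numpy[0, 0])
```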

Torch Tensor Operation

Torch tensor value change by indexing and conditions
Concatenate tensor according to dimension (0 for adding rows, 1 for adding columns):
torch.cat([<tensor_1>, <tensor_2>, ...], dim = <dimension_number>)
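
The two operations above can be sketched on small made-up tensors. The threshold of 5 is an arbitrary choice for the example.

```python
import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])

rows = torch.cat([a, b], dim=0)  # stack as extra rows    -> shape (4, 2)
cols = torch.cat([a, b], dim=1)  # stack as extra columns -> shape (2, 4)

# Change values by condition: set every element greater than 5 to 0.
clipped = rows.clone()
clipped[clipped > 5] = 0

print(rows.shape, cols.shape)
print(clipped)
```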

Dataset Loader, Iterator

torch.utils.data.Dataset: stores the samples and their corresponding labels
torch.utils.data.DataLoader: wraps an iterable around the Dataset to enable easy access to the samples
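
To show how the two classes fit together, here is a small sketch with a hypothetical `ToyDataset` (not from the original notes): the dataset holds samples and labels, and the loader iterates over it in batches.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical in-memory dataset: 5 samples with 2 features each."""

    def __init__(self):
        self.samples = torch.arange(10, dtype=torch.float32).reshape(5, 2)
        self.labels = torch.tensor([0, 1, 0, 1, 0])

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        return self.samples[index], self.labels[index]


dataset = ToyDataset()
loader = DataLoader(dataset, batch_size=2, shuffle=False)

# Collect the shape of each batch of samples produced by the loader.
batch_shapes = [tuple(x.shape) for x, _ in loader]
first_labels = next(iter(loader))[1].tolist()
print(batch_shapes, first_labels)
```

With 5 samples and `batch_size=2`, the last batch holds a single sample; pass `drop_last=True` if that is not wanted.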

Torch Tensor In/Out

Save torch tensor to file: torch.save(x : torch.tensor, tensorfile :str)
Load torch tensor from file: torch.load(tensorfile :str)
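
A round trip through a file might look like the sketch below. The file name `tensor.pt` and the temporary directory are assumptions made so the example cleans up after itself.

```python
import os
import tempfile

import torch

x = torch.tensor([1.0, 2.0, 3.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    tensor_file = os.path.join(tmp_dir, "tensor.pt")  # hypothetical file name
    torch.save(x, tensor_file)       # write the tensor to disk
    loaded = torch.load(tensor_file)  # read it back

print(torch.equal(x, loaded))
```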

Torch Dataset

Image Datasets
- Fashion MNIST Torch
  
  Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes.
Text Datasets
Audio Datasets

Huggingface

Send model to cuda - model.to('cuda:0') or model.cuda()
Overview of DatasetDict
DatasetDict from Pandas Dataframe

Computer Vision

Computer Vision - Basic

Get image shape: img.shape (Important: shape[0] = height, shape[1] = width)
Create a color image: image = np.zeros((h,w,3), np.uint8)
Read/Write image:
- As byte
- As Bytearray
- As base64
- From imageio (save with numpy array)
  - Read only 3 channels: im3d = imageio.imread('path/to/some/singlechannelimage.png', pilmode='RGB')
Read image
- Read image from url
- Read in image with Pillow
  - Pillow read in image from np.array: im = Image.fromarray(nparrayimage)
- Read in image from imageio
Pause to display image or wait for an input: cv2.waitKey(0)
Save an image: cv2.imwrite(pathtoimg : str, img : numpy.ndarray)
Show an image in window: cv2.imshow(windowname : str, frame : np.array)

Show an image in Jupyter notebook

from IPython.display import Image
Image(filename=pathtoimg : str)

Crop image
- numpy array: image[y0:y1, x0: x1, :]
Flip image: frame = cv2.flip(frame, flipcode : int)
- Positive flip code for flip on y axis (left right flip)
- 0 for flip on x axis (up down)
- Negative for flipping around both axes
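
The shape, crop and flip conventions above can be checked without opening a window. This sketch uses plain NumPy slicing as a stand-in for `cv2.flip` (the slices are the array-level equivalents of flip codes 1, 0 and -1); the image size and crop box are made up for the example.

```python
import numpy as np

h, w = 4, 6
image = np.zeros((h, w, 3), np.uint8)  # black color image
image[:, :3, :] = 255                  # paint the left half white

# shape[0] is the height (rows), shape[1] is the width (columns)
height, width = image.shape[0], image.shape[1]

# Crop rows y0:y1 and columns x0:x1
y0, y1, x0, x1 = 1, 3, 2, 5
crop = image[y0:y1, x0:x1, :]

left_right = image[:, ::-1, :]   # like cv2.flip(image, 1): flip around the y axis
up_down = image[::-1, :, :]      # like cv2.flip(image, 0): flip around the x axis
both = image[::-1, ::-1, :]      # like cv2.flip(image, -1): flip around both axes

print(height, width, crop.shape)
```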

Computer Vision - Intermediate

Computer Vision - Filter

Blur with averaging mask: cv2.blur(img,(5,5))
GaussianBlur: blur = cv2.GaussianBlur(img,(5,5),0)
- Note: The kernel size, here (5, 5), must be positive and odd. Read more here on how kernel size influences the degree of blurring.
Blurring region of image

Computer Vision - Video Stream

Play video in jupyter notebook/lab
Get total number of frames in the video: int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
Change the current video frame position to a desired frame number
Concat multiple video streams to show side by side: 2 video streams, 3 video streams
Save stream to video output
- opencv method (may run into problems when replaying the generated video on AWS cloud services)
- imageio method
Read in video stream from a file
- Read in video stream with imageio
Read in stream from camera
video arrays (in opencv) -> bytes -> np.array -> video arrays (in opencv)
Merge audio with video
Check if video comes with audio
Split audio from video

Computer Vision - Other

Overlay image
Resizing frame: outframe = cv2.resize(frame, (w, h))
Set color to rectangle region
RGB value to color text
Color to gray image: gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Draw circle: image = cv2.circle(image, center_coordinates: tuple, example: (50, 100), radius: int, color : tuple, example: (255, 255, 255), thickness : int)
BGR to RGB channel: img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
Draw rectangle and point on image with mouse activity
- Mouse Events
Draw rectangle on image in jupyter
Write text on image
Remove background
Weighted blend two image with cv2.addWeighted
Add channel to image
Draw contour
