whitepawglobal/bite-size-python

Bite-Size Python

project status: active

Package Installation

Install package with pip: pip install <package-name>. Example: pip install numpy

Install package with conda: conda install <package-name>. Example: conda install numpy


Basic

Comment

  • Single Line Comment: # sample text
  • Multi Lines Comment (Python has no block-comment syntax; an unassigned triple-quoted string is commonly used instead):
     """
     Hello World!
     Nice to meet all of you cookie monsters!
     """
    

Boolean Operator

Math

  • Define NaN, infinity: float("nan"), float("inf")
  • Sum up an array: sum(arr)
  • Max, returns the maximum of two values: max(a, b)
  • Min, returns the minimum of two values: min(a, b)
  • Atan2: import math; math.atan2(90, 15)
  • Asin: import math; math.asin(0.5)
  • Sin: import math; math.sin(1)
  • Cos: import math; math.cos(1)
  • Factorial: import math; math.factorial(1)
  • Round a number to a certain number of decimal places: round(value, 1) (rounds to the nearest value with ties to even, so it does not always round up)
  • Calculate percentile
  • Power of a number: pow(base_number, exponent_number)
  • Square root of a number: import math; math.sqrt(number)
  • Ceiling
    import math
    value : int = math.ceil(invalue)
    
  • Floor
    import math
    value : int = math.floor(invalue)
    
  • Logarithm / Log
  • Exclusive Or (XOR)
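Here is a short sketch of the collapsed math items above: NaN/infinity, logarithms, XOR, rounding behaviour, and percentile. The sample numbers are arbitrary. The percentile line assumes NumPy is installed; everything else uses only the standard library.

```python
import math

import numpy as np

# NaN and infinity as floats; NaN never equals itself, so test it with math.isnan
nan_value = float("nan")
inf_value = float("inf")
is_nan = math.isnan(nan_value)
is_inf = math.isinf(inf_value)

# Logarithms: natural log, base 2, base 10, and an arbitrary base
ln_value = math.log(10)
log2_value = math.log2(8)            # 3.0
log10_value = math.log10(1000)       # 3.0
log_base_3 = math.log(81, 3)         # arbitrary base, may carry a tiny float error

# XOR: ^ is bitwise on ints and logical on bools
xor_int = 5 ^ 3                      # 0b101 ^ 0b011 = 0b110 = 6
xor_bool = True ^ False              # True

# round() rounds half to even ("banker's rounding")
rounded = [round(0.5), round(1.5), round(2.5)]   # [0, 2, 2]

# Percentile with NumPy (linear interpolation by default)
median = np.percentile([1, 2, 3, 4, 5], 50)      # 3.0
```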

Math-others

Sorting

Data Types

Integer

  • Get maximum: import sys; sys.maxsize (largest Py_ssize_t value; Python int itself is unbounded)
  • Get minimum: import sys; -sys.maxsize - 1

Floating Value (float, double)

  • Format floating value to n decimal places: "%.2f" % floating_var / print("{:.2f}".format(a)) / print(f"{a:.2f}")

Bytes

Notes:

The difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified (immutable), while bytearray() returns an object that can be modified (mutable).
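This minimal sketch makes the note concrete: bytearray accepts in-place edits, while bytes raises TypeError. The sample values are arbitrary.

```python
# bytes: immutable sequence of integers 0-255
data = bytes([72, 105])              # b'Hi'

# bytearray: mutable counterpart
buf = bytearray(b"Hi")
buf[0] = ord("h")                    # in-place change works
buf.extend(b"!")                     # can also grow

try:
    data[0] = ord("h")               # bytes does not allow item assignment
except TypeError as exc:
    error_message = str(exc)

frozen = bytes(buf)                  # convert back to immutable bytes
```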

ByteArray


Numpy

  • Numpy basic
    • numpy array with random int values in [0, 5): np.random.randint(5, size=(2, 4))
  • Check if numpy array has true value: np.any(<np-array>)
  • Get numpy shape: nparray.shape
  • Numpy array to list: nparray.tolist()
  • List to numpy array: np.array(listarray)
  • Change datatype: nparray = nparray.astype(<dtype>) Example: nparray = nparray.astype("uint8")
  • Numpy NaN (Not A Number), a placeholder for missing numerical values in an array: np.nan (the aliases np.NaN / np.NAN were removed in NumPy 2.0)
  • Numpy multiply by a value: nparray = nparray * 255
  • Numpy array to image
  • Numpy array to Torch tensor: torch.from_numpy(nparray)
  • Numpy <> Binary File(.npy)
  • Print numpy array without scientific notation (e-2)
  • Set numpy print options to suppress scientific notation
    np.set_printoptions(suppress=True, precision=10)
    print(predictions)
    
  • Use of numpy.where
  • Get minimum value of numpy array: np.amin(array)
  • Get maximum value of numpy array: np.amax(array)
  • Calculate euclidean distance with numpy
  • OpenCV: Numpy array to bytes
    import cv2
    import numpy as np
    targetimage: np.ndarray  # e.g. loaded with cv2.imread(<path>)
    success, encoded_targetimage = cv2.imencode(".png", targetimage)
    image_bytes: bytes = encoded_targetimage.tobytes()
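This sketch covers three of the collapsed NumPy items above: numpy.where, Euclidean distance (computed here with np.linalg.norm), and saving and loading a .npy file. The arrays and the temporary file path are illustrative only.

```python
import os
import tempfile

import numpy as np

arr = np.array([3, -1, 4, -1, 5])

# np.where(condition, x, y): take x where condition is True, otherwise y
clipped = np.where(arr < 0, 0, arr)          # [3, 0, 4, 0, 5]
# np.where(condition) alone returns a tuple of index arrays
negative_idx = np.where(arr < 0)[0]          # [1, 3]

# Euclidean distance between two points
p = np.array([0.0, 0.0])
q = np.array([3.0, 4.0])
distance = np.linalg.norm(p - q)             # 5.0

# Numpy <> binary .npy file (temporary folder used only for the demo)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "array.npy")
    np.save(path, arr)
    loaded = np.load(path)
```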
    

String

  • Generate string with parameter
  • Check if string is empty, len = 0: strvar = ""; if not strvar:
  • Check if string contains digit: any(ch.isdigit() for ch in str1) # returns True if there is a digit
  • Check file extension: notes/string/check_file_extension.ipynb
  • Capitalize a string: strvar.capitalize()
  • Uppercase a string: strvar.upper()
  • Lowercase a string: strvar.lower()
  • Capitalize the beginning of each word: strvar.title()
  • Get substring from a string: strvar[<begin-index>:<end-index>] / strvar[<begin-index>:] / strvar[:<end-index>]
  • Strip multiple white spaces into only one
  • Remove white spaces in the beginning and end: strvar.strip()
  • Swap existing upper and lower case: strvar.swapcase()
  • Splitting string:
    • Split a string based on separator: strvar.split(separator) Example: strvar.split("x")
    • Split on white space: strvar.split()
    • To split into individual characters instead: [*"ABCDE"] or list("ABCDE") Result: ["A", "B", "C", "D", "E"]
  • Check if string starts with a substring: strvar.startswith(<substring>)
  • Check if string ends with a substring: strvar.endswith(<substring>)
  • Check if string has a substring/specific character. Returns -1 if not found: strvar.find(input : str), strvar.find(input: str, start_index : int)
  • Replace string/character with intended string/character: strout = strin.replace(" ", "_")
  • Replace multiple characters with intended character
  • Replace multiple string with intended string
  • Generate string
  • String to List/Dict: import ast; ast.literal_eval(strinput) (avoid eval(strinput), which executes arbitrary code)
  • List to string: <separators>.join(list) example: ', '.join(listbuffer)
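Here is a sketch of several collapsed string items above: a string with parameters, collapsing whitespace, replacing several characters or strings at once, and safe parsing. All sample strings and the replacement mapping are made up for illustration.

```python
import ast

# Generate a string with parameters using an f-string
name, count = "Ann", 3
greeting = f"Hello {name}, you have {count} new messages"

# Strip multiple white spaces into only one: split() drops every run of whitespace
text = "  too    many   spaces  "
collapsed = " ".join(text.split())

# Replace multiple characters at once; mapping a character to None deletes it
table = str.maketrans({"-": "_", " ": "_", ".": None})
slug = "my file-name v1.0".translate(table)

# Replace multiple strings: one str.replace per pair (order can matter)
replacements = {"cat": "dog", "red": "blue"}
sentence = "the red cat"
for old, new in replacements.items():
    sentence = sentence.replace(old, new)

# String to list/dict safely: literal_eval only accepts Python literals
parsed = ast.literal_eval("[1, 2, {'a': 3}]")
```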

Unique Identifier (UUID)

Datetime

Data Structure

  • List of str to int: list(map(int, arr))
  • List with range of values: list(range(...))
  • Split str to list of str: arr.split(" ")
  • Find if a value in a list: if value in mylist: / if value not in mylist:
  • Shallow copy: array.copy() (changes to the copy's items do not affect the original list; nested objects are still shared)
  • Sort an array in place: arr.sort() / Return a sorted array: sorted(arr)
    • Sort an array in reverse: arr.sort(reverse = True) / Return a sorted array: sorted(arr, reverse = True)
  • Get index of a value: arr.index(value) (When not found will raise ValueError)
  • Add one more value to existing list: arr.append(value)
  • Insert at index: arr.insert(index, value)
  • Extend list with values in another list: arr.extend(arr2)
  • Remove an item (the first item found) from the list: arr.remove(item)
  • Remove item by index: del arr[index] or del arr[index-start: index-end]
  • Check for empty list: arr = []; if not arr: #empty list
  • Clear a list: arr.clear()
  • Get list subset: list[start:stop:step] Example: array[0::2], array[0:6:2]
  • Check whether all items of list b exist in list a, returns boolean: set(b).issubset(a)
  • Check unordered list to have the same items, returns boolean: set(a) == set(b)
  • Change values of list with List Comprehension: [func(a) for a in sample_list]
  • Iteration of list with index: for index, value in enumerate(inlist):
    • Enumerate with a starting index: for index, value in enumerate(inlist, 2): (the start index is the second argument)
  • Iteration over two lists: [<operation> for item1, item2 in zip(list1, list2)]
  • Count occurrences of an item in a list: array.count(val)
  • Get maximum value in a list of numbers (also works on strings, compared lexicographically): max(samplelist)
  • Get argument of minimum / maximum value
  • Reverse a list: list(reversed([1, 2, 3, 4])) / listinput.reverse() (in place)
  • list to string: ",".join(bufferlist)
  • Remove a value in list by index: returnedvalue = listarray.pop(index) (raises IndexError if the index is not valid)
  • List Counter
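This sketch covers two collapsed list items above: the index of the minimum/maximum value, and collections.Counter. The sample lists are arbitrary.

```python
from collections import Counter

scores = [3, 9, 2, 9, 5]

# Index of the minimum / maximum value (first occurrence wins on ties)
argmax = max(range(len(scores)), key=scores.__getitem__)
argmin = min(range(len(scores)), key=scores.__getitem__)
# Simpler alternative that scans the list twice
argmax_alt = scores.index(max(scores))

# List Counter: count every item in one pass
counts = Counter(["a", "b", "a", "c", "a"])
most_common = counts.most_common(1)
missing = counts["z"]        # missing keys return 0 instead of raising KeyError
```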

Build list

  • Build list of same values: ['100'] * 20 # 20 items of the value '100'
  • Build multiple list into one: lista + listb + listc
  • Build list by breaking down every character of a string: [*'abcdef']

collections.defaultdict
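Here is a minimal sketch of collections.defaultdict. A missing key is created with a default value on first access, which is handy for grouping and counting. The word list is illustrative.

```python
from collections import defaultdict

# defaultdict(list): each new key starts as an empty list
groups = defaultdict(list)
for word in ["apple", "avocado", "banana", "cherry", "blueberry"]:
    groups[word[0]].append(word)

# defaultdict(int): each new key starts at 0
letter_counts = defaultdict(int)
for ch in "banana":
    letter_counts[ch] += 1
```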

Set

  • Set initialization: setsample = {1,2,3,4,5}/ setsample = set()
  • Add item: setsample.add(<value>) Example: setsample.add((1,2)) (items must be hashable, so use a tuple, not a list)
  • Add multiple items: setsample.update(<another-set>)
  • Set with multiple-value input as set
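  • Remove and return an arbitrary value: setsample.pop() (sets are unordered and cannot be indexed)
  • Remove value by value: setsample.remove(<value>) (raises KeyError if missing; setsample.discard(<value>) does not)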
  • Check if value exist in set: if value in setsample:
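This sketch shows the common set operators, plus the difference between remove(), discard() and pop(). The sets are arbitrary examples.

```python
a = {1, 2, 3}
b = {3, 4}

union = a | b                # {1, 2, 3, 4}
intersection = a & b         # {3}
difference = a - b           # {1, 2}
symmetric_diff = a ^ b       # {1, 2, 4}

a.discard(10)                # no error when the value is missing
remove_failed = False
try:
    a.remove(10)             # remove() raises KeyError when the value is missing
except KeyError:
    remove_failed = True

popped = a.pop()             # removes and returns an arbitrary element
```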

Tuple

  • Build a tuple: var : tuple[bool, str | None] = tuple([True, "abc"])
  • List to tuple: tuple([1,2])

Named Tuple
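Here is a minimal sketch of the two standard ways to define a named tuple. `Point` and `Pixel` are hypothetical example types.

```python
from collections import namedtuple
from typing import NamedTuple

# Factory-function style
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)                     # access by name (p.x) or by index (p[0])


# Class style with type hints and a default value
class Pixel(NamedTuple):
    x: int
    y: int
    color: str = "black"


px = Pixel(3, 4)
red_px = px._replace(color="red")   # immutable, so _replace returns a new tuple
as_dict = red_px._asdict()
```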

Applicable to Python Iterables (List, Set,...)

  • Check whether any item in the iterable is truthy (e.g. True/1): any(sample_list) # returns a single True/False
  • Zip multiple iterables

JSON
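Here is a minimal sketch of converting between Python objects and JSON with the standard json module. The record is made up. For files, json.dump(obj, f) and json.load(f) work the same way on an open file object.

```python
import json

record = {"name": "Ann", "tags": ["a", "b"], "active": True, "score": None}

# dict -> JSON string (True becomes true, None becomes null)
text = json.dumps(record)
pretty = json.dumps(record, indent=2, sort_keys=True)

# JSON string -> dict
restored = json.loads(text)
```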

Polars

  • Import: import polars as pl

View

  • Get header of dataframe: df.columns
  • View first n rows: df.head(n)
  • View random rows: df.sample(n)
  • Get number of rows: row_count = df.height (or df.select(pl.len()).item())

File IO

  • Dataframe from dict
  • Read in csv: pl.read_csv(...)
    • read in csv, overriding column data types (the parameter was called dtypes in older Polars versions)
      • data_pl = pl.read_csv('file.csv', schema_overrides={'col1': pl.Utf8, 'col2': pl.Utf8})
    • read_csv without header (so the first row is not used as the header names)
      • pl.read_csv(datapath, has_header=False)
  • Write to csv: df.write_csv(file : str, include_header: bool = True, separator : str = ",")
  • Read excel: pl.read_excel(source : str |..., sheet_name : str, engine = "openpyxl")

Data Manipulation

  • Assign column name to dataframe: df.columns = column_name
  • Create empty data frame: pl.DataFrame()
  • Dataframe from dict: df = pl.from_dict({"name": name_list, "id": id_list})
  • Change header: outdf = df.rename({"foo": "apple"}) # foo is previous title, apple is new title
  • Get unique values of one/a few columns: df[['column_name']].unique()
  • Conversion
  • Column to list: df["a"].to_list()
  • Check if dataframe is empty: df.is_empty()
  • Reorder column: df = df[['PRODUCT', 'PROGRAM', 'MFG_AREA']]
  • Drop Column: df.drop("<column-name>") / df.drop(["<column-name1>", "<column-name2>"])
  • Rename column name: df = df.rename(dict(zip(["column_name1_ori", "column_name2_ori"], ["column_name1", "column_name2"])))
  • Casting: out = df.select(pl.col("<col-name>").cast(pl.Int32))
  • Fill null with value: df.with_columns(pl.col("b").fill_null(99))
  • Remove rows with conditions using filter
  • Sort column value by order
  • Concatenate dataframe
    • Concatenate on rows (default): pl.concat([df1, df2]) is equivalent to pl.concat([df1, df2], how="vertical"); use pl.concat([df1, df2], how="diagonal") when the column sets differ (missing values become null)
  • Group By
  • Drop duplicates whole /subset
  • Add a new column with list: df.with_columns(pl.Series(name="column-name", values=prediction_list))
  • Apply function to a column: df=df.with_columns([(pl.col("<column-name>").map_elements(<function-to-apply>).alias("<new-column-name>"))])
  • Drop nulls: df = df.drop_nulls()
    • Drop a row if all value is null: df.filter(~pl.all_horizontal(pl.all().is_null()))
  • Replace column values
  • Apply function to value:
  • Check if duplicated value in a column

Series

Note

Modin

Install

pip install "modin[all]"

Import

import modin.pandas as pd

Pandas Infos

Pandas Operations

Pandas Type

Pandas Series

  • Series to value
  • Series/Dataframe to numpy array: input.to_numpy()
  • Series iteration: for index, item in seriesf.items():
  • Series to dict: seriesf.to_dict()

Pandas Assign values

Pandas Remove/drop values

Pandas SQL-like functions

Pandas Filtering

Pandas Excel In/Out

Pandas CSV In/Out

  • Read csv with another delimiter: pd.read_csv(<path-to-file>, delimiter = '\x01')
  • Read csv while skipping bad lines: pd.read_csv(<path-to-file>, on_bad_lines='skip')
    • Note: pd.read_csv(<path>, error_bad_lines = False) was deprecated in pandas 1.3 and removed in 2.0
  • Read csv with encoding: pd.read_csv('file name', encoding = 'utf-8')
  • Save to csv: df.to_csv('file name', index = False)
    • Note: index = False prevents the index from being saved as an extra column.
  • Save to csv with encoding: df.to_csv('file name', encoding = 'utf-8')
  • Write list/dict to csv file (Note: so that commas inside the collection do not break the columns)
  • Read in parquet: pd.read_parquet(...)
  • Write to parquet: df.to_parquet(...)
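For the "write list/dict to csv" item, here is a sketch using the standard csv module, which quotes any field that contains a comma. io.StringIO stands in for a real file opened with open("out.csv", "w", newline="", encoding="utf-8"). The rows are made up.

```python
import csv
import io

rows = [
    {"name": "Smith, John", "tags": "a,b"},
    {"name": "Lee", "tags": "c"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "tags"])
writer.writeheader()
writer.writerows(rows)       # fields containing commas are wrapped in quotes
text = buffer.getvalue()

# Reading it back restores the original values, commas included
restored = list(csv.DictReader(io.StringIO(text)))
```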

Pandas Pickle In/Out

Note: Pickle has a security risk (unpickling untrusted data can execute arbitrary code) and is slow to serialize, even compared with csv and json. Don't use it.

  • Read in pickle to dataframe: df = pd.read_pickle(<file_name>) # ends with .pkl
  • Save to pickle: df.to_pickle(<file_name>)

Pandas Dataframe Others

Random

  • Fix the random seed (generates the same sequence every run): random.seed(integer_value); seed from system randomness instead: random.seed()
  • Generate random floating value in [0.0, 1.0): import random; random.random() (or: from random import random; random())
  • Generate random integer within (min, max). Both bounds included: from random import randint; randint(0, 100) #within 0 and 100
  • Randomly choose an item from a list: import random; random.choice([123, 456, 378])
  • Generate list with random numbers: import random; random.sample(range(10, 30), 5)
    • Example shown where 5 unique random numbers are drawn from 10 to 29 (range excludes 30)
  • Shuffle an array: random.shuffle(array)
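This minimal sketch shows why fixing the seed gives reproducible results. It also uses a separate random.Random instance, which keeps its own state apart from the module-level generator. The seed values are arbitrary.

```python
import random

# Same seed -> same sequence
random.seed(42)
first_run = [random.randint(0, 100) for _ in range(5)]
random.seed(42)
second_run = [random.randint(0, 100) for _ in range(5)]

# Independent generator with its own seed
rng = random.Random(7)
sample = rng.sample(range(10, 30), 5)     # 5 unique values from 10..29

items = [1, 2, 3, 4, 5]
random.shuffle(items)                     # shuffles in place and returns None
```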

Intermediate

Error Handling

  • The character used by the operating system to separate pathname components: os.sep

  • Iterate through a path to get files/folders of all the subpaths

  • Write file: f.write(str)

  • print without new line: print(..., end="")

  • Get an environment variable (the second parameter, a default value, is optional): import os; os.getenv(<VARIABLE_NAME> : str, <alternative-return-value>: str)

  • Get and set environment variables

    • Set variable: os.environ['redis'] = "localhost:6379"
    • Get value with key: import os; os.environ["HOMEDIR"]
    • Get value with default value: database_url = os.environ.get("DATABASE_URL", "default-value")
  • Flush print output immediately: print(..., flush=True)

  • Check if path is a folder: os.path.isdir(<path>)

  • Expand a path under the home directory: root_path = os.path.expanduser("~/.root")

  • Get file size

    • from pathlib import Path; outsize : int = Path(inputfilepath).stat().st_size
    • import os; outsize : int = os.path.getsize(inputfilepath)
  • Create folder: os.mkdir(<path>)

  • Create folders recursively: os.makedirs(<path>, exist_ok=True) (exist_ok=True avoids an error if the folder already exists)

  • Rename file: os.rename(<filepath-from>, <filepath-to>) / os.rename(<dirpath-from>, <dirpath-to>)

  • Get folder path out of given path with filename: os.path.dirname(<path-to-file>)

  • Expand home directory: os.path.expanduser('~')

  • Get the current working directory (where the process was started, not necessarily the script's folder): os.getcwd()

  • Get the list of all files and directories in the specified directory (does not include items in child folders): os.listdir(<path>)

  • Get the folder of the current .py file (os.getcwd() returns the process's working directory instead): os.path.dirname(os.path.abspath(__file__))

  • Get filename from path: os.path.basename(configfilepath)

  • Split the extension from the rest of the path (ext includes the leading .): filename, ext = os.path.splitext(path)

  • Add a path to the module search path: import sys; sys.path.append(<path>)

  • Check if path exist: os.path.exists(<path>)

  • Remove a file: os.remove(<path>)

  • Remove an empty directory: os.rmdir(<path>)

  • Deletes a directory and all its contents: import shutil;shutil.rmtree(<path-to-directory>)

  • Copy a file to another path

  • Unzip file

  • Read file

    open(<path-to-file>, mode)
    
    • r: Open a text file for reading text

    • w: Open a text file for writing text (an existing file is truncated)

    • a: Open a text file for appending text

    • b: Open in binary mode to read/write bytes (combined with another mode, e.g. rb, wb)

    Reading a file has 3 functions:

    • read() or read(size): read the whole file, or at most size characters, as one string.

    • readline(): read a single line from a text file and return the line as a string.

    • readlines(): read all the lines of the text file into a list of strings.

    • write(<param> : str): write param to the file. \n must be added explicitly to end a line.

    • .close(): close the file object

    • Check if the file is closed: f.closed
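This sketch ties several of the file items above together: writing and reading with a `with` block (which closes the file automatically), copying a file, zipping and unzipping, and walking every subfolder with os.walk. All folder and file names are hypothetical and live in a throwaway temporary directory.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "deeper"), exist_ok=True)
    note_path = os.path.join(root, "sub", "note.txt")

    # "with" closes the file even if an exception is raised
    with open(note_path, "w", encoding="utf-8") as f:
        f.write("first line\n")
        f.write("second line\n")

    with open(note_path, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()
    file_closed = f.closed                  # True once the with block ends

    # Copy a file to another path (returns the destination path)
    copy_path = shutil.copy(note_path, os.path.join(root, "sub", "deeper", "copy.txt"))

    # Zip a file, then unzip it into another folder
    zip_path = os.path.join(root, "archive.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(note_path, arcname="note.txt")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(os.path.join(root, "unzipped"))

    # Iterate through a path to get files of all the subpaths
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, filename), root))
```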

System

Time

Plotting

Advanced

Class

Magic Method

Data Structure - Processing iterables with a functional style

Algorithm

Inheritance

Passing variables in from command line

Environment Setting

When to use configparser? When to use .env?
#### TLDR:
Use .env to store string values (e.g. secrets) that must never be exposed on a version-control platform or baked into a Docker image.

### use .env
- the leading . in the filename makes it hidden (on Unix-like systems)
- already excluded in common preset .gitignore templates
- Nearly every programming language has a package or library that can read environment variables from the .env file instead of from your local environment.
- If no .env file is found, load_dotenv loads nothing and the variables are read from the host environment instead (useful in Docker)

### use configparser
- supports more built-in value types (int, float, boolean, string) and checks on the values
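Here is a minimal configparser sketch showing the typed getters mentioned above. The section and key names are hypothetical. read_string keeps the example self-contained; config.read("settings.ini") would load a real file. The .env side, via python-dotenv's load_dotenv, is not shown here.

```python
import configparser

config = configparser.ConfigParser()
config.read_string(
    """
[server]
host = localhost
port = 8080
debug = yes
"""
)

host = config.get("server", "host")                            # str
port = config.getint("server", "port")                         # int
debug = config.getboolean("server", "debug")                   # accepts yes/no, on/off, true/false, 1/0
timeout = config.getfloat("server", "timeout", fallback=5.0)   # missing key -> fallback
```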

XML Parser

URL

Performance

Multiprocessing

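Difference between Pool (from multiprocessing) and threads:
a Pool spins up separate worker processes, while threads stay in the same process
  
The goal of a Pool (multiprocessing) is to make full use of all CPU cores; in standard CPython builds, threads running Python code use one core at a time because of the GIL.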
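Here is a minimal Pool sketch. `square` is a stand-in for real CPU-bound work, and the worker count of 4 is arbitrary. The `__main__` guard is needed on platforms that start workers with "spawn".

```python
from multiprocessing import Pool


def square(x: int) -> int:
    # Placeholder for CPU-bound work
    return x * x


def main() -> list[int]:
    # Pool() without arguments uses os.cpu_count() worker processes
    with Pool(processes=4) as pool:
        # map splits the inputs across worker processes and keeps the input order
        return pool.map(square, range(10))


if __name__ == "__main__":
    print(main())
```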

Logging

Built-In Logging

Logging Others

Design Patterns

Testing

  • Simple check, raises AssertionError if the condition is false. Format: assert condition, message_shown_when_raised
    • assert a == 20, "val a == 20"
    • assert isinstance(a, int)
  • List
  • Tuple
  • Set
    # Prior to python 3.9
    
    from typing import List, Tuple, Set
    
    items: List[str]
    values : Tuple[int, str, str]
    products : Set[bytes]
    
    # python 3.9 onwards
    # no import needed
    
    items: list[str]
    values : tuple[int, str, str]
    products : set[bytes]
    
  • Any: from typing import Any; variable : Any
  • Union / Optional
  • Literal
  • Annotated
    • Before python 3.9: from typing_extensions import Annotated
    • Python 3.9 onwards: from typing import Annotated
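This sketch covers Union/Optional, Literal and Annotated from the list above. Type checkers enforce these hints, but Python does not check them at runtime. The functions, the `Mode` alias and the `Meters` alias are hypothetical examples, and `list[...]` in hints assumes Python 3.9+.

```python
from typing import Annotated, Literal, Optional, Union, get_args

# Literal: only these exact values are allowed
Mode = Literal["r", "w", "a"]

# Annotated: attach metadata to a type; Python itself ignores the extra info
Meters = Annotated[float, "unit: meters"]


def parse_port(value: Union[str, int]) -> int:
    # Union[str, int] accepts either type (Python 3.10+: str | int)
    return int(value)


def find_user(user_id: int) -> Optional[str]:
    # Optional[str] means Union[str, None] (Python 3.10+: str | None)
    users = {1: "Ann"}
    return users.get(user_id)


def open_file(path: str, mode: Mode = "r") -> str:
    return f"{path} opened with mode {mode}"


def total_distance(legs: list[Meters]) -> Meters:
    return sum(legs)
```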

Pydantic : Data parsing and validation library

Keywords

Others

Webbrowser

  • Open url with webbrowser module
    • In script
    • In command: python -m webbrowser -t "https://www.python.org"
Redis

  • Declare redis: import redis; r = redis.Redis(host='127.0.0.1', port=6379)
  • Set key: r.set('counter', 1)
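  • Get key: value : bytes | None = r.get('counter') (returns bytes, or None if the key is missing; pass decode_responses=True to redis.Redis() to get str)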
  • Check if key exist: r.exists(<key>)
  • Set expiry and check ttl

Networking

  • Get IP from domain name: import socket; socket.gethostbyname("www.google.com")
  • Get host name of the machine: socket.gethostname()

Concurrency

Built-in Concurrency Library: Asyncio
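Here is a minimal asyncio sketch. asyncio.gather runs coroutines concurrently and returns their results in argument order. `fetch` is a hypothetical coroutine in which asyncio.sleep stands in for real I/O.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # asyncio.sleep stands in for real I/O such as a network request
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list[str]:
    # Both coroutines wait concurrently, so this takes about 0.2 s, not 0.3 s
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1))


results = asyncio.run(main())
```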

Hashing
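Here is a minimal hashlib sketch. It computes a one-shot SHA-256 digest and an incremental one. The incremental form is how large files are usually hashed chunk by chunk, for example with `for chunk in iter(lambda: f.read(8192), b""): hasher.update(chunk)`.

```python
import hashlib

# One-shot digest of a bytes object
digest = hashlib.sha256(b"abc").hexdigest()

# Incremental hashing gives the same result as hashing everything at once
hasher = hashlib.sha256()
for chunk in (b"a", b"bc"):
    hasher.update(chunk)
chunked_digest = hasher.hexdigest()
```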

Web

Software Development

REST

FastAPI

Requests

Database

Cloud

AWS

Machine Learning

Pytorch

  • Check if cuda is available - import torch; torch.cuda.is_available()
  • Softmax

Torch Tensor

Torch Tensor Creation

  • Create tensor of zeros with shape like another tensor: torch.zeros_like(another_tensor)
  • Create tensor of zeros with shape (tuple): torch.zeros(shape_in_tuple)
  • Create tensor of ones with shape like another tensor: torch.ones_like(another_tensor)
  • Create tensor of ones with shape (tuple): torch.ones(shape_in_tuple)
  • Create tensor of random floating value between 0-1 with shape like another tensor:
    torch.rand_like(another_tensor, dtype = torch.float)
  • Create tensor of random floating value between 0-1 with shape (tuple):
    torch.rand(shape_in_tuple)

Torch Tensor Info Extraction

  • Given torch.tensor buffer = tensor(4), get the value by - id = buffer.item()
  • Given torch.tensor, get the argmax of each row - torch.argmax(buffer, dim=<(int)dimension_to_reduce>)
  • Tensor to cuda - inputs = inputs.to("cuda:0") or inputs = inputs.cuda()
  • Tensor to cpu - inputs = inputs.to("cpu") or inputs = inputs.cpu()
  • Tensor shape - tensor.shape
  • Tensor data types - tensor.dtype
  • Device tensor is stored on - tensor.device
  • Retrieve subset of torch tensor by row index: tensor[<row_number>, :] / tensor[<row_number_from>:<row_number_to>, :]
  • Retrieve subset of torch tensor by column index: tensor[:, <column_number_from>:<column_number_to>]

Torch Tensor Conversion

Torch Tensor Operation

Dataset Loader, Iterator

  • torch.utils.data.Dataset: stores the samples and their corresponding labels
  • torch.utils.data.DataLoader: wraps an iterable around the Dataset to enable easy access to the samples
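Here is a minimal sketch of the Dataset/DataLoader split, assuming PyTorch is installed. `SquaresDataset` is a hypothetical toy dataset: the Dataset returns one (sample, label) pair per index, and the DataLoader batches them.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: feature x, label x squared."""

    def __init__(self, n: int) -> None:
        self.x = torch.arange(n, dtype=torch.float32).unsqueeze(1)  # shape (n, 1)
        self.y = self.x.squeeze(1) ** 2                             # shape (n,)

    def __len__(self) -> int:
        return len(self.x)

    def __getitem__(self, idx: int):
        # Dataset: return one (sample, label) pair
        return self.x[idx], self.y[idx]


dataset = SquaresDataset(5)
# DataLoader: batches (and optionally shuffles) samples from the Dataset
loader = DataLoader(dataset, batch_size=2, shuffle=False)
batches = [(xb.shape, yb.tolist()) for xb, yb in loader]
```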

Torch Tensor In/Out

Torch Dataset

  • Image Datasets

    • Fashion MNIST Torch

      Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes.

  • Text Datasets

  • Audio Datasets

Huggingface

Computer Vision

Computer Vision - Basic

  • Get image shape: img.shape (Important: shape[0] = height, shape[1] = width)
  • Create a color image: image = np.zeros((h,w,3), np.uint8)
  • Read/Write image:
  • Read image
  • Pause to display image or wait for an input: cv2.waitKey(0)
  • Save an image: cv2.imwrite(pathtoimg : str, img : numpy.ndarray)
  • Show an image in window: cv2.imshow(windowname : str, frame : np.array)
  • Show an image in Jupyter notebook
    from IPython.display import Image
    Image(filename=pathtoimg)  # pathtoimg: str
    
  • Crop image
    • numpy array: image[y0:y1, x0: x1, :]
  • Flip image: frame = cv2.flip(frame, flipcode : int)
    • Positive flip code for flip on y axis (left right flip)
    • 0 for flip on x axis (up down)
    • Negative for flipping around both axes
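This sketch uses plain NumPy so that it runs without OpenCV. It shows the height/width order of img.shape, cropping with image[y0:y1, x0:x1, :], and slicing equivalents of the three cv2.flip flip codes. The 4x6 image and the marked pixel are illustrative.

```python
import numpy as np

h, w = 4, 6
image = np.zeros((h, w, 3), np.uint8)   # blank color image: rows = height, cols = width
image[0, 0] = (255, 0, 0)               # mark the top-left pixel

height, width = image.shape[:2]         # shape[0] is height, shape[1] is width

crop = image[1:3, 2:5, :]               # rows y0:y1, columns x0:x1

# NumPy slicing equivalents of cv2.flip flip codes
left_right = image[:, ::-1]             # like cv2.flip(image, 1)
up_down = image[::-1, :]                # like cv2.flip(image, 0)
both_axes = image[::-1, ::-1]           # like cv2.flip(image, -1)
```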

Computer Vision - Intermediate

Computer Vision - Filter

  • Blur with averaging mask: cv2.blur(img,(5,5))
  • GaussianBlur: blur = cv2.GaussianBlur(img,(5,5),0)
    • Note: Kernel size (5, 5) must be positive and odd; a larger kernel gives a stronger blur.
  • Blurring region of image

Computer Vision - Video Stream

Computer Vision - Other

Librosa (I/O operations for audio files, including resampling)

Huggingface Datasets

Good To Read

Medium Posts

About

Bite-size code to facilitate your daily data science usage 🍪
