DKT Project Served by Airflow / BentoML / Docker Swarm

cheat-tos/serving

📚 Deep Knowledge Tracing (DKT)

Deep Knowledge Tracing (Deep Learning + Knowledge Tracing) is the task of estimating a student's knowledge state from their results on a given test and predicting, based on that state, whether they will answer the next question correctly. The knowledge state keeps changing through learning and forgetting, so it must be tracked continuously as new problem-solving records arrive.


📝 Repository Summary

This repository contains the code for building the architecture described below. The architecture assumes two servers.

1. Inference server: Naver Cloud Platform Server

  • Receives user requests, runs inference, and renders the results. It also packs the model file and builds it into a new Docker image.

2. Train server: P40 GPU Server

  • A server running as a Docker container with a P40 GPU allocated. It downloads new data, runs training, and produces the model.pt file. This server was chosen because model training needs a lot of GPU resources.

Why split the servers?

Each task in the ML/DL cycle needs a different kind and amount of resources. A train server, for example, needs a high-end GPU environment for training to run smoothly. Inference, on the other hand, does not justify the cost of a GPU server unless it serves a heavy model. Building an architecture that minimizes cost per task and can scale is one of the key points of AI model serving.

By default, the setup targets the NCP Cloud Server and the P40 GPU Server, which is a Docker container. To make it work in your own environment, change the path settings in the config files or in the code.

Service Flow

  1. First, git clone the repository on both the Inference and Train servers, then use the shell scripts to set up each server and create the initial Client-Inference containers.

  2. When a user sends a request to the Inference server, the Flask container (Client container) forwards the data to the Inference container (server container), which predicts the score.

  3. update_data_dag

    • Data to be uploaded to AWS S3 is accumulated during inference and periodically uploaded to S3 for versioning. Be sure to replace the S3 bucket name with your own S3 bucket name.

  4. retrain_and_push_dag

    • The Train server downloads the updated data from S3, runs training to produce a model file, and uploads that file back to S3 for versioning.

  5. update_inference_dag

    • The Inference Server downloads the model file, uses BentoML to pack and containerize it together with the existing code, builds an image, and pushes it to Docker Hub. Replace the Docker Hub image repository setting with your own repository.
    • The Inference containers are grouped into a Service and deployed with Docker Swarm. Finally, the Docker image is pulled and the Inference Service is updated to the changed image.

After that, steps 2-5 can be repeated periodically to retrain and update the model. The whole process is run and managed by Apache Airflow DAGs.
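Step 2 of the flow (the Client container forwarding a user's data to the Inference container) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual client code: the URL, the `predict` endpoint name, and the payload field names are all assumptions.

```python
import json
import urllib.request

# Hypothetical address of the inference service; adjust host, port, and
# endpoint name to match your own deployment.
INFERENCE_URL = "http://inference:5000/predict"


def build_predict_request(interactions, url=INFERENCE_URL):
    """Wrap a user's interaction records in a JSON POST request."""
    body = json.dumps(interactions).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_score(interactions, url=INFERENCE_URL, timeout=5.0):
    """Send the request to the inference container and decode its JSON reply."""
    request = build_predict_request(interactions, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example payload; the field names are illustrative, not the repository's schema.
sample = [{"question_id": "Q1", "answer": 1}, {"question_id": "Q2", "answer": 0}]
request = build_predict_request(sample)
```

Building the request separately from sending it keeps the payload format easy to inspect without a running inference container.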


📌 How to run

The top level of this repository contains two initialization shell scripts, one for the Inference Server and one for the Train Server. Use them for the initial setup of each server. After that, run the DAGs from the Airflow webserver to execute the serving pipeline.

init_for_inference_server.sh

🏠 Home directory: /root/

✔️ Installed packages

  • Docker
  • Airflow
  • SQLite3

Description

  • Sets up the packages and starts the Airflow Scheduler daemon
  • Starts the GUI web server daemon (default port 8080)
  • Deploys the initial Client-Inference container services with Docker Swarm
  • Requires running aws configure during execution; without it, S3 upload/download will not work

init_for_train_server.sh

🏠 Home directory: /opt/ml/

✔️ Installed packages

  • Docker
  • Airflow
  • SQLite3

Description

  • Sets up the packages and starts the Airflow Scheduler daemon
  • Starts the GUI web server daemon (default port 6006)
  • Requires running aws configure during execution; without it, S3 upload/download will not work

Airflow Dags

Running the DAGs below in order completes one cycle.

update_data_dag - Inference Server

Merges the user interaction logs accumulated during inference with the existing data and uploads the result to S3.
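As a rough sketch of what this DAG's work might look like, the snippet below merges two CSV files and uploads the result under a timestamped key. It assumes both files share the same columns, and the file names, key layout, and helper names are hypothetical; the repository may version its data differently (for example through S3 bucket versioning). `boto3` is imported lazily so the merge logic can be tried without AWS access.

```python
import csv
from datetime import datetime, timezone


def merge_logs(existing_csv, new_logs_csv, merged_csv):
    """Append newly logged interaction rows to the existing data set."""
    with open(existing_csv, newline="") as f:
        existing = list(csv.DictReader(f))
    with open(new_logs_csv, newline="") as f:
        new_rows = list(csv.DictReader(f))
    rows = existing + new_rows
    if not rows:
        raise ValueError("nothing to merge: both input files are empty")
    with open(merged_csv, "w", newline="") as f:
        # Assumes every row has the same columns as the first one.
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def versioned_key(prefix, now=None):
    """Build a timestamped object key (a hypothetical versioning scheme)."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}/{now:%Y%m%d-%H%M%S}/train_data.csv"


def upload_to_s3(path, bucket, key):
    """Upload one file; needs boto3 and credentials from `aws configure`."""
    import boto3  # imported here so the helpers above work without boto3

    boto3.client("s3").upload_file(path, bucket, key)
```

A call such as `upload_to_s3("merged.csv", "your-bucket-name", versioned_key("data"))` would then publish the merged file, with the bucket name replaced by your own.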

retrain_and_push_dag - Train Server

Loads the updated user interaction logs from S3, retrains the model, and uploads the retrained model to S3.
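The download, train, upload order of this DAG could be expressed in Airflow roughly as below. This is a sketch only, written against the Airflow 2.x API (newer releases rename `schedule_interval` to `schedule`); the DAG id, task ids, and placeholder `echo` commands are assumptions and do not reflect the contents of retrain_and_push_dag.py.

```python
from datetime import datetime

# Placeholder commands; substitute the real download, training, and upload
# commands used on your Train server.
RETRAIN_STEPS = [
    ("download_data", "echo 'download updated data from S3'"),
    ("train_model", "echo 'run training and write the model file'"),
    ("upload_model", "echo 'upload the model file to S3'"),
]


def build_dag(dag_id="retrain_and_push_sketch"):
    """Chain the steps so that each task runs only after the previous one."""
    # Imported lazily so this module can be read without Airflow installed.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    dag = DAG(
        dag_id=dag_id,
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,  # trigger manually from the webserver
        catchup=False,
    )
    previous = None
    for task_id, command in RETRAIN_STEPS:
        task = BashOperator(task_id=task_id, bash_command=command, dag=dag)
        if previous is not None:
            previous >> task
        previous = task
    return dag
```

In a real DAG file, Airflow only discovers DAG objects defined at module level, so the file would end with something like `dag = build_dag()`.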

update_inference_server_dag - Inference Server

Downloads the retrained model stored in S3 to the Inference server and performs a rolling update.
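The containerize, push, pull, and rolling-update sequence might be driven from Python along these lines. This is a hedged sketch rather than the repository's DAG code: the bundle tag, image name, and service name in the usage note are hypothetical, and the update parallelism and delay values are illustrative choices. It relies on the `bentoml containerize` command of the BentoML 0.x CLI and on `docker service update --image`.

```python
import subprocess


def rolling_update_commands(bento, image, service):
    """Return the CLI calls for one model rollout, in execution order."""
    return [
        # Build a Docker image from a saved BentoML bundle (BentoML 0.x CLI).
        ["bentoml", "containerize", bento, "-t", image],
        # Publish the image, then pull it on the node that runs the update.
        ["docker", "push", image],
        ["docker", "pull", image],
        # Replace the service's containers one at a time with the new image.
        [
            "docker", "service", "update",
            "--image", image,
            "--update-parallelism", "1",
            "--update-delay", "10s",
            service,
        ],
    ]


def run_rolling_update(bento, image, service):
    """Run every command in order, stopping at the first failure."""
    for command in rolling_update_commands(bento, image, service):
        subprocess.run(command, check=True)
```

For example, `run_rolling_update("YourService:latest", "your-dockerhub-id/dkt-inference:latest", "dkt_inference")` would roll the named Swarm service over to the freshly built image, assuming those names exist in your environment.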


🛠️ Installation

Server Dependencies (pip3)

  • easydict==1.9
  • numpy==1.19.5
  • pandas==1.1.5
  • sklearn==0.0
  • torch==1.6.0
  • transformers==4.6.1
  • bentoml==0.12.1
  • boto3==1.17.78
  • apache-airflow
  • sqlalchemy < 1.4.0
  • attrdict

๐Ÿ›๏ธ File Structure

├── client                                 # Components of Flask
│   ├── Dockerfile
│   ├── app
│   │   ├── main.py
│   │   ├── requirements.txt
│   │   ├── static
│   │   │   └── ...
│   │   ├── templates
│   │   │   └── index.html
│   │   └── uwsgi.ini
│   └── start.sh                           # Build and run client server container
│
├── dags                                   # Airflow DAGs
│   ├── retrain_and_push_dag.py
│   ├── update_data_dag.py
│   └── update_inference_server_dag.py
│
├── data                                   # Data WH upload/download & save data
│   └── ...
├── models                                 # Data WH upload/download & save model
│   └── ...
│
├── docker-compose.yml
├── docker_manual
│   ├── docker_commands.sh                 # Summary of docker commands
│   └── service-init.sh                    # Start inference with docker swarm
│
├── config                                 # Model config JSON files
│   └── ...
├── asset                                  # Encoder class npy files
│   └── ...
├── dkt                                    # Baseline code
│   └── ...
├── args.py                                # Get user arguments
├── train.py                               # Training model
├── requirements.txt
│
├── questions.csv
├── inference.py                           # Inference using questions.csv
├── packer.py                              # Packing model, encoders to bentoml service class
├── service.py                             # Compose inference api
│
├── init_for_inference_server.sh           # Initialize for Inference server
├── init_for_train_server.sh               # Initialize for Train server
└── README.md

client

The components that make up the web page, built with Flask. Edit this part if you want to change the screen users interact with.

dags

The directory holding the files that are registered with Airflow as DAGs. During initialization they are copied into the airflow/dags/ directory with the cp command.
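For environments where the init scripts are not used, the same copy step could be reproduced in Python. The sketch below assumes Airflow's conventional layout, in which DAG files live in the dags folder under AIRFLOW_HOME (defaulting to ~/airflow); the `install_dags` helper is hypothetical and not part of the repository.

```python
import os
import shutil
from pathlib import Path


def install_dags(src_dir, airflow_home=None):
    """Copy every .py file in src_dir into <airflow_home>/dags."""
    home = Path(
        airflow_home or os.environ.get("AIRFLOW_HOME") or Path.home() / "airflow"
    )
    target = home / "dags"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for dag_file in sorted(Path(src_dir).glob("*.py")):
        shutil.copy2(dag_file, target / dag_file.name)
        copied.append(dag_file.name)
    return copied
```

Calling `install_dags("dags")` from the repository root would place the three DAG files where the Airflow scheduler looks for them by default.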

data

Handles S3 upload/download when new data is added through user interaction. It also stores the CSV data to be uploaded and the CSV data that was downloaded.

models

Handles S3 upload/download when a new model is added. It also stores the model to be uploaded and the model that was downloaded.

docker_manual

Contains docker_commands.sh, a summary of Docker commands, and service_init.sh, which sets up the initial Inference container services.

config

Contains the argument configuration files for training and inference.

asset

Contains the npy files that hold the Encoder information used to transform categorical data during training.
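To illustrate how such encoder files might be used at inference time, the sketch below maps category labels to integer ids. It assumes each npy file stores an ordered array of class labels and that the array includes an "unknown" entry for unseen values; both points, and the helper names, are assumptions rather than facts about this repository's encoders.

```python
def load_classes(npy_path):
    """Read the ordered class labels from one encoder file."""
    import numpy as np  # imported lazily; only needed for real .npy files

    # Object-dtype arrays would additionally need allow_pickle=True.
    return np.load(npy_path).tolist()


def build_index(classes):
    """Map each class label to its position in the saved array."""
    return {value: index for index, value in enumerate(classes)}


def encode(values, index, unknown="unknown"):
    """Encode labels as integers, sending unseen labels to the unknown id."""
    encoded = []
    for value in values:
        if value in index:
            encoded.append(index[value])
        elif unknown in index:
            encoded.append(index[unknown])
        else:
            raise KeyError(f"unseen label {value!r} and no {unknown!r} class")
    return encoded
```

Reusing the label order saved at training time keeps the integer ids seen by the model consistent between training and inference.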

dkt

The baseline code directory, containing the Trainer, Model, Metric, Loss, and other code needed for the DKT task.


👪 Contributor

김성익

김동우

황정훈