snoop2head/instagram_hashtag_analysis

instagram_hashtag_analysis

Crawl and Analyze Instagram Hashtag Data

Number prefixes of the files

  • 0: Crawl Instagram posts according to search result of #keyword
  • 1: Create and wrangle dataset with pandas
  • 2: KoNLPy tagging for Korean nouns and predicates (verbs, adjectives)
  • 3: Extract similar documents and make word2Vec models with gensim
  • 4: TF-IDF code without using scikit-learn library
  • 5: Extract similar documents using scikit-learn's TfidfVectorizer
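
As an illustration of item 4, here is a minimal sketch of TF-IDF written with only the Python standard library. It is not the repository's code: the function names `tf_idf` and `cosine`, the sample tokens, and the weighting variant (relative term frequency times `log(N / df)`, with no smoothing) are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, already-tokenized captions (e.g. nouns from step 2).
docs = [
    ["coffee", "cafe", "seoul"],
    ["coffee", "latte", "cafe"],
    ["beach", "sunset", "busan"],
]
vectors = tf_idf(docs)
```

With these sample tokens, `cosine(vectors[0], vectors[1])` is positive because the two captions share terms, while `cosine(vectors[0], vectors[2])` is zero because they share none.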

The number at the front of each file means the following:

  • 0: Search by #keyword and crawl Instagram by hashtag
  • 1: Merge and manipulate the Instagram data with the pandas module
  • 2: KoNLPy morphological analysis -> extract the most frequent substantives (nouns) and predicates (verbs, adjectives)
  • 3: Build a Word2Vec model with Gensim and extract similar documents
  • 4: TF-IDF example written from scratch (vanilla), without the scikit-learn module
  • 5: Extract similar documents with the scikit-learn module's TF-IDF Vectorizer
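
For item 5, a similar-document lookup with scikit-learn might look like the sketch below. It is only an assumption about how such a step could be written, not the repository's actual notebook: the sample captions and the helper `most_similar` are hypothetical, and `TfidfVectorizer` is used with its default settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical captions; for Korean text these would typically be
# space-joined tokens produced by the KoNLPy step.
captions = [
    "coffee cafe seoul",
    "coffee latte cafe",
    "beach sunset busan",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(captions)  # sparse (n_docs, n_terms)

# Dense (n_docs, n_docs) array of pairwise cosine similarities.
similarity = cosine_similarity(matrix)


def most_similar(index):
    """Return the index of the caption most similar to captions[index]."""
    scores = similarity[index].copy()
    scores[index] = -1.0  # exclude the caption itself
    return int(scores.argmax())
```

Calling `most_similar(0)` returns the index of the caption that shares the most heavily weighted terms with the first one.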