USC DSCI 560 - Data Science Professional Practicum - Spring 2024 - Prof. Young Cho

KayvanShah1/usc-dsci560-dspp-sp24

USC DSCI 560 - Data Science Professional Practicum - Spring 2024

This repository is a reference guide intended for learning and comprehension. Do not plagiarize from it. Use the material here to deepen your understanding and build your skills in machine learning, data science, and analytics.

Before diving in, read the license and disclaimer. They set out the terms for using the resources in this repository responsibly and ethically.

Tip

Review the license and disclaimer before exploring the materials. The repository covers a range of topics and offers hands-on experience in professional data science practice.

Course Details:

  • Course Name: DSCI 560 - Data Science Professional Practicum
  • Instructor: Prof. Young Cho
  • Semester: Spring 2024

Browse the folders, assignments, and projects, and use the provided solutions as learning aids. Whether you are a novice learning the fundamentals or an experienced practitioner refining your skills, this repository aims to support your work in machine learning and data science.

We encourage you to engage actively and experiment with the materials provided.

Happy learning!

Caution

Treat this repository as a reference for learning only. Do not submit any part of it as your own work; plagiarism of any kind is unacceptable.

Labs & Project

| Lab | Topic Covered | Grade |
| --- | --- | --- |
| Lab 1 | Web Scraping Stock Market Indicators and Latest News from CNBC World | 100/100 |
| Lab 2 | Text Extraction from PDFs | 100/100 |
| Lab 3 Part 1 | Stock Price Analysis & Algorithmic Trading | 100/100 |
| Lab 3 Part 2 | Stock Price Forecasting & Mock Trading Environment | 95/100 |
| Lab 4 Part 1 | Subreddit Posts Data and Keyphrase Extraction | 100/100 |
| Lab 4 Part 2 | Clustering Analysis of Reddit Posts | 100/100 |
| Lab 5 Part 1 | Oil Wells Data Extraction | 100/100 |
| Lab 5 Part 2 | Oil Wells Analysis and Visualization | 100/100 |
| Lab 6 Part 1 | PDFs Question Answering Chatbot using Langchain & OpenAI API | 100/100 |
| Lab 6 Part 2 | PDFs Question Answering Chatbot using Langchain & Llama 2 | 100/100 |
| Final Project | VirtuTA: AI-enabled Virtual Teaching Assistant | 8.18/10 |

Readings

| Reading | Paper Title | Grade |
| --- | --- | --- |
| 1 | CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks | 100/100 |
| 2 | Deep learning-based NLP data pipeline for EHR-scanned document information extraction | 100/100 |
| 3 | Stock Price Forecasting Using Data From Yahoo Finance and Analysing Seasonal and Nonseasonal Trend | 100/100 |
| 4 | Can Blog Communication Dynamics be Correlated with Stock Market Activity? | 100/100 |
| 5 | Streaming Hierarchical Clustering for Concept Mining | 100/100 |
| 6 | A Physics-Guided Deep Learning Predictive Model for Robust Production Forecasting and Diagnostics in Unconventional Wells | 100/100 |
| 7 | Language Models are Unsupervised Multitask Learners | 100/100 |

Authors

  1. Kayvan Shah | MS in Applied Data Science | USC
  2. Soma Meghana Prathipati | MS in Applied Data Science | USC
  3. Shreyansh Baredia | MS in Applied Data Science | USC

LICENSE

This repository is licensed under the BSD 3-Clause License. See the LICENSE file for details.

Disclaimer

The content and code in this repository are provided for educational and demonstrative purposes only. The project may contain experimental features, and the code may not be optimized for production environments. The authors and contributors are not liable for any misuse, damages, or risks arising from the direct or indirect use of this code. Users are strongly advised to review, test, and adapt the code to suit their specific use cases and requirements. By using any part of this project, you agree to these terms.