2020 Spring Fudan University Machine Learning Course HW by prof. Chen Qin. Fudan University School of Data Science, Spring 2020 course: Artificial Intelligence and Machine Learning (DATA620006)

jrothschild33/Fudan-MachineLearning

Fudan University-Machine Learning

2020 Spring Fudan University Machine Learning Course HW by prof. Chen Qin

1. Linear Regression

Task

The datasets are real observations downloaded from the website of the Central Meteorological Administration. Please use linear regression to predict the PM2.5 value.

Modules

  • numpy==1.18.3
  • pandas==0.25.3
  • seaborn==0.10.1
  • matplotlib==3.2.1
  • sklearn.model_selection.train_test_split
  • sklearn.metrics.mean_squared_error
  • sklearn.linear_model.LinearRegression
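
The workflow implied by the modules above (split the data, fit a linear model, score it with mean squared error) can be sketched as follows. This is a NumPy-only illustration on synthetic data, not the course dataset or the repository's code: the feature matrix, the coefficients in `true_w`, and the helper names are all assumptions made for the example. `sklearn.linear_model.LinearRegression` fits the same ordinary least squares model.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept (last entry of the result)."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return weights


def predict(X, weights):
    """Apply the fitted weights (including the intercept) to new rows."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    return X_aug @ weights


def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


rng = np.random.default_rng(0)

# Synthetic stand-in for the weather features (assumption, not real data).
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(scale=0.1, size=200)

# Hold out the last 20% of the rows as a test set.
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

weights = fit_linear_regression(X_train, y_train)
test_mse = mean_squared_error(y_test, predict(X_test, weights))
print("fitted weights:", weights)
print("test MSE:", test_mse)
```

On real observations the features would first need cleaning and selection, which this sketch leaves out.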

Results

[Figure: result]

2. Logistic Regression

Task

Implement two models (a probabilistic generative model and a logistic regression model) to predict whether a person earns over 50k a year based on their personal information.

Modules

  • numpy==1.18.3
  • pandas==0.25.3
  • seaborn==0.10.1
  • matplotlib==3.2.1
  • sklearn.preprocessing
  • sklearn.preprocessing.MinMaxScaler
  • sklearn.preprocessing.LabelEncoder
  • sklearn.linear_model.LogisticRegression
  • sklearn.metrics.accuracy_score
  • sklearn.model_selection.train_test_split
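
As an illustration of the logistic regression half of the task, here is a minimal NumPy sketch trained with batch gradient descent on min-max scaled features. Everything about the data is an assumption for the example: the two features (`age`, `hours`), the rule that generates the labels, and the hyperparameters are made up and do not come from the course dataset or the repository.

```python
import numpy as np


def fit_min_max(X):
    """Return the per-feature minimum and range used for min-max scaling."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant features
    return lo, span


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, lr=1.0, epochs=3000):
    """Batch gradient descent on the mean cross-entropy loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        error = sigmoid(X @ w + b) - y  # gradient of the loss w.r.t. the logits
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


def predict(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(float)


rng = np.random.default_rng(0)

# Hypothetical features and a made-up labelling rule (not the real data).
n = 400
age = rng.uniform(18, 70, n)
hours = rng.uniform(10, 60, n)
X_raw = np.column_stack([age, hours])
y = (0.03 * age + 0.05 * hours > 3.0).astype(float)

# Fit the scaler on the training rows only, then apply it to both splits.
X_train_raw, X_test_raw = X_raw[:300], X_raw[300:]
y_train, y_test = y[:300], y[300:]
lo, span = fit_min_max(X_train_raw)
X_train = (X_train_raw - lo) / span
X_test = (X_test_raw - lo) / span

w, b = train_logistic_regression(X_train, y_train)
test_accuracy = float(np.mean(predict(X_test, w, b) == y_test))
print("weights:", w, "bias:", b)
print("test accuracy:", test_accuracy)
```

Categorical columns would need encoding (for example with a label encoder) before a model like this can use them.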

Results

[Figure: learning curve]

3. Sentiment Classification

Task

This task is based on subtask 2 of SemEval-2014 Task 4: Aspect Based Sentiment Analysis.

You are required to implement two neural networks (RNN and CNN or their variants) for sentiment classification specific to an aspect.

For example:

  • “Even though its good seafood, the prices are too high”.
  • This sentence contains two aspects, namely “seafood” and “prices”. The sentiments for the two aspects are positive and negative, respectively.
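
The example above implies that one sentence yields one training instance per aspect. A possible way to represent this before feeding a network is sketched below. The `AspectExample` class, the `aspect_mask` field, and the simple regex tokenizer are hypothetical choices for illustration, not the data format used in the repository.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AspectExample:
    tokens: List[str]
    aspect: str
    aspect_mask: List[int]  # 1 where the token belongs to the aspect term
    label: str


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_examples(sentence: str, aspect_labels: Dict[str, str]) -> List[AspectExample]:
    """Create one instance per aspect, marking where the aspect occurs."""
    tokens = tokenize(sentence)
    examples = []
    for aspect, label in aspect_labels.items():
        aspect_tokens = tokenize(aspect)
        mask = [0] * len(tokens)
        # Mark the first occurrence of the aspect term in the sentence.
        for start in range(len(tokens) - len(aspect_tokens) + 1):
            if tokens[start:start + len(aspect_tokens)] == aspect_tokens:
                for i in range(start, start + len(aspect_tokens)):
                    mask[i] = 1
                break
        examples.append(AspectExample(tokens, aspect, mask, label))
    return examples


sentence = "Even though its good seafood, the prices are too high"
examples = build_examples(sentence, {"seafood": "positive", "prices": "negative"})
for ex in examples:
    print(ex.aspect, ex.label, ex.aspect_mask)
```

A model can then combine the token sequence with the mask (or with the aspect's own embedding) so that the same sentence produces different predictions for different aspects.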

Modules

  • numpy==1.18.3
  • pandas==0.25.3
  • torch==1.2.0
  • torch.optim
  • torch.nn.functional

Results

[Figure: result1]

[Figure: result2]

4. Auto Encoder

Task

Please write an auto-encoder for the images.

  • Use the trained encoder to obtain the 2-dimensional code of the last 1000 images in the test set, and visualize them with a scatterplot where different colors represent different digits.
  • Use the decoder to generate 20 images by sampling some codes.
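
The two steps above (encode the last 1000 images into 2-dimensional codes, then decode sampled codes into new images) can be sketched with a deliberately tiny linear auto-encoder in NumPy. This is an assumption-heavy illustration: the data are synthetic 8x8 "images" rather than the course images, the model is linear rather than whatever network the repository trains, and the learning rate and step count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for flattened 8x8 images (assumption, not the real data).
n_images, n_pixels, code_dim = 1500, 64, 2
latent = rng.normal(size=(n_images, code_dim))
basis = rng.normal(size=(code_dim, n_pixels)) / np.sqrt(n_pixels)
images = latent @ basis + 0.01 * rng.normal(size=(n_images, n_pixels))

# Linear encoder and decoder weights, initialised with small random values.
W_enc = 0.1 * rng.normal(size=(n_pixels, code_dim))
W_dec = 0.1 * rng.normal(size=(code_dim, n_pixels))


def encode(x):
    return x @ W_enc


def decode(codes):
    return codes @ W_dec


def reconstruction_loss(x):
    """Mean over images of the summed squared reconstruction error."""
    return float(np.mean(np.sum((decode(encode(x)) - x) ** 2, axis=1)))


initial_loss = reconstruction_loss(images)

lr = 0.05
for _ in range(500):
    codes = images @ W_enc
    residual = codes @ W_dec - images
    # Gradients of the reconstruction loss with respect to both weight matrices.
    grad_dec = 2.0 * codes.T @ residual / n_images
    grad_enc = 2.0 * images.T @ (residual @ W_dec.T) / n_images
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = reconstruction_loss(images)

# Step 1: 2-dimensional codes of the last 1000 images (ready for a scatterplot).
last_codes = encode(images[-1000:])

# Step 2: sample 20 codes around the observed code distribution and decode them.
sampled_codes = rng.normal(
    loc=last_codes.mean(axis=0), scale=last_codes.std(axis=0), size=(20, code_dim)
)
generated = decode(sampled_codes).reshape(20, 8, 8)

print("loss before/after training:", initial_loss, final_loss)
print("codes:", last_codes.shape, "generated images:", generated.shape)
```

With labelled digits, `last_codes[:, 0]` and `last_codes[:, 1]` would be the scatterplot coordinates and the digit labels would set the colors.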

Modules

  • numpy == 1.18.3
  • scipy == 1.2.1
  • Pillow == 7.1.2
  • tensorflow == 1.15.3
  • torch == 1.2.0

Results

[Figure: epoch]

[Figure: decoder]

5. Reproduction of ALBERT Model

Task

With the development and application of pre-trained models in natural language processing, machine reading comprehension no longer relies simply on the combination of network structure and word embeddings. This paper briefly introduces the concepts of machine reading comprehension and pre-trained language models, summarizes research progress on machine reading comprehension based on the ALBERT model, and analyzes the performance of current pre-trained models on the relevant datasets.

Requirements

  • python == 3.7
  • pytorch == 1.0.1
  • cuda version == 10.1

Dataset

  • SQuAD

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

  • MRPC

    A text file containing 5800 pairs of sentences which have been extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship. No more than 1 sentence has been extracted from any given news article. We have made a concerted effort to correctly associate with each sentence information about its provenance and any associated information about its author.
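
Because a SQuAD answer is a span of text, predictions are commonly scored by exact match and by token-level F1 against the reference answer. The sketch below shows both in simplified form. It only lowercases and strips punctuation, whereas the official evaluation script applies further normalization (such as removing articles), so treat the function names and behaviour here as an illustration rather than the official metric.

```python
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation, and split on whitespace (simplified)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, truth: str) -> float:
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    pred_tokens = normalize(prediction)
    truth_tokens = normalize(truth)
    if not pred_tokens or not truth_tokens:
        # An empty answer (unanswerable question) only matches an empty answer.
        return float(pred_tokens == truth_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Denver Broncos", "denver broncos."))
print(f1_score("the Denver Broncos", "Denver Broncos"))
print(f1_score("", ""))
```

The empty-string case is how a correctly predicted unanswerable question gets full credit in this sketch.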

Structure

[Figure: Structure]

Results

| Model | Parameters | SQuAD1.1 | SQuAD2.0 |
| --- | --- | --- | --- |
| ALBERT base | 12M | 89.3/82.1 | 79.1/76.1 |
| ALBERT large | 18M | 90.9/84.1 | 82.1/79.0 |
| ALBERT xlarge | 59M | 93.0/86.5 | 85.9/83.1 |
| ALBERT xxlarge | 233M | 94.1/88.3 | 88.1/85.1 |
