Who wrote this

Python package to distinguish text data into human vs AI generated. This package provides a straightforward interface for verifying the authenticity of text, AI or Human generated.

Features

TextPreprocessing - Preprocess text data and convert it into tokens.
TextEmbedding - Convert Text data into word embeddings using LLM models.
Classifier - Predict whether the text data is Human or AI generated.
EvaluateModel - Evaluates the model on 10,000 samples
EvaluateClassifier - Evaluates the ensemble models
UserApp - Runs the Streamlit app on localhost for UI of WhoWroteThis

Usage

Install requirements:

pip install -r requirements.txt

Text Preprocessing

To preprocess text data for classification

from WhoWroteThis import TextPreprocessing

processor = TextPreprocessing(text, file_given=False)
preprocessed_text = processor.preprocess()
print(preprocessed_text)

Generating Embeddings

Generate embeddings using LLM model

from WhoWroteThis import TextEmbedding

embeddings = TextEmbedding('text.txt', model='gpt-2').get_embeddings()

Predict

Predict the text as human or AI-generated

from WhoWroteThis import Classifier

prediction = Classifier.predict(embeddings)
print(prediction)

Evaluate Model

Evaluation on dataset


# Load data
data = pd.read_csv(
f'{os.getcwd()}\\whowrotethis\\data\\10k_raw_unseen.csv')
x_test = data.loc[:, '0' : '767']
y_test = data['label']

# Get predictions
model = EnsembledModel(x_test)
y_pred_1 = model.simple_predict()
y_pred_2 = model.weighted_predict()

# Prepare figure
fig, axs = plt.subplots(1, 2, figsize=(15, 5), tight_layout=True)
count = 0
fig.suptitle("Number of wrong predictions using the ensembled model")

# Evaluate models
print("Weighted Ensemble Model:")
evaluate(y_test, y_pred_2, axs, count, "Weighted Ensembled Model")
count += 1
print("-" * 30)

print("Unweighted Ensemble Model:")
evaluate(y_test, y_pred_1, axs, count, "Unweighted Ensembled Model")

plt.show()

Interface for text detection

Web user interface for the user to input text and display prediction

from whowrotethis import UserApp
UserApp().run_app()

Name		Name	Last commit message	Last commit date
Latest commit ritikakumar0204 Add files via upload May 6, 2024 a01126a · May 6, 2024 History 88 Commits
.idea		.idea
whowrotethis		whowrotethis
.gitignore		.gitignore
README.md		README.md
Report.pdf		Report.pdf
error_logger.py		error_logger.py
main.py		main.py
requirements.txt		requirements.txt
run_test.py		run_test.py
setup.py		setup.py
text.txt		text.txt
user_input.txt		user_input.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Who wrote this

Features

Usage

About

Releases

Packages

Contributors 3

Languages

ritikakumar0204/DS5010-Final-Project

Folders and files

Latest commit

History

Repository files navigation

Who wrote this

Features

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages