LangChain Document Embedding API

This is a Flask API for extracting text from PDFs, PowerPoint presentations, Excel spreadsheets, YouTube channels, and more.

Prerequisites

Docker
Docker Compose

Usage

Clone the repository and navigate to the root directory.
Run the following command to start the API:

docker-compose up

This will start the API on http://localhost:5000.
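The container can take a few seconds to begin accepting connections. As a minimal sketch, assuming only that the API listens on the host and port given above, a client script could wait for the port before sending requests. The helper name wait_for_port and its parameters are hypothetical and not part of this repository.

```python
import socket
import time


def wait_for_port(host="localhost", port=5000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that something accepts connections
    on the port, not that the API itself is healthy.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port() after docker-compose up, and before the first request, avoids connection errors during startup.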

To extract text from a PDF, make a POST request to http://localhost:5000/getTextForPDF with the following JSON data:

{
  "url": "https://example.com/path/to/pdf"
}

This will return the extracted text from the PDF as a JSON response.
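The request described above could be sent from Python roughly as follows. This is a sketch using only the standard library. The endpoint path and the "url" field come from this README, while the function names, the timeout value, and the assumption that the response body is valid JSON are illustrative. The exact shape of the response is not documented here, so the decoded JSON is returned as is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default address from this README


def build_pdf_request(pdf_url, base_url=BASE_URL):
    """Build (but do not send) the POST request for /getTextForPDF."""
    body = json.dumps({"url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/getTextForPDF",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_text_for_pdf(pdf_url, base_url=BASE_URL, timeout=120):
    """Send the request and return the decoded JSON response.

    The timeout is an assumed value; large PDFs may need longer.
    """
    request = build_pdf_request(pdf_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, get_text_for_pdf("https://example.com/path/to/pdf") would return whatever JSON the service produces for that document.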

To extract text from an HTML page, make a POST request to http://localhost:5000/getTextForURL with the following JSON data:

{
  "url": "https://example.com"
}

This will return the extracted text and title from the HTML page as a JSON response.
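Since the response carries both a title and the page text, a client will usually want to separate the two. The sketch below assumes the JSON object uses the keys "title" and "text". Those key names are a guess, not something this README specifies, so check an actual response and adjust them. The function name parse_page_response is also hypothetical.

```python
import json


def parse_page_response(raw):
    """Split a /getTextForURL response body into (title, text).

    ASSUMPTION: the service returns a JSON object with "title" and "text"
    keys. Missing keys yield None rather than raising.
    """
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object in the response body")
    return payload.get("title"), payload.get("text")
```

Returning None for a missing key makes it easy to detect when the real field names differ from the assumed ones.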

Configuration

The following environment variables can be set to configure the API:

PORT: The port on which the API should listen. Default is 5000.
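To illustrate how a PORT variable with a default of 5000 is typically resolved, here is a small sketch. It is not taken from this repository's main.py. The function name get_port and the range check are assumptions added for illustration.

```python
import os

DEFAULT_PORT = 5000  # default stated in this README


def get_port(env=None):
    """Return the port from the PORT variable, or the default if unset or empty.

    Hypothetical helper; pass a dict as env to test without touching os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get("PORT", "")
    if not raw.strip():
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

For example, get_port({"PORT": "8080"}) yields 8080, while an empty environment falls back to 5000.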

Dependencies

The API uses the following dependencies:

Flask==2.2.2
Flask-Cors==3.0.10
gunicorn==20.1.0
selenium==4.7.2
PyPDF2==3.0.1
timeout-decorator==0.5.0
beautifulsoup4==4.11.2

jacobsomer/LangChain-Document-Embeddings-Service
