HURIDOCS

36 repositories

uwazi
Public
Uwazi is a web-based, open-source solution for building and sharing document collections
open-source pdf data-science database ai documents non-profit
TypeScript
•
MIT License
•83•275•442•5•Updated Aug 7, 2025
pdf_metadata_extraction
Public
pdf_information_extraction
Python
•1•5•0•8•Updated Aug 7, 2025
trainable-entity-extractor
Public
Trainable Entity Extractor
Python
•
Apache License 2.0
•0•1•0•7•Updated Aug 7, 2025
pdf-document-layout-analysis
Public
A Docker-powered service for PDF document layout analysis. It segments and classifies the different parts of PDF pages, identifying elements such as text, titles, pictures, and tables.
Python
•
Apache License 2.0
•79•642•3•7•Updated Aug 7, 2025
ml-cloud-connector
Public
ml-cloud-connector
Python
•
Apache License 2.0
•0•0•0•0•Updated Aug 7, 2025
pdf-features
Public
pdf-features
Python
•0•0•0•0•Updated Aug 4, 2025
dummy_extractor_services
Public
Python
•0•0•0•0•Updated Jul 31, 2025
ML-Benchmarks
Public
Repository to store all the ML benchmarks
0•0•0•0•Updated Jul 29, 2025
NER-in-docker
Public
NER-in-docker
Python
•0•1•0•7•Updated Jul 15, 2025
uwazi-documentation
Public
HTML
•
MIT License
•3•2•6•0•Updated Jun 24, 2025
pdf-document-layout-analysis-async
Public
pdf-document-layout-analysis-async
Python
•0•1•0•5•Updated May 26, 2025
docker-translation-service
Public
docker-translation-service
Python
•
Apache License 2.0
•0•0•0•6•Updated May 2, 2025
queue-processor
Public
queue-processor
Python
•
Apache License 2.0
•0•0•0•0•Updated Mar 24, 2025
pdf-labeled-data
Public
TypeScript
•
Apache License 2.0
•1•3•0•0•Updated Mar 18, 2025
rison
Public
JavaScript
•
Apache License 2.0
•4•0•0•0•Updated Mar 11, 2025
pdf-text-extraction
Public
This project extracts text from PDF files using the output of the pdf-document-layout-analysis service, relying on that service's segmentation and classification to automate the extraction.
Makefile
•
Apache License 2.0
•4•34•2•0•Updated Feb 3, 2025
pdf-table-of-contents-extractor
Public
This project extracts Table of Contents (TOC) information from PDF files using the output of the pdf-document-layout-analysis service, relying on that service's segmentation and classification to identify and structure the document's TOC automatically.
Makefile
•
Apache License 2.0
•5•17•2•0•Updated Feb 3, 2025
preserve
Public
Preserve is a tool for capturing and saving online digital content. Integrated with Uwazi, Preserve captures content from websites, social media, and communication platforms, and archives it with accompanying key metadata to ensure evidentiary value by establishing and demonstrating authenticity and chain of custody.
TypeScript
•
MIT License
•1•6•12•7•Updated Jan 22, 2025
pdf_ocr_service
Public
An HTTP service that OCRs PDFs, based on a Redis queue.
Python
•
MIT License
•0•1•3•0•Updated Dec 13, 2024
react-text-selection-handler
Public
Text selection handling and highlighting
TypeScript
•
Apache License 2.0
•0•0•6•0•Updated Nov 14, 2024
convert-to-pdf-service
Public
An HTTP service that converts documents to PDF, based on a Redis queue.
Python
•
MIT License
•0•0•3•7•Updated Sep 19, 2024
pdf-tokens-type-labeler
Public
Python
•3•3•1•6•Updated Jul 4, 2024
pdf_paragraphs_extraction
Public
Python
•
MIT License
•6•49•1•4•Updated Jul 4, 2024
pdf-reading-order
Public
Python
•2•13•0•0•Updated Apr 26, 2024
uwazi-design
Public
0•4•6•0•Updated Jul 3, 2023
topic-classification
Public
Python
•
MIT License
•4•5•10•4•Updated May 25, 2023
twitter_crawler
Public
Twitter crawler
Python
•0•1•0•1•Updated Apr 3, 2023
semantic-search
Public
Python
•4•3•1•3•Updated Dec 27, 2022
mock-semantic-ml-server
Public
Mock server that simulates the ML server that processes documents for semantic search
JavaScript
•0•0•0•1•Updated Dec 10, 2022
classification-utils
Public
Python
•2•0•0•3•Updated Nov 21, 2022