ThirdAILabs/Demos

Demos

Interactive notebooks for exploring the ThirdAI library.

[Website] · [Report Issues] · [Careers]

👋 Welcome

All of ThirdAI's technology is powered by its BOLT library. BOLT is a deep-learning framework that leverages sparsity to train and deploy very large-scale deep learning models on any CPU. This demos repo will help you get familiar with our products, NeuralDB and Universal Deep Transformer (UDT), through interactive notebooks.

🧠 NeuralDB (for RAG and Search)

NeuralDB is an efficient, private, teachable, CPU-only text retrieval engine. You can insert your PDFs, DOCXs, and CSVs (and even parsed URLs) into a NeuralDB and run semantic search and Q&A over them. Read our three-part blog on why you need NeuralDB here. Building on over a decade of research in efficient neural network training, NeuralDB is optimized to run on conventional CPUs, so it works on any standard desktop machine. And because it can be trained and used anywhere, NeuralDB gives you air-gapped privacy: your data never leaves your local machine.

With the capacity to scale Retrieval Augmented Generation (RAG) capabilities over thousands of pages, NeuralDB revolutionizes the way you interact with your data.

Here is a quick overview of how NeuralDB works:

from thirdai import neural_db as ndb

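db = ndb.NeuralDB()

# `filename` is a placeholder for the path to each of your documents.
db.insert(
    sources=[ndb.PDF(filename), ndb.DOCX(filename), ndb.CSV(filename)],
    train=True,
)

# Search the populated database object (`db`), not the `ndb` module.
results = db.search(
    query="what is the termination period of this contract?",
    top_k=2,
)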

for result in results:
    print(result.text)
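
When a folder mixes file types, each path has to be matched to the right document class before it goes into `sources`. The sketch below shows one way to do that matching using only the standard library. `classify_sources`, `DOC_KINDS`, and the sample file names are hypothetical, chosen for illustration, and the mapping covers just the three document types shown above.

```python
from pathlib import Path

# Extension -> name of the document class used in the overview above.
# Only these three types appear in this README; anything else is skipped.
DOC_KINDS = {".pdf": "PDF", ".docx": "DOCX", ".csv": "CSV"}


def classify_sources(paths):
    """Return (kind, path) pairs for supported files, sorted by path."""
    pairs = []
    for raw in paths:
        kind = DOC_KINDS.get(Path(raw).suffix.lower())
        if kind is not None:
            pairs.append((kind, str(raw)))
    return sorted(pairs, key=lambda pair: pair[1])


# Hypothetical file names, for illustration only.
pairs = classify_sources(["lease.PDF", "notes.txt", "terms.docx", "prices.csv"])
print(pairs)
```

With thirdai installed, each pair could then be turned into a source with something like `getattr(ndb, kind)(path)` and passed to `db.insert`, assuming each class accepts a bare path as in the overview.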

NeuralDB also provides teaching methods for incorporating human feedback into RAG.

# associate a source with a target
db.associate(source="parties involved", target="made by and between")

# associate text with a result
db.text_to_result("made by and between", 0)

See the neural_db folder for more examples and documentation.

🪐 Universal Deep Transformer (for all Transformer and ML needs)

Universal Deep Transformer (UDT) is our consolidated API for performing different ML tasks on a variety of data types. It handles text, numeric, categorical, multi-categorical, graph, and time series data, and it generalizes to tasks such as NLP, multi-class classification, multi-label retrieval, and regression. Just like NeuralDB, UDT is optimized for conventional CPUs and runs on any standard desktop machine.

Here is an example of the UDT API used for multi-label tabular classification:

from thirdai import bolt

model = bolt.UniversalDeepTransformer(
    data_types={
        "title": bolt.types.text(),
        "category": bolt.types.categorical(),
        "number": bolt.types.numerical(range=(0, 100)),
        "label": bolt.types.categorical(delimiter=":")
    },
    target="label",
    n_target_classes=2,
    delimiter='\t',
)

model.train("filename.csv", epochs=5, learning_rate=0.001, metrics=["precision@1"])

model.predict({"title": "Red shoes", "category": "XL", "number": "12.6"})
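
The `delimiter='\t'` argument and the `delimiter=":"` on the label column suggest a particular file layout: columns separated by tabs, with multiple labels in one cell separated by colons. Below is a minimal sketch that writes such a file and reads it back with Python's `csv` module. The rows and the file name `train_sample.csv` are made up for illustration, and the idea that header names must match the `data_types` keys is an assumption, not something this README states.

```python
import csv
import os
import tempfile

COLUMNS = ["title", "category", "number", "label"]

# Made-up rows; "label" may hold several class ids joined by ":".
ROWS = [
    {"title": "Red shoes", "category": "XL", "number": "12.6", "label": "0:1"},
    {"title": "Blue hat", "category": "S", "number": "40", "label": "1"},
]

path = os.path.join(tempfile.mkdtemp(), "train_sample.csv")

# Write a tab-separated file with a header row.
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t")
    writer.writeheader()
    writer.writerows(ROWS)

# Read it back and split each label cell on ":" to recover the label list.
with open(path, newline="") as f:
    parsed = [
        {**row, "label": row["label"].split(":")}
        for row in csv.DictReader(f, delimiter="\t")
    ]

print(parsed[0]["label"])
```

Reading the file back this way is a cheap check that both delimiters are consistent before a training run is started.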

See the universal_deep_transformer folder for more examples and documentation.

📄 License

Many notebooks come with an API key that will only work on the dataset in the demo. If you want to try out ThirdAI on your own dataset, simply register for a free license here.

To use your license, activate it before constructing your NeuralDB or UDT models:

from thirdai import licensing

licensing.activate("")  # insert your valid license key here

# create NeuralDB or UDT ...
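
Hard-coding a key in a notebook makes it easy to leak. One common alternative, sketched here, is to read the key from an environment variable and fail early if it is missing. The variable name `THIRDAI_KEY` and the helper `load_license_key` are hypothetical choices for this example. Only `licensing.activate` comes from the snippet above, and it is left commented out so the sketch runs without the library.

```python
import os


def load_license_key(env=os.environ, var="THIRDAI_KEY"):
    """Return the stripped key from `env`, or raise if it is missing or blank."""
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var} environment variable to your license key")
    return key


# Demonstration with a fake environment mapping and a fake key.
key = load_license_key({"THIRDAI_KEY": "  demo-key-123  "})
print(key)

# With thirdai installed, the key would then be passed along:
# from thirdai import licensing
# licensing.activate(key)
```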

Please refer to LICENSE.txt for more information on usage terms.

🎙 Contact

ThirdAILabs - @ThirdAILab - [email protected]