The Docling Loader is not compatible with `GenericLoader`.

The typical requirements for RAG projects are generally as follows:

- Import files into a vector database
- From a directory structure
- Be able to update the files
- Without re-importing everything
- Oh, and don't forget to remove files that are no longer present from the vector database
- Since the PDF format isn’t great, we also have some files in Word format
- It’s not just 10 sample documents, but 50,000 with 20 pages each, evolving daily
- The files are, of course, stored in cloud storage

In my opinion, the best approach to handle this using LangChain is with code similar to this:

```python
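from langchain.indexes import index
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
from langchain_community.document_loaders.generic import GenericLoader

vector_store = ...    # any LangChain vector store
record_manager = ...  # e.g. a SQLRecordManager
loader = GenericLoader(
    blob_loader=FileSystemBlobLoader(  # Or CloudBlobLoader
        path="mydata/",
        glob="**/*",
        show_progress=True,
    ),
    # DoclingParser: the blob parser this issue proposes
    blob_parser=DoclingParser(),
)
index(
    loader.lazy_load(),
    record_manager,
    vector_store,
    batch_size=100,
)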
```

Change `FileSystemBlobLoader` to `CloudBlobLoader`, and you can manage complex scenarios in just a few lines.


To be compatible with `GenericLoader`, and to allow, for example, files to be loaded directly from cloud storage (see `CloudBlobLoader`), it would be a good idea to split the Docling code into a `Loader` and a `Parser`.

That split makes it possible to write in 20 lines what usually takes 2000.
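To make the idea concrete, here is a minimal, dependency-free sketch of the split. It does not use the real LangChain or Docling classes: `Blob`, `Document`, `DirectoryBlobLoader`, `PlainTextParser` and `ComposedLoader` are all hypothetical stand-ins, and only the method names (`yield_blobs`, `lazy_parse`, `lazy_load`) are chosen to mirror the LangChain blob loader and parser pattern. A real `DoclingParser` would replace the plain-text parser.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator


@dataclass
class Blob:
    """Stand-in for a raw file: its bytes plus where they came from."""
    data: bytes
    source: str


@dataclass
class Document:
    """Stand-in for a parsed document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class DirectoryBlobLoader:
    """Storage side: knows where the files live, not how to read them."""

    def __init__(self, path: str, glob: str = "**/*") -> None:
        self.path = Path(path)
        self.glob = glob

    def yield_blobs(self) -> Iterator[Blob]:
        # Sorted so the output order is deterministic.
        for p in sorted(self.path.glob(self.glob)):
            if p.is_file():
                yield Blob(data=p.read_bytes(), source=str(p))


class PlainTextParser:
    """Format side: hypothetical placeholder for a Docling-backed parser."""

    def lazy_parse(self, blob: Blob) -> Iterator[Document]:
        yield Document(
            page_content=blob.data.decode("utf-8", errors="replace"),
            metadata={"source": blob.source},
        )


class ComposedLoader:
    """Glue: any blob source can be combined with any parser."""

    def __init__(self, blob_loader, blob_parser) -> None:
        self.blob_loader = blob_loader
        self.blob_parser = blob_parser

    def lazy_load(self) -> Iterator[Document]:
        for blob in self.blob_loader.yield_blobs():
            yield from self.blob_parser.lazy_parse(blob)
```

Because the parser only ever sees a blob, moving from local files to cloud storage means swapping the blob loader and nothing else.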

See this [PR](https://github.com/langchain-ai/langchain/pull/28970) for more information.
