IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed #868

Open
ShuhaoZhangTony opened this issue Apr 6, 2024 · 0 comments
Labels
bug Something isn't working

Describe the bug

I'm trying to use haystack's API to build a RAG pipeline. I'm using FAISSDocumentStore and EmbeddingRetriever.

My code looks like the following:

import os

from haystack import Document

# Create the document store using the factory
document_store = create_document_store(store_type, **store_config)

documents = []
documents_dir = args.docs_path
for filename in os.listdir(documents_dir):
    file_path = os.path.join(documents_dir, filename)
    if os.path.isfile(file_path):
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
            document = Document(content=content)
            documents.append(document)
document_store.write_documents(documents)

# Ensure the retriever is initialized before updating embeddings
retriever = RetrieverFactory.get_retriever(retriever_type=args.retriever_type,
                                           document_store=document_store,
                                           query_embedding_model=args.query_embedding_model,
                                           passage_embedding_model=args.passage_embedding_model
                                           )

# Update embeddings right after writing documents. The hasattr check ensures this
# block only runs if the document_store instance has the update_embeddings method.
if hasattr(document_store, 'update_embeddings'):
    document_store.update_embeddings(retriever=retriever, batch_size=10)

Error message

haystack/modeling/model/language_model.py", line 222, in _pool_tokens
ignore_mask_3d[:, :, :] = ignore_mask_2d[:, :, np.newaxis]
~~~~~~~~~~~~~~^^^^^^^^^
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
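
One observation on the failing line, as a minimal NumPy-only sketch rather than Haystack's actual code: `np.newaxis` does not count as an index into an existing axis, so `ignore_mask_2d[:, :, np.newaxis]` is valid for a 2-D array. That suggests the array that is really 2-dimensional is the assignment target `ignore_mask_3d`. The names `build_ignore_mask`, `token_vecs`, and `padding_mask` below are assumptions made for illustration. Only `ignore_mask_2d` and `ignore_mask_3d` appear in the traceback, and the sketch assumes the target mask takes its shape from the token embeddings.

```python
import numpy as np


def build_ignore_mask(token_vecs, padding_mask):
    """Hypothetical stand-in for the masking step shown in the traceback."""
    # True wherever a position is padding and should be ignored.
    ignore_mask_2d = padding_mask == 0
    # Assumption: the target mask copies the shape of the token embeddings.
    ignore_mask_3d = np.zeros(token_vecs.shape, dtype=bool)
    # Same statement as in the traceback. The right-hand side is fine for a
    # 2-D mask; the left-hand side needs a 3-D target.
    ignore_mask_3d[:, :, :] = ignore_mask_2d[:, :, np.newaxis]
    return ignore_mask_3d


padding_mask = np.array([[1, 1, 0], [1, 0, 0]])  # (batch=2, seq_len=3)

# Token-level embeddings (batch, seq_len, hidden): the assignment broadcasts.
ok = build_ignore_mask(np.ones((2, 3, 4)), padding_mask)
print(ok.shape)

# Already-pooled embeddings (batch, hidden): the target is only 2-D, so
# indexing it with three slices raises IndexError.
try:
    build_ignore_mask(np.ones((2, 4)), padding_mask)
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```

If this sketch matches what happens inside `_pool_tokens`, the embedding model is returning one vector per sequence instead of one per token. Printing the shape of the model output just before the failing line would confirm or rule that out.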

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here, like type of downstream task, part of etc..

To Reproduce
Steps to reproduce the behavior

System:

  • OS: Ubuntu 18.04
  • GPU/CPU:
  • FARM version:
ShuhaoZhangTony added the bug label on Apr 6, 2024