Add MS SQL Server as a DB Provider for RAG backend #66


Merged
merged 1 commit into from
Jun 10, 2025

Conversation

dminnear-rh
Contributor

@sauagarwa There are a lot of changes here to clean up this pattern, in addition to adding the subchart for deploying SQL Server for local deploys and Azure SQL Server as a RAG DB provider.

I'll summarize the key changes:

  • Added an Ansible playbook and updated the Makefile to allow creating GPU machinesets on Azure.
  • Removed unused code (the minio chart, additional cluster groups).
  • Added charts for Azure SQL Server and local SQL Server, and updated the secrets to accommodate them.
  • Updated the number of Elasticsearch nodes, since the cluster would report as unhealthy without an additional node for the document embeddings.
  • Added Azure-specific overrides to allow installing on GPUs with only 16 GB of VRAM, which are better supported by the NVIDIA operator.
  • Enabled the use of the TGI server chart. We can now have one model deployed by the inference service on vLLM plus an additional model deployed by the TGI server, so the default AWS installation uses two models by default.
  • Models are now configured in only one place (values-global.yaml); the values global.model.vllm and global.model.tgis set the model used by the vLLM inference server and the TGI server, respectively.
  • Updated the RAG DB embedding image to the image built from https://github.com/validatedpatterns-sandbox/vector-embedder, which allows more flexibility in configuring the env, better logging, easier configuration of sources from repos or URLs via the values file, and updated versions of langchain and the DB providers.
  • Renamed the llm-serving-service chart to vllm-inference-service, to further distinguish that we're using vLLM for that chart versus the HF TGI server in the tgis-server chart.
  • Loosened the versions of the operators installed to better support more versions of OpenShift.
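As a rough sketch of the single-place model configuration described above (the exact layout of values-global.yaml may differ; the tgis and embedding values are taken from the snippet quoted later in this thread, while the vllm default shown here is only illustrative):

```yaml
# values-global.yaml (sketch -- surrounding keys are illustrative)
global:
  model:
    # Model served by the vLLM inference service (illustrative value)
    vllm: mistralai/Mistral-7B-Instruct-v0.2
    # Model served by the HF TGI server
    tgis: mistralai/Mistral-7B-Instruct-v0.3
    # Embedding model used by the RAG DB
    embedding: sentence-transformers/all-mpnet-base-v2
```

Both serving charts would then read their model name from these shared values rather than defining their own.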

With these changes, we are able to install the chart on ROSA 4.18 as well as the ARO 4.14 provided by our demo platform and everything comes up healthy and synced regardless of your RAG DB provider. This ensures CI will begin passing again (as it checks for out-of-sync and unhealthy applications in ArgoCD).

Please let me know if there's anything you want me to change, move into a separate PR, etc. and I'm happy to do so.

One other important thing: right now this PR is using quay.io/dminnear/gradio-tgi-multi-model-rag because the latest changes to the UI haven't been built and pushed to https://quay.io/repository/ecosystem-appeng/rag-llm-ui?tab=info. If we can get that image updated with the latest https://github.com/RHEcosystemAppEng/llm-on-openshift/tree/main/examples/ui/gradio/gradio-tgi-multi-model-rag-redis, I can revert back to the proper image.

@dminnear-rh dminnear-rh requested a review from sauagarwa June 3, 2025 16:32
@dminnear-rh dminnear-rh force-pushed the use-mssql-db branch 3 times, most recently from 21f1d99 to 212da39 Compare June 9, 2025 18:02
tgis: mistralai/Mistral-7B-Instruct-v0.3
# Embedding model used by the RAG DB
embedding: sentence-transformers/all-mpnet-base-v2
tgisServer:
Collaborator


Are we deploying tgisServer?

@dminnear-rh dminnear-rh force-pushed the use-mssql-db branch 5 times, most recently from dba08d7 to 2e9215c Compare June 10, 2025 01:58
@sauagarwa sauagarwa merged commit ab18e90 into validatedpatterns:main Jun 10, 2025
1 check passed