Connect Weaviate
This tutorial shows you how to connect your Weaviate cluster and perform semantic search with PromptQL.
Prerequisites
In this example, we use a Weaviate cluster hosted on Weaviate Cloud.
- Create a cluster on Weaviate
- You will need the Weaviate REST API URL and API Key, both available on Weaviate Cloud.
- You’ll also need a Cohere API key for the embeddings.
Clone the project
git clone [email protected]:hasura/weaviate-promptql-quickstart.git
cd weaviate-promptql-quickstart
Add your API keys for Weaviate and Cohere
We’ll be using the Cohere API key to create embeddings for our unstructured data via the Weaviate client. We also need the Weaviate Cloud URL and the Weaviate API key; both can be obtained from Weaviate Cloud.
In your project directory, add the keys to your .env file. First, copy the env.sample file to .env:
cp env.sample .env
Now, update the API keys:
APP_PYTHON_COHERE_API_KEY='....'
APP_PYTHON_WCD_URL='....'
APP_PYTHON_WCD_API_KEY='....'
Note: Feel free to modify the embedding configuration in the Weaviate client to use something other than Cohere, and configure the corresponding API key accordingly.
Create a collection and load data
After configuring the .env values with the above credentials, continue with loading the data into Weaviate. Head to app/connector/python and execute:
python3 load-data.py
This will create a Movie collection and load sample movie data with vector embeddings generated using Cohere.
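The repository contains the actual script; as a rough, hypothetical sketch of what load-data.py does (the property names and the sample object below are assumptions), it creates the collection with a Cohere vectorizer and imports objects so Weaviate generates embeddings on insert:
# Illustrative sketch only -- see load-data.py in the repository for the real script
import os
import weaviate
from weaviate.classes.init import Auth
import weaviate.classes.config as wc

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WCD_URL"],
    auth_credentials=Auth.api_key(os.environ["WCD_API_KEY"]),
    headers={"X-Cohere-Api-Key": os.environ["COHERE_API_KEY"]},
)

# Create the Movie collection with a Cohere vectorizer; swap this
# (e.g. wc.Configure.Vectorizer.text2vec_openai()) to use another embedding provider
client.collections.create(
    name="Movie",
    properties=[
        wc.Property(name="title", data_type=wc.DataType.TEXT),
        wc.Property(name="overview", data_type=wc.DataType.TEXT),
        wc.Property(name="release_date", data_type=wc.DataType.DATE),
    ],
    vectorizer_config=wc.Configure.Vectorizer.text2vec_cohere(),
)

# Import sample movies; Weaviate calls Cohere to generate a vector for each object
movies = client.collections.get("Movie")
with movies.batch.dynamic() as batch:
    batch.add_object(properties={
        "title": "Blade Runner",
        "overview": "A blade runner must pursue and terminate four replicants.",
        "release_date": "1982-06-25T00:00:00Z",
    })

client.close()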
Write custom functions
In app/connector/python/functions.py, add additional functions based on your requirements. Two functions are already provided, semantic_search and keyword_search; both are exposed as functions to PromptQL.
"""
functions.py
This is an example of how you can use the Python SDK's built-in Function connector to easily write Python code.
When you add a Python Lambda connector to your Hasura project, this file is generated for you!
In this file you'll find code examples that will help you get up to speed with the usage of the Hasura lambda connector.
If you are an old pro and already know what is going on you can get rid of these example functions and start writing your own code.
"""
import os
import weaviate
from weaviate.classes.init import Auth
import weaviate.classes.config as wc
import weaviate.classes.query as wq
from hasura_ndc import start
from hasura_ndc.function_connector import FunctionConnector
from pydantic import BaseModel, Field # You only need this import if you plan to have complex inputs/outputs, which function similar to how frameworks like FastAPI do
from hasura_ndc.errors import UnprocessableContent
from typing import Annotated
# Weaviate Environment Variables
wcd_url = os.environ["WCD_URL"]
wcd_api_key = os.environ["WCD_API_KEY"]
cohere_api_key = os.environ["COHERE_API_KEY"]
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,  # Your Weaviate Cloud URL (from WCD_URL)
    auth_credentials=Auth.api_key(wcd_api_key),  # Your Weaviate Cloud API key
    headers={"X-Cohere-Api-Key": cohere_api_key},  # Cohere key for embedding generation
)
connector = FunctionConnector()
class Movies(BaseModel):
    title: str
# Semantic search to find similar entities in Weaviate
@connector.register_query  # This is how you register a query
def semantic_search(query: str) -> list[Movies]:
    # Perform a near-text (vector) search with the Weaviate Python client
    # Get the collection
    movies = client.collections.get("Movie")
    # Perform the query
    response = movies.query.near_text(
        query=query, limit=5, return_metadata=wq.MetadataQuery(distance=True)
    )
    movies_list = []
    # Inspect the response
    for o in response.objects:
        print(o.properties["title"])  # Print the title (note the release date is a datetime object)
        movies_list.append(Movies(title=o.properties["title"]))
        print(
            f"Distance to query: {o.metadata.distance:.3f}\n"
        )  # Print the distance of the object from the query
    return movies_list
# This is an example of a keyword search function which uses the BM25 algorithm to find the most relevant Movie entities in Weaviate
@connector.register_query  # This is how you register a query
def keyword_search(query: str) -> list[Movies]:
    # Get the collection
    movies = client.collections.get("Movie")
    # Perform the BM25 query
    response = movies.query.bm25(
        query=query, limit=5, return_metadata=wq.MetadataQuery(score=True)
    )
    movies_list = []
    # Inspect the response
    for o in response.objects:
        print(o.properties["title"])  # Print the title
        movies_list.append(Movies(title=o.properties["title"]))
        print(
            f"BM25 score: {o.metadata.score:.3f}\n"
        )  # Print the BM25 score of the object for the query
    return movies_list
if __name__ == "__main__":
    start(connector)
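To add a function of your own, follow the same pattern. As a purely hypothetical example (hybrid_search is not part of the repository), you could register a hybrid query that blends vector similarity with BM25 keyword relevance:
# Hypothetical example: hybrid search combining vector and BM25 relevance
@connector.register_query
def hybrid_search(query: str) -> list[Movies]:
    # Get the collection
    movies = client.collections.get("Movie")
    # alpha balances vector relevance (1.0) against keyword relevance (0.0)
    response = movies.query.hybrid(
        query=query, alpha=0.5, limit=5, return_metadata=wq.MetadataQuery(score=True)
    )
    return [Movies(title=o.properties["title"]) for o in response.objects]
After adding or changing functions, re-introspect the connector and rebuild (see Iterating on the Python Code and Logic below) so PromptQL picks them up.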
Build your supergraph
Create your supergraph build locally:
ddn supergraph build local
Start your supergraph locally
ddn run docker-start
Head to your local DDN console
Run the following from your project’s directory:
ddn console --local
Talk to your data
Now, you can ask semantic questions about your data:
> Movies with dystopian future
> Historical movies
Iterating on the Python Code and Logic
After you modify the Python code for any semantic or keyword search logic, update your project by following the steps below:
Introspect your connector
Make sure Docker is running, then execute:
ddn connector introspect python
Tip: Run in debug mode with the --log-level DEBUG flag to understand what is going on during introspection.
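For example:
ddn connector introspect python --log-level DEBUG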
Add resources
Add the functions defined in Python so that they are available to PromptQL:
ddn command add '*'
Build
ddn supergraph build local
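Once the build completes, start your supergraph and open the console again to try out the updated logic:
ddn run docker-start
ddn console --local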