How to utilize AI hallucination with RAG

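AI hallucination seems unavoidable, but RAG (Retrieval-Augmented Generation) can put it to use. A hallucinated answer is often semantically close to the real text it imitates, so it can serve as a search query: embed the model's draft, find the nearest entry in a vector database, and return that stored entry instead of the draft. The example below asks ChatGPT for a legal provision and then retrieves the matching real provision from a Zilliz Cloud collection.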
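To make the idea concrete before any API calls, here is a minimal offline sketch. It assumes a made-up three-article corpus and a crude bag-of-words vector in place of a real embedding model; `corpus`, `toy_embed`, `cosine`, and `draft` are all hypothetical names used only for illustration. It shows a fabricated draft still landing nearest to the real article on the same topic.

```python
import numpy as np

# Hypothetical mini-corpus standing in for the real collection of law articles.
corpus = {
    1: "A tenant must pay rent on the date agreed in the lease contract.",
    2: "An employer shall pay wages in full and on time every month.",
    3: "A driver must not operate a vehicle under the influence of alcohol.",
}

def tokenize(text):
    """Lowercase the text, drop basic punctuation, and split on whitespace."""
    return text.lower().replace(".", "").replace(",", "").split()

def toy_embed(text, vocab):
    """Bag-of-words count vector: a crude stand-in for a real embedding model."""
    words = tokenize(text)
    return np.array([words.count(w) for w in vocab], dtype=float)

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A made-up "hallucinated" draft: the wording is invented, but the topic is right.
draft = "An employer is obliged to pay wages to workers every month without delay."

# Shared vocabulary over the corpus and the draft.
vocab = sorted({w for t in list(corpus.values()) + [draft] for w in tokenize(t)})

draft_vec = toy_embed(draft, vocab)
scores = {k: cosine(draft_vec, toy_embed(v, vocab)) for k, v in corpus.items()}
best_id = max(scores, key=scores.get)

# The stored article is returned, not the draft.
print(corpus[best_id])
```

In the real pipeline below, the OpenAI embedding model replaces `toy_embed` and the Zilliz Cloud collection replaces the in-memory dictionary.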

Components: OpenAI API, Zilliz Cloud Collection

1. Setup

from openai import OpenAI
from pymilvus import MilvusClient
import numpy as np
import re

2. ChatGPT

# Configuration for OpenAI
openai_client = OpenAI(
    api_key="your_api_key" # Replace with your OpenAI API key
)

# Call the model to generate your target content
completion = openai_client.chat.completions.create(
    model="gpt-4o", # Pick a model
    messages=[
        {"role": "system", "content": "your_system_content"}, # Set the model's role and behavior
        {"role": "user", "content": "your_user_content"} # Ask your questions
    ]
)
raw_content = completion.choices[0].message.content
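The two placeholder strings decide how useful the draft is as a search query. As a rough sketch, the messages could be built like this; the prompt wording and the `build_messages` helper are hypothetical, not part of the OpenAI API.

```python
def build_messages(question):
    """Build chat messages that ask for one law article (hypothetical prompt wording)."""
    system_content = (
        "You are a legal assistant. Answer with the text of the single law "
        "article that best addresses the user's situation, and nothing else."
    )
    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": question},
    ]

# Example question; any description of a legal situation would do.
messages = build_messages("My employer has not paid my wages for two months.")
```

Passing `messages=messages` to `chat.completions.create` in place of the literal list keeps the draft short and article-shaped, which should bring its embedding closer to the stored articles.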

3. Vector Search in Zilliz Cloud Collection

# Embed the raw_content
def get_embedding(text, model="text-embedding-3-small"):
    return openai_client.embeddings.create(input = [text], model=model).data[0].embedding

raw_vector = get_embedding(raw_content)
vector = np.array(raw_vector) # Convert it to a (1536, ) vector

# Configuration for Zilliz
milvus_client = MilvusClient(
    uri="your_url", # Replace with your cluster endpoint
    token="your_token" # Replace with your API key or token
)

# Vector Search in Collection
milvus_search = milvus_client.search(
    collection_name="your_collection", # Your Collection name
    data=[vector],
    limit=1, # The number of results to return
    search_params={"metric_type": "COSINE", "params": {"nprobe": 10}} # Customize your params
)
# search() returns one list of hits per query vector; take the top hit of the first query
top_hit = milvus_search[0][0]

4. Retrieve the target content from Zilliz Cloud Collection

# Read the id from the top hit
id = top_hit["id"]

# Get the content with id in Collection
milvus_get = milvus_client.get(
    collection_name="your_collection",
    ids=[id]
)

# get() returns a list of rows (dicts); read the law_article field from the first row
law_article = milvus_get[0]["law_article"]

print(law_article)
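The lookups above assume the search returns at least one hit and that the row contains a `law_article` field. A small defensive sketch follows, assuming the results are list-like structures of dicts as described above; `first_hit_id`, `field_of_first_row`, and the fake data are hypothetical helpers for illustration, not part of pymilvus.

```python
def first_hit_id(search_results):
    """Return the id of the top hit for the first query, or None if there are no hits."""
    if not search_results or not search_results[0]:
        return None
    return search_results[0][0]["id"]

def field_of_first_row(rows, field):
    """Return `field` from the first row of a get() result, or None if it is missing."""
    if not rows:
        return None
    return rows[0].get(field)

# Hypothetical stand-ins shaped like the client's results (one list of hits per query).
fake_search = [[{"id": 42, "distance": 0.87, "entity": {}}]]
fake_get = [{"id": 42, "law_article": "An employer shall pay wages in full."}]

print(first_hit_id(fake_search))
print(field_of_first_row(fake_get, "law_article"))
print(first_hit_id([[]]))
```

Checking for `None` before calling `get` avoids an exception when the collection is empty or the query matches nothing.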

For more details, see the OpenAI documentation and the Zilliz documentation.

Now it is done!