
Search Product Catalog Images Using Azure Search and OpenAI with Langchain

In the ever-evolving landscape of retail, businesses are continually seeking innovative solutions to streamline their operations and enhance customer experiences. One such breakthrough is the implementation of artificial intelligence (AI) to search product catalog images efficiently. This transformative technology not only simplifies the search process but also empowers businesses to provide personalized and seamless shopping experiences for their customers.

The Need for AI in Product Catalog Image Search: Traditional methods of searching through product catalogs involve manual tagging and categorization, which can be time-consuming and prone to human error. As the volume of products in a catalog grows, managing and searching for specific items becomes a daunting task. AI, particularly computer vision, addresses these challenges by automating the recognition and categorization of products in images.

Key Features of AI-Powered Product Catalog Image Search:

  1. Object Recognition and Tagging: AI algorithms can identify and tag objects within images, providing accurate and consistent categorization of products. This reduces the reliance on manual tagging, ensuring that products are correctly labeled in the catalog.
  2. Visual Similarity Search: AI enables visual similarity search, allowing users to find products based on visual attributes rather than relying solely on text-based queries. This feature is especially valuable for customers who may struggle to describe a product in words but can easily recognize it visually (see the similarity sketch after this list).
  3. Enhanced Product Discovery: By understanding the visual characteristics of products, AI facilitates a more sophisticated recommendation system. Customers can discover related or complementary items, leading to increased cross-selling opportunities and a more engaging shopping experience.
  4. Improved Accuracy and Efficiency: AI-powered image recognition is highly accurate and can process large volumes of images in a fraction of the time it would take a human. This efficiency not only reduces operational costs but also enhances the speed at which customers can find and purchase products.
  5. Integration with E-Commerce Platforms: AI-driven image search can seamlessly integrate with existing e-commerce platforms, making it easy for businesses to adopt this technology without major disruptions. This integration allows for a smoother transition and ensures that the AI-enhanced search becomes an integral part of the overall shopping experience.
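
Under the hood, visual similarity search comes down to comparing embedding vectors: images and text are projected into the same vector space, and "similar" means a small angle between vectors. Here is a minimal sketch of the idea with toy vectors only; the real embeddings produced later in this post are 1024-dimensional.

import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors; 1.0 means same direction."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" for illustration.
query_vec = [0.1, 0.9, 0.0, 0.2]
product_vec = [0.2, 0.8, 0.1, 0.1]
print(cosine_similarity(query_vec, product_vec))  # close to 1.0 -> visually similar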

Now let's implement this with Azure OpenAI.

First, you need to import some libraries:

import base64
import io
import json
import math
import mimetypes
import os
import random
import re
import requests
import sys
import time

import azure.cognitiveservices.speech as speechsdk
import matplotlib.pyplot as plt
import numpy as np
import openai

from azure.cognitiveservices.speech import (
    AudioDataStream,
    SpeechConfig,
    SpeechSynthesizer,
    SpeechSynthesisOutputFormat,
)
from azure.cognitiveservices.speech.audio import AudioOutputConfig
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient, SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
)
from azure.search.documents.models import VectorizedQuery, VectorizableTextQuery
from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions
from datetime import datetime, timedelta
from dotenv import load_dotenv
from io import BytesIO
from IPython.display import Audio
from PIL import Image
from tenacity import (
    Retrying,
    retry_if_exception_type,
    wait_random_exponential,
    stop_after_attempt,
)

Initialize some environment variables for your:

  • Azure OpenAI endpoint
  • Azure Computer Vision (Cognitive Services) endpoint
  • Azure Cognitive Search endpoint
  • Azure Blob Storage connection string

load_dotenv("azure.env")# Azure Open AIopenai_api_type = os.getenv("azure")openai_api_base = os.getenv("AZURE_OPENAI_ENDPOINT")openai_api_version = os.getenv("AZURE_API_VERSION")openai_api_key = os.getenv("AZURE_OPENAI_KEY")# Azure Cognitive Searchacs_endpoint = os.getenv("ACS_ENDPOINT")acs_key = os.getenv("ACS_KEY")# Azure Computer Vision 4acv_key = os.getenv("ACV_KEY")acv_endpoint = os.getenv("ACV_ENDPOINT")blob_connection_string = os.getenv("BLOB_CONNECTION_STRING")container_name = os.getenv("CONTAINER_NAME")# Azure Cognitive Search index name to createindex_name = "azure-fashion-demo"# Azure Cognitive Search api versionapi_version = "2023-02-01-preview"

Now let's create a function that generates a text embedding using the Computer Vision 4.0 Vectorize Text API.

def text_embedding(prompt):
    """
    Text embedding using Azure Computer Vision 4.0
    """
    version = "?api-version=" + api_version + "&modelVersion=latest"
    vec_txt_url = f"{acv_endpoint}/computervision/retrieval:vectorizeText{version}"
    headers = {"Content-type": "application/json", "Ocp-Apim-Subscription-Key": acv_key}
    payload = {"text": prompt}
    response = requests.post(vec_txt_url, json=payload, headers=headers)
    if response.status_code == 200:
        text_emb = response.json().get("vector")
        return text_emb
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None
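
A quick sanity check (assuming the environment variables above are set): the returned vector's length should match the vector_search_dimensions=1024 used in the index we create later.

emb = text_embedding("red floral summer dress")
if emb is not None:
    print(len(emb))  # expected: 1024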

Next, let's create a function that generates an image embedding using the same Vision API.

def image_embedding(image_path):
    url = f"{acv_endpoint}/computervision/retrieval:vectorizeImage"
    # The api-version query parameters were missing in the original snippet
    params = {"api-version": api_version, "modelVersion": "latest"}
    mime_type, _ = mimetypes.guess_type(image_path)
    headers = {
        "Content-Type": mime_type,
        "Ocp-Apim-Subscription-Key": acv_key
    }
    # Retry on transient HTTP errors with exponential backoff
    for attempt in Retrying(
        retry=retry_if_exception_type(requests.HTTPError),
        wait=wait_random_exponential(min=15, max=60),
        stop=stop_after_attempt(15)
    ):
        with attempt:
            with open(image_path, "rb") as image_data:
                response = requests.post(url, params=params, headers=headers, data=image_data)
                if response.status_code != 200:
                    response.raise_for_status()
                vector = response.json()["vector"]
                return vector
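
You can sanity-check the image side the same way; the file name here is purely illustrative:

# Illustrative path -- use any image from your images/ folder.
img_emb = image_embedding(os.path.join("images", "blue_denim_jacket.jpg"))
print(len(img_emb))  # expected: 1024, the same space as the text embeddings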

Next, we need a function that takes a text prompt as input and searches Azure Search for the most relevant images. Here the buy-now link is a dummy link that can be replaced with an actual product URL.

def prompt_search(prompt, topn=5, disp=False):
    """
    Azure Cognitive Search visual search using a prompt
    """
    results_list = []
    # Initialize the Azure Cognitive Search client
    search_client = SearchClient(acs_endpoint, index_name, AzureKeyCredential(acs_key))
    blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)
    container_client = blob_service_client.get_container_client(container_name)
    # Perform vector search
    vector_query = VectorizedQuery(
        vector=text_embedding(prompt),
        k_nearest_neighbors=topn,
        fields="image_vector",
    )
    response = search_client.search(
        search_text=prompt,
        vector_queries=[vector_query],
        select=["description"],
        top=topn,
    )
    for nb, result in enumerate(response, 1):
        blob_name = result["description"] + ".jpg"
        blob_client = container_client.get_blob_client(blob_name)
        # Generate a short-lived SAS URL so the agent can display the image
        sas_token = generate_blob_sas(
            blob_service_client.account_name,
            container_name,
            blob_name,
            account_key=blob_client.credential.account_key,
            permission=BlobSasPermissions(read=True),
            expiry=datetime.utcnow() + timedelta(hours=1),
        )
        sas_url = blob_client.url + "?" + sas_token
        # The description doubles as a stand-in for price here;
        # replace these with real product metadata in production.
        results_list.append({
            "buy_now_link": sas_url,
            "price_of_the_product": result["description"],
            "product_image_url": sas_url,
        })
    return results_list
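
Once the index is populated (we do that next), you can call this function directly to debug it, outside of any agent:

# Direct call, bypassing the agent -- handy for verifying the index end to end.
for hit in prompt_search("white sneakers", topn=3):
    print(hit["product_image_url"])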

Let's ingest some product images into Azure Search. The idea is that a folder called images holds all the product images; we upload every image from that folder into the blob container.

EMBEDDINGS_DIR = "embeddings"
os.makedirs(EMBEDDINGS_DIR, exist_ok=True)

image_directory = os.path.join("images")
embedding_directory = os.path.join("embeddings")
output_json_file = os.path.join(embedding_directory, "output.jsonl")

# Upload every image in the folder to the blob container
blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)
container_client = blob_service_client.get_container_client(container_name)

for root, dirs, files in os.walk(image_directory):
    for file in files:
        local_file_path = os.path.join(root, file)
        blob_name = os.path.relpath(local_file_path, image_directory)
        blob_client = container_client.get_blob_client(blob_name)
        with open(local_file_path, "rb") as data:
            blob_client.upload_blob(data, overwrite=True)

Next, we create embeddings for the product images and store them locally in the embeddings directory. Note that we use only two metadata fields, id and description; you can extend this with many more fields such as price or a buy-now link (see the sketch after the code).

with open(output_json_file, "w") as outfile:
    for idx, image_path in enumerate(os.listdir(image_directory)):
        if image_path:
            try:
                vector = image_embedding(os.path.join(image_directory, image_path))
            except Exception as e:
                print(f"Error processing image at index {idx}: {e}")
                vector = None
            filename, _ = os.path.splitext(os.path.basename(image_path))
            result = {
                "id": f"{idx}",
                "image_vector": vector,
                "description": filename,
            }
            outfile.write(json.dumps(result))
            outfile.write("\n")
            outfile.flush()
print(f"Results are saved to {output_json_file}")
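
As a sketch of how the metadata could be extended (price and buy_now_link are hypothetical fields here; if you add them, remember to add matching fields to the index definition below):

# Hypothetical extended record; the extra fields are illustrative and would
# need corresponding SearchField entries in the index created below.
result = {
    "id": f"{idx}",
    "image_vector": vector,
    "description": filename,
    "price": "1499 INR",  # e.g., looked up from your product database
    "buy_now_link": f"https://example.com/products/{filename}",  # placeholder URL
}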

Now that we have created the local embedding file, we can upload it into Azure Search. But first, let's create an index.

from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SimpleField,
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SearchIndex,
)

credential = AzureKeyCredential(acs_key)

# Create a search index
index_client = SearchIndexClient(endpoint=acs_endpoint, credential=credential)
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchField(name="description", type=SearchFieldDataType.String,
                sortable=True, filterable=True, facetable=True),
    SearchField(
        name="image_vector",
        hidden=True,
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1024,
        vector_search_profile_name="myHnswProfile",
    ),
]

# Configure the vector search configuration
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(name="myHnsw")
    ],
    profiles=[
        VectorSearchProfile(
            name="myHnswProfile",
            algorithm_configuration_name="myHnsw",
        )
    ],
)

# Create the search index with the vector search configuration
index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)
result = index_client.create_or_update_index(index)
print(f"{result.name} created")

Once the index exists, you can upload the locally stored embedding file.

from azure.search.documents import SearchClient
import json

data = []
with open(output_json_file, "r") as file:
    for line in file:
        # Remove leading/trailing whitespace and parse JSON
        json_data = json.loads(line.strip())
        data.append(json_data)

search_client = SearchClient(endpoint=acs_endpoint, index_name=index_name, credential=credential)
results = search_client.upload_documents(data)
for result in results:
    print(f"Indexed {result.key} with status code {result.status_code}")
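
A quick way to verify the upload is to check the document count (newly indexed documents can take a few seconds to become searchable):

# Should match the number of records in output.jsonl.
print(search_client.get_document_count())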

Congratulations, you are finally ready to implement your agent using OpenAI.

Let's create a tool called image search that the agent will use.

from typing import Optional

from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.tools import BaseTool
from util import prompt_search


class ImageSearchResults(BaseTool):
    """Tool that queries the Fashion Image Search API and gets back JSON."""

    name: str = "image_search_results_json"
    description: str = (
        "A wrapper around Image Search. "
        "Useful for when you need to search fashion images related to clothes, shoes, etc. "
        "Input should be a search query. Output is a JSON array of the query results."
    )
    num_results: int = 4

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool."""
        return str(prompt_search(prompt=query, topn=self.num_results))
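
Before wiring the tool into the agent, you can exercise it on its own:

# Standalone test of the tool, outside any agent loop.
tool = ImageSearchResults(num_results=2)
print(tool.run("blue denim jacket"))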

Here we will use LangChain to implement our fashion agent, Luca.

from langchain_core.prompts.chat import (
    BaseMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
    PromptTemplate,
)
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.runnables import Runnable, RunnablePassthrough, RunnableConfig
from langchain_core.utils.function_calling import convert_to_openai_function
from langchain.agents.output_parsers.openai_functions import (
    OpenAIFunctionsAgentOutputParser,
)
from langchain.agents.format_scratchpad.openai_functions import (
    format_to_openai_function_messages,
)
from langchain.agents import AgentExecutor
from langchain_openai import AzureChatOpenAI
from custom_tool import ImageSearchResults
import openai

Let's initialize our LLM.

from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2023-12-01-preview",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    model="gpt-4-turbo",
)

# Quick smoke test of the deployment
llm.invoke([HumanMessage(content="Hi")])

prefix = """You are Luca, a helpful fashion agent who helps people navigate and buy products online.
Note:
- Always show prices in INR.
- Always encourage the user to buy from the buy now link provided."""
suffix = ""

Let's attach the tool we created; here we use LCEL (LangChain Expression Language) to implement our agent.

tools = [ImageSearchResults(num_results=5)]
llm_with_tools = llm.bind(
    functions=[convert_to_openai_function(t) for t in tools]
)

messages = [
    SystemMessage(content=prefix),
    HumanMessagePromptTemplate.from_template("{input}"),
    AIMessage(content=suffix),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
]
input_variables = ["input", "agent_scratchpad"]
prompt = ChatPromptTemplate(input_variables=input_variables, messages=messages)

agent = (
    RunnablePassthrough.assign(
        agent_scratchpad=lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        )
    )
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

# Wrap the agent in an executor so it can actually run the tool calls
# (the original snippet used agent_executor below without defining it)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

Congratulations!! You are ready to test your agent.

response = agent_executor.invoke(
    {
        "input": "I am looking for some summer dress as I am travelling to New Delhi",
        "chat_history": [
            HumanMessage(content="hi! my name is bob"),
            AIMessage(content="Hello Bob! How can I assist you today?"),
        ],
    }
)
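
The executor returns a dict; the agent's final natural-language answer, including the product links, is under the output key:

print(response["output"])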

Hurray!! You are now ready to deploy this agent to an enterprise app with a good-looking UI.

Here is the reference GitHub repo with all the code artifacts.

https://github.com/monuminu/AOAI_Samples/tree/main/content_product_tagging

Favor: Please clap if you like this and follow me for more such content.

