With this feature, users can upload their PDF files through the portal and chat about the content of those files.
Chat with your data utilises the following Azure Services:
- Azure Document Intelligence for extracting information from documents.
- Azure Cognitive Search for indexing and retrieving information.
- Azure OpenAI Embeddings for embedding content extracted from files.
We use Azure OpenAI Embeddings to convert text to vectors and index it in Azure Cognitive Search.
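As a rough illustration of what "index it in Azure Cognitive Search" involves, the sketch below builds one document using the field names from the index schema given later in this section (id, user, chatThreadId, pageContent, metadata, embedding) and wraps it in the batch format used by the Search REST indexing call. The helper names `build_index_document` and `build_upload_payload` are hypothetical, and the all-zero vector is a placeholder rather than a real embedding; this is not necessarily how the application itself does it.

```python
import json
import uuid

# Must match "dimensions" on the "embedding" field of the index schema.
EMBEDDING_DIMENSIONS = 1536


def build_index_document(user, chat_thread_id, page_content, embedding, metadata=""):
    """Return one document shaped like the index schema (hypothetical helper)."""
    if len(embedding) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(embedding)}"
        )
    return {
        "id": uuid.uuid4().hex,          # key field
        "user": user,
        "chatThreadId": chat_thread_id,  # filterable, so results can be scoped to a thread
        "pageContent": page_content,
        "metadata": metadata,
        "embedding": [float(x) for x in embedding],
    }


def build_upload_payload(documents):
    """Wrap documents in the {"value": [...]} batch body of the indexing call."""
    return {
        "value": [{"@search.action": "mergeOrUpload", **doc} for doc in documents]
    }


if __name__ == "__main__":
    # Placeholder vector; a real one would come from the embeddings deployment.
    placeholder_embedding = [0.0] * EMBEDDING_DIMENSIONS
    doc = build_index_document(
        "user@example.com", "thread-1", "Text extracted from page 1", placeholder_embedding
    )
    payload = build_upload_payload([doc])
    print(json.dumps(payload)[:120])
```

Checking the vector length before upload surfaces a mismatch between the embeddings model and the index definition early, rather than as a rejected indexing request.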
Update the OpenAI environment variables with the following:
AZURE_OPENAI_API_EMBEDDINGS_DEPLOYMENT_NAME=
When deploying to Azure, make sure to update the Azure App Service app settings with AZURE_OPENAI_API_EMBEDDINGS_DEPLOYMENT_NAME.
- Create an Azure Cognitive Search resource using the following link
- Create an index on Azure Cognitive Search with the following schema. You can use the Azure portal to create the index.
{
  "name": "azure-chatgpt",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "key": true,
      "filterable": true
    },
    {
      "name": "user",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true
    },
    {
      "name": "chatThreadId",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true
    },
    {
      "name": "pageContent",
      "searchable": true,
      "type": "Edm.String"
    },
    {
      "name": "metadata",
      "type": "Edm.String"
    },
    {
      "name": "embedding",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "retrievable": true,
      "analyzer": "",
      "dimensions": 1536,
      "vectorSearchConfiguration": "vectorConfig"
    }
  ],
  "vectorSearch": {
    "algorithmConfigurations": [
      {
        "name": "vectorConfig",
        "kind": "hnsw"
      }
    ]
  }
}
- After the index has been created, proceed to modify the env.local file with the appropriate Azure Cognitive Search environment variables.
# Azure cognitive search is used for chat over your data
AZURE_SEARCH_API_KEY=
AZURE_SEARCH_NAME=
AZURE_SEARCH_INDEX_NAME=
AZURE_SEARCH_API_VERSION="2023-07-01-Preview"
- Create an instance of Azure Form Recognizer (also known as Azure Document Intelligence) using the following link. Please be aware that this resource might be called Form Recognizer in the Azure portal.
- After the Form Recognizer (Document Intelligence) resource has been created, modify the env.local file with the appropriate environment variables. You can find the values for these variables in your Form Recognizer resource (Resource Management blade > Keys and Endpoint). Do not copy the endpoint from there; replace only the region in the example below. For example, if your Form Recognizer resource is located in the East US Azure region, your AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT variable would be https://eastus.api.cognitive.microsoft.com/. Note that the file is preserved only for the chat thread it was uploaded to.
# Azure AI Document Intelligence to extract content from your data
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT="https://REGION.api.cognitive.microsoft.com/"
AZURE_DOCUMENT_INTELLIGENCE_KEY=
- At this point, you should be able to start new chat sessions with the File option.
- Once the File chat option is selected, click the Choose File button to select your document, and then click the Upload button to upload your file. Please note that the Form Recognizer service supports PDF (text or scanned), JPG and PNG input documents.
- Once you receive a notification about a successful file upload, you should be able to start chatting with the chatbot.
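Before starting the app, it can help to sanity-check the settings collected in the steps above. The following is a minimal sketch, not part of the application: `missing_variables` and `document_intelligence_endpoint` are hypothetical helper names, and the region normalisation assumes that the region's short name is its display name lowercased with spaces removed (as in the East US to eastus example above).

```python
import os

# Variable names taken from the setup steps in this section.
REQUIRED_VARIABLES = [
    "AZURE_OPENAI_API_EMBEDDINGS_DEPLOYMENT_NAME",
    "AZURE_SEARCH_API_KEY",
    "AZURE_SEARCH_NAME",
    "AZURE_SEARCH_INDEX_NAME",
    "AZURE_SEARCH_API_VERSION",
    "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT",
    "AZURE_DOCUMENT_INTELLIGENCE_KEY",
]


def missing_variables(env):
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name, "").strip()]


def document_intelligence_endpoint(region):
    """Build the regional endpoint from a region name such as "East US".

    Assumption: the short region name is the display name, lowercased,
    with spaces removed.
    """
    normalized = region.replace(" ", "").lower()
    return f"https://{normalized}.api.cognitive.microsoft.com/"


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All chat-over-file settings are present.")
    print(document_intelligence_endpoint("East US"))
```

The same check applies to the Azure App Service app settings when deploying, since they are exposed to the app as environment variables.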
- A central place to maintain uploaded files (e.g. a storage account with blob storage)
- A way to delete indexed documents on Azure Cognitive Search if the chat thread is deleted
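One possible approach to the deletion item, sketched under assumptions: because chatThreadId is filterable in the index schema, the ids of a thread's documents can be looked up with a filtered search and then removed with a delete batch. The code below only builds the request URLs and JSON bodies for the Search REST API; the function names are hypothetical, no HTTP call is made, and authentication (the api-key header) is left out.

```python
def search_docs_url(service_name, index_name, api_version, operation):
    """URL for a docs operation: "search" to query, "index" to upload or delete."""
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/{operation}?api-version={api_version}"
    )


def build_thread_query(chat_thread_id):
    """Search body that selects only the ids of documents in one chat thread."""
    # OData string literals escape a single quote by doubling it.
    escaped = chat_thread_id.replace("'", "''")
    return {"filter": f"chatThreadId eq '{escaped}'", "select": "id"}


def build_delete_payload(document_ids):
    """Batch body that deletes the given documents by their key field."""
    return {
        "value": [
            {"@search.action": "delete", "id": doc_id} for doc_id in document_ids
        ]
    }


if __name__ == "__main__":
    api_version = "2023-07-01-Preview"
    print(search_docs_url("my-search", "azure-chatgpt", api_version, "search"))
    print(build_thread_query("thread-1"))
    print(build_delete_payload(["doc-a", "doc-b"]))
```

In practice the search step returns results in pages, so the lookup would need to be repeated until no documents remain for the thread.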