
[Copilot No. Series] Number series copilot #764

Open
wants to merge 63 commits into
base: main

Conversation

DmitryKatson
Contributor

@DmitryKatson DmitryKatson commented Mar 16, 2024

Summary

Introducing the first [draft] version of a "Number Series Copilot" for Business Central, an innovative feature designed to simplify the creation of number series through natural language commands. This allows users to effortlessly set up number series for specific entities like customers, vendors, or even across all entities, by simply describing their needs or specifying a format example.

The current version is intended to create number series for a NEW company, where no Number Series exist yet.

Video: Number.Series.Copilot.-.New.company.Scenario.mp4

This flow was tested on user prompts like "Set up number series for the new company".

Please follow this diagram (green flow) to get an idea of the overall process.

Diagram: diagram-export-16-03-2024-13_39_19

Key ideas:

  • Azure OpenAI tools calling is used to understand the user intent
  • AL is used to gather data from BC and build the tool response
  • The new-company flow requires about 120 different number series to be generated. Neither GPT-4 nor GPT-3.5 Turbo (latest) was able to generate an accurate JSON array in one go
  • Thus, instead of one LLM call to build 120 number series, the tool response is chunked so that no more than 10 number series are generated per call (see the sketch after this list)
  • All LLM generation results (12 in total, each taking 5-9 seconds of GPT-3.5 time) are then combined into one answer, which is presented to the user
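A minimal sketch of that chunking idea in AL, assuming a hypothetical GenerateNoSeriesChunk helper that performs one AOAI call per batch; the names and signatures are illustrative, not the PR's actual implementation:

// Illustrative only: chunk the requested tables into batches of 10,
// call the model once per batch, and merge the results into a single JSON array.
local procedure GenerateInChunks(RequestedTables: List of [Integer]) Merged: JsonArray
var
    Chunk: List of [Integer];
    ChunkResult: JsonArray;
    Token: JsonToken;
    TableNo: Integer;
    MaxPerCall: Integer;
begin
    MaxPerCall := 10; // the PR limits each LLM call to at most 10 number series
    foreach TableNo in RequestedTables do begin
        Chunk.Add(TableNo);
        if Chunk.Count() = MaxPerCall then begin
            ChunkResult := GenerateNoSeriesChunk(Chunk); // hypothetical helper: one AOAI call
            foreach Token in ChunkResult do
                Merged.Add(Token);
            Clear(Chunk);
        end;
    end;
    if Chunk.Count() > 0 then begin
        ChunkResult := GenerateNoSeriesChunk(Chunk); // remaining partial batch
        foreach Token in ChunkResult do
            Merged.Add(Token);
    end;
end;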

Setup:

  • Use No. Series with Copilot Setup to set up Azure Open AI connection and prompts
  • Prompts should be provided separately (using other channels) due to security reasons
  • Model used: gpt-35-turbo-1106

Work Item(s)

Fixes #659
Fixes AB#503148

…ut AI implementation business logic)

Added new app components and capabilities for the No. Series Copilot app, including codeunits, page extensions, pages, and tables, to introduce AI features for generating number series. Additionally, new registers and installers were included to handle capability registration and app installation. This enhances the app's functionality and extensibility.
Update page and table numbers as there was an overlap with the Base Application.
- Temporarily add the Number Series Copilot Setup table and page to store AOAI secrets, the system prompt, and the functions prompt in isolated storage.
- Add Jupyter Notebook patterns to gitignore.
- Update codeunit, page, and table object numbers.
The code refactor improves tool retrieval, adds tool ID handling, and updates UI elements. The changes aim to enhance the No. Series Copilot tool functionality and user experience. These alterations are crucial for integrating Azure Key Vault in the future.

Include tool calls to the chat history

- Add support for tool call IDs in chat history, improving traceability and accountability for tool call results. The changes enable associating tool call IDs with tool messages in the chat history, enhancing the clarity and transparency of the interactions. This update aligns with the need to provide comprehensive context for tool call responses within the chat history.
- Introduce 'AddToolMessage' method and modify existing methods to capture tool-related data for chat messages.
- Add 'HistoryToolCallIds' list to save tool call IDs for messages.
- Create 'IsToolsList' method to check if a message contains tool-related information.
- Update 'AOAIChatRoles.Enum.al' to define a new 'tool' chat role.

These changes enable the tracking and differentiation of tool call results in chat history.
Enable support for returning valid JSON object as chat completion, adding guidelines for JSON production, and checking compatibility for JSON response format.
Improve the JSON output format specification and [TEMPORARY] limit the number of no. series to avoid reaching the token limit and timing out when dealing with many tables.
Consolidates tool invocation logic by using a dedicated function and refines prompt formats for better user experience. Also updates existing patterns to reflect more accurate numbering conventions. These changes aim to streamline tool usage and enhance prompt clarity.
Consolidate tool retrieval methods; refactor output format instructions.
Improved modularity and maintainability.
Add new Json module to provide tools for working with JSON data, including reading, writing, and parsing JSON. Introduces codeunit for initializing JSON array and object, retrieving element counts, fetching objects by index, and getting values for specified record fields. Implements methods to interact with JSON data using .NET types.

This is partly ported from Base Application codeunit 5459 "Json management"
Additionally, the visibility of the response text on the No. Series Proposal Page is dynamically managed, depending on JSON response validation
Parse tools messages into `function name`, `arguments` and `tool call id`
Replace direct parsing with dedicated method for improved maintainability and readability. This enhances code structure and sets the stage for future extensibility. No related issues.
…f the tool response exceeds token limit.

Ensure secure handling of sensitive data in chat messages by swapping text types with secure text types. Implement chunking of tool responses to avoid exceeding token limits, resulting in improved prompt generation for new number series. Indirectly address potential security vulnerabilities and enhance system functionality.
Change the dictionary to standardize chunk processing of tables in order to avoid hallucinations and effectively handle large volumes. This ensures accurate and complete model responses, enhancing overall functionality. The refactor also introduces new helper functions to process tool completions and validate generated number series.
…nd output examples from setup

Introduce methods to retrieve tool 1 patterns and output examples from setup. Also, add labels and separators for better formatting. Improve error handling and optimize code readability.
Improved retrieval of tool 1 general instructions and output format for better modularity and maintainability.
…dle duplicate No. Series codes

This commit refactors JSON manipulation by separating common JSON operations into a separate file and introduces new functionality to handle duplicate No. Series codes using random character generation. This change improves data integrity and ensures uniqueness of No. Series codes. It adds methods to replace or add properties in JSON objects and modify JSON arrays as necessary.
Replace text prompts for tool limitations, code guidelines, description guidelines, number guidelines, output examples, and output format prompts. Add validation for property lengths in generating number series.
Refactor JSON handling methods to return JSON objects and arrays as text, enhancing readability and modularity across the codebase, paving the way for simplified integration with other modules. This change sets the stage for easier maintenance and future expansions of the JSON-related functionalities. No related issues are involved.
end;
end;

procedure ApplyProposedNoSeries(var NoSeriesGenerated: Record "No. Series Proposal Line")

Same, I'd like the variable name to somewhat match the object name. Variables can add descriptors to make it more precise, but the more they differ, the more I think we simply need better names...

var
NoSeries: Record "No. Series";
begin
NoSeries.Init();

NoSeries is a local record variable and not in a loop so there is nothing to Init ...

var
NoSeriesLine: Record "No. Series Line";
begin
NoSeriesLine.Init();

same, don't init when not needed.

NoSeries.Description := NoSeriesGenerated.Description;
NoSeries."Manual Nos." := true;
NoSeries."Default Nos." := true;
//TODO: Check if we need to add more fields here, like "Mask", "No. Series Type", "Reverse Sales VAT No. Series" etc.

having todo's in code worries me... is this just a note? should it be fixed before we ship? What if we don't fix it?

I dislike comments in code because they get outdated very quickly but I can live with them (docs are better :P) but todo's like this should not be in code.

NoSeriesLine."Ending No." := NoSeriesGenerated."Ending No.";
NoSeriesLine."Warning No." := NoSeriesGenerated."Warning No.";
NoSeriesLine."Increment-by No." := NoSeriesGenerated."Increment-by No.";
//TODO: Check if we need to add more fields here, like "Allow Gaps in Nos.", "Sequence Name" etc.

I think it's important to set the "Allow Gaps" / Implementation field which depends on the purpose of the no. series.
I think we default to Normal / no gaps because it's the safe choice but we prefer Sequence / allow gaps because it has better perf.
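For illustration, a sketch of what the reviewer suggests, assuming the enum-based "Implementation" field on "No. Series Line" (older BC versions use the "Allow Gaps in Nos." boolean instead); the procedure and parameter names are made up:

// Prefer the sequence-based implementation (better performance) unless the
// series' purpose requires strictly gapless numbering.
local procedure SetImplementation(var NoSeriesLine: Record "No. Series Line"; RequiresGaplessNumbering: Boolean)
begin
    if RequiresGaplessNumbering then
        NoSeriesLine.Validate(Implementation, Enum::"No. Series Implementation"::Normal)
    else
        NoSeriesLine.Validate(Implementation, Enum::"No. Series Implementation"::Sequence);
end;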

FieldRef: FieldRef;
begin
RecRef.Open(NoSeriesGenerated."Setup Table No.");
if not RecRef.FindFirst() then

Since the assumption here seems to be that there is only 1 setup record and that record would have a blank primary key, should we not assert that more aggressively? Instead of finding the first record maybe we should 'get' the record with the blank prim. key. ... or check that the record we find has a blank value.
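As an illustration of this point, a sketch that filters the primary key to blank before reading, instead of taking whatever FindFirst returns; it assumes the setup table's primary key is a single (Code) field, and the procedure name is illustrative:

local procedure GetSetupRecord(SetupTableNo: Integer; var RecRef: RecordRef): Boolean
var
    KeyFieldRef: FieldRef;
begin
    RecRef.Open(SetupTableNo);
    // Only accept the conventional setup record whose primary key is blank,
    // rather than silently using the first record found.
    KeyFieldRef := RecRef.KeyIndex(1).FieldIndex(1);
    KeyFieldRef.SetRange('');
    exit(RecRef.FindFirst());
end;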

}
}

procedure GetEndpoint() Endpoint: Text[250]

All these methods should be added to a codeunit somewhere. Logic should reside in codeunits, not tables, pages, etc...

I get the appeal of these getters but they are not used consistently. Sometimes they are used, sometimes not. They are also not using the Rec. as I'd expect but instead do their own Get... yes, these will likely hit the cache but still, we have Setup.Get all over the place now instead of getting the record once only..

AOAIChatMessages.DeleteMessage(AOAIChatMessages.GetHistory().Count); // remove the last message, as it is not needed anymore
AOAIChatMessages.DeleteMessage(AOAIChatMessages.GetHistory().Count); // remove the tools message, as it is not needed anymore

Sleep(1000); // sleep for 1000ms, as the model can be called only limited number of times per second

Is 1000ms really the right number? Do we have any plan to fine-tune this a little more?

ApplyProposedNoSeries();
end;

local procedure GenerateNoSeries()

These methods should be moved to a codeunit... and we should have a one-liner here at most.

@MPGoldenEDI

Proposing numerical sequences for use in production settings diverges from those in demonstrations. It's important to evaluate the long-term viability of each series and distinguish between various types of documents.
My experience is that the starting point never takes into consideration that the series must last 20 years, maybe more.
My suggestion is to include the year and then have a length that covers growth in the company. The prefix before the year can include alphanumeric characters, also in posted documents, if that's acceptable to the company. Otherwise, the suggestion is to have a number as the prefix on documents used for external communication.
The advantage is that each document number clearly indicates the year of creation to the user, and the sorting sequence will invariably follow a logical order.
Example:
Sales Order:
a) SO24000001 when alphanumeric is allowed
b) 124000001 when alphanumeric is not allowed
Posted Sales Invoice:
a) SI24000001 when alphanumeric is allowed
b) 224000001 when alphanumeric is not allowed

I have seen so many systems where the partner has started with the Contoso/Cronus number series and believed that this is the way to go.
I have helped customers reconstruct their number series, yet they still struggle with poor sorting.
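For illustration only, this is roughly what the year-prefixed suggestion could look like as a "No. Series Line", using the Sales Order example above; the values are examples, not the copilot's actual output:

local procedure AddYearPrefixedSalesOrderSeries(var NoSeriesLine: Record "No. Series Line")
begin
    NoSeriesLine."Series Code" := 'SALESORDER';
    NoSeriesLine."Starting Date" := DMY2Date(1, 1, 2024);
    NoSeriesLine."Starting No." := 'SO24000001'; // year embedded right after the prefix
    NoSeriesLine."Ending No." := 'SO24999999';   // leaves room for growth within the year
    NoSeriesLine."Increment-by No." := 1;
end;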

…ditional function to get results in JSON, instead of just prompting it. 5x increase in quality and stability.

Improve the way tool prompts are generated for adding and changing number series, ensuring clarity and efficiency in user interactions. These changes aim to enhance the prompt building process based on specified tables and patterns, leading to a more streamlined user experience and accurate data handling.
@DmitryKatson
Contributor Author

Here are my notes/questions I have from our sync :)

@darjoo

  • Action visibility

    • In addition to the SaaS check, we need to limit CA and potentially other languages. This is not necessary to add now, but I'll add the code snippets below. CA is due to the language requirement that both EN & FR are supported. Language-wise, we've typically stuck with EN first as Production Ready Previews and slowly rolled out other languages as the language assessment team gives us the green light.

Added codeunit 324 "No. Series Copilot Impl.".IsCopilotVisible(), similar to other copilots you have in the system. It checks the SaaS environment (deactivated for now, as I work in Docker) and supported languages (similar to other copilots, only -pl is excluded).
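A rough sketch of what such an IsCopilotVisible-style check typically looks like; the actual implementation in codeunit 324 "No. Series Copilot Impl." may differ, and the language check shown is only an example:

procedure IsCopilotVisible(): Boolean
var
    EnvironmentInformation: Codeunit "Environment Information";
begin
    // SaaS-only (per the PR, this check is temporarily disabled for Docker-based development)
    if not EnvironmentInformation.IsSaaSInfrastructure() then
        exit(false);
    exit(IsSupportedLanguage());
end;

local procedure IsSupportedLanguage(): Boolean
begin
    // Example of excluding one language; the PR excludes Polish (-pl) like other copilots.
    exit(GlobalLanguage() <> 1045); // 1045 = Polish language ID (illustrative)
end;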

  • Use prompt guides

    • For the prompt dialog, using prompt guides can give the users an easy way to select premade prompts. The ones that you think are most likely to be used by them. I'm not sure if the docs are ready yet about them, but they will be mentioned in the Launch Event for What's New: Extending Copilot with AL

It has prompt guides and a placeholder now

  • There's a check of the input against the max tokens, and it just exits if it's greater than the max tokens allowed. We should at least give an actionable error, or see if it can be modified to always proceed even if it takes multiple requests.

I added codeunit 348 "No. Ser. Cop. Notific. Manager" and used NotificationManager.SendNotification(ChatCompletionResponseErr); in the out-of-tokens scenario and also inside the function implementation codeunits, similar to what you do in the Sales Lines Suggestions Impl.
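A minimal sketch of a notification-manager-style codeunit with centralized send and recall; the object ID and structure are illustrative and may differ from the actual codeunit 348 "No. Ser. Cop. Notific. Manager":

codeunit 50100 "No. Series Copilot Notif. Demo" // illustrative object ID
{
    var
        NotificationId: Guid;

    procedure SendNotification(MessageText: Text)
    var
        Notif: Notification;
    begin
        // Reuse one notification ID so re-sending replaces the previous message.
        if IsNullGuid(NotificationId) then
            NotificationId := CreateGuid();
        Notif.Id := NotificationId;
        Notif.Message(MessageText);
        Notif.Scope := NotificationScope::LocalScope;
        Notif.Recall(); // recall any previous instance before re-sending
        Notif.Send();
    end;
}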

  • When the generation comes back with a response, would it be beneficial for AOAIChatMessages to give you say, GetLastTool or GetLastFunction when it's a tool/function and provide you the Name:Text; Arguments: JsonArray (for example) instead of having to parse GetLastMessage?
  • I'll add a method in AOAIChatMessages to clear the existing tools. So you don't have to loop through and delete all tools.

Since I changed the approach from "Get Intent > Run intent function > Add result to chat history > get final answer" to "Get Intent > Run intent function > Get Result > Make standalone AI call without chat history", this is not relevant anymore. But thanks for reimplementing the function calling. Much easier to work with now.

  • In your BuildGenerateNewNumberSeries function, it runs tokencount for each line. I'll be improving that, so you can put in AOAIChatMessages directly and it'll count everything that's being sent.

I've removed token counting inside the function calling. Instead, it chunks requests based on the number series to generate. As I now use standalone calls instead of sending the chat history from the beginning, and I build the final prompt myself, I'm largely in control of the tokens used. According to the testing, each call consumes on average 15% of the prompt tokens and 50% of the completion tokens.

  • Filtering on name with * Setup for tables is risky as we can't guarantee they always end with Setup

There was a discussion on this, but it seems we currently don't have another way to find setup tables. Some additional checks could be:

  • There is only 1 record
  • There is a Code field
  • Code field is Blank
    Probably this will be a documented mandatory requirement to make this copilot work. But ideally, we should have TableType::Setup.
  • GPT-3.5 has a context window of 16k while GPT-4 has 32k and GPT-4 Turbo 128k; would it be better to switch?
    I tested with GPT-4 and GPT-4o. But currently GPT-3.5 gives the fastest and most reliable responses with the very hand-crafted prompts that I use, at least on the test questions.
  • This you had answered already, hallucination can be pretty bad with generating more tables. So there isn't a point in using GPT4
  • Does generation of ~10 tables always work?

I followed your advice: instead of the "ask in the following JSON format" approach, I now use a third function and get the JSON array as its input. This was really good advice, which allowed correctness to increase from 10 tables to 50. However, in the final version of the copilot I kept it at 40 tables, as 50 works in 9 cases out of 10 according to my tests, while 40 seems to always work and usually covers most of the scenarios.

  • This you had also answered: it doesn't; it does retries and completely fails if it doesn't generate all of them.
  • We should allow partial generation and then allow the user to request another generation for the ungenerated ones, or a full regeneration.

This is a good idea, but not in this version. However, after moving to function calls instead of JSON output, and after additional prompt fine-tuning, I see a big reduction in wrong generations. We also discussed this outside of this thread and decided to add "No. Series Copilot Impl.".MinimumAccuracy(), which is 0.9 for now. It means that if the user requested number series for 40 tables and only 36 were generated, we keep this result and don't regenerate.
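A sketch of that accuracy threshold (keep the result if at least 90% of the requested series were generated); procedure names other than MinimumAccuracy() are illustrative:

local procedure IsAccuracyAcceptable(RequestedCount: Integer; GeneratedCount: Integer): Boolean
var
    Accuracy: Decimal;
begin
    if RequestedCount = 0 then
        exit(true);
    Accuracy := GeneratedCount / RequestedCount;
    exit(Accuracy >= MinimumAccuracy()); // MinimumAccuracy() returns 0.9 in the PR
end;

local procedure MinimumAccuracy(): Decimal
begin
    exit(0.9);
end;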

I also implemented logic so that if a number series is already set up in the "[area] Setup" table, we don't need to generate it again. Meaning that if a user asks "Generate number series for the purchase module", accepts the first result, and then asks again, there will be no generations, as it's already configured. This will reduce the number of "duplication" errors and also increase quality.
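An illustrative sketch of that "skip what's already configured" idea: before generating, check whether the setup table's No. Series field already holds a value. The procedure name and parameters are assumptions:

local procedure IsAlreadyConfigured(SetupTableNo: Integer; NoSeriesFieldNo: Integer): Boolean
var
    RecRef: RecordRef;
    FieldRef: FieldRef;
begin
    RecRef.Open(SetupTableNo);
    if not RecRef.FindFirst() then
        exit(false);
    // A non-empty No. Series field means this area is already configured.
    FieldRef := RecRef.Field(NoSeriesFieldNo);
    exit(Format(FieldRef.Value) <> '');
end;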

  • How long does it take to run? As running 10+ requests will take time, including retries that makes it worse.

    • Takes about a minute. This might be fine as this is an admin scenario where they're setting the system up. And this saves them a lot more time in the long run.

My tests show that it takes approx. 38 seconds to generate 40 number series.

  • Retry logic, what if 1 out of 10 fails?

    • Similar question to whether the generation of ~10 tables always works. You mentioned that all of them fail. We should allow partial generation if it makes sense.

Answered above

Thank you so much for going through the code and all! Looking forward to when this sees the light of day in the product!

Thank you for support in this journey ;)

Added a new Codeunit for managing notifications, refactored notification handling
to include centralized notification sending and recalling, and implemented
handling for chat completion errors. This update enhances the error
notification system and ensures consistent error messaging. This commit addresses some microsoft#764 review comments.
Improve setup table retrieval by filtering and checking metadata, ensuring correct setup table identification and validation. Enhances data accuracy and reliability.
@DmitryKatson
Contributor Author

😅 "I worked really hard (c)" and now we have improved version of No Series Copilot.

  1. It's more stable in terms of generated results and requires a maximum of 3 calls to AOAI instead of the original 12 to generate number series for a blank company. The generation time is still the same: we need to generate the same number of number series, so tokens/sec speed is the bottleneck here.
  2. The total number of tokens required to generate number series for a blank company was reduced by approximately 60%.

This was achieved thanks to a new way of generating the final JSON array of number series. Thanks @darjoo for the advice.
Instead of just asking the LLM "Please generate number series as a JSON array in the following format...", we add a function with a parameter of JSON array type and ask the LLM to generate that function parameter for us. It turns out this works much better; with slight prompting adjustments, at least a 5x increase in result quality.
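A sketch of that idea: instead of asking for JSON in the completion text, define a tool/function whose single parameter is an array of number series objects and let the model "call" it. The schema below is illustrative; in the PR the actual function definitions and prompts come from the setup table:

local procedure BuildGenerateNoSeriesTool() Tool: JsonObject
var
    FunctionDef: JsonObject;
    Parameters: JsonObject;
    Properties: JsonObject;
    SeriesArray: JsonObject;
    Items: JsonObject;
begin
    // Each array element describes one number series (e.g. seriesCode, description, startingNo, ...).
    Items.Add('type', 'object');
    SeriesArray.Add('type', 'array');
    SeriesArray.Add('items', Items);
    SeriesArray.Add('description', 'Number series to create');
    Properties.Add('noSeries', SeriesArray);
    Parameters.Add('type', 'object');
    Parameters.Add('properties', Properties);
    FunctionDef.Add('name', 'generate_number_series');
    FunctionDef.Add('parameters', Parameters);
    Tool.Add('type', 'function');
    Tool.Add('function', FunctionDef);
end;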

DmitryKatson and others added 3 commits July 5, 2024 16:29
Adjusted function to correctly remove the last character in the AreasText variable to ensure proper string formatting. This change ensures accurate text representation.
Labels
AL: Business Foundation From Fork Pull request is coming from a fork Integration GitHub request for Integration area Linked Issue is linked to a Azure Boards work item
Development

Successfully merging this pull request may close these issues.

[BC Idea]: Number Series Copilot