Multiple bot contents in Prompt Action. #1148
base: master
Conversation
Walkthrough
The recent updates focus on enhancing code readability, restructuring parameter handling, and improving validation processes across various components. These changes aim to streamline the codebase for better maintainability and logical flow.
Review Status
Actionable comments generated: 33
Configuration used: CodeRabbit UI
Files ignored due to path filters (3)
- template/use-cases/Hi-Hello-GPT/actions.yml is excluded by: `!**/*.yml`
- tests/testing_data/actions/actions.yml is excluded by: `!**/*.yml`
- tests/testing_data/actions/validation_action_data.json is excluded by: `!**/*.json`
Files selected for processing (14)
- kairon/actions/definitions/prompt.py (4 hunks)
- kairon/api/models.py (2 hunks)
- kairon/importer/validator/file_validator.py (3 hunks)
- kairon/shared/actions/data_objects.py (5 hunks)
- kairon/shared/cognition/processor.py (1 hunks)
- kairon/shared/data/processor.py (1 hunks)
- kairon/shared/llm/gpt3.py (5 hunks)
- kairon/shared/utils.py (2 hunks)
- tests/integration_test/action_service_test.py (19 hunks)
- tests/integration_test/services_test.py (42 hunks)
- tests/unit_test/action/action_test.py (55 hunks)
- tests/unit_test/data_processor/data_processor_test.py (42 hunks)
- tests/unit_test/llm_test.py (18 hunks)
- tests/unit_test/validator/training_data_validator_test.py (1 hunks)
Files not summarized due to errors (3)
- tests/integration_test/services_test.py: Error: Message exceeds token limit
- tests/unit_test/action/action_test.py: Error: Message exceeds token limit
- tests/unit_test/data_processor/data_processor_test.py: Error: Message exceeds token limit
Files not reviewed due to errors (1)
- (no review received)
Additional comments: 54
kairon/actions/definitions/prompt.py (4)
- 73-75: The indentation and formatting adjustments in the `fill_slots_from_response` method call improve readability.
- 83-84: The use of a ternary operator for setting `bot_response` based on the exception message is a concise way to handle different failure scenarios. However, ensure that `FAQ_DISABLED_ERR` and `k_faq_action_config.get("failure_message")` cover all potential error messages that could be encountered in this context.
- 109-119: The use of a dictionary to map LLM types to their corresponding parameter generation methods is a clean approach to handling different LLM configurations. This structure facilitates easy extension for additional LLM types in the future; a sketch of the pattern follows.
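For illustration, a minimal sketch of the dispatch-dictionary pattern described above. The class, method, and key names here are hypothetical, not the actual identifiers in prompt.py:

```python
# Minimal sketch of dispatching on LLM type through a dictionary.
# Names are hypothetical; prompt.py may use different ones.
class LlmParamBuilder:
    def _get_gpt_params(self, config: dict) -> dict:
        return {"temperature": config.get("temperature", 0.0),
                "top_results": config.get("top_results", 10)}

    def build(self, llm_type: str, config: dict) -> dict:
        dispatch = {
            "gpt3": self._get_gpt_params,
            # Supporting a new LLM type only requires a new entry here.
        }
        if llm_type not in dispatch:
            raise ValueError(f"Unsupported LLM type: {llm_type}")
        return dispatch[llm_type](config)
```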
- 170-191: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [121-235]
Refactoring the handling of similarity and query prompts in `__get_gpt_params` enhances the method's clarity and maintainability. The separation of logic for different prompt types and the use of lists and dictionaries for organizing prompt data are good practices. However, ensure that the changes in prompt handling are thoroughly tested, especially the dynamic construction of prompts based on the `k_faq_action_config` and the tracker's state.
kairon/shared/cognition/processor.py (1)
- 249-249: Modifying the query in `validate_collection_name` to use `llm_prompts__data__iexact` instead of `collection` reflects the changes in how collections are associated with prompt actions. Ensure that all references to collections within prompt actions have been updated to align with this new approach. A sketch of the changed lookup follows.
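A sketch of what the changed lookup might look like, assuming MongoEngine's double-underscore syntax for embedded-document fields; the `bot` and `collection_name` variables are illustrative:

```python
from kairon.shared.actions.data_objects import PromptAction

bot, collection_name = "test_bot", "python"  # illustrative values

# Before, the lookup matched a top-level `collection` field; after this change
# it matches the `data` field of the embedded llm_prompts documents,
# case-insensitively.
matching_actions = PromptAction.objects(
    bot=bot,
    llm_prompts__data__iexact=collection_name,
)
```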
kairon/shared/llm/gpt3.py (4)
- 36-37: Splitting the initialization of `self.client` across multiple lines improves readability, especially when passing multiple arguments.
- 52-66: The loop for processing collections and content within `train` is well-structured and uses descriptive variable names. The use of `tqdm` for progress indication is a user-friendly addition. Ensure that error handling is robust, particularly for cases where content processing might fail.
- 110-115: The handling of `query_prompt` in `__get_answer` now checks for its presence more explicitly before attempting to use it. This change improves the method's robustness by ensuring that `query_prompt` and `use_query_prompt` are only used when provided.
- 206-235: Refactoring `__attach_similarity_prompt_if_enabled` to process `similarity_prompt` as a list allows for more flexible and dynamic prompt construction. This approach supports the inclusion of multiple similarity prompts, enhancing the method's utility. Ensure that the logic for constructing the similarity context and handling extracted values is thoroughly tested, particularly in scenarios with multiple prompts. A sketch of this list-based pattern follows.
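A rough sketch of the list-based pattern, assuming the `similarity_prompt` config structure shown in the tests later in this review; the helper and parameter names are illustrative, not the actual private method:

```python
# Illustrative sketch: build one context section per similarity prompt.
# `search_fn` stands in for the vector-store query; names are hypothetical.
def attach_similarity_prompts(context_prompt: str, similarity_prompts: list, search_fn) -> str:
    sections = []
    for prompt in similarity_prompts:
        hits = search_fn(collection=prompt["collection"],
                         limit=prompt.get("top_results", 10),
                         score_threshold=prompt.get("similarity_threshold", 0.70))
        extracted = "\n".join(hit["text"] for hit in hits)
        sections.append(f"{prompt['similarity_prompt_name']}:\n{extracted}\n"
                        f"Instructions: {prompt['similarity_prompt_instructions']}")
    return context_prompt + "\n" + "\n".join(sections)
```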
kairon/shared/actions/data_objects.py (4)
- 628-636: The `PromptHyperparameter` class introduces fields `top_results` and `similarity_threshold` with default values and validation logic. The validation ensures `similarity_threshold` is within a reasonable range (0.3 to 1) and `top_results` does not exceed 30. This is a good practice to ensure the parameters are within expected bounds, enhancing the robustness of the system.
- 641-641: The `LlmPrompt` class now includes a `hyperparameters` field of type `PromptHyperparameter`. This encapsulation of hyperparameters related to prompts is a positive change, promoting modularity and making the configuration of prompts more manageable.
- 653-654: Validation logic within the `LlmPrompt` class now includes a call to `self.hyperparameters.validate()` if `hyperparameters` are provided. This ensures that the hyperparameters associated with a prompt are validated according to the rules defined in the `PromptHyperparameter` class. It's a good practice to validate nested documents to maintain data integrity.
- 694-695: In the `PromptAction` class, there's a loop to validate each `llm_prompt` in the `llm_prompts` list. This is crucial for ensuring that each prompt configuration adheres to the defined validation rules. Additionally, the use of `Utility.validate_kairon_faq_llm_prompts` and `Utility.validate_llm_hyperparameters` for further validation of prompts and hyperparameters demonstrates a thorough approach to maintaining data integrity. A minimal sketch of such an embedded document appears below.
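As a rough sketch, assuming MongoEngine (which data_objects.py is built on); the exact fields and validation hooks of the real class may differ:

```python
from mongoengine import EmbeddedDocument, FloatField, IntField
from mongoengine.errors import ValidationError

class PromptHyperparameterSketch(EmbeddedDocument):
    top_results = IntField(default=10)
    similarity_threshold = FloatField(default=0.70)

    def validate(self, clean=True):
        # Mirrors the bounds described in the review comment above.
        if not 0.3 <= self.similarity_threshold <= 1:
            raise ValidationError("similarity_threshold should be within 0.3 and 1")
        if self.top_results > 30:
            raise ValidationError("top_results should not be greater than 30")
```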
kairon/api/models.py (2)
- 890-890: The addition of the `hyperparameters` field of type `PromptHyperparameters` to the `LlmPromptRequest` class is a significant improvement. It encapsulates hyperparameters related to prompts, making the class more modular and easier to manage. This change aligns with the principles of good software design by promoting encapsulation and reducing redundancy.
- 872-893: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [887-896]
The removal of `top_results`, `similarity_threshold`, and `enable_response_cache` fields from the `PromptActionConfigRequest` class, alongside the removal of their validators, simplifies the configuration process for prompt actions. This change is in line with the PR objectives of streamlining the handling and configuration of prompts. By delegating hyperparameter management to the `PromptHyperparameters` class and removing redundant fields, the code becomes cleaner and more maintainable. A sketch of the resulting composition follows.
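A sketch of how the request models might compose after this change, assuming Pydantic v1 style; the field set mirrors the prompt dictionaries shown elsewhere in this review and is illustrative:

```python
from typing import Optional
from pydantic import BaseModel

class PromptHyperparameters(BaseModel):
    top_results: int = 10
    similarity_threshold: float = 0.70

class LlmPromptRequest(BaseModel):
    name: str
    type: str
    source: str
    data: Optional[str] = None
    instructions: Optional[str] = None
    is_enabled: bool = True
    # Hyperparameters now live on the prompt itself rather than the action config.
    hyperparameters: Optional[PromptHyperparameters] = None
```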
tests/unit_test/validator/training_data_validator_test.py (3)
- 475-475: The assertion for the length of `error_summary['prompt_actions']` has been updated to `57`. This change likely reflects an update in the validation logic for `prompt_actions`, possibly due to new validation rules or adjustments in existing ones. Ensure that this change aligns with the intended validation enhancements mentioned in the PR objectives.
- 478-478: The variable `required_fields_error` captures a specific error message related to missing required fields in a `prompt_action`. The use of regex to match the error message is a good practice for validating dynamic error messages. However, ensure that the regex pattern accurately captures all possible variations of the error message to avoid false negatives in your tests.
- 529-529: The final assertion checks the `component_count` dictionary for expected counts of various action types. This is a good practice to ensure that the validation logic correctly identifies and categorizes different action types. However, ensure that these counts are kept up-to-date with any changes in the validation rules or the addition/removal of action types to avoid false positives/negatives in your tests.
tests/unit_test/llm_test.py (10)
- 511-511: The `collection` attribute is added to the `CognitionData` instance but not used within the `test_gpt3_faq_embedding_predict` method. Ensure that this addition aligns with the intended use cases and that the `collection` attribute is utilized where necessary.
- 523-526: The structure of the `k_faq_action_config` dictionary has been modified to include a list of dictionaries under the key `similarity_prompt`. Ensure that all references to `similarity_prompt` throughout the test suite and application code are updated to accommodate this change in structure.
Verification successful: The grep output indicates that references to `similarity_prompt` across the codebase have been updated to accommodate the change in structure to a list of dictionaries. This includes updates in both application code and test suites, with the correct usage of keys within the dictionaries as per the new structure. Based on this evidence, it appears that the necessary updates have been made correctly.
- 555-555: The URL construction in the `test_gpt3_faq_embedding_predict` method now includes the `collection` attribute. Verify that the constructed URL is correct and aligns with the expected endpoint format for the vector database.
- 595-599: Introduced a `query_prompt` dictionary in the `k_faq_action_config` for query prompts. Ensure that the handling of `query_prompt` is implemented correctly in the corresponding methods and that the logic for using query prompts is properly integrated and tested.
Scripts executed for the analysis:
```bash
#!/bin/bash
# Verify that all references to `similarity_prompt` are correctly updated.
grep -r "similarity_prompt" . --include=\*.py
```
Length of output: 7462
Verification successful: The grep output confirms the implementation and testing of the `query_prompt` feature across various parts of the codebase, including action definitions and shared LLM utilities. The presence of specific tests for `query_prompt` functionality further supports that its behavior is verified and integrated properly. Therefore, the handling of `query_prompt` appears to be correctly implemented and tested as required.
- 1001-1007: The addition of the `query_prompt` configuration within `k_faq_action_config` suggests a new feature for handling query prompts. Ensure that the logic for processing query prompts is correctly implemented and that there are sufficient tests to cover various scenarios, including cases where `use_query_prompt` is `True` and `False`.
- 928-939: The test method `test_gpt3_faq_embedding_predict_with_previous_bot_responses` introduces handling for previous bot responses in the prediction configuration. Verify that the integration of previous bot responses is correctly implemented in the prediction logic and that it does not adversely affect the prediction outcomes.
Scripts executed for the analysis:
```bash
#!/bin/bash
# Verify the correct handling of `query_prompt` in methods.
grep -r "query_prompt" . --include=\*.py
```
Length of output: 4667
Verification successful: The grep output confirms the integration of `previous_bot_responses` in the prediction logic, particularly within `gpt3.py` and related test files. This indicates that the handling for previous bot responses, as mentioned in the test method `test_gpt3_faq_embedding_predict_with_previous_bot_responses`, is implemented. However, verifying whether this integration does not adversely affect prediction outcomes would require more detailed analysis or specific testing results, which are not provided in the grep output. Based on the available information, the integration part of the review comment is verified.
- 996-1007: The test method `test_gpt3_faq_embedding_predict_with_query_prompt` introduces handling for query prompts in the prediction configuration. This is a significant addition and requires thorough testing to ensure that query prompts are correctly processed and that they influence the prediction outcomes as expected.
Scripts executed for the analysis:
```bash
#!/bin/bash
# Verify the integration of previous bot responses in prediction logic.
grep -r "previous_bot_responses" . --include=\*.py
```
Length of output: 947
Verification successful: The grep command output indicates that the query prompt feature is comprehensively tested across various scenarios and configurations in both integration and unit tests. Tests such as `test_prompt_action_response_action_with_query_prompt`, `test_add_prompt_action_with_invalid_query_prompt`, and `test_update_prompt_action_with_invalid_query_prompt`, among others, suggest a thorough approach to verifying the functionality and integration of the query prompt feature. Therefore, the review comment about the necessity for thorough testing of the query prompt handling in predictions is supported by the existing tests in the codebase.
- 809-819: The test method `test_gpt3_faq_embedding_predict_completion_connection_error` simulates a connection error scenario. Ensure that the error handling logic within the actual implementation gracefully handles such exceptions and provides meaningful feedback or fallbacks to the user or calling function.
Scripts executed for the analysis:
```bash
#!/bin/bash
# Check for comprehensive testing of the query prompt feature.
grep -r "query_prompt" ./tests --include=\*.py
```
Length of output: 2940
Verification successful: To further verify the error handling logic for connection errors and how the test case simulates and expects the system to handle such errors, we need to examine the relevant portions of the code in `./kairon/shared/llm/gpt3.py` and `./tests/unit_test/llm_test.py`.
The verification process has confirmed that the error handling logic for `APIConnectionError` in the actual implementation (`./kairon/shared/llm/gpt3.py`) provides meaningful feedback by logging the exception and creating a detailed response object indicating failure. Additionally, the test case (`./tests/unit_test/llm_test.py`) accurately simulates a connection error scenario and checks for the expected failure response, aligning with the actual implementation's error handling strategy. Therefore, the review comment is validated.
- 833-833: In the test method `test_gpt3_faq_embedding_predict_completion_connection_error`, the URL construction for the vector database search includes the `collection` attribute. Verify that the URL correctly targets the intended endpoint and that error handling for connection issues is robust.
- 1070-1070: The test method `test_gpt3_faq_embedding_predict_with_query_prompt` includes multiple calls to the OpenAI API, first for embeddings and then for chat completions. Ensure that these calls are correctly sequenced and that the test accurately simulates the behavior of the system when handling query prompts.
Scripts executed for the analysis:
```bash
#!/bin/bash
# Verify error handling for connection errors in the prediction logic.
grep -r "APIConnectionError" . --include=\*.py
```
Length of output: 378
```bash
#!/bin/bash
# Extract the error handling logic for APIConnectionError in gpt3.py
cat ./kairon/shared/llm/gpt3.py | grep -A 10 "APIConnectionError"
# Extract the relevant test case portion from llm_test.py
cat ./tests/unit_test/llm_test.py | grep -A 10 "APIConnectionError"
```
Length of output: 2049
Verification successful: The test method `test_gpt3_faq_embedding_predict_with_query_prompt` correctly sequences the OpenAI API calls, first for embeddings and then for chat completions, accurately simulating the system's behavior when handling query prompts. This aligns with the review comment's concern, confirming the test's implementation is as expected.
Scripts executed for the analysis:
```bash
#!/bin/bash
# Verify the sequence and correctness of OpenAI API calls in the test method.
grep -r "openai" ./tests --include=\*.py
```
Length of output: 14978
```bash
#!/bin/bash
# Locate and review the implementation of test_gpt3_faq_embedding_predict_with_query_prompt
rg "def test_gpt3_faq_embedding_predict_with_query_prompt" ./tests --include=\*.py
```
Length of output: 527
```bash
#!/bin/bash
# Corrected attempt to locate and review the implementation of test_gpt3_faq_embedding_predict_with_query_prompt
rg "def test_gpt3_faq_embedding_predict_with_query_prompt" ./tests
```
Length of output: 183
```bash
#!/bin/bash
# Extract and review the implementation of test_gpt3_faq_embedding_predict_with_query_prompt
cat ./tests/unit_test/llm_test.py | grep -A 30 "async def test_gpt3_faq_embedding_predict_with_query_prompt"
```
Length of output: 1985
```bash
#!/bin/bash
# Attempt to capture more of the test_gpt3_faq_embedding_predict_with_query_prompt method implementation
cat ./tests/unit_test/llm_test.py | grep -A 50 "async def test_gpt3_faq_embedding_predict_with_query_prompt"
```
Length of output: 3163
kairon/shared/data/processor.py (1)
- 5514-5523: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [1148-1148]
The method `edit_prompt_action` has been modified to include changes related to the handling and configuration of prompts within bot actions. Ensure that the new logic correctly updates the prompt action configuration as intended and that all necessary validations are in place to prevent invalid data from being saved.
tests/integration_test/action_service_test.py (5)
- 1-7: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1-8]
The import statements are correctly organized and follow Python's recommended order: standard library imports, followed by third-party imports, and then local application/library specific imports. This is good for readability and maintainability.
- 9744-9754: The addition of `instructions` in the `mock_completion.call_args.kwargs` dictionary is a good update that reflects the new functionality. However, ensure that all new fields added to the dictionary are properly validated in the test assertions to ensure they are being set and used as expected.
- 9835-9835: The setup for `test_prompt_response_action` with the addition of `mock_embedding` and `mock_completion` patches is correctly configured. Ensure that the test includes comprehensive assertions to validate the behavior of the `GPT3FAQEmbedding` class in response to various prompt configurations.
- 9977-9977: Passing `hyperparameters` directly to `PromptAction.save()` is a clear and explicit way to configure the test entity. This change aligns with the PR's objective of simplifying and centralizing prompt action configuration.
- 9994-9998: The detailed assertion for `mock_completion.call_args.kwargs` is good for validating the expected behavior of the completion call. Ensure that all new fields and configurations introduced in the PR are covered by similar detailed assertions across all relevant tests.
tests/integration_test/services_test.py (14)
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [1-1]
The code snippet correctly sets up an HTTP header for authorization using pytest fixtures for token type and access token. This is a common practice in tests that require authenticated requests.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [2-2]
The dictionary structure for defining prompts is consistent and clear, with each prompt having a defined `name`, `instructions`, `type`, `source`, `data`, and `is_enabled` status. This structure facilitates easy understanding and manipulation of prompt configurations.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [3-3]
The use of `DEFAULT_NLU_FALLBACK_RESPONSE` as a failure message is a good practice, ensuring consistency across different parts of the application. It's important to verify that this constant is defined and correctly represents a meaningful message for the context where it's used.
Verification successful: The constant `DEFAULT_NLU_FALLBACK_RESPONSE` is defined in `kairon/shared/data/constant.py` with the value "I'm sorry, I didn't quite understand that. Could you rephrase?". This confirms that the constant is indeed defined and correctly represents a meaningful message for the context where it's used, aligning with good practices for handling fallback responses in NLU contexts.
Scripts executed for the analysis:
```bash
#!/bin/bash
# Verify the definition and value of DEFAULT_NLU_FALLBACK_RESPONSE
rg "DEFAULT_NLU_FALLBACK_RESPONSE ="
```
Length of output: 166
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [4-4]
The test `test_add_prompt_action_with_invalid_similarity_threshold` demonstrates good practice by mocking dependencies (`MongoProcessor.get_bot_settings`) and focusing on the behavior under test, which is the validation of the `similarity_threshold`. The assertion checks for a specific error message, ensuring that the validation logic is correctly implemented.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [5-5]
The assertion for checking the error message `'similarity_threshold should be within 0.3 and 1'` is specific and ensures that the validation logic for `similarity_threshold` is correctly implemented. However, it's crucial to also verify that the error structure (`loc`, `msg`, `type`) is consistent across different validation errors for maintainability.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [6-6]
The test `test_add_prompt_action_with_invalid_top_results` is similar to the previous one but focuses on validating `top_results`. It's a good practice to have separate tests for different validation rules, ensuring that each rule can be independently verified and maintained.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [7-7]
Changing the `source` of a prompt from `static` to `history` is a significant modification that affects how the prompt's data is interpreted and used. Ensure that this change aligns with the intended behavior and that any downstream effects are considered, especially in how history data is fetched and utilized in prompts.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [8-8]
Increasing `num_bot_responses` from a default or previously set value to 10 might have implications on the bot's performance and user experience. Ensure that this change is tested for its impact on response times and that it aligns with user expectations for the number of responses.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [10-10]
The renaming of a prompt type to `Similarity_analytical Prompt` suggests a more specific use case or functionality. Ensure that this renaming is reflected in all relevant parts of the application and that the new name accurately describes the prompt's purpose.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [11-11]
The use of `client.put` for updating prompt actions indicates that this test is for an update scenario. It's crucial to ensure that the endpoint correctly handles updates, especially with respect to merging or replacing existing configurations.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [12-12]
Setting `failure_message` to a custom value like `"updated_failure_message"` is a good way to test the update functionality. However, ensure that the custom message is meaningful and tested for its appearance in relevant user-facing interfaces.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [13-13]
The test `test_add_prompt_action_with_empty_collection_name` specifically checks for the validation of an empty `data` field in a `Similarity Prompt`. This is a good practice to ensure that essential fields are not left empty. However, ensure that the error message `'Collection is required for bot content prompts!'` is consistent and clear to the end-users or developers.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [14-14]
The test `test_add_prompt_action_with_empty_data_for_static_prompt` focuses on the validation of empty `data` for static prompts. This is important for maintaining the integrity of prompt configurations. The specific error message `'data is required for static prompts!'` helps in quickly identifying the issue.
- 1339-1344: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [15-15]
The test `test_add_prompt_action_with_multiple_history_source_prompts` checks for the constraint of having only one history source prompt. This is a critical validation to prevent conflicts or unexpected behavior when multiple history sources are used. The assertion for the specific error message `'Only one history source can be present!'` is a good practice.
tests/unit_test/data_processor/data_processor_test.py (6)
- 277-299: This test case correctly checks for a `ValidationError` when a collection is required for bot content prompts but is not provided. It's a good practice to include such validation tests to ensure the system behaves as expected under various configurations. No changes are needed here, but ensure similar validation checks are consistently applied across all relevant test cases.
- 338-338: The validation for `num_bot_responses` being greater than 5 is tested here. It's crucial to test boundary conditions like these to ensure the system's robustness. Consider adding similar tests for other numerical parameters to cover their boundary conditions as well.
- 362-362: Testing for the requirement of a 'static' source for system prompts is important for ensuring prompt configurations are correctly validated. This test case effectively captures that requirement. No changes are suggested, but continue to ensure comprehensive coverage for all such configuration rules.
- 410-410: Validating that prompt names cannot be empty is crucial for data integrity. This test case effectively checks for that condition. It's recommended to also include tests for other required fields to ensure comprehensive validation coverage.
- 431-431: Testing for the requirement of 'data' in static prompts is important. This test case correctly identifies a missing 'data' scenario. Similar tests for other required fields in different prompt types would further strengthen the validation logic's coverage.
- 473-473: Ensuring the presence of a system prompt is critical for certain configurations. This test case effectively checks for scenarios where a system prompt is missing. Including additional context or comments about why a system prompt is required could improve the test's readability.
```python
class PromptHyperparameters(BaseModel):
    top_results: int = 10
    similarity_threshold: float = 0.70

    @root_validator
    def check(cls, values):
        if not 0.3 <= values.get('similarity_threshold') <= 1:
            raise ValueError("similarity_threshold should be within 0.3 and 1")
        if values.get('top_results') > 30:
            raise ValueError("top_results should not be greater than 30")
        return values
```
The `PromptHyperparameters` class introduces default values and validation logic for `top_results` and `similarity_threshold`. The validation logic ensures that `similarity_threshold` is within the range of 0.3 to 1 and `top_results` does not exceed 30. This is a good practice as it enforces constraints on the input values, enhancing data integrity and preventing potential errors during runtime. However, consider adding a lower bound check for `top_results` to ensure it's a positive value.

```diff
+        if values.get('top_results') <= 0:
+            raise ValueError("top_results should be a positive number")
```
Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
```python
class PromptHyperparameters(BaseModel):
    top_results: int = 10
    similarity_threshold: float = 0.70

    @root_validator
    def check(cls, values):
        if not 0.3 <= values.get('similarity_threshold') <= 1:
            raise ValueError("similarity_threshold should be within 0.3 and 1")
        if values.get('top_results') > 30:
            raise ValueError("top_results should not be greater than 30")
        if values.get('top_results') <= 0:
            raise ValueError("top_results should be a positive number")
        return values
```
```python
        error_list = []
        system_prompt_count = 0
        history_prompt_count = 0
        bot_content_prompt_count = 0
        for prompt in llm_prompts:
            if prompt.get('hyperparameters') is not None:
                hyperparameters = prompt.get('hyperparameters')
                for key, value in hyperparameters.items():
                    if key == 'similarity_threshold':
                        if not (0.3 <= value <= 1.0) or not (
                                isinstance(key, float) or isinstance(key, int)):
                            error_list.append(
                                "similarity_threshold should be within 0.3 and 1.0 and of type int or float!")
                    if key == 'top_results' and (value > 30 or not isinstance(key, int)):
                        error_list.append("top_results should not be greater than 30 and of type int!")
            if prompt.get('type') == 'system':
                system_prompt_count += 1
            elif prompt.get('source') == 'history':
                history_prompt_count += 1
            elif prompt.get('source') == 'bot_content':
                bot_content_prompt_count += 1
            if prompt.get('type') not in ['user', 'system', 'query']:
                error_list.append('Invalid prompt type')
            if prompt.get('source') not in ['static', 'slot', 'action', 'history', 'bot_content']:
```
📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [693-732]
The validation logic for `similarity_threshold` and `top_results` within `__validate_llm_prompts` has been updated, and validation for `collection` has been added. However, there are several issues and improvements to consider:
- The check for `similarity_threshold` mistakenly uses `key` instead of `value` when checking the type, which will always result in the condition being false. This should be corrected to use `value`.
- Similarly, for `top_results`, the condition mistakenly checks the type of `key` instead of `value`.
- The addition of `collection` validation for bot content prompts is a good practice, ensuring that a collection name is provided when the source is `bot_content`.
- It's recommended to refactor the method to reduce complexity and improve readability. Consider breaking down the validation into smaller, more focused functions or methods, as sketched below.

```diff
-                    if not (0.3 <= value <= 1.0) or not (isinstance(key, float) or isinstance(key, int)):
+                    if not (0.3 <= value <= 1.0) or not (isinstance(value, float) or isinstance(value, int)):
-                    if key == 'top_results' and (value > 30 or not isinstance(key, int)):
+                    if key == 'top_results' and (value > 30 or not isinstance(value, int)):
```

Consider refactoring the validation logic to improve readability and maintainability. Breaking down the method into smaller validation functions could enhance clarity and testability.
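One way such a breakdown might look; a sketch only, reusing the error strings from the hunk above:

```python
def _validate_similarity_threshold(value, error_list):
    if not isinstance(value, (int, float)) or not 0.3 <= value <= 1.0:
        error_list.append("similarity_threshold should be within 0.3 and 1.0 and of type int or float!")

def _validate_top_results(value, error_list):
    if not isinstance(value, int) or value > 30:
        error_list.append("top_results should not be greater than 30 and of type int!")

def _validate_hyperparameters(hyperparameters, error_list):
    # Dispatch each known hyperparameter to a focused validator; note these
    # check the *value*, fixing the key/value mix-up flagged above.
    validators = {'similarity_threshold': _validate_similarity_threshold,
                  'top_results': _validate_top_results}
    for key, value in (hyperparameters or {}).items():
        if key in validators:
            validators[key](value, error_list)
```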
```python
                           'Only one system prompt can be present', 'Data must contain slot name',
                           'Only one system prompt can be present', 'Only one system prompt can be present',
                           'Only one system prompt can be present', 'Only one history source can be present']
        del error_summary["prompt_actions"][21]
```
The deletion of an element from `error_summary["prompt_actions"]` at index `21` immediately after its assertion might indicate a workaround for a specific test scenario. This approach could potentially mask issues in the validation logic by artificially adjusting the test's expected outcomes. Consider revising the test setup or validation logic to avoid manipulating the error summary directly in the test.
```python
        assert error_summary['prompt_actions'] == ['top_results should not be greater than 30 and of type int!',
                                                   'similarity_threshold should be within 0.3 and 1.0 and of type int or float!',
                                                   'Collection is required for bot content prompts!',
                                                   'System prompt is required', 'Query prompt must have static source',
                                                   'Name cannot be empty', 'System prompt is required',
                                                   'num_bot_responses should not be greater than 5 and of type int: prompt_action_invalid_num_bot_responses',
                                                   'Collection is required for bot content prompts!',
                                                   'data field in prompts should of type string.',
                                                   'data is required for static prompts',
                                                   'Temperature must be between 0.0 and 2.0!',
                                                   'max_tokens must be between 5 and 4096!',
                                                   'top_p must be between 0.0 and 1.0!', 'n must be between 1 and 5!',
                                                   'presence_penality must be between -2.0 and 2.0!',
                                                   'frequency_penalty must be between -2.0 and 2.0!',
                                                   'logit_bias must be a dictionary!',
                                                   'System prompt must have static source',
                                                   'Collection is required for bot content prompts!',
                                                   'Collection is required for bot content prompts!',
                                                   'Temperature must be between 0.0 and 2.0!',
                                                   'max_tokens must be between 5 and 4096!',
                                                   'top_p must be between 0.0 and 1.0!', 'n must be between 1 and 5!',
                                                   'Stop must be None, a string, an integer, or an array of 4 or fewer strings or integers.',
                                                   'presence_penality must be between -2.0 and 2.0!',
                                                   'frequency_penalty must be between -2.0 and 2.0!',
                                                   'logit_bias must be a dictionary!',
                                                   'Duplicate action found: test_add_prompt_action_one',
                                                   'Invalid action configuration format. Dictionary expected.',
                                                   'Temperature must be between 0.0 and 2.0!',
                                                   'max_tokens must be between 5 and 4096!',
                                                   'top_p must be between 0.0 and 1.0!', 'n must be between 1 and 5!',
                                                   'Stop must be None, a string, an integer, or an array of 4 or fewer strings or integers.',
                                                   'presence_penality must be between -2.0 and 2.0!',
                                                   'frequency_penalty must be between -2.0 and 2.0!',
                                                   'logit_bias must be a dictionary!',
                                                   'Only one system prompt can be present', 'Invalid prompt type',
                                                   'Invalid prompt source', 'Only one system prompt can be present',
                                                   'Invalid prompt type', 'Invalid prompt source',
                                                   'type in LLM Prompts should be of type string.',
                                                   'source in LLM Prompts should be of type string.',
                                                   'Instructions in LLM Prompts should be of type string.',
                                                   'Only one system prompt can be present',
                                                   'Data must contain action name',
                                                   'Only one system prompt can be present',
                                                   'Data must contain slot name',
                                                   'Only one system prompt can be present',
                                                   'Only one system prompt can be present',
                                                   'Only one system prompt can be present',
                                                   'Only one history source can be present']
```
The assertions for the remaining error messages in `error_summary['prompt_actions']` cover a wide range of validation rules for prompt actions, including limits on numerical values, requirements for specific fields, and checks for duplicates. These assertions are crucial for ensuring the robustness of the validation logic. However, the large number of hardcoded assertions makes this test somewhat brittle and harder to maintain. Consider refactoring this test to iterate over expected error messages or categorizing these validations to simplify the test logic and improve maintainability. One possible shape for such a helper is sketched below.
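For instance, a sketch of a helper that compares error multisets instead of one long, order-sensitive list (the helper name is hypothetical):

```python
from collections import Counter

def assert_prompt_action_errors(actual, expected):
    # Order-insensitive comparison with a readable diff on failure.
    missing = Counter(expected) - Counter(actual)
    unexpected = Counter(actual) - Counter(expected)
    assert not missing and not unexpected, (
        f"missing errors: {dict(missing)}; unexpected errors: {dict(unexpected)}")
```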
```diff
@@ -155,7 +155,7 @@ def test_add_prompt_action_with_gpt_feature_disabled(self):
         bot = 'test'
         user = 'test_user'
         request = {"system_prompt": DEFAULT_SYSTEM_PROMPT, "context_prompt": DEFAULT_CONTEXT_PROMPT,
-                   "failure_message": DEFAULT_NLU_FALLBACK_RESPONSE, "top_results": 10, "similarity_threshold": 0.70,
+                   "failure_message": DEFAULT_NLU_FALLBACK_RESPONSE,
```
The test case checks for an exception when the FAQ feature is disabled, which is good for ensuring feature flags work as expected. However, it's important to also verify that the exception is thrown under the correct conditions and not for other reasons. Adding more context or a brief explanation in the test case about why this specific exception is expected could improve clarity.
```python
                     'type': 'user', 'source': 'bot_content', 'data': 'python', 'is_enabled': True,
                     'hyperparameters': {"top_results": 10, "similarity_threshold": 0.70}},
```
The repetition of adding `data` and `hyperparameters` fields to the `Similarity Prompt` entity in this hunk reinforces the need for a shared fixture or helper function to manage repeated test setup code.
Refer to previous comments about exploring shared fixtures or helper functions for repeated test setup code.
```diff
@@ -10408,7 +10418,7 @@
                      'is_enabled': True},
                     {'name': 'Similarity Prompt',
                      'instructions': 'Answer question based on the context above, if answer is not in the context go check previous logs.',
-                     'type': 'user', 'source': 'bot_content', 'is_enabled': True},
+                     'type': 'user', 'source': 'bot_content', 'data': 'python', 'is_enabled': True},
```
The addition of a new `Language Prompt` alongside the `Similarity Prompt` entity expands the test's scope to cover more prompt configurations. Ensure that the test assertions are expanded to validate the behavior and configuration of the new `Language Prompt`.
Add assertions to validate the configuration and behavior of the new `Language Prompt` entity in the test.
```diff
@@ -10556,7 +10566,7 @@
                      'is_enabled': True},
                     {'name': 'Similarity Prompt',
                      'instructions': 'Answer question based on the context above, if answer is not in the context go check previous logs.',
-                     'type': 'user', 'source': 'bot_content', 'is_enabled': True},
+                     'type': 'user', 'source': 'bot_content', 'data': 'python', 'is_enabled': True},
```
The repetition of adding a new `Language Prompt` alongside the `Similarity Prompt` entity in this hunk reinforces the need for comprehensive test coverage that includes validation for all new prompt configurations introduced in the PR.
Ensure that test assertions cover all new prompt configurations introduced in the PR.
```diff
@@ -10632,7 +10642,7 @@
                      'is_enabled': True},
                     {'name': 'Similarity Prompt',
                      'instructions': 'Answer question based on the context above, if answer is not in the context go check previous logs.',
-                     'type': 'user', 'source': 'bot_content', 'is_enabled': True},
+                     'type': 'user', 'source': 'bot_content', 'data': 'python', 'is_enabled': True},
```
The addition of a `mock_completion_for_answer` function suggests an extension of the test's functionality to include more detailed behavior validation. Ensure that this function is used effectively in the test to validate the completion process for answers.
Utilize the `mock_completion_for_answer` function in test assertions to validate the completion process for answers in detail.
```diff
         collection='python', bot="test_embed_faq_predict", user="test").save()

         query = "What kind of language is python?"
         k_faq_action_config = {
             "system_prompt": "You are a personal assistant. Answer the question according to the below context",
             "context_prompt": "Based on below context answer question, if answer not in context check previous logs.",
-            "top_results": 10, "similarity_threshold": 0.70, 'use_similarity_prompt': True,
-            'similarity_prompt_name': 'Similarity Prompt',
-            'similarity_prompt_instructions': 'Answer according to this context.', "enable_response_cache": True}
+            "similarity_prompt": [{"top_results": 10, "similarity_threshold": 0.70, 'use_similarity_prompt': True,
+                                   'similarity_prompt_name': 'Similarity Prompt',
+                                   'similarity_prompt_instructions': 'Answer according to this context.',
+                                   "collection": 'python'}]}
```
The test method `test_gpt3_faq_embedding_predict_exact_match` does not appear to utilize the `mock_llm_request` mock object within the test body. Ensure that the mock is correctly applied to simulate the behavior of external calls and that the test accurately reflects the scenario it is intended to cover.
Consider using the `mock_llm_request` mock to simulate external calls within the test method to ensure that the test accurately reflects the intended scenario.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (1)
- tests/integration_test/services_test.py (36 hunks)
Additional comments: 10
tests/integration_test/services_test.py (10)
- 2257-2267: The introduction of the `monkeypatch` parameter and the `_mock_get_bot_settings` function in `test_add_prompt_action_with_invalid_similarity_threshold` is a good practice for isolating tests from external dependencies. However, ensure that the `monkeypatch` is correctly applied and that the mocked settings align with the test's expectations. A sketch of this pattern follows the list.
- 2285-2286: The assertion for the error message `'similarity_threshold should be within 0.3 and 1'` correctly validates the updated validation logic for prompt actions. This change aligns with the PR objectives to refine validation logic.
- 2292-2302: Similar to the previous comment, the introduction of the `monkeypatch` parameter and the `_mock_get_bot_settings` function in `test_add_prompt_action_with_invalid_top_results` is a good practice. Ensure the mocked settings are appropriate for the test scenario.
- 2321-2321: The assertion for the error message `'top_results should not be greater than 30'` is a good validation of the updated logic. It ensures that the `top_results` parameter is within the expected range, aligning with the PR's goal to refine validation logic.
- 2508-2508: The assertion checking for the error message `'Collection is required for bot content prompts!'` in `test_add_prompt_action_with_empty_collection_name` is crucial for ensuring that bot content prompts have the necessary collection data. This aligns with the PR's objectives to streamline configuration and improve validation logic.
- 2538-2538: The assertion for the error message `'data is required for static prompts!'` in `test_add_prompt_action_with_empty_data_for_static_prompt` ensures that static prompts have the necessary data. This validation is in line with the PR's objectives to refine prompt action configuration and validation.
- 2567-2567: The assertion checking for the error message `'Only one history source can be present!'` in `test_add_prompt_action_with_multiple_history_source_prompts` is important for ensuring that prompt actions do not have multiple history sources, which aligns with the PR's goal to streamline prompt configurations.
- 2727-2727: The assertion for the error message `'similarity_threshold should be within 0.3 and 1'` in the update prompt action test indicates that the validation logic for hyperparameters is correctly applied. This is consistent with the PR's objectives to update validation logic.
- 2755-2755: The assertion for the error message `'top_results should not be greater than 30'` in the update prompt action test validates the updated logic for hyperparameters, ensuring that `top_results` is within the expected range. This aligns with the PR's goal to refine validation logic.
- 2784-2784: The assertion checking for the error message `'num_bot_responses should not be greater than 5'` in the update prompt action test ensures that the number of bot responses is within the expected limit. This validation aligns with the PR's objectives to streamline and refine prompt action configurations.
```diff
                      'instructions': 'Answer question based on the context above, if answer is not in the context go check previous logs.',
                      'type': 'user', 'source': 'bot_content', 'is_enabled': True},
                     {'name': 'Query Prompt',
                      'data': 'A programming language is a system of notation for writing computer programs.[1] Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language.',
                      'instructions': 'Answer according to the context', 'type': 'query',
                      'source': 'bot_content', 'is_enabled': False},
                 ],
-                "failure_message": "updated_failure_message", "top_results": 9, "similarity_threshold": 0.50,
+                "failure_message": "updated_failure_message",
```
The introduction of `set_slots` with evaluation types (`expression`, `script`) in the update prompt action test indicates an advanced feature for handling prompt responses. It's important to ensure that the evaluation logic is secure, especially when executing scripts, to prevent injection attacks or unauthorized access.
Consider implementing additional security measures or validations for script evaluations to prevent potential security vulnerabilities.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (1)
- tests/integration_test/chat_service_test.py (1 hunks)
Additional comments: 1
tests/integration_test/chat_service_test.py (1)
- 1634-1634: The `responses.reset()` call ensures that the responses are cleared before executing the test, which is a good practice to avoid test interference. This change helps maintain test isolation and ensures that each test runs in a clean state.
```python
@responses.activate
def test_whatsapp_valid_attachment_message_request():
    responses.reset()

    def _mock_validate_hub_signature(*args, **kwargs):
        return True

    responses.add(
```
📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1-1633]
The test functions provided cover a wide range of scenarios involving chat services, including live agent handoff, chat history retrieval, and handling Instagram comments. It's crucial to ensure that all external dependencies are mocked correctly to make the tests reliable. Additionally, consider adding assertions to verify the correctness of the responses and the state after the test execution to ensure the functionality works as expected.
Consider adding more assertions in the test functions to verify the correctness of the responses and the state after the test execution. This will help ensure that the functionality works as expected and improve the reliability of the tests.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- tests/integration_test/chat_service_test.py (2 hunks)
Files skipped from review as they are similar to previous changes (1)
- tests/integration_test/chat_service_test.py
```python
    def check(cls, values):
        if not 0.3 <= values.get('similarity_threshold') <= 1:
            raise ValueError("similarity_threshold should be within 0.3 and 1")
        if values.get('top_results') > 30:
```
Will `top_results` always be present in `values`?
Summary by CodeRabbit
- Introduced a `PromptHyperparameters` class to streamline hyperparameter management.