Better names #357

base: main
Conversation
Walkthrough

Automatic, descriptive name generation for task run configs and eval configs was added, ensuring meaningful defaults when names are omitted. Helper functions map IDs to user-friendly names, enforce naming rules, and handle truncation. Associated tests were introduced for the new naming logic and API integration. UI and constant updates were also made.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant API
    participant NameGen
    participant DB
    User->>API: POST /create_task_run_config (no name)
    API->>NameGen: generate_task_run_config_name(model_id, prompt_id, task)
    NameGen->>DB: Look up model and prompt display names
    NameGen-->>API: Return sanitized, descriptive name
    API->>DB: Create config with generated name
    API-->>User: Respond with config (including generated name)
```

```mermaid
sequenceDiagram
    participant User
    participant API
    participant NameGen
    participant DB
    User->>API: POST /create_eval_config (no name)
    API->>NameGen: generate_eval_config_name(eval_type, model_id)
    NameGen->>DB: Look up model display name
    NameGen-->>API: Return sanitized, descriptive name
    API->>DB: Create eval config with generated name
    API-->>User: Respond with eval config (including generated name)
```
Actionable comments posted: 5
🧹 Nitpick comments (1)
app/desktop/studio_server/eval_api.py (1)
2-2: Remove unused imports flagged by static analysis.

The `re` import and `NAME_REGEX` constant are not directly used in this file. The regex validation is handled by the imported `string_to_valid_name` function.

```diff
-import re
-    NAME_REGEX,
```

Also applies to: 20-20
🧰 Tools
🪛 Ruff (0.11.9)
2-2: `re` imported but unused

Remove unused import: `re`

(F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- app/desktop/studio_server/eval_api.py (6 hunks)
- app/desktop/studio_server/test_eval_api.py (1 hunks)
- app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (1 hunks)
- libs/core/kiln_ai/datamodel/basemodel.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/desktop/studio_server/eval_api.py (4)
libs/core/kiln_ai/datamodel/basemodel.py (1)
- string_to_valid_name (66-72)

libs/core/kiln_ai/datamodel/prompt_id.py (1)
- PromptGenerators (8-16)

libs/core/kiln_ai/datamodel/task.py (2)
- Task (171-240)
- prompts (227-228)

libs/core/kiln_ai/datamodel/eval.py (1)
- EvalConfigType (38-40)
🪛 Ruff (0.11.9)
app/desktop/studio_server/test_eval_api.py
1919-1919: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1892
(F811)
2029-2029: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1919
(F811)
2056-2056: Redefinition of unused test_create_eval_config_uses_descriptive_name from line 1946
(F811)
2084-2084: Redefinition of unused test_create_task_run_config_respects_provided_name from line 1974
(F811)
2111-2111: Redefinition of unused test_create_eval_config_respects_provided_name from line 2001
(F811)
app/desktop/studio_server/eval_api.py
2-2: re imported but unused
Remove unused import: re
(F401)
20-20: kiln_ai.datamodel.basemodel.NAME_REGEX imported but unused
Remove unused import: kiln_ai.datamodel.basemodel.NAME_REGEX
(F401)
🪛 Pylint (3.3.7)
app/desktop/studio_server/test_eval_api.py
[error] 1919-1919: function already defined line 1892
(E0102)
[error] 2029-2029: function already defined line 1892
(E0102)
[error] 2056-2056: function already defined line 1946
(E0102)
[error] 2084-2084: function already defined line 1974
(E0102)
[error] 2111-2111: function already defined line 2001
(E0102)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Build Desktop Apps (macos-13)
- GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (12)
libs/core/kiln_ai/datamodel/basemodel.py (1)
57-57: LGTM! Good practice introducing a named constant.

Excellent refactoring to eliminate magic numbers and centralize the maximum length constraint for short names.
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (1)
364-364: LGTM! Improved accessibility and clarity.

Changing from an icon-only button to a text-labeled "Clear" button improves usability and accessibility while maintaining the same functionality.
app/desktop/studio_server/test_eval_api.py (1)
1728-2164: Excellent test coverage for the new naming functionality.

The new tests provide comprehensive coverage of the descriptive naming feature, including:
- Unit tests for name generation functions with various scenarios
- Edge case handling (long names, invalid characters, empty names)
- Integration tests verifying API behavior
- Validation that the prompt generator mapping stays in sync with enum values
The test structure is well-organized and follows good testing practices with clear assertions and descriptive test names.
🧰 Tools
🪛 Ruff (0.11.9)
1919-1919: Redefinition of unused `test_create_task_run_config_uses_descriptive_name` from line 1892 (F811)

2029-2029: Redefinition of unused `test_create_task_run_config_uses_descriptive_name` from line 1919 (F811)

2056-2056: Redefinition of unused `test_create_eval_config_uses_descriptive_name` from line 1946 (F811)

2084-2084: Redefinition of unused `test_create_task_run_config_respects_provided_name` from line 1974 (F811)

2111-2111: Redefinition of unused `test_create_eval_config_respects_provided_name` from line 2001 (F811)
🪛 Pylint (3.3.7)
[error] 1919-1919: function already defined line 1892
(E0102)
[error] 2029-2029: function already defined line 1892
(E0102)
[error] 2056-2056: function already defined line 1946
(E0102)
[error] 2084-2084: function already defined line 1974
(E0102)
[error] 2111-2111: function already defined line 2001
(E0102)
app/desktop/studio_server/eval_api.py (9)
361-365: LGTM! Clean implementation of automatic name generation.

The fallback logic is well implemented: when no explicit name is provided, it generates a descriptive name using the model and prompt information. This enhances user experience by providing meaningful defaults.
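The fallback described above can be sketched as follows; the lookup tables and function bodies here are illustrative stand-ins, not the actual Kiln helpers:

```python
from typing import Optional

# Hypothetical display-name tables standing in for the real lookup helpers.
_MODEL_NAMES = {"gpt-4o": "GPT 4o"}
_PROMPT_NAMES = {"simple_prompt_builder": "Basic Prompt"}


def generate_task_run_config_name(model_id: str, prompt_id: str) -> str:
    # Resolve friendly names, falling back to the raw IDs when unknown.
    model = _MODEL_NAMES.get(model_id, model_id)
    prompt = _PROMPT_NAMES.get(prompt_id, prompt_id)
    return f"{model} - {prompt}"


def create_task_run_config(
    name: Optional[str], model_id: str, prompt_id: str
) -> dict:
    # Only generate a descriptive name when the caller omitted one.
    if name is None:
        name = generate_task_run_config_name(model_id, prompt_id)
    return {"name": name, "model_name": model_id, "prompt_id": prompt_id}
```

An explicit name always wins; generation is purely a default, which matches the "respects provided name" tests later in this review.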
416-418: LGTM! Consistent name generation pattern.

The eval config name generation follows the same pattern as task run config, maintaining consistency across the API.

869-890: Well-documented and robust name generation function.

The function properly handles the model and prompt name resolution with clear documentation. Good use of helper functions for separation of concerns.

893-913: Consistent implementation for eval config names.

The function follows the same pattern as task run config name generation, ensuring consistency across the API.

928-945: Robust prompt name resolution with good fallback handling.

The function properly handles different prompt ID formats (saved prompts, fine-tune prompts, and generators) with appropriate fallbacks. The logic correctly extracts IDs and looks up display names.
948-954: Complete coverage of EvalConfigType enum values.

The mapping correctly covers both `g_eval` and `llm_as_judge` evaluation types with user-friendly display names.
957-965: Efficient model name lookup with appropriate fallback.

The function iterates through built-in models to find friendly names, with a sensible fallback to the original model ID. For typical use cases, this linear search should perform adequately.
968-981: Comprehensive name sanitization with edge case handling.

The function properly uses the existing `string_to_valid_name` utility, handles length constraints with truncation, and ensures a non-empty result. The fallback to "Config" is a reasonable default.
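A minimal sketch of such a sanitizer, assuming a 32-character limit and a simple character whitelist (the real code delegates to `string_to_valid_name`; the regex here is only illustrative):

```python
import re

MAX_SHORT_NAME_LENGTH = 32  # assumed limit, per the reviewed constant


def sanitize_name(raw: str, fallback: str = "Config") -> str:
    # Drop characters outside a conservative whitelist, then collapse whitespace.
    cleaned = re.sub(r"[^\w\s-]", "", raw)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Enforce the short-name length limit, trimming any trailing separator.
    cleaned = cleaned[:MAX_SHORT_NAME_LENGTH].strip()
    # Guarantee a non-empty result.
    return cleaned or fallback
```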
916-925: ✅ Verification successful
Verify all PromptGenerators enum values are covered.
The mapping dictionary should include all values from the PromptGenerators enum to ensure comprehensive coverage.
🏁 Script executed:
```bash
#!/bin/bash
# Description: Check if all PromptGenerators enum values are covered in the mapping dictionary
# Expected: All enum values should have corresponding entries in _prompt_generator_names

# Get all PromptGenerators enum values
echo "PromptGenerators enum values:"
ast-grep --pattern 'class PromptGenerators($$$) { $$$ $A = $B $$$ }' | grep -E '^\s*[A-Z_]+ = '

echo -e "\n_prompt_generator_names dictionary keys:"
# Get all keys from the mapping dictionary
rg -A 10 '_prompt_generator_names = {' app/desktop/studio_server/eval_api.py
```

Length of output: 466
Let’s extract the actual enum members and the dict keys via AST to get an accurate comparison:
This will show exactly which `PromptGenerators` members aren't covered in `_prompt_generator_names`.
🏁 Script executed:
```python
#!/usr/bin/env python3
import ast

# Parse the file
path = 'app/desktop/studio_server/eval_api.py'
with open(path) as f:
    tree = ast.parse(f.read(), path)

enum_vals = []
map_keys = []

# Walk the AST
for node in ast.walk(tree):
    # Collect PromptGenerators enum attributes
    if isinstance(node, ast.ClassDef) and node.name == 'PromptGenerators':
        for stmt in node.body:
            if isinstance(stmt, ast.Assign):
                for t in stmt.targets:
                    if isinstance(t, ast.Name):
                        enum_vals.append(t.id)
    # Find the mapping dictionary
    if isinstance(node, ast.Assign):
        for t in node.targets:
            if isinstance(t, ast.Name) and t.id == '_prompt_generator_names':
                if isinstance(node.value, ast.Dict):
                    for key in node.value.keys:
                        # Handle keys like PromptGenerators.X.value
                        if isinstance(key, ast.Attribute):
                            map_keys.append(key.attr)
                        # Handle literal keys
                        elif isinstance(key, ast.Constant):
                            map_keys.append(key.value)

# Print results
print("Enum values:  ", enum_vals)
print("Mapped keys:  ", map_keys)
print("Missing in map:", sorted(set(enum_vals) - set(map_keys)))
```

Length of output: 456
Let’s run a proper in-line Python snippet under Bash to extract the enum members versus map keys:
🏁 Script executed:
```bash
#!/usr/bin/env bash
python3 - << 'EOF'
import ast, sys

path = 'app/desktop/studio_server/eval_api.py'
with open(path, 'r') as f:
    tree = ast.parse(f.read(), path)

enum_vals = []
map_keys = []

for node in ast.walk(tree):
    # PromptGenerators enum members
    if isinstance(node, ast.ClassDef) and node.name == 'PromptGenerators':
        for stmt in node.body:
            if isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        enum_vals.append(target.id)
    # _prompt_generator_names dict keys
    if isinstance(node, ast.Assign):
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == '_prompt_generator_names':
                if isinstance(node.value, ast.Dict):
                    for key in node.value.keys:
                        # e.g., PromptGenerators.X.value
                        if isinstance(key, ast.Attribute):
                            map_keys.append(key.attr)
                        elif isinstance(key, ast.Constant):
                            map_keys.append(key.value)

print("Enum values:  ", enum_vals)
print("Mapped keys:  ", map_keys)
print("Missing in map:", sorted(set(enum_vals) - set(map_keys)))
EOF
```

Length of output: 138
Let’s locate the enum definition and pull out its members alongside the mapping snippet for a direct comparison:
🏁 Script executed:
```bash
#!/usr/bin/env bash
# 1) Find the file where PromptGenerators is defined
enum_file=$(rg -l "class PromptGenerators" . | head -n1)
echo "Enum file: $enum_file"

# 2) List all enum members (assignments) in that class
echo -e "\nEnum members:"
sed -n '1,200p' "$enum_file" | grep -E '^\s*[A-Z_]+ = '

# 3) Show the mapping dictionary in eval_api.py around lines 900-960
echo -e "\nMapping snippet in eval_api.py:"
sed -n '900,960p' app/desktop/studio_server/eval_api.py | sed -n '1,50p'
```

Length of output: 2747
All PromptGenerators enum values are covered in `_prompt_generator_names`.

I’ve confirmed that each of the eight members of `PromptGenerators` (SIMPLE, SHORT, FEW_SHOT, MULTI_SHOT, REPAIRS, SIMPLE_CHAIN_OF_THOUGHT, FEW_SHOT_CHAIN_OF_THOUGHT, MULTI_SHOT_CHAIN_OF_THOUGHT) has a corresponding entry in the mapping dictionary. No missing entries found.
```diff
 SHORT_NAME_FIELD = Field(
     min_length=1,
-    max_length=32,
+    max_length=MAX_SHORT_NAME_LENGTH,
     pattern=NAME_REGEX,
     description="A name for this entity",
 )
```
💡 Verification agent
🧩 Analysis chain
Verify consistency across the codebase.
The update to use the constant is correct. Ensure that other places in the codebase that might use similar length constraints are updated to use this centralized constant.
🏁 Script executed:
#!/bin/bash
# Description: Search for other hardcoded values of 32 that might be related to name length constraints
# Expected: Find any other locations that should use the new constant
# Search for hardcoded 32 values in relevant files
rg -A 3 -B 3 "32" --type py | grep -E "(name|length|max)" -i
# Search specifically in validation/field definitions
ast-grep --pattern 'max_length=$_'Length of output: 4800
Replace hardcoded slice length with MAX_SHORT_NAME_LENGTH
The only remaining literal 32 in the basemodel is in the path‐formatter slice. Update it to use the centralized constant, and review any tests or helpers that still reference 32 directly.
• libs/core/kiln_ai/datamodel/basemodel.py
```diff
 # before
-path = f"{path} - {name[:32]}"
 # after
+path = f"{path} - {name[:MAX_SHORT_NAME_LENGTH]}"
```

- Review tests in app/desktop/studio_server/test_eval_api.py (and any others) that assert truncation to 32 characters; consider importing and using MAX_SHORT_NAME_LENGTH there for full consistency.
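For illustration, the constant-driven truncation the bullet suggests testing might look like this (the value 32 is assumed; real tests would import `MAX_SHORT_NAME_LENGTH` from `kiln_ai.datamodel.basemodel`):

```python
MAX_SHORT_NAME_LENGTH = 32  # assumed; import the real constant in actual tests


def format_path(path: str, name: str) -> str:
    # Mirrors the reviewed path formatter: append at most
    # MAX_SHORT_NAME_LENGTH characters of the name.
    return f"{path} - {name[:MAX_SHORT_NAME_LENGTH]}"
```

Asserting against the constant rather than the literal 32 means such tests stay correct if the limit ever changes.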
🤖 Prompt for AI Agents
In libs/core/kiln_ai/datamodel/basemodel.py around lines 58 to 63, replace the
hardcoded slice length 32 in the path-formatter slice with the constant
MAX_SHORT_NAME_LENGTH to maintain consistency. Additionally, review and update
any tests or helper functions, such as those in
app/desktop/studio_server/test_eval_api.py, that assert truncation or length
constraints using the literal 32, and modify them to import and use
MAX_SHORT_NAME_LENGTH instead.
```python
async def test_create_task_run_config_uses_descriptive_name(
    client, mock_task_from_id, mock_task
):
    """Test that task run config creation uses descriptive naming when name is not provided."""
    mock_task_from_id.return_value = mock_task

    response = client.post(
        "/api/projects/project1/tasks/task1/task_run_config",
        json={
            # No name provided - should generate descriptive name
            "run_config_properties": {
                "model_name": "gpt-4o",
                "model_provider_name": "openai",
                "prompt_id": "simple_prompt_builder",
                "structured_output_mode": "json_schema",
            },
        },
    )

    assert response.status_code == 200
    result = response.json()

    # Should use descriptive name format: model - prompt
    assert result["name"] == "GPT 4o - Basic Prompt"
```
Critical: Remove duplicate function definitions.
The function test_create_task_run_config_uses_descriptive_name is defined three times (lines 1892, 1919, and 2029). Python will only use the last definition, making the earlier ones unreachable dead code.
Remove the duplicate definitions:
```diff
-@pytest.mark.asyncio
-async def test_create_task_run_config_uses_descriptive_name(
-    client, mock_task_from_id, mock_task
-):
-    """Test that task run config creation uses descriptive naming when name is not provided."""
-    mock_task_from_id.return_value = mock_task
-
-    response = client.post(
-        "/api/projects/project1/tasks/task1/task_run_config",
-        json={
-            # No name provided - should generate descriptive name
-            "run_config_properties": {
-                "model_name": "gpt-4o",
-                "model_provider_name": "openai",
-                "prompt_id": "simple_prompt_builder",
-                "structured_output_mode": "json_schema",
-            },
-        },
-    )
-
-    assert response.status_code == 200
-    result = response.json()
-
-    # Should use descriptive name format: model - prompt
-    assert result["name"] == "GPT 4o - Basic Prompt"
```

And remove the third duplicate starting at line 2029.
Also applies to: 2029-2053
🧰 Tools
🪛 Ruff (0.11.9)
1919-1919: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1892
(F811)
🪛 Pylint (3.3.7)
[error] 1919-1919: function already defined line 1892
(E0102)
🤖 Prompt for AI Agents
In app/desktop/studio_server/test_eval_api.py between lines 1892-1919-2029, the
function test_create_task_run_config_uses_descriptive_name is defined three
times, causing the first two definitions to be overridden and effectively dead
code. Remove the duplicate function definitions at lines 1892-1918 and
2029-2053, keeping only one definition to ensure the test runs correctly without
redundancy.
```python
async def test_create_eval_config_respects_provided_name(
    client, mock_task_from_id, mock_eval, mock_task
):
    """Test that eval config creation respects provided names."""
    mock_task_from_id.return_value = mock_task

    with patch("app.desktop.studio_server.eval_api.eval_from_id") as mock_eval_from_id:
        mock_eval_from_id.return_value = mock_eval

        response = client.post(
            "/api/projects/project1/tasks/task1/eval/eval1/create_eval_config",
            json={
                "name": "Custom Eval Config",  # Explicit name provided
                "type": "llm_as_judge",
                "properties": {"eval_steps": ["step1"]},
                "model_name": "gpt-4o",
                "provider": "openai",
            },
        )

        assert response.status_code == 200
        result = response.json()

        # Should use the provided name, not generated one
        assert result["name"] == "Custom Eval Config"
```
Critical: Remove duplicate function definition.
The function test_create_eval_config_respects_provided_name is defined twice (lines 2001 and 2111). Remove the duplicate.
```diff
-@pytest.mark.asyncio
-async def test_create_eval_config_respects_provided_name(
-    client, mock_task_from_id, mock_eval, mock_task
-):
-    """Test that eval config creation respects provided names."""
-    mock_task_from_id.return_value = mock_task
-
-    with patch("app.desktop.studio_server.eval_api.eval_from_id") as mock_eval_from_id:
-        mock_eval_from_id.return_value = mock_eval
-
-        response = client.post(
-            "/api/projects/project1/tasks/task1/eval/eval1/create_eval_config",
-            json={
-                "name": "Custom Eval Config",  # Explicit name provided
-                "type": "llm_as_judge",
-                "properties": {"eval_steps": ["step1"]},
-                "model_name": "gpt-4o",
-                "provider": "openai",
-            },
-        )
-
-        assert response.status_code == 200
-        result = response.json()
-
-        # Should use the provided name, not generated one
-        assert result["name"] == "Custom Eval Config"
```
🧰 Tools
🪛 Ruff (0.11.9)
2111-2111: Redefinition of unused test_create_eval_config_respects_provided_name from line 2001
(F811)
🪛 Pylint (3.3.7)
[error] 2111-2111: function already defined line 2001
(E0102)
🤖 Prompt for AI Agents
In app/desktop/studio_server/test_eval_api.py between lines 2001 and 2136, the
function test_create_eval_config_respects_provided_name is defined twice. Remove
the duplicate definition starting at line 2111 to keep only one unique
implementation of this test function.
```python
async def test_create_task_run_config_respects_provided_name(
    client, mock_task_from_id, mock_task
):
    """Test that task run config creation respects provided names."""
    mock_task_from_id.return_value = mock_task

    response = client.post(
        "/api/projects/project1/tasks/task1/task_run_config",
        json={
            "name": "Custom Name",  # Explicit name provided
            "run_config_properties": {
                "model_name": "gpt-4o",
                "model_provider_name": "openai",
                "prompt_id": "simple_prompt_builder",
                "structured_output_mode": "json_schema",
            },
        },
    )

    assert response.status_code == 200
    result = response.json()

    # Should use the provided name, not generated one
    assert result["name"] == "Custom Name"
```
Critical: Remove duplicate function definition.
The function test_create_task_run_config_respects_provided_name is defined twice (lines 1974 and 2084). Remove the duplicate.
```diff
-@pytest.mark.asyncio
-async def test_create_task_run_config_respects_provided_name(
-    client, mock_task_from_id, mock_task
-):
-    """Test that task run config creation respects provided names."""
-    mock_task_from_id.return_value = mock_task
-
-    response = client.post(
-        "/api/projects/project1/tasks/task1/task_run_config",
-        json={
-            "name": "Custom Name",  # Explicit name provided
-            "run_config_properties": {
-                "model_name": "gpt-4o",
-                "model_provider_name": "openai",
-                "prompt_id": "simple_prompt_builder",
-                "structured_output_mode": "json_schema",
-            },
-        },
-    )
-
-    assert response.status_code == 200
-    result = response.json()
-
-    # Should use the provided name, not generated one
-    assert result["name"] == "Custom Name"
```
🧰 Tools
🪛 Ruff (0.11.9)
2084-2084: Redefinition of unused test_create_task_run_config_respects_provided_name from line 1974
(F811)
🪛 Pylint (3.3.7)
[error] 2084-2084: function already defined line 1974
(E0102)
🤖 Prompt for AI Agents
In app/desktop/studio_server/test_eval_api.py between lines 1974 and 2108, the
function test_create_task_run_config_respects_provided_name is defined twice.
Remove one of the duplicate definitions entirely to avoid conflicts and ensure
only a single test function with that name exists.
```python
async def test_create_eval_config_uses_descriptive_name(
    client, mock_task_from_id, mock_eval, mock_task
):
    """Test that eval config creation uses descriptive naming when name is not provided."""
    mock_task_from_id.return_value = mock_task

    with patch("app.desktop.studio_server.eval_api.eval_from_id") as mock_eval_from_id:
        mock_eval_from_id.return_value = mock_eval

        response = client.post(
            "/api/projects/project1/tasks/task1/eval/eval1/create_eval_config",
            json={
                # No name provided - should generate descriptive name
                "type": "g_eval",
                "properties": {"eval_steps": ["step1"]},
                "model_name": "gpt_4o",
                "provider": "openai",
            },
        )

        assert response.status_code == 200
        result = response.json()

        # Should use descriptive name format: eval_method - model
        assert result["name"] == "G-Eval - GPT 4o"
```
Critical: Remove duplicate function definition.
The function test_create_eval_config_uses_descriptive_name is defined twice (lines 1946 and 2056). Remove the duplicate.
```diff
-@pytest.mark.asyncio
-async def test_create_eval_config_uses_descriptive_name(
-    client, mock_task_from_id, mock_eval, mock_task
-):
-    """Test that eval config creation uses descriptive naming when name is not provided."""
-    mock_task_from_id.return_value = mock_task
-
-    with patch("app.desktop.studio_server.eval_api.eval_from_id") as mock_eval_from_id:
-        mock_eval_from_id.return_value = mock_eval
-
-        response = client.post(
-            "/api/projects/project1/tasks/task1/eval/eval1/create_eval_config",
-            json={
-                # No name provided - should generate descriptive name
-                "type": "g_eval",
-                "properties": {"eval_steps": ["step1"]},
-                "model_name": "gpt_4o",
-                "provider": "openai",
-            },
-        )
-
-        assert response.status_code == 200
-        result = response.json()
-
-        # Should use descriptive name format: eval_method - model
-        assert result["name"] == "G-Eval - GPT 4o"
```
🧰 Tools
🪛 Ruff (0.11.9)
2056-2056: Redefinition of unused test_create_eval_config_uses_descriptive_name from line 1946
(F811)
🪛 Pylint (3.3.7)
[error] 2056-2056: function already defined line 1946
(E0102)
🤖 Prompt for AI Agents
In app/desktop/studio_server/test_eval_api.py between lines 1946 and 2081, there
are two definitions of the function
test_create_eval_config_uses_descriptive_name. Remove the duplicate function
definition at lines 2056 to 2081 to keep only one unique definition of this test
function.
Actionable comments posted: 0
♻️ Duplicate comments (1)
app/desktop/studio_server/test_eval_api.py (1)
1892-2136: ⚠️ Potential issue

Duplicate test definitions remain – delete the earlier copies.

`test_create_task_run_config_uses_descriptive_name`, `test_create_eval_config_uses_descriptive_name`, `test_create_task_run_config_respects_provided_name`, and `test_create_eval_config_respects_provided_name` are each defined twice (see 1892-1916, 1919-1943, 2029-2053, 2056-2081, 1974-1998, 2084-2108, 2001-2025, 2111-2135).

Python keeps only the last definition; Ruff/Pylint raise F811/E0102.

Remove the redundant copies to avoid dead code and confusing test results.

```diff
-@pytest.mark.asyncio
-async def test_create_task_run_config_uses_descriptive_name(...):
-    ...
-# (entire block 1892-1916)
-@pytest.mark.asyncio
-async def test_create_eval_config_uses_descriptive_name(...):
-    ...
-# (entire block 1946-1971)
-@pytest.mark.asyncio
-async def test_create_task_run_config_respects_provided_name(...):
-    ...
-# (entire block 1974-1998)
-@pytest.mark.asyncio
-async def test_create_eval_config_respects_provided_name(...):
-    ...
-# (entire block 2001-2025)
```

Keep the single versions starting at 1919, 2056, 2084 and 2111 (or whichever you prefer) and adjust line numbers accordingly.
🧰 Tools
🪛 Ruff (0.11.9)
1919-1919: Redefinition of unused `test_create_task_run_config_uses_descriptive_name` from line 1892 (F811)

2029-2029: Redefinition of unused `test_create_task_run_config_uses_descriptive_name` from line 1919 (F811)

2056-2056: Redefinition of unused `test_create_eval_config_uses_descriptive_name` from line 1946 (F811)

2084-2084: Redefinition of unused `test_create_task_run_config_respects_provided_name` from line 1974 (F811)

2111-2111: Redefinition of unused `test_create_eval_config_respects_provided_name` from line 2001 (F811)
🪛 Pylint (3.3.7)
[error] 1919-1919: function already defined line 1892
(E0102)
[error] 2029-2029: function already defined line 1892
(E0102)
[error] 2056-2056: function already defined line 1946
(E0102)
[error] 2084-2084: function already defined line 1974
(E0102)
[error] 2111-2111: function already defined line 2001
(E0102)
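Duplicates like these can also be caught mechanically before lint runs; a sketch using the standard `ast` module (not part of the PR):

```python
import ast


def find_duplicate_defs(source: str) -> dict:
    """Map module-level function names defined more than once to their line numbers."""
    tree = ast.parse(source)
    seen: dict = {}
    for node in tree.body:
        # Both sync and async defs count; duplicates shadow earlier ones.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            seen.setdefault(node.name, []).append(node.lineno)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}
```

Running this over a test module surfaces exactly the F811/E0102 findings above, with the line numbers of every copy.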
🧹 Nitpick comments (3)
app/desktop/studio_server/eval_api.py (3)
2-2: `re` import is unused – please remove.

The module no longer references `re`, triggering Ruff F401.

```diff
-import re
```

🧰 Tools

🪛 Ruff (0.11.9)

2-2: `re` imported but unused

Remove unused import: `re` (F401)
18-22: `NAME_REGEX` import is unused.

`_sanitize_name` relies on `string_to_valid_name` and never touches `NAME_REGEX`.

```diff
-from kiln_ai.datamodel.basemodel import (
-    ID_TYPE,
-    MAX_SHORT_NAME_LENGTH,
-    NAME_REGEX,
-    string_to_valid_name,
-)
+from kiln_ai.datamodel.basemodel import (
+    ID_TYPE,
+    MAX_SHORT_NAME_LENGTH,
+    string_to_valid_name,
+)
```

🧰 Tools

🪛 Ruff (0.11.9)

20-20: `kiln_ai.datamodel.basemodel.NAME_REGEX` imported but unused

Remove unused import: `kiln_ai.datamodel.basemodel.NAME_REGEX` (F401)
957-966: `_get_model_display_name` performs a linear scan on every call – consider caching.

If name generation is invoked frequently, repeatedly iterating `built_in_models` can be wasteful. A simple `functools.lru_cache` (or building a dict once) would reduce the lookup to O(1).

Example:

```diff
+from functools import lru_cache
+
+@lru_cache(maxsize=None)
 def _get_model_display_name(model_id: str) -> str:
 @@
     return model_id
```
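A runnable sketch of the cached lookup; the registry below is a hypothetical stand-in for `built_in_models`:

```python
from functools import lru_cache

# Hypothetical registry standing in for the real built_in_models list.
_BUILT_IN_MODELS = [
    {"id": "gpt-4o", "friendly_name": "GPT 4o"},
    {"id": "claude-3-5-sonnet", "friendly_name": "Claude 3.5 Sonnet"},
]


@lru_cache(maxsize=None)
def get_model_display_name(model_id: str) -> str:
    # The linear scan runs once per distinct ID; repeat lookups hit the cache.
    for model in _BUILT_IN_MODELS:
        if model["id"] == model_id:
            return model["friendly_name"]
    return model_id  # fall back to the raw ID
```

Building a dict once at import time would work equally well; `lru_cache` just keeps the existing function shape unchanged.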
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- app/desktop/studio_server/eval_api.py (6 hunks)
- app/desktop/studio_server/test_eval_api.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/desktop/studio_server/eval_api.py (4)
libs/core/kiln_ai/datamodel/basemodel.py (1)
- string_to_valid_name (66-72)

libs/core/kiln_ai/datamodel/prompt_id.py (1)
- PromptGenerators (8-16)

libs/core/kiln_ai/datamodel/task.py (2)
- Task (171-240)
- prompts (227-228)

libs/core/kiln_ai/datamodel/eval.py (1)
- EvalConfigType (38-40)
🪛 Ruff (0.11.9)
app/desktop/studio_server/eval_api.py
2-2: re imported but unused
Remove unused import: re
(F401)
20-20: kiln_ai.datamodel.basemodel.NAME_REGEX imported but unused
Remove unused import: kiln_ai.datamodel.basemodel.NAME_REGEX
(F401)
app/desktop/studio_server/test_eval_api.py
1919-1919: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1892
(F811)
2029-2029: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1919
(F811)
2056-2056: Redefinition of unused test_create_eval_config_uses_descriptive_name from line 1946
(F811)
2084-2084: Redefinition of unused test_create_task_run_config_respects_provided_name from line 1974
(F811)
2111-2111: Redefinition of unused test_create_eval_config_respects_provided_name from line 2001
(F811)
🪛 Pylint (3.3.7)
app/desktop/studio_server/test_eval_api.py
[error] 1919-1919: function already defined line 1892
(E0102)
[error] 2029-2029: function already defined line 1892
(E0102)
[error] 2056-2056: function already defined line 1946
(E0102)
[error] 2084-2084: function already defined line 1974
(E0102)
[error] 2111-2111: function already defined line 2001
(E0102)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Build Desktop Apps (windows-latest)
- GitHub Check: Build Desktop Apps (macos-13)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build, Typecheck, and Test Python (3.11)
- GitHub Check: Build, Typecheck, and Test Python (3.10)
- GitHub Check: Build, Typecheck, and Test Python (3.13)
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Style
Chores