
Conversation

@scosman (Collaborator) commented Jun 10, 2025

Summary by CodeRabbit

  • New Features

    • Automatically generates descriptive names for task run configurations and evaluation configurations when no name is provided, making them easier to identify in the UI.
  • Bug Fixes

    • Ensures that generated names are always valid, properly formatted, and within length limits.
  • Tests

    • Added comprehensive tests to verify naming behavior and API integration.
  • Style

    • Updated the "Clear All" button in the UI to display the text "Clear" instead of an icon.
  • Chores

    • Introduced a constant for maximum short name length to improve maintainability.

@coderabbitai bot (Contributor) commented Jun 10, 2025

Walkthrough

Automatic, descriptive name generation for task run configs and eval configs was added, ensuring meaningful defaults when names are omitted. Helper functions map IDs to user-friendly names, enforce naming rules, and handle truncation. Associated tests were introduced for the new naming logic and API integration. UI and constant updates were also made.
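For illustration only, a rough, self-contained sketch of the naming scheme described above. The dictionaries are stand-ins for the PR's real lookups (Kiln's built-in model list, the prompt generator mapping, and the EvalConfigType mapping), the function names here are not the PR's (those are generate_task_run_config_name and generate_eval_config_name), and the real helpers also sanitize and truncate the result:

_model_display = {"gpt-4o": "GPT 4o"}  # stand-in for the built-in model lookup
_prompt_display = {"simple_prompt_builder": "Basic Prompt"}  # stand-in for the prompt generator mapping
_eval_type_display = {"g_eval": "G-Eval"}  # stand-in for the EvalConfigType mapping

def describe_task_run_config(model_id: str, prompt_id: str) -> str:
    # Format is "model - prompt", e.g. "GPT 4o - Basic Prompt"; unknown IDs fall back to themselves.
    return f"{_model_display.get(model_id, model_id)} - {_prompt_display.get(prompt_id, prompt_id)}"

def describe_eval_config(eval_type: str, model_id: str) -> str:
    # Format is "eval method - model", e.g. "G-Eval - GPT 4o".
    return f"{_eval_type_display.get(eval_type, eval_type)} - {_model_display.get(model_id, model_id)}"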

Changes

  • app/desktop/studio_server/eval_api.py: Added functions for generating, mapping, and sanitizing descriptive names for configs; updated imports.
  • app/desktop/studio_server/test_eval_api.py: Added tests for naming functions, sanitization, display name mapping, and API integration.
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte: Changed "Clear All" button from icon to text label "Clear".
  • libs/core/kiln_ai/datamodel/basemodel.py: Introduced MAX_SHORT_NAME_LENGTH constant; updated SHORT_NAME_FIELD to use this constant.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant API
    participant NameGen
    participant DB

    User->>API: POST /create_task_run_config (no name)
    API->>NameGen: generate_task_run_config_name(model_id, prompt_id, task)
    NameGen->>DB: Lookup model and prompt display names
    NameGen-->>API: Return sanitized, descriptive name
    API->>DB: Create config with generated name
    API-->>User: Respond with config (including generated name)
sequenceDiagram
    participant User
    participant API
    participant NameGen
    participant DB

    User->>API: POST /create_eval_config (no name)
    API->>NameGen: generate_eval_config_name(eval_type, model_id)
    NameGen->>DB: Lookup model display name
    NameGen-->>API: Return sanitized, descriptive name
    API->>DB: Create eval config with generated name
    API-->>User: Respond with eval config (including generated name)

Poem

In the warren of configs, names now bloom bright,
No more random strings in the burrow at night.
Models and prompts join in clever refrain,
Each task and eval with a meaningful name.
With tests all a-hopping, the rabbits delight—
For clarity’s carrot is finally in sight! 🥕



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (1)
app/desktop/studio_server/eval_api.py (1)

2-2: Remove unused imports flagged by static analysis.

The re import and NAME_REGEX constant are not directly used in this file. The regex validation is handled by the imported string_to_valid_name function.

-import re
-    NAME_REGEX,

Also applies to: 20-20

🧰 Tools
🪛 Ruff (0.11.9)

2-2: re imported but unused

Remove unused import: re

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd8f026 and 17f72ad.

📒 Files selected for processing (4)
  • app/desktop/studio_server/eval_api.py (6 hunks)
  • app/desktop/studio_server/test_eval_api.py (1 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (1 hunks)
  • libs/core/kiln_ai/datamodel/basemodel.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/desktop/studio_server/eval_api.py (4)
libs/core/kiln_ai/datamodel/basemodel.py (1)
  • string_to_valid_name (66-72)
libs/core/kiln_ai/datamodel/prompt_id.py (1)
  • PromptGenerators (8-16)
libs/core/kiln_ai/datamodel/task.py (2)
  • Task (171-240)
  • prompts (227-228)
libs/core/kiln_ai/datamodel/eval.py (1)
  • EvalConfigType (38-40)
🪛 Ruff (0.11.9)
app/desktop/studio_server/test_eval_api.py

1919-1919: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1892

(F811)


2029-2029: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1919

(F811)


2056-2056: Redefinition of unused test_create_eval_config_uses_descriptive_name from line 1946

(F811)


2084-2084: Redefinition of unused test_create_task_run_config_respects_provided_name from line 1974

(F811)


2111-2111: Redefinition of unused test_create_eval_config_respects_provided_name from line 2001

(F811)

app/desktop/studio_server/eval_api.py

2-2: re imported but unused

Remove unused import: re

(F401)


20-20: kiln_ai.datamodel.basemodel.NAME_REGEX imported but unused

Remove unused import: kiln_ai.datamodel.basemodel.NAME_REGEX

(F401)

🪛 Pylint (3.3.7)
app/desktop/studio_server/test_eval_api.py

[error] 1919-1919: function already defined line 1892

(E0102)


[error] 2029-2029: function already defined line 1892

(E0102)


[error] 2056-2056: function already defined line 1946

(E0102)


[error] 2084-2084: function already defined line 1974

(E0102)


[error] 2111-2111: function already defined line 2001

(E0102)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (12)
libs/core/kiln_ai/datamodel/basemodel.py (1)

57-57: LGTM! Good practice introducing a named constant.

Excellent refactoring to eliminate magic numbers and centralize the maximum length constraint for short names.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/+page.svelte (1)

364-364: LGTM! Improved accessibility and clarity.

Changing from an icon-only button to a text-labeled "Clear" button improves usability and accessibility while maintaining the same functionality.

app/desktop/studio_server/test_eval_api.py (1)

1728-2164: Excellent test coverage for the new naming functionality.

The new tests provide comprehensive coverage of the descriptive naming feature, including:

  • Unit tests for name generation functions with various scenarios
  • Edge case handling (long names, invalid characters, empty names)
  • Integration tests verifying API behavior
  • Validation that the prompt generator mapping stays in sync with enum values

The test structure is well-organized and follows good testing practices with clear assertions and descriptive test names.

🧰 Tools
🪛 Ruff (0.11.9)

1919-1919: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1892

(F811)


2029-2029: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1919

(F811)


2056-2056: Redefinition of unused test_create_eval_config_uses_descriptive_name from line 1946

(F811)


2084-2084: Redefinition of unused test_create_task_run_config_respects_provided_name from line 1974

(F811)


2111-2111: Redefinition of unused test_create_eval_config_respects_provided_name from line 2001

(F811)

🪛 Pylint (3.3.7)

[error] 1919-1919: function already defined line 1892

(E0102)


[error] 2029-2029: function already defined line 1892

(E0102)


[error] 2056-2056: function already defined line 1946

(E0102)


[error] 2084-2084: function already defined line 1974

(E0102)


[error] 2111-2111: function already defined line 2001

(E0102)

app/desktop/studio_server/eval_api.py (9)

361-365: LGTM! Clean implementation of automatic name generation.

The fallback logic is well-implemented - when no explicit name is provided, it generates a descriptive name using the model and prompt information. This enhances user experience by providing meaningful defaults.
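For reference, the fallback reduces to a pattern like the sketch below; the helper name and argument order follow the sequence diagram earlier on this page, and whether the handler exposes the request fields exactly this way is an assumption:

from app.desktop.studio_server.eval_api import generate_task_run_config_name  # added in this PR

def resolve_run_config_name(provided_name, model_name, prompt_id, task):
    # Honor an explicitly provided name; otherwise generate the descriptive default.
    if provided_name:
        return provided_name
    return generate_task_run_config_name(model_name, prompt_id, task)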


416-418: LGTM! Consistent name generation pattern.

The eval config name generation follows the same pattern as task run config, maintaining consistency across the API.


869-890: Well-documented and robust name generation function.

The function properly handles the model and prompt name resolution with clear documentation. Good use of helper functions for separation of concerns.


893-913: Consistent implementation for eval config names.

The function follows the same pattern as task run config name generation, ensuring consistency across the API.


928-945: Robust prompt name resolution with good fallback handling.

The function properly handles different prompt ID formats (saved prompts, fine-tune prompts, and generators) with appropriate fallbacks. The logic correctly extracts IDs and looks up display names.
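As a standalone sketch of that fallback order (the function name, arguments, and lookup structures here are illustrative stand-ins, not the PR's actual implementation):

def resolve_prompt_display_name(prompt_id, generator_names, saved_prompt_names):
    # 1) Generator IDs (e.g. "simple_prompt_builder") map to friendly labels.
    if prompt_id in generator_names:
        return generator_names[prompt_id]
    # 2) Saved or fine-tune prompt IDs are looked up by ID.
    if prompt_id in saved_prompt_names:
        return saved_prompt_names[prompt_id]
    # 3) Anything unrecognized falls back to the raw identifier.
    return prompt_id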


948-954: Complete coverage of EvalConfigType enum values.

The mapping correctly covers both g_eval and llm_as_judge evaluation types with user-friendly display names.


957-965: Efficient model name lookup with appropriate fallback.

The function iterates through built-in models to find friendly names, with a sensible fallback to the original model ID. For typical use cases, this linear search should perform adequately.
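A sketch of that lookup, using a stand-in record type rather than Kiln's actual built_in_models entries (whose field names may differ):

from dataclasses import dataclass

@dataclass
class ModelEntry:  # stand-in for a built-in model record
    model_id: str
    friendly_name: str

BUILT_IN = [ModelEntry("gpt-4o", "GPT 4o")]

def model_display_name(model_id: str) -> str:
    # Linear scan over the built-in list; unknown IDs fall back to the raw ID.
    for entry in BUILT_IN:
        if entry.model_id == model_id:
            return entry.friendly_name
    return model_id

Because the list is small and static, the scan is cheap; caching the result (or building a dict once), as suggested later in this review, would make repeated lookups O(1).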


968-981: Comprehensive name sanitization with edge case handling.

The function properly uses the existing string_to_valid_name utility, handles length constraints with truncation, and ensures a non-empty result. The fallback to "Config" is a reasonable default.
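A minimal sketch of that flow, assuming string_to_valid_name and MAX_SHORT_NAME_LENGTH are importable from kiln_ai.datamodel.basemodel as listed in the code graph above; the PR's _sanitize_name may differ in detail:

from kiln_ai.datamodel.basemodel import MAX_SHORT_NAME_LENGTH, string_to_valid_name

def sanitize_generated_name(raw: str) -> str:
    # Normalize to a valid name, trim to the shared length limit, and never return an empty string.
    name = string_to_valid_name(raw)[:MAX_SHORT_NAME_LENGTH].strip()
    return name or "Config"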


916-925:

✅ Verification successful

Verify all PromptGenerators enum values are covered.

The mapping dictionary should include all values from the PromptGenerators enum to ensure comprehensive coverage.


🏁 Script executed:

#!/bin/bash
# Description: Check if all PromptGenerators enum values are covered in the mapping dictionary
# Expected: All enum values should have corresponding entries in _prompt_generator_names

# Get all PromptGenerators enum values
echo "PromptGenerators enum values:"
ast-grep --pattern 'class PromptGenerators($$$) {
  $$$
  $A = $B
  $$$
}' | grep -E '^\s*[A-Z_]+ = '

echo -e "\n_prompt_generator_names dictionary keys:"
# Get all keys from the mapping dictionary
rg -A 10 '_prompt_generator_names = {' app/desktop/studio_server/eval_api.py

Length of output: 466


Let’s extract the actual enum members and the dict keys via AST to get an accurate comparison:

This will show exactly which PromptGenerators members aren’t covered in _prompt_generator_names.


🏁 Script executed:

#!/usr/bin/env python3
import ast

# Parse the file
path = 'app/desktop/studio_server/eval_api.py'
with open(path) as f:
    tree = ast.parse(f.read(), path)

enum_vals = []
map_keys = []

# Walk the AST
for node in ast.walk(tree):
    # Collect PromptGenerators enum attributes
    if isinstance(node, ast.ClassDef) and node.name == 'PromptGenerators':
        for stmt in node.body:
            if isinstance(stmt, ast.Assign):
                for t in stmt.targets:
                    if isinstance(t, ast.Name):
                        enum_vals.append(t.id)

    # Find the mapping dictionary
    if isinstance(node, ast.Assign):
        for t in node.targets:
            if isinstance(t, ast.Name) and t.id == '_prompt_generator_names':
                if isinstance(node.value, ast.Dict):
                    for key in node.value.keys:
                        # Handle keys like PromptGenerators.X.value
                        if isinstance(key, ast.Attribute):
                            map_keys.append(key.attr)
                        # Handle literal keys
                        elif isinstance(key, ast.Constant):
                            map_keys.append(key.value)

# Print results
print("Enum values:   ", enum_vals)
print("Mapped keys:   ", map_keys)
print("Missing in map:", sorted(set(enum_vals) - set(map_keys)))

Length of output: 456


Let’s run a proper in-line Python snippet under Bash to extract the enum members versus map keys:


🏁 Script executed:

#!/usr/bin/env bash
python3 - << 'EOF'
import ast, sys

path = 'app/desktop/studio_server/eval_api.py'
with open(path, 'r') as f:
    tree = ast.parse(f.read(), path)

enum_vals = []
map_keys = []

for node in ast.walk(tree):
    # PromptGenerators enum members
    if isinstance(node, ast.ClassDef) and node.name == 'PromptGenerators':
        for stmt in node.body:
            if isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        enum_vals.append(target.id)

    # _prompt_generator_names dict keys
    if isinstance(node, ast.Assign):
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == '_prompt_generator_names':
                if isinstance(node.value, ast.Dict):
                    for key in node.value.keys:
                        # e.g., PromptGenerators.X.value
                        if isinstance(key, ast.Attribute):
                            map_keys.append(key.attr)
                        elif isinstance(key, ast.Constant):
                            map_keys.append(key.value)

print("Enum values:   ", enum_vals)
print("Mapped keys:   ", map_keys)
print("Missing in map:", sorted(set(enum_vals) - set(map_keys)))
EOF

Length of output: 138


Let’s locate the enum definition and pull out its members alongside the mapping snippet for a direct comparison:


🏁 Script executed:

#!/usr/bin/env bash
# 1) Find the file where PromptGenerators is defined
enum_file=$(rg -l "class PromptGenerators" . | head -n1)
echo "Enum file: $enum_file"

# 2) List all enum members (assignments) in that class
echo -e "\nEnum members:"
sed -n '1,200p' "$enum_file" | grep -E '^\s*[A-Z_]+ = '

# 3) Show the mapping dictionary in eval_api.py around lines 900–960
echo -e "\nMapping snippet in eval_api.py:"
sed -n '900,960p' app/desktop/studio_server/eval_api.py | sed -n '1,50p'

Length of output: 2747


All PromptGenerators enum values are covered in _prompt_generator_names.
I’ve confirmed that each of the eight members of PromptGenerators (SIMPLE, SHORT, FEW_SHOT, MULTI_SHOT, REPAIRS, SIMPLE_CHAIN_OF_THOUGHT, FEW_SHOT_CHAIN_OF_THOUGHT, MULTI_SHOT_CHAIN_OF_THOUGHT) has a corresponding entry in the mapping dictionary. No missing entries found.
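The PR's tests reportedly guard this invariant already; a minimal standalone check could look like the following, assuming _prompt_generator_names is importable and keyed by the enum's string values:

from app.desktop.studio_server.eval_api import _prompt_generator_names
from kiln_ai.datamodel.prompt_id import PromptGenerators

def test_prompt_generator_names_cover_all_generators():
    assert set(_prompt_generator_names) == {generator.value for generator in PromptGenerators}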

Comment on lines 58 to 63
 SHORT_NAME_FIELD = Field(
     min_length=1,
-    max_length=32,
+    max_length=MAX_SHORT_NAME_LENGTH,
     pattern=NAME_REGEX,
     description="A name for this entity",
 )

💡 Verification agent

🧩 Analysis chain

Verify consistency across the codebase.

The update to use the constant is correct. Ensure that other places in the codebase that might use similar length constraints are updated to use this centralized constant.


🏁 Script executed:

#!/bin/bash
# Description: Search for other hardcoded values of 32 that might be related to name length constraints
# Expected: Find any other locations that should use the new constant

# Search for hardcoded 32 values in relevant files
rg -A 3 -B 3 "32" --type py | grep -E "(name|length|max)" -i

# Search specifically in validation/field definitions
ast-grep --pattern 'max_length=$_'

Length of output: 4800


Replace hardcoded slice length with MAX_SHORT_NAME_LENGTH

The only remaining literal 32 in the basemodel is in the path‐formatter slice. Update it to use the centralized constant, and review any tests or helpers that still reference 32 directly.

• libs/core/kiln_ai/datamodel/basemodel.py

 # before
-            path = f"{path} - {name[:32]}"
 # after
+            path = f"{path} - {name[:MAX_SHORT_NAME_LENGTH]}"

• Review tests in app/desktop/studio_server/test_eval_api.py (and any others) that assert truncation to 32 characters—consider importing and using MAX_SHORT_NAME_LENGTH there for full consistency.
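For example, a truncation assertion written against the constant rather than the literal (the test name is illustrative; _sanitize_name is the helper discussed later in this review, and its importability is an assumption):

from kiln_ai.datamodel.basemodel import MAX_SHORT_NAME_LENGTH
from app.desktop.studio_server.eval_api import _sanitize_name

def test_generated_names_respect_short_name_limit():
    overlong = "Model Name " * 10  # well past the limit
    assert len(_sanitize_name(overlong)) <= MAX_SHORT_NAME_LENGTH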

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 SHORT_NAME_FIELD = Field(
     min_length=1,
-    max_length=32,
+    max_length=MAX_SHORT_NAME_LENGTH,
     pattern=NAME_REGEX,
     description="A name for this entity",
 )
--- a/libs/core/kiln_ai/datamodel/basemodel.py
+++ b/libs/core/kiln_ai/datamodel/basemodel.py
@@
- path = f"{path} - {name[:32]}"
+ path = f"{path} - {name[:MAX_SHORT_NAME_LENGTH]}"
🤖 Prompt for AI Agents
In libs/core/kiln_ai/datamodel/basemodel.py around lines 58 to 63, replace the
hardcoded slice length 32 in the path-formatter slice with the constant
MAX_SHORT_NAME_LENGTH to maintain consistency. Additionally, review and update
any tests or helper functions, such as those in
app/desktop/studio_server/test_eval_api.py, that assert truncation or length
constraints using the literal 32, and modify them to import and use
MAX_SHORT_NAME_LENGTH instead.

Comment on lines +1919 to +1943
async def test_create_task_run_config_uses_descriptive_name(
    client, mock_task_from_id, mock_task
):
    """Test that task run config creation uses descriptive naming when name is not provided."""
    mock_task_from_id.return_value = mock_task

    response = client.post(
        "/api/projects/project1/tasks/task1/task_run_config",
        json={
            # No name provided - should generate descriptive name
            "run_config_properties": {
                "model_name": "gpt-4o",
                "model_provider_name": "openai",
                "prompt_id": "simple_prompt_builder",
                "structured_output_mode": "json_schema",
            },
        },
    )

    assert response.status_code == 200
    result = response.json()

    # Should use descriptive name format: model - prompt
    assert result["name"] == "GPT 4o - Basic Prompt"


⚠️ Potential issue

Critical: Remove duplicate function definitions.

The function test_create_task_run_config_uses_descriptive_name is defined three times (lines 1892, 1919, and 2029). Python will only use the last definition, making the earlier ones unreachable dead code.

Remove the duplicate definitions:

-@pytest.mark.asyncio
-async def test_create_task_run_config_uses_descriptive_name(
-    client, mock_task_from_id, mock_task
-):
-    """Test that task run config creation uses descriptive naming when name is not provided."""
-    mock_task_from_id.return_value = mock_task
-
-    response = client.post(
-        "/api/projects/project1/tasks/task1/task_run_config",
-        json={
-            # No name provided - should generate descriptive name
-            "run_config_properties": {
-                "model_name": "gpt-4o",
-                "model_provider_name": "openai",
-                "prompt_id": "simple_prompt_builder",
-                "structured_output_mode": "json_schema",
-            },
-        },
-    )
-
-    assert response.status_code == 200
-    result = response.json()
-
-    # Should use descriptive name format: model - prompt
-    assert result["name"] == "GPT 4o - Basic Prompt"

And remove the third duplicate starting at line 2029.

Also applies to: 2029-2053
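For context, a tiny standalone example of why the earlier copies are dead code: redefining a function rebinds the name, so pytest only ever collects the last version.

def sample():
    return "first"  # unreachable once the name is rebound below

def sample():
    return "second"

assert sample() == "second"  # the first definition can no longer be called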

🧰 Tools
🪛 Ruff (0.11.9)

1919-1919: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1892

(F811)

🪛 Pylint (3.3.7)

[error] 1919-1919: function already defined line 1892

(E0102)

🤖 Prompt for AI Agents
In app/desktop/studio_server/test_eval_api.py between lines 1892-1919-2029, the
function test_create_task_run_config_uses_descriptive_name is defined three
times, causing the first two definitions to be overridden and effectively dead
code. Remove the duplicate function definitions at lines 1892-1918 and
2029-2053, keeping only one definition to ensure the test runs correctly without
redundancy.

Comment on lines +2111 to +2136
async def test_create_eval_config_respects_provided_name(
    client, mock_task_from_id, mock_eval, mock_task
):
    """Test that eval config creation respects provided names."""
    mock_task_from_id.return_value = mock_task

    with patch("app.desktop.studio_server.eval_api.eval_from_id") as mock_eval_from_id:
        mock_eval_from_id.return_value = mock_eval

        response = client.post(
            "/api/projects/project1/tasks/task1/eval/eval1/create_eval_config",
            json={
                "name": "Custom Eval Config",  # Explicit name provided
                "type": "llm_as_judge",
                "properties": {"eval_steps": ["step1"]},
                "model_name": "gpt-4o",
                "provider": "openai",
            },
        )

    assert response.status_code == 200
    result = response.json()

    # Should use the provided name, not generated one
    assert result["name"] == "Custom Eval Config"


⚠️ Potential issue

Critical: Remove duplicate function definition.

The function test_create_eval_config_respects_provided_name is defined twice (lines 2001 and 2111). Remove the duplicate.

-@pytest.mark.asyncio
-async def test_create_eval_config_respects_provided_name(
-    client, mock_task_from_id, mock_eval, mock_task
-):
-    """Test that eval config creation respects provided names."""
-    mock_task_from_id.return_value = mock_task
-
-    with patch("app.desktop.studio_server.eval_api.eval_from_id") as mock_eval_from_id:
-        mock_eval_from_id.return_value = mock_eval
-
-        response = client.post(
-            "/api/projects/project1/tasks/task1/eval/eval1/create_eval_config",
-            json={
-                "name": "Custom Eval Config",  # Explicit name provided
-                "type": "llm_as_judge",
-                "properties": {"eval_steps": ["step1"]},
-                "model_name": "gpt-4o",
-                "provider": "openai",
-            },
-        )
-
-    assert response.status_code == 200
-    result = response.json()
-
-    # Should use the provided name, not generated one
-    assert result["name"] == "Custom Eval Config"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def test_create_eval_config_respects_provided_name(
    client, mock_task_from_id, mock_eval, mock_task
):
    """Test that eval config creation respects provided names."""
    mock_task_from_id.return_value = mock_task
    with patch("app.desktop.studio_server.eval_api.eval_from_id") as mock_eval_from_id:
        mock_eval_from_id.return_value = mock_eval
        response = client.post(
            "/api/projects/project1/tasks/task1/eval/eval1/create_eval_config",
            json={
                "name": "Custom Eval Config",  # Explicit name provided
                "type": "llm_as_judge",
                "properties": {"eval_steps": ["step1"]},
                "model_name": "gpt-4o",
                "provider": "openai",
            },
        )
    assert response.status_code == 200
    result = response.json()
    # Should use the provided name, not generated one
    assert result["name"] == "Custom Eval Config"
🧰 Tools
🪛 Ruff (0.11.9)

2111-2111: Redefinition of unused test_create_eval_config_respects_provided_name from line 2001

(F811)

🪛 Pylint (3.3.7)

[error] 2111-2111: function already defined line 2001

(E0102)

🤖 Prompt for AI Agents
In app/desktop/studio_server/test_eval_api.py between lines 2001 and 2136, the
function test_create_eval_config_respects_provided_name is defined twice. Remove
the duplicate definition starting at line 2111 to keep only one unique
implementation of this test function.

Comment on lines +2084 to +2108
async def test_create_task_run_config_respects_provided_name(
    client, mock_task_from_id, mock_task
):
    """Test that task run config creation respects provided names."""
    mock_task_from_id.return_value = mock_task

    response = client.post(
        "/api/projects/project1/tasks/task1/task_run_config",
        json={
            "name": "Custom Name",  # Explicit name provided
            "run_config_properties": {
                "model_name": "gpt-4o",
                "model_provider_name": "openai",
                "prompt_id": "simple_prompt_builder",
                "structured_output_mode": "json_schema",
            },
        },
    )

    assert response.status_code == 200
    result = response.json()

    # Should use the provided name, not generated one
    assert result["name"] == "Custom Name"


⚠️ Potential issue

Critical: Remove duplicate function definition.

The function test_create_task_run_config_respects_provided_name is defined twice (lines 1974 and 2084). Remove the duplicate.

-@pytest.mark.asyncio
-async def test_create_task_run_config_respects_provided_name(
-    client, mock_task_from_id, mock_task
-):
-    """Test that task run config creation respects provided names."""
-    mock_task_from_id.return_value = mock_task
-
-    response = client.post(
-        "/api/projects/project1/tasks/task1/task_run_config",
-        json={
-            "name": "Custom Name",  # Explicit name provided
-            "run_config_properties": {
-                "model_name": "gpt-4o",
-                "model_provider_name": "openai",
-                "prompt_id": "simple_prompt_builder",
-                "structured_output_mode": "json_schema",
-            },
-        },
-    )
-
-    assert response.status_code == 200
-    result = response.json()
-
-    # Should use the provided name, not generated one
-    assert result["name"] == "Custom Name"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def test_create_task_run_config_respects_provided_name(
    client, mock_task_from_id, mock_task
):
    """Test that task run config creation respects provided names."""
    mock_task_from_id.return_value = mock_task
    response = client.post(
        "/api/projects/project1/tasks/task1/task_run_config",
        json={
            "name": "Custom Name",  # Explicit name provided
            "run_config_properties": {
                "model_name": "gpt-4o",
                "model_provider_name": "openai",
                "prompt_id": "simple_prompt_builder",
                "structured_output_mode": "json_schema",
            },
        },
    )
    assert response.status_code == 200
    result = response.json()
    # Should use the provided name, not generated one
    assert result["name"] == "Custom Name"
🧰 Tools
🪛 Ruff (0.11.9)

2084-2084: Redefinition of unused test_create_task_run_config_respects_provided_name from line 1974

(F811)

🪛 Pylint (3.3.7)

[error] 2084-2084: function already defined line 1974

(E0102)

🤖 Prompt for AI Agents
In app/desktop/studio_server/test_eval_api.py between lines 1974 and 2108, the
function test_create_task_run_config_respects_provided_name is defined twice.
Remove one of the duplicate definitions entirely to avoid conflicts and ensure
only a single test function with that name exists.

Comment on lines +2056 to +2081
async def test_create_eval_config_uses_descriptive_name(
    client, mock_task_from_id, mock_eval, mock_task
):
    """Test that eval config creation uses descriptive naming when name is not provided."""
    mock_task_from_id.return_value = mock_task

    with patch("app.desktop.studio_server.eval_api.eval_from_id") as mock_eval_from_id:
        mock_eval_from_id.return_value = mock_eval

        response = client.post(
            "/api/projects/project1/tasks/task1/eval/eval1/create_eval_config",
            json={
                # No name provided - should generate descriptive name
                "type": "g_eval",
                "properties": {"eval_steps": ["step1"]},
                "model_name": "gpt_4o",
                "provider": "openai",
            },
        )

    assert response.status_code == 200
    result = response.json()

    # Should use descriptive name format: eval_method - model
    assert result["name"] == "G-Eval - GPT 4o"


⚠️ Potential issue

Critical: Remove duplicate function definition.

The function test_create_eval_config_uses_descriptive_name is defined twice (lines 1946 and 2056). Remove the duplicate.

-@pytest.mark.asyncio
-async def test_create_eval_config_uses_descriptive_name(
-    client, mock_task_from_id, mock_eval, mock_task
-):
-    """Test that eval config creation uses descriptive naming when name is not provided."""
-    mock_task_from_id.return_value = mock_task
-
-    with patch("app.desktop.studio_server.eval_api.eval_from_id") as mock_eval_from_id:
-        mock_eval_from_id.return_value = mock_eval
-
-        response = client.post(
-            "/api/projects/project1/tasks/task1/eval/eval1/create_eval_config",
-            json={
-                # No name provided - should generate descriptive name
-                "type": "g_eval",
-                "properties": {"eval_steps": ["step1"]},
-                "model_name": "gpt_4o",
-                "provider": "openai",
-            },
-        )
-
-    assert response.status_code == 200
-    result = response.json()
-
-    # Should use descriptive name format: eval_method - model
-    assert result["name"] == "G-Eval - GPT 4o"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def test_create_eval_config_uses_descriptive_name(
    client, mock_task_from_id, mock_eval, mock_task
):
    """Test that eval config creation uses descriptive naming when name is not provided."""
    mock_task_from_id.return_value = mock_task
    with patch("app.desktop.studio_server.eval_api.eval_from_id") as mock_eval_from_id:
        mock_eval_from_id.return_value = mock_eval
        response = client.post(
            "/api/projects/project1/tasks/task1/eval/eval1/create_eval_config",
            json={
                # No name provided - should generate descriptive name
                "type": "g_eval",
                "properties": {"eval_steps": ["step1"]},
                "model_name": "gpt_4o",
                "provider": "openai",
            },
        )
    assert response.status_code == 200
    result = response.json()
    # Should use descriptive name format: eval_method - model
    assert result["name"] == "G-Eval - GPT 4o"
🧰 Tools
🪛 Ruff (0.11.9)

2056-2056: Redefinition of unused test_create_eval_config_uses_descriptive_name from line 1946

(F811)

🪛 Pylint (3.3.7)

[error] 2056-2056: function already defined line 1946

(E0102)

🤖 Prompt for AI Agents
In app/desktop/studio_server/test_eval_api.py between lines 1946 and 2081, there
are two definitions of the function
test_create_eval_config_uses_descriptive_name. Remove the duplicate function
definition at lines 2056 to 2081 to keep only one unique definition of this test
function.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
app/desktop/studio_server/test_eval_api.py (1)

1892-2136: ⚠️ Potential issue

Duplicate test definitions remain – delete the earlier copies.

test_create_task_run_config_uses_descriptive_name,
test_create_eval_config_uses_descriptive_name,
test_create_task_run_config_respects_provided_name, and
test_create_eval_config_respects_provided_name are each defined twice (see 1892-1916, 1919-1943, 2029-2053, 2056-2081, 1974-1998, 2084-2108, 2001-2025, 2111-2135).
Python keeps only the last definition; Ruff/Pylint raise F811/E0102.
Remove the redundant copies to avoid dead code and confusing test results.

-@pytest.mark.asyncio
-async def test_create_task_run_config_uses_descriptive_name(...):
-    ...
-# (entire block 1892-1916)

-@pytest.mark.asyncio
-async def test_create_eval_config_uses_descriptive_name(...):
-    ...
-# (entire block 1946-1971)

-@pytest.mark.asyncio
-async def test_create_task_run_config_respects_provided_name(...):
-    ...
-# (entire block 1974-1998)

-@pytest.mark.asyncio
-async def test_create_eval_config_respects_provided_name(...):
-    ...
-# (entire block 2001-2025)

Keep the single versions starting at 1919, 2056, 2084 and 2111 (or whichever you prefer) and adjust line-numbers accordingly.

🧰 Tools
🪛 Ruff (0.11.9)

1919-1919: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1892

(F811)


2029-2029: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1919

(F811)


2056-2056: Redefinition of unused test_create_eval_config_uses_descriptive_name from line 1946

(F811)


2084-2084: Redefinition of unused test_create_task_run_config_respects_provided_name from line 1974

(F811)


2111-2111: Redefinition of unused test_create_eval_config_respects_provided_name from line 2001

(F811)

🪛 Pylint (3.3.7)

[error] 1919-1919: function already defined line 1892

(E0102)


[error] 2029-2029: function already defined line 1892

(E0102)


[error] 2056-2056: function already defined line 1946

(E0102)


[error] 2084-2084: function already defined line 1974

(E0102)


[error] 2111-2111: function already defined line 2001

(E0102)

🧹 Nitpick comments (3)
app/desktop/studio_server/eval_api.py (3)

2-2: re import is unused – please remove.

The module no longer references re, triggering Ruff F401.

-import re
🧰 Tools
🪛 Ruff (0.11.9)

2-2: re imported but unused

Remove unused import: re

(F401)


18-22: NAME_REGEX import is unused.

_sanitize_name relies on string_to_valid_name and never touches NAME_REGEX.

-from kiln_ai.datamodel.basemodel import (
-    ID_TYPE,
-    MAX_SHORT_NAME_LENGTH,
-    NAME_REGEX,
-    string_to_valid_name,
-)
+from kiln_ai.datamodel.basemodel import (
+    ID_TYPE,
+    MAX_SHORT_NAME_LENGTH,
+    string_to_valid_name,
+)
🧰 Tools
🪛 Ruff (0.11.9)

20-20: kiln_ai.datamodel.basemodel.NAME_REGEX imported but unused

Remove unused import: kiln_ai.datamodel.basemodel.NAME_REGEX

(F401)


957-966: _get_model_display_name performs a linear scan on every call – consider caching.

If name generation is invoked frequently, repeatedly iterating built_in_models can be wasteful. A simple functools.lru_cache (or building a dict once) would reduce the lookup to O(1).

Example:

+from functools import lru_cache
+
+@lru_cache(maxsize=None)
 def _get_model_display_name(model_id: str) -> str:
@@
     return model_id
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17f72ad and bfe1f79.

📒 Files selected for processing (2)
  • app/desktop/studio_server/eval_api.py (6 hunks)
  • app/desktop/studio_server/test_eval_api.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
app/desktop/studio_server/eval_api.py (4)
libs/core/kiln_ai/datamodel/basemodel.py (1)
  • string_to_valid_name (66-72)
libs/core/kiln_ai/datamodel/prompt_id.py (1)
  • PromptGenerators (8-16)
libs/core/kiln_ai/datamodel/task.py (2)
  • Task (171-240)
  • prompts (227-228)
libs/core/kiln_ai/datamodel/eval.py (1)
  • EvalConfigType (38-40)
🪛 Ruff (0.11.9)
app/desktop/studio_server/eval_api.py

2-2: re imported but unused

Remove unused import: re

(F401)


20-20: kiln_ai.datamodel.basemodel.NAME_REGEX imported but unused

Remove unused import: kiln_ai.datamodel.basemodel.NAME_REGEX

(F401)

app/desktop/studio_server/test_eval_api.py

1919-1919: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1892

(F811)


2029-2029: Redefinition of unused test_create_task_run_config_uses_descriptive_name from line 1919

(F811)


2056-2056: Redefinition of unused test_create_eval_config_uses_descriptive_name from line 1946

(F811)


2084-2084: Redefinition of unused test_create_task_run_config_respects_provided_name from line 1974

(F811)


2111-2111: Redefinition of unused test_create_eval_config_respects_provided_name from line 2001

(F811)

🪛 Pylint (3.3.7)
app/desktop/studio_server/test_eval_api.py

[error] 1919-1919: function already defined line 1892

(E0102)


[error] 2029-2029: function already defined line 1892

(E0102)


[error] 2056-2056: function already defined line 1946

(E0102)


[error] 2084-2084: function already defined line 1974

(E0102)


[error] 2111-2111: function already defined line 2001

(E0102)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
