Abundant Take Home Project 🛠️

Welcome to the Abundant Take-Home Project!

Your task is to create a new Terminal Bench task that challenges state-of-the-art AI models. You'll start with an existing example, modify it to increase difficulty, and ensure all tests pass.

Step-by-Step Instructions

1. Get Familiar with TBench

Study Terminal Bench thoroughly at tbench.ai
- Understand what makes a good benchmark task
- Pay attention to difficulty levels and evaluation criteria
Read our review guidelines

2. Create Your Task

Fork an existing task from /tasks
- Choose a domain you're experienced with (e.g. fullstack, devops, data science, etc.)
- Or if you prefer to start from scratch, here's a list of ideas
Modify it to increase difficulty while maintaining clarity
Update the solution and unit tests to match your changes

3. Submit Your Work

Create a pull request with your new task
Review the CI/CD checks carefully to ensure that
- Your solution and unit tests are set up correctly: the Oracle Solution test passes and the NOP Agent test fails
  - Note that not all sample tasks have these set up correctly, in which case you'll probably want to fix them first
- Your task is sufficiently difficult: at least 2 of the 3 agent tests fail (see the "Run tests for modified tasks" section of "Test with Terminus (Claude | Gemini | GPT)")
  - Each agent fails for the right reasons, as shown in agent.log in the downloaded session artifact (see the "Upload session recordings" section)
IMPORTANT: In the PR description, make sure to include
- Which task did you take inspiration from, if any?
  - How is your task different?
- The results of each agent test
  - Why did the agent fail? Make sure it's not due to poor unit test setup
- BONUS: What other variations could you create based on this task?
Send the PR back for review (do not merge the PR before the review is done)
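
The pass/fail criteria from the CI/CD checks in this step can be written down as a small checklist function. This is only an illustrative sketch: `CheckResults` and `unmet_criteria` are hypothetical names made up for this example and are not part of Terminal Bench or its CI. It also cannot judge whether an agent failed for the right reasons; that still requires reading agent.log yourself.

```python
from dataclasses import dataclass


@dataclass
class CheckResults:
    """Hypothetical container for CI outcomes (not a Terminal Bench type)."""
    oracle_passed: bool             # result of the Oracle Solution test
    nop_passed: bool                # result of the NOP Agent test
    agent_passed: dict[str, bool]   # one entry per agent test


def unmet_criteria(results: CheckResults, min_agent_failures: int = 2) -> list[str]:
    """Return the submission criteria that the given results do not satisfy."""
    problems = []
    if not results.oracle_passed:
        problems.append("Oracle Solution should pass")
    if results.nop_passed:
        problems.append("NOP Agent should fail")
    # Count the agent tests that did NOT pass.
    failures = sum(1 for passed in results.agent_passed.values() if not passed)
    if failures < min_agent_failures:
        problems.append(
            f"only {failures} agent test(s) failed; need at least {min_agent_failures}"
        )
    return problems


# Example values are invented for illustration, not real results.
example = CheckResults(
    oracle_passed=True,
    nop_passed=False,
    agent_passed={"claude": False, "gemini": False, "gpt": True},
)
print(unmet_criteria(example))  # prints []
```

An empty list means the three mechanical criteria are met; any returned message points at something to fix before sending the PR for review.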

Tips for Success

Leverage your expertise: Take inspiration from your previous work and come up with tasks that resemble real-world engineering tasks.

Open-ended approach: Feel free to use any AI tools or resources to help you achieve the outcome. We're interested in seeing your problem-solving process!

Start simple: Begin with a task similar to existing examples, then gradually increase complexity.

Quality over quantity: One well-crafted, challenging task is better than multiple simple ones.

Getting Help

Don't hesitate to reach out if you have questions! We're here to support you throughout the process.

Contact: ke@abundant.ai for technical questions or repository access 🤝

We're excited to see what you create! This project will give you valuable insight into our work while helping us evaluate your technical skills and creativity.

querulousfalcon/terminal-bench-modifications
