A production-ready suite of modular, asynchronous tools for analyzing, refactoring, commenting, and auditing SQL code using Azure OpenAI (GPT-4o).
Designed with security, performance, auditability, and HIPAA/HITECH compliance in mind.
sql_tools/
├── app.py # CLI interface and controller
├── requirements.txt
├── README.md
├── LICENSE
├── core/ # Framework and shared logic
│ ├── base_ai_client.py # Async Azure OpenAI client
│ ├── config_loader.py # Configuration loader (mirrors original config.py)
│ ├── logger.py # HIPAA-compliant logging utility
│ └── sql_task_base.py # Abstract base class for GenAI SQL tasks
├── tasks/ # Modular GenAI SQL task classes
│ ├── sql_analyzer.py
│ ├── sql_commenter.py
│ ├── sql_data_masker.py
│ ├── sql_refactorer.py
│ ├── sql_explainer.py
│ ├── sql_security_auditor.py
│ ├── sql_test_generator.py
│ ├── sql_performance_benchmark.py # New: Added performance benchmarking and optimization
│ ├── sql_data_masker.py # New: Data masking and anonymization
│ ├── sql_visualizer.py # New: Visualization and insights
│ ├── sql_error_corrector.py # New: Error correction and debugging
│ ├── sql_style_enforcer.py # New: Style guide enforcement
│ ├── natural_language_to_sql.py # New: Natural language to SQL conversion
├── learn/ # Learning Mode folder
│ └── sql_learn_mode.py # Interactive tutorials for SQL learning
├── prompts/ # Centralized prompt management
│ ├── index.yaml # YAML file defining all prompts and metadata
│ ├── summarization/ # Summarization-related prompt templates
│ ├── classification/ # Classification-related prompt templates
└── utils/
├── file_utils.py # File I/O, backup, and directory handling
├── prompt_manager.py # Centralized prompt loading and validation
├── sanitizer.py # LLM output cleaner (markdown, GPT comments)
├── dynamic_sql_detector.py # New: Utility for dynamic SQL detection
- Data Masking and Anonymization - Automatically masks sensitive data such as emails, phone numbers, credit card numbers, and SSNs.
- Modular task engine (comment, analyze, refactor, audit, explain, test)
- Query Simulation and Validation
- Centralized prompt management via
prompts/index.yaml - Asynchronous OpenAI integration using
httpx - Sanitized output with
--sanitize - Directory recursion and batching with
--recursive - Preview results before modifying files with
--dry-run - Backup and HIPAA-safe logging
- Git integration: auto-stage files with
--git - Export to separate files using
--output - Support for multiple SQL dialects using nl_to_sql task (e.g., T-SQL, PostgreSQL, Oracle)
- Natural Language to SQL Conversion
- Enhanced security audits with SQL injection and role misuse detection
- Performance benchmarking and optimization
- Data masking and anonymization
- SQL Style Guide Enforcement
- Dynamic SQL Detection
- SQL Education Mode (Interactive Tutorials)
python app.py --task=mask --path=example.sql --output=masked_example.sql- Task:
mask— Automatically identifies and masks sensitive data such as:- Email addresses.
- Phone numbers.
- Credit card numbers.
- Social Security Numbers (SSNs).
--output=...: Writes the masked SQL to a new file.
python app.py --task=style_enforce --path=example.sql --sql_dialect=PostgreSQL --output=styled_example.sql- Task:
style_enforce— Enforces SQL coding standards dynamically using AI. --sql_dialect=...: Specifies the SQL dialect (e.g., PostgreSQL, T-SQL).--output=...: Writes the styled SQL to a new file.
python app.py --task=comment --path=example.sqlpython app.py --task=comment --path=example.sql --sanitize --output=cleaned_example.sqlpython app.py --task=refactor --path=query.sql --dry-runpython app.py --task=analyze --path=./sql_scripts --recursive --backuppython app.py --task=audit --path=query.sql --gitpython app.py --task=test --path=example.sql --dry-runpython app.py --task=nl_to_sql --path="list all patients diagnosed with diabetes last month" --sql_dialect="PostgreSQL" --schema_path="schema/schema.json" --dry-runpython app.py --task=nl_to_sql --path=queries/nl_query.txt --sql_dialect="T-SQL" --schema_path="schema/HealthClaimsDW.json" --output=output/generated_query.sqlpython app.py --task=benchmark --path=example.sql --dry-runpython app.py --task=visualize --path=example.sqlDetect dynamic SQL patterns and analyze risks/optimizations:
python app.py --task=dynamic_sql --path="queries/sample_query.sql" --detect_only --dry-runpython app.py --task=dynamic_sql --path="queries/" --recursiveThe sql_learn_mode.py script provides an interactive platform for learning SQL concepts through quizzes, practice, and conversational guidance. It leverages an AI client (BaseAIClient) for generating dynamic SQL content, such as quiz questions and feedback on queries.
python learn/sql_learn_mode.pyOptionally, you can Set as startup file from within VS Professional and click Start/F5 to run the console app. This will start the AI Agent.
Edit core/config_loader.py to match your Azure OpenAI deployment:
AOPAI_KEY = "your-api-key"
API_BASE = "https://your-resource.openai.azure.com/"
AOPAI_API_VERSION = "2025-01-01-preview"
AOPAI_DEPLOY_MODEL = "gpt-4o-dev"pip install -r requirements.txtAll prompts are defined in a single index.yaml file, which maps specific tasks to their associated prompt templates. This design enables:
- Consistent prompt formatting
- Easier version control and auditing
- Reusability across multiple modules
Each task class dynamically loads its associated prompt using metadata from this file.
commenter.add_comments:
inline: |
"You are a T-SQL expert. Given the SQL code below, please:
1. Prepend a comment header block with:
-- =============================================
-- Author: {user}
-- Create date: {timestamp}
-- Description: <Provide a detailed overview of this query>
-- =============================================
2. Add or improve inline comments throughout the query.
3. Only return the updated SQL code with no markdown formatting.
SQL Code:
{sql_query}"
used_by: tasks.sql_commenter.SQLCommenter
inputs:
- sql_query
- user
- timestamp
version: 1.0
description: Add comments and metadata headers to SQL queries.- Logs are stored per task under the
logs/directory - Safe use of T-SQL comments (
--,/* ... */) - Output sanitized for code-only results when using
--sanitize - Aligned with HIPAA/HITECH compliance standards
This project is licensed under the MIT License.
- Fork and open a PR
- Follow modular design and solid principles
- Ensure proper logging, error handling, and secure configuration
- Vision and engineering by Hans Esquivel
- Powered by Python & Azure OpenAI
- Thanks to the SQL and ML community for insights and best practices
Please open an issue or start a discussion if you want to get in touch.