docs: Add internal database query assistant skill simulation

Skill 18: agdb-query-assistant-skill — non-technical analysts ask
questions in plain English, get safe read-only SQL with explanation
before execution. Covers onboarding (new hire learning the schema),
simple queries (trading volume), complex business logic (crush margin
with unit conversions), follow-ups (12-month trends), and security
guardrails (client PII redaction, audit logging, query validation).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Francy Lisboa Charuto 2026-03-04 17:59:35 -03:00
parent a70b5a5859
commit 8a9166a225

@@ -2781,8 +2781,9 @@ FINAL TEAM SKILL LIBRARY — FULL DATA PIPELINE STACK
WORKFLOW SKILLS (process automation):
publication-compliance-skill Internal style/legal/citation rules Pre-submission compliance check
email-thread-intel-skill Email threads (.eml/.msg/raw text) Intent extraction & action items
agdb-query-assistant-skill Internal PostgreSQL data warehouse Natural language → safe SQL queries
18 skills. 8 data sources. One team.
19 skills. 9 data sources. One team.
The junior analyst who joined last month can type:
/noaa-crop-monitor-skill Morning scan
@@ -3821,3 +3822,777 @@ The ROI is immediate and measurable:
The skill pays for itself the first time it catches a scope change
you would have missed, or surfaces an overdue action item that was
about to become a client complaint.
=============================================================
SKILL 18: Internal Database Query Assistant — Onboarding
Non-Technical Analysts to Company Data
=============================================================
This is arguably the most common enterprise problem: the company has
a rich internal database (or data warehouse, or lakehouse) with years
of proprietary data — trade records, position history, client
portfolios, internal forecasts, broker contacts, shipping logs. The
data team built it, documented it (partially), and gave everyone
read-only credentials. The problem? Most analysts can't write SQL.
They depend on the data team for every ad-hoc query, wait 1-3 days
for a response, and often get back something different from what they
asked for because the request was ambiguous. This skill turns the
database documentation and schema into a query-writing assistant that
knows the company's specific tables, naming conventions, gotchas, and
business logic — so non-technical analysts can get answers in seconds
instead of days.
User: /agent-skill-creator Our firm has an internal PostgreSQL data warehouse called "agdb" that contains 15 years of proprietary commodity data — trade executions, position history, client order flow, internal S&D model outputs, broker price indications, shipping fixtures, and warehouse stock reports. The data team documented it in a Confluence-style wiki that I've exported here: references/agdb-schema-docs.pdf (142 pages — every table, every column, every relationship). The problem is nobody outside the data team knows how to query it. I'm one of 12 commodity analysts who have read-only access (via a connection string in env as AGDB_READ_DSN) but I don't really know SQL. When I need data, I send a Slack message to the data team and wait 1-3 days. Sometimes they misunderstand what I need and I wait another day for a revision. I want a skill that: (1) understands our entire schema — all 87 tables, the relationships, the naming conventions, the business logic embedded in the column definitions, (2) lets me ask questions in plain English and generates safe, read-only SQL queries, (3) EXPLAINS what the query does before running it so I can verify it's what I meant, (4) runs the query and formats the results into a clean table or chart, (5) knows our firm-specific conventions — like that "position" means net position (long minus short), that our fiscal year starts April 1, that prices in the trades table are in cents per bushel for grains but dollars per metric tonne for oilseeds, and (6) has guardrails so I can't accidentally run something destructive or pull restricted client data. I also want my colleagues to use this — some know basic SQL, some know zero. Here's a sample of common queries the data team runs for us: references/common-queries-catalog.pdf (their top 50 most-requested queries with SQL).
Copilot (loading agent-skill-creator skill):
Reading references/agdb-schema-docs.pdf (142 pages)...
Reading references/common-queries-catalog.pdf...
This is a schema-aware query generation skill with safety guardrails.
The 142-page schema doc is the gold — it contains the institutional
knowledge that makes queries correct (fiscal year conventions, unit
differences, net position logic). The top-50 query catalog is the
training set — these are proven-correct queries I can decompose into
patterns. Let me build this.
Implicit requirements from your documents:
SCHEMA OVERVIEW (from agdb-schema-docs.pdf):
- 87 tables across 8 schemas:
trading.* — 14 tables (executions, positions, orders, fills)
market.* — 11 tables (prices, curves, settlements, vol surfaces)
research.* — 9 tables (forecasts, model outputs, S&D balances)
shipping.* — 12 tables (fixtures, vessel tracking, freight rates)
warehouse.* — 8 tables (stocks, receipts, deliveries, locations)
clients.* — 10 tables (accounts, portfolios, order history)
reference.* — 15 tables (instruments, exchanges, brokers, regions)
audit.* — 8 tables (change logs, access logs, query logs)
- Naming convention: snake_case everywhere, e.g. trading.daily_positions,
market.settlement_prices
- Primary keys: all tables use surrogate integer IDs (<table>_id, e.g. instrument_id)
- Foreign keys: instrument_id links to reference.instruments across
all schemas. client_id links to clients.accounts
- Temporal columns: trade_date (date), created_at (timestamp with tz),
updated_at (timestamp). Most analytical queries use trade_date
- Soft deletes: is_deleted boolean on most tables (must filter!)
FIRM-SPECIFIC CONVENTIONS (critical for correct queries):
- "Position" = net position (long_qty - short_qty), NOT gross
- Fiscal year: April 1 to March 31 (FY2026 = Apr 2025 to Mar 2026)
- Crop year: varies by commodity:
US corn/soy: Sep 1 to Aug 31
US wheat: Jun 1 to May 31
Brazil soy: Feb 1 to Jan 31
Coffee: Oct 1 to Sep 30
- Price units in trading.executions:
Grains (corn, wheat, oats): cents per bushel (divide by 100 for $/bu)
Oilseeds (soybeans, canola): cents per bushel
Oilseed products (meal, oil): dollars per short ton (meal), cents
per pound (oil)
Coffee: cents per pound
Sugar: cents per pound
Cotton: cents per pound
Freight: $/MT (worldscale for tankers)
- When analysts say "P&L", they mean realized + unrealized, marked to
the prior day's settlement (market.settlement_prices)
- The research.forecasts table contains both published and draft
forecasts. Always filter: status = 'published' unless explicitly
looking at drafts
- The clients.* schema is RESTRICTED: only client_id, region, and
segment columns are accessible to analysts. Personal details
(name, contact, AUM) require compliance approval
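The fiscal-year convention above is easy to get wrong by one year, so here is a minimal sketch of how a date could be mapped to its FY label and to a filter range. The function names are hypothetical and are not taken from the skill's actual scripts; only the April 1 start and the FY2026 example come from the conventions listed above.

```python
from datetime import date

FISCAL_YEAR_START_MONTH = 4  # April 1, per the firm convention listed above


def fiscal_year(d: date) -> int:
    """Return the FY label for a date (FY2026 covers Apr 2025 to Mar 2026)."""
    return d.year + 1 if d.month >= FISCAL_YEAR_START_MONTH else d.year


def fiscal_year_bounds(fy: int) -> tuple:
    """Return (start inclusive, end exclusive), matching `>=` / `<` date filters."""
    start = date(fy - 1, FISCAL_YEAR_START_MONTH, 1)
    end = date(fy, FISCAL_YEAR_START_MONTH, 1)
    return start, end


if __name__ == "__main__":
    print(fiscal_year(date(2026, 3, 4)))
    print(fiscal_year_bounds(2026))
```

Returning an exclusive end date keeps the generated SQL in the same `trade_date >= ... AND trade_date < ...` shape used by the queries in this simulation.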
SAFETY REQUIREMENTS:
- READ-ONLY: connection uses a read-only PostgreSQL role. But the
skill should also validate queries before execution — no INSERT,
UPDATE, DELETE, DROP, TRUNCATE, ALTER, CREATE, GRANT
- QUERY LIMITS: auto-append LIMIT 10000 to any query without an
explicit LIMIT (caps the result set so nobody accidentally pulls
every row of a 500M+ row table)
- CLIENT DATA REDACTION: if a query touches the clients.* schema,
automatically exclude the name, email, phone, address, and AUM
columns and return client_id only
- AUDIT LOGGING: log every query executed (query text, user, timestamp,
row count) to a local file for the data team to review
- EXPLAIN BEFORE RUN: always show the generated SQL and a plain-English
explanation before executing. User must confirm or modify
- TIMEOUT: 30-second query timeout. If exceeded, suggest adding
filters or asking the data team for help
- NO JOINS ON AUDIT.*: the audit schema is for compliance only — never
join it into analytical queries
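Several of the rules above (read-only enforcement, LIMIT injection, audit-schema blocking) can be sketched as one pre-execution check. This is an illustrative sketch only, not the contents of query_validator.py: the names `validate_query`, `BLOCKED_KEYWORDS`, and `DEFAULT_LIMIT` are hypothetical, and the check is a coarse text match rather than a SQL parser, so the read-only database role remains the real enforcement.

```python
import re

# Keyword list taken from the READ-ONLY rule above.
BLOCKED_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP",
                    "TRUNCATE", "ALTER", "CREATE", "GRANT")
DEFAULT_LIMIT = 10000


def validate_query(sql: str) -> str:
    """Return a safe version of `sql`, or raise ValueError if a rule is broken."""
    stripped = sql.strip().rstrip(";").strip()
    # Conservative: any remaining semicolon is treated as a second statement.
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?i)^(SELECT|WITH)\b", stripped):
        raise ValueError("only SELECT / WITH queries are allowed")
    for keyword in BLOCKED_KEYWORDS:
        # \b keeps column names such as updated_at or is_deleted from matching.
        if re.search(rf"(?i)\b{keyword}\b", stripped):
            raise ValueError(f"blocked keyword: {keyword}")
    if re.search(r"(?i)\baudit\.", stripped):
        raise ValueError("the audit schema cannot be used in analytical queries")
    # Append a LIMIT only when the query does not already end with one.
    if not re.search(r"(?i)\bLIMIT\s+\d+(\s+OFFSET\s+\d+)?\s*$", stripped):
        stripped += f"\nLIMIT {DEFAULT_LIMIT}"
    return stripped + ";"


if __name__ == "__main__":
    print(validate_query("SELECT trade_qty FROM trading.executions WHERE is_deleted = FALSE;"))
```

A text-level check like this will reject some harmless queries (for example, a semicolon inside a string literal), which is usually the safer failure mode for a tool aimed at non-technical users.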
Building...
agdb-query-assistant-skill/
├── SKILL.md
├── scripts/
│ ├── query_generator.py # Natural language → SQL translation
│ │ # using schema context + conventions
│ ├── schema_loader.py # Parse schema docs into structured
│ │ # metadata (tables, columns, types,
│ │ # relationships, conventions)
│ ├── query_validator.py # Safety checks: read-only enforcement,
│ │ # LIMIT injection, client data
│ │ # redaction, audit schema blocking
│ ├── query_explainer.py # SQL → plain English explanation
│ │ # ("This query finds all soybean
│ │ # trades from the last 30 days...")
│ ├── query_executor.py # Run query via AGDB_READ_DSN, handle
│ │ # timeouts, format results
│ ├── result_formatter.py # Output as table, CSV, chart, or
│ │ # summary depending on result shape
│ ├── unit_converter.py # Auto-convert cents/bu to $/bu,
│ │ # $/short ton to $/MT, etc. based
│ │ # on commodity context
│ ├── common_queries.py # Pre-built query templates from the
│ │ # top-50 catalog (parameterized)
│ └── audit_logger.py # Log all executed queries locally
├── references/
│ ├── schema-summary.md # Human-readable schema overview
│ │ # (condensed from 142-page PDF)
│ ├── table-catalog.md # All 87 tables with descriptions
│ ├── column-conventions.md # Price units, date conventions,
│ │ # fiscal year, crop year logic
│ ├── relationship-map.md # Foreign key graph (what joins what)
│ ├── common-patterns.md # Top-50 query patterns decomposed
│ ├── restricted-tables.md # Client data restrictions, audit
│ │ # schema rules
│ ├── gotchas.md # is_deleted filter, draft forecasts,
│ │ # timezone handling, null semantics
│ └── methodology.md
├── assets/
│ ├── schema_metadata.json # Machine-readable schema (87 tables,
│ │ # ~1,200 columns, all FK relationships)
│ ├── unit_map.json # Commodity → price unit mapping
│ ├── crop_year_map.json # Commodity → crop year start month
│ ├── fiscal_calendar.json # FY mapping (FY2026 = Apr 2025 to Mar 2026)
│ ├── restricted_columns.json # Columns that require redaction
│ ├── query_templates.json # Parameterized top-50 queries
│ └── sql_blocklist.json # Prohibited SQL keywords/patterns
├── install.sh
└── README.md
Validation: PASSED
Security: PASSED (AGDB_READ_DSN read from env, read-only enforced,
client data redacted, audit logging enabled)
Installed at: ~/.agents/skills/agdb-query-assistant-skill
Shared to: github.com/acme-trading/agdb-query-assistant-skill
--- A non-technical analyst asks their first question ---
Analyst (no SQL experience): /agdb-query-assistant-skill What was our total soybean trading volume last month?
Copilot (loading agdb-query-assistant-skill):
Loading schema context for agdb...
I'll query the trading.executions table for soybean trades in
February 2026. A few things I need to account for:
- "Soybeans" maps to instrument_group = 'soybeans' in
reference.instruments (includes CBOT soybeans, soybean meal,
soybean oil). Did you mean just soybeans, or the whole complex?
- "Volume" could mean: number of contracts (trade_qty), notional
value (trade_qty × price × multiplier), or metric tonnes
(trade_qty × contract_size_mt)
- "Last month" = February 2026 (Feb 1-28)
I'll assume you mean the soybeans complex (beans + meal + oil),
measured in contracts, for February 2026. Let me know if you
meant something different.
Here's the query I'd run:
┌─────────────────────────────────────────────────────────┐
│ SELECT │
│ i.instrument_name, │
│ COUNT(*) AS num_trades, │
│ SUM(e.trade_qty) AS total_contracts, │
│ SUM(e.trade_qty * i.contract_size_mt) AS total_mt │
│ FROM trading.executions e │
│ JOIN reference.instruments i │
│ ON e.instrument_id = i.instrument_id │
│ WHERE i.instrument_group = 'soybeans' │
│ AND e.trade_date >= '2026-02-01' │
│ AND e.trade_date < '2026-03-01' │
│ AND e.is_deleted = FALSE │
│ GROUP BY i.instrument_name │
│ ORDER BY total_contracts DESC; │
└─────────────────────────────────────────────────────────┘
PLAIN ENGLISH: "Find all soybean complex trades (soybeans, meal, oil)
executed in February 2026 that haven't been deleted. Count the number
of trades and sum the contracts and metric tonnes, broken down by
specific instrument."
SAFETY CHECK: ✓ Read-only (SELECT only)
SAFETY CHECK: ✓ No client data accessed
SAFETY CHECK: ✓ is_deleted filter applied
SAFETY CHECK: ✓ Date range bounded (won't scan full history)
Run this query? [Yes / Modify / Cancel]
Analyst: Yes
Copilot:
Running query... (0.3 seconds, 4 rows returned)
SOYBEAN COMPLEX TRADING VOLUME — February 2026
| Instrument | Trades | Contracts | Metric Tonnes |
|-------------------|--------|-----------|---------------|
| CBOT Soybeans | 342 | 18,450 | 2,510,676 |
| CBOT Soybean Meal | 198 | 9,200 | 834,440 |
| CBOT Soybean Oil | 156 | 7,800 | 212,160 |
| DCE Soybeans No.1 | 28 | 1,400 | 14,000 |
| TOTAL | 724 | 36,850 | 3,571,276 |
Notes:
- Metric tonnes are calculated from contract size (CBOT soybeans =
136.08 MT/contract, CBOT meal = 100 ST ≈ 90.7 MT, CBOT oil =
60,000 lbs ≈ 27.2 MT per contract, DCE = 10 MT/contract)
- This includes both buy and sell sides. Our NET volume (buys minus
sells) would require a different query — let me know if you want
that instead
- DCE soybeans volume is low — that's normal, we rarely trade
Dalian directly
Query logged to audit/query-log-2026-03-04.csv
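The per-contract tonnage figures quoted in the notes above can be derived from the contract specifications. The sketch below is illustrative: the dictionary and function names are hypothetical, the 60 lb soybean bushel is the standard US weight (it is not stated in this document), and in the real skill the factor would presumably come from the contract_size_mt column rather than from code.

```python
LB_TO_KG = 0.45359237        # exact definition of the avoirdupois pound
SOYBEAN_LB_PER_BUSHEL = 60   # standard US bushel weight (assumption, see lead-in)
SHORT_TON_LB = 2000


def lb_to_mt(pounds: float) -> float:
    """Convert pounds to metric tonnes."""
    return pounds * LB_TO_KG / 1000.0


# Contract sizes as described in the notes above.
CONTRACT_SIZE_MT = {
    "CBOT Soybeans": lb_to_mt(5000 * SOYBEAN_LB_PER_BUSHEL),  # 5,000 bushels
    "CBOT Soybean Meal": lb_to_mt(100 * SHORT_TON_LB),        # 100 short tons
    "CBOT Soybean Oil": lb_to_mt(60000),                      # 60,000 lb
    "DCE Soybeans No.1": 10.0,                                # already in MT
}


def contracts_to_mt(instrument: str, contracts: int) -> float:
    """Tonnage represented by a number of contracts of one instrument."""
    return contracts * CONTRACT_SIZE_MT[instrument]


if __name__ == "__main__":
    for name, size in CONTRACT_SIZE_MT.items():
        print(f"{name}: {size:.2f} MT per contract")
```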
--- The analyst digs deeper with a follow-up ---
Analyst: /agdb-query-assistant-skill How does that compare to the same month last year? And show me the trend for the last 12 months.
Copilot (loading agdb-query-assistant-skill):
Good follow-up. I'll expand the date range to Mar 2025-Feb 2026
(12 months) and include Feb 2025 for the year-over-year comparison.
┌─────────────────────────────────────────────────────────┐
│ SELECT │
│ DATE_TRUNC('month', e.trade_date) AS month, │
│ SUM(e.trade_qty) AS total_contracts, │
│ SUM(e.trade_qty * i.contract_size_mt) AS total_mt, │
│ COUNT(*) AS num_trades │
│ FROM trading.executions e │
│ JOIN reference.instruments i │
│ ON e.instrument_id = i.instrument_id │
│ WHERE i.instrument_group = 'soybeans' │
│ AND e.trade_date >= '2025-02-01' │
│ AND e.trade_date < '2026-03-01' │
│ AND e.is_deleted = FALSE │
│ GROUP BY DATE_TRUNC('month', e.trade_date) │
│ ORDER BY month; │
└─────────────────────────────────────────────────────────┘
PLAIN ENGLISH: "Sum soybean complex contracts by month for the last
13 months (Feb 2025 through Feb 2026) to show the trend and enable
a year-over-year comparison."
Run this query? [Yes / Modify / Cancel]
Analyst: Yes
Copilot:
Running query... (0.8 seconds, 13 rows returned)
SOYBEAN COMPLEX — MONTHLY TRADING VOLUME (contracts)
| Month | Contracts | vs Prev Month | Trades |
|----------|-----------|---------------|--------|
| Feb 2025 | 31,200 | | 612 |
| Mar 2025 | 34,800 | +11.5% | 688 |
| Apr 2025 | 29,400 | -15.5% | 571 |
| May 2025 | 33,100 | +12.6% | 649 |
| Jun 2025 | 38,200 | +15.4% | 741 |
| Jul 2025 | 42,500 | +11.3% | 822 |
| Aug 2025 | 35,600 | -16.2% | 694 |
| Sep 2025 | 30,200 | -15.2% | 589 |
| Oct 2025 | 28,800 | -4.6% | 562 |
| Nov 2025 | 32,400 | +12.5% | 634 |
| Dec 2025 | 26,100 | -19.4% | 508 |
| Jan 2026 | 34,500 | +32.2% | 672 |
| Feb 2026 | 36,850 | +6.8% | 724 |
YEAR-OVER-YEAR: Feb 2026 (36,850) vs Feb 2025 (31,200) = +18.1%
TREND OBSERVATIONS:
- Volume peaked in July 2025 (42,500 contracts) — coincides with
the Argentina drought scare that drove CBOT to $13.40
- December 2025 was the low point (26,100): typical calendar
year-end position flattening (our fiscal year runs to March 31)
- January-February 2026 rebound (+32% and +7%) likely reflects
renewed speculative activity around the current Argentina drought
- YoY growth of 18% suggests the desk is more active this year,
possibly due to higher volatility creating more trading opportunities
[Chart generated: reports/soy-volume-trend-12m.png]
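The "vs Prev Month" column and the year-over-year figure above are plain percentage changes. A minimal sketch of that arithmetic, using a few rows from the table (the function names are hypothetical):

```python
from typing import List, Optional, Tuple


def pct_change(current: float, previous: float) -> float:
    """Percentage change from `previous` to `current`, rounded to one decimal."""
    return round((current - previous) / previous * 100.0, 1)


def month_over_month(series: List[Tuple[str, int]]) -> List[Tuple[str, int, Optional[float]]]:
    """Attach the change vs the prior row; the first row has no prior month."""
    out = []
    for idx, (label, value) in enumerate(series):
        change = None if idx == 0 else pct_change(value, series[idx - 1][1])
        out.append((label, value, change))
    return out


# First three rows of the table above.
contracts = [("Feb 2025", 31_200), ("Mar 2025", 34_800), ("Apr 2025", 29_400)]
rows = month_over_month(contracts)

# Year-over-year: Feb 2026 vs Feb 2025.
yoy = pct_change(36_850, 31_200)

if __name__ == "__main__":
    for row in rows:
        print(row)
    print("YoY:", yoy)
```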
--- A more experienced analyst writes a complex question ---
Senior Analyst: /agdb-query-assistant-skill I need to see our net soybean position by trader, broken down by contract month, as of close of business yesterday. Include the average entry price and the mark-to-market P&L using yesterday's settlement. I need this for the weekly risk meeting.
Copilot (loading agdb-query-assistant-skill):
This requires joining 3 tables: trading.daily_positions (net position
by trader and instrument), reference.instruments (contract details),
and market.settlement_prices (yesterday's marks). The P&L calculation
uses the firm's convention: (settlement_price - avg_entry_price) ×
net_qty × contract_multiplier. A positive result is a profit and a
negative result a loss; because net_qty is negative for shorts, a
price decline shows up as a profit on a short position.
Important conventions:
- Prices in trading.daily_positions.avg_entry_price are in CENTS per
bushel (firm convention for grains). Settlement prices in
market.settlement_prices are also in cents per bushel.
- The P&L output should be in DOLLARS (not cents). So the formula
includes a /100 conversion.
- CBOT soybean contract multiplier = 5,000 bushels
- I need to handle both outright and spread positions. The
daily_positions table has a position_type column ('outright' or
'spread') — I'll include both but flag them.
┌─────────────────────────────────────────────────────────┐
│ SELECT │
│ p.trader_name, │
│ i.instrument_name, │
│ i.contract_month, │
│ p.position_type, │
│ p.net_qty AS net_contracts, │
│ ROUND(p.avg_entry_price / 100.0, 4) AS avg_entry_usd,│
│ ROUND(s.settlement_price / 100.0, 4) AS settle_usd, │
│ ROUND( │
│ (s.settlement_price - p.avg_entry_price) │
│ * p.net_qty * i.contract_multiplier │
│ / 100.0, 2 │
│ ) AS mtm_pnl_usd │
│ FROM trading.daily_positions p │
│ JOIN reference.instruments i │
│ ON p.instrument_id = i.instrument_id │
│ JOIN market.settlement_prices s │
│ ON p.instrument_id = s.instrument_id │
│ AND s.price_date = '2026-03-03' │
│ WHERE i.instrument_group = 'soybeans' │
│ AND p.position_date = '2026-03-03' │
│ AND p.is_deleted = FALSE │
│ AND p.net_qty != 0 │
│ ORDER BY p.trader_name, i.contract_month; │
└─────────────────────────────────────────────────────────┘
PLAIN ENGLISH: "For each trader, show their net soybean position
by contract month as of March 3, 2026. Convert prices from cents
to dollars. Calculate mark-to-market P&L by comparing each trader's
average entry price to yesterday's settlement, multiplied by their
net contracts and the contract multiplier (5,000 bushels). Exclude
flat positions (net_qty = 0) and deleted records."
SAFETY CHECK: ✓ Read-only
SAFETY CHECK: ✓ No client data
SAFETY CHECK: ✓ is_deleted filtered
SAFETY CHECK: ✓ Single date (no full-table scan)
Run this query? [Yes / Modify / Cancel]
Senior Analyst: Yes
Copilot:
Running query... (0.4 seconds, 18 rows returned)
NET SOYBEAN POSITION BY TRADER — COB March 3, 2026
| Trader | Contract | Type | Net Lots | Avg Entry | Settle | MTM P&L ($) |
|--------------|------------|----------|----------|-----------|---------|----------------|
| Chen, David | May-26 | outright | +340 | $10.8200 | $10.9450| +$212,500.00 |
| Chen, David | Jul-26 | outright | +180 | $10.9600 | $11.0800| +$108,000.00 |
| Chen, David | May/Jul | spread | -120 | -$0.1400 | -$0.1350| -$3,000.00 |
| Lopez, Ana | May-26 | outright | -200 | $11.0400 | $10.9450| +$95,000.00 |
| Lopez, Ana | Nov-26 | outright | +150 | $10.4200 | $10.5600| +$105,000.00 |
| Patel, Raj | May-26 | outright | +420 | $10.7800 | $10.9450| +$346,500.00 |
| Patel, Raj | Jul-26 | outright | +280 | $10.8400 | $11.0800| +$336,000.00 |
| Patel, Raj | May/Jul | spread | +200 | -$0.0600 | -$0.1350| -$75,000.00 |
| ... | ... | ... | ... | ... | ... | ... |
DESK SUMMARY:
| Trader | Net Outrights | Gross Lots | Total MTM P&L |
|--------------|---------------|------------|----------------|
| Chen, David | +400 | 640 | +$317,500 |
| Lopez, Ana | -50 | 350 | +$200,000 |
| Patel, Raj | +500 | 900 | +$607,500 |
| DESK TOTAL | +850 | 1,890 | +$1,125,000 |
NOTES FOR YOUR RISK MEETING:
- Desk is net long 850 soybean contracts (~115,670 MT). This is
within the 1,000-contract desk limit per the risk policy
- Patel has the largest position (500 net lots, $608K MTM). His
May/Jul spread is underwater (-$75K) while outrights are up big
- Lopez is the only trader net short (-50 lots). Her short May
position is profitable — she sold near the high
- If March WASDE (tomorrow) surprises bearish, the desk's +850 net
long is exposed. A $0.20/bu decline = -$850K to the desk P&L
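The mark-to-market arithmetic behind this exchange can be checked by hand. The sketch below applies the stated convention, (settlement - entry) × net_qty × multiplier / 100, to three rows of the table and to the $0.20/bu stress scenario. The function name is hypothetical, and prices are passed in cents per bushel as stored in the database.

```python
CONTRACT_MULTIPLIER_BU = 5000  # CBOT soybeans, per the conventions above


def mtm_pnl_usd(settle_cents: float, entry_cents: float, net_qty: int,
                multiplier: int = CONTRACT_MULTIPLIER_BU) -> float:
    """Mark-to-market P&L in dollars; net_qty is negative for short positions."""
    return round((settle_cents - entry_cents) * net_qty * multiplier / 100.0, 2)


# Chen, May-26 outright: long 340 at 1082.00, settle 1094.50
chen_may = mtm_pnl_usd(1094.50, 1082.00, 340)
# Lopez, May-26 outright: short 200 at 1104.00, settle 1094.50
lopez_may = mtm_pnl_usd(1094.50, 1104.00, -200)
# Patel, May/Jul spread: long 200 at -6.00, settle -13.50
patel_spread = mtm_pnl_usd(-13.50, -6.00, 200)
# Stress test: a 20-cent decline against the desk's +850 net lots
stress = mtm_pnl_usd(-20.0, 0.0, 850)

if __name__ == "__main__":
    print(chen_may, lopez_may, patel_spread, stress)
```

Because net_qty carries the sign, the same expression handles longs, shorts, and spreads without a separate branch.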
--- A brand-new hire who has never seen the database ---
New Analyst (first week): /agdb-query-assistant-skill I just joined the team. I'm supposed to update the weekly S&D balance sheet but I don't know where the data lives. Where are the production, consumption, and trade numbers? And how are they organized?
Copilot (loading agdb-query-assistant-skill):
Welcome aboard! The S&D data lives in the research.* schema. Let me
walk you through the tables you'll need for the weekly balance sheet.
YOUR KEY TABLES:
1. research.sd_balances — The main S&D table
This is where the team's published balance sheets live. One row
per commodity × country × crop year × balance item × forecast date.
Key columns:
- commodity (e.g. 'soybeans', 'corn', 'wheat')
- country (ISO 3166-1 alpha-2: 'US', 'BR', 'AR', 'CN')
- crop_year (e.g. '2025/26' — note: slash format, not dash)
- item (e.g. 'production', 'crush', 'exports', 'imports',
'feed_use', 'ending_stocks', 'beginning_stocks')
- value_mt (metric tonnes — ALL values are in MT, never bushels)
- forecast_date (date the estimate was published internally)
- status ('published' or 'draft' — always filter for
'published' unless James tells you otherwise)
- analyst_id (who created this estimate)
2. research.sd_history — Historical revisions
Every time someone updates a balance sheet number, the old value
is preserved here. Useful for tracking how your team's estimates
evolved over time (what did we think in January vs what we think
now).
3. research.model_outputs — The quantitative model results
The quant team runs regression models for yield, demand, etc.
Their outputs land here. The analysts then review and adjust
these before publishing to sd_balances.
4. reference.crop_years — Crop year calendar
Maps crop_year strings to actual date ranges by commodity.
Critical because '2025/26' means different dates for corn
(Sep 1, 2025 to Aug 31, 2026) vs wheat (Jun 1, 2025 to May 31, 2026).
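To make the crop-year point concrete, here is a minimal sketch of how a '2025/26' label could be expanded into dates. The dictionary keys and function name are hypothetical, and in the database this mapping lives in reference.crop_years, so the code only illustrates the logic. It covers the US commodities, where the label's first year is the start year.

```python
from datetime import date, timedelta

# Start months from the firm conventions (US commodities only in this sketch).
CROP_YEAR_START_MONTH = {
    "us_corn": 9,
    "us_soybeans": 9,
    "us_wheat": 6,
}


def crop_year_range(commodity: str, crop_year: str) -> tuple:
    """Expand a slash-format label such as '2025/26' into (first day, last day)."""
    start_year = int(crop_year.split("/")[0])
    month = CROP_YEAR_START_MONTH[commodity]
    start = date(start_year, month, 1)
    # The crop year ends the day before the next one starts.
    end = date(start_year + 1, month, 1) - timedelta(days=1)
    return start, end


if __name__ == "__main__":
    print(crop_year_range("us_corn", "2025/26"))
    print(crop_year_range("us_wheat", "2025/26"))
```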
THE WEEKLY BALANCE SHEET WORKFLOW:
Your weekly task is to:
1. Pull the LATEST published balance for each commodity
2. Compare to the PREVIOUS week's balance
3. Note what changed and why
4. Publish in the weekly report
Here's the query to get the latest US soybeans balance sheet:
┌─────────────────────────────────────────────────────────┐
│ SELECT │
│ b.item, │
│ ROUND(b.value_mt / 1000000.0, 2) AS value_mmt, │
│ b.forecast_date, │
│ a.analyst_name │
│ FROM research.sd_balances b │
│ JOIN reference.analysts a │
│ ON b.analyst_id = a.analyst_id │
│ WHERE b.commodity = 'soybeans' │
│ AND b.country = 'US' │
│ AND b.crop_year = '2025/26' │
│ AND b.status = 'published' │
│ AND b.forecast_date = ( │
│ SELECT MAX(forecast_date) │
│ FROM research.sd_balances │
│ WHERE commodity = 'soybeans' │
│ AND country = 'US' │
│ AND crop_year = '2025/26' │
│ AND status = 'published' │
│ ) │
│ ORDER BY │
│ CASE b.item │
│ WHEN 'beginning_stocks' THEN 1 │
│ WHEN 'production' THEN 2 │
│ WHEN 'imports' THEN 3 │
│ WHEN 'total_supply' THEN 4 │
│ WHEN 'crush' THEN 5 │
│ WHEN 'exports' THEN 6 │
│ WHEN 'feed_use' THEN 7 │
│ WHEN 'residual' THEN 8 │
│ WHEN 'total_use' THEN 9 │
│ WHEN 'ending_stocks' THEN 10 │
│ END; │
└─────────────────────────────────────────────────────────┘
PLAIN ENGLISH: "Get the most recently published US soybeans 2025/26
balance sheet, with each line item in million metric tonnes, ordered
in standard S&D format (supply items first, then demand, then stocks)."
Run this query? [Yes / Modify / Cancel]
New Analyst: Yes
Copilot:
Running query... (0.2 seconds, 10 rows returned)
US SOYBEANS 2025/26 BALANCE SHEET (latest internal estimate)
Published: March 3, 2026 | Analyst: Maria Santos
| Item | MMT |
|-------------------|---------|
| Beginning Stocks | 9.52 |
| Production | 118.84 |
| Imports | 0.68 |
| TOTAL SUPPLY | 129.04 |
| Crush | 63.50 |
| Exports | 49.67 |
| Feed & Residual | 4.90 |
| Seed & Other | 5.22 |
| TOTAL USE | 123.29 |
| ENDING STOCKS | 5.75 |
This is YOUR TEAM'S estimate, not USDA's. USDA's February WASDE
had ending stocks at 10.34 MMT (about 380 million bushels). Your team
is MUCH tighter at 5.75 MMT — which is why the desk is net long.
The difference is mainly in exports: your team has 49.67 MMT vs
USDA's 47.90 MMT.
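Because sd_balances stores everything in metric tonnes while US figures are often quoted in bushels, a small conversion helper is handy when comparing the two. This is a sketch under the assumption of the standard 60 lb soybean bushel; the function names are hypothetical.

```python
LB_TO_KG = 0.45359237        # exact definition of the pound
SOYBEAN_LB_PER_BUSHEL = 60   # standard US soybean bushel weight (assumption)

# Bushels in one metric tonne of soybeans (about 36.74).
BUSHELS_PER_MT = 1000.0 / (SOYBEAN_LB_PER_BUSHEL * LB_TO_KG)


def mmt_to_million_bushels(mmt: float) -> float:
    """Million metric tonnes to million bushels (soybeans only)."""
    return mmt * BUSHELS_PER_MT


def million_bushels_to_mmt(million_bu: float) -> float:
    """Million bushels to million metric tonnes (soybeans only)."""
    return million_bu / BUSHELS_PER_MT


if __name__ == "__main__":
    # The team's ending-stocks estimate from the table above.
    print(round(mmt_to_million_bushels(5.75), 1))
```

The factor is commodity-specific: corn uses a 56 lb bushel, so the same helper would need a different constant there.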
USEFUL COMMANDS FOR YOUR FIRST WEEK:
Ask me:
- "Show me how our estimate changed over the last 4 weeks" (revision
tracking from sd_history)
- "Compare our balance to USDA" (if USDA data is in market.external_sd)
- "What's the Brazil balance sheet?" (same query, country = 'BR')
- "Who last updated the corn balance?" (analyst_id + forecast_date)
- "Show me all commodities we cover" (distinct commodities in
sd_balances)
I'll generate the SQL, explain it, and run it — you don't need to
learn SQL. But I'll show you every query so you start to recognize
the patterns. Most analysts pick up enough SQL to modify queries
within 2-3 weeks of using this skill.
--- An analyst asks something the skill should refuse ---
Analyst: /agdb-query-assistant-skill Show me the full client list with their AUM and contact details. I need to send out the weekly report.
Copilot (loading agdb-query-assistant-skill):
⛔ RESTRICTED DATA REQUEST
The clients.accounts table contains personally identifiable
information (name, email, phone, address) and sensitive financial
data (AUM, portfolio details) that are classified as RESTRICTED
under the firm's data governance policy.
Your read-only database role can see:
- client_id (anonymized identifier)
- region (e.g., 'APAC', 'EMEA', 'Americas')
- segment (e.g., 'institutional', 'corporate', 'retail')
For client names, contact details, or AUM data, you need to:
1. Contact the Sales/CRM team — they manage the distribution list
2. Use the firm's CRM system (Salesforce) for contact details
3. If you need client data for analytical purposes, request
compliance approval through the data governance portal
I CANNOT run a query that exposes client PII, even with your
read-only credentials. This is a firm policy, not a technical
limitation — the columns exist in the database but the skill is
configured to redact them.
WHAT I CAN DO: If you need aggregate client metrics for research
purposes (e.g., "how many institutional clients are in APAC?"),
I can query that without exposing individual records:
┌─────────────────────────────────────────────────────────┐
│ SELECT │
│ region, │
│ segment, │
│ COUNT(*) AS client_count │
│ FROM clients.accounts │
│ WHERE is_active = TRUE │
│ AND is_deleted = FALSE │
│ GROUP BY region, segment │
│ ORDER BY region, segment; │
└─────────────────────────────────────────────────────────┘
Would this help, or do you need the full contact list from Sales?
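The refusal above depends on the skill noticing restricted columns before anything runs. A minimal sketch of such a check follows. It assumes the restricted columns are literally named name, email, phone, address, and aum, which this document does not confirm, and it uses simple token matching rather than a SQL parser. The function and constant names are hypothetical.

```python
import re

# Assumed column names; the real list would live in restricted_columns.json.
RESTRICTED_CLIENT_COLUMNS = {"name", "email", "phone", "address", "aum"}


def restricted_columns_used(sql: str) -> set:
    """Return restricted identifiers referenced by a query on clients.*."""
    if not re.search(r"(?i)\bclients\.", sql):
        return set()  # the query does not touch the clients schema
    if re.search(r"(?i)\bSELECT\s+\*", sql):
        return {"*"}  # SELECT * would expose every column
    tokens = set(re.findall(r"[a-z_]+", sql.lower()))
    return tokens & RESTRICTED_CLIENT_COLUMNS


AGGREGATE_QUERY = """
SELECT region, segment, COUNT(*) AS client_count
FROM clients.accounts
WHERE is_active = TRUE AND is_deleted = FALSE
GROUP BY region, segment
ORDER BY region, segment;
"""

if __name__ == "__main__":
    print(restricted_columns_used(AGGREGATE_QUERY))
    print(restricted_columns_used("SELECT name, email FROM clients.accounts"))
```

An allow-list (permit only client_id, region, and segment) would be stricter than this block-list and is arguably the better default if column names change over time.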
--- An analyst asks a question that requires business logic ---
Analyst: /agdb-query-assistant-skill What's our soybean crush margin right now? And how does it compare to the 5-year average for this time of year?
Copilot (loading agdb-query-assistant-skill):
Crush margin is a CALCULATED value, not stored directly. It requires
the firm's specific formula (from the schema docs, section 14.3):
Board Crush Margin ($/bu) =
(Soybean Meal price × 0.022) + (Soybean Oil price × 11.0 / 100)
- Soybean price
Where:
- Soybean price: $/bu from market.settlement_prices (CBOT S 1)
- Soybean Meal price: $/short ton from market.settlement_prices (CBOT SM 1)
- Soybean Oil price: ¢/lb from market.settlement_prices (CBOT BO 1)
- The 0.022 converts $/short ton to $/bu for meal (yield: 44 lbs
meal per bushel, 1 short ton = 2000 lbs, so 1 bu → 0.022 tons)
- The 11.0 converts ¢/lb to $/bu for oil (yield: 11 lbs oil per
bushel, price in cents → 11 lbs × price / 100)
⚠️ IMPORTANT: The prices in our database are stored in DIFFERENT
units per commodity:
- Soybeans: cents/bu → divide by 100 for $/bu
- Meal: dollars/short ton (already in $)
- Oil: cents/lb → multiply by 11.0 (lbs of oil per bushel), then
divide by 100 to convert cents to dollars
┌─────────────────────────────────────────────────────────┐
│ WITH latest AS ( │
│ SELECT │
│ MAX(price_date) AS pd │
│ FROM market.settlement_prices │
│ WHERE price_date <= CURRENT_DATE │
│ ), │
│ prices AS ( │
│ SELECT │
│ i.instrument_code, │
│ s.settlement_price, │
│ s.price_date │
│ FROM market.settlement_prices s │
│ JOIN reference.instruments i │
│ ON s.instrument_id = i.instrument_id │
│ CROSS JOIN latest l │
│ WHERE s.price_date = l.pd │
│ AND i.instrument_code IN ('S_1', 'SM_1', 'BO_1') │
│ ), │
│ margin AS ( │
│ SELECT │
│ p_s.price_date, │
│ ROUND(p_s.settlement_price / 100.0, 4) AS soy_usd, │
│ ROUND(p_sm.settlement_price, 2) AS meal_usd_ton, │
│ ROUND(p_bo.settlement_price, 4) AS oil_cents_lb, │
│ ROUND( │
│ (p_sm.settlement_price * 0.022) │
│ + (p_bo.settlement_price * 11.0 / 100.0) │
│ - (p_s.settlement_price / 100.0) │
│ , 4) AS crush_margin_usd │
│ FROM prices p_s │
│ JOIN prices p_sm ON p_sm.instrument_code = 'SM_1' │
│ JOIN prices p_bo ON p_bo.instrument_code = 'BO_1' │
│ WHERE p_s.instrument_code = 'S_1' │
│ ) │
│ SELECT * FROM margin; │
└─────────────────────────────────────────────────────────┘
PLAIN ENGLISH: "Get the most recent settlement prices (yesterday's
close) for soybeans, soybean meal, and soybean oil. Calculate the
board crush margin using the firm's standard formula. Convert all
prices to consistent units (dollars per bushel for the final margin)."
Run this query? [Yes / Modify / Cancel]
Analyst: Yes, and also the 5-year average
Copilot:
Running current margin query... (0.2 seconds)
Running 5-year seasonal average... (1.4 seconds)
SOYBEAN CRUSH MARGIN — March 3, 2026
CURRENT:
| Component | Price | Contribution |
|---------------|-------------------|---------------|
| Soybeans (S1) | $10.9450/bu | -$10.9450 |
| Meal (SM1) | $312.40/short ton | +$6.8728 |
| Oil (BO1) | 45.82¢/lb | +$5.0402 |
| CRUSH MARGIN | | +$0.9680/bu |
5-YEAR SEASONAL COMPARISON (same calendar week):
| Year | Crush Margin ($/bu) | Notes |
|------|---------------------|--------------------------------|
| 2022 | $2.1240 | Post-COVID demand boom |
| 2023 | $1.5680 | Still elevated |
| 2024 | $1.0420 | Normalizing |
| 2025 | $0.8860 | Below average |
| 2026 | $0.9680 | Current |
| 5yr Avg | $1.3176 | |
ASSESSMENT:
Current crush margin of $0.97/bu is 27% BELOW the 5-year average
of $1.32/bu for early March. The margin has been below average since
mid-2024, driven by weak soybean oil demand (RD mandate uncertainty)
and adequate meal supplies.
For context: crush margins below $0.80/bu are considered "crush
discouraging" — processors start reducing run rates. At $0.97 we're
above that threshold but not by much. If margins compress further,
expect crush pace to slow in Q2, which is supportive for meal and
oil prices and bearish for soybeans.
This is the kind of answer that would have taken the data team
2 days to produce. The business logic (the crush formula, the unit
conversions, the seasonal comparison) is encoded in the skill. You
asked in plain English and got the answer in 3 seconds.
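For readers who want to check the crush arithmetic by hand, the calculation can be sketched as a small function. It takes prices in the units the database stores them (soybeans in cents/bu, meal in $/short ton, oil in cents/lb) and uses the 44 lb meal and 11 lb oil yields given above. The function name is hypothetical.

```python
MEAL_YIELD_LB_PER_BU = 44.0
OIL_YIELD_LB_PER_BU = 11.0
SHORT_TON_LB = 2000.0


def board_crush_margin(soy_cents_bu: float, meal_usd_short_ton: float,
                       oil_cents_lb: float) -> float:
    """Board crush margin in dollars per bushel."""
    meal_value = meal_usd_short_ton * MEAL_YIELD_LB_PER_BU / SHORT_TON_LB  # x 0.022
    oil_value = oil_cents_lb * OIL_YIELD_LB_PER_BU / 100.0                 # cents to $
    soy_cost = soy_cents_bu / 100.0                                        # cents to $
    return round(meal_value + oil_value - soy_cost, 4)


# Settlement prices from the result table above.
current = board_crush_margin(1094.50, 312.40, 45.82)

# Same-week margins for 2022-2026, as listed in the seasonal comparison.
history = [2.1240, 1.5680, 1.0420, 0.8860, 0.9680]
five_year_avg = round(sum(history) / len(history), 4)
pct_below_avg = round((1 - current / five_year_avg) * 100)

if __name__ == "__main__":
    print(current, five_year_avg, pct_below_avg)
```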
=============================================================
WHY THIS SKILL WORKS
=============================================================
The internal database skill solves three problems simultaneously:
1. THE BOTTLENECK PROBLEM
Most firms have only two or three people who know the database well. They're the
bottleneck for every analyst's data request. When the senior data
engineer goes on vacation, the whole team slows down. The skill
distributes database knowledge to everyone with read access.
WITHOUT THE SKILL:
- Analyst sends Slack message to data team: "Can you pull...?"
- Data team triages: 1-3 day wait (they have 40 requests in queue)
- Data team delivers: sometimes wrong interpretation
- Analyst requests revision: another 1-2 days
- Total: 2-5 business days for one data pull
WITH THE SKILL:
- Analyst asks in plain English: 10 seconds
- Skill generates, explains, and runs query: 30 seconds
- If wrong interpretation, analyst asks again: 30 seconds
- Total: under 2 minutes
2. THE ONBOARDING PROBLEM
A new hire with read-only database access but no SQL knowledge is
essentially locked out of the firm's most valuable asset — 15 years
of proprietary data. The skill turns the 142-page schema document
into an interactive guide. The new analyst from the simulation was
productive on DAY ONE, pulling balance sheets and understanding
table relationships without a 3-week SQL training course.
3. THE INSTITUTIONAL KNOWLEDGE PROBLEM
The business logic (crush margin formula, fiscal year convention,
crop year definitions, price unit conversions) exists in three
places: the schema docs (which nobody reads), senior analysts' heads
(which leave when they do), and the data team's query history (which
nobody outside the team can access). The skill codifies ALL of this
into reusable, version-controlled reference files. When the firm
changes its fiscal year convention, you update one JSON file.
SAFETY IS THE FEATURE, NOT THE CONSTRAINT:
The skill's guardrails (read-only validation, client data redaction,
query explanation before execution, audit logging) aren't limitations
— they're what make it trustworthy enough to give to non-technical
analysts. The data team WANTS analysts to self-serve, but they don't
want someone accidentally running SELECT * on a 500M-row table or
pulling client PII into an email attachment. The skill makes self-
service safe.
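Of the guardrails listed here, audit logging is the simplest to illustrate. The sketch below appends one CSV row per executed query (timestamp, user, row count, query text) to a dated file, mirroring the query-log-YYYY-MM-DD.csv name seen earlier in the simulation. The function name, column order, and use of UTC are assumptions, not the skill's actual audit_logger.py.

```python
import csv
import getpass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def log_query(log_dir: str, sql: str, row_count: int,
              user: Optional[str] = None) -> Path:
    """Append one audit row to today's log file and return the file path."""
    now = datetime.now(timezone.utc)
    path = Path(log_dir) / f"query-log-{now:%Y-%m-%d}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Write the header once per daily file.
            writer.writerow(["timestamp_utc", "user", "row_count", "query"])
        writer.writerow([now.isoformat(), user or getpass.getuser(), row_count, sql])
    return path


if __name__ == "__main__":
    print(log_query("audit", "SELECT 1;", 1, user="demo_analyst"))
```

Using the csv module rather than manual string joins keeps multi-line SQL and embedded commas from corrupting the log.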