docs: Add "Don't Make Humans Be Clear" design principle + messy input simulations
5 realistic interactions showing how agent-skill-creator must work with
inarticulate, messy human input, not clean specifications:

1. The File Dump: analyst drags 5 files and types "here"
2. The URL Dump: half-sentence with 2 URLs and "same thing as wasde"
3. The Screenshot + Complaint: Paint-annotated Bloomberg screenshot and
   "this is ridiculous" (reveals the workflow was unnecessary; data
   already existed in Databricks)
4. The Forwarded Email: 6-message chain with legal disclaimers, agent
   extracts the one useful paragraph from Oliver in London
5. The One Word: analyst types "freight", agent infers from desk context,
   Databricks catalog, and colleague skills

Closes with 6 design principles: file interpretation over requirements
gathering, context inference, progressive refinement, discovery over
assumption, confirm don't interrogate, fail forward not fail safe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent 000b1008e6
commit 0663e3ef8a
1 changed file with 700 additions and 0 deletions
@@ -5771,3 +5771,703 @@ That's all agent-skill-creator ever needed to be. Not a developer
tool. Not an AI platform. Not a framework.

A door.


#############################################################
#############################################################
##                                                         ##
## DESIGN PRINCIPLE:                                       ##
## "DON'T MAKE HUMANS BE CLEAR"                            ##
##                                                         ##
## Real users don't write specifications. They dump files, ##
## paste URLs, forward emails, and say "make it work."     ##
## The agent must derive intent from messy context.        ##
##                                                         ##
#############################################################
#############################################################

THE PROBLEM WITH THE SIMULATIONS ABOVE:

Every simulation so far has a user who writes something like:

  "I need a skill that: (1) does X, (2) does Y, (3) handles Z,
  with format A for input and format B for output. Here are my
  requirements: [perfectly structured paragraph]."

That's fiction. Real users — especially non-technical analysts in
corporate environments — do NOT write clear specifications. They
can't. Not because they're lazy or stupid, but because:

- They don't know what they don't know (they can't specify edge
  cases they haven't thought about)
- They don't have the vocabulary (they say "the thing that shows
  the numbers" not "the S&D balance sheet pivot table")
- They're busy (they have 10 minutes between meetings, not an hour
  to write a requirements doc)
- The workflow lives in muscle memory (they can DO it, but they
  can't DESCRIBE it — like explaining how to ride a bicycle)
- They communicate in artifacts, not words (they'll show you the
  spreadsheet, not describe it)

agent-skill-creator must be designed to work with MESSY, INCOMPLETE,
INARTICULATE input. The agent's job is to derive intent from whatever
the human dumps on it — files, URLs, screenshots, half-sentences,
forwarded emails, "like this but different."

Here's what REAL interactions look like:


=============================================================
REAL INTERACTION 1: The File Dump
=============================================================

What the analyst ACTUALLY types:

Analyst: /agent-skill-creator here

And then drags 5 files into the chat:
- weekly-process.xlsx
- last-weeks-output.pdf
- usda-website-screenshot.png
- email-from-boss.eml
- old-script-that-half-works.py

That's it. No description. No requirements. Just "here."

What agent-skill-creator does:

Copilot (loading agent-skill-creator skill):

Let me understand what you need from these files...

Reading weekly-process.xlsx...
Reading last-weeks-output.pdf...
Viewing usda-website-screenshot.png...
Reading email-from-boss.eml...
Reading old-script-that-half-works.py...

OK, here's what I'm piecing together:

YOUR WORKFLOW (reconstructed from the files):

1. weekly-process.xlsx — This is your working spreadsheet. It has 6
   tabs: "Raw Data", "Pivot", "YoY Comparison", "Pace Calculator",
   "Charts", and "Summary". The "Raw Data" tab has USDA export
   inspection data pasted in (columns match the USDA FAS CSV
   format). The "Summary" tab is a formatted table that looks like
   it gets copy-pasted into a report.

2. last-weeks-output.pdf — This is what the final product looks
   like. It's a one-page PDF with the summary table from your
   Excel, plus two charts (cumulative exports by destination, and
   weekly pace vs. needed pace). Header says "Weekly Export
   Inspections Summary — Week Ending Feb 20, 2026."

3. usda-website-screenshot.png — This is a screenshot of
   apps.fas.usda.gov/export-sales/inspections.htm showing where
   you download the CSV. You circled the "Current Marketing Year"
   link in red.

4. email-from-boss.eml — From Henrik Johansson, Feb 24: "Can you
   make the weekly export summary include soymeal and soyoil too?
   And add a comparison to the 5-year average pace. Client asked
   for it. Need it by next Monday."

5. old-script-that-half-works.py — A 47-line Python script that
   downloads the USDA CSV and filters for soybeans. It has a
   comment on line 12: "# TODO: this breaks when USDA changes
   column names again" and another on line 38: "# Ana wrote this
   part, not sure what it does". The script stops after filtering —
   no pivot, no pace calculation, no output formatting.

MY UNDERSTANDING OF WHAT YOU NEED:

You manually produce a weekly export inspections report. The process
is: download CSV from USDA FAS → paste into Excel → filter → pivot
by destination → calculate YoY and pace → format summary → generate
charts → export to PDF. It takes some amount of time you'd rather
not spend.

Your boss just expanded the scope: add soymeal and soyoil, add
5-year average pace comparison. Deadline: next Monday.

Someone (you or Ana?) started writing a Python script to automate
the first part but it only does the CSV download and breaks
periodically.

IS THIS RIGHT? Before I build anything, let me confirm:

1. The output should look exactly like last-weeks-output.pdf but
   with soymeal + soyoil added (separate tables? or combined?)
2. "5-year average pace" = for each week of the marketing year,
   what was the average cumulative export percentage across the
   last 5 marketing years?
3. The marketing year for soybeans is Sep 1 – Aug 31. Is it the
   same for soymeal and soyoil?
4. The PDF goes to Henrik (your boss) — does it also go to clients
   directly, or does Henrik paste it into a larger report?

I only need quick answers — one word each is fine. I'll figure out
the rest from the files.

Analyst: combined table, yes, yes same, henrik pastes it into bigger report

Copilot:

Got it. Building...

[builds the skill — same quality as before, but derived entirely
from the 5 files, not from a written specification]
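
The "5-year average pace" confirmed in question 2 above can be sketched roughly as follows. This is a minimal illustration, assuming weekly volumes and a season total are available per marketing year; the function names and the sample numbers are hypothetical and are not taken from the analyst's files or from USDA data.

```python
from statistics import mean


def cumulative_pct(weekly_volumes, season_total):
    """Cumulative share of the season total shipped by each week, in percent."""
    running = 0.0
    curve = []
    for volume in weekly_volumes:
        running += volume
        curve.append(100.0 * running / season_total)
    return curve


def five_year_average_pace(history):
    """Average the cumulative-percentage curves week by week.

    history maps a marketing-year label to (weekly_volumes, season_total).
    """
    curves = [cumulative_pct(weekly, total) for weekly, total in history.values()]
    weeks = min(len(curve) for curve in curves)  # compare only weeks every year has
    return [mean(curve[week] for curve in curves) for week in range(weeks)]


# Illustrative numbers only, not USDA data.
history = {
    "2020/21": ([10, 10, 10], 100),
    "2021/22": ([20, 20, 20], 100),
    "2022/23": ([30, 30, 30], 100),
    "2023/24": ([10, 20, 30], 100),
    "2024/25": ([30, 20, 10], 100),
}
pace = five_year_average_pace(history)
print(pace)
```

The current year's cumulative percentage for a given week can then be set against the same index in `pace` to say whether exports are running ahead of or behind the 5-year average.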


=============================================================
REAL INTERACTION 2: The URL Dump + Half-Sentence
=============================================================

What the analyst ACTUALLY types:

Analyst: /agent-skill-creator i need something for this
https://apps.fas.usda.gov/psdonline/app/index.html#/app/advQuery
and also this
https://quickstats.nass.usda.gov/
same thing as what we do in the wasde extractor but for these two
maybe look at references/psd-api-docs.html too

What agent-skill-creator does:

Copilot (loading agent-skill-creator skill):

Let me look at what you shared...

Fetching https://apps.fas.usda.gov/psdonline/app/index.html...
→ This is the USDA PSD (Production, Supply & Distribution) online
  query tool. It has a web-based advanced query builder for global
  commodity S&D data.

Fetching https://quickstats.nass.usda.gov/...
→ This is USDA NASS Quick Stats — the statistical query tool for
  domestic agricultural data (acreage, yield, production, prices,
  by state/county/crop reporting district).

Reading references/psd-api-docs.html...
→ This is the FAS Open Data Web API documentation. There's a REST
  API at apps.fas.usda.gov/opendataweb/api/ that returns PSD data
  as JSON. Endpoints: commodities, countries, unitsOfMeasure,
  and the main query endpoint with commodity_code, country_code,
  market_year filters.

Looking at your existing wasde-extractor-skill for the pattern...
→ That skill: downloads USDA data → parses structured tables →
  compares to previous period → produces a change summary →
  exports to Excel/markdown.

HERE'S WHAT I THINK YOU WANT:

Two data sources (PSD and NASS Quick Stats) that you currently
query manually through web interfaces, turned into skills that
follow the same pattern as your WASDE extractor — download, parse,
compare, summarize.

SKILL A — PSD Balance Sheet Puller:
Instead of clicking through the PSD web query tool (selecting
commodity, country, market year, clicking "Get Data", exporting
CSV, then reformatting in Excel), you'd say:
  /psd-query-skill US soybeans 2025/26 balance sheet
And get a formatted S&D table with month-over-month revisions.

SKILL B — NASS Quick Stats Puller:
Instead of clicking through Quick Stats (selecting Program,
Sector, Group, Commodity, Category, Domain, Geographic Level...
it's 8 dropdown menus), you'd say:
  /nass-query-skill Iowa corn yield by county 2020-2025
And get a clean table.

Both would use APIs (PSD has the REST API you shared, NASS has a
public API at quickstats.nass.usda.gov/api) instead of scraping
web pages. More reliable, faster, and won't break when USDA
redesigns the website.

Is this the right direction? Or did you mean something different
by "same thing as the wasde extractor"?

Analyst: yes that plus can the psd one compare our internal balance to usda

Copilot:

Smart — so the PSD skill doesn't just pull USDA data, it also reads
your team's internal balance (from research.sd_balances on
Databricks, same as what the agdb-query-assistant uses) and shows
a side-by-side comparison: "Our estimate vs USDA" with the
differences highlighted.

That's actually three things in one:
1. Pull USDA PSD data via API
2. Pull internal estimate from Databricks
3. Show the gap and flag where you differ by more than X%

Building both skills...
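
Step 3 of the comparison above ("show the gap and flag where you differ by more than X%") might look something like this sketch. It assumes both balances have already been loaded into plain dictionaries keyed by balance-sheet item; the function name, the item names, the 5% default threshold, and the numbers are all hypothetical placeholders, since the source leaves X unspecified.

```python
def compare_balances(internal, usda, threshold_pct=5.0):
    """Side-by-side 'our estimate vs USDA' rows with a flag on large gaps.

    threshold_pct is an assumed default; the interaction only says "X%".
    """
    rows = []
    for item in sorted(internal.keys() & usda.keys()):
        ours, theirs = internal[item], usda[item]
        gap = ours - theirs
        # A percentage gap is undefined when the USDA figure is zero.
        gap_pct = 100.0 * gap / theirs if theirs else None
        rows.append({
            "item": item,
            "ours": ours,
            "usda": theirs,
            "gap": gap,
            "gap_pct": gap_pct,
            "flagged": gap_pct is not None and abs(gap_pct) > threshold_pct,
        })
    return rows


# Illustrative balance-sheet items, not real estimates.
internal = {"production": 120.0, "exports": 50.0, "ending_stocks": 9.0}
usda = {"production": 118.0, "exports": 50.0, "ending_stocks": 10.0}

rows = compare_balances(internal, usda)
for row in rows:
    marker = "  <-- differs" if row["flagged"] else ""
    print(f'{row["item"]:>14}: ours {row["ours"]:.1f} vs USDA {row["usda"]:.1f}{marker}')
```

Only items present in both balances are compared here; a real skill would probably also want to report items that exist on one side only.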


=============================================================
REAL INTERACTION 3: The Screenshot and a Complaint
=============================================================

What the analyst ACTUALLY types:

Analyst: /agent-skill-creator this is ridiculous

[drags in a screenshot of a Bloomberg terminal with a cluttered
Excel spreadsheet open next to it, marked up with red circles and
arrows drawn in MS Paint]

look at this mess. every day i manually type these bloomberg numbers
into this spreadsheet. there has to be a better way

What agent-skill-creator does:

Copilot (loading agent-skill-creator skill):

Viewing your screenshot...

I can see:
- LEFT SIDE: Bloomberg Terminal showing settlement prices for what
  looks like CBOT grain futures (corn, wheat, soybeans — I can read
  the tickers: C 1, W 1, S 1, SM 1, BO 1, plus some ICE contracts)
- RIGHT SIDE: An Excel spreadsheet with columns: Date, Contract,
  Settlement, Change, Volume, Open Interest. The data matches what's
  on the Bloomberg screen. Your red circles highlight the cells
  where you're typing today's numbers.
- The arrow goes from the Bloomberg price to the corresponding
  Excel cell.

You're manually transcribing settlement prices from Bloomberg into
Excel every day. I can see about 15 contracts in the spreadsheet
and 12 visible on the Bloomberg screen.

QUESTIONS (so I build the right thing):

1. How many contracts total do you track? (I can see ~15 in the
   spreadsheet, but there might be more below the visible area)
2. Is this ONLY settlement prices, or do you also record volume,
   open interest, and daily change?
3. Do you have Bloomberg API access (B-PIPE, SAPI, or DAPI), or
   are you limited to the Terminal screen?
4. Does your firm have a data feed that already captures these
   settlements? (Many firms store Bloomberg data in a database —
   you might not need to pull from Bloomberg at all if it's already
   in agri_lake on Databricks)

The answer to #4 could save us a LOT of complexity. If Rajesh's
data team already loads daily settlements into Databricks, the
skill just needs to query Databricks and format the spreadsheet.
No Bloomberg API needed.

Analyst: oh i didnt know databricks might have it. how do i check

Copilot:

Let me check for you right now.

/agdb-query-assistant-skill Are there daily settlement prices in
Databricks? Look for tables with settlement, price, or daily in
the name.

[querying Databricks catalog]

YES. Found it:

agri_lake.market.daily_settlements
- 2.4 million rows (goes back to 2018)
- Columns: price_date, instrument_code, settlement_price,
  daily_change, volume, open_interest, exchange
- Updated daily at 6:30pm CT (after CBOT close)
- Includes: CBOT (corn, wheat, soybeans, meal, oil), ICE (coffee,
  sugar, cotton, cocoa), MATIF (milling wheat, rapeseed)
- 48 contracts currently active

YOU ALREADY HAVE THIS DATA IN DATABRICKS. You don't need to type
it from Bloomberg at all. The data team loads it automatically
every evening.

So instead of a Bloomberg-scraping skill, what you need is:
1. A skill that queries agri_lake.market.daily_settlements
2. Formats the results to match your Excel layout
3. Writes to your spreadsheet (or replaces it entirely)

This is a 10-minute build, not a Bloomberg API project.

Want me to build it?

Analyst: omfg yes

Copilot (loading agent-skill-creator skill):

Building...

daily-settlements-skill/
├── SKILL.md
├── scripts/
│   ├── settlement_puller.py    # Query Databricks for latest
│   │                           # settlements by contract list
│   ├── excel_formatter.py      # Match the analyst's existing
│   │                           # spreadsheet layout exactly
│   └── change_highlighter.py   # Color-code moves > 2% (red/green)
├── references/
│   ├── contract-list.md        # The 15 contracts to track
│   └── methodology.md
├── assets/
│   └── contract_list.json      # Configurable contract list
├── install.sh
└── README.md

Installed at: ~/.agents/skills/daily-settlements-skill

This replaces 15 minutes of daily Bloomberg-to-Excel transcription
with a 5-second query. And you'll never have a typo again.

But the bigger insight: you didn't know the data was already in
Databricks. The skill didn't just automate your workflow — it
revealed that your workflow was UNNECESSARY. The daily manual
transcription was solving a problem that was already solved by
infrastructure you didn't know existed.

This happens more than you'd think.
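
As an illustration of the "color-code moves > 2% (red/green)" step listed for change_highlighter.py, here is a minimal sketch. It assumes the percentage move is measured against the previous settlement (settlement_price minus daily_change) and that green means up and red means down; neither convention is stated in the source, and the function name is hypothetical.

```python
def highlight_color(settlement_price, daily_change, threshold_pct=2.0):
    """Return "green", "red", or None for one row of daily_settlements.

    Assumption: the move is expressed relative to the previous settlement,
    which is reconstructed as settlement_price - daily_change.
    """
    previous = settlement_price - daily_change
    if previous <= 0:
        return None  # cannot express the move as a percentage
    move_pct = 100.0 * daily_change / previous
    if move_pct > threshold_pct:
        return "green"
    if move_pct < -threshold_pct:
        return "red"
    return None


# Illustrative prices only.
print(highlight_color(102.5, 2.5))   # up 2.5% from 100.0
print(highlight_color(97.0, -3.0))   # down 3.0% from 100.0
print(highlight_color(101.0, 1.0))   # up 1.0%, below the threshold
```

Keeping the threshold as a parameter leaves room for the analyst to say "2% is too noisy for wheat" later without touching the formatting code.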


=============================================================
REAL INTERACTION 4: The Forwarded Email + "Can We Do This?"
=============================================================

What the analyst ACTUALLY types:

Analyst: /agent-skill-creator my colleague from London sent me this. is this something we could build?

[pastes the full body of a forwarded email, including the original
headers, signatures, legal disclaimers, and a 6-message reply chain]

---------- Forwarded message ----------
From: James Whitfield <j.whitfield@globalagri.com>
To: Maria Santos <m.santos@globalagri.com>
Subject: FW: RE: RE: CONAB data automation

Maria,

See below — the London grains desk built something that pulls CONAB
crop estimates automatically when they're released. Apparently it
runs every time CONAB publishes (they don't have a fixed schedule,
it's usually 2nd or 3rd week of the month). The London team says
it's saved them hours. Think we could do something similar for our
Brazil soy coverage?

- James

---------- Original message ----------
From: Oliver Chang <o.chang@globalagri-london.com>
To: James Whitfield <j.whitfield@globalagri.com>
Subject: RE: CONAB data automation

James,

Yeah we've been running it since November. It monitors the CONAB
website (conab.gov.br/info-agro/safras/graos), detects when a new
Boletim de Safra PDF is posted, downloads it, extracts the Brazil
soy/corn/cotton production and area tables, converts from Portuguese,
compares to the previous month's boletim, and flags any revision
larger than 1 MMT. Output goes to our Bloomberg chat and a shared
drive.

We built it in Python — about 400 lines. The tricky part was parsing
the CONAB PDF because they change the table layout slightly every
few months and the tables have merged cells. We ended up using
Camelot for the table extraction. It breaks maybe once or twice a
year when CONAB does a major format change.

Happy to share the code but it's a bit messy and has our London
paths hardcoded everywhere.

- Oliver

---------- Earlier ----------
[3 more messages of scheduling back-and-forth, irrelevant to the
technical content]

DISCLAIMER: This email is confidential and intended solely for...
[200 words of legal boilerplate]

What agent-skill-creator does:

Copilot (loading agent-skill-creator skill):

Let me extract the useful information from this email chain...

IGNORING: scheduling messages (3), legal disclaimer, email headers,
signatures.

RELEVANT CONTENT (from Oliver Chang's email):

The London grains desk has a working Python script (~400 lines) that:
1. Monitors conab.gov.br for new Boletim de Safra PDFs
2. Downloads the PDF when detected
3. Extracts Brazil soy/corn/cotton production + area tables
4. Translates from Portuguese
5. Compares to previous month, flags revisions > 1 MMT
6. Outputs to Bloomberg chat + shared drive

Known issues:
- CONAB changes table layout periodically → breaks the PDF parser
- Code has London-specific hardcoded paths
- Built with Camelot (Python PDF table extraction library)
- Breaks ~1-2x/year on major format changes

WHAT YOU COULD DO:

Option A — Get Oliver's code, clean it up, wrap it as a skill
(faster if the code is solid, but "a bit messy" suggests fragile)

Option B — Build a new skill from scratch using the same approach
but with better PDF parsing resilience and your team's output
format (your format differs from London's — they post to Bloomberg
chat, you'd want it in your existing report pipeline)

Option C — Build a hybrid: use Oliver's proven CONAB monitoring
logic (website polling, URL pattern detection) but rebuild the
PDF extraction with a more resilient approach than Camelot
(possibly using LLM-based table extraction, which handles layout
changes much better than rule-based parsers)

My recommendation: Option C. The monitoring logic is
straightforward and Oliver has it working. The PDF parsing is the
fragile part — and that's where modern LLM-based extraction
(feeding the PDF page image to an LLM and asking it to extract
the table) would be much more resilient than Camelot's rule-based
approach.

Want me to build this? I can also look for Oliver's code in your
GitLab if it's there, or you can ask him to share it and drop it in
references/.

Analyst: dont have his code yet. just build it from what you know about conab. i can always refine later

Copilot:

Building from public knowledge of CONAB's publication patterns...

[builds conab-monitor-skill — with website poller, PDF download,
LLM-assisted table extraction, Portuguese→English translation,
revision tracking, and output to the team's existing pipeline]

The key design decision: instead of rigid Camelot table extraction
(which breaks on layout changes), the skill sends each PDF page to
the LLM with the prompt: "Extract the soybean production table
from this CONAB Boletim de Safra page. Return: state, area_ha,
yield_kg_ha, production_mt." This approach handles layout changes
gracefully because the LLM reads the page visually, not by pixel
coordinates.

When Oliver shares his code, we can cherry-pick his monitoring
logic (CONAB URL patterns, publication detection) and plug it in.
The skill is designed to accept improvements incrementally.
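
The revision-tracking step (compare to the previous month's boletim and flag any revision larger than 1 MMT) could be sketched as below. This assumes the extraction step has already produced per-crop production totals in million metric tons, and that "larger than 1 MMT" is meant in absolute terms; the function name and every number are hypothetical, not CONAB figures.

```python
def flag_revisions(previous, current, threshold_mmt=1.0):
    """Return {crop: revision_mmt} for crops revised by more than the threshold.

    previous and current map crop name -> production in MMT.
    Assumption: upward and downward revisions are both flagged.
    """
    flagged = {}
    for crop, new_value in current.items():
        if crop not in previous:
            continue  # nothing to compare against
        revision = round(new_value - previous[crop], 3)
        if abs(revision) > threshold_mmt:
            flagged[crop] = revision
    return flagged


# Illustrative figures only, not actual CONAB estimates.
previous_boletim = {"soy": 166.0, "corn": 120.5, "cotton": 3.7}
current_boletim = {"soy": 167.4, "corn": 120.0, "cotton": 3.7}

flagged = flag_revisions(previous_boletim, current_boletim)
print(flagged)
```

Because this step only needs two dictionaries, it stays unchanged whether the tables come from Camelot, from LLM-based extraction, or from Oliver's code once he shares it.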


=============================================================
REAL INTERACTION 5: The Absolute Minimum Viable Input
=============================================================

This is the most extreme case — and the most realistic. The analyst
provides almost nothing. The agent must work with what it has.

What the analyst ACTUALLY types:

Analyst: /agent-skill-creator freight

That's it. One word.

What agent-skill-creator does:

Copilot (loading agent-skill-creator skill):

"Freight" is broad. Let me figure out what you need based on your
context.

You're on the S&D desk at GlobalAgri Capital. In the context of
ag commodity research, "freight" likely means one of:

A. OCEAN FREIGHT RATES — tracking Panamax/Supramax bulk carrier
   rates on key grain routes (US Gulf to China, Santos to China,
   Black Sea to Egypt). You need this for: CIF price calculations,
   export competitiveness analysis, and logistics cost monitoring.

B. BARGE FREIGHT — US inland barge rates on the Mississippi/
   Illinois/Ohio rivers. You need this for: basis analysis (barge
   freight affects the spread between CBOT and Gulf FOB prices).

C. FREIGHT FUTURES — FFA (Forward Freight Agreement) curves and
   derivatives. You need this for: hedging logistics risk and
   forward pricing.

D. CONTAINER FREIGHT — rates for containerized ag products (dairy,
   meat, specialty crops). Less likely for your desk.

Which one? Or multiple?

Analyst: A. but i dont know where the data comes from. tomoko does this somehow

Copilot:

Let me check what Tomoko uses...

Looking at the daily-assessment-skill (Tomoko's)...
Found: in her normalization step, she uses a FOB-CIF differential
of $28.50/MT for the Santos→China route, sourced from "Baltic
Exchange via Refinitiv."

Looking at Databricks...
Found: agri_lake.shipping.freight_rates — daily ocean freight rates
by route, going back to 2019. Sources include Baltic Exchange
(Panamax, Supramax indices) and broker indications.

Columns: rate_date, route_code, vessel_class, rate_usd_mt,
rate_worldscale, index_value, source.

Routes available:
- USG_CN (US Gulf → China): Panamax
- SANTOS_CN (Santos → China): Panamax
- SANTOS_EU (Santos → EU): Supramax
- BSEA_EGY (Black Sea → Egypt): Handysize
- BSEA_CN (Black Sea → China): Panamax
- PNW_CN (Pacific NW → China): Panamax
- ARG_CN (Up River → China): Panamax
- 12 more routes...

So the data already exists. You need a skill that:
1. Pulls freight rates from Databricks
2. Shows you the current rate by route
3. Tracks trends (is freight rising or falling?)
4. Calculates what the freight cost means for export competitiveness
   (e.g., "at current freight, US Gulf soybeans to China cost $X/MT
   delivered vs Santos soybeans at $Y/MT — Brazil has a $Z/MT
   logistics advantage")

That's a freight competitiveness tool, not just a freight tracker.
It tells the S&D team which origin is cheapest to ship from on any
given day — which directly affects trade flow forecasts.

Want me to build this?

Analyst: ya

Copilot:

Building...

[builds freight-competitiveness-skill from one word of input + the
analyst's desk context + Databricks schema discovery + Tomoko's
existing skill as a reference point]
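
The competitiveness calculation in step 4 above can be approximated with a few lines. This sketch assumes delivered cost is simply FOB price plus ocean freight per metric ton (insurance and other CIF components are ignored), uses the route codes from the table listing as dictionary keys, and makes up every price; the function names are hypothetical.

```python
def delivered_costs(fob_usd_mt, freight_usd_mt):
    """Delivered cost per route = FOB price + ocean freight (simplified)."""
    return {
        route: fob_usd_mt[route] + freight_usd_mt[route]
        for route in fob_usd_mt
        if route in freight_usd_mt
    }


def cheapest_origin(costs):
    """Return (best_route, advantage_usd_mt over the next-cheapest route)."""
    ranked = sorted(costs.items(), key=lambda pair: pair[1])
    best, runner_up = ranked[0], ranked[1]
    return best[0], runner_up[1] - best[1]


# Illustrative prices only; real inputs would come from freight_rates
# plus whatever FOB price source the desk already uses.
fob = {"USG_CN": 420.0, "SANTOS_CN": 400.0}
freight = {"USG_CN": 52.0, "SANTOS_CN": 38.0}

costs = delivered_costs(fob, freight)
route, advantage = cheapest_origin(costs)
print(f"{route} is cheapest delivered, by ${advantage:.2f}/MT")
```

`cheapest_origin` expects at least two routes; a real skill would need to handle the single-route case and stale rates explicitly.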


=============================================================
DESIGN IMPLICATIONS FOR AGENT-SKILL-CREATOR
=============================================================

These interactions reveal what agent-skill-creator must actually
be good at:

1. FILE INTERPRETATION, NOT REQUIREMENTS GATHERING

The primary input is ARTIFACTS, not words:
- Excel files (reverse-engineer the workflow from the sheet structure)
- PDFs (understand what the output should look like)
- Screenshots (read what's on the screen, including Bloomberg)
- Code files (understand what already exists, what's broken)
- Emails (extract the actual request from the noise)
- URLs (fetch and understand the data source)

The agent must be able to look at an Excel workbook with 6 tabs
and reconstruct the entire workflow: "Tab 1 is raw input, Tab 2 is
a pivot of Tab 1 filtered by X, Tab 3 compares Tab 2 to a baseline
in Tab 4, Tab 5 generates charts from Tab 3, Tab 6 is the formatted
output." The human will never describe this. The spreadsheet IS the
specification.
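
One way to picture the first pass over a pile of artifacts is a simple mapping from artifact type to the role it plays as evidence. The sketch below is only an illustration of the list above: the role labels and the suffix table are assumptions, and a real agent would inspect file contents rather than trust extensions.

```python
from pathlib import PurePath

# Hypothetical mapping from file suffix to the role the artifact plays.
ROLE_BY_SUFFIX = {
    ".xlsx": "workflow",        # reverse-engineer steps from sheet structure
    ".pdf": "target output",    # what the finished product should look like
    ".png": "screen capture",   # read what is on the screen
    ".eml": "request",          # extract the actual ask from the noise
    ".py": "existing code",     # what already exists, what is broken
}


def classify_artifacts(items):
    """Assign each dropped file or pasted URL a provisional role."""
    roles = {}
    for item in items:
        if item.startswith(("http://", "https://")):
            roles[item] = "data source"
        else:
            suffix = PurePath(item).suffix.lower()
            roles[item] = ROLE_BY_SUFFIX.get(suffix, "unknown")
    return roles


dump = [
    "weekly-process.xlsx",
    "last-weeks-output.pdf",
    "usda-website-screenshot.png",
    "email-from-boss.eml",
    "old-script-that-half-works.py",
    "https://quickstats.nass.usda.gov/",
]
roles = classify_artifacts(dump)
for name, role in roles.items():
    print(f"{role:>15}: {name}")
```

Anything that lands in "unknown" is a prompt for the agent to open the file and look, not a reason to ask the human what it is.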

2. CONTEXT INFERENCE, NOT EXPLICIT REQUIREMENTS

When someone types "freight" the agent should already know:
- What desk they're on (S&D, modelling, price assessment)
- What commodities they cover (grains, oilseeds, softs)
- What data sources are already available (Databricks, Bloomberg,
  colleagues' existing skills)
- What output format their team uses (Excel, PDF, markdown)

The agent doesn't ask an open-ended "what do you mean by freight?".
It presents the most likely interpretations given the context and
lets the human pick or confirm.

3. PROGRESSIVE REFINEMENT, NOT UPFRONT SPECIFICATION

The interaction pattern should be:
- Human dumps messy input (files, URLs, one word, screenshot)
- Agent reconstructs intent and presents its understanding
- Human corrects with minimal effort ("yes", "no the other one",
  "also add X")
- Agent builds a first version
- Human uses it, finds gaps, says "also do Y"
- Agent refines

This is how humans naturally communicate. They don't write specs.
They iterate. The agent must be comfortable building from 60%
understanding and refining, rather than demanding 100%
understanding before starting.

4. DISCOVERY OVER ASSUMPTION

The analyst manually typing Bloomberg prices into Excel didn't know
the data was already in Databricks. The analyst who said "freight"
didn't know where the freight data came from, even though Tomoko's
skill already used it and Databricks already stored it. The agent's
job is not just to build what was asked, but to DISCOVER what
already exists and connect the dots.

Before building, always check:
- Is this data already in Databricks? (query the catalog)
- Has a colleague already built a skill for this? (check the
  team's GitLab skill library)
- Is there an API for this data source? (check before scraping)
- Does the existing workflow have a simpler solution? (maybe the
  entire workflow is unnecessary)
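
The first check in the list above (is this data already in the catalog?) can be sketched as a keyword search over table names, in the spirit of "look for tables with settlement, price, or daily in the name". The function name is hypothetical, and the list of table names is assumed to have been fetched from the catalog beforehand by whatever query mechanism the team uses.

```python
def find_candidate_tables(table_names, keywords):
    """Return (table_name, matched_keywords) pairs, best matches first."""
    hits = []
    for name in table_names:
        lowered = name.lower()
        matched = [word for word in keywords if word.lower() in lowered]
        if matched:
            hits.append((name, matched))
    # Tables matching more keywords are more likely to be the right one.
    return sorted(hits, key=lambda hit: -len(hit[1]))


# Table names taken from the interactions above.
catalog = [
    "agri_lake.market.daily_settlements",
    "agri_lake.shipping.freight_rates",
    "research.sd_balances",
]

candidates = find_candidate_tables(catalog, ["settlement", "price", "daily"])
print(candidates)
```

A name match is only a lead: the agent would still need to inspect the columns and freshness of each candidate before telling the analyst the data already exists.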

5. CONFIRM, DON'T INTERROGATE

BAD: "Please provide: (1) the data source URL, (2) the update
frequency, (3) the output format, (4) the distribution list..."
(This is a requirements form. Nobody fills these out.)

GOOD: "From your files, it looks like you download data from USDA
every Monday, pivot it by destination, and send Henrik a PDF.
Right?"
(This is a confirmation. One word to accept.)

The agent should do the work of understanding and present a
hypothesis. The human's job is to confirm or correct — not to
specify from scratch.

6. FAIL FORWARD, NOT FAIL SAFE

When the agent has 60% understanding, it should build the 60% skill
and let the human see the output. The human will immediately say
"this is wrong, it should be X" — which is MUCH easier than asking
the human to specify X from nothing. Seeing a wrong output is the
fastest way for a non-technical person to articulate what they
actually want.

The worst thing the agent can do is ask 15 clarifying questions
before building anything. By question 5, the human has given up
and gone back to their spreadsheet.

SUMMARY:

agent-skill-creator should treat human input as EVIDENCE, not
INSTRUCTIONS. The files, URLs, screenshots, half-sentences, and
single words are all evidence of what the human needs. The agent's
job is forensic — reconstruct the intent from the evidence, present
a hypothesis, confirm, build, and refine.

The human is never the bottleneck. The agent is always the one doing
the work of understanding.