After spending time building structured custom prompts and master instructions for Claude Projects, I started exploring Claude Skills last month: what they actually are and how they differ from prompts or projects.
Around the same time, I had been hearing a lot of buzz about Claude Cowork. I wanted to test whether it could handle real financial workflows, not just simple demos or lightweight use cases.
That led me to a more specific experiment. We already had a custom credit research workflow split into six steps, each with its own prompt. We converted those prompts into skills and built our first custom plugin. From there, I decided to build a fresh set of skills from scratch for a much more demanding use case: extracting financial statements, revenue breakdowns, and debt structures from emerging market filings.
The need was straightforward: I had to extract income statements, balance sheets, cash flow statements, debt schedules, and revenue breakdowns. Not for one company, but for hundreds. Emerging market companies, no less, where filings come in different formats, follow different standards, and occasionally throw curveballs that would make any analyst's eye twitch.

Why Skills? Why Not Just Prompt?

If you've used Claude or any LLM for financial work, you know the drill: upload a filing, prompt it, fix what it got wrong, repeat. The problem is not that it cannot do the work. The problem is that it does not remember how you want it done. Every new conversation starts from zero.

Skills fix this. A skill is a persistent set of instructions, a playbook Claude follows every time. Your domain knowledge, formatting preferences, edge cases, quality checks all live in one markdown file. Company #1 or company #200, same process.

When you're doing this across hundreds of filings, that consistency matters.

I broke the workflow into five extractions: Income Statement (IS), Balance Sheet (BS), Cash Flow Statement (CFS), Debt, and Revenue. Each became its own skill. One skill, one job, done well.

The Build Process

Claude has a skill to create skills. You don't need to write markdown files from scratch. You give it your idea: what you want, how you want the output, what the constraints are, any special requirements. It generates the skill for you.

Let me walk through how I built the IS extractor as an example, since the same process applied to all five skills.

Starting with the instructions. I described what I needed: a pure data extraction skill that takes input files (annual reports, quarterly reports, investor presentations, press releases, supplementary materials) and pulls income statement data into an Excel file with a tab called "IS." It must never derive or calculate anything. Pure extraction only.

I also specified the output structure. The spreadsheet needed to handle four period categories as columns: Annual, Quarterly, Semi-Annual, and 9 Months. Each figure gets placed where it logically belongs: Q3 2025 goes under Quarterly, and nine-month data for 2024 goes under 9 Months.
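To make the period rule concrete, here is a minimal Python sketch of the kind of mapping the skill is asked to apply. The four category names come from the output structure above. The label formats it recognizes (such as "Q3 2025", "H1 2025", "9M 2024", "FY2024") are my assumptions for illustration, since real filings label periods in many more ways.

```python
import re

# Period categories named in the post; the label patterns are assumptions.
CATEGORIES = ("Annual", "Quarterly", "Semi-Annual", "9 Months")

_PATTERNS = [
    (re.compile(r"^Q[1-4]\s*\d{4}$", re.IGNORECASE), "Quarterly"),
    (re.compile(r"^(H[12]|6M)\s*\d{4}$", re.IGNORECASE), "Semi-Annual"),
    (re.compile(r"^9M\s*\d{4}$", re.IGNORECASE), "9 Months"),
    (re.compile(r"^(FY\s*)?\d{4}$", re.IGNORECASE), "Annual"),
]


def classify_period(label: str) -> str:
    """Return the column group a period label belongs to."""
    cleaned = label.strip()
    for pattern, category in _PATTERNS:
        if pattern.match(cleaned):
            return category
    # Fail loudly rather than guess: a mislabeled period is a data error.
    raise ValueError(f"Unrecognized period label: {label!r}")


def group_periods(labels):
    """Group period labels by category, preserving input order."""
    grouped = {category: [] for category in CATEGORIES}
    for label in labels:
        grouped[classify_period(label)].append(label)
    return grouped
```

Raising on an unknown label is a deliberate choice in this sketch: for extraction work, an unplaced period is easier to fix than one silently filed under the wrong column group.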

The follow-up questions. After the initial instructions, Claude asked follow-up questions to make sure it understood the task before generating the skill. This back-and-forth catches ambiguity early rather than baking it into the skill.

Quick refinements. Once the first version was generated, I made a few additions. One that turned out to be really useful: I told it to add cell comments in the Excel output so I could trace where each number was pulled from. When you're working with a 200-page annual report, clicking on a cell and seeing "Page 47, Note 12" saves a lot of re-verification.
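The traceability idea can be sketched in a few lines of Python. This is not what the skill generates. It only illustrates one way to carry a source reference alongside each number and turn it into comment text. The `ExtractedValue` type, its field names, and the `cell_for` callback are hypothetical, and attaching the text to an actual workbook is left to whichever Excel library is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedValue:
    """One reported number plus where it was found (hypothetical structure)."""
    line_item: str
    period: str
    value: float
    page: int
    note: str = ""  # e.g. "Note 12"; empty when the filing cites no note


def source_comment(item: ExtractedValue) -> str:
    """Build the comment text shown on the cell, e.g. 'Page 47, Note 12'."""
    parts = [f"Page {item.page}"]
    if item.note:
        parts.append(item.note)
    return ", ".join(parts)


def comments_by_cell(items, cell_for):
    """Map each cell address to its source comment.

    `cell_for` is a caller-supplied function that returns the cell
    address (such as "B2") for an extracted value.
    """
    return {cell_for(item): source_comment(item) for item in items}
```

Keeping the page and note as separate fields, rather than one free-text string, makes it possible to sort or filter by page later when re-verifying a long report.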

Testing on real filings. I ran the skill against actual annual reports. The first run always shows you things: where your instructions were vague, where it made assumptions you didn't want, where the output needs tightening.

Fixing and retesting. This is where most of the iteration happens. You review the output, spot where something's off, maybe a line item is in the wrong column, or a period got mislabeled, and go back to tighten the instructions. Each round gets the skill closer to what you actually need.

Stress testing across formats. An annual report from a mining company in South Africa looks nothing like a quarterly report from a telecom in Southeast Asia. I ran each skill against different industries, geographies, and reporting standards to make sure it held up.

Locking it down. Once a skill was producing clean output consistently, I stopped tinkering. At some point you need to ship it and let real usage surface whatever's left.

Building the next skills got easier. This was the part I found interesting. Once I built the IS extractor and moved to the next skill in a new task, Claude already understood what I was going for. It carried over the context from the IS skill: the structure, the constraints, the output format. So building the BS, CFS, and revenue extractors went much faster. The exception was debt extraction, which was the trickiest and took the most time out of all five. How companies report their borrowings varies widely, and the level of detail and granularity I needed was high enough that it took significantly more iteration.

From Skills to Plugin

Once I had all five extraction skills working reliably, the next question was obvious: can I package these together?

That's where the plugin concept comes in. Similar to how Claude has a skill to create skills, you can create a plugin right on Cowork. A plugin in Cowork bundles multiple skills together so they can be installed and used as a unit. In theory, that was the scaling layer. It was meant to take the workflow from useful to operationally efficient.

My Emerging Market plugin bundled the IS, BS, CFS, Debt, and Revenue extraction skills together. A user could run them individually on one company, or use the batch-run setup to process multiple company filings in one go.

My Honest Assessment

One of the main reasons I built this workflow on Cowork was the idea that I could run these skills and the plugin on multiple companies together. That was the dream. Point it at 3 or 4 sets of filings and let it extract everything in one go.

It didn't work out that way.

I first tested the plugin with 5 companies' filings together. It failed badly. I scaled back to 3 companies. Still bad. And to be clear, I wasn't even giving it complete annual reports. I had already extracted just the relevant pages for each company: the financial statements, revenue breakdowns, and debt sections. I did this deliberately to keep the context and token usage down, because I didn't want to compromise on accuracy.

Then I tested the plugin with just one company. It did better, but still missed the cash flow statement numbers and wasn't consistent across multiple runs. Same filings, different output each time.
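One way to make "different output each time" measurable is to diff two runs on the same filings. The sketch below is a minimal Python illustration and assumes each run has already been loaded into a dictionary keyed by (line item, period). That representation and the function name are assumptions of mine, not part of the plugin or the skills.

```python
def diff_runs(run_a: dict, run_b: dict) -> list:
    """Compare two extraction runs keyed by (line_item, period).

    Returns a sorted list of (key, value_in_a, value_in_b) for every key
    whose value differs. A key missing from one run shows up as None.
    """
    differences = []
    for key in sorted(set(run_a) | set(run_b)):
        a = run_a.get(key)
        b = run_b.get(key)
        if a != b:
            differences.append((key, a, b))
    return differences
```

An empty list means the two runs agree on every cell. Missing cash flow lines would show up as entries with None on one side.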

So I went back to running each extraction skill individually on each company. And it worked. Accurate, consistent output every time.

That confirmed two things for me:

The extraction skills themselves are solid. They extract data the way I want, in the format I want. I don't need to revise them or do more work there.

Cowork plugins aren't ready yet for workflows where multiple files are in play and accuracy matters. When skills run sequentially through a plugin, key instructions tend to get overlooked. The more files and steps involved, the more the output drifts. For now, running skills individually gives you much better results.

This isn't a deal-breaker. The individual skills still save a huge amount of time. But if you're thinking of building a plugin to batch-process multiple companies at once, manage your expectations.

What I Learned Along the Way

Be specific about what the skill should NOT do. In this case, telling it “never calculate or derive” turned out to be just as important as telling it what to extract. When the task is structured data extraction, you want reported numbers only, nothing more.
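A cheap partial check on the "never calculate" rule is to scan the output for spreadsheet formulas. The Python sketch below assumes the sheet has been read into a list of rows of raw cell values, with formulas left as strings starting with "=". That input shape is an assumption. It only catches formulas written into cells, and it cannot detect a number that was derived and then written as a plain value, so it complements source tracing rather than replacing it.

```python
def find_calculated_cells(rows):
    """Return (row_index, col_index, value) for cells holding a formula.

    `rows` is a list of rows, each a list of raw cell values. A cell is
    flagged when it is a string that starts with "=" (ignoring leading
    whitespace). Indices are zero-based.
    """
    flagged = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if isinstance(value, str) and value.lstrip().startswith("="):
                flagged.append((r, c, value))
    return flagged
```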

Design the output first. I figured out the Excel structure (tab names, column headers, row layout) before worrying about extraction logic. When the skill knows where every piece of data should land, it makes fewer judgment calls.

Test on the ugliest filings first. If your skill can handle a messy PDF from a small-cap mining company, it can handle a clean annual report from a large-cap telco.

Where This Stands Now

Cowork is still at a very early stage. It'll hopefully get to a point where you can work on multiple filings with full accuracy, but it's not there yet.

One thing worth noting, though: Cowork gives you the liberty to work across different tasks simultaneously without opening multiple tabs. That alone makes it worth keeping an eye on as it matures.

In the meantime, I've adjusted my workflow. For data extraction, I've moved to running skills on the chat platform, executing two skills at a time per company. That's been the sweet spot - reliable output without the drift I was getting on Cowork with larger batches.
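For anyone tracking this kind of run schedule across many companies, here is a small Python sketch that lays out the runs: each company's five skills split into groups of at most two. The grouping order and the function names are assumptions for illustration. The sketch only plans the runs and does not call Claude.

```python
from itertools import islice

# The five extraction skills from the post, in the order they were listed.
SKILLS = ("IS", "BS", "CFS", "Debt", "Revenue")


def chunked(items, size):
    """Yield consecutive tuples of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = tuple(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def plan_runs(companies, skills=SKILLS, per_run=2):
    """Return (company, skills_in_this_run) pairs, one company at a time."""
    return [
        (company, chunk)
        for company in companies
        for chunk in chunked(skills, per_run)
    ]
```

With five skills and two per run, each company takes three runs, the last one holding a single skill.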

I do plan to go back to Cowork soon, probably to test it on qualitative research work where the accuracy bar is different. But for structured data extraction where every number has to be right, the chat-based workflow is what I'm sticking with for now.
