I Let an AI Write My GIS Workflow. Here’s What Broke.

There’s a growing narrative that AI can write code for you.

That’s true.

What’s more interesting is what happens after the code is written.

I’ve been experimenting with integrating AI directly into ArcGIS Pro workflows. The idea is simple:

Describe what you want → get working geoprocessing code.

In practice, it looks like this:

  • I type: “Select all points within 1 mile of schools and summarize by district”
  • The system generates Python (ArcPy)
  • The code runs inside a real project
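
A hypothetical sketch of the kind of ArcPy the system generates for that prompt. The geoprocessing tools are real; the layer names, field names, and output table are illustrative, and the function only runs inside an ArcGIS Pro Python environment.

```python
# Illustrative sketch of generated code for:
# "Select all points within 1 mile of schools and summarize by district"
def summarize_points_near_schools(points_fc, schools_fc, out_table):
    import arcpy  # deferred: arcpy is only available where ArcGIS is installed
    # Select points within 1 mile of any school...
    arcpy.management.SelectLayerByLocation(
        points_fc, "WITHIN_A_DISTANCE", schools_fc, "1 Miles", "NEW_SELECTION"
    )
    # ...then count the selected points per district.
    arcpy.analysis.Statistics(
        points_fc, out_table, [["OBJECTID", "COUNT"]], "DISTRICT"
    )
```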

And sometimes… it works perfectly.

Other times, it breaks in ways that are surprisingly consistent.

Where things actually break

1. The “almost right” problem

AI is very good at generating code that looks correct.

It’s much worse at generating code that:

  • uses the correct coordinate system
  • handles edge cases in real datasets
  • respects schema constraints

Example:

  • It buffers in degrees instead of meters, because the layer is in a geographic coordinate system
  • Or it assumes a field exists that doesn’t

This is dangerous because:

The output looks valid, but the result is wrong.
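
Both failure modes can be caught before anything runs. Here is a minimal pre-flight check; `layer_info` is a stand-in dict, and inside ArcGIS Pro the same facts would come from `arcpy.Describe` (spatial reference) and `arcpy.ListFields`. All names are illustrative.

```python
# A minimal pre-flight check for the two "almost right" failure modes above.
def preflight(layer_info, required_fields, linear_op=True):
    """Return a list of problems that would make generated code 'almost right'."""
    problems = []
    # Geographic coordinate systems measure in degrees, so a "1 mile" buffer
    # silently becomes roughly one degree instead.
    if linear_op and layer_info.get("crs_unit") == "Degree":
        problems.append("geographic CRS: linear distances will be in degrees")
    missing = set(required_fields) - set(layer_info.get("fields", []))
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
    return problems

issues = preflight(
    {"crs_unit": "Degree", "fields": ["OBJECTID", "NAME"]},
    required_fields={"DISTRICT"},
)
# issues flags both the unit problem and the missing DISTRICT field
```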

2. Context is everything (and AI doesn’t have enough of it)

In a real GIS project:

  • layers have naming conventions
  • fields have meaning
  • projections matter

Without that context, AI guesses.

Sometimes correctly. Often not.

This is where most “AI coding demos” fall apart. They work in isolation, not inside messy systems.
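
One way to close that gap is to stop letting the model guess: serialize the project’s actual layer names, fields, and coordinate systems into the prompt. The sketch below assumes a `project` dict standing in for what `arcpy.Describe` would report; all names are illustrative.

```python
# Build a plain-text context manifest of the project's layers for the prompt.
def build_context(project):
    lines = ["Layers available:"]
    for layer in project["layers"]:
        fields = ", ".join(layer["fields"])
        lines.append(f"- {layer['name']} (CRS: {layer['crs']}; fields: {fields})")
    return "\n".join(lines)

ctx = build_context({
    "layers": [
        {"name": "schools_pt", "crs": "EPSG:26915", "fields": ["OBJECTID", "DISTRICT"]},
        {"name": "parcels_poly", "crs": "EPSG:26915", "fields": ["OBJECTID", "LAND_USE"]},
    ]
})
```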

3. Execution is the real problem

Generating code is easy.

Running it safely is not.

In a production environment, you need:

  • dry-run modes
  • logging
  • validation checks
  • rollback strategies

Without that, you’re basically letting an AI modify your data blindly.
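
The list above can be sketched as a guarded executor: every generated operation is logged, validated, and skipped entirely in dry-run mode. The operations here are plain callables standing in for geoprocessing tools; everything about this sketch is illustrative, not a real ArcGIS API.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("guarded-exec")

def run_guarded(op, *args, dry_run=True, validate=None):
    """Run a generated operation with logging, validation, and a dry-run gate."""
    log.info("about to run %s%r", op.__name__, args)
    if validate is not None and not validate(*args):
        raise ValueError(f"validation failed for {op.__name__}")
    if dry_run:
        log.info("dry run: skipping %s", op.__name__)
        return None
    result = op(*args)
    log.info("finished %s", op.__name__)
    return result

def delete_rows(table):
    # Stand-in for a destructive geoprocessing call.
    return f"deleted rows in {table}"
```

Note the default: `dry_run=True`. Destructive execution has to be opted into, not out of.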

What actually works

After a lot of trial and error, I’ve landed on a pattern:

AI should:

  • generate code
  • suggest approaches

Humans should:

  • validate intent
  • review execution
  • own the result

The key shift

The real value of AI isn’t:

“write code for me”

It’s:

“reduce the distance between intent and execution”

But there’s a gap between:

  • generated code
  • trustworthy systems

Most of my work lately has been about closing that gap.

What I’m exploring next

  • scoring AI-generated code quality
  • comparing outputs against known-good datasets
  • building guardrails into execution environments
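
A tiny sketch of the second item, comparing outputs against a known-good dataset: summarize each result as a row count plus a set of key values, then diff candidate against reference. Real comparisons would also check geometry and attribute values; rows and keys here are illustrative.

```python
def summarize(rows, key):
    return {"count": len(rows), "keys": {r[key] for r in rows}}

def diff_outputs(candidate, reference, key="DISTRICT"):
    """Diff an AI-generated output against a known-good reference."""
    a, b = summarize(candidate, key), summarize(reference, key)
    return {
        "count_delta": a["count"] - b["count"],
        "missing_keys": sorted(b["keys"] - a["keys"]),
        "extra_keys": sorted(a["keys"] - b["keys"]),
    }

report = diff_outputs(
    [{"DISTRICT": "A"}, {"DISTRICT": "B"}],
    [{"DISTRICT": "A"}, {"DISTRICT": "B"}, {"DISTRICT": "C"}],
)
# report shows one row short and district "C" missing from the candidate
```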

Basically:

Not “can AI write code?”
But “can we trust what it produces?”

If you’re using AI in real workflows, I’d love to hear:

  • where it breaks for you
  • what guardrails you’ve built

Because that’s where the interesting work is happening.