The hardest part of serverless is not the code.
Writing a handler is usually the easy bit. You take an event, do some work, return a response, and move on. That part can look almost annoyingly simple. The real work shows up later, when something fails and you have to figure out what actually happened.
That is where serverless gets expensive.
The code is small. The failure surface is not.
Serverless systems tend to hide complexity in the gaps between services.
One function reads from a queue. Another reacts to a webhook. Something else writes to storage, publishes an event, or triggers a workflow. Each piece can look clean on its own. The problem is that the failure usually happens in the handoff, not inside the function body.
That means the questions you need to answer are rarely simple:
- What event did we actually receive?
- What shape did it have?
- What changed since the last deploy?
- Which retry path ran?
- Did the same message get processed twice?
- Which downstream service failed first?
If you cannot answer those quickly, the architecture is already harder than it looked.
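Most of those questions can be answered from a single structured log line per invocation, as long as it carries a stable set of identifiers. Here is a minimal sketch in Python; the field names and the `DEPLOY_VERSION` environment variable are assumptions for illustration, not a standard.

```python
import json
import os
import time

def invocation_record(event_id, event_type, payload, attempt, outcome):
    """Build one queryable record per invocation, success or failure.

    Hypothetical field names; the point is that each field answers one
    of the diagnostic questions directly."""
    return {
        "ts": time.time(),
        "event_id": event_id,                    # which event did we receive
        "event_type": event_type,
        "payload_keys": sorted(payload.keys()),  # what shape did it have
        "version": os.environ.get("DEPLOY_VERSION", "unknown"),  # what changed since the deploy
        "attempt": attempt,                      # which retry path ran
        "outcome": outcome,
    }

# Emit it as one JSON line so log tooling can filter on any field.
print(json.dumps(invocation_record("evt-123", "order.created",
                                   {"id": 1, "total": 900}, 1, "ok")))
```

One line like this per invocation is usually worth more than pages of free-form log text, because every field is something you can group and filter on.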
Observability matters more than clever code
I care a lot more about observability in serverless systems than I do about making the handlers elegant.
Good logs, useful identifiers, and a clear path from one event to the next matter more than another layer of abstraction. If I can trace a request or job across the system, I can usually fix the problem. If I cannot, then all the neat code in the world does not help much.
The same goes for metrics and alerts. Serverless makes it easy to ship without thinking hard enough about what you will need when things go wrong. Then a rare failure shows up, and the only clue is that something somewhere retried three times and gave up.
That is not enough.
Local testing helps, but it does not solve the real problem
People talk about local testing as if it is the missing piece. It helps, but only up to a point.
You can test the happy path locally. You can even cover a decent number of edge cases. What you still cannot fully simulate is the exact event shape from production, the timing between systems, the stale dependency, or the weird payload that only shows up once a month.
That is why serverless systems need a habit of recording reality, not just assumptions.
I want to see enough of the incoming event to know what the function was asked to do. I want to know which version handled it. I want enough context to tell whether the bug was in the code, the payload, or the contract between services.
Without that, local testing becomes comfort food.
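"Recording reality" can be as simple as capturing a size-capped copy of the incoming event alongside the version that handled it. A minimal sketch, assuming a JSON-serializable event and a caller-supplied version string; the cap constant is an arbitrary illustration, and real systems would also redact sensitive fields before capture.

```python
import json

MAX_CAPTURE_BYTES = 2048  # cap so oversized payloads do not flood the logs

def capture_event(event, version):
    """Record enough of the incoming event to reconstruct what the
    function was asked to do, plus which code version handled it."""
    raw = json.dumps(event, default=str)
    return {
        "version": version,
        "truncated": len(raw) > MAX_CAPTURE_BYTES,
        "event": raw[:MAX_CAPTURE_BYTES],
    }
```

With a record like this attached to each failure, you can usually tell whether the bug was in the code, the payload, or the contract between services, because you can see what the function was actually given.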
Event shape drift is the quiet problem
Event shape drift is one of the things that makes serverless feel more fragile than people expect.
An upstream service changes a field name. A webhook adds something unexpected. A queue message starts carrying more or less data than before. Nothing dramatic happens immediately, which is why it is easy to miss. Then a function starts failing on edge cases that used to work fine.
This is one reason I like systems that make the contract explicit. If the event shape matters, treat it like it matters. Validate it, log it, version it if needed, and do not pretend it will stay stable just because the code compiled last week.
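Making the contract explicit does not require heavy machinery. A sketch with only the standard library, assuming a hypothetical order event with two required fields; the point is to fail loudly at the boundary instead of deep inside the handler.

```python
from dataclasses import dataclass

@dataclass
class OrderEvent:
    # Hypothetical contract: these are the only fields this function relies on.
    order_id: str
    amount_cents: int

def parse_order_event(payload: dict) -> OrderEvent:
    """Validate the event shape up front so drift surfaces as one clear
    error at the edge, not as a mystery failure mid-handler."""
    try:
        return OrderEvent(
            order_id=str(payload["order_id"]),
            amount_cents=int(payload["amount_cents"]),
        )
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"event contract violated: {exc!r}") from exc
```

When an upstream service renames a field, this turns a subtle edge-case failure into an immediate, attributable error, which is exactly what you want drift to look like.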
The operational context is the real product
When serverless works well, it is because the operational context is tight.
You know where the logs live. You know how retries behave. You know what idempotency looks like. You know which events are safe to reprocess and which ones are not. You know how to answer, quickly, whether a failure is transient or structural.
That is the real skill.
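Knowing what idempotency looks like can be made concrete. A minimal sketch of the check, with an in-memory set standing in for what would be a durable store (say, a conditional write to a database) in production; the class and method names are illustrative, not from any particular framework.

```python
class IdempotentProcessor:
    """Skip messages that have already been handled, so redelivery
    and reprocessing are safe by construction."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # stand-in for a durable processed-message store

    def process(self, message_id, payload):
        if message_id in self.seen:
            return "skipped"       # same message delivered twice: do nothing
        result = self.handler(payload)
        self.seen.add(message_id)  # mark done only after the handler succeeds
        return result
```

The ordering matters: the message is marked as seen only after the handler succeeds, so a crash mid-processing leads to a retry rather than a silently dropped event.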
If you have that context, serverless can be a very good way to build. If you do not, it can turn into a pile of tiny functions that are easy to deploy and awkward to understand.
That is why I think the code is not the hard part. The hard part is everything around the code that tells you whether the system is actually healthy.
