When building with Agentic AI, most enterprise value today concentrates around two foundational archetypes: the “Doers” (workflow automation) and the “Thinkers” (intelligence and reasoning). The highest-value use cases increasingly combine both, using intelligence to determine the optimal flow and structured automation to execute it end to end, with the mix of reasoning and automation varying by use case. While this practical hybrid pattern drives immediate business ROI, a smaller, highly specialised frontier of autonomous multi-agent networks is emerging to push the boundaries of open-ended planning and adaptation.
I shall share key learnings from two such Agentic AI systems I recently implemented, one focused on operational execution and the other on high-stakes cognitive reasoning:
- The "Doer" (Workflow Automation): Implementing an agentic chatbot that autonomously resolved simpler queries and seamlessly diverted complex cases to Genesys for human intervention, significantly reducing manual agent workload.
- The "Thinker" (Intelligent Reasoning): Building a multi-agent solution that analysed open order notes to infer the root causes of service delays, helping avoid SLA violation fines for issues beyond operational control.
What is an Agent?
It is useful to begin with a clear definition of what an agent is. At its core, an agent is a software component that takes an input and produces an output. It is uniquely identifiable (often by a name) and operates within defined safety guardrails.
An agent leverages tools, APIs, and large language models (LLMs) to perform tasks, guided by business logic encoded in its instructions. Its execution is governed by configuration settings (such as model selection, thresholds, and control parameters), while its behaviour is continuously monitored and analysed through metrics, logs, and traces.
Crucially, an agent functions with awareness of its broader context. This context is retrieved dynamically, either on demand at run time or from persistent knowledge sources such as vectorised or graph-based knowledge bases. These repositories are typically built from existing business documentation and enriched through ongoing collaboration with domain experts.
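To make this definition concrete, here is a minimal sketch of an agent as a data structure. The class names, fields, and default values are my own illustration and do not come from any specific agent framework; a real agent would let the LLM choose the tool, whereas here the caller names it to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentConfig:
    # Hypothetical configuration settings; names and defaults are illustrative.
    model: str = "example-model"
    temperature: float = 0.0
    confidence_threshold: float = 0.7


@dataclass
class Agent:
    name: str                      # unique identifier
    instructions: str              # business logic expressed in natural language
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    guardrails: List[Callable[[str], bool]] = field(default_factory=list)
    config: AgentConfig = field(default_factory=AgentConfig)
    trace: List[str] = field(default_factory=list)  # stands in for logs and traces

    def run(self, user_input: str, tool_name: str) -> str:
        # Guardrails are checked before any work is done.
        if not all(check(user_input) for check in self.guardrails):
            self.trace.append("blocked by guardrail")
            return "Request blocked by guardrail."
        # Assumption: the caller selects the tool; an LLM would do this in practice.
        output = self.tools[tool_name](user_input)
        self.trace.append(f"tool={tool_name}")
        return output


agent = Agent(
    name="order-status-agent",
    instructions="Answer order status queries.",
    tools={"echo": lambda text: text.upper()},
    guardrails=[lambda text: len(text) < 200],
)
result = agent.run("where is my order", "echo")
print(result)
```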
Technical Tips and Lessons Learned from Agentic AI Implementations:
1. Start with Business Clarity
- Establish a clear understanding of the business problem, measurable outcomes and expected business ROI.
- Engage a knowledgeable product owner or end user early to capture domain knowledge in the form of RAG knowledge bases, knowledge graphs and agent instructions.
- Continuously validate that the solution aligns with business outcomes and ROI.
- Start with use cases that need workflow or reasoning agents before moving into autonomous systems.
2. Redesign Workflows, Don’t Just Automate
- Map the end-to-end workflow, especially for workflow automation use cases.
- Be bold in challenging and redesigning workflows with the business team, not just digitising existing ones.
- Decompose workflows into agent responsibilities, factoring in access controls, tools and APIs.
- If an intent recognition layer is required, start with LLM-based classification and move to a trained ML model as labelled data matures.
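One way to keep the intent layer swappable is to hide the classifier behind a single interface. The sketch below is an assumption about how that could look, not the implementation from my projects: `IntentRouter`, `stub_llm_classifier`, and the `min_labelled` threshold are hypothetical names, and the keyword function merely stands in for a real LLM call.

```python
from typing import Callable, List, Optional, Tuple

Classifier = Callable[[str], str]


def stub_llm_classifier(text: str) -> str:
    """Keyword stand-in for an LLM classification call (hypothetical)."""
    lowered = text.lower()
    if "refund" in lowered:
        return "refund_request"
    if "order" in lowered:
        return "order_status"
    return "other"


class IntentRouter:
    def __init__(self, llm_classifier: Classifier, min_labelled: int = 1000) -> None:
        self.llm_classifier = llm_classifier
        self.ml_classifier: Optional[Classifier] = None
        self.labelled: List[Tuple[str, str]] = []  # (text, intent) pairs for later training
        self.min_labelled = min_labelled

    def classify(self, text: str) -> str:
        # Prefer the trained model once it has been promoted.
        if self.ml_classifier is not None:
            return self.ml_classifier(text)
        intent = self.llm_classifier(text)
        # In practice these labels would be reviewed before being used for training.
        self.labelled.append((text, intent))
        return intent

    def ready_for_ml(self) -> bool:
        return len(self.labelled) >= self.min_labelled

    def promote(self, ml_classifier: Classifier) -> None:
        self.ml_classifier = ml_classifier


router = IntentRouter(stub_llm_classifier, min_labelled=2)
first = router.classify("I want a refund")
second = router.classify("Where is my order?")
print(first, second, router.ready_for_ml())
```

Because callers only see `classify`, replacing the LLM with a trained model does not ripple through the rest of the workflow.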
3. Agent Architecture & Orchestration
- Design modular agents with a single responsibility for better reliability and interpretability.
- Implement the main orchestrator agent such that it delegates tasks to sub-agents, regains control after execution and supports sequential or parallel execution with retries.
- Balance deterministic vs non-deterministic behaviour by prioritising deterministic logic wherever possible. Implement core logic through tools/APIs and reserve LLM reasoning for resolving ambiguity or making contextual decisions.
- Avoid deep tool chaining to maintain simplicity and debuggability.
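The orchestrator pattern above can be sketched with plain Python callables standing in for sub-agents. This is a simplified illustration under my own assumptions: `Orchestrator`, `call_with_retries`, and the sample sub-agents are hypothetical, and a production system would add timeouts and error classification.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

SubAgent = Callable[[str], str]


def call_with_retries(agent: SubAgent, task: str, max_retries: int = 2) -> str:
    """Run a sub-agent, retrying on any exception up to max_retries times."""
    last_error: Optional[Exception] = None
    for _ in range(max_retries + 1):
        try:
            return agent(task)
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
    raise RuntimeError(f"sub-agent failed after {max_retries + 1} attempts") from last_error


class Orchestrator:
    def __init__(self, sub_agents: Dict[str, SubAgent]) -> None:
        self.sub_agents = sub_agents

    def run_sequential(self, steps: List[str], task: str) -> str:
        # Each step receives the previous step's output; control returns here every time.
        result = task
        for name in steps:
            result = call_with_retries(self.sub_agents[name], result)
        return result

    def run_parallel(self, names: List[str], task: str) -> Dict[str, str]:
        # Independent sub-agents receive the same task and run concurrently.
        with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
            futures = {
                name: pool.submit(call_with_retries, self.sub_agents[name], task)
                for name in names
            }
            return {name: future.result() for name, future in futures.items()}


attempts: List[str] = []


def flaky_summariser(task: str) -> str:
    # Fails on its very first call to demonstrate the retry path.
    attempts.append(task)
    if len(attempts) == 1:
        raise TimeoutError("transient failure")
    return f"summary({task})"


orchestrator = Orchestrator(
    {"clean": str.strip, "summarise": flaky_summariser, "classify": lambda t: "delay"}
)
sequential_result = orchestrator.run_sequential(["clean", "summarise"], "  order note  ")
parallel_result = orchestrator.run_parallel(["summarise", "classify"], "note")
print(sequential_result, parallel_result)
```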
4. Data & Context Engineering
- Ensure availability of relevant data to identify patterns (e.g. frequent queries) and generate test cases.
- Dedicate time to analysing how context shapes expected outcomes in sample data using NLP and LLMs; these insights are critical for optimising reasoning and output quality.
- Pre-process data (e.g. acronym expansion, text clean-up) to keep the context lean and relevant and to reduce token usage and cost.
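As a small illustration of the pre-processing step, the sketch below strips markup, expands known acronyms and collapses whitespace. The acronym table and the `preprocess` function are hypothetical examples; in practice the glossary would come from the domain experts.

```python
import re
from typing import Dict

# Hypothetical glossary; a real one would be curated with the business team.
ACRONYMS: Dict[str, str] = {
    "SLA": "service level agreement",
    "ETA": "estimated time of arrival",
}


def preprocess(text: str, acronyms: Dict[str, str] = ACRONYMS) -> str:
    # Remove HTML-like tags left over from source systems.
    text = re.sub(r"<[^>]+>", " ", text)

    # Expand only acronyms present in the glossary; unknown ones are left as-is.
    def expand(match: "re.Match[str]") -> str:
        word = match.group(0)
        return acronyms.get(word, word)

    text = re.sub(r"\b[A-Z]{2,}\b", expand, text)
    # Collapse runs of whitespace so the context stays compact.
    return re.sub(r"\s+", " ", text).strip()


cleaned = preprocess("Order <b>delayed</b>  due to SLA   breach. ETA unknown")
print(cleaned)
```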
5. Memory, State & Persistence Strategy
- Store session metadata for sharing across agents.
- Persist session state with session caching (e.g. Redis) for fast access.
- Persist session state in a fast-retrieval NoSQL database (e.g. Firestore) for session restoration in low-latency scenarios.
- Apply semantic caching (e.g. backed by a FAISS vector index) for general or repeatable queries to reduce latency and optimise token cost.
- For predominantly user-specific queries, semantic caching offers limited benefit. Instead, persist session context in long-term storage (GCS/S3) with an appropriate expiry, enabling retrieval via composite indexing (e.g. Firestore) when needed.
- Store transaction-level data in a SQL data warehouse (e.g. BigQuery on GCP) for downstream analytics.
- Implement session state management, archival and transaction logging as a bare minimum.
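To show the idea behind semantic caching, here is a dependency-free sketch. It is deliberately simplified: a bag-of-words `toy_embed` function stands in for a real embedding model, and a linear scan stands in for a vector index such as FAISS. The class name and the 0.8 similarity threshold are my own assumptions.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Bag-of-words stand-in for a real embedding model."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, embed: Callable[[str], Vector] = toy_embed, threshold: float = 0.8) -> None:
        self.embed = embed
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []

    def put(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))

    def get(self, query: str) -> Optional[str]:
        # Return the closest cached answer if it is similar enough, else None.
        query_vec = self.embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache()
cache.put("what are your opening hours", "9am to 5pm")
hit = cache.get("what are your opening hours today")
miss = cache.get("track my parcel")
print(hit, miss)
```

A cache miss falls through to the LLM, and the new answer is then stored with `put`. The threshold needs tuning per use case, since a value that is too low returns answers to the wrong question.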
6. Performance, Cost & Scalability Optimisation
- Implement monitoring callbacks early to track latency (e.g. time to first token, end-to-end), throughput, token usage and cost.
- Handle LLM rate limits by reducing LLM API calls through deterministic flows and routing low-priority requests to smaller models. When using a single LLM, manage parallel agent execution with retry backoff whose delays follow a Fibonacci sequence, capped at a maximum number of retries.
- Leverage batch pre-generation where applicable for speed and cost savings.
- Regularly analyse usage patterns to optimise cost vs performance trade-offs.
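The Fibonacci backoff mentioned above can be sketched as follows. `RateLimitError` is a hypothetical stand-in for whatever exception your LLM client raises on a rate limit, and the `sleep` parameter is injectable only so the example runs without actually waiting.

```python
import time
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit exception."""


def fibonacci_delays(base_seconds: float, max_retries: int) -> Iterator[float]:
    # Yields base * 1, 1, 2, 3, 5, 8, ... for at most max_retries values.
    a, b = 1, 1
    for _ in range(max_retries):
        yield a * base_seconds
        a, b = b, a + b


def call_with_backoff(
    fn: Callable[[], T],
    base_seconds: float = 1.0,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for delay in fibonacci_delays(base_seconds, max_retries):
        try:
            return fn()
        except RateLimitError:
            sleep(delay)
    # Final attempt after the last wait; any error now propagates to the caller.
    return fn()


slept: List[float] = []
outcomes = iter([RateLimitError(), RateLimitError(), "ok"])


def flaky_llm_call() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_backoff(flaky_llm_call, base_seconds=1.0, sleep=slept.append)
print(result, slept)
```

Fibonacci delays grow more gently than doubling, which keeps early retries quick while still spreading out load. Adding a small random jitter to each delay is a common refinement when many agents retry at once.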
7. Safety, Security & Governance
- Implement multi-layer guardrails for pre-input validation, pre-tool invocation checks and post-output validation.
- Use centralised guardrails where possible; otherwise build reusable safety components.
- Enforce secure access controls using token-based tool authorisation (e.g. JWT) with an authentication server, token rotation for sensitive use cases and Secrets Manager via IAM for high security.
- Continuously validate adherence to policies and constraints, using agent instructions for simple constraints and dedicated validation agents for complex policies. As an additional sequential agent adds latency, run the validation agent in parallel where the workflow permits.
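The three guardrail layers can be sketched as small reusable checks wrapped around a tool call. The specific rules below (a length limit, a single prompt-injection phrase, an allow-list, and email redaction) are illustrative assumptions only; real guardrails would be far broader and ideally centralised.

```python
import re
from typing import Callable, Dict, Set


class GuardrailViolation(Exception):
    """Raised when a request fails a guardrail check."""


EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def check_input(text: str, max_chars: int = 2000) -> None:
    # Layer 1: pre-input validation.
    if len(text) > max_chars:
        raise GuardrailViolation("input too long")
    if "ignore previous instructions" in text.lower():
        raise GuardrailViolation("possible prompt injection")


def check_tool_call(tool_name: str, allowed_tools: Set[str]) -> None:
    # Layer 2: pre-tool invocation check against an allow-list.
    if tool_name not in allowed_tools:
        raise GuardrailViolation(f"tool not permitted: {tool_name}")


def check_output(text: str) -> str:
    # Layer 3: post-output validation, here redacting email addresses.
    return EMAIL_PATTERN.sub("[redacted email]", text)


def guarded_run(
    user_input: str,
    tool_name: str,
    tools: Dict[str, Callable[[str], str]],
    allowed_tools: Set[str],
) -> str:
    check_input(user_input)
    check_tool_call(tool_name, allowed_tools)
    return check_output(tools[tool_name](user_input))


tools = {"lookup": lambda query: "Email jane.doe@example.com for help"}
allowed = {"lookup"}
safe_output = guarded_run("Who do I contact?", "lookup", tools, allowed)
print(safe_output)
```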
8. Observability, Logging & Debugging
- Enable comprehensive logging across user interactions, agent behaviour and cloud services for post go-live analysis.
- Make it a practice to capture and analyse prompts, generated outputs and agent reasoning traces to derive insights that help debug, fine-tune prompts, and improve system behaviour during development.
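A simple way to capture prompts and outputs is to wrap every LLM call in a function that emits one structured record per call. The sketch below uses only the standard library; the field names, the `traced_call` helper and the sample prompt are my own illustrative choices, and the `sink` parameter defaults to a standard logger but can be swapped for any callable.

```python
import json
import logging
import time
import uuid
from typing import Callable, List

logger = logging.getLogger("agent.trace")


def traced_call(
    agent_name: str,
    prompt: str,
    llm: Callable[[str], str],
    session_id: str,
    sink: Callable[[str], None] = logger.info,
) -> str:
    """Call the LLM and emit one JSON trace record with prompt, output and latency."""
    start = time.perf_counter()
    output = llm(prompt)
    record = {
        "trace_id": str(uuid.uuid4()),
        "session_id": session_id,
        "agent": agent_name,
        "prompt": prompt,
        "output": output,
        "latency_ms": round((time.perf_counter() - start) * 1000, 2),
    }
    sink(json.dumps(record))
    return output


lines: List[str] = []
answer = traced_call(
    agent_name="root-cause-agent",
    prompt="Why was this order delayed?",          # illustrative prompt
    llm=lambda p: "Awaiting customer confirmation",  # stand-in for a real model call
    session_id="session-1",
    sink=lines.append,
)
print(lines[0])
```

One JSON object per line is easy to load into a log analytics tool later, which helps when comparing prompt versions.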
9. Human-in-the-Loop & Trust Design
- Use confidence scoring and reflection to route low-confidence responses for human review.
- Provide clear explanations to end users to enable informed decision-making and aid debugging of incorrect workflows.
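Confidence-based routing can be as simple as a threshold check. In this sketch the `AgentResponse` fields, the route names and the 0.75 threshold are assumptions for illustration; how the confidence score is produced (for example by a reflection step) is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class AgentResponse:
    answer: str
    confidence: float   # 0.0 to 1.0, e.g. produced by a reflection step
    explanation: str    # shown to the end user or reviewer


def route(response: AgentResponse, threshold: float = 0.75) -> str:
    """Send confident answers straight out and the rest to a human reviewer."""
    return "auto_reply" if response.confidence >= threshold else "human_review"


confident = AgentResponse("Delay caused by customer no-show", 0.92, "Note mentions missed appointment")
uncertain = AgentResponse("Delay cause unclear", 0.40, "Notes are contradictory")
print(route(confident), route(uncertain))
```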
10. Developer Experience & Operational Discipline
- Standardise agent structure with the folder structure below instead of a single Python file:
/agent-name (folder)
├── agent.py (file)
├── description.md (file)
└── instructions.md (file)
- Version control prompts to maintain history across releases and compare outputs for regression analysis.
- Build a regression test suite incrementally from Sprint 1 to ensure consistency.
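With the folder structure above, `agent.py` can load its description and instructions from the sibling Markdown files, so prompt changes show up as plain text diffs in version control. The loader below is a sketch under that assumption; `AgentSpec` and `load_agent_spec` are hypothetical names, and the demo writes a temporary folder purely so the example runs on its own.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AgentSpec:
    name: str
    description: str
    instructions: str


def load_agent_spec(agent_dir: Path) -> AgentSpec:
    """Build an agent spec from description.md and instructions.md in agent_dir."""
    return AgentSpec(
        name=agent_dir.name,
        description=(agent_dir / "description.md").read_text(encoding="utf-8").strip(),
        instructions=(agent_dir / "instructions.md").read_text(encoding="utf-8").strip(),
    )


# Demo: create a throwaway agent folder and load it back.
with tempfile.TemporaryDirectory() as tmp:
    agent_dir = Path(tmp) / "agent-name"
    agent_dir.mkdir()
    (agent_dir / "description.md").write_text("Resolves simple order queries.\n", encoding="utf-8")
    (agent_dir / "instructions.md").write_text("Answer politely.\n", encoding="utf-8")
    spec = load_agent_spec(agent_dir)

print(spec)
```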
11. Continuous Improvement & Value Tracking
- Regularly analyse monitoring metrics and transaction data during development and post go-live.
- Use insights to refine agent behaviour, validate business impact and identify new optimisation opportunities.
12. Prototyping & User Feedback
- Build UI prototypes early to allow users to interact with the system and capture early real-world feedback.
- Iterate rapidly based on user experience and observed behaviour.
To wrap up, the most successful agentic AI implementations are not those that maximise autonomy, but those that strike the right balance between structured control and intelligent flexibility, anchored firmly in business value.