Mej

After completing 12 years in software QA working with a variety of test data, I was tempted to make a career shift into data science and decided to pursue it through a structured master's programme. Though I love the three pillars (math, statistics and programming), I did not have an easy start, as I am getting back to studies after a long gap of 14 years. As I began learning machine learning, visual analytics, data science, Python, Matlab, R, Tableau, Mondrian etc., I got excited about blogging as a way to summarise my learning. I will try to post frequently and keep it simple. Looking forward to a good time learning and sharing... Cheers, Mej!

Thursday, 25 June 2026

Lessons Learned from Agentic AI Implementations

When building with Agentic AI, most enterprise value today concentrates around two foundational archetypes: the “Doers” (workflow automation) and the “Thinkers” (intelligence and reasoning). The highest-value use cases increasingly combine both, using intelligence to determine the optimal flow and structured automation to execute it end to end, with the balance between the two varying by use case. While this practical hybrid pattern drives immediate business ROI, a smaller, highly specialised frontier of autonomous multi-agent networks is emerging to push the boundaries of open-ended planning and adaptation.

I shall share key learnings from two such Agentic AI systems I recently implemented — one focused on operational execution, and the other on high-stakes cognitive reasoning:

The "Doer" (Workflow Automation): Implementing an agentic chatbot that autonomously resolved simpler queries and seamlessly diverted complex cases to Genesys for human intervention — significantly reducing manual agent workload.

The "Thinker" (Intelligent Reasoning): Building a multi-agent solution that analysed open order notes to infer the root causes of service delays, helping avoid SLA violation fines for issues beyond operational control.

 

What is an Agent?

It is useful to begin with a clear definition of what an agent is. At its core, an agent is a software component that takes an input and produces an output. It is uniquely identifiable (often by a name) and operates within defined safety guardrails.

An agent leverages tools, APIs, and large language models (LLMs) to perform tasks, guided by business logic encoded in its instructions. Its execution is governed by configuration settings (such as model selection, thresholds, and control parameters), while its behaviour is continuously monitored and analysed through metrics, logs, and traces.

Crucially, an agent functions with awareness of its broader context. This context is dynamically retrieved either on demand or from persistent knowledge sources, such as vectorised or graph-based knowledge bases. These repositories are typically built from existing business documentation and enriched through ongoing collaboration with domain experts.
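
To make the definition concrete, here is a minimal sketch of those parts in Python. The class name `AgentSpec`, its fields and the keyword-based tool selection are all assumptions made for illustration; a real agent would call an LLM where the stand-in loop is.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AgentSpec:
    """Hypothetical container for the parts of an agent described above."""
    name: str                                   # unique identifier
    instructions: str                           # business logic in natural language
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    config: Dict[str, object] = field(default_factory=dict)  # model, thresholds, etc.

    def run(self, user_input: str, context: str = "") -> str:
        # Guardrail: reject empty input before any tool or model is involved.
        if not user_input.strip():
            return "REJECTED: empty input"
        # Stand-in for an LLM decision: pick the first tool whose name
        # appears in the input text.
        for tool_name, tool in self.tools.items():
            if tool_name in user_input.lower():
                return tool(user_input)
        return f"[{self.name}] no tool matched; context used: {context or 'none'}"


order_agent = AgentSpec(
    name="order-status-agent",
    instructions="Answer order status questions using the order tool.",
    tools={"order": lambda text: "Order lookup requested"},
    config={"model": "placeholder-model", "confidence_threshold": 0.7},
)
```

Calling `order_agent.run("Where is my order?")` goes through the guardrail and then to the matching tool, while blank input is rejected before any tool runs.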

 

Technical Tips and Lessons Learned from Agentic AI Implementations:


1. Start with Business Clarity

  • Establish a clear understanding of the business problem, measurable outcomes, and expected business ROI.
  • Engage a knowledgeable product owner or end user early to capture domain knowledge in the form of RAG content, knowledge graphs and agent instructions.
  • Continuously validate that the solution aligns with business outcomes and ROI.
  • Start with use cases that need workflow or reasoning agents before moving into autonomous systems.

 

2. Redesign Workflows, Don’t Just Automate

  • Map the end-to-end workflow — especially for workflow automation use cases.
  • Be bold in challenging and redesigning workflows with the business team, not just digitising existing ones.
  • Decompose workflows into agent responsibilities, factoring in access controls, tools and APIs.
  • If an intent recognition layer is required, start with LLM-based classification and move to a trained ML model as labelled data matures.
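
One way to keep the move from LLM-based classification to an ML model cheap is to give both classifiers the same signature. The sketch below assumes this design; `llm_classifier_stub`, the intent names and the confidence threshold are hypothetical, and keyword rules stand in for the actual LLM call.

```python
from typing import Callable, Optional, Tuple

# Both classifiers share one signature so the router does not care which one runs.
Classifier = Callable[[str], Tuple[str, float]]  # returns (intent, confidence)


def llm_classifier_stub(text: str) -> Tuple[str, float]:
    """Placeholder for an LLM call; keyword rules stand in for the model here."""
    lowered = text.lower()
    if "refund" in lowered:
        return "refund_request", 0.9
    if "order" in lowered:
        return "order_status", 0.9
    return "unknown", 0.3


def classify_intent(
    text: str,
    llm: Classifier,
    ml_model: Optional[Classifier] = None,
    min_confidence: float = 0.8,
) -> str:
    # Prefer the trained model once it exists and is confident enough.
    if ml_model is not None:
        intent, confidence = ml_model(text)
        if confidence >= min_confidence:
            return intent
    # Otherwise fall back to the LLM-based classifier.
    intent, _ = llm(text)
    return intent
```

With no `ml_model` supplied, every request goes to the LLM classifier. Once a trained model is passed in, it handles the confident cases and the LLM handles the rest.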

 

3. Agent Architecture & Orchestration

  • Design modular agents with a single responsibility for better reliability and interpretability.
  • Implement the main orchestrator agent such that it delegates tasks to sub-agents, regains control after execution and supports sequential or parallel execution with retries.
  • Balance deterministic vs non-deterministic behaviour by prioritising deterministic logic wherever possible. Implement core logic through tools/APIs and reserve LLM reasoning for resolving ambiguity or making contextual decisions.
  • Avoid deep tool chaining to maintain simplicity and debuggability.

 

4. Data & Context Engineering

  • Ensure availability of relevant data to identify patterns (e.g. frequent queries) and generate test cases.
  • Dedicate time to analysing how context shapes expected outcomes in sample data using NLP and LLMs; these insights are critical for optimising reasoning and output quality.
  • Pre-process data (e.g. acronym expansion, text clean-up) to keep context lean and relevant and to reduce token usage and cost.
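
As a small illustration of the pre-processing step, the sketch below expands acronyms and normalises whitespace using only the standard library. The `ACRONYMS` map is a made-up example; in practice it would be built with domain experts.

```python
import re

# Hypothetical acronym map; in practice this would come from domain experts.
ACRONYMS = {"SLA": "service level agreement", "ETA": "estimated time of arrival"}

# Match any known acronym as a whole word.
_ACRONYM_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, ACRONYMS)) + r")\b")


def preprocess(note: str) -> str:
    """Expand known acronyms and normalise whitespace in a free-text note."""
    expanded = _ACRONYM_PATTERN.sub(lambda m: ACRONYMS[m.group(0)], note)
    # Collapse runs of whitespace (tabs, newlines, repeated spaces) into one space.
    return re.sub(r"\s+", " ", expanded).strip()


cleaned = preprocess("  SLA   breach;\tETA unknown\n")
```

Here `cleaned` becomes "service level agreement breach; estimated time of arrival unknown", which is easier for a model to interpret and free of wasted whitespace tokens.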

 

5. Memory, State & Persistence Strategy

  • Store session metadata for sharing across agents.
  • Persist session state with session caching (e.g. Redis) for fast access.
  • Persist session state in a fast NoSQL store (e.g. Firestore) so that sessions can be restored with low latency.
  • Apply semantic caching (e.g. backed by a FAISS vector index) for general or repeatable queries to reduce latency and optimise token cost.
  • For predominantly user-specific queries, semantic caching offers limited benefit. Instead, persist session context in long-term storage (GCS/S3) with an appropriate expiry, enabling retrieval via composite indexing (e.g. Firestore) when needed.
  • Store transaction-level data in a SQL-queryable analytical store (e.g. BigQuery on GCP) for downstream analytics.
  • Implement session state management, archival and transaction logging as a bare minimum.
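
The idea behind semantic caching can be shown with a toy example. Note the assumptions: the bag-of-words `embed` function below is only a stand-in for a real embedding model, the linear scan is a stand-in for a vector index such as FAISS, and the 0.8 threshold is arbitrary.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def get(self, query: str) -> Optional[str]:
        """Return the answer of the most similar stored query, if similar enough."""
        vector = embed(query)
        best_score, best_answer = 0.0, None
        for stored_vector, answer in self.entries:
            score = cosine(vector, stored_vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))


cache = SemanticCache()
cache.put("what are your opening hours", "9am to 5pm")
```

A near-duplicate such as "what are your opening hours today" is served from the cache without an LLM call, while an unrelated query returns `None` and falls through to the agent.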

 

6. Performance, Cost & Scalability Optimisation

  • Implement monitoring callbacks early to track latency (e.g. time to first token, end-to-end), throughput, token usage and cost.
  • Handle LLM rate limits by reducing LLM API calls through deterministic flows and routing low-priority requests to smaller models. When using a single LLM, space out retries from parallel agents with a backoff whose delays follow a Fibonacci sequence (1, 1, 2, 3, 5, ...), capped at a maximum number of retries.
  • Leverage batch pre-generation where applicable for speed and cost savings.
  • Regularly analyse usage patterns to optimise cost vs performance trade-offs.
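
A retry helper with Fibonacci-spaced delays might look like the sketch below. `RateLimitError` is a hypothetical stand-in for whatever exception the LLM client raises on a rate limit, and the base delay and retry cap are illustrative defaults.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def fibonacci_delays(base_seconds: float, max_retries: int) -> Iterator[float]:
    """Yield base * 1, 1, 2, 3, 5, ... for at most max_retries retries."""
    a, b = 1, 1
    for _ in range(max_retries):
        yield base_seconds * a
        a, b = b, a + b


def call_with_backoff(
    call: Callable[[], T],
    base_seconds: float = 1.0,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for delay in fibonacci_delays(base_seconds, max_retries):
        try:
            return call()
        except RateLimitError:
            sleep(delay)  # wait longer after each consecutive failure
    return call()  # final attempt; any error now propagates to the caller


attempts = {"n": 0}
waits = []


def flaky_llm_call() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"


outcome = call_with_backoff(flaky_llm_call, base_seconds=0.5, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. In a setup with many parallel agents, adding a small random jitter to each delay is a common refinement so that retries do not all fire at the same moment.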

 

7. Safety, Security & Governance

  • Implement multi-layer guardrails for pre-input validation, pre-tool invocation checks and post-output validation.
  • Use centralised guardrails where possible; otherwise build reusable safety components.
  • Enforce secure access controls using token-based tool authorisation (e.g. JWT) with an authentication server, token rotation for sensitive use cases and Secrets Manager via IAM for high security.
  • Continuously validate adherence to policies and constraints: use agent instructions for simple constraints and dedicated validation agents for complex policies. As an additional sequential agent adds latency, run the validation agent in parallel if the workflow permits.
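
The three guardrail layers can be sketched as plain functions wrapped around a tool call. Everything here is an assumption for illustration: the allow-list, the length limit and the deliberately naive card-number pattern are not production rules.

```python
import re
from typing import Callable, Dict

ALLOWED_TOOLS = {"order_lookup"}           # hypothetical allow-list
CARD_NUMBER = re.compile(r"\b\d{16}\b")    # naive pattern, for illustration only


def check_input(user_input: str) -> None:
    if len(user_input) > 2000:
        raise ValueError("input too long")


def check_tool_call(tool_name: str) -> None:
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowed: {tool_name}")


def check_output(text: str) -> str:
    # Mask anything that looks like a 16-digit card number before returning it.
    return CARD_NUMBER.sub("[REDACTED]", text)


def guarded_call(user_input: str, tool_name: str, tools: Dict[str, Callable[[str], str]]) -> str:
    check_input(user_input)            # layer 1: pre-input validation
    check_tool_call(tool_name)         # layer 2: pre-tool invocation check
    raw = tools[tool_name](user_input)
    return check_output(raw)           # layer 3: post-output validation


demo_tools = {
    "order_lookup": lambda q: "Paid with card 1234567812345678",
    "delete_all": lambda q: "done",
}
safe_reply = guarded_call("status?", "order_lookup", demo_tools)
```

Keeping each check as a small reusable function makes it straightforward to share the same guardrails across agents when a centralised service is not available.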

 

8. Observability, Logging & Debugging

  • Enable comprehensive logging across user interactions, agent behaviour and cloud services for post go-live analysis.
  • Make it a practice to capture and analyse prompts, generated outputs and agent reasoning traces to derive insights that help debug, fine-tune prompts, and improve system behaviour during development.
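
A lightweight way to capture prompts, outputs and latency is a decorator around each agent step. The sketch below uses only the standard `logging` module; the logger name, the record fields and the `summarise` stand-in are assumptions.

```python
import json
import logging
import time
from functools import wraps
from typing import Callable

logger = logging.getLogger("agent.trace")


def traced(agent_name: str) -> Callable:
    """Decorator that logs prompt, output and latency of an agent step as JSON."""
    def decorator(step: Callable[[str], str]) -> Callable[[str], str]:
        @wraps(step)
        def wrapper(prompt: str) -> str:
            started = time.perf_counter()
            output = step(prompt)
            record = {
                "agent": agent_name,
                "prompt": prompt,
                "output": output,
                "latency_ms": round((time.perf_counter() - started) * 1000, 2),
            }
            logger.info(json.dumps(record))
            return output
        return wrapper
    return decorator


@traced("summary-agent")
def summarise(prompt: str) -> str:
    return prompt[:20]  # stand-in for an LLM call
```

Emitting one JSON object per step keeps the traces easy to load into whatever log analysis tooling is already in place.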

 

9. Human-in-the-Loop & Trust Design

  • Use confidence scoring and reflection to route low-confidence responses for human review.
  • Provide clear explanations to end users to enable informed decision-making and aid debugging of incorrect workflows.
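
Confidence-based routing can be as simple as a threshold check. In this sketch the `AgentReply` fields and the 0.75 threshold are assumptions; how the confidence score is produced (for example by a reflection step) is left out.

```python
from dataclasses import dataclass


@dataclass
class AgentReply:
    text: str
    confidence: float   # assumed to be in [0, 1], e.g. from a reflection step
    explanation: str    # shown to the end user or reviewer alongside the answer


def route(reply: AgentReply, threshold: float = 0.75) -> str:
    """Return 'auto' to send the reply directly, or 'human_review' to escalate."""
    return "auto" if reply.confidence >= threshold else "human_review"
```

Carrying the explanation with the reply means a human reviewer sees why the agent answered as it did, not just the answer itself.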

10. Developer Experience & Operational Discipline

  • Standardise agent structure with the folder structure below instead of a single Python file:

/agent-name (folder)
  ├── agent.py (file)
  ├── description.md (file)
  └── instructions.md (file)

 

  • Version control prompts to maintain history across releases and compare outputs for regression analysis.
  • Build a regression test suite incrementally from Sprint 1 to ensure consistency.
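
With the per-agent folder layout described above, the description and instructions can be loaded at start-up rather than hard-coded. The loader below is a minimal sketch; `AgentFiles` and `load_agent_files` are hypothetical names, and the demo builds a temporary folder just to show the call.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AgentFiles:
    name: str
    description: str
    instructions: str


def load_agent_files(agent_dir: Path) -> AgentFiles:
    """Read description.md and instructions.md from one agent folder."""
    return AgentFiles(
        name=agent_dir.name,
        description=(agent_dir / "description.md").read_text(encoding="utf-8").strip(),
        instructions=(agent_dir / "instructions.md").read_text(encoding="utf-8").strip(),
    )


# Demo: create a throwaway agent folder and load it.
with tempfile.TemporaryDirectory() as root:
    folder = Path(root) / "order-agent"
    folder.mkdir()
    (folder / "description.md").write_text("Looks up order status.\n", encoding="utf-8")
    (folder / "instructions.md").write_text("Be concise.\n", encoding="utf-8")
    loaded = load_agent_files(folder)
```

Because the prompts live in their own files, changes to them show up as ordinary diffs in version control, which supports the regression comparison mentioned above.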

 

11. Continuous Improvement & Value Tracking

  • Regularly analyse monitoring metrics and transaction data during development and post go-live.
  • Use insights to refine agent behaviour, validate business impact and identify new optimisation opportunities.

 

12. Prototyping & User Feedback

  • Build UI prototypes early to allow users to interact with the system and capture early real-world feedback.
  • Iterate rapidly based on user experience and observed behaviour.

 

To wrap up, the most successful agentic AI implementations are not those that maximise autonomy, but those that strike the right balance between structured control and intelligent flexibility — anchored firmly in business value.
