Why CrewAI's Manager-Worker Architecture Fails, and How to Fix It

Multi-agent orchestration is one of the most promising applications of LLMs, and CrewAI has quickly become a popular framework for building agent teams. But one of its most important features, the hierarchical manager-worker process, simply does not function as documented. In real workflows, the manager does not effectively coordinate agents; instead, CrewAI executes tasks sequentially, leading to incorrect reasoning, unnecessary tool calls, and very high latency. This issue has been raised in several online forums with no clear resolution.

In this article, I demonstrate why CrewAI's hierarchical process fails, show the evidence from actual Langfuse traces, and provide a reproducible pathway to make the manager-worker pattern work reliably using custom prompting.

Multi-agent Orchestration

Before we get into the details, let us understand what orchestration means in an agentic context. In simple terms, orchestration is managing and coordinating multiple interdependent tasks in a workflow. But haven't workflow management tools (e.g., RPA) been available for ages to do just that? So what changed with LLMs?

The answer is the ability of LLMs to understand meaning and intent from natural language instructions, just as people in a team would. Whereas earlier workflow tools were rule-based and rigid, LLMs functioning as agents are expected to understand the intent of the user's query, use reasoning to create a multi-step plan, infer the tools to be used, derive their inputs in the correct formats, and synthesize all the intermediate results into a precise response to the user's query. Orchestration frameworks, in turn, are meant to guide the LLM with appropriate prompts for planning, tool calling, response generation and so on.

Among orchestration frameworks, CrewAI, with its natural-language definitions of tasks, agents and crews, depends the most on the LLM's ability to understand language and manage workflows. While not as deterministic as LangGraph (since LLM outputs cannot be fully deterministic), it abstracts away much of the complexity of routing, error handling and so on into simple, user-friendly constructs with parameters that the user can tune for the desired behavior. This makes it a good framework for prototyping by product teams and even non-developers.

Except that the manager-worker pattern does not work as intended…

To illustrate, let's take a use case and evaluate the responses against the following criteria:

Quality of orchestration
Quality of final response
Explainability
Latency and usage cost

Use Case

Consider a team of customer support agents that resolves technical or billing tickets. When a ticket comes in, a triage agent categorizes it and then assigns it to the technical or billing support specialist for resolution. A Customer Support Manager coordinates the team, delegates tasks and validates the quality of responses.

Together they will be solving queries such as:

Why is my laptop overheating?
Why was I charged twice last month?
My laptop is overheating and also, I was charged twice last month?
My invoice amount is incorrect after system glitch?

The first query is purely technical, so only the technical support agent should be invoked by the manager; the second is billing only; and the third and fourth require answers from both the technical and billing agents.
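The expected routing described above can be captured as a small lookup table. This is a minimal, framework-independent sketch; `EXPECTED_ROUTES`, `TEST_QUERIES` and `expected_agents` are hypothetical names, not part of CrewAI. Writing the intended behavior down like this gives a concrete yardstick for judging whether the manager delegated correctly.

```python
# Hypothetical routing table: triage category -> specialist roles the manager
# should delegate to, in order. Role names match the agents in this article.
EXPECTED_ROUTES = {
    "Technical": ["Technical Support Specialist"],
    "Billing": ["Billing Support Specialist"],
    "Both": ["Technical Support Specialist", "Billing Support Specialist"],
}

# The four test queries, paired with the category a correct triage should return.
TEST_QUERIES = {
    "Why is my laptop overheating?": "Technical",
    "Why was I charged twice last month?": "Billing",
    "My laptop is overheating and also, I was charged twice last month?": "Both",
    "My invoice amount is incorrect after system glitch?": "Both",
}


def expected_agents(category: str) -> list[str]:
    """Return the specialists the manager should call for a triage category."""
    if category not in EXPECTED_ROUTES:
        raise ValueError(f"Unknown triage category: {category!r}")
    return EXPECTED_ROUTES[category]


if __name__ == "__main__":
    for query, category in TEST_QUERIES.items():
        print(f"{query!r} -> {category}: {expected_agents(category)}")
```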

Let's build this crew of CrewAI agents and see how well it actually works.

Crew of Customer Support Agents

Hierarchical Process

According to the CrewAI documentation, "adopting a hierarchical approach allows for a clear hierarchy in task management, where a 'manager' agent coordinates the workflow, delegates tasks, and validates outcomes for streamlined and effective execution." The manager agent can be created in two ways: automatically by CrewAI, or explicitly by the user. In the latter case, you have more control over the manager's instructions. We will try both for our use case.

CrewAI Code

Following is the code for the use case. I have used gpt-4o as the LLM and Langfuse for observability.

from crewai import Agent, Crew, Process, Task, LLM
from dotenv import load_dotenv
import os
from observe import *  # Langfuse trace setup

load_dotenv()
verbose = False
max_iter = 4

API_VERSION = os.getenv('API_VERSION')
# Create your LLM
llm_a = LLM(
    model="gpt-4o",
    api_version=API_VERSION,
    temperature=0.2,
    max_tokens=8000,
)

# Define the manager agent
supervisor = Agent(
    role="Customer Support Manager",
    goal="Oversee the support team to ensure timely and effective resolution of customer inquiries. Use the tool to categorize the user query first, then decide the next steps. Synthesize responses from different agents if needed to provide a comprehensive answer to the customer.",
    backstory=("""
        You do not try to find an answer to the user ticket {ticket} yourself.
        You delegate tasks to coworkers based on the following logic:
        Note the category of the ticket first by using the triage agent.
        If the ticket is categorized as 'Both', always assign it first to the Technical Support Specialist, then to the Billing Support Specialist, then print the final combined response. Ensure that the final response answers both technical and billing issues raised in the ticket based on the responses from both Technical and Billing Support Specialists.
        ELSE
        If the ticket is categorized as 'Technical', assign it to the Technical Support Specialist, else skip this step.
        Before proceeding further, analyse the ticket category. If it is 'Technical', print the final response. Terminate further actions.
        ELSE
        If the ticket is categorized as 'Billing', assign it to the Billing Support Specialist.
        Finally, compile and present the final response to the customer based on the outputs from the assigned agents.
        """
    ),
    llm=llm_a,
    allow_delegation=True,
    verbose=verbose,
)

# Define the triage agent
triage_agent = Agent(
    role="Query Triage Specialist",
    goal="Categorize the user query into technical or billing related issues. If a query requires both aspects, reply with 'Both'.",
    backstory=(
        "You are a seasoned expert in analysing the intent of user queries. You answer precisely with one word: 'Technical', 'Billing' or 'Both'."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define the technical support agent
technical_support_agent = Agent(
    role="Technical Support Specialist",
    goal="Resolve technical issues reported by customers promptly and effectively",
    backstory=(
        "You are a highly skilled technical support specialist with a strong background in troubleshooting software and hardware issues. "
        "Your primary responsibility is to assist customers in resolving technical problems, ensuring their satisfaction and the smooth operation of their products."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define the billing support agent
billing_support_agent = Agent(
    role="Billing Support Specialist",
    goal="Address customer inquiries related to billing, payments, and account management",
    backstory=(
        "You are an experienced billing support specialist with expertise in handling customer billing inquiries. "
        "Your main objective is to provide clear and accurate information regarding billing processes, resolve payment issues, and assist with account management to ensure customer satisfaction."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define tasks
categorize_tickets = Task(
    description="Categorize the incoming customer support ticket: '{ticket}' based on its content to determine if it is technical or billing-related. If a query requires both aspects, reply with 'Both'.",
    expected_output="A categorized ticket labeled as 'Technical' or 'Billing' or 'Both'. Do not be verbose, just reply with one word.",
    agent=triage_agent,
)

resolve_technical_issues = Task(
    description="Resolve technical issues described in the ticket: '{ticket}'",
    expected_output="Detailed solutions provided to each technical issue.",
    agent=technical_support_agent,
)

resolve_billing_issues = Task(
    description="Resolve billing issues described in the ticket: '{ticket}'",
    expected_output="Comprehensive responses to each billing-related inquiry.",
    agent=billing_support_agent,
)

# Instantiate your crew with a custom manager and hierarchical process
crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    # manager_llm=llm_a,  # Uncomment for auto-created manager
    manager_agent=supervisor,  # Comment out for auto-created manager
    process=Process.hierarchical,
    verbose=verbose,
)

# Run the crew on one ticket; {ticket} in the prompts is filled from inputs
result = crew_q.kickoff(inputs={"ticket": "Why is my laptop overheating?"})
print(result)

As is evident, the program mirrors the team of human agents. Not only are there a manager, a triage agent, and technical and billing support agents, but CrewAI objects such as Agent, Task and Crew are self-explanatory and easy to visualise. Another observation is that there is very little Python code: most of the reasoning, planning and behavior is expressed in natural language, relying on the LLM's ability to derive meaning and intent from language and then reason and plan toward the goal.

CrewAI code therefore scores high on ease of development. It is a low-code way of creating a flow quickly, with most of the heavy lifting of the workflow done by the orchestration framework rather than the developer.

How well does it work?

As we are testing the hierarchical process, the process parameter is set to Process.hierarchical in the Crew definition. We will try the following two CrewAI options and measure performance:

Manager agent auto-created by CrewAI
Using our custom manager agent

1. Auto-created manager agent

Input query: Why is my laptop overheating?

Here is the Langfuse trace:

The key observations are as follows:

First, the output is "Based on the provided context, it seems there is a misalignment between the nature of the issue (laptop overheating) and its categorization as a billing concern. To clarify the connection, it would be important to determine if the customer is requesting a refund for the laptop due to the overheating issue, disputing a charge related to the purchase or repair of the laptop, or seeking compensation for repair costs incurred due to the overheating…" For a query that was clearly a technical issue, this is a poor response.
Why does it happen? The left panel shows that execution first went to the triage specialist, then to technical support and then, strangely, to the billing support specialist as well. The following graphic depicts this:

Looking closely, we find that the triage specialist correctly identified the ticket as "Technical" and the technical support agent gave a great answer, as follows:

But then, instead of stopping and returning the above as the response, the Crew Manager went to the Billing Support Specialist and tried to find a non-existent billing issue in the purely technical user query.

As a result, the Billing agent's response overwrote the Technical agent's response, and the Crew Manager did a poor job of validating the final response against the user's query.

Why did it happen?

Because in the Crew definition I listed the tasks as categorize_tickets, resolve_technical_issues, resolve_billing_issues, and although the process is supposed to be hierarchical, the Crew Manager does not perform any orchestration; it simply executes all the tasks sequentially.
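The failure mode can be reproduced without CrewAI or an LLM. The toy simulation below is an assumption-laden sketch of the behavior seen in the trace, not CrewAI's actual internals: every task in the list runs in order, each sees the earlier outputs as context, and the last output becomes the answer. The keyword-based `triage` function and the canned specialist replies are hypothetical stand-ins for the real agents.

```python
from typing import Callable

# A stub "agent" receives the ticket plus the outputs produced so far.
AgentFn = Callable[[str, list[str]], str]


def triage(ticket: str, context: list[str]) -> str:
    # Crude keyword stand-in for the LLM triage agent (illustration only).
    text = ticket.lower()
    tech = any(word in text for word in ("overheat", "glitch"))
    billing = any(word in text for word in ("charged", "invoice"))
    if tech and billing:
        return "Both"
    return "Technical" if tech else "Billing"


def technical(ticket: str, context: list[str]) -> str:
    return "TECH: clean the vents and check running processes."


def billing(ticket: str, context: list[str]) -> str:
    return "BILLING: searching the ticket for a billing problem."


def run_sequentially(ticket: str, tasks: list[AgentFn]) -> tuple[str, list[str]]:
    """Run every task in list order, ignoring the triage result.

    Returns (final_answer, all_outputs); the final answer is simply the
    output of the last task, which is the behavior the traces show.
    """
    outputs: list[str] = []
    for task in tasks:
        outputs.append(task(ticket, outputs))
    return outputs[-1], outputs


if __name__ == "__main__":
    final, outputs = run_sequentially(
        "Why is my laptop overheating?", [triage, technical, billing]
    )
    print(outputs[0])  # Technical
    print(final)       # BILLING: searching the ticket for a billing problem.
```

Even though triage returns "Technical", the final answer comes from the billing stub. A ticket that needs both specialists ends the same way in this model: the technical output is produced but only the billing output survives.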

crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    manager_llm=llm_a,  # CrewAI auto-creates the manager from this LLM
    process=Process.hierarchical,
    verbose=verbose,
)

If you now ask a billing-related query, it will appear to give a correct answer simply because resolve_billing_issues is the last task in the sequence.

What about a query that requires both technical and billing support, such as "My laptop is overheating and also I was charged twice last month?" Here too, the triage agent correctly categorizes the ticket as "Both", and the technical and billing agents give correct answers to their individual questions, but the manager is unable to combine the responses into a coherent answer to the user's query. Instead, the final response only reflects the billing response, since that is the last task called in the sequence.

Latency and Usage: As can be seen from the image above, the Crew execution took almost 38 seconds and consumed 15,759 tokens. The final output is only about 200 tokens. The rest of the tokens were spent on thinking, agent calls, intermediate responses and so on, all to produce an unsatisfactory response in the end. The performance can be categorised as "Poor".
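A quick back-of-the-envelope calculation with the figures above shows how little of the token spend ends up in the answer. The 200-token output size is the approximate figure quoted, so the percentages are approximate too.

```python
# Figures from the auto-created manager run (output size is approximate).
total_tokens = 15759
final_output_tokens = 200  # "only about 200 tokens"

overhead_tokens = total_tokens - final_output_tokens
output_share = final_output_tokens / total_tokens
overhead_share = overhead_tokens / total_tokens

print(f"Overhead tokens: {overhead_tokens}")      # 15559
print(f"Final output share: {output_share:.1%}")  # 1.3%
print(f"Overhead share: {overhead_share:.1%}")    # 98.7%
```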

Evaluation of this approach

Quality of orchestration: Poor
Quality of final output: Poor
Explainability: Poor
Latency and Usage: Poor

But perhaps the above result is due to the fact that we relied on CrewAI's built-in manager, which did not have our custom instructions. Therefore, in our next approach we replace CrewAI's automatic manager with our custom Manager agent, which has detailed instructions on what to do for Technical, Billing or Both tickets.

2. Using a Custom Manager Agent

Our Customer Support Manager is defined with the following very specific instructions. Note that getting this to work requires some experimentation: a generic manager prompt such as the one in the CrewAI documentation gives the same erroneous results as the built-in manager agent above.

supervisor = Agent(
    role="Customer Support Manager",
    goal="Oversee the support team to ensure timely and effective resolution of customer inquiries. Use the tool to categorize the user query first, then decide the next steps. Synthesize responses from different agents if needed to provide a comprehensive answer to the customer.",
    backstory=("""
        You do not try to find an answer to the user ticket {ticket} yourself.
        You delegate tasks to coworkers based on the following logic:
        Note the category of the ticket first by using the triage agent.
        If the ticket is categorized as 'Both', always assign it first to the Technical Support Specialist, then to the Billing Support Specialist, then print the final combined response. Ensure that the final response answers both technical and billing issues raised in the ticket based on the responses from both Technical and Billing Support Specialists.
        ELSE
        If the ticket is categorized as 'Technical', assign it to the Technical Support Specialist, else skip this step.
        Before proceeding further, analyse the ticket category. If it is 'Technical', print the final response. Terminate further actions.
        ELSE
        If the ticket is categorized as 'Billing', assign it to the Billing Support Specialist.
        Finally, compile and present the final response to the customer based on the outputs from the assigned agents.
        """
    ),
    llm=llm_a,
    allow_delegation=True,
    verbose=verbose,
)

And in the Crew definition, we use the custom manager instead of the built-in one:

crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    # manager_llm=llm_a,
    manager_agent=supervisor,
    process=Process.hierarchical,
    verbose=verbose,
)
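For readability, the control flow that the custom backstory asks the manager to follow can be expressed as ordinary Python. This is a deterministic sketch of the prompt's intent, assuming triage returns exactly one of 'Technical', 'Billing' or 'Both'; `manage_ticket` and the lambda stubs are hypothetical. In the real crew nothing enforces this branching: the LLM interprets the instructions and is free to deviate from them.

```python
from typing import Callable

# A specialist takes the ticket text and returns its answer (stubbed here).
Specialist = Callable[[str], str]


def manage_ticket(
    ticket: str,
    triage: Callable[[str], str],
    technical: Specialist,
    billing: Specialist,
) -> tuple[str, list[str]]:
    """Mirror the manager backstory: triage first, delegate conditionally, then stop.

    Returns the final response and the roles of the specialists that were called.
    """
    category = triage(ticket).strip()

    if category == "Both":
        # Technical first, then Billing, then one combined response.
        called = ["Technical Support Specialist", "Billing Support Specialist"]
        return technical(ticket) + "\n\n" + billing(ticket), called
    if category == "Technical":
        # Return the technical answer and terminate; Billing is never called.
        return technical(ticket), ["Technical Support Specialist"]
    if category == "Billing":
        return billing(ticket), ["Billing Support Specialist"]
    raise ValueError(f"Unexpected triage category: {category!r}")


if __name__ == "__main__":
    answer, called = manage_ticket(
        "Why is my laptop overheating?",
        triage=lambda t: "Technical",
        technical=lambda t: "TECH: clean the vents.",
        billing=lambda t: "BILLING: check the invoice.",
    )
    print(called)  # ['Technical Support Specialist']
    print(answer)  # TECH: clean the vents.
```

Writing the branches out this way also makes gaps in the prompt easier to spot, for example what the manager should do if triage returns something other than the three expected labels.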

Let's repeat the test cases.

Input query: Why is my laptop overheating?

The trace is as follows:

Graph of 'Why is my laptop overheating?'

The most important observation is that, for this technical query, the flow no longer goes to the Billing Support Specialist. The manager correctly followed its instructions, categorized the query as technical and stopped execution once the Technical Support Specialist had generated its response. The response preview shows that it is a good answer to the user's query. Also, latency is 24 seconds and token usage is 10k.
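One way to make the Explainability criterion checkable is to compare the sequence of agents invoked in a trace with the route the triage category calls for. The sketch below assumes you have already extracted the ordered list of worker-agent roles from the Langfuse trace (how you export it is outside its scope); `route_matches` is a hypothetical helper. The two example sequences are the ones described for the technical query: the auto-created manager's run and the custom manager's run.

```python
# Hypothetical routing table (same roles as the agents in this article).
EXPECTED_ROUTES = {
    "Technical": ["Technical Support Specialist"],
    "Billing": ["Billing Support Specialist"],
    "Both": ["Technical Support Specialist", "Billing Support Specialist"],
}


def route_matches(category: str, invoked: list[str]) -> bool:
    """True if the specialists invoked after triage are exactly the expected ones, in order."""
    specialists = [role for role in invoked if role != "Query Triage Specialist"]
    return specialists == EXPECTED_ROUTES[category]


# Worker sequences described for "Why is my laptop overheating?"
auto_manager_run = [
    "Query Triage Specialist",
    "Technical Support Specialist",
    "Billing Support Specialist",
]
custom_manager_run = ["Query Triage Specialist", "Technical Support Specialist"]

if __name__ == "__main__":
    print(route_matches("Technical", auto_manager_run))    # False
    print(route_matches("Technical", custom_manager_run))  # True
```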

Input query: Why was I charged twice last month?

The trace is as follows:

Response to 'Why was I charged twice last month?'

Graph of 'Why was I charged twice last month?'

As can be seen, the manager correctly skipped the Technical Support Specialist, even though that task comes before the Billing task in the Crew definition. The response, generated by the Billing Support Specialist alone, is of excellent quality. Latency is 16 seconds and token usage only 7,700.

Input query: My laptop is overheating and also, I was charged twice last month?

The trace shows that the Manager executed both the Technical and Billing support agents and provided a combined response.

The response preview in the figure above does not show the full response, which combines the answers from both support agents and is reproduced below. Latency is 38 seconds and token usage is 20k, which is commensurate with orchestrating multiple agents and generating a detailed response.

Dear Customer,

Thank you for reaching out to us regarding the issues you are experiencing. We sincerely apologize for any inconvenience caused. Below are the detailed solutions to address your concerns:

**1. Laptop Overheating Issue:**
   - **Check for Proper Ventilation**: Ensure your laptop is placed on a hard, flat surface to allow proper airflow. Avoid using it on soft surfaces like beds or couches that can block the vents. Consider using a laptop cooling pad or stand with built-in fans to improve airflow.
   - **Clean the Laptop's Vents and Fans**: Dust and debris can accumulate in the vents and fans, restricting airflow. Power off the laptop, unplug it, and use a can of compressed air to gently blow out dust from the vents. If you are comfortable, you can clean the internal fans and components more thoroughly, or take the laptop to a professional technician for internal cleaning.
   - **Monitor Running Applications and Processes**: Open the Task Manager (Windows: Ctrl + Shift + Esc, macOS: Activity Monitor) and check for processes consuming high CPU or GPU usage. Close unnecessary applications or processes to reduce the load on the system.
   - **Update Drivers and Software**: Update your operating system, drivers (especially graphics drivers), and any other essential software to the latest versions.
   - **Check for Malware or Viruses**: Run a full system scan using a reputable antivirus program to detect and remove any malware.
   - **Adjust Power Settings**: Adjust your power settings to "Balanced" or "Power Saver" mode (Windows: Control Panel > Power Options, macOS: System Preferences > Energy Saver).
   - **Inspect the Laptop's Hardware**: If the laptop is still overheating, there may be an issue with the hardware, such as a failing fan or thermal paste that needs replacement. Consult a professional technician to inspect and replace the thermal paste or faulty hardware components if necessary.
   - **Environmental Factors**: Operate the laptop in a cool, well-ventilated environment. Avoid using the laptop in direct sunlight or near heat sources.
   - **Consider Upgrading Components**: If the laptop is older, consider upgrading components such as RAM or switching to an SSD to reduce the strain on the system and help with heat management.
   - **Monitor Temperature Levels**: Install a temperature monitoring tool (e.g., HWMonitor, Core Temp, or Macs Fan Control) to keep track of the CPU and GPU temperatures. This can help identify if the laptop is consistently running at high temperatures and provide insights into when the overheating occurs.

If the problem persists after trying all the above solutions, please contact the laptop manufacturer's support team or a certified repair technician for further assistance.

**2. Duplicate Billing Issue:**
   - **Verification of Account Details**: I reviewed your account information to ensure accuracy and confirmed that the account is active and matches the reported issue.
   - **Billing Records Check**: Upon examining the billing records for last month, I identified two identical charges processed on the same day. This confirms the duplicate billing issue.
   - **Payment Method Details**: I verified the payment method used for the transactions and confirmed that both charges were processed using the same payment method.
   - **Resolution**:
       - Since the duplicate charge was confirmed, I have initiated a refund for the second transaction. The refund will be processed back to the original payment method within 5-7 business days.
       - Additionally, I have applied a credit to your account as a gesture of goodwill for the inconvenience caused.
   - **Customer Communication**: You will receive an email or phone call from us to inform you of the resolution, including the refund timeline and the applied credit.

We sincerely apologize for the inconvenience caused by these issues. Please let us know if you need further assistance or have any additional questions. We're here to help.

And our final, more nuanced query is as follows:

Input query: My invoice amount is incorrect after system glitch?

The trace shows that the query was correctly categorised as 'Both', requiring responses from the Technical and Billing agents, and a combined response was provided. Latency is 24 seconds and 12k tokens were used.

The full response is included here:

Thank you for bringing this issue to our attention. Based on the technical investigation, here are the steps we are taking to resolve the system glitch that caused the incorrect invoice amount:

1. We have verified the details of the incorrect invoice amount and identified the issue in the system.
2. Our technical team has analyzed the system logs and reviewed the invoice calculation logic to pinpoint the root cause of the error.
3. The necessary fixes have been implemented to address the glitch, and the system has been thoroughly tested to ensure the issue is resolved.
4. A corrected invoice has been generated, and we have verified its accuracy.

Additionally, here's the complete resolution for the billing issue related to the incorrect invoice amount caused by the system glitch:

1. **Corrected Invoice Verification**: The technical team has resolved the system glitch and generated a corrected invoice. The corrected invoice has been personally verified to ensure its accuracy, including cross-checking the amounts, dates, and any applicable charges or credits.

2. **Customer Communication**: We will promptly notify you about the resolution of the issue. This communication includes:
   - An apology for the inconvenience caused by the incorrect invoice.
   - Confirmation that the system glitch has been resolved.
   - Assurance that the corrected invoice has been thoroughly reviewed for accuracy.
   - A copy of the corrected invoice for your records.

3. **Additional Steps Taken**: To prevent similar issues in the future, the technical team has implemented measures to ensure system stability and accuracy in invoice generation.

4. **Account Adjustment (if applicable)**: If the incorrect invoice resulted in any overpayment or underpayment, the necessary adjustments will be made to your account. This includes issuing a refund for any overpayment or providing clear instructions for settling any outstanding balance.

5. **Follow-Up**: We are here to assist you with any further questions or concerns regarding your account or billing. Please do not hesitate to reach out to us, and we will be happy to help. For your convenience, we have provided direct contact information for further communication.

We sincerely apologize for any inconvenience this may have caused and assure you that we are taking steps to prevent similar issues in the future. Thank you for your understanding and patience.

Evaluation of this approach

Quality of orchestration: Good
Quality of final output: Good
Explainability: Good (we understand why it did what it did)
Latency and Usage: Fair (commensurate with the complexity of the output)
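The reported figures make the difference between the two approaches easy to quantify. This sketch assumes that the 38 s / 15,759-token figures of the auto-created run and the 24 s / ~10k-token figures of the custom manager both refer to the technical query "Why is my laptop overheating?"; since the custom token count was reported as roughly 10k, the computed reduction is approximate.

```python
# Reported figures for "Why is my laptop overheating?" (custom token count is approximate).
runs = {
    "auto-created manager": {"latency_s": 38, "tokens": 15759},
    "custom manager": {"latency_s": 24, "tokens": 10_000},
}


def reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


auto, custom = runs["auto-created manager"], runs["custom manager"]
latency_cut = reduction(auto["latency_s"], custom["latency_s"])
token_cut = reduction(auto["tokens"], custom["tokens"])

print(f"Latency reduction: {latency_cut:.0%}")  # 37%
print(f"Token reduction:   {token_cut:.0%}")    # 37%
```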

Takeaway

In summary, the hierarchical Manager-Worker pattern in CrewAI does not function as documented. The core orchestration logic is weak: instead of letting the manager selectively delegate tasks, CrewAI executes all tasks sequentially, causing incorrect agent invocation, overwritten outputs, and inflated latency and token usage. The failure comes down to the framework's internal routing. Hierarchical mode does not enforce conditional branching or true delegation, so the final response is effectively determined by whichever task happens to run last. The fix is a custom manager agent with explicit, step-wise instructions: it uses the triage result, conditionally calls only the required agents, synthesizes their outputs, and terminates execution at the right point. This restores correct routing, improves output quality, and significantly reduces token costs.

Conclusion

CrewAI, in the spirit of keeping the LLM at the center of orchestration, relies on it for most of the heavy lifting, using user prompts combined with detailed scaffolding prompts embedded in the framework. Unlike LangGraph and AutoGen, this approach trades determinism for developer-friendliness, which sometimes results in unexpected behavior for critical features such as the manager-worker pattern, essential for many real-life use cases. This article demonstrates a pathway to achieve the desired orchestration for this pattern using careful prompting. In future articles, I intend to explore more features of CrewAI, LangGraph and others for their applicability in practical use cases.

You can use CrewAI to design an interactive conversational assistant over a document store and further make the responses truly multimodal. Refer to my articles on GraphRAG Design and Multimodal RAG.

Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI

All images in this article were drawn by me or generated using Copilot or Langfuse. The code shared was written by me.
