It is now April 2024, and I have been using LLMs like ChatGPT for about 17 months to assist with code generation and debugging tasks. While this has dramatically improved my productivity, the generated code can still be buggy, and you may end up going down the good old StackOverflow route anyway.
This article briefly demonstrates how this lack of "validation" can be addressed using the Conversable Agent provided by AutoGen.
What is AutoGen?
"AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks."
Introducing the LeetCode problem-solving application:
First, install autogen quietly:
!pip install pyautogen -q --progress-bar off
I am using Google Colab, so I entered OPENAI_API_KEY in the Secrets tab and loaded it safely along with the other modules.
import os
import csv
import autogen
from autogen import Cache
from google.colab import userdata
userdata.get('OPENAI_API_KEY')
I'm using gpt-3.5-turbo simply because it is cheaper than gpt-4. If you can afford more expensive experiments, or are taking things more "seriously", you should obviously use a more powerful model.
llm_config = {
    "config_list": [{"model": "gpt-3.5-turbo", "api_key": userdata.get('OPENAI_API_KEY')}],
    "cache_seed": 0,  # seed for caching and reproducibility
    "temperature": 0,  # temperature to control randomness
}
Now copy the question text from my favorite LeetCode problem, Two Sum. This is one of the most commonly asked questions in LeetCode-style interviews and covers fundamental concepts such as caching with hashmaps and basic equation manipulation.
LEETCODE_QUESTION = """
Title: Two SumGiven an array of integers nums and an integer goal, return indices of the 2 numbers such that they add as much as goal. You might assume that every enter would have precisely one resolution, and chances are you'll not use the identical aspect twice. You'll be able to return the reply in any order.
Instance 1:
Enter: nums = [2,7,11,15], goal = 9
Output: [0,1]
Clarification: As a result of nums[0] + nums[1] == 9, we return [0, 1].
Instance 2:
Enter: nums = [3,2,4], goal = 6
Output: [1,2]
Instance 3:
Enter: nums = [3,3], goal = 6
Output: [0,1]
Constraints:
2 <= nums.size <= 104
-109 <= nums[i] <= 109
-109 <= goal <= 109
Just one legitimate reply exists.
Comply with-up: Are you able to give you an algorithm that's lower than O(n2) time complexity?
"""
Now we can define both agents. One acts as an "assistant" agent that suggests solutions, and the other acts as a proxy for us, the user, and is responsible for executing the suggested Python code.
# create an AssistantAgent named "assistant"
SYSTEM_MESSAGE = """You are a helpful AI assistant.
Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.
1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.
2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.

Additional requirements:
1. Within the code, add functionality to measure the total run-time of the algorithm in the python function using the "time" library.
2. Only when the user proxy agent confirms that the Python script ran successfully and the total run-time (printed on stdout console) is less than 50 ms, only then return a concluding message with the word "TERMINATE". Otherwise, repeat the above process with a more optimal solution if it exists.
"""

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message=SYSTEM_MESSAGE,
)
# create a UserProxyAgent instance named "user_proxy"
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=4,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
)
I set human_input_mode to "NEVER" because I don't plan to type anything myself, and max_consecutive_auto_reply to 4 to limit the conversation length. The assistant agent is instructed to reply with the word "TERMINATE", which tells the UserProxyAgent when to end the conversation.
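The termination predicate passed to is_termination_msg is just a plain Python callable over the message dict, so it can be sanity-checked in isolation. The sample message dicts below are made up purely for illustration:

```python
# The same predicate passed to is_termination_msg above
is_termination = lambda msg: msg.get("content", "").rstrip().endswith("TERMINATE")

print(is_termination({"content": "All checks passed. TERMINATE"}))  # True
print(is_termination({"content": "TERMINATE\n "}))                  # True: trailing whitespace is stripped
print(is_termination({"content": "Still working on it..."}))        # False
print(is_termination({}))                                           # False: missing content counts as empty
```

Note the rstrip() call: models often emit trailing whitespace or newlines after "TERMINATE", and without it the conversation would never end.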
Now for the fun part! Start the conversation by sending a message from the UserProxyAgent to the assistant.
An additional benefit of using AutoGen (even for non-agent workflows) is that it provides explicit caching functionality that helps save API costs during development. Here we are caching the response to disk, but you can also integrate Redis for this purpose.
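Under the hood this is essentially response memoization keyed by the request. As a minimal, framework-free sketch of the same idea using only the standard library (expensive_llm_call is a hypothetical stand-in for a paid API call, not part of AutoGen):

```python
import os
import shelve
import tempfile

calls = {"count": 0}

def expensive_llm_call(prompt: str) -> str:
    # Hypothetical stand-in for a paid LLM API call
    calls["count"] += 1
    return f"response to: {prompt}"

def cached_call(prompt: str, cache_path: str) -> str:
    # Check a small on-disk key-value store first;
    # only fall back to the expensive call on a cache miss.
    with shelve.open(cache_path) as db:
        if prompt not in db:
            db[prompt] = expensive_llm_call(prompt)
        return db[prompt]

cache_path = os.path.join(tempfile.mkdtemp(), "llm_cache")
first = cached_call("solve two sum", cache_path)   # miss: calls the "API"
second = cached_call("solve two sum", cache_path)  # hit: served from disk
print(first == second, calls["count"])  # True 1
```

AutoGen's Cache does the same thing with the full request (model, messages, parameters) as the key, which is why a repeated initiate_chat during development costs nothing.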
# Use DiskCache as cache
with Cache.disk(cache_seed=7) as cache:
    # the assistant receives a message from the user_proxy, which contains the task description
    chat_res = user_proxy.initiate_chat(
        assistant,
        message="""Solve the following leetcode problem and also comment on its time and space complexity:\n\n""" + LEETCODE_QUESTION,
    )
Output:
user_proxy (to assistant):

Solve the following leetcode problem and also comment on its time and space complexity:

Title: Two Sum
...
Follow-up: Can you come up with an algorithm that is less than O(n^2) time complexity?
--------------------------------------------------------------------------------
assistant (to user_proxy):

To solve this problem efficiently, we can use a dictionary to store the elements we have seen so far along with their indices. While iterating through the array, we can check if the complement of the current element (target - current element) is already in the dictionary. If it is, we have found the pair that sums up to the target.

Here's the Python code to solve the Two Sum problem with a time complexity less than O(n^2):

```python
# filename: two_sum.py
def two_sum(nums, target):
    num_dict = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in num_dict:
            return [num_dict[complement], i]
        num_dict[num] = i

# Test the function with the given examples
nums1 = [2, 7, 11, 15]
target1 = 9
print(two_sum(nums1, target1))  # Output: [0, 1]

nums2 = [3, 2, 4]
target2 = 6
print(two_sum(nums2, target2))  # Output: [1, 2]

nums3 = [3, 3]
target3 = 6
print(two_sum(nums3, target3))  # Output: [0, 1]
```

The time complexity of this solution is O(n) because we iterate through the list of numbers only once. The space complexity is also O(n) because we store elements in a dictionary.

You can run this code and check if it gives the correct output. Let me know the total run-time, and if it's less than 50 ms, you can reply with "TERMINATE". If it's not, I can suggest a more optimal solution.
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):
exitcode: 0 (execution succeeded)
Code output:
[0, 1]
[1, 2]
[0, 1]
--------------------------------------------------------------------------------
assistant (to user_proxy):

The Python script ran successfully and provided the correct output for the given examples. Since the total run-time is not provided, I will suggest measuring the total run-time of the algorithm using the "time" library to determine if it's less than 50 ms.

You can add the functionality to measure the total run-time by importing the "time" library and using it to calculate the elapsed time before and after calling the `two_sum` function. Here's how you can modify the code:
```python
# filename: two_sum.py
import time

def two_sum(nums, target):
    num_dict = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in num_dict:
            return [num_dict[complement], i]
        num_dict[num] = i

# Test the function with the given examples and measure the total run-time
nums1 = [2, 7, 11, 15]
target1 = 9
start_time = time.time()
print(two_sum(nums1, target1))  # Output: [0, 1]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")

nums2 = [3, 2, 4]
target2 = 6
start_time = time.time()
print(two_sum(nums2, target2))  # Output: [1, 2]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")

nums3 = [3, 3]
target3 = 6
start_time = time.time()
print(two_sum(nums3, target3))  # Output: [0, 1]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
```

Please run this modified code and provide the total run-time for each example. If the total run-time is less than 50 ms for all examples, you can reply with "TERMINATE".
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):
exitcode: 0 (execution succeeded)
Code output:
[0, 1]
Total run-time: 0.01239776611328125 ms
[1, 2]
Total run-time: 0.00286102294921875 ms
[0, 1]
Total run-time: 0.0016689300537109375 ms
--------------------------------------------------------------------------------
assistant (to user_proxy):
The Python script ran successfully, and the total run-time for each example is significantly less than 50 ms. Therefore, you can reply with "TERMINATE". The algorithm provided an efficient solution with a time complexity of O(n) and a space complexity of O(n).
--------------------------------------------------------------------------------
What is happening here:
- The UserProxyAgent asks the assistant to solve the problem based on the task description.
- The assistant suggests a solution in a Python code block.
- The UserProxyAgent executes the Python code.
- The assistant reads the console output and responds with a modified solution (adding the time-measurement functionality). To be honest, I was expecting this modified solution right away, but this behavior can be tuned through prompt engineering or by using a stronger LLM.
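One nitpick on the generated code: time.time() is a wall-clock with limited resolution, so for timing short runs time.perf_counter() is usually the better choice. A quick sketch of the same measurement pattern applied to the hashmap solution above:

```python
import time

def two_sum(nums, target):
    # Same hashmap approach as the generated solution
    num_dict = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in num_dict:
            return [num_dict[complement], i]
        num_dict[num] = i

start = time.perf_counter()          # monotonic, high-resolution clock
result = two_sum([2, 7, 11, 15], 9)
elapsed_ms = (time.perf_counter() - start) * 1000
print(result)  # [0, 1]
print(f"Total run-time: {elapsed_ms:.4f} ms")
```

This is exactly the kind of refinement you could also push into the agent itself via the system message.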
AutoGen also lets you inspect the costs of an agent workflow:
chat_res.cost
{'total_cost': 0,
 'gpt-3.5-turbo-0125': {'cost': 0,
  'prompt_tokens': 14578,
  'completion_tokens': 3460,
  'total_tokens': 18038}}
Conclusion:
So, using AutoGen's conversable agents:
- We automatically verified that the Python code suggested by the LLM actually works.
- We also created a framework in which the LLM can react to syntax and logic errors by reading the console output.
Thanks for reading! Follow and subscribe to be the first to know when I post a new article. 🙂
Check out my other articles too.

