Info
The AI Agent Challenge is a 6-hour online team-based competition open to professionals and students (aged 16 or above) worldwide, passionate about AI and Multi-Agent Systems.
All participants compete in the same challenge, but students can also join their dedicated University or High School leagues for additional prizes.
Replyers can take part too, but in a different leaderboard: please follow the communication on TamTamy.
No. To play on 16th April, you must create a team or join an existing one. Teams are made up of 2 to 4 members.
You can register until 15th April, 23.59 CEST.
Once you’ve registered, you can form a team and invite other participants, or join our Discord server to find teammates. If you still don't have a teammate when registration closes, we will merge you with another team.
To update your details at any time, log in to your profile and click “Edit profile”.
You can access your Challenge Page and click on “Leave team” to cancel your registration to the challenge.
To form your team, log in to the Reply Challenges platform, click the “Join the challenge” button and select “Create new team”. Once you’ve formed a team, you’ll see it when you log in to the platform. You can also choose a team name and invite your friends and classmates. Just fill in their email addresses and send the invitation.
Your team must have between 2 and 4 members. If you're still registered alone at the end of the registration period, you'll be automatically paired with another participant to form a team of two. You'll receive an email notification with your team assignment.
You can connect to challenges.reply.com, select the AI Agent Challenge and click on Join the Challenge.
No, but you are free to leave your current team. They won’t receive any notification, so remember to tell them.
Where do I see my team?
You can see your team on your challenge page: connect to challenges.reply.com and, on the homepage, click on “Your Challenge Page”.
No. There are two challenges and two separate leaderboards, so mixed teams are not possible.
We strongly recommend exploring all the materials in the Learn & Train section, including:
Learning modules (from basics to advanced topics about Agents creation and resources management)
Tutorials and instructions for the tools provided during the competition
Sandbox problems to practice and understand the challenge mechanics (starting from February 26th)
You can use any programming language you're comfortable with to build your agentic system. The most commonly used languages for AI agent development are Python and JavaScript/TypeScript, but the choice is entirely up to you and your team. The only requirement is that your code must be functional and submitted as specified in the competition rules.
Langfuse is required for tracking and validation purposes only. Your score will be calculated exclusively based on the output files you submit, not on Langfuse tracking data. Make sure to include your Langfuse session ID in all submissions as per the instructions in the Learn & Train section.
The sandbox mode is a game simulation, useful to help you understand how to upload your solution.
Upload your files by dragging and dropping them or selecting them from your computer.
Training dataset: Submit an output file for each dataset.
Evaluation dataset: Submit both an output file and your source code. The source code must be provided as a zip file containing your complete agentic system with all necessary components to run it (code, dependencies list, configuration files, instructions, etc.).
Important: To ensure proper tracking according to the competition rules, you must include the Langfuse session ID in your submission. Find detailed instructions in the Learn & Train section.
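The source-code archive described above could be built with a few lines of Python. This is only a minimal sketch using the standard library: the function name build_submission_zip and the EXCLUDED_DIRS set are illustrative assumptions, not part of the official rules, so check the competition rules for what the zip must actually contain.

```python
import zipfile
from pathlib import Path

# Assumption: folders that are usually not needed to run the system.
EXCLUDED_DIRS = {".git", "__pycache__", ".venv", "node_modules"}


def build_submission_zip(project_dir: str, zip_path: str) -> list[str]:
    """Zip project_dir into zip_path and return the archived relative paths."""
    root = Path(project_dir).resolve()
    target = Path(zip_path).resolve()
    archived = []
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it lives inside root.
            if not path.is_file() or path.resolve() == target:
                continue
            rel = path.relative_to(root)
            # Skip anything located under an excluded folder.
            if EXCLUDED_DIRS.intersection(rel.parts):
                continue
            zf.write(path, rel.as_posix())
            archived.append(rel.as_posix())
    return archived
```

Printing the returned list before uploading is a quick way to confirm that the code, dependency list, configuration files and instructions are all in the archive.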
Yes. You’ll see a list of scores for all your submissions.
No, but you’ll see your submission scores.
You're free to choose the tools and frameworks that best suit your approach. However, for the competition you must use:
LLMs via API: We will provide you with an API key to access the LLMs available for the challenge
Langfuse: You must integrate Langfuse for tracking purposes following the instructions provided in the Learn & Train section
Beyond these requirements, you can use any additional libraries, frameworks, or tools to develop your agentic system.
Try reloading the page, then try clearing your cache and cookies. If you’re still having problems, message the Reply AIvengers on chat or email challenges@reply.com
You’ll need your own computer with an internet connection.
No, the platform does not execute the code: the score is calculated using only the output files. However, the AIvengers team must be able to run the code you provide to check that the submission is correct.
The AI Agent Challenge is a 6-hour competition where your team will build an agentic system to solve a specific problem statement.
Timeline on April 16th:
⏰ Challenge starts at 15.30
💯 Leaderboard frozen from 21.00 until 21.30
✋ Challenge ends at 21.30
🏆 Within 10 working days: podium validation & final results
On your challenge page you’ll have access to:
Training Dataset:
- Use these datasets to develop and refine your agentic system
- Submit outputs as many times as you want
- Check your score after each submission to track your progress
Evaluation Dataset:
- Submit your final solution only once
- Include both your output and source code (zip file with your agentic system)
- Your final score will be based solely on the evaluation dataset performance
At challenge start: Your team will have access to the first three datasets.
Once your team submits the final evaluation solution for the first three datasets, datasets 4 and 5 will be automatically unlocked.
Resources Provided:
Problem statement
API key for LLM access
Training and evaluation datasets
Token allocation for your team: your team receives a budget in two stages:
Datasets 1-3: $40 in tokens
Datasets 4-5: $120 in tokens (unlocked only after submitting evaluation solutions for datasets 1-3)
Token monitoring dashboard
Tracing dashboard with all your team's Langfuse tracking data
Find all materials, instructions, and tools in the competition page and Learn & Train section.
Check out this link for more detailed information.
Token management is a strategic component of the challenge. Once your allocated budget is exhausted, it cannot be refilled. Use your tokens wisely and plan your approach carefully!
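One way to keep an eye on a fixed budget is to record the cost of every LLM call locally. The sketch below is an illustration only: the BudgetTracker class and the per-1,000-token prices passed to it are hypothetical, and the token monitoring dashboard provided for the challenge remains the reference for actual spend.

```python
class BudgetTracker:
    """Tracks estimated spend in USD against a fixed, non-refillable budget."""

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int,
               usd_per_1k_input: float, usd_per_1k_output: float) -> float:
        """Add the estimated cost of one call and return that cost."""
        cost = (input_tokens / 1000) * usd_per_1k_input \
            + (output_tokens / 1000) * usd_per_1k_output
        self.spent_usd += cost
        return cost

    @property
    def remaining_usd(self) -> float:
        return self.budget_usd - self.spent_usd

    def can_afford(self, estimated_cost_usd: float) -> bool:
        """True if a call with this estimated cost still fits the budget."""
        return estimated_cost_usd <= self.remaining_usd


# Example: the $40 stage for datasets 1-3, with made-up prices.
tracker = BudgetTracker(40.0)
tracker.record(input_tokens=2000, output_tokens=1000,
               usd_per_1k_input=0.5, usd_per_1k_output=1.5)
```

Calling can_afford before an expensive run makes it easier to stop early rather than exhaust the budget in the middle of a dataset.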
You can message the AIvengers team via chat (Discord).
Yes! You're free to use any development and execution environment you prefer.
The only requirements are:
You must include the Langfuse session ID for tracking (see instructions in the Learn & Train section)
Your final submission for the evaluation dataset must include both output and source code (zip file with your agentic system)
Yes, it’s an online-only competition. However, you are welcome to organize local hubs at your university, school, or workplace, where participants can gather and play together. In some locations, Reply may support official hubs with gadgets or snacks: check the platform or contact us for more information.
We’ll update the leaderboard regularly to show how teams are performing. We’ll also freeze it 30 minutes before the challenge deadline (but we’ll continue to update scores).
Upload your files by dragging and dropping them or selecting them from your computer.
Training dataset: Submit an output file for each dataset as many times as you want (be careful with token management).
Evaluation dataset: Submit both an output file and your source code.
Important: To ensure proper tracking according to the competition rules, you must include the Langfuse session ID in your submission. Find detailed instructions in the Learn & Train section.
It depends on the dataset:
Training dataset: Unlimited submissions. You can upload and test your solutions as many times as you want to refine your agentic system and track your progress.
Evaluation dataset: One submission only. You can upload your final solution only once, so make sure it's complete and tested before submitting.
Your final score will be based solely on your evaluation dataset submission.
The scoring system evaluates AI multi-agent systems based on multiple weighted criteria, including but not limited to:
Detection Accuracy: How well your system distinguishes between fraudulent and non-fraudulent transactions.
Economic Impact Assessment: The financial consequences of errors — prioritizing the prevention of high-cost fraudulent activities and operational losses.
Agentic Efficiency and Optimization: Your system's speed, cost-effectiveness, and architectural complexity. Both over-engineered and overly simplistic solutions will be penalized.
The specific weights and the detailed scoring methodology will be published in this section from April 1, 2026, before the event.
Benchmark & Bonus:
All metrics are evaluated against an optimal benchmark solution. Solutions that outperform this benchmark receive additional credit.
Dataset Difficulty:
Each dataset has a weighted scoring system where more complex datasets offer higher maximum points.
Your submission will be rejected if it contains any of the following errors:
Missing output files: All required output files must be included
Missing source code: Evaluation dataset submissions must include source code
Missing Langfuse session ID: Required for tracking and validation
Corrupted zip file: Ensure your zip file is properly compressed and can be extracted
Incomplete system: Missing dependencies, configuration files, or instructions
Double-check your submission before uploading, especially for the evaluation dataset (you only get one chance!).
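Some of the rejection reasons listed above can be caught locally before uploading. The following is a rough sketch, not the platform's actual validation: check_submission and its messages are hypothetical, and it cannot tell whether dependencies or instructions are missing from your system.

```python
import zipfile
from pathlib import Path
from typing import Optional


def check_submission(session_id: str, output_path: str,
                     source_zip_path: Optional[str] = None) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if not session_id or not session_id.strip():
        problems.append("missing Langfuse session ID")

    output = Path(output_path)
    if not output.is_file() or output.stat().st_size == 0:
        problems.append("missing or empty output file")

    # The source code zip is only needed for evaluation dataset submissions.
    if source_zip_path is not None:
        archive = Path(source_zip_path)
        if not archive.is_file():
            problems.append("missing source code zip")
        elif not zipfile.is_zipfile(archive):
            problems.append("source code is not a valid zip file")
        else:
            with zipfile.ZipFile(archive) as zf:
                if zf.testzip() is not None:  # name of first bad member
                    problems.append("corrupted zip file")
                elif not zf.namelist():
                    problems.append("source code zip is empty")
    return problems
```

For a training dataset, leave source_zip_path unset; for the evaluation dataset, pass the path of the zip and submit only when the returned list is empty.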
Langfuse is required for tracking and validation purposes only. Your score will be calculated exclusively based on the output files you submit, not on Langfuse tracking data. Make sure to include your Langfuse session ID in all submissions as per the instructions in the Resource Management learning material.
Check out this link for more detailed information
A submission is made of three elements: the Langfuse session ID (to be entered in the field of the upload modal), an output file (a .txt file with the list of fraudulent transactions, as specified in the problem statement), and the source code (only for evaluation datasets, as a .zip file containing your complete agentic system).
Langfuse automatically tracks your agent's token usage, costs, and latency. You just need to set up the environment variables (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST), use the @observe() decorator and CallbackHandler() in your code, and generate a unique session ID for each run. Langfuse handles the rest. For the full setup and code examples, refer to "Resource Management & Toolkit for the Challenge" in the Learn & Train section.
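Before wiring in the Langfuse decorator and callback handler, it can help to confirm that the three environment variables are set and to generate a fresh session ID for the run. The sketch below uses only the Python standard library and does not call Langfuse itself. The helper names and the session ID format (team name plus a random suffix) are assumptions, so follow the format given in the Learn & Train material if it differs.

```python
import os
import uuid

REQUIRED_ENV_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST")


def missing_langfuse_env(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def new_session_id(team_name: str) -> str:
    """Build a unique session ID for one run (hypothetical format)."""
    return f"{team_name}-{uuid.uuid4().hex}"


if __name__ == "__main__":
    missing = missing_langfuse_env()
    if missing:
        print("Set these variables first:", ", ".join(missing))
    else:
        print("Session ID for this run:", new_session_id("my-team"))
```

Keep the generated ID with the output file of the same run, since it has to be entered when you upload that output.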
Find the list here
At the end of the challenge, the AIvengers team will review and validate the best-scoring submissions of the top-ranked teams on the leaderboard by analyzing their source code files. The Reply team's decisions regarding the rules of this AI Agent competition are final.
We’ll publish a full list of results and notify all finalists no later than 10 working days after the day of the challenge.
At the end of the challenge, the Reply Team will review and validate the top-scoring submissions from the leaderboard. The first-place team will receive €2,500 per member, the second-place team €1,500 per member, and the third-place team €1,000 per member. The Reply Team’s decision is final regarding competition rules and prize assignment. We’ll send the winners details of how to claim their prize.
If you are a Replyer, the prizes will be different from those of the external competition. Each member of the first-place team will win a Ray-Ban Meta (Gen 2). Each member of the second-place and third-place teams will receive a Reply Shop voucher.
You’ll get some emails before and after the challenge, so check your mailbox regularly. You can always message the AIvengers during the challenge via chat (Discord) if you have questions.
All communications will be in English. Though you and your teammates can speak whatever language(s) you like! 😊
Reply AIvengers write the problems and are responsible for enforcing all challenge rules. They’ll review submissions from teams and award prizes. They may exclude any participants or teams at any time, for breaching competition rules.
We want to make training sessions and the challenge fair for everyone. So never stop others from taking part, for instance by overloading the challenge platform or by sending files containing malware, viruses or other code intended to interrupt, destroy or limit the operation of the platform, software, hardware or telecoms equipment. This will result in instant disqualification.
Additional actions to ensure a fair competition for all participants:
No solution sharing: Sharing solutions, code, or outputs between teams is strictly prohibited
Each team must develop their own independent solution
Violations will result in disqualification
If you’ve spotted any cheating or unfair behaviour, email challenges@reply.com
You can win a prize for your university or high school, too. The final score that your team achieves in the AI Agent Challenge will be added to one of two dedicated leaderboards: one for universities and one for high schools.
For the University League, the prize will be a Reply branded football table game for the communal area, or a financial donation for the Institute.
For the High School League, the prize will be a €2,000 donation, plus a coding course designed for your class by Reply experts.
You just need to tell us the name of your university or high school right after creating a team or joining an existing one.
The final score of your team will count for every person on the team: if your team includes students from different universities, the points will be added to each of those universities.
Yes, you can: the University Students League is open to alumni too. Right after creating your team, enter the name of the university where you studied and help it win.
No, you can't, but stay tuned for upcoming communications on TamTamy about how to get the official t-shirt and about the hubs where you can play in the offices.
Read the full Terms & Conditions of the competition.