The AI Agent Challenge is a 6-hour online, team-based competition for Replyers passionate about LLMs and AI.
Collaborators can't participate in this edition, which is open to internals only, but they can join the external edition planned for the coming months; communications will follow.
No. To play on 12th March, you need to create your own team and invite a colleague (teams are made of 2 to 4 members).
You can register until 11th March, 23:59 CET.
Once you've registered, you can form a team and invite colleagues, or you can join our Discord Server to find participants to play with. If you still don't have a teammate once registration closes, we'll merge you with another team.
To update your details at any time, log in to your profile and click “Edit profile”, or follow this link.
You can access your Challenge Page and click on "Leave team" to cancel your registration for the challenge.
To form your team, log in to the Reply Challenges platform, click the “Join the challenge” button and select “Create new team”. Once you’ve formed a team, you’ll see it when you log in to the platform. You can also choose a team name and invite your friends. Just fill in their email addresses and send the invitation. Remember, this challenge is open only to Replyers.
Your team must have 2 to 4 members. If you're still registered alone at the end of the registration period, you'll be automatically paired with another participant to form a team of two. You'll receive an email notification with your team assignment.
You can connect to challenges.reply.com, select the AI Agent Challenge and click on Join the Challenge.
No, but you are free to leave your current team. They won’t receive any notification, so remember to tell them.
Where do I see my team?
You can see your team in your challenge page: connect to challenges.reply.com and in the homepage click on “Your Challenge Page”.
No, this edition is open to Replyers only.
We strongly recommend exploring all the materials in the Train & Learn section, including:
Learning modules (from basics to advanced topics about Agents creation and resources management)
Tutorials and instructions for the tools provided during the competition
Sandbox problems to practice and understand the challenge mechanics
You can use any programming language you're comfortable with to build your agentic system. The most commonly used languages for AI agent development are Python and JavaScript/TypeScript, but the choice is entirely up to you and your team. The only requirement is that your code must be functional and submitted as specified in the competition rules.
Langfuse is required for tracking and validation purposes only. Your score will be calculated exclusively based on the output files you submit, not on Langfuse tracking data. Make sure to include your Langfuse session ID in all submissions, as per the instructions in the Train & Learn section.
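For reference, here's a minimal sketch of how a session ID can be attached to traces with the Langfuse Python SDK (v2-style low-level API). It's only an illustration: the trace name is a placeholder, and the official instructions in the Train & Learn section define exactly where the session ID must appear in your submission.

```python
import uuid
from langfuse import Langfuse  # pip install langfuse

# Reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST from the environment.
langfuse = Langfuse()

# One session ID for the whole run, reused across all traces,
# so the run can be identified during validation.
session_id = str(uuid.uuid4())

trace = langfuse.trace(name="agent-run", session_id=session_id)  # name is a placeholder
# ... your agent logic here, logging spans/generations under this trace ...
langfuse.flush()  # ensure events are sent before the process exits

print(f"Langfuse session ID to include in the submission: {session_id}")
```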
The sandbox mode is a game simulation that helps you understand how to upload your solution.
Upload your files by dragging and dropping them or selecting them from your computer.
Training dataset: Submit an output file for each dataset.
Evaluation dataset: Submit both an output file and your source code. The source code must be provided as a zip file containing your complete agentic system with all necessary components to run it (code, dependencies list, configuration files, instructions, etc.); a packaging sketch follows below.
Important: To ensure proper tracking according to the competition rules, you must include the Langfuse session ID in your submission. Find detailed instructions in the Train & Learn section.
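As an illustration only, a submission archive can be produced and sanity-checked with Python's standard library. The folder layout and file names below are hypothetical; the competition rules define the exact required contents.

```python
import shutil

# Hypothetical project layout -- adapt to your actual structure:
# my_agent/
#   src/               agent code
#   requirements.txt   dependency list
#   config.yaml        configuration
#   README.md          instructions to run the system

# Create my_agent_submission.zip from the my_agent/ directory.
shutil.make_archive("my_agent_submission", "zip", root_dir="my_agent")

# Quick sanity check: verify the archive extracts cleanly before
# uploading (a corrupted zip is grounds for rejection).
shutil.unpack_archive("my_agent_submission.zip", "unzip_check")
```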
Yes. You’ll see a list of scores for all your submissions.
No, but you’ll see your submission scores.
You're free to choose the tools and frameworks that best suit your approach. However, for the competition you must use:
LLMs via API: We will provide you with an API key to access the LLMs available for the challenge
Langfuse: You must integrate Langfuse for tracking purposes following the instructions provided in the Train & Learn section (a minimal integration sketch follows after this list)
Beyond these requirements, you can use any additional libraries, frameworks, or tools to develop your agentic system.
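As a sketch of how these two requirements can fit together, assuming the provided key works against an OpenAI-compatible endpoint (an assumption; follow the actual instructions), Langfuse ships a drop-in wrapper for the openai client that adds tracing without changing your call sites. The base URL and model name below are placeholders.

```python
import os
from langfuse.openai import OpenAI  # Langfuse's tracing drop-in for the openai SDK

client = OpenAI(
    api_key=os.environ["CHALLENGE_API_KEY"],            # key provided by the organizers
    base_url="https://example-llm-gateway.invalid/v1",  # placeholder URL
)

response = client.chat.completions.create(
    model="provided-model-name",  # placeholder: use a model listed for the challenge
    messages=[{"role": "user", "content": "Classify this transaction..."}],
)
print(response.choices[0].message.content)
```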
Try reloading the page, then try clearing your cache and cookies. If you’re still having problems, message the Reply AIvengers on chat or email challenges@reply.com.
You’ll need your own computer with an internet connection.
No, the platform does not execute the code; the score is calculated using only the output files. Still, the AIvengers team must be able to execute the provided code to check that the submission is valid.
The AI Agent Challenge is a 6-hour competition where your team will build an agentic system to solve a specific problem statement.
Timeline on March 12th:
⏰ Challenge starts at 15:30 CET
💯 Leaderboard frozen from 21:00 to 21:30 CET
✋ Challenge ends at 21:30 CET
🏆 Within 10 working days: podium validation & final results
On your challenge page you'll have access to:
Training Dataset:
- Use these datasets to develop and refine your agentic system
- Submit outputs as many times as you want
- Check your score after each submission to track your progress
Evaluation Dataset:
- Submit your final solution only once
- Include both your output and source code (zip file with your agentic system)
- Your final score will be based solely on the evaluation dataset performance
At challenge start: Your team will have access to the first three datasets.
Once your team submits the final evaluation solution for the first three datasets, datasets 4 and 5 will be automatically unlocked.
Resources Provided:
Problem statement
API key for LLM access
Training and evaluation datasets
Token allocation: your team receives a budget in two stages:
Datasets 1-3: $40 in tokens
Datasets 4-5: $120 in tokens (unlocked only after submitting evaluation solutions for datasets 1-3)
Token monitoring dashboard
Find all materials, instructions, and tools in the competition page and Train & Learn section.
Token management is a strategic component of the challenge. Once your allocated budget is exhausted, it cannot be refilled. Use your tokens wisely and plan your approach carefully!
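Since the budget can't be refilled, it can help to track estimated spend as you go. Below is a purely illustrative tracker: the per-token prices are made-up placeholders, so substitute the real rates shown on the token monitoring dashboard.

```python
class TokenBudget:
    """Tracks estimated LLM spend against a fixed dollar budget."""

    def __init__(self, budget_usd: float, price_in: float, price_out: float):
        self.budget_usd = budget_usd
        self.price_in = price_in    # $ per input token (placeholder rate)
        self.price_out = price_out  # $ per output token (placeholder rate)
        self.spent = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.spent += input_tokens * self.price_in + output_tokens * self.price_out

    @property
    def remaining(self) -> float:
        return self.budget_usd - self.spent


# $40 stage-one budget; hypothetical prices of $2 / $8 per million tokens.
budget = TokenBudget(40.0, price_in=2e-6, price_out=8e-6)
budget.record(input_tokens=12_000, output_tokens=3_000)  # e.g. from response.usage
print(f"Spent ~${budget.spent:.4f}; ~${budget.remaining:.2f} left")
```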
You can message the AIvengers team via chat (Discord).
Yes! You're free to use any development and execution environment you prefer.
The only requirements are:
You must include the Langfuse session ID for tracking (see instructions in the Train & Learn section)
Your final submission for the evaluation dataset must include both output and source code (zip file with your agentic system)
Yes, it’s an online-only competition.
We'll update the leaderboard regularly to show how teams are performing. We'll also freeze it 30 minutes before the challenge deadline (scores will still be updated, just not displayed).
Upload your files by dragging and dropping them or selecting them from your computer.
Training dataset: Submit an output file for each dataset.
Evaluation dataset: Submit both an output file and your source code (as a zip file containing your agentic system).
Important: To ensure proper tracking according to the competition rules, you must include the Langfuse session ID in your submission. Find detailed instructions in the Train & Learn section.
It depends on the dataset:
Training dataset: Unlimited submissions. You can upload and test your solutions as many times as you want to refine your agentic system and track your progress.
Evaluation dataset: One submission only. You can upload your final solution only once, so make sure it's complete and tested before submitting.
Your final score will be based solely on your evaluation dataset submission.
Your score is based on three key criteria:
Detection Accuracy (30%): How well your system distinguishes between fraudulent and non-fraudulent transactions.
Economic Impact Assessment (30%): The financial consequences of errors, prioritizing the prevention of high-cost fraudulent activities and operational losses.
Agentic Efficiency and Optimization (40%): Your system's speed, cost-effectiveness, and architectural complexity. Both over-engineered and overly simplistic solutions will be penalized. A worked example of the weighting follows below.
Benchmark & Bonus:
All metrics are evaluated against an optimal benchmark solution. Solutions that outperform this benchmark receive additional credit.
Dataset Difficulty:
Each dataset has a weighted scoring system where more complex datasets offer higher maximum points.
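To make the weighting concrete: if each criterion produced a normalized score between 0 and 1, a weighted total of the kind described above would combine them as in the illustrative calculation below. The actual formula, benchmark normalization, bonus mechanics, and per-dataset weights are defined by the organizers and are not reproduced here.

```python
# Illustrative only -- the organizers' exact scoring formula is not published here.
weights = {"detection": 0.30, "economic": 0.30, "efficiency": 0.40}
scores = {"detection": 0.85, "economic": 0.70, "efficiency": 0.90}  # example values

total = sum(weights[k] * scores[k] for k in weights)
print(f"Weighted score: {total:.3f}")  # 0.3*0.85 + 0.3*0.70 + 0.4*0.90 = 0.825
```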
Your submission will be rejected if it contains any of the following errors:
Missing output files: All required output files must be included
Missing source code: Evaluation dataset submissions must include source code
Missing Langfuse session ID: Required for tracking and validation
Corrupted zip file: Ensure your zip file is properly compressed and can be extracted
Incomplete system: Missing dependencies, configuration files, or instructions
Double-check your submission before uploading, especially for the evaluation dataset (you only get one chance!).
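A small self-check before uploading can catch most of these rejection causes. Here's a sketch, assuming your output and session ID live in files with the hypothetical names shown (adapt the expectations to the real instructions):

```python
import zipfile
from pathlib import Path

def check_submission(zip_path: str, output_file: str, session_id_file: str) -> list[str]:
    """Returns a list of problems found; an empty list means the basics look OK."""
    problems = []
    if not Path(output_file).is_file():
        problems.append(f"missing output file: {output_file}")
    sid = Path(session_id_file)
    if not sid.is_file() or not sid.read_text().strip():
        problems.append("missing Langfuse session ID")
    try:
        with zipfile.ZipFile(zip_path) as zf:
            if zf.testzip() is not None:
                problems.append("corrupted entry inside zip")
            names = zf.namelist()
            # Hypothetical expectations: source code plus a dependency list.
            if not any(n.endswith(".py") for n in names):
                problems.append("no source code found in zip")
            if not any("requirements" in n.lower() for n in names):
                problems.append("no dependency list found in zip")
    except (FileNotFoundError, zipfile.BadZipFile):
        problems.append(f"zip file missing or unreadable: {zip_path}")
    return problems

print(check_submission("my_agent_submission.zip", "output.csv", "langfuse_session.txt"))
```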
Langfuse is required for tracking and validation purposes only. Your score will be calculated exclusively based on the output files you submit, not on Langfuse tracking data. Make sure to include your Langfuse session ID in all submissions, as per the instructions in the Train & Learn section.
At the end of the challenge, the AIvengers team will review and validate the best-scoring submissions from the top-ranked teams on the leaderboard by analyzing the source code files. The Reply teams' decisions regarding the rules of this AI Agent competition are final.
We’ll publish a full list of results and notify all finalists no later than 10 working days after the day of the challenge.
Each member of the first-place team will win an iPhone 17 Pro. Each member of the second-place team will win a PlayStation 5 Slim, while each member of the third-place team will receive a pair of Ray-Ban Meta glasses.
You’ll get some emails before and after the challenge, so check your mailbox regularly. You can always message the AIvengers during the challenge via chat (Discord) if you have questions.
All communications will be in English, though you and your teammates can speak whatever language(s) you like! 😊
Reply AIvengers write the problems and are responsible for enforcing all challenge rules. They'll review submissions from teams and award prizes. They may exclude any participants or teams at any time for breaching competition rules.
We want to make training sessions and the challenge fair for everyone. So never stop others from taking part – for instance, by overloading the challenge platform, or sending files containing malware, viruses or other code intended to interrupt, destroy or limit the operation of the platform, software, hardware or telecoms equipment. This will result in instant disqualification.
Additional actions to ensure a fair competition for all participants:
No solution sharing: Sharing solutions, code, or outputs between teams is strictly prohibited
Each team must develop their own independent solution
Violations will result in disqualification
If you’ve spotted any cheating or unfair behaviour, email challenges@reply.com