

How we built AI agents in 16 hours: behind the scenes of our 2025 Hackathon

Carl Lapierre
10 min read

A banana duct-taped to a wall, Skynet memes, OsedeAgents, Cyrano de Bergerac and Judge Judy. None of this was on our bingo card. Had you asked us what our team at Osedea would create if given free rein and just 16 hours, we wouldn’t have imagined this wild ride.

2025 has often been dubbed “the year of agents”, so it was only fitting that we dove headfirst into a hands-on AI hackathon, because we believe that hands-on experience is the best experience. On Saturday, May 31st, five development teams at Osedea gathered for our annual energy-drink-fuelled hackathon, our third AI edition. Each team was challenged to build a working demo of an agentic system from scratch. At midnight, they would pitch to a panel of judges, competing for prizes in three categories: Most Innovative, Biggest Laugh, and Best Technical Implementation. Needless to say, our teams rose to the occasion with boundless enthusiasm.

The demos they delivered in under 16 hours were nothing short of spectacular: inventive, hilarious, and brimming with contagious laughter. Here’s a glimpse of what they built during the event:

Team 1

Alexis, Carl, Emilie

Agents excel at decision-making, so it makes sense to leverage them as judges. While “LLM-as-a-Judge” is already a familiar concept in RAG pipelines, Team 1 decided to take things a step further by building Judy, aptly named after Judge Judy, as the first Judge-as-a-Service (JaaS) application.

The inspiration for Judy emerged during last year’s AI Hackathon, when we tasked SPOT with drawing arbitrary scenes using diffusion models and G-Code. In practice, however, only about one out of every four generated images was good enough to trace as an outline; most attempts yielded poor results. Rather than fine-tuning our generative model, we implemented a lightweight “judge” that would evaluate all four rendered outputs and select the single best outline. To our delight, this simple judging step dramatically improved the final drawings.

Building on that insight, Team 1 set out to create a composable, multi-agent pipeline builder that allows judgment scenarios to be configured dynamically. On the frontend, they used React Flow to let users visually connect “worker” modules, “judge” modules, criteria definitions, and a debate component. Under the hood, the UI is transpiled into a directed graph using LangGraph constructs, with a Python backend orchestrating the agents at runtime over WebSockets.
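
To give a rough idea of what that transpilation can look like, here is a minimal sketch (not the team’s actual code) that maps a React Flow export, which ships as plain nodes/edges JSON, onto a LangGraph StateGraph; the state fields and the node factory are assumptions made for the example.

```python
from typing import TypedDict
from langgraph.graph import StateGraph

class PipelineState(TypedDict, total=False):
    task: str
    candidates: list
    scores: dict

def make_node(node_cfg: dict):
    """Wrap a React Flow node config (worker, judge, debate...) as a graph node."""
    def run(state: PipelineState) -> PipelineState:
        # in the real system this is where the configured LLM gets called via OpenRouter
        return state
    return run

def transpile(flow: dict):
    graph = StateGraph(PipelineState)
    for node in flow["nodes"]:   # React Flow nodes: {"id": ..., "data": {...}}
        graph.add_node(node["id"], make_node(node))
    for edge in flow["edges"]:   # React Flow edges: {"source": ..., "target": ...}
        graph.add_edge(edge["source"], edge["target"])
    graph.set_entry_point(flow["nodes"][0]["id"])
    return graph.compile()
```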

When the pipeline runs, each worker agent generates candidate outputs (for example, business pitches), which are then passed, along with the judge agents and a set of evaluation criteria, to the next stage. The judges, criteria and worker outputs feed into a debate module, which orchestrates a turn-based conversation: each judge elaborates on the criteria with respect to each candidate output. Once the debate finishes, each judge assigns a score to each criterion per output, and those scores are averaged to produce a final ranking of structured outputs. Additionally, each worker and judge has its own configurable persona and LLM model via OpenRouter, which provides a unified API over many model providers, allowing multiple different “brains” to produce artifacts. Providers include Gemini, DeepSeek, OpenAI, Anthropic, Mistral and more.
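
To make the scoring step concrete, here is a simplified sketch of how worker outputs might be scored per criterion by judge LLMs through OpenRouter’s OpenAI-compatible API and averaged into a ranking. The debate turn-taking and structured-output parsing are omitted, and the prompts, field names and parsing logic are illustrative rather than the team’s implementation.

```python
from statistics import mean
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_OPENROUTER_KEY")

def ask(model: str, persona: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": persona},
                  {"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def rank(task: str, workers: list, judges: list, criteria: list) -> list:
    candidates = [ask(w["model"], w["persona"], task) for w in workers]
    ranked = []
    for cand in candidates:
        judge_scores = []
        for j in judges:
            raw = ask(j["model"], j["persona"],
                      f"Score this output from 1 to 10 on each criterion {criteria}.\n"
                      f"Output:\n{cand}\nReply with comma-separated numbers only.")
            judge_scores.append(mean(float(x) for x in raw.split(",")))
        ranked.append((mean(judge_scores), cand))  # average across judges and criteria
    return sorted(ranked, reverse=True)
```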

At the hackathon demo, they showcased this setup in a “Dragon’s Den” style pipeline. Each competing team pitched its agentic product to five worker agents, which transformed the pitch into a polished business proposal. Two judge agents then evaluated these proposals within the debate module, using predefined criteria to guide their scoring. In real time, judges exchanged arguments about the merits of each proposal, ultimately converging on a ranked list of the teams’ ideas.

This project highlights the power of multi-agent systems and the LLM-as-a-judge pattern. By making the judgment process itself modular and dynamic, Judy’s architecture can be extended to any workflow that requires automated evaluation, whether that’s picking the best image outline, vetting business ideas, or something entirely new.

Team 2

Christophe, Lilia, Maxime, Phillippe

Have you ever wondered if you could have your own Cyrano de Bergerac whispering in your ear at all times, telling you what to say at the right moment? It all started with a pair of AR glasses, the Vuzix Blade 2. Equipped with a camera, speakers and a microphone, the glasses would let the wearer analyze their interlocutor visually and contextually to know what to say next.

The pipeline would start with video and audio capture from the glasses. The video would go through multiple computer vision models to be processed, while the audio would go through a voice-to-text model. Both paths would then merge into an agentic system of LLMs that outputs the ideal sentence to send back to the wearer. All the computing was done on an AWS EC2 instance, as the Blade would not have enough hardware capacity to run the models locally.

The first step involved detecting the face using MediaPipe's Face Detection module, which provided a cropped image of the person’s face during interaction. If no face was detected, the pipeline would halt to avoid unnecessary computation.
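
A minimal sketch of that gating step, using MediaPipe’s Face Detection solution (the confidence threshold is illustrative):

```python
import cv2
import mediapipe as mp

mp_face = mp.solutions.face_detection

def crop_face(frame_bgr):
    """Return a cropped face image, or None if no face is found (pipeline halts)."""
    with mp_face.FaceDetection(model_selection=0, min_detection_confidence=0.5) as detector:
        results = detector.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.detections:
        return None  # no face: skip the rest of the pipeline
    box = results.detections[0].location_data.relative_bounding_box
    h, w, _ = frame_bgr.shape
    x, y = int(box.xmin * w), int(box.ymin * h)
    return frame_bgr[y:y + int(box.height * h), x:x + int(box.width * w)]
```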

To extract relevant information from the image, we attached semantic tags. For appearance description, we used two models: Bootstrapped Language-Image Pretraining (BLIP) and Contrastive Language–Image Pretraining (CLIP). BLIP generated natural language captions that described the person in a fairly plain, matter-of-fact way. To enrich this output, we combined it with CLIP, which helped recover fine-grained visual attributes often missed by BLIP, resulting in more detailed and playful tags.
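
Roughly, the two models can be combined like this with Hugging Face Transformers; the candidate tag list and the probability threshold are illustrative, not the team’s actual values:

```python
from transformers import (BlipProcessor, BlipForConditionalGeneration,
                          CLIPProcessor, CLIPModel)

blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

CANDIDATE_TAGS = ["wearing glasses", "smiling", "has a beard", "colorful outfit"]  # illustrative

def describe(face_image):
    # BLIP: plain natural-language caption of the person
    inputs = blip_proc(face_image, return_tensors="pt")
    caption = blip_proc.decode(blip.generate(**inputs)[0], skip_special_tokens=True)
    # CLIP: zero-shot scoring of fine-grained attribute tags the caption tends to miss
    clip_inputs = clip_proc(text=CANDIDATE_TAGS, images=face_image,
                            return_tensors="pt", padding=True)
    probs = clip(**clip_inputs).logits_per_image.softmax(dim=1)[0]
    tags = [t for t, p in zip(CANDIDATE_TAGS, probs) if p.item() > 0.3]  # illustrative threshold
    return caption, tags
```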

Another important component was emotion recognition. We used DeepFace to analyze the cropped face and determine the dominant emotion, along with a confidence score. Lastly, we integrated face recognition using the InceptionResNetV1 model to identify whether the person had already been seen. If so, we reused the existing appearance description to avoid redundant processing.
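
In rough form, assuming `face_tensor` is a preprocessed 3×160×160 crop and with an illustrative similarity threshold and caching scheme:

```python
import torch
from deepface import DeepFace
from facenet_pytorch import InceptionResnetV1

embedder = InceptionResnetV1(pretrained="vggface2").eval()
known_faces = {}  # person_id -> (embedding, cached appearance description)

def analyze_face(face_bgr, face_tensor):
    # Dominant emotion (recent DeepFace versions return a list of result dicts)
    analysis = DeepFace.analyze(face_bgr, actions=["emotion"], enforce_detection=False)[0]
    emotion = analysis["dominant_emotion"]

    # 512-d embedding to answer "have we seen this person before?"
    with torch.no_grad():
        emb = embedder(face_tensor.unsqueeze(0))[0]
    for person_id, (known_emb, description) in known_faces.items():
        if torch.cosine_similarity(emb, known_emb, dim=0) > 0.7:  # illustrative threshold
            return emotion, description  # reuse the cached appearance description
    return emotion, None  # new face: run BLIP/CLIP, then cache the result
```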

The audio capture used the glasses’ integrated microphone, listening for a trigger word that starts recording so the system can parse the wearer’s intent within the conversation. The model used was OpenAI’s Whisper Turbo.
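
A minimal sketch of that step with the open-source openai-whisper package; the wake word shown here is made up for the example:

```python
import whisper

model = whisper.load_model("turbo")  # alias for Whisper large-v3-turbo in recent releases
TRIGGER = "cyrano"  # made-up wake word for the example

def transcribe_if_triggered(wav_path):
    text = model.transcribe(wav_path)["text"].lower()
    return text if TRIGGER in text else None  # pass the wearer's intent downstream, or ignore
```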

Finally, both outputs from the video and audio processing would enter the agentic pipeline of LLM agents, each one running the Llama3.1:8b model. The entry point was the orchestrator: by parsing the transcribed speech for intent, it would determine which type of sentence the wearer wanted to say next to their interlocutor (make them laugh, insult them or even romance them), and then call that specific agent. Each agent had its own set of rules on how to craft the sentence and with what attitude, but all agents had to base their sentence on the interlocutor’s current emotion and appearance, to add relevance and reduce the generic feel of the generated text. A verifier agent then validated the output, confirming that the attitude agent had delivered the desired emotion. The last step of the system was to translate that sentence into French, in the style of Cyrano de Bergerac from Edmond Rostand’s famous play.
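
Here is a condensed sketch of that orchestrator/agent/verifier chain. Since the model tag matches Ollama’s naming, the sketch assumes the model is served through Ollama’s Python client; the prompts and the agent catalogue are simplified stand-ins for the team’s actual rules.

```python
import ollama

MODEL = "llama3.1:8b"
AGENTS = {  # attitude-specific system prompts (simplified wording)
    "humor": "Craft one witty line that will make the interlocutor laugh.",
    "insult": "Craft one cutting but clever jab.",
    "romance": "Craft one charming, flattering line.",
}

def chat(system, user):
    resp = ollama.chat(model=MODEL, messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ])
    return resp["message"]["content"]

def next_line(intent_text, emotion, appearance):
    # 1. Orchestrator routes the wearer's intent to one attitude agent
    choice = chat("Answer with exactly one word: humor, insult or romance.", intent_text).strip().lower()
    # 2. The chosen agent grounds its line in the interlocutor's emotion and appearance
    line = chat(AGENTS.get(choice, AGENTS["humor"]),
                f"The person looks {appearance} and seems {emotion}.")
    # 3. Verifier checks the attitude, then the line is restyled in French as Cyrano would say it
    chat("Reply OK if the line matches the requested attitude.", f"{choice}: {line}")
    return chat("Translate into French in the style of Cyrano de Bergerac.", line)
```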

This project was not without challenges. The integration with the Vuzix glasses ended up being more complex than planned: the glasses only supported Android API 30, while Expo, the development tool used for the frontend, required at least API 34, which limited the use of React Native during development. Since the project relied on multiple models and was meant to be a conversation helper, latency also needed to be minimal. While testing the models running locally on a laptop, inference could take up to 30 seconds per model. We therefore moved to an AWS g5.16xlarge EC2 instance, which brought the latency of the full flow down to around 1 second.

This multi-layered project showcased the capacity of a network of agents, backed by different models, to accomplish the everyday task of analyzing the environment, declaring an intent, and determining the next action to take or the next sentence to say.

Team 3

Antoine, Daehli, Jean-Christophe

Have you ever thought to yourself: ChatGPT is so intelligent, but its black-and-white, text-only answers look so boring? Or do you use ChatGPT for so many things that you feel like you are getting “chat fatigue” from only interacting with endless monochrome text threads?

Well, you are not alone! Sometimes it feels like LLMs are stuck in the DOS era of personal computing. Or that they exchange over internet connections only in the form of email, not having yet discovered the possibilities of serving rich user experiences with HTML/CSS pages.

Thinking that LLMs were both very good at generating content and at generating code, we gave ourselves a challenge: provide them with an agentic GUI that allows users to browse LLMs in a more interactive, fun & colourful experience.

Our core idea was to use LangGraph as our main tool to create a multi-agent system focused on three aspects: create content that suits the user's requests, define design guidelines that fit the context and needs of the user and write responses in HTML/CSS code that can be rendered as web pages in a browser. With that plan in mind, we hoped to create a looped system that would allow users to visualize the LLM's answers in a browser-rendered page and follow-up simply by clicking embedded links that would trigger the right subsequent prompts.

While we could not implement our full vision on LangGraph with the limited time frame we had, we did successfully create an LLM pipeline to render answers as HTML/CSS and prompt the LLM only by interacting with this generated front-end. While reduced in scope, this small proof of concept did satisfy our curiosity by showing in action what we had originally envisioned: browsing LLMs not through question/answer but through infinitely-clickable, model-generated content and GUI!
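
As a toy illustration of that loop, the proof of concept boils down to something like the following FastAPI sketch: the model answers with an HTML page whose links point back at the same endpoint, so clicking a link becomes the next prompt. The team’s version used LangGraph with several agents; this collapses everything into a single call, and the model name is illustrative.

```python
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from openai import OpenAI

app = FastAPI()
llm = OpenAI()  # assumes OPENAI_API_KEY is set; any chat-completion endpoint would do

SYSTEM = (
    "Answer as a complete, styled HTML page. Every follow-up the user might want "
    "to explore next must be rendered as an <a href='/ask?q=...'> link."
)

@app.get("/ask", response_class=HTMLResponse)
def ask(q: str = "What would you like to explore?"):
    resp = llm.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": q}],
    )
    return resp.choices[0].message.content  # the browser renders the generated page directly
```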

Team 4

Armand, Hugo, Robin

Skynet Memes is your all-in-one, one-stop meme shop where you can create memes with agentic technologies. First, take or upload a photo of your best self, select a meme theme, and if you’re feeling rich, provide a style. Our platform will swiftly begin to roast you while you wait for your very own meme to be generated. Skynet Memes also allows you to view previously generated memes. Don’t worry, we’ll roast those outputs as well.

The Skynet Memes platform was built with a simple React frontend (sorry, Armand) that leveraged Tailwind for fast prototyping. The backend, developed with FastAPI, exposed simple API calls that triggered our LangGraph-orchestrated AI agents. These agents used various models via OpenRouter to get the job done.

First, a Describer agent would kick off and give us a detailed description of the uploaded image. Then, a Thinker agent would use the image’s description and the user’s meme theme to come up with a concept. A Writer agent would work its magic to provide the copy. And finally, a Generator agent took the copy, image, and meme style and generated each user’s own personalized meme.
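
A condensed sketch of that Describer → Thinker → Writer → Generator chain over OpenRouter; the model slugs and prompts are illustrative, and the image upload and actual image generation are left out of the sketch.

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_OPENROUTER_KEY")

def agent(model: str, system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def make_meme(photo_description: str, theme: str, style: str) -> dict:
    # Describer: in the real pipeline this reads the uploaded image with a vision model;
    # a pre-extracted description stands in for it here.
    concept = agent("google/gemini-2.0-flash-001",   # illustrative OpenRouter model slugs
                    f"Invent a meme concept for the theme '{theme}'.", photo_description)
    caption = agent("openai/gpt-4o", "Write a short, punchy meme caption.", concept)
    # Generator: the real system renders the caption onto the image in the chosen style;
    # image generation is omitted from this sketch.
    return {"concept": concept, "caption": caption, "style": style}
```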

Additionally, we leveraged another agent, called the Roaster, to provide its insights on the user’s image, as well as on the generated memes themselves. This output was sent to our frontend, where we used text-to-speech to serenade our users with commentary… Who doesn’t love some self-deprecating humour?

The agentic meme generator

The aim of the project was to create a meme generator using a set of agents called as tools by an orchestrator (see diagram below).

Each node in this diagram represents an agent with its own model: all use Gemini 2.0 (chosen for its speed of execution), apart from the Writer and the Better Meme nodes, which use OpenAI’s GPT-4o. To build that orchestrator architecture, we first used the OpenAI Agents SDK, then switched to LangGraph, and tracked the input/output of each node with LangSmith using @traceable decorators.
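
Tracing a node’s input and output is as simple as decorating it. A sketch, assuming LangSmith credentials and tracing are configured in the environment, with a hypothetical `call_llm` helper standing in for the real model call:

```python
from langsmith import traceable

# assumes LANGSMITH_API_KEY is set and tracing is enabled in the environment

def call_llm(model: str, prompt: str) -> str:
    # hypothetical stand-in for the real OpenRouter call
    return f"[{model}] {prompt}"

@traceable(name="writer_node")
def writer_node(state: dict) -> dict:
    # each invocation shows up as a run in LangSmith, with the node's inputs and outputs
    caption = call_llm("openai/gpt-4o", state["concept"])
    return {**state, "caption": caption}
```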

When creating the meme, several style or topic categories were offered. For styles, we could choose Simpsons, Black and White, Minecraft and others; for topics, we could orient the meme towards “Relatable”, “Facepalm”, “Mood” and many others. Each selection steered our agents’ system prompts toward our choices.

The result could be seen in a gallery alongside our other generated memes: the starting image with the punchline chosen by our generator.

The roast

Meme generation times ranged from around 30 seconds (for the basic version, with the “Generator” node using Gemini) to between 1 and 2 minutes (for the “Better Meme” version using GPT-4o). During this time, another agent generated a roast of the supplied image.

Two versions of the roast were possible (defined in the code): either the honey version, with a rather calm, even pleasant roast, or a standard version with a rather... spicy roast!
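
Conceptually, the roast simply runs concurrently with the much slower meme generation, along these lines; the async wrappers are hypothetical stand-ins for the real pipeline and Roaster agent:

```python
import asyncio

async def make_meme_async(description, theme, style):
    await asyncio.sleep(0)  # stand-in for the 30 s to 2 min meme pipeline
    return f"meme({theme}, {style})"

async def roast_async(prompt, description):
    await asyncio.sleep(0)  # stand-in for the Roaster agent call
    return f"roast of: {description}"

async def generate_with_roast(description, theme, style, spicy=False):
    roast_prompt = ("Roast this photo mercilessly." if spicy   # "standard" mode
                    else "Tease this photo gently.")           # "honey" mode
    # run the roast concurrently so it fills the wait while the meme is generated
    return await asyncio.gather(
        make_meme_async(description, theme, style),
        roast_async(roast_prompt, description),
    )
```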

These roasts were also triggered in the gallery when one of the generated memes was selected for large display. This allowed people to enjoy the entire roast, download the image and/or the associated audio. 

This roast/meme generator won us the prize for funniest project at the Osedea Hackathon 2025! 

Team 5

Cedric, Thomas, Zack

Their project introduces a suite of AI-powered "personas" designed to streamline daily tasks and boost productivity, all managed through the familiar interface of Gmail and backed by a workflow automation tool called n8n.

The core concept of Osedeagents is to provide targeted assistance through specialized AI agents, making complex processes feel effortless. By utilizing Gmail as the primary graphical user interface, the team ensured that interacting with these powerful assistants is intuitive, offering easy history tracking and a well-known platform for all users.

Imagine an AI that preps you for your day. The "Thierry / Charles" persona, for instance, excels at gathering vital information on meeting attendees and their respective companies, ensuring our team is always well-prepared. Meanwhile, "Marie-Pier" acts as a savvy communications aide, helping to draft and refine LinkedIn posts that capture the authentic Osedea voice, awaiting approval before going live. Need quick information about company benefits or internal policies? The "Karine / Ivana" persona is on hand to provide concise summaries and details.

Beyond individual tasks, Osedeagents also streamline collaborative efforts. Organizing our popular Friday Lunch & Learns becomes a breeze, with the system capable of scheduling presentations requested by any Osedea employee, preparing agendas, and sending out timely reminders. The system is even designed to inject a bit of context-aware humor, aligning with Osedea's unique culture. From managing daily briefings with event details and conflict resolution to simplifying complex information requests, Osedeagents are designed to be versatile and incredibly helpful.

Conclusion

The 2025 AI Agent Hackathon was a tremendous journey, fuelled by creativity, collaboration, and just the right amount of caffeine. Seeing teams build everything from a Meme Generator to an AR-powered Cyrano assistant, and from interactive LLM-driven GUIs to Gmail-based AI personas, reminded us how quickly ideas can come to life when curiosity meets determination.

We had a lot of fun pushing boundaries and discovering new possibilities in just 16 hours. If any of these projects stirred up some interesting ideas, don’t hesitate to contact us! Let’s grab a coffee, or 10.

Did this article start to give you some ideas? We’d love to work with you! Get in touch and let’s discover what we can do together.
