IA on Edge

Fun exploration of AI on NVIDIA edge board

February 4, 2026 by

WebScIT, Collonval Frédéric

In today's digital age, privacy and accessibility are paramount concerns, especially when it comes to educational tools for children. The Curious Frame project, an offline AI-based tutor, aims to address these concerns by leveraging edge AI technology.

Introduction

The Curious Frame project was born out of the need to create an educational tool that operates offline, ensuring privacy and accessibility. Traditional AI models often rely on cloud-based processing, which can raise privacy concerns and limit accessibility due to the need for a stable internet connection. By using edge AI, the project aims to provide a secure and efficient learning experience for children.

Project Overview

The Curious Frame is designed to be an AI tutor for children, explaining objects in a simple and understandable manner. The project is constrained by the need to operate without a screen, collect only visual information, and provide vocal feedback in the child's language.

Technical Implementation

The project utilizes several key components:

Image Capture: A Raspberry Pi camera is used to capture images of objects.
Edge Computing Platform: The NVIDIA Jetson Orin Nano, an ARM board with 8GB of shared VRAM, serves as the computing platform. An SSD is used for better performance, with the OS on a micro SD card to reduce default RAM usage.
Sound Output: A Jabra Speak2 55 speaker is connected via USB to provide vocal feedback.
Cardboard Frame: A simple cardboard frame is used to point at objects for description.

The workflow involves capturing an image, processing it using a Vision Language Model (VLM), generating a textual explanation, and converting that text to speech using a Text-To-Speech (TTS) system.

Sequence diagram

Image Capture and Processing Pipeline

The image capture is done using a Raspberry Pi camera, which is connected via USB. The captured image is then processed using a Vision Language Model to recognize objects. Initially, the Gemma3n model was used, but it did not support image input on the Ollama framework. Therefore, the moondream2 model was used for image analysis.

Integration of Vision Language Model for Object Recognition

The Gemma3n model is executed on Ollama, but since that integration does not support image input, the moondream2 model is used to analyze the images. Once objects are recognized, the Gemma3n model generates relevant textual explanations. If the language is French, the description is translated using the Gemma3n model again.

Text-To-Speech (TTS)

The text generated by the Gemma3n model is converted to speech using Piper, a Text-To-Speech system.

Challenges and Solutions

The project faced several challenges:

Latency: Minimizing latency is crucial for a smooth user experience on edge devices with limited processing capabilities.
Memory Limitations: Edge devices have limited memory capacity, requiring efficient data storage and processing approaches.
Tooling: There were challenges in getting the tools stack working on the NVIDIA Jetson Orin Nano.

To address these challenges, the project tested a Mistral model (Ministral 3 3B), which allows for better integration of vision and language models. This improvement has led to more efficient processing and reduced latency as the image can be processed directly by the LLM and it is set to think in the appropriate language to avoid translation round trip.

Future Plans

The future plans for the Curious Frame project include:

Integration with Reachy Mini: Plans are underway to integrate the system with Reachy mini, a robot that can provide a more interactive learning experience.
Improvements in Text-To-Speech Technology: The current TTS system, Piper, has a known issue with dropping the first phonemes. The team is looking into alternatives to improve this aspect.

Conclusion

The Curious Frame project successfully integrates edge AI to provide offline educational experiences for children. By addressing challenges such as latency, memory limitations, and tooling issues, the project paves the way for innovations in learning methods and AI deployment in education. The use of edge AI ensures privacy and accessibility, making it a valuable tool for educational purposes.

For more details, you can refer to the Gemma3n Kaggle hackathon article (here), the YouTube video - in French (here) - , and the code repository (here).

in News

# Meet-up Rennes Python

WebScIT, Collonval Frédéric February 4, 2026