Winning project
Codex RayBan-Meta Tutor
Team Codex Tutor
Project description
Codex RayBan-Meta Tutor - An English tutor that lives in your kid's glasses.
IMPORTANT: when playing the video, please turn the volume up to MAX to understand the voice output from the Ray-Ban glasses.
The problem
Kids learning English as a second language at school get stuck on homework, waiting for a parent to read over their shoulder. Tutoring apps force typing and screen-switching: friction that breaks the flow.
The solution
Codex Tutor turns your Ray-Ban Meta glasses (with an iPhone fallback) into a hands-free English tutor. The kid keeps looking at the paper homework, points at things, and asks out loud. The glasses answer by voice; corrections pop up on the phone only when needed.
Three core interactions
- Point at a word → "How do you pronounce this?" → spoken answer.
- Point at a phrase → "Is this correct?" → yes/no by voice; if no, correction on phone.
- Look at the page → "Any mistakes?" → yes/no by voice; if yes, corrections on phone.
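A minimal Python sketch of how these three interactions could map to the voice-first, phone-fallback policy. `TutorReply`, `route_interaction`, the intent strings, and the `vlm_answer` shape are hypothetical names for illustration, not the team's actual code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TutorReply:
    spoken: str                        # short answer read out through the glasses
    phone_text: Optional[str] = None   # correction shown on the phone only when needed

def route_interaction(intent: str, vlm_answer: dict) -> TutorReply:
    """Map a detected intent to the voice-first, phone-fallback policy.

    `vlm_answer` is a hypothetical shape for illustration:
    {"correct": bool, "answer": str, "correction": str | None}
    """
    if intent == "pronounce":
        # Interaction 1: full answer by voice, nothing on the phone.
        return TutorReply(spoken=vlm_answer["answer"])
    if intent in ("check_phrase", "check_page"):
        # Interactions 2 and 3: yes/no by voice; details go to the phone.
        if vlm_answer["correct"]:
            return TutorReply(spoken="Yes, that's correct.")
        return TutorReply(
            spoken="Not quite. Check your phone for the correction.",
            phone_text=vlm_answer["correction"],
        )
    return TutorReply(spoken="Sorry, I didn't catch that. Can you ask again?")
```

For example, `route_interaction("check_phrase", {"correct": False, "answer": "", "correction": "He *goes* to school."})` would speak a short "no" through the glasses and push the correction to the phone view.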
How it works
Built on an existing dual-input app (phone camera + speakers, or glasses):
- The Ray-Ban glasses provide a low-res live stream of video and audio to the Google Gemini Live endpoint.
- Gemini detects when the student asks a question (optionally pointing at a sheet of paper) and triggers a function call.
- The function call takes a high-res photo with the Ray-Ban glasses and sends it, together with the question / input prompt, to a VLM for processing.
- The VLM answers the student's question from the image and prompt; Gemini Live converts the response to speech.
- The audio is played back through the Ray-Ban glasses.
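A rough sketch of this pipeline in Python, assuming the google-genai SDK's Live API (exact names and signatures vary across SDK versions, so treat this as an assumption-laden outline rather than the team's implementation). `capture_high_res_photo` and `ask_vlm` are hypothetical stand-ins for the glasses capture and the VLM call, neither of which is a public API:

```python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder

# Tool that Gemini Live can call when it detects a student question.
ANSWER_TOOL = {
    "function_declarations": [{
        "name": "answer_from_photo",
        "description": "Take a high-res photo of the homework and answer "
                       "the student's question about it.",
        "parameters": {
            "type": "OBJECT",
            "properties": {"question": {"type": "STRING"}},
            "required": ["question"],
        },
    }]
}

CONFIG = {"response_modalities": ["AUDIO"], "tools": [ANSWER_TOOL]}

def capture_high_res_photo() -> bytes:
    """Hypothetical stand-in for the glasses' high-res capture (no public SDK)."""
    raise NotImplementedError

def ask_vlm(photo: bytes, question: str) -> str:
    """Hypothetical stand-in for the VLM that reads the image and the prompt."""
    raise NotImplementedError

async def run_tutor() -> None:
    # The same session also receives the low-res video/audio stream from the
    # glasses (omitted here); this sketch only shows the tool-call handling.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001",  # assumed Live model name
        config=CONFIG,
    ) as session:
        async for message in session.receive():
            if message.tool_call:  # Gemini decided the student asked a question
                responses = []
                for call in message.tool_call.function_calls:
                    photo = capture_high_res_photo()
                    answer = ask_vlm(photo, call.args["question"])
                    responses.append(types.FunctionResponse(
                        name=call.name, id=call.id,
                        response={"answer": answer},
                    ))
                # Gemini Live reads the returned answer out loud on the glasses.
                await session.send_tool_response(function_responses=responses)
            elif message.data:
                pass  # audio chunks to play on the glasses' speakers

asyncio.run(run_tutor())
```

In the real app, the low-res stream would be fed into the same session via the SDK's realtime-input methods, and the returned audio chunks would be played on the glasses rather than dropped.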
Stack
iPhone · Ray-Ban Meta glasses · phone app (live + corrections view) · VLM for pointing and grammar · TTS for voice.