What it is for
This product helps visually impaired individuals obtain external information more easily, understand their surroundings, and improve their independence in travel, social interaction, and daily self-care.
Your inspiration
We found that people with acquired blindness retain their pre-blindness visual cognition, yet existing aids such as smart canes fail to take advantage of it. Despite technical upgrades, current devices remain limited to obstacle detection and mechanical alerts, leaving users' established visual frameworks unused. Our solution uses multimodal large models to mimic human visual understanding. Building on pre-blindness worldviews, we use cameras and visual models to interpret the surroundings, employ large models for integrated information processing and natural dialogue, and enable users to "see" through technology, restoring intuitive environmental perception.
How it works
Environmental Sensing: The smart glasses' HD camera continuously captures the surroundings, quickly identifying traffic signs, store names, menu text, and more.
Multimodal Processing: A multimodal large model analyzes visual elements (objects, text, colors) and integrates voice commands to determine the most suitable way to respond.
Voice Interaction: Processed results are converted to natural speech via TTS and delivered through in-ear headphones. Conversational queries such as "What's the recommended dish?" trigger real-time analysis that cross-references the captured data for contextual responses.
Tactile Feedback: In noisy or emergency situations, vibration modules in the temples provide haptic cues. High-frequency vibrations alert users to obstacles, enabling rapid reactions when audio feedback is impractical.
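To make the flow above concrete, here is a minimal Python sketch of the sensing-to-feedback loop, assuming placeholder functions (capture_frame, analyze, speak, vibrate) that stand in for the real camera driver, multimodal model, TTS engine, and vibration hardware; none of these names come from the actual product.

```python
# Illustrative sketch of the capture -> multimodal analysis -> feedback loop.
# All functions below are hypothetical stand-ins, not the product's real API.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Channel(Enum):
    VOICE = auto()   # TTS through the in-ear headphones
    HAPTIC = auto()  # vibration modules in the temples


@dataclass
class SceneResult:
    description: str  # natural-language summary of objects, text, colors
    is_hazard: bool   # e.g. an obstacle directly in the walking path


def capture_frame() -> bytes:
    """Placeholder for grabbing one frame from the HD camera."""
    return b""


def analyze(frame: bytes, voice_query: Optional[str]) -> SceneResult:
    """Placeholder for the multimodal large model: fuses the frame with an
    optional spoken query and returns a structured interpretation."""
    return SceneResult(description="clear sidewalk ahead", is_hazard=False)


def speak(text: str) -> None:
    """Placeholder for the TTS engine."""
    print(f"[TTS] {text}")


def vibrate(pattern: str) -> None:
    """Placeholder for the temple vibration modules."""
    print(f"[HAPTIC] {pattern}")


def deliver(result: SceneResult) -> Channel:
    """Route the result: hazards trigger haptics for speed, everything else
    is narrated via TTS."""
    if result.is_hazard:
        vibrate(pattern="high_frequency")
        return Channel.HAPTIC
    speak(result.description)
    return Channel.VOICE


if __name__ == "__main__":
    frame = capture_frame()
    result = analyze(frame, voice_query="What's the recommended dish?")
    deliver(result)
```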
Design process
Technical Considerations: Hardware components such as cameras, processors, and batteries are selected and integrated, while the supporting software (multimodal large model algorithms, voice interaction systems, etc.) is developed. Following laboratory testing, visually impaired users are invited to field trials, and the product is iteratively optimized based on real-world feedback.
Final Product Design: Balancing functional performance, wearing comfort, and aesthetic appeal, careful design decisions are made regarding materials, color schemes, and dimensions. The result is smart assistive eyewear that combines technological sophistication with stylish design and is suitable for prolonged daily use.
How it is different
Multimodal Simulated Vision: Multimodal large models emulate human visual understanding, integrating spatial context rather than relying on basic distance measurement alone. By building on users' pre-blindness visual frameworks, they deliver complex environmental information through simulated "vision" for intuitive perception.
Dual-Modality Interaction: Conversational voice replaces rigid prompts, letting users proactively query details and receive adaptive, context-based responses. Haptic vibration provides instant alerts in noisy settings, while voice offers nuanced explanations of complex scenes. This dual design supports both quick reactions and deeper environmental analysis, as sketched below.
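As a rough illustration of the dual-modality arbitration, the sketch below chooses between voice and haptic feedback based on an ambient noise estimate and an urgency flag; the threshold value and function name are illustrative assumptions, not product specifications.

```python
# Hedged sketch of choosing a feedback channel: haptics for fast alerts when
# audio is unreliable, voice for nuanced narration. Values are assumptions.

NOISE_THRESHOLD_DB = 75.0  # above this, spoken feedback is hard to hear


def choose_modality(ambient_noise_db: float, is_urgent: bool) -> str:
    """Pick the feedback channel for one event."""
    if is_urgent:
        return "haptic"                        # obstacle ahead: react fast
    if ambient_noise_db > NOISE_THRESHOLD_DB:
        return "haptic"                        # busy street: audio unreliable
    return "voice"                             # quiet scene: full narration


if __name__ == "__main__":
    print(choose_modality(ambient_noise_db=82.0, is_urgent=False))  # haptic
    print(choose_modality(ambient_noise_db=45.0, is_urgent=False))  # voice
```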
Future plans
The next steps focus on three fronts. Technically, we will optimize the multimodal model for faster edge computing, expand object recognition to niche scenarios such as braille, and refine the haptic and voice interactions through field trials. Commercially, we will partner with retailers and rehabilitation centers for pilot launches in Asia and Europe, offering tiered pricing and subscription-based model updates. Long-term, we aim to integrate the technology into AR wearables and collaborate with urban planners on AI-powered accessible environments. The goal is to redefine accessibility, making adaptive technology intuitive and empowering users to regain autonomy and confidence.