Edward Hong Wang

Cambridge, Massachusetts, USA

Hi, I'm Edward Hong Wang, a Junior Researcher at Harvard University, working closely with Dr. Fawwaz Habbal. Together, we are co-teaching ES26/ES294 at Harvard SEAS in Spring 2025, a groundbreaking course exploring the intersection of Human Cognition and Artificial Intelligence.

My research focuses primarily on Multi-Modal Large Language Models (LLMs) and their applications across diverse fields such as Neuroscience, Robotics, Quantum Physics, and Collaborative Behavior. I'm passionate about leveraging these advanced technologies to facilitate interdisciplinary breakthroughs.

I'm currently seeking collaborators interested in pioneering research; my active projects are described under Publications and Projects below.

✨ Feel free to reach out—I'd love to connect, collaborate, or simply say hi! 🧠 🤖 🌟

Teaching

Publications

  • AI Language System

    Building A Unified AI-centric Language System

    ICLR 2025 Workshop on Deep Generative Model in Machine Learning

    This work introduces the first unified, AI-centric language system and tackles a major risk: under RL training, LLMs and AI agents can form private, human-inaccessible "sub-languages." Our approach addresses the biases, ambiguities, and inefficiencies of natural language by translating diverse inputs into a concise, unambiguous format (see the toy sketch below). This not only reduces computational overhead but also strengthens transparency and fairness, paving the way for more secure AI-to-AI and human-to-AI communication.
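As a rough, hypothetical illustration of the idea (the schema, field names, and example below are mine, not the paper's), the approach can be pictured as mapping free-form natural language into a small, explicit message structure before agents exchange it:

```python
from dataclasses import dataclass, field


@dataclass
class CanonicalMessage:
    """Hypothetical concise, unambiguous intermediate form for agent communication."""
    intent: str                                              # explicit speech act, e.g. "request" or "inform"
    entities: dict[str, str] = field(default_factory=dict)   # fully resolved references, no pronouns
    constraints: list[str] = field(default_factory=list)     # machine-checkable conditions


# Toy example: the hedged English "Could you maybe send me the report soon?"
# becomes an explicit, human-readable message, so agents cannot drift into a
# private sub-language without it being visible in the transcript.
msg = CanonicalMessage(
    intent="request",
    entities={"object": "report", "recipient": "speaker"},
    constraints=["deadline <= 24h"],
)
print(msg)
```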

  • AI-Powered Neural Implant for PTSD Monitoring

    I developed a cutting-edge dual-loop system integrating a responsive neural implant with an AI-powered wearable platform to continuously monitor and intervene in PTSD episodes. This system is the first automatic self-reporting design in its field, enabling continuous capture and analysis of both neural signals and environmental contexts without requiring manual input from patients.

    The implanted device detects pathological brain activity in real time and delivers targeted neurostimulation when needed. Simultaneously, smart wearables and Meta glasses automatically identify environmental triggers and record contextual data, allowing for near-instant feedback and personalized therapeutic interventions.
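As a minimal sketch of the dual-loop idea (the data fields, threshold, and function below are illustrative placeholders, not the deployed device logic), the inner loop decides on stimulation from neural data alone, while the outer loop attaches wearable-derived context to an automatic report:

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class NeuralSample:
    timestamp: float
    biomarker_score: float      # hypothetical aggregate of implant-recorded activity, 0..1


@dataclass
class ContextEvent:
    timestamp: float
    trigger_label: str          # e.g. "crowd noise", tagged by the wearable/glasses


def dual_loop_step(sample: NeuralSample,
                   context: Optional[ContextEvent],
                   threshold: float = 0.8) -> dict:
    """One iteration of the sketched dual loop: the inner (implant) loop decides
    whether to stimulate from neural data alone; the outer (wearable) loop attaches
    context so the episode is reported automatically, with no patient input."""
    stimulate = sample.biomarker_score >= threshold
    return {
        "time": sample.timestamp,
        "biomarker_score": sample.biomarker_score,
        "stimulated": stimulate,
        "context": context.trigger_label if context else None,
    }


if __name__ == "__main__":
    now = time.time()
    print(dual_loop_step(NeuralSample(now, 0.91), ContextEvent(now, "crowd noise")))
```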

Projects

  • 4D Vision Transformer for Video Understanding

    To empower LLMs with visual capabilities, I developed a 4D time-encoded Vision Transformer with temporal continuity and spatial intelligence, achieving near-superhuman comprehension of dynamic real-world events.

    While existing solutions like SLAM and NeRF focus on static scenes, our approach tracks which elements persist and which change over time, and interprets those changes according to physical laws, allowing it to predict outcomes and explain why they occur. For example, our model can infer that a ball may collide with a TV even when they never appear in the same frame.
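A minimal sketch of the time-encoding idea, assuming a standard ViT-style patch embedding in PyTorch; the module name, sizes, and learned temporal table below are illustrative, not the actual model:

```python
import torch
import torch.nn as nn


class TimeEncodedViTEmbedding(nn.Module):
    """Minimal sketch: spatial patch embedding plus a learned temporal encoding,
    so each token carries both where-in-frame and when-in-clip information."""

    def __init__(self, img_size=224, patch=16, dim=768, max_frames=64):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)     # spatial patchify
        n_patches = (img_size // patch) ** 2
        self.spatial_pos = nn.Parameter(torch.zeros(1, n_patches, dim))    # "where" encoding
        self.temporal_pos = nn.Parameter(torch.zeros(1, max_frames, dim))  # "when" encoding

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, frames, 3, H, W)
        b, t = video.shape[:2]
        x = self.proj(video.flatten(0, 1))             # (b*t, dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)               # (b*t, n_patches, dim)
        x = x + self.spatial_pos                       # position within each frame
        x = x.reshape(b, t, -1, x.size(-1))            # (b, t, n_patches, dim)
        x = x + self.temporal_pos[:, :t].unsqueeze(2)  # position of each frame in the clip
        return x.flatten(1, 2)                         # (b, t*n_patches, dim), ready for a ViT encoder


if __name__ == "__main__":
    clip = torch.randn(2, 8, 3, 224, 224)              # 2 clips of 8 frames each
    tokens = TimeEncodedViTEmbedding()(clip)
    print(tokens.shape)                                # torch.Size([2, 1568, 768])
```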

  • Contextualized Subtitle for Deaf and Hard of Hearing (SDH)

    Most speech-to-text systems focus exclusively on converting audio into raw text, overlooking context, non-speech cues, and emotional nuance. Our system genuinely "listens" to and interprets the soundtrack, adding clarity to ambiguous phrases (such as distinguishing whether "the fish is 1,000 pounds" refers to weight or cost).

    Our pipeline enhances transcripts with contextual coherence, segments content logically, incorporates ambient sounds, and analyzes tone patterns to describe emotional content, creating rich transcripts that capture the full communicative experience. This approach expands accessibility for the deaf and hard-of-hearing community at roughly one-twentieth the cost of human transcription ($0.10 vs. $2 per minute). With the film subtitling market valued at $8.51 billion in 2024, our model could save up to $6 billion. Despite its profit potential, we open-sourced the solution to promote inclusivity.
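A toy sketch of the final rendering step, assuming upstream models have already produced per-segment text, ambient-sound tags, and a tone label (the Segment fields and formatting below are hypothetical, not the actual pipeline):

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float          # seconds
    end: float            # seconds
    text: str             # transcript produced upstream
    ambient: list[str]    # ambient-sound tags, e.g. ["door slams", "rain"]
    tone: str             # emotional tone label, e.g. "anxious", "excited"


def to_sdh_line(seg: Segment) -> str:
    """Render one enriched segment as an SDH-style caption line:
    ambient cues in brackets, tone as a parenthetical, then the dialogue."""
    cues = " ".join(f"[{a}]" for a in seg.ambient)
    tone = f"({seg.tone}) " if seg.tone else ""
    return f"{seg.start:07.2f}-{seg.end:07.2f}  {cues} {tone}{seg.text}".strip()


if __name__ == "__main__":
    seg = Segment(start=12.0, end=15.5, text="The fish is 1,000 pounds.",
                  ambient=["auction chatter"], tone="excited")
    print(to_sdh_line(seg))
    # 0012.00-0015.50  [auction chatter] (excited) The fish is 1,000 pounds.
```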

Contact

Location

  • Cambridge, Massachusetts, USA