My PhD Research Objectives for the Coming Years:
Objective 1 (RUN1): Develop a GUI-integrated Multimodal Large Language Model (MLLM) capable of autonomously completing complex tasks at the operating-system level, with the target of automating 90% of routine work through a single RL-optimized MLLM (a minimal agent loop is sketched after the objective list).
Objective 2 (RUN2): Enable the MLLM to achieve L4-level autonomous navigation for unmanned systems, demonstrating driving capability without human intervention across the system's operational design domain.
Objective 3 (RUN3): Generalize the MLLM to embodied intelligent coordination and control, creating a single unified model that integrates spatial perception, task planning, motion generation, feedback execution, and cross-modal understanding (see the embodied control sketch after this list).
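To make Objective 1 concrete, the following is a minimal sketch of the observe-act loop a GUI agent of this kind would run: the MLLM receives a screenshot and an instruction, emits an OS-level action, and repeats until the task is reported done. All names here (`MLLMPolicy`, `Action`, `capture_screen`, `execute`) are hypothetical placeholders for illustration, not an existing API.

```python
# Hypothetical sketch of a GUI-agent loop; not a real library or system.
from dataclasses import dataclass


@dataclass
class Action:
    kind: str           # e.g. "click", "type", "done"
    argument: str = ""  # coordinates, text to type, etc.


class MLLMPolicy:
    """Placeholder for an RL-optimized multimodal policy."""

    def act(self, screenshot: bytes, instruction: str) -> Action:
        # A real policy would run MLLM inference on the screenshot here.
        return Action(kind="done")


def capture_screen() -> bytes:
    return b""  # stub: a real agent would grab the current framebuffer


def execute(action: Action) -> None:
    pass  # stub: a real agent would dispatch to the OS input layer


def run_task(policy: MLLMPolicy, instruction: str, max_steps: int = 50) -> bool:
    """Observe-act loop: returns True if the task finishes within budget."""
    for _ in range(max_steps):
        action = policy.act(capture_screen(), instruction)
        if action.kind == "done":
            return True
        execute(action)
    return False


if __name__ == "__main__":
    print(run_task(MLLMPolicy(), "open the settings panel"))
```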
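Likewise, for Objective 3, the following is a minimal sketch of how one unified model might couple the five listed capabilities in a closed perception-planning-motion loop. Every class and method name (`UnifiedEmbodiedModel`, `perceive`, `plan`, `generate_motion`, `control_loop`) is an assumption made for illustration, not a reference to an existing system.

```python
# Hypothetical sketch of a unified embodied control loop; illustrative only.
from typing import List


class UnifiedEmbodiedModel:
    """Single model standing in for the five integrated capabilities."""

    def perceive(self, camera_frame: bytes) -> dict:
        # Spatial perception + cross-modal understanding: fuse vision
        # (and, in a full system, language) into a scene state.
        return {"objects": [], "pose": (0.0, 0.0, 0.0)}

    def plan(self, scene: dict, goal: str) -> List[str]:
        # Task planning: decompose the goal into subtasks.
        return ["reach_target"]

    def generate_motion(self, subtask: str, scene: dict) -> List[float]:
        # Motion generation: emit a joint/velocity command vector.
        return [0.0] * 6


def control_loop(model: UnifiedEmbodiedModel, goal: str, steps: int = 100) -> None:
    """Feedback execution: re-perceive the scene after every command."""
    for _ in range(steps):
        scene = model.perceive(camera_frame=b"")  # stub sensor read
        for subtask in model.plan(scene, goal):
            command = model.generate_motion(subtask, scene)
            # A real system would send `command` to the actuators and
            # close the loop on proprioceptive feedback here.
            _ = command


if __name__ == "__main__":
    control_loop(UnifiedEmbodiedModel(), goal="pick up the cup")
```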
This research roadmap aims to push the boundaries of MLLM capabilities across three critical dimensions of human-machine interaction and autonomous systems.