🔭 I’m currently working on video understanding and large multimodal models
📫 How to reach me: [email protected]
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.