Jingye Chen is a third-year Ph.D. student at HKUST, supervised by Prof. Qifeng Chen. Previously, he obtained his BSc and MSc degrees from the School of Computer Science at Fudan University, supervised by Prof. Bin Li and Prof. Xiangyang Xue. He enjoys doing interesting research and thinking outside the box. He also spent a wonderful time as an intern in the General AI Group at Microsoft Research Asia, advised by Dr. Lei Cui and Dr. Furu Wei. He was fortunate to be mentored by Dr. Zhaowen Wang during his internship at Adobe Research.
An awesome repo about generative games is maintained at link. Contributions are welcome!
A paper on the numerical and spatial consistency of generative games is released.
One paper accepted to CVPR2025.
We release VideoTuna, an all-in-one video fine-tuning framework.
I passed the qualifying exam and became a Ph.D. candidate.
One paper accepted to ECCV2024 Oral.
One paper accepted to ACMMM2024.
We release a survey about LLMs for multimodal generation and editing.
TextDiffuser-2 is released. It is more flexible than TextDiffuser.
We published Kosmos-2.5, a multimodal literate model.
One paper accepted to NeurIPS2023.
One paper accepted to AAAI2023.
One paper accepted to EMNLP2022-Findings.
We construct a benchmark for Chinese text recognition.
One paper accepted to AAAI2022.
One paper accepted to IJCAI2021.
One paper accepted to CVPR2021.
Model as a Game: On Numerical and Spatial Consistency for Generative Games
Technical Report, 2025
VideoTuna: A Powerful Toolkit for Video Generation with Model Fine-Tuning and Post-Training
Open-source Project, 2025
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
European Conference on Computer Vision (ECCV), 2024, Oral Presentation
✨ Top 10 in the Hugging Face Space Trending List on Dec. 31st, 2023; featured as Space of the Week.
✨ Used by Recraft V3, the top-ranked image generation model on the global leaderboard.
TextDiffuser: Diffusion Models as Text Painters
Neural Information Processing Systems (NeurIPS), 2023
✨ Top 10 in the Hugging Face Space Trending List on Jun. 29th, 2023; featured as Space of the Week.
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
AAAI Conference on Artificial Intelligence (AAAI), 2023
✨ Ranked 4th among the Most Influential AAAI 2023 Papers
XDoc: Unified Pre-training for Cross-Format Document Understanding
Empirical Methods in Natural Language Processing (EMNLP-Findings), 2022
Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
Technical Report, 2022
Text Gestalt: Stroke-Aware Scene Text Image Super-Resolution
AAAI Conference on Artificial Intelligence (AAAI), 2022
Zero-Shot Chinese Character Recognition with Stroke-Level Decomposition
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Scene Text Telescope: Text-Focused Scene Image Super-Resolution
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Conference Reviewer: CVPR, ICCV, NeurIPS, ACL, EMNLP, AAAI, ACMMM
Journal Reviewer: TPAMI, TMM
2023 Spring: COMP 2011 Programming with C++
2023 Fall: COMP 2011 Programming with C++
Excellent Master Dissertation Award of Shanghai
RedBird PhD Scholarship at HKUST
Outstanding Graduate of Shanghai (top 5%)
Excellent Student Award
National Scholarship (top 1%)
Outstanding Undergraduate of Shanghai (top 5%)
Best Team Award at the University of Cambridge (as team leader)
Third Class Undergraduate Scholarship