Pu Fanyi

pufanyi

https://pufanyi.github.io

AI & ML interests

CV

Recent Activity

updated a dataset about 11 hours ago

pufanyi/flowers102

updated a collection about 11 hours ago

updated a collection about 11 hours ago

upvoted a paper 5 days ago

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published 11 days ago • 148

upvoted a paper 9 days ago

CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

Paper • 2511.18659 • Published 13 days ago • 11

upvoted a collection 9 days ago

MDGA

Make Diffusion Great Again. The resource list for Super Data Learners, Quokka, and OpenMoE 2. • 16 items • Updated Nov 4 • 8

upvoted a paper 13 days ago

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published 16 days ago • 91

upvoted 2 papers 16 days ago

Scaling Spatial Intelligence with Multimodal Foundation Models

Paper • 2511.13719 • Published 19 days ago • 44

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 208

upvoted a collection 16 days ago

SenseNova-SI

Scaling Spatial Intelligence with Multimodal Foundation Models • 8 items • Updated about 11 hours ago • 10

upvoted a collection 18 days ago

Qwen3-VL

37 items • Updated Nov 1 • 488

upvoted a paper 19 days ago

PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

Paper • 2511.13648 • Published 19 days ago • 52

upvoted a paper 25 days ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 211

upvoted a collection 27 days ago

VST

A comprehensive framework designed to cultivate VLMs with human-like visuospatial abilities. • 5 items • Updated 24 days ago • 6

upvoted 3 papers about 1 month ago

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published Nov 4 • 57

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5 • 124

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

Paper • 2510.26794 • Published Oct 30 • 26

upvoted 3 papers 2 months ago

LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training

Paper • 2509.23661 • Published Sep 28 • 46

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24 • 98

Visual Jigsaw Post-Training Improves MLLMs

Paper • 2509.25190 • Published Sep 29 • 36

upvoted a paper 3 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 155

upvoted a paper 5 months ago

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 43

upvoted an article 6 months ago

OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve

Article • May 20 • 51