Steven Gay PRO

StevenG640

AI & ML interests

None yet

Recent Activity

liked a Space about 6 hours ago

Qwen/Qwen-Image-Edit-2511

upvoted a paper about 11 hours ago

ViPE: Video Pose Engine for 3D Geometric Perception

upvoted a paper about 12 hours ago

Plan-X: Instruct Video Generation via Semantic Planning

liked a Space about 6 hours ago

Qwen Image Edit 2511

🏆

191

Generate edited images based on a prompt and input image

upvoted a paper about 11 hours ago

ViPE: Video Pose Engine for 3D Geometric Perception

Paper • 2508.10934 • Published Aug 12, 2025 • 3

upvoted 4 papers about 12 hours ago

upvoted 4 papers about 15 hours ago

A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

Paper • 2511.10555 • Published Nov 13, 2025 • 61

SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

Paper • 2511.15605 • Published Nov 19, 2025 • 23

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

Paper • 2510.02283 • Published Oct 2, 2025 • 96

Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout

Paper • 2511.20649 • Published Nov 25, 2025 • 47

upvoted a paper about 16 hours ago

In-Video Instructions: Visual Signals as Generative Control

Paper • 2511.19401 • Published Nov 24, 2025 • 31

upvoted 9 papers about 17 hours ago

Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

Paper • 2511.20714 • Published Nov 25, 2025 • 48

DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

Paper • 2511.19365 • Published Nov 24, 2025 • 64

Kling-Omni Technical Report

Paper • 2512.16776 • Published 19 days ago • 164

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Paper • 2512.16969 • Published 19 days ago • 111

MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments

Paper • 2512.19432 • Published 15 days ago • 12

Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Paper • 2512.17351 • Published 18 days ago • 25

CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion

Paper • 2512.19535 • Published 15 days ago • 11

Abstract 3D Perception for Spatial Intelligence in Vision-Language Models

Paper • 2511.10946 • Published Nov 14, 2025 • 1

Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

Paper • 2511.14210 • Published Nov 18, 2025 • 20
