Companies
AIMageLab aimagelab/repositories
AixonLab AixonLab ,
HuggingFace - AixonLab ,
lumiere_alpha
Alibaba Alibaba ,
Alibaba PAI (Platform for AI) ,
Alibaba TongYi Vision Intelligence Lab ,
Qwen ,
Qwen2-VL ,
Qwen2-VL-7B-Instruct ,
ACE++ ,
In-Context-LoRA ,
Animate-X ,
ACE Plus ++ ,
Wan: Open Large-Scale Video Generative Models
Models - Kijai/WanVideo_comfy ,
UniAnimate-DiT ,
SwapAnyHead (no code) ,
GitHub - Qwen-Image (Chinese text) ,
Edit-R1: Reinforce Image Editing with Diffusion Negative-Aware Finetuning and MLLM Implicit Feedback ,
Qwen-TTS - Dialects
OmniTalker ,
Wan-Animate: Unified Character Animation and Replacement with Holistic Replication ,
Spaces/Qwen3-ASR-Demo (Speech-To-Text) ,
Qwen3-LiveTranslate: Real-Time Multimodal Interpretation - See It, Hear It, Speak It! ,
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback ,
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length ,
Alibaba-Quark/LiveAvatar ,
Tongyi-MAI/Z-Image ,
Ming-Omni: A Unified Multimodal Model for Perception and Generation ,
Alimama GitHub - Alimama Creative ,
HuggingFace - alimama-creative
AllenAI AllenAI ,
Open Data ,
Molmo Olmo ,
Molmo Demo ,
AllenAI Molmo 7B D ,
Applied Vision Lab, Institute for Intelligent Computing ,
allenai/tulu-3-sft-personas-instruction-following
Objaverse-XL - A Universe of 10M+ 3D Objects ,
RewardBench: Evaluating Reward Models ,
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning ,
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories ,
ZeroEval: Benchmarking LLMs for Reasoning ,
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild ,
Alpha-VLLM
Lumina-mGPT 2.0: Stand-alone Autoregressive Image Modeling ,
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding ,
NewBie image Exp0.1: Efficient Image Generation Base Model Based on Next-DiT ,
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
ANT
ANT ,
EchoMimic ,
EchoMimic - Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
Anthropic
How we built our multi-agent research system ,
Claude Code Game Studios
Apple Apple DepthPro ,
ComfyUI-Depth-Pro ,
FastVLM: Efficient Vision Encoding for Vision Language Models ,
pico-banana-400k ,
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows ,
Sharp Monocular View Synthesis in Less Than a Second ,
Qwen-Image-Edit-2511-Gaussian-Splash
Baidu - BAAI Emu3.5: Native Multimodal Models are World Learners
Bilibili
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech ,
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System (based on Tortoise/XTTS) ,
IndexTTS Paper ,
IndexTTS Samples ,
IndexTTS demo ,
billwuhao/ComfyUI_IndexTTS ,
Black Forest Labs (BFL)
BlackForestLabs ,
FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space ,
black-forest-labs/FLUX.1-Krea-dev ,
Krea online tool ,
bullerwins/FLUX.1-Kontext-dev-GGUF ,
Flux2
Boson.AI Higgs Audio ,
EmergentTTS-Eval ("Emotions" and "Questions") ,
spaces/smola/higgs_audio_v2
BRIA BRIA Background Removal v2.0
ByteDance ByteDance ,
ByteDance Research ,
1.58-bit FLUX ,
GitHub - LipSync - LatentSync ,
X-Portrait2 ,
OmniHuman-1.5 ,
DreamActor-M1 ,
GitHub - goku - Flow Based Video Generative Foundation Models ,
bytedance - MegaTTS3 (voice cloning) ,
MegaTTS3 Demo ,
UNO (e2i) ,
RealCustom ,
InfiniteYou ,
DreamFit ,
BAGEL (ByteDance Adaptive Generative Language Model) ,
USO: Unified Style and Subject Driven Generation via Disentangled and Reward Learning ,
OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning ,
bytedance-research/OneReward ,
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning ,
DreamOmni2: Multimodal Instruction-based Editing and Generation ,
dvlab-research/DreamOmni2 ,
BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration ,
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer ,
Ouro-2.6B-Thinking ,
Alive: Animate Your World with Lifelike Audio-Video Generation
CanopyLabs.ai
Voice Cloning - Natural intonation, emotion, and rhythm that is superior to SOTA closed source models (giggle, gasp, angry, happy) ,
Orpheus-TTS ,
Orpheus-TTS demo ,
watermark_audio ,
ShmuelRonen/ComfyUI-Orpheus-TTS ,
Lynx: Towards High-Fidelity Personalized Video Generation ,
Character.AI
Ovi - Twin backbone cross-modal fusion for audio-video generation ,
character-ai/Ovi ,
snicolast/ComfyUI-Ovi
Cohere
r7b-arabic ,
HuggingFace - CohereForAI ,
HuggingFace - r7b-arabic ,
One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers ,
CohereLabs/tiny-aya translation ,
Cohere Transcribe
ComfyUI
Official Blog
DeepSeek
HuggingFace - DeepSeek ,
HuggingFace - Open R1 ,
GitHub - Open R1 ,
Facebook META
META SAM3 (Segment Anything 3) ,
DiT ,
Facebook Research Scalable Diffusion Models with Transformers (DiT) ,
SeamlessM4T ,
SeamlessM4T Demo ,
MoCha - Towards Movie-Grade Talking Character Synthesis ,
AssetGen2 - Animated 3D game assets ,
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ,
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages ,
SABER: Scaling Zero-Shot Reference-to-Video Generation ,
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory ,
Meta Segment Anything Model Audio (SAM Audio) ,
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming (no code)
FAL fal
FreePik HuggingFace - FreePik ,
Freepik/F-Lite Blog ,
fal-ai/f-lite ,
Freepik/F-Lite
Genmo Genmo
Mochi ,
ComfyUI-MochiWrapper
Google Google-deepmind ,
Google DeepBrain - Generative Motion Latent FlOw MAtching for Audio-driven Talking Portrait (FLOAT) ,
GraphCast / GenCast ,
Google Genie2: Generative Interactive Environments ,
SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds ,
TranslateGemma (translation)
HelloVision HelloVision ,
HelloVision/ComfyUI_HelloMeme
Hong Kong University of Science & Technology (HKUST) Llasa-3B ,
HKUSTAudio/Llasa-3B ,
HKUSTAudio/AudioX ,
ZeyueT/AudioX ,
Replicate - kjjk10/llasa-3b-long ,
MultiTalk: Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation ,
MeiGen-AI/MultiTalk ,
InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing ,
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections ,
LucidFlux: Caption-Free Universal Image Restoration with a Large-Scale Diffusion Transformer
HuaWei - Computing Systems Lab (CSL)
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLMs ,
Reflection Removal through Efficient Adaptation of Diffusion Transformers ,
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos
HuggingFace
HuggingFace ,
HuggingFace Spaces ,
Latent Consistency (LCM) ,
Vision Language Model - SmolVLM-500M-Instruct-WebGPU ,
SmolLM3-3B (web) ,
Intel
Image-GS: Content-Adaptive Image Representation via 2D Gaussians ,
Paper ,
2-minute paper ,
International Digital Economy Academy (IDEA-Research) International Digital Economy Academy (IDEA-Research) ,
IDEA-Research/Grounded-Segment-Anything ,
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding ,
Dino-X blog ,
TREX-2 - object counting ,
dino-x - Detection - Segmentation - Keypoints - Generative Understanding ,
Video - Object Tracking ,
SmolLM (135M, 360M, 1.7B) ,
Rex-Omni: Detect Anything via Next Point Prediction ,
Rex-Omni demo
Korea
CGR Lab HanYang
Kuaishou
Kuaishou Visual Generation and Interaction Center ,
KlingTeam ,
Kwai-Kolors ,
Kwai-Kolors ,
kijai/ComfyUI-KwaiKolorsWrapper ,
LivePortrait ,
3DTrajMaster ,
GRAG: Group-Relative Attention Guidance for Image Editing ,
SVG: Latent Diffusion Model without Variational Autoencoder ,
UniVideo: Unified Understanding, Generation, and Editing for Videos ,
3DiMo: 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation (no code)
Kyutai Kyutai-TTS ,
kyutai/tts-1.6b-en_fr ,
Lightricks
Lightricks ,
Lightricks/LTX-Video ,
Lightricks - LTX-Video ,
ComfyUI-LTXVideo ,
CivitAI - LTX IMAGE to VIDEO with STG, CAPTION & CLIP EXTEND workflow
GitHub - logtd/ComfyUI-LTXTricks ,
LTX IMAGE to VIDEO with STG, CAPTION & CLIP EXTEND workflow
Liquid.AI
LFM2: On-Device Models ,
Edge Models ,
LFM2.5 Models
LG LG - CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
felixtaubner/cap4d
Marvis-AI
Marvis-TTS-250m (Sesame CSM-1B, Kyutai mimi codec) ,
Marvis-Labs/marvis-tts
Meituan
Meituan Technology Team ,
MeiGen-AI/MultiTalk ,
InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing ,
X-SAM: From Segment Anything to Any Segmentation ,
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications ,
LongCat-Video: A Unified Foundational Video Generation Model ,
LongCat-Video-Avatar ,
LongCat-Image-6B (40GB) ,
MiaoshouAI MiaoshouAI ,
HuggingFace - Florence-2-base-PromptGen-v2.0 / Florence-2-large-PromptGen-v2.0
Florence-2-large-PromptGen-v1.5
ComfyUI-Miaoshouai-Tagger
Microsoft Microsoft (Florence2, Phi 3.5, Phi 4) ,
MoGe - Monocular 2D->3D ,
MoGe (Monocular 2D->3D) ,
kijai/ComfyUI-MoGe ,
MoGe demo ,
Magma multimodal agents ,
GitHub - Magma ,
[GIS-LLM] PEACE: Empowering Geologic Map Holistic Understanding with MLLMs ,
[GIS-LLM] GEO-Bench-VLM ,
1-bit LLM ,
Phi-4-reasoning-vision ,
MiniMaxi
MiniMaxi ,
MiniMaxAI ,
HailuoAI ,
MiniMax-M2
Mistral Mistral ,
Mistral small-3 ,
Mistral Large 3 (instruct) ,
Voxtral-Mini-Realtime
MoonshotAI MoonshotAI ,
Kimi K2: Open Agentic Intelligence
NetEase Fuxi Lab GitHub
NTU
Inverse Super Resolution (InvSR) ,
EdgeTAM: On-Device Track Anything Model ,
Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling (Math Magic) ,
ilcve21/Sparc3D (Math Magic) ,
Hitem3D (use Sparc3D) ,
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention (Math Magic) ,
ObjectClear: Complete Object Removal via Object-Effect Attention ,
CineScale: High-Resolution Cinematic Visual Generation ,
FE2E: From Editor to Dense Geometry Estimator ,
DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training ,
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation ,
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives ,
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image ,
Light-X: Generative 4D Video Rendering with Camera and Illumination Control ,
LongVie 2: Multimodal Controllable Ultra-Long Video World Model ,
StoryMem: Multi-shot Long Video Storytelling with Memory ,
Self-Refining Video Sampling
NUS OminiControl: Minimal and Universal Control for Diffusion Transformer ,
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer (Liblib AI) ,
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data ,
PixNerd: Pixel Neural Field Diffusion (pixel-space diffusion transformer for image generation without VAE) ,
Chroma1-Radiance (based on PixNerd) ,
SpotEdit: Selective Region Editing in Diffusion Transformers ,
ShowUI-Pi: Flow-based Generative Models as GUI Dexterous Hands ,
Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance
NuMind NuMind ,
NuMind NuExtract-1.5 ,
Nvidia Nvidia ,
Nvidia Labs (NVlabs) ,
GitHub - NVlabs/Sana ,
GitHub - ComfyUI_ExtraModels ,
DiffusionRenderer: Video Diffusion Models ,
3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes ,
3D Gaussian Ray Tracing (3DGRT) ,
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control (Monocular 2D->3D) ,
ViPE: Video Pose Engine for 3D Geometric Perception ,
Audio Flamingo: Series of Advanced Audio Understanding Language Models ,
NVIDIA Earth-2: Climate Weather Model ,
OpenAI OpenAI ,
OpenAI Whisper (transcribe / translate) ,
Whisper Accent - Accent-Aware English Speech Recognition ,
OpenBMB
MiniCPM4-8B ,
MiniCPM-o ,
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning ,
UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models ,
MiniCPM-o 4.5: Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone ,
MiniCPM-o demo ,
OpenXLab OpenXLab ,
Models ,
Ostris Ostris ,
ostris/OpenFLUX.1 ,
Rednote XiaoHongShu
rednote-hilab ,
rednote-hilab ,
Dots LLM ,
FireRedTTS-2: Towards Long Conversational Speech Generation for Podcast and Chatbot ,
FireRedTTS2 (multilingual) ,
FireRed-Image-Edit
PixelWave mikeyandfriends - PixelWave ,
CivitAI - user - humblemikey (Art Style, PixelWave)
RedNote - XiaoHongShu
dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model ,
Xiaohongshu Instant ID Research ,
Regional-Prompting-FLUX ,
RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services ,
Resemble-ai
ResembleAI ,
ComfyUI-Chatterbox ,
Perth is a comprehensive Python library for audio watermarking and detection ,
ResembleAI/chatterbox-turbo
SalesForce
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
ServiceNow
DrBench Enterprise Research Benchmark ,
Framework for Evaluating Voice Agents (EVA)
Shakker-Labs Shakker-Labs ,
FLUX.1-dev-ControlNet-Union-Pro 2 ,
FLUX-LoRA-Gallery ,
AWPortrait-Z ,
SkyworkAI SkyworkAI/SkyReels-V1 ,
Kijai/SkyReels-V1-Hunyuan_comfy ,
SkyReels-Audio ,
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
Snap
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
StabilityAI StabilityAI
StableDiffusion StableDiffusionAPI
StepFun.AI Step1X-Edit: A Practical Framework for General Image Editing ,
Step-Audio-EditX ,
stepfun-ai/Step1X-Edit ,
ACE-Step: A Step Towards Music Generation Foundation Model ,
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets ,
StepAudio2: Speech and Audio Understanding & Conversation ,
StepAudio Demo ,
Step-Audio-2-mini ,
Step3-VL-10B: Compact Yet Frontier Multimodal Intelligence ,
Step-3.5-Flash ,
Tencent Tencent ,
Tencent-Hunyuan ,
PhotoMaker ,
MimicMotion ,
ComfyUI - HunYuan ,
HunYuan ,
tencent-ailab/persona-hub ,
PersonaHub ,
HunyuanCustom, a multi-modal, conditional, and controllable generation model centered on subject consistency ,
Tencent/HunyuanCustom (80GB) ,
HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters ,
Hunyuan3D-2.1 (10GB=Shape, 21GB=Texture, 29GB=Shape + Texture)
Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again (==GPT4o) ,
SRPO: Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference (realism) ,
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs ,
FlashWorld: High-quality 3D Scene Generation within Seconds ,
HunyuanOCR-1B ,
Tsinghua Tsinghua University Knowledge Engineering Group (KEG) & Data Mining ,
CogVideoX-5b ,
CogVideoX models ,
Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation ,
Unirig: Diverse Skeleton Rigging - One Model to Rig Them All ,
VAST-AI-Research
tripo3d.AI ,
HuggingFace ,
TripoSG - Image to 3D ,
Github - TripoSG ,
MV-Adapter [Image-to-Multi-View] ,
Github ,
Medium ,
DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow ,
VAST-AI/DetailGen3D ,
DetailGen3D demo ,
One Model to Rig Them All: Diverse Skeleton Rigging with UniRig
ViVago
ViVago ,
HiDream t2i ,
modelscope/HiDream-E1-Full ,
HiDream t2i ,
HiDream e1 ,
Wan t2v ,
text-to-music ace-step-v1 ,
Quick Thoughts on HiDream-I1 & E1
VisionATrix VisionATrix ,
PhotoMaker-Plus
Vision-xl vision-xl
Sina Weibo VibeThinker: Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
XiaoMi MiMo Audio: Audio Language Models are Few-Shot Learners ,
XiaomiMiMo/MiMo-Audio ,
MiMo-V2-Flash
XLabs-AI XLabs-AI ,
flux-controlnet-collections ,
GitHub - XLabs-AI/x-flux-comfyui
ZhiPu Z.AI GLM-4.5: Reasoning, Coding, and Agentic Abilities ,
Flappy Bird
Pokemon webapp - Pokedex ,
zai-org ,
SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
AutoGLM ,
RealVideo: A Real-Time Streaming Conversational System Powered by Autoregressive Diffusion Video Generation
Probing LLM Social Intelligence via Werewolf ,
Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction
Zyphra Zyphra ,
Zyphra playground ,
Zonos v01 Speech TTS ,
Benchmarks & Leaderboards
Keywords: InoReader - Benchmarking ,
Intro to LLM Benchmarking
AceVFI: A Comprehensive Survey of Advances in Video Frame Interpolation (VFI) ,
Awesome-Video-Frame-Interpolation
Agent Leaderboard
AgentBench: Evaluating LLMs as Agents
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios ,
Paper
AllenAI
RewardBench: Evaluating Reward Models ,
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning ,
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories ,
ZeroEval: Benchmarking LLMs for Reasoning ,
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild ,
Artificial Analysis
Text to Image Leaderboard ,
LLM-Performance-Leaderboard
Arize Text - Arize Phoenix
Audio Audio - Kimi-Audio-Evalkit
AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking
Awesome-GenAI-Watermarking
Coding
LLM Leaderboard for Code Quality & Security ,
Coding LLM Leaderboard (e.g. Kimi K2) ,
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
FinanceBench: A New Benchmark for Financial Question Answering
Guidebook - HowTO Smol Training Playbook: The Secrets to Building World-Class LLMs ,
LLM Evaluation guidebook ,
LLM Evaluation Guidebook ,
GAIA General Agent Leaderboard (e.g. Manus)
GAIA: a benchmark for General AI Assistants
ImagenWorld
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks ,
Introducing ImagenWorld: A Real World Benchmark for Image Generation and Editing ,
TIGER-Lab/ImagenWorld-Visualizer ,
TIGER-Lab/ImagenWorld
Inferless Open-Source Text-to-Speech Model Gallery
LifeArchitect.ai
LMArena LMArena.ai/leaderboard
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents
Meituan VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications ,
Multilingual Embedding Leaderboard
OmniDocBench: OCR Document Parsing
OpenLLM Open LLM Leaderboard
OpenCompass OpenCompass LLM Leaderboard ,
OpenCompass multi-modal Leaderboard
philschmid/LLM Pricing
MMDocBench: benchmarking large vision-language models for fine-grained visual document understanding
OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use
Princeton HAL: Holistic Agent Leaderboard ,
Holistic Agent Leaderboard: The Missing Infrastructure For AI Agent Evaluation
Speech - Hume - Expressive TTS Arena
Speech - ServiceNow: AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs
Speech - TTS-Arena
Speech - UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models (OpenBMB)
Stanford Foundation Model Transparency Index ,
Stanford Holistic Evaluation of Language Models (HELM)
Surveys Deep Research: A Systematic Survey ,
A Systematic Survey of Deep Research
SycoFact 4B: Lightweight Sycophancy and Safety Evaluator ,
Text-to-Image-Leaderboard
Video-Generation-Arena-Leaderboard
Text-To-Speech - TTS-AGI/TTS-Arena
Text-To-Video - Vchitect/VBench_Leaderboard
Video - MovieGenBench
Text - klu Evaluation Guide
Text - DeepEval
Stable Diffusion Ecosystem
MMSearch
SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark
Tencent-Hunyuan/ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
Top3D.ai - Image-To-3D Online Benchmark
Awesome Alternative UIs for ComfyUI
VBench: Comprehensive Benchmark Suite for Video Generative Models
WorldScore: A Unified Evaluation Benchmark for World Generation ,
WorldScore Leaderboard
Alibaba GenAI Platform
argil.ai: AI influencers ,
fal.ai: argil/avatars
Amazon SageMaker
animemaker
baseten
ByteDance GenAI Platform
DataCrunch.io
Flora.ai
Kaiber.ai
Lightricks - LTX Studio
mage.space
MimicPC ,
Learn ,
nexa.ai (on-device models & inference)
Supported Models
OpenRouter (who use & how much)
Pipecat is a framework for building voice (and multimodal) conversational agents
Perplexity DeepSeek R1 1776
RenderNet.ai
Replicate
supported models
comfyui-replicate
RunWare.ai
RunComfy
RunningHub
runpod.ai
SegMind - PixelFlow
Shakker.ai
sinkin
tensor.art
ThinkDiffusion ,
thinkdiffusion ,
Floyo ,
tag/comfyui
Together.AI
Together.AI
vast.ai
MagicQuill ,
GitHub - MagicQuill ,
Demo ,
ComfyUI_MagicQuill
PhotoDoodle ,
smthemex/ComfyUI_PhotoDoodle ,
HuggingFace - PhotoDoodle-Image-Edit-GPU
Tencent GenAI Platform
ComfyUI workflows CivitAI - user - UmeAiRT ,
CivitAI - user - yorgash (ComfyUI workflows)
GenAI - Reading Material
Anthropic Cookbook ,
ArXiv - Prompt Canvas: A Literature-Based Practitioner Guide for Creating Effective Prompts in Large Language Models
ArXiv - ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation (Tel Aviv University, NVIDIA)
ArXiv - ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
Google Skills
Xtending Digital Narrative (Jhave's Ai links)
ComfyUI Desktop User Guide
Does the United Nations Need Agents? (Amina, Abdalla) ,
The UN Made AI-Generated Refugees (404media) ,
SegMind Blog ,
Qwen-Image: Prompt & Parameter Guide
Sonar WhitePapers ,
Coding Personalities
Reinforcement Learning
Reinforcement Learning (RL) Guide (unsloth.ai) ,
Reinforcement Learning from Human Feedback (Nathan Lambert)
Keywords:
Awesome VLM Architectures ,
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset
DFloat11 DFloat11: Lossless LLM Compression for Efficient GPU Inference ,
DFloat11/FLUX.1-Kontext-dev-DF11
FlashPortrait: 6 X Faster Infinite Portrait Animation with Adaptive Latent Prediction
Latent-Space ComfyUI Latent Color Tools
Liger Kernel: Efficient Triton Kernels for LLM Training
LightX2V Qwen Image + Wan Video ,
Qwen T2I ,
Wan I2V ,
MagCache ,
Zehong-Ma/ComfyUI-MagCache ,
ModelTC (Lightning, LightX2V) ,
LightCompress: Towards Accurate and Efficient AIGC Model Compression
SimpleMem: Efficient Lifelong Memory for LLM Agents
TeaCache
WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
ZeroGPU ahead-of-time (AoT) on HuggingFace
Tool / Training / Utility
CivitAI LORA Trainer ,
CivitAI FLUX Trainer ,
ThinkDiffusion - building-better-models-flux-loras-in-comfyui
3D Tools AutoDesk - Wonder Dynamics ,
DeepMotion ,
Rokoko ,
Odyssey (3D scene generation)
ComfyUI-FluxTrainer ,
YouTube - Custom AI Digital Human with HeyGen's Lora Training
YouTube - How to Replace Yourself on Zoom Calls with an AI Clone from HeyGen
DiffSynth-Studio
Qwen-Image-i2L (Image to LoRA) ,
modelscope/DiffSynth-Studio ,
Ostris.AI
YouTube - ostrisai (LORA=Qwen-Edit, Wan2.1 i2v)
Training FLUX LORA
Training FLUX LORA
Training FLUX LORA
shootthesound/comfyUI-Realtime-Lora
FlyMyAI flymyai-lora-trainer ,
FluxGym ,
Training FLUX LORA ,
Training FLUX LORA
CivitAI - flux-guide-part-i-lora-training
Training LORA
Training lora-training-dataset-creation-comfyui-one-click-dataset
ModelScope/data-juicer
SpeechMatics How to Finetune Sesame AI's Speech Model on New Languages and Voices ,
knottwill/sesame-finetune
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development ,
AIDC-AI/ComfyUI-Copilot
Crystools (CPU, GPU, RAM, VRAM, GPU Temp and space)
ComfyUI-to-Python-Extension
comfy-pack: Serving ComfyUI Workflows as APIs ,
bentoml/comfy-pack
ComfyUI api-nodes
Detectors - NSFW Falconsai/nsfw_image_detection ,
ComfyUI-NSFW-Detection
GitHub - JDCN - Directory Path
GitHub - liusida/ComfyUI-AutoCropFaces
Captioning
HTML img-comparison-slider ,
GitHub ,
demo.photo.gallery
LayerStyle
comfyui-propost
NovaSky NovaSky: UC Berkeley's Sky Computing Lab
chflame163/ComfyUI_LayerStyle_Advance (ZhiPu / SegmentAnything)
OmniSVG: A Unified Scalable Vector Graphics Generation Model
POLARIS: A POst-training recipe for scaling reinforcement Learning on Advanced ReasonIng modelS
QuasiBlob - image processing ComfyUI-EsesImageCompare
Qwen-Image & Qwen-Image-Edit LoRA Training
WaterMark
Image Detection Bypass Utility ,
ComfyUI-ShaderNoiseKSampler (blends standard noise generation with a multi-stage shader-based system) ,
LLM Attacks ,
Removing refusals with HF Transformers ,
Harmless instructions ,
Refusal in LLMs is mediated by a single direction ,
Bob's Confetti : Phonetic Memorization Attacks in Music and Video Generation ,
SynthID-Bypass ,
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models ,
StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model ,
Heretic: Fully automatic censorship removal for language models ,
Tencent ai-detect ,
Hive ai-generated-content-detection
Tool - Prompt Engineering
adieyal/comfyui-dynamicprompts
AIrjen/OneButtonPrompt
HunyuanVideo 1.5 Prompt Handbook
MushroomFleet/LLM-Base-Prompts (mixed)
AI Video Creation Guide
PixelPruner theallyprompts ,
civitai
Prompt Lists fofr ,
ai-prompts/prompt-lists ,
marduk191/ComfyUI-Fluxpromptenhancer ,
PromptHero - portraits-prompts ,
PromptMania ,
Tool - General / TechScan / Research / DeepResearch
Awesome Deep Research ,
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents ,
A Systematic Survey of Deep Research ,
HuggingFace - Daily Papers ,
blog.comfy.org ,
awesome-comfyui
AI News
AI-Researcher: Autonomous Scientific Innovation
AI4Research: A Survey of Artificial Intelligence for Scientific Research ,
Paper
Alibaba
WebAgent for Information Seeking (WebShaper, WebSailor, WebDancer, WebWalker)
Alibaba-NLP/DeepResearch ,
DeepTutor: AI-Powered Personalized Learning Assistant
Fireplexity: Open Source Perplexity AI Clone
Hermit: offline ai chatbot for zim files ,
Kiwix Library
MiroThinker ,
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
Note-taking / Whiteboards
Heptabase ,
MilaNote ,
Google Mixboard ,
Scrintal ,
Nvidia Universal Deep Research: Bring Your Own Model and Strategy
Research Assistant ByteDance PaSa: An LLM Agent for Comprehensive Academic Paper Search ,
Stanford STORM ,
Github - Deep Research ,
NanoSage - Advanced Recursive Search & Report Generation ,
Perplexity - Deep Research ,
SurfSense
ByteDance - DeerFlow (Deep Exploration and Efficient Research Flow) ,
OpenNotebook ,
PageLM ,
HyperBookLM ,
Momo-research: Context Engineering and Persistent Memory for AI agents ,
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis ,
OpenResearcher demo ,
dify-deepseek-deploy-a-private-ai-assistant
AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search ,
SakanaAI/AI-Scientist-v2
WebThinker: Empowering Large Reasoning Models with Deep Research Capability (paper) ,
WebThinker ,
RUC-NLPIR/WebThinker ,
Object Background Remover / Segmentation / InPaint / OutPaint
Apple DepthPro ,
ComfyUI-Depth-Pro ,
Articulate-Anything - Automatic Modeling of Articulated Objects
Depth Anything 3: Recovering the Visual Space from Any Views ,
PozzettiAndrea/ComfyUI-DepthAnythingV3
GitHub - Inspyrenet-Rembg
GitHub - RMBG (BEN2, mask feather, dino object segmentation) ,
PramaLLC/BEN2_ComfyUI
BRIA Background Removal v2.0
Image Pyramid Structure for High Resolution Salient Object Detection (InSPyReNet)
Bilateral Reference for High-Resolution Dichotomous Image Segmentation (BiRefNet) ,
ComfyUI-BiRefNet ,
MoonHugo/ComfyUI-BiRefNet-Hugo
Background Erase Network (BEN)
CivitAI ComfyUI
FE2E: From Editor to Dense Geometry Estimator
lama-remover ,
batch-process-images-with-lama-cleaner
Diffusers Image Fill Guide
Lanpaint: Training-Free Diffusion Inpainting with Exact and Fast Conditional Inference ,
MegaSAM - Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos
magic-research/Sa2VA ,
Sa2VA-simple-demo
MatAnyone (NTU, SenseTime) ,
HuggingFace demo
Rex-Omni: Detect Anything via Next Point Prediction ,
Rex-Omni demo
ROSE: Remove Objects with Side Effects in Videos
RF-DETR: SOTA Real-Time Object Detection Model ,
RF-DETR demo
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction ,
OpenIXCLab/SeC-4B ,
9nate-drake/Comfyui-SecNodes
Vargo Teleport - 3D from iPhone Video
VOID: Video Object and Interaction Deletion
Meituan X-SAM: From Segment Anything to Any Segmentation
Speech - Music
Keywords:
github - Awesome Audio
ACE-Step 1.5 ,
filliptm/ComfyUI-FL-AceStep-Training (training)
ComfyUI-DeepExtract - separate vocals and sounds from audio files
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
FoleyCrafter FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds ,
HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation ,
Foundation-1: Structured text-to-sample generation for modern music production ,
multimodalart/Foundation-1
HeartMuLa: Music Foundation Models
MMAudio hkchengrex/MMAudio ,
kijai/ComfyUI-MMAudio ,
ACE-Step: A Step Towards Music Generation Foundation Model ,
ace-step/ACE-Step
Platform - ElevenLabs voice-library/angry-voices
Platform - Hume.AI LLM for text-to-speech ,
Platform - Play.HT Play.HT - singaporean-english ,
play.ht sandbox
Platform - Resemble resemble.ai (Fake Audio Detection)
PrismAudio: Decomposed Chain-of-Thoughts and Multi-Dimensional Rewards for Video-to-Audio Generation
RF-DETR: SOTA Real-Time Detection and Segmentation Model
Riffusion (platform)
SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement
SongGeneration - LeVo: High-Quality Song Generation with Multi-Preference Alignment
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing ,
ThinkSound
Music YuE: Open Music Foundation Models for Full-Song Generation
Speech - Text-2-Speech (TTS)
Keywords:
HuggingFace - Speech ,
github - awesome-ai-voice ,
github - Awesome Audio
Audio Enhancers
NovaSR (for clearing up low-quality audio) ,
YatharthS/NovaSR (for super-resolution) ,
LavaSR (for super-resolution) ,
Resemble Enhance (for denoising and bandwidth extension) ,
OpenVINO Audacity plugin (for super-resolution) ,
MelbandRoFormer (for music source separation - vocals & instruments) ,
filliptm/ComfyUI_Fill-Nodes ,
audio-separation-nodes-comfyui ,
CanopyLabs Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation ,
canopylabs/orpheus TTS (emotion, training mesopolitica) ,
ShmuelRonen/ComfyUI-Orpheus-TTS
CosyVoice
CosyVoice 3 (with samples) ,
CosyVoice 2 ,
muxueChen/ComfyUI_NTCosyVoice
touge/ComfyUI-NCE_CosyVoice
DiodioGod TTS Audio Suite
Data
AI Audio Datasets (AI-ADS) ,
CN-Celeb1, CN-Celeb2 ,
DataoceanAI Dolphin (40 Eastern languages across East Asia, South Asia, Southeast Asia, and the Middle East; 22 Chinese dialects)
TTS - FishAudio
Fish Audio ,
FishAudio (No ComfyUI) ,
huggingface.co/fishaudio
Qwen2-Audio-7B-Instruct-Int4
TTS - DeepGram TTS playground ,
text-to-speech-prompting
TTS - F5-TTS HuggingFace - SWivid/F5-TTS ,
niknah/ComfyUI-F5-TTS ,
erax-ai (vietnamese) ,
TTS - FreeVC Github - FreeVC - One-Shot Voice Conversion ,
ShmuelRonen/ComfyUI-FreeVC_wrapper
TTS - Hume.AI TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment
TTS - IMS-Toucan IMS-Toucan: Controllable Text-to-Speech for over 7000 Languages ,
MassivelyMultilingualTTS
TTS - KaniTTS
KaniTTS: Fast and Expressive Speech Generation Model ,
wildminder/ComfyUI-KaniTTS ,
TTS - Kokoro Voice Mixer Studio ,
MushroomFleet/DJZ-KokoroTTS
TTS - Llasa-3B Llasa-3B ,
HKUSTAudio/Llasa-3B ,
Replicate - kjjk10/llasa-3b-long
TTS - Marvis-AI Marvis-TTS-250m (Sesame CSM-1B, Kyutai mimi codec) ,
Marvis-Labs/marvis-tts
TTS - Microsoft VibeVoice microsoft/VibeVoice-1.5B ,
Enemyx-net/VibeVoice-ComfyUI ,
Demo ,
Fine-Tuning ,
VibeVoice-Realtime is a lightweight real-time text-to-speech model supporting streaming text input and robust long-form speech generation ,
VibeVoice-ASR (60min)
TTS - Nari-Labs DIA (2-pax dialogue)
TTS - Sesame Sesame - Crossing the uncanny valley of conversational voice ,
Sesame CSM 1B for Multi-Speaker AI Conversations ,
SesameAILabs/csm ,
Sesame - CSM (Conversational Speech Model) ,
billwuhao/ComfyUI_CSM ,
SpeechMatics - How to Finetune Sesame AI's Speech Model on New Languages and Voices ,
SpeechMatics - knottwill/sesame-finetune
TTS - SparkTTS
An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
billwuhao/ComfyUI_SparkTTS ,
SparkTTS Demo ,
1038lab/ComfyUI-SparkTTS ,
Spark-TTS-finetune ,
SparkAudio/Spark-TTS (NTU)
TTS - Seed-VC
Zero Shot Voice Conversion (Singing VoiceOver) ,
Plachtaa/seed-vc
TTS - SoulX-Podcast SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity
TTS - StepFun.AI
StepAudio2: Speech and Audio Understanding & Conversation ,
StepAudio Demo ,
billwuhao/ComfyUI_StepAudioTTS
Tool TTS-WebUI
TTS - ZipVoice
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
TTS - Zonos
Zyphra ,
Zyphra playground ,
Zonos v01 Speech TTS ,
Talking Head
Keywords:
harlanhong/awesome-talking-head-generation ,
JosephPai/Awesome-Talking-Face ,
Kedreamix/Awesome-Talking-Head-Synthesis ,
awesome-ai-talking-heads ,
awesome-digital-human ,
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) ,
European Conference on Computer Vision (ECCV) ,
International Conference on Computer Vision (ICCV) ,
International Conference on Learning Representations (ICLR) ,
Github - lip-sync
Alibaba Fantasy Talking ,
kijai/ComfyUI-WanVideoWrapper ,
Alibaba - OmniAvatar ,
OmniAvatar ,
OmniTalker
YouTube ,
FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers ,
EchoShot: Multi-Shot Portrait Video Generation ,
JoHnneyWang/EchoShot ,
Paper
ACTalker ,
Paper
AnimPortrait3D - text to 3D animation
animate-anyone-2 - High-Fidelity Character Image Animation with Environment Affordance ,
AnimateAnyone
Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars ,
tobias-kirschstein.github.io/avat3r
BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation
ByteDance InfiniteYou - Flexible Photo Recrafting While Preserving Your Identity ,
Paper ,
bytedance/InfiniteYou ,
InfiniteYou-FLUX demo ,
ZenAI-Vietnam/ComfyUI_InfiniteYou
LatentSync ,
FlowAct-R1: Towards Interactive Humanoid Video Generation (no-code)
Character.AI avatar-fx
CharaConsist: Fine-Grained Consistent Character Generation
GitHub - DeepFuze ,
YouTube , Facial transformations, lipsyncing, video generation, voice cloning, face swapping, and lipsync translation
DICE-Talk - Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation ,
smthemex/ComfyUI_DICE_Talk
Voice Dubbing
FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes ,
JustDubIt Just-Dub-It: Video Dubbing via Joint Audio-Visual Diffusion ,
Inference pipeline & Training guide
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer ,
jax-explorer/ComfyUI-easycontrol ,
Paper
EchoMimic
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait ,
yuvraj108c/ComfyUI-FLOAT ,
deepbrainai-research.github.io/float
FramePack - Packing Input Frame Context in Next-Frame Prediction Models for Video Generation ,
lllyasviel/FramePack ,
HelloMeme
HumanMLLM HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
KwaiVGI ReCamMaster: Camera-Controlled Generative Rendering from A Single Video ,
jianhongbai.github.io/ReCamMaster ,
Paper ,
kijai/ComfyUI-WanVideoWrapper ,
LAM: Large Avatar Model for One-shot Animatable Gaussian Head
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length ,
Alibaba-Quark/LiveAvatar
LivePortrait
GitHub - ComfyUI-LivePortraitKJ ,
GitHub - ComfyUI-AdvancedLivePortrait ,
YouTube
LHM: Large Animatable Human Reconstruction Model for Single Image to 3D in Seconds ,
spaces/DyrusQZ/LHM demo ,
LHM ComfyUI ,
LAM: Large Avatar Model for One-shot Animatable Gaussian Head ,
PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image ,
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
OmniGen2: Exploration to Advanced Multimodal Generation ,
VectorSpaceLab/OmniGen2 ,
ComfyUI-OmniGen2 ,
MemoAvatar - MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation ,
ComfyUI-IF_MemoAvatar ,
MoCha: End-to-End Video Character Replacement without Structural Guidance ,
PersonaLive
PersonaLive: Expressive Portrait Image Animation for Live Streaming ,
GVCLab/PersonaLive ,
PersonaLive ,
PixelSmile: Toward Fine-Grained Facial Expression Editing
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Sketch2Anim: Towards Transferring Sketch Storyboards into 3D Animation
SkyReels-Audio
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation ,
xiaozhongji/Sonic Demo
smthemex/ComfyUI_Sonic
StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation
Francis-Rings/StableAvatar
TaoAvatar pixelai-team.github.io/TaoAvatar ,
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting ,
Medium
Tencent GitHub - ComfyUI-MimicMotionWrapper
MimicMotion ,
MusePose ,
InteractAvatar: Making Avatars Interact Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars ,
UniAnimate / Animate-X Animate-X ,
UniAnimate ,
ali-vilab/UniAnimate ,
Isi-dev/ComfyUI-UniAnimate-W (UniAnimate=humans, Animate-X=animals/cartoons) ,
UniAnimate/Animate-X models ,
Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation (Camera + Video Motion) ,
alibaba-damo-academy/Uni3C
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
WildActor: Unconstrained Identity-Preserving Video Generation
X-Portrait GitHub - akatz-ai/ComfyUI-X-Portrait
Face Restoration & Realism
Keywords:
1ai
AdaFace: Quality Adaptive Margin for Face Recognition
CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation
Compare PuLID vs InstantID vs FaceID
GitHub - Gourieff (ReActor Node for ComfyUI) ,
somanchiu/ReSwapper ,
faceswap
HuggingFace - GuijiAI/ReHiFace-S
GitHub - sipie800/ComfyUI-PuLID-Flux-Enhanced
GitHub - cubiq/ComfyUI_FaceAnalysis ,
GitHub - jordoh/ComfyUI-Deepface/
GitHub - Person Mask Generator
alexgenovese/facerestore
modelscope/facechain
Face Restoration Pearl Rope ,
Roop ,
DeepFaceLive ,
SimSwap ,
deepfakes/faceswap
FaceFusion ,
Deepfacelive-DFM-Models
Gourieff - comfyui-reactor-node
DreamID - A Fast and High-Fidelity diffusion-based Face Swapping via Triplet ID Group Learning
Image-To-Text (i2t) Captioning
AllenAI Molmo 7B D
Joy Caption
Microsoft ,
Microsoft Florence2 ,
Florence-2 ,
MiaoshouAI ,
ComfyUI-Miaoshouai-Tagger ,
Microsoft Phi - alexisrolland/ComfyUI-Phi (Phi-3.5-mini-instruct, Phi-3.5-vision-instruct) ,
Phi 3.5 ,
MiniCPM-Plus ,
MiniCPM v2.6 Prompt Generator
Moondream (Visual Q&A, Caption, Object Detection) ,
Moondream blog ,
vikhyatk/moondream2 ,
vikhyat/moondream ,
kijai/ComfyUI-moondream ,
Hangover3832/ComfyUI-Hangover-Moondream
OmniVLM-968M (no ComfyUI)
Pixtral Llama Molmo Vision
PromptCraft
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards (qwen-edit-2509 LORA)
QwenVL for ComfyUI (image & video) ,
Qwen2-VL-Instruct ,
Searge-LLM
WD14-Tagger
gokaygokay/Flux Prompt Generator ,
Flux-Florence-2 ,
fairy-root ,
IuvenisSapiens (miniCPM, QWEN, QWEN Audio)
Zhipu GLM ,
GitHub - JcandZero/ComfyUI_GLM4Node ,
GitHub - Nojahhh/ComfyUI_GLM4_Wrapper ,
Models
Keywords: HuggingFace - text-generation ,
InoReader - Algorithm ,
AIModels.fyi
Comfy-Org
HuggingFace
ModelScope
AlexGeNovese checkpoint ,
clip ,
clip_vision ,
controlnet ,
facerestore ,
ipadapters ,
loras ,
sams ,
vae ,
ultralytics
city96 GGUF Qwen-Image ,
LTX ,
HunyuanVideo-I2V
DiffBot diffbot-llm-inference ,
diffy.chat demo
Edge Models
Falcon
Falcon-H1 ,
HuggingFace
Vision Language Model - SmolVLM-500M-Instruct-WebGPU ,
SmolLM3-3B (web) ,
Liquid.AI
LFM2: On-Device Models ,
Edge Models ,
LFM2.5 Models
MiroMind
MiroThinker
HuiHui-ai
abliterated models ,
Huihui-Qwen3-VL-8B-Instruct-abliterated ,
coder3101/Qwen3-VL-8B-Thinking-heretic ,
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer ,
spaces/RiverZ/ICEdit
IamCreateAI/Ruyi ,
ByteDance - 1.58-bit FLUX
Hugging Face for Legal ,
HFforLegal/datasets ,
IPAdapter (FaceID, clip-vision, LORA)
Kijai Skyreels ,
LTXV ,
HunyuanVideo ,
MonsterMMORPG Wan - GGUF ,
Upscale ,
FaceSegments ,
Yolo
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) , UAE Institute of Foundation Models (IFM) ,
Sherkala (English, Russian, and Turkish) ,
K2-Think ,
Ostris qwen_edit_inpainting
PowerInfer
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment ,
Paper
QuantStack GGUF Wan2.2-I2V-A14B ,
Qwen-Image-Distill ,
FLUX.1-Kontext-dev ,
LTXV-13B-0.9.8-distilled ,
Wan2.1_I2V_14B_FusionX
Reaslim TensorArt - Extra-Realistic-Flux ,
TensorArt - kg_09
StrangerZone StrangerZone LORA (Flux-Super-Realism-LoRA, Super 3D - Engine)
Swiss-AI - Apertus ,
Swiss-AI - Projects
SVDQuant ,
mit-han-lab/ComfyUI-nunchaku
TheBloke (>4K)
TildeOpen LLM: Europe's Sovereign Multilingual AI ,
TildeAI/TildeOpen-30b
Unsloth.ai
Unsloth.ai ,
UnSloth (>300) ,
GitHub - UnSloth AI ,
unsloth/deepseek-v3 ,
phi-4-all-versions ,
Fine-tune & Run Qwen3 ,
Fine-tuning TTS models (Sesame's CSM, Orpheus)
Upscale SUPIR
Keywords:
Awesome-video-super-resolution-diffusion ,
Awesome Diffusion Models for Video Super-Resolution ,
OpenModelDB ,
HuggingFace - Phips ,
realistic skin
4kagent (satellite)
ac-pill/upscale_models (e.g. RealESRGAN_x4plus_anime_6B.pth)
Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment (No ComfyUI) ,
CineScale: High-Resolution Cinematic Visual Generation ,
ali-vilab/FreeScale
InvSR - Arbitrary-steps Image Super-resolution via Diffusion Inversion (No ComfyUI) ,
OAOA/InvSR demo
camenduru/SUPIR ,
Dynamic Position Extrapolation (DyPE) - supports FLUX, Qwen Image, and Z-Image
FLASHVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
HuggingFace - upscaler
HYPIR ,
XPixelGroup
GitHub - shiimizu/ComfyUI-TiledDiffusion
GitHub - ssitu/ComfyUI_UltimateSDUpscale
OPPO Research Institute One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution (DLoRAL)
Thera: Aliasing-Free Arbitrary-Scale Super-Resolution with Neural Heat Fields (No ComfyUI)
SeedVR
seedvr2 ,
ByteDance-Seed/SeedVR ,
ComfyUI-SeedVR2_VideoUpscaler
SeedVR2_comfyUI (6Gb, 13Gb)
SeedVR2-7B (33Gb) ,
SeedVR2-3B (14Gb)
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion
Video
Keywords: Github - Awesome Video Diffusion ,
Github - Awesome-LLMs-for-Video-Understanding
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer (supports long video and misaligned characters) ,
ssj9596/One-to-All-Animation
Subject-to-Video (s2v)
SkyworkAI/SkyReels-V1 ,
Kijai/SkyReels-V1-Hunyuan_comfy ,
FramePack - generate 1-minute video (60 seconds) ,
Genmo
Mochi ,
ComfyUI-MochiWrapper
GitHub - logtd/ComfyUI-MochiEdit ,
GitHub - logtd/ComfyUI-LTXTricks ,
Motion-I2V (No ComfyUI)
Google Genie-2 (No ComfyUI)
Tsinghua University Knowledge Engineering Group (KEG) & Data Mining
CogVideoX-5b
CogVideoX models
MAGREF - Masked Guidance for Any-Reference Video Generation ,
MAGREF-Video/MAGREF ,
Phantom (Subject2Video)
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment ,
kijai/ComfyUI-WanVideoWrapper ,
YouTube - Phantom workflow
Remade-AI HuggingFace - Remade-AI (video LORA) ,
remade-effects ,
Selfie-With-Younger-Self ,
360 Degree Rotation ,
Zoom-Call ,
workflow - Selfie-With-Younger-Self
SkyReels (e2v)
SkyReels V1: Human-Centric Video Foundation Model ,
SkyReels V2: Infinite-Length Film Generative Model ,
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling (SVI-Pro)
Tencent Hunyuan Tencent Hunyuan ,
ComfyUI-HunyuanVideoWrapper ,
HunyuanVideo_comfy models ,
HY-Motion 1.0: Scaling Flow Matching Models for 3D Motion Generation (text-to-3D==DeepMotion SayMotion) ,
Video Frame Interpolation kijai/ComfyUI-GIMM-VFI ,
Fannovel16/ComfyUI-Frame-Interpolation
FlowEdit ,
FlowEdit Image Editing (One-Click Text Modification) ,
ComfyUI-Fluxtapoz ,
FlowEdit Video Editing (No Masks, No Noise) ,
logtd/ComfyUI-LTXTricks ,
logtd/ComfyUI-HunyuanLoom ,
GitHub - Fannovel16/ComfyUI-MotionDiff
WAN
Wan: Open Large-Scale Video Generative Models ,
ATI: Any Trajectory Instruction for Controllable Video Generation ,
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion ,
CausVid - From Slow Bidirectional to Fast Autoregressive Video Diffusion Models ,
VACE: All-in-One Video Creation and Editing ,
3D OpenPose / PoseNet / DepthMap
Keyword:
VAST-AI-Research/repositories ,
CivitAI - poses ,
CivitAI - openpose ,
facebook/ActionMesh: Video to Animated 3D Mesh
Data PoseManiacs ,
Bandai-Namco ,
Pose-Depot ,
CivitAI (>5Gb) ,
PoseMyArt ,
AppAnything 1 ,
AppAnything 2 ,
AppAnything 3 ,
HumanDataset 1 ,
HumanDataset 2 ,
3DScanStore ,
RenderPeople ,
CMU Graphics Lab Motion Capture Database ,
Microsoft-Rocketbox ,
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
MarketPlace DeviantArt ,
Proko ,
MocapCentral
Tool - Poses OpenPoseAI (detect pose from image)
AlphaPose (outdated)
comfyui_controlnet_aux-Midas, Zoe Depth ,
ComfyUI-Marigold
DeepVerse - 4D Autoregressive Video Generation as a World Model
FaceLift: Single Image to 3D Head
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders ,
HuggingFace - fffiloni/Gaze-LLE
Generative Refocusing: Flexible Defocus Control from a Single Image
Insta360 - Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers
TMElyralab/Comfyui-MusePose
Tencent - akatz-ai/ComfyUI-DepthCrafter-Nodes
Pose Estimation 4DHumans ,
shubham-goel/4D-Humans ,
open-mmlab/mmpose ,
TMElyralab/Comfyui-MusePose ,
logtd/ComfyUI-4DHumans
GeoWizard GeoWizard 2D->3D ,
GitHub - fuxiao0719/GeoWizard ,
kijai/ComfyUI-Geowizard
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
ComfyUI_Sapiens - (seg,normal,pose,depth,mask maps) ,
sapiens-pose-1b-torchscript
MoBluRF: Motion Deblurring Neural Radiance Fields for Blurry Monocular Video
StableAnimator: High-Quality Identity-Preserving Human Image Animation
Text-To-Motion
FrankenMotion: Part-level Human Motion Generation and Composition
Tencent HY-Motion 1.0: Scaling Flow Matching Models for 3D Motion Generation ,
Nvidia Kimodo: Scaling Controllable Human Motion Generation
Unirig: Diverse Skeleton Rigging - One Model to Rig Them All
UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass
3D - 2D to 3D Monocular / NERF / Gaussian Splatting / Multi-view
Keyword:
Github - awesome-gaussians ,
3D Gaussian Splatting Papers ,
Awesome 3D Diffusion ,
Github - awesome-3D-gaussian-splatting
AllenAI Objaverse-XL - A Universe of 10M+ 3D Objects
BlockGaussian
4D Gaussian Splatting (temporal)
FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction ,
Gsplat-based 4D Gaussian Splatting for Dynamic Scenes
ByteDance Seed3D
CityGaussian CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes ,
GitHub - citygs ,
Paper
CraftsMan3D: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner
DreamTechAI Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
Elevate3D: Elevating 3D Models: High-Quality Texture and Geometry Refinement from a Low-Quality Model
EO-NeRF - Multi-Date Earth Observation NeRF - The Detail Is in the Shadows ,
EOGS - Gaussian Splatting for Efficient Satellite Image Photogrammetry ,
EOGS Paper
Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views
Geo4D Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction ,
jzr99/Geo4D
Google Google Genie2: Generative Interactive Environments ,
Map2Video: Street View Imagery Driven AI Video Generation (paper)
Group3D: MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection
GUAVA: Generalizable Upper Body 3D Gaussian Avatar
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
HoloPart: Generative 3D Part Amodal Segmentation ,
HoloPart demo ,
SAMPart3D: Segment Any Part in 3D Objects
Hunyuan3D-2: High Resolution Textured 3D Assets Generation ,
tencent/Hunyuan3D-2
Hunyuan3D-2 demo ,
jtydhr88/ComfyUI-InstantMesh
Humans and Structure from Motion (HSfM) - Reconstructing People, Places, and Cameras
HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields
Image-To-3D (i3D), Video-To-3D (v3d)
Know3D: Prompting 3D Generation with Knowledge from Vision-Language Models ,
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models ,
Vega3D: Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding ,
World Reconstruction From Inconsistent Views ,
ImmerseGen - Agent-Guided Immersive World Generation with Alpha-Textured Proxies
Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model ,
Make-It-Animatable ,
Demo
Meta - Multi-SpatialMLLM
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ,
Navigation World Models ,
MapAnything: Universal Feed-Forward Metric 3D Reconstruction ,
facebook/map-anything ,
SAM 3D Body: Robust Full-Body Human Mesh Recovery ,
SAM3D - Human demo ,
SAM3D - Object demo ,
SAM 3D Objects ,
SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos (tsinghua) ,
WorldGen: Generate Any 3D Scene in Seconds ,
AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials ,
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models ,
ShapeR: Metric Generative Shape Reconstruction
Microsoft MoGe - Monocular 2D->3D ,
kijai/ComfyUI-MoGe
MoCA: Mixture-of-Components Attention for Scalable Compositional 3D Generation
MV-Adapter: Multi-view Consistent Image Generation Made Easy ,
ComfyUI-MVAdapter ,
MVAdapter-demo ,
Paper
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Nerfies: Deformable Neural Radiance Fields
Nvidia Cosmos World Foundation Models
Vulkan Gaussian Splatting ,
OccluGaussian OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering ,
Paper
OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
PartGen - Part-level 3D Generation and Reconstruction
PanoWan: Lifting Diffusion Video Generation Models to 360 with Latitude/Longitude-aware Mechanisms
RL3DEdit: Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing
SpatialLM: Large Language Model for Spatial Understanding (No ComfyUI) ,
manycore-research/SpatialLM ,
manycore research
ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding
SkySplat: Generalizable 3D Gaussian Splatting from Multi-Temporal Sparse Satellite Images (no code) ,
SkySplat: 3DGS Blender Toolkit
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
Stable-X
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging (no textures) ,
Stable-X/Hi3DGen demo ,
Stable-X/ComfyUI-Hi3DGen ,
Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling ,
ilcve21/Sparc3D
SkinTokens: A Learned Compact Representation for Unified Autoregressive Rigging (no code)
Tencent
HunyuanWorld 1.0 ,
Tencent-Hunyuan/HunyuanWorld-1.0 ,
tencent/HunyuanWorld-1 ,
GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition ,
HunyuanWorld-Voyager ,
HunyuanWorld-Voyager: depth and RGB video for efficient and direct 3D reconstruction ,
Hunyuan World Reconstruction
FlashWorld: High-quality 3D Scene Generation within Seconds
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
Trellis3D
Trellis2 ,
microsoft/TRELLIS.2-4B ,
Trellis3D - Structured 3D Latents for Scalable and Versatile 3D Generation ,
Trellis demo ,
if-ai/ComfyUI-IF_Trellis ,
tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction
UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
UniK3D: Universal Camera Monocular 3D Estimation ,
UniK3D-demo
UrbanSim: Towards Autonomous Micromobility through Scalable Urban Simulation
VastGaussian VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction ,
Paper
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Multi-view 3D reconstruction
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool ,
FlashWorld: High-quality 3D Scene Generation within Seconds ,
CAT3D: Create Anything in 3D with Multi-View Diffusion Models ,
Wonderland: Navigating 3D Scenes from a Single Image ,
MVImgNet: A Large-scale Dataset of Multi-view Images ,
WorldLabs.AI (Li FeiFei)
WorldGrow: Generating Infinite 3D World
Yan: Foundational Interactive Video Generation
YUME 1.5: A Text-Controlled Interactive World Generation Model
3DTown: Constructing a 3D Town from a Single Image
HunYuan 3D hunyuan-3d ,
Tencent/Hunyuan3D-2 ,
MrForExample/ComfyUI-3D-Pack ,
niknah/ComfyUI-Hunyuan-3D-2
Vast.AI Github - TripoSG
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
Agents
Keyword:
Awesome Adaptation of Agentic AI ,
Github - LLM-Agents-Papers ,
Google Scholar ,
GitHub - restyler/awesome-n8n ,
GitHub - enescingoz/awesome-n8n-templates ,
Argilla
FinePersonas-v0.1 ,
FinePersonas-Synthetic-Email-Conversations ,
synthetic-data-generator-argilla-reviewer ,
AgentSociety: LLM Agents in City ,
tsinghua-fib-lab/AgentSociety
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments ,
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
Chorus Engine: Personal AI Orchestration System
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
1Prompt1Story ,
byliutao/1Prompt1Story
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics (paper)
DeepPersona: A Depth-First Synthetic-Persona Engine for Highly Personalized Language Models ,
thzva/Deeppersona ,
DeepPersona demo
LabClaw: Always-On Lab Agent
Open Character Training
PersonaPlex
PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models
NVidia/personaplex
Tencent PersonaHub ,
tencent-ailab/persona-hub ,
Paper
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users ,
Paper
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios ,
Paper
AIPress: A Multi-Agent News Generation and Feedback Simulation System ,
Paper
Microsoft UserLM-8B: Flipping the Dialogue: Training and Evaluating User Language Models ,
TinyTroupe: LLM-powered multiagent persona simulation for imagination enhancement and business insights
n8n
Agentic-Archive ,
n8n + comfyUI API: Batch Convert Images to Video ,
n8n + comfyUI API: Simple
Nvidia Nemotron-Personas (US) ,
Nemotron-Personas (India) ,
Nemotron-Personas (Japan)
OASIS OASIS: Open Agent Social Interaction Simulations with One Million Agents ,
MiroFish: A Simple and Universal Swarm Intelligence Engine, Predicting Anything
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
OpenClaw
Awesome OpenClaw Skills ,
Don’t Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw ,
AutoResearchClaw: Chat an Idea. Get a Paper. Fully Autonomous & Self-Evolving
OPPO Towards Personalized Deep Research: Benchmarks and Evaluations
Tencent Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Keyword:
Awesome World Models for Robotics ,
Benchmark - WorldScore ,
Alibaba FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
AI2-THOR: An Interactive 3D Environment for Visual AI
AI-Town
AI-Town (a16z) ,
World Craft: Agentic Framework to Create Visualizable Worlds via Text
BEHAVIOR-1K: 1000 realistic, full-length household tasks
CARLA: Open-source simulator for autonomous driving research
Embodied City: Embodied Agent in Urban Environment
Genesis: A Generative and Universal Physics Engine for Robotics and Beyond
GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
GigaWorld-Policy: An Efficient Action-Centered World-Action Model (World Action Models WAM)
Google Google Genie2: Generative Interactive Environments ,
SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
InternUtopia: Dream General Robots in a City at Scale
Large World Model (LWM)
LongVolCap: Representing Long Volumetric Video with Temporal Gaussian Hierarchy
Mirage 2 - Generative World Engines ,
Mirage 2 - Demo ,
MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge (Minecraft)
Niantic Labs Large Geospatial Model
MindCraft: Collaborating Action by Action: Multi-agent LLM Framework for Embodied Reasoning
PlayerOne: Egocentric World Simulator
Seoul World Model: Grounding World Simulation Models in a Real-World Metropolis
SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds
SkyWorld.AI
Matrix-3D: Omnidirectional Explorable 3D World Generation
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
SPAgent: Agent in the Physical & Spatial World. Think3D: Thinking with Space for Spatial Reasoning
SynCity SynCity: Training-Free Generation of 3D Worlds ,
Paper
UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
Very Big Video Reasoning (VBVR) Suite - Knowledge, Abstraction, Spatiality, Transformation, Perception ,
Video-Reason/VBVR-Bench-Leaderboard
Virtual Community: An Open World for Humans, Robots, and Society
VIGA: Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning (Vibe-code a Physical Scene with Interactions aka Blender sim)
Web World Models (princeton) ,
Princeton-AI2-Lab/Web-World-Models
WorldGen: Generate Any 3D Scene in Seconds
WorldMirror: Universal 3D World Reconstruction with Any Prior Prompting
World Models
LeWorldModel: Stable End-to-End JEPA from Pixels (Yann LeCun) ,
WorldLabs (Li FeiFei) ,
WorldScore: A Unified Evaluation Benchmark for World Generation ,
WorldScore Leaderboard
Datasets
Amazon Berkeley Objects (ABO) Dataset (household items)
HumanRig - Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset
CivitAI-As-Characters
FineVision: Open Data Is All You Need (200 datasets containing 17M images, 89M question-answer turns, and 10B answer tokens, totaling 5TB of high-quality data)
Yuan-ManX/ai-audio-datasets
Cartoon Movement (Kenny Tosh)
Data No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
Data Common Pile v0.1
Data Meta Omnilingual ASR Corpus
Data - Faces CelebV-HQ: A Large-scale Video Facial Attributes Dataset ,
TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
Data - Humans HUMOTO: A 4D Dataset of Mocap Human Object Interactions
Data - Movies Movie-Drama scripts
Data - Occupations O*NET database (800 US occupations)
HuggingFace
FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl ,
FineWeb-Edu dataset comes in two versions: 1.3T tokens (FineWeb-Edu) and 5.4T tokens (FineWeb-Edu-score-2) ,
Images
Eigen-Banana-Qwen-Image-Edit: Lightning-Fast Instruction-Based Image Editing with Pico-Banana-400K ,
MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes
NTU NTU EEE - Digital Signal Processing Laboratory ,
Research Data
Nvidia
Granary - Multilingual Speech AI ,
nvidia/PhysicalAI-Autonomous-Vehicles-NuRec
UniqueData ,
UniqueData/facial-emotion-recognition-dataset
Cartoons
Cartoon Movement - Israeli-Palestinian-Conflict ,
Israel-War-Cycle ,
Paresh Nath, India ,
Marian Kamensky, Austria ,
Kenny Tosh, Nigeria ,
ThinkChina ,
Lighting
Keyword:
CivitAI - lighting ,
Apple pico-banana-400k
GitHub - LAOGOU-666/Comfyui-LG_Relight
GitHub - kijai/ComfyUI-Geowizard
GitHub - kijai/ComfyUI-Lotus
LBM: Latent Bridge Matching for Fast Image-to-Image Translation ,
gojasper/LBM ,
jasperai/LBM_relighting
Qwen-Image
Qwen-Image-Lightning ,
Text - Translation / OCR / Storyboarding
Keyword:
OmniDocBench ,
Hybrid OCR-LLM Framework for Enterprise-Scale Document Information Extraction Under Copy-heavy Task ,
OCR
DeepSeek-OCR-2 ,
dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model ,
FireRed-OCR ,
GLM-OCR ,
HunyuanOCR-1B ,
LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family ,
PaddleOCR ,
Doc/Text To LORA
Doc-to-LoRA and Text-to-LoRA
Translation
Tencent-Hunyuan/HY-MT ,
Google TranslateGemma ,
Cohere Tiny-aya
Coding Assistant / Vibe-code
IQuest-Coder-V1
Mistralai/Devstral-2-
qwen3-coder-next