TechScan - Artificial Intelligence Generated Content (AIGC) - April 2026

· 3D · Agent · Benchmark · Company · Datasets · DepthMap · FaceSwap · Image Captioning · Lighting · Models · Music · Platforms · Performance · Prompts · Read · Segmentation · Simulation / 3D-World · TalkingHead · Training · TechScan · Text / OCR · TTS · Upscale · Vibe-code · Video · Online

Companies

  1. AIMageLab aimagelab/repositories
  2. AixonLab AixonLab , HuggingFace - AixonLab , lumiere_alpha
  3. Alibaba Alibaba , Alibaba PAI (Platform for AI) , Alibaba TongYi Vision Intelligence Lab , Qwen , Qwen2-VL , Qwen2-VL-7B-Instruct , ACE++ , In-Context-LoRA , Animate-X , ACE Plus ++ , Wan: Open Large-Scale Video Generative Models , Models - Kijai/WanVideo_comfy , UniAnimate-DiT , SwapAnyHead (no code) , GitHub - Qwen-Image (Chinese text) , Edit-R1: Reinforce Image Editing with Diffusion Negative-Aware Finetuning and MLLM Implicit Feedback , Qwen-TTS - Dialects , OmniTalker , Wan-Animate: Unified Character Animation and Replacement with Holistic Replication , Spaces/Qwen3-ASR-Demo (Speech-To-Text) , Qwen3-LiveTranslate: Real-Time Multimodal Interpretation - See It, Hear It, Speak It! , UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback , Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length , Alibaba-Quark/LiveAvatar , Tongyi-MAI/Z-Image , Ming-Omni: A Unified Multimodal Model for Perception and Generation
  4. Alimama Github- Alimama Creative , HuggingFace - alimama-creative
  5. AllenAI AllenAI , Open Data , Molmo , OLMo , Molmo Demo , AllenAI Molmo 7B D , Applied Vision Lab, Institute for Intelligent Computing , allenai/tulu-3-sft-personas-instruction-following , Objaverse-XL - A Universe of 10M+ 3D Objects , RewardBench: Evaluating Reward Models , ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning , SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories , ZeroEval: Benchmarking LLMs for Reasoning , WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
  6. Alpha-VLLM Lumina-mGPT 2.0: Stand-alone Autoregressive Image Modeling , Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding , NewBie image Exp0.1: Efficient Image Generation Base Model Based on Next-DiT , TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
  7. ANT ANT , EchoMimic , EchoMimic - Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
  8. Anthropic How we built our multi-agent research system , Claude Code Game Studios
  9. Apple Apple DepthPro , ComfyUI-Depth-Pro , FastVLM: Efficient Vision Encoding for Vision Language Models , pico-banana-400k , STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows , Sharp Monocular View Synthesis in Less Than a Second , Qwen-Image-Edit-2511-Gaussian-Splash
  10. BAAI (Beijing Academy of Artificial Intelligence) Emu3.5: Native Multimodal Models are World Learners
  11. Bilibili IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech , IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System (based on Tortoise/XTTS) , IndexTTS Paper , IndexTTS Samples , IndexTTS demo , billwuhao/ComfyUI_IndexTTS
  12. Black Forest Labs (BFL) BlackForestLabs , FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space , black-forest-labs/FLUX.1-Krea-dev , Krea online tool , bullerwins/FLUX.1-Kontext-dev-GGUF , Flux2
  13. Boson.AI Higgs Audio , EmergentTTS-Eval ("Emotions" and "Questions") , spaces/smola/higgs_audio_v2
  14. BRIA BRIA Background Removal v2.0
  15. ByteDance ByteDance , ByteDance Research , 1.58-bit FLUX , GitHub - LipSync - LatentSync , X-Portrait2 , OmniHuman-1.5 , DreamActor-M1 , GitHub - goku - Flow Based Video Generative Foundation Models , bytedance - MegaTTS3 (voice cloning) , MegaTTS3 Demo , UNO (e2i) , RealCustom , InfiniteYou , DreamFit , BAGEL (ByteDance Adaptive Generative Language Model) , USO: Unified Style and Subject Driven Generation via Disentangled and Reward Learning , OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning , bytedance-research/OneReward , HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning , DreamOmni2: Multimodal Instruction-based Editing and Generation , dvlab-research/DreamOmni2 , BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration , DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer , Ouro-2.6B-Thinking , Alive: Animate Your World with Lifelike Audio-Video Generation
  16. CanopyLabs.ai Voice Cloning - Natural intonation, emotion, and rhythm that is superior to SOTA closed source models (giggle, gasp, angry, happy) , Orpheus-TTS , Orpheus-TTS demo , watermark_audio , ShmuelRonen/ComfyUI-Orpheus-TTS , Lynx: Towards High-Fidelity Personalized Video Generation
  17. Character.AI Ovi - Twin backbone cross-modal fusion for audio-video generation , character-ai/Ovi , snicolast/ComfyUI-Ovi
  18. Cohere r7b-arabic , HuggingFace - CohereForAI , HuggingFace - r7b-arabic , One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers , CohereLabs/tiny-aya translation , Cohere Transcribe
  19. ComfyUI Official Blog
  20. DeepSeek HuggingFace - DeepSeek , HuggingFace - Open R1 , GitHub - Open R1
  21. Facebook META META SAM3 (Segment Anything 3) , DiT , Facebook Research Scalable Diffusion Models with Transformers (DiT) , SeamlessM4T , SeamlessM4T Demo , MoCha - Towards Movie-Grade Talking Character Synthesis , AssetGen2 - Animated 3D game assets , Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models , Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages , SABER: Scaling Zero-Shot Reference-to-Video Generation , OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory , Meta Segment Anything Model Audio (SAM Audio) , HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming (no code)
  22. FAL fal
  23. FreePik HuggingFace - FreePik , Freepik/F-Lite Blog , fal-ai/f-lite , Freepik/F-Lite
  24. Genmo Genmo Mochi , ComfyUI-MochiWrapper
  25. Google Google-deepmind , Google DeepBrain - Generative Motion Latent FlOw MAtching for Audio-driven Talking Portrait (FLOAT) , GraphCast / GenCast , Google Genie2: Generative Interactive Environments , SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds , TranslateGemma (translation)
  26. HelloVision HelloVision , HelloVision/ComfyUI_HelloMeme
  27. Hong Kong University of Science & Technology (HKUST) Llasa-3B , HKUSTAudio/Llasa-3B , HKUSTAudio/AudioX , ZeyueT/AudioX , Replicate - kjjk10/llasa-3b-long , MultiTalk: Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation , MeiGen-AI/MultiTalk , InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing , UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections , LucidFlux: Caption-Free Universal Image Restoration with a Large-Scale Diffusion Transformer
  28. HuaWei - Computing Systems Lab (CSL) SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLMs , Reflection Removal through Efficient Adaptation of Diffusion Transformers , MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos
  29. HuggingFace HuggingFace , HuggingFace Spaces , Latent Consistency (LCM) , Vision Language Model - SmolVLM-500M-Instruct-WebGPU , SmolLM3-3B (web)
  30. Intel Image-GS: Content-Adaptive Image Representation via 2D Gaussians , Paper , 2-minute paper
  31. International Digital Economy Academy (IDEA-Research) International Digital Economy Academy (IDEA-Research) , IDEA-Research/Grounded-Segment-Anything , DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding , Dino-X blog , TREX-2 - object counting , dino-x - Detection - Segmentation - Keypoints - Generative Understanding , Video - Object Tracking , SmolLM (135M, 360M, 1.7B) , Rex-Omni: Detect Anything via Next Point Prediction , Rex-Omni demo
  32. Korea CGR Lab HanYang
  33. Kuaishou Kuaishou Visual Generation and Interaction Center , KlingTeam , Kwai-Kolors , Kwai-Kolors , kijai/ComfyUI-KwaiKolorsWrapper , LivePortrait , 3DTrajMaster , GRAG: Group-Relative Attention Guidance for Image Editing , SVG: Latent Diffusion Model without Variational Autoencoder , UniVideo: Unified Understanding, Generation, and Editing for Videos , 3DiMo: 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation (no code)
  34. Kyutai Kyutai-TTS , kyutai/tts-1.6b-en_fr
  35. Lightricks Lightricks , Lightricks/LTX-Video , Lightricks - LTX-Video , ComfyUI-LTXVideo , CivitAI - LTX IMAGE to VIDEO with STG, CAPTION & CLIP EXTEND workflow , GitHub - logtd/ComfyUI-LTXTricks , LTX IMAGE to VIDEO with STG, CAPTION & CLIP EXTEND workflow
  36. Liquid.AI LFM2: On-Device Models , Edge Models , LFM2.5 Models
  37. LG LG - CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models , felixtaubner/cap4d
  38. Marvis-AI Marvis-TTS-250m (Sesame CSM-1B, Kyutai mimi codec) , Marvis-Labs/marvis-tts
  39. Meituan Meituan Technology Team , MeiGen-AI/MultiTalk , InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing , X-SAM: From Segment Anything to Any Segmentation , VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications , LongCat-Video: A Unified Foundational Video Generation Model , LongCat-Video-Avatar , LongCat-Image-6B (40 GB)
  40. MiaoshouAI MiaoshouAI , HuggingFace - Florence-2-base-PromptGen-v2.0 / Florence-2-large-PromptGen-v2.0 , Florence-2-large-PromptGen-v1.5 , ComfyUI-Miaoshouai-Tagger
  41. Microsoft Microsoft (Florence2, Phi 3.5, Phi 4) , MoGe - Monocular 2D->3D , MoGe (Monocular 2D->3D) , kijai/ComfyUI-MoGe , MoGe demo , Magma multimodal agents , GitHub - Magma , [GIS-LLM] PEACE: Empowering Geologic Map Holistic Understanding with MLLMs , [GIS-LLM] GEO-Bench-VLM , 1-bit LLM , Phi-4-reasoning-vision
  42. MiniMaxi MiniMaxi , MiniMaxAI , HailuoAI , MiniMax-M2
  43. Mistral Mistral , Mistral small-3 , Mistral Large 3 (instruct) , Voxtral-Mini-Realtime
  44. MoonshotAI MoonshotAI , Kimi K2: Open Agentic Intelligence
  45. NetEase Fuxi Lab GitHub
  46. NTU Inverse Super Resolution (InvSR) , EdgeTAM: On-Device Track Anything Model , Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling (Math Magic) , ilcve21/Sparc3D (Math Magic) , Hitem3D (use Sparc3D) , Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention (Math Magic) , ObjectClear: Complete Object Removal via Object-Effect Attention , CineScale: High-Resolution Cinematic Visual Generation , FE2E: From Editor to Dense Geometry Estimator , DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training , Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation , HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives , PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image , Light-X: Generative 4D Video Rendering with Camera and Illumination Control , LongVie 2: Multimodal Controllable Ultra-Long Video World Model , StoryMem: Multi-shot Long Video Storytelling with Memory , Self-Refining Video Sampling
  47. NUS OminiControl: Minimal and Universal Control for Diffusion Transformer , EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer (Liblib AI) , OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data , PixNerd: Pixel Neural Field Diffusion (pixel-space diffusion transformer for image generation without VAE) , Chroma1-Radiance (based on PixNerd) , SpotEdit: Selective Region Editing in Diffusion Transformers , ShowUI-Pi: Flow-based Generative Models as GUI Dexterous Hands , Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance
  48. NuMind NuMind , NuMind NuExtract-1.5
  49. Nvidia Nvidia , Nvidia Labs (NVlabs) , GitHub - NVlabs/Sana , GitHub - ComfyUI_ExtraModels , DiffusionRenderer: Video Diffusion Models , 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes , 3D Gaussian Ray Tracing (3DGRT) , GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control (Monocular 2D->3D) , ViPE: Video Pose Engine for 3D Geometric Perception , Audio Flamingo: Series of Advanced Audio Understanding Language Models , NVIDIA Earth-2: Climate Weather Model
  50. OpenAI OpenAI , OpenAI Whisper (transcribe / translate) , Whisper Accent - Accent-Aware English Speech Recognition
  51. OpenBMB MiniCPM4-8B , MiniCPM-o , VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning , UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models
  52. MiniCPM-o 4.5: Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone , MiniCPM-o demo
  53. OpenXLab OpenXLab , Models
  54. Ostris Ostris , ostris/OpenFLUX.1
  55. Rednote XiaoHongShu rednote-hilab , rednote-hilab , Dots LLM , FireRedTTS-2: Towards Long Conversational Speech Generation for Podcast and Chatbot , FireRedTTS2 (multilingual) , FireRed-Image-Edit
  56. PixelWave mikeyandfriends - PixelWave , CivitAI - user - humblemikey (Art Style, PixelWave)
  57. RedNote - XiaoHongShu dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model , Xiaohongshu Instant ID Research , Regional-Prompting-FLUX , RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services
  58. Resemble-ai ResembleAI , ComfyUI-Chatterbox , Perth is a comprehensive Python library for audio watermarking and detection , ResembleAI/chatterbox-turbo
  59. SalesForce xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
  60. ServiceNow DrBench Enterprise Research Benchmark , Framework for Evaluating Voice Agents (EVA)
  61. Shakker-Labs Shakker-Labs , FLUX.1-dev-ControlNet-Union-Pro 2 , FLUX-LoRA-Gallery , AWPortrait-Z
  62. SkyworkAI SkyworkAI/SkyReels-V1 , Kijai/SkyReels-V1-Hunyuan_comfy , SkyReels-Audio , Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
  63. Snap EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
  64. StabilityAI StabilityAI
  65. StableDiffusion StableDiffusionAPI
  66. StepFun.AI Step1X-Edit: A Practical Framework for General Image Editing , Step-Audio-EditX , stepfun-ai/Step1X-Edit , ACE-Step: A Step Towards Music Generation Foundation Model , Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets , StepAudio2: Speech and Audio Understanding & Conversation , StepAudio Demo , Step-Audio-2-mini , Step3-VL-10B: Compact Yet Frontier Multimodal Intelligence , Step-3.5-Flash
  67. Tencent Tencent , Tencent-Hunyuan , PhotoMaker , MimicMotion , ComfyUI - HunYuan , HunYuan , tencent-ailab/persona-hub , PersonaHub , HunyuanCustom, a multi-modal, conditional, and controllable generation model centered on subject consistency , Tencent/HunyuanCustom (80 GB) , HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters , Hunyuan3D-2.1 (10 GB = Shape, 21 GB = Texture, 29 GB = Shape + Texture) , Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material , X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again (comparable to GPT-4o) , SRPO: Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference (realism) , Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs , FlashWorld: High-quality 3D Scene Generation within Seconds , HunyuanOCR-1B
  68. Tsinghua Tsinghua University Knowledge Engineering Group (KEG) & Data Mining , CogVideoX-5b , CogVideoX models , Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation , Unirig: Diverse Skeleton Rigging - One Model to Rig Them All
  69. VAST-AI-Research tripo3d.AI , HuggingFace , TripoSG - Image to 3D , Github - TripoSG , MV-Adapter [Image-to-Multi-View] , Github , Medium , DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow , VAST-AI/DetailGen3D , DetailGen3D demo , One Model to Rig Them All: Diverse Skeleton Rigging with UniRig
  70. ViVago ViVago , HiDream t2i , modelscope/HiDream-E1-Full , HiDream t2i , HiDream e1 , Wan t2v , text-to-music ace-step-v1 , Quick Thoughts on HiDream-I1 & E1
  71. VisionATrix VisionATrix , PhotoMaker-Plus
  72. Vision-xl vision-xl
  73. Sina Weibo VibeThinker: Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
  74. XiaoMi MiMo Audio: Audio Language Models are Few-Shot Learners , XiaomiMiMo/MiMo-Audio , MiMo-V2-Flash
  75. XLabs-AI XLabs-AI , flux-controlnet-collections , GitHub - XLabs-AI/x-flux-comfyui
  76. ZhiPu Z.AI GLM-4.5: Reasoning, Coding, and Agentic Abilities , Flappy Bird Pokemon webapp - Pokedex , zai-org , SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations , AutoGLM , RealVideo: A Real-Time Streaming Conversational System Powered by Autoregressive Diffusion Video Generation
  77. Probing LLM Social Intelligence via Werewolf , Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction
  78. Zyphra Zyphra , Zyphra playground , Zonos v01 Speech TTS

Benchmarks & Leaderboards

  • Keywords: InoReader - Benchmarking
    1. Intro to LLM Benchmarking
    2. AceVFI: A Comprehensive Survey of Advances in Video Frame Interpolation (VFI) , Awesome-Video-Frame-Interpolation
    3. Agent Leaderboard
    4. AgentBench: Evaluating LLMs as Agents
    5. AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
    6. AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios , Paper
    7. AllenAI RewardBench: Evaluating Reward Models , ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning , SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories , ZeroEval: Benchmarking LLMs for Reasoning , WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
    8. Artificial Analysis Text to Image Leaderboard , LLM-Performance-Leaderboard
    9. Arize Text - Arize Phoenix
    10. Audio Audio - Kimi-Audio-Evalkit
    11. AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking
    12. Awesome-GenAI-Watermarking
    13. Coding LLM Leaderboard for Code Quality & Security , Coding LLM Leaderboard (e.g. Kimi K2)
    14. ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
    15. DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
    16. FinanceBench: A New Benchmark for Financial Question Answering
    17. Guidebook - HowTo Smol Training Playbook: The Secrets to Building World-Class LLMs , LLM Evaluation guidebook , LLM Evaluation Guidebook
    18. GAIA General Agent Leaderboard (e.g. Manus) , GAIA: a benchmark for General AI Assistants
    19. ImagenWorld ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks , Introducing ImagenWorld: A Real World Benchmark for Image Generation and Editing , TIGER-Lab/ImagenWorld-Visualizer , TIGER-Lab/ImagenWorld
    20. Inferless Open-Source Text-to-Speech Model Gallery
    21. LifeArchitect.ai
    22. LMArena LMArena.ai/leaderboard
    23. MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents
    24. Meituan VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
    25. Multilingual Embedding Leaderboard
    26. OmniDocBench: OCR Document Parsing
    27. OpenLLM Open LLM Leaderboard
    28. OpenCompass OpenCompass LLM Leaderboard , OpenCompass multi-modal Leaderboard
    29. philschmid/LLM Pricing
    30. MMDocBench: benchmarking large vision-language models for fine-grained visual document understanding
    31. OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use
    32. Princeton HAL: Holistic Agent Leaderboard , Holistic Agent Leaderboard: The Missing Infrastructure For AI Agent Evaluation
    33. Speech - Hume - Expressive TTS Arena
    34. Speech - ServiceNow: AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs
    35. Speech - TTS-Arena
    36. Speech - UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models (OpenBMB) , Stanford Foundation Model Transparency Index , Stanford Holistic Evaluation of Language Models (HELM)
    37. Surveys Deep Research: A Systematic Survey , A Systematic Survey of Deep Research
    38. SycoFact 4B: Lightweight Sycophancy and Safety Evaluator
    39. Text-to-Image-Leaderboard
    40. Video-Generation-Arena-Leaderboard
    41. Text-To-Speech - TTS-AGI/TTS-Arena
    42. Text-To-Video - Vchitect/VBench_Leaderboard
    43. Video - MovieGenBench
    44. Text - klu Evaluation Guide
    45. Text - DeepEval
    46. Stable Diffusion Ecosystem
    47. MMSearch
    48. SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark
    49. Tencent-Hunyuan/ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
    50. Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
    51. Top3D.ai - Image-To-3D Online Benchmark
    52. Awesome Alternative UIs for ComfyUI
    53. VBench : Comprehensive Benchmark Suite for Video Generative Models
    54. WorldScore: A Unified Evaluation Benchmark for World Generation , WorldScore Leaderboard

    GenAI - Platforms

    1. Alibaba GenAI Platform
    2. argil.ai: AI influencers , fal.ai: argil/avatars
    3. Amazon SageMaker
    4. animemaker
    5. baseten
    6. ByteDance GenAI Platform
    7. DataCrunch.io
    8. Flora.ai
    9. Kaiber.ai
    10. LighTricks - LTX Studio
    11. mage.space
    12. MimicPC , Learn
    13. nexa.ai (on-device models & inference) , Supported Models
    14. OpenRouter (who use & how much)
    15. Pipecat is a framework for building voice (and multimodal) conversational agents
    16. Perplexity DeepSeek R1 1776
    17. RenderNet.ai
    18. Replicate supported models , comfyui-replicate
    19. RunWare.ai
    20. RunComfy
    21. RunningHub
    22. runpod.ai
    23. SegMind - PixelFlow
    24. Shakker.ai
    25. sinkin
    26. tensor.art
    27. ThinkDiffusion , thinkdiffusion , Floyo , tag/comfyui
    28. Together.AI
    29. Together.AI
    30. vast.ai
    31. MagicQuill , GitHub - MagicQuill , Demo , ComfyUI_MagicQuill
    32. PhotoDoodle , smthemex/ComfyUI_PhotoDoodle , HuggingFace - PhotoDoodle-Image-Edit-GPU
    33. Tencent GenAI Platform
    34. ComfyUI workflows CivitAI - user - UmeAiRT , CivitAI - user - yorgash (ComfyUI workflows)

    GenAI - Reading Material

    1. Anthropic Cookbook
    2. ArXiv - Prompt Canvas: A Literature-Based Practitioner Guide for Creating Effective Prompts in Large Language Models
    3. ArXiv - ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation (Tel Aviv University, NVIDIA)
    4. ArXiv - ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
    5. Google Skills
    6. Xtending Digital Narrative (Jhave's Ai links)
    7. ComfyUI Desktop User Guide
    8. Does the United Nations Need Agents? (Amina, Abdalla) , The UN Made AI-Generated Refugees (404media)
    9. SegMind Blog , Qwen-Image: Prompt & Parameter Guide
    10. Sonar WhitePapers , Coding Personalities
    11. Reinforcement Learning Reinforcement Learning (RL) Guide (unsloth.ai) , Reinforcement Learning from Human Feedback (Nathan Lambert)

    Tool - Performance

  • Keywords: Awesome VLM Architectures ,
    1. AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset
    2. DFloat11 DFloat11: Lossless LLM Compression for Efficient GPU Inference , DFloat11/FLUX.1-Kontext-dev-DF11
    3. FlashPortrait: 6× Faster Infinite Portrait Animation with Adaptive Latent Prediction
    4. Latent-Space ComfyUI Latent Color Tools
    5. Liger Kernel: Efficient Triton Kernels for LLM Training
    6. LightX2V Qwen Image + Wan Video , Qwen T2I , Wan I2V
    7. MagCache , Zehong-Ma/ComfyUI-MagCache
    8. ModelTC (Lightning, LightX2V) , LightCompress: Towards Accurate and Efficient AIGC Model Compression
    9. SimpleMem: Efficient Lifelong Memory for LLM Agents
    10. TeaCache
    11. WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference
    12. TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
    13. ZeroGPU ahead-of-time (AoT) on HuggingFace

    Tool / Training / Utility

    1. CivitAI LORA Trainer , CivitAI FLUX Trainer , ThinkDiffusion - building-better-models-flux-loras-in-comfyui
    2. 3D Tools AutoDesk - Wonder Dynamics , DeepMotion , Rokoko , Odyssey (3D scene generation)
    3. ComfyUI-FluxTrainer ,
    4. YouTube - Custom AI Digital Human with HeyGen's Lora Training , YouTube - How to Replace Yourself on Zoom Calls with an AI Clone from HeyGen
    5. DiffSynth-Studio Qwen-Image-i2L (Image to LoRA) , modelscope/DiffSynth-Studio
    6. Ostris.AI YouTube - ostrisai (LORA=Qwen-Edit, Wan2.1 i2v)
    7. Training FLUX LORA
    8. Training FLUX LORA
    9. Training FLUX LORA
    10. shootthesound/comfyUI-Realtime-Lora
    11. FlyMyAI flymyai-lora-trainer , FluxGym , Training FLUX LORA , Training FLUX LORA
    12. CivitAI - flux-guide-part-i-lora-training
    13. Training LORA
    14. Training lora-training-dataset-creation-comfyui-one-click-dataset
    15. ModelScope/data-juicer
    16. SpeechMatics How to Finetune Sesame AI's Speech Model on New Languages and Voices , knottwill/sesame-finetune
    17. ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development , AIDC-AI/ComfyUI-Copilot
    18. Crystools (CPU, GPU, RAM, VRAM, GPU Temp and space)
    19. ComfyUI-to-Python-Extension
    20. comfy-pack: Serving ComfyUI Workflows as APIs , bentoml/comfy-pack
    21. ComfyUI api-nodes
    22. Detectors - NSFW Falconsai/nsfw_image_detection , ComfyUI-NSFW-Detection
    23. GitHub - JDCN - Directory Path
    24. GitHub - liusida/ComfyUI-AutoCropFaces
    25. Captioning
    26. HTML img-comparison-slider , GitHub , demo.photo.gallery
    27. LayerStyle
    28. comfyui-propost
    29. NovaSky NovaSky: UC Berkeley's Sky Computing Lab
    30. chflame163/ComfyUI_LayerStyle_Advance (ZhiPu / SegmentAnything)
    31. OmniSVG: A Unified Scalable Vector Graphics Generation Model
    32. POLARIS: A POst-training recipe for scaling reinforcement Learning on Advanced ReasonIng modelS
    33. QuasiBlob - image processing , ComfyUI-EsesImageCompare
    34. Qwen-Image & Qwen-Image-Edit LoRA Training
    35. WaterMark Image Detection Bypass Utility , ComfyUI-ShaderNoiseKSampler (blends standard noise generation with a multi-stage shader-based system) , LLM Attacks , Removing refusals with HF Transformers , Harmless instructions , Refusal in LLMs is mediated by a single direction , Bob's Confetti : Phonetic Memorization Attacks in Music and Video Generation , SynthID-Bypass , AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models , StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model , Heretic: Fully automatic censorship removal for language models , Tencent ai-detect , Hive ai-generated-content-detection

    Tool - Prompt Engineering

    1. adieyal/comfyui-dynamicprompts
    2. AIrjen/OneButtonPrompt
    3. HunyuanVideo 1.5 Prompt Handbook
    4. MushroomFleet/LLM-Base-Prompts (mixed)
    5. AI Video Creation Guide
    6. PixelPruner theallyprompts , civitai
    7. Prompt Lists fofr , ai-prompts/prompt-lists , marduk191/ComfyUI-Fluxpromptenhancer , PromptHero - portraits-prompts , PromptMania

    Tool - General / TechScan / Research / DeepResearch

    1. Awesome Deep Research , DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents , A Systematic Survey of Deep Research , HuggingFace - Daily Papers , blog.comfy.org , awesome-comfyui
    2. AI News
    3. AI-Researcher: Autonomous Scientific Innovation
    4. AI4Research: A Survey of Artificial Intelligence for Scientific Research , Paper
    5. Alibaba WebAgent for Information Seeking (WebShaper, WebSailor, WebDancer, WebWalker) , Alibaba-NLP/DeepResearch
    6. DeepTutor: AI-Powered Personalized Learning Assistant
    7. Fireplexity: Open Source Perplexity AI Clone
    8. Hermit: offline ai chatbot for zim files , Kiwix Library
    9. MiroThinker , MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
    10. Note-taking / Whiteboards Heptabase , MilaNote , Google Mixboard , Scrintal
    11. Nvidia Universal Deep Research: Bring Your Own Model and Strategy
    12. Research Assistant ByteDance PaSa: An LLM Agent for Comprehensive Academic Paper Search , Stanford STORM , Github - Deep Research , NanoSage - Advanced Recursive Search & Report Generation , Perplexity - Deep Research , SurfSense , ByteDance - DeerFlow (Deep Exploration and Efficient Research Flow) , OpenNotebook , PageLM , HyperBookLM , Momo-research: Context Engineering and Persistent Memory for AI agents , OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis , OpenResearcher demo
    13. dify-deepseek-deploy-a-private-ai-assistant
    14. AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search , SakanaAI/AI-Scientist-v2
    15. WebThinker: Empowering Large Reasoning Models with Deep Research Capability (paper) , WebThinker , RUC-NLPIR/WebThinker

    Object Background Remover / Segmentation / InPaint / OutPaint

  • Apple DepthPro , ComfyUI-Depth-Pro
  • Articulate-Anything - Automatic Modeling of Articulated Objects
  • Depth Anything 3: Recovering the Visual Space from Any Views , PozzettiAndrea/ComfyUI-DepthAnythingV3
  • GitHub - Inspyrenet-Rembg
  • GitHub - RMBG (BEN2, mask feather, dino object segmentation) , PramaLLC/BEN2_ComfyUI
  • BRIA Background Removal v2.0
  • Image Pyramid Structure for High Resolution Salient Object Detection (InSPyReNet)
  • Bilateral Reference for High-Resolution Dichotomous Image Segmentation (BiRefNet) , ComfyUI-BiRefNet , MoonHugo/ComfyUI-BiRefNet-Hugo
  • Background Erase Network (BEN) , CivitAI ComfyUI
  • FE2E: From Editor to Dense Geometry Estimator
  • lama-remover , batch-process-images-with-lama-cleaner
  • Diffusers Image Fill Guide
  • Lanpaint: Training-Free Diffusion Inpainting with Exact and Fast Conditional Inference
  • MegaSAM - Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos
  • magic-research/Sa2VA , Sa2VA-simple-demo
  • MatAnyone (NTU, SenseTime) , HuggingFace demo
  • Rex-Omni: Detect Anything via Next Point Prediction , Rex-Omni demo
  • ROSE: Remove Objects with Side Effects in Videos
  • RF-DETR: SOTA Real-Time Object Detection Model , RF-DETR demo
  • SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction , OpenIXCLab/SeC-4B , 9nate-drake/Comfyui-SecNodes
  • Vargo Teleport - 3D from iPhone Video
  • VOID: Video Object and Interaction Deletion
  • Meituan X-SAM: From Segment Anything to Any Segmentation

    Speech - Music

  • Keywords: github - Awesome Audio
    1. ACE-Step 1.5 , filliptm/ComfyUI-FL-AceStep-Training (training)
    2. ComfyUI-DeepExtract - separate vocals and sounds from audio files
    3. DiRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
    4. FoleyCrafter FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds , HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation ,
    5. Foundation-1: Structured text-to-sample generation for modern music production , multimodalart/Foundation-1
    6. HeartMuLa: Music Foundation Models
    7. MMAudio hkchengrex/MMAudio , kijai/ComfyUI-MMAudio ,
    8. ACE-Step: A Step Towards Music Generation Foundation Model , ace-step/ACE-Step
    9. Platform - ElevenLabs voice-library/angry-voices
    10. Platform - Hume.AI LLM for text-to-speech ,
    11. Platform - Play.HT Play.HT - singaporean-english , play.ht sandbox
    12. Platform - Resemble resemble.ai (Fake Audio Detection)
    13. PrismAudio: Decomposed Chain-of-Thoughts and Multi-Dimensional Rewards for Video-to-Audio Generation
    14. RF-DETR: SOTA Real-Time Detection and Segmentation Model
    15. Riffusion (platform)
    16. SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement
    17. SongGeneration - LeVo: High-Quality Song Generation with Multi-Preference Alignment
    18. ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing , ThinkSound
    19. Music YuE: Open Music Foundation Models for Full-Song Generation

    Speech - Text-2-Speech (TTS)

  • Keywords: HuggingFace - Speech , github - awesome-ai-voice , github - Awesome Audio
    1. Audio Enhancers NovaSR (for clearing up low-quality audio) , YatharthS/NovaSR (for super-resolution) , LavaSR (for super-resolution) , Resemble Enhance (for denoising and bandwidth extension) , OpenVINO Audacity plugin (for super-resolution) , MelbandRoFormer (for music source separation - vocals & instruments) , filliptm/ComfyUI_Fill-Nodes , audio-separation-nodes-comfyui ,
    2. CanopyLabs Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation , canopylabs/orpheus TTS (emotion, training mesolitica) , ShmuelRonen/ComfyUI-Orpheus-TTS
    3. CosyVoice CosyVoice 3 (with samples) , CosyVoice 2 , muxueChen/ComfyUI_NTCosyVoice , touge/ComfyUI-NCE_CosyVoice
    4. DiodioGod TTS Audio Suite
    5. Data AI Audio Datasets (AI-ADS) , CN-Celeb1, CN-Celeb2 ,
    6. DataoceanAI Dolphin (40 Eastern languages across East Asia, South Asia, Southeast Asia and the Middle East, plus 22 Chinese dialects)
    7. TTS - FishAudio Fish Audio , FishAudio (No ComfyUI) , huggingface.co/fishaudio
    8. Qwen2-Audio-7B-Instruct-Int4
    9. TTS - DeepGram TTS playground , text-to-speech-prompting
    10. TTS - F5-TTS HuggingFace - SWivid/F5-TTS , niknah/ComfyUI-F5-TTS , erax-ai (Vietnamese) ,
    11. TTS - FreeVC Github - FreeVC - One-Shot Voice Conversion , ShmuelRonen/ComfyUI-FreeVC_wrapper
    12. TTS - Hume.AI TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment
    13. TTS - IMS-Toucan IMS-Toucan: Controllable Text-to-Speech for over 7000 Languages , MassivelyMultilingualTTS
    14. TTS - KaniTTS KaniTTS: Fast and Expressive Speech Generation Model , wildminder/ComfyUI-KaniTTS ,
    15. TTS - Kokoro Voice Mixer Studio , MushroomFleet/DJZ-KokoroTTS
    16. TTS - Llasa-3B Llasa-3B , HKUSTAudio/Llasa-3B , Replicate - kjjk10/llasa-3b-long
    17. TTS - Marvis-AI Marvis-TTS-250m (Sesame CSM-1B, Kyutai mimi codec) , Marvis-Labs/marvis-tts
    18. TTS - Microsoft VibeVoice microsoft/VibeVoice-1.5B , Enemyx-net/VibeVoice-ComfyUI , Demo , Fine-Tuning , VibeVoice-Realtime is a lightweight real-time text-to-speech model supporting streaming text input and robust long-form speech generation , VibeVoice-ASR (60min)
    19. TTS - Nari-Labs Dia (two-speaker dialogue)
    20. TTS - Sesame Sesame - Crossing the uncanny valley of conversational voice , Sesame CSM 1B for Multi-Speaker AI Conversations , SesameAILabs/csm , Sesame - CSM (Conversational Speech Model) , billwuhao/ComfyUI_CSM , SpeechMatics - How to Finetune Sesame AI's Speech Model on New Languages and Voices , SpeechMatics - knottwill/sesame-finetune
    21. TTS - SparkTTS An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens , billwuhao/ComfyUI_SparkTTS , SparkTTS Demo , 1038lab/ComfyUI-SparkTTS , Spark-TTS-finetune , SparkAudio/Spark-TTS (NTU)
    22. TTS - Seed-VC Zero Shot Voice Conversion (Singing VoiceOver) , Plachtaa/seed-vc
    23. TTS - SoulX-Podcast SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity
    24. TTS - StepFun.AI StepAudio2: Speech and Audio Understanding & Conversation , StepAudio Demo , billwuhao/ComfyUI_StepAudioTTS
    25. Tool TTS-WebUI
    26. TTS - ZipVoice ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
    27. TTS - Zonos Zyphra , Zyphra playground , Zonos v01 Speech TTS ,

    Talking Head

  • Keywords: harlanhong/awesome-talking-head-generation , JosephPai/Awesome-Talking-Face , Kedreamix/Awesome-Talking-Head-Synthesis , awesome-ai-talking-heads , awesome-digital-human , IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) , European Conference on Computer Vision (ECCV) , International Conference on Computer Vision (ICCV) , International Conference on Learning Representations (ICLR) , Github - lip-sync
    1. Alibaba Fantasy Talking , kijai/ComfyUI-WanVideoWrapper , Alibaba - OmniAvatar , OmniAvatar , OmniTalker , YouTube , FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers , EchoShot: Multi-Shot Portrait Video Generation , JoHnneyWang/EchoShot , Paper
    2. ACTalker , Paper
    3. AnimPortrait3D - text to 3D animation
    4. animate-anyone-2 - High-Fidelity Character Image Animation with Environment Affordance , AnimateAnyone
    5. Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars , tobias-kirschstein.github.io/avat3r
    6. BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation
    7. ByteDance InfiniteYou - Flexible Photo Recrafting While Preserving Your Identity , Paper , bytedance/InfiniteYou , InfiniteYou-FLUX demo , ZenAI-Vietnam/ComfyUI_InfiniteYou , LatentSync , FlowAct-R1: Towards Interactive Humanoid Video Generation (no-code)
    8. Character.AI avatar-fx
    9. CharaConsist: Fine-Grained Consistent Character Generation
    10. GitHub - DeepFuze , YouTube , Facial transformations, lipsyncing, video generation, voice cloning, face swapping, and lipsync translation
    11. DICE-Talk - Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation , smthemex/ComfyUI_DICE_Talk
    12. Voice Dubbing FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes , JustDubIt Just-Dub-It: Video Dubbing via Joint Audio-Visual Diffusion , Inference pipeline & Training guide
    13. EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer , jax-explorer/ComfyUI-easycontrol , Paper
    14. EchoMimic
    15. FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait , yuvraj108c/ComfyUI-FLOAT , deepbrainai-research.github.io/float
    16. FramePack - Packing Input Frame Context in Next-Frame Prediction Models for Video Generation , lllyasviel/FramePack ,
    17. HelloMeme
    18. HumanMLLM HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
    19. KwaiVGI ReCamMaster: Camera-Controlled Generative Rendering from A Single Video , jianhongbai.github.io/ReCamMaster , Paper , kijai/ComfyUI-WanVideoWrapper ,
    20. LAM: Large Avatar Model for One-shot Animatable Gaussian Head
    21. Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length , Alibaba-Quark/LiveAvatar
    22. LivePortrait GitHub - ComfyUI-LivePortraitKJ , GitHub - ComfyUI-AdvancedLivePortrait , YouTube
    23. LHM: Large Animatable Human Reconstruction Model for Single Image to 3D in Seconds , spaces/DyrusQZ/LHM demo , LHM ComfyUI , LAM: Large Avatar Model for One-shot Animatable Gaussian Head , PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image ,
    24. MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
    25. OmniGen2: Exploration to Advanced Multimodal Generation , VectorSpaceLab/OmniGen2 , ComfyUI-OmniGen2 ,
    26. MemoAvatar - MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation , ComfyUI-IF_MemoAvatar ,
    27. MoCha: End-to-End Video Character Replacement without Structural Guidance ,
    28. PersonaLive PersonaLive: Expressive Portrait Image Animation for Live Streaming , GVCLab/PersonaLive , PersonaLive ,
    29. PixelSmile: Toward Fine-Grained Facial Expression Editing
    30. Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
    31. Sketch2Anim: Towards Transferring Sketch Storyboards into 3D Animation
    32. SkyReels-Audio
    33. Sonic: Shifting Focus to Global Audio Perception in Portrait Animation , xiaozhongji/Sonic , Demo , smthemex/ComfyUI_Sonic
    34. StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation , Francis-Rings/StableAvatar
    35. TaoAvatar pixelai-team.github.io/TaoAvatar , TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting , Medium
    36. Tencent GitHub - ComfyUI-MimicMotionWrapper , MimicMotion , MusePose , InteractAvatar: Making Avatars Interact Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars ,
    37. UniAnimate / Animate-X Animate-X , UniAnimate , ali-vilab/UniAnimate , Isi-dev/ComfyUI-UniAnimate-W (UniAnimate for humans, Animate-X for animals/cartoons) , UniAnimate/Animate-X models ,
    38. Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation (Camera + Video Motion) , alibaba-damo-academy/Uni3C
    39. UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
    40. WildActor: Unconstrained Identity-Preserving Video Generation
    41. X-Portrait GitHub - akatz-ai/ComfyUI-X-Portrait

    Face Restoration & Realism

  • Keywords: 1ai
    1. AdaFace: Quality Adaptive Margin for Face Recognition
    2. CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation
    3. Compare PuLID vs InstantID vs FaceID
    4. GitHub - Gourieff (ReActor Node for ComfyUI) , somanchiu/ReSwapper , faceswap
    5. HuggingFace - GuijiAI/ReHiFace-S
    6. GitHub - sipie800/ComfyUI-PuLID-Flux-Enhanced
    7. GitHub - cubiq/ComfyUI_FaceAnalysis , GitHub - jordoh/ComfyUI-Deepface/
    8. GitHub - Person Mask Generator
    9. alexgenovese/facerestore
    10. modelscope/facechain
    11. Face Restoration Pearl Rope , Roop , DeepFaceLive , SimSwap , deepfakes/faceswap
    12. FaceFusion , Deepfacelive-DFM-Models
    13. Gourieff - comfyui-reactor-node
    14. DreamID - A Fast and High-Fidelity diffusion-based Face Swapping via Triplet ID Group Learning

    Image-To-Text (i2t) Captioning

  • AllenAI Molmo 7B D
  • Joy Caption
  • Microsoft , Microsoft Florence2 , Florence-2 , MiaoshouAI , ComfyUI-Miaoshouai-Tagger ,
  • Microsoft Phi - alexisrolland/ComfyUI-Phi (Phi-3.5-mini-instruct, Phi-3.5-vision-instruct) , Phi 3.5 ,
  • MiniCPM-Plus , MiniCPM v2.6 Prompt Generator
  • Moondream (Visual Q&A, Caption, Object Detection) , Moondream blog , vikhyatk/moondream2 , vikhyat/moondream , kijai/ComfyUI-moondream , Hangover3832/ComfyUI-Hangover-Moondream
  • OmniVLM-968M (no ComfyUI)
  • Pixtral Llama Molmo Vision
  • PromptCraft
  • RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards (qwen-edit-2509 LORA)
  • QwenVL for ComfyUI (image & video) , Qwen2-VL-Instruct ,
  • Searge-LLM
  • WD14-Tagger
  • gokaygokay/Flux Prompt Generator , Flux-Florence-2 , fairy-root ,
  • IuvenisSapiens (miniCPM, QWEN, QWEN Audio)
  • Zhipu GLM , GitHub - JcandZero/ComfyUI_GLM4Node , GitHub - Nojahhh/ComfyUI_GLM4_Wrapper ,

    Models

  • Keywords: HuggingFace - text-generation , InoReader - Algorithm ,
    1. AIModels.fyi
    2. Comfy-Org
    3. HuggingFace
    4. ModelScope
    5. AlexGenovese checkpoint , clip , clip_vision , controlnet , facerestore , ipadapters , loras , sams , vae , ultralytics
    6. city96 GGUF Qwen-Image , LTX , HunyuanVideo-I2V
    7. DiffBot diffbot-llm-inference , diffy.chat demo
    8. Edge Models Falcon Falcon-H1 , HuggingFace Vision Language Model - SmolVLM-500M-Instruct-WebGPU , SmolLM3-3B (web) , Liquid.AI LFM2: On-Device Models , Edge Models , LFM2.5 Models , MiroMind MiroThinker
    9. HuiHui-ai abliterated models , Huihui-Qwen3-VL-8B-Instruct-abliterated , coder3101/Qwen3-VL-8B-Thinking-heretic ,
    10. In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer , spaces/RiverZ/ICEdit
    11. IamCreateAI/Ruyi ,
    12. ByteDance - 1.58-bit FLUX
    13. Hugging Face for Legal , HFforLegal/datasets ,
    14. IPAdapter (FaceID, clip-vision, LORA)
    15. Kijai Skyreels , LTXV , HunyuanVideo ,
    16. MonsterMMORPG Wan - GGUF , Upscale , FaceSegments , Yolo
    17. Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) , UAE Institute of Foundation Models (IFM) , Sherkala (Kazakh, Russian, and English) , K2-Think ,
    18. Ostris qwen_edit_inpainting
    19. PowerInfer , SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment , Paper
    20. QuantStack GGUF Wan2.2-I2V-A14B , Qwen-Image-Distill , FLUX.1-Kontext-dev , LTXV-13B-0.9.8-distilled , Wan2.1_I2V_14B_FusionX
    21. Reaslim TensorArt - Extra-Realistic-Flux , TensorArt - kg_09
    22. StrangerZone StrangerZone LORA (Flux-Super-Realism-LoRA, Super 3D - Engine)
    23. Swiss-AI - Apertus , Swiss-AI - Projects
    24. SVDQuant , mit-han-lab/ComfyUI-nunchaku
    25. TheBloke (>4K)
    26. TildeOpen LLM: Europe's Sovereign Multilingual AI , TildeAI/TildeOpen-30b
    27. Unsloth.ai Unsloth.ai , UnSloth (>300) , GitHub - UnSloth AI , unsloth/deepseek-v3 , phi-4-all-versions , Fine-tune & Run Qwen3 , Fine-tuning TTS models (Sesame's CSM, Orpheus)

    Upscale SUPIR

  • Keywords: Awesome-video-super-resolution-diffusion , Awesome Diffusion Models for Video Super-Resolution , OpenModelDB , HuggingFace - Phips , realistic skin
    1. 4kagent (satellite)
    2. ac-pill/upscale_models (e.g. RealESRGAN_x4plus_anime_6B.pth)
    3. Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment (No ComfyUI) ,
    4. CineScale: High-Resolution Cinematic Visual Generation , ali-vilab/FreeScale
    5. InvSR - Arbitrary-steps Image Super-resolution via Diffusion Inversion (No ComfyUI) , OAOA/InvSR demo
    6. camenduru/SUPIR ,
    7. Dynamic Position Extrapolation (DyPE) - supports FLUX, Qwen Image, and Z-Image
    8. FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
    9. HuggingFace - upscaler
    10. HYPIR , XPixelGroup
    11. GitHub - shiimizu/ComfyUI-TiledDiffusion
    12. GitHub - ssitu/ComfyUI_UltimateSDUpscale
    13. OPPO Research Institute One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution (DLoRAL)
    14. Thera: Aliasing-Free Arbitrary-Scale Super-Resolution with Neural Heat Fields (No ComfyUI)
    15. SeedVR seedvr2 , ByteDance-Seed/SeedVR , ComfyUI-SeedVR2_VideoUpscaler , SeedVR2_comfyUI (6 GB, 13 GB) , SeedVR2-7B (33 GB) , SeedVR2-3B (14 GB)
    16. Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

    Video

  • Keywords: Github - Awesome Video Diffusion , Github - Awesome-LLMs-for-Video-Understanding
    1. One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer (supports long video and misaligned characters) , ssj9596/One-to-All-Animation
    2. Subject-to-Video (s2v) SkyworkAI/SkyReels-V1 , Kijai/SkyReels-V1-Hunyuan_comfy ,
    3. FramePack - generate 1-minute video (60 seconds) ,
    4. Genmo Mochi , ComfyUI-MochiWrapper , GitHub - logtd/ComfyUI-MochiEdit ,
    5. GitHub - logtd/ComfyUI-LTXTricks ,
    6. Motion-I2V (No ComfyUI)
    7. Google Genie-2 (No ComfyUI)
    8. Tsinghua University Knowledge Engineering Group (KEG) & Data Mining CogVideoX-5b , CogVideoX models
    9. MAGREF - Masked Guidance for Any-Reference Video Generation , MAGREF-Video/MAGREF ,
    10. Phantom (Subject2Video) Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment , kijai/ComfyUI-WanVideoWrapper , YouTube - Phantom workflow
    11. Remade-AI HuggingFace - Remade-AI (video LORA) , remade-effects , Selfie-With-Younger-Self , 360 Degree Rotation , Zoom-Call , workflow - Selfie-With-Younger-Self
    12. SkyReels (e2v) Skyreels V1: Human-Centric Video Foundation Model , SkyReels V2: Infinite-Length Film Generative Model ,
    13. Stable Video Infinity: Infinite-Length Video Generation with Error Recycling (SVI-Pro)
    14. Tencent Hunyuan Tencent Hunyuan , ComfyUI-HunyuanVideoWrapper , HunyuanVideo_comfy models , HY-Motion 1.0: Scaling Flow Matching Models for 3D Motion Generation (text-to-3D motion, comparable to DeepMotion SayMotion) ,
    15. Video Frame Interpolation kijai/ComfyUI-GIMM-VFI , Fannovel16/ComfyUI-Frame-Interpolation
    16. FlowEdit , FlowEdit Image Editing (One-Click Text Modification) , ComfyUI-Fluxtapoz , FlowEdit Video Editing (No Masks, No Noise) , logtd/ComfyUI-LTXTricks , logtd/ComfyUI-HunyuanLoom ,
    17. GitHub - Fannovel16/ComfyUI-MotionDiff
    18. WAN Wan: Open Large-Scale Video Generative Models , ATI: Any Trajectory Instruction for Controllable Video Generation , Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion , CausVid - From Slow Bidirectional to Fast Autoregressive Video Diffusion Models , VACE: All-in-One Video Creation and Editing ,

    3D OpenPose / PoseNet / DepthMap

  • Keyword: VAST-AI-Research/repositories , CivitAI - poses , CivitAI - openpose ,
    1. facebook/ActionMesh: Video to Animated 3D Mesh
    2. Data PoseManiacs , Bandai-Namco , Pose-Depot , CivitAI (>5Gb) , PoseMyArt , AppAnything 1 , AppAnything 2 , AppAnything 3 , HumanDataset 1 , HumanDataset 2 , 3DScanStore , RenderPeople , CMU Graphics Lab Motion Capture Database , Microsoft-Rocketbox ,
    3. InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
    4. MarketPlace DeviantArt , Proko , MocapCentral
    5. Tool - Poses OpenPoseAI (detect pose from image)
    6. AlphaPose (out-dated)
    7. comfyui_controlnet_aux (MiDaS, Zoe Depth) , ComfyUI-Marigold
    8. DeepVerse - 4D Autoregressive Video Generation as a World Model
    9. FaceLift: Single Image to 3D Head
    10. Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders , HuggingFace - fffiloni/Gaze-LLE
    11. Generative Refocusing: Flexible Defocus Control from a Single Image
    12. Insta360 - Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation
    13. PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers
    14. TMElyralab/Comfyui-MusePose
    15. Tencent - akatz-ai/ComfyUI-DepthCrafter-Nodes
    16. Pose Estimation 4DHumans , shubham-goel/4D-Humans , open-mmlab/mmpose , TMElyralab/Comfyui-MusePose , logtd/ComfyUI-4DHumans
    17. GeoWizard GeoWizard 2D->3D , GitHub - fuxiao0719/GeoWizard , kijai/ComfyUI-Geowizard
    18. Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
    19. ComfyUI_Sapiens - (seg,normal,pose,depth,mask maps) , sapiens-pose-1b-torchscript
    20. MoBluRF: Motion Deblurring Neural Radiance Fields for Blurry Monocular Video
    21. StableAnimator: High-Quality Identity-Preserving Human Image Animation
    22. Text-To-Motion FrankenMotion: Part-level Human Motion Generation and Composition , Tencent HY-Motion 1.0: Scaling Flow Matching Models for 3D Motion Generation , Nvidia Kimodo: Scaling Controllable Human Motion Generation
    23. UniRig: Diverse Skeleton Rigging - One Model to Rig Them All
    24. UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass

    3D - 2D to 3D Monocular / NERF / Gaussian Splatting / Multi-view

  • Keyword: Github - awesome-gaussians , 3D Gaussian Splatting Papers , Awesome 3D Diffusion , Github - awesome-3D-gaussian-splatting
    1. AllenAI Objaverse-XL - A Universe of 10M+ 3D Objects
    2. BlockGaussian
    3. 4D Gaussian Splatting (temporal) FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction , Gsplat-based 4D Gaussian Splatting for Dynamic Scenes
    4. ByteDance Seed3D
    5. CityGaussian CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes , GitHub - citygs , Paper
    6. CraftsMan3D: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner
    7. DreamTechAI Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
    8. Elevate3D: Elevating 3D Models: High-Quality Texture and Geometry Refinement from a Low-Quality Model
    9. EO-NeRF - Multi-Date Earth Observation NeRF - The Detail Is in the Shadows , EOGS - Gaussian Splatting for Efficient Satellite Image Photogrammetry , EOGS Paper
    10. Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views
    11. Geo4D Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction , jzr99/Geo4D
    12. Google Google Genie2: Generative Interactive Environments , Map2Video: Street View Imagery Driven AI Video Generation (paper)
    13. Group3D: MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection
    14. GUAVA: Generalizable Upper Body 3D Gaussian Avatar
    15. Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
    16. HoloPart: Generative 3D Part Amodal Segmentation , HoloPart demo , SAMPart3D: Segment Any Part in 3D Objects
    17. Hunyuan3D-2: High Resolution Textured 3D Assets Generation , tencent/Hunyuan3D-2 , Hunyuan3D-2 demo ,
    18. jtydhr88/ComfyUI-InstantMesh
    19. Humans and Structure from Motion (HSfM) - Reconstructing People, Places, and Cameras
    20. HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields
    21. Image-To-3D (i3D), Video-To-3D (v3d) Know3D: Prompting 3D Generation with Knowledge from Vision-Language Models , Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models , Vega3D: Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding , World Reconstruction From Inconsistent Views ,
    22. ImmerseGen - Agent-Guided Immersive World Generation with Alpha-Textured Proxies
    23. Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model ,
    24. Make-It-Animatable , Demo
    25. Meta - Multi-SpatialMLLM Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models , Navigation World Models , MapAnything: Universal Feed-Forward Metric 3D Reconstruction , facebook/map-anything , SAM 3D Body: Robust Full-Body Human Mesh Recovery , SAM3D - Human demo , SAM3D - Object demo , SAM 3D Objects , SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos (tsinghua) , WorldGen: Generate Any 3D Scene in Seconds , AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials , TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models , ShapeR: Metric Generative Shape Reconstruction
    26. Microsoft MoGe - Monocular 2D->3D , kijai/ComfyUI-MoGe
    27. MoCA: Mixture-of-Components Attention for Scalable Compositional 3D Generation
    28. MV-Adapter: Multi-view Consistent Image Generation Made Easy , ComfyUI-MVAdapter , MVAdapter-demo , Paper
    29. NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
    30. Nerfies: Deformable Neural Radiance Fields
    31. Nvidia Cosmos World Foundation Models , Vulkan Gaussian Splatting ,
    32. OccluGaussian OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering , Paper
    33. OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
    34. PartGen - Part-level 3D Generation and Reconstruction
    35. PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms
    36. RL3DEdit: Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing
    37. SpatialLM: Large Language Model for Spatial Understanding (No ComfyUI) , manycore-research/SpatialLM , manycore research
    38. ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding
    39. SkySplat: Generalizable 3D Gaussian Splatting from Multi-Temporal Sparse Satellite Images (no code) ,
    40. SkySplat: 3DGS Blender Toolkit
    41. Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
    42. Stable-X Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging (no textures) , Stable-X/Hi3DGen demo , Stable-X/ComfyUI-Hi3DGen ,
    43. Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling , ilcve21/Sparc3D
    44. SkinTokens: A Learned Compact Representation for Unified Autoregressive Rigging (no code)
    45. Tencent HunyuanWorld 1.0 , Tencent-Hunyuan/HunyuanWorld-1.0 , tencent/HunyuanWorld-1 , GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition , HunyuanWorld-Voyager , HunyuanWorld-Voyager: depth and RGB video for efficient and direct 3D reconstruction , Hunyuan World Reconstruction , FlashWorld: High-quality 3D Scene Generation within Seconds
    46. The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text
    47. Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
    48. Trellis3D Trellis2 , microsoft/TRELLIS.2-4B , Trellis3d - Structured 3D Latents - for Scalable and Versatile 3D Generation , Trellis demo , if-ai/ComfyUI-IF_Trellis ,
    49. tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction
    50. UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
    51. UniK3D: Universal Camera Monocular 3D Estimation , UniK3D-demo
    52. UrbanSim: Towards Autonomous Micromobility through Scalable Urban Simulation
    53. VastGaussian VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction , Paper
    54. Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
    55. Multi-view 3D reconstruction WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool , FlashWorld: High-quality 3D Scene Generation within Seconds , CAT3D: Create Anything in 3D with Multi-View Diffusion Models , Wonderland: Navigating 3D Scenes from a Single Image , MVImgNet: A Large-scale Dataset of Multi-view Images ,
    56. WorldLabs.AI (Li FeiFei)
    57. WorldGrow: Generating Infinite 3D World
    58. Yan: Foundational Interactive Video Generation
    59. YUME 1.5: A Text-Controlled Interactive World Generation Model
    60. 3DTown: Constructing a 3D Town from a Single Image
    61. HunYuan 3D hunyuan-3d , Tencent/Hunyuan3D-2 , MrForExample/ComfyUI-3D-Pack , niknah/ComfyUI-Hunyuan-3D-2
    62. VAST-AI-Research Github - TripoSG
    63. VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

    Agents

  • Keyword: Awesome Adaptation of Agentic AI , Github - LLM-Agents-Papers , Google Scholar , GitHub - restyler/awesome-n8n , GitHub - enescingoz/awesome-n8n-templates ,
    1. Argilla FinePersonas-v0.1 , FinePersonas-Synthetic-Email-Conversations , synthetic-data-generator-argilla-reviewer ,
    2. AgentSociety: LLM Agents in City , tsinghua-fib-lab/AgentSociety
    3. AgentGym: Evolving Large Language Model-based Agents across Diverse Environments , AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
    4. Chorus Engine: Personal AI Orchestration System
    5. One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt , 1Prompt1Story , byliutao/1Prompt1Story
    6. GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics (paper)
    7. DeepPersona: A Depth-First Synthetic-Persona Engine for Highly Personalized Language Models , thzva/Deeppersona , DeepPersona demo
    8. LabClaw: Always-On Lab Agent
    9. Open Character Training
    10. PersonaPlex PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models , NVidia/personaplex
    11. Tencent PersonaHub , tencent-ailab/persona-hub , Paper
    12. SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users , Paper
    13. AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios , Paper
    14. AIPress: A Multi-Agent News Generation and Feedback Simulation System , Paper
    15. Microsoft UserLM-8B: Flipping the Dialogue: Training and Evaluating User Language Models , TinyTroupe: LLM-powered multiagent persona simulation for imagination enhancement and business insights
    16. n8n Agentic-Archive , n8n + comfyUI API: Batch Convert Images to Video , n8n + comfyUI API: Simple
    17. Nvidia Nemotron-Personas (US) , Nemotron-Personas (India) , Nemotron-Personas (Japan)
    18. OASIS OASIS: Open Agent Social Interaction Simulations with One Million Agents , MiroFish: A Simple and Universal Swarm Intelligence Engine, Predicting Anything
    19. ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
    20. ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
    21. OpenClaw Awesome OpenClaw Skills , Don’t Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw , AutoResearchClaw: Chat an Idea. Get a Paper. Fully Autonomous & Self-Evolving
    22. OPPO Towards Personalized Deep Research: Benchmarks and Evaluations
    23. Tencent Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

    Simulation Worlds, GIS & World Models

  • Keyword: Awesome World Models for Robotics , Benchmark - WorldScore ,
    1. Alibaba FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
    2. AI2-THOR: An Interactive 3D Environment for Visual AI
    3. AI-Town AI-Town (a16z) , World Craft: Agentic Framework to Create Visualizable Worlds via Text
    4. BEHAVIOR-1K: 1000 realistic, full-length household tasks
    5. CARLA: Open-source simulator for autonomous driving research
    6. Embodied City: Embodied Agent in Urban Environment
    7. Genesis: A Generative and Universal Physics Engine for Robotics and Beyond
    8. GeoVLA: Empowering 3D Representations in Vision-Language-Action Models
    9. GigaWorld-Policy: An Efficient Action-Centered World-Action Model (World Action Models WAM)
    10. Google Google Genie2: Generative Interactive Environments , SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds
    11. Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
    12. InternUtopia: Dream General Robots in a City at Scale
    13. Large World Model (LWM)
    14. LongVolCap: Representing Long Volumetric Video with Temporal Gaussian Hierarchy
    15. Mirage 2 - Generative World Engines , Mirage 2 - Demo ,
    16. MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility
    17. MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge (Minecraft)
    18. Niantic Labs Large Geospatial Model
    19. MindCraft: Collaborating Action by Action: Multi-agent LLM Framework for Embodied Reasoning
    20. PlayerOne: Egocentric World Simulator
    21. Seoul World Model: Grounding World Simulation Models in a Real-World Metropolis
    22. SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds
    23. Skywork AI Matrix-3D: Omnidirectional Explorable 3D World Generation , Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
    24. SPAgent: Agent in the Physical & Spatial World , Think3D: Thinking with Space for Spatial Reasoning
    25. SynCity SynCity: Training-Free Generation of 3D Worlds , Paper
    26. UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
    27. VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
    28. Very Big Video Reasoning (VBVR) Suite - Knowledge, Abstraction, Spatiality, Transformation, Perception , Video-Reason/VBVR-Bench-Leaderboard
    29. Virtual Community: An Open World for Humans, Robots, and Society
    30. VIGA: Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning (Vibe-code a Physical Scene with Interactions aka Blender sim)
    31. Web World Models (Princeton) , Princeton-AI2-Lab/Web-World-Models
    32. WorldGen: Generate Any 3D Scene in Seconds
    33. WorldMirror: Universal 3D World Reconstruction with Any Prior Prompting
    34. World Models LeWorldModel: Stable End-to-End JEPA from Pixels (Yann LeCun) , World Labs (Fei-Fei Li) ,
    35. WorldScore: A Unified Evaluation Benchmark for World Generation , WorldScore Leaderboard

    Datasets

    1. Amazon Berkeley Objects (ABO) Dataset (household items)
    2. HumanRig - Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset
    3. CivitAI-As-Characters
    4. FineVision: Open Data Is All You Need - 200 datasets containing 17M images, 89M question-answer turns, and 10B answer tokens, totaling 5TB of high-quality data
    5. Yuan-ManX/ai-audio-datasets
    6. Cartoon Movement (Kenny Tosh)
    7. Data No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
    8. Data Common Pile v0.1
    9. Data Meta Omnilingual ASR Corpus
    10. Data - Faces CelebV-HQ: A Large-scale Video Facial Attributes Dataset , TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
    11. Data - Humans HUMOTO: A 4D Dataset of Mocap Human Object Interactions
    12. Data - Movies Movie-Drama scripts
    13. Data - Occupations O*NET database (800 US occupations)
    14. HuggingFace FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from Common Crawl , FineWeb-Edu dataset consists of 1.3T tokens (5.4T tokens in the FineWeb-Edu-score-2 variant) ,
    15. Images Eigen-Banana-Qwen-Image-Edit: Lightning-Fast Instruction-Based Image Editing with Pico-Banana-400K ,
    16. MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes
    17. NTU NTU EEE - Digital Signal Processing Laboratory , Research Data
    18. Nvidia Granary - Multilingual Speech AI , nvidia/PhysicalAI-Autonomous-Vehicles-NuRec
    19. UniqueData , UniqueData/facial-emotion-recognition-dataset
    20. Cartoons Cartoon Movement - Israeli-Palestinian-Conflict , Israel-War-Cycle , Paresh Nath, India , Marian Kamensky, Austria , Kenny Tosh, Nigeria , ThinkChina ,

    Lighting

  • Keyword: CivitAI - lighting ,
    1. Apple pico-banana-400k
    2. GitHub - LAOGOU-666/Comfyui-LG_Relight
    3. GitHub - kijai/ComfyUI-Geowizard
    4. GitHub - kijai/ComfyUI-Lotus
    5. LBM: Latent Bridge Matching for Fast Image-to-Image Translation , gojasper/LBM , jasperai/LBM_relighting
    6. Qwen-Image Qwen-Image-Lightning ,

    Text - Translation / OCR / Storyboarding

  • Keyword: OmniDocBench , Hybrid OCR-LLM Framework for Enterprise-Scale Document Information Extraction Under Copy-heavy Task ,
    1. OCR DeepSeek-OCR-2 , dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model , FireRed-OCR , GLM-OCR , HunyuanOCR-1B , LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family , PaddleOCR ,
    2. Doc/Text To LORA Doc-to-LoRA and Text-to-LoRA
    3. Translation Tencent-Hunyuan/HY-MT , Google TranslateGemma , Cohere Tiny-aya

    Coding Assistant / Vibe-code

    1. IQuest-Coder-V1
    2. Mistralai/Devstral-2-
    3. qwen3-coder-next