-
Tiny-vLLM Rebuilds the Inference Engine in C++ and CUDA So You Can Read Every Kernel
Tiny-vLLM is a fully functional LLM inference engine in C++ and CUDA, written deliberately small, with every kernel and every line of math walked through in a free companion course. It’s the “younger and smaller sibling” of vLLM, built for understanding rather than deployment.

## What’s actually in there

The implementation is not a…
-
SANA-Streaming Brings Real-Time Streaming Video Editing to Consumer GPUs
SANA-Streaming is NVIDIA’s bid for real-time streaming video-to-video editing on consumer GPUs, a use case that has been a wall for diffusion video models because it demands both frame-to-frame temporal consistency and inference throughput high enough to keep up with the stream.

## Hybrid attention, where it counts

The architectural move is mixing two attention regimes.…
-
ByteDance SwanVoice Synthesizes Long-Form Expressive Speech for Monologue and Dialogue
SwanVoice is ByteDance’s bid for expressive long-form zero-shot speech synthesis: a model that can speak in a voice it has never heard before, sustain that voice across minutes of audio, and do so convincingly in both monologue and dialogue. The combination of “long-form” and “expressive” is the hard part most TTS systems quietly skip.…
-
Representation Forcing Lets Unified Multimodal Models Skip the External VAE
Representation Forcing (RF), from the University of Hong Kong and ByteDance Seed, attacks a quiet bottleneck inside unified multimodal models: the external VAE everyone leans on to bridge from latent to pixels. RF replaces that crutch by having the model internally predict, and then use, its own high-level visual representations to generate images…
-
Impeccable Is a Design Language That Stops Your AI From Shipping Slop UI
Impeccable is a design language for AI coding harnesses with a specific mission: stop them from shipping the same SaaS visual slop on every project. The author’s diagnosis is what every reviewer of AI-built frontends silently thinks: Inter for everything, purple-to-blue gradients, cards nested in cards, gray text on coloured backgrounds, the rounded-square icon…
-
Expanse (YC P26) Targets the 60–70% of GPU Capacity Datacenters Waste
Expanse, launching out of Y Combinator’s current batch, attacks an embarrassing number nobody wants to put on a slide: datacenters run at roughly 30–40% effective GPU utilisation. The rest is paid for and idle. Expanse’s job is to make it useful.

## Three things it actually does

The product wraps three capabilities. **Resource prediction** right-sizes…
-
LongTraceRL Mines Tiered Distractors From Search-Agent Traces for Long-Context RL
LongTraceRL trains long-context reasoning the way an agent actually experiences a long context, by reusing what real search agents do, and grades the model with entity-level rubric rewards instead of a single yes/no on the final answer.

## Tiered distractors from agent traces

Long-context RL has been bottlenecked by sparse rewards and easy…
-
Replacing RAG With Grep: GrepSeek Trains Search Agents to Use Bash on the Corpus
GrepSeek (the paper is formally “Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction”) takes a sharp position on RAG: tear out the embedding model, the vector index, and the top-k retrieval, and let the agent search the corpus the way a coding agent navigates a codebase. Just grep, find,…
-
revfactory/harness Turns a Domain Description Into an Agent Team and Its Skills
revfactory/harness is a meta-skill for Claude Code: instead of giving you an agent, it designs the agent team. Tell it “build a harness for this project,” describe the domain, and it picks a team architecture, defines specialised agents, and generates the skills they use.

## Six team patterns, picked for you

Harness ships with six…
-
QwenPaw 1.1.9 Adds a Web IDE With File Tree, Diff Review, and Git Panel
QwenPaw, the personal AI assistant from agentscope-ai, shipped v1.1.9, and the headline addition is Coding Mode, a full Web IDE that lives inside the assistant rather than sending you off to a separate editor.

## A real IDE, not a chat-with-code box

Coding Mode is a three-panel Web IDE: a file tree on the…
