<feed xmlns="http://www.w3.org/2005/Atom"> <id>https://cefboud.com/</id><title>Moncef Abboud</title><subtitle>Moncef Abboud blog and posts.</subtitle> <updated>2026-06-20T17:03:44+00:00</updated> <author> <name>Moncef Abboud</name> <uri>https://cefboud.com/</uri> </author><link rel="self" type="application/atom+xml" href="https://cefboud.com/feed.xml"/><link rel="alternate" type="text/html" hreflang="en" href="https://cefboud.com/"/> <generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator> <rights> © 2026 Moncef Abboud </rights> <icon>/assets/img/favicons/favicon.ico</icon> <logo>/assets/img/favicons/favicon-96x96.png</logo> <entry><title>Exploring Speculative Decoding: From Concept to Implementation</title><link href="https://cefboud.com/posts/speculative-decoding/" rel="alternate" type="text/html" title="Exploring Speculative Decoding: From Concept to Implementation" /><published>2026-05-31T00:00:00+00:00</published> <updated>2026-05-31T00:00:00+00:00</updated> <id>https://cefboud.com/posts/speculative-decoding/</id> <content type="text/html" src="https://cefboud.com/posts/speculative-decoding/" /> <author> <name>cef</name> </author> <category term="Technical Writing" /> <category term="Open Source" /> <summary>In this post, we explore speculative decoding through a concrete vLLM-focused implementation, covering draft models, EAGLE, MTP, and the tradeoffs involved.</summary> </entry> <entry><title>Exploring Mixture of Experts: From Concept to Inference Engine</title><link href="https://cefboud.com/posts/mixture-of-experts-MoE-nano-vllm-deep-dive/" rel="alternate" type="text/html" title="Exploring Mixture of Experts: From Concept to Inference Engine" /><published>2026-04-26T00:00:00+00:00</published> <updated>2026-04-26T00:00:00+00:00</updated> <id>https://cefboud.com/posts/mixture-of-experts-MoE-nano-vllm-deep-dive/</id> <content type="text/html" src="https://cefboud.com/posts/mixture-of-experts-MoE-nano-vllm-deep-dive/" /> <author> 
<name>cef</name> </author> <category term="Technical Writing" /> <category term="Open Source" /> <summary>In this post, we dabble in Mixture of Experts (MoE) models through a concrete nano-vLLM implementation, exploring Triton kernels, expert parallelism, and other fun things.</summary> </entry> <entry><title>Deep Dive into Efficient LLM Inference with nano-vLLM</title><link href="https://cefboud.com/posts/inside-llm-inference-engine-nano-vllm-explanation/" rel="alternate" type="text/html" title="Deep Dive into Efficient LLM Inference with nano-vLLM" /><published>2026-04-05T00:00:00+00:00</published> <updated>2026-04-05T00:00:00+00:00</updated> <id>https://cefboud.com/posts/inside-llm-inference-engine-nano-vllm-explanation/</id> <content type="text/html" src="https://cefboud.com/posts/inside-llm-inference-engine-nano-vllm-explanation/" /> <author> <name>cef</name> </author> <category term="Technical Writing" /> <category term="Open Source" /> <summary>A look inside a lightweight implementation of vLLM. 
KV cache, paged attention, tensor parallelism &amp; multi-GPU support, etc.</summary> </entry> <entry><title>Coding Agent with Self-hosted LLM: End-to-End Control with Opencode and vLLM</title><link href="https://cefboud.com/posts/coding-agent-self-hosted-llm-opencode-vllm/" rel="alternate" type="text/html" title="Coding Agent with Self-hosted LLM: End-to-End Control with Opencode and vLLM" /><published>2026-03-07T00:00:00+00:00</published> <updated>2026-03-07T00:00:00+00:00</updated> <id>https://cefboud.com/posts/coding-agent-self-hosted-llm-opencode-vllm/</id> <content type="text/html" src="https://cefboud.com/posts/coding-agent-self-hosted-llm-opencode-vllm/" /> <author> <name>cef</name> </author> <category term="Technical Writing" /> <category term="Open Source" /> <summary></summary> </entry> <entry><title>MonClaw: A Minimal OpenClaw Built with the OpenCode SDK</title><link href="https://cefboud.com/posts/monclaw-a-light-openclaw-with-opencode-sdk/" rel="alternate" type="text/html" title="MonClaw: A Minimal OpenClaw Built with the OpenCode SDK" /><published>2026-02-07T00:00:00+00:00</published> <updated>2026-02-07T00:00:00+00:00</updated> <id>https://cefboud.com/posts/monclaw-a-light-openclaw-with-opencode-sdk/</id> <content type="text/html" src="https://cefboud.com/posts/monclaw-a-light-openclaw-with-opencode-sdk/" /> <author> <name>cef</name> </author> <category term="Technical Writing" /> <category term="Open Source" /> <summary>OpenClaw captures something magical: a useful AI assistant. If you squint, a useful assistant and a coding agent are quite alike. This post discusses building a minimal implementation using the OpenCode SDK.</summary> </entry> </feed>
