Moncef Abboud

Moncef Abboud blog and posts.
https://cefboud.com/
Updated 2026-06-20. © 2026 Moncef Abboud

Exploring Speculative Decoding: From Concept to Implementation (2026-05-31)
https://cefboud.com/posts/speculative-decoding/

In this post, we explore speculative decoding through a concrete vLLM-focused implementation, covering draft models, EAGLE, MTP, and the tradeoffs involved.

Exploring Mixture of Experts: From Concept to Inference Engine (2026-04-26)
https://cefboud.com/posts/mixture-of-experts-MoE-nano-vllm-deep-dive/

In this post, we dabble in Mixture of Experts (MoE) models through a concrete nano-vLLM implementation, exploring Triton kernels, expert parallelism, and other fun things.

Deep Dive into Efficient LLM Inference with nano-vLLM (2026-04-05)
https://cefboud.com/posts/inside-llm-inference-engine-nano-vllm-explanation/

A look inside a lightweight implementation of vLLM: KV cache, paged attention, tensor parallelism and multi-GPU support, etc.

Coding Agent with Self-hosted LLM: End-to-End Control with Opencode and vLLM (2026-03-07)
https://cefboud.com/posts/coding-agent-self-hosted-llm-opencode-vllm/

MonClaw: A Minimal OpenClaw Built with the OpenCode SDK (2026-02-07)
https://cefboud.com/posts/monclaw-a-light-openclaw-with-opencode-sdk/

OpenClaw captures something magical: a useful AI assistant. If you squint, a useful assistant and a coding agent are quite alike. This post discusses building a minimal implementation using the OpenCode SDK.