About This Page

eBPF (extended Berkeley Packet Filter) lets you run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. See Linux Advanced for kernel internals, Cybersecurity for security context, Docker and Kubernetes for orchestration use cases.

What is eBPF?

  • eBPF lets programs run in a privileged context inside the Linux kernel, safely and efficiently, without modifying kernel source code or loading modules.
  • Classic BPF (Berkeley Packet Filter) was built for network packet filtering; tcpdump filters compile to it. eBPF extended it into a general-purpose in-kernel virtual machine whose programs can attach to many kinds of kernel events: tracing, networking, and security hooks.

Why eBPF is Revolutionary

Traditional Approach | eBPF Approach
Modify kernel source + recompile | Load program at runtime
Write kernel module (.ko) | Load verified bytecode
Risk system stability | Verifier guarantees safety
Need reboot to apply | Live, hot-load into running kernel
Generic overhead | Targeted, minimal overhead
  • The Superpower

    eBPF gives you kernel-level visibility and control with user-space-like safety guarantees. This is why major cloud providers, observability platforms (Datadog, New Relic), and security tools (Falco, Cilium) use it.

eBPF Architecture Overview

graph TD
    User["👤 User Space<br/>eBPF Program (C/Rust)"]
    Compile["🔧 LLVM/Clang<br/>Compile to BPF bytecode"]
    Load["📦 syscall: bpf()<br/>Load bytecode into kernel"]
    Verify["✅ BPF Verifier<br/>Safety + correctness check"]
    JIT["⚡ JIT Compiler<br/>Bytecode → native machine code"]
    Hook["🪝 Attach to Hook<br/>kprobe/tracepoint/XDP/TC/socket"]
    Trigger["🔥 Kernel Event fires<br/>Program executes in-kernel"]
    Maps["🗺️ BPF Maps<br/>Shared data: kernel ↔ user"]
    User --> Compile --> Load --> Verify --> JIT --> Hook --> Trigger
    Trigger --> Maps
    Maps --> User

eBPF vs Kernel Modules

Feature | Kernel Module | eBPF
Safety | Can crash kernel | Verifier prevents crashes
Portability | Kernel version dependent | CO-RE (compile once, run everywhere)
Load time | Requires insmod + possible reboot | Runtime, no reboot
Debugging | Difficult | bpftool, bpftrace, perf
Sandboxed | No | Yes: bounded loops, no memory corruption
Access | Full kernel access | Controlled via helpers

BPF Virtual Machine

Registers & ISA

  • eBPF has a RISC-style 64-bit instruction set.
Register | Purpose
r0 | Return value from helper calls and the program's exit value
r1–r5 | Arguments to BPF helper functions (r1 holds the context pointer at program entry)
r6–r9 | Callee-saved registers
r10 | Read-only frame pointer to the 512-byte stack
PC | Program counter (implicit, not directly accessible)

Instruction Classes

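Class | Description
BPF_LD / BPF_LDX | Load instructions
BPF_ST / BPF_STX | Store instructions
BPF_ALU / BPF_ALU64 | 32-bit / 64-bit arithmetic and logic operations
BPF_JMP / BPF_JMP32 | 64-bit / 32-bit comparison jumps
BPF_CALL (an opcode in the BPF_JMP class) | Call a BPF helper function
BPF_EXIT (an opcode in the BPF_JMP class) | Exit the program, returning r0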
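The register roles become clearer when you see a few instructions execute. The following toy interpreter is for illustration only. It handles a handful of real eBPF opcodes: 64-bit `mov`, `add`, and `mul` with an immediate or register source, plus `exit`.

- `struct toy_insn` is a local mirror of the kernel's `struct bpf_insn` layout, not the real header.
- The opcode constants are written out by hand.
- Real execution also involves helper calls, memory access, and the verifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the kernel's struct bpf_insn layout (8 bytes per instruction). */
struct toy_insn {
    uint8_t code;        /* opcode: class | operation | source */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register (BPF_X forms) */
    int16_t off;         /* jump/memory offset (unused here) */
    int32_t imm;         /* immediate operand (BPF_K forms) */
};

/* Opcode values defined by the eBPF ISA. */
enum {
    OP_MOV64_IMM = 0xb7, /* BPF_ALU64 | BPF_MOV | BPF_K */
    OP_MOV64_REG = 0xbf, /* BPF_ALU64 | BPF_MOV | BPF_X */
    OP_ADD64_IMM = 0x07, /* BPF_ALU64 | BPF_ADD | BPF_K */
    OP_ADD64_REG = 0x0f, /* BPF_ALU64 | BPF_ADD | BPF_X */
    OP_MUL64_IMM = 0x27, /* BPF_ALU64 | BPF_MUL | BPF_K */
    OP_EXIT      = 0x95, /* BPF_JMP   | BPF_EXIT        */
};

/* Runs a straight-line program and returns r0 at exit, or -1 for an
 * unsupported opcode, a write to r10, or a program without exit. */
int64_t toy_run(const struct toy_insn *prog, size_t len)
{
    uint64_t r[11] = {0};                         /* r0..r10 */
    assert(prog != NULL || len == 0);
    for (size_t pc = 0; pc < len; pc++) {
        const struct toy_insn *in = &prog[pc];
        if (in->dst_reg > 9 || in->src_reg > 10)
            return -1;                            /* r10 is read-only */
        uint64_t *dst = &r[in->dst_reg];
        uint64_t imm = (uint64_t)(int64_t)in->imm;    /* imm is sign-extended */
        switch (in->code) {
        case OP_MOV64_IMM: *dst  = imm;            break;
        case OP_MOV64_REG: *dst  = r[in->src_reg]; break;
        case OP_ADD64_IMM: *dst += imm;            break;
        case OP_ADD64_REG: *dst += r[in->src_reg]; break;
        case OP_MUL64_IMM: *dst *= imm;            break;
        case OP_EXIT:      return (int64_t)r[0];   /* r0 = return value */
        default:           return -1;
        }
    }
    return -1;                                    /* fell off the end */
}
```

For example, the five-instruction program `r1 = 5; r1 += 3; r0 = r1; r0 *= 2; exit` returns 16 in r0.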

BPF Verifier

  • The verifier is the safety guardian — it statically analyzes every program before loading.
graph TD
    Load["Load bytecode via bpf() syscall"]
    DAG["DAG Check<br/>No unreachable code<br/>No infinite loops"]
    State["State Machine Walk<br/>Simulate all possible paths"]
    Bounds["Bounds Checking<br/>No out-of-bounds memory access"]
    Ptr["Pointer Tracking<br/>Null checks before dereference"]
    Types["Type Checking<br/>Context-specific type safety"]
    Accept["✅ Program Accepted"]
    Reject["❌ Program Rejected"]
    Load --> DAG --> State --> Bounds --> Ptr --> Types --> Accept
    Bounds --> Reject
    Ptr --> Reject
Verifier Rule | Why
No unbounded loops (pre-5.3: no back-edges at all) | Prevents infinite loops
Bounded loops allowed (5.3+) | Only when the verifier can prove termination
Stack limit: 512 bytes | Prevents stack overflow
Complexity limit: 1M processed instructions (BPF_COMPLEXITY_LIMIT_INSNS, 5.2+) | Bounds verification time
All memory accesses must be bounds-checked | No buffer overflows
Pointer arithmetic restricted | No kernel memory corruption
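The first pass in the diagram (the "DAG Check") can be approximated as a scan over jump targets. The sketch below is a deliberately simplified, hypothetical model of the pre-5.3 rules. It checks three things: every jump must land inside the program, backward jumps (loops) are rejected, and the last instruction must be `exit`. The `cfg_insn` type and `cfg_check` function are illustrative names. The real verifier (`kernel/bpf/verifier.c`) also detects unreachable code and tracks register state, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction: only what the CFG check needs. */
struct cfg_insn {
    bool    is_jump;   /* conditional or unconditional jump */
    bool    is_exit;   /* BPF_EXIT */
    int16_t off;       /* eBPF jump target = pc + off + 1 */
};

enum cfg_verdict { CFG_OK, CFG_OUT_OF_RANGE, CFG_BACK_EDGE, CFG_NO_EXIT };

enum cfg_verdict cfg_check(const struct cfg_insn *prog, size_t len)
{
    assert(prog != NULL || len == 0);
    if (len == 0 || !prog[len - 1].is_exit)
        return CFG_NO_EXIT;                  /* must not fall off the end */
    for (size_t pc = 0; pc < len; pc++) {
        if (!prog[pc].is_jump)
            continue;
        long target = (long)pc + prog[pc].off + 1;
        if (target < 0 || target >= (long)len)
            return CFG_OUT_OF_RANGE;         /* jump outside the program */
        if (target <= (long)pc)
            return CFG_BACK_EDGE;            /* loop: rejected before 5.3 */
    }
    return CFG_OK;
}
```

Since 5.3, the verifier can accept some back-edges. It does so by walking the loop until it proves termination, and that walk is charged against the 1M-instruction complexity limit.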

BPF Maps

  • BPF maps are key-value stores shared between eBPF programs (kernel side) and user space. They are the primary data channel between the two.

Map Types Reference

Map Type | Description | Use Case
BPF_MAP_TYPE_HASH | Hash table | Counting events, per-IP tracking
BPF_MAP_TYPE_ARRAY | Fixed-size array, indexed by int | Latency histograms, counters
BPF_MAP_TYPE_PERCPU_HASH | Per-CPU hash, lock-free | High-frequency counters
BPF_MAP_TYPE_PERCPU_ARRAY | Per-CPU array, lock-free | Hot path metrics
BPF_MAP_TYPE_LRU_HASH | LRU eviction hash | Connection tracking
BPF_MAP_TYPE_RINGBUF | Ring buffer (recommended for events) | Event streaming to user space
BPF_MAP_TYPE_PERF_EVENT_ARRAY | Perf buffer (older) | Event streaming (legacy)
BPF_MAP_TYPE_PROG_ARRAY | Array of BPF programs | Tail calls / program dispatch
BPF_MAP_TYPE_STACK_TRACE | Stack traces | Profiling
BPF_MAP_TYPE_SOCKHASH / SOCKMAP | Socket maps | TCP redirection
BPF_MAP_TYPE_CGROUP_* | cgroup-based maps | Container-aware policies
BPF_MAP_TYPE_BLOOM_FILTER | Bloom filter | Fast membership testing
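Per-CPU maps are lock-free because each CPU only touches its own slot. The cost moves to user space: a lookup on a per-CPU map returns one value per possible CPU, and user space has to sum them.

Below is a single-threaded model of that layout. The `NCPUS` constant and both function names are hypothetical. Real user-space code would size its buffer with `libbpf_num_possible_cpus()`.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 4   /* hypothetical CPU count */

/* One counter slot per CPU, as in a BPF_MAP_TYPE_PERCPU_ARRAY entry. */
static uint64_t percpu_counter[NCPUS];

/* Kernel side: a program running on `cpu` only increments its own slot.
 * The kernel disables migration while a program runs, so no lock is needed. */
void percpu_inc(unsigned cpu)
{
    assert(cpu < NCPUS);
    percpu_counter[cpu]++;
}

/* User side: reading the entry yields NCPUS values; the total is their sum. */
uint64_t percpu_total(void)
{
    uint64_t sum = 0;
    for (unsigned cpu = 0; cpu < NCPUS; cpu++)
        sum += percpu_counter[cpu];
    return sum;
}
```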

Map Operations (C API)

BPF map operations (kernel side, libbpf BTF-defined map)
// In eBPF program (kernel side)
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10000);
    __type(key, __u32);       // IPv4 address
    __type(value, __u64);     // packet count
} pkt_count SEC(".maps");
 
// Lookup: returns a pointer to the stored value, or NULL if the key is absent
__u64 *count = bpf_map_lookup_elem(&pkt_count, &src_ip);
 
if (count) {
    // Atomic increment: other CPUs may update the same (shared) entry
    __sync_fetch_and_add(count, 1);
} else {
    // Insert; BPF_NOEXIST fails if another CPU created the key meanwhile
    // (BPF_ANY would create or overwrite unconditionally)
    __u64 one = 1;
    bpf_map_update_elem(&pkt_count, &src_ip, &one, BPF_NOEXIST);
}
 
// Delete (e.g. when the flow is no longer of interest)
bpf_map_delete_elem(&pkt_count, &src_ip);
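The last argument of `bpf_map_update_elem()` selects one of three behaviors:

- `BPF_ANY` creates the key or overwrites it.
- `BPF_NOEXIST` only creates. It fails with `-EEXIST` if the key is present.
- `BPF_EXIST` only overwrites. It fails with `-ENOENT` if the key is absent.

Below is a small user-space model of those semantics over a fixed-size table. The table, `toy_lookup`, and `toy_update` are hypothetical; the flag values and error codes follow the kernel's. A full hash map reports `-E2BIG`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as in <linux/bpf.h>. */
enum { BPF_ANY = 0, BPF_NOEXIST = 1, BPF_EXIST = 2 };

#define MAX_ENTRIES 4   /* plays the role of max_entries */

struct slot { bool used; uint32_t key; uint64_t value; };
static struct slot table[MAX_ENTRIES];

static struct slot *toy_lookup(uint32_t key)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (table[i].used && table[i].key == key)
            return &table[i];
    return NULL;
}

/* Returns 0 or a negative errno, following the helper's flag semantics. */
int toy_update(uint32_t key, uint64_t value, uint64_t flags)
{
    if (flags > BPF_EXIST)
        return -EINVAL;
    struct slot *s = toy_lookup(key);
    if (s && flags == BPF_NOEXIST)
        return -EEXIST;              /* create-only, but the key exists */
    if (!s && flags == BPF_EXIST)
        return -ENOENT;              /* update-only, but the key is missing */
    if (!s) {
        for (int i = 0; i < MAX_ENTRIES && !s; i++)
            if (!table[i].used)
                s = &table[i];
        if (!s)
            return -E2BIG;           /* map is full */
        s->used = true;
        s->key = key;
    }
    s->value = value;
    return 0;
}
```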
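Map access from user space (BCC Python)
import socket
import struct
from bcc import BPF
 
# BCC compiles its own C dialect: program.c must declare the map BCC-style,
# e.g. BPF_HASH(pkt_count, u32, u64);
b = BPF(src_file="program.c")
pkt_count = b["pkt_count"]
 
# Read all entries; keys and values are ctypes objects
for ip, count in pkt_count.items():
    # The address is stored in network byte order: repack the raw 4 bytes
    addr = socket.inet_ntoa(struct.pack("I", ip.value))
    print(f"IP: {addr} -> {count.value} packets")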
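  • Ring Buffer vs Perf Buffer

    The ring buffer (BPF_MAP_TYPE_RINGBUF) is the modern standard (kernel 5.8+). It is more efficient than the perf buffer for three reasons:

    - One buffer is shared by all CPUs instead of one buffer per CPU.
    - Records can be variable-length.
    - The reserve/submit API avoids an extra copy.

    It also preserves event ordering across CPUs.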

Ring buffer usage
struct event {
    __u32 pid;
    char comm[16];
    __u64 bytes;
};
 
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24);  // 16 MB; must be a power of 2 and a multiple of the page size
} events SEC(".maps");
 
// Reserve and submit event
struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) return 0;
 
e->pid = bpf_get_current_pid_tgid() >> 32;
bpf_get_current_comm(&e->comm, sizeof(e->comm));
e->bytes = bytes;   // 'bytes' comes from the surrounding program (e.g. a probe argument)
 
bpf_ringbuf_submit(e, 0);

eBPF Program Types

Hook Points Overview

graph LR
    subgraph Tracing
        KP["kprobe/kretprobe<br/>Kernel function entry/exit"]
        TP["tracepoint<br/>Stable kernel trace points"]
        UP["uprobe/uretprobe<br/>User space function tracing"]
        PERF["perf_event<br/>Hardware performance counters"]
        RAW["raw_tracepoint<br/>Low-overhead tracepoints"]
    end
    subgraph Networking
        XDP["XDP<br/>Earliest packet processing point"]
        TC["TC (Traffic Control)<br/>Ingress + Egress"]
        SOCK["Socket filter<br/>Per-socket packet filter"]
        CGROUP["cgroup/skb<br/>Per-cgroup network policy"]
        LWT["LWT<br/>Lightweight tunnel"]
        SK["sk_msg / sk_skb<br/>Socket message redirect"]
    end
    subgraph Security
        LSM["LSM hooks<br/>Mandatory Access Control"]
        SECCOMP["seccomp-bpf<br/>Syscall filtering"]
    end

Tracing Programs

kprobe / kretprobe

  • Attach to any kernel function at entry (kprobe) or exit (kretprobe).
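kprobe: trace do_sys_openat2 (file opens)
// do_sys_openat2 exists since kernel 5.6; older kernels use do_sys_open
SEC("kprobe/do_sys_openat2")
int BPF_KPROBE(trace_openat, int dfd, const char *filename,
               struct open_how *how)
{
    char fname[256];
    // filename points into user memory: copy it out with the _user helper
    bpf_probe_read_user_str(fname, sizeof(fname), filename);
    
    bpf_printk("openat: %s\n", fname);
    return 0;
}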
  • Stability

    kprobes are NOT stable — kernel function names can change between versions. Use tracepoints for stability.

Tracepoints (Stable)

  • Tracepoints are stable, versioned trace points defined in kernel source.
List available tracepoints
ls /sys/kernel/debug/tracing/events/syscalls/
ls /sys/kernel/debug/tracing/events/sched/
ls /sys/kernel/debug/tracing/events/net/
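Tracepoint: trace process exec
SEC("tp/sched/sched_process_exec")
int trace_exec(struct trace_event_raw_sched_process_exec *ctx)
{
    char fname[64];
    // filename is a variable-length __data_loc field: its low 16 bits
    // hold the offset of the string from the start of ctx
    unsigned int off = ctx->__data_loc_filename & 0xFFFF;
    bpf_probe_read_kernel_str(fname, sizeof(fname), (void *)ctx + off);
    
    __u32 pid = bpf_get_current_pid_tgid() >> 32;
    bpf_printk("exec: pid=%d file=%s\n", pid, fname);
    return 0;
}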
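In raw tracepoint records, variable-length fields such as the exec `filename` are not stored inline. The struct holds a 32-bit `__data_loc` word instead:

- The low 16 bits are the byte offset of the data from the start of the record.
- The high 16 bits are its length.

Below is a user-space sketch of that decoding over a hand-built record. The record layout and `data_loc_str` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decodes a __data_loc word: low 16 bits = offset, high 16 bits = length. */
const char *data_loc_str(const void *record, uint32_t data_loc, uint16_t *len)
{
    assert(record != NULL);
    uint16_t off = (uint16_t)(data_loc & 0xFFFF);
    if (len)
        *len = (uint16_t)(data_loc >> 16);
    return (const char *)record + off;
}
```

A decoder never needs to know where the string lives at compile time; it follows the offset stored in the fixed-size part of the record.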

uprobe — User Space Tracing

  • Attach to user space functions (application code, libraries).
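uprobe: trace SSL_write in OpenSSL
// The libssl path varies by distro (e.g. /usr/lib/x86_64-linux-gnu/libssl.so.3)
SEC("uprobe//usr/lib/libssl.so:SSL_write")
int BPF_UPROBE(trace_ssl_write, void *ssl, const void *buf, int num)
{
    // At SSL_write entry, buf already holds the plaintext. (SSL_read only
    // fills its buffer on return, so it needs a uretprobe as well.)
    char data[64] = {};
    __u32 len = num;
    if (len > sizeof(data) - 1)
        len = sizeof(data) - 1;
    bpf_probe_read_user(data, len, buf);
    bpf_printk("SSL_write: %s\n", data);
    return 0;
}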
  • Zero-Instrumentation Visibility

    With uprobes on OpenSSL, you can capture plaintext TLS data without modifying the application. Tools such as bcc's sslsniff and Pixie use this technique.

XDP — eXpress Data Path

  • XDP runs at the earliest point in the software network stack: inside the NIC driver's receive path, before an sk_buff is allocated. This enables line-rate packet processing.
graph LR
    NIC["🌐 NIC receives packet"]
    XDP["⚡ XDP Program runs<br/>(before SKB allocation)"]
    PASS["XDP_PASS → Normal kernel stack"]
    DROP["XDP_DROP → Discard (0 overhead)"]
    TX["XDP_TX → Retransmit out same NIC"]
    REDIR["XDP_REDIRECT → Send to another NIC/CPU"]
    ABORTED["XDP_ABORTED → Bug, drop with error"]
    NIC --> XDP
    XDP --> PASS
    XDP --> DROP
    XDP --> TX
    XDP --> REDIR
    XDP --> ABORTED
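XDP: drop packets from blacklisted IPs
#include "vmlinux.h"
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

#define ETH_P_IP 0x0800   // macro from <linux/if_ether.h>; vmlinux.h has only types

// Blocked IPv4 source addresses (network byte order); the value is unused
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);
    __type(value, __u8);
} blacklist SEC(".maps");

SEC("xdp")
int xdp_firewall(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end) return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP)) return XDP_PASS;
    
    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end) return XDP_PASS;
    
    // Check blacklist map
    __u32 src = ip->saddr;
    __u32 *blocked = bpf_map_lookup_elem(&blacklist, &src);
    if (blocked) return XDP_DROP;  // Line-rate drop!
    
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";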
XDP Mode | Description | Performance
Native (driver) | Runs in NIC driver NAPI poll | Fastest, pre-SKB
Offloaded | Runs on the SmartNIC itself | Ultra-fast, requires a SmartNIC
Generic (skb) | Runs after SKB allocation | Slowest, works on all drivers
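The bounds-checked parsing in `xdp_firewall` can be exercised outside the kernel. This sketch ports the same decisions to user space over a raw byte buffer, using glibc's `struct ether_header` and `struct iphdr`.

Two things differ from the kernel version:

- A single `blocked_saddr` parameter stands in for the `blacklist` map.
- Length comparisons replace the pointer-versus-`data_end` checks, which is the portable-C equivalent.

`firewall` and `enum verdict` are hypothetical names.

```c
#define _DEFAULT_SOURCE     /* expose struct iphdr / ether_header in glibc */
#include <assert.h>
#include <arpa/inet.h>      /* htons, htonl */
#include <net/ethernet.h>   /* struct ether_header, ETHERTYPE_IP */
#include <netinet/ip.h>     /* struct iphdr */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum verdict { VERDICT_PASS, VERDICT_DROP };

/* blocked_saddr is in network byte order, like the map key. */
enum verdict firewall(const uint8_t *pkt, size_t len, uint32_t blocked_saddr)
{
    struct ether_header eth;
    struct iphdr ip;

    assert(pkt != NULL || len == 0);
    if (len < sizeof(eth))
        return VERDICT_PASS;                  /* truncated Ethernet header */
    memcpy(&eth, pkt, sizeof(eth));           /* memcpy avoids unaligned reads */
    if (eth.ether_type != htons(ETHERTYPE_IP))
        return VERDICT_PASS;                  /* not IPv4 */

    if (len < sizeof(eth) + sizeof(ip))
        return VERDICT_PASS;                  /* truncated IPv4 header */
    memcpy(&ip, pkt + sizeof(eth), sizeof(ip));

    return ip.saddr == blocked_saddr ? VERDICT_DROP : VERDICT_PASS;
}
```

As in the eBPF version, a malformed or short packet falls through to "pass" rather than being read past its end.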

TC — Traffic Control

  • TC (Traffic Control) programs run later than XDP, once the sk_buff exists, and can attach to both ingress and egress. They can read and modify the sk_buff, which gives them more context than XDP.
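TC: set a DSCP mark on outgoing IPv4 packets
SEC("tc")
int tc_egress(struct __sk_buff *skb)
{
    void *data     = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end) return TC_ACT_OK;
    if (eth->h_proto != bpf_htons(ETH_P_IP)) return TC_ACT_OK;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end) return TC_ACT_OK;

    // DSCP 46 = EF PHB (Expedited Forwarding); keep the 2 ECN bits
    __u8 old_tos = ip->tos;
    __u8 new_tos = (old_tos & 0x03) | (46 << 2);

    // Editing tos invalidates the IPv4 header checksum: patch it, then write.
    // Packet pointers are invalid after these helpers; do not reuse ip.
    bpf_l3_csum_replace(skb, sizeof(struct ethhdr) + __builtin_offsetof(struct iphdr, check),
                        bpf_htons(old_tos), bpf_htons(new_tos), 2);
    bpf_skb_store_bytes(skb, sizeof(struct ethhdr) + __builtin_offsetof(struct iphdr, tos),
                        &new_tos, sizeof(new_tos), 0);
    return TC_ACT_OK;
}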
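Rewriting any IPv4 header field invalidates the header checksum. That is why a TC program that edits `tos` also has to patch `check`, for example with the `bpf_l3_csum_replace()` helper.

The arithmetic is the incremental update from RFC 1624, `HC' = ~(~HC + ~m + m')`. It is computed in one's-complement arithmetic over the 16-bit word `m` that changed.

The sketch below checks the incremental result against a full recomputation. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folds a 32-bit one's-complement sum into 16 bits. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full IPv4 header checksum over big-endian 16-bit words;
 * the check field (bytes 10-11) must be zero while computing. */
uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL && len % 2 == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
    return (uint16_t)~fold(sum);
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') for old word m, new word m'. */
uint16_t csum_update(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~old_word + new_word;
    return (uint16_t)~fold(sum);
}
```

For a TOS change, the word is the first 16 bits of the header: version/IHL in the high byte and TOS in the low byte. Only the changed word enters the update, so a program never has to re-read the whole header.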

BPF CO-RE (Compile Once, Run Everywhere)

  • The Portability Problem

    Kernel data structure layouts differ between versions. CO-RE solves this — compile once, relocate at load time.

CO-RE Architecture

graph LR
    Src["eBPF Source (C)"]
    BTF_H["vmlinux.h<br/>(all kernel types)"]
    LLVM["Clang + LLVM<br/>Compile with BTF info"]
    OBJ["BPF object (.o)<br/>+ BTF relocation records"]
    libbpf["libbpf loader<br/>Relocates based on runtime BTF"]
    Kernel["Running kernel BTF"]
    Kernel --> libbpf
    Src --> LLVM
    BTF_H --> LLVM
    LLVM --> OBJ --> libbpf
Generate vmlinux.h (BTF type headers)
# On target machine
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
CO-RE access to kernel struct fields
// Safe CO-RE field access — handles struct layout differences
#include "vmlinux.h"
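#include <bpf/bpf_core_read.h>
#include <bpf/bpf_endian.h>     // bpf_ntohs
#include <bpf/bpf_helpers.h>    // SEC, bpf_printk
#include <bpf/bpf_tracing.h>    // PT_REGS_PARM1 (needs -D__TARGET_ARCH_<arch>)
 
// tcp_connect() runs after tcp_v4_connect() has filled in the destination,
// so daddr/dport are valid here (IPv4; at tcp_v4_connect entry they are 0)
SEC("kprobe/tcp_connect")
int trace_connect(struct pt_regs *ctx)
{
    struct sock *sk = (struct sock *)PT_REGS_PARM1(ctx);
    
    // CO-RE safe read: works across kernel versions
    __u16 dport = BPF_CORE_READ(sk, __sk_common.skc_dport);
    __u32 daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);
    
    bpf_printk("tcp_connect: %x:%d\n", daddr, bpf_ntohs(dport));
    return 0;
}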
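The core idea of a CO-RE field relocation can be modeled in plain C. The program names a field, and the loader supplies the offset that the running kernel's BTF reports.

In this sketch:

- Two hypothetical versions of a struct place `dport` at different offsets.
- A per-"kernel" offset table stands in for what libbpf derives from BTF.
- The struct names, `field_offsets`, and `read_dport` are all illustrative.

Real CO-RE relocations are recorded by Clang and patched into the instructions by libbpf at load time, not looked up at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical kernel layouts of the same logical struct. */
struct sock_v1 { uint32_t daddr; uint16_t dport; };
struct sock_v2 { uint64_t cookie; uint32_t daddr; uint16_t family; uint16_t dport; };

/* What a loader would learn from each kernel's BTF. */
struct field_offsets { size_t daddr; size_t dport; };

static const struct field_offsets btf_v1 = {
    offsetof(struct sock_v1, daddr), offsetof(struct sock_v1, dport) };
static const struct field_offsets btf_v2 = {
    offsetof(struct sock_v2, daddr), offsetof(struct sock_v2, dport) };

/* The "compiled once" accessor: it never hard-codes an offset. */
uint16_t read_dport(const void *sk, const struct field_offsets *reloc)
{
    assert(sk != NULL && reloc != NULL);
    uint16_t v;
    memcpy(&v, (const uint8_t *)sk + reloc->dport, sizeof(v));
    return v;
}
```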

BPF Helpers

  • BPF programs can’t call arbitrary kernel functions — only approved BPF helpers.

Essential Helpers Reference

Helper | Description
bpf_map_lookup_elem | Look up key in map
bpf_map_update_elem | Update/insert key in map
bpf_map_delete_elem | Delete key from map
bpf_probe_read_kernel | Safe kernel memory read
bpf_probe_read_user | Safe user memory read
bpf_probe_read_user_str | Safe user string read
bpf_get_current_pid_tgid | Current TGID (the user-space PID) in the upper 32 bits, thread ID in the lower 32
bpf_get_current_uid_gid | Current GID in the upper 32 bits, UID in the lower 32
bpf_get_current_comm | Get current process name
bpf_ktime_get_ns | Get monotonic clock nanoseconds
bpf_printk | Debug print (read from /sys/kernel/debug/tracing/trace_pipe)
bpf_tail_call | Jump to another BPF program stored in a PROG_ARRAY map
bpf_send_signal | Send signal to current process
bpf_override_return | Override a kprobed function's return value (error-injectable functions only)
bpf_ringbuf_reserve | Reserve ring buffer space
bpf_ringbuf_submit | Submit ring buffer record
bpf_sk_redirect_map | Redirect a packet to a socket stored in a sockmap
bpf_xdp_adjust_head | Adjust XDP packet head
bpf_get_stackid | Get an ID for the current stack trace (stored in a STACK_TRACE map)
bpf_perf_event_output | Output to perf buffer
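`bpf_get_current_pid_tgid()` packs two IDs into one 64-bit value:

- The upper 32 bits hold the thread-group ID, which is what user space calls the PID.
- The lower 32 bits hold the thread ID.

That is why the examples on this page shift right by 32. Below is a user-space sketch of the packing and unpacking; the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of bpf_get_current_pid_tgid(): (tgid << 32) | thread id. */
uint64_t pack_pid_tgid(uint32_t tgid, uint32_t tid)
{
    return (uint64_t)tgid << 32 | tid;
}

/* Process ID as seen by user space (getpid()). */
uint32_t tgid_of(uint64_t pid_tgid) { return (uint32_t)(pid_tgid >> 32); }

/* Thread ID (gettid()); equals the TGID for a single-threaded process. */
uint32_t tid_of(uint64_t pid_tgid) { return (uint32_t)pid_tgid; }
```

Filtering on the TGID matches every thread of a process. Filtering on the lower half matches a single thread.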

Development Toolchains

libbpf + BPF Skeleton (Modern)

Project structure — libbpf skeleton
project/
├── vmlinux.h          # auto-generated kernel types
├── program.bpf.c      # eBPF kernel-side code
├── program.c          # user-space loader + consumer
└── Makefile
Build and load
# Install dependencies
sudo apt install clang llvm libbpf-dev linux-headers-$(uname -r)
 
# Compile eBPF program
clang -g -O2 -target bpf -D__TARGET_ARCH_x86 \
      -I/usr/include/x86_64-linux-gnu \
      -c program.bpf.c -o program.bpf.o
 
# Generate skeleton header
bpftool gen skeleton program.bpf.o > program.skel.h
 
# Compile user space
gcc -o program program.c -lbpf

BCC (BPF Compiler Collection)

  • BCC is a Python + C framework — great for rapid prototyping and one-liners.
BCC Python — trace process executions
from bcc import BPF
 
prog = r"""
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>
 
BPF_PERF_OUTPUT(events);
 
struct data_t {
    u32  pid;
    char comm[TASK_COMM_LEN];
    char filename[256];
};
 
TRACEPOINT_PROBE(syscalls, sys_enter_execve) {
    struct data_t data = {};
    data.pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&data.comm, sizeof(data.comm));
    bpf_probe_read_user_str(data.filename, sizeof(data.filename), args->filename);
    events.perf_submit(args, &data, sizeof(data));
    return 0;
}
"""
 
b = BPF(text=prog)
 
def print_event(cpu, data, size):
    event = b["events"].event(data)
    print(f"[{event.pid:6}] {event.comm.decode():20} exec: {event.filename.decode()}")
 
b["events"].open_perf_buffer(print_event)
 
print("Tracing execve... Ctrl+C to stop.")
try:
    while True:
        b.perf_buffer_poll()
except KeyboardInterrupt:
    pass

bpftrace — eBPF One-Liners

  • bpftrace is an AWK/DTrace-like language for quick eBPF exploration.
bpftrace one-liners
# Trace all execve syscalls
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s execve %s\n", comm, str(args->filename)); }'
 
# Count syscalls per process
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
 
# Histogram of read() sizes
bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret > 0/ { @bytes = hist(args->ret); }'
 
# Disk I/O latency histogram (µs); these kprobes are kernel-internal and may be missing on newer kernels
bpftrace -e 'kprobe:blk_account_io_start { @start[arg0] = nsecs; }
kprobe:blk_account_io_done /@start[arg0]/ {
    @lat = hist((nsecs - @start[arg0]) / 1000);
    delete(@start[arg0]);
}'
 
# CPU profiling: sample kernel stacks at 99 Hz on every CPU
bpftrace -e 'profile:hz:99 { @[kstack] = count(); }'
 
# TCP connection tracking (IPv4 destination; the struct cast needs kernel BTF)
bpftrace -e 'kprobe:tcp_connect { $sk = (struct sock *)arg0; printf("connect: %s → %s\n", comm, ntop($sk->__sk_common.skc_daddr)); }'
 
# Trace SSL/TLS plaintext (no app modification; the libssl path varies by distro)
bpftrace -e 'uprobe:/usr/lib/libssl.so:SSL_write { printf("SSL write: %s\n", str(arg1, arg2)); }'
 
# Memory allocation tracking: bytes requested per process (the libc path varies by distro)
bpftrace -e 'uprobe:/lib/x86_64-linux-gnu/libc.so.6:malloc { @allocs[comm] = sum(arg0); }'
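The `hist()` calls in these one-liners group values into power-of-two buckets. As a result, a histogram over the full 64-bit range needs at most 65 counters.

Below is a C sketch of that bucketing technique:

- Bucket 0 holds the value 0.
- Bucket k ≥ 1 holds values in [2^(k-1), 2^k).

This illustrates the idea only and does not reproduce bpftrace's exact bucket labels or output. `log2_bucket` and `hist_add` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define NBUCKETS 65

/* Bucket index = number of significant bits: 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, ... */
unsigned log2_bucket(uint64_t v)
{
    unsigned b = 0;
    while (v) {
        b++;
        v >>= 1;
    }
    return b;
}

static uint64_t buckets[NBUCKETS];

void hist_add(uint64_t v)
{
    unsigned b = log2_bucket(v);
    assert(b < NBUCKETS);
    buckets[b]++;
}
```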

Observability Use Cases

CPU Profiling with BPF

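Flame graph generation
# Install FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph
 
# Profile for 30 seconds with bpftrace, then fold the stacks into an SVG
bpftrace --no-warnings -e 'profile:hz:99 { @[kstack, ustack] = count(); } interval:s:30 { exit(); }' \
    > /tmp/out.txt
./FlameGraph/stackcollapse-bpftrace.pl /tmp/out.txt | ./FlameGraph/flamegraph.pl > flame.svg
 
# Or sample with perf (perf_events; no BPF program involved)
perf record -F 99 -a -g -- sleep 30
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > flame.svg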
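Both pipelines end with the same folding step. Each sampled stack becomes one line of semicolon-separated frames, root first, followed by its sample count, and `flamegraph.pl` renders those lines as the SVG.

Below is a minimal C sketch of the folding for a single stack. The leaf-first input order and the frame names in the example are assumptions, and `fold_stack` is an illustrative name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "root;...;leaf count" into out. frames[] is leaf-first, the order
 * stack walkers usually report. Returns 0, or -1 if out is too small. */
int fold_stack(const char *const frames[], int nframes, unsigned long count,
               char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    size_t used = 0;
    out[0] = '\0';
    for (int i = nframes - 1; i >= 0; i--) {          /* root first */
        int n = snprintf(out + used, outlen - used, "%s%s",
                         frames[i], i > 0 ? ";" : "");
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
    }
    int n = snprintf(out + used, outlen - used, " %lu", count);
    return (n < 0 || (size_t)n >= outlen - used) ? -1 : 0;
}
```

Identical stacks collapse to identical lines, so their counts can simply be added before rendering.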

Network Observability

Network one-liners
# Time spent inside tcp_v4_connect() (it sends the SYN); not full handshake latency
bpftrace -e '
kprobe:tcp_v4_connect { @start[tid] = nsecs; }
kretprobe:tcp_v4_connect /@start[tid]/ {
    printf("tcp_connect latency: %d us\n", (nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}'
 
# Top TCP talkers by bytes
bpftrace -e '
kprobe:tcp_sendmsg { @bytes[comm] += arg2; }
interval:s:5 { print(@bytes); clear(@bytes); }'

Production Tools Built on eBPF

Tool | Purpose | eBPF Use
Cilium | Kubernetes CNI + security | XDP + TC for network policy
Falco | Runtime security | kprobe/tracepoint for syscall monitoring
Pixie | K8s observability | uprobes for zero-instrumentation tracing
bcc/tools | Linux perf tools | All hook types
Katran | Facebook L4 load balancer | XDP for line-rate LB
Tetragon | Security observability | Tracing + LSM
Datadog Agent | APM + infra monitoring | kprobes + uprobes
Sysdig | Container security | Syscall monitoring via tracepoints

Security with eBPF

LSM BPF — Linux Security Modules

  • eBPF programs can attach to LSM hooks to enforce fine-grained security policies. This requires kernel 5.7+ built with CONFIG_BPF_LSM, with "bpf" in the active LSM list.
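LSM BPF: block a specific process from opening files named passwd
#define EPERM 1   // errno values are macros, not part of vmlinux.h

SEC("lsm/file_open")
int BPF_PROG(restrict_file_open, struct file *file, int ret)
{
    // Respect a denial already made by an earlier LSM program
    if (ret != 0)
        return ret;

    char comm[16];
    bpf_get_current_comm(comm, sizeof(comm));
    
    // Match "bad_proc" exactly: compare the terminating NUL too (9 bytes)
    if (__builtin_memcmp(comm, "bad_proc", 9) == 0) {
        char filename[64];
        bpf_probe_read_kernel_str(filename, sizeof(filename),
                                  file->f_path.dentry->d_name.name);
        // Matches any file named "passwd" (e.g. /etc/passwd), not a full path
        if (__builtin_memcmp(filename, "passwd", 7) == 0)
            return -EPERM;  // Deny access
    }
    return 0;
}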

seccomp-bpf — Syscall Filtering

  • seccomp-bpf uses classic BPF (cBPF), not eBPF, to filter a process's syscalls. Installed filters are inherited by child processes.
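seccomp-bpf: allow only specific syscalls (x86-64)
#include <stddef.h>          // offsetof
#include <sys/prctl.h>
#include <sys/syscall.h>     // __NR_* syscall numbers
#include <linux/audit.h>     // AUDIT_ARCH_X86_64
#include <linux/filter.h>
#include <linux/seccomp.h>
 
struct sock_filter filter[] = {
    // Check the architecture first: syscall numbers differ per ABI
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    // Load syscall number
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    // Allow read, write, exit_group (jt = instructions to skip on match)
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read,       3, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write,      2, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 1, 0),
    // Kill process on anything else
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
 
struct sock_fprog prog = {
    .len    = sizeof(filter) / sizeof(filter[0]),
    .filter = filter,
};
 
// Unprivileged processes must set no_new_privs before installing a filter
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);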
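Because a seccomp filter is plain classic BPF, its decisions can be checked without installing it. The sketch below is a tiny evaluator for the three instruction forms the filter uses:

- `BPF_LD | BPF_W | BPF_ABS`
- `BPF_JMP | BPF_JEQ | BPF_K`
- `BPF_RET | BPF_K`

It runs against a hand-filled `struct seccomp_data`. This is an illustration, not the kernel's interpreter, and `seccomp_eval` is a hypothetical name. It fails closed on anything it does not understand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>    /* struct sock_filter, BPF_STMT/BPF_JUMP, BPF_* codes */
#include <linux/seccomp.h>   /* struct seccomp_data, SECCOMP_RET_* */

/* Returns the SECCOMP_RET_* action the filter would produce for d. */
uint32_t seccomp_eval(const struct sock_filter *f, size_t len,
                      const struct seccomp_data *d)
{
    assert(f != NULL && d != NULL);
    uint32_t A = 0;                                /* cBPF accumulator */
    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *in = &f[pc];
        switch (in->code) {
        case BPF_LD | BPF_W | BPF_ABS:             /* A = 32-bit word at offset k */
            if ((size_t)in->k + sizeof(A) > sizeof(*d))
                return SECCOMP_RET_KILL_PROCESS;
            memcpy(&A, (const uint8_t *)d + in->k, sizeof(A));
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:            /* skip jt or jf instructions */
            pc += (A == in->k) ? in->jt : in->jf;
            break;
        case BPF_RET | BPF_K:
            return in->k;
        default:
            return SECCOMP_RET_KILL_PROCESS;       /* unsupported: fail closed */
        }
    }
    return SECCOMP_RET_KILL_PROCESS;               /* fell off the end */
}
```

Jump offsets are relative to the next instruction, which is why the loop's own `pc++` completes each jump.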
  • Container Use

    Docker, containerd, and Kubernetes all use seccomp-bpf profiles to restrict container syscall access.

bpftool — eBPF Swiss Army Knife

bpftool — inspect and manage eBPF objects
# List loaded BPF programs
bpftool prog list
bpftool prog show id 42
 
# Dump BPF program bytecode
bpftool prog dump xlated id 42
bpftool prog dump jited id 42      # JIT-compiled native code
 
# List all BPF maps
bpftool map list
bpftool map show id 10
bpftool map dump id 10             # dump all map entries
 
# Lookup map entry
bpftool map lookup id 10 key 01 00 00 00
 
# Update map entry
bpftool map update id 10 key 01 00 00 00 value 01 00 00 00
 
# Show BTF types
bpftool btf list
bpftool btf dump id 1
 
# Pin a program to the BPF filesystem (unpin by deleting the file)
bpftool prog pin id 42 /sys/fs/bpf/my_program
 
# Generate skeleton
bpftool gen skeleton program.bpf.o > program.skel.h
 
# Show BPF network attachments
bpftool net list
 
# Profile BPF program performance
bpftool prog profile id 42 duration 5 cycles instructions l1d_loads llc_misses
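`bpftool map lookup` and `bpftool map update` take keys and values as raw bytes in the order they sit in memory. On a little-endian machine, the `__u32` key 1 is therefore written `01 00 00 00`.

Below is a small helper that produces that argument string for a 32-bit key. The function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats a u32 as space-separated hex bytes in host memory order, the form
 * bpftool expects after "key"/"value". out needs room for 12 chars. */
void u32_to_bpftool_bytes(uint32_t v, char out[12])
{
    assert(out != NULL);
    unsigned char b[sizeof(v)];
    memcpy(b, &v, sizeof(v));                     /* host byte order */
    snprintf(out, 12, "%02x %02x %02x %02x", b[0], b[1], b[2], b[3]);
}
```

An IPv4 address key stored in network byte order keeps its dotted order: 10.0.0.1 is written `0a 00 00 01` on any host.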

Learn More


Key Projects & Tools

  • bcc/tools — 80+ ready-to-use eBPF tools.
  • bpftrace — High-level tracing language.
  • Cilium — eBPF-based Kubernetes networking + security.
  • Falco — eBPF-based runtime security.
  • Tetragon — eBPF security observability.