Qwen3.5-35B-A3B IQ4_XS: HauhauCS Uncensored vs AesSedai¶

Date: 2026-03-11

Goal¶

Compare the HauhauCS Uncensored fine-tune against the AesSedai quantization on coding tasks, speed, and VRAM.

Background¶

AesSedai IQ4_XS: MoE-optimized quant with Q8_0 attention + IQ3_S FFN experts
HauhauCS IQ4_XS: Uncensored fine-tune with "aggressive" personality adjustments

Both use the same base model (Qwen3.5-35B-A3B MoE) and IQ4_XS quantization, but differ in: 1. Fine-tuning: HauhauCS applies uncensoring and personality changes 2. Quantization approach: Different calibration/imatrix data

Setup¶

Component	Specification
CPU	AMD Ryzen 7 5700G (16 threads)
RAM	32 GB DDR4
GPU	NVIDIA GeForce RTX 4090 (24 GB VRAM)
OS	Ubuntu 24.04.4 LTS
Driver	NVIDIA 580.65.06
llama.cpp	TBD
context_size	40960 tokens

Models Tested¶

ID	Provider	Quant	Source
`hauhaucs`	HauhauCS	IQ4_XS	HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive
`aessedai`	AesSedai	IQ4_XS	AesSedai/Qwen3.5-35B-A3B-GGUF

Methodology¶

Uses the Job Queue Challenge benchmark (v2 methodology from 20260226_qwen35_35b_a3b_provider_comparison):

15 iterations per model for statistical significance
5 levels of increasing difficulty (L1-L5)
pytest validation for each level
Sampler: temp=0.6, top_p=0.95, top_k=20

Levels¶

Level	Task	Points
L1	Basic queue (add/get, FIFO)	25
L2	Retry with exponential backoff	25
L3	Priority scheduling	25
L4	Find & fix concurrency bug	15
L5	Multi-file refactoring	10

Results¶

Summary Table¶

Model	Mean Score	95% CI	TPS	VRAM (MB)
HauhauCS IQ4_XS	TBD	TBD	TBD	TBD
AesSedai IQ4_XS	TBD	TBD	TBD	TBD

Score Distribution¶

Model	Min	Max	Std Dev	Scores
HauhauCS	TBD	TBD	TBD	TBD
AesSedai	TBD	TBD	TBD	TBD

Running the Benchmark¶

# Start HauhauCS service
gpumod start qwen35-35b-a3b-hauhaucs-iq4xs

# Run benchmark (15 iterations)
uv run python docs/benchmarks/20260226_qwen35_job_queue_challenge/benchmark_runner.py \
    --model hauhaucs --port 7097 --iterations 15 \
    --output docs/benchmarks/20260311_hauhaucs_vs_aessedai/

# Stop and switch
gpumod stop qwen35-35b-a3b-hauhaucs-iq4xs
gpumod start qwen35-35b-a3b-aessedai-iq4xs

# Run benchmark for AesSedai
uv run python docs/benchmarks/20260226_qwen35_job_queue_challenge/benchmark_runner.py \
    --model aessedai --port 7094 --iterations 15 \
    --output docs/benchmarks/20260311_hauhaucs_vs_aessedai/

Files¶

File	Description
`result_hauhaucs.json`	HauhauCS benchmark results
`result_aessedai.json`	AesSedai benchmark results
`artifacts//iter_/`	Generated code per iteration

Key Questions¶

Does the HauhauCS fine-tune affect coding capability?
Is there a speed difference from the uncensoring process?
Do the "aggressive" personality changes impact code quality?