Qwen3.5 Model Comparison: 27B vs 35B-A3B

Date: 2026-02-25
Hardware: RTX 4090 (24GB VRAM)
Test: Multi-agent Tetris development (Planner → Developer → QA)

Models Under Test

Model	Preset	Quant	Port	VRAM	Parallel
Qwen3.5-27B	`qwen35-27b-multi`	Q4_K_XL	7082	17 GB	3 slots
Qwen3.5-35B-A3B	`qwen35-35b-q3-multi`	Q3_K_XL	7081	16 GB	3 slots
Qwen3.5-35B-A3B	`qwen35-35b-multi`	Q4_K_XL	7080	20 GB	3 slots

Architecture comparison:

- 27B: Dense model, 27B total / 27B active params
- 35B-A3B: Sparse MoE, 35B total / 3B active params
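The gap in active parameters can be made concrete with a little arithmetic. The sketch below uses only the parameter counts listed above. The ratio is a rough proxy for per-token compute, not a throughput prediction, and the names `MODELS` and `active_fraction` are illustrative.

```python
# Parameter counts in billions, taken from the architecture comparison.
MODELS = {
    "Qwen3.5-27B": {"total_b": 27, "active_b": 27},
    "Qwen3.5-35B-A3B": {"total_b": 35, "active_b": 3},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the weights that take part in computing one token."""
    return active_b / total_b


for name, params in MODELS.items():
    print(f"{name}: {active_fraction(**params):.1%} of weights active per token")

# Ratio of active parameters per token, dense vs MoE.
dense_vs_moe = MODELS["Qwen3.5-27B"]["active_b"] / MODELS["Qwen3.5-35B-A3B"]["active_b"]
print(f"27B uses {dense_vs_moe:.0f}x more active parameters per token")
```

The measured speedups in this benchmark (3.4x to 5.0x per phase) are smaller than the 9x parameter ratio, so active parameter count alone does not determine wall-clock time.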

Charts

Total Time Comparison

[Chart: Total Time]

Phase Breakdown

[Chart: Phase Breakdown]

VRAM Efficiency

[Chart: VRAM Efficiency]

Code Output Comparison

[Chart: Code Output]

Results

Summary

Model	VRAM	Total Time	Plan	Dev	QA	Lines	Valid
Qwen3.5-27B Q4	17 GB	134.0s	36.3s	72.1s	25.6s	312	YES
Qwen3.5-35B-A3B Q3	16 GB	34.8s	7.3s	20.1s	7.5s	322	YES
Qwen3.5-35B-A3B Q4	20 GB	37.8s	8.2s	22.0s	7.6s	311	YES
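Per-phase timings like the Plan, Dev and QA columns can be collected by timing each stage of the pipeline separately. The following is a minimal sketch of that idea, not the actual `benchmark_tetris.py` logic. `run_pipeline`, `PHASES` and the `fake_model` stub are hypothetical, and a real run would replace the stub with a call to the model server.

```python
import time
from typing import Callable, Dict, Tuple

PHASES = ("plan", "dev", "qa")


def run_pipeline(
    call_model: Callable[[str, str], str], task: str
) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Run Planner -> Developer -> QA and record wall-clock seconds per phase."""
    outputs: Dict[str, str] = {}
    timings: Dict[str, float] = {}
    prompt = task
    for phase in PHASES:
        start = time.perf_counter()
        outputs[phase] = call_model(phase, prompt)
        timings[phase] = time.perf_counter() - start
        prompt = outputs[phase]  # the next phase consumes this phase's output
    timings["total"] = sum(timings[p] for p in PHASES)
    return outputs, timings


def fake_model(phase: str, prompt: str) -> str:
    """Stand-in for a real model call (assumption); returns a tagged echo."""
    return f"[{phase}] {prompt}"


outputs, timings = run_pipeline(fake_model, "Build Tetris")
print(outputs["qa"])
print(sorted(timings))
```

Summing the phase timings this way matches how the Total Time column relates to the phase columns, up to rounding.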

Key Findings

- 35B-A3B models are far faster than 27B: 34.8s (Q3) and 37.8s (Q4) vs 134.0s, about 3.8x and 3.5x faster respectively
- 35B-A3B Q3 is fastest overall: 34.8s total, using only 16GB VRAM
- 35B-A3B Q4 is slightly slower than Q3: 37.8s vs 34.8s (8% slower, 4GB more VRAM)
- 27B is the slowest: as a dense model it runs all 27B parameters for every token, while the sparse MoE activates only about 3B
- All models produced valid, runnable code: 311-322 lines each

Speed Comparison

Phase	27B Q4	35B-A3B Q3	35B-A3B Q4	35B-A3B Q3 vs 27B
Planning	36.3s	7.3s	8.2s	5.0x faster
Development	72.1s	20.1s	22.0s	3.6x faster
QA Review	25.6s	7.5s	7.6s	3.4x faster
Total	134.0s	34.8s	37.8s	3.8x faster
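The speedup column is each 27B time divided by the matching 35B-A3B Q3 time. The sketch below recomputes it from the timings in the table and also derives the 35B-A3B Q4 ratios, which the table does not list. The `TIMES` dictionary and `speedup` helper are illustrative names.

```python
# Seconds per phase, copied from the speed comparison table.
TIMES = {
    "Planning": {"27B Q4": 36.3, "35B-A3B Q3": 7.3, "35B-A3B Q4": 8.2},
    "Development": {"27B Q4": 72.1, "35B-A3B Q3": 20.1, "35B-A3B Q4": 22.0},
    "QA Review": {"27B Q4": 25.6, "35B-A3B Q3": 7.5, "35B-A3B Q4": 7.6},
    "Total": {"27B Q4": 134.0, "35B-A3B Q3": 34.8, "35B-A3B Q4": 37.8},
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


for phase, t in TIMES.items():
    q3 = speedup(t["27B Q4"], t["35B-A3B Q3"])
    q4 = speedup(t["27B Q4"], t["35B-A3B Q4"])
    print(f"{phase:<12} Q3 vs 27B: {q3:.2f}x   Q4 vs 27B: {q4:.2f}x")
```

Because the published times are rounded to 0.1s, the recomputed ratios can differ from the table in the last digit.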

VRAM Efficiency

Model	VRAM	Time	VRAM Efficiency
35B-A3B Q3	16 GB	34.8s	Best (fastest, lowest VRAM)
27B Q4	17 GB	134.0s	Worst (slow, mid VRAM)
35B-A3B Q4	20 GB	37.8s	Good (fast, highest VRAM)
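The ratings above are qualitative. One way to put a number on them is to multiply VRAM by total time, so that lower is better. This GB·s figure is an assumed metric chosen for illustration, not necessarily what `generate_charts.py` plots.

```python
# (model, VRAM in GB, total time in seconds), copied from the table.
RUNS = [
    ("35B-A3B Q3", 16, 34.8),
    ("27B Q4", 17, 134.0),
    ("35B-A3B Q4", 20, 37.8),
]


def vram_seconds(vram_gb: float, total_s: float) -> float:
    """Assumed cost metric: memory held multiplied by time taken."""
    return vram_gb * total_s


# Lowest cost first.
ranked = sorted(RUNS, key=lambda run: vram_seconds(run[1], run[2]))
for name, vram_gb, total_s in ranked:
    print(f"{name:<12} {vram_seconds(vram_gb, total_s):8.1f} GB*s")
```

Under this metric the ordering agrees with the Best / Good / Worst labels in the table.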

Generated Code & QA Analysis

All three models produced functional Tetris games with similar structure:

Model	Lines	Chars	Syntax	QA Verdict
27B Q4	312	11,279	VALID	Issues noted
35B-A3B Q3	322	11,260	VALID	Issues noted
35B-A3B Q4	311	10,260	VALID	Issues noted
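The Lines, Chars and Syntax columns can be derived from a generated file without running it. The sketch below shows one way to do this with the standard `ast` module. Whether `benchmark_tetris.py` checks syntax this way is an assumption, and `inspect_source` is a hypothetical helper.

```python
import ast


def inspect_source(source: str) -> dict:
    """Count lines and characters and check that the source parses as Python."""
    try:
        ast.parse(source)
        syntax = "VALID"
    except SyntaxError:
        syntax = "INVALID"
    return {
        "lines": len(source.splitlines()),
        "chars": len(source),
        "syntax": syntax,
    }


good = "def drop(y):\n    return y + 1\n"
bad = "def drop(y:\n    return y + 1\n"  # unbalanced parenthesis

print(inspect_source(good))
print(inspect_source(bad))
```

A parse check only confirms the file is syntactically valid. It does not prove the game runs or plays correctly, which is what the QA phase and a manual run cover.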

QA Review Summary

All three QA agents identified similar potential issues in the generated code:

Common observations across models:

- Collision detection edge cases (pieces near board edges)
- Rotation wall-kick not fully implemented
- Score calculation could have edge cases with >4 lines
- Game over detection timing

Verdict: All three games compile and run correctly. The QA agents were thorough in identifying potential edge cases, but the core gameplay functions properly. The issues noted are improvements rather than bugs blocking playability.

Code Quality Comparison

Aspect	27B Q4	35B-A3B Q3	35B-A3B Q4
Class structure	Good	Good	Good
All 7 pieces	Yes	Yes	Yes
Rotation states	4 each	4 each	4 each
Line clearing	Yes	Yes	Yes
Scoring	Yes	Yes	Yes
Game over	Yes	Yes	Yes
Controls help	Yes	Yes	Yes

All three models produced structurally similar, fully-featured implementations.

Play the Games

# 27B Q4 version
python docs/benchmarks/20260225_qwen35_comparison/tetris_qwen35-27b-multi.py

# 35B-A3B Q3 version (recommended - fastest model)
python docs/benchmarks/20260225_qwen35_comparison/tetris_qwen35-35b-q3-multi.py

# 35B-A3B Q4 version
python docs/benchmarks/20260225_qwen35_comparison/tetris_qwen35-35b-multi.py

Controls: ← → move, ↑ rotate, ↓ soft drop, Space hard drop, q quit

Recommendation

Use Qwen3.5-35B-A3B Q3_K_XL as the daily driver:

- 3.8x faster than Qwen3.5-27B
- Uses less VRAM (16GB vs 17GB)
- Produces code of equivalent quality
- Best VRAM efficiency of all tested models

Avoid the 27B model for this workload: it is about 3.8x slower, with no quality benefit observed in this benchmark.

Reproducing

# Sync presets
uv run gpumod preset sync

# Test each model (one at a time due to VRAM)
uv run gpumod service start qwen35-27b-multi
uv run python docs/benchmarks/20260225_qwen35_comparison/benchmark_tetris.py \
  --model qwen35-27b-multi --output docs/benchmarks/20260225_qwen35_comparison/

uv run gpumod service stop qwen35-27b-multi
uv run gpumod service start qwen35-35b-q3-multi
uv run python docs/benchmarks/20260225_qwen35_comparison/benchmark_tetris.py \
  --model qwen35-35b-q3-multi --output docs/benchmarks/20260225_qwen35_comparison/

uv run gpumod service stop qwen35-35b-q3-multi
uv run gpumod service start qwen35-35b-multi
uv run python docs/benchmarks/20260225_qwen35_comparison/benchmark_tetris.py \
  --model qwen35-35b-multi --output docs/benchmarks/20260225_qwen35_comparison/

# Generate charts
uv run python docs/benchmarks/20260225_qwen35_comparison/generate_charts.py
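The same sequence can be scripted so that each preset is started, benchmarked and stopped in turn. The sketch below only assembles the commands shown above and prints them by default. `commands_for` and `run_all` are hypothetical helpers. Unlike the manual steps, it also stops the last service before generating charts.

```python
import subprocess
from typing import List

BENCH_DIR = "docs/benchmarks/20260225_qwen35_comparison"
PRESETS = ["qwen35-27b-multi", "qwen35-35b-q3-multi", "qwen35-35b-multi"]


def commands_for(preset: str) -> List[List[str]]:
    """Start, benchmark and stop one preset (one model at a time due to VRAM)."""
    return [
        ["uv", "run", "gpumod", "service", "start", preset],
        ["uv", "run", "python", f"{BENCH_DIR}/benchmark_tetris.py",
         "--model", preset, "--output", f"{BENCH_DIR}/"],
        ["uv", "run", "gpumod", "service", "stop", preset],
    ]


def run_all(dry_run: bool = True) -> List[List[str]]:
    """Build the full command plan; execute it only when dry_run is False."""
    plan = [["uv", "run", "gpumod", "preset", "sync"]]
    for preset in PRESETS:
        plan.extend(commands_for(preset))
    plan.append(["uv", "run", "python", f"{BENCH_DIR}/generate_charts.py"])
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # abort on the first failure
    return plan


plan = run_all(dry_run=True)
```

With `check=True`, a failed benchmark aborts the run before the matching `service stop`, so a more careful version would wrap each preset in `try`/`finally`.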

Files

File	Description
`benchmark_tetris.py`	Benchmark script
`generate_charts.py`	Chart generator
`result_qwen35-27b-multi.json`	27B results
`result_qwen35-35b-q3-multi.json`	35B-A3B Q3 results
`result_qwen35-35b-multi.json`	35B-A3B Q4 results
`results_combined.json`	Combined comparison
`tetris_qwen35-*.py`	Generated Tetris games
`charts/`	Generated comparison charts
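The per-model result files can be loaded for further analysis. The JSON schema is not documented here, so the sketch below assumes only that each `result_*.json` file holds a JSON object, and it lists the top-level keys rather than guessing field names. `load_results` is a hypothetical helper, and the demo uses placeholder files in a temporary directory.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def load_results(directory: Path) -> Dict[str, dict]:
    """Map preset name -> parsed JSON for every result_<preset>.json file."""
    results = {}
    for path in sorted(directory.glob("result_*.json")):
        preset = path.stem[len("result_"):]
        results[preset] = json.loads(path.read_text(encoding="utf-8"))
    return results


# Demo with placeholder content; the real files live in the benchmark directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "result_qwen35-27b-multi.json").write_text('{"placeholder": true}')
    (demo_dir / "results_combined.json").write_text("{}")  # not matched by the glob
    loaded = load_results(demo_dir)
    for preset, data in loaded.items():
        print(preset, sorted(data))
```

The `result_*.json` pattern deliberately skips `results_combined.json`, which has a different prefix.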