# How We Rasterized 3D Graphics Like It's 1994
The PlayStation 1 doesn’t have a modern GPU with programmable shaders. It has a fixed-function chip that draws flat and Gouraud-shaded triangles, textured quads, and lines — one pixel at a time — into a 1 MB block of video RAM. There’s no z-buffer, no perspective-correct textures, and no floating-point math.
In this article, we’ll walk through how we implemented a software rasterizer for the PS1 GPU in our emulator, RogEm. We’ll cover how VRAM works, how the GPU receives and processes drawing commands, how triangles are filled pixel by pixel, and how textures are sampled from palettized formats that predate PNG.
## A Framebuffer Made of Raw Pixels
Before we draw anything, we need somewhere to draw it. The PS1 GPU has 1 megabyte of Video RAM (VRAM), organized as a 1024×512 pixel grid. Each pixel is 16 bits in ABGR1555 format — 5 bits per color channel and 1 bit for a semi-transparency mask:
```
Bit:  15 │ 14 ─── 10 │ 9 ──── 5 │ 4 ──── 0
      │  │   Blue    │   Green  │   Red
      └─ Mask bit (semi-transparency)
```
That’s 32,768 possible colors — a far cry from today’s 16 million, but enough to render Crash Bandicoot.
In our emulator, VRAM is a flat byte array:
```cpp
std::array<uint8_t, 1024 * 2 * 512> m_vram; // 1,048,576 bytes
```
Writing a single pixel means computing a byte offset and storing two bytes (little-endian):
```cpp
void GPU::setPixel(const Vec2i &pos, uint16_t color)
{
    int index = (pos.y * 1024 + pos.x) * 2;
    m_vram[index] = color & 0xFF;
    m_vram[index + 1] = color >> 8;
}
```
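The texture-sampling code later in this article reads VRAM back through a matching `getPixel`. Here is a minimal sketch of the pair as free functions (the emulator's real methods live on `GPU`; `setPixel` is repeated only so the snippet is self-contained):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Vec2i { int x, y; };

// Same flat, little-endian VRAM layout as above: 1024×512 pixels, 2 bytes each.
static std::array<uint8_t, 1024 * 2 * 512> vram{};

void setPixel(const Vec2i &pos, uint16_t color)
{
    int index = (pos.y * 1024 + pos.x) * 2;
    vram[index] = color & 0xFF;       // low byte first (little-endian)
    vram[index + 1] = color >> 8;
}

uint16_t getPixel(const Vec2i &pos)
{
    int index = (pos.y * 1024 + pos.x) * 2;
    return static_cast<uint16_t>(vram[index] | (vram[index + 1] << 8));
}
```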
VRAM isn’t just for the framebuffer though — it also stores textures and color palettes (CLUTs). The game and the GPU share this space, which means that a game’s textures, its display output, and its rendering target all live in the same flat 1 MB buffer. The game is responsible for laying things out so they don’t overlap.
```mermaid
block-beta
columns 4
block:row1:4
columns 3
A["Framebuffer\n(display area)\n320×240 or 640×480"]
B["Textures\n(4-bit, 8-bit, or 15-bit)"]
C["CLUTs\n(Color\nLook-Up\nTables)"]
end
D["1024 × 512 pixels = 1 MB VRAM (ABGR1555)"]:4
```
The PS1’s VRAM layout is entirely up to the game. There’s no hardware-enforced separation between the framebuffer and texture data. A buggy game could literally draw polygons over its own textures.
## How Commands Reach the GPU
The CPU talks to the GPU through two IO ports:
- GP0 (`0x1F801810`): rendering commands and VRAM data transfers
- GP1 (`0x1F801814`): display control (resolution, video mode, display area)
When the CPU writes a 32-bit word to GP0, the top 3 bits tell the GPU what category of command it is:
| Bits 31–29 | Category | Examples |
|---|---|---|
| `001` | Draw Polygon | Triangles, quads (flat, Gouraud, textured) |
| `010` | Draw Line | Lines, polylines (flat or Gouraud) |
| `011` | Draw Rectangle | Sprites (1×1, 8×8, 16×16, variable) |
| `101` | CPU → VRAM | Upload texture/CLUT data |
| `110` | VRAM → CPU | Read back pixels |
| `100` | VRAM → VRAM | Copy regions within VRAM |
| `111` | Environment | Set draw mode, texture window, draw area |
| `000` | Misc | NOP, cache clear, fill rectangle |
The GPU operates as a state machine with four states:
```mermaid
stateDiagram
[*] --> WaitingForCommand
WaitingForCommand --> ReceivingParameters : GP0 command word
WaitingForCommand --> WaitingForCommand : Immediate command
ReceivingParameters --> WaitingForCommand : All params received
WaitingForCommand --> ReceivingData : CPU→VRAM transfer start
ReceivingData --> WaitingForCommand : Transfer complete
WaitingForCommand --> SendingData : VRAM→CPU transfer start
SendingData --> WaitingForCommand : Transfer complete
```
Most draw commands require multiple 32-bit words. For example, a Gouraud-shaded textured triangle needs 9 words: command + vertex 0 color, vertex 0 position, texcoord 0 + CLUT info, vertex 1 color, vertex 1 position, texcoord 1 + texture page info, vertex 2 color, vertex 2 position, texcoord 2. The GPU collects parameters one by one until it has enough, then fires the appropriate draw function.
## Decoding a Polygon Command
Let’s zoom into the most interesting command category: draw polygon. A single GP0 command word packs a lot of information:
```
31──29   28     27    26    25    24   23────────0
 001   Shade  Quad   Tex  Semi  Raw   Color (R8G8B8)
         │      │     │     │     │
         │      │     │     │     └─ Raw texture (skip color modulation)
         │      │     │     └─ Semi-transparent blending
         │      │     └─ Textured polygon
         │      └─ 0 = Triangle (3 verts), 1 = Quad (4 verts)
         └─ 0 = Flat shading, 1 = Gouraud shading
```
From these 5 flag bits, the GPU knows exactly how many parameters to expect. The formula:
```cpp
nbParams = nbVertices * (1 + shaded + textured) - shaded + 1;
```
This accounts for the fact that each vertex has a position word, optionally a color word (Gouraud), and optionally a texture-coordinate word. The first vertex's color is embedded in the command word itself (hence the `- shaded`).
Here are a few concrete examples:
| Command | Shade | Quad | Tex | Params | Description |
|---|---|---|---|---|---|
| `0x20` | 0 | 0 | 0 | 4 | Flat-shaded triangle |
| `0x30` | 1 | 0 | 0 | 6 | Gouraud-shaded triangle |
| `0x2C` | 0 | 1 | 1 | 9 | Flat-textured quad |
| `0x3C` | 1 | 1 | 1 | 12 | Gouraud-textured quad |
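The formula can be checked against those examples with a tiny standalone decoder. This is a sketch (the function name is illustrative); the flag bits follow the layout diagram above:

```cpp
#include <cassert>
#include <cstdint>

// Parameter-word count for a GP0 polygon command, derived from the flag bits
// of the command byte (bits 31-24 of the first word).
int polygonParamCount(uint8_t cmd)
{
    int shaded     = (cmd >> 4) & 1; // bit 28: Gouraud shading
    int quad       = (cmd >> 3) & 1; // bit 27: quad vs triangle
    int textured   = (cmd >> 2) & 1; // bit 26: textured
    int nbVertices = quad ? 4 : 3;
    return nbVertices * (1 + shaded + textured) - shaded + 1;
}
```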
## Rasterizing a Triangle: Edge Functions
Now we get to the heart of it: given three vertices on screen, how do we fill in the pixels?
We use the edge function method, which is a classic technique based on the cross product. For a triangle defined by vertices A, B, C, the edge function for edge AB evaluated at point P is:
\[E_{AB}(P) = (P_x - A_x)(B_y - A_y) - (P_y - A_y)(B_x - A_x)\]

If P is on the left side of the directed edge A→B, the result is positive. If it’s on the right, it’s negative. If P is on the edge itself, it’s zero.
A point P is inside the triangle if and only if all three edge functions have the same sign (all positive for counter-clockwise winding, all negative for clockwise):
```cpp
static int edgeFunction(const Vec2i &a, const Vec2i &b, const Vec2i &c)
{
    return (c.x - a.x) * (b.y - a.y) - (c.y - a.y) * (b.x - a.x);
}
```
The algorithm is straightforward:
```mermaid
flowchart TD
A["Compute bounding box of the 3 vertices\n(clamped to VRAM: 0–1023 × 0–511)"] --> B["Compute triangle area via edgeFunction(v0, v1, v2)\nIf area == 0 → degenerate, skip"]
B --> C["For each pixel (x, y) in bounding box:"]
C --> D["Compute w0, w1, w2\n(edge function for each edge)"]
D --> E{"All ≥ 0\nor all ≤ 0?"}
E -- "No" --> C
E -- "Yes (inside)" --> F["Compute barycentric coords:\nα = w0/area, β = w1/area, γ = w2/area"]
F --> G["Interpolate color, UVs\nSample texture if needed\nsetPixel()"]
G --> C
```
In code, the core loop looks like this:
```cpp
int area = edgeFunction(v0, v1, v2);
if (area == 0) return; // degenerate triangle

for (int y = minY; y <= maxY; y++) {
    for (int x = minX; x <= maxX; x++) {
        Vec2i p = {x, y};
        int w0 = edgeFunction(v1, v2, p);
        int w1 = edgeFunction(v2, v0, p);
        int w2 = edgeFunction(v0, v1, p);
        // Accept both winding orders
        if ((w0 >= 0 && w1 >= 0 && w2 >= 0) ||
            (w0 <= 0 && w1 <= 0 && w2 <= 0))
        {
            float alpha = (float)w0 / area;
            float beta  = (float)w1 / area;
            float gamma = (float)w2 / area;
            // ... shade, texture, write pixel
        }
    }
}
```
We accept both clockwise and counter-clockwise winding orders. The PS1 doesn’t cull back-faces at the rasterizer level — that’s the GTE’s job via the `NCLIP` command (see our GTE article). Games that use back-face culling check `NCLIP` and simply don’t send the polygon to the GPU.
## What About Quads?
PS1 games love quads. The GPU handles them by splitting each quad into two triangles:
```cpp
void GPU::rasterizePoly4(const Vertex *verts, ...)
{
    rasterizePoly3(verts, ...);     // Triangle: v0, v1, v2
    rasterizePoly3(verts + 1, ...); // Triangle: v1, v2, v3
}
```
Simple, effective, and exactly how the real hardware does it.
## From Flat Colors to Gouraud Shading

### Flat Shading
The simplest case: one color for the entire polygon. The color is packed in the first command word as 8-bit RGB:
```cpp
ColorRGBA color;
color.r = command & 0xFF;
color.g = (command >> 8) & 0xFF;
color.b = (command >> 16) & 0xFF;
```
Every pixel in the triangle gets this exact same color, converted to ABGR1555 and written to VRAM.
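The 8-bit-to-15-bit conversion can be sketched as a small helper, assuming plain truncation of the low bits (the real hardware can also dither, which ours doesn't apply yet) and the bit layout from the VRAM section:

```cpp
#include <cassert>
#include <cstdint>

// Truncate 8-bit channels to 5 bits and pack them into the VRAM format:
// bit 15 = mask, bits 14-10 = blue, 9-5 = green, 4-0 = red.
uint16_t toABGR1555(uint8_t r, uint8_t g, uint8_t b, bool mask = false)
{
    return static_cast<uint16_t>((mask ? 0x8000 : 0x0000) |
                                 ((b >> 3) << 10) |
                                 ((g >> 3) << 5)  |
                                 (r >> 3));
}
```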
### Gouraud Shading
When bit 28 is set, each vertex carries its own color. The GPU interpolates between them using the barycentric coordinates we already computed:
```cpp
ColorRGBA interpolateColor(const ColorRGBA &c0, const ColorRGBA &c1,
                           const ColorRGBA &c2,
                           float alpha, float beta, float gamma)
{
    ColorRGBA color;
    color.r = static_cast<uint8_t>(c0.r * alpha + c1.r * beta + c2.r * gamma);
    color.g = static_cast<uint8_t>(c0.g * alpha + c1.g * beta + c2.g * gamma);
    color.b = static_cast<uint8_t>(c0.b * alpha + c1.b * beta + c2.b * gamma);
    return color;
}
```
This creates smooth color gradients across the polygon surface — essential for the PS1’s characteristic look.
```mermaid
graph LR
subgraph "Flat Shading"
F["All pixels = same color\n■■■■\n■■■■\n■■■■"]
end
subgraph "Gouraud Shading"
G["Color interpolated per-pixel\n🔴🟠🟡\n🟠🟡🟢\n🟡🟢🔵"]
end
```
## Texture Mapping: Palettes and Pixel Packing
PS1 textures don’t work like modern textures. They come in three color depths, and two of them use color look-up tables (CLUTs) — essentially palettes, just like a GIF image.
### The Three Texture Modes
| Mode | Bits/pixel | Texels per 16-bit VRAM word | Needs CLUT? |
|---|---|---|---|
| 4-bit | 4 | 4 texels packed | Yes (16-color palette) |
| 8-bit | 8 | 2 texels packed | Yes (256-color palette) |
| 15-bit | 16 | 1 texel direct | No (direct ABGR1555) |
For 4-bit textures, four pixels are squeezed into a single 16-bit VRAM word. To extract a single texel, we need bit-shifting:
```cpp
// 4-bit mode: 4 texels per 16-bit word
uint16_t texData = getPixel({texPageBaseX + u / 4, texPageBaseY + v});
uint8_t index = (texData >> ((u % 4) * 4)) & 0xF;
return getPixel({clutX + index, clutY}); // CLUT lookup
```
For 8-bit textures, two pixels share one VRAM word:
```cpp
// 8-bit mode: 2 texels per 16-bit word
uint16_t texData = getPixel({texPageBaseX + u / 2, texPageBaseY + v});
uint8_t index = (texData >> ((u % 2) * 8)) & 0xFF;
return getPixel({clutX + index, clutY});
```
For 15-bit, it’s a direct read — no palette indirection.
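To make the 4-bit packing concrete, here's a standalone round-trip sketch: pack four CLUT indices into one 16-bit word, then extract one with the same shift the sampler above uses (function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Pack four 4-bit CLUT indices into one 16-bit VRAM word: texel 0 occupies
// the lowest nibble, texel 3 the highest.
uint16_t pack4(uint8_t i0, uint8_t i1, uint8_t i2, uint8_t i3)
{
    return static_cast<uint16_t>((i0 & 0xF) | ((i1 & 0xF) << 4) |
                                 ((i2 & 0xF) << 8) | ((i3 & 0xF) << 12));
}

// Extract the CLUT index for horizontal texel coordinate u.
uint8_t extract4(uint16_t word, int u)
{
    return (word >> ((u % 4) * 4)) & 0xF;
}
```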
### Where Do Textures Live?
Textures are stored directly in VRAM. The texture page base is specified per-polygon via the command parameters:
- `texPageX` in units of 64 pixels (so values 0–15 cover the 1024px width)
- `texPageY` as 0 or 256 (top or bottom half of VRAM)

The CLUT’s position is also in VRAM, specified as part of the first textured vertex’s parameter word. This means textures, palettes, and the framebuffer all coexist in the same 1 MB — games have to juggle these layouts carefully.
```mermaid
flowchart LR
UV["UV from vertex\n(u, v) 8-bit each"] --> Sample["Sample VRAM at\ntexPage + offset"]
Sample --> Mode{"Color\nmode?"}
Mode -- "4-bit" --> Pack4["Extract 4-bit index\nfrom packed word"]
Mode -- "8-bit" --> Pack8["Extract 8-bit index\nfrom packed word"]
Mode -- "15-bit" --> Direct["Use ABGR1555 directly"]
Pack4 --> CLUT["CLUT lookup\nin VRAM"]
Pack8 --> CLUT
CLUT --> Pixel["Final texel color"]
Direct --> Pixel
```
### Texture × Vertex Color Modulation
Once we have the texel color, it’s blended with the vertex color through multiplicative modulation:
```cpp
finalR = (texR * vertexR) / 128;
finalG = (texG * vertexG) / 128;
finalB = (texB * vertexB) / 128;
```
The divide-by-128 (not 255!) means a vertex color of (128, 128, 128) preserves the texture’s original color. Values above 128 brighten the texture; below darkens it. This is how the PS1 achieves “baked lighting” on textured polygons.
If the `rawTexture` flag is set (bit 24), this modulation is skipped and the texture color is used as-is — useful for UI elements or pre-lit textures.
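As a small, testable sketch of the per-channel modulation (the clamp to 255 is an assumption of this sketch; on real hardware the result saturates at some point before the 5-bit write, and implementations differ on where):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Texture × vertex-color modulation for one channel. 128 is the neutral
// value; values above 128 brighten the texel, values below darken it.
uint8_t modulate(uint8_t tex, uint8_t vertex)
{
    return static_cast<uint8_t>(std::min(tex * vertex / 128, 255));
}
```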
Transparency check: if a texel’s ABGR1555 value is exactly 0x0000, the pixel is considered fully transparent and is skipped entirely. This is the PS1’s version of alpha testing — simple, binary, and cheap.
## Affine Texture Mapping (and Why Things Wobble)
You might have noticed: our UV interpolation uses the barycentric coordinates directly:
```cpp
float u = alpha * verts[0].u + beta * verts[1].u + gamma * verts[2].u;
float v = alpha * verts[0].v + beta * verts[1].v + gamma * verts[2].v;
```
This is affine texture mapping — no perspective correction. Modern GPUs divide UVs by the depth (W) value to get perspective-correct textures. The PS1… doesn’t.
The result is the characteristic “texture warping” that PS1 games are famous for. When a polygon is viewed at an angle, the texture appears to swim and bend. Games work around this by subdividing large polygons into smaller ones (the GTE’s `NCLIP` helps decide when to subdivide), but the wobble is an inherent part of the PS1 aesthetic.
This is actually one of those rare cases where not implementing a feature is the correct emulation behavior. We match the hardware by doing it “wrong.”
If you’ve ever wondered why PS1 textures look “wobbly” compared to the N64 (which had perspective correction), this is why. It’s not a bug — it’s a 1994 cost-saving decision baked into the silicon.
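To see how much the two approaches disagree, take an edge whose far vertex is three times deeper than the near one: affine interpolation samples the texture halfway through at the screen-space midpoint, while the perspective-correct answer (interpolate u/w and 1/w, then divide) is only a quarter of the way in. A standalone sketch, with illustrative function names:

```cpp
#include <cassert>
#include <cmath>

// U at the screen-space midpoint of an edge from u0 (at depth w0)
// to u1 (at depth w1). Affine ignores depth entirely.
float affineMidU(float u0, float u1)
{
    return 0.5f * (u0 + u1);
}

// Perspective-correct: interpolate u/w and 1/w linearly, then divide.
// This is the step the PS1 hardware skips.
float perspectiveMidU(float u0, float w0, float u1, float w1)
{
    float num = 0.5f * (u0 / w0) + 0.5f * (u1 / w1);
    float den = 0.5f * (1.0f / w0) + 0.5f * (1.0f / w1);
    return num / den;
}
```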
## Lines: Bresenham’s Algorithm
Lines are drawn using the classic Bresenham’s line algorithm — the same algorithm taught in every computer graphics course since the 1960s. For Gouraud-shaded lines, we interpolate colors linearly along the line:
```cpp
float dr = (float)(v1.color.r - v0.color.r) / steps;
float dg = (float)(v1.color.g - v0.color.g) / steps;
float db = (float)(v1.color.b - v0.color.b) / steps;

// For each step along the line:
currentR += dr;
currentG += dg;
currentB += db;
```
Polylines (connected line segments) are handled by a `0x5xxx5xxx` termination sentinel — the GPU keeps accepting line segments until it sees that magic value.
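The terminator test is a single mask-and-compare: per the psx-spx documentation, any word whose top nibble in each halfword is 5 ends the polyline. A sketch:

```cpp
#include <cassert>
#include <cstdint>

// A GP0 polyline ends at any word matching the 5xxx5xxx pattern,
// i.e. the top nibble of each 16-bit halfword equals 5.
bool isPolylineTerminator(uint32_t word)
{
    return (word & 0xF000F000u) == 0x50005000u;
}
```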
## Rectangles: The Fast Path
Rectangles (sprites) take a shortcut: no edge function math needed. It’s a simple nested loop:
```cpp
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        uint8_t u = vert.u + x;
        uint8_t v = vert.v + y;
        uint16_t texColor = sampleTexture(u, v, ...);
        // modulate, write pixel
    }
}
```
The GPU supports four rectangle sizes encoded in the command: variable (size in a parameter word), 1×1 (single pixel), 8×8, and 16×16. The fixed sizes are used heavily for tile-based backgrounds and UI elements.
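The size field lives in bits 28–27 of the rectangle command word, and decoding it is a four-way switch. A sketch (the struct and function names are illustrative; the command bytes in the test follow the psx-spx encodings, e.g. `0x68` is the monochrome 1×1 dot):

```cpp
#include <cassert>
#include <cstdint>

struct RectSize { int w, h; bool variable; };

// Decode the rectangle size from bits 28-27 of the command word
// (here taken from the command byte: bits 4-3).
RectSize rectSize(uint8_t cmd)
{
    switch ((cmd >> 3) & 3) {
    case 1:  return {1, 1, false};
    case 2:  return {8, 8, false};
    case 3:  return {16, 16, false};
    default: return {0, 0, true}; // variable: size arrives in a parameter word
    }
}
```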
## VRAM Transfers: The Data Highway
Not all GPU commands draw things. Three copy operations move raw pixel data around:
```mermaid
flowchart LR
CPU["CPU\n(main RAM)"] -- "GP0 cmd 0xA0\n32-bit words → 2 pixels each" --> VRAM["VRAM\n1024×512"]
VRAM -- "GP0 cmd 0xC0\nread back pixels" --> CPU
VRAM -- "GP0 cmd 0x80\npixel-by-pixel copy" --> VRAM
```
- CPU → VRAM: Used to upload textures and CLUTs. The GPU switches to the `ReceivingDataWords` state, and each subsequent 32-bit write is unpacked into two 16-bit pixels.
- VRAM → CPU: Used for screenshots or reading back rendered data. The GPU switches to the `SendingDataWords` state.
- VRAM → VRAM: Immediate pixel-by-pixel copy within VRAM. Used for double-buffering tricks and texture manipulation.
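The unpacking in the CPU → VRAM case is a straight halfword split, low halfword first:

```cpp
#include <cassert>
#include <cstdint>

// Each 32-bit word of a CPU→VRAM transfer carries two 15-bit pixels:
// the low halfword is written to VRAM first, then the high halfword.
void unpackTransferWord(uint32_t word, uint16_t &first, uint16_t &second)
{
    first = word & 0xFFFF;
    second = word >> 16;
}
```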
## Timing: Synced to the CRT
The GPU doesn’t just draw — it also generates the video timing signal. Our emulator simulates NTSC timing:
- GPU clock: runs at 53.693 MHz (the CPU runs at 33.868 MHz, so we multiply CPU cycles by `11/7`)
- One scanline: 3413 GPU cycles
- One frame: 263 scanlines
- VBlank IRQ: fired at the end of each frame
```cpp
void GPU::update(uint32_t cpuCycles)
{
    m_gpuCycles += cpuCycles * 11 / 7; // CPU→GPU clock ratio
    while (m_gpuCycles >= 3413) {
        m_gpuCycles -= 3413;
        m_scanline++;
        if (m_scanline >= 263) {
            m_scanline = 0;
            // Fire VBlank interrupt
            m_interruptController->requestInterrupt(InterruptType::VBLANK);
        }
    }
}
```
This VBlank interrupt is what drives the game loop — most PS1 games synchronize their logic and rendering to the 60 Hz (NTSC) or 50 Hz (PAL) refresh rate.
## Getting Pixels on Screen
After the GPU has rasterized everything into VRAM, we need to actually display it. In RogEm, the display pipeline is:
- Grab the raw VRAM byte array from the GPU
- Upload the entire 1024×512 buffer to an OpenGL texture using `glTexSubImage2D` with the `GL_UNSIGNED_SHORT_1_5_5_5_REV` format — which conveniently matches the PS1’s native ABGR1555 pixel format
- Compute UV coordinates based on the current display area settings (games can display from any position in VRAM)
- Render the texture into an ImGui window
The PS1 supports multiple display resolutions (256×240, 320×240, 512×240, 640×240, 640×480 interlaced), all configured through GP1 commands. The display area is just a window into the larger 1024×512 VRAM.
## Implementation Notes

### What’s Not Here (Yet)
Honesty is important, so here’s what our rasterizer doesn’t implement yet:
- Semi-transparency blending: The flag is parsed and stored, but the actual alpha blending between the source and destination pixels isn’t applied during rasterization. The PS1 supports 4 blending modes (`B/2+F/2`, `B+F`, `B-F`, `B+F/4`).
- Dithering: The PS1 applies an ordered 4×4 dither matrix when converting from 24-bit internal color to 15-bit VRAM. We track the flag but don’t apply it. This is what gives PS1 games their characteristic “grainy” gradients.
- Draw area clipping: We clamp to VRAM edges but don’t enforce the configurable draw-area rectangle that games use to prevent drawing outside the current framebuffer.
- Mask bit: The PS1 can protect VRAM pixels from being overwritten (useful for UI overlays). We parse the settings but don’t enforce them.
- Texture window masking: Allows texture-coordinate wrapping/clamping within a sub-region, used for animated textures. Parsed but not applied.
### What We Got Right
- Both winding orders: accepting CW and CCW triangles is critical — PS1 games don’t have a consistent winding convention.
- All three texture depths: 4-bit CLUT, 8-bit CLUT, and 15-bit direct all work correctly, with proper bit-packing extraction.
- Affine textures: Deliberately not implementing perspective correction is the correct choice for PS1 emulation.
## The Full Pipeline
Putting it all together, here’s the complete rendering pipeline for a single Gouraud-shaded textured triangle:
```mermaid
flowchart TD
A["CPU writes GP0 word\n0x3C (Gouraud + Textured + Quad)"] --> B["GPUCommand parses flags:\nshaded=1, quad=1, textured=1"]
B --> C["GPU collects 12 parameter words\n(colors, positions, UVs, CLUT, texpage)"]
C --> D["drawPolygon() extracts vertices"]
D --> E["rasterizePoly4() splits quad\ninto two triangles"]
E --> F["rasterizePoly3():\nCompute bounding box"]
F --> G["For each pixel:\nedgeFunction × 3"]
G --> H{"Inside\ntriangle?"}
H -- No --> G
H -- Yes --> I["Barycentric coords\nα, β, γ"]
I --> J["Interpolate Gouraud color\nR = R0·α + R1·β + R2·γ"]
I --> K["Interpolate UVs\nu = u0·α + u1·β + u2·γ"]
K --> L["sampleTexture()\n4-bit/8-bit CLUT or 15-bit direct"]
L --> M{"texel == 0x0000?"}
M -- "Yes (transparent)" --> G
M -- No --> N["Modulate:\nfinal = tex × vertex / 128"]
J --> N
N --> O["Convert to ABGR1555\nsetPixel() → VRAM"]
O --> G
```
## What We Learned
Building a software rasterizer from scratch, even for 1994-era hardware, teaches you things that no graphics API tutorial will:
Every pixel costs: Without hardware acceleration, you feel every multiplication, every branch, every memory access. It gives you a visceral understanding of why GPUs exist.
The PS1’s constraints were brilliant: No z-buffer means games use the GTE’s `AVSZ` command and ordering tables to sort polygons by depth. No perspective correction means games subdivide geometry to minimize warping. Every hardware limitation spawned a creative software workaround.

Simple ≥ complex: The edge function rasterizer is about 60 lines of code. Bresenham’s line algorithm is 30 lines. The texture sampler with all three modes is under 30 lines. The entire rasterizer fits in one 886-line file. The PS1 GPU is proof that you can build a complete 3D rendering pipeline without any of the complexity of modern graphics.

VRAM as a unified memory was ahead of its time: Modern “unified memory” architectures in AMD APUs and Apple Silicon are conceptually similar to the PS1’s approach of putting everything — framebuffer, textures, palettes — in one shared memory space.
The PS1 GPU may not have shaders or compute dispatches, but it delivered experiences that defined a generation. And recreating it pixel by pixel has been one of the most rewarding parts of building RogEm.
## References
- psx-spx GPU documentation
- Rodrigo Copetti — PlayStation Architecture
- Scratchapixel — Rasterization: a Practical Implementation
- Fabien Sanglard — “How do PSX emulators work?”
- Our GTE Part 2 article — the math that happens before polygons reach the GPU