Rendering 3D cube in Metal

This weekend I finally managed to set up my application to render a cube, so here is an overview of what is required to render a basic 3D cube. The first thing I need is the vertex buffer and the index buffer of the cube.

var vertex: [VertexAttrib] = [
    VertexAttrib(position: float4(x: -1.0, y: -1.0, z: -1.0, w: 1.0),
                 color: float4(x: 1.0, y: 0.0, z: 0.0, w: 1.0)),
    VertexAttrib(position: float4(x: -1.0, y: -1.0, z: 1.0, w: 1.0),
                 color: float4(x: 0.0, y: 1.0, z: 0.0, w: 1.0)),
    VertexAttrib(position: float4(x: 1.0, y: -1.0, z: 1.0, w: 1.0),
                 color: float4(x: 0.0, y: 0.0, z: 1.0, w: 1.0)),
    VertexAttrib(position: float4(x: 1.0, y: -1.0, z: -1.0, w: 1.0),
                 color: float4(x: 1.0, y: 1.0, z: 0.0, w: 1.0)),
    VertexAttrib(position: float4(x: -1.0, y: 1.0, z: -1.0, w: 1.0),
                 color: float4(x: 1.0, y: 0.0, z: 1.0, w: 1.0)),
    VertexAttrib(position: float4(x: -1.0, y: 1.0, z: 1.0, w: 1.0),
                 color: float4(x: 0.0, y: 1.0, z: 1.0, w: 1.0)),
    VertexAttrib(position: float4(x: 1.0, y: 1.0, z: 1.0, w: 1.0),
                 color: float4(x: 0.0, y: 0.0, z: 0.0, w: 1.0)),
    VertexAttrib(position: float4(x: 1.0, y: 1.0, z: -1.0, w: 1.0),
                 color: float4(x: 1.0, y: 1.0, z: 1.0, w: 1.0))
]

let index: [u_short] = [
    0, 3, 2, 2, 1, 0, // bottom
    1, 2, 6, 6, 5, 1, // front
    4, 5, 6, 6, 7, 4, // top
    3, 0, 4, 4, 7, 3, // behind
    0, 1, 5, 5, 4, 0, // left
    2, 3, 7, 7, 6, 2  // right
]

I created two buffers first: one for the vertex attributes (position and color), and one for the indices of the vertices that get drawn for each face of the cube. Similar to the quad, a vertex descriptor needs to be created. What is different is that we now need the MVP (Model, View, Projection) matrix to specify the world-space position of the cube, the camera position and orientation, and the perspective projection of the camera. We specify this MVP matrix as one 4×4 floating point matrix for each object in the scene, hence we need a uniform buffer for it.
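As a rough sketch of the buffer setup described above: this assumes a recent Swift toolchain where makeBuffer returns an optional, and the VertexAttrib layout shown here is an assumption, since the post does not show the real struct. makeCubeBuffers is a hypothetical helper name, not part of Metal.

```swift
import Metal

// Assumed layout: the real VertexAttrib is not shown in the post.
struct VertexAttrib {
    var position: SIMD4<Float>
    var color: SIMD4<Float>
}

/// Hypothetical helper: copies the vertex and index arrays into
/// shared-storage Metal buffers. Returns nil if either allocation fails.
func makeCubeBuffers(device: MTLDevice,
                     vertices: [VertexAttrib],
                     indices: [UInt16]) -> (vertexBuffer: MTLBuffer, indexBuffer: MTLBuffer)? {
    // stride (not size) is the per-element distance in an array
    let vertexLength = vertices.count * MemoryLayout<VertexAttrib>.stride
    let indexLength = indices.count * MemoryLayout<UInt16>.stride
    guard
        let vertexBuffer = device.makeBuffer(bytes: vertices,
                                             length: vertexLength,
                                             options: .storageModeShared),
        let indexBuffer = device.makeBuffer(bytes: indices,
                                            length: indexLength,
                                            options: .storageModeShared)
    else { return nil }
    return (vertexBuffer, indexBuffer)
}
```

With the arrays from this post, the call would look like makeCubeBuffers(device: device, vertices: vertex, indices: index), provided the element types match the assumed layout.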

var matrix = ObjectAttrib(
    transform: float4x4.init())
var transformBuffer = device.makeBuffer(
    length: MemoryLayout<float4x4>.size,
    options: MTLResourceOptions.storageModeShared)

ObjectAttrib is just a struct wrapping the float4x4 transform matrix for now. Later on I will split the MVP into two separate matrices (M and VP), as I will most likely need the model matrix for other calculations (in world space). Following Apple’s Metal Best Practices Guide, I am using the storageModeShared option for my buffer’s resource options, since my buffer is relatively small. In the future I will try to measure its performance against storageModeManaged with didModifyRange().

In the update loop, we update the camera, get the projection matrix from the viewport information along with some assumptions (a 60 degree field of view, a 0.1 near plane, and a 100 far plane), then update the matrix and memcpy it to the transformBuffer to update the MVP matrix of the object. Because we are using storageModeShared, the GPU accesses system memory directly, so we don’t have to notify Metal to transfer anything from system memory to device memory.

let viewMatrix: float4x4 = camera.transform.inverse
let projectionMatrix: float4x4 = perspective_matrix(fovY: float_t(60.0 * M_PI / 180.0), aspect: float_t(viewport.width) / float_t(viewport.height), nearZ: 0.1, farZ: 100.0)
// ObjectAttrib wraps the MVP matrix, so assign to its transform field
matrix.transform = projectionMatrix * viewMatrix * modelMatrix
memcpy(transformBuffer?.contents(), &matrix, MemoryLayout<float4x4>.size)

I have implemented the view matrix as the inverse of the camera object’s transformation matrix. Below is the camera matrix code, implemented in the camera class. I am using a right-handed coordinate system, so the forward vector of the camera points in the -Z direction by default. Using the look-at position, the camera position, and the up vector of the camera, we can get the three axes of the camera to form a rotation matrix, and then apply a translation by the camera position to form the camera transformation. The view matrix is simply the inverse of this matrix:

func update() {
    let forward = normalize(target - pos)
    let right = normalize(cross(forward, up))
    up = normalize(cross(right, forward))
    // camera transform = translation * rotation; each float4 is one column
    transform =
        float4x4.init(
            float4(1.0, 0.0, 0.0, 0.0),
            float4(0.0, 1.0, 0.0, 0.0),
            float4(0.0, 0.0, 1.0, 0.0),
            float4(pos.x, pos.y, pos.z, 1.0)) *
        float4x4.init(
            float4(right.x, right.y, right.z, 0.0),
            float4(up.x, up.y, up.z, 0.0),
            // right-handed: the camera looks down its own -Z axis
            float4(-forward.x, -forward.y, -forward.z, 0.0),
            float4(0.0, 0.0, 0.0, 1.0))
}
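For reference, the inverse can also be written out by hand. This is only a worked sketch under the assumptions stated in the text (orthonormal axes r = right, u = up, f = forward, camera position p, column vectors); the symbols C, T, R and V are introduced here for illustration.

```latex
C = T\,R, \qquad
T = \begin{pmatrix} I & \mathbf{p} \\ \mathbf{0}^{\top} & 1 \end{pmatrix}, \qquad
R = \begin{pmatrix} \mathbf{r} & \mathbf{u} & -\mathbf{f} & \mathbf{0} \\ 0 & 0 & 0 & 1 \end{pmatrix}

V = C^{-1} = R^{-1}\,T^{-1} = R^{\top}\,T(-\mathbf{p}) =
\begin{pmatrix}
 \mathbf{r}^{\top} & -\,\mathbf{r}\cdot\mathbf{p} \\
 \mathbf{u}^{\top} & -\,\mathbf{u}\cdot\mathbf{p} \\
 -\mathbf{f}^{\top} & \phantom{-}\mathbf{f}\cdot\mathbf{p} \\
 \mathbf{0}^{\top} & 1
\end{pmatrix}
```

Because R is a pure rotation, its inverse is its transpose, so the view matrix could be built directly from dot products instead of calling a general 4×4 inverse.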

The projection matrix is the perspective projection matrix multiplied by a correction matrix that remaps the z range of the NDC cube by scaling it by 0.5 and shifting it by +0.5. The correction matrix is required in Metal because the NDC cube is 2x2x1 rather than 2x2x2 as in OpenGL: z runs from 0 to 1 and is centred at z = 0.5, rather than running from -1 to 1 and centred at z = 0.

func perspective_matrix(
    fovY: float_t,
    aspect: float_t,
    nearZ: float_t,
    farZ: float_t) -> float4x4 {
    let yscale: float_t = 1.0 / tanf(fovY * 0.5)
    let xscale: float_t = yscale / aspect
    // OpenGL-style depth terms (NDC z in -1...1); the correction matrix
    // below remaps them to Metal's 0...1 range
    let a = -(farZ + nearZ) / (farZ - nearZ)
    let b = -(2.0 * farZ * nearZ) / (farZ - nearZ)

    // each float4 is one column of the matrix
    let m: float4x4 = float4x4.init(
        float4(xscale, 0.0, 0.0, 0.0),
        float4(0.0, yscale, 0.0, 0.0),
        float4(0.0, 0.0, a, -1.0),
        float4(0.0, 0.0, b, 0.0))
    // z' = 0.5 * z + 0.5 * w, which scales and shifts NDC depth into 0...1
    let correction = float4x4.init(
        float4(1.0, 0.0, 0.0, 0.0),
        float4(0.0, 1.0, 0.0, 0.0),
        float4(0.0, 0.0, 0.5, 0.0),
        float4(0.0, 0.0, 0.5, 1.0))
    return correction * m
}
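To see what the correction does to depth, here is a short worked derivation. It assumes an OpenGL-style perspective matrix (NDC depth in [-1, 1]) with near plane n and far plane f; the symbols a, b, a', b' are just labels for the two depth terms before and after the correction.

```latex
% OpenGL-style depth terms, with view-space z < 0 in front of the camera
a = -\frac{f+n}{f-n}, \qquad b = -\frac{2fn}{f-n}, \qquad
z_c = a\,z + b, \qquad w_c = -z

% Correction: scale clip-space z by 0.5 and add 0.5 w
z_c' = \tfrac{1}{2} z_c + \tfrac{1}{2} w_c
\;\;\Longrightarrow\;\;
z_{ndc}' = \frac{z_c'}{w_c} = \tfrac{1}{2}\, z_{ndc} + \tfrac{1}{2} \in [0, 1]

% Folding the correction into the depth terms
a' = \tfrac{1}{2}a - \tfrac{1}{2} = -\frac{f}{f-n}, \qquad
b' = \tfrac{1}{2}b = -\frac{fn}{f-n}
```

In other words, a matrix built directly with a' and b' already produces depth in [0, 1] and should not be multiplied by the correction matrix a second time.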

The cube’s draw method is different because it uses an index buffer. The vertex buffer is still bound with setVertexBuffer(), but the index buffer needs no separate bind call: drawIndexedPrimitives() takes the index buffer and its details as arguments. Note that I declared my index array as u_short at the beginning, so I am using uint16 here with an offset of 0.

// create render commands for the cube with the render command encoder
encoder.setVertexBuffer(_vertexBuffer, offset: 0, at: 0)
encoder.drawIndexedPrimitives(
    type: MTLPrimitiveType.triangle,
    indexCount: index.count,
    indexType: MTLIndexType.uint16,
    indexBuffer: _indexBuffer!,
    indexBufferOffset: 0)

Cube with no depth test enabled, and no cull mode

This is what we get so far. We are missing the depth test, and we are also drawing both sides of every face of the cube.

To solve the first problem, we need to create a depth buffer and enable the depth test, so that occluded fragments are not drawn over the objects that are meant to occlude them. We create a depth stencil descriptor with a depth comparison function (less than), so that we only draw fragments that are in front of the fragments we have already drawn, from the perspective of the camera.

// create depth descriptor
let depthDescriptor = MTLDepthStencilDescriptor()
depthDescriptor.depthCompareFunction = MTLCompareFunction.less
depthDescriptor.isDepthWriteEnabled = true

depthStencilState = device.makeDepthStencilState(descriptor: depthDescriptor)

We also need to create a depth texture, and a descriptor for creating it. I’m using depth32Float with my viewport’s width and height, and the same sample count as my view, hence I had to declare the textureType as type2DMultisample. Lastly, resourceOptions can be set to storageModePrivate since the texture is only written and read on the GPU.

// create depth texture
let depthTexDescriptor = MTLTextureDescriptor()
depthTexDescriptor.pixelFormat = MTLPixelFormat.depth32Float
depthTexDescriptor.width = Int(view.drawableSize.width)
depthTexDescriptor.height = Int(view.drawableSize.height)
depthTexDescriptor.mipmapLevelCount = 1
depthTexDescriptor.sampleCount = view.sampleCount
depthTexDescriptor.resourceOptions = MTLResourceOptions.storageModePrivate
depthTexDescriptor.textureType = MTLTextureType.type2DMultisample
depthTexDescriptor.usage = MTLTextureUsage.renderTarget
depthTexture = device.makeTexture(descriptor: depthTexDescriptor)

The pipeline state descriptor has to be modified to specify the format of the depth buffer we are using with our pipeline.

pipelineStateDescriptor.depthAttachmentPixelFormat = MTLPixelFormat.depth32Float

I see some people using CAMetalLayer, but I am just using the render pass descriptor provided by MTKView. The MTKView's render pass descriptor requires its depth attachment to be set to activate the depth test. Here we also specify the clear value of depth to be 1.0 (objects closer to the camera have smaller depth values), and the load and store actions for the depth attachment.

// inside render loop
if let d = view.currentRenderPassDescriptor {
    d.depthAttachment.texture = depthTexture
    d.depthAttachment.clearDepth = 1.0
    d.depthAttachment.loadAction = MTLLoadAction.clear
    d.depthAttachment.storeAction =

    // create encoder
}

When creating the render command encoder, we have to set the depth stencil state on it so that it uses the depthCompareFunction we configured; this is what enables or disables the depth test.
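A minimal sketch of that encoder setup, together with back-face culling, might look like the following. applyDepthAndCulling is a hypothetical helper name; the three encoder calls are standard MTLRenderCommandEncoder methods. The counter-clockwise front-face setting is an assumption based on the index order listed earlier in this post (Metal's default front-facing winding is clockwise).

```swift
import Metal

/// Hypothetical helper: applies depth-test and culling state to an encoder
/// before any draw calls are encoded.
func applyDepthAndCulling(to encoder: MTLRenderCommandEncoder,
                          depthStencilState: MTLDepthStencilState) {
    // use the compare function and depth-write flag from the descriptor
    encoder.setDepthStencilState(depthStencilState)
    // discard triangles that face away from the camera
    encoder.setCullMode(.back)
    // assumption: the cube's triangles are counter-clockwise seen from outside
    encoder.setFrontFacing(.counterClockwise)
}
```

If the cube disappears or only its inside is visible after enabling culling, the front-facing winding is the first thing worth flipping.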


Cube with depth test and back-face culling enabled
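Back-face culling only gives the right picture if every triangle is wound consistently. As a quick sanity check, the sketch below tests the index list from this post in plain Swift, without Metal: for each triangle it computes the right-hand-rule normal and checks that it points away from the cube's centre. The tuple-based corner list and the function name allTrianglesFaceOutward are illustrative choices, not part of the original code.

```swift
// Corner positions, in the same order as the vertex list in this post.
let corners: [(Float, Float, Float)] = [
    (-1, -1, -1), (-1, -1, 1), (1, -1, 1), (1, -1, -1),
    (-1, 1, -1), (-1, 1, 1), (1, 1, 1), (1, 1, -1)
]

let cubeIndices: [UInt16] = [
    0, 3, 2, 2, 1, 0, // bottom
    1, 2, 6, 6, 5, 1, // front
    4, 5, 6, 6, 7, 4, // top
    3, 0, 4, 4, 7, 3, // behind
    0, 1, 5, 5, 4, 0, // left
    2, 3, 7, 7, 6, 2  // right
]

/// Returns true when every triangle's normal (right-hand rule) points away
/// from the origin, i.e. the triangle is counter-clockwise seen from outside.
/// Assumes the mesh is convex and centred on the origin.
func allTrianglesFaceOutward(_ corners: [(Float, Float, Float)],
                             _ indices: [UInt16]) -> Bool {
    for t in stride(from: 0, to: indices.count - 2, by: 3) {
        let a = corners[Int(indices[t])]
        let b = corners[Int(indices[t + 1])]
        let c = corners[Int(indices[t + 2])]
        let e1 = (b.0 - a.0, b.1 - a.1, b.2 - a.2)
        let e2 = (c.0 - a.0, c.1 - a.1, c.2 - a.2)
        // normal = e1 x e2
        let n = (e1.1 * e2.2 - e1.2 * e2.1,
                 e1.2 * e2.0 - e1.0 * e2.2,
                 e1.0 * e2.1 - e1.1 * e2.0)
        // the (unnormalised) centroid gives the outward direction
        let outward = (a.0 + b.0 + c.0, a.1 + b.1 + c.1, a.2 + b.2 + c.2)
        let facing = n.0 * outward.0 + n.1 * outward.1 + n.2 * outward.2
        if facing <= 0 { return false }
    }
    return true
}

print(allTrianglesFaceOutward(corners, cubeIndices)) // prints "true"
```

Since all twelve triangles are counter-clockwise when seen from outside, culling back faces needs the encoder's front-facing winding set to counter-clockwise.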
