Manual and Instruction Reference for R16K1S60


Feel free to PM me if something is unclear or you have an improvement in mind for this manual.


Sometime in 2015 I published Ray162K (or R162K for short), my first ever TPT computer. It was huge (360x330 particle area), it was slow (16-frame clock cycle), but it was a bloody 16-bit computer nonetheless. It was fun to build and, in my opinion, to use.

The day I published R162K, Schmolendevice contacted me and introduced me to the world of subframe technology (thanks for that!). I was amazed by how smoothly everything worked in subframe, but since I’d just finished building R162K, I couldn’t muster the strength to look deeper into it right away. I was out of it for quite some time.

When I came back to TPT, I half-expected to be greeted by the sight of countless subframe computers on FP. As you can probably guess, that’s not what happened. I thought, well, I could just build one myself. I set my goals and built R16K1S60, which features:

Also, it’s so small that you can fit four of them in a standard 612x384 TPT simulation.


Driving the thing

On its bottom side it has four buttons, a cartridge slot and an LRCY lamp. These are, from left to right:

On its right side, it has 4 IO ports: #0, #1, #2 and #3 from top to bottom.

To run a program on a cartridge, insert the cartridge into the cartridge slot and press the Start button. Install the peripherals required by the program beforehand.


A 16-bit ALU, four 16-bit read-write IO ports and 1024 16-bit cells of RAM, which is shared code/data/stack space. Five general-purpose registers (called AX, BX, CX, DX and EX), a stack pointer (SP, which always points to the bottom of the stack) and an instruction pointer (IP, which always points to the first cell of the next instruction to be executed).

The stack grows downward. If your program uses the stack, zeroing SP might be the first thing it should do, as none of the registers apart from IP are reset when Ray is reset.

Flags that reflect properties of the result of the last instruction executed are present. These flags are the Zero Flag (Zf, set if the result is 0), the Sign Flag (Sf, set if the signed result is negative), the Carry Flag (Cf, set if an unsigned operation yielded a carry) and the Overflow Flag (Of, set if a signed operation overflowed).
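To make the four flags concrete, here is a small Python sketch of a 16-bit addition that derives them from the result. It uses the textbook definitions (Of meaning signed overflow); the names add16 and is_negative are made up for illustration and say nothing about how Ray's ALU computes the flags internally.

```python
MASK = 0xFFFF


def is_negative(value: int) -> bool:
    """True if bit 15 (the sign bit of a 16-bit word) is set."""
    return bool(value & 0x8000)


def add16(a: int, b: int) -> tuple:
    """Add two 16-bit words; return the 16-bit result and the four flags."""
    full = (a & MASK) + (b & MASK)
    result = full & MASK
    flags = {
        "Zf": result == 0,            # the result is 0
        "Sf": is_negative(result),    # the signed result is negative
        "Cf": full > MASK,            # unsigned carry out of bit 15
        # signed overflow: both inputs share a sign that the result doesn't have
        "Of": is_negative(a) == is_negative(b) and is_negative(result) != is_negative(a),
    }
    return result, flags
```

Adding 0x7FFF and 1, for example, gives 0x8000 with Sf and Of set but Cf clear: a perfectly fine unsigned sum, but an overflow when read as signed.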

Unconditional and conditional jumps (based on the above flags), subroutine calls and returns, stack operations and base-plus-offset addressing are supported.

A subroutine call pushes the return address (the address of the instruction after the CALL instruction) to the stack and jumps. A subroutine return pops said address and jumps to it.
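The stack and call/return rules above can be modelled in a few lines of Python. This is a sketch under assumptions the manual doesn't spell out: PUSH decrements SP before storing (so SP keeps pointing at the most recently pushed cell), and SP wraps around the 1024 cells of RAM. StackModel and its methods are invented names.

```python
RAM_SIZE = 1024  # cells of RAM, shared by code, data and the stack
WORD = 0xFFFF


class StackModel:
    """Toy model of Ray's downward-growing stack and CALL/RET.

    Assumptions (not stated in the manual): PUSH decrements SP before
    storing, and SP is kept within the 1024 cells by wrapping around.
    """

    def __init__(self) -> None:
        self.ram = [0] * RAM_SIZE
        self.sp = 0  # zeroed first, as recommended above
        self.ip = 0

    def push(self, value: int) -> None:
        self.sp = (self.sp - 1) % RAM_SIZE  # the stack grows downward
        self.ram[self.sp] = value & WORD

    def pop(self) -> int:
        value = self.ram[self.sp]
        self.sp = (self.sp + 1) % RAM_SIZE
        return value

    def call(self, target: int, call_length: int) -> None:
        """CALL located at ip, occupying call_length cells (1, or 2 with an immediate)."""
        self.push(self.ip + call_length)  # return address: the instruction after CALL
        self.ip = target

    def ret(self) -> None:
        self.ip = self.pop()  # pop the return address and jump to it
```

With SP zeroed, the first return address lands in cell 1023 under these assumptions, and RET brings SP back to 0.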


This section assumes a basic knowledge of assembly programming.

There’s an instruction reference below this section. The mnemonics used there are recognized by rasm and even highlighted by RasmUI.

Once you’ve downloaded rasm.lua (Ray assembler) and rasmui.lua (user interface for rasm), there are several ways of going about this:

RasmUI is also good for debugging. You can choose which instance of Ray to monitor (yes, as I’ve mentioned before, you can have multiple Rays on screen). It’s also got tooltips to make it more difficult to get lost in the mess of its controls. You can even open several RasmUI instances simultaneously.

A few words about rasm:

A few words about RasmUI:

In an assembly source, the following symbols have special meaning:

Instruction reference

Instructions are encoded as 16-bit opcodes, and an immediate value, if present, occupies the cell right after the opcode. Instructions have zero to three input operands and zero to two output operands. These operands may be registers or 16-bit immediate values. An instruction can encode at most one immediate value, which means that instructions such as mov [0xCAFE], 0xBABE are not possible to encode. If an instruction references an immediate value but none is encoded, the immediate value defaults to 0. In the reference, the flag that determines whether an immediate value is encoded or not is referred to as g (set if an immediate value is encoded).
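The one-immediate rule can be checked mechanically. The hypothetical helpers below (not part of rasm, and not how rasm does it) count the terms of each operand that aren't registers, treating numbers and labels alike as immediate values, including the offset in base-plus-offset addressing. The operands in the checks are taken from bfc.asm.

```python
import re

REGISTERS = {"ax", "bx", "cx", "dx", "ex", "sp", "ip"}


def immediates_in(operand: str) -> int:
    """Count the immediate values one operand needs (hypothetical helper)."""
    text = operand.strip().lower()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]  # memory operand: look at the address expression inside
    terms = (term.strip() for term in re.split(r"[+-]", text))
    # anything that isn't a register (a number, a label) becomes an immediate value
    return sum(1 for term in terms if term and term not in REGISTERS)


def encodable(operands: list) -> bool:
    """An instruction can carry at most one immediate value."""
    return sum(immediates_in(op) for op in operands) <= 1


print(encodable(["[cx+1]", "ax"]))         # True
print(encodable(["[0xCAFE]", "0xBABE"]))   # False
```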

Flags are set according to the result written to the Primary operand line, if any. In the reference, the flag that determines whether the flags are to be updated or not is referred to as f (set if they are to be updated).

Instructions read their inputs from the Primary, Secondary and/or Tertiary operand lines and write results to the Primary and/or Secondary operand lines. 2- or 3-bit fields in the instruction tell the operand lines which register (or immediate value) to map to. A 3-bit 000 field maps to the immediate value encoded in the instruction, 001 maps to AX, 010 maps to BX, etc.; 110 maps to SP and 111 maps to IP. In the opcodes below, these fields are written as runs of letters: ppp for the primary operand, sss for the secondary operand and tt for the tertiary operand. You might notice that tt fields are only 2 bits long. This means that only an immediate value, AX, BX or CX can be encoded as the tertiary operand.
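A Python sketch of the field-to-operand mapping just described. The codes 011, 100 and 101 are assumed to continue the AX, BX, ... sequence as CX, DX and EX, and where each field sits inside the opcode is deliberately left out. operand_name and read_operand are invented names.

```python
from typing import Optional

# Position in the tuple = value of the field; 000 selects the immediate value.
THREE_BIT = ("imm", "ax", "bx", "cx", "dx", "ex", "sp", "ip")  # ppp and sss
TWO_BIT = THREE_BIT[:4]                                         # tt: imm, ax, bx, cx


def operand_name(field: int, width: int) -> str:
    """Name of the operand a 2- or 3-bit field maps to."""
    table = THREE_BIT if width == 3 else TWO_BIT
    if not 0 <= field < len(table):
        raise ValueError(f"{field:#b} does not fit in {width} bits")
    return table[field]


def read_operand(field: int, width: int, regs: dict, imm: Optional[int]) -> int:
    """Value seen on an operand line; imm is None when no immediate is encoded (g clear)."""
    name = operand_name(field, width)
    if name == "imm":
        return 0 if imm is None else imm  # a missing immediate reads as 0
    return regs[name]
```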

Keep in mind that, for example, the secondary operand isn’t necessarily the second operand: the former is an architectural detail, while the latter only exists in the assembly source and is translated by rasm.

Some opcodes rely on hacks such as discarding the result of an instruction by writing it to operand 000, which maps to the immediate value. Since the immediate value is read-only, the result is simply discarded.
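The discard trick falls out naturally when results are written back through the same 3-bit field table. In this illustrative Python sketch (write_operand is a made-up name), storing to field 000 does nothing because the immediate value can't be written; results are also truncated to 16 bits.

```python
THREE_BIT = ("imm", "ax", "bx", "cx", "dx", "ex", "sp", "ip")


def write_operand(field: int, value: int, regs: dict) -> None:
    """Store a result through a 3-bit field (illustrative helper)."""
    name = THREE_BIT[field]
    if name == "imm":
        return  # 000 maps to the read-only immediate value: the result is discarded
    regs[name] = value & 0xFFFF
```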

Dots mean indifferent bits; they have no effect on the instruction encoded in the opcode they appear in.

Shift sorcery

Ray implements shifts in a funny way: No SAR, no MSB preservation, no nothing. It makes up for that with multiword shifts and rotates. This section describes the inner workings of the Shifter Unit and how it can be used to do all sorts of fun stuff.

Consider SHL (the 2-operand variant). It shifts ppp left by tt bits. Let’s say tt is 1. That’s easy: a is discarded, 0 appears on the right (the LSB).


before: abcdefghijklmnop
after:  bcdefghijklmnop0

The 3-operand variant does some more work and makes a backup of ppp in sss. Let’s use that and shift ppp to the left by 3 bits this time:

              PPP              SSS

before: abcdefghijklmnop ................
after:  defghijklmnop000 abcdefghijklmnop

a, b and c are discarded, sss now contains the initial value of ppp.
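The two SHL variants can be mimicked on 16-character strings, one character per bit, so the letters line up with the diagrams above. This is a sketch of the described behaviour only; shl2 and shl3 are invented names.

```python
def shl2(ppp: str, n: int) -> str:
    """SHL ppp, n: the top n bits fall off, n zeros come in on the right."""
    return ppp[n:] + "0" * n


def shl3(ppp: str, n: int) -> tuple:
    """SHL ppp, sss, n: the same shift, plus a backup of the old ppp in sss."""
    return shl2(ppp, n), ppp  # (new ppp, new sss)


print(shl2("abcdefghijklmnop", 1))  # bcdefghijklmnop0
print(shl3("abcdefghijklmnop", 3))  # ('defghijklmnop000', 'abcdefghijklmnop')
```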

Now do another shift, this time using SHLD, which gets the bits to shift in from sss. A, B and C are discarded once again, but instead of 0s, a, b and c (the MSBs of the initial value of ppp from the previous SHL) appear on the right:

              PPP              SSS

before: ABCDEFGHIJKLMNOP abcdefghijklmnop
after:  DEFGHIJKLMNOPabc ABCDEFGHIJKLMNOP

The initial value of ppp is once again backed up in sss.

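So, with an SHL on the least significant word followed by as many SHLDs as you want on the more significant words, you can shift as big a number as you want, by 16 bits at most. The same applies to right shifts (SHR and SHRD), starting from the most significant word.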
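Here is a sketch of that chaining on integers, with the 16-bit words of a bigger number stored least significant first. SHL and SHLD follow the behaviour described above; SHR and SHRD are assumed to be exact mirror images (the low bits of sss enter ppp at the top), which the manual implies but doesn't draw out. All function names are invented.

```python
MASK = 0xFFFF


def shl(p: int, n: int) -> tuple:
    """3-operand SHL: (new ppp, new sss); sss receives the old ppp."""
    return (p << n) & MASK, p


def shld(p: int, s: int, n: int) -> tuple:
    """SHLD: shift ppp left, filling in the top n bits of sss; sss receives the old ppp."""
    return ((p << n) | (s >> (16 - n))) & MASK, p


def shr(p: int, n: int) -> tuple:
    return p >> n, p


def shrd(p: int, s: int, n: int) -> tuple:
    """Assumed mirror of SHLD: the low n bits of sss enter at the top of ppp."""
    return ((p >> n) | (s << (16 - n))) & MASK, p


def shift_left(words: list, n: int) -> list:
    """words[0] is the least significant word; 0 <= n <= 16."""
    out = list(words)
    out[0], backup = shl(out[0], n)  # plain SHL on the lowest word
    for i in range(1, len(out)):
        out[i], backup = shld(out[i], backup, n)
    return out


def shift_right(words: list, n: int) -> list:
    out = list(words)
    out[-1], backup = shr(out[-1], n)  # plain SHR on the highest word
    for i in range(len(out) - 2, -1, -1):
        out[i], backup = shrd(out[i], backup, n)
    return out


def to_words(value: int, count: int) -> list:
    return [(value >> (16 * i)) & MASK for i in range(count)]


def from_words(words: list) -> int:
    return sum(word << (16 * i) for i, word in enumerate(words))
```

Comparing from_words(shift_left(to_words(x, 3), n)) with Python's own (x << n) truncated to 48 bits is an easy way to check the chain for every n from 0 to 16.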

To see how ROL works, we have to break the above before/after diagram of SHLD into steps:

              PPP              SSS        temporary storage

step 0: ABCDEFGHIJKLMNOP abcdefghijklmnop ................ // initial state
step 1: ABCDEFGHIJKLMNOP abcdefghijklmnop abcdefghijklmnop // sss is copied into temp
step 2: ABCDEFGHIJKLMNOP ABCDEFGHIJKLMNOP abcdefghijklmnop // ppp is copied into sss
step 3: DEFGHIJKLMNOPabc ABCDEFGHIJKLMNOP abcdefghijklmnop // ppp is shifted, bits get shifted in from temp
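The four steps can be replayed literally in Python, again with one character per bit; this is only a model of the table above (shld_steps is an invented name), and running it prints the same four rows, minus the comments.

```python
def shld_steps(ppp: str, sss: str, n: int) -> list:
    """Replay SHLD step by step; each entry is (ppp, sss, temp) as in the table."""
    temp = "." * 16
    steps = [(ppp, sss, temp)]      # step 0: initial state
    temp = sss
    steps.append((ppp, sss, temp))  # step 1: sss is copied into temp
    sss = ppp
    steps.append((ppp, sss, temp))  # step 2: ppp is copied into sss
    ppp = ppp[n:] + temp[:n]
    steps.append((ppp, sss, temp))  # step 3: ppp is shifted, bits come in from temp
    return steps


for number, row in enumerate(shld_steps("ABCDEFGHIJKLMNOP", "abcdefghijklmnop", 3)):
    print(f"step {number}:", *row)
```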

What happens if ppp and sss map to the same source? We get a rotate:

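              PPP              SSS        temporary storage

step 0: ABCDEFGHIJKLMNOP ABCDEFGHIJKLMNOP ................ // initial state
step 1: ABCDEFGHIJKLMNOP ABCDEFGHIJKLMNOP ABCDEFGHIJKLMNOP // sss is copied into temp
step 2: ABCDEFGHIJKLMNOP ABCDEFGHIJKLMNOP ABCDEFGHIJKLMNOP // ppp is copied into sss
step 3: DEFGHIJKLMNOPABC ABCDEFGHIJKLMNOP ABCDEFGHIJKLMNOP // ppp is shifted, bits get shifted in from temp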

Since ppp and sss also map to the same destination, the question arises whether the Primary result or the Secondary result is saved to it. The answer is simple: the Primary operand has precedence over the Secondary operand, so the Primary result is saved. So after the ROL above, ppp will hold DEFGHIJKLMNOPABC.

This means that ROL x, y is just a shorthand for SHLD x, x, y. The same applies to ROR.
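A Python sketch of SHLD on named registers shows how the precedence rule turns SHLD x, x, y into a rotate. Storing the Secondary result first and the Primary result last is just a convenient way to express "Primary wins" in software, not a description of the hardware; shld_regs and rol are invented names.

```python
def shld_regs(regs: dict, ppp: str, sss: str, n: int) -> None:
    """SHLD on named registers holding 16-character bit strings."""
    temp = regs[sss]                     # step 1: sss is copied into temp
    secondary = regs[ppp]                # step 2: the old ppp goes to sss
    primary = regs[ppp][n:] + temp[:n]   # step 3: ppp is shifted, bits come in from temp
    # Secondary first, Primary last: the Primary result wins when both
    # operands name the same register.
    regs[sss] = secondary
    regs[ppp] = primary


def rol(regs: dict, reg: str, n: int) -> None:
    """ROL x, n as shorthand for SHLD x, x, n."""
    shld_regs(regs, reg, reg, n)


regs = {"ax": "ABCDEFGHIJKLMNOP"}
rol(regs, "ax", 3)
print(regs["ax"])  # DEFGHIJKLMNOPABC
```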

A multiword rotate is just a big shift (chained SHLDs) and a bit of playing around with the bits shifted out at the end:

; Let's rotate the cx:bx:ax 48-bit integer left by 9 bits.
; Yeah, it's not pretty. We're basically doing shld's work *before*
; the other shifts take place.

mov dx, ax          ; * ((shl ax, dx, 9)) would do the same to dx
shld bx, dx, 9      ; * go on with the multiword shift
shld cx, dx, 9      ; * go on with the multiword shift
shld ax, dx, 9      ; * shift ax, get bits to shift in from the previous shld
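To convince yourself that the listing really rotates cx:bx:ax, you can replay it in Python and compare the result with an ordinary 48-bit rotate. shld below models SHLD ppp, sss, n as described in this section; the shift count is a parameter (9 in the listing), and dx is used as scratch just like in the assembly. The function names are invented.

```python
MASK = 0xFFFF


def shld(p: int, s: int, n: int) -> tuple:
    """SHLD ppp, sss, n -> (new ppp, new sss)."""
    return ((p << n) | (s >> (16 - n))) & MASK, p


def rotate_like_listing(ax: int, bx: int, cx: int, n: int = 9) -> tuple:
    """Replay the listing above; returns the new (ax, bx, cx)."""
    dx = ax                   # mov dx, ax
    bx, dx = shld(bx, dx, n)  # shld bx, dx, n
    cx, dx = shld(cx, dx, n)  # shld cx, dx, n
    ax, dx = shld(ax, dx, n)  # shld ax, dx, n
    return ax, bx, cx


def rol48(value: int, n: int) -> int:
    """Reference 48-bit rotate left."""
    return ((value << n) | (value >> (48 - n))) & (2**48 - 1)
```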

And to think that I wasn’t even planning to support rotates at all …

Example programs

; fibonacci.asm
; Calculates the first few elements
; of the Fibonacci sequence using
; 16-bit unsigned integers, outputs
; them on port #3.

    mov dx, 3
    mov bx, 0       ; * start off with 0, 1 (we won't output those)
    mov ax, 1
spin:
    mov cx, ax      ; * backup ax
    add ax, bx      ; * calculate the next element
    jc stop         ; * stop if addition had a carry
    mov bx, cx      ; * recall old ax
    send dx, ax     ; * output element
    jmp spin
stop:
    hlt             ; * the next element wouldn't fit in 16 bits
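Before burning fibonacci.asm onto a cartridge, its output can be predicted with a Python model of its loop (registers as 16-bit integers, the port as a list). This is a sketch of the program's logic, not an emulator of Ray; fibonacci_outputs is an invented name.

```python
def fibonacci_outputs() -> list:
    """Replay fibonacci.asm with 16-bit registers; return what it sends to port #3."""
    sent = []
    bx, ax = 0, 1              # mov bx, 0 / mov ax, 1
    while True:
        cx = ax                # backup ax
        total = ax + bx        # add ax, bx
        ax = total & 0xFFFF
        if total > 0xFFFF:     # jc stop: the addition carried
            break
        bx = cx                # recall old ax
        sent.append(ax)        # send dx, ax
    return sent


print(fibonacci_outputs()[:8])  # [1, 2, 3, 5, 8, 13, 21, 34]
```

The model sends 23 elements, the last being 46368; the next one, 75025, doesn't fit in 16 bits.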

; showcase.asm
; The program that runs in the showcase save.
; 32-bit fibonacci sequence. Elements output on ports #2 and #3.
; High word on #2, low word on #3.

    mov ax, 0
    mov bx, 0
    mov cx, 1
    mov dx, 0
    recv ex, 0              ; * wait for a send on port #0
spin:
    mov ex, ax
    mov sp, bx              ; * lol, we're not using the stack, so ...
    add ax, cx
    adc bx, dx
    jc die
    send 2, bx
    send 3, ax
    recv cx, 3              ; * wait for the low word display to finish
                            ; * don't worry about cx
    mov cx, ex
    mov dx, sp
    jmp spin
die:
    jmp die
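The same kind of Python model works for showcase.asm: bx:ax holds the current element, dx:cx the previous one, and the 32-bit addition is split into ADD on the low words and ADC on the high words. The port handshakes are left out; showcase_outputs is an invented name.

```python
MASK = 0xFFFF


def showcase_outputs() -> list:
    """Replay showcase.asm's loop; return the 32-bit values shown as bx:ax."""
    ax, bx, cx, dx = 0, 0, 1, 0
    shown = []
    while True:
        ex, sp = ax, bx                # mov ex, ax / mov sp, bx
        low = ax + cx                  # add ax, cx
        ax, carry = low & MASK, low >> 16
        high = bx + dx + carry         # adc bx, dx
        bx, carry = high & MASK, high >> 16
        if carry:                      # jc die
            break
        shown.append((bx << 16) | ax)  # send 2, bx / send 3, ax
        cx, dx = ex, sp                # mov cx, ex / mov dx, sp
    return shown


print(showcase_outputs()[-1])  # 2971215073
```

According to this model the program shows 47 elements before the carry out of the high word sends it to die.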

; bfc.asm
; Brainf**k compiler. Yeah. I mean it.
; Writes compiled code to 0x200 (plus the bootstrapper).
; Clears a 0x200 cell array for the BF program.
; Not optimized at all. 23 +'s will compile to 23 ((add ax, 1))'s.
; . and , send and expect raw character codes.
; Unusable without a peripheral that can interpret those character codes.
; In theory, a sanity check could be added to check if the BF source overlaps
; with the region reserved for the compiled code. Something like
;     if (compile.end > util_bootstrap) nope();

compile:
    mov sp, 0x0200              ; * so the stack doesn't interfere with the result
    mov cx, util_bootstrap.code_end
    mov bx, .source
.loop:
    add cx, 1
    mov ax, [bx]
    cmp ax, 0
    jz .done
    add bx, 1
    cmp ax, 60
    jb .br__inc_in_dec_out      ; * char's below 60, it's one of '+' (43), ',' (44), '-' (45) and '.' (46)
                                ; * char's not below 60, it's one of '<' (60), '>' (62), '[' (91) and ']' (93)
    cmp ax, 91
    jb .br__left_right          ; * char's below 91, it's either '<' (60) or '>' (62)
                                ; * char's not below 91, it's either '[' (91) or ']' (93)
    cmp ax, 93
    jb .br__while               ; * char's below 93, it's '[' (91)
                                ; * char's not below 93, it's ']' (93)
    pop ax                      ; * pop address of the previous '['
                                ; * there's no error checking here, although it'd be as simple as "cmp sp, 0x0200" and "jz error"
    mov [cx], 0x4E01            ; * we're jumping to the previous '[' unconditionally
    mov [cx+1], ax              ; * we also need the address for the jump stored as an immediate value
    add cx, 2                   ; * skip to the instruction after the ']',
    mov [ax+2], cx              ;   the conditional jump at the previous '[' needs its address to know where to jump
    sub cx, 1                   ; * but we're not there yet
    jmp .loop
.br__while:
    push cx                     ; * push the address for the next ']'
    mov [cx], 0x9488            ; * encodes "cmp ax, 0"
    add cx, 1                   ; * skip one cell for the conditional jump
    mov [cx], 0x4601            ; * encodes "jz imm"
    add cx, 1                   ; * skip one cell for the immediate value
    jmp .loop
.br__left_right:
    cmp ax, 62
    mov ax, -1                  ; * _something_ is -1 at a '<' (flags are not affected)
    jb .br__left                ; * char's below 62, it's '<' (60)
                                ; * char's not below 62, it's '>' (62)
    mov ax, 1                   ; * _something_ is +1 at a '>'
.br__left:
    mov [cx], 0x3110            ; * encodes "mov [bx], ax", which writes the data back to the cell at the pointer
    add cx, 1
    mov [cx], 0x8101            ; * encodes "add bx, _something_", bumping the pointer
    add cx, 1
    mov [cx], ax                ; * actually, _something_ is encoded here, right after its opcode
    add cx, 1
    mov [cx], 0xC501            ; * encodes "and bx, 0x1FF", masking the pointer
    add cx, 1
    mov [cx], 0x01FF
    add cx, 1
    mov [cx], 0x3910            ; * encodes "mov ax, [bx]", which loads the data at the new pointer
    jmp .loop
.br__inc_in_dec_out:
    cmp ax, 45
    jb .br__inc_in              ; * char's below 45, it's either '+' (43) or ',' (44)
                                ; * char's not below 45, it's either '-' (45) or '.' (46)
    cmp ax, 46
    jb .br__dec                 ; * char's below 46, it's '-' (45)
                                ; * char's not below 46, it's '.' (46)
    mov [cx], 0x50C0            ; * encodes "send dx, ax"
    jmp .loop
.br__dec:
    mov ax, -1                  ; * _something_ is -1 at a '-'
.br__dec_inc_common:
    mov [cx], 0x8081            ; * encodes "add ax, _something_", bumping the data
    add cx, 1                   ; * skip to the immediate value
    mov [cx], ax                ; * actually, _something_ is encoded here
    jmp .loop
.br__inc_in:
    cmp ax, 44
    jb .br__inc                 ; * char's below 44, it's '+' (43)
                                ; * char's not below 44, it's ',' (44)
    mov [cx], 0x54C0            ; * encodes "recv ax, dx"
    jmp .loop
.br__inc:
    mov ax, 1                   ; * _something_ is +1 at a '+'
    jmp .br__dec_inc_common
.done:
    mov [cx], 0x1000            ; * encodes "hlt" at the end of the code
    jmp util_bootstrap
.source: dw "++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.", 0
.end:                           ; ^ helloworld taken from Wikipedia

@org 0x0200
util_bootstrap:
    mov cx, compile.end         ; * part that kills the compiler begins
.loop:
    sub cx, 1
    mov [cx], 0
    jnz .loop                   ; * part that kills the compiler ends
    mov sp, 0
    mov ax, 0
    mov bx, 0
    mov dx, 3                   ; * configure this: it's the port used by both '.' and ','
.code_end:
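For experimenting with bfc.asm's code generation off-line, here is a Python sketch that emits the same 16-bit words bfc.asm writes (the opcode values are copied from its comments), with every immediate value in the cell right after its opcode, plus a tiny interpreter that understands only those words. It assumes the source consists of the eight Brainf**k characters only, places the code at address 0 rather than after util_bootstrap, and is not an emulator of Ray; compile_bf and run are invented names.

```python
# 16-bit words copied from the comments in bfc.asm.
ADD_AX_IMM = 0x8081  # add ax, imm
SEND_DX_AX = 0x50C0  # send dx, ax
RECV_AX_DX = 0x54C0  # recv ax, dx
MOV_MBX_AX = 0x3110  # mov [bx], ax
ADD_BX_IMM = 0x8101  # add bx, imm
AND_BX_IMM = 0xC501  # and bx, imm
MOV_AX_MBX = 0x3910  # mov ax, [bx]
CMP_AX_0 = 0x9488    # cmp ax, 0 (no immediate encoded, so it reads as 0)
JZ_IMM = 0x4601      # jz imm
JMP_IMM = 0x4E01     # jmp imm
HLT = 0x1000         # hlt


def compile_bf(source: str) -> list:
    """Emit bfc.asm's code layout, starting at address 0."""
    code, open_brackets = [], []
    for ch in source:
        if ch in "+-":
            code += [ADD_AX_IMM, 1 if ch == "+" else 0xFFFF]
        elif ch == ".":
            code.append(SEND_DX_AX)
        elif ch == ",":
            code.append(RECV_AX_DX)
        elif ch in "<>":
            step = 1 if ch == ">" else 0xFFFF
            code += [MOV_MBX_AX, ADD_BX_IMM, step, AND_BX_IMM, 0x01FF, MOV_AX_MBX]
        elif ch == "[":
            open_brackets.append(len(code))  # push cx
            code += [CMP_AX_0, JZ_IMM, 0]     # jz target is patched at the matching ']'
        elif ch == "]":
            start = open_brackets.pop()       # pop ax
            code += [JMP_IMM, start]
            code[start + 2] = len(code)       # mov [ax+2], cx
        else:
            raise ValueError(f"not a Brainf**k character: {ch!r}")
    code.append(HLT)
    return code


def run(code: list, stdin: str = "") -> str:
    """Execute the emitted words: ax caches the current cell, bx is the data pointer."""
    mem = [0] * 0x200
    ax = bx = ip = 0
    zf = False
    out, inp = [], iter(stdin)
    while True:
        op = code[ip]
        if op == HLT:
            return "".join(out)
        elif op == ADD_AX_IMM:
            ax = (ax + code[ip + 1]) & 0xFFFF
            ip += 2
        elif op == SEND_DX_AX:
            out.append(chr(ax))  # raw character code
            ip += 1
        elif op == RECV_AX_DX:
            ax = ord(next(inp, "\0"))  # 0 once the input runs out (an arbitrary choice)
            ip += 1
        elif op == MOV_MBX_AX:
            mem[bx] = ax
            ip += 1
        elif op == ADD_BX_IMM:
            bx = (bx + code[ip + 1]) & 0xFFFF
            ip += 2
        elif op == AND_BX_IMM:
            bx &= code[ip + 1]
            ip += 2
        elif op == MOV_AX_MBX:
            ax = mem[bx]
            ip += 1
        elif op == CMP_AX_0:
            zf = ax == 0
            ip += 1
        elif op == JZ_IMM:
            ip = code[ip + 1] if zf else ip + 2
        elif op == JMP_IMM:
            ip = code[ip + 1]
        else:
            raise ValueError(f"unexpected word {op:#06x} at {ip}")


hello = "++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>."
print(repr(run(compile_bf(hello))))  # 'Hello World!\n'
```

Note how a '[' reserves three cells and only learns its jump target when the matching ']' is compiled, exactly like the push/pop dance in bfc.asm.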

Handling subframe tech safely

— Oh noes, it broke!

That most likely happened because you tried adding something to the save while it wasn’t paused, or messed up the particle order some other way. When it comes to subframe tech, you can’t just place or remove particles, as doing that might change the order of evaluation.

Subframe tech relies on that order of evaluation, or particle order, which determines which particle is evaluated after which. When you open a save, particles are assigned their IDs so that a particle’s ID is always bigger than that of any particle in a row above it or to its left in the same row. This order must be preserved, as subframe tech uses it to schedule its tasks inside a frame (hence the term subframe).

If this order is broken by any means (typically by placing or removing particles), it must be restored before a frame elapses, otherwise the results may be catastrophic. This restoration is done by saving and reopening the save, as this resets particle order in the manner described above. The simulation must be paused while particles are being placed or removed for this technique to be effective.
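The ordering rule can be sketched in Python with particles as (x, y) positions, y growing downward. load_ids mimics what happens when a save is opened, as described above; the assumption that a particle placed afterwards simply receives a higher ID than all existing ones is purely for illustration (how the game actually hands out IDs isn’t modelled here). All names are invented.

```python
def reading_order(positions):
    """Top row first, left to right within a row (positions are (x, y))."""
    return sorted(positions, key=lambda pos: (pos[1], pos[0]))


def load_ids(positions) -> dict:
    """IDs as assigned when a save is opened: increasing in reading order."""
    return {pos: i for i, pos in enumerate(reading_order(positions))}


def order_intact(ids: dict) -> bool:
    """True if IDs strictly increase in reading order."""
    ordered = reading_order(ids)
    return all(ids[a] < ids[b] for a, b in zip(ordered, ordered[1:]))


ids = load_ids([(5, 1), (2, 0), (0, 1)])
print(ids)                           # {(2, 0): 0, (0, 1): 1, (5, 1): 2}
ids[(1, 0)] = max(ids.values()) + 1  # hypothetical: a particle placed afterwards
print(order_intact(ids))             # False: it's evaluated after particles below it
print(order_intact(load_ids(ids)))   # True: saving and reopening restores the order
```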

I also recommend turning Heat Simulation, Air and any sort of Gravity off.

