June 3rd, 2020

The simplest possible CPU

Earlier this year, I wrote about a couple of interesting projects involving building computer CPUs using individual 7400-series logic chips, which one might compare to the process of building a car engine by meshing together a thousand hamster wheels. Building CPUs out of discrete chips like this is certainly not the most efficient way to do things, but the goal is not efficiency but understanding: By being able to see each individual part of the CPU and probe it with electronic analysis tools like oscilloscopes and logic analyzers, one can really gain an understanding of how the circuitry works, an understanding which is rather lost when circuits become highly integrated bits of silicon encased in sealed plastic blocks.

At the time when I wrote that post, I complained about the lack of documentation available for both of the projects: The projects themselves were no doubt very cool and impressive, but there wasn't really a lot of concrete information about how they worked, which meant that people who wanted to learn from these projects were rather at a loss, since learning comes from information. Recently, Drass, the designer of the C74-6502, was kind enough to comment on that earlier post of mine with the information that full schematics for the two main cards of the C74-6502 are now available at https://c74project.com/c74-6502-internals/. I took a look through these schematics, and I have to admit that they're very impressive, actually quite a bit more so than I was expecting; the C74-6502 really is a remarkable piece of work, and creative electronic engineers could gain a lot of inspiration for project designs from it.

I haven't analyzed the design of the C74-6502 in the deepest detail, and not being an experienced CPU designer myself, I wouldn't be in a position to criticize the design even if I knew it inside out. Doubtless there are design criticisms which could be levelled at it, but now that the design is live and available for everyone to see on the Internet, there's only one real problem which I would highlight with it. This is not something Drass can be faulted for, but rather something which is inevitable and integral to the nature of a project like this: The C74-6502 is quite complex, and it incorporates quite a lot of designs which are specific to it.

Of course, this is to be expected. Unlike my own admittedly rather half-hearted efforts at making a 7400-based 6502-compatible CPU some years ago, Drass did not set out to replicate the exact internal workings of the original 6502; the point was not to make a replica device with the same internal structure, but rather a compatible device which works differently inside but yields the same results. The C74-6502 is not built exactly like a 6502 inside, and it doesn't have to be; Drass took the liberty of creating original designs, structures, and terminology for the various parts of the CPU, which makes perfect sense since making a CPU out of 7400-series chips is not like making a CPU out of individual transistors, as the original designers of the 6502 did. The only problem with a design like this is that it requires people to learn a lot of terminology and concepts which are specific to this one design and which might not be applicable in the larger world of CPU design or electronics engineering.

The same was true of the original 6502. Again, a significant level of complexity isn't something you can avoid when building a CPU, nor is coming up with some of your own designs and terminology. The problem with studying any specific CPU is that it doesn't really teach you how a CPU works so much as it teaches you how that one specific CPU works. For example, the C74-6502 contains a lot of signals with the letters "MX" at the end of their names. I don't know what this is or what it means; it isn't something that was present in the original 6502. I'm guessing it's short for "multiplexer" or "multiplexed", which would make sense since the C74-6502 contains an array of 74AC153s in its ALU, but figuring out what all of that is would be learning specifically about the C74-6502 itself rather than about the 6502. So while the C74-6502 is a fantastic achievement, and all my praise and gratitude go to Drass for designing the device and publishing the designs online, I began thinking about how to make a simpler CPU, something more appropriate for learners who don't necessarily want to write War and Peace as their first novel, but would be content to start with something a little less ambitious on the first try. What would the simplest possible CPU look like?

The concept of designing the simplest possible CPU for learners is not a new one; indeed, one of the seminal books on computer architecture is Albert Paul Malvino's Digital Computer Electronics, which contains the design for a computer which Malvino named "Simple as Possible", or SAP for short. SAP is not much more than a handful of registers, but it is a functional CPU; it only has a few instructions and thus isn't very useful, but it effectively shows how a CPU works, and once you understand how it works, it's not difficult to understand how you could add more instructions to it. Indeed, the 6502 and the C74-6502 are mostly just registers and a bunch of logic tying those registers together. So you can legitimately make a computer by just strapping some registers to each other, but one could conceive of a CPU that is even simpler than that.

Many people imagine the CPU as being sort of like a calculator, and it's true that the CPU is usually where general mathematical operations are performed inside a computer, but math calculations are specifically the job of what's called the ALU (arithmetic logic unit), which is only one part of a larger CPU. In reality, most of what a CPU does is just move bits of information around; it acts as a sort of traffic cop, directing information between various memory cells and I/O ports. If you look at a typical assembly-language or machine-language program, most of the instructions boil down to "put this information over there"; that's most of what a CPU does.

At the most fundamental level, however, a CPU is really just an instruction-follower. The function of a CPU is to take in instructions, and then perform those instructions. Or rather, the CPU doesn't perform instructions; it tells other things to perform instructions. It sends signals to the memory and to the I/O ports telling them what information to transmit. So really, a CPU is fundamentally something which converts an instruction into a set of control signals. In very simplified and rather non-technical terms, if the CPU gets an instruction to print the letter "A" on the screen, it sends a signal to the graphics controller saying "Hey, graphics controller, print an 'A', would you?" and then trusts that the graphics controller will comply. That's all a CPU does: It delegates tasks. It gets its orders from the computer's memory, but then delegates those orders to other devices (which may include sending orders back to the memory).

So a CPU takes input in the form of an instruction, and generates output in the form of control lines. Imagine two wires which are connected in parallel to the inputs of an AND gate and an XOR gate. Further imagine that the output of the AND gate is connected to a red LED, while the output of the XOR gate is connected to a green LED. The device thus described is, in a sense, a very simple computer: If it gets an instruction of either 01 or 10, it will turn on the green LED, but if it gets an instruction of 11, it will turn on the red LED instead. This is what a CPU does: Its instructions are just binary numeric instructions, and its outputs are just control signals hooked up to devices that make them do things.
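The two-gate gadget above is small enough to model in a few lines. This is only a sketch: the names decode, red_led, and green_led are my own labels for illustration, not anything from a real design.

```python
def decode(instruction):
    """Take a 2-bit 'instruction' and return the two control signals."""
    a = (instruction >> 1) & 1  # first input wire
    b = instruction & 1         # second input wire
    return {
        "red_led": a & b,    # AND gate output
        "green_led": a ^ b,  # XOR gate output
    }

# Walk through all four possible instructions.
for instruction in range(4):
    print(f"{instruction:02b}", decode(instruction))
```

Instructions 01 and 10 light the green LED, 11 lights the red one, and 00 does nothing at all, which you could think of as this machine's NOP.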

Indeed, very simple computers like the one I've described, and also like Malvino's SAP, use this kind of combinational logic, putting together sets of logic gates which trigger the right control signals based on whatever inputs they get. If you have a larger CPU, however, building the decoder out of individual gates quickly becomes incredibly inefficient. In practical terms, what people usually do when they design a larger CPU is to use a ROM chip instead. In a fundamental sense, the simplest device that acts like what we could reasonably think of as a CPU is a ROM chip.

Admittedly, this is something of an off-label use of a ROM chip: A ROM chip is supposed to be a memory device. Its main input is its address bus, which is used to select which memory cell you want to read from, and its main output is its data bus, where it outputs the contents of the selected memory cell. Notice, however, that the described behavior can be used to mirror what I described above with the XOR gate and the AND gate: If you think of the ROM's inputs not as memory addresses but as instruction codes, then you can think of the ROM's outputs not as memory contents but as control signals: The ROM produces a set of outputs depending on whatever number is currently being fed into it, and thus it can be used as an instruction decoder. Indeed, most homebrew CPUs, including the C74-6502, use ROM chips to interpret CPU instructions, and thus the process of writing the "microcode", which is the technical name for the contents of that ROM chip, becomes a form of programming, a type of programming which is actually more low-level than writing in assembly language or even machine language. It doesn't get any closer to the metal than that.
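To make the ROM-as-decoder idea concrete, here is a rough sketch of the AND/XOR LED gadget from earlier, rebuilt as a four-entry lookup table. The bit assignments (RED_LED on data bit 0, GREEN_LED on data bit 1) and the function name read_rom are arbitrary choices of mine for illustration.

```python
# Assumed wiring: data bit 0 drives the red LED, data bit 1 the green LED.
RED_LED = 0b01
GREEN_LED = 0b10

# The "microcode": index is the address (the instruction),
# value is the data word (the control signals).
ROM = [
    0,          # address 00: both LEDs off
    GREEN_LED,  # address 01
    GREEN_LED,  # address 10
    RED_LED,    # address 11
]

def read_rom(address):
    """Model a ROM read: address in, stored data word out."""
    return ROM[address & 0b11]

for address in range(4):
    print(f"{address:02b} -> {read_rom(address):02b}")
```

No gates are involved any more; changing what an instruction does is just a matter of changing a number in the table, which is exactly why writing microcode feels like programming.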

Theoretically, you can use any PROM, EPROM, or EEPROM chip as the microcode container for a homebrew CPU. In practice, size may become an important factor. The 6502 uses 8-bit instructions, and so the ROM needs to have at least an 8-bit address bus, which most ROM chips can handle, but the number of control signals within the CPU needs to be less than or equal to the number of data pins on the ROM. That can be quite difficult to arrange, as the number of outputs from the microcode can be enormous (the 6502's microcode control matrix has 130 control signals); basically, it's as many signals as you need to control whatever the rest of the CPU does. Presumably for this reason, the C74-6502 uses not one ROM chip, but four ROM chips to hold its microcode. These four ROM chips all take the same inputs, but with so many control signals, it was more practical to use several ROM chips than to find a single ROM chip with a data bus dozens of pins wide.
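Splitting one wide control word across several narrow ROMs is easy to sketch. In the example below I'm assuming four ROMs with 8-bit data buses each (32 control signals in total); I haven't checked the actual widths used in the C74-6502, and the microcode contents are made up.

```python
ROM_COUNT = 4      # four ROMs, sharing one address bus
BITS_PER_ROM = 8   # assumption: each ROM has an 8-bit data bus

def split_control_word(word):
    """Slice one wide control word into one byte per ROM (ROM 0 gets the lowest bits)."""
    mask = (1 << BITS_PER_ROM) - 1
    return [(word >> (BITS_PER_ROM * i)) & mask for i in range(ROM_COUNT)]

def build_rom_images(microcode):
    """Turn a list of wide control words into ROM_COUNT separate byte lists."""
    images = [[] for _ in range(ROM_COUNT)]
    for word in microcode:
        for image, byte in zip(images, split_control_word(word)):
            image.append(byte)
    return images

def read_control_word(images, address):
    """Every ROM sees the same address; their outputs sit side by side."""
    return sum(image[address] << (BITS_PER_ROM * i) for i, image in enumerate(images))

microcode = [0x00000000, 0x80000001, 0x12345678]  # made-up control words
images = build_rom_images(microcode)
print([hex(b) for b in split_control_word(0x12345678)])  # ['0x78', '0x56', '0x34', '0x12']
```

Each list in images is what you would burn into one physical chip. Electrically, "joining" the outputs back together is nothing more than running the data pins of all the ROMs to different control lines.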

So there you go. That is, theoretically, the simplest possible CPU: A ROM chip. In practice, however, most CPU operations take a few steps to complete: Often, you need to read data from one location, process that data somehow, and then store the result of that process in another location. Theoretically, you could build a separate circuit for each instruction so that each instruction gets done all at once, but in practice, CPUs take multiple cycles to perform an instruction. To this end, there needs to be a counter inside the CPU to keep track of which step the CPU is on: Is it on the first step of the current instruction, or the second, or the third? This counter gets reset to zero at the beginning of each instruction so that the CPU can keep track of where it is within the current instruction. Malvino's SAP uses what it simply calls a "ring counter" (RC) to keep track of the current CPU state; the ring counter has 6 outputs and simply cycles through these outputs sequentially from the first one to the sixth one, unless it is reset. Only one output may be active at a time. SAP implements the ring counter using three 74LS107 chips, each of which is a dual JK flip-flop; the flip-flops are wired in series such that each one triggers the next one when the clock pulses. Like pretty much any practical CPU, the C74-6502 also has a counter to keep track of the machine state, and for this it uses what it calls the "State Register (Q)", which has 4 states, indicated by the signals Q0, Q1, Q2, and Q3. These are generated by "Q.REG", a 74AC161 4-bit binary counter chip (so, strictly speaking, a binary counter rather than a one-hot ring counter like SAP's).
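The behavior of a one-hot ring counter like SAP's can be sketched as follows. This models only the behavior (six outputs, exactly one active, advancing on each clock pulse), not the actual JK flip-flop wiring; the class and method names are mine.

```python
class RingCounter:
    """Behavioral model of a one-hot ring counter."""

    def __init__(self, states=6):
        self.states = states
        self.reset()

    def reset(self):
        # Only the first output is active after a reset.
        self.outputs = [1] + [0] * (self.states - 1)

    def clock(self):
        # Each clock pulse shifts the single active bit along by one place,
        # wrapping from the last output back around to the first.
        self.outputs = [self.outputs[-1]] + self.outputs[:-1]

counter = RingCounter()
for _ in range(7):
    print(counter.outputs)
    counter.clock()
```

After six clock pulses the active output is back where it started, which is the "ring" in the name.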

Because each set of control signals is determined by the current instruction and the current step in that instruction, the microcode thus takes not only the current instruction for input, but also the outputs of the step counter, so that it can generate the appropriate signals at the appropriate time. Thus, the input bus (address bus) of the microcode ROM needs to be at least as wide as the instruction plus the number of lines used to indicate the current step: one line per state if the counter's outputs are one-hot, as with a ring counter, or fewer if the step is encoded as a binary number.
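Here is a toy sketch of a microcode ROM addressed by instruction plus step. Everything in it is hypothetical: the 4-bit opcodes, the binary-encoded 3-bit step number (rather than one-hot ring counter outputs), the control signal names, and the two instructions are invented for illustration and do not come from SAP or the C74-6502.

```python
OPCODE_BITS = 4  # assumption: 4-bit opcodes
STEP_BITS = 3    # assumption: step number is binary-encoded (up to 8 steps)

# Hypothetical control lines, one bit each in the ROM's data word.
PC_OUT, MAR_IN, RAM_OUT, IR_IN, IR_OUT, PC_INC, A_IN, STEP_RESET = (
    1 << i for i in range(8)
)

# Every instruction begins with the same two fetch steps.
FETCH = [PC_OUT | MAR_IN, RAM_OUT | IR_IN | PC_INC]

MICROPROGRAMS = {
    0x0: FETCH + [STEP_RESET],                                     # "NOP"
    0x1: FETCH + [IR_OUT | MAR_IN, RAM_OUT | A_IN, STEP_RESET],    # "LDA"
}

def rom_address(opcode, step):
    """The ROM address is simply the opcode and the step number side by side."""
    return (opcode << STEP_BITS) | step

# Fill the ROM; unused addresses stay 0 (no control signals active).
rom = [0] * (1 << (OPCODE_BITS + STEP_BITS))
for opcode, steps in MICROPROGRAMS.items():
    for step, word in enumerate(steps):
        rom[rom_address(opcode, step)] = word

def control_word(opcode, step):
    return rom[rom_address(opcode, step)]

for step in range(5):
    print(step, f"{control_word(0x1, step):08b}")
```

With 4 opcode bits and 3 step bits, the ROM needs a 7-bit address bus, i.e. 128 entries, most of which are empty in this toy example.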

As mentioned, aside from the microcode, ALU, and some timing circuitry, most of the C74-6502 is just registers. A register is really nothing more than a collection of D flip-flops, typically 8 of them. One D flip-flop is essentially a single bit of storage, so if you put 8 of them in parallel in a chip, that chip can hold one byte of data. If you really wanted to, you could use chips like this as RAM, in which each chip would constitute one byte of RAM. In one sense, this would be an incredibly wasteful way to design the RAM for a computer, but it would allow for a highly granular level of analysis, since you could literally probe individual bits of the computer's memory with a voltmeter, logic probe, or oscilloscope. Even more "macroelectronic" would be designing the memory with a system like the "Flip Chip" modules used in computers like the DEC PDP-10. This would allow an absolutely granular level of analysis, down to the individual transistor level.

Of course, the C74-6502 doesn't get quite that low-level; in practice, it uses a variety of 7400-series register chips for its various registers. For example, the instruction register (IR), which contains the current instruction for the CPU to execute, consists of a 74AC273 octal D flip-flop ("octal" meaning that it has 8 of them, i.e. 8 D flip-flops, which, again, constitute 8 bits of data storage). The inputs of this chip are connected directly to the CPU's data bus ("D bus"). The outputs go directly to the microcode ROMs.
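The idea of a register as eight D flip-flops sitting side by side can be sketched like this. It is a behavioral model only, with class names of my own choosing; it ignores details of a real octal flip-flop chip such as the reset pin and edge timing.

```python
class DFlipFlop:
    """One bit of storage: on a clock pulse, Q takes on the value of D."""

    def __init__(self):
        self.q = 0

    def clock(self, d):
        self.q = d & 1


class Register8:
    """Eight D flip-flops in parallel, sharing one clock: one byte of storage."""

    def __init__(self):
        self.bits = [DFlipFlop() for _ in range(8)]

    def clock(self, bus_value):
        # Each flip-flop latches its own bit of whatever is on the bus.
        for i, flip_flop in enumerate(self.bits):
            flip_flop.clock((bus_value >> i) & 1)

    @property
    def value(self):
        return sum(flip_flop.q << i for i, flip_flop in enumerate(self.bits))


ir = Register8()
ir.clock(0xA9)  # latch a byte from the data bus
print(hex(ir.value), [flip_flop.q for flip_flop in ir.bits])
```

Between clock pulses the register just holds its value, no matter what the bus does, and each individual bit can be "probed" through ir.bits, much as you could probe the output pins of a real chip.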

Other registers found in the C74-6502 which are fairly standard include:

- Accumulator (abbreviated A), a general-purpose holding area for data
- X and Y memory indexing registers
- Address registers, used to output the desired memory address onto the address bus (some CPUs call this a MAR, or memory address register)
- Flags register (abbreviated the "P" register, for "processor status register"), each bit of which stores information about the state of the CPU
- Program counter (abbreviated PC), which contains the address of the current instruction in memory (some CPUs call this register an instruction pointer (IP) or instruction address register (IAR))

And that's it for now... In the future, I'd like to take a deeper look at exactly how the C74-6502 is structured and how it works, but perhaps by then Drass will have beaten me to the punch by including those details himself. In conclusion, my thanks are again due to Drass, who has gone to great lengths to document this important information for the world. I can only hope that future generations benefit from the understanding that this project has to impart.