Bluespec. The Rust of Hardware Design?

Posted on Oct 7, 2022

My Foray Into Bluespec

In 2017, a senior PhD student in the Synergy Accelerator Lab at Georgia Tech designed a machine learning accelerator engine in Bluespec SystemVerilog (BSV) - probably because BSV was taught (and perhaps still is) as part of the undergrad curriculum at a particular university in Seoul. In 2018, my grad advisor wanted me to synthesize those designs for an FPGA. Since the BSV syntax was foreign to me at the time, I quickly developed an aversion to BSV and never actually bothered synthesizing that accelerator.

For various reasons, I eventually got involved in LiteX (circa fall 2019). The main one: I noticed that a bunch of FPGA implementations published in academia were difficult, if not impossible, to reproduce, so I wanted my machine learning accelerator design to target a fully open source and reproducible FPGA toolchain. Yosys+NextPNR+OpenOCD formed the toolchain, and LiteX provided the IP needed for DDR3, Ethernet, etc.

I was particularly interested in the part of LiteX called LitePCIe, which is written in Migen. After experimenting with Migen, I left it for its spiritual successor, nMigen (now Amaranth), which was new at the time and offered the first-class support for module hierarchies that Migen lacked. I then left nMigen for Chisel, as nMigen simulation speeds were becoming unacceptably slow for a larger PowerPC design which admittedly wasn't well structured to begin with. The built-in simulator was slow, and the faster CXXRTL (Yosys) simulation backend for nMigen was still in its infancy. After a few months, I left Chisel for SpinalHDL, as Chisel had some issues with inspecting single elements in vectors and printing enums during simulation.

And I never looked back from SpinalHDL - until recently. Not that there's anything wrong with SpinalHDL - any difficulties with the SpinalHDL experience are mostly Scala-centric, namely:

Working offline with Scala code that uses SBT is rather difficult
The Scala compiler can be pretty slow

I'd been using SpinalHDL on and off for about a year and a half - during that time I'd also started tinkering with Common Lisp, Nim, Rust, and a tiny bit of OCaml - before quickly switching to Haskell... Which leads me to...

The Ideal RTL

I think the ideal RTL must satisfy the following properties:

Good syntax and predictable behavior.
Easy to build - i.e. - doesn't require complex makefiles and complex import statements.
Easy to simulate - i.e. - simulation doesn't need lots of makefiles and/or lots of boilerplate code.
Easy support for multiple compiler backends - i.e. - I can have SpinalHDL emit VHDL or Verilog, and adding support for RTLIL or FIRRTL to Spinal shouldn't be too hard at all.
Supports sane offline builds. Yes - you can write makefiles for Scala - but the assumption that you're always connected to the internet seems to be baked into most if not all JVM-based languages. Offline builds are important for HDL designs, especially in defense. And nearly every other language I can think of supports building offline OOTB.
Correct abstractions. Wires should always only behave like wires. Regs should only behave like registers... Etc
First class support for multiple clock domains.
Good type system. Spinal and Chisel inherit the semantics of the Scala type system - which supports algebraic data types (ADTs). So really, that makes Chisel, Spinal, and BSV/BH the only options for RTLs that support good type systems. Good type systems also make connecting up complex hardware, such as an AXI slave to an AXI crossbar, a trivial one-liner. The type system can then also ensure that no connections were missed and that only compatible connections are being made. Update: other HDLs, such as Spade and Clash, also use ADTs.
Modern language features. This would include support for first-class functions, support for unit tests, a package manager that can fetch and build sources for dependencies - etc.
IDE support. It's nice to be able to right-click on an instantiation of a hardware module and have your IDE jump to the source file that defines it - or in the case of Spinal or Chisel, be able to hover over an AXI type and see a hover box displaying the IOs defined in that type. It's features like this that make being a productive programmer so much easier.
Safety by construction. Rust as a language demands that the programmer write memory and type safe code by construction. Bluespec as a language demands that the designer partition the system into atomic rules, and the compiler schedules the rules in such a way that register reads occur before register writes, and that a wire can't have multiple drivers.
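
To make the last point a bit more concrete, here is a minimal Python sketch of the register semantics described above: reads inside a rule see the value from the start of the cycle, writes are committed together at the end, and a second driver on the same register is rejected. The names Reg and run_cycle are hypothetical and chosen only for illustration. The real Bluespec compiler does this analysis at compile time, whereas this toy model only checks at run time.

```python
class Reg:
    """Toy register: reads return the start-of-cycle value."""

    def __init__(self, init):
        self.value = init
        self.pending = None  # (rule name, new value), committed at end of cycle

    def read(self):
        return self.value

    def write(self, rule_name, new_value):
        # A second write in the same cycle models a multiple-driver error.
        if self.pending is not None:
            first_writer = self.pending[0]
            raise RuntimeError(
                f"{first_writer} and {rule_name} both drive the same register"
            )
        self.pending = (rule_name, new_value)

    def commit(self):
        if self.pending is not None:
            self.value = self.pending[1]
            self.pending = None


def run_cycle(regs, rules):
    """Fire every rule once, then commit all writes together."""
    for rule in rules:
        rule()
    for reg in regs:
        reg.commit()


a, b = Reg(1), Reg(2)


def swap():
    # Both reads see the old values, so no temporary is needed.
    a.write("swap", b.read())
    b.write("swap", a.read())


run_cycle([a, b], [swap])
swapped = (a.read(), b.read())  # (2, 1)
```

Because reads always happen before writes within a cycle, the swap works without a temporary, which is the behavior you would expect from two flip-flops wired to each other.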

Why Bluespec

I first address Bluespec's current shortcomings, and then delve into its rewarding benefits.

IDE Experience

Bluespec satisfies all of the above requirements except for IDE support - but I'm currently looking at how to take the Haskell code that bluetcl links to and turn it into a language server.

BSV Syntax

With regard to syntax, I find BSV confusing and Bluespec Haskell (BH) refreshing. The bsc compiler accepts both BSV and BH sources at the same time.

I would describe BSV as an attempt to embed what is truly a Haskell-like language into a language that looks like Verilog on the surface.

Clock Domains

From what I understand, Bluespec has first-class support for multiple clock domains and also makes it impossible to cross clock domains without using a synchronizer.

Computational Model

Bluespec code is unusual in that it's not really a programming language that describes a traditional sequential computer program, nor is it really an HDL that describes a circuit. It's more of a computational model presented as a term rewriting system (TRS): state plus a set of guarded rules that atomically rewrite that state. The computational model was chosen to reflect the nature of hardware design and synthesis. Code in Bluespec is constructed Haskell style, so you get first-class functions, functors (I STILL don't know what these are - time to read a category theory textbook), lists, monads, ADTs, etc. All this combined gives you incredible power to describe hardware succinctly and yet correctly.
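
As a rough illustration of the TRS idea, here is a Python sketch of the classic GCD example expressed as two guarded rules over a small piece of state. This is not Bluespec code, and State, Rule, rewrite, and gcd are hypothetical names for this sketch. It also fires one enabled rule at a time, which is the reference semantics, whereas a real compiler would schedule as many non-conflicting rules per clock cycle as it can.

```python
from typing import Callable, NamedTuple


class State(NamedTuple):
    x: int
    y: int


class Rule(NamedTuple):
    name: str
    guard: Callable[[State], bool]     # when may the rule fire?
    action: Callable[[State], State]   # atomic state update


RULES = [
    Rule("swap",
         lambda s: s.x > s.y and s.y != 0,
         lambda s: State(s.y, s.x)),
    Rule("subtract",
         lambda s: s.x <= s.y and s.y != 0,
         lambda s: State(s.x, s.y - s.x)),
]


def rewrite(state, rules):
    """Fire one enabled rule at a time until no guard holds."""
    trace = []
    while True:
        enabled = [r for r in rules if r.guard(state)]
        if not enabled:
            return state, trace
        rule = enabled[0]
        state = rule.action(state)
        trace.append(rule.name)


def gcd(x, y):
    # Assumption of this sketch: x > 0, otherwise "subtract" never terminates.
    if x <= 0 or y < 0:
        raise ValueError("this sketch assumes x > 0 and y >= 0")
    final, _ = rewrite(State(x, y), RULES)
    return final.x


result = gcd(15, 6)  # 3
```

The point is that nothing here says when each rule runs. The guards alone decide that, and the result is the same for any order in which enabled rules fire.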

I think nothing demonstrates this better than the fact that it is possible to write a functional AXI router in Bluespec without actually having to construct the arbiter manually. You can just tell Bluespec which arbiter you'd like to use, such as round-robin, and Bluespec will generate the rest of the logic for you. Also, with respect to being correct by construction, it's impossible to construct things with multiple ports in Bluespec without interactively specifying priority. What I mean by this is that the Bluespec compiler will either tell you that some ports in your design will effectively never get used, or that you need to bring in some sort of arbiter.

I'm not sure how many lines of code are needed to build an AXI router in Bluespec, but I'd be willing to bet it can be done in a few hundred or less.
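
For reference, the round-robin policy mentioned above boils down to something like the following Python sketch. This is only the arbitration policy, not Bluespec's arbiter library or its interface, and round_robin_grant is a hypothetical name.

```python
def round_robin_grant(requests, last_granted):
    """Return the index of the next requester after last_granted, or None.

    requests is a list of booleans, one per port.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        idx = (last_granted + offset) % n
        if requests[idx]:
            return idx
    return None


# Ports 0 and 1 both request; port 0 was served last, so port 1 wins.
grant = round_robin_grant([True, True, False], last_granted=0)  # 1
```

Searching from the port after the last winner is what gives every requester a turn, so no port can be starved by a busier neighbor.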

The Design Cycle

Typically, RTL designers design at least twice, maybe even thrice. They first develop a mental model or simulator of what they want to design, then translate that mental model into diagrams, and finally translate those diagrams into RTL.

In one of his talks, Nikhil from Bluespec (the company) mentioned that when working on POWER9, IBM originally had a C++ codebase to help model the processor subsystems, but then ended up switching to Bluespec - the net result being a more manageable codebase that also simulated much faster.

In Bluespec, the Haskell-like abstractions allow for quick high-level modeling of complex systems without losing the ability to quickly tune for timing afterwards - as I mention later.

Atomics

RTL designs can have special edge cases. For example, consider the Instruction Queue (IQ) in an Out-Of-Order (OOO) design. The cycle in which an instruction enters the IQ could be the same cycle in which a functional unit writes back to a register that is a source for that instruction. Such a condition might normally be considered an edge case requiring some out-of-band bypass wire, but we can exploit the atomicity of rules in Bluespec to handle it automatically: by placing the instruction insertion and the check for updates to any renamed registers in the same rule, instruction queuing becomes atomic with respect to register updates.

For more information, see pages 4 and 5 of the MICRO 2018 paper "Composable Building Blocks to Open up Processor Design" from MIT, which details their methodology for building an OOO RISC-V core.
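
The IQ race described above can be sketched in Python as follows. This is a toy model with hypothetical names (Core, insert_atomic, and so on), not the design from the paper. It only shows why doing the scoreboard check and the insertion as one atomic step avoids a missed wakeup.

```python
class Core:
    def __init__(self):
        self.ready = {"p1": False}  # scoreboard: is physical register p1 valid?
        self.iq = []                # entries: {"src": tag, "src_ready": bool}

    def writeback(self, tag):
        """Functional unit result: mark the register and wake waiting entries."""
        self.ready[tag] = True
        for entry in self.iq:
            if entry["src"] == tag:
                entry["src_ready"] = True

    def insert_atomic(self, src):
        """Scoreboard check and IQ insertion as a single atomic rule."""
        self.iq.append({"src": src, "src_ready": self.ready[src]})

    # The same work split into two steps that another rule can land between.
    def read_scoreboard(self, src):
        return self.ready[src]

    def insert_with(self, src, src_ready):
        self.iq.append({"src": src, "src_ready": src_ready})


def atomic_orderings():
    """With atomic rules, either firing order leaves the entry ready."""
    results = []
    for order in (("insert", "writeback"), ("writeback", "insert")):
        core = Core()
        for step in order:
            if step == "insert":
                core.insert_atomic("p1")
            else:
                core.writeback("p1")
        results.append(core.iq[0]["src_ready"])
    return results


def split_insert():
    """Writeback lands between the check and the insert: wakeup is missed."""
    core = Core()
    stale = core.read_scoreboard("p1")  # still False
    core.writeback("p1")                # IQ is empty, nothing to wake
    core.insert_with("p1", stale)
    return core.iq[0]["src_ready"]


atomic_results = atomic_orderings()  # [True, True]
missed_wakeup = split_insert()       # False: the entry would wait forever
```

In the split version the instruction sits in the queue with a stale "not ready" bit, which is exactly the kind of corner case that would otherwise need a hand-written bypass.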

Splitting Up A Design

Another challenge that the RTL designer can encounter is that it can be difficult to decide how to split up functionality across the design. Furthermore, it can be challenging to ensure that communication between the split pieces is correct.

With Bluespec, if you are able to split up your design into atomic invariants (that is, pieces of logic that are always true if certain conditions are met) and the compiler is able to compile your design without complaining, you'll end up with a design that is guaranteed to either work or stall, depending on how fair a scheduler you use for your design, if you even need one. I know it's also possible to inspect and possibly analyze the generated schedule. I'm not sure what level of analysis is possible, but it'd be nice to be able to assert that a certain rule fires with a certain frequency under certain conditions - I'm not sure if this is already possible or not.

Optimizing Your Design

With Bluespec, you can do things like replace FIFOs with fall-through FIFOs and Regs with CRegs (which are basically multi-ported fall-through regs), and the compiler will automatically ensure that your design still works.

This can be very powerful when it comes to optimizing your design within the speed-area tradeoff design space. Also, if you structure your program well and make good use of the powerful abstractions that are native to Haskell-like languages, you may also have access to quick and safe design refactors.
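
To give a feel for what "multi-ported fall-through reg" means, here is a small Python model of CReg-style behavior, under the assumption that a read on port i sees any write made in the same cycle on a lower-numbered port. The CReg class below is a hypothetical stand-in written for this post, not the Bluespec library module.

```python
class CReg:
    """Toy model: port i reads see same-cycle writes from ports below i."""

    def __init__(self, ports, init):
        self.value = init
        self.writes = [None] * ports  # pending write per port, wrapped in a tuple

    def read(self, port):
        # Look for the closest lower-numbered port that wrote this cycle.
        for p in range(port - 1, -1, -1):
            if self.writes[p] is not None:
                return self.writes[p][0]
        return self.value

    def write(self, port, new_value):
        self.writes[port] = (new_value,)

    def end_cycle(self):
        # The highest-numbered port that wrote determines the stored value.
        for w in self.writes:
            if w is not None:
                self.value = w[0]
        self.writes = [None] * len(self.writes)


counter = CReg(ports=2, init=0)
counter.write(0, 5)
same_port = counter.read(0)   # 0: port 0 still sees the start-of-cycle value
next_port = counter.read(1)   # 5: port 1 sees port 0's write in the same cycle
counter.end_cycle()
after = counter.read(0)       # 5
```

Swapping a plain register for something like this lets two rules that used to take two cycles communicate within one, which is the sort of latency tweak the section above is talking about.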

Simulation Speed

Bluespec simulates roughly as fast as Verilator. I've seen claims that say Bluespec can even simulate 3-4x as fast as Verilator.

What Makes Bluespec Like Rust?

I'd like to compare Bluespec to Rust because when your program compiles in Rust, it actually means something - namely that your program is memory and type safe (unless you explicitly use unsafe!). And when your Bluespec program compiles, it means that it is type safe, and that the rules for scheduling atomic rules are always satisfied.