BigInt in nimVM vs Python

Posted on Jul 6, 2021

I’m currently working on a digital logic DSL in Nim called Polished, because it provides a polished digital logic design experience. I need to decide between compile-time and runtime elaboration of the DSL, since compile-time execution in Nim runs on the nimVM.

The nimVM is known for being on the slow side; I wanted to know just how slow, so I decided to do some testing with BigInt in both nimVM and Python, but first some background:

Background

For Polished, I’m considering two possible toolchain flows, A and B:

A

1. Nim consumes the DSL and performs compile-time elaboration (with homoiconic Nim macros), transforming the input DSL into a netlist that is valid Nim at compile time.
2. Nim simulates the netlist at runtime, AND/OR
3. Nim lowers the DSL into Verilog, VHDL, or Yosys-RTLIL netlists at runtime.

B

1. Nim consumes the DSL at runtime
2. Nim elaborates the DSL at runtime
3. Nim lowers the DSL into Verilog, VHDL, or Yosys-RTLIL netlists, AND/OR Nim lowers the DSL, emits a new Nim file, and the runtime quits
4. This new Nim file can then be compiled and simulated.

Step B.3 might seem kind of strange, but consider the following:

Some Reference Flows

• Verilator compiles Verilog into C++ files which must themselves be compiled to simulate the Verilog
• The nMigen Python HDL actually JITs its AST into Python bytecode behind the scenes before simulation (see here)
• SpinalHDL is even more complex, with a multi-step process that looks like the following (although not necessarily in this order):
1. Lower SpinalHDL into Verilog
2. Generate Scala-JVM FFI RTL signal hooks to interface between Spinal in Scala and Verilator’s generated C++
3. Invoke Verilator to compile the Verilog to C++
4. Compile the generated C++, probably as a runtime-loadable shared library
5. Hook up all of the above and start the Spinal runtime on the JVM
• As if that’s not enough, Chisel sometimes works with FIRRTL and then does the Chisel equivalent of everything on the above list after step 1. Chisel first lowers the Chisel AST into FIRRTL. If the user wishes to simulate with Verilator as a backend, Chisel has the FIRRTL compiler lower FIRRTL into Verilog for Verilator to consume; if the user wishes to use the FIRRTL simulation backend, Chisel can invoke the FIRRTL simulator (itself written in Scala) directly after lowering.

Why Not Elaborate and Simulate in the Same Runtime?

Well, the basic answer is simulation speed. If you build the logic AST at runtime, you have to use dynamic structures such as linked lists for basically everything. Once the AST for the logic is built, it never changes during simulation, so it might as well be static to begin with.

Since Nim is homoiconic, I can actually design the DSL so that the AST is built at compile time and is thus static at runtime. The only issue with this approach is that Nim’s compile-time VM isn’t all that fast.

This poses a problem, given that the two primary reasons motivating me to create a digital logic DSL in a general-purpose language are:

1. Digital logic generation should be meta-parameterizable
2. Validation of digital logic designs takes up to 80% of the design cycle and would benefit immensely from the abstractions modern programming languages provide for writing testbenches.

Let’s consider reason no. 1 from above:

Imagine that you wished to use a neural network to help determine the optimal parameters for your digital logic design. If you evaluate the neural network in Nim to obtain those parameters, and this evaluation has to occur at compile time on the nimVM, the model evaluation could be unacceptably slow, resulting in very slow digital logic elaboration.
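To make reason 1 concrete, here is a hypothetical sketch of meta-parameterizable generation in Python (the names and netlist format are mine for illustration, not Polished’s): one generator function that emits a ripple-carry adder netlist for any bit width.

```python
# Hypothetical sketch of "meta parameterizable" logic generation.
# The gate names and netlist text format are illustrative only.

def adder_netlist(width: int) -> list[str]:
    """Emit a ripple-carry adder as a flat list of gate instances."""
    gates = []
    for i in range(width):
        # Each full-adder bit contributes a sum gate and a carry gate.
        gates.append(f"xor s{i} = a{i} ^ b{i} ^ c{i}")
        gates.append(f"maj c{i+1} = majority(a{i}, b{i}, c{i})")
    return gates

# The same generator is reused for any width -- the "meta" parameter.
print(len(adder_netlist(8)))   # 16: two gate instances per bit
print(adder_netlist(2)[0])     # xor s0 = a0 ^ b0 ^ c0
```

The point is that the parameter (here, `width`) could itself come from arbitrary computation, which is exactly what becomes painful if that computation must run on a slow compile-time VM.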

How Slow is NimVM?

I thought it would be interesting to compare BigInt in nimVM vs. compiled Nim vs. Python to get a sense of where things stand.

For this to work, you need Nim 1.5.

nimVM

import times
import bigints

proc fac(n: BigInt): BigInt =
  if n == 0'bi:
    return 1'bi
  else:
    return n * fac(n - 1'bi)

proc main =
  var t0 = cpuTime()
  var res = fac(100'bi)
  var tf = cpuTime()
  echo "res = ", res
  echo "time spent = ", tf - t0, "s"

static: main()


running it: nim --benchmarkvm fac.nims

Compiled Nim

import times
import bigints

proc fac(n : BigInt) : BigInt =
if (n == 0'bi):
return 1'bi
else: return n*fac(n - 1'bi)

proc main =
var t0 = cpuTime()
var res = fac(100'bi)
var tf = cpuTime()
echo "res = ", res
echo "time spent = ", tf - t0, "s"

main()


running it: nim c -r -d:release fac.nims

Python

import time

def fac(n):
    if n == 0:
        return 1
    else:
        return n * fac(n - 1)

t0 = time.time()
res = fac(100)
tf = time.time()

print(f"time spent = {tf - t0}")


running it: python3 fac.py
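One caveat with the listings above: a single fac(100) call finishes in microseconds, where timer resolution and scheduling noise dominate. A sketch of a more robust measurement using Python’s standard timeit module (the function is the same as the listing above):

```python
import timeit

# Same function as the listing above, written as a one-liner.
def fac(n):
    return 1 if n == 0 else n * fac(n - 1)

# Average over many calls, and take the best of several repeats
# to suppress scheduling noise.
per_call = min(timeit.repeat(lambda: fac(100), number=1000, repeat=5)) / 1000
print(f"time spent = {per_call:.2e}s per call")
```

The same best-of-N-repeats idea applies to the Nim measurements as well.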

Results Table

The Factor column is normalized to Python’s time.

Config          Time (s)      Factor Slower than Python
nimVM           0.010548      ~311
Compiled Nim    0.000327      ~10
Python          0.0000339     1
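The Factor column is just each measured time divided by Python’s; recomputing it from the table:

```python
# Measured times from the table above, in seconds.
times = {"nimVM": 0.010548, "Compiled Nim": 0.000327, "Python": 0.0000339}
baseline = times["Python"]

# Normalize each configuration to Python's time.
factors = {config: t / baseline for config, t in times.items()}
for config, factor in factors.items():
    print(f"{config}: {factor:.0f}x Python's time")
```

which reproduces the ~311x and ~10x factors above.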

Conclusions

1. Clearly, the BigInt implementation in Nim needs improvement.
2. NimVM doesn’t handle arithmetic-intensive workloads well.
3. Clearly, compile-time elaboration is not the way to go.