A Stupid Journey with Verilator, Altera BRAM, and Tristate Jiacheng Ma

A Stupid Journey with Verilator, Altera BRAM, and Tristate

Unlike the software community, the hardware community tends to use extremely expensive, licensed toolchains such as Quartus, Vivado, and other EDAs from Synopsys and Mentor Graphics. Open source tools exist, but, meh… 🙂

Earlier this week I was using verilator to simulate a toy project, which uses an IP from Intel/Altera called altsyncram, a widely used block ram module for Intel/Altera FPGAs. This IP is provided in Quartus’s simlib as a huge verilog file (so other EDA tools like VCS and ModelSim can use it to simulate FPGA projects). After a bunch of hacking (i.e., turning off errors and warnings), I got it compiled. However, this module never gives a meaningful read response. The problematic code can be summarized as follow:

module testram(
    input logic clk,
    input logic wen,
    input logic [3:0] waddr,
    input logic [15:0] wdata,
    input logic [3:0] raddr,
    output logic [15:0] rdata
);

    altsyncram #(
        .intended_device_family("Stratix"),
        .operation_mode("DUAL_PORT"),
        .width_a(16),
        .widthad_a(4),
        .numwords_a(16),
        .width_b(16),
        .widthad_b(4),
        .numwords_b(16),
        .rdcontrol_reg_b("CLOCK0"),
        .address_reg_b("CLOCK0"),
        .outdata_reg_b("UNREGISTERED"),
        .read_during_write_mode_mixed_ports("OLD_DATA")
    ) bram (
        .clock0(clk),

        .wren_a(wen),
        .address_a(waddr),
        .data_a(wdata),
        .rden_a(0),

        .address_b(raddr),
        .q_b(rdata),
        .wren_b(0),
        .rden_b(1),

        .data_b(),
        .clock1(),
        .clocken0(),
        .clocken1(),
        .clocken2(),
        .clocken3(),
        .aclr0(),
        .aclr1(),
        .byteena_a(),
        .byteena_b(),
        .addressstall_a(),
        .addressstall_b(),
        .q_a(),
        .eccstatus()
    );
endmodule

In this module, I configured altsyncram as duo-port BRAM, where the first port only serves write and the second port only serves read.

Then, I got the following waveform:

In the testbench, I set raddr to 0, waddr to 0, and wdata to 0xff, so hopefully 0xff will be written to the BRAM, and the value can be read out through rdata. From the code, we can see that testram.wen is obviously connected to testram.bram.m_default.altsyncram_inst.wren_a (because we use this port to write). However, verilator does not propagate the signal to it. Also, rden_a has a weird value of 1 (we explicitly set it to 0 in the code above).

A deeper inspection of the code of altsyncram shows that testram.bram.m_default.altsyncram_inst.wren_a is actually typed as tri0. And another wire connecting testram.wen to it, is also typed as tri0. It seems like for some reason, verilator does not handle this unusual case well.

From verilator’s email thread, I found a piece of messages where someone chatted about the tri signals from Altera IPs. In the conversation, they mentioned a script that helps to convert tri signals to logicthat would help verilator to simulate.

OK… That seems to be the one I need to work around the bug. After parsing the simlib with the script mentioned before, I got a few verilog files where all tri0 and tri1 are replaced with logic. Then it seems like the signal propagates!

Now wren_a and rden_a both have the correct values. However, rdata is still not correct. What the **!

To figure out the real root cause of this bug, I ran the simulation on VCS. VCS can generate the correct waveform when using the original simlib (i.e., the altsyncram directly from simlib, which contains a bunch of tri0 and tri1). However, with the version where all tri0 and tri1 are replaced with logic, VCS shows exactly the same result as verilator.

The problem is about tri0 and tri1. These are special wires that having a value z is also meaningful. When a wire with type tri0 has a value z, it is treated as 0; when a wire with type tri1 has a value z, it is treated as 1. Simply changing all tri0 and tri1 to logic may break the assumption that is held by a huge amount of codes in the implementation of altsyncram. In our case, in the module testram, a number of the interface wires of bram are left as unconnected; in verilator, the default value of these wires are 0.

The solution is to change testram to the following:

module testram(
    input logic clk,
    input logic wen,
    input logic [3:0] waddr,
    input logic [15:0] wdata,
    input logic [3:0] raddr,
    output logic [15:0] rdata
);
    altsyncram #(
        .intended_device_family("Stratix"),
        .operation_mode("DUAL_PORT"),
        .width_a(16),
        .widthad_a(4),
        .numwords_a(16),
        .width_b(16),
        .widthad_b(4),
        .numwords_b(16),
        .rdcontrol_reg_b("CLOCK0"),
        .address_reg_b("CLOCK0"),
        .outdata_reg_b("UNREGISTERED"),
        .read_during_write_mode_mixed_ports("OLD_DATA")
    ) bram (
        .clock0(clk),
        .wren_a(wen),
        .address_a(waddr),
        .data_a(wdata),
        .rden_a(0),
        .address_b(raddr),
        .q_b(rdata),
        .wren_b(0),
        .rden_b(1),
        .data_b(),
        .clock1(),
        .clocken0(1), // clocken0 was tri1
        .clocken1(1), // clocken1 was tri1
        .clocken2(1), // clocken2 was tri1
        .clocken3(1), // clocken3 was tri1
        .aclr0(0), // aclr0 was tri0
        .aclr1(0), // aclr1 was tri0
        .byteena_a(1), // byteena_a was tri1
        .byteena_b(1), // byteena_b was tri1
        .addressstall_a(0), // addressstall_a was tri0
        .addressstall_b(0), // addressstall_b was tri0
        .q_a(),
        .eccstatus()
    );
endmodule

In the above code, we manually assign a value to these “useless” wires. For the wires with a type tri0, we assign a value 0; for the wires with a type tri1, we assign a value 1. The simulation result is shown below.

Everything is correct, and we can get the value!

September 24, 2020