Configurable RWDS sampling and clock-start delay. #21

phsauter · 2024-12-02T16:04:16Z

A finalized version of #20
Current State: Tested in RTL simulation

During the Command-Address block of reads and writes (except for zero-latency writes), the RWDS signal must be sampled to determine the additional latency required (1x the configured value if it is LOW, 2x if it is HIGH).
RWDS is driven by the device some time after CS is driven low by the controller, and it is de-asserted by the device synchronously to the CK edge while the last CA data is stable.
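
As a small illustration of the latency rule above, here is a minimal Python sketch. It is not code from this PR; the function and parameter names are made up for the example.

```python
def total_latency_cycles(configured_latency: int, rwds_high: bool) -> int:
    """Latency in clock cycles implied by the RWDS level sampled during CA.

    RWDS LOW  -> 1x the configured latency
    RWDS HIGH -> 2x the configured latency
    """
    if configured_latency < 0:
        raise ValueError("configured_latency must be non-negative")
    return configured_latency * (2 if rwds_high else 1)
```

For example, with a configured latency of 6 cycles, a HIGH sample yields 12 cycles and a LOW sample yields 6.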

For the worst case timing (t_DSV max, t_CSS min and t_CKDS min) the RWDS signal is only valid for about one clock period (two cycles to three cycles after CS going low).
It is important to mention that, according to the spec, the RWDS signal from the device and the clock start from the host are both referenced to the falling edge of CS but not to each other. This means that with a slower-than-spec'd clock, RWDS may arrive earlier than expected (even before the clock starts).

This presents us with two problems:

1. We must ensure we sample at exactly the right time to capture the correct RWDS data.
2. Given a long chip-to-chip path (or slow pads), the arriving RWDS signal might be shifted back, meaning the valid window is at the end or even beyond the end of the outgoing CA block.

Fixing 1:
Solved by adding a separate module that counts clock edges after CS going low. This then enables a clock gate, propagating a single clock pulse to the sampling flip-flop. The exact edge where the sampling should occur is configurable.
A second flip-flop is added to cross the signal into the clk_phy domain the main FSM is in.
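
The edge-counting idea can be sketched behaviourally in Python. This is only a simplified model under stated assumptions, not the RTL from this PR: `sample_rwds`, `rwds_at_edge` and `sample_edge` are hypothetical names, and the waveform is given as one RWDS level per clock edge after CS goes low, with `None` standing for "not driven / not valid".

```python
from typing import Optional, Sequence


def sample_rwds(rwds_at_edge: Sequence[Optional[int]],
                sample_edge: int) -> Optional[int]:
    """Return the RWDS level captured at the configured clock edge.

    rwds_at_edge[i] is the level seen at the (i+1)-th clock edge after
    CS goes low (None models high-Z or an invalid level).
    """
    if sample_edge < 1:
        raise ValueError("sample_edge counts edges starting at 1")
    edge_count = 0
    sampled = None
    for level in rwds_at_edge:
        edge_count += 1
        # The clock gate opens for exactly one edge, so the sampling
        # flip-flop sees a single clock pulse.
        clock_gate_open = edge_count == sample_edge
        if clock_gate_open:
            sampled = level
    return sampled
```

With a waveform such as `[None, None, None, 1, 1, None]`, choosing `sample_edge=4` captures the valid level, while `sample_edge=2` would capture an invalid one. This is why the edge needs to be configurable.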

Fixing 2:
Two additions were made:

- There is a configurable CS going low to clock-start delay which can give the FSM more time to work before it needs to start the RWDS sampling process.
- Applying the additional latency is delayed from the Command-Address phase to the end of the first latency. This increases the available time for RWDS sampling by at least three cycles (depending on the configured latency).
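
A rough way to see the effect of both additions is to count the clock cycles available before the latency decision has to be made. The sketch below is a simplified model with hypothetical names. It assumes that the CA phase takes 3 clock cycles (6 CA bytes, two per clock) and that the first latency period follows the CA phase directly; the actual FSM may count differently.

```python
CA_CYCLES = 3  # assumption: 6 CA bytes transferred at two bytes per clock


def decision_deadline_cycles(cs_to_ck_delay: int,
                             configured_latency: int,
                             deferred: bool) -> int:
    """Cycles after CS goes low by which the RWDS sample must be available.

    deferred=False: decision taken at the end of the Command-Address phase.
    deferred=True:  decision taken at the end of the first latency period.
    """
    deadline = cs_to_ck_delay + CA_CYCLES
    if deferred:
        deadline += configured_latency
    return deadline
```

In this model, deferring the decision gains exactly `configured_latency` cycles, and every extra cycle of clock-start delay adds one more.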

Together, these changes make the clock start and the RWDS sampling time much more configurable and should give considerably more control, especially when operating at out-of-spec frequencies.

See Figures 9.4 and 9.5 in the spec (read timing diagrams) as well as Table 9.4 (AC parameters). Note that the diagrams are not drawn to scale for a worst-case transaction; t_DSV in particular is drawn far too short.

The attached image shows the relevant signals of the new RWDS sampler. The chip-select to clock-start time tcs->ck can be increased from its minimum of one clock cycle. trwds can be changed in steps of 1/2 clock cycle, with a minimum of 0.5 clock cycles (corresponding to the falling edge immediately after CS going low).
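
To make the half-cycle stepping concrete, here is a small Python sketch that maps a sampling setting to a delay and an edge polarity. The function name and the encoding of the setting as a count of half cycles are assumptions for illustration, not the register layout of this PR. The only fact taken from the description is that 0.5 cycles corresponds to the first falling edge after CS goes low, so the edges are assumed to alternate from there.

```python
from typing import Tuple


def rwds_sample_point(half_cycles: int) -> Tuple[float, str]:
    """Map a sampling setting (in half clock cycles) to (delay, edge).

    half_cycles = 1 is the minimum: 0.5 clock cycles after CS goes low,
    i.e. the first falling edge. Each further step alternates the edge.
    """
    if half_cycles < 1:
        raise ValueError("minimum setting is 1 half cycle (0.5 clock cycles)")
    delay_cycles = half_cycles / 2
    edge = "falling" if half_cycles % 2 == 1 else "rising"
    return delay_cycles, edge
```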

According to the spec, t_DSV (data strobe valid), the time from CS# going low until RWDS is valid, can be at most 2 clock periods long (12 ns at 166 MHz). This shrinks the RWDS valid window down to one period centered on CA4 (5th data transaction), meaning it is valid around the 3rd rising edge of CK.
Problem: With additional routing delay, the RWDS sample register (clocked by clk_i) may miss the stable period of RWDS.
Solution: Delaying the clock is allowed; it gives RWDS more time to arrive and creates a larger stable window. It is possible to set this to zero to increase throughput.
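
The "2 clock periods" figure follows from simple arithmetic, sketched below in Python. The helper name is made up for this example. The 50 MHz case is only an illustration of a slower-than-spec'd clock, where the same 12 ns is well under one period, so RWDS can become valid before the clock has started.

```python
def ns_to_clock_periods(t_ns: float, f_mhz: float) -> float:
    """Express a time in nanoseconds as a number of clock periods."""
    if f_mhz <= 0:
        raise ValueError("frequency must be positive")
    # One period is 1000 / f_mhz nanoseconds.
    return t_ns * f_mhz / 1000.0


t_dsv_max_ns = 12.0  # value quoted in the text above
periods_at_166 = ns_to_clock_periods(t_dsv_max_ns, 166.0)  # just under 2
periods_at_50 = ns_to_clock_periods(t_dsv_max_ns, 50.0)    # 0.6 of a period
```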

For the worst-case RWDS timing (t_DSV max, t_CSS min and t_CKDS min), the window of validity for RWDS is about one clock period, centered on the 3rd rising edge of CK. This change ensures that we sample exactly then. Sampling at any other time may give incorrect results (from sampling high-Z) and increases the risk of metastability. For long chip-to-chip delays (or slow pads), it may still be necessary to increase the time from the CS falling edge to the first CK edge.

The reset was not being triggered since the gated clock stops before chip select goes high. A sticky bit driven by the ungated clock is now used to indicate the start of a transfer. The counter reaching the target value resets the sticky bit. The counter only counts while the sticky bit is set (from the start of the transfer until the target is reached).
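
The sticky-bit scheme can be modelled in a few lines of Python. This is a behavioural sketch under assumptions, not the RTL: the class and method names are hypothetical, and clearing the counter at the start of a transfer is an assumption of this model.

```python
class SampleCounter:
    """Counts ungated clock edges from transfer start until a target."""

    def __init__(self, target: int) -> None:
        if target < 1:
            raise ValueError("target must be at least 1")
        self.target = target
        self.sticky = False  # set at transfer start, cleared at target
        self.count = 0

    def start_transfer(self) -> None:
        """Model CS going low: set the sticky bit and restart counting."""
        self.sticky = True
        self.count = 0  # assumption: counter restarts with each transfer

    def clock_edge(self) -> bool:
        """Advance one ungated clock edge.

        Returns True only on the edge where the target is reached.
        """
        if not self.sticky:
            return False  # counter is idle while the sticky bit is clear
        self.count += 1
        if self.count == self.target:
            self.sticky = False  # reaching the target resets the sticky bit
            return True
        return False
```

Because the sticky bit is cleared by the counter itself rather than by chip select going high, the model returns to idle even if no further gated clock edges arrive.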

phsauter and others added 9 commits November 28, 2024 18:25

Remove phy-fsm signals from RWDS sampling

d294d0a

Decouples the clock domain better, only the rwds_sample_o signal crosses between phy and system clk.

Refactor RWDS sampling to separate module

52adf91

More configurable RWDS sampling

6c7f5d8

Exact sampling edge is adjustable.

Delay additional latency decision, add cfg regs

0f7e079

We want to give the RWDS sampler as much time as possible to get a value. So we delay the additional latency decision to the latest point possible.

Set Hyperram chips conf register in tb

4d157b1

Sample RWDS relative to CS edge not clock start

21b2940

phsauter requested review from paulsc96, thommythomaso and luca-valente as code owners December 2, 2024 16:04

Adjust NumBaseRegs to include rwds_sample reg

4ab8088

phsauter force-pushed the phsauter/rwds_sampling branch from 5c79efb to 4ab8088 Compare December 3, 2024 15:39

phsauter marked this pull request as draft April 8, 2025 09:49
