timestamp API #860

Open
catkira opened this issue Jul 4, 2022 · 47 comments
@catkira
Contributor

catkira commented Jul 4, 2022

A possible way to support timestamps would be to add the following data structure and functions to libiio:

struct meta_command
{
    uint64_t timestamp;
    uint64_t num_samples;
    enum
    {
        send_timed,
        send_untimed,   // timestamp is ignored
        receive_timed,
        receive_untimed
    } type;
    uint64_t command_counter;
};

void set_timestamp_mode(bool mode)
{
    // write mode to the "mode" iio attribute of the timestamper core
    // if mode is false, everything should behave like it does now
}

void write_timed_command(struct meta_command cmd)
{
    // write cmd to the command iio channel of the timestamper core
}

For getting the timestamp from HDL to userspace, a timestamp could be added to each block. The disadvantage of this would be that the block size would no longer be num_samples * sample_size. It would also be a bit more complicated inside the axi_dmac, because the internal FIFO size is calculated as burst_size * num_bursts.
I think the cleanest way would be to add a receive channel to the timestamper module that emits a timestamp for every meta command. The difficulty here would be matching the received timestamp to the meta command (probably easy for synchronous reads, harder for async/non-blocking reads). One could add a command_id to the meta_command struct to make this easier, or add a get_timestamp() function to libiio that matches timestamps to received data blocks. But one would have to make sure that this matching also works correctly in case of overflows in the HDL. So adding a command_counter to meta_command would probably be the easiest; the command_counter value could then also just be a field of the received_timestamp struct:

struct received_timestamp
{
    uint64_t timestamp;
    uint64_t command_counter;
};

void get_received_timestamp(struct received_timestamp *timestamp)
{
    // read an item from the rx timestamp channel of the timestamper core
}
@pcercuei
Contributor

pcercuei commented Jul 5, 2022

I disagree with your suggestion: the timestamp value should not be written by the DMA to the samples buffer. Instead, it should be written by the IP to a separate hardware register, which the kernel can read in the IRQ handler and provide to userspace through a device or buffer attribute. That would make things way simpler: the application would just need to dequeue a block of samples, then optionally read the attribute.

Same for TX - the userspace would write a device/buffer "timestamp" attribute, which would cause the kernel to write to a register of the IP, which would control how the next buffer is transferred.

Note that I don't really think we need a "timestamp", but a "samples counter". Since we know the sample rate we can retrieve the timestamps anyway, and having a "samples counter" would make it much easier to detect DMA underruns / overruns.
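
A minimal sketch of what that attribute-based flow could look like from userspace, assuming the libiio v0.x buffer-attribute helpers; the "timestamp" attribute name is hypothetical and nothing like it exists yet:

#include <iio.h>

/* Sketch only: rx_dev/tx_dev and rxbuf/txbuf come from the usual
 * iio_context_find_device() / iio_device_create_buffer() setup. */
static long long dequeue_and_read_stamp(const struct iio_device *rx_dev,
                                        struct iio_buffer *rxbuf)
{
    long long stamp = 0;

    iio_buffer_refill(rxbuf);
    /* hypothetical attribute, filled by the kernel from the IP register */
    iio_device_buffer_attr_read_longlong(rx_dev, "timestamp", &stamp);
    return stamp;
}

static void push_at(const struct iio_device *tx_dev,
                    struct iio_buffer *txbuf, long long stamp)
{
    /* hypothetical attribute, written to an IP register by the kernel */
    iio_device_buffer_attr_write_longlong(tx_dev, "timestamp", stamp);
    iio_buffer_push(txbuf);
}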

@pcercuei
Contributor

pcercuei commented Jul 5, 2022

There's no need for buffering in the FPGA. The DMA should process the requests sequentially like it already does, and wait until the transfer time is reached if needed. So it does not need to know about any following request before the previous ones are finished.

The number of bytes does not need to be specified, just transfer the whole block of samples. If you want to transfer a non-standard number of samples, then just create a block of that size.

Adding a separate iio channel is a bad idea, there would be no way of matching the data from that channel to the samples of the main buffer. You only need to instruct the DMA to wait for a specified time before transferring a whole buffer, so a hardware register (appearing as a device/buffer attribute in userspace) looks appropriate to me.

Baseband frequency changes don't change the sample rate of your ADC. The sample rate does not change on-the-fly like that. Making the timestamp core work with samples count would work much better and be way more precise.

@pcercuei
Contributor

pcercuei commented Jul 5, 2022

The DMA cores already have hardware queues, and switching from one configured transfer to another is instant and does not cause samples to be missed.

@pcercuei
Contributor

pcercuei commented Jul 8, 2022

I don't really know what the FPGA program does; I just know that we already have a hardware queue for the DMA, and no samples are lost when the DMA switches from streaming one block of samples to another. So we already manage "consecutive burst writes", if you will.

You can manage things with just a single register for meta-data, if this register is read by the IP at the moment it starts streaming from a new block of samples. Then every time the DMA is done streaming from a block of samples, the driver can update the value of the register to the next transfer's setting. And since the timestamper is basically about pausing / unpausing the DMA until a special point in time is reached, the magic register could also be a new field in the DMA hardware descriptor.

As for using time vs. samples count... The example of a random software changing the sample rate isn't really valid IMHO. This software does not use timestamped buffers, and if it did, it would either not change the sample rate (keep the fast rate at all times), or handle it properly (not programming timestamped buffers while a sample rate change is under way). Using a samples counter just makes more sense, as it allows extreme precision in the time domain and makes it possible to detect overruns/underruns.

@pcercuei
Contributor

pcercuei commented Jul 8, 2022

If the samples counter simply represents the number of samples by which the next transfer should be delayed, then "now" is just "counter=0".

Programming transfers at a specific point in time, which you call "timestamping", is one mechanism. The other mechanism is tagging the RX buffers with a timestamp, to be able to know when exactly those RX buffers were received. Again, it would make sense to use a samples count value, because then you can detect gaps between buffers. You can compute the age of the RX buffer by comparing with the "live" samples counter, which would be a register updated by the IP every time a sample is received.

Whatever we choose to do, there will be driver changes needed, so whether or not changes are intrusive isn't really a problem.

You are thinking about having a "timestamper control logic" after the AXI DMA core. What I'm telling you is that the AXI DMA core could gain the "programming transfers" functionality, and the rest of the HDL wouldn't change.
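
As a toy illustration of that age computation (all names here are assumptions, not existing API; the "live" counter and the per-buffer counter are the hypothetical registers described above):

#include <stdint.h>

/* Age of an RX buffer, in seconds, computed from a hypothetical free-running
 * sample counter and the counter value latched when the buffer was captured. */
static double buffer_age_seconds(uint64_t live_counter,
                                 uint64_t buffer_counter,
                                 double sample_rate_hz)
{
    uint64_t age_samples = live_counter - buffer_counter;
    return (double)age_samples / sample_rate_hz;
}

/* A gap between consecutive buffers (i.e. an overrun) would show up as
 * this_buffer_counter > prev_buffer_counter + prev_buffer_nb_samples. */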

@catkira
Contributor Author

catkira commented Jul 8, 2022

Changes in the HDL code:
The FLAG register will get another flag, e.g. BURST.
Another register will be added, e.g. "start_sample" at address 0x0454. It would be nicer if it were at address 0x410 or so, because that's where all the transfer config registers are, but then all the following register addresses would need to change as well. What do you prefer?

and in buffer.c

struct iio_buffer * iio_device_create_buffer(const struct iio_device *dev,
		size_t samples_count, bool cyclic)

could be changed to

struct iio_buffer * iio_device_create_buffer(const struct iio_device *dev,
		size_t samples_count, int flag, unsigned int start_sample)

In iio-private.h, iio_buffer gets a new field:

struct iio_buffer {
	const struct iio_device *dev;
	void *buffer, *userdata;
	size_t length, data_length;
	unsigned int start_sample;  // new field

	uint32_t *mask;
	unsigned int dev_sample_size;
	unsigned int sample_size;
	bool dev_is_high_speed;
};

In local.c, struct block gets a new field:

struct block {
	uint32_t id,
		 size,
		 bytes_used,
		 type,
		 flags,
                 start_sample,                 // new field
		 offset;
	uint64_t timestamp;
};

and buffer_impl.h would need corresponding modifications as well.

The callchain when writing a buffer is:

local.c        ioctl_nointr(f, BLOCK_ENQUEUE_IOCTL, last_block)
  -> industrialio-buffer.c        iio_buffer_ioctl()
    -> industrialio-buffer.c         iio_buffer_enqueue_block(buffer, (struct iio_buffer_block __user *)arg)
      -> industrialio-buffer-dma.c         iio_dma_buffer_enqueue_block(buffer, &block)
        -> industrialio-buffer-dma.c         iio_dma_buffer_enqueue(queue, block)
          -> industrialio-buffer-dma.c          iio_dma_buffer_submit_block(queue, block)
            -> industrialio-buffer-dma.c          queue->ops->submit(queue, block)
              -> cf_axi_dds_buffer_stream.c        dds_buffer_submit_block(queue, block)
                -> cf_axi_dds_buffer_stream.c        iio_dmaengine_buffer_submit_block(queue, block, DMA_TO_DEVICE)
                 -> industrialio-buffer-dmaengine.c          desc = dmaengine_prep_slave_single(...)
                   -> dma-axi-dmac.c       axi_dmac_prep_slave_sg(...)
                 -> industrialio-buffer-dmaengine.c          desc->callback_param = block
                 -> industrialio-buffer-dmaengine.c          dmaengine_submit(desc)

desc has type dma_async_tx_descriptor. I guess that cannot contain meta information, so the new meta information would have to go into block.

Edit: After checking again, I see that there is a field dma_descriptor_metadata_ops in dma_async_tx_descriptor; maybe that can be used.
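
If that route works, a rough sketch of attaching client metadata to a prepared descriptor could look like the following, assuming the axi-dmac driver were extended to advertise DESC_METADATA_CLIENT support; the timestamp_metadata struct and the helper function are made up for illustration:

#include <linux/dmaengine.h>

/* Hypothetical metadata payload carrying the requested start sample. */
struct timestamp_metadata {
	u64 start_sample;
};

static int submit_with_start_sample(struct dma_chan *chan,
				    struct dma_async_tx_descriptor *desc,
				    struct timestamp_metadata *meta)
{
	int ret;

	/* Attach the client-provided metadata buffer before submitting;
	 * this only works if the DMA driver supports DESC_METADATA_CLIENT. */
	ret = dmaengine_desc_attach_metadata(desc, meta, sizeof(*meta));
	if (ret)
		return ret;

	dmaengine_submit(desc);
	dma_async_issue_pending(chan);
	return 0;
}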

The buffer for TX will be prepared with dmaengine_prep_slave_single(), which gets forwarded to axi_dmac_prep_slave_sg() of the axi-dmac driver, see here.

After preparing the buffer, it will be submitted with dmaengine_submit(), which does

desc->tx_submit(desc)  // defined in include/linux/dmaengine.h

but I don't see yet how this reaches the axi_dmac driver... I think the desc will somehow end up in axi_dmac_start_transfer(...).
I think the meta information could be added here:

struct axi_dmac_desc {
	struct virt_dma_desc vdesc;
	bool cyclic;   // probably need to make this an int or enum flag
        unsigned int start_sample;    // new attribute
	bool have_partial_xfer;

	unsigned int num_submitted;
	unsigned int num_completed;
	unsigned int num_sgs;
	struct axi_dmac_sg sg[];
};

@pcercuei
Contributor

I didn't know about metadata support in the dmaengine framework, that's great!

So the axi-dmac driver could be improved to support meta-data, as well as the DMA API core, to add support for sample counters.
The various chip drivers (or common code to all devices using axi-dmac) can then register themselves as meta-data client or provider, and have the sample counters exported to userspace.

Finally, libiio (in the local backend) would use this information when dequeueing buffers to know when samples have been lost, and set this attribute if the enqueued buffer has to be transmitted at a precise time.

@pcercuei
Contributor

I don't think the API of libiio will be a problem. You could have something like:

int iio_block_set_enqueue_delay(block, nb_samples);

uint64_t iio_block_get_dequeue_timestamp(block);
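
A sketch of how an application might use these two calls if they existed (both functions and the numbers are hypothetical; nothing here is implemented):

/* TX: delay tx_block by 1 ms worth of samples (assuming 10 MSPS). */
iio_block_set_enqueue_delay(tx_block, 10000);
/* ...then enqueue tx_block as usual... */

/* RX: after dequeueing rx_block, ask when (in samples) it was captured. */
uint64_t captured_at = iio_block_get_dequeue_timestamp(rx_block);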

@pcercuei
Contributor

@catkira these are two different kernel interfaces, yes. The DMABUF one is the one I am trying to get upstream. It is not yet in ADI's kernel.

@pcercuei
Contributor

I'd think we could have that in the DMABUF API only.

And now that I think of it… Why do we need anything in the IP core to support programmed transfers?

If you want a transfer to happen in 2ms, then the application (or libiio) can compute the number of dummy samples needed to fit 2ms, and enqueue that number of dummy samples.
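
For example (illustrative numbers only):

uint64_t sample_rate = 30720000;  /* e.g. 30.72 MSPS */
uint64_t delay_us = 2000;         /* the 2 ms mentioned above */
uint64_t nb_dummy = sample_rate * delay_us / 1000000;  /* 61440 dummy samples */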

@mhennerich
Contributor

Well, in a lot of cases you want to synchronize your timestamping to a 1PPS or some other event.
Also you can't assume that your streaming can keep up with the baseband rate.
So in a TDD application you would prepare a slot, queue it and have some HW IP release it at a certain time.

@pcercuei
Contributor

@mhennerich but enqueueing an empty buffer is a "free" operation, since we can fill it once and then enqueue it endlessly. IIOD could have a command "enqueue X dummy samples", so that you wouldn't transfer any data.

Unless you mean that the DMA itself cannot keep up with the baseband rate.

@catkira the mmap interface is never going to get upstream; that's what the dmabuf interface is being designed for. Once it is upstream, I believe ADI's kernel will support both, but the mmap interface will be marked as deprecated.

@pcercuei
Contributor

@catkira we do have ADCs and DACs that can deliver GSPS, so if the HPC interface is limited to 300-400 MB/s, then we really do need to handle this in the IP core.

@catkira
Contributor Author

catkira commented Sep 4, 2022

This is how iio_block looks currently

struct iio_block {
	struct iio_buffer *buffer;
	struct iio_block_pdata *pdata;
	size_t size;
	void *data;

	struct iio_task_token *token, *old_token;
	size_t bytes_used;
	bool cyclic;
};

I would make it

struct iio_block {
	struct iio_buffer *buffer;
	struct iio_block_pdata *pdata;
	size_t size;
	void *data;

	struct iio_task_token *token, *old_token;
	size_t bytes_used;
	bool cyclic;
        long long time;
        long flags;
};

The signature of iio_buffer_create_block would become

struct iio_block *
iio_buffer_create_block(struct iio_buffer *buf, size_t size, long long time, long flags){...}

Alternatively, there could be functions like setBlockTime(struct iio_block *, long long time), setBlockFlags(struct iio_block *, long flags) and corresponding get functions.

For TX, long long time would hold the requested start time in ticks (sample count). For RX it would hold the receive time.
long flags would hold control flags, e.g. for TX SEND_NOW, SEND_SCHEDULED; for RX they could be RECEIVE_NOW, RECEIVE_SCHEDULED; and for the response block TIME_PAST, ON_TIME, OVERFLOW.

There would also need to be a way to get the timer (counter) value without sending or receiving anything, because applications need that for the first timed command. Soapy has an abstraction for that which looks like this:

long long getHardwareTime(const std::string &what = "") const;
void setHardwareTime(const long long timeNs, const std::string &what = "");

It could be done using the existing iio-api by just sending or receiving a block with bytes_used=0.
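
A sketch of what that could look like with the accessor variant (everything below is hypothetical; getBlockTime/getBlockFlags are the "corresponding get functions" mentioned above, and only iio_buffer_create_block exists today):

struct iio_block *blk = iio_buffer_create_block(buf, block_size);

/* TX: schedule the block at an absolute sample count. */
setBlockTime(blk, start_sample);
setBlockFlags(blk, SEND_SCHEDULED);
/* ...enqueue the block as usual... */

/* RX: after dequeueing, check when the block was captured and whether
 * the requested time was met. */
long long rx_time = getBlockTime(blk);
long rx_flags = getBlockFlags(blk);   /* TIME_PAST, ON_TIME or OVERFLOW */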

@pcercuei do you endorse this proposal? Do you prefer having set/get functions or changing the signature of iio_buffer_create_block?

@pcercuei
Contributor

pcercuei commented Sep 5, 2022

There's no point discussing how to implement it in libiio before we have a proof-of-concept working with HDL code and kernel drivers, because the libiio interface will depend on what API is offered by the kernel...

@catkira
Contributor Author

catkira commented Sep 5, 2022

That's a chicken-and-egg problem. Ideally the frontend and backend are developed at the same time so that the whole thing can be tested. But it seems that the HDL people at ADI are not interested in participating, so that part I will have to do alone. But maybe there is a chance to agree on a timestamp API for libiio. I prefer not to maintain private forks; I would rather get stuff merged with PRs.

@pcercuei
Contributor

pcercuei commented Sep 6, 2022

It is not a chicken-egg problem. Doing things out-of-order will most certainly result in twice the amount of work.

HDL people at ADI have their own agenda, you know. I'll see if we can organize something though.

@catkira
Contributor Author

catkira commented Sep 11, 2022

Seems like the HDL guys are busy with other things. The timestamp feature has been discussed here on and off during the last couple of years, so it probably does not make sense to wait for an initiative. I will do the timestamp stuff myself then; unfortunately this means that I have to maintain forks of the hdl, buildroot, linux and libiio repos :/

@catkira catkira closed this as completed Sep 11, 2022
@mhennerich
Contributor

Hi Benjamin,
scheduling a meeting with our HDL folks is on my todo list. Unfortunately last week I came back from a 2-week vacation, and I'm under water with quite a few things at the moment. We're happy to accept contributions for libiio, linux and HDL if we see them as useful and maintainable. I think the timestamp infrastructure will definitely fall into this category. Ideally the HDL work should be discussed on the HDL forum, and I don't know their capacity and bandwidth to start working on this right away. One way or the other, it would be good to have some alignment here. I'll set something up this week to discuss with them.

-Michael

@pcercuei pcercuei reopened this Sep 12, 2022
@rgetz
Contributor

rgetz commented Sep 13, 2022

the timestamp value should not be written by the DMA to the samples buffer. Instead, it should be written by the IP to a separate hardware register, that the kernel can read in the IRQ handler, and provide to the userspace through a device or buffer attribute.

If the interface IP writes a sample count to a hardware register - which the IRQ handler provides to userspace through an attribute - how do you know if you are synchronous (between buffers and timestamps)? In order to guarantee alignment between buffers and timestamps (or other metadata), doesn't it need to be managed in a single merged stream?

A single stream is how:

  • Ettus solved the problem in CHDR ([latest spec](https://files.ettus.com/app_notes/RFNoC_Specification.pdf#page=12))
  • The industry solved it in VITA 49, which is overkill - which is why most people don't use it - but is still a single stream for data + metadata
  • The industry solved it in SigMF, where you can have metadata per dataset.

Otherwise - how to manage the timestamp and the buffer from an application standpoint?

It's most important not for the simple use case of timestamping on a single device - check out the use cases in the Ettus doc:

  • Configuring multiple channels of a single device to change frequency simultaneously (on devices like ADRV9002); or NCOs on AD9081; or control GPIOs (on the FPGA).
  • Configuring the channels on multiple radios to synchronously retune and ensure a predictable phase offset between channels (think multiple Quad boards).
  • Coordinating a simultaneous change of gain or frequency in a radio.
  • Scheduling frequency hopping at set time increments.
  • Transmitting phase-coherent bursts on multiple radios.

I don't see how you can accomplish some of those things - without a tightly coupled timestamp and payload (dataset/buffer).

Those are the sorts of things that people want to do - and @mhennerich is correct - it requires both HDL and software (kernel and userspace) updates... If libiio 1.0 is that time - that's great.

Thanks
-robin

@pcercuei
Contributor

@rgetz We were talking about the "timestamp" in particular - so the possibility to program the time at which a given block of samples should be transmitted. Hence my suggestion of using a separate hardware register, since we can program the scheduled transmission time before enqueuing a block.

I was suggesting that the timestamp value should be expressed not in seconds, but in samples. This means having a sample counter in HDL, which would also help later with full meta-data support (as you could program actions at a specific sample in a buffer) as well as overflow / underflow detection.

About full meta-data support: we decided months ago that it's really just packet data, so it's a protocol between the IP and the application: neither the driver / IIO nor Libiio needs to know what's in there.

@catkira
Contributor Author

catkira commented Sep 14, 2022

I also think the "unit" of timestamps should be samples. Therefore there has to be a counter in the HDL core running with src_clk. The question is what happens at sample rate changes, but I think that can be solved by just resetting the counter every time a sample rate change happens, or alternatively high-level driver code can handle this. The Soapy driver for bladeRF does this by getting the hwCounter value before the rate change and again immediately after the rate change, and then keeping an internal offset for all future timestamps (or something similar, I am not sure if I remember exactly).

I would implement it like this:

  • The timestamp feature is implemented inside the axi_dmac core, with a counter running at src_clk (= sample clock).
  • The axi_dmac core gets additional hw registers (counter start value, duration in samples, flags) which are set when enqueueing a new block.
  • Regarding the linux driver and libiio, I would integrate the new metadata into the sg_descriptor, because it has to be written to the axi_dmac core together with the current dma descriptor (which is also just written to some hw registers of the axi_dmac core).
  • There is one caveat: if timestamping is done at the axi_dmac core level, it is not possible to easily get sample accuracy. This is because the cpack core puts multiple samples into one data word when one or more channels are disabled. The axi_dmac core would also not be allowed to just throw superfluous samples away, because they might be needed if there is another read after the current one without any gap in between. I thought about this problem for quite a while and had many ideas, like adding an early abort feature to the cpack core. But now I think it would be easiest to do that logic in the axi_dmac or libiio driver:
    the driver will always round up samples in read or write requests so that they are exact multiples of the axi_dmac input data width. The axi_dmac driver or libiio would then have to internally buffer the superfluous samples and prepend them to the next read or write if needed.
  • As a benchmark for the timestamp feature I would use srsRAN, because it is open source and widely used. Plus there is already other SDR hardware like Ettus and bladeRF that works with this software. srsRAN uses bladeRF via a SoapySDR driver, so the same could be done for Pluto and more powerful ADI devices.

Changing settings of the SDR IC in a synchronized fashion would be nice; that would expand the scope of the timestamp feature a lot and would go into TDD and sync territory, for which separate solutions are available at the moment.
However, I can imagine a simple way to integrate everything into the timestamp feature:

  • There could be a flag setting that tells the core to output the block data not to the DAC (or read not from the ADC), but instead to/from an external port of the axi_adc core. This external port could be connected to GPIO pins of SDR ICs, or to an I2C core, or directly to an I2C interface of an SDR IC to do bit-banged I2C. That way one could change settings in a synchronized way with the timestamp feature. Or, even easier, have a separate axi_dmac core for changing hardware settings.
  • The counters inside the different axi_dmac cores would have to be synchronized for that, but I think it's not too difficult to get that done with an additional counter_reset input or so.

I also wondered whether it would be functional overload to put the timestamp feature inside the axi_dmac core. At first I thought it would, but after thinking about it longer I concluded it might be useful for many applications, and I also found out that there is already a DMA core with a timestamp feature, the ARTIQ DMA core (http://m-labs.hk/artiq/manual/_modules/artiq/coredevice/dma.html).
But as I said before (I know people here don't want to hear it ^^), the axi_dmac core is currently not in a state where functionality can be added: the abort feature, which was added last, added a lot of complexity, is not documented and has 0% test coverage. Maybe you have some tests that are not published ^^
If I had to do the implementation alone, I would remove the abort feature from the axi_dmac core (I have already done that). After that, the code of the core is pretty readable and has decent test coverage. Then I would implement the timestamp feature. And if the abort feature is needed later, I would merge the timestamp feature back into the axi_dmac core with the abort feature.

Edit: that was all written for the RX case. For TX it would be similar, but

  • the counter would run at dest_clk
  • the superfluous samples problem with upack cannot be solved in the driver. The axi_dmac core would have to tell the upack core to send out an incomplete burst to the SDR IC. So I think a small feature would have to be added to upack.
  • One might argue that it's inconsistent to solve the superfluous sample problem for cpack in software and for upack in hardware (HDL). I would still prefer to do as much as possible in C/C++.

@catkira
Contributor Author

catkira commented Sep 14, 2022

I read the Ettus document that @rgetz mentioned.
They use an absolute time, which makes sense if you want to do all sorts of timed commands, because the sample clock is not well defined when the sample rate changes. With that in mind, it might be better to use the ctrl_clk of the axi_dmac for the counter. That would then be equivalent to the vita_time in Ettus.

Ettus seems to have one data stream from CPU to FPGA where everything is multiplexed, with each packet having a compressed header (CHDR):

  • Data
  • Flow Control
  • Command
  • Command Response

I think the advantage of this over having multiple channels (like multiple axi_dmacs) is that the correct order of things stays consistent even when the command queue in the HDL core runs empty.
When there is one axi_dmac for each channel and another axi_dmac for commands, it is possible that some timed commands or RX/TX events get executed too late when the queue is empty because of hardware bandwidth limitations or a busy CPU. But I imagine that this happens only in high-load situations, like:
multiple channels running at 60 MSPS, then stopping all of them at t_1, changing the frequency of all of them at t_2 and resuming streaming at t_3. If t_1, t_2, t_3 are very close to each other, some DMA transfers might not reach the FPGA in time and the correct sequence of events is lost even though timestamps are present. But handling that would also mean restarting most of the involved HDL cores from scratch, which is probably a bit too much.
I think having an axi_dmac core for each RX/TX channel and one for commands should work pretty well. I don't know if there is a way in Linux or the FPGA fabric to give one DMA channel a higher priority. If yes, it might be helpful to run the control axi_dmac core with a higher priority, so that timed commands don't get blocked by the data stream.

@rgetz
Contributor

rgetz commented Sep 14, 2022

I was suggesting that the timestamp value should be expressed not in seconds, but in samples.

Yes - agree 100% - the proper term would be "sample stamping", but no one calls it that; the common term used across the industry is still "timestamp" (reflecting sample clocks). However, thinking of this as wall time is terrible, since your wall clock will likely run from different clocks across units than your sample clock, and will drift compared to the sample clocks...

This means having a sample counter in HDL, which would also help later with full meta-data support (as you could program actions at a specific sample in a buffer) as well as overflow / underflow detection.

Also agree - the software exposure in the kernel/userspace/library needs to be done synchronously with the HDL, from an architecture/design/implementation perspective. There isn't any point in thinking about SW without the HDL team.

There are basically two ways (I think?) to do things.

  1. Add a header to each buffer
  2. Add fences and resource barriers in the "stream". This is a concept that I learned about from Lars when he was looking at this a few years ago, and is something that the GPU programmers have been doing for a long time...

I don't know the complexity of implementing or of using either. I can see the pros/cons of each...

-Robin

@pcercuei
Contributor

So we had a meeting with our HDL people to discuss this matter.

The initial design we came up with is the following:

  • One 64-bit counter in axi-dmac, running on the sample clock. It would need to be readable from a register too.
  • A register area of NB_BLOCKS_MAX * 64-bit, containing timestamp values for each one of the blocks, indexed by the DMA hardware descriptor index.
  • During a TX or RX transfer, the IP would read the corresponding timestamp from the registers area, wait until its value matches the internal counter, then perform the transfer. If the freerunning counter is already past the timestamp value, this is an underrun scenario, and the IP can raise an IRQ to the processor. In this case, the current counter's value is written back to the timestamp register (If there's no underrun the write wouldn't change the content of the register anyway).
  • The HDL would have to crawl the DMA descriptors list, in order to find the next transfer, because the transfers could be submitted out-of-order. Unless we make it a requirement that the transfers must be in-order.

On the kernel side of things, I would imagine an IIO buffer property named "timestamp", which the userspace would write prior to enqueueing a new block (either with BLOCK_ENQUEUE_IOCTL with the old mmap API, or with IIO_DMABUF_ENQUEUE_IOCTL with the new dmabuf API). When setting up the DMA descriptor the kernel would then write this value to the registers area, at the offset that corresponds to the block about to get transferred.

I think it would make sense that the "timestamp" IIO attribute would be automatically updated every time a block is enqueued so that it doesn't have to be touched when doing regular gapless data streaming. The value updated would correspond to the value of the block's timestamp register after the enqueue (since it can be updated by HDL when there's an underrun) + the size in samples of the block.

In Libiio, this IIO attribute would be read / written like any other regular attribute, but we could have a dedicated wrapper function for this functionality.

This design would allow to program transfers very accurately relative to one another - e.g. a 300µs delay between two transfers - but for Libiio to program transfers in wall clock time (e.g. transfer this block in exactly 500µs from now) we need to know the queue's latency, aka. the time in samples between the moment a buffer is enqueued and the moment it will be processed. I think it would be enough to have a "counter" IIO buffer attribute which would be updated alongside the "timestamp" one, with the current value of the IP's sample counter. Libiio can then estimate the latency from these values.
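
To illustrate that latency estimate (a sketch only; the "timestamp" and "counter" buffer attributes are the hypothetical ones proposed above, and the helpers assume the libiio v0.x buffer-attribute API):

long long timestamp, counter;

/* Both attributes would be updated by the kernel when a block is enqueued. */
iio_device_buffer_attr_read_longlong(dev, "timestamp", &timestamp);
iio_device_buffer_attr_read_longlong(dev, "counter", &counter);

long long latency = timestamp - counter;            /* queue latency, in samples */
long long delay = (500LL * sample_rate) / 1000000;  /* 500 µs in samples */

if (delay < latency) {
    /* the queue is already longer than 500 µs: the deadline cannot be met */
} else {
    iio_device_buffer_attr_write_longlong(dev, "timestamp", counter + delay);
    /* ...then enqueue the block as usual... */
}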

@mhennerich did I forget anything?
@rgetz, @catkira thoughts?

@catkira
Contributor Author

catkira commented Sep 20, 2022

What about the fixed-number-of-samples issue when some channels are disabled? (I have written about it above.)

@pcercuei
Contributor

@catkira You mean this?

there is one caveat: if timestamping is done at the axi_dmac core level, it is not possible to easily get sample accuracy. This is because the cpack core puts multiple samples into one data word when one or more channels are disabled. The axi_dmac core would also not be allowed to just throw superfluous samples away, because they might be needed if there is another read after the current one without any gap in between. I thought about this problem for quite a while and had many ideas, like adding an early abort feature to the cpack core.

I don't understand the problem, sorry. Unless something is really wrong, you should be able to request 3 16-bit samples from the DMA, and the last one will be padded with zeros.

@catkira
Contributor Author

catkira commented Sep 20, 2022

Let's say the DMA width is 64 bits like in Pluto, and one channel is disabled. How do you send only one IQ pair? Zero padding is bad, because a gap between two bursts might not be wanted.

@pcercuei
Contributor

You send one IQ pair in a 64-bit word padded with zeroes. I don't see the problem. You would only transfer the one IQ pair. The padding is, as its name suggests, just padding.

@catkira
Contributor Author

catkira commented Sep 20, 2022

But then the DAC sends out an unwanted zero sample.

@catkira
Contributor Author

catkira commented Sep 20, 2022

E.g. we have multiple blocks with one sample each and we want to send them without a gap.

@catkira
Contributor Author

catkira commented Sep 20, 2022

The upack HDL core does not know which bytes are padding and which ones are not ^^

@pcercuei
Contributor

Is that something that you think is happening, or that you know is happening?

Because as I said, the padding is just padding, it does not end up on the wire. Only the IQ samples are transferred. If the padding ends up transferred as well, it's a bug (HDL or kernel) that we should fix.

@catkira
Contributor Author

catkira commented Sep 20, 2022

I know it's happening currently, but it's not really a bug so far, because fixed-length transfers are not supported yet.
You can look at the documentation of the cpack core; that might help to understand the problem.

@catkira
Contributor Author

catkira commented Sep 20, 2022

Just look at what signals the upack core gets from the DMA core. The information about how many samples or bytes inside the 64-bit data word are valid is not transferred from the DMA core to the upack core. Therefore the padding samples end up on the wire.

@catkira
Contributor Author

catkira commented Sep 20, 2022

I know HDL is not your main area of expertise, but by looking at what information the upack core gets from the DMA core you can understand the problem without looking at HDL code.

@pcercuei
Contributor

Transfers of arbitrary lengths are supported and have been for a while, for instance libiio v0.24 has iio_buffer_push_partial.

I guess this issue is not really related to the timestamp API (more tangential to it); you should probably open a new bug report on analogdevicesinc/linux.

@catkira
Contributor Author

catkira commented Sep 20, 2022

But why in linux? It's a design limitation of the upack and DMA cores.

@pcercuei
Contributor

On analogdevicesinc/hdl then.

@acostina

We're currently thinking through possible architecture also on the HDL side.
I don't think the above discussion took into consideration the fact that we have some designs in which we transfer more than one sample per clock. Also, in some designs, the clocks connected to the DMA are not directly related to the sampling clock, as we use an intermediary buffer to capture/send data. Another aspect would be synchronizing multiple timers in multiple DMAs (both TX and RX side), which we would probably want to do.
We'll need a bit more time to think about the architecture, as we haven't focused on this recently in the HDL team and the people that developed and maintained the DMAC IP are no longer at ADI.

@catkira
Contributor Author

catkira commented Sep 21, 2022

@acostina I have done some modifications to the upack and DMA cores to solve this issue. I think it's too much to discuss the details in this libiio issue.

@catkira
Contributor Author

catkira commented Oct 7, 2022

@pcercuei
It would be nice if the user could get a result from the initiated transfer, e.g. an error when the requested time has already passed or an overflow occurred. Is it possible to get information from the axi_dmac back to the user when dequeueing a block? Or do you have an idea of how it could be done?

@pcercuei
Contributor

pcercuei commented Oct 7, 2022

Yes, with the design I explained in my big message above, Libiio knows when a transfer's requested time has already passed or when an overflow/underflow occurred, when enqueueing a block.

@catkira
Contributor Author

catkira commented Oct 7, 2022

The design you propose in your long post would only check if the timestamps are OK when they get enqueued. But it does not guarantee that the DMA descriptors reach the FPGA in time. So, if I understand it correctly, it could happen that everything appears to be fine on the Linux side, but the time has already passed once the descriptor reaches the FPGA. Also, RX overflows can happen inside the FPGA, when you read from the ADC and the DMA channel bandwidth is not sufficient, so that an overflow happens. How can this information reach the user?

@catkira
Contributor Author

catkira commented Oct 7, 2022

In the Linux driver there is a function that handles transfer-completed events here.
Inside this function one could read some additional registers from the FPGA like flag or transfer_result. They just have to reach the user somehow.
But these cases where the time has passed in hardware, or where overflows happen in hardware, should be pretty rare. So one could just have the software checks inside libiio like you described, and have dma-axi-dmac.c just print an error message when an overflow or time_passed event occurred in hardware.

@pcercuei
Contributor

pcercuei commented Oct 7, 2022

it does not guarantee that the dma descriptors reach the FPGA in time

I don't know what you mean by that. There are no hardware DMA descriptors.

When you enqueue a block to the kernel, the kernel will write the block's requested timestamp to the block's own timestamp HDL register, add it to the DMA queue, then read back the HDL timestamp register. If it reads a different value, that means you have an under/overrun.
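
A rough pseudo-C sketch of that check on the kernel side (all register accessors and names below are made up for illustration; nothing like this exists in dma-axi-dmac.c today):

static int axi_dmac_enqueue_timed(struct axi_dmac *dmac, unsigned int block,
				  u64 requested, u64 *effective)
{
	/* Program the block's requested start time (hypothetical register). */
	axi_dmac_write_block_timestamp(dmac, block, requested);

	/* Add the block to the DMA queue as usual. */
	axi_dmac_queue_block(dmac, block);

	/* Read back: the IP overwrites the register with the current counter
	 * value if the requested time had already passed. */
	*effective = axi_dmac_read_block_timestamp(dmac, block);

	return *effective == requested ? 0 : -ETIME;
}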

Also RX overflows can happen inside the FPGA, when You read from ADC and the DMA channel bandwidth is not enough so that an overflow happens.

Can they though? That really sounds like a HDL or hardware problem. The DMA should be able to keep up with the hardware.

@catkira
Contributor Author

catkira commented Oct 8, 2022

I don't know what you mean by that. There are no hardware DMA descriptors.

When you enqueue a block to the kernel, the kernel will write the block's requested timestamp to the block's own timestamp HDL register, add it to the DMA queue, then read back the HDL timestamp register. If it reads a different value, that means you have an under/overrun.

What I mean by hardware DMA descriptor is the bunch of registers that stores information like length, target_addr, dest_addr and potentially start_time inside the axi_dmac core.
I assume writing to the block's own timestamp HDL register will also take place in the axi_dmac kernel driver (dma-axi-dmac.c). But the only information that currently reaches this driver is the enqueued blocks via the kernel DMA framework. Furthermore, there is currently no information flowing back from dma-axi-dmac.c to the user. So if the comparison of the written and read-back timestamp happens inside dma-axi-dmac.c, how could the user be notified about the result?

[Edit: I remember there are some optional callback functions that the user can give to the dma descriptor. Maybe they could be used to pass information back. But I am not sure, I have to check.]

All the channel attributes are provided by the DAC and ADC drivers (cf_axi_adc_core.c and cf_axi_dds.c), but they currently don't write anything to the registers inside the axi_dmac core. It would be possible to also let the DAC and ADC drivers write axi_dmac hardware registers, but it would make the whole design less clean.

Since the timestamp information has to reach dma-axi-dmac.c, it would also be nice to have it as a separate parameter to enqueue_block and not as an iio buffer attribute - unless the buffer attribute can reach dma-axi-dmac.c through the Linux DMA kernel machinery, which is not possible if I understand it correctly.
