timestamp API #860
I disagree with your suggestion, the timestamp value should not be written by the DMA to the samples buffer. Instead, it should be written by the IP to a separate hardware register, that the kernel can read in the IRQ handler, and provide to the userspace through a device or buffer attribute. That would make things way simpler: the application would just need to dequeue a block of samples, then optionally read the attribute. Same for TX - the userspace would write a device/buffer "timestamp" attribute, which would cause the kernel to write to a register of the IP, that would control how the next buffer is transferred. Note that I don't really think we need a "timestamp", but a "samples counter". Since we know the sample rate we can retrieve the timestamps anyway, and having a "samples counter" would make it much easier to detect DMA underruns / overruns. |
There's no need for buffering in the FPGA. The DMA should process the requests sequentially like it already does, and wait until the transfer time is reached if needed. So it does not need to know about any following request before the previous ones are finished. The number of bytes does not need to be specified, just transfer the whole block of samples. If you want to transfer a non-standard number of samples, then just create a block of that size. Adding a separate iio channel is a bad idea, there would be no way of matching the data from that channel to the samples of the main buffer. You only need to instruct the DMA to wait for a specified time before transferring a whole buffer, so a hardware register (appearing as a device/buffer attribute in userspace) looks appropriate to me. Baseband frequency changes don't change the sample rate of your ADC. The sample rate does not change on-the-fly like that. Making the timestamp core work with samples count would work much better and be way more precise. |
The DMA cores already have hardware queues, and switching between one configured transfer to another is instant and does not cause samples to be missed. |
I don't really know what the FPGA program does, I just know that we already have a hardware queue for the DMA, and no samples are lost when the DMA switches from streaming one block of samples to another. So we already manage "consecutive burst writes", if you will. You can manage things with just a single register for meta-data, if this register is read by the IP at the moment it starts streaming from a new block of samples. Then every time the DMA is done streaming from a block of samples, the driver can update the value of the register to the next transfer's setting. And since the timestamper is basically about pausing / unpausing the DMA until a special point in time is reached, the magic register could also be a new field in the DMA hardware descriptor. As for using time vs. samples count... The example of random software changing the sample rate isn't really valid IMHO. This software does not use timestamped buffers, and if it did, it would either not change the sample rate (keep the fast rate at all times) or handle it properly (not programming timestamped buffers while a sample rate change is under way). Using a samples counter just makes more sense, as it allows being extremely precise in the time domain, and detecting overruns/underruns. |
If the samples counter simply represents the number of samples by which the next transfer should be delayed, then "now" is just "counter=0". Programming transfers at a specific point in time, which you call "timestamping", is one mechanism. The other mechanism is tagging the RX buffers with a timestamp, to be able to know when exactly those RX buffers were received. Again, it would make sense to use a samples count value, because then you can detect gaps between buffers. You can compute the age of the RX buffer by comparing with the "live" samples counter, which would be a register updated by the IP every time a sample is received. Whatever we choose to do, there will be driver changes needed, so whether or not changes are intrusive isn't really a problem. You are thinking about having a "timestamper control logic" after the AXI DMA core. What I'm telling you is that the AXI DMA core could gain the "programming transfers" functionality, and the rest of the HDL wouldn't change. |
Changes in the hdl code: and in buffer.c,

```c
struct iio_buffer * iio_device_create_buffer(const struct iio_device *dev,
        size_t samples_count, bool cyclic)
```

could be changed to

```c
struct iio_buffer * iio_device_create_buffer(const struct iio_device *dev,
        size_t samples_count, int flag, unsigned int start_sample)
```

In iio-private.h, iio_buffer gets a new field:

```c
struct iio_buffer {
    const struct iio_device *dev;
    void *buffer, *userdata;
    size_t length, data_length;
    unsigned int start_sample; /* new field */
    uint32_t *mask;
    unsigned int dev_sample_size;
    unsigned int sample_size;
    bool dev_is_high_speed;
};
```

In local.c, struct block gets a new field:

```c
struct block {
    uint32_t id,
             size,
             bytes_used,
             type,
             flags,
             start_sample, /* new field */
             offset;
    uint64_t timestamp;
};
```

and stuff needs to be modified in buffer_impl.h as well.

The call chain when writing a buffer is:

```
local.c                            ioctl_nointr(f, BLOCK_ENQUEUE_IOCTL, last_block)
-> industrialio-buffer.c           iio_buffer_ioctl()
-> industrialio-buffer.c           iio_buffer_enqueue_block(buffer, (struct iio_buffer_block __user *)arg)
-> industrialio-buffer-dma.c       iio_dma_buffer_enqueue_block(buffer, &block)
-> industrialio-buffer-dma.c       iio_dma_buffer_enqueue(queue, block)
-> industrialio-buffer-dma.c       iio_dma_buffer_submit_block(queue, block)
-> industrialio-buffer-dma.c       queue->ops->submit(queue, block)
-> cf_axi_dds_buffer_stream.c      dds_buffer_submit_block(queue, block)
-> cf_axi_dds_buffer_stream.c      iio_dmaengine_buffer_submit_block(queue, block, DMA_TO_DEVICE)
-> industrialio-buffer-dmaengine.c desc = dmaengine_prep_slave_single(...)
-> dma-axi-dmac.c                  axi_dmac_prep_slave_sg(...)
-> industrialio-buffer-dmaengine.c desc->callback_param = block
-> industrialio-buffer-dmaengine.c dmaengine_submit(desc)
```

Edit: After checking again, I see that the buffer for TX will be prepared with dmaengine_prep_slave_single(), which gets forwarded to axi_dmac_prep_slave_sg() of the axi-dmac driver, see here. After preparing the buffer, it will be submitted with dmaengine_submit(), which does desc->tx_submit(desc) (defined in include/linux/dmaengine.h), but I don't see yet how this reaches the axi_dmac driver... I think the desc will somehow end up in axi_dmac_start_transfer(...).

```c
struct axi_dmac_desc {
    struct virt_dma_desc vdesc;
    bool cyclic; /* probably needs to become an int or enum flag */
    unsigned int start_sample; /* new attribute */
    bool have_partial_xfer;
    unsigned int num_submitted;
    unsigned int num_completed;
    unsigned int num_sgs;
    struct axi_dmac_sg sg[];
};
``` |
I didn't know about metadata support in the dmaengine framework, that's great! So the axi-dmac driver could be improved to support meta-data, as well as the DMA API core, to add support for sample counters. Finally, libiio (in the local backend) would use this information when dequeueing buffers to know when samples have been lost, and set this attribute if the enqueued buffer has to be transmitted at a precise time. |
I don't think the API of libiio will be a problem. You could have something like:
|
@catkira these are two different kernel interfaces, yes. The DMABUF one is the one I am trying to get upstream. It is not yet in ADI's kernel. |
I'd think we could have that in the DMABUF API only. And now that I think of it… Why do we need anything in the IP core to support programmed transfers? If you want a transfer to happen in 2ms, then the application (or libiio) can compute the number of dummy samples needed to fit 2ms, and enqueue that number of dummy samples. |
Well, in a lot of cases you want to synchronize your timestamping to an 1PPS or some other event. |
@mhennerich but enqueueing an empty buffer is a "free" operation, since we can fill it once and then enqueue it endlessly. IIOD could have a command "enqueue X dummy samples", so that you wouldn't transfer any data. Unless you mean that the DMA itself cannot keep up with the baseband rate. @catkira the mmap interface is never going to get upstream, that's what the dmabuf interface is being designed for. Once it is upstream, I believe ADI's kernel will support both, but the mmap interface will be marked as deprecated. |
@catkira we do have ADCs and DACs that can deliver GSPS, so if the HPC interface is limited to 300-400 MB/s, then we really do need to handle this in the IP core. |
This is how struct iio_block currently looks:

```c
struct iio_block {
    struct iio_buffer *buffer;
    struct iio_block_pdata *pdata;
    size_t size;
    void *data;
    struct iio_task_token *token, *old_token;
    size_t bytes_used;
    bool cyclic;
};
```

I would make it:

```c
struct iio_block {
    struct iio_buffer *buffer;
    struct iio_block_pdata *pdata;
    size_t size;
    void *data;
    struct iio_task_token *token, *old_token;
    size_t bytes_used;
    bool cyclic;
    long long time;
    long flags;
};
```

The signature of iio_buffer_create_block would become:

```c
struct iio_block *
iio_buffer_create_block(struct iio_buffer *buf, size_t size, long long time, long flags) {...}
```

Or alternatively there could be separate set/get functions. For TX there would also need to be a possibility to get the timer (counter) value without sending or receiving anything, because applications need that for the first timed command. Soapy has an abstraction for that which looks like this:

```cpp
long long getHardwareTime(const std::string &what = "") const;
void setHardwareTime(const long long timeNs, const std::string &what = "");
```

It could be done using the existing iio API by just sending or receiving a block with bytes_used=0. @pcercuei do you endorse this proposal? Do you prefer having set/get functions or changing the signature of iio_buffer_create_block? |
There's no point discussing how to implement it in libiio, before we have a proof-of-concept working with HDL code, and kernel drivers; because then the libiio interface will depend on what API is offered by the kernel... |
That's a chicken-and-egg problem. Ideally the frontend and backend are developed at the same time so that the whole thing can be tested. But it seems that the HDL people at ADI are not interested in participating, so that part I will have to do alone. But maybe there is a chance to agree on a timestamp API for libiio. I prefer not to maintain private forks; I'd rather get stuff merged with PRs. |
It is not a chicken-egg problem. Doing things out-of-order will most certainly result in twice the amount of work. HDL people at ADI have their own agenda, you know. I'll see if we can organize something though. |
Seems like the HDL guys are busy with other things. The timestamp feature has been discussed here on and off during the last couple of years, so it probably does not make sense to wait for an initiative. I will do the timestamp stuff myself then, which unfortunately means I have to maintain forks of the hdl, buildroot, linux and libiio repos :/ |
Hi Benjamin, -Michael |
If the interface IP writes a sample count to a hardware register - that the IRQ handler provides to user space through an attribute - how do you know if you are synchronous (between buffers and timestamps)? In order to guarantee alignment between buffers and timestamps (or other meta data), doesn't it need to be managed in a single merged stream? A single stream is how:
Otherwise - how do you manage the timestamp and the buffer from an application standpoint? It's most important not for the simple use case of time stamping on a single device - check out the use cases in the Ettus doc:
I don't see how you can accomplish some of those things without a tightly coupled timestamp and payload (dataset/buffer). Those are the sorts of things that people want to do - and @mhennerich is correct - it requires both HDL and software (kernel and userspace) updates... If libiio 1.0 is that time - that's great. Thanks |
@rgetz We were talking about the "timestamp" in particular - so the possibility to program the time at which a given block of samples should be transmitted. Hence my suggestion of using a separate hardware register, since we can program the scheduled transmission time before enqueuing a block. I was suggesting that the timestamp value should be expressed not in seconds, but in samples. This means having a sample counter in HDL, which would also help later with full meta-data support (as you could program actions at a specific sample in a buffer) as well as overflow / underflow detection. About full meta-data support we decided months ago that it's just really packet data, so it's a protocol between the IP and the application: neither the driver / IIO nor Libiio need to know what's in there. |
I also think the "unit" of timestamps should be samples. Therefore there has to be a counter in the hdl core running with src_clk. The questions is what happens at sample rate changes, but I think that can be solved, by just resetting the counter everytime a sample rate change happens or alternatively high level driver code can handle this. Soapy driver for bladeRF does this by getting the hwCounter value before rate change and then getting the hwCounter value immediately after the hwRate change and then keeping an internal offset for all future timestamps (or something similar, not sure if I remember exactly). I would implement it like this:
Changing settings of the SDR IC in a synchronized fashion would be nice, that would expand the scope of the timestamp feature a lot and would go into tdd and sync territory for which there are separate solutions available atm.
I also wondered whether it would be a functional overload to put the timestamp feature inside the axi_dmac core. At first I thought it would, but thinking about it longer I realized it might be useful for many applications, and I also found out that there is already a DMA core with a timestamp feature, the ARTIQ DMA core (http://m-labs.hk/artiq/manual/_modules/artiq/coredevice/dma.html). Edit: that was all written for the RX case. For TX it would be similar, but
|
I read the ettus document that @rgetz mentioned. Ettus seems to have one data stream from CPU to FPGA where everything is multiplexed, with each packet having a compressed header (CHDR)
I think the advantage of this over having multiple channels (like multiple axi_dmacs) is that the right order of things stays consistent even when the command queue in the hdl core runs empty. |
yes - agree 100% - the proper term should be "sample stamping" - but no one calls it that. The common term used across the industry is still "timestamp" (to reflect sample clocks). However, thinking of this as wall time is terrible, since your wall clock is likely going to run from different clocks across units than your sample clock, and will drift compared to sample clocks...
Also agree - the software exposure in the kernel/userspace/library needs to be done synchronously with the HDL, both from an architecture/design/implementation perspective. There isn't any point in thinking about SW without the HDL team. There are basically two ways (I think?) to do things.
I don't know the complexity of implementing or complexity of using either. I can see the pro/con of each... -Robin |
So we had a meeting with our HDL people to discuss this matter. The initial design we came up with is the following:
On the kernel side of things, I would imagine an IIO buffer property named "timestamp", which the userspace would write prior to enqueueing a new block (either with BLOCK_ENQUEUE_IOCTL with the old mmap API, or with IIO_DMABUF_ENQUEUE_IOCTL with the new dmabuf API). When setting up the DMA descriptor, the kernel would then write this value to the registers area, at the offset that corresponds to the block about to get transferred.

I think it would make sense for the "timestamp" IIO attribute to be automatically updated every time a block is enqueued, so that it doesn't have to be touched when doing regular gapless data streaming. The value updated would correspond to the value of the block's timestamp register after the enqueue (since it can be updated by HDL when there's an underrun) + the size in samples of the block.

In Libiio, this IIO attribute would be read / written like any other regular attribute, but we could have a dedicated wrapper function for this functionality.

This design would allow programming transfers very accurately relative to one another - e.g. a 300µs delay between two transfers - but for Libiio to program transfers in wall clock time (e.g. transfer this block in exactly 500µs from now), we need to know the queue's latency, i.e. the time in samples between the moment a buffer is enqueued and the moment it will be processed. I think it would be enough to have a "counter" IIO buffer attribute which would be updated alongside the "timestamp" one, with the current value of the IP's sample counter. Libiio can then estimate the latency from these values. @mhennerich did I forget anything? |
How about the fixed-number-of-samples issue when some channels are disabled? (I have written about it above) |
@catkira You mean this?
I don't understand the problem, sorry. Unless something is really wrong, you should be able to request 3 16-bit samples from the DMA, and the last one will be padded with zeros. |
Let's say the DMA width is 64 bit like in Pluto, and one channel is disabled. How do you send only 1 IQ pair? Zero padding is bad, because a gap between two bursts might not be wanted. |
You send one IQ pair in a 64-bit word padded with zeroes. I don't see the problem. You would only transfer the one IQ pair. The padding is, as its name suggests, just padding. |
But then the DAC sends out an unwanted 0 sample. |
I.e. we have multiple blocks with 1 sample each and we want to send them without a gap. |
The upack HDL core does not know which bytes are padding and which ones are not ^^ |
Is that something that you think is happening, or that you know is happening? Because as I said, the padding is just padding, it does not end up on the wire. Only the IQ samples are transferred. If the padding ends up transferred as well, it's a bug (HDL or kernel) that we should fix. |
I know it's happening currently, but it's not really a bug so far, because fixed-length transfers are not supported so far. |
Just look at what signals the upack core gets from the DMA core. The information about how many samples or bytes inside the 64-bit data word are valid is not transferred from the DMA core to the upack core. Therefore the padding samples end up on the wire. |
I know HDL is not your main area of expertise, but by looking at what information the upack core gets from the DMA core you can understand the problem without looking at HDL code. |
Transfers of arbitrary lengths are supported and have been for a while; for instance libiio v0.24 has … I guess this issue is not really related to the timestamp API (more like tangent to it), you should probably open a new bug report on analogdevicesinc/linux. |
But why on Linux? It's a design limitation of the upack and DMA cores. |
On |
We're currently thinking through possible architecture also on the HDL side. |
@acostina I have done some modifications to the upack and DMA core to solve this issue. I think it's too much to discuss the details in this libiio issue. |
@pcercuei |
Yes, with the design I explained in my big message above, Libiio knows, when enqueueing a block, whether a transfer's requested time has already passed or an overflow/underflow occurred. |
The design you propose in your long post would only check whether the timestamps are OK when they get enqueued. But it does not guarantee that the DMA descriptors reach the FPGA in time. So, if I understand it correctly, it could happen that everything appears to be fine on the Linux side, but the time has already passed once the descriptor reaches the FPGA. Also, RX overflows can happen inside the FPGA, when you read from the ADC and the DMA channel bandwidth is not enough, so that an overflow happens. How can this information reach the user? |
In the linux driver there is a function that handles transfer completed events here |
I don't know what you mean by that. There are no hardware DMA descriptors. When you enqueue a block to the kernel, the kernel will write the block's requested timestamp to the block's own timestamp HDL register, add it to the DMA queue, then read back the HDL timestamp register. If it reads a different value, that means you have an under/overrun.
Can they though? That really sounds like a HDL or hardware problem. The DMA should be able to keep up with the hardware. |
What I mean by a hardware DMA descriptor is the bunch of registers that stores information like length, target_addr, dest_addr and potentially start_time inside the axi_dmac core. [Edit: I remember there are some optional callback functions that the user can give the DMA descriptor. Maybe they could be used to pass information back, but I am not sure, I have to check.] All the channel attributes are provided by the DAC and ADC drivers (cf_axi_adc_core.c and cf_axi_dds.c), but they currently don't write anything to the registers inside the axi_dmac core. It would be possible to also let the DAC and ADC drivers write axi_dmac hardware registers, but it would make the whole design less clean. Since the timestamp information has to reach dma-axi-dmac.c, it would also be nice to have it as a separate parameter to enqueue_block and not as an IIO buffer attribute - unless the buffer attribute can reach dma-axi-dmac.c through the kernel DMA machinery, which is not possible if I understand it correctly. |
A possible way to support timestamps would be to add the following functions to libiio
For getting the timestamp from HDL to userspace, there could be a timestamp added to each block. The disadvantage of this would be that the block size would not be num_samples * sample_size anymore. Also, inside the axi_dmac it would be a bit more complicated, because the internal FIFO size is calculated as burst_size * num_bursts.
I think the cleanest way would be to add a receive channel to the timestamper module that gives out a timestamp for every meta command. The difficulty here would be to match the received timestamp to the meta command (probably easy for sync reads and more difficult for async/non-blocking reads). One could add a command_id to the meta_command struct to make this easier, or add a get_timestamp() function to libiio that matches timestamps to received data blocks. But one would have to make sure that this matching also works correctly in case of overflows in the HDL. So adding a command_counter to meta_command would probably be the easiest. The command_counter value could then also just be a field of the received_timestamp struct.