Using program_erase_flash in USB devices on STM32F1 (e.g. DFU example) #222

dhoepfl · 2021-01-19T11:39:42Z

Here’s a problem I ran into when I tried to implement a USB MSC (mass storage class) device:

I’m using the STM32F103-based “black pill”. I used the stm32f4-discovery/usb_msc example as a base, but I replaced the ramdisk callbacks with implementations that read from and write to a portion of the flash that I dedicated to persisting the data.

Reading worked fine, but writing resulted in stalled USB connections. Finding my mistake took several days, but I finally found the explanation: erasing a flash page on the STM32F1 takes about 20 ms. That is not the problem per se, as I perform all writes in the main loop, outside usbd_poll (which is called from the USB ISRs). While flash is being erased, everything continues to work as usual unless you access flash in any way. If you do, the processor stalls until the erase has completed. Unfortunately, this includes fetching instructions and reading the interrupt vector table from flash.

That means if you erase any flash page while running as a USB device, you will miss USB frames if any code runs from flash. I did not test the DFU example, so it might work there, but it breaks the communication used for MSC. The only solution is to move all code that needs to execute during the flash erase into RAM.

I don’t see any way this could be solved in libopencm3, but it should be documented for flash_erase_page.


manuelbl · 2021-01-19T12:24:12Z

Are you sure your analysis is correct? It contradicts my understanding of how USB and the USB peripherals in MCUs work.

The USB peripheral handles all USB communication. Any code, be it libopencm3 code or project-specific code, is only called after the fact, i.e. after a packet has been received or sent. Until that code has executed, the USB peripheral continues to work: it simply responds with NAK to further USB communication (usually on an endpoint-by-endpoint basis). Responding with NAK isn't a problem at all. It's the basic mechanism for USB flow control: it signals to the host that the device is currently busy, and the host retries later.

If your analysis were correct, it would mean that:

either the USB peripheral is stalled too if the processor is stalled due to a flash access,
or the USB peripheral accesses the flash too,
or the host uses MSC specific timeouts shorter than 20ms.

Do you have any indication that any of the above is the case? Or do you see an issue with my understanding of the USB peripheral?

karlp · 2021-01-19T14:06:59Z

@manuelbl IIRC, we're always ACKing unless explicitly told to start NAKing. It's an API problem or benefit, depending on how you look at it. @dhoepfl, can you try wrapping your long code in set_nak()/clear_nak()?

manuelbl · 2021-01-19T14:25:47Z

@karlp I don't agree. On STM32F1s, the ACKs/NAKs are tied to whether buffers are free or not. Whenever the USB peripheral has received a packet, RX_STAT automatically changes to NAK, and all further communication is answered with NAK until RX_STAT is reset. The processor is not involved in this; it's all done autonomously by the USB peripheral.

The RX_STAT flag is reset when the packet is retrieved by the processor (call to usbd_ep_read_packet(), see https://github.com/libopencm3/libopencm3/blob/master/lib/stm32/common/st_usbfs_core.c#L230).

With double buffering and on DWC cores it's slightly more complex. But the fundamental concept stays the same: if the processor is stalled, no data is lost as all communication is answered with NAK (once the buffers are full).

I doubt that set_nak() and clear_nak() will make any difference.

karlp · 2021-01-19T14:48:31Z

Yes, but we read the packet (clearing NAK) to get the instruction to erase flash, so we then ACK the next packet while we're stalling...

manuelbl · 2021-01-19T15:26:45Z

Yes, but still no problem. The first packet is read, clearing NAK and starting the flash erase. A second packet is normally received by the USB peripheral and put into the shared memory (PMA), setting NAK, and all further packets are rejected with NAK until the flash erase is completed and the second packet has been retrieved. This is an automatic and reliable process without data loss.

If data is lost because the processor stalls, then there must be something else involved that hasn't been mentioned yet.
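The flow control described above can be checked with a small host-side model. This is a simplified sketch, not libopencm3 code: it models one single-buffered OUT endpoint with only the VALID and NAK states of STAT_RX, and all names (`ep_model`, `host_sends`, `firmware_reads`, `simulate`) are hypothetical. `firmware_reads()` stands in for roughly what `usbd_ep_read_packet()` does on this core: copy the packet out of PMA and set STAT_RX back to VALID.

```c
#include <stdbool.h>

/* Only the STAT_RX states this scenario needs (the real field also has
 * DISABLED and STALL). */
enum stat_rx { STAT_RX_NAK, STAT_RX_VALID };
enum handshake { HS_ACK, HS_NAK };

struct ep_model {
    enum stat_rx stat_rx;  /* what the peripheral answers to the next OUT */
    int pma_packet;        /* packet held in packet memory (PMA) */
    bool pma_full;
};

/* Peripheral side: the host sends an OUT data packet. No CPU involved. */
static enum handshake host_sends(struct ep_model *ep, int packet)
{
    if (ep->stat_rx != STAT_RX_VALID)
        return HS_NAK;             /* buffer busy: host retries later */
    ep->pma_packet = packet;
    ep->pma_full = true;
    ep->stat_rx = STAT_RX_NAK;     /* set by hardware after reception */
    return HS_ACK;
}

/* Firmware side: copy the packet out of PMA and re-arm the endpoint. */
static bool firmware_reads(struct ep_model *ep, int *packet)
{
    if (!ep->pma_full)
        return false;
    *packet = ep->pma_packet;
    ep->pma_full = false;
    ep->stat_rx = STAT_RX_VALID;
    return true;
}

/* Host delivers packets 1..n, one attempt per tick. Packet 1 makes the
 * firmware "erase flash", i.e. it stops running for stall_ticks ticks.
 * Stores packets in the order the firmware got them and counts NAKs. */
int simulate(int n, int stall_ticks, int *received, int *naks)
{
    struct ep_model ep = { STAT_RX_VALID, 0, false };
    int next = 1, got = 0, stall = 0;

    *naks = 0;
    while (got < n) {
        if (next <= n) {
            if (host_sends(&ep, next) == HS_ACK)
                next++;
            else
                (*naks)++;
        }
        if (stall > 0) {           /* CPU stalled: firmware does nothing */
            stall--;
            continue;
        }
        int pkt;
        if (firmware_reads(&ep, &pkt)) {
            received[got++] = pkt;
            if (pkt == 1)
                stall = stall_ticks;
        }
    }
    return got;
}
```

With `simulate(4, 5, rx, &naks)`, packet 2 is accepted into PMA during the stall, packet 3 is NAKed five times, and all four packets still arrive in order. The model says nothing about host-side timeouts, which is the third possibility listed earlier.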

dhoepfl · 2021-01-19T20:00:25Z

Sorry, I cannot help much here; I noticed this about a year ago but never opened the issue.

I remember that Windows lost the mass storage device I wrote back then when I erased the flash immediately on each write. I solved it by copying the flash to RAM on startup (my “mass” storage device is an 8 KB FAT filesystem) and writing it back to flash after unmount.
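The copy-to-RAM workaround can be sketched as a RAM shadow with per-page dirty flags. Assumptions: 512-byte MSC blocks, 1 KB flash pages (as on medium-density F103 parts) and an 8 KB disk. `flash_region` and `flash_write_page()` are hypothetical stand-ins that simulate flash on a host; on the target, `flash_write_page()` would wrap libopencm3's `flash_unlock()`, `flash_erase_page()`, `flash_program_half_word()` and `flash_lock()`. `disk_read_block()`/`disk_write_block()` are meant to have the shape of the read_block/write_block callbacks passed to `usb_msc_init()`; check the signature in your libopencm3 version. How the firmware detects the unmount is not covered here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSC_BLOCK_SIZE  512u
#define FLASH_PAGE_SIZE 1024u
#define DISK_SIZE       (8u * 1024u)
#define BLOCK_COUNT     (DISK_SIZE / MSC_BLOCK_SIZE)
#define PAGE_COUNT      (DISK_SIZE / FLASH_PAGE_SIZE)

/* Hypothetical: simulated flash backing store (on the target, a flash
 * address range reserved for the disk). */
static uint8_t flash_region[DISK_SIZE];
static unsigned flash_page_writes;       /* counts erase+program cycles */

static uint8_t shadow[DISK_SIZE];        /* RAM copy the host talks to */
static bool page_dirty[PAGE_COUNT];

/* Hypothetical: erase one flash page and program it from data. On an F1
 * this is where the CPU stalls for the erase time. */
static void flash_write_page(unsigned page, const uint8_t *data)
{
    memcpy(&flash_region[page * FLASH_PAGE_SIZE], data, FLASH_PAGE_SIZE);
    flash_page_writes++;
}

/* At startup: load the whole disk into RAM. */
void disk_init(void)
{
    memcpy(shadow, flash_region, DISK_SIZE);
    memset(page_dirty, 0, sizeof page_dirty);
}

/* MSC callbacks: RAM only, so they return quickly. 0 = success. */
int disk_read_block(uint32_t lba, uint8_t *copy_to)
{
    if (lba >= BLOCK_COUNT)
        return -1;
    memcpy(copy_to, &shadow[lba * MSC_BLOCK_SIZE], MSC_BLOCK_SIZE);
    return 0;
}

int disk_write_block(uint32_t lba, const uint8_t *copy_from)
{
    if (lba >= BLOCK_COUNT)
        return -1;
    memcpy(&shadow[lba * MSC_BLOCK_SIZE], copy_from, MSC_BLOCK_SIZE);
    page_dirty[(lba * MSC_BLOCK_SIZE) / FLASH_PAGE_SIZE] = true;
    return 0;
}

/* From the main loop, once the host has unmounted the disk: write back
 * only the pages that changed. */
void disk_flush(void)
{
    for (unsigned p = 0; p < PAGE_COUNT; p++) {
        if (page_dirty[p]) {
            flash_write_page(p, &shadow[p * FLASH_PAGE_SIZE]);
            page_dirty[p] = false;
        }
    }
}
```

Writes during a session then cost only a memcpy, and the erase stalls move to a point where the host is not expected to be talking to the device. The trade-off is the RAM for the shadow and losing unflushed data if power is removed before the flush.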

Maybe the problems I saw are related to the fact that I call usbd_poll from interrupts (NVIC_USB_WAKEUP_IRQ and NVIC_USB_LP_CAN_RX0_IRQ). This probably means that write_block (which in turn calls flash_erase_page) is called during USB interrupt handling. I’m not sure whether I switched to IRQ-based poll calls before or after getting the disconnects; I think it was afterwards. I’m quite sure I also tried just setting a flag and calling flash_erase_page from the main loop, but still saw the disconnects. It might just be a problem of the mass storage driver not liking NAKs when it wants to send data to USB media.

Nonetheless, I think it would be good to mention the bus stall in the documentation of flash_erase_page, even if USB should work, since it is a rather unexpected side effect (and, as far as I can tell, it is only mentioned in PM0068/PM0075, not in the STM32F1 reference manual RM0008…). E.g. a 1 ms SysTick interrupt is also not called 19 times to make up for flash_erase_page … I guess repeated interrupts that go unhandled during the stall are dropped.
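This matches how Cortex-M exceptions are latched: each exception has a single pending bit, not a counter, so ticks that fire while SysTick is already pending are merged into one request. A minimal host-side model (the name `simulate_ticks` is hypothetical, not libopencm3 code) reproduces the 19 missing calls when 20 ticks fall inside the stall:

```c
#include <stdbool.h>

/* Count how often a 1 ms SysTick handler runs over total_ms ticks when
 * ticks_in_stall consecutive ticks, starting at tick stall_start, fire
 * while the CPU is stalled by a flash erase. */
unsigned simulate_ticks(unsigned total_ms, unsigned stall_start,
                        unsigned ticks_in_stall)
{
    bool pending = false;        /* the single SysTick pending bit */
    unsigned handler_runs = 0;

    for (unsigned t = 0; t < total_ms; t++) {
        pending = true;          /* tick fires: set (or re-set) the bit */
        /* Is the CPU still stalled after this tick? The erase is assumed
         * to end after the last stalled tick and before the next one. */
        bool still_stalled = t >= stall_start &&
                             t + 1 < stall_start + ticks_in_stall;
        if (!still_stalled) {    /* handler runs once per pending bit */
            pending = false;
            handler_runs++;
        }
    }
    if (pending)                 /* stall outlasted the simulated window */
        handler_runs++;
    return handler_runs;
}
```

`simulate_ticks(100, 10, 20)` yields 81 handler runs instead of 100. If code needs elapsed time across an erase, one option is to read a free-running hardware timer instead of counting SysTick calls.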

karlp · 2021-01-20T11:12:09Z

I don't recommend calling usbd_poll from IRQ context unless you're really sure you want all your handlers to be invoked from IRQ context too. There's a reason we don't do that in the examples :)

We can try to document traps, but it's kind of endless. I would lean towards saying that if you're erasing flash, you really should have a good understanding of the impact of that. Splitting some of the information out into the PMxxxx manuals only happened on the F1, so on more modern parts you would have seen all the information in the reference manual anyway.

tormodvolden · 2021-11-19T22:44:37Z

For DFU this shouldn't be a problem because it is handled in the DFU protocol: The device reports how long it will be busy during flashing/erasing, and the host waits before resuming communication.
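In DFU 1.1, the device answers every DFU_GETSTATUS request with a 6-byte payload that includes bwPollTimeout, the minimum time in milliseconds the host should wait before its next DFU_GETSTATUS request. The device can announce its erase time there and erase while the host waits. The sketch below only packs that payload; the function and constant names are illustrative rather than libopencm3's API, and the field layout and state/status values follow the DFU 1.1 specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from the DFU 1.1 specification. */
enum { DFU_STATUS_OK = 0x00 };
enum { DFU_STATE_DNLOAD_SYNC = 3, DFU_STATE_DNBUSY = 4 };

/* Fill the 6-byte DFU_GETSTATUS payload:
 *   [0]    bStatus
 *   [1..3] bwPollTimeout, 24-bit milliseconds, least significant byte first
 *   [4]    bState
 *   [5]    iString (0 = no status string)
 * Returns the payload length. */
size_t dfu_pack_getstatus(uint8_t buf[6], uint8_t status,
                          uint32_t poll_timeout_ms, uint8_t state)
{
    buf[0] = status;
    buf[1] = (uint8_t)(poll_timeout_ms & 0xFFu);
    buf[2] = (uint8_t)((poll_timeout_ms >> 8) & 0xFFu);
    buf[3] = (uint8_t)((poll_timeout_ms >> 16) & 0xFFu);
    buf[4] = state;
    buf[5] = 0;
    return 6;
}
```

For example, a device about to erase one page could report `DFU_STATE_DNBUSY` with a poll timeout at least as long as the erase (about 20 ms according to this thread; the datasheet's maximum is the safer bound).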

andy7v · 2023-01-17T18:56:17Z

I have exactly the same problem. Everything works fine with the RAM buffer, but not with real flash memory that has to be erased and written. The library requires the "write_block" function to complete immediately; any delay disrupts normal operation. For testing purposes, I left only a simple dummy "for()" delay loop inside "write_block". If the delay exceeds roughly 300-500 iterations, writing freezes. This behavior is the same whether "usbd_poll()" is called from the while() loop or from USB_LP_CAN1_RX0_IRQHandler. If we look at usb_msc.c:
```c
if (0 != (ms->read_block)(lba,
                          trans->msd_buf)) {
        /* Error */
}
```
we see that an error returned by the block callback is not handled.
It seems like the library doesn't send "I'm busy" to the host.
I tried accumulating large blocks and then flushing them to flash. This does not change anything: at some point there is a delay during the flush, even if the flush is done inside the main loop rather than in "write_block".
Has anyone gotten MSC working with real flash? All the examples use a RAM buffer only.
