Using program_erase_flash in USB devices on STM32F1 (e.g. DFU example) #222

Open
dhoepfl opened this issue Jan 19, 2021 · 9 comments

@dhoepfl

dhoepfl commented Jan 19, 2021

Here’s a problem I ran into when I tried to implement a USB MSC device:

I’m using the STM32F103 based “black pill”. I used the stm32f4-discovery/usb_msc example as base but I replaced the ramdisk callbacks with implementations that read from/write to a portion of the flash that I dedicated to persist the data.

Reading worked fine but writing resulted in stalled USB connections. Searching for my mistake took several days but finally I found the explanation: Erasing a flash page on STM32F1 takes about 20ms. That is not the problem per se as I perform all writes in the main loop, outside usbd_poll (that is called from USB ISRs). While flash is erased, everything continues to work as usual, unless you access flash by any means. If you do so, the processor stalls until the flash erase has completed. Unfortunately this includes reading instructions and the interrupt vector from flash.

That means if you perform any flash page erase while running as USB device, you will miss USB frames if you run any code from flash. I did not test the DFU example so it might work there but it breaks the communication used for MSC. The only solution to the problem is to move all code that needs to execute during flash erase into RAM.
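A minimal sketch of what "moving code into RAM" can look like with GCC follows. It rests on assumptions that are not part of the report above: the `RAMFUNC` macro, the `.ramfunc` section name and the helper `ram_wait_flash_idle` are hypothetical, and the linker script plus startup code must actually place that section in SRAM and copy it there. The status register is passed in as a pointer so the snippet also builds on a host.

```c
#include <stdint.h>

/* Assumption: the linker script defines a ".ramfunc" output section that the
 * startup code copies into SRAM. On non-ARM builds the attribute is dropped. */
#if defined(__arm__)
#define RAMFUNC __attribute__((section(".ramfunc"), noinline))
#else
#define RAMFUNC
#endif

/* BSY is bit 0 of FLASH_SR on STM32F1. */
#define FLASH_SR_BSY_BIT (1u << 0)

/* Spin until the busy bit clears or max_polls is exhausted.
 * Returns how many polls saw the busy bit still set. */
RAMFUNC uint32_t ram_wait_flash_idle(volatile const uint32_t *status_reg,
                                     uint32_t max_polls)
{
    uint32_t polls = 0;

    while ((*status_reg & FLASH_SR_BSY_BIT) != 0u && polls < max_polls) {
        polls++;
    }
    return polls;
}
```

Placing one function in RAM is not enough on its own: everything it calls, and any interrupt handler (plus the vector table) that has to run during the erase, would need to live in RAM as well.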

I don’t see any way this could be solved in libopencm3 but it should be documented for flash_erase_page.

@manuelbl

Are you sure your analysis is correct? It contradicts my understanding of how USB and USB peripherals in MCUs work.

The USB peripheral handles all USB communication. Any code - be it libopencm3 code or project specific code - is only called after the fact, i.e. after a packet has been received or sent. Until such code has been executed, the USB peripheral will continue to work. It will simply respond with NAK to further USB communication (usually on an endpoint-by-endpoint basis). Responding with NAK isn't a problem at all. It's the basic mechanism for USB flow control. It signals to the host that the device is currently busy. The host will then retry later.

If your analysis was correct, it would mean that:

  • either the USB peripheral is stalled too if the processor is stalled due to a flash access,
  • or the USB peripheral accesses the flash too,
  • or the host uses MSC specific timeouts shorter than 20ms.

Do you have any indication that any of the above is the case? Or do you see an issue with my understanding of the USB peripheral?

@karlp
Member

karlp commented Jan 19, 2021

@manuelbl iirc, we're always acking, unless explicitly told to start nakking. it's an api problem/benefit depending on how you look at it. @dhoepfl can you try and wrap your long code in set_nak() clear_nak()?

@manuelbl

@karlp I don't agree. On STM32F1s, the ACKs/NAKs are tied to whether buffers are free or not. Whenever the USB peripheral has received a packet, RX_STAT automatically changes. All further communication is answered with NAK until RX_STAT is reset. The processor is not involved in this. It's all done autonomously by the USB peripheral.

The RX_STAT flag is reset when the packet is retrieved by the processor (call to usbd_ep_read_packet(), see https://github.com/libopencm3/libopencm3/blob/master/lib/stm32/common/st_usbfs_core.c#L230).

With double buffering and on DWC cores it's slightly more complex. But the fundamental concept stays the same: if the processor is stalled, no data is lost as all communication is answered with NAK (once the buffers are full).

I doubt that set_nak() and clear_nak() will make any difference.

@karlp
Member

karlp commented Jan 19, 2021

yes, but we read the packet, (clearing nak) to get the instruction to erase flash, so then we ack the next packet, but are stalling....

@manuelbl

Yes, but still no problem. The first packet is read, clearing NAK and starting the flash erase. A second packet is normally received by the USB peripheral and put into the shared memory (PMA), setting NAK, and all further packets are rejected with NAK until the flash erase is completed and the second packet has been retrieved. This is an automatic and reliable process without data loss.

If data is lost because the processor stalls, then there must be something else involved that hasn't been mentioned yet.
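The single-buffer flow control described in this comment can be sketched as a toy model. This is not libopencm3 code and not the real peripheral: the `toy_endpoint` type and the `toy_ep_*` functions are made up for illustration, and a one-byte payload stands in for the packet memory.

```c
#include <stdbool.h>
#include <stdint.h>

enum rx_status { RX_NAK, RX_VALID };
enum handshake { HS_ACK, HS_NAK };

/* Toy single-buffer OUT endpoint. */
struct toy_endpoint {
    enum rx_status stat;   /* models the RX status field */
    uint8_t buffered;      /* one-byte stand-in for the packet buffer */
    bool has_packet;
};

void toy_ep_init(struct toy_endpoint *ep)
{
    ep->stat = RX_VALID;
    ep->buffered = 0;
    ep->has_packet = false;
}

/* Peripheral side: the host sends a packet. No CPU involvement is modeled;
 * the endpoint flips itself to NAK as soon as its buffer is full. */
enum handshake toy_ep_host_out(struct toy_endpoint *ep, uint8_t payload)
{
    if (ep->stat != RX_VALID) {
        return HS_NAK;      /* buffer still occupied: host must retry */
    }
    ep->buffered = payload;
    ep->has_packet = true;
    ep->stat = RX_NAK;
    return HS_ACK;
}

/* Firmware side: retrieving the packet frees the buffer again. */
bool toy_ep_read_packet(struct toy_endpoint *ep, uint8_t *out)
{
    if (!ep->has_packet) {
        return false;
    }
    *out = ep->buffered;
    ep->has_packet = false;
    ep->stat = RX_VALID;
    return true;
}
```

In this model, a second `toy_ep_host_out` call before `toy_ep_read_packet` returns `HS_NAK` and leaves the buffered packet untouched, which is the "no data loss while the processor is stalled" argument made in this comment.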

@dhoepfl
Author

dhoepfl commented Jan 19, 2021

Sorry I cannot help a lot here, I noticed this about a year ago but never opened the issue.

I remember that Windows lost the mass storage device I wrote back then when I erased the flash right away. I solved it by copying the flash to RAM on startup (my “mass“ storage device is an 8 KB FAT filesystem) and writing it back to flash after unmount.
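The workaround described here (load flash into RAM at startup, serve all block accesses from RAM, write back after unmount) could look roughly like the sketch below. It is not the author's code: the `disk_*` names are hypothetical, `fake_flash` is a plain array standing in for the dedicated flash region, and the callback shapes are only modeled on the block read/write callbacks used by the MSC example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  512u
#define BLOCK_COUNT 16u   /* 16 * 512 bytes = 8 KB */

/* Stand-in for the dedicated flash region (assumption for host builds). */
static uint8_t fake_flash[BLOCK_COUNT * BLOCK_SIZE];
/* RAM copy that serves all USB block accesses. */
static uint8_t ram_disk[BLOCK_COUNT * BLOCK_SIZE];
static bool dirty;

/* Call once at startup: copy the persisted image into RAM. */
void disk_load(void)
{
    memcpy(ram_disk, fake_flash, sizeof ram_disk);
    dirty = false;
}

/* Returns 0 on success, nonzero if lba is out of range. */
int disk_read_block(uint32_t lba, uint8_t *copy_to)
{
    if (lba >= BLOCK_COUNT) {
        return 1;
    }
    memcpy(copy_to, &ram_disk[lba * BLOCK_SIZE], BLOCK_SIZE);
    return 0;
}

/* Only touches RAM, so it returns quickly; flash is not erased here. */
int disk_write_block(uint32_t lba, const uint8_t *copy_from)
{
    if (lba >= BLOCK_COUNT) {
        return 1;
    }
    memcpy(&ram_disk[lba * BLOCK_SIZE], copy_from, BLOCK_SIZE);
    dirty = true;
    return 0;
}

/* Call when no USB traffic is expected, e.g. after unmount. On real
 * hardware this is where the page erase and programming would happen.
 * Returns true if anything was written back. */
bool disk_flush(void)
{
    if (!dirty) {
        return false;
    }
    memcpy(fake_flash, ram_disk, sizeof fake_flash);
    dirty = false;
    return true;
}
```

The trade-off is that the whole image has to fit in RAM and that data written since the last flush is lost if power is removed before unmount.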

Maybe the problems I saw are related to the fact that I call usbd_poll from interrupts (NVIC_USB_WAKEUP_IRQ and NVIC_USB_LP_CAN_RX0_IRQ). This probably means that write_block (which in turn calls flash_erase_page) is called during USB interrupt handling. I’m not sure if I switched to IRQ based poll calls before or after getting the disconnects, I think it was afterwards. I’m quite sure I also tried to just set a flag and call flash_erase_page from the main loop but still saw the disconnects. It might just be a problem of the mass storage driver that does not like getting NAKs when it wants to send data to USB media.

Nonetheless I think it would be good to mention the bus stalling in the documentation of flash_erase_page even if USB should work, since this is a rather unexpected side effect (and, as far as I can tell, only mentioned in PM0068/PM0075 but not in the STM32F1 reference manual RM0008…). For example, a 1 ms systick interrupt is also not called 19 times because flash_erase_page … I guess unhandled repeated interrupts are dropped during the stall.

@karlp
Member

karlp commented Jan 20, 2021

I don't recommend calling usbd_poll from irq context, unless you're really sure you want all your handlers to be invoked from irq context too. There's a reason we don't do that in examples :)

We can try and document traps, but it's kinda endless. I would lean on saying that if you're erasing flash, you really should have a good understanding of the impact of that. The split of some of the information into the PMxxxx manuals is something that only happened on F1, so if you were using modern parts, you would have seen all the information in the ref manual anyway.

@tormodvolden

For DFU this shouldn't be a problem because it is handled in the DFU protocol: The device reports how long it will be busy during flashing/erasing, and the host waits before resuming communication.
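As an illustration of how the device reports its busy time, here is a sketch of the 6-byte DFU_GETSTATUS payload as laid out in the USB DFU 1.1 specification. The helper name `dfu_fill_getstatus` is hypothetical and this is not the libopencm3 DFU example code; the 20 ms figure in the usage note is simply the erase time quoted earlier in this thread.

```c
#include <stdint.h>

#define DFU_STATUS_OK    0x00u  /* bStatus: no error */
#define DFU_STATE_DNBUSY 0x04u  /* bState: dfuDNBUSY */

/* Fill the DFU_GETSTATUS response:
 *   byte 0     bStatus
 *   bytes 1-3  bwPollTimeout in ms, little-endian, 24 bits
 *   byte 4     bState
 *   byte 5     iString (0 = no status string descriptor) */
void dfu_fill_getstatus(uint8_t resp[6], uint8_t status,
                        uint32_t poll_timeout_ms, uint8_t state)
{
    resp[0] = status;
    resp[1] = (uint8_t)(poll_timeout_ms & 0xFFu);
    resp[2] = (uint8_t)((poll_timeout_ms >> 8) & 0xFFu);
    resp[3] = (uint8_t)((poll_timeout_ms >> 16) & 0xFFu);
    resp[4] = state;
    resp[5] = 0;
}
```

Calling `dfu_fill_getstatus(resp, DFU_STATUS_OK, 20, DFU_STATE_DNBUSY)` before starting a page erase would tell the host to wait at least 20 ms before sending the next request.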

@andy7v

andy7v commented Jan 17, 2023

I have exactly the same problem. Everything works fine with the RAM buffer, but not with real flash memory that has to be erased and written. The library requires immediate execution of the "write_block" function. Any delay disrupts normal operation. For testing purposes, I left only a simple dummy "for()" delay loop inside "write_block". If the number of delay cycles is more than 300-500, writing freezes. This behavior is independent of whether "usbd_poll()" is called from the while() loop or from USB_LP_CAN1_RX0_IRQHandler. If we look at usb_msc.c

    if (0 != (*ms->read_block)(lba, trans->msd_buf)) {
        /* Error */
    }

we see that the unsuccessful write is not handled.
It seems like the library doesn't send "I'm busy" to the host.
I tried to accumulate large blocks and then flush them to disk. This does not change anything. At some point there is a delay during the data flush, even if it happens inside the main loop and not in "write_block".
Has anyone got MSC working with real flash? All examples use a RAM buffer only.
