[NXDART] NuttX Distributed Automation for Build and Runtime Testing. #15730
Comments
@cederom hi, do we need this long label? |
Alright, sorry, just testing the projects :-) It displays well in other places. What do you propose? |
Thank you @fdcavalcanti that would be great!! Ultimately DRUNX will work on home computers of interested users; @lupyuen already has a working prototype that simply runs the CI scripts that we run on GitHub. There is also a Dashboard that sums up the builds from various places. Ultimately there will be as much computing power as there are users testing :-) PyTest is a well standardized choice :-) Are those QEMU images available to fetch from some public repo, or even to build by hand? What platforms are supported? Do you have some sort of bootstrap for them? ESP QEMU tests would perfectly fit the build and runtime testing in situations where no hardware is available or necessary.. maybe even GH CI if not too expensive to run :-) How much space would it take to include them? I have created a dedicated issue here: #15733 :-) Thank you!! :-) |
We use QEMU available for ESP32, ESP32S3 and ESP32C3. You can find them here. All tests are Pytest based, but heavily use the pytest-embedded plugin. Simple tests cover building, booting and basic runtime checks. However, there are more complex tests that use some customized applications. I'm afraid the current NuttX CI image is not compatible: we use a Docker image that has the build system and all plugins required for the tests. |
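For readers who haven't used pytest-embedded, here is a minimal sketch of what such a QEMU boot test can look like. This is not the actual test suite linked above: the dut fixture comes from the pytest-embedded plugin, while the invocation flags and the expected banner strings are assumptions.

```python
# Minimal sketch (not the real Espressif/NuttX test suite): boot NuttX under QEMU
# via pytest-embedded and check that NSH comes up and ostest finishes.
# Assumed invocation (flag names may differ per pytest-embedded version):
#   pytest test_nsh.py --embedded-services qemu --qemu-image-path nuttx.bin


def test_nsh_prompt(dut):
    # "dut" is the device-under-test fixture provided by pytest-embedded.
    dut.expect('nsh>', timeout=60)


def test_ostest(dut):
    dut.expect('nsh>', timeout=60)
    dut.write('ostest')
    # The completion banner is an assumption; match it to the real ostest output.
    dut.expect('Exiting with status', timeout=600)
```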
Thank you @fdcavalcanti :-) Let's move the ESP QEMU discussion to the dedicated thread #15733 :-) |
@cederom also, although you seem to stick to this "DRUNX" name, it is already the name of another project: https://github.com/alxolr/drunx I know you want to use a funny name that sounds like "drunk", but it is better to use a unique and more "serious" name. Think about the message we are sending to a company that wants to use NuttX! |
Thanks Alan :-) Just a working codename until we figure out a better name, ideas welcome!! :-) |
Now testing our new PR Test Bot, which will Build and Test a PR on Real Hardware :-) (Oz64 SG2000 RISC-V SBC) |
I saw it already commenting on PRs, wow, AMAZING WORK @lupyuen as always!! Big Thank You!! :-) |
I had an idea to keep improving our CI (and lower our GitHub runner usage). Nowadays, we have a set of rules to build parallel jobs for specific target groups. Each target group usually contains jobs for the same arch (usually sub-grouped by chip). Instead of running all these parallel jobs, what do you think about building only an initial defconfig per board first? We have previously talked about providing something like that. @lupyuen , have you already tested something similar? |
@tmedicci this is a good idea; actually the CI should do it already in case the PR only touches the boards/ directory for some specific arch family. But this fine-grained tuning could be difficult when the PR touches some more common files; otherwise we avoid testing and later discover that some boards are broken (and "here comes the french guy again"). |
Yes, it already selects only the archs touched by the PR. But there could still be a lot of defconfigs to build for one single arch. The same is valid for general changes (which trigger all defconfigs). The main problem is that each target group job runs in parallel, so we can't cancel one job if another failed (we can only prevent another job from running if it hasn't started yet. Correct me if I'm wrong, @lupyuen ). By creating an "on-demand" target group with all the |
@tmedicci I'm experimenting with
Isn't Probably the best we can do for now: |
Oh, perhaps I wasn't that clear on my idea: I proposed to have a single defconfig per board built first (not necessarily an existing one). Let me provide a quick example: suppose a PR triggers all archs (because it's a general change, for instance). That would trigger a lot of parallel jobs for every arch. Instead of that, we should create an intermediary flow (target group) with a single defconfig for each board. |
I'm reading 😉 , I just proposed this change here because it's more related to the way we organize our workflows than to our infrastructure. Let me know if there is a more suitable issue to discuss it in. |
@tmedicci Interesting idea! So someone needs to maintain the Initial Defconfig? (The one that builds before all others) I could do it for RISC-V. But are we able to find someone who can maintain the Initial Defconfigs for Arm32 and Arm64? I think the challenge we face is that we don't have an Owner for Arm32 / Arm64. Changing the CI Flow will be tricky if we can't find a person who will agree to the New CI Flow and maintain it. |
We can ask for help or even create ourselves a defconfig that builds the |
We are working on the prototype and documentation right now with @lupyuen... still considering if creating additional configurations would lead to inconsistencies in the long term. I know this will make things faster, and I was pro this idea initially too, but maybe working on generic configurations is a better choice? |
What do you mean by generic configurations? |
The ones that already exist. It may take more time to build separate configurations, but there will be no additional burden or compatibility problems. Anyway, this idea is nice, but it looks like we need some other stuff first, and we work in our free time surrounded by thousands of other tasks :\ @lupyuen already created some |
Well, let's quantify it: there are 342 boards and 1654 defconfigs. It can be even less than that: if we test only a single board for each chip, that would be fewer than 110 defconfigs. The idea is to test a single defconfig for each board (worst case) and, if successful, proceed to test the other defconfigs. Currently, our CI triggers up to 32 parallel jobs to build the 1654 defconfigs. If one of these jobs fails, the others will continue running. By running only a single job to build 110 defconfigs, we can stop as soon as the first build fails. This would save a lot of GH runner usage. I think it's a huge benefit! |
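To make the numbers above concrete, here is a rough, hypothetical selection script (nothing like it exists in the CI today; the checkout path and the nsh-first policy are assumptions taken from this thread): it walks the boards/ tree of a nuttx checkout and picks one defconfig per board, which a first-stage job could build before fanning out to the remaining configurations.

```python
# Hypothetical helper (not part of the current CI): pick one "initial" defconfig
# per board so a first-stage job can build ~340 targets and gate the other ~1650.
from pathlib import Path

NUTTX = Path('nuttx')  # path to a nuttx checkout -- an assumption for this sketch


def initial_defconfigs():
    targets = []
    # Board layout: boards/<arch>/<chip>/<board>/configs/<config>/defconfig
    for configs in sorted(NUTTX.glob('boards/*/*/*/configs')):
        board = configs.parent.name
        names = sorted(p.name for p in configs.iterdir() if (p / 'defconfig').exists())
        if not names:
            continue
        # Prefer a plain nsh config; otherwise fall back to the first one found.
        pick = 'nsh' if 'nsh' in names else names[0]
        targets.append(f'{board}:{pick}')
    return targets


if __name__ == '__main__':
    targets = initial_defconfigs()
    print(f'{len(targets)} initial targets, e.g. {targets[:3]}')
```

The board:config output format matches what tools/configure.sh already accepts, so a gating job could feed it straight into the existing build scripts.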
Sounds good, can you present a working prototype @tmedicci ? :-) |
Just some additional thoughts about our CI organization. Although it's recommended to keep the distributed build farm, we should prevent as many bad commits as possible from being merged upstream. To do that, we need to test every single PR. This costs a lot, so we need to make it more efficient. How? By splitting the CI into more workflows that can fail early (and stop subsequent jobs). First, we build the most complete defconfig. Let's use our GH runners to build the firmware and run the QEMU testing. If QEMU testing is successful, we can even use self-hosted runners to test the HW (see, the security concerns here are mitigated as it'd only be tested after QEMU). I created a simple diagram about what I think should be our optimal CI in the future. It doesn't matter that much if it takes 3 or more hours to run the complete CI, as long as it fails as soon as it detects a failure. Is this possible? I don't know, we have to take it step by step. My current proposal is implementing the following, and then evaluating how much GH runner usage we'd save by doing that... |
Yes, I think so. It'd take a while to get used to GH workflows but I'll work on that... |
Sounds cool :-) Go for it in sync with @lupyuen, as he knows the most about the current CI setup and functions :-) The sooner errors can be detected and the less the CI consumes resources unnecessarily, the better; we almost lost CI on GitHub due to overuse, and @lupyuen saved us from doom. Also, each push to the PR triggers all of the builds again and again.. so yes, catching errors as soon as possible is more than welcome :-) Btw what tool did you use to make those nice charts @tmedicci ? :-) |
@lupyuen before making a prototype, what do you think about it? Are these major goals achievable? Any considerations about it? I really would like to have self-hosted GH runners for HW testing (not building) in the future (considering that we have the code tested on QEMU) |
@tmedicci Sorry, I thought we were talking about an Initial Defconfig, instead of an Initial CI Test? |
Hi @lupyuen , just commenting on it:
It fails to execute runtime testing (the LTP). Although it's important, we don't need to test it on every single board. My idea is 1) to decouple build testing and runtime testing (which solves the problem of being stuck) and 2) run the most basic tests (
We can run LTP only on sim, for instance. This could be a parallel job.
Yes, this is important, but we don't need to test every single
The whole idea of creating intermediary steps is to lower CI usage. If the previous workflow failed, we don't need to run the subsequent workflows. Considering a PR that triggers all target groups: by testing a pre-defined set of defconfigs for all archs, we would end up building 100+ configs first (and, only if they build successfully, testing the others) instead of 1600+ in parallel. The overall usage is expected to drop, and we can use the savings for running QEMU testing, for instance.
Do we need a security team for sure? The idea is to run QEMU testing on GH runners. Self-hosted runners would only run tests on real hardware after previous steps finished successfully. These HW tests would be restricted to some pre-defined defconfig |
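On the point above about running the most basic runtime tests (e.g. on sim) as a parallel job, a smoke test needs very little code. A rough sketch, assuming a sim build whose ./nuttx binary drops into NSH on stdio and has ostest built in (both are assumptions about the config used):

```python
# Rough sketch: drive a NuttX sim build interactively and run ostest.
# Assumes ./nuttx is a sim binary with NSH on stdio and ostest as a built-in app.
import pexpect

child = pexpect.spawn('./nuttx', timeout=600, encoding='utf-8')
child.expect('nsh>')
child.sendline('ostest')
# The completion banner below is an assumption; match it to the real ostest output.
child.expect('Exiting with status')
child.expect('nsh>')
child.terminate(force=True)
```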
I think that we should start with the nsh config for all boards; if nsh fails to build, there is no need to proceed with the others |
Fair enough. Simpler and still efficient. |
@simbit18 Wonder if you have any thoughts about this? One of us will have to implement this, it might get messy 🤔 |
Remind everyone and me too :) The inviolable principles of NuttX:
- All support must apply equally to all supported platforms. At present this includes Linux, Windows MSYS, Windows Cygwin, Windows Ubuntu, Windows native, macOS, Solaris, and FreeBSD. No tool/environment solutions will be considered that limit the usage of NuttX on any of the supported platforms.
- Inclusive rather than exclusive. Hobbyists are valued users of the OS, including retro computing hobbyists and DIY "Maker" hobbyists. Supported toolchains: GCC, Clang, SDCC, ZiLOG ZDS-II (c89), IAR. Others?
- No changes to the build system should limit use of NuttX by any user. Simplifying things for one user does not justify excluding another user.
- We should seek to expand the NuttX user base, not to limit it for reasons of preference or priority. We must resist the pull to make NuttX into a Linux-only, GCC-only, and ARM-only solution.
https://nuttx.apache.org/docs/latest/introduction/inviolables.html#all-users-matter
So the system should also be available for all supported platforms, which is not trivial. @lupyuen Unfortunately, I have no idea of a simple solution. |
Thanks @simbit18! I'm keen to hear your thoughts about this revamped CI Flow:
Also I wonder: If you and I are not keen to maintain NuttX CI for the long term, who would be the right people to do this? Hmmm... |
I don't expect it to be a single-person-only job... NuttX is growing, and we need to make our CI more efficient, able to test more platforms before something is merged upstream. I can help, but I'll need help too. This is the kind of task that should be on our roadmap for NuttX. |
Yup. Regarding dedicated configurations for tests, we were thinking about this with @lupyuen and it turns out that right now it would be best and least invasive to adapt test scenarios to existing configurations and clean up those configurations (i.e. the ones mentioned above). I will update the assumptions at the top now :-) |
Might be good to do some analysis on our Current GitHub Usage, something like this, so we have some idea of how much GitHub Usage we will reduce. |
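In case it helps with that analysis, here is a hedged sketch of pulling the raw numbers from the GitHub REST API (only an illustration, not the tool linked above): it lists recent completed workflow runs and sums per-job durations to approximate runner-minutes. The /actions/runs and /actions/runs/{id}/jobs endpoints are the documented ones; GITHUB_TOKEN is assumed to be set in the environment.

```python
# Approximate GitHub Actions runner-minutes for recent apache/nuttx workflow runs.
from datetime import datetime
import os

import requests

REPO = 'apache/nuttx'
API = f'https://api.github.com/repos/{REPO}/actions'
HEADERS = {'Authorization': f"Bearer {os.environ['GITHUB_TOKEN']}",
           'Accept': 'application/vnd.github+json'}


def parse(ts):
    return datetime.strptime(ts, '%Y-%m-%dT%H:%M:%SZ')


def runner_minutes(per_page=20):
    runs = requests.get(f'{API}/runs', headers=HEADERS,
                        params={'per_page': per_page, 'status': 'completed'}
                        ).json()['workflow_runs']
    total = 0.0
    for run in runs:
        jobs = requests.get(f"{API}/runs/{run['id']}/jobs", headers=HEADERS,
                            params={'per_page': 100}).json()['jobs']
        for job in jobs:
            if job.get('started_at') and job.get('completed_at'):
                total += (parse(job['completed_at'])
                          - parse(job['started_at'])).total_seconds() / 60
    return total


if __name__ == '__main__':
    print(f'~{runner_minutes():.0f} runner-minutes over the last 20 completed runs')
```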
@lupyuen As a first check, it could be useful for boards with many configurations. As @cederom rightly observes, however, all nsh configurations must be made standard. The important thing is that cleaning the :nsh config does not increase the number of configurations!!! |
@cederom yes NXDART is nice and probably is unique! (I found nx-dart, but not nxdart) |
So we change codename to NXDART ? :-) |
@simbit18 clearing nsh config from other options by itself won't increase the number of configs. But... it may reduce the CI coverage and if we want to keep the same CI coverage, we have to create new configs for that. So it's not that easy. |
Testing NSH makes sense from a sanity test standpoint. Building NSH should be treated as a smoke test: if you can't even build it, go back to development. Then, as a next step, running the NSH build in QEMU and getting to the NuttShell should be enough for a sanity test. For more board testing, one thing we can do is have a directory |
Assumptions:
- Use existing board configurations (i.e. raspberrypi-pico:nsh).
- Use nsh and ostest and adapt test scenarios to them. Board configs need cleanup and standardization (i.e. :nsh contains only nsh, nothing else, plus a standard set of built-ins).
- PASS, FAIL, TIMEOUT, ERROR, SKIP, UNAVAILABLE results for every single test case, plus detailed logs (a possible record layout is sketched after this list).
- ASSERT / EXPECT conditions, so tests that are designed to fail are possible too (i.e. invalid parameter).
- A selftest board configuration. Every board should at least provide a mandatory base selftest, and optional selftest-extended, selftest-specific, selftest-custom, etc. (see below). This is a song of the future.
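As referenced in the results item above, here is a purely illustrative sketch of a per-test-case result record covering those outcome categories plus the designed-to-fail case. Nothing like this exists in the repo yet; all names are hypothetical.

```python
# Hypothetical per-test-case result record -- names are illustrative only.
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = 'PASS'
    FAIL = 'FAIL'
    TIMEOUT = 'TIMEOUT'
    ERROR = 'ERROR'
    SKIP = 'SKIP'
    UNAVAILABLE = 'UNAVAILABLE'


@dataclass
class TestCaseResult:
    board: str                      # e.g. 'raspberrypi-pico'
    config: str                     # e.g. 'nsh' or a future 'selftest'
    case: str                       # test case name, e.g. 'ostest'
    outcome: Outcome
    expected_failure: bool = False  # ASSERT/EXPECT tests designed to fail
    log: str = ''                   # detailed log for this single test case


# Example usage:
result = TestCaseResult('raspberrypi-pico', 'nsh', 'ostest', Outcome.PASS)
```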