[NXDART] NuttX Distributed Automation for Build and Runtime Testing. #15730

Open
cederom opened this issue Jan 31, 2025 · 45 comments
@cederom
Contributor

cederom commented Jan 31, 2025

  • In order to maintain high quality of code releases we need not only Build but also Runtime Automated Testing on real-world hardware boards.
  • GitHub CI provides automated Pull Request verification (@lupyuen manages). TestBot is already operational and can be controlled with comment commands.
  • We need to test built code on physical devices in order to validate drivers etc.
  • The solution should be distributed, so anyone can test what they have at hand/home/work/desk.
  • It's called a Distributed Test Environment because it enables independent decentralized cross-check examination.
  • The distributed design provides availability redundancy.
  • The solution is designed to be zero-cost and low maintenance. Already tested on an rPI-Zero-2W, which draws 1W idle and 5W while building, with ~400 sec build time for a medium-size configuration (raspberrypi-pico:nsh).
  • A central Dashboard of some sort is required for build/runtime log storage, analysis, and reporting (@lupyuen manages). The NuttX Dashboard is already updated and serves TestBot for runtime testing on real-world hardware boards.
  • It will objectively indicate which change broke (or is going to break) what and where, so the project can react accordingly.
  • Use case: some old PC as a local CI machine, attach available board(s), build and runtime-test master, report problems to the Dashboard.
  • For now we are working on existing board configurations (nsh and ostest) and adapting test scenarios. Board configs need cleanup and standardization (i.e. :nsh only contains nsh, nothing else, and a standard set of built-ins).
  • All tests must return one of PASS, FAIL, TIMEOUT, ERROR, SKIP, UNAVAILABLE for every single test case, plus detailed logs.
    • PASS - test result as expected.
    • FAIL - test result not as expected.
    • ERROR - test execution failed, for any reason; test result is unknown.
    • TIMEOUT - operation did not finish within the available time limits.
    • UNAVAILABLE - test is unavailable due to hardware restrictions.
    • SKIP - test was not performed, but is on the test list.
  • Tests should define ASSERT / EXPECT conditions, so tests that are designed to fail are possible too (e.g. invalid parameter); see the sketch after this list.
  • Future: A list of external and internal tests needs to be created for all boards, plus board/configuration-specific tests. This can be implemented as a selftest board configuration. Every board should at least provide a mandatory base selftest, and optional selftest-extended, selftest-specific, selftest-custom, etc. (see below). This is future work.
  • Future: We need building blocks for base / extended / specific / custom test scenarios. Base tests will be mandatory and cover all boards (i.e. nsh + help + ostest). Extended tests may include benchmarks and stress tests; they will be optional (may not be possible on small platforms) but may provide additional results like performance improvement or degradation. Specific tests will be optional too and would cover arch/board-specific tests. There must be a way to implement Custom scenarios for closed testing of custom hardware etc.
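
Below is a minimal, hypothetical sketch of the result model described above; the TestResult/run_case names and the `case` object are illustrative assumptions, not an existing NXDART API:

```python
# Hypothetical sketch of the PASS/FAIL/ERROR/TIMEOUT/SKIP/UNAVAILABLE model;
# the `case` object and its methods are assumptions for illustration only.
from enum import Enum

class TestResult(Enum):
    PASS = 'PASS'                # test result as expected
    FAIL = 'FAIL'                # test ran, result not as expected
    ERROR = 'ERROR'              # test execution failed, result unknown
    TIMEOUT = 'TIMEOUT'          # did not finish within the time limit
    SKIP = 'SKIP'                # on the test list but not performed
    UNAVAILABLE = 'UNAVAILABLE'  # not possible due to hardware restrictions

def run_case(case, timeout_s=60):
    """Run one test case and map every outcome to exactly one TestResult."""
    if not case.hardware_available():
        return TestResult.UNAVAILABLE
    if case.skipped:
        return TestResult.SKIP
    try:
        # execute() evaluates the case's ASSERT/EXPECT conditions and
        # returns True when they hold (even for tests designed to fail).
        ok = case.execute(timeout=timeout_s)
    except TimeoutError:
        return TestResult.TIMEOUT
    except Exception:
        return TestResult.ERROR
    return TestResult.PASS if ok else TestResult.FAIL
```
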
@cederom cederom converted this from a draft issue Jan 31, 2025
@cederom cederom added the Type: Enhancement New feature or request label Jan 31, 2025
@cederom cederom self-assigned this Jan 31, 2025
@cederom cederom added Area: Testing and removed Type: Enhancement New feature or request labels Jan 31, 2025
@raiden00pl
Member

@cederom hi, do we need this long label DRUNX Distributed... ? Can we use here some standardized way of using labels? We already have Area: Qualification Tests, so maybe that will be enough? We don't want to have a mess in the labels, otherwise they don't fulfill their purpose :)

@cederom
Contributor Author

cederom commented Jan 31, 2025

@raiden00pl: @cederom hi, do we need this long label DRUNX Distributed... ? Can we use here some standardized way of using labels? We already have Area: Qualification Tests, so maybe that will be enough? We don't want to have a mess in the labels, otherwise they don't fulfill their purpose :)

Alright, sorry, just testing the projects :-) It displays well in other places. What do you propose? drunx: core, drunx: dashboard? Testing? Please update as you please; you will choose the best one :-)

@fdcavalcanti
Contributor

@cederom there are plenty of QEMU tests on the Espressif side, all pytest based.
If there is computing power available for running more QEMU tests, we could think of a way to integrate those tests into NuttX.
Had a brief talk about this with @lupyuen, so I'm tagging him here.

@cederom
Contributor Author

cederom commented Jan 31, 2025

@fdcavalcanti: @cederom there are plenty of QEMU tests on the Espressif side, all pytest based. If there is computing power available for running more QEMU tests, we could think of a way to integrate those tests into NuttX. Had a brief talk about this with @lupyuen, so I'm tagging him here.

Thank you @fdcavalcanti that would be great!!

Ultimately DRUNX will work on the home computers of interested users; @lupyuen already has a working prototype that simply runs the CI scripts that we run on GitHub. There is also a Dashboard that sums up the builds from various places. Ultimately there will be as much computing power as there are users testing :-)

PyTest is a well standardized choice :-)

Are those QEMU images available to fetch from some public repo or even build by hand?

What platforms are supported?

Do you have some sort of bootstrap for them?

ESP QEMU tests would perfectly fit the build and runtime testing in a situation where no hardware is available or necessary... maybe even GH CI if not too expensive to run :-)

How much space would it take to put them in nuttx-apps/testing?

I have created dedicated issue here: #15733 :-)

Thank you!! :-)

@raiden00pl raiden00pl added the Type: Enhancement New feature or request label Jan 31, 2025
@fdcavalcanti
Contributor

We use QEMU available for ESP32, ESP32S3 and ESP32C3. You can find them here.

All tests are Pytest based, but heavily use the pytest-embedded plugin, more specifically the pytest-embedded-nuttx plugin (supports NuttShell parsing, flashing, etc.).

Simple tests exist, like building, booting, and checking that nsh> is responsive, which is enough to know the build is at least working to that extent. This allows testing most defconfigs that do not require external hardware, WiFi, or BLE.
We also test bootloader combinations: we reuse all defconfigs and apply MCUboot instead of simple boot.
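
For illustration, a minimal sketch of what such an NSH sanity test could look like in the pytest-embedded style (assuming the plugin's usual `dut` fixture with expect()/write(); prompt strings and timeouts are placeholders):

```python
# Hypothetical NSH smoke test in the pytest-embedded style; assumes the
# standard `dut` fixture provided by the pytest-embedded-nuttx plugin.
def test_nsh_is_responsive(dut):
    dut.expect('nsh>', timeout=60)   # NuttShell prompt means the image booted
    dut.write('uname -a')            # run a trivial built-in command
    dut.expect('NuttX', timeout=10)  # OS banner confirms the shell is alive
```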

However, there are more complex tests that use customized applications similar to nuttx-apps. Most would be Espressif-related since they target our boards and don't follow some rules (like directly calling GPIO instead of using gpio.h when testing the motor driver, capture driver, etc.). That's something to be analyzed internally first.

I'm afraid the current NuttX CI image is not compatible. We use a Docker image that has the build system and all plugins required for the tests.

@cederom
Contributor Author

cederom commented Jan 31, 2025

Thank you @fdcavalcanti :-) Let's move the ESP QEMU discussion to the dedicated thread #15733 :-)

@cederom cederom changed the title [DRUNX] Distributed Runtime and bUild Test Farm for NuttX [DRUNX] Distributed Runtime and bUild for NuttX Test Environment Feb 2, 2025
@acassis
Contributor

acassis commented Feb 6, 2025

@cederom hi, do we need this long label DRUNX Distributed... ? Can we use here some standardized way of using labels? We already have Area: Qualification Tests, so maybe that will be enough? We don't want to have a mess in the labels, otherwise they don't fulfill their purpose :)

@cederom also, although you seem to stick to this "DRUNX" name, it seems to already be the name of another project: https://github.com/alxolr/drunx

I know you want to use a funny name that sounds like "drunk", but it is better to use a unique and more "serious" name. Think about the message we are sending to a company that wants to use NuttX!

@cederom
Contributor Author

cederom commented Feb 6, 2025

@acassis: @cederom also, although you seem to stick to this "DRUNX" name, it seems to already be the name of another project: https://github.com/alxolr/drunx

I know you want to use a funny name that sounds like "drunk", but it is better to use a unique and more "serious" name. Think about the message we are sending to a company that wants to use NuttX!

Thanks Alan :-) Just a working name until we figure out a better one, ideas welcome!! :-)

@lupyuen
Member

lupyuen commented Feb 6, 2025

Now testing our new PR Test Bot, which will Build and Test a PR on Real Hardware :-) (Oz64 SG2000 RISC-V SBC)

Image

@cederom
Contributor Author

cederom commented Feb 7, 2025

@lupyuen: Now testing our new PR Test Bot, which will Build and Test a PR on Real Hardware :-) (Oz64 SG2000 RISC-V SBC)

I saw it already commenting on PRs, wow, AMAZING WORK @lupyuen as always!! Big Thank You!! :-)

@tmedicci
Contributor

I had an idea to keep improving our CI (and lower our GitHub runners usage). Nowadays, we have a set of rules to build parallel jobs for specific target groups. Each target group usually contains jobs for the same arch (usually, sub-grouped by chip).

Instead of running these parallel jobs, what do you think about building only the citest defconfig for every eligible arch and making this a necessary step to run the other defconfigs?

We have previously talked about providing a citest (or ostest, or hwtest: a config that enables most of the features and makes it more suitable for testing on CI) for every supported board. By running this defconfig (for all eligible archs), we can avoid running the other parallel jobs for every single defconfig if the most basic (citest) fails.

@lupyuen , have you already tested something similar?

@acassis
Contributor

acassis commented Feb 21, 2025

@tmedicci this is a good idea; actually the CI should already do this in case the PR only touches the boards/ directory for some specific arch family. But this fine-grained tuning could be difficult when the PR touches some more common files; otherwise we avoid testing and later discover that some boards are broken (and "here comes the French guy again").

@tmedicci
Contributor

tmedicci commented Feb 21, 2025

Yes, it already selects only the archs touched by the PR. But there could still be a lot of defconfigs to build for a single arch. The same is valid for general changes (which trigger all defconfigs).

The main problem is that each target group job runs in parallel, so we can't cancel one job if another failed (we can only prevent a job from running if it hasn't started yet. Correct me if I'm wrong, @lupyuen ). By creating an "on-demand" target group with all the citest defconfigs touched by the PR and running it prior to anything else (note, this may potentially run builds from different archs if the PR touches them), we can test the most important case first and make it a dependency to trigger the other parallel jobs.

@lupyuen
Member

lupyuen commented Feb 21, 2025

We have previously talked about providing a citest (or ostest, or hwtest: a config that enables most of the features and make it more suitable for testing on CI) for every supported board.

@tmedicci I'm experimenting with rv-virt:knsh64_test as the Runtime Test for rv-virt:knsh64. You can see it here

Instead of running these parallel jobs, what do you think about building only the citest defconfig for every eligible arch and making this a necessary step to run the other defconfigs?

Isn't citest kinda slow? We won't know if any defconfigs have Build Errors until citest completes. Also our citest is a little wonky, it fails quite often according to NuttX Build History

Probably the best we can do for now: @nuttxpr test rv-virt:citest. Let our Test Bot run citest concurrently while GitHub CI is building all defconfigs.

@cederom
Contributor Author

cederom commented Feb 22, 2025

Yup, GitHub is too error / overload prone; we cannot really depend on it for anything beyond the absolute basics, so we are working with Lup on alternatives. Please follow @lupyuen's designs and prototypes that are already working, then report issues and improvements :-) Thanks @tmedicci :-)

@tmedicci
Contributor

Isn't citest kinda slow? We won't know if any defconfigs have Build Errors until citest completes. Also our citest is a little wonky, it fails quite often according to NuttX Build History

Oh, perhaps I wasn't that clear on my idea: I proposed to have a defconfig (not necessarily citest) that should be built before anything else. I didn't mean (at least not at this moment) to run runtime testing... the idea is to create more dependent workflows that, if they fail, prevent other workflows from running (saving GitHub runners from being wasted unnecessarily).

Let me provide a quick example: suppose a PR triggered all archs (because it's a general change, for instance). That would trigger a lot of parallel jobs for every arch. Instead of that, we should create an intermediary flow (target group) with a single defconfig for each board (citest, nsh, ostest, hwtest, it doesn't matter which now). If this target group fails to build, the other parallel jobs won't run. Instead of running 1500+ jobs (every defconfig from all boards), we run a single job that builds the same defconfig for all archs/chips/boards. If this fails, it doesn't make sense to continue testing everything else.
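
To make this concrete, here is a hypothetical sketch of how such an intermediary target group could be assembled, picking one preferred defconfig per board from the boards/<arch>/<chip>/<board>/configs layout (the preference order and the script itself are illustrative):

```python
# Hypothetical gate-stage selector: one basic defconfig per board, preferring
# citest, then nsh, then ostest. Output is a list of <board>:<config> targets.
from pathlib import Path

GATE_CANDIDATES = ('citest', 'nsh', 'ostest')  # illustrative preference order

def gate_configs(boards_root='boards'):
    for configs_dir in Path(boards_root).glob('*/*/*/configs'):
        board = configs_dir.parent.name
        available = {p.name for p in configs_dir.iterdir() if p.is_dir()}
        for candidate in GATE_CANDIDATES:
            if candidate in available:
                yield f'{board}:{candidate}'
                break

if __name__ == '__main__':
    for target in sorted(gate_configs()):
        print(target)
```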

@tmedicci
Contributor

Yup, GitHub is too error / overload prone; we cannot really depend on it for anything beyond the absolute basics, so we are working with Lup on alternatives. Please follow @lupyuen's designs and prototypes that are already working, then report issues and improvements :-) Thanks @tmedicci :-)

I'm reading 😉 , I just proposed this change here because it's more related to the way we organize our workflows than to our infrastructure. Let me know if there is a more suitable issue to discuss it.

@lupyuen
Member

lupyuen commented Feb 23, 2025

I proposed to have a defconfig (not necessarily citest) that should be built before anything else

@tmedicci Interesting idea! So someone needs to maintain the Initial Defconfig? (The one that builds before all others) I could do it for RISC-V. But are we able to find someone who can maintain the Initial Defconfigs for Arm32 and Arm64?

I think the challenge we face is that we don't have an Owner for Arm32 / Arm64. Changing the CI Flow will be tricky if we can't find a person who will agree to the New CI Flow and maintain it.

@tmedicci
Contributor

@tmedicci Interesting idea! So someone needs to maintain the Initial Defconfig? (The one that builds before all others) I could do it for RISC-V. But are we able to find someone who can maintain the Initial Defconfigs for Arm32 and Arm64?

We can ask for help or even create a defconfig that builds the ostest ourselves. For the first prototype, it'd be fair enough. In the end, we will test all the eligible defconfigs (as we do currently), but there will be this intermediary step that prevents building everything if the most basic defconfig fails. What do you think?

@cederom
Contributor Author

cederom commented Feb 24, 2025

We are working on the prototype and documentation right now with @lupyuen... still considering whether creating additional configurations leads to inconsistencies in the long term. I know this will make things faster, and I was pro this idea initially too, but maybe working on generic configurations is a better choice?

@tmedicci
Contributor

What do you mean by generic configurations?

@cederom
Contributor Author

cederom commented Feb 24, 2025

The ones that already exist. It may take more time to build separate configurations, but there will be no additional burden or compatibility problems. Anyway, this idea is nice, but it looks like we need some other stuff first, and we work in our free time surrounded by thousands of other tasks :\

@lupyuen already created some test_xxx configurations... but after some consideration I feel like this is just additional work with not much benefit?

@tmedicci
Contributor

The ones that already exist. It may take more time to build separate configurations, but there will be no additional burden or compatibility problems. Anyway, this idea is nice, but it looks like we need some other stuff first, and we work in our free time surrounded by thousands of other tasks :\

@lupyuen already created some test_xxx configurations... but after some consideration I feel like this is just additional work with not much benefit?

Well, let's quantify it: there are 342 boards and 1654 defconfigs. It can be even less than that: consider testing only a single board for each chip, that would be less than 110 defconfigs.

The idea is to test a single defconfig for each board (worst case) and, if successful, proceed to test the other defconfigs.

Currently, our CI triggers up to 32 parallel jobs to build the 1654 defconfigs. If one of these jobs fails, the others will continue running. By running only a single job to build 110 defconfigs, we can stop as soon as the first build fails. This would save a lot of GH runners usage.

I think it's a huge benefit!

@cederom
Contributor Author

cederom commented Feb 24, 2025

Sounds good, can you present a working prototype @tmedicci ? :-)

@tmedicci
Contributor

Just some additional thoughts about our CI organization. Although it's recommended to keep the distributed build farm, we should prevent as many bad commits as possible from being merged upstream. To do that, we need to test every single PR. This costs a lot, so we need to make it more efficient. How? By splitting the CI into more workflows that can fail early (and stop subsequent jobs).

First, we build the most complete defconfig for each chip (or board). Then, we test it (runtime testing). After that, we can continue and build all the other defconfigs (and, eventually, test some of these configs on QEMU and/or real HW).

Let's use our GH runners to build the firmware and run the QEMU testing. If QEMU testing is successful, we can even use self-hosted runners to test the HW (see, the security concerns here are mitigated as it'd only be tested after QEMU).

I created a simple diagram about what I think should be our optimal CI in the future:

Image

It doesn't matter that much if it takes 3 or more hours to run the complete CI as long as it fails as soon as it detects a failure. Is this possible? I don't know, we have to take it step by step. My current proposal is implementing the following:

Image

And evaluate how much GH runner's usage we'd save by doing that...
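
As a rough illustration of the fail-fast behaviour sketched in the diagrams, here is a hypothetical local driver that builds a gating stage before anything else and aborts on the first failure (tools/configure.sh and make are the usual NuttX build entry points; the stage contents are made up):

```python
# Hypothetical staged build driver: stop everything as soon as a gate target fails.
import subprocess
import sys

STAGES = {
    'gate': ['rv-virt:nsh', 'raspberrypi-pico:nsh'],     # one basic config per board
    'full': ['rv-virt:knsh64', 'raspberrypi-pico:smp'],  # remaining defconfigs
}

def build(target):
    """Configure and build one <board>:<config> target from a clean tree."""
    subprocess.run(['make', 'distclean'])  # may be a no-op on a pristine tree
    ok = subprocess.run(['./tools/configure.sh', target]).returncode == 0
    return ok and subprocess.run(['make', '-j']).returncode == 0

for stage, targets in STAGES.items():
    for target in targets:
        if not build(target):
            sys.exit(f'{stage} stage: {target} failed to build, aborting later stages')
```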

@tmedicci
Contributor

Sounds good, can you present a working prototype @tmedicci ? :-)

Yes, I think so. It'd take a while to get used to GH workflows but I'll work on that...

@cederom
Contributor Author

cederom commented Feb 24, 2025

@tmedicci: Just some additional thoughts about our CI organization. Although it's recommended to keep the distributed build farm, we should prevent as many bad commits as possible from being merged upstream. To do that, we need to test every single PR. This costs a lot, so we need to make it more efficient. How? By splitting the CI into more workflows that can fail early (and stop subsequent jobs).

First, we build the most complete defconfig for each chip (or board). Then, we test it (runtime testing). After that, we can continue and build all the other defconfigs (and, eventually, test some of these configs on QEMU and/or real HW).

Let's use our GH runners to build the firmware and run the QEMU testing. If QEMU testing is successful, we can even use self-hosted runners to test the HW (see, the security concerns here are mitigated as it'd only be tested after QEMU).

I created a simple diagram about what I think should be our optimal CI in the future:

Image

It doesn't matter that much if it takes 3 or more hours to run the complete CI as long as it fails as soon as it detects a failure. Is this possible? I don't know, we have to take it step by step. My current proposal is implementing the following:

Image

And evaluate how much GH runner's usage we'd save by doing that...

Sounds cool :-) Go for it in sync with @lupyuen as he knows most about the current CI setup and functions :-) The sooner an error can be detected and CI does not consume resources unnecessarily, the better; we almost lost CI on GitHub due to overuse, @lupyuen saved us from the doom. Also, each push to the PR triggers all of the builds again and again... so yes, catching errors as soon as possible is more than welcome :-)

Btw what tool do you use to make those nice charts @tmedicci ? :-)

@tmedicci
Contributor

Btw what tool do you use to make those nice charts @tmedicci ? :-)

I used draw.io ;)

@tmedicci
Contributor

@tmedicci: Just some additional thoughts about our CI organization. Although it's recommended to keep the distributed build farm, we should prevent as many bad commits as possible from being merged upstream. To do that, we need to test every single PR. This costs a lot, so we need to make it more efficient. How? By splitting the CI into more workflows that can fail early (and stop subsequent jobs).
First, we build the most complete defconfig for each chip (or board). Then, we test it (runtime testing). After that, we can continue and build all the other defconfigs (and, eventually, test some of these configs on QEMU and/or real HW).
Let's use our GH runners to build the firmware and run the QEMU testing. If QEMU testing is successful, we can even use self-hosted runners to test the HW (see, the security concerns here are mitigated as it'd only be tested after QEMU).
I created a simple diagram about what I think should be our optimal CI in the future:
Image
It doesn't matter that much if it takes 3 or more hours to run the complete CI as long as it fails as soon as it detects a failure. Is this possible? I don't know, we have to make it step-by-step. My current proposal is implementing the following:
Image
And evaluate how much GH runner's usage we'd save by doing that...

Sounds cool :-) Go for it in sync with @lupyuen as he knows most about the current CI setup and functions :-) The sooner an error can be detected and CI does not consume resources unnecessarily, the better; we almost lost CI on GitHub due to overuse, @lupyuen saved us from the doom. Also, each push to the PR triggers all of the builds again and again... so yes, catching errors as soon as possible is more than welcome :-)

Btw what tool do you use to make those nice charts @tmedicci ? :-)

@lupyuen before making a prototype, what do you think about it? Are these major goals achievable? Any considerations about it?

I really would like to have self-hosted GH runners for HW testing (not building) in the future (considering that we have the code tested on QEMU)

@lupyuen
Member

lupyuen commented Feb 25, 2025

@tmedicci Sorry, I thought we were talking about an Initial Defconfig, instead of an Initial CI Test? rv-virt:citest has problems today:

@tmedicci
Contributor

Hi @lupyuen , just commenting on it:

It fails to execute runtime testing (the LTP). Although it's important, we don't need to test it on every single board. My idea is 1) to decouple build testing and runtime testing (which solves the problem of being stuck) and 2) run the most basic tests (ostest, free, mm, etc.)

We can run LTP only on sim, for instance. This could be a parallel job.

Yes, this is important, but we don't need to test every single defconfig of a board on QEMU. Do we have an issue to track this problem?

The whole idea of creating intermediary steps is to lower CI usage. If the previous workflow failed, we don't need to run the subsequent workflows. Considering a PR that triggers all target groups, by testing some pre-defined set of defconfigs for all archs, we would end up building 100+ configs first (and, if they build successfully, test the others) instead of 1600+ in parallel. The overall usage is expected to go down and we can use this for running QEMU testing, for instance.

Do we need a security team for sure? The idea is to run QEMU testing on GH runners. Self-hosted runners would only run tests on real hardware after the previous steps finished successfully. These HW tests would be restricted to some pre-defined defconfig (citest or hwtest) and known apps (like ostest). The firmware would have already been tested on sim and QEMU (and finished successfully). Perhaps we can use the "review deployments" feature to trigger the HW testing manually.
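
For reference, a hypothetical QEMU smoke test for something like rv-virt, driven with pexpect; the exact qemu-system-riscv64 flags and expected strings depend on the defconfig and are only illustrative:

```python
# Hypothetical rv-virt smoke test under QEMU; flags and strings are examples only.
import pexpect

child = pexpect.spawn(
    'qemu-system-riscv64 -M virt -cpu rv64 -bios none -kernel nuttx -nographic',
    timeout=60, encoding='utf-8')
child.expect('nsh>')        # the NuttShell prompt means the image booted
child.sendline('uname -a')  # trivial runtime check
child.expect('NuttX')       # OS banner confirms the shell responds
child.sendline('poweroff')  # assumes the config enables the poweroff command
child.expect(pexpect.EOF)
```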

@jerpelea
Contributor

Hi @lupyuen , just commenting on it:

It fails to execute runtime testing (the LTP). Although it's important, we don't need to test it on every single board. My idea is 1) to decouple build testing and runtime testing (which solves the problem of being stuck) and 2) run the most basic tests (ostest, free, mm, etc.)

We can run LTP only on sim, for instance. This could be a parallel job.

Yes, this is important, but we don't need to test every single defconfig of a board on QEMU. Do we have an issue to track this problem?

The whole idea of creating intermediary steps is to lower CI usage. If the previous workflow failed, we don't need to run the subsequent workflows. Considering a PR that triggers all target groups, by testing some pre-defined set of defconfigs for all archs, we would end up building 100+ configs first (and, if they build successfully, test the others) instead of 1600+ in parallel. The overall usage is expected to go down and we can use this for running QEMU testing, for instance.

Do we need a security team for sure? The idea is to run QEMU testing on GH runners. Self-hosted runners would only run tests on real hardware after the previous steps finished successfully. These HW tests would be restricted to some pre-defined defconfig (citest or hwtest) and known apps (like ostest). The firmware would have already been tested on sim and QEMU (and finished successfully). Perhaps we can use the "review deployments" feature to trigger the HW testing manually.

I think that we should start with the nsh config for all boards; if nsh fails to build there is no need to proceed with the others

@tmedicci
Contributor

I think that we should start with the nsh config for all boards; if nsh fails to build there is no need to proceed with the others

Fair enough. Simpler and still efficient.

@lupyuen
Member

lupyuen commented Feb 25, 2025

@simbit18 Wonder if you have any thoughts about this? One of us will have to implement this, it might get messy 🤔

@simbit18
Contributor

A reminder for everyone, and for me too :)

The inviolable principles of NuttX
All Users Matter

All support must apply equally to all supported platforms. At present this includes Linux, Windows MSYS, Windows Cygwin, Windows Ubuntu, Windows native, macOS, Solaris, and FreeBSD. No tool/environment solutions will be considered that limit the usage of NuttX on any of the supported platforms.

Inclusive rather than exclusive.

Hobbyists are valued users of the OS including retro computing hobbyists and DIY “Maker” hobbyists.

Supported toolchains: GCC, Clang, SDCC, ZiLOG ZDS-II (c89), IAR. Others?

No changes to build system should limit use of NuttX by any user.

Simplifying things for one user does not justify excluding another user.

We should seek to expand the NuttX user base, not to limit it for reasons of preference or priority.

We must resist the pull to make NuttX into a Linux-only, GCC-only, and ARM-only solution.

https://nuttx.apache.org/docs/latest/introduction/inviolables.html#all-users-matter

So the system should also be available for all supported platforms, which is not trivial.

@lupyuen Unfortunately, I have no idea of a simple solution.

@lupyuen
Member

lupyuen commented Feb 25, 2025

Thanks @simbit18! I'm keen to hear your thoughts about this revamped CI Flow:

I think that we should start with the nsh config for all boards; if nsh fails to build there is no need to proceed with the others

Also I wonder: If you and I are not keen to maintain NuttX CI for the long term, who would be the right people to do this? Hmmm...

@tmedicci
Contributor

Thanks @simbit18! I'm keen to hear your thoughts about this revamped CI Flow:

I think that we should start with the nsh config for all boards; if nsh fails to build there is no need to proceed with the others

Also I wonder: If you and I are not keen to maintain NuttX CI for the long term, who would be the right people to do this? Hmmm...

I don't expect it to be a single-person-only job... NuttX is growing, and we need to make our CI more efficient, able to test more platforms before something is merged upstream. I can help, but I'll need help too. This is the kind of task that should be on our roadmap for NuttX.

@cederom
Contributor Author

cederom commented Feb 25, 2025

@jerpelea: I think that we should start with the nsh config for all boards; if nsh fails to build there is no need to proceed with the others

Yup, the nsh config for sure needs a cleanup and standardization in terms of built-ins... for now various other applications are part of the :nsh config on some boards; for instance raspberrypi-pico:nsh also has ostest and getprime in its :nsh config :D

Regarding dedicated configurations for tests, we were thinking about this with @lupyuen and it turns out that right now it would be best and least invasive to adapt test scenarios to existing configurations and clean up those configurations (i.e. the :nsh mentioned above). Then, when we have clean configurations and working test scripts, we could add more new stuff. Otherwise we will add more unclean stuff on top of already unclean stuff :-P

I will update the assumptions at the top now :-)

@lupyuen
Member

lupyuen commented Feb 25, 2025

I think that we should start with the nsh config for all boards; if nsh fails to build there is no need to proceed with the others

Might be good to do some analysis of our Current GitHub Usage, something like this, so we have some idea how much GitHub Usage we will reduce.
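
For example, a rough sketch of pulling recent run durations from the GitHub Actions REST API ("list workflow runs" plus the per-run timing endpoint); the repository name, token handling, and 20-run window are assumptions:

```python
# Hypothetical GitHub Actions usage estimate via the public REST API.
import os
import requests

REPO = 'apache/nuttx'  # assumed repository
HEADERS = {'Authorization': f"Bearer {os.environ['GITHUB_TOKEN']}",
           'Accept': 'application/vnd.github+json'}

runs = requests.get(f'https://api.github.com/repos/{REPO}/actions/runs',
                    headers=HEADERS, params={'per_page': 20}).json()['workflow_runs']

total_ms = 0
for run in runs:
    timing = requests.get(f"{run['url']}/timing", headers=HEADERS).json()
    total_ms += timing.get('run_duration_ms', 0)

print(f"Last {len(runs)} runs took {total_ms / 3.6e6:.1f} hours of run time")
```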

@cederom
Contributor Author

cederom commented Feb 25, 2025

@acassis suggested DRUNX is not a serious name. @lupyuen suggested DART (Distributed Automated build and Runtime), and I think NXDART could be better than DRUNX? :-)

@simbit18
Contributor

simbit18 commented Feb 25, 2025

Thanks @simbit18! I'm keen to hear your thoughts about this revamped CI Flow:

I think that we should start with the nsh config for all boards; if nsh fails to build there is no need to proceed with the others

Also I wonder: If you and I are not keen to maintain NuttX CI for the long term, who would be the right people to do this? Hmmm...

@lupyuen As a first check, it could be useful for boards with many configurations. As @cederom rightly observes, however, all nsh configurations must be made standard. The important thing is that cleaning the :nsh config does not increase the number of configurations!!!

@acassis
Contributor

acassis commented Feb 25, 2025

@cederom yes NXDART is nice and probably is unique! (I found nx-dart, but not nxdart)

@cederom
Contributor Author

cederom commented Feb 25, 2025

So do we change the codename to NXDART? :-)

@raiden00pl
Member

@simbit18 clearing the nsh config of other options by itself won't increase the number of configs. But... it may reduce CI coverage, and if we want to keep the same CI coverage we have to create new configs for that. So it's not that easy.
As a first step it would be useful to standardize the names of configurations and decide what should be in them. The current solution, where we create an infinite number of configurations to test even trivial things, is not the best from a CI point of view. But on the other hand it's good for users.

@fdcavalcanti
Contributor

Testing NSH makes sense from a sanity-test standpoint. Building NSH should be treated as a smoke test: if it can't even build, go back to development. Then, as a next step, running the NSH build in QEMU and getting to the NuttShell should be enough for a sanity test.

For more board testing, one thing we can do is have a directory ./tests/boards/<arch>/<board>/configs/test_<whatever>. This would put the test defconfigs in a separate place. In fact, having a tests directory that duplicates the folder structure of the main project is common because it provides an easy view for unit tests; however, I have not seen it in embedded.

@cederom cederom changed the title [DRUNX] Distributed Runtime and bUild for NuttX Test Environment [NXDART] NuttX Distributed Automation for Build and Runtime Testing. Feb 28, 2025