Skip to content

armbian-firstlogin: power-loss recovery and atomic writes#9954

Open
mingzhangqun wants to merge 3 commits into
armbian:mainfrom
Seeed-Studio:fix/firstlogin-power-loss-recovery
Open

armbian-firstlogin: power-loss recovery and atomic writes#9954
mingzhangqun wants to merge 3 commits into
armbian:mainfrom
Seeed-Studio:fix/firstlogin-power-loss-recovery

Conversation

@mingzhangqun

@mingzhangqun mingzhangqun commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add power-loss recovery: if firstlogin is interrupted (power loss), re-running detects existing user and skips prompts
  • Replace non-atomic sed -i with atomic write pattern (sed to temp file → syncmv) for critical config files (sudoers, sshd_config, locale.gen, netplan)
  • Add chmod --reference to preserve original file permissions in atomic writes (per feedback on armbian-firstrun: use atomic write for armbianEnv.txt MAC randomization #9937)
  • Move .not_logged_in_yet deletion to end of script so any interrupted step causes re-run
  • Add sudoers corruption detection and recovery from power loss
  • Make locale exports idempotent (avoid duplicate entries on re-run)
  • Fix logoutexit 0 (correct for subshell context)
  • Fix unquoted $(tty) in test
  • Add autologin cleanup for tty-specific override paths

Background

On embedded devices (e.g. reComputer RK3576), power loss during first boot is a real scenario. The stock armbian-firstlogin has several issues:

  1. Non-atomic writes: sed -i truncates and rewrites — power loss during this window leaves zero-filled files
  2. No re-run recovery: if interrupted after user creation but before completing all config, the marker file is already deleted and firstlogin won't re-run
  3. Non-idempotent: locale exports are appended unconditionally, causing duplicates on re-run

Test plan

  • Flash image to device, complete first boot successfully
  • Verify user creation, locale, timezone, SSH config all work
  • Simulate power loss at various points during firstlogin and verify recovery on next boot
  • Verify armbianEnv.txt, sudoers, sshd_config are valid after reboot

Summary by CodeRabbit

  • Bug Fixes
    • Safer atomic updates for network, locale, SSH and sudoers configurations to reduce corruption on power loss.
    • More resilient user-creation flow: re-runs after interruptions, reuses existing homes, skips duplicate steps, and fails safely for retry.
    • Idempotent per-user shell and sudoers changes to avoid duplicates and enable recovery from corruption.
    • Stricter startup gating, expanded autologin disable paths, and ordered marker cleanup for reliable restarts.

@mingzhangqun mingzhangqun requested a review from a team as a code owner June 9, 2026 06:28
@mingzhangqun mingzhangqun requested review from ArkadiuszRaj and tystuyfzand and removed request for a team June 9, 2026 06:28
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds atomic_write and applies staged atomic file replacements across firstlogin/firstrun; makes locale, netplan, shell defaults, sudoers, SSH edits atomic; and adds user-creation and marker coordination logic to tolerate partial runs and power loss.

Changes

First-login Script Crash and Power-Loss Safety

Layer / File(s) Summary
Atomic write helper and sourcing
packages/bsp/common/usr/lib/armbian/armbian-common, packages/bsp/common/usr/lib/armbian/armbian-firstlogin
Adds atomic_write(src,dest,mode?) in armbian-common and sources it in armbian-firstlogin.
Network configuration atomic writes
packages/bsp/common/usr/lib/armbian/armbian-firstlogin
Automated Wi‑Fi and Ethernet netplan generation and interactive Wi‑Fi flow render YAML to *.tmp and commit via atomic_write (explicit perms where required).
Shell defaults, locale, and geolocation
packages/bsp/common/usr/lib/armbian/armbian-firstlogin
Shell default files updated via staged writes; /etc/default/locale and /etc/locale.gen rewritten atomically; per‑user RC exports appended idempotently; country_code parsed into COUNTRYCODE.
User creation re-run resilience
packages/bsp/common/usr/lib/armbian/armbian-firstlogin
add_user detects/reuses an existing regular user with a home, loops until a normalized username is unique, conditionally skips adduser when present, and verifies RealUserName post-creation to enable retry.
Sudoers validation and recovery
packages/bsp/common/usr/lib/armbian/armbian-firstlogin
Validates /etc/sudoers; restores a default sudoers template via staged *.tmp + atomic_write if corrupted; adds psd-overlay-helper sudoers rule idempotently.
SSH config and marker coordination
packages/bsp/common/usr/lib/armbian/armbian-firstlogin
SSHD config is updated atomically (re-enable AcceptEnv LANG); /root/.not_logged_in_yet removed before restarting ssh.service and ensured again at script end (guarded by tty).
Entry gating and autologin cleanup
packages/bsp/common/usr/lib/armbian/armbian-firstlogin
Entry requires marker and non-empty tty; autologin override removal expanded to tty1 and serial-getty variants; exhausted password retries now exit 0.
firstrun MAC randomization atomic replace
packages/bsp/common/usr/lib/armbian/armbian-firstrun
Generates mac0/mac1, performs sed into ${_f}.tmp, and replaces /boot/armbianEnv.txt via atomic_write (tmp cleanup on sed failure).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • armbian/build#9937: Earlier firstrun atomic-write work for /boot/armbianEnv.txt MAC randomization that this change builds upon.

Suggested reviewers

  • iav

Poem

🐰 I stage my writes where chaos used to creep,
Temp files and moves keep settings safe in sleep.
If power cuts the run and users disappear,
I'll reuse their homes and bring them back next year.
Atomic moves and markers—steady as a drum.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the main objectives of the changeset: power-loss recovery mechanisms and adoption of atomic writes for safe file updates in the armbian-firstlogin script.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added size/medium PR with more then 50 and less then 250 lines 08 Milestone: Third quarter release Needs review Seeking for review BSP Board Support Packages labels Jun 9, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/bsp/common/usr/lib/armbian/armbian-firstlogin`:
- Around line 1114-1118: The rm -f /root/.not_logged_in_yet deletion is outside
the main guard (if [[ -f /root/.not_logged_in_yet && -n "$(tty)" ]]) and can
remove the marker when the setup block (lines ~724-1108) is skipped; restore
robustness by relocating the marker removal inside that main guard or by adding
the same condition before deleting (check -f /root/.not_logged_in_yet and -n
"$(tty)" or equivalent variable used in the script) so that
/root/.not_logged_in_yet is only removed after the setup logic in the main block
(the same guarded section that performs the setup steps) has completed.
- Around line 18-25: atomic_write currently treats an empty string mode the same
as "unset" using [[ -n "$mode"]], causing fallback to --reference when the dest
may not exist; change the check to detect whether a second argument was actually
passed (e.g. if [ "$#" -ge 2 ]), and if provided call chmod "$mode" and
fail/return if that chmod fails (do not fall through to --reference), otherwise
(mode not provided) only attempt chmod --reference="$dest" when the destination
exists (silently skip if it doesn't); also audit call sites that create new
files (e.g. sudoers/netplan writers) and pass an explicit mode when the
destination may not exist.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cc4b89eb-0168-4f28-b71d-82a08570e3ce

📥 Commits

Reviewing files that changed from the base of the PR and between 66e68ea and 7e27a05.

📒 Files selected for processing (1)
  • packages/bsp/common/usr/lib/armbian/armbian-firstlogin

Comment thread packages/bsp/common/usr/lib/armbian/armbian-firstlogin Outdated
Comment thread packages/bsp/common/usr/lib/armbian/armbian-firstlogin Outdated
@mingzhangqun mingzhangqun force-pushed the fix/firstlogin-power-loss-recovery branch from ba40aa9 to f2e9a4d Compare June 9, 2026 07:07
@adeepn

adeepn commented Jun 10, 2026

Copy link
Copy Markdown
Member

Why not just use the install command?

@mingzhangqun

mingzhangqun commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Hi @adeepn Thank you for your reply.
install does a data copy (open/write/close loop), not an atomic rename(). Since this PR's whole point is crash safety during power loss, we need mv (which is rename(2) on the same filesystem) — it guarantees the destination is either the old file or the new file, never a partial write. install would write directly to the destination path, leaving a partial file if power is lost mid-write, reintroducing the corruption window we're trying to close.

@adeepn

adeepn commented Jun 11, 2026

Copy link
Copy Markdown
Member

Hi @adeepn Thank you for your reply. install does a data copy (open/write/close loop), not an atomic rename().

my bad: I was reading BSD version man page, which does handle flush and renames :)

But in that case I'd suggest having the function mirror install's call arguments, so the function's semantics are clear from the call site: anyone reading the code should immediately see what it does, rather than having to guess that it appends .tmp to the name.

@mingzhangqun

Copy link
Copy Markdown
Contributor Author

Good point about readability — I've updated atomic_write to accept explicit <src> <dest> [mode] arguments so the data flow is clear at every call site. Commit: fe1b713

@adeepn

adeepn commented Jun 11, 2026

Copy link
Copy Markdown
Member

since you prepared "atomic-install", please re-fix #9937

chmod --reference="$dest" "$src"
fi
mv -f "$src" "$dest"
sync -f "$dest" 2>/dev/null || sync "$dest" 2>/dev/null

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to write "thou shall not use sync rather sync --file-system", but then seeing sync --file-system here already!
Nice!
And to be honest, I was made aware of the -f option today when reading up on https://www.linux.org/threads/sync-vs-fsync-%E2%80%94-making-sure-your-writes-actually-hit-the-disk.64365/.

#Enable Wlan, disable Eth
local _f="/etc/netplan/30-${DEVTYPE}${CONFIG_NAME}.yaml"
echo -e "$(createYAML)" > "${_f}.tmp"
atomic_write "${_f}.tmp" "$_f" 600

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style comment; please use ${VAR} everywhere, not just where it's really really needed.


# change shell for future users
sed -i "s|^SHELL=.*|SHELL=${SHELL_PATH}|" /etc/default/useradd
sync -f /etc/default/useradd 2>/dev/null

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The atomic_write() is a very OK way to do these things, as sed -i does have some tendency to bork things up. Why not use that here as well?

# change default locales
sed -i "s/=.*/=$(echo ${LOCALES} | cut -d" " -f1)/g" /etc/default/locale
# change default locales (atomic write)
if sed -i "s/=.*/=$(echo ${LOCALES} | cut -d" " -f1)/g" /etc/default/locale; then

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do the atomic_write() here as well

if ! grep -q '^root\s\+ALL' /etc/sudoers 2>/dev/null; then
# sudoers corrupted (power-loss), restore from dpkg default
local _f="/etc/sudoers"
cat > "${_f}.tmp" << 'SUEOF'

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be aware that herefiles with a quoted end tag ('EOL' or \EOL) will not do any shell expansion inside of the herefile content. Best to add a remark for future users that want to modify this part.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — added a comment above the heredoc in the latest commit:

# Note: quoted heredoc tag ('SUEOF') disables shell variable expansion;
# all content is literal — do not add $VAR references inside without unquoting.
cat > "${_f}.tmp" << 'SUEOF'

REPEATS=$((REPEATS - 1))
fi
[[ "$REPEATS" -eq 0 ]] && logout
[[ "$REPEATS" -eq 0 ]] && exit 0

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason to do an exit here instead of a logout ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script runs as a child process (has #!/bin/bash shebang, executed not sourced), so it's not a login shell. logout is a bash builtin restricted to interactive login shells and can behave unexpectedly outside that context. exit 0 exits the script process explicitly and works regardless of how the script is invoked. Using exit 0 here makes the retry-exhaustion path correct and portable.

sed -e '/^#AcceptEnv LANG/ s/^#//' -i /etc/ssh/sshd_config
# re-enable passing locale environment via ssh and allow root password login (atomic write)
_f="/etc/ssh/sshd_config"
if sed -e '/^#AcceptEnv LANG/ s/^#//' -e 's/^#PermitRootLogin.*/PermitRootLogin yes/' "$_f" > "${_f}.tmp"; then

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps make this a little more permissive with:

sed -E "
    /^[[:blank:]]*#[[:blank:]]*acceptenv[[:blank:]]+lang/I { 
        i #[NOTE: Modified by ${0##*/}]
        s/^[[:blank:]]**#//; 
    }
    /^[[:blank:]]*(#[[:blank:]]*)?permitrootlogin[[:blank:]]+/I { 
        i #[NOTE: Added by ${0##*/}]
        i PermitRootLogin yes
        d;
    }" "${_f} > ...

sed -e '/^#AcceptEnv LANG/ s/^#//' -i /etc/ssh/sshd_config
# re-enable passing locale environment via ssh and allow root password login (atomic write)
_f="/etc/ssh/sshd_config"
if sed -e '/^#AcceptEnv LANG/ s/^#//' -e 's/^#PermitRootLogin.*/PermitRootLogin yes/' "$_f" > "${_f}.tmp"; then

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so sure about the PermitRootLogin yes here, what's the reason to have that in? For "hardened" configurations, if root login is to be permitted, at least add something additional like PermitRootLogin prohibit-password (which would require a pubkey foir root logins instead of a password) to prevent brute force attacks, or force to have both password and a pubkey.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — I've removed PermitRootLogin yes in the latest commit. It was not in the original upstream code and is out of scope for this power-loss safety PR. The sshd_config edit now only uncomments AcceptEnv LANG (using the more permissive sed -E pattern you suggested), which is what the original code did.

Add atomic_write() helper that writes to a temp file, syncs, then
renames — surviving power loss at any point. Use it for all
armbianEnv.txt modifications (hostname, MAC, root password).

Guard atomic_write permission handling: only chmod --reference when
the destination exists; use explicit mode for new files. Guard
firstlogin marker deletion behind the same tty check as the main
setup block so non-interactive invocations don't remove the marker
without completing setup.
Change atomic_write() signature from (dest, [mode]) to (src, dest, [mode])
so call sites clearly show the data flow without relying on an implicit
.tmp naming convention. Use local variables to avoid repeating long paths.
Consolidate separate chmod calls into the mode parameter where possible.
Move atomic_write() from armbian-firstlogin into armbian-common so both
armbian-firstrun and armbian-firstlogin share one implementation, and
refactor the inline atomic-write logic in armbian-firstrun's MAC
randomization (armbian#9937) to use it.

Also address review feedback on armbian#9954:
- Use ${VAR} notation consistently in the newly added code
- Apply atomic_write() to /etc/default/useradd, /etc/adduser.conf and
  /etc/default/locale (previously left with plain sed -i + sync)
- Add a note about the quoted heredoc tag ('SUEOF') disabling variable
  expansion
- Drop the PermitRootLogin yes change from sshd_config editing (not in
  the original code, and out of scope for this power-loss safety PR)
- Use a more permissive sed -E pattern for uncommenting AcceptEnv LANG
@mingzhangqun mingzhangqun force-pushed the fix/firstlogin-power-loss-recovery branch from fe1b713 to e9b0b9d Compare June 12, 2026 02:30

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/bsp/common/usr/lib/armbian/armbian-common`:
- Line 44: The current fallback call uses 'sync "${dest}"' which fails on
systems where sync has no file argument; update the fallback so if 'sync -f
"${dest}" 2>/dev/null' fails, it calls plain 'sync' (no arguments) instead
(i.e., change the expression that references the command string 'sync -f
"${dest}" 2>/dev/null || sync "${dest}" 2>/dev/null' to fall back to a bare
'sync' so older systems actually flush to disk).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a9144ad4-1905-4cd5-bdba-eff0d52a669d

📥 Commits

Reviewing files that changed from the base of the PR and between fe1b713 and e9b0b9d.

📒 Files selected for processing (3)
  • packages/bsp/common/usr/lib/armbian/armbian-common
  • packages/bsp/common/usr/lib/armbian/armbian-firstlogin
  • packages/bsp/common/usr/lib/armbian/armbian-firstrun

chmod --reference="${dest}" "${src}"
fi
mv -f "${src}" "${dest}"
sync -f "${dest}" 2>/dev/null || sync "${dest}" 2>/dev/null

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Correct the sync fallback to work on older systems.

The fallback sync "${dest}" will fail on systems where sync does not support the -f option, because standard sync (without -f or --file-system) does not accept file arguments. The error is suppressed by 2>/dev/null, so the sync silently fails, leaving data potentially unflushed and defeating the crash-safety goal of this PR.

🔧 Proposed fix
-	sync -f "${dest}" 2>/dev/null || sync "${dest}" 2>/dev/null
+	sync -f "${dest}" 2>/dev/null || sync
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sync -f "${dest}" 2>/dev/null || sync "${dest}" 2>/dev/null
sync -f "${dest}" 2>/dev/null || sync
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/bsp/common/usr/lib/armbian/armbian-common` at line 44, The current
fallback call uses 'sync "${dest}"' which fails on systems where sync has no
file argument; update the fallback so if 'sync -f "${dest}" 2>/dev/null' fails,
it calls plain 'sync' (no arguments) instead (i.e., change the expression that
references the command string 'sync -f "${dest}" 2>/dev/null || sync "${dest}"
2>/dev/null' to fall back to a bare 'sync' so older systems actually flush to
disk).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

08 Milestone: Third quarter release BSP Board Support Packages Needs review Seeking for review size/medium PR with more then 50 and less then 250 lines

Development

Successfully merging this pull request may close these issues.

3 participants