Add OATU2 specification draft [WIP] #69

Open
wants to merge 2 commits into
base: master
77 changes: 77 additions & 0 deletions docs/ota-updates.md
@@ -0,0 +1,77 @@
# Over The Air Unattended Updates (OTAU2)

This document enumerates some of the approaches and FLOSS software that could
be used to deploy OTAU2 to the Lepidopter distribution.

## Requirements

A list of Lepidopter's OTAU2 requirements:

* Atomic software release update

* On failure, deploy previous working bootloader, kernel
configuration, and filesystems
Member

@darkk darkk Aug 18, 2016

SD cards are rather big, so we can afford to keep the previous boot & root untouched; we don't need to update them in place.

SWUpdate calls it "Double copy with fall-back"; Chromium OS exploits the same idea.

Member Author

However, most SD card image burners do not support writing an archived image directly to an SD card.
Having 16+ GB of free disk space is not always feasible.

Member

As we've discussed on IRC, it may be quite safe to burn only the partition table, /boot and /. These three blobs take up the first ~1 GB of the card, and the remaining space may be left uninitialised. Moreover, initialization of /data on boot may be part of a wipe-on-failure strategy.


* On success, deploy newest working bootloader, kernel
configuration, filesystems and reboot (if needed) for the changes to take
effect
Member

I'd say reboot (always) for simplicity, as / is changed during update rollout. Is there any reason to avoid reboot?

Member Author

Yes, mostly because when you reboot a RasPi with an updated SD card, you never know if it will come back online. :)

Member

That's exactly the point! By rebooting the RasPi, you learn whether everything went OK / NOK as soon as possible.
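
The reboot-and-verify argument in this thread, together with the double-copy layout mentioned earlier, can be sketched as a small state machine. This is only an illustrative sketch, not what SWUpdate or Chromium OS actually implement: the slot names `A`/`B`, the `BootState` fields and the `max_tries` limit are all hypothetical, and the state is assumed to be persisted somewhere that survives a reboot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BootState:
    """Hypothetical persistent boot state for a two-slot (A/B) layout."""
    active: str = "A"               # slot known to boot correctly
    pending: Optional[str] = None   # freshly written slot awaiting confirmation
    tries_left: int = 0             # boot attempts remaining for the pending slot


def standby_slot(state):
    """Return the slot that is not currently active (the update target)."""
    return "B" if state.active == "A" else "A"


def stage_update(state, max_tries=3):
    """Mark the standby slot as pending once an image has been written to it."""
    state.pending = standby_slot(state)
    state.tries_left = max_tries


def choose_boot_slot(state):
    """Run early at boot: pick a slot, spending one attempt if one is pending."""
    if state.pending is not None and state.tries_left > 0:
        state.tries_left -= 1
        return state.pending
    # Nothing pending, or attempts exhausted: fall back to the known-good slot.
    state.pending = None
    return state.active


def confirm_boot(state, booted_slot):
    """Run once the new system proves healthy; makes the update permanent."""
    if booted_slot == state.pending:
        state.active = booted_slot
        state.pending = None
        state.tries_left = 0
```

With this shape, an unconditional reboot after every update is what triggers the check: either `confirm_boot` runs on the new slot, or the attempts run out and the previous slot is booted again without user intervention.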


* Update of bootloader, kernel and configuration data, and filesystems
Member

We can update /boot & root for sure, but the RPi bootloader is fused.


* Support for signing of images and verification of images on
installation

* Support for a self-hosted deployment server
Contributor

Wouldn't a self-hosted deployment server increase the fingerprintability of the software we deploy?

Member Author

@bassosimone good point; we can perhaps use a cloud-fronted server where we'll hand out the updated images.

Member

AFAIK, China is OK with blocking Google Cloud, so we may want to use more than one cloud,
or employ some sort of domain generation algorithm as a fallback for a blocked cloud & Tor.
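
A rough sketch of the fallback idea above: try several update sources in order and, as a last resort, derive candidate host names from a shared seed and the current date. Everything here is an assumption for illustration: the seed, the reserved `.example` TLD, the function names, and the injected `fetch` callable, which stands in for whatever HTTP or Tor client would actually be used.

```python
import hashlib
from datetime import date


def generated_domains(seed, day, count=3, tld="example"):
    """Derive `count` deterministic host names from a shared seed and a date."""
    names = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        label = hashlib.sha256(material).hexdigest()[:16]
        names.append(f"{label}.{tld}")
    return names


def fetch_with_fallback(urls, fetch):
    """Try each URL in order; return (url, payload) for the first that works.

    `fetch` is a caller-supplied function that returns the payload or raises
    OSError (network failures in the standard library are OSError subclasses).
    """
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            errors.append((url, exc))
    raise OSError(f"all {len(errors)} update sources failed")
```

The probe and the server compute the same list for a given day, so no list of fallback names has to be shipped in the image. Whatever is downloaded this way still needs its signature verified, since anyone who learns the seed can register the same names.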


* Enable/disable a specific feature and apply/rollback system updates
Member

> Enable/disable a specific feature

IMHO, distributing toggle flags should be part of the orchestration platform, as OTAU is not universal: it does not match the needs of hardcore Gentoo SysOps who want to run ooni-probe in their dom0.

Member

Yes, I agree with this. I think all configuration specific to a particular installation should be stored in the permanent storage of the device and handled by the ooniprobe software itself.

Member Author

How can this be implemented in the orchestration platform?
I would prefer to add as little complexity as possible to the orchestration platform (the server that distributes image updates).

incrementally rather than through a complete OS update that
replaces the filesystem
Member

> incrementally

Is there any reason to have incremental updates besides bandwidth?

Member Author

In general it's good to do incremental updates: if for any reason the OS image grows beyond a certain size, remote updates will be quite hard in places where bandwidth is limited and network outages are frequent.

Member

There is another gotcha: your OS image may grow larger than the temporary storage available to download it. That's the reason to stream it from the network straight to the standby root partition.
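
A minimal sketch of that streaming approach, assuming the expected SHA-256 digest of the image is known before the download starts (for example from a signed manifest, whose verification is not shown here). The function name and parameters are hypothetical, and in-memory streams stand in for the HTTP response and the standby partition device.

```python
import hashlib
import io


def stream_image(src, dst, expected_sha256, chunk_size=1024 * 1024):
    """Copy src to dst chunk by chunk, hashing on the fly.

    Only one chunk is held in memory at a time, so the image never has to
    fit in temporary storage. Returns the number of bytes written and raises
    ValueError if the digest of the written data does not match.
    """
    digest = hashlib.sha256()
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        dst.write(chunk)
        written += len(chunk)
    if digest.hexdigest() != expected_sha256:
        # The standby slot now holds a bad image and must not be booted.
        raise ValueError("image digest mismatch")
    return written


# Usage with in-memory streams in place of the network and the block device.
payload = b"\x00" * 4096
target = io.BytesIO()
stream_image(io.BytesIO(payload), target,
             hashlib.sha256(payload).hexdigest(), chunk_size=1000)
```

Because the digest can only be checked after the last chunk has been written, this only works safely when the target is the standby partition: a failed check leaves the running system untouched.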


* [OPTIONAL] Support for different host roles with a specific configuration set
applicable only to specific hosts or groups (eg: partner probes)
Member

As mentioned above, configuration should not be handled by the update mechanism; it should be part of the software itself.

Managing the lifecycle of multiple differently configured images is going to be imho too complex to manage in the long run.

Member Author

@hellais By software you mean lepidopter or ooniprobe?

I think you will encounter cases of customization so it's better to have a plan rather than implementing an OTAU system with no roles in mind.


## Available tools

Before reading any further, you should go through the excellent study of
software update management in [device-side software update strategies for
automotive grade linux](https://lists.linuxfoundation.org/pipermail/automotive-discussions/2016-May/002061.html)
and the related discussion in the [OSTree manual](https://ostree.readthedocs.io/en/latest/manual/related-projects/).

The following software could potentially be used to implement and deploy OTAU2
updates.

### OSTree

[WIP EVALUATION]

### SWUpdate

[WIP EVALUATION]

### fwup

An image-based "firmware" tool that uses a dual partition update pattern.
Upon a successful image update the MBR will be updated to make the bootloader
boot from the 2nd (updated) partition. Update failures are detected
during the firmware update process.

#### Pros

* Can be integrated into Lepidopter with minimal effort.

* Uncomplicated implementation.

#### Cons

* There is no support for automatic (or unattended) updates.

* There is no support for incremental updates; every update results in a new
(big) image.

* There is no native support for ext filesystems.

* There is no fallback mode: in case of software bugs in an updated image,
the system will be unable to boot and user intervention (i.e. copying a working
image to an SD card) will be required.