System State Framework (SSF)

Introduction

This document describes an extensible design for tracking and publishing the system state for NG800 and OEM products derived from NG800.

The system state is a string variable that reflects the run-level of the overall system (off, booting, starting, up, shutdown-pending, shutting-down, powering-down). This value is published to user applications via sysfs.

At the core of the design, a state machine tracks the system state and processes multiple inputs such as the ignition signal. Before shutting down Linux because of a de-asserted ignition signal, the state machine grants user-space applications time to shut down properly. User applications can prolong the shutdown timer if they need more time to terminate. When the timer elapses, the state machine instructs the kernel to shut down.

File System Entries

All the entries are available under the directory /sys/kernel/broker:

  • ignition

    • status of the ignition signal

      • 1 = asserted
      • 0 = de-asserted
  • system-state

    • state of the system

      • starting –> operating system, applications, etc. are starting up
      • up –> system start-up finished, i.e. fully booted, up and running
      • shutdown-pending –> system was told to shut down and is giving applications time to terminate, see also shutdown-delay
      • shutting-down –> shut down in progress
  • system-state-target

    • interface to “command” the SSF, i.e. the following values can be written to it:

      • up –> marks the system as up (transition from starting to up)
      • reboot –> triggers an immediate reboot
      • powerdown –> triggers an immediate power-off
  • shutdown-delay [seconds]

    • set or read the default shutdown-delay
    • this value is initialized in the device-tree
  • extend-shutdown-delay [seconds]

    • delay the shutdown to have more time to terminate applications
  • remaining-shutdown-delay [seconds]

    • countdown with the remaining time until the device shuts down
  • start-reason

    • information about the reason for the start-up

      • power –> ignition and power are both attached to the device
      • reboot –> device is rebooting (reboot command, ignition signal or RTC alarm during the shutdown process)
      • watchdog –> device was reset by the watchdog (see watchdog feature below)
      • wakeup;ignition –> the ignition signal was asserted while the device was powered down (power supply still attached)
      • wakeup;rtc-alarm –> the device was woken up by an RTC alarm (power supply still attached)
  • ping-request

    • used by the watchdog feature to test the correct operation of the kernel modules
    • writing a test string triggers the kernel modules (response shown in ping-response)
  • ping-response

    • used by the watchdog feature to test the correct operation of the kernel modules
    • holds the response triggered by writing to ping-request
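
The entries above are plain text files, so they can be read with standard shell tools. The following is a minimal sketch, not part of the SSF itself: the helper names ssf_read and ssf_summary and the SSF_DIR variable are hypothetical and exist only for illustration (SSF_DIR makes it possible to try the helpers on a host PC with a directory of dummy files).

```shell
# Base directory of the SSF entries as given in this document.
# SSF_DIR is a hypothetical override, not something the SSF defines.
SSF_DIR="${SSF_DIR:-/sys/kernel/broker}"

# Print the value of a single SSF entry, e.g. "ssf_read system-state".
ssf_read() {
    cat "$SSF_DIR/$1"
}

# Print a one-line summary of the most commonly needed entries.
ssf_summary() {
    printf 'state=%s ignition=%s start-reason=%s\n' \
        "$(ssf_read system-state)" \
        "$(ssf_read ignition)" \
        "$(ssf_read start-reason)"
}
```

On the device, calling ssf_summary prints the current system state, the ignition status and the start reason on one line.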

SSF Components

The SSF consists of two kernel modules and a user space application:

  • SSF broker (kernel module)

    • exposes all important SSF topics as sysfs files
    • distributes SSF notifications to registered components
  • SSF sysstate (kernel module)

    • exposes the current core system state in the sysfs file
  • SSF manager (user space application)

    • writes the state to the system-state sysfs file, such as

      • up as soon as the system is completely booted
      • powerdown as soon as a power-down is in progress
      • reboot as soon as a reboot is in progress
    • handles the watchdog including the ping check (ping-request and ping-response)

Note

The SSF manager is provided with our OEM Linux Release. If custom handling of the SSF is needed, the manager can be configured through its command line options; see section SSF Manager below.

Device Tree Entries

At the moment, only two options in the device-tree are relevant. All other device-tree entries should be left as they are, otherwise the device may not function properly.

  • default-shutdown-delay-s

    • the default shutdown-delay, used when no extension of the shutdown-delay is requested.
    • sets the value of shutdown-delay on startup.
  • max-shutdown-delay-s

    • sets the maximum time of the shutdown-delay. This is used to make sure the shutdown delay can’t be extended forever.

Pending Shutdown

When the ignition signal is de-asserted, system-state shows shutdown-pending for the time given in remaining-shutdown-delay. If the ignition signal is re-asserted during this time, system-state changes back to up.

Prolonging a pending shutdown is described in the next section.
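
An application that wants to react to a pending shutdown can watch system-state. The sketch below simply polls the file once per second; it is an illustration under assumptions, not the SSF's own notification mechanism. The function name ssf_wait_for_state and the SSF_DIR variable are hypothetical.

```shell
# Base directory of the SSF entries; SSF_DIR is a hypothetical override.
SSF_DIR="${SSF_DIR:-/sys/kernel/broker}"

# Poll system-state once per second until it equals $1.
# $2 is the maximum waiting time in seconds (default 60).
# Returns 0 when the state was seen, 1 when the time ran out.
ssf_wait_for_state() {
    wanted=$1
    remaining=${2:-60}
    while [ "$(cat "$SSF_DIR/system-state")" != "$wanted" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

A start-up script could, for example, call ssf_wait_for_state shutdown-pending 3600 and begin saving application data as soon as the call returns 0.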

Extending a Shutdown

As mentioned above, the shutdown can be delayed to give applications time to terminate properly. The following example shows how to use it:

Example: Let’s assume the default shutdown delay is 60s and after 30s we notice that we need another 75s. Run the following command:

echo "75" > /sys/kernel/broker/extend-shutdown-delay

After this command, the shutdown countdown restarts from 75s.

Note

The maximum total delay is configured in the device-tree (max-shutdown-delay-s); the default is 300s.
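
An application usually knows how much time it still needs rather than when the countdown started. The following sketch extends the delay only if a shutdown is pending and the remaining time is too short. It assumes that remaining-shutdown-delay holds a plain integer number of seconds; the function name ssf_ensure_shutdown_time and the SSF_DIR variable are hypothetical.

```shell
# Base directory of the SSF entries; SSF_DIR is a hypothetical override.
SSF_DIR="${SSF_DIR:-/sys/kernel/broker}"

# Make sure at least $1 seconds are left before the device shuts down.
# Nothing is written unless a shutdown is pending and time is short.
ssf_ensure_shutdown_time() {
    needed=$1
    [ "$(cat "$SSF_DIR/system-state")" = "shutdown-pending" ] || return 0
    remaining=$(cat "$SSF_DIR/remaining-shutdown-delay")
    if [ "$remaining" -lt "$needed" ]; then
        # Restarts the countdown from $needed seconds
        echo "$needed" > "$SSF_DIR/extend-shutdown-delay"
    fi
}
```

With the numbers from the example above, ssf_ensure_shutdown_time 75 writes 75 to extend-shutdown-delay because only 30s are left. The configured maximum total delay still applies.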

RTC wake-up

The SSF provides a start reason to differentiate between an RTC wake-up and the ignition signal. To set up an RTC wake-up, use the Linux command rtcwake.

Example: To power the device off now and wake it up 90s later, call:

rtcwake -s 90 -m off

After this wake-up, start-reason reads wakeup;rtc-alarm.
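
A start-up script can use the start reason to run a scheduled job only when the device was woken by the RTC. This is a minimal sketch; the function name ssf_is_rtc_wakeup and the SSF_DIR variable are hypothetical.

```shell
# Base directory of the SSF entries; SSF_DIR is a hypothetical override.
SSF_DIR="${SSF_DIR:-/sys/kernel/broker}"

# Return 0 if the device was started by an RTC alarm, 1 otherwise.
ssf_is_rtc_wakeup() {
    [ "$(cat "$SSF_DIR/start-reason")" = "wakeup;rtc-alarm" ]
}
```

The function can be used directly in a condition, e.g. if ssf_is_rtc_wakeup; then ...; fi.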

Device is Shutting Down

The system reboots if one of the following events occurs during the shutdown process:

  • re-assertion of the ignition signal
  • wake-up event of an RTC alarm
  • reboot commanded

Powering the Device Off

The system powers off on any of the following events:

  • poweroff commanded
  • RTC alarm set up with mode to power off
  • de-assertion of the ignition signal
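
A reboot or power-off is commanded by writing to system-state-target, as listed under File System Entries. The sketch below wraps this in a small helper that rejects values not listed in this document; the function name ssf_command and the SSF_DIR variable are hypothetical.

```shell
# Base directory of the SSF entries; SSF_DIR is a hypothetical override.
SSF_DIR="${SSF_DIR:-/sys/kernel/broker}"

# Write one of the documented commands to system-state-target.
ssf_command() {
    case "$1" in
        up|reboot|powerdown)
            echo "$1" > "$SSF_DIR/system-state-target"
            ;;
        *)
            echo "unsupported target: $1" >&2
            return 1
            ;;
    esac
}
```

For example, ssf_command reboot triggers an immediate reboot and ssf_command powerdown an immediate power-off.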

Ping Request/Response

The kernel modules of the SSF can be tested by writing a string to ping-request and reading the response from ping-response. Each request is taken by the SSF broker and forwarded to the SSF sysstate module, which finally writes the response to sysfs.
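
The ping mechanism can also be exercised by hand. The sketch below assumes that a healthy system returns the request string unchanged in ping-response, which is what the comparison described in the Watchdog Feature section suggests. The function name ssf_ping and the SSF_DIR variable are hypothetical.

```shell
# Base directory of the SSF entries; SSF_DIR is a hypothetical override.
SSF_DIR="${SSF_DIR:-/sys/kernel/broker}"

# Write a test string to ping-request and compare it with ping-response.
# Returns 0 if both match, non-zero otherwise.
ssf_ping() {
    token=${1:-ping-$$}
    printf '%s\n' "$token" > "$SSF_DIR/ping-request" || return 1
    [ "$(cat "$SSF_DIR/ping-response")" = "$token" ]
}
```

Usage: ssf_ping hello && echo "SSF kernel modules respond"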

Watchdog Feature

The provided SSF manager includes a watchdog feature that is linked to the ping request/response mechanism, which checks that the kernel modules are working as expected. In every watchdog feed interval, the manager compares the ping response with the corresponding request before feeding the watchdog. If response and request do not match, the watchdog is not fed and will starve. This leads to a watchdog reset of the device. In this case the start-reason will be shown as watchdog.

SSF Manager

The SSF manager currently provides two features:

  1. marking the system state of the SSF
  2. handling the system watchdog by using the SSF ping mechanism

See the following help for further details:

root@am335x:~# ssf-mgr -h
Usage: ssf-mgr [args]

    -h | --help                            Show this help
    -d | --daemonize                       Run as daemon
    -p | --pidfile=path                    The PID file, see -d
    -m | --mark-sys-state                  Mark the system state for the SSF
    -w | --with-watchdog                   Enable watchdog and supervise SSF modules
    -t | --wd-timeout=TIMEOUT_MS           Configure watchdog timeout to TIMEOUT_MS
                                             default=8000ms
    -i | --wd-feed-interval=INTERVAL_MS    Set watchdog feed interval to INTERVAL_MS
                                             default=4000ms

Used loggers: - evtloop
              - initSys
              - systemState
              - watchdogMgr
              - brokerPinger

SysLogger    OPTIONS:
             --loglevel=n          Set the max application log level (used for all logger
                                   instances as default) to n (0=emcy, 7=dbg). If comma separated
                                   list of separate logger instances is provided after this
                                   number, the log level for each such instance will be overruled
                                   accordingly (e.g. --loglevel=7,evtloop.5,fileOp.6).
             --disable-syslog      disable the log output to syslog
             --enable-stdout       enable output on stdout (e.g. for debugging purposes)

SysLogger Examples:
             prog-name --disable-syslog
             prog-name --disable-syslog --enable-stdout
             prog-name --loglevel=6,config.5,serial.7

In our OEM Linux Release, ssf-mgr.service starts with the default configuration, in which marking of the system state and the watchdog feature are both enabled:

root@am335x:~# cat /etc/default/ssf-mgr.conf
# Default settings for system-state-framework manager
#  for details run ssf-mgr --help

MARK_SYS_STATE="-m"
WATCHDOG_CONFIG="-w"
LOGGER_CONFIG="--loglevel=6,evtloop.5,systemState.6,initSys.6,brokerPinger.6,watchdogMgr.6"


root@am335x:~# cat /usr/lib/systemd/system/ssf-mgr.service
[Unit]
Description=SystemStateFramework Manager daemon

[Service]
Type=forking
EnvironmentFile=-/etc/default/ssf-mgr.conf
ExecStart=/usr/bin/ssf-mgr $MARK_SYS_STATE $WATCHDOG_CONFIG -d -p /run/ssf-mgr.pid $LOGGER_CONFIG

[Install]
WantedBy=multi-user.target

Starting Options

Disable System State Marking and Watchdog Feature

To start the SSF manager without the watchdog and without marking the system state, remove the options -w and -m:

root@am335x:~# cat /etc/default/ssf-mgr.conf
# Default settings for system-state-framework manager
#  for details run ssf-mgr --help

MARK_SYS_STATE=""
WATCHDOG_CONFIG=""
LOGGER_CONFIG="--loglevel=6,evtloop.5,systemState.6,initSys.6,brokerPinger.6,watchdogMgr.6"

Disable System State Marking

To start the SSF manager so that it handles only the watchdog, remove the -m option:

root@am335x:~# cat /etc/default/ssf-mgr.conf
# Default settings for system-state-framework manager
#  for details run ssf-mgr --help

MARK_SYS_STATE=""
WATCHDOG_CONFIG="-w"
LOGGER_CONFIG="--loglevel=6,evtloop.5,systemState.6,initSys.6,brokerPinger.6,watchdogMgr.6"

Timeout Settings

The watchdog feed interval and the watchdog timeout are related: the watchdog timeout must be higher than the feed interval. Both times can be changed with the following command line options:

  • -t watchdog timeout in [ms]

    • default = 8000ms
  • -i watchdog feed interval in [ms]

    • default = 4000ms

Example of setting the timeout to 30s and the interval to 15s:

root@am335x:~# cat /etc/default/ssf-mgr.conf
# Default settings for system-state-framework manager
#  for details run ssf-mgr --help

MARK_SYS_STATE="-m"
WATCHDOG_CONFIG="-w -t 30000 -i 15000"
LOGGER_CONFIG="--loglevel=6,evtloop.5,systemState.6,initSys.6,brokerPinger.6,watchdogMgr.6"

Note

The effective timeout may differ from the configured value because the PMIC setting is a multiple of a specific base time; see the datasheet of the PMIC for more details.
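
Because the timeout must be higher than the feed interval, it can help to check the two values before putting them into ssf-mgr.conf. The sketch below builds the WATCHDOG_CONFIG value from both times and refuses inconsistent combinations; the function name ssf_wd_config is hypothetical and not part of ssf-mgr.

```shell
# Print the ssf-mgr watchdog options for a timeout ($1) and a feed
# interval ($2), both in milliseconds. Fails if the timeout is not
# higher than the feed interval.
ssf_wd_config() {
    timeout_ms=$1
    interval_ms=$2
    if [ "$timeout_ms" -le "$interval_ms" ]; then
        echo "watchdog timeout must be higher than the feed interval" >&2
        return 1
    fi
    echo "-w -t $timeout_ms -i $interval_ms"
}
```

Calling ssf_wd_config 30000 15000 prints the option string used in the example above, while ssf_wd_config 4000 4000 is rejected.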

Source for the System State Marking

The init system is systemd, and the states of a finished start-up, rebooting or powering off are collected from dbus signals. The dbus registration parameters are listed below.

  • State of finished start-up:

    dbus registration parameters:
     - sender/service = "org.freedesktop.systemd1";
     - object path    = "/org/freedesktop/systemd1";
     - interface      = "org.freedesktop.systemd1.Manager";
     - signal         = "StartupFinished"
     - item           = "up"     // this is not necessary as StartupFinished does not have any other items, it is just for the internal list
    
  • The same mechanism is used for poweroff and reboot; only the signal and the items differ:

    - signal       = "UnitNew"   // same signal for poweroff and reboot
    - itemPoweroff = "poweroff.target"
    - itemReboot   = "reboot.target"
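
The parameters above map directly onto a dbus match rule. The sketch below only builds such a rule as a string; it does not show how ssf-mgr registers internally. The function name ssf_match_rule is hypothetical.

```shell
# Print a dbus match rule for a signal ($1) of the systemd manager,
# using the sender, object path and interface listed above.
ssf_match_rule() {
    printf "type='signal',sender='org.freedesktop.systemd1',path='/org/freedesktop/systemd1',interface='org.freedesktop.systemd1.Manager',member='%s'\n" "$1"
}
```

If the dbus-monitor tool is installed on the target (an assumption), the signals can be observed with dbus-monitor --system "$(ssf_match_rule StartupFinished)" or with UnitNew as the signal name. Note that systemd may only emit these signals while at least one client has subscribed through the manager's Subscribe method.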
    

System Watchdog Usage

The system watchdog works on the following principle:

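/* Minimal example; the device path and the times are example values */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

static volatile sig_atomic_t isLoopRunning = 1;

/* Leave the feed loop on SIGINT/SIGTERM so that the watchdog is stopped cleanly */
static void stopLoop(int sig)
{
    (void)sig;
    isLoopRunning = 0;
}

int main(void)
{
    const char *dev = "/dev/watchdog";
    int timeout = 8;               /* watchdog timeout in seconds */
    unsigned int feedInterval = 4; /* feed interval in seconds, below the timeout */
    int fd;

    signal(SIGINT, stopLoop);
    signal(SIGTERM, stopLoop);

    /* The watchdog will be activated when opening the watchdog device file */
    fd = open(dev, O_RDWR);
    if (-1 == fd)
    {
        fprintf(stderr, "Error: %s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }

    /* Setting the watchdog timeout */
    fprintf(stdout, "Set watchdog timeout to %d\n", timeout);
    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) != 0)
    {
        fprintf(stderr, "Error: Set watchdog timeout failed\n");
        exit(EXIT_FAILURE);
    }

    /* Getting the current watchdog timeout - which might be advisable when the
    *  watchdog timeout is based on a multiple of a base time, as it is for the
    *  PMIC watchdog.
    */
    if (ioctl(fd, WDIOC_GETTIMEOUT, &timeout) == 0)
    {
        fprintf(stdout, "Current watchdog timeout is %d\n", timeout);
    }
    else
    {
        fprintf(stderr, "Error: Cannot read watchdog timeout\n");
        exit(EXIT_FAILURE);
    }

    /* Interval loop feeding the watchdog
    *    There are two ways to kick the watchdog:
    *    - by writing any dummy value into watchdog device file, or
    *    - by using IOCTL WDIOC_KEEPALIVE
    *    Both are shown here; one of the two is sufficient.
    */
    while (isLoopRunning)
    {
        /* the device file way: */
        if (write(fd, "w", 1) == 1)
        {
            fprintf(stdout, "Feed watchdog through writing over device file\n");
        }

        /* OR the ioctl way: */
        if (ioctl(fd, WDIOC_KEEPALIVE, NULL) == 0)
        {
            fprintf(stdout, "Kick watchdog through IOCTL\n");
        }

        /* Wait for the next feed; a signal ends the wait early */
        sleep(feedInterval);
    }

    /* The 'V' value needs to be written into watchdog device file to
    *  indicate that we intend to close/stop the watchdog
    */
    if (write(fd, "V", 1) != 1)
    {
        fprintf(stderr, "Error: Cannot announce the watchdog stop\n");
    }
    /* Closing the watchdog device deactivates the watchdog */
    close(fd);
    return EXIT_SUCCESS;
}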