
Automatically Restart Programs After a Crash or Reboot on Linux: systemd, Supervisor, cron & Best Practices

Keeping critical applications and services running after a crash or system reboot is essential for reliable Linux operations. This guide explains how to restart programs after a crash or reboot on Linux using proven methods: creating systemd service units, using process supervisors like supervisord, implementing lightweight keep-alive scripts with cron, and monitoring with journalctl. You will learn practical unit configuration options (Restart, RestartSec, StartLimitBurst), safe restart policies to avoid restart loops, and recommended security and logging practices. Whether you manage Debian, Ubuntu, RHEL, CentOS, Arch or other distributions, these techniques help you maintain service uptime and recover automatically from failures without manual intervention.

Using systemd to automatically restart services

systemd is the default init system on most modern Linux distributions and provides reliable process supervision. To make an application start on boot and restart automatically after a crash, create a unit file and configure restart policies. Use Restart= to control when systemd attempts to restart the process (common values: on-failure, always), and use StartLimitIntervalSec and StartLimitBurst to prevent fast restart loops. Below are commands you will run to enable and check the service. The examples show expected outputs.

sudo systemctl enable myapp.service

Created symlink /etc/systemd/system/multi-user.target.wants/myapp.service → /etc/systemd/system/myapp.service

The systemctl enable command creates the necessary symlinks so the service starts automatically at boot. The example output confirms the symlink was created to the multi-user target.

sudo systemctl start myapp.service

The systemctl start command typically has no output when successful. It launches the unit; if there is a problem, use systemctl status and journalctl to inspect logs.

sudo systemctl status myapp.service

● myapp.service - My Application
     Loaded: loaded (/etc/systemd/system/myapp.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2026-03-04 08:12:05 UTC; 2min 3s ago
   Main PID: 1423 (myapp)
      Tasks: 5 (limit: 4915)
     Memory: 12.3M
     CGroup: /system.slice/myapp.service
             └─1423 /usr/local/bin/myapp --serve

The systemctl status output shows unit metadata, whether it is enabled, its active state, process id and resource usage. This helps verify the service is running after a start or system boot.

Create a robust systemd unit file (example)

Place the unit file in /etc/systemd/system/ and include Restart policies, user context and resource limits. The example below demonstrates a safe default unit you can adapt.

cat /etc/systemd/system/myapp.service

[Unit]
Description=My Application
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
Type=simple
User=myappuser
WorkingDirectory=/var/lib/myapp
ExecStart=/usr/local/bin/myapp --serve
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

This unit sets Restart=on-failure to restart on abnormal exits while avoiding an unconditional always. RestartSec=5 adds a delay before restarting to prevent tight restart loops. Note that on modern systemd (version 229 and later), StartLimitIntervalSec and StartLimitBurst are [Unit] section directives, although the older [Service] placement is still accepted for compatibility. Use a dedicated user and WorkingDirectory to improve security. After creating or changing a unit, run systemctl daemon-reload and then start/enable the unit.
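The typical sequence for applying a new or edited unit looks like this (using the myapp.service name from the example above):

```shell
# Reload unit definitions so systemd picks up the new or edited file.
sudo systemctl daemon-reload

# Enable the unit at boot and start it immediately in one step.
sudo systemctl enable --now myapp.service

# Verify which restart-related settings systemd actually loaded.
systemctl show myapp.service -p Restart -p RestartSec -p StartLimitBurst
```

The systemctl show command is a quick way to confirm that your edits took effect, since a forgotten daemon-reload is a common cause of "my changes did nothing" confusion.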

Inspect logs with journalctl

Use journalctl to view recent logs for the unit. Filter by unit name with -u and limit the number of lines with -n, as shown below. Viewing the logs helps troubleshoot why a process crashed and whether systemd restarted it successfully.

journalctl -u myapp.service -n 50 --no-pager

Mar 04 08:11:12 server systemd[1]: Started My Application.
Mar 04 08:12:03 server myapp[1423]: [INFO] Listening on 0.0.0.0:8080
Mar 04 08:14:08 server myapp[1423]: [ERROR] Unhandled exception: connection reset
Mar 04 08:14:08 server systemd[1]: myapp.service: Main process exited, code=killed, status=6/ABRT
Mar 04 08:14:13 server systemd[1]: myapp.service: Failed with result 'signal'.
Mar 04 08:14:13 server systemd[1]: myapp.service: Service hold-off time over, scheduling restart.

The journalctl output shows messages from the service and from systemd itself, indicating the failure and the scheduled restart. Use the -n flag to limit the number of lines and --no-pager to prevent pagination.
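A few other journalctl filters are useful when chasing a crash, using the same hypothetical unit name:

```shell
# Follow new log lines as they arrive (like tail -f).
journalctl -u myapp.service -f

# Show only messages from the current boot, at priority err or worse.
journalctl -u myapp.service -b -p err
```

The -b filter is handy after a reboot to separate the current incident from older history, and -p quickly narrows noisy logs down to actual errors.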

Lightweight keep-alive scripts with cron

If you need a simple approach for processes that are not system services, a small watchdog script scheduled with cron can be effective. This is useful for quick deployments, development machines, or when packaging a service is not feasible. Keep in mind cron-based restarts have coarser granularity (minimum one minute) and less control than systemd.

pgrep -x myapp

1423

pgrep -x searches for processes with an exact name match and returns the PID(s). If it returns nothing, the script can start the application. Use the exact process name to avoid false positives.

crontab -l

* * * * * /usr/local/bin/keep_alive_myapp.sh

The user's crontab lists a task that runs every minute and executes the keep-alive script. A typical script checks whether the process is running and starts the application if it is missing. Entries are listed with crontab -l and edited with crontab -e.

Example keep-alive script (not a command block):

#!/bin/bash
if ! pgrep -x "myapp" >/dev/null; then
    /usr/local/bin/myapp --serve >> /var/log/myapp.out 2>&1 &
fi

Remember to use proper logging, run the application under a dedicated user, and avoid running unsafe commands in cron. For production-grade setups prefer systemd or a supervisor that supports logging, restart limits and process isolation.
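Because cron runs the script every minute regardless of how long the previous run took, two invocations can race each other. A slightly more defensive sketch, assuming the same hypothetical myapp binary and log path as above, guards against that with flock:

```shell
#!/bin/bash
# Hedged keep-alive sketch; "myapp", its binary path and log path are
# illustrative assumptions, not a prescribed layout.

start_if_missing() {
  local name="$1"; shift
  # pgrep -x matches the exact process name and returns non-zero when
  # no such process exists, so we only start the app when it is absent.
  if ! pgrep -x "$name" >/dev/null 2>&1; then
    "$@" >>"/var/log/$name.out" 2>&1 &
    echo "started"
  else
    echo "already running"
  fi
}

# As the cron job would run it: flock -n exits immediately if another
# invocation already holds the lock, preventing overlapping starts.
# ( flock -n 9 || exit 0
#   start_if_missing myapp /usr/local/bin/myapp --serve ) 9>/tmp/myapp.lock
```

The lock file costs nothing when the app is healthy and avoids the pathological case where a slow start causes cron to pile up duplicate processes.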

Use supervisord for language-agnostic process control

Supervisor (supervisord) is a lightweight process control system written in Python. It works across many distributions and is easy to configure for multiple programs. It provides a web UI and command-line tooling for status, start/stop and restarts. Below is how to check the status of managed programs.

sudo supervisorctl status

myapp                            RUNNING   pid 1423, uptime 0:17:23
worker-1                         FATAL     Exited too quickly (process log may have details)
logger                           RUNNING   pid 1428, uptime 0:17:20

The supervisorctl status output lists programs configured under supervisord, their state, PID and uptime. Supervisor configuration supports autostart=true and autorestart=true per-program settings.

Supervisor is a good fit when you need a single tool to manage many processes across different runtimes and when you prefer an external supervisor rather than systemd (e.g., some container environments or user-level supervision).
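A minimal program section, assuming the same hypothetical myapp binary and user as in the systemd example, might look like this (for example in /etc/supervisor/conf.d/myapp.conf):

```ini
[program:myapp]
command=/usr/local/bin/myapp --serve
user=myappuser
autostart=true           ; start when supervisord starts
autorestart=true         ; restart after unexpected exits
startretries=3           ; give up after repeated fast failures
stdout_logfile=/var/log/myapp.out
stderr_logfile=/var/log/myapp.err
```

After adding a program section, supervisorctl reread followed by supervisorctl update picks up the change without restarting supervisord itself.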

Avoiding restart loops and protecting system stability

Uncontrolled restart loops can overload a system. To avoid them:
– Prefer Restart=on-failure when the program should not restart on a normal exit.
– Use RestartSec to add a delay between restarts.
– Configure StartLimitIntervalSec and StartLimitBurst to rate-limit restart attempts.
– Use monitoring and alerting (e.g., Prometheus + Alertmanager) to notify admins when a service repeatedly fails.
– Investigate root cause using logs, core dumps and stack traces. Enable core dump collection for native apps only when safe.

Example with systemd rate limiting in the unit file: setting StartLimitIntervalSec=60 and StartLimitBurst=5 means that after five start attempts within 60 seconds, systemd marks the unit as failed and stops restarting it, so recovery requires manual intervention (for example systemctl reset-failed followed by a start).
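On current systemd versions these rate-limit directives are read from the [Unit] section; a sketch of the relevant fragment:

```ini
[Unit]
StartLimitIntervalSec=60
StartLimitBurst=5
```

Once the limit is hit, clear the failed state with sudo systemctl reset-failed myapp.service before attempting another manual start.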

Advanced options and persistence of state

If your application needs to persist in-memory state across a crash or reboot, simply restarting the process won't restore runtime memory. For stateful recovery consider:
– Checkpoint/restore with CRIU (Checkpoint/Restore In Userspace) for Linux; this is advanced and limited to certain workloads.
– Implementing application-level persistence (databases, write-ahead logs, snapshots).
– Using external session stores (Redis, etcd) so transient processes can recover quickly.
For most services, the combination of persistence at the application layer plus automatic restart at the system level provides reliable recovery.
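At its simplest, application-level persistence means never leaving a half-written state file behind. A minimal shell sketch of the write-then-rename pattern (the state file path and key are illustrative assumptions):

```shell
# Hedged sketch: durable state via atomic file replace.
save_state() {
  local state_file="$1" data="$2"
  # Write to a temporary file first, then rename it into place.
  # rename(2) is atomic on the same filesystem, so a crash mid-write
  # never leaves a torn or partially written state file behind.
  printf '%s\n' "$data" > "${state_file}.tmp" \
    && mv "${state_file}.tmp" "$state_file"
}

save_state /tmp/myapp.state "last_processed_id=42"
```

On restart the process reads the state file and resumes from the last recorded position; databases and write-ahead logs apply the same principle with stronger guarantees.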

Security and operational best practices

Follow these recommendations:
– Run services as dedicated low-privilege users with explicit file permissions.
– Use proper configuration management (Ansible, Puppet, Salt) to deploy unit files and supervisor configs reproducibly.
– Centralize logs and rotate them (logrotate) to avoid disk fill.
– Test your restart policy in staging to ensure it behaves as expected during failure scenarios.
– Add health checks and readiness probes when running in orchestrated environments like Kubernetes (K8s has built-in restart policies).
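For log rotation, a hedged logrotate sketch for the hypothetical /var/log/myapp.out file from the cron example (placed in /etc/logrotate.d/myapp):

```
/var/log/myapp.out {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
```

copytruncate avoids having to signal the application to reopen its log file, at the cost of possibly losing a few lines during the copy; applications that support a reload signal can use postrotate instead.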

Conclusion

Automatically restarting programs after a crash or reboot is fundamental for resilient Linux operations. For modern systems, prefer systemd units with well-chosen Restart policies, RestartSec delays, and StartLimit protections. Use supervisord when you need a lightweight, distribution-agnostic supervisor, and cron-based keep-alive scripts only for simple or temporary setups. Always combine automatic restarts with robust logging, alerting and application-level persistence to diagnose root causes and avoid data loss. Applying these practices across Debian, Ubuntu, RHEL, CentOS, Arch and other distributions will significantly reduce downtime and manual work while improving system reliability.
