What to Monitor on a Moodle Server
A practical guide to monitoring a Moodle server, with attention to uptime, cron, queues, disk, database, email, and application-level warning signs.
Goal of Monitoring
The goal of monitoring is to ensure that the system, i.e. the web server and all software on it, is working properly and within established parameters. If at any time a website or a subsystem on the web server stops functioning, a signal should be sent out to the sysop, who maintains the system.
In addition, it should also be possible to examine trends over time, or historic data, to evaluate whether or not the system’s resources should be expanded (or scaled back) in the future.
You will notice that we are relying on two monitoring systems now: one provided by the data center, and a monitoring system based on Webmin, which is an administrative system for (web) servers. The reason for adding Webmin’s monitoring is that the data center does not allow you to monitor specific websites, but Webmin does.
1. Check Monitoring Settings of the Data Center
The data center may have its own monitoring that comes pre-installed and configured with a new web server (VPS). Just make sure that everything is set up correctly.
Don’t bother configuring Strato’s Monitoring Service: there is only one check available in the free plan, and it entails a ping every 30 minutes. Use something like lms.example.com/ instead, which is free.
For instance, with the hosting provider, do the following: sign in to the hosting provider’s KIS website (https://kis.hosting provider.de/) and click on the appropriate type of server, either Virtual Server 10+ or Virtual Server. This guide shows the first type.
In the following screen, click on the login button, under the Contract column:
This will open a new browser window (or tab). Here you see the current usage:
The following metrics should not exceed 80%:
- CPU cores
- RAM
And Disk space should not exceed 95%.
If the system is not used to send out email, then the SMTP relays metric is typically 0.
Ideally, Uptime monitoring is 100%, but may decrease slightly to 99.91% over time.
Now click on the Monitoring tab, which should take you to the next screen:
Here, make sure all the settings for Manage Email Alerts are switched on.
This monitor will send out an email to the owner of the KIS account with an alert if either CPU, Disk or RAM usage exceeds 80%.
External Monitoring
It is also recommended to add an external monitor. An external monitor is a monitor that resides on another system. For instance, you can use lms.example.com for free to perform a GET request every five minutes to a website on the server you want to monitor. Don’t forget to add your email address so you will receive notifications when the monitor fails.
Using an external monitor ensures you get alerted if the server goes down even if the entire data center goes down with it.
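As a rough illustration of what such an external check does, the following shell sketch performs a GET request and treats any 2xx or 3xx response as "up". The function name check_url and the 10-second default timeout are assumptions made for this example, not part of any particular monitoring service; a hosted monitor will typically add retries, multiple probe locations and notifications on top of this.

```shell
#!/usr/bin/env bash
# check_url URL [TIMEOUT_SECONDS]
# Returns 0 if the URL answers with a 2xx/3xx status, 1 otherwise.
# (check_url is a hypothetical helper name used for illustration.)
check_url() {
  local url="$1" timeout="${2:-10}" code
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
  # curl prints 000 when no HTTP response was received (timeout, DNS failure).
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time "$timeout" "$url" || true)"
  case "$code" in
    2??|3??) return 0 ;;
    *)       return 1 ;;
  esac
}

# Example use from cron on a *different* machine (addresses are placeholders):
# check_url "https://lms.example.com/" || echo "site is down" | mail -s "ALERT" admin@lms.example.com
```

Running the check from a separate machine is the point: a script on the monitored server itself cannot report that the server is unreachable.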
Heartbeat Monitor
We have a custom plugin, tool_heartbeat, which can be used to send out an “I’m alive” signal to lms.example.com (or a comparable service). Use this tool to make sure Moodle’s (or Totara’s) cron is still working.
Here’s how it works:
- The Moodle or Totara site stops telling Cronitor “I’m alive!” for whatever reason. (The Heartbeat plugin does this, hence the name.)
- Cronitor notices Totara is no longer alive, waits 5 minutes just in case, and then sends out an alert “Type: Alert” (“Event not received on schedule”).
- If (when) Totara is reanimated, Cronitor sends out an alert “Type: Recovery”.
So, in the email messages from Cronitor, “Alert” means there’s a problem, and “Recovery” means it’s fixed.
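The heartbeat principle can be sketched in a few lines of shell: run a job, and only request the monitor's ping URL if the job succeeded. This is not the tool_heartbeat plugin's implementation, only an illustration of the idea; the function name heartbeat_after and the ping URL in the example are hypothetical.

```shell
#!/usr/bin/env bash
# heartbeat_after PING_URL COMMAND [ARGS...]
# Runs COMMAND; the heartbeat URL is requested only if COMMAND exits 0.
# (heartbeat_after is a hypothetical helper, not part of tool_heartbeat.)
heartbeat_after() {
  local ping_url="$1"
  shift
  if "$@"; then
    # -f: treat HTTP errors as failure, -sS: quiet but still print errors.
    curl -fsS --max-time 10 -o /dev/null "$ping_url"
  else
    return 1
  fi
}

# Example (the path and the URL are placeholders):
# heartbeat_after "https://monitor.example.com/ping/abc123" php /path_to_your_moodle/admin/cli/cron.php
```

If the job fails or never runs, no ping arrives, and the external service raises the alert after its grace period.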
Installation and configuration
- Place the contents of this directory inside the /admin/tool/heartbeat folder relative to your Moodle or Totara install path.
- Configure the cron job to:
  * * * * * php /path_to_your_moodle/admin/cli/cron.php | php /path_to_your_moodle/admin/tool/heartbeat/cli/cron.php > /dev/null
Plugin settings
- Cron monitor: Enable the monitor and add the URL of the external cron monitor service.
- Email settings: Enable the email notifications, add the email subject and body, and select the recipients that get the email.
2. Make sure Webmin is Installed
Our standard procedure is to install Webmin, an administrative system for web servers. So Webmin should be installed and accessible, typically through the hostname and port 10000, e.g.: lms.example.com:10000/.
3. Configure Webmin to Monitor Critical Systems and Websites
Go to Webmin and open the Tools > System and Server Status section:
We need to add five types of monitors:
- Load average: what is the average usage of the system during the last 15 minutes?
- Disk space: how much is left on the disk (typically an SSD drive)?
- Apache web server: is the web server up and running?
- Free memory: how much free memory do we have left?
- MySQL database server: is the database server up and running?
To add a new monitor in Webmin, use the select box next to the button Add monitor of type and then click the button.
Settings for All New Monitors
For all new monitors, do not forget to add a Description that includes the customer’s name (or main website), and fill out the field “Also send email for this service to” with the address of the person in the sysop role for this server. Set the field “Failures before reporting” to 1. (See the screenshots below for some examples of where to find these fields.)
Load Average Monitor
The load average reflects how busy the system has been (mainly CPU demand) over the past 1, 5 and 15 minutes. To get a good perspective, we set this monitor to 15 minutes, under Load average to check.
The Maximum load average value is critical: it should not exceed 80%. The actual value to fill in, is based on the number of CPU cores. This is the computation:
n cores x .8
For instance, 1 core is 0.8, and 4 cores gives you a value of 3.2.
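The computation above can also be done on the command line. The following is a small sketch; the function name max_load_average is made up for this example, and nproc is assumed to be available (it is part of GNU coreutils on most Linux systems).

```shell
#!/usr/bin/env bash
# max_load_average CORES -> prints CORES x 0.8 with one decimal.
# (max_load_average is a hypothetical helper name.)
max_load_average() {
  awk -v cores="$1" 'BEGIN { printf "%.1f\n", cores * 0.8 }'
}

# On the server itself, nproc reports the number of available CPU cores:
# max_load_average "$(nproc)"
```

For a 4-core machine, max_load_average 4 prints 3.2, the value to enter as Maximum load average.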
The number of cores can be retrieved from Webmin as well. Simply go to Webmin’s homepage and look for Processor information. There you find the number of cores:
You can also use the command lscpu:
admin@example-host:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 4
Disk Space Monitor
This is pretty straightforward: just fill in 5%. This should send out an alert if the disk is over 95% capacity. Filesystem to check is /.
Apache Web Server Monitor
The defaults for this monitor should be fine.
Free Memory Monitor
For this monitor, two values are critical:
- Minimum free real memory: we want 20% to be free (or max 80% used)
- Minimum free virtual memory: we want this set equal to the amount of physical RAM.
To compute the 20% minimum free RAM, we need to know the total available real memory. You can find this on the “homepage” of Webmin:
Webmin reports the total memory in gibibytes (GiB), but the Free Memory monitor uses megabytes (MB). To convert from GiB to MB, use the following formula:
MB = 1073.74 x n GiB
For instance, if we have 7.77 GiB that gives us 8342.9598 MB. Of this number, we take 20% to fill in for the minimum free real memory, and 25% of the virtual memory as the “Minimum free virtual memory”.
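As a sketch of the real-memory part of this calculation, the helper below applies the conversion factor and the 20% rule in one step and rounds to whole megabytes. The function name min_free_real_mb is an assumption for illustration; it covers only the "Minimum free real memory" value, not the virtual memory setting.

```shell
#!/usr/bin/env bash
# min_free_real_mb TOTAL_GIB -> 20% of total memory in MB, rounded.
# Uses the factor from the text: MB = 1073.74 x GiB.
# (min_free_real_mb is a hypothetical helper name.)
min_free_real_mb() {
  awk -v gib="$1" 'BEGIN { printf "%.0f\n", gib * 1073.74 * 0.20 }'
}

# Example: a server reporting 7.77 GiB of real memory
# min_free_real_mb 7.77
```

For 7.77 GiB this gives 1669 MB (20% of 8342.9598 MB, rounded).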
MySQL Database Server Monitor
The defaults for this monitor are fine. Make sure that the “Failures before reporting” field is set to 1 and that the “Also send email for this service to” field is filled in.
4. Add a “Remote HTTP Service” Monitor to Another Webmin
What happens if the entire web server is out or can no longer be reached? In that case, all the monitors we added in the section above will no longer run, or if they are still running, their email alerts may not reach you.
To counter this, we add a “Remote HTTP Service” monitor to a Webmin installation on another web server entirely:
As you can tell from the Status history, this check is performed every 5 minutes.
Set the field “Connection timeout” to 10 seconds. This should also notify you if the loading times for the Moodle website get unacceptable (i.e. more than 10 seconds).
5. Test the Monitoring
Testing should only be done on a completely new system that is not in use yet. The monitors are typically working – they consist of proven, well tested software. So we will not be testing that the monitoring software works, but mainly that we have configured it correctly.
The most critical monitor is the one for the actual Moodle website. We test this by simply turning off the web server. This can be done in Webmin.
Go to Servers > Apache and click the stop button, but only on a new system that is not in use yet:
If you have configured the Remote HTTP Service monitor correctly, you should receive an email very soon.
Restart the Apache web server by clicking on the play button.
You can also stop and start Apache on the command line:
sudo /etc/init.d/apache2 stop
sudo /etc/init.d/apache2 start
If you do not receive any email, make sure that you have used the correct email address, and the correct URL (including the port: nowadays almost always 443).
6. Install a New Munin Node on the Web Server
Munin is a logging tool which consists of a server and a node. The node is installed on the system that you want to monitor. The server is where you log in to view the historical data. We already have the server in place.
If you log in to monitoring.example.internal, you will see an overview of the systems that we are currently monitoring through Munin. Click on a specific system to view the details. Here is an example of the history of the load average:
To install the node on a new web server:
- Make sure that the library libparse-http-useragent-perl is installed, e.g.:
  sudo apt-get install libparse-http-useragent-perl
- Install munin:
  - apt-get install munin
  - apt-get install munin-node
- Make sure that Apache’s server-status module is enabled. (You can do this through Webmin.)
- Add the IP address of the Munin server (i.e. the “master”) to /etc/munin/munin-node.conf:
  - allow ^xxx\.xxx\.xxx\.xxx$
- Configure the munin plugins.
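The allow line in munin-node.conf is a regular expression, which is why the dots in the IP address are escaped. A small helper can generate the line from a plain IPv4 address; this is a sketch, the function name munin_allow_line is made up, and 192.0.2.10 is a documentation placeholder rather than the real Munin server address.

```shell
#!/usr/bin/env bash
# munin_allow_line IPV4 -> prints the "allow" directive with dots escaped.
# (munin_allow_line is a hypothetical helper name.)
munin_allow_line() {
  local escaped
  # Escape every literal dot so it does not match "any character" in the regex.
  escaped="$(printf '%s\n' "$1" | sed 's/\./\\./g')"
  printf 'allow ^%s$\n' "$escaped"
}

# Example with a placeholder address:
# munin_allow_line 192.0.2.10
```

The printed line can then be pasted into /etc/munin/munin-node.conf on the node.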
Configuring The Munin Plugins
The default plugins for the node (so, on your Munin “client” web server) are in /usr/share/munin/plugins/. They appear in your munin website if they’re symlinked in /etc/munin/plugins. For instance:
In /etc/munin/plugins, add symlinks to the apache plugins:
ln -s /usr/share/munin/plugins/apache_accesses .
ln -s /usr/share/munin/plugins/apache_processes .
ln -s /usr/share/munin/plugins/apache_volume .
You must also configure them in the file /etc/munin/plugin-conf.d/munin-node. In that file, if you want to configure multiple plugins at once, use an asterisk notation. E.g.:
[apache*]
This addresses all apache plugins, which are by default:
apache_accesses
apache_processes
apache_volume
Usually when you look at the source code of the plugins (they’re mostly perl scripts), you will find configuration instructions. For instance, the apache plugins need access to Apache’s server status, so you have to configure Apache (i.e. httpd.conf):
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 127.0.0.1
</Location>
ExtendedStatus on
We should also mention here that some plugins seem to exclude each other. For instance, the apache_average_time_last_n_requests plugin (not installed by default) seems to exclude the other (default) apache plugins.
Finally, restart the node:
/etc/init.d/munin-node restart
And open the firewall for port 4949.
Please note: if any of the Munin plugins fail, you will not see any data from that Munin node on the server (monitoring.example.internal)!
Configure The Munin Server
Finally, you also have to tell the Munin server to start polling the newly added node. Add the IP address of the node server to the file /etc/munin/munin.conf:
[ArbitraryServerName] # Apparently, you can’t use spaces in this name
address xxx.xxx.xxx.xxx
use_node_name yes
The Munin server (the ‘master’) will read the new values within 5 minutes (the default polling interval).
Detailed Monitoring
If you run into any trouble with a VPS, you can add more detailed monitoring.
Performance Monitoring
The following is a monitoring script based on an email exchange with the hosting provider, May 19th 2022 about the website outages on their VS10 Linux VPS (search for 198.51.100.10 #HE-DE:2ad1f7b4109530473 in the email history).
date >> /var/log/custom-monitoring.log; top -n 1 -b >> /var/log/custom-monitoring.log; lsof -ni >> /var/log/custom-monitoring.log
This log will contain detailed performance information which you can use to identify which particular application is causing high load, for instance.
Explanation:
- date: current date and time
- top: display Linux processes
  - -n 1: Specifies the maximum number of iterations, or frames, top should produce before ending.
  - -b: Starts top in batch mode, which could be useful for sending output from top to other programs or to a file. In this mode, top will not accept input and runs until the iterations limit you’ve set with the -n command-line option or until killed.
- lsof: lists on its standard output file information about files opened by processes
  - -i: selects the listing of files any of whose Internet address matches the address specified in i. If no address is specified, this option selects the listing of all Internet and x.25 (HP-UX) network files.
  - -n: inhibits the conversion of network numbers to host names for network files, which makes lsof run faster.
Log File Rotation
This type of monitoring generates a lot of data, so put it in log file rotation, see Webmin > System > Log File Rotation (the one for /var/log/letsencrypt/*.log is a good example).
Use the default settings, except for:
- Rotation schedule: Daily
- Number of old logs to keep: 31, so you will always have at least a month’s worth of data.
- Compress old log files?: Yes.
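If you prefer a file over the Webmin form, the same settings can be written as a logrotate configuration. The sketch below generates such a file; the function name write_logrotate_conf and the target file name in the example are assumptions, and the missingok and notifempty directives are additions that are not mentioned in the settings above.

```shell
#!/usr/bin/env bash
# write_logrotate_conf CONF_PATH LOG_PATH
# Writes a logrotate stanza: daily rotation, 31 old logs kept, compressed.
# (write_logrotate_conf is a hypothetical helper name.)
write_logrotate_conf() {
  local conf_path="$1" log_path="$2"
  cat > "$conf_path" <<EOF
$log_path {
    daily
    rotate 31
    compress
    missingok
    notifempty
}
EOF
}

# Example (requires root; the file name under /etc/logrotate.d is an assumption):
# write_logrotate_conf /etc/logrotate.d/custom-monitoring /var/log/custom-monitoring.log
```

A dry run with logrotate -d on the generated file shows what would be rotated without changing anything.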
Slow Query Monitoring for MySQL
MySQL has a slow query log which records all queries which took longer than 10 seconds (by default) to execute. For Moodle, 10 seconds is not realistic because many queries take longer than that, so 30 seconds is probably better.
To activate slow query logging:
- Log in using the mysql client: sudo mysql -uroot -p
- set global slow_query_log = 'ON';
- set global slow_query_log_file = '/var/log/mysql/slow-query.log';
- set global long_query_time = 30;
- Confirm the changes are active by re-entering the MySQL shell (this reloads the system variables) and running the following command: show variables like '%slow%';
Make sure the slow-query.log is in log rotation (see subsection Log File Rotation).
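Once the log is active, a quick way to see whether it is filling up is to count the logged entries. The sketch below relies on the standard slow query log file format, in which every entry carries one "# Query_time:" header line; the function name count_slow_queries is made up for this example.

```shell
#!/usr/bin/env bash
# count_slow_queries LOG_FILE -> prints the number of logged slow queries.
# (count_slow_queries is a hypothetical helper name.)
count_slow_queries() {
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  grep -c '^# Query_time:' "$1" || true
}

# Example (reading the log usually requires root):
# sudo bash -c 'grep -c "^# Query_time:" /var/log/mysql/slow-query.log'
```

A count that keeps growing between checks is a hint to open the log and look at the statements themselves.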
Incident Response
If you receive an alert from either monitoring system, take the following steps:
- Verify the alert
- If normal usage was impeded, i.e. there was an actual outage, notify the customer, with an estimated time to fix if possible
- Fix the issue
- Take steps to prevent this from happening again (and document them in a relevant SOP)
- If there was an outage, notify the customer that the issue is now fixed and what you have done, or will do in the very short term, to prevent a recurrence of the incident.
Appendix – Health Monitoring on Servers Without Webmin
Purpose
This section describes how basic server health monitoring is implemented on systems where Webmin is not installed or not permitted.
Instead of relying on a web-based administration interface, monitoring is achieved using:
- a lightweight Bash script
- systemd timers
- standard Unix tooling (mail, logrotate)
This approach minimizes attack surface, avoids additional services, and is fully auditable.
Rationale (Why No Webmin)
Webmin provides convenient monitoring and administration features but:
- introduces an additional web-facing service
- increases maintenance and patching requirements
- is not always allowed under security policies
For these reasons, this server uses a script-based monitoring approach that:
- requires no open ports
- has no daemon processes
- depends only on standard OS components
- provides clear alerting and diagnostics
Monitoring Scope
The health check verifies the following:
- Disk usage on the root filesystem (/)
- System load (1-minute average, normalized per CPU core)
- Available memory (MemAvailable)
- Required services:
  - apache2
  - postgresql
- Local HTTP availability via http://127.0.0.1/
On failure:
- a diagnostics snapshot is appended to a log file
- an alert email is sent
On success:
- a single “OK” line is written to the log
- no email is sent
Installation
Prerequisites
Ensure mail utilities are installed:
apt update
apt install mailutils
Postfix is already present on this system.
Script Installation
Create the monitoring script:
vim /usr/local/sbin/healthcheck.sh
Insert the full script source provided below.
Set permissions:
chmod 0755 /usr/local/sbin/healthcheck.sh
Create the state directory:
mkdir -p /var/lib/healthcheck
systemd Configuration
Create the service unit:
vim /etc/systemd/system/healthcheck.service
[Unit]
Description=Basic server health check
[Service]
Type=oneshot
ExecStart=/usr/local/sbin/healthcheck.sh
Create the timer unit:
vim /etc/systemd/system/healthcheck.timer
[Unit]
Description=Run healthcheck every 5 minutes
[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
AccuracySec=30s
[Install]
WantedBy=timers.target
Enable and start the timer:
systemctl daemon-reload
systemctl enable --now healthcheck.timer
Verify:
systemctl list-timers | grep healthcheck
Validation
To verify alerting end-to-end, force a failure:
DISK_MAX_PCT=1 /usr/local/sbin/healthcheck.sh
Expected result:
- exit code 1
- alert email is sent
- diagnostics appear in /var/log/healthcheck.log
Logging and Log Rotation
Log File
All output is written to:
/var/log/healthcheck.log
This file contains:
- one-line OK entries for successful runs
- full diagnostics snapshots for failures
Log Rotation Configuration
Create logrotate configuration:
vim /etc/logrotate.d/healthcheck
/var/log/healthcheck.log {
    weekly
    rotate 8
    dateext
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
Force a test rotation:
logrotate -vf /etc/logrotate.d/healthcheck
Email Alert Handling
Recipients
Alert emails are sent to multiple recipients using standard Postfix delivery.
Recipients are configured in the script via:
ALERT_EMAIL="admin@lms.example.com admin@lms.example.com admin@lms.example.com"
Mail Client Filtering (Recommended)
To prevent alert emails from being classified as spam or overlooked:
Create a mail filter or rule in the mail client:
- Match subject containing: [ALERT][Totara][ubuntu]
- Always deliver to inbox (or mark as important)
- Optionally apply a label such as “Server Monitoring”
This ensures alerts remain visible while avoiding unnecessary inbox noise.
Script Source Code
/usr/local/sbin/healthcheck.sh
#!/usr/bin/env bash
set -euo pipefail

HOSTNAME_SHORT="$(hostname -s)"
HOSTNAME_FQDN="$(hostname -f 2>/dev/null || hostname)"
NOW="$(date -Is)"

# -----------------------------
# CONFIG (defaults, overridable via environment)
# -----------------------------
: "${ALERT_EMAIL:=admin@lms.example.com admin@lms.example.com admin@lms.example.com}"
: "${MAIL_FROM:=admin@lms.example.com}"
: "${DISK_MAX_PCT:=95}"
: "${LOAD_PER_CORE_MAX:=1.50}"
: "${MEM_AVAIL_MIN_MB:=512}"
: "${HTTP_URL:=http://127.0.0.1/}"
: "${ALERT_COOLDOWN_SECONDS:=1800}"
: "${STATE_DIR:=/var/lib/healthcheck}"
SERVICES=("apache2" "postgresql")
# -----------------------------

# Append one timestamped line to the log file.
log_line() {
  echo "[$NOW] $*" >> /var/log/healthcheck.log
}

# Send an alert email. ALERT_EMAIL is left unquoted on purpose so that it
# splits into one argument per recipient. A mail failure must not abort the run.
send_alert() {
  local subject="$1"
  local body="$2"
  printf "%s\n" "$body" | mail -a "From: ${MAIL_FROM}" -s "$subject" ${ALERT_EMAIL} || true
}

# Return 0 (rate limited) if an alert for this key was sent less than
# ALERT_COOLDOWN_SECONDS ago; otherwise record the current time and return 1.
rate_limited() {
  local key="$1"
  local stamp="${STATE_DIR}/${key}.stamp"
  local now
  now="$(date +%s)"
  mkdir -p "$STATE_DIR"
  if [[ -f "$stamp" ]]; then
    local last
    last="$(cat "$stamp" || echo 0)"
    (( now - last < ALERT_COOLDOWN_SECONDS )) && return 0
  fi
  echo "$now" > "$stamp"
  return 1
}

# Log the failure with a diagnostics snapshot, send an alert unless one was
# sent recently for the same key, and exit with status 1.
fail() {
  local key="$1"
  local msg="$2"
  log_line "FAIL ${HOSTNAME_SHORT}: ${msg}"
  {
    # Diagnostics are best-effort: "|| true" keeps a failing or missing tool
    # from aborting the script (set -e) before the alert is sent.
    echo "----- failure snapshot ($NOW) -----"
    uptime || true
    echo
    df -h || true
    echo
    free -m || true
    echo
    top -b -n1 | head -n 60 || true
    echo
    ss -tulpn || true
    echo
    systemctl --failed || true
    echo "----------------------------------"
  } >> /var/log/healthcheck.log
  rate_limited "$key" && exit 1
  send_alert "[ALERT][Totara][${HOSTNAME_SHORT}] healthcheck failed: ${key}" \
"Time: $NOW
Host: ${HOSTNAME_FQDN}
Reason:
${msg}
See /var/log/healthcheck.log for diagnostics."
  exit 1
}

touch /var/log/healthcheck.log

# Disk usage on / (percent used)
disk_pct="$(df -P / | awk 'NR==2{gsub("%","",$5); print $5}')"
[[ "$disk_pct" -lt "$DISK_MAX_PCT" ]] || fail disk "Disk usage ${disk_pct}%"

# 1-minute load average, normalized per CPU core
cores="$(nproc)"
load_1m="$(awk '{print $1}' /proc/loadavg)"
awk -v l="$load_1m" -v c="$cores" -v t="$LOAD_PER_CORE_MAX" 'BEGIN{ exit !((l/c)<=t) }' \
  || fail load "Load ${load_1m} on ${cores} cores"

# Available memory in MB
mem_avail_mb="$(awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo)"
[[ "$mem_avail_mb" -ge "$MEM_AVAIL_MIN_MB" ]] \
  || fail memory "MemAvailable ${mem_avail_mb}MB"

# Required services
for svc in "${SERVICES[@]}"; do
  systemctl is-active --quiet "$svc" \
    || fail "service-${svc}" "Service not active: ${svc}"
done

# Local HTTP availability
curl -fsS --max-time 10 "$HTTP_URL" >/dev/null \
  || fail http "Local HTTP check failed"

log_line "OK ${HOSTNAME_SHORT}"
exit 0