Building Moodle for high availability on AWS requires separating state from compute: managed services like RDS, EFS, and ElastiCache handle persistent data while Spot instances serve stateless web traffic. This guide covers the full architecture, including failover strategies and critical configuration gotchas.

Glossary of AWS Terminology

  • EC2 (Elastic Compute Cloud): Virtual machines in AWS.
  • Spot Instance: Unused EC2 capacity sold at a discount. AWS can reclaim a Spot Instance with a 2-minute warning.
  • On-Demand Instance: Standard EC2 instance billed at the regular rate and not subject to Spot reclamation.
  • ASG (Auto Scaling Group): Service that manages the scaling and lifecycle of EC2 instances.
  • ALB (Application Load Balancer): Layer 7 load balancer for web traffic.
  • RDS (Relational Database Service): Managed relational database service. Engines relevant to Moodle include MySQL, MariaDB, PostgreSQL, and Aurora.
  • EFS (Elastic File System): Managed NFS file store that can be mounted on multiple EC2 instances.
  • ElastiCache (Redis): Managed caching service used for Moodle MUC and optionally session caching.

1. Establish Shared Storage and State (The Shared State Layer)

In a Spot-based architecture, no EC2 instance should contain state that is required for proper operation. All persistent data must live in managed services.

Database (RDS or Aurora)

Purpose

Stores all Moodle application data, including course content, activities, logs, configuration, user accounts, enrolments, and (optionally) sessions.

Steps
  • Create a new RDS MySQL or Aurora MySQL instance. Recommended versions: MySQL 8.0 or Aurora MySQL 3.
  • Place the DB in private subnets in at least two Availability Zones.
  • Enable automated backups and set a retention period that matches business requirements.
  • Create a security group allowing inbound MySQL traffic (TCP 3306) only from:
      • The stable instance
      • The Spot web servers
      • (Optional) a bastion host or VPN
  • Set the database parameter group:
      • innodb_buffer_pool_size sized to available memory (the RDS default is 75% of instance memory).
      • max_allowed_packet to at least 256M so that large statements are not rejected.
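
As a rough sketch, the parameter group settings above could be expressed as the parameter list that the RDS ModifyDBParameterGroup API accepts (for example through boto3's modify_db_parameter_group). The helper name moodle_db_parameters is hypothetical, and the choice of pending-reboot as the apply method is an assumption made to stay on the safe side. The buffer pool value shown is the RDS default formula, written out explicitly.

```python
def moodle_db_parameters(max_allowed_packet_mb: int = 256) -> list[dict]:
    """Build a Parameters list for an RDS MySQL parameter group update.

    Hypothetical helper: it only assembles the request data and makes no
    AWS calls.
    """
    # MySQL accepts max_allowed_packet values up to 1 GB.
    if not 1 <= max_allowed_packet_mb <= 1024:
        raise ValueError("max_allowed_packet must be between 1 MB and 1024 MB")
    return [
        {
            "ParameterName": "max_allowed_packet",
            # RDS expects the value in bytes, passed as a string.
            "ParameterValue": str(max_allowed_packet_mb * 1024 * 1024),
            "ApplyMethod": "pending-reboot",
        },
        {
            "ParameterName": "innodb_buffer_pool_size",
            # RDS formula syntax: three quarters of the instance memory.
            "ParameterValue": "{DBInstanceClassMemory*3/4}",
            "ApplyMethod": "pending-reboot",
        },
    ]
```

The resulting list would be passed as the Parameters argument together with the name of a custom parameter group, since default parameter groups cannot be modified.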

File Storage (EFS)

Purpose

Provides shared storage for moodledata, which must be accessible by all EC2 instances. Local disk storage cannot be used in a stateless architecture.

Steps
  • Create an EFS file system in the same region and VPC.
  • Enable Mount Targets in at least two Availability Zones in the private subnets.
  • Configure security group rules to allow NFS (TCP 2049) from all EC2 instances.
  • On each EC2 instance (stable and Spot):
      • Install the EFS utilities.
      • Mount EFS to /var/www/moodledata (or your chosen location).
      • Set correct ownership and permissions:
          • Typical: www-data:www-data, u+rwX, g+rwX.

Performance Notes
  • Use General Purpose mode for most Moodle deployments.
  • Use Elastic Throughput unless running a very large site.
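
The mount step in this section can be made persistent with an /etc/fstab entry that uses the EFS mount helper from the EFS utilities. The sketch below only builds that line. The function name efs_fstab_entry and the file system ID used in the usage note are placeholders, and the tls option (encryption in transit) is an assumption about what most deployments want.

```python
def efs_fstab_entry(file_system_id: str,
                    mount_point: str = "/var/www/moodledata") -> str:
    """Return an /etc/fstab line that mounts EFS through the efs mount helper."""
    if not file_system_id.startswith("fs-"):
        raise ValueError("EFS file system IDs start with 'fs-'")
    # _netdev delays the mount until networking is up; tls encrypts NFS traffic.
    return f"{file_system_id}:/ {mount_point} efs _netdev,tls 0 0"
```

For example, efs_fstab_entry("fs-0123456789abcdef0") yields a line that a user data script could append to /etc/fstab before running mount -a.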

Caching (ElastiCache Redis)

Purpose

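Provides fast shared cache storage for the Moodle Universal Cache (MUC) application and session caches and, optionally, for PHP session data. Request-level caches live in PHP memory for the duration of a single request and do not need a shared store.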

Steps
  • Create an ElastiCache Redis cluster (cluster mode disabled is fine for most Moodle installs).
  • Place the cluster in private subnets.
  • Allow inbound traffic from the EC2 instances.
  • Configure Moodle to use Redis for:
      • Application cache
      • Session cache (optional, but recommended)

Notes
  • Do not use local file caching or local sessions on EC2 instances.
  • If using Redis for sessions, configure a replication group with automatic failover.

2. Configure the Stable Instance (Admin/Cron)

Role of the Stable Instance

This EC2 instance is not part of the Spot fleet. It runs on On-Demand pricing, so AWS does not reclaim it the way it reclaims Spot capacity. It performs all critical background processing:

  • Moodle cron
  • Scheduled tasks
  • Ad-hoc tasks such as course backups, grading operations, and import/export tasks
  • CLI maintenance actions
  • Long-running administrative operations

It effectively acts as the control plane of the Moodle application.

Configuration Requirements

  • Instance Type: Choose an instance with enough CPU and memory to handle all background tasks. Suggested minimum: t3.medium or t3.large.
  • Mount EFS: Mount the same EFS file system used by the Spot fleet.
  • Networking:
      • Instance must be in a private subnet.
      • Ensure outbound internet access through a NAT Gateway.
      • Security group must allow:
          • Outbound access for email and updates.
          • Inbound SSH from trusted locations.
  • Cron Setup: Add the following to the www-data crontab, so that cron runs as the web server user and files it creates in moodledata keep the correct ownership:

    * * * * * /usr/bin/php /var/www/html/admin/cli/cron.php >/dev/null 2>&1

  • Web Traffic: The stable instance should not serve normal web traffic. It does not need to be in the ALB Target Group unless you want redundancy for admin access.

3. Configure the Spot Fleet (Web Servers)

These EC2 instances serve all user-facing traffic and can scale horizontally. They can be terminated at any time by AWS, so they must be fully stateless.

Auto Scaling Group (ASG)

Steps
  • Create an ASG in at least two Availability Zones.
  • Configure Mixed Instances Policy:
      • Majority Spot Instances
      • One or two On-Demand instances (optional) as capacity floor
  • Enable Instance Type Diversification:
      • Include 3 to 6 instance types across t, m, and c families.
  • Enable Capacity Rebalancing so AWS can replace at-risk Spot instances proactively.
  • Set the desired capacity, minimum, and maximum:
      • Example: min 2, desired 4, max 8.
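
The ASG steps above map fairly directly onto the request that the Auto Scaling CreateAutoScalingGroup API takes (for example via boto3's create_auto_scaling_group). The following is a minimal sketch that only assembles that request as plain data. The function names, the launch template name, the subnet IDs, and the price-capacity-optimized allocation strategy are illustrative assumptions, not settings taken from this guide.

```python
def mixed_instances_policy(launch_template_name: str,
                           instance_types: list[str],
                           on_demand_base: int = 1) -> dict:
    """Build a MixedInstancesPolicy: an On-Demand floor, the rest on Spot."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type gives the ASG room to diversify.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            # 0 means everything above the base capacity runs on Spot.
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


def asg_request(name: str, subnet_ids: list[str], policy: dict,
                minimum: int = 2, desired: int = 4, maximum: int = 8) -> dict:
    """Assemble keyword arguments for a CreateAutoScalingGroup call."""
    if not minimum <= desired <= maximum:
        raise ValueError("expected minimum <= desired <= maximum")
    return {
        "AutoScalingGroupName": name,
        "MinSize": minimum,
        "DesiredCapacity": desired,
        "MaxSize": maximum,
        "CapacityRebalance": True,
        # Subnets in different Availability Zones, comma separated.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "MixedInstancesPolicy": policy,
    }
```

Keeping the request as a dictionary makes it easy to review or unit test before anything is sent to AWS.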

Launch Template Configuration

The launch template defines how each web server is set up.

Required items
  • AMI (Ubuntu 24.04 or Amazon Linux 2023 recommended)
  • User Data script that:
      • Installs Apache or Nginx
      • Installs PHP and required extensions
      • Installs AWS EFS utilities
      • Mounts /var/www/moodledata
      • Pulls the Moodle code (Git, EFS, or CodeDeploy)
      • Configures Redis and DB settings via config.php
      • Starts the web server

Critical Requirement

The web servers must never run the Moodle cron. Ensure the user data does not write any cron entries.

4. Configure Load Balancing and Termination Handling

Application Load Balancer (ALB)

Steps
  • Create an ALB in public subnets.
  • Attach AWS ACM certificates for HTTPS.
  • Create a Target Group using HTTP or HTTPS.
  • Register only the ASG (Spot fleet) instances.
  • Configure health checks:
      • Path: /login/index.php or a custom health check script
      • Interval: 30 seconds
      • Healthy threshold: 2
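
For reference, the Target Group and health check values listed above could be collected into the arguments of an Elastic Load Balancing CreateTargetGroup call (for example boto3's elbv2 create_target_group). This is only a sketch: the function name, the target group name, the VPC ID, and the choice of plain HTTP on port 80 between the ALB and the web servers are assumptions.

```python
def target_group_request(name: str, vpc_id: str,
                         health_path: str = "/login/index.php") -> dict:
    """Assemble keyword arguments for a CreateTargetGroup call."""
    if not health_path.startswith("/"):
        raise ValueError("health check path must start with '/'")
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": 80,
        "VpcId": vpc_id,
        "TargetType": "instance",
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": health_path,
        "HealthCheckIntervalSeconds": 30,
        "HealthyThresholdCount": 2,
    }
```

With a 30 second interval and a healthy threshold of 2, a freshly launched web server starts receiving traffic roughly a minute after it first answers the health check.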

Connection Draining (Deregistration Delay)

Connection draining, which ALB Target Groups expose as the Deregistration Delay setting, protects users when AWS terminates a Spot instance.

Steps
  • Open the Target Group settings in the AWS Console.
  • Set Deregistration Delay to 60 to 120 seconds.
  • No separate connection termination setting is needed: connection termination on deregistration is a Network Load Balancer attribute and does not apply to ALB Target Groups.

How it Works
  • AWS gives a 2-minute warning before reclaiming a Spot instance.
  • The instance receives the termination notice via instance metadata.
  • The ASG marks the instance as terminating.
  • The ALB immediately stops routing new requests to that instance.
  • Existing user requests can finish within the configured Deregistration Delay, which is at most 120 seconds and never longer than the time left before the instance is reclaimed.

This avoids most unexpected 5xx errors (typically 502 or 504 responses from the ALB) during page loads.
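
To make the timing concrete, the sketch below parses the kind of JSON document that the instance metadata service returns at /latest/meta-data/spot/instance-action once an interruption is scheduled, and works out how long in-flight requests really have. The function names and the sample timestamps in the usage note are illustrative. Fetching the document from the metadata endpoint is left out so the example stays self-contained.

```python
import json
from datetime import datetime, timezone


def seconds_until_reclaim(instance_action_body: str, now: datetime) -> float:
    """Return the seconds left before the Spot instance is reclaimed.

    instance_action_body looks like:
    {"action": "terminate", "time": "2025-01-01T12:02:00Z"}
    """
    notice = json.loads(instance_action_body)
    reclaim_at = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (reclaim_at - now).total_seconds()


def drain_budget(deregistration_delay: int, seconds_left: float) -> float:
    """In-flight requests get the smaller of the delay and the time left."""
    return max(0.0, min(float(deregistration_delay), seconds_left))
```

If the notice is only noticed 30 seconds after it was issued, a 120 second Deregistration Delay still leaves requests just 90 seconds, which is one reason to keep the delay at or below the 2-minute window.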

5. Moodle Application Configuration

To complete the architecture, configure Moodle so that all stateful components use shared services.

Critical Settings

  • Sessions
      • Enable Database or Redis session storage.
      • Do not use file-based sessions.
      • Path: Site administration > Server > Session handling (Redis sessions are selected in config.php).
  • Asynchronous Backups
      • Set to Enabled.
      • Path: Site administration > Courses > Backups > General backup defaults.
      • Reason: Backups triggered by users on the Spot web servers are queued and run by cron on the Stable instance instead of inside the web request.
  • Universal Cache (MUC)
      • Map the Application and Session cache modes to Redis. The Request cache is held in PHP memory per request and can stay on its default store.
      • Path: Site administration > Plugins > Caching > Configuration.

Config.php Adjustments

Typical minimal adjustments:

$CFG->session_handler_class = '\core\session\redis';
$CFG->session_redis_host = 'redis.example.internal';
$CFG->directorypermissions = 02775;
$CFG->preventfilelocking = true;
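
Because the Spot web servers are built from user data, these lines usually have to be generated rather than edited by hand. A provisioning script could render the session settings along these lines. This is a sketch only: the function name render_session_config is hypothetical, and the port line is an optional addition that restates the Redis default.

```python
def render_session_config(redis_host: str, redis_port: int = 6379) -> str:
    """Render the Redis session lines for Moodle's config.php."""
    # Refuse characters that would break out of the single-quoted PHP string.
    if "'" in redis_host or "\\" in redis_host:
        raise ValueError("unexpected character in Redis host name")
    lines = [
        "$CFG->session_handler_class = '\\core\\session\\redis';",
        f"$CFG->session_redis_host = '{redis_host}';",
        f"$CFG->session_redis_port = {redis_port};",
    ]
    return "\n".join(lines)
```

Calling render_session_config("redis.example.internal") returns three lines that can be written into config.php before the call that loads lib/setup.php.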

Risks and Limitations

  • Uploads longer than 2 minutes may fail if a Spot instance is reclaimed mid-upload, even with connection draining.
  • Cron must remain isolated on the Stable instance. Running cron on Spot instances can cause:
      • race conditions
      • tasks running twice
      • course backups failing
      • cron locks becoming stuck
  • EFS performance limits may apply to very large Moodle sites. For heavy workloads, consider:
      • EFS provisioned throughput
      • Local SSD cache combined with rsync or S3 (advanced)
  • Redis is a single point of failure unless deployed with replication and automatic failover.
  • Spot capacity fluctuations may cause scaling delays during high traffic periods.

Solin specializes in Moodle architecture and performance optimization.
