MySQL Master-Slave Setup & Troubleshooting SOP
Scenario 1: Set Up MySQL Master-Slave Replication
Applicable to: Read replica setup, cross-region database acceleration, analytics query isolation.
Prerequisites
- Master and slave running the same MySQL version (recommend 5.7 or 8.0)
- Network connectivity between master and slave
- Sufficient disk space on slave
Standard Steps
1. Configure master MySQL
vim /etc/my.cnf
Add the following:
[mysqld]
server-id = 1 # Must be unique
log-bin = mysql-bin
binlog_format = ROW
systemctl restart mysqld
2. Configure slave MySQL
vim /etc/my.cnf
Add the following:
[mysqld]
server-id = 2 # Must be unique, different from master
relay-log = relay-bin
read-only = 1 # Recommended for slave
systemctl restart mysqld
3. Create replication user on master
CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
FLUSH PRIVILEGES;
4. Lock tables and get binlog position
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
-- Record File and Position
Important: Do not close this session or stop MySQL before unlocking.
5. Export data from master
# Export all databases (run from a separate shell while the lock from step 4 is held)
mysqldump -u root -p --all-databases > /tmp/all_db.sql
# Export specific databases only
mysqldump -u root -p --databases db1 db2 > /tmp/db.sql
6. Record binlog info and unlock
SHOW MASTER STATUS;
-- Record mysql-bin.000001 and Position
UNLOCK TABLES;
7. Import data on slave
mysql -u root -p < /tmp/all_db.sql
8. Establish replication on slave
CHANGE MASTER TO
MASTER_HOST = 'master_ip',
MASTER_USER = 'repl',
MASTER_PASSWORD = 'password',
MASTER_LOG_FILE = 'mysql-bin.000001',
MASTER_LOG_POS = 1234;
START SLAVE;
SHOW SLAVE STATUS\G
9. Verify replication
SHOW SLAVE STATUS\G
Check that both Slave_IO_Running and Slave_SQL_Running are Yes.
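The step 9 health check can also be scripted. A minimal sketch in Python that parses the text output of `SHOW SLAVE STATUS\G` (the sample excerpt below is illustrative):

```python
import re

def replication_healthy(status_text: str) -> bool:
    """Check that both replication threads report Yes in SHOW SLAVE STATUS output."""
    fields = dict(re.findall(r"^\s*(\w+):\s*(.*)$", status_text, re.MULTILINE))
    return (fields.get("Slave_IO_Running") == "Yes"
            and fields.get("Slave_SQL_Running") == "Yes")

# Illustrative excerpt of the \G-formatted output
sample = """\
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
"""
print(replication_healthy(sample))  # True
```

Feed it the captured output of `mysql -e 'SHOW SLAVE STATUS\G'` from a cron job or monitoring agent.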
Scenario 2: Percona XtraBackup Large-Volume Migration
Applicable to: Database sizes too large for mysqldump (hundreds of GB), or when downtime must be minimized.
Why XtraBackup
- Hot backup — no database downtime required
- Supports parallel compression
- Based on InnoDB crash recovery — guarantees data consistency
Environment Setup
Install XtraBackup (CentOS/RHEL 7)
yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
yum install -y percona-xtrabackup-24  # for MySQL 5.6/5.7; MySQL 8.0 requires percona-xtrabackup-80
For Ubuntu/Debian (enable the Percona repository first via percona-release):
apt install -y percona-xtrabackup-24
Database user privileges
GRANT BACKUP_ADMIN ON *.* TO 'backup_user'@'localhost';  -- BACKUP_ADMIN exists in MySQL 8.0+ only
GRANT PROCESS, RELOAD, REPLICATION CLIENT ON *.* TO 'backup_user'@'localhost';
-- Or use root if privileges are sufficient
Backup Steps
1. Full backup + compression
xtrabackup --backup \
--target-dir=/tmp/backup_full \
--compress \
--compress-threads=4 \
--user=root \
--password=password
2. Transfer to target server
scp -r /tmp/backup_full target_server:/tmp/
Restore Steps
1. Decompress (on target server)
xtrabackup --decompress --target-dir=/tmp/backup_full  # requires qpress in PATH
2. Prepare (consistency check)
xtrabackup --prepare --target-dir=/tmp/backup_full
Important: Unprepared backups are inconsistent. Starting MySQL with unprepared files will fail.
3. Restore
⚠️ Before executing, confirm: the target machine IP is correct, the database name is correct, and the backup has been prepared and is usable. In production, take a snapshot or back up the old data directory first.
# Stop MySQL
systemctl stop mysqld
# Clear data directory
rm -rf /var/lib/mysql/*
# Copy back
xtrabackup --copy-back --target-dir=/tmp/backup_full
# Fix permissions
chown -R mysql:mysql /var/lib/mysql
# Start MySQL
systemctl start mysqld
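The prepare state can be verified before wiping the data directory. xtrabackup records its state in the `xtrabackup_checkpoints` file inside the backup directory; after `--prepare` completes, `backup_type` reads `full-prepared`. A defensive sketch in Python (the directory path is whatever `--target-dir` was):

```python
import os

def is_prepared(backup_dir: str) -> bool:
    """Return True only if xtrabackup_checkpoints shows backup_type = full-prepared."""
    path = os.path.join(backup_dir, "xtrabackup_checkpoints")
    if not os.path.exists(path):
        return False
    with open(path) as f:
        for line in f:
            key, _, value = line.partition("=")
            if key.strip() == "backup_type":
                return value.strip() == "full-prepared"
    return False
```

Gate the `rm -rf` / `--copy-back` step on this check returning True.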
Scenario 3: Replica Lag Investigation
Diagnostic Commands
SHOW SLAVE STATUS\G
Key metrics:
| Field | Meaning |
|---|---|
| Slave_IO_Running | IO thread status |
| Slave_SQL_Running | SQL thread status |
| Seconds_Behind_Master | Lag in seconds (0 = caught up; NULL = replication not running) |
| Read_Master_Log_Pos | Master binlog position read |
| Relay_Log_Space | Total relay log size |
Common Causes
| Cause | Symptom | Resolution |
|---|---|---|
| Slave machine underpowered | High CPU/IO | Upgrade slave hardware |
| Large transaction blocking | SQL thread stuck | Split large transactions |
| Network latency | IO thread slow | Check master-slave network |
| Replication stopped | SQL thread Stopped | START SLAVE to retry |
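To separate a replica that is catching up from one falling further behind, sample `Seconds_Behind_Master` a few times at a fixed interval and compare oldest to newest. A hypothetical classifier (labels and logic are illustrative, not part of any tool):

```python
def lag_trend(samples):
    """Classify lag from periodic Seconds_Behind_Master samples (oldest first)."""
    if not samples:
        return "no data"
    if samples[-1] == 0:
        return "caught up"
    if len(samples) >= 2 and samples[-1] > samples[0]:
        return "growing"    # lag increasing: suspect SQL thread or a large transaction
    return "draining"       # lag present but shrinking: replica is catching up

print(lag_trend([120, 90, 45]))   # draining
print(lag_trend([10, 60, 300]))   # growing
```

"Growing" lag points at the causes in the table above; "draining" usually just needs patience.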
Cross-Cloud Database Acceleration
Applicable when: Remote servers need to access a central database with high latency (e.g., overseas servers accessing a domestic database).
Remote server → Cloud Global Accelerator → Domestic source ECS
Configuration steps:
- Create a global acceleration instance in your cloud provider's console
- Configure acceleration region (remote) and source IP (domestic MySQL)
- Obtain the accelerated IP
- Configure security group on source server to allow the accelerator's backend nodes (note: IP group may be dynamic)
- Test:
telnet accelerated_ip 3306
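The same reachability test works without telnet; a small TCP connect check using only the Python standard library:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """TCP connect test, equivalent to `telnet host port` succeeding."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("accelerated_ip", 3306)
```

Run it from the remote server against the accelerated IP to confirm the path end to end.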
Scenario 4: MySQL Memory Usage Investigation
Applicable to: Self-managed MySQL with high memory usage causing performance degradation.
Investigation Steps
1. Confirm MySQL memory usage
ps -eo pid,rss,vsz,comm | grep mysql
top -b -n 1 | grep mysql
2. Buffer Pool usage by database
SELECT
SUBSTRING_INDEX(table_name, '/', 1) AS db,
COUNT(*) * 16 / 1024 AS size_mb
FROM information_schema.innodb_buffer_page_lru
WHERE table_name IS NOT NULL
GROUP BY db
ORDER BY size_mb DESC
LIMIT 20;
3. Buffer Pool total usage
SELECT COUNT(*) * 16 / 1024 AS total_mb
FROM information_schema.innodb_buffer_page_lru;
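The `* 16 / 1024` arithmetic in these queries assumes the default 16 KB InnoDB page size (`innodb_page_size`): each row of `innodb_buffer_page_lru` is one page, so 1024 pages = 16 MB. As a helper, with the page size exposed as a parameter in case it was changed:

```python
def buffer_pool_mb(page_count, page_size_kb=16):
    """Convert an innodb_buffer_page_lru row count to megabytes.
    Assumes the default 16 KB innodb_page_size unless overridden."""
    return page_count * page_size_kb / 1024

print(buffer_pool_mb(1024))  # 16.0  (1024 pages * 16 KB)
```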
4. MySQL memory distribution (non-buffer pool)
SELECT
event_name,
CURRENT_NUMBER_OF_BYTES_USED / 1024 / 1024 AS current_mb,
HIGH_NUMBER_OF_BYTES_USED / 1024 / 1024 AS high_mb
FROM performance_schema.memory_summary_global_by_event_name
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC
LIMIT 20;
5. Active connections
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Max_used_connections';
6. Identify high-memory tables
SELECT
table_schema AS 'db',
table_name AS 'table',
ROUND(data_length / 1024 / 1024, 2) AS 'data_size_mb',
ROUND(index_length / 1024 / 1024, 2) AS 'index_size_mb'
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
ORDER BY data_length DESC
LIMIT 20;
Common Findings
| Scenario | Indicator | Recommendation |
|---|---|---|
| Single db high buffer pool | Normal, frequently accessed | Acceptable |
| Buffer pool near limit | innodb_buffer_pool_size too large | Reduce or upgrade memory |
| Non-buffer pool abnormal | Temp tables / large sorts / big queries | Optimize SQL |
| Many connections | Connection pool leak or unclosed connections | Check application connection pool |
Scenario 5: Cross-Server Database Migration (Comprehensive)
Applicable to: Game server migration, server upgrades, regional data transfers.
Standard Flow
1. Export cross-server data
↓
2. Upload to target machine
↓
3. Create database and import data
↓
4. Migrate game/service directory
↓
5. Modify configuration files
↓
6. Update ops/management database records
↓
7. Batch update related service configurations
↓
8. Restart service
Key Checkpoints
| Step | Check |
|---|---|
| Data export | Stop writes, lock tables or shut down service |
| Data transfer | Verify md5sum |
| Data import | Check error logs |
| Config update | Confirm IP, port, database name |
| Database update | ops_server table, management config |
| Service restart | Verify process and port |
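The md5sum checkpoint from the table above can be scripted on both ends of the transfer; this sketch produces the same digest that `md5sum` prints:

```python
import hashlib

def md5sum(path, chunk=1 << 20):
    """Compute a file's MD5 in 1 MB chunks (safe for multi-GB dump files)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()
```

Compare the hexdigest on source and target before importing; a mismatch means the transfer must be redone.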
Command Cheatsheet
# Check replica status
SHOW SLAVE STATUS\G
# Check master binlog position
SHOW MASTER STATUS;
# Stop/start replica
STOP SLAVE;
START SLAVE;
# Reset replica configuration
RESET SLAVE ALL;
# Check connections
SHOW STATUS LIKE 'Threads_connected';
# Check max connections
SHOW VARIABLES LIKE 'max_connections';
# MySQL buffer pool usage by db
SELECT
SUBSTRING_INDEX(table_name, '/', 1) AS db,
COUNT(*) * 16 / 1024 AS size_mb
FROM information_schema.innodb_buffer_page_lru
WHERE table_name IS NOT NULL
GROUP BY db;
# XtraBackup backup
xtrabackup --backup --target-dir=/tmp/backup --user=root --password=password
# XtraBackup restore
xtrabackup --prepare --target-dir=/tmp/backup
xtrabackup --copy-back --target-dir=/tmp/backup
Notes
- server-id must be unique — master and slave cannot have the same value
- Set read-only=1 on slave — prevents accidental writes
- Use XtraBackup for large volumes — mysqldump may fail or take too long
- High replica lag — check slave machine performance and network first
- Always confirm IP, port, database name before migration — wrong values will break connectivity
- Cross-cloud migration has network latency — consider using a cloud global acceleration service