MySQL Master-Slave Setup & Troubleshooting SOP
Scenario 1: Set Up MySQL Master-Slave Replication
Applicable to: Read replica setup, cross-region database acceleration, analytics query isolation.
Prerequisites
- Master and slave running the same MySQL version (recommend 5.7 or 8.0)
- Network connectivity between master and slave
- Sufficient disk space on slave
Standard Steps
1. Configure master MySQL
vim /etc/my.cnf
Add the following:
[mysqld]
server-id = 1 # Must be unique
log-bin = mysql-bin
binlog_format = ROW
systemctl restart mysqld
2. Configure slave MySQL
vim /etc/my.cnf
Add the following:
[mysqld]
server-id = 2 # Must be unique, different from master
relay-log = relay-bin
read-only = 1 # Recommended for slave
systemctl restart mysqld
3. Create replication user on master
CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
FLUSH PRIVILEGES;
4. Lock tables and get binlog position
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
-- Record File and Position
Important: Do not close this session or stop MySQL before unlocking.
5. Export data from master
# Export all databases (run from a separate shell while the lock from step 4 is held)
mysqldump -u root -p --all-databases > /tmp/all_db.sql
# Export specific databases only
mysqldump -u root -p --databases db1 db2 > /tmp/db.sql
6. Record binlog info and unlock
SHOW MASTER STATUS;
-- Record mysql-bin.000001 and Position
UNLOCK TABLES;
7. Import data on slave
mysql -u root -p < /tmp/all_db.sql
8. Establish replication on slave
CHANGE MASTER TO
MASTER_HOST = 'master_ip',
MASTER_USER = 'repl',
MASTER_PASSWORD = 'password',
MASTER_LOG_FILE = 'mysql-bin.000001',
MASTER_LOG_POS = 1234;
START SLAVE;
SHOW SLAVE STATUS\G
9. Verify replication
SHOW SLAVE STATUS\G
Check that both Slave_IO_Running and Slave_SQL_Running are Yes.
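The step 9 health check can also be scripted. A minimal sketch in Python that parses the text output of `SHOW SLAVE STATUS\G` (the sample excerpt below is illustrative):

```python
import re

def replication_healthy(status_text: str) -> bool:
    """Check that both replication threads report Yes in SHOW SLAVE STATUS output."""
    fields = dict(re.findall(r"^\s*(\w+):\s*(.*)$", status_text, re.MULTILINE))
    return (fields.get("Slave_IO_Running") == "Yes"
            and fields.get("Slave_SQL_Running") == "Yes")

# Illustrative excerpt of the \G-formatted output
sample = """\
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
"""
print(replication_healthy(sample))  # True
```

Feed it the captured output of `mysql -e 'SHOW SLAVE STATUS\G'` from a cron job or monitoring agent.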
Scenario 2: Percona XtraBackup Large-Volume Migration
Applicable to: Database sizes too large for mysqldump (hundreds of GB), or when downtime must be minimized.
Why XtraBackup
- Hot backup — no database downtime required
- Supports parallel compression
- Based on InnoDB crash recovery — guarantees data consistency
Environment Setup
Install XtraBackup (CentOS/RHEL 7)
yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
yum install -y percona-xtrabackup-24  # for MySQL 5.6/5.7; MySQL 8.0 requires percona-xtrabackup-80
For Ubuntu/Debian (enable the Percona repository first via percona-release):
apt install -y percona-xtrabackup-24
Database user privileges
GRANT BACKUP_ADMIN ON *.* TO 'backup_user'@'localhost';  -- BACKUP_ADMIN exists in MySQL 8.0+ only
GRANT PROCESS, RELOAD, REPLICATION CLIENT ON *.* TO 'backup_user'@'localhost';
-- Or use root if privileges are sufficient
Backup Steps
1. Full backup + compression
xtrabackup --backup \
--target-dir=/tmp/backup_full \
--compress \
--compress-threads=4 \
--user=root \
--password=password
2. Transfer to target server
scp -r /tmp/backup_full target_server:/tmp/
Restore Steps
1. Decompress (on target server)
xtrabackup --decompress --target-dir=/tmp/backup_full  # requires qpress in PATH
2. Prepare (consistency check)
xtrabackup --prepare --target-dir=/tmp/backup_full
Important: Unprepared backups are inconsistent. Starting MySQL with unprepared files will fail.
3. Restore
⚠️ Before executing, confirm: the target machine IP is correct, the database name is correct, and the backup has been prepared and is usable. In production, take a snapshot or back up the old data directory first.
# Stop MySQL
systemctl stop mysqld
# Clear data directory
rm -rf /var/lib/mysql/*
# Copy back
xtrabackup --copy-back --target-dir=/tmp/backup_full
# Fix permissions
chown -R mysql:mysql /var/lib/mysql
# Start MySQL
systemctl start mysqld
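The prepare state can be verified before wiping the data directory. xtrabackup records its state in the `xtrabackup_checkpoints` file inside the backup directory; after `--prepare` completes, `backup_type` reads `full-prepared`. A defensive sketch in Python (the directory path is whatever `--target-dir` was):

```python
import os

def is_prepared(backup_dir: str) -> bool:
    """Return True only if xtrabackup_checkpoints shows backup_type = full-prepared."""
    path = os.path.join(backup_dir, "xtrabackup_checkpoints")
    if not os.path.exists(path):
        return False
    with open(path) as f:
        for line in f:
            key, _, value = line.partition("=")
            if key.strip() == "backup_type":
                return value.strip() == "full-prepared"
    return False
```

Gate the `rm -rf` / `--copy-back` step on this check returning True.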
Scenario 3: Replica Lag Investigation
Diagnostic Commands
SHOW SLAVE STATUS\G
Key metrics:
| Field | Meaning |
|---|---|
| Slave_IO_Running | IO thread status |
| Slave_SQL_Running | SQL thread status |
| Seconds_Behind_Master | Lag in seconds (0 = caught up; NULL = replication not running) |
| Read_Master_Log_Pos | Master binlog position read |
| Relay_Log_Space | Total relay log size |
Common Causes
| Cause | Symptom | Resolution |
|---|---|---|
| Slave machine underpowered | High CPU/IO | Upgrade slave hardware |
| Large transaction blocking | SQL thread stuck | Split large transactions |
| Network latency | IO thread slow | Check master-slave network |
| Replication stopped | SQL thread Stopped | START SLAVE to retry |
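To separate a replica that is catching up from one falling further behind, sample `Seconds_Behind_Master` a few times at a fixed interval and compare oldest to newest. A hypothetical classifier (labels and logic are illustrative, not part of any tool):

```python
def lag_trend(samples):
    """Classify lag from periodic Seconds_Behind_Master samples (oldest first)."""
    if not samples:
        return "no data"
    if samples[-1] == 0:
        return "caught up"
    if len(samples) >= 2 and samples[-1] > samples[0]:
        return "growing"    # lag increasing: suspect SQL thread or a large transaction
    return "draining"       # lag present but shrinking: replica is catching up

print(lag_trend([120, 90, 45]))   # draining
print(lag_trend([10, 60, 300]))   # growing
```

"Growing" lag points at the causes in the table above; "draining" usually just needs patience.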
Cross-Cloud Database Acceleration
Applicable when: Remote servers need to access a central database with high latency (e.g., overseas servers accessing a domestic database).
Remote server → Cloud Global Accelerator → Domestic source ECS
Configuration steps:
- Create a global acceleration instance in your cloud provider's console
- Configure acceleration region (remote) and source IP (domestic MySQL)
- Obtain the accelerated IP
- Configure security group on source server to allow the accelerator's backend nodes (note: IP group may be dynamic)
- Test:
telnet accelerated_ip 3306
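The same reachability test works without telnet; a small TCP connect check using only the Python standard library:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """TCP connect test, equivalent to `telnet host port` succeeding."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("accelerated_ip", 3306)
```

Run it from the remote server against the accelerated IP to confirm the path end to end.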
Scenario 4: MySQL Memory Usage Investigation
Applicable to: Self-managed MySQL with high memory usage causing performance degradation.
Investigation Steps
1. Confirm MySQL memory usage
ps -eo pid,rss,vsz,comm | grep mysql
top -b -n 1 | grep mysql
2. Buffer Pool usage by database
SELECT
SUBSTRING_INDEX(table_name, '/', 1) AS db,
COUNT(*) * 16 / 1024 AS size_mb
FROM information_schema.innodb_buffer_page_lru
WHERE table_name IS NOT NULL
GROUP BY db
ORDER BY size_mb DESC
LIMIT 20;
3. Buffer Pool total usage
SELECT COUNT(*) * 16 / 1024 AS total_mb
FROM information_schema.innodb_buffer_page_lru;
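The `* 16 / 1024` arithmetic in these queries assumes the default 16 KB InnoDB page size (`innodb_page_size`): each row of `innodb_buffer_page_lru` is one page, so 1024 pages = 16 MB. As a helper, with the page size exposed as a parameter in case it was changed:

```python
def buffer_pool_mb(page_count, page_size_kb=16):
    """Convert an innodb_buffer_page_lru row count to megabytes.
    Assumes the default 16 KB innodb_page_size unless overridden."""
    return page_count * page_size_kb / 1024

print(buffer_pool_mb(1024))  # 16.0  (1024 pages * 16 KB)
```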
4. MySQL memory distribution (non-buffer pool)
SELECT
event_name,
CURRENT_NUMBER_OF_BYTES_USED / 1024 / 1024 AS current_mb,
HIGH_NUMBER_OF_BYTES_USED / 1024 / 1024 AS high_mb
FROM performance_schema.memory_summary_global_by_event_name
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC
LIMIT 20;
5. Active connections
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Max_used_connections';
6. Identify high-memory tables
SELECT
table_schema AS 'db',
table_name AS 'table',
ROUND(data_length / 1024 / 1024, 2) AS 'data_size_mb',
ROUND(index_length / 1024 / 1024, 2) AS 'index_size_mb'
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
ORDER BY data_length DESC
LIMIT 20;
Common Findings
| Scenario | Indicator | Recommendation |
|---|---|---|
| Single db high buffer pool | Normal, frequently accessed | Acceptable |
| Buffer pool near limit | innodb_buffer_pool_size too large | Reduce or upgrade memory |
| Non-buffer pool abnormal | Temp tables / large sorts / big queries | Optimize SQL |
| Many connections | Connection pool leak or unclosed connections | Check application connection pool |
Scenario 5: Cross-Server Database Migration (Comprehensive)
Applicable to: Game server migration, server upgrades, regional data transfers.
Standard Flow
1. Export cross-server data
↓
2. Upload to target machine
↓
3. Create database and import data
↓
4. Migrate game/service directory
↓
5. Modify configuration files
↓
6. Update ops/management database records
↓
7. Batch update related service configurations
↓
8. Restart service
Key Checkpoints
| Step | Check |
|---|---|
| Data export | Stop writes, lock tables or shut down service |
| Data transfer | Verify md5sum |
| Data import | Check error logs |
| Config update | Confirm IP, port, database name |
| Database update | ops_server table, management config |
| Service restart | Verify process and port |
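The md5sum checkpoint from the table above can be scripted on both ends of the transfer; this sketch produces the same digest that `md5sum` prints:

```python
import hashlib

def md5sum(path, chunk=1 << 20):
    """Compute a file's MD5 in 1 MB chunks (safe for multi-GB dump files)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()
```

Compare the hexdigest on source and target before importing; a mismatch means the transfer must be redone.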
Command Cheatsheet
# Check replica status
SHOW SLAVE STATUS\G
# Check master binlog position
SHOW MASTER STATUS;
# Stop/start replica
STOP SLAVE;
START SLAVE;
# Reset replica configuration
RESET SLAVE ALL;
# Check connections
SHOW STATUS LIKE 'Threads_connected';
# Check max connections
SHOW VARIABLES LIKE 'max_connections';
# MySQL buffer pool usage by db
SELECT
SUBSTRING_INDEX(table_name, '/', 1) AS db,
COUNT(*) * 16 / 1024 AS size_mb
FROM information_schema.innodb_buffer_page_lru
WHERE table_name IS NOT NULL
GROUP BY db;
# XtraBackup backup
xtrabackup --backup --target-dir=/tmp/backup --user=root --password=password
# XtraBackup restore
xtrabackup --prepare --target-dir=/tmp/backup
xtrabackup --copy-back --target-dir=/tmp/backup
Notes
- server-id must be unique — master and slave cannot have the same value
- Set read-only=1 on slave — prevents accidental writes
- Use XtraBackup for large volumes — mysqldump may fail or take too long
- High replica lag — check slave machine performance and network first
- Always confirm IP, port, database name before migration — wrong values will break connectivity
- Cross-cloud migration has network latency — consider using a cloud global acceleration service