Publish: 2022-03-30 | Modify: 2022-03-30
imgurl.org is a image hosting website operated by xiaoz
since December 2017, hereinafter referred to as ImgURL
. During its operation, ImgURL has undergone several migrations, but at that time the data was not large, so it was not difficult. As time went on, the amount of data increased, and as of March 29, 2022, there were already over 1,176,457 images.
Currently, the server's disk IO pressure is relatively high, mainly from MySQL read and write pressure and image processing pressure (such as image cropping, compression, etc.). At this stage, ImgURL is hosted on Psychz, but due to the poor IO performance of Psychz's mechanical hard drive, it is planned to migrate from Psychz's dedicated server to Kimsufi's dedicated server. This article is a record and sharing of the migration process.
ImgURL mainly consists of three parts: program (PHP), database (MySQL), and external storage. The external data is divided into four parts:
Note: All local, FTP, and minio data are on Psychz.
Among them, the FTP and minio data are the largest:
Although the MySQL data has reached millions of rows, the overall database is not very large, and it is basically a standard import and export operation.
First, export the MySQL database:
mysqldump -uxxx -pxxx imgurl>imgurl.sql
The entire SQL file exported is only 525MB. Then use the scp
command to copy it for migration (omitted here).
Next is to import the database:
mysql -u imgurl -p imgurl<imgurl.sql
The database migration step is completed quickly.
Note: It is not recommended to use the MySQL source
command for import, because the source
command will print every line of operation, and printing millions of data lines one by one will drive you crazy.
The rsync
command is used to migrate the website:
rsync -aqpogt -e 'ssh -p xxx' root@IP:/xxx /xxx
The meanings of the parameters are as follows:
-e 'ssh -p xxx'
: specify the SSH port because I used a non-standard SSH port (22).Since this part of the website program is not very large, it didn't take too long.
Since the FTP data has reached 188GB, it is not considered large, but there are a lot of small files. This time, the rsync
command is still used to migrate the FTP data. However, before the migration, it is better to use the screen
command to keep the task running in the background to avoid interruption of the window task due to long time.
# Create a new screen session
screen -S xxx
# Use the rsync command to migrate
rsync -aqpogt -e 'ssh -p xxx' root@IP:/xxx /xxx
The important thing here is to run the task in the background using the screen command or other methods, after all, there are many small files to avoid interruption. The whole process took several hours, and I forgot how long exactly.
xiaoz uses the single-node version of minio, not the minio cluster. It seems that the data of the single-node version is directly stored on the disk as the source file, so the simplest way is to synchronize the source files directly using the rsync
command, but it is not clear whether there are any unknown risks in doing so. Therefore, another relatively safe approach is used, using the rclone sync
command to synchronize.
I have already configured the minio information on both sides in advance on rclone (omitted steps), named psychz_s3
and kimsufi_s3
. At first, I used the command to migrate the data in the 2021
bucket:
rclone sync -P psychz_s3:/imgurl/imgs/2021 kimsufi_s3:/imgurl/imgs/2021
It took 13 hours and there were many errors, with the error message:
The Content-Md5 you specified did not match what we received.
The number of error files reached 55,269, which means that 55,269 files failed to migrate due to failed MD5 verification. The reason for this error should be that I directly modified the source files of minio using non-standard operations before (the source files of the minio single-node version can be directly viewed and modified, and I directly compressed these image source files before, causing the MD5 to change). Due to the large number of files, combined with the poor IO of Psychz, rclone spent a lot of time scanning.
Then continue to migrate the remaining data. This time I became smarter and optimized the command as follows:
rclone sync --s3-upload-cutoff 0 --tpslimit 10 --ignore-checksum --size-only -P psychz_s3:/imgurl/imgs/2022/03 kimsufi_s3:/imgurl/imgs/2022/03
--s3-upload-cutoff 0
For small objects that were not uploaded as multipart uploads (the object size is as follows if using rclone to upload with --s3-upload-cutoff), rclone uses the ETag header as the MD5 checksum.
However, for objects uploaded as multipart uploads or with server-side encryption (SSE-AWS or SSE-C), the ETag header is no longer the MD5 sum of the data. Therefore, rclone adds an additional metadata X-Amz-Meta-Md5chksum, which is a base64-encoded MD5 hash (the same format as required by Content-MD5).
For large objects, calculating this hash may take some time, so you can disable adding this hash with --s3-disable-checksum. This means that these objects do not have an MD5 checksum.
Note that reading it from the object requires an additional HEAD request since the metadata is not returned in the object listing.
After reading the official description, I still don't quite understand the specific meaning of the --s3-upload-cutoff
parameter.
--tpslimit 10
The effect of this parameter is described by the official as follows:
This limits the transfer rate in transactions per second. It can be useful to limit the rate of transactions made to the S3 backend.
--ignore-checksum
: Ignore hash verification (MD5). This parameter is useful for a large number of small files, but it may reduce reliability.--size-only
: Only verify the size of the file.After monitoring and observation, the effect of the command before and after optimization is significant, and the system load also decreases linearly (because the scanning should be reduced).
With 154GB of data and 670,000 objects, it took more than 30 hours to synchronize the minio data using rclone sync
. There are probably many other optimization parameters for rclone. Interested friends can study the rclone official documentation.
As of March 30, 2022, https://imgurl.org/ has been successfully migrated from Psychz to Kimsufi. The whole process was not difficult, but the rclone sync
synchronization of minio data took too much time. The summary is as follows:
source
command of MySQL for import. It is recommended to use the following method: mysql -u imgurl -p imgurl<imgurl.sql
.screen
command or similar methods to keep the task running in the background to avoid interruption of the task due to the window being closed.rclone sync
can improve efficiency through a series of optimization parameters, such as ignoring hash verification --ignore-checksum
.rsync
, which is simple and efficient.I come from China and I am a freelancer. I specialize in Linux operations, PHP, Golang, and front-end development. I have developed open-source projects such as Zdir, ImgURL, CCAA, and OneNav.