Friday, December 30, 2016

AWS S3 Data transfer - Kill the Colo ;-)

Transferring data from your premises or Colo infrastructure to the AWS Cloud is no longer as difficult as it used to be ;-)

Besides dedicated links and physical transport of files, Amazon provides a pure internet solution (S3 Transfer Acceleration) to transfer files to Amazon Simple Storage Service (S3), which might be enough for your needs. Here I describe my experience using this method manually (no scripting this time), which should be enough when, for example, you are moving those big on-premises or Colo-hosted files to the cloud.

Start by running a test to see how much faster this method will be compared with a direct upload: http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html

I had to transfer 1 TB of data to S3 from Windows 2008 servers. Here is how I did it.

To transfer your files with S3 Transfer Acceleration, select your bucket in the AWS console | Properties | Transfer Acceleration | Enable. You then get the accelerated endpoint, which works just like the regular endpoint; for example mys3bucketname.s3-accelerate.amazonaws.com.
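As a small illustration of the endpoint naming shown above, here is a minimal Python sketch that builds the accelerated hostname for a bucket. The helper name accelerate_endpoint is hypothetical, and the name check is a simplified assumption based on the Transfer Acceleration requirement that bucket names be DNS-compliant and contain no periods; it is not a full validation of S3 naming rules.

```python
import re

# Suffix taken from the example endpoint in the text.
ACCELERATE_SUFFIX = "s3-accelerate.amazonaws.com"

# Simplified check (an assumption, not the complete S3 rule set):
# 3-63 characters, lowercase letters, digits and hyphens, no periods.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def accelerate_endpoint(bucket: str) -> str:
    """Return the accelerated endpoint hostname for a bucket (hypothetical helper)."""
    if not _BUCKET_RE.fullmatch(bucket):
        raise ValueError(
            f"bucket name not usable with Transfer Acceleration: {bucket!r}"
        )
    return f"{bucket}.{ACCELERATE_SUFFIX}"


print(accelerate_endpoint("mys3bucketname"))
# prints: mys3bucketname.s3-accelerate.amazonaws.com
```

A bucket such as my.bucket would be rejected by this check, because a name with periods cannot be used with the accelerated endpoint.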

You can use the AWS CLI on Windows just as well as on *nix systems ( http://docs.aws.amazon.com/cli/latest/userguide/installing.html#install-msi-on-windows ) to upload or download files.

See below for an example that shows how I configured the AWS CLI, listed what I had in S3, set the AWS CLI to use the accelerated endpoint and finally copied the data to the bucket.
C:\> aws configure
AWS Access Key ID [****************HTAQ]:
AWS Secret Access Key [****************J+aE]:
Default region name [None]: us-east-1
Default output format [None]: json

C:\> aws s3 ls
2016-12-24 06:25:01 mys3bucket1
2016-12-03 15:15:37 mys3bucket2

C:\> aws configure set default.s3.use_accelerate_endpoint true

C:\> aws s3 sync E:\BigData\ s3://mys3bucket2

I got 12MB/sec versus 1.2MB/sec at regular speed, and I was able to transfer 1 TB in around 16 hours. The good thing is that the command behaves like rsync: after that first run, only new or changed files are transferred (a changed file is uploaded again in full). This is good news when you are planning to move infrastructure to the cloud, as it minimizes the window of possible business disruption. To finish the move, the same sync is run in the opposite direction on the EC2-hosted Windows server, with the bucket as the source and a local drive as the destination.
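As a rough sanity check on the transfer figures quoted above, the following Python sketch converts a sustained rate into hours per terabyte. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), exactly 1 TB of data and a constant rate, none of which the post states explicitly; the function name transfer_hours is made up for this example.

```python
TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte


def transfer_hours(size_bytes: float, rate_bytes_per_sec: float) -> float:
    """Hours needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_sec / 3600


accelerated = transfer_hours(1 * TB, 12 * MB)
regular = transfer_hours(1 * TB, 1.2 * MB)
# Average rate implied by moving 1 TB in 16 hours.
implied_rate = 1 * TB / (16 * 3600) / MB

print(f"accelerated: {accelerated:.1f} h")                      # accelerated: 23.1 h
print(f"regular: {regular:.1f} h ({regular / 24:.1f} days)")    # regular: 231.5 h (9.6 days)
print(f"implied by 16 h: {implied_rate:.1f} MB/sec")            # implied by 16 h: 17.4 MB/sec
```

Under these assumptions a sustained 12MB/sec works out to about 23 hours per TB, so the 16 hours reported would correspond to an average nearer 17MB/sec. The quoted rates are therefore best read as approximate, while the tenfold gap between accelerated and regular speed is what matters for planning.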

C:\> aws configure set default.s3.use_accelerate_endpoint true

C:\> aws s3 sync s3://mys3bucket2 D:\SmallToBeBigDATA\

WARNING: S3 Transfer Acceleration has an associated cost. All I can say is: it is worth it. It cost us $8.28 to transfer more than 1 TB of data from an on-premises Windows Server to an EC2-hosted Windows Server via S3.
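Although everything in this post was done by hand, the same two commands could be wrapped in a small script for repeated runs. The Python sketch below is only one possible way to do that: the function names are invented for illustration, it assumes the aws executable is on the PATH, and it defaults to the CLI's --dryrun flag so that a sync is previewed before any data (and cost) is incurred.

```python
import shutil
import subprocess
from typing import List


def accelerate_cmd() -> List[str]:
    """Command that switches the AWS CLI to the accelerated endpoint."""
    return ["aws", "configure", "set", "default.s3.use_accelerate_endpoint", "true"]


def sync_cmd(source: str, destination: str, dry_run: bool = True) -> List[str]:
    """Build an 'aws s3 sync' command; --dryrun only lists what would be copied."""
    cmd = ["aws", "s3", "sync", source, destination]
    if dry_run:
        cmd.append("--dryrun")
    return cmd


def run_transfer(source: str, destination: str, dry_run: bool = True) -> None:
    """Run both steps in order, stopping at the first failure (hypothetical helper)."""
    aws = shutil.which("aws")  # resolves aws.exe or aws.cmd on Windows
    if aws is None:
        raise FileNotFoundError("aws CLI not found on PATH")
    for cmd in (accelerate_cmd(), sync_cmd(source, destination, dry_run)):
        subprocess.run([aws] + cmd[1:], check=True)


# Example (not executed here): preview the upload from the post.
# run_transfer(r"E:\BigData", "s3://mys3bucket2")
```

Calling run_transfer with dry_run=False would perform the actual copy; swapping the two arguments gives the download direction used on the EC2 side.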
