
AWS DataSync overview

AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data between on-premises storage systems and AWS Storage services and between AWS Storage services. 

For example, DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB) file servers, self-managed object storage, Amazon S3 buckets, Amazon EFS file systems, and Amazon FSx file systems.

The diagram above depicts the typical architecture of AWS DataSync.

How it works:

1) DataSync service: The service in the AWS cloud that manages, schedules, and tracks data sync tasks.

2) DataSync agent: A virtual appliance, deployed on-premises or in the cloud, with the compute capacity to run scheduled copies, upload data, and maintain the metadata needed for full and incremental data transfers.
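To make the full versus incremental idea concrete, here is a minimal sketch of how a copy plan could be derived by comparing file metadata. It is only an illustration: the function name, the (size, modified time) record, and the sample data are all hypothetical, and this is not how the DataSync agent is actually implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical metadata record: (size_in_bytes, modified_time_epoch_seconds)
FileMeta = Tuple[int, float]


def plan_incremental_copy(
    source: Dict[str, FileMeta], destination: Dict[str, FileMeta]
) -> List[str]:
    """Return the paths that are new or changed at the source."""
    to_copy = []
    for path, meta in sorted(source.items()):
        # Copy when the file is missing at the destination
        # or when its recorded metadata differs.
        if destination.get(path) != meta:
            to_copy.append(path)
    return to_copy


# Illustrative data only.
source = {"a.txt": (10, 100.0), "b.txt": (20, 200.0), "c.txt": (5, 300.0)}
destination = {"a.txt": (10, 100.0), "b.txt": (20, 150.0)}

print(plan_incremental_copy(source, destination))  # ['b.txt', 'c.txt']
```

With an empty destination the same function returns every source path, which corresponds to a full transfer.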


Advantages:

a) Cost-effective solution for Data Sync tasks (the service is charged per GB of data copied)

b) Well suited to rapid deployment with no changes to the existing infrastructure.

c) Secure transport between source and destination

d) Granular Data Sync schedules, from hourly to daily or weekly

e) Full and Incremental Data Sync Support

f) Data verification is supported at various stages.

g) Logs and events can be integrated with AWS monitoring services such as Amazon CloudWatch.
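Several of these points (scheduling, full or incremental transfer, verification, logging) correspond to settings on a DataSync task. The sketch below builds the relevant keyword arguments for the CreateTask API as exposed by boto3. The helper name and its defaults are hypothetical; the option values are the ones I understand the API to accept, so check them against the current API reference.

```python
from typing import Any, Dict


def build_task_settings(
    schedule_expression: str, incremental: bool = True
) -> Dict[str, Any]:
    """Build schedule and option arguments for a DataSync CreateTask call."""
    return {
        # Rate or cron expression, for example "rate(1 hour)".
        "Schedule": {"ScheduleExpression": schedule_expression},
        "Options": {
            # CHANGED copies only data that differs; ALL copies everything.
            "TransferMode": "CHANGED" if incremental else "ALL",
            # Verify only the files that were transferred in this run.
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
            # TRANSFER logs every transferred file to CloudWatch Logs
            # (the task then also needs a CloudWatchLogGroupArn).
            "LogLevel": "TRANSFER",
        },
    }


settings = build_task_settings("rate(1 hour)")
print(settings["Options"]["TransferMode"])  # CHANGED
```

The resulting dictionary can be unpacked into a create_task call alongside the source and destination location ARNs.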


Disadvantages:

a) Multiple appliances may be required, depending on the number of files.

b) Multiple tasks may need to be created, depending on the directory depth and the number of files.

c) No clear way to track sync or scan progress at the source and destination.

d) Multi-VLAN is not supported (new firewall rules might be required to reach various shares).

e) One-way sync only.

f) No local console.

g) Copying many small files to S3 or FSx can be slow and expensive.

h) Cross-service replication may require attention to object permissions at the destination.
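The small-files point (g) can be illustrated with some rough arithmetic. The sketch below assumes a simplified cost model: a per-GB DataSync fee plus one S3 PUT request per file. The function name and both prices are placeholders, not current AWS pricing, and a real transfer also issues other requests that this model ignores.

```python
def estimate_copy_cost(
    total_gb: float,
    file_count: int,
    price_per_gb: float,
    price_per_1000_puts: float,
) -> float:
    """Estimate transfer cost under a simplified, hypothetical model."""
    transfer_fee = total_gb * price_per_gb
    # One PUT request per file; request charges grow with the file count.
    request_fee = (file_count / 1000) * price_per_1000_puts
    return transfer_fee + request_fee


# Placeholder prices for illustration only.
PRICE_PER_GB = 0.0125
PRICE_PER_1000_PUTS = 0.005

# The same 100 GB, stored as 100 large files or 10 million small files.
large_files = estimate_copy_cost(100, 100, PRICE_PER_GB, PRICE_PER_1000_PUTS)
small_files = estimate_copy_cost(100, 10_000_000, PRICE_PER_GB, PRICE_PER_1000_PUTS)

print(f"large files: {large_files:.2f}, small files: {small_files:.2f}")
```

Under these assumptions the request charges dominate once the file count runs into the millions, even though the volume of data is unchanged.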

Testing in the lab:

This lab syncs a small data set from an on-premises SMB share to an S3 target over the public internet.


Step 1: In the AWS console, open the DataSync service and download the agent image for your hypervisor.

Due to an unknown issue, the downloaded Hyper-V image did not work, so the test used the VMware image.

Step 2: Power on the on-premises appliance and note its IP address.


Step 3: Activate the appliance in the AWS console.
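The lab used the console for activation, but the same step can be scripted. This is a minimal sketch, assuming the agent answers an HTTP request on port 80 with an activation key (the URL format is the one I understand the AWS documentation to describe) and that `client` is a boto3 DataSync client. The helper names and the sample IP address, region, and agent name are hypothetical.

```python
from typing import Any
from urllib.parse import urlencode


def activation_url(agent_ip: str, region: str) -> str:
    """Build the URL that asks the agent for an activation key."""
    query = urlencode({"gatewayType": "SYNC", "activationRegion": region})
    # no_redirect asks the agent to return the key in the response body.
    return f"http://{agent_ip}/?{query}&no_redirect"


def register_agent(client: Any, activation_key: str, name: str) -> str:
    """Register the agent with the DataSync service and return its ARN."""
    response = client.create_agent(ActivationKey=activation_key, AgentName=name)
    return response["AgentArn"]


# Hypothetical address and region, for illustration only.
print(activation_url("192.0.2.10", "us-east-1"))
```

In practice the key would be fetched from that URL from a machine that can reach the appliance, and then passed to register_agent with a client created by boto3.client("datasync").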


Step 4: Create a task by selecting the source and target.
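For reference, the console steps for an SMB source and an S3 target map roughly onto three boto3 DataSync calls. The sketch below assumes `client` is a boto3 DataSync client; the helper name and the dictionary keys (`host`, `share`, `bucket_arn`, and so on) are hypothetical, and the IAM role that grants DataSync access to the bucket is assumed to exist already.

```python
from typing import Any, Dict


def create_smb_to_s3_task(
    client: Any,
    agent_arn: str,
    smb: Dict[str, str],
    s3: Dict[str, str],
    name: str,
) -> str:
    """Create source and destination locations, then the task; return its ARN."""
    # Source: the on-premises SMB share, reached through the agent.
    source = client.create_location_smb(
        ServerHostname=smb["host"],
        Subdirectory=smb["share"],
        User=smb["user"],
        Password=smb["password"],
        AgentArns=[agent_arn],
    )
    # Destination: an S3 bucket, accessed with an IAM role.
    destination = client.create_location_s3(
        S3BucketArn=s3["bucket_arn"],
        Subdirectory=s3["prefix"],
        S3Config={"BucketAccessRoleArn": s3["role_arn"]},
    )
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name=name,
    )
    return task["TaskArn"]
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real account.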





Step 5: Verify the DataSync task.

(Screenshots: task status, source SMB file samples, performance metrics, target status)
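The task status shown in the console can also be polled from a script. This is a minimal sketch, assuming `client` is a boto3 DataSync client and that SUCCESS and ERROR are the terminal states of a task execution. The helper name, the polling interval, and the poll limit are hypothetical choices.

```python
import time
from typing import Any, Dict

# Assumed terminal states of a DataSync task execution.
TERMINAL_STATES = {"SUCCESS", "ERROR"}


def wait_for_execution(
    client: Any,
    task_execution_arn: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> Dict[str, Any]:
    """Poll a task execution until it finishes and return its final details."""
    for _ in range(max_polls):
        detail = client.describe_task_execution(
            TaskExecutionArn=task_execution_arn
        )
        if detail["Status"] in TERMINAL_STATES:
            return detail
        # Still queued, preparing, transferring, or verifying; wait and retry.
        time.sleep(poll_seconds)
    raise TimeoutError(f"{task_execution_arn} did not finish in time")
```

The execution ARN comes from client.start_task_execution(TaskArn=...)["TaskExecutionArn"]. This only reports task-level status, so it does not remove the limitation noted earlier about tracking scan progress at the source and destination.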










