AWS DataSync overview

AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data between on-premises storage systems and AWS storage services, as well as between AWS storage services.

For example, DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB) file servers, self-managed object storage, Amazon S3 buckets, Amazon EFS file systems, and Amazon FSx file systems.

The diagram above depicts the typical architecture of the AWS DataSync service.

How it works:

1) DataSync service: The service component in the AWS cloud, which manages, schedules, and tracks DataSync tasks.

2) DataSync agent: A virtual appliance, deployed on premises or in the cloud, that provides the compute to run scheduled copies, upload data, and maintain the metadata needed for full and incremental transfers.
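As a sketch of how these two pieces fit together, the deployed agent and its tasks can be inspected with the AWS CLI; the ARN below is a placeholder, not a real resource:

```shell
# List the activated DataSync agents in the current region.
aws datasync list-agents

# List the configured tasks and their current status.
aws datasync list-tasks

# Inspect a single agent (placeholder ARN -- substitute your own).
aws datasync describe-agent \
    --agent-arn arn:aws:datasync:us-east-1:111122223333:agent/agent-0example
```

These commands require working AWS credentials with DataSync permissions.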


Advantages:

a) Cost-effective solution for data sync tasks (the service is charged only per GB of data transferred).

b) Well suited for rapid deployments, with no changes required to existing infrastructure.

c) Secure transport between source and destination.

d) Granular sync schedules, from minutes to days.

e) Support for both full and incremental transfers.

f) Data verification supported at various stages.

g) Logs and events can be integrated with AWS monitoring services such as Amazon CloudWatch.
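Since DataSync billing is per GB transferred, a rough cost estimate is simple arithmetic. The sketch below uses an example rate of $0.0125/GB, which is an assumption here; actual per-GB pricing varies by region, so check the AWS pricing page before relying on the figure:

```python
# Rough DataSync transfer cost estimator.
# PRICE_PER_GB is an assumed example rate, not an authoritative price.
PRICE_PER_GB = 0.0125  # USD per GB, example figure only


def estimate_transfer_cost(total_bytes: int, price_per_gb: float = PRICE_PER_GB) -> float:
    """Return the estimated USD cost of transferring total_bytes."""
    gb = total_bytes / (1024 ** 3)
    return round(gb * price_per_gb, 2)


# Example: a 500 GiB initial full sync.
print(estimate_transfer_cost(500 * 1024 ** 3))  # 6.25
```

Incremental syncs only transfer changed data, so subsequent runs would typically cost far less than the initial full copy.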


Disadvantages:

a) Multiple agents (appliances) may be required, depending on the number of files.

b) Multiple tasks may need to be created, depending on directory depth and file count.

c) No visibility into scan or sync progress at the source and destination while a task runs.

d) Multiple VLANs are not supported (new firewall rules might be required to reach the various shares).

e) One-way sync only.

f) No local management console on the agent.

g) It can be expensive and slow when copying many small files to S3 or FSx.

h) Cross-service replication might need attention to object permissions at the destination.

Testing in the lab:

This lab syncs and verifies a small data set from an on-premises SMB share to an S3 target over a public internet connection.


Step 1: In the AWS console, go to the DataSync service and download the agent image for your hypervisor.

Due to an unknown issue, the downloaded Hyper-V image did not work, so the VMware image was used instead.

Step 2: Power on the on-premises appliance and note its IP address.


Step 3: Activate the appliance in the AWS console.
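Activation can also be done from the CLI once you have the activation key from the appliance. The agent name and key below are placeholders:

```shell
# Register (activate) the on-premises appliance as a DataSync agent.
# The activation key comes from the appliance itself; this one is fake.
aws datasync create-agent \
    --agent-name onprem-smb-agent \
    --activation-key AAAAA-BBBBB-CCCCC-DDDDD-EEEEE
```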


Step 4: Create a task by selecting the source and target locations.
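The same step can be sketched with the CLI: create an SMB source location, an S3 destination location, and a task that links them. All hostnames, ARNs, and credentials below are placeholders for illustration:

```shell
# Source: the on-premises SMB share, reached through the agent.
aws datasync create-location-smb \
    --server-hostname smb-server.example.local \
    --subdirectory /share \
    --user smbuser \
    --password 'REPLACE_ME' \
    --agent-arns arn:aws:datasync:us-east-1:111122223333:agent/agent-0example

# Destination: the S3 bucket, accessed via an IAM role DataSync can assume.
aws datasync create-location-s3 \
    --s3-bucket-arn arn:aws:s3:::my-datasync-target \
    --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/datasync-s3-role

# Tie the two locations together as a task.
aws datasync create-task \
    --name smb-to-s3 \
    --source-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-0src \
    --destination-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-0dst
```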





Step 5: Verify the task and the synced data.
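Verification can also be scripted: start an execution of the task and poll its status. The ARNs below are placeholders:

```shell
# Kick off a run of the task (returns a TaskExecutionArn).
aws datasync start-task-execution \
    --task-arn arn:aws:datasync:us-east-1:111122223333:task/task-0example

# Poll the execution for status, bytes transferred, and verification result.
aws datasync describe-task-execution \
    --task-execution-arn arn:aws:datasync:us-east-1:111122223333:task/task-0example/execution/exec-0example
```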

[Screenshots: task status, source SMB file samples, performance metrics, and target status]
