The Aspera FASP high-speed transport platform provides high-performance, secure WAN transport of files, directories, and other large data sets to, from, and between a number of leading third-party cloud storage platforms. The implementation is an enhanced transport stack and virtual file system layer in the Aspera server software that allows direct-to-object-storage transfer over the WAN using the FASP protocol and the native I/O capabilities of the particular third-party file system.
The stack is available in all generally available Aspera server software products and supports interoperable transfer with all generally available Aspera client software.
Aspera continually adds support for new third-party storage platforms as market demand is demonstrated, and as of version 3.4 supports the leading cloud storage platforms, including OpenStack Swift (v1.12) for IBM Cloud and Rackspace, Amazon S3, Windows Azure BLOB, Akamai NetStorage, Google Storage, and Limelight Cloud Storage. This whitepaper reviews the motivation for the platform – the fundamental problem of transporting large data sets to and from cloud environments – details the platform's capabilities, and describes the performance and functionality testing by which each storage platform is verified.
The mainstream "Cloud" storage platforms are "object storage" architectures that derive their design from the early scale-out storage systems developed by the leading web search companies, such as the Hadoop File System (HDFS), the Google File System (GFS), and Amazon Dynamo. The key design principle of these object storage systems is to organize file data and associated metadata – names, permissions, access times, etc. – as an "object" and to store the file data and the metadata referring to it in a decoupled fashion, allowing for extreme scale and throughput.
The file data is stored across distributed commodity storage in redundant copies to achieve reliability, and scale is achieved through a single namespace in which master tables store a hash of an object’s identifiers and references to the copies of its file data on disk, allowing for fast and universal addressing of individual objects across the distributed platform (see Figure 1).
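The decoupled design described above can be sketched in a few lines. This is an illustrative toy model, not any vendor's API: the class and method names (`ObjectStore`, `put`, `get`) are assumptions made for the sketch. Metadata lives in a master table keyed by a hash of the object's identifier, while the file data is held in redundant copies across separate data nodes.

```python
import hashlib

class ObjectStore:
    """Toy model of an object store: metadata and data are decoupled."""

    def __init__(self, num_replicas=3):
        self.master_table = {}   # object-id hash -> (metadata, replica indices)
        self.data_nodes = [dict() for _ in range(num_replicas)]

    def put(self, name, data, metadata):
        # Hash the object's identifier for fast, universal addressing.
        key = hashlib.sha256(name.encode()).hexdigest()
        # Store redundant copies across distributed commodity storage.
        for node in self.data_nodes:
            node[key] = data
        # Metadata is stored separately from the file data it describes.
        self.master_table[key] = (metadata, list(range(len(self.data_nodes))))
        return key

    def get(self, name):
        key = hashlib.sha256(name.encode()).hexdigest()
        metadata, replicas = self.master_table[key]
        # Any surviving replica can serve the data.
        return self.data_nodes[replicas[0]][key], metadata

store = ObjectStore()
store.put("videos/a.mov", b"...", {"owner": "alice", "mtime": 1700000000})
data, meta = store.get("videos/a.mov")
```

Because the master table holds only hashes and references, the namespace stays compact even as data is spread across many commodity disks.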
This approach lends itself extremely well to storage for applications such as indexing for scalable web search, as it allows the application to utilize extremely large data sets, achieve very high aggregate throughput in batch processing, and use inexpensive commodity disks for the underlying storage.
An application uploading or downloading any single item larger than the chunk size (e.g., 64 MB) must divide the object into chunks on upload and reassemble them on download – a process that is itself tedious and that bottlenecks transfer speed in the local area unless done in a highly parallel fashion. For example, with 64 MB chunks, writing a 1 Terabyte file requires dividing it into more than 10,000 chunks, and throughput in practical implementations tops out at less than 100 Mbps per I/O stream. We refer to this as the local area storage bottleneck.
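The arithmetic behind the bottleneck is worth making concrete. The figures below are back-of-the-envelope assumptions based on the numbers in the text (64 MiB chunks, a 1 TiB file, ~100 Mbps per stream), not platform measurements:

```python
# Back-of-the-envelope illustration of the local area storage bottleneck.

CHUNK_MIB = 64
FILE_BYTES = 1 << 40                        # a 1 TiB file
chunk_bytes = CHUNK_MIB * (1 << 20)

# Ceiling division: how many 64 MiB chunks the application must manage.
num_chunks = -(-FILE_BYTES // chunk_bytes)  # 16,384 chunks
print(num_chunks)

# At ~100 Mbps per I/O stream, a single stream needs roughly a day:
seconds = FILE_BYTES * 8 / 100e6
print(round(seconds / 3600, 1))             # ~24.4 hours
```

Hitting even 1 Gbps at 100 Mbps per stream means orchestrating ten or more concurrent streams per transfer, which is exactly the burden the chunked interface pushes onto the application.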
A fundamental solution – IBM Aspera Direct-to-Cloud transport
The Aspera Direct-to-Cloud transport platform is a one-of-a-kind fundamental solution for transfer of file and directory data to, from, and between cloud storage. Built on the FASP transport technology deeply integrated with object storage, it brings all of the characteristics of the Aspera transport platform to cloud storage: maximum speed of transfer for upload to the cloud, download from the cloud, and inter-cloud transfers of files and directories regardless of network distance, in a single transport stream – no parallel streaming required – and support for files and directories up to the maximum size allowed by the storage platform.
Transfer rates adapt automatically to the available network and storage bandwidth through Aspera's patented dynamic rate control, and the aggregate bandwidth of multiple transfers is precisely controllable with Aspera's Vlink technology. The platform addresses the fundamental security concerns around data in the cloud with both over-the-wire and at-rest encryption, and provides privacy in multi-tenant storage environments by authenticating all transfer and browsing operations using native storage credentials. Interrupted transfers automatically restart and resume from the point of interruption. Secure file browsing and transfer are supported with all Aspera clients, including browser, desktop, CLI, and embedded/SDK modes.
Capability details are highlighted below:
- Performance at any distance – Maximum speed single stream transfer, independent of round-trip delay and packet loss (500 ms / 30% packet loss+) up to the I/O limits of the platform.
- Unlimited throughput in scale out – Automatic cluster scale out supports aggregate transfer throughputs for single mass uploads/downloads at 10 Gigabits per second and up, capable of 120 Terabytes transferred per day and more, at any global distance.
- Large file sizes – Support for file and directory sizes in a single transfer session up to the largest object size supported by the particular platform at a default 64 MB multi-part chunk size, e.g., 0.625 TB per single session on AWS S3. (The most recent software versions have a configurable chunk size extending transfers to the largest object size supported by the platform.)
- Large directories of small files – Support for directories containing any number of individual files at high speed, even for very large numbers of very small files (100 Mbps over the WAN for file sets of 1–10 KB in size; 500 Mbps+ with the new ascp4).
- Adaptive bandwidth control – Network and disk based congestion control providing automatic adaptation of transmission speed to available network bandwidth and available I/O throughput to/from storage platform, to avoid congestion and overdrive.
- Automatic resume – Automatic retry and checkpoint resume of any transfer (single files and directories) from point of interruption.
- Built-in encryption and encryption at rest – Built-in over-the-wire encryption and encryption-at-rest (AES-128) with secrets controlled on both the client and server side.
- Secure authentication and access control – Built-in support for authenticated Aspera docroots implemented using private cloud credentials. Support for configurable read, write, and listing access per user account. Support for platform-specific role-based access control, including Amazon IAM and Microsoft Secure SaaS URLs.
- Seamless, full featured HTTP fallback – Seamless fallback to HTTP(s) in restricted network environments with full support for encryption, encryption-at-rest and automatic retry and resume.
- Concurrent transfer support – Concurrent transfer support scaling up to ~50 concurrent transfers per VM instance, depending on the environment. (Cloud storage platforms vary in their ability to support concurrent sessions depending on the maturity of the platform and the capacity of the particular VM host-to-cloud file system architecture.)
- Preservation of file attributes – In later versions, transfers can be configured to preserve file creation and modification times on AWS S3 and Swift.
- Complete interoperability with Aspera Clients – Fully interoperable transfer support with all core Aspera products acting as transfer peers with the cloud storage transfer.
- Full-featured transfer modes – Fully interoperable transfer support for all modes of transfer in these products including command line (CLI), interactive GUI point-and-click, browser, hot folder automation, and SDK automation.
- Comprehensive server capabilities – Full support for all Aspera server-side features, including secure docroots, Console configuration of bandwidth, security and file handling policies, and reporting to Aspera Console.
- Support for forward and reverse proxy – Transfers to/from cloud environments support Aspera proxy on the client side in forward or reverse mode.
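The 0.625 TB per-session figure in the large-file-sizes item above follows directly from S3's documented limit of 10,000 parts per multipart upload. A quick check of the arithmetic (the function name is illustrative):

```python
# S3 caps a multipart upload at 10,000 parts, so the maximum object size
# a single session can write is: part size x 10,000.
# 64 MiB x 10,000 = 640,000 MiB = 625 GiB (quoted loosely as 0.625 TB).

S3_MAX_PARTS = 10_000

def max_object_gib(chunk_mib: int) -> float:
    """Largest object one session can write at a given part (chunk) size."""
    return chunk_mib * S3_MAX_PARTS / 1024

print(max_object_gib(64))    # 625.0 GiB at the default 64 MiB chunk size
print(max_object_gib(512))   # 5000.0 GiB with a larger configured chunk
```

This is why making the chunk size configurable, as the newer software versions do, extends single-session transfers up to the platform's largest supported object size.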
Transfer Cluster Management with Autoscale
The new Transfer Cluster Manager with Autoscale provides elastic auto-scaling of transfer hosts, client load balancing, cluster-wide reporting and transfer management, and a multi-tenant secure access key system. The service allows for dynamic, real-time scale-out of transfer capacity with automatic start/stop of transfer server instances, automatic balancing of client requests across available instances, and configurable service levels to manage maximum transfer load per instance, available idle instances for "burst", and automatic decommissioning of unused instances.
The Autoscale Transfer Cluster Manager (ATCM) service includes the following capabilities:
- Manages transfer throughput SLAs and compute/bandwidth costs with elastic scaling – The service is part of the Aspera transfer server software stack and, based on user-defined policies, automatically manages the number of server instances needed to support client transfer demands – both the nodes in active use and those booted up in reserve but idle.
- Provides high availability and load balancing – As transfer loads increase and decrease, nodes are moved from idle to available for client requests, and from available to highly utilized and back again based on user-defined load metrics such as tolerances for low and high transfer throughput and online burst capacity. If the minimum number of available nodes drops below the user-defined threshold, the cluster manager boots up new nodes automatically, and then brings them back down when they are no longer needed.
- Provides increased availability and reliability – ATCM monitors the health and availability of Aspera transfer nodes and services. Any unavailable or down nodes or services are automatically detected and restarted, or replaced if necessary. Subsequent client requests are directed to healthy nodes via automatic cluster domain name service (DNS) management.
- Works on all major clouds, in conjunction with Aspera Direct-to-Cloud storage – All of the Autoscale capabilities are implemented in the Aspera software and are thus infrastructure independent, portable across cloud providers including AWS, IBM Cloud, Azure, Google, etc. The service works in both public clouds and Virtual Private Cloud (VPC) environments.
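The load-based scale-out/scale-in behavior described above amounts to a threshold-driven sizing policy. The sketch below is a hypothetical illustration of that policy; the function name, parameters, and thresholds are assumptions for the example, not ATCM's actual configuration schema:

```python
# Hypothetical sketch of a threshold-driven autoscale policy: size the
# cluster to serve current load within a per-node cap, plus a reserve of
# idle nodes held ready for burst traffic.

def desired_nodes(active_transfers, max_per_node, min_available_idle):
    """Return how many nodes the cluster manager should keep running."""
    # Nodes needed so no node exceeds its user-defined transfer cap
    # (ceiling division), zero when there is no load...
    busy = -(-active_transfers // max_per_node) if active_transfers else 0
    # ...plus idle nodes kept booted in reserve for burst capacity.
    return busy + min_available_idle

# As load rises, new nodes are booted; as it falls, unused nodes are
# decommissioned back down to the idle reserve.
print(desired_nodes(0,   max_per_node=50, min_available_idle=1))   # 1
print(desired_nodes(120, max_per_node=50, min_available_idle=2))   # 5
```

A real implementation would evaluate such a policy continuously against live load metrics and smooth the result to avoid thrashing instances up and down.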
Validation of third-party cloud storage platforms
To bring on support for a new object storage platform, and to verify support for a storage platform in our released software, Aspera carries out a comprehensive suite of automated and manual tests to verify performance under WAN conditions, large file sizes and counts, file integrity, concurrency, load testing, security including encryption and access control, and backward compatibility between versions. Aspera aims to run the same test sets and conditions across all platforms, within the limits of the number, variety, and network connectivity of the test hosts the platform provides. The parameters of the test cases and performance capabilities for a single virtual host computer running the Aspera server software, by platform, are detailed in Table 1 on the following page.
The majority of cloud-based storage available in the marketplace today is based on object storage. Key design principles of object storage architectures are the separation of file data and metadata, replication of data across distributed commodity storage, and unified access across distributed nodes and clusters. These principles enable more cost-effective scale-out with greater redundancy and durability than traditional block-based storage.