Massive amounts of data are being created by billions of sensors all around us, such as cameras, smartphones, and cars, as well as by enterprises, education systems, and other organizations. In the age of big data, artificial intelligence (AI), machine learning, and deep learning deliver unprecedented insights from this data.
Amazon CEO Jeff Bezos spoke about the potential of artificial intelligence and machine learning at the 2017 Internet Association’s annual gala in Washington, D.C. “It is a renaissance, it is a golden age,” Bezos said. “We are solving problems with machine learning and artificial intelligence that were in the realm of science fiction for the last several decades. Natural language understanding, machine vision problems, it really is an amazing renaissance. Machine learning and AI is a horizontal enabling layer. It will empower and improve every business, every government organization, every philanthropy – basically there’s no institution in the world that cannot be improved with machine learning.”
Technology companies such as Amazon, Apple, Baidu, Facebook, Google (Alphabet), Microsoft and NVIDIA have dedicated teams working on AI projects in areas such as image recognition, natural language understanding, visual search, robotics, self-driving cars and text-to-speech. Examples of their AI, machine learning and deep learning projects include:
- Amazon: Amazon uses AI and complex learning algorithms that continuously assess the market dynamics to determine product recommendations and which products are selected for the Amazon Buy Box.
- Apple: Apple’s Siri virtual assistant on iPhones and other Apple hardware uses deep learning to do searches as well as provide relevant answers interactively using a voice interface.
- Baidu: A speech-recognition system called Deep Speech 2 developed by Baidu easily recognizes English or Mandarin Chinese speech and, in some cases, can translate more accurately than humans.
- Facebook: Facebook’s DeepMask and SharpMask software works in conjunction with its MultiPathNet neural networks allowing Facebook to understand an image based on each pixel it contains.
- Google (Alphabet): Google CEO Sundar Pichai indicates that when users tap on Google Maps, the Google StreetView product uses AI to automatically recognize street signs or business signs to help define the location.
- Microsoft: Microsoft’s AI uses a cognitive vision system in PowerPoint that analyzes photos and auto-generates Alt-Text or suggests diagrams that illustrate a process.
- NVIDIA: NVIDIA DRIVE™ PX is the open AI car computing platform that enables automakers and tier 1 suppliers to accelerate production of automated and autonomous vehicles.
The emergence of AI started when three key technologies came together like a perfect storm, known as the big bang of AI. The three key drivers are deep learning algorithms, parallel processors based on graphics processing units (GPUs) and the availability of big data.
Today, both multi-core CPUs and GPUs are used to accelerate deep learning, analytics, and engineering applications—enabling data scientists, researchers, and engineers to tackle challenges that were once impossible. New deep learning algorithms leverage massively parallel neural networks inspired by the human brain. Instead of experts handcrafting software, a deep learning model writes its own software by learning from many examples, delivering super-human accuracy for common tasks like image, video, and text processing.
Deep learning technology and neural networks have been around for a long time. So why is deep learning taking off now, and what is the value of big data? Andrew Ng, a luminary in the field of AI, described the evolution of big data and deep learning at the 2016 Spark Conference [ii]. Ng indicated that if you take an older traditional learning algorithm such as logistic regression and feed it more data, the system performance plateaus because the algorithm cannot squeeze any more insight out of the additional data. Ng observed that deep neural networks are different: the more training data is fed into the neural network, the better it performs. The adoption of deep learning is growing rapidly because of algorithm innovation, performance leaps of GPU-based computing systems, and the constant growth of big data.
WHY TRADITIONAL STORAGE CAN’T MEET DEEP LEARNING NEEDS
There’s been significant advancement in parallel computing and algorithms, but the technology that stores and delivers big data has largely been built on legacy building blocks, designed in the serial era. A new type of storage system is required to deliver the massive amounts of data for these new computing paradigms.
In the past several years alone, the amount of compute required by deep learning and the amount of compute delivered by GPUs jumped more than 10 times. Meanwhile, disks and SSDs have not increased in performance during the same period. While the volume of unstructured data is exploding, legacy storage struggles to handle the storage performance needs of the emerging big data drivers.
Most deployments today use direct-attached storage (DAS) or distributed direct-attached storage (DDAS), where datasets are spread across disks in each server. DDAS allowed data scientists to use commodity off-the-shelf systems and components for their analytics pipeline, like x86 processors and standard hard disk drives, but the approach is full of potential problems. When modern data analytics technologies were being developed, there was no storage platform big enough for such large amounts of data or fast enough to meet the high bandwidth requirements of big data software.
In the new age of big data, applications are leveraging large farms of powerful servers and extremely fast networks to access petabytes of data served for everything from data analytics to scientific discovery to movie rendering. These new applications demand fast and efficient storage, which legacy solutions are no longer capable of providing.
What’s needed is a new, innovative storage architecture to support advanced applications while providing best-of-breed performance in all dimensions of concurrency – including input/output operations per second (IOPS), throughput, latency, and capacity – while offering breakthrough levels of density. The new FlashBlade™ flash-based storage by Pure Storage® is designed to meet all these needs. FlashBlade can handle the big data and concurrent workloads that will drive tomorrow’s discoveries, insights and creations.
For the past four years, Gartner has rated Pure Storage as a Leader in the Magic Quadrant for Solid State Arrays for its innovations in all-flash data storage. Since Pure Storage unveiled its FlashBlade scale-out storage platform, the company has made significant inroads in providing storage for real-time and big data analytics, financial analysis and manufacturing. The FlashBlade architecture is designed from the ground up for modern analytics workloads, delivering high-performance, cost-effective, simple-to-own-and-operate scale-out storage for petabytes of operational data. It is specifically designed for flash media, and the architecture contains no provision for mechanical disks. FlashBlade is purpose-built for the massively parallel workloads that deep learning processing requires.
DELIVERING DATA THROUGHPUT FOR AI
Deep learning systems often use mostly small files to keep the training computers busy. In one example, deep learning training runs on NVIDIA DGX-1 servers backed by the FlashBlade data storage platform. Each DGX-1 processes 13k images per second through AlexNet using the Microsoft CNTK framework. The training model reads small files with random access, which older legacy systems do not handle efficiently. In this example, a FlashBlade can deliver enough ingest throughput to maximize training performance on multiple DGX-1 systems.
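To make the ingest requirement concrete, the following is a rough back-of-the-envelope sketch of the storage load implied by the figures above. Only the 13k images per second per DGX-1 comes from the example. The average image size (110 KiB) and the number of DGX-1 systems (4) are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
def required_ingest(images_per_sec_per_server, num_servers, avg_image_bytes):
    """Return (file reads per second, bytes per second) the storage must sustain.

    Each training image is assumed to be one small file, so the image rate
    is also the random file-read rate the storage system sees.
    """
    files_per_sec = images_per_sec_per_server * num_servers
    bytes_per_sec = files_per_sec * avg_image_bytes
    return files_per_sec, bytes_per_sec


IMAGES_PER_SEC_PER_DGX1 = 13_000   # from the AlexNet/CNTK example in the text
NUM_DGX1 = 4                       # assumption: size of the training cluster
AVG_IMAGE_BYTES = 110 * 1024       # assumption: about 110 KiB per image file

files_per_sec, bytes_per_sec = required_ingest(
    IMAGES_PER_SEC_PER_DGX1, NUM_DGX1, AVG_IMAGE_BYTES
)

print(f"{files_per_sec:,} small-file reads per second")
print(f"{bytes_per_sec / 1e9:.2f} GB/s sustained read throughput")
```

Under these assumptions the storage system would need to serve 52,000 random small-file reads per second at roughly 5.86 GB/s. The point of the sketch is that the file-read rate, not just the raw bandwidth, scales with every server added to the cluster.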
Computer systems consisting of multi-core CPUs or GPUs using parallel processing and extremely fast networks are required to process the data. However, legacy storage solutions are based on architectures that are decades old, un-scalable and not well suited for the massive concurrency required by machine learning. Legacy storage is becoming a bottleneck in processing big data and a new storage technology is needed to meet data analytics performance needs.
The FlashBlade all-flash storage array from Pure Storage is designed to meet these needs. FlashBlade performance improves linearly with more data. Whether files are small or large, FlashBlade delivers true linear scaling of capacity and performance, and as a result, is well-suited to modern analytics workloads for AI and deep learning.