Introduction

Once, the world had simple applications. But today we have all sorts of data, technology, hardware, and other gadgets. Data reaches us from a myriad of places and in many forms. And the volume of that data is simply crushing.

There are three types of data that an organization uses for analytical purposes. First, there is classical structured data, which principally comes from executing transactions; it has been around the longest. Second, there is textual data from emails, call center conversations, contracts, medical records, and elsewhere. Once, text was a "black box" that the computer could store but not analyze; now, textual Extract, Transform, and Load (ETL) technology has opened the door of text to standard analytical techniques.

Third, there is the world of analog/IoT data. Machines of every kind, such as drones, electric eyes, temperature gauges, and wristwatches, can all generate data. Analog/IoT data arrives in a much rougher form than structured or textual data, and a tremendous amount of it is generated automatically. Analog/IoT data is the domain of the data scientist.
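To make the idea of textual ETL concrete, here is a minimal Python sketch that turns free-form text into structured rows that standard analytics can consume. The sample notes, the `TOPICS` vocabulary, and the `to_structured` function are all hypothetical; real textual ETL products work at far greater scale and sophistication.

```python
import re
from collections import Counter

# Hypothetical raw call-center notes: free-form text, one record per line.
notes = [
    "2024-03-01 | cust 1017 | Billing error on last invoice, very frustrated",
    "2024-03-01 | cust 2044 | Asked about warranty terms in the contract",
    "2024-03-02 | cust 1017 | Follow-up: invoice corrected, customer satisfied",
]

# Assumed vocabulary of business terms to surface for analysis.
TOPICS = {"billing", "invoice", "warranty", "contract", "refund"}

def to_structured(record: str) -> dict:
    """Turn one free-form note into a structured row (a toy textual-ETL step)."""
    date, cust, text = (part.strip() for part in record.split("|", 2))
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        "date": date,
        "customer": cust.replace("cust ", ""),
        "topics": sorted(words & TOPICS),
    }

rows = [to_structured(n) for n in notes]

# Once text is reduced to rows, standard analytical techniques apply,
# for example a simple topic-frequency count:
freq = Counter(topic for row in rows for topic in row["topics"])
print(rows)
print(freq.most_common())
```

The point of the sketch is only the shape of the transformation: unstructured text goes in, rows with a fixed schema come out, and from there ordinary analysis takes over.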

At first, we threw all of this data into a pit called the "data lake." But we soon discovered that merely throwing data into a pit was a pointless exercise. To be useful for analysis, data needed to (1) be related to other data and (2) have its analytical infrastructure carefully arranged and made available to the end user.
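As a small illustration of point (1), the sketch below relates hypothetical structured and analog/IoT data through a shared customer key using pandas. Every table and column name here is invented for the example; the takeaway is simply that data sitting in a lake only becomes analyzable once such relationships are made explicit.

```python
import pandas as pd

# Hypothetical structured data: transactions keyed by customer.
transactions = pd.DataFrame({
    "customer_id": [1017, 2044, 1017],
    "amount": [250.0, 99.5, 40.0],
})

# Hypothetical IoT data: device readings, plus a device-to-customer mapping.
readings = pd.DataFrame({
    "device_id": ["d1", "d2", "d1"],
    "temperature": [71.2, 68.4, 73.9],
})
devices = pd.DataFrame({
    "device_id": ["d1", "d2"],
    "customer_id": [1017, 2044],
})

# Relating the data: attach each reading to a customer,
# then join average readings with total spend per customer.
per_customer_temp = (
    readings.merge(devices, on="device_id")
            .groupby("customer_id", as_index=False)["temperature"].mean()
)
combined = (
    transactions.groupby("customer_id", as_index=False)["amount"].sum()
                .merge(per_customer_temp, on="customer_id")
)
print(combined)
```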
