The Big Data concept is continually evolving and being reconsidered, as it remains the driving force behind many ongoing waves of digital transformation, including Artificial Intelligence, Data Science and the Internet of Things.
Nowadays, almost every action we take leaves a digital trail. We generate data whenever we go online, when we carry our GPS-equipped smartphones, when we communicate with our friends through social media or chat applications, and when we shop. You could say we leave digital footprints with everything we do that involves a digital action, which is almost everything.
On top of this, the amount of machine-generated data is rapidly growing too. Data is generated and shared when our “smart” home devices communicate with each other or with their home servers. Industrial machinery in plants and factories around the world is increasingly equipped with sensors that gather and transmit data.
The term “Big Data” refers to the collection of all this data and our ability to use it to our advantage across a wide range of areas, including business.
Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL)
Artificial Intelligence (AI) is usually defined as the science of making computers do things that require intelligence when done by humans. AI has had some success in limited, or simplified, domains. However, for most of the five decades since its inception, progress was very slow; only recently has it accelerated.
AI is intelligence exhibited by machines, rather than humans or other animals. In computer science, the field of AI research defines itself as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of success at some goal. Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving".
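The "intelligent agent" definition above can be made concrete with a very small sketch. The thermostat scenario, the action names, and the temperature effects below are all hypothetical, chosen only to show the perceive-then-act loop; real agents are far more elaborate.

```python
# Minimal sketch of an "intelligent agent": it perceives its environment
# (a temperature) and picks the action that best serves its goal.
# The scenario and all values are hypothetical.

def choose_action(perceived_temp, target_temp, actions):
    """Return the action whose predicted outcome lands closest to the goal."""
    return min(
        actions,
        key=lambda name: abs(perceived_temp + actions[name] - target_temp),
    )

# Each action maps to its assumed effect on the temperature per step.
actions = {"heat": 1.0, "cool": -1.0, "idle": 0.0}

temp = 17.0      # what the agent currently perceives
target = 20.0    # the goal it tries to reach
history = []

for _ in range(5):
    action = choose_action(temp, target, actions)
    temp += actions[action]   # the environment responds to the action
    history.append(action)

print(history, temp)
```

The agent here has no learning at all; it only illustrates the loop of perceiving, choosing, and acting toward a goal.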
The scope of AI is disputed: as machines become increasingly capable, tasks considered as requiring "intelligence" are often removed from the definition.
Capabilities generally classified as AI, as of 2017, include successfully understanding human speech, competing at a high level in strategic game systems (such as chess and Go), autonomous cars, intelligent routing in content delivery networks, military simulations, and interpreting complex data.
There are several other terms you often hear in connection with AI. Machine Learning (ML) generally entails teaching a machine how to do a particular thing, like recognizing a number, by feeding it a large set of example data and then directing it to make predictions on new data.
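As a rough illustration of "feed it data, then predict on new data", here is a minimal sketch of a nearest-centroid classifier in plain Python. The tiny 3x3 "images", the labels, and the function names are hypothetical; this is one simple technique among many, not how production digit recognizers are built.

```python
# Minimal sketch: learn from labeled examples, then predict on unseen data.
# Technique: nearest-centroid classification. All data is hypothetical.

def train(examples):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(
        centroids,
        key=lambda label: squared_distance(centroids[label], features),
    )

# Toy 3x3 images flattened to 9 pixels: a vertical bar for "1", a ring for "0".
training_data = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], "1"),
    ([0, 1, 0, 0, 1, 0, 0, 1, 1], "1"),
    ([1, 1, 1, 1, 0, 1, 1, 1, 1], "0"),
    ([1, 1, 1, 1, 0, 1, 1, 1, 0], "0"),
]

model = train(training_data)

# A new, slightly noisy image the model has not seen before.
new_image = [0, 1, 0, 0, 1, 0, 1, 1, 0]
prediction = predict(model, new_image)
print(prediction)
```

The "training" step only averages the examples per label, which keeps the sketch short; the point is the split between learning from known data and predicting on new data.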
Deep Learning (DL) is one type of machine learning that demands less hand-engineering of features. Often the approach involves artificial neural networks, a mathematical system loosely inspired by the way neurons work together in the human brain.
Data Analytics or Business Intelligence
Business or Data Analytics refers to the skills, technologies, and practices for continuous, iterative exploration and investigation of past business performance to gain insight and drive business planning. Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods.
In contrast, Business Intelligence (BI) traditionally focuses on using a consistent set of metrics to both measure past performance and guide business planning.
Business Analytics makes extensive use of statistical analysis, including explanatory and predictive modeling, and fact-based management to drive decision making. Analytics may be used as input for human decisions or may drive fully automated decisions.
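To make "predictive modeling" and "automated decisions" a little more tangible, the following sketch fits a straight line to past monthly revenue and uses the forecast in a simple rule. The revenue figures, the threshold, and the function name are hypothetical, and ordinary least squares is just one of many possible modeling choices.

```python
# Minimal sketch: predictive modeling on past performance, followed by
# an automated decision. All numbers are hypothetical.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

months = [1, 2, 3, 4, 5, 6]
revenue = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]  # past performance

slope, intercept = fit_line(months, revenue)
forecast = slope * 7 + intercept  # predicted revenue for month 7

# The forecast can inform a human, or drive a rule like this one directly.
EXPAND_THRESHOLD = 155.0  # hypothetical business rule
expand_capacity = forecast > EXPAND_THRESHOLD

print(forecast, expand_capacity)
```

In practice the same forecast could simply be shown on a dashboard for a person to act on; the final comparison is what turns it into an automated decision.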
Data Warehouse vs Data Lake
A Data Warehouse (DW) and a Data Lake are both data storage repositories. A DW stores only data that has been modeled and structured, while a data lake stores everything.
Data Warehouses store vast amounts of structured data in highly regimented ways. They require that a rigid, predefined schema exists before the data is loaded. This is typically a star or snowflake schema.
A Data Lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed.
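The difference between defining the schema before loading and defining it only when the data is needed can be sketched with Python's standard library. The table, field names, and records below are hypothetical; an in-memory SQLite table stands in for a warehouse and a list of raw JSON strings stands in for a lake, which is a large simplification of both.

```python
import json
import sqlite3

# Warehouse style: the schema must exist before any data is loaded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER NOT NULL, amount REAL NOT NULL)"
)
conn.execute("INSERT INTO sales VALUES (?, ?)", (1, 20.0))
conn.execute("INSERT INTO sales VALUES (?, ?)", (2, 5.0))

# A row that does not fit the predefined schema is rejected at load time.
try:
    conn.execute("INSERT INTO sales VALUES (?, ?)", (3, None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

warehouse_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# Lake style: raw records are stored as-is, whatever their shape.
raw_records = [
    '{"order_id": 1, "amount": 20.0, "channel": "web"}',
    '{"order_id": 2, "amount": 5.0}',
    '{"sensor": "t-17", "reading": 21.5}',
]

# Structure is applied only now, when a specific question is asked.
parsed = [json.loads(line) for line in raw_records]
lake_total = sum(rec["amount"] for rec in parsed if "amount" in rec)

print(warehouse_total, lake_total, rejected)
```

Both approaches arrive at the same total here, but the warehouse enforced structure when the data was loaded, while the lake accepted everything, including an unrelated sensor reading, and left interpretation to the reader of the data.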