Sneak Peek into the World of Big Data!

Saumya Singh
22 min read · Sep 17, 2020

Over the years, big data has been the hottest topic in the tech world. The evolution of big data has taken the world by storm and with each passing day, it just gets even bigger. Today big data touches every business, big or small, at some level.

According to market intelligence company IDC, the ‘Global Datasphere’ in 2018 reached 18 zettabytes. This is the total of all data created, captured or replicated. IDC predicts that the Global Datasphere will grow to 175 Zettabytes by 2025.

To keep up with the storage demands stemming from all this data creation, IDC forecasts that over 22 ZB of storage capacity must ship across all media types from 2018 to 2025, with nearly 59% of that capacity supplied from the HDD industry.

In 2025, 49% of the world’s stored data will reside in public cloud environments, IDC predicts.

In 2025, each connected person will have at least one data interaction every 18 seconds. Many of these interactions are because of the billions of IoT devices connected across the globe, which are expected to create over 90ZB of data in 2025.

Facts: One zettabyte is equivalent to a trillion gigabytes. If you were able to store the entire Global Datasphere on DVDs, then you would have a stack of DVDs that could get you to the moon 23 times or circle Earth 222 times. If you could download the entire 2025 Global Datasphere at an average of 25 Mb/s, today’s average connection speed across the United States, then it would take one person 1.8 billion years to do it, or if every person in the world could help and never rest, then you could get it done in 81 days.
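The download-time figures are easy to check yourself. Here is a rough back-of-the-envelope sketch, assuming decimal units (1 ZB = 10^21 bytes) and a world population of roughly 8 billion; the population figure is my assumption, not something IDC states.

```python
# Back-of-the-envelope check of the download-time figures quoted above.
ZETTABYTE = 10**21  # bytes, decimal definition

datasphere_bytes = 175 * ZETTABYTE   # IDC's forecast for 2025
link_bits_per_second = 25 * 10**6    # 25 Mb/s
world_population = 8 * 10**9         # assumption: roughly 8 billion people

# Total download time for one person on one connection.
seconds_alone = datasphere_bytes * 8 / link_bits_per_second
years_alone = seconds_alone / (365.25 * 24 * 3600)

# Same job split evenly across everyone, each with their own connection.
days_shared = seconds_alone / world_population / (24 * 3600)

gigabytes_per_zettabyte = ZETTABYTE // 10**9

print(f"1 ZB = {gigabytes_per_zettabyte:,} GB")
print(f"One person: {years_alone / 1e9:.1f} billion years")
print(f"Everyone together: {days_shared:.0f} days")
```

With these assumptions the numbers come out at about 1.8 billion years and 81 days, which matches the quoted facts.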

💥 What is Big Data ?

“Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data — i.e., we don’t define big data in terms of being larger than a certain number of terabytes (thousands of gigabytes). We assume that, as technology advances over time, the size of datasets that qualify as big data will also increase. Big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes).

💥 What the Statistics Say about the Need for Big Data:

  • The big data growth we’ve been witnessing is only natural. We constantly generate data. On Google alone, we submit 40,000 search queries per second. That amounts to 1.2 trillion searches yearly!
  • Each minute, 300 new hours of video show up on YouTube. That’s why there’s more than 1 billion gigabytes (1 exabyte) of data on its servers!
  • People share more than 100 terabytes of data on Facebook daily. Every minute, users send 31 million messages and view 2.7 million videos.
  • Big data usage statistics indicate people take about 80% of photos on their smartphones. Considering that over 1.4 billion devices will be shipped worldwide this year alone, we can only expect this percentage to grow.
  • Smart devices (for example, fitness trackers, sensors, Amazon Echo) produce 5 quintillion bytes of data daily. In 5 years, we can expect the number of these gadgets to be more than 50 billion!
  • Big data stats indicate that more than 30% of data will be uploaded to the cloud by next year.
  • Moving to a cloud can improve a business’s agility (by 29%) and shorten payback times by 30%.
  • Huge companies like Google use shared computing to satisfy their customers’ needs. About 1,000 computers are involved in answering every query.
  • In fact, Hadoop, the most popular open source framework for distributed computing, has a compound annual growth rate of 58%, and its market is expected to surpass $1 billion by 2020.
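The yearly search figure in the first bullet can be sanity-checked with a couple of lines. This is just arithmetic on the quoted 40,000 queries per second, assuming that rate holds all year round.

```python
# Scale the quoted per-second search rate up to a full (non-leap) year.
queries_per_second = 40_000
seconds_per_year = 365 * 24 * 60 * 60

queries_per_year = queries_per_second * seconds_per_year
print(f"{queries_per_year / 1e12:.2f} trillion searches per year")  # 1.26 trillion
```

That is about 1.26 trillion, in line with the roughly 1.2 trillion quoted above.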

💥 Technologies for Big Data:

A growing number of technologies are used to aggregate, manipulate, manage, and analyze big data. The list below covers some of the more prominent ones, but it is not exhaustive, especially as more technologies continue to be developed to support big data techniques.

1. The Hadoop Ecosystem

While Apache Hadoop may not be as dominant as it once was, it’s nearly impossible to talk about big data without mentioning this open source framework for distributed processing of large data sets. Last year, Forrester predicted, “100% of all large enterprises will adopt it (Hadoop and related technologies such as Spark) for big data analytics within the next two years.”

2. Spark

Apache Spark is part of the Hadoop ecosystem, but its use has become so widespread that it deserves a category of its own. It is an engine for processing big data within Hadoop, and it’s up to one hundred times faster than the standard Hadoop engine, MapReduce.
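To get a feel for the MapReduce model mentioned above, here is a toy word count in plain Python. It only imitates the three phases (map, shuffle, reduce) on a single machine; it is not Hadoop or Spark code, and the function names and sample text are made up for illustration.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}

# In a real cluster each chunk would live on a different machine
# and the map calls would run in parallel.
chunks = ["big data is big", "data about data"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

The point of the model is that the map and reduce steps are independent per chunk and per key, which is what lets a framework spread them over many machines.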

3. R

R, another open source project, is a programming language and software environment designed for working with statistics. The darling of data scientists, it is managed by the R Foundation and available under the GPL 2 license. Many popular integrated development environments (IDEs), including Eclipse and Visual Studio, support the language.

4. Data Lakes

To make it easier to access their vast stores of data, many enterprises are setting up data lakes. These are huge data repositories that collect data from many different sources and store it in its natural state. This is different from a data warehouse, which also collects data from disparate sources, but processes it and structures it for storage. In this case, the lake and warehouse metaphors are fairly accurate. If data is like water, a data lake is natural and unfiltered like a body of water, while a data warehouse is more like a collection of water bottles stored on shelves.

5. NoSQL Databases

Traditional relational database management systems (RDBMSes) store information in structured, defined columns and rows. Developers and database administrators query, manipulate and manage the data in those RDBMSes using a special language known as SQL.

NoSQL databases specialize in storing unstructured data and providing fast performance, although they don’t provide the same level of consistency as RDBMSes. Popular NoSQL databases include MongoDB, Redis, Cassandra, Couchbase and many others; even the leading RDBMS vendors like Oracle and IBM now also offer NoSQL databases.

MongoDB is one of several well-known NoSQL databases.
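The difference between the two styles is easier to see side by side. The sketch below uses Python's built-in sqlite3 module for the relational side and a plain list of dictionaries to stand in for a document store. It does not use MongoDB or any real NoSQL client, and the sample records are invented.

```python
import json
import sqlite3

# Relational style: a fixed schema of columns and rows, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users (name, city) VALUES (?, ?)", ("Asha", "Jaipur"))
row = conn.execute(
    "SELECT name, city FROM users WHERE city = ?", ("Jaipur",)
).fetchone()

# Document style: each record is a self-describing document, and two
# documents in the same collection need not share the same fields.
collection = [
    {"name": "Asha", "city": "Jaipur"},
    {"name": "Ravi", "devices": ["phone", "watch"], "premium": True},
]
with_devices = [doc["name"] for doc in collection if "devices" in doc]

print(row)
print(json.dumps(collection[1]))
print(with_devices)
```

Adding a new field to the relational table would need a schema change, while the document collection simply accepts a record with different keys.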

6. Predictive Analytics

Predictive analytics is a sub-set of big data analytics that attempts to forecast future events or behavior based on historical data. It draws on data mining, modeling and machine learning techniques to predict what will happen next. It is often used for fraud detection, credit scoring, marketing, finance and business analysis purposes.
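As a minimal illustration of forecasting from historical data, the sketch below fits a straight line with ordinary least squares and extends it one step ahead. Real predictive analytics uses far richer models; the monthly sales figures here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical monthly sales figures (made up for illustration).
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(f"Forecast for month 6: {forecast_month_6:.0f}")  # 150
```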

7. In-Memory Databases

In any computer system, the memory, also known as the RAM, is orders of magnitude faster than the long-term storage. If a big data analytics solution can process data that is stored in memory, rather than data stored on a hard drive, it can perform dramatically faster. And that’s exactly what in-memory database technology does.
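You can try the idea with SQLite, which can keep a whole database in RAM when given the special path ":memory:". The sketch below runs the same small workload against an in-memory database and a file on disk and prints the timings. The table name and row count are arbitrary choices of mine, and the timings depend heavily on your machine and operating system cache, so treat them as indicative only.

```python
import os
import sqlite3
import tempfile
import time

def load_and_sum(database_path, rows=50_000):
    """Insert `rows` integers into a table and return (total, seconds taken)."""
    conn = sqlite3.connect(database_path)
    conn.execute("CREATE TABLE readings (value INTEGER)")
    start = time.perf_counter()
    with conn:  # commit once at the end of the batch
        conn.executemany(
            "INSERT INTO readings VALUES (?)", ((i,) for i in range(rows))
        )
    total = conn.execute("SELECT SUM(value) FROM readings").fetchone()[0]
    elapsed = time.perf_counter() - start
    conn.close()
    return total, elapsed

# ":memory:" tells SQLite to keep the whole database in RAM.
memory_total, memory_seconds = load_and_sum(":memory:")

with tempfile.TemporaryDirectory() as tmp:
    disk_total, disk_seconds = load_and_sum(os.path.join(tmp, "readings.db"))

print(f"in-memory: total={memory_total}, {memory_seconds:.4f}s")
print(f"on-disk:   total={disk_total}, {disk_seconds:.4f}s")
```

Both runs return the same total; only where the data lives changes.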

8. Big Data Security Solutions

Because big data repositories present an attractive target to hackers and advanced persistent threats, big data security is a large and growing concern for enterprises. In the AtScale survey, security was the second fastest-growing area of concern related to big data.

9. Big Data Governance Solutions

Closely related to the idea of security is the concept of governance. Data governance is a broad topic that encompasses all the processes related to the availability, usability and integrity of data. It provides the basis for making sure that the data used for big data analytics is accurate and appropriate, as well as providing an audit trail so that business analysts or executives can see where data originated.

10. Self-Service Capabilities

With data scientists and other big data experts in short supply (and commanding large salaries), many organizations are looking for big data analytics tools that allow business users to self-serve their own needs. In fact, a report from Research and Markets estimates that the self-service business intelligence market generated $3.61 billion in revenue in 2016 and could grow to $7.31 billion by 2021. And Gartner has noted, “The modern BI and analytics platform emerged in the last few years to meet new organizational requirements for accessibility, agility and deeper analytical insight, shifting the market from IT-led, system-of-record reporting to business-led, agile analytics including self-service.”

11. Artificial Intelligence

While the concept of artificial intelligence (AI) has been around nearly as long as there have been computers, the technology has only become truly usable within the past couple of years. In many ways, the big data trend has driven advances in AI, particularly in two subsets of the discipline: machine learning and deep learning.

The standard definition of machine learning is that it is technology that gives “computers the ability to learn without being explicitly programmed.” In big data analytics, machine learning technology allows systems to look at historical data, recognize patterns, build models and predict future outcomes. It is also closely associated with predictive analytics.

Deep learning is a type of machine learning technology that relies on artificial neural networks and uses multiple layers of algorithms to analyze data. As a field, it holds a lot of promise for allowing analytics tools to recognize the content in images and videos and then process it accordingly.

12. Streaming analytics

As organizations have become more familiar with the capabilities of big data analytics solutions, they have begun demanding faster and faster access to insights. For these enterprises, streaming analytics, the ability to analyze data as it is being created, is something of a holy grail. They are looking for solutions that can accept input from multiple disparate sources, process it and return insights immediately, or as close to it as possible. This is particularly desirable when it comes to new IoT deployments, which are helping to drive the interest in streaming big data analytics.
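The core idea of analyzing data as it is being created can be sketched with a Python generator that updates a rolling average every time a new reading arrives, without ever holding the full data set. The sensor readings and window size are hypothetical, and a production system would sit on a dedicated streaming platform rather than a simple loop.

```python
from collections import deque

def rolling_average(stream, window=3):
    """Yield the average of the last `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # old readings fall off automatically
    for reading in stream:
        recent.append(reading)
        yield sum(recent) / len(recent)

# Hypothetical sensor readings; in practice this would be an unbounded feed.
sensor_feed = iter([10, 20, 30, 40, 50])
averages = list(rolling_average(sensor_feed))
print(averages)  # [10.0, 15.0, 20.0, 30.0, 40.0]
```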

13. Edge Computing

In addition to spurring interest in streaming analytics, the IoT trend is also generating interest in edge computing. In some ways, edge computing is the opposite of cloud computing. Instead of transmitting data to a centralized server for analysis, edge computing systems analyze data very close to where it was created — at the edge of the network.

The advantage of an edge computing system is that it reduces the amount of information that must be transmitted over the network, thus reducing network traffic and related costs. It also decreases demands on data centers or cloud computing facilities, freeing up capacity for other workloads and eliminating a potential single point of failure.
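Here is a small sketch of that traffic saving: a device summarizes a batch of readings locally and sends only the summary instead of every raw value. The readings, the summary fields and the use of JSON as the wire format are all assumptions made for illustration.

```python
import json

def summarize_at_edge(readings):
    """Reduce a batch of raw readings to a small summary on the device itself."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }

# Hypothetical temperature readings collected by one device.
raw_readings = [21.0 + (i % 10) * 0.1 for i in range(1000)]
summary = summarize_at_edge(raw_readings)

# Compare what would travel over the network in each case.
raw_payload = json.dumps(raw_readings)
summary_payload = json.dumps(summary)

print(f"raw payload:     {len(raw_payload)} bytes")
print(f"summary payload: {len(summary_payload)} bytes")
```

The trade-off is that the central system only sees the summary, so the summary fields have to be chosen with the later analysis in mind.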

While the market for edge computing, and more specifically for edge computing analytics, is still developing, some analysts and venture capitalists have begun calling the technology the “next big thing.”

14. Blockchain

Also a favorite with forward-looking analysts and venture capitalists, blockchain is the distributed database technology that underlies Bitcoin digital currency. The unique feature of a blockchain database is that once data has been written, it cannot be deleted or changed after the fact. In addition, it is highly secure, which makes it an excellent choice for big data applications in sensitive industries like banking, insurance, retail and others.

Blockchain technology is still in its infancy and use cases are still developing. However, several vendors, including IBM, AWS, Microsoft and multiple startups, have rolled out experimental or introductory solutions built on blockchain technology.

Blockchain is distributed ledger technology that offers great potential for data analytics.
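Why is written data so hard to change? Each block stores a hash of the previous block, so editing an old entry breaks every link after it. The toy hash chain below shows only that tamper-evidence property. It leaves out everything that makes a real blockchain distributed (networking, consensus, mining), and the ledger entries are invented.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, data, previous_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, data):
    """Add a new block that points at the current last block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "previous_hash": previous_hash,
        "hash": block_hash(index, data, previous_hash),
    })

def is_valid(chain):
    """Recompute every hash and check each block points at its predecessor."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i > 0 else GENESIS_HASH
        if block["previous_hash"] != expected_previous:
            return False
        recomputed = block_hash(block["index"], block["data"], block["previous_hash"])
        if block["hash"] != recomputed:
            return False
    return True

ledger = []
for entry in ["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"]:
    append_block(ledger, entry)

valid_before = is_valid(ledger)
ledger[1]["data"] = "bob pays carol 200"  # tamper with history
valid_after = is_valid(ledger)
print(valid_before, valid_after)  # True False
```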

15. Prescriptive Analytics

Many analysts divide big data analytics tools into four big categories. The first, descriptive analytics, simply tells what happened. The next type, diagnostic analytics, goes a step further and provides a reason for why events occurred. The third type, predictive analytics, discussed in depth above, attempts to determine what will happen next. This is as sophisticated as most analytics tools currently on the market can get.

However, there is a fourth type of analytics that is even more sophisticated, although very few products with these capabilities are available at this time. Prescriptive analytics offers advice to companies about what they should do in order to make a desired result happen. For example, while predictive analytics might give a company a warning that the market for a particular product line is about to decrease, prescriptive analytics will analyze various courses of action in response to those market changes and forecast the most likely results.

💥 How Netflix uses big data and analytics ?

So, how does Netflix use data analytics? It collects data from its 151 million subscribers and applies data analytics models to discover customer behaviour and buying patterns. It then uses that information to recommend movies and TV shows based on its subscribers’ preferences.

According to Netflix, over 75% of viewer activity is based on personalised recommendations. Netflix collects several data points to create a detailed profile of its subscribers. The profile is far more detailed than the personas created through conventional marketing.

Most significantly, Netflix collects data on how customers interact with and respond to a TV show. For example, Netflix knows the time and date a user watched a show, the device used, whether the show was paused and whether the viewer resumed watching afterwards, whether people finish an entire TV show, how long it takes a user to finish a show, and so on.

Netflix even has screenshots of scenes people might have viewed repeatedly, the ratings given to content, the number of searches and what is searched for. With this data, Netflix can create a detailed profile of its users. To collect all this data and turn it into meaningful information, Netflix requires data analytics. For example, Netflix uses a recommendation algorithm to suggest TV shows and movies based on users’ preferences.
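Netflix's actual algorithm is not described here, so the sketch below is not Netflix's method. It is a generic, textbook-style example of one common approach, user-based collaborative filtering: find users with similar ratings and suggest what they liked. All names, titles and ratings are hypothetical.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale; a missing title means "not watched".
ratings = {
    "ana":   {"Drama A": 5, "Thriller B": 4, "Comedy C": 1},
    "ben":   {"Drama A": 4, "Thriller B": 5, "Documentary D": 4, "Sci-Fi E": 2},
    "chloe": {"Comedy C": 5, "Documentary D": 2, "Sci-Fi E": 5},
}

def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[title] * b[title] for title in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, all_ratings):
    """Rank unseen titles by similarity-weighted ratings from other users."""
    scores = {}
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        weight = cosine_similarity(all_ratings[user], other_ratings)
        for title, rating in other_ratings.items():
            if title not in all_ratings[user]:
                scores[title] = scores.get(title, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)

recommendations = recommend("ana", ratings)
print(recommendations)  # ['Documentary D', 'Sci-Fi E']
```

Ana's tastes are closest to Ben's, so the title Ben rated highly ends up at the top of her list.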

Netflix’s ability to collect and use the data is the reason behind their success. According to Netflix, they earn over a billion in customer retention because the recommendation system accounts for over 80% of the content streamed on the platform. Netflix also uses its big data and analytics tools to decide if they want to greenlight original content. To an outsider, it might look like Netflix is throwing their cash at whatever they can get, but in reality, they greenlight original content based on several touch points derived from their user base.

For example, Netflix distributed ‘Orange is the New Black’ knowing it would be a big hit on their platform. How? Because ‘Weeds’, Jenji Kohan’s previous hit, performed well on Netflix in terms of viewership and engagement.

Netflix even uses big data and analytics to conduct custom marketing. For example, to promote ‘House of Cards’, Netflix cut over ten different versions of the trailer. If you watched lots of TV shows centred on women, you got a trailer focused on the female characters. However, if you watched a lot of content directed by David Fincher, you got a trailer that focused on him. Netflix did not have to spend too much time and resources on marketing the show because they already knew how many people would be interested in it and what would incentivise them to tune in.

In addition to collecting data on subscriber actions, Netflix also encourages feedback from its subscribers. One feedback system is the thumbs up/thumbs down system that replaced their rating system. It improved audience engagement by a significant margin, which enabled them to customise the user’s homepage further. According to Joris Evers, Director of Global Communications, there are 33 million different versions of Netflix.

Key takeaways :

Powerful analytics models can process terabytes of data to churn out meaningful information. Judicious use of data analytics is the main reason for Netflix’s success. In fact, big data and analytics are so vital to Netflix’s success that you may as well call them an analytics company instead of a media company.

💥 5 Practical Uses of Big Data:

Here is a list of 5 practical uses of Big Data. Different industries are using Big Data in different ways. In our list we have compiled the uses of Big Data and what industries are using them. Read on to find out more:

1. Location Tracking:

Logistics companies have been using location analytics to track and report orders for quite some time. With Big Data in the picture, it is now possible to track the condition of goods in transit and estimate losses. It is also possible to gather real-time data about traffic and weather conditions and define routes for transportation. This helps logistics companies mitigate transport risks and improve delivery speed and reliability.

2. Precision Medicine:

With big data, hospitals can improve the level of patient care they provide. 24×7 monitoring can be provided to intensive care patients without the need for direct supervision. On top of that, the effectiveness of medication can be improved by analyzing the past records of the patients and the medicines provided to them. The need for guesswork can be significantly reduced.

3. Fraud Detection & Handling:

The banking and finance sector is using big data to predict and prevent cyber crimes, detect card fraud, archive audit trails, etc. By analyzing the past data of their customers and the data on previous brute force attacks, banks can predict future attempts. Big data not only helps in predicting cyber crimes, it also helps in handling issues related to missed transactions and failures in net banking. It can even predict possible spikes on servers so that banks can manage transactions accordingly.

The Securities and Exchange Commission (SEC) is using big data to monitor financial markets for possible illegal trades and suspicious activities. The SEC is using network analytics and natural language processing to identify possible fraud in the financial markets.

4. Advertising:

Advertisers are one of the biggest players in Big Data. Be it Facebook, Google, Twitter or any other online giant, all keep track of user behavior and transactions. These internet giants provide a great deal of data about people to advertisers so that they can run targeted campaigns. Take Facebook, for example: here you can target people based on buying intent, website visits, interests, job role, demographics and more. All this data is collected by Facebook algorithms using big data analysis techniques. The same goes for Google: a campaign that targets clicks will get different results from a campaign that targets leads. All this is made possible using big data.

5. Entertainment & Media:

In the field of entertainment and media, big data focuses on targeting people with the right content at the right time. Based on your past views and your behavior online you will be shown different recommendations. This technique is popularly used by Netflix and YouTube to increase engagement and drive more revenue.

Now, even television broadcasters are looking to segment their viewer databases and show different advertisements and shows accordingly. This will bring in better revenue from ads and provide a more engaging user experience.

Big data is taking people by surprise, and with the addition of IoT and machine learning its capabilities are soon going to increase. The amount of data is growing rapidly and so are the possibilities of using it. The number of successful use cases of Big Data is constantly on the rise and its capabilities are no longer in doubt.

💥 5 Biggest Risks of Big Data:

Like the two sides of a coin, big data comes with its pros and cons too. Are you prepared to fight the five biggest risks of big data?

There are still enterprises that choose to ignore big data while they can clearly see the flood coming at them.

By 2020, about 1.7 megabytes of information will be created every second for every human being alive. If that doesn’t concern you as an entrepreneur, what else would? Fighting the big data flood is no joke, because it brings with it some serious risks to conquer.

Here are the five biggest risks that big data presents for digital enterprises.

1. Unorganized data

Big data is highly versatile. It comes from a number of sources and in a number of forms. There’s structured data, there’s unstructured data. There’s data coming from online and offline sources. And all this data keeps piling up each day, each minute. It’s overwhelming for enterprises to tackle such unorganized and siloed data sets effectively. A well-planned governance strategy can bring you out of your dark data and help you make sense of it.

2. Data storage and retention

This is one of the most obvious risks associated with big data. When data gets accumulated at such a rapid pace and in such huge volumes, the first concern is its storage. Traditional data storage methods and technology are just not enough to store big data and retain it well. Enterprises today need a shift to cloud based data storage solutions to store, archive and access big data effectively.

3. Cost management

The process of storing, archiving, analyzing, reporting and managing big data involves costs. Many small and medium enterprises think that big data is only for big businesses, and they cannot afford it. However, with careful budgeting and planning of resources, big data costs can be mitigated well. Once the initial set up, migration and overhauling costs are taken care of, big data acts as an incredible revenue generator for digital enterprises.

4. Incompetent analytics

Without proper analytics, big data is just a pile of trash lying unnecessarily in your organization. Analytics is what makes data meaningful, giving management valuable insights to make business decisions and plan strategies for growth. With data growing at such an alarming rate, there’s obviously a lack of skilled professionals and technology to analyze big data efficiently. It exposes enterprises to the risk of misinterpretation of data, and wrong decision making. Hiring the right talent and applying the right tools is crucial to make relevant decisions from a big data project.

5. Data privacy

With big data comes its biggest risk: data privacy. Enterprises worldwide make use of sensitive data, personal customer information and strategic documents. When there’s so much confidential data lying around, the last thing you want is a data breach at your enterprise. A security incident can not only affect critical data and bring down your reputation; it can also lead to legal action and heavy penalties. Taking measures for data privacy is not just a good initiative anymore, it’s a compliance necessity.

💥 Future of Big Data :

The scene is constantly changing and businesses have to be on their toes to know what the future trends in big data analytics are.

  • The volume of data is only increasing by the year. Considering that people’s preferences and needs change every few months, it would be safe to say that there will be a surge in the usage and applications of big data analytics by companies to gauge the patterns and trends in the market.
  • Once more and more companies start realizing how efficient and profitable data analytics is, and how well it benefits them, more companies will leverage it, and the market will continue to grow.
  • Different industries need to wake up to the importance of the data at their disposal. Companies in the retail industry must analyze customer buying data to predict what their customers will buy next and understand which products they are interested in. Similarly, companies in the engineering and manufacturing sectors must analyze the data from their machinery to predict which machine may break down in the future.
  • As more companies adopt big data analytics, more technologies will be developed to provide more accurate predictions. This is like a chain where one factor affects the other: as long as all the factors keep growing and reinforcing one another, big data analytics is only going to grow and come up with more variations.
  • Though big data is expected to grow, it is still a raw, unstructured field to a certain extent. Of course, it is helping a lot of companies and is helping the market too, but one still needs to understand how to leverage big data analytics more effectively.

💥 6 Big Data Analytics Predictions :

With an exponential growth in big data analytics, this technology is finding new applications across various industry sectors.

  1. Data is going to grow, and grow, and grow. There is no stopping! The yearly demand for new roles like data developers, data scientists, and data engineers may increase to almost 700,000 job opportunities by the year 2020.
  2. There will be a huge demand for analytical skills to work on big data projects. As per IBM, the demand for advanced analysts and data scientists will grow by 28% by the year 2020.
  3. New ways to analyze data will be found. New tools will be discovered, if not invented. Data visualization tools such as QlikView and Tableau will be in demand.
  4. Real-time insights will be in demand and more companies will opt for machine learning for predictive analysis.
  5. Privacy of data will be in question and autonomous agents could be in the limelight.
  6. As big data will be immensely supported by cognitive technology, the lookout for data-as-a-service models will be on the rise.

💥 Prediction from 2020 to 2025 :

📌 The majority of big data experts agree that the amount of generated data will be growing exponentially in the future.

1. Data volumes will continue to increase and migrate to the cloud

AWS, Microsoft Azure, and Google Cloud Platform have transformed the way big data is stored and processed. Before, when companies intended to run data-intensive apps, they needed to physically grow their own data centers. Now, with its pay-as-you-go services, the cloud infrastructure provides agility, scalability, and ease of use.

This trend will certainly continue into the 2020s, but with some adjustments:

  • Hybrid environments. Many companies can’t store sensitive information in the cloud, so they choose to keep a certain amount of data on premises and move the rest to the cloud.
  • Multi-cloud environments. Some companies wanting to address their business needs to the fullest choose to store data using a combination of clouds, both public and private.

2. Machine learning will continue to change the landscape

Playing a huge role in big data, machine learning is another technology expected to impact our future drastically.

Until recently, machine learning and AI applications were unavailable to most companies due to the domination of open-source platforms. Though open-source platforms were developed to bring technologies closer to people, most businesses lack the skills to configure the required solutions on their own. Oh, the irony.

The situation changed once commercial AI vendors started to build connectors to open-source AI and ML platforms and provide affordable solutions that do not require complex configurations. What’s more, commercial vendors offer the features open-source platforms currently lack, such as ML model management and reuse.

This is intriguing and scary at the same time. On the one hand, intelligent robots promise to make our lives easier. On the other hand, there is an ethical issue. Such giants as Google and IBM are already pushing for more transparency by accompanying their machine learning models with the technologies that monitor bias in algorithms.

3. Data scientists and CDOs will be in high demand

The positions of Data Scientists and Chief Data Officers (CDOs) are relatively new, but the need for these specialists on the labor market is already high. As data volumes continue to grow, the gap between the need and the availability of data professionals is already large.

No wonder data scientists are among the top fastest-growing jobs today, along with machine learning engineers and big data engineers. Big data is useless without analysis, and data scientists are those professionals who collect and analyze data with the help of analytics and reporting tools, turning it into actionable insights.

To rank as a good data scientist, one should have deep knowledge of:

  • Data platforms and tools
  • Programming languages
  • Machine learning algorithms
  • Data manipulation techniques, such as building data pipelines, managing ETL processes, and prepping data for analysis
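To make the last bullet concrete, here is a tiny extract-transform-load (ETL) pipeline using only Python's standard library. The CSV content, column names and cleaning rules are all hypothetical; real pipelines usually run on dedicated tooling, but the three stages look much the same.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with messy values (made up for illustration).
RAW_CSV = """customer,amount,country
 alice ,10.50,in
BOB,,us
carol,7.25,IN
"""

def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount and normalise the rest."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append((
            row["customer"].strip().title(),
            float(row["amount"]),
            row["country"].strip().upper(),
        ))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# The prepped data is now ready for analysis.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country"
).fetchall()
print(totals)  # [('IN', 17.75)]
```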

Striving to improve their operations and gain a competitive edge, businesses are willing to pay higher salaries to such talents. This makes the future look bright for data scientists.

4. Privacy will remain a hot issue

Data security and privacy have always been pressing issues, showing a massive snowballing potential. Ever-growing data volumes create additional challenges in protecting it from intrusions and cyber attacks, as the levels of data protection can’t keep up with the data growth rates.

There are several reasons behind the data security problem:

  • Security skill gap, caused by a lack of education and training opportunities. This gap is constantly growing and will reach 3.5 million unfilled cybersecurity positions by 2021, according to Cybercrime Magazine.
  • Evolution of cyberattacks. The threats used by hackers are evolving and becoming more complex by the day.
  • Irregular adherence to security standards. Although the governments are taking measures to standardize data protection regulations, GDPR being the example, most organizations still ignore data security standards.

Statistics demonstrate the scale of the problem. Statista calculated the average cyber losses which amounted to $1.56 million for mid-sized companies in the last fiscal year, and $4.7 million across all company sizes, as of May 2019.

5. Fast data and actionable data will come to the forefront

Yet another prediction about the big data future is related to the rise of what is called ‘fast data’ and ‘actionable data’.

Unlike big data, which typically relies on Hadoop and NoSQL databases to analyze information in batch mode, fast data allows for processing in real-time streams. Because of this stream processing, data can be analyzed promptly, within as little as one millisecond. This brings more value to organizations that can make business decisions and take actions immediately when data arrives.

Fast data has also spoilt users, making them addicted to real-time interactions. As businesses are getting more digitized, which drives better customer experience, consumers expect to access data on the go. What’s more, they want it personalized. In the research cited above, IDC predicts that nearly 30% of the global data will be real-time by 2025.

💥 What the future holds for organizations:

Being frightening and fascinating at the same time, the future of big data analytics promises to change the way businesses operate in finance, healthcare, manufacturing, and other industries.

The overwhelming size of big data may create additional challenges in the future, including data privacy and security risks, shortage of data professionals, and difficulties in data storage and processing.

However, most experts agree that big data will mean big value. It will give rise to new job categories and even entire departments responsible for data management in large organizations. New regulatory structures and standards of conduct will emerge, as companies continue to use consumers’ personal data. Also, most companies will shift from being data-generating to data-powered, making use of actionable data and business insights.

___________________________________________________________________

Note: This blog is written from data that I found on the internet. I have tried to answer common questions that come to the minds of aspirants.

Undergoing training @ अर्थ 2020 — The School of Technologies, Vimal Daga Sir, LinuxWorld Pvt. Ltd.

LinkedIn : https://www.linkedin.com/in/ssaumyaa7/

For now that’s all people… Thank You 😊
