Just like cloud computing, Big Data has
become a hot topic of 2012. What lies behind the hype?
In a hyper-competitive world where rivals force each other
to keep cutting margins, businesses see big data as an
opportunity to gain the ultimate weapon in the fight for survival. Experts
predict that by the end of 2012, over 90% of the Fortune 500 will be actively
preparing at least a few big data initiatives. What is big
data, and why should companies care about it?
What Is Big Data?
The Simplest Definition
The term “Big
Data” refers simply to the management and analysis of large
amounts of data. According to the report Big Data: The Next Frontier for
Innovation, Productivity, and Competition, the term “Big Data” refers to data sets whose size is
beyond the ability of typical database (DB) tools to capture, store, manage, and
analyze information. In addition, the world’s store of data is definitely
growing. The “Digital Universe Study” by the analyst firm IDC, presented in
mid-2011, predicted that the total global volume of data created and
replicated in 2011 could be around 1.8 zettabytes (1.8 trillion gigabytes),
about 9 times more than what was created in 2006.
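The unit conversion behind these figures can be checked with a short sketch. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes. The function name is purely illustrative, and the 2006 value is only what the "about 9 times" ratio implies, not a figure quoted from the study.

```python
from fractions import Fraction

# Decimal (SI) unit sizes in bytes; this is an assumption about the units used.
GIGABYTE = 10**9
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zb: Fraction) -> Fraction:
    """Convert a volume in zettabytes to gigabytes using exact arithmetic."""
    return zb * ZETTABYTE / GIGABYTE


# Figure quoted in the text for 2011.
volume_2011_zb = Fraction("1.8")
volume_2011_gb = zettabytes_to_gigabytes(volume_2011_zb)
print(f"{int(volume_2011_gb):,} GB")  # 1,800,000,000,000 GB

# Volume implied for 2006 by the "about 9 times" ratio (derived, approximate).
implied_2006_zb = volume_2011_zb / 9
print(f"roughly {float(implied_2006_zb)} ZB in 2006")
```

So 1.8 zettabytes is indeed 1.8 trillion gigabytes, and the stated growth factor puts the 2006 volume at roughly a fifth of a zettabyte.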
More Complex Definition
However, “Big
Data” means something more than just the analysis of huge
amounts of information. The problem is not that organizations create huge
amounts of data, but that most of it comes in formats that
fit poorly into a traditional structured database: web logs,
videos, text documents, machine code, or geospatial data, for example. All of
this is kept in a variety of different stores, sometimes even outside the organization.
As a result, corporations may have access to a huge volume of
data and yet lack the tools to establish relationships within
it and turn it into the basis for meaningful conclusions. Add the fact
that the data is updated more and more often, and you have a situation in which
traditional methods of analysis cannot keep up with the huge
volume of constantly updated data, which ultimately paves the way for big data
technologies.
Best Definition
In fact, the concept of big data involves working
with a huge volume of varied, very frequently updated information
located in different sources, in order to increase efficiency, create new
products, and improve competitiveness. Big data combines the techniques and
technologies that extract meaning from data at the extreme limit of
practicality.
Real Trend or Just a Hoax?
Doubters
Not everyone in the IT industry believes that big
data is as valuable as the myth created around it suggests.
Some experts say that having access to a
heap of facts and the ability to analyze them does not mean that you will do it right.
Others argue that spending hours pondering data that everyone has is a dubious
competitive advantage,
and that the real idea of big data is to use new information and draw
conclusions that no one has drawn before. Even then, it is important to
quickly understand the meaning and context of the data, which in some cases can
be difficult.
When Will The Time Come For Big Data?
Experts do not think that companies should dive
into big data unless they believe that it will bring answers
to their questions.
Industry leaders should be able to
describe the problem they want to solve with the help of big data, whether it is the
acceleration of existing processes (for
example, fraud detection) or the introduction of new ones
previously considered impractical or too expensive (for example, streaming data from
“smart” sensors, or assessing the impact of peaks in meteorological data on
fluctuations in demand). If you cannot articulate the purpose
of your efforts in the field of big data, do not start them.
This process requires an understanding of what
information is needed to make better decisions. If the best way of obtaining
such information is big data analysis, then it is probably time to
start moving in that direction. If such information can be obtained using
conventional business intelligence technology, then the time for big data
may not yet have come.
How Big Is The Difference Between Business Intelligence And Big Data?
Business intelligence is a descriptive
analysis of the results a business achieved during a certain period, while
the processing speed of big data makes predictive analysis possible, which can offer
the business advice for the future. Big data technologies can also analyze more
data types than business intelligence tools, which focus
on structured repositories.
Working with big data is not like the usual
business intelligence process, where simply adding up known values
yields the result: for example, adding up the data on paid
bills gives the sales for the year. With big data, the result is
obtained by working through the data with successive modeling: first a
hypothesis is put forward, then a statistical, visual, or semantic model is built,
the hypothesis is checked against that model, and then the next hypothesis is put forward. This process
requires the researcher either to interpret visual values, or to make interactive
queries based on knowledge, or to develop adaptive “machine learning” algorithms
capable of producing the desired result. The lifetime of such an algorithm can be
quite short.
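The contrast between the two approaches can be sketched in a few lines. This is only an illustration under stated assumptions: all numbers are invented sample data, the function names are hypothetical, and a plain Pearson correlation with an arbitrary threshold stands in for the "statistical model" against which each hypothesis is checked.

```python
from statistics import mean

# Invented sample data, used purely for illustration.
paid_invoices = [1200.0, 850.0, 430.0, 990.0]

# Business intelligence style: adding up known values gives the answer.
annual_sales = sum(paid_invoices)
print(annual_sales)  # 3470.0

# Big data style: put forward a hypothesis, model it, check it, try the next one.
daily_demand = [100, 120, 135, 160, 180]
candidate_drivers = {
    "temperature": [10, 14, 17, 22, 26],
    "day_of_month": [3, 28, 11, 7, 19],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def check_hypotheses(target, drivers, threshold=0.8):
    """Check each hypothesis 'this driver explains the target' in turn.

    Returns only the drivers whose absolute correlation with the target
    reaches the (arbitrarily chosen) threshold.
    """
    supported = {}
    for name, values in drivers.items():
        r = pearson(values, target)
        if abs(r) >= threshold:
            supported[name] = r
    return supported


supported = check_hypotheses(daily_demand, candidate_drivers)
print(sorted(supported))
```

With this sample data only the temperature hypothesis survives the check. In a real project the list of hypotheses, the model, and the threshold would all be revised from one iteration to the next, which is one reason such an algorithm may not live long.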
Pitfalls
Do You Know Where Your Data Is?
It makes no sense to implement a big data
solution only to realize that critical information is scattered
throughout the organization in unknown places and cannot be reached. Most companies
already cannot keep track of all the information within their own organizations, and
will simply drown in the attempt to analyze the additional information obtained from
big data processing.
Lack of Skills
Even if a company decides to implement
big data technologies, it may have difficulty attracting qualified
employees. A specialist in working with data (and in data mining)
needs a unique combination of skills: a strong background in
mathematics and statistics; deep knowledge of statistical tools such as SAS,
SPSS, or an open-source statistical package; and the ability to find patterns in
the data. All of this must be backed by good knowledge of the subject area
and excellent communication skills, in order to understand business problems
and how to resolve them.
Finding specialists who meet this combination of
requirements is not easy, since about half a million managers and
analysts are needed to work on big data analysis and make decisions based on the
results.
For staff, it is important to fully understand
what they are doing. Big data reveals relationships, and then it is up to you
alone to decide whether they are statistically reliable or not. The number
of permutations and possibilities available means that many people can
affect the result.
Personal Data
Tracking customers' personal data in order to
stimulate demand seems an attractive idea to the seller, but does not seem
so necessary to the buyer. Not everyone wants their
life to become the subject of analysis, and depending on how the rules
for the use of personal data develop in different countries, companies
will have to be cautious in their plans to work with big data, including their methods of
data collection. These rules may lead to fines for overly aggressive
policies in this area, but an even greater risk is the loss of trust.
Security
Customers trust companies to ensure the security
of their personal data. However, because big data is a completely new area,
its products were developed without adequate attention to security
issues, even though the vast amounts of stored information make the
task of keeping that information secure more important than ever
before.
Over the last year or two, there have been several
well-publicized leaks of confidential data, including leaks of
information about hundreds of thousands of customers. The government has been promising
to review the laws on notification of such leaks
since the 2008 analysis of the security of personal
data. It advises companies to be prepared for a situation in which
they will be required to inform customers about the loss or theft of
personal data. In addition, the government has said that it will take tough measures
against organizations that store sensitive
information irresponsibly.
Steps to Big Data
If you decide to move in the direction of big
data, it is important to be fully prepared and to approach the project in an
organized way by answering a number of questions.
What would you like to know? Here you have to
decide what you want to find out with big data that you cannot get from
your current systems. If the answer is nothing, then maybe you should wait before
starting this project.
What are your information assets? Can you
cross-reference these assets to reveal patterns and draw conclusions from them? Is
it possible to create new data products based on these assets? If not, then how
can you make this possible?
Once you have figured this out, it is time to prioritize. Select
the potentially most valuable area for applying the techniques and
technologies of big data, and prepare a business case for running a “pilot” (proof of concept),
paying attention to the set of skills that you will need for the implementation.
You will need to speak with the owners of the data to get a complete picture.
Run the pilot project and make sure you have
well-formulated completion criteria so that you can assess the results. This may be a good time
to ask the owner of the information resources to take responsibility for the
project.
At the conclusion of the pilot project, evaluate
how it worked. Are you getting real conclusions and recommendations?
Has the work paid off? Can this project be replicated in other parts of
the organization? Is there any other information that could be included in it?
This will help answer the question of whether to launch a full project based
on the pilot, or whether something must be corrected first.