There’s no single way to ensure Big Data analytics success. However, following the frameworks and best practices outlined here can help businesses keep their Big Data initiatives on track. Based on some of the more recent successful analytics initiatives I have witnessed, here are 8 key best practices from a business and technology perspective:
Big Data isn’t just about technology; it’s about strategy. Big Data solutions are most successful when approached from a business perspective and not solely from an IT perspective.
1 Define a Big Data Analytics Strategy
Like any new technology opportunity, Big Data will under-deliver on its potential without a clear business strategy.
Addressing the business strategy for Big Data is no different from any other investment in new technology. Whether the goal is to increase operational efficiency, make better product pricing decisions, boost marketing campaign lift, strengthen customer interactions, assure revenue, counter fraud, manage risk or analyze streaming data from ‘smart meters’, the business must be able to provide a clear business case for the problem it wants analytics to solve.
Once the business case is identified and prioritized, it should lead to a solid roadmap and plan to realize the Big Data strategy. A good strategy will not just address business problems; it will also prioritize the problems that have the most value to the organization.
2 Connect the Stakeholders
A Big Data initiative requires not just sponsorship but championship at the senior levels of the business. While senior management support may be optional for most technology projects, active senior management involvement is absolutely essential for analytics projects to be successful.
A cross-functional team needs to be engaged to ensure business needs are understood and data is accessible to meet specific business goals. This usually begins with executive-level sponsorship of the Big Data initiative.
Next, it’s pivotal that the business users who are potentially impacted by the initiative, including business analysts and operational executives, are engaged from the initial phase of business requirement gathering through to execution.
Especially important is the role of the data scientist, who ideally has the quantitative and statistical knowledge and the industry expertise to create the analytical queries and algorithms required to extract the desired business insights from complex customer, operational, and industry-specific marketplace data. The data scientist should have not only analytics and modelling skills but also strong business acumen, coupled with the ability to communicate insights and findings in a way that influences senior management to address the business challenge.
3 Establish Critical Success Factors
The success of any analytics project is largely determined by the value it brings to the business. Hence it’s highly advisable for any analytics project to commence by first establishing critical success factors (CSFs), with the agreement of the project sponsor and stakeholders. When establishing critical success factors for a Big Data Analytics project, focus on two areas: employee usage of the resulting intelligence, and key performance indicators for the processes where analytics will be used.
When tracking these critical success factors from a project management perspective, it’s key that the value of the Big Data Analytics initiative is measured in both areas. The first is the adoption of the resulting analytics intelligence by key business users. This measurement shows how much information the key business users consume: who is using the data, how much they are using it, and what they are using it for.
The second is the key performance indicators of the business process where analytics is applied, such as customer satisfaction, supply chain efficiency, and other established metrics.
Comparing numbers before and after the Big Data Analytics implementation should clearly show what the organization has gained as a result—as well as other opportunities to collect data and put it to work.
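As a rough illustration of that before-and-after comparison, here is a minimal Python sketch. The KPI names and figures are hypothetical, and `kpi_change` is a helper name I made up for this example; a real project would pull these numbers from its own reporting systems.

```python
def kpi_change(before: dict, after: dict) -> dict:
    """Return the relative change (as a fraction) for each KPI found in both snapshots."""
    changes = {}
    for name, old_value in before.items():
        # Skip KPIs that are missing afterwards or have a zero baseline
        # (a relative change is undefined in that case).
        if name in after and old_value != 0:
            changes[name] = (after[name] - old_value) / old_value
    return changes


# Hypothetical KPI snapshots taken before and after the analytics rollout.
before = {"customer_satisfaction": 72.0, "on_time_delivery_pct": 80.0}
after = {"customer_satisfaction": 81.0, "on_time_delivery_pct": 88.0}

for name, change in kpi_change(before, after).items():
    print(f"{name}: {change:+.1%}")
```

Relative change is only one possible measure; absolute differences or trend lines over several periods may suit some KPIs better.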
Used this way, analytics often delivers a huge ROI, helping a company identify problems early on (or before they happen) and take steps to fix (or prevent) them. As Big Data Analytics becomes embedded in the day-to-day operational routines of stakeholders, organisations are far more likely to achieve enterprise-wide adoption, driving competitive business advantage through analytics.
4 Run a Pilot Project
A well-run, successful proof of concept (POC) not only validates the Big Data strategy but also increases executive sponsor confidence, validates infrastructure requirements and sizing, confirms the project approach and helps to build project momentum. The results of pilots give management the direction needed to determine where to make the next incremental investment before making larger commitments.
Whether the objective is to generate insights needed to improve innovation, boost customer loyalty, support profitable expansion, or gain a competitive edge, an effectively executed pilot can quickly demonstrate the value of the Big Data business case and allow its success to be extended to the wider enterprise.
A Big Data pilot should aim to combine internal data from data warehouses, log files and transactional systems such as ERP and CRM with external data from social media, benchmarks or third-party sources. The combined data can then be modelled, aggregated and mined to reveal new patterns and relationships that prove the business case.
For business users, the technology itself is almost irrelevant; it’s almost a red herring. Business users couldn’t care less about the specific technology used, whether it’s in-memory analytics or a graph database. What they care about is being able to rapidly and intuitively analyze large amounts of data for better business insights.
Looking across the Big Data business case, architects must assess the required changes to data, applications/tools and technical architecture, and lay out a robust and scalable plan.
Here are a few pointers you should consider from a technology perspective.
5 Evaluate Data Requirements
Prior to deciding on any tools or platforms, it’s recommended to carry out a technical assessment of the data requirements to understand:
- What is the structure of the data sources that need to be leveraged within the business? Is it unstructured, semi-structured or highly structured?
- What is the frequency of the data being processed? Is it on-demand, a continuous feed or real-time?
- What type of analysis should be used? Is it batch or streaming/real-time?
- What types of data sources do we need to work with? Web and social, machine-generated, human-generated, biometric, transactional systems or other?
- What is the volume of data received? Does it arrive in massive chunks or in small, fast chunks?
- Who or what will consume this data? Will it be humans, business processes, other enterprise applications or other repositories?
- How should the data be stored and processed?
- How should the results be visualized?
Having clear requirements spelled out in advance is key to selecting the best-fit tools and technologies for the Big Data Analytics engagement.
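One lightweight way to keep this assessment honest is to record the answers in a structured form and check which questions are still open. The Python sketch below is only an illustration: the `DataRequirements` class, its field names and the `unanswered` helper are my own hypothetical naming, mapped one-to-one onto the questions above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataRequirements:
    """One field per assessment question; None means 'not answered yet'."""
    structure: Optional[str] = None               # unstructured / semi-structured / structured
    frequency: Optional[str] = None               # on-demand / continuous feed / real-time
    analysis_type: Optional[str] = None           # batch / streaming
    source_types: Optional[List[str]] = None      # e.g. web and social, machine-generated
    volume: Optional[str] = None                  # massive chunks / small and fast chunks
    consumers: Optional[List[str]] = None         # humans, business processes, applications
    storage_and_processing: Optional[str] = None  # how the data is stored and processed
    visualization: Optional[str] = None           # how results are presented


def unanswered(req: DataRequirements) -> List[str]:
    """List the questions that still have no answer recorded."""
    return [f.name for f in fields(req) if getattr(req, f.name) in (None, "", [])]


# Hypothetical, partially completed assessment.
req = DataRequirements(structure="semi-structured", analysis_type="batch")
print(unanswered(req))
```

Tool selection can then wait until `unanswered` returns an empty list.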
6 Take a Structured Approach
When it comes to choosing a project methodology for Big Data Analytics initiatives, my recommendation would be to use agile and iterative implementation techniques that deliver quick results addressing the business requirement, instead of a big-bang SDLC application development.
The best practice is to start small by meeting the requirements identified in the specific business case, and to gradually expand the program while not losing sight of the bigger business strategy.
Generally, successful Big Data Analytics projects are executed in four phases:
Phase 1: Ingest
Ingest, which essentially refers to the process of acquiring the data, is the first step in the Big Data implementation. Here a variety of data sources are brought onto a common platform, including:
- Structured data such as relational databases in ERP and CRM systems, point-of-sale transactions, call detail records, general ledger transactions and call-center transactions
- Unstructured data such as documents, web pages, logs, e-mails, audio, video, photos, PDF/A, RSS feeds, XML, comments or social media feeds from sites such as Twitter, Facebook, LinkedIn and Pinterest
This is where we focus on understanding the different data sources and how the data is ingested, because the goal is to aggregate all the data and have a unified view across all the different sources, regardless of where they came from.
Phase 2: Transform
Once the data is ingested, the next phase is where the data is refined, organized, integrated, cleansed, normalized, aggregated and transformed, ensuring we have all the data required in its most accurate version. The transform step may include multiple data manipulations, such as moving, splitting, translating, merging, sorting, pivoting, and more. Often this step also involves validating the data against data quality rules.
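To make the transform step a little more concrete, here is a minimal Python sketch that cleanses, normalizes, de-duplicates and validates a handful of records. The field names (`customer_id`, `country`, `amount`) and the two quality rules are assumptions chosen purely for illustration. On a real Big Data platform this logic would typically run in a distributed engine rather than in a single Python process.

```python
def transform(records):
    """Cleanse, normalize and de-duplicate raw records.

    Returns (clean, rejected): rows that pass the quality rules, and the
    original rows that fail them.
    """
    clean, rejected, seen = [], [], set()
    for raw in records:
        # Cleanse and normalize: trim whitespace, upper-case country codes.
        customer_id = str(raw.get("customer_id", "")).strip()
        country = str(raw.get("country", "")).strip().upper()

        # Quality rule 1 (illustrative): amount must be numeric.
        try:
            amount = float(raw.get("amount"))
        except (TypeError, ValueError):
            rejected.append(raw)
            continue

        # Quality rule 2 (illustrative): id present and amount non-negative.
        if not customer_id or amount < 0:
            rejected.append(raw)
            continue

        # De-duplicate on the normalized values.
        key = (customer_id, country, amount)
        if key in seen:
            continue
        seen.add(key)
        clean.append({"customer_id": customer_id, "country": country, "amount": amount})
    return clean, rejected


# Hypothetical raw input from two source systems.
raw_rows = [
    {"customer_id": " C1 ", "country": "my", "amount": "10.5"},
    {"customer_id": "C1", "country": "MY", "amount": 10.5},   # duplicate once normalized
    {"customer_id": "", "country": "sg", "amount": 3},        # fails: missing id
    {"customer_id": "C2", "country": "sg", "amount": "n/a"},  # fails: non-numeric amount
]
clean_rows, rejected_rows = transform(raw_rows)
print(len(clean_rows), "clean,", len(rejected_rows), "rejected")
```

Keeping the rejected rows, rather than silently dropping them, makes it easier to report data quality issues back to the owners of the source systems.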
Phase 3: Analyze
In this phase, we seek to understand the data, as this is the first step to finding value. We typically enrich the data and apply data mining algorithms to find correlations or insights among disparate data sets.
Some of the commonly deployed data mining algorithms include Naïve Bayes, Decision Trees, Logistic Regression, Association Rules, Heuristical K-Means Clustering, Kohonen, O-Cluster, Survival Analysis, Neural Networks etc.
For example, banks may use regression algorithms to predict the probability of loan default by analyzing the details of loan applications. Online retailers may analyse the relationships between items in their data sets and use association techniques to provide next-best-offer recommendations in real time.
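The association technique mentioned in the retail example can be sketched in a few lines of Python. This is a simplified illustration limited to pairs of items, using the standard support and confidence measures. The `pair_rules` function name, the threshold and the sample baskets are all hypothetical, and a production system would more likely rely on an established library or a distributed implementation.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(transactions)
    item_counts, pair_counts = Counter(), Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore repeated items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Confidence of "x -> y" is P(y in basket | x in basket).
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical shopping baskets.
baskets = [["bread", "butter"], ["bread", "butter", "jam"], ["bread"], ["jam"]]
for antecedent, consequent, support, confidence in pair_rules(baskets):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence (here, customers who buy butter also buy bread) is the kind of relationship a retailer could use to drive a next-best-offer recommendation.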
In the Apache Hadoop environment, modelling can be done by writing code in Python, Java or Hive Query Language (HQL). R and Apache Mahout are well-known tools for machine learning, commonly used for pattern mining within large data sets.
As more data sources are introduced and linked in these models to solve broader business problems across functions and business units, it’s important to ensure the models built can be scaled up without inconsistencies or undue complexity. The most important step at this stage is incorporating input from key business process owners on which variables or hypotheses to prioritize when analyzing the data.
Phase 4: Visualize
The final step in the Big Data analytics workflow is visualization: a visual representation of the insights gained from analysis.
After cleaning the data, filtering through enormous amounts of data and replicating the application logic to make the data self-describing, the process continues with the visual mode of representation. Visualizations could include tables, graphs, charts, diagrams, scatter plots, maps, or tag clouds.
7 Pick the Right Tools and Technology for the Requirement
As new Big Data tools and technologies become mainstream in the marketplace, the number of available options can sometimes be perplexing. This can only be a good thing, as there is no “one size fits all” tool or “silver bullet”, and evaluating tools and vendors and their main strengths, weaknesses and trade-offs is less painful with a clear expectation of the technology requirement. A full evaluation of tools is out of scope for this article; however, let’s take a brief look at the different options, beginning with Apache Hadoop.
Apache Hadoop is a popular framework designed to process terabytes and even petabytes of unstructured and structured data. It breaks large workloads into smaller data blocks that are distributed across a cluster of commodity hardware for faster processing.
The Hadoop platform consists of four key components:
- Hadoop Distributed File System – A scalable storage platform that stores data across multiple machines without a defined organization structure
- MapReduce – A software programming model with a parallel data processing engine for large data sets
- YARN – A resource management framework for job scheduling and handling of resources for distributed applications
- Hadoop Common – The utilities and libraries used by other Hadoop modules
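To give a feel for the MapReduce programming model listed above, here is a small single-process Python sketch of the classic word count. It does not use any Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own, and the sketch only mimics the three logical stages that Hadoop would distribute across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


print(word_count(["big data", "Big insights"]))
```

Because each map call looks at one line and each reduce call at one key, the work can be split across many machines, which is what makes the model suitable for very large data sets.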
Besides Apache Hadoop itself, vendors that provide Hadoop distribution platforms include Cloudera, Hortonworks, MapR and Amazon Elastic MapReduce (EMR). With these Hadoop distribution vendors, you receive a commercial version of the Apache Hadoop framework along with additional packaged software components, tools, training and documentation.
In addition to Apache Hadoop and Hadoop distribution vendors, there are commercially available big data suites from big vendors like SAP, SAS, Teradata, Microsoft or IBM. More often than not, the big data suite providers support integration with leading Hadoop distribution platforms or have built a Hadoop solution directly into their existing software portfolio.
While more big-name vendors are now converging on the Hadoop framework for its low-cost benefits, most organisations tend to use a hybrid approach and select tools based on the maturity of the tool and the unique capabilities required.
To make the right choice, I would recommend checking out the different tools and unique capabilities available from Big Data commercial vendors or Hadoop distribution vendors, to understand their strengths, weaknesses and potential trade-offs and identify the best fit for the technology requirement (i.e. volume, variety, velocity and veracity). It is also key to perform a cost-benefit analysis of the enterprise versions and the commercial support offered by the different vendors.
8 Define Data Governance Early
Today, security and privacy concerns are magnified by the velocity, volume and variety of data, as seen in large-scale cloud infrastructures, the diversity of data sources and formats, and the streaming nature of data. While businesses are increasingly adopting Big Data innovations for competitive advantage, it’s important to build resilience against new forms of security threats and to realign the organization’s security, privacy and governance policies, especially when dealing with privacy laws that span countries and continents.
Effectively managing these risks will require companies to revisit governance structures and frameworks to allow for the effective and timely identification and assessment of risks, so that informed risk/reward decisions can be made.
A defined data governance strategy should be considered in the early stages of the planning process, including in the design and evaluation of pilots, to adequately protect information privacy and security.
Big Data Analytics is certainly among the most exciting technology discussions in organisations today. Businesses are positive about embracing analytics and reaping the competitive advantage it offers. The guiding principles outlined above will hopefully help increase the chances of success of Big Data analytics initiatives as businesses harness the opportunity to compete on analytics.