Big data is here to stay. Even technologists who might have previously disparaged it as a buzzword will now acknowledge that the term and its accompanying technologies are evolving into real-world enterprise offerings and data center strategies. This SearchStorage.com big data tutorial is designed to get data storage managers up to speed on the conversations shaping the decisions many IT managers are making about big data technology. An increasing number of data sources -- such as social media -- and a growing number of media-rich data types -- such as X-rays and video -- are fueling the challenges associated with big data at companies that might never have thought of themselves as big data customers. The divide between analytics and storage in the world of big data is narrowing as data storage managers find themselves tasked with designing and managing big data infrastructures. In addition, big data sets that include company-sensitive and personal data have unique security and compliance requirements that managers need to adhere to. View the content in our big data storage tutorial to learn more about these high-transaction environments, new scale-out technologies, rising I/O demands and the latest news on Hadoop.
Big data technologies
Among the big data technologies you'll need to know are the Apache Software Foundation's Java-based Hadoop programming framework, which can run applications on systems with thousands of nodes, and the MapReduce software framework, which consists of a Map function that distributes work to different nodes and a Reduce function that gathers results and resolves them into a single value. Also gaining more attention is the Apache Hive data warehousing component, which offers a query language called HiveQL that translates SQL-like queries into MapReduce jobs automatically. In addition to Hive, more vendors are trying to get in on the Hadoop action with their own SQL Server-Hadoop connector engines. Check out the links below to get more in-depth information about each of these technologies for big data, and how newer technologies are overcoming some of the big data challenges enterprises once faced.
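To make the Map and Reduce roles described above more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the "nodes" are simulated with an ordinary loop. It only illustrates the general pattern: each chunk of input is mapped independently, the intermediate results are grouped by key, and each group is reduced to a single value.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key between the Map and Reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: resolve all values gathered for one key into a single value."""
    return key, sum(values)


def word_count(chunks):
    # Assumption for illustration: on a cluster, each chunk would be handled
    # by a different node. Here the chunks are simply processed in a loop.
    emitted = []
    for chunk in chunks:
        emitted.extend(map_phase(chunk))
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is here", "big data is big"]))
```

A query layer such as HiveQL aims to spare users from writing this kind of code by hand: a SQL-like GROUP BY with a count would express roughly the same job.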
Hadoop can be a cost-effective way to perform analytics on big data, but only with an adequate knowledge of how to use it.
Improvements to Hadoop Distributed File System and proprietary alternatives are beginning to address some of the shortcomings of the framework.
Hadoop may be one of the most notable big data technologies, but Apache Hive is the query language that makes working with that technology easier.
The Hadoop framework started as a way to manage big data, but is starting to bleed into data warehousing as well.
Vendors such as RainStor, GridIron and Quantum produce technologies that aim to address some of the common problems associated with storing big data.
John Webster on big data architecture
John Webster, a senior partner at Evaluator Group based in Boulder, Colo., offers a thorough explanation of how to manage big data storage environments and Apache Hadoop technology, and provides readers with alternatives to DAS in Hadoop storage. This four-part video presentation begins with a high-level discussion of big data architecture and closes with a technical explanation of the Hadoop Distributed File System (HDFS) and NameNode in Hadoop architectures.
Storage platforms for big data are becoming increasingly important because they must not only house a large amount of data but also handle business-critical analytics.
Hadoop is the most talked-about big data framework, though commercial versions are available that aim to address some of its shortcomings.
Using DAS isn't the only option for performing analytics on big data. Here's how to evaluate alternatives to the direct-attached model.
Hadoop's file system uses a "cheap and deep" method, meaning it allows commodity hardware to be scaled to petabytes at low cost.
Big data vendor trends
How do you know what to focus on when it comes to big data? In this section of our big data tutorial, we've selected the most crucial trends coming from the vendor sphere so data storage managers can stay up to date on popular storage options in the industry.
Hadoop connector software acts as a link between databases and Hadoop clusters, and vendors such as IBM and Microsoft are using it as part of their big data strategies.
Hyperscale storage, made popular by Web giants such as Facebook, is growing in adoption when it comes to use cases such as Web serving and database applications.
CIOs: Big data challenges
CIOs have their own big data challenges: Many feel they should take the lead in identifying those patterns that might drive better business decisions. To do this, they'll need to add to their company's big data skill set by hiring data scientists, mathematicians and information architects. Evaluating the business value that big data and analytics can provide is difficult, and building architecture with adequate capacity and performance is a touchy task. Furthermore, convincing the business that big data governance is a concern for the executive suite and even the boardroom is an important issue. Find out more about the challenges CIOs face with big data in this special collection of news and analysis from SearchCIO.com.
Data experts say the human tendency to hoard data and to get stuck in an old way of doing things often prevents businesses from making innovative decisions around big data.
Siloed data has plagued effective analysis of data for years, but in big data environments it becomes even more troublesome.
According to one CIO expert, thinking about big data as a larger version of small data sets can be detrimental to computing and analysis.
By moving away from thinking of big data in terms of size, CIOs can get a better grip on the business value of storing and analyzing data sets.
According to a number of analysts, there are some best practices that can help businesses succeed in the world of big data.
One of the biggest decisions CIOs will have to make when it comes to big data is whether to build or buy their architecture.
Architectures for big data environments are not one-size-fits-all. That means CIOs need to look at what functionality they need when it comes to selecting vendors.
Big data analytics drive business value
Big data analytics can use real-time data to provide insight into business processes and trends, and ultimately can provide a lot of value to enterprises. But performing analytics that produce actionable results requires skilled data analysts and an infrastructure that can handle constant processing. To help you be sure you're extracting the most valuable results from your analytics project, these big data experts explain what to watch out for.
Some experts say it's best to start small when getting started with big data analytics.
Do you know it all when it comes to big data analytics? Take the quiz to find out.
Big data security and compliance issues
Information stored in big data environments is often extremely sensitive. An organization might be storing customer data, financial records or information integral to business processes -- all information that could do significant damage to the organization if compromised. Security is sometimes baked into database offerings, but it often isn't as comprehensive in open source big data technologies that are growing in popularity. For that reason, third-party or additional security measures may need to be taken. In addition, many enterprises analyze big data sets that have specific requirements for privacy and governance, so additional steps may be necessary for compliance. The following links present some common big data security and compliance issues and tips for proactively preventing a breach.
Big data video
Our big data video section offers a behind-the-scenes glimpse into what users and analysts think of what vendors are doing in the space.
Hear one expert's take on how much of the big data boom is just hype, and whether there are real insights to be gained from big data analytics.
Analysts discuss EMC's strategy to focus on big data at its annual EMC World conference in Las Vegas.
Big data doesn't necessarily require a completely new set of tools in order to perform analytics; your existing technologies could get the job done.
Big data glossary
With so many technologies and products associated with big data and analytics, there are a lot of terms to know. Use this glossary for a quick take on some of the most common terms you'll likely come across.