Technologies




A 2011 McKinsey Global Institute report characterizes the main components and ecosystem of big data as follows:

  • Techniques for analyzing data, such as A/B testing, machine learning and natural language processing
  • Big data technologies, like business intelligence, cloud computing and databases
  • Visualization, such as charts, graphs and other displays of the data

Multidimensional big data can also be represented as OLAP data cubes or, mathematically, tensors. Array database systems have set out to provide storage and high-level query support for this data type. Additional technologies being applied to big data include efficient tensor-based computation, such as multilinear subspace learning; massively parallel processing (MPP) databases; search-based applications; data mining; distributed file systems; distributed caches (e.g., burst buffer and Memcached); distributed databases; cloud and HPC-based infrastructure (applications, storage and computing resources); and the Internet. Although many approaches and technologies have been developed, it remains difficult to carry out machine learning with big data.
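As a rough illustration of the cube-as-tensor view, the sketch below builds a small three-dimensional sales cube from fact records and aggregates it along one axis (a "roll-up"). The records, the dimension names (region, product, quarter) and the helper names `build_cube` and `roll_up` are hypothetical and chosen only for this example. Plain nested lists stand in for the array storage that a real array database would provide.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, quarter, sales).
FACTS = [
    ("EU", "laptop", "Q1", 120),
    ("EU", "laptop", "Q2", 90),
    ("EU", "phone", "Q1", 200),
    ("US", "laptop", "Q1", 150),
    ("US", "phone", "Q2", 310),
]


def build_cube(facts):
    """Return (axes, cube): sorted labels per dimension and a dense 3-D tensor."""
    # One sorted label list per dimension, e.g. axes[0] == ["EU", "US"].
    axes = [sorted({row[i] for row in facts}) for i in range(3)]
    # Map each label to its position along its axis.
    index = [{label: pos for pos, label in enumerate(axis)} for axis in axes]
    # Dense tensor initialised to zero; missing combinations stay at 0.
    cube = [[[0 for _ in axes[2]] for _ in axes[1]] for _ in axes[0]]
    for region, product, quarter, sales in facts:
        cube[index[0][region]][index[1][product]][index[2][quarter]] += sales
    return axes, cube


def roll_up(axes, cube, drop_axis):
    """Sum the cube along drop_axis, keyed by the labels of the other two axes."""
    totals = defaultdict(int)
    for i, a in enumerate(axes[0]):
        for j, b in enumerate(axes[1]):
            for k, c in enumerate(axes[2]):
                labels = [a, b, c]
                del labels[drop_axis]  # discard the aggregated dimension
                totals[tuple(labels)] += cube[i][j][k]
    return dict(totals)


axes, cube = build_cube(FACTS)
by_region_product = roll_up(axes, cube, drop_axis=2)  # aggregate over quarters
by_product_quarter = roll_up(axes, cube, drop_axis=0)  # aggregate over regions
```

A dense layout keeps the indexing simple but stores a cell for every combination of labels, so sparse real-world cubes would normally call for a sparse or chunked representation.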

Some MPP relational databases have the ability to store and manage petabytes of data. Implicit is the ability to load, monitor, back up, and optimize the use of the large data tables in the RDBMS.

DARPA's Topological Data Analysis program seeks the fundamental structure of massive data sets, and in 2008 the technology went public with the launch of a company called Ayasdi.

Practitioners of big data analytics are generally hostile to slower shared storage, preferring direct-attached storage (DAS) in its various forms, from solid-state drives (SSD) to high-capacity SATA disks buried inside parallel processing nodes. Shared storage architectures, namely storage area networks (SAN) and network-attached storage (NAS), are perceived as relatively slow, complex, and expensive. These qualities are not consistent with big data analytics systems, which thrive on system performance, commodity infrastructure, and low cost.

Real or near-real-time information delivery is one of the defining characteristics of big data analytics. Latency is therefore avoided whenever and wherever possible. Data in direct-attached memory or disk is good; data in memory or on disk at the other end of a Fibre Channel (FC) SAN connection is not. The cost of a SAN at the scale needed for analytics applications is much higher than that of other storage techniques.

There are advantages as well as disadvantages to shared storage in big data analytics, but as of 2011 big data analytics practitioners did not favour it.
