Related projects
Other Hadoop-related projects at Apache include:
- Ambari™: A web-based tool for provisioning,
managing, and monitoring Apache Hadoop clusters, which includes
support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase,
ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard
for viewing cluster health, such as heatmaps, and the ability to view
MapReduce, Pig and Hive applications visually, along with features to
diagnose their performance characteristics in a user-friendly
manner.
- Avro™: A data serialization system.
- Cassandra™: A scalable multi-master database
with no single point of failure.
- Chukwa™: A data collection system for managing
large distributed systems.
- HBase™: A scalable, distributed database that
supports structured data storage for large tables.
- Hive™: A data warehouse infrastructure that provides
data summarization and ad hoc querying.
- Mahout™: A scalable machine learning and data
mining library.
- Ozone™: A scalable, redundant, and
distributed object store for Hadoop.
- Pig™: A high-level data-flow language and execution
framework for parallel computation.
- Spark™: A fast and general compute engine for
Hadoop data. Spark provides a simple and expressive programming
model that supports a wide range of applications, including ETL,
machine learning, stream processing, and graph computation.
- Submarine: A unified AI platform that allows
engineers and data scientists to run machine learning and deep learning workloads on a
distributed cluster.
- Tez™: A generalized data-flow programming framework,
built on Hadoop YARN, which provides a powerful and flexible engine
to execute an arbitrary DAG of tasks to process data for both batch
and interactive use cases. Tez is being adopted by Hive™, Pig™ and
other frameworks in the Hadoop ecosystem, and also by
commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce
as the underlying execution engine.
- ZooKeeper™: A high-performance coordination
service for distributed applications.
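
To make the "arbitrary DAG of tasks" idea in the Tez entry concrete, here is a minimal single-process sketch in Python. It is not the Tez API (Tez is a Java framework that runs on YARN); the helper run_dag and the task names are hypothetical, chosen only to show how tasks can run in dependency order with each task receiving the outputs of its predecessors.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps):
    """Run every task after its dependencies and return results by task name.

    tasks: maps a task name to a callable (hypothetical illustration).
    deps:  maps a task name to the ordered list of tasks it depends on.
    """
    results = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps.get(name, ())]
        results[name] = tasks[name](*inputs)
    return results


# A small word-count pipeline with a fan-out and a fan-in stage.
tasks = {
    "load": lambda: ["a b", "b c"],
    "tokenize": lambda lines: [w for line in lines for w in line.split()],
    "count": lambda words: {w: words.count(w) for w in set(words)},
    "distinct": lambda words: sorted(set(words)),
    "report": lambda counts, distinct: [(w, counts[w]) for w in distinct],
}
deps = {
    "load": [],
    "tokenize": ["load"],
    "count": ["tokenize"],
    "distinct": ["tokenize"],
    "report": ["count", "distinct"],
}

results = run_dag(tasks, deps)
print(results["report"])
```

In this sketch "count" and "distinct" both depend on "tokenize" and could in principle run in parallel; a real engine would schedule such independent stages across a cluster rather than run them sequentially in one process.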