Hadoop: Past, Present and Future - v2.1 - SQLSaturday #340

  • Published on
    02-Dec-2014

DESCRIPTION

Presentation given at a session for SQLSaturday #340 on 09/20/14.

Transcript

  • 1. HADOOP: PAST, PRESENT AND FUTURE BIG DATA INTELLIGENCE PRACTICE 2014 Trace3, All rights reserved.
  • 2. Roadmap (~1 hour): 1. What Makes Up Hadoop 1.x? 2. What's New In Hadoop 2.x? 3. The Future Of Hadoop
  • 3. WHAT MAKES UP HADOOP 1.0?
  • 4. What's a Node? Node (aka Server): Operating System, Compute, Storage, Memory
  • 5. Hadoop 1.0: HDFS + MapReduce. NameNode; JobTracker; DataNode / TaskTracker (×4); Client: 1-1, 1-2, 1-3
  • 6. Hadoop 1.0: HDFS + MapReduce. NameNode; JobTracker; DataNode / TaskTracker (×4); Map; Reduce; Client. Diagram labels: 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3, 4-1, 4-2, 4-3
  • 7. MapReduce v1 Limitations. Scalability: maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000. Availability: JobTracker failure kills all queued and running jobs. Resources Partitioned into Map and Reduce: hard partitioning of Map and Reduce slots led to low resource utilization. No Support for Alternate Paradigms / Services: only MapReduce batch jobs, nothing else.
  • 8. Hadoop 1.0: Single Use System. Batch Apps: Pig, Hive. MapReduce (cluster resource management and data processing). HDFS (redundant, reliable storage).
  • 9. WHAT'S NEW IN HADOOP 2.0?
  • 10. YARN: Yet Another Resource Negotiator. YARN replaces MapReduce for cluster resource management. YARN will be the de-facto distributed operating system for Big Data.
  • 11. YARN = BIG DATA
  • 12. YARN: No Longer Just Batch Apps. Store DATA in one place. Interact with that data in MULTIPLE WAYS with predictable performance and quality of service. Applications run natively IN Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), ONLINE (HBase), STREAMING (DataTorrent), GRAPH (Giraph). YARN (cluster resource management). HDFS2 (redundant, reliable storage).
  • 13. YARN: Applications. Running all on the same Hadoop cluster to give applications access to all the same source data! MapReduce v2; Stream Processing; Master-Worker; Online; In-Memory; Apache Storm.
  • 14. YARN: Quickly Maturing. Timeline (2010, 2011, 2012, 2013, 2014, Today): Conceived at Yahoo!; Alpha Releases 2.0; Beta Releases 2.1; GA Released 2.2; Version 2.3; Version 2.4. 200,000+ nodes, 800,000+ jobs daily, 10 million+ hours of compute daily.
  • 15. YARN: What Has Changed? MRv1 to YARN: JobTracker (JT) becomes ResourceManager (RM) plus ApplicationMaster (AM); Scheduler stays Scheduler; TaskTracker (TT) becomes NodeManager (NM); Map & Reduce Slot becomes Container. Diagram: ResourceManager (Scheduler) with NodeManagers hosting an ApplicationMaster and Containers, versus JobTracker (Scheduler) with TaskTrackers hosting Map and Reduce slots.
  • 16. The 6 Benefits Of YARN: Scale; New programming models and services; Improved cluster utilization; Agility; Backwards compatible with MapReduce v1; Mixed workloads on the same source of data.
  • 17. THE FUTURE OF HADOOP
  • 18. SQL on Hadoop. Speed: deliver interactive query performance. SQL: support an array of SQL semantics for analytic applications running against Hadoop. Scale: SQL interface to Hadoop designed for queries that scale from terabytes to petabytes.
  • 19. SQL on Hadoop. Hive on Apache Tez (Hortonworks HDP2); Hive on Apache Spark (Cloudera CDH5); Apache Drill (MapR M7); Cloudera Impala (Cloudera CDH5); Pivotal HAWQ (Pivotal Big Data Suite).
  • 20. HOYA: HBase (NoSQL) on YARN. Dynamic Scaling: on-demand cluster size; increase and decrease the size with load. Easier Deployment: APIs to create, start, stop and delete HBase clusters. Availability: recover from Region Server loss with a new container.
  • 21. Microsoft REEF (Retainable Evaluator Execution Framework). Machine Learning: framework well suited for building machine learning jobs. Scalable / Fault Tolerant: makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models. Maintain State: users can build jobs that utilize data from where it's needed and also maintain state after jobs are done.
  • 22. Heterogeneous Storage. THEN: NameNode, Storage. NOW: NameNode, SATA, SSD, Fusion IO.
  • 23. Hadoop Roadmap. Apache Hadoop 2.5 (Q3 2014): NodeManager restart w/o disruption; dynamic resource configuration. Apache Hadoop 2.6 (Q4 2014): memory as storage tier; support for Docker containers.
  • 24. HADOOP: PAST, PRESENT & FUTURE. I KNOW YOU HAVE QUESTIONS
  • 25. One More Thing: BigDataCentric, an Interactive Big Data Community. Big Data Meetups: SD Big Data Meetup, San Diego, CA, meetup.com/sdbigdata, 2nd Wednesday of every month; OC Big Data Meetup, Irvine, CA, meetup.com/ocbigdata, 3rd Wednesday of every month. BigDataCentric.com: Forum / Community, Events / Blogs.
  • 26. THANK YOU! http://bigdatajoe.io/ @bigdatajoerossi bigdatajoerossi@gmail.com
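
Slide 7 says hard partitioning of Map and Reduce slots led to low resource utilization, and slide 15 shows YARN replacing slots with generic containers. The sketch below is not from the presentation; it is a toy model of that difference, assuming uniform task sizes and a hypothetical group of nodes with 8 map slots and 4 reduce slots (or 12 interchangeable containers). Real YARN containers are sized by resources such as memory, so treat the numbers as illustrative only.

```python
def slot_utilization(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """MRv1-style model: map tasks can only use map slots,
    reduce tasks can only use reduce slots."""
    used = min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)
    return used / (map_slots + reduce_slots)


def container_utilization(containers, map_tasks, reduce_tasks):
    """YARN-style model: any pending task may take any free container."""
    used = min(containers, map_tasks + reduce_tasks)
    return used / containers


# Hypothetical map-heavy phase: 12 pending map tasks, no reduce tasks yet.
mrv1 = slot_utilization(map_slots=8, reduce_slots=4, map_tasks=12, reduce_tasks=0)
yarn = container_utilization(containers=12, map_tasks=12, reduce_tasks=0)

print(f"MRv1 fixed slots: {mrv1:.0%} busy")
print(f"YARN containers:  {yarn:.0%} busy")
```

In this model the four reduce slots sit idle during the map-heavy phase while the container pool is fully used, which is the utilization gap the slides describe.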
