Presto for the Enterprise @ Hadoop Meetup

  • Published on
    12-Apr-2017


Transcript

Warsaw Hadoop User Group
Wojciech Biela, Łukasz Osipiuk
www.teradata.com/presto
© 2014 Teradata

Who are we? - Teradata Center for Hadoop

History of Teradata Center for Hadoop:

  • Formerly Hadapt, founded in July 2010 by Justin Borgman, Kamil Bajda-Pawlikowski, and Daniel Abadi
  • Pioneered the SQL-on-Hadoop market
  • Based on work done by the database research group in the Yale Computer Science Department
  • A hybrid of Hadoop scalability and DBMS performance

Today:

  • Acquired by Teradata in July 2014 and renamed Teradata Center for Hadoop
  • 20+ developers with deep Hadoop and database expertise
  • Headquarters in Boston, MA; teams in the US (MA, CA) and Poland (Warsaw)
  • Contributors to the open source project Presto

Speaker notes - SQL feature status

Supported SQL features:

  • Aggregation, scalar functions, the most popular joins
  • Limited subqueries
  • Approximate queries (APPROXIMATE AT CONFIDENCE)
  • Simple data types; structural data types: row, map, array

Not supported SQL features:

  • The keyword end used as a column name cannot appear in a projection unless wrapped in quotes
  • TPC-H query 6 returns an invalid result: there is no support for the NUMERIC / DECIMAL type, which causes rounding errors (work in progress)
  • NATURAL JOIN
  • Scalar subqueries: where x = (select y from ...)
  • Correlated subqueries
  • Non-equi joins are only supported for inner joins: "n_name" < "p_name"
  • EXISTS, EXCEPT, INTERSECT
  • DROP TABLE IF EXISTS (errors out with a confusing message: "mismatched input 'exists' expecting {, '.'}")
  • SEMI JOIN
  • ROLLUP
  • LIMIT ALL and the OFFSET clause

Talk Agenda

  • What is Presto?
  • What is Teradata doing?
  • Can I see a Demo?
  • How can I contribute?
What is Presto?

  • 100% open source distributed ANSI SQL engine for Big Data
  • Modern code base
  • Proven scalability
  • Optimized for low-latency, interactive querying
  • Cross-platform query capability, not only SQL on Hadoop
  • Distributed under the Apache license, now supported by Teradata
  • Used by a community of well known, well respected technology companies
History of Presto

  • FALL 2008: Facebook open sources Hive
  • FALL 2012: 6 developers start Presto development
  • SPRING 2013: Presto rolled out within Facebook
  • FALL 2013: Facebook open sources Presto
  • FALL 2014: 88 releases, 41 contributors, 3943 commits
  • SPRING 2015: 98 releases, 65 contributors, 4587 commits; Teradata joins the Presto community and offers support

Query Execution

Diagram: a Client submits queries to the Coordinator (Parser/analyzer, Planner, Scheduler), which drives the Workers; the pluggable Data Location API, Metadata API, and Data Stream API connect the engine to external data sources.

Speaker notes - engine overview:

  • Classic MPP database engine
  • External data, no internal metadata
  • Single coordinator, multiple workers; worker autodiscovery
  • Java 8
  • In-memory vector processing; data is read in vector form where possible
  • Only a subset of the data is read: partitions, predicate pushdown (SQL, ORC)
  • ANSI SQL compliant (a subset)
  • Plan -> bytecode generation
  • Data pipelining (no spill to disk), no checkpointing (no failover)
  • Client (CLI / poor man's JDBC)
  • Cluster management through the REST API and SQL (sys schema)

Speaker notes - client:

  • The client sends the query over HTTP to the coordinator
  • Clients: presto-cli, jdbc (needs work), odbc (needs work); community contributions: R, Python, Ruby, ...
  • HTTP RESTful API; pretty stable; JSON based
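The client notes above describe Presto's HTTP/JSON protocol: the client POSTs SQL text to the coordinator's /v1/statement endpoint and then polls the returned nextUri until it disappears, accumulating rows along the way. A minimal sketch, assuming a coordinator reachable at the given URL; the helper names are ours, only the endpoint, the X-Presto-User header, and the nextUri/data fields come from the protocol:

```python
import json
import urllib.request

def start_query(coordinator_url, sql, user="demo"):
    """POST the SQL text to /v1/statement; returns the first JSON payload."""
    req = urllib.request.Request(
        coordinator_url + "/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Presto-User": user},  # identifies the submitting user
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def collect_rows(payloads):
    """Accumulate rows from the sequence of JSON payloads a client sees
    while following 'nextUri'; early payloads may carry no 'data' yet."""
    rows = []
    for payload in payloads:
        rows.extend(payload.get("data", []))
    return rows
```

A real client keeps GETting payload["nextUri"] until the field is absent; collect_rows is kept pure so that polling loop stays a trivial wrapper.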
Speaker notes - query parsing and analysis:

  • ANSI SQL plus some rarer features such as APPROXIMATE AT CONFIDENCE
  • Written from scratch, ANTLR based
  • No catalog: all metadata is provided by connectors
  • Limited support for extending the data type universe
  • Query analysis: assign types to entities

Speaker notes - planner (QueryPlanner class):

  • LogicalPlanner.plan(analysis) -> logical plan
  • PlanFragmenter.createSubPlans(plan) -> fragmented plan
  • A SubPlan holds a plan fragment and a list of child SubPlans
  • It defines the data flow from parent to children (replication, hashing, round robin)

Logical and fragmented plan - example query:

select shipdate, count(*) count, cast(sum(extendedprice) as bigint) price
from h_lineitem
where returnflag = 'R'
group by shipdate
order by count
limit 20

Logical and fragmented plan - cross-connector example:

select *
from hive.default.h_nation, psql.public.p_region
where h_nation.regionkey = p_region.regionkey;

Speaker notes - scheduler:

  • Responsible for assigning nodes to the tasks of the fragmented plan
  • Takes data location into consideration (code needs rework)
  • Monitors task execution and drives the execution process
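The locality preference in the scheduler notes can be illustrated with a tiny assignment function. This is a sketch of the idea only, with invented names and data shapes, not Presto's actual scheduler classes: each split prefers a worker that already holds its data and otherwise falls back to round robin.

```python
from itertools import cycle

def assign_splits(splits, workers):
    """Assign each split to a worker, preferring a worker that already
    holds the split's data; otherwise fall back to round robin.

    splits:  list of (split_id, preferred_hosts) pairs
    workers: list of worker host names
    """
    fallback = cycle(workers)          # round-robin source for non-local splits
    assignment = {}
    for split_id, preferred in splits:
        local = [w for w in workers if w in preferred]
        assignment[split_id] = local[0] if local else next(fallback)
    return assignment
```

Usage: assign_splits([("s1", {"nodeB"}), ("s2", set())], ["nodeA", "nodeB"]) places s1 on nodeB (local read) and s2 on the next round-robin worker.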
Speaker notes - data stream:

  • Data is exposed by the connector and packaged into a sequence of in-memory pages
  • Each page has a vectorized structure: a sequence of Blocks, where each column is a block
  • There are different block implementations for different data types and phases of processing
  • Workers communicate over HTTP: not JSON, but an efficient binary protocol
  • Requests are huge, so the transport layer is not a bottleneck

Diagram: a stream of pages (page 1, page 2, ...), each composed of blocks (blockA, blockB, ...).
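The page/block layout described above can be sketched as a columnar batch: one list per column, all of equal length. This is a toy model to show the shape of the data, not Presto's actual Block implementations:

```python
class Page:
    """A batch of rows stored column-wise: one block (here, a list) per column."""

    def __init__(self, *blocks):
        assert len({len(b) for b in blocks}) == 1, "all blocks must have the same row count"
        self.blocks = list(blocks)

    @property
    def row_count(self):
        return len(self.blocks[0])

    def column(self, i):
        """Whole-column access: the fast path for vectorized operators."""
        return self.blocks[i]

    def row(self, i):
        # Row access cuts across every block; engines avoid this in hot loops.
        return tuple(block[i] for block in self.blocks)
```

Operators then work a block at a time (e.g. sum(page.column(1))) rather than row by row, which is what makes vectorized processing cache-friendly.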
Plan execution - Hive vs Presto

Diagram: Hive runs a chain of MapReduce jobs with I/O between every stage; Presto runs a pipeline of tasks with I/O only at the edges.

Speaker notes - MapReduce:

  • Synchronization points between stages
  • The stage result is dumped to disk after each map/reduce task
  • Failover support (computation resumes from the disk snapshot)

Speaker notes - Presto:

  • Data streams between tasks
  • After the initial read from storage (depending on the connector), data stays in memory
  • Some operations require all data to fit in memory (joins)
  • No checkpoints (a simpler architecture) -> no failover

Presto Extensibility - plugins

  • Connectors
  • Data types
  • Extra functions
  • (new) Security providers

Presto Extensibility - connector interfaces

Diagram: the Data Location API, Metadata API, and Data Stream API are each implemented by the Hive, Cassandra, Kafka, and MySQL connectors; the parser/analyzer, planner, scheduler, and workers sit on top.

Speaker notes - connectors:

  • Other connectors: PostgreSQL / generic SQL, Raptor, Teradata (in progress)
  • The Hive/HDFS connector is optimized to read ORC and RCFile; other formats are available via Hive storage handlers (with SerDe overhead)

The connector interface:

public interface Connector
{
    ConnectorHandleResolver getHandleResolver();
    ConnectorMetadata getMetadata();
    ConnectorSplitManager getSplitManager();
    ConnectorPageSourceProvider getPageSourceProvider();
    ConnectorRecordSetProvider getRecordSetProvider();
    ConnectorPageSinkProvider getPageSinkProvider();
    ConnectorRecordSinkProvider getRecordSinkProvider();
    ConnectorIndexResolver getIndexResolver();
    Set<SystemTable> getSystemTables();
    List<PropertyMetadata<?>> getTableProperties();
    ConnectorAccessControl getAccessControl();
    void shutdown();
}

Presto = Performance

  • Data stays in memory during execution and is pipelined across nodes MPP-style
  • Vectorized columnar processing
  • Presto is highly tuned Java: efficient in-memory data structures, very careful coding of the inner loops, bytecode generation
  • Optimized ORC reader
  • Predicate push-down
  • Query optimizer

Speaker notes:

  • Designed for short queries, but still OLAP, not OLTP
  • Use Hive for batch/ETL-type jobs
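The plan-execution contrast above (MapReduce materializes every stage to disk, Presto streams data between tasks and stops early when it can) can be mimicked with Python generators. Purely an illustration of the pipelining idea, not Presto code:

```python
def scan(rows):
    for row in rows:            # initial read from storage
        yield row

def filter_stage(rows, predicate):
    for row in rows:            # consumes upstream rows as they arrive,
        if predicate(row):      # never materializing a whole stage
            yield row

def limit(rows, n):
    for i, row in enumerate(rows):
        if i >= n:
            return              # downstream can stop the whole pipeline early
        yield row

# Rows flow end-to-end with no per-stage materialization, and the limit
# stops the scan after a handful of rows - analogous to Presto streaming
# pages between tasks instead of checkpointing each stage to disk.
result = list(limit(filter_stage(scan(range(100)), lambda r: r % 2 == 0), 3))
```

The flip side, as the notes say, is that with no per-stage snapshots there is nothing to resume from, so a failed query simply reruns.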
Presto in Production

Facebook:

  • Multiple production clusters (100s of nodes in total), including the 300PB Hadoop data warehouse
  • 1000s of internal daily active users
  • Millions of queries each month
  • Multiple PBs scanned every day; trillions of rows a day

Netflix:

  • Over 200-node production cluster on EC2
  • Over 15 PB in S3 (Parquet format)
  • Over 300 users and 2.5K queries daily

What is Teradata Doing?

  • 100% open source contributions to Presto to increase adoption in the enterprise
  • A multi-year roadmap commitment to phased enhancements of the open source code
  • The first ever commercial support offering for Presto: Teradata Certified Presto, www.teradata.com/presto

Why is Teradata Contributing to Presto?

  • Hadoop distro agnostic
  • Modern code base: Presto is well-designed open source software with a proper database architecture
  • Strong, like-minded community
  • Push-down processing across multiple data platforms
  • Leverage Teradata expertise to make SQL for Hadoop viable

Teradata Contributions to Presto - roadmap phases:

  • Phase 1 - Implement (June 8, 2015): installer, documentation, monitoring & support tools
  • Phase 2 - Integrate (Q4 2015): ODBC / JDBC drivers, BI certification, security
  • Phase 3 - Proliferate (2016): connectors, commercial support
  • Also on the roadmap: expanding ANSI SQL coverage, management tools integration, YARN integration

Teradata's Contributions so far:

  • Ease of install and management via the Presto-Admin tool: www.github.com/prestodb/presto-admin
  • Packaging Presto as an RPM
  • Testing framework for Presto: www.github.com/prestodb/tempto
  • Added a large number of tests
  • JDBC driver for Java 6
  • Various SQL improvements

Teradata's Contribution Product Roadmap:

  • Continued SQL improvements
  • Security: authentication & authorization
  • More connectors, e.g. HBase
  • ODBC & JDBC drivers that actually work
  • BI tool certifications, e.g. Tableau
  • YARN integration
  • Ambari integration
  • Open source our Docker-based dev environment (work in progress)
  • Open our continuous integration platform to the community

Teradata Engineers Dedicated to Presto

Early Feedback is Extremely Positive

"Presto is an integral part of the Airbnb data infrastructure stack, with hundreds of employees running queries each day with the technology. We are excited to see Teradata joining the Presto open source community and are encouraged by the direction of their contributions." - James Mayfield, product lead, Airbnb

"We are excited to see Teradata's commitment to Presto and adding capabilities in the open source domain. This will create interesting opportunities within our technical and business teams to open up more access options to our critical data. We think this is a positive for Teradata and for the community as a whole." - Steve Deasy, vice president of Engineering, Groupon

Demo Time!

How can I contribute?

  • www.github.com/facebook/presto
  • www.github.com/prestodb
  • Certified distro: www.teradata.com/presto
  • Website: www.prestodb.io
  • Presto Users Group: www.groups.google.com/group/presto-users
  • Facebook page: www.facebook.com/prestodb
  • Twitter: #prestodb

Join us! - We're hiring

  • Warsaw and Boston
  • Job offer: bit.do/presto
  • Contact: Wojciech.Biela@teradata.com

Presto 101t certified by Teradata - available for download:

  • Presto 101t server, CLI, JDBC
  • Presto-Admin 0.1
  • Documentation
  • HDP w/ Presto VM sandbox
  • CDH w/ Presto VM sandbox
  • www.teradata.com/presto

Contact: Wojciech.Biela@teradata.com, Lukasz.Osipiuk@teradata.com