Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)

Hello, Enterprise! Meet Presto
Teradata Contributions to Presto
10/6/15
Christina Wallin

Who are we?

  • Teradata Center for Hadoop
  • Formerly Hadapt, the first SQL-on-Hadoop company (founded in 2010)
  • Offices in Boston and Warsaw, some remote employees in CA and CT
  • Around 20 employees working on Presto
  • Contributors to the open source project Presto!

What is Presto?

  • 100% open source distributed ANSI SQL engine for Big Data
  • Modern architecture and implementation
  • Proven scalability and performance
  • Optimized for low latency, interactive querying
  • Cross platform query capability, not only SQL on Hadoop
  • Distributed under the Apache license, now supported by Teradata
  • Used by a community of well known, well respected technology companies

  • Interactive performance of execution engine
  • Code generation for operators (similarly to Impala)
  • Data is pipelined MPP-style
  • Runs at Facebook scale
  • Capable of querying other non-HDFS data stores as well

2014 Teradata

Presto Architecture

[Diagram: a Client; a Coordinator containing the Parser/analyzer, Planner, and Scheduler; three Workers]

Presto Pluggable Data Sources Capabilities

[Diagram: Presto with push-down to Hadoop systems (HDFS), push-down to other databases, push-down to NoSQL databases, and Kafka]
Teradata Contributions to Presto

  • Phase 1 (Implement): June 8, 2015
  • Phase 2 (Integrate): Q4 2015
  • Phase 3 (Proliferate): 2016

Roadmap items, in slide order:

  • Installer
  • Documentation
  • Monitoring & Support Tools
  • Management Tool Integration
  • YARN Integration
  • ODBC Driver
  • JDBC Driver
  • BI Certification
  • Security
  • Connectors
  • Commercial Support
  • Expanding ANSI SQL Coverage

Easy Installation and Administration

presto-admin: a tool to manage and install Presto

presto-admin can:

  • Install and uninstall Presto
  • Deploy configuration files across the cluster
  • Start/stop/restart Presto servers
  • Show you the status of the cluster
  • Add and remove connectors
  • Upgrade Presto to a different version
  • Collect logs, query info, system info for support

Additionally, we added an RPM for Presto.

https://github.com/prestodb/presto-admin

Hadoop Ecosystem Integration

Ambari Integration (Work In Progress)

http://github.com/prestodb/ambari-presto-service

Resource Allocation with YARN

  • Slated for Q4 2015
  • Allow Presto to run its services within YARN containers so that YARN knows about memory/CPU allocated to Presto
  • Using Apache Slider
  • The allocation is fixed and upfront
  • Supports HDP and CDH Hadoop versions
  • YARN CGroups integration

http://github.com/prestodb/presto-yarn

The objective of the Presto-YARN integration is resource allocation for long-running services. In addition, where Presto and Hadoop share the same hardware (or cluster), YARN integration provides a unified way of accounting for and monitoring cluster utilization.

The goal is to be transparent to YARN about how much RAM and CPU is allocated to Presto, so that YARN knows less is available to other YARN applications (MapReduce, Tez, etc.).

The allocation is fixed and upfront: no dynamic changes to resource allocation are supported for Phase 2. Reconfiguring memory or CPU settings requires a restart.

YARN has introduced support for CPU sharing (via CGroups). Currently, CGroups is used only for limiting CPU usage, so we will leverage it to limit Presto's CPU usage.
(Slider also has some CPU resource sharing support.)

Apache Slider is a YARN application that deploys existing distributed applications on YARN, monitors them, and makes them larger or smaller as desired. Slider's objective is to make it easy for existing distributed applications, like Presto, to be deployed on a YARN cluster without changes and with little or no custom code.

Enterprise Database Features

Unleashing Presto on Business Intelligence Tools

  • Improved ODBC driver: Q4 2015
  • Improved JDBC driver: Q1 2016
  • Certification against Tableau, Qlik, etc.: mid 2016

Expanded ANSI SQL Support

Current contributions:

  • DECIMAL type (WIP)
  • Additional smaller things: new functions, bug fixes, TIMESTAMP support for Parquet

Future goal: support TPC-H and TPC-DS unmodified!

  • Additional subquery and join support
  • EXISTS, EXCEPT, INTERSECT
  • Various other odds and ends

Demo of presto-admin!

  • Untar presto-admin and install it
  • ./presto-admin server install presto-server-rpm.rpm
  • ./presto-admin server start
  • Pause briefly so that the coordinator finds the workers
  • ./presto-admin server status
  • ./presto-admin configuration show
  • cat hive.properties
  • mv hive.properties /opt/prestoadmin/connectors
  • ./presto-admin connector add hive
  • ./presto-admin server restart
  • Wait
  • ./presto-admin server status
  • Presto CLI: ./presto --server localhost:8080 --catalog hive --schema default
  • show tables;
  • create table lineitem as select * from tpch.1gb.lineitem;
  • select count(*) from lineitem;

How can I give Presto a try?

  • https://github.com/facebook/presto
  • https://github.com/prestodb/presto-admin
  • Certified distro: http://www.teradata.com/presto/
  • You can also download VM images pre-installed with Presto

Questions?
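
For anyone who wants to rehearse the presto-admin demo shown earlier in the deck, the shell steps could be collected into a small dry-run helper. This is only a sketch: the command strings come from the demo slide, while the function names `demo_commands` and `render` and their parameters are hypothetical and not part of presto-admin. It prints the commands rather than executing them, and it leaves out the manual pauses.

```python
import shlex

# Path to the presto-admin launcher, as used on the demo slide.
PRESTO_ADMIN = "./presto-admin"


def demo_commands(rpm="presto-server-rpm.rpm", connector="hive",
                  connectors_dir="/opt/prestoadmin/connectors"):
    """Return the demo's shell commands, in order, as argument lists.

    Hypothetical helper: it only assembles the commands shown in the demo.
    The pauses between steps are manual and are not represented here.
    """
    properties_file = f"{connector}.properties"
    return [
        [PRESTO_ADMIN, "server", "install", rpm],
        [PRESTO_ADMIN, "server", "start"],
        [PRESTO_ADMIN, "server", "status"],
        [PRESTO_ADMIN, "configuration", "show"],
        ["cat", properties_file],
        ["mv", properties_file, connectors_dir],
        [PRESTO_ADMIN, "connector", "add", connector],
        [PRESTO_ADMIN, "server", "restart"],
        [PRESTO_ADMIN, "server", "status"],
    ]


def render(commands):
    """Join each argument list into one shell-quoted line."""
    return "\n".join(shlex.join(command) for command in commands)


if __name__ == "__main__":
    # Dry run: print the command sequence instead of executing it.
    print(render(demo_commands()))
```

Keeping each command as an argument list rather than a single string means the same list could later be handed to `subprocess.run` without re-parsing, once the printed sequence has been checked against a real cluster.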