Presto Meetup @ Facebook (2014-05-14)

  • Published on
    21-Apr-2017

Transcript

Presto
Past, Present, and Future
Dain Sundstrom

SELECT now() - INTERVAL 6 MONTH

By The Numbers
  • 6 months
  • 15 releases
  • 30 contributors
  • 662 commits
  • 1406 files changed
  • 130,305 insertions(+), 43,699 deletions(-)

New SQL Features
  • Create table
  • Distinct aggregations
  • Cross joins
  • Custom functions

Optimizations
  • Range predicate push down
  • Distributed aggregations
  • Distributed window functions
  • Distinct-limit optimization
  • Approximate queries

Type System
  • Plugins can add new scalar types
  • Extensible operators
  • DATE, TIME, TIMESTAMP and INTERVAL
  • Time zones with DST rules
  • Localized parse and format
  • HyperLogLog type

New Connectors
  • Hadoop 1.x
  • Hadoop 2.x
  • CDH 5
  • Custom S3 integration for Hadoop
  • Cassandra
  • TPC-H

SELECT now()

Hive 0.13 Support
  • New file formats
      • ORC
      • Parquet
      • DWRF
  • Vectorized ORC (2-3x more efficient)
  • ORC stripe skipping

Index Joins
  • Targeting low cardinality joins
  • Lazy hash build
  • Predicate push down
  • Aggregation push down
  • Initial version is already checked in
  • Currently supported in HBase and MySQL

Connectors
  • HBase
      • Requires features in Facebook HBase
      • Index joins
  • JDBC (MySQL)
      • Sharding
      • Index joins

Views
  • Create/drop views
  • View definition stored in connector
  • Fully optimized by Presto
  • Views stored in Presto syntax
  • Not compatible with existing Hive views

Machine Learning
  • Supports classification and regression
  • Multiple algorithms (Currently only SVM)
  • Feature extraction and normalization
  • New functions and types
  • Possibly extend SQL grammar
  • Highly experimental

Continuous Integration
  • Continuous correctness testing
  • Run queries against prod and trunk
  • Continuous benchmark
  • Run full test suite with every connector
  • Faster release cycle

SELECT now() + INTERVAL 1 YEAR
APPROXIMATE AT 95.0 CONFIDENCE

SQL Features
  • Structs, Maps and Lists
  • Table generating functions
  • Scalar sub queries
  • Features required to run all TPC-DS
  • Create table with partitioning
  • Possibly: Insert, delete, drop partition

Execution Engine
  • Huge joins and aggregations
  • Hash distributed
  • Co-distributed and co-partitioned
  • Spill to disk (flash)
  • Work stealing
  • Basic task recovery

Native Store
  • Stores data directly on worker nodes
  • Uses custom data format
  • Initial use cases
      • Store for hot data
      • Store for live data
  • Support co-distributed data

Security
  • Authentication
      • Username/password, Kerberos, SSL cert
  • Authorization
      • Integration with plugins
      • Grant permissions from SQL

New REST API
  • Prepared statements
  • Bound parameters
  • Server managed sessions
  • Explicit support for non-query (DML/DDL)
  • Split query submission, stats, and data fetching

ODBC Driver
  • Targeting major BI tools
  • Tableau, MicroStrategy and Excel
  • Support for Windows, Mac and Linux
  • Will require new REST API
  • Written in D
  • Entirely open source (ASL2)

Plugins
  • Plugin repository
  • Manage plugins from CLI
  • Function catalogs
  • Push down joins and aggregations
  • Custom optimizers

SELECT question
FROM audience
WHERE isAwesome(question)

(c) 2007 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved.
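
Editor's sketch for the Index Joins slide: the slide only lists "low cardinality joins" and "lazy hash build", so the following is an illustrative guess at the idea, not Presto's implementation. It assumes a connector that can answer key lookups (modelled by the hypothetical `IndexedTable` class) and shows a join that fetches only the keys seen on the probe side, caching each lookup instead of hashing the whole table up front. All names here (`IndexedTable`, `index_join`, `lookups`) are invented for illustration.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical stand-in for a connector that supports lookups by key."""

    def __init__(self, rows, key):
        self._index = defaultdict(list)
        for row in rows:
            self._index[row[key]].append(row)
        self.lookups = 0  # counts how many index lookups were issued

    def lookup(self, value):
        self.lookups += 1
        return self._index.get(value, [])


def index_join(probe_rows, probe_key, indexed_table):
    """Inner join that builds its hash table lazily, one probe key at a time."""
    cache = {}  # only keys that actually appear on the probe side end up here
    for probe in probe_rows:
        k = probe[probe_key]
        if k not in cache:
            cache[k] = indexed_table.lookup(k)
        for match in cache[k]:
            yield (probe, match)


# Four users on the indexed side, three events that touch two distinct users.
users = IndexedTable(
    [
        {"id": 1, "name": "ann"},
        {"id": 2, "name": "bob"},
        {"id": 3, "name": "cy"},
        {"id": 4, "name": "dee"},
    ],
    key="id",
)
events = [
    {"user_id": 1, "action": "view"},
    {"user_id": 2, "action": "click"},
    {"user_id": 1, "action": "click"},
]
joined = list(index_join(events, "user_id", users))
```

With few distinct probe keys, the number of lookups tracks the distinct keys rather than the size of the indexed table, which is presumably why the slide targets low cardinality joins.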