Managing Performance in the Cloud
How do you manage performance in the cloud, in particular in "Platform as a Service" (PaaS) environments like Windows Azure or Heroku, where you don't have a "virtual machine" to manage? Even in "Infrastructure as a Service" (IaaS) environments like Amazon EC2 there are limitations on the tools you can deploy to assist with performance management, troubleshooting, etc. (e.g. you can't deploy promiscuous-mode network sniffing tools in EC2). James Smith from Adactus will give us an overview of cloud services as a whole, and then drill down into some of the issues they have experienced in deploying their "Pulse" claims management solution into the Azure cloud (http://www.pulseclaims.com/home). Beyond just looking at page speed performance, he'll talk about the challenges involved in managing SLAs, cloud "support" (or lack of it!), performance troubleshooting and the whole "performance lifecycle".
1. Managing Performance in the Cloud (TheDevMgr)
2. BACKGROUND: Cloud History
3. In the nineties: Desktop internet computing; Shift from local to centralised computing; Software was cheap and hardware was expensive.
4. The carefree noughty days: Shift from desktop to mobile; The cloud is born; Bezos and his book company start to shape the future.
5. The twenty-tens: Shift from centralised to distributed computing; Commoditisation of computing (PAYG); Anything-as-a-Service (XaaS).
6. THE CLOUD: What is it?
7. Service Models: XaaS (Anything), SaaS (Software), PaaS (Platform), IaaS (Infrastructure)
8. Infrastructure (IaaS): Outsource hardware to support operations; Storage, servers, networking components; Service provider owns and hosts equipment; Service provider responsible for management & maintenance.
9. Platform (PaaS): Paradigm for delivering operating systems and associated services over the Internet; No downloads or installation; Google App Engine, Microsoft Windows Azure, Heroku & Force.com.
10. Software (SaaS): Software distribution model in which applications are hosted by a vendor or service provider; Made available to customers over the Internet; Salesforce.com, many...many...more.
11. Deployment Models: Private, Public, Hybrid
12. Private Cloud: Virtualised infrastructure operated for a single organisation (single tenant); Hosted internally or externally; Managed internally or by a third-party; Can be secured to meet compliance; More expensive, less flexible.
13. Public Cloud: Service provider makes resources available to the general public over the Internet; Compute, Storage, O/S, Applications; May be free or pay-per-usage model; Fast deployment, short commitments; Shared services, less control.
14. Hybrid: Core platform on private cloud; Burstable capability into public cloud; Brings best of both private and public; Brings problems of both private and public.
15. THE COST OF POOR CLOUD PERFORMANCE: Financial and customer satisfaction
16.
Cost: Compuware survey suggests large business losses can exceed 500k due to poor cloud performance; 57% of European IT Directors believe that they can't manage cloud application performance; You still have to deliver 2 second response times.
17. Performance: 50% of ops teams have suffered more than one P-1 performance issue in the cloud; 33% experience a P-1 issue every month; 60% of incidents took more than 2 hours to resolve; Good luck webops (cloudops). Source: AppDynamics
18. COMMON PERFORMANCE CHALLENGES: Traditional and new problems
19. Performance Challenges: Traditional: Connectivity (Bandwidth / Latency), Bottlenecks (CPU, IO, Database). Contemporary: Bigger scale (More stuff), Shared infrastructure (Not your stuff, entirely).
20. Traditional: Connectivity: Latency, jitter & packet loss; Bandwidth limitations; Users demand fast access to data. Bottlenecks: Will still occur! Virtualised hardware; Host contention; Storage.
21. Contemporary: Bigger Scale: 10s, 100s, 1000s, 10,000s of servers; VM sprawl; Dynamically allocated physical resource; Over-provisioning; Hidden billing costs. Shared Resources: Room for one more? Deal with other people's problems; DDoS, general stupidity? Mi casa es tu casa.
22. Paradigm Shift: Elasticity: Planned (scheduled/controlled scaling), Unplanned (auto-scaling); Global distribution: Data Centres, Data; Less Control.
23. Data location still matters!
24. CLOUD EXPERIENCES: Stories from the trenches
25. INFRASTRUCTURE-AS-A-SERVICE: IaaS
26. Application: Adactus Food Ordering Platform; Transacts > 7 million orders & > $100M USD a year; 30% of daily orders taken in 1 hour; Adopted as eCommerce platform for Pizza Hut and KFC globally.
27. Platform: Private: Global instances all deployed on private clouds; VMware ESX Hosts; V-Webs; Dedicated / Non-Virtualised SQL. Public: Rackspace public cloud; On-Demand Load Balancers, Web Servers, SQL Servers; High-scale, high-volume.
28. Challenges: Big Scale (A lot more to manage); Virtual Platform (Contention); End-to-End (Application Performance Management).
29.
Solutions: Cloud-centric APM: AppDynamics, CloudKick (now Rackspace APM), RightScale. Automated Operations: Chef, Puppet (SysOps); CloudFoundry, OpenShift (App LifeCycle); Heroku, AppFog (NoOps?)
30. PLATFORM-AS-A-SERVICE: PaaS
31. Application: Adactus Pulse; Claims management solution for the insurance industry delivered as SaaS; Processed over a million claims; Deployed for ISS and Aviva.
32. Platform: Deployed into Windows Azure Platform; Web Roles; Worker Roles; SQL Azure; SQL Azure Reporting Services; Upgrade of traditional ASP.NET application; Continuous Deployment Process.
33. Challenges: Disproving the shared resource impact (Is it the infrastructure?); Database performance is a black-box (Limitations and more limitations); Getting performance data is hard work (Not easy to access, dispersed everywhere); Baseline performance is not linear.
34. Baseline Performance: Large variances in baseline performance.
35. Windows Azure is more consistent.
36. Solutions: Instrumentation is king: Aspect Orientation (AOP); Gibraltar; Does your provider offer a Performance API? Dedicated Cloud (Azure) Tools: Dynatrace; Cerebrata. You must automate: Deployment (and everything else!); Consistency is key.
37. DATABASE-AS-A-SERVICE: DaaS
38. Overview: Service provider takes responsibility for installing and maintaining the database; Amazon (MySQL); Microsoft SQL Azure; Google App Engine Datastore; CouchDB, MongoDB.
39. Challenges: Most service providers are having performance issues (even Google!); Database is a (performance) black-box (You will find limitations); Need to handle transient connections (Your database will be there, but not always).
40. Solutions: Do as much tuning outside of the cloud as possible; Instrument your data access; DB sharding becomes viable / easy; Build connection resiliency into your data-framework.
41. Caution: On-premise databases: Are you sure? You might be about to create your own data storm? Too much on-premise data; Too little bandwidth.
42. SOFTWARE-AS-A-SERVICE: SaaS
43.
Overview: Adactus Pulse: Delivered on a SaaS Model. We consume SaaS (heavily): CRM, Performance, Google Apps, WIKI, Bug Tracking, Testing, Accounting, Planning & Forecasting, Document Management, CMS, Exception Handling, Business Intelligence, Deployment, APM, Collaboration, HRM, ERP and more.
44. Challenges: Consumer: Good news: Performance is out of your control! Bad news: Performance is out of your control! Provider: Expectations are high! Response times; Performance is still king! Competitors; Repeat use.
45. Real User Monitoring: Consumer: It's your new best friend; Get to know your SLA (it's your new best friend); Simple rules: Be the first to know, Get your money back. Provider: It's your new best friend; You will live & die by your SLAs; Simple rules: Be the first to know, Tell your customers.
46. Monitoring: XaaS, SaaS, PaaS, IaaS; RUM, Instrumentation, APM
47. BEYOND PERFORMANCE: Stories from the trenches
48. Service-Level-Agreements: Critical element for both provider and consumer; Don't waste time on detailed numerical service level agreements; SLAs need to be based on end-user experience.
49. Service-Level-Agreements: (1) Establish system availability; (2) Establish system response time; (3) Establish error resolution time; (4) Establish a fail over window for disaster recovery; (5) Ensure that you can get your data back.
50. Service-Level-Agreements: IaaS: The O/S is your responsibility; Managed Cloud Platforms are available. PaaS: SLAs stop at the O/S; Your application still remains your responsibility. SaaS: Know your SLA inside out. It's your responsibility.
51. Disaster Recovery: It's hard in the cloud; DR strategies are still emerging; Bandwidth & network capacity limits; Security is still a concern.
52. Disaster Recovery: There isn't a single blueprint; Identify critical resources and recovery methods; Architect for redundancy; Back up to/from and restore to/from the cloud; Most cloud SLAs > 99.5% availability (4 hours, 39 minutes downtime per month).
53. THANK YOU. QUESTIONS? That's all folks!
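
The availability figure on the Disaster Recovery slide can be turned into a downtime budget with simple arithmetic. The sketch below is an illustration, not part of the talk: the function names and the 30-day default are assumptions, and it treats downtime as the unavailable fraction of the billing period. Under those assumptions 99.5% works out to 3 hours 36 minutes per 30-day month, which is less than the slide's figure, so the slide may be using a different measurement window or target.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Minutes of downtime permitted by an availability target.

    availability_pct: SLA target as a percentage, e.g. 99.5
    period_days: length of the measurement period (assumed 30 by default)
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - availability_pct) / 100


def format_minutes(minutes):
    """Render a minute count as 'H h MM min', rounded to the nearest minute."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours} h {mins:02d} min"


if __name__ == "__main__":
    # Compare a few common SLA targets over a 30-day period.
    for target in (99.0, 99.5, 99.9, 99.95):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% -> {format_minutes(budget)} per 30 days")
```

Running the same calculation against your provider's actual SLA period (calendar month, rolling 30 days, or year) shows how much outage the contract tolerates before any credit is due.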