Database as a Service - Tutorial @ICDE 2010

Publish in

Education

5 views

Please download to get full document.

View again

of 102
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Share
Description
1. Database as a ServiceSeminar, ICDE 2010, Long Beach, March 04<br />Wolfgang Lehner | Dresden University of Technology, Germany Kai-Uwe Sattler | Ilmenau…
Transcript
  • 1. Database as a ServiceSeminar, ICDE 2010, Long Beach, March 04<br />Wolfgang Lehner | Dresden University of Technology, Germany Kai-Uwe Sattler | Ilmenau University of Technology, Germany <br />1<br />
  • 2. Introduction<br />Motivation<br />SaaS<br />Cloud Computing<br />UseCases<br />2<br />
  • 3. Software as a Service (SaaS)<br />Traditional Software<br />On-DemandUtility<br />Plug In, SubscribePay-per-Use<br />Build Your Own <br />3<br />
  • 4. Comparison of business model<br />4<br />
  • 5. Avoidhiddencostof traditional SW<br />Traditional Software<br />SaaS<br />SW Licenses<br />Subscription Fee<br />Training<br />Training<br />Customization<br />Hardware<br />IT Staff<br />Maintenance<br />Customization<br />5<br />
  • 6. The Long Tail<br />Dozens of markets of millions or millions of markets of dozens?<br />Your Large Customers<br />$ / Customer<br />What if you lower your cost of sale (i.e. lower barrier to entry) and you also lower cost of operations<br />Your Typical Customers<br />New addressable market >> current market<br />(Currently) “non addressable” Customers<br /># of Customers<br />6<br />
  • 7. Acquisition Model<br />Service<br />Business Model<br />Pay for usage<br />Access ModelInternet<br />Technical ModelScalable, elastic, shareable<br />EC2 & S3<br />"All that matters is results — <br />I don't care how it is done"<br />Cloud Computing:<br />A style of computing where massively scalable, IT-enabled capabilities are provided "as a service" across the Internet to multiple external customers.<br />"I don't want to own assets — I want<br />to pay for elastic usage, like a utility"<br />"I want accessibility from anywhere from any device"<br />"It's about economies of scale, with effective and dynamic sharing"<br />What is Cloud? – Gartner’s Definition<br />7<br />
  • 8. To Qualify as a Cloud<br />Common, Location-independent, Online Utility on Demand*<br />Common implies multi-tenancy, not single or isolated tenancy <br />Utility implies pay-for-use pricing<br />onDemandimplies ~infinite, ~immediate, ~invisible scalability<br /> Alternatively, a “Zero-One-Infinity” definition:**<br />0On-premise infrastructure, acquisition cost, adoption cost, support cost<br />1Coherent and resilient environment – not a brittle “software <br /> stack”<br />Scalability in response to changing need, Integratability/ <br /> Interoperability with legacy assets and other services Customizability/Programmability from data, through logic, up into the user interface without compromising robust <br /> multi-tenancy <br />* Joe Weinman, Vice President of Solutions Sales, AT&T, 3 Nov. 2008<br />** From The Jargon File: “Allow none of foo, one of foo, or any number of foo”<br />8<br />
  • 9. Cloud Differentials: Service Models<br />9<br />Cloud Software as a Service (SaaS)<br />Use provider’s applications over a network <br />Cloud Platform as a Service (PaaS)<br />Deploy customer-created applications to a cloud <br />Cloud Infrastructure as a Service (IaaS)<br />Rent processing, storage, network capacity, and other fundamental computing resources<br />
  • 10. Cloud Differentials: Characteristics<br />10<br />Platform<br />Physical – Virtual<br />Homogenous – Heterogeneous<br />Design Paradigms<br />Storage<br />CPU<br />Bandwidth<br />Usage Model<br />Exclusive<br />Shared<br />Pseudo-Shared<br />Size/Location<br />Large Scale(AWS, Google, BM/Google), <br />Small Scale(SMB, Academia)<br />Purpose<br />General Purpose<br />Special Purpose (e.g., DB-Cloud)<br />Administration/Jurisdiction<br />Public<br />Private<br />
  • 11. UseCases: Large-Scale Data Analytics<br />Outsourceyourdata and usecloudresourcesforanalysis<br />Historical and mostlynon-criticaldata<br />Parallelizable, read-mostlyworkload, high variantworkloads<br />Relaxed ACID guarantees<br />Examples (HadoopPoweredBy):<br />Yahoo!: researchfor ad systems and Web search<br />Facebook: reporting and analytics<br />Netseer.com: crawling and log analysis<br />Journey Dynamics: trafficspeedforecasting<br />11<br />
  • 12. UseCases: Database Hosting<br />Public datasets<br />Biologicaldatabases: a singlerepositoryinstead of > 700 separate databases<br />Semantic Web Data, Linkeddata, ...<br />Sloan Digital Sky Survey<br />TwitterCache<br />Already on Amazon AWS: <br />annotated human genomedata, <br />US census, <br />Freebase, ...<br />Archiving, Metadata Indexing, ...<br />12<br />
  • 13. UseCases: Service Hosting<br />Data managementforSaaSsolutions<br />Run theservicesnearthedata<br />= ASP<br />Alreadymanyexistingapplications<br />CRM, e.g. Salesforce, SugarCRM<br />Web Analytics<br />Supply Chain Management<br />HelpDesk Management<br />Enterprise ResourcePlanning, e.g. SAP Business ByDesign<br />...<br />13<br />
  • 14. Foundations & Architectures<br />Virtualization<br />Programmingmodels<br />Consistencymodels & replication<br />SLAs & Workloadmanagement<br />Security<br />14<br />
  • 15. Topics covered in this Seminar<br />Query & Programming Model<br />Logical Data Model<br />Virtuali-zation<br />Multi-Tenancy<br />Service Level Agreements<br />Storage Model<br />DistributedStorage<br />Replication<br />Security<br />15<br />
  • 16. Current Solutions<br />userperspective<br />one DB for all clients<br />one DB per client<br />Virtualization<br />Replication<br />16<br />DistributedStorage<br />physicalperspective<br />
  • 17. ... it‘s simple!<br />17<br />
  • 18. Virtualization<br />Separating the abstract view of computing resources from the implementation of these resources<br />addsflexibility and agility to the computing infrastructure<br />soften problems related to provisioning, manageability, …<br />lowers TCO: fewercomputingresources<br />Classicaldrivingfactor: serverconsolidation<br />18<br />E-mail server<br />Web server<br />Database server<br />E-mail server<br />Database server<br />Linux<br />Linux<br />Linux<br />Linux<br />Linux<br />EDBT2008 Tutorial (Aboulnaga e.a.)<br />Web server<br />Linux<br />Virtualization<br />Consolidate<br /> Improved utilization using consolidation<br />
  • 19. Whatcanbevirtualized – thebigfour.<br />19<br />
  • 20. Different TypesofVirtualization<br />20<br />APP 1<br />APP 4<br />APP 2<br />APP 3<br />APP 5<br />OPERATING SYSTEM<br />OPERATING SYSTEM<br />VIRTUAL MACHINE 1<br />VIRTUAL MACHINE 2<br />CPU<br />CPU<br />CPU<br />MEM<br />MEM<br />NET<br />VIRTUAL MACHINE MONITOR (VMM)<br />PHYSICAL STORAGE<br />PHYSICAL MACHINE<br />CPU<br />MEM<br />NET<br />CPU<br />CPU<br />
  • 21. Virtual Machines<br />21<br />Technique with long history (since the 1960's)<br />Prominent since IBM 370 mainframeseries<br />Today<br />large scale<br />commodity hardware and operating systems<br />Virtual Machine Monitor (Hypervisor)<br />strong isolation between virtual machines (security, privacy, fault tolerance)<br />flexible mapping between virtual machines and physical resources<br />classical operationspause, resume, checkpoint, migrate (admin / load balancing)<br />Software deployment<br />Preconfigured virtual appliances<br />Repositories of virtual appliances on the web<br />
  • 22. DBMS on top of Virtual Machines<br />... yetanotherapplication?<br />... Overhead?<br />SQL Server withinVMware<br />22<br />
  • 23. Virtualization Design Advisor<br />What fraction of node resources goes to what DBMS?<br />Configuring VM parameters<br />What parameter settings are best for a given resource configuration<br />Configuringthe DBMS parameters<br />Example<br />Workload 1: TPC-H (10GByte)<br />Workload 2: TPC-H (10GByte) only Q18 (132 copies)<br />Virtualization design advisor<br />20% of CPU to Workload 1<br />80% of CPU to Workload 2<br />23<br />
  • 24. Some Experiments<br />Workload Definition based on TPC-H<br />Q18 isoneofthemost CPU intensive queries<br />Q21 isoneofthe least CPU intensive queries<br />Workload Units<br />C: 25x Q18<br />I: 1x Q21<br />Experiment: Sensitivity to workloadResource Needs<br />W1 = 5C + 5I<br />W2 = kC + (10-k)I (increaseof k -> more CPU intensive)<br />Postgres<br />DB2<br />24<br />
  • 25. Some Experiments (2)<br />Workload Settings<br />W3 = 1C<br />W4 = kC<br />Workload Settings<br />W5 = 1C<br />W6 = kI<br />25<br />
  • 26. Virtualization in DBaaS environments<br />DB Layer<br />DB Server<br />DB Server<br />DB Server<br />DB<br />DB<br />DB<br />DB<br />DB<br />Instance <br />Layer<br />Instance<br />Instance<br />Instance<br />Instance<br />Instance<br />Instance<br />DB Server <br />Layer<br />VM<br />VM<br />VM<br />VM<br />VM<br />VM<br />VM Layer<br />HW Layer<br />26<br />
  • 27. Existing Tools for Node Virtualization<br />DB Server<br />DB Layer<br />DB<br />DB<br />DB<br />DB<br />DB<br />DB Ad2visor<br /><ul><li>Indexes
  • 28. MQTs
  • 29. MDC
  • 30. Redistribution of Tables</li></ul>DB Workload Manager<br />Instance <br />Layer<br />Instance<br />Instance<br />DB Server <br />Layer<br />Static Environment Assumptions<br /><ul><li> Advisor expects static hardware environment
  • 31. VM expects static (peak) resource requirements
  • 32. Interactions between layers can improve performance/utilization</li></ul>Node<br />Ressource Model<br />VM<br />VM<br />VM<br />VM Layer<br />VM Configuration<br /><ul><li>Monitoring
  • 33. Resources Configuration
  • 34. (manual) Migration</li></ul>HW Layer<br />27<br />
  • 35. Layer Interactions (2)<br />Experiment<br />DB2 on Linux<br />TPC-H workload on 1GB database<br />Ranges for resource grants<br />Main memory (BP) – 50 MB to 1GB<br />Additional storage (Indexes) – 5% to 30% DB size<br />Varying advisor output (17-26 indexes)<br />Different possible improvement<br />Different expected Performance after improvement<br />DB Advisor<br />Expected Performance<br />Possible Improvement<br />Index <br />Storage<br />Index <br />Storage<br />35%<br />90%<br />25%<br />25%<br />20%<br />20%<br />15%<br />15%<br /><1%<br /><3%<br />10%<br />10%<br />VM Configuration<br />5%<br />5%<br />200<br />MB<br />400<br />MB<br />600<br />MB<br />800<br />MB<br />1<br />GB<br />200<br />MB<br />400<br />MB<br />600<br />MB<br />800<br />MB<br />1<br />GB<br />BP<br />BP<br />28<br />
  • 36. Storage Virtualization<br />General Goal<br />provide a layerofindircetiontoallowthedefinitionofvirtualstoragedevices<br />minimize/avoiddowntime (local and remote mirroring)<br />improveperformance (distribution/balancing – provisioning - controlplacement)<br />reducecostofstorageadministration<br />Operations<br />create, destroy, grow, shrinkvirtualdevices<br />changesize, performance, reliability, ...<br />workloadfluctuations<br />hierarchicalstoragemanagement<br />versioning, snapshots, point-in-time copies<br />backup, checkpoints<br />exploit CPU and memory in the storage system<br />caching<br />executelow-level DBMS functions<br />29<br />
  • 37. Virtualization in DBaaS Environments (2)<br />DB Layer<br />DB Server<br />DB Server<br />DB Server<br />DB<br />DB<br />DB<br />DB<br />DB<br />Instance <br />Layer<br />Instance<br />Instance<br />Instance<br />Instance<br />Instance<br />Instance<br />DB Server <br />Layer<br />VM<br />VM<br />VM<br />VM<br />VM<br />VM<br />VM Layer<br />Shared Disk<br />HW Layer<br />Storage Layer<br />30<br />Local Disk<br />
  • 38. Virtualization in DBaaS Environments (2)<br />DB Layer<br />DB<br />DB<br />DB<br />DB<br />DB<br />DB Server<br />Instance <br />Layer<br />Instance<br />Instance<br />DB Server <br />Layer<br />VM<br />VM<br />VM<br />VM Layer<br />HW Layer<br />Storage Layer<br />31<br />DB Advisor<br /><ul><li>Indexes
  • 39. MQTs
  • 40. MDC
  • 41. Redistribution of Tables</li></ul>DB Workload Manager<br />StorageRessource Model<br />Storage Configuration<br /><ul><li>Device Bundling
  • 42. Replication
  • 43. Archiving</li></ul>Shared Disk<br />Local Disk<br />
  • 44. Onewaytogo? Paravirtualization<br />CPU and Memory Paravirtualization<br />extendstheguest to allow direct interaction withtheunderlyinghypervisor<br />reducesthemonitorcostincludingmemoryand System calloperations.<br />gainsfromparavirtualizationareworkloadspecific<br />Device Paravirtualization<br />places a highperformancevirtualization-aware device driver into the guest<br />paravirtualizeddriversaremoreCPU efficient (less CPU overhead forvirtualization)<br />Paravirtualizeddriverscanalso take advantage of HW features, like partial offload<br />
  • 45. Outline<br />Query & Programming Model<br />Logical Data Model<br />Virtuali-zation<br />Multi-Tenancy<br />Service Level Agreements<br />Storage Model<br />DistributedStorage<br />Replication<br />Security<br />33<br />
  • 46. Multi Tenancy<br />Goal: consolidate multiple customersontothesame operational system<br />best resourceutilization<br />flexible,butlimitedscalability<br />separate DBper tenant<br />shared DBsharedschema<br />shared DBseparate schema<br /><ul><li>Requirements:
  • 47. Extensibility: customer-specificschemachanges
  • 48. Security: preventingunauthorizeddataaccessesbyothertenants
  • 49. Performance/scalability: scale-up & scale-out
  • 50. Maintenance: on tenantlevelinstead of on databaselevel</li></ul>34<br />
  • 51. Flexible Schema Approaches<br />Goal: allowtenant-specificschemaadditions (columns)<br />Universal Table<br />Extension Table<br />PivotTable<br />35<br />
  • 52. Flexible Schema Approaches: Comparison<br />Best performance<br />Flexible schemaevolution<br />Pivottable<br />Extension table<br />Chunkfolding<br />Private tables<br />Applicationownstheschema<br />Database ownstheschema<br />Universal table<br />XML columns<br />Universal table: <br />requirestechniquesforhandlingsparsedata<br />Fine-grainedindexsupportnotpossible<br />Pivottable:<br />Requiresjoinsforreconstructinglogicaltuples<br />Chunkfolding: similar to pivottables<br />Group of columnsarecombined in a chunk and mappedinto a chunktable<br />Requirescomplexquerytransformation<br />36<br />
  • 53. Access Control in Multi-Tenant DB<br />Shared DB approachesrequirerow-levelaccesscontrol<br />Query transformation.... whereTenantID = 42 ...<br />Potential securityrisks<br />DBMS-levelcontrol, e.g. IBM DB2 LBAC<br />Label-based Access control<br />Controls read/writeaccess to individualrows and columns<br />Securitylabelswithpolicies<br />Requires separate accountforeachtenant<br />37<br />
  • 54. In a Nutshell<br />How shall virtualization be handled on<br />Machine level (VM to HW)<br />DBMS level (database to instance to database server)<br />Schema level (multi tenancy)<br />... using …<br />Allocation between layers<br />Configuration inside layers<br />Flexible schemas<br />… when …<br />Characteristics of the workloads are known<br />Virtual machines are transparent<br />Tenant-specific schema extensions<br />… demanding that …<br />SLAs and security are respected<br />Each node’s utilization is maximized<br />Number of nodes is minimized<br />38<br />
  • 55. Outline<br />Query & Programming Model<br />Logical Data Model<br />Virtuali-zation<br />Multi-Tenancy<br />Service Level Agreements<br />Storage Model<br />DistributedStorage<br />Replication<br />Security<br />39<br />
  • 56. MapReduce Background<br />40<br />Programming model and an associated implementation for large-scale data processing<br />Google and related approaches: Apache Hadoop and Microsoft Dryad<br />User-defined map & reduce functions<br />Infrastructure<br />hides details of parallelization<br />provides fault-tolerance, data distribution, I/O scheduling, load balancing, ...<br />map (in_key, in_value) -> (out_key, intermediate_value) list<br />reduce (out_key,intermediate_value list) -> out_value list<br />M<br />{ (key,value) }<br />R<br />M<br />R<br />M<br />
  • 57. Logic Flow of WordCount<br />Mapper<br />Hadoop Map/Reduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters
  • Related Search
    We Need Your Support
    Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

    Thanks to everyone for your continued support.

    No, Thanks