Data Center Storage and Networking Hakim Weatherspoon


Data Center Storage and Networking
Hakim Weatherspoon, Assistant Professor, Dept of Computer Science
CS 5413: High Performance Systems and Networking
December 1, 2014
Slides from the ACM SOSP 2013 presentation on "IOFlow: A Software-Defined Storage Architecture." Eno Thereska, Hitesh Ballani, Greg O'Shea, Thomas Karagiannis, Antony Rowstron, Tom Talpey, and Timothy Zhu. In SOSP'13, Farmington, PA, USA, November 3-6, 2013.

Goals for Today
IOFlow: a software-defined storage architecture
– E. Thereska, H. Ballani, G. O'Shea, T. Karagiannis, A. Rowstron, T. Talpey, R. Black, T. Zhu. ACM Symposium on Operating Systems Principles (SOSP), November 2013, pages 182-196.

Background: Enterprise data centers
– General purpose applications; an application runs on several VMs
– Separate network for VM-to-VM traffic and VM-to-Storage traffic
– Storage is virtualized
– Resources are shared
[Diagram: VMs with virtual disks (vDisks) on hypervisors, connected through NICs and switches to storage servers]

Motivation
– Want: predictable application behaviour and performance
– Need the system to provide end-to-end SLAs, e.g.:
  – Guaranteed storage bandwidth B
  – Guaranteed high IOPS and priority
  – Per-application control over decisions along the IO path
– It is hard to provide such SLAs today

Example: guarantee aggregate bandwidth B for Red tenant
– Deep IO path with 18 different layers that are configured and operate independently and do not understand SLAs
[Diagram: IO path from application and guest OS through hypervisor layers (malware scan, compression, file system, caching, scheduling, IO manager, drivers), across switches, to storage server layers (file system, deduplication, caching, scheduling, drivers)]

Challenges in enforcing end-to-end SLAs
– No storage control plane
– No enforcing mechanism along the storage data plane
– Aggregate performance SLAs, across VMs, files and storage operations
– Want non-performance SLAs: control over the IO path
– Want to support unmodified applications and VMs

IOFlow architecture
– Decouples the data plane (enforcement) from the control plane (policy logic)
[Diagram: client-side and server-side IO stacks (malware scan, compression, file system, scheduling, hypervisor IO manager, drivers; file system, deduplication, caching, scheduling, drivers) with programmable queues (Queue 1 ... Queue n) in each layer; a centralized controller takes high-level SLAs and programs the queues through the IOFlow API]

Contributions
– Defined and built a storage control plane
– Controllable queues in the data plane
– Interface between control and data plane (IOFlow API)
– Built centralized control applications that demonstrate the power of the architecture

SDS: Storage-specific challenges
The analogy: software-defined storage (SDS) is to storage today what SDN was to old networks. The comparison runs along four dimensions: low-level primitives, an end-to-end identifier, data plane queues, and a control plane.
[Table in original slide: Old networks vs. SDN, and Storage today vs. SDS, compared along these four dimensions]

Storage flows
– A storage "flow" refers to all IO requests to which an SLA applies:
  {VMs}, {File Operations}, {Files}, {Shares} -> SLA
  ({VMs} is the source set; {File Operations}, {Files}, {Shares} are the destination sets)
– Aggregate, per-operation and per-file SLAs, e.g.:
  {VM 1-100, write, *, \\share\db-log} -> high priority
  {VM 1-100, *, *, \\share\db-data} -> min 100,000 IOPS
– Non-performance SLAs, e.g., path routing:
  {VM 1, *, *, \\share\dataset} -> bypass malware scanner
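
As an illustration only (not code from the paper), such a flow/SLA rule can be written down as a small data structure; the Python names below are hypothetical:

    # Hypothetical sketch of a flow descriptor: the set of VMs, operations,
    # files and shares to which one SLA applies.
    from dataclasses import dataclass

    @dataclass
    class FlowSLA:
        vms: set            # source set, e.g. {"VM1", ..., "VM100"}
        operations: str     # "read", "write", or "*" for any
        files: str          # file pattern, "*" for any
        share: str          # e.g. r"\\share\db-log"
        sla: dict           # e.g. {"priority": "high"} or {"min_iops": 100_000}

    # Example: {VM 1-100, write, *, \\share\db-log} -> high priority
    db_log_rule = FlowSLA(
        vms={f"VM{i}" for i in range(1, 101)},
        operations="write",
        files="*",
        share=r"\\share\db-log",
        sla={"priority": "high"},
    )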

IOFlow API: programming data plane queues
1. Classification [IO Header -> Queue]
2. Queue servicing [Queue -> <token rate, priority, queue size>]
3. Routing [Queue -> Next-hop] (e.g., send an IO to, or past, the malware scanner)
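
A minimal sketch of what these three calls might look like at one data-plane stage, assuming invented method names (classify/service/route) that stand in for the actual IOFlow API:

    # Hypothetical sketch of the three queue-programming operations the
    # controller issues to a data-plane stage. Names are illustrative only.
    class DataPlaneStage:
        def __init__(self):
            self.rules = {}      # IO Header pattern -> queue name
            self.queues = {}     # queue name -> servicing / routing properties

        def classify(self, io_header_pattern, queue):
            """Classification: [IO Header -> Queue]."""
            self.rules[io_header_pattern] = queue
            self.queues.setdefault(queue, {})

        def service(self, queue, token_rate, priority, queue_size):
            """Queue servicing: [Queue -> <token rate, priority, queue size>]."""
            self.queues.setdefault(queue, {}).update(
                token_rate=token_rate, priority=priority, queue_size=queue_size)

        def route(self, queue, next_hop):
            """Routing: [Queue -> Next-hop], e.g. to bypass a stage."""
            self.queues.setdefault(queue, {})["next_hop"] = next_hop

    # Example: the controller programs an SMBc stage in a hypervisor.
    smbc = DataPlaneStage()
    smbc.classify(("VM4-SID", r"\\serverX\AB79.vhd"), "Q1")
    smbc.service("Q1", token_rate=100, priority=0, queue_size=64)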

Lack of common IO Header for storage traffic
– SLA: {VM 4, *, *, \\share\dataset} -> Bandwidth B
– The same IO is named differently at each layer along the path, e.g.:
  – Guest OS block device: Z: (/device/scsi1)
  – Hypervisor (VHD / SMBc): server and VHD, \\serverX\AB79.vhd
  – Storage server (SMBs / file system): volume and file, H:\AB79.vhd
  – Storage server block device: /device/ssd5
[Diagram: compute server IO stack (application, guest OS file system, block device, hypervisor, VHD, scanner, SMBc, network driver, NIC) and storage server IO stack (SMBs, file system, disk driver, NIC)]

Flow name resolution through controller
– SLA: {VM 4, *, *, //share/dataset} -> Bandwidth B
– SMBc exposes the IO Header it understands: <VM SID, //server/file.vhd>
– The controller resolves the SLA into a queuing rule (per-file handle) at SMBc:
  – <VM4 SID, //serverX/AB79.vhd> -> Q1
  – Q1.token rate -> B
[Diagram: compute server and storage server IO stacks; the controller installs the rule at SMBc]
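
A minimal sketch of this resolution step, with made-up lookup tables standing in for the controller's knowledge of where each share's VHD lives:

    # Hypothetical sketch: the controller resolves a high-level flow name
    # (VM + share path) into identifiers the SMBc layer understands, and
    # produces a per-file-handle queuing rule. Tables and names are illustrative.
    VHD_FOR_SHARE = {("VM4", "//share/dataset"): "//serverX/AB79.vhd"}
    SID_FOR_VM = {"VM4": "VM4-SID"}

    def resolve(vm, share, rate):
        # Translate {VM, *, *, //share/...} -> Bandwidth rate into an
        # <VM SID, //server/file.vhd> rule for SMBc.
        io_header = (SID_FOR_VM[vm], VHD_FOR_SHARE[(vm, share)])
        return io_header, {"queue": "Q1", "token_rate": rate}

    # The resulting rule would then be installed at SMBc with the
    # classification / queue-servicing calls sketched earlier.
    print(resolve("VM4", "//share/dataset", rate=100))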

Rate limiting for congestion control
– Important for performance SLAs
– Today: no storage congestion control
– Queue servicing [Queue -> <token rate, priority, queue size>]: a token bucket releases queued IOs as tokens become available
– Challenging for storage: e.g., how to rate limit two VMs, one reading and one writing, so that they get equal storage bandwidth?

Rate limiting on payload bytes does not work
– Example: one VM issues 8KB writes while another issues 8KB reads to the same storage server

Rate limiting on bytes does not work
– Same example: 8KB writes vs. 8KB reads to the same storage server

Rate limiting on IOPS does not work
– Example: one VM issues 64KB reads while another issues 8KB writes; equal IOPS do not impose equal load on the storage server
– Need to rate limit based on cost

Rate limiting based on cost
– Controller constructs empirical cost models based on device type and workload characteristics
  – RAM, SSDs, disks: read/write ratio, request size
– Cost models assigned to each queue: ConfigureTokenBucket [Queue -> cost model]
– Large request sizes are split for pre-emption
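
A minimal sketch of the idea, assuming a made-up cost model keyed on operation type and request size (the real cost models are empirical and per device/workload):

    import time

    # Hypothetical cost-based token bucket: each IO is charged tokens
    # according to a cost model rather than a flat per-IO or per-byte charge.
    def cost(op, size_bytes):
        # Made-up model: writes cost more than reads; cost grows with size.
        base = 1.5 if op == "write" else 1.0
        return base * (size_bytes / 8192)         # normalized to an 8KB read

    class CostTokenBucket:
        def __init__(self, rate, burst):
            self.rate, self.burst = rate, burst   # tokens/sec, max tokens
            self.tokens, self.last = burst, time.monotonic()

        def admit(self, op, size_bytes):
            now = time.monotonic()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            c = cost(op, size_bytes)
            if self.tokens >= c:
                self.tokens -= c                  # charge by cost, release the IO
                return True
            return False                          # IO stays queued

    q = CostTokenBucket(rate=1000, burst=100)
    q.admit("read", 64 * 1024)    # a 64KB read costs 8x an 8KB read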

Recap: Programmable queues on the data plane
– Classification [IO Header -> Queue]
  – Per-layer metadata exposed to the controller
  – Controller stays out of the critical path
– Queue servicing [Queue -> <token rate, priority, queue size>]
  – Congestion control based on operation cost
– Routing [Queue -> Next-hop]
How does the controller enforce SLAs?

Distributed, dynamic enforcement
– SLA: {Red VMs 1-4}, *, *, //share/dataset -> Bandwidth 40 Gbps
– SLA needs per-VM enforcement
– Need to control the aggregate rate of VMs 1-4, which reside on different physical machines
– Static partitioning of bandwidth is sub-optimal

Work-conserving solution
– VMs with traffic demand should be able to send it as long as the aggregate rate does not exceed 40 Gbps
– Solution: max-min fair sharing

Max-min fair sharing
– Well-studied problem in networks
– Existing solutions are distributed:
  – Each VM varies its rate based on congestion
  – They converge to max-min sharing
  – Drawbacks: complex, and they require a congestion signal
– But we have a centralized controller:
  – Becomes a simple algorithm at the controller

Controller-based max-min fair sharing
– Control interval t; stats sampling interval s
– What does the controller do?
  – INPUT: per-VM demands
  – Infers VM demands
  – Uses centralized max-min fair sharing within a tenant and across tenants
  – Sets VM token rates
  – Chooses the best place to enforce
  – OUTPUT: per-VM allocated token rates
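
As an illustration only (not the paper's exact algorithm), a centralized max-min fair allocation over per-VM demands can be computed with the standard water-filling procedure:

    # Hypothetical sketch: water-filling max-min fair allocation.
    # VMs demanding less than the equal share keep their demand; the
    # leftover capacity is split among the remaining VMs.
    def max_min_fair(demands, capacity):
        alloc = {vm: 0.0 for vm in demands}
        remaining = dict(demands)
        cap = capacity
        while remaining and cap > 0:
            share = cap / len(remaining)
            satisfied = {vm: d for vm, d in remaining.items() if d <= share}
            if not satisfied:
                for vm in remaining:
                    alloc[vm] += share            # everyone gets the equal share
                return alloc
            for vm, d in satisfied.items():
                alloc[vm] += d                    # fully satisfy small demands
                cap -= d
                del remaining[vm]
        return alloc

    # Example: four Red VMs sharing an aggregate 40 (Gbps) SLA.
    print(max_min_fair({"VM1": 5, "VM2": 30, "VM3": 30, "VM4": 2}, 40))
    # VM1 and VM4 get their demand; VM2 and VM3 split the remainder equally.

The controller would rerun such a computation each control interval on the inferred demands and push the resulting per-VM token rates to the chosen enforcement points.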

Controller decides where to enforce
– Goal: minimize the number of times an IO is queued, and distribute the rate-limiting load
– SLA constraints:
  – Queues placed where resources are shared
  – Bandwidth enforced close to the source
  – Priority enforced end-to-end
– Efficiency considerations:
  – Overhead in the data plane: number of queues
  – Important at 40 Gbps
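
One way to picture the placement rule for the two SLA types above (illustrative only; the actual controller also weighs data-plane overhead and the number of queues):

    # Hypothetical sketch: choose enforcement points for an SLA on an IO path.
    # Bandwidth caps sit close to the source; priority must be honored by
    # every stage the IO traverses.
    def enforcement_points(sla_kind, io_path):
        # io_path: ordered stages, e.g. ["guest OS", "SMBc", "network", "SMBs", "disk"]
        if sla_kind == "bandwidth":
            return [io_path[1]]        # e.g. SMBc in the hypervisor, near the source
        if sla_kind == "priority":
            return list(io_path)       # end-to-end: every stage must respect it
        raise ValueError("unknown SLA kind")

    print(enforcement_points("bandwidth",
                             ["guest OS", "SMBc", "network", "SMBs", "disk"]))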

Centralized vs. decentralized control
– A centralized controller in SDS allows simple algorithms that focus on SLA enforcement rather than on distributed-systems challenges
– Analogous to the benefits of centralized control in software-defined networking (SDN)

IOFlow implementation
– Implemented as filter drivers on top of existing layers
– 2 key layers for VM-to-Storage performance SLAs: SMBc (hypervisor, client side) and SMBs (storage server side)
– 4 other layers:
  – Scanner driver (routing)
  – User-level (routing)
  – Network driver
  – Guest OS file system
[Diagram: compute server IO stack (guest OS, hypervisor, VHD, scanner, SMBc, network driver, NIC) and storage server IO stack (SMBs, file system, disk driver, NIC), both connected to the controller]

Evaluation map
– IOFlow's ability to enforce end-to-end SLAs:
  – Aggregate bandwidth SLAs
  – Priority SLAs and a routing application (in the paper)
– Performance of the data and control planes

Evaluation setup
– Clients: 10 hypervisor servers, 12 VMs each
– 4 tenants (Red, Green, Yellow, Blue); 30 VMs/tenant, 3 VMs/tenant/server
– Storage network: Mellanox 40Gbps RDMA (RoCE), full-duplex
– 1 storage server: 16 CPUs, 2.4GHz (Dell R720); SMB 3.0 file server protocol
– 3 types of backend: RAM, SSDs, disks
– Controller: 1 separate server; 1 sec control interval (configurable)

Workloads
– 4 Hotmail tenants: {Index, Data, Message, Log}
  – Used for trace replay on SSDs (see paper)
– IoMeter is parameterized with Hotmail tenant characteristics (read/write ratio, request size)

Enforcing bandwidth SLAs
– 4 tenants with different storage bandwidth SLAs:
  Tenant   VMs          SLA
  Red      {VM1-30}     Min 800 MB/s
  Green    {VM31-60}    Min 800 MB/s
  Yellow   {VM61-90}    Min 2500 MB/s
  Blue     {VM91-120}   Min 1500 MB/s
– Tenants have different workloads
– Red tenant is aggressive: generates more requests/second

Things to look for
– Distributed enforcement across 4 competing tenants
  – Aggressive tenant(s) kept under control
– Dynamic inter-tenant work conservation
  – Bandwidth released by an idle tenant is given to active tenants
– Dynamic intra-tenant work conservation
  – Bandwidth of a tenant's idle VMs is given to its active VMs

Results
[Results graph: tenants' bandwidth over time. The controller notices the red tenant's performance and the tenants' SLAs are enforced (120 queues configured); both inter-tenant and intra-tenant work conservation are visible]

Data plane overheads at 40Gbps RDMA
– Negligible in the previous experiment
– To bring out the worst case, IO sizes were varied from 512 bytes to 64KB
– Reasonable overheads for enforcing SLAs

Control plane overheads: network and CPU
– Controller configures queue rules, receives statistics, and updates token rates every control interval
– 0.3% CPU overhead at the controller
[Chart: control-plane network overheads (MB) per interval]

Before Next time
Final Project Presentation/Demo
– Due Friday, December 12
– Presentation and Demo
– Written submission required:
  – Report
  – Website: index.html that points to report, presentation, and project (e.g. code)
Required review and reading for Wednesday, December 3
– Plug into the Supercloud, D. Williams, H. Jamjoom, H. Weatherspoon. IEEE Internet Computing, Vol. 17, No. 2, March/April 2013, pp. 28-34.
– http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6365162
Check piazza: http://piazza.com/cornell/fall2014/cs5413
Check website for updated schedule
