

Open Cirrus Summit
Indranil Gupta, Roy Campbell, Michael Heath
Department of Computer Science, University of Illinois, Urbana-Champaign
June 8, 2009
http://cloud.cs.illinois.edu

Principal Investigator
– Michael Heath: parallel algorithms
Co-PIs and lead systems researchers
– Roy Campbell: O/S, file systems, security
– Indranil Gupta: distributed systems and protocols
Lead applications researchers
– Kevin Chang: search and query processing
– Jiawei Han: data mining
– Klara Nahrstedt: multimedia, QoS
– Dan Roth: machine learning, NLP
– Cheng Zhai: information retrieval
– Peter Bajcsy, Rob Kooper: NCSA


[Figure: cluster overview. 128 compute nodes (64 + 64); 500 TB and 1000 shared cores]

Goal: Support both systems research and applications research in data-intensive distributed computing

Accessing and Using CCT:
I. Systems Partition (64 nodes):
– CentOS machines, with sudo access
– Dedicated access to a subset of machines (similar to Emulab)
– User accounts
  – User requests # machines (up to 64) and storage quota (up to 30 TB)
  – Machine allocation survives for 4 weeks, storage survives for 6 months (both extendible)
II. Hadoop/Pig Partition and Service (64 nodes):
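The Systems Partition limits can be written down as a small check. This is only a sketch: the names `AllocationRequest`, `validate`, and `expiry_dates` are hypothetical and not part of any CCT tooling, the 64-machine and 30 TB figures are read as upper bounds (an assumption), and "6 months" is approximated as 182 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Figures from the slide; treating them as upper bounds is an assumption.
MAX_MACHINES = 64
MAX_STORAGE_TB = 30
MACHINE_LEASE = timedelta(weeks=4)
STORAGE_LEASE = timedelta(days=182)  # "6 months", approximated as 182 days


@dataclass
class AllocationRequest:  # hypothetical record, not a CCT data structure
    user: str
    machines: int
    storage_tb: float
    start: date


def validate(req: AllocationRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if not 1 <= req.machines <= MAX_MACHINES:
        problems.append(f"machines must be between 1 and {MAX_MACHINES}")
    if not 0 < req.storage_tb <= MAX_STORAGE_TB:
        problems.append(
            f"storage must be greater than 0 and at most {MAX_STORAGE_TB} TB"
        )
    return problems


def expiry_dates(req: AllocationRequest) -> dict[str, date]:
    """Compute when each allocation lapses unless it is extended."""
    return {
        "machines": req.start + MACHINE_LEASE,
        "storage": req.start + STORAGE_LEASE,
    }
```

For example, `validate(AllocationRequest("alice", 16, 5.0, date(2009, 3, 1)))` returns an empty list, and the machine allocation for that request would lapse 28 days after the start date.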

Accessing and Using CCT:
I. Systems Partition (64 nodes):
II. Hadoop/Pig Partition and Service (64 nodes):
– Looks like a regular shared Hadoop cluster service
– Users share 64 nodes. Individual nodes are not directly reachable.
  – 4 slots per machine
  – Several users report stable operation at 256 instances
  – During Spring 09, 10 projects running simultaneously
– User accounts
  – User requests account and storage quota (up to 30 TB)
  – Storage survives for 6 months (extendible)
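The slot figures on this slide are consistent with each other: 64 nodes at 4 slots per machine gives 256 slots, which matches the 256 instances users report if "instances" means concurrent task slots (an assumption). The sketch below does that arithmetic. The `fair_share` helper is hypothetical and assumes an equal split across projects, which the slide does not state.

```python
NODES = 64           # Hadoop/Pig partition size, from the slide
SLOTS_PER_NODE = 4   # task slots per machine, from the slide


def total_slots(nodes: int = NODES, slots_per_node: int = SLOTS_PER_NODE) -> int:
    """Total task slots available across the partition."""
    return nodes * slots_per_node


def fair_share(active_projects: int) -> int:
    """Slots per project under a hypothetical equal split (integer division)."""
    return total_slots() // active_projects


print(total_slots())   # 64 * 4 = 256
print(fair_share(10))  # 256 // 10 = 25
```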

Some Services Running inside CCT
– ZFS: backend file system
– Zenoss: monitoring, shared with the department's other computing clusters
– Hadoop HDFS
– Ability to make datasets publicly available
How do users request an account? A two-stage process:
1. User account request (requires a background check)
2. Allocation request

Internal UIUC Projects
10 projects inside the Computer Science department (a growing number), including:
– 4 course projects in CS 525 (Advanced Distributed Systems)
– Research projects in multiple research groups
– Systems research primarily led by:
  – Indranil Gupta's group (DPRG: dprg.cs.uiuc.edu)
  – Roy Campbell's group (SRG: srg.cs.uiuc.edu)
Several NCSA-driven projects

NSF-Funded External Projects
Abadi (Yale), Madden (MIT), and Naughton (Wisc.)
– Study trade-offs in performance and scalability between MapReduce and parallel DBMS for large-scale data analysis
Baru and Krishnan (SDSC)
– Study effectiveness of dynamic strategies for provisioning data-intensive applications, based on large topographic data sets from airborne LiDAR surveys

Project Timeline and Progress to Date
– Hardware received December 2008
– Cluster ready for user accounts in February 2009
– Yahoo conducted initial training session for 70 users
– About 215 accounts on cluster to date
– First two major external NSF-funded user groups now have accounts, and we expect more to follow
– About 50 TB of storage has been assigned thus far
– We run around 50 Hadoop jobs in a typical week
http://cloud.cs.illinois.edu

Backup Slides

Access to Unstructured Information
Goal: Accessing information we want, when we want it, in forms we can understand
Solution: Understanding meaning of information
Key capabilities required:
– Semantic parsing
– Named entity recognition
– Identifying relations between entities
– Paraphrasing and entailment
– Topic and sentiment analysis

Approach to Cloud Implementation
Port NLP tools to Cloud using MapReduce/Hadoop to enable large-scale NLP analysis
Provide research community access to deep analysis of large portion of Web
– 1 billion pages placed on Cloud, syntactically and semantically parsed, with named entity recognition
Develop NLP-enabled applications
– Semantic search engine: entity and relation search
– Vertical search services
– Question answering
– Information integration and summarization
– Text mining and pattern discovery
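As a rough illustration of what porting an NLP step to MapReduce could look like, the sketch below counts entity mentions across pages in plain Python, with no Hadoop involved. It is not the project's pipeline: `toy_entities` is a deliberately crude stand-in for a real named entity recognizer (it treats any capitalized token as an entity), and all function names and the sample pages are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def toy_entities(text):
    """Stand-in for a real NER tool: treat capitalized tokens as entities."""
    return [tok.strip(".,;:") for tok in text.split() if tok[:1].isupper()]


def map_page(page_id, text):
    """Map phase: emit (entity, 1) for every entity mention on one page."""
    return [(entity, 1) for entity in toy_entities(text)]


def reduce_counts(pairs):
    """Shuffle and reduce phases: group by entity and sum the counts."""
    totals = defaultdict(int)
    for entity, count in pairs:
        totals[entity] += count
    return dict(totals)


# Made-up sample input, for illustration only.
pages = {
    "p1": "Illinois hosts a cluster with Yahoo.",
    "p2": "Yahoo trained users at Illinois.",
}
mapped = chain.from_iterable(map_page(pid, text) for pid, text in pages.items())
entity_counts = reduce_counts(mapped)
print(entity_counts)
```

Because each page is mapped independently, the map phase can be spread across machines, which is what makes this shape attractive for a corpus on the order of a billion pages.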

[Figure: Text Information Management. Labels: Raw Text; Natural Language Content Analysis; Search; Filtering; Categorization; Clustering; Summarization; Visualization; Mining; Extraction; Information Access; Information Organization; Knowledge Acquisition; Search Engines; Analysis Engines]

Next Generation Text Information Management
Data-intensive computing will enable large-scale and intelligent text information management
– Today: Search by query
  Tomorrow: Personalized intelligent information agent
– Today: Document as bag of words
  Tomorrow: Understanding of entities and relations in documents through large-scale semantic analysis
– Today: Browsing supported only through preset hyperlinks
  Tomorrow: Browsing enabled through powerful navigation maps

[Figure: Multi-stream 3D Tele-Immersive (3DTI) Environment. Sites: UI Urbana-Champaign and UC Berkeley. Labels: 3D camera array; multi-display 3D rendering; edge processors; Internet2 networking infrastructure]

[Figure: diagram of nodes at UC Berkeley and UIUC. Legend: G = service gateway, D = display, C = camera]

Cloud Implementation of Tele-immersive Environment
Store 3D multi-view videos in Cloud
Provide multi-dimensional search/query for various attributes
– E.g., search for patient's arm exercise

System-Level Research
– Automation of dynamic resource allocation, scheduling, management, and monitoring
– Partitioning and sharing of computation, network, and storage resources
– Analysis of distributed system, network, and application logs
– Scalability and fault tolerance of distributed file systems
– Characterization of cloud workloads
– Security and information assurance
– Multi-site issues: latency, scalability, etc.

Applications Research
– Understanding textual information through large-scale semantic analysis
– Intelligent browsing through navigation maps
– Crawling online social networks to understand their dynamic evolution
– Supporting 3D tele-immersive environments
– Implementing genetic algorithms via MapReduce
– Exploring GPUs in cloud environment
  – K-means clustering, Black-Scholes option pricing, etc.
– Breaking MapReduce barrier through weaker consistency models
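K-means clustering, one of the workloads named above, fits the map/reduce shape naturally. The sketch below shows one iteration in plain Python: the map step assigns each point to its nearest centroid and the reduce step averages each group. It is a generic illustration under stated assumptions, not the project's GPU or Hadoop implementation; all function names are hypothetical.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans_map(point, centroids):
    """Map: emit (cluster index, point) for one input point."""
    return nearest(point, centroids), point


def kmeans_reduce(points):
    """Reduce: average the points assigned to one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One map/reduce round; clusters with no points keep their old centroid."""
    groups = defaultdict(list)
    for point in points:
        idx, value = kmeans_map(point, centroids)
        groups[idx].append(value)
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]
```

In a real MapReduce job the grouping inside `kmeans_iteration` would be done by the framework's shuffle, and the loop over iterations would resubmit the job with the updated centroids until they stop moving.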
