CMU SCS Peta-Graph Mining Christos Faloutsos Appel, Ana Chau,

25 Slides1.08 MB

CMU SCS Peta-Graph Mining Christos Faloutsos Appel, Ana Chau, Polo Leskovec, Jure Kang, U Prakash, Aditya Shringarpure, Suyash Tsourakakis, Charalampos Yahoo/Hadoop, 2008 #1

CMU SCS Our goal: One-stop solution for mining huge graphs Yahoo/Hadoop, 2008 2

CMU SCS Outline Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Datasets: (a) Synthetic (‘Kronecker’, 300M nodes, 1B edges) Yahoo/Hadoop, (b) NetFlix (20K movies, 500K2008 users, 100M edges) 3

CMU SCS Degree Distributions - NetFlix count 100 machines - 8min Yahoo/Hadoop, 2008 Movie in-degree 4

CMU SCS Degree Distributions - NetFlix count Theoretically expected 100 machines - 8min Yahoo/Hadoop, 2008 Movie in-degree 5

CMU SCS Degree Distributions - NetFlix count 100 machines - 8min Yahoo/Hadoop, 2008 User out-degree 6

CMU SCS Degree Distributions - NetFlix count Theoretically expected Sharp drop below 100 ratings 100 machines - 8min User out-degree Yahoo/Hadoop, 2008 7

CMU SCS Degree Distributions - Kronecker count degree 100 machines - 6h Yahoo/Hadoop, 2008 Nodes:259M - Edges: 1B 8

CMU SCS Degree Distributions - timings Time (sec) 24 tasks 48 tasks 1 task Yahoo/Hadoop, Edge 2008 file size (MB) 9

CMU SCS Outline Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Datasets: (a) Synthetic (‘Kronecker’, 300M nodes, 1B edges) Yahoo/Hadoop, (b) NetFlix (20K movies, 500K2008 users, 100M edges) 10

CMU SCS Diameter Diameter of a graph Maximum shortest path Normally, O(N**2) ANF : Approximate Neighborhood function’ [Palmer 02]: O(E) Goal : calculate neighborhood function Neighborhood N(h) : number of pairs of nodes within distance h Yahoo/Hadoop, 2008 11

CMU SCS Time (min) Diameter 1 node 48 nodes 28 nodes Edge file (MB) For large jobs, parallelization Yahoo/Hadoop, 2008 helps 12

CMU SCS Diameter / Hop Plot (Netflix) # of reachable pairs within h hops h: # of hops Yahoo/Hadoop, 2008 13

CMU SCS Diameter / Hop Plot (Netflix) # of reachable pairs within h hops Diameter: 3 h: # of hops Yahoo/Hadoop, 2008 14

CMU SCS Outline Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Datasets: (a) Synthetic (‘Kronecker’, 300M nodes, 1B edges) Yahoo/Hadoop, (b) NetFlix (20K movies, 500K2008 users, 100M edges) 15

CMU SCS Community detection Cross associations [Chakrabarti ’04] Yahoo/Hadoop, 2008 16

CMU SCS Community detection Yahoo/Hadoop, 2008 17

CMU SCS Outline Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Datasets: (a) Synthetic (‘Kronecker’, 300M nodes, 1B edges) Yahoo/Hadoop, (b) NetFlix (20K movies, 500K2008 users, 100M edges) 18

CMU SCS Triangles ‘friends of friends are friends’ Yahoo/Hadoop, 2008 19

CMU SCS Triangles ‘friends of friends are friends’ Yahoo/Hadoop, 2008 20

CMU SCS Triangles ‘friends of friends are friends’ Naïve algo: 3-way join (slow) [Tsourakakis’08]: # triangles sum of cubes of eigenvalues Thus, super-fast computation of #triangles (100x - 25,000x faster than naïve; 95% accuracy Yahoo/Hadoop, 2008 21

CMU SCS Triangles Easy to implement on hadoop: it only needs eigenvalues (to do, with Lanczos) Yahoo/Hadoop, 2008 22

CMU SCS Outline Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Datasets: (a) Synthetic (‘Kronecker’, 300M nodes, 1B edges) Yahoo/Hadoop, (b) NetFlix (20K movies, 500K2008 users, 100M edges) 23

CMU SCS Visualization Principled visualization of large graphs Yahoo/Hadoop, 2008 (show few most important’ edges) 24

CMU SCS Summary Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Goal: one-stop solution for mining huge graphs Yahoo/Hadoop, 2008 25

Back to top button