Grph: an unpronounceable1 graph Java library - focusing on performance

Page created by Nathan Carlson

Uncategorized

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Grph: an unpronounceable1 graph Java library - focusing on performance

Grph: an unpronounceable1 graph Java library
         focusing on performance

                         Luc Hogie and friends

                       I3S(CNRS-UNSA)/INRIA

1
    Still, you may pronounce it groumph.

Agenda

  Motivations for developing another graph library

  Global overview of Grph

  7 bullets to kill performance bottlenecks

  Demonstration

  Users

  Conclusion

Motivations for developing
another graph library

In the context of our projects, we need a graph library that is:
  I   enables the fast (CPU) manipulation of large (RAM) dynamic
      networks;
  I   portable;
  I   intuitive;
  I   adequate to network simulation;

Several graph toolkits already are available to the “graph” and
“networks” communities. Among them:
  I   Boost exhibits the best performance, but is hard to use (even
      the simplest examples are scary);
  I   Jung and JGraphT offer the best portability: but they have
      poor performance (both computational and memory usage are
      bad);
  I   SageMath is the best candidate when it comes to graph
      experimentation. It is written in Python (with a bridge to C).
      Not suitable to large software developments.
In spite of the variety of tools, most often people anyway opt for
developing their custom code (is re-inventing the wheel good or
bad?).

Global overview of Grph

Briefly, Grph is a Java graph library augmented with a set of
experimentation tools. Its development started in 2008. It focuses
on efficiency. It supports the following class of graphs:
  I   simple graphs;
  I   multigraphs (two vertices can be connected by several edges);
  I   hypergraphs (edges may connect more that 2 vertices);
  I   any undirected, directed or mixed versions of these (a mixed
      graph consists of edges of various nature);
  I   dynamic graphs;

Undirected simple edge

    I   a RJ45 cable between two computers;

Directed simple edge

    I   a sattelite connection
    I   a optical fiber connection
    I   any directional abstract relation: (a knows b, a depends on b,
        etc)

Undirected simple edge

    I   a bus connection throught ethernet switch;

Directed hyper edge

    I   in a 1-n directed hyper edge, n represents the set of devices in
        range;
    I   in another such edge, n represents the set of devices that can
        entail the hidden node problem;
    I   in a n-1 directed hyper edge, n represents the set of devices to
        which a given device is in range;

Data structure

   Grph proposes a data structure.

Dynamic graphical monitoring

                            Grph comes with a bridge to
   GraphStream which endorses it with automatic graph layout
   and customizable dynamic graph representation.

Interactive console
   One of the greatest strength of the SageMath Python toolkit is its
   interactive Python interpreter. In the same way, Grph comes with
   an interactive shell, by relying on BeanShell. The interaction
   language is a dialect of Java.

Discrete-event engine

   Grph comes with a discrete-event simulation engine that allows it
   to consider dynamic systems.
     I   mobility models for mobile networks;
     I   fault models for unreliable networks;

Experimentation framework

   Grph comes with an experimentation framework that greatly
   helps the elaboration of usable results by taking care of:
     I   the non-recomputation of already computed stuff;
     I   the processing of multi-runs;
     I   the generation of the plot files.
   This greatly simplifies the graph experimentation codes of the user,
   and scale them down by a large factor.

Framework for evolutionary computing

   Grph comes with a evolutionary computing framework targeted to
   the generation of graph instances. This framework, called Drwin
   have the following particularities:
     I   it does not encode indivdiduals, thereby exploited meaningful
         application-level operators;
     I   it is multi-threaded;
     I   self-adaptive evolution and parallelism;
     I   OO.

Interoperability

   A Grph graph can be imported/exported as/to:
     I   Grph (compact binary Grph native format);
     I   GraphText (compct text Grph native format);
     I   GraphML (XML-dialect);
     I   GML;
     I   DOT/Graphviz (graph plotter);
     I   DGS (Dynamic Graphs);
     I   Inet/CAIDA Maps (topology generator);
   But also as:
     I   JUNG graph;
     I   Mascopt graph;

Algorithms

   Grph comes with:
    I   basic topology schemes (grid, ring, chain, star, clique, etc),
        random schemes (GNP, GNM, random tree), GLP
        (Generalized Linear Preference), etc.
    I   implementations for common graph algorithms: eccentricity,
        radius, diameter, in/out vertex/edge degrees, clustering
        coefficient, density, connected components, minimal spanning
        tree, shortest paths, BFS/DFS/RS, distributions, maximum
        clique, minimum vertex cover, maximum independent set,
        maximum flow, (sub)graph isomorphism, etc.

6 bullets to kill performance
bottlenecks

Not fully Object Oriented framework

   Bullet: In Java, OO kills performance
     I   Problem
     I     I   Java object memory management is too hungry
           I   OO implies an indirection to obtain, anyway, its ID
     I   Solution: Grph considers the vertex (or edge) IS its ID.
     I   Advantages:
     I     I   save memory by factor 4
           I   allow much faster management (use of HPPC)
           I   use of low-level caching (speedup 15)

Fast-access incidence lists

   Bullet:Both arrays and hash-table are accessed in constant time,
   but accessing an array is so much faster!
     I   The implementation of Grph uses 2 coupled incidence lists.

Representing vertex/edge sets as daptative integer sets

   Graph algorithms spend a great deal of their time manipulating
   vertices and/or edge sets. Grph proposes three implementations
   for integer sets:
     I   based on hash tables, adequate when the set is sparse;
     I   based on bit sets, adequate when the set is dense;
     I   adaptative, adequate when the density evolves unpredicably.
   Bitset-based intsets require at least 32 times less memory than
   hash-tables-based ones. When the density cross the threshold
   1/32, the implementation of the set is switched. In order to avoid
   too frequent implementation switches, Grph uses an hysteresis
   mechanism.

Caching already computed stuff

   Bullet 2: does not compute twice the same thing! Grph makes
   use of a cache.
     I   Basically, any given property will not be computed twice on
         the same graph; this breaks the complexity of graph
         operations.
     I   For example, the computation of the diameter will return
         immediately if all-pair shortest paths were computed
         previously. This is because diameter requires the distance
         matrix that was already computed by the shortest path
         algorithm.
   Depending on the application, performance dramatically improves!

Use of C/C++ code

  Bullet 4: if you cannot fasten your Java implementation anymore,
  then write it in C.
    I   hackers can expect a speed-up of 5;
    I   Grph will find it and compile it on-the-fly using optimization
        flags for your specific computer;
    I   if compilation failed, an optional 100% pure Java alternative
        will allow the program to run, still.
  JNI and JNA prove inadequate. Instead Grph resort to process
  piping. (number of triangles, max-clique, (sub)graph isomorphism,
  etc) are already implemented in C++.

Parallelizing algorithms

   Bullet 5: take advantage of multi-core computers:
     I   algorithms that can be computed independently on every
         vertex in the graph can be automatically parallelized;
     I   Grph generates a lot more threads that installed cores, hence
         reducing the probability of unfair load balancing;
     I   on my dual-core computer, I experimented speed-up between
         1 and 1.7.

Resorting to linear programming

   Bullet 6: benefiting from the advantages of linear programming:
   Many graph problems can be expressed as linear programs:
     I   the linear program is often shorter than its corresponding
         algorithm;
     I   its resolution benefits from the high efficiency of the solver’s
         strategies for solving.
   Supported for CPLEX and GLPK (under progress). Invocation of
   remote solvers (through SSH) is also available.

Disabling verifications

   Bullet 7: a good program is one that checks its parameters. But if
   you KNOW that the parameters are correct, you can disable these
   time-consuming verifications.
     I   method arguments are checked by assertions;
     I   hence they can be disabled;
     I   improves “production” mode.

Demonstration: let’s see how
to do things
 I   creating a graph (10 × 10 grid);
 I   computing the diameter;
 I   computing all pair shortest paths;
 I   adding a new Java algorithm;
 I   adding a new property to vertices;
 I   computing a distribution;
 I   console profiling;

Users
I   31 users;
I   INRIA, INRA, UCL, Trier, Georgia, Politechnika Warszawska,
    etc

Almost the end

Conclusion

   So we have a graph lib which design objectives are to maximize:
     I   memory usage (use of native types);
     I   computational efficiency (use of caching, parallelism, etc);
     I   unpronounceability.
     I   simplicity to use (simple but functional Java API and tools);
   Grph is currently used by:
     I   Mascotte team at INRIA (at the heart of DRMSim);
     I   You? (no worries, we do provide pro-active support).

Conclusion (why would you do so?)

   I suggest you to take a look at Grph if you want to:
     I   to write graph-based scientific applications;
     I   get rid of Java usual inefficiency;
     I   work with me. :)

   http://www-sop.inria.fr/members/Luc.Hogie/grph/

End of the presntation.
    Any qustions?

You can also read

Using Graph Template Language and R for High

Evo Flattop Grills - Designed & Built in the USA

GREEN BUILDINGS MARKET INTELLIGENCE BRAZIL COUNTRY PROFILE - EDGE Buildings

AHPC 2020 sipearl.com European-processor-initiative.eu - Klosterneuburg 21-02-2020 - European ...

A GENERIC FRAMEWORK FOR ENGAGING ONLINE DATA SOURCES IN INTRODUCTORY PROGRAMMING COURSES - NADEEM ABDUL HAMID - Sinbad.key

Karl Koscher - @supersat Eric Butler - @codebutler Writing, building, loading, and using code on SIM Cards.

EFFICIENTLY R/W "BIG" DATA COLLECTIONS: STORING, QUERYING ON POLYGLOT SETTINGS - GENOVEVA VARGAS SOLAR

Year 6 - Networks Role Plays - Bryony School

Preparing for Science in the Exascale Era - Argonne ...

Football Helmet Standards Overview - NOCSAE

Company Overview March 2018 - (OTCQB:ACLZ) - Accelerize

Maintenance Schedule Honda CR-V - 6807 Tilton Road Egg Harbor Township, NJ 08234 BoardwalkHonda.com - Dealer Inspire

Data Sheet FUJITSU Storage ETERNUS DX100 S5 Disk System

ANIXTER : PROXIM WIRELESS QUOTATION GUIDE ORINOCO WI-FI - A PIONEER AND INNOVATOR IN WIRELESS NETWORKING

Irish Residential Properties REIT plc - Building Communities and Creating Value - Irish Residential Properties ...

Sonatype nexuS profeSSional - Deployment Guidelines

David C. Kehlet, Research Scientist February 21, 2020 - This research was developed with funding from the Defense Advanced Research Projects ...

Linux and the Thin Client Management Market - August 2018 An IDC InfoBrief, Sponsored by - IGEL

Moving towards meaningful and harmonised due diligence disclosure

The power of perspective - 2021 Vanguard Index Chart

INVESTOR PRESENTATION - First Quarter 2015 - WPT Industrial REIT

2020 JEEP GRAND CHEROKEE - BUYER'S GUIDE TO THE - RockwallDodge.com - Dealer Inspire

Ministry Business Plan - Justice and Solicitor General

The Meaning of U. S. State Names - By Mark R. Putnam

Results Presentation Q1-3 2018 - Wienerberger AG

Donaco records $42.4 million EBITDA for FY2018 - Donaco International

Testimony of Evelyn Robinson, Managing Partner - State Government Policy - Before the Indiana Senate Utilities Committee - PJM

2019 Denver Gold Forum - Denver, Colorado, USA Mick Wilkes President & Chief Executive Officer - OceanaGold

Silence Please, Begin - Rashaad Newsome - Curated by Suzanne Carte Art Gallery of York University April 8 - June 14, 2015 - SUZANNE CARTE

Western Region Local Workforce Development Board 2017-2020 Regional Workforce Plan

NU PLASMA II MULTI-COLLECTOR ICP-MS

WIFI TECHNOLOGY REFRESH ANALYSIS - FEDRAMP CLOUD WIFI PROMOTION - Q3 2018 - HUBSPOT

WHY INVEST IN JO&JOE ACCOR GLOBAL DEVELOPMENT FEBRUARY 2019

RENT OUR FACILITIES - the Oakville Centre for the Performing Arts

The MTV VMAs is all set to return this 2020 in the New York City despite the Covid-19 pandemic

How To Grow Your Business With Affiliate Marketing - A quick guide to connecting with thousands of publishers and driving traffic and sales to ...

STRATEGIC PLAN 2019-2021 BRUMBIES RUGBY

Lenovo ThinkPad E15 GEN 3

5.3 JAXA: GCOM-C/SGLI - new developments JAXA/EORC Hiroshi Murakami IOCCG-22 Committee Meeting

Bond investor presentation - Cision

2021 HIGHLIGHTS 25+ YEARS PARTNERING WITH OUTSTANDING MANAGEMENT TEAMS TO ACCELERATE VALUE CREATION - Sun Capital Partners

4 One Act Plays by David Ives - TIME FLIES DEGAS, C'EST MOI LIVES OF THE SAINTS THE GREEN HILL - Campbellsville University

Stomping Ground Commission Callout 2021 - The Place

Parques Reunidos Expands to Australia with the Acquisition of Wet'n'Wild Sydney - July 2018

UPDATE ON ELECTIONS 2020: DÉJÀ VU? OR BLUE WAVE - BNP ...

THE BENEFITS OF DIRECT REAL ESTATE - IN AN INFLATIONARY ENVIRONMENT JULY 2021 - USAA Real Estate

BALKRISHNA INDUSTRIES LTD - Investor Presentation February, 2019 - NSE India

Teenagers: Sleep Patterns and School Performance

First Quarter 2018 Investment Commentary

How the Millennial Generation is Changing the Workplace