Scientific Computing Data Visualization Information Technology Quantum Computing - JULY 2020

 
Scientific Computing
Data Visualization
Information Technology
Quantum Computing

JULY 2020 www.computer.org
IEEE Computer Society Has You Covered!
WORLD-CLASS CONFERENCES — Stay
ahead of the curve by attending one of our
200+ globally recognized conferences.

DIGITAL LIBRARY — Easily access over 780k
articles covering world-class peer-reviewed
content in the IEEE Computer Society
Digital Library.

CALLS FOR PAPERS — Discover
opportunities to write and present your
ground-breaking accomplishments.

EDUCATION — Strengthen your resume
with the IEEE Computer Society Course
Catalog and its range of offerings.

ADVANCE YOUR CAREER — Search the
new positions posted in the IEEE Computer
Society Jobs Board.

NETWORK — Make connections that count
by participating in local Region, Section,
and Chapter activities.

Explore all of the member benefits
at www.computer.org today!
IEEE COMPUTER SOCIETY computer.org

   STAFF
   Editor: Cathy Martin
   Publications Portfolio Managers: Carrie Clark, Kimberly Sperka
   Publications Operations Project Specialist: Christine Anthony
   Publisher: Robin Baldwin
   Production & Design Artist: Carmen Flores-Garvey
   Senior Advertising Coordinator: Debbie Sims

   Circulation: ComputingEdge (ISSN 2469-7087) is published monthly by the IEEE Computer Society. IEEE Headquarters, Three Park Avenue, 17th
   Floor, New York, NY 10016-5997; IEEE Computer Society Publications Office, 10662 Los Vaqueros Circle, Los Alamitos, CA 90720; voice +1 714 821 8380;
   fax +1 714 821 4010; IEEE Computer Society Headquarters, 2001 L Street NW, Suite 700, Washington, DC 20036.
   Postmaster: Send address changes to ComputingEdge-IEEE Membership Processing Dept., 445 Hoes Lane, Piscataway, NJ 08855. Periodicals Postage
   Paid at New York, New York, and at additional mailing offices. Printed in USA.
   Editorial: Unless otherwise stated, bylined articles, as well as product and service descriptions, reflect the author’s or firm’s opinion. Inclusion in
   ComputingEdge does not necessarily constitute endorsement by the IEEE or the Computer Society. All submissions are subject to editing for style,
   clarity, and space.
   Reuse Rights and Reprint Permissions: Educational or personal use of this material is permitted without fee, provided such use: 1) is not made for profit; 2) includes this notice and a full citation to the original work on the first page of the copy; and 3) does not imply IEEE endorsement of any third-party products or services. Authors and their companies are permitted to post the accepted version of IEEE-copyrighted material on their own Web servers without permission, provided that the IEEE copyright notice and a full citation to the original work appear on the first screen of the posted copy. An accepted manuscript is a version which has been revised by the author to incorporate review suggestions, but not the published version with copyediting, proofreading, and formatting added by IEEE. For more information, please go to: http://www.ieee.org/publications_standards/publications/rights/paperversionpolicy.html. Permission to reprint/republish this material for commercial, advertising, or promotional purposes or for creating new collective works for resale or redistribution must be obtained from IEEE by writing to the IEEE Intellectual Property Rights Office, 445 Hoes Lane, Piscataway, NJ 08854-4141 or pubs-permissions@ieee.org. Copyright © 2020 IEEE. All rights reserved.
   Abstracting and Library Use: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy for private use of patrons,
   provided the per-copy fee indicated in the code at the bottom of the first page is paid through the Copyright Clearance Center, 222 Rosewood Drive,
   Danvers, MA 01923.
   Unsubscribe: If you no longer wish to receive this ComputingEdge mailing, please email IEEE Computer Society Customer Service at help@computer.org and type “unsubscribe ComputingEdge” in your subject line.
   IEEE prohibits discrimination, harassment, and bullying. For more information, visit www.ieee.org/web/aboutus/whatis/policies/p9-26.html.

   IEEE Computer Society Magazine Editors in Chief

   Computer: Jeff Voas, NIST
   IEEE Intelligent Systems: V.S. Subrahmanian, Dartmouth College
   IEEE Pervasive Computing: Marc Langheinrich, Università della Svizzera italiana
   Computing in Science & Engineering: Lorena A. Barba (Interim), George Washington University
   IEEE Internet Computing: George Pallis, University of Cyprus
   IEEE Security & Privacy: David Nicol, University of Illinois at Urbana-Champaign
   IEEE Annals of the History of Computing: Gerardo Con Diaz, University of California, Davis
   IEEE Micro: Lizy Kurian John, University of Texas at Austin
   IEEE Software: Ipek Ozkaya, Software Engineering Institute
   IEEE MultiMedia: Shu-Ching Chen, Florida International University
   IT Professional: Irena Bojanova, NIST
   IEEE Computer Graphics and Applications: Torsten Möller, Universität Wien

2469-7087/20 © 2020 IEEE                              Published by the IEEE Computer Society                                         July 2020                   1
JULY 2020 • VOLUME 6 • NUMBER 7

           16
SciPipe—Turning
                                        22
                                Weather Report: A
                                                      30
                                                     OpenSpace:
        Scientific            Site-Specific Artwork       Bringing
       Workflows                      Interweaving          NASA
  into Computer              Human Experiences        Missions to
       Programs                and Scientific Data     the Public
                                   Physicalization
Scientifi c Computing Data Visualization Information Technology Quantum Computing - JULY 2020
Scientific Computing
8    Metamorphic Testing: A Simple Yet Effective Approach for Testing Scientific Software
     UPULEE KANEWALA AND TSONG YUEH CHEN

16   SciPipe—Turning Scientific Workflows into Computer Programs
     SAMUEL LAMPA, MARTIN DAHLÖ, JONATHAN ALVARSSON, AND OLA SPJUTH

Data Visualization
22   Weather Report: A Site-Specific Artwork Interweaving Human Experiences and Scientific Data Physicalization
     DANIEL F. KEEFE, SETH JOHNSON, ROSS ALTHEIMER, DEUK-GEUN HONG, ROBERT HUNTER, ANDREA J. JOHNSON, MAURA ROCKCASTLE, MARK SWACKHAMER, AND AARON WITTKAMPER

30   OpenSpace: Bringing NASA Missions to the Public
     ALEXANDER BOCK, CHARLES HANSEN, AND ANDERS YNNERMAN

Information Technology
38   A Manifesto for Energy-Aware Software
     ALCIDES FONSECA, RICK KAZMAN, AND PATRICIA LAGO

42   Next Generation IoT: Toward Ubiquitous Autonomous Cost-Efficient IoT Devices
     MOUSTAFA YOUSSEF AND MAHBUB HASSAN

Quantum Computing
46   Powerball and Quantum Supremacy
     ERIK P. DEBENEDICTIS

50   Beyond Quantum Supremacy
     ERIK P. DEBENEDICTIS

Departments
4    Magazine Roundup
7    Editor’s Note: Advancing Science with Software
56   Conference Calendar

                                         Subscribe to ComputingEdge for free at
                                         www.computer.org/computingedge.
Magazine Roundup

The IEEE Computer Society’s lineup of 12 peer-reviewed technical magazines covers cutting-edge topics ranging from software design and computer graphics to Internet computing and security, from scientific applications and machine intelligence to visualization and microchip design. Here are highlights from recent issues.

Challenges and Opportunities in the Detection of Safety-Critical Cyberphysical Attacks

Cyberphysical systems (CPSs) are increasingly used in various application domains and face the threat of cyberphysical attacks. In this article from the March 2020 issue of Computer, the authors discuss challenges in detecting these attacks. They use power grids and surgical robots to clarify their analysis, and they use this analysis to identify ongoing challenges and future research directions.

Exploratory Metamorphic Testing for Scientific Software

Scientific model developers are able to verify and validate their software via metamorphic testing (MT), even when the expected output of a given test case is not readily available. The tenet is to check whether certain relations hold among the expected outputs of multiple related inputs. Contemporary approaches require that the relations be defined before tests. In this article from the March/April 2020 issue of Computing in Science & Engineering, the authors’ experience shows that it is often straightforward to first define the multiple iterations of tests for performing continuous simulations, and then keep multiple and even competing metamorphic relations open for investigating the testing-result patterns. The authors call this new approach exploratory MT, and they report their experience of applying it to detect bugs, mismatches, and constraints in automatically calibrating parameters for the United States Environmental Protection Agency’s Storm Water Management Model.

The Font Wars, Part 1

The Font Wars were a decades-long competition in the computer industry for dominance in font technology, viewed as a key success factor for personal computing platforms. At the heart of the Font Wars was a fundamental question: What is the best way to turn traditional printed letter forms into digital fonts for computer screens and printers? Answers to this question were researched, implemented, and launched into the marketplace, where their intense competition transformed the 500-year tradition of printing and publishing, placing electronic literacy on the screens of billions of digital displays, computers, tablets, and smart phones around the world. Read more in this article from the January–March 2020 issue of IEEE Annals of the History of Computing.

Illustrating Changes in Time-Series Data with Data Video

Understanding changes in time series is a common task in many application domains. Converting time-series data into videos helps an audience with little or no background knowledge gain insights and deep impressions. Data videos essentially integrate data visualizations and animations to present the evolution of data expressively. However, creating this kind of data video remains challenging. First, it is difficult to efficiently detect important changes and include them in the video sequence; existing methods require much manual effort to explore the data and find changes. Second, how these changes are emphasized in the videos is also worth studying, because a video without emphasis will hinder an audience from noticing those important changes. This article from the March/April 2020 issue of IEEE Computer Graphics and Applications presents an approach that extracts and visualizes important changes in a time series.

Research on Road Traffic Situation Awareness System Based on Image Big Data

Road traffic is an important component of the national economy and social life. Promoting intelligence and informationization in road traffic helps in building smart cities and in formulating macro-strategies and construction plans for urban traffic development. To address the shortcomings of the current road traffic system, this article from the January/February 2020 issue of IEEE Intelligent Systems combines convolutional neural networks (CNNs), situational awareness, databases, and other technologies to study a road traffic situational awareness system, analyzing its information collection, processing, and analysis process.

Container NATs and Session-Oriented Standards: Friends or Foe?

This article from the November/December 2019 issue of IEEE Internet Computing highlights issues that arise when deploying network address translation middle-boxes through containers. The authors focus on Docker as the container technology of choice and present a thorough analysis of its networking model, with special attention to the default bridge network driver that is used to implement network address translation functionality. They discuss some unexpected shortcomings and elaborate on the suitability of containers for deploying services based on the Interactive Connectivity Establishment standard protocol. To support their findings, they present experiments that they conducted in a real-world operational environment, namely a WebRTC service based on the Janus media server.

Compute Solution for Tesla’s Full Self-Driving Computer

Tesla’s full self-driving (FSD) computer is the world’s first purpose-built computer for the highly demanding workloads of autonomous driving. It is based on a new system-on-a-chip (SoC) that integrates industry-standard components, such as CPUs, an ISP, and a GPU, with custom neural network accelerators. The FSD computer is capable of processing up to 2,300 frames per second, a 21× improvement over Tesla’s previous hardware, at a lower cost. When fully utilized, it enables a new level of safety and autonomy on the road. Read more in this article from the March/April 2020 issue of IEEE Micro.

Metric Learning-Based Multimodal Audio-Visual Emotion Recognition

People express their emotions through multiple channels, such as visual and audio ones. Consequently, automatic emotion recognition can benefit significantly from multimodal learning. Even though each modality exhibits unique characteristics, multimodal learning takes advantage of the complementary information of diverse modalities when measuring the same instance, resulting in enhanced understanding of emotions. Yet the dependencies and relations between modalities are not fully exploited in audio–video emotion recognition. Furthermore, learning an effective metric through multimodality is a crucial goal for many applications in machine learning. Therefore, in this article from the January–March 2020 issue of IEEE MultiMedia, the authors propose multimodal emotion recognition metric learning (MERML), learned jointly to obtain a discriminative score and a robust representation in a latent space for both modalities. The learned metric is efficiently used through the radial basis function (RBF)-based support vector machine (SVM) kernel. Evaluation of the framework shows significant performance, improving on the state-of-the-art results on the eNTERFACE and CREMA-D datasets.

CLIMB: A Pervasive Gameful Platform Promoting Child Independent Mobility

Child independent mobility (CIM) refers to the freedom and capability of children to move about their local neighborhoods without constant direct adult supervision. The CLIMB project combats an observed decline in CIM, offering a pervasive gameful platform for home–school mobility composed of three primary components: the first two use technology to support different levels of child independence, and the third provides an element of continuous motivation for positive behavior change. This article from the January–March 2020 issue of IEEE Pervasive Computing describes these three novel technologies: PedibusSmart, SafePath, and KidsGoGreen. It reports on four years of success with more than 1,800 elementary-age children, their teachers, and their families. The authors further show how (i) disappearing, pervasive technology contributes to successful adoption; (ii) properly balancing trust and tracking leads to useful, noninvasive technological support; and (iii) in-classroom, gameful technology engages and motivates participation, with behavior changes persisting over time.

The Need for New Antiphishing Measures against Spear-Phishing Attacks

In this article from the March/April 2020 issue of IEEE Security & Privacy, the authors provide extensive analysis of the unique characteristics of phishing and spear-phishing attacks, argue that spear-phishing attacks cannot be well captured by current countermeasures, identify ways forward, and analyze an advanced spear-phishing campaign targeting white-collar workers in 32 countries.

Three Phases of Transforming a Project-Based IT Company into a Lean and Design-Led Digital Service Provider

Digital transformation requires a continuous review of value creation, value capture, and resourcing. In this article from the March/April 2020 issue of IEEE Software, the authors define a systematic service design concept to enable all stakeholders to achieve better outcomes in co-creation activities.

Detecting Online Content Deception

The surge of deceptive content (such as fake news) in the past few years has made content deception an important area of research. The authors of this article from the March/April 2020 issue of IT Professional identify two main types of content deception, based on either fake content or misleading content. They present a classification of deception attacks along with delivery methods. They also discuss defense measures that can detect deception attacks. Finally, they highlight some outstanding challenges in the area of content deception.

Join the IEEE Computer Society: computer.org/join

Editor’s Note

Advancing Science with Software

Scientists increasingly use software to facilitate discoveries through simulation and analysis. Scientific software can enable research that might be impractical or impossible using experimentation, observation, or theory. Despite its importance, challenges remain in terms of testing, reliability, and reusability. This ComputingEdge issue presents tools for helping scientists effectively utilize software in their research.

“Metamorphic Testing: A Simple Yet Effective Approach for Testing Scientific Software,” from Computing in Science & Engineering, aims to help scientists who have limited training in software development employ scientific software successfully. The authors explain that metamorphic testing is a good technique for testing exploratory software with inherent uncertainties. Also from Computing in Science & Engineering, “SciPipe—Turning Scientific Workflows into Computer Programs” discusses a tool for automating complex scientific computations involving multiple programs.

Scientific data can be challenging not only to analyze but also to convey to the public. This ComputingEdge issue includes two examples of creative scientific data visualizations from IEEE Computer Graphics and Applications. In “Weather Report: A Site-Specific Artwork Interweaving Human Experiences and Scientific Data Physicalization,” the authors describe a public art installation that educates people about climate change. In “OpenSpace: Bringing NASA Missions to the Public,” the authors present a visualization program for use in interactive museum displays that helps people better understand NASA’s missions and discoveries.

Whether scientific or commercial, software should be as energy-efficient as possible. The authors of IEEE Software’s “A Manifesto for Energy-Aware Software” argue that software developers must consider energy consumption when designing programs. IEEE Pervasive Computing’s “Next Generation IoT: Toward Ubiquitous Autonomous Cost-Efficient IoT Devices” also emphasizes the need for energy efficiency, specifically in IoT devices.

This issue of ComputingEdge concludes with two Computer articles on quantum supremacy. “Powerball and Quantum Supremacy” teaches quantum concepts using a lottery metaphor. “Beyond Quantum Supremacy” identifies advances that are needed to move quantum computing forward.

DEPARTMENT: SOFTWARE ENGINEERING

    Metamorphic Testing:
    A Simple Yet Effective Approach
    for Testing Scientific Software
    Upulee Kanewala, Montana State University
    Tsong Yueh Chen, Swinburne University of Technology

    This article originally appeared in Computing in Science & Engineering, vol. 21, no. 1, 2019.

        Testing scientific software is a difficult task due to its inherent complexity and the lack of test oracles. In addition, these software systems are usually developed by end-user developers who are not normally trained as professional software developers or testers. These factors often lead to inadequate testing. Metamorphic testing (MT) is a simple yet effective technique for testing such applications. Even though MT is a well-known technique in the software testing community, it is underutilized by scientific software developers. The objective of this paper is to present MT as an effective technique for testing scientific software. To this end, we discuss why MT is an appropriate testing technique for scientists and engineers who are not primarily trained as software developers, and specifically how it can be used to conduct systematic and effective testing on programs that do not have test oracles, without requiring additional testing tools.
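A minimal Python sketch of the idea summarized above, using the article’s running example of a program that averages a list of real numbers. Instead of comparing the output against a known correct answer, it checks two metamorphic relations between a source test case and a follow-up test case: permuting the list, and duplicating it, should both leave the average unchanged up to round-off. The names `check_mr`, `permute`, `duplicate`, and `buggy_mean` are hypothetical, as are the seeded random generators, the 100,000-element input (smaller than the article’s two million numbers, to keep it fast), and the 1e-9 relative tolerance; none of these come from the article. The variable names t_s, t_f, o_s, and o_f follow the notation of Figure 1.

```python
import math
import random
from typing import Callable, List, Sequence

# Seeded generator so that permuted follow-up inputs are reproducible (assumption).
_RNG = random.Random(12345)


def mean(values: Sequence[float]) -> float:
    """Program under test: arithmetic mean of a list of real numbers."""
    return sum(values) / len(values)


def buggy_mean(values: Sequence[float]) -> float:
    """Hypothetical faulty version that divides by n - 1 instead of n."""
    return sum(values) / (len(values) - 1)


def permute(values: Sequence[float]) -> List[float]:
    """Follow-up input for MR 1: a shuffled copy of the source list."""
    shuffled = list(values)
    _RNG.shuffle(shuffled)
    return shuffled


def duplicate(values: Sequence[float]) -> List[float]:
    """Follow-up input for MR 2: the source list concatenated with itself."""
    return list(values) + list(values)


def check_mr(
    program: Callable[[Sequence[float]], float],
    t_s: Sequence[float],
    make_follow_up: Callable[[Sequence[float]], List[float]],
    rel_tol: float = 1e-9,
) -> bool:
    """Return True if the source and follow-up outputs satisfy the MR.

    Both MRs here say the mean must not change, so the relation is
    equality up to a round-off tolerance. No test oracle is consulted.
    """
    o_s = program(t_s)            # output of the source test case
    t_f = make_follow_up(t_s)     # follow-up test case derived from t_s
    o_f = program(t_f)            # output of the follow-up test case
    return math.isclose(o_s, o_f, rel_tol=rel_tol)


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.uniform(0.0, 100.0) for _ in range(100_000)]
    for program in (mean, buggy_mean):
        for make_follow_up in (permute, duplicate):
            ok = check_mr(program, data, make_follow_up)
            print(f"{program.__name__:<10} {make_follow_up.__name__:<9} {'pass' if ok else 'FAIL'}")
```

Run as a script, the sketch should show the correct `mean` satisfying both relations. The hypothetical `buggy_mean` passes the permutation relation, because shuffling does not change n - 1, but it fails the duplication relation. This illustrates why it can pay to check several MRs: each one is sensitive to different faults.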

    DOI No. 10.1109/MCSE.2018.2875368
    Date of publication 18 October 2018; date of current version 6 March 2019.

    WHAT MAKES TESTING SCIENTIFIC SOFTWARE DIFFICULT?
    Scientific software is widely used for making critical decisions in various scientific and engineering domains. For example, simulations are often used in place of physical experiments because of the time and cost constraints associated with conducting physical experiments. Furthermore, decisions made by these software systems can affect day-to-day human life, as with predictions made by climate models. Thus, it is important to make sure that these software systems produce correct results. Previous studies have reported many instances where scientific software systems were affected by faults, such as seismic programs losing precision due to one-off errors,1 compromised performance in coordinate measuring machines (CMMs) due to software faults,2 and geoscience software systems producing seemingly correct yet different results that are hard to categorize as incorrect.3 Previous works also report situations where software faults caused retractions of published work.4 Testing is the most widely used approach to software quality assurance, but some inherent characteristics of scientific software make it difficult to test these programs systematically.5

     ›› Correct answers are often unknown. Scientific software is typically exploratory in nature, so the correct results are often unknown; if the result were known, there would be no need to develop the software. In such situations, only bounds or ranges of solutions might be available. In testing, an expected output is typically used to decide test case passing
        or failing. This makes it challenging to conduct systematic testing on these programs.
     ›› Practically difficult to validate the computed output. Scientific software often implements mathematical models that involve complex calculations. Furthermore, it tends to produce complex outputs. Both of these characteristics make it hard to determine whether the output produced by the software is correct. This also makes it challenging to use automated test case generation approaches such as random test generation, since the outputs of such test cases are difficult to validate.
     ›› Inherent uncertainties. Scientific software is often written to simulate models with inherent uncertainties. For some of these programs, there may be more than one possible output, which makes them challenging to test.
     ›› Choosing suitable tolerances. Scientific software systems often involve complex

    input list has two million real numbers. How can we know whether the returned average is correct? Although we cannot validate the computed average in this case, we do know some relationships between the outputs of related inputs. For example, consider a new list that is either a permutation of the original list or a list of four million real numbers formed by duplicating the original list. For either new list, the new average is expected to be the same as the old average (subject to some round-off error tolerance). If the new and old averages are not the same, then we know that the program computing the average has bugs. This is the intuition of MT.

    FIGURE 1. MT process. ts: source test case, tf: follow-up test case, os: output of the source test case, and of: output of the follow-up test case.

    In software testing, whether a test case passes or fails is decided using a test oracle, an essential component for conducting systematic testing. MT instead uses metamorphic relations (MRs) to determine whether a test case has passed or failed. An MR specifies how the output of the program is expected to change when a specified change is made to the input. The following
        floating-point computations. Thus, specifying                   is the typical process for applying MT to a given pro-
        the acceptable tolerance for the expected                       gram and Figure 1 depicts this process:6
        output in test cases is difficult.
     ›› Incompatible testing tools. Programming                             1. MR identification: identifying MRs for the
        languages such as FORTRAN are widely used                              program under test can be done based on the
        in the scientific community for developing                             specification.
        scientific software. However, testing tools                         2. Source test case creation and execution: com-
        are usually developed for languages such as                            monly used test generation techniques such
        JAVA and C++ that are commonly used by                                 as random, structural coverage, or fault-based
        the software engineering community. Thus,                              test input generation can be used. Then the
        these testing tools are not effective for testing                      generated source test cases are executed on
        scientific programs.                                                   the program under test.
                                                                            3. Follow-up test case creation: use the MRs
   WHAT IS METAMORPHIC                                                         identified in Step 1 to transform the source test
   TESTING (MT)?                                                               case to obtain the follow-up test case.
  Consider a program that will accept a list of real num-                   4. Follow-up test case execution: Execute the
  bers and compute their average. Suppose that the                             follow-up test case and compare the outputs

www.computer.org/computingedge                                                                                                         9
SOFTWARE ENGINEERING

     FIGURE 2. A JUnit test script that uses MT approach to test a matrix multiplication function in the JScience library.

                                                                                     1 2                 2 1
            of the source and follow-up test cases to verify                 A =         and B =             .
                                                                                     3 4                 3 4
            whether the corresponding MRs are satisfied.
            Violation of an MR indicates that the program                    This source test case is executed on the matrix
            under test is faulty.                                        multiplication function under test first. Next, based on
                                                                         the input relationship specified by the above MR, two
         Thus, MT checks whether relationships between                   follow-up test cases are created, namely, the test case
     inputs and outputs of multiple executions were pre-                 consisting of A and B1 and the test case consisting
     served during the program execution and can be used                 of A and B2. Here, B1 is randomly generated and B2 is
     without knowing the correctness of the output for                   defined as B − B1. Suppose that
     individual executions.
         Consider a program P that multiplies two matrices
                                                                                     1 6
     A and B. Assume that the result of multiplying A with B                 B1 =        .
                                                                                     3 5
     is C. Matrix multiplication has the following property:
     A × B = A × B1 + A × B2 where B = B1 + B2. We can use this          Then,
     property as an MR to conduct MT on P. For example,
                                                                                                  1     5
     Figure 2 shows a test script written using JUnit to con-                B2 = B      B1 =             .
                                                                                                  0     1
     duct automated testing on the matrix multiplication
     function in the JScience Matrix class (http://jscience                  As shown in Figure 3, these two follow-up test
     .org/api/org/jscience/mathematics/vector/Matrix                     cases are also executed on the matrix multiplication
     .html).                                                             function under test. Finally, the outputs of the source
         The source test case (which consists of A and B)                test and the follow-up test cases are validated against
     can be provided by the user or can be generated ran-                the above MR. As shown with this test script, this MR
     domly. For example, assume that the user provided the               based testing approach allows to generate follow-up
     following simple two matrices:                                      test cases automatically and verify relationships

10                 ComputingEdge                                                                                            July 2020
SOFTWARE ENGINEERING

  FIGURE 3. MT of a matrix multiplication program.

  between multiple outputs without any manual inter-            them to test their programs using their domain
  vention. Readers who are interested to know more              knowledge. For example, consider a program
  about MT may consult the article by Chen et al.7              that computes the sine value of a given angle x.
                                                                We can derive the following two MRs for testing
   WHY USE MT FOR TESTING                                       the sine function based on its properties: MR1:
   SCIENTIFIC SOFTWARE?                                         sin(x’) = sin(x) where x’ = x + 360°. MR2: sin(x’)
     ›› Scientific software is often written by scientists      = −sin(x) where x’ = −x. Due to the constraints
        who have the domain knowledge required to               on the testing budget, suppose that we can
        develop them. However, scientists might lack the        conduct testing with only one MR. In such a
        knowledge to apply different forms of conven-           situation, an electrical engineer will most likely
        tional testing methods. But, as evident from the        choose MR1 to test her program due to the
        example in Figure 2, MT is simple in concept,           periodicity of current. But, on the other hand, a
        and hence could be easily learned and applied           land surveyor may choose MR2 since she usually
        without any prior knowledge of software testing         works with positive and negative angles to
        or without any software testing experience.8            represent clockwise and anticlockwise mea-
     ›› As shown in the example given in Figure 2, MT is        surements of angles.
        easy to implement: test scripts could be easily      ›› MT provides an effective way to conduct unit
        prepared by the scientific software developers to       testing for scientific software. One of the
        automate the testing process or to incorporate          reasons for the lack of unit testing in scientific
        MT into existing testing infrastructures such as        software is the difficulty in validating the
        JUnit. Thus, MT does not require the developers         expected output of the unit for a randomly
        to buy or maintain additional expensive testing         generated test input. In such situations, MT can
        tools.                                                  be used to conduct automated unit testing by
     ›› As we discussed in Section “WHAT IS META-               means of MRs.
        MORPHIC TESTING (MT)?”, many scientific              ›› Scientists often conduct testing using a limited
        and engineering applications face the test              number of test cases with known outputs that
        oracle problem. This makes it challenging to            they obtain from experiments or analytical
        conduct automated systematic testing on these           solutions. MT can be used to effectively extend
        programs. MT supports automated systematic              these limited number of test cases by deriving
        testing on such programs.                               MRs and creating follow-up test cases according
     ›› MT uses MRs to determine whether test cases             to the MRs. These follow-up test cases are most
        pass or fail. Often scientific software is devel-       likely to execute parts of the program that might
        oped by domain experts who know the properties          not have been executed with the original set of
        of these algorithms the best. Thus, it would be         test cases. Thus, MT provides a way to extend
        easy for them to derive MRs for testing these           existing test cases.
        programs.                                            ›› Many scientific software involves elements of
     ›› Scientific software developers will be able to          randomness which makes testing difficult. MT
        identify the most effective MRs and prioritize          can still be applicable in such situations.
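The averaging example from “WHAT IS METAMORPHIC TESTING (MT)?” maps directly onto the four-step MT process. The following Java sketch is ours, not the authors': `average` stands in for the program under test, `buggyAverage` is a hypothetical faulty version, and the list size (100,000 rather than two million, to keep memory use modest), the value range, and the 1e-9 relative tolerance are illustrative assumptions. It generates a random source list, derives the permuted and duplicated follow-up lists, and reports whether either relation is violated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

class AverageMT {

    // Program under test: arithmetic mean of a list.
    static double average(List<Double> xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.size();
    }

    // Hypothetical faulty version: off-by-one in the divisor.
    static double buggyAverage(List<Double> xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / (xs.size() - 1);
    }

    // Equality up to a small relative tolerance, to absorb round-off error.
    static boolean close(double a, double b) {
        return Math.abs(a - b) <= 1e-9 * Math.max(1.0, Math.max(Math.abs(a), Math.abs(b)));
    }

    // Generates one random source test case and checks both MRs; true if neither is violated.
    static boolean satisfiesMRs(ToDoubleFunction<List<Double>> program, int n, long seed) {
        Random rng = new Random(seed);
        List<Double> source = new ArrayList<>();
        for (int i = 0; i < n; i++) source.add(rng.nextDouble() * 1000.0);
        double sourceOut = program.applyAsDouble(source);

        // MR-permute: shuffling the list must not change the average.
        List<Double> permuted = new ArrayList<>(source);
        Collections.shuffle(permuted, rng);
        boolean permuteOk = close(sourceOut, program.applyAsDouble(permuted));

        // MR-duplicate: appending a copy of the list must not change the average.
        List<Double> duplicated = new ArrayList<>(source);
        duplicated.addAll(source);
        boolean duplicateOk = close(sourceOut, program.applyAsDouble(duplicated));

        return permuteOk && duplicateOk;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesMRs(AverageMT::average, 100_000, 42L));
        System.out.println(satisfiesMRs(AverageMT::buggyAverage, 1_000, 42L));
    }
}
```

With these settings, the correct version satisfies both relations, while the off-by-one version passes the permutation relation but violates the duplication relation, which illustrates why it can pay to check more than one MR.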

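Figure 2's JUnit script targets the JScience Matrix class. Because that script is not reproduced here, the following is a library-free Java sketch of the same MR rather than a transcription of the figure: `multiply` is a hypothetical stand-in for the function under test, `buggyMultiply` is a hypothetical faulty version, and the 1e-9 tolerance is an assumption. For the worked example (A, B, and B1 as given in the text), B2 = B − B1 has entries 1, −5, 0, −1, and both A × B and A × B1 + A × B2 equal [8 9; 18 19].

```java
import java.util.Random;
import java.util.function.BinaryOperator;

class MatrixMT {

    // Program under test: textbook triple-loop multiplication (stand-in for a library routine).
    static double[][] multiply(double[][] a, double[][] b) {
        int rows = a.length, inner = b.length, cols = b[0].length;
        double[][] c = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double s = 0.0;
                for (int p = 0; p < inner; p++) s += a[i][p] * b[p][j];
                c[i][j] = s;
            }
        }
        return c;
    }

    // Hypothetical faulty version: adds a spurious constant to one entry of the product.
    static double[][] buggyMultiply(double[][] a, double[][] b) {
        double[][] c = multiply(a, b);
        c[0][0] += 1.0;
        return c;
    }

    // Element-wise x + sign * y.
    static double[][] combine(double[][] x, double[][] y, double sign) {
        double[][] z = new double[x.length][x[0].length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x[0].length; j++)
                z[i][j] = x[i][j] + sign * y[i][j];
        return z;
    }

    static boolean approxEqual(double[][] x, double[][] y, double tol) {
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x[0].length; j++)
                if (Math.abs(x[i][j] - y[i][j]) > tol) return false;
        return true;
    }

    // MR: A x B = A x B1 + A x B2, where B2 = B - B1.
    static boolean satisfiesMR(BinaryOperator<double[][]> program,
                               double[][] a, double[][] b, double[][] b1) {
        double[][] b2 = combine(b, b1, -1.0);                   // follow-up input B2 = B - B1
        double[][] sourceOut = program.apply(a, b);             // source test case (A, B)
        double[][] followUpSum = combine(program.apply(a, b1),  // follow-up test case (A, B1)
                                         program.apply(a, b2),  // follow-up test case (A, B2)
                                         1.0);
        return approxEqual(sourceOut, followUpSum, 1e-9);
    }

    static double[][] randomMatrix(int rows, int cols, Random rng) {
        double[][] m = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                m[i][j] = rng.nextDouble() * 10.0 - 5.0;
        return m;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{2, 1}, {3, 4}};
        double[][] b1 = {{1, 6}, {3, 5}};                       // B1 from the worked example
        System.out.println(satisfiesMR(MatrixMT::multiply, a, b, b1));
        System.out.println(satisfiesMR(MatrixMT::multiply, a, b, randomMatrix(2, 2, new Random(7))));
        System.out.println(satisfiesMR(MatrixMT::buggyMultiply, a, b, b1));
    }
}
```

The correct version satisfies the MR for both the worked B1 and a random B1, whereas the buggy version violates it. Note that this MR only checks that the result is additive in B, so a fault that is itself linear in B (for example, multiplying by the transpose of B) would go unnoticed; pairing it with other MRs reduces that risk.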
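The two sine MRs from the list above can be checked mechanically. This Java sketch is illustrative only: `sinDeg` stands in for the program under test, while `offsetBug` and `gradianBug` are hypothetical faults chosen to show that MR1 and MR2 catch different kinds of errors; the 1e-9 tolerance and the 30° source angle are assumptions.

```java
import java.util.function.DoubleUnaryOperator;

class SineMT {

    static final double TOL = 1e-9;

    // Program under test: sine of an angle given in degrees.
    static double sinDeg(double x) {
        return Math.sin(Math.toRadians(x));
    }

    // Hypothetical fault A: constant offset (still 360-degree periodic, but no longer odd).
    static double offsetBug(double x) {
        return Math.sin(Math.toRadians(x)) + 0.1;
    }

    // Hypothetical fault B: treats the angle as gradians (still odd, but period 400 instead of 360).
    static double gradianBug(double x) {
        return Math.sin(x * Math.PI / 200.0);
    }

    // MR1: sin(x') = sin(x), where x' = x + 360 degrees.
    static boolean mr1Holds(DoubleUnaryOperator f, double x) {
        return Math.abs(f.applyAsDouble(x + 360.0) - f.applyAsDouble(x)) <= TOL;
    }

    // MR2: sin(x') = -sin(x), where x' = -x.
    static boolean mr2Holds(DoubleUnaryOperator f, double x) {
        return Math.abs(f.applyAsDouble(-x) + f.applyAsDouble(x)) <= TOL;
    }

    public static void main(String[] args) {
        double x = 30.0;  // source test case
        String[] names = {"sinDeg", "offsetBug", "gradianBug"};
        DoubleUnaryOperator[] versions = {SineMT::sinDeg, SineMT::offsetBug, SineMT::gradianBug};
        for (int v = 0; v < versions.length; v++) {
            System.out.println(names[v] + ": MR1 " + (mr1Holds(versions[v], x) ? "holds" : "violated")
                    + ", MR2 " + (mr2Holds(versions[v], x) ? "holds" : "violated"));
        }
    }
}
```

For x = 30°, the correct version satisfies both MRs, the offset fault violates only MR2, and the gradian fault violates only MR1. Which of these faults matters most depends on the application, which is why domain knowledge helps in prioritizing MRs.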

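The last item in the list above notes that MT remains applicable when a program involves randomness. One common tactic, assumed here rather than taken from this article, is to give the source and follow-up runs the same random seed so that both see identical samples; the MRs then hold up to round-off. The sketch below applies this to a plain Monte Carlo integrator (`integrate`, with a hypothetical faulty version `buggyIntegrate`); the integrand, interval, sample counts, and tolerance are illustrative assumptions. When seeds cannot be controlled, MRs usually have to be checked statistically instead, for example by comparing the distributions of outputs from repeated runs.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

class MonteCarloMT {

    // Shape of the program under test: estimates the integral of f over [a, b] from random samples.
    interface Integrator {
        double estimate(DoubleUnaryOperator f, double a, double b, int samples, long seed);
    }

    // Plain Monte Carlo integration: (b - a) times the mean of f at uniform random points.
    static double integrate(DoubleUnaryOperator f, double a, double b, int samples, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < samples; i++) sum += f.applyAsDouble(a + (b - a) * rng.nextDouble());
        return (b - a) * sum / samples;
    }

    // Hypothetical faulty version: divides by (samples - 1) instead of samples.
    static double buggyIntegrate(DoubleUnaryOperator f, double a, double b, int samples, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < samples; i++) sum += f.applyAsDouble(a + (b - a) * rng.nextDouble());
        return (b - a) * sum / (samples - 1);
    }

    static boolean close(double x, double y) {
        return Math.abs(x - y) <= 1e-9 * Math.max(1.0, Math.max(Math.abs(x), Math.abs(y)));
    }

    // MR-shift: with the same seed, the estimate for f + c must equal the source estimate plus c * (b - a).
    static boolean shiftMRHolds(Integrator program, DoubleUnaryOperator f, double c,
                                double a, double b, int samples, long seed) {
        double source = program.estimate(f, a, b, samples, seed);
        double followUp = program.estimate(x -> f.applyAsDouble(x) + c, a, b, samples, seed);
        return close(followUp, source + c * (b - a));
    }

    // MR-scale: with the same seed, the estimate for k * f must equal k times the source estimate.
    static boolean scaleMRHolds(Integrator program, DoubleUnaryOperator f, double k,
                                double a, double b, int samples, long seed) {
        double source = program.estimate(f, a, b, samples, seed);
        double followUp = program.estimate(x -> k * f.applyAsDouble(x), a, b, samples, seed);
        return close(followUp, k * source);
    }

    public static void main(String[] args) {
        DoubleUnaryOperator square = x -> x * x;
        System.out.println(integrate(square, 0.0, 2.0, 100_000, 1L));  // close to 8/3
        System.out.println(shiftMRHolds(MonteCarloMT::integrate, square, 3.0, 0.0, 2.0, 10_000, 1L));
        System.out.println(shiftMRHolds(MonteCarloMT::buggyIntegrate, square, 3.0, 0.0, 2.0, 10_000, 1L));
        System.out.println(scaleMRHolds(MonteCarloMT::buggyIntegrate, square, 2.0, 0.0, 2.0, 10_000, 1L));
    }
}
```

The shift MR kills the faulty version (its estimate grows by c(b − a)·n/(n − 1) instead of c(b − a)), whereas the scale MR does not, because the fault only rescales the result.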
SOME EXAMPLES

Testing Epidemiological Model Implementations Using MT
Pullum et al. used MT to verify and validate an epidemiological model implementation.9 Such implementations are used to model how diseases spread in populations. It is therefore important to verify and validate these models, since they are used to make critical decisions during an outbreak. Epidemiological model implementations face the oracle problem because these programs are written to find the answer in the first place, so developing an oracle for testing them is practically difficult. One approach to testing these models is to compare the model output with data obtained from real phenomena, but such data is inevitably limited. Other approaches include comparing the output with results obtained from mathematical models and comparing the results with other simulation models. These techniques are not sufficient for conducting systematic and comprehensive testing on these programs.

The authors used MT to test an ordinary differential equation (ODE)-based epidemiological model and an agent-based epidemiological model. They used data from the 1918 influenza outbreak to calibrate the models. The authors defined 11 MRs based on changing various model parameters and the expected effects of those changes on the model output, drawing on their domain knowledge of these models. Through MT, the authors identified an error in the output method of the agent-based epidemiological model.

Using MT to Conduct Automated Unit Testing on a Small Angle X-Ray Scattering (SAXS) Program
We used MT to conduct automated unit testing on SAXS, an open source program written to analyze small angle x-ray scattering data.10, 11 This program reconstructs macromolecular structures using scattering patterns obtained from experiments. It was initially tested by running it on a selected set of inputs, with domain experts determining the correctness of the produced outputs. However, when we showed the domain experts the outputs generated by versions of the program injected with faults, they were unable to identify that the outputs were produced by a faulty version of the program.

Here we report the results of conducting automated unit testing on the following functions, which perform several of the main calculations in the SAXS program:10

›› calculateDistance (f1): computes the distance between atoms;
›› findGyrationRadius (f2): computes the gyration radius of groups of atoms;
›› scatterSample (f3): main function responsible for scattering.

We used the machine learning based MR prediction approach proposed by Kanewala et al.12 to predict likely MRs for these functions. The test inputs were generated randomly. None of the predicted MRs were violated when applied to these three functions.

To evaluate the effectiveness of MT for unit testing, we created faulty versions of these functions, known as mutants, using the μJava (https://cs.gmu.edu/~offutt/mujava/) mutation engine, which creates a mutant by making a syntactic change in the source code. With MT, we say that a mutant is killed if an MR is violated when the corresponding source and follow-up test cases are executed on that mutant. The fault detection effectiveness of MT can therefore be measured by the number of mutants killed during the MT process: the higher the percentage of mutants killed, the more effective MT is at revealing bugs in a program. We used this process to evaluate the fault detection effectiveness of MT on the functions listed above.

Table 1 shows the percentage of mutants killed through MT for individual functions. Overall, 90% of the mutants (191 of 212) were killed using MT. Importantly, the entire unit testing process was fully automated, from MR identification through source test case generation and test execution, and it did not require domain experts to evaluate the correctness of the test outputs. Although no MR violations were detected in SAXS itself, MT helps to establish our confidence in the quality of the SAXS program.
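The 11 MRs that Pullum et al. used for the epidemiological models are not listed in this article. As a hypothetical illustration of the same style of MR, one that changes a model parameter and predicts the direction or size of the effect, the Java sketch below uses a simple SIR (susceptible, infected, recovered) model integrated with forward Euler. The model, its parameter values, and both MRs are our assumptions, not the models or relations from that study.

```java
class SirMT {

    // Hypothetical ODE-based model: SIR equations integrated with forward Euler.
    // Returns the total number ever infected, i.e., everyone who left the susceptible class.
    static double totalInfected(double beta, double gamma, double population,
                                double initialInfected, double days, double dt) {
        double susceptible = population - initialInfected;
        double infected = initialInfected;
        long steps = Math.round(days / dt);
        for (long k = 0; k < steps; k++) {
            double newInfections = beta * susceptible * infected / population * dt;
            double recoveries = gamma * infected * dt;
            susceptible -= newInfections;
            infected += newInfections - recoveries;
        }
        return population - susceptible;
    }

    // Assumed MR 1: raising the transmission rate beta, all else unchanged,
    // should not decrease the total number infected.
    static boolean transmissionMRHolds(double beta, double higherBeta, double gamma) {
        double source = totalInfected(beta, gamma, 1000.0, 1.0, 300.0, 0.1);
        double followUp = totalInfected(higherBeta, gamma, 1000.0, 1.0, 300.0, 0.1);
        return followUp >= source;
    }

    // Assumed MR 2: scaling the population and the initial number infected by the same
    // factor should scale the total number infected by that factor (up to round-off).
    static boolean scalingMRHolds(double beta, double gamma, double factor) {
        double source = totalInfected(beta, gamma, 1000.0, 1.0, 300.0, 0.1);
        double followUp = totalInfected(beta, gamma, 1000.0 * factor, factor, 300.0, 0.1);
        return Math.abs(followUp - factor * source) <= 1e-6 * factor * source;
    }

    public static void main(String[] args) {
        System.out.println("total infected (beta = 0.3): " + totalInfected(0.3, 0.1, 1000.0, 1.0, 300.0, 0.1));
        System.out.println("transmission MR holds: " + transmissionMRHolds(0.3, 0.5, 0.1));
        System.out.println("scaling MR holds: " + scalingMRHolds(0.3, 0.1, 10.0));
    }
}
```

Neither MR needs the "correct" epidemic curve, so both can be checked without an oracle.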

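The mutant-killing procedure used for SAXS can be sketched in a few lines. In this Java sketch, `distance` is a hypothetical stand-in for a function such as calculateDistance, the three mutants are written by hand in the spirit of single syntactic changes rather than generated by μJava, and the translation and scaling MRs are our assumptions rather than the MRs predicted in the study. A version counts as killed if either MR is violated on any of 20 random source test cases; the same `mutationScore` helper reproduces the overall figure in Table 1 (191 of 212 mutants, about 90.1%).

```java
import java.util.Random;

class MutantKillingDemo {

    // Functional shape of the unit under test.
    interface DistanceFn { double apply(double[] p, double[] q); }

    // Hypothetical unit under test: Euclidean distance between two atoms in 3-D.
    static double distance(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < 3; k++) {
            double d = p[k] - q[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Mutant 1: '-' replaced by '+'.
    static double mutantPlus(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < 3; k++) {
            double d = p[k] + q[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Mutant 2: call to Math.sqrt deleted.
    static double mutantNoSqrt(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < 3; k++) {
            double d = p[k] - q[k];
            sum += d * d;
        }
        return sum;
    }

    // Mutant 3: loop bound 3 changed to 2 (z coordinate ignored).
    static double mutantLoopBound(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < 2; k++) {
            double d = p[k] - q[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static boolean close(double x, double y) {
        return Math.abs(x - y) <= 1e-9 * Math.max(1.0, Math.max(Math.abs(x), Math.abs(y)));
    }

    static double[] randomPoint(Random rng) {
        double[] v = new double[3];
        for (int k = 0; k < 3; k++) v[k] = rng.nextDouble() * 20.0 - 10.0;
        return v;
    }

    // A version is killed if any MR is violated on any random source test case.
    static boolean killed(DistanceFn f, int cases, long seed) {
        Random rng = new Random(seed);
        double scale = 2.5;
        for (int c = 0; c < cases; c++) {
            double[] p = randomPoint(rng), q = randomPoint(rng), t = randomPoint(rng);
            double[] pShift = new double[3], qShift = new double[3];
            double[] pScale = new double[3], qScale = new double[3];
            for (int k = 0; k < 3; k++) {
                pShift[k] = p[k] + t[k];
                qShift[k] = q[k] + t[k];
                pScale[k] = scale * p[k];
                qScale[k] = scale * q[k];
            }
            double source = f.apply(p, q);
            // MR-translate: moving both atoms by the same vector keeps their distance.
            if (!close(f.apply(pShift, qShift), source)) return true;
            // MR-scale: scaling all coordinates by 2.5 scales the distance by 2.5.
            if (!close(f.apply(pScale, qScale), scale * source)) return true;
        }
        return false;
    }

    // Percentage of mutants killed.
    static double mutationScore(int killedCount, int total) {
        return 100.0 * killedCount / total;
    }

    public static void main(String[] args) {
        DistanceFn[] mutants = {
            MutantKillingDemo::mutantPlus, MutantKillingDemo::mutantNoSqrt, MutantKillingDemo::mutantLoopBound
        };
        int killedCount = 0;
        for (DistanceFn m : mutants) {
            if (killed(m, 20, 1L)) killedCount++;
        }
        System.out.println("original killed? " + killed(MutantKillingDemo::distance, 20, 1L));
        System.out.printf("mutation score: %.1f%%%n", mutationScore(killedCount, mutants.length));
        System.out.printf("Table 1 overall: %.1f%%%n", mutationScore(191, 212));
    }
}
```

Here the loop-bound mutant survives both MRs, because ignoring the z coordinate is invisible to translations and uniform scaling, so the score is 2 of 3 (66.7%). Surviving mutants like this one point to where additional MRs, such as permuting the coordinate axes, would help.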

TABLE 1. Mutants detected by predicted MRs. f1: calculateDistance, f2: findGyrationRadius, and f3: scatterSample.

                                            f1    f2    f3  Total
No. of faulty versions used                 19    54   139    212
No. of faulty versions detected by MT       19    45   127    191
% of detected faulty versions              100    83    91     90

Testing a Monte Carlo Simulation Program With MT
Ding and Hu13 used MT to test a Monte Carlo modeling program that simulates photon propagation in biological tissues for the purpose of accurately generating reflectance images. The biggest challenge in testing this program is the lack of test oracles. One solution is to compare the results of the Monte Carlo simulation program with experimental results. However, as with much scientific software, building the infrastructure needed to conduct the relevant physical experiments is time consuming and expensive. For example, in this specific case, a physical experiment would require a laser beam that produces a specific number of photons, an environment free of interference from other light sources, and good reactive imaging cameras. Thus, the authors used MT to test this program.

The authors identified five MRs for the program based on domain knowledge and experimental results. They generated tests that cover all the branches and functions in the program. Through the violation of one of these MRs, the authors discovered faults in the program and corrected them.

They further evaluated the effectiveness of MT using mutants. The authors created 150 mutants of the Monte Carlo simulation program and were able to detect 90% (135) of them using MT.

SUMMARY
Some characteristics of scientific software, such as not knowing the correct answers and inherent uncertainties in calculations, make it difficult to test. MT can be an effective technique for testing these programs. Instead of checking the correctness of individual test outputs, MT checks whether the changes in the test outputs agree with what is expected, given the changes made to the inputs. These relationships between changes in the inputs and the expected changes in the outputs are referred to as MRs. Scientists, who typically develop scientific software, are in a strong position to identify effective MRs because of their domain knowledge, and thus can effectively test their software using MT. MT has been successfully applied to test various scientific software, including epidemiological model implementations, SAXS programs, and Monte Carlo simulations. We are confident that MT is one of the most appropriate and cost-effective testing techniques for scientists and engineers.

REFERENCES
1. L. Hatton, “The T experiments: Errors in scientific software,” IEEE Comput. Sci. Eng., vol. 4, no. 2, pp. 27–38, Apr.–Jun. 1997.
2. A. J. Abackerli, P. H. Pereira, and N. Calônego, Jr., “A case study on testing CMM uncertainty simulation software (VCMM),” J. Brazilian Soc. Mech. Sci. Eng., vol. 32, pp. 8–14, Mar. 2010.
3. L. Hatton and A. Roberts, “How accurate is scientific software?,” IEEE Trans. Softw. Eng., vol. 20, no. 10, pp. 785–797, Oct. 1994.
4. G. Miller, “A scientist’s nightmare: Software problem leads to five retractions,” Science, vol. 314, no. 5807, pp. 1856–1857, 2006. [Online]. Available: http://www.sciencemag.org/content/314/5807/1856.short
5. U. Kanewala and J. M. Bieman, “Testing scientific software: A systematic literature review,” Inf. Softw. Technol., vol. 56, no. 10, pp. 1219–1232, 2014.
6. S. Segura, G. Fraser, A. B. Sánchez, and A. R. Cortés, “A survey on metamorphic testing,” IEEE Trans. Softw. Eng., vol. 42, no. 9, pp. 805–824, 2016. [Online]. Available: https://doi.org/10.1109/TSE.2016.2532875
7. T. Y. Chen et al., “Metamorphic testing: A review of challenges and opportunities,” ACM Comput. Surveys, 2017, to be published.
8. T. Y. Chen, F.-C. Kuo, and Z. Q. Zhou, “An effective testing method for end-user programmers,” in Proc. First Workshop End-User Softw. Eng. (WEUSE 2005), 2005, pp. 21–25. [Online]. Available: http://doi.acm.org/10.1145/1082983.1083236

9. L. L. Pullum and O. Ozmen, “Early results from metamorphic testing of epidemiological models,” in Proc. ASE/IEEE Int. Conf. BioMed. Comput., Dec. 2012, pp. 62–67.
10. U. Kanewala, A. Lundgren, and J. M. Bieman, “Automated metamorphic testing of scientific software,” in Software Engineering for Science, J. C. Carver, N. P. Chue Hong, and G. K. Thiruvathukal, Eds. Taylor & Francis, 2016, doi: https://doi.org/10.1201/9781315368924.
11. [Online]. Available: http://cgi.cs.arizona.edu/~mstrout/Projects/SAXS/software.php, 2011. Accessed on: Sep. 11, 2017.
12. U. Kanewala, J. M. Bieman, and A. Ben-Hur, “Predicting metamorphic relations for testing scientific software: A machine learning approach using graph kernels,” Softw. Testing, Verification Rel., vol. 26, no. 3, pp. 245–269, 2016. [Online]. Available: http://dx.doi.org/10.1002/stvr.1594
13. J. Ding and X. Hu, “Application of metamorphic testing monitored by test adequacy in a Monte Carlo simulation program,” Softw. Qual. J., vol. 25, no. 3, pp. 841–869, 2017. [Online]. Available: https://doi.org/10.1007/s11219-016-9337-3

UPULEE KANEWALA is an Assistant Professor with Montana State University, Bozeman, MT, USA. Her research interests include software testing, metamorphic testing, and quality assurance of scientific software. She received the Ph.D. degree in computer science from Colorado State University, Fort Collins, CO, USA, in 2015, the Master of Science degree in computer engineering from Purdue University, West Lafayette, IN, USA, in 2010, and the Bachelor of Science degree in computer engineering from University of Peradeniya, Peradeniya, Sri Lanka, in 2007. She has authored or coauthored multiple peer-reviewed articles and book chapters on metamorphic testing, including the first paper on automatic detection of metamorphic relations. Her address is Gianforte School of Computing, 357 Barnard Hall, Montana State University, Bozeman, MT 59717. Contact her at upulee.kanewala@montana.edu.

TSONG YUEH CHEN is a Professor with Swinburne University of Technology, Melbourne, VIC, Australia. His main research interest is software testing. He received the B.Sc. and M.Phil. degrees from The University of Hong Kong, Hong Kong, the M.Sc. degree from University of London, London, U.K., the DIC degree from the Imperial College, London, U.K., and the Ph.D. degree from The University of Melbourne, Melbourne, VIC, Australia. Prior to joining Swinburne, he taught at The University of Hong Kong and The University of Melbourne. He is the inventor of metamorphic testing and adaptive random testing. His contact address is: Department of Computer Science and Software Engineering, Swinburne University of Technology, VIC 3122, Australia. Contact him at tychen@swin.edu.au.
PURPOSE: The IEEE Computer Society is the world’s largest association of computing professionals and is the leading provider of technical information in the field.

MEMBERSHIP: Members receive the monthly magazine Computer, discounts, and opportunities to serve (all activities are led by volunteer members). Membership is open to all IEEE members, affiliate society members, and others interested in the computer field.

COMPUTER SOCIETY WEBSITE: www.computer.org

OMBUDSMAN: Direct unresolved complaints to ombudsman@computer.org.

CHAPTERS: Regular and student chapters worldwide provide the opportunity to interact with colleagues, hear technical experts, and serve the local professional community.

AVAILABLE INFORMATION: To check membership status, report an address change, or obtain more information on any of the following, email Customer Service at help@computer.org or call +1 714 821 8380 (international) or our toll-free number, +1 800 272 6657 (US):
  •   Membership applications
  •   Publications catalog
  •   Draft standards and order forms
  •   Technical committee list
  •   Technical committee application
  •   Chapter start-up procedures
  •   Student scholarship information
  •   Volunteer leaders/staff directory
  •   IEEE senior member grade application (requires 10 years practice and significant performance in five of those 10)

PUBLICATIONS AND ACTIVITIES
Computer: The flagship publication of the IEEE Computer Society, Computer publishes peer-reviewed technical content that covers all aspects of computer science, computer engineering, technology, and applications.
Periodicals: The society publishes 12 magazines and 18 journals. Refer to membership application or request information as noted above.
Conference Proceedings & Books: Conference Publishing Services publishes more than 275 titles every year.
Standards Working Groups: More than 150 groups produce IEEE standards used throughout the world.
Technical Committees: TCs provide professional interaction in more than 30 technical areas and directly influence computer engineering conferences and publications.
Conferences/Education: The society holds about 200 conferences each year and sponsors many educational activities, including computing science accreditation.
Certifications: The society offers three software developer credentials. For more information, visit www.computer.org/certification.

BOARD OF GOVERNORS MEETING
24–25 September 2020 in McLean, Virginia, USA

EXECUTIVE COMMITTEE
President: Leila De Floriani
President-Elect: Forrest Shull
Past President: Cecilia Metra
First VP: Riccardo Mariani; Second VP: Sy-Yen Kuo
Secretary: Dimitrios Serpanos; Treasurer: David Lomet
VP, Membership & Geographic Activities: Yervant Zorian
VP, Professional & Educational Activities: Sy-Yen Kuo
VP, Publications: Fabrizio Lombardi
VP, Standards Activities: Riccardo Mariani
VP, Technical & Conference Activities: William D. Gropp
2019–2020 IEEE Division VIII Director: Elizabeth L. Burd
2020–2021 IEEE Division V Director: Thomas M. Conte
2020 IEEE Division VIII Director-Elect: Christina M. Schober

BOARD OF GOVERNORS
Term Expiring 2020: Andy T. Chen, John D. Johnson, Sy-Yen Kuo, David Lomet, Dimitrios Serpanos, Hayato Yamana
Term Expiring 2021: M. Brian Blake, Fred Douglis, Carlos E. Jimenez-Gomez, Ramalatha Marimuthu, Erik Jan Marinissen, Kunio Uchiyama
Term Expiring 2022: Nils Aschenbruck, Ernesto Cuadros-Vargas, David S. Ebert, William Gropp, Grace Lewis, Stefano Zanero

EXECUTIVE STAFF
Executive Director: Melissa A. Russell
Director, Governance & Associate Executive Director: Anne Marie Kelly
Director, Finance & Accounting: Sunny Hwang
Director, Information Technology & Services: Sumit Kacker
Director, Marketing & Sales: Michelle Tubb
Director, Membership Development: Eric Berkowitz

COMPUTER SOCIETY OFFICES
Washington, D.C.: 2001 L St., Ste. 700, Washington, D.C. 20036-4928; Phone: +1 202 371 0101; Fax: +1 202 728 9614; Email: help@computer.org
Los Alamitos: 10662 Los Vaqueros Cir., Los Alamitos, CA 90720; Phone: +1 714 821 8380; Email: help@computer.org

MEMBERSHIP & PUBLICATION ORDERS
Phone: +1 800 678 4333; Fax: +1 714 821 4641; Email: help@computer.org

IEEE BOARD OF DIRECTORS
President: Toshio Fukuda
President-Elect: Susan K. “Kathy” Land
Past President: José M.F. Moura
Secretary: Kathleen A. Kramer
Treasurer: Joseph V. Lillie
Director & President, IEEE-USA: Jim Conrad
Director & President, Standards Association: Robert S. Fish
Director & VP, Educational Activities: Stephen Phillips
Director & VP, Membership & Geographic Activities: Kukjin Chun
Director & VP, Publication Services & Products: Tapan Sarkar
Director & VP, Technical Activities: Kazuhiro Kosuge

revised 1 May 2020
DEPARTMENT: SCIENTIFIC PROGRAMMING

SciPipe—Turning Scientific Workflows into Computer Programs
Samuel Lampa, Martin Dahlö, Jonathan Alvarsson, and Ola Spjuth, Uppsala University

This article originally appeared in vol. 21, no. 3, 2019.

DOI No. 10.1109/MCSE.2019.2907814
Date of current version 26 April 2019.

INTRODUCTION
Scientific workflows are becoming increasingly popular as a way to automate complex scientific computations consisting of multiple programs.

One of the main motivations behind this development is the increased robustness and reproducibility of computational analyses. Chaining together multiple programs using plain scripts, as is often the first step in automating a pipeline, can easily become fragile and error prone due to the manual management of file paths and program invocations. Also, plain scripts are not optimal if for some reason you have to cancel a run and then restart it from any partially finished steps: it can be hard to know which output files are properly finished and which were truncated by the cancelled run. Last but not least, plain scripts do not by default save an execution trace of what was run, from which the full procedure used to create a specific output file could be clearly presented. These are all aspects that scientific workflows are designed to help with.

Despite the many hundreds of scientific workflow tools published over the years, there can still be significant challenges when trying to use many of them.

One reason for this is that many workflow tools have been designed with a very narrow use case in mind, often building in assumptions unique to the specific problem domain they target, which can make them less applicable to scientific pipeline needs in general.

Even among the large number of general-purpose workflow tools, surprisingly many contain constraints and assumptions that limit their generality. Often, these limitations are not obvious until one tries to apply the tools to complex tasks.

At the Department of Pharmaceutical Biosciences, we have spent the last few years using workflow tools to automate machine learning pipelines for predictive toxicology, among other things. We have reviewed the dozen most popular workflow tools in our field of bioinformatics. We even tried out Luigi, created by the music company Spotify, which is popular in industry and far from a bioinformatics-focused tool.

A recurring theme has been how often tools contain limitations that make them hard to use for complex use cases. Machine learning workflows in particular tend to be highly complex, because they commonly include cross-validation and the parameter sweeps of hyperparameter optimization. Not only are these workflows complex, but they also show a characteristic not always common in other domains: the need for dynamic scheduling. That is, they need to be able to parametrize and start tasks based on information obtained during the workflow run. Somewhat surprisingly, this is a problem in a majority of workflow tools, because they commonly enforce a strict separation between the scheduling and execution phases of the workflow run. That is, after a workflow has progressed into its execution phase, it is commonly not possible to schedule and start new tasks with parameter values obtained in the current run, at least not without initiating a completely new workflow run.

After a lot of evaluation, Luigi¹ seemed at the time to be the most promising way forward for us. Later, we learned that Luigi's functional-programming-inspired API design was not quite a fit for our need for dynamic workflow rewiring, which is one of the reasons why we later chose to develop the SciPipe

16               July 2020                         Published by the IEEE Computer Society              2469-7087/20 © 2020 IEEE

library (more on that later), but there is an interesting story to tell about our Luigi phase too.

Workflows as Computer Programs
It turned out that because Luigi was implemented as a programming library, it was flexible enough that we could build an alternative API on top of it, which resulted in the SciLuigi helper library.² Specifically, SciLuigi enabled us to keep the dependency network definition separate from the task definitions (a core principle in flow-based programming, as we will explain shortly), which makes it much easier to rewire workflows without changing the internals of workflow components in complex ways.

This positive experience with a workflow tool based on a programming API helped push a realization that had grown over a number of years of discussion and experimentation: workflows are, in the end, just a glorified version of computer programs.

It turns out that although many use cases can be modeled as simple linear sequences of program invocations that depend on each other, not all cases are that simple. There are many examples where the logic needed to control the workflow structure is so complex that any attempt at modeling it with a declarative workflow language ends up reimplementing what is already available in existing programming languages.

This tendency can also be seen in popular workflow engines built on domain-specific languages (DSLs). Tools that become popular often either have a really flexible DSL from the start, e.g., because the DSL is implemented in an existing scripting language (as with Nextflow,³ which builds on the Groovy language), or their DSLs become increasingly complex step by step, in the end approaching computer programming languages in capability (as with Cuneiform,⁴ which implements a powerful functional language⁵).

SciPipe
We took this lesson into account when we set out to design the SciPipe workflow tool from scratch, after discovering several limitations of the Luigi/SciLuigi setup we were already using (primarily our need for dynamic scheduling and the lack of compile-time warnings about errors in workflow connectivity).

SciPipe⁷ is designed from the start as a programming library embedded in its implementation language (Google's Go, or “Golang”), rather than inventing new textual syntax or graphical tools. It thus leverages the full power and flexibility of the Go programming language for implementing workflow logic. So far, we have not encountered a workflow use case that we have not been able to model with this approach. Even complex machine learning workflows with nested branching have been solvable, as exemplified in a recent paper by the authors.⁸ Another nice side effect of the Go language in particular is that it compiles to self-contained executable files, which makes deployment of most Go programs very straightforward. SciPipe is open source software (MIT licensed). A simple “Hello World” style workflow example is shown in Figure 2. For more information, source code, and documentation, see the work of Lampa et al.⁶,⁹

Flow-Based Programming Focuses on Data Flow
Now, there are some differences between most workflow programs and most normal programs. The main difference is the strong focus of workflow programs on data flow. When defining how multiple programs depend on each other, we are in effect defining how the data will flow through these programs. There is a real risk that workflow tool developers miss this detail and model the workflow dependency graph as mere dependencies between programs, not between their inputs and outputs. This can quickly lead to problems, because a program typically does not depend on just one other program, but rather on specific outputs of possibly multiple upstream programs. That is, data needs to be a first-class citizen when defining workflows, or we risk missing important details that will otherwise be buried in less thought-out ad hoc code.

One paradigm that takes note of this fact is flow-based programming (FBP).¹⁰ Invented at IBM in the late 1960s and used on large mainframe computers at banks and other large institutions, the flow-based programming paradigm has seen some resurgence in popularity in recent years, possibly driven by the recent trends toward multicore CPUs, distributed computing, and message-oriented architectures.

www.computer.org/computingedge                                                                                           17

     FIGURE 1. Flow-based programs can be likened to a factory with processing stations connected with conveyor belts, upon which
     data items “flow” through the network of stations and conveyor belts.

Flow-based programming prescribes a number of design principles. The most important one in the context of dependency definition, though, is that it models dependencies between processes at an appropriate level of detail: in terms of data inputs and outputs. The data itself is modeled through so-called “information packets,” and inputs and outputs through “ports,” a kind of pluggable component between which yet another concept, “channels,” can be connected. Channels have bounded buffers and act as a kind of conveyor belt between processes, letting each process work asynchronously on information packets from its in-ports and send them on through its out-ports (mostly) independently of the processing rate of other processes. Figure 1 tries to depict this in an artistic way.

The core idea of flow-based programming is how it draws all of these things together into a declarative data flow definition, separate from the process implementations. This allows easy rewiring of the data flow without changing a single line of process implementations. It thus allows us to create libraries of reusable components that can be plugged in at any place in the program network, as long as their in- and out-ports are compatible with the in- and out-ports they connect to.

Note that while FBP is often associated with visual programming, that is far from a requirement. In our experience, skipping the visual part and focusing on a simple programming API has more than fulfilled our needs, while letting us avoid depending on the complexity of a visual programming framework. The fact that the Go language provides the most important pieces for this to work (independently running goroutines and channels with bounded buffers) certainly helps.

Reproducibility in Workflow Programs
We have presented some rationale for writing workflows as computer programs, but what about the other aspects important for workflows, such as
