Manual for installation and usage of R-php - Angelo M. Mineo & Alfredo Pontillo Versione n. 0.99a

Page created by Jessica White
CONTINUE READING
Manual for installation and usage of R-php - Angelo M. Mineo & Alfredo Pontillo Versione n. 0.99a
Manual for installation and usage of R-php
                 Versione n. 0.99a

       Angelo M. Mineo & Alfredo Pontillo
Manual for installation and usage of R-php - Angelo M. Mineo & Alfredo Pontillo Versione n. 0.99a
1

Presentation
R-php is a project developed inside the Department of Statistical and Mathemat-
ical Sciences “Silvio Vianelli” of the University of Palermo (Italy) and has as goal
the realization of a web-oriented statistical software, i.e. a software which the final
user reaches through Internet and requires for the running only the installation of a
browser, that is a program for the hypertext visualization (examples of browser are:
Explorer, Mozilla, Firefox, Opera, and so on).
   R-php is an open-source project with the code released by the authors and can
be freely installed (see paragraph 1.1)
   The idea to project a statistical software that can be used through Internet comes
out from the following considerations: it is an ascertained fact that the growing
spread of Internet and the request of new services from its users has changed and
still is radically changing the way to access the daily use structures (work, study, and
so on); by now the most of information and services goes through the Web and the
software philosophy goes to the same direction; on the database management software
an example is the package PhpMyAdmin that is a via web graphical interface of
MySQL, the database management system more used for web-oriented applications;
on the business field another example of use of the Web is the develop of home-banking
services.
   In general, the trend on producing software that are installed on a single computer
is going down in favour of software that can be used through a connession to Internet
and a browser.
   A main feature of R-php is that all the statistical analyses exploit as “engine” the
open-source statistical programming environment R, that is more and more used not
only from the academic world, but also from consulting firms (Marc Scwatz, 2004,
The Decision to Use R: a cosulting business perspective, R News, Vol. 4(1), pp.
2-5). Then R-php is classifiable among the web projects dedicated to R. Beyond
this project, there are others such as: R-web, R-PHP, and so on (see the web page
http://http://franklin.imgen.bcm.tmc.edu/R.web.servers). Shortly, the main
difference between R-php and the previously cited projects consists in the presence
Manual for installation and usage of R-php - Angelo M. Mineo & Alfredo Pontillo Versione n. 0.99a
2

of an interactive module (R-php point-and-click ) inside R-php, that allows to make
some of the main statistical analyses without the user has to know the statistical
environment R.
   The potential users of R-php are several: let’s think about, for example, students
either inside didactic facilities, such as informatic laboratories, or at home through a
simple Internet connection, or a possible user that does not know and wants not to
know a programming environment as R, but wants to make some simple statistical
analyses without buying the license of a commercial statistical software.
Manual for installation and usage of R-php - Angelo M. Mineo & Alfredo Pontillo Versione n. 0.99a
Chapter 1

Main features of R-php

1.1      Guide to the installation of a R-php server
In this section we shall see how to install R-php on an Apache server, refering
to the next paragraphs for a more detailed description of the needed software for
a correct running of R-php. In a simple manner, with no loss of generality, to
point out the necessary steps for the installation, we shall refer to a LAMP system
(Linux, Apache, MySQL, PHP), i.e. a computer with a Linux operating system,
with Apache web server, with MySQL and PHP. Downloaded the R-php-xxx.tar.gz
source from the web site http://dssm.unipa.it/R-php, or from http://r-php.
homelinux.net, we have to follow these steps:

  1. Obtaining the super user (root) privilege with the command su.

  2. Copying the source on the Apache root: for example, by opening a console
      and by going to the directory where the file R-php-xxx.tar.gz is, it is sufficient
      to execute the following command:

        • for a Redhat based distribution

           ]\# cp R-php-xxx.tar.tgz /var/www/html/

        • for a Debian distribution

                                           3
Manual for installation and usage of R-php - Angelo M. Mineo & Alfredo Pontillo Versione n. 0.99a
CHAPTER 1. MAIN FEATURES OF R-PHP                                                4

          ]\# cp R-php-xxx.tar.tgz /var/www/

  3. Extracting the R-php-xxx.tar.gz file, for example with the following command:

     ]\# tar xvzf R-php-xxx.tar.gz

     Now, the user can find the R-php-xxx directory starting from the root of the
     Apache server: this directory includes all the software.

  4. Entering the R-php-xxx directory with the following command:

     ]\# cd R-php-xxx

     This directory has the following components:

       • COPYNG (file containing the user license);
       • Doc (directory containing the documentation);
       • include (directory containing the configuration files);
       • index.html (main page);
       • R (directory containing R-php base);
       • README (file containing some beginning information about the soft-
          ware);
       • R-gui (directory containing R-php Point and Click );

  5. Entering the include directory with the following command:

     ]\# cd include

  6. Editing (by using, for example, vi) the conn.php file, by modifying the fields
     with the necessary data for the access to MySQL (user id and password).

     ]\# vi conn.php

   Now, it is sufficient loading with a browser the site http://localhost/R-php-xxx/
or the site http://your-domain/R-php-xxx, to run R-php.
Manual for installation and usage of R-php - Angelo M. Mineo & Alfredo Pontillo Versione n. 0.99a
CHAPTER 1. MAIN FEATURES OF R-PHP                                                       5

1.2      Description of R-php
In this paragraph we shall illustrate the needed software for the implementation of a
R-php server and the used code for the running of the two considered modules. We
want to point out that all the used software is open-source.

1.2.1     Needed software
For a correct running of R-php it is necessary installing the following software on
the server:

   • Apache:
      The Apache Httpd server:

        – is a powerful, flexible, HTTP/1.1 compliant web server;
        – implements the latest protocols;
        – is highly configurable and extensible with third-party modules;
        – can be customized by writing “modules” using the Apache module API;
        – provides full source code and comes with an unristrictive license;
        – runs on Windows NT/9X, Netware 5.X and above, OS/2 and most
              versions of UNIX, as well as several other operating systems;
        – is actively being developed;
        – implements many frequently requested features, including:
               ∗ DBM databases for authentication, that allows to set up easily password-
                  protected pages with enormous numbers of authorized users, without
                  bogging down the server;
               ∗ customized responses to errors; indeed, it allows to set up files or even
                  CGI scripts, which are returned by the server in response to errors and
                  problems;
Manual for installation and usage of R-php - Angelo M. Mineo & Alfredo Pontillo Versione n. 0.99a
CHAPTER 1. MAIN FEATURES OF R-PHP                                                 6

         ∗ Multiple DirectoryIndex directives, that instructs the server which files
            have to consider when a directory URL is requested, whichever it finds
            in the directory;
         ∗ virtual hosts, that is a much requested feature, sometimes known as
            “multi-homed-servers”; this feature allows the server to distinguish
            between requests made to different IP addresses or names “mapped”
            to the same machine;
         ∗ Configurable reliable piped logs.

   Nowadays, Apache is considered the most efficient, fast and functional Web
   server; according to the data given by the Netcraft Web Server Survay (http:
   //news.netcraft.com/archives/web_server_survey.html) Apache is the
   more used Web server all around the World, with a pecentage of use equal to
   68.43 % (this result is referred to Jenuary 2005). For a comparison among the
   more common used Web server see figure 1.1.
Manual for installation and usage of R-php - Angelo M. Mineo & Alfredo Pontillo Versione n. 0.99a
CHAPTER 1. MAIN FEATURES OF R-PHP                                         7

       Figure 1.1: Statistics on the diffusion of the main Web servers.
Manual for installation and usage of R-php - Angelo M. Mineo & Alfredo Pontillo Versione n. 0.99a
CHAPTER 1. MAIN FEATURES OF R-PHP                                                     8

  • R
   R is a language and environment for statistical computing and graphics; it is
   a GNU project which is similar to the S language and environment which was
   developed at BELL Laboratories, formerly AT&T and now Lucent Technologies
   by John Chambers and colleagues. R can be considered as a different imple-
   mentation of S. There are some important differences, but much code written
   for S runs unaltered under R. R provides a wide variety of statistical (linear
   and nonlinear modelling, classical statistical tests, time-series analysis, classifi-
   cation, clustering, ...) and graphical techniques, and is highly extensible. The
   S language is often the vehicle of choice for research in statistical methodology
   and R provides an Open Source route to participation in that activity. One of
   R’s strengths is the ease with which well-designed publication-quality plots can
   be produced, including mathematical symbols and formulae where needed. R
   compiles and runs on a wide variety of UNIX platforms and similar systems
   (including FreeBSD and Linux), Windows and MacOS. R is an integrated
   suite of software facilities for data manipulation, calculation and graphical dis-
   play and includes:

        – an effective data handling and storage facility;
        – a suite of operators for calculations on arrays, in particular matrices;
        – a large, coherent, integrated collection of intermediate tools for data anal-
          ysis;
        – graphical facilities for data analysis and display either on-screen or on
          hardcopy;
        – a well-developed, simple and effective programming language which in-
          cludes conditionals, loops, user-defined recursive functions and input/output
          facilities.

   The term “environment” is intended to characterize it as a fully planned and
   coherent system, rather than an incremental accretion of very specific and in-
   flexible tools, as is frequently the case with other data analysis software.
Manual for installation and usage of R-php - Angelo M. Mineo & Alfredo Pontillo Versione n. 0.99a
CHAPTER 1. MAIN FEATURES OF R-PHP                                                   9

  • PHP
   PHP is an HTML-embedded scripting language, i.e. a language that allows to
   insert code executable inside HTML pages. Much of its syntax is borrowed
   from C, Java and Perl with a couple of unique PHP-specific features thrown
   in. The goal of PHP is to allow web developers to write dynamically generated
   pages quickly. PHP stands for PHP Hypertext Preprocessor and then its name
   is a so called recursive acronym.

  • MySQL
   MySQL is a relational database management system. The main feature of a
   relational database is the capacity to store data in separate structures, called
   tables, rather than putting all the data in one big area. The SQL part of
   MySQL is acronym of Structure Query Language, which is the most common
   language used to access database. The MySQL database server is the most
   popular open-source database in the World; its diffusion is due to the great
   speed and flexibility; it is exactly the speed to access data that makes MySQL
   suitable for the implementation of Web applications.

  • ImageMagick
   ImageMagick is a collection of tools and libraries to manipulate an image.
   ImageMagick can read, write, and manipulate image in any of the more pop-
   ular image formats, including .gif, .jpeg, .png e .pdf; moreover, you can create
   .gif’s dynamically making it suitable for Web applications. Here are just a few
   examples of what ImageMagick can do:

     – Convert an image from one format to another;
     – Resize, rotate, sharpen, color reduce, or add special effects to an image;
     – Create thumbnails;
     – Turn a group of images into a .gif animation sequence;
     – Draw shapes or text on an image;
     – Decorate an image with a border or frame;
CHAPTER 1. MAIN FEATURES OF R-PHP                                                   10

           – Describe the format and characteristics of an image.

        The main feature of ImageMagick, that makes it a necessary tool to manip-
        ulate images also for Web applications, it is the possibility to make the above
        mentioned (and not only) operations by shell, for example BASH 1 ).
        As it is known, R has several commands to create images of different formats
        (.png, .jpeg, .ps, and so on), but in the base module of R-php the use of
        ImageMagick has been more easily manageable. Moreover, under UNIX the
        R’s png device uses the X11 driver, which is a problem in “batch mode” or
        for remote operation. The use of ImageMagick in the base module has been
        extended also to the point-and-click module of R-php. Then, R, invocated in
        “batch mode”, gives graphs in .ps format; following, it is reported the PHP
        code to manipulate and convert the .ps file in a “web-compatible” format by
        using ImageMagick:

        exec("convert ./pages/tmp/$temp/Rplots.ps ./pages/tmp/$temp/R.png");
        exec("convert -antialias -rotate 90 $fn ./pages/tmp/$temp/$nn.png");

      • htmldoc
        htmldoc is a program that generates indexed .html, Adobe PostScript, and
        .pdf files from .html ”source” files. htmldoc includes a simple GUI (Grafical
        User Interface) to manage your .html files and automatically (re)generate files
        for viewing and printing. htmldoc can also be used on your web server to
        generate files “on-the-fly”.

For the above described software it is advisable to have the last versions and anyway
it is necessary having installed: for PHP a version from the 4.3.0., for R a version
from 2.0.0.
  1
      Bourn Again SHell
CHAPTER 1. MAIN FEATURES OF R-PHP                                                  11

1.2.2       Description of the first module (R-php base)
When the user enters the home page of this module, a temporary directory is created;
the name of this directory is generated by this portion of code:

$unico = substr(uniqid("5555"),-5);
$time = time();
$max = mktime(6, 0, 0, 1, 1, 1970);
$temp = $unico."_".$time;

According to the implemented code, first it is generated a pseudo-random string of
alpha-numeric characters, to which is associated the UNIX timestamp. Into this
temporary directory we have all the user generated files that are created during the
current work session; this guarantees the multiuse, i.e. the concurrent use of more
users, without creating confusion between the files created in different concurrent
sessions. This feature of R-php does not seem it is implemented in other similar
systems. The maximum number of concurrent users does not depend on R-php, but
on the set up of the hosting Apache server. Following, there is the PHP code that
compares the current UNIX timestamp with that present in the temporary directory
name and deletes those directories present in the server for more than six hours: this
allows to avoid a great number of temporary directories into the server:

foreach(glob("./pages/tmp/*") as $dir){
    $time_scad_temp = explode("_", $dir);
    $control = $time - $time_scad_temp[1];
    if($control>=$max){
        delfile("$dir/*");
        rmdir("$dir");
    }
}
mkdir("pages/tmp/$temp", 777);
chmod("pages/tmp/$temp", 0777);
CHAPTER 1. MAIN FEATURES OF R-PHP                                                    12

    The first module of R-php allows in a textarea the immission of portions of R
code that is transcripted in a text file. Following, there is the sequence of PHP
instructions that allow this operation:

$nomefile = "codice.txt";
$fp = fopen("./pages/tmp/$temp/$nomefile", "w")
       or die("impossibile aprire file");
fputs ( $fp, "$codice") or die("impossibile scrivere");
fclose($fp);

    As it is known, R has commands that permit the interactions with the operatimg
system of the used computer; this could be dangerous in an environment like this one,
because an outer user could input (voluntarily or not) harmful commands. To avoid
this drawback, we have decided to implement a control structure that does not allow
the use of a set of commands that we think are dangerous for the server safety. These
commands are reported in a MySQL database, containing also a short description
of what the banned command would do. It is clear that the user, interested to install
R-php on an own server, can modify this list of commands in any time. For more
safety, this control structure works first “client-side” in javascript and next (only if
the first control is evaded) also “server-side” in PHP.
    For the data input, besides the possibility to insert the data at hand through
the R commands, there is the chance to import the data from a text file. From a
technical point of view, this operation is an “upload” on the server, by using the
following sequence of PHP commands:

if(isset($HTTP_POST_VARS[upFile]) and $HTTP_POST_VARS[upFile] = "yes"){
    include("./include/temp.php");
    $nomeFile = pulisci($HTTP_POST_FILES[’file’][’name’]);
    move_uploaded_file($HTTP_POST_FILES[’file’][’tmp_name’],
                          "./pages/tmp/$_SESSION[temp]/$nomeFile");
}
CHAPTER 1. MAIN FEATURES OF R-PHP                                                  13

   The R commands, that are in the previously created file, are processed from R
that is invocated by PHP in “batch mode”. The command that makes this operation
is the following:

exec(R --no-save -q < codice.txt > output 2>&1);

   R, invocated in “batch mode”, gives the output in two formats: the former is
textual with the requested analysis, the latter in .ps format containing the possible
graphs. At this point, the two files are treated in two different ways:

   • the text file is simply formatted (by using the style-sheet) to make it more
      readable;

   • the .ps format file is split to obtain a file for each image; then each image is
      converted in a more suitable format for the Web visualization (.png format);
      these operations are done with ImageMagick.

   In the final phase the output, with all the graphs, is visualized in a new window.
In this window there is the possibility for the user to save the output in .pdf format
by means of the software htmldoc, or the user can decide to save only the desidered
graphs produced in the work session.

1.2.3     Description of the second module (R-php point-and-
          click)
The R-php module, that we describe in this paragraph, is not a simple R web
interface, but it has all the characteristics of a GUI; this part makes R-php a new
project on respect to the other similar project.
   The organization of the concurrent sessions of this module is the same of the
organization of the R-php base module, described in the previous paragraph.
   About the data input, that is the first step on the use of R-php point-and-click,
it is done by loading an ASCII file from the user computer. Then, the file content is
visualized in a new page as a “spreadsheet”; this data set is managed by MySQL:
CHAPTER 1. MAIN FEATURES OF R-PHP                                                     14

this allows to make some interactive operations on the data: for example, it is possible
to modify the variable names or a value in each cell of the “spreadsheet”; following,
there is the PHP and MySQL code to modify a cell value:

$sql = "UPDATE $temp SET var$v = ’$val’ where id=’$r’";
mysql_query ($sql, $conn);

   Loaded the data set, it is possible to choose what kind of analysis we have to
perform among those proposed. Each analysis can be performed by means of a GUI;
in this GUI they are chosen the analysis options that the user want to set up; these
options are coded in R and this code is processed by the R environment. With a
greater detail, this phase follows this steps:

  1. the code is put in an ASCII file that is the R input; this operation is the same
      of that one described for the base module;

  2. the database containing the data is exported in an ASCII file; following, there
      is the used code:

      exec("mysqldump -T ./pages/tmp/$temp --fields-terminated-by=’ ’
            R $temp -u’$user’ -p’$pass’");

  3. R, invocated in “batch mode”, processes the code that is in the previously
      created file;

  4. R gives the output in two files: the first one (in text format) containg the textual
      part of the analysis, the secon one (in .ps format) containing the graphs;

  5. the ASCII file is formatted according to the Web standards;

  6. the .ps file is first split in more files and then handled and converted by Im-
      ageMagick in .png format; this operation follows that one described for the
      base module;
CHAPTER 1. MAIN FEATURES OF R-PHP                                                  15

   At this point a Web page is generated; this page contains the textual and graphical
results of the performed analysis. Moreover, the output page allows other interest-
ing operations, such as the output saving, included the graphs, in .pdf (Portable
Document Format) and the saving of each single graph by means of a simple click.
   We are not going to describe each GUI of R-php point-and-click because from a
design point of view the implemented code is formally the same for each GUI.
Chapter 2

User guide

2.1      Use of R-php base
This interface allows the simple editing of R code in a form. This module gives to
the web user the most part of the R potentiality: in fact, it is necessary to know the
R language to use it. Moreover, it is possible to import a data file (an ASCII file)
from the own computer.
   This interface allows the simple editing of R code in a text-area. The parts that
a user can handle are:

   • The text-area to submit R code
      In this part of the page the user can edit the R code needed to perform a
      statistical analysis. It is not allowed the use of R commands interacting with
      the operating system; when the user edits one of the banned commands, no code
      is executed and an error window appears warning the user that in the input code
      there is a forbidden command; the same window allows the user to visualize the
      banned command list with a short description of each command. Code can be
      directly edited, but it could be paste from other programs, of course;

   • The send button
      When the user has completed in the text-area the R code input that wants to
      run, it is sufficient to click on this button and the input code is processed by R.

                                           16
CHAPTER 2. USER GUIDE                                                                  17

   • The reset button
      If the user decides to input new code in the text-area, the reset button allows
      to clean the text-area.

   • The text form
      If the user wants to use an ASCII dataset to perform a statistical analysis, he
      can edit the file path in this form, or more easily he can click on the browse
      button and select from the file-loading window the file; once the user has loaded
      the data set, in the text-area we have the R code needed to load the chosen
      data set.

   Now, the user has only to click on the send button to obtain the output of the
input code in a new window; in this window there are all the possible generated
graphs, too.
   In the output window there is the possibility to save the analysis results in .pdf
format by clicking on the suitable link (see 2.3).

2.2      Use of R-php point-and-click
In this kind of interface, data can not be edited directly, but it is necessary to prepare
a text file with an editor; this file is imported from the user’s computer, by clicking
on the browse button. To use this module it is not necessary the user knows the R
environment, but all the allowed analises can be performed by means of the mouse.
   To make an analysis with this module, it is needed to load a data file from the
computer; this procedure is exactly the same of that one described for the R-php
base module. The import of the text file does not present many problems; different
delimiters are supported (comma, tabulation, space and so on) and files with no
special alpha-numeric characters are correctly imported (if something goes wrong, we
recommend to do many attempts by riediting the file and by trying to delete special
characters). Moreover, it is necessary to point out the presence or not of the variable
names in the first row of the data set, by clicking on a box of the Header form, by
choosing yes or no, respectively.
CHAPTER 2. USER GUIDE                         18

                    Figure 2.1: R-php base.
CHAPTER 2. USER GUIDE                                           19

           Figure 2.2: R-php base: code to import a data set.
CHAPTER 2. USER GUIDE                              20

                 Figure 2.3: R-php base: output.
CHAPTER 2. USER GUIDE                                                                                   21

      In the following step, data are placed in a “spreadsheet”.
      An interesting feature is the possibility to modify the content of every single cell
of the “spreadsheet”, included the variable names, from the page visualizing the data,
through a pop-up window that allows such modifications.
      In the menu on the top, there are buttons to perform several kind of statistical
analises1 . If in the data set we make some modifications, it is possible to save on the
own computer the modified data set in text format.
      Chosen the data set, it is possible to perform the statistical analyses at disposal;
if the user would like to perform an analysis with a new data set, it is not possible
to use the browser options, but it is necessary to use the suitable link to input a new
data set. If a user uses unwillingly the back browser button, he does not obtain the
desidered effect, but has the same previous imported data set.
      Then, after we have imported a data set, it is possible to perform the following
analyses.

2.2.1        Descriptive analysis
By choosing the descriptive analysis GUI, besides the previously loaded data set that
even now can be modified, it is loaded a window where it is possible to select:

      • the data set variables on which we want to compute some statistics;

      • the list of the descriptive statistics at disposal.

      In this software version it is possible to compute the following descriptive statis-
tics: minimum, maximum, arithmetic mean, median, quartiles, standard deviation
and variance. By clicking the send button a new window is obtained, containing
the descriptive statistics computed for every selected variable. If the user does not
select any variable, or any descriptive statistics, by clicking the send button an error
message is shown; this message requests the user to select at least one variable and
one descriptive statistics.
  1
      So far, the kind of available analyses are few, but we think to enrich this software feature shortly.
CHAPTER 2. USER GUIDE                                                               22

    Also in this case, such as in the other GUI, it is possible to save the output in a
.pdf file.

2.2.2        Linear regression
By choosing the linear regression GUI, besides the previously loaded data set that
even now can be modified, there is a text area where it is possible to input the model
formula, by using the R syntax. Actually, it is not necessary the user inputs the
model formula directly, but can choose to click on the model button that opens a
new window where it is possible to choose the response variable and the explanatory
variables among those of the data set. In this window we have implemented some
checks, so:

    • it is not possible to choose no response variable or more than one;

    • it is necessary to choose at least one explanatory variable;

    • the same variable can not be chosen as response and explanatory variable at
      the same time.

    After the response and the explanatory variables have been chosen, by clicking on
the insert button, this window is closed and in the text area of the GUI first window
the chosen model formula is written using a R syntax.
    When the user edits one of the banned commands, no code is executed and an
error window appears warning the user that in the input code there is a forbidden
command; the same window allows the user to visualize the banned command list
with a short description of each command.
    If a user edits a dangerous command in the text area, also in this case commands
are not executed and an error message appears with the possibility to visualize the
banned command list. Now, by clicking on the send button, the corresponding re-
gression analysis is performed. The analysis results are visualized in a new output
window containing:

    • the main descriptive statistics computed on the variables involved on the model;
CHAPTER 2. USER GUIDE                                                                   23

   • the variance and covariance matrix;

   • the correlation matrix;

   • the regression analysis results including:

        – the model formula;
        – some descriptive statistics on residuals;
        – the estimated coefficients withr the standard errors and the t tests;
        – the R2 coefficient of determination;
        – the F statistic to test the null hypothesis of all the coefficients zero, against
           the alternative hypothesis that at least one is different from zero.
        – analysis of variance table.

   • a scatterplot matrix of the involved variables;

   • the graphs of the residual analysis to check the basic hypotheses on the model
     adequacy, on the normality of the random part of the model, the homoskedas-
     ticity, and the plot of Cook’s distances to determine possible outliers.

In this window there are buttons to save:

   • all the output in .pdf format;

   • every single plot in .png format.

2.2.3    Analysis of variance
The graphical window of this GUI is very similar to the linear regression one; also
in this GUI, besides the data set previously loaded, there is a text area where it is
possible to input the model formula, by using the R syntax. About the data input,
the user has to input balanced data set if he wants easily interpretable results, as it
is known. Even in this GUI, if a user edits a dangerous command in the text area,
commands are not executed and an error message appears with the possibility to
CHAPTER 2. USER GUIDE                                                               24

visualize the banned command list. Actually, it is not necessary the user inputs the
model formula directly, but can choose to click on the model button that opens a
new window where it is possible to choose the response variable and the explanatory
variables among those of the loaded data set. If the explonatory variables are numeric,
the implemented code forces these variables to factors before the analysis of variance
is performed. In this window we have implemented some checks, so:

   • it is not possible to choose no response variable or more than one;

   • it is necessary to choose at least one explanatory variable;

   • the same variable can not be chosen as response and explanatory variable at
     the same time.

   After the response and the explanatory variables have been chosen, this window
is closed and the main window appears slightly modified. In particular, under the
text area where the model formula is inserted, the levels of the chosen explanatory
variables are shown. Close to the model button, a new button is shown, giving the
possibility to the user to insert possible interactions in the model.
   Now, the user can perform the analysis of variance with the chosen model by
clicking on the send button. Then, the output window is shown; this window contains
the following information:

   • the variables involved in the model;

   • the levels of each factor;

   • the main descriptive statistics computed on the variables involved on the model;

   • the analysis of variance table;

   • the graphs of the analysis of residuals.

   When a multi-way analysis of variance is performed and there is only one obser-
vation per cell, the error has a sum of squares equal to zero with a consequent loss
CHAPTER 2. USER GUIDE                                                                  25

of validity of the F test. In these cases, R-php, after the analysis of variance table,
stops the running code and gives an error message explaining the problem to the user.
At this point, the user is adviced to go back to the GUI main window, by reinputing
the model formula with no interaction of the highest order. Moreover, in the output
window there are buttons to save:

   • all the output in .pdf format;

   • every single plot in .png format.

2.3      Examples of use of R-php point-and-click
In this section, we are going to see some examples of use for every single GUI of
R-php point-and-click developed so far. What is next it is only an indication to the
user on how it is possible to use R-php point-and-click ; if problems were to raise in
the analysis of other data sets, it is possible to report these problems to the authors.

2.3.1     Descriptive statistics
In this section the data set of the “formaggio.txt” file is used. This file contains data
related to the concentration of several chemicals in 30 samples of Cheddar cheese from
the LaTrobe Valley of Victoria, Australia and a subjective measure of taste for each
sample (Moore e McCabe, 2002, Introduction to the practice of Statistics, Freeman
and Co.). Indeed, it is known that as cheddar cheese matures, a variety of chemical
processes take place and this fact assesses the taste of the final product. In particular,
the variables taken into account are:

   • Taste: subjective taste test score, obtained by combining the scores of several
      tasters.

   • Acetic: natural logarithm of concentration of acetic acid.

   • H2S: natural logarithm of concentration of hydrogen sulfide.
CHAPTER 2. USER GUIDE                                                                  26

   • Lactic: concentration of lactic acid.

Loaded the data set by selecting the “formaggio.txt” file from a local directory and
by choosing the Yes option of Header, the obtained result is that shown in figure 2.4.
   By selecting the descriptive statistics GUI it is visualized the screen reported in
figure 2.5.
   Let’s suppose we want to compute for all the 4 variables of the data set the
following descriptive statistics: minimum, maximum, mean, median and variance
(see fig. 2.6).
   At this point the requested computations are performed when the user clicks the
send button. The output window has the form reported in figure 2.7.
   As we have said before, in the output window there is the possibility to save the
analysis results in .pdf; the content of the .pdf file for this analysis is reported in
appendix.

2.3.2      Linear Regression
To show how it can be used the linear regression GUI of R-php point-and-click, let’s
consider the same file, “formaggio.txt”, used in the previous section. Loaded the
data set and chosen the linear regression GUI (see figure 2.8) we want to estimate the
regression model with Taste as response and the remaining variables as explanatory.
   By clicking on the model button, it is shown a window where it is possible to
choose the response and explanatory variables; in this case, we have to click on
the cross between Taste and Response, while we select the remaining variables as
explanatory (see figure 2.9).
   By clicking on the insert button, the beginning window appears and in the text
area it is shown the chosen model formula. Now, it is sufficient to click on the send
button to have the results about the chosen model (see figure 2.10).
   As we have said before, in the output window there is the possibility to save the
analysis results in .pdf; also in this case the content of the .pdf file for this analysis
is reported in appendix.
CHAPTER 2. USER GUIDE                                 27

                Figure 2.4: Import of the data set.
CHAPTER 2. USER GUIDE                                                    28

       Figure 2.5: Beginning screen of the descriptive statistics GUI.
CHAPTER 2. USER GUIDE                                                            29

    Figure 2.6: Selection of the statistics of the descriptive statistics GUI.
CHAPTER 2. USER GUIDE                                             30

          Figure 2.7: Output of the descriptive statistics GUI.
CHAPTER 2. USER GUIDE                                                31

        Figure 2.8: Beginning screen of the linear regression GUI.
CHAPTER 2. USER GUIDE                                                              32

                    Figure 2.9: Window to select the variables.

2.3.3    Analysis of variance
About the analysis of variance we are going to see a first example of one-way analysis
of variance. Let’s consider the data set of the “capelli.txt” file. This file contains
data related to a study conducted at the University of Melbourne to indicate if there
may be a difference between the pain thresholds of blonds and brunettes (McClave
e Dietrich, 1991, Statistics, Dellen Pubblishing). In particular, men and women of
various ages were divided into four categories according to hair colour: LightBlond,
DarkBlonde, LightBrunette, DarkBrunette. The purpose of the experiment was to
CHAPTER 2. USER GUIDE                                                                  33

determine whether hair colour is related to the amount of pain produced by common
types of mishaps and assorted types of trauma. Each person in the experiment was
given a pain threshold score based on his or her performance in a pain sensitivity
test (the higher the score, the higher the person’s pain tolerance). The variables are:
HairColor and Pain. First, let’s import the file as we have described previously.
By choosing the analysis of variance GUI, it is showed the related window. Now,
by clicking on the model button, it is shown a window that permits to the user to
choose the response variable and the factors; in this case, we have to click on the cross
between HairColor and Explanatory and between Pain and Response (see figure
2.11).
   By clicking on the insert button, we go back to the GUI beginning window that
now it is slightly different. Indeed, in the text area it is shown the chosen model
formula and, below, the level of the previously chosen factors (see figure 2.12).
   Now, it is sufficient to click on the send button to have the results about the
chosen model (see figure 2.13).
   As we have said before, in the output window there is the possibility to save the
analysis results in .pdf; also in this case the content of the .pdf file for this analysis
is reported in appendix.
   Now, we are going to see two examples of two-way analysis of variance: the former
in which we have a data set with one observation per cell and then we have to suppose
a null interaction effect, the latter in which we have a balanced design with more than
one observation per cell and then we have to consider the interaction effect. For the
former case, we consider the “lunapiena.txt” file.
   In folklore, the full moon is often portrayed as something sinister, a kind of evil
force possessing the power to control our behaviour. Over the centuries, many promi-
nent writers and philosophers have shared this belief. The data give the admission
rates to the emergency room of a Virginia mental health clinic before, during and
after the 12 full moons from August 1971 to July 1972.
   These data are taken to see if it has grounds the popular belief that the full
moon is often portrayed as something sinister (Larsen & Marx, 1986, An Introduction
to Mathematical Statistics and Its Applications, 3a edizione, Prentice Hall). The
CHAPTER 2. USER GUIDE                                                                  34

variables taken into account are Admission, i.e. the daily admission rate; Month,
i.e. month of year; Moon, i.e. before, during or after the full moon. As before,
let’s import tha data set and let’s choose the analysis of variance GUI. Then, by
clicking on the model button, it is shown a window where it is possible to choose the
response variable and the factors; in this case, we have to click on the cross between
Month and Explanatory, between Moon and Explanatory and between Admission
and Response (see figure 2.14).
   At this point we return to the previous page and the add-interactions button
is shown; this button allows to add the possible interactions; in this case we do not
use this button and we go on with the analysis by clicking the send button. Then
the output window is visualized (see figure 2.15).
   As we have said before, in the output window there is the possibility to save the
analysis results in .pdf; also in this case the content of the .pdf file for this analysis
is reported in appendix.
   Let’s see, now, an example of two-way analysis of variance with an interaction ef-
fect; for this example, we use the “insulate.txt” file. This data set has three variables:
the response variable Strength, i.e. the impact strength of an insulating material,
in foot/pounds, and the explanatory variables Lot, i.e. the lot of insulating material
and Cut, i.e. the type of cut (Ostle & Malone, 1987, Statistics in Research: Basic
Concepts and Techniques for Research Workers, 4a edizione, Blackwell Pubblishing).
   As before, let’s import tha data set and let’s choose the analysis of variance GUI.
Now, by clicking on the model button, it is shown a window where it is possible to
choose the response variable and the factors; in this case, we have to click on the
cross between Cut and Explanatory, between Lot and Explanatory and between
Strength and Response (see figure 2.16).
   By clicking the insert button, we return to the previous page and the add-interactions
button is shown; this button allows to add all the possible interactions; in this case
we select the two variables to estimate their interaction effect (see figure 2.17)
   By clicking the insert button, the model is input with the interaction in the
suitable text area and by clicking the send button, the output is visualized in a new
window (see figure 2.18).
CHAPTER 2. USER GUIDE                                                                  35

   As we have said before, in the output window there is the possibility to save the
analysis results in .pdf; also in this case the content of the .pdf file for this analysis
is reported in appendix.
CHAPTER 2. USER GUIDE                                          36

           Figure 2.10: Output of the linear regression GUI.
CHAPTER 2. USER GUIDE                                       37

             Figure 2.11: Window to select the variables.
CHAPTER 2. USER GUIDE                                            38

          Figure 2.12: Screen of the analysis of variance GUI.
CHAPTER 2. USER GUIDE                                            39

          Figure 2.13: Output of the analysis of variance GUI.
CHAPTER 2. USER GUIDE                                       40

             Figure 2.14: Window to select the variables.
CHAPTER 2. USER GUIDE                                            41

          Figure 2.15: Output of the analysis of variance GUI.
CHAPTER 2. USER GUIDE                                       42

             Figure 2.16: Window to select the variables.
CHAPTER 2. USER GUIDE                                         43

            Figure 2.17: Window to select the interactions.
CHAPTER 2. USER GUIDE                                            44

          Figure 2.18: Output of the analysis of variance GUI.
APPENDIX
output R−PHP − http://r−php.homelinux.net

[1] "Taste"    "Acetic" "H2S"      "Lactic"

                          Taste

Min
[1] 0.7

Median
[1] 20.95

Mean
[1] 24.53333

Max
[1] 57.2

Variance
[1] 264.2375

                          Acetic

Min
[1] 4.477

Median
[1] 5.425

Mean
[1] 5.498033

Max
[1] 6.458

Variance
[1] 0.3259021

                          H2S

Min
[1] 2.996

                                                                     1
output R−PHP − http://r−php.homelinux.net

Median
[1] 5.329

Mean
[1] 5.941767

Max
[1] 10.199

Variance
[1] 4.523615

                 Lactic

Min
[1] 0.86

Median
[1] 1.45

Mean
[1] 1.442

Max
[1] 2.01

Variance
[1] 0.0921062

                                                            2
output R−PHP − http://r−php.homelinux.net

[1] "Taste"    "Acetic" "H2S"         "Lactic"

      Taste               Acetic             H2S               Lactic
 Min.    : 0.70      Min.    :4.477     Min.   : 2.996    Min.    :0.860
 1st Qu.:13.55       1st Qu.:5.237      1st Qu.: 3.978    1st Qu.:1.250
 Median :20.95       Median :5.425      Median : 5.329    Median :1.450
 Mean    :24.53      Mean    :5.498     Mean   : 5.942    Mean    :1.442
 3rd Qu.:36.70       3rd Qu.:5.883      3rd Qu.: 7.575    3rd Qu.:1.667
 Max.    :57.20      Max.    :6.458     Max.   :10.199    Max.    :2.010

           Taste    Acetic        H2S    Lactic
Taste 264.237471 5.0996402 26.1288011 3.4742414
Acetic  5.099640 0.3259021 0.7503155 0.1046089
H2S    26.128801 0.7503155 4.5236150 0.4162177
Lactic  3.474241 0.1046089 0.4162177 0.0921062

             Taste      Acetic         H2S      Lactic
Taste    1.0000000   0.5495393   0.7557523   0.7042362
Acetic   0.5495393   1.0000000   0.6179559   0.6037826
H2S      0.7557523   0.6179559   1.0000000   0.6448123
Lactic   0.7042362   0.6037826   0.6448123   1.0000000

Call:
lm(formula = Taste ~ Acetic + H2S + Lactic)

Residuals:

                                                                           1
output R−PHP − http://r−php.homelinux.net
    Min       1Q   Median         3Q      Max
−17.391   −6.612   −1.009      4.908   25.449

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) −28.8768    19.7354 −1.463 0.15540
Acetic         0.3277    4.4598   0.073 0.94198
H2S            3.9118    1.2484   3.133 0.00425 **
Lactic        19.6705    8.6291   2.280 0.03108 *
−−−
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 10.13 on 26 degrees of freedom
Multiple R−Squared: 0.6518,     Adjusted R−squared: 0.6116
F−statistic: 16.22 on 3 and 26 DF, p−value: 3.81e−06

Analysis of Variance Table

Response: Taste
          Df Sum Sq Mean Sq F value     Pr(>F)
Acetic     1 2314.14 2314.14 22.5481 6.528e−05 ***
H2S        1 2147.02 2147.02 20.9197 0.0001035 ***
Lactic     1 533.32 533.32 5.1964 0.0310795 *
Residuals 26 2668.41 102.63
−−−
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

                                                                        2
output R−PHP − http://r−php.homelinux.net

                                            3
output R−PHP − http://r−php.homelinux.net

[1] "HairColour" "Pain"

> HairColour
[1] "DarkBlond"      "DarkBrunette"      "LightBlond"      "LightBrunette"

      Pain                HairColour
 Min.   :30.00    DarkBlond    :5
 1st Qu.:40.00    DarkBrunette :5
 Median :48.00    LightBlond   :5
 Mean   :47.84    LightBrunette:4
 3rd Qu.:56.00
 Max.   :71.00

Analysis of Variance Table

Response: Pain
           Df Sum Sq Mean Sq F value   Pr(>F)
HairColour 3 1360.73 453.58 6.7914 0.004114 **
Residuals 15 1001.80   66.79
−−−
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

                                                                             1
output R−PHP − http://r−php.homelinux.net

                                            2
output R−PHP − http://r−php.homelinux.net

[1] "Month"       "Moon"        "Admission"

> Month
 [1] "Apr" "Aug" "Dec" "Feb" "Jan" "Jul" "Jun" "Mar" "May" "Nov" "Oct" "Sep"

> Moon
[1] "After"   "Before" "During"

   Admission            Month           Moon
 Min.   : 5.000     Apr    : 3      After :12
 1st Qu.: 8.475     Aug    : 3      Before:12
 Median :12.850     Dec    : 3      During:12
 Mean   :11.931     Feb    : 3
 3rd Qu.:14.000     Jan    : 3
 Max.   :25.000     Jul    : 3
                    (Other):18

Analysis of Variance Table

Response: Admission
          Df Sum Sq Mean Sq F value   Pr(>F)
Month     11 455.58   41.42 7.1285 5.076e−05 ***
Moon       2 41.51    20.76 3.5726   0.04533 *
Residuals 22 127.82    5.81
−−−
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

                                                                               1
output R−PHP − http://r−php.homelinux.net

                                            2
output R−PHP − http://r−php.homelinux.net

[1] "Month"       "Moon"        "Admission"

> Month
 [1] "Apr" "Aug" "Dec" "Feb" "Jan" "Jul" "Jun" "Mar" "May" "Nov" "Oct" "Sep"

> Moon
[1] "After"   "Before" "During"

   Admission            Month           Moon
 Min.   : 5.000     Apr    : 3      After :12
 1st Qu.: 8.475     Aug    : 3      Before:12
 Median :12.850     Dec    : 3      During:12
 Mean   :11.931     Feb    : 3
 3rd Qu.:14.000     Jan    : 3
 Max.   :25.000     Jul    : 3
                    (Other):18

Analysis of Variance Table

Response: Admission
          Df Sum Sq Mean Sq F value   Pr(>F)
Month     11 455.58   41.42 7.1285 5.076e−05 ***
Moon       2 41.51    20.76 3.5726   0.04533 *
Residuals 22 127.82    5.81
−−−
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

                                                                               1
output R−PHP − http://r−php.homelinux.net

                                            2
You can also read