EasyMiner easy association rule mining, classification and anomaly detection

For Developers

Use EasyMiner API in your project

EasyMiner is an academic data mining project providing data mining of association rules, building of classification models based on association rules and outlier detection based on frequent pattern mining. The full project is based on composition of components and services with fully documented REST APIs. Most of the components and services are available under open source Apache License, Version 2.0.

To use EasyMiner functionality in your own project, you can build, for example, a mashup application (or write your own data mining script) using the main REST API. This API provides the full functionality of EasyMiner, including functions that are not yet available in the GUI.

You can also extend EasyMiner by adding new algorithms - rule mining, outlier detection or scorer service. For this purpose, the integration component EasyMinerCenter provides documented interfaces in PHP.

REST API

In case you want to use EasyMiner features in your own project, the easiest way is to use the main API (provided by the EasyMinerCenter component). The API is available at URL <server>/easyminercenter/api, where <server> is the URL of the EasyMiner server.

For more information, refer to the API tutorial.
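As a minimal sketch of the URL scheme described above, the following Java snippet derives the main API address from a server URL and prepares (but does not send) a request to it. The class name, the helper name and the host easyminer.example.org are placeholders chosen for illustration; the individual API operations are described in the API documentation and are not assumed here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class EasyMinerApiUrl {

    // Builds "<server>/easyminercenter/api", tolerating a trailing slash in the server URL.
    static String apiBase(String server) {
        String trimmed = server.endsWith("/") ? server.substring(0, server.length() - 1) : server;
        return trimmed + "/easyminercenter/api";
    }

    public static void main(String[] args) {
        // Placeholder host, not a real EasyMiner server.
        String base = apiBase("https://easyminer.example.org/");

        // The request is only constructed here; sending it would require an HttpClient.
        HttpRequest request = HttpRequest.newBuilder(URI.create(base)).GET().build();
        System.out.println(request.method() + " " + request.uri());
    }
}
```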

EasyMiner architecture

EasyMiner is based on a composition of reusable services with fully documented APIs. The main services in the current version of EasyMiner are shown in the following figure:

The central component (service) is EasyMinerCenter. This component manages user accounts and tasks, stores discovered association rules and provides an authentication service for the other services. It also calls the other services and provides the main graphical web user interface as well as the main API for integrating EasyMiner functionality into other projects and scripted data mining workflows.

  • Graphical UI: <server>/easyminercenter
  • API endpoint: <server>/easyminercenter/api
  • API documentation: <server>/easyminercenter/api

EasyMiner-Data is a web service for managing data sources. This service supports uploading data in the CSV and RDF formats. Uploaded data are stored in a database (data backend): MySQL (MariaDB) or Hive.

  • API endpoint: <server>/easyminer-data/api/v1
  • API documentation: <server>/easyminer-data/index.html

The EasyMiner-Preprocessing service supports the creation of datasets for data mining. It takes data fields from a data source created using EasyMiner-Data and creates attributes from these data fields using one of the following preprocessing algorithms: each value-one bin, intervals enumeration, nominal enumeration, equidistant intervals, equisized intervals.

  • API endpoint: <server>/easyminer-preprocessing/api/v1
  • API documentation: <server>/easyminer-preprocessing/index.html
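To give an idea of what one of the preprocessing algorithms named above does, here is a small sketch of equidistant (equal-width) binning. It is an illustration of the general technique only, not the implementation used by EasyMiner-Preprocessing; the class name, the clamping of out-of-range values and the sample data are assumptions.

```java
class EquidistantBinning {

    // Returns the 0-based index of the equal-width interval of [min, max] that contains value.
    static int binIndex(double value, double min, double max, int bins) {
        if (bins <= 0 || !(max > min)) {
            throw new IllegalArgumentException("need bins > 0 and max > min");
        }
        if (value <= min) return 0;          // clamp values at or below the range
        if (value >= max) return bins - 1;   // clamp values at or above the range
        double width = (max - min) / bins;
        // Math.min guards against floating-point rounding at the upper edge.
        return Math.min(bins - 1, (int) ((value - min) / width));
    }

    public static void main(String[] args) {
        // Hypothetical "age" field split into three intervals of width 20 over [18, 78].
        double[] ages = {18, 25, 39, 40, 77};
        for (double age : ages) {
            System.out.println(age + " -> bin " + binIndex(age, 18, 78, 3));
        }
    }
}
```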

EasyMiner-Miner is a web service that encapsulates data mining algorithms. Supported data mining methods include association rule learning (Apriori, FP-Growth), pruning and classification (CBA) and outlier detection (algorithms included in the fpmoutlier package).

  • API endpoint: <server>/easyminer-miner/api/v1
  • API documentation: <server>/easyminer-miner/index.html

EasyMiner-Scorer is a web service for testing of classification models based on association rules.

  • API endpoint: <server>/easyminer-scorer/v0.3
  • API documentation: <server>/easyminer-scorer/index.html

In all the URLs above, the <server> part should be replaced with the URL of the EasyMiner server you are using.
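The endpoint paths listed above can be collected in one place so that client code only has to supply the server URL. The sketch below does this with a Java enum; the enum and method names are hypothetical, while the paths are copied from the list above.

```java
enum EasyMinerService {
    CENTER("/easyminercenter/api"),
    DATA("/easyminer-data/api/v1"),
    PREPROCESSING("/easyminer-preprocessing/api/v1"),
    MINER("/easyminer-miner/api/v1"),
    SCORER("/easyminer-scorer/v0.3");

    private final String path;

    EasyMinerService(String path) {
        this.path = path;
    }

    // Replaces the <server> part with the given server URL (a trailing slash is tolerated).
    String endpoint(String server) {
        String base = server.endsWith("/") ? server.substring(0, server.length() - 1) : server;
        return base + path;
    }

    public static void main(String[] args) {
        // Placeholder host, not a real EasyMiner server.
        for (EasyMinerService service : values()) {
            System.out.println(service + " -> " + service.endpoint("https://easyminer.example.org"));
        }
    }
}
```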

Extend EasyMiner functionality

The functionality of EasyMiner is continuously extended with new algorithms and data mining platforms. The two most recent additions are outlier detection and Spark mining.

EasyMinerCenter supports several data sources (databases or data services), preprocessing services, mining algorithms and scorers. All these services are integrated using drivers written in PHP. The current version of this component supports three types of data mining backends - see the schema.

To integrate your own algorithm, you can use the following PHP interfaces:

EasyMinerCenter\Model\Data\Databases\IDatabase
EasyMinerCenter\Model\Preprocessing\Databases\IPreprocessing
EasyMinerCenter\Model\Mining\IMiningDriver
EasyMinerCenter\Model\Mining\IOutliersMiningDriver
EasyMinerCenter\Model\Scoring\IScorerDriver

A new driver has to be registered in the application configuration. The configuration is defined in the directory <server>/easyminercenter/app/config. The main configuration file is config.neon; configuration for a given server should be defined in the file config.local.neon. The algorithm drivers are configured in the "parameters" section; for an example, refer to the configuration of the existing drivers.

Issue tracking & source code

The source code of most EasyMiner/R components is available in public repositories on GitHub.com.

The main repository is KIZI/EasyMiner

The state of the code base with respect to maintaining rule mining and classification functionality is regularly checked with integration tests in Travis CI.

The main repository includes the main components of the EasyMiner system as well as subprojects. To clone the main repository, run the command:

git clone --recursive https://github.com/KIZI/EasyMiner.git

Structure of repositories

Frontend service

Backend services

Other source codes

Issue tracking

Issues are tracked separately for each of the GitHub projects listed in the previous paragraphs. If you find an error or have a suggestion for improving EasyMiner, you can add a new issue directly to the relevant GitHub project. If you are not sure which GitHub project to choose, add the issue to the main repository; we will process it and move it to the right subproject.


Extend EasyMiner Backend

All EasyMiner modules communicate with each other via RESTful endpoints. Inside a module, events, operations and processes are handled by the Akka actor system (available for Scala and Java). When a new algorithm is implemented, there are two options for integrating it with the EasyMiner system.

Option #1 - RESTful service

Wrap your algorithm in a RESTful endpoint that is compatible with one of the drivers specified in the "Extend EasyMiner functionality" part. The EasyMinerCenter module will then be able to use this new service as a new algorithm for that driver. This solution is suitable if the newly implemented algorithm already exposes a RESTful endpoint or is written in a language other than Scala or Java.

Authentication and authorization are handled by the EasyMinerCenter module via an API key. With a valid API key, an integrated service can obtain connection information for the EasyMiner shared storage of a specific user. This storage (MySQL for the limited version and Hive for the unlimited version) is used for saving datasets and preprocessed fields, and any integrated service can use it for other operations such as transformations, data mining or machine learning processes.

Option #2 - Akka actor

If we do not want to create a new RESTful endpoint for a new algorithm, we can simply add it to an existing EasyMiner backend module using the actor system. The EasyMiner-Miner module uses the akka.io framework for process management. As soon as the module receives a particular HTTP request with specific parameters, the system uses an actor which invokes the appropriate process. The system can also work with remote actors, so a new algorithm can be separated from the main mining module. This solution is available only for the Java and Scala languages.

A newly integrated algorithm must be wrapped in an Akka actor which is able to receive different kinds of messages. Each message implements a particular Java interface located in the EasyMiner-Task module. Therefore, you only need to import this package into your Java or Scala project (this package uses a compatible Akka version - don't use another Akka version) and create an actor receiving these messages:


package cz.vse.easyminer.task.messages;

import java.util.UUID;

/**
 * Interface for all task requests from the main controller to a mining actor
 */
interface TaskMessageRequest extends TaskMessage {

    /**
     * Task id
     *
     * @return java uuid
     */
    UUID getId();

    /**
     * User EasyMiner API key (we can use it for getting user datasets)
     *
     * @return string
     */
    String getApiKey();

}


package cz.vse.easyminer.task.messages;

import java.util.Map;

/**
 * The main controller sends this message as a request for a creation of a new task.
 * A receiving mining actor reacts on this message and sends TaskInitResponse back to the main controller.
 */
public interface TaskInitRequest extends TaskMessageRequest {

    /**
     * Max running time for the task. After this threshold is reached, the task should be killed.
     * @return time in minutes
     */
    Integer getMaxRunningTime();

    /**
     * Task properties
     * @return key-value pairs
     */
    Map<String, String> getProperties();

    /**
     * The main body of the task initialization.
     * It may be empty if the task does not need it.
     * @return task body with dataset or other settings, or empty
     */
    Byte[] getBody();

}


package cz.vse.easyminer.task.messages;

/**
 * The main controller sends this message as a request for a task status.
 */
public interface TaskStatusRequest extends TaskMessageRequest {
}

The receiving actor must react to a TaskMessageRequest message and respond with a TaskMessageResponse message.


package cz.vse.easyminer.task.messages;

/**
 * Interface for all task responses from the mining actor to the main controller
 */
interface TaskMessageResponse extends TaskMessage {

    /**
     * Message for this response (status or error messages)
     *
     * @return status message
     */
    String getMessage();

}


package cz.vse.easyminer.task.messages;

/**
 * The mining actor sends this message in response to a task initialization request: TaskInitRequest
 */
public interface TaskInitResponse extends TaskMessageResponse {

    /**
     * This method returns whether the task has been accepted and is running or not.
     *
     * @return true = task has been accepted and is running, false = task has not been accepted (getMessage method returns a reason).
     */
    Boolean isAccepted();

}


package cz.vse.easyminer.task.messages;

/**
 * The mining actor sends this message in response to a status request: TaskStatusRequest
 * There are 4 options of behaviour:
 * 1. Task is still running (isActive = true, isCompleted = false, isSuccessful = false, getResult = empty)
 * 2. Task has failed (isActive = true, isCompleted = true, isSuccessful = false, getResult = empty, getMessage = error message)
 * 3. Task has been successfully completed (isActive = true, isCompleted = true, isSuccessful = true, getResult = result)
 * 4. Task does not exist (isActive = false, isCompleted = false, isSuccessful = false, getResult = empty)
 */
public interface TaskStatusResponse extends TaskMessageResponse {

    /**
     * Task is still active (is running or completed)
     *
     * @return boolean
     */
    Boolean isActive();

    /**
     * Task is completed
     *
     * @return boolean
     */
    Boolean isCompleted();

    /**
     * Task is successfully completed
     *
     * @return true = getResult method returns a result, false = process threw an exception and getMessage method returns an error message
     */
    Boolean isSuccessful();

    /**
     * If this task has been successfully completed, then this method returns result; otherwise it returns the empty array.
     *
     * @return byte array with a result
     */
    Byte[] getResult();

}
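The four cases listed in the TaskStatusResponse comment can be sketched as four factory methods that set the flags consistently. To keep the snippet self-contained, it declares a local stand-in interface with the same method signatures; a real actor would implement the interface from the EasyMiner-Task module instead. The class names and the message strings are hypothetical.

```java
class TaskStatusSketch {

    // Local stand-in mirroring the methods of TaskStatusResponse (assumption for this sketch).
    interface StatusResponse {
        String getMessage();
        Boolean isActive();
        Boolean isCompleted();
        Boolean isSuccessful();
        Byte[] getResult();
    }

    static final class SimpleStatus implements StatusResponse {
        private final String message;
        private final boolean active;
        private final boolean completed;
        private final boolean successful;
        private final Byte[] result;

        private SimpleStatus(String message, boolean active, boolean completed,
                             boolean successful, Byte[] result) {
            this.message = message;
            this.active = active;
            this.completed = completed;
            this.successful = successful;
            this.result = result;
        }

        // 1. Task is still running.
        static SimpleStatus running(String progress) {
            return new SimpleStatus(progress, true, false, false, new Byte[0]);
        }

        // 2. Task has failed; the message carries the error.
        static SimpleStatus failed(String error) {
            return new SimpleStatus(error, true, true, false, new Byte[0]);
        }

        // 3. Task has been successfully completed; the result is attached.
        static SimpleStatus succeeded(Byte[] result) {
            return new SimpleStatus("completed", true, true, true, result);
        }

        // 4. Task does not exist.
        static SimpleStatus notFound() {
            return new SimpleStatus("task does not exist", false, false, false, new Byte[0]);
        }

        public String getMessage() { return message; }
        public Boolean isActive() { return active; }
        public Boolean isCompleted() { return completed; }
        public Boolean isSuccessful() { return successful; }
        public Byte[] getResult() { return result; }
    }

    public static void main(String[] args) {
        StatusResponse status = SimpleStatus.failed("out of memory");
        System.out.println(status.isActive() + " " + status.isCompleted() + " "
                + status.isSuccessful() + " " + status.getMessage());
    }
}
```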

Once we have created an actor which is able to process these kinds of messages, we can integrate it with the EasyMiner-Miner module. We can either ask for this actor to be added directly to the EasyMiner-Miner module, or create a microservice that hosts the actor. In the second case, the remote actor must be reachable at a public IP address (your actor system must have the same version as the EasyMiner actor system).

Once a new mining actor with a new algorithm has been integrated, we can run it with an HTTP request to the EasyMiner-Miner module:

Task init HTTP request

  • Url: <easyminer-miner-address>/api/task
  • Method: POST
  • Content-Type: multipart/form-data
  • Accept: application/json
  • Parameters:
    • name: algorithm name (actor name)
    • address: for remote actors, the address of the actor (akka.<protocol>://<actor system name>@<hostname>:<port>/<actor path>); the default is empty, which means a local algorithm implemented in the EasyMiner-Miner module.
    • timeout: max running time for the task in minutes (default is 10 minutes)
    • body: for a binary body input use this parameter name.
    • ...other form parameters are also passed to the task initializer; their values are converted to strings using UTF-8 encoding.
  • Response codes:
    • 202: task has been accepted
    • 404: algorithm or actor does not exist
    • 400: invalid task parameters

Task status HTTP request

  • Url: <easyminer-miner-address>/api/task/<task-id>
  • Method: GET
  • Accept: *
  • Response codes:
    • 404: task does not exist
    • 202: task is still in progress; it returns text/plain with a status message.
    • 500: error during mining; it returns text/plain with an error message.
    • 200: task has been successfully completed; it returns a result with some content type - specific for a particular algorithm.
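On the client side, the response codes of the task status request can be translated into a small set of outcomes, for example to decide whether to keep polling. The following is a sketch only; the enum and method names are hypothetical, and any code not listed above is treated as unexpected.

```java
class TaskStatusCodes {

    enum Outcome { IN_PROGRESS, SUCCEEDED, FAILED, NOT_FOUND, UNEXPECTED }

    // Maps the documented response codes of the task status request to an outcome.
    static Outcome interpret(int httpStatus) {
        switch (httpStatus) {
            case 202: return Outcome.IN_PROGRESS; // body: text/plain status message
            case 200: return Outcome.SUCCEEDED;   // body: algorithm-specific result
            case 500: return Outcome.FAILED;      // body: text/plain error message
            case 404: return Outcome.NOT_FOUND;   // task does not exist
            default:  return Outcome.UNEXPECTED;  // not covered by the list above
        }
    }

    // A client would typically repeat the GET request only while the task is in progress.
    static boolean shouldPollAgain(int httpStatus) {
        return interpret(httpStatus) == Outcome.IN_PROGRESS;
    }

    public static void main(String[] args) {
        for (int code : new int[] {202, 200, 500, 404}) {
            System.out.println(code + " -> " + interpret(code));
        }
    }
}
```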