EasyMiner easy association rule mining, classification and anomaly detection

For Developers

Use EasyMiner API in your project

EasyMiner is an academic data mining project providing data mining of association rules, building of classification models based on association rules and outlier detection based on frequent pattern mining. The whole project is based on a composition of components and services with fully documented REST APIs. Most of the components and services are available under the open source Apache License, Version 2.0.

To use EasyMiner functionality in your own project, you can build, for example, a mashup application (or write your own data mining script) using the main REST API. This API provides the full functionality of EasyMiner, including functions that are not yet available in the GUI.

You can also extend EasyMiner by adding new algorithms - rule mining, outlier detection or scorer service. For this purpose, the integration component EasyMinerCenter provides documented interfaces in PHP.

REST API

If you want to use EasyMiner features in your own project, the easiest way is to use the main API (provided by the EasyMinerCenter component). The API is available at the URL <server>/easyminercenter/api, where <server> is the URL of the EasyMiner server.

For more information, refer to the API tutorial.

EasyMiner architecture

EasyMiner is based on a composition of reusable services with fully documented APIs. The main services in the current version of EasyMiner are shown in the following figure:

The central component (service) is EasyMinerCenter. This component provides user account and task management, stores discovered association rules, and provides an authentication service for other services. It also calls other services and provides the main graphical web user interface as well as the main API for integrating EasyMiner functionality into other projects and scripted data mining workflows.

  • Graphical UI: <server>/easyminercenter
  • API endpoint: <server>/easyminercenter/api
  • API documentation: <server>/easyminercenter/api

EasyMiner-Data is a web service for management of data sources. This service supports uploading data in CSV and RDF formats. Uploaded data are stored in a database (data backend): MySQL (MariaDB) or Hive.

  • API endpoint: <server>/easyminer-data/api/v1
  • API documentation: <server>/easyminer-data/index.html

The EasyMiner-Preprocessing service supports creation of datasets for data mining. It takes data fields from a data source created using EasyMiner-Data and creates attributes from the data fields using one of these preprocessing algorithms: each value-one bin, intervals enumeration, nominal enumeration, equidistant intervals, equisized intervals.
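
To illustrate one of the listed algorithms, the following is a minimal sketch of equidistant (equal-width) binning. It is not the code used by EasyMiner-Preprocessing; the function names equidistant_intervals and bin_index are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def equidistant_intervals(low: float, high: float, bins: int) -> List[Tuple[float, float]]:
    """Split the range [low, high] into `bins` intervals of equal width."""
    if bins < 1 or high <= low:
        raise ValueError("need bins >= 1 and high > low")
    width = (high - low) / bins
    # Use `high` itself as the last edge to avoid floating point drift.
    edges = [low + i * width for i in range(bins)] + [high]
    return list(zip(edges[:-1], edges[1:]))


def bin_index(value: float, low: float, high: float, bins: int) -> int:
    """Return the zero-based index of the interval that contains `value`."""
    if not low <= value <= high:
        raise ValueError("value lies outside the binned range")
    width = (high - low) / bins
    # The upper bound `high` is assigned to the last interval.
    return min(int((value - low) / width), bins - 1)
```

For example, a numeric data field ranging from 0 to 10 split into 5 bins yields intervals of width 2, and each value is mapped to the attribute value of the interval it falls into.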

  • API endpoint: <server>/easyminer-preprocessing/api/v1
  • API documentation: <server>/easyminer-preprocessing/index.html

EasyMiner-Miner is a web service that encapsulates data mining algorithms. Supported data mining methods include association rule learning (Apriori, FP-Growth), pruning and classification (CBA) and outlier detection (algorithms included in the fpmoutlier package).

  • API endpoint: <server>/easyminer-miner/api/v1
  • API documentation: <server>/easyminer-miner/index.html

EasyMiner-Scorer is a web service for testing of classification models based on association rules.

  • API endpoint: <server>/easyminer-scorer/v0.3
  • API documentation: <server>/easyminer-scorer/index.html

In all the URLs above, replace the <server> part with the URL of the EasyMiner server you are using.
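
A script that talks to several services may find it convenient to resolve these endpoints in one place. The following is a small sketch under that assumption; the paths are the ones listed above, while the dictionary keys and the function name service_url are hypothetical labels introduced for this example.

```python
# Endpoint paths as listed above; the keys are illustrative labels only.
SERVICE_PATHS = {
    "center-ui": "/easyminercenter",
    "center-api": "/easyminercenter/api",
    "data-api": "/easyminer-data/api/v1",
    "preprocessing-api": "/easyminer-preprocessing/api/v1",
    "miner-api": "/easyminer-miner/api/v1",
    "scorer-api": "/easyminer-scorer/v0.3",
}


def service_url(server: str, service: str) -> str:
    """Replace the <server> part with a concrete server URL."""
    return server.rstrip("/") + SERVICE_PATHS[service]
```

With a server at https://example.org (a placeholder), service_url("https://example.org", "miner-api") gives the EasyMiner-Miner endpoint.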

Extend EasyMiner functionality

The functionality of EasyMiner is continuously extended with new algorithms and data mining platforms. The two most recent additions are outlier detection and Spark mining.

EasyMinerCenter supports several data sources (databases or data services), preprocessing services, mining algorithms and scorers. All these services are integrated using drivers written in PHP. The current version of this component supports three types of data mining backends (see the schema).

To integrate your own algorithm, you can use the following PHP interfaces:

EasyMinerCenter\Model\Data\Databases\IDatabase
EasyMinerCenter\Model\Preprocessing\Databases\IPreprocessing
EasyMinerCenter\Model\Mining\IMiningDriver
EasyMinerCenter\Model\Mining\IOutliersMiningDriver
EasyMinerCenter\Model\Scoring\IScorerDriver

A new driver has to be registered in the application configuration. The configuration is defined in the directory <server>/easyminercenter/app/config. The main configuration file is config.neon; configuration for the given server should be defined in the file config.local.neon. The algorithm drivers are configured in the section "parameters". For an example, refer to the configuration of the already existing drivers.

Issue tracking & source code

Source code of most components of EasyMiner/R is available in public repositories on GitHub.com.

The main repository is KIZI/EasyMiner

The state of the code base with respect to maintaining rule mining and classification functionality is regularly checked with integration tests in Travis CI.

The main repository includes the main components of the EasyMiner system as well as subprojects. To clone the main repository, run the command:

git clone --recursive https://github.com/KIZI/EasyMiner.git

Structure of repositories

Frontend service

Backend services

Other source codes

Issue tracking

Issues are tracked separately for all GitHub projects listed in the previous paragraphs. If you find an error or have a suggestion for improving EasyMiner functionality, you can add a new issue directly to the relevant GitHub project. If you are not sure which GitHub project to choose, you can add the issue to the main repository; we will process it and move it to the right subproject.


Extend EasyMiner Backend

All EasyMiner modules communicate with each other via RESTful endpoints. Inside a module, events, operations and processes are handled by the Akka actor system (available for Scala and Java). When a new algorithm is implemented, there are two options for integrating it with the EasyMiner system.

Option #1 - RESTful service integrated with EasyMinerCenter

Wrap your algorithm in a RESTful endpoint that is compatible with one of the drivers specified in the "Extend EasyMiner functionality" part. The EasyMinerCenter module will then be able to use this new service as a new algorithm for that driver. This solution is suitable if the newly implemented algorithm uses the same parameters as another tightly integrated algorithm but takes a different approach to processing.

Option #2 - RESTful service integrated with EasyMiner-Miner

Wrap your algorithm in a RESTful endpoint with specific operations (see below). The EasyMiner-Miner module has a special operation called "remote-task". If you send a POST request with the parameter 'address' and other parameters to <easyminer-miner-host>/api/v1/remote-task, the HTTP request is forwarded to the specified address with an API key in the header. Your service can process the task request and run a particular function asynchronously. Subsequently, the client should obtain a task id and send a GET request for the task status: <easyminer-miner-host>/api/v1/remote-task/<task-id>. This request is again forwarded to the remote service and returns one of three states: "in progress", "result" or "error".

Remote task operations defined in EasyMiner-Miner

Task init HTTP request
  • Url: <easyminer-miner-address>/api/v1/remote-task
  • Method: POST
  • Content-Type: multipart/form-data
  • Parameters:
    • name: algorithm name (required)
    • address: remote address of the integrated service, including the path for task processing. This request creates a task id and sends a POST request with the same parameters to <remote-service-address>/<task-id>
    • timeout: max running time for the task in minutes (default is 10 minutes)
    • body: if the remote service consumes complex data, use this parameter, which can have any Content-Type header.
    • ...other form parameters are also passed to the remote service; their values are converted to strings using UTF-8 encoding and forwarded with Content-Type: text/plain; UTF-8.
  • Response codes:
    • 202: task has been accepted and is running. The content type of this response is text/plain and the body contains the ID of the created task. This response occurs only if the remote service also returns 202.
    • 400: invalid task parameters or the remote service returned an error. The content type may vary; it also depends on the remote service response.
Task status HTTP request
  • Url: <easyminer-miner-address>/api/v1/remote-task/<task-id>
  • Method: GET
  • Response codes:
    • 404: task does not exist
    • 202: task is still in progress; it can return a message whose content type depends on the remote service.
    • 500: error during mining; it can return a message whose content type depends on the remote service.
    • 200: task has been successfully completed; it returns a result with a content type specific to the particular remote service.
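
The two requests above can be driven from a client script roughly as follows. This is a minimal sketch using only the Python standard library, not an official client: the names build_multipart, http_status and wait_for_task are hypothetical, the "not found" label for 404 is added for this example, and authentication is omitted because the way a client passes its API key is not described here.

```python
import time
import urllib.error
import urllib.request
import uuid
from typing import Callable, Dict, Tuple

# Status codes of the task status request mapped to task states.
STATES = {200: "result", 202: "in progress", 404: "not found", 500: "error"}


def build_multipart(fields: Dict[str, str]) -> Tuple[bytes, str]:
    """Encode text fields (name, address, timeout, ...) as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("Content-Type: text/plain; charset=UTF-8")
        lines.append("")
        lines.append(str(value))
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def http_status(url: str) -> Callable[[], Tuple[int, bytes]]:
    """Create a function that sends the task status GET request to `url`."""
    def fetch() -> Tuple[int, bytes]:
        try:
            with urllib.request.urlopen(url) as response:
                return response.status, response.read()
        except urllib.error.HTTPError as err:
            # 404 and 500 are raised as HTTPError by urllib.
            return err.code, err.read()
    return fetch


def wait_for_task(get_status: Callable[[], Tuple[int, bytes]],
                  interval: float = 1.0,
                  max_polls: int = 600) -> Tuple[str, bytes]:
    """Poll the task status until the task leaves the "in progress" state."""
    for _ in range(max_polls):
        code, payload = get_status()
        state = STATES.get(code, "error")
        if state != "in progress":
            return state, payload
        time.sleep(interval)
    raise TimeoutError("task did not finish within the polling budget")
```

A script would POST the body from build_multipart to <easyminer-miner-address>/api/v1/remote-task, read the task id from the 202 response, and then call wait_for_task(http_status(<status URL>)). Passing the status function as a parameter keeps the polling loop testable without a running server.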

Remote task operations defined in the custom remote service

Task init HTTP request
  • Url: <remote-service-address>/<task-id>
  • Method: POST
  • Content-Type: multipart/form-data
  • Parameters:
    • name: algorithm name (required)
    • timeout: max running time for the task in minutes (required)
    • body: if the remote service consumes complex data, use this parameter, which can have any Content-Type header.
    • ...other form parameters with Content-Type: text/plain; UTF-8.
  • Response codes:
    • 202: task has been accepted and is running. It can return a message with any content type.
    • 400: invalid task parameters. It can return a message with any content type.
Task status HTTP request
  • Url: <remote-service-address>/<task-id>
  • Method: GET
  • Response codes:
    • 404: task does not exist
    • 202: task is still in progress. It can return a message with any content type.
    • 500: error during mining. It can return a message with any content type.
    • 200: task has been successfully completed; it returns a result with a content type specific to the particular operation.
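
On the remote service side, the contract above comes down to bookkeeping of task states behind the two HTTP operations. The following sketch shows only that bookkeeping, independent of any web framework; the class TaskStore and its methods are hypothetical, API key checking and multipart parsing are left out, and the timeout is validated but not enforced.

```python
import threading
from typing import Callable, Dict, Tuple


class TaskStore:
    """In-memory task bookkeeping, keyed by the task id taken from the URL."""

    def __init__(self, algorithm: Callable[[Dict[str, str]], str]):
        self._algorithm = algorithm  # the wrapped algorithm (assumed callable)
        self._lock = threading.Lock()
        self._tasks: Dict[str, Tuple[int, str]] = {}
        self._threads: Dict[str, threading.Thread] = {}

    def init_task(self, task_id: str, params: Dict[str, str]) -> Tuple[int, str]:
        """Handle POST <remote-service-address>/<task-id>."""
        if not params.get("name") or not params.get("timeout"):
            return 400, "missing required parameter: name or timeout"
        with self._lock:
            self._tasks[task_id] = (202, "in progress")
        worker = threading.Thread(target=self._run, args=(task_id, params), daemon=True)
        self._threads[task_id] = worker
        worker.start()  # run asynchronously and answer immediately
        return 202, "accepted"

    def _run(self, task_id: str, params: Dict[str, str]) -> None:
        try:
            outcome = (200, self._algorithm(params))
        except Exception as exc:  # any failure is reported as status 500
            outcome = (500, str(exc))
        with self._lock:
            self._tasks[task_id] = outcome

    def task_status(self, task_id: str) -> Tuple[int, str]:
        """Handle GET <remote-service-address>/<task-id>."""
        with self._lock:
            return self._tasks.get(task_id, (404, "task does not exist"))

    def join(self, task_id: str) -> None:
        """Wait for a task to finish (useful in tests)."""
        self._threads[task_id].join()
```

An HTTP handler would call init_task for the POST request and task_status for the GET request, and return the resulting status code and message to EasyMiner-Miner.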

Authentication and authorization for both methods are handled by the EasyMinerCenter module via an API key. With a valid API key, an integrated service can obtain information about the connection to the EasyMiner shared storage for a specific user. This storage (MySQL for the limited version and Hive for the unlimited version) is used for saving datasets and preprocessed fields, and any integrated service can use it for other operations such as transformations, data mining or machine learning processes.