EasyMiner easy association rule mining, classification and anomaly detection

API Tutorial

Do you want to use the full EasyMiner functionality in your own project, or simply automate the execution of tasks that you have tuned in the user interface? Try our REST API.

The main, top-level API, offering the full functionality of EasyMiner, is provided by the integration component EasyMinerCenter. If you prefer to use the APIs of the individual services directly, you can do that as well; this is documented in the For Developers section.

Requirements

To use the EasyMiner API, you need access to a configured EasyMiner system. You can use your own installation or our demo server.

On the selected EasyMiner server, go to the user interface and create a new user account. Then click on your user image (your photo or generic user icon) in the top right corner of the screen, use the link "Show my profile" and copy your API key. The URL of the profile page is: <easyminercenter-url>/em/user/details


The API key has to be sent in all requests to identify the user account.
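In practice, the key is appended as the apiKey query parameter. A small helper for building such URLs, using only the Python standard library (the helper name is our own, not part of EasyMiner):

```python
from urllib.parse import urlencode

def with_api_key(url, params, api_key):
    """Append query parameters, including the mandatory apiKey, to an endpoint URL."""
    query = urlencode({**params, "apiKey": api_key})
    return url + "?" + query
```

For example, `with_api_key(API_URL + "/miners", {}, API_KEY)` yields the miners endpoint with the key attached.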

API documentation

The API is fully documented using Swagger documentation. The API endpoint and the API documentation are available at: <easyminercenter-url>/api

You can browse the Swagger documentation on our demo server.

Data mining using API

Association rule mining, classification model building

Before you can use the data mining API, you need the API key and a dataset you want to analyze. The dataset has to be in CSV format (optionally zipped). If you have these ready, follow these steps:

  1. Upload data in CSV - the data can optionally be zipped; suitable for files of limited size (up to 50 MB). For larger files, the data has to be uploaded in chunks via repeated POST calls directly to the data service. Uploading the data creates a data source.
  2. Create miner - this operation creates a miner, an instance bound to the created data source, on a selected mining backend (Spark or R).
  3. Preprocess data – generate attributes from the data fields of the data stored in the data source; the user selects a preprocessing algorithm for each attribute; it is also possible to generate multiple attributes from one data field.
  4. Define association rule mining task - define attributes for the antecedent and consequent parts of association rules; the user can opt to fix an attribute to a specific value; it is necessary to set minimum thresholds for confidence and support, and optionally for lift.
  5. Execute the mining task and wait for results.
  6. Export the results (in PMML AssociationModel, GUHA PMML or ad-hoc JSON)
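Step 4 refers to minimum thresholds for confidence, support, and lift. These standard interest measures can be computed from transaction data as follows; this is a minimal pure-Python sketch for illustration only, since EasyMiner computes the measures on the mining backend:

```python
def rule_measures(transactions, antecedent, consequent):
    """Compute support, confidence, and lift of the rule antecedent => consequent.

    `transactions` is a list of item sets; `antecedent` and `consequent` are item sets.
    """
    n = len(transactions)
    # transactions containing the antecedent
    a_count = sum(1 for t in transactions if antecedent <= t)
    # transactions containing both antecedent and consequent
    ac_count = sum(1 for t in transactions if (antecedent | consequent) <= t)
    # transactions containing the consequent
    c_count = sum(1 for t in transactions if consequent <= t)
    support = ac_count / n
    confidence = ac_count / a_count if a_count else 0.0
    lift = confidence / (c_count / n) if c_count else 0.0
    return support, confidence, lift
```

A rule passes the task thresholds when its support and confidence (and, if set, lift) are at least the configured minimum values.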

API usage examples

Rule mining

A comprehensive, commented example is available on GitHub - see the commented code...

Python client

For evaluation purposes, we prepared a comprehensive benchmarking use case on 40 datasets from the UCI repository. The benchmarking suite is written in Python and is stored in the KIZI/EasyMiner-Evaluation GitHub repository.

A part of this project is a Python client for the EasyMiner REST API.

Simple example

1. Upload data in CSV

import urllib.parse
import requests

headers = {"Accept": "application/json"}
# requests expects a dict mapping the form field name to the opened file
files = {"file": open(CSV_FILE, 'rb')}
r = requests.post(API_URL + '/datasources?separator=' + urllib.parse.quote(CSV_SEPARATOR) + '&encoding=' + CSV_ENCODING + '&type=limited&apiKey=' + API_KEY, files=files, headers=headers)
datasource_id = r.json()["id"]
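As noted in step 1, the /datasources endpoint also accepts zipped CSV files, which reduces the upload size. A sketch of compressing the file before upload, using only the Python standard library (the helper name is our own):

```python
import os
import zipfile

def zip_csv(csv_path):
    """Compress a CSV file, returning the path of the created ZIP archive."""
    zip_path = csv_path + ".zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # store the file under its base name so the archive contains no directory structure
        zf.write(csv_path, arcname=os.path.basename(csv_path))
    return zip_path
```

The resulting archive can then be passed to the same upload call in place of the plain CSV file.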

2. Create miner

import json

headers = {'Content-Type': 'application/json', "Accept": "application/json"}
json_data = json.dumps({"name": "TEST MINER", "type": "cloud", "datasourceId": datasource_id})
r = requests.post(API_URL + "/miners?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
miner_id = r.json()["id"]

3. Preprocess data – generate attributes from the data fields stored in the data source

The user defines preprocessing for each data field. It is also possible to generate multiple attributes from one data field.

headers = {'Content-Type': 'application/json', "Accept": "application/json"}
r = requests.get(API_URL + '/datasources/' + str(datasource_id) + '?apiKey=' + API_KEY, headers=headers)
datasource_columns = r.json()['column']
attributes_columns_map = {}
for col in datasource_columns:
    column = col["name"]
    json_data = json.dumps(
        {"miner": miner_id, "name": column, "columnName": column, "specialPreprocessing": "eachOne"})
    r = requests.post(API_URL + "/attributes?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
    if r.status_code != 201:
        break  # an error occurred
    attributes_columns_map[column] = r.json()['name']  # map of created attributes (based on the existing data fields)

4. Define association rule mining task

Define attributes for the antecedent and consequent parts of association rules. Each attribute can be configured either to appear with any value or to be constrained to a single fixed value.

This step also entails the definition of threshold values for the interest measures (confidence, support, and optionally lift).

# define data mining task
antecedent = []
consequent = []

# prepare antecedent pattern
if len(ANTECEDENT_COLUMNS):
    # add to antecedent only fields defined in the constant
    for column in ANTECEDENT_COLUMNS:
        antecedent.append({"attribute": attributes_columns_map[column]})
else:
    # add to antecedent all fields not used in consequent
    for (column, attribute_name) in attributes_columns_map.items():
        if not(column in CONSEQUENT_COLUMNS):
            antecedent.append({"attribute": attribute_name})

# prepare consequent pattern
for column in CONSEQUENT_COLUMNS:
    consequent.append({"attribute": attributes_columns_map[column]})

json_data = json.dumps({"miner": miner_id,
                        "name": "Test task",
                        "limitHits": 1000,
                        "IMs": [
                            {
                                "name": "CONF",
                                "value": MIN_CONFIDENCE
                            },
                            {
                                "name": "SUPP",
                                "value": MIN_SUPPORT
                            }
                        ],
                        "antecedent": antecedent,
                        "consequent": consequent
                        })
# define new data mining task
r = requests.post(API_URL + "/tasks/simple?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
print("create task response code: " + str(r.status_code))
task_id = str(r.json()["id"])

5. Execute the mining task

import time

r = requests.get(API_URL + "/tasks/" + task_id + "/start?apiKey=" + API_KEY, headers=headers)
while True:
    time.sleep(1)
    # check state
    r = requests.get(API_URL + "/tasks/" + task_id + "/state?apiKey=" + API_KEY, headers=headers)
    task_state = r.json()["state"]
    print("task_state:" + task_state)
    if task_state == "solved":
        break
    if task_state == "failed":
        print("task execution failed")
        break
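The polling loop above can run indefinitely if the task never reaches a final state. The same idea can be wrapped in a reusable helper with a timeout; this is a sketch only, and `wait_for` is a hypothetical name, not part of the EasyMiner client:

```python
import time

def wait_for(check_state, timeout=300, interval=1.0):
    """Poll check_state() until it returns a final task state or the timeout elapses.

    `check_state` is a callable returning the current task state string.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = check_state()
        if state in ("solved", "failed"):
            return state
        time.sleep(interval)
    raise TimeoutError("task did not reach a final state within %d seconds" % timeout)
```

With the state endpoint from the loop above, the call would look like `wait_for(lambda: requests.get(state_url, headers=headers).json()["state"])`.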

6. Export the results (in PMML AssociationModel, GUHA PMML or simple JSON)

# export rules in JSON format
headers = {"Accept": "application/json"}
r = requests.get(API_URL + '/tasks/' + task_id + '/rules?apiKey=' + API_KEY, headers=headers)
task_rules = r.json()

# export of standardized PMML AssociationModel
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=associationmodel&apiKey=' + API_KEY)
pmml = r.text

# export of GUHA PMML
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=guha&apiKey=' + API_KEY)
guha_pmml = r.text
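The exported strings can be persisted for later processing, e.g. archiving the PMML models or feeding them to another tool. A trivial helper for this (illustrative only, the name is our own):

```python
def save_export(text, path):
    """Persist an exported model (PMML document or JSON string) to disk in UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```

For example, `save_export(guha_pmml, "task_" + task_id + ".pmml")` stores the GUHA PMML export next to the script.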