EasyMiner: easy association rule mining, classification and anomaly detection

Linked Data RDF Mining

The EasyMiner system now uses the EasyMiner-Rdf module for mining rules from RDF graphs. This module can load RDF graphs in many formats (N-Triples, Turtle, JSON-LD, TriG, etc.) and uses them mainly for the rule discovery process. In datasets that follow the RDF specifications, we can easily identify data types, schemas and relations between objects, and we can extend the data with other public datasets thanks to the linked data paradigm. With these features, we can discover more knowledge from RDF data than from the data formats typically used for rule mining (such as transactions or tables). The EasyMiner-Rdf package offers several operations for RDF data processing and rule mining:

  • Loading of RDF datasets
    • Support for well-known RDF formats.
    • Progressive loading of RDF named graphs into one dataset.
    • Loading from input streams (local/remote files).
    • RDF Schema and ontology support - future work!
    • Loading datasets from Hadoop HDFS - future work!
  • Dataset transformations (all of the following operations are lazy)
    • Triples filtering.
    • Triples mapping/replacing.
    • Literal preprocessing, such as value merging and number discretization - future work!
    • Adding prefix definitions.
  • Dataset actions
    • Sliced printing of triplesets.
    • Histogram printing of triple parts.
    • Auto-identification of predicate range types (dates, numbers, URIs, etc.) - future work!
    • Export of the processed dataset into a file in any RDF format.
    • Cache dataset for later use.
  • Mining rules from a dataset. This operation involves several steps:
    • Load the whole dataset into memory and index it in hash maps.
    • Load all mining thresholds, constraints and patterns.
    • Use the AMIE+ algorithm for mining RDF rules.
    • Return all mined rules.
  • Rule processing (all of the following operations are lazy)
    • Rule filtering by patterns.
    • Rule clustering by the DBSCAN algorithm.
    • Computing additional interest measures for mined rules (confidence, lift, PCA confidence, PCA lift).
    • Rule filtering and sorting by their interest measures (support, confidence, lift, cluster, etc.).
  • Rule actions
    • Sliced printing of rules.
    • Export of rules into a file.
    • Cache rules for later use.
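The lazy dataset transformations listed above can be illustrated with plain Python generators. This is a minimal sketch, not the EasyMiner-Rdf API: triples are plain string tuples, and the helper names (`filter_triples`, `map_triples`, `predicate_histogram`, `shorten`) as well as the `dbo:` prefix mapping are hypothetical. Filtering and mapping only build generators; nothing is evaluated until an action, here a predicate histogram, consumes the pipeline.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]
DBO = "http://dbpedia.org/ontology/"  # hypothetical namespace used for the prefix example


def filter_triples(triples: Iterable[Triple], keep: Callable[[Triple], bool]) -> Iterator[Triple]:
    """Lazy filter: yields only triples accepted by `keep`."""
    return (t for t in triples if keep(t))


def map_triples(triples: Iterable[Triple], fn: Callable[[Triple], Triple]) -> Iterator[Triple]:
    """Lazy map: rewrites each triple with `fn`."""
    return (fn(t) for t in triples)


def shorten(t: Triple) -> Triple:
    """Replace the full DBO namespace with the `dbo:` prefix in every triple part."""
    return tuple("dbo:" + x[len(DBO):] if x.startswith(DBO) else x for x in t)


def predicate_histogram(triples: Iterable[Triple]) -> Counter:
    """Action: consumes the pipeline and counts triples per predicate."""
    return Counter(p for _, p, _ in triples)


triples = [
    ("ex:alice", DBO + "child", "ex:carl"),
    ("ex:bob", DBO + "child", "ex:carl"),
    ("ex:alice", DBO + "spouse", "ex:bob"),
    ("ex:carl", "rdfs:label", '"Carl"'),
]

# No triple is touched until predicate_histogram iterates over the pipeline.
pipeline = map_triples(filter_triples(triples, lambda t: t[1] != "rdfs:label"), shorten)
print(predicate_histogram(pipeline))  # Counter({'dbo:child': 2, 'dbo:spouse': 1})
```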

As noted above, the EasyMiner-Rdf module uses the AMIE+ algorithm for mining RDF rules. Like other well-known algorithms (Apriori, FP-Growth), this algorithm uses standard measures (such as support, confidence and head coverage) as minimum thresholds for pruning the search space. It requires the whole dataset to be loaded in memory as a set of hash maps; therefore, it has considerable memory requirements, especially for larger datasets. The algorithm mines rules in the form of Horn clauses, where a rule has exactly one atom on the right side and several atoms on the left side, connected by logical conjunction. This package also implements several extensions of the AMIE+ algorithm, such as the application of rule patterns and constraints during mining, and additional interest measures.

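For example, the following rule states that if ?x and ?y have a common child ?c, then ?x is married to ?y:

(?x <hasChild> ?c) ^ (?y <hasChild> ?c) => (?x <isMarriedTo> ?y)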
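To see what the pruning thresholds measure, the example rule can be evaluated by hand on a tiny knowledge base. The sketch below is illustrative only: the facts are invented toy data, the measures follow the AMIE+ definitions as we understand them (support, head coverage, standard confidence, and PCA confidence counted on the subject ?x), and excluding bindings with ?x = ?y from the body is an assumption of this sketch. Each predicate is indexed into a hash set of (subject, object) pairs, loosely mirroring the in-memory hash maps mentioned above.

```python
from itertools import product

# Toy knowledge base (invented data) as (subject, predicate, object) triples.
KB = {
    ("alice", "hasChild", "carl"),
    ("bob", "hasChild", "carl"),
    ("alice", "isMarriedTo", "bob"),
    ("bob", "isMarriedTo", "alice"),
    ("dave", "hasChild", "erin"),
    ("fiona", "hasChild", "erin"),
}


def pairs(kb, predicate):
    """Hash-set index of the (subject, object) pairs for one predicate."""
    return {(s, o) for s, p, o in kb if p == predicate}


def ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def rule_measures(kb):
    """Measures of (?x hasChild ?c) ^ (?y hasChild ?c) => (?x isMarriedTo ?y)."""
    has_child = pairs(kb, "hasChild")
    head = pairs(kb, "isMarriedTo")
    # Body bindings (x, y) sharing a child c; x != y is an assumption of this sketch.
    body = {
        (x, y)
        for (x, c1), (y, c2) in product(has_child, repeat=2)
        if c1 == c2 and x != y
    }
    support = len(body & head)
    # PCA: only count body pairs whose ?x has at least one known isMarriedTo fact.
    married_subjects = {x for x, _ in head}
    pca_body = {(x, y) for x, y in body if x in married_subjects}
    return {
        "support": support,
        "head_coverage": ratio(support, len(head)),
        "confidence": ratio(support, len(body)),
        "pca_confidence": ratio(support, len(pca_body)),
    }


print(rule_measures(KB))
# {'support': 2, 'head_coverage': 1.0, 'confidence': 0.5, 'pca_confidence': 1.0}
```

The gap between confidence (0.5) and PCA confidence (1.0) comes from dave and fiona: neither has any known spouse, so under the partial completeness assumption their pair counts as unknown rather than as a counterexample.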

Please note that the EasyMiner-Rdf module is still under development and its releases are intended for experimental purposes only.

Integration with the EasyMiner system

Currently, the EasyMiner-Rdf module is in an experimental version, but we can already use part of it in the main EasyMiner system via a RESTful service. The RDF package is now part of the EasyMiner-Miner module, into which only the core part of the RDF mining functionality has been integrated. We can only mine rules from RDF data with basic interest measure thresholds and the AMIE+ algorithm (without extensions). For now, other operations (such as dataset transformations, preprocessing and rule postprocessing) are not available. We plan to add other functions as soon as there is a real user need for them. The RESTful operation is described in detail in the Swagger documentation of the EasyMiner-Miner module. See the short description below:

RDF mining init HTTP request

  • Url: <easyminer-miner-address>/api/task
  • Method: POST
  • Content-Type: multipart/form-data
  • Accept: application/json
  • Parameters:
    • name: amie
    • timeout: integer, max running time for the task in minutes (default is 10 minutes)
    • format: application/n-triples | text/turtle
    • body: dataset body in N-Triples or Turtle format, UTF-8 encoded.
    • min-support: integer, range: , default: 100
    • min-head-coverage: double, range: , default: 0.05
    • min-confidence: double, range: , default: 0.5
    • max-rule-length: integer, range: , default: 3
    • instances: boolean, 1 = rules with constants, 0 = only rules with variables, default: 1
  • Response codes:
    • 202: task has been accepted
    • 400: invalid task parameters
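A client for the init request could look like the following minimal sketch, which uses only the Python standard library. Assumptions: the base URL is a placeholder, each parameter is sent as a separate form field under the names listed above, the dataset is sent as a file part named `body`, and the helper names (`build_task_request`, `encode_multipart`) are hypothetical. How the new task id is returned in the 202 response is not described here, so the sketch only builds the request; the Swagger documentation of the EasyMiner-Miner module is the authoritative contract.

```python
import urllib.request
import uuid

# Documented defaults of the init request (see the parameter list above).
DEFAULTS = {
    "timeout": 10,
    "min-support": 100,
    "min-head-coverage": 0.05,
    "min-confidence": 0.5,
    "max-rule-length": 3,
    "instances": 1,
}
FORMATS = {"application/n-triples", "text/turtle"}


def encode_multipart(fields, dataset, dataset_type, boundary):
    """Encode simple form fields plus the dataset file part as multipart/form-data."""
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # Assumption: the dataset goes into a file part named "body".
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="body"; filename="dataset"\r\n'
        f"Content-Type: {dataset_type}\r\n\r\n".encode("utf-8")
    )
    parts.append(dataset.encode("utf-8") + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts)


def build_task_request(base_url, dataset, fmt="text/turtle", **params):
    """Build (but do not send) the POST <base_url>/api/task request for AMIE."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    fields = {"name": "amie", "format": fmt, **DEFAULTS}
    for key, value in params.items():
        name = key.replace("_", "-")  # e.g. min_confidence -> min-confidence
        if name not in DEFAULTS:
            raise ValueError(f"unknown parameter: {name}")
        fields[name] = value
    boundary = uuid.uuid4().hex
    return urllib.request.Request(
        f"{base_url}/api/task",
        data=encode_multipart(fields, dataset, fmt, boundary),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Accept": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` sends it: a 202 status means the task was accepted, while a 400 response is raised as `urllib.error.HTTPError`.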

RDF mining status HTTP request

  • Url: <easyminer-miner-address>/api/task/<task-id>
  • Method: GET
  • Accept: *
  • Response codes:
    • 404: task does not exist
    • 202: task is still in progress; it returns text/plain with a status message.
    • 500: error during mining; it returns text/plain with an error message.
    • 200: task has been successfully completed; it returns a rules list in the JSON format.
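Polling the status endpoint until the rules are ready could be sketched as follows, again with the standard library only. The task id is supplied by the caller (how it is obtained from the init response is not covered here), and the function names, poll interval and poll limit are hypothetical choices. `urllib.request.urlopen` returns normally for 2xx codes (200, 202) but raises `urllib.error.HTTPError` for 404 and 500, so both paths are routed through the same status interpreter.

```python
import json
import time
import urllib.error
import urllib.request


def interpret_status(code, body):
    """Map a status response to (state, payload) following the documented codes."""
    text = body.decode("utf-8")
    if code == 200:
        return "done", json.loads(text)  # rules list in JSON
    if code == 202:
        return "running", text  # plain-text status message
    if code == 404:
        return "missing", None
    if code == 500:
        return "failed", text  # plain-text error message
    raise ValueError(f"unexpected status code {code}")


def wait_for_rules(base_url, task_id, poll_seconds=5.0, max_polls=120):
    """Poll GET <base_url>/api/task/<task_id> until rules arrive or the task fails."""
    request = urllib.request.Request(f"{base_url}/api/task/{task_id}", headers={"Accept": "*/*"})
    for _ in range(max_polls):
        try:
            with urllib.request.urlopen(request) as resp:
                state, payload = interpret_status(resp.status, resp.read())
        except urllib.error.HTTPError as err:
            state, payload = interpret_status(err.code, err.read())
        if state == "done":
            return payload
        if state != "running":
            raise RuntimeError(f"task {task_id}: {state}: {payload}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```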