CONTACT
DEPARTMENT OF KNOWLEDGE TECHNOLOGIES
Jožef Stefan Institute
Jamova cesta 39
1000 Ljubljana
Slovenia

e-mail: kt@ijs.si
phone: +386 1 4773 175
fax: +386 1 4773 315

HEAD
Sašo Džeroski

DEPUTY HEAD
Martin Žnidaršič

SECRETARIES
Mili Bauer
Tina Anžič

RESEARCH AREAS

Data Mining and Machine Learning

Machine learning (knowledge discovery) explores the algorithms of learning in general, while data mining uses a variety of mostly automatic processes for analysing large amounts of data. In these areas, we focus on inductive, relational and constraint-based methods (databases, inductive logic programs), meta-learning (combining classifiers), subgroup discovery and equation discovery. We have developed a series of systems for learning logic programs and various kinds of equations (polynomial algebraic, difference and partial differential equations), learning both the structure and the parameter values of equations.

Contact: Nada Lavrač, Sašo Džeroski

Projects:
Research Programme: KT - Knowledge Technologies (Tehnologije znanja)
Duration: 2022 - 2027
Contact: Sašo Džeroski
Areas: Data mining and machine learning, Decision support, Human language technologies

HE PARC - Partnership for the Assessment of Risks from Chemicals (Partnerstvo za oceno tveganj zaradi kemikalij)
Duration: 2022 - 2029
Contact: Sašo Džeroski, Panče Panov
Areas: Data mining and machine learning

Software:
CIPER - Constrained Inductive Polynomial Equation for Regression
Regression methods aim at inducing models of numeric data. While most state-of-the-art machine learning methods for regression focus on inducing piecewise regression models (regression and model trees), we investigate the predictive performance of regression models based on polynomial equations. We present Ciper, an efficient method for inducing polynomial equations, and empirically evaluate its predictive performance on standard regression tasks.
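The idea of inducing a polynomial equation, structure and coefficients together, can be illustrated with a minimal sketch. This is not Ciper's code or its search procedure: the function names, the single input variable, and the simple per-coefficient complexity penalty are all assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients c_0..c_degree of y = sum_k c_k * x**k (normal equations)."""
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    return solve(ata, aty)


def mean_squared_error(coeffs, xs, ys):
    """Average squared difference between the polynomial's predictions and ys."""
    predict = lambda x: sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_polynomial(xs, ys, max_degree=4, penalty=0.01):
    """Pick the degree that minimizes error plus a (hypothetical) complexity penalty."""
    best = None
    for degree in range(max_degree + 1):
        coeffs = fit_polynomial(xs, ys, degree)
        score = mean_squared_error(coeffs, xs, ys) + penalty * (degree + 1)
        if best is None or score < best[0]:
            best = (score, degree, coeffs)
    return best[1], best[2]


# Data generated from y = 1 + 2x + 3x^2; the sketch should recover degree 2.
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
degree, coeffs = select_polynomial(xs, ys)
print(degree, [round(c, 6) for c in coeffs])
```

The penalty term stands in for whatever structure-selection heuristic a real system uses; without it, higher-degree polynomials would never score worse on the training data.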

Lagrange/Lagramge
Lagrange and Lagramge are programs for inducing algebraic and ordinary differential equations from observational data. While Lagrange is a completely data-driven approach to inducing equations, Lagramge allows for knowledge-driven induction, where the user can tailor the space of candidate equation structures according to background knowledge from the domain of interest.

MLC4.5 and MLJ4.8
Learn to combine classifiers with meta decision trees.

LINUS
ILP learning of constrained logic programs.

RSD
Relational Subgroup Discovery through first-order feature construction. The source code of the system, in Yap Prolog, is available for download, with samples and a user manual.

SEGS
SEGS (Search for Enriched Gene Sets) is a web tool for descriptive analysis of microarray data. The analysis is performed by looking for descriptions of gene sets that are statistically significantly over- or under-expressed between different scenarios within the context of genome-scale experiments (DNA microarray).
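What "statistically significantly over-expressed" can mean for a gene set is sketched below with a one-sided hypergeometric (Fisher-style) enrichment test. This is a generic illustration, not the SEGS implementation; the function name and the parameter names are assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    population: number of genes measured in the experiment
    annotated:  genes in the population that belong to the gene set
    selected:   genes flagged as differentially expressed
    overlap:    flagged genes that also belong to the gene set
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


# Toy numbers: 10 genes measured, 4 in the gene set, 3 flagged, all 3 in the set.
print(enrichment_p_value(10, 4, 3, 3))
```

A small p-value suggests the gene set is over-represented among the flagged genes; in practice such values would also need correction for the many gene sets tested.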

CLUS
Clus is a decision tree and rule induction system that implements the predictive clustering framework. This framework unifies unsupervised clustering and predictive modeling and allows for a natural extension to more complex prediction settings such as multi-task learning and multi-label classification. While most decision tree learners induce classification or regression trees, Clus generalizes this approach by learning trees that are interpreted as cluster hierarchies. We call such trees predictive clustering trees or PCTs. Depending on the learning task at hand, different goal criteria are to be optimized while creating the clusters, and different heuristics will be suitable to achieve this.
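One heuristic commonly associated with predictive clustering trees is variance reduction summed over all target variables. The sketch below chooses a split threshold on a single numeric attribute in that way. It is an illustration under stated assumptions, not code from Clus: the function names and the restriction to one attribute are hypothetical.

```python
def total_variance(rows):
    """Sum over target dimensions of the variance of that target within rows."""
    if not rows:
        return 0.0
    n = len(rows)
    total = 0.0
    for d in range(len(rows[0])):
        mean = sum(r[d] for r in rows) / n
        total += sum((r[d] - mean) ** 2 for r in rows) / n
    return total


def best_split(feature, targets):
    """Return (variance_reduction, threshold) for the best binary split.

    feature: one numeric value per example
    targets: one tuple of numeric targets per example (multi-target setting)
    """
    n = len(feature)
    parent = total_variance(targets)
    best = (0.0, None)
    values = sorted(set(feature))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for f, t in zip(feature, targets) if f <= threshold]
        right = [t for f, t in zip(feature, targets) if f > threshold]
        weighted = (len(left) * total_variance(left)
                    + len(right) * total_variance(right)) / n
        gain = parent - weighted
        if gain > best[0]:
            best = (gain, threshold)
    return best


# Two clearly separated clusters, each with two targets to predict.
feature = [1, 2, 3, 10, 11, 12]
targets = [(0, 0), (0, 1), (0, 0), (5, 10), (5, 11), (5, 10)]
print(best_split(feature, targets))
```

Applying such a split recursively yields a tree whose nodes are clusters and whose leaves predict the mean target vector of their examples, which is the sense in which a tree can be read as a cluster hierarchy.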

Text and Web mining
Text mining, which aims at extracting useful information from document collections, is a well-developed field of computer science, driven by the growth of document collections available in corporate and governmental environments and especially on the Web. In many real-life scenarios, documents are also available in information networks. Examples of such networks include multimedia repositories (containing multimedia descriptions, subtitles, slide titles, etc.), social networks of professionals (containing CVs), citation networks (containing publications), and even software code (heterogeneously interlinked software artifacts containing code comments). The abundance of such document-enriched networks motivates the development of new methodologies that join the two worlds, text mining and mining heterogeneous information networks, and handle the two types of data in a common data mining framework. Handling vast document streams is a relatively new challenge emerging mainly from the self-publishing activities of Web users (e.g., blogging, tweeting, and participating in discussion forums). Furthermore, news streams (e.g., Dow Jones, BusinessWire, Bloomberg, Reuters) are growing in number and rate, which makes it impossible for users to systematically follow the topics of their interest. One of the challenges is thus to investigate techniques for online data mining, machine learning, and sentiment analysis, supporting decision making in near-real time over vast amounts of constantly evolving data.

Contact: Miha Grčar, Igor Mozetič

Software:
MLC4.5 and MLJ4.8
Learn to combine classifiers with meta decision trees.

RSD
Relational Subgroup Discovery through first-order feature construction. The source code of the system, in Yap Prolog, is available for download, with samples and a user manual.

SEGS
SEGS (Search for Enriched Gene Sets) is a web tool for descriptive analysis of microarray data. The analysis is performed by looking for descriptions of gene sets that are statistically significantly over- or under-expressed between different scenarios within the context of genome-scale experiments (DNA microarray).

Human Language Technologies
Most of the information humans deal with consists of text, and Human Language Technologies enable computers to help us exploit and manage this information. Texts, in whatever language, need to be processed in various ways, from ensuring uniform encoding to complex linguistic analyses such as assigning syntactic and semantic structure. Such methods find application in text mining, machine translation, search engines, exploratory instruments for linguists and lexicographers, digital publishing, etc. In this research area the department is developing general methods for text processing and mark-up, with a special focus on the Slovene language. We are especially concerned with the production of standardised and available language resources, such as annotated mono- and multilingual corpora, lexica, and complex digital editions, e.g., of Slovenian literature (ZRC eLibrary). While such resources can be directly used for language study, they are, for the most part, targeted towards the use of machine learning programs that automatically induce various language models from the resources.

Contact: Tomaž Erjavec

Projects:
Research Programme: KT - Knowledge Technologies (Tehnologije znanja)
Duration: 2022 - 2027
Contact: Sašo Džeroski
Areas: Data mining and machine learning, Decision support, Human language technologies

Software:
nl.ijs.si on-line services
Concordancing and lemmatization

LemmaGen
A system for learning Ripple Down Rules specialized for automatic generation of lemmatizers. So far, LemmaGen was used to produce lemmatizers for 12 different languages.
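A Ripple Down Rule structure for lemmatization can be pictured as a default rule refined by nested exceptions, each keyed on a word suffix. The sketch below applies such a hand-written structure; it does not show how LemmaGen learns rules, and the `Rule` class and the toy English rules are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    suffix: str        # condition: the word ends with this suffix
    replacement: str   # conclusion: replace that suffix with this string
    exceptions: list = field(default_factory=list)  # more specific rules, tried first

    def lemmatize(self, word):
        # The first matching exception overrides this rule's own conclusion.
        for exception in self.exceptions:
            if word.endswith(exception.suffix):
                return exception.lemmatize(word)
        return word[:len(word) - len(self.suffix)] + self.replacement


# Hypothetical toy rules: the default keeps the word unchanged.
RULES = Rule("", "", [
    Rule("s", "", [
        Rule("ies", "y"),   # ponies -> pony
        Rule("ss", "ss"),   # glass stays glass
    ]),
    Rule("ing", ""),        # walking -> walk
])

for word in ["cats", "ponies", "glass", "walking", "dog"]:
    print(word, "->", RULES.lemmatize(word))
```

Because each exception only refines the rule above it, new corrections can be added locally without disturbing words that are already handled correctly.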

Project JOS - Linguistic Annotation of Slovene (Jezikoslovno označevanje slovenskega jezika)
Morphosyntactic specifications, two annotated corpora, Web concordancer, and service for text markup. The first major and freely available (Creative Commons) set of resources for morphosyntactic annotation and lemmatisation of Slovene.

Decision Support
Decision Support (DS) aims to provide computational support to (groups of) people faced with difficult decisions. DS provides a rich collection of decision analysis, simulation, optimization and modeling techniques, including hierarchical multi-attribute models, decision trees, influence diagrams and belief networks. DS also involves software tools such as decision support systems, group decision support and mediation systems. We have developed a series of decision models and support systems, focusing on qualitative, multi-attribute decision making and models of uncertainty, necessary for capturing realistic aspects of complex decision problems. We continue to develop and expand our main software tool, DEXi.

Contact: Marko Bohanec, Martin Žnidaršič

Projects:
Research Programme: KT - Knowledge Technologies (Tehnologije znanja)
Duration: 2022 - 2027
Contact: Sašo Džeroski
Areas: Data mining and machine learning, Decision support, Human language technologies

Software:
DEXi (DEX for Instruction)
An educational computer program for qualitative decision modelling (developed within the Slovenian Ro (Computer Literacy) Programme, 1999-2000).

proDEX
proDEX is a tool for qualitative multi-attribute modelling in basic and extended DEX methodology.
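A qualitative hierarchical multi-attribute model can be sketched as a tree of attributes in which each aggregate attribute is computed from its children through a table of decision rules. The example below is a hypothetical toy model, not taken from DEXi or proDEX: the attribute names, value scales, and rule tables are all invented for illustration.

```python
# Rule table for the aggregate attribute "cost": (buy_price, maintenance) -> cost
COST_RULES = {
    ("low", "low"): "low",
    ("low", "high"): "medium",
    ("high", "low"): "medium",
    ("high", "high"): "high",
}

# Rule table for the root attribute "car": (cost, safety) -> car
CAR_RULES = {
    ("low", "high"): "good",
    ("medium", "high"): "acceptable",
    ("high", "high"): "unacceptable",
    ("low", "low"): "unacceptable",
    ("medium", "low"): "unacceptable",
    ("high", "low"): "unacceptable",
}

# The hierarchy: each aggregate attribute lists its children and its rule table.
MODEL = {
    "cost": (("buy_price", "maintenance"), COST_RULES),
    "car": (("cost", "safety"), CAR_RULES),
}


def evaluate(attribute, option):
    """Evaluate an attribute bottom-up for one option (a dict of basic values)."""
    if attribute in option:          # basic attribute: value given directly
        return option[attribute]
    children, rules = MODEL[attribute]
    key = tuple(evaluate(child, option) for child in children)
    return rules[key]


option = {"buy_price": "low", "maintenance": "high", "safety": "high"}
print(evaluate("car", option))  # acceptable
```

Keeping the rules as explicit tables means every evaluation can be traced back to the rows that produced it, which is the main appeal of qualitative models over numeric weighting.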

GMOtrack
GMOtrack is a program that supports traceability of genetically modified organisms. Given a table of GMOs (along with the probabilities of their presence and the genetic elements present in their genome) GMOtrack computes the optimal set of screening assays for a two-phase testing strategy.

ECOGEN Soil Quality Index
ESQI is a qualitative multi-attribute model, developed within the ECOGEN project, that calculates an index of soil quality relative to a selected standard soil condition ("medium" value of attributes). The model is implemented in a server-side script, and accessed through an interactive Web page.
