AC Data Analytics

Motivation

The extraction of knowledge from raw data is a key success factor for application projects. Data can come in the form of sensor data, user preferences, texts or industrial log data. The detection of meaningful patterns, the acquisition of knowledge from the data, and the linking with existing information create the basis for efficient algorithms and the automatic learning of optimal solutions. Challenges are the size of the data (“big data”) and its heterogeneity, which require a suitable combination of various machine learning algorithms and the optimization of their parameters.

Goals

The work within the Application Center focuses on the development of new and efficient methods to automate the process of machine learning. The AC Data Analytics implements new methods to extract knowledge from huge data collections using machine learning, hyper-parameter optimization, Natural Language Processing, and Semantic Knowledge Management. The methods are evaluated in practical application projects and tested on public benchmark data sets.

Technology

The AC Data Analytics covers the entire range of data analysis steps: both the theory of data analysis and the development of frameworks for practical applications are pursued. The developed solutions extend and improve existing approaches by adding innovative functions and algorithms. The methods are evaluated in practical application projects and on benchmark data collections. The High Performance Computing Testbed is used to test and optimize the developed solutions.

Topics

Natural Language Processing and Text-Mining

Companies and social media (such as forums) produce large amounts of text. These texts often contain complex knowledge about what is important for companies and for the communication with customers, and thus for the success of projects. The diversity of the texts makes their analysis and the extraction of knowledge very difficult and requires scalable, smart algorithms. Specific challenges in the analysis of customer texts are colloquial word forms, foreign-language formulations, grammatical mistakes as well as synonyms and homonyms.

In the Data Analytics application center, algorithms are developed that convert unstructured texts into knowledge graphs. A knowledge graph describes the entities of a domain and how they are related. Depending on the domain, however, the creation of a knowledge graph is very time-consuming. For this reason, the implemented procedures are highly optimized and support the knowledge extraction process with different views and meta-data. The developed solutions are integrated into practical applications and benchmarked on different data collections.
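As a rough illustration of the idea, the following Python sketch builds a simple co-occurrence knowledge graph from raw text. It assumes that spaCy (with the en_core_web_sm model) and networkx are installed; the linking rule used here (entities appearing in the same sentence) is a deliberately simplified stand-in for the optimized extraction procedures described above, and the example sentence is hypothetical.

    # Minimal sketch: build a co-occurrence knowledge graph from text.
    # Assumption: spaCy with the "en_core_web_sm" model and networkx are available.
    import itertools

    import networkx as nx
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def build_knowledge_graph(texts):
        """Link named entities that co-occur in the same sentence."""
        graph = nx.Graph()
        for doc in nlp.pipe(texts):
            for sentence in doc.sents:
                entities = [(ent.text, ent.label_) for ent in sentence.ents]
                for (a, a_label), (b, b_label) in itertools.combinations(entities, 2):
                    if a == b:
                        continue
                    graph.add_node(a, label=a_label)
                    graph.add_node(b, label=b_label)
                    # Edge weight counts how often the two entities co-occur.
                    weight = graph.get_edge_data(a, b, {}).get("weight", 0)
                    graph.add_edge(a, b, weight=weight + 1)
        return graph

    # Hypothetical input text for demonstration purposes.
    kg = build_knowledge_graph(["Alice Miller works for Acme Corp in Berlin."])
    print(kg.nodes(data=True))
    print(kg.edges(data=True))

In practice, such a co-occurrence heuristic is only a starting point; relation extraction, entity linking and the meta-data views mentioned above are needed to turn the raw graph into a usable knowledge graph.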

Recommender Systems

Recommender Systems combine a variety of methods to determine the preferences and interests of users and to generate suggestions based on them. They derive recommendations by analyzing data such as users’ past behavior, similar users’ interests, objects’ metadata, and contextual aspects, and link this information to objects (for instance, movies, books, or news articles) that could be of interest to the user.
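The following minimal sketch shows one of these methods, user-based collaborative filtering, with numpy. The rating matrix is hypothetical, and real recommender systems additionally exploit object metadata and context as described above.

    # Minimal sketch of user-based collaborative filtering.
    import numpy as np

    # Rows = users, columns = items; 0 means "not rated" (hypothetical data).
    ratings = np.array([
        [5, 3, 0, 1],
        [4, 0, 4, 1],
        [1, 1, 0, 5],
        [0, 0, 5, 4],
    ], dtype=float)

    def recommend(user, k=2):
        """Score unrated items using the ratings of the k most similar users."""
        norms = np.linalg.norm(ratings, axis=1, keepdims=True)
        similarity = (ratings @ ratings.T) / (norms @ norms.T)   # cosine similarity
        neighbours = [u for u in np.argsort(similarity[user])[::-1] if u != user][:k]
        scores = similarity[user, neighbours] @ ratings[neighbours]
        scores[ratings[user] > 0] = -np.inf   # do not re-recommend rated items
        return int(np.argmax(scores))

    print("Recommended item for user 0:", recommend(user=0))

The design choice here is the similarity measure: cosine similarity over raw ratings is the simplest option, while production systems typically normalize for user bias and combine collaborative signals with content-based features.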

Machine Learning

Machine learning is a subfield of artificial intelligence that recognizes patterns and regularities in large amounts of data with the help of intelligent algorithms. Our team mainly works on Time Series Mining, Deep Learning and Automated Machine Learning. In Automated Machine Learning, we examine and develop methods for automating algorithm selection and hyper-parameter optimization. The goal is to make the complex process of applying machine learning methods more accessible, also for non-experts. We develop Deep Learning architectures of artificial neural networks that are capable of learning representations, concepts and abstractions from complex data, and apply them to industrial application problems in the context of big data. For this, we consider multi-layer perceptrons, convolutional networks, autoencoders, Boltzmann machines, and recurrent networks as basic models. In Time Series Mining, we aim at developing a sound theoretical foundation for a predominantly application-driven field in order to derive improved classification and clustering methods.
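To make the idea of automated algorithm selection and hyper-parameter optimization concrete, a minimal sketch with scikit-learn could look as follows. The candidate models and search spaces are illustrative only and not our production tooling.

    # Minimal sketch: algorithm selection plus hyper-parameter optimization
    # via cross-validated grid search (illustrative search spaces).
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    candidates = [
        (RandomForestClassifier(random_state=0),
         {"n_estimators": [50, 200], "max_depth": [3, None]}),
        (SVC(),
         {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}),
    ]

    best_score, best_model = -1.0, None
    for estimator, param_grid in candidates:
        # Grid search optimizes the hyper-parameters of each candidate model.
        search = GridSearchCV(estimator, param_grid, cv=5)
        search.fit(X, y)
        if search.best_score_ > best_score:
            best_score, best_model = search.best_score_, search.best_estimator_

    print("Selected model:", best_model)
    print("Cross-validated accuracy:", best_score)

Automated Machine Learning replaces the exhaustive grid above with smarter search strategies (for example Bayesian optimization) and larger configuration spaces, but the principle of jointly searching over algorithms and their hyper-parameters is the same.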

Knowledge and Innovation Management

Digitalization enables a more efficient handling of knowledge. Existing knowledge can be digitized, and new knowledge and ideas can be created on the basis of digitized knowledge. However, the basic requirements for knowledge management remain unchanged: the need to share, innovate, reuse, collaborate and learn is timeless. Therefore, we conduct research on the development of methods and tools to support data and knowledge services. With the help of machine learning, we work on solutions for the intelligent collection of knowledge, the enrichment of knowledge with semantic information and meta-data, and the mapping of the relationships between the extracted knowledge. In a further step, the extracted knowledge can be validated and correlated.