Doctoral Colloquia

PhD Students watching a presentation at a DAI Doctoral Colloquium
Project TERAIS

Doctoral colloquia are a platform for PhD students at DAI to present their research to a wider departmental audience, exchange ideas, and foster friendships. They are organized on a weekly basis during the semester by assoc. prof. Damas Gruska. They are among the activities within the TERAIS project, aimed at elevating DAI to a workplace of international academic excellence.

When
weekly on Mondays at 13:10 (during the teaching part of the semester)
Where
I-9 and online in MS Teams

Recap of the first semester of Doctoral Colloquia (Summer 2023)

Upcoming presentations

Iveta Bečková: Adversarial Examples in Deep Learning

Deep neural networks achieve remarkable performance in multiple fields. However, even after proper training they suffer from an inherent vulnerability to adversarial examples (AEs). Adversarial attacks attempt to find the worst-case perturbation in input space, resulting in faulty output (such as misclassification). Different attack methods provide different approximations of this worst case, and each of them has certain advantages and disadvantages. The problem gets even more complicated in the deep RL setting, where time is also a factor. We will present our work on comparing different adversarial attacks, as well as plans for future research in adversarial attacks on deep RL agents.

Dana Škorvánková: Automatic 3D Human Pose Estimation, Skeleton Tracking and Body Measurements

The estimation of human body pose and its measurements is an emerging problem that draws attention in many research areas. An automatic and accurate approach to the problem is crucial in many fields of computer vision-oriented industry. We target multiple tasks related to human body analysis, including pose estimation, pose tracking, and anthropometric body measurement estimation. As in other research fields, deep learning methods have proven to outperform analytical strategies. We also examine various types of visual input data, including three-dimensional point clouds. Since obtaining a large-scale database of real annotated training data is time-consuming and inefficient, we propose to substitute or augment the training process with synthetically generated human body data. We will report preliminary results of our experiments within each of the stated tasks, along with the already published parts of our research.

Fandl Matej: Attractor models of associative memory - on learning in modern Hopfield networks

Neural networks exhibiting point attractor dynamics are useful for modeling associative memory. A well-known example is the Hopfield network, the modern variants of which have recently come into the spotlight due to their huge storage capacity and usability in deep learning architectures. Our work builds on the interpretation of these networks as networks with one hidden layer of feature detectors. We see space for improvement in terms of training modern Hopfield networks, since the currently known methods either sacrifice training time for the computational complexity of the model, or the other way around. Our talk will describe our attempts at designing a novel learning rule for these networks, the use of which is expected to lead to fast and efficient distribution of labor between the hidden units.

Andrej Baláž: Compressed self-indexes for pangenomic datasets

Recent advancements in sequencing technologies have brought a steep decrease in the acquisition price and a rapid growth in the size of new genomic datasets. This growth and the shifting paradigm of jointly analysing all the related sequences, also called pangenomics, demand new data structures and algorithms for efficient processing. We will present several data structures, also called self-indexes, which form the basic building blocks of fundamental bioinformatics algorithms, such as read alignment. Due to the immense sizes of the pangenomic datasets, these self-indexes have to be compressed while remaining time-efficient to be practical. Therefore, we will show two compression techniques, tunnelling and r-indexing, and highlight our contributions to the compressed self-indexes in the form of a space-efficient construction algorithm and a pattern-matching algorithm.

Kyselica Daniel: Processing of light curves of satellites and space debris for the purpose of their identification

With the increase in space traffic in recent years, precise monitoring of space debris is necessary. Observations in the form of light curves provide us with information about an object’s physical properties, including shape, size, surface materials and rotation. Publicly available databases contain a huge amount of gathered light curves that can be used to train machine learning models. Pieces of space debris can fall down to the Earth in a process called reentry; a 3D reconstruction of this event can therefore bring more understanding of the physical processes.

Jozef Kubík: Active Learning in Large Language Models

In recent years, the popularity of creating large language models has risen sharply. Most modern LLMs based on the Transformer architecture offer great accuracy in many different text-based tasks but are often limited in some areas. For many low- or mid-resource languages (such as Slovak), one of the biggest limitations is the amount of annotated data needed for fine-tuning such a big model. Our work aims to highlight this problem with a BERT line of models and suggest a promising method of reducing data for low-resource languages, based on recent developments in the area of Active learning thanks to the novel concept of Epistemic neural networks.

Mihálová Dominika: Optimal structures based on algebraic constructions

The use of computers to solve problems in the field of mathematics has become more important with the growing complexity of the considered problems. In our work, we focus on two different problems: the Cage problem from the area of Extremal Graph Theory and the Regular representation problem from the area of Algebraic Graph Theory. We present the Cage problem and computational techniques for approaching it for given specific parameters. We describe our computational approach to the Regular representation problem of k-uniform hypergraphs for groups of order smaller than 33, together with our published results.

Gajdošech Lukáš: Evolution of Fusion on High-Quality Depth Data

The availability of 3D sensors acquiring depth data caused the rise of interest in the problem of registering depth maps from a sequence with unknown transformation into a common space. Usually, surface reconstruction of the growing point cloud is performed on the fly. This procedure is known as Fusion, and it has been a popular topic since 2011 with the release of the KinectFusion paper. A similar task in robotics, when only 2D images are available, is called Structure from Motion. Nowadays, both RGB cameras and depth sensors are available in much higher resolution, with better exposure control and higher framerates. In this presentation, we will give a chronological overview of the papers regarding the Fusion problem and available datasets. Then, we will present data obtained using a novel structured-light-based sensor, resulting in high-resolution 1120x800 depth maps obtained in real time. This sensor is used primarily in industrial settings. Some sequences are designed to be hard, with rapid movement between the frames, circular motion, and symmetrical objects, where the transformation calculation using only geometry data is ill-defined. On this data, we will compare the results of traditional pipelines with detectors like AKAZE for texture features versus a hybrid, where a neural network is used for obtaining texture correspondences.

Dráček František: Anomaly detection from TLE data

The rapidly growing number of objects in the Low Earth Orbit presents a significant challenge for satellite operators to ensure that satellites do not collide with other objects, which could lead to the loss of the satellite, or worse, it could cause the shattering of the satellites.

A significant research effort is devoted to the assessment of close conjunction risks. Most such approaches rely on two-line element (TLE) data. A TLE contains the orbital elements of a space object at a specific time. Aside from estimating collision risks, TLEs are used to calculate re-entry, for drag make-up, and for space weather estimation.

Our aim is to identify and study anomalies in the TLE data. The anomalies are in some cases related to errors in the measurement, but also to errors caused by their osculating (averaged) character.

Anthony Peter: An Improved Classifier for Learning and Discriminating Malware Using Knowledge Base Embedding

Malware detection is a critical task in cybersecurity, and traditional signature-based approaches are often ineffective against new and evolving threats. Recent research has shown that machine learning models can improve the accuracy of malware classification. However, existing methods often suffer from poor generalization performance and a lack of explainability, making it difficult to understand how they arrived at their predictions. This can make it challenging for cybersecurity experts to assess the reliability of the model and identify false positives or false negatives. In this work, we aim at a novel approach that combines a graph-based representation of malware with a neural network classifier. Entities and relationships in a knowledge graph are projected into a low-dimensional space. The approach involves learning a vector representation for each entity and relationship in the knowledge base while preserving their semantic meaning, so as to accurately discriminate between malicious and benign software. Additionally, the resulting embeddings can be used to derive explanations for the predictions, giving cybersecurity experts insights into malware behavior and decision-making processes. Overall, the goal is an approach that will provide a valuable tool for malware detection and analysis in real-world settings, with accurate predictions and meaningful explanations.

Lukáš Radoský: Optimization and Reuse in Development of Large Software Systems

Growing requirements and the complexity of software systems involve a sophisticated and creative process of analysis and design with the intensive cooperation of many experts with various specializations, who are informed about the real state and problems at the last moment in the time of analysis and development process. Therefore, one of the main motivations for this research is to analyze and design diverse appropriate methods in the following areas of research:
1. collaborative and parallel modeling and development through the common and shared software models for increasing productivity and work efficiency,
2. visualization of parallel layers in multidimensional space with particular modules, use cases, versions, or alternative and parallel flow of scenarios to reduce vague and redundant elements, to achieve a lean and optimal architecture,
3. progressive and advanced methods of teaching programming in a graphical environment, in virtual and augmented reality,
4. refactoring and reusing knowledge in models and source code,
5. fusion of models, visualization of functionality, patterns and use case scenarios in software architectures,
6. multidimensional visualization of source code structure in virtual and augmented reality (VR and AR); topics and sources of knowledge; evolution and quality (identification of patterns and bad smells); authors and users; interconnections with the models to reduce cognitive load and complexity of large UML models using layers, decomposing the system for review and deeper understanding, which can lead to more effective implementations.

Plan for this semester

Presentations Plan
PhD Student          Date
Peter Anthony        26 Feb
František Dráček     4 Mar
Daniel Kyselica      11 Mar
Filip Kerák          18 Mar
Janka Boborová       25 Mar
Marek Šuppa          8 Apr
Radovan Gregor       15 Apr
Fatana Jafari        22 Apr
Elena Štefancová     29 Apr
Pavol Kollár         6 May
Ján Pastorek         13 May

Past presentations

Summer semester 2023/24

František Dráček: Anomaly detection from TLE data

The burgeoning number of objects in Low Earth Orbit (LEO) poses a critical challenge for satellite operators, demanding robust collision avoidance measures. Although extensive databases track these objects, identifying anomalies within this data is crucial. This research investigates the application of machine learning methods to automatically detect anomalies in satellite data, potentially enhancing space situational awareness and safeguarding future space operations.

Peter Anthony: Tailoring Logic Explained Network for a Robust Explainable Malware Detection

Peter Anthony presenting a DAI Doctoral Colloquium

The field of malware research faces persistent challenges in adopting machine learning solutions due to issues of low generalization and a lack of explainability. While deep learning, particularly artificial neural networks, has shown promise in addressing the generalization problem, their inherent black-box nature poses challenges in providing explicit explanations for predictions. On the other hand, interpretable machine learning models, such as linear regression and decision trees, prioritize transparency but often sacrifice performance. In this work, to address the imperative needs of robustness and explainability in cybersecurity, we propose the application of a recently proposed interpretable-by-design neural network - Logic Explained Network (LEN) - to the complex landscape of malware detection. We investigate the effectiveness of LEN in discriminating malware and providing meaningful explanations, and evaluate the quality of the explanations over increasing feature size based on fidelity and other standard metrics. Additionally, we introduce an improvement on the simplification approach for the global explanation. Our analyses were carried out using static malware features provided by the EMBER dataset. The experimental results show that LEN’s discriminating performance is competitive with black-box deep learning models. LEN's explanations demonstrated high fidelity, indicating genuine reflections of the model's inner workings. However, a notable trade-off between explanation fidelity and compactness is identified.

Winter semester 2023/24

To appear.


Summer semester 2022/23

To appear.