Poster Session
Poster Titles and Abstracts
Dominik Alfke
Semi-Supervised Classification on Non-Sparse Graphs Using Low-Rank Graph Convolutional Networks
Graph Convolutional Networks (GCNs) have proven to be successful tools for semi-supervised learning on graph-based datasets. For sparse graphs, linear and polynomial filter functions have yielded impressive results. For large non-sparse graphs, however, network training and evaluation become prohibitively expensive. This includes hypergraph applications. By introducing low-rank filters, we gain significant runtime acceleration and simultaneously improved accuracy.
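To illustrate the low-rank filter idea, here is a minimal sketch (the linear filter form \(g(\lambda)=\theta_0+\theta_1\lambda\), the toy path graph, and all names are our illustrative assumptions, not the poster's construction): keeping only \(r\) Laplacian eigenpairs reduces the per-layer cost from \(O(n^2)\) to \(O(nr)\) per feature.

```python
import numpy as np

def low_rank_gcn_layer(X, U, lam, theta):
    """One hypothetical low-rank spectral convolution layer.

    Instead of multiplying by a dense n x n graph filter, keep only r
    leading Laplacian eigenpairs (U: n x r, lam: length r), so the cost
    drops from O(n^2) to O(n r) per feature column of X.
    """
    # learned spectral filter, here simply g(lam) = theta_0 + theta_1 * lam
    g = theta[0] + theta[1] * lam                # shape (r,)
    # project features onto the eigenbasis, scale, project back
    return U @ (g[:, None] * (U.T @ X))

# toy example: unnormalized Laplacian of a path graph on 4 nodes
L = np.array([[1., -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 1.]])
lam_full, U_full = np.linalg.eigh(L)
r = 2
out = low_rank_gcn_layer(np.eye(4), U_full[:, :r], lam_full[:r],
                         theta=np.array([1.0, 0.5]))
```

With the full eigenbasis and the trivial filter \(g \equiv 1\) the layer reduces to the identity, which is a convenient sanity check.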

Christoph Brauer
Asymptotic analysis of unrolling approaches for bilevel optimization
In image and signal processing and beyond, quantities of interest are often reconstructed from noisy measurements by means of suitable convex optimization problems. While model-based approaches usually assume that the ingredients of the problem are known a priori, data-driven approaches are motivated by situations where the objective function or constraints of the problem are partially unknown and shall be learned from data. This gives rise to bilevel optimization problems in which the original convex problem appears as a lower-level problem in the constraints and a higher-level loss function is minimized in order to fit parameters to available training data. Applying gradient-based algorithms to bilevel optimization problems poses the difficulty of differentiating through the argmin of the lower-level problem. In this contribution, we study the approach of unrolling a fixed number of steps of the Chambolle-Pock algorithm applied to the lower-level problem in order to accomplish this kind of differentiation approximately. We investigate the asymptotic behavior of the resulting gradients and derive a new approach to learn the parameters of the lower-level problem.
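For concreteness, unrolling a fixed number of Chambolle-Pock steps can be sketched on one illustrative lower-level problem, \(\min_x \frac12\|x-b\|^2 + \lambda\|Kx\|_1\) (this specific instance, the step sizes, and all names are our assumptions; the poster treats a more general setting in which parameters such as \(\lambda\) are learned through these unrolled iterations):

```python
import numpy as np

def unrolled_chambolle_pock(K, b, lam, tau, sigma, T):
    """Unroll T Chambolle-Pock iterations for the toy lower-level problem
    min_x 0.5*||x - b||^2 + lam*||K x||_1 (an analysis-lasso instance).
    Requires tau * sigma * ||K||^2 < 1 for convergence."""
    n, m = K.shape[1], K.shape[0]
    x = np.zeros(n)
    x_bar = x.copy()
    y = np.zeros(m)
    for _ in range(T):
        # dual ascent step + prox of the conjugate of lam*||.||_1 (clipping)
        y = np.clip(y + sigma * (K @ x_bar), -lam, lam)
        # primal descent step + prox of 0.5*||. - b||^2
        x_new = (x - tau * (K.T @ y) + tau * b) / (1 + tau)
        # overrelaxation
        x_bar = 2 * x_new - x
        x = x_new
    return x
```

Because the unrolled map is a finite composition of smooth and piecewise-linear operations, its output can be differentiated with respect to the parameters, which is exactly what makes the approximate bilevel gradient computation possible. For \(K = I\) the minimizer is the soft-thresholding of \(b\), a useful correctness check.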

James Cheshire
On the Thresholding Bandit Problem
We are concerned with a specific pure exploration stochastic bandit problem. The learner samples arms sequentially up to a fixed time horizon and then aims to find the set of arms whose means are above a given threshold. We consider two cases, structured and unstructured. In the structured case one has the additional information that the means form a monotonically increasing sequence. On this poster we first present the problem, then propose an approach for the unstructured case. Finally, we consider the difficulties that arise when one wishes to extend it to the structured case.
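The problem setup can be made concrete with a naive uniform-allocation baseline (this is our illustrative sketch, not the approach proposed on the poster, which adapts its sampling to the observed rewards):

```python
import random

def threshold_bandit_uniform(arms, tau, horizon):
    """Naive baseline for the thresholding bandit: allocate samples
    round-robin up to the fixed horizon, then return the indices of the
    arms whose empirical mean is above the threshold tau.  Each arm is
    a zero-argument callable returning one noisy reward."""
    n = len(arms)
    counts = [0] * n
    sums = [0.0] * n
    for t in range(horizon):
        i = t % n                  # uniform round-robin allocation
        sums[i] += arms[i]()
        counts[i] += 1
    return {i for i in range(n) if sums[i] / counts[i] >= tau}
```

Adaptive strategies improve on this by concentrating samples on arms whose means are close to the threshold, where the classification is hardest.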

Pavel Dvurechensky
On the complexity of optimal transport problems
The optimal transport approach has become very popular in data science and machine learning. Applications include, for example, image retrieval, classification, and the training of GANs. Nevertheless, the efficiency in practice comes at the price of costly computations. In this work we consider theoretical aspects of the computational complexity of approximating the optimal transport distance and barycenter. Namely, the main question is how many arithmetic operations are needed to approximate these objects with a given accuracy. We provide algorithmic upper bounds for the complexity.
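A central algorithm behind such complexity bounds is the Sinkhorn iteration for the entropy-regularized problem; a minimal sketch (the matrix sizes and parameters are purely illustrative):

```python
import numpy as np

def sinkhorn(C, a, b, eps, iters):
    """Sinkhorn iterations for entropy-regularized optimal transport:
    each iteration costs O(n^2) arithmetic operations, and complexity
    analyses bound how many iterations are needed to reach a given
    accuracy as a function of eps and the target precision."""
    K = np.exp(-C / eps)                 # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
    P = u[:, None] * K * v[None, :]      # approximate transport plan
    return P
```

The regularized cost \(\langle C, P\rangle\) then approximates the optimal transport distance up to an entropic bias controlled by `eps`.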

Martin Genzel
A New Perspective on L1-Analysis Recovery
This poster deals with the problem of robust signal estimation from undersampled noisy sub-Gaussian measurements under the assumption of a cosparse model. Based on generalized sparsity parameters, a nonuniform recovery guarantee for the L1-analysis basis pursuit is established, enabling accurate predictions of its sample complexity. The corresponding bounds on the number of required samples explicitly depend on the Gram matrix of the analysis operator and therefore particularly take account of its mutual coherence structure. These findings defy conventional wisdom in previous compressed sensing research, which suggests that the sparsity of the analysis coefficients is the crucial performance indicator to be studied. In fact, this common conception is not valid in many situations of practical interest, for instance, when using a redundant (multilevel) frame as sparsifying transform. By contrast, it is demonstrated that the proposed theoretical sampling-rate bounds can reliably predict the reconstruction capability of various types of analysis operators, such as redundant Haar wavelet systems, total variation, or random frames. Apart from that, the presented results naturally extend to stable recovery, which allows for handling signal vectors whose coefficient sequence is only compressible but not exactly sparse.
This is joint work with Gitta Kutyniok (TU Berlin and University of Tromsø) and Maximilian März (TU Berlin).

Alexandros Grosdos
Algebraic statistics of local Dirac mixture moments
Statistical moments have recently gained attention from an algebraic point of view. When they are polynomials in the model parameters, one can define the moment ideal, which is interesting both for statistical inference and for its algebraic properties. This poster focuses on the local Dirac case. We provide generators for the ideal that encodes all relations among moments of first-order Diracs. Symbolic algebra is used to efficiently solve the problem of parameter identifiability for a mixture with few terms. For larger mixtures we turn to Prony and numerical algebra methods. Our results are showcased with examples from signal processing and statistics.
This poster is based on joint work with Markus Wageringel.

Pavel Gurevich and Hannes Stuke
Dynamical systems approach to outlier-robust machine learning
We consider a typical problem of machine learning: the reconstruction of probability distributions of observed spatially distributed data. We introduce the so-called gradient conjugate prior update and study the induced dynamical system. We will explain the dynamics of the parameters and show how one can use insights from the dynamical behavior to recover the ground truth distribution in a way that is robust against outliers. The developed approach is directly applicable to artificial neural networks.
Gurevich P., Stuke H. Gradient conjugate priors and multilayer neural networks. J. Artificial Intelligence. Accepted. Preprint: arXiv:1802.02643 [math.ST].
Gurevich P., Stuke H. Robustness Against Outliers For Deep Neural Networks By Gradient Conjugate Priors. Preprint: arXiv:1905.08464 [stat.ML].

Martin Holler
A convex variational model for learning convolutional image atoms from incomplete data
We present and analyze a variational model for learning image descriptors from corrupted and/or incomplete data, both in function space and numerically. Building on lifting and relaxation strategies, the proposed approach is convex and allows for simultaneous image reconstruction and descriptor learning in a general, inverse-problems context. Further, motivated by improved numerical performance, we also discuss a semi-convex variant of the proposed approach. For both settings, fundamental analytical properties, allowing in particular to ensure well-posedness and stability results for inverse problems, are derived in a continuous setting. Exploiting convexity, we further show numerical results where globally optimal minimizers of the proposed energy are computed for imaging applications with incomplete, noisy, and blurry data. This is joint work with A. Chambolle and T. Pock.

Melanie Kircheis
Direct inversion of the NFFT
The NFFT, shorthand for nonequispaced fast Fourier transform, is a fast algorithm to evaluate a trigonometric polynomial \[ f(x) = \sum_{k=-\frac M2}^{\frac M2 - 1} \hat f_k \, \mathrm e^{-2\pi \mathrm i kx} \] at nonequispaced points \(x_j \in \left[-\frac 12,\frac 12\right), \,j=1,\dots,N\). In case we are given equispaced points and \(M=N\), this evaluation can be realized by means of the FFT, a fast algorithm that is invertible. Hence, we are interested in an inversion also for nonequispaced data, i.e., the Fourier coefficients \(\hat f_k\) shall be computed for given function values \(f(x_j)\). For this purpose, we use the matrix representation of the NFFT to derive a direct algorithm. Thereby, we are able to compute an inverse NFFT via a modified adjoint NFFT. Finally, we show that this approach can also be explained by means of frame approximation.
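The forward transform being inverted can be illustrated by its naive \(O(MN)\) evaluation (using the common NFFT sign convention with frequencies \(k = -M/2, \dots, M/2-1\); the NFFT computes the same sums approximately in \(O(M\log M + N)\)):

```python
import numpy as np

def ndft(fhat, x):
    """Direct O(M*N) evaluation of the trigonometric polynomial
    f(x_j) = sum_{k=-M/2}^{M/2-1} fhat_k exp(-2*pi*i*k*x_j)
    at nonequispaced points x_j in [-1/2, 1/2), for even M.
    This is the exact sum that the NFFT approximates quickly."""
    M = len(fhat)
    k = np.arange(-M // 2, M // 2)                     # frequency index
    # nonequispaced Fourier matrix applied to the coefficient vector
    return np.exp(-2j * np.pi * np.outer(x, k)) @ fhat
```

At equispaced nodes \(x_j = j/M\) this matrix becomes an ordinary DFT matrix and is exactly invertible by the FFT; the direct inversion on the poster recovers \(\hat f_k\) from the samples \(f(x_j)\) in the nonequispaced case.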

Kirandeep Kour
Optimal Feature Extraction using Tensor Train Decomposition for Kernel Methods
There are several applications in science and engineering where enormous amounts of multi-relational data have been produced. They usually have a complex intrinsic structure. Generally, these data depend on various parameters; hence, they can be interpreted as multidimensional data. Although computational power has increased drastically over the last decades, a direct treatment of such multidimensional data is still almost impossible due to the curse of dimensionality: the required memory storage and the associated computational cost increase exponentially with the dimensionality. Tensors (multi-way arrays) can be considered an essential tool to mitigate this issue. They often provide a natural and compact representation for such massive multidimensional data. There has been significant advancement in the use of tensor decompositions for feature extraction. The decompositions allow us to select necessary features from a large-dimensional feature space.
We make use of the tensor train decomposition to build an algorithm for multidimensional data which are, in general, not separable by a linear boundary. Therefore, we exploit kernelized Support Vector Machines as a base for our approach. To preserve the input data structure, we directly work with tensors as input. For classification in the multidimensional case, we propose a method, the so-called Support Tensor Train Machine, utilizing the tensor train format and thus not restricting ourselves to rank-one tensors. We show the robustness and efficiency of the proposed method by means of numerical experiments.
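As background, the tensor train format itself is commonly computed by the standard TT-SVD: sequentially unfold the tensor, truncate an SVD, and fold the left factor into a core. The sketch below is this generic procedure (not the Support Tensor Train Machine), with illustrative names:

```python
import numpy as np

def tt_svd(X, max_rank):
    """Generic TT-SVD sketch: decompose a d-way array X into tensor
    train cores of shape (r_{i-1}, n_i, r_i) with ranks capped at
    max_rank, via sequential truncated SVDs of unfoldings."""
    dims = X.shape
    d = len(dims)
    cores, r = [], 1
    M = X.reshape(r * dims[0], -1)
    for i in range(d - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        rank = min(max_rank, len(s))
        cores.append(U[:, :rank].reshape(r, dims[i], rank))
        # carry the remaining factor to the next unfolding
        M = (s[:rank, None] * Vt[:rank]).reshape(rank * dims[i + 1], -1)
        r = rank
    cores.append(M.reshape(r, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into the full tensor (for verification)."""
    full = cores[0].reshape(cores[0].shape[1], -1)
    for c in cores[1:]:
        r, n, r2 = c.shape
        full = (full @ c.reshape(r, n * r2)).reshape(-1, r2)
    return full.reshape([c.shape[1] for c in cores])
```

With unbounded ranks the decomposition is exact, while capping `max_rank` yields the compressed representations that make kernel evaluations on large multidimensional inputs tractable.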

Joseph Lam-Weil
Local minimax rates for closeness testing of discrete distributions
We consider the closeness testing problem in the Poisson vector model, which is known to be asymptotically equivalent to the model of multinomial distributions. The goal is to distinguish whether two data samples are drawn from the same unspecified distribution, or whether their respective distributions are separated in L1-norm. In this work, we focus on adapting the rate to the shape of the underlying distributions, i.e. we consider a local minimax setting. We provide, to the best of our knowledge, the first local minimax rate for the separation distance up to logarithmic factors, together with a test that achieves it. In view of the rate, closeness testing turns out to be substantially harder than the related one-sample testing problem over a wide range of cases.

Ivan Lecei
Anomaly Detection in time series under stochastic external excitations
The use of machine-learning techniques for the condition monitoring of mechanical systems (such as wind turbines), aiming at the identification and anticipation of anomalous or erroneous behavior, is a challenging task. On the one hand, effects like aging, erosion, or imbalances are influenced by stochastic, uncontrolled, time-dependent excitations due to the wind. They modify the observed signal in such a way that a comparison between different signals with standard distance metrics is not possible. On the other hand, for the successful application of machine-learning methods one typically needs a large number of data sets in order to learn a specific model. When trying to learn indicators for faults in mechanical systems, i.e. anomalies, a large sample may not be available, especially since inducing these anomalies artificially may be very costly, as is for example the case in the wind energy sector.
We address these challenges by firstly using simulations to create synthetic data which represent the real behavior of the wind turbine as closely as possible, resulting in a multidimensional time series including a number of sensors of interest, such as the flapwise and edgewise accelerations of the blades. Secondly, based on the simulated sensor data, the underlying physics of the dynamic system can be reconstructed, from which the so-called invariants of the system can be identified. They are assumed to undergo transformations that do not change their topology under the effect of time-dependent stochastic excitations. Transformations that do modify the topology are associated with anomalies or certain events. An adequate metric for the comparison between different realizations makes use of topological features.
Thirdly, we use multivariate statistical methods, such as copulas, where we interpret a change of the dependence between different sensors as a change in the observed system itself. Copulas are invariant under special classes of transformations, making them well suited to address the above-mentioned data analysis challenges for mechanical systems.

Jan Macdonald and Stephan Waeldchen
A Rate-Distortion Framework for Explaining Neural Network Decisions
We propose a rate-distortion framework for explaining neural network decisions. We formulate the task of determining the most relevant signal components for a classifier prediction as an optimisation problem. For the case of binary signals and Boolean classifier functions we show that it is hard to solve and even to approximate. Finally, we present a heuristic solution strategy for deep ReLU neural network classifiers. We provide numerical experiments and compare our method to other established methods.

Johannes Maly
Robust Recovery of Low-Rank Matrices with Non-Orthogonal Sparse Decomposition from Incomplete Measurements
We consider the problem of recovering an unknown, effectively \((s_1,s_2)\)-sparse, rank-\(R\) matrix \(X\) with a possibly non-orthogonal rank-\(1\) decomposition from incomplete and inaccurate linear measurements of the form \(y = \mathcal A (X) + \eta\), where \(\eta\) is noise. We derive an optimization formulation for matrix recovery under the considered model and propose a novel algorithm to solve it. The algorithm is a fast first-order method and straightforward to implement. We prove global convergence to stationary points for any linear measurement model, and local convergence to global minimizers. By adapting the concept of the restricted isometry property from compressed sensing to our novel model class, we prove error bounds between global minimizers and the ground truth, up to noise level, from a number of sub-Gaussian measurements scaling as \(R(s_1+s_2)\), up to log-factors in the dimension and relative-to-diameter distortion. Simulation results demonstrate both the accuracy and efficacy of the algorithm.
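The model class can be illustrated with a generic projected-gradient sketch for the simplest case \(R = 1\) (this is our toy illustration of sparse rank-one recovery, not the algorithm proposed on the poster; all names and the identity measurement operator in the usage below are assumptions):

```python
import numpy as np

def sparse_rank1_project(Z, s1, s2):
    """Project onto a toy model class: best rank-1 approximation whose
    factors are hard-thresholded to s1 and s2 nonzero entries (a crude
    stand-in for the effectively sparse factors in the abstract)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    u, v = U[:, 0].copy(), Vt[0].copy()
    u[np.argsort(np.abs(u))[:-s1]] = 0.0   # keep the s1 largest entries
    v[np.argsort(np.abs(v))[:-s2]] = 0.0   # keep the s2 largest entries
    return s[0] * np.outer(u, v)

def recover(A, y, shape, s1, s2, iters=100, step=1.0):
    """Projected-gradient sketch for y = A vec(X) + noise under the
    sparse rank-1 model: gradient step on the least-squares residual,
    then projection onto the model class."""
    X = np.zeros(shape)
    for _ in range(iters):
        grad = (A.T @ (A @ X.ravel() - y)).reshape(shape)
        X = sparse_rank1_project(X - step * grad, s1, s2)
    return X
```

Even this crude scheme recovers a matrix that lies exactly in the model class from noiseless identity measurements, which is the minimal sanity check one would run before studying genuinely incomplete measurement operators.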

Robert Malte Polzin
Coupling a multiscale stochastic precipitation model to large-scale atmospheric flow dynamics
The objective is the efficient modelling of large scales in atmospheric science. We develop a small-scale stochastic model for convective activity and describe convective feedback on the large-scale atmospheric flow using data-based learning and reduced models for efficient scale coupling. The aim is a hybrid model with a stochastic component for a conceptual description of convection embedded in a deterministic atmospheric flow model. From the meteorological perspective, we intend to use variants of the Dynamic State Index (DSI) on multiple scales and for multiple processes as new control variables for convective structures. To analyse atmospheric processes on different scales, we need to consider the process as an embedded system, i.e. as the restriction of an unknown larger dynamical system. Therefore, we extend the theory and algorithms for coherent sets to embedded domains and incomplete trajectory data, and work towards a unified transfer-operator approach for coherent sets and patterns. The state of the art in transport-oriented methods and data-based analytics will be illustrated. In view of upward coupling, future work on a model for cloud characteristics, building on the obtained theory for coherent set analysis, will be described. Looking ahead, we aim to combine machine learning with coherent sets in dynamical systems.

Nicole Mücke
Bridging the gap between Optimization and Learning Theory through Gradient Concentration
In Optimization, it is well known that the rate of convergence for stochastic gradient descent is governed by structural assumptions on the function to be minimized. For example, smoothness allows a rate of \(1/\sqrt{t}\), while additional strong convexity leads to an improvement to \(1/t\). We extend the classical analysis from Optimization to Learning Theory. In particular, we show that fast rates of convergence for SGD can be achieved assuming smoothness and (strong) convexity. The main tool is a concentration argument for gradients of the risk, allowing us to derive an upper bound in terms of the localized empirical Rademacher complexity of the associated function class, improving over known results and leading to the same good error bounds as in Optimization. A special feature of this approach is an adaptive step-size choice. Moreover, we analyse the interplay between batch size, multiple passes over the data, and step-size choice. Our theoretical findings are accompanied by numerical studies.
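The role of the step-size schedule can be illustrated on a one-dimensional toy risk (this example and its names are ours, chosen only to show the two schedules from the abstract):

```python
import random

def sgd(grad_sample, w0, steps, stepsize, rng=random.Random(0)):
    """Plain SGD with a user-supplied step-size schedule.  Smoothness
    alone suggests stepsize(t) ~ 1/sqrt(t); with additional strong
    convexity, stepsize(t) ~ 1/t attains the faster 1/t rate."""
    w = w0
    for t in range(1, steps + 1):
        w -= stepsize(t) * grad_sample(w, rng)
    return w

# toy strongly convex risk E[(w - z)^2 / 2] with z ~ N(1, 0.1):
# the stochastic gradient is w - z and the minimizer is w* = 1
grad = lambda w, rng: w - rng.gauss(1.0, 0.1)
w_final = sgd(grad, w0=0.0, steps=5000, stepsize=lambda t: 1.0 / t)
```

For this particular quadratic, the \(1/t\) schedule makes each iterate exactly the running average of the observed samples, so the iterate concentrates around the minimizer at the fast rate.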

Robert Nasdala
Transformed rank-1 lattices for high-dimensional approximation
This poster describes an extension of Fourier approximation methods for multivariate functions defined on the torus \(\mathbb{T}^{d}\) to functions in a weighted Hilbert space \(L_{2}(\mathbb{R}^{d}, \omega)\) via a multivariate change of variables \(\psi:\left(-\frac{1}{2},\frac{1}{2}\right)^{d}\to\mathbb{R}^{d}\). We establish sufficient conditions on \(\psi\) and \(\omega\) such that the composition of a function in such a weighted Hilbert space with \(\psi\) yields a function in the Sobolev space \(H_{\mathrm{mix}}^{m}(\mathbb{T}^{d})\) of functions on the torus with mixed smoothness of natural order \(m \in \mathbb{N}_{0}\). In this approach we adapt algorithms for the evaluation and reconstruction of multivariate trigonometric polynomials on the torus \(\mathbb{T}^{d}\) based on single and multiple reconstructing rank-\(1\) lattices. Since in applications it may be difficult to estimate the related function space, we make use of dimension-incremental construction methods for sparse frequency sets. Various numerical tests confirm the obtained theoretical results for the transformed methods.

Daniel Potts and Michael Schmischke
Learning high-dimensional additive models on the torus
We consider high-dimensional \(1\)-periodic functions \(f \colon \mathbb{T}^d \rightarrow \mathbb{R}\), where \(d \in \mathbb{N}\) is the spatial dimension and \(\mathbb{T} \cong \mathbb{R}/\mathbb{Z}\) is the torus. Any such function which is square-integrable, i.e., \(f \in \mathrm{L}_2(\mathbb{T}^d)\), can be uniquely represented using the multivariate classical ANOVA (analysis of variance) decomposition \(f = \sum_{\boldsymbol{u}\subseteq\mathcal{D}} f_{\boldsymbol{u}}\). Here, \(\mathcal{D} = \{ 1,2,\dots,d \}\) is the set of coordinate indices and \(f_{\boldsymbol{u}} \colon \mathbb{T}^{\boldsymbol{u}} \rightarrow \mathbb{R}\) for \(\boldsymbol{u} \subseteq \mathcal{D}\) is an ANOVA term.
We give an overview of an approximation method for a function \(f\) in two scenarios:


Mones Raslan
Efficient Approximation of Solutions of Parametric PDEs by ReLU Neural Networks
We analyze the suitability of ReLU neural networks for the approximation of solutions of parametric partial differential equations. Without any concrete knowledge about its shape, we exploit the intrinsic low-dimensionality of the solution manifold established by the theory of reduced bases. To be more precise, for a large variety of parametric PDEs it is possible to construct neural networks that yield approximations of the parameter-dependent solution maps, which avoid the curse of dimensionality and essentially only depend on the size of the reduced basis. The obtained rates are significantly better than those provided by classical approximation results.

Michael Rauchensteiner
Robust and Resource-Efficient Identification of Two-Hidden-Layer Neural Networks
We address the structure identification and the uniform approximation of two fully nonlinear-layer neural networks from a small number of query samples. We approach the problem by actively sampling finite-difference approximations to Hessians of the network. Gathering several approximate Hessians allows us to reliably approximate the matrix subspace spanned by the symmetric tensors formed by the weights of the first layer, together with the entangled symmetric tensors formed by suitable combinations of the weights of the first and second layers. The identification of the rank-1 symmetric tensors is then performed by solving a robust nonlinear program. We provide guarantees of stable recovery under a posteriori verifiable conditions. We further address the correct attribution of approximate weights to the first or second layer. Using a suitably adapted gradient descent iteration, it is then possible to estimate, up to intrinsic symmetries, the shifts of the activation functions of the first layer. Our method of identifying the weights of the network is fully constructive, with quantifiable sample complexity, and therefore helps to reduce the black-box nature of the network training phase. We corroborate our theoretical results by extensive numerical experiments.

Elisabeth Ullmann
On multilevel best linear unbiased estimators
We discuss a novel multilevel best linear unbiased estimator (BLUE) introduced in [1]. This is a general variance reduction technique for the estimation of the expectation of a scalar-valued quantity of interest associated with a family of multifidelity models. The key idea is to reformulate the estimation as a linear regression problem. We then show that the associated estimators are variance-minimal within the class of linear unbiased estimators. By solving a sample allocation problem we further construct a variance-minimal, linear, and unbiased estimator for a given computational budget. We compare our proposed estimator to other multilevel estimators such as multilevel Monte Carlo, multifidelity Monte Carlo, and approximate control variates. In addition, we provide a sharp lower bound for the variance of any linear unbiased multilevel estimator and show that our estimator approaches this bound in the infinite low-fidelity data limit.
[1] Daniel Schaden, Elisabeth Ullmann: On multilevel best linear unbiased estimators, Preprint No. IGDK201911, May 2019.
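The linear-combination idea behind such estimators can be illustrated with a simple two-fidelity control-variate sketch (a simplification of ours, not the estimator of [1]; the BLUE chooses such combination weights variance-optimally via the linear regression reformulation):

```python
import random

def cv_estimate(sample_pair, mu_lo, n, rng=random.Random(0)):
    """Two-fidelity control-variate sketch: draw n paired (high, low)
    fidelity samples and correct the high-fidelity mean with the known
    low-fidelity mean mu_lo.  The result is linear in the samples and
    unbiased for any fixed control-variate coefficient."""
    pairs = [sample_pair(rng) for _ in range(n)]
    mean_hi = sum(h for h, _ in pairs) / n
    mean_lo = sum(l for _, l in pairs) / n
    # unit coefficient for the sketch; the variance-optimal weight
    # would be Cov(hi, lo) / Var(lo)
    return mean_hi - (mean_lo - mu_lo)
```

When the two fidelities are strongly correlated, the correction term cancels most of the sampling noise, which is exactly the variance reduction that the multilevel BLUE optimizes over all linear unbiased combinations.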

Toni Volkmer
NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networks
The graph Laplacian is a standard tool in data science, machine learning, and image processing. The corresponding matrix inherits the complex structure of the underlying network and is densely populated in certain applications. A typical task is the computation of a number of its eigenvalues and eigenvectors. Standard methods become infeasible when the number of nodes in the graph is too large. We propose to use Krylov subspace methods in combination with a fast summation approach based on the nonequispaced fast Fourier transform (NFFT) to perform dense matrix-vector products with the graph Laplacian in a fast way. The enormous flexibility of the NFFT algorithm allows us to embed the accelerated matrix-vector multiplication into Lanczos-based eigenvalue routines and iterative linear system solvers. We illustrate the feasibility and advantages of our approach on several numerical examples. In particular, we compare our approach with the Nyström method.
This is joint work with Dominik Alfke, Daniel Potts, and Martin Stoll.
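The key structural point, that Krylov methods only ever touch the matrix through matrix-vector products, can be sketched with a basic Lanczos iteration taking a product callback (our generic sketch; in the poster's setting `matvec` would be the NFFT-based fast summation for the dense graph Laplacian):

```python
import numpy as np

def lanczos(matvec, n, k, rng=np.random.default_rng(0)):
    """Basic Lanczos iteration using only a matrix-vector product
    callback for a symmetric n x n operator.  Returns the tridiagonal
    matrix T whose eigenvalues (Ritz values) approximate extremal
    eigenvalues of the operator after k steps."""
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    q_prev = np.zeros(n)
    beta = 0.0
    alphas, betas = [], []
    for j in range(k):
        w = matvec(q) - beta * q_prev          # three-term recurrence
        alpha = q @ w
        w = w - alpha * q
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if j == k - 1 or beta < 1e-12:         # done or exact breakdown
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    return np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
```

Replacing the dense \(O(n^2)\) product by the NFFT-based fast summation makes each Lanczos step cheap, which is what renders eigenvalue computations for fully connected graphs feasible.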