Mathematics For Computer Science Pdf Github

Aggregation of Photovoltaic Panels Alessandro Abate Automated Verification C MSc

The increased relevance of renewable energy sources has modified the behaviour of the electrical grid. Some renewable energy sources affect the network in a distributed manner: whilst each unit has little influence, a large population can have a significant impact on the global network, particularly in the case of synchronised behaviour. This work investigates the behaviour of a large, heterogeneous population of photovoltaic panels connected to the grid. We employ Markov models to represent the aggregated behaviour of the population, while the rest of the network (and its associated consumption) is modelled as a single equivalent generator, accounting for both inertia and frequency regulation. Analysis and simulations of the model show that it is a realistic abstraction, and quantitatively indicate that heterogeneity is necessary to enable the overall network to function in safe conditions and to avoid load shedding. This project will provide extensions of this recent research. In collaboration with an industrial partner.

Prerequisites: Computer-Aided Formal Verification, Probabilistic Model Checking

Analysis and verification of stochastic hybrid systems Alessandro Abate Automated Verification C MSc

Stochastic Hybrid Systems (SHS) are dynamical models that are employed to characterize the probabilistic evolution of systems with interleaved and interacting continuous and discrete components.

Formal analysis, verification, and optimal control of SHS models represent relevant goals because of their theoretical generality and for their applicability to a wealth of studies in the Sciences and in Engineering.

In a number of practical instances the presence of a discrete number of continuously operating modes (e.g., in fault-tolerant industrial systems), the effect of uncertainty (e.g., in safety-critical air-traffic systems), or both occurrences (e.g., in models of biological entities) advocate the use of a mathematical framework, such as that of SHS, which is structurally predisposed to model such heterogeneous systems.

In this project, we plan to investigate and develop new analysis and verification techniques (e.g., based on abstractions) that are directly applicable to general SHS models, while being computationally scalable.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games

Prerequisites: Familiarity with stochastic processes and formal verification

Automated verification of complex systems in the energy sector Alessandro Abate Automated Verification C MSc

Smart microgrids are small-scale versions of centralized electricity systems, which locally generate, distribute, and regulate the flow of electricity to consumers. Among other advantages, microgrids have shown positive effects over the reliability of distribution networks.

These systems present heterogeneity and complexity coming from 1. local and volatile renewables generation, and 2. the presence of nonlinear dynamics both over continuous and discrete variables. These factors calls for the development of proper quantitative models. This framework provides the opportunity of employing formal methods to verify properties of the microgrid. The goal of the project is in particular to focus on the energy production via renewables, such as photovoltaic panels.

The project can benefit form a paid visit/internship to industrial partners.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Probability and Computing, Automata Logic and Games

Prerequisites: Familiarity with stochastic processes and formal verification, whereas no specific knowledge of smart grids is needed.

Bayesian Reinforcement Learning: Robustness and Safe Training Alessandro Abate Automated Verification C In this project we shall build on recent work on ``Safe Learning'' [2], which frames classical RL algorithms to synthesise policies that abide by complex tasks or objectives, whilst training safely (that is, without violating given safety requirements). Tasks/objectives for RL-based synthesis can be goals expressed as logical formulae, and thus be richer than standard reward-based goals. We plan to frame recent work by OXCAV [2] in the context of Bayesian RL, as well as to leverage modern robustness results, as in [3]. We shall pursue both model-based and -free approaches. [2] M. Hasanbeig, A. Abate and D. Kroening, ``Cautious Reinforcement Learning with Logical Constraints,'' AAMAS20, pp. 483-491, 2020. [3] B. Recht, ``A Tour of Reinforcement Learning: The View from Continuous Control,'' Annual Reviews in Control, Vol. 2, 2019. show / hide details

Development of software for the verification of MPL models Alessandro Abate Automated Verification C MSc

This project is targeted to enhance the software tollbox VeriSiMPL (''very simple''), which has been developed to enable the abstraction of Max-Plus-Linear (MPL) models. MPL models are specified in MATLAB, and abstracted to Labeled Transition Systems (LTS). The LTS abstraction is formally put in relationship with its MPL counterpart via a (bi)simulation relation. The abstraction procedure runs in MATLAB and leverages sparse representations, fast manipulations based on vector calculus, and optimized data structures such as Difference-Bound Matrices. LTS can be pictorially represented via the Graphviz tool and exported to PROMELA language. This enables the verification of MPL models against temporal specifications within the SPIN model checker.

Courses: Computer-Aided Formal Verification, Numerical Solution of Differential Equations

Prerequisites: Some familiarity with dynamical systems, working knowledge of MATLAB and C

Formal verification of a software tool for physical and digital components Alessandro Abate, Daniel Kroening Automated Verification MSc

We are interested in working with existing commercial simulation software that is targeted around the modelling and analysis of physical, multi-domain systems. It further encompasses the integration with related software tools, as well as the interfacing with devices and the generation of code. We are interested in enriching the software with formal verification features, envisioning the extension of the tool towards capabilities that might enable the user to raise formal assertions or guarantees on models properties, or to synthesise correct-by-design architectures. Within this long-term plan, this project shall target the formal generation of faults warnings, namely of messages to the user that are related to ``bad (dynamical) behaviours'' or to unwanted ``modelling errors''. The student will be engaged in developing algorithmic solutions towards this goal, while reframing them within a formal and general approach. The project is inter-disciplinary in dealing with hybrid models involving digital and physical quantities, and in connecting the use of formal verification techniques from the computer sciences with more classical analytical tools from control engineering

Courses: Computer-Aided Formal Verification, Software Verification. Prerequisites: Knowledge of basic formal verification.

Innovative Sensing and Actuation for Smart Buildings Alessandro Abate Automated Verification C MSc Sensorisation and actuation in smart buildings and the development of smart HVAC (heat, ventilation and air-conditioning) control strategies for energy management allow for optimised energy usage, leading to the reduction in power consumption or to optimised demand/response strategies that are key in a rather volatile market. This can further lead to optimised maintenance for the building devices. Of course the sensitisation of buildings leads to heavy requirements on the overall infrastructure: we are interested in devising new approaches towards the concept of using ``humans as sensors''. Further, we plan to investigate approaches to perform meta-sensing, namely to extrapolate the knowledge from physical sensors towards that of virtual elements (as an example, to infer the current building occupancy from correlated measurements of temperature and humidity dynamics). On the actuation side, we are likewise interested in engineering non-invasive minimalistic solutions, which are robust to uncertainty and performance-certified. The plan for this project is to make the first steps in this direction, based on recent results in the literature. The project can benefit from a visit to Honeywell Labs (Prague). Courses: Computer-Aided Formal Verification. Prerequisites: Some familiarity with dynamical systems. show / hide details

Model learning and verification Alessandro Abate, Daniel Kroening Automated Verification C MSc

This project will explore connections of techniques from machine learning with successful approaches from formal verification. The project has two sides: a theoretical one, and a more practical one: it will be up to the student to emphasise either of the two sides depending on his/her background and/of interests. The theoretical part will develop existing research, for instance in one of the following two inter-disciplinary domain pairs: learning & repair, or reachability analysis & Bayesian inference. On the other hand, a more practical project will apply the above theoretical connections on a simple models setup in the area of robotics and autonomy.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Precise simulations and analysis of aggregated probabilistic models Alessandro Abate Automated Verification C MSc

This project shall investigate a rich research line, recently pursued by a few within the Department of CS, looking at the development of quantitative abstractions of Markovian models. Abstractions come in the form of lumped, aggregated models, which are beneficial in being easier to simulate or to analyse. Key to the novelty of this work, the proposed abstractions are quantitative in that precise error bounds with the original model can be established. As such, whatever can be shown over the abstract model, can be as well formally discussed over the original one.

This project, grounded on existing literature, will pursue (depending on the student's interests) extensions of this recent work, or its implementation as a software tool.

Courses: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Reinforcement Learning for Space Operations Alessandro Abate, Licio Romao Automated Verification C Oxcav has an ongoing collaboration with the European Space Agency (ESA) that involves applying Reinforcement Learning (RL) algorithms to a satellite called OPS-SAT, which has been launched in 2019 and is a flying laboratory that allows ESA's partners to test and validate new techniques (more information can be found at https://www.esa.int/Enabling_Support/Operations/OPS-SAT). This project aims at designing controllers that will be used to perform nadir pointing and sun-tracking of OPS-SAT, while meeting some specifications (e.g., admissible nadir pointing errors). The focus will be on data-driven methods that leverage available sensors (gyroscopes, GPS, fine sun sensor, magnetometer) and actuators data using a RL architecture to come up with a safe policy that can yield an adequate performance. The main tasks of the project will consist in (1) exploring an ESA platform called MUST to collect all the necessary data and (2) implementing a RL scheme that will be later deployed in the satellite. Throughout the project you will have the opportunity to work with state-of-the-art data-driven techniques that have been developed at Oxcav, under the supervision of Prof. Alessandro Abate and Dr. Licio Romao. show / hide details

Safe Reinforcement Learning Alessandro Abate Automated Verification C MSc

Reinforcement Learning (RL) is a known architecture for synthesising policies for Markov Decision Processes (MDP). We work on extending this paradigm to the synthesis of 'safe policies', or more general of policies such that a linear time property is satisfied. We convert the property into an automaton, then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product automaton, according to accepting conditions of the automaton. With this reward function, RL synthesises a policy that satisfies the property: as such, the policy synthesis procedure is `constrained' by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property, at any given state of the MDP. We evaluate the performance of the algorithm on numerous numerical examples. This project will provide extensions of these novel and recent results.

Prerequisites: Computer-Aided Formal Verification, Probabilistic Model Checking, Machine Learning

Safety verification for space dynamics via neural-based control barrier functions Alessandro Abate Automated Verification C Barrier functions are Lyapunov-like functions that serve as certificates for the safety verification of dynamical and control models. The OXCAV group has recently worked on the automated and sound synthesis of barrier functions structured as neural nets, with an approach that uses SAT modulo theory. In this project, we shall pursue two objectives: 1. Apply recent results [1] on sound and automated synthesis of barrier certificates on models that are pertinent to the Space domain, such as models for attitude dynamics. Airbus will support this part. 2. Develop new results that extend theory and algorithms to models encompassing uncertainty, such as probabilistic models or models that are adaptive to sensed data. Airbus will support this effort providing environments and models for experiments and testing. [1] A. Abate, D. Ahmed and A. Peruffo, ``Automated Formal Synthesis of Neural Barrier Certificates for Dynamical Models,'' TACAS, To appear, 2021. show / hide details

Software development for abstractions of stochastic hybrid systems Alessandro Abate Automated Verification C MSc

Stochastic hybrid systems (SHS) are dynamical models for the interaction of continuous and discrete states. The probabilistic evolution of continuous and discrete parts of the system are coupled, which makes analysis and verification of such systems compelling. Among specifications of SHS, probabilistic invariance and reach-avoid have received quite some attention recently. Numerical methods have been developed to compute these two specifications. These methods are mainly based on the state space partitioning and abstraction of SHS by Markov chains, which are optimal in the sense of reduction in abstraction error with minimum number of Markov states.

The goal of the project is to combine codes have been developed for these methods. The student should also design a nice user interface (for the choice of dynamical equations, parameters, and methods, etc.).

Courses: Probabilistic Model Checking, Probability and Computing, Numerical Solution of Differential Equations

Prerequisites: Some familiarity with stochastic processes, working knowledge of MATLAB and C

Contextuality and quantum advantage Samson Abramsky Foundations, Structures, and Quantum C MSc "The aim of the project is to explore the use of quantum resources to gain advantage over classical computational models in a range of tasks. The tasks which may be considered include: bounded-depth circuits, non-local games, measurement-based computation, communication complexity, and query complexity. High-level methods for describing quantum solutions for such problems will be studied, based on recent work on a structural theory of contextuality and its application to quantum advantage developed by Abramsky and co-workers. Both qualitative aspects - existence of perfect strategies or solutions - and quantitative aspects - contextuality monotones and resource inequalities - will be studied. References: Contextual fraction as a measure of contextuality, Abramsky, Barbosa and Mansfield PRL 2017 The quantum monad, Abramsky, Barbosa, de Silva and Zapata, MFCS 2017. " show / hide details

Pebble games, monads and comonads in classical, probabilistic and quantum computation Samson Abramsky Foundations, Structures, and Quantum C MSc Pebble games are an important and widely used tool in logic, algorithms and complexity, constraint satisfaction and database theory. The idea is that we can explore a pair of structures, e.g. graphs, by placing up to k pebbles on them, so we have a window of size at most k on the two structures. If we can always keep these pebbles in sync so that the two k-sized windows look the same (are isomorphic) then we say that Duplicator has a winning strategy for the k-pebble game. This gives a resource-bounded notion of approximation to graphs and other structures which has a wide range of applications. Monads and comonads are widely used in functional programming, e.g. in Haskell, and come originally from category theory. It turns out that pebble games, and similar notions of approximate or local views on data, can be captured elegantly by comonads, and this gives a powerful language for many central notions in constraints, databases and descriptive complexity. For example, k-consistency can be captured in these terms; another important example is treewidth, a key parameter which is very widely used to give "islands of tractability" in otherwise hard problems. Finally, monads can be used to give various notions of approximate or non-classical solutions to computational problems. These include probabilistic and quantum solutions. For example, there are quantum versions of constraint systems and games which admit quantum solutions when there are no classical solutions, thus demonstrating a "quantum advantage". Monads and comonads can be used in combination, making use of certain "distributive laws". The aim of this project is to explore a number of aspects of these ideas. Depending on the interests and background of the student, different aspects may be emphasised, from functional programming, category theory, logic, algorithms and descriptive complexity, probabilistic and quantum computation. Some specific directions include: 1. Developing Haskell code for the k-pebbling comonad and various non-classical monads, and using this to give a computational tool-box for various constructions in finite model theory and probabilistic or quantum computation. 2. Developing the category-theoretic ideas involved in combining monads and comonads, and studying some examples. 3. Using the language of comonads to capture other important combinatorial invariants such as tree-depth. 4. Developing the connections between category theory, finite model theory and descriptive complexity. Some background reading. 1. Leonid Libkin, Elements of finite model theory. (Background on pebble games and the connection with logic and complexity). 2. The pebbling comonad in finite mode theory. S. Abramsky, A. Dawar and P. Wang. (Technical report describing the basic ideas which can serve as a starting point.) show / hide details

Sheaf theoretic semantics for vector space models of natural language Samson Abramsky Foundations, Structures, and Quantum B C MSc

Contextuality is a fundamental feature of quantum physical theories and one that distinguishes it from classical mechanics. In a recent paper by Abramsky and Brandenburger, the categorical notion of sheaves has been used to formalize contextuality. This has resulted in generalizing and extending contextuality to other theories which share some structural properties with quantum mechanics. A consequence of this type of modeling is a succinct logical axiomatization of properties such as non-local correlations and as a result of classical no go theorems such as Bell and Kochen-Soecker. Like quantum mechanics, natural language has contextual features; these have been the subject of much study in distributional models of meaning, originated in the work of Firth and later advanced by Schutze. These models are based on vector spaces over the semiring of positive reals with an inner product operation. The vectors represent meanings of words, based on the contexts in which they often appear, and the inner product measures degrees of word synonymy. Despite their success in modeling word meaning, vector spaces suffer from two major shortcomings: firstly they do not immediately scale up to sentences, and secondly, they cannot, at least not in an intuitive way, provide semantics for logical words such as `and', `or', `not'. Recent work in our group has developed a compositional distributional model of meaning in natural language, which lifts vector space meaning to phrases and sentences. This has already led to some very promising experimental results. However, this approach does not deal so well with the logical words.

The goal of this project is to use sheaf theoretic models to provide both a contextual and logical semantics for natural language. We believe that sheaves provide a generalization of the logical Montague semantics of natural language which did very well in modeling logical connectives, but did not account for contextuality. The project will also aim to combine these ideas with those of the distributional approach, leading to an approach which combines the advantages of Montague-style and vector-space semantics.

Prerequisites

==========

The interested student should have taken the category theory and computational linguistics courses, or be familiar with the contents of these.

Complex Object Querying and Data Science Michael Benedikt Data and Knowledge MSc "We will look at query languages for transforming nested collections (collections that might contain collections). Such languages can be useful for preparing large scale feature data for machine learning algorithms. We have a basic implementation of such a language that we implement on top of the big-data framework Spark. The goal of the project is to extend the language with iteration. One goal will be to look at how to adapt processing techniques for nested data to support iteration. Another, closer to application is to utilize iteration to support additional steps of a data science pipeline, such as sampling. " show / hide details

Genomic analysis using machine learning and large scale data management techniques Michael Benedikt Data and Knowledge MSc

We will investigate novel risk analysis -- likelihood
of a patient having some medical condition -- using statistical analysis of
a variety of genomics data sources. This will make use of some new infrastructure for
data management -- a query language for nested data -- along with the use
of the SPARK framework, coupled with some basic statistics
and machine learning algorithms. No background in genomics or statistics
is necessary, but the project
does require knowledge of the basics of data management (e.g. undergrad database
course or some experience with SQL) and good programming skills.

Interpolation Michael Benedikt Data and Knowledge C MSc

Let F1 and F2 be sentences (in first-order logic, say) such that F1 entails F2: that is, any model of F1 is also a model of F2. An interpolant is a sentence G such that F1 entails G, and G entails F2, but G only uses relations and functions that appear in *both* F1 and F2.

The goal in this project is to explore and implement procedures for constructing interpolants, particularly for certain decidable fragments of first-order logic. It turns out that finding interpolants like this has applications in some database query rewriting problems.

Prerequisites: Logic and Proof (or equivalent)

Optimization of Web Query Plans Michael Benedikt Data and Knowledge MSc

This project will look at how to find the best plan for a query, given a collection of data sources with access restrictions.

We will look at logic-based methods for analyzing query plans, taking into account integrity constraints that may exist on the data.

Optimized reasoning with guarded logics Michael Benedikt Data and Knowledge C MSc "Inference in first-order logic is undecidable, but a number of logics have appeared in the last decade that achieve decidability. Many of them are guarded logics, which include the guarded fragment of first-order logic. Although the decidability has been known for some decades, no serious implementation has emerged. Recently we have developed new algorithms for deciding some guarded logics, based on resolution, which are more promising from the perspective of implementation. The project will pursue this both in theory and experimentally." Prerequisites A knowledge of first-order logic, e.g. from the Foundations of CS or Knowledge Representation and Reasoning courses, would be important. show / hide details

High-throughput Molecular Algorithms Luca Cardelli Automated Verification MSc Recent technological developments allow massive parallel reading (sequencing) and writing (synthesis) of heterogeneous pools of DNA strands. We are no longer limited to circuits built out of a small number of different strands, nor to reading out a few bits of output by fluorescence. While these emerging capabilities are somewhat stochastic and error-prone, new classes of algorithms should feasibly take advantage of them. We plan to start by investigating new algorithms to record the order in which a series of molecular events occurs, e.g., the appearance of certain chemicals over time in a dynamic chemical soup. A large network of DNA gates acting in parallel records the relationships between the events; the result can be read out by massive DNA sequencing. We plan to study feasible DNA structures to implement this and similar algorithms, simulate them, and investigate their correctness and robustness. Prerequisites: background in distributed algorithms, verification, and/or synthetic biology. show / hide details

Noise in Molecular Logic Gates Luca Cardelli Automated Verification MSc

A Boolean variable B that ranges over {true,false} can be represented by two molecular species {B_true,B_false}. Boolean gates like AND can then be described by chemical reactions over those species. These reactions can in turn be implemented by DNA molecules and physically executed. Networks of such Boolean gates can function as controllers for molecular-scale devices, including devices we may want to insert into living organisms. We want to investigate, by mathematical analysis, modelchecking, and simulation, the noise behavior of these logical gates, due to both noisy inputs and to the intrinsic molecular fluctuations generated by chemical reactions. How we can we compute reliably in such a regime, and how can we design logic gates that are resistant to noise? Prerequisites: Background in verification and/or simulation

Merging all the biomedical image processing power of ITK with Matlab Ramón Casero Cañas, Vicente Grau Computational Biology and Health Informatics MSc The Insight Toolkit (ITK), a C++ library, is the most advanced software base for biomedical image processing. Our group has produced a basic working prototype of a software interface to run ITK image filters from Matlab [1], the biomedical engineer's platform of choice. We are looking for a student who, as a minimum, integrates ~155 existing image filters into this interface to make them available for biomedical research. This is more or less straightforward by hand. The challenge for an exceptional student would be to make a parser to automatically link ITK filters to Matlab. [1] http://code.google.com/p/gerardus/source/browse/trunk/matlab/?r=1082#matlab%2FItkToolbox show / hide details

Topics in Graph Representation Learning Ismail Ilkan Ceylan Artificial Intelligence and Machine Learning MSc

Various projects in the theme of 'Graph Representation Learning', particularly suitable for MSc and 4th year students - get in touch for the most up-to-date information.

Information processing and conservation laws in general probabilistic theories Giulio Chiribella Foundations, Structures, and Quantum B C MSc "Project description and suitability for different students considered. See the appendix. -areas in which I could supervise student projects in quantum information theory and quantum foundations. -general areas for student supervision: quantum information and foundations of quantum mechanics. show / hide details

Diagrammatic methods in computer security Bob Coecke, Stefano Gogioso Foundations, Structures, and Quantum MSc

Throughout the last decade, categorical and diagrammatic methods have found enormous success in the field of quantum information theory, and are currently being extended to a variety of other disciplines. As part of this project, we will explore their application to classical cryptography and computer security. A number of distinct approaches are available, and the project can be tailored to each student's specific interests in the field.

Prerequisites: One of Quantum Computer Science or Categorical Quantum Mechanics required. Computer Security (or equivalent experience) desirable, but not required.

General game playing using inductive logic programming Andrew Cropper Artificial Intelligence and Machine Learning B C MSc In the general game playing competition, an agent is given the rules of a game (described as a logic program) and then starts playing the game (i.e. the agent generates traces of behaviour). This project will invert the task so that an agent is given traces of behaviour and then has to learn a set of rules that could produce the behaviour. The project will use techniques from inductive logic programming, a form of machine learning which learns computer programs from input/output examples. This work is mainly implementation and experimentation. Prerequisites: familiarity with logic programming (Prolog) show / hide details

Machine learning efficient time-complexity programs Andrew Cropper Artificial Intelligence and Machine Learning C MSc "Metaopt [1,2] is an inductive logic programming (ILP) system that learns efficient logic programs from examples. For instance, given input/output examples of unsorted/sorted lists Metaopt learns quicksort (rather than bubblesort). However, Metaopt does not identify the complexity class of learned programs. The goals of this project are to (1) develop methods to identify the complexity class of a program during the learning, and (2) see whether the complexity information can improve the proof search. This work is a mix of theory, implementation, and experimentation. [1] A. Cropper and S.H. Muggleton. Learning efficient logic programs. Machine learning (2018). https://doi.org/10.1007/s10994-018-5712-6 [2] A. Cropper and S.H. Muggleton. Learning efficient logical robot strategies involving composable objects. In Proceedings of the 24th International Joint Conference Artificial Intelligence (IJCAI 2015), pages 3423-3429. IJCAI, 2015." Prerequisites: familiarity with general machine learning (and preferably computational learning theory) and ideally logic programming show / hide details

Projects on logic-based machine learning (inductive logic programming) Andrew Cropper Artificial Intelligence and Machine Learning B C MSc Description: Inductive logic programming (ILP) [1,2] is a form of machine learning based on mathematical logic. Given examples and background knowledge (BK), the goal of ILP is to induce a logic program (a set of logical rules) that with the BK generalises the examples. For instance, given examples of unsorted/sorted lists, the goal of ILP is to induce a sorting algorithm. I am happy to supervise projects on ILP. These projects will particularly suit students interested in constraint satisfaction, symbolic machine learning, and knowledge representation. Prerequisites: ideally you will have taken the courses Logic and Proof, Intelligent Systems, and Knowledge Representation & Reasoning show / hide details

Relevancy of background knowledge in inductive logic programming Andrew Cropper Artificial Intelligence and Machine Learning B C MSc Inductive logic programming (ILP) is a form of machine learning which learns computer programs from input/output examples of a target program. To improve learning efficiency, ILP systems use background knowledge (i.e. auxiliary functions such as partition and append). However, most ILP systems cannot handle large amounts of background knowledge, and overcoming this limitation is a key challenge in ILP. The goal of this project is to explore techniques to identify relevant background knowledge. There is much freedom with this project, where one could focus on logical aspects, such as finding logically redundant background knowledge, or one could instead focus on statistical aspects, such as finding background knowledge most likely to be relevant for a given task. This work is a mix of theory, implementation, and experiments. Prerequisites: familiarity with statistics, statistical machine learning, and ideally logic programming show / hide details

REASONING ABOUT TEMPORAL KNOWLEDGE Bernardo Cuenca Grau, Przemysław Wałęga Artificial Intelligence and Machine Learning, Data and Knowledge MSc

"Our world produces nowadays huge amounts of time stamped data, say measurements from meteorological stations, recording of online payments, GPS locations of your mobile phone, etc. To reason on top of such massive temporal datasets effectively we need to provide a well-structured formalisation of temporal knowledge and to devise algorithms with good computational properties. This, however, is highly non-trivial; in particular logical formalisms for temporal reasoning often have high computational complexity.

This project provides an opportunity to join the Knowledge Representation and Reasoning group and participate in exciting EPSRC-funded research on temporal reasoning, temporal knowledge graphs, and reasoning over streaming data. There are opportunities to engage both in theoretically-oriented and practically-oriented research in this area. For example, in recent months, we have been investigating the properties and applications of DatalogMTL---a temporal extension of the well-known Datalog rule language which is well-suited for reasoning over large and frequently-changing temporal datasets; the project could focus on analysing the theoretical properties of DatalogMTL and its fragments such as its complexity and expressiveness, or alternatively in more practical aspects such as optimisation techniques for existing reasoning algorithms. There are many avenues for research in this area and we would be more than happy to discuss possible concrete alternatives with the student(s)."

The theoretical part of the project requires good understanding of logics (completing Knowledge Representation and Reasoning course could be beneficial), whereas the practical part is suited for those who have programming skills.

Analysis of Schelling segregation models. Edith Elkind Artificial Intelligence and Machine Learning MSc In Schelling's model of strategic segergation, agents are placed on a highly regular graph (such as a line of a grid), and each agent belongs to one of k types. Agents have a preference towards being surrounded by agents who belong to their own type, and may change locations if they are not happy at their current location (by moving to an empty location or swapping with another discontent agent). Many variants of this basic model have been considered over the years. The goal of this project is to investigate, theoretically and empirically, the degree of diversity of stable outcomes in Schelling's model, as well as to explore novel variants of the model where agents's preferences may evolve over time. show / hide details

Stable roommates problem with structured preferences Edith Elkind Artificial Intelligence and Machine Learning MSc In the stable roommates problem, there are k rooms of varying sizes and n agents, who need to be allocated places in these rooms; it is sometimes assumed that the total number of places is exactly n. Agents may have preferences over rooms as well as potential rommates and may move between rooms so as to improve their assignment. The goal of the project is to understand the complexity of finding stable outcomes in such settings, assuming that agents' preferences over assignments have a simple structure (e.g., each agent wants to maximize the number of friends in their room), and to compare the best social welfare that can be achieved by a centralizes assignment and that achievable in a stable assignment. show / hide details

Form Corpus & Benchmark Tim Furche Data and Knowledge B C MSc (Supervisor C Schallhart) Web pages are the past since interactive web application interfaces have reshaped the online world. With all their feature richness, they enrich our personal online experience and provide some great new challenges for research. In particular, forms became much complex in assisting the user during the _lling, e.g., with completion options, or through structuring the form _lling process by dynam-ically enabling or hiding form elements. Such forms are a very interesting research topic but their complexity prevented so far the establishment of a corpus of modern forms to benchmark di_erent tools dealing with forms automatically. This MSC project will _ll this gap in building a corpus of such forms: Based on a number of production sites from one or two domains, we will build our corpus of web interfaces, connected to a (shared) database. Not only will the future evaluations in the DIADEM project rely on this corpus, but we will also publish the corpus { promoting it as a common benchmark for the research community working on forms. Knowledge in Java, HTML, CSS, Javascript, and web application development are required. show / hide details

Form Filling & Probing Tim Furche Data and Knowledge B C MSc (Joint with C Schallhart) Unearthing the knowledge hidden in queryable web sites requires a good understanding of the involved forms. As part of DIADEM, we are developing OPAL (Ontology based web Pattern Analysis with Logic), a tool to recognize forms belonging to a parameterizable application domain, such as the real estate or used car market. OPAL determines the meaning of individual form elements, e.g., it identi_es the _eld for the minimum or maximum price or for some location. This MSC project will build upon OPAL to not only deal with static forms but also with sequences of interrelated forms, as in case of a rough initial form, followed by a re_nement form, or in case of forms showing some options only after _lling some other parts. Over the course of this MSC project, we will develop a tool which invokes OPAL to analyze a given form, to explore all available submission mechanisms on this form, analyze the resulting pages for forms continuing the initial query, and to combine the outcome all found forms into a single interaction description. Knowledge in Java, HTML, CSS are required, prior experience in logic programming would be a strong plus. show / hide details

Protein language modeling - understanding the language of life (joint project with Harvard Medical School) Yarin Gal, Pascal Notin Artificial Intelligence and Machine Learning C MSc

Proteins leverage the genetic information encoded in DNA to drive the functioning of all organisms around us. Composed as a sequence of amino acids, their structure is extremely expressive and diverse. Deep learning techniques inspired from natural language processing methods have recently been very successful at implicitly teasing out the constraints underpinning these structures by posing the problem as a language modeling task.

This project aims at reaching a finer understanding of the representations learnt by these models to help answer key questions in computational genomics, from uncovering meaningful clusters within or across protein families, to a better understanding of the viral evolution process.

You will get exposure to different deep learning architectures (e.g., VAE, transformers), as well as techniques in dimensionality reduction, latent space visualization and clustering.

This project is a joint collaboration between OATML (https://oatml.cs.ox.ac.uk/) and the Marks lab (https://www.deboramarkslab.c

Prerequisites: * strong python experience

* experience with deep learning, generative models, sequence models

An Empirical Investigation of Training Speed and Generalization " Yarin Gal, Clare Lyle, Lisa Schut Artificial Intelligence and Machine Learning MSc Predicting and understanding generalization in deep neural networks remains an important and open problem in the field. Recent work suggests that it's possible to leverage properties of a neural network's optimization trajectory both to predict generalization performance and to construct generalization bounds. This line of inquiry also promises to provide insight into the connection between training speed and generalization. The focus of this project will be to investigate an estimator of generalization error based on properties of minibatch stochastic gradient descent updates. This may lead to new generalization bounds based on optimization trajectories, or to a principled early stopping criterion for stochastic gradient descent depending on the interests of the student. Prerequisites strong background in Python (ideally experience with a deep learning framework such as TensorFlow/PyTorch/Jax), strong background in probability and familiarity with learning theory show / hide details

Group-equivariance for data-efficient deep learning in Earth Observation Yarin Gal, Freddie Kalaitzis Artificial Intelligence and Machine Learning MSc

Summary: Leverage known symmetries in satellite data, e.g. rotation, flips, scaling, for more data-efficient learning of downstream tasks. Abstract: Choosing the right inductive bias in machine learning tasks can reduce the amount of data required for training by orders of magnitude. One inductive bias that is ubiquitous in computer vision tasks is shift-invariance (e.g. classification) and shift-equivariance (e.g. single/multi image super-resolution, segmentation, image translation, detection), that is, there is a bijection between shifts in the input domain and shifts in the output co-domain. The success of deep learning in computer vision is owed to these inductive biases being baked into an architecture through CNN layers, which in theory allows them to detect the same feature anywhere in an image, under any shift. Rotations and flips and scalings are also desirable symmetries for low-level features (eg. oriented edges and textures), but ones that must be enforced on CNNs, usually through data augmentation techniques, at the expense of model size and training time. Group-equivariant CNNs (g-CNN) [Cohen & Welling, 2016] were the first generalization of CNNs, with the property of exact equivariance in the group of 90-degree rotations, flips and translations (the p4m group). This type of architecture can be even more effective for downstream tasks common in Earth Observation (land cover classification [Marcos et al. 2018], building segmentation, multi-frame super-resolution), because objects in satellite imagery, like coastlines and rivers, can appear under any orientation. This project will explore and leverage known symmetries in satellite data, e.g. rotation, flips, scaling, permutation-invariance (in multi-image setups) for more data-efficient learning of common downstream tasks. References: Cohen, T. and Welling, M., 2016, June. Group equivariant convolutional networks. In International conference on machine learning (pp. 2990-2999). Marcos, D., Volpi, M., Kellenberger, B. and Tuia, D., 2018. Land cover mapping at very high resolution with rotation equivariant CNNs: Towards small yet accurate models. ISPRS journal of photogrammetry and remote sensing, 145, pp.96-107.

Prerequisites

Strong Python coding, experience with deep learning and PyTorch for computer vision, good understanding of CNNs. Experience with Git.

Prerequisites

Strong Python coding, experience with deep learning and PyTorch for computer vision, good understanding of CNNs. Experience with Git.

Desirable: Experience with satellite imagery, exposure in basic group theory.

Hitting two birds with one stone: the optimizer for learning both weights and hyperparameters Yarin Gal, Clare Lyle, Binxin Ru Artificial Intelligence and Machine Learning MSc The performance of deep learning models depends heavily on their training hyperparameters (e.g. learning rate, weight decay). Tuning hyperparameters often require laborious trial and error and/or strong expertise. Although query-based search strategies like Bayesian optimization or evolutionary algorithms have been used to automate the hyperparameter tuning, these methods often require training the network from scratch at each query and a decent number of expensive queries are needed to achieve satisfactory performance. In this project, we aim to develop an optimizer which learns the hyperparameter and weights of deep neural networks in an online fashion during a single round of training. Specifically, we would draw inspiration from Metropolis-Hasting algorithms to evolve a good subset of hyperparameter samples, modify the conventional training procedure without introducing much overheads and study various performance estimators for efficiently assessing the hyperparameter performance with minimum additional training. Prerequisites Experience with deep neural network optimization and Bayesian inference methods, strong Python coding (preferably experience with PyTorch) show / hide details

Improve differentiable neural architecture search via marginal likelihood estimator Yarin Gal, Clare Lyle, Binxin Ru Artificial Intelligence and Machine Learning C MSc Neural architecture search (NAS) aims to automate the design process of good neural network architectures for a given task and has achieved impressive performance outperforming the human experts' design on a variety of applications. One popular class of NAS approaches is differentiable ones (e.g. [1]), which apply a continuous relaxation of architecture-related variables, and alternate between updating the architecture parameters and network weights via respective gradients in a bi-level optimization. Differentiable NAS methods often enjoy low computational costs, thus are practically appealing. However, their search performance tends to be suboptimal because the metric (often mini-batch validation loss) adopted to assess an architecture's quality during the training correlates poorly with the architecture's true generalization performance. On the other hand, a recently proposed metric, sum-over-training-losses (SoTL), has been theoretically shown to approximate Bayesian marginal likelihood in the linear setting [2] and empirically, achieves a good correlation with generalization performance in the non-linear setting [2, 3]. In this project, we aim to develop a SoTL-based metric for updating the architecture parameters to improve the search quality of differentiable NAS methods. We would start with revisiting the theoretical connection between SoTL and marginal likelihood, and then investigate possible ways to incorporate SoTL into the differential NAS framework as well as compute its gradient for optimizing architecture parameters. Prerequisites: Experience with deep neural networks and Bayesian inference, strong Python coding (preferably experience with PyTorch) show / hide details

Inferring dynamic brain networks with Transformer VAEs on MEG data (joint project with the Oxford Centre for Functional MRI of the Brain) " Yarin Gal, Pascal Notin Artificial Intelligence and Machine Learning MSc

This is a joint project between the Oxford Applied and Theoretical Machine Learning Group at CS (https://oatml.cs.ox.ac.uk/) and the Oxford Centre for Functional MRI of the Brain at the department of Clinical Neurosciences (https://www.ndcn.ox.ac.uk/divisions/fmrib/about-fmrib). Scientific objective: Infer dynamic brain networks (ie. which regions of the brains are jointly activated over time) from MEG data of individuals, at rest or while performing specific tasks. The goal is ultimately to improve our understanding of how the brain functions -- how, when, and where information is computed and represented in the brain depending on specific tasks. This may also inform us how to apply brain stimulations as part of a closed loop system, and help track the progression of neurological disorders (e.g., seizures in epilepsy) or the effectiveness of drugs / treatments. Project objectives: The FMRIB team at Oxford already developed several models to tackle this problem (e.g., based on HMM or LSTM). The main challenges at this point are a limited ability to learn long term dependencies in the MEG data given the current model architecture, as well as the necessity to make several simplifying assumptions to make computations tractacle. The project will consist in addressing these issues by leveraging recent progress in Transformer-based architectures, which have demonstrated an increased ability to learn longer-range dependencies in several Natural Language Processing applications over previous methods. Prerequisites * strong python experience * experience with deep learning, sequence models.

Prerequisites

* strong python experience

* experience with deep learning, sequence models

Interpretability in Machine Learning " Yarin Gal, Lisa Schut, Jan Brauner Artificial Intelligence and Machine Learning MSc The growing number of decisions influenced by machine learning models drives the need for explanations of why a system makes a particular prediction. Counterfactual explanations (CEs) are a practical tool for demonstrating why machine learning classifiers make particular decisions. CEs are of the form "If X had not occurred, then Y would not have occurred" [Wachter et al., 2017]. In practice, this type of explanation is an alternate input that is similar to the original input but leads to a different classification. In this project, we will extend on previous work in which we developed a simple and fast method for generating interpretable CEs for neural networks in the white-box setting, by leveraging the predictive uncertainty of the classifier [Schut et al., 2020]. The primary goal of the project will be to extend this method to black-box models, using proxy models [Afrabandpey et al., 2020]. Time-permitting, the project can be extended to developing metrics for interpretability. References: Afrabandpey, H., Peltola, T., Piironen, J., Vehtari, A., and Kaski, S. (2020). A decision-theoretic approach for model interpretability in Bayesian framework. Machine Learning, pages 1–22 Lisa Schut, Oscar Key, Rory McGrath, Luca Costabello, Bogdan Sacaleanu, Medb Corcoran, Yarin Gal. Uncertainty-Aware Counterfactual Explanations for Medical Diagnosis. NeurIPS Machine Learning for Health Workshop 2020. Wachter, S., Mittelstadt, B., and Russell, C. (2017). Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech. , 31:841 Prerequisites Machine Learning, Python (incl. Pytorch) show / hide details

ML for space debris detection (joint project with ESA) Yarin Gal, Atılım Güneş Baydin, Freddie Kalaitzis Artificial Intelligence and Machine Learning MSc

In-orbit collisions with debris endanger astronauts, end missions, and generate new fragments. Databases with orbital information are needed to avoid such collisions. Debris and satellites in high altitude orbits are typically observed from ground-based telescopes or from space-based optical payloads. The smaller the objects, the less light is reflected from the sun and therefore only a faint signal is received at the sensor. Due to the relative motion, the object will then create a streak-like feature in the image. A recent ESA-funded activity has resulted in the creation of a database with several thousand trailed images of objects of interest. However, every image has typically not only the streak-like feature of the object of interest, but also presents some other streaks. This project aims at detecting such streaks using machine learning. That is, identifying which of a known set of objects might be present and provide information about their positions within the image. The labelled dataset is currently under active development at ESA/ESOC and will be provided to the student under an NDA. The student will then develop either an active learning approach or an unsupervised learning approach.

Prerequisites: only suitable for someone who has worked in Machine Learning in the past (computer vision), and has strong programming skills (Python).

Topics in Randomised Algorithms and Computational Complexity Andreas Galanis Algorithms and Complexity Theory C MSc Description: Andreas Galanis is willing to supervise projects in the areas of randomised algorithms and computational complexity. Problems of interest include (i) the analysis of average case instances of hard combinatorial problems (example: can we satisfy a random Boolean formula?), and (ii) the computational complexity of counting problems (example: can we count approximately the number of proper colourings of a graph G?). The projects would suit mathematically oriented students, especially those with an interest in applying probabilistic methods to computer science. show / hide details

Data Science and Machine Learning Techniques for Model Parameterisation in Biomedical and Environmental Applications David Gavaghan, Martin Robinson, Michael Clerx Computational Biology and Health Informatics B C MSc "Time series data arise as the output of a wide range of scientific experiments and clinical monitoring techniques. Typically the system under study will either be undergoing time varying changes which can be recorded, or the system will have a time varying signal as input and the response signal will be recorded. Familiar everyday examples of the former include ECG and EEG measurements (which record the electrical activity in the heart or brain as a function of time), whilst examples of the latter occur across scientific research from cardiac cell modelling to battery testing. Such recordings contain valuable information about the underlying system under study, and gaining insight into the behaviour of that system typically involves building a mathematical or computational model of that system which will have embedded within in key parameters governing system behaviour. The problem that we are interested in is inferring the values of these key parameter through applications of techniques from machine learning and data science. Currently used methods include Bayesian inference (Markov Chain Monte Carlo (MCMC), Approximate Bayesian Computation (ABC)), and non-linear optimisation techniques, although we are also exploring the use of other techniques such as probabilistic programming and Bayesian deep learning. We are also interested in developing techniques that will speed up these algorithms including parallelisation, and the use of Gaussian Process emulators of the underlying models Application domains of current interest include modelling of the cardiac cell (for assessing the toxicity of new drugs), understanding how biological enzymes work (for application in developing novel fuel cells), as well as a range of basic science problems. Application domains of current interest include modelling of the cardiac cell (for assessing the toxicity of new drugs), understanding how biological enzymes work (for application in developing novel fuel cells), as well as a range of basic science problems. " Prerequisites: some knowledge of Python show / hide details

Topics in Continuous Maths, Discrete Maths, Linear Algebra, Object-oriented Programming, Machine Learning and Probability and Computing David Gavaghan Computational Biology and Health Informatics B C Prof Dave Gavaghan is willing to supervise in the area of Topics in Continuous Maths, Discrete Maths, Linear Algebra, Object-oriented Programming, Machine Learning and Probability and Computing. show / hide details

Medical information extraction with deep neural networks Paul Goldberg, Alejo Nevado-Holgado Algorithms and Complexity Theory MSc

Background. Electronic Health Records (EHRs) are the databases used by hospital and general practitioners to daily log all the information they record from patients (i.e. disorders, taken medications, symptoms, medical testsâ€¦). Most of the information held in EHRs is in the form of natural language text (written by the physician during each session with each patient), making it inaccessible for research. Unlocking all this information would bring a very significant advancement to biomedical research, multiplying the quantity and variety of scientifically usable data, which is the reason why major efforts have been relatively recently initiated towards this aim (e.g. I2B2 challenges https://www.i2b2.org/NLP/ or the UK-CRIS network of EHRs https://crisnetwork.co/uk-cris-programme)

Project. Recent Deep Neural Networks (DNN) architectures have shown remarkable results in traditionally unsolved NLP problems, including some Information Extraction tasks such as Slot Filling [Mesnil et al] and Relation Classification [dos Santos et al]. When transferring this success to EHRs, DNNs offer the advantage of not requiring well formatted text, while the problem remains of labelled data being scarce (ranging on the hundreds for EHRs, rather than the tens of thousands used in typical DNN studies). However, ongoing work in our lab has shown that certain extensions of recent NLP-DNN architectures can reproduce the typical remarkable success of DNNs in situations with limited labelled data (paper in preparation). Namely, incorporating interaction terms to feed forwards DNN architectures [Denil et al] can rise the performance of relation classification in I2B2 datasets from 0.65 F1 score to 0.90, while the highest performance previously reported with the same dataset was 0.74.

We therefore propose to apply DNNs to the problem of information extraction in EHRs, using I2B2 and UK-CRIS data as a testbed. More specifically, the DNNs designed and implemented by the student should be able to extract medically relevant information, such as prescribed drugs or diagnoses given to patients. This corresponds to some of the challenges proposed by I2B2 during recent years (https://www.i2b2.org/NLP/Medication/), and are objectives of high interest in UK-CRIS which have sometimes been addressed with older techniques such as rules [Iqbal et al, Jackson et al]. The student is free to use the extension of the feed forward DNN developed in our lab, or to explore other feed forwards or recurrent (e.g. RNN, LSTM or GRU) alternatives. The DNN should be implemented in Pythonâ€™s Keras, Theano, Tensorflow, or PyTorch.

Bibliography

G. Mesnil et al., Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding, IEEEACM Trans. Audio Speech Lang. Process. 23 (2015) 530â€"539. doi:10.1109/TASLP.2014.2383614.

C.N. dos Santos, B. Xiang, B. Zhou, Classifying Relations by Ranking with Convolutional Neural Networks, CoRR. abs/1504.06580 (2015). http://arxiv.org/abs/1504.06580.

M. Denil, A. Demiraj, N. Kalchbrenner, P. Blunsom, N. de Freitas, Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network, CoRR. abs/1406.3830 (2014). http://arxiv.org/abs/1406.3830.

E. Iqbal, R. Mallah, R.G. Jackson, M. Ball, Z.M. Ibrahim, M. Broadbent, O. Dzahini, R. Stewart, C. Johnston, R.J.B. Dobson, Identification of Adverse Drug Events from Free Text Electronic Patient Records and Information in a Large Mental Health Case Register, PLOS ONE. 10 (2015) e0134208. doi:10.1371/journal.pone.0134208.

R.G. Jackson MSc, M. Ball, R. Patel, R.D. Hayes, R.J. Dobson, R. Stewart, TextHunter â€" A User Friendly Tool for Extracting Generic Concepts from Free Text in Clinical Research, AMIA. Ann. Symp. Proc. 2014 (2014) 729â€"738.

Neural network genomics Paul Goldberg, Alejo Nevado Algorithms and Complexity Theory MSc Background. For more than 10 years, GWAS studies have represented a revolution in the study of human disorders and human phenotypes. By measuring how your risk of suffering any given disease changes according to SNP mutations, GWAS studies can measure how relevant each gene is to the disease under study. However, this SNP-disease association is calculated "one SNP at a time", and ignores all gene-gene interactions that are often crucially important to the causation of the disorder. If the interactions between a number of genes (e.g. insulin and its receptor in type 2 diabetes; or APP and alpha-, beta- and gamma- secretase in Alzheimer…) is what produces a given disorder, this interaction will not be detected in a GWAS analysis. This shortcoming may not be a problem in monogenetic hereditable disorders, such as Huntington disease, where mutations in a single gene by itself are enough for causing the disease. However, GWAS studies will likely not uncover the mechanisms of complex disorders, where the disease emerges from the interaction of a number of genes. This is likely the source of the so called "missing hereditability" problem observed in most complex traits or diseases, where all the SNP variations taken together cannot account for most of the hereditability of a given trait or disease [1]. In addition, it has been demonstrated that complex traits such as height and BMI are clearly and strongly hereditable [2], but GWAS studies simply cannot detect most of this hereditability. In summary, GWAS analyses detect simple individual genetic factors, but not interactions between genetic factors. Project. In this project, we propose to identify the interacting genetic factors behind Alzheimer's disease and other neurodegenerations with neural networks, which are known for exploiting the interactions present in the to-be analysed data. While the linear models used in GWAS studies are able to identify only linear and monovariated contributions of each gene to a disorder, neural networks can analyse how genes interact with each other to explain the studied disorder. This versatility is what has allowed neural networks to find such widespread use in industry in the last decade, where they are revolutionizing image, sound and language analysis [3–5]. In our laboratories we already have the hardware and the expertise required to build neural networks, and we have used them in other research areas relevant to Alzheimer's disease. In addition, we have access and experience using UK Biobank, which is the ideal dataset to implement this project. In UK Biobank they have measured wide array SNPs, all disorders and demographics in ~500,000 participants between the ages of 37 and 73, and more than 5000 of them suffer from Alzheimer's disease, Parkinson's disease or other neurodegenerations. We propose the MSc student to build a neural network to predict either diagnoses or disease-related endophenotypes (i.e. brain volumes of affected areas, cognitive scores…) of each one of these participants, using only the information present in the wide array SNPs and relevant demographics. The student is free to use the extension of the feed forward DNN developed in our lab, or to explore other feed forwards or recurrent (e.g. RNN, LSTM or GRU) alternatives. The DNN should be implemented in Python's Keras (https://keras.io/), Theano (http://deeplearning.net/software/theano/), Tensorflow (https://www.tensorflow.org/), or PyTorch (http://pytorch.org/). Bibliography [1] T.A. Manolio, F.S. Collins, N.J. Cox, D.B. Goldstein, L.A. Hindorff, D.J. Hunter, M.I. McCarthy, E.M. Ramos, L.R. Cardon, A. Chakravarti, J.H. Cho, A.E. Guttmacher, A. Kong, L. Kruglyak, E. Mardis, C.N. Rotimi, M. Slatkin, D. Valle, A.S. Whittemore, M. Boehnke, A.G. Clark, E.E. Eichler, G. Gibson, J.L. Haines, T.F.C. Mackay, S.A. McCarroll, P.M. Visscher, Finding the missing heritability of complex diseases, Nature. 461 (2009) 747–753. doi:10.1038/nature08494. [2] D. Cesarini, P.M. Visscher, Genetics and educational attainment, Npj Sci. Learn. 2 (2017). doi:10.1038/s41539-017-0005-6. [3] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature. 521 (2015) 436–444. doi:10.1038/nature14539. [4] A. Esteva, B. Kuprel, R.A. Novoa, J. Ko, S.M. Swetter, H.M. Blau, S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks, Nature. 542 (2017) 115–118. doi:10.1038/nature21056. [5] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Human-level control through deep reinforcement learning, Nature. 518 (2015) 529–533. doi:10.1038/nature14236. show / hide details

A Conceptual Model for Assessing Privacy Risk Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc Privacy is not a binary concept, the level of privacy enjoyed by an individual or organisation will depend upon the context within which it is being considered; the more data at attacker has access to the more potential there may be for privacy compromise. We lack a model which considers the different contexts that exist in current systems, which would underpin a measurement system for determining the level of privacy risk that might be faced. This project would seek to develop a prototype model – based on a survey of known privacy breaches and common practices in data sharing. The objective being to propose a method by which privacy risk might be considered taking into consideration the variety of (threat and data-sharing) contexts that any particular person or organisation might be subjected to. It is likely that a consideration of the differences and similarities of the individual or organisational points of view will need to be made, since the nature of contexts faced could be quite diverse. show / hide details

Computer Vision for Physical Security Michael Goldsmith, Sadie Creese Security B C MSc

Computer Vision allows machines to recognise objects in real-world footage. In principle, this allows machines to flag potential threats in an automated fashion based on historical and present images in video footage of real-world environments. Automated threat detection mechanisms to help security guards identify threats would be of tremendous help to them, especially if they have to temporarily leave their post, or have to guard a significant number of areas. In this project, students are asked to implement a system that is able to observe a real environment over time and attempt to identify potential threats, independent of a number of factors, e.g. lighting conditions or wind conditions. The student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. One approach might be to implement a camera system using e.g. a web camera or a Microsoft Kinect to conduct anomaly detection on real-world environments, and flag any issues related to potential threats.

Requirements: Programming skills required

Considering the performance limiters for cybersecurity controls Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

The behaviour, and so effectiveness, of security controls is dependent upon how they are used. One obvious example being the protection afforded by a firewall is dependent upon the maintenance of the rules that determine what the firewall stops and what it does not. The benefit (to the organisation or user seeking to manage risk) of various technical controls in an operational context lacks good evidence and data, so there is scope to consider the performance of controls in a lab environment. This mini-project would select one or more controls from the CIS Top 20 Critical Security Controls (CSC) (version 6.1) and seek to develop laboratory experiments (and implement them) to gather data on how the effectiveness of the control is impacted by its deployment context (including, for example, configuration, dependence on other controls, nature of the threat faced).

Requirements: Students will need an ability to develop a test-suite and deploy the selected controls.

Cybersecurity visualization Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

Cybersecurity visualization helps analysts and risk owners alike to make better decisions about what to do when the network is attacked. In this project the student will develop novel cybersecurity visualizations. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the visualizations they develop. These projects tend to have a focus on network traffic visualization, but the student is encouraged to visualize datasets they would be most interested in. Other interesting topics might for instance include: host-based activities (e.g. CPU/RAM/network usage statistics), network scans (incl. vulnerabilities scanning). Past students have visualized network traffic patterns and android permission violation patterns. Other projects on visualizations are possible (not just cybersecurity), depending on interest and inspiration.

Requirements: Programming skills required.

Detecting disguised processes using Application-Behaviour Profiling Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security MSc

In order to avoid detection, malware can disguise itself as a legitimate program or hijack system processes to reach its goals. Commonly used signature-based Intrusion Detection Systems (IDS) struggle to distinguish between these processes and are thus only of limited use to detect these kind of attacks. They also have the shortcoming that they need to be updated frequently to possess the latest malware definitions. This makes them inherently prone to missing novel attacks. Misuse detection IDSs however overcome this problem by maintaining a ground truth of normal application behaviour and reporting deviations as anomalies. In this project, students will be tasked to investigate how a process' behaviours can be profiled in the attempt to identify whether it is behaving anomalously and if it can be correctly identified as malware. This project will build on existing research in this area.

Requirements: Programming skills required.

Experimenting with anomaly detection features for detecting insider attacks Michael Goldsmith, Ioannis Agrafiotis, Sadie Creese, Arnau Erola Security B C MSc

This project will use an anomaly detection platform being developed by the Cyber Security Analytics Group to consider relative detection performance using different feature sets, and different anomalies of interest, in the face of varying attacks. This research would be experimental in nature and conducted with a view to exploring the minimal sets that would result in detection of a particular threat. Further reflection would then be given to how generalisable this result might be and what we might determine about the critical datasets required for this kind of control to be effective.

Eyetracker data analysis Michael Goldsmith, Ivan Martinovic Security MSc

Eyetracking allows researchers to identify where observers look on a monitor. A challenge in analysing eyetracker output however is identifying patterns of viewing behaviour from the raw eyetracker logs over time and what they mean in a research context, particularly as no semantic information about the pixel being viewed is considered. In practice, this means that we know where someone is looking, but nothing about what it means to look at that pixel or the order of pixels. From a research perspective, being able to tag areas of interest, and what those areas mean, or conduct other statistical analysis on viewing patterns would be of significant use to analysing results, particularly if these results could be automated. The purpose of this project is to conduct an eyetracker experiment, and design and implement useful methods to analyse eyetracker data. We will be able to provide training for the student, so they are able to use the eyetracking tools themselves. Other eyetracking projects also possible, depending on interest and inspiration.

Skills: Programming, Statistics

Formal threat and vulnerability analysis of a distributed ledger model Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

This project would utilise the process algebra CSP and associated model checker FDR to explore various types of threat and how they might successfully compromise a distributed ledger. This type of modelling would indicate possible attacks on a distributed ledger, and could guide subsequent review of actual designs and testing strategies for implementations. The modelling approach would be based on the crypto-protocol analysis techniques already developed for this modelling and analysis environment, and would seek to replicate the approach for a distributed ledger system. Novel work would include developing the model of the distributed ledger, considering which components are important, formulating various attacker models and also formulating the security requirements / properties to be assessed using this model-checking based approach.

Requirements: In order to succeed in this project students would need to have a working knowledge of the machine readable semantics of CSP and also the model checker FDR. An appreciation of threat models and capability will need to be developed.

High Dynamic Range Imaging for Documentation Applications Michael Goldsmith, Sadie Creese Security B C MSc High dynamic range imaging (HDRI) allows more accurate information about light to be captured, stored, processed and displayed to observers. In principle, this allows viewers to obtain more accurate representations of real-world environments and objects. Naturally, HDRI would be of interest to museum curators to document their objects, particularly, non-opaque objects or whose appearance significantly alter dependent on amount of lighting in the environment. Currently, few tools exist that aid curators, archaeologists and art historians to study objects under user-defined parameters to study those object surfaces in meaningful ways. In this project the student is free to approach the challenge as they see fit, but would be expected to design, implement and assess any tools and techniques they develop. The student will then develop techniques to study these objects under user-specified conditions to enable curators and researchers study the surfaces of these objects in novel ways. These methods may involve tone mapping or other modifications of light exponents to view objects under non-natural viewing conditions to have surface details stand out in ways that are meaningful to curators. show / hide details

Intelligent user activity timelines Michael Goldsmith Security B C MSc "Operating system store temporal data in multiple locations. Digital investigators are often tasked with reconstructing timelines of user activities. Timeline generation tools such as log2timeline can aid in extracting temporal data, similarly, 'Professonal' tools such as Encase and Autopsy build and visualise low level timelines. Collectively, these tools: (1) provide (often high levels of) low level data, and (2) are not able to apply any form of reasoning. This project involves the extraction of temporal data and the application of reasoning algorithms to develop reliable event sequences of interest to an investigator. Useful references: Olsson, J. and Boldt, M., 2009. Computer forensic timeline visualization tool. digital investigation, 6, pp.S78-S87. Buchholz, F.P. and Falk, C., 2005, August. Design and Implementation of Zeitline: a Forensic Timeline Editor. In DFRWS." show / hide details

International Cybersecurity Capacity Building Initiatives Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc There is a large investment being made by the international community aimed at helping nations and regions to develop their capacity in cybersecurity. The work of the Global Cyber Security Capacity Centre (based at the Oxford Martin School) studies and documents this: https://www.sbs.ox.ac.uk/cybersecurity-capacity/content/front. There is scope to study in more detail the global trends in capacity building in cybersecurity, the nature of the work and the partnerships that exist to support it. An interesting analysis might be to identify what is missing (through comparison with the Cybersecurity Capacity Maturity Model, a key output of the Centre), and also to consider how strategic, or not, such activities appear to be. An extension of this project, or indeed a second parallel project, might seek to perform a comparison of the existing efforts with the economic and technology metrics that exist for countries around the world, exploring if the data shows any relationships exist between those metrics and the capacity building activities underway. This analysis would involve regression techniques. show / hide details

Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs Michael Goldsmith Security B C MSc (Joint with Sadie Creese) Modelling of security-related interactions, in CSP or more evocative notations such as Milner's bigraphs (http://www.springerlink.com/content/axra2lvj4ph81bdm/; http://www.amazon.co.uk/The-Space-Motion-Communicating-Agents/dp/0521738334/ref=sr_1_2?ie=UTF8&qid=1334845525&sr=8-2). Concurrency and Computer Security a distinct advantage. Appropriate for good 3rd or 4th year undergraduates or MSc. * Open to suggestions for other interesting topics in Cybersecurity, if anyone has a particular interest they would like to pursue. show / hide details

Organisational Resilience and Self-* re-planning Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security MSc

Key Performance Indicator (KPI) is a type of performance measurement in organisations. They evaluate the success of activities in which an organisation or system engages. Often success is measured in terms of whether an activity has been able to complete and able to repeat (e.g. zero defects, customer satisfaction or similar), other times it is measured in terms of making progress towards a strategic goal. Well-chosen KPIs allows us to reflect upon performance of an organisation, and possibly identify potential future degradation issues in systems. In this project, students are tasked to investigate how KPIs can be used to measure and improve cyber-resilience in an organisation. Particularly, we are interested in investigating how the performance of an organisation, particularly with respects to security as well as mission performance. After identifying which aspects of an organisation or the mission are prone to error, it may be beneficial to propose solutions for how these issues can be addressed. With "self*-re-planning", we believe it would be possible for a system to suggest and automate certain aspects of how the mission, infrastructure or organisation can be repurposed to not fail. The student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. For instance, the project may be network-security focused, mission-processes focused or otherwise. It may also investigate more formal approaches to self-* re-planning methods.

Requirements: Programming and/or Social Science Research Methods.

Penetration testing for harm Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

Current penetration testing is typically utilised for discovering how organisations might be vulnerable to external hacks, and testing methods are driven by using techniques determined to be similar to approaches used by hackers. The result being a report highlighting various exploitable weak-points and how they might result in unauthorised access should a malign entity attempt to gain access to a system. Recent research within the cybersecurity analytics group has been studying the relationship between these kinds of attack surfaces and the kinds of harm that an organisation might be exposed to. An interesting question would be whether an orientation around intent, or harm, might result in a different test strategy? Would a different focus be given to the kinds of attack vectors explored in a test if a particular harm is aimed at? This mini-project would aim to explore these and other questions by designing penetration test strategies based on a set of particular harms, and then seek to consider potential differences with current penetration practices by consultation with the professional community. Requirements: Students will need to have a working understanding of penetration testing techniques.

Photogrammetry Michael Goldsmith Security B C MSc Photogrammetry is a set of techniques that allows for 3D measurements from 2D photographs, especially those measurements pertaining to geometry or surface colours. The purpose of this project is to implement one or more photogrammetry techniques from a series of 2D photographs. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the tool they develop. show / hide details

Predicting exposure to risk for active tasks Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

Prior research has been considering how we might better understand and predict the consequences of cyber-attacks based on knowledge of the business processes, people and tasks and how they utilise the information infrastructure / digital assets that might be exposed to specific attack vectors. However, this can clearly be refined by moving to an understanding of those tasks live or active at the time of an attack propagating across a system. If this can be calculated, then an accurate model of where risk may manifest and the harm that may result can be constructed. This project would explore the potential for such a model through practical experimentation and development of software monitors to be placed on a network aimed at inferring the tasks and users that are active based from network traffic. If time allows then host-based sensors might also be explored (such as on an application server) to further refine the understanding of which users and live on which applications etc.

Requirements: Students must be able to construct software prototypes and have a working knowledge of network architectures and computer systems.

Procedural Methods in Computer Graphics Michael Goldsmith Security B C MSc

Procedural methods in computer graphics help us develop content for virtual environments (geometry and materials) using formal grammars. Common approaches include fractals and l-systems. Examples of content may include the creation of cities, planets or buildings. In this project the student will develop an application to use create content procedurally. The student is free to approach the challenge as they see fit, but would be expected to design, implement and assess the methods they develop. These projects tend to have a strong focus designing and implementing existing procedural methods, but also includes a portion of creativity. The project can be based on reality - e.g. looking at developing content that has some kind of basis on how the real world equivalent objects were created (physically-based approaches), or the project may be entirely creative in how it creates content. Past students for instance have built tools to generate cities based on real world examples and non-existent city landscapes, another example include building of procedural planets, including asteroids and earth-like planetary bodies.

Reflectance Transformation Imaging Michael Goldsmith, Stefano Gogioso Foundations, Structures, and Quantum, Security B C MSc Reflectance Transformation Imaging (RTI) is a powerful set of techniques (the first of which known as Polynomial Texture Maps, PTMs) that enables us to capture photographs of objects under a several lighting conditions. Combined, these RTI images form a single photograph in which users can relight these objects by moving the light sources around the hemisphere in front of the object, but also specify user-defined parameters, including removing colour, making the objects more specular or diffuse in order to investigate the surface details in depth. It can be used for forensic investigation of crime scenes as well as cultural heritage documentation and investigation. The purpose of this project is to implement RTI methods of their preference. show / hide details

Resilience – metrics and tools for understanding organisational resilience Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security MSc

Resilience in the face of cyber-attack is considered a key requirement for organisations. Prior work within the cybersecurity analytics group has been developing a resilience model for organisations, but there remains a commonly agreed set of metrics that organisations can use to determine just how resilient they are (at a given point in time and continuously). This mini-project would seek to propose a set of metrics, and if time allows tools for applying them (although the latter may well better suit a following DPhil). The approach will be to consider the development of metrics to measure the various characteristics or organisational capabilities / behaviours that are determined to be necessary in order to be resilient. It will be necessary to consider how the metrics might vary according to, or take account of, different threat environments. It may also be interesting to consider if there are refinements of the resilience requirements that apply to different types of organisations. Data analytics approaches will need to be considered, in order to apply the metrics, and students might consider the use of visualisation to help with this. The resulting metrics will be validated in consultation with security professionals and organisations possessing resilience related experience. In the context of the mini-project this is most likely achievable via a small number of interviews, with a more detailed validation and iterative design approach being supported by a longitudinal study working with 2 or 3 organisations which might adopt and test the metrics

Smartphone security Michael Goldsmith Security B C MSc

(Joint with Sadie Creese) Smartphone security: one concrete idea is the development of a policy language to allow the authors of apps to describe their behaviour, designed to be precise about the expected access to peripherals and networks and the purpose thereof (data required and usage); uses skills in formal specification, understanding of app behaviour (by studying open-source apps), possibly leading to prototyping a software tool to perform run-time checking that the claimed restrictions are adhered to. Suitable for good 3rd or 4th year undergraduates, or MSc, Concurrency, Concurrent Programming, Computer Security all possibly an advantage. Other projects within this domain possible, according to interest and inspiration.

Prerequisites:

Concurrency, Concurrent Programming, Computer Security all possibly an advantage.

Technology-layer social networks Michael Goldsmith Security C MSc (Joint with Sadie Creese) Technology-layer social networks: investigate the potential to identify relationships between people via technology metadata - which machines their machines are "friendly" with. Research will involve identification of all metadata available from the network layer, app layers and the data layer, development of appropriate relationship models, and practical experimentation / forensic-style work exploring how to extract relationships between technologies and identities. Appropriate for 4th year undergraduates or MSc. show / hide details

Trip-wires Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc At Oxford we have developed a framework for understanding the components of an attack, and documenting known attack patterns can be instrumental in developed trip-wires aimed at detecting the presence of insiders in a system. This project will seek to develop a library of such trip-wires based on a survey of openly documented and commented upon attacks, using the Oxford framework. There will be an opportunity to deploy the library into a live-trial context which should afford an opportunity to study the relative utility of the trip-wires within large commercial enterprises. The mini-project would also need to include experimenting with the trip-wires in a laboratory environment, and this would involve the design of appropriate test methods show / hide details

Understanding Enterprise Infrastructure Dependencies Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

There are many tools available for detecting and monitoring cyber-attacks based on network traffic and these are accompanied by a wide variety of tools designed to make alerts tangible to security analysts. By comparison, the impact of these attacks on an organisational level has received little attention. An aspect that could be enhanced further is the addition of a tool facilitating management and updating of our understanding of business processes, but also how those processes are dependent on a network infrastructure. This tool could facilitate the mapping between company strategies, activities needed to accomplish company goals and map these down to the network and people assets. At the top of the hierarchy lies the board, responsible for strategic decisions. These decision are interpreted in the managerial level and could be captured and analysed with business objective diagrams. These diagrams in return could be refined further to derive business processes and organisational charts, ensuring that decision made in the top level will be enforced in the lower levels. The combination of business processes and organisation charts could eventually provide the network infrastructure. For this project we suggest a student could develop novel algorithms for mapping of business processes to network infrastructures in an automated way (given the updated business process files). That said, the student is encouraged to approach this challenge as they see fit, but would be expected to design, implement and assess any methods they develop. Other projects on business process modelling also possible, depending on interest and inspiration.

Vulnerability analysis and threat models for Distributed Ledgers Michael Goldsmith, Sadie Creese, Ioannis Agrafiotis, Arnau Erola Security B C MSc

This project would seek to study the general form of distributed ledgers to identify weak points that might make implementations open to compromise and attack, and to develop corresponding threat models. A number of different approaches might be considered, for example: formal modelling of protocols used to establish agreement or resolve conflicts; design of test suites using attack-graphs and then validation in a laboratory setting.

A Recommender System for Scientific Referees Based on Bibliographic Databases and Knowledge Graphs Georg Gottlob, Emanuel Sallinger, Joel Ouaknine Automated Verification, Data and Knowledge C

Specify, design, and implement a recommender system for referees in Computer Science and other areas. The system should have the following functionality: A user submits to the recommender system the title and list of authors of a paper or project to be reviewed, along with some keywords. The system then compiles an ordered list of suggestions for referees, and for each referee also provides some hints as to why this referee is deemed to be competent. The system should also handle various constraints such as, for example, that referees should not have recent joint publications with the authors.

The design and implementation should follow a knowledge-based approach and use reasoning techniques and graph/network analysis to derive appropriate referees. Data from publicly available databases such as DBLP and ORCID shall be entered into a large integrated knowledge graph (KG). This KG should be enriched by further attributes, and appropriate recursive queries (e.g. in the form of Datalog programs) that select referees should be designed and tested.

We envisage that the student will engage in the following steps and activities:

Elaborate a more precise problem definition and specify the desired system functionality in more detail.
Search for open data that can be used, initially only for the field of Computer Science. Understand how this data can be accessed and downloaded. Assess how this data can best be used towards the above goals.
Get acquainted with knowledge graph technology and software (e.g. the VADALOG system, which will be freely available).
Experiment with recursive queries and establish various ways of obtaining good referees. For example, envisage a similarity-based approach in which an author and a referee are defined to be 'similar' if they publish in similar venues, where, however, the definition of venue 'similarity' is in part based on the similarity of authors publishing in those venues. Find efficient algorithms (possibly probabilistic) for approximately computing such similarities.
Evaluate various referee-finding methods by interviewing domain experts (e.g. Computer Science academics, who can judge the appropriateness of the computed lists of referees, or can suggest better referees). Based on this evaluation, improve the system.
Design and Implement the final system and add a simple but pleasant user interface, and make it available as a Web service. Evaluate the overall result.
Carry out all further necessary tasks which have not been listed here.

Prerequisites: Strong and motivated student with an interest in several of the following topics: databases, software, knowledge graphs, AI, logic, reasoning, and Big Data. Good practical skills, but also a deep understanding of algorithms and recursion so as to be able to develop new efficient methods for querying and processing large amounts of data efficiently.

Bioinformatics Projects Jotun Hein Computational Biology and Health Informatics B C MSc

Jotun Hein has been in Oxford since 2001 and has supervised many students from Computer Science. His main interest is computational biology and combinatorial optimization problems in this area, especially from phylogenetics, biosequence analysis, population genetics, grammars and the origin of life. Some idea for projects can be obtained by browsing these pages:

https://heingroupoxford.com/learning-resources/topics-in-computational-biology/

https://heingroupoxford.com/previous-projects/

A student is also welcome to propose his or her own project.

Injecting Ontological Schema in Knowledge Graph Embedding Ian Horrocks, Jiaoyan Chen Data and Knowledge C MSc Ontology and its counterparts such as logical constraints play a key role in managing the quality of knowledge graphs (KGs) such as Wikidata and DBpedia. KG embedding is widely investigated due to its usability in KG completion and other KG predictive analytics. However the current semantic embedding methods cannot fully utilize the ontology and its logical constraints, and thus the learned sub-symbolic knowledge is incomplete. This project aims to inject the knowledge expressed by an ontological schema into KG embeddings. "Research question: How to represent the logical relationship of an ontology in the embedding space? How to jointly learn the embedding of the KG and its ontology? Expection: embedding algorithms for KG and its ontology." Desirable: Experience with satellite imagery, exposure in basic group theory. Prerequisites: 1. Chen, Jiaoyan, et al. ""OWL2Vec*: Embedding of OWL Ontologies."" arXiv preprint arXiv:2009.14654 (2020). 2. Zhang, Wen, et al. ""Iteratively learning embeddings and rules for knowledge graph reasoning."" The World Wide Web Conference. 2019. 3. Hao, Junheng, et al. ""Universal representation learning of knowledge bases by jointly embedding instances and ontological concepts."" KDD 2019. 4. Chekol, Melisachew Wudage, and Giuseppe Pirrò. ""Refining Node Embeddings via Semantic Proximity."" ISWC 2020." show / hide details

Compilation of a CSP-like language Geraint Jones Programming Languages B C MSc

This is a compiler project, also requiring familiarity with concurrency.

The parallel programming language occam is essentially an implementable sublanguage of CSP. The aim of this project is to produce a small portable implementation of a subset of occam; the proposed technique is to implement a virtual machine based on the inmos transputer, and a compiler which targets that language.

One of the aims of the project is to implement an extension to occam which permits recursion; more ambitious projects might implement a distributed implementation with several communicating copies of the virtual machine. Other possibilities are to produce separate virtual machines, optimised for displaying a simulation, or for efficiency of implementation, or translating the virtual machine code into native code for a real machine.

Logic circuit workbench Geraint Jones Programming Languages B C MSc

This is an interactive programming project with some logic simulation behind it.

The idea is to produce a simulator for a traditional logic breadboard. The user should be able to construct a logic circuit by drawing components from a library of standard gates and latches and so on, and connecting them together in allowable ways. It should then be possible to simulate the behaviour of the circuit, to control its inputs in various ways, and to display its outputs and the values of internal signals in various ways.

The simulator should be able to enforce design rules (such as those about not connecting standard outputs together, or limiting fan-out) but should also cope with partially completed circuits; it might be able to implement circuits described in terms of replicated sub-circuits; it should also be able to some sort of standard netlist.

It might be that this would make a project for two undergraduates: one implementing the interface, the other implementing a simulator that runs the logic.

Modelling of arithmetic circuits Geraint Jones Programming Languages B C MSc

This is a project in the specification of hardware, which I expect to make use of functional programming.

There is a great deal of knowledge about ways (that are good by various measures) of implementing standard arithmetic operations in hardware. However, most presentations of these circuits are at a very low level, involving examples, diagrams, and many subscripts, for example these descriptions.

The aim of this project is to describe circuits like this in a higher-level way by using the higher order functions of functional programming to represent the structure of the circuit. It should certainly be possible to execute instances of the descriptions as simulations of circuits (by plugging in simulations of the component gates), but the same descriptions might well be used to generate circuit netlists for particular instances of the circuit, and even to produce the diagrams.

Toys for Animating Mathematics Geraint Jones Programming Languages B C

The aim is to take some mathematics that would be within the grasp of a mathematically-inclined sixth-former and turn it into some attention-grabbing web pages. Ideally this reveals a connection with computing science. I imagine that this project necessarily involves some sort of animation, and I have visions of Open University television maths lectures.

The programming need not be the most important part of this project, though, because some of the work is in choosing a topic and designing problems and puzzles and the like around it. There's a lot of this sort of thing about, though, so it would be necessary to be a bit original.

Think of GeomLab (and then perhaps think of something a little less ambitious). It might involve logic and proof, it might be about sequences and series, it might be about graphs, it might be about the mathematics of cryptography... you might have something else in mind.

Extensible Search-Based Theorem Prover Mark Kaminski Data and Knowledge C MSc "Background: Search-based proof methods such as those based on DPLL, tableau and connection calculi are widely used in automated theorem proving, ontological reasoning, and automated verification. As most implementations of such methods are developed with a particular logic, algorithm, and set of applications in mind, it is often difficult to extend them with new features. Project goals: The project aims to develop a modular, extensible base for implementing search-based decision procedures for propositional, modal, and description logics. A key objective is to develop an abstract proof search engine that can be instantiated with different logics, proof rules, and search strategies. The capabilities of the engine should be demonstrated by implementing a simple tableau-based or DPLL-based reasoner for propositional or modal logic. Prerequisites: good functional programming skills, familiarity with logic and automated theorem proving. show / hide details

Time-Bandits: Varun Kanade, Thomas Orton Artificial Intelligence and Machine Learning C Consider the following problem: at each time step, you receive a dataset D_t you would like to fit (using e.g. a neural network or statistical model). You need to choose a hyper-parameter p_t for the model, and then you can run an optimization procedure (e.g. stochastic gradient descent) for time T_{p_t} with this hyper-parameter. Afterwards, you can measure the validation loss l_{t} of the fitted model on the dataset. If you are happy with the validation loss, you can move to the next time step. Otherwise, you can try to choose a new hyper-parameter (spending more compute time) and re-fit the model until you have an acceptable validation loss. > > The field of Multi Armed Bandits ( MAB ) is a rich research area which broadly answers the question of how to make decisions while minimizing regret. This has many applications in Machine Learning, Reinforcement Learning, Portfolio Optimization and Economics. However, there is an obstacle in applying solutions to the MAB for the above problem. This is because MAB is typically formalized by considering a loss function over actions, and considering how to pick a single action at each time step to minimize regret. In contrast, in our problem you may pick as many actions as you like (provided you pay for them with time), and you receive the minimum loss over the actions which you took at each time step. The idea of modelling time trade-offs is also a fundamental issue in practical machine learning which has only more recently been receiving attention in a theoretical context. > > The following project is ideal for a mathematically inclined 4th year student with the following objectives: > > 1. Develop some practically motivated formalisms for modelling time resources in the MAB setting. > 2. Develop a good understanding of existing MAB literature and how it relates to the time-resource MAB variants considered. > 3. Provide algorithmic solutions and theoretical lower-bounds to the time-resource MAB variants considered. show / hide details

A fast numerical solver for multiphase flow David Kay Computational Biology and Health Informatics B C MSc

The Allen-Cahn equation is a differential equation used to model the phase separation of two, or more, alloys. This model may also be used to model cell motility, including chemotaxis and cell division. The numerical approximation, via a finite difference scheme, ultimately leads to a large system of linear equation. In this project, using numerical linear algebra techniques, we will develop a computational solver for the linear systems. We will then investigate the robustness of the proposed solver.

Efficient solution of one dimensional airflow models David Kay Computational Biology and Health Informatics B C MSc In this project we will investigate the use of numerical and computational methods to efficiently solve the linear system of equations arising from a one dimensional airflow model within a network of tree like branches. This work will build upon the methods presented within the first year Linear Algebra and Continuous Maths courses. All software to be developed can be in the student's preferred programming language. General area: Design and computational implementation of fast and reliable numerical models arising from biological phenomena. show / hide details

General topics in the area of Steganography and Cryptography Andrew Ker Security B C Supervision of general topics in the area of Steganography and Cryptography? show / hide details

Bots for a Board Game Stefan Kiefer Automated Verification B C The goal of this project is to develop bots for the board game Hex. In a previous project, an interface was created to allow future students to pit their game-playing engines against each other. In this project the goal is to program a strong Hex engine. The student may choose the algorithms that underly the engine, such as alpha-beta search, Monte-Carlo tree search, or neural networks. The available interface allows a comparison between different engines. It is hoped that these comparisons will show that the students' engines become stronger over the years. The project involves reading about game-playing algorithms, selecting promising algorithms and datastructures, and design and development of software (in Java or Scala). show / hide details

Small matrices for irrational nonnegative matrix factorisation Stefan Kiefer Automated Verification B C MSc Recently it was shown (https://arxiv.org/abs/1605.06848) that there is a 6x11-matrix M with rational nonnegative entries so that for any factorisation M = W H (where W is a 6x5-matrix with nonnegative entires, and H is a 5x11-matrix with nonnegative entries) some entries of W and H need to be irrational. The goal of this project is to explore if the number of columns of M can be chosen below 11, perhaps by dropping some columns of M. This will require some geometric reasoning. The project will likely involve the use of tools (such as SMT solvers) that check systems of nonlinear inequalities for solvability. Prerequisites: The project is mathematical in nature, and a good recollection of linear algebra is needed. Openness towards programming and the use of tools is helpful. show / hide details

Truthful scheduling for graphs Elias Koutsoupias Algorithms and Complexity Theory B C MSc The aim of the project is to advance our understanding of the limitations of mechanism design for the scheduling problem, the "holy grail" of algorithmic mechanism design. Specifically, we will consider the graphical scheduling problem, in which every task can be only allocated to two machines, and study the approximation ratio of mechanisms for this setting. The aim is to prove both lower and upper bounds. Both directions are hard problems, and we plan to try to gain some understanding by experimentally searching for lower bounds or trying to verify potentially useful statements about the upper bound. Of particular importance is the case of trees and their special case of stars, i.e., when every task can be given either to the root or to a particular leaf. We plan to study not only the standard objective of the makespan, but the more general class of objectives in which the mechanism minimizes the L^p norm of the times of the machines. The case L^infinity is to minimize the makespan, L^1 is to minimize the welfare, and the case L^0 corresponds to the Nash Social Welfare problem, all of which are interesting problems. Further possible directions include considering fractional scheduling and mechanisms without money. Bibliography: George Christodoulou, Elias Koutsoupias, Annamária Kovács: Truthful Allocation in Graphs and Hypergraphs. ICALP 2021: 56:1-56:20 (https://arxiv.org/abs/2106.03724) show / hide details

Applications of deductive machine learning Daniel Kroening Automated Verification C MSc

The goal of deductive machine learning is to provide computers with the ability to automatically learn a behaviour that provably satisfies a given high-level specification. As opposed to techniques that generalise from incomplete specifications (e.g. examples), deductive machine learning starts with a complete problem description and develops a behaviour as a particular solution.

Potential applications of deductive machine learning are detailed below, and a student would focus on one of these items for their project. We envisage applying existing algorithms, with potential to develop new ones.

- Game playing strategy: given the specification of the winning criteria for a two-player game, learn a winning strategy.

- Program repair: given a buggy program according to a correctness specification, learn a repair that makes the program correct.

- Lock-free data structures: learn a data structure that guarantees the progress of at least one thread when executing multi-threaded procedures, thereby helping to avoid deadlock.

- Security exploit generation: learn code that takes advantage of a security vulnerability present in a given software in order to cause unintended behaviour of that software.

- Security/cryptographic protocol: learn a protocol that performs a security-related function and potentially applies cryptographic methods.

- Compression: learn an encoding for some given data that uses fewer bits than the original representation. This can apply to both lossless and lossy compression.

Probabilistic Modelling Checking Marta Kwiatkowska Automated Verification B C MSc

Professor Marta Kwiatkowska is happy to supervise projects involving probabilistic modelling, verification and strategy synthesis. This is of interest to students taking the Probabilistic Model Checking course and/or those familiar with probabilistic programming.

Below are some concrete project proposals, but students' own suggestions will also be considered:

Synthesis of driver assistance strategies in semi-autonomous driving. Safety of advanced driver assistance systems can be improved by utilising probabilistic model checking. Recently (http://qav.comlab.ox.ac.uk/bibitem.php?key=ELK+19) a method was proposed for correct-by-construction synthesis of driver assistance systems. The method involves cognitive modelling of driver behaviour in ACT-R and employs PRISM. This project builds on these techniques to analyse complex scenarios of semi-autonomous driving such as multi-vehicle interactions at road intersections.
Equilibria-based model checking for stochastic games. Probabilistic model checking for stochastic games enables formal verification of systems where competing or collaborating entities operate in a stochastic environment. Examples include robot coordination systems and the Aloha protocol. Recently (http://qav.comlab.ox.ac.uk/papers/knps19.pdf) probabilistic model checking for stochastic games was extended to enable synthesis of strategies that are subgame perfect social welfare optimal Nash equilibria, soon to be included in the next release of PRISM-games (www.prismmodelchecker.org). This project aims to model and analyse various coordination protocols using PRISM-games.
Probabilistic programming for affective computing. Probabilistic programming facilitates the modelling of cognitive processes (http://probmods.org/). In a recent paper (http://arxiv.org/abs/1903.06445), a probabilistic programming approach to affective computing was proposed, which enables cognitive modelling of emotions and executing the models as stochastic, executable computer programs. This project builds on these approaches to develop affective models based on, e.g., this paper (http://qav.comlab.ox.ac.uk/bibitem.php?key=PK18).

Safety Assurance for Deep Neural Networks Marta Kwiatkowska Automated Verification B C MSc

Safety Assurance for Deep Neural Networks

Professor Marta Kwiatkowska is happy to supervise projects in the area of safety assurance and automated verification for deep learning, including Bayesian neural networks. For recent papers on this topic seehttp://qav.comlab.ox.ac.uk/bibitem.php?key=WWRHK+19, http://qav.comlab.ox.ac.uk/bibitem.php?key=RHK18 and http://qav.comlab.ox.ac.uk/bibitem.php?key=CKLPPW+19, and also https://www.youtube.com/watch?v=XHdVnGxQBfQ.

Below are some concrete project proposals, but students' own suggestions will also be considered:

Robustness of attention-based sentiment analysis models to substitutions. Neural network models for NLP tasks such as sentiment analysis are susceptible to adversarial examples. In a recent paper (https://www.aclweb.org/anthology/D19-1419/) a method was proposed for verifying robustness of NLP tasks to symbol and word substitutions. The method was evaluated on CNN models. This project aims to develop similar techniques for attention-based NLP models (www-nlp.stanford.edu/pubs/emnlp15_attn.pdf).
Attribution-based safety testing of deep neural networks. Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. In a recent paper (http://qav.comlab.ox.ac.uk/bibitem.php?key=WWRHK+19) a game-based method was proposed for robustness evaluation, which can be used to provide saliency analysis. This project aims to extend these techniques with the attribution method (http://arxiv.org/abs/1902.02302) to produce a methodology for computing the causal effect of each feature and evaluate it on image data.
Uncertainty quantification for end-to-end neural network controllers. NVIDIA has created a deep learning system for end-to-end driving called PilotNet (http://devblogs.nvidia.com/parallelforall/explaining-deep-learning-self-driving-car/). It inputs camera images and produces a steering angle. The network is trained on data from cars being driven by real drivers, but it is also possible to use the Carla simulator. In a recent paper (http://arxiv.org/abs/1909.09884) a robustness analysis with statistical guarantees for different driving conditions was carried out for a Bayesian variant of the network. This project aims to develop a methodology based on these techniques and semantic transformation of weather conditions (see http://proceedings.mlr.press/v87/wenzel18a/wenzel18a.pdf) to evaluate the robustness of PilotNet or similar end-to-end controllers in a variety of scenarios.

Dashcam analysis Harjinder Lallie B C MSc

There exist no current guidelines and very few tools to aid the investigation of a dashcam device, particularly for the purpose of extracting and mapping geospatial data contained therein. This project aims to extract, map and chart geospatial data from a dashcam device and provide insights into the routes and speeds taken by a passenger.You will be provided with a number of dashcam forensic images (.E01 files which can easily be extracted back to MOV/MP4 files). You will be expected to develop the solution using Python, and if possible, integrate the solution with Autopsy, an open-source digital forensic tool. You might consider using an open-source tool such as Exiftool to process the EXIF data. You could choose to take this project to a somewhat different angle by choosing to extract watermark data from the dashcam footage and map that instead.

To aid in understanding the background to this project, please see the reference below and the video here: https://warwick.ac.uk/fac/sci/wmg/mediacentre/events/dfls/previousevents/dashcamforensics/

Lallie, H.S., 2020. Dashcam forensics: A preliminary analysis of 7 dashcam devices. Forensic Science International: Digital Investigation, 33, p.200910.

Prerequisite. You may want to use tools such as EXIFTOOL, an open-source EXIF data extraction tool. Although EXIFTOOL might serve your needs, you will most probably want to develop an excellent knowledge of the EXIF standard. Additional support can be provided by providing you with access to specific elements of my digital forensics course at the University of Warwick in the form of recorded lectures. That will comprise around 10 hours of learning. I will also provide you with sufficient background in the topic prior to you beginning the study. You will need good programming skills, preferably in Python.

Keyword searching audio/video files Harjinder Lallie B C MSc Digital forensic investigators often search for the existence of keywords on hard disks or other storage medium. Keywords are easily searchable in PDF/word/ text/other popular formats, however, current digital forensic tools do not allow for keyword searching through movies/audio. This is essential in cases which involve dashcam footage, recorded conversations etc. The aim of this project is to produce a tool which auto-transcribes audio data and then performs a keyword search on the transcribed file – pinpointing the point(s) in the file where the keyword(s) appear. You will be expected to develop the solution using Python, and if possible, integrate the solution with Autopsy, an open-source digital forensic tool. Prerequisite. Additional support can be provided by providing you with access to specific elements of my digital forensics course at the University of Warwick in the form of recorded lectures. That will comprise around 10 hours of learning. You are likely to use the Python SpeechRecognition and possibly the PyAudio libraries. For convenience and demonstration of practicality, you may want to integrate the solution with the open-source forensics tool – Autopsy – and hence will need to develop a good understanding of this tool particularly the keyword search facility. show / hide details

Reasoning with Causality in Digital Forensic Cases. Harjinder Lallie B C MSc Unlike traditional 'wet' forensics, digital forensics is heavily reliant on circumstantial evidence. Quite often a digital investigator is tasked with outlining the sequence of events that took place on a computer system to establish causality and a chain of events. These event sequences are reasonably straightforward to map conceptually, however, there is no solution which 'auto-links' these events. This project applies probabilistic techniques using an attack tree (or alternative) methodology to determining probable event sequences from sequence fragments available on a computer system. You will be supplied with a digital forensic image of a used hard disk, or you may choose to develop your own. You will most likely organize event sequences into categories (e.g. software installation, execution, file download, browsing etc) and then test your proposals on further hard disk images and possibly supplement your findings through qualitative feedback from law enforcement experts (I can connect you). Prerequisite. Knowledge of Autopsy, an open-source forensic tool. Additional support will be given, if required, by providing access to my online wiki self-guided tutorials and by providing access to specific elements of my digital forensics course at the University of Warwick in the form of recorded lectures. That will comprise around 10 hours of learning. I will also provide you with sufficient background in the topic prior to you beginning the study. You will need good programming skills, preferably in Python. show / hide details

Analysing concurrent datatypes in CSP Gavin Lowe Automated Verification C The aim of this project would be to model some concurrent datatypes in CSP, and to analyse them using the model checker FDR. The concurrent datatypes could be some of those studied in the Concurrent Algorithms and Datatypes course. Typical properties to be proved would be linearizability (against a suitable sequential specification) and lock freedom. Prerequisites: Concurrency and Concurrent Algorithms and Datatypes. Reading: Analysing Lock-Free Linearizable Datatypes using CSP, Gavin Lowe. show / hide details

Explainable Neural Networks Thomas Lukasiewicz Artificial Intelligence and Machine Learning MSc

Despite the increasing success of deep neural models, their general lack of interpretability is still a major drawback, carrying far-reaching consequences in safety-critical applications, such as healthcare and legal. Several directions of explaining neural models have recently been introduced, such as feature-based explanations and natural language explanations. However, there are still several major open questions, such as:

Are explanations faithfully describing the decision-making processes of the models that they aim to explain?

Can explanations for the ground-truth label that are provided during training increase model robustness and generalization capabilities?

Can we do few-shot learning of natural language explanations? What are the advantages and disadvantages of each of the multiple types of explanations (e.g., feature-based, example-based, natural language, surrogate models)? The students will be able to pick one of these open questions or propose their own. The projects will also be co-supervised by Oana-Maria Camburu, a postdoctoral researcher with strong background and contributions in this area. "strong coding skills (preferably in deep learning platforms, such as Pytorch or Tensorflow), deep learning knowledge.

References: [1] https://papers.nips.cc/paper/2018/file/4c7a167bb329bd92580a99ce422d6fa6-Paper.pdf [2] https://www.aclweb.org/anthology/2020.acl-main.771/ [3] https://arxiv.org/abs/2004.14546 [4] https://arxiv.org/abs/1910.02065 [5] https://dl.acm.org/doi/abs/10.1145/3313831.3376219 "

Fairness of AI Systems Thomas Lukasiewicz Artificial Intelligence and Machine Learning MSc

Having fair AI systems is critical for the deployment of these systems in society. Measuring the fairness of a system is currently achieved, for example, by using diagnostic datasets [1]. However, current diagnostic datasets may still be unfair [2]. The goal of this project is to investigate the potential sources of unfairness in existing diagnostic datasets, and to propose solutions for fixing these datasets or to gather new diagnostic datasets grounded in linguistic theory that would not suffer from unfairness. For example, for assessing gender bias in NLP models, current diagnostic datasets (e.g., GAP [1], WinoGender [3], WinoBias [4]) are either synthetic or have unbalanced properties that do not allow for correct bias measuring [2]. One could gather a new real-world dataset for assessing gender bias in a similar way GAP was gathered [1] (by selecting paragraphs from Wikipedia that adhere to certain rules), but, additionally, restricting to instances where it is possible to obtain a counterpart with swapped genders.

The students will also be able to propose their own projects in this area. The project will also be co-supervised by Oana-Maria Camburu (postdoctoral researcher) and Vid Kocijan (final year DPhil student), who have strong background and contributions in this area.

Strong coding skills (preferably in deep learning platforms, such as Pytorch or Tensorflow), deep learning knowledge.

References:
[1] https://www.aclweb.org/anthology/Q18-1042.pdf
[2] https://arxiv.org/abs/2011.01837
[3] https://arxiv.org/abs/1804.09301
[4] https://arxiv.org/abs/1804.06876

Moving Object Detection using Transformers Thomas Lukasiewicz Artificial Intelligence and Machine Learning MSc

Moving Object Detection using Transformers Transformers have become a popular choice for many machine learning applications. They have been used with great success for NLP tasks and have more recently been used in the vision domain to tackle object detection in an end-to-end fashion [1]. The aim of this project is to extend the work of [1] and apply it to the moving object detection problem. In a typical object detection framework, the goal is to identify and localise all of the objects of interest in a single image. The moving object detection problem is a generalisation where the goal is to identify and localise only the moving objects in a sequence of images. For example, this would allow a model to distinguish moving vehicles from parked vehicles. You will have access to Calipsa's real-world CCTV dataset to train and test models. The dataset is challenging since the image sequences are temporally sparse and the time difference between consecutive frames is variable. The image quality is also highly variable. [1] End-to-End Object Detection with Transformers, Carion et al. https://arxiv.org/pdf/2005.12872.pdf"

Prerequisites:

Strong Python Coder, experience with TensorFlow, experience with deep learning especially computer vision applications.

Moving Object Detection: A weakly-Supervised Approach Thomas Lukasiewicz Artificial Intelligence and Machine Learning MSc

The goal of moving object detection is to identify and localise only the moving objects in a sequence of images. For example, this would allow a model to distinguish moving vehicles from parked vehicles. Typically, object detectors (moving or otherwise) are trained using highly supervised data. Human annotators draw boxes around each object in each frame of an image sequence. This is time consuming and expensive. Similarly to [1], the aim of this project is to explore the possibility of using a weaker level of supervision (image level labels, rather than boxes) to identify and locate moving objects. You will have access to Calipsa's real-world CCTV dataset to train and test models. The dataset is challenging since the image sequences are temporally sparse and the time difference between consecutive frames is variable. The image quality is also highly variable. [1] Is object localization for free? – Weakly-supervised learning with convolutional neural networks, Sivic et al. https://www.di.ens.fr/~josef/publications/Oquab15.pdf Prerequisites:

"Requirements: Strong python coder, experience with Tensorflow, experience with deep learning, especially computer vision applications."

Analysis of wildlife tracking data Andrew Markham Cyber Physical Systems MSc We have developed ultra-low power wireless tracking tags which have been deployed on badgers, hares and swans in the wild. The sensors mainly measure 3-D acceleration, but some data is also from our world-first underground 3-D tracking system. The sensors are typically sampled at 8 Hz and collars run for between 2 and 3 months. This has generated a large dataset (> 500 million samples) of animal behaviour traces. For some of these deployments, there are secondary (ground truth) data including video and RFID. The challenge is to develop tools to analyse these datasets and present results to end-users (zoologists) in a user-friendly way. Such a tool would perform classification (typically unsupervised) on behaviour traces, generating classes of behaviour. At a very coarse level, this could simply be sleeping vs active, but the richer the classification, the better for automatic generation of ethograms. This classification can also be used online to alter the behaviour of the tracking tags themselves, such as to reduce the sampling rate when an animal is resting to prolong the lifetime of the collar. Suitable for MSc student interested in statistical analysis/data mining and interdisciplinary science show / hide details

Effective Categorical Reasoning Dan Marsden Foundations, Structures, and Quantum MSc

Category theory is an important tool in theoretical computer science. In introductory courses proofs are generally conducted by pasting commuting diagrams together, or simple equational reasoning. There are further proof styles in category theory, for example using string diagrams, "Yoneda style" proofs and the use of internal logics. The aim of this project would be to investigate and contrast these different approaches, showing how, and when, they can be used effectively, on realistic problems. A concrete starting point would be to understanding the techniques involved, and then apply them to non trivial proofs from the literature in order to demonstrate their relative benefits. An ideal outcome would be an example based account for computer scientists of how to reason efficiently in category theory.

Prerequisites: A good understanding of elementary category theory and comfort with mathematical proofs is essential. Some experience with string diagrams, as for example in the quantum computer science courses in the department would also be helpful.

Assessing the performance of stochastic optimization methods in structural modelling Peter Minary Computational Biology and Health Informatics B C

The efficiency of Monte Carlo (MC) based stochastic optimization methods (e.g. Kirkpatrick, et al. Science, 220, 671–680 (1983)) are compared to find low energy conformational states of a given system. We are particularly interested in MC protocols where the temperature varies as a series of impulses followed by relaxation and only the temperature of a given part (e.g. site of interest) of the system is changed.

Prerequisites: Recommended for students who has done the Probability and Computing and Geometric Modelling courses and have interest in randomized search methods and their applications in structural biology.

Developing computational tools to aid the design of CRISPR/Cas9 gene editing experiments Peter Minary Computational Biology and Health Informatics C MSc

At present, the most versatile and widely used gene-editing tool is the CRISPR/Cas9 system, which is composed of a Cas9 nuclease and a short oligonucleotide guide RNA (or guide) that guides the Cas9 nuclease to the targeted DNA sequence (on-target) through complementary binding. There are a large number of computational tools to design highly specific and efficient CRISPR-Cas9 guides but there is a great variation in performance and lack of consensus among the tools. We aim to use ensemble learning to combine the benefits of a selected set of guide design tools to reach superior performance compared to any single method in predicting the efficiency of guides (for which experimental data on their efficiency is available) correctly.

Recommended for students who has done the Machine Learning and the Probability and Computing courses.

Software development for computational structural biology Peter Minary Computational Biology and Health Informatics B C

Several projects are available to implement new algorithms or protocols into the MOSAICS (http://www.cs.ox.ac.uk/mosaics) software package. The first project would aim for the development of new analysis tools to interpret the simulation result. For example, the new protocol would take (input) a structural similarity mearure and a trajectory of simulated conformations and would produce (output) a measure of structural diversity of conformations visited. In the second project, students would implement and compare different physical models to describe hydrogen bonding, which is among the most important canonical interactions that stabilize the double helical DNA.

Prerequisites:

The project is recommended for students who took the Geometric Modelling and Computer Animation courses as well as have some interest in Numerical Methods (e.g. solution of Ordinary Differential Equations) and their applications to atomistic simulations.

Tools and applications in computational epigenetics Peter Minary Computational Biology and Health Informatics C

Chemical modifications such as (hydroxy)methylation on nucleic acids are used by the cell for silencing and activation of genes. These so called epigenetic marks can be recognized by 'protein readers' indirectly due to their structural 'imprints', the effects they impose on DNA structure. The project include the development of computational protocols to assess the effect of epigenetic modifications on DNA structure. This research may shed light on how different epigentic modifications affect the helical parameters of the double stranded DNA.

Prerequisites:

Strong interest in visualizing, analysing and comparing 3D objects and modelling molecular structures. The project can be tailored to suit those from a variety of backgrounds but would benefit from having taken the following courses: Computer Graphics, Geometric Modelling and Computer Animation.

Topics in Automata Theory, Program Verification and Programming Languages (including lambda calculus and categorical semantics) Andrzej Murawski Programming Languages C MSc

Prof Murawski is willing to supervise in the area of automata theory, program verification and programming languages (broadly construed, including lambda calculus and categorical semantics).

For a taste of potential projects, follow this link.

An Electronic Commerce Protocol Hanno Nickau Foundations, Structures, and Quantum B C

Commercial use of the Internet is becoming more and more common, with an increasing variety of goods becoming available for purchase over the Net. Clearly, we want such purchases to be carried out securely: a customer wants to be sure of what (s)he's buying and the price (s)he's paying; the merchant wants to be sure of receiving payment; both sides want to end up with evidence of the transaction, in case the other side denies it took place; the act of purchase should not leak secrets, such as credit card details, to an eavesdropper.

The aim of this project is to find out more about the protocols that are used for electronic commerce, and to implement a simple e-commerce protocol. In more detail:

Understand the requirements of e-commerce protocols;

Specify an e-commerce protocol, both in terms of its functional and security requirements;

Understand cryptographic techniques;
Understand how these cryptographic techniques can be combined together to create a secure protocol - and understand the weaknesses that allow some protocols to be attacked;
Design a protocol to meet the requirements identified;
Implement the protocol.

A variant of this project would be to implement a protocol for voting on the web (which would have a different set of security properties).

Prerequisites for this project include good program design and implementation skills, including some experience of object-oriented programming, and a willingness to learn about protocols and cryptography. The courses on concurrency and distributed systems provide useful background for this project.

1 Jonathan Knudsen, Java Cryptography, O'Reilly, 1998.

Bootstrapping document annotation with Schema.org Nadeschda Nikitina Data and Knowledge MSc

The recently started Schema.org initiative of the major search engine providers aims at fostering semantic annotations across the Web. You can read about it here. Semi-automatic annotation of natural language documents is a long-standing problem area. The goal of this project would be to apply state-of-the-art annotation techniques to a large corpus based on the Schema.org semantic model.

Prerequisites

Some familiarity with topics from Computational Linguistics and Knowledge, Representation and Reasoning.

Factorised Databases Dan Olteanu Data and Knowledge B C MSc

More details can be found at (http://www.cs.ox.ac.uk/people/dan.olteanu.html) and Dr Olteanu would be happy to discuss specific projects within the aforementioned topics with interested students.

https://fdbresearch.github.io

Branching Temporal Logics, Automata and Games Luke Ong Programming Languages B C MSc

Model checking has emerged as a powerful method for the formal verification of programs. Temporal logics such as CTL (computational tree logic) and CTL* are widely used to specify programs because they are expressive and easy to understand. Given an abstract model of a program, a model checker (which typically implements the acceptance problem for a class of automata) verifies whether the model meets a given specification. A conceptually attractive method for solving the model checking problem is by reducing it to the solution of (a suitable subclass of) parity games. These are a type of two player infinite game played on a finite graph. The project concerns the connexions between the temporal logics CTL and / or CTL*, automata, and games. Some of the following directions may be explored. 1. Representing CTL / CTL* as classes of alternating tree automata. 2. Inter-translation between CTL / CTL* and classes of alternating tree automata 3. Using B¨uchi games and other subclasses of parity games to analyse the CTL / CTL* model checking problem 4. Efficient implementation of model checking algorithms 5. Application of the model checker to higher-order model checking.

References:

Orna Kupferman, Moshe Y. Vardi, Pierre Wolper: An automata-theoretic approach to branchingtime model checking. J. ACM 47(2): 312-360 (2000).
http://dx.doi.org/10.1145/333979.333987

Rachel Bailey: A Comparative Study of Alorithmics for Solving B¨uchi Games. University of Oxford MSc Dissertation, 2010.
http://www.cs.ox.ac.uk/people/luke.ong/personal/publications/RachelBailey_MScdissertation.pdf

Foundational Structures for Concurrency Luke Ong Programming Languages MSc

http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html

Game Semantics and Linear Logic Luke Ong Programming Languages MSc

http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html

Lambda Calculus Luke Ong Programming Languages MSc

See http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html

Luke Ong's Projects Luke Ong Programming Languages B C MSc

See http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html for more information

Semantics of Programming Languages Luke Ong Programming Languages MSc

http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html

Types, Proofs and Categorical Logic Luke Ong Programming Languages MSc

http://www.cs.ox.ac.uk/people/luke.ong/personal/lukeong-projects-19-20.html

Building databases of mathematical objects in Sagemath (Python) Dmitrii Pasechnik Algorithms and Complexity Theory B C MSc

There is an enormous amount of information on constructing various sorts of ``interesting'', in one or another way, mathematical objects, e.g.

block designs, linear and non-linear codes, Hadamard matrices, elliptic curves, etc. There is considerable interest in having this information available in computer-ready form. However, usually the only available form is a paper describing the construction, while no computer code (and often no detailed description of a possible implementation) is provided. This provides interesting algorithmic and software engineering challenges in creating verifiable implementations; properly structured and documented code, supplemented by unit tests, has to be provided, preferably in functional programming style (although performance is important too).

Sagemath project aims in part to remedy this, by implementing such constructions, see e.g. Hadamard matrices in Sagemath: http://doc.sagemath.org/html/en/reference/combinat/sage/combinat/matrices/hadamard_matrix.html and http://arxiv.org/abs/1601.00181.

The project will contribute to such implementations.

There might be a possibility for participation in Google Summer of Code (GSoC) with Sagemath as a GSoC organisation, and being partially funded by the EU project ``Open Digital Research Environment Toolkit for the Advancement of Mathematics''

http://opendreamkit.org/.

Prerequisites: Interest in open source software, some knowledge of Python, some maths background.

Computing with semi-algebraic sets in Sage (math) Dmitrii Pasechnik Algorithms and Complexity Theory B C MSc

Semi-algebraic sets are subsets of R^n specified by polynomial inequalities. The project will extend capabilities of Sage (http://www.sagemath.org) to deal with them, such as CADs computations or sums of squares based (i.e. semi-definite programming based) methods. There might be a possibility for participation in Google Summer of Code (GSoC) or Google Semester of Code with Sage as a GSoC organisation.

Prerequisites

Interest in open source software, some knowledge of Python, appropriate maths background.

Implementing and experimenting with variants of Weisfeiler-Leman graph stabilisation Dmitrii Pasechnik Algorithms and Complexity Theory B C MSc Weisfeiler-Leman graph stabilisation is a procedure that plays an important role in modern graph isomorphism algorithms (L.Babai 2015) and enjoyed attention in machine learning community recently (cf. "Weisfeiler-Lehman graph kernels"). While it is relatively easy to implement, efficient portable implementations seem to be hard to find. In this project we will work on producing such an open-source implementation, either by reworking our old code from http://arxiv.org/abs/1002.1921 or producing new one; we will also work on providing Python/Cython interfaces for a possible inclusion in SageMath (http://sagemath.org) library, and/or elsewhere. show / hide details

Automatic translation to GPGPU Joe Pitt-Francis Computational Biology and Health Informatics B C MSc

This project involves running cardiac cell models on a high-end GPU card. Each model simulates the electrophysiology of a single heart cell and can be subjected to a series of computational experiments (such as being paced at particular heart rates). For more information about the science and to see it in action on CPU see "Cardiac Electrophysiology Web Lab" at https://travis.cs.ox.ac.uk/FunctionalCuration/ An existing compiler (implemented in Python) is able to translate from a domain specific XML language (http://models.cellml.org) into a C++ implementation. The goal of the project is to add functionality to the compiler in order to get OpenCL or CUDA implementations of the same cell models and to thus increase the efficiency of the "Web Lab".

General GPGPU and high performance computing projects Joe Pitt-Francis Computational Biology and Health Informatics B C MSc

I am willing to supervise projects which fit into the general areas of General Purpose Graphics Processing Unit (GPGPU) programming and High-Performance Computing (HPC). Specific technologies used are likely to based around

NVIDIA CUDA for GPU programming;
MPI for distributed-memory cluster computing.

All application areas considered although geometric algorithms are favourites of mine.

General graphics projects Joe Pitt-Francis Computational Biology and Health Informatics B C MSc

I am interested in supervising general projects in the area of computer graphics. If you have a particular area of graphics-related research that you are keen to explore then we can tailor a bespoke project for you. Specific projects I have supervised in the past include

"natural tree generation" which involved using Lindenmayer systems to grow realistic looking bushes and trees to be rendered in a scene;
"procedural landscape generation" in which an island world could be generated on-the-fly using a set of simple rules as a user explored it;
"gesture recognition" where a human could control a simple interface using hand-gestures;
"parallel ray-tracing" on distributed-memory clusters and using multiple threads on a GPU card;
"radiosity modelling" used for analysing the distribution of RFID radio signal inside a building; and
"non-photorealistic rendering" where various models were rendered with toon/cel shaders and a set of pencil-sketch shaders.

MSc students should note that in order for this option to work as a potential MSc project then it should be combined with a taught-course topic such as machine learning, concurrent programming, linguistics etc.

Graphics pipeline animator Joe Pitt-Francis Computational Biology and Health Informatics B

Pre-requisites: Computer graphics, Object-oriented programming

The idea behind this project is to build an educational tool which enables the stages of the graphics pipeline to be visualised. One might imagine the pipeline being represented by a sequence of windows; the user is able to manipulate a model in the first window and watch the progress of her modifications in the subsequent windows. Alternatively, the pipeline might be represented by an annotated slider widget; the user inputs a model and then she moves the slider down the pipeline, watching an animation of the process

Intuitive exploration through novel visualisation Joe Pitt-Francis Computational Biology and Health Informatics B C I am interested in novel visualisation as a way to represent things in a more appealing and intuitive way. For example the Gnome disk usage analyzer (Baobab) uses either a "ring chart" or "treemap chart" Representation to show us which sub-folders are using the most disk. In the early 1990s the IRIX file system navigator used a 3D skyscraper representation to show us similar information. There are plenty more ways of representing disk usage: from DAGs to centralised Voronoi diagrams. What kind of representation is most intuitive for finding a file which hogging disk-space and which is most intuitive for helping us to remember where something is located in the file-system tree? The aim is to explore other places where visualisation gain intuition: for example, to visualise the output of a profiler to find bottlenecks in software, to visual a code coverage tool in order to check that test-suites are are testing the appropriate functionality or even to visualise the prevalence of diabetes and heart disease in various regions of the country. show / hide details

Computation of particle interactions in n-dimensional space Martin Robinson Computational Biology and Health Informatics C MSc

"Aboria (https://github.com/martinjrobins/Aboria) is a C++ library for evaluating and solving systems of equations that can be described as interactions between particles in n-dimensional space. It can be used as a high performance library to implement numerical methods such as Molecular Dynamics in computational chemistry, or Gaussian Processes for machine learning.

Project 1: Aboria features a radial neighbour search to find nearby particles in the n-dimensional space, in order to calculate their interactions. This project will implement a new algorithm based on calculating the interactions between neighbouring *clusters* of particles. Its performance will be compared against the existing implementation, and across the different spatial data structures used by Aboria (Cell-list, Octree, Kdtree). Prerequisites: C++

Project 2: Aboria features a serial Fast Multipole Algorithm (FMM) for evaluating smooth long range interactions between particles. This project will implement and profile a parallel FMM algorithm using CUDA and/or the Thrust library.

Prerequisites: C++, Knowledge of GPU programming using CUDA and/or Thrust

Project 3: The main bottleneck of the FMM is the interactions between well-separated particle clusters, which can be described as low-rank matrix operations. This project will explore different methods compressing these matrices in order to improve performance, using either Singular Value Decomposition (SVD), Randomised" SVD, or Adaptive Cross Approximation

Prerequisites: C++, Linear Algebra

Statistical shape atlas of cardiac anatomy Blanca Rodriguez, Pablo Lamata Computational Biology and Health Informatics MSc

Description: Cardiac remodelling is the change in shape of the anatomy due to disease processes. 3D computational meshes encode shape variation in cardiac anatomy, and render higher diagnostic than conventional geometrical metrics. Better shape metrics will enable an earlier detection, a more accurate stratification of disease, and more reliable evaluation of remodelling response. This project will contribute to the development of a toolkit for the construction of anatomical atlases of cardiac anatomy, and its translation to clinical adoption. The student will learn about the challenges and opportunities of the cross-disciplinary field between image analysis, computational modelling and cardiology, and be part of a project with a big potential impact on the management of cardiovascular diseases.

Prerequisites: Motivation. Good programming skills. Experience with computational graphics and image analysis is an advantage.

Applications of ad hoc security Bill Roscoe Security MSc My group have developed novel technology for developing security between pairs and groups of devices such as mobile phones. This technology clearly has many potential applications: the objective of this project is to build an implementation directed at an application of the student's choosing, for example email, healthcase or social networking. See my papers with Bangdao Chen, Ronald Kainda and Long Nguyen on my web page. Prerequisites: Security show / hide details

Developing and using SVA Bill Roscoe Security MSc SVA is a front end for FDR that analyses shared variable programs. See Chapters 18 and 19 of Understanding Concurrent Systems. It has the potential to support a number of projects including: (a) Building a version that accepts a subset of some standard programming language as opposed to the custom language it presently uses. (b) Extending its functionality (c) Case studies using it. (a) and (b) would both involve programming in CSP and JAVA, the two languages SVA is written in. Prerequisites: Concurrency for (a) and (b) show / hide details

Modelling and verifying systems in Timed CSP and FDR Bill Roscoe Security B C MSc

Timed CSP reinterprets the CSP language in a real-time setting and has a semantics in which the exact times of events are recorded as well as their order. Originally devised in the 1980s, it has only just been implemented as an alternative mode for FDR. The objective of this project is to take one or more examples of timed concurrent system from the literature, implement them in Timed CSP, and where possible compare the performance of these models with similar examples running on other analysis tools such as Uppaal.

References:

(Reference Understanding Concurrent Systems, especially Chapter 15, and Model Checking Timed CSP, from AWR's web list of publications)

Proving circuit lower bounds using SAT solvers Rahul Santhanam, Jan Pich Algorithms and Complexity Theory B C MSc We would be happy to supervise a project on SAT solving of circuit lower bounds. The goal of the project is to experimentally verify simple finitistic versions of important conjectures in the theory of computation. Proving lower bounds on size of circuits computing explicit Boolean functions is one of the most fundamental problems in the theory of computation, underlying questions such as the famous P versus NP problem. Intuitively, the task of proving a circuit lower bound can be described as follows: given an explicit Boolean function show that it cannot be computed by a small circuit (assuming it is indeed the case). This fundamental problem has strong connections to cryptography and learning algorithms. Circuit lower bounds have been, however, notoriously difficult to prove. For these reasons we would like to check out if computers could help us to gain some insight into the problem. As a first step, we will verify if the existing class of algorithms known as SAT solvers could be used for this purpose. SAT solvers are algorithms designed to decide satisfiability of propositional formulas, which turned out surprisingly powerful in real-world applications. Circuit lower bounds can be expressed as propositional formulas as well. The project will involve implementation of the encoding of some simple circuit lower bounds into propositional formulas and testing if SAT solvers can resolve these formulas in a reasonable time. We expect that SAT solvers will be able to solve only very simple cases of the problem. In general, the problem is expected to be hard for all SAT solvers. More advanced directions could involve modifications of SAT solvers with new heuristics. In principle, this approach could also lead us to a construction of new learning algorithms. show / hide details

Automatically synthesizing Spark programs Amir Shaikhha Data and Knowledge B C MSc There are different ways to write a query processing algorithm (such as a join algorithm) using Spark. For different data sizes and based on the number of worker nodes, the best Spark implementation may vary. The aim of this project is to use program synthesis techniques to automatically derive the most efficient Spark program based on the given data information and the given platform. show / hide details

Data Compression Schemes as Compiler Transformations Amir Shaikhha Data and Knowledge B C Compression algorithms can improve the runtime performance and memory usage of database systems. This improvement is more obvious in the context of column-store database systems. On the other hand, with the advent of in-memory database systems, query compilers are getting more important. As a result, using compilation techniques in database development is gaining more attention. DBLAB is a framework for building database systems using high-level programming, producing optimized code in a low-level language like C. This is achieved through compilation techniques implemented in an extensible optimizing compiler called SC (Systems Compiler). The suggested project combines these two aspects. To be more concrete, we would like to implement different compression schemes using compilation transformations, which will be encoded using SC and added to DBLAB. show / hide details

Model-Based Machine Learning for Optimizing Data Analytics Amir Shaikhha Data and Knowledge B C MSc Model-based machine learning advocates for using probabilistic models to learn patterns on data. Probabilistic programming languages are the key frameworks for expressing such machine learning models. The aim of this project is to use these frameworks for optimizing data analytics workloads. More specifically, the focus of this project is on finding more precise estimates for cardinalities of intermediate relations in database queries, using probabilistic programming languages. show / hide details

A simple object-oriented language Michael Spivey Programming Languages B C MSc

Use Keiko to implement a simple language that is purely object-oriented. Study the compromises that must be made to get reasonable performance, comparing your implementation with Smalltalk, Ruby or Scala.