- Details
- Published on Tuesday, 19 August 2014 01:36

Another super nice event that happened in Berlin in July was the second Shogun workshop that me and the Shogun team organised. We had a hands-on session and a main workshop day full of talks, cool people, and lots of Machine Learning. It is a great pleasure working with the Shogun team, and I am very happy that we were able to push this year's workshop through.

I talked a bit about what Shogun is and what it tries to be, and gave some directions for future work. If you missed the event, we got full video coverage!

- Details
- Published on Tuesday, 19 August 2014 01:24

It's already a while ago, in July, I presented Shogun at the EuroPython in Berlin. This was a super nice conference full of interesting people and projects. My talk was recorded and can be found [here], slides are [here].

Another super interesting project was presented by Thomas Wiecki: Probabilistic programming with PyMC3 [video]. This is a very interesting project, a bit similar to Stan, which allows to plug together probabilistic models and then do HMC for inference, powered by automatic differentiation (which I am currently super excited about). Unlike Stan, this happens all in Python which makes things super comfortable (example notebook). As I am working on kernel-based samplers (Kameleon MCMC & soon more), this might be a good vehicle to make them public, exploiting the nice auto-diff tool (Theano) that PyMC uses.

- Details
- Published on Wednesday, 25 June 2014 22:07

$\DeclareMathOperator*{\argmin}{arg\,min}$ $\def\Mz{M_{\mathbf{z},y}}$

In this year's ICML in Bejing, Arthur Gretton, presented our paper on **Kernel Adaptive Metropolis Hastings** [link, poster]. His talk slides are based on Dino Sejdinovic ones [link]. Pitty that I did not have travel funding. The **Kame**leon is furious!

The idea of the paper is quite neat: It is basically doing Kernel-PCA to adapt a random walk Metropolis-Hastings sampler's proposal based on the Markov chain's history. Or, more fancy: Performing a random walk on an empirically learned non-linear **manifold **induced by an empirical Gaussian measure in a** Reproducing Kernel Hilbert Space (RKHS)**. Sounds scary, but it's not: the proposal distribution ends up being a Gaussian aligned to the history of the Markov chain.

We consider the problem of sampling from an intractable highly-nonlinear posterior distribution using MCMC. In such cases, the thing to do (tm) usually is **Hamiltonian Monte Carlo (HMC)** (with all its super fancy extensions on Riemannian manifolds etc). The latter really is the only way to sample from complicated densities by using the geometry of the underlying space. *However*, in order to do so, one needs access to target **likelihood** and **gradient**.

We argue that there exists a class of posterior distributions for that HMC is not available, while the distribution itself is still highly non-linear. You get them as easy as attempting **binary classification** with a **Gaussian Process**. Machine Learning doesn't get more applied. Well, a bit: we sample the posterior distribution over the hyper-parameters using Pseudo-Marginal MCMC. The latter makes both likelihood and gradient intractable and the former in many cases has an interesting shape. See this plot which is the posterior of parameters of an Automatic Relevance Determination Kernel on the UCI Glass dataset. This is not an exotic problem at all, but it has all the properties we are after: non-linearity and higher order information unavailable. We cannot do HMC, so we are left with a random walk, which doesn't work nicely here due to the non-linear shape.

The idea of our algorithm is to do a random walk in an infinite dimensional space. At least almost.

- We run some method to get a rough sketch of the distribution we are after.
- We then embed these points into a RKHS; some infinite dimensional mean and covariance operators are going on here, but I will spare you the details here.
- In the RKHS, the chain samples are Gaussian, which corresponds to a manifold in the input space, which in some sense aligns with what we have seen of the target density yet.
- We draw a sample $f$ from that Gaussian - random walk in the RKHS
- This sample is in an infinite dimensional space, we need to map it back to the original space, just like in Kernel PCA
- It turns out that is hard (nice word for impossible). So let's just take a gradient step along some cost function that minimises distances in the way we want: \[\argmin_{x\in\mathcal{X}}\left\Vert k\left(\cdot,x\right)-f\right\Vert _{\mathcal{H}}^{2}\]
- Cool thing: This happens in input space and gives us a point whose embedding is in some sense close to the RKHS sample.
- Wait, everything is Gaussian! Let's be mathy and integrate the RKHS sample (and the gradient step) out.

Et voilà, we get a **Gaussian** proposal distribution in input space\[q_{\mathbf{z}}(\cdot|x_{t})=\mathcal{N}(x_{t},\gamma^{2}I+\nu^{2}M_{\mathbf{z},x_{t}}HM_{\mathbf{z},x_{t}}^{\top}),\]where $\Mz=2\eta\left[\nabla_{x}k(x,z_{1})|_{x=y},\ldots,\nabla_{x}k(x,z_{n})|_{x=y}\right]$ is based on kernel gradients (which are all easy to get). This proposal aligns with the target density. It is clear (to me) that using this for sampling is better than just an isotropic one. Here are some pictures of bananas and flowers.

The paper puts this idea in the form of an **adaptive** MCMC algorithm, which learns the manifold structure on the fly. One needs to be careful about certain details, but that's all in the paper. Compared to existing linear adaptive samplers, which adapt to the **global** covariance structure of the target, our version adapts to the **local** covariance structure. This can all also be mathematically formalised. For example, for the Gaussian kernel, the above proposal covariance becomes \[\left[\text{cov}[q_{\mathbf{z}(\cdot|y)}]\right]_{ij} = \gamma^{2}\delta_{ij} + \frac{4\nu^{2}}{\sigma^{4}}\sum_{a=1}^{n}\left[k(y,z_{a})\right]^{2}(z_{a,i}-y_{i})(z_{a,j}-y_{j}) +\mathcal{O}(n^{-1}),\] where the previous points $z_a$ influence the covariance, weighted by their similarity $k(y,z_a)$ to current point $y$.

Pretty cool! We also have a few "we beat all competing methods" plots, but I find those retarded and will spare them here - though they are needed for publication ;)

Dino and me have also written a pretty **Python implementation **(link) of the above, where the GP Pseudo Marginal sampling is done with Shogun's ability to importance sample marginal likelihoods of non-conjugate GP models (link). Pretty cool!

- Details
- Published on Friday, 25 April 2014 00:14

I am visiting the Machine Learning Summer School in Reykjavik, Iceland. Having our poster on Kameleon MCMC, a Kernel Adaptive Metropolis-Hastings sampler, with me (arXiv link: to appear in ICML 2014, poster link). Already met loads of interesting new and known people. Iceland is a pretty cool place, looks a bit like this.

- Details
- Published on Saturday, 01 March 2014 20:29

Sergey and me gave an interview on Shogun and Google Summer of Code. Here it is:

The internet. More specifically #shogun on irc.freenode.net. Wasn’t IRC that thing that our big brothers used as a socialising substitute when they were teenagers back in the 90s? Anyways. We are talking to two of the hottest upcoming figures in machine learning open-source software, the Russian software entrepreneur Sergey Lisitsyn, and the big German machine Heiko Strathmann.

**Hi guys, glad to meet you. Would you mind introducing yourself?**

Sergey **(S)**: Hey, I am Sergey. If you ask me what do I do apart from Shogun - I am currently working as a software engineer and finishing my Master’s studies at Samara State Aerospace University. I joined Shogun in 2011 as a student and now I am doing my best to help guys from the Shogun team to keep up with GSoC 2014.

Heiko **(H)**: Hej, my name is Heiko. I do a Phd in Neuroscience & Machine Learning at the Gatsby Institute in London and joined Shogun three years ago during GSoC. I love open-source since my days in school.

**Your project, Shogun, is about Machine Learning. That sounds scary and sexy, but what is it really?**

**H**: My grandmother recently sent me an email asking about this ‘maschinelles Lernen’. I replied it is the art of finding structure in data in an automated way. She replied: Since when are you an artist? And what is this “data”? I showed her the movie PI by Darren Aronofsky where the main character at some point is able to predict stock prices after realising “the pattern”, and said that’s what we want to do with a computer. Since then, she is worried about me because the guy puts a drill into his head in the end….. Another cool application is for example to model brain patterns to allow people to learn how to use a prosthesis faster.

**S**: Or have you seen your iPhone detects faces? That’s just a Support Vector Machine (SVM). It employs kernels which are inner products of non-linear mappings of Haar features into a reproducing kernel Hilbert Space so that it minimizes ….

**Yeah, okok... What is the history of Shogun in the GSoC?**

**S****:** The project got started by Sören in his student days around 15 years ago. It was a research only tool for a couple of years before being made public. Over the years, more and more people joined, but the biggest boost came from GSoC...

**H**: We just got accepted into our 4th year in that program. We had 5+8+8 students so far who all successfully did the program with us. Wow I guess that’s a few million dollars. (EDITOR: actually 105,000$.) GSoC students forced Shogun to grow up in many ways: github, a farm of buildbots, proper unit-testing, a cloud-service, web-demos, etc were all set up by students. Also, the diversity of algorithms from latest research increased a lot. From the GSoC money, we were able to fund our first Shogun workshop in Berlin last summer.

**How did you two got into Shogun and GSoC? Did the money play a role?**

**H**: I was doing my undergraduate project back in 2010, which actually involved kernel SVMs, and used Shogun. I thought it would be a nice idea of putting my ideas into it -- also I was lonely coding just on my own. 2010, they were rejected from GSoC, but I eventually implemented my ideas in 2011. The money to me was very useful as I was planning to move to London soon. Being totally broke in that city one year later, I actually paid my rent from my second participation’s stipend - which I got for implementing ideas from my Master’s project at uni. Since 2013, I mentor other students and help organising the project. I think I would have stayed around without the money, but it would have been a bit tougher.

**S**: We were having a really hard winter in Russia. While I was walking my bear and clearing the roof of the snow, I realised I forgot to turn off my nuclear missile system…..

**H**: Tales!

**S**: Okay, so on another cold night I noticed a message on GSoC somewhere and then I just glanced over the list of accepted organizations and Shogun’s description was quite interesting so I joined a chat and started talking to people - the whole thing was breathtaking for me. As for the money - well, I was a student and was about to start my first part-time job as a developer - it was like a present for me but it didn’t play the main role!

**H**: To make it short: Sergey suddenly appeared and rocked the house coding in lightspeed, drinking Vodka.

**But now you are not paid anymore, while still spending a lot of time on the project. What motivates you to do this?**

**S**: This just involves you and you feels like you participate in something useful. Such kind of appreciation is important!

**H**: Mentoring students is very rewarding indeed! Some of those guys are insanely motivated and talented. It is very nice to interact with the community with people from all over the world sharing the same interest. Trying to be a scientist, GSoC is also very useful in producing tools that myself or my colleagues need, but that nobody has the time to build properly. You see, there are all sorts of synergic effects in GSoC and my day-job at university, such as meeting new people or getting a job since you know how to code in a team.

**How does this work? Did you ever publish papers based on GSoC work?**

**S****:** Yeah, I actually published a paper based on my GSoC 2011 work. It is called ‘Tapkee: An Efficient Dimension Reduction Library’ and was recently published in the Journal of Machine Learning Research. We started writing it up with my mentor Christian (Widmer) and later Fernando (Iglesias) joined our efforts. It took enormous amount of time but we did it! Tapkee by the way is a Russian word for slippers.

**H**: I worked on a project on statistical simulation of global ozone data last year. The code is mainly based on one of my last year’s student’s project - a very clever and productive guy from Mumbai who I would never have met without the program, see http://www.ucl.ac.uk/roulette/ozoneexample

**So you came all the way from being a student with GSoC up to being an organisation admin. How does the perspective change during this path?**

**H****:** I first had too much time so I coded open-source, then too little money so I coded open-source, then too much work so I mentor people coding it open-source. At some point I realised I like this stuff so much that I would like to help organising Shogun and bring together the students and scientists involved. It is great to give back to the community which played a major role for me in my studies. It is also sometimes quite amusing to get those emails by students applying, being worried about the same unimportant things that I worried about back then.

**S**: It seems to be quite natural actually. You could even miss the point when things change and you became a mentor. Once you are into the game things are going pretty fast. Especially if you have full-time job and studies!

**Are there any (forbidden) substances that you exploit to keep up with the workload?**

**S**: It would sound strange but I am not addicted to vodka. Although I bet Heiko is addicted to beer and sausages.

**H**: Coffeecoffeecoffeee…… Well, to be honest GSoC definitely reduces your sleep no matter whether you are either student, mentor, or admin. By the way, our 3.0 release was labelled: Powered by Vodka, Mate, and beer.

**Do you crazy Nerds actually ever go away from your computers?**

**H**: No.

**S**: Once we all met at our workshop in Berlin - but we weren’t really away from our computers. Why on earth to do that?

**Any tips for upcoming members of the open-source community? For students? Mentors? Admins?**

**H**: Students: Do GSoC! You will learn a lot. Mentors: Do GSoC! You will get a lot. Admins/Mentors: Don’t do GSoC, it ruins your health. Rather collect stamps!

**S**: He is kidding. (whispers: “we need this … come on … just be nice to them”)

**H****:** Okay to be honest: just have fun of what you are doing!

Due to the missing interest in the community, Sergey and Heiko interviewed themselves on their own.

Shogun: http://www.shogun-toolbox.org

GSoC 2013 blog: http://herrstrathmann.de/shogun-blog/110-shogun-3-0.html

GSoC 2014 ideas: http://www.shogun-toolbox.org/page/Events/gsoc2014_ideas

Heiko: http://herrstrathmann.de/

Sergey: http://cv.lisitsyn.me/

- Details
- Published on Friday, 13 December 2013 16:45

Last week, I went to the Advances in Neural Information Processing Systems (NIPS) for the first time. That was a very nice experience due to the incredibly density of people whose names I know from research papers. In fact, it was too much to take so I had to pick things that sounded interesting - still loads.

The main three buzzwords of the conference for me were: Deep Learning (even Mark Zuckerberg is interested in that these days), Mini-batch, and stochastic gradient descent (aka on-line whatever).

One very interesting workshop I attended on Tuesday was on Machine Learning Open-Source Software (MLOSS), organised by Cheng Soon Ong (who could not be there unfortunately) and Antti Honkela. I presented a short spotlight for Shogun (slide) and had a one hour demo, showing off with our cool IPython notebooks (link) and the cloud Shogun server (link). I got some very encouraging feedback for this, including from Fernando Perez.

I also met a few nice fellow open-source ML coders from scikit-learn.

During the workshop, there was a quite lively discussion about licensing issues, in particular whether to choose GPL or BSD. The python universe for example seems to gain a lot from being BSD-style licensed.

Finally, NIPS is was held close to Lake Tahoe, which is surrounded by incredibly beautiful mountains to hike in. One evening, I met the guy who left those traces ... very exciting, slightly scary...

- Details
- Published on Tuesday, 25 March 2014 23:33

Last week, I went to the i-like workshop at Oxford university. Pretty cool! All of Britain's statisticians were there and I met many of them for the first time. Check out my two posters (Russian Roulette, Kernel Adaptive Metropolis Hastings). Talks were amazing - as in last NIPS, the main trend is on estimating likelihoods (well, that's the name of the program), either using some other random process such as importance sampling a latent model's marginal likelihood (aka Pseudo-Marginal MCMC), or directly sub-sampling likelihoods or gradients.

These things are important in Machine Learning too, and it is very nice to see the field growing together (even-though there was a talk by a Statistician spending lots of time on re-inventing belief propagation and Junction tree ideas - always such a pitty if this happens simply because communities do not talk to each other enough). Three talks that I really found interesting:

Remi Bardenet talked about sub-sampling approaches to speed up MCMC. This is quite related to the Austerity in MCMC land paper by Welling & Co, with the difference that his tests do not suffer from small number of points in the hypothesis test to decide accept/reject.

Chris Sherlock talked about optimal rates and scaling for Pseudo-Marginal MCMC. There finally are some nice heuristics how to scale PM estimates in a way that the number of iid samples per computation time is optimal. Interestingly, the acceptance rate and the variance of the likelihood estimate can be tweaked separately.

Jim Griffin gave a very interesting talk on adaptive MCMC on discrete, in particular binary, state-spaces - he used them for feature selection (in ML language). His algorithm automatically learns global mutations rates for each of the positions. However, it doesn't take any correlations between the features into account. This might be a very interesting application for our fancy Kameleon sampler (arxiv, code), thinking about this!

Finally, I presented two posters, the one on Playing Russian Roulette with Intractable Likelihoods that I already presented in Reykjavik, and (with Dino) a new poster (link) on the Kernel Adaptive Metropolis Hastings Kameleon that I mentioned above. The corresponding paper is hopefully published very soon. Talking to other scientists about my own work is just great!

- Details
- Published on Saturday, 01 March 2014 01:42

Yeah! Shogun this week got accepted to be an organisation participating in the 10th Google Summer of Code. This year, besides mentoring a few projects, I am one of the three project administrators. I am curious how this will be. One first thing to do was to write the application for Shogun - I'm glad it worked! I also will spend a little more time organising things. Apart from trying to find mentors (which requires a lot of talking people into it), I also want to make Shogun (and the students) having more from the program. Last year, I pushed the team to ask all students

- to
**write a project report**in the form of IPython notebooks (link). These are absolutely great for talking about the GSoC work, impressing people, and having a final piece of work to show for the students. - To
**fully unit-test every module**of their algorithm/framework. This is absolutely essential in order to not loose the student's work a few years later when a re-factoring change breaks their code and nobody knows how to fix it. Those tests already saved lots of life since last year. - To
**peer-review each other**in pairs of students. This improved documentation here and there and solved some bugs. I want to emphasise this more this year as I think it is a great way of enabling synergistic effects between students.

In addition, we will again screen all the applicants via a set of entrance tasks on our github page (link). I just wrote a large number of such smaller or larger tasks that get students started on a particular project, fix bugs in Shogun, or prepare some larger change. In order to get the students started a bit more easily (contributing to Shogun these days is a non-trivial task), I wrote a little how-to (link) that is supposed to point out our expectations, and what are the first steps towards participating in GSoC.

Finally, I wrote descriptions for quite a few possible projects, some of them with a number of interesting co-mentors. The full list is here (link). If you are a talented student interested in any of those topics, consider working with us during the summer. It's usually very fun!

**Variational Learning for Recommendation with Big Data**. With Emtiyaz Khan, who I met at last year's workshop for latent Gaussian models. Matrix factorisation and Gaussian Processes, ultra-cool project.**Generic Framework for Markov Chain Monte Carlo Algorithms and Stan Interface.**With Theo Papamarkou, who I know from my time at UCL Statistics. It's about a modular representation of MCMC within Shogun and a possible interface to STAN for the actual sampling. This would be a major step of Shogun towards probabilistic models.**Testing and Measuring Variable Interactions With Kernels.**With Dino, who is post-doc at Gatsby and co-author of our optimal kernel for MMD paper. This project is to implement all kernel based interaction measures in Shogun in a unified way. We'll probably use this for research later.**A Meta-Language for Shogun examples**. With Sören. Write example once, press button to generate in any modular language binding. This would be so useful to have in Shogun!**Lobbying Shogun in MLPACK’s automatic benchmarking system**. Joint project with Ryan from MLPACK. He already can compare speed of different toolboxes. Now let's compare results.**Shogun Missionary & Shogun in Education**. With Sören. Write high quality notebooks and eye-candy examples. Very different project as this is about creative technical writing and illustrating methods on cool data rather than hacking new algorithms. I would be very excited if this happened!

Some of the other projects involve cool buzzwords such as Deep Learning, Structured Output, Kernel, Dual solvers, Cluster backends, etc. Join us! :)