TRAINING 3 – Making and Breaking Machine Learning Systems




EUR2299 (early bird)

EUR3299 (normal)

Early bird registration rate ends on the 30th of April


Making & Breaking Machine Learning Systems is a fast-paced session on machine learning from the point of view of hackers and infosec professionals. The class gives students a hands-on introduction to machine learning concepts and systems, and has them make and break security applications powered by machine learning.

The lab session is designed with security use cases in mind, because applying machine learning to security differs substantially from applying it elsewhere: in security, adversaries actively try to evade or manipulate the model. Students will get first-hand experience cleaning data, implementing machine learning security programs, and performing penetration tests of these systems.
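As a rough illustration of the data-cleaning step, here is a minimal pandas sketch. The records and the column names (src_ip, bytes, label) are hypothetical and were invented for this example. Real lab datasets will need their own cleaning rules.

```python
import pandas as pd

# Hypothetical raw connection records; column names are illustrative only.
raw = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.2", None, "10.0.0.3"],
    "bytes": ["512", "1024", "1024", "64", "n/a"],
    "label": ["benign", "Malicious", "Malicious", "benign", "BENIGN"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = (
        df.drop_duplicates()           # remove exact duplicate rows
          .dropna(subset=["src_ip"])   # drop rows missing a source address
          .copy()                      # work on an independent copy
    )
    # Non-numeric values such as "n/a" become NaN, then are imputed with the median.
    out["bytes"] = pd.to_numeric(out["bytes"], errors="coerce")
    out["bytes"] = out["bytes"].fillna(out["bytes"].median())
    # Normalize inconsistent label casing ("Malicious", "BENIGN", ...).
    out["label"] = out["label"].str.lower()
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Chaining the steps and taking an explicit copy keeps the later column assignments from touching the original frame.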

Each attendee will be provided with a comprehensive virtual machine programming environment, preconfigured for the tasks in the class and for any future machine learning experimentation and development they do. This environment contains the most essential machine learning libraries and programming environments, and is friendly even to machine learning novices.

At the end of the class, students will take part in a CTF challenge that tests, in a realistic environment, the machine learning development and exploitation skills they learned over the course.

Who Should Attend

  • Security Professionals
  • Web Application Pentesters
  • Software/application developers
  • People interested in starting to use machine learning for security

Key Learning Objectives

  • Familiarity with popular machine learning algorithms and how to adapt them to different problems
  • How to clean and sanitize data using powerful data processing libraries in Python
  • How to build a spam classifier and online anomaly detection system in Python
  • How to do performance evaluations of machine learning classifiers
  • Examples of using machine learning in intrusion detection, botnet detection, phishing detection, web vulnerability analysis, malware classification, and behavioural analysis
  • Perform tuning of machine learning systems to improve classification/detection results
  • Perform security evaluations and penetration tests on machine learning systems
    • Fuzzing machine learning classifiers
  • How to avoid vulnerabilities in machine learning system and algorithm design
  • How to use Apache Spark to design scalable and distributed real-time machine learning systems
  • Write your own machine learning captcha solver
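A captcha solver ultimately reduces to classifying segmented characters, and the same exercise is a natural place to practise classifier performance evaluation. The sketch below makes several assumptions. It uses scikit-learn's bundled 8x8 digits dataset in place of real captcha characters and skips the segmentation step, which is often the harder part in practice. It evaluates the model with a held-out test set, a classification report and a confusion matrix. The helper solve() is hypothetical and not part of any course material.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for segmented captcha characters: 8x8 grayscale digit images.
digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)  # flatten each image to 64 features
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Performance evaluation on data the model has not seen.
accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))  # per-class precision, recall, F1
print(confusion_matrix(y_test, y_pred))       # rows = true class, columns = predicted

def solve(char_images):
    """Hypothetical helper: classify pre-segmented 8x8 character images into a string."""
    flat = [img.reshape(-1) for img in char_images]
    return "".join(str(d) for d in clf.predict(flat))

print(solve(digits.images[:4]))
```

Per-class precision and recall show which characters the solver confuses. A single accuracy number can hide that.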

Prerequisite Knowledge


Hardware / Software Requirements

  • Latest version of VirtualBox installed
  • Administrative access on your laptop with external USB allowed
  • At least 20 GB free hard disk space
  • At least 4 GB RAM (the more the better)



● Introduction to machine learning
○ Hands-on guided exploration of Python machine learning libraries:

  • Data-wrangling using Numpy and Pandas
  • Scikit-learn’s functions and capabilities
  • Data visualization using Matplotlib/Seaborn
  • Walkthrough of the most commonly used machine learning algorithms (with quick hands-on examples/visualizations for select algorithms)
    • Supervised learning algorithms
      • Linear/logistic regression
      • Support Vector Machines
      • Decision trees/Random forests
    • Unsupervised learning algorithms
      • Hierarchical/k-Means clustering
    • Semi-supervised learning
  • 2-hour example: Building (and bypassing) an email spam filter with scikit-learn
    • Loading data efficiently
    • Extracting salient features from a labeled email/spam corpus (training and test sets) to build a word model of spam
    • Model tuning, cross-validation, and evaluation process
    • Manually crafting a piece of spam that bypasses the filter, given complete knowledge of the system
  • Lecture on application of machine learning in the security/abuse space
    • Spam, fraud, malware, phishing, and intrusion detection short examples
    • Principles behind selecting the best machine learning models for different use-cases
    • Considerations when using machine learning in adversarial/malicious networks
    • Using Keras/TensorFlow for anomaly detection with convolutional neural networks
    • Choosing the appropriate model for different types of problems: comparing the efficacy of different machine learning techniques for anomaly detection, and other considerations to weigh
  • 2-hour example: Building a simple network intrusion detection system with 2 different machine learning models
    • Importance of understanding the data and the threat model before designing a solution for the problem
    • Model tuning, cross-validation, and evaluation process
    • Guided comparisons of the performance characteristics for each implementation
    • Visualizing and presenting the data for easy analysis by security operations professionals


  • Streaming pipelines for machine learning using Apache Spark MLlib (PySpark)
    • Overview of Apache Spark
      • General architecture
      • Distributed, scalable machine learning deployments with Spark
    • Guided example of a streaming architecture for network anomaly detection using reinforcement learning on Spark
  • Evaluating the security of machine learning systems
    • Techniques and guided example of fuzzing a classifier and regressor to find blind spots in the model
    • Evaluating learning system architectures that are resilient to model poisoning by an adversary
  • Machine Learning CTF challenge – captcha bypass challenges (using captcha character classification starter code provided)
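One simple way to fuzz a classifier is mutation-based fuzzing: randomly perturb inputs the model already handles, then flag perturbed inputs whose predicted label flips. Those flips point at fragile regions near the decision boundary. The sketch below assumes a toy two-dimensional dataset (make_moons), a random forest model and a perturbation budget of epsilon = 0.3. The fuzz() helper and all of these choices are hypothetical. It covers only the classifier case, not regressors.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def fuzz(model, seed, n_mutants=200, epsilon=0.3, rng=rng):
    """Hypothetical helper: perturb `seed` uniformly within an L-infinity ball
    of radius `epsilon` and return the mutants whose predicted label differs
    from the seed's, together with the seed's label."""
    base_label = model.predict(seed.reshape(1, -1))[0]
    noise = rng.uniform(-epsilon, epsilon, size=(n_mutants, seed.shape[0]))
    mutants = seed + noise
    preds = model.predict(mutants)
    return mutants[preds != base_label], base_label

findings = []
for seed in X[:50]:
    flips, label = fuzz(clf, seed)
    if len(flips):
        findings.append((seed, label, flips))

print(f"{len(findings)} of 50 seeds have label flips within epsilon=0.3")
```

Random mutation is the crudest strategy. Gradient-guided or coverage-guided search usually finds blind spots with far fewer queries, but it needs more access to the model.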


Location: Training Rooms
Date: April 9, 2018
Time: 9:00 am - 6:00 pm
Clarence Chio