TRAINING 3 – Making and Breaking Machine Learning Systems « HITBSecConf2018

DURATION: 2 DAYS

CAPACITY: CANCELLED

SEATS AVAILABLE: 20

EUR2299 (early bird)

EUR3299 (normal)

Early bird registration rate ends on the 30th of April

Overview

Making & Breaking Machine Learning Systems is a fast paced session on machine learning from the hacker and infosec professional’s point of view. The class is designed with the goal of providing students with a hands-on introduction to machine learning concepts and systems, as well as making and breaking security applications powered by machine learning.

The lab session is designed with security use-cases in mind, since using machine learning in security is very different from using it in other situations. Students will get first hand experience at cleaning data, implementing machine learning security programs, and performing penetration tests of these systems.

Each attendee will be provided with a comprehensive virtual machine programming environment that is preconfigured for the tasks in the class, as well as for any future machine learning experimentation and development. This environment consists of the most essential machine learning libraries and programming environments, and is friendly even to machine learning novices.

At the end of the class, students will be put through a CTF challenge that will test the machine learning development and exploitation skills that they have learned over the course in a realistic environment.

Who Should Attend

Security Professionals
Web Application Pentesters
Software/application developers
People interested in starting to use machine learning for security

Key Learning Objectives

Familiarizing yourself with popular machine learning algorithms and how to adapt these for different problems
How to clean and sanitize data using powerful data processing libraries in Python
How to build a spam classifier and online anomaly detection system in Python
How to do performance evaluations of machine learning classifiers
Examples for using machine learning in intrusion detection, botnet detection, phishing detection, web vulnerability analysis, malware classification, and behavioural analysis
Perform tuning of machine learning systems to improve classification/detection results
Perform security evaluations and penetration tests on machine learning systems
- Fuzzing machine learning classifiers

How to avoid vulnerabilities in machine learning system and algorithm design
How to use Apache Spark to design scalable and distributed real-time machine learning systems
Write your own machine learning captcha solver

Prerequisite Knowledge

None

Hardware / Software Requirements

Latest version of VirtualBox Installed
Administrative access on your laptop with external USB allowed
At least 20 GB free hard disk space
At least 4 GB RAM (the more the better)

Agenda

DAY 1

Introduction to machine learning
Hands-on guided exploration of Python machine learning libraries:
- Data-wrangling using NumPy and pandas
- scikit-learn’s functions and capabilities
- Data visualization using Matplotlib/Seaborn

Walkthrough of the most commonly used machine learning algorithms (with quick hands-on examples/visualizations for select algorithms)
- Supervised learning algorithms
  - Linear/logistic regression
  - Support Vector Machines
  - Decision trees/Random forests
- Unsupervised learning algorithms
  - Hierarchical/k-Means clustering
- Semi-supervised learning
2-hour example: Building (and bypassing) an email spam filter with scikit-learn

DAY 2

Loading data efficiently
Using a labeled email/spam corpus training and test set, extract salient features to build a word model of spam
Model tuning, cross-validation, and evaluation process
With complete knowledge of the system, manually craft a piece of spam to bypass the filter
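
The build-then-bypass flow above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the scikit-learn pipeline used in class: the toy corpus, the function names (`train`, `spam_score`, `classify`), and the choice of a multinomial naive Bayes word model are all assumptions made for the example. With full knowledge of the word counts, an attacker can pad a spam message with words that appear only in legitimate mail until the score tips over.

```python
import math
from collections import Counter

# Toy labeled corpus, invented for illustration (not the class dataset).
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free shipping", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday with the team", "ham"),
]

def train(corpus):
    """Count words per class for a multinomial naive Bayes word model."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in corpus:
        word_counts[label].update(text.split())
        doc_counts[label] += 1
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def spam_score(text, model):
    """Return log P(spam | text) - log P(ham | text); positive means spam."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    score = 0.0
    for label, sign in (("spam", 1), ("ham", -1)):
        log_prob = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            log_prob += math.log(count / (total_words + len(vocab)))
        score += sign * log_prob
    return score

def classify(text, model):
    return "spam" if spam_score(text, model) > 0 else "ham"

model = train(TRAIN)

original = "claim free money now"
# White-box bypass: append words that only occur in legitimate mail.
evasive = original + " lunch with the team monday meeting agenda project report"

print(classify(original, model), round(spam_score(original, model), 2))
print(classify(evasive, model), round(spam_score(evasive, model), 2))
```

The spam payload is unchanged in the second message; only the padding differs. That is why word-frequency models need additional defenses when the attacker can see or probe the model.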

Lecture on application of machine learning in the security/abuse space
- Spam, fraud, malware, phishing, and intrusion detection short examples
- Principles behind selecting the best machine learning models for different use-cases
- Considerations when using machine learning in adversarial/malicious networks
- Using Keras/TensorFlow for anomaly detection with convolutional neural networks
- Choosing the appropriate model for different types of problems: an efficacy comparison of different machine learning techniques for anomaly detection, and the other considerations to weigh
2-hour example: Building a simple network intrusion detection system with 2 different machine learning models
- Importance of understanding the data and the threat model before designing a solution for the problem
- Model tuning, cross-validation, and evaluation process
- Guided comparisons of the performance characteristics for each implementation
- Visualizing and presenting the data for ease of analysis by security operations professionals
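
As a rough illustration of the evaluation step above, the sketch below compares two hypothetical detectors on an imbalanced traffic sample. The labels, the counts, and the helper names (`confusion_counts`, `report`) are invented for the example. It shows why accuracy alone can mislead in intrusion detection: a detector that never raises an alert still scores 95% accuracy when only 5% of the traffic is malicious.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Return (tp, fp, fn, tn) treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def report(y_true, y_pred):
    """Compute accuracy, precision, and recall for the 'attack' class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Hypothetical sample: 95 benign flows followed by 5 attacks.
y_true = ["benign"] * 95 + ["attack"] * 5

# Detector A never alerts.
always_benign = ["benign"] * 100

# Detector B raises 2 false alarms and catches 4 of the 5 attacks.
detector_b = ["benign"] * 93 + ["attack"] * 2 + ["attack"] * 4 + ["benign"] * 1

print("A:", report(y_true, always_benign))
print("B:", report(y_true, detector_b))
```

Detector A has zero recall despite its high accuracy, so comparing implementations on precision and recall (or a cost-weighted metric chosen from the threat model) is usually more informative than accuracy.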

DAY 3

Streaming pipelines for machine learning using Apache Spark MLlib (PySpark)
- Overview of Apache Spark
  - General architecture
  - Distributed, scalable machine learning deployments with Spark
- Guided example of a streaming architecture for network anomaly detection using reinforcement learning on Spark
Evaluating the security of machine learning systems
- Techniques and guided example of fuzzing a classifier and regressor to find blind spots in the model
- Evaluation of intelligent learning system architecture that is resilient to model poisoning by an adversary
Machine Learning CTF challenge – captcha bypass challenges (using captcha character classification starter code provided)
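
The classifier fuzzing item above can be sketched as a simple black-box mutation loop. Everything here is an assumption made for illustration: `toy_detector` is a hypothetical stand-in for a trained model, the two features and their weights are invented, and random perturbation is only one of several possible fuzzing strategies. The fuzzer starts from a sample the model flags, applies small random changes, and records every mutation that flips the label.

```python
import random

def toy_detector(features):
    """Hypothetical stand-in for a trained model (weights are invented)."""
    packets_per_sec, distinct_ports = features
    score = 0.02 * packets_per_sec + 0.5 * distinct_ports
    return "attack" if score >= 10.0 else "benign"

def fuzz(predict, seed, trials=2000, max_delta=0.2, rng=None):
    """Randomly scale each feature by up to +/- max_delta and collect
    mutated inputs whose predicted label differs from the seed's label."""
    rng = rng or random.Random(0)  # fixed seed keeps the run repeatable
    baseline = predict(seed)
    flips = []
    for _ in range(trials):
        mutated = tuple(
            max(0.0, value * (1 + rng.uniform(-max_delta, max_delta)))
            for value in seed
        )
        if predict(mutated) != baseline:
            flips.append(mutated)
    return flips

def relative_change(mutated, seed):
    """Largest per-feature relative difference between mutated and seed."""
    return max(abs(m - s) / s for m, s in zip(mutated, seed))

seed = (300.0, 10.0)  # a sample the toy detector labels as an attack
flips = fuzz(toy_detector, seed)

# The flip closest to the seed approximates the cheapest evasion found.
closest = min(flips, key=lambda f: relative_change(f, seed))
print(len(flips), "label flips found out of 2000 mutations")
print("closest evasion:", closest, "relative change:", relative_change(closest, seed))
```

Flips that require only a small relative change point at regions where the decision boundary sits close to realistic malicious inputs, which are the blind spots worth examining first.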

Location: Training Rooms Date: April 9, 2018 Time: 9:00 am - 6:00 pm

Clarence Chio
