Summer School
HPC in Data Science
18-24 August 2024, Ostrava, Czech Republic
IT4Innovations National Supercomputing Centre
The EUMaster4HPC Summer School welcomes students from universities around the world to gain cutting-edge knowledge of HPC and HPC-related technologies for one week, from 18 to 24 August 2024, in Ostrava, Czech Republic.
The Summer School is organised in collaboration with three partners: IT4Innovations National Supercomputing Center, part of VSB – Technical University of Ostrava; the VSC (Vienna Scientific Cluster) Research Center, part of TU Wien (TUW); and MathWorks. These partners will contribute expertise and resources to enrich the experience for all participants.
- When: 18 to 24 August 2024
- Where: Ostrava, Czech Republic
- Topic: HPC in Data Science
- Hosted by IT4Innovations National Supercomputing Center at VSB – Technical University of Ostrava
A travel guide, including flight options to Ostrava, is available here.
Registration was open until 31 May 2024 and is now closed.
To register for the event, please use the registration link based on your affiliation. The participation conditions and respective fees are below.
- EUMaster4HPC programme students: Registrations are closed
- Students of other study programs (external students): Registrations are closed
The Summer School will offer students practical insights into HPC and data analytics, addressing the growing demand for expertise in these areas. Participants will gain competencies in various tools and techniques, empowering them to tackle real-world challenges effectively.
In this summer school, students will learn how to prepare data and understand its characteristics in order to create meaningful machine learning models on an HPC architecture. For this purpose, we will use open-source programming languages such as R and Python. The supercomputing infrastructure of IT4Innovations will be used for hands-on exercises, which are an integral part of all lectures. By the end of the summer school, students will have gained competencies in the following:
- Using a Linux-based HPC environment.
- Understanding the theoretical background of exploratory data analysis and modeling.
- Scaling data analysis for Big Data in R and Python.
- Creating basic Machine and Deep Learning models in R and Python.
- Deciding whether to use Machine or Deep Learning methods.
- Building data processing pipelines for Machine or Deep Learning tasks.
- Setting up and running data analysis in parallel on an HPC cluster with R and Python.
- Parallelising Machine and Deep Learning tasks to use multiple compute nodes and/or multiple accelerators (GPUs).
- Using MATLAB tools for HPC and data analysis.
IT4Innovations National Supercomputing Center is a research center active in HPC, Data Analysis, AI, and Quantum Computing and their applications to other scientific fields, industry, and society.
Vienna Scientific Cluster (VSC) is the national supercomputer of Austria. Its VSC Research Center is devoted to facilitating the use of high-performance computing in all domains of science and research, and it is recognised for its first-class training offer.
MathWorks are the leading producers of technical computing software. Millions of users trust MATLAB® and Simulink® to accelerate the pace of their engineering and science.
Detailed agenda
Saturday 17 August: Arrival of students and guests and check-in to the hotel.
Sunday 18 August: Summer School Welcome event at IT4Innovations.
12:00 – 13:30 Lunch
13:30 – 14:00 Introduction of the school programme, practical information
14:00 – 15:00 Introduction of the organisers
- IT4I
- MathWorks
- EUMaster4HPC
15:00 – 15:30 Coffee Break
15:30 – 16:30 Guided tours around IT4I’s Infrastructure
16:30 – 18:00 Teambuilding activities
18:00 – 21:00 Welcome reception
Monday 19 August
9:00 Accessing and using IT4I clusters
- First login.
- How to get your data to the cluster.
- How to log in to the cluster and prepare a computation environment.
- How to submit computational jobs.
10:30 Coffee Break
11:00 Introduction to Data Science
12:30 Lunch Break
13:30 Coding Challenge Part 1
15:00 Coffee Break
15:30 Coding Challenge Part 2
Tuesday 20 August
9:00 Introduction to R
- What is R and when to consider using it?
- Basic data types
- Programming styles in R
- A very short introduction to the tidyverse
10:30 Coffee Break
10:45 Exploratory Data Analysis with R
- How to get a basic understanding of the data
- Explore and handle missing values and outliers
- Clean up messy data
- Visualisation of basic relationships
12:15 Lunch Break
13:15 Modelling with R
- Introduction to modelling with the tidymodels packages
- Creating a basic ML pipeline
- An end-to-end example with XGBoost
15:00 Coffee Break
15:15 Parallelisation in R
- Local machine parallelisation
- Differences in parallelisation between Windows and UNIX-like operating systems
- Multi-node parallelisation
- A simple multi-node example in a data science workflow
Wednesday 21 August
09:00 Challenge Reports: 1st cohort of EUMaster4HPC students
10:30 Coffee break
10:45 Dask, Numba, Ray: Parallelise the lazy way
11:30 Fast, faster, NumPy: Why is the popular library hard to beat?
12:00 Numerical computations on a GPU: Which tool does the best job?
12:30 Lunch break
13:30 Data analysis in Python: Pandas, Polars and the rest of the zoo.
15:00 Coffee break
15:15 Data visualisation: Insightful and pretty?!
16:30 Quiz & Recap
17:30 Departure from the hotel for the Planetarium
18:00 Social event at the Planetarium
Thursday 22 August
09:00 ML intro: Welcome to weight watching.
09:30 Scikit-Learn: Get to know a living fossil.
09:45 Regression vs Classification: What’s your problem?
10:00 Data pre-processing: Visualise, clean, transform.
10:30 Coffee break
10:45 Prominent ML algorithms: SVMs, Decision Trees, K-nearest neighbors & ensemble methods.
11:00 Evaluation: Which model performed best?
11:30 Hyperparameters: Twiddle the knobs and dials.
12:00 Scaling Scikit-Learn: Dask and RAPIDS to the rescue.
12:30 Lunch break
13:30 Neural Networks: Dive in at the deep (learning) end.
14:15 TensorFlow & Keras: The easy way to become an architect.
15:00 Coffee break
15:15 Convolutional Neural Networks: Give your computer a vision.
16:15 Distributed Training: Sharing the burden.
16:45 Outlook on Transformers: Welcome to the future.
Friday 23 August
9:00 Writing Fast and Efficient MATLAB Code
- 1000x speed-up: Exploring the MATLAB performance landscape
- Code Profiler and best practices
- Parallelising MATLAB code: From desktop to HPC and cloud
- GPU Computing in MATLAB
10:30 Coffee Break
10:45 Big Data Analysis with MATLAB
- Reading big data using Parquet files
- Datatypes for big data (datastores, tall arrays)
- Downstream Analysis of big data: “needle in the haystack”-analysis, “for each”-analysis, “across all”-analysis
- Interoperability
12:15 Lunch Break
13:15 Hands-on Work
- Choose a project that fits your interests. Projects will introduce additional MATLAB functionality in the areas of large scale HPC, Deep Learning, Image Processing and Signal Analysis.
15:00 Coffee Break
15:30 Hands-on Work Continued
17:00 Team Competition Award Ceremony
18:00 Departure from the hotel by bus to Ostrava city centre
18:30 Dinner at a restaurant in downtown Ostrava
Saturday 24 August: Trip to Pustevny
Conditions and Fees
Eligible EUMaster4HPC programme students need to cover their own travel costs to Ostrava and back. Their accommodation and attendance at the summer school are arranged and fully covered by the organisers. This covers attendance at the lessons, access to the HPC infrastructure, catering during the lessons (coffee breaks and lunches), two social events, and a field trip to Pustevny.
External students must cover all expenses related to travel and accommodation themselves. MathWorks kindly sponsors their attendance fee of 250 EUR. The fee covers attendance at the lessons, access to the HPC infrastructure, catering during the lessons (coffee breaks and lunches), two social events, and a field trip to Pustevny.
External students need to arrange their own travel to Ostrava. The organisers can help book accommodation at the same hotel as the EUMaster4HPC students and tutors, but external students must pay for their stay. Prices vary with occupancy and room standard and will be communicated to accepted students.
If demand exceeds the number of places available for external students, applications will be carefully reviewed. Selected students will be notified of their acceptance by the end of June. We kindly request that you refrain from making any bookings until you receive official confirmation of your acceptance from us. This ensures a fair and transparent selection process for all applicants.
Venue and Travel
Summer School 2024 will be held in Ostrava, Czech Republic. Below, you can find detailed information about the venues and the hotel, as well as a guide for travel to Ostrava. All venues are within walking distance of the hotel, except for Pustevny. Bus transport will be organised for the field trip.
The welcome reception on Sunday will be held at IT4Innovations National Supercomputing Center, Studentská 6231/1B, Ostrava-Poruba, Czech Republic.
The venue of the school is VŠB – Technical University of Ostrava, New Auditorium, 17. listopadu 15, Ostrava-Poruba.
The social event venue is Planetarium Ostrava, K Planetáriu 502, 725 26 Ostrava 26.
The field trip goes to Pustevny in the Beskydy Mountains.
The accommodation for EUMaster4HPC students and tutors is booked in Hotel Garni at the university dormitories.
A travel guide, including flight options to Ostrava, is available here.
Any questions?
If you have any questions, please do not hesitate to contact us at training@it4i.cz