
A course is the basic teaching unit, designed as a medium for a student to acquire the comprehensive knowledge and skills indispensable in the given field. A course guarantor is responsible for the factual content of the course.
For each course, there is a department responsible for the course organisation. The person responsible for timetabling in that department sets the teaching schedule and assigns an instructor and/or an examiner to each class.
The expected time consumption of a course is expressed by the course attribute extent of teaching. For example, extent = 2+2 indicates two teaching hours of lectures and two teaching hours of seminars (labs) per week.
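The lecture + seminar notation described above can be read mechanically. The following is a minimal sketch, assuming the extent is always written as two whole numbers separated by a plus sign, as in the 2+2 example; the function names are hypothetical and other notations used in the catalogue are not handled.

```python
from typing import Tuple


def parse_extent(extent: str) -> Tuple[int, int]:
    """Split an extent such as '2+2' into (lecture_hours, seminar_hours) per week.

    Assumption: the value has exactly two integer parts joined by '+'.
    Whitespace around the parts (e.g. '2 +2') is tolerated.
    """
    lectures, seminars = (int(part.strip()) for part in extent.split("+"))
    return lectures, seminars


def weekly_hours(extent: str) -> int:
    """Total teaching hours per week implied by the extent."""
    return sum(parse_extent(extent))


if __name__ == "__main__":
    print(parse_extent("2 +2"))
    print(weekly_hours("2+2"))
```

For the example in the text, the two parts are two lecture hours and two seminar hours, which gives four teaching hours per week.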
At the end of each semester, the course instructor evaluates the extent to which a student has acquired the expected knowledge and skills. The type of this evaluation is indicated by the attribute completion. A course can be completed by an assessment only ('pouze zápočet'), by a graded assessment ('klasifikovaný zápočet'), by an examination only ('pouze zkouška'), or by an assessment and an examination ('zápočet a zkouška').
The difficulty of a given course is expressed by the number of ECTS credits.
The course is in session (i.e., teaching takes place) during a semester. Each course is offered in either the winter ('zimní') or the summer ('letní') semester of an academic year. Exceptionally, a course may be offered in both semesters.
The subject matter of a course is described in various texts.

MI-DDM Distributed Data Mining
Extent of teaching: 3C
Instructor:
Completion: KZ
Department: 18105
Credits: 4
Semester: L
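
The abbreviated attribute values in this record can be decoded with a small lookup table. The sketch below assumes that KZ abbreviates 'klasifikovaný zápočet' (graded assessment) and that L and Z abbreviate 'letní' (summer) and 'zimní' (winter); these mappings are inferred from the Czech terms in the instructions, not stated explicitly on the page, and the names used in the code are hypothetical. Codes that do not appear on this page are deliberately left out.

```python
# Assumed meanings of the abbreviations, inferred from the Czech terms
# quoted in the instructions; only codes seen on this page are included.
COMPLETION = {"KZ": "graded assessment (klasifikovaný zápočet)"}
SEMESTER = {"Z": "winter (zimní)", "L": "summer (letní)"}


def describe(completion: str, semester: str) -> str:
    """Return a readable description of the completion and semester codes."""
    completion_text = COMPLETION.get(completion, "unknown completion code")
    semester_text = SEMESTER.get(semester, "unknown")
    return f"{completion_text}, {semester_text} semester"


if __name__ == "__main__":
    # Values taken from the MI-DDM record.
    print(describe("KZ", "L"))
```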

Annotation:
The course focuses on state-of-the-art approaches to distributed data mining and the parallelization of machine learning algorithms. Students will gain hands-on experience with the large-scale data processing framework Apache Spark and with existing distributed DM/ML algorithms. They will learn the principles of their parallel implementations and will be able to propose approaches for parallelizing other algorithms. The course is presented in Czech.

Lecture syllabus:

Seminar syllabus:
1) Introduction to MapReduce, Apache Spark and cluster infrastructure
2) Data structures of the Apache Spark framework: RDDs, DataFrames, Datasets
3) Apache Spark ML pipelines, MLlib
4) Distributed data, data exploration, basic statistics
5) Distributed data-preprocessing (feature extraction and transformation, feature selection, dimensionality reduction)
6) Association rule mining, collaborative filtering, alternating least squares
7) Distributed classification and regression algorithms
8) Distributed clustering algorithms
9) Distributed ensemble algorithms
10) Algorithms for information retrieval and text mining
11) Deep learning and artificial neural networks
12) Stream processing, online algorithms

Literature:
Pentreath, Nick. Machine Learning with Spark. Packt Publishing Ltd, 2015.

Requirements:
Knowledge of at least one of the programming languages Python, Java or Scala. Knowledge of fundamentals of machine learning algorithms.

Course information and teaching materials are available at https://courses.fit.cvut.cz/MI-DDM/

The course is also part of the following Study plans:
Study Plan | Study Branch/Specialization | Role | Recommended semester
NI-TI.2018 | Computer Science | V | 2
MI-ZI.2016 | Knowledge Engineering | V | None
MI-ZI.2018 | Knowledge Engineering | V | None
MI-SP-TI.2016 | System Programming | V | None
MI-SP-SP.2016 | System Programming | V | None
MI-SPOL.2016 | Unspecified Branch/Specialisation of Study | V | None
MI-WSI-WI.2016 | Web and Software Engineering | V | None
MI-WSI-SI.2016 | Web and Software Engineering | V | None
MI-WSI-ISM.2016 | Web and Software Engineering | V | None
MI-NPVS.2016 | Design and Programming of Embedded Systems | V | None
MI-PSS.2016 | Computer Systems and Networks | V | None
MI-PB.2016 | Computer Security | V | None


Page updated 19. 4. 2024, semester: L/2020-1, L/2021-2, Z/2023-4, Z/2024-5, Z/2019-20, Z/2022-3, L/2019-20, L/2022-3, Z/2020-1, Z/2021-2, L/2023-4. Send comments on the content presented here to the Administrator of study plans. Design and implementation: J. Novák, I. Halaška