Course description


A course is the basic teaching unit. It is designed as a means for a student to acquire the comprehensive knowledge and skills indispensable in the given field. A course guarantor is responsible for the factual content of the course.
For each course, one department is responsible for its organisation. The person responsible for timetabling in that department sets the teaching schedule and assigns an instructor and/or an examiner to each class.
The expected time demand of the course is expressed by the course attribute extent of teaching. For example, extent = 2+2 indicates two teaching hours of lectures and two teaching hours of seminars (labs) per week.
At the end of each semester, the course instructor evaluates the extent to which a student has acquired the expected knowledge and skills. The type of evaluation is indicated by the attribute completion. A course can be completed by an assessment only ('pouze zápočet'), by a graded assessment ('klasifikovaný zápočet'), by an examination only ('pouze zkouška'), or by an assessment and an examination ('zápočet a zkouška').
The difficulty of a given course is expressed by the number of ECTS credits.
The course is in session (i.e. teaching takes place) during a semester. Each course is offered in either the winter ('zimní') or the summer ('letní') semester of an academic year. Exceptionally, a course may be offered in both semesters.
The subject matter of a course is described in various texts.

MI-PDM Practical Data Mining Extent of teaching: 2P+1C

Instructor: Completion: Z,ZK

Department: 18105 Credits: 5 Semester: L

Annotation:
Students are introduced to the basic methods of discovering knowledge in data. In particular, they learn the basic techniques of data preprocessing, data visualization, statistical techniques of data transformation, and fundamental principles of knowledge discovery methods. Students will be aware of the relationships between model bias and variance, and know the fundamentals of assessing model quality. Data mining software is extensively used in the module. Students will be able to apply basic data mining tools to common problems (classification, regression, clustering).
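The relationship between model bias and variance mentioned above can be illustrated with a minimal sketch. It is not taken from the course materials: the synthetic data set, the random seeds, the chosen tree depths, and the helper name `fit_and_score` are all assumptions made for illustration. It fits one shallow and one unrestricted decision tree with scikit-learn and compares accuracy on training data and on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data; sizes and seeds are illustrative assumptions.
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)


def fit_and_score(max_depth):
    """Fit a decision tree and return (train accuracy, test accuracy)."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    return tree.score(X_train, y_train), tree.score(X_test, y_test)


# A shallow tree leans toward high bias, an unrestricted one toward high variance.
results = {depth: fit_and_score(depth) for depth in (2, None)}
for depth, (train_acc, test_acc) in results.items():
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A large gap between training and test accuracy for the unrestricted tree would suggest overfitting. This is why model quality is usually assessed on data not used for fitting.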

Lecture syllabus:

1) Introduction and motivation

2) Decision trees

3) Clustering (K-means, hierarchical clustering)

4) K-NN

5) Naive Bayes

6) Linear regression

7) Logistic regression

8) Dimensionality reduction (SVD, PCA)

9) NLP (natural language processing)

Up to four lectures will be given by external speakers from industry.

Seminar syllabus:

1) Jupyter Notebook and the pandas, NumPy, and scikit-learn packages

2) Data visualisation

3) Decision trees

4) Clustering

5) Linear regression

6) PCA
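As a rough idea of how the seminar tools fit together, the following sketch combines pandas, scikit-learn clustering, and PCA. It is not drawn from the seminar materials: the choice of the Iris data set (bundled with scikit-learn), the number of clusters, and the column names `pc1`, `pc2`, and `cluster` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# The Iris data set ships with scikit-learn; it is used here only as a stand-in.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Standardise features so that no single measurement dominates the distances.
X = StandardScaler().fit_transform(df)

# Project the four standardised features onto the first two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(X)
df["pc1"], df["pc2"] = components[:, 0], components[:, 1]

# Group the samples into three clusters (k=3 is an assumption for this data).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(X)

print(df[["pc1", "pc2", "cluster"]].head())
print("explained variance ratio:", pca.explained_variance_ratio_.round(2))
```

In a notebook, the `pc1` and `pc2` columns could then be drawn as a scatter plot coloured by `cluster`, which ties the clustering and PCA topics to the data visualisation seminar.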

Literature:

1. Larose, D. T. Discovering Knowledge in Data: An Introduction to Data Mining. Wiley-Interscience, 2004.

2. Hastie, T., Tibshirani, R., Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2011.

Requirements:
Fundamentals of algebra, statistics, programming

Course information and teaching materials can be found at https://courses.fit.cvut.cz/MI-PDM/

The course is also part of the following Study plans:

Study Plan | Study Branch/Specialization | Role | Recommended semester
MI-WSI-ISM.2016 | Web and Software Engineering | PZ | 2

Page updated 25. 4. 2024; semesters: Z,L/2023-4, Z/2019-20, Z/2024-5, L/2022-3, Z/2020-1, Z,L/2021-2, L/2020-1, Z/2022-3, L/2019-20. Design and implementation: J. Novák, I. Halaška
