This only includes relevant information. Useless / trivial information is not included. If you don't know shit about AC read the slides first.

Data Mining

KDD Process Model

  1. Selection

Identification and selection of all external and internal sources of information, and selection of the subset of data or variables needed for the KDD process. In short, get the fucking data.

  2. Preprocessing

Includes removing data with extreme values (outliers), filling in missing values, etc.

  3. Transformation

Converting the data into a format suitable for Data Mining algorithms.

  4. Data Mining

Applying computational methods to find previously unknown and potentially useful patterns in data, like finding associations, detecting anomalies, extracting rules, etc.

  5. Interpretation / Evaluation

Evaluate the results and, if they are not satisfactory, set up a new round of experiments.
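
A rough sketch of what the first three KDD steps (selection, preprocessing, transformation) could look like on a toy data set. Everything here is an assumption made for illustration: the records, the column names, the helper functions, the "fill with the median" rule and the "drop anything above 10x the median" rule are not from the slides, just one simple way of doing each step.

```python
from statistics import median

# Toy records; None marks a missing value (all values are made up for illustration).
raw = [
    {"id": 1, "age": 23, "income": 1800.0},
    {"id": 2, "age": None, "income": 2100.0},
    {"id": 3, "age": 31, "income": 2500.0},
    {"id": 4, "age": 27, "income": 90000.0},  # extreme income
    {"id": 5, "age": 35, "income": 2300.0},
]

def select(rows, columns):
    """Selection: keep only the variables needed for the analysis."""
    return [{c: r[c] for c in columns} for r in rows]

def fill_missing(rows, column):
    """Preprocessing: replace missing values with the column median."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def drop_extremes(rows, column, factor=10.0):
    """Preprocessing: drop rows whose value exceeds factor * median (a crude rule)."""
    limit = factor * median(r[column] for r in rows)
    return [r for r in rows if r[column] <= limit]

def min_max_scale(rows, column):
    """Transformation: rescale a column to the [0, 1] range."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a constant column
    return [{**r, column: (r[column] - lo) / span} for r in rows]

selected = select(raw, ["age", "income"])
cleaned = drop_extremes(fill_missing(selected, "age"), "income")
prepared = min_max_scale(min_max_scale(cleaned, "age"), "income")

for row in prepared:
    print(row)
```

After these steps `prepared` holds only the two selected columns, with no missing values and no extreme income, all rescaled to [0, 1], which is the kind of input a Data Mining algorithm would then receive.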

CRISP-DM

CRoss Industry Standard Process for Data Mining

  1. Business Understanding

Focuses on understanding the objective of the project from a business perspective.

  2. Data Understanding

Get familiar with the data: identify quality problems and interesting information in it.

  3. Data Preparation

Construction of the final data set from the initial one. Usually occurs several times in the process.

  4. Modelling

Apply modelling techniques to the data and calibrate their parameters. It is common to return to data preparation.

  5. Evaluation

Verify the generated model to check if it reaches the goals defined in the business understanding phase.

  6. Deployment

Deploy the thing.

Descriptive Statistics

Population

Set of similar objects which is of interest for some experiment.

Sample

Set of data collected / selected from a population.

Deduction

Reasoning about a sample given knowledge of the population it was extracted from.

Induction

Reasoning about the population given a sample extracted from it.
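
A small sketch of population vs. sample and of induction, assuming a made-up population (the integers 1 to 1000) and an arbitrary sample size of 50. In practice the population mean is usually unknown; it is computed here only to show how far the estimate is from it.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: the integers 1..1000.
population = list(range(1, 1001))

# Sample: a subset drawn at random, without replacement.
sample = random.sample(population, k=50)

population_mean = mean(population)  # usually unknown in practice
sample_mean = mean(sample)          # what we can actually compute

# Induction: use the sample statistic as an estimate of the population parameter.
estimation_error = abs(sample_mean - population_mean)
print(f"population mean = {population_mean}")
print(f"sample mean     = {sample_mean}")
print(f"error           = {estimation_error}")
```

Deduction goes the other way: knowing the population, you reason about what a sample drawn from it should look like.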

Descriptive Statistics

Methods / techniques to describe or summarize samples in order to help humans understand them.

Scale Types

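Qualitative Scales:
- Nominal: categories with no order; values can only be compared for equality (==, !=)
- Ordinal: categories with an order; values can also be ranked (<, >, <=, >=)
Quantitative Scales:
- Relative (interval): no absolute zero, so differences are meaningful but ratios are not (+, -)
- Absolute (ratio): has an absolute zero, so ratios are meaningful too (*, /)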
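
A quick sketch of which operations make sense on each scale type. The example values (colors, ratings, temperatures) and the ordering of the rating categories are assumptions picked for illustration.

```python
from statistics import median_low, mode

# Nominal: only equality makes sense, so the mode is the usual summary.
colors = ["red", "blue", "red", "green"]
most_common_color = mode(colors)

# Ordinal: order makes sense, so we can rank values and take a median.
order = ["low", "medium", "high"]  # assumed ordering of the categories
ratings = ["low", "high", "medium", "medium", "high"]
ranks = sorted(order.index(r) for r in ratings)
median_rating = order[median_low(ranks)]

# Relative (interval): differences make sense, ratios do not.
celsius = [10.0, 20.0]
difference = celsius[1] - celsius[0]        # 10 degrees warmer: meaningful
misleading_ratio = celsius[1] / celsius[0]  # 2.0, but 20 C is not "twice as hot"

# Absolute (ratio): a true zero exists, so ratios make sense.
kelvin = [c + 273.15 for c in celsius]
true_ratio = kelvin[1] / kelvin[0]          # about 1.035

print(most_common_color, median_rating, difference, misleading_ratio, round(true_ratio, 3))
```

Celsius vs. Kelvin is the classic case: same physical quantity, but only Kelvin has an absolute zero, so only there does dividing two values mean something.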

Descriptive Univariate Analysis

Descriptive Bivariate Analysis

Covariance

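Measures the degree to which two attributes vary together linearly: positive when they tend to increase together, negative when one tends to decrease as the other increases. Its magnitude depends on the units of the attributes.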
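
A minimal sketch of the sample covariance, assuming the usual n - 1 denominator (check the slides for whether the course divides by n or n - 1). The function name and the toy data are made up for illustration.

```python
from statistics import mean

def sample_covariance(xs, ys):
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("both attributes need the same number of values")
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    mean_x, mean_y = mean(xs), mean(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)

# Toy data: two attributes that grow together.
hours = [1.0, 2.0, 3.0, 4.0]
score = [2.0, 4.0, 6.0, 8.0]

cov_positive = sample_covariance(hours, score)        # attributes increase together
cov_negative = sample_covariance(hours, score[::-1])  # one goes up, the other down
cov_rescaled = sample_covariance(hours, [100 * s for s in score])  # same relation, other units

print(cov_positive, cov_negative, cov_rescaled)
```

The rescaled case shows why covariance alone is hard to interpret: multiplying one attribute by 100 multiplies the covariance by 100 even though the relation between the attributes is exactly the same.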

Descriptive Multivariate Analysis

Multivariate Frequencies


EXAM 23/24

  1. e
  2. d
  3. c
  4. a
  5. c
  6. b

EXAM 24/25

  1. idk
  2. idk
  3. b
  4. a
  5. b
  6. c
  7. e
  8. idk
  9. c
  10. a
  11. idk
  12. b
  13. b
  14. e
  15. c
  16. a
  17. b
  18. b
  19. idk
  20. b
  21. sdfs
  22. a/b
  23. c