This only includes relevant information. Useless / trivial information is not included. If you don't know shit about AC read the slides first.
Data Mining
KDD Process Model
- Selection
Identify the relevant internal and external sources of information, then select the subset of data or variables needed for the KDD process. In short, get the fucking data.
- Preprocessing
Includes the removal of data with extreme values, filling in missing values, etc.
- Transformation
Converting data into a format suitable for Data Mining algorithms
- Data Mining
Applying computational methods to find previously unknown or useful patterns in data, like finding associations, detecting anomalies, extracting rules, etc.
- Interpretation / Evaluation
Evaluate the results, and if they are not satisfactory form a new set of experiments.
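A minimal sketch of the Preprocessing and Transformation steps above, assuming a tiny made-up list of ages. The data, the plausible range of 0 to 120, the median fill, and the min-max scaling are all illustrative choices, not something the slides prescribe.

```python
from statistics import median

# Hypothetical raw data: one missing value (None) and one extreme value (250).
raw_ages = [23, 25, None, 31, 29, 250, 27]


def is_plausible(age, low=0, high=120):
    """Keep missing values for now; reject values outside an assumed range."""
    return age is None or low <= age <= high


# Preprocessing, step 1: remove data with extreme values.
plausible = [a for a in raw_ages if is_plausible(a)]

# Preprocessing, step 2: fill missing values with the median of the known ones.
known = [a for a in plausible if a is not None]
fill_value = median(known)
cleaned = [fill_value if a is None else a for a in plausible]

# Transformation: min-max scale to [0, 1], a format many algorithms expect.
lowest, highest = min(cleaned), max(cleaned)
scaled = [(a - lowest) / (highest - lowest) for a in cleaned]

print(cleaned)
print(scaled)
```

Filling with the median instead of the mean is just one option; it is less sensitive to any extreme values that slipped through the first step.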
CRISP-DM
CRoss Industry Standard Process for Data Mining
- Business Understanding
Focuses on understanding the objective of the project from a business perspective.
- Data Understanding
Identify problems / interesting information about the data.
- Data Preparation
Construction of the final data set from the initial one. Usually occurs several times in the process.
- Modelling
Apply modelling techniques to the data and calibrate their parameters. It is common to return to data preparation.
- Evaluation
Verify the generated model to check if it reaches the goals defined in the business understanding phase.
- Deployment
Deploy the thing: put the model, or the knowledge gained from it, to use in the organisation.
Descriptive Statistics
Population
Set of similar objects which is of interest for some experiment.
Sample
Set of data collected / selected from a population.
Deduction
Reasoning about a sample given knowledge of the population it was extracted from.
Induction
Reasoning about the population given a sample.
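To make the population / sample / induction vocabulary concrete, here is a small sketch that assumes a synthetic population of heights (the numbers are made up). It draws a sample and uses the sample mean to estimate the population mean, which is the inductive direction.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical population: 10,000 synthetic heights in cm.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Sample: a subset selected from the population (without replacement).
sample = random.sample(population, 200)

# Induction: reason about the population using only the sample.
estimate = mean(sample)

# In real life the true value is unknown; here we can check it
# because the whole population is synthetic.
true_mean = mean(population)
error = abs(estimate - true_mean)

print(f"estimate={estimate:.2f} true={true_mean:.2f} error={error:.2f}")
```

Descriptive statistics would stop at summarizing `sample`; the step from `estimate` to a claim about `population` is what induction adds.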
Descriptive Statistics
Methods / techniques to describe or summarize samples in order to help humans understand them.
Scale Types
Qualitative Scales:
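- Nominal: categories with no inherent order; values can only be compared for equality (=, ≠)
- Ordinal: categories with a meaningful order; values can also be ranked (<, >, ≤, ≥)
Quantitative Scales:
- Relative: has no absolute zero, so differences are meaningful but ratios are not (+, −)
- Absolute: has an absolute zero, so ratios are meaningful too (×, ÷)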
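A short sketch of which operations make sense on each scale, using made-up values. The example attributes (colour, size, temperature in Celsius, weight in kg) are assumptions chosen for illustration.

```python
# Nominal: only equality checks are meaningful.
colour_a, colour_b = "red", "blue"
same_colour = colour_a == colour_b

# Ordinal: order comparisons are meaningful, differences are not.
SIZE_ORDER = ["small", "medium", "large"]  # hypothetical ordered categories


def size_rank(size):
    """Position of a size label in the assumed ordering."""
    return SIZE_ORDER.index(size)


medium_below_large = size_rank("medium") < size_rank("large")

# Relative: differences are meaningful, ratios are not (the zero is arbitrary).
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


temp_mon_c, temp_tue_c = 10.0, 20.0
temp_diff = temp_tue_c - temp_mon_c                # a valid statement
ratio_c = temp_tue_c / temp_mon_c                  # 2.0 in Celsius...
ratio_f = c_to_f(temp_tue_c) / c_to_f(temp_mon_c)  # ...but 1.36 in Fahrenheit

# Absolute: there is a true zero, so ratios are meaningful.
weight_a_kg, weight_b_kg = 40.0, 80.0
weight_ratio = weight_b_kg / weight_a_kg           # "twice as heavy" holds

print(same_colour, medium_below_large, temp_diff, ratio_c, ratio_f, weight_ratio)
```

The temperature ratio changing with the unit is the practical sign that a scale has no absolute zero.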
Descriptive Univariate Analysis
- Absolute Frequency: counts how many times a value appears
- Relative Frequency: the percentage of times a value appears.
- Absolute Cumulative Frequency: number of occurrences less than or equal to a given value.
- Relative Cumulative Frequency: the percentage of occurrences less than or equal to a given value.
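The four frequencies above can be sketched with the standard library, assuming a small made-up list of grades. Relative values are kept as fractions here; multiply by 100 for percentages.

```python
from collections import Counter
from itertools import accumulate

grades = [3, 1, 2, 3, 3, 2, 5, 1, 3, 2]  # hypothetical sample
n = len(grades)

# Absolute frequency: how many times each value appears.
abs_freq = Counter(grades)

# Cumulative frequencies need the values in ascending order.
values = sorted(abs_freq)

# Relative frequency: share of the sample taken by each value.
rel_freq = {v: abs_freq[v] / n for v in values}

# Absolute cumulative frequency: occurrences less than or equal to each value.
abs_cum = dict(zip(values, accumulate(abs_freq[v] for v in values)))

# Relative cumulative frequency: the same, as a share of the sample.
rel_cum = {v: abs_cum[v] / n for v in values}

for v in values:
    print(v, abs_freq[v], rel_freq[v], abs_cum[v], rel_cum[v])
```

Cumulative frequencies only make sense when the values can be ordered, so they apply to ordinal and quantitative scales, not nominal ones.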
Descriptive Bivariate Analysis
Covariance
Measures the degree of linear relationship between two attributes: positive when they tend to increase together, negative when one tends to decrease as the other increases.
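A minimal sketch of covariance on made-up data. It assumes the sample version of the formula (dividing by n - 1); the notes do not say which denominator the course uses, so check the slides. The function name `sample_covariance` is hypothetical.

```python
from statistics import mean


def sample_covariance(xs, ys):
    """Sample covariance of two equally long sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = mean(xs), mean(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


hours = [1, 2, 3, 4, 5]          # hypothetical attribute
score_up = [2, 4, 6, 8, 10]      # rises with hours
score_down = [10, 8, 6, 4, 2]    # falls as hours rise

cov_up = sample_covariance(hours, score_up)      # positive
cov_down = sample_covariance(hours, score_down)  # negative

print(cov_up, cov_down)
```

The sign gives the direction of the relation, but the magnitude depends on the units of both attributes, so covariances from different attribute pairs are not directly comparable.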
Descriptive Multivariate Analysis
Multivariate Frequencies
EXAM 23/24
- e
- d
- c
- a
- c
- b
EXAM 24/25
- idk
- idk
- b
- a
- b
- c
- e
- idk
- c
- a
- idk
- b
- b
- e
- c
- a
- b
- b
- idk
- b
- sdfs
- a/b
- c