[ Week 2-1 ] types of attribute / probability / Entropy

working_helen

HaeWon_Seo 2024. 3. 17. 20:17

Lecture : Machine Learning

Date : week 2-1, 2024/03/04

Topic : Probability

1. types of Attribute
2. Probability Model
3. Entropy

1. types of Attribute

Categorical (Nominal) variable : discrete, no ordering; includes boolean types
Ordinal variable : discrete values with a natural ordering; mathematical operations do not apply
Continuous (Numerical) variable : real-valued; mathematical operations apply

- Applying arithmetic to an Ordinal variable gives a meaningless result, whereas for a Continuous variable the result is meaningful

ex) twice 1st place (X), twice 1 m is 2 m (O)

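- What about an int variable? -> its values are discrete, but it is treated as a continuous value because mathematical operations apply to it
- Choose a model and a learning method that suit the type of each variable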
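
As a rough illustration of why the attribute type matters, the sketch below encodes one record three ways. The attribute names (`color`, `size`, `height_m`), their values, and the helper functions are hypothetical and not from the lecture; one-hot encoding for nominal values and rank encoding for ordinal values are common choices, not the only ones.

```python
# Hypothetical category lists; names and values are illustrative only.
COLORS = ["red", "green", "blue"]           # nominal: no order at all
SIZE_ORDER = ["small", "medium", "large"]   # ordinal: order matters, spacing does not


def encode_nominal(value, categories):
    """One-hot encode a nominal value so that no artificial order is implied."""
    return [1 if value == c else 0 for c in categories]


def encode_ordinal(value, ordered_levels):
    """Map an ordinal value to its rank; only comparisons on the rank are meaningful."""
    return ordered_levels.index(value)


record = {"color": "green", "size": "large", "height_m": 1.75}

color_vec = encode_nominal(record["color"], COLORS)
size_rank = encode_ordinal(record["size"], SIZE_ORDER)
height = record["height_m"]

print(color_vec)                                         # [0, 1, 0]
print(size_rank)                                         # 2
print(size_rank > encode_ordinal("small", SIZE_ORDER))   # True: ordering is meaningful
print(height * 2)                                        # 3.5: arithmetic is meaningful
```

Doubling `size_rank` would run without error, but the result would not correspond to any real size, which is the point of the "twice 1st place" example.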

2. Probability Model

1) Probability Model

= sample space + events + probability distribution
- a mathematical representation of events
- the probability distribution is used to compute the probability P(x) that each event occurs
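
The three parts can be sketched for a single roll of a fair six-sided die. The die, the chosen events, and the `probability` helper are assumptions made for illustration.

```python
from fractions import Fraction

# Sample space: every outcome of one roll of a fair six-sided die (assumed example).
sample_space = {1, 2, 3, 4, 5, 6}

# Probability distribution: each outcome is equally likely.
distribution = {outcome: Fraction(1, 6) for outcome in sample_space}


def probability(event):
    """P(event) is the sum of the probabilities of the outcomes in the event."""
    return sum((distribution[o] for o in event), Fraction(0))


even = {2, 4, 6}          # event: the roll is even
at_least_five = {5, 6}    # event: the roll is 5 or 6

print(probability(even))            # 1/2
print(probability(at_least_five))   # 1/3
print(probability(sample_space))    # 1
```

An event is just a subset of the sample space, so its probability comes directly from the distribution over outcomes.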

2) Probability distribution

- Bernoulli trials : independent trials, each with only two possible outcomes
- Binomial distribution : the probability distribution of the number of successes when a Bernoulli trial is repeated a fixed number of times

Source : https://en.wikipedia.org/wiki/Binomial_distribution
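
A minimal sketch of the binomial probability mass function, P(k) = C(n, k) p^k (1 - p)^(n - k), follows. The function name `binomial_pmf` and the example of 10 fair coin flips are assumptions for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # assumed example: 10 fair coin flips
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(binomial_pmf(5, n, p))   # 0.24609375
print(sum(pmf))                # 1.0
```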

- Multinomial distribution : the probability distribution of the outcome counts when an independent trial with three or more possible outcomes is repeated multiple times

Source : https://en.wikipedia.org/wiki/Multinomial_distribution
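
The multinomial case can be sketched the same way: the probability of a particular vector of counts is n! / (c1! ... ck!) times the product of p_i^c_i. The helper `multinomial_pmf` and the fair-die example are assumptions for illustration.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of seeing counts[i] occurrences of outcome i in sum(counts) independent trials."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # stays an integer at every step
    return coefficient * prod(p**c for p, c in zip(probs, counts))


# Assumed example: a fair die rolled 6 times, every face appearing exactly once.
fair_die = [1 / 6] * 6
print(multinomial_pmf([1, 1, 1, 1, 1, 1], fair_die))  # about 0.0154 (= 6!/6^6)
```

With only two outcomes the same function reduces to the binomial probability, for example `multinomial_pmf([5, 5], [0.5, 0.5])`.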

- Normal (Gaussian) distribution : often used as the probability distribution of a noisy continuous variable

Source : https://en.wikipedia.org/wiki/Normal_distribution
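
As a small sketch of modelling a noisy continuous measurement, the example below uses `statistics.NormalDist` from the Python standard library. The mean of 10.0 and standard deviation of 0.5 are assumed values chosen for illustration.

```python
from statistics import NormalDist

# Assumed example: true value 10.0 with Gaussian noise of standard deviation 0.5.
noise_model = NormalDist(mu=10.0, sigma=0.5)

# For a continuous variable, pdf gives a density, not a probability.
print(noise_model.pdf(10.0))

# Probability that a measurement falls within one standard deviation of the mean.
within_one_sigma = noise_model.cdf(10.5) - noise_model.cdf(9.5)
print(round(within_one_sigma, 4))  # 0.6827
```

Probabilities for a continuous variable come from intervals (differences of the cumulative distribution), which is why the second value is computed with `cdf` rather than `pdf`.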

3. Entropy

1) Entropy
- Measure of information/unpredictability

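- the average amount of information (degree of uncertainty) carried by a variable's possible outcomes

- higher Entropy → can convey more information; the variable carries more information on average
- Low Entropy = the outcome is highly predictable; because it is easy to predict, it carries little information
- High Entropy = the outcome is highly unpredictable; because it is hard to predict, it carries a lot of information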

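2) Calculating Entropy

- formula for the Entropy of a discrete variable : H(X) = - Σ P(x) log P(x), summed over all possible outcomes x

Source : https://en.wikipedia.org/wiki/Entropy_(information_theory)

- range of Entropy values : 0 ~ log(N), N = number of possible outcomes
- only one possible outcome → Entropy = 0
- every event is equally likely → Entropy = log(N)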
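
The range stated above can be checked numerically with a short sketch. The base-2 logarithm (entropy in bits) is an assumption here, since the notes do not fix the base, and the function name `entropy` and the example distributions are illustrative.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return sum(-p * log2(p) for p in probs if p > 0)


print(entropy([1.0]))                     # 0.0: a single certain outcome
print(entropy([0.5, 0.5]))                # 1.0: log2(2)
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0: log2(4)
print(entropy([0.9, 0.1]))                # about 0.469: skewed, so below log2(2)
```

The uniform distributions reach the upper bound log2(N), while any skew toward one outcome pulls the value down toward 0.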

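3) Entropy use cases : conveying messages + data compression

- data compression : use Entropy to compress the original data so that the amount of data transmitted is minimized
- text message compression : to send a text message in as few bits as possible, each letter is assigned its own code, which compresses the data
- frequently used letter = letter with high probability → assign a shorter code
= the transmitted bit length becomes shorter; the average number of bits per letter approaches the text's Entropy, which is its lower bound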
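
The notes do not name a specific coding scheme. Huffman coding is one standard way to give frequent letters shorter codes, so the sketch below uses it as an assumed stand-in. The function `huffman_code_lengths` and the toy message are hypothetical; the code only computes code lengths, not the actual bit strings.

```python
import heapq
from collections import Counter
from math import log2


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "aaaabbc"  # assumed toy message
lengths = huffman_code_lengths(text)
freq = Counter(text)
n = len(text)

avg_bits = sum(freq[s] * lengths[s] for s in freq) / n
entropy_bits = sum(-(freq[s] / n) * log2(freq[s] / n) for s in freq)

print(lengths["a"] <= lengths["b"] <= lengths["c"])  # True: frequent letters get shorter codes
print(entropy_bits <= avg_bits)                      # True: Entropy is the lower bound
```

For this message the most frequent letter gets a 1-bit code and the other two get 2-bit codes, and the resulting average length stays at or above the Entropy of the letter distribution.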

Reference

https://en.wikipedia.org/wiki/Entropy_(information_theory)
