
[Codeit: Data Science Toolkit] First Steps in Python Data Analysis - Learning Numpy, Pandas, and Matplotlib the Easy, Powerful Way

by all_zer0 2025. 1. 11.

 

 

Numpy

1. What is Numpy?

- Numerical Python: a Python tool optimized for numerical computation

- Numpy array: a data type similar to a Python list, but one that handles large amounts of data with concise code
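To make the list comparison concrete, here is a minimal sketch. The values and the variable names (prices, price_array) are made up for illustration and are not from the course material.

```python
import numpy as np

prices = [1000, 2000, 3000]        # plain Python list (made-up values)
price_array = np.array(prices)     # the same values as a Numpy array

# A list needs a loop or comprehension for element-wise math.
doubled_list = [p * 2 for p in prices]

# A Numpy array applies the operation to every element at once.
doubled_array = price_array * 2

print(doubled_list)     # [2000, 4000, 6000]
print(doubled_array)    # [2000 4000 6000]
print(prices * 2)       # list repetition: [1000, 2000, 3000, 1000, 2000, 3000]
```

Note that multiplying a list by 2 repeats it, while multiplying an array by 2 doubles each value.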

 

 

2. Numpy and arrays

[ Code ]

import numpy as np

print(np.zeros(5))
print(np.arange(10))
print(np.arange(2,10))  # start, stop (stop is excluded)
print(np.arange(4, 17, 3))  # start, stop, step
[ Output ]

[0. 0. 0. 0. 0.]
[0 1 2 3 4 5 6 7 8 9]
[2 3 4 5 6 7 8 9]
[ 4  7 10 13 16]

 

 

3. Indexing and slicing: 1-D arrays

[ Code ]

import numpy as np

gdp_array = np.array([6610, 7637, 8127, 8885, 10385, 12565, 13403, 12398, 8282, 10672])

# Indexing
print(gdp_array[0])
print(gdp_array[[1, 3, 4]])

# Slicing
print(gdp_array[2:6])   # takes the values at indices 2 through 5 as a numpy array
[ Output ]

6610
[ 7637  8885 10385]
[ 8127  8885 10385 12565]
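A few more slicing patterns, sketched with the same gdp_array. One point worth knowing: a basic Numpy slice is a view of the original data, so writing to the slice also changes the original. The names window and safe_window are hypothetical.

```python
import numpy as np

gdp_array = np.array([6610, 7637, 8127, 8885, 10385, 12565, 13403, 12398, 8282, 10672])

print(gdp_array[-1])     # last value: 10672
print(gdp_array[:3])     # first three values: [6610 7637 8127]
print(gdp_array[::2])    # every second value: [ 6610  8127 10385 13403  8282]

# A basic slice is a view, so writing to it changes the original array.
window = gdp_array[2:6]
window[0] = 0
print(gdp_array[2])      # 0

# .copy() gives an independent array instead.
safe_window = gdp_array[2:6].copy()
safe_window[1] = -1
print(gdp_array[3])      # still 8885
```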

 

 

4. Indexing and slicing: 2-D arrays

[ Code ]

import numpy as np

gdp_array = np.array([
    [12257, 11561, 13165, 14673, 16496, 19403],  # South Korea
    [39169, 34406, 32821, 35387, 38299, 37813],  # Japan
    [959, 1053, 1149, 1289, 1509, 1753],         # China
    [36335, 37133, 38023, 39496, 41713, 44115]   # United States
])

# Indexing
print(gdp_array[1])         # Japan's data
print(gdp_array[1][3])      # Japan's GDP in the 4th year; same as gdp_array[1, 3]

# Slicing
print(gdp_array[1:3, 2:5])  # Japan through China, indices 2 to 4 for each country
[ Output ]

[39169 34406 32821 35387 38299 37813]
35387
[[32821 35387 38299]
 [ 1149  1289  1509]]
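As a small extension of the 2-D example, the sketch below pulls out a whole column with `:` and checks that the two single-cell forms agree. The variable names first_year and last_two_years are made up for this note.

```python
import numpy as np

gdp_array = np.array([
    [12257, 11561, 13165, 14673, 16496, 19403],  # South Korea
    [39169, 34406, 32821, 35387, 38299, 37813],  # Japan
    [959, 1053, 1149, 1289, 1509, 1753],         # China
    [36335, 37133, 38023, 39496, 41713, 44115]   # United States
])

print(gdp_array.shape)            # (4, 6): 4 countries, 6 years

# ":" means "every row", so this is the first year for all countries.
first_year = gdp_array[:, 0]
print(first_year)                 # [12257 39169   959 36335]

# Negative indices work per axis: the last two years for every country.
last_two_years = gdp_array[:, -2:]
print(last_two_years.shape)       # (4, 2)

# Both forms read the same cell.
print(gdp_array[1][3] == gdp_array[1, 3])   # True
```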

 

 

5. Boolean indexing

[ Code ]

import numpy as np

gdp_array = np.array([6618, 8127, 8885, 12665, 13483, 12398, 8282, 10672])

gdp_array > 10000
[ Output ]

array([False, False, False,  True,  True,  True, False,  True])

 

- mask: only the values above 10,000 become True, so indexing with the mask returns just those values

[ Code (continued) ]

mask = gdp_array > 10000
gdp_array[mask]
[ Output ]

array([12665, 13483, 12398, 10672])

 

- AND and OR operations

[ Code (continued) ]

# AND operator: &
gdp_array[(gdp_array < 10000) & (gdp_array > 8000)]
[ Output ]

array([8127, 8885, 8282])

 

[ Code (continued) ]

# OR operator: |

gdp_array[(gdp_array > 10000) | (gdp_array < 8000)]
[ Output ]

array([ 6618, 12665, 13483, 12398, 10672])
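Two related tricks that go with masks, sketched on the same gdp_array: counting matches with sum() and flipping a mask with `~`. The last part shows why `&` is used rather than Python's `and`, which raises an error on arrays.

```python
import numpy as np

gdp_array = np.array([6618, 8127, 8885, 12665, 13483, 12398, 8282, 10672])
mask = gdp_array > 10000

# True counts as 1, so summing a mask counts the matching values.
print(mask.sum())          # 4

# NOT operator: ~ flips every True/False in the mask.
print(gdp_array[~mask])    # [6618 8127 8885 8282]

# Python's `and` / `or` do not work element-wise on arrays.
try:
    gdp_array[(gdp_array < 10000) and (gdp_array > 8000)]
except ValueError as error:
    print("ValueError:", error)
```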

 

 

6. Basic Numpy operations

- gdp_korea_array.mean(): mean
- gdp_korea_array.sum(): sum
- gdp_korea_array.min(): minimum
- gdp_korea_array.max(): maximum
- gdp_korea_array * 1200: multiplies every value by 1200 (the result is not stored back into the existing variable; to keep it, reassign with gdp_korea_array = gdp_korea_array * 1200)
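The operations above can be tried out as follows. This sketch assumes gdp_korea_array holds South Korea's row from the 2-D example earlier; gdp_in_won is a hypothetical name.

```python
import numpy as np

# Assumption: South Korea's row from the 2-D example, reused as gdp_korea_array.
gdp_korea_array = np.array([12257, 11561, 13165, 14673, 16496, 19403])

print(gdp_korea_array.mean())   # 14592.5
print(gdp_korea_array.sum())    # 87555
print(gdp_korea_array.min())    # 11561
print(gdp_korea_array.max())    # 19403

# The multiplication returns a new array; the original stays as it was.
gdp_in_won = gdp_korea_array * 1200
print(gdp_in_won[0])            # 14708400
print(gdp_korea_array[0])       # 12257
```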

 

 

Matplotlib

1. Matplotlib overview

- A data visualization library built on top of Python and Numpy

[ Code ]

import numpy as np
import matplotlib.pyplot as plt

year_array = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020])

stock_array = np.array([
    14.46, 19.01, 20.04, 27.59, 26.32,
    28.96, 42.31, 39.33, 73.41, 132.69
])

plt.plot(year_array, stock_array)
plt.show()

 

[ Output ]

- plt.plot(x, y): line graph / plt.bar: bar chart / plt.scatter: scatter plot (shows how two variables relate)
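To compare the three graph types side by side, here is a rough sketch that draws the same stock data three ways. Using plt.subplots and the "Agg" backend (so the script also runs without a display) are my own choices for this note; in a notebook you can drop the backend line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

year_array = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020])
stock_array = np.array([
    14.46, 19.01, 20.04, 27.59, 26.32,
    28.96, 42.31, 39.33, 73.41, 132.69
])

# One row with three panels, one per graph type.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].plot(year_array, stock_array)      # line graph
axes[0].set_title('plot')
axes[1].bar(year_array, stock_array)       # bar chart
axes[1].set_title('bar')
axes[2].scatter(year_array, stock_array)   # scatter plot
axes[2].set_title('scatter')
fig.tight_layout()
```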

 

 

2. Simple ways to style a Matplotlib graph

- plt.scatter(height_array, weight_array, c='red or a HEX code', marker='+ or s'): c is the color, marker is the shape of each point
- plt.title('Height and Weight')
- plt.xlabel('Height (cm)')
- plt.ylabel('Weight (kg)')

- See the link below for more options

 

matplotlib.markers, Matplotlib 3.10.0 documentation (matplotlib.org)
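Putting the styling calls together might look like the sketch below. The height and weight values are made up, and the HEX color is an arbitrary pick; the "Agg" backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample data for illustration only.
height_array = np.array([165, 170, 158, 180, 175])
weight_array = np.array([58, 65, 50, 80, 72])

plt.figure()
plt.scatter(height_array, weight_array, c='#6500C3', marker='+')  # HEX color, plus-shaped points
plt.title('Height and Weight')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
ax = plt.gca()  # the current axes, handy for inspecting what was set
```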

 

 

3. Adjusting the graph size

(1) Sizing an individual graph

[ Code ]

plt.figure(figsize=(10, 4))  # width, height (in inches)
plt.plot(year_array, stock_array)
plt.title('GDP Growth')  # set the graph title
plt.xlabel('Year')
plt.ylabel('GDP')
plt.show()

 

[ Output ]

 

 

(2) Setting the size for all graphs

[ Code ]

plt.rcParams['figure.figsize'] = (5, 5)
plt.scatter(height_array, weight_array)
plt.title('Scatter Plot')
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.show()

 

 

4. Adding Korean text to a Matplotlib graph

- Korean text comes out as broken glyphs with the default font, so code that switches to a Korean font is needed

[ Code ]

plt.rc('font', family='AppleGothic')  # macOS font name; on Windows use 'Malgun Gothic'
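Since the font name depends on the operating system, one option is to pick it at runtime. This is only a sketch: korean_font_name is a hypothetical helper, the Linux fallback 'NanumGothic' assumes that font has been installed separately, and the unicode_minus setting is a commonly used companion so minus signs still render with these fonts.

```python
import platform

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt


def korean_font_name(system_name):
    """Return a commonly used Korean font family for an OS name (assumed mapping)."""
    fonts = {"Darwin": "AppleGothic", "Windows": "Malgun Gothic"}
    # Assumption: on other systems NanumGothic has been installed by the user.
    return fonts.get(system_name, "NanumGothic")


plt.rc("font", family=korean_font_name(platform.system()))
plt.rc("axes", unicode_minus=False)  # draw "-" with a plain hyphen glyph
```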

 

 

Pandas

1. Pandas overview

(1) Drawbacks of Numpy arrays: the pandas library addresses all of them

  - Poor readability

  - No way to attach labels to the data

  - Only one data type per array
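The "one data type" limitation is easy to see in a quick sketch. The values below are borrowed from the clothing example used in these notes; the variable names are made up.

```python
import numpy as np
import pandas as pd

# Mixing text and numbers in one array turns everything into strings.
mixed_array = np.array(['skirt', 10, 30000])
print(mixed_array)          # ['skirt' '10' '30000']
print(mixed_array.dtype)    # a unicode string dtype

# A DataFrame keeps a separate type per labeled column.
item_df = pd.DataFrame({
    'category': ['skirt', 'sweater'],
    'quantity': [10, 15],
    'price': [30000, 60000],
})
print(item_df.dtypes)
```

Each DataFrame column can be read by name, and the numeric columns stay numeric.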

 

(2) Using Pandas

[ Code ]

import pandas as pd

df = pd.DataFrame({
    'category': ['skirt', 'sweater', 'coat', 'jeans'],
    'quantity': [10, 15, 6, 11],                                     
    'price': [30000, 60000, 95000, 35000]
})
df

 

[ Output ]

 

- You can extract the data of a specific column

[ Code ]

df['quantity']
[ Output ]

0	10
1	15
2	 6
3	11

 

- Operations such as mean, sum, and min are all available

[ Code ]

df['quantity'].mean()
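A short sketch of these operations on the same df, including column-by-column arithmetic. The 'revenue' column is a hypothetical addition for this note. The parentheses matter: without them you get the method object rather than the number.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['skirt', 'sweater', 'coat', 'jeans'],
    'quantity': [10, 15, 6, 11],
    'price': [30000, 60000, 95000, 35000]
})

print(df['quantity'].mean())   # 10.5
print(df['quantity'].sum())    # 42
print(df['price'].min())       # 30000
print(df['price'].max())       # 95000

# Columns multiply element by element, like Numpy arrays.
df['revenue'] = df['quantity'] * df['price']   # hypothetical extra column
print(df['revenue'].sum())     # 2155000
```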

 

 

2. Loading external data with Pandas

[ Code ]

import pandas as pd

burger_df2 = pd.read_csv("data/burger2.csv", header=None,
                        names=["product_name", "calories", "carb", "protein", "fat", "sodium", "category"],
                        index_col="product_name")
burger_df2

- csv: comma-separated values, meaning the values are separated by commas
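To try the same read_csv options without the data file, the sketch below reads from an in-memory string instead. The two rows are made-up stand-ins, not the contents of data/burger2.csv.

```python
import io

import pandas as pd

# Made-up rows with no header line, standing in for a real CSV file.
csv_text = """Whopper,660,49,28,40,0.98,Burger
Cheeseburger,300,27,15,14,0.68,Burger
"""

sample_df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,                 # the file has no header row
    names=["product_name", "calories", "carb", "protein", "fat", "sodium", "category"],
    index_col="product_name",    # use the product name as the row label
)

print(sample_df)
print(sample_df.loc["Whopper", "calories"])   # 660
```

With header=None the first line is treated as data, and names supplies the column labels.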

 

[ Output ]

 

 

3. Extracting part of a DataFrame (iloc, loc)

(1) iloc: integer location, fetches data by integer position

 

(2) loc: location, fetches data by label (row and column names)

- An iloc slice leaves out the end position, whereas a loc slice includes the end label
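The difference in how the end of a slice is handled shows up clearly in a small sketch. menu_df and its values are made up for illustration.

```python
import pandas as pd

# Made-up menu data with product names as row labels.
menu_df = pd.DataFrame(
    {"calories": [660, 300, 900], "protein": [28, 15, 48]},
    index=["Whopper", "Cheeseburger", "Double Whopper"],
)

# iloc: positions 0 up to (but not including) 1 -> one row
by_position = menu_df.iloc[0:1]
print(by_position.index.tolist())   # ['Whopper']

# loc: labels from 'Whopper' through 'Cheeseburger' -> two rows
by_label = menu_df.loc["Whopper":"Cheeseburger"]
print(by_label.index.tolist())      # ['Whopper', 'Cheeseburger']

# Both can read a single cell: (row, column)
print(menu_df.iloc[1, 0])                        # 300
print(menu_df.loc["Cheeseburger", "calories"])   # 300
```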

 

 

4. DataFrame and boolean indexing

[ Code ]

burger_df.loc[burger_df['calories'] < 500, 'protein']  # when you want a single column
burger_df.loc[burger_df['calories'] < 500, ['carb', 'protein']]  # when you want several columns

 

[ Output ]

 

- Alternatively, to show the matching rows as a table

[ Code ]

mask = burger_df2['calories'] < 500
burger_df2[mask]
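Conditions can be combined with & and | just like with Numpy arrays. A minimal sketch on made-up data (menu_df and its values are hypothetical):

```python
import pandas as pd

# Made-up menu data for illustration.
menu_df = pd.DataFrame(
    {"calories": [660, 300, 260, 900], "protein": [28, 15, 13, 48]},
    index=["Whopper", "Cheeseburger", "Hamburger", "Double Whopper"],
)

# Each condition needs its own parentheses when combined with & or |.
light_and_filling = (menu_df["calories"] < 500) & (menu_df["protein"] >= 15)

print(menu_df[light_and_filling])                   # matching rows as a table
print(menu_df.loc[light_and_filling, "protein"])    # one column of those rows
```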

 

 

5. Modifying and adding data

# Modify a single cell
burger_df2.loc['Duble Stacker King', 'sodium'] = 1.9

# Modify an entire row
burger_df2.loc['Cheeseburger'] = [360, 24, 18, 21, 0.7, 'Burger']

# Modify an entire column
burger_df['sodium'] = [1.8, ... 1.3]

# Add a new row/column
burger_df.loc['Tripple Whopper'] = [1130, 49, 67, 75, 1.1, 'Burger']  # add a row
burger_df['brand'] = 'Burger King'  # add a column

# Add values based on a condition
burger_df.loc[burger_df['calories'] >= 500, 'high_calorie'] = True
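One thing to watch with the conditional assignment: rows that do not match the condition are left as NaN in the new column. The sketch below shows this on made-up data, along with an alternative that fills every row with True or False. The column name is_high_calorie is hypothetical.

```python
import pandas as pd

# Made-up menu data for illustration.
menu_df = pd.DataFrame(
    {"calories": [660, 300, 900]},
    index=["Whopper", "Cheeseburger", "Double Whopper"],
)

# Only matching rows get a value; the rest become NaN.
menu_df.loc[menu_df["calories"] >= 500, "high_calorie"] = True
print(menu_df["high_calorie"].isna().tolist())   # [False, True, False]

# Assigning the comparison itself fills every row with True or False.
menu_df["is_high_calorie"] = menu_df["calories"] >= 500
print(menu_df["is_high_calorie"].tolist())       # [True, False, True]
```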

 

 

6. Making graphs in Pandas

[ Code ]

sales_df.plot()
plt.show()

# You can choose which values go on the x-axis and y-axis
sales_df.plot(x='quarter', y='revenue')
plt.show()

# Choose the graph type
sales_df.plot(x='quarter', y='revenue', kind='bar', labels =['1Q', '2Q', '3Q', '4Q'])
plt.show()
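Since sales_df is not defined in these notes, here is a self-contained sketch with made-up quarterly numbers. DataFrame.plot returns a Matplotlib axes object, so the usual title and label setters can be applied to it. The "Agg" backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up quarterly revenue for illustration.
sales_df = pd.DataFrame({
    'quarter': ['1Q', '2Q', '3Q', '4Q'],
    'revenue': [120, 150, 90, 200],
})

# The values of the x column are used to label the bars.
ax = sales_df.plot(x='quarter', y='revenue', kind='bar')
ax.set_title('Revenue by Quarter')
ax.set_ylabel('Revenue')
```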

 

 
