
Numpy
1. What is Numpy?
- Numerical Python: a Python library optimized for numerical computation
- Numpy array: a data type similar to a Python list, but it handles large amounts of data with concise code
2. Numpy Arrays
[ Code ]
import numpy as np
print(np.zeros(5))
print(np.arange(10))
print(np.arange(2,10)) # start, stop (the stop value is excluded)
print(np.arange(4, 17, 3)) # start, stop, step
[ Output ]
[0. 0. 0. 0. 0.]
[0 1 2 3 4 5 6 7 8 9]
[2 3 4 5 6 7 8 9]
[ 4 7 10 13 16]
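A minimal sketch of two details the example above does not show: np.zeros returns floating-point values unless you pass dtype, and np.arange never includes its stop value. The variable names here are illustrative only.

```python
import numpy as np

# np.zeros returns float64 values by default; dtype=int gives integers
zeros_float = np.zeros(5)
zeros_int = np.zeros(5, dtype=int)

# np.arange(start, stop, step) stops before reaching stop
evens = np.arange(0, 10, 2)

print(zeros_float.dtype)  # float64
print(zeros_int)          # [0 0 0 0 0]
print(evens)              # [0 2 4 6 8]
print(10 in evens)        # False: 10 is the stop value, so it is excluded
```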
3. Indexing and Slicing: 1D Arrays
[ Code ]
import numpy as np
gdp_array = np.array([6610, 7637, 8127, 8885, 10385, 12565, 13403, 12398, 8282, 10672])
# Indexing
print(gdp_array[0])
print(gdp_array[[1, 3, 4]]) # a list of indices selects several elements at once
# Slicing
print(gdp_array[2:6]) # takes the values at indices 2 through 5 as a new numpy array (a view on the same data)
[ Output ]
6610
[ 7637 8885 10385]
[ 8127 8885 10385 12565]
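A short sketch of a few more 1D indexing behaviors, reusing the same gdp_array: negative indices, a slice step, and the fact that a slice is a view, so writing into it changes the original unless you call .copy().

```python
import numpy as np

gdp_array = np.array([6610, 7637, 8127, 8885, 10385, 12565, 13403, 12398, 8282, 10672])

print(gdp_array[-1])   # 10672: negative indices count from the end
print(gdp_array[::2])  # every second value: 6610, 8127, 10385, 13403, 8282

# A slice is a view: writing into it also changes the original array
part = gdp_array[2:6]
part[0] = 0
print(gdp_array[2])    # 0

# .copy() gives an independent array, so the original stays unchanged
safe = gdp_array[2:6].copy()
safe[1] = -1
print(gdp_array[3])    # 8885
```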
4. Indexing and Slicing: 2D Arrays
[ Code ]
import numpy as np
gdp_array = np.array([
[12257, 11561, 13165, 14673, 16496, 19403], # South Korea
[39169, 34406, 32821, 35387, 38299, 37813], # Japan
[959, 1053, 1149, 1289, 1509, 1753], # China
[36335, 37133, 38023, 39496, 41713, 44115] # USA
])
# Indexing
print(gdp_array[1]) # Japan's data
print(gdp_array[1][3]) # Japan's GDP in year 4; same as gdp_array[1, 3]
# Slicing
print(gdp_array[1:3, 2:5]) # rows Japan through China, columns at indices 2 to 4
[ Output ]
[39169 34406 32821 35387 38299 37813]
35387
[[32821 35387 38299]
[ 1149 1289 1509]]
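A sketch of two more 2D selections on the same table: a bare ":" keeps every row, so gdp_array[:, 0] reads one column (the first year for every country), and .shape reports rows by columns.

```python
import numpy as np

gdp_array = np.array([
    [12257, 11561, 13165, 14673, 16496, 19403],  # South Korea
    [39169, 34406, 32821, 35387, 38299, 37813],  # Japan
    [959, 1053, 1149, 1289, 1509, 1753],         # China
    [36335, 37133, 38023, 39496, 41713, 44115],  # USA
])

print(gdp_array.shape)           # (4, 6): 4 countries x 6 years
first_year = gdp_array[:, 0]     # ':' keeps every row -> first year of each country
korea_latest = gdp_array[0, -1]  # row 0, last column
print(first_year)                # [12257 39169   959 36335]
print(korea_latest)              # 19403
```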
5. Boolean Indexing
[ Code ]
import numpy as np
gdp_array = np.array([6618, 8127, 8885, 12665, 13483, 12398, 8282, 10672])
gdp_array > 10000
[ Output ]
array([False, False, False, True, True, True, False, True])
- mask: only the values above 10,000 become True, so indexing with the mask returns just those values
[ Code (continued) ]
mask = gdp_array > 10000
gdp_array[mask]
[ Output ]
array([12665, 13483, 12398, 10672])
- AND and OR operations (wrap each condition in parentheses)
[ Code (continued) ]
# AND operator: &
gdp_array[(gdp_array < 10000) & (gdp_array > 8000)]
[ Output ]
array([8127, 8885, 8282])
[ Code (continued) ]
# OR operator: |
gdp_array[(gdp_array > 10000) | (gdp_array < 8000)]
[ Output ]
array([ 6618, 12665, 13483, 12398, 10672])
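A sketch of a few more mask operations on the same data: counting matches with sum() (True counts as 1), inverting a mask with ~, and why Python's `and` cannot replace `&` (it needs a single True/False, which a whole array cannot give).

```python
import numpy as np

gdp_array = np.array([6618, 8127, 8885, 12665, 13483, 12398, 8282, 10672])
mask = gdp_array > 10000

# True counts as 1, so sum() gives the number of matching values
print(mask.sum())  # 4

# ~ inverts a mask: values that are NOT above 10,000
print(gdp_array[~mask])  # [6618 8127 8885 8282]

# Parentheses matter: & binds more tightly than > and <
between = gdp_array[(gdp_array > 8000) & (gdp_array < 10000)]

# Python's `and` cannot combine arrays element by element
try:
    gdp_array[(gdp_array > 8000) and (gdp_array < 10000)]
except ValueError as err:
    print("ValueError:", err)
```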
6. Basic Numpy Operations
- gdp_korea_array.mean() : mean
- gdp_korea_array.sum() : sum
- gdp_korea_array.min() : minimum
- gdp_korea_array.max() : maximum
- gdp_korea_array * 1200 : multiplies every value by 1200 (the result is a new array and the existing variable is not changed -> to keep it, reassign with gdp_korea_array = gdp_korea_array * 1200)
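gdp_korea_array is not defined in these notes, so this sketch assumes it holds South Korea's row from the 2D example in section 4. It runs each operation from the list and shows that multiplying creates a new array until you reassign.

```python
import numpy as np

# Assumption: Korea's row from the 2D example stands in for gdp_korea_array
gdp_korea_array = np.array([12257, 11561, 13165, 14673, 16496, 19403])

print(gdp_korea_array.mean())  # 14592.5
print(gdp_korea_array.sum())   # 87555
print(gdp_korea_array.min())   # 11561
print(gdp_korea_array.max())   # 19403

scaled = gdp_korea_array * 1200  # new array; gdp_korea_array itself is unchanged
print(gdp_korea_array[0])        # 12257

gdp_korea_array = gdp_korea_array * 1200  # reassign to keep the result
print(gdp_korea_array[0])        # 14708400
```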
Matplotlib
1. Matplotlib Overview
- A data visualization library built on Python and Numpy
[ Code ]
import numpy as np
import matplotlib.pyplot as plt
year_array = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020])
stock_array = np.array([
14.46, 19.01, 20.04, 27.59, 26.32,
28.96, 42.31, 39.33, 73.41, 132.69
])
plt.plot(year_array, stock_array)
plt.show()
[ Output ]
(image not included: line graph of stock_array by year)
- plt.plot(x, y): line graph / plt.bar: bar graph / plt.scatter: scatter plot (shows the relationship between two variables)
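The notes name plt.bar and plt.scatter without showing them. A minimal sketch that draws the same year/stock data both ways, each in its own figure:

```python
import numpy as np
import matplotlib.pyplot as plt

year_array = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020])
stock_array = np.array([
    14.46, 19.01, 20.04, 27.59, 26.32,
    28.96, 42.31, 39.33, 73.41, 132.69
])

# Bar graph: one bar per year
plt.figure()
bars = plt.bar(year_array, stock_array)
plt.show()

# Scatter plot: one point per (year, price) pair
plt.figure()
points = plt.scatter(year_array, stock_array)
plt.show()
```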
2. Simple Styling for Matplotlib Graphs
- plt.scatter(height_array, weight_array, c='red', marker='+') : c sets the color (a name such as 'red' or a HEX code such as '#FF0000'), marker sets the point shape (e.g. '+' or 's')
- plt.title('Height and Weight')
- plt.xlabel('Height (cm)')
- plt.ylabel('Weight (kg)')
- For more marker options, see the documentation below
matplotlib.markers (Matplotlib 3.10.0 documentation, matplotlib.org)
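height_array and weight_array are never defined in these notes, so this sketch uses made-up sample values purely for illustration. It combines the styling options listed above in one scatter plot.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data, made up for illustration only
height_array = np.array([165, 170, 175, 180, 185])
weight_array = np.array([55, 62, 68, 74, 80])

plt.scatter(height_array, weight_array, c='#FF0000', marker='s')  # HEX color, square markers
plt.title('Height and Weight')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
ax = plt.gca()  # keep a handle on the current Axes
plt.show()
```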
3. Adjusting Graph Size
(1) Size of an individual graph
[ Code ]
plt.figure(figsize=(10, 4)) # width, height in inches
plt.plot(year_array, stock_array)
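plt.title('GDP Growth') # set the graph title
plt.xlabel('Year')
plt.ylabel('GDP')
plt.show()
[ Output ]
(image not included: line graph drawn at a 10 x 4 figure size)
(2) Default size for all graphs
[ Code ]
plt.rcParams['figure.figsize'] = (5, 5) # default size for every figure created afterwards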
plt.scatter(height_array, weight_array)
plt.title('Scatter Plot')
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.show()
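A sketch comparing the two approaches by reading back each figure's size with get_size_inches(). It also shows plt.rc_context, which applies a setting only inside a with-block; that function is an addition not used in the original notes.

```python
import matplotlib.pyplot as plt

# figsize on one figure affects only that figure (units are inches)
fig = plt.figure(figsize=(10, 4))
print(fig.get_size_inches())  # [10.  4.]

# rcParams changes the default for figures created afterwards
plt.rcParams['figure.figsize'] = (5, 5)
fig_default = plt.figure()
print(fig_default.get_size_inches())  # [5. 5.]

# rc_context applies a setting only inside the with-block
with plt.rc_context({'figure.figsize': (8, 3)}):
    fig_temp = plt.figure()
print(fig_temp.get_size_inches())  # [8. 3.]
```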
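4. Adding Korean Text to Matplotlib Graphs
- By default, Korean characters render as broken boxes -> you need code that switches to a Korean font
[ Code ]
plt.rc('font', family='AppleGothic') # macOS font name (no space); on Windows, 'Malgun Gothic' is a common choice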
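Which Korean font exists depends on the operating system. A minimal sketch, assuming at least one of these common font names is installed, that picks the first one matplotlib has registered. The candidate list is an assumption, not an exhaustive set.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Font names that commonly provide Korean glyphs (availability depends on the OS)
candidates = ['AppleGothic', 'Malgun Gothic', 'NanumGothic']

# Names of every font matplotlib has registered on this machine
installed = {font.name for font in font_manager.fontManager.ttflist}
korean_font = next((name for name in candidates if name in installed), None)

if korean_font is not None:
    plt.rc('font', family=korean_font)
else:
    print('No Korean font found; install one such as NanumGothic')

# Korean fonts often lack the Unicode minus sign, so draw negatives with a plain hyphen
plt.rc('axes', unicode_minus=False)
```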
Pandas
1. Pandas Overview
(1) Drawbacks of Numpy arrays (the pandas library addresses all of them)
- Poor readability
- No way to attach labels to the data
- Only one data type per array
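A sketch of the "one data type" drawback: when text and numbers are mixed in one Numpy array, every value is converted to a string. A DataFrame instead keeps a separate data type for each labeled column.

```python
import numpy as np
import pandas as pd

# A Numpy array holds one data type: mixing text and numbers turns everything into strings
mixed = np.array(['skirt', 10, 30000])
print(mixed.dtype.kind)     # U (Unicode string)
print(mixed[1] + mixed[1])  # 1010: string concatenation, not 20

# A DataFrame keeps a separate data type per column and labels every column
df = pd.DataFrame({'category': ['skirt'], 'quantity': [10], 'price': [30000]})
print(df.dtypes)
```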
(2) Using Pandas
[ Code ]
import pandas as pd
df = pd.DataFrame({
'category': ['skirt', 'sweater', 'coat', 'jeans'],
'quantity': [10, 15, 6, 11],
'price': [30000, 60000, 95000, 35000]
})
df
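[ Output ]
   category  quantity  price
0     skirt        10  30000
1   sweater        15  60000
2      coat         6  95000
3     jeans        11  35000
- You can extract the data of a single column
[ Code ]
df['quantity']
[ Output ]
0    10
1    15
2     6
3    11
Name: quantity, dtype: int64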
- Operations such as mean, sum, and min work here too
[ Code ]
df['quantity'].mean() # the parentheses are required; without them you get the method itself, not the value
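A sketch running the aggregate operations on the clothing DataFrame from above. The revenue column product is an extra illustration showing that column arithmetic works element by element, as it does with Numpy arrays.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['skirt', 'sweater', 'coat', 'jeans'],
    'quantity': [10, 15, 6, 11],
    'price': [30000, 60000, 95000, 35000]
})

print(df['quantity'].mean())  # 10.5
print(df['quantity'].sum())   # 42
print(df['quantity'].min())   # 6
print(df['price'].max())      # 95000

# Column arithmetic works element by element
revenue = df['quantity'] * df['price']
print(revenue.sum())  # 2155000
```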
2. Loading External Data with Pandas
[ Code ]
import pandas as pd
burger_df2 = pd.read_csv("data/burger2.csv", header=None,
names=["product_name", "calories", "carb", "protein", "fat", "sodium", "category"],
index_col="product_name")
burger_df2
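- csv: comma-separated values, a text format in which values are separated by commas
- header=None: the file has no header row / names: the column names to use / index_col: the column to use as the row index
[ Output ]
(table image not included)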
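data/burger2.csv is not included with these notes, so this sketch reads two hypothetical, made-up rows from an in-memory string (io.StringIO) using the same read_csv arguments. It confirms that product_name becomes the row index and the remaining names become columns.

```python
import io
import pandas as pd

# Hypothetical stand-in for data/burger2.csv: no header row, made-up values
csv_text = io.StringIO(
    "Burger A,500,40,25,20,1.0,Burger\n"
    "Burger B,300,30,15,10,0.8,Burger\n"
)

burger_df2 = pd.read_csv(csv_text, header=None,
                         names=["product_name", "calories", "carb", "protein", "fat", "sodium", "category"],
                         index_col="product_name")

print(burger_df2.index.name)                   # product_name
print(list(burger_df2.columns))                # ['calories', 'carb', 'protein', 'fat', 'sodium', 'category']
print(burger_df2.loc['Burger B', 'calories'])  # 300
```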
3. Extracting Part of a DataFrame (iloc, loc)
(1) iloc: integer location, selects data by integer position
(2) loc: location, selects data by index and column labels
- When slicing, iloc excludes the end position, whereas loc includes the end label
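The burger data is not available here, so this sketch uses the clothing DataFrame from section 1 with category set as the row index. It shows the end-exclusive iloc slice next to the end-inclusive loc slice, and the same cell read both ways.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['skirt', 'sweater', 'coat', 'jeans'],
    'quantity': [10, 15, 6, 11],
    'price': [30000, 60000, 95000, 35000]
}).set_index('category')  # use the category names as row labels

by_position = df.iloc[0:2]         # positions 0 and 1; end position 2 excluded
by_label = df.loc['skirt':'coat']  # labels skirt through coat; end label included

print(list(by_position.index))  # ['skirt', 'sweater']
print(list(by_label.index))     # ['skirt', 'sweater', 'coat']

print(df.iloc[2, 1])            # 95000: row position 2, column position 1 (price)
print(df.loc['coat', 'price'])  # 95000: the same cell by label
```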
4. DataFrames and Boolean Indexing
[ Code ]
burger_df.loc[burger_df['calories'] < 500, 'protein'] # to view a single column
burger_df.loc[burger_df['calories'] < 500, ['carb', 'protein']] # to view multiple columns
[ Output ]
(output image not included)
- Alternatively, to show the matching rows as a full table:
[ Code ]
mask = burger_df2['calories'] < 500
burger_df2[mask]
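A sketch of combining two conditions on a DataFrame, using the clothing data from section 1 since the burger data is not available here. As with Numpy, each condition goes in parentheses and they are joined with & (or |).

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['skirt', 'sweater', 'coat', 'jeans'],
    'quantity': [10, 15, 6, 11],
    'price': [30000, 60000, 95000, 35000]
})

# Each condition is a boolean Series; combine them with & or |, in parentheses
mask = (df['price'] < 50000) & (df['quantity'] >= 10)

print(df.loc[mask, 'category'].tolist())  # ['skirt', 'jeans']
print(df[mask])  # full rows that match both conditions
```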
5. Modifying and Adding Data
# modify a single cell
burger_df2.loc['Duble Stacker King', 'sodium'] = 1.9
# modify an entire row
burger_df2.loc['Cheeseburger'] = [360, 24, 18, 21, 0.7, 'Burger'] # one value per column
# modify an entire column
burger_df['sodium'] = [1.8, ... 1.3] # the list needs one value per row (elided in the original)
# add a new row/column
burger_df.loc['Triple Whopper'] = [1130, 49, 67, 75, 1.1, 'Burger'] # add a row (loc with a new label)
burger_df['brand'] = 'Burger King' # add a column (a single value is repeated for every row)
# set values based on a condition
burger_df.loc[burger_df['calories'] >= 500, 'high_calorie'] = True # rows that don't match are left as NaN
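Because the burger CSV is not available here, this sketch applies the same four kinds of change to the clothing data from section 1. The 'jacket' row and the 'store' column are hypothetical additions for illustration. Note that rows not matching the condition get NaN in the new column.

```python
import pandas as pd

# Clothing data from section 1, indexed by category
df = pd.DataFrame(
    {'quantity': [10, 15, 6, 11], 'price': [30000, 60000, 95000, 35000]},
    index=['skirt', 'sweater', 'coat', 'jeans'],
)

df.loc['coat', 'price'] = 90000  # modify one cell
df.loc['jacket'] = [5, 120000]   # new label -> new row (hypothetical item)
df['store'] = 'Main'             # new column; the single value fills every row (hypothetical)
df.loc[df['price'] >= 60000, 'expensive'] = True  # unmatched rows are left as NaN

print(df)
```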
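6. Making Graphs in Pandas
[ Code ]
import matplotlib.pyplot as plt # needed for plt.show()
sales_df.plot()
plt.show()
# you can choose which columns go on the x-axis and y-axis
sales_df.plot(x='quarter', y='revenue')
plt.show()
# choose the graph type with kind
ax = sales_df.plot(x='quarter', y='revenue', kind='bar')
ax.set_xticklabels(['1Q', '2Q', '3Q', '4Q']) # labels= is not a DataFrame.plot option, so set the tick labels on the returned Axes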
plt.show()
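sales_df is never defined in these notes, so this sketch builds a hypothetical one with made-up revenue figures. It shows that DataFrame.plot returns a matplotlib Axes, so the pyplot-style labeling from the Matplotlib section still applies; legend and rot are DataFrame.plot options.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sales data, made up for illustration
sales_df = pd.DataFrame({
    'quarter': ['Q1', 'Q2', 'Q3', 'Q4'],
    'revenue': [100, 120, 90, 150],
})

# DataFrame.plot returns a matplotlib Axes
ax = sales_df.plot(x='quarter', y='revenue', kind='bar', legend=False, rot=0)
ax.set_ylabel('Revenue')
plt.show()
```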
