[AI Dev Course TIL] 0831 Web Scraping Basics (4): Seaborn, WordCloud

비쵸비쵸비 2023. 9. 8. 11:44

These are my notes on the week 2 lectures of the Programmers AI Dev Course.

Seaborn

A data visualization library
Built on top of matplotlib

# import seaborn
import seaborn as sns

sns.lineplot(x=x, y=y): line plot
sns.barplot(x=x, y=y): bar plot
- for categorical data
- recent seaborn versions expect x and y as keyword arguments

Tweaking a graph by changing matplotlib properties

# import matplotlib
import matplotlib.pyplot as plt

plt.title(): add a title
plt.xlabel(), plt.ylabel(): add axis labels
plt.xlim(), plt.ylim(): set the axis range
plt.figure(figsize = (x, y)): set the figure size
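
To see these four functions working together, here is a small sketch on made-up values. The data, labels, and the `Agg` backend line are all assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

# create the figure first, so the size applies to the plot drawn afterwards
fig = plt.figure(figsize=(6, 4))
plt.plot([0, 1, 2, 3], [10, 12, 9, 14])  # made-up values

plt.title("Sample values")
plt.xlabel("index")
plt.ylabel("value")
plt.xlim(0, 3)
plt.ylim(8, 15)

ax = plt.gca()  # the axes that the calls above modified
```

Note that plt.figure() goes before the plotting call: calling it afterwards opens a new, empty figure instead of resizing the existing one.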

Exercise 1: lineplot

Fetch weather data and draw a lineplot

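# load / preprocess the data
# (driver is assumed to be the Selenium WebDriver from the earlier Selenium lesson,
#  already pointed at the weather page, with By imported from selenium)
temps = driver.find_element(By.ID, "my-tchart").text
temps = [int(i) for i in temps.replace("℃","").split("\n")]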

# plot
plt.ylim(min(temps) - 2, max(temps) + 2)
plt.title("Expected Temperature from now on")

sns.lineplot(
    x = list(range(len(temps))),
    y = temps
)

When the minimum and maximum of an axis vary with the data, xlim and ylim can be set from min and max.
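
The parsing and the data-dependent limits can be tried without a browser. This is a minimal sketch: `parse_temps`, `padded_limits`, and the sample text are hypothetical names and values introduced for illustration.

```python
def parse_temps(text):
    # one temperature per line, each followed by the ℃ sign
    return [int(line) for line in text.replace("℃", "").split("\n")]


def padded_limits(values, pad=2):
    # leave `pad` units of headroom below the minimum and above the maximum
    return min(values) - pad, max(values) + pad


sample_text = "23℃\n24℃\n26℃\n22℃"  # made-up stand-in for the scraped text
temps = parse_temps(sample_text)
y_low, y_high = padded_limits(temps)
# with matplotlib imported as plt, this would become: plt.ylim(y_low, y_high)
```

Because the limits are computed from the data, the line never touches the top or bottom edge of the plot, whatever the forecast looks like that day.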

Exercise 2: barplot

Visualize the frequency of Hashcode question tags with a barplot

Fetch the question tags from the HTML and build a frequency dictionary

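import time

import requests
from bs4 import BeautifulSoup

# user_agent is assumed to be a headers dict defined earlier, e.g. {"User-Agent": "..."}
frequency = {}
for i in range(1, 3):
    # pass the headers by keyword: the second positional argument of requests.get is params
    res = requests.get("https://hashcode.co.kr/?page={}".format(i), headers=user_agent)
    soup = BeautifulSoup(res.text, "html.parser")

    # 1. find every ul tag with the class question-tags
    # 2. extract the text of the li tags inside each one

    ul_tags = soup.find_all("ul", "question-tags")
    for ul in ul_tags:
        li_tags = ul.find_all("li")
        for li in li_tags:
            tag = li.text.strip()  # strip surrounding whitespace
            if tag not in frequency:
                frequency[tag] = 1
            else:
                frequency[tag] += 1
    time.sleep(0.5)  # pause between page requests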
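
The extraction logic can be checked offline before hitting the site. The sketch below runs the same find_all pattern on a made-up HTML string; the markup is an assumption that only mimics the ul/li structure described above, not the real page.

```python
from bs4 import BeautifulSoup

# made-up HTML standing in for a downloaded page
html = """
<ul class="question-tags"><li> python </li><li>java</li></ul>
<ul class="question-tags"><li>python</li></ul>
<ul class="other"><li>ignored</li></ul>
"""

soup = BeautifulSoup(html, "html.parser")
frequency = {}
# a string as the second argument of find_all filters by CSS class
for ul in soup.find_all("ul", "question-tags"):
    for li in ul.find_all("li"):
        tag = li.text.strip()
        # dict.get with a default replaces the if/else branch
        frequency[tag] = frequency.get(tag, 0) + 1
```

The ul with class "other" is skipped, and " python " and "python" are counted as the same tag thanks to strip().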

Use Counter to extract the most frequent tags

from collections import Counter
counter = Counter(frequency)
counter.most_common(10)
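
For reference, most_common(n) returns a list of (key, count) tuples sorted by count in descending order. A small sketch with a made-up frequency dictionary shows the shape of the result and one way to split it into the two lists a bar plot needs:

```python
from collections import Counter

# made-up counts standing in for the scraped tag frequencies
frequency = {"java": 3, "python": 5, "html": 1, "c": 2}

counter = Counter(frequency)
top = counter.most_common(3)  # the three most frequent (tag, count) pairs

# unzip the pairs into separate x and y sequences
tags, counts = zip(*top)
```

Storing the result in `top` means most_common only has to be called once, even when both the labels and the counts are needed.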

plotting

plt.figure(figsize = (10,10))
sns.barplot(
 x = [elem[0] for elem in counter.most_common(10)],
 y = [elem[1] for elem in counter.most_common(10)]
)
plt.xlabel("tag")
plt.ylabel("frequency")
plt.title("Frequency of Questions in Hashcode")
plt.show()

Wordcloud

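A Python library for drawing word clouds
word cloud: a visualization of frequently occurring text, weighted by importance, popularity, and so on
konlpy: a Korean morphological analysis library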

Making a word cloud

Steps
1. Preprocess the Korean sentences with the konlpy library
2. Count keyword frequencies with Counter
3. Visualize with WordCloud

# libraries used for visualization
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# builds a dictionary of counts
from collections import Counter

# morphological analyzer used to extract nouns from sentences
from konlpy.tag import Hannanum

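# Create a Hannanum object, then extract the nouns with .nouns().
# (national_anthem is assumed to be a string holding the text to analyze, defined earlier)
hannanum = Hannanum()
nouns = hannanum.nouns(national_anthem)
words = [noun for noun in nouns if len(noun) > 1]  # keep nouns longer than one character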

counter = Counter(words) # count frequencies

# draw the wordcloud
wordcloud = WordCloud(
    font_path="NanumGothic.ttf",
    background_color = "white",
    width = 1000,
    height = 1000
)
img = wordcloud.generate_from_frequencies(counter)
plt.imshow(img)
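
The filtering and counting step does not depend on konlpy itself, so it can be tried without a Java setup. In this minimal sketch, `keyword_frequencies` is a hypothetical helper and the noun list is a made-up stand-in for what hannanum.nouns() might return.

```python
from collections import Counter


def keyword_frequencies(nouns, min_length=2):
    # drop single-character nouns, then count what is left
    words = [noun for noun in nouns if len(noun) >= min_length]
    return Counter(words)


# made-up noun list standing in for the output of hannanum.nouns(...)
nouns = ["동해", "물", "백두산", "물", "동해", "하느님"]
counter = keyword_frequencies(nouns)
# counter can then be passed to wordcloud.generate_from_frequencies(counter)
```

Single-character nouns such as "물" are dropped, which matches the len(noun) > 1 filter in the code above.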

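konlpy needs Java to run. I googled around and tried installing Java several ways, but the errors kept coming, so I gave up and just used Colab ^_ㅠ (on a MacBook ^^). Save yourself the trouble and use Colab.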

