awesome-korean-nlp

A curated list of resources dedicated to Natural Language Processing for Korean

A collection of resources useful for natural language processing of Hangul and the Korean language.

Maintainers - Insik Kim

Thanks to Keon Kim and Martin Park for creating awesome-nlp, the original list that this one is based on.

Please feel free to open a pull request, or email Insik Kim (insik92@gmail.com), to add links.

Tutorials and Courses

TensorFlow Tutorial on Seq2Seq Models
Natural Language Understanding with Distributed Representation, Lecture Notes by Cho

Videos

Dataset

Corpus

Sejong Corpus (세종말뭉치)
Wikipedia Korean
Wiki Source Korean

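Because these are published works, the data is useful for extracting grammatical information.

Namuwiki: Database Dump, Namuwiki DB Dump Download Mirror Site

The 7z file provided by the mirror is a compressed archive of about 1.2 GB.

For a script that converts the raw wiki data to plain text, see namu_wiki_db_preprocess.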
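As a rough illustration of what converting raw wiki data to plain text involves, here is a minimal sketch. It is not the namu_wiki_db_preprocess script; the function name `strip_wiki_markup` and the handful of markup patterns it handles (double-bracket links, quote-based emphasis, `==` headings) are assumptions chosen for the example, and a real dump will contain many more constructs.

```python
import re


def strip_wiki_markup(text: str) -> str:
    """Remove a few common wiki-style markup patterns (illustrative only)."""
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # '''bold''' and ''italic'' markers are dropped, the inner text is kept
    text = re.sub(r"'{2,3}", "", text)
    # heading lines such as "== Title ==" -> "Title"
    text = re.sub(r"^=+\s*(.*?)\s*=+\s*$", r"\1", text, flags=re.MULTILINE)
    # collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


# Hypothetical sample article body, written by hand for this example.
sample = (
    "== 개요 ==\n"
    "'''나무위키'''는 [[위키]]의 하나이다. [[대한민국|한국]]에서 운영된다."
)
plain = strip_wiki_markup(sample)
print(plain)
```

Regular expressions are enough for a first pass like this, but nested or table markup usually needs a proper parser.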

Deep Learning for NLP

Packages

Implementations

Pre-trained word embeddings for WSJ corpus by Koc AI-Lab
Word2vec by Mikolov
HLBL language model by Turian
Real-valued vector "embeddings" by Dhillon
Improving Word Representations Via Global Context And Multiple Word Prototypes by Huang
Dependency based word embeddings
Global Vectors for Word Representations

Libraries

Python - Python NLP Libraries
- KoNLPy - A Python package for Korean natural language processing.
C++ - C++ Libraries
- Mecab (Korean) - morphological analyzer
Scala - Scala Libraries
- twitter-korean-text - token extraction

https://github.com/twitter/twitter-korean-text

Services

Articles

Review Articles

Word Vectors

Resources about word vectors, aka word embeddings, and distributed representations for words.
Word vectors are numeric representations of words that are often used as input to deep learning systems. Learning these vectors in advance, before training the downstream model, is sometimes called pretraining.
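To make "numeric representation" concrete, the sketch below compares words by the cosine similarity of their vectors. The three-dimensional vectors in `toy_vectors` are invented for illustration and do not come from any trained model; real embeddings typically have hundreds of dimensions.

```python
import math

# Hand-made toy vectors (assumed values, not from a trained model).
toy_vectors = {
    "왕": [0.8, 0.6, 0.1],    # "king"
    "여왕": [0.7, 0.7, 0.1],  # "queen"
    "사과": [0.1, 0.2, 0.9],  # "apple"
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_related = cosine_similarity(toy_vectors["왕"], toy_vectors["여왕"])
sim_unrelated = cosine_similarity(toy_vectors["왕"], toy_vectors["사과"])
print(sim_related > sim_unrelated)
```

With these toy values the related pair scores higher than the unrelated pair, which is the property that makes such vectors useful as model input.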

Summary of the theory behind word2vec
word2vec tutorial

Thought Vectors

Thought vectors are numeric representations for sentences, paragraphs, and documents. The following papers are listed in order of publication date; each one replaces the last as the state of the art in sentiment analysis.
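One very simple way to get a single vector for a sentence is to average the vectors of its words. The sketch below shows only that baseline; it is not the method of any particular paper, and the helper name `average_vector` and the two-dimensional toy vectors are assumptions made for the example.

```python
def average_vector(tokens, vectors):
    """Average the vectors of the tokens that have an entry in `vectors`."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("none of the tokens has a vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy word vectors (assumed values, not from a trained model).
word_vectors = {
    "좋은": [1.0, 0.0],  # "good"
    "영화": [0.0, 1.0],  # "movie"
}

# The unknown token "정말" ("really") is skipped.
sentence_vector = average_vector(["정말", "좋은", "영화"], word_vectors)
print(sentence_vector)
```

Averaging ignores word order, which is one reason learned sentence encoders tend to do better on tasks such as sentiment analysis.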

Single Exchange Dialogs

Memory and Attention Models

General Natural Language Processing

Named Entity Recognition

Neural Network

Supplementary Materials

Projects

Poet Neural (시인 뉴럴). Multi-layer LSTM for character-level language models in Torch, implemented by Kim Tae Hoon.
Korean word2vec Demo, implemented by Daegeun Lee.

Blogs

Credits

Parts of this list are from

ai-reading-list
