IT/SQLD

MySQL character set 확인 및 변경

알콩달콩아빠 2023. 10. 26. 20:52
728x90
반응형

언제나 한글 깨짐 문제로 인한 캐릭터  확인  변경 작업이 많다.

일단 DB 저장된 한글이 클라이언트에서  보이는 경우는 클라이언트의 인코딩 값을 변경해서 조치를   있다.

그러나 저장 당시의 인코딩 값이 서로 다른 한글이 들어가 있을 경우는 어떤 한글은 정상적으로 보이지만, 어떤 한글은 깨져서 보이는 경우가 발생하게 된다.

MySQL 기본으로 서버 캐릭터 셋이 latin1으로 설정 되어 있어서 DB 생성  캐릭터  값을 지정해주지 않으면 latin1으로 생성이 된다.

 

Character set ?

사용하는 언어를 표현하기 위한 문자들의 집합을 의미. 한글을 표현하기 위하여 사용하는 캐릭터 셋에는 여러가지가 존재 한다.

Encoding ?

Character Set을 컴퓨터가 이해할 수 있는 바이트와 매핑해 주는 것.

 

0. 사용할  있는 캐릭터  종류 확인

 

mysql> show character set ;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| binary   | Binary pseudo charset           | binary              |      1 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
+----------+---------------------------------+---------------------+--------+


mysql> show character set where description like '%Korean%' ;
+---------+---------------+-------------------+--------+
| Charset | Description   | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| euckr   | EUC-KR Korean | euckr_korean_ci   |      2 |


+---------+---------------+-------------------+--------+

 

위에서 보면 Maxlen 칼럼이 보이는데 영어 이외의 언어를 저장하기 위해서는 2 이상의 값을 가진 언어 셋을 가져야만 한글이 DB 저장이 된다.

euckr 한글을 2바이트로 저장하고, utf8 한글을 3바이트로 저장한다. (공백이나 영문은 1바이트로 처리)

그래서 euckr 제한  한글만 표현이 가능하지만, 한글 1글자당 2바이트를 사용하므로, 제한된 한글만 사용하는 환경에서는 euckr 설정 하는 것이 유리

 

1. 간단하게 캐릭터 셋을 확인 하는 방법

 

mysql> status
--------------
mysql  Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using  EditLine wrapper


Connection id:                3836
Current database:
Current user:                root@localhost
SSL:                        Not in use
Current pager:                stdout
Using outfile:                ''
Using delimiter:        ;
Server version:                5.7.14-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)
Protocol version:        10
Connection:                Localhost via UNIX socket
Server characterset:        latin1
Db     characterset:        latin1
Client characterset:        utf8
Conn.  characterset:        utf8
UNIX socket:                /tmp/mysql.sock
Uptime:                        1 hour 11 min 12 sec


Threads: 3  Questions: 33930  Slow queries: 0  Opens: 120  Flush tables: 1  Open tables: 113  Queries per second avg: 7.942




Servere characterset : 데이터베이스 생성  기본 값을 지정해 주지 않을 경우,  값으로 데이터 베이스 캐릭터 셋이 설정 .
DB      characterset : 테이블 생성  기본 값을 지정해 주지 않을 경우,  값으로 테이블의 캐릭터 셋이 설정 .
Client  characterset : 클라이언트 인코딩 설정 .

 

2. 데이터 베이스의 디폴트 캐릭터  설정

 

생성  설정해 주는 방법
CREATE DATABASE `utf8db` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci ;
CREATE DATABASE `euekrdb` DEFAULT CHARACTER SET euckr COLLATE euckr_korean_ci ;


Alter 명령으로 변경 하는 방법
ALTER  DATABASE  db명 DEFAULT CHARACTER SET utf8 ;


데이터베이스 캐릭터  확인
SELECT schema_name , default_character_set_name
FROM information_schema.schemata ;


schema_name        |default_character_set_name |
-------------------|---------------------------|
information_schema |utf8                       |
euekrdb            |euckr                      |
mysql              |latin1                     |
performance_schema |utf8                       |
sys                |utf8                       |
test               |latin1                     |
utf8db             |utf8                       |


테이블의 캐릭터  설정  확인
CREATE TABLE 테이블명 (id int , name varchar(10)) DEFAULT CHARSET=utf8 ;


SELECT table_schema , table_name , table_collation
FROM information_schema.tables
WHERE table_schema = 'information_schema' AND table_name = 'PROCESSLIST'


table_schema       |table_name  |table_collation |
-------------------|------------|----------------|


information_schema |PROCESSLIST |utf8_general_ci | 

 

3. 세션레벨에서 인코딩  변경

 

임시적인 변경


set names 케릭터셋 ;


mysql> status
--------------
mysql  Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using  EditLine wrapper


Connection id:                3836
Current database:
Current user:                root@localhost
SSL:                        Not in use
Current pager:                stdout
Using outfile:                ''
Using delimiter:        ;
Server version:                5.7.14-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)
Protocol version:        10
Connection:                Localhost via UNIX socket
Server characterset:        latin1
Db     characterset:        latin1
Client characterset:        utf8
Conn.  characterset:        utf8
UNIX socket:                /tmp/mysql.sock
Uptime:                        1 hour 11 min 12 sec


Threads: 3  Questions: 33930  Slow queries: 0  Opens: 120  Flush tables: 1  Open tables: 113  Queries per second avg: 7.942
--------------


mysql> set names latin1 ;
Query OK, 0 rows affected (0.00 sec)


mysql> status
--------------
mysql  Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using  EditLine wrapper


Connection id:                3836
Current database:
Current user:                root@localhost
SSL:                        Not in use
Current pager:                stdout
Using outfile:                ''
Using delimiter:        ;
Server version:                5.7.14-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)
Protocol version:        10
Connection:                Localhost via UNIX socket
Server characterset:        latin1
Db     characterset:        latin1
Client characterset:        latin1
Conn.  characterset:        latin1
UNIX socket:                /tmp/mysql.sock
Uptime:                        1 hour 20 min 3 sec


Threads: 3  Questions: 38153  Slow queries: 0  Opens: 120  Flush tables: 1  Open tables: 113  Queries per second avg: 7.943
--------------


mysql> set names utf8 ;
Query OK, 0 rows affected (0.00 sec)


mysql> status
--------------
mysql  Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using  EditLine wrapper


Connection id:                3836
Current database:
Current user:                root@localhost
SSL:                        Not in use
Current pager:                stdout
Using outfile:                ''
Using delimiter:        ;
Server version:                5.7.14-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)
Protocol version:        10
Connection:                Localhost via UNIX socket
Server characterset:        latin1
Db     characterset:        latin1
Client characterset:        utf8
Conn.  characterset:        utf8
UNIX socket:                /tmp/mysql.sock
Uptime:                        1 hour 25 min 38 sec


Threads: 3  Questions: 40805  Slow queries: 0  Opens: 120  Flush tables: 1  Open tables: 113  Queries per second avg: 7.941
--------------

 

영구적인 변경


/etc/my.cnf 파일에 캐릭터 셋을 명시  MySQL  기동.


) utf8 변경 


[client]
default-character-set = utf8


[mysqld]
skip-character-set-client-handshake
default-character-set = utf8
character-set-server = utf8
collation-server = utf8_general_ci
init-connect = SET NAMES utf8


[mysqldump]
default-character-set = utf8


[mysql]
default-character-set = utf8

 

4. 캐릭터  테스트

 

# Server   characterSet : latin1
# Database characterset : euckr
# Client   인코딩 값은 변경해 가면서 테스트


# Client 인코딩값을 데이터베이스 캐릭터 셋과 동일한 값으로 변경
mysql> set names euckr ;
mysql> status ;


Current database:        euekrdb
Current user:                root@localhost
Server characterset:        latin1
Db     characterset:        euckr
Client characterset:        euckr
Conn.  characterset:        euckr


#테스트 테이블 생성
mysql> create table et1  (id int , name varchar(10)) ;
mysql> create table eet1 (id int , name varchar(10)) DEFAULT CHARSET=euckr  ;
mysql> create table eut1 (id int , name varchar(10)) DEFAULT CHARSET=utf8 ;
mysql> commit ;
mysql> select table_name , table_collation from information_schema.tables where table_name like 'e%1' ;
+------------+-----------------+
| table_name | table_collation |
+------------+-----------------+
| eet1       | euckr_korean_ci |
| et1        | euckr_korean_ci |
| eut1       | utf8_general_ci |
+------------+-----------------+
캐릭터 셋을 명시적으로 지정해 주지 않았던 et1 테이블은 데이터베이스 기본 캐릭터 셋으로 설정 .


#데이터 Insert
mysql> insert into et1 values (1 , '1111' ) ;
mysql> commit ;
mysql> insert into eet1 values (1 , '1111' ) ;
mysql> commit ;
mysql> insert into eut1 values (1 , '1111' ) ;
mysql> commit;


mysql> select * from et1 ;
+------+--------+
| id   | name   |
+------+--------+
|    1 | 1111 |
+------+--------+


mysql> select * from eet1 ;
+------+--------+
| id   | name   |
+------+--------+
|    1 | 1111 |
+------+--------+




mysql> select * from eut1 ;
+------+--------+
| id   | name   |
+------+--------+
|    1 | 1111 |
+------+--------+


# 클라이언트 인코딩 변경
mysql> set names utf8 ;
mysql> status ;


Current database:        euekrdb
Server characterset:        latin1
Db     characterset:        euckr
Client characterset:        utf8
Conn.  characterset:        utf8


mysql> select * from et1 ;
+------+---------+
| id   | name    |
+------+---------+
|    1 | 11??11  |
+------+---------+


mysql> select * from eet1 ;
+------+---------+
| id   | name    |
+------+---------+
|    1 | 11??11  |
+------+---------+


mysql> select * from eut1 ;
+------+---------+
| id   | name    |
+------+---------+
|    1 | 11??11  |
+------+---------+


# 새로운 데이터 Insert
mysql> insert into et1 values (11 , '1111' ) ;
ERROR 1366 (HY000): Incorrect string value: '\xC8\xAB11' for column 'name' at row 1


mysql> insert into eet1 values (11 , '1111' ) ;
ERROR 1366 (HY000): Incorrect string value: '\xC8\xAB11' for column 'name' at row 1


mysql> insert into eut1 values (11 , '1111' ) ;
mysql> commit;


mysql> select * from eut1 ;
+------+---------+
| id   | name    |
+------+---------+
|    1 | 11??11  |
|   11 | 1111  |
+------+---------+


# 다시 클라이언트 인코딩 변경
mysql> set names euckr ;
mysql> status ;


Current database:        euekrdb
Server characterset:        latin1
Db     characterset:        euckr
Client characterset:        euckr
Conn.  characterset:        euckr


mysql> select * from eut1 ;
+------+--------+
| id   | name   |
+------+--------+
|    1 | 1111 |
|   11 | 11?11  |
+------+--------+

 

# Server   characterSet : latin1
# Database characterset : utf8
# Client  인코딩 값은 변경해 가면서 테스트


# Client 인코딩값을 데이터베이스 캐릭터 셋과 동일한 값으로 변경
mysql> set names utf8;
mysql> status


Current database:        utf8db
Server characterset:        latin1
Db     characterset:        utf8
Client characterset:        utf8
Conn.  characterset:        utf8


#테스트 테이블 생성
mysql> create table ut1  (id int , name varchar(10)) ;
mysql> create table uet1 (id int , name varchar(10)) DEFAULT CHARSET=euckr  ;
mysql> create table uut1 (id int , name varchar(10)) DEFAULT CHARSET=utf8 ;
mysql> commit ;
mysql> select table_name , table_collation from information_schema.tables where table_name like 'u%1' ;
+------------+-----------------+
| table_name | table_collation |
+------------+-----------------+
| uet1       | euckr_korean_ci |
| ut1        | utf8_general_ci |
| uut1       | utf8_general_ci |
+------------+-----------------+


#데이터 Insert
mysql> insert into ut1 values (1 , '1111' ) ;
mysql> commit ;
mysql> insert into uet1 values (1 , '1111' ) ;
ERROR 1366 (HY000): Incorrect string value: '\xC8\xAB11' for column 'name' at row 1
mysql> insert into uut1 values (1 , '1111' ) ;
mysql> commit;


mysql> select * from ut1 ;
+------+--------+
| id   | name   |
+------+--------+
|    1 | 1111 |
+------+--------+


mysql> select * from uet1 ;
Empty set (0.00 sec)


mysql> select * from uut1 ;
+------+--------+
| id   | name   |
+------+--------+
|    1 | 1111 |
+------+--------+


# 클라이언트 인코딩 변경
mysql> set names euckr ;
mysql> status


Current database:        utf8db
Server characterset:        latin1
Db     characterset:        utf8
Client characterset:        euckr
Conn.  characterset:        euckr


# 새로운 데이터 Insert
mysql> insert into ut1 values (11 , '1111' ) ;
mysql> commit ;
mysql> insert into uet1 values (11 , '1111' ) ;
mysql> commit ;
mysql> insert into uut1 values (11 , '1111' ) ;
mysql> commit ;


mysql> select * from ut1 ;
+------+--------+
| id   | name   |
+------+--------+
|    1 | 11?11  |
|   11 | 1111 |
+------+--------+


mysql> select * from uet1 ;
+------+--------+
| id   | name   |
+------+--------+
|   11 | 1111 |
+------+--------+


mysql> select * from uut1 ;
+------+--------+
| id   | name   |
+------+--------+
|    1 | 11?11  |
|   11 | 1111 |
+------+--------+


# 다시 클라이언트 인코딩 변경
mysql> set names utf8 ;
mysql> status


Current database:        utf8db
Server characterset:        latin1
Db     characterset:        utf8
Client characterset:        utf8
Conn.  characterset:        utf8


mysql> select * from ut1 ;
+------+---------+
| id   | name    |
+------+---------+
|    1 | 1111  |
|   11 | 11??11  |
+------+---------+


mysql> select * from uet1 ;
+------+---------+
| id   | name    |
+------+---------+
|   11 | 11??11  |
+------+---------+


mysql> select * from uet1 ;
+------+---------+
| id   | name    |
+------+---------+
|   11 | 11??11  |
+------+---------+


mysql> select * from uut1 ;
+------+---------+
| id   | name    |
+------+---------+
|    1 | 1111  |
|   11 | 11??11  |
+------+---------+

 

utf8상태는 한글을 3바이트로 저장을 한다. 그래서 utf8 테이블에 euckr 한글을 저장하게 되면 2바이트로 저장하게 되므로, 저장이 된다.

그러나 euckr 상태인 테이블에 utf8 데이터를 저장하려고 하면, 2바이트로 한글을 저장해야 하는데 3바이트로 저장을 시도하게 되므로 에러가 발생 한다.

 

테이블의 캐릭터 셋과 상관없이 한글을 저장할 당시의 인코딩 값과 일치하는 한글만 제대로 표현이 된다.

그래서 테이블 별로 캐릭터 셋을 지정해   있지만, 데이터 베이스 레벨에서 캐릭터 셋을 통일하여서 한글을 사용하면, 한글 깨짐 문제를 미리 방지할  있다.

 

출처 : MySQL character set 확인 및 변경 :: 멋지게 놀아라 (tistory.com)

728x90
반응형