Web server error: bind() to [::]:443 failed, and how to deal with it

2023. 5. 24.

by mason.jeong

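There are exactly two places at our company that use Let's Encrypt certificates: the company NAS and a Harbor container. Harbor is a registry we set up separately on the internal network for customers with strict security requirements. Because it sits on the internal network, we added a separate connection so it can be reached from outside. The proxy server's certificate kept us from using our in-house certificate, so we made do with Let's Encrypt. We need SSL because authentication requests to the registry from outside have to go over SSL.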

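A Let's Encrypt certificate is valid for only 90 days, but it can be renewed any number of times. You just have to renew it on time, and even that is handled by a program called certbot. So I put it in crontab and figured that was good enough.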

0 0 1 * * /usr/bin/certbot renew --renew-hook="/home/harbor/install.sh"
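The renewal failure described below went unnoticed until image pulls started failing. As a hedged aside, a small check like the following can show how many days a certificate has left. The function name cert_days_left is hypothetical, and the sketch assumes the openssl CLI and GNU date (for date -d) are installed.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original setup): print the number of whole
# days left before a PEM certificate expires. Assumes the openssl CLI and GNU date.
cert_days_left() {
    local cert=$1 end
    # openssl prints a line like "notAfter=Aug 22 05:00:33 2023 GMT"
    end=$(openssl x509 -enddate -noout -in "$cert") || return 1
    end=${end#notAfter=}
    # Seconds until expiry, truncated to whole days (zero or negative once expired)
    echo $(( ( $(date -d "$end" +%s) - $(date +%s) ) / 86400 ))
}
```

It would be called with the live certificate path, for example /etc/letsencrypt/live/<domain>/fullchain.pem (the domain is redacted in this post). openssl x509 -checkend SECONDS -noout -in <cert> is a built-in alternative: it exits non-zero if the certificate expires within the given number of seconds.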

Some time later I heard that images could not be pulled from the registry. When I checked, I found that the certificate had expired. I went through the logs to find out why.

May 24 05:00:34 .. certbot[3460308]: Attempting to renew cert (redacted) from /etc/letsencrypt/renewal/(redacted).conf produced an unexpected error: nginx restart failed:
May 24 05:00:34 .. certbot[3460308]: b''
May 24 05:00:34 .. certbot[3460308]: b''. Skipping.
May 24 05:00:34 .. certbot[3460308]: All renewal attempts failed. The following certs could not be renewed:
May 24 05:00:34 .. certbot[3460308]:   /etc/letsencrypt/live/(redacted)/fullchain.pem (failure)
May 24 05:00:34 .. certbot[3460308]: 1 renew failure(s), 0 parse failure(s)
May 24 05:00:34 .. systemd[1]: certbot.service: Main process exited, code=exited, status=1/FAILURE
May 24 05:00:34 .. systemd[1]: certbot.service: Failed with result 'exit-code'.
May 24 05:00:34 .. systemd[1]: Failed to start Certbot.

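That was what the log showed. It seemed to be saying that the nginx certbot starts on its own had failed to run. Aha! nginx runs inside the Harbor container and holds ports 80 and 443, so that could be why it failed to start. With that in mind, I threw together a quick script.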

#!/bin/bash

# -q prints only the matching container IDs, which docker stop accepts
NGINX_PID=$(docker ps -aqf "name=nginx")
docker stop $NGINX_PID

...

I ran the renewal again with the script in place. It should obviously have worked, but exactly the same log showed up. I checked whether anything was running on those ports locally.

service nginx status

● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2023-05-24 16:23:39 KST; 5min ago
       Docs: man:nginx(8)
    Process: 3699224 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 3699238 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=1/FAILURE)
 
May 24 16:23:38 .. nginx[3699238]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
May 24 16:23:38 .. nginx[3699238]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
May 24 16:23:39 .. nginx[3699238]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
May 24 16:23:39 .. nginx[3699238]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
May 24 16:23:39 .. nginx[3699238]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
May 24 16:23:39 .. nginx[3699238]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
May 24 16:23:39 .. nginx[3699238]: nginx: [emerg] still could not bind()
May 24 16:23:39 .. systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
May 24 16:23:39 .. systemd[1]: nginx.service: Failed with result 'exit-code'.
May 24 16:23:39 .. systemd[1]: Failed to start A high performance web server and a reverse proxy server.

nginx is in the failed state, and the log shows it calling bind() over and over. I tried stopping it.

systemctl stop nginx.service

Even so, the status did not change and stayed at failed. I looked at the processes.

netstat -tulpN

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:http            0.0.0.0:*               LISTEN      3701015/nginx: mast 
tcp        0      0 localhost:domain        0.0.0.0:*               LISTEN      3553746/systemd-res 
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      833/sshd: /usr/sbin 
tcp        0      0 localhost:6010          0.0.0.0:*               LISTEN      3689672/sshd: stick 
tcp        0      0 0.0.0.0:https           0.0.0.0:*               LISTEN      3701015/nginx: mast 
tcp        0      0 localhost:1514          0.0.0.0:*               LISTEN      3697547/docker-prox 
tcp6       0      0 [::]:http               [::]:*                  LISTEN      3701015/nginx: mast 
tcp6       0      0 ip6-localhost:6010      [::]:*                  LISTEN      3689672/sshd: stick 
tcp6       0      0 [::]:https              [::]:*                  LISTEN      3701015/nginx: mast 
udp        0      0 localhost:domain        0.0.0.0:*                           3553746/systemd-res 
udp        0      0 stick:bootpc            0.0.0.0:*                           3553728/systemd-net
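To pull just the PIDs holding a given port out of output like the above, a short awk filter is enough. This is a sketch under assumptions: listeners_on is a hypothetical name, and it relies on the column layout shown in this netstat output (State in column 6, PID/Program in column 7).

```shell
#!/bin/bash
# Hypothetical helper: read `netstat -tulp` style output on stdin and print the
# distinct PIDs LISTENing on the given port. The port must be written the way
# netstat shows it: a service name such as "https", or "443" when run with -n.
listeners_on() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, addr, ":")      # last ":"-separated field is the port
            if (addr[n] == port) {
                split($7, prog, "/")      # "3701015/nginx:" -> "3701015"
                print prog[1]
            }
        }' | sort -u
}
```

Fed the netstat output above, listeners_on https and listeners_on http both print 3701015, the same nginx master process for IPv4 and IPv6.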

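The nginx master process (PID 3701015) is still running and holding ports 80 and 443. Sending it a termination signal doesn't kill it; it keeps coming back. After repeating the process a few more times, I found that once certbot's renew command issues a new certificate (or renews it again) and starts nginx, nginx cannot be started again afterwards. I haven't yet found a way to prevent this within certbot itself, but my guess is that there is a bug where some program uses port 443 during the SSL validation step and never releases it.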

In that case, fuser -k 443/tcp solves it: it kills the processes holding a socket on port 443, which frees the port.
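fuser -k sends SIGKILL by default, so the processes get no chance to shut down cleanly. If you would rather give them that chance first, a variant along these lines sends SIGTERM, waits, and only then falls back to SIGKILL. free_port and its grace-period argument are hypothetical; the -SIGNAL option itself is part of fuser (psmisc).

```shell
#!/bin/bash
# Hypothetical variant of `fuser -k PORT/tcp`: ask the processes holding the port
# to exit with SIGTERM first, then send SIGKILL (fuser's default) to whatever is
# still there after a grace period. Needs root to see other users' processes.
free_port() {
    local port=$1 grace=${2:-2}
    # fuser exits 0 only if some process is using the port
    if fuser "${port}/tcp" >/dev/null 2>&1; then
        fuser -k -TERM "${port}/tcp" >/dev/null 2>&1
        sleep "$grace"
        # SIGKILL anything left; exits non-zero if nothing remains, which is fine
        fuser -k "${port}/tcp" >/dev/null 2>&1
    fi
    return 0
}
```

In the cron script, free_port 80 and free_port 443 would take the place of the two fuser -k lines. Since SIGTERM alone did not stop nginx in this case, the SIGKILL fallback is still what ends up freeing the port.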

It's a stopgap, but I wrote a script like the following and set it to run from cron.

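#!/bin/bash

# Container ID and running state of Harbor's nginx container
NGINX_PID=$(docker ps -aqf "name=nginx")
NGINX_STATUS=$(docker container inspect -f '{{.State.Running}}' nginx)

# inspect prints "true" or "false"; stop the container only when it is running
if [ "$NGINX_STATUS" = "true" ]; then
    docker stop $NGINX_PID
fi

/usr/bin/certbot renew

# Kill whatever still holds ports 80 and 443 (fuser -k sends SIGKILL by default)
fuser -k 80/tcp
fuser -k 443/tcp

# Bring Harbor back up
/home/stick/harbor/install.sh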
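About the running-state check in the script: docker container inspect -f '{{.State.Running}}' prints the string true or false, and a bare [ $VAR ] test only checks that the string is non-empty, so false passes as well. This small demonstration (is_running is a hypothetical name) compares the two forms.

```shell
#!/bin/bash
# `docker container inspect -f '{{.State.Running}}'` prints "true" or "false".
# is_running succeeds only for the exact string "true".
is_running() {
    [ "$1" = "true" ]
}

for status in true false; do
    # [ "$status" ] alone only tests for a non-empty string, so "false" passes too
    if [ "$status" ]; then naive=yes; else naive=no; fi
    if is_running "$status"; then strict=yes; else strict=no; fi
    echo "$status: naive=$naive strict=$strict"
done
```

Running it prints "true: naive=yes strict=yes" and "false: naive=yes strict=no", which is why the comparison against the literal true matters.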

After testing, the renewal went through without any problems. We rarely need certbot at work, but I'm writing this down in case it comes in handy later. Thank you!

mason.jeong

Thanks for reading.
