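Root-Cause Check of Seven Abnormal VPS Nodes
Conclusion: the seven nodes do not all share one cause. Five of them (HK, KR, KRB, TK, US) are highly consistent: the main cause is a leftover IP-Sentinel service that exits repeatedly and floods the log. AR shows short-lived Hermes gateway failure logs, and HKA has only a few historical Hermes gateway failure entries. On every node that could be reached, nginx and the core services are currently not down as a whole, and resource usage is normal.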
- Nodes checked: 7
- Nodes sharing the main cause (IP-Sentinel leftovers): 5
- Currently failed systemd units: 0
- Confirmed OOM or disk-full conditions: 0
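Root-cause grouping
- Same-cause group: HK, KR, KRB, TK, US. Their logs all contain large numbers of `ip-sentinel-agent-daemon.service: Main process exited` and `Failed with result 'exit-code'` entries. This means the old, leftover IP-Sentinel service is still being started by systemd and then fails immediately, which produces the "failure logs in the last 24 hours" anomaly.
- Different-cause / watch group: AR and HKA. AR has short-lived Hermes gateway failure logs, but hermes-gateway, nginx, and docker are all currently running. HKA has only 2 historical hermes-gateway failure entries, and nginx, docker, and containerd are currently running.
- Ruled out: no OOM (out-of-memory) events, no full root disk, and no across-the-board failure of the core nginx/xray/docker services that would make all seven nodes abnormal together.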
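As a rough cross-check of the restart-loop reading, the paired exit counts listed in the per-node log patterns can be turned into an average interval between exits. The sketch below assumes the counts cover exactly 24 hours; the unit's actual `Restart=` and `RestartSec=` settings are not part of this report, so the result is only an indication.

```python
# Exit counts copied from the per-node "main log patterns" in this report.
EXIT_COUNTS_24H = {
    "KR": 16470,
    "HK": 16472,
    "TK": 6669,
    "KRB": 16469,
    "US": 16466,
}

SECONDS_PER_DAY = 24 * 60 * 60  # assumption: the counting window is exactly 24 hours


def mean_restart_interval(exits_in_24h: int) -> float:
    """Average number of seconds between process exits over the window."""
    return SECONDS_PER_DAY / exits_in_24h


if __name__ == "__main__":
    for node, exits in EXIT_COUNTS_24H.items():
        print(f"{node}: one exit every {mean_restart_interval(exits):.1f} s")
```

Under that assumption, KR, HK, KRB, and US come out at about one exit every 5.2 seconds and TK at about one every 13.0 seconds. A steady cadence like this, together with the `activating` state reported for the unit, fits a service that systemd keeps restarting after a short fixed delay.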
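Per-node summary
| Node | Status | Assessment | 24 h keyword hits | Resources (memory used/total, root disk used/total) |
|---|---|---|---|---|
| AR 129.146.59.53 | Watch | Hermes gateway short-lived failure logs (core services currently running) | 27 | 4200/23974 18% 59G/116G 51% |
| KR 131.186.27.212 | Action needed | Leftover IP-Sentinel service exits repeatedly and floods the log | 32947 | 477/954 50% 4.4G/48G 10% |
| HK 82.158.88.91 | Action needed | Leftover IP-Sentinel service exits repeatedly and floods the log | 32944 | 453/1967 23% 11G/39G 27% |
| TK 103.232.213.10 | Action needed | Leftover IP-Sentinel service exits repeatedly and floods the log | 13338 | 263/957 27% 8.8G/20G 47% |
| KRB 161.118.130.5 | Action needed | Leftover IP-Sentinel service exits repeatedly and floods the log | 32940 | 441/954 46% 3.1G/45G 8% |
| US 186.244.244.52 | Action needed | Leftover IP-Sentinel service exits repeatedly and floods the log | 33678 | 971/3915 25% 12G/29G 42% |
| HKA 38.76.188.244 | Watch | Hermes gateway short-lived failure logs (core services currently running) | 2 | 764/3915 20% 13G/29G 43% |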
Per-node evidence
AR 129.146.59.53
Assessment
Hermes gateway short-lived failure logs (core services currently running)
System / resources
Ubuntu 24.04.4 LTS; memory 4200/23974 18%; root disk 59G/116G 51%
Service status
xray:not-found/-; nginx:loaded/active; hermes-gateway:loaded/active; docker:loaded/active; ip-sentinel-agent-daemon:not-found/-; site_total:not-found/-
Failed units
None
24-hour keyword hits
27
Main log patterns:
3 May N TIME instance-N-N python[N]: Traceback (most recent call last):
2 May N TIME instance-N-N systemd[N]: hermes-gateway.service: Main process exited, code=exited, status=N/FAILURE
2 May N TIME instance-N-N systemd[N]: hermes-gateway.service: Failed with result 'exit-code'.
2 May N TIME instance-N-N systemd[N]: hermes-gateway-health.service: Failed with result 'exit-code'.
2 May N TIME instance-N-N python[N]: The above exception was the direct cause of the following exception:
2 May N TIME instance-N-N python[N]: File "/root/.hermes/hermes-agent/venv/lib/pythonN.N/site-packages/httpx/_transports/default.py", line N, in map_httpcore_exceptions
2 May N TIME instance-N-N python[N]: self.gen.throw(typ, value, traceback)
1 May N TIME instance-N-N systemd[N]: hermes-gateway-stock.service: Main process exited, code=exited, status=N/FAILURE
1 May N TIME instance-N-N systemd[N]: hermes-gateway-stock.service: Failed with result 'exit-code'.
1 May N TIME instance-N-N systemd[N]: hermes-gateway-news.service: Main process exited, code=exited, status=N/FAILURE
1 May N TIME instance-N-N systemd[N]: hermes-gateway-news.service: Failed with result 'exit-code'.
1 May N TIME instance-N-N systemd[N]: hermes-gateway-network.service: Main process exited, code=exited, status=N/FAILURE
KR 131.186.27.212
Assessment
Leftover IP-Sentinel service exits repeatedly and floods the log
System / resources
Ubuntu 24.04.4 LTS; memory 477/954 50%; root disk 4.4G/48G 10%
Service status
xray:loaded/active; nginx:loaded/active; hermes-gateway:not-found/-; docker:not-found/-; ip-sentinel-agent-daemon:loaded/activating; site_total:not-found/-
Failed units
None
24-hour keyword hits
32947
Main log patterns:
16470 May N TIME instance-N-N systemd[N]: ip-sentinel-agent-daemon.service: Main process exited, code=exited, status=N/n/a
16470 May N TIME instance-N-N systemd[N]: ip-sentinel-agent-daemon.service: Failed with result 'exit-code'.
6 May N TIME instance-N-N pythonN[N]: /usr/bin/apt-key: N: cannot create /dev/null: Permission denied
1 May N TIME instance-N-N xray[N]: N/N/N TIME.N from N.N.N.N:N accepted tcp:zoom.us:N [z_direct_outbound] email: NcfdN-vless_reality_vision
HK 82.158.88.91
Assessment
Leftover IP-Sentinel service exits repeatedly and floods the log
System / resources
Ubuntu 24.04.4 LTS; memory 453/1967 23%; root disk 11G/39G 27%
Service status
xray:loaded/active; nginx:loaded/active; hermes-gateway:not-found/-; docker:not-found/-; ip-sentinel-agent-daemon:loaded/activating; site_total:not-found/-
Failed units
None
24-hour keyword hits
32944
Main log patterns:
16472 May N TIME serqNtNzyrkNxk systemd[N]: ip-sentinel-agent-daemon.service: Main process exited, code=exited, status=N/n/a
16472 May N TIME serqNtNzyrkNxk systemd[N]: ip-sentinel-agent-daemon.service: Failed with result 'exit-code'.
TK 103.232.213.10
Assessment
Leftover IP-Sentinel service exits repeatedly and floods the log
System / resources
Ubuntu 22.04.5 LTS; memory 263/957 27%; root disk 8.8G/20G 47%
Service status
xray:loaded/active; nginx:loaded/active; hermes-gateway:not-found/-; docker:not-found/-; ip-sentinel-agent-daemon:loaded/activating; site_total:not-found/-
Failed units
None
24-hour keyword hits
13338
Main log patterns:
6669 May N TIME jpproN-N systemd[N]: ip-sentinel-agent-daemon.service: Main process exited, code=exited, status=N/n/a
6669 May N TIME jpproN-N systemd[N]: ip-sentinel-agent-daemon.service: Failed with result 'exit-code'.
KRB 161.118.130.5
Assessment
Leftover IP-Sentinel service exits repeatedly and floods the log
System / resources
Ubuntu 24.04.4 LTS; memory 441/954 46%; root disk 3.1G/45G 8%
Service status
xray:not-found/-; nginx:loaded/active; hermes-gateway:not-found/-; docker:not-found/-; ip-sentinel-agent-daemon:loaded/activating; site_total:not-found/-
Failed units
None
24-hour keyword hits
32940
Main log patterns:
16469 May N TIME instance-N-N systemd[N]: ip-sentinel-agent-daemon.service: Main process exited, code=exited, status=N/n/a
16469 May N TIME instance-N-N systemd[N]: ip-sentinel-agent-daemon.service: Failed with result 'exit-code'.
US 186.244.244.52
Assessment
Leftover IP-Sentinel service exits repeatedly and floods the log
System / resources
Ubuntu 24.04.4 LTS; memory 971/3915 25%; root disk 12G/29G 42%
Service status
xray:not-found/-; nginx:loaded/active; hermes-gateway:not-found/-; docker:loaded/active; ip-sentinel-agent-daemon:loaded/activating; site_total:loaded/inactive
Failed units
None
24-hour keyword hits
33678
Main log patterns:
16466 May N TIME serN systemd[N]: ip-sentinel-agent-daemon.service: Main process exited, code=exited, status=N/n/a
16466 May N TIME serN systemd[N]: ip-sentinel-agent-daemon.service: Failed with result 'exit-code'.
370 May N TIME serN systemd[N]: site_total.service: Main process exited, code=exited, status=N/EXEC
370 May N TIME serN systemd[N]: site_total.service: Failed with result 'exit-code'.
HKA 38.76.188.244
Assessment
Hermes gateway short-lived failure logs (core services currently running)
System / resources
Ubuntu 24.04.4 LTS; memory 764/3915 20%; root disk 13G/29G 43%
Service status
xray:not-found/-; nginx:loaded/active; hermes-gateway:not-found/-; docker:loaded/active; ip-sentinel-agent-daemon:not-found/-; site_total:not-found/-
Failed units
None
24-hour keyword hits
2
Main log patterns:
1 May N TIME serN systemd[N]: hermes-gateway.service: Main process exited, code=exited, status=N/TEMPFAIL
1 May N TIME serN systemd[N]: hermes-gateway.service: Failed with result 'exit-code'.
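The log patterns above look like journal lines with timestamps and numbers masked so that repeated messages collapse into one countable pattern. The report does not say how the patterns were produced, so the following is only a minimal sketch of one way to get similar output. The regular expressions, the function names, and the sample lines are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical masking rules, chosen to resemble the patterns shown in this report.
_TIME = re.compile(r"\b\d{2}:\d{2}:\d{2}\b")  # clock times such as 03:14:15
_DIGITS = re.compile(r"\d+")                  # any remaining run of digits


def normalize(line: str) -> str:
    """Mask clock times as TIME and every other digit run as N."""
    line = _TIME.sub("TIME", line)
    return _DIGITS.sub("N", line)


def top_patterns(lines, limit=12):
    """Return (pattern, count) pairs, most frequent first."""
    return Counter(normalize(line) for line in lines).most_common(limit)


if __name__ == "__main__":
    # Made-up sample lines, not taken from any node.
    sample = [
        "May 12 03:14:15 host-1 systemd[1]: demo.service: Failed with result 'exit-code'.",
        "May 12 03:14:20 host-1 systemd[1]: demo.service: Failed with result 'exit-code'.",
    ]
    for pattern, count in top_patterns(sample):
        print(count, pattern)
```

Masking before counting is what lets thousands of near-identical restart messages show up as a single line with a large count, which makes a runaway unit easy to spot.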
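Recommendations
- Handle the leftover IP-Sentinel service on the 5 same-cause nodes first: back up the unit state, then stop whatever keeps re-launching the leftover service, run daemon-reload and reset-failed, and verify that the 24-hour log volume no longer balloons.
- On US, separately confirm whether `site_total.service` has been retired. If it has, treat it as a leftover service. If it is still needed, fix its Exec path: the `status=N/EXEC` exits indicate that systemd could not execute the unit's configured command.
- AR and HKA should not be handled under the IP-Sentinel cause for now. Just watch whether the short-lived Hermes gateway failures recur.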
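The cleanup order recommended for the five same-cause nodes could be written down as a dry-run plan before anyone touches a server. The sketch below only builds and prints the commands and executes nothing. The report does not identify what keeps re-launching the unit (a `Restart=` directive, a timer, a cron job, or something else), so using `systemctl disable --now` to stop it is an assumption, and the one-hour verification window is an arbitrary example.

```python
import shlex

# Unit name taken from the log patterns in this report.
UNIT = "ip-sentinel-agent-daemon.service"


def remediation_plan(unit: str = UNIT) -> list[list[str]]:
    """Ordered commands for one node. Nothing is executed here."""
    return [
        # 1. Record the unit definition and current state as a backup.
        ["systemctl", "cat", unit],
        ["systemctl", "status", "--no-pager", unit],
        # 2. Stop the unit and disable it (assumed to be enough to break the loop).
        ["systemctl", "disable", "--now", unit],
        # 3. Reload unit files and clear any recorded failure state.
        ["systemctl", "daemon-reload"],
        ["systemctl", "reset-failed", unit],
        # 4. Later, check that no new failure entries are being written.
        ["journalctl", "-u", unit, "--since", "1 hour ago", "--no-pager"],
    ]


if __name__ == "__main__":
    for command in remediation_plan():
        print(shlex.join(command))
```

Printing the plan first keeps the change reviewable. If the unit comes back after these steps, whatever re-installs or re-enables it still has to be found and removed separately.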
This was a read-only check: no remote configuration was restarted, modified, or deleted.