Commit fb3c868d, authored by 陈泽健

fix(server-monitor): fix config file path issues in the server monitoring script

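- Renamed the server monitoring script from check_server_health_v5.ps1 to check_server_health.ps1
- Adjusted the DingTalk notification flow: the report is now copied to the common module's DingTalk notification directory before the message is sent
- Batch-fixed the config file path reference in 31 module scripts, from $LIB_DIR/config.sh to $LIB_DIR/lib/config.sh
- Fixed the incorrect config file path in common.sh
- Corrected the PowerShell module execution command to use absolute paths instead of relative paths
- Added pre-upload cleanup to the Publish-Modules function, which removes the old module directory
- Updated the sync path configuration for the project and desktop versions
- Created a requirements document for DingTalk notification improvements, covering report links reachable from the public network
- Corrected the error message shown by common.sh when the config file does not exist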
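
The path changes described above can be pictured with a minimal sketch. It assumes that `LIB_DIR` still points at the remote module root and that the config file now lives under `lib/` below it, as the message states. `New-ModuleExecCommand` is a hypothetical helper name used only for illustration; it is not taken from the commit, and it leaves out `common.sh` because the message does not say where that file is loaded from.

```powershell
# Hypothetical helper (not part of the commit): builds the remote bash command
# with an absolute module path and the config file under $LIB_DIR/lib/.
function New-ModuleExecCommand {
    param(
        [string]$ModulePath,   # remote module root, e.g. /tmp/check_modules
        [string]$Category,     # "system" or "service"
        [string]$ModuleName    # e.g. 28_java_check.sh
    )
    # Assumption: config.sh sits in a lib/ subdirectory of the module root
    $configFile = "$ModulePath/lib/config.sh"
    # Absolute path to the module, so the command does not depend on the working directory
    $moduleFile = "$ModulePath/$Category/$ModuleName"
    return "bash -c 'export LIB_DIR=$ModulePath && source $configFile && bash $moduleFile'"
}

# Example call; the result is the string that would be passed to the SSH helper
New-ModuleExecCommand -ModulePath "/tmp/check_modules" -Category "service" -ModuleName "28_java_check.sh"
```

Building the command from absolute paths avoids the `cd` step, so a failed directory change cannot silently run a module from the wrong location.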
Parent 61202328
################################################################################
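# Server health monitoring script v4.0 (modular architecture)
# Purpose: connect to a remote server over SSH, run modular system health checks, and generate a Markdown report
# Author: Claude Code
# Date: 2026-05-09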
################################################################################
param(
[Parameter(Mandatory=$false)]
[string]$HostName = "",
[Parameter(Mandatory=$false)]
[int]$Port = 0,
[Parameter(Mandatory=$false)]
[string]$Username = "",
[Parameter(Mandatory=$false)]
[string]$Password = ""
)
# Set console and pipeline output encoding to UTF-8
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$OutputEncoding = [System.Text.Encoding]::UTF8
# ==================== Global variables ====================
$ErrorActionPreference = "Continue"
$scriptPath = Split-Path -Parent $MyInvocation.MyCommand.Path
$libPath = Join-Path $scriptPath "lib"
$modulePath = "/tmp/check_modules"
$timestamp = Get-Date -Format "yyyy-MM-dd_HH-mm-ss"
# Collected check results
$script:TestResults = @{}
# Issue lists
$script:CriticalIssues = New-Object System.Collections.Generic.List[string]
$script:WarningIssues = New-Object System.Collections.Generic.List[string]
# ==================== Logging function ====================
function Write-Log {
param(
[string]$Message,
[string]$Level = "INFO"
)
$logTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
$logMessage = "[$logTime] [$Level] $Message"
switch ($Level) {
"ERROR" { Write-Host $logMessage -ForegroundColor Red }
"WARN" { Write-Host $logMessage -ForegroundColor Yellow }
"DEBUG" { Write-Host $logMessage -ForegroundColor DarkGray }
default { Write-Host $logMessage -ForegroundColor White }
}
}
# ==================== Interactive input function ====================
function Invoke-InteractiveInput {
Write-Host ""
Write-Host "========================================" -ForegroundColor Cyan
Write-Host " 服务器健康监测脚本 v4.0" -ForegroundColor Cyan
Write-Host " (模块化架构)" -ForegroundColor Cyan
Write-Host "========================================" -ForegroundColor Cyan
Write-Host ""
if ([string]::IsNullOrEmpty($script:HostName)) {
$script:HostName = Read-Host "请输入目标主机地址"
while ([string]::IsNullOrEmpty($script:HostName)) {
Write-Host "主机地址不能为空!" -ForegroundColor Red
$script:HostName = Read-Host "请输入目标主机地址"
}
}
if ($script:Port -eq 0) {
$portInput = Read-Host "请输入SSH端口 (默认: 22)"
if ([string]::IsNullOrEmpty($portInput)) {
$script:Port = 22
}
else {
while (-not ($portInput -match "^\d+$")) {
Write-Host "端口必须是数字!" -ForegroundColor Red
$portInput = Read-Host "请输入SSH端口 (默认: 22)"
}
$script:Port = [int]$portInput
}
}
if ([string]::IsNullOrEmpty($script:Username)) {
$script:Username = Read-Host "请输入SSH用户名 (默认: root)"
if ([string]::IsNullOrEmpty($script:Username)) {
$script:Username = "root"
}
}
if ([string]::IsNullOrEmpty($script:Password)) {
# Prompt until a non-empty password is entered
do {
$securePassword = Read-Host "请输入SSH密码" -AsSecureString
# Convert the SecureString through a BSTR and always free the unmanaged copy
$bstr = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($securePassword)
try {
$script:Password = [System.Runtime.InteropServices.Marshal]::PtrToStringBSTR($bstr)
}
finally {
[System.Runtime.InteropServices.Marshal]::ZeroFreeBSTR($bstr)
}
if ([string]::IsNullOrEmpty($script:Password)) {
Write-Host "密码不能为空!" -ForegroundColor Red
}
} while ([string]::IsNullOrEmpty($script:Password))
}
Write-Host ""
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "连接信息确认:" -ForegroundColor Yellow
Write-Host " 主机地址: $script:HostName" -ForegroundColor White
Write-Host " SSH端口: $script:Port" -ForegroundColor White
Write-Host " 用户名: $script:Username" -ForegroundColor White
Write-Host "========================================" -ForegroundColor Cyan
$confirm = Read-Host "`n确认以上信息是否正确?(Y/N)"
# Accept only "y" or "yes" (case-insensitive); any other answer cancels
if ($confirm -notmatch "^\s*(y|yes)\s*$") {
Write-Host "已取消执行。" -ForegroundColor Yellow
exit 0
}
Write-Host ""
}
# ==================== SSH command execution ====================
function Invoke-SSHCommand {
param(
[string]$Command,
[int]$Timeout = 30
)
try {
$plinkPath = Join-Path $scriptPath "plink.exe"
if (-not (Test-Path $plinkPath)) {
Write-Log "plink.exe未找到: $plinkPath" "ERROR"
return $null
}
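# NOTE: $Timeout is accepted by this function but not enforced; plink runs until the remote command exits
$resultArray = & $plinkPath -ssh -P $script:Port -l $script:Username -pw $script:Password -batch $script:HostName $Command 2>&1
# Join the output lines into a single string, preserving line breaks
$result = $resultArray -join "`n"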
return $result
}
catch {
Write-Log "SSH命令执行异常: $($_.Exception.Message)" "ERROR"
return $null
}
}
function Test-SSHConnection {
Write-Log "测试SSH连接..."
$result = Invoke-SSHCommand "echo 'OK'" -Timeout 10
if ($result -match "OK") {
Write-Log "SSH连接成功!" "INFO"
return $true
}
else {
Write-Log "SSH连接失败!" "ERROR"
return $false
}
}
# ==================== Module upload ====================
function Publish-Modules {
Write-Log "开始上传检测模块到远程服务器..."
try {
# Create the remote directories
Invoke-SSHCommand "mkdir -p $modulePath/system" | Out-Null
Invoke-SSHCommand "mkdir -p $modulePath/service" | Out-Null
Write-Log "创建远程模块目录: $modulePath"
$pscpPath = Join-Path $scriptPath "pscp.exe"
if (-not (Test-Path $pscpPath)) {
Write-Log "pscp.exe未找到" "ERROR"
return $false
}
# Upload the config file
$localConfig = Join-Path $libPath "config.sh"
if (Test-Path $localConfig) {
& $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localConfig "${script:Username}@${script:HostName}:$modulePath/" 2>&1 | Out-Null
Write-Log "config.sh 上传完成"
}
# Upload the shared function library
$localCommon = Join-Path $libPath "common.sh"
if (Test-Path $localCommon) {
& $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localCommon "${script:Username}@${script:HostName}:$modulePath/" 2>&1 | Out-Null
Write-Log "common.sh 上传完成"
}
# Upload the system modules
$systemModuleDir = Join-Path $libPath "system"
if (Test-Path $systemModuleDir) {
Get-ChildItem $systemModuleDir -Filter "*.sh" | ForEach-Object {
& $pscpPath -P $script:Port -l $script:Username -pw $script:Password $_.FullName "${script:Username}@${script:HostName}:$modulePath/system/" 2>&1 | Out-Null
Write-Host " $($_.Name) 上传完成"
}
}
# Upload the service modules
$serviceModuleDir = Join-Path $libPath "service"
if (Test-Path $serviceModuleDir) {
Get-ChildItem $serviceModuleDir -Filter "*.sh" | ForEach-Object {
& $pscpPath -P $script:Port -l $script:Username -pw $script:Password $_.FullName "${script:Username}@${script:HostName}:$modulePath/service/" 2>&1 | Out-Null
Write-Host " $($_.Name) 上传完成"
}
}
# Make the uploaded scripts executable
Invoke-SSHCommand "chmod +x $modulePath/*.sh $modulePath/system/*.sh $modulePath/service/*.sh" | Out-Null
Write-Log "所有模块上传完成!" "INFO"
return $true
}
catch {
Write-Log "模块上传失败: $($_.Exception.Message)" "ERROR"
return $false
}
}
# ==================== Module execution ====================
function Invoke-ModuleCheck {
param(
[string]$ModuleName,
[string]$Category
)
Write-Log "执行模块: $Category/$ModuleName"
try {
$execCmd = "bash -c 'LIB_DIR=$modulePath && cd $modulePath && source config.sh && source common.sh && $Category/$ModuleName'"
$result = Invoke-SSHCommand $execCmd -Timeout 90
if ($null -eq $result) {
Write-Log "模块 $ModuleName 执行超时或失败" "WARN"
return @()
}
$parsedResults = Parse-ModuleResult -RawOutput $result -ModuleName $ModuleName
return $parsedResults
}
catch {
Write-Log "模块执行异常: $($_.Exception.Message)" "ERROR"
return @()
}
}
# ==================== Result parsing ====================
function Parse-ModuleResult {
param(
[string]$RawOutput,
[string]$ModuleName
)
$results = @()
$lines = $RawOutput -split "`n"
$category = Get-ModuleCategory -ModuleName $ModuleName
# Debug output (Java module only)
if ($ModuleName -eq "28_java_check.sh") {
Write-Log "=== $ModuleName 解析结果 ===" "DEBUG"
Write-Log "原始输出长度: $($RawOutput.Length)" "DEBUG"
Write-Log "原始输出内容: [$RawOutput]" "DEBUG"
Write-Log "接收到的行数: $($lines.Count)" "DEBUG"
}
foreach ($line in $lines) {
$line = $line.Trim()
# Skip blank lines, comments, and log lines
if ([string]::IsNullOrEmpty($line) -or $line -match "^#" -or $line -match "^\[") {
continue
}
# Skip grep warning messages
if ($line -match "^grep.*警告") {
continue
}
# Skip shell error messages that name a script and a line number ("<script>.sh ... 行 <N>")
if ($line -match "\.sh.*行\s*\d+") {
continue
}
# Skip output from failed expressions (format: number | number 需要整数表达式, i.e. "integer expression expected")
if ($line -match "^\d+\s*\|\s*\d+.*需要整数表达式") {
continue
}
# Skip lines that contain only digits and a vertical bar
if ($line -match "^\d+\s*\|\s*\d+\s*$") {
continue
}
# Skip expression syntax errors
if ($line -match "表达式中有语法错误") {
continue
}
# Skip lines that carry an error marker but are not in KEY:VALUE format
if ($line -match "需要整数表达式" -or $line -match "语法错误") {
continue
}
# Skip lines where a colon is followed by "[" or by digits and "|" (usually error output)
if ($line -match ":\[" -or $line -match ":\d+\s*\|") {
continue
}
if ($line -match "^ERROR:(.+)") {
Write-Log "$ModuleName 错误: $($matches[1].Trim())" "WARN"
continue
}
# Accept only standard KEY:VALUE lines (ASCII colon; KEY has no spaces or special characters)
if ($line -match "^([A-Z_][A-Z0-9_]*)\s*:\s*(.+)$") {
$key = $matches[1].Trim()
$value = $matches[2].Trim()
# Write-Log "解析到: $key = $value" "DEBUG"
$result = Convert-ToResultObject -Key $key -Value $value -ModuleName $ModuleName -Category $category
if ($result) {
$results += $result
# Write-Log "已添加结果: $key" "DEBUG"
}
}
}
# Write-Log "=== $ModuleName 解析完成,共 $($results.Count) 个结果 ===" "DEBUG"
return $results
}
# ==================== Get module category ====================
function Get-ModuleCategory {
param(
[string]$ModuleName
)
switch -Regex ($ModuleName) {
"^01_" { return "系统基础信息" }
"^02_" { return "CPU检测" }
"^03_" { return "内存检测" }
"^04_" { return "磁盘检测" }
"^05_" { return "OOM检测" }
"^06_" { return "进程检测" }
"^07_" { return "网络检测" }
"^11_" { return "计划任务" }
"^12_" { return "端口检测" }
"^(20|21)_" { return "Docker容器" }
"^2[23]_" { return "MySQL数据库" }
"^2[45]_" { return "Redis缓存" }
"^2[67]_" { return "EMQX消息队列" }
"^28_" { return "Java应用" }
"^29_" { return "Python应用" }
"^3[04]_" { return "Nginx应用" }
"^3[15]_" { return "Nacos应用" }
"^3[26]_" { return "FastDFS应用" }
"^33_" { return "应用日志" }
"^40_" { return "综合诊断" }
"^43_" { return "安全合规检测" }
"^44_" { return "系统日志检测" }
"^45_" { return "时间同步检测" }
default { return "其他" }
}
}
# ==================== Convert to result object ====================
function Convert-ToResultObject {
param(
[string]$Key,
[string]$Value,
[string]$ModuleName,
[string]$Category
)
try {
$name = Get-DisplayName -Key $Key
$threshold = Get-Threshold -Key $Key
$status = Get-StatusByValue -Key $Key -Value $Value
$result = [PSCustomObject]@{
Name = $name
Value = $Value
Threshold = $threshold
Status = $status
Module = $ModuleName
Key = $Key
}
if ($status -eq "严重" -or $status -eq "警告") {
$issueMsg = "$name : $Value"
Add-Issue -Message $issueMsg -Level $status
}
return $result
}
catch {
Write-Log "转换结果对象失败: $($_.Exception.Message)" "WARN"
return $null
}
}
# ==================== Get display name ====================
function Get-DisplayName {
param(
[string]$Key
)
$displayNames = @{
# System basics
"HOSTNAME" = "主机名"
"OS_VERSION" = "操作系统版本"
"KERNEL_VERSION" = "内核版本"
"UPTIME_DAYS" = "运行天数"
"LOAD_1MIN" = "1分钟负载"
"LOAD_5MIN" = "5分钟负载"
"LOAD_15MIN" = "15分钟负载"
"CPU_USAGE" = "CPU使用率"
"MEMORY_USAGE" = "内存使用率"
"DISK_USAGE" = "磁盘使用率"
"ULIMIT_INFO" = "系统资源限制"
"KERNEL_CMDLINE" = "内核启动参数"
"KERNEL_PARAM_FS_FILE_MAX" = "文件描述符最大值"
"KERNEL_PARAM_INOTIFY_MAX_WATCHES" = "inotify最大监听数"
"KERNEL_PARAM_SOMAXCONN" = "监听队列最大长度"
"KERNEL_PARAM_TCP_TW_REUSE" = "TCP TIME_WAIT重用"
"KERNEL_PARAM_TCP_FIN_TIMEOUT" = "TCP FIN超时时间"
"CPU_USER" = "CPU用户态使用率"
"CPU_SYSTEM" = "CPU系统态使用率"
"CPU_PER_CORE" = "各核心CPU使用率"
"CPU_TOP10_PROCESSES" = "CPU占用TOP10进程"
"CPU_CONTEXT_SWITCHES" = "CPU上下文切换次数"
"CPU_INTERRUPTS" = "CPU中断次数"
"IRQ_DETAIL" = "中断详情"
"SOFTIRQS" = "软中断统计"
"SCHEDULER_RUNQUEUE" = "调度器运行队列长度"
"CPU_AFFINITY_SAMPLE" = "CPU亲和性示例"
"MEMORY_USED" = "已用内存"
"MEMORY_FREE" = "空闲内存"
"MEMORY_TOTAL" = "总内存"
"MEMORY_BUFFERS" = "内存缓冲区"
"MEMORY_CACHED" = "内存缓存"
"MEMORY_TOP5_PROCESSES" = "内存占用TOP5进程"
"MEMORY_PRESSURE_AVG10" = "内存压力(avg10)"
"MEMORY_PRESSURE_AVG60" = "内存压力(avg60)"
"VM_PGMajFAULT" = "主缺页异常次数"
"VM_PSWPIN" = "换入页面数"
"VM_PSWPOUT" = "换出页面数"
"SLAB_TOTAL" = "Slab缓存总量"
"SLAB_RECLAIMABLE" = "可回收Slab缓存"
"HUGEPAGES_TOTAL" = "大页总数"
"HUGEPAGES_FREE" = "空闲大页数"
"HUGEPAGES_SIZE" = "大页大小"
"TRANSPARENT_HUGEPAGE" = "透明大页状态"
"FD_ALLOCATED" = "已分配文件描述符"
"FD_MAXIMUM" = "最大文件描述符"
"FD_STATUS" = "文件描述符状态"
"LOAD_CORES" = "CPU核心数"
"LOAD_RATIO" = "负载与核心数比值"
"LOAD_STATUS" = "系统负载状态"
"PROCESS_TOP10_CPU" = "CPU占用TOP10进程"
"PROCESS_TOP10_MEMORY" = "内存占用TOP10进程"
"PROCESS_COUNT" = "进程总数"
"PROCESS_LONGEST_RUNNING" = "运行时间最长进程"
"ZOMBIE_COUNT" = "僵尸进程数"
"ZOMBIE_STATUS" = "僵尸进程状态"
"THREAD_STATUS" = "线程状态"
"UNINTERRUPTIBLE_COUNT" = "不可中断进程数"
"UNINTERRUPTIBLE_PROCESSES" = "不可中断进程详情"
"UNINTERRUPTIBLE_STATUS" = "不可中断进程状态"
"PROCESS_OPEN_FILES_TOP5" = "打开文件数TOP5进程"
"PROCESS_CONNECTIONS_TOP5" = "网络连接数TOP5进程"
"TCP_ESTABLISHED" = "ESTABLISHED连接数"
"TCP_TIME_WAIT" = "TIME_WAIT连接数"
"TCP_TIME_WAIT_STATUS" = "TIME_WAIT连接状态"
"TCP_CLOSE_WAIT" = "CLOSE_WAIT连接数"
"TCP_CLOSE_WAIT_STATUS" = "CLOSE_WAIT连接状态"
"NET_INTERFACE_UP" = "启用网络接口数"
"NET_INTERFACE_TOTAL" = "总网络接口数"
"NET_INTERFACE_STATUS" = "网络接口状态"
"NET_ERRORS" = "网络错误"
"NET_ERRORS_STATUS" = "网络错误状态"
"NET_TRAFFIC" = "网络流量"
"DNS_RESOLUTION" = "DNS解析状态"
"NET_HOSTNAME" = "主机名"
"NET_GATEWAY" = "默认网关"
"NET_GATEWAY_STATUS" = "网关状态"
"TCP_EXTENDED_STATS" = "TCP扩展统计"
"ARP_ENTRIES" = "ARP表项"
"ARP_COUNT" = "ARP条目数"
"ROUTES_COUNT" = "路由条目数"
"SOCKET_STATS" = "Socket统计"
"NET_BANDWIDTH_DETAIL" = "网络带宽详情"
"LISTEN_QUEUE_BACKLOG" = "监听队列积压"
"GATEWAY_PING_STATUS" = "网关连通性"
"CONTAINER_NETWORK_TO_EMQX" = "到EMQX容器网络"
"DISK_IO_STATUS" = "磁盘IO状态"
"DISK_SMART_STATUS" = "磁盘SMART状态"
"DISK_PARTITIONS" = "磁盘分区信息"
"DISK_LATENCY" = "磁盘延迟"
"REMOTE_MOUNTS" = "远程挂载点"
"REMOTE_MOUNTS_COUNT" = "远程挂载数量"
"RAID_DEVICES" = "RAID设备"
"RAID_STATUS" = "RAID状态"
"LVM_VOLUMES" = "LVM卷信息"
"DISK_SCHEDULER" = "磁盘调度器"
# Redis basics
"KEY_COUNT" = "Redis键数量"
"CLIENT_COUNT" = "Redis客户端数"
"CACHE_HIT_RATE" = "Redis缓存命中率"
"REDIS_VERSION" = "Redis版本"
"REDIS_MEMORY_USED" = "Redis内存使用"
"REDIS_MEM_FRAGMENTATION" = "Redis内存碎片率"
"REJECTED_CONNECTIONS" = "Redis拒绝连接数"
"SLOW_LOG_TOP10" = "Redis慢日志TOP10"
"PERSISTENCE_STATUS" = "Redis持久化状态"
"REPLICATION_STATUS" = "Redis复制状态"
"CLIENT_DETAIL" = "Redis客户端详情"
"CLUSTER_STATUS" = "Redis集群状态"
# MySQL basics
"CONNECTIONS" = "MySQL连接数信息"
"SLOW_QUERIES" = "MySQL慢查询数"
"MYSQL_VERSION" = "MySQL版本"
"MYSQL_CONNECTIONS_CURRENT" = "MySQL当前连接数"
"MYSQL_CONNECTIONS_USAGE" = "MySQL连接使用率"
"MYSQL_INNODB_HIT_RATE" = "MySQL InnoDB命中率"
"QPS" = "MySQL QPS"
"TPS" = "MySQL TPS"
"DEADLOCKS" = "MySQL死锁数"
"BUFFER_POOL_HIT_RATE" = "MySQL缓冲池命中率"
"ACTIVE_QUERIES" = "MySQL活跃查询数"
"FRAGMENTED_TABLES" = "MySQL碎片表数量"
"CONN_ERRORS" = "MySQL连接错误数"
"TRX_ACTIVE" = "MySQL活跃事务数"
"LOCK_WAITS" = "MySQL锁等待数"
"THREADS_POOL" = "MySQL线程池状态"
"TEMP_TABLE_RATE" = "MySQL临时表使用率"
"DATABASE_SIZE" = "MySQL数据库大小"
"TABLE_COUNT" = "MySQL表数量"
"TABLES_WITHOUT_INDEX" = "MySQL无索引表数量"
"SLOW_QUERY_TOP1" = "MySQL最慢查询统计"
"LOCK_DETAIL" = "MySQL锁详情"
"FRAGMENTED_DETAIL" = "MySQL碎片详情"
# EMQX basics
"EMQX_CLIENTS_TOTAL" = "EMQX客户端总数"
"EMQX_CLIENTS_CONNECTED" = "EMQX已连接客户端"
"EMQX_CLIENTS_DISCONNECTED" = "EMQX已断开客户端"
"EMQX_SUBSCRIPTIONS_TOTAL" = "EMQX订阅总数"
"EMQX_TOPICS_TOTAL" = "EMQX主题总数"
"EMQX_ROUTES_TOTAL" = "EMQX路由总数"
"EMQX_LISTENERS_COUNT" = "EMQX监听器数量"
"EMQX_SESSIONS_TOTAL" = "EMQX会话总数"
"EMQX_SESSIONS_ACTIVE" = "EMQX活跃会话"
"EMQX_SESSIONS_INACTIVE" = "EMQX非活跃会话"
"EMQX_NODE_STATUS" = "EMQX节点状态"
"EMQX_NODE_LEVEL" = "EMQX节点状态等级"
"EMQX_CLUSTER_MODE" = "EMQX集群模式"
"EMQX_CLUSTER_NODES" = "EMQX集群节点数"
"EMQX_MAX_CONNECTIONS" = "EMQX最大连接数"
"EMQX_CONN_USAGE" = "EMQX连接使用率"
"EMQX_CONN_LEVEL" = "EMQX连接状态等级"
"EMQX_MEMORY_USAGE" = "EMQX内存使用率"
"EMQX_PLUGINS_COUNT" = "EMQX插件数量"
"EMQX_CONTAINER" = "EMQX容器名称"
"EMQX_CONTAINER_STATUS" = "EMQX容器状态"
"EMQX_CONTAINER_LEVEL" = "EMQX容器状态等级"
"EMQX_RULES_COUNT" = "EMQX规则数量"
"EMQX_ALARMS_COUNT" = "EMQX告警数量"
"EMQX_ALARMS_LEVEL" = "EMQX告警等级"
"EMQX_MESSAGES_SENT" = "EMQX发送消息数"
"EMQX_MESSAGES_RECEIVED" = "EMQX接收消息数"
"EMQX_VERSION" = "EMQX版本"
"EMQX_DASHBOARD_STATUS" = "EMQX数据板状态"
"EMQX_DASHBOARD_LEVEL" = "EMQX数据板状态等级"
"EMQX_DASHBOARD_URL" = "EMQX数据板访问地址"
"EMQX_DASHBOARD_VERSION" = "EMQX数据板版本"
# General status
"DOCKER_STATUS" = "Docker状态"
"MYSQL_STATUS" = "MySQL状态"
"REDIS_STATUS" = "Redis状态"
"EMQX_STATUS" = "EMQX状态"
"JAVA_CONTAINER" = "Java容器名称"
"JAVA_CONTAINER_STATUS" = "Java容器状态"
"JAVA_CONTAINER_LEVEL" = "Java容器状态等级"
"JAVA_TEST_OUTPUT" = "Java测试输出"
"JAVA_MODULE_STARTED" = "Java模块启动"
"JAVA_MODULE_STATUS" = "Java模块状态"
"JAVA_MODULE_COMPLETED" = "Java模块完成"
"JAVA_VERSION" = "Java版本"
"JAVA_PID" = "Java进程PID"
"JAVA_VM_SIZE" = "Java虚拟内存大小"
"JAVA_VM_RSS" = "Java物理内存大小"
"JAVA_THREADS" = "Java线程数"
"JAVA_HEAP_CAPACITY" = "Java堆内存容量"
"JAVA_HEAP_USED" = "Java堆内存已用"
"JAVA_HEAP_PERCENT" = "Java堆内存使用率"
"JAVA_HEAP_LEVEL" = "Java堆内存状态等级"
"JAVA_FULL_GC" = "Java Full GC次数"
"JAVA_YOUNG_GC" = "Java Young GC次数"
"JAVA_GC_LEVEL" = "Java GC状态等级"
"JAVA_THREAD_COUNT" = "Java应用线程数"
"JAVA_THREAD_LEVEL" = "Java线程状态等级"
"JAVA_LOG_ERRORS" = "Java日志错误数"
"JAVA_LOG_LEVEL" = "Java日志状态等级"
"SPRINGBOOT_ACTUATOR" = "Spring Boot Actuator状态"
"SPRINGBOOT_HEALTH_STATUS" = "Spring Boot健康状态"
"SPRINGBOOT_HEALTH_LEVEL" = "Spring Boot健康等级"
"JAVA_CONFIG_FILE" = "Java配置文件路径"
"JAVA_SERVER_PORT" = "Java服务端口"
"JAVA_DATABASE_URL" = "Java数据库连接URL"
"JAVA_CONFIG_COUNT" = "Java配置项数量"
"PYTHON_VERSION" = "Python版本"
"NGINX_VERSION" = "Nginx版本"
"NACOS_VERSION" = "Nacos版本"
# Nginx in-depth checks
"NGINX_ACTIVE_CONNECTIONS" = "Nginx活跃连接数"
"NGINX_ACCEPTED_CONNECTIONS" = "Nginx已接受连接数"
"NGINX_HANDLED_CONNECTIONS" = "Nginx已处理连接数"
"NGINX_REQUESTS_TOTAL" = "Nginx总请求数"
"NGINX_PROCESSES" = "Nginx进程数"
"NGINX_LISTENING_PORTS" = "Nginx监听端口"
"NGINX_CONFIG_STATUS" = "Nginx配置状态"
"NGINX_CONFIG_LEVEL" = "Nginx配置等级"
"NGINX_WORKER_PROCESSES" = "Nginx工作进程数"
"NGINX_WORKER_CONNECTIONS" = "Nginx工作连接数"
"NGINX_ACCESS_LOG_SIZE" = "Nginx访问日志大小"
"NGINX_ERROR_LOG_SIZE" = "Nginx错误日志大小"
"NGINX_RECENT_ERRORS" = "Nginx最近错误数"
"NGINX_ACCESS_LOG_STATS" = "Nginx访问日志状态码分布"
"NGINX_STATUS_2XX" = "Nginx状态码2xx数量"
"NGINX_STATUS_3XX" = "Nginx状态码3xx数量"
"NGINX_STATUS_4XX" = "Nginx状态码4xx数量"
"NGINX_STATUS_5XX" = "Nginx状态码5xx数量"
"NGINX_ERROR_RATE" = "Nginx请求错误率"
"NGINX_SLOWEST_REQUESTS" = "Nginx最慢请求TOP10"
"NGINX_SLOW_REQUESTS_COUNT" = "Nginx慢请求数量"
"NGINX_SSL_ENABLED" = "Nginx SSL启用"
"NGINX_CACHE_ENABLED" = "Nginx缓存启用"
"NGINX_UPSTREAM_COUNT" = "Nginx上游数量"
"NGINX_SERVER_BLOCKS" = "Nginx服务器块数量"
"NGINX_UPTIME_DAYS" = "Nginx运行天数"
# Nacos in-depth checks
"NACOS_NAMESPACES" = "Nacos命名空间数"
"NACOS_SERVICES_COUNT" = "Nacos服务数量"
"NACOS_INSTANCES_COUNT" = "Nacos实例数量"
"NACOS_HEALTHY_INSTANCES" = "Nacos健康实例数"
"NACOS_HEALTH_RATE" = "Nacos健康率"
"NACOS_CONFIGS_COUNT" = "Nacos配置数量"
"NACOS_GRPC_CONNECTIONS" = "Nacos gRPC连接数"
"NACOS_RAFT_MODE" = "Nacos Raft模式"
"NACOS_RAFT_ROLE" = "Nacos Raft角色"
"NACOS_RAFT_NODES" = "Nacos Raft节点数"
"NACOS_HEALTH_STATUS" = "Nacos健康状态"
"NACOS_HEALTH_LEVEL" = "Nacos健康等级"
"NACOS_HEAP_USAGE" = "Nacos堆内存使用率"
"NACOS_THREAD_COUNT" = "Nacos线程数"
"NACOS_MAIN_PORT" = "Nacos主端口状态"
"NACOS_RAFT_PORT" = "Nacos Raft端口状态"
"NACOS_LOG_SIZE" = "Nacos日志大小"
"NACOS_RECENT_ERRORS" = "Nacos最近错误数"
"NACOS_UPTIME_DAYS" = "Nacos运行天数"
# FastDFS in-depth checks
"FASTDFS_VERSION" = "FastDFS版本"
"FASTDFS_TRACKER_STATUS" = "FastDFS Tracker状态"
"FASTDFS_TRACKER_PROCESSES" = "FastDFS Tracker进程数"
"FASTDFS_STORAGE_STATUS" = "FastDFS Storage状态"
"FASTDFS_STORAGE_PROCESSES" = "FastDFS Storage进程数"
"FASTDFS_STORE_PATH_COUNT" = "FastDFS存储路径数"
"FASTDFS_DISK_USAGE" = "FastDFS磁盘使用率"
"FASTDFS_SYNC_ENABLED" = "FastDFS同步启用"
"FASTDFS_FILE_COUNT" = "FastDFS文件数量"
"FASTDFS_DIR_COUNT" = "FastDFS目录数量"
"FASTDFS_TOTAL_SIZE" = "FastDFS总大小"
"FASTDFS_TRUNK_ENABLED" = "FastDFS Trunk启用"
"FASTDFS_TRACKER_CONNECTION" = "FastDFS Tracker连接状态"
"FASTDFS_LOG_SIZE" = "FastDFS日志大小"
"FASTDFS_RECENT_ERRORS" = "FastDFS最近错误数"
"FASTDFS_HTTP_ENABLED" = "FastDFS HTTP启用"
"FASTDFS_HTTP_PORT" = "FastDFS HTTP端口"
"FASTDFS_HTTP_STATUS" = "FastDFS HTTP状态"
"FASTDFS_STORAGE_UPTIME_DAYS" = "FastDFS Storage运行天数"
"FASTDFS_GROUP_NAME" = "FastDFS组名"
# Security and compliance checks
"AUTH_FAILURES_24H" = "24小时认证失败次数"
"AUTH_FAILURES_LEVEL" = "认证失败等级"
"RECENT_LOGINS" = "最近登录记录"
"CURRENT_USERS_COUNT" = "当前登录用户数"
"CURRENT_USERS" = "当前登录用户"
"SELINUX_STATUS" = "SELinux状态"
"FIREWALL_STATUS" = "防火墙状态"
"FIREWALL_RULES" = "防火墙规则"
"IPTABLES_STATUS" = "iptables状态"
"IPTABLES_RULES_COUNT" = "iptables规则数量"
"OPEN_PORTS_COUNT" = "开放端口数量"
"ABNORMAL_ACCOUNTS" = "异常账户"
"ABNORMAL_ACCOUNTS_LEVEL" = "异常账户等级"
"SUSPICIOUS_SUID_COUNT" = "可疑SUID文件数量"
"SUSPICIOUS_SUID_FILES" = "可疑SUID文件"
"MODIFIED_CONF_COUNT" = "修改的配置文件数量"
"MODIFIED_CONF_FILES" = "修改的配置文件"
"ABNORMAL_CRON" = "异常cron任务"
"ABNORMAL_CRON_LEVEL" = "异常cron任务等级"
"MAX_LOGIN_FAILURES" = "最大登录失败次数"
"BRUTE_FORCE_IPS" = "暴力破解IP"
"BRUTE_FORCE_LEVEL" = "暴力破解等级"
"EMPTY_PASSWORD_ACCOUNTS" = "空密码账户"
"EMPTY_PASSWORD_LEVEL" = "空密码账户等级"
"SSH_PERMIT_ROOT" = "SSH允许root登录"
"SSH_PASSWORD_AUTH" = "SSH密码认证"
"SSH_PORT" = "SSH端口"
"SSH_DEFAULT_PORT" = "SSH使用默认端口"
"SSH_DEFAULT_PORT_LEVEL" = "SSH默认端口等级"
"SSH_ROOT_LOGIN_LEVEL" = "SSH root登录等级"
"SSH_PASSWORD_AUTH_LEVEL" = "SSH密码认证等级"
"OPEN_TCP_PORTS" = "开放TCP端口"
"OPEN_UDP_PORTS" = "开放UDP端口"
"HIGH_RISK_PORTS" = "高风险端口"
"HIGH_RISK_PORTS_LEVEL" = "高风险端口等级"
# System log checks
"KERNEL_ERRORS_24H" = "24小时内核错误数"
"KERNEL_ERRORS_LEVEL" = "内核错误等级"
"DISK_ERRORS_24H" = "24小时磁盘错误数"
"DISK_ERRORS_DMESG" = "dmesg磁盘错误数"
"DISK_ERRORS_LEVEL" = "磁盘错误等级"
"DMESG_ERRORS_COUNT" = "dmesg错误数量"
"DMESG_ERROR_TYPES" = "dmesg错误类型"
"MESSAGES_LOG_STATUS" = "messages日志状态"
"MESSAGES_ERRORS_COUNT" = "messages错误数量"
"MESSAGES_WARNS_COUNT" = "messages警告数量"
"MESSAGES_LOG_SIZE" = "messages日志大小"
"KERNEL_PANIC_COUNT" = "内核panic数量"
"KERNEL_OOPS_COUNT" = "内核oops数量"
"KERNEL_CRASH_FILES" = "内核崩溃文件数量"
"KERNEL_STABILITY_LEVEL" = "内核稳定性等级"
"SERVICE_CRASH_COUNT" = "服务崩溃数量"
"CRASHED_SERVICES" = "崩溃的服务"
"SERVICE_STABILITY_LEVEL" = "服务稳定性等级"
"SYSTEMD_STATUS" = "systemd状态"
"SYSTEMD_FAILED_COUNT" = "systemd失败服务数量"
"SYSTEMD_FAILED_SERVICES" = "systemd失败服务"
"SYSTEMD_FAILED_LEVEL" = "systemd失败等级"
"OOM_KILLER_COUNT" = "OOM Killer数量"
"CORE_DUMP_FILES" = "Core dump文件数量"
"OOM_VICTIMS" = "OOM受害者"
"OOM_LEVEL" = "OOM等级"
"RESOURCE_EXHAUSTION_EVENTS" = "资源耗尽事件"
"RESOURCE_EXHAUSTION_LEVEL" = "资源耗尽等级"
"HARDWARE_ERRORS" = "硬件错误"
"HARDWARE_ERRORS_LEVEL" = "硬件错误等级"
"LARGE_LOG_FILES" = "大日志文件"
"LARGE_LOG_FILES_LEVEL" = "大日志文件等级"
"NETWORK_ERRORS" = "网络错误"
"NETWORK_ERRORS_LEVEL" = "网络错误等级"
# Application log checks
"APP_LOG_ERRORS_LAST_HOUR" = "应用日志最近1小时错误数"
"APP_LOG_HOURLY_STATS" = "应用日志每小时错误统计"
"APP_LOG_ERROR_LEVEL" = "应用日志错误等级"
"APP_LOG_ERROR_TYPES" = "应用日志错误类型分类"
"DOCKER_LOG_ERRORS_JAVA" = "Java容器最近1小时错误数"
"DOCKER_LOG_CONTAINER_JAVA" = "Java容器实际名称"
"DOCKER_LOG_ERRORS_NGINX" = "Nginx容器最近1小时错误数"
"DOCKER_LOG_CONTAINER_NGINX" = "Nginx容器实际名称"
"DOCKER_LOG_ERRORS_NACOS" = "Nacos容器最近1小时错误数"
"DOCKER_LOG_CONTAINER_NACOS" = "Nacos容器实际名称"
"DOCKER_LOG_ERRORS_EMQX" = "EMQX容器最近1小时错误数"
"DOCKER_LOG_CONTAINER_EMQX" = "EMQX容器实际名称"
# Time synchronization checks
"NTP_SERVICE_STATUS" = "NTP服务状态"
"SYSTEM_CLOCK_SYNC" = "系统时钟同步"
"SYSTEM_CLOCK_SYNC_LEVEL" = "系统时钟同步等级"
"NTP_SERVICE_NAME" = "NTP服务名称"
"NTP_DAEMON" = "NTP守护进程"
"NTP_SOURCES" = "NTP同步源"
"NTP_SOURCES_COUNT" = "NTP同步源数量"
"NTP_CURRENT_SOURCE" = "NTP当前同步源"
"NTP_CONFIG_SOURCES" = "NTP配置同步源"
"NTP_OFFSET_MS" = "NTP时钟偏差(毫秒)"
"NTP_OFFSET_SEC" = "NTP时钟偏差(秒)"
"NTP_OFFSET_LEVEL" = "NTP时钟偏差等级"
"SYSTEM_DATETIME" = "系统时间"
"SYSTEM_TIMESTAMP" = "系统时间戳"
"SYSTEM_TIMEZONE" = "系统时区"
"HTTPS_CERT_INFO" = "HTTPS证书信息"
"HTTPS_CERT_MIN_DAYS" = "HTTPS证书最小剩余天数"
"HTTPS_CERT_LEVEL" = "HTTPS证书等级"
"HTTPS_CERT_STATUS" = "HTTPS证书状态"
"EMQX_CERT_INFO" = "EMQX证书信息"
"EMQX_CERT_MIN_DAYS" = "EMQX证书最小剩余天数"
"EMQX_CERT_LEVEL" = "EMQX证书等级"
"EMQX_CERT_STATUS" = "EMQX证书状态"
"MYSQL_CERT_MIN_DAYS" = "MySQL证书最小剩余天数"
"MYSQL_CERT_LEVEL" = "MySQL证书等级"
"SYSTEM_UPTIME_DAYS" = "系统运行天数"
"HWCLOCK_STATUS" = "硬件时钟状态"
# Newly added system baseline checks
"SCHEDULER_PROCS_RUNNING" = "调度器可运行进程数"
"SCHEDULER_BLOCKED_STATUS" = "调度器阻塞状态"
"PROCESS_EXE_PATHS_TOP10" = "进程可执行路径TOP10"
"PROCESS_ORPHAN_COUNT" = "孤儿进程数"
"PROCESS_ORPHAN_STATUS" = "孤儿进程状态"
"JOURNAL_ERROR_COUNT" = "系统错误数量"
"JOURNAL_ERROR_LEVEL" = "系统错误等级"
"RECENT_SYSTEM_ERRORS" = "最近系统错误"
"OOM_LOG_DETAILS" = "OOM日志详情"
}
if ($displayNames.ContainsKey($Key)) {
return $displayNames[$Key]
}
return $Key
}
# ==================== Get threshold ====================
function Get-Threshold {
param(
[string]$Key
)
$thresholds = @{
# ==================== System basics thresholds ====================
"CPU使用率" = ">85%"
"CPU_USAGE" = ">85%"
"内存使用率" = ">85%"
"MEMORY_USAGE" = ">85%"
"SWAP_USAGE" = ">20%"
"SWAP使用率" = ">20%"
"SWAP_USED" = ">20%"
"1分钟负载" = ">8"
"LOAD_1MIN" = ">8"
"5分钟负载" = ">8"
"LOAD_5MIN" = ">8"
"15分钟负载" = ">8"
"LOAD_15MIN" = ">8"
"THREAD_COUNT" = ">1000"
"线程总数" = ">1000"
"FD_USAGE" = ">80%"
"文件描述符使用率" = ">80%"
"ZOMBIE_COUNT" = ">0"
"僵尸进程数" = ">0"
"TCP_TIME_WAIT" = ">500"
"TIME_WAIT连接数" = ">500"
"TCP_CLOSE_WAIT" = ">100"
"CLOSE_WAIT连接数" = ">100"
# ==================== Docker container thresholds ====================
"DOCKER_LARGE_LOGS" = ">500MB"
"DOCKER_IMAGES_DANGLING" = ">5"
"LOG_CARDTABLE_SIZE" = ">500MB"
"LOG_PAPERLESS_SIZE" = ">500MB"
"LOG_UPYTHON_VOICE_SIZE" = ">500MB"
"LOG_UPYTHON_SIZE" = ">500MB"
"LOG_UJAVA2_SIZE" = ">500MB"
"LOG_UNGINX_SIZE" = ">500MB"
"LOG_UNGROK_SIZE" = ">500MB"
"LOG_USTORAGE_SIZE" = ">500MB"
"LOG_UTRACKER_SIZE" = ">500MB"
"LOG_UNACOS_SIZE" = ">500MB"
"LOG_UEMQX_SIZE" = ">500MB"
"LOG_UREDIS_SIZE" = ">500MB"
"LOG_UMYSQL_SIZE" = ">500MB"
# ==================== MySQL thresholds ====================
"MYSQL_CONNECTIONS_USAGE" = ">80%"
"MYSQL_CONNECTIONS_LEVEL" = ">80%"
"MySQL连接使用率" = ">80%"
"MYSQL_SLOW_QUERIES" = ">100"
"MYSQL慢查询数" = ">100"
"SLOW_QUERIES" = ">100"
"MYSQL_TABLE_USAGE" = ">90%"
"MYSQL表缓存使用率" = ">90%"
"MYSQL_INNODB_HIT_RATE" = "<95%"
"MYSQL_INNODB命中率" = "<95%"
"MYSQL_CONNECTIONS_CURRENT" = ">400"
"MySQL当前连接数" = ">400"
# ==================== Redis thresholds ====================
"REDIS_KEYS" = ">1000000"
"REDIS_KEYS_LEVEL" = ">1000000"
"REDIS键数量" = ">1000000"
"KEY_COUNT" = ">1000000"
"REDIS_MEM_FRAGMENTATION" = ">5"
"REDIS_MEM_FRAG_LEVEL" = ">5"
"REDIS内存碎片率" = ">5"
"REDIS_CLIENTS" = ">500"
"REDIS客户端数" = ">500"
"CLIENT_COUNT" = ">500"
"CACHE_HIT_RATE" = "<90%"
"Redis缓存命中率" = "<90%"
# ==================== EMQX thresholds ====================
"EMQX_CLIENTS" = ">1000"
"EMQX客户端数" = ">1000"
"EMQX_CONNECTIONS" = ">1000"
"EMQX_SUBSCRIPTIONS" = ">5000"
"EMQX订阅数" = ">5000"
"EMQX_SESSIONS" = ">1000"
"EMQX_TOPICS" = ">100"
"EMQX_ROUTES" = ">1000"
"EMQX_LISTENERS" = "<1"
"EMQX_CLUSTER_NODES" = "<1"
"EMQX_SESSIONS_ACTIVE" = "<1"
"EMQX_CLIENTS_TOTAL" = ">1000"
"EMQX_CLIENTS_CONNECTED" = ">1000"
"EMQX_SUBSCRIPTIONS_TOTAL" = ">5000"
"EMQX_TOPICS_TOTAL" = ">100"
"EMQX_ROUTES_TOTAL" = ">1000"
"EMQX_LISTENERS_COUNT" = "<1"
"EMQX_SESSIONS_TOTAL" = ">1000"
"EMQX_DASHBOARD_LEVEL" = "严重"
"EMQX_ALARMS_LEVEL" = "警告"
# ==================== Java application thresholds ====================
"JAVA_THREADS" = ">500"
"JAVA_THREAD_COUNT" = ">500"
"Java线程数" = ">500"
"JAVA_LOG_ERRORS" = ">10"
"JAVA_HEAP_PERCENT" = ">80%"
"JAVA_HEAP_LEVEL" = "警告"
"JAVA_FULL_GC" = ">2"
"JAVA_GC_LEVEL" = "警告"
"JAVA_THREAD_LEVEL" = "警告"
"JAVA_LOG_LEVEL" = "警告"
"SPRINGBOOT_HEALTH_LEVEL" = "严重"
# ==================== Comprehensive diagnostics thresholds ====================
"DIAG_SWAP_LEVEL" = ">20%"
"DIAG_ZOMBIE_LEVEL" = ">0"
"DIAG_TIMEWAIT_LEVEL" = ">500"
"DIAG_LOAD_1MIN" = ">8"
"DIAG_MEMORY_USAGE" = ">85%"
"DIAG_ZOMBIE" = ">0"
"DIAG_TIMEWAIT" = ">500"
"DIAG_MYSQL_SLOW" = ">100"
"DIAG_MYSQL_SLOW_LEVEL" = ">100"
# ==================== System log thresholds ====================
"APP_LOG_ERRORS_24H" = ">50"
"APP_LOG_LEVEL" = ">50"
"AUTH_FAILURES_24H" = ">100"
"KERNEL_ERRORS_24H" = ">10"
"DISK_ERRORS_24H" = ">5"
"OOM_LOGS_7D" = ">1"
"OOM_COUNT" = ">1"
"LARGE_LOG_FILES" = ">0"
"CRON_ERRORS_24H" = ">20"
# ==================== Python application thresholds ====================
"PYTHON_PROCESSES" = ">100"
"PYTHON_MEMORY" = ">80%"
# ==================== Nacos application thresholds ====================
"NACOS_MEMORY" = ">80%"
# ==================== Additional Redis thresholds ====================
"REDIS_MEMORY_USED" = ">1GB"
"REDIS_BLOCKED_CLIENTS" = ">10"
# ==================== Additional Docker thresholds ====================
"DOCKER_IMAGES_COUNT" = ">20"
"DOCKER_VOLUMES_COUNT" = ">10"
"DOCKER_NETWORK_COUNT" = ">5"
# ==================== Additional system basics thresholds ====================
"PROCESS_COUNT" = ">500"
"TCP_ESTABLISHED" = ">500"
"KERNEL_ERRORS" = ">5"
"CORE_DUMP_COUNT" = ">1"
"AUTH_FAIL_COUNT" = ">50"
"NET_ERRORS" = ">0"
"LOAD_RATIO" = ">1"
"BOOT_FAILED_SERVICES" = ">0"
"LOGGED_USERS" = ">5"
"PASS_MAX_DAYS" = ">90"
# ==================== Port connection count thresholds ====================
"PORT_MySQL_CONNECTIONS" = ">100"
"PORT_Redis_CONNECTIONS" = ">200"
"PORT_EMQX_MQTT_CONNECTIONS" = ">100"
"PORT_HTTP_CONNECTIONS" = ">50"
"PORT_HTTPS_CONNECTIONS" = ">50"
# ==================== Additional MySQL thresholds ====================
"CONN_ERRORS" = ">100"
"DATABASE_SIZE" = ">10GB"
"QPS" = ">1000"
"TPS" = ">100"
"DEADLOCKS" = ">0"
# ==================== Additional EMQX thresholds ====================
"EMQX_CLIENTS_DISCONNECTED" = ">50"
"EMQX_MAX_CONNECTIONS" = "<1000"
"EMQX_CONN_USAGE" = ">80%"
"EMQX_CONN_LEVEL" = ">80%"
"EMQX_MEMORY_USAGE" = ">80%"
"EMQX_SESSIONS_INACTIVE" = ">100"
"EMQX_PLUGINS_COUNT" = "<1"
"EMQX_RULES_COUNT" = "未启用"
"EMQX_ALARMS_COUNT" = ">0"
"EMQX_MESSAGES_SENT" = ">1000000"
"EMQX_MESSAGES_RECEIVED" = ">1000000"
"EMQX_CLUSTER_MODE" = "否"
"EMQX_LISTENER_PORTS" = "<1"
# ==================== Additional Redis thresholds (extended) ====================
"REJECTED_CONNECTIONS" = ">50"
"SLOW_LOG_TOP10_Count" = ">5"
"SLOW_LOG_TOP10_Slowest" = ">100000"
"CLIENT_DETAIL_IdleOver5min" = ">10"
"CLIENT_DETAIL_Blocking" = ">0"
"PERSISTENCE_STATUS" = "备份中"
"CLUSTER_STATUS" = "!OK"
"KEY_TYPE_DISTRIBUTION" = "N/A"
# ==================== Additional MySQL thresholds (extended) ====================
"ACTIVE_QUERIES" = ">50"
"FRAGMENTED_TABLES" = ">5"
"TRX_ACTIVE" = ">10"
"LOCK_WAITS" = ">0"
"THREADS_POOL" = ">50"
"TEMP_TABLE_RATE" = ">30%"
"ACTIVE_PROCESSLIST_LongRunning" = ">5"
"TABLES_WITHOUT_INDEX" = ">0"
"SLOW_QUERY_TOP1_TotalTime" = ">10"
"LOCK_DETAIL_Waits" = ">0"
"FRAGMENTED_DETAIL_Count" = ">5"
"BUFFER_POOL_HIT_RATE" = "<95%"
# ==================== Logging system thresholds ====================
"JOURNAL_DISK_USAGE" = ">500M"
# ==================== Port check thresholds ====================
"OPEN_PORTS" = ">100"
"OPEN_PORTS_LEVEL" = ">100"
# ==================== Security check thresholds ====================
"SUSPICIOUS_SUID_COUNT" = ">5"
"FAILED_SERVICES" = ">0"
# ==================== Time synchronization thresholds ====================
"CLOCK_OFFSET" = ">1秒"
"时钟偏差" = ">1秒"
# ==================== Certificate check thresholds ====================
"SSL_CERT_DAYS_LEFT" = "<30天"
"EMQX_CERT_DAYS_LEFT" = "<30天"
# ==================== Nginx application thresholds ====================
"NGINX_ACTIVE_CONNECTIONS" = ">1000"
"NGINX_CONFIG_LEVEL" = "严重"
"NGINX_RECENT_ERRORS" = ">10"
"NGINX_STATUS_5XX" = ">50"
"NGINX_STATUS_4XX" = ">100"
"NGINX_ERROR_RATE" = ">5%"
"NGINX_SLOW_REQUESTS_COUNT" = ">100"
"NGINX_HEAP_USAGE" = ">80%"
# ==================== Nacos application thresholds ====================
"NACOS_HEALTH_RATE" = "<90%"
"NACOS_HEALTH_LEVEL" = "严重"
"NACOS_HEAP_USAGE" = ">80%"
"NACOS_RECENT_ERRORS" = ">20"
"NACOS_THREAD_COUNT" = ">500"
# ==================== FastDFS application thresholds ====================
"FASTDFS_DISK_USAGE" = ">80%"
"FASTDFS_TRACKER_CONNECTION" = "INACTIVE"
"FASTDFS_RECENT_ERRORS" = ">10"
"FASTDFS_FILE_COUNT" = ">100000"
"FASTDFS_HTTP_STATUS" = "0"
# ==================== Security and compliance check thresholds ====================
"AUTH_FAILURES_LEVEL" = ">100"
"ABNORMAL_ACCOUNTS" = ">0"
"ABNORMAL_ACCOUNTS_LEVEL" = ">0"
"MODIFIED_CONF_COUNT" = ">10"
"ABNORMAL_CRON" = "未发现异常"
"ABNORMAL_CRON_LEVEL" = "未发现异常"
"MAX_LOGIN_FAILURES" = ">20"
"BRUTE_FORCE_LEVEL" = ">20"
"EMPTY_PASSWORD_ACCOUNTS" = ">0"
"EMPTY_PASSWORD_LEVEL" = ">0"
"SSH_DEFAULT_PORT" = "是"
"SSH_DEFAULT_PORT_LEVEL" = "是"
"SSH_ROOT_LOGIN_LEVEL" = "是"
"SSH_PASSWORD_AUTH_LEVEL" = "是"
"HIGH_RISK_PORTS" = ">0"
"HIGH_RISK_PORTS_LEVEL" = ">0"
# ==================== System log check thresholds ====================
"KERNEL_ERRORS_LEVEL" = ">10"
"DISK_ERRORS_LEVEL" = ">5"
"KERNEL_PANIC_COUNT" = ">0"
"KERNEL_OOPS_COUNT" = ">5"
"KERNEL_STABILITY_LEVEL" = ">0"
"SERVICE_CRASH_COUNT" = ">5"
"SERVICE_STABILITY_LEVEL" = ">5"
"SYSTEMD_FAILED_COUNT" = ">0"
"SYSTEMD_FAILED_LEVEL" = ">0"
"OOM_KILLER_COUNT" = ">1"
"CORE_DUMP_FILES" = ">0"
"OOM_LEVEL" = ">0"
"RESOURCE_EXHAUSTION_EVENTS" = ">0"
"RESOURCE_EXHAUSTION_LEVEL" = ">0"
"HARDWARE_ERRORS" = ">0"
"HARDWARE_ERRORS_LEVEL" = ">0"
"LARGE_LOG_FILES_LEVEL" = ">0"
"NETWORK_ERRORS" = ">0"
"NETWORK_ERRORS_LEVEL" = ">0"
"APP_LOG_ERRORS_LAST_HOUR" = ">20"
"APP_LOG_ERROR_LEVEL" = "警告"
"DOCKER_LOG_ERRORS_JAVA" = ">50"
"DOCKER_LOG_ERRORS_NGINX" = ">50"
"DOCKER_LOG_ERRORS_NACOS" = ">20"
"DOCKER_LOG_ERRORS_EMQX" = ">20"
# ==================== Time Sync and Certificate Check Thresholds ====================
"SYSTEM_CLOCK_SYNC" = "未同步"
"SYSTEM_CLOCK_SYNC_LEVEL" = "未同步"
"NTP_OFFSET_LEVEL" = ">1秒"
"HTTPS_CERT_MIN_DAYS" = "<30天"
"HTTPS_CERT_LEVEL" = "<30天"
"EMQX_CERT_MIN_DAYS" = "<30天"
"EMQX_CERT_LEVEL" = "<30天"
"MYSQL_CERT_MIN_DAYS" = "<30天"
"MYSQL_CERT_LEVEL" = "<30天"
# ==================== Additional System Basics Thresholds ====================
"TCP_TIME_WAIT_STATUS" = "严重"
"TCP_CLOSE_WAIT_STATUS" = "严重"
"NET_INTERFACE_STATUS" = "严重"
"NET_ERRORS_STATUS" = "警告"
"DNS_RESOLUTION" = "警告"
"NET_GATEWAY_STATUS" = "警告"
"GATEWAY_PING_STATUS" = "异常"
"FD_STATUS" = "严重"
"LOAD_STATUS" = "严重"
"ZOMBIE_STATUS" = "严重"
"THREAD_STATUS" = "严重"
"UNINTERRUPTIBLE_STATUS" = "严重"
"DISK_SMART_STATUS" = "严重"
"RAID_STATUS" = "degraded"
"MEMORY_PRESSURE_AVG10" = ">1"
"VM_PGMajFAULT" = ">100"
# ==================== Newly Added System Basics Check Thresholds ====================
"SCHEDULER_PROCS_RUNNING" = ">100"
"SCHEDULER_BLOCKED_STATUS" = "严重"
"PROCESS_ORPHAN_COUNT" = ">10"
"PROCESS_ORPHAN_STATUS" = "警告"
"JOURNAL_ERROR_COUNT" = ">50"
"JOURNAL_ERROR_LEVEL" = ">50"
}
if ($thresholds.ContainsKey($Key)) {
return $thresholds[$Key]
}
return "-"
}
# ==================== Get Status by Value ====================
function Get-StatusByValue {
param(
[string]$Key,
[string]$Value
)
if ($Value -match "^(正常|警告|严重|ERROR|OK|运行中|异常|未运行|已停止)$") {
switch ($Value) {
"正常" { return "正常" }
"警告" { return "警告" }
"严重" { return "严重" }
"OK" { return "正常" }
"ERROR" { return "严重" }
"运行中" { return "正常" }
"异常" { return "严重" }
"未运行" { return "严重" }
"已停止" { return "警告" }
default { return "正常" }
}
}
if ($Value -match "([\d\.]+)%") {
$percent = [double]$matches[1]
switch ($Key) {
"CPU_USAGE" {
if ($percent -ge 100) { return "严重" }
if ($percent -ge 85) { return "警告" }
return "正常"
}
"MEMORY_USAGE" {
if ($percent -ge 95) { return "严重" }
if ($percent -ge 85) { return "警告" }
return "正常"
}
"DISK_USAGE" {
if ($percent -ge 95) { return "严重" }
if ($percent -ge 90) { return "警告" }
return "正常"
}
}
}
return "正常"
}
# ==================== Add Issue ====================
function Add-Issue {
param(
[string]$Message,
[string]$Level = "警告"
)
if ([string]::IsNullOrWhiteSpace($Message)) {
return
}
if ($Level -eq "严重") {
$script:CriticalIssues.Add($Message)
}
else {
$script:WarningIssues.Add($Message)
}
}
# ==================== Save Test Result ====================
function Save-TestResult {
param(
[string]$Category,
[PSCustomObject]$Result
)
try {
if (-not $script:TestResults.ContainsKey($Category)) {
$script:TestResults[$Category] = @()
}
$script:TestResults[$Category] += $Result
}
catch {
Write-Log "保存检测结果失败: $($_.Exception.Message)" "WARN"
}
}
# ==================== Run All Checks ====================
function Invoke-AllChecks {
Write-Log ""
Write-Log "========================================" "INFO"
Write-Log "开始执行检测模块" "INFO"
Write-Log "========================================" "INFO"
# System module list
$systemModules = @(
"01_system_basic.sh", "02_cpu_check.sh", "03_memory_check.sh",
"04_disk_check.sh", "05_oom_check.sh", "06_process_check.sh",
"07_network_check.sh", "43_security_compliance.sh", "44_system_logs.sh",
"45_time_sync.sh", "11_scheduled_tasks.sh", "12_port_check.sh"
)
# Comprehensive diagnosis modules (run after all other modules)
$comprehensiveModules = @(
"40_comprehensive_diagnosis.sh"
)
# Service module list
$serviceModules = @(
"20_docker_basic.sh", "21_docker_deep.sh", "22_mysql_basic.sh",
"23_mysql_depth.sh", "24_redis_basic.sh", "25_redis_depth.sh",
"26_emqx_basic.sh", "27_emqx_deep.sh", "28_java_check.sh",
"29_python_check.sh", "30_nginx_check.sh", "34_nginx_deep.sh",
"31_nacos_check.sh", "35_nacos_deep.sh", "32_fastdfs_check.sh",
"36_fastdfs_deep.sh", "33_app_logs.sh"
)
$totalModules = $systemModules.Count + $serviceModules.Count + $comprehensiveModules.Count
$currentModule = 0
# Run the system modules
Write-Log ""
Write-Log "--- 系统模块检测 ---" "INFO"
foreach ($module in $systemModules) {
$currentModule++
Write-Host ""
Write-Host "[$currentModule/$totalModules] " -NoNewline
Write-Host "执行: $module" -ForegroundColor Cyan
$results = Invoke-ModuleCheck -ModuleName $module -Category "system"
$category = Get-ModuleCategory -ModuleName $module
foreach ($result in $results) {
Save-TestResult -Category $category -Result $result
}
}
# Run the service modules
Write-Log ""
Write-Log "--- 服务模块检测 ---" "INFO"
foreach ($module in $serviceModules) {
$currentModule++
Write-Host ""
Write-Host "[$currentModule/$totalModules] " -NoNewline
Write-Host "执行: $module" -ForegroundColor Cyan
$results = Invoke-ModuleCheck -ModuleName $module -Category "service"
$category = Get-ModuleCategory -ModuleName $module
foreach ($result in $results) {
Save-TestResult -Category $category -Result $result
}
}
Write-Log ""
Write-Log "--- 生成综合诊断数据文件 ---" "INFO"
# Build the current-data file that the comprehensive diagnosis module needs
$dataFilePath = "$modulePath/current_data.txt"
$dataContent = ""
foreach ($category in $script:TestResults.Keys) {
$items = $script:TestResults[$category]
foreach ($item in $items) {
$dataContent += "$($item.Name):$($item.Value)`n"
}
}
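# Upload the data file to the remote server with pscp
$tempFile = [System.IO.Path]::GetTempFileName()
# Write UTF-8 without a BOM; Out-File -Encoding UTF8 adds a BOM on Windows PowerShell 5.1, which would prefix the first key
[System.IO.File]::WriteAllText($tempFile, $dataContent, (New-Object System.Text.UTF8Encoding($false)))
$pscpPath = Join-Path $scriptPath "pscp.exe"
& $pscpPath -P $script:Port -l $script:Username -pw $script:Password $tempFile "${script:Username}@${script:HostName}:$modulePath/current_data.txt" 2>&1 | Out-Null
$uploadExitCode = $LASTEXITCODE
Remove-Item $tempFile -Force
if ($uploadExitCode -eq 0) {
Write-Log "数据文件已生成: $dataFilePath"
}
else {
Write-Log "数据文件上传失败 (pscp 退出码: $uploadExitCode)" "WARN"
}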
# Run the comprehensive diagnosis modules
Write-Log ""
Write-Log "--- 综合诊断检测 ---" "INFO"
foreach ($module in $comprehensiveModules) {
$currentModule++
Write-Host ""
Write-Host "[$currentModule/$totalModules] " -NoNewline
Write-Host "执行: $module" -ForegroundColor Cyan
$results = Invoke-ModuleCheck -ModuleName $module -Category "system"
$category = Get-ModuleCategory -ModuleName $module
foreach ($result in $results) {
Save-TestResult -Category $category -Result $result
}
}
Write-Log ""
Write-Log "========================================" "INFO"
Write-Log "所有检测模块执行完成!" "INFO"
Write-Log "========================================" "INFO"
}
# ==================== Generate Report ====================
function New-MarkdownReport {
$reportLines = @()
# Collect basic system information
$basicInfo = @{}
if ($script:TestResults.ContainsKey("系统基础信息")) {
foreach ($item in $script:TestResults["系统基础信息"]) {
$basicInfo[$item.Key] = $item.Value
}
}
# Report header
$reportLines += "# 服务器健康巡检报告"
$reportLines += ""
$reportLines += "**时间:** $(Get-Date -Format 'yyyy-MM-dd HH:mm:ss')"
$hostLine = "**主机:** $script:HostName"
if ($basicInfo["HOSTNAME"]) {
$hostLine += " ($($basicInfo["HOSTNAME"]))"
}
$reportLines += $hostLine
if ($basicInfo["OS_VERSION"]) {
$reportLines += "**操作系统:** $($basicInfo["OS_VERSION"])"
}
if ($basicInfo["KERNEL_VERSION"]) {
$reportLines += "**内核:** $($basicInfo["KERNEL_VERSION"])"
}
if ($basicInfo["BOOT_TIME"]) {
$reportLines += "**启动时间:** $($basicInfo["BOOT_TIME"])"
}
if ($basicInfo["UPTIME_DAYS"]) {
$reportLines += "**运行时间:** $($basicInfo["UPTIME_DAYS"]) days"
}
# Overall status
if ($script:CriticalIssues.Count -gt 0) {
$reportLines += "**状态:** 🔴 严重"
}
elseif ($script:WarningIssues.Count -gt 0) {
$reportLines += "**状态:** 🟡 警告"
}
else {
$reportLines += "**状态:** 🟢 正常"
}
$reportLines += ""
$reportLines += "---"
$reportLines += ""
# Core issue diagnosis
$reportLines += "## 核心问题诊断"
if ($script:CriticalIssues.Count -gt 0) {
$reportLines += "### 严重问题 ($($script:CriticalIssues.Count)个)"
foreach ($issue in $script:CriticalIssues) {
$reportLines += "+ 🔴 $issue"
}
$reportLines += ""
$reportLines += ""
}
if ($script:WarningIssues.Count -gt 0) {
$reportLines += "### 警告问题 ($($script:WarningIssues.Count)个)"
foreach ($issue in $script:WarningIssues) {
$reportLines += "+ 🟡 $issue"
}
$reportLines += ""
$reportLines += ""
}
$reportLines += "**诊断摘要:** 关键问题: $($script:CriticalIssues.Count), 警告: $($script:WarningIssues.Count)"
$reportLines += ""
$reportLines += "---"
$reportLines += ""
# Basic system information table
if ($script:TestResults.ContainsKey("系统基础信息")) {
$reportLines += "## 系统基础信息"
$reportLines += "| 项目 | 值 |"
$reportLines += "| :--- | :--- |"
$basicItems = @("HOSTNAME", "OS_VERSION", "KERNEL_VERSION", "BOOT_TIME", "UPTIME_DAYS", "CPU_CORES", "MEMORY_TOTAL")
$basicDisplayNames = @{
"HOSTNAME" = "主机名"
"OS_VERSION" = "操作系统"
"KERNEL_VERSION" = "内核版本"
"BOOT_TIME" = "启动时间"
"UPTIME_DAYS" = "运行时间"
"CPU_CORES" = "CPU核心数"
"MEMORY_TOTAL" = "总内存"
}
foreach ($key in $basicItems) {
if ($basicInfo.ContainsKey($key)) {
$name = $basicDisplayNames[$key]
$value = $basicInfo[$key]
$reportLines += "| $name | $value |"
}
}
# Add the system load row
if ($basicInfo.ContainsKey("LOAD_1MIN")) {
$reportLines += "| 系统负载 | $($basicInfo["LOAD_1MIN"]) $($basicInfo["LOAD_5MIN"]) $($basicInfo["LOAD_15MIN"]) |"
}
$reportLines += ""
$reportLines += ""
}
# Per-module check results
$moduleNumber = 0
foreach ($category in $script:TestResults.Keys) {
if ($category -eq "系统基础信息") { continue }
$items = $script:TestResults[$category]
if ($items.Count -eq 0) { continue }
$moduleNumber++
$reportLines += "---"
$reportLines += ""
$reportLines += "## 检测模块 ${moduleNumber}: $category"
$reportLines += ""
# Filter the items to display
$displayedItems = @()
$seenItems = @{} # used for de-duplication
foreach ($item in $items) {
# Skip error output and special items
if ($item.Name -match "^awk$" -or
$item.Name -match "cmd\.line" -or
$item.Name -match "unexpected" -or
$item.Value -match "^awk\|" -or
$item.Value -match "unexpected newline") {
continue
}
# Skip container details and internal variables
if ($item.Name -match "^(CONTAINER_|LIMIT_|PROCESSES_)" -and
$item.Name -notmatch "_(STATUS|LEVEL)$" -and
$item.Name -notmatch "^(JAVA_CONTAINER|EMQX_CONTAINER|MYSQL_CONTAINER|REDIS_CONTAINER|NGINX_CONTAINER|NACOS_CONTAINER|PYTHON_CONTAINER)$") {
continue
}
# Skip Docker detail items (keep status)
if ($item.Name -match "^DOCKER_" -and
$item.Name -notmatch "_(STATUS|LEVEL)$") {
continue
}
# Skip all log items (except container status)
if ($item.Name -match "^LOG_") {
continue
}
# Skip per-port connection details (keep status)
if ($item.Name -match "^PORT_.*_CONNECTIONS$" -and $item.Name -notmatch "_STATUS$") {
continue
}
# Skip long list items (but keep the important process check items)
if ($item.Name -match "(TOP5|TOP10|TOP20|_LIST|_DETAIL|_DISTRIBUTION|_STATS|_CONFIG|_INFO|TOPICS|TOP1)$") {
# Exceptions: process checks, system errors, application logs, service statistics
if ($item.Name -notmatch "^(PROCESS_EXE_PATHS|RECENT_SYSTEM_ERRORS|PROCESS_ORPHAN|APP_LOG_ERRORS_LAST_HOUR|APP_LOG_HOURLY_STATS|APP_LOG_ERROR_LEVEL|APP_LOG_ERROR_TYPES|DOCKER_LOG_ERRORS|DOCKER_LOG_CONTAINER|NGINX_ACCESS_LOG_STATS|NGINX_STATUS_2XX|NGINX_STATUS_3XX|NGINX_STATUS_4XX|NGINX_STATUS_5XX|NGINX_ERROR_RATE|NGINX_SLOWEST_REQUESTS|NGINX_SLOW_REQUESTS_COUNT|SPRINGBOOT_ACTUATOR|SPRINGBOOT_HEALTH_STATUS|SPRINGBOOT_HEALTH_LEVEL|JAVA_CONFIG_FILE|JAVA_SERVER_PORT|JAVA_DATABASE_URL|JAVA_CONFIG_COUNT|EMQX_DASHBOARD_STATUS|EMQX_DASHBOARD_LEVEL|EMQX_DASHBOARD_URL|EMQX_DASHBOARD_VERSION)") {
continue
}
}
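# Skip configuration-style long list items (important status and statistics are kept)
# Single quotes keep "_LIST$" as an end-of-name anchor; the previous "_LIST\$" matched a literal dollar sign
if ($item.Name -match '(LONG_QUERIES|DATABASE_LIST|UBAINS_TABLES|INNODB_BP|_LIST$|TOP20|TOP10|TOP5)') {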
continue
}
# Skip redundant details (valuable metrics are kept)
if ($item.Name -match "^(MEMORY_INFO|CONFIG_CHECK|KEYSPACE_DETAIL|KEY_TYPE_DISTRIBUTION|COMMAND_STATS|ACTIVE_PROCESSLIST|REPLICATION_DETAIL|INNODB_TRX)$") {
continue
}
# Skip purely statistical metrics (keep those with monitored thresholds)
if ($item.Name -match "^(REDIS_UPTIME_DAYS|REDIS_MEMORY_MAX|MYSQL_UPTIME_DAYS|MYSQL_CONNECTIONS_MAX|MYSQL_OPEN_TABLES|MYSQL_TABLE_CACHE|MYSQL_CHARSET|MYSQL_COLLATION)$") {
continue
}
# De-duplicate: skip items that were already shown (by Name)
if ($seenItems.ContainsKey($item.Name)) {
continue
}
$seenItems[$item.Name] = $true
# Skip duplicate version entries (prefer the Chinese or friendlier name)
if ($item.Name -match "^(REDIS_|MYSQL_|EMQX_|JAVA_|PYTHON_|NGINX_|NACOS_)VERSION$" -and
$seenItems.ContainsKey("版本")) {
continue
}
if ($item.Name -match "版本$") {
$seenItems["VERSION"] = $true
}
# Skip duplicate metrics (keep the English keys, drop the Chinese keys)
if ($item.Name -match "^(运行天数|Redis键数量|Redis客户端数|MySQL连接数信息|MySQL慢查询数)$") {
continue
}
# Skip Redis-related items in the MySQL module
if ($category -eq "MySQL数据库" -and $item.Name -match "Redis") {
continue
}
$displayedItems += $item
}
# Emit the filtered items
foreach ($item in $displayedItems) {
$reportLines += "+ **$($item.Name)**: $($item.Value) | 阈值: $($item.Threshold) | 状态: $(Get-StatusText -Status $item.Status) | 说明: -"
}
$reportLines += ""
$reportLines += ""
}
# Report footer
$reportLines += "---"
$reportLines += ""
$reportLines += "_报告生成时间: $(Get-Date -Format 'yyyy-MM-dd HH:mm:ss')_"
$reportLines += "_服务器健康监测脚本 v4.0 (模块化架构)_"
$reportLines += "_检测模块数量: $($moduleNumber.ToString())_"
return $reportLines -join "`n"
}
# ==================== Get Status Text ====================
function Get-StatusText {
param(
[string]$Status
)
switch ($Status) {
"正常" { return "🟢 正常" }
"警告" { return "🟡 警告" }
"严重" { return "🔴 严重" }
default { return "⚪ 正常" }
}
}
# ==================== Get Status Icon ====================
function Get-StatusIcon {
param(
[string]$Status
)
switch ($Status) {
"正常" { return "🟢" }
"警告" { return "🟡" }
"严重" { return "🔴" }
default { return "⚪" }
}
}
# ==================== Save Report ====================
function Save-Report {
param(
[string]$Content
)
$reportDir = Join-Path $scriptPath "reports"
if (-not (Test-Path $reportDir)) {
New-Item -ItemType Directory -Path $reportDir -Force | Out-Null
}
$fileName = "health_report_${script:HostName}_${timestamp}.md"
$filePath = Join-Path $reportDir $fileName
$Content | Out-File -FilePath $filePath -Encoding UTF8 -Force
Write-Log "报告已保存: $filePath" "INFO"
return $filePath
}
# ==================== Main ====================
function Main {
Invoke-InteractiveInput
if (-not (Test-SSHConnection)) {
Write-Log "无法连接到服务器,退出执行!" "ERROR"
return
}
if (-not (Publish-Modules)) {
Write-Log "模块上传失败,退出执行!" "ERROR"
return
}
Invoke-AllChecks
Write-Log ""
Write-Log "生成检测报告..."
$reportContent = New-MarkdownReport
$reportPath = Save-Report -Content $reportContent
Write-Host ""
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "检测完成!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "严重问题: $($script:CriticalIssues.Count)" -ForegroundColor $(if ($script:CriticalIssues.Count -gt 0) { "Red" } else { "Green" })
Write-Host "警告问题: $($script:WarningIssues.Count)" -ForegroundColor $(if ($script:WarningIssues.Count -gt 0) { "Yellow" } else { "Green" })
Write-Host "报告路径: $reportPath" -ForegroundColor White
Write-Host "========================================" -ForegroundColor Cyan
}
# ==================== Entry Point ====================
Main
################################################################################
# Server Health Monitoring Script v3.0 (modular architecture)
# Purpose: Connect to a remote server over SSH, run modular system health checks, and generate a Markdown report
# Author: Claude Code
# Date: 2026-05-09
################################################################################
param(
[Parameter(Mandatory=$false)]
[string]$HostName = "",
[Parameter(Mandatory=$false)]
[int]$Port = 0,
[Parameter(Mandatory=$false)]
[string]$Username = "",
[Parameter(Mandatory=$false)]
[string]$Password = ""
)
# Set encoding
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$OutputEncoding = [System.Text.Encoding]::UTF8
# ==================== Global Variables ====================
$ErrorActionPreference = "Continue"
$scriptPath = Split-Path -Parent $MyInvocation.MyCommand.Path
$libPath = Join-Path $scriptPath "lib"
$modulePath = "/tmp/check_modules"
$timestamp = Get-Date -Format "yyyy-MM-dd_HH-mm-ss"
# Test results collection
$script:检测结果 = @{
系统基础信息 = @{}
CPU资源 = @()
内存资源 = @()
磁盘资源 = @()
OOM检测 = @()
进程状态 = @()
网络连接 = @()
安全合规 = @()
系统日志 = @()
时间同步 = @()
端口服务 = @()
Docker容器 = @()
MySQL数据库 = @()
Redis缓存 = @()
EMQX消息队列 = @()
Java应用 = @()
Python应用 = @()
Nginx应用 = @()
Nacos应用 = @()
FastDFS应用 = @()
应用日志 = @()
}
# Issue lists
$script:严重问题 = New-Object System.Collections.Generic.List[string]
$script:警告问题 = New-Object System.Collections.Generic.List[string]
# System info
$script:systemInfo = @{}
# ==================== Module Configuration ====================
$ModuleConfig = @{
# System modules (01-12)
SystemModules = @(
"01_system_basic.sh",
"02_cpu_check.sh",
"03_memory_check.sh",
"04_disk_check.sh",
"05_oom_check.sh",
"06_process_check.sh",
"07_network_check.sh",
"08_security_check.sh",
"09_system_logs.sh",
"10_time_sync.sh",
"11_scheduled_tasks.sh",
"12_port_check.sh"
)
# Service modules (20-33)
ServiceModules = @(
"20_docker_basic.sh",
"21_docker_deep.sh",
"22_mysql_basic.sh",
"23_mysql_depth.sh",
"24_redis_basic.sh",
"25_redis_depth.sh",
"26_emqx_basic.sh",
"27_emqx_deep.sh",
"28_java_check.sh",
"29_python_check.sh",
"30_nginx_check.sh",
"31_nacos_check.sh",
"32_fastdfs_check.sh",
"33_app_logs.sh"
)
# Remote module path
RemoteModulePath = $modulePath
# Local module path
LocalModulePath = Join-Path $scriptPath "lib"
}
# ==================== Logging ====================
function Write-Log {
param(
[string]$Message,
[string]$Level = "INFO"
)
$logTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
$logMessage = "[$logTime] [$Level] $Message"
switch ($Level) {
"ERROR" { Write-Host $logMessage -ForegroundColor Red }
"WARN" { Write-Host $logMessage -ForegroundColor Yellow }
"DEBUG" { Write-Host $logMessage -ForegroundColor DarkGray }
default { Write-Host $logMessage -ForegroundColor White }
}
}
# ==================== Interactive Input ====================
function Invoke-InteractiveInput {
Write-Host "`n========================================" -ForegroundColor Cyan
Write-Host " 服务器健康监测脚本 v3.0" -ForegroundColor Cyan
Write-Host " (模块化架构)" -ForegroundColor Cyan
Write-Host "========================================" -ForegroundColor Cyan
Write-Host ""
# Prompt for the host address
if ([string]::IsNullOrEmpty($script:HostName)) {
$script:HostName = Read-Host "请输入目标主机地址"
while ([string]::IsNullOrEmpty($script:HostName)) {
Write-Host "主机地址不能为空!" -ForegroundColor Red
$script:HostName = Read-Host "请输入目标主机地址"
}
}
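# Prompt for the SSH port
if ($script:Port -eq 0) {
$portInput = Read-Host "请输入SSH端口 (默认: 22)"
# Re-prompt until the input is empty (use the default) or a number in 1-65535
while (-not [string]::IsNullOrEmpty($portInput) -and
-not ($portInput -match '^\d{1,5}$' -and [int]$portInput -ge 1 -and [int]$portInput -le 65535)) {
Write-Host "端口必须是 1-65535 之间的数字!" -ForegroundColor Red
$portInput = Read-Host "请输入SSH端口 (默认: 22)"
}
if ([string]::IsNullOrEmpty($portInput)) {
$script:Port = 22
}
else {
$script:Port = [int]$portInput
}
}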
# Prompt for the username
if ([string]::IsNullOrEmpty($script:Username)) {
$script:Username = Read-Host "请输入SSH用户名 (默认: root)"
if ([string]::IsNullOrEmpty($script:Username)) {
$script:Username = "root"
}
}
# Prompt for the password
if ([string]::IsNullOrEmpty($script:Password)) {
do {
$securePassword = Read-Host "请输入SSH密码" -AsSecureString
$bstr = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($securePassword)
try {
# PtrToStringBSTR reads the whole BSTR; PtrToStringAuto can truncate it on non-Windows hosts
$script:Password = [System.Runtime.InteropServices.Marshal]::PtrToStringBSTR($bstr)
}
finally {
# Zero and free the unmanaged copy of the password
[System.Runtime.InteropServices.Marshal]::ZeroFreeBSTR($bstr)
}
if ([string]::IsNullOrEmpty($script:Password)) {
Write-Host "密码不能为空!" -ForegroundColor Red
}
} while ([string]::IsNullOrEmpty($script:Password))
}
# Confirm the connection details
Write-Host "`n========================================" -ForegroundColor Cyan
Write-Host "连接信息确认:" -ForegroundColor Yellow
Write-Host " 主机地址: $script:HostName" -ForegroundColor White
Write-Host " SSH端口: $script:Port" -ForegroundColor White
Write-Host " 用户名: $script:Username" -ForegroundColor White
Write-Host "========================================" -ForegroundColor Cyan
$confirm = Read-Host "`n确认以上信息是否正确?(Y/N)"
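# Accept only "y" or "yes" (case-insensitive); the previous pattern matched any input containing a "y"
if ($confirm -notmatch '^\s*y(es)?\s*$') {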
Write-Host "已取消执行。" -ForegroundColor Yellow
exit 0
}
Write-Host ""
}
# ==================== SSH Functions ====================
function Invoke-SSHCommand {
param(
[string]$Command,
[int]$Timeout = 30
)
try {
$plinkPath = Join-Path $scriptPath "plink.exe"
# Check that plink exists
if (-not (Test-Path $plinkPath)) {
Write-Log "plink.exe未找到: $plinkPath" "ERROR"
return $null
}
# Pass the arguments as an array
$plinkArgs = @(
"-ssh",
"-P", $script:Port,
"-l", $script:Username,
"-pw", $script:Password,
"-batch",
$script:HostName,
$Command
)
# Run the command
$result = & $plinkPath @plinkArgs 2>&1
$exitCode = $LASTEXITCODE
# Handle the host key prompt on first connection
if ($exitCode -ne 0 -and ($result -match "host key" -or $result -match "Cannot confirm")) {
Write-Log "检测到主机密钥问题,自动接受..." "WARN"
$cmdLine = "echo y | `"$plinkPath`" -ssh -P $script:Port -l $script:Username -pw `"$script:Password`" $script:HostName `"$Command`""
$result = cmd /c $cmdLine 2>&1
$exitCode = $LASTEXITCODE
}
return $result
}
catch {
Write-Log "SSH命令执行异常: $($_.Exception.Message)" "ERROR"
return $null
}
}
function Test-SSHConnection {
Write-Log "测试SSH连接..."
$result = Invoke-SSHCommand "echo 'OK'" -Timeout 10
if ($result -match "OK") {
Write-Log "SSH连接成功!" "INFO"
return $true
}
else {
Write-Log "SSH连接失败!" "ERROR"
return $false
}
}
# ==================== Module Upload ====================
function Publish-Modules {
Write-Log "开始上传检测模块到远程服务器..."
try {
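# Create the remote module directories, including lib/ (the upload target for config.sh and common.sh)
$mkdirResult = Invoke-SSHCommand "mkdir -p $modulePath/lib $modulePath/system $modulePath/service $modulePath/utils"
Write-Log "创建远程模块目录: $modulePath"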
# Upload the config file
Write-Log "上传配置文件..."
$localConfig = Join-Path $libPath "config.sh"
if (Test-Path $localConfig) {
$pscpPath = Join-Path $scriptPath "pscp.exe"
if (Test-Path $pscpPath) {
& $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localConfig "${script:Username}@${script:HostName}:$modulePath/lib/" 2>&1 | Out-Null
Write-Log "config.sh 上传完成"
}
}
# Upload the common function library
Write-Log "上传通用函数库..."
$localCommon = Join-Path $libPath "common.sh"
if (Test-Path $localCommon) {
$pscpPath = Join-Path $scriptPath "pscp.exe"
if (Test-Path $pscpPath) {
& $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localCommon "${script:Username}@${script:HostName}:$modulePath/lib/" 2>&1 | Out-Null
Write-Log "common.sh 上传完成"
}
}
# Upload the system modules
Write-Log "上传系统检测模块..."
$systemModuleDir = Join-Path $libPath "system"
if (Test-Path $systemModuleDir) {
$pscpPath = Join-Path $scriptPath "pscp.exe"
foreach ($module in $ModuleConfig.SystemModules) {
$localModule = Join-Path $systemModuleDir $module
if (Test-Path $localModule) {
& $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localModule "${script:Username}@${script:HostName}:$modulePath/system/" 2>&1 | Out-Null
Write-Log " $module 上传完成"
}
}
}
# Upload the service modules
Write-Log "上传服务检测模块..."
$serviceModuleDir = Join-Path $libPath "service"
if (Test-Path $serviceModuleDir) {
$pscpPath = Join-Path $scriptPath "pscp.exe"
foreach ($module in $ModuleConfig.ServiceModules) {
$localModule = Join-Path $serviceModuleDir $module
if (Test-Path $localModule) {
& $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localModule "${script:Username}@${script:HostName}:$modulePath/service/" 2>&1 | Out-Null
Write-Log " $module 上传完成"
}
}
}
# Set execute permissions
Write-Log "设置模块执行权限..."
Invoke-SSHCommand "chmod +x $modulePath/system/*.sh $modulePath/service/*.sh $modulePath/lib/*.sh" | Out-Null
Write-Log "所有模块上传完成!" "INFO"
return $true
}
catch {
Write-Log "模块上传失败: $($_.Exception.Message)" "ERROR"
return $false
}
}
# ==================== Module Execution ====================
function Invoke-ModuleCheck {
param(
[string]$ModuleName,
[string]$Category
)
Write-Log "执行模块: $Category/$ModuleName"
try {
$remoteScript = "$modulePath/$Category/$ModuleName"
# Run the remote module
$result = Invoke-SSHCommand "cd $modulePath && source lib/config.sh && source lib/common.sh && $Category/$ModuleName" -Timeout 60
if ($null -eq $result) {
Write-Log "模块 $ModuleName 执行超时或失败" "WARN"
return @()
}
# Parse the results
$parsedResults = Parse-ModuleResult -RawOutput $result -ModuleName $ModuleName
return $parsedResults
}
catch {
Write-Log "模块执行异常: $($_.Exception.Message)" "ERROR"
return @()
}
}
# ==================== Result Parsing ====================
function Parse-ModuleResult {
param(
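# Accept the line array that plink returns; a [string] parameter would join the lines with spaces
[string[]]$RawOutput,
[string]$ModuleName
)
$results = @()
$lines = $RawOutput -split "\r?\n"
# Determine the module's category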
$category = Get-ModuleCategory -ModuleName $ModuleName
foreach ($line in $lines) {
$line = $line.Trim()
# Skip blank lines, comments, and lines starting with "["
if ([string]::IsNullOrEmpty($line) -or $line -match "^#" -or $line -match "^\[") {
continue
}
# Handle error output
if ($line -match "^ERROR:(.+)") {
Write-Log "$ModuleName 错误: $($matches[1].Trim())" "WARN"
continue
}
# Handle standard KEY:VALUE output
if ($line -match "^([^:]+):(.+)$") {
$key = $matches[1].Trim()
$value = $matches[2].Trim()
# Convert to a result object
$result = Convert-ToResultObject -Key $key -Value $value -ModuleName $ModuleName -Category $category
if ($result) {
$results += $result
}
}
}
return $results
}
# ==================== Get Module Category ====================
function Get-ModuleCategory {
param(
[string]$ModuleName
)
switch -Regex ($ModuleName) {
"^(0[1-9]|1[0-2])_" { return "系统基础信息" }
# 20_docker_basic.sh and 21_docker_deep.sh are both Docker modules; MySQL is 22 and 23
"^2[01]_" { return "Docker容器" }
"^2[23]_" { return "MySQL数据库" }
"^2[45]_" { return "Redis缓存" }
"^2[67]_" { return "EMQX消息队列" }
"^28_" { return "Java应用" }
"^29_" { return "Python应用" }
"^30_" { return "Nginx应用" }
"^31_" { return "Nacos应用" }
"^32_" { return "FastDFS应用" }
"^33_" { return "应用日志" }
default { return "其他" }
}
}
# ==================== Convert to Result Object ====================
function Convert-ToResultObject {
param(
[string]$Key,
[string]$Value,
[string]$ModuleName,
[string]$Category
)
try {
# Look up the display name for the key
$name = Get-DisplayName -Key $Key
# Look up the threshold for the key
$threshold = Get-Threshold -Key $Key
# Derive the status from the key and value
$status = Get-StatusByValue -Key $Key -Value $Value
# Build the result object
$result = [PSCustomObject]@{
Name = $name
Value = $Value
Threshold = $threshold
Status = $status
Message = ""
Module = $ModuleName
Key = $Key
}
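# If the status is not normal, add it to the issue list
if ($status -eq "严重" -or $status -eq "警告") {
# The name must be braced: "$name:" is parsed as a scope-qualified variable and fails to parse
$issueMsg = "${name}: $Value"
Add-Issue -Message $issueMsg -Level $status
}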
return $result
}
catch {
Write-Log "转换结果对象失败: $($_.Exception.Message)" "WARN"
return $null
}
}
# ==================== Get Display Name ====================
function Get-DisplayName {
param(
[string]$Key
)
$displayNames = @{
"HOSTNAME" = "主机名"
"OS_VERSION" = "操作系统版本"
"KERNEL_VERSION" = "内核版本"
"UPTIME_DAYS" = "运行时间(天)"
"LOAD_1MIN" = "1分钟负载"
"LOAD_5MIN" = "5分钟负载"
"LOAD_15MIN" = "15分钟负载"
"CPU_USAGE" = "CPU使用率"
"MEMORY_USAGE" = "内存使用率"
"DISK_USAGE" = "磁盘使用率"
"DOCKER_STATUS" = "Docker状态"
"MYSQL_STATUS" = "MySQL状态"
"REDIS_STATUS" = "Redis状态"
}
if ($displayNames.ContainsKey($Key)) {
return $displayNames[$Key]
}
# Default: return the key itself
return $Key
}
# ==================== Get Threshold ====================
function Get-Threshold {
param(
[string]$Key
)
$thresholds = @{
"CPU_USAGE" = ">85%"
"MEMORY_USAGE" = ">85%"
"DISK_USAGE" = ">90%"
}
if ($thresholds.ContainsKey($Key)) {
return $thresholds[$Key]
}
return "-"
}
# ==================== Get Status by Value ====================
function Get-StatusByValue {
param(
[string]$Key,
[string]$Value
)
# If the value is already a status string
if ($Value -match "^(正常|警告|严重|ERROR|OK)$") {
switch ($Value) {
"正常" { return "正常" }
"警告" { return "警告" }
"严重" { return "严重" }
"OK" { return "正常" }
"ERROR" { return "严重" }
default { return "正常" }
}
}
# If the value is a percentage, compare it against the per-key limits
if ($Value -match "([\d\.]+)%") {
$percent = [double]$matches[1]
switch ($Key) {
"CPU_USAGE" {
if ($percent -ge 100) { return "严重" }
if ($percent -ge 85) { return "警告" }
return "正常"
}
"MEMORY_USAGE" {
if ($percent -ge 95) { return "严重" }
if ($percent -ge 85) { return "警告" }
return "正常"
}
"DISK_USAGE" {
if ($percent -ge 95) { return "严重" }
if ($percent -ge 90) { return "警告" }
return "正常"
}
}
}
# Default: normal
return "正常"
}
# ==================== Add Issue ====================
function Add-Issue {
param(
[string]$Message,
[string]$Level = "警告"
)
try {
if ([string]::IsNullOrWhiteSpace($Message)) {
return
}
if ($Level -eq "严重") {
$script:严重问题.Add($Message)
}
else {
$script:警告问题.Add($Message)
}
}
catch {
Write-Log "添加问题失败: $($_.Exception.Message)" "WARN"
}
}
# ==================== Save Test Result ====================
function Save-TestResult {
param(
[string]$Category,
[PSCustomObject]$Result
)
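try {
if (-not $script:检测结果.ContainsKey($Category)) {
$script:检测结果[$Category] = @()
}
if ($script:检测结果[$Category] -is [hashtable]) {
# "系统基础信息" is initialised as a hashtable; index by key, because += with a PSCustomObject would throw
$script:检测结果[$Category][$Result.Key] = $Result
}
else {
$script:检测结果[$Category] += $Result
}
}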
catch {
Write-Log "保存检测结果失败: $($_.Exception.Message)" "WARN"
}
}
# ==================== Run All Check Modules ====================
function Invoke-AllChecks {
Write-Log "`n========================================" "INFO"
Write-Log "开始执行检测模块" "INFO"
Write-Log "========================================" "INFO"
$totalModules = $ModuleConfig.SystemModules.Count + $ModuleConfig.ServiceModules.Count
$currentModule = 0
# Run the system modules
Write-Log "`n--- 系统模块检测 ---" "INFO"
foreach ($module in $ModuleConfig.SystemModules) {
$currentModule++
Write-Host "`n[$currentModule/$totalModules] " -NoNewline
Write-Host "执行: $module" -ForegroundColor Cyan
$results = Invoke-ModuleCheck -ModuleName $module -Category "system"
$category = Get-ModuleCategory -ModuleName $module
foreach ($result in $results) {
Save-TestResult -Category $category -Result $result
}
}
# Run the service modules
Write-Log "`n--- 服务模块检测 ---" "INFO"
foreach ($module in $ModuleConfig.ServiceModules) {
$currentModule++
Write-Host "`n[$currentModule/$totalModules] " -NoNewline
Write-Host "执行: $module" -ForegroundColor Cyan
$results = Invoke-ModuleCheck -ModuleName $module -Category "service"
$category = Get-ModuleCategory -ModuleName $module
foreach ($result in $results) {
Save-TestResult -Category $category -Result $result
}
}
Write-Log "`n========================================" "INFO"
Write-Log "所有检测模块执行完成!" "INFO"
Write-Log "========================================" "INFO"
}
# ==================== Generate Report ====================
function New-MarkdownReport {
param(
[hashtable]$检测结果,
[System.Collections.Generic.List[string]]$严重问题,
[System.Collections.Generic.List[string]]$警告问题
)
$reportLines = @()
# Report header
$reportLines += "# 服务器健康检测报告"
$reportLines += ""
$reportLines += "**生成时间**: $(Get-Date -Format 'yyyy-MM-dd HH:mm:ss')"
$reportLines += "**目标主机**: $script:HostName"
$reportLines += ""
# Executive summary
$reportLines += "## 执行摘要"
$reportLines += ""
$totalIssues = $严重问题.Count + $警告问题.Count
if ($严重问题.Count -gt 0) {
$reportLines += "**总体状态**: 🔴 严重"
}
elseif ($警告问题.Count -gt 0) {
$reportLines += "**总体状态**: 🟡 警告"
}
else {
$reportLines += "**总体状态**: 🟢 正常"
}
$reportLines += ""
$reportLines += "- 严重问题: $($严重问题.Count)"
$reportLines += "- 警告问题: $($警告问题.Count)"
$reportLines += ""
# Critical issue list
if ($严重问题.Count -gt 0) {
$reportLines += "### 🔴 严重问题"
$reportLines += ""
foreach ($issue in $严重问题) {
$reportLines += "- $issue"
}
$reportLines += ""
}
# Warning issue list
if ($警告问题.Count -gt 0) {
$reportLines += "### 🟡 警告问题"
$reportLines += ""
foreach ($issue in $警告问题) {
$reportLines += "- $issue"
}
$reportLines += ""
}
# Basic system information
if ($检测结果["系统基础信息"].Count -gt 0) {
$reportLines += "## 系统基础信息"
$reportLines += ""
$info = $检测结果["系统基础信息"]
foreach ($item in $info.Values) {
if ($item -is [PSCustomObject]) {
$status = Get-StatusIcon -Status $item.Status
$reportLines += "- **$($item.Name)**: $($item.Value) $status"
}
}
$reportLines += ""
}
# Detailed check results
foreach ($category in $检测结果.Keys) {
if ($category -eq "系统基础信息") { continue }
$items = $检测结果[$category]
if ($items.Count -eq 0) { continue }
$reportLines += "## $category"
$reportLines += ""
# Build the table
$reportLines += "| 检测项 | 数值 | 阈值 | 状态 |"
$reportLines += "|:---|:---|:---|:---|"
foreach ($item in $items) {
$status = Get-StatusIcon -Status $item.Status
$reportLines += "| $($item.Name) | $($item.Value) | $($item.Threshold) | $status |"
}
$reportLines += ""
}
# Report footer
$reportLines += "---"
$reportLines += ""
$reportLines += "*本报告由服务器健康监测脚本 v3.0 自动生成*"
return $reportLines -join "`n"
}
# ==================== Get Status Icon ====================
function Get-StatusIcon {
param(
[string]$Status
)
switch ($Status) {
"正常" { return "🟢" }
"警告" { return "🟡" }
"严重" { return "🔴" }
default { return "⚪" }
}
}
# ==================== Save Report ====================
function Save-Report {
param(
[string]$Content
)
$reportDir = Join-Path $scriptPath "reports"
if (-not (Test-Path $reportDir)) {
New-Item -ItemType Directory -Path $reportDir -Force | Out-Null
}
$fileName = "health_report_${script:HostName}_${timestamp}.md"
$filePath = Join-Path $reportDir $fileName
$Content | Out-File -FilePath $filePath -Encoding UTF8 -Force
Write-Log "报告已保存: $filePath" "INFO"
return $filePath
}
# ==================== Main ====================
function Main {
# Interactive input
Invoke-InteractiveInput
# Test the SSH connection
if (-not (Test-SSHConnection)) {
Write-Log "无法连接到服务器,退出执行!" "ERROR"
return
}
# Upload the modules
if (-not (Publish-Modules)) {
Write-Log "模块上传失败,退出执行!" "ERROR"
return
}
# Run the checks
Invoke-AllChecks
# Generate the report
Write-Log "`n生成检测报告..."
$reportContent = New-MarkdownReport -检测结果 $script:检测结果 -严重问题 $script:严重问题 -警告问题 $script:警告问题
$reportPath = Save-Report -Content $reportContent
# Print the summary
Write-Host "`n========================================" -ForegroundColor Cyan
Write-Host "检测完成!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "严重问题: $($script:严重问题.Count)" -ForegroundColor $(if ($script:严重问题.Count -gt 0) { "Red" } else { "Green" })
Write-Host "警告问题: $($script:警告问题.Count)" -ForegroundColor $(if ($script:警告问题.Count -gt 0) { "Yellow" } else { "Green" })
Write-Host "报告路径: $reportPath" -ForegroundColor White
Write-Host "========================================" -ForegroundColor Cyan
}
# ==================== Execution Entry ====================
Main
################################################################################
# Server Health Monitoring Script v3.0 (Modular Architecture)
# Description: SSH-based system health monitoring with modular checks
# Author: Claude Code
# Date: 2026-05-09
################################################################################
param(
[Parameter(Mandatory=$false)]
[string]$HostName = "",
[Parameter(Mandatory=$false)]
[int]$Port = 0,
[Parameter(Mandatory=$false)]
[string]$Username = "",
[Parameter(Mandatory=$false)]
[string]$Password = ""
)
# Set encoding
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$OutputEncoding = [System.Text.Encoding]::UTF8
# ==================== Global Variables ====================
$ErrorActionPreference = "Continue"
$scriptPath = Split-Path -Parent $MyInvocation.MyCommand.Path
$libPath = Join-Path $scriptPath "lib"
$modulePath = "/tmp/check_modules"
$timestamp = Get-Date -Format "yyyy-MM-dd_HH-mm-ss"
# Test results collection
${script:TestResults} = @{
SystemBasicInfo = @{}
CPUResource = @()
MemoryResource = @()
DiskResource = @()
OOMCheck = @()
ProcessStatus = @()
NetworkConnection = @()
SecurityCompliance = @()
SystemLogs = @()
TimeSync = @()
PortService = @()
DockerContainer = @()
MySQLDatabase = @()
RedisCache = @()
EMQXMessageQueue = @()
JavaApp = @()
PythonApp = @()
NginxApp = @()
NacosApp = @()
FastDFSApp = @()
AppLogs = @()
}
# Issue lists
${script:CriticalIssues} = New-Object System.Collections.Generic.List[string]
${script:WarningIssues} = New-Object System.Collections.Generic.List[string]
# System info
$script:systemInfo = @{}
# ==================== Module Configuration ====================
$ModuleConfig = @{
# System modules (01-12)
SystemModules = @(
"01_system_basic.sh",
"02_cpu_check.sh",
"03_memory_check.sh",
"04_disk_check.sh",
"05_oom_check.sh",
"06_process_check.sh",
"07_network_check.sh",
"08_security_check.sh",
"09_system_logs.sh",
"10_time_sync.sh",
"11_scheduled_tasks.sh",
"12_port_check.sh"
)
# Service modules (20-33)
ServiceModules = @(
"20_docker_basic.sh",
"21_docker_deep.sh",
"22_mysql_basic.sh",
"23_mysql_depth.sh",
"24_redis_basic.sh",
"25_redis_depth.sh",
"26_emqx_basic.sh",
"27_emqx_deep.sh",
"28_java_check.sh",
"29_python_check.sh",
"30_nginx_check.sh",
"31_nacos_check.sh",
"32_fastdfs_check.sh",
"33_app_logs.sh"
)
# Remote module path
RemoteModulePath = $modulePath
# Local module path
LocalModulePath = Join-Path $scriptPath "lib"
}
# ==================== Logging Function ====================
function Write-Log {
param(
[string]$Message,
[string]$Level = "INFO"
)
$logTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
$logMessage = "[$logTime] [$Level] $Message"
switch ($Level) {
"ERROR" { Write-Host $logMessage -ForegroundColor Red }
"WARN" { Write-Host $logMessage -ForegroundColor Yellow }
"DEBUG" { Write-Host $logMessage -ForegroundColor DarkGray }
default { Write-Host $logMessage -ForegroundColor White }
}
}
# ==================== Interactive Input Function ====================
function Invoke-InteractiveInput {
Write-Host "`n========================================" -ForegroundColor Cyan
Write-Host " 服务器健康监测脚本 v3.0" -ForegroundColor Cyan
Write-Host " (模块化架构)" -ForegroundColor Cyan
Write-Host "========================================" -ForegroundColor Cyan
Write-Host ""
# 输入主机名
if ([string]::IsNullOrEmpty(${script:HostName})) {
${script:HostName} = Read-Host "请输入目标主机地址"
while ([string]::IsNullOrEmpty(${script:HostName})) {
Write-Host "主机名不能为空!" -ForegroundColor Red
${script:HostName} = Read-Host "请输入目标主机地址"
}
}
# 输入SSH端口
if (${script:Port} -eq 0) {
$portInput = Read-Host "请输入SSH端口 (默认: 22)"
if ([string]::IsNullOrEmpty($portInput)) {
${script:Port} = 22
}
else {
while (-not ($portInput -match "^\d+$")) {
Write-Host "端口必须是数字!" -ForegroundColor Red
$portInput = Read-Host "请输入SSH端口 (默认: 22)"
}
${script:Port} = [int]$portInput
}
}
# 输入用户名
if ([string]::IsNullOrEmpty(${script:Username})) {
${script:Username} = Read-Host "请输入SSH用户名 (默认: root)"
if ([string]::IsNullOrEmpty(${script:Username})) {
${script:Username} = "root"
}
}
# Prompt for the password (repeat until it is non-empty)
if ([string]::IsNullOrEmpty(${script:Password})) {
do {
$securePassword = Read-Host "Enter SSH password" -AsSecureString
$bstr = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($securePassword)
try {
${script:Password} = [System.Runtime.InteropServices.Marshal]::PtrToStringBSTR($bstr)
}
finally {
# Zero and free the unmanaged plaintext copy of the password
[System.Runtime.InteropServices.Marshal]::ZeroFreeBSTR($bstr)
}
if ([string]::IsNullOrEmpty(${script:Password})) {
Write-Host "Password must not be empty!" -ForegroundColor Red
}
} while ([string]::IsNullOrEmpty(${script:Password}))
}
# 确认信息
Write-Host "`n========================================" -ForegroundColor Cyan
Write-Host "连接信息:" -ForegroundColor Yellow
Write-Host " 主机地址: ${script:HostName}" -ForegroundColor White
Write-Host " SSH端口: ${script:Port}" -ForegroundColor White
Write-Host " 用户名: ${script:Username}" -ForegroundColor White
Write-Host "========================================" -ForegroundColor Cyan
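$confirm = Read-Host "`nConfirm the information above? (Y/N)"
# Accept only answers that start with Y/y; the unanchored pattern "Y|y" matched any input containing a "y"
if ($confirm -notmatch '^\s*y') {
Write-Host "Execution cancelled." -ForegroundColor Yellow
exit 0
}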
Write-Host ""
}
# ==================== SSH Connection Functions ====================
function Invoke-SSHCommand {
param(
[string]$Command,
[int]$Timeout = 30
)
try {
$plinkPath = Join-Path $scriptPath "plink.exe"
# 检查plink是否存在
if (-not (Test-Path $plinkPath)) {
Write-Log "未找到plink.exe: $plinkPath" "ERROR"
return $null
}
# 使用参数数组传递
$plinkArgs = @(
"-ssh",
"-P", ${script:Port},
"-l", ${script:Username},
"-pw", ${script:Password},
"-batch",
${script:HostName},
$Command
)
# 执行命令
$result = & $plinkPath @plinkArgs 2>&1
$exitCode = $LASTEXITCODE
# 处理首次连接的主机密钥问题
if ($exitCode -ne 0 -and ($result -match "host key" -or $result -match "Cannot confirm")) {
Write-Log "检测到主机密钥问题,自动接受..." "WARN"
$cmdLine = "echo y | `"$plinkPath`" -ssh -P ${script:Port} -l ${script:Username} -pw `"${script:Password}`" ${script:HostName} `"$Command`""
$result = cmd /c $cmdLine 2>&1
$exitCode = $LASTEXITCODE
}
return $result
}
catch {
Write-Log "SSH命令执行错误: $($_.Exception.Message)" "ERROR"
return $null
}
}
function Test-SSHConnection {
Write-Log "测试SSH连接..."
$result = Invoke-SSHCommand "echo 'OK'" -Timeout 10
if ($result -match "OK") {
Write-Log "SSH连接成功!" "INFO"
return $true
}
else {
Write-Log "SSH连接失败!" "ERROR"
return $false
}
}
# ==================== Module Upload Function ====================
function Publish-Modules {
Write-Log "上传检测模块到远程服务器..."
try {
# 创建远程模块目录
$mkdirCmd = "mkdir -p $modulePath/system"
$mkdirResult = Invoke-SSHCommand $mkdirCmd
$mkdirCmd = "mkdir -p $modulePath/service"
$mkdirResult = Invoke-SSHCommand $mkdirCmd
Write-Log "已创建远程模块目录: $modulePath"
# Upload the shared files (config.sh, common.sh) to the remote module root
$pscpPath = Join-Path $scriptPath "pscp.exe"
if (-not (Test-Path $pscpPath)) {
Write-Log "pscp.exe not found: $pscpPath" "ERROR"
return $false
}
# Define the target once, outside any branch, so both uploads use the same remote directory
$remotePath = "$modulePath/"
foreach ($sharedFile in @("config.sh", "common.sh")) {
$localFile = Join-Path $libPath $sharedFile
if (-not (Test-Path $localFile)) {
Write-Log "Local file not found, skipping: $localFile" "WARN"
continue
}
Write-Log "Uploading $sharedFile..."
# -batch keeps pscp from waiting on an interactive prompt
& $pscpPath -batch -P ${script:Port} -l ${script:Username} -pw ${script:Password} $localFile "${script:Username}@${script:HostName}:$remotePath" 2>&1 | Out-Null
if ($LASTEXITCODE -ne 0) {
Write-Log "Upload of $sharedFile failed (pscp exit code: $LASTEXITCODE)" "ERROR"
return $false
}
Write-Log "$sharedFile uploaded"
}
# 上传系统模块
Write-Log "上传系统检测模块..."
$systemModuleDir = Join-Path $libPath "system"
if (Test-Path $systemModuleDir) {
$pscpPath = Join-Path $scriptPath "pscp.exe"
foreach ($module in $ModuleConfig.SystemModules) {
$localModule = Join-Path $systemModuleDir $module
if (Test-Path $localModule) {
$remoteModulePath = "$modulePath/system/"
& $pscpPath -P ${script:Port} -l ${script:Username} -pw ${script:Password} $localModule "${script:Username}@${script:HostName}:$remoteModulePath" 2>&1 | Out-Null
Write-Log " $module 已上传"
}
}
}
# 上传服务模块
Write-Log "上传服务检测模块..."
$serviceModuleDir = Join-Path $libPath "service"
if (Test-Path $serviceModuleDir) {
$pscpPath = Join-Path $scriptPath "pscp.exe"
foreach ($module in $ModuleConfig.ServiceModules) {
$localModule = Join-Path $serviceModuleDir $module
if (Test-Path $localModule) {
$remoteModulePath = "$modulePath/service/"
& $pscpPath -P ${script:Port} -l ${script:Username} -pw ${script:Password} $localModule "${script:Username}@${script:HostName}:$remoteModulePath" 2>&1 | Out-Null
Write-Log " $module 已上传"
}
}
}
# 设置执行权限
Write-Log "设置模块执行权限..."
Invoke-SSHCommand "chmod +x $modulePath/system/*.sh $modulePath/service/*.sh $modulePath/*.sh" | Out-Null
Write-Log "所有模块上传成功!" "INFO"
return $true
}
catch {
Write-Log "模块上传失败: $($_.Exception.Message)" "ERROR"
return $false
}
}
# ==================== Module Execution Function ====================
function Invoke-ModuleCheck {
param(
[string]$ModuleName,
[string]$Category
)
Write-Log "执行模块: $Category/$ModuleName"
try {
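# Run the remote module from the module root
$execCmd = "cd $modulePath; source config.sh; source common.sh; $Category/$ModuleName"
$result = Invoke-SSHCommand $execCmd -Timeout 60
if ($null -eq $result) {
Write-Log "Module $ModuleName timed out or failed" "WARN"
return @()
}
# plink output arrives as an array of lines. Join it with newlines first,
# because binding an array to a [string] parameter joins the elements with spaces.
$rawOutput = $result -join "`n"
# Parse the result
$parsedResults = Parse-ModuleResult -RawOutput $rawOutput -ModuleName $ModuleName
return $parsedResults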
}
catch {
Write-Log "模块执行错误: $($_.Exception.Message)" "ERROR"
return @()
}
}
# ==================== Result Parsing Function ====================
function Parse-ModuleResult {
param(
[string]$RawOutput,
[string]$ModuleName
)
$results = @()
$lines = $RawOutput -split "`n"
# 确定模块类别
$category = Get-ModuleCategory -ModuleName $ModuleName
foreach ($line in $lines) {
$line = $line.Trim()
# 跳过空行和注释
if ([string]::IsNullOrEmpty($line) -or $line -match "^#" -or $line -match "^\[") {
continue
}
# 处理错误输出
if ($line -match "^ERROR:(.+)") {
Write-Log "$ModuleName 错误: $($matches[1].Trim())" "WARN"
continue
}
# 处理标准格式输出 KEY:VALUE
if ($line -match "^([^:]+):(.+)$") {
$key = $matches[1].Trim()
$value = $matches[2].Trim()
# 转换为测试结果对象
$result = Convert-ToResultObject -Key $key -Value $value -ModuleName $ModuleName -Category $category
if ($result) {
$results += $result
}
}
}
return $results
}
# ==================== Get Module Category ====================
function Get-ModuleCategory {
param(
[string]$ModuleName
)
switch -Regex ($ModuleName) {
"^(0[1-9]|1[0-2])_" { return "SystemBasicInfo" }
# 20 and 21 are the Docker modules; 22 and 23 are the MySQL modules
"^2[01]_" { return "DockerContainer" }
"^2[23]_" { return "MySQLDatabase" }
"^2[45]_" { return "RedisCache" }
"^2[67]_" { return "EMQXMessageQueue" }
"^28_" { return "JavaApp" }
"^29_" { return "PythonApp" }
"^30_" { return "NginxApp" }
"^31_" { return "NacosApp" }
"^32_" { return "FastDFSApp" }
"^33_" { return "AppLogs" }
default { return "Other" }
}
}
# ==================== Convert to Result Object ====================
function Convert-ToResultObject {
param(
[string]$Key,
[string]$Value,
[string]$ModuleName,
[string]$Category
)
try {
# 根据Key确定显示名称
$name = Get-DisplayName -Key $Key
# 根据Key确定阈值
$threshold = Get-Threshold -Key $Key
# 根据Key和Value确定状态
$status = Get-StatusByValue -Key $Key -Value $Value
# 创建结果对象
$result = [PSCustomObject]@{
Name = $name
Value = $Value
Threshold = $threshold
Status = $status
Message = ""
Module = $ModuleName
Key = $Key
}
# 如果状态不正常,添加到问题列表
if ($status -eq "严重" -or $status -eq "警告") {
$issueMsg = "${name}: ${Value}"
Add-Issue -Message $issueMsg -Level $status
}
return $result
}
catch {
Write-Log "转换结果对象失败: $($_.Exception.Message)" "WARN"
return $null
}
}
# ==================== Get Display Name ====================
function Get-DisplayName {
param(
[string]$Key
)
$displayNames = @{
"HOSTNAME" = "主机名"
"OS_VERSION" = "操作系统版本"
"KERNEL_VERSION" = "内核版本"
"UPTIME_DAYS" = "运行天数"
"LOAD_1MIN" = "1分钟负载"
"LOAD_5MIN" = "5分钟负载"
"LOAD_15MIN" = "15分钟负载"
"CPU_USAGE" = "CPU使用率"
"MEMORY_USAGE" = "内存使用率"
"DISK_USAGE" = "磁盘使用率"
"DOCKER_STATUS" = "Docker状态"
"MYSQL_STATUS" = "MySQL状态"
"REDIS_STATUS" = "Redis状态"
"EMQX_STATUS" = "EMQX状态"
"JAVA_VERSION" = "Java版本"
"PYTHON_VERSION" = "Python版本"
"NGINX_VERSION" = "Nginx版本"
"NACOS_VERSION" = "Nacos版本"
}
if ($displayNames.ContainsKey($Key)) {
return $displayNames[$Key]
}
# 默认返回Key本身
return $Key
}
# ==================== Get Threshold ====================
function Get-Threshold {
param(
[string]$Key
)
$thresholds = @{
"CPU_USAGE" = "大于85%"
"MEMORY_USAGE" = "大于85%"
"DISK_USAGE" = "大于90%"
}
if ($thresholds.ContainsKey($Key)) {
return $thresholds[$Key]
}
return "-"
}
# ==================== Get Status From Value ====================
function Get-StatusByValue {
param(
[string]$Key,
[string]$Value
)
# 如果Value已经是状态值
if ($Value -match "^(正常|警告|严重|ERROR|OK)$") {
switch ($Value) {
"正常" { return "正常" }
"警告" { return "警告" }
"严重" { return "严重" }
"OK" { return "正常" }
"ERROR" { return "严重" }
default { return "正常" }
}
}
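# If the value embeds a status word, map it. Test the most severe status first
# so that text mentioning several states is not downgraded.
if ($Value -match "严重") { return "严重" }
if ($Value -match "警告") { return "警告" }
if ($Value -match "正常") { return "正常" }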
# 如果Value是百分比,评估它
if ($Value -match "([\d\.]+)%") {
$percent = [double]$matches[1]
switch ($Key) {
"CPU_USAGE" {
if ($percent -ge 100) { return "严重" }
if ($percent -ge 85) { return "警告" }
return "正常"
}
"MEMORY_USAGE" {
if ($percent -ge 95) { return "严重" }
if ($percent -ge 85) { return "警告" }
return "正常"
}
"DISK_USAGE" {
if ($percent -ge 95) { return "严重" }
if ($percent -ge 90) { return "警告" }
return "正常"
}
}
}
# 默认返回正常
return "正常"
}
# ==================== Add Issue ====================
function Add-Issue {
param(
[string]$Message,
[string]$Level = "Warning"
)
try {
if ([string]::IsNullOrWhiteSpace($Message)) {
return
}
if ($Level -eq "严重") {
${script:CriticalIssues}.Add($Message)
}
else {
${script:WarningIssues}.Add($Message)
}
}
catch {
Write-Log "添加问题失败: $($_.Exception.Message)" "WARN"
}
}
# ==================== Save Test Result ====================
function Save-TestResult {
param(
[string]$Category,
[PSCustomObject]$Result
)
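try {
if (-not ${script:TestResults}.ContainsKey($Category)) {
${script:TestResults}[$Category] = @($Result)
}
elseif (${script:TestResults}[$Category] -is [hashtable]) {
# SystemBasicInfo is a hashtable keyed by metric name; "+=" with a PSCustomObject would throw here
${script:TestResults}[$Category][$Result.Key] = $Result
}
else {
${script:TestResults}[$Category] += $Result
}
}
catch {
Write-Log "Failed to save test result: $($_.Exception.Message)" "WARN"
}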
}
# ==================== Run All Check Modules ====================
function Invoke-AllChecks {
Write-Log "`n========================================" "INFO"
Write-Log "开始执行检测模块" "INFO"
Write-Log "========================================" "INFO"
$totalModules = $ModuleConfig.SystemModules.Count + $ModuleConfig.ServiceModules.Count
$currentModule = 0
# 执行系统模块
Write-Log "`n--- 系统模块检测 ---" "INFO"
foreach ($module in $ModuleConfig.SystemModules) {
$currentModule++
Write-Host "`n[$currentModule/$totalModules] " -NoNewline
Write-Host "执行: $module" -ForegroundColor Cyan
$results = Invoke-ModuleCheck -ModuleName $module -Category "system"
$category = Get-ModuleCategory -ModuleName $module
foreach ($result in $results) {
Save-TestResult -Category $category -Result $result
}
}
# 执行服务模块
Write-Log "`n--- 服务模块检测 ---" "INFO"
foreach ($module in $ModuleConfig.ServiceModules) {
$currentModule++
Write-Host "`n[$currentModule/$totalModules] " -NoNewline
Write-Host "执行: $module" -ForegroundColor Cyan
$results = Invoke-ModuleCheck -ModuleName $module -Category "service"
$category = Get-ModuleCategory -ModuleName $module
foreach ($result in $results) {
Save-TestResult -Category $category -Result $result
}
}
Write-Log "`n========================================" "INFO"
Write-Log "所有检测模块执行完成!" "INFO"
Write-Log "========================================" "INFO"
}
# ==================== Generate Report ====================
function New-MarkdownReport {
param(
[hashtable]$TestResults,
[System.Collections.Generic.List[string]]$CriticalIssues,
[System.Collections.Generic.List[string]]$WarningIssues
)
$reportLines = @()
# 报告头部
$reportLines += "# 服务器健康检查报告"
$reportLines += ""
$reportLines += "**生成时间**: $(Get-Date -Format 'yyyy-MM-dd HH:mm:ss')"
$reportLines += "**目标主机**: ${script:HostName}"
$reportLines += ""
# 执行摘要
$reportLines += "## 执行摘要"
$reportLines += ""
$totalIssues = $CriticalIssues.Count + $WarningIssues.Count
if ($CriticalIssues.Count -gt 0) {
$reportLines += "**整体状态**: 发现严重问题"
}
elseif ($WarningIssues.Count -gt 0) {
$reportLines += "**整体状态**: 发现警告问题"
}
else {
$reportLines += "**整体状态**: 系统运行正常"
}
$reportLines += ""
$reportLines += "- 严重问题: $($CriticalIssues.Count)"
$reportLines += "- 警告问题: $($WarningIssues.Count)"
$reportLines += ""
# 严重问题列表
if ($CriticalIssues.Count -gt 0) {
$reportLines += "### 严重问题"
$reportLines += ""
foreach ($issue in $CriticalIssues) {
$reportLines += "- $issue"
}
$reportLines += ""
}
# 警告问题列表
if ($WarningIssues.Count -gt 0) {
$reportLines += "### 警告问题"
$reportLines += ""
foreach ($issue in $WarningIssues) {
$reportLines += "- $issue"
}
$reportLines += ""
}
# 系统基本信息
if ($TestResults["SystemBasicInfo"].Count -gt 0) {
$reportLines += "## 系统基本信息"
$reportLines += ""
$info = $TestResults["SystemBasicInfo"]
foreach ($item in $info.Values) {
if ($item -is [PSCustomObject]) {
$status = Get-StatusIcon -Status $item.Status
$reportLines += "- **$($item.Name)**: $($item.Value) $status"
}
}
$reportLines += ""
}
# 详细测试结果
$categoryNames = @{
"CPUResource" = "CPU资源"
"MemoryResource" = "内存资源"
"DiskResource" = "磁盘资源"
"OOMCheck" = "OOM检测"
"ProcessStatus" = "进程状态"
"NetworkConnection" = "网络连接"
"SecurityCompliance" = "安全合规"
"SystemLogs" = "系统日志"
"TimeSync" = "时间同步"
"PortService" = "端口服务"
"DockerContainer" = "Docker容器"
"MySQLDatabase" = "MySQL数据库"
"RedisCache" = "Redis缓存"
"EMQXMessageQueue" = "EMQX消息队列"
"JavaApp" = "Java应用"
"PythonApp" = "Python应用"
"NginxApp" = "Nginx应用"
"NacosApp" = "Nacos应用"
"FastDFSApp" = "FastDFS应用"
"AppLogs" = "应用日志"
}
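# Hashtable key order is not guaranteed; sort so the report sections appear in a stable order
foreach ($category in ($TestResults.Keys | Sort-Object)) {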
if ($category -eq "SystemBasicInfo") { continue }
$items = $TestResults[$category]
if ($items.Count -eq 0) { continue }
$displayName = if ($categoryNames.ContainsKey($category)) { $categoryNames[$category] } else { $category }
$reportLines += "## $displayName"
$reportLines += ""
# 创建表格
$reportLines += "| 检查项 | 数值 | 阈值 | 状态 |"
$reportLines += "|:---|:---|:---|:---|"
foreach ($item in $items) {
$status = Get-StatusIcon -Status $item.Status
$reportLines += "| $($item.Name) | $($item.Value) | $($item.Threshold) | $status |"
}
$reportLines += ""
}
# 报告尾部
$reportLines += "---"
$reportLines += ""
$reportLines += "*本报告由服务器健康监测脚本 v3.0 自动生成*"
return $reportLines -join "`n"
}
# ==================== Get Status Icon ====================
function Get-StatusIcon {
param(
[string]$Status
)
switch ($Status) {
"正常" { return "[正常]" }
"警告" { return "[警告]" }
"严重" { return "[严重]" }
default { return "[未知]" }
}
}
# ==================== Save Report ====================
function Save-Report {
param(
[string]$Content
)
$reportDir = Join-Path $scriptPath "reports"
if (-not (Test-Path $reportDir)) {
New-Item -ItemType Directory -Path $reportDir -Force | Out-Null
}
$fileName = "health_report_${script:HostName}_${timestamp}.md"
$filePath = Join-Path $reportDir $fileName
$Content | Out-File -FilePath $filePath -Encoding UTF8 -Force
Write-Log "报告已保存: $filePath" "INFO"
return $filePath
}
# ==================== Main Function ====================
function Main {
# 交互输入
Invoke-InteractiveInput
# 测试SSH连接
if (-not (Test-SSHConnection)) {
Write-Log "无法连接到服务器,退出!" "ERROR"
return
}
# 上传模块
if (-not (Publish-Modules)) {
Write-Log "模块上传失败,退出!" "ERROR"
return
}
# 执行检测
Invoke-AllChecks
# 生成报告
Write-Log "`n生成测试报告..."
$reportContent = New-MarkdownReport -TestResults ${script:TestResults} -CriticalIssues ${script:CriticalIssues} -WarningIssues ${script:WarningIssues}
${reportPath} = Save-Report -Content $reportContent
# 显示摘要
Write-Host "`n========================================" -ForegroundColor Cyan
Write-Host "检测完成!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "严重问题: $(${script:CriticalIssues}.Count)" -ForegroundColor $(if (${script:CriticalIssues}.Count -gt 0) { "Red" } else { "Green" })
Write-Host "警告问题: $(${script:WarningIssues}.Count)" -ForegroundColor $(if (${script:WarningIssues}.Count -gt 0) { "Yellow" } else { "Green" })
Write-Host "报告路径: ${reportPath}" -ForegroundColor White
Write-Host "========================================" -ForegroundColor Cyan
}
# ==================== Execution Entry ====================
Main
################################################################################
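# Server Health Monitoring Script v4.0 (Modular Architecture)
# Purpose: Connect to a remote server over SSH, run the modular system health checks, and generate a Markdown report
# Author: Claude Code
# Date: 2026-05-09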
################################################################################
param(
[Parameter(Mandatory=$false)]
[string]$HostName = "",
[Parameter(Mandatory=$false)]
[int]$Port = 0,
[Parameter(Mandatory=$false)]
[string]$Username = "",
[Parameter(Mandatory=$false)]
[string]$Password = ""
)
# 设置编码
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$OutputEncoding = [System.Text.Encoding]::UTF8
# ==================== Global Variables ====================
$ErrorActionPreference = "Continue"
$scriptPath = Split-Path -Parent $MyInvocation.MyCommand.Path
$libPath = Join-Path $scriptPath "lib"
$modulePath = "/tmp/check_modules"
$timestamp = Get-Date -Format "yyyy-MM-dd_HH-mm-ss"
# 检测结果收集
$script:TestResults = @{}
# 问题列表
$script:CriticalIssues = New-Object System.Collections.Generic.List[string]
$script:WarningIssues = New-Object System.Collections.Generic.List[string]
# 系统信息
$script:systemInfo = @{}
# ==================== Module Configuration ====================
$ModuleConfig = @{
SystemModules = @(
"01_system_basic.sh",
"02_cpu_check.sh",
"03_memory_check.sh",
"04_disk_check.sh",
"05_oom_check.sh",
"06_process_check.sh",
"07_network_check.sh",
"08_security_check.sh",
"09_system_logs.sh",
"10_time_sync.sh",
"11_scheduled_tasks.sh",
"12_port_check.sh"
)
ServiceModules = @(
"20_docker_basic.sh",
"21_docker_deep.sh",
"22_mysql_basic.sh",
"23_mysql_depth.sh",
"24_redis_basic.sh",
"25_redis_depth.sh",
"26_emqx_basic.sh",
"27_emqx_deep.sh",
"28_java_check.sh",
"29_python_check.sh",
"30_nginx_check.sh",
"31_nacos_check.sh",
"32_fastdfs_check.sh",
"33_app_logs.sh"
)
RemoteModulePath = $modulePath
LocalModulePath = Join-Path $scriptPath "lib"
}
# ==================== Logging Function ====================
function Write-Log {
param(
[string]$Message,
[string]$Level = "INFO"
)
$logTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
$logMessage = "[$logTime] [$Level] $Message"
switch ($Level) {
"ERROR" { Write-Host $logMessage -ForegroundColor Red }
"WARN" { Write-Host $logMessage -ForegroundColor Yellow }
"DEBUG" { Write-Host $logMessage -ForegroundColor DarkGray }
default { Write-Host $logMessage -ForegroundColor White }
}
}
# ==================== Interactive Input Function ====================
function Invoke-InteractiveInput {
Write-Host ""
Write-Host "========================================" -ForegroundColor Cyan
Write-Host " 服务器健康监测脚本 v4.0" -ForegroundColor Cyan
Write-Host " (模块化架构)" -ForegroundColor Cyan
Write-Host "========================================" -ForegroundColor Cyan
Write-Host ""
if ([string]::IsNullOrEmpty($script:HostName)) {
$script:HostName = Read-Host "请输入目标主机地址"
while ([string]::IsNullOrEmpty($script:HostName)) {
Write-Host "主机地址不能为空!" -ForegroundColor Red
$script:HostName = Read-Host "请输入目标主机地址"
}
}
if ($script:Port -eq 0) {
$portInput = Read-Host "请输入SSH端口 (默认: 22)"
if ([string]::IsNullOrEmpty($portInput)) {
$script:Port = 22
}
else {
while (-not ($portInput -match "^\d+$")) {
Write-Host "端口必须是数字!" -ForegroundColor Red
$portInput = Read-Host "请输入SSH端口 (默认: 22)"
}
$script:Port = [int]$portInput
}
}
if ([string]::IsNullOrEmpty($script:Username)) {
$script:Username = Read-Host "请输入SSH用户名 (默认: root)"
if ([string]::IsNullOrEmpty($script:Username)) {
$script:Username = "root"
}
}
# Prompt for the password (repeat until it is non-empty)
if ([string]::IsNullOrEmpty($script:Password)) {
do {
$securePassword = Read-Host "Enter SSH password" -AsSecureString
$bstr = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($securePassword)
try {
$script:Password = [System.Runtime.InteropServices.Marshal]::PtrToStringBSTR($bstr)
}
finally {
# Zero and free the unmanaged plaintext copy of the password
[System.Runtime.InteropServices.Marshal]::ZeroFreeBSTR($bstr)
}
if ([string]::IsNullOrEmpty($script:Password)) {
Write-Host "Password must not be empty!" -ForegroundColor Red
}
} while ([string]::IsNullOrEmpty($script:Password))
}
Write-Host ""
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "连接信息:" -ForegroundColor Yellow
Write-Host " 主机地址: $script:HostName" -ForegroundColor White
Write-Host " SSH端口: $script:Port" -ForegroundColor White
Write-Host " 用户名: $script:Username" -ForegroundColor White
Write-Host "========================================" -ForegroundColor Cyan
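$confirm = Read-Host "`nConfirm the information above? (Y/N)"
# Accept only answers that start with Y/y; the unanchored pattern "Y|y" matched any input containing a "y"
if ($confirm -notmatch '^\s*y') {
Write-Host "Execution cancelled." -ForegroundColor Yellow
exit 0
}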
Write-Host ""
}
# ==================== SSH Command Execution Function ====================
function Invoke-SSHCommand {
param(
[string]$Command,
[int]$Timeout = 30
)
try {
$plinkPath = Join-Path $scriptPath "plink.exe"
if (-not (Test-Path $plinkPath)) {
Write-Log "未找到plink.exe: $plinkPath" "ERROR"
return $null
}
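# Quote every argument so values containing spaces or quotes (the password,
# the remote command) reach plink as single arguments.
$argList = @(
"-ssh",
"-P", "$script:Port",
"-l", "$script:Username",
"-pw", "$script:Password",
"-batch",
"$script:HostName",
$Command
) | ForEach-Object {
'"' + (($_ -replace '(\\*)"', '$1$1\"') -replace '(\\+)$', '$1$1') + '"'
}
$processInfo = New-Object System.Diagnostics.ProcessStartInfo
$processInfo.FileName = $plinkPath
$processInfo.Arguments = $argList -join " "
$processInfo.UseShellExecute = $false
$processInfo.RedirectStandardOutput = $true
$processInfo.RedirectStandardError = $true
$processInfo.StandardOutputEncoding = [System.Text.Encoding]::UTF8
$processInfo.StandardErrorEncoding = [System.Text.Encoding]::UTF8
$process = New-Object System.Diagnostics.Process
$process.StartInfo = $processInfo
$process.Start() | Out-Null
# Read both streams asynchronously: a synchronous ReadToEnd() blocks until the
# process exits, which defeats the timeout and can deadlock on a full stderr buffer.
$stdoutTask = $process.StandardOutput.ReadToEndAsync()
$stderrTask = $process.StandardError.ReadToEndAsync()
if (-not $process.WaitForExit($Timeout * 1000)) {
$process.Kill()
Write-Log "Command timed out" "WARN"
return $null
}
# Do not assign to $error: it is a read-only automatic variable, and the
# assignment throws and lands in the catch block on every call.
$stdout = $stdoutTask.Result
$stderr = $stderrTask.Result
if ($stdout) {
if ($stderr) { Write-Log "plink stderr: $($stderr.Trim())" "DEBUG" }
return $stdout
}
return $stderr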
}
catch {
Write-Log "SSH命令执行错误: $($_.Exception.Message)" "ERROR"
return $null
}
}
function Test-SSHConnection {
Write-Log "测试SSH连接..."
$result = Invoke-SSHCommand "echo 'OK'" -Timeout 10
if ($result -match "OK") {
Write-Log "SSH连接成功!" "INFO"
return $true
}
else {
Write-Log "SSH连接失败!" "ERROR"
return $false
}
}
# ==================== Module Upload Function ====================
function Publish-Modules {
Write-Log "上传检测模块到远程服务器..."
try {
$mkdirCmd = "mkdir -p $modulePath/system"
Invoke-SSHCommand $mkdirCmd | Out-Null
$mkdirCmd = "mkdir -p $modulePath/service"
Invoke-SSHCommand $mkdirCmd | Out-Null
Write-Log "已创建远程模块目录: $modulePath"
$pscpPath = Join-Path $scriptPath "pscp.exe"
if (-not (Test-Path $pscpPath)) {
Write-Log "pscp.exe not found, cannot upload files: $pscpPath" "WARN"
return $false
}
# Collect every upload as a local file, a remote directory, and a log label
$uploads = @()
foreach ($name in @("config.sh", "common.sh")) {
$uploads += @{ Local = (Join-Path $libPath $name); Remote = "$modulePath/"; Label = $name }
}
foreach ($module in $ModuleConfig.SystemModules) {
$uploads += @{ Local = (Join-Path (Join-Path $libPath "system") $module); Remote = "$modulePath/system/"; Label = "  $module" }
}
foreach ($module in $ModuleConfig.ServiceModules) {
$uploads += @{ Local = (Join-Path (Join-Path $libPath "service") $module); Remote = "$modulePath/service/"; Label = "  $module" }
}
foreach ($upload in $uploads) {
# Files that do not exist locally are skipped, as before
if (-not (Test-Path $upload.Local)) { continue }
# Quote each argument so local paths or passwords containing spaces stay intact;
# -batch keeps pscp from waiting on an interactive prompt.
$argList = @(
"-batch",
"-P", "$script:Port",
"-l", "$script:Username",
"-pw", "$script:Password",
$upload.Local,
"$script:Username@$script:HostName`:$($upload.Remote)"
) | ForEach-Object {
'"' + (($_ -replace '(\\*)"', '$1$1\"') -replace '(\\+)$', '$1$1') + '"'
}
$processInfo = New-Object System.Diagnostics.ProcessStartInfo
$processInfo.FileName = $pscpPath
$processInfo.Arguments = $argList -join " "
$processInfo.UseShellExecute = $false
$processInfo.RedirectStandardOutput = $true
$processInfo.RedirectStandardError = $true
$process = New-Object System.Diagnostics.Process
$process.StartInfo = $processInfo
$process.Start() | Out-Null
# Drain the redirected streams so pscp cannot block on a full pipe buffer
$stdoutTask = $process.StandardOutput.ReadToEndAsync()
$stderrTask = $process.StandardError.ReadToEndAsync()
if (-not $process.WaitForExit(60000)) {
$process.Kill()
Write-Log "$($upload.Label) upload timed out" "ERROR"
return $false
}
# Report a failed transfer instead of logging success unconditionally
if ($process.ExitCode -ne 0) {
Write-Log "$($upload.Label) upload failed: $($stderrTask.Result.Trim())" "ERROR"
return $false
}
Write-Log "$($upload.Label) uploaded"
}
Write-Log "设置模块执行权限..."
Invoke-SSHCommand "chmod +x $modulePath/system/*.sh $modulePath/service/*.sh $modulePath/*.sh" | Out-Null
Write-Log "所有模块上传成功!" "INFO"
return $true
}
catch {
Write-Log "模块上传失败: $($_.Exception.Message)" "ERROR"
return $false
}
}
# ==================== Module Execution Function ====================
function Invoke-ModuleCheck {
param(
[string]$ModuleName,
[string]$Category
)
Write-Log "执行模块: $Category/$ModuleName"
try {
$execCmd = "cd $modulePath && source config.sh && source common.sh && $Category/$ModuleName"
$result = Invoke-SSHCommand $execCmd -Timeout 90
if ($null -eq $result) {
Write-Log "模块 $ModuleName 执行超时或失败" "WARN"
return @()
}
$parsedResults = Parse-ModuleResult -RawOutput $result -ModuleName $ModuleName
return $parsedResults
}
catch {
Write-Log "模块执行错误: $($_.Exception.Message)" "ERROR"
return @()
}
}
# ==================== Result Parsing Function ====================
function Parse-ModuleResult {
param(
[string]$RawOutput,
[string]$ModuleName
)
$results = @()
$lines = $RawOutput -split "`n"
foreach ($line in $lines) {
$line = $line.Trim()
if ([string]::IsNullOrEmpty($line) -or $line -match "^#" -or $line -match "^\[") {
continue
}
if ($line -match "^ERROR:(.+)") {
Write-Log "$ModuleName 错误: $($matches[1].Trim())" "WARN"
continue
}
if ($line -match "^([^:]+):(.+)$") {
$key = $matches[1].Trim()
$value = $matches[2].Trim()
$result = Convert-ToResultObject -Key $key -Value $value -ModuleName $ModuleName
if ($result) {
$results += $result
}
}
}
return $results
}
# ==================== Convert to Result Object ====================
function Convert-ToResultObject {
param(
[string]$Key,
[string]$Value,
[string]$ModuleName
)
try {
$name = Get-DisplayName -Key $Key
$threshold = Get-Threshold -Key $Key
$status = Get-StatusByValue -Key $Key -Value $Value
$result = [PSCustomObject]@{
Name = $name
Value = $Value
Threshold = $threshold
Status = $status
Module = $ModuleName
Key = $Key
}
if ($status -eq "严重" -or $status -eq "警告") {
$issueMsg = "${name}: ${Value}"
Add-Issue -Message $issueMsg -Level $status
}
return $result
}
catch {
Write-Log "转换结果对象失败: $($_.Exception.Message)" "WARN"
return $null
}
}
# ==================== Get Display Name ====================
function Get-DisplayName {
param(
[string]$Key
)
$displayNames = @{
"HOSTNAME" = "主机名"
"OS_VERSION" = "操作系统版本"
"KERNEL_VERSION" = "内核版本"
"UPTIME_DAYS" = "运行天数"
"LOAD_1MIN" = "1分钟负载"
"LOAD_5MIN" = "5分钟负载"
"LOAD_15MIN" = "15分钟负载"
"CPU_USAGE" = "CPU使用率"
"MEMORY_USAGE" = "内存使用率"
"DISK_USAGE" = "磁盘使用率"
"DOCKER_SERVICE_STATUS" = "Docker服务状态"
"MYSQL_STATUS" = "MySQL状态"
"REDIS_STATUS" = "Redis状态"
"EMQX_STATUS" = "EMQX状态"
"JAVA_VERSION" = "Java版本"
"PYTHON_VERSION" = "Python版本"
"NGINX_VERSION" = "Nginx版本"
"NACOS_VERSION" = "Nacos版本"
}
if ($displayNames.ContainsKey($Key)) {
return $displayNames[$Key]
}
return $Key
}
# ==================== 获取阈值 ====================
function Get-Threshold {
param(
[string]$Key
)
$thresholds = @{
"CPU_USAGE" = "大于85%"
"MEMORY_USAGE" = "大于85%"
"DISK_USAGE" = "大于90%"
}
if ($thresholds.ContainsKey($Key)) {
return $thresholds[$Key]
}
return "-"
}
# ==================== 根据值获取状态 ====================
function Get-StatusByValue {
param(
[string]$Key,
[string]$Value
)
if ($Value -match "^(正常|警告|严重|ERROR|OK)$") {
switch ($Value) {
"正常" { return "正常" }
"警告" { return "警告" }
"严重" { return "严重" }
"OK" { return "正常" }
"ERROR" { return "严重" }
default { return "正常" }
}
}
if ($Value -match "正常") { return "正常" }
if ($Value -match "警告") { return "警告" }
if ($Value -match "严重") { return "严重" }
if ($Value -match "([\d\.]+)%") {
$percent = [double]$matches[1]
switch ($Key) {
"CPU_USAGE" {
if ($percent -ge 100) { return "严重" }
if ($percent -ge 85) { return "警告" }
return "正常"
}
"MEMORY_USAGE" {
if ($percent -ge 95) { return "严重" }
if ($percent -ge 85) { return "警告" }
return "正常"
}
"DISK_USAGE" {
if ($percent -ge 95) { return "严重" }
if ($percent -ge 90) { return "警告" }
return "正常"
}
}
}
return "正常"
}
# ==================== 添加问题 ====================
function Add-Issue {
param(
[string]$Message,
[string]$Level = "警告"
)
if ([string]::IsNullOrWhiteSpace($Message)) {
return
}
if ($Level -eq "严重") {
$script:CriticalIssues.Add($Message)
}
else {
$script:WarningIssues.Add($Message)
}
}
# ==================== 保存测试结果 ====================
function Save-TestResult {
param(
[PSCustomObject]$Result
)
try {
$category = Get-ResultCategory -Result $Result
if (-not $script:TestResults.ContainsKey($category)) {
$script:TestResults[$category] = @()
}
$script:TestResults[$category] += $Result
}
catch {
Write-Log "保存测试结果失败: $($_.Exception.Message)" "WARN"
}
}
# ==================== 获取结果分类 ====================
function Get-ResultCategory {
param(
[PSCustomObject]$Result
)
$module = $Result.Module
switch -Regex ($module) {
"^(0[1-9]|1[0-2])_" {
$map = @{
"01_" = "系统基础信息"
"02_" = "CPU资源"
"03_" = "内存资源"
"04_" = "磁盘资源"
"05_" = "OOM检测"
"06_" = "进程状态"
"07_" = "网络连接"
"08_" = "安全合规"
"09_" = "系统日志"
"10_" = "时间同步"
"11_" = "定时任务"
"12_" = "端口服务"
}
foreach ($key in $map.Keys) {
if ($module -match "^$key") {
return $map[$key]
}
}
}
"^20_" { return "Docker容器" }
"^2[123]_" { return "MySQL数据库" }
"^2[45]_" { return "Redis缓存" }
"^2[67]_" { return "EMQX消息队列" }
"^28_" { return "Java应用" }
"^29_" { return "Python应用" }
"^30_" { return "Nginx应用" }
"^31_" { return "Nacos应用" }
"^32_" { return "FastDFS应用" }
"^33_" { return "应用日志" }
}
return "其他"
}
# ==================== 执行所有检测模块 ====================
function Invoke-AllChecks {
Write-Log ""
Write-Log "========================================" "INFO"
Write-Log "开始执行检测模块" "INFO"
Write-Log "========================================" "INFO"
$totalModules = $ModuleConfig.SystemModules.Count + $ModuleConfig.ServiceModules.Count
$currentModule = 0
Write-Log ""
Write-Log "--- 系统模块检测 ---" "INFO"
foreach ($module in $ModuleConfig.SystemModules) {
$currentModule++
Write-Host ""
Write-Host "[$currentModule/$totalModules] " -NoNewline
Write-Host "执行: $module" -ForegroundColor Cyan
$results = Invoke-ModuleCheck -ModuleName $module -Category "system"
foreach ($result in $results) {
Save-TestResult -Result $result
}
}
Write-Log ""
Write-Log "--- 服务模块检测 ---" "INFO"
foreach ($module in $ModuleConfig.ServiceModules) {
$currentModule++
Write-Host ""
Write-Host "[$currentModule/$totalModules] " -NoNewline
Write-Host "执行: $module" -ForegroundColor Cyan
$results = Invoke-ModuleCheck -ModuleName $module -Category "service"
foreach ($result in $results) {
Save-TestResult -Result $result
}
}
Write-Log ""
Write-Log "========================================" "INFO"
Write-Log "所有检测模块执行完成!" "INFO"
Write-Log "========================================" "INFO"
}
# ==================== 生成报告 ====================
function New-MarkdownReport {
$reportLines = @()
$reportLines += "# 服务器健康检查报告"
$reportLines += ""
$reportLines += "**生成时间**: $(Get-Date -Format 'yyyy-MM-dd HH:mm:ss')"
$reportLines += "**目标主机**: $script:HostName"
$reportLines += ""
$reportLines += "## 执行摘要"
$reportLines += ""
if ($script:CriticalIssues.Count -gt 0) {
$reportLines += "**整体状态**: 发现严重问题"
}
elseif ($script:WarningIssues.Count -gt 0) {
$reportLines += "**整体状态**: 发现警告问题"
}
else {
$reportLines += "**整体状态**: 系统运行正常"
}
$reportLines += ""
$reportLines += "- 严重问题: $($script:CriticalIssues.Count)"
$reportLines += "- 警告问题: $($script:WarningIssues.Count)"
$reportLines += ""
if ($script:CriticalIssues.Count -gt 0) {
$reportLines += "### 严重问题"
$reportLines += ""
foreach ($issue in $script:CriticalIssues) {
$reportLines += "- $issue"
}
$reportLines += ""
}
if ($script:WarningIssues.Count -gt 0) {
$reportLines += "### 警告问题"
$reportLines += ""
foreach ($issue in $script:WarningIssues) {
$reportLines += "- $issue"
}
$reportLines += ""
}
foreach ($category in $script:TestResults.Keys) {
$items = $script:TestResults[$category]
if ($items.Count -eq 0) { continue }
$reportLines += "## $category"
$reportLines += ""
$reportLines += "| 检查项 | 数值 | 阈值 | 状态 |"
$reportLines += "|:---|:---|:---|:---|"
foreach ($item in $items) {
$status = Get-StatusIcon -Status $item.Status
$reportLines += "| $($item.Name) | $($item.Value) | $($item.Threshold) | $status |"
}
$reportLines += ""
}
$reportLines += "---"
$reportLines += ""
$reportLines += "*本报告由服务器健康监测脚本 v4.0 自动生成*"
return $reportLines -join "`n"
}
# ==================== 获取状态图标 ====================
function Get-StatusIcon {
param(
[string]$Status
)
switch ($Status) {
"正常" { return "[正常]" }
"警告" { return "[警告]" }
"严重" { return "[严重]" }
default { return "[未知]" }
}
}
# ==================== 保存报告 ====================
function Save-Report {
param(
[string]$Content
)
$reportDir = Join-Path $scriptPath "reports"
if (-not (Test-Path $reportDir)) {
New-Item -ItemType Directory -Path $reportDir -Force | Out-Null
}
$fileName = "health_report_${script:HostName}_${timestamp}.md"
$filePath = Join-Path $reportDir $fileName
$Content | Out-File -FilePath $filePath -Encoding UTF8 -Force
Write-Log "报告已保存: $filePath" "INFO"
return $filePath
}
# ==================== 主函数 ====================
function Main {
Invoke-InteractiveInput
if (-not (Test-SSHConnection)) {
Write-Log "无法连接到服务器,退出!" "ERROR"
return
}
if (-not (Publish-Modules)) {
Write-Log "模块上传失败,退出!" "ERROR"
return
}
Invoke-AllChecks
Write-Log ""
Write-Log "生成测试报告..."
$reportContent = New-MarkdownReport
$reportPath = Save-Report -Content $reportContent
Write-Host ""
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "检测完成!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "严重问题: $($script:CriticalIssues.Count)" -ForegroundColor $(if ($script:CriticalIssues.Count -gt 0) { "Red" } else { "Green" })
Write-Host "警告问题: $($script:WarningIssues.Count)" -ForegroundColor $(if ($script:WarningIssues.Count -gt 0) { "Yellow" } else { "Green" })
Write-Host "报告路径: $reportPath" -ForegroundColor White
Write-Host "========================================" -ForegroundColor Cyan
}
# ==================== 执行入口 ====================
Main
################################################################################
# 服务器健康监测脚本 v3.0 (模块化架构)
# 功能: 通过SSH连接远程服务器,执行模块化系统健康检测并生成Markdown报告
# 作者: Claude Code
# 日期: 2026-05-09
################################################################################
param(
[Parameter(Mandatory=$false)]
[string]$HostName = "",
[Parameter(Mandatory=$false)]
[int]$Port = 0,
[Parameter(Mandatory=$false)]
[string]$Username = "",
[Parameter(Mandatory=$false)]
[string]$Password = ""
)
# 设置编码
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$OutputEncoding = [System.Text.Encoding]::UTF8
# ==================== 全局变量 ====================
$ErrorActionPreference = "Continue"
$scriptPath = Split-Path -Parent $MyInvocation.MyCommand.Path
$libPath = Join-Path $scriptPath "lib"
$modulePath = "/tmp/check_modules"
$timestamp = Get-Date -Format "yyyy-MM-dd_HH-mm-ss"
# 检测结果收集
$script:TestResults = @{
系统基础信息 = @{}
CPU资源 = @()
内存资源 = @()
磁盘资源 = @()
OOM检测 = @()
进程状态 = @()
网络连接 = @()
安全合规 = @()
系统日志 = @()
时间同步 = @()
端口服务 = @()
Docker容器 = @()
MySQL数据库 = @()
Redis缓存 = @()
EMQX消息队列 = @()
Java应用 = @()
Python应用 = @()
Nginx应用 = @()
Nacos应用 = @()
FastDFS应用 = @()
应用日志 = @()
}
# 问题列表
$script:CriticalIssues = New-Object System.Collections.Generic.List[string]
$script:WarningIssues = New-Object System.Collections.Generic.List[string]
# 系统信息
$script:systemInfo = @{}
# ==================== 模块配置 ====================
$ModuleConfig = @{
# 系统模块 (01-12)
SystemModules = @(
"01_system_basic.sh",
"02_cpu_check.sh",
"03_memory_check.sh",
"04_disk_check.sh",
"05_oom_check.sh",
"06_process_check.sh",
"07_network_check.sh",
"08_security_check.sh",
"09_system_logs.sh",
"10_time_sync.sh",
"11_scheduled_tasks.sh",
"12_port_check.sh"
)
# 服务模块 (20-33)
ServiceModules = @(
"20_docker_basic.sh",
"21_docker_deep.sh",
"22_mysql_basic.sh",
"23_mysql_depth.sh",
"24_redis_basic.sh",
"25_redis_depth.sh",
"26_emqx_basic.sh",
"27_emqx_deep.sh",
"28_java_check.sh",
"29_python_check.sh",
"30_nginx_check.sh",
"31_nacos_check.sh",
"32_fastdfs_check.sh",
"33_app_logs.sh"
)
# 远程模块路径
RemoteModulePath = $modulePath
# 本地模块路径
LocalModulePath = Join-Path $scriptPath "lib"
}
# ==================== 日志函数 ====================
function Write-Log {
param(
[string]$Message,
[string]$Level = "INFO"
)
$logTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
$logMessage = "[$logTime] [$Level] $Message"
switch ($Level) {
"ERROR" { Write-Host $logMessage -ForegroundColor Red }
"WARN" { Write-Host $logMessage -ForegroundColor Yellow }
"DEBUG" { Write-Host $logMessage -ForegroundColor DarkGray }
default { Write-Host $logMessage -ForegroundColor White }
}
}
# ==================== 交互式输入函数 ====================
function Invoke-InteractiveInput {
Write-Host "`n========================================" -ForegroundColor Cyan
Write-Host " 服务器健康监测脚本 v3.0" -ForegroundColor Cyan
Write-Host " (模块化架构)" -ForegroundColor Cyan
Write-Host "========================================" -ForegroundColor Cyan
Write-Host ""
# 输入主机地址
if ([string]::IsNullOrEmpty($script:HostName)) {
$script:HostName = Read-Host "请输入目标主机地址"
while ([string]::IsNullOrEmpty($script:HostName)) {
Write-Host "主机地址不能为空!" -ForegroundColor Red
$script:HostName = Read-Host "请输入目标主机地址"
}
}
# 输入SSH端口
if ($script:Port -eq 0) {
$portInput = Read-Host "请输入SSH端口 (默认: 22)"
if ([string]::IsNullOrEmpty($portInput)) {
$script:Port = 22
}
else {
while (-not ($portInput -match "^\d+$")) {
Write-Host "端口必须是数字!" -ForegroundColor Red
$portInput = Read-Host "请输入SSH端口 (默认: 22)"
}
$script:Port = [int]$portInput
}
}
# 输入用户名
if ([string]::IsNullOrEmpty($script:Username)) {
$script:Username = Read-Host "请输入SSH用户名 (默认: root)"
if ([string]::IsNullOrEmpty($script:Username)) {
$script:Username = "root"
}
}
# Prompt for the password (repeat until it is non-empty)
if ([string]::IsNullOrEmpty($script:Password)) {
do {
$securePassword = Read-Host "请输入SSH密码" -AsSecureString
# Decode the BSTR explicitly, then zero and free it so the plaintext copy does not linger in unmanaged memory
$bstr = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($securePassword)
try {
$script:Password = [System.Runtime.InteropServices.Marshal]::PtrToStringBSTR($bstr)
}
finally {
[System.Runtime.InteropServices.Marshal]::ZeroFreeBSTR($bstr)
}
if ([string]::IsNullOrEmpty($script:Password)) {
Write-Host "密码不能为空!" -ForegroundColor Red
}
} while ([string]::IsNullOrEmpty($script:Password))
}
# 确认信息
Write-Host "`n========================================" -ForegroundColor Cyan
Write-Host "连接信息确认:" -ForegroundColor Yellow
Write-Host " 主机地址: $script:HostName" -ForegroundColor White
Write-Host " SSH端口: $script:Port" -ForegroundColor White
Write-Host " 用户名: $script:Username" -ForegroundColor White
Write-Host "========================================" -ForegroundColor Cyan
$confirm = Read-Host "`n确认以上信息是否正确?(Y/N)"
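# Accept only y / yes (case-insensitive); the unanchored pattern matched any input containing the letter y
if ($confirm -notmatch '^\s*y(es)?\s*$') {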
Write-Host "已取消执行。" -ForegroundColor Yellow
exit 0
}
Write-Host ""
}
# ==================== SSH连接函数 ====================
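function Invoke-SSHCommand {
param(
[string]$Command,
# NOTE: -Timeout is accepted for call-site compatibility but is not enforced anywhere in this function
[int]$Timeout = 30
)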
try {
$plinkPath = Join-Path $scriptPath "plink.exe"
# 检查plink是否存在
if (-not (Test-Path $plinkPath)) {
Write-Log "plink.exe未找到: $plinkPath" "ERROR"
return $null
}
# 使用参数数组传递
$plinkArgs = @(
"-ssh",
"-P", $script:Port,
"-l", $script:Username,
"-pw", $script:Password,
"-batch",
$script:HostName,
$Command
)
# 执行命令
$result = & $plinkPath @plinkArgs 2>&1
$exitCode = $LASTEXITCODE
# 处理首次连接主机密钥问题
if ($exitCode -ne 0 -and ($result -match "host key" -or $result -match "Cannot confirm")) {
Write-Log "检测到主机密钥问题,自动接受..." "WARN"
$cmdLine = "echo y | `"$plinkPath`" -ssh -P $script:Port -l $script:Username -pw `"$script:Password`" $script:HostName `"$Command`""
$result = cmd /c $cmdLine 2>&1
$exitCode = $LASTEXITCODE
}
return $result
}
catch {
Write-Log "SSH命令执行异常: $($_.Exception.Message)" "ERROR"
return $null
}
}
function Test-SSHConnection {
Write-Log "测试SSH连接..."
$result = Invoke-SSHCommand "echo 'OK'" -Timeout 10
if ($result -match "OK") {
Write-Log "SSH连接成功!" "INFO"
return $true
}
else {
Write-Log "SSH连接失败!" "ERROR"
return $false
}
}
# ==================== 模块上传函数 ====================
function Publish-Modules {
Write-Log "开始上传检测模块到远程服务器..."
try {
# 清理并重新创建远程模块目录(确保使用最新文件)
Write-Log "清理旧模块目录..."
Invoke-SSHCommand "rm -rf $modulePath" | Out-Null
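# Create the remote module directories (listed explicitly so the command does not depend on bash brace expansion)
Invoke-SSHCommand "mkdir -p $modulePath/lib $modulePath/system $modulePath/service $modulePath/utils" | Out-Null
Write-Log "创建远程模块目录: $modulePath"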
# 上传配置文件
Write-Log "上传配置文件..."
$localConfig = Join-Path $libPath "config.sh"
if (Test-Path $localConfig) {
$pscpPath = Join-Path $scriptPath "pscp.exe"
if (Test-Path $pscpPath) {
$uploadResult = & $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localConfig "${script:Username}@${script:HostName}:$modulePath/lib/" 2>&1
if ($LASTEXITCODE -eq 0) {
Write-Log "config.sh 上传完成"
}
else {
Write-Log "config.sh 上传失败: $uploadResult" "ERROR"
}
}
else {
Write-Log "pscp.exe 不存在于: $pscpPath" "ERROR"
}
}
else {
Write-Log "本地配置文件不存在: $localConfig" "ERROR"
}
# 上传通用函数库
Write-Log "上传通用函数库..."
$localCommon = Join-Path $libPath "common.sh"
if (Test-Path $localCommon) {
$pscpPath = Join-Path $scriptPath "pscp.exe"
if (Test-Path $pscpPath) {
$uploadResult = & $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localCommon "${script:Username}@${script:HostName}:$modulePath/lib/" 2>&1
if ($LASTEXITCODE -eq 0) {
Write-Log "common.sh 上传完成"
}
else {
Write-Log "common.sh 上传失败: $uploadResult" "ERROR"
}
}
}
else {
Write-Log "本地通用函数库不存在: $localCommon" "ERROR"
}
# 上传系统模块
Write-Log "上传系统检测模块..."
$systemModuleDir = Join-Path $libPath "system"
if (Test-Path $systemModuleDir) {
$pscpPath = Join-Path $scriptPath "pscp.exe"
foreach ($module in $ModuleConfig.SystemModules) {
$localModule = Join-Path $systemModuleDir $module
if (Test-Path $localModule) {
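& $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localModule "${script:Username}@${script:HostName}:$modulePath/system/" 2>&1 | Out-Null
# Only report success when pscp actually exited cleanly
if ($LASTEXITCODE -eq 0) {
Write-Log " $module 上传完成"
}
else {
Write-Log " $module 上传失败 (exit code: $LASTEXITCODE)" "ERROR"
}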
}
}
}
# 上传服务模块
Write-Log "上传服务检测模块..."
$serviceModuleDir = Join-Path $libPath "service"
if (Test-Path $serviceModuleDir) {
$pscpPath = Join-Path $scriptPath "pscp.exe"
foreach ($module in $ModuleConfig.ServiceModules) {
$localModule = Join-Path $serviceModuleDir $module
if (Test-Path $localModule) {
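& $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localModule "${script:Username}@${script:HostName}:$modulePath/service/" 2>&1 | Out-Null
# Only report success when pscp actually exited cleanly
if ($LASTEXITCODE -eq 0) {
Write-Log " $module 上传完成"
}
else {
Write-Log " $module 上传失败 (exit code: $LASTEXITCODE)" "ERROR"
}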
}
}
}
# 设置执行权限
Write-Log "设置模块执行权限..."
Invoke-SSHCommand "chmod +x $modulePath/system/*.sh $modulePath/service/*.sh $modulePath/lib/*.sh" | Out-Null
# 验证关键文件是否存在
Write-Log "验证上传文件..."
$configCheck = Invoke-SSHCommand "test -f $modulePath/lib/config.sh && echo 'OK' || echo 'FAIL'"
$commonCheck = Invoke-SSHCommand "test -f $modulePath/lib/common.sh && echo 'OK' || echo 'FAIL'"
if ($configCheck -match "OK" -and $commonCheck -match "OK") {
Write-Log "关键文件验证成功!" "INFO"
# 检查模块文件内容,确认是否为新版本(检查source行)
Write-Log "检查模块文件版本..."
$moduleSourceCheck = Invoke-SSHCommand "grep 'source.*lib/config.sh' $modulePath/system/01_system_basic.sh"
if ($moduleSourceCheck -match 'source.*"\$LIB_DIR/lib/config.sh"') {
Write-Log "模块文件版本正确!" "INFO"
}
else {
Write-Log "模块文件版本错误!检测到: $moduleSourceCheck" "WARN"
Write-Log "正在强制重新上传所有模块文件..." "WARN"
# 强制重新上传system和service模块
$pscpPath = Join-Path $scriptPath "pscp.exe"
$systemModuleDir = Join-Path $libPath "system"
if (Test-Path $systemModuleDir) {
foreach ($module in $ModuleConfig.SystemModules) {
$localModule = Join-Path $systemModuleDir $module
if (Test-Path $localModule) {
$uploadResult = & $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localModule "${script:Username}@${script:HostName}:$modulePath/system/" 2>&1
Write-Log " 重新上传 system/$module"
}
}
}
$serviceModuleDir = Join-Path $libPath "service"
if (Test-Path $serviceModuleDir) {
foreach ($module in $ModuleConfig.ServiceModules) {
$localModule = Join-Path $serviceModuleDir $module
if (Test-Path $localModule) {
$uploadResult = & $pscpPath -P $script:Port -l $script:Username -pw $script:Password $localModule "${script:Username}@${script:HostName}:$modulePath/service/" 2>&1
Write-Log " 重新上传 service/$module"
}
}
}
# 重新验证
$moduleSourceCheck = Invoke-SSHCommand "grep 'source.*lib/config.sh' $modulePath/system/01_system_basic.sh"
if ($moduleSourceCheck -match 'source.*"\$LIB_DIR/lib/config.sh"') {
Write-Log "重新上传后验证成功!" "INFO"
} else {
Write-Log "重新上传后仍然失败!内容: $moduleSourceCheck" "ERROR"
}
}
}
else {
Write-Log "关键文件验证失败!config.sh: $configCheck, common.sh: $commonCheck" "ERROR"
return $false
}
Write-Log "所有模块上传完成!" "INFO"
return $true
}
catch {
Write-Log "模块上传失败: $($_.Exception.Message)" "ERROR"
return $false
}
}
# ==================== 模块执行函数 ====================
function Invoke-ModuleCheck {
param(
[string]$ModuleName,
[string]$Category
)
Write-Log "执行模块: $Category/$ModuleName"
try {
# Run the module directly (each module sources its own config files, so nothing is sourced here)
$result = Invoke-SSHCommand "cd $modulePath && bash $Category/$ModuleName" -Timeout 60
if ($null -eq $result) {
Write-Log "模块 $ModuleName 执行超时或失败" "WARN"
return @()
}
# Parse the result; plink returns one string per line, so join with newlines before binding to the [string] parameter
$parsedResults = Parse-ModuleResult -RawOutput ($result -join "`n") -ModuleName $ModuleName
return $parsedResults
}
catch {
Write-Log "模块执行异常: $($_.Exception.Message)" "ERROR"
return @()
}
}
# ==================== 结果解析函数 ====================
function Parse-ModuleResult {
param(
[string]$RawOutput,
[string]$ModuleName
)
$results = @()
$lines = $RawOutput -split "`n"
# 确定模块所属分类
$category = Get-ModuleCategory -ModuleName $ModuleName
foreach ($line in $lines) {
$line = $line.Trim()
# 跳过空行和注释
if ([string]::IsNullOrEmpty($line) -or $line -match "^#" -or $line -match "^\[") {
continue
}
# 处理错误输出
if ($line -match "^ERROR:(.+)") {
Write-Log "$ModuleName 错误: $($matches[1].Trim())" "WARN"
continue
}
# 处理标准格式输出 KEY:VALUE
if ($line -match "^([^:]+):(.+)$") {
$key = $matches[1].Trim()
$value = $matches[2].Trim()
# 转换为检测结果对象
$result = Convert-ToResultObject -Key $key -Value $value -ModuleName $ModuleName -Category $category
if ($result) {
$results += $result
}
}
}
return $results
}
# ==================== 获取模块分类 ====================
function Get-ModuleCategory {
param(
[string]$ModuleName
)
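switch -Regex ($ModuleName) {
# Map each system module to its own category so the per-category keys in $script:TestResults are actually filled
"^01_" { return "系统基础信息" }
"^02_" { return "CPU资源" }
"^03_" { return "内存资源" }
"^04_" { return "磁盘资源" }
"^05_" { return "OOM检测" }
"^06_" { return "进程状态" }
"^07_" { return "网络连接" }
"^08_" { return "安全合规" }
"^09_" { return "系统日志" }
"^10_" { return "时间同步" }
"^11_" { return "定时任务" }
"^12_" { return "端口服务" }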
"^20_" { return "Docker容器" }
"^2[123]_" { return "MySQL数据库" }
"^2[45]_" { return "Redis缓存" }
"^2[67]_" { return "EMQX消息队列" }
"^28_" { return "Java应用" }
"^29_" { return "Python应用" }
"^30_" { return "Nginx应用" }
"^31_" { return "Nacos应用" }
"^32_" { return "FastDFS应用" }
"^33_" { return "应用日志" }
default { return "其他" }
}
}
# ==================== 转换为结果对象 ====================
function Convert-ToResultObject {
param(
[string]$Key,
[string]$Value,
[string]$ModuleName,
[string]$Category
)
try {
# 根据Key确定检测项名称
$name = Get-DisplayName -Key $Key
# 根据Key确定阈值
$threshold = Get-Threshold -Key $Key
# 根据Key和Value确定状态
$status = Get-StatusByValue -Key $Key -Value $Value
# 创建结果对象
$result = [PSCustomObject]@{
Name = $name
Value = $Value
Threshold = $threshold
Status = $status
Message = ""
Module = $ModuleName
Key = $Key
}
# 如果状态不是正常,添加到问题列表
if ($status -eq "严重" -or $status -eq "警告") {
$issueMsg = "${name}: $Value"
Add-Issue -Message $issueMsg -Level $status
}
return $result
}
catch {
Write-Log "转换结果对象失败: $($_.Exception.Message)" "WARN"
return $null
}
}
# ==================== 获取显示名称 ====================
function Get-DisplayName {
param(
[string]$Key
)
$displayNames = @{
"HOSTNAME" = "主机名"
"OS_VERSION" = "操作系统版本"
"KERNEL_VERSION" = "内核版本"
"UPTIME_DAYS" = "运行时间(天)"
"LOAD_1MIN" = "1分钟负载"
"LOAD_5MIN" = "5分钟负载"
"LOAD_15MIN" = "15分钟负载"
"CPU_USAGE" = "CPU使用率"
"MEMORY_USAGE" = "内存使用率"
"DISK_USAGE" = "磁盘使用率"
"DOCKER_STATUS" = "Docker状态"
"MYSQL_STATUS" = "MySQL状态"
"REDIS_STATUS" = "Redis状态"
}
if ($displayNames.ContainsKey($Key)) {
return $displayNames[$Key]
}
# 默认返回Key本身
return $Key
}
# ==================== 获取阈值 ====================
function Get-Threshold {
param(
[string]$Key
)
$thresholds = @{
"CPU_USAGE" = ">85%"
"MEMORY_USAGE" = ">85%"
"DISK_USAGE" = ">90%"
}
if ($thresholds.ContainsKey($Key)) {
return $thresholds[$Key]
}
return "-"
}
# ==================== 根据值获取状态 ====================
function Get-StatusByValue {
param(
[string]$Key,
[string]$Value
)
# 如果Value已经是状态值
if ($Value -match "^(正常|警告|严重|ERROR|OK)$") {
switch ($Value) {
"正常" { return "正常" }
"警告" { return "警告" }
"严重" { return "严重" }
"OK" { return "正常" }
"ERROR" { return "严重" }
default { return "正常" }
}
}
# 如果Value是百分比,进行判断
if ($Value -match "([\d\.]+)%") {
$percent = [double]$matches[1]
switch ($Key) {
"CPU_USAGE" {
if ($percent -ge 100) { return "严重" }
if ($percent -ge 85) { return "警告" }
return "正常"
}
"MEMORY_USAGE" {
if ($percent -ge 95) { return "严重" }
if ($percent -ge 85) { return "警告" }
return "正常"
}
"DISK_USAGE" {
if ($percent -ge 95) { return "严重" }
if ($percent -ge 90) { return "警告" }
return "正常"
}
}
}
# 默认返回正常
return "正常"
}
# ==================== 添加问题 ====================
function Add-Issue {
param(
[string]$Message,
[string]$Level = "警告"
)
try {
if ([string]::IsNullOrWhiteSpace($Message)) {
return
}
if ($Level -eq "严重") {
$script:CriticalIssues.Add($Message)
}
else {
$script:WarningIssues.Add($Message)
}
}
catch {
Write-Log "添加问题失败: $($_.Exception.Message)" "WARN"
}
}
# ==================== 保存检测结果 ====================
function Save-TestResult {
param(
[string]$Category,
[PSCustomObject]$Result
)
try {
if ($script:TestResults.ContainsKey($Category)) {
$currentValue = $script:TestResults[$Category]
# 判断当前值是哈希表还是数组
if ($currentValue -is [hashtable]) {
# 哈希表类型:使用键值对存储(用于系统基础信息)
$script:TestResults[$Category][$Result.Name] = $Result
}
elseif ($currentValue -is [array]) {
# 数组类型:追加元素
$script:TestResults[$Category] += $Result
}
else {
# 未初始化:根据分类类型初始化
if ($Category -eq "系统基础信息") {
$script:TestResults[$Category] = @{$Result.Name = $Result}
} else {
$script:TestResults[$Category] = @($Result)
}
}
}
else {
# 键不存在,根据分类类型初始化
if ($Category -eq "系统基础信息") {
$script:TestResults[$Category] = @{$Result.Name = $Result}
} else {
$script:TestResults[$Category] = @($Result)
}
}
}
catch {
Write-Log "保存检测结果失败: $($_.Exception.Message)" "WARN"
}
}
# ==================== 执行所有检测模块 ====================
function Invoke-AllChecks {
Write-Log "`n========================================" "INFO"
Write-Log "开始执行检测模块" "INFO"
Write-Log "========================================" "INFO"
$totalModules = $ModuleConfig.SystemModules.Count + $ModuleConfig.ServiceModules.Count
$currentModule = 0
# 执行系统模块
Write-Log "`n--- 系统模块检测 ---" "INFO"
foreach ($module in $ModuleConfig.SystemModules) {
$currentModule++
Write-Host "`n[$currentModule/$totalModules] " -NoNewline
Write-Host "执行: $module" -ForegroundColor Cyan
$results = Invoke-ModuleCheck -ModuleName $module -Category "system"
$category = Get-ModuleCategory -ModuleName $module
foreach ($result in $results) {
Save-TestResult -Category $category -Result $result
}
}
# 执行服务模块
Write-Log "`n--- 服务模块检测 ---" "INFO"
foreach ($module in $ModuleConfig.ServiceModules) {
$currentModule++
Write-Host "`n[$currentModule/$totalModules] " -NoNewline
Write-Host "执行: $module" -ForegroundColor Cyan
$results = Invoke-ModuleCheck -ModuleName $module -Category "service"
$category = Get-ModuleCategory -ModuleName $module
foreach ($result in $results) {
Save-TestResult -Category $category -Result $result
}
}
Write-Log "`n========================================" "INFO"
Write-Log "所有检测模块执行完成!" "INFO"
Write-Log "========================================" "INFO"
}
# ==================== 生成报告 ====================
function New-MarkdownReport {
param(
[hashtable]$TestResults,
[System.Collections.Generic.List[string]]$CriticalIssues,
[System.Collections.Generic.List[string]]$WarningIssues
)
$reportLines = @()
# 报告头部
$reportLines += "# 服务器健康检测报告"
$reportLines += ""
$reportLines += "**生成时间**: $(Get-Date -Format 'yyyy-MM-dd HH:mm:ss')"
$reportLines += "**目标主机**: $script:HostName"
$reportLines += ""
# 执行摘要
$reportLines += "## 执行摘要"
$reportLines += ""
$totalIssues = $CriticalIssues.Count + $WarningIssues.Count
if ($CriticalIssues.Count -gt 0) {
$reportLines += "**总体状态**: 🔴 严重"
}
elseif ($WarningIssues.Count -gt 0) {
$reportLines += "**总体状态**: 🟡 警告"
}
else {
$reportLines += "**总体状态**: 🟢 正常"
}
$reportLines += ""
$reportLines += "- 严重问题: $($CriticalIssues.Count)"
$reportLines += "- 警告问题: $($WarningIssues.Count)"
$reportLines += ""
# 严重问题列表
if ($CriticalIssues.Count -gt 0) {
$reportLines += "### 🔴 严重问题"
$reportLines += ""
foreach ($issue in $CriticalIssues) {
$reportLines += "- $issue"
}
$reportLines += ""
}
# 警告问题列表
if ($WarningIssues.Count -gt 0) {
$reportLines += "### 🟡 警告问题"
$reportLines += ""
foreach ($issue in $WarningIssues) {
$reportLines += "- $issue"
}
$reportLines += ""
}
# 系统基础信息
if ($TestResults["系统基础信息"].Count -gt 0) {
$reportLines += "## 系统基础信息"
$reportLines += ""
$info = $TestResults["系统基础信息"]
foreach ($item in $info.Values) {
if ($item -is [PSCustomObject]) {
$status = Get-StatusIcon -Status $item.Status
$reportLines += "- **$($item.Name)**: $($item.Value) $status"
}
}
$reportLines += ""
}
# 详细检测结果
foreach ($category in $TestResults.Keys) {
if ($category -eq "系统基础信息") { continue }
$items = $TestResults[$category]
if ($items.Count -eq 0) { continue }
$reportLines += "## $category"
$reportLines += ""
# 创建表格
$reportLines += "| 检测项 | 数值 | 阈值 | 状态 |"
$reportLines += "|:---|:---|:---|:---|"
foreach ($item in $items) {
$status = Get-StatusIcon -Status $item.Status
$reportLines += "| $($item.Name) | $($item.Value) | $($item.Threshold) | $status |"
}
$reportLines += ""
}
# 报告尾部
$reportLines += "---"
$reportLines += ""
$reportLines += "*本报告由服务器健康监测脚本 v3.0 自动生成*"
return $reportLines -join "`n"
}
# ==================== 获取状态图标 ====================
function Get-StatusIcon {
param(
[string]$Status
)
switch ($Status) {
"正常" { return "🟢" }
"警告" { return "🟡" }
"严重" { return "🔴" }
default { return "⚪" }
}
}
# ==================== 保存报告 ====================
function Save-Report {
param(
[string]$Content
)
$reportDir = Join-Path $scriptPath "reports"
if (-not (Test-Path $reportDir)) {
New-Item -ItemType Directory -Path $reportDir -Force | Out-Null
}
$fileName = "health_report_${script:HostName}_${timestamp}.md"
$filePath = Join-Path $reportDir $fileName
$Content | Out-File -FilePath $filePath -Encoding UTF8 -Force
Write-Log "报告已保存: $filePath" "INFO"
return $filePath
}
# ==================== 主函数 ====================
function Main {
# 交互式输入
Invoke-InteractiveInput
# 测试SSH连接
if (-not (Test-SSHConnection)) {
Write-Log "无法连接到服务器,退出执行!" "ERROR"
return
}
# 上传模块
if (-not (Publish-Modules)) {
Write-Log "模块上传失败,退出执行!" "ERROR"
return
}
# 执行检测
Invoke-AllChecks
# 生成报告
Write-Log "`n生成检测报告..."
$reportContent = New-MarkdownReport -TestResults $script:TestResults -CriticalIssues $script:CriticalIssues -WarningIssues $script:WarningIssues
$reportPath = Save-Report -Content $reportContent
# 显示摘要
Write-Host "`n========================================" -ForegroundColor Cyan
Write-Host "检测完成!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "严重问题: $($script:CriticalIssues.Count)" -ForegroundColor $(if ($script:CriticalIssues.Count -gt 0) { "Red" } else { "Green" })
Write-Host "警告问题: $($script:WarningIssues.Count)" -ForegroundColor $(if ($script:WarningIssues.Count -gt 0) { "Yellow" } else { "Green" })
Write-Host "报告路径: $reportPath" -ForegroundColor White
Write-Host "========================================" -ForegroundColor Cyan
}
# ==================== 执行入口 ====================
Main
# PowerShell脚本修复辅助脚本
# 用于修复check_server_health.ps1的编码和语法问题
$scriptPath = "check_server_health.ps1"
# 读取文件内容
$content = Get-Content $scriptPath -Raw -Encoding UTF8
# 修复所有的问题字符串
$content = $content -replace '检测失败或无数据`n"', '检测失败或无数据`n"'
$content = $content -replace '\$([^)]+)\.Name', '${_}.Name'
$content = $content -replace '\$([^)]+)\.Value', '${_}.Value'
$content = $content -replace '\$([^)]+)\.Status', '${_}.Status'
$content = $content -replace '\$([^)]+)\.Threshold', '${_}.Threshold'
$content = $content -replace '\$([^)]+)\.Message', '${_}.Message'
# 修复特定变量引用问题
$content = $content -replace '容器\$([^}]+)', '容器${1}'
$content = $content -replace '容器\$([a-zA-Z_]+)', '容器${1}'
# 保存为UTF-8 with BOM
$utf8 = New-Object System.Text.UTF8Encoding $true
[System.IO.File]::WriteAllText((Resolve-Path $scriptPath).Path, $content, $utf8)
Write-Host "脚本修复完成!" -ForegroundColor Green
Write-Host "文件已使用UTF-8 with BOM编码保存" -ForegroundColor Green
@@ -12,10 +12,10 @@ if [ -z "$LIB_DIR" ]; then
 fi
 # 加载配置文件
-if [ -f "$LIB_DIR/config.sh" ]; then
-source "$LIB_DIR/config.sh"
+if [ -f "$LIB_DIR/lib/config.sh" ]; then
+source "$LIB_DIR/lib/config.sh"
 else
-echo "ERROR: 配置文件不存在: $LIB_DIR/config.sh" >&2
+echo "ERROR: 配置文件不存在: $LIB_DIR/lib/config.sh" >&2
 exit 1
 fi
......
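The hunk above moves the config lookup from `$LIB_DIR/config.sh` to `$LIB_DIR/lib/config.sh`. A minimal sketch of the same check, wrapped in a function so it can be tried against a throwaway directory tree; `load_config` and `CPU_WARN_THRESHOLD` are hypothetical names used only for illustration and are not taken from the original common.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the original common.sh):
# source <lib_dir>/lib/config.sh, or fail with the same error text as the hunk above.
load_config() {
    local lib_dir="$1"
    local config_file="$lib_dir/lib/config.sh"
    if [ -f "$config_file" ]; then
        # Variables assigned by the sourced file become global
        source "$config_file"
    else
        echo "ERROR: 配置文件不存在: $config_file" >&2
        return 1
    fi
}

# Demo: build a temporary tree that mirrors the remote layout (<root>/lib/config.sh)
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/lib"
echo 'CPU_WARN_THRESHOLD=85' > "$demo_root/lib/config.sh"
load_config "$demo_root" && echo "loaded: CPU_WARN_THRESHOLD=$CPU_WARN_THRESHOLD"
rm -rf "$demo_root"
```

Returning a non-zero status instead of calling `exit 1` is a choice made for this sketch so the function can be tested in isolation; the original script exits directly.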
# 测试脚本语法
$ErrorActionPreference = "Stop"
try {
# 尝试解析脚本
$scriptPath = Join-Path $PSScriptRoot "check_server_health.ps1"
$content = Get-Content $scriptPath -Raw
Write-Host "正在检查脚本语法..." -ForegroundColor Cyan
# 检查关键修复点
if ($content -match '\$MYSQL_PASSWORD = ''[^'']*''') {
Write-Host "✓ MySQL密码变量语法正确" -ForegroundColor Green
} else {
Write-Host "✗ MySQL密码变量可能有问题" -ForegroundColor Red
}
if ($content -match '\$REDIS_PASSWORD = ''[^'']*''') {
Write-Host "✓ Redis密码变量语法正确" -ForegroundColor Green
} else {
Write-Host "✗ Redis密码变量可能有问题" -ForegroundColor Red
}
Write-Host "`n语法检查完成!" -ForegroundColor Green
}
catch {
Write-Host "语法检查失败: $($_.Exception.Message)" -ForegroundColor Red
}
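# DingTalk Notification Optimization: Requirements
## Related materials
### DingTalk scripts
- Docs/PRD/AI服务器监测/通用模块
### Script runtime environment
- The scripts run from the desktop, so I will copy the relevant script materials to the desktop.
### Requested optimization
- Before the DingTalk notification is sent, the latest report file under [通用模块/钉钉通知/reports] must be turned into a link that is reachable from the public internet.
- ngrok details:
  - ngrok lives in the sibling folder [通用模块/钉钉通知/ngrok].
  - Run start.bat to start the ngrok service.
  - The port mapping is local port 80 -> 19981.
  - Everything is already configured; just run it.
- Start an HTTP server to serve the reports:
  - python -m http.server 80 --directory reports
  - Serving path: [C:\Users\UBAINS\Desktop\Test]
- Rule for building the full link:
  - https://nat.ubainsyun.com:19981/通用模块/钉钉通知/reports/*.md
- Add the link to the message before sending the DingTalk notification, as in the following example: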
- ```ignorelang
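🖥 Server Inspection Report - Showroom Environment
Time: 2026-05-15T17:05:34
Host: 192.168.5.202 (localhost)
Status: 🔴 CRITICAL (14 critical, 9 warnings)
📊 Core metrics
| Metric | Current | Threshold | Status |
|:---|:---|:---|:---|
| CPU usage | 1.6% | 85% | 🟢 |
| Memory usage | 73.0% | 85% | 🟢 |
| Swap usage | 2.6/7.8GB | >20% | 🔴 |
| Total threads | 5440 | 1000 | 🔴 |
🚨 Critical issues
• Memory: OOM Killer event detected
• Process: total thread count too high: 5440
• Process: too many orphan processes: 59
• Security: too many authentication failures in 24 hours: 2964
• Security: SSH brute-force attack detected
• ...and others, 14 critical issues in total
🐳 Container status
• umysql: 🟢 running
• uredis: 🟢 running
• uemqx: 🟢 running
• ujava3: 🟢 running
• unacos: 🟢 running
• upython: 🟢 running
• ujava: 🔴 exited
• unginx: 🔴 not_exist
💡 AI analysis and recommendations
🔴 Handle the SSH brute-force attack immediately: 2942 brute-force attempts from 192.168.9.51 were detected; block that IP right away and harden the SSH configuration.
🟠 Check and restore the abnormal services: the Nginx container is not running, Nacos health is DOWN, and Java Web port 8080 is not listening; these need to be restored as soon as possible.
🟠 Investigate the high thread count: the system thread count of 5440 is far above the recommended threshold of 1000, which may indicate a thread leak.
Score: 35/100 | Risk level: CRITICAL
Report generated by: ai_health_check_v4.0 | Duration: 180 s
📄 View the full report
https://nat.ubainsyun.com:19981/通用模块/钉钉通知/reports/*.md
2026-05-15 10:36:57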
```
\ No newline at end of file
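The link rule in the requirements above (base URL plus the newest report file name) can be sketched as a small shell helper. This is only an illustration under stated assumptions: `latest_report_url` is a hypothetical function name, a bash-capable shell is assumed to be available on the desktop machine, and percent-encoding of the non-ASCII path segments is deliberately left out.

```shell
#!/usr/bin/env bash
# Sketch only: latest_report_url is a hypothetical helper, not part of the commit.
# The default base URL follows the link rule stated in the requirements document.

latest_report_url() {
    local reports_dir="$1"
    local base_url="${2:-https://nat.ubainsyun.com:19981/通用模块/钉钉通知/reports}"
    local latest

    # ls -t sorts by modification time, newest first; head keeps the first entry.
    # Assumes report file names contain no newlines.
    latest=$(ls -t "$reports_dir"/*.md 2>/dev/null | head -n 1) || true
    if [ -z "$latest" ]; then
        echo "ERROR: no .md report found in $reports_dir" >&2
        return 1
    fi

    # Append only the file name; non-ASCII characters are not percent-encoded here.
    printf '%s/%s\n' "$base_url" "$(basename "$latest")"
}
```

Calling `latest_report_url ./reports` prints the link that would be appended to the message; a second argument overrides the base URL.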
@@ -42,14 +42,15 @@
 # Workflow
 ## Step 1: Run the script
-- Run ["服务器监测\check_server_health_v5.ps1"] in the same directory
+- Run ["服务器监测\check_server_health.ps1"] in the same directory
 - Fill in the server details given above
 ## Step 2: Analyze the report
 - Compare against the previous inspection results and generate structured JSON data; see [JSON格式说明.md] in the same directory
 ## Step 3: Send the DingTalk notification
-- Send the DingTalk message as described in [Docs/PRD/AI服务器监测/通用模块/钉钉通知/README.md].
+- Copy the generated report into the DingTalk notification directory [通用模块/钉钉通知/reports].
+- Send the DingTalk message as described in [通用模块/钉钉通知/README.md] in the same directory.
 ---
......
@@ -124,44 +124,66 @@ sed -i 's|$LIB_DIR/common.sh|$LIB_DIR/lib/common.sh|g' lib/service/*.sh
 ### 7.1 Fixes applied
-#### Fix details
-Batch-updated the config file path references in all module scripts:
-**Commands used**:
+#### Fix 1: Batch-update the config file path in module scripts
+**Change**: config file path in 31 module scripts
 ```bash
-# Update the paths in the if statements
-sed -i 's|"$LIB_DIR/config.sh"|"$LIB_DIR/lib/config.sh"|g' lib/system/*.sh lib/service/*.sh
-sed -i 's|"$LIB_DIR/common.sh"|"$LIB_DIR/lib/common.sh"|g' lib/system/*.sh lib/service/*.sh
-# Update the paths in the error messages
-sed -i 's|配置文件不存在: $LIB_DIR/config.sh|配置文件不存在: $LIB_DIR/lib/config.sh|g' lib/system/*.sh lib/service/*.sh
-sed -i 's|通用函数库不存在: $LIB_DIR/common.sh|通用函数库不存在: $LIB_DIR/lib/common.sh|g' lib/system/*.sh lib/service/*.sh
+# Before
+source "$LIB_DIR/config.sh" # wrong path
+# After
+source "$LIB_DIR/lib/config.sh" # correct path
 ```
-**Files affected**: 31 module scripts
-- System modules: 13
-- Service modules: 18
-**Synced copies**:
-- Project copy: `AuxiliaryTool/ScriptTool/服务器监测/lib/`
-- Desktop copy: `C:/Users/UBAINS/Desktop/Test/lib/`
-### 7.2 Pending improvements
-- [x] **Clean old files before upload**: added `rm -rf $modulePath` to remove old modules so the latest files are always used
-- [ ] **Path variables**: consider defining `$CONFIG_FILE` and `$COMMON_LIB` variables in the modules to avoid repeating hard-coded paths
-- [ ] **Path validation**: print more detailed error and diagnostic information before loading the config file
-- [ ] **Shared module template**: create a module template file so this kind of path problem does not recur
-### 7.3 Additional fix
-**Problem**: files uploaded with pscp did not overwrite the old files on the server, so old module versions were used
-**Solution**: add a cleanup step to the `Publish-Modules` function:
+**Files affected**: 31 module scripts (system: 13, service: 18)
+#### Fix 2: Correct the config file path in common.sh (key fix)
+**Problem**: the config file path in common.sh was wrong
+```bash
+# Before
+if [ -f "$LIB_DIR/config.sh" ]; then
+    source "$LIB_DIR/config.sh"
+# After
+if [ -f "$LIB_DIR/lib/config.sh" ]; then
+    source "$LIB_DIR/lib/config.sh"
+```
+**Files affected**: lib/common.sh
+#### Fix 3: Correct the PowerShell module execution command
+**Problem**: modules were run via a relative path, which caused working-directory problems
+**Fix**:
+```powershell
+# Before
+Invoke-SSHCommand "cd $modulePath && bash $Category/$ModuleName"
+# After
+$moduleFullPath = "$modulePath/$Category/$ModuleName"
+Invoke-SSHCommand "bash $moduleFullPath"
+```
+#### Fix 4: Add cleanup before upload
+**Fix**: remove the old module directory before uploading
 ```powershell
 # Clean and recreate the remote module directory (ensures the latest files are used)
 Write-Log "Cleaning old module directory..."
 Invoke-SSHCommand "rm -rf $modulePath" | Out-Null
 ```
-**Location**: `check_server_health_v5.ps1`, line 267
+**Synced copies**:
+- Project copy: `AuxiliaryTool/ScriptTool/服务器监测/`
+- Desktop copy: `C:/Users/UBAINS/Desktop/Test/`
+#### Verification results
+- ✅ All 26 modules ran to completion
+- ✅ No config file path errors
+- ✅ Inspection report generated normally
+### 7.2 Pending improvements
+- [x] **Clean old files before upload**: added `rm -rf $modulePath` to remove old modules so the latest files are always used
+- [ ] **Path variables**: consider defining `$CONFIG_FILE` and `$COMMON_LIB` variables in the modules to avoid repeating hard-coded paths
+- [ ] **Path validation**: print more detailed error and diagnostic information before loading the config file
+- [ ] **Shared module template**: create a module template file so this kind of path problem does not recur
+- [ ] **Module timeouts**: 09_system_logs.sh and 10_time_sync.sh time out and need more efficient checks
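Because the path fix was applied across 31 module scripts, a quick scan for leftovers of the old reference can confirm that none were missed. The sketch below is an assumption-laden illustration: `check_config_paths` is a hypothetical helper, and it relies on `grep` supporting `-r` and `--include` (GNU and BSD grep both do).

```shell
#!/usr/bin/env bash
# Sketch only: check_config_paths is a hypothetical helper, not part of the commit.

check_config_paths() {
    local root="$1"
    # -r recurse, -n show line numbers, -F treat the pattern as literal text.
    # The single quotes keep $LIB_DIR unexpanded so the literal old reference is matched.
    # Fixed scripts contain $LIB_DIR/lib/config.sh, which does not include this substring.
    if grep -rnF --include='*.sh' '$LIB_DIR/config.sh' "$root"; then
        echo "ERROR: stale config path references found (listed above)" >&2
        return 1
    fi
    echo "OK: no stale config path references under $root"
}
```

Running `check_config_paths lib` from the script directory lists any file and line that still uses the pre-fix path and returns a non-zero status in that case.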