
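Service Governance

Rate Limiting

What is rate limiting, and why is it needed?

Answer: Rate limiting caps the rate at which a system accepts requests so that it does not become overloaded.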

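Why is rate limiting needed?

  • Protect the system and prevent cascading failures (the "avalanche" effect)
  • Keep core business functions available
  • Defend against malicious attacks
  • Control costs

Common scenarios

  • Flash sales
  • Hot-data access
  • API calls
  • Crawler protection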

What are the common rate limiting algorithms?

Answer:

1. Fixed window algorithm

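java
public class FixedWindowRateLimiter {
    private final int limit = 100;         // at most 100 requests per window
    private final long windowSize = 1000;  // window length in ms (1 second)
    private int counter = 0;
    private long windowStart = System.currentTimeMillis();

    // synchronized: the window reset and the count must change together,
    // otherwise concurrent callers can lose increments or reset twice
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowSize) {
            // A new window has started: reset the counter
            windowStart = now;
            counter = 0;
        }

        if (counter < limit) {
            counter++;
            return true;
        }
        return false;
    }
}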

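Drawback: traffic can spike at window boundaries. A client can use up the whole limit at the end of one window and again at the start of the next, so up to twice the limit passes in a span much shorter than one window.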
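The boundary problem can be reproduced with a small sketch. The class below is a hypothetical variant of the fixed-window limiter that takes the current time as a parameter (an assumption made only so the demo is deterministic); `FixedWindowBoundaryDemo` and `admit` are illustrative names, not part of the original code.

```java
/** Fixed-window counter with an explicit clock, for demonstration only. */
final class FixedWindowBoundaryDemo {
    private final int limit;
    private final long windowMillis;
    private long windowStart;
    private int count;

    FixedWindowBoundaryDemo(int limit, long windowMillis, long startMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.windowStart = startMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= windowMillis) {
            windowStart = nowMillis; // a new window begins
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }

    /** Sends n requests at the same instant and returns how many were admitted. */
    int admit(int n, long nowMillis) {
        int admitted = 0;
        for (int i = 0; i < n; i++) {
            if (tryAcquire(nowMillis)) {
                admitted++;
            }
        }
        return admitted;
    }

    public static void main(String[] args) {
        FixedWindowBoundaryDemo limiter = new FixedWindowBoundaryDemo(100, 1000, 0);
        int late = limiter.admit(150, 999);   // just before the first window ends
        int early = limiter.admit(150, 1000); // just after the next window starts
        System.out.println("admitted around the boundary: " + (late + early));
    }
}
```

With a limit of 100 per second, both bursts are admitted in full, so 200 requests pass around the boundary even though each window on its own stays within the limit.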

2. Sliding window algorithm

java
import java.util.ArrayDeque;
import java.util.Queue;

public class SlidingWindowRateLimiter {
    // Timestamps of admitted requests, oldest first
    private final Queue<Long> timestamps = new ArrayDeque<>();
    private final int limit = 100;
    private final long windowSize = 1000;  // window length in ms

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();

        // Drop timestamps that have fallen out of the window
        while (!timestamps.isEmpty() && now - timestamps.peek() >= windowSize) {
            timestamps.poll();
        }

        if (timestamps.size() < limit) {
            timestamps.offer(now);
            return true;
        }
        return false;
    }
}

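3. Leaky bucket algorithm

java
public class LeakyBucketRateLimiter {
    private final int capacity = 100;  // bucket capacity
    private final int rate = 10;       // leak rate (requests per second)
    private int water = 0;             // current water level
    private long lastLeakTime = System.currentTimeMillis();

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();

        // Leak: multiply before dividing so sub-second intervals still count
        long leaked = (now - lastLeakTime) * rate / 1000;
        if (leaked > 0) {
            water = (int) Math.max(0, water - leaked);
            // Advance only by the time that produced whole drops,
            // so the remainder carries over to the next call
            lastLeakTime += leaked * 1000 / rate;
        }

        // Add water
        if (water < capacity) {
            water++;
            return true;
        }
        return false;
    }
}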

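Characteristics: requests drain at a constant rate, which smooths the traffic that reaches downstream services, but the rate cannot rise to absorb a burst.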
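The class above only tracks the water level and admits a request immediately while the bucket has room. To actually smooth traffic, admitted requests also have to be released at the leak rate. The following is a minimal sketch of that idea, assuming time is passed in explicitly and the rate divides 1000 evenly; `LeakyBucketShaper` and `reserve` are hypothetical names introduced for illustration.

```java
/**
 * Leaky bucket used as a traffic shaper: each admitted request is told how
 * long to wait so that requests leave the bucket one interval apart.
 */
final class LeakyBucketShaper {
    private final long intervalMillis; // time between two released requests
    private final int capacity;        // max requests held, including the next one to run
    private long nextFreeMillis;       // earliest time the next request may run

    LeakyBucketShaper(int ratePerSecond, int capacity) {
        if (ratePerSecond <= 0 || ratePerSecond > 1000) {
            throw new IllegalArgumentException("ratePerSecond must be in 1..1000");
        }
        this.intervalMillis = 1000L / ratePerSecond;
        this.capacity = capacity;
    }

    /** Returns the delay in ms before the request may run, or -1 if the bucket is full. */
    synchronized long reserve(long nowMillis) {
        long start = Math.max(nowMillis, nextFreeMillis);
        long queued = (start - nowMillis) / intervalMillis; // requests already ahead of this one
        if (queued >= capacity) {
            return -1; // overflow: reject
        }
        nextFreeMillis = start + intervalMillis;
        return start - nowMillis;
    }

    public static void main(String[] args) {
        LeakyBucketShaper shaper = new LeakyBucketShaper(10, 3); // 10 requests/s, holds 3
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " delay: " + shaper.reserve(0));
        }
    }
}
```

Four requests arriving at the same instant are spaced 100 ms apart (delays 0, 100, 200) and the fourth is rejected, which is the smoothing and burst-rejecting behaviour described above.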

4. Token bucket algorithm

java
public class TokenBucketRateLimiter {
    private final int capacity = 100;  // bucket capacity
    private final int rate = 10;       // refill rate (tokens per second)
    private int tokens = 100;          // current token count (starts full)
    private long lastRefillTime = System.currentTimeMillis();

    public synchronized boolean tryAcquire() {
        refill();

        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.currentTimeMillis();
        // Multiply before dividing so sub-second intervals still add tokens
        long tokensToAdd = (now - lastRefillTime) * rate / 1000;
        if (tokensToAdd > 0) {
            tokens = (int) Math.min(capacity, tokens + tokensToAdd);
            // Advance only by the time already converted into tokens
            lastRefillTime += tokensToAdd * 1000 / rate;
        }
    }
}

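Characteristics: allows bursts. Tokens accumulate while traffic is light, up to the bucket capacity, so a burst of up to `capacity` requests can pass at once; after that, throughput falls back to the refill rate.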
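The burst behaviour can be seen in a small deterministic sketch. As an assumption for the demo, the bucket below receives the current time as an argument instead of reading the system clock; `TokenBucketBurstDemo` and `admit` are illustrative names.

```java
/** Token bucket with an explicit clock, for demonstration only. */
final class TokenBucketBurstDemo {
    private final int capacity;
    private final int ratePerSecond;
    private long tokens;
    private long lastRefillMillis;

    TokenBucketBurstDemo(int capacity, int ratePerSecond, long startMillis) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity; // start with a full bucket
        this.lastRefillMillis = startMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long added = (nowMillis - lastRefillMillis) * ratePerSecond / 1000;
        if (added > 0) {
            tokens = Math.min(capacity, tokens + added);
            lastRefillMillis += added * 1000 / ratePerSecond;
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    /** Sends n requests at the same instant and returns how many were admitted. */
    int admit(int n, long nowMillis) {
        int admitted = 0;
        for (int i = 0; i < n; i++) {
            if (tryAcquire(nowMillis)) {
                admitted++;
            }
        }
        return admitted;
    }

    public static void main(String[] args) {
        TokenBucketBurstDemo bucket = new TokenBucketBurstDemo(100, 10, 0);
        System.out.println("burst at t=0 ms:    " + bucket.admit(150, 0));
        System.out.println("burst at t=1000 ms: " + bucket.admit(150, 1000));
        System.out.println("burst at t=1500 ms: " + bucket.admit(150, 1500));
    }
}
```

The first burst drains the full bucket (100 requests); later bursts only get what has been refilled since, 10 tokens after one second and 5 after another half second.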

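How do you use Guava RateLimiter?

Answer:

Guava provides a rate limiter based on the token bucket algorithm.

java
import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

// Create a limiter that issues 10 permits per second
RateLimiter rateLimiter = RateLimiter.create(10.0);

// Acquire permits (blocks until they are available)
rateLimiter.acquire();   // acquire 1 permit
rateLimiter.acquire(5);  // acquire 5 permits

// Try to acquire a permit without waiting
if (rateLimiter.tryAcquire()) {
    // acquired
}

// Try to acquire a permit, waiting up to a timeout
if (rateLimiter.tryAcquire(100, TimeUnit.MILLISECONDS)) {
    // acquired within 100 ms
}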

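Application example

java
@RestController
public class ApiController {

    // 100 permits per second, shared by all requests handled by this instance
    private final RateLimiter rateLimiter = RateLimiter.create(100.0);

    @GetMapping("/api/data")
    public Result getData() {
        // Wait up to 100 ms for a permit, then reject the request
        if (!rateLimiter.tryAcquire(100, TimeUnit.MILLISECONDS)) {
            return Result.error("Too many requests");
        }

        return Result.success(data);  // data: placeholder for the real payload
    }
}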

How do you implement distributed rate limiting?

Answer:

Option 1: Redis + Lua

lua
-- Token bucket rate limiting
local key = KEYS[1]
local capacity = tonumber(ARGV[1])   -- bucket capacity
local rate = tonumber(ARGV[2])       -- refill rate (tokens per second)
local requested = tonumber(ARGV[3])  -- number of tokens requested

local tokens_key = key .. ":tokens"
local timestamp_key = key .. ":timestamp"

local tokens = tonumber(redis.call('get', tokens_key))
local last_time = tonumber(redis.call('get', timestamp_key))
local now = tonumber(ARGV[4])        -- current time in seconds, supplied by the caller

-- First request for this key: start with a full bucket
if tokens == nil or last_time == nil then
    tokens = capacity
    last_time = now
end

-- Refill tokens for the elapsed time
local delta = math.max(0, now - last_time)
local new_tokens = math.min(capacity, tokens + delta * rate)

if new_tokens >= requested then
    new_tokens = new_tokens - requested
    redis.call('set', tokens_key, new_tokens)
    redis.call('set', timestamp_key, now)
    return 1
else
    return 0
end

java
@Service
public class RedisRateLimiter {

    @Autowired
    private RedisTemplate<String, String> redisTemplate;

    private String luaScript = "...";  // the Lua script above

    public boolean tryAcquire(String key, int capacity, int rate, int requested) {
        List<String> keys = Collections.singletonList(key);
        // The script expects the current time in seconds
        Long now = System.currentTimeMillis() / 1000;

        Long result = redisTemplate.execute(
            new DefaultRedisScript<>(luaScript, Long.class),
            keys,
            String.valueOf(capacity),
            String.valueOf(rate),
            String.valueOf(requested),
            String.valueOf(now)
        );

        return result != null && result == 1;
    }
}

Option 2: Sentinel

java
@RestController
public class ApiController {

    @GetMapping("/api/data")
    @SentinelResource(value = "getData", blockHandler = "handleBlock")
    public Result getData() {
        return Result.success(data);  // data: placeholder for the real payload
    }

    // Called when the request is blocked by a flow rule
    public Result handleBlock(BlockException ex) {
        return Result.error("Too many requests");
    }
}

// Configure the flow rule (for example, once at startup)
FlowRule rule = new FlowRule();
rule.setResource("getData");
rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
rule.setCount(100);  // QPS threshold
FlowRuleManager.loadRules(Collections.singletonList(rule));

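Circuit Breaking and Degradation

What are circuit breaking and degradation?

Answer:

Circuit breaking: when the failure rate of calls to a service reaches a threshold, calls are cut off automatically and a fallback result is returned immediately.

Degradation: when the system is under heavy load, some non-core features are deliberately switched off so that core features stay available.

Differences

  • Circuit breaking is passive: the system triggers it automatically
  • Degradation is active: it is triggered manually or through configuration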
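The automatic nature of circuit breaking is easiest to see as a small state machine. The sketch below is a simplified illustration, not how any particular library implements it: it trips on consecutive failures rather than a failure rate, takes the clock as a parameter, and allows every call while half-open. The class and method names are hypothetical.

```java
/** Minimal circuit breaker: CLOSED -> OPEN after failures, OPEN -> HALF_OPEN after a timeout. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to stay open before probing
    private State state = State.CLOSED;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Whether a call may go through at the given time. */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openMillis) {
            state = State.HALF_OPEN; // timeout elapsed: let probe calls through
        }
        return state != State.OPEN;
    }

    synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED; // a successful call (or probe) closes the breaker
    }

    synchronized void recordFailure(long nowMillis) {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN; // trip, or re-trip after a failed probe
            openedAt = nowMillis;
        }
    }

    synchronized State state() {
        return state;
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(3, 10_000);
        for (int i = 0; i < 3; i++) {
            breaker.recordFailure(i);
        }
        System.out.println("state after 3 failures: " + breaker.state());
        System.out.println("allowed at t=5000 ms: " + breaker.allowRequest(5_000));
        System.out.println("allowed at t=10002 ms: " + breaker.allowRequest(10_002));
    }
}
```

A caller would check `allowRequest` before each remote call, return the fallback result when it is false, and report the outcome with `recordSuccess` or `recordFailure`.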

What are Sentinel's circuit breaking strategies?

Answer:

Sentinel supports three circuit breaking strategies.

1. Slow call ratio

java
DegradeRule rule = new DegradeRule();
rule.setResource("getData");
rule.setGrade(RuleConstant.DEGRADE_GRADE_RT);  // grade by response time
rule.setCount(100);  // calls slower than 100 ms count as slow
rule.setSlowRatioThreshold(0.5);  // slow call ratio threshold: 50%
rule.setMinRequestAmount(5);  // minimum number of requests in the window
rule.setStatIntervalMs(1000);  // statistics window: 1 second
rule.setTimeWindow(10);  // circuit stays open for 10 seconds

2. Exception ratio

java
DegradeRule rule = new DegradeRule();
rule.setResource("getData");
rule.setGrade(RuleConstant.DEGRADE_GRADE_EXCEPTION_RATIO);
rule.setCount(0.5);  // exception ratio threshold: 50%
rule.setMinRequestAmount(5);
rule.setStatIntervalMs(1000);
rule.setTimeWindow(10);

3. Exception count

java
DegradeRule rule = new DegradeRule();
rule.setResource("getData");
rule.setGrade(RuleConstant.DEGRADE_GRADE_EXCEPTION_COUNT);
rule.setCount(10);  // exception count threshold: 10
rule.setStatIntervalMs(1000);
rule.setTimeWindow(10);
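To make the parameters concrete, here is a rough sketch of how the three thresholds could be evaluated against the statistics of one window. This is not Sentinel's code: the class `BreakerRuleSketch`, its method names, and the use of `>=` for the comparison are assumptions made for illustration.

```java
/** Illustrative threshold checks over the counters of one statistics window. */
final class BreakerRuleSketch {

    /** Slow call ratio: ignored until the window holds enough requests. */
    static boolean slowRatioExceeded(int total, int slow, int minRequestAmount, double ratioThreshold) {
        if (total < minRequestAmount) {
            return false;
        }
        return (double) slow / total >= ratioThreshold;
    }

    /** Exception ratio: same shape, counting failed calls instead of slow ones. */
    static boolean exceptionRatioExceeded(int total, int errors, int minRequestAmount, double ratioThreshold) {
        if (total < minRequestAmount) {
            return false;
        }
        return (double) errors / total >= ratioThreshold;
    }

    /** Exception count: an absolute number of failures in the window. */
    static boolean exceptionCountExceeded(int errors, int countThreshold) {
        return errors >= countThreshold;
    }

    public static void main(String[] args) {
        // 4 slow calls out of 4 do not trip the rule: fewer than 5 requests in the window
        System.out.println(slowRatioExceeded(4, 4, 5, 0.5));
        // 5 slow calls out of 10 reach the 50% threshold
        System.out.println(slowRatioExceeded(10, 5, 5, 0.5));
    }
}
```

The minimum request amount exists so that a handful of slow or failed calls during quiet periods does not open the circuit.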

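How do you implement service degradation?

Answer:

Option 1: Return a default value

java
@Service
public class UserService {

    @Autowired
    private RestTemplate restTemplate;

    @SentinelResource(value = "getUser", fallback = "getUserFallback")
    public User getUser(Long id) {
        return restTemplate.getForObject("http://user-service/user/" + id, User.class);
    }

    // Fallback: same parameters as getUser, plus the exception that was thrown
    public User getUserFallback(Long id, Throwable ex) {
        return new User(id, "Default user", "Service degraded");
    }
}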

Option 2: Return cached data

java
@Service
public class ProductService {

    @Autowired
    private RestTemplate restTemplate;

    @Autowired
    private RedisTemplate<String, Object> redisTemplate;

    @SentinelResource(value = "getProduct", fallback = "getProductFallback")
    public Product getProduct(Long id) {
        return restTemplate.getForObject("http://product-service/product/" + id, Product.class);
    }

    public Product getProductFallback(Long id, Throwable ex) {
        // Return cached data (null if the product was never cached)
        return (Product) redisTemplate.opsForValue().get("product:" + id);
    }
}

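Option 3: Degradation switch

java
@Service
public class OrderService {

    // Read when the bean is created; changing it at runtime needs a
    // refreshable configuration source (for example a config center)
    @Value("${order.degrade.enabled:false}")
    private boolean degradeEnabled;

    public Result createOrder(Order order) {
        if (degradeEnabled) {
            return Result.error("The system is busy, please try again later");
        }

        // Normal business logic
        return Result.success();
    }
}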
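For a switch that can be flipped while the service is running, one possible shape is an atomic flag that chooses between the normal path and a fallback. This is a minimal JDK-only sketch; `DegradeSwitch` and its methods are hypothetical names, and how the flag gets toggled (admin endpoint, config listener) is left open.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Runtime-toggleable degradation switch. */
final class DegradeSwitch {
    private final AtomicBoolean degraded = new AtomicBoolean(false);

    void enable() {
        degraded.set(true);
    }

    void disable() {
        degraded.set(false);
    }

    /** Runs the normal path unless the switch is on, in which case the fallback runs. */
    <T> T call(Supplier<T> normal, Supplier<T> fallback) {
        return degraded.get() ? fallback.get() : normal.get();
    }

    public static void main(String[] args) {
        DegradeSwitch orderSwitch = new DegradeSwitch();
        System.out.println(orderSwitch.call(() -> "order created", () -> "system busy"));
        orderSwitch.enable(); // e.g. triggered by an operator during peak load
        System.out.println(orderSwitch.call(() -> "order created", () -> "system busy"));
    }
}
```

Because the fallback is a `Supplier`, the normal path is never executed while the switch is on, which is what relieves pressure on the degraded feature.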

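Gray Release

What is gray release?

Answer: Gray release (also called canary release) means that when a new version ships, a small share of users is moved to it first, and the full rollout happens only after the new version has been verified.

Advantages

  • Lower release risk
  • Fast rollback
  • A/B testing

Strategies

  • By user ID
  • By region
  • By percentage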
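Percentage-based gray release is often done by hashing a stable identifier into a bucket, so the same user always lands on the same side. The sketch below assumes the user ID is a non-null string and uses `String.hashCode()` for simplicity; `GrayPercentage`, `bucketOf` and `isGray` are hypothetical names.

```java
/** Assigns users to the gray version by hashing their ID into 100 buckets. */
final class GrayPercentage {

    /** Maps a user ID to a stable bucket in the range 0..99. */
    static int bucketOf(String userId) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(userId.hashCode(), 100);
    }

    /** True if the user falls into the first `percent` buckets. */
    static boolean isGray(String userId, int percent) {
        return bucketOf(userId) < percent;
    }

    public static void main(String[] args) {
        String userId = "user-42";
        System.out.println("bucket: " + bucketOf(userId));
        System.out.println("gray at 10%: " + isGray(userId, 10));
        System.out.println("gray at 100%: " + isGray(userId, 100));
    }
}
```

Raising the percentage only ever adds users to the gray group: anyone who was gray at 10% is still gray at 50%, so users do not flip back and forth as the rollout widens.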

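How do you implement gray release?

Answer:

Option 1: Gray routing based on Ribbon

java
@Configuration
public class GrayRuleConfig {

    @Bean
    public IRule grayRule() {
        return new GrayRule();
    }
}

public class GrayRule extends AbstractLoadBalancerRule {

    @Override
    public void initWithNiwsConfig(IClientConfig clientConfig) {
        // No custom configuration needed
    }

    @Override
    public Server choose(Object key) {
        List<Server> servers = getLoadBalancer().getAllServers();

        // Read the gray tag carried with the request. RpcContext is Dubbo's API;
        // substitute whatever request context your framework propagates.
        String grayTag = RpcContext.getContext().getAttachment("gray");
        boolean wantGray = "true".equals(grayTag);

        // Pick the first server on the wanted side, else fall back to any server.
        // getMetaInfo().get("version") is schematic: how instance metadata is
        // read depends on the registry client in use.
        return servers.stream()
            .filter(s -> wantGray == "gray".equals(s.getMetaInfo().get("version")))
            .findFirst()
            .orElse(servers.isEmpty() ? null : servers.get(0));
    }
}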

Option 2: Gray routing based on Gateway

java
@Component
public class GrayFilter implements GlobalFilter, Ordered {

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String userId = exchange.getRequest().getHeaders().getFirst("userId");

        // Gray user list (getGrayUsers() is a placeholder, e.g. loaded from configuration)
        Set<String> grayUsers = getGrayUsers();

        if (userId != null && grayUsers.contains(userId)) {
            // Route to the gray version
            exchange.getAttributes().put("version", "gray");
        } else {
            // Route to the stable version
            exchange.getAttributes().put("version", "stable");
        }

        // The attribute only marks the request; the load balancer further down
        // the chain must read it to pick a matching instance
        return chain.filter(exchange);
    }

    @Override
    public int getOrder() {
        return -100;
    }
}

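Option 3: Gray release based on Nacos

yaml
spring:
  cloud:
    nacos:
      discovery:
        metadata:
          version: gray  # marks this instance as the gray version

java
@Component
public class GrayLoadBalancer {

    public ServiceInstance choose(List<ServiceInstance> instances) {
        // Version requested by the caller; null if no tag was propagated
        String version = RpcContext.getContext().getAttachment("version");

        // Objects.equals avoids a NullPointerException when the tag is missing
        return instances.stream()
            .filter(i -> Objects.equals(version, i.getMetadata().get("version")))
            .findFirst()
            .orElse(instances.get(0));
    }
}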
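The selection logic itself does not depend on any registry, so it can be sketched with plain JDK types. The example below assumes every instance carries a `version` metadata entry and that untagged requests should go to `stable`; `Instance` and `VersionRouter` are hypothetical stand-ins for the discovery client's types.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Stand-in for a discovered service instance. */
final class Instance {
    final String host;
    final Map<String, String> metadata;

    Instance(String host, String version) {
        this.host = host;
        this.metadata = Collections.singletonMap("version", version);
    }
}

final class VersionRouter {

    /** Picks an instance of the requested version, falling back to a stable one. */
    static Optional<Instance> choose(List<Instance> instances, String requestedVersion) {
        String wanted = requestedVersion == null ? "stable" : requestedVersion;
        Optional<Instance> match = first(instances, wanted);
        return match.isPresent() ? match : first(instances, "stable");
    }

    private static Optional<Instance> first(List<Instance> instances, String version) {
        return instances.stream()
            .filter(i -> version.equals(i.metadata.get("version")))
            .findFirst();
    }

    public static void main(String[] args) {
        List<Instance> instances = Arrays.asList(
            new Instance("10.0.0.1", "stable"),
            new Instance("10.0.0.2", "gray"));
        System.out.println(choose(instances, "gray").get().host);
        System.out.println(choose(instances, null).get().host);
    }
}
```

Returning an `Optional` instead of calling `instances.get(0)` keeps the empty-list case explicit, and falling back to a stable instance means a request tagged for a version that is not deployed still gets served.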

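Distributed Tracing

What is distributed tracing?

Answer: Distributed tracing records the complete call chain of a single request as it travels through a microservice system. It is used for performance analysis and troubleshooting.

Core concepts

  • Trace: one complete request chain
  • Span: one service call
  • TraceId: a globally unique ID shared by every span in the trace
  • SpanId: the unique ID of a span
  • ParentSpanId: the ID of the parent span

Example

TraceId: 123456
  Span1: Gateway (SpanId: 1, ParentSpanId: null)
    Span2: OrderService (SpanId: 2, ParentSpanId: 1)
      Span3: StockService (SpanId: 3, ParentSpanId: 2)
      Span4: AccountService (SpanId: 4, ParentSpanId: 2)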
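The parent/child relationship is what lets a tracing backend rebuild the tree from a flat list of reported spans. A minimal sketch of that reconstruction follows; `SpanRecord` and `TraceTree` are hypothetical types for illustration, not the data model of any tracing library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One reported span: its own ID, its parent's ID (null for the root) and a name. */
final class SpanRecord {
    final String spanId;
    final String parentSpanId;
    final String name;

    SpanRecord(String spanId, String parentSpanId, String name) {
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.name = name;
    }
}

final class TraceTree {

    /** Renders the spans of one trace as an indented tree. */
    static String render(List<SpanRecord> spans) {
        // Group spans by parent ID; "" stands for "no parent" (the root)
        Map<String, List<SpanRecord>> children = new HashMap<>();
        for (SpanRecord s : spans) {
            String parent = s.parentSpanId == null ? "" : s.parentSpanId;
            children.computeIfAbsent(parent, k -> new ArrayList<>()).add(s);
        }
        StringBuilder out = new StringBuilder();
        append(children, "", 0, out);
        return out.toString();
    }

    private static void append(Map<String, List<SpanRecord>> children,
                               String parentId, int depth, StringBuilder out) {
        for (SpanRecord s : children.getOrDefault(parentId, Collections.emptyList())) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(s.name).append(" (SpanId: ").append(s.spanId).append(")\n");
            append(children, s.spanId, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Spans may arrive in any order; the tree is rebuilt from the IDs
        List<SpanRecord> spans = Arrays.asList(
            new SpanRecord("3", "2", "StockService"),
            new SpanRecord("1", null, "Gateway"),
            new SpanRecord("4", "2", "AccountService"),
            new SpanRecord("2", "1", "OrderService"));
        System.out.print(render(spans));
    }
}
```

Running it on the four spans from the example prints Gateway at the top, OrderService beneath it, and StockService and AccountService as siblings under OrderService.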

How do you use Sleuth + Zipkin?

Answer:

1. Add dependencies

xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>

2. Configuration

yaml
spring:
  sleuth:
    sampler:
      probability: 1.0  # sample 100% of requests
  zipkin:
    base-url: https://zipkin.example.com
    sender:
      type: web

3. Log output

[service-name,traceId,spanId,exportable]
[order-service,123456,1,true] Creating order
[stock-service,123456,2,true] Deducting stock

4. Custom spans

java
@Service
public class OrderService {

    @Autowired
    private Tracer tracer;

    public void createOrder(Order order) {
        // Start a new span as a child of the current one
        Span span = tracer.nextSpan().name("createOrder").start();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            // Business logic
            span.tag("orderId", order.getId().toString());
            span.tag("amount", order.getAmount().toString());
        } finally {
            // Always end the span so it is reported
            span.end();
        }
    }
}

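How do you use SkyWalking?

Answer:

SkyWalking is an open-source APM (application performance monitoring) system that originated in China. It is non-invasive.

1. Download the agent

bash
wget https://archive.apache.org/dist/skywalking/8.9.0/apache-skywalking-apm-8.9.0.tar.gz
tar -zxvf apache-skywalking-apm-8.9.0.tar.gz

2. Start the application

bash
java -javaagent:/path/to/skywalking-agent.jar \
     -Dskywalking.agent.service_name=order-service \
     -Dskywalking.collector.backend_service=127.0.0.1:11800 \
     -jar order-service.jar

3. View traces

Open http://localhost:8080 to view the SkyWalking UI.

Features

  • Non-invasive: no code changes are required
  • Supports many frameworks (Spring, Dubbo, MyBatis, and others)
  • Provides a rich set of monitoring metrics
  • Supports alerting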

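Service Monitoring

What are the common monitoring metrics?

Answer:

1. System metrics

  • CPU usage
  • Memory usage
  • Disk usage
  • Network traffic

2. Application metrics

  • QPS (queries per second)
  • Response time (RT)
  • Error rate
  • Concurrency

3. Business metrics

  • Order volume
  • Payment success rate
  • User activity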
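The application metrics above are simple aggregates over the requests seen in a time window. As a rough sketch (the class name, method names and the nearest-rank percentile definition are choices made for this example, not taken from a monitoring library):

```java
import java.util.Arrays;

/** Derives basic application metrics from the requests observed in one window. */
final class RequestMetrics {

    /** Requests per second over a window of the given length. */
    static double qps(int requestCount, double windowSeconds) {
        return requestCount / windowSeconds;
    }

    /** Share of failed requests, 0.0 when there was no traffic. */
    static double errorRate(int errors, int total) {
        return total == 0 ? 0.0 : (double) errors / total;
    }

    /** Nearest-rank percentile of response times; p is in (0, 100], input must be non-empty. */
    static long percentile(long[] rtMillis, double p) {
        long[] sorted = rtMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] rt = {10, 20, 30, 40, 50, 60, 70, 80, 90, 1000};
        System.out.println("QPS: " + qps(600, 60.0));
        System.out.println("error rate: " + errorRate(3, 600));
        System.out.println("p50: " + percentile(rt, 50) + " ms");
        System.out.println("p99: " + percentile(rt, 99) + " ms");
    }
}
```

In the sample data a single 1000 ms request barely moves the median but dominates p99, which is why response time is usually tracked as percentiles rather than only as an average.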

How do you use Prometheus + Grafana?

Answer:

1. Add dependencies

xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

2. Configuration

yaml
management:
  endpoints:
    web:
      exposure:
        include: prometheus,health,info
  metrics:
    export:
      prometheus:
        enabled: true

3. Prometheus configuration

yaml
# prometheus.yml
scrape_configs:
  - job_name: 'spring-boot'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']

4. Custom metrics

java
@Service
public class OrderService {

    private final Counter orderCounter;
    private final Timer orderTimer;

    public OrderService(MeterRegistry registry) {
        this.orderCounter = Counter.builder("order.created")
            .description("Number of orders created")
            .register(registry);

        this.orderTimer = Timer.builder("order.create.time")
            .description("Time taken to create an order")
            .register(registry);
    }

    public void createOrder(Order order) {
        // Time the whole operation and count each created order
        orderTimer.record(() -> {
            // Business logic
            orderCounter.increment();
        });
    }
}

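Exercises

  1. How would you design a highly available rate limiting system?
  2. What is the difference between circuit breaking and degradation?
  3. How would you implement gray release across the full call chain?
  4. How would you troubleshoot performance problems in microservices?
  5. How would you design a monitoring and alerting system for services?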

Released under the MIT License.