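Service Governance
Rate Limiting
What is rate limiting, and why is it needed?
Answer: Rate limiting caps the rate at which a system accepts requests so that it does not become overloaded.
Why is rate limiting needed?
- Protects the system and prevents cascading failures (the "avalanche" effect)
- Keeps core business functions available
- Mitigates malicious traffic
- Controls cost
Common scenarios:
- Flash sales
- Access to hot data
- API calls
- Crawler protection
What are the common rate-limiting algorithms?
Answer: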
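1. Fixed window
java
public class FixedWindowRateLimiter {
    private final int limit = 100;        // at most 100 requests per window
    private final long windowSize = 1000; // window length: 1 second, in milliseconds
    private int counter = 0;
    private long windowStart = System.currentTimeMillis();

    // synchronized: the window reset and the increment must happen together;
    // an AtomicInteger alone would not protect windowStart.
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowSize) {
            // A new window starts: reset the counter.
            windowStart = now;
            counter = 0;
        }
        return ++counter <= limit;
    }
}
Drawback: traffic can spike at window boundaries. Up to `limit` requests at the end of one window and another `limit` at the start of the next can all be admitted within a very short interval.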
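To make the boundary-spike drawback concrete, here is a minimal sketch that uses the same window logic as the limiter above but takes the current time as a parameter, so the scenario is deterministic. The class name `FixedWindowDemo` and the `burst` helper are illustrative assumptions, not part of any library.

```java
/** Fixed-window counter whose clock is supplied by the caller (illustrative only). */
class FixedWindowDemo {
    private final int limit;
    private final long windowSizeMillis;
    private long windowStart;
    private int count;

    FixedWindowDemo(int limit, long windowSizeMillis, long startMillis) {
        this.limit = limit;
        this.windowSizeMillis = windowSizeMillis;
        this.windowStart = startMillis;
    }

    /** Same decision rule as a fixed-window limiter, with "now" passed in. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= windowSizeMillis) {
            windowStart = nowMillis; // a new window starts: reset the counter
            count = 0;
        }
        return ++count <= limit;
    }

    /** Sends n requests at the same instant and returns how many were admitted. */
    int burst(int n, long atMillis) {
        int admitted = 0;
        for (int i = 0; i < n; i++) {
            if (tryAcquire(atMillis)) {
                admitted++;
            }
        }
        return admitted;
    }

    public static void main(String[] args) {
        FixedWindowDemo limiter = new FixedWindowDemo(100, 1000, 0);
        int lateInWindow1 = limiter.burst(100, 990);   // last 10 ms of the first window
        int earlyInWindow2 = limiter.burst(100, 1000); // first instant of the second window
        // Both bursts are admitted in full: 200 requests within 10 ms at a nominal limit of 100/s.
        System.out.println(lateInWindow1 + earlyInWindow2);
    }
}
```

The sliding-window approach avoids this because it always counts the requests of the most recent second, regardless of where a window boundary falls.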
2. Sliding window
java
import java.util.LinkedList;
import java.util.Queue;

public class SlidingWindowRateLimiter {
    private final Queue<Long> timestamps = new LinkedList<>();
    private final int limit = 100;        // at most 100 requests per window
    private final long windowSize = 1000; // window length in milliseconds

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Drop timestamps that have fallen out of the window.
        while (!timestamps.isEmpty() && now - timestamps.peek() >= windowSize) {
            timestamps.poll();
        }
        if (timestamps.size() < limit) {
            timestamps.offer(now);
            return true;
        }
        return false;
    }
}
3. Leaky bucket
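java
public class LeakyBucketRateLimiter {
    private final int capacity = 100; // bucket capacity
    private final int rate = 10;      // leak rate (requests per second)
    private int water = 0;            // current water level
    private long lastLeakTime = System.currentTimeMillis();

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Leak. Multiply before dividing so that sub-second intervals
        // are not truncated to zero by integer division.
        long leaked = (now - lastLeakTime) * rate / 1000;
        if (leaked > 0) {
            water = (int) Math.max(0, water - leaked);
            // Advance only by the time that produced whole units,
            // so the fractional remainder carries over to the next call.
            lastLeakTime += leaked * 1000 / rate;
        }
        // Add water.
        if (water < capacity) {
            water++;
            return true;
        }
        return false;
    }
}
Characteristic: smooths traffic into a steady rate, but cannot accommodate bursts.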
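4. Token bucket
java
public class TokenBucketRateLimiter {
    private final int capacity = 100; // bucket capacity
    private final int rate = 10;      // refill rate (tokens per second)
    private int tokens = 100;         // current number of tokens
    private long lastRefillTime = System.currentTimeMillis();

    public synchronized boolean tryAcquire() {
        refill();
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.currentTimeMillis();
        // Multiply before dividing so that sub-second intervals
        // are not truncated to zero by integer division.
        long tokensToAdd = (now - lastRefillTime) * rate / 1000;
        if (tokensToAdd > 0) {
            tokens = (int) Math.min(capacity, tokens + tokensToAdd);
            // Keep the fractional remainder for the next refill.
            lastRefillTime += tokensToAdd * 1000 / rate;
        }
    }
}
Characteristic: allows bursts, up to the bucket capacity.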
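The burst behaviour is easier to see with a deterministic clock. The sketch below is a token bucket whose current time is passed in by the caller; `TokenBucketDemo` and `burst` are hypothetical names chosen for illustration, and the numbers (capacity 100, 10 tokens per second) simply mirror the example above.

```java
/** Token bucket with a caller-supplied clock (illustrative only). */
class TokenBucketDemo {
    private final int capacity;
    private final int ratePerSecond;
    private long tokens;
    private long lastRefillMillis;

    TokenBucketDemo(int capacity, int ratePerSecond, long startMillis) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity; // the bucket starts full
        this.lastRefillMillis = startMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long newTokens = (nowMillis - lastRefillMillis) * ratePerSecond / 1000;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            // Advance by the time that produced whole tokens; keep the remainder.
            lastRefillMillis += newTokens * 1000 / ratePerSecond;
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    /** Sends n requests at the same instant and returns how many were admitted. */
    int burst(int n, long atMillis) {
        int admitted = 0;
        for (int i = 0; i < n; i++) {
            if (tryAcquire(atMillis)) {
                admitted++;
            }
        }
        return admitted;
    }

    public static void main(String[] args) {
        TokenBucketDemo bucket = new TokenBucketDemo(100, 10, 0);
        System.out.println(bucket.burst(150, 0));    // 100: the full bucket absorbs the burst
        System.out.println(bucket.burst(150, 1000)); // 10: one second later, only the refill is available
    }
}
```

After the initial burst drains the bucket, throughput settles at the refill rate, which is the difference from the leaky bucket's constant outflow.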
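How is Guava's RateLimiter used?
Answer:
Guava provides a rate limiter based on the token bucket algorithm.
java
// Create a limiter that issues 10 permits per second
RateLimiter rateLimiter = RateLimiter.create(10.0);
// Acquire permits (blocking)
rateLimiter.acquire();  // acquire 1 permit
rateLimiter.acquire(5); // acquire 5 permits
// Try to acquire a permit (non-blocking)
if (rateLimiter.tryAcquire()) {
    // acquired
}
// Try to acquire a permit, waiting up to a timeout
if (rateLimiter.tryAcquire(100, TimeUnit.MILLISECONDS)) {
    // acquired within 100 ms
}
Application example:
java
@RestController
public class ApiController {
    private final RateLimiter rateLimiter = RateLimiter.create(100.0); // 100 requests per second

    @GetMapping("/api/data")
    public Result getData() {
        if (!rateLimiter.tryAcquire(100, TimeUnit.MILLISECONDS)) {
            return Result.error("Too many requests");
        }
        return Result.success(data); // `data` stands in for the real response payload
    }
}
How is distributed rate limiting implemented?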
Answer:
Option 1: Redis + Lua
lua
-- Token bucket rate limiting
local key = KEYS[1]
local capacity = tonumber(ARGV[1])  -- bucket capacity
local rate = tonumber(ARGV[2])      -- refill rate (tokens per second)
local requested = tonumber(ARGV[3]) -- number of tokens requested
local now = tonumber(ARGV[4])       -- current time in seconds, supplied by the caller
local tokens_key = key .. ":tokens"
local timestamp_key = key .. ":timestamp"
local tokens = tonumber(redis.call('get', tokens_key))
local last_time = tonumber(redis.call('get', timestamp_key))
-- First use of this key: start with a full bucket
if tokens == nil or last_time == nil then
    tokens = capacity
    last_time = now
end
-- Refill tokens for the elapsed time
local delta = math.max(0, now - last_time)
local new_tokens = math.min(capacity, tokens + delta * rate)
if new_tokens >= requested then
    new_tokens = new_tokens - requested
    redis.call('set', tokens_key, new_tokens)
    redis.call('set', timestamp_key, now)
    return 1
else
    -- Rejected: state is left unchanged, so the elapsed time still counts on the next call
    return 0
end
java
@Service
public class RedisRateLimiter {
    @Autowired
    private RedisTemplate<String, String> redisTemplate;

    private String luaScript = "..."; // the Lua script above

    public boolean tryAcquire(String key, int capacity, int rate, int requested) {
        List<String> keys = Collections.singletonList(key);
        // The script expects the current time in seconds.
        long now = System.currentTimeMillis() / 1000;
        Long result = redisTemplate.execute(
                new DefaultRedisScript<>(luaScript, Long.class),
                keys,
                String.valueOf(capacity),
                String.valueOf(rate),
                String.valueOf(requested),
                String.valueOf(now)
        );
        return result != null && result == 1;
    }
}
Option 2: Sentinel
java
@RestController
public class ApiController {
    @GetMapping("/api/data")
    @SentinelResource(value = "getData", blockHandler = "handleBlock")
    public Result getData() {
        return Result.success(data); // `data` stands in for the real response payload
    }

    // Called when the request is blocked by a flow rule
    public Result handleBlock(BlockException ex) {
        return Result.error("Too many requests");
    }
}

// Configure the flow rule
FlowRule rule = new FlowRule();
rule.setResource("getData");
rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
rule.setCount(100); // QPS threshold
FlowRuleManager.loadRules(Collections.singletonList(rule));
Circuit Breaking and Degradation
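What are circuit breaking and degradation?
Answer:
Circuit breaking: when the failure rate of calls to a service reaches a threshold, calls are cut off automatically and a fallback result is returned directly.
Degradation: when the system is under heavy load, some non-core features are deliberately switched off so that core features stay available.
Differences:
- Circuit breaking is passive: the system triggers it automatically.
- Degradation is active: it is triggered manually or through configuration.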
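The automatic trigger described above is usually modelled as a small state machine with three states: closed (calls pass), open (calls are short-circuited to the fallback), and half-open (a single probe call decides whether to close again). The following is a minimal sketch under simplifying assumptions: `SimpleCircuitBreaker` and all of its parameters are hypothetical, the failure ratio is computed over all calls since the last reset rather than over a sliding time window, and the clock is passed in so the behaviour is deterministic.

```java
import java.util.function.Supplier;

/** Minimal circuit breaker sketch (illustrative only, not a library API). */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final double failureRatioThreshold; // e.g. 0.5 means 50% failures
    private final int minRequests;              // do not judge on too few samples
    private final long openMillis;              // how long to stay open before probing

    private State state = State.CLOSED;
    private int total;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(double failureRatioThreshold, int minRequests, long openMillis) {
        this.failureRatioThreshold = failureRatioThreshold;
        this.minRequests = minRequests;
        this.openMillis = openMillis;
    }

    /** Returns true if the call may proceed, false if it should go straight to the fallback. */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openMillis) {
            state = State.HALF_OPEN; // let exactly one probe through
            return true;
        }
        return state == State.CLOSED;
    }

    synchronized void recordResult(boolean success, long nowMillis) {
        if (state == State.HALF_OPEN) {
            if (success) close(); else open(nowMillis); // the probe decides
            return;
        }
        if (state == State.OPEN) {
            return; // late result of a call admitted before the breaker opened
        }
        total++;
        if (!success) failures++;
        if (total >= minRequests && (double) failures / total >= failureRatioThreshold) {
            open(nowMillis);
        }
    }

    /** Runs the action behind the breaker, using the fallback when blocked or failed. */
    <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        if (!allowRequest(nowMillis)) {
            return fallback.get();
        }
        try {
            T result = action.get();
            recordResult(true, nowMillis);
            return result;
        } catch (RuntimeException e) {
            recordResult(false, nowMillis);
            return fallback.get();
        }
    }

    synchronized State state() { return state; }

    private void open(long nowMillis) { state = State.OPEN; openedAt = nowMillis; total = 0; failures = 0; }
    private void close() { state = State.CLOSED; total = 0; failures = 0; }
}
```

Degradation, by contrast, needs no state machine: it is typically a flag or configuration value checked before the non-core logic runs.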
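What circuit-breaking strategies does Sentinel offer?
Answer:
There are three strategies:
1. Slow-call ratio
java
DegradeRule rule = new DegradeRule();
rule.setResource("getData");
rule.setGrade(RuleConstant.DEGRADE_GRADE_RT); // based on response time
rule.setCount(100);               // a call slower than 100 ms counts as slow
rule.setSlowRatioThreshold(0.5);  // trip when 50% of calls are slow
rule.setMinRequestAmount(5);      // minimum number of requests before evaluating
rule.setStatIntervalMs(1000);     // statistics interval: 1 second
rule.setTimeWindow(10);           // stay open for 10 seconds
DegradeRuleManager.loadRules(Collections.singletonList(rule)); // rules take effect only once loaded
2. Exception ratio
java
DegradeRule rule = new DegradeRule();
rule.setResource("getData");
rule.setGrade(RuleConstant.DEGRADE_GRADE_EXCEPTION_RATIO);
rule.setCount(0.5);               // trip when 50% of calls throw
rule.setMinRequestAmount(5);
rule.setStatIntervalMs(1000);
rule.setTimeWindow(10);
DegradeRuleManager.loadRules(Collections.singletonList(rule));
3. Exception count
java
DegradeRule rule = new DegradeRule();
rule.setResource("getData");
rule.setGrade(RuleConstant.DEGRADE_GRADE_EXCEPTION_COUNT);
rule.setCount(10);                // trip after 10 exceptions
rule.setStatIntervalMs(1000);
rule.setTimeWindow(10);
DegradeRuleManager.loadRules(Collections.singletonList(rule));
How is service degradation implemented?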
Answer:
Option 1: Return a default value
java
@Service
public class UserService {
    @Autowired
    private RestTemplate restTemplate;

    @SentinelResource(value = "getUser", fallback = "getUserFallback")
    public User getUser(Long id) {
        return restTemplate.getForObject("http://user-service/user/" + id, User.class);
    }

    // Same parameters as getUser, plus an optional Throwable
    public User getUserFallback(Long id, Throwable ex) {
        return new User(id, "Default user", "Service degraded");
    }
}
Option 2: Return cached data
java
@Service
public class ProductService {
    @Autowired
    private RestTemplate restTemplate;
    @Autowired
    private RedisTemplate<String, Object> redisTemplate;

    @SentinelResource(value = "getProduct", fallback = "getProductFallback")
    public Product getProduct(Long id) {
        return restTemplate.getForObject("http://product-service/product/" + id, Product.class);
    }

    public Product getProductFallback(Long id, Throwable ex) {
        // Return cached data; this is null if the product was never cached
        return (Product) redisTemplate.opsForValue().get("product:" + id);
    }
}
Option 3: Degradation switch
java
@Service
public class OrderService {
    @Value("${order.degrade.enabled:false}")
    private boolean degradeEnabled;

    public Result createOrder(Order order) {
        if (degradeEnabled) {
            return Result.error("System busy, please try again later");
        }
        // Normal business logic
        return Result.success();
    }
}
Gray Release
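What is a gray release?
Answer: A gray release (canary release) rolls a new version out to a small share of users first and proceeds to a full rollout only after the new version has been verified.
Advantages:
- Lower release risk
- Fast rollback
- A/B testing
Strategies:
- By user ID
- By region
- By percentage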
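For the percentage strategy, one common approach is to hash a stable identifier into a fixed number of buckets so that the same user always gets the same decision and raising the percentage only ever adds users. The sketch below assumes the user ID is that identifier; `GrayBucketing`, `bucketOf`, and `isGray` are hypothetical names, and CRC32 is just one convenient stable hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Percentage-based gray decision via stable hashing (illustrative only). */
class GrayBucketing {
    /** Maps a user ID to a stable bucket in the range [0, 100). */
    static int bucketOf(String userId) {
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100); // getValue() is non-negative
    }

    /** A user is in the gray group when their bucket is below the rollout percentage. */
    static boolean isGray(String userId, int grayPercent) {
        return bucketOf(userId) < grayPercent;
    }

    public static void main(String[] args) {
        int gray = 0;
        for (int i = 0; i < 10_000; i++) {
            if (isGray("user-" + i, 10)) {
                gray++;
            }
        }
        // Prints how many of the 10,000 sample users fall into a 10% rollout.
        System.out.println(gray);
    }
}
```

Because the bucket depends only on the user ID, moving from 10% to 30% keeps every user who was already on the new version there, which avoids users flipping between versions during a rollout.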
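How is a gray release implemented?
Answer:
Option 1: Gray routing based on Ribbon
java
@Configuration
public class GrayRuleConfig {
    @Bean
    public IRule grayRule() {
        return new GrayRule();
    }
}

public class GrayRule extends AbstractLoadBalancerRule {
    @Override
    public void initWithNiwsConfig(IClientConfig clientConfig) {
        // no client-specific configuration needed
    }

    @Override
    public Server choose(Object key) {
        List<Server> servers = getLoadBalancer().getAllServers();
        if (servers.isEmpty()) {
            return null; // nothing to route to
        }
        // Read the gray flag. RpcContext is Dubbo's; this assumes the flag
        // is propagated as an RPC attachment.
        String grayTag = RpcContext.getContext().getAttachment("gray");
        boolean wantGray = "true".equals(grayTag);
        // Pick the first server whose gray/stable status matches the request;
        // fall back to the first server if none matches.
        return servers.stream()
                .filter(s -> wantGray == "gray".equals(versionOf(s)))
                .findFirst()
                .orElse(servers.get(0));
    }

    // Server#getMetaInfo() does not expose custom metadata, so the version is read
    // from the registry-specific subclass. NacosServer is an assumption here;
    // adapt this to the registry in use.
    private static String versionOf(Server server) {
        return server instanceof NacosServer
                ? ((NacosServer) server).getMetadata().get("version")
                : null;
    }
}
Option 2: Gray routing based on Gateway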
java
@Component
public class GrayFilter implements GlobalFilter, Ordered {
    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String userId = exchange.getRequest().getHeaders().getFirst("userId");
        // Gray user list; getGrayUsers() is assumed to be defined elsewhere
        Set<String> grayUsers = getGrayUsers();
        if (userId != null && grayUsers.contains(userId)) {
            // Mark the request for the gray version
            exchange.getAttributes().put("version", "gray");
        } else {
            // Mark the request for the stable version
            exchange.getAttributes().put("version", "stable");
        }
        // A downstream load balancer is expected to read the "version" attribute
        return chain.filter(exchange);
    }

    @Override
    public int getOrder() {
        return -100;
    }
}
Option 3: Gray release based on Nacos
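yaml
spring:
  cloud:
    nacos:
      discovery:
        metadata:
          version: gray # gray version marker
java
@Component
public class GrayLoadBalancer {
    public ServiceInstance choose(List<ServiceInstance> instances) {
        if (instances.isEmpty()) {
            return null; // no instance available
        }
        String version = RpcContext.getContext().getAttachment("version");
        return instances.stream()
                // Objects.equals tolerates a missing version attachment
                .filter(i -> Objects.equals(version, i.getMetadata().get("version")))
                .findFirst()
                .orElse(instances.get(0)); // fall back to the first instance
    }
}
Distributed Tracing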
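What is distributed tracing?
Answer: Distributed tracing records the complete call chain of a single request as it passes through a microservice system. It is used for performance analysis and troubleshooting.
Core concepts:
- Trace: the complete path of one request
- Span: a single service call
- TraceId: a globally unique ID for the trace
- SpanId: the unique ID of a span
- ParentSpanId: the ID of the parent span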
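The ParentSpanId links are what turn a flat list of spans into a call tree. As a rough sketch of how a tracing backend might reassemble one trace, the code below groups spans by parent and prints them indented; `SpanTreeDemo` and its `Span` class are hypothetical and carry only the fields named above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Rebuilds a call tree from spans of a single trace (illustrative only). */
class SpanTreeDemo {
    static final class Span {
        final String spanId;
        final String parentSpanId; // null for the root span
        final String name;

        Span(String spanId, String parentSpanId, String name) {
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.name = name;
        }
    }

    private static final String ROOT = ""; // map key used for spans without a parent

    /** Renders the spans as an indented tree, children listed under their parent. */
    static String render(List<Span> spans) {
        Map<String, List<Span>> childrenByParent = new HashMap<>();
        for (Span s : spans) {
            String parent = s.parentSpanId == null ? ROOT : s.parentSpanId;
            childrenByParent.computeIfAbsent(parent, k -> new ArrayList<>()).add(s);
        }
        StringBuilder out = new StringBuilder();
        appendChildren(ROOT, 0, childrenByParent, out);
        return out.toString();
    }

    private static void appendChildren(String parentId, int depth,
                                       Map<String, List<Span>> byParent, StringBuilder out) {
        for (Span child : byParent.getOrDefault(parentId, Collections.<Span>emptyList())) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(child.name).append(" (spanId=").append(child.spanId).append(")\n");
            appendChildren(child.spanId, depth + 1, byParent, out);
        }
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
                new Span("1", null, "Gateway"),
                new Span("2", "1", "OrderService"),
                new Span("3", "2", "StockService"),
                new Span("4", "2", "AccountService"));
        System.out.print(render(spans));
    }
}
```

All four spans would share one TraceId; that shared ID is what lets the backend collect them into the same list in the first place.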
Example:
TraceId: 123456
Span1: Gateway (SpanId: 1, ParentSpanId: null)
  Span2: OrderService (SpanId: 2, ParentSpanId: 1)
    Span3: StockService (SpanId: 3, ParentSpanId: 2)
    Span4: AccountService (SpanId: 4, ParentSpanId: 2)
How are Sleuth and Zipkin used?
Answer:
1. Add the dependencies
xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
2. Configure
yaml
spring:
  sleuth:
    sampler:
      probability: 1.0 # sample 100% of requests
  zipkin:
    base-url: https://zipkin.example.com
    sender:
      type: web
3. Log output
[service-name,traceId,spanId,exportable]
[order-service,123456,1,true] Create order
[stock-service,123456,2,true] Deduct stock
4. Custom span
java
@Service
public class OrderService {
    @Autowired
    private Tracer tracer;

    public void createOrder(Order order) {
        // Start a new span as a child of the current one
        Span span = tracer.nextSpan().name("createOrder").start();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            // Business logic
            span.tag("orderId", order.getId().toString());
            span.tag("amount", order.getAmount().toString());
        } finally {
            span.end(); // always end the span so that it is reported
        }
    }
}
How is SkyWalking used?
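Answer:
SkyWalking is an open-source APM (application performance monitoring) system that originated in China and is now an Apache project. It instruments applications non-intrusively through an agent.
1. Download the agent
bash
wget https://archive.apache.org/dist/skywalking/8.9.0/apache-skywalking-apm-8.9.0.tar.gz
tar -zxvf apache-skywalking-apm-8.9.0.tar.gz
Note: newer releases ship the Java agent as a separate download from the APM distribution, so confirm that the archive contains skywalking-agent.jar.
2. Start the application
bash
java -javaagent:/path/to/skywalking-agent.jar \
  -Dskywalking.agent.service_name=order-service \
  -Dskywalking.collector.backend_service=127.0.0.1:11800 \
  -jar order-service.jar
3. View traces
Open http://localhost:8080 to view the SkyWalking UI.
Characteristics:
- Non-intrusive: no code changes required
- Supports many frameworks (Spring, Dubbo, MyBatis, and others)
- Provides a rich set of monitoring metrics
- Supports alerting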
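Service Monitoring
What are the common monitoring metrics?
Answer:
1. System metrics
- CPU usage
- Memory usage
- Disk usage
- Network traffic
2. Application metrics
- QPS (queries per second)
- Response time (RT)
- Error rate
- Concurrency
3. Business metrics
- Order volume
- Payment success rate
- User activity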
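The application metrics above are all derived from raw per-request observations. As a small illustration of the arithmetic, the sketch below computes QPS, error rate, and a 99th-percentile response time from one window of samples; `RequestStats` and its fields are hypothetical, and the percentile uses the simple nearest-rank definition, whereas real monitoring systems usually approximate percentiles with histograms.

```java
import java.util.Arrays;

/** Derives QPS, error rate, and p99 latency from one window of samples (illustrative only). */
final class RequestStats {
    final double qps;
    final double errorRate;
    final long p99Millis;

    private RequestStats(double qps, double errorRate, long p99Millis) {
        this.qps = qps;
        this.errorRate = errorRate;
        this.p99Millis = p99Millis;
    }

    /**
     * @param latenciesMillis one latency per request observed in the window
     * @param failedCount     how many of those requests failed
     * @param windowSeconds   length of the observation window
     */
    static RequestStats of(long[] latenciesMillis, int failedCount, double windowSeconds) {
        int n = latenciesMillis.length;
        if (n == 0 || windowSeconds <= 0) {
            throw new IllegalArgumentException("need at least one sample and a positive window");
        }
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        return new RequestStats(n / windowSeconds, (double) failedCount / n, percentile(sorted, 99));
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] sortedAscending, int p) {
        int rank = (p * sortedAscending.length + 99) / 100; // ceil(p / 100 * n) in integer arithmetic
        return sortedAscending[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] latencies = new long[100];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = i + 1; // 1 ms .. 100 ms
        }
        RequestStats stats = RequestStats.of(latencies, 5, 10.0);
        System.out.println(stats.qps + " " + stats.errorRate + " " + stats.p99Millis);
    }
}
```

Percentiles such as p99 are generally more useful for alerting than averages, because a small share of very slow requests barely moves the mean.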
How are Prometheus and Grafana used?
Answer:
1. Add the dependencies
xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
2. Configure
yaml
management:
  endpoints:
    web:
      exposure:
        include: prometheus,health,info
  metrics:
    export:
      prometheus:
        enabled: true
3. Prometheus configuration
yaml
# prometheus.yml
scrape_configs:
  - job_name: 'spring-boot'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']
4. Custom metrics
java
@Service
public class OrderService {
    private final Counter orderCounter;
    private final Timer orderTimer;

    public OrderService(MeterRegistry registry) {
        this.orderCounter = Counter.builder("order.created")
                .description("Number of orders created")
                .register(registry);
        this.orderTimer = Timer.builder("order.create.time")
                .description("Time taken to create an order")
                .register(registry);
    }

    public void createOrder(Order order) {
        // Time the whole operation and count each created order
        orderTimer.record(() -> {
            // Business logic
            orderCounter.increment();
        });
    }
}
Exercises
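- How would you design a highly available rate-limiting system?
- What is the difference between circuit breaking and degradation?
- How would you implement a full-link gray release?
- How would you troubleshoot performance problems in microservices?
- How would you design a monitoring and alerting system for services?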