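How we solved persistent rate limiting issues and implemented robust batch email functionality for our AI learning platform
The challenge: when emails start failing
Picture this: you have built a polished lesson scheduling system that sends personalized daily lessons to users. Everything works in development, but in production you start seeing mysterious email failures.
This is exactly what we faced at DailyMastery, our AI-powered learning platform. Despite a solid architecture built on an Nx monorepo and Fastify microservices deployed on Google Cloud Run, our email delivery system kept hitting Resend's rate limits.
Understanding the problem
The initial architecture
Our system took a simple approach:
- A lesson scheduler triggered by Cloud Scheduler (3 times a day)
- A separate API call to Resend for each lesson email
- No rate limiting protection
- Basic error handling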
// The problematic approach
for (const lesson of lessons) {
  const emailResult = await this.emailService.sendEmail({
    to: user.email,
    subject: lesson.title,
    html: lessonContent,
  })
  // Rate limits hit here when processing multiple users rapidly
}
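Root cause
Resend enforces a limit of 2 requests per second across all endpoints. When our scheduler processed multiple users, each with multiple lessons, we exceeded that limit quickly, which caused cascading failures.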
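To make the limit concrete, the sketch below estimates the minimum time that one-by-one sending needs at a given request rate. The 50 users and 3 lessons per user are hypothetical numbers chosen for illustration, and both function names are made up for this example.

```typescript
// Minimum spacing between requests that stays within a per-second limit.
function minIntervalMs(requestsPerSecond: number): number {
  return Math.ceil(1000 / requestsPerSecond);
}

// Lower bound on the wall-clock time needed to send `count` requests one by one.
function minDurationMs(count: number, requestsPerSecond: number): number {
  if (count <= 1) return 0;
  return (count - 1) * minIntervalMs(requestsPerSecond);
}

// Hypothetical load: 50 users x 3 lessons = 150 individual calls at 2 requests/second.
const calls = 50 * 3;
console.log(minIntervalMs(2)); // 500
console.log(minDurationMs(calls, 2)); // 74500
```

Any loop that fires requests faster than that spacing will run into the limit, which is what the original scheduler did.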
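The investigation: debugging Cloud Run like a pro
Step 1: Enhanced logging strategy
The first step was to implement comprehensive logging to understand exactly what was happening: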
async sendEmail(options: EmailOptions): Promise<{ success: boolean; error?: string; messageId?: string }> {
  try {
    console.log('Attempting to send email via Resend:', {
      to: options.to,
      subject: options.subject,
      htmlLength: options.html?.length || 0
    });
    // emailData: the Resend payload derived from options (its construction is not shown here)
    const result = await this.resend.emails.send(emailData);
    console.log('Resend API response:', result);
    // Critical: Check if Resend returned an error (they don't throw exceptions!)
    if (result.error) {
      console.error('Resend API returned error:', result.error);
      return {
        success: false,
        error: result.error.message || JSON.stringify(result.error),
      };
    }
    return {
      success: true,
      messageId: result.data?.id
    };
  } catch (error) {
    console.error('Failed to send email via Resend:', error);
    return {
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
    };
  }
}
Step 2: Cloud Run log analysis techniques
You can view the logs in the Google Cloud console, but if you prefer the terminal, these are the essential Cloud Run debugging commands we used:
# Read recent log entries while an issue is happening
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=lesson-scheduler" \
  --limit=50 --format="value(timestamp,textPayload)" | head -20
# Filter for specific errors
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=lesson-scheduler" \
  --limit=100 | grep -E "(rate_limit|429|Failed)"
# Check logs from a specific time range (single quotes keep the inner double quotes intact)
gcloud logging read 'resource.type=cloud_run_revision AND resource.labels.service_name=lesson-scheduler AND timestamp>="2025-09-03T16:00:00Z"' \
  --limit=50
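The solution: a multi-layer rate limiting strategy
Layer 1: Proper error detection
The first key insight: Resend does not throw an exception when it rate-limits a request. It returns an error object instead: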
// Before: Missing rate limit detection
const result = await this.resend.emails.send(emailData)
return { success: true, messageId: result.data?.id } // Wrong!
// After: Comprehensive error checking
const result = await this.resend.emails.send(emailData)
if (result.error) {
  console.error('Resend API returned error:', result.error)
  return {
    success: false,
    error: result.error.message || JSON.stringify(result.error),
  }
}
if (!result.data) {
  return {
    success: false,
    error: 'No data returned from Resend API',
  }
}
return {
  success: true,
  messageId: result.data.id,
}
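The same checks can be pulled into a small pure function that is easy to test without calling the API. The sketch below is an illustration, not our production code: the `ProviderResponse` shape is an assumption modeled on the `{ data, error }` pair used above, and `toSendResult` and the `rateLimited` flag are hypothetical names. The marker strings are the ones our retry check looks for.

```typescript
// Assumed response shape: a { data, error } pair, as in the checks above.
interface ProviderError { name?: string; message?: string }
interface ProviderResponse {
  data: { id: string } | null;
  error: ProviderError | null;
}

interface SendResult {
  success: boolean;
  messageId?: string;
  error?: string;
  rateLimited: boolean;
}

// Strings that indicate a rate limit in the error text.
const RATE_LIMIT_MARKERS = ["rate_limit_exceeded", "Too many requests"];

function toSendResult(response: ProviderResponse): SendResult {
  if (response.error) {
    const text = `${response.error.name ?? ""} ${response.error.message ?? ""}`.trim();
    return {
      success: false,
      error: text || JSON.stringify(response.error),
      rateLimited: RATE_LIMIT_MARKERS.some((marker) => text.includes(marker)),
    };
  }
  if (!response.data) {
    return { success: false, error: "No data returned from provider", rateLimited: false };
  }
  return { success: true, messageId: response.data.id, rateLimited: false };
}

const limited = toSendResult({
  data: null,
  error: { name: "rate_limit_exceeded", message: "Too many requests" },
});
console.log(limited.rateLimited); // true
```

Carrying an explicit `rateLimited` flag means callers do not have to match on error strings themselves.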
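Layer 2: Smart batch email implementation with automatic chunking
Resend provides a batch API that can send up to 100 emails in a single request. Here is our implementation, with smart fallback and automatic chunking for large sends: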
async sendBatchLessonEmails(
  emails: BatchEmailOptions[]
): Promise<{ success: boolean; error?: string; messageIds?: string[]; sentCount?: number; failedCount?: number }> {
  try {
    if (emails.length === 0) {
      return { success: true, messageIds: [], sentCount: 0, failedCount: 0 };
    }
    console.log(`Sending batch of ${emails.length} lesson emails`);
    // Automatic chunking: Split into batches of 100 (Resend limit)
    const batchSize = 100;
    const batches: BatchEmailOptions[][] = [];
    for (let i = 0; i < emails.length; i += batchSize) {
      batches.push(emails.slice(i, i + batchSize));
    }
    let totalSent = 0;
    let totalFailed = 0;
    const allMessageIds: string[] = [];
    const errors: string[] = [];
    // Process each chunk sequentially with rate limiting
    for (let i = 0; i < batches.length; i++) {
      const batch = batches[i];
      console.log(`Sending batch ${i + 1}/${batches.length} with ${batch.length} emails`);
      const result = await this.emailService.sendBatchEmails(batch);
      if (result.success) {
        totalSent += batch.length;
        if (result.messageIds) {
          allMessageIds.push(...result.messageIds);
        }
        console.log(`✅ Batch ${i + 1} sent successfully (${batch.length} emails)`);
      } else {
        totalFailed += batch.length;
        errors.push(`Batch ${i + 1} failed: ${result.error || 'Unknown error'}`);
        console.error(`❌ Batch ${i + 1} failed: ${result.error}`);
      }
      // Add delay between batches to avoid rate limiting
      if (i < batches.length - 1) {
        console.log('⏳ Waiting 1 second between batches to avoid rate limiting...');
        await new Promise(resolve => setTimeout(resolve, 1000));
      }
    }
    const overallSuccess = totalFailed === 0;
    return {
      success: overallSuccess,
      messageIds: allMessageIds,
      sentCount: totalSent,
      failedCount: totalFailed,
      error: errors.length > 0 ? errors.join('; ') : undefined,
    };
  } catch (error) {
    console.error('Error sending batch lesson emails:', error);
    return {
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error',
      sentCount: 0,
      failedCount: emails.length,
    };
  }
}
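The chunking step inside this method is independent of email sending, so it can be extracted and tested on its own. A minimal sketch follows; the generic `chunk` helper and the placeholder addresses are hypothetical and only illustrate the split at the 100-email limit.

```typescript
// Split `items` into consecutive chunks of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// 250 placeholder recipients split at the 100-email batch limit.
const recipients = Array.from({ length: 250 }, (_, i) => `user${i}@example.com`);
const sizes = chunk(recipients, 100).map((c) => c.length);
console.log(sizes); // [ 100, 100, 50 ]
```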
Layer 3: Architectural restructuring
We completely restructured the email sending flow:
// Old approach: Send emails immediately during processing
for (const user of users) {
  for (const lesson of lessons) {
    await sendEmail(user, lesson) // Rate limit hit here
  }
}
// New approach: Prepare first, then batch send
const allPreparedEmails: PreparedEmail[] = []
// Phase 1: Prepare all emails
for (const user of users) {
  for (const lesson of lessons) {
    const preparedEmail = await this.prepareLessonEmail(user, lesson)
    if (preparedEmail) {
      allPreparedEmails.push({ email: preparedEmail, user, lesson })
    }
  }
}
// Phase 2: Batch send with intelligent chunking and accurate tracking
if (allPreparedEmails.length > 0) {
  console.log(`🚀 Sending ${allPreparedEmails.length} emails in optimized batches`)
  const emailsToSend = allPreparedEmails.map((pe) => pe.email)
  // The EmailCoordinator handles chunking internally for 100+ email limits
  const batchResult = await this.emailCoordinator.sendBatchLessonEmails(emailsToSend)
  if (batchResult.success) {
    // Only mark lessons as sent based on actual successful send count
    const successfulEmails = allPreparedEmails.slice(0, batchResult.sentCount)
    await this.markLessonsAsSent(successfulEmails)
  }
}
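Slicing by `sentCount` assumes that any failed emails sit at the end of the list. If partial failures need to be tracked, one possible refinement is to record an outcome per chunk and select the matching items. The sketch below is not part of our code: `selectSent` and the `chunkOk` array are hypothetical names, and it assumes the chunks were cut in order with a fixed size.

```typescript
// Return the items that belong to chunks reported as successful.
// chunkOk[i] is the outcome of the i-th chunk of `chunkSize` items.
function selectSent<T>(items: readonly T[], chunkSize: number, chunkOk: readonly boolean[]): T[] {
  const sent: T[] = [];
  chunkOk.forEach((ok, index) => {
    if (ok) {
      sent.push(...items.slice(index * chunkSize, (index + 1) * chunkSize));
    }
  });
  return sent;
}

// Five items in chunks of two, where only the middle chunk failed.
const preparedItems = ["a", "b", "c", "d", "e"];
console.log(selectSent(preparedItems, 2, [true, false, true])); // [ 'a', 'b', 'e' ]
```

With a helper like this, lessons from successful chunks could be marked as sent even when a later or earlier chunk fails.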
Layer 4: Smart retry logic
For important emails such as admin reports, we implemented retry logic:
async sendSummaryReport(result: SchedulerResult): Promise<void> {
  // Add delay to avoid rate limiting after lesson emails
  console.log('⏳ Waiting 3 seconds to avoid rate limiting...');
  await new Promise(resolve => setTimeout(resolve, 3000));
  const emailResult = await this.emailService.sendEmail({
    to: this.reportingEmail,
    subject: `📊 Lesson Scheduler Report - ${result.timeSlot}`,
    html: htmlContent,
  });
  if (!emailResult.success) {
    // Retry logic for rate limits
    if (emailResult.error?.includes('rate_limit_exceeded') ||
        emailResult.error?.includes('Too many requests')) {
      console.log('⏳ Rate limited - waiting 10 seconds and retrying...');
      await new Promise(resolve => setTimeout(resolve, 10000));
      const retryResult = await this.emailService.sendEmail({
        to: this.reportingEmail,
        subject: `📊 Lesson Scheduler Report - ${result.timeSlot} (Retry)`,
        html: htmlContent,
      });
      if (retryResult.success) {
        console.log('✅ Summary report sent successfully on retry');
      }
    }
  }
}
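The method above retries once after a fixed 10-second wait. A more general variant retries several times with exponentially growing delays. The following is a minimal sketch under stated assumptions: `backoffDelayMs`, `sendWithBackoff`, the default base of 1 second, and the 30-second cap are all illustrative choices rather than values from our system.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

interface AttemptResult { success: boolean; error?: string }

// Retry `send` while it reports a rate limit, up to `maxRetries` extra attempts.
// `sleep` is injectable so the helper can be exercised without real waiting.
async function sendWithBackoff(
  send: () => Promise<AttemptResult>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<AttemptResult> {
  let result = await send();
  for (let attempt = 0; attempt < maxRetries && !result.success; attempt++) {
    const rateLimited =
      result.error?.includes("rate_limit_exceeded") ||
      result.error?.includes("Too many requests");
    if (!rateLimited) break; // other errors are not retried here
    await sleep(backoffDelayMs(attempt));
    result = await send();
  }
  return result;
}

console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n))); // [ 1000, 2000, 4000, 8000 ]
```

Passing `() => this.emailService.sendEmail({...})` as `send` would apply the same policy to the summary report.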
Key debugging techniques for Cloud Run
1. Structured logging
Use consistent emoji prefixes and structured data:
// Good logging practices
console.log('🚀 BATCH EMAIL MODE: Sending emails', { count: emails.length })
console.log('✅ Email sent successfully', { messageId, recipient })
console.log('❌ Email failed', { error: error.message, recipient })
console.log('⏳ Rate limiting delay', { delayMs: 600 })
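Plain-text lines like these typically reach Cloud Logging without a severity. As a complement, and assuming the service runs on Cloud Run, where single-line JSON written to stdout is parsed as a structured log and the `severity` and `message` fields are recognized, a small helper along these lines makes severity-based filters usable. The `formatLog` and `log` names are hypothetical.

```typescript
type Severity = "DEBUG" | "INFO" | "WARNING" | "ERROR";

// Build one JSON line per log entry. `severity` and `message` are written last
// so that caller-supplied fields cannot overwrite them.
function formatLog(severity: Severity, message: string, fields: Record<string, unknown> = {}): string {
  return JSON.stringify({ ...fields, severity, message });
}

function log(severity: Severity, message: string, fields?: Record<string, unknown>): void {
  console.log(formatLog(severity, message, fields));
}

log("INFO", "Batch email mode: sending emails", { count: 3 });
log("ERROR", "Email failed", { error: "rate_limit_exceeded", recipient: "user@example.com" });
```

Each call emits exactly one line, which keeps an entry from being split across several log records.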
2. Revision tracking
Always verify that your code has actually been deployed:
# Check current revision
gcloud run services describe lesson-scheduler \
--region=us-central1 \
--format="value(status.latestReadyRevisionName)"
# List recent revisions with images
gcloud run revisions list --service=lesson-scheduler \
--format="table(metadata.name,status.conditions[0].status,spec.containers[0].image)"
3. Real-time monitoring
Set up log streaming while debugging:
# Stream logs in real-time
gcloud beta logging tail "resource.type=cloud_run_revision AND resource.labels.service_name=lesson-scheduler"
# Filter for errors only (the severity clause is part of the log filter itself)
gcloud beta logging tail "resource.type=cloud_run_revision AND resource.labels.service_name=lesson-scheduler AND severity>=ERROR"
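Results and performance impact
Before optimization
- ❌ Cascading failures caused by rate limiting
- ❌ Missed lessons and a poor user experience
- ❌ Daily manual intervention required
After optimization
- ✅ 0% email failure rate
- ✅ Up to 100 emails handled automatically per batch
- ✅ No manual intervention required
Performance metrics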
Final Test Results:
- totalUsers: 1
- emailsSent: 3
- emailsFailed: 0
- errorCount: 0
- processingTime: 7.048s
- batchEmailsUsed: true
Best practices and lessons learned
1. Email service design pattern
interface EmailService {
  // Always return structured results
  sendEmail(options: EmailOptions): Promise<{
    success: boolean
    error?: string
    messageId?: string
  }>
  // Batch operations with fallback
  sendBatchEmails(emails: BatchEmailOptions[]): Promise<{
    success: boolean
    error?: string
    messageIds?: string[]
    sentCount?: number
    failedCount?: number
  }>
}
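2. Rate limiting strategy
- Batch first: use the batch API when one is available
- Automatic chunking: split large sends into chunks of 100 emails automatically
- Smart delays: 600 ms between individual requests (under 2 requests/second) and 1 second between batch requests
- Accurate tracking: mark operations as successful only based on the actual sent count
- Exponential backoff: wait progressively longer between retries
- Circuit breaker: fail fast when rate limits are hit repeatedly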
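The circuit breaker item is the least self-explanatory of these, so here is a minimal sketch of the idea. It is an illustration only: the `RateLimitBreaker` class, its threshold, and its cooldown are hypothetical and not taken from our codebase. The clock is injectable so the behavior can be checked without waiting.

```typescript
// Minimal circuit breaker: opens after `threshold` consecutive rate-limit hits
// and stays open until `cooldownMs` has elapsed.
class RateLimitBreaker {
  private consecutiveHits = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold = 3, cooldownMs = 60000, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True when a send may be attempted.
  canSend(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      this.openedAt = null; // cooldown elapsed: close the breaker again
      this.consecutiveHits = 0;
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.consecutiveHits = 0;
  }

  recordRateLimit(): void {
    this.consecutiveHits += 1;
    if (this.consecutiveHits >= this.threshold) this.openedAt = this.now();
  }
}

let clock = 0;
const breaker = new RateLimitBreaker(2, 1000, () => clock);
breaker.recordRateLimit();
breaker.recordRateLimit();
console.log(breaker.canSend()); // false
clock = 1000;
console.log(breaker.canSend()); // true
```

A caller would check `canSend()` before each request and skip the send immediately while the breaker is open, instead of adding more rejected requests.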
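Conclusion
Building a robust email delivery system requires understanding both your email provider's limits and your cloud platform's deployment mechanics. Our journey from a 30% failure rate to 0% taught us:
- Rate limits are real: plan for them from day one
- Batch APIs are a game changer: use them whenever possible
- Docker caching can bite you: structure your builds carefully
- Comprehensive logging saves time: invest in good observability
- Fallback strategies are essential: always have a plan B
The combination of Resend's batch API, smart fallback logic, proper rate limiting, and solid Cloud Run debugging techniques turned our email delivery from a daily headache into a reliable, scalable system.
Remember: in production systems it is not enough to make something work. It has to work reliably, even when things go wrong.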