7.7 KiB
Telegram Duplicate Prevention System
Overview
This document describes the comprehensive duplicate prevention system implemented for Telegram message processing in the Tududi application. The system prevents duplicate inbox items from being created when the same message is processed multiple times due to network issues, processing errors, or other edge cases.
Problem Statement
When integrating with Telegram's polling API, several scenarios can cause duplicate messages:
- Network timeouts causing message reprocessing
- Application restarts during message processing
- Race conditions in polling cycles
- Telegram API returning the same update multiple times
- Processing failures that lead to retry attempts
Solution Architecture
The duplicate prevention system implements multiple layers of protection:
1. Update ID Tracking (Application Level)
- Purpose: Prevent reprocessing of the same Telegram update
- Implementation: In-memory Set tracking processed update IDs
- Key Format:
${userId}-${updateId} - Memory Management: Automatic cleanup (keeps last 1000 entries)
// Example usage
const processedUpdates = new Set();
const updateKey = `${user.id}-${update.update_id}`;
if (!processedUpdates.has(updateKey)) {
// Process the update
processedUpdates.add(updateKey);
}
2. Content-Based Duplicate Detection (Database Level)
- Purpose: Prevent duplicate inbox items with identical content
- Implementation: Database query checking recent items (30-second window)
- Scope: Per-user, per-source (telegram)
// Check for existing item within time window
const existingItem = await InboxItem.findOne({
where: {
content: messageContent,
user_id: userId,
source: 'telegram',
created_at: {
[Op.gte]: new Date(Date.now() - 30000) // 30 seconds
}
}
});
3. Telegram API Offset Management
- Purpose: Prevent re-fetching already processed updates
- Implementation: Track highest processed update ID per user
- Persistence: Maintained in poller state
Code Structure
Core Files
services/telegramPoller.js: Main polling logic with duplicate preventionservices/telegramInitializer.js: Initialization and user managementtests/unit/services/telegramPoller.test.js: Unit tests for core functionstests/integration/telegram-duplicates.test.js: Integration tests for database interactionstests/integration/telegram-duplicate-scenario.test.js: Real-world scenario tests
Key Functions
processUpdates(user, updates)
- Filters out already processed updates
- Updates user status with highest update ID
- Processes each new update individually
- Implements cleanup for memory management
createInboxItem(content, userId, messageId)
- Checks for recent duplicates before creation
- Creates inbox item with metadata
- Returns existing item if duplicate found
processMessage(user, update)
- Extracts message data
- Creates inbox item (with duplicate check)
- Sends confirmation back to Telegram
- Handles errors gracefully
Testing Strategy
Unit Tests (12 tests)
- Update ID tracking logic
- User list management
- Message parameter creation
- URL generation
- State management
Integration Tests (12 tests)
- Database-level duplicate prevention
- Poller state management
- Update processing logic
- Error handling scenarios
Scenario Tests (7 tests)
- Network issue simulations
- Rapid consecutive messages
- Update ID tracking in practice
- Poller restart scenarios
- Memory management verification
- Edge cases and time-based logic
Running Tests
# Run all Telegram duplicate tests
npm run test:telegram-duplicates
# Run specific test suites
npm test -- --testPathPatterns="telegramPoller\.test\.js"
npm test -- --testPathPatterns="telegram-duplicates\.test\.js"
npm test -- --testPathPatterns="telegram-duplicate-scenario\.test\.js"
Configuration
Time Windows
- Duplicate Detection: 30 seconds (configurable)
- Memory Cleanup: 1000 entries (configurable)
- Polling Interval: 5 seconds (configurable)
Memory Management
- Automatic cleanup when processed updates exceed 1000 entries
- Removes oldest 100 entries during cleanup
- Prevents memory leaks in long-running processes
Monitoring and Debugging
Logging
The system includes comprehensive logging for debugging:
console.log(`Processing ${updates.length} updates for user ${user.id}`);
console.log(`Duplicate inbox item detected for user ${userId}`);
console.log(`Successfully processed message ${messageId} for user ${user.id}`);
Status Monitoring
const status = telegramPoller.getStatus();
// Returns: { running, usersCount, pollInterval, userStatus }
Error Handling
Network Errors
- Graceful handling of Telegram API timeouts
- Continuation of polling for other users if one fails
- Detailed error logging for debugging
Database Errors
- Fallback behavior when duplicate check fails
- Transaction safety for inbox item creation
- Proper error responses to users
Processing Errors
- Individual update error handling
- Continuation of processing for remaining updates
- Error reporting via Telegram messages
Performance Considerations
Memory Usage
- Bounded memory growth through automatic cleanup
- Efficient Set operations for duplicate checking
- Minimal memory footprint per user
Database Performance
- Indexed queries for duplicate detection
- Time-limited searches (30-second window)
- Efficient user-scoped queries
Network Efficiency
- Optimized Telegram API calls
- Proper offset management
- Reasonable polling intervals
Security Considerations
Data Protection
- User-scoped duplicate checking
- Secure token handling
- No sensitive data in logs
Rate Limiting
- Respectful Telegram API usage
- Configurable polling intervals
- Graceful handling of API limits
Future Enhancements
Possible Improvements
- Persistent State: Store processed update IDs in database
- Configurable Windows: Make time windows user-configurable
- Metrics Collection: Add detailed metrics for monitoring
- Retry Logic: Implement exponential backoff for failures
- Batch Processing: Process multiple updates in batches
Migration Considerations
- Current system maintains backward compatibility
- New features can be added incrementally
- Existing data remains unaffected
Troubleshooting
Common Issues
-
Duplicates Still Occurring
- Check if time window is appropriate for your use case
- Verify update ID tracking is working
- Review logs for processing errors
-
Memory Usage Growing
- Verify cleanup logic is running
- Check if cleanup threshold needs adjustment
- Monitor processed updates Set size
-
Performance Issues
- Review database query performance
- Check polling interval settings
- Monitor network latency to Telegram API
Debug Commands
// Check poller status
const status = telegramPoller.getStatus();
// Manual cleanup (for testing)
if (processedUpdates.size > 1000) {
// Cleanup logic
}
// Verify duplicate detection
const existing = await InboxItem.findOne({ /* query */ });
Conclusion
The Telegram duplicate prevention system provides robust protection against message duplication through multiple complementary mechanisms. The comprehensive test suite ensures reliability, while the monitoring and debugging features facilitate maintenance and troubleshooting.
The system is designed to be:
- Reliable: Multiple layers of protection
- Efficient: Minimal performance impact
- Maintainable: Well-tested and documented
- Scalable: Bounded memory usage and efficient queries
- Debuggable: Comprehensive logging and monitoring