tududi/backend/docs/telegram-duplicate-prevention.md
2025-06-20 12:04:36 +03:00

7.7 KiB

Telegram Duplicate Prevention System

Overview

This document describes the comprehensive duplicate prevention system implemented for Telegram message processing in the Tududi application. The system prevents duplicate inbox items from being created when the same message is processed multiple times due to network issues, processing errors, or other edge cases.

Problem Statement

When integrating with Telegram's polling API, several scenarios can cause duplicate messages:

  • Network timeouts causing message reprocessing
  • Application restarts during message processing
  • Race conditions in polling cycles
  • Telegram API returning the same update multiple times
  • Processing failures that lead to retry attempts

Solution Architecture

The duplicate prevention system implements multiple layers of protection:

1. Update ID Tracking (Application Level)

  • Purpose: Prevent reprocessing of the same Telegram update
  • Implementation: In-memory Set tracking processed update IDs
  • Key Format: ${userId}-${updateId}
  • Memory Management: Automatic cleanup (keeps last 1000 entries)
// Example usage
const processedUpdates = new Set();
const updateKey = `${user.id}-${update.update_id}`;

if (!processedUpdates.has(updateKey)) {
  // Process the update
  processedUpdates.add(updateKey);
}

2. Content-Based Duplicate Detection (Database Level)

  • Purpose: Prevent duplicate inbox items with identical content
  • Implementation: Database query checking recent items (30-second window)
  • Scope: Per-user, per-source (telegram)
// Check for existing item within time window
const existingItem = await InboxItem.findOne({
  where: {
    content: messageContent,
    user_id: userId,
    source: 'telegram',
    created_at: {
      [Op.gte]: new Date(Date.now() - 30000) // 30 seconds
    }
  }
});

3. Telegram API Offset Management

  • Purpose: Prevent re-fetching already processed updates
  • Implementation: Track highest processed update ID per user
  • Persistence: Maintained in poller state

Code Structure

Core Files

  • services/telegramPoller.js: Main polling logic with duplicate prevention
  • services/telegramInitializer.js: Initialization and user management
  • tests/unit/services/telegramPoller.test.js: Unit tests for core functions
  • tests/integration/telegram-duplicates.test.js: Integration tests for database interactions
  • tests/integration/telegram-duplicate-scenario.test.js: Real-world scenario tests

Key Functions

processUpdates(user, updates)

  • Filters out already processed updates
  • Updates user status with highest update ID
  • Processes each new update individually
  • Implements cleanup for memory management

createInboxItem(content, userId, messageId)

  • Checks for recent duplicates before creation
  • Creates inbox item with metadata
  • Returns existing item if duplicate found

processMessage(user, update)

  • Extracts message data
  • Creates inbox item (with duplicate check)
  • Sends confirmation back to Telegram
  • Handles errors gracefully

Testing Strategy

Unit Tests (12 tests)

  • Update ID tracking logic
  • User list management
  • Message parameter creation
  • URL generation
  • State management

Integration Tests (12 tests)

  • Database-level duplicate prevention
  • Poller state management
  • Update processing logic
  • Error handling scenarios

Scenario Tests (7 tests)

  • Network issue simulations
  • Rapid consecutive messages
  • Update ID tracking in practice
  • Poller restart scenarios
  • Memory management verification
  • Edge cases and time-based logic

Running Tests

# Run all Telegram duplicate tests
npm run test:telegram-duplicates

# Run specific test suites
npm test -- --testPathPatterns="telegramPoller\.test\.js"
npm test -- --testPathPatterns="telegram-duplicates\.test\.js"
npm test -- --testPathPatterns="telegram-duplicate-scenario\.test\.js"

Configuration

Time Windows

  • Duplicate Detection: 30 seconds (configurable)
  • Memory Cleanup: 1000 entries (configurable)
  • Polling Interval: 5 seconds (configurable)

Memory Management

  • Automatic cleanup when processed updates exceed 1000 entries
  • Removes oldest 100 entries during cleanup
  • Prevents memory leaks in long-running processes

Monitoring and Debugging

Logging

The system includes comprehensive logging for debugging:

console.log(`Processing ${updates.length} updates for user ${user.id}`);
console.log(`Duplicate inbox item detected for user ${userId}`);
console.log(`Successfully processed message ${messageId} for user ${user.id}`);

Status Monitoring

const status = telegramPoller.getStatus();
// Returns: { running, usersCount, pollInterval, userStatus }

Error Handling

Network Errors

  • Graceful handling of Telegram API timeouts
  • Continuation of polling for other users if one fails
  • Detailed error logging for debugging

Database Errors

  • Fallback behavior when duplicate check fails
  • Transaction safety for inbox item creation
  • Proper error responses to users

Processing Errors

  • Individual update error handling
  • Continuation of processing for remaining updates
  • Error reporting via Telegram messages

Performance Considerations

Memory Usage

  • Bounded memory growth through automatic cleanup
  • Efficient Set operations for duplicate checking
  • Minimal memory footprint per user

Database Performance

  • Indexed queries for duplicate detection
  • Time-limited searches (30-second window)
  • Efficient user-scoped queries

Network Efficiency

  • Optimized Telegram API calls
  • Proper offset management
  • Reasonable polling intervals

Security Considerations

Data Protection

  • User-scoped duplicate checking
  • Secure token handling
  • No sensitive data in logs

Rate Limiting

  • Respectful Telegram API usage
  • Configurable polling intervals
  • Graceful handling of API limits

Future Enhancements

Possible Improvements

  1. Persistent State: Store processed update IDs in database
  2. Configurable Windows: Make time windows user-configurable
  3. Metrics Collection: Add detailed metrics for monitoring
  4. Retry Logic: Implement exponential backoff for failures
  5. Batch Processing: Process multiple updates in batches

Migration Considerations

  • Current system maintains backward compatibility
  • New features can be added incrementally
  • Existing data remains unaffected

Troubleshooting

Common Issues

  1. Duplicates Still Occurring

    • Check if time window is appropriate for your use case
    • Verify update ID tracking is working
    • Review logs for processing errors
  2. Memory Usage Growing

    • Verify cleanup logic is running
    • Check if cleanup threshold needs adjustment
    • Monitor processed updates Set size
  3. Performance Issues

    • Review database query performance
    • Check polling interval settings
    • Monitor network latency to Telegram API

Debug Commands

// Check poller status
const status = telegramPoller.getStatus();

// Manual cleanup (for testing)
if (processedUpdates.size > 1000) {
  // Cleanup logic
}

// Verify duplicate detection
const existing = await InboxItem.findOne({ /* query */ });

Conclusion

The Telegram duplicate prevention system provides robust protection against message duplication through multiple complementary mechanisms. The comprehensive test suite ensures reliability, while the monitoring and debugging features facilitate maintenance and troubleshooting.

The system is designed to be:

  • Reliable: Multiple layers of protection
  • Efficient: Minimal performance impact
  • Maintainable: Well-tested and documented
  • Scalable: Bounded memory usage and efficient queries
  • Debuggable: Comprehensive logging and monitoring