Overview

EmoChat uses computer vision techniques to detect and classify human emotions in real-time through facial analysis. The system captures facial expressions from a webcam feed, extracts facial landmarks, normalizes the features, and classifies them into emotion categories.

How It Works

The emotion recognition pipeline consists of four main stages, sketched in code after this list:
  1. Face Detection - Detect faces in the video frame
  2. Landmark Extraction - Extract 68 facial landmark points
  3. Feature Normalization - Normalize coordinates for consistent analysis
  4. Emotion Classification - Predict emotion using the trained ML model
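Taken together, the four stages reduce to one call to get_face_landmarks() followed by the classifier. A minimal end-to-end sketch, assuming the trained model was saved with pickle under a hypothetical file name (model loading is covered on the ML Model page):
import pickle
import cv2
from utils import get_face_landmarks

emotions = ["HAPPY", "SAD"]

# Hypothetical model file name; see the ML Model page for how the model is actually saved.
with open("model.p", "rb") as f:
    model = pickle.load(f)

frame = cv2.imread("face.jpg")                 # any BGR image or webcam frame
features = get_face_landmarks(frame)           # stages 1-3: detect, fit, normalize
if features:
    emotion = emotions[int(model.predict([features])[0])]  # stage 4: classify
    print(emotion)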

Facial Landmark Detection

OpenCV Implementation

EmoChat uses OpenCV’s face detection and landmark extraction capabilities, specifically:
  • Haar Cascade Classifier for face detection
  • LBF (Local Binary Features) Model for 68-point facial landmark detection
The core implementation is in utils.py:59:
def get_face_landmarks(image, draw: bool = False, static_image_mode: bool = True) -> List[float]:
    """
    Extracts 2D facial landmarks using OpenCV only (no Mediapipe).
    Uses a Haar detector + FacemarkLBF (68 points): returns a flat
    list [x1_norm, y1_norm, x2_norm, y2_norm, ...].
    """
The system automatically downloads the required model files (haarcascade_frontalface_default.xml and lbfmodel.yaml) from OpenCV’s repository if they don’t exist locally.
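The download step itself is not shown in the excerpt above. A minimal sketch of how such lazy downloading might look (the URLs below are the usual public hosting locations for these files, assumed here rather than taken from utils.py):
# Hedged sketch: fetch the detector files only if they are missing locally.
# The URLs are typical public locations for these files, not taken from the EmoChat source.
import os
import urllib.request

_FILES = {
    "haarcascade_frontalface_default.xml":
        "https://raw.githubusercontent.com/opencv/opencv/master/data/"
        "haarcascades/haarcascade_frontalface_default.xml",
    "lbfmodel.yaml":
        "https://raw.githubusercontent.com/kurnianggoro/GSOC2017/master/data/lbfmodel.yaml",
}

def ensure_models(directory: str = ".") -> None:
    for name, url in _FILES.items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)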

68 Facial Points

The LBF model detects 68 specific facial landmark points that capture:
  • Jawline contour (17 points)
  • Eyebrow shapes (10 points)
  • Nose bridge and tip (9 points)
  • Eye contours (12 points)
  • Mouth outline (20 points)
These landmarks provide comprehensive facial geometry data for emotion analysis.
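In the standard 68-point scheme each region occupies a fixed index range, which is handy when inspecting the extracted points (the slices below follow the common iBUG 68-point convention and are an assumption, not taken from the EmoChat source):
import numpy as np

# Index ranges of the standard 68-point layout (iBUG convention).
REGIONS = {
    "jawline":  slice(0, 17),   # 17 points
    "eyebrows": slice(17, 27),  # 10 points
    "nose":     slice(27, 36),  # 9 points
    "eyes":     slice(36, 48),  # 12 points
    "mouth":    slice(48, 68),  # 20 points
}

points = np.zeros((68, 2))               # stands in for landmarks[0][0] from facemark.fit()
mouth_points = points[REGIONS["mouth"]]  # isolates the mouth outline, shape (20, 2)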

Feature Extraction Process

Step 1: Face Detection

The Haar Cascade classifier scans the grayscale image to detect faces:
face_detector, facemark = _get_models()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
Parameters:
  • scaleFactor=1.1 - Image pyramid scaling factor
  • minNeighbors=5 - Minimum neighbors required for face detection (reduces false positives)

Step 2: Landmark Fitting

Once a face is detected, the LBF model fits 68 landmarks to the facial region:
ok, landmarks = facemark.fit(gray, faces)
if not ok or len(landmarks) == 0:
    return []

# Only use the first detected face
points = landmarks[0][0]  # shape: (68, 2)

Step 3: Feature Normalization

Raw landmark coordinates vary based on face position and size. EmoChat normalizes these coordinates to make them position and scale-invariant:
xs = points[:, 0]
ys = points[:, 1]

# Calculate bounding box
min_x, min_y = xs.min(), ys.min()
max_x, max_y = xs.max(), ys.max()

width = float(max_x - min_x)
height = float(max_y - min_y)

# Normalize each point to [0, 1] range
features: List[float] = []
for (x, y) in zip(xs, ys):
    features.append(float((x - min_x) / width))
    features.append(float((y - min_y) / height))
Why Normalization? Normalizing coordinates makes the model invariant to:
  • Face size (distance from camera)
  • Face position (location in frame)
  • Image resolution
This ensures consistent predictions regardless of how the user positions themselves.
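As a quick check of this invariance, translating and scaling the same landmarks yields identical normalized features (a small illustrative script, not part of the EmoChat code):
import numpy as np

def normalize(points: np.ndarray) -> np.ndarray:
    """Bounding-box normalization, mirroring the loop above."""
    mins = points.min(axis=0)
    size = points.max(axis=0) - mins
    return (points - mins) / size

points = np.array([[10.0, 20.0], [30.0, 25.0], [20.0, 40.0]])
shifted_scaled = points * 3.0 + 100.0   # same face, closer to the camera and moved in frame

assert np.allclose(normalize(points), normalize(shifted_scaled))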

Feature Vector Output

The final feature vector contains 136 values (68 points × 2 coordinates):
[x1_norm, y1_norm, x2_norm, y2_norm, ..., x68_norm, y68_norm]
Each value is in the range [0, 1], representing normalized positions within the facial bounding box.
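Because x and y values alternate, individual coordinates can be recovered by slicing the flat list. A short sanity check (the image file name is a placeholder):
import cv2
from utils import get_face_landmarks

frame = cv2.imread("face.jpg")               # any image containing a face (placeholder file name)
features = get_face_landmarks(frame)         # 136 floats, or [] when no face is found

if features:
    xs, ys = features[0::2], features[1::2]  # 68 normalized x and 68 normalized y coordinates
    assert len(xs) == len(ys) == 68
    assert all(0.0 <= v <= 1.0 for v in features)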

Currently Supported Emotions

EmoChat currently recognizes 2 core emotions:

Happy

Detected when facial features show:
  • Raised cheek muscles
  • Mouth corners elevated
  • Crow’s feet around eyes

Sad

Detected when facial features show:
  • Downturned mouth corners
  • Lowered eyebrows
  • Relaxed facial muscles
The emotion labels are defined in app.py:19 as:
emotions = ["HAPPY", "SAD"]
The model outputs an integer index (0 or 1) which maps to these labels.

Real-time Processing

Webcam Integration

The JavaScript frontend (main.js) captures a frame from the webcam once per second:
// Start sending frames to backend every 1000ms
predictionInterval = setInterval(sendFrameForPrediction, 1000);

Frame Processing Flow

  1. Capture - JavaScript captures frame from webcam video element
  2. Encode - Frame is converted to JPEG and Base64 encoded
  3. Send - Data is sent to Flask /predict endpoint via HTTP POST
  4. Decode - Backend decodes Base64 to image array
  5. Extract - get_face_landmarks() extracts normalized features
  6. Predict - Model classifies the emotion
  7. Return - Emotion label is sent back to frontend
  8. Display - UI updates with detected emotion
# Backend processing (app.py:35)
@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()

    # Decode base64 image
    img_data = data['image'].split(',')[1]
    nparr = np.frombuffer(base64.b64decode(img_data), np.uint8)
    frame = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    
    # Extract features
    face_landmarks = get_face_landmarks(frame, draw=False, static_image_mode=True)
    
    # Predict emotion
    if len(face_landmarks) > 0:
        output = model.predict([face_landmarks])
        emotion = emotions[int(output[0])]
        return jsonify({'emotion': emotion})
Performance Consideration: Processing occurs at 1 FPS to balance responsiveness with computational efficiency. Updating the detected emotion once per second keeps feedback responsive without overwhelming the CPU.
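The same flow can also be exercised without a browser. A minimal Python test client, assuming the Flask app runs on the default local host and port (the URL is an assumption, not taken from main.js):
# Hedged sketch of a test client for the /predict endpoint.
# Host/port and the data-URL prefix are assumptions; main.js performs the same
# steps (capture, JPEG-encode, Base64, POST) from the browser.
import base64
import cv2
import requests

cap = cv2.VideoCapture(0)                  # default webcam
ok, frame = cap.read()
cap.release()

if ok:
    _, jpeg = cv2.imencode(".jpg", frame)
    payload = "data:image/jpeg;base64," + base64.b64encode(jpeg.tobytes()).decode()
    resp = requests.post("http://localhost:5000/predict", json={"image": payload})
    print(resp.json())                     # e.g. {"emotion": "HAPPY"}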

Error Handling

No Face Detected

When no face is found in the frame:
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
if len(faces) == 0:
    return []  # Empty feature vector
The API returns: {"emotion": "No face detected"}
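On the Flask side, the empty feature vector maps to that fallback response. A sketch of the corresponding branch in predict() (only the response text is taken from the docs; the surrounding structure is assumed):
    # Sketch of the no-face fallback inside predict(); the response text matches
    # the API output above, the exact surrounding code is an assumption.
    if len(face_landmarks) > 0:
        output = model.predict([face_landmarks])
        return jsonify({'emotion': emotions[int(output[0])]})
    return jsonify({'emotion': 'No face detected'})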

Invalid Input

The function validates image format before processing:
if image is None or image.ndim != 3 or image.shape[2] != 3:
    return []  # Requires 3-channel RGB/BGR image

Visualization (Optional)

For debugging, landmarks can be drawn on the image:
face_landmarks = get_face_landmarks(frame, draw=True)
This draws green circles at each of the 68 landmark positions:
if draw:
    for (x, y) in points:
        cv2.circle(image, (int(x), int(y)), 1, (0, 255, 0), -1)
This feature is used in test_model.py for real-time visualization during development.
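A development loop along those lines might look like the following hedged sketch (not the actual contents of test_model.py):
# Hedged sketch of a live visualization loop, similar in spirit to test_model.py.
import cv2
from utils import get_face_landmarks

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    get_face_landmarks(frame, draw=True)   # draws the 68 green circles onto the frame
    cv2.imshow("landmarks", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()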

Next Steps

ML Model

Learn how the Random Forest classifier is trained and makes predictions

Architecture

Understand the complete system architecture and data flow