Bug Report Lifecycle

A bug report follows a specific workflow managed within trackers such as GitHub, Jira or HubSpot.

State Transitions

  • Unconfirmed → New: Validated as a legitimate issue (Bug Report Optimisation).
  • New → Assigned: A developer takes ownership (Triage).
  • Assigned → Resolved: The bug is addressed. Resolution includes FIXED, WONTFIX, WORKSFORME, DUPLICATE, or INVALID.
  • Resolved → Verified/Closed: The fix is confirmed and the issue is archived.
  • Reopen: Occurs if a solution is unsatisfactory or the bug recurs.
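
The transitions above can be read as a small state machine. The following is a minimal sketch, not any tracker's actual workflow: the State enum, the advance helper, and the edges out of REOPENED, VERIFIED and CLOSED are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    UNCONFIRMED = "UNCONFIRMED"
    NEW = "NEW"
    ASSIGNED = "ASSIGNED"
    RESOLVED = "RESOLVED"
    VERIFIED = "VERIFIED"
    CLOSED = "CLOSED"
    REOPENED = "REOPENED"


# Resolutions listed in the notes for the Assigned -> Resolved step.
RESOLUTIONS = {"FIXED", "WONTFIX", "WORKSFORME", "DUPLICATE", "INVALID"}

# Allowed transitions. The edges out of REOPENED, VERIFIED and CLOSED are
# assumptions; real trackers configure these differently.
TRANSITIONS = {
    State.UNCONFIRMED: {State.NEW},
    State.NEW: {State.ASSIGNED},
    State.ASSIGNED: {State.RESOLVED},
    State.RESOLVED: {State.VERIFIED, State.CLOSED, State.REOPENED},
    State.VERIFIED: {State.CLOSED, State.REOPENED},
    State.CLOSED: {State.REOPENED},
    State.REOPENED: {State.ASSIGNED},
}


def advance(current, target, resolution=None):
    """Return the new state, or raise ValueError for an illegal move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    if target is State.RESOLVED and resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution!r}")
    return target
```

For example, advance(State.ASSIGNED, State.RESOLVED, "FIXED") succeeds, while advance(State.NEW, State.RESOLVED, "FIXED") raises ValueError because a report must be assigned before it can be resolved.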

Bug Report Classification

Goal: Automatically distinguish reports about bugs (corrective maintenance) from non-bugs like enhancements or refactoring.

Traditional ML Pipeline

  1. Text Filtering: Punctuation removal and specific transformations (e.g., splitting camel-case like MyUser → My, User).
  2. Stemming: Reducing words to their base form (e.g., caresses → caress) using rules.
  3. Indexing: Converting text into numerical vectors.
    • Term Frequency (TF): Each term is weighted by the number of times it occurs in the report.
    • Note: Unlike standard NLP, IDF (Inverse Document Frequency) is often excluded because frequent terms like “crash” or “failure” are highly significant indicators of a bug.
  4. Training: Applying models like Decision Trees, Naive Bayes or Logistic Regression.
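
Steps 1 to 3 can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the stemmer applies only the plural-suffix rules from Porter's first step rather than a full stemmer, and TF is taken as a raw count with no IDF weighting.

```python
import re
from collections import Counter


def split_camel_case(token):
    """Split at lower-to-upper boundaries: 'MyUser' -> ['My', 'User']."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", token).split()


def filter_text(text):
    """Drop punctuation, split camel-case identifiers, then lowercase."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [part.lower() for word in words for part in split_camel_case(word)]


def stem(word):
    """Plural-suffix rules only (Porter's step 1a), not a full stemmer."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


def tf_vector(text, vocabulary):
    """Raw term counts over a fixed vocabulary; no IDF weighting."""
    counts = Counter(stem(w) for w in filter_text(text))
    return [counts[term] for term in vocabulary]
```

With the vocabulary ["crash", "user", "caress"], tf_vector("MyUser caresses: crash, crash!", ...) returns [2, 1, 1]. Vectors of this kind are what a classifier in step 4 would be trained on.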

Advanced Approach: MHNurf/MHAN

Traditional tools struggle with the semi-structured nature of reports. The Multifaceted Hierarchical Attention Network (MHNurf) improves results by:

  • Parallel Semantic Learning: Analysing title, description, comments and code snippets separately.
  • Two-Level Attention: Weighting importance at both the word/token level and the sentence/statement level.
  • Concatenation: Merging different feature types into a final prediction vector.
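
The three ideas above can be sketched in a few lines. This is a heavily simplified illustration, not the MHNurf architecture itself: the real model learns its encoders and context vectors during training, whereas here the word vectors and the two context vectors are passed in as fixed lists, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(vectors, context):
    """Attention pooling: weight each vector by softmax(vector . context)."""
    weights = softmax([dot(v, context) for v in vectors])
    size = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(size)]


def encode_field(sentences, word_context, sentence_context):
    """Word-level attention per sentence, then sentence-level attention."""
    sentence_vectors = [attend(words, word_context) for words in sentences]
    return attend(sentence_vectors, sentence_context)


def encode_report(fields, word_context, sentence_context):
    """Encode each field independently and concatenate the results."""
    encoded = []
    for name in sorted(fields):
        encoded.extend(encode_field(fields[name], word_context, sentence_context))
    return encoded
```

Here fields maps a field name such as "title" to a list of sentences, each sentence being a list of word vectors. The concatenated output would feed a final classification layer, which is omitted.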

Bug Report Prioritisation

Goal: Predict the priority label (P1 to P5) to help developers focus on the most critical issues first.

The DRONE Framework (Linear Regression)

DRONE uses a combination of textual and non-textual features:

  • Formula:
  • Features: Includes Temporal factors (frequency of reports), Author factors (author’s history), and Related-report factors.
  • Thresholding: Instead of simple rounding, it uses Greedy Hill Climbing on validation data to determine optimal cutoffs for P1-P5 labels.
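
The thresholding step can be illustrated as follows. This is a sketch of greedy hill climbing over cutoffs, not DRONE's exact procedure: the objective (plain accuracy), the step size and the function names are assumptions, and the original work may optimise a different measure.

```python
def to_label(score, thresholds):
    """Map a regression score to 1..len(thresholds)+1 using sorted cutoffs."""
    for label, cutoff in enumerate(thresholds, start=1):
        if score <= cutoff:
            return label
    return len(thresholds) + 1


def accuracy(scores, labels, thresholds):
    hits = sum(to_label(s, thresholds) == y for s, y in zip(scores, labels))
    return hits / len(labels)


def tune_thresholds(scores, labels, thresholds, step=0.1, max_rounds=100):
    """Greedy hill climbing: nudge one cutoff at a time, keep improvements."""
    best = list(thresholds)
    best_score = accuracy(scores, labels, best)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(best)):
            for delta in (-step, step):
                candidate = list(best)
                candidate[i] += delta
                if candidate != sorted(candidate):
                    continue  # cutoffs must stay ordered
                score = accuracy(scores, labels, candidate)
                if score > best_score:
                    best, best_score, improved = candidate, score, True
        if not improved:
            break
    return best, best_score
```

Starting from the rounding cutoffs [1.5, 2.5, 3.5, 4.5], a validation report with score 4.58 and true label P4 would be rounded to P5. Hill climbing moves the last cutoff upward until that report is labelled correctly. Because only strict improvements are accepted, the search can stall on plateaus, which is a known limitation of this kind of greedy search.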

CNN-Based Prioritisation

Modern approaches use Convolutional Neural Networks and Emotional Analysis:

  1. Emotion Extraction: Identifying positive/negative sentiment. Negative emotion (e.g., “Breakpoint not hit”) can signal urgency.
  2. Word2Vec: Converting words into d-dimensional vectors where similar words are closer together in space.
  3. Convolutional Layer: Sliding filters over the report representation to extract local features.
    • Intuition: At each position, the filter computes the dot product between its weights and the window of the report representation it currently covers.
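
The sliding dot product in step 3 can be sketched without any deep-learning library. The shapes are assumptions made for illustration: the report is taken to be n word vectors of dimension d, and the filter covers h consecutive words (an h x d matrix), with stride 1 and no padding.

```python
def convolve(report, kernel, bias=0.0):
    """Slide an h x d filter down an n x d report matrix (stride 1, no padding).

    Each output value is the dot product of the filter with one window of
    h consecutive word vectors, plus a bias.
    """
    n, h = len(report), len(kernel)
    features = []
    for start in range(n - h + 1):
        window = report[start:start + h]
        total = sum(
            w * x
            for kernel_row, word_vector in zip(kernel, window)
            for w, x in zip(kernel_row, word_vector)
        )
        features.append(total + bias)
    return features


def relu(values):
    """A common non-linearity applied after the convolution."""
    return [max(0.0, v) for v in values]
```

For a 4 x 2 report [[1, 0], [0, 1], [1, 1], [2, 0]] and a 2 x 2 filter [[1, 0], [0, 1]], convolve yields three features, one per window: [2.0, 1.0, 1.0]. A real network would apply many such filters and pool their outputs.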

Duplicate Bug Report Detection

Goal: Identify whether a new report matches an existing bucket of “master” and duplicate reports, to prevent redundant work.

BM25Fext Similarity Measure

Duplicates are detected by measuring similarity between a new report and existing buckets.

  • BM25F: A ranking function used to estimate the relevance of documents to a given search query.
  • BM25Fext: Extends the original by adding a weight for the query report itself, making it more accurate for long bug reports.
  • REP Metric: A weighted sum of seven features including summary, description, product, component, type, priority and version.
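
A BM25F-style score with a query-side weight can be sketched as below. This follows the commonly cited form of the measure as an assumption, not a verified transcription: the parameter defaults (b, k1, k3), the single shared b for all fields, the field weights and the function name are illustrative choices, and tuned values would differ in practice.

```python
import math
from collections import Counter


def bm25f_ext(query, doc, corpus, field_weights, b=0.5, k1=2.0, k3=1.0):
    """Score doc against query; each report is {field_name: [tokens]}."""
    n_docs = len(corpus)
    avg_len = {
        f: sum(len(r[f]) for r in corpus) / n_docs for f in field_weights
    }
    q_counts = {f: Counter(query[f]) for f in field_weights}
    d_counts = {f: Counter(doc[f]) for f in field_weights}
    shared = set().union(*q_counts.values()) & set().union(*d_counts.values())

    score = 0.0
    for term in shared:
        # Number of corpus reports that contain the term in any field.
        df = sum(any(term in r[f] for f in field_weights) for r in corpus)
        idf = math.log(n_docs / df) if df else 0.0
        # Document-side term frequency, length-normalised per field.
        tf_d = sum(
            w * d_counts[f][term] / (1 - b + b * len(doc[f]) / avg_len[f])
            for f, w in field_weights.items()
            if avg_len[f] > 0
        )
        # Query-side term frequency: the weight for the query report itself.
        tf_q = sum(w * q_counts[f][term] for f, w in field_weights.items())
        w_q = (k3 + 1) * tf_q / (k3 + tf_q)
        score += idf * tf_d / (k1 + tf_d) * w_q
    return score
```

A new report would be scored against each existing report (or bucket master), and the highest-scoring candidates shown to a triager. Terms that occur in every report get an IDF of zero and so contribute nothing.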

Bug Localisation using Reports

Goal: Locate the specific source code files responsible for a bug using information from the report.

Learning to Rank (LTR)

Instead of simple search, LTR models rank files by their likelihood of being buggy.

  • Key Features:
    1. Surface Lexical Similarity: Cosine similarity between report text and source code.
    2. API Specification: Mapping report tokens to API documentation (e.g., report mentions “toolbar”, code uses MUILabel which documents toolbars).
    3. Collaborative Filtering: Checking if a file was previously fixed for similar bug reports.
    4. Recency/Frequency: Files fixed recently or frequently are more likely to contain new bugs.
  • Ranking Formula:
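
One common way to combine such features is a weighted sum per (report, file) pair, with files sorted by the resulting score. The sketch below assumes that form. The feature names, the dictionary layout of files and the hand-set weights are hypothetical; in an actual LTR setup the weights would be learned from previously fixed reports.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score_file(features, weights):
    """Weighted sum of feature values for one (report, file) pair."""
    return sum(weights[name] * value for name, value in features.items())


def rank_files(report_tokens, files, weights):
    """Return file names sorted from most to least suspicious."""
    scored = []
    for name, info in files.items():
        features = {
            "lexical": cosine_similarity(report_tokens, info["tokens"]),
            "api": cosine_similarity(report_tokens, info["api_doc_tokens"]),
            "collaborative": info["similar_fix_score"],
            "recency": info["recency"],
            "frequency": info["fix_count"],
        }
        scored.append((score_file(features, weights), name))
    return [name for _, name in sorted(scored, reverse=True)]
```

A report mentioning “toolbar” would then rank a file whose code or API documentation mentions toolbars, and which was fixed recently, above an unrelated file that was rarely changed.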