Bug Report Lifecycle
A bug report follows a specific workflow managed within trackers such as GitHub Issues, Jira, or Bugzilla.
State Transitions
- Unconfirmed → New: Validated as a legitimate issue (Bug Report Optimisation).
- New → Assigned: A developer takes ownership (Triage).
- Assigned → Resolved: The bug is addressed. Resolution includes FIXED, WONTFIX, WORKSFORME, DUPLICATE, or INVALID.
- Resolved → Verified/Closed: The fix is confirmed and the issue is archived.
- Reopen: Occurs if a solution is unsatisfactory or the bug recurs.
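The transitions above form a small state machine, which can be sketched as a lookup table (the state and resolution names follow the list above; the dict-based API itself is illustrative, not any particular tracker's):

```python
# Allowed state transitions in the bug-report lifecycle described above.
TRANSITIONS = {
    "UNCONFIRMED": {"NEW"},
    "NEW": {"ASSIGNED"},
    "ASSIGNED": {"RESOLVED"},
    "RESOLVED": {"VERIFIED", "CLOSED", "REOPENED"},
    "VERIFIED": {"CLOSED", "REOPENED"},
    "CLOSED": {"REOPENED"},
    "REOPENED": {"ASSIGNED", "RESOLVED"},
}

# Resolutions recorded when a report enters RESOLVED.
RESOLUTIONS = {"FIXED", "WONTFIX", "WORKSFORME", "DUPLICATE", "INVALID"}

def can_transition(current: str, target: str) -> bool:
    """Return True if the workflow allows moving current -> target."""
    return target in TRANSITIONS.get(current, set())

print(can_transition("NEW", "ASSIGNED"))       # True
print(can_transition("UNCONFIRMED", "CLOSED"))  # False
```

Encoding the workflow as data makes it easy to reject invalid moves (e.g. closing an unconfirmed report) before they hit the tracker.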
Bug Report Classification
Goal: Automatically distinguish reports about bugs (corrective maintenance) from non-bugs like enhancements or refactoring.
Traditional ML Pipeline
- Text Filtering: Punctuation removal and specific transformations (e.g., splitting camel case: MyUser → My, User).
- Stemming: Reducing words to their base form (e.g., caresses → caress) using rules.
- Indexing: Converting text into numerical vectors.
- Term Frequency (TF): TF(t, d) = the number of occurrences of term t in report d.
- Note: Unlike standard NLP, IDF (Inverse Document Frequency) is often excluded because frequent terms like “crash” or “failure” are highly significant indicators of a bug.
- Training: Applying models like Decision Trees, Naive Bayes, or Logistic Regression.
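The filtering, stemming, and TF-indexing steps can be sketched with the standard library (the stemming rules here are a toy stand-in for Porter-style rules, and the tokeniser is illustrative):

```python
import re
from collections import Counter

def split_camel_case(token: str) -> list:
    # Camel-case splitting: "MyUser" -> ["My", "User"]
    return re.findall(r"[A-Z][a-z]*|[a-z]+|\d+", token)

def stem(word: str) -> str:
    # Toy rule-based stemmer: "caresses" -> "caress", "crashes" -> "crash"
    for suffix, repl in (("sses", "ss"), ("ies", "i"), ("es", ""), ("s", "")):
        if word.endswith(suffix):
            return word[: -len(suffix)] + repl
    return word

def index_tf(text: str) -> Counter:
    # Text filtering: drop punctuation, split camel case, lowercase,
    # then index as raw term frequencies (no IDF, per the note above).
    tokens = []
    for raw in re.findall(r"[A-Za-z0-9]+", text):
        tokens.extend(t.lower() for t in split_camel_case(raw))
    return Counter(stem(t) for t in tokens)

vec = index_tf("App crashes when MyUser caresses the toolbar!")
print(vec["crash"], vec["caress"])  # 1 1
```

The resulting counts form the TF vector that a Decision Tree, Naive Bayes, or Logistic Regression model would be trained on.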
Advanced Approach: MHNurf/MHAN
Traditional tools struggle with the semi-structured nature of reports. The Multifaceted Hierarchical Attention Network (MHNurf) improves results by:
- Parallel Semantic Learning: Analysing title, description, comments and code snippets separately.
- Two-Level Attention: Weighting importance at both the word/token level and the sentence/statement level.
- Concatenation: Merging different feature types into a final prediction vector.
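The two-level attention idea can be sketched in NumPy; the embeddings and context vectors below are random stand-ins for the parameters MHNurf would learn, so this only illustrates the pooling mechanics:

```python
import numpy as np

def attention_pool(H: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Softmax-weighted average of the rows of H against context vector u."""
    scores = H @ u
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ H

# Toy facet: 2 sentences x 3 words x 4-dim embeddings.
rng = np.random.default_rng(0)
facet = rng.normal(size=(2, 3, 4))
u_word, u_sent = rng.normal(size=4), rng.normal(size=4)

# Level 1: word/token attention -> one vector per sentence.
sent_vecs = np.stack([attention_pool(s, u_word) for s in facet])
# Level 2: sentence/statement attention -> one vector for this facet.
facet_vec = attention_pool(sent_vecs, u_sent)
# Concatenation step: vectors from all facets (title, description,
# comments, code) would be concatenated into the final prediction vector.
print(facet_vec.shape)  # (4,)
```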
Bug Report Prioritisation
Goal: Predict the priority label (P1 to P5) to help developers focus on the most critical issues first.
The DRONE Framework (Linear Regression)
DRONE uses a combination of textual and non-textual features:
- Formula: The priority score is modelled as a linear combination of the features below (score = w₀ + w₁x₁ + … + wₙxₙ), which is then thresholded into a P1-P5 label.
- Features: Includes Temporal factors (frequency of reports), Author factors (author’s history), and Related-report factors.
- Thresholding: Instead of simple rounding, it uses Greedy Hill Climbing on validation data to determine optimal cutoffs for P1-P5 labels.
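The thresholding step can be sketched as follows; the data, starting cutoffs, and step size are synthetic, and this is a minimal greedy hill-climbing loop rather than DRONE's exact procedure:

```python
import random

def assign_priority(score, cuts):
    # cuts = [c1, c2, c3, c4]; score < c1 -> P1, ..., score >= c4 -> P5
    for i, c in enumerate(cuts):
        if score < c:
            return i + 1
    return 5

def accuracy(scores, labels, cuts):
    return sum(assign_priority(s, cuts) == y
               for s, y in zip(scores, labels)) / len(labels)

def hill_climb(scores, labels, cuts, step=0.05, iters=300):
    """Greedy hill climbing on validation data: nudge one cutoff at a
    time and keep the move only if validation accuracy improves."""
    best = accuracy(scores, labels, cuts)
    for _ in range(iters):
        i = random.randrange(len(cuts))
        for delta in (step, -step):
            trial = cuts[:]
            trial[i] += delta
            if trial == sorted(trial):  # cutoffs must stay ordered
                acc = accuracy(scores, labels, trial)
                if acc > best:
                    best, cuts = acc, trial
    return cuts, best

# Synthetic validation set whose true cutoffs sit at 0.2/0.4/0.6/0.8.
random.seed(0)
scores = [random.random() for _ in range(200)]
labels = [assign_priority(s, [0.2, 0.4, 0.6, 0.8]) for s in scores]
start = [0.3, 0.5, 0.7, 0.9]
cuts, acc = hill_climb(scores, labels, start)
print(round(acc, 2))
```

Unlike simple rounding, the learned cutoffs can adapt to skewed label distributions (most reports are mid-priority).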
CNN-Based Prioritisation
Modern approaches use Convolutional Neural Networks and Emotional Analysis:
- Emotion Extraction: Identifying positive/negative sentiment. Negative emotion (e.g., “Breakpoint not hit”) can signal urgency.
- Word2Vec: Converting words into n-dimensional vectors where similar words are closer together in space.
- Convolutional Layer: Sliding filters over the report representation to extract local features.
- Intuition: For a report represented as a matrix of word vectors, the filter computes dot products with each sliding window of words.
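The sliding-filter intuition can be shown directly in NumPy; the report matrix and filter values are random stand-ins for learned embeddings and weights:

```python
import numpy as np

# A report as a 6x4 matrix: 6 words, each a 4-dim word2vec-style vector.
rng = np.random.default_rng(1)
report = rng.normal(size=(6, 4))
# One convolutional filter spanning a window of 3 consecutive words.
filt = rng.normal(size=(3, 4))

# Slide the filter down the report: each output is the dot product of
# the filter with one 3-word window -- the "local feature" it extracts.
features = np.array([
    np.sum(report[i:i + 3] * filt)
    for i in range(report.shape[0] - 3 + 1)
])
print(features.shape)  # (4,)
```

A real layer would apply many such filters (plus a bias and nonlinearity) and pool the resulting feature maps.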
Duplicate Bug Report Detection
Goal: Identify if a new report matches an existing bucket of "master" and duplicate reports to prevent redundant work.
BM25Fext Similarity Measure
Duplicates are detected by measuring similarity between a new report and existing buckets.
- BM25F: A ranking function used to estimate the relevance of documents to a given search query.
- BM25Fext: Extends the original by adding a weight for the query report itself, making it more accurate for long bug reports.
- REP Metric: A weighted sum of seven features including summary, description, product, component, type, priority and version.
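The REP metric's weighted sum can be sketched as below; the weights are illustrative placeholders, not the tuned values from the original work, and the per-field similarities (BM25F-based for text fields, exact match for categorical ones) are assumed to be computed elsewhere:

```python
# Hypothetical weights over REP's seven features.
REP_WEIGHTS = {
    "summary": 3.0, "description": 2.0, "product": 1.0,
    "component": 1.0, "type": 0.5, "priority": 0.5, "version": 0.5,
}

def rep_score(features: dict) -> float:
    """features maps each field to a similarity in [0, 1] between the
    new report and a candidate bucket; REP is their weighted sum."""
    return sum(REP_WEIGHTS[f] * features.get(f, 0.0) for f in REP_WEIGHTS)

new_vs_bucket = {"summary": 0.8, "description": 0.6, "product": 1.0,
                 "component": 1.0, "type": 1.0, "priority": 0.0, "version": 1.0}
print(round(rep_score(new_vs_bucket), 2))  # 6.6
```

Buckets are then ranked by this score, and the top candidates are shown as likely duplicates.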
Bug Localisation using Reports
Goal: Locate the specific source code files responsible for a bug using information from the report.
Learning to Rank (LTR)
Instead of simple search, LTR models rank files by their likelihood of being buggy.
- Key Features:
- Surface Lexical Similarity: Cosine similarity between report text and source code.
- API Specification: Mapping report tokens to API documentation (e.g., the report mentions "toolbar", the code uses MUILabel, which documents toolbars).
- Collaborative Filtering: Checking if a file was previously fixed for similar bug reports.
- Recency/Frequency: Files fixed recently or frequently are more likely to contain new bugs.
- Ranking Formula: A weighted combination of the feature scores above, with the weights learned from previously fixed bugs.
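The LTR scoring idea can be sketched as a weighted sum over the feature groups above; the weights, file names, and feature values are hypothetical, and a real LTR model would learn the weights from past fixed bugs:

```python
import math

def cosine(a: dict, b: dict) -> float:
    # Surface lexical similarity between TF vectors of report and file.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical learned weights for the four feature groups above.
W = {"lexical": 2.0, "api": 1.0, "collab": 1.5, "recency": 0.5}

def rank_score(feats: dict) -> float:
    # LTR scoring: weighted sum of per-file feature values.
    return sum(W[k] * feats.get(k, 0.0) for k in W)

candidates = {
    "Toolbar.java": {"lexical": 0.7, "api": 1.0, "collab": 0.4, "recency": 0.9},
    "Parser.java":  {"lexical": 0.2, "api": 0.0, "collab": 0.1, "recency": 0.3},
}
ranked = sorted(candidates, key=lambda f: rank_score(candidates[f]), reverse=True)
print(ranked[0])  # Toolbar.java
```

Developers then inspect the top-ranked files first, so ranking quality is usually evaluated with top-k accuracy rather than exact-match metrics.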