Terminology and Definitions

There is currently no single industry-standard terminology for building AI-integrated systems, which has led to several overlapping terms.

  • AI-Enabled/ML-Enabled Systems: Software where AI/ML is a component of a larger system.
  • AI Engineering: Often refers to the professional discipline of building these systems, usually with a focus on the ML pipeline.
  • ML Systems Engineering: Frequently used when discussing the creation of distributed, scalable data storage and processing platforms.
  • AIOps: Utilising AI to automate operational decision-making.
  • DataOps: Applying agile methods and automation specifically to business data analytics.

The System-Level Perspective

A system is a collection of interrelated parts working together in an environment to achieve a specific objective. In modern software, Machine Learning is just a component, often a small one, within this larger system.

Holistic Approach

  • Interconnectivity: Every part of the system is linked; changing one component can have emergent effects on the whole.
  • Environment Interaction: The system must account for users and the physical world, creating feedback loops where actions taken by the AI influence future data inputs.
  • Component Examples:
    • Small Component: An “Audit Risk Prediction” feature in tax software
    • Core Component: A “Music Generation” engine in a creative application
    • Non-ML Parts: User interfaces, databases, payment gateways, and logging systems must all interact seamlessly with the ML model.

Why ML-Enabled Systems are Challenging

Traditional software development relies on clear specifications and testing for correctness. AI shifts this paradigm.

Deductive vs. Inductive Reasoning

  • Deductive (Traditional SE): A “top down” approach. Developers apply logic rules (general principles) to reach specific conclusions.
  • Inductive (ML Focus): A “bottom up” approach. The system learns from specific observations (data) to form broader generalisations of patterns.
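The contrast above can be sketched in code. The toy task (flagging large transactions) and its numbers are hypothetical: the deductive version applies a hand-written general rule top-down, while the inductive version derives a decision boundary bottom-up from labelled observations.

```python
# Deductive: a hand-written rule derived from a general principle.
def flag_deductive(amount):
    # General rule applied top-down: anything over 10,000 is flagged.
    return amount > 10_000

# Inductive: "learn" a threshold bottom-up from labelled observations.
def fit_threshold(examples):
    # examples: list of (amount, was_fraud) pairs.
    fraud_amounts = [a for a, y in examples if y]
    ok_amounts = [a for a, y in examples if not y]
    # Generalise: place the boundary midway between the two classes.
    return (max(ok_amounts) + min(fraud_amounts)) / 2

data = [(120, False), (450, False), (9_800, False),
        (15_000, True), (22_500, True)]
threshold = fit_threshold(data)  # 12400.0 for this toy data

print(flag_deductive(22_500))  # True: the rule flags it
print(22_500 > threshold)      # True: the learned boundary also flags it
```

Note that the inductive boundary depends entirely on the data seen: different observations would yield a different threshold, which is exactly why ML behaviour is harder to specify in advance.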

The “Black Box” and Uncertainty

  • Lack of Specifications: Because ML models learn from data rather than explicit rules, it is often unclear whether they will follow specific prompts or requirements (e.g., in LLMs).
  • Mistakes as the Norm: AI models frequently fail in unexpected ways because they lack a rigid “specification” for correctness.
  • Fit for Purpose: Success is measured by whether a system is “good enough” on average rather than “correct” in every instance.

Key Insight

Engineering AI requires building “safe software with unreliable components”. You must design the system to tolerate and mitigate incorrect predictions.
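One common way to tolerate incorrect predictions is to wrap the model in a guard that validates its output and falls back to a safe default. The sketch below is illustrative: the model, confidence values, and `manual_review` default are hypothetical, not part of any specific library.

```python
SAFE_DEFAULT = "manual_review"

def guarded_predict(model, features, min_confidence=0.8):
    """Call an unreliable model, mitigating its failure modes."""
    try:
        label, confidence = model(features)
    except Exception:
        return SAFE_DEFAULT          # tolerate outright failure
    if confidence < min_confidence:  # mitigate low-confidence output
        return SAFE_DEFAULT
    return label

# A stand-in model that returns low confidence on edge cases.
def toy_model(features):
    return ("approve", 0.55 if features.get("edge_case") else 0.95)

print(guarded_predict(toy_model, {}))                   # approve
print(guarded_predict(toy_model, {"edge_case": True}))  # manual_review
```

The system-level safety property (nothing is auto-approved below the confidence floor) holds regardless of how often the model itself is wrong.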

Software Process Models

A software process is the structured set of activities required to develop a system, typically including specification, design, validation, and evolution.

Classical Methodologies

  • Waterfall Model: A linear, phase-based approach (Requirements → Design → Implementation → Testing → Maintenance).
  • Incremental Development: Breaking the system into increments, delivering functionality piece-by-piece. This is the basis for Agile practices.
    • Priority-Driven: High-priority requirements are delivered in early increments.
    • Frozen Increments: Requirements are frozen once an increment starts, but future increments can evolve.

MLOps: Automating the ML Pipeline

While data science focuses on building a single model, MLOps focuses on the infrastructure required to make that model operational and scalable.

The ML Pipeline Process

Developing a production-ready ML component involves these sequential (and often iterative) steps:

  1. Model Requirements: Defining goals.
  2. Data Collection & Labelling: Gathering and categorising raw information.
  3. Data Cleaning: Removing noise or errors.
  4. Feature Engineering: Selecting the right variables for the model.
  5. Model Training: The actual learning phase.
  6. Model Evaluation: Checking accuracy against test data.
  7. Model Deployment: Moving the model into a production environment.
  8. Model Monitoring: Watching for “concept drift” or performance drops in the real world.
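Steps 2 to 8 can be condensed into a minimal end-to-end sketch. All data, the midpoint-threshold “model”, and the drift floor below are toy assumptions chosen to keep the example self-contained; a real pipeline would use dedicated tooling at each stage.

```python
import statistics

# 2. Data collection & labelling (None marks a bad sensor read)
raw = [(1.0, 0), (2.1, 0), (None, 1), (5.9, 1), (6.2, 1), (1.8, 0)]

# 3. Data cleaning: drop records with missing values
clean = [(x, y) for x, y in raw if x is not None]

# 4. Feature engineering: here the raw value is the single feature
X = [x for x, _ in clean]
y = [label for _, label in clean]

# 5. Model training: learn a midpoint threshold between class means
def train(X, y):
    m0 = statistics.mean(x for x, lbl in zip(X, y) if lbl == 0)
    m1 = statistics.mean(x for x, lbl in zip(X, y) if lbl == 1)
    return (m0 + m1) / 2

threshold = train(X, y)

# 6. Model evaluation: accuracy on held-out test data
test = [(1.5, 0), (6.0, 1)]
acc = sum((x > threshold) == bool(lbl) for x, lbl in test) / len(test)

# 7. "Deployment": expose the trained model as a callable
def predict(x):
    return int(x > threshold)

# 8. Monitoring: alert when live accuracy falls below a floor (drift)
def drifted(live_pairs, floor=0.7):
    live_acc = sum(predict(x) == lbl for x, lbl in live_pairs) / len(live_pairs)
    return live_acc < floor

print(acc)                            # 1.0 on this toy test set
print(drifted([(6.1, 1), (1.2, 0)])) # False: no drift detected yet
```

The point of the sketch is the shape of the pipeline: each numbered step is a distinct, automatable stage, which is what MLOps infrastructure makes repeatable.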

Barriers to MLOps Adoption

Transitioning from a “model-centric” (local notebook) to a “pipeline-centric” (automated infrastructure) workflow is a major hurdle. Common barriers include:

  • Overly long build times (ideally kept under 10 minutes).
  • Troubleshooting failures in Continuous Integration (CI).
  • Lack of tool integration and security access controls.
  • Hidden Technical Debt: ML systems often have massive amounts of non-ML code for configuration, data verification, and resource management.