
DeepDriveSim

Python 3.9+ | License: MIT | Code style: ruff

Deep learning-driven Adaptive Simulations

DeepDriveSim is a toolkit developed by Brookhaven National Laboratory (BNL) and the RADICAL Laboratory at Rutgers University, in collaboration with Argonne National Laboratory. It implements an AI-steered ensemble simulation workflow in which deep learning models guide and optimize simulations in real time.

Features

  • Adaptive Simulation Management: Dynamically manages molecular simulations based on ML predictions
  • Active Learning Loop: Implements a simulation → training → prediction → cancellation → resubmission cycle
  • Multiple Execution Backends: Supports local execution, RHAPSODY (HPC), and Dragon distributed computing
  • Resource-Aware Scheduling: Automatically balances compute resources between simulations and training (tunable via training_cores and max_sim_batch)
  • GPU Support: Automatic GPU detection and utilization
  • Extensible Architecture: Subclass DDSimManager to adapt the workflow to different simulation types and ML models
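As a rough illustration of resource-aware scheduling, here is a minimal sketch of one possible split, assuming a simple policy: reserve training_cores for training, then run one single-core simulation per remaining core, capped at max_sim_batch. The function plan_resources is hypothetical and ignores GPUs; DeepDriveSim's actual scheduler may follow a different policy.

```python
import os
from typing import Optional


def plan_resources(max_sim_batch: int, training_cores: int,
                   total_cores: Optional[int] = None) -> dict:
    """Hypothetical policy: reserve cores for training and give each
    simulation one of the remaining cores, up to max_sim_batch."""
    if total_cores is None:
        # os.cpu_count() can return None, so fall back to a single core
        total_cores = os.cpu_count() or 1
    if training_cores >= total_cores:
        raise ValueError("training_cores must leave at least one core for simulations")
    sim_slots = min(max_sim_batch, total_cores - training_cores)
    return {"training_cores": training_cores, "sim_slots": sim_slots}


print(plan_resources(max_sim_batch=4, training_cores=1, total_cores=8))
# {'training_cores': 1, 'sim_slots': 4}
print(plan_resources(max_sim_batch=4, training_cores=2, total_cores=4))
# {'training_cores': 2, 'sim_slots': 2}
```

Capping at max_sim_batch keeps the simulation count predictable even on large nodes, while the core check prevents training from starving simulations entirely.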

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     DDMD Manager                            │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Simulation  │  │  Training   │  │     Prediction      │  │
│  │   Queue     │──│   Module    │──│      Module         │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│         │                │                    │             │
│         ▼                ▼                    ▼             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              ROSE / RADICAL-AsyncFlow               │    │
│  │           (Execution Backend Abstraction)           │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
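The cycle the manager drives (simulation → training → prediction → cancellation → resubmission) can be pictured with plain asyncio. The sketch below is a toy under stated assumptions: simulation, train, predict, and active_learning_loop are hypothetical stand-ins that do not use ROSE or RADICAL-AsyncFlow. It only shows how finished results feed training, and how low prediction scores cancel running tasks so queued inputs can take their slots.

```python
import asyncio


async def simulation(sim_input: int) -> float:
    """Toy simulation: sleep briefly, then return an 'observable'."""
    await asyncio.sleep(0.01 * (1 + sim_input % 3))
    return float(sim_input)


def train(results: list) -> float:
    """Toy training: the 'model' is the mean of all finished observables."""
    return sum(results) / len(results)


def predict(model: float, sim_input: int) -> float:
    """Toy prediction: 1.0 if a running simulation looks at least average."""
    return 1.0 if sim_input >= model else 0.0


async def active_learning_loop(inputs, max_sim_batch=4, prediction_threshold=0.5):
    queue = list(inputs)
    running = {}  # sim_input -> asyncio.Task (assumes inputs are unique)
    finished, cancelled = [], []
    while queue or running:
        # Simulation / resubmission: refill free slots from the queue
        while queue and len(running) < max_sim_batch:
            x = queue.pop(0)
            running[x] = asyncio.create_task(simulation(x))
        # Wait until at least one running simulation completes
        done, _ = await asyncio.wait(set(running.values()),
                                     return_when=asyncio.FIRST_COMPLETED)
        for x in [x for x, task in running.items() if task in done]:
            finished.append(running.pop(x).result())
        # Training on everything finished so far
        model = train(finished)
        # Prediction and cancellation of simulations that look unpromising
        to_cancel = [x for x in running if predict(model, x) < prediction_threshold]
        for x in to_cancel:
            running[x].cancel()
            cancelled.append(x)
        # Let cancelled tasks process their CancelledError before continuing
        await asyncio.gather(*(running.pop(x) for x in to_cancel),
                             return_exceptions=True)
    return finished, cancelled


finished, cancelled = asyncio.run(active_learning_loop(range(10)))
print(len(finished) + len(cancelled))  # 10
```

Which inputs get cancelled depends on completion timing, but every input ends up either finished or cancelled, and setting prediction_threshold=0.0 disables cancellation entirely.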

1. Basic Usage with DummyWorkflow (for testing)

import asyncio
from concurrent.futures import ThreadPoolExecutor

from radical.asyncflow import ConcurrentExecutionBackend, WorkflowEngine
from ddmd import DummyWorkflow

async def main():
    # Create a thread-pool execution backend and a workflow engine on top of it
    backend = await ConcurrentExecutionBackend(ThreadPoolExecutor())
    asyncflow = await WorkflowEngine.create(backend)

    # Initialize the test workflow
    workflow = DummyWorkflow(
        asyncflow=asyncflow,
        max_sim_batch=4,   # maximum concurrent simulations
        training_cores=1,  # CPU cores reserved for training
        num_files=10
    )

    try:
        # Run the adaptive learning loop
        await workflow.start()
    finally:
        # Shut the workflow down even if the loop raises
        await workflow.close()

if __name__ == "__main__":
    asyncio.run(main())

2. Creating a Custom Workflow

Subclass DDSimManager and override its hooks to create your own workflow:

from ddmd import DDSimManager

class MyWorkflow(DDSimManager):
    def __init__(self, asyncflow, **kwargs):
        # Forward manager options (e.g., max_sim_batch) to the base class
        super().__init__(asyncflow, **kwargs)
        # Your initialization code
        self._register_learner_tasks()

    def _register_learner_tasks(self):
        @self.learner.simulation_task(as_executable=False)
        async def simulation(*args, **kwargs):
            # Your simulation logic
            pass
        self.simulation = simulation

    def stop_simulation(self, prediction):
        # Return True to cancel the simulation; here, when the score is below 0.5
        return prediction < 0.5

    async def init_sim_queue(self):
        # Populate self.sim_task_queue with simulation inputs
        pass

    async def check_train_data(self):
        # Return True when enough data is available to start training
        return True

    async def train_model(self):
        # Your training logic
        pass

    async def clean_sim_data(self, sim_ind):
        # Clean up files left by the canceled simulation sim_ind
        pass
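The data hooks above are left empty. As one possible way to fill in check_train_data and clean_sim_data, the helpers below assume a hypothetical on-disk layout in which simulation i writes to a directory named sim_i and creates an empty DONE file when it finishes. The names count_finished_sims, ready_to_train, and remove_sim_dir are illustrative, not part of ddmd; the force flag loosely mirrors the force_start_training option.

```python
import shutil
import tempfile
from pathlib import Path


def count_finished_sims(root: Path) -> int:
    """Count sim_<i> directories that contain a DONE marker file."""
    return sum(1 for d in root.glob("sim_*") if (d / "DONE").exists())


def ready_to_train(root: Path, min_sims: int, force: bool = False) -> bool:
    """Possible body for check_train_data(): enough finished sims, or forced."""
    return force or count_finished_sims(root) >= min_sims


def remove_sim_dir(root: Path, sim_ind: int) -> None:
    """Possible body for clean_sim_data(): delete one canceled sim's files."""
    shutil.rmtree(root / f"sim_{sim_ind}", ignore_errors=True)


# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i in range(3):
        (root / f"sim_{i}").mkdir()
    (root / "sim_0" / "DONE").touch()          # sim_0 finished
    ready_early = ready_to_train(root, min_sims=2)
    (root / "sim_1" / "DONE").touch()          # sim_1 finished
    ready_now = ready_to_train(root, min_sims=2)
    remove_sim_dir(root, 2)                    # sim_2 was canceled
    remaining = sorted(d.name for d in root.iterdir())
print(ready_early, ready_now, remaining)  # False True ['sim_0', 'sim_1']
```

These helpers are synchronous; inside the async hooks they could be called directly, or offloaded with asyncio.to_thread if the directory trees are large.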

Configuration Options

Parameter                Description                                            Default
max_sim_batch            Maximum number of concurrent simulations               4
training_cores           CPU cores reserved for training                        1
training_threshold       Accuracy threshold for training                        0.5
prediction_threshold     Prediction score threshold for cancellation            0.5
force_start_training     Start training without waiting for the data threshold  False
clean_unregistered_sims  Delete files from canceled simulations                 True
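To keep these options in one place, a hypothetical dataclass could carry them with their documented defaults. DDSimConfig is not part of ddmd, and its range checks are assumptions (in particular, that both thresholds are scores in [0, 1]).

```python
from dataclasses import asdict, dataclass


@dataclass
class DDSimConfig:
    """Hypothetical container for the documented options and defaults."""
    max_sim_batch: int = 4
    training_cores: int = 1
    training_threshold: float = 0.5
    prediction_threshold: float = 0.5
    force_start_training: bool = False
    clean_unregistered_sims: bool = True

    def __post_init__(self):
        # Sanity checks; the library's own validation may differ
        if self.max_sim_batch < 1 or self.training_cores < 1:
            raise ValueError("max_sim_batch and training_cores must be >= 1")
        # Assumption: both thresholds are scores in [0, 1]
        for name in ("training_threshold", "prediction_threshold"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be in [0, 1]")


cfg = DDSimConfig(max_sim_batch=8, force_start_training=True)
print(cfg.max_sim_batch, cfg.training_cores, cfg.force_start_training)  # 8 1 True
```

If a workflow's constructor accepts these options as keyword arguments (DummyWorkflow is shown taking max_sim_batch and training_cores), the config could be unpacked with **asdict(cfg); check the constructor before relying on the remaining fields.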