Workspaces in Kontroler
Workspaces provide a shared persistent storage area that all tasks in a DAG can access. This feature enables data sharing and state persistence between tasks in your workflow.
Basic Configuration
Enable a workspace by adding the workspace configuration to your DAG:
spec:
  workspace:
    enable: true
    pvc:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: standard
      volumeMode: Filesystem
When enabled, each task in your DAG automatically gets access to the workspace at /workspace.
How It Works
- PVC Creation: Kontroler automatically creates a PersistentVolumeClaim (PVC) based on your configuration
- Mounting: Each task pod mounts the PVC at /workspace
- Lifecycle: The PVC persists throughout the entire DAG execution
- Sharing: All tasks can read from and write to the shared space
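For orientation, the claim Kontroler creates from the Basic Configuration example would correspond roughly to a standard Kubernetes PersistentVolumeClaim like the one below. This is a sketch, not output captured from Kontroler: the spec fields are copied from the workspace.pvc block, while the metadata name is a hypothetical placeholder, since the naming scheme Kontroler uses is not described here.

```yaml
# Illustrative only: a plain Kubernetes PVC carrying the same fields as the
# workspace.pvc block in Basic Configuration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-dag-workspace   # hypothetical; Kontroler chooses the real name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard
  volumeMode: Filesystem
```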
Possible Use Cases
- File Passing
  task:
    - name: "producer"
      command: ["sh", "-c"]
      args: ["echo 'data' > /workspace/output.txt"]
    - name: "consumer"
      command: ["sh", "-c"]
      args: ["cat /workspace/output.txt"]
      runAfter: ["producer"]
- Data Processing
  task:
    - name: "download"
      command: ["wget"]
      args: ["-O", "/workspace/data.csv", "http://example.com/data.csv"]
    - name: "process"
      command: ["python"]
      args: ["-c", "import pandas as pd; df = pd.read_csv('/workspace/data.csv')"]
      runAfter: ["download"]
Configuration Options
Storage Class
workspace:
  enable: true
  pvc:
    storageClassName: "fast-ssd"  # Specify storage class
Access Modes
workspace:
  enable: true
  pvc:
    accessModes:
      - ReadWriteOnce    # Single node access
      # - ReadWriteMany  # Multi-node access
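ReadWriteOnce lets the volume be mounted read-write by a single node, so task pods that land on different nodes cannot all mount it at once. If tasks may be scheduled across nodes, a ReadWriteMany workspace is one option. The sketch below assumes a storage class named "nfs-shared" (a hypothetical name) whose provisioner supports ReadWriteMany; many block-storage classes do not.

```yaml
workspace:
  enable: true
  pvc:
    accessModes:
      - ReadWriteMany               # volume can be mounted read-write by many nodes
    storageClassName: "nfs-shared"  # hypothetical class name; must support ReadWriteMany
    resources:
      requests:
        storage: 1Gi
```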
Resource Requests
workspace:
  enable: true
  pvc:
    resources:
      requests:
        storage: 5Gi  # Request 5 GiB of storage
Best Practices
- Size Appropriately
  - Request enough storage for your workflow
  - Consider peak usage requirements
  - Account for temporary files
- Access Modes
  - Use ReadWriteOnce for single-node workflows
  - Consider ReadWriteMany for distributed tasks
  - Check storage class compatibility
- Performance
  - Choose appropriate storage class
  - Monitor I/O patterns
  - Consider task sequence
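One way to check whether the requested size matches actual usage is to add a final task that reports how full the workspace is. The following is a minimal sketch: the task names "report-usage" and "process" are hypothetical, and it assumes the task's container provides the df and du utilities.

```yaml
task:
  - name: "report-usage"     # hypothetical task name
    command: ["sh", "-c"]
    # df shows capacity and free space of the mounted volume;
    # du totals the space used by files under /workspace.
    args: ["df -h /workspace && du -sh /workspace"]
    runAfter: ["process"]    # hypothetical: replace with the name of your last task
```

Running this against representative data gives a rough figure for peak usage, including temporary files that earlier tasks left behind.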
Example DAG with Workspace
Here’s a shortened example showing workspace usage:
apiVersion: kontroler.greedykomodo/v1alpha1
kind: DAG
metadata:
  name: workspace-example
spec:
  workspace:
    enable: true
    pvc:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: standard
  task:
    - name: "write-data"
      command: ["sh", "-c"]
      args: ["echo 'Hello from task 1' > /workspace/message.txt"]
    - name: "read-data"
      command: ["sh", "-c"]
      args: ["cat /workspace/message.txt"]
      runAfter: ["write-data"]
Limitations
- Storage Class Compatibility
  - Must be supported by your cluster
  - Check available storage classes
  - Verify access mode support
- Performance
  - Network storage may impact speed
  - Consider I/O requirements
  - Test with representative data
- Concurrency
  - Handle file locking if needed
  - Consider parallel task access
  - Be aware of race conditions
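A simple way to sidestep most race conditions is to have each task write only its own files and let a later task combine them. The sketch below uses only the task fields shown in the examples above; the task and file names are hypothetical. Each writer stages its output in a temporary file and then renames it with mv, so a reader should never see a half-written file (a rename within one filesystem is atomic on POSIX filesystems; behavior on network storage may differ). It also assumes that runAfter accepts more than one task name and that tasks with no dependency between them may run at the same time, neither of which is confirmed here.

```yaml
task:
  - name: "write-a"
    command: ["sh", "-c"]
    # Write to a temporary name first, then rename into place.
    args: ["echo 'part A' > /workspace/a.txt.tmp && mv /workspace/a.txt.tmp /workspace/a.txt"]
  - name: "write-b"
    command: ["sh", "-c"]
    args: ["echo 'part B' > /workspace/b.txt.tmp && mv /workspace/b.txt.tmp /workspace/b.txt"]
  - name: "merge"
    command: ["sh", "-c"]
    # Runs only after both writers finish, so both inputs are complete.
    args: ["cat /workspace/a.txt /workspace/b.txt > /workspace/merged.txt"]
    runAfter: ["write-a", "write-b"]
```

Because no two tasks ever write to the same path, this pattern needs no file locking.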