Towards MLOps: technical capabilities of a machine learning platform
The choice of technology and tooling that delivers this functionality is crucial, as it is a dependency of the ML pipeline. Traditionally, Hadoop-based data lakes relied on a workflow manager such as Oozie or Azkaban to perform such activities. As companies moved their data and workloads to cloud-based data lakes, projects like Airflow and Luigi came to dominate that space by providing an independent tool outside of the Hadoop ecosystem. Currently, Airflow is the leading workflow management system for data processing.
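To make the role of a workflow manager concrete, the following is a minimal sketch of its core idea: running pipeline steps in dependency order. It uses only the Python standard library (`graphlib`, Python 3.9+) and is not the API of Airflow, Luigi, Oozie or Azkaban. The step names and the `run_pipeline` helper are hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline steps; each step maps to the set of steps it depends on.
dependencies = {
    "extract": set(),
    "validate": {"extract"},
    "build_features": {"validate"},
    "train_model": {"build_features"},
}


def run_pipeline(deps, tasks):
    """Run each task once, only after all of its upstream tasks have run."""
    order = []
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()
        order.append(name)
    return order


# Stand-in tasks that only record their own name when called.
log = []
tasks = {name: (lambda n=name: log.append(n)) for name in dependencies}

executed = run_pipeline(dependencies, tasks)
print(executed)
```

A production workflow manager adds scheduling, retries, monitoring and distributed execution on top of this ordering logic, which is why the ML pipeline ends up depending on whichever tool is chosen.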
2.4 Data labeling
Labeled data is frequently required to develop machine learning models. When labeled data is not available, a data labeling activity may need to take place as part of an ML project to create an initial training dataset. Within the Prosus group, our classifieds business OLX has found that labeling is also important at the end of the ML pipeline. For example, the OLX moderatio