
I am not sure whether I have been using the term wrongly (and including it in my CV), so some input from the community would be appreciated.

I am not a DevOps engineer, but a novice machine learning engineer. I have developed pipelines that run batch processing jobs to automate things like:

  1. Reading raw data from a cloud data store (such as Azure Blob Storage)
  2. Performing training-time feature engineering, labelling, clustering, etc.
  3. Training and validating some machine learning models
  4. Analysing their performance and, subject to some criteria, replacing the old models in production
  5. Pushing the new models to the serving pipeline
  6. Reporting the results of these steps by email
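The steps above can be sketched roughly as follows. This is only an illustration, not the actual pipeline: the names `Model`, `should_promote`, `run_pipeline`, and the `min_gain` threshold are hypothetical, and the "higher score is better" metric is an assumption. The real jobs run on Airflow, Spark, and so on; plain Python is used here to keep the control flow visible.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Model:
    name: str
    score: float  # validation metric; assumed here that higher is better


def should_promote(candidate: Model, production: Optional[Model],
                   min_gain: float = 0.01) -> bool:
    """Step 4: promote only if the candidate beats production by a margin.

    `min_gain` is a hypothetical promotion criterion, not the real one.
    """
    if production is None:
        return True
    return candidate.score >= production.score + min_gain


def run_pipeline(train: Callable[[], Model],
                 production: Optional[Model],
                 drift_detected: bool) -> Tuple[Optional[Model], str]:
    """Run one scheduled cycle and return (model now serving, report text)."""
    # The trigger is a monitoring result, not a code change.
    if not drift_detected:
        return production, "skipped: no drift detected"

    # Steps 1-3: read data, engineer features, train and validate.
    candidate = train()

    # Steps 4-5: compare with production and swap the model if warranted.
    if should_promote(candidate, production):
        return candidate, f"promoted {candidate.name} (score {candidate.score})"
    return production, f"kept current model; {candidate.name} did not improve enough"


if __name__ == "__main__":
    current = Model("v1", 0.80)
    serving, report = run_pipeline(lambda: Model("v2", 0.85), current, True)
    print(report)  # Step 6 would send this text by email.
```

Note that nothing in this flow builds, merges, or tests source code; the only thing that changes between runs is the data and the resulting model.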

Basically, these steps power the machine learning part of our in-house analytics platform. Based on this, can I claim to have developed a CI/CD pipeline? Or does that term mean something totally different?

For reference, the tools I have used for the whole thing are Docker, Airflow, Spark, TensorFlow, Celery, etc., and language-wise it is a combination of Scala and Python.

Because the company is small, there is no dedicated DevOps team, nor anyone who can advise me on whether this counts as CI/CD. Hence I am asking the community.

Della
  • If you want to put the latest shiny buzzwords on your CV, use "MLOps". It means nothing, but then neither does "DevOps" these days. – Philip Kendall Jun 30 '23 at 08:54
  • What triggers the pipeline? Pushing some code? If so, what do you do with the code as part of the pipeline (build it, test it, etc.)? – guillaume31 Jun 30 '23 at 10:08
  • @guillaume31 good question. No, it is not a git push/commit that triggers the pipeline; it is monitored data drift and model performance testing (basically, business logic), and the monitoring task is triggered by Airflow or cron jobs. The _code_ is static. So is that the defining criterion for whether I can call it CI/CD? – Della Jun 30 '23 at 10:30
  • @PhilipKendall thanks for the comment. Shiny buzzwords are not the goal; I just want to use the correct words, so as not to confuse or mislead people. But yeah, the meanings of these words get diluted to the extent that they can mean anything. – Della Jun 30 '23 at 10:33
  • The goal of "CI" is automating your development process (typically checks around the code itself - code quality, code metrics, automated testing, building software artefacts ready for test/deployment, etc. - tools and scripts whose purpose is to help developers be more productive and deliver higher quality software; a CI pipeline generally doesn't go anywhere near production or real data/environments). This sounds more like ETL or data engineering; which are terms more commonly found around AI/ML/Big Data/Analytics/etc. – Ben Cottrell Jun 30 '23 at 17:06

2 Answers


A combination of factors makes me think you can't really talk about Continuous Integration in this case:

  • The pipeline is not triggered by a code change
  • You don't merge, build, or test code as part of the process
  • You seem to be working solo on this so nothing to integrate (some will claim it's not an essential condition for CI but it's a big part of the CI philosophy...)
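For contrast, here is a minimal sketch of the kind of check a CI step would typically run on every push: an automated test of the code itself, with no production data or models involved. The helper `scale_to_unit_range` is a hypothetical feature-engineering function invented for the example, not something from the question.

```python
import unittest


def scale_to_unit_range(values):
    """Hypothetical feature-engineering helper: min-max scale values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: avoid dividing by zero and map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class ScaleToUnitRangeTest(unittest.TestCase):
    """Tests a CI server would run whenever the code changes."""

    def test_scales_to_unit_range(self):
        self.assertEqual(scale_to_unit_range([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(scale_to_unit_range([3, 3]), [0.0, 0.0])
```

A CI server would run something like `python -m unittest` against this file after each commit and block the merge if a test fails. The trigger is the code change, which is exactly what the pipeline in the question lacks.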

That said, the fact that the pipeline replaces the old models in production, along with all the automation around it, makes the CD side of things defensible, IMO.

guillaume31

What you've done is task or workflow automation. Calling it CI/CD would be misleading for two reasons:

  • What you're automating is not the kind of task typically done under that term
  • What you're using are not the typical tools used for CI/CD

So you'd probably be pretty lost (or at least have to do lots of research and reading) if someone asked you to set up a typical CI/CD pipeline, even if some of the knowledge probably transfers.

Michael Borgwardt