Machine Learning Pipeline as a Language
Developing production-ready models requires extensive data preparation and model validation. The business logic of individual tasks and the dataflow of the pipeline can both be rather complex. Projects with similar experiments benefit from holistic optimizations, such as sharing intermediate data. Supporting these optimizations further complicates the system and easily becomes a maintenance burden.
With a domain-specific language (DSL), convoluted logic can be hidden behind familiar expressions. We implemented a DSL deeply embedded in Scala that represents programs as data. This talk shows how the DSL can be used to create program descriptions and how interpreters optimize and evaluate them.
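To make "programs as data" concrete, here is a minimal sketch of a deep embedding. It is not the DSL presented in the talk: the node types (Load, Transform, Train) and both interpreters are hypothetical names chosen for illustration. The point is only that a pipeline is an ordinary value, and that several interpreters can walk the same description.

```scala
// Hypothetical pipeline DSL: each node is plain data describing one step.
sealed trait Pipeline
final case class Load(source: String) extends Pipeline
final case class Transform(name: String, input: Pipeline) extends Pipeline
final case class Train(model: String, input: Pipeline) extends Pipeline

object Describe {
  // One interpreter: renders the description as text without running anything.
  def apply(p: Pipeline): String = p match {
    case Load(source)           => s"load($source)"
    case Transform(name, input) => s"$name(${apply(input)})"
    case Train(model, input)    => s"train[$model](${apply(input)})"
  }
}

object Steps {
  // A second interpreter over the same data: counts the steps in the description.
  def apply(p: Pipeline): Int = p match {
    case Load(_)             => 1
    case Transform(_, input) => 1 + apply(input)
    case Train(_, input)     => 1 + apply(input)
  }
}

object Example {
  // Building the program only constructs a value; nothing is executed here.
  val program: Pipeline =
    Train("logreg", Transform("normalize", Load("events.csv")))
}
```

Because the description is separate from its evaluation, a further interpreter (for example one that rewrites the tree before running it) can be added without touching the code that builds pipelines.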
This system empowers us to create easy-to-understand descriptions of complex dataflows. It also allows us to optimize pipelines in independent stages, so that execution semantics like data sharing can be implemented universally. This ensures that efficiency is not compromised by readability.
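As a rough illustration of data sharing implemented once in the interpreter rather than in each pipeline, the sketch below caches results by the structural equality of the description. All names here (Node, Read, Stage, SharingEvaluator) are assumptions made for this example, and the string results stand in for real intermediate data.

```scala
import scala.collection.mutable

// Hypothetical description nodes; case classes give structural equality for free.
sealed trait Node
final case class Read(source: String) extends Node
final case class Stage(name: String, input: Node) extends Node

// Evaluator that caches results per description node, so a sub-pipeline
// shared by several experiments is run only once.
final class SharingEvaluator {
  private val cache = mutable.Map.empty[Node, String]
  var executed: Int = 0 // number of steps actually run (cache misses)

  def run(node: Node): String = cache.get(node) match {
    case Some(result) => result // already computed: reuse the shared result
    case None =>
      val result = node match {
        case Read(source)       => s"data($source)"
        case Stage(name, input) => s"$name(${run(input)})"
      }
      executed += 1
      cache.update(node, result)
      result
  }
}

object SharingExample {
  // Two experiments built on the same preparation steps.
  val prepared: Node    = Stage("normalize", Read("events.csv"))
  val experimentA: Node = Stage("trainA", prepared)
  val experimentB: Node = Stage("trainB", prepared)
}
```

Running both experiments through one SharingEvaluator executes four steps rather than six, since the read and normalize steps are reused. The pipeline descriptions themselves contain no caching logic, which is the sense in which readability and efficiency stay independent in this sketch.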