Specifically, I need an open source platform that can run my model on terabytes of data.
I am exploring open source platforms to run my custom ML models, each of which simply takes an input and emits an output.
I came across Spark's rdd.pipe(my_model), but it does not look suited to building pipelines with scheduling options.
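For context, here is a minimal sketch of how a model like mine could be wrapped so that rdd.pipe can call it. pipe() sends each RDD element to an external command as a line on stdin and reads the command's stdout lines back as the new RDD. The predict function below is a hypothetical stand-in for the real model (it just uppercases the input), and the names run, main and demo are illustrative only.

```python
import io
import sys


def predict(record: str) -> str:
    """Hypothetical stand-in for the custom model: one input in, one output out."""
    return record.upper()


def run(stdin, stdout) -> None:
    """Read one record per line, write one prediction per line."""
    for line in stdin:
        stdout.write(predict(line.rstrip("\n")) + "\n")


def main() -> None:
    # Entry point when the file is used as the external command for pipe():
    # records arrive on stdin, predictions leave on stdout.
    run(sys.stdin, sys.stdout)


def demo() -> str:
    """Run the filter on two in-memory lines, without Spark."""
    out = io.StringIO()
    run(io.StringIO("a\nb\n"), out)
    return out.getvalue()
```

Assuming this is saved as my_model.py (a hypothetical file name) with a call to main() at the bottom, and the file is available on every worker, something like sc.textFile(path).pipe("python my_model.py") would run it once per partition. This only covers the per-record transformation; pipe() itself does nothing for scheduling or chaining steps.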
I am looking for recommendations for any open source tool or technology.