Data analysis and the design of Machine Learning models are often the "easy" part of a data scientist's life. The real challenge begins when the prototype has to become production-ready: how do you guarantee scalability and performance without compromising speed of execution and flexibility? Together we will look at the pros and cons of the two most promising solutions: MLaaS (Machine Learning as a Service) and Serverless (e.g. AWS Lambda).
3. What is Machine Learning?
Back to 1959 (Arthur Samuel)
How computers learn from Data
How to solve decision problems
Serverless Meetup @ Milan, clda.co/serverless-milano
4. Machine Learning pipeline
[Diagram: data → Feature extraction (batch) → features → Training (batch) → ML models → Prediction (real-time) → information]
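The three pipeline stages can be sketched in a few lines of plain Python. This is only an illustration, not the speaker's code: the text-based features, the nearest-centroid "model" and the toy spam/ham records are all assumptions chosen to keep the example dependency-free.

```python
from collections import defaultdict
from statistics import mean


def extract_features(record):
    """Feature extraction: turn a raw record into a numeric feature vector."""
    # Hypothetical features: text length and number of digit characters.
    text = record["text"]
    return [float(len(text)), float(sum(ch.isdigit() for ch in text))]


def train(records):
    """Training (batch): compute one centroid per label."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record["label"]].append(extract_features(record))
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def predict(model, record):
    """Prediction (real-time): return the label of the closest centroid."""
    features = extract_features(record)

    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Toy training data (made up for this sketch).
training_data = [
    {"text": "call 555 1234 now", "label": "spam"},
    {"text": "win 1000000 today 99", "label": "spam"},
    {"text": "see you at lunch", "label": "ham"},
    {"text": "meeting moved to friday", "label": "ham"},
]
model = train(training_data)
```

Training runs once over a batch of records, while `predict` handles one record at a time, which mirrors the batch vs real-time split on the slide.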
6. Machine Learning taxonomy
Supervised Learning
Unsupervised Learning
clustering (group A, group B)
rule extraction (A, B → C)
7. What problems can ML solve for you?
Supervised Learning: classification (?), regression (170 cm)
Unsupervised Learning: clustering (groups), rule extraction (A, B → C)
8. What problems can ML solve for you?
Supervised Learning: classification (?), regression (170 cm)
Unsupervised Learning: clustering (groups), rule extraction (A, B → C)
Examples: fraud detection, price of a stock over time, purchase likelihood, user segmentation
10. Generated data started growing ~10 years ago…
“90% of the data in the world today has been created in the last two years alone” - IBM
“300+ hours worth of video content is being uploaded to the site every minute” - YouTube
11. … and it keeps getting bigger!
13. What does a real Data Scientist look like?
Data Scientist
Very smart & curious
Numbers lover (i.e. Data)
Great teamwork skills
40% analysis, 30% design, 30% code
14. Big data challenges
Manual exploration is not an option
Data-driven decisions are a must
Distributed/parallel computing
The curse of dimensionality
16. Why is deploying ML models a challenge?
17. Why is deploying ML models a challenge?
Data Scientist + Data + Time
18. Why is deploying ML models a challenge?
Data Scientist + Data + Time
ML Model + Data Visualisation + Prototype
19. Why is deploying ML models a challenge?
Data Scientist + Data + Time
ML Model + Data Visualisation + Prototype
Production Code
20. Why is deploying ML models a challenge?
Data Scientist + Data + Time
ML Model + Data Visualisation + Prototype
Web Developer + DevOps + A lot of Time
21. Why is deploying ML models a challenge?
1. Prototyping != Production-ready
2. We need Elasticity
3. Too many nice-to-have features
4. Multi-model architectures
5. Avoid lack of ownership
22. The Lack of Ownership
Data Scientist != DevOps
Data Scientist: Mathematical modeling, Statistical analysis, Data mining
DevOps: (Cloud) Operations, System administration, Software best practices
23. Machine Learning as a Service (MLaaS)
Amazon Machine Learning
Azure Machine Learning
Google Prediction API
IBM Watson Analytics
BigML
cloudacademy.com/blog/machine-learning
24. Amazon Machine Learning
AmazonML
One of the first MLaaS solutions (Apr 2015)
It’s great for classification and regression problems
Only linear models (linear & logistic regression + SGD)
No support for advanced scenarios yet
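To make "logistic regression + SGD" concrete, here is a minimal from-scratch sketch of a logistic regression classifier trained with stochastic gradient descent. It says nothing about how Amazon Machine Learning is implemented internally; the function names, hyperparameters and the one-feature toy dataset are assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_sgd(samples, labels, learning_rate=0.1, epochs=500, seed=0):
    """Fit weights and bias with one gradient step per training sample."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    indices = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a new random order each epoch
        for i in indices:
            z = bias + sum(w * x for w, x in zip(weights, samples[i]))
            error = sigmoid(z) - labels[i]  # gradient of the log-loss w.r.t. z
            weights = [
                w - learning_rate * error * x
                for w, x in zip(weights, samples[i])
            ]
            bias -= learning_rate * error
    return weights, bias


def predict_proba(weights, bias, sample):
    """Probability that the sample belongs to class 1."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, sample)))


# Toy, linearly separable data (made up): class 1 when the feature is large.
samples = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_sgd(samples, labels)
```

The model is linear in its inputs, so it can only separate classes with a straight decision boundary, which is the restriction the slide points out.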
25. Serverless computing to the rescue!
Versioning, staging & caching
1 model = 1 microservice
Flexible RESTful interface
High Availability (no downtime)
Very little operational effort
Transparent elasticity (PAYG)
Failure isolation / Decentralisation
Offline training phase
Production-ready prototypes
A/B testing through composition
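One way to read "A/B testing through composition" is a thin routing function placed in front of two model microservices. The sketch below is an assumption about what that could look like, not the speaker's implementation: `model_a`, `model_b`, `choose_variant` and the event fields are hypothetical, and in a real deployment each model would be its own function behind its own endpoint.

```python
import hashlib


def model_a(features):
    """Hypothetical stand-in for the microservice serving model A."""
    return {"model": "A", "score": sum(features)}


def model_b(features):
    """Hypothetical stand-in for the microservice serving model B."""
    return {"model": "B", "score": max(features)}


def choose_variant(user_id, share_for_b=0.5):
    """Deterministically assign a user to variant A or B."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "B" if bucket < share_for_b else "A"


def handler(event, context=None):
    """Router: pick a variant for the user, then delegate to that model."""
    variant = choose_variant(event["user_id"], event.get("share_for_b", 0.5))
    model = model_b if variant == "B" else model_a
    return model(event["features"])
```

Hashing the user id keeps each user on the same variant across requests without storing any assignment state.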
27. Serverless ML @ Cloud Academy
Multi-model architecture
1 Lambda Function for each ML model
S3 to store models (1~10MB each)
RDS to store training data (PostgreSQL)
Periodic training (offline)
Real-world Example
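The "one Lambda function per model, models stored in S3" setup might look roughly like the handler below. This is a sketch under stated assumptions rather than Cloud Academy's code: the bucket and key names are placeholders, the model is assumed to be a pickled object with a scikit-learn-style `predict` method, and the `loader` parameter exists only so the handler can be exercised without AWS access.

```python
import json
import pickle

# Hypothetical names; the slides do not give bucket or key names.
MODEL_BUCKET = "example-ml-models"
MODEL_KEY = "models/example-model.pkl"

# Module-level cache: reused while the same container stays warm.
_cached_model = None


def load_model_from_s3(bucket, key):
    """Download and unpickle a model. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the module loads without boto3 installed

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return pickle.loads(body)


def get_model(loader=load_model_from_s3):
    """Load the model on first use, then serve it from the cache."""
    global _cached_model
    if _cached_model is None:
        _cached_model = loader(MODEL_BUCKET, MODEL_KEY)
    return _cached_model


def handler(event, context=None, loader=load_model_from_s3):
    """Lambda entry point: predict for a single feature vector."""
    model = get_model(loader)
    prediction = model.predict([event["features"]])[0]
    # NumPy scalars are not JSON serialisable; convert them if present.
    if hasattr(prediction, "item"):
        prediction = prediction.item()
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

Only unpickle models from a bucket you control, since `pickle.loads` can execute arbitrary code. The periodic offline training job would simply overwrite the object in S3; new containers pick up the new model on their first invocation.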
30. Limitations of Serverless ML (AWS Lambda)
No real-time models (only pseudo real-time)
Deployment package management: size limit and OS libraries
Not suitable for model training yet (5 min max execution time)
Cold start time is long and hard to avoid
Unit/integration tests help, but not enough
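The execution-time limit is the reason training stays offline in this setup. For jobs that can be split into small steps, one possible workaround (not something the slides propose) is to watch the remaining time and stop early with a resumable state. The sketch below relies on `get_remaining_time_in_millis()`, which the Lambda context object provides in the Python runtime; the function name, the state layout and the "training step" (a plain sum) are placeholders.

```python
def train_in_chunks(batches, state, context, safety_margin_ms=10_000):
    """Process batches until the time budget is nearly exhausted.

    Returns a state dict holding the index of the next batch to process,
    so that a later invocation can resume from where this one stopped.
    """
    next_index = state.get("next_index", 0)
    total = state.get("total", 0.0)
    while next_index < len(batches):
        # Stop while there is still time left to persist the state.
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break
        total += sum(batches[next_index])  # placeholder for a real training step
        next_index += 1
    return {
        "next_index": next_index,
        "total": total,
        "done": next_index >= len(batches),
    }
```

The caller would have to store the returned state somewhere durable and trigger the next invocation itself, which is extra machinery that a long-running training job on a regular server does not need.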