Milvus Build Up the Unstructured Data Service - Jun Gu 09.2020 - ITU
Speaker bio: Jun Gu
• Database engineer, SME
• Voting member in Technical Advisory Council (TAC)
• Partner, Chief Evangelist
• Career history
• Education
© 2020 Zilliz. All rights reserved.
Zilliz: Who we are
• Open source software company based in Shanghai
• Mission: Reinvent data science
• Main contributor to the Milvus project
Unlock the treasure of unstructured data
AI algorithms transform images, video, voice, and natural language into vectors, enabling the understanding and utilization of unstructured data at scale.
Unstructured data → Deep learning models → Vectors → Knowledge, insight, $
The flow-based AI applications
The most popular way
• Flexible
• Easy to compose, web-based UI
• Sample pipelines
The challenge
• Data fragmentation
[Figure: The sample pipelines for video processing. Diagram labels: Video, Extract frames, Image, Visual model (e.g. VGG), Voice model, Extract tags, Vectors (visual, voice), Attributes.]
The unstructured data service (UDS) for AI
• Input: unstructured data (image, video, voice, natural language)
• Search and Insert requests pass from the input through the Inference Layer to the Data Service Layer
• Inference Layer: model store and inference runtime (TensorRT, ONNX RT, TFRT)
• Data Service Layer: Milvus (vectors, high dense + sparse; attributes; multimodal scoring) and object storage
• Roadmap labels on the slide: "will be in 0.11", "will be in 0.14", "will be in 0.16"
• Output: result set (image, video, voice, natural language)
Why Milvus: Vectors are different
• Operation on numbers: arithmetic operation, number comparison (a against b)
• Operation on vectors: similarity (e.g. Euclidean distance), $d(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$, and similarity comparison, $\mathrm{TopK}(A) = \arg\min_{B} d(A, B)$
• Organization: [Figure: the numbers 1 to 10 arranged as an ordered tree, with the range 1–10 split into 1–5 and 6–10.]
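The Euclidean distance and TopK selection on this slide can be illustrated with a brute-force scan. This is only a sketch of the definitions, not how Milvus executes a search (the deck describes ANNS indexes for that); the function names `euclidean` and `top_k` and the toy vectors are made up for the example.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance d(A, B) = sqrt(sum_i (a_i - b_i)^2)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, vectors, k):
    """Return the k (distance, index) pairs closest to `query`, nearest first."""
    scored = ((euclidean(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)


# Toy data, invented for illustration only.
vectors = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (6.0, 8.0)]
result = top_k((0.0, 0.0), vectors, k=2)
nearest_ids = [index for _, index in result]
```

Unlike numbers, the vectors have no natural sort order that answers every query, so this exhaustive scan costs one distance computation per stored vector; that cost is what approximate nearest neighbor indexes aim to avoid.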
Milvus: The big picture
[Architecture diagram. Labels: SDK / Web API (query obj, insert obj); Query Scheduler; Processing Engine with ANNS (Mi-FAISS, Mi-Annoy), Collaborative Query (tag/structured data), and Multi-modal Scoring (app specific); Buffer Pool; Reducer (top-K result); Result; Metadata; Selection; Storage Tier with Segments, Index Files, and New Index File.]
Various processors
• x86: supports SSE4.2, AVX2, AVX512
• GPU: Pascal microarchitecture or later, CUDA 10.0 or later
• Arm: requires aarch64
• Kunpeng: tested on Kunpeng 920 with CentOS 7.x
• Loongson: tested on Loongson with a Docker container
• RISC-V: in early development
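The diagram shows a Reducer that produces the top-K result from data stored in segments. One common way to do such a reduction, assumed here rather than taken from the Milvus source, is a k-way merge of per-segment hit lists that are each already sorted by distance. The name `reduce_top_k` and the sample hits are hypothetical.

```python
import heapq
import itertools


def reduce_top_k(per_segment_hits, k):
    """Merge per-segment hit lists into one global top-k list.

    Each hit is a (distance, object_id) pair, and every per-segment list
    must already be sorted by ascending distance.
    """
    merged = heapq.merge(*per_segment_hits)  # lazy k-way merge of sorted lists
    return list(itertools.islice(merged, k))


# Hypothetical per-segment results, invented for illustration only.
segment_a = [(0.10, "a1"), (0.40, "a2"), (0.90, "a3")]
segment_b = [(0.20, "b1"), (0.30, "b2"), (0.95, "b3")]
global_top3 = reduce_top_k([segment_a, segment_b], k=3)
```

Because each segment only has to return its own k best hits, the merge touches at most k entries per segment regardless of how many vectors the segment holds.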
Milvus: The ANN benchmark
• Milvus: 0.8.0
• OS: Ubuntu 18.04
• ECS: AWS c5.4xlarge (16c, 32GB), Intel Xeon Platinum 8275CL
• Data set: sift-128-euclidean (1 million vectors)
• More info: https://milvus.io/docs/benchmarks_aws
Special thanks to ANN-Benchmarks (developed by Martin Aumueller, Erik Bernhardsson and Alec Faithfull)
Milvus: The journey
• 2018.10: The Milvus seed idea
• 2019.04: 0.1
• 2019.06: 1st user
• 2019.10: Open Source
• 2020.03: Joined LF AI
[Figure: The most active AI projects in Linux Foundation.]
Progress
Unstoppable momentum since its debut.
• 5.9K commits
• 3.9K GitHub stars
• 104 contributors
• 16 releases
• 200+ users
• 19 patents filed
Milvus: Features & benefits
The world's most advanced, our target
• Comprehensive Similarity Metrics
• Leading-Edge Performance
• Dynamic Data Management
• Near Real Time Search
• Rich Data Type & Advanced Search
• Cost Efficient
• Highly Scalable and Robust
• Cloud Native
• Ease of Use
Use case: Intelligent writing assistant
[Pipeline diagram. Labels: Corpus Data (natural language), Data Cleansing, Feature engineering, Writing Intention, Encoder (TextCNN), Encoder (InferSent), Extract paragraph, summary, Milvus, Object Storage, Result: an auto-generated essay.]
Use case: News recommendation on mobile
[Pipeline diagram. Labels: Daily batch feeding, News title, Encoder (SimBert), Milvus, Object Storage, Reading Preference, Recommended News.]
Use case: Image search for company trademark
• 55 million images
• Search elapsed time: 20 ms on cloud GPU server
[Pipeline diagram. Labels: Images, Company Trademark, Encoder: VGG (fine tuned), Search, Milvus, Object Storage, Trademark Image, Company Info.]
Use case: Pharmaceutical molecule analysis
• 800 million molecules
• Search elapsed time: 500 ms on single server
• Molecular formula: CC(=O)Nc1ccc(S(=O)(=O)NCC(=O)N2CCS(=O)CC2)cc1
• Encoder: RDKit
• Molecular fingerprint: 1024 bits, 00001100...10000000
• Milvus: Tanimoto similarity
• Search types: Similarity, Substructure, Superstructure
• Output: Molecular Candidate List
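The Tanimoto similarity named on this slide compares two bit fingerprints as the number of shared set bits divided by the number of bits set in either. The sketch below uses plain Python integers as stand-ins for the 1024-bit fingerprints; it does not call RDKit or Milvus. The substructure screen (every query bit must also be set in the candidate) is a common fingerprint pre-filter and is an assumption here, as are the function names, the 8-bit toy fingerprints, and the choice to return 0.0 when both fingerprints are empty.

```python
def popcount(bits):
    """Count the set bits of a non-negative integer."""
    return bin(bits).count("1")


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity |A and B| / |A or B| of two bit fingerprints."""
    union = popcount(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen for this sketch: two empty fingerprints
    return popcount(fp_a & fp_b) / union


def may_contain_substructure(query_fp, molecule_fp):
    """Screen: True when every bit set in the query is set in the molecule."""
    return (query_fp & molecule_fp) == query_fp


# Toy 8-bit fingerprints, invented for illustration only.
query = 0b00001100
close_match = 0b00001110
unrelated = 0b10000000

score_close = tanimoto(query, close_match)
score_unrelated = tanimoto(query, unrelated)
```

A similarity search ranks candidates by the score, while the screen only narrows the candidate list; molecules that pass it would still need an exact structural check.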
Useful Links
• Performance benchmark: https://milvus.io/docs/benchmarks_aws
• https://milvus.io
• https://github.com/milvus-io/milvus
• https://milvusio.slack.com
• https://twitter.com/milvusio
• https://medium.com/unstructured-data-service
• https://zhuanlan.zhihu.com/ai-search
• Live demo: https://milvus.io/scenarios
  • Content-based image retrieval system
  • Q&A chatbot powered by NLP
  • Molecular analysis
Thanks!