OBDP 2021 PRESENTATION DR PABLO GHIGLINO - A Low Power And High Performance Artificial Intelligence Approach To Increase Guidance Navigation And Control Robustness
pablo.ghiglino@klepsydra.com | www.klepsydra.com
KLEPSYDRA AI OVERVIEW

- Input: a trained model, fed with images, time series, and sensor data
- Basic features: Klepsydra AI model execution
- Advanced features: language bindings, performance analysis
KLEPSYDRA AI DISTRIBUTION

1. Developer station distribution:
   - Libraries + header files
   - Development tools (performance analysis, etc.)
   - Unlimited seats
2. Target computer distribution:
   - Libraries + header files
   - License file per seat
SUPPORT MATRIX

Architecture | Supported Processors
ARM          | Quad-core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5 GHz
             | Samsung Exynos 5 Octa: quad ARM Cortex-A15 @ 2 GHz + quad Cortex-A7 @ 1.3 GHz
             | 64-bit Arm Neoverse core
             | 6-core Carmel ARM v8.2 64-bit CPU, 8 MB L2 + 4 MB L3
x86          | Intel Xeon E5-2686 v4 @ 2.4 GHz
             | 8-core Intel Core i9 @ 2.9 GHz
             | Quad-core Intel Core i7

OS Type | Supported OS
Linux   | Ubuntu 18.04, 20.04, Docker, Embedded Linux, FreeRTOS, PikeOS
Core API

class DeepNeuralNetwork {
public:
    /**
     * @brief setCallback
     * @param callback. Callback function for the prediction result.
     */
    virtual void setCallback(
        std::function<void(unsigned long, const kpsr::ai::F32AlignedVector&)> callback) = 0;

    /**
     * @brief predict. Load input matrix as input to network.
     * @param inputVector. An F32AlignedVector of floats containing the network input.
     *
     * @return Unique id corresponding to the input vector
     */
    virtual unsigned long predict(const kpsr::ai::F32AlignedVector& inputVector) = 0;

    /**
     * @brief predict. Copy-less version of predict.
     * @param inputVector. An F32AlignedVector of floats containing the network input.
     *
     * @return Unique id corresponding to the input vector
     */
    virtual unsigned long predict(
        const std::shared_ptr<kpsr::ai::F32AlignedVector>& inputVector) = 0;
};
ONNX API

class KPSR_API OnnxDNNImporter {
public:
    /**
     * @brief import an ONNX file and use a default eventloop factory for all processor cores
     * @param onnxFileName
     * @param testDNN
     * @return a shared pointer to a DeepNeuralNetwork object
     *
     * When the log level is debug, dumps the YAML configuration of the default factory.
     * It makes use of all processor cores.
     */
    static std::shared_ptr<DeepNeuralNetwork> createDNNFactory(const std::string& onnxFileName,
                                                               bool testDNN = false);

    /**
     * @brief importForTest an ONNX file and use a default synchronous factory
     * @param onnxFileName
     * @param envFileName. Klepsydra AI configuration environment file.
     * @return a shared pointer to a DeepNeuralNetwork object
     *
     * This method is intended to be used for testing purposes only.
     */
    static std::shared_ptr<DeepNeuralNetwork> createDNNFactory(const std::string& onnxFileName,
                                                               const std::string& envFileName);
};
APPROACHES TO CONCURRENT ALGORITHMIC EXECUTION

- Parallelisation
- Pipeline
BENCHMARK DESCRIPTION

Description:
- Given an input matrix, a number of sequential multiplications is performed:
  Step 1: B = A x A; Step 2: C = B x B; ...
- Matrix A is randomly generated for each new sequence

Parameters:
- Matrix dimensions: 100x100
- Data type: float, integer
- Number of multiplications per matrix: [10, 60]
- Processing frequency: [2 Hz - 100 Hz]

Technical Spec:
- Computer: Odroid XU4
- OS: Ubuntu 18.04
FLOAT PERFORMANCE RESULTS

[Charts: CPU usage, throughput, and latency vs publishing rate (2-20 Hz) for 30 steps, and vs publishing rate (2-14 Hz) for 40 steps, each comparing OpenMP against Klepsydra.]
KLEPSYDRA AI DATA PROCESSING APPROACH

Klepsydra AI threading model:
- Data flows from the input layer to the output layer through event loops (threads)
- Most layers can be parallelised and are vectorised
- Event loops are assigned to cores

The threading model consists of:
- Number of cores assigned to event loops
- Number of event loops per core
- Number of parallelisation threads for each layer
Performance tuning

Performance criteria:
- CPU usage
- RAM usage
- Throughput (output data rate)
- Latency

Performance parameters:
- number_of_cores: Number of cores across which the event loops are distributed (by default one event loop per core). High throughput requires more cores, i.e., more CPU usage; low throughput requires few cores, hence a substantial reduction in CPU usage.
- pool_size: Size of the internal queues of the event loop publish/subscribe pairs. High throughput requires large values, i.e., more RAM usage; low throughput requires smaller values, hence less RAM.
- number_of_parallel_threads: Number of threads assigned to parallelise layers. For low-latency requirements, assign large values (maximum = number of cores), i.e., increased CPU usage. With no latency requirements, use low values (minimum = 1), hence a substantial reduction in CPU usage.
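Since the ONNX importer is said to dump a YAML configuration of the default factory, the three tuning parameters above could plausibly appear in an environment file along these lines (the parameter names come from the slide, but the YAML schema itself is an assumption, not the documented format):

```yaml
# Hypothetical Klepsydra AI performance configuration (schema assumed).
number_of_cores: 4             # event loops spread over 4 cores; more cores -> higher throughput, higher CPU
pool_size: 256                 # queue depth of each publish/subscribe pair; larger -> higher throughput, more RAM
number_of_parallel_threads: 4  # per-layer parallelisation; max = number_of_cores for low latency
```

Tuning is then a trade: raise number_of_cores and pool_size when throughput matters, raise number_of_parallel_threads when latency matters, and lower all three to shrink the CPU and RAM footprint.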
Example of performance benchmarks

TensorFlow latency: 56 ms vs Klepsydra AI latency: 35 ms
ROADMAP

Q2 2021:
- No third-party dependencies
- Binaries are C/C++ only
- Custom format for models

Q3 2021:
- FreeRTOS support (alpha version)
- Xilinx Ultrascale+ board
- Microchip SAM V71

Q4 2021:
- PikeOS support (alpha version)
- Xilinx Zedboard

Q1 2022:
- NVIDIA Jetson TX2 support (alpha release)
- Quantisation support

Q2 2022:
- Graphs support
- New memory allocation model
- C support

Legend: hard deadlines vs flexible dates
CONTACT INFORMATION

Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies