NXP Connects i.MX 8 Family Hardware - Lydia Ziegler - Nxp corporate template, INTERNAL ...
NXP Connects i.MX 8 Family Hardware Lydia Ziegler i.MX 8 DRAM Introduction and Tools Overview October 2018 | AMF-AUT-T3361 Company External – NXP, the NXP logo, and NXP secure connections for a smarter world are trademarks of NXP B.V. All other product or service names are the property of their respective owners. © 2018 NXP B.V.
Agenda • i.MX 8 Family Overview • i.MX 8QM/QXP DDR Controller Overview • i.MX 8QM/QXP DDR Initialization Flow • i.MX 8QM/QXP DDR Calibration Details • i.MX 8QM/QXP DDR Tools Introduction • Debugging DDR Failures PUBLIC 1
i.MX Explosive Growth Over 460M i.MX SOCs shipped to date. Over 140M i.MX shipped in vehicles since 2007. #1 in eReaders #1 in Auto Infotainment Applications Processors 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 i.MX i.MX Auto Scalability • Trusted Supply • World Class Support PUBLIC 2
Scalability of Embedded Processing: i.MX Subsystem Reuse [Figure: block diagrams of i.MX 8QM, i.MX 8DualMax, i.MX 8QXP, i.MX 8DX, i.MX 8DXL, and i.MX 8SXL built from shared subsystems (Cortex-A72/A53/A35 cores, Cortex-M4, SCU, DSP, HSM, GPU, display controller, video, and I/O), with memory interfaces ranging from x64 LPDDR4/DDR4 down to x16 LPDDR4/DDR3L] Most Scalable Family of Automotive Applications Processors for eCockpit, Instrument Cluster, Display Audio and Telematics/V2X PUBLIC 4
Automotive Applications Processor Roadmap ARM v5-v7 ARM/v8 ARM v8.2 25-50k DMIPS 128-300 GFLOPS eCockpit Next Gen big.LITTLE i.MX High Vision i.MX 8QuadMax Audio DSP 15-20k DMIPS i.MX 8QuadPlus Pin Compatible Family 64 GFLOPS eCockpit i.MX 8DualMax Vision Audio DSP i.MX 6Quad i.MX 6QuadPlus i.MX 8QuadXPlus
Automotive Applications Processor Roadmap (callouts added across the slide builds) • Pin Compatible eCockpit Processors: up to 4x 1080p / 1x 4k displays; x64 LP-DDR4 / 3200; HiFi4 DSP option; common software and hardware platform • Pin Compatible Display Audio and Instrument • Next Generation: i.MX 10 Scalable Family
i.MX 8 & 8X Introduction PUBLIC 11
i.MX 8 Family of Automotive Applications Processors GPU Display DSP Option Virtualization ARM CPU • Dual Core GPU Cortex-M4 | Cortex-A53 | Cortex-A72 • 16 Vec4 Shaders Up to 4 displays Audio DSP SoC Level 8 • Up to 128 GFLOPS OpenVX and ISI Vision Acceleration • 64 execution units SoC OS 8 • High Speed total pixels 8QuadMax • Tessellation / Geometry HiFi 4 Core OS OS Software Compatibility Shaders Pin Compatibility • Dual Core GPU • 16 Vec4 Shaders Up to 4 displays Audio DSP SoC Level 8 • Up to 80 GFLOPS • 64 execution units SoC OS • Full Speed total pixels OS 8 Core 8QuadPlus • Tessellation/Geometry HiFi 4 OS Shaders • Single Core GPU • 8 Vec4 Shaders Up to 3 displays Audio DSP SoC Level • Up to 64 GFLOPS 8 • 32 execution units SoC OS • High Speed total pixels Core OS 8DualMax • Tessellation/Geometry HiFi 4 OS Shaders Family of Scalable Automotive Multimedia Processors eCockpit Infotainment Graphical Instrument Clusters PUBLIC 12
preliminary i.MX 8 Family – Block Diagrams
Feature | i.MX 8QuadMax | i.MX 8QuadPlus | i.MX 8DualMax
Package | 29x29 Flip-Chip BGA, 0.75mm pitch | 29x29 Flip-Chip BGA, 0.75mm pitch | 29x29 Flip-Chip BGA, 0.75mm pitch
DMIPS (Cortex-A) | 26k | 18.5k | 15k
ARM® Core Complex 1 | 4x Cortex-A53 | 4x Cortex-A53 | 2x Cortex-A72
ARM® Core Complex 2 | 2x Cortex®-A72 | 1x Cortex-A72 | -
Display Controller | 2x | 2x | 1x
GPU | 2x GC7000 XSVX | 2x GC7000Lite XSVX | 1x GC7000 XSVX
MIPI CSI | 2x 4-lane | 2x 4-lane | 2x 4-lane
MLB150 | 1x | 1x | via USB
HDMI In | 1x | 1x | -
HDMI/eDP Out | 1x | 1x | 1x
DDR | 2x x32 | 2x x32 | 2x x32
PCIe | 2x PCIe 3.0 | 2x PCIe 3.0 | 2x PCIe 3.0
SATA | 1x SATA3 | 1x SATA3 | -
Ethernet | 2x 1Gb w/AVB | 2x 1Gb w/AVB | 1x 1Gb w/AVB, 1x 10/100 w/AVB
PUBLIC 13
Preliminary – Subject to Change i.MX 8X Family of Applications Processors GPU Video Displays DSP USB DDR ARM CPU • Single Core GPU x32 Cortex-A35 + M4 • 4 Vec4 Shaders Up to 3 high performance 4 • 16 execution units 2x 1080p 1x WVGA DDR3L-1866 (ECC option) • OpenGL ES 3.1 8QuadXPlus • OpenCL Embedded + Legacy HiFi 4 LP-DDR4-2400 (no ECC) Software Compatibility Pin Compatibility • Single Core GPU Up to 3 x32 • 4 Vec4 Shaders 2x 1080p DDR3L-1866 4 high performance • 16 execution units 1x WVGA (ECC option) • OpenGL ES 3.1 HiFi 4 LP-DDR4-2400 8DualXPlus • OpenCL Embedded + Legacy (no ECC) • Single Core GPU x16 Up to 3 • 4 Vec4 Shaders DDR3L-1866 4 poweroptimized • 16 execution units 2x 1080p 1x WVGA (no ECC) • OpenGL ES 3.1 + Legacy HiFi 4 LP-DDR4-2400 8DualX • OpenCL Embedded (no ECC) Family of Scalable Automotive Multimedia Processors Display Audio Applications Graphical Instrument Clusters Telematics and V2X PUBLIC 14
i.MX 8X Family Block Diagram Core Complex 2 i.MX 8DualXPlus i.MX 8DualX Core Complex 1 Connectivity 1x Cortex-M4F i.MX 8QuadXPlus 1x I2C 4x 4x UART Feature 2-4xCortex-A53 Cortex-A35 16KB L1 I-cache 32KB 32KBL1-D L1-D 1x UART 32KB L1-I 32KB 32KBL1-I 32KBL1-D L1-D 16KB L1 D-cache 8x I2C 6x GPIO 2 x Cortex-A35 512KB L2 w/ECC 256KB SRAM 1x TPM Timer 4x SPI (i.MX 8DualXPlus) ARM® Core 2 x Cortex-A35 2x Gbit Ethernet 4 x Cortex-A35 Multimedia Memory (i.MX 8QuadXPlus) DDR3 @933 MHz (ECC Option) GPU 1x 10/100 Ethernet LPDDR4 @ 1200 MHz (no ECC) 3.3V / 1.8V GPIO ARM® Core 1 x Cortex-M4F 1 x Cortex-M4F 4- Shaders 2x SDIO3.0/eMMC5.1 OpenGL ES 3.1 DSP Core Tensilica® HiFi 4 DSP Tensilica HiFi 4 DSP Vulkan® 2x Quad / 1x Octal SPI PCIe 3.0 with L1 Substate (1-lane) VPU RAW NAND – BCH62 1x USB3 OTG w/PHY *32-bit DDR3L (ECC option) 16-bit DDR3L (no ECC) Video: h.265 dec 4k DRAM h.264 dec/enc 1080p 1 or 2x USB2 OTG w/PHY LPDDR4 (no ECC) LPDDR4 (no ECC) Security Audio 3x CAN/CAN FD 1 x GC7000Lite 1 x GC7000Lite HAB, SRTC, SJTAG, TrustZone 1x Tensilica® GPU HiFi 4 DSP 32KB I 48KB D High Performance Power Optimized AES256, RSA4096, SHA-512 MOST 25/50 512 KB SRAM 64KB TCM 3DES, ARC4, MD-5 4K h.265 dec, 1080p h.264 1080p h.264 enc/dec 4x4 Keypad VPU enc/dec Flashless SHE, ECC 4x PWM Display & Camera I/O Tamper detection, Inline Enc Engine 1 x Gigabit with AVB 1x 12-bit ADC Ethernet 2 x Gigabit with AVB 1 x 10/100 Display Processor w/ SafeAssure® System Control 2x ASRC, SPDIF 2 x MIPI-DSI/LVDS Combo PHY* Power Control, Clocks, Reset 4x SAI, ESAI, MQS USB with 1 x USB 3.0 (or USB 2.0) 1x Parallel Display Boot ROMs 2 x USB 2.0 PHY 1 x USB 2.0 1x MIPI CSI PMIC interface (dedicated I2C) 1x Parallel CSI Resource Domain Partitioning Varies by device *21x21 package only. 17x17 will have 16-bit memory interface * Each single PHY can either be a 1×4 lane MIPI-DSI or a 1×1 channel LVDS interface for a total of 2 display interfaces. 
In combination, the two PHYs can be configured to be a single 2-channel LVDS interface. PUBLIC 15
i.MX 8QM/QXP DDR Controller Overview PUBLIC 16
DDR Controller/PHY Features • i.MX 8QM − Supports LPDDR4 up to 3200Mbps (1.6GHz DDR clock) − Supports DDR4 up to 2400Mbps (1.2GHz DDR clock) − Two DDR Controllers (4KB interleave between controllers) • i.MX 8QXP − Supports LPDDR4 up to 2400Mbps (1.2GHz DDR clock) − Supports DDR3L (with ECC) up to 1866Mbps (933MHz DDR clock) − One DDR Controller • Data bus width 32-bits/16-bits for all DDR protocols. • Supports up to 2 ranks for all DDR protocols • Voltage and temperature compensation in the background PUBLIC 17
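The speed grades above pair each data rate with a DDR clock that is half of it. A minimal Python sketch of that arithmetic follows, together with one possible reading of the 4KB interleave between the two i.MX 8QM controllers. The function names are illustrative, and the simple alternating interleave in `controller_for_address` is an assumption, not a description of the actual hardware address mapping.

```python
def data_rate_mbps(ddr_clock_mhz: float) -> float:
    """Double data rate: two transfers per DDR clock cycle, per pin."""
    return 2 * ddr_clock_mhz


def peak_bandwidth_mb_s(ddr_clock_mhz: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth of one controller in MB/s (no overheads)."""
    return data_rate_mbps(ddr_clock_mhz) * bus_width_bits / 8


def controller_for_address(addr: int, granule: int = 4096, controllers: int = 2) -> int:
    """Assumed mapping: consecutive 4KB blocks alternate between controllers."""
    return (addr // granule) % controllers


if __name__ == "__main__":
    print(data_rate_mbps(1600))           # LPDDR4 on i.MX 8QM
    print(data_rate_mbps(933))            # DDR3L on i.MX 8QXP
    print(peak_bandwidth_mb_s(1600, 32))  # one 32-bit controller
    print([controller_for_address(a) for a in (0x0000, 0x1000, 0x2000)])
```

The peak bandwidth figure is a theoretical ceiling only; refresh, bank conflicts, and read/write turnarounds reduce what a real system sees.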
DDR Subsystem Architecture DDR Controller DRC RRB PHYv1 28FDSOI Up to 32-bit data bus along uMCTL2 with associated DQS/DM control signals data training Scheduler and SDRAM AXI Port Arbiter command PHY generator (DDRC) PHY PLL Address and control signals are configurable based on PUB DRAM type WB • QM has two sets of DDR controllers/PHYs • QX has one DDR controller/PHY PUBLIC 18
Comparison With i.MX6/7 • i.MX 6 series uses the MMDC • i.MX 8QM/QXP and i.MX7D use 3rd party IP − DDR Controller IP has a programming model similar to i.MX7D − DDR PHY is completely different from i.MX7D • i.MX 8QM/QXP DDR is higher speed − Ultra high speed brings more challenges for customer PCB design − Previous i.MX max DDR freq was 528MHz; i.MX 8QM runs up to 1.6GHz − Follow layout recommendations provided in the Hardware Developers Guide PUBLIC 19
i.MX 8QM/QXP and i.MX 8M High-level Comparison
Feature | i.MX8 QM/QXP | i.MX8M
System Control Unit (SCU) | Yes | No, architecture similar to MX7D
DDR Initialization | Performed by SCU | Performed by SPL
Automatic Data training | Performed as part of initialization script (PIR writes) | Performed by the PHY MCU (firmware loaded into MCU IRAM/DRAM)
Controller version | SNPS DDR Controller (dwc_ddr_umctl2) | SNPS DDR Controller (dwc_ddr_umctl2)
PHY version | SNPS PHY v1 | SNPS PHY v2 (integrated MCU)
PUBLIC 20
High Level Feature Set Comparison of the i.MX 8 / 8X / 8M Families QM Family QX Family mScale Family PUBLIC 21
i.MX8 QM i.MX8 QXP DDR Pin IO name DCF_00 DCF_01 LPDDR4 name DDR4 name CA2_A CA4_A A5 A6 IO name DCF_00 DCF_01 LPDDR4 name CA2_A CA4_A DDR3 name A5 A6 Function DCF_02 DCF_03 DCF_04 CA5_A ALERT_N A7 A8 DCF_03 DCF_04 DCF_05 CA5_A A7 A8 A9 DCF_05 A9 DCF_07 RAS# DCF_06 BG1 DCF_08 CA3_A A3 DCF_07 ACT_N DCF_08 CA3_A A3 DCF_09 ODT_CA_A ODT DCF_09 ODT_CA_A ODT DCF_10 CS0_A A1 • Pins configurable DCF_10 CS0_A A1 DCF_11 CA0_A A0 DCF_11 CA0_A A0 DCF_12 CS1_A A2 based on DDR type DCF_12 CS1_A A2 DCF_14 CKE0_A DCF_13 PARITY DCF_15 CKE1_A • Refer to NXP board DCF_14 CKE0_A DCF_16 CA1_A A4 DCF_15 CKE1_A schematics for DCF_16 CA1_A A4 DCF_17 DCF_18 CA4_B RESET_N A12 RESET# DCF_17 CA4_B A12 examples DCF_18 RESET_N RESET_N DCF_19 CA5_B A14 DCF_19 CA5_B A14 DCF_20 A15 DCF_20 A15 DCF_21 BA0 DCF_21 BA0 DCF_22 BA1 DCF_22 BA1 DCF_23 BA2 DCF_23 BG0 DCF_24 CAS# DCF_24 A17 DCF_25 ODT_CA_B DCF_25 ODT_CA_B ODT1 DCF_26 CA3_B A13 DCF_26 CA3_B A13 DCF_27 CA0_B A10 DCF_27 CA0_B A10 DCF_28 CS0_B CS_N[0] DCF_28 CS0_B CS_N[0] DCF_29 CS1_B CS_N[1] DCF_29 CS1_B CS_N[1] DCF_30 CKE0_B CKE0 DCF_30 CKE0_B CKE0 DCF_31 CKE1_B CKE1 DCF_31 CKE1_B CKE1 DCF_32 CA1_B A11 DCF_32 CA1_B A11 DCF_33 CA2_B A16 DCF_33 CA2_B WE# PUBLIC 22
JEDEC Timing PUBLIC 23
Timing Budget for Read – JEDEC Min From LPDDR4 • 1.6 GHz frequency has a clock period of 625 picoseconds − Double data rate gives a theoretical window of 312.5 picoseconds • JEDEC standards require LPDDR4 to deliver a minimum window of 70% of the theoretical window (about 219 picoseconds); the remaining 30% (94 picoseconds) is consumed by the LPDDR4 device − Accounts for all skew, slew rate diff and jitter from LPDDR4 package PUBLIC 24
Timing Budget for Read – Processor Flip-Flop times • Set up time requirement for Read FIFO of processor − 17 picoseconds • Hold time requirement for Read FIFO of processor − 17 picoseconds PUBLIC 25
Timing Budget for Read – Vref Uncertainty • Vref must meet the following tolerance: +/- 1% − Vref affects the time at which a signal (DQ/DM/CA) is latched into the pads • Timing fluctuations for maximum Vref variations − 4 picoseconds for Set Up − 4 picoseconds for Hold PUBLIC 26
Timing Budget for Read – DQS Placement Uncertainty • Accounts for Delay Element granularity in DLL − One delay element is ~ 5 picoseconds long − Manufacturing process variations can change this value. • Timing budget for DQS variation is 7 picoseconds applied to Set Up PUBLIC 27
Timing Budget for Read – Voltage-Temperature Drift • ZQ Calibrations account for signal drive strength on PCB • Variations in Volt-Temp affect delay element time • Timing budget for maximum allowed Volt-Temp drift − 7 picoseconds for Hold PUBLIC 28
Timing Budget for Read – Tap Size Variation • The actual delay element tap point may vary • Timing budget allows for 2.2 picoseconds based on manufacturing process variations. PUBLIC 29
Timing Budget for Read – Power Supply Noise • Maximum allowed internal power rail ripple is +/- 2% • Accounts for jitter introduced on Read signal from package ball to the input of the Read FIFO. • Timing budget allowances: − Set Up: 8 picoseconds − Hold: 8 picoseconds PUBLIC 30
Timing Budget for Read – I/O Rise/Fall Skew mismatch • Accounts for internal Rise/Fall mismatches of the Read signal from the processor balls to the Read FIFO. • Typically caused by different slew rates for rising and falling edges • Timing budget allowances: − Set Up: 9 picoseconds − Hold: 9 picoseconds PUBLIC 31
Timing Budget for Read – InterSymbol Interference ISI • Accounts for interactions between data traces internal to the processor, processor balls to the Read FIFO. • Timing budget allowances: − Set Up: 8 picoseconds − Hold: 8 picoseconds PUBLIC 32
Timing Budget – Allowance for Trace Length Mismatch • The remaining Timing Budget is allocated to PCB trace length, internal package length, and design margin. • For the most robust design, match trace lengths as closely as possible: − Add the given internal package length to the PCB trace length, and then match lengths by group. PUBLIC 33
Timing Budget • As DDR frequency increases, the time between strobe edges (rise/fall) becomes so small that the DRAM system designer needs to account for all possible errors in timing. • The frequency itself provides the maximum available time in a window. • The three major components in a DRAM system can account for all errors: − The DRAM Device − The PHY on the processor − The interconnecting system ~ PCB board. ▪ Includes package substrate up to silicon pads. ▪ IBIS models include necessary information. At 1.6 GHz, the maximum data window is 313 picoseconds. Uncertainties on the DRAM and processor reduce this window to 110 picoseconds. If further errors on the PCB amount to more than 110 picoseconds, there are potential problems with data integrity. PUBLIC 34
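The read budget summarized above can be checked by adding up the allowances from the preceding slides. The Python sketch below does that arithmetic. Treating the 94 ps LPDDR4 allowance as 30% of the unit interval and counting the 2.2 ps tap size variation once (rather than once each for setup and hold) are assumptions made here because they reproduce the quoted 110 ps figure.

```python
CLOCK_GHZ = 1.6

clock_period_ps = 1000 / CLOCK_GHZ       # 625 ps per DDR clock
unit_interval_ps = clock_period_ps / 2   # 312.5 ps per data bit (double data rate)

# Assumption: the DRAM guarantees 70% of the window, so it consumes 30%.
dram_loss_ps = 0.30 * unit_interval_ps   # about 94 ps

# Processor-side allowances in picoseconds, taken from the slides.
SETUP_PS = {
    "read FIFO setup time": 17,
    "Vref uncertainty": 4,
    "DQS placement uncertainty": 7,
    "power supply noise": 8,
    "I/O rise/fall skew mismatch": 9,
    "inter-symbol interference": 8,
}
HOLD_PS = {
    "read FIFO hold time": 17,
    "Vref uncertainty": 4,
    "voltage-temperature drift": 7,
    "power supply noise": 8,
    "I/O rise/fall skew mismatch": 9,
    "inter-symbol interference": 8,
}
TAP_SIZE_VARIATION_PS = 2.2  # assumption: counted once

processor_loss_ps = sum(SETUP_PS.values()) + sum(HOLD_PS.values()) + TAP_SIZE_VARIATION_PS
remaining_ps = unit_interval_ps - dram_loss_ps - processor_loss_ps

if __name__ == "__main__":
    print(f"unit interval:        {unit_interval_ps:.1f} ps")
    print(f"DRAM allowance:       {dram_loss_ps:.1f} ps")
    print(f"processor allowance:  {processor_loss_ps:.1f} ps")
    print(f"left for PCB/package: {remaining_ps:.1f} ps")
```

Whatever remains after both subtractions is the share available for PCB trace mismatch, package length, and design margin.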
i.MX 8QM/QX DDR Initialization Flow PUBLIC 35
DDR Initialization Flow • Three main initialization components − Controller/PHY initialization − DRAM initialization − Data training • Data training (calibration) part of init flow − Data training specific to DRAM technology • Initialization sequence must adhere to order shown here − Includes sequence order for data training • DDR Register Programming Aid (RPA) takes care of this [Figure: flow from DDR Controller/PHY register initialization, to DRAM initialization, to DRAM training (LP4, DDR4, DDR3), to PHY/DRAM Ready] PUBLIC 36
i.MX 6 Versus i.MX 8QM/QXP DDR Initialization Process
i.MX 6 Series
1. Create an initial DRAM initialization script from RPA
2. Run initial DRAM initialization
3. Run calibration and then test to make sure board works
4. Run calibration on a number of boards and obtain average values
5. Place averaged calibration values into DRAM initialization script
6. Run updated DRAM initialization
7. Perform testing on several boards
i.MX 8QM/QX
1. Create an initial DRAM initialization script from RPA
2. Run DDR stress test based on the script
3. Tweak the script (if necessary) to make sure it can pass on several boards
PUBLIC 37
i.MX 8QM /QX DDR Calibration Details PUBLIC 38
DDR Data Training [Figure: training flows for LPDDR4, DDR4, DDR3] • Different DDR technologies require different data training • Data training part of initialization process − Write PIR register − Poll for completion • Command Bus Training (CBT) not automatic, requires SW algorithm − Currently under investigation and development by R&D PUBLIC 39
DDR Training/Calibration Introduction
DRAM Calibration | LPDDR4 | DDR4 | DDR3L
Impedance (ZQ) calibration | ✓ | ✓ | ✓
Command/address bus training* | ✓ | |
Write Leveling | ✓ | ✓ | ✓
DQS Gate training | ✓ | ✓ | ✓
Write DQS2DQ training | ✓ | |
Data Eye training | ✓ | ✓ | ✓
VREF training | ✓ | ✓ |
* Command Bus Training (CBT) not automatic, requires SW algorithm; currently under investigation and development by R&D
PUBLIC 40
DDR Training (calibration) During Initialization • Reason for data training (calibration) during DRAM initialization − New DRAM technologies are increasingly fast − Tighter timings affected by delays between PHY and DDR memory ▪ Factors like board trace length affect these delays ▪ Process variations of the SoC and DRAM may also affect these delays − JEDEC requires data training for LPDDR4 and DDR4 as part of the initialization • Data training implemented completely by DDR PHY − Some setup may be needed (e.g. enable/disable DQS pull up/down for DQS gate) − Simple write to PHY PIR to start training then poll PHY PGSR0 for training complete − RPA handles all of this, no user interaction • No longer need to manually run calibration on various boards to take an average (as was the case for previous i.MX SoCs) PUBLIC 41
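The "write PIR, then poll PGSR0" pattern described above can be sketched in Python against a simulated register file. Everything hardware-specific here is hypothetical: the `FakePhy` class, the bit masks `PIR_INIT` and `PGSR0_IDONE`, and the example step mask are placeholders for illustration, not values from the PHY documentation.

```python
PIR_INIT = 1 << 0      # hypothetical "start" trigger bit in PIR
PGSR0_IDONE = 1 << 0   # hypothetical "initialization done" bit in PGSR0


class FakePhy:
    """Stand-in for memory-mapped PHY registers; reports done after N polls."""

    def __init__(self, polls_until_done: int = 3):
        self.regs = {"PIR": 0, "PGSR0": 0}
        self._remaining = polls_until_done

    def write(self, name: str, value: int) -> None:
        self.regs[name] = value
        if name == "PIR" and value & PIR_INIT:
            self.regs["PGSR0"] = 0  # status clears when a new run starts

    def read(self, name: str) -> int:
        if name == "PGSR0" and self.regs["PIR"] & PIR_INIT:
            if self._remaining > 0:
                self._remaining -= 1
            if self._remaining == 0:
                self.regs["PGSR0"] |= PGSR0_IDONE
        return self.regs[name]


def run_training(phy, step_mask: int, max_polls: int = 1000) -> int:
    """Trigger the selected training steps, then poll until done or timeout."""
    phy.write("PIR", step_mask | PIR_INIT)
    for _ in range(max_polls):
        status = phy.read("PGSR0")
        if status & PGSR0_IDONE:
            return status  # caller would inspect error bits here
    raise TimeoutError("PHY training did not complete")


if __name__ == "__main__":
    status = run_training(FakePhy(), step_mask=0x200)  # 0x200 is a made-up step bit
    print(f"PGSR0 = {status:#x}")
```

Bounding the poll loop matters in practice: a training step that never completes should surface as an error rather than hang the boot.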
DDR Calibration After Initialization (Run-time) • Run-time calibration during DRAM operation compensates for variations in voltage and temperature • Enabled during initialization of the DRAM, no further user interaction required − Delay line VT compensation ▪ Delays vary over time due to voltage and temperature fluctuations ▪ PHY contains circuits to monitor delay in the background during DRAM operation ▪ Drift compensation logic periodically adjusts delay line select input for variations in voltage/temperature ▪ Ensures each delay line maintains a constant time delay as voltage and temperature change during chip operation − Impedance (ZQ) calibration ▪ PHY has background calibration/compensation engine ▪ Boot time: during PHY initialization, full calibration performed to find initial values ▪ Run time: during DRAM operation, ZQ calibration periodically calibrates the output driver impedance and ODT of SoC and DRAM I/Os; incremental compensation performed in the background − DQS drift detection (applicable only to LPDDR4) ▪ PHY logic monitors drift in read DQS signal compared to DQS_GATE input due to DRAM tDQSCK variations over time ▪ tDQSCK for DDR3/4 is kept relatively constant by the DRAM and hence does not require DQS drift detection PUBLIC 42
DDR Calibration Modes • Impedance (ZQ) calibration • Command/address bus training* • Write Leveling • DQS Gate training • Write DQS2DQ training* • Data Eye training • VREF training** [Figure callouts: ZQ calibration occurs as part of PHY initialization and at run-time; the training steps are performed by PHY during initialization] Note: The DQ training items are performed automatically during DRAM initialization by the DDR PHY. Specifically, each of these trainings is triggered by programming its specific bit in the PHY Initialization Register (PIR). * Applicable only to LPDDR4 ** Applicable only to LPDDR4 and DDR4 PUBLIC 43
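The applicability footnotes above amount to a small lookup table. A minimal Python sketch of that table follows; the data structure and function name are illustrative, and the `automatic` flag simply records the deck's statement that CBT is not performed automatically by the PHY.

```python
ALL_TYPES = frozenset({"LPDDR4", "DDR4", "DDR3L"})

# (step name, DRAM types it applies to, performed automatically by the PHY?)
TRAINING_STEPS = [
    ("Impedance (ZQ) calibration", ALL_TYPES, True),
    ("Command/address bus training", frozenset({"LPDDR4"}), False),
    ("Write Leveling", ALL_TYPES, True),
    ("DQS Gate training", ALL_TYPES, True),
    ("Write DQS2DQ training", frozenset({"LPDDR4"}), True),
    ("Data Eye training", ALL_TYPES, True),
    ("VREF training", frozenset({"LPDDR4", "DDR4"}), True),
]


def trainings_for(dram_type: str, automatic_only: bool = False) -> list:
    """Return the calibration steps, in slide order, that apply to a DRAM type."""
    if dram_type not in ALL_TYPES:
        raise ValueError(f"unknown DRAM type: {dram_type}")
    return [
        name
        for name, types, automatic in TRAINING_STEPS
        if dram_type in types and (automatic or not automatic_only)
    ]


if __name__ == "__main__":
    for dram in ("LPDDR4", "DDR4", "DDR3L"):
        print(dram, "->", trainings_for(dram))
```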
Impedance (ZQ) Calibration What ZQ calibration calibrates I/O driver impedance across PVT Why This automatic process tunes the DRAM and the SoC I/O Pad output drivers (drive strength) and ODT values across changes in process, voltage, and temperature. How ZQ calibration is performed as part of the DRAM initialization process. Auto ZQ calibration is configured via the register DDRC_ZQCTL0 during DRAM initialization When ZQ calibration is configured during DRAM initialization to run periodically. Once configured, there is no further user interaction required. PUBLIC 44
Command/Address Bus Training (LPDDR4 only) What Command/Address Bus Training (CBT) is used to center the Command/Address bus (CS and CA[5:0]) on the rising clock edge by adjusting internal delays associated with the CA bus Why Higher DRAM speeds imply more stringent timing. However, the LPDDR4 CA bus is single data rate, which increases timing margin compared to double data rate. How QM/QX SNPS PHYv1 does not perform CBT automatically (within JEDEC spec by default). Requires software algorithm, under investigation by R&D. When JEDEC recommends but does not require CBT to be performed during initialization. Another proposal is to run CBT on a few boards to obtain an average CA delay value and apply it during initialization. PUBLIC 45
Write Leveling What Compensates for CK to DQS timing skew by aligning clock with data strobe to improve signal integrity performance Why • For non-LPDDR4: compensates for skew between clock and data strobe caused by fly-by topology • LPDDR4: compensates for CK-to-DQS timing skew affecting timing parameters such as tDQSS (write command to 1st DQS latching), tDSS and tDSH (DQS setup/hold time) How DDR PHY invokes write leveling mode in SDRAM then delays DQS to align with clock at SDRAM When Write leveling training is performed automatically by the DDR PHY during DRAM initialization PUBLIC 46
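One way to picture "delays DQS to align with clock" is as a search for the delay setting at which the clock level sampled by DQS flips from low to high. The Python sketch below shows that search. The feedback model (a `sample_ck` callback returning the CK level seen at a given DQS delay tap) follows the general JEDEC-style write leveling idea and is an assumption here, not a description of the PHY's internal algorithm.

```python
from typing import Callable, Optional


def write_level(sample_ck: Callable[[int], int], max_taps: int) -> Optional[int]:
    """Return the first DQS delay tap where the sampled CK goes from 0 to 1.

    sample_ck(tap) is a hypothetical callback giving the CK level (0 or 1)
    captured at the DRAM when DQS is delayed by `tap` delay elements.
    """
    previous = sample_ck(0)
    for tap in range(1, max_taps):
        current = sample_ck(tap)
        if previous == 0 and current == 1:
            return tap  # DQS rising edge now lines up with the CK rising edge
        previous = current
    return None  # no rising edge found within the delay range


if __name__ == "__main__":
    # Toy model: CK appears low until tap 12, then high.
    print(write_level(lambda tap: 1 if tap >= 12 else 0, max_taps=64))
```

Searching for the 0-to-1 transition, rather than the first 1, avoids locking onto the middle of a high phase when the sweep happens to start there.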
DQS Gate Training What Training that sweeps read DQS gate over possible gating positions to discover appropriate placement Why • PHY internally gates DQS during non-read operations to prevent erroneous latching of DQS edges • Precise alignment of gate within read preamble a prerequisite for proper reads • Delays (such as board trace lengths) in read path are imprecisely known, need to train the gate for a particular system How DQS Gate training is performed automatically by the DDR PHY. The PUB features a built-in read DQS strobe gate training unit that may be triggered as part of the initialization process using the PIR register When DQS Gate training is performed automatically during DRAM initialization. PUBLIC 47
Write DQS2DQ Training (LPDDR4 only) What DQS to DQ training is referred to as “Write training” in JEDEC and “Write DQ training” in DFI. Why LPDDR4 Memory devices use an unmatched DQS-DQ path to enable high-speed performance and save power. As a result, the DQS strobe must be trained to arrive at the DQ latch center-aligned with the data eye. How The DQ receiver will latch the data present on the DQ bus when DQS reaches the latch, and DQS2DQ training is accomplished by delaying the DQ signals relative to DQS such that the data eye arrives at the receiver latch centered on the DQS transition. Above picture shows the DQ position after the training. When DQS2DQ training is performed automatically by the DDR PHY during DRAM initialization. PUBLIC 48
Data Eye Training What The PHY training firmware contains automatic training sequences to perform read and write de-skew which aligns the data bits to the DQ bit with the longest delay using a bit delay line (BDL). After performing bit de-skew the read and write eye centering training is executed to place the strobe in the center of the eye defined by the bits in the respective byte. Below is an illustration of before and after de-skewing and centering. Before After Why As bit rates increase to 2133Mbps and beyond, maintaining timing margins in the DDR interfaces has become more difficult. The PHY solution includes delay lines to compensate for per-bit skew due to factors such as PHY to IO routing skews, package skews, PCB skew, etc. When Read/write de-skew and eye centering is performed automatically by the DDR PHY during DRAM initialization. PUBLIC 49
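The two stages described above (de-skew to the slowest bit, then center the strobe) can be illustrated with a simplified numeric model in Python. Delays are plain picosecond values here; a real PHY works in discrete delay line taps, and the function names are illustrative only.

```python
def deskew(bit_delays_ps: list) -> list:
    """Extra delay to add to each bit so all bits match the slowest one."""
    slowest = max(bit_delays_ps)
    return [slowest - delay for delay in bit_delays_ps]


def byte_eye(bit_eyes_ps: list) -> tuple:
    """Common eye of a byte: latest opening edge to earliest closing edge."""
    left = max(opening for opening, _ in bit_eyes_ps)
    right = min(closing for _, closing in bit_eyes_ps)
    if left >= right:
        raise ValueError("bits have no common eye opening")
    return left, right


def center_strobe(eye_ps: tuple) -> float:
    """Place the strobe at the midpoint of the eye."""
    left, right = eye_ps
    return (left + right) / 2


if __name__ == "__main__":
    print(deskew([10, 25, 18]))                         # per-bit delay to add
    eye = byte_eye([(0, 100), (10, 120), (5, 110)])
    print(eye, center_strobe(eye))
```

De-skewing first widens the common eye of the byte, which is why centering is run afterwards rather than before.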
VREF Training (LPDDR4 and DDR4) What • Write/read eyes should be as wide as possible to provide stable/robust memory access. • Eye position depends upon LCDL (delay line) and VREF values. Why • VREF is internally generated by SoC and DRAM. • VREF training used to determine range of VREF values where memory interface (write/read) is stable and then find out an optimum write/read eye position. The following types of VREF training are supported: DRAM VREF Training: Optimizes the write eye by sweeping DRAM VREF DQ values inside memory. Host (i.MX8) VREF Training: Optimizes the read eye by sweeping the PHY I/O’s VREF setting. How VREF training is performed automatically by the DDR PHY during DRAM initialization. Note, for DDR3L, VREF is externally supplied hence there is no VREF training requirement. PUBLIC 50
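The idea of "determine the range of VREF values where the interface is stable, then pick an optimum" can be sketched as a one-dimensional sweep in Python. This is a simplification: the slide notes that eye position depends on both the delay line (LCDL) and VREF, so the real search is two-dimensional. The pass/fail inputs are made-up data and the function names are illustrative.

```python
from typing import Optional


def widest_passing_range(results: list) -> Optional[tuple]:
    """Find the longest contiguous run of passing VREF settings.

    `results` is a list of (vref_setting, passed) pairs in ascending order.
    """
    best = None
    start = None
    for setting, passed in results:
        if not passed:
            start = None
            continue
        if start is None:
            start = setting
        if best is None or setting - start > best[1] - best[0]:
            best = (start, setting)
    return best


def pick_vref(results: list) -> int:
    """Choose the middle of the widest passing range."""
    window = widest_passing_range(results)
    if window is None:
        raise ValueError("no passing VREF setting found")
    low, high = window
    return (low + high) // 2


if __name__ == "__main__":
    sweep = [(s, 10 <= s <= 20) for s in range(32)]  # toy sweep data
    print(widest_passing_range(sweep), pick_vref(sweep))
```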
i.MX 8QM/QXP DDR Tool Introduction PUBLIC 51
i.MX 8QM/QX DDR Register Programming Aid (RPA) Highlights • Developed by SE team; no formal roll out or maintenance − Based on scripts provided by design/validation • Excel spreadsheet based, transparent, easy to use • Helps to compute DDRC register configuration − JEDEC timing parameters − DDRC DFI timing parameters − DDRPHY configuration • Helps to configure DDR mode registers • Includes necessary data training for specific memory type • “BoardDataBusConfig” worksheet for data bus swizzling • Two output formats − DCD CFG file – SCFW usage (copy into SCFW board folder) − DDR Stress Test Script – for use with the DDR stress test PUBLIC 52
i.MX 8QM/QX RPA • Each tool based on DDR technology: LPDDR4, DDR4 or DDR3 • Applies correct order of initialization steps − Controller/PHY initialization − DRAM initialization − Data training • Includes worksheet for data bus mapping − Configures relevant registers for data bit/byte swizzling • Generates two initialization formats − CFG file for use with SCFW (save as .cfg) − DDR Stress Test Script (save as .ds) • Color coded cells provide usage guidance PUBLIC 53
RPA – Register Configuration • In most cases, user only needs to update Device Information table − Automatically updates configuration and timings (all timings are based on JEDEC standard) − No need to manually go through all register fields (strongly recommend to not manually edit those fields) Indicates the DDR type the RPA is applicable to Recommend to list vendor and exact part number User must ensure these are accurate; values are found in the memory device data sheet PUBLIC 54
RPA – BoardDataBusConfig • Board layout guidelines allow users to swizzle data bits within a byte lane and swap byte lanes • “BoardDataBusConfig” worksheet – users input SoC data bus connection − Data bus mapping must be accurate for PHY data training − Relevant registers are automatically updated User must accurately populate this field based on the customer schematics. Errors in this field may result in data training errors. PUBLIC 55
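Because an inaccurate data bus mapping leads to training errors, it can help to sanity-check a mapping before entering it. The Python sketch below checks the two rules stated above: bits may be swizzled within a byte lane, and whole byte lanes may be swapped. The list representation (`dq_map[i]` is the DRAM DQ bit wired to SoC DQ bit `i`) is an assumption for illustration and is not the RPA worksheet format; specific DRAM types may impose further restrictions not modeled here.

```python
def check_dq_map(dq_map: list, lane_bits: int = 8) -> list:
    """Return a list of problems found in a SoC-to-DRAM DQ bit mapping."""
    errors = []
    # Every DRAM bit must be used exactly once.
    if sorted(dq_map) != list(range(len(dq_map))):
        errors.append("mapping is not a one-to-one permutation of DQ bits")
    # All bits of one SoC byte lane must land in a single DRAM byte lane.
    for lane_start in range(0, len(dq_map), lane_bits):
        lane = dq_map[lane_start:lane_start + lane_bits]
        dram_lanes = sorted({bit // lane_bits for bit in lane})
        if len(dram_lanes) != 1:
            errors.append(
                f"SoC byte lane {lane_start // lane_bits} spans DRAM byte lanes {dram_lanes}"
            )
    return errors


if __name__ == "__main__":
    swizzled = [3, 1, 0, 2, 7, 6, 5, 4] + list(range(8, 16))  # bits moved within lane 0
    print(check_dq_map(swizzled))
    crossed = list(range(16))
    crossed[0], crossed[8] = crossed[8], crossed[0]           # bit crosses lanes
    print(check_dq_map(crossed))
```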
RPA – Initialization Scripts • Two file formats, simply copy-and-paste into text document: − [DCD CFG file] for SCFW (to support SCFW porting) – save as .cfg − [DDR Stress Test Script] for use with DDR stress test – save as .ds • Strongly recommend not manually editing these tabs − Make changes only to Register Configuration and BoardDataBusConfig tabs • Yellow cells are affected by changes on the Register Configuration and BoardDataBusConfig tabs [Figures: DCD CFG file example; DDR Stress Test Script example] PUBLIC 56
i.MX 8QM/QX DDR Stress Test Tool – Overview • Supports i.MX 8QM/QX • Board hardware requirement − USB OTG port for Serial download mode − Debug AP UART port* − Highly recommend SCU UART port • Requires functional SCFW • Use RPA to generate stress test script * Note, for Win10, may require manually installing COM port driver (FTDI, SiLabs,…) [Figures: DDR Stress Test folder structure; DDR Stress Test GUI] PUBLIC 57
i.MX 8QM/QX DDR Stress Test Tool – High Level Steps • User must first ensure working SCFW • Create a new DDR script with the RPA tool − Based on DDR device and board hardware design • Power on i.MX 8QM/QX board in serial download mode − USB OTG and AP UART ports connected correctly − Highly recommend SCU UART port connection to serial terminal • Load DDR script and download i.MX8 QM/QX binaries to target board • If DDR Stress Test passes, use RPA DCD CFG file to create *.cfg file for SCFW − Rebuild SCFW with updated *.cfg and proceed with u-boot/OS porting − Recommend running OS stress test (e.g. memtester) PUBLIC 58
i.MX 8QM/QX DDR Stress Test Tool – SCFW • User must first port SCFW to customer board (ensure SCFW is up and running) • Then build the SCFW for the DDR Stress Test: make qx R=B0 DDR_CON=ddr_stress_test_parser − SCFW will run a special “parser” instead of running DDR init − DDR Stress Test loads DDR initialization to OCRAM then “parser” executes init − Copy and rename scfw_tcm.bin to DDR Stress Test bin folder as follows: ▪ QM: mx8qm_scfw_download.bin ▪ QX: mx8qx_scfw_download.bin • SCU UART port connection to serial terminal − Ensures SCFW is up-and-running PUBLIC 59
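The copy-and-rename step above is easy to get wrong when switching between QM and QX boards, so a small helper can be worth having. The Python sketch below only encodes the two file names given on the slide; the directory arguments are placeholders, since the actual SCFW build output and stress test folder locations are not stated here.

```python
import shutil
from pathlib import Path

# Target names from the slide, keyed by SoC family.
DOWNLOAD_NAMES = {
    "QM": "mx8qm_scfw_download.bin",
    "QX": "mx8qx_scfw_download.bin",
}


def stage_scfw(scfw_build_dir, stress_test_bin_dir, soc: str) -> Path:
    """Copy scfw_tcm.bin into the stress test bin folder under the expected name.

    Both directory paths are supplied by the caller (hypothetical locations).
    """
    key = soc.upper()
    if key not in DOWNLOAD_NAMES:
        raise ValueError(f"unsupported SoC: {soc!r} (expected QM or QX)")
    source = Path(scfw_build_dir) / "scfw_tcm.bin"
    target = Path(stress_test_bin_dir) / DOWNLOAD_NAMES[key]
    shutil.copyfile(source, target)
    return target
```

A call such as `stage_scfw(build_dir, bin_dir, "QX")` would then place `mx8qx_scfw_download.bin` in the chosen bin folder.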
i.MX 8QM/QX DDR Stress Test Tool – How to Run 1. Select the correct COM port number for the AP UART, then hit Connect 2. Select the desired DDR initialization script and SoC 3. When AP UART, DDR script, and SoC are selected, hit Download 4. Select operational features 5. Select freq range for test or leave as 0 for testing at target freq 6. Hit Stress test to start running [GUI callouts: double check DDR parameters and ensure they match what’s on the board; DDR data training status] PUBLIC 60
DDR Stress Test Fails to Run – Common Causes • DDR Stress Test should run even when data training error occurs • However, in some corner cases, the DDR Stress Test may fail to run • Make sure board is in serial download mode and USB OTG is connected Example of successful SCFW execution • If all you see is this, first make sure the SCFW is properly running (check SCFW UART port) • Make sure to build the SCFW for the DDR Stress Test • If SCFW hangs during DDR init, make sure you are selecting the correct *.ds file (in other words, don’t select a QM *.ds file when using QX) • If SCFW is successful and DDR init has completed, then check to make sure you are connected to the correct COM port for the AP UART PUBLIC 61
i.MX 8QM/QX DDR Stress Test Versus Memtester • Once DDR stress test passes with ample margin, are we guaranteed the OS will never fail due to DDR issues? − High degree of confidence DDR robust enough, but… − OS is still the most stressful, particularly an OS stress test like memtester or u-boot decompressing the Linux kernel − Recommend to run any OS stress tests to double check PUBLIC 62
i.MX 8X MEK Connection for DDR Stress Test [Board photo callouts] • USB-to-UART serial connection (debug UART port) • USB OTG Type C (direct connection to PC, do not connect through USB hub) PUBLIC 63
i.MX 8QM/QX RPA and DDR Stress Test Tools • As the i.MX 8QM and QXP family are not released yet, please contact your local NXP FAE for RPA and DDR Stress test tool. • Eventually this will be posted to Community PUBLIC 64
Debugging DDR Failures PUBLIC 65
Potential Causes of DDR Failures • DDR Data training (during DDR init) achieves best possible timing and vref parameters for optimal performance − If failures occur, they are more likely to occur early on in data training − If failures do occur in data training, here are some suggestions ▪ First, re-check RPA tool, ensure correct/accurate DDR parameters/configuration ▪ For errors like DQS2DQ (LP4) and WLERR (write leveling) training – ensure RPA BoardDataBusConfig is accurate ▪ Other errors (less likely) – try adjusting drive strength and ODT parameters ▪ Other reasons: poor board layout or manufacturing issues; bad memory device − Data training results reported by the DDR stress test • Post training DDR failures – unlikely but here are some possible reasons − Ensure row, col, chip select, and data bus size are correct (failures would occur consistently when passing certain memory boundaries) − Power supply noise or spikes – refer to HW Developers Guide for board design techniques (cap placements, power supply design, etc) PUBLIC 66
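A quick cross-check for the row, column, and bus size settings is to compute the memory size they imply and compare it with the device data sheet. The Python sketch below uses a generic DDR3-style addressing model (rows, columns, banks, bus width, ranks); it is an illustration under that assumption, and LPDDR4's per-channel organization would need its own accounting.

```python
def dram_size_bytes(row_bits: int, col_bits: int, bank_bits: int,
                    bus_width_bits: int, ranks: int = 1) -> int:
    """Total capacity implied by the addressing parameters (generic model)."""
    locations = 1 << (row_bits + col_bits + bank_bits)  # addressable columns
    return locations * (bus_width_bits // 8) * ranks


if __name__ == "__main__":
    MIB = 1024 * 1024
    # Example: 15 row bits, 10 column bits, 8 banks, x16 bus (a 4Gb x16 DDR3 device).
    print(dram_size_bytes(15, 10, 3, 16) // MIB, "MiB")
    # One row bit too few halves the size the controller believes it has.
    print(dram_size_bytes(14, 10, 3, 16) // MIB, "MiB")
```

A mismatch of one address bit shows up as a factor-of-two size error, which matches the symptom of failures that appear consistently when crossing a particular memory boundary.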
Debugging DDR Failures Flow Chart 1. Run DDR initialization and data training (RPA). Data training pass? Y: DDR good to go. N: go to step 2. 2. Re-check DDR initialization and “BoardDataBusConfig” to account for bit/byte swizzling. Data training pass? Y: DDR good to go. N: go to step 3. 3. Adjust drive strengths and ODT. Data training pass? Y: DDR good to go. N: likely board layout/manufacturing/power-supply-design issue or bad DDR. PUBLIC 67
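The flow chart can also be written as a short decision procedure. The Python sketch below walks the same three checks in order; `training_passes` is a hypothetical callback standing in for "run data training and look at the result", and the strings are taken from the flow chart boxes.

```python
from typing import Callable

REMEDIES = [
    "Re-check DDR initialization and BoardDataBusConfig (bit/byte swizzling)",
    "Adjust drive strengths and ODT",
]
FINAL_DIAGNOSIS = (
    "Likely board layout, manufacturing, or power-supply-design issue, or bad DDR"
)


def debug_flow(training_passes: Callable[[int], bool]) -> list:
    """Return the actions taken until training passes or remedies run out.

    training_passes(attempt) reports the training result: attempt 0 is the
    initial run, attempt n is the run after applying REMEDIES[n - 1].
    """
    steps = []
    for attempt in range(len(REMEDIES) + 1):
        if training_passes(attempt):
            steps.append("DDR good to go")
            return steps
        if attempt < len(REMEDIES):
            steps.append(REMEDIES[attempt])
    steps.append(FINAL_DIAGNOSIS)
    return steps


if __name__ == "__main__":
    # Toy case: training passes only after the configuration re-check.
    for step in debug_flow(lambda attempt: attempt >= 1):
        print(step)
```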
How to Adjust Drive Strength and ODT in RPA • Values can be adjusted in Register Configuration tab • Adjustable parameters based on DRAM type (green shaded cells) • Adjusts parameters for: − CA (command and address) bus − DQ bus • Pull-down menus list impedance options • Recommend starting with RPA defaults − Tuned by validation for best possible signal integrity for NXP validation boards − To date, we’ve not seen a need to adjust PUBLIC 68
How to Adjust Drive Strength and ODT in RPA LPDDR4 Example [Screenshot callouts] • Controls pull-up and pull-down drive strength for CA bus • Note: for CA bus (output only), ODT irrelevant • Controls pull-up and pull-down drive strength for DQ bus • ODT control for DQ bus. Note, also adjusts DRAM MR22: SOC_ODT Note: DRAM drive strength control can be found in MR3 register and ODT control can be found in the MR11 register PUBLIC 69
How to Adjust Drive Strength and ODT in RPA DDR3 Example [Screenshot callouts] • Controls pull-up and pull-down drive strength for CA bus • Note: for CA bus (output only), ODT irrelevant • Controls pull-up and pull-down drive strength for DQ bus • ODT control for DQ bus Note: DRAM drive strength and ODT control can be found in the MR1 register PUBLIC 70
www.nxp.com