An efficient channelization architecture and its implementation for radio astronomy


Journal of Instrumentation

PAPER • OPEN ACCESS

An efficient channelization architecture and its implementation for radio
astronomy
To cite this article: W. Liu et al 2021 JINST 16 P08047


Published by IOP Publishing for Sissa Medialab
Received: July 4, 2021
Accepted: July 19, 2021
Published: August 16, 2021

W. Liu, Q. Meng,∗ C. Wang, C. Zhou, S. Yao and I. Tariq
School of Information Science and Engineering, Southeast University,
Nanjing, Jiangsu, China
National Astronomical Observatories, Chinese Academy of Sciences,
Beijing 100012, China

E-mail: mengqiao@seu.edu.cn

Abstract: Channelization is one of the most important parts of a Digital Back-End (DBE) for
radio astronomy. A DBE with wider bandwidth and higher resolution consumes a larger amount
of computing and memory resources, which results in much higher hardware cost. This paper
presents an efficient channelization architecture, which consists of a Bit-Inverted, Parallel Complex
Fast Fourier Transform (BIPC-FFT) and an In-Place Forward-Backward Decomposition (IPFBD). The
architecture saves a substantial amount of resources, so that a wide-band, high-resolution
DBE can be implemented on a resource-restricted platform. Based on this architecture,
we designed a Dual-Input, 64K-Channelized prototype DBE with 1.2 GHz bandwidth
on a Xilinx Virtex-6 LX240T Field Programmable Gate Array (FPGA) chip. The test results in
the lab and the observation results at Yunnan Observatory demonstrate that the DBE can be used
for radio astronomy.

Keywords: Instrument optimisation; Spectrometers

∗ Corresponding author.

© 2021 The Author(s). Published by IOP Publishing Ltd on behalf of
Sissa Medialab. Original content from this work may be used under the
terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this
work must maintain attribution to the author(s) and the title of the work, journal citation
and DOI.
https://doi.org/10.1088/1748-0221/16/08/P08047

Contents

1 Introduction

2 Efficient channelization architecture and its 64K-channelized implementation
2.1 Hardware platform
2.2 Efficient channelization architecture and implementation
2.2.1 Bit-inverted, parallel complex FFT (BIPC-FFT)
2.2.2 In-Place Forward-Backward Decomposition (IPFBD)
2.2.3 Total resource consumption for the digital back-end

3 Results
3.1 Test in the lab
3.2 Observation at Yunnan Observatory

4 Conclusion

1 Introduction

A Digital Back-End with wide bandwidth and high resolution is vital in radio astronomy, for example
in pulsar observation and spectral line studies. Because of the great distance between pulsars and
the Earth, the received pulsar signal is weak, so the receivers and the digital back-end must be highly
sensitive, which drives the processing bandwidth wider. For example, the bandwidth
of the ATA (Allen Telescope Array) is 209 MHz [1]. The bandwidth of the HartRAO is 400 MHz [2].
The bandwidth of the SKA (Square Kilometer Array) is up to 1 GHz [3]. The bandwidth of the Parkes
is 2.8 GHz [4]. The bandwidth of the EHT (Event Horizon Telescope) has reached 4 GHz [5].
Spectral line studies operate at sky frequencies from gigahertz to terahertz, so a wide
processing bandwidth is also necessary. GREAT (German REceiver for Astronomy at Terahertz
frequencies) is a modular dual-color heterodyne instrument for high-resolution far-infrared (FIR)
spectroscopy [6], and its IF bandwidth is up to 2.5 GHz. The new APEX (Atacama Pathfinder
EXperiment) telescope operates from several 100 GHz up to 1.5 THz, which allows for studies of
broader lines even at the highest frequencies [7].
Spectral resolution has a large impact on both pulsar observation and spectral line studies. Because
of the interstellar medium, the different frequencies of a wide-band pulsar signal arrive at
the Earth with different time delays, an effect called dispersion. High spectral resolution helps
to remove the dispersion effect, a procedure called incoherent dedispersion [8]. Higher resolution is
always required in spectral line research; a processing bandwidth of up to several GHz with a
few thousand spectral channels is therefore required to resolve narrower spectral lines [9–11].
One approach to higher spectral resolution is to use a larger number of spectral
channels. The Berkeley-Parkes-Swinburne Recorder (BPSR) system, based on Berkeley
CASPER’s IBOB, has a maximum channel number of 1024 [12]. The Green Bank Ultimate Pulsar
Processing Instrument (GUPPI) system, also based on Berkeley CASPER’s IBOB, has a maximum channel
number of 4096 [13]. The Pulsar Digital Filter Bank (PDFB) developed by Australia’s CSIRO has
a maximum channel number of 8192 [14]. The eXtended bandwidth FFT Spectrometer (XFFTS)
system developed by the Max Planck Lab in Germany is based on a Xilinx Virtex-6 LX240T FPGA chip,
and its maximum number of frequency channels is 32768 [9]. Another approach is to perform coarse
channelization in the FPGA, and then obtain higher spectral resolution in subsequent processing. For
example, rather than attempting to achieve the desired channel resolution by performing a larger
FFT on the FPGA, the SArdinia Roach2-based Digital Architecture for Radio Astronomy (SARDARA),
based on the ROACH2¹ platform, uses a Xilinx Virtex-6 SX475T FPGA as the signal processing
core, dividing the broadband signal into 16384 frequency channels [15]. This data is then sent over
Ethernet to a GPU server, which performs the subsequent fine channel splitting [16–18]. With the
processing capacity of a GPU, the number of spectral channels can reach millions, but
the GPU increases equipment cost and power consumption. The bandwidth and spectral channels of
these digital back-ends are shown in table 1.

Table 1. Frequency channels of some well-known digital back-ends.

Platform   FPGA model                    Number of inputs   BW (MHz)   Frequency channels   Frequency resolution (kHz)
SARDARA    Xilinx Virtex-6 SX475T        2                  2500       16384                152.5
XFFTS      Xilinx Virtex-6 LX240T        1                  2500       32768                76.2
GUPPI      Xilinx Virtex-II Pro 2VP50    2                  1000       4096                 244.1
PDFB       Virtex-4 SX55                 2×2                1000       8192                 122
BPSR       Xilinx Virtex-II Pro 2VP50    2                  400        1024                 390.6

The key step of channelization is to perform an FFT on the digitized input signal. However,
the hardware resources on the FPGA chip, especially the Block RAM (BRAM) resources, limit the
number of spectral channels. Among the digital back-ends discussed so far, the Xilinx Virtex-6 SX475T
FPGA on the ROACH2 platform has the most hardware resources, in particular the largest amount of
BRAM at 38,304 Kb, and the maximum number of channels it supports is 16384
for dual analog input signals. XFFTS is another powerful spectrometer, with a maximum
number of spectral channels of 32768. Many current FPGA chips have more hardware
resources and can support a larger number of spectral channels, but such powerful FPGA chips
result in higher hardware costs.
In this paper, an efficient channelization architecture is presented, which helps to obtain more
spectral channels with limited hardware resources. With this architecture, a dual-input,
64K-Channelized prototype digital back-end is implemented on a Xilinx Virtex-6 LX240T FPGA
chip with a spectral resolution of 18.3 kHz over a 1.2 GHz bandwidth.

2 Efficient channelization architecture and its 64K-channelized implementation

Table 1 shows that the majority of digital back-ends are built for dual analog inputs. The reason for
this is that signals received by the antenna are usually split into right-polarized and left-polarized
input components, and both components are sometimes required for radio astronomy
research, such as the determination of Stokes parameters in pulsar observation. The
digital back-end should therefore be able to process these two signals simultaneously.

¹ https://github.com/casper-astro/casper-hardware.

2.1 Hardware platform

The digital back-end implemented in this paper is based on our Cascaded Reconfigurable Architecture
Board (CRABoard) [19], which consists of three parts. The first part is an ADC (Analog-to-Digital
Converter) sampling board based on the EV8AQ160, capable of dual-channel input with a 2.4 GSps
sampling rate and 8-bit resolution. The second is an FPGA signal processing board based on a Xilinx
Virtex-6 LX240T FPGA chip. The last is an ARM-based control board, which is used for
initialization and control. The system block diagram is shown in figure 1. A photo of the digital
back-end is shown in figure 2.

 Figure 1. Block diagram of the Digital Back-End system.

The CASPER toolflow² is widely used; it allows researchers to generate signal processing
designs using Simulink, MATLAB’s graphical programming tool. At the beginning of the design,
we used the CASPER toolflow (for ROACH2) to implement several spectrometer designs in order to
estimate the resource consumption; the results are shown in table 2.

Table 2. The resource consumption for different numbers of spectral channels (one input channel).

Channels   Occupied Slices   LUT Flip Flop pairs used   RAMB18E1
16384      7837              27485                      535
32768      9806              33706                      1071
65536      -                 -                          Exceeded

² https://casper-toolflow.readthedocs.io/en/latest/.

Figure 2. The photo of the Digital Back-End.

From the table, we can see that the resources are not sufficient for a one-input, 64K-Channelized
design on ROACH2, so they would also be exceeded on our hardware platform, which is
based on a Xilinx Virtex-6 LX240T chip. The channelization implementation consumes most of
the resources, so we have to use a more efficient channelization architecture.

2.2 Efficient channelization architecture and implementation

The signal processing diagram of the entire digital back-end is shown in figure 3. It consists
of five units: the Data Interface Unit, the Power Calculating Unit, the Accumulation Unit, the Ethernet
Unit and the 64K-Channelized Unit for Dual Input Channels. As the two analog input signals are sampled
at 2.4 GSps, it is hard to process the high-speed data stream directly. The Data Interface Unit is
used to divide each data stream into eight separate data streams at 300 MSps. The Power Calculating
Unit is used to calculate the input power of the signals. The Accumulation Unit is used for power
accumulation. The Ethernet Unit is used for transferring raw channelized data from the FPGA. The
64K-Channelized Unit is based on the efficient channelization architecture and is the core of the
whole system. The efficient channelization architecture is shown in figure 4, and we introduce
its key parts in section 2.2.1 and section 2.2.2.

2.2.1 Bit-inverted, parallel complex FFT(BIPC-FFT)
As previously stated, the digital back-end should process both left- and right-polarized input signals
at the same time. In the traditional technique, two separate FFT modules are used. Since each input
signal is real, the input data is assigned only to the real part of the FFT input. When we
try to perform 64K channelization, this method requires more BRAM resources than the FPGA
can provide. We propose the Bit-Inverted, Parallel Complex FFT, which treats the left-polarized and
right-polarized signals as the real and imaginary components of a complex signal to save hardware
resources. After the FFT is completed, a decomposition module is used to decompose
the FFT output into the spectra of the real and imaginary parts of the FFT input, which correspond
to the left-polarized and right-polarized signals. By using the complex FFT, only a single FFT module
is needed to complete the 64K channelization, which saves half of the resources required by the FFT
module.

 Figure 3. Signal processing diagram of the 64K-Channelized digital back-end.

 Figure 4. The 64K-Channelized module based on the efficient channelization architecture.

 Suppose $x(n)$ and $y(n)$ are N-point real data streams, which correspond to the right-
polarized and left-polarized data, and $X(k)$ and $Y(k)$ are their DFTs:

\[
X(k) = \mathrm{DFT}(x(n)), \qquad Y(k) = \mathrm{DFT}(y(n))
\]

Let
\[
z(n) = x(n) + j\,y(n) \tag{2.1}
\]
Then the DFT will be
\[
\begin{aligned}
Z(k) &= \mathrm{DFT}(z(n)) \\
     &= \mathrm{DFT}(x(n)) + j\,\mathrm{DFT}(y(n)) \\
     &= X(k) + j\,Y(k)
\end{aligned} \tag{2.2}
\]
Due to the conjugate symmetry of $X(k)$ and $Y(k)$, we will get
\[
\begin{cases}
X(k) = \dfrac{1}{2}\left(Z(k) + Z^{*}(N-k)\right) \\[2ex]
Y(k) = -\dfrac{j}{2}\left(Z(k) - Z^{*}(N-k)\right)
\end{cases} \tag{2.3}
\]
Therefore, $X(k)$ and $Y(k)$ can be calculated with only one FFT module [20].
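The decomposition in equation (2.3) can be checked numerically. The following is a minimal software sketch, assuming NumPy is available; the function name `split_two_real_ffts` and the 64-point test length are illustrative choices and are not taken from the FPGA design.

```python
import numpy as np

def split_two_real_ffts(z_spectrum):
    """Recover X(k) and Y(k) from Z(k) = DFT(x + j*y), following equation (2.3)."""
    n = len(z_spectrum)
    k = np.arange(n)
    # Z*(N-k); the index is taken modulo N so that Z(N) is read as Z(0)
    z_mirror_conj = np.conj(z_spectrum[(n - k) % n])
    x_spectrum = 0.5 * (z_spectrum + z_mirror_conj)
    y_spectrum = -0.5j * (z_spectrum - z_mirror_conj)
    return x_spectrum, y_spectrum

# Two independent real test signals standing in for the two polarizations
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = rng.standard_normal(64)

# One complex FFT instead of two real-input FFTs
X, Y = split_two_real_ffts(np.fft.fft(x + 1j * y))
print(np.allclose(X, np.fft.fft(x)), np.allclose(Y, np.fft.fft(y)))
```

The sketch computes all N output bins for clarity; as the text notes for real inputs, only the first half of each spectrum is actually needed.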
 Because the data stream has to be processed in real time, a pipelined FFT IP core that can
process the data as a stream must be used. However, because the sampling rate of the two input
signals in this architecture is 2.4 GSps, the required FPGA clock rate would exceed the fabric
capability if the data were processed as a single stream on the FPGA. The high-speed data stream
must be divided into several parallel data streams before it enters the logic part of the FPGA. As
the high-speed data stream has already been divided into eight sub data streams by the Data Interface
Unit, the eight parallel data streams are well suited to the 64K-Channelized Unit. To complete the
real-time, high-speed FFT computation, a parallel processing method must be applied. The block
diagram of the BIPC-FFT is shown in figure 5. Eight pipelined FFT cores are used to perform eight
16K-point FFTs in parallel, and then one Radix-2 FFT module and two Radix-4 FFT modules are
implemented to finish the whole FFT computation [19, 21, 22].

Figure 5. Block diagram of the PC-FFT
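The idea of splitting one large FFT into eight parallel sub-FFTs followed by a combining stage can be sketched in software. This is a generic decimation-in-time sketch, assuming NumPy and assuming that lane r carries every eighth sample starting at sample r; the combining step is written as a direct sum over lanes, whereas the hardware described above factors it into Radix-2 and Radix-4 modules. The function name `parallel_fft` is hypothetical.

```python
import numpy as np

def parallel_fft(x, lanes=8):
    """N-point FFT built from `lanes` sub-FFTs of the interleaved sample lanes."""
    n = len(x)
    m = n // lanes
    # Row r of the transposed array holds x[r], x[r + lanes], x[r + 2*lanes], ...
    sub = np.fft.fft(x.reshape(m, lanes).T, axis=1)   # shape (lanes, m)
    k = np.arange(n)
    r = np.arange(lanes)[:, None]
    twiddle = np.exp(-2j * np.pi * r * k / n)          # W_N^(r*k), shape (lanes, n)
    # Each sub-FFT output is periodic in k with period m
    return np.sum(twiddle * sub[:, k % m], axis=0)

rng = np.random.default_rng(0)
samples = rng.standard_normal(128) + 1j * rng.standard_normal(128)
print(np.allclose(parallel_fft(samples), np.fft.fft(samples)))
```

In the sketch each lane only ever sees one eighth of the samples, which is why the sub-FFTs can run at one eighth of the sampling rate.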
In figure 5, twiddle factors are needed for the FFT butterflies; they are generally stored in BRAM
as a set of sin/cos data tables. As the number of FFT points increases, the BRAM consumption
increases accordingly. In order to save BRAM resources, a CORDIC (Coordinate Rotation
Digital Computer) core, which is based on a digital coordinate-rotation algorithm, is
used to generate the twiddle factors, which reduces the consumption of BRAM resources.
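How a CORDIC rotation can replace a sin/cos table is sketched below. This is a floating-point illustration of the rotation-mode algorithm for angles in the first quadrant, not the Xilinx CORDIC core: the hardware works with fixed-point shifts and adds, and the function name, the 16 iterations, the FFT length of 131072 and the index 1000 are all assumptions made for the example.

```python
import math

def cordic_twiddle(angle, iterations=16):
    """Approximate (cos(angle), sin(angle)) for 0 <= angle <= pi/2 by CORDIC rotations."""
    # Micro-rotation angles atan(2^-i); these are the only constants that must be stored
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector, so start from the inverse of the total gain
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate towards the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]
    return x, y

# Twiddle factor W_N^k = cos(theta) - j*sin(theta), theta = 2*pi*k/N
n_points = 131072        # assumed FFT length (8 x 16K), used only for illustration
k = 1000                 # arbitrary index in the first quadrant
theta = 2 * math.pi * k / n_points
c, s = cordic_twiddle(theta)
twiddle = complex(c, -s)
print(twiddle)
```

The table of `iterations` arctangent constants replaces a sin/cos table whose size would otherwise grow with the number of FFT points; angles outside the first quadrant would need an additional quadrant mapping that is omitted here.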
For the realization of the 16K-point pipelined FFT module in figure 5, two kinds of Xilinx
FFT IP cores are available in Xilinx ISE. One works in sequential mode, and the other works in
Bit-Inverted mode. The BRAM consumption of each IP core is shown in table 3. The selection of the
FFT IP core also affects the design of the decomposition module in figure 4. When a sequential FFT IP
core is selected, the decomposition is easier to realize, but the FFT core consumes a large
amount of BRAM. If a Bit-Inverted FFT IP core is selected, the amount of BRAM is
reduced, but the decomposition becomes more complex due to the inverted order of the data. The
Virtex-6 LX240T FPGA has a total of only 832 18 Kb-BRAM blocks. In order to save BRAM resources,
the Bit-Inverted FFT IP core is used. However, the difficulties of the decomposition process in
Bit-Inverted mode must be carefully resolved, as discussed in section 2.2.2.


Table 3. BRAM consumption for FFT (the input data width is 8 bits).

 18 Kb BRAM Consumption
 16K Pipeline FFT with Sequential output 68 × 8
 16K Pipeline FFT with Bit-Inverted output 27 × 8

 Figure 6. The order of the 8-channel output data of the FFT module: (a) Sequential Order Mode; (b) Bit-Inverted Order Mode.

2.2.2 In-Place Forward-Backward Decomposition (IPFBD)

In the design of the decomposition part, the Bit-Inverted order of the data coming from the FFT
module must be dealt with first. Taking a 32-point FFT as an example, the input data is divided into
eight streams at 300 MSps for eight sub-FFT calculations, and the output orders of the sequential FFT
and the Bit-Inverted FFT are shown in figure 6(a) and figure 6(b) respectively. In Bit-Inverted
order mode, there are eight output data streams: SUB_FFT_STREAM0∼SUB_FFT_STREAM7.

 Figure 7. Upper/lower streams and the pairing method.

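In the $p$-th stream at time $t_q$, the corresponding index $k$ of $Z(k)$ will be:
\[
k = p \times \frac{N}{8} + \mathrm{BitInvert}(q, M), \qquad
M = \log_2\frac{N}{8},\quad p = 0 \sim 7,\quad q = 0 \sim \frac{N}{8} - 1 \tag{2.4}
\]
where $\mathrm{BitInvert}(q, M)$ refers to an $M$-bit Bit-Inverted function applied to $q$. Because
$x(n)$ and $y(n)$ are real signals, we only need to calculate the first half of $X(k)$ and $Y(k)$,
where $k = 0 \sim (N/2 - 1)$.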
 From equation (2.3), we can see that in order to calculate $X(k)$ and $Y(k)$ for a given $k$, only
one pair of data is needed: $Z(k)$ and $Z(N-k)$. Therefore, we can split the streams in figure 6(b)
into two groups: SUB_FFT_STREAM0∼3 are referred to as the upper-streams and SUB_FFT_STREAM4∼7
as the lower-streams.
 The upper/lower streams and the pairing method are shown in figure 7. Two things can be
seen from figure 7. The first is that for a given $k$ ($0 \sim N/2 - 1$), $Z(k)$ and $Z(N-k)$ appear in
the upper-streams and lower-streams respectively. The second is that the required $Z(k)$ and
$Z(N-k)$ sometimes appear at the same time, but sometimes do not. For example, in order to calculate
$X(2)$ and $Y(2)$, we need $Z(2)$ and $Z(30)$, which both appear at $t_2$. However, for calculating
$X(1)$ and $Y(1)$, we need $Z(1)$ and $Z(31)$, which appear at $t_3$ and $t_4$ respectively. Since the
PC-FFT module is a real-time arithmetic module, a buffer is necessary to cache data between the
output of the PC-FFT and the input of the decomposition module.
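The index mapping of equation (2.4) and the pairing problem can be reproduced with a few lines of code. The sketch below assumes $M = \log_2(N/8)$ and counts the time step $q$ from zero, so the steps called $t_1$ to $t_4$ in the 32-point example correspond to $q = 0$ to $3$; the function names are illustrative.

```python
def bit_invert(q, m):
    """Reverse the m least-significant bits of q."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (q & 1)
        q >>= 1
    return r

def output_index(p, q, n, lanes=8):
    """Index k of the FFT bin delivered by stream p at zero-based time step q (equation 2.4)."""
    per_lane = n // lanes
    m = per_lane.bit_length() - 1          # M = log2(N/8), assuming N/8 is a power of two
    return p * per_lane + bit_invert(q, m)

n = 32
# Map every bin index k to the (stream, time step) where it appears
location = {output_index(p, q, n): (p, q) for p in range(8) for q in range(n // 8)}

for k in (2, 1):
    print(f"Z({k}) at {location[k]}, Z({n - k}) at {location[n - k]}")
```

For k = 2 both bins of the pair share the same time step, while for k = 1 they arrive one step apart, which is the situation that makes buffering necessary.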
 Traditionally, a Ping-Pong buffering method would be used in this situation. It consists of
two identical buffers. At any time, one buffer caches the current data, while the other buffer,
which contains the previous data of the streams, feeds the decomposition module. The method is
relatively simple, but its disadvantage is that it requires more BRAM for the buffers. In this design,
256 18 Kb-BRAM blocks would be needed for Ping-Pong buffering of the data stream (18-bit data
width), which is a large consumption of BRAM resources. Therefore, an In-Place buffering method
with a single buffer is required.
 The IPFBD method is implemented to save BRAM resources. When decomposing $X(k)$ and
$Y(k)$, $Z(k)$ and $Z(N-k)$ are read out from the buffer. New output data from the PC-FFT module
can be written to the same addresses, taking the place of $Z(k)$ and $Z(N-k)$ in the same buffer at the
same time. Therefore, only one buffer is required in IPFBD. Compared to the Ping-Pong buffering
method, half of the buffer resources are saved.
 Again taking a 32-point FFT as an example, suppose that in the previous step $Z(k)$ was stored in
the buffer in the same order as shown in figure 6(b); the storage map at this time, $t_0$, is
illustrated in figure 8(a). The first round of decomposition then proceeds as follows:

(1) In the first step, $t_1$, the first column of the streams is used to decompose $X(k)$, $Y(k)$
 where $k = 0, 4, 8, 12$. According to equation (2.3), $Z(0)$, $Z(4)$, $Z(8)$, $Z(12)$, $Z(16)$, $Z(20)$,
 $Z(24)$, $Z(28)$ are required. After the decomposition, new data can be written to the first column,
 as in figure 8(b).

(2) In the second step, $t_2$, the second column is processed in the same way as at
 $t_1$, which is illustrated in figure 8(c).

(3) In the third step, $t_3$, things happen differently. The required data of the upper-streams
 is in the third column, but for the lower-streams it is in the fourth column. Therefore, the
 new data must be written to the third and fourth columns respectively, as shown in figure 8(d).

(4) In the fourth step, $t_4$, the fourth column of the upper-streams is processed in a similar way
 to step 3, which is illustrated in figure 8(e). This completes the decomposition of this round, and
 the new data has been written to the buffer.

 From the above procedure, we can see that the column address of the lower-streams is a little more
complex. Sometimes it goes forward, sometimes backward. It can be calculated according to
equations (2.3) and (2.4).
 After the first round of decomposition, the storage of $Z(k)$ is as illustrated in figure 8(e).
By studying the arrangement of $Z(k)$ at this time, we find that all the necessary data
of the upper-streams and lower-streams happens to be in the same columns. This is convenient
for the next round of decomposition, because the read/write order of the column addresses can then
be entirely in forward mode. After the second round of decomposition, the new $Z(k)$ is
arranged as in figure 8(f), which is the same kind of order as in figure 8(a). The third round of
decomposition is then the same as the first round.
 There are two kinds of addressing modes for the lower-streams in the decomposition procedure:
the first is the one described above for the first round of decomposition; the second is sequential.
These two addressing modes are used alternately throughout to construct the In-Place
buffering algorithm.
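The alternation between the two addressing modes can be checked with a small simulation. The sketch below tracks only (frame, index) tags instead of sample values, reads one column per step, verifies that every upper-stream bin finds its partner $Z(N-k)$ among the values read, and then overwrites the same addresses with the next frame. The closed-form expression used for the lower-stream address is our own derivation from equations (2.3) and (2.4) for the 8-stream layout and is an assumption, not a formula quoted from the design.

```python
def bit_invert(q, m):
    """Reverse the m least-significant bits of q."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (q & 1)
        q >>= 1
    return r

def simulate_in_place(n=32, lanes=8, rounds=4):
    """Simulate in-place buffering; return the lower-stream column order used in each round."""
    cols = n // lanes
    m = cols.bit_length() - 1
    half = lanes // 2
    # buf[p][c] holds a (frame, k) tag; frame 0 is stored in arrival order (figure 6(b) style)
    buf = [[(0, p * cols + bit_invert(c, m)) for c in range(cols)] for p in range(lanes)]
    address_log = []
    for rnd in range(1, rounds + 1):
        if rnd % 2 == 1:   # forward-backward mode (assumed closed form)
            lower_addr = [bit_invert((cols - bit_invert(q, m)) % cols, m) for q in range(cols)]
        else:              # sequential mode
            lower_addr = list(range(cols))
        address_log.append(lower_addr)
        for q in range(cols):
            upper = [buf[p][q] for p in range(half)]
            lower = [buf[p][lower_addr[q]] for p in range(half, lanes)]
            # Everything read in this step must belong to the previous frame ...
            assert {frame for frame, _ in upper + lower} == {rnd - 1}
            # ... and contain the partner Z(N-k) of every upper-stream bin Z(k)
            available = {k for _, k in upper + lower}
            assert all((n - k) % n in available for _, k in upper)
            # Overwrite the addresses just read with the newly arriving frame
            for p in range(lanes):
                addr = q if p < half else lower_addr[q]
                buf[p][addr] = (rnd, p * cols + bit_invert(q, m))
    return address_log

for rnd, addresses in enumerate(simulate_in_place(), start=1):
    print(f"round {rnd}: lower-stream columns {addresses}")
```

For the 32-point example the odd rounds visit the lower-stream columns in the order first, second, fourth, third, and the even rounds visit them sequentially, matching the forward and backward behaviour described in steps (1) to (4).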
 In Xilinx ISE, a True Dual-Port RAM (TDP-RAM) operating in Read-First mode can be used for the
In-Place method. In Read-First mode,³ data previously stored at the write address appears on the
data output port while the input data is being stored in the memory [23]. Therefore, the IPFBD
algorithm can be implemented for pipelined decomposition with only one TDP-RAM, which saves
half of the buffer BRAM resources. A comparison between the two buffering methods is shown in table 4.
 In summary, the total BRAM consumption of the 64K-Channelized core is 344 18 Kb-BRAM
blocks, which is 41.3% of the 18 Kb-BRAM blocks on the Xilinx Virtex-6 LX240T FPGA chip.
 ³ https://www.xilinx.com/support/documentation/ip_documentation/blk_mem_gen/v7_3/pg058-blk-mem-gen.pdf.

Figure 8. Data stream updating in the buffer. $Z_i(k)$ means the $Z(k)$ in the $i$-th round of calculation.

 Table 4. BRAM consumption for the Ping-Pong method and IPFBD (the output bit width is 18 bits).

 18 Kb BRAM Consumption
 Ping-Pong Method 16 × 8 × 2
 IPFBD Algorithm 16 × 8


Table 5. Summary of BRAM consumption for the 64K-Channelized core.

 18 Kb BRAM Consumption
 PC-FFT 27 × 8
 IPFBD 16 × 8
 Total 344
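The BRAM figures quoted in tables 3, 4 and 5 can be cross-checked with simple arithmetic. The short script below only restates numbers given in the text (832 available 18 Kb blocks, and the per-stream block counts of the tables); the variable names are illustrative.

```python
TOTAL_BRAM18 = 832                 # 18 Kb BRAM blocks on the Virtex-6 LX240T

fft_sequential = 68 * 8            # table 3, sequential-output FFT cores
fft_bit_inverted = 27 * 8          # table 3, Bit-Inverted-output FFT cores
ping_pong_buffer = 16 * 8 * 2      # table 4, two identical buffers
ipfbd_buffer = 16 * 8              # table 4, single in-place buffer

core_total = fft_bit_inverted + ipfbd_buffer
baseline_total = fft_sequential + ping_pong_buffer

print(f"efficient core : {core_total} blocks, {100 * core_total / TOTAL_BRAM18:.1f}% of the chip")
print(f"baseline core  : {baseline_total} blocks, {100 * baseline_total / TOTAL_BRAM18:.1f}% of the chip")
```

The baseline line combines the sequential FFT cores with Ping-Pong buffering for a single complex FFT; it is our own comparison point and not a configuration reported in the tables.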

2.2.3 Total resource consumption for the digital back-end
With PC-FFT and IPFBD, we implemented a DBE. It consists of all the necessary modules, such as the
ADC input, Dual-Input 64K-Channelized, power accumulation, Ethernet and state monitoring modules.
The total resource consumption is given in table 6. The total BRAM consumption
is 92% on the Xilinx Virtex-6 LX240T (768 of the 832 RAMB18E1 blocks). We can therefore see that
the reduced BRAM usage due to the new methodology allows for a doubling of the spectral resolution.
Without these changes, the usage at this resolution would have exceeded the FPGA resources.

 Table 6. Total resource consumption.

 Slice Logic Utilization               Used     Utilization
 Number of Occupied Slices             18250    48%
 Number of LUT Flip Flop pairs used    60209    20%
 Number of RAMB18E1                    768      92%

3 Results

With the efficient channelization architecture, we implemented a 64K-Channelized digital back-end
with high frequency resolution (18.3 kHz) and wide bandwidth (1.2 GHz) based on a Virtex-6
LX240T FPGA. We tested it in the laboratory to confirm the high frequency resolution, and then
took the back-end to Yunnan Observatory for a pulsar observation experiment.

3.1 Test in the lab

In the 64K-Channelized digital back-end, the frequency resolution Δf is 18.3 kHz, so we
generate a two-tone signal whose frequency difference is 2Δf. An ADALM-PLUTO⁴ from ADI
(Analog Devices) was used as the signal generator to produce the two-tone signal. A
photo of the ADALM-PLUTO is shown in figure 9. The two target frequencies are
453.22265625 MHz and 453.25927734 MHz.
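The channel spacing and the channels in which the two tones are expected to fall follow directly from the sampling rate and the channel count. A short check, assuming the 1.2 GHz band starts at 0 Hz and that channel i is centred at i times the resolution (both assumptions made for this example):

```python
SAMPLE_RATE_HZ = 2.4e9
N_CHANNELS = 65536

bandwidth_hz = SAMPLE_RATE_HZ / 2             # real sampling: 1.2 GHz usable bandwidth
resolution_hz = bandwidth_hz / N_CHANNELS     # channel spacing in Hz

def channel_index(freq_hz):
    """Nearest channel for a tone, assuming channel i is centred at i * resolution_hz."""
    return round(freq_hz / resolution_hz)

tones_mhz = (453.22265625, 453.25927734)
bins = [channel_index(f * 1e6) for f in tones_mhz]
print(f"resolution = {resolution_hz / 1e3:.3f} kHz, channels = {bins}")
```

The two tones land two channels apart, so a spectrometer with this resolution should show two separate peaks with one channel between them.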
 With the digital back-end, we can get the frequency spectra data from the Ethernet port on the
back-end, and the result is shown in figure 10. The test result demonstrates that the 64K-Channelized
digital back-end has high spectral resolution and it can be implemented on a resource-limited FPGA
chip, such as a Xilinx Virtex-6 LX240T FPGA chip.

 ⁴ https://www.analog.com/en/design-center/evaluation-hardware-and-software/evaluation-boards-kits/adalm-pluto.html.

 Figure 9. The photo of ADALM-PLUTO.

 Figure 10. Power spectra of the two-tone signal.

3.2 Observation at Yunnan Observatory

We also carried out a successful pulsar observation experiment at Yunnan Observatory on June 18, 2021
with the digital back-end. The pulsar we observed is B0329+54. Some information about
the pulsar, taken from the ATNF⁵ pulsar database [24], is shown in table 7.
 The observation was carried out in the S-band. The S-band receiver at Yunnan Observatory covers
2190 MHz to 2300 MHz, and the frequency of the LO (Local Oscillator) is 2000 MHz, so the valid
IF (intermediate frequency) range is 190 MHz to 300 MHz, as shown in the following figures.
 In the design, the number of frequency channels is 65536, and the frequency resolution is
18.3 kHz. We completed a 5-minute observation at Yunnan Observatory. The period of B0329+54 is
about 0.714 s, and the period-folded results are shown in figure 11. In the two figures, the x-axis
refers to phase, ranging from 0 to 1.

 ⁵ https://www.atnf.csiro.au/research/pulsar/psrcat/.


Table 7. The information about B0329+54.

 Parameters Values
 NAME B0329+54
 DM 26.7641
 PEPOCH 46473.00
 P0 0.714519699726s
 P1 2.048265E-15s/s


 Figure 11. (a) The Phase-Frequency figure of B0329+54. (b) The profile of B0329+54.

 The frequency resolution of the digital back-end is 18.3 kHz, so the number of frequency
channels in the 110 MHz observed bandwidth is
\[
\frac{110\ \mathrm{MHz}}{18.3\ \mathrm{kHz}} \approx 6011 \tag{3.1}
\]
Therefore, the power of the pulsar is distributed among 6011 channels, which makes the signal hard
to see in the Phase-Frequency figure. We summed the power of every 110 consecutive spectral channels
and obtained 55 equivalent spectral channels. The bandwidth of each equivalent spectral channel
is about 2 MHz (110 × 18.3 kHz). The processed result is shown in figure 12.
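Summing groups of adjacent channels is a simple post-processing step. The sketch below assumes NumPy and a folded data array with phase bins along the first axis and frequency channels along the last axis; the array size of 64 phase bins and the random test data are hypothetical, and the function name is illustrative.

```python
import numpy as np

def sum_channel_groups(power, group=110):
    """Sum every `group` consecutive channels along the last axis."""
    starts = np.arange(0, power.shape[-1], group)
    # reduceat sums power[..., starts[i]:starts[i+1]]; the last group runs to the end
    return np.add.reduceat(power, starts, axis=-1)

rng = np.random.default_rng(1)
power = rng.random((64, 6011))     # hypothetical: 64 phase bins x 6011 fine channels
coarse = sum_channel_groups(power)
print(coarse.shape)
```

Because 6011 is not a multiple of 110, the last equivalent channel in this sketch contains fewer fine channels than the others; how the real processing treats the band edge is not stated in the text.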
 Comparing figure 11(a) with figure 12 also demonstrates that the digital back-end has
high frequency resolution.

4 Conclusion

An efficient channelization architecture, consisting of BIPC-FFT and IPFBD, is presented in this paper.
The architecture saves a significant amount of hardware resources, allowing a
wide-bandwidth and high-resolution digital back-end to be implemented on a hardware platform
with limited resources. With this architecture, a Dual-Input, 64K-Channelized prototype
digital back-end with 18.3 kHz spectral resolution and 1.2 GHz bandwidth is implemented on a
Xilinx Virtex-6 LX240T FPGA chip, which makes full use of the FPGA resources and improves resource
efficiency. With the same architecture, a more powerful digital back-end could be implemented on a
more advanced FPGA chip.

 Figure 12. The observing bandwidth is divided into 55 spectral channels.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant
U1731120. The authors also gratefully acknowledge the helpful comments and suggestions from
the reviewers. We also thank the staff of Yunnan Astronomical Observatory for their help.

References

 [1] A. P. Siemion, D. Werthimer, G. Marcy and M. W. Leeuwen, The fly’s eye: Instrumentation for
 detection of radio ephemeron, https://casper.berkeley.edu.
 [2] S. J. Bell Burnell, Little green men, white dwarfs or pulsars?, Cosmic Search 1 (1979) 16.
 [3] F. Combes, The Square Kilometer Array: cosmology, pulsars and other physics with the SKA,
 2015 JINST 10 C09001 [arXiv:1504.00493].
 [4] L. Levin et al., The High Time Resolution Universe Pulsar Survey VIII: The Galactic millisecond
 pulsar population, Mon. Not. Roy. Astron. Soc. 434 (2013) 1387 [arXiv:1306.4190].
 [5] J.P.W. Verbiest et al., Timing stability of millisecond pulsars and prospects for gravitational-wave
 detection, Mon. Not. Roy. Astron. Soc. 400 (2009) 951 [arXiv:0908.0244].
 [6] S. Heyminck, U.U. Graf, R. Gusten, J. Stutzki, H.W. Hubers and P. Hartogh, GREAT: the SOFIA
 high-frequency heterodyne instrument, Astron. Astrophys. 542 (2012) L1.
 [7] R. Güsten, L. Nyman, P. Schilke, K. Menten, C. Cesarsky and R. Booth, The atacama pathfinder
 experiment (apex)–a new submillimeter facility for southern skies–, Astron. Astrophys. 454 (2006)
 L13.
 [8] T.H. Hankins and B.J. Rickett, Pulsar signal processing, Meth. Comput. Phys. 14 (1975) 55.


[9] B. Klein, S. Hochgurtel, I. Kramer, A. Bell, K. Meyer and R. Gusten, High-resolution wide-band Fast
 Fourier Transform spectrometers, Astron. Astrophys. 542 (2012) L3 [arXiv:1203.3972].
[10] S. Stanko, B. Klein and J. Kerp, A Field programmable gate array spectrometer for radio astronomy.
 First light at the Effelsberg 100-m telescope, Astron. Astrophys. 436 (2005) 391
 [astro-ph/0503067].
[11] B. Klein, S. D. Philipp, R. Güsten, I. Krämer and D. Samtleben, A new generation of spectrometers
 for radio astronomy: Fast fourier transform spectrometer, Proc. SPIE 6275 (2006) 627511.
[12] M.J. Keith et al., The High Time Resolution Universe Pulsar Survey I: System configuration and
 initial discoveries, Mon. Not. Roy. Astron. Soc. 409 (2010) 619 [arXiv:1006.5744].

[13] R. DuPlain, S. Ransom, P. Demorest, P. Brandt, J. Ford and A. L. Shelton, Launching guppi: the
 green bank ultimate pulsar processing instrument, Proc. SPIE 7019 (2008) 70191D.
[14] G. Hampson, A. Brown and C. Vimiera, A 1 GHz pulsar digital filter bank and RFI mitigation system,
 Australia Telescope National Facility (2008).
[15] A. Melis, R. Concu, A. Trois, A. Possenti, A. Bocchinu, P. Bolli et al., Sardinia Roach2-based Digital
 Architecture for Radio Astronomy (SARDARA), J. Astron. Instrum. 7 (2018) 1850004.
[16] J. Kocz, L. Greenhill, B. Barsdell, D. Price, G. Bernardi, S. Bourke et al., Digital signal processing
 using stream high performance computing: a 512-input broadband correlator for radio astronomy,
 J. Astron. Instrum. 4 (2015) 1550003.
[17] MITEoR collaboration, MITEoR: a scalable interferometer for precision 21 cm cosmology, Mon. Not.
 Roy. Astron. Soc. 445 (2014) 1084 [arXiv:1405.5527].
[18] J. Zwart, R. Barker, P. Biddulph, D. Bly, R. Boysen, A. Brown et al., The Arcminute Microkelvin
 Imager, Mon. Not. Roy. Astron. Soc. 391 (2008) 1545 [arXiv:0807.2469].
[19] W. Liu, Q. Meng, J.-L. Han, C. Wang, T. Zhang and X. Dong, A 1.2 ghz bandwidth digital backend
 for pulsar observation, in proceedings of Progress in Electromagnetics Research Symposium-Fall,
 Singapore, 19–22 November 2017, pp. .
[20] H.J. Nussbaumer, The fast fourier transform, in Fast Fourier Transform and Convolution Algorithms,
 Springer (1981), pp. 80–111.
[21] D. Werthimer, The CASPER collaboration for high-performance open source digital radio astronomy
 instrumentation, in proceedings of the XXXth URSI general assembly and scientific symposium,
 Istanbul, Turkey, 13–20 August 2011, pp. 1–4.
[22] X. Wang, Q. Meng, H. Jinlin, W. Chen and J. Zhang, A wideband real-time spectrometer based on
 combined complex fft for radio astronomy, in proceedings of the 9th International Symposium on
 Communication Systems, Networks & Digital Sign, Manchester, U.K., 23–25 July 2014, pp. 685–689.
[23] Xilinx, LogiCORE IP, Block memory generator.
[24] R.N. Manchester, G.B. Hobbs, A. Teoh and M. Hobbs, The Australia Telescope National Facility
 pulsar catalogue, Astron. J. 129 (2005) 1993 [astro-ph/0412641].
