FEDERATED DROPOUT LEARNING FOR HYBRID BEAMFORMING WITH SPATIAL PATH INDEX MODULATION IN MULTI-USER MMWAVE-MIMO SYSTEMS

Page created by Jeffrey Frank
 
CONTINUE READING
FEDERATED DROPOUT LEARNING FOR HYBRID BEAMFORMING WITH SPATIAL PATH INDEX
                                                                MODULATION IN MULTI-USER MMWAVE-MIMO SYSTEMS

                                                                                        Ahmet M. Elbir† , Sinem Coleri† and Kumar Vijay Mishra+

                                                                    †
                                                                        Department of Electrical and Electronics Engineering, Koç University, Istanbul, Turkey
                                                                           +
                                                                             United States CCDC Army Research Laboratory, Adelphi, MD 20783 USA
arXiv:2102.07450v1 [eess.SP] 15 Feb 2021

                                                                            ABSTRACT                                             from reduced array gain because it always uses a subarray. For a
                                           Millimeter wave multiple-input multiple-output (mmWave-MIMO)                          single-user scenario, [8] proposed beamspace-based approaches for
                                           systems with small number of radio-frequency (RF) chains have                         spatial path index modulation (SPIM), which modulates the indices
                                           limited multiplexing gain. Spatial path index modulation (SPIM)                       of the spatial paths to create different spatial patterns. The use of
                                           is helpful in improving this gain by utilizing additional signal bits                 beamspace model is further exploited in [9] by employing lens arrays
                                           modulated by the indices of spatial paths. In this paper, we introduce                at both transmitter and receiver to improve the bit-error-rate (BER).
                                           model-based and model-free frameworks for beamformer design in                             Apart from BER, spectral efficiency is utilized as a performance
                                           multi-user SPIM-MIMO systems. We first design the beamformers                         metric in [7] for SPIM-based transmitter design. Here, theoretical
                                           via model-based manifold optimization algorithm. Then, we leverage                    conditions for SPIM-MIMO to outperform mmWave-MIMO are intro-
                                           federated learning (FL) with dropout learning (DL) to train a learning                duced. The SPIM structure in [7] considers only analog beamformer
                                           model on the local dataset of users, who estimate the beamformers                     design, for which the same baseband beamformers are used even if
                                           by feeding the model with their channel data. The DL randomly                         the structure of the analog beamformer is changed due to the selec-
                                           selects different set of model parameters during training, thereby                    tion of different spatial patterns. Analog-only beamformer design
                                           further reducing the transmission overhead compared to conventional                   is also considered in [11] for uplink multi-user scenario with code-
                                           FL. Numerical experiments show that the proposed framework ex-                        book design. A joint design for analog and baseband precoders for
                                           hibits higher spectral efficiency than the state-of-the-art SPIM-MIMO                 SPIM is performed in [12] by implementing zero-forcing baseband
                                           methods and mmWave-MIMO, which relies on the strongest propa-                         precoding and selecting the steering vectors as analog beamformer
                                           gation path. Furthermore, the proposed FL approach provides at least                  candidates. Similar to [12], [7, 8] also design the analog precoders
                                           10 times lower transmission overhead than the centralized learning                    with a predefined codebook of steering vectors, which entails a beam
                                           techniques.                                                                           training task prior to the precoder design. Most of the aforementioned
                                                                                                                                 works investigate the single-user scenario. Their extension to the
                                               Index Terms— Dropout learning, federated learning, manifold
                                                                                                                                 multi-user case remains a challenge. Although [11] considered the
                                           optimization, massive MIMO, spatial modulation.
                                                                                                                                 uplink multi-user SPIM architecture, it included the codebook of
                                                                                                                                 analog-only beamformers at the user end.
                                                                        1. INTRODUCTION
                                                                                                                                     In this paper, we design both analog and digital beamformers for
                                           The millimeter wave multiple-input multiple-output (mmWave-                           a downlink multi-user scenario using model-based and model-free
                                           MIMO) communications systems substantially improve the through-                       techniques. We leverage the optimality of the manifold optimization
                                           put in the fifth generation (5G) networks [1, 2]. As an emerging 5G                   (MO) [13, 14] for the model-based approach. Then, taking advantage
                                           technology, index modulation (IM) is attractive primarily because it                  of the model-free structure of learning-based methods [15–17] to
                                           offers both improved energy efficiency and spectral efficiency over                   improve robustness and computational efficiency, we train a global
                                           conventional modulations. The IM encodes additional information                       model through federated learning (FL). All users contribute to the
                                           in the indices of the transmission media such as subcarriers [3, 4],                  learning process by computing the model updates with respect to their
                                           antennas [5, 6], and spatial paths [7–9]. In this paper, we focus                     local datasets. The model updates are then collected at the base station
                                           on spatial modulation (SM) in the context of mmWave-MIMO                              (BS) for model aggregation and then sent back to the users for the
                                           systems [10].                                                                         next communication round and the global model is iteratively updated.
                                               In mmWave-MIMO, hybrid analog-digital beamformers are em-                         Once trained, the model parameters are shared with each user, which
                                           ployed, where the number of radio-frequency (RF) chains is much                       can estimate the beamformers by simply feeding the model with its
                                           smaller than the antennas. While this saves cost and power, its mul-                  downlink channel matrix. As a result, a non-linear data mapping is
                                           tiplexing gain is limited [1]. The SM techniques have been shown                      constructed between the channel data (input) and the beamformers
                                           to be helpful in addressing this problem [7–9]. In [5], an antenna                    (output), wherein a convolutional neural network (CNN) with dropout
                                           grouping (AG) approach is proposed for point-to-point communica-                      learning (DL) is designed [18]. The DL allows randomly selecting a
                                           tion, wherein some antenna elements are (de)activated to provide                      fraction (up to ~50%) of the model parameters, thus further reducing
                                           SM in terms of active/passive antenna indices. This approach suffers                  the communication cost during FL-based training.
                                                                                                                                     Unlike the conventional centralized learning (CL) methods [19–
                                               S. Coleri acknowledges the support of the Scientific and Technological Research
                                           Council of Turkey (TUBITAK) EU CHIST-ERA grant 119E350. A. M. Elbir acknowl-          21], where the BS collects all of the training datasets from users,
                                           edges the support of TUBITAK.                                                         our proposed FL-based approach is advantageous because of less
transmission overhead; it is further reduced by employing DL to send
approximately half of the model parameters to the users. We validate
this through extensive numerical experiments and demonstrate that
the proposed model-based and model-free approaches have supe-
rior spectral efficiency than the state-of-the-art model-based SPIM
techniques [7] as well as outperforming the conventional mmWave-
MIMO [22]. Apart from maintaining satisfactory prediction perfor-
mance, the model-free FL offers a communication-efficient train-
ing, which requires approximately 10 times lower communication
exchange for model parameter transmission than the conventional
CL-based techniques.
                                                                              Fig. 1. The SPIM-MIMO architecture processes the incoming data
                        2. SYSTEM MODEL                                       streams {su }u∈U and employs spatial path index information s0
                                                                              in a switching network, which connects NRF = U RF chains to
Consider a multi-user MIMO scenario with SPIM (SPIM-MIMO),                    M̄ = U M taps on the analog beamformers to exploit one of the M
where the BS has NT antennas to communicate with U users, each of             spatial paths per user.
which has NR antennas, via a single data stream. Then, the vector of
all data symbols are given by s = [s1 , . . . , sU ]T ∈ CU . Additionally,            (u)
the spatial path index information represented by s0 is fed to the            and [aT (ϕ)]n = √1 exp{−jπ (n − 1) sin(ϕ)}, respectively.
                                                                                                     NT
switching network (Fig. 1) to randomly assign the outputs of NRF =                          √            √
                                                                              Σu = diag{ γu,1 , . . . , γu,M } is an M × M diagonal matrix
U ≤ M̄ RF chains to the M̄ taps of the analog beamformer. Thus, the           including the scattering path gains γu,m [7].
BS can process at most M̄ spatial paths, for which M̄ = U M ≤ NT ,                                       (i)
                                                                                  The received signal yu is then processed by analog combiner
where M denotes the number of available spatial paths for each user.            (u,i)     NR
                                                                              wRF ∈ C        as
Compared to the conventional mmWave-MIMO, SPIM-MIMO has
                                                                                                             (u,i)H        (i)   (i)
the advantage of transmitting additional data streams by exploiting                              yeu(i) = wRF         Hu FRF FBB s + n
                                                                                                                                     eu ,                     (4)
the spatial pattern of the mmWave channel with limited RF chains,                                   (u,i)H

i.e., NRF ≤ M̄ [7]. If M = 1, i.e., NRF = M̄ , then SPIM-MIMO                 where n  eu = wRF nu . Similar to the analog precoders, the
                                                                                                    (u,i)
reduces to conventional mmWave-MIMO because there is only one                 analog combiner wRF also has constant-modulus elements, i.e.,
                                                                                  (u,i)         √
choice of transmission [9].                                                   |[wRF ]n | = 1/ NR , n = 1, . . . , NR .
                (i)                        (i)
     Assume FRF ∈ CNT ×U and FBB ∈ CU ×U be the analog and                                                                   (i)   (i)
                                                                                  Our goal is to design the beamformers, FRF , FBB and WRF =
                                                                                                                                              (i)

baseband beamformers corresponding to the i-th spatial pattern, re-              (1,i)        (U,i)
                                                                              [wRF , . . . , wRF ] by exploiting SPIM. The downlink channel Hu
spectively, for i = 1, . . . , M U , i.e., selecting one of the M paths for   is available for u ∈ U = {1, . . . , U } and used to design the beam-
each user. The signal vector transmitted by the NT antennas is                formers with FL-based training, in which a learning model is trained
                                     (i)    (i)
                          x(i) = FRF FBB s.                            (1)    to provide a mapping from the channel matrix to the beamformers.
Note that (1) includes the design of both analog and baseband beam-
formers for each spatial pattern whereas the method in [7] designs               3. BEAMFORMING VIA MODEL-BASED APPROACH
only analog beamformers and uses a fixed baseband beamformer.
                         (i)
    The RF precoders FRF , which are constructed by phase shifters,           We construct the analog beamformers via simultaneously incorporat-
                                            (i)
have constant-modulus elements, i.e., |[FRF ]m,n | = √1 . In ad-              ing all of the M spatial paths, in which the analog beamformers corre-
                                                              NT                                               (u)    (u,1)         (u,M )
                                              (i) (i)                         sponding to all spatial paths FRF = [f RF , . . . , f RF ] ∈ CNT ×M
dition, we have the power constraint        kFRF FBB k2F
                                                     = NRF that is                              (u)     (u,1)        (u,M )
                                  (i)                                         per user and WRF = [wRF , . . . , wRF ] ∈ CNR ×M are designed
enforced by the normalization of FBB . Finally, the NR × 1 received
                                                                              for u ∈ U. Then, we design the baseband precoders after taking into
signal by the u-th user becomes
                                                                              account the interference among the users. Given Hu 1 , the analog
                                   (i)     (i)
                     yu(i) = Hu FRF FBB s + nu ,                       (2)                     (u)
                                                                              beamformer FRF ∈ CNT ×M is designed by minimizing the distance
                 NR ×NT                                                                                     (u) (u)
where Hu ∈ C            represents the mmWave channel matrix be-              between the beamformer FRF f BB ∈ CNT and the optimal digital
tween the BS and the u-th user and nu ∼ CN (0, σn2 INR ) is tem-                           opt      NT
                                                                              precoder fu ∈ C , available from singular value decomposition
porarily and spatially white zero-mean Gaussian noise with variance           (SVD) of Hu [23]. Thus, the maximizing the spectral efficiency [20]
σn2 . The mmWave channel can be modeled as the contribution of M              is equivalent to solve
clustered paths from each user [22, 23]. Thus, Hu can be given by                                               (u) (u)
                                                                                    minimize kfuopt − FRF f BB k2F
                                 (u)   (u)H                                           (u) (u)
                        Hu =    AR Σu AT ,                             (3)           FRF ,f BB

                                 (u)  (u)           (u)                                               (u)                1      (u) (u)
where the steering matrices   = [aR (φ1 ), . . . , aR (φM )] ∈
                                AR                                                 subject to |[FRF ]m,n | = √              , kFRF f BB k2F = M,              (5)
C NR ×M         (u)     (u)           (u)
         and AT = [aT (ϕm ), . . . , aT (ϕM )] ∈ CNT ×M cor-                                                             NT
                                                                                                                                                       (u)
respond to the angle-of-arrival/angle-of-departure (AoA/AoD) an-              which is solved for u ∈ U to obtain the analog precoders {FRF }u∈U .
gles φm and ϕm , for m = 1, . . . , M , respectively. For a uni-                 Similarly, the following optimization yields analog combiners
                                               (u)        (u)
form linear array (ULA), the n-th element of aR (φ) and aT (ϕ)
                                                                                 1
                    (u)             1                                              The estimate of Hu is obtained via both learning- [17, 24, 25] and non-learning-
can be defined as [aR (φ)]n = √         exp{−jπ (n − 1) sin(φ)}               based [22, 26] approaches. We assume Hu is obtained prior to the beamformer design.
                                           NR
(u)                                                                                                                                         (l)          (l)
WRF ∈ CNR ×M for all possible paths:                                                  model training, i.e., minimizeθ D1u Dl=1
                                                                                                                                 P u
                                                                                                                                        L(f (Xu |θt−1 ), Yu ),
                                         (u)             (u)      (u)                 with the use of the local gradient gu (θt ), where θt denotes the
                  minimize kwMMSE − WRF wBB k2F
                       (u)
                 WRF ,wBB
                               (u)                                                    model parameters at the t-th iteration and L(·) is the loss function.
                                                                                      Then, the u-th user transmits gu (θt ) to the BS. Once the gradient
                               (u)          1
              subject to |[WRF ]n,m | = √       ,            (6)                      data from all users are collected, the BS finally   Pincorporates  gu (θt )
                                            NR                                        for u ∈ U to update θt as θt+1 = θt − ηt U1           U
                                                                                                                                            u=1 gu (θt ), where
        (u)H          H                −1 optH H                                                                            (l)      (l)
where wMMSE = fuopt HHu Hu fuopt +σn2                                                 gu (θt ) = D1u Dl=1
                                                                                                          P u
                                           fu Hu is the NR ×1                                                     ∇θ L(f (Xu |θt ), Yu )) for learning rate ηt .
optimum combiner using minimum-mean-squared-error (MMSE)                              After model aggregation, the BS returns the updated model parame-
estimation, which is used to obtain unconstrained combiner [23].                      ters θt+1 to the users, which will be used for the computation of the
  (u)           (u)H             (u)            (u)H             (u)                  gradients in the next iteration.
wBB = (WRF Λyu WRF )−1 (WRF Λyu wMMSE ) ∈ CM is used
                                                         (u)                                The proposed network architecture is a CNN comprised of 10
to compute all analog combiners. Once                   WRF
                                             is found, the receiver
                                (u)                                                   layers. The first layer is the input layer, which accepts the input data
only uses a single column of WRF as a combiner vector. The co-                        of size NR × NT × 3. The {2, 4, 6}-th layers are the convolutional
variance matrix of the received signal in (2), for which the ana-                     layers with NSF = 128 filters, each of which employs a 3 × 3 kernel
                                                  (u)        (u)
log and baseband precoders are replaced with FRF and FBB , is                         for 2-D spatial feature extraction. The {3, 5, 7}-th layers are the
                (u)      (u)    (u)H     (u)H
Λyu = Hu FRF FBB FBB FRF HHu +σn2 INR ∈ CNR ×NR .                                     normalization layers. The eighth layer is a fully connected layer
    The optimization problems in (5) and (6) are effectively solved via               with NFCL = 1024 units, whose main purpose is to provide feature
alternating minimization techniques, such as manifold optimization                    mapping. The ninth layer is a dropout layer with κ = 1/2 probability.
(MO) or “Manopt” algorithm [13, 14, 24]. This is optimal in the                       The dropout layer applies an NFCL × 1 mask on the weights of the
sense that it achieves the minimum Euclidean distance between the                     fully connected layer, whose elements are uniform randomly selected
unconstrained and hybrid beamformers.                                                 from {0, 1}. As a result, at each iteration, DL randomly selects
                                                           (u)          (u)           different set of weights in the fully connected layer, thereby reducing
    To exploit SPIM, only one column of FRF and WRF is selected                       the size of θt and gu (θt ), thereby, reducing model transmission
                                 (u,i)   (u)           (u,i)
for the i-th spatial pattern as fRF = FRF b(u,i) and wRF =                            overhead. Finally, the last layer is output regression layer, yielding
   (u)
WRF b(u,i) . Denote B(u) = {b(u,i1 ) , . . . , b(u,iM ) } to be the set of            the output channel estimate of size (2NT U + NR ) × 1. Once the
selected paths for all possible path configurations of the u-th user                  training is completed, each user feeds the model with Hu and obtains
                                                                                                             (u,i)
for im = 1, . . . , M . Then, the entries of b(u,i) ∈ RM are all zeros                its beamformer wRF and F(i) , which is fed back to the BS.
except the im -th element, which is unity and denotes selection of the                      We further examine the transmission overhead which can be
im -th spatial path for the u-th user.                                                defined as the size of the transmitted data during model training.
    In order to mitigate interference among the users, the baseband                   Let TFL and TCL denote thePtransmission overhead of FL and CL,
beamformer  needs to  be updated by computing the effective channel                 respectively. Define D =           u∈U Du so that TCL = 3NT NR +
                   (1,i)
                 heff                                                                 2NT U + NR D, which includes the number of symbols in the uplink
    (i)
as Heff   =
                   .
                    .          ∈C
                                  U ×U          (u,i)  (u,i)  (i)
                                        , where heff = wRF Hu FRF ∈                   transmission of the training dataset D from the users to the BS. In
                    .
                  (U,i)                                                               contrast, the transmission overhead of FL includes the transmission
                 heff
                   (i)           (1,i)          (U,i)                                 of gu (θt ) and θt in uplink and downlink communication for t =
C1×U where FRF = [fRF , . . . , fRF ]. Finally, the baseband pre-                     1, . . . , T , respectively. Finally, TFL is given by TFL = 2P T U. We
          (i)                            (i)       (i)−1
coder FBB is obtained as FBB = Heff                         and it is normalized as   can see that the dominant terms are D and P , which are the number of
  (i)         (i)         (i)
[FBB ]u,: = [FBB ]u,: /k[FBB ]u,: k2 .                                                training data pairs and the number of CNN parameters, respectively.
                                                                                      While D can be adjusted according to the amount of available data
                                                                                      at the users, P is usually unchanged during model training. Here,
    4. BEAMFORMING VIA FEDERATED LEARNING                                             P = NCL (CNSF Wx Wy ) + κNSF Wx Wy NFCL , where NCL = 3
                                                                                      is the number of convolutional layers and C = 3 is the number of
The learning model accepts Hu as input and yields F(i) =                              spatial “channels”. Wx = Wy = 3 are the 2-D kernel sizes. As a
  (i) (i)                       (u,i)
FRF FBB ∈ CNT ×U and wRF at the output. Define Du be                                  result, we have P = 600, 192 whereas P = 1, 190, 016 if dropout
the local dataset of the u-th user, in which the l-th element is                      layer is removed.
           (l)  (l)              (l)     (l)
Dl = (Xu , Yu ), where Xu and Yu are the input and out-
put for l = 1, . . . , Du , and Du = |Du | is the size of the local
                                                                                                      5. NUMERICAL SIMULATIONS
dataset. The input Xu ∈ RNR ×NT ×3 can be constructed by
“three-channel” data, whose the first and second “channel” can be                     We compared the performance of FL-based SPIM-MIMO with
designed as the element-wise real and imaginary part of Hu as                         mmWave-MIMO and the state-of-the-art model-based SPIM-MIMO
[Xu ]1 = Re{Hu } and [Xu ]2 = Im{Hu }, respectively. Also, the                        Wang et al. [7] in terms of spectral efficiency averaged over 1000
third channel can be constructed as [Xu ]3 = ∠{Hu }, which is                         Monte Carlo trials. The local dataset of each user includes N = 200
demonstrated to improve the feature extraction performance [19,                       different channel realizations for U = 8 users. The number of anten-
27]. Then, the output Yu ∈ R(2NT U +NR )×1 is constructed as                          nas at the BS and the users are NT = 128 and NR = 9, respectively.
                                                            (u,i)T
Yu = vec{vec{Re{F(i) }, Im{F(i) }}T , ∠wRF }T .                                       We select the number of available spatial paths for each user as
    In FL, the training dataset D is partitioned into small portions, i.e.,           M = 2. The location of each user is selected as φu,m ∈ Φu and
Du , u ∈ U , which are available at the users and not transmitted to the              ϕu,m ∈ Ψ̄u , for m = 1, . . . , M , where Φu andSΨ̄u are the equally-
BS. Let θ ∈ RP denote the learnable parameters of size P , then FL
                                                                                                                                                     S
                                                                                      divided subregions of the angular domain Θ = u∈U Φu = u Ψ̄u ,
solves the following problem for the t-th communication round of the                  Θ ∈ [30◦ , 150◦ ] as in [16]. During training, each channel realization
15                                                                               12

                                                                                       11

                                                                                       10
      10
                                                                                        9

                                                                                        8

                                         11.5                                           7
       5                                  11
                                         10.5                                           6
                                          10
                                          9.5                                           5
                                                14   16   18    20    22
       0                                                                                4
       -20   -15   -10   -5    0    5    10          15    20        25    30            0.5   0.55   0.6      0.65     0.7    0.75       0.8     0.85       0.9     0.95       1

Fig. 2. Spectral efficiency for mmWave-MIMO and SPIM-MIMO.                           Fig. 3. Spectral efficiency versus γ1 when SNR = 20 dB.

is corrupted by synthetic noise on the input data for three SNRTRAIN
levels, i.e., SNRTRAIN = {20, 25, 30} dB, for G = 200 realizations                                                    FL completes data transmission
                                                                                      108
in order to provide robust performance against noisy input [25, 27].                                                  after 952,000 blocks
As a result, the number of input-output pairs in the whole training                                         FL completes data transmission
dataset is D = 3U N G = 3 × 8 × 200 × 200 = 960, 000.                                                       after 480,000 blocks
                                                                                      106
     The proposed CNN model is realized and trained in MATLAB on
a PC with a 2304-core GPU. For CL, we use the stochastic gradient
                                                                                      104
descent (SGD) algorithm with momentum of 0.9 and the mini-batch
size MB = 128, and update the network parameters with learning                                 0.5    1       1.5       2     2.5     3         3.5      4         4.5      5
rate 0.001. For FL, we train the CNN for T = 50 iterations/rounds.                                                                                                          106
Once the training is completed, the labels of the validation data (i.e.,
20% of the whole dataset) are used in prediction stage.                         Fig. 4. Transmission overhead comparison of CL with FL, including
     It was shown in [7] that SPIM-MIMO outperforms mmWave-                     and excluding DL.
MIMO for M = 2 with γ1 ≤ 4γ2 , where γm = γu,m for u ∈ U
and m = 1, 2. Figure 2 shows the spectral efficiency with respect
                                                                                (3 · 128 · 9 + 2 · 128 · 8 + 9) · 960, 000 ≈ 5.3 × 109 , respectively. This
to SNR when the spatial path gains for all users are selected as
                                                                                clearly shows the effectiveness of FL over CL, i.e., approximately
γ1 = γ2 = 0.5. Note that both SPIM-MIMO and mmWave-MIMO
                                                                                10 times lower transmission overhead. In Fig. 4, we visualize the
use the same number of RF chains while SPIM-MIMO exploits the
                                                                                number of transmitted symbols with respect to transmission blocks,
spatial distribution of the paths. In contrast, mmWave-MIMO designs
                                                                                each of which carries 1000 symbols. We see that FL completes model
the precoders in accordance to the largest path gains, i.e., γ1 , in our
                                                                                training quicker than CL after approximately 480, 000 and 952, 000
case. We observe that Wang et al. provides less spectral efficiency
                                                                                transmission blocks with and without DL, respectively.
than the proposed model-based approach because it employs a single
baseband beamformer for all spatial patterns whereas the proposed
model-based approach updates the baseband beamformer FBB in
                                                                  (i)                                               6. SUMMARY
accordance to the different spatial patterns as well as suppressing
the interference among the users. The proposed FL approach has                  We presented both model-based and model-free frameworks for beam-
slight performance loss than the model-based method due to the loss             former design in multi-user SPIM-MIMO systems. Whereas the for-
during model training. It is worth noting that the performance of               mer leverages MO for beamformer design, the latter employs FL to
FL is upper bounded by the model-based technique since FL cannot                efficiently train the learning model. Our experiments showed that
perform better than its labels.                                                 the proposed approach has superior performance than the state-of-
                                                                                the-art SPIM techniques as well as outperforming the conventional
     In Fig. 3, we compare SPIM-MIMO and mmWave-MIMO with
                                                                                mmWave-MIMO systems in terms of spectral efficiency. Further-
respect to γ1 when γ2 = 1 − γ1 . We observe that both techniques
                                                                                more, the proposed FL approach exhibits a more communication-
meet when γ1 = 4γ2 for γ1 = 0.8. This clearly shows that the usage
                                                                                efficient learning method than conventional CL for model training.
of SPIM is appropriate if the path gain are close. The SPIM-MIMO
                                                                                We demonstrated that FL with (without) DL enjoys approximately 10
performance degrades as long as the difference between the path
                                                                                (5) times lower transmission overhead during model training lower
gains are large. As a result, mmWave-MIMO becomes favorable.
                                                                                transmission overhead than CL.
We note from both Fig. 2 and Fig. 3 that our proposed FL approach
closely follows the model-based technique.
     Next, we present the effectiveness of FL-based model training
by comparison to the CL-based training. According to the analysis
in Sec. 4, the transmission overhead of FL and CL are 2P T U =
2 · 600, 192 · 50 · 8 ≈ 480 × 106 and (3NT NR + 2NT U + NR )D =
7. REFERENCES                                    [14] A. M. Elbir, “A Deep Learning Framework for Hybrid Beam-
                                                                              forming Without Instantaneous CSI Feedback,” IEEE Trans.
 [1] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and                Veh. Technol., pp. 1–1, 2020.
     A. M. Sayeed, “An overview of signal processing techniques for
     millimeter wave MIMO systems,” IEEE J. Sel. Topics Signal           [15] A. M. Elbir and K. V. Mishra, “Cognitive Learning-Aided Multi-
     Process., vol. 10, no. 3, pp. 436–453, 2016.                             Antenna Communications,” arXiv preprint arXiv:2010.03131,
                                                                              2020.
 [2] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta,
     O. Edfors, and F. Tufvesson, “Scaling up MIMO: Opportunities        [16] A. M. Elbir and S. Coleri, “Federated Learning for Hybrid
     and challenges with very large arrays,” IEEE Signal Process.             Beamforming in mm-Wave Massive MIMO,” IEEE Commun.
     Mag., vol. 30, no. 1, pp. 40–60, 2013.                                   Lett., pp. 1–1, 2020.

 [3] E. Basar, “Index modulation techniques for 5G wireless net-         [17] A. M. Elbir and S. Coleri, “Federated Learning for Channel
     works,” IEEE Commun. Mag., vol. 54, no. 7, pp. 168–175,                  Estimation in Conventional and IRS-Assisted Massive MIMO,”
     2016.                                                                    arXiv preprint arXiv:2008.10846, 2020.

 [4] J. A. Hodge, K. V. Mishra, and A. I. Zaghloul, “Intelligent Time-   [18] “Expanding the reach of federated learning by reducing client
     Varying Metasurface Transceiver for Index Modulation in 6G               resource requirements, author=Caldas, Sebastian and Konečny,
     Wireless Networks,” IEEE Antennas and Wireless Propagation               Jakub and McMahan, H Brendan and Talwalkar, Ameet, jour-
     Letters, 2020, in press.                                                 nal=arXiv preprint arXiv:1812.07210, year=2018.”

 [5] L. He, J. Wang, and J. Song, “Spatial Modulation for More           [19] A. M. Elbir and K. V. Mishra, “Joint antenna selection and hy-
     Spatial Multiplexing: RF-Chain-Limited Generalized Spatial               brid beamformer design using unquantized and quantized deep
     Modulation Aided MM-Wave MIMO With Hybrid Precoding,”                    learning networks,” IEEE Trans. Wireless Commun., vol. 19,
     IEEE Trans. Commun., vol. 66, no. 3, pp. 986–998, 2018.                  no. 3, pp. 1677–1688, March 2020.

 [6] J. A. Hodge, K. V. Mishra, and A. I. Zaghloul, “Reconfigurable      [20] A. M. Elbir and A. Papazafeiropoulos, “Hybrid Precoding for
     metasurfaces for index modulation in 5G wireless communica-              Multi-User Millimeter Wave Massive MIMO Systems: A Deep
     tions,” in IEEE Int. Appl. Comput. Electromagn. Soc. Symp.,              Learning Approach,” IEEE Trans. Veh. Technol., vol. 69, no. 1,
     2019, pp. 1–2.                                                           p. 552–563, 2020.

 [7] J. Wang, L. He, and J. Song, “Towards Higher Spectral Effi-         [21] H. Huang, Y. Song, J. Yang, G. Gui, and F. Adachi, “Deep-
     ciency: Spatial Path Index Modulation Improves Millimeter-               learning-based millimeter-wave massive MIMO for hybrid pre-
     Wave Hybrid Beamforming,” IEEE J. Sel. Topics Signal Pro-                coding,” IEEE Trans. Veh. Technol., vol. 68, no. 3, pp. 3027–
     cess., vol. 13, no. 6, pp. 1348–1359, 2019.                              3032, 2019.

 [8] Y. Ding, V. Fusco, A. Shitvov, Y. Xiao, and H. Li, “Beam Index      [22] A. Alkhateeb, G. Leus, and R. W. Heath, “Limited feedback
     Modulation Wireless Communication With Analog Beamform-                  hybrid precoding for multi-user millimeter wave systems,” IEEE
     ing,” IEEE Trans. Veh. Technol., vol. 67, no. 7, pp. 6340–6354,          Trans. Wireless Commun., vol. 14, no. 11, pp. 6481–6494, 2015.
     2018.                                                               [23] O. E. Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath,
                                                                              “Spatially sparse precoding in millimeter wave MIMO systems,”
 [9] S. Gao, X. Cheng, and L. Yang, “Spatial Multiplexing With Lim-
                                                                              IEEE Trans. Wireless Commun., vol. 13, no. 3, pp. 1499–1513,
     ited RF Chains: Generalized Beamspace Modulation (GBM)
                                                                              2014.
     for mmWave Massive MIMO,” IEEE J. Sel. Areas Commun.,
     vol. 37, no. 9, pp. 2029–2039, 2019.                                [24] A. M. Elbir, K. V. Mishra, M. R. B. Shankar, and B. Ottersten,
                                                                              “Online and Offline Deep Learning Strategies For Channel Es-
[10] K. V. Mishra, M. R. Bhavani Shankar, V. Koivunen, B. Ot-
                                                                              timation and Hybrid Beamforming in Multi-Carrier mm-Wave
     tersten, and S. A. Vorobyov, “Toward millimeter wave joint
                                                                              Massive MIMO Systems,” arXiv preprint arXiv:1912.10036,
     radar-communications: A signal processing perspective,” IEEE
                                                                              2019.
     Signal Process. Mag., vol. 36, no. 5, pp. 100–114, 2019.
                                                                         [25] A. M. Elbir, A. Papazafeiropoulos, P. Kourtessis, and
[11] W. Wang and W. Zhang, “Spatial Modulation for Uplink Multi-
                                                                              S. Chatzinotas, “Deep Channel Learning for Large Intelligent
     User mmWave MIMO Systems With Hybrid Structure,” IEEE
                                                                              Surfaces Aided mm-Wave Massive MIMO Systems,” IEEE
     Trans. Commun., vol. 68, no. 1, pp. 177–190, 2020.
                                                                              Wireless Commun. Lett., vol. 9, no. 9, pp. 1447–1451, 2020.
[12] J. Zhu, P. Yang, Y. Xiao, X. Lei, and Q. Chen, “Low RF-
                                                                         [26] K. Venugopal, A. Alkhateeb, N. González Prelcic, and R. W.
     Complexity Receive Spatial Modulation for Millimeter-Wave
                                                                              Heath, “Channel estimation for hybrid architecture-based wide-
     MIMO Communications,” IEEE Commun. Lett., vol. 22, no. 7,
                                                                              band millimeter wave systems,” IEEE J. Sel. Areas Commun.,
     pp. 1338–1341, 2018.
                                                                              vol. 35, no. 9, pp. 1996–2009, 2017.
[13] X. Yu, J. Shen, J. Zhang, and K. B. Letaief, “Alternating Mini-
                                                                         [27] A. M. Elbir, “CNN-based precoder and combiner design in
     mization Algorithms for Hybrid Precoding in Millimeter Wave
                                                                              mmWave MIMO systems,” IEEE Commun. Lett., vol. 23, no. 7,
     MIMO Systems,” IEEE J. Sel. Topics Signal Process., vol. 10,
                                                                              pp. 1240–1243, 2019.
     no. 3, pp. 485–500, April 2016.
You can also read