Efficient Detection Algorithm for a Multiple-Input and Multiple-Output Multiuser Multi-Carrier Code Division Multiple Access in Time-Varying Channels

Problem statement: Maximum Likelihood (ML) decoding has been applied to the uplink of a Multi-Carrier Code Division Multiple Access (MC-CDMA) system based on Orthogonal Frequency Division Multiplexing (OFDM). The Multiple-Input Multiple-Output (MIMO) channel from the K users, who move at vehicular speed, to the Base Station (BS) is time-varying. For time-varying channels, Sphere Decoding (SD) was introduced to perform ML decoding. However, the computational complexity of SD remains high because a QR factorization is required for each symbol. A modified SD, called the Subspace-Sphere Decoder (SPSD), is proposed to achieve near-optimum solutions. Approach: The proposed algorithm is based on a subspace and orthogonal projection with very small dimensionality, used as a robust scheme in an iterative receiver with Multi-User Detection (MUD) and Parallel Interference Cancelation (PIC). Results: The approach reduces the computational complexity for time-varying channels by one order of magnitude for channel estimation and by more than one order of magnitude for multiuser detection. Furthermore, SPSD is more robust to channel estimation errors (by about 3.8 dB) than representative counterparts in the literature. Conclusion: The effectiveness of the proposed method was demonstrated by simulations.


INTRODUCTION
We consider the uplink of a Multi-Carrier Code-Division Multiple Access (MC-CDMA) system based on Orthogonal Frequency Division Multiplexing (OFDM) with N subcarriers. We focus on a Multiple-Input Multiple-Output (MIMO) multi-user system. Each user k∈{1, …, K} has T transmit antennas and the base station is equipped with R receive antennas. The receiver jointly carries out iterative Parallel Interference Cancelation (PIC), channel estimation and Multi-User Detection (MUD) [1][2][3]. For multi-user detection, a subspace-based sphere decoder is employed.
In [3], the researchers show that applying Sphere Decoding (SD) to each user independently after PIC in an iterative receiver is more robust to channel estimation errors than a Linear Minimum Mean Square Error (LMMSE) filter. For time-varying channels the computational …

In the following, we omit the time index m unless necessary. The contribution of transmit antenna (k, t) to the signal at receive antenna r is $\tilde{s}_{r,(k,t)} b^{(k,t)}$. The received signal from all transmit antennas of all users at receive antenna r can be expressed as follows:

$$y_r = \sum_{k=1}^{K} \sum_{t=1}^{T} \tilde{s}_{r,(k,t)} b^{(k,t)} + n_r$$

where $n_r$ denotes the additive noise at receive antenna r. This can be expressed in matrix notation as:

$$y = \tilde{S} b + n \quad (4)$$

The contribution of user k (i.e., the part deriving from the symbols $b^{(k)} = [b^{(k,1)}, \ldots, b^{(k,T)}]^T$) can be expressed as $\tilde{S}^{(k)} b^{(k)}$, where $\tilde{S}^{(k)}$ collects the effective spreading sequences from all transmit antennas of user k to all receive antennas. To detect the desired user k, the contribution of all other users (i.e., $k' \neq k$) is removed from (4) by PIC:

$$\tilde{y}^{(k)} = y - \sum_{k' \neq k} \tilde{S}^{(k')} \tilde{b}^{(k')} \quad (7)$$

where $\tilde{b}^{(k')}$ contains the soft symbols of user k'. The soft symbols used for multiuser detection and for channel estimation are computed from the extrinsic probabilities and the A Posteriori Probabilities (APP), respectively, supplied by the BCJR decoder (i.e., the (M-J)T detected symbols $w^{(k,t)}[m]$ are jointly de-mapped, de-interleaved and decoded using a BCJR decoder) [3,5] after detection. To perform MUD together with PIC (7) for user k, we employ the subspace SD algorithm to detect $b^{(k)}$ within an appropriate iterative structure that reduces the computational complexity.

Time-variant channel model:
The estimation of the time-variant frequency response $g_r^{(k,t)}[m]$ determines the performance of the iterative receiver structure, since the effective spreading sequence $\tilde{s}_{r,(k,t)}$ depends directly on the actual channel realization.
The maximum variation in time of the wireless channel is upper-bounded by the maximum normalized one-sided Doppler bandwidth [4]:

$$\nu_{Dmax} = \frac{v_{max} f_c}{c_0} T_s$$

Where:
v_max = The maximum (supported) velocity
T_s = The OFDM symbol duration
f_c = The carrier frequency
c_0 = The speed of light

Time-limited snapshots of the band-limited fading process span a subspace with very small dimensionality. The same subspace is spanned by the Discrete Prolate Spheroidal (DPS) sequences [4]. The DPS sequences $\{u_i[m]\}$ are defined as [6]:

$$\sum_{l=0}^{M-1} \frac{\sin(2\pi \nu_{Dmax}(l-m))}{\pi (l-m)} u_i[l] = \lambda_i u_i[m]$$

for m∈{0, …, M-1}. The time-variant channel is approximated by a basis expansion (12) in the first D DPS sequences; the dimension D is of the order of 3-5 in practical cases [2,4]. Substituting (12) into (1) yields (13), and substituting (13) into (6) gives the corresponding expression for the effective spreading sequences of user k. Consequently, the received signal of user k after PIC given in (7) can be represented in the subspace form (15), from which the sphere decoder is developed.
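The statement that time-limited snapshots of a band-limited process occupy only a few dimensions can be checked numerically. The following is a minimal sketch, not the authors' implementation: it builds the standard DPS kernel sin(2πν(l-m))/(π(l-m)) and computes its eigenvalues with a plain cyclic Jacobi iteration. The block length M = 32 and ν_Dmax = 1/32 are illustrative assumptions (chosen so that ν_Dmax·M = 1, close to the simulation setup of this paper), and the helper names `dps_kernel` and `jacobi_eigh` are hypothetical.

```python
import math


def dps_kernel(M, nu):
    """M x M kernel C[l][m] = sin(2*pi*nu*(l-m)) / (pi*(l-m)), with diagonal 2*nu."""
    def entry(d):
        return 2.0 * nu if d == 0 else math.sin(2.0 * math.pi * nu * d) / (math.pi * d)
    return [[entry(l - m) for m in range(M)] for l in range(M)]


def jacobi_eigh(a, sweeps=50, tol=1e-20):
    """Eigen-decomposition of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors, V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: -a[i][i])
    eigvals = [a[i][i] for i in order]
    eigvecs = [[v[k][i] for k in range(n)] for i in order]  # eigvecs[i] is u_i
    return eigvals, eigvecs


M, NU_DMAX = 32, 1.0 / 32.0  # illustrative assumptions, not the paper's values
eigvals, dps = jacobi_eigh(dps_kernel(M, NU_DMAX))
for i, lam in enumerate(eigvals[:6]):
    print(f"lambda_{i} = {lam:.2e}")
```

The eigenvalues sum to the trace 2·ν_Dmax·M of the kernel, and almost all of that sum is carried by the first few of them, which is why a small D suffices for the basis expansion.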

Sphere decoder and its modifications:
The SD technique was introduced in [7] to perform Maximum Likelihood (ML) decoding by searching only over those lattice points that lie within a hypersphere of radius ρ around the received signal [8,9]. First, ML detection and its low-complexity implementation using SD are recalled as described in [8]. Second, the SD is applied to MUD in the iterative MIMO multiuser MC-CDMA uplink described above. Then, we detail the subspace SD structure, which uses the model (15) to reduce the computational complexity for a MIMO MC-CDMA system.

Maximum likelihood decoder:
For convenience, we omit the user index k and the superscripts and replace the effective spreading matrix with H in Eq. 7. The resulting signal model of a MIMO single-user system in a flat-fading channel is given by:

$$y = Hb + n$$

Suppose there are T transmit and R receive antennas. Denote by $b \in \mathbb{C}^T$ and $y \in \mathbb{C}^R$ the transmitted symbol vector and the received signal, respectively. Let $H \in \mathbb{C}^{R \times T}$ be the channel matrix and n the noise vector. ML detection finds the data symbol vector b in the discrete alphabet $A^T$ that maximizes the likelihood function:

$$\hat{b}_{ML} = \arg\max_{b \in A^T} f(y|b) = \arg\min_{b \in A^T} \|y - Hb\|^2 \quad (17)$$

Identifying the ML vector requires an exhaustive search. Because the search runs over all $|A|^T$ candidate vectors for b, the complexity increases exponentially with T. We therefore seek a subset of candidate vectors, small in number, in which the ML vector can be found with low computational complexity by an SD, as follows.
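The exhaustive ML search can be sketched in a few lines. This is an illustration under assumptions: the QPSK alphabet matches the constellation used in the simulations of this paper, but the 2×2 matrix `H`, the function name `ml_detect` and the noise-free observation are made up for the example.

```python
import itertools
import math

# Unit-energy QPSK alphabet A.
QPSK = [(re + 1j * im) / math.sqrt(2) for re in (1, -1) for im in (1, -1)]


def ml_detect(H, y, alphabet=QPSK):
    """Return the b in alphabet^T that minimizes ||y - H b||^2 by exhaustive search."""
    T = len(H[0])

    def cost(b):
        return sum(abs(y_r - sum(h * s for h, s in zip(row, b))) ** 2
                   for row, y_r in zip(H, y))

    # The search visits all |A|^T candidate vectors.
    return min(itertools.product(alphabet, repeat=T), key=cost)


# Hypothetical 2x2 channel and noise-free observation.
H = [[1.0 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.9 - 0.4j]]
b_true = (QPSK[0], QPSK[3])
y = [sum(h * s for h, s in zip(row, b_true)) for row in H]
b_hat = ml_detect(H, y)
print(b_hat == b_true, len(QPSK) ** len(H[0]), "candidates searched")
```

With QPSK the number of candidates is 4^T, so the search already covers 256 vectors for T = 4, which motivates restricting it to a sphere.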
The sphere decoder algorithm: The SD restricts the search to a small number of candidate vectors in order to avoid the exhaustive search in (17) [3,10,11]. This is achieved by confining the search to a subset C(ρ) defined as:

$$C(\rho) = \{ b \in A^T : \|y - Hb\|^2 \le \rho^2 \} \quad (18)$$

where ρ > 0 is a given radius. If C(ρ) is not empty, the ML vector is the element of C(ρ) with the smallest cost:

$$\hat{b}_{ML} = \arg\min_{b \in C(\rho)} \|y - Hb\|^2$$

The SD technique finds C(ρ) for a given ρ by applying the thin QR factorization $H = Q\mathbf{R}$ (or, equivalently, the Cholesky decomposition of the Gram matrix $G = H^H H$), which is unique as defined in [12]. Here Q has orthonormal columns and $\mathbf{R}$ is an upper triangular matrix. From Eq. 18, the following constraint can be derived:

$$\|z - \mathbf{R} b\|^2 \le \rho^2 \quad (20)$$

where $z = Q^H y$, and the error vector to be minimized is $\epsilon = z - \mathbf{R} b$. For t∈{1, …, T}, define the partial vectors $z^{(t)} = [z[t], \ldots, z[T]]^T$ and $b^{(t)} = [b[t], \ldots, b[T]]^T$, and the partial matrix $\mathbf{R}^{(t)}$ as the lower-right (T-t+1)×(T-t+1) block of $\mathbf{R}$. Noting that $\mathbf{R}$ is upper triangular, $\epsilon^{(t)}$ can be written as:

$$\epsilon^{(t)} = z^{(t)} - \mathbf{R}^{(t)} b^{(t)} \quad (23)$$

and substituting (23) into (20) gives:

$$d(t)^2 = \|z^{(t)} - \mathbf{R}^{(t)} b^{(t)}\|^2 \le \rho^2 \quad (24)$$

where d(t) is the partial distance, meaning that $d(1)^2 \ge d(2)^2 \ge \ldots \ge d(T)^2$. We denote the set of candidate symbols at step t, with partial distance d(t) ≤ ρ, by $C_t(\rho)$. $C_t(\rho)$ can be found by backward recursion using (24), which gives an iterative implementation of the SD. Note that if $C_t(\rho)$ becomes empty, there is no vector in C(ρ); a larger value of ρ has to be chosen (as ρ→∞, the search extends over the whole set $A^T$) and the procedure has to be repeated.
In this case, we have $C_t(\rho) = A^{T-t+1}$, and an exhaustive search is then needed to find the ML vector within C(ρ). Hence, to ensure that $C_1(\rho)$ contains at least one vector, ρ is chosen according to the Zero-Forcing (ZF) criterion, and Eq. 20 is modified accordingly. As soon as we obtain a t such that $d(t)^2 > \rho^2$ (implying $d(1)^2 > \rho^2$), we discard all $b \in A^T$ that share the partial vector $b^{(t)} \in A^{T-t+1}$. If $C_t(\rho)$ is not empty, we can build C(ρ) from $C_t(\rho)$, t∈{1,…,T}, as follows:

$$C_{t-1}(\rho) = \{ b^{(t-1)} \in A \times C_t(\rho) : d(t-1)^2 \le \rho^2 \}, \qquad C(\rho) = C_1(\rho)$$

where × denotes the Cartesian product. After T steps the SD algorithm terminates. This technique can be interpreted as tree pruning [3,8]. The distinct steps of the SD are shown in Table 1.

For time-varying channels, the computational complexity of the SD remains high because a QR factorization is required for each symbol. Hence, we develop a reduced-rank, low-complexity method for time-variant channels, using Eq. 15 and its particular properties, based on a subspace and orthogonal projection with very small dimensionality. The main results of this study are based on the modified subspace-sphere method.
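The backward recursion over the candidate sets can be sketched as follows. This is a minimal illustration of the classical sphere decoder with a fixed radius, assuming a square flat-fading channel; it is not the subspace variant proposed here, and the names `thin_qr` and `sphere_decode`, the 2×2 matrix and the perturbation values are hypothetical.

```python
import math

QPSK = [(re + 1j * im) / math.sqrt(2) for re in (1, -1) for im in (1, -1)]


def thin_qr(H):
    """Thin QR of a complex matrix (list of rows) by modified Gram-Schmidt."""
    rows, cols = len(H), len(H[0])
    Q = [[H[i][j] for i in range(rows)] for j in range(cols)]  # Q[j] is column j
    Rm = [[0j] * cols for _ in range(cols)]
    for j in range(cols):
        for i in range(j):
            Rm[i][j] = sum(qi.conjugate() * v for qi, v in zip(Q[i], Q[j]))
            Q[j] = [v - Rm[i][j] * qi for qi, v in zip(Q[i], Q[j])]
        norm = math.sqrt(sum(abs(v) ** 2 for v in Q[j]))
        Rm[j][j] = norm
        Q[j] = [v / norm for v in Q[j]]
    return Q, Rm  # orthonormal columns and upper triangular factor


def sphere_decode(H, y, alphabet, radius):
    """Return the best vector with ||z - R b||^2 <= radius^2, or None if the sphere is empty."""
    T = len(H[0])
    Q, Rm = thin_qr(H)
    z = [sum(q.conjugate() * v for q, v in zip(Q[t], y)) for t in range(T)]
    candidates = [(0.0, ())]  # (partial distance squared, symbols b[t..T])
    for t in reversed(range(T)):  # step T first, step 1 last
        survivors = []
        for dist, tail in candidates:
            for sym in alphabet:
                b_part = (sym,) + tail
                resid = z[t] - sum(Rm[t][t + i] * b for i, b in enumerate(b_part))
                d2 = dist + abs(resid) ** 2
                if d2 <= radius ** 2:  # keep only partial vectors inside the sphere
                    survivors.append((d2, b_part))
        if not survivors:
            return None  # empty candidate set: retry with a larger radius
        candidates = survivors
    return min(candidates, key=lambda c: c[0])[1]


# Hypothetical 2x2 channel with a small perturbation added to the observation.
H = [[1.0 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.9 - 0.4j]]
b_true = (QPSK[0], QPSK[3])
y = [sum(h * s for h, s in zip(row, b_true)) + n for row, n in zip(H, (0.1, -0.05j))]
b_hat = sphere_decode(H, y, QPSK, radius=1.0)
print(b_hat == b_true)
```

Returning `None` for an empty candidate set mirrors the rule that a larger radius has to be chosen and the procedure repeated.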
Subspace-sphere decoder combined with thin QR decomposition: The proposed algorithm is an efficient detection algorithm for time-variant channels that reduces the computational complexity. It combines the SD with a subspace of a vector space and employs PIC in the iterative receiver scheme. The error vector to be minimized and Eq. 1-3, 21 and 22 are revised accordingly. Here $\mathbf{R}^{(t)}$ is block upper triangular, with upper-triangular blocks on the main diagonal and full blocks elsewhere, and Eq. 14 changes correspondingly. The error vector $\epsilon^{(t)}$ and the partial distance (33) are then expressed in terms of the revised quantities. Deriving the subvectors $z^{(t)}$ and the elements of $\mathbf{R}^{(t)}$, the partial distance defined by (33) takes the form (35).

Consequently, to run the iterative Subspace-Sphere Decoder (SPSD) algorithm, $d(1)^2$ is found by backward recursion starting at t = T, using (35) and applying the corresponding revisions to Table 1. In the following, we compute the complexity of our algorithm in terms of floating point operations (flops).

Table 1: Steps of the sphere decoder [3]
Step T: For all b[T]∈A compute $d(T)^2$. If $d(T)^2 \le \rho^2$, store b[T] in $C_T(\rho)$.
Step τ and Step 1: continue the backward recursion of (24).
Computational complexity: We now define the computational complexity of the various algorithms, which is assessed in terms of the required number of floating point operations (flops) [12]. A flop is an addition, subtraction, multiplication, division or square root operation in the real domain. Thus, one Complex Multiplication (CM) requires four real multiplications and two additions, leading to six flops. Similarly, one Complex Addition (CA) requires two flops. In our setting, the crucial parameters are T and R (i.e., the size of the system model), Q = |A| (i.e., the cardinality of the symbol alphabet A) and $p_t = |C_t(\rho)|$ (i.e., the number of candidate vectors retained at step t∈{1,…,T}).
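The flop convention above can be turned into a small counting helper. This is only a sketch of the bookkeeping, assuming a dense complex matrix-vector product as the example operation; the function names are hypothetical and the counts are not the closed-form expressions derived in this study.

```python
FLOPS_PER_CM = 6  # four real multiplications + two real additions
FLOPS_PER_CA = 2  # two real additions


def flops(num_cm, num_ca):
    """Total real flops for a given number of complex multiplications and additions."""
    return FLOPS_PER_CM * num_cm + FLOPS_PER_CA * num_ca


def matvec_flops(rows, cols):
    """Flops of a dense complex (rows x cols) matrix times a length-cols vector."""
    return flops(num_cm=rows * cols, num_ca=rows * (cols - 1))


def num_exhaustive_candidates(alphabet_size, T):
    """Number of vectors visited by an exhaustive search over A^T."""
    return alphabet_size ** T


T = R = 4
Q = 4  # QPSK alphabet size
print(matvec_flops(R, T), num_exhaustive_candidates(Q, T))
```

For T = R = 4, one product Hb costs 16 CMs and 12 CAs under this convention, and an exhaustive QPSK search repeats such a product for every one of the 4^4 candidates.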

Complexity of the comprehensive search:
We demonstrate the advantage of the QR decomposition by evaluating the computational complexity of a comprehensive (i.e., exhaustive) search performed after a QR decomposition, as in [3]. The complexity of one QR Decomposition (QRD) of size NR×T is taken from [12].

Complexity of the sphere decoder: Using the SD algorithm illustrated in Table 1, we can now elaborate on the computational complexity of an iterative implementation of the sphere decoder by refining the derivations in [3]. We use the cardinality $p_t$ of the candidate set retained at step t and consider the computation of $\mathbf{R}^{(t)} b^{(t)}$ at step t and time instant m.

RESULTS
Simulation setup: We use the same simulation setup as in [1,3], with a root mean square (rms) delay spread $T_D = 4T_c$ ≅ 1 µs for a chip rate of $R_c = 3.84 \times 10^6$ Hz [14,15]. The autocorrelation of every channel tap follows the Clarke spectrum, with L = 15 resolvable paths. The system operates at a carrier frequency of $f_c$ = 2 GHz, and K∈{16, 24, 32, 64} users move with velocity v = 102.5 km/h. With these parameters the Doppler bandwidth is $B_D$ ≅ 190 Hz. The number of subcarriers is N = 64 and the OFDM symbol with cyclic prefix has length P = N + G = 79. The data block comprises M = 256 OFDM symbols, including J = 60 pilot symbols, which results in a subspace dimension $D \ge \lceil 2\nu_{Dmax} M \rceil + 1$ ≅ 3. In order to investigate the diversity gain of the receiver only (i.e., no antenna gain), the MIMO channel taps are normalized as in [13]. All depicted results are averaged over 100 independent channel realizations, with data transmitted using a QPSK constellation [16]. Without loss of generality, we assume that each user has the same number of antennas as the receiver (i.e., T = R).
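The quoted Doppler bandwidth and subspace dimension can be cross-checked from the stated parameters. The sketch below assumes that the OFDM symbol duration equals P chip periods (T_s = P/R_c), which is an assumption not stated explicitly above; the variable names are illustrative.

```python
C0 = 299_792_458.0  # speed of light in m/s


def doppler_bandwidth_hz(v_kmh, fc_hz):
    """One-sided Doppler bandwidth B_D = v * f_c / c_0."""
    return (v_kmh / 3.6) * fc_hz / C0


v_kmh = 102.5      # user velocity in km/h
fc_hz = 2.0e9      # carrier frequency in Hz
chip_rate = 3.84e6  # chip rate R_c in Hz
P = 79             # OFDM symbol length including cyclic prefix, in chips
M = 256            # OFDM symbols per data block

B_D = doppler_bandwidth_hz(v_kmh, fc_hz)
T_s = P / chip_rate   # assumption: symbol duration is P chip periods
nu_dmax = B_D * T_s   # normalized one-sided Doppler bandwidth
print(f"B_D = {B_D:.1f} Hz, T_s = {T_s * 1e6:.2f} us, "
      f"nu_Dmax = {nu_dmax:.5f}, 2*nu_Dmax*M = {2 * nu_dmax * M:.3f}")
```

Under this assumption the product 2·ν_Dmax·M comes out very close to 2, consistent with a subspace dimension of about 3 for this block length.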

DISCUSSION
To evaluate the proposed method, simulations are performed in three steps.
The bit error rate performance: First, we compare the results in terms of Bit-Error-Rate (BER) versus SNR. We focus on the multiuser detector that uses the subspace-sphere method. Fig. 2 illustrates the BER versus $E_b/N_0$ for different numbers of users as well as for a fully loaded system (i.e., K = N = 64). The solid lines show the results for Perfect Channel knowledge (PC), while the dashed lines show the results for subspace-sphere Channel Estimates (CE). We also depict the Single User Bound (SUB), which is the BER achieved with one single user when the receiver has access to the exact channel coefficients. We compare our results with [3] as follows:
• With exact knowledge of the channel: the LMMSE, sphere decoder and subspace-sphere multiuser detectors are each evaluated
• With channel estimates, the subspace-sphere method performs close to the sphere decoder for the same number of iterations. Subspace-sphere detection outperforms LMMSE detection and is more robust to channel estimation errors (by about 3.8 dB)

The QR decomposition computational complexity: Second, we show the advantage of the QRD in terms of flops. For instance, for T = R = 4, evaluating Eq. 37 and 38 shows that executing a QRD first and then a comprehensive search reduces the complexity by a factor of approximately 1.85.
The global computational complexity: In the final step, we compare the computational complexity, based on the number of flops required, with other representative counterparts [3,13]. For easy reference, Table 2 compares the analytic expressions for the total numbers of complex multiplications and additions required by the aforementioned methods. Using the subspace-sphere multiuser decoder for joint antenna detection with PIC reduces the computational complexity by more than one order of magnitude. This complexity reduction comes at the expense of a slight performance loss (about 0.58 dB). Consequently, a trade-off has to be made between performance and computational complexity, and the resulting loss is acceptable for practical and economic considerations.

CONCLUSION
We have presented the subspace method applied to sphere detection as an alternative to the LMMSE filter and the classical sphere decoder for MIMO multicarrier CDMA systems. The subspace-sphere algorithm is proposed for joint time-varying channel estimation and multiuser detection. To achieve high accuracy, we combined the subspace-sphere method with interference cancelation at every stage. We also defined a suitable radius for terminating the search. In this study, a low-complexity receiver with a Soft Input Soft Output (SISO) channel decoder based on the BCJR algorithm was studied using a hybrid method that uses the extrinsic information for multiuser detection and the APP for channel estimation, supported by additional pilot symbols. Our new method allows a drastic reduction of computational complexity, by one order of magnitude for channel estimation and by more than one order of magnitude for multiuser detection, as validated by simulation results. Applying the subspace-sphere method implies a slight loss in performance, which is insignificant compared to the gain in computational complexity.