Plasma acylcarnitines and amino acids in dyslipidemia: an integrated metabolomics and machine learning approach

Ali Etemadi
 Tehran University of Medical Sciences
Houra Mobaleghaleslam
 Tehran University of Medical Sciences
Maryam Mirabolghasemi
 University of Tehran
Mehdi Ahmadi
 Tehran University of Medical Sciences (TUMS)
Hojat Dehghanbanadaki
 Tehran University of Medical Sciences
Shaghayegh Hosseinkhani
 Tehran University of Medical Sciences
Fatemeh Bandarian
 Tehran University of Medical Sciences
Niloufar Najjar
 Tehran University of Medical Sciences
Arezou Dilmaghani-Marand
 Tehran University of Medical Sciences
Nekoo Panahi
 Tehran University of Medical Sciences
Babak Negahdari
 Tehran University of Medical Sciences (TUMS)
Mohammadali Mazloomi
 Tehran University of Medical Sciences (TUMS)
Mohammad Hossein Karimi-jafari
 University of Tehran
Farideh Razi
 Tehran University of Medical Sciences
Bagher Larijani
 Tehran University of Medical Sciences

Research Article

Keywords: Mass Spectrometry, Metabolomics, Triglycerides, Dyslipidemia, Machine learning

Posted Date: January 3rd, 2023

DOI: https://doi.org/10.21203/rs.3.rs-2400804/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License.

Additional Declarations: No competing interests reported.

Version of Record: A version of this preprint was published at Journal of Diabetes & Metabolic Disorders on February 24th, 2024. See the published version at
https://doi.org/10.1007/s40200-024-01384-9.

Abstract
Background: Discovering the underlying intermediates associated with the development of dyslipidemia improves our understanding of its
pathophysiology, and modifying these intermediates is a promising preventive and therapeutic strategy for the management of dyslipidemia.

Methods: The dataset came from a large cross-sectional study of 1200 subjects, who were stratified into four binary classes (normal and
abnormal) based on their levels of triglyceride (TG), total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), and non-HDL-C.

The current study first evaluated plasma concentrations of 20 amino acids and 30 acylcarnitines in each class of dyslipidemia. Then, these
attributes, along with baseline characteristics, were used to check whether machine learning (ML) algorithms could classify cases and controls.

Results: Although lipid levels fluctuate during the day, which introduces variability into the data, our ML framework accurately predicted the
TG binary classes. Moreover, the findings showed that alanine, phenylalanine, methionine, C3, C14:2, and C16 had great power in differentiating
patients with high TG from normal TG controls.

Conclusions: The comprehensive output of this work, along with sex-specific attributes, will improve our understanding of the underlying intermediates
involved in dyslipidemia.

1. Introduction
Dyslipidemia can be described as a situation in which patients have unbalanced concentrations of one or more of the following factors: high-density
lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, total cholesterol (TC), and triglyceride (TG)(1, 2).

The molecular basis of dyslipidemia lies in insulin resistance and hyperinsulinemia, which cause obesity, the most common metabolic disorder.
Insulin resistance pathways and dyslipidemia are both associated with adiposity, which is characterized by structural and functional changes in adipose
tissue (3).

It is well established that dyslipidemia can aggravate other diseases such as atherosclerosis and cardiovascular diseases (CVD) (4). Dyslipidemia
may also be linked to obesity, type 2 diabetes mellitus, and certain types of cancer (5, 6). Therefore, early screening and effective
lipid management are essential for improving quality of life and reducing the economic burden.

Emerging metabolomics has provided new insights into precision medicine, such as the discovery of new intermediate markers and a better understanding
of the underlying pathophysiology. As omics data accumulate, there remains a need to extract valuable knowledge from the various omics datasets.
Machine learning enables researchers to recognize and extract patterns from this large amount of data and helps to develop
models that best explain the metabolomic alterations in dyslipidemia.

In this context, we report a data-driven platform that combines targeted LC-MS/MS metabolomics with a machine learning approach to
evaluate fasting plasma amino acids and acylcarnitines in dyslipidemia. The present study proposes a new metabolomics-based method that uses
metabolites along with gender-specific attributes to better understand the underlying intermediates involved in dyslipidemia.

2. Methods
2-1-Data collection and experiments

In this study, 1200 participants were randomly selected from our previous study on the Surveillance of Risk Factors of Noncommunicable Diseases (NCDs) in
30 provinces of Iran (STEPs 2016 Country report in Iran) (1), which followed the WHO STEPwise approach to surveillance. The ethics committee of the
Endocrinology and Metabolism Research Institute, Tehran University of Medical Sciences (IR.TUMS.EMRI.REC. 1395.00141) approved the study protocol, and
the study was performed in accordance with the Declaration of Helsinki. The purpose of the study was explained to the patients, and written informed
consent was obtained from all participants.

Venous blood was collected in tubes containing sodium fluoride and EDTA after 12 h of fasting. Biochemical laboratory tests were performed using
commercial Roche kits (Roche Diagnostics, Mannheim, Germany) and a Cobas C311 autoanalyzer. A portion of each plasma sample was set aside for
metabolomic analyses.

Tandem mass spectrometry

A Thermo Scientific Dionex UltiMate 3000 standard HPLC system coupled to an API 3200 triple quadrupole mass spectrometer (SCIEX) in positive electrospray
ionization mode was used for flow-injection MS/MS analysis of fasting plasma samples.

After injection of a 5 μL sample, a total of 50 metabolites, including 20 amino acids and 30 acylcarnitines, were analyzed. The mobile phase was
75% aqueous acetonitrile. Data processing and metabolite quantification were performed using Multiquant software (ABI Sciex). For calibration and
calculation of analyte concentrations, ratios of the signals of the metabolites relative to their isotope-labeled internal standards were used. The full
protocol of the analytical procedures can be found in the relevant reference (7).
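As a rough illustration of ratio-based quantification against an internal standard, the sketch below uses one common single-point form of the calculation. It is an assumption for illustration only, not the protocol of reference (7): the function name, the `response_factor` parameter, and the example numbers are all hypothetical.

```python
def concentration_from_ratio(analyte_signal: float,
                             istd_signal: float,
                             istd_conc: float,
                             response_factor: float = 1.0) -> float:
    """Estimate an analyte concentration from its signal ratio to an
    isotope-labeled internal standard (ISTD).

    Assumed single-point model (hypothetical, for illustration):
        conc = (analyte_signal / istd_signal) * istd_conc / response_factor
    The result has the same unit as istd_conc.
    """
    if istd_signal <= 0:
        raise ValueError("internal standard signal must be positive")
    if response_factor <= 0:
        raise ValueError("response factor must be positive")
    return (analyte_signal / istd_signal) * istd_conc / response_factor


# Made-up numbers: the analyte peak is twice the ISTD peak,
# and the ISTD was spiked at 50 (arbitrary concentration units).
example_conc = concentration_from_ratio(2000.0, 1000.0, 50.0)
print(example_conc)
```

In practice a multi-point calibration curve is often fitted instead of a single response factor; the sketch only shows why a signal ratio, rather than a raw signal, is the quantity being calibrated.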

2-2-Data processing and analysis

Patients were classified into two groups (non-drug and drug-receiving) based on their history of receiving lipid-lowering medications during the study.

Missing data and outliers in the non-drug-receiving group were excluded. The data for TG, HDL, TC, and non-HDL cholesterol were labeled into binary classes
based on reference (1). A TC concentration ≥ 200 mg/dL (≥ 5.2 mmol/L) was used to define hypercholesterolemia (TC class). High non-HDL cholesterol
was defined as non–HDL ≥ 130 mg/dL (≥ 3.4 mmol/L). Serum HDL levels
A total of 1200 patients were selected from our previous study (1). The baseline characteristics of enrolled patients are presented in Table 1. In this study,
1094 patients had no history of receiving lipid-lowering drugs and were regarded as the non-drug receiving group.

TG data analysis

A single-point TG cut-off of 150 mg/dL was used to divide the TG group into two groups: normal (≤ 150 mg/dL) and abnormal (> 150 mg/dL).
Before sampling, 762 patients had normal TG levels and 321 had abnormal TG levels. After sampling based on sex, age, and BMI,
there were 235 and 205 samples in the TG normal and abnormal groups, respectively.
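The binary labeling described above can be sketched as follows. The cut-offs are the ones quoted in the text (TG > 150 mg/dL, TC ≥ 200 mg/dL, non-HDL cholesterol ≥ 130 mg/dL); the function names are hypothetical, and HDL is left out because its threshold is not stated in this section.

```python
# 0 = normal, 1 = abnormal. All inputs are in mg/dL.

def label_tg(tg: float) -> int:
    """Normal is <= 150 mg/dL; abnormal is strictly greater than 150 mg/dL."""
    return int(tg > 150)


def label_tc(tc: float) -> int:
    """Hypercholesterolemia is TC >= 200 mg/dL."""
    return int(tc >= 200)


def label_non_hdl(non_hdl: float) -> int:
    """High non-HDL cholesterol is >= 130 mg/dL."""
    return int(non_hdl >= 130)


# Made-up example values, only to show the boundary behavior.
for tg in (120.0, 150.0, 151.0):
    print(tg, label_tg(tg))
```

Note that the TG boundary is exclusive (150 mg/dL is still normal), whereas the TC and non-HDL boundaries are inclusive.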

The feature selection for TG

The Mann–Whitney U-test (Figure 1A and Table 2S) showed that the following amino acids were significantly higher in the abnormal TG group than in the
normal group: alanine (459.63±96.25, 401.38±97.10), glutamic acid (71.07±12.31, 67.88±14.00), leucine (137.63±27.46, 121.87±25.85),
phenylalanine (65.56±10.95, 63.92±11.19), tyrosine (74.80±15.50, 71.19±14.33), valine (288.89±53.70, 253.20±48.57), proline (271.33±84.09, 251.04±82.55),
lysine (189.42±48.39, 180.57±43.30), and tryptophan (74.27±16.63, 68.78±14.52).

The analysis showed that glycine (256.41±80.06, 273.65±77.93), serine (97.77±26.87, 109.46±30.20), and asparagine (45.52±17.22, 49.51±20.01) were
lower in the abnormal TG group than in the normal TG group.

Among acylcarnitines, the test showed that C0 (59.40±12.82, 55.06±12.41), C3 (0.99±0.43, 0.83±0.34), C16 (0.20±0.06,
0.18±0.05), and C18:2OH (0.04±0.03, 0.03±0.02) were significantly higher in the abnormal TG group than in the normal group. Furthermore, C4OH (0.05±0.02,
0.06±0.02), C8 (0.31±0.36, 0.37±0.43), C8:1 (0.30±0.17, 0.35±0.18), C10 (0.39±0.37, 0.47±0.46), C10:1 (0.36±0.33, 0.44±0.37), and C14:2 (0.09±0.04,
0.11±0.05) were significantly lower in the abnormal TG group than in the normal group.

The values in parentheses are the mean ± standard deviation of the abnormal and normal groups, respectively, for each factor.
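For readers unfamiliar with the Mann–Whitney U-test, the following minimal sketch computes the U statistic from rank sums, using average ranks for ties. It is an illustration on made-up numbers, not the authors' analysis code; the helper names are hypothetical, and the p-value calculation is omitted (a statistics library such as SciPy's `scipy.stats.mannwhitneyu` would normally be used for that).

```python
def midranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y), where U_x = R_x - n_x (n_x + 1) / 2 and U_x + U_y = n_x * n_y."""
    ranks = midranks(list(x) + list(y))
    n_x, n_y = len(x), len(y)
    rank_sum_x = sum(ranks[:n_x])
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    return u_x, n_x * n_y - u_x


# Made-up concentrations for two small groups.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
print(mann_whitney_u(group_a, group_b))
```

U_x counts the pairs in which a value from the first group exceeds a value from the second (ties count one half), so a U near 0 or near n_x * n_y indicates strong separation between the groups.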

Surprisingly, participants' height in the TG normal group (161.60±10.63) was significantly lower than that in the TG
abnormal group (164.19±9.90).

The alanine aminotransferase (ALT) in the normal group (19.51±8.69) was significantly lower than that in the abnormal group (23.82±14.85).

In this study, both SelectKBest and RFECV were used to identify the top 10 optimal features with the highest weights for classifying TG groups. SelectKBest,
which is a univariate feature selection method, showed that alanine, leucine, tyrosine, valine, glycine, proline, serine, tryptophan, asparagine, and diastolic
blood pressure were the most relevant features for TG group classification. The optimal features extracted using RFECV were C0, C14:2, alanine, leucine,
valine, threonine, serine, tryptophan, asparagine, and weight.
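The two feature selection approaches named above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the data set, the logistic regression estimator inside RFECV, the scaling step, and the cross-validation settings are all assumptions. The chi-square scoring function is used because the paper mentions SelectKBest with chi-square elsewhere.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the study data (assumption): 300 samples, 25 features.
X, y = make_classification(n_samples=300, n_features=25, n_informative=6,
                           random_state=0)
X = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative inputs

# Univariate selection: score each feature on its own and keep the best 10.
kbest = SelectKBest(score_func=chi2, k=10).fit(X, y)
kbest_idx = kbest.get_support(indices=True)

# Recursive feature elimination with cross-validation: repeatedly drop the
# weakest feature and keep the subset size with the best cross-validated score.
rfecv = RFECV(LogisticRegression(max_iter=1000), step=1, cv=5,
              scoring="roc_auc").fit(X, y)
rfecv_idx = rfecv.get_support(indices=True)

print("SelectKBest kept:", list(kbest_idx))
print("RFECV kept:", list(rfecv_idx))
```

Because SelectKBest scores features one at a time while RFECV evaluates them jointly through a model, the two methods can return different feature sets, as they do in the results reported here.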

Pearson's correlation coefficients were used to measure the strength of the linear relationship between pairs of features (Figure S1) and between each
feature and TG (Figure 1B). As shown in Figure 1B, valine, leucine, and alanine had TG correlations greater than 0.3. For the inter-correlation of
features (Figure S1), scores greater than 0.6 were considered to indicate a strong correlation. The analysis showed strong correlations (Pearson scores
greater than 0.6) for C14:1 with C16 and C10:1, C14:2 with C14:1 and C10:1, C18 with C16, and valine with tyrosine and leucine.
Furthermore, the point-biserial correlation (Figure S1) showed that alanine, leucine, valine, serine, tryptophan, and weight had the highest correlation
(greater than 0.3) with the TG classes.
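The relationship between the two coefficients used above can be shown with a short sketch: the point-biserial correlation between a continuous feature and a binary class is numerically the same as Pearson's correlation computed with the class coded as 0/1. The functions and the toy numbers below are illustrative assumptions, not the study data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def point_biserial(values, labels):
    """r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), with s_n the population SD."""
    group1 = [v for v, label in zip(values, labels) if label == 1]
    group0 = [v for v, label in zip(values, labels) if label == 0]
    n, n1, n0 = len(values), len(group1), len(group0)
    mean_all = sum(values) / n
    sd_pop = sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    mean_diff = sum(group1) / n1 - sum(group0) / n0
    return mean_diff / sd_pop * sqrt(n1 * n0 / n ** 2)


# Toy data: a feature that tends to be higher in class 1.
toy_values = [1.0, 2.0, 3.0, 4.0]
toy_labels = [0, 0, 1, 1]
print(point_biserial(toy_values, toy_labels))
print(pearson(toy_values, toy_labels))
```

Both calls print the same value for the toy data, which is why a threshold such as 0.3 can be read on the same scale for feature-to-TG and feature-to-class correlations.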

Machine learning for TG classification

A dataset containing data for 440 instances (235 samples with normal TG scores and 205 samples with abnormal TG scores) was used for TG classification.
Data in both the normal and abnormal groups were adjusted by age (mean 53.76 and 53.93 years old, respectively), sex (frequency 220 and 220,
respectively), and BMI (mean 28.6 and 28.5, respectively), and the differences between them in each group were not statistically significant (p-value > 0.05).

Based on feature selection methods, a combination of different feature sets was used, and the highest accuracy was achieved when all statistically
significant features were used. This feature set for TG classification had 25 independent attributes and the target feature (a dependent feature) (Figure S1).
The target was labeled as either 0 or 1, where 0 was defined as a person with a normal TG value and 1 as a person with an abnormal TG value.

For TG classification, 21 ML models were evaluated, and the top five models based on ROC curves were Logistic Regression (LR), Support Vector Classification
(SVC), Linear Support Vector Classifier (LSVC), Random Forest (RF), and Linear Discriminant Analysis (LDA). A comparison of the top models based on the
ROC curve with all 25 independent attributes is shown in Figure 2A. These top five models showed satisfactory TG classification performance, with AUCs
ranging between 0.76 and 0.81. The SVM model (SVC; AUC = 0.81, standard deviation of test accuracy = 0.04) performed slightly
better than the other models and was considered the optimal model for TG classification. For this model, the mean CV score (K = 10), recall (true
positive/(true positive + false negative)), precision (true positive/(true positive + false positive)), F1, and standard deviation of the test accuracy were
0.69, 0.70, 0.72, 0.71, and 0.04, respectively. In terms of precision, LDA performed better, with a precision of 0.73 (standard deviation of test
accuracy = 0.05).

However, the data showed that LSVC had the highest recall score (recall = 0.79 and standard deviation of test accuracy = 0.07). In addition, the analysis
demonstrated that LSVC retained a strong predictive performance for F1 with a score of 0.75. Table S3 summarizes all characteristics of the top five models.
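A model comparison of this kind can be sketched with scikit-learn as shown here. The example is a minimal illustration under stated assumptions: the data are synthetic (440 samples and 25 features, mirroring only the shape of the TG data set), only two of the models are included, and the scaling step, kernel, and random seeds are choices made for the sketch rather than settings reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the TG data set (assumption).
X, y = make_classification(n_samples=440, n_features=25, n_informative=8,
                           random_state=0)

models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVC": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# 10-fold stratified cross-validation, scored by area under the ROC curve.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc_by_model = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    auc_by_model[name] = (scores.mean(), scores.std())
    print(f"{name}: mean AUC = {scores.mean():.2f}, SD = {scores.std():.2f}")
```

Reporting the standard deviation across folds alongside the mean, as the text does, helps show whether a small difference in AUC between two models is larger than the fold-to-fold variation.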

Feature importance for top 5 models in TG class

The ELI5 library (15) (accessed 2022-06-25) was used to extract feature importance for each model. Figures 2B and 2C show the top 10 most important
features of the SVM and LR used for prediction, respectively. Six features were common to both SVM and LR feature importance (Figure 2D): alanine,
phenylalanine, methionine, C16, C14:2, and C3.

For both SVM and LR (Figures 2B and 2C), we also assessed the models using a 2×2 confusion matrix with true TG labels on one axis and predicted TG
labels on the other. The matrix showed that both SVM and LR performed better in predicting the abnormal TG class: the proportion of abnormal TG cases
predicted correctly (true positives) was 0.80 for SVM and 0.79 for LR, whereas the proportion of normal TG cases predicted correctly (true negatives) was 0.72 and 0.73, respectively.
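The quantities read from a 2×2 confusion matrix can be made concrete with a small sketch. It assumes the matrix is normalized by row (by true class), in which case the diagonal entries are the recall for the abnormal class and the specificity for the normal class. The labels below are made up, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = abnormal."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def summarize(y_true, y_pred):
    """Recall, specificity, precision, and F1 from the confusion counts."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn)          # row-normalized entry for the abnormal class
    specificity = tn / (tn + fp)     # row-normalized entry for the normal class
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "f1": f1}


# Made-up labels: 4 abnormal and 4 normal cases, one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = summarize(y_true, y_pred)
print(metrics)
```

A model with higher recall than specificity, as reported for SVM and LR here, detects abnormal cases more reliably than it rules them out.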

Data for TC, HDL, and non-HDL cholesterol

Data processing methods for the TC, HDL, and non-HDL cholesterol groups were the same as those used for the TG group. After excluding missing data
points and outliers from each class, matching (sampling) of cases and controls in the groups was performed based on age, sex, and BMI. The number of
controls in the TC, HDL, and non-HDL cholesterol groups after sampling was 187, 271, and 302, respectively. Furthermore, after sampling, in the abnormal
group there were 162, 245, and 272 cases in TC, HDL, and non-HDL cholesterol groups, respectively.

Three feature selection methods (statistical analysis using the Mann–Whitney U-test, SelectKBest with chi-square, and RFECV) were used to extract a
combination of important features (Table S4). Table S4 summarizes all extracted features using the three mentioned methods for the TG, TC, HDL, and non-
HDL cholesterol groups. The Pearson correlation coefficients of all HDL features are shown in Figure S2.

The ML data for HDL classification using statistically significant attributes (Figure S3A) showed that the top prediction model was a Random Forest with an
AUC score of 0.73. The prediction power of the model was slightly more substantial for abnormal patients, as depicted in Figure S3B using a confusion
matrix. The top features (asparagine, C3, C0, glutamic acid, alanine, and C5) for HDL classification using RF are plotted in Figure S3C.

Non-HDL cholesterol classification data showed that the most satisfactory prediction accuracy was for the SVM model, with an AUC score of 0.72, as shown
in Figure S4A. The top features proposed by ELI5 feature importance were C16, valine, and C18:1 (Figure S4B and S6C). The normalized confusion matrix
showed that the model was stronger in the normal non-HDL-C group (Figure S4B). The Pearson correlation coefficients of all features for HDL and TC levels
are shown in Figures S5 and S6, respectively.

The machine learning models could not reliably classify cases and controls in the TC class. The most accurate classification model was RF, with an AUC
of only 0.61 (Figure S7A). The top feature proposed by ELI5 was tryptophan (Figure S7B and S9C). Point-biserial correlations also showed that alanine
and C18 had the highest correlation with TC classes (Figure S8).

Association between gender and dyslipidemia

In this study, we stratified the association between all attributes and dyslipidemia by sex to determine which factors were more prominent in males or in
females. These gender-specific attributes may help in screening programs. Figure 3 shows the p-values for sex-specific attributes in each class of
dyslipidemia.

The analysis of the data (Figure 3A) showed that in females, patients with abnormal TG had lower levels of serine (Figure 3B), asparagine, threonine, glycine,
citrulline, C14:2, C14:1, C10, C8:1, C8, and C5:DC and higher levels of proline, glutamic acid, and C18:2OH (and also height) than the normal TG group,
whereas these metabolites showed no differences in the male population between the abnormal TG and normal TG groups.

In the male population, patients with abnormal TG levels had lower levels of C3 and higher levels of lysine, histidine, C18, C5:OH, C5, and C3 (Figure 3B)
(as well as ALT) than the normal TG group, whereas there were no differences in the female population between the abnormal TG and normal TG groups.
Furthermore, we found that both males and females with abnormal TG levels had lower levels of alanine (Figure 3B) and tryptophan and higher levels of C0,
C16, C10:1, leucine, and valine than the normal TG group.

For women in the TC group, the waist-hip ratio and aspartic acid level were significantly higher in the abnormal and normal groups, respectively. In contrast,
males in the normal TC group had significantly lower tryptophan, proline, ornithine, and alanine levels than males with abnormal TC levels (Figure 3C).

The data suggested that females with normal HDL had significantly lower ALT and waist-hip ratio and higher alanine, citrulline, arginine, and C14OH
levels than the abnormal group (Figure 3D).

Furthermore, higher asparagine, serine, threonine, and glycine levels and lower mean diastolic pressure were observed in females with normal non-HDL. In
this group, males with abnormal non-HDL showed significantly higher levels of tryptophan, ornithine, tyrosine, leucine, alanine, C18, C16, C5OH, C5, and C0
than males with normal non-HDL (Figure 3E).

To check for differences in attributes based on gender in normal and abnormal groups, data were stratified by outcome (normal/abnormal), and p-values
were calculated for males and females in both normal and abnormal TG, TC, non-HDL, and HDL groups (Figure S9A). For simplicity, the data for the TG groups
are reported here, and the full details are available in Figure S9A-E.

In the normal TG group, the concentrations of asparagine, serine, glycine, and C18:1 differed significantly between men and women. Furthermore, WC,
histidine, tyrosine, aspartic acid, C14:2, C14, C12, and C0 levels differed significantly by sex in the abnormal TG group (Figure S9B). In both the
normal and abnormal TG groups, ALT, waist-hip ratio, HC, BMI, height, tryptophan, proline, citrulline, valine, phenylalanine, methionine, leucine, glutamic
acid, C18, C16:1, C5DC, C5:OH, C5, C4DC, C3, and C3DC differed significantly between males and females.

Pathway and Metabolite enrichment analysis

According to the metabolite enrichment analysis, 33 pathways were enriched (Table S1). Pathways with an adjusted p-value < 0.05 included aminoacyl-tRNA
biosynthesis; valine, leucine, and isoleucine biosynthesis; alanine, aspartate, and glutamate metabolism; arginine biosynthesis; glyoxylate and
dicarboxylate metabolism; glycine, serine, and threonine metabolism; histidine metabolism; phenylalanine, tyrosine, and tryptophan biosynthesis;
pantothenate and CoA biosynthesis; D-glutamine and D-glutamate metabolism; nitrogen metabolism; glutathione metabolism; phenylalanine metabolism; and
cysteine and methionine metabolism. These pathways are illustrated in Figure 4.

Network-based analysis was used to determine the relationships between the metabolites listed in Table S1 and the enzymes and reactions involved in their
metabolism. The network contains 185 nodes and 315 edges. To identify the nodes with the greatest impact on the network, the cytoHubba application was
applied. In terms of betweenness, the most critical nodes (Figure 5) were protein digestion and absorption (hsa04974), ABC transporters (hsa02010),
mineral absorption (hsa04978), and pancreatic secretion (hsa04972) in the pathway cluster; Na+/K+-exchanging ATPase (7.2.2.13) and triacylglycerol
lipase (3.1.1.3) in the enzyme class; and L-phenylalanine (C00079), L-serine (C00065), L-aspartic acid (C00049), L-leucine (C00123), L-glutamic acid
(C00025), L-valine (C00183), L-alanine (C00041), glycine (C00037), L-glutamine (C00064), L-tryptophan (C00078), L-proline (C00148), and triacylglycerol
(C00422) in the metabolite group.
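Betweenness, the measure used above to rank nodes, counts how often a node lies on the shortest paths between other pairs of nodes. The sketch below implements Brandes' algorithm for an unweighted, undirected graph in plain Python. It is an illustration only: the analysis in the paper was done with cytoHubba, and the four-node toy graph here is made up rather than taken from the metabolite network.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    adjacency maps each node to a list of its neighbors; every edge must be
    listed in both directions because the graph is treated as undirected.
    """
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        stack = []
        predecessors = {v: [] for v in adjacency}
        n_paths = {v: 0 for v in adjacency}   # number of shortest paths from source
        n_paths[source] = 1
        distance = {v: -1 for v in adjacency}
        distance[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        dependency = {v: 0.0 for v in adjacency}
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: total / 2 for v, total in score.items()}


# Toy star-shaped graph: node "B" connects "A", "C", and "D".
toy_graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
print(betweenness(toy_graph))
```

In the toy graph every shortest path between two outer nodes passes through "B", so "B" receives the full score of 3 (one for each of the three pairs) and the outer nodes receive 0. Hub metabolites in a metabolic network are ranked highly for the same reason.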

Analysis of drug-receiving groups

In the drug-receiving group (patients with a history of receiving lipid-lowering drugs), there were 106 patients. In this group, the concentrations of C18,
TG, TC, non-HDL cholesterol, and HDL differed significantly between the two LDL classes defined by an LDL cut-off of 100 mg/dL (Figure S10). In addition,
between the TC classes defined by a cut-off of 200 mg/dL, the Mann–Whitney (independent samples) test showed that C4DC, C18, serine, asparagine, TG,
non-HDL cholesterol, and HDL differed significantly (Figure S11).

4. Discussion
Our previous study showed that serum lipid levels in the adult Iranian population were at dangerous levels, with eight out of ten people having
undesirable serum lipid levels (1). The prevalence of low HDL, high non-HDL cholesterol, hypertriglyceridemia, and
hypercholesterolemia in the adult Iranian population was 60%, 39.5%, 28.0%, and 26.7%, respectively.

The systematic analysis and study of metabolites in biological samples is a part of metabolomics. Using metabolites such as amino acids and
acylcarnitines, metabolomics can be used to extract and provide useful knowledge from normal and abnormal samples, which eventually helps us explore
pathophysiological conditions and the molecular mechanisms of some diseases and disorders (16, 17).

To the best of our knowledge, this is the first study to systematically investigate amino acids and acylcarnitines as risk markers for dyslipidemia using LC-
MS/MS. These factors have not been extensively studied in the field of precision medicine.

There is insufficient evidence regarding machine learning applications for dyslipidemia classification. A novel and significant contribution of this study is a
way to solve two-class classification (normal or abnormal) based on data on the concentrations of amino acids and acylcarnitines in plasma.

The main constituents of the human lipid fraction are cholesterol, TGs, and high-density lipoproteins. Studies have shown that daily and even hourly
fluctuations in lipid and lipoprotein levels are often encountered in hyperlipidemic patients (18). For example, in response to meals, TGs change
dramatically, becoming 5–10 times higher than fasting levels just a few hours after eating. Although we used fasting plasma to exclude these fluctuations,
even fasting levels appear to vary considerably from day to day, and these modest changes in fasting TG levels might cause substantial problems for machine
learning algorithms. We attempted to exclude outlier data points from our data frame to reduce potential data sparsity and noise. ML prediction showed that
by using amino acids and acylcarnitines as attributes, only TG classification had satisfactory accuracy.

In this regard, a study conducted by Yousri et al. on the relationship between metabolite levels and dyslipidemia reported that TG was the most significantly
perturbed lipid pathway (19).

The goal of this study was to classify the TG group based on a single-point TG cut-off of 150 mg/dL. Alanine, glutamic acid, leucine, phenylalanine, tyrosine,
valine, proline, lysine, and tryptophan were significantly increased in the abnormal TG group compared to the normal group, whereas glycine, serine, and
asparagine showed a decreasing trend. After adjusting the data based on sex, age, and BMI, several features stood out as having the highest
classification weights and the largest differences between the normal and abnormal groups. Alanine, leucine, and valine showed the largest
differences between the normal and abnormal TG groups based on both the MW-U test and Pearson's correlation coefficient. The concentrations of these
three features were markedly higher in the abnormal TG group than in the normal group, which is in accordance with the study of Yousri and
coworkers (19). In another study, for both sexes, the amounts of valine and leucine correlated positively with TG levels and negatively with HDL
cholesterol levels (20). Rose et al. showed that fasting serum TG levels were significantly higher after orally administered L-alanine (21). They suggested
that this alteration in serum TG levels may have been due to increased alanine metabolism to pyruvate and its incorporation into lipids under insulin stimulation.

Wiklund et al. examined the association between TG concentrations and serum amino acid profiles during pubertal growth to predict hypertriglyceridemia in
early adulthood (22). Although this was studied only in girls, they found that serum leucine and isoleucine levels correlated significantly with future TG levels.

As the underlying mechanism for the observed elevation, in the state of obesity, a decline in the catabolism of valine and leucine in adipose tissue can lead to
an increase in their circulating levels. In addition, readily usable lipid and glucose substrates can avoid the requirement of amino acids for metabolism in
adipose tissue (23). In this study, 347 participants had BMI > 25.

With regard to acylcarnitines, C0, C3, C16, and C18:2OH increased in the abnormal group. Furthermore, C4OH, C8, C8:1, C10, C10:1, and C14:2 levels were
significantly lower in the abnormal TG group than in the normal group.

C3 acylcarnitine, whose levels were higher in the abnormal TG group than in the normal group, is a byproduct of the amino acids valine and isoleucine (24).

A significantly increased level of alanine aminotransferase (ALT), a common laboratory marker of underlying chronic liver disease (25), was also
observed in the abnormal TG group. These findings are in line with those of Chen et al. (26), who proposed that serum ALT levels were independently
correlated with hepatic TG content in obese subjects. They also suggested that ALT level might be a more appropriate predictor of the degree of
non-alcoholic fatty liver disease (NAFLD) than aspartate aminotransferase (AST) and gamma-glutamyltransferase (GGT).

Although the direct determination of dyslipidemia factors in the laboratory is the most accurate and preferred method, when this is not available, machine
learning can help with less computationally expensive methods and shorter time frames. Here, we showed that with both the SVC and LR models, TG values can be
accurately classified into normal and abnormal classes based on plasma concentrations of amino acids and acylcarnitines. Both models were applied to the
statistically significant features. Alanine, phenylalanine, methionine, C3, C14:2, and C16 were all important features for TG classification according
to the ELI5 library, which was used to extract feature importance from the SVC and LR models.

Our findings showed that in the drug-receiving groups, the concentration of acylcarnitine C18 was statistically significant in groups stratified by both LDL and
cholesterol. In both groups of people with abnormal LDL and cholesterol levels, C18 concentrations were higher than those in the normal groups. C18 is a
long-chain acylcarnitine. Elevated levels of C18 have been shown in people with carnitine palmitoyltransferase-2 deficiency, which is the most
common inherited disorder of lipid metabolism in adults (27).

Similar to C18, C4DC concentrations were significantly higher in the patients with abnormal HDL levels. Serine and asparagine showed lower concentrations
in the abnormal cholesterol groups than in the drug-receiving groups.

Despite the fact that there are not enough published studies with adequate data stratified by sex, there is strong evidence that sex, as an endogenous factor,
influences metabolism, incidence or severity of diseases, and therapy (16, 28). In this study, we also examined the relationship between sex and dyslipidemia
in different classes, which can provide valuable information through precision medicine.

Conclusion
The comprehensive output of this study, along with gender-specific attributes, provides a better understanding of metabolite dysregulation in dyslipidemia.
Machine learning modeling has introduced several highly accurate models for the detection of patients with abnormal TG levels. Alanine, phenylalanine, C16,
methionine, C14:2, and C3 were the common diagnostic metabolites in the two most accurate models. The metabolic pathways that have the greatest
impact on abnormal TG development are valine, leucine, and isoleucine biosynthesis; phenylalanine, tyrosine, and tryptophan biosynthesis; aminoacyl-tRNA
biosynthesis; D-Glutamine and D-glutamate metabolism; and arginine biosynthesis.

Acylcarnitine names
Free carnitine (C0), acetylcarnitine (C2), propionylcarnitine (C3), Malonylcarnitine (C3-DC), butyrylcarnitine (C4), Methylmalonyl-/succinylcarnitine (C4-DC), 3-
OH-iso-/butyrylcarnitine (C4-OH), isovalerylcarnitine (C5), Tiglylcarnitine (C5:1), 3-OH-isovalerylcarnitine (C5-OH), glutarylcarnitine (C5DC), hexanoylcarnitine
(C6), octanoylcarnitine (C8), Octenoylcarnitine (C8:1), decanoylcarnitine (C10), Decenoylcarnitine (C10:1), dodecanoylcarnitine (C12), tetradecanoylcarnitine
(C14), Tetradecenoylcarnitine (C14:1), Tetradecadienoylcarnitine (C14:2), 3-OH-tetradecanoylcarnitine (C14-OH), hexadecanoylcarnitine (C16), 3-OH-
hexadecanoylcarnitine (C16-OH), 3-OH-hexadecenoylcarnitine (C16:1-OH), Hexadecenoylcarnitine (C16:1), octadecanoylcarnitine (C18),
Octadecenoylcarnitine (C18:1), 3-OH-octadecanoylcarnitine (C18-OH), 3-OH-octadecenoylcarnitine (C18:1-OH), Octadecadienoylcarnitine (C18:2).

Abbreviations
Body mass index (BMI), false discovery rate (FDR), branched-chain amino acids (BCAA), aromatic amino acids (AAA), triglyceride (TG), total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), machine learning (ML), liquid chromatography-tandem mass spectrometry (LC-MS/MS), Mann–Whitney U-test (MWU), alanine aminotransferase (ALT), receiver operating characteristic (ROC).

Declarations
Ethics approval and consent to participate

The ethics committee of the Endocrinology and Metabolism Research Institute, Tehran University of Medical Sciences (IR.TUMS.EMRI.REC.1395.00141) approved the study protocol, and the study was performed in accordance with the Declaration of Helsinki.

Consent for publication

The purpose of the study was explained to the patients and written informed consent was obtained from all participants.

Availability of data and materials

The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Competing interests

The authors declare that they have no competing interests.

Funding

N/A

Authors' contributions

F.R., A.E., H.D., S.H., and H.M. contributed to the study conception and design. B.A., S.MF., and S.AM. provided study patients and monitored data and specimen collection. N.N., A.DM., and Sh.H. performed the experiments. A.E., F.R., M.A., H.M., M.M., and M.KJ. analyzed the data. A.E., H.M., M.A., Sh.H., and H.DB. wrote the manuscript. All authors read and approved the final manuscript.

References
  1. Aryan Z, Mahmoudi N, Sheidaei A, Rezaei S, Mahmoudi Z, Gohari K, et al. The prevalence, awareness, and treatment of lipid abnormalities in Iranian
      adults: Surveillance of risk factors of noncommunicable diseases in Iran 2016. J Clin Lipidol. 2018 Dec;12(6):1471-1481.e4.
  2. National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult
     Treatment Panel III). Third Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High
     Blood Cholesterol in Adults (Adult Treatment Panel III) final report. Circulation. 2002 Dec 17;106(25):3143–421.
  3. Blüher M. Adipose tissue dysfunction contributes to obesity related metabolic diseases. Best Pract Res Clin Endocrinol Metab. 2013 Apr 1;27(2):163–77.
  4. Lin CF, Chang YH, Chien SC, Lin YH, Yeh HY. Epidemiology of Dyslipidemia in the Asia Pacific Region. Int J Gerontol. 2018 Mar 1;12(1):2–6.
  5. Vekic J, Zeljkovic A, Stefanovic A, Jelic-Ivanovic Z, Spasojevic-Kalimanovska V. Obesity and dyslipidemia. Metabolism. 2019 Mar 1;92:71–81.
  6. Johnson CB, Davis MK, Law A, Sulpher J. Shared Risk Factors for Cardiovascular Disease and Cancer: Implications for Preventive Health and Clinical
     Care in Oncology Patients. Can J Cardiol. 2016 Jul;32(7):900–7.
  7. Esmati P, Najjar N, Emamgholipour S, Hosseinkhani S, Arjmand B, Soleimani A, et al. Mass spectrometry with derivatization method for concurrent
     measurement of amino acids and acylcarnitines in plasma of diabetic type 2 patients with diabetic nephropathy. J Diabetes Metab Disord. 2021
     Jun;20(1):591–9.
  8. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825–30.
  9. Shapiro SS, Wilk MB. An analysis of variance test for normality (complete samples). Biometrika. 1965 Dec 1;52(3–4):591–611.
10. Freedman D, Pisani R, Purves R. Statistics: Fourth International Student Edition. W. W. Norton & Company.
11. Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016 Sep 15;32(18):2847–9.
12. Evaluation of Feature Selections on Movie Reviews Sentiment. IEEE Conference Publication [Internet]. [cited 2022 Sep 3]. Available from: https://ieeexplore.ieee.org/document/9234287
13. FELLA: an R package to enrich metabolomics data. BMC Bioinformatics [Internet]. [cited 2022 Nov 15]. Available from: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2487-5
14. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction
    networks. Genome Res. 2003 Nov;13(11):2498–504.
15. Korobov M, Lopuhin K. ELI5 Documentation. :113.
16. Costanzo M, Caterino M, Sotgiu G, Ruoppolo M, Franconi F, Campesi I. Sex differences in the human metabolome. Biol Sex Differ. 2022 Jun 15;13(1):30.
17. Beger RD, Schmidt MA, Kaddurah-Daouk R. Current Concepts in Pharmacometabolomics, Biomarker Discovery, and Precision Medicine. Metabolites.
    2020 Mar 27;10(4):E129.
18. Weintraub MS, Grosskopf I, Charach G, Eckstein N, Ringel Y, Maharshak N, et al. Fluctuations of Lipid and Lipoprotein Levels in Hyperlipidemic
      Postmenopausal Women Receiving Hormone Replacement Therapy. Arch Intern Med. 1998 Sep 14;158(16):1803–6.
19. Yousri NA, Suhre K, Yassin E, Al-Shakaki A, Robay A, Elshafei M, et al. Metabolic and Metabo-Clinical Signatures of Type 2 Diabetes, Obesity, Retinopathy,
    and Dyslipidemia. Diabetes. 2022 Feb 1;71(2):184–205.
20. Fukagawa NK, Martin JM, Wurthmann A, Prue AH, Ebenstein D, O’Rourke B. Sex-related differences in methionine metabolism and plasma homocysteine
      concentrations. Am J Clin Nutr. 2000 Jul;72(1):22–9.
21. Rose DP, Leklem JE, Fardal L, Baron RB, Shrago E. Effect of oral alanine loads on the serum triglycerides of oral contraceptive users and normal subjects.
    Am J Clin Nutr. 1977 May;30(5):691–4.
22. Wiklund P, Zhang X, Tan X, Keinänen-Kiukaanniemi S, Alen M, Cheng S. Serum Amino Acid Profiles in Childhood Predict Triglyceride Level in Adulthood: A
      7-Year Longitudinal Study in Girls. J Clin Endocrinol Metab. 2016 May;101(5):2047–55.
23. Newgard CB. Interplay between lipids and branched-chain amino acids in development of insulin resistance. Cell Metab. 2012 May 2;15(5):606–14.

24. Newgard CB, An J, Bain JR, Muehlbauer MJ, Stevens RD, Lien LF, et al. A Branched-Chain Amino Acid-Related Metabolic Signature that Differentiates
    Obese and Lean Humans and Contributes to Insulin Resistance. Cell Metab. 2009 Apr;9(4):311–26.
25. Siddiqui MS, Sterling RK, Luketic VA, Puri P, Stravitz RT, Bouneva I, et al. Association between high-normal levels of alanine aminotransferase and risk
    factors for atherogenesis. Gastroenterology. 2013 Dec;145(6):1271-1279.e1-3.
26. Chen Z, Han CK, Pan LL, Zhang HJ, Ma ZM, Huang ZF, et al. Serum alanine aminotransferase independently correlates with intrahepatic triglyceride
    contents in obese subjects. Dig Dis Sci. 2014 Oct;59(10):2470–6.
27. Adeva-Andany MM, Calvo-Castro I, Fernández-Fernández C, Donapetry-García C, Pedre-Piñeiro AM. Significance of l-carnitine for human health. IUBMB
    Life. 2017;69(8):578–94.
28. Mauvais-Jarvis F, Berthold HK, Campesi I, Carrero JJ, Dakal S, Franconi F, et al. Sex- and Gender-Based Pharmacological Response to Drugs. Pharmacol Rev [Internet]. 2021 Apr [cited 2022 Sep 13];73(2). Available from: https://pubmed.ncbi.nlm.nih.gov/33653873/?dopt=Abstract

Tables
Table 1. Baseline characteristics of the study participants, classified by dyslipidemia class.

Variables               TG N*          TG A           p-value  TC N           TC A           p-value  HDL N          HDL A          p-value  non-HDL cholesterol
                        (n=235)        (n=205)                 (N=186)        (N=161)                 (N=271)        (N=245)                 (N=302)

Age (year)              53.76±10.54    53.93±10.60    0.8      55.94±10.82    56.22±10.75    0.7      57.15±12.14    56.81±11.95    0.9      55.77±11.80

Gender (n)                                            0.8                                    0.5                                    0.8
  Female                119            101                     117            95                      136            124                     152
  Male                  116            104                     69             66                      135            121                     150

Area (n)                                              0.7                                    0.3                                    0.13
  Rural                 81             64                      61             67                      96             86                      102
  Urban                 154            141                     125            94                      175            159                     200

Years of education (n)                                0.3                                    0.5                                    0.001
  0                     58             35                      50             49                      69             63                      75
  1-6                   79             77                      67             44                      91             86                      116
  7-12                  69             68                      49             40                      71             63                      82
  >12                   29             25                      20             28                      40             33                      29

HTN treatment (n)                                     0.2                                    0.9                                    0.5
  No                    194            178                     156            134                     233            210                     251
  Yes                   41             27                      30             27                      38             35                      51

BMI (kg/m²)             28.60±4.67     28.48±4.74     0.7      28.17±4.95     28.04±5.21     0.8      26.88±4.87     27.08±4.87     0.6      27.78±5.00

HbA1c (%)               5.83±1.29      5.95±1.13      0.002    5.76±1.02      5.88±1.05      0.04     5.68±1.06      5.83±1.00      0.005    5.78±1.14

GLU (mg/dL)             100.18±37.40   106.63±40.90   0.004    101.74±33.23   101.19±36.75   0.62     96.74±27.20    102.94±36.34   0.02     100.59±34.0

NHC (mg/dL)             118.94±29.37   149.51±33.65   6E-20    119.13±25.32   177.50±20.33   2E-42    122.60±34.13   130.87±34.11   0.009    103.93±18.4

* Continuous variables are presented as mean ± SD, and categorical variables as the number of participants in each category. N = normal, A = abnormal. HTN: hypertension, BMI: body mass index, WC: waist circumference, HC: hip circumference, BP: blood pressure, TG: triglycerides, NHC: non-HDL cholesterol, GLU: glucose.

Figures

Figure 1

Mann–Whitney U-test p-values (A) and Pearson’s correlation coefficient (r) (B) for all studied features in the TG, TC, non-HDL cholesterol, and HDL groups based on normal and abnormal categories.
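
The kind of per-feature screening summarized in Figure 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the function names (bh_adjust, screen_features) are hypothetical, the Benjamini-Hochberg procedure is assumed as the FDR correction, and Pearson's r is assumed to be computed against the continuous lipid value.

```python
import numpy as np
from scipy.stats import mannwhitneyu, pearsonr


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downward.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def screen_features(X, y, lipid):
    """Per-feature Mann-Whitney U p-value (normal vs abnormal) and Pearson r.

    X: (samples, features) metabolite matrix; y: 0 = normal, 1 = abnormal;
    lipid: continuous lipid measurement for each sample.
    """
    pvals, rs = [], []
    for j in range(X.shape[1]):
        _, p = mannwhitneyu(X[y == 0, j], X[y == 1, j], alternative="two-sided")
        r, _ = pearsonr(X[:, j], lipid)
        pvals.append(p)
        rs.append(r)
    pvals = np.array(pvals)
    return pvals, bh_adjust(pvals), np.array(rs)


# Synthetic example: only feature 0 differs between the two groups.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 5))
X[y == 1, 0] += 1.0
lipid = 100 + 50 * y + rng.normal(scale=10, size=200)

pvals, qvals, rs = screen_features(X, y, lipid)
print(np.round(qvals, 4), np.round(rs, 2))
```

The adjusted values are never smaller than the raw p-values, so a feature that passes the FDR threshold also passes the unadjusted one.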

Figure 2

(A) Receiver operating characteristic (ROC) curves for TG classification. (B) Feature importance and confusion matrix for the SVM model in TG classification. (C) Feature importance and confusion matrix for the LR model in TG classification. (D) Alanine, phenylalanine, C16, methionine, C14:2, and C3 are the features common to the SVM and LR models.
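
For readers who want to reproduce this type of evaluation, the sketch below computes a ROC curve, the area under it, and a confusion matrix for an SVM and a logistic regression classifier on held-out data. It is an assumption-laden illustration rather than the authors' implementation: the data are synthetic, and the linear kernel, the train/test split, and the variable names are chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for normal (0) vs abnormal (1) TG classification.
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=4, n_redundant=0, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # decision_function gives a continuous score, which the ROC curve needs.
    scores = model.decision_function(X_test)
    fpr, tpr, _ = roc_curve(y_test, scores)
    results[name] = {
        "fpr": fpr,
        "tpr": tpr,
        "auc": roc_auc_score(y_test, scores),
        # Rows are true classes, columns are predicted classes.
        "confusion": confusion_matrix(y_test, model.predict(X_test)),
    }

for name, res in results.items():
    print(name, round(res["auc"], 3))
    print(res["confusion"])
```

The fpr and tpr arrays can be passed to any plotting library to draw the curves; the confusion matrix is computed at each model's default decision threshold.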

Figure 3

(A) P-values stratified by sex comparing outcomes (normal/abnormal) for the TG, TC, non-HDL, and HDL groups. Box plots show the most significant features in the TG (B), TC (C), non-HDL (D), and HDL (E) groups.

Figure 4

KEGG pathway analysis based on p-values and enrichment ratios.

Figure 5

The most significant nodes in terms of betweenness, identified with CytoHubba, in the network obtained using FELLA.

Supplementary Files
This is a list of supplementary files associated with this preprint.

    SupplementaryDyslipidemiaetemadi.docx
