Search Results (28)

Search Parameters: Keywords = "generative pre-trained transformer 2"

18 pages, 10104 KiB  
Article
From Plants to Pixels: The Role of Artificial Intelligence in Identifying Sericea Lespedeza in Field-Based Studies
by Aftab Siddique, Kyla Cook, Yasmin Holt, Sudhanshu S. Panda, Ajit K. Mahapatra, Eric R. Morgan, Jan A. van Wyk and Thomas H. Terrill
Agronomy 2024, 14(5), 992; https://doi.org/10.3390/agronomy14050992 - 8 May 2024
Viewed by 823
Abstract
The increasing use of convolutional neural networks (CNNs) has brought about a significant transformation in numerous fields, such as image categorization and identification. In the development of a CNN model to classify images of sericea lespedeza [SL; Lespedeza cuneata (Dum-Cours) G. Don] from weed images, four architectures were explored: CNN model variant 1, CNN model variant 2, the Visual Geometry Group (VGG16) model, and ResNet50. CNN model variant 1 (batch normalization with adjusted dropout method) demonstrated 100% validation accuracy, while variant 2 (RMSprop optimization with adjusted learning rate) achieved 90.78% validation accuracy. Pre-trained models, like VGG16 and ResNet50, were also analyzed; ResNet50's steady learning pattern indicated the potential for better generalization. A detailed evaluation of these models revealed that variant 1 achieved a perfect score in precision, recall, and F1 score, indicating superior optimization and feature utilization. Variant 2 presented a balanced performance, with metrics between 86% and 93%. VGG16 mirrored the behavior of variant 2, both maintaining around 90% accuracy. In contrast, ResNet50's results revealed a conservative approach for class 0 predictions. Overall, variant 1 stood out in performance, while both variant 2 and VGG16 showed balanced results. The reliability of CNN model variant 1 was highlighted by its high accuracy, suggesting potential for practical implementation in agriculture. In addition, a smartphone application for the identification of SL in a field-based trial showed promising results, with an accuracy of 98–99%. The conclusion is that a CNN model with batch normalization has the potential to play a crucial role in redefining and optimizing the management of undesirable vegetation. Full article
(This article belongs to the Section Precision and Digital Agriculture)
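For readers who want a concrete starting point, here is a minimal Keras sketch of a CNN that combines batch normalization with dropout, in the spirit of the paper's variant 1; the input size, filter counts, dropout rate, and optimizer are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch (not the authors' code): a small CNN mixing batch
# normalization with dropout. Layer sizes and rates are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def build_variant1(input_shape=(224, 224, 3), num_classes=2):
    return tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.BatchNormalization(),   # stabilizes activations per batch
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),           # the "adjusted" dropout rate is a guess
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_variant1()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```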
Figures:

Figure 1. Examples of sericea lespedeza images (first row), weed images (second row), and SL images in between weeds (third row), a combination used in the development of CNN image-based classification. The examples span different colors, lighting conditions, camera angles, and distances from the plants.
Figure 2. Flow diagram for the pipeline showing development of the image classification model and weed identification app.
Figure 3. Graphical representation of the batch normalization with adjusted dropout method. Black circles represent dropped-out (non-functional) neural network nodes.
Figure 4. Diagrammatic representation of the RMSprop optimization with adjusted dropout method.
Figure 5. Training and validation accuracies and loss for CNN model variant 1 (batch normalization with adjusted dropout) on the SL image datasets.
Figure 6. Training and validation accuracies and loss for CNN model variant 2 (RMSprop optimizer with adjusted learning rate) on the SL image datasets.
Figure 7. Training and validation accuracies and loss for the pre-trained VGG16 model on the SL image datasets.
Figure 8. Training and validation accuracies and loss for the pre-trained ResNet50 model on the SL image datasets.
Figure 9. Flowchart and smartphone app results for differentiation of SL from weed species.
16 pages, 744 KiB  
Article
Causal Inference and Prefix Prompt Engineering Based on Text Generation Models for Financial Argument Analysis
by Fei Ding, Xin Kang, Linhuang Wang, Yunong Wu, Satoshi Nakagawa and Fuji Ren
Electronics 2024, 13(9), 1746; https://doi.org/10.3390/electronics13091746 - 1 May 2024
Viewed by 489
Abstract
The field of argument analysis has become a crucial component in the advancement of natural language processing, which holds the potential to reveal unprecedented insights from complex data and enable more efficient, cost-effective solutions for enhancing human initiatives. Despite its importance, current technologies face significant challenges, including (1) low interpretability, (2) lack of precision and robustness, particularly in specialized fields like finance, and (3) the inability to deploy effectively on lightweight devices. To address these challenges, we introduce a framework uniquely designed to process and analyze massive volumes of argument data efficiently and accurately. This framework employs a text-to-text Transformer generation model as its backbone, utilizing multiple prompt engineering methods to fine-tune the model. These methods include Causal Inference from ChatGPT, which addresses the interpretability problem, and Prefix Instruction Fine-tuning as well as in-domain further pre-training, which tackle the issues of low robustness and accuracy. Ultimately, the proposed framework generates conditional outputs for specific tasks using different decoders, enabling deployment on consumer-grade devices. After conducting extensive experiments, our method achieves high accuracy, robustness, and interpretability across various tasks, including the highest F1 scores in the NTCIR-17 FinArg-1 tasks. Full article
(This article belongs to the Section Artificial Intelligence)
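As a rough illustration of prefix-instruction fine-tuning on a text-to-text backbone (the paper's setting), the sketch below prepends a task prefix to the input and trains with the standard sequence-to-sequence loss; the model checkpoint, prefix wording, and label text are assumptions.

```python
# Hedged sketch of prefix-instruction fine-tuning on a T5-style model.
# The prefix string and label set are invented for illustration.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

prefix = "classify financial argument unit: "   # assumed task prefix
text = "The company's revenue guidance implies strong demand."
label = "claim"                                 # assumed label text

inputs = tok(prefix + text, return_tensors="pt")
targets = tok(label, return_tensors="pt").input_ids
loss = model(**inputs, labels=targets).loss     # standard seq2seq CE loss
loss.backward()                                 # one step; optimizer omitted
```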
Figures:

Figure 1. The proposed PPEF overview. Yellow text represents the original text of the dataset, and green text represents the labels.
Figure 2. Causal Inference from ChatGPT enables the T5 model to simultaneously output text labels and inferences. The original FinArg-1 text and the labels used by the different methods are highlighted.
Figure 3. Long/short instructions act on inference results.
20 pages, 5330 KiB  
Article
Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models
by Hongkang Chu and Taigang Liu
Int. J. Mol. Sci. 2024, 25(8), 4507; https://doi.org/10.3390/ijms25084507 - 19 Apr 2024
Viewed by 554
Abstract
Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, recent decades have seen a rise in computational methods. These alternatives support drug discovery by creating advanced predictive models. In this study, we proposed a fast and precise classifier for the identification of druggable proteins using a protein language model (PLM) with fine-tuned evolutionary scale modeling 2 (ESM-2) embeddings, achieving 95.11% accuracy on the benchmark dataset. Furthermore, we made a careful comparison to examine the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features by using the same classifiers. The results suggest that ESM-2 embeddings outperformed PSSM features in terms of accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on the generative pre-trained transformer 2 (GPT-2) with modifications. To our knowledge, this is the first time the large language model (LLM) GPT-2 has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, known as Pharos, was adopted to further validate the performance of the proposed model. Full article
(This article belongs to the Special Issue Deep Learning in Bioinformatics and Biological Data Analysis)
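A hedged sketch of the general idea of repurposing GPT-2 as a classifier (the paper's specific architectural modifications are not reproduced): Hugging Face's GPT2ForSequenceClassification attaches a classification head to the decoder, and the amino-acid string below is a toy example.

```python
# Sketch only: GPT-2 with a sequence-classification head, analogous in
# spirit to the paper's modified GPT-2. Protein sequences are treated
# as plain text; tokenization quality is not addressed here.
from transformers import GPT2TokenizerFast, GPT2ForSequenceClassification

tok = GPT2TokenizerFast.from_pretrained("gpt2")
tok.pad_token = tok.eos_token                     # GPT-2 has no pad token
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tok.eos_token_id

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"          # toy amino-acid string
batch = tok(seq, return_tensors="pt", truncation=True)
logits = model(**batch).logits                    # shape [1, 2]: druggable vs. not
```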
Figures:

Figure 1. Flow chart of the study. ESM-2: evolutionary scale modeling 2; 320D: 320 dimensions; SVM: support vector machine; DNN: deep neural network; NB: naive Bayes; XGB: extreme gradient boosting; CapsNet: capsule network; BiLSTM: bidirectional long short-term memory; RF: random forest; DPC-PSSM: dipeptide composition position-specific scoring matrix; KSB-PSSM: K-separated-bigrams position-specific scoring matrix; GPT-2: generative pre-trained transformer 2; 400D: 400 dimensions; 1200D: 1200 dimensions.
Figure 2. ROC curves for different datasets using various features with 5-fold CV on the test set. Abbreviations: SVM: support vector machine; XGB: extreme gradient boosting; NB: naive Bayes; RF: random forest; AUC: area under the ROC curve; ESM-2: evolutionary scale modeling 2; ROC: receiver operating characteristic; CV: cross-validation.
Figure 3. SHAP analysis of the importance of PSSM-based features on Jamali's dataset. (a) XGB; (b) SVM; (c) RF. Abbreviations: SHAP: Shapley additive explanations; XGB: extreme gradient boosting; SVM: support vector machine; RF: random forest; DPC-PSSM: dipeptide composition position-specific scoring matrix; KSB-PSSM: K-separated-bigrams position-specific scoring matrix.
Figure 4. ACC and MCC plots of CapsNets with various kernel sizes across different training epochs. (a) ACC; (b) MCC. Abbreviations: ACC: accuracy; MCC: Matthews correlation coefficient; CapsNets: capsule networks.
Figure 5. UMAP visualization on two datasets with different features and different models. Abbreviations: ESM-2: evolutionary scale modeling 2; BiLSTM: bidirectional long short-term memory; CapsNet: capsule network; DNN: deep neural network.
Figure 6. Heat map of labeling differences for the common protein sequences across the two datasets, totaling 2101 protein sequences.
Figure 7. Protein contact maps. The first row is the underfitted ESM-2 model, the second row is the model fine-tuned on Jamali's dataset, and the third row is the original ESM-2 model. The maps in each row are labeled as undruggable protein (Jamali's dataset), druggable protein (Jamali's dataset), undruggable protein (Pharos dataset), and druggable protein (Pharos dataset). Within each column, the protein is the same.
Figure 8. The web server interface.
Figure 9. DNN architecture.
Figure 10. Capsule network architecture.
Figure 11. BiLSTM network architecture.
Figure 12. The modified GPT-2 for classification. (a) The embedding technique; (b) the modified architecture.
24 pages, 1873 KiB  
Article
Enhancing Child Safety in Online Gaming: The Development and Application of Protectbot, an AI-Powered Chatbot Framework
by Anum Faraz, Fardin Ahsan, Jinane Mounsef, Ioannis Karamitsos and Andreas Kanavos
Information 2024, 15(4), 233; https://doi.org/10.3390/info15040233 - 19 Apr 2024
Viewed by 871
Abstract
This study introduces Protectbot, an innovative chatbot framework designed to improve safety in children’s online gaming environments. At its core, Protectbot incorporates DialoGPT, a conversational Artificial Intelligence (AI) model rooted in Generative Pre-trained Transformer 2 (GPT-2) technology, engineered to simulate human-like interactions within gaming chat rooms. The framework is distinguished by a robust text classification strategy, rigorously trained on the Publicly Available Natural 2012 (PAN12) dataset, aimed at identifying and mitigating potential sexual predatory behaviors through chat conversation analysis. By utilizing fastText for word embeddings to vectorize sentences, we have refined a support vector machine (SVM) classifier, achieving remarkable performance metrics, with recall, accuracy, and F-scores approaching 0.99. These metrics not only demonstrate the classifier’s effectiveness, but also signify a significant advancement beyond existing methodologies in this field. The efficacy of our framework is additionally validated on a custom dataset, composed of 71 predatory chat logs from the Perverted Justice website, further establishing the reliability and robustness of our classifier. Protectbot represents a crucial innovation in enhancing child safety within online gaming communities, providing a proactive, AI-enhanced solution to detect and address predatory threats promptly. Our findings highlight the immense potential of AI-driven interventions to create safer digital spaces for young users. Full article
(This article belongs to the Special Issue Do (AI) Chatbots Pose any Special Challenges for Trust and Privacy?)
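A minimal sketch of the described pipeline, fastText sentence vectors feeding an SVM; the toy corpus, labels, and hyperparameters are invented, and the authors' preprocessing and the PAN12 data are not reproduced.

```python
# Hedged sketch: fastText sentence embeddings + SVM, mirroring the
# abstract's pipeline. Corpus lines and labels are toy stand-ins.
import fasttext
import numpy as np
from sklearn.svm import SVC

train_texts = ["hey, how was school today", "do your parents check your phone",
               "want to play another round", "you can trust me, keep it secret"]
train_labels = [0, 1, 0, 1]              # 1 = predatory-style line (toy labels)

with open("chat_corpus.txt", "w") as f:  # unsupervised embedding corpus
    f.write("\n".join(train_texts))
ft = fasttext.train_unsupervised("chat_corpus.txt", minCount=1)

def embed(lines):
    return np.vstack([ft.get_sentence_vector(l) for l in lines])

clf = SVC(kernel="rbf").fit(embed(train_texts), train_labels)
print(clf.predict(embed(["how old are you?"])))
```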
Figures:

Figure 1. The operational workflow of the Protectbot system architecture (6 steps).
Figure 2. Protectbot conversational language generation model architecture (Steps 1–4).
Figure 3. Architecture of Protectbot's classification model: sequential process from conversational engagement (Step 5) to behavioral analysis (Step 6).
Figure 4. Comprehensive pipeline of the Protectbot classification model: process from initial data input through preprocessing, feature extraction, and final classification.
Figure 5. Structured example from the PAN12 dataset, illustrating detailed interaction attributes.
Figure 6. Overview of Protectbot system integration: from initial chat to potential predator identification.
Figure 7. Detailed confusion matrix visualization for the classifiers.
18 pages, 1459 KiB  
Article
Contrastive Learning Penalized Cross-Entropy with Diversity Contrastive Search Decoding for Diagnostic Report Generation of Reduced Token Repetition
by Taozheng Zhang, Jiajian Meng, Yuseng Yang and Shaode Yu
Appl. Sci. 2024, 14(7), 2817; https://doi.org/10.3390/app14072817 - 27 Mar 2024
Viewed by 536
Abstract
Medical imaging description and disease diagnosis are vitally important yet time-consuming. Automated diagnosis report generation (DRG) from medical imaging description can reduce clinicians' workload and improve their routine efficiency. To address this natural language generation task, fine-tuning a pre-trained large language model (LLM) is cost-effective and indispensable, and its success has been witnessed in many downstream applications. However, semantic inconsistency of sentence embeddings has been massively observed from undesirable repetitions or unnaturalness in text generation. To address the underlying issue of the anisotropic distribution of token representations, in this study, a contrastive learning penalized cross-entropy (CLpCE) objective function is implemented to enhance the semantic consistency and accuracy of token representations by guiding the fine-tuning procedure towards a specific task. Furthermore, to improve the diversity of token generation in text summarization and to prevent sampling from the unreliable tail of token distributions, a diversity contrastive search (DCS) decoding method is designed for restricting the report generation derived from a probable candidate set with maintained semantic coherence. In addition, a novel metric named the maximum of token repetition ratio (maxTRR) is proposed to estimate the token diversity and to help determine the candidate output. Based on a Chinese-version generative pre-trained Transformer 2 (GPT-2) LLM, the proposed CLpCE with DCS (CLpCEwDCS) decoding framework is validated on 30,000 desensitized text samples from the "Medical Imaging Diagnosis Report Generation" track of the 2023 Global Artificial Intelligence Technology Innovation Competition. Using four kinds of metrics evaluated from n-gram word matching, semantic relevance, and content similarity, as well as the maxTRR metric, extensive experiments reveal that the proposed framework effectively maintains semantic coherence and accuracy (BLEU-1, 0.4937; BLEU-2, 0.4107; BLEU-3, 0.3461; BLEU-4, 0.2933; METEOR, 0.2612; ROUGE, 0.5182; CIDER, 1.4339) and improves text generation diversity and naturalness (maxTRR, 0.12). The phenomenon of dull or repetitive text generation is common when fine-tuning pre-trained LLMs for natural language processing applications. This study might shed some light on relieving this issue by developing comprehensive strategies to enhance the semantic coherence, accuracy, and diversity of sentence embeddings. Full article
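As one plausible reading of the CLpCE objective, the sketch below blends cross-entropy with a token-level contrastive penalty in the style of SimCTG; the β weighting and margin values are assumptions, not the paper's reported settings.

```python
# Hedged PyTorch sketch of a CLpCE-style objective (beta and margin are
# assumptions; the contrastive term follows SimCTG's token-level form).
import torch
import torch.nn.functional as F

def clpce_loss(logits, targets, token_states, beta=0.5, margin=0.5):
    """(1 - beta) * cross-entropy + beta * contrastive penalty.
    logits: [T, V]; targets: [T]; token_states: [T, d] decoder states."""
    ce = F.cross_entropy(logits, targets)
    z = F.normalize(token_states, dim=-1)
    sim = z @ z.t()                                  # pairwise cosine; diag == 1
    mask = ~torch.eye(z.size(0), dtype=torch.bool)
    cl = F.relu(margin - 1.0 + sim[mask]).mean()     # keep distinct tokens apart
    return (1 - beta) * ce + beta * cl
```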
Figures:

Figure 1. The structure of Transformer and GPT-2 decoder blocks.
Figure 2. The CLpCE-based model fine-tuning procedure. L_CE guides the supervised learning and L_CL directs the unsupervised learning, both parts contributing to the fine-tuning of pre-trained LLMs for accurate feature representation towards a specific task.
Figure 3. The effect of different β values and decoding methods on DRG text summarization. The horizontal axis denotes the β values in the CLpCE objective function, and the vertical axis presents the values of the evaluation metrics. Combinations of line types, markers, and colors identify the metric values of a DRG model (BLEU-1, solid black line with ★; BLEU-2, dashed black line with ∘; BLEU-3, dotted black line with ♢; BLEU-4, dash-dotted black line with □; METEOR, dashed red line with ⊳; ROUGE, dashed green line with △; and CIDER, dashed blue line with ▽).
Figure 4. The effect of the control threshold ρ on text generation diversity (ρ = 0.01, dotted red line with ♢; ρ = 0.10, dashed blue line with ∘).
21 pages, 1298 KiB  
Article
A Unified Visual and Linguistic Semantics Method for Enhanced Image Captioning
by Jiajia Peng and Tianbing Tang
Appl. Sci. 2024, 14(6), 2657; https://doi.org/10.3390/app14062657 - 21 Mar 2024
Viewed by 599
Abstract
Image captioning, also recognized as the challenge of transforming visual data into coherent natural language descriptions, has persisted as a complex problem. Traditional approaches often suffer from semantic gaps, wherein the generated textual descriptions lack depth, context, or the nuanced relationships contained within the images. In an effort to overcome these limitations, we introduce a novel encoder–decoder framework called A Unified Visual and Linguistic Semantics Method. Our method comprises three key components: an encoder, a mapping network, and a decoder. The encoder employs a fusion of CLIP (Contrastive Language–Image Pre-training) and SegmentCLIP to process and extract salient image features. SegmentCLIP builds upon CLIP’s foundational architecture by employing a clustering mechanism, thereby enhancing the semantic relationships between textual and visual elements in the image. The extracted features are then transformed by a mapping network into a fixed-length prefix. A GPT-2-based decoder subsequently generates a corresponding Chinese language description for the image. This framework aims to harmonize feature extraction and semantic enrichment, thereby producing more contextually accurate and comprehensive image descriptions. Our quantitative assessment reveals that our model exhibits notable enhancements across the intricate AIC-ICC, Flickr8k-CN, and COCO-CN datasets, evidenced by a 2% improvement in BLEU@4 and a 10% uplift in CIDEr scores. Additionally, it demonstrates acceptable efficiency in terms of simplicity, speed, and reduction in computational burden. Full article
(This article belongs to the Special Issue Recent Trends in Automatic Image Captioning Systems)
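A hedged sketch of the prefix idea (ClipCap-style; the paper's SegmentCLIP encoder and Chinese GPT-2 decoder are not reproduced): an MLP maps a single image embedding to a fixed number of "virtual token" embeddings that are prepended to the decoder input.

```python
# Sketch only: a mapping network from one image embedding to k prefix
# embeddings for a GPT-2 decoder. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=10):
        super().__init__()
        self.prefix_len, self.gpt_dim = prefix_len, gpt_dim
        hidden = gpt_dim * prefix_len // 2
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, gpt_dim * prefix_len),
        )

    def forward(self, clip_embed):                 # [B, clip_dim]
        out = self.mlp(clip_embed)                 # [B, prefix_len * gpt_dim]
        return out.view(-1, self.prefix_len, self.gpt_dim)

prefix = MappingNetwork()(torch.randn(1, 512))     # [1, 10, 768] for inputs_embeds
```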
Figures:

Figure 1. Framework of UVLS.
Figure 2. (a) The architecture and training pipeline of SegmentCLIP. The images to the right illustrate the visual segments that manifest across different grouping stages. In the lower stages, pixels are grouped into parts of objects, such as the hands and legs of a woman or a little girl; in the higher stages, these are further amalgamated into complete entities, such as the entire body of the woman and the little girl. (b) The architecture of the segmenting block. At the end of each grouping stage, a segmenting block calculates the similarity between the learned tokens and the segment (image) tokens. The assignment is determined via a Gumbel softmax operation over the learned tokens and is then converted into a hard one-hot assignment. Segment tokens assigned to the same group are merged, forming new segment tokens that serve as input for the subsequent grouping stage.
Figure 3. Effect of the prefix length on captioning performance over the AIC-ICC dataset. For each prefix length, the BLEU@4 (red) and CIDEr (blue) scores are reported over the test and train (dashed line) sets.
17 pages, 1313 KiB  
Article
Using Generative AI to Improve the Performance and Interpretability of Rule-Based Diagnosis of Type 2 Diabetes Mellitus
by Leon Kopitar, Iztok Fister and Gregor Stiglic
Information 2024, 15(3), 162; https://doi.org/10.3390/info15030162 - 12 Mar 2024
Viewed by 1346
Abstract
Introduction: Type 2 diabetes mellitus is a major global health concern, but interpreting machine learning models for diagnosis remains challenging. This study investigates combining association rule mining with advanced natural language processing to improve both diagnostic accuracy and interpretability. This novel approach has not been explored before in using pretrained transformers for diabetes classification on tabular data. Methods: The study used the Pima Indians Diabetes dataset to investigate Type 2 diabetes mellitus. Python and Jupyter Notebook were employed for analysis, with the NiaARM framework for association rule mining. LightGBM and the dalex package were used for performance comparison and feature importance analysis, respectively. SHAP was used for local interpretability. OpenAI GPT version 3.5 was utilized for outcome prediction and interpretation. The source code is available on GitHub. Results: NiaARM generated 350 rules to predict diabetes. LightGBM performed better than the GPT-based model. A comparison of GPT and NiaARM rules showed disparities, prompting a similarity score analysis. LightGBM’s decision making leaned heavily on glucose, age, and BMI, as highlighted in feature importance rankings. Beeswarm plots demonstrated how feature values correlate with their influence on diagnosis outcomes. Discussion: Combining association rule mining with GPT for Type 2 diabetes mellitus classification yields limited effectiveness. Enhancements like preprocessing and hyperparameter tuning are required. Interpretation challenges and GPT’s dependency on provided rules indicate the necessity for prompt engineering and similarity score methods. Variations in feature importance rankings underscore the complexity of T2DM. Concerns regarding GPT’s reliability emphasize the importance of iterative approaches for improving prediction accuracy. Full article
(This article belongs to the Special Issue Machine Learning and Artificial Intelligence with Applications)
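A hedged sketch of the rule-conditioned prompting step using the current OpenAI Python SDK; the prompt wording, the example rule, and the feature values are invented for illustration.

```python
# Hedged sketch of prompting GPT-3.5 with mined rules plus patient
# features (rule text and values are toy stand-ins, not the study's data).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
rules = "IF Glucose > 140 AND BMI > 30 THEN diabetic"   # one NiaARM-style rule
patient = "Glucose=155, BMI=33.1, Age=47"

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Classify the patient as diabetic or non-diabetic "
                    "using only the given rules, and explain briefly."},
        {"role": "user", "content": f"Rules: {rules}\nPatient: {patient}"},
    ],
)
print(resp.choices[0].message.content)
```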
Figures:

Figure 1. Architecture of the proposed approach.
Figure 2. Ten GPT-recognized rules that contributed to predicted non-diabetic status with the highest frequency. Percentages in the circular pie chart represent the proportions among the ten most impactful rules.
Figure 3. Ten GPT-recognized rules that contributed to predicted diabetic status with the highest frequency. Percentages in the circular pie chart represent the proportions among the ten most impactful rules.
Figure 4. A ranking diagram of feature importance displayed as the average impact on model output magnitude. The bigger the average impact, the higher the importance of the feature.
Figure 5. Feature importance displayed as an average impact on model output magnitude.
27 pages, 9431 KiB  
Article
Generative Pre-Trained Transformer (GPT) in Research: A Systematic Review on Data Augmentation
by Fahim Sufi
Information 2024, 15(2), 99; https://doi.org/10.3390/info15020099 - 8 Feb 2024
Cited by 4 | Viewed by 5896
Abstract
GPT (Generative Pre-trained Transformer) represents advanced language models that have significantly reshaped the academic writing landscape. These sophisticated language models offer invaluable support throughout all phases of research work, facilitating idea generation, enhancing drafting processes, and overcoming challenges like writer’s block. Their capabilities extend beyond conventional applications, contributing to critical analysis, data augmentation, and research design, thereby elevating the efficiency and quality of scholarly endeavors. Strategically narrowing its focus, this review explores alternative dimensions of GPT and LLM applications, specifically data augmentation and the generation of synthetic data for research. Employing a meticulous examination of 412 scholarly works, it distills a selection of 77 contributions addressing three critical research questions: (1) GPT on Generating Research data, (2) GPT on Data Analysis, and (3) GPT on Research Design. The systematic literature review adeptly highlights the central focus on data augmentation, encapsulating 48 pertinent scholarly contributions, and extends to the proactive role of GPT in critical analysis of research data and shaping research design. Pioneering a comprehensive classification framework for “GPT’s use on Research Data”, the study classifies existing literature into six categories and 14 sub-categories, providing profound insights into the multifaceted applications of GPT in research data. This study meticulously compares 54 pieces of literature, evaluating research domains, methodologies, and advantages and disadvantages, providing scholars with profound insights crucial for the seamless integration of GPT across diverse phases of their scholarly pursuits. Full article
(This article belongs to the Special Issue Editorial Board Members’ Collection Series: "Information Processes")
Figures:

Figure 1. Conceptual diagram of how GPT performs feature extraction, data augmentation, and synthetic data generation.
Figure 2. Use of GPT and associated LLMs in all phases of research: 48 scholarly works on data augmentation (starred, denoting the main focus of this review), 12 existing publications on critical analysis (i.e., research data analysis), and 10 papers on research design.
Figure 3. Search keywords used for obtaining relevant existing academic works on "GPT, LLM, and associated technologies in different phases of research".
Figure 4. Schematic diagram of the systematic literature review (i.e., use of GPT, LLM, and associated technologies in different phases of research).
Figure 5. A comprehensive classification framework for "GPT's use on research data".
Figure 6. A comparative schematic of the feature extraction process with NLP and GPT.
Figure 7. Chat2VIS analyzes data and shows results in a visualization with a GPT prompt like "plot the gross against budget" [23].
Figure 8. PRISMA flow diagram of the systematic literature review of "GPT for research".
Figure 9. Timeline analysis of existing literature on the use of GPT in research.
Figure A1. Database search from Scopus using a Scopus-specific advanced query. From Scopus, 99 documents were returned, including duplicates. After removing the duplicates, records were screened. For example, the first record, "Beyond the Scalpel: Assessing ChatGPT's Potential as an Auxiliary Intelligent Virtual Assistant in Oral Surgery", is not relevant to the focus of this study (i.e., GPT in research, data augmentation, data generation, or solving research problems).
Figure A2. Database search from IEEE Xplore using IEEE Xplore-specific advanced queries. A total of 119 documents were returned, including duplicates. After removing the duplicates, records were screened. For example, the first record was included, and the second record was screened out as it does not address "GPT in research".
Figure A3. Database search from PubMed using a PubMed-specific advanced query. From PubMed, 47 documents were returned, including duplicates.
Figure A4. Database search from Web of Science using their supported advanced query. From Web of Science, 306 documents were returned, including duplicates. After removing the duplicates, the records were screened. For example, the first record was screened out as that paper focused on nanotechnology and nanomaterials.
Figure A5. Database search from the ACM Digital Library using their supported advanced query. From the ACM Digital Library, 102 documents were returned, including duplicates.
Figure A6. Litmaps suggests 20 possibly relevant articles by visually analyzing the citation maps of [66].
19 pages, 2348 KiB  
Article
BookGPT: A General Framework for Book Recommendation Empowered by Large Language Model
by Zhiyu Li, Yanfang Chen, Xuan Zhang and Xun Liang
Electronics 2023, 12(22), 4654; https://doi.org/10.3390/electronics12224654 - 15 Nov 2023
Cited by 1 | Viewed by 1818
Abstract
With the continuous development and change exhibited by large language model (LLM) technology, represented by generative pretrained transformers (GPTs), many classic scenarios in various fields have re-emerged with new opportunities. This paper takes ChatGPT as the modeling object, incorporates LLM technology into the typical book resource understanding and recommendation scenario for the first time, and puts it into practice. By building a ChatGPT-like book recommendation system (BookGPT) framework based on ChatGPT, this paper attempts to apply ChatGPT to recommendation modeling for three typical tasks: book rating recommendation, user rating recommendation, and the book summary recommendation; it also explores the feasibility of LLM technology in book recommendation scenarios. At the same time, based on different evaluation schemes for book recommendation tasks and the existing classic recommendation models, this paper discusses the advantages and disadvantages of the BookGPT in book recommendation scenarios and analyzes the opportunities and improvement directions for subsequent LLMs in these scenarios. The experimental research shows the following: (1) The BookGPT can achieve good recommendation results in existing classic book recommendation tasks. Especially in cases containing less information about the target object to be recommended, such as zero-shot or one-shot learning tasks, the performance of the BookGPT is close to or even better than that of the current classic book recommendation algorithms, and this method has great potential for improvement. (2) In text generation tasks such as book summary recommendation, the recommendation effect of the BookGPT model is better than that of the manual editing process of Douban Reading, and it can even perform personalized interpretable content recommendations based on readers’ attribute and identity information, making it more persuasive than interpretable one-size-fits-all recommendation models. Finally, we have open-sourced the relevant datasets and experimental codes, hoping that the exploratory program proposed in this paper can inspire the development of more LLMs to expand their applications and theoretical research prospects in the field of book recommendation and general recommendation tasks. Full article
Figures:

Figure 1. ChatGPT's search volume in the Baidu Index from November 2022 to April 2023.
Figure 2. Framework of the BookGPT.
Figure 3. Prompt examples for the BookGPT.
Figure 4. Example of role injection.
Figure 5. NDCG scores obtained in the user rating preference recommendation task. NDCG is a metric used to evaluate the performance of recommendation and information retrieval systems, considering both the relevance and ranking of recommended items. The value of NDCG ranges from 0 to 1, with 1 indicating optimal performance; hence, a higher NDCG value signifies a more effective system.
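To make the NDCG gloss in the Figure 5 caption concrete, here is a toy computation (invented relevance scores) showing the logarithmic rank discount and the normalization by the ideal ordering.

```python
# Worked NDCG sketch on toy relevance scores (not the paper's data):
# DCG discounts relevance by log2(rank + 1); NDCG divides by the ideal DCG.
import numpy as np

def dcg(rels):
    return sum(r / np.log2(i + 2) for i, r in enumerate(rels))

ranked_rels = [3, 2, 3, 0, 1]           # relevance of items as recommended
ideal_rels = sorted(ranked_rels, reverse=True)
ndcg = dcg(ranked_rels) / dcg(ideal_rels)
print(round(ndcg, 4))                    # 1.0 only for a perfect ranking
```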
19 pages, 4236 KiB  
Article
Improved Leakage Detection and Recognition Algorithm for Residual Neural Networks Based on Transfer Learning
by Liangliang Li, Yu Chen, Zhengxiang Ma, Xinling Wen, Jiabao Pang and Weitao Yuan
Electronics 2023, 12(20), 4378; https://doi.org/10.3390/electronics12204378 - 23 Oct 2023
Viewed by 903
Abstract
Due to the lack of other component information in traditional magnetic leakage defect signals and the low accuracy of existing prediction methods, this paper proposes an improved residual-network-based method for magnetic leakage defect recognition that predicts defect size across different detection speeds. A new defect diagnosis method based on ResNet18, a convolutional neural network (CNN), is proposed in this study. This method transfers the pre-trained ResNet18 network and replaces the activation function in the transferred network structure. It extracts features from two-dimensional images obtained by converting the original experimental signals and signals with added noise, removing the influence of manual feature engineering. The results demonstrated that the improved ResNet18 network model, after transfer learning, achieved 100% prediction accuracy for all 10,000 grayscale images generated with a defect length of 50 mm, a width of 2 mm, and depths of 2 mm, 4 mm, 6 mm, and 8 mm. Moreover, the prediction accuracies for the quasi-static, slow, compensated fast, and fast scanning speeds were 99.20%, 98.50%, 93.30%, and 94.00%, respectively, for defect depths of 2 mm, 4 mm, 6 mm, and 8 mm. These accuracies surpass those of other models, demonstrating the significant improvement in prediction accuracy achieved by this method. Full article
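A minimal sketch of the transfer-learning recipe described above: load a pre-trained ResNet18, swap its activation functions, and re-head it for the four defect-depth classes. The replacement activation (LeakyReLU) is an assumption, as the abstract does not name the authors' choice.

```python
# Sketch of the transfer-learning setup (LeakyReLU swap is an assumption).
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

def swap_relu(module):
    # recursively replace every ReLU with LeakyReLU
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.LeakyReLU(0.1, inplace=True))
        else:
            swap_relu(child)

swap_relu(model)
model.fc = nn.Linear(model.fc.in_features, 4)   # 2/4/6/8 mm depth classes
```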
Figures:

Figure 1. Detection principle of the magnetic leakage method.
Figure 2. The characteristics of defect leakage magnetic signals in the radial and axial components.
Figure 3. Radial leakage curve.
Figure 4. Axial leakage curve. Note: ρ_s/(2πμ_0) is a constant, the defect width n = 5 mm, the lift-off distance y = 5 mm, and the defect depth h is taken as 2 mm, 4 mm, 6 mm, and 8 mm, respectively. Under the same conditions, as the defect depth increases, the difference between the peak and valley values of the radial leakage magnetic signal increases, and the peak value of the axial leakage magnetic signal increases.
Figure 5. Simple CNN architecture.
Figure 6. Improved ResNet18 architecture with transfer learning.
Figure 7. Overall algorithm flowchart.
Figure 8. Process flowchart of converting magnetic signals to grayscale images.
Figure 9. Signal–image conversion diagram.
Figure 10. Feature maps corresponding to different defect sizes. "l" represents the length of the defect, "w" the width, and "d" the depth.
Figure 11. Input layer feature changes.
Figure 12. Basic block changes.
Figure 13. Recognition accuracy of ResNet18 after the transfer-learning improvement.
Figure 14. Comparison of confusion matrices for the four types of defects.
Figure 15. Comparison of the cost function for the four scanning detection speeds vs. defects of the same depth.
Figure 16. Comparison of the recognition accuracy for defects of different depths at the four scanning detection speeds.
72 pages, 4682 KiB  
Review
Deep Learning for Medical Image-Based Cancer Diagnosis
by Xiaoyan Jiang, Zuojin Hu, Shuihua Wang and Yudong Zhang
Cancers 2023, 15(14), 3608; https://doi.org/10.3390/cancers15143608 - 13 Jul 2023
Cited by 18 | Viewed by 6642
Abstract
(1) Background: The application of deep learning technology to realize cancer diagnosis based on medical images is one of the research hotspots in the fields of artificial intelligence and computer vision. Given the rapid development of deep learning methods, the very high accuracy and timeliness that cancer diagnosis requires, and the inherent particularity and complexity of medical imaging, a comprehensive review of relevant studies is necessary to help readers better understand the current research status and ideas. (2) Methods: Five types of radiological images, namely X-ray, ultrasound (US), computed tomography (CT), magnetic resonance imaging (MRI), and positron emission computed tomography (PET), as well as histopathological images, are reviewed in this paper. The basic architecture of deep learning and classical pretrained models are comprehensively reviewed. In particular, advanced neural networks emerging in recent years, including transfer learning, ensemble learning (EL), graph neural networks, and vision transformers (ViT), are introduced. Overfitting prevention methods, including batch normalization, dropout, weight initialization, and data augmentation, are summarized. The application of deep learning technology in medical image-based cancer analysis is sorted out. (3) Results: Deep learning has achieved great success in medical image-based cancer diagnosis, showing good results in image classification, image reconstruction, image detection, image segmentation, image registration, and image synthesis. However, the lack of high-quality labeled datasets limits the role of deep learning, and it faces challenges in rare cancer diagnosis, multi-modal image fusion, model explainability, and generalization. (4) Conclusions: There is a need for more public standard databases for cancer. Pre-training models based on deep neural networks have the potential to be improved, and special attention should be paid to research on multimodal data fusion and supervised paradigms. Technologies such as ViT, ensemble learning, and few-shot learning will bring surprises to cancer diagnosis based on medical images. Full article
Figures:

Figure 1. Basic schematic diagram of CT.
Figure 2. General MRI imaging procedures.
Figure 3. Block diagram of a real-time two-dimensional color flow ultrasonic imaging system.
Figure 4. Basic schematic diagram of X-ray.
Figure 5. Basic schematic diagram of PET.
Figure 6. Schematic diagram of a convolutional neural network structure.
Figure 7. The structure of an autoencoder.
Figure 8. Scheme diagram of the DC-ELM network with three-layer convolution.
Figure 9. The RNN model structure diagram.
Figure 10. The GAN model structure diagram.
Figure 11. The classic DBN network structure.
Figure 12. The LeNet-5 architecture.
Figure 13. The basic residual block.
Figure 14. Schematic of a deep DenseNet with three dense blocks.
Figure 15. Diagram of applying dropout.
24 pages, 717 KiB  
Article
DiffuD2T: Empowering Data-to-Text Generation with Diffusion
by Heng Gong, Xiaocheng Feng and Bing Qin
Electronics 2023, 12(9), 2136; https://doi.org/10.3390/electronics12092136 - 7 May 2023
Viewed by 2340
Abstract
Surrounded by structured data, such as medical data, financial data, knowledge bases, etc., data-to-text generation has become an important natural language processing task that can help people better understand the meaning of those data by providing them with user-friendly text. Existing methods for data-to-text generation show promising results in tackling two major challenges: content planning and surface realization, which transform structured data into fluent text. However, they lack an iterative refinement process for generating text, which can enable the model to perfect the text step-by-step while accepting control over the process. In this paper, we explore enhancing data-to-text generation with an iterative refinement process via diffusion. We have four main contributions: (1) we use the diffusion model to improve the prefix tuning for data-to-text generation; (2) we propose a look-ahead guiding loss to supervise the iterative refinement process for better text generation; (3) we extract content plans from reference text and propose a planning-then-writing pipeline to give the model content planning ability; and (4) we conducted experiments on three data-to-text generation datasets, where both automatic evaluation criteria (BLEU, NIST, METEOR, ROUGE-L, CIDEr, TER, MoverScore, BLEURT, and BERTScore) and human evaluation criteria (Quality and Naturalness) show the effectiveness of our model. Our model can improve the competitive prefix tuning method by 2.19% in terms of a widely-used automatic evaluation criterion, BLEU (BiLingual Evaluation Understudy), on the WebNLG dataset with GPT-2 Large as the pretrained language model backbone. Human evaluation criteria also show that our model can improve the quality and naturalness of the generated text across all three datasets. Full article
(This article belongs to the Special Issue Natural Language Processing and Information Retrieval)
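One plausible reading of the look-ahead guiding loss, sketched below as a hinge on the gap between the generation losses of successive denoising steps; the paper's exact formulation may differ.

```python
# Hedged sketch (assumption, not the paper's verbatim definition):
# penalize a denoising step whenever the further-denoised prefix
# generates *worse* text, i.e., encourage
# L_gen(y0^{t-2}) < L_gen(y0^{t-1}) via a hinge on the gap.
import torch
import torch.nn.functional as F

def look_ahead_guiding_loss(loss_prev: torch.Tensor,
                            loss_next: torch.Tensor) -> torch.Tensor:
    # loss_prev = L_gen(y0^{t-1}); loss_next = L_gen(y0^{t-2})
    return F.relu(loss_next - loss_prev)

# toy check: an improving step (0.8 -> 0.6) incurs no penalty
print(look_ahead_guiding_loss(torch.tensor(0.8), torch.tensor(0.6)))  # tensor(0.)
```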
Figures:

Figure 1. An illustration of an example of data-to-text generation. In this example, the structured data are seven related triples from the knowledge base. Each triple consists of the name of the entity, the type of the information, and the corresponding value. Given the structured data, a text report faithfully expresses the information in the data.
Figure 2. An illustration of the step-by-step optimizing process of DiffuD2T. The left of the figure presents the iterative refinement process of the diffusion model for the representation of prefixes, as illustrated in Section 3.2. y_T is random noise sampled from the Gaussian distribution N(0, I). Through the reverse process that denoises y_T step-by-step, we ultimately obtain a high-quality y_0 for the representation of the prefix; the forward process adds noise to y_0 step-by-step. After obtaining y_0, a linear transformation maps it to the shape expected by the PLM so that it can serve as "virtual tokens" that help the PLM adapt to different tasks. The parameters of the diffusion model and the linear transformation are trainable, while the parameters of the PLM are frozen during training. The PLM takes the structured data as input and generates the text with the help of the prefixes.
Figure 3. An illustration of the proposed look-ahead guiding loss. During each denoising step t, we first predict y_0^{t-1} directly through f_θ(y_t, t), then obtain y_{t-1} through the forward process by iteratively applying q(y_{t-1} | y_{t-2}), according to Section 3.2. The top of the figure shows that we use the predicted y_0^{t-1} to obtain the corresponding prefix and calculate the loss L_gen(y_0^{t-1}) of the PLM on the target text, which indicates the denoised y_{t-1}'s performance on generating text. Then, as shown in the bottom part of the figure, we take one step further to denoise y_{t-1} again, obtaining y_0^{t-2} and y_{t-2}, and similarly obtain its text generation loss L_gen(y_0^{t-2}). Since a lower loss means better performance, we propose a look-ahead guiding loss L_ahead to supervise L_gen(y_0^{t-2}) < L_gen(y_0^{t-1}); that is, the further-denoised representation should perform better in generating text than its predecessor.
Figure 4. An illustration of the planning-then-writing pipeline. The bottom of the figure shows the planning stage. We extract important information from the text to construct the content plan by identifying words that appear in both the target text and the structured data. Then, we use the structured data and the extracted plan to train a content planner with the same model structure as the text generator, as described in Sections 4.1 and 4.2. Finally, we use the parameters of the content planner to initialize the training of the text generator (shown at the top of the figure in gray), helping the text generator produce better text with the ability to plan.
19 pages, 9341 KiB  
Article
TESR: Two-Stage Approach for Enhancement and Super-Resolution of Remote Sensing Images
by Anas M. Ali, Bilel Benjdira, Anis Koubaa, Wadii Boulila and Walid El-Shafai
Remote Sens. 2023, 15(9), 2346; https://doi.org/10.3390/rs15092346 - 29 Apr 2023
Cited by 8 | Viewed by 2744
Abstract
Remote Sensing (RS) images are usually captured at resolutions lower than those required. Deep Learning (DL)-based super-resolution (SR) architectures are typically used to increase the resolution artificially. In this study, we designed a new architecture called TESR (Two-stage approach for Enhancement and super-resolution), leveraging the power of Vision Transformers (ViT) and the Diffusion Model (DM) to increase the resolution of RS images artificially. The first stage is the ViT-based model, which serves to increase resolution. The second stage is an iterative DM pre-trained on a larger dataset, which serves to increase image quality. Every stage is trained separately on the given task using a separate dataset. The self-attention mechanism of the ViT helps the first stage generate global and contextual details. The iterative Diffusion Model helps the second stage enhance the image’s quality and generate consistent and harmonic fine details. We found that TESR outperforms state-of-the-art architectures on super-resolution of remote sensing images on the UCMerced benchmark dataset. Considering the PSNR/SSIM metrics, TESR improves SR image quality as compared to state-of-the-art techniques from 34.03/0.9301 to 35.367/0.9449 in the scale ×2. On a scale of ×3, it improves from 29.92/0.8408 to 32.311/0.91143. On a scale of ×4, it improves from 27.77/0.7630 to 31.951/0.90456. We also found that the Charbonnier loss outperformed other loss functions in the training of both stages of TESR. The improvement was by a margin of 21.5%/14.3%, in the PSNR/SSIM, respectively. The source code of TESR is open to the community. Full article
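The PSNR/SSIM comparisons quoted above can be reproduced mechanically; here is a scikit-image sketch on stand-in arrays (in real use, the ground-truth and super-resolved images).

```python
# PSNR/SSIM computation as used in the abstract's comparisons; random
# arrays stand in for the ground-truth (GT) and super-resolved (SR) images.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

gt = np.random.rand(256, 256, 3)
sr = gt + 0.01 * np.random.randn(256, 256, 3)       # pretend SR output
psnr = peak_signal_noise_ratio(gt, sr, data_range=1.0)
ssim = structural_similarity(gt, sr, channel_axis=-1, data_range=1.0)
print(f"PSNR={psnr:.2f} dB, SSIM={ssim:.4f}")
```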
Show Figures

Figure 1
<p>Block diagram for the proposed algorithm (TESR).</p>
Full article ">Figure 2
<p>Illustration of different SR frameworks. (<b>a</b>) Pre-upsample framework. (<b>b</b>) Post-upsample framework.</p>
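The distinction in Figure 2 is where the upsampling happens; the schematic PyTorch sketch below contrasts the two orderings (layer widths and the ×4 factor are placeholders, not the paper's architecture).

```python
import torch.nn as nn
import torch.nn.functional as F

class PreUpsampleSR(nn.Module):
    """Pre-upsample: interpolate first, then refine at high resolution."""
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.refine = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, lr):
        up = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                           align_corners=False)
        return up + self.refine(up)            # residual refinement

class PostUpsampleSR(nn.Module):
    """Post-upsample: extract features at low resolution, upsample last."""
    def __init__(self, scale=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * scale ** 2, 3, padding=1))
        self.shuffle = nn.PixelShuffle(scale)  # channels -> spatial detail

    def forward(self, lr):
        return self.shuffle(self.features(lr))
```

The post-upsample ordering is cheaper because all convolutions run at the low input resolution.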
Full article ">Figure 3
<p>Block diagram for the SwinIR ViT model.</p>
Full article ">Figure 4
<p>The main concept behind iterative DM.</p>
Full article ">Figure 5
<p>The three main steps to train the TESR architecture.</p>
Full article ">Figure 6
<p>Result comparisons on the UCMerced dataset at different stages. (<b>a</b>) The ground-truth scene. (<b>b</b>) A bicubic interpolation scene with a ×4 factor. (<b>c</b>) A scene upsampled by SwinIR with a ×4 factor. (<b>d</b>) An iterative DM scene.</p>
Full article ">Figure 7
<p>Samples of RS images before and after applying our proposed model, including ground-truth (GT), low-resolution (LR), and super-resolution images for each stage.</p>
Full article ">Figure 8
<p>Samples of RS images after the second stage (iterative DM).</p>
Full article ">Figure 9
<p>Illustration of the histograms of the original RS image, the interpolated LR image, an image enlarged using SwinIR, and an image enhanced using DM.</p>
Full article ">
27 pages, 1427 KiB  
Article
An Empirical Analysis of State-of-Art Classification Models in an IT Incident Severity Prediction Framework
by Salman Ahmed, Muskaan Singh, Brendan Doherty, Effirul Ramlan, Kathryn Harkin, Magda Bucholc and Damien Coyle
Appl. Sci. 2023, 13(6), 3843; https://doi.org/10.3390/app13063843 - 17 Mar 2023
Cited by 2 | Viewed by 3177
Abstract
Large-scale companies across various sectors maintain substantial IT infrastructure to support their operations and provide quality services for their customers and employees. These IT operations are managed by teams who deal directly with incident reports (i.e., reports generated automatically by autonomous systems or [...] Read more.
Large-scale companies across various sectors maintain substantial IT infrastructure to support their operations and provide quality services for their customers and employees. These IT operations are managed by teams who deal directly with incident reports (i.e., reports generated automatically by autonomous systems or raised by human operators). (1) Background: Early identification of major incidents can provide a significant advantage for reducing the disruption to normal business operations, especially for preventing catastrophic disruptions, such as a complete system shutdown. (2) Methods: This study conducted an empirical analysis of eleven (11) state-of-the-art models to predict the severity of these incidents using an industry-led use-case composed of 500,000 records collected over one year. (3) Results: The datasets were generated from three stakeholders (i.e., agency, customer, and employee). Separately, the bidirectional encoder representations from transformers (BERT), the robustly optimized BERT pre-training approach (RoBERTa), the enhanced representation through knowledge integration (ERNIE 2.0), and the extreme gradient boosting (XGBoost) methods performed best for the agency records (93% AUC), while the convolutional neural network (CNN) was the best model for the rest (employee records at 95% AUC and customer records at 74% AUC). The average prediction horizon was approximately 150 min, which is significant for real-time deployment. (4) Conclusions: The study provided a comprehensive analysis that supported the deployment of artificial intelligence for IT operations (AIOps), specifically for incident management within large-scale organizations. Full article
(This article belongs to the Special Issue Machine/Deep Learning: Applications, Technologies and Algorithms)
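The abstract's model comparison reduces to fitting each candidate and scoring ROC/AUC per stakeholder dataset; a minimal scikit-learn sketch of that loop is below, showing only two of the eleven models, with the TF-IDF features, split ratio, and binary severity labels all assumed for illustration.

```python
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def compare_models(texts, labels):
    """Fit candidates on TF-IDF features of incident text; report AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=0)
    vec = TfidfVectorizer(max_features=20000)
    Xtr, Xte = vec.fit_transform(X_tr), vec.transform(X_te)
    for name, model in [("naive Bayes", MultinomialNB()),
                        ("gradient boost", GradientBoostingClassifier())]:
        model.fit(Xtr, y_tr)
        scores = model.predict_proba(Xte)[:, 1]   # P(major incident)
        print(f"{name}: AUC = {roc_auc_score(y_te, scores):.3f}")
```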
Show Figures

Figure 1
<p>Overview of IT incident management system.</p>
Full article ">Figure 2
<p>Distribution of the dataset.</p>
Full article ">Figure 3
<p>Proposed IT incident risk framework.</p>
Full article ">Figure 4
<p>Graphical representation of the pipeline stages for training and testing for ML, DL, and transformer models.</p>
Full article ">Figure 5
<p>Graphical representation of our best-performing transformer (Roberta) model architecture.</p>
Full article ">Figure 6
<p>Average tokenizer/class for training and test data across (<b>a</b>) Agency_records, (<b>b</b>) Employee_records and (<b>c</b>) Customer_records.</p>
Full article ">Figure 7
<p>Quantitative ROC/AUC results across (<b>a</b>) Agency_records, (<b>b</b>) Employee_records, and (<b>c</b>) Customer_records for state-of-art methods.</p>
Full article ">Figure 8
<p>Quantitative ROC/AUC results across (<b>a</b>) Agency_resampled, (<b>b</b>) Employee_resampled (<b>c</b>) Customer_resampled, and (<b>d</b>) Combine_All for state-of-art methods.</p>
Full article ">Figure 9
<p>Quantitative ROC/AUC results for synthetic data (SMOTE) across (<b>a</b>) agency, (<b>b</b>) employee, and (<b>c</b>) customer for state-of-art methods.</p>
Full article ">Figure 10
<p>ROC/AUC results across <span class="html-italic">k</span>-fold cross-validation.</p>
Full article ">Figure 11
<p>Number of MIR across actual, ML (naive Bayes, gradient boost, XGBoost, CatBoost, SVM), transformers (BERT, RoBERTa, ERNIE 2.0), and DL algorithms (GRU, Bi-LSTM, CNN).</p>
Full article ">Figure 12
<p>Prediction horizon comparison (MIR opened at vs. incident opened at). The blue line indicates IOA records, while the orange shows the MOA records.</p>
Full article ">Figure 13
<p>Prediction horizon comparison (MIR opened at vs. incident opened at).</p>
Full article ">
16 pages, 1438 KiB  
Article
A Comparative Analysis of Word Embedding and Deep Learning for Arabic Sentiment Classification
by Sahar F. Sabbeh and Heba A. Fasihuddin
Electronics 2023, 12(6), 1425; https://doi.org/10.3390/electronics12061425 - 16 Mar 2023
Cited by 10 | Viewed by 4058
Abstract
Sentiment analysis on social media platforms (i.e., Twitter or Facebook) has become an important tool to learn about users’ opinions and preferences. However, the accuracy of sentiment analysis is disrupted by the challenges of natural language processing (NLP). Recently, deep learning models have [...] Read more.
Sentiment analysis on social media platforms (i.e., Twitter or Facebook) has become an important tool to learn about users’ opinions and preferences. However, the accuracy of sentiment analysis is disrupted by the challenges of natural language processing (NLP). Recently, deep learning models have demonstrated superior performance over statistical- and lexical-based approaches in NLP-related tasks. Word embedding is an important layer of deep learning models, used to generate input features. Many word embedding models have been presented for text representation, covering both classic and context-based word embeddings. In this paper, we present a comparative analysis that evaluates both classic and contextualized word embeddings for sentiment analysis. The four most frequently used word embedding techniques were used in both their trained and pre-trained versions. The selected embeddings represent both classical and contextualized techniques: classical word embeddings include GloVe, Word2vec, and FastText, while ARBERT is used as the contextualized embedding model. Since word embeddings are typically employed as the input layer in deep networks, we used the deep learning architectures BiLSTM and CNN for sentiment classification. To achieve these goals, the experiments were applied to a series of benchmark datasets: HARD, Khooli, AJGT, ArSAS, and ASTD. Finally, a comparative analysis was conducted on the results obtained for the experimented models. Our outcomes indicate that, in general, an embedding trained for the task achieves higher performance than the pretrained version of the same technique, by around 0.28 to 1.8% in accuracy, 0.33 to 2.17% in precision, and 0.44 to 2% in recall. Moreover, the contextualized transformer-based embedding model BERT achieved the highest performance in both its pretrained and trained versions. Additionally, the results indicate that BiLSTM outperforms CNN by approximately 2% on 3 datasets, HARD, Khooli, and ArSAS, while CNN achieved around 2% higher performance on the smaller datasets, AJGT and ASTD. Full article
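Since the paper feeds each embedding into BiLSTM and CNN classifiers, a minimal PyTorch sketch of the BiLSTM track over a pretrained embedding matrix is shown below; all dimensions, the mean-pooling, and the frozen-embedding choice are assumptions, not the authors' hyperparameters.

```python
import numpy as np
import torch
import torch.nn as nn

class BiLSTMSentiment(nn.Module):
    """BiLSTM sentiment classifier over a (pre)trained embedding matrix."""
    def __init__(self, embedding_matrix, hidden=128, n_classes=2,
                 freeze=True):
        super().__init__()
        # Rows of embedding_matrix are word vectors (e.g., GloVe/FastText).
        self.embed = nn.Embedding.from_pretrained(
            torch.as_tensor(embedding_matrix, dtype=torch.float),
            freeze=freeze)
        self.lstm = nn.LSTM(embedding_matrix.shape[1], hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))
        return self.head(h.mean(dim=1))        # mean-pool over time steps

# Toy use: 1000-word vocabulary, 300-d vectors (random stand-ins).
model = BiLSTMSentiment(np.random.randn(1000, 300))
logits = model(torch.randint(0, 1000, (8, 40)))  # batch of 8 sentences
```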
Show Figures

Figure 1
<p>The accuracy of pretrained embeddings.</p>
Full article ">Figure 2
<p>The accuracy of trained embedding models.</p>
Full article ">Figure 3
<p>The accuracy of BiLSTM vs. CNN for different datasets.</p>
Full article ">