Search Results (28)

Search Parameters: Keywords = "generative pre-trained transformer 2"

18 pages, 10104 KiB  
Article
From Plants to Pixels: The Role of Artificial Intelligence in Identifying Sericea Lespedeza in Field-Based Studies
by Aftab Siddique, Kyla Cook, Yasmin Holt, Sudhanshu S. Panda, Ajit K. Mahapatra, Eric R. Morgan, Jan A. van Wyk and Thomas H. Terrill
Agronomy 2024, 14(5), 992; https://doi.org/10.3390/agronomy14050992 - 8 May 2024
Viewed by 823
Abstract
The increasing use of convolutional neural networks (CNNs) has brought about a significant transformation in numerous fields, such as image categorization and identification. In the development of a CNN model to classify images of sericea lespedeza [SL; Lespedeza cuneata (Dum-Cours) G. Don] from weed images, four architectures were explored: CNN model variant 1, CNN model variant 2, the Visual Geometry Group (VGG16) model, and ResNet50. CNN model variant 1 (batch normalization with adjusted dropout method) demonstrated 100% validation accuracy, while variant 2 (RMSprop optimization with adjusted learning rate) achieved 90.78% validation accuracy. Pre-trained models, like VGG16 and ResNet50, were also analyzed; ResNet50's steady learning pattern indicated the potential for better generalization. A detailed evaluation of these models revealed that variant 1 achieved a perfect score in precision, recall, and F1 score, indicating superior optimization and feature utilization. Variant 2 presented a balanced performance, with metrics between 86% and 93%. VGG16 mirrored the behavior of variant 2, both maintaining around 90% accuracy. In contrast, ResNet50's results revealed a conservative approach for class 0 predictions. Overall, variant 1 stood out in performance, while both variant 2 and VGG16 showed balanced results. The reliability of CNN model variant 1 was highlighted by its high accuracy, suggesting potential for practical implementation in agriculture. In addition, a smartphone application for the identification of SL in a field-based trial showed promising results, with an accuracy of 98–99%. The conclusion is that a CNN model with batch normalization has the potential to play a crucial role in redefining and optimizing the management of undesirable vegetation. Full article
(This article belongs to the Section Precision and Digital Agriculture)
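For readers who want a concrete starting point, here is a minimal Keras sketch of a CNN that combines batch normalization with dropout, in the spirit of the paper's variant 1; the input size, filter counts, dropout rate, and optimizer are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch (not the authors' code): a small CNN mixing batch
# normalization with dropout. Layer sizes and rates are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def build_variant1(input_shape=(224, 224, 3), num_classes=2):
    return tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.BatchNormalization(),   # stabilizes activations per batch
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),           # the "adjusted" dropout rate is a guess
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_variant1()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```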
Figures:

Figure 1. Examples of sericea lespedeza images (first row), weed images (second row), and SL images in between weeds (third row), a combination used in the development of CNN image-based classification. The examples span different colors, lighting conditions, camera angles, and distances from the plants.
Figure 2. Flow diagram for the pipeline showing development of the image classification model and weed identification app.
Figure 3. Graphical representation of the batch normalization with adjusted dropout method. Black circles represent dropped-out (non-functional) neural network nodes.
Figure 4. Diagrammatic representation of the RMSprop optimization with adjusted dropout method.
Figure 5. Training and validation accuracies and loss for CNN model variant 1 (batch normalization with adjusted dropout) on the SL image datasets.
Figure 6. Training and validation accuracies and loss for CNN model variant 2 (RMSprop optimizer with adjusted learning rate) on the SL image datasets.
Figure 7. Training and validation accuracies and loss for the pre-trained VGG16 model on the SL image datasets.
Figure 8. Training and validation accuracies and loss for the pre-trained ResNet50 model on the SL image datasets.
Figure 9. Flowchart and smartphone app results for differentiation of SL from weed species.
16 pages, 744 KiB  
Article
Causal Inference and Prefix Prompt Engineering Based on Text Generation Models for Financial Argument Analysis
by Fei Ding, Xin Kang, Linhuang Wang, Yunong Wu, Satoshi Nakagawa and Fuji Ren
Electronics 2024, 13(9), 1746; https://doi.org/10.3390/electronics13091746 - 1 May 2024
Viewed by 489
Abstract
The field of argument analysis has become a crucial component in the advancement of natural language processing, which holds the potential to reveal unprecedented insights from complex data and enable more efficient, cost-effective solutions for enhancing human initiatives. Despite its importance, current technologies face significant challenges, including (1) low interpretability, (2) lack of precision and robustness, particularly in specialized fields like finance, and (3) the inability to deploy effectively on lightweight devices. To address these challenges, we introduce a framework uniquely designed to process and analyze massive volumes of argument data efficiently and accurately. This framework employs a text-to-text Transformer generation model as its backbone, utilizing multiple prompt engineering methods to fine-tune the model. These methods include Causal Inference from ChatGPT, which addresses the interpretability problem, and Prefix Instruction Fine-tuning as well as in-domain further pre-training, which tackle the issues of low robustness and accuracy. Ultimately, the proposed framework generates conditional outputs for specific tasks using different decoders, enabling deployment on consumer-grade devices. After conducting extensive experiments, our method achieves high accuracy, robustness, and interpretability across various tasks, including the highest F1 scores in the NTCIR-17 FinArg-1 tasks. Full article
(This article belongs to the Section Artificial Intelligence)
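As a rough illustration of prefix-instruction fine-tuning on a text-to-text backbone (the paper's setting), the sketch below prepends a task prefix to the input and trains with the standard sequence-to-sequence loss; the model checkpoint, prefix wording, and label text are assumptions.

```python
# Hedged sketch of prefix-instruction fine-tuning on a T5-style model.
# The prefix string and label set are invented for illustration.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

prefix = "classify financial argument unit: "   # assumed task prefix
text = "The company's revenue guidance implies strong demand."
label = "claim"                                 # assumed label text

inputs = tok(prefix + text, return_tensors="pt")
targets = tok(label, return_tensors="pt").input_ids
loss = model(**inputs, labels=targets).loss     # standard seq2seq CE loss
loss.backward()                                 # one step; optimizer omitted
```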
Figures:

Figure 1. The proposed PPEF overview. Yellow text represents the original text of the dataset, and green text represents the labels.
Figure 2. Causal Inference from ChatGPT enables the T5 model to simultaneously output text labels and inferences. The original FinArg-1 text and the labels used by the different methods are highlighted.
Figure 3. Long/short instructions act on inference results.
20 pages, 5330 KiB  
Article
Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models
by Hongkang Chu and Taigang Liu
Int. J. Mol. Sci. 2024, 25(8), 4507; https://doi.org/10.3390/ijms25084507 - 19 Apr 2024
Viewed by 554
Abstract
Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, recent decades have seen a rise in computational methods. These alternatives support drug discovery by creating advanced predictive models. In this study, we proposed a fast and precise classifier for the identification of druggable proteins using a protein language model (PLM) with fine-tuned evolutionary scale modeling 2 (ESM-2) embeddings, achieving 95.11% accuracy on the benchmark dataset. Furthermore, we made a careful comparison to examine the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features by using the same classifiers. The results suggest that ESM-2 embeddings outperformed PSSM features in terms of accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on the generative pre-trained transformer 2 (GPT-2) with modifications. To our knowledge, this is the first time the large language model (LLM) GPT-2 has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, known as Pharos, was adopted to further validate the performance of the proposed model. Full article
(This article belongs to the Special Issue Deep Learning in Bioinformatics and Biological Data Analysis)
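A hedged sketch of the general idea of repurposing GPT-2 as a classifier (the paper's specific architectural modifications are not reproduced): Hugging Face's GPT2ForSequenceClassification attaches a classification head to the decoder, and the amino-acid string below is a toy example.

```python
# Sketch only: GPT-2 with a sequence-classification head, analogous in
# spirit to the paper's modified GPT-2. Protein sequences are treated
# as plain text; tokenization quality is not addressed here.
from transformers import GPT2TokenizerFast, GPT2ForSequenceClassification

tok = GPT2TokenizerFast.from_pretrained("gpt2")
tok.pad_token = tok.eos_token                     # GPT-2 has no pad token
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tok.eos_token_id

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"          # toy amino-acid string
batch = tok(seq, return_tensors="pt", truncation=True)
logits = model(**batch).logits                    # shape [1, 2]: druggable vs. not
```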
Figures:

Figure 1. Flow chart of the study. ESM-2: evolutionary scale modeling 2; 320D: 320 dimensions; SVM: support vector machine; DNN: deep neural network; NB: naive Bayes; XGB: extreme gradient boosting; CapsNet: capsule network; BiLSTM: bidirectional long short-term memory; RF: random forest; DPC-PSSM: dipeptide composition position-specific scoring matrix; KSB-PSSM: K-separated-bigrams position-specific scoring matrix; GPT-2: generative pre-trained transformer 2; 400D: 400 dimensions; 1200D: 1200 dimensions.
Figure 2. ROC curves for different datasets using various features with 5-fold CV on the test set. Abbreviations: SVM: support vector machine; XGB: extreme gradient boosting; NB: naive Bayes; RF: random forest; AUC: area under the ROC curve; ESM-2: evolutionary scale modeling 2; ROC: receiver operating characteristic; CV: cross-validation.
Figure 3. SHAP analysis of the importance of PSSM-based features on Jamali's dataset. (a) XGB; (b) SVM; (c) RF. Abbreviations: SHAP: Shapley additive explanations; XGB: extreme gradient boosting; SVM: support vector machine; RF: random forest; DPC-PSSM: dipeptide composition position-specific scoring matrix; KSB-PSSM: K-separated-bigrams position-specific scoring matrix.
Figure 4. ACC and MCC plots of CapsNets with various kernel sizes across different training epochs. (a) ACC; (b) MCC. Abbreviations: ACC: accuracy; MCC: Matthews correlation coefficient; CapsNets: capsule networks.
Figure 5. UMAP visualization on two datasets with different features and different models. Abbreviations: ESM-2: evolutionary scale modeling 2; BiLSTM: bidirectional long short-term memory; CapsNet: capsule network; DNN: deep neural network.
Figure 6. Heat map of labeling differences for the common protein sequences across the two datasets, totaling 2101 protein sequences.
Figure 7. Protein contact maps. The first row is the underfitted ESM-2 model, the second row is the model fine-tuned on Jamali's dataset, and the third row is the original ESM-2 model. The maps in each row are labeled as undruggable protein (Jamali's dataset), druggable protein (Jamali's dataset), undruggable protein (Pharos dataset), and druggable protein (Pharos dataset). Within each column, the protein is the same.
Figure 8. The web server interface.
Figure 9. DNN architecture.
Figure 10. Capsule network architecture.
Figure 11. BiLSTM network architecture.
Figure 12. The modified GPT-2 for classification. (a) The embedding technique; (b) the modified architecture.
24 pages, 1873 KiB  
Article
Enhancing Child Safety in Online Gaming: The Development and Application of Protectbot, an AI-Powered Chatbot Framework
by Anum Faraz, Fardin Ahsan, Jinane Mounsef, Ioannis Karamitsos and Andreas Kanavos
Information 2024, 15(4), 233; https://doi.org/10.3390/info15040233 - 19 Apr 2024
Viewed by 871
Abstract
This study introduces Protectbot, an innovative chatbot framework designed to improve safety in children’s online gaming environments. At its core, Protectbot incorporates DialoGPT, a conversational Artificial Intelligence (AI) model rooted in Generative Pre-trained Transformer 2 (GPT-2) technology, engineered to simulate human-like interactions within gaming chat rooms. The framework is distinguished by a robust text classification strategy, rigorously trained on the Publicly Available Natural 2012 (PAN12) dataset, aimed at identifying and mitigating potential sexual predatory behaviors through chat conversation analysis. By utilizing fastText for word embeddings to vectorize sentences, we have refined a support vector machine (SVM) classifier, achieving remarkable performance metrics, with recall, accuracy, and F-scores approaching 0.99. These metrics not only demonstrate the classifier’s effectiveness, but also signify a significant advancement beyond existing methodologies in this field. The efficacy of our framework is additionally validated on a custom dataset, composed of 71 predatory chat logs from the Perverted Justice website, further establishing the reliability and robustness of our classifier. Protectbot represents a crucial innovation in enhancing child safety within online gaming communities, providing a proactive, AI-enhanced solution to detect and address predatory threats promptly. Our findings highlight the immense potential of AI-driven interventions to create safer digital spaces for young users. Full article
(This article belongs to the Special Issue Do (AI) Chatbots Pose any Special Challenges for Trust and Privacy?)
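A minimal sketch of the described pipeline, fastText sentence vectors feeding an SVM; the toy corpus, labels, and hyperparameters are invented, and the authors' preprocessing and the PAN12 data are not reproduced.

```python
# Hedged sketch: fastText sentence embeddings + SVM, mirroring the
# abstract's pipeline. Corpus lines and labels are toy stand-ins.
import fasttext
import numpy as np
from sklearn.svm import SVC

train_texts = ["hey, how was school today", "do your parents check your phone",
               "want to play another round", "you can trust me, keep it secret"]
train_labels = [0, 1, 0, 1]              # 1 = predatory-style line (toy labels)

with open("chat_corpus.txt", "w") as f:  # unsupervised embedding corpus
    f.write("\n".join(train_texts))
ft = fasttext.train_unsupervised("chat_corpus.txt", minCount=1)

def embed(lines):
    return np.vstack([ft.get_sentence_vector(l) for l in lines])

clf = SVC(kernel="rbf").fit(embed(train_texts), train_labels)
print(clf.predict(embed(["how old are you?"])))
```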
Figures:

Figure 1. The operational workflow of the Protectbot system architecture (6 steps).
Figure 2. Protectbot conversational language generation model architecture (Steps 1–4).
Figure 3. Architecture of Protectbot's classification model: sequential process from conversational engagement (Step 5) to behavioral analysis (Step 6).
Figure 4. Comprehensive pipeline of the Protectbot classification model: process from initial data input through preprocessing, feature extraction, and final classification.
Figure 5. Structured example from the PAN12 dataset, illustrating detailed interaction attributes.
Figure 6. Overview of Protectbot system integration: from initial chat to potential predator identification.
Figure 7. Detailed confusion matrix visualization for the classifiers.
18 pages, 1459 KiB  
Article
Contrastive Learning Penalized Cross-Entropy with Diversity Contrastive Search Decoding for Diagnostic Report Generation of Reduced Token Repetition
by Taozheng Zhang, Jiajian Meng, Yuseng Yang and Shaode Yu
Appl. Sci. 2024, 14(7), 2817; https://doi.org/10.3390/app14072817 - 27 Mar 2024
Viewed by 536
Abstract
Medical imaging description and disease diagnosis are vitally important yet time-consuming. Automated diagnosis report generation (DRG) from medical imaging description can reduce clinicians' workload and improve their routine efficiency. To address this natural language generation task, fine-tuning a pre-trained large language model (LLM) is cost-effective and indispensable, and its success has been witnessed in many downstream applications. However, semantic inconsistency of sentence embeddings has been massively observed from undesirable repetitions or unnaturalness in text generation. To address the underlying issue of the anisotropic distribution of token representations, in this study, a contrastive learning penalized cross-entropy (CLpCE) objective function is implemented to enhance the semantic consistency and accuracy of token representations by guiding the fine-tuning procedure towards a specific task. Furthermore, to improve the diversity of token generation in text summarization and to prevent sampling from the unreliable tail of token distributions, a diversity contrastive search (DCS) decoding method is designed for restricting the report generation derived from a probable candidate set with maintained semantic coherence. In addition, a novel metric named the maximum of token repetition ratio (maxTRR) is proposed to estimate the token diversity and to help determine the candidate output. Based on a Chinese-version generative pre-trained Transformer 2 (GPT-2) LLM, the proposed CLpCE with DCS (CLpCEwDCS) decoding framework is validated on 30,000 desensitized text samples from the "Medical Imaging Diagnosis Report Generation" track of the 2023 Global Artificial Intelligence Technology Innovation Competition. Using four kinds of metrics evaluated from n-gram word matching, semantic relevance, and content similarity, as well as the maxTRR metric, extensive experiments reveal that the proposed framework effectively maintains semantic coherence and accuracy (BLEU-1, 0.4937; BLEU-2, 0.4107; BLEU-3, 0.3461; BLEU-4, 0.2933; METEOR, 0.2612; ROUGE, 0.5182; CIDER, 1.4339) and improves text generation diversity and naturalness (maxTRR, 0.12). The phenomenon of dull or repetitive text generation is common when fine-tuning pre-trained LLMs for natural language processing applications. This study might shed some light on relieving this issue by developing comprehensive strategies to enhance the semantic coherence, accuracy, and diversity of sentence embeddings. Full article
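As one plausible reading of the CLpCE objective, the sketch below blends cross-entropy with a token-level contrastive penalty in the style of SimCTG; the β weighting and margin values are assumptions, not the paper's reported settings.

```python
# Hedged PyTorch sketch of a CLpCE-style objective (beta and margin are
# assumptions; the contrastive term follows SimCTG's token-level form).
import torch
import torch.nn.functional as F

def clpce_loss(logits, targets, token_states, beta=0.5, margin=0.5):
    """(1 - beta) * cross-entropy + beta * contrastive penalty.
    logits: [T, V]; targets: [T]; token_states: [T, d] decoder states."""
    ce = F.cross_entropy(logits, targets)
    z = F.normalize(token_states, dim=-1)
    sim = z @ z.t()                                  # pairwise cosine; diag == 1
    mask = ~torch.eye(z.size(0), dtype=torch.bool)
    cl = F.relu(margin - 1.0 + sim[mask]).mean()     # keep distinct tokens apart
    return (1 - beta) * ce + beta * cl
```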
Figures:

Figure 1. The structure of Transformer and GPT-2 decoder blocks.
Figure 2. The CLpCE-based model fine-tuning procedure. L_CE guides the supervised learning and L_CL directs the unsupervised learning, both parts contributing to the fine-tuning of pre-trained LLMs for accurate feature representation towards a specific task.
Figure 3. The effect of different β values and decoding methods on DRG text summarization. The horizontal axis denotes the β values in the CLpCE objective function, and the vertical axis presents the values of the evaluation metrics. Combinations of line types, markers, and colors identify the metric values of a DRG model (BLEU-1, solid black line with ★; BLEU-2, dashed black line with ∘; BLEU-3, dotted black line with ♢; BLEU-4, dash-dotted black line with □; METEOR, dashed red line with ⊳; ROUGE, dashed green line with △; and CIDER, dashed blue line with ▽).
Figure 4. The effect of the control threshold ρ on text generation diversity (ρ = 0.01, dotted red line with ♢; ρ = 0.10, dashed blue line with ∘).
21 pages, 1298 KiB  
Article
A Unified Visual and Linguistic Semantics Method for Enhanced Image Captioning
by Jiajia Peng and Tianbing Tang
Appl. Sci. 2024, 14(6), 2657; https://doi.org/10.3390/app14062657 - 21 Mar 2024
Viewed by 599
Abstract
Image captioning, also recognized as the challenge of transforming visual data into coherent natural language descriptions, has persisted as a complex problem. Traditional approaches often suffer from semantic gaps, wherein the generated textual descriptions lack depth, context, or the nuanced relationships contained within the images. In an effort to overcome these limitations, we introduce a novel encoder–decoder framework called A Unified Visual and Linguistic Semantics Method. Our method comprises three key components: an encoder, a mapping network, and a decoder. The encoder employs a fusion of CLIP (Contrastive Language–Image Pre-training) and SegmentCLIP to process and extract salient image features. SegmentCLIP builds upon CLIP’s foundational architecture by employing a clustering mechanism, thereby enhancing the semantic relationships between textual and visual elements in the image. The extracted features are then transformed by a mapping network into a fixed-length prefix. A GPT-2-based decoder subsequently generates a corresponding Chinese language description for the image. This framework aims to harmonize feature extraction and semantic enrichment, thereby producing more contextually accurate and comprehensive image descriptions. Our quantitative assessment reveals that our model exhibits notable enhancements across the intricate AIC-ICC, Flickr8k-CN, and COCO-CN datasets, evidenced by a 2% improvement in BLEU@4 and a 10% uplift in CIDEr scores. Additionally, it demonstrates acceptable efficiency in terms of simplicity, speed, and reduction in computational burden. Full article
(This article belongs to the Special Issue Recent Trends in Automatic Image Captioning Systems)
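A hedged sketch of the prefix idea (ClipCap-style; the paper's SegmentCLIP encoder and Chinese GPT-2 decoder are not reproduced): an MLP maps a single image embedding to a fixed number of "virtual token" embeddings that are prepended to the decoder input.

```python
# Sketch only: a mapping network from one image embedding to k prefix
# embeddings for a GPT-2 decoder. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=10):
        super().__init__()
        self.prefix_len, self.gpt_dim = prefix_len, gpt_dim
        hidden = gpt_dim * prefix_len // 2
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, gpt_dim * prefix_len),
        )

    def forward(self, clip_embed):                 # [B, clip_dim]
        out = self.mlp(clip_embed)                 # [B, prefix_len * gpt_dim]
        return out.view(-1, self.prefix_len, self.gpt_dim)

prefix = MappingNetwork()(torch.randn(1, 512))     # [1, 10, 768] for inputs_embeds
```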
Figures:

Figure 1. Framework of UVLS.
Figure 2. (a) The architecture and training pipeline of SegmentCLIP. The images to the right illustrate the visual segments that manifest across different grouping stages. In the lower stages, pixels are grouped into parts of objects, such as the hands and legs of a woman or a little girl; in the higher stages, these are further amalgamated into complete entities, such as the entire body of the woman and the little girl. (b) The architecture of the segmenting block. At the end of each grouping stage, a segmenting block calculates the similarity between the learned tokens and the segment (image) tokens. The assignment is determined via a Gumbel softmax operation over the learned tokens and is then converted into a hard one-hot assignment. Segment tokens assigned to the same group are merged, forming new segment tokens that serve as input for the subsequent grouping stage.
Figure 3. Effect of the prefix length on captioning performance over the AIC-ICC dataset. For each prefix length, the BLEU@4 (red) and CIDEr (blue) scores are reported over the test and train (dashed line) sets.
17 pages, 1313 KiB  
Article
Using Generative AI to Improve the Performance and Interpretability of Rule-Based Diagnosis of Type 2 Diabetes Mellitus
by Leon Kopitar, Iztok Fister and Gregor Stiglic
Information 2024, 15(3), 162; https://doi.org/10.3390/info15030162 - 12 Mar 2024
Viewed by 1346
Abstract
Introduction: Type 2 diabetes mellitus is a major global health concern, but interpreting machine learning models for diagnosis remains challenging. This study investigates combining association rule mining with advanced natural language processing to improve both diagnostic accuracy and interpretability. This novel approach has not been explored before in using pretrained transformers for diabetes classification on tabular data. Methods: The study used the Pima Indians Diabetes dataset to investigate Type 2 diabetes mellitus. Python and Jupyter Notebook were employed for analysis, with the NiaARM framework for association rule mining. LightGBM and the dalex package were used for performance comparison and feature importance analysis, respectively. SHAP was used for local interpretability. OpenAI GPT version 3.5 was utilized for outcome prediction and interpretation. The source code is available on GitHub. Results: NiaARM generated 350 rules to predict diabetes. LightGBM performed better than the GPT-based model. A comparison of GPT and NiaARM rules showed disparities, prompting a similarity score analysis. LightGBM’s decision making leaned heavily on glucose, age, and BMI, as highlighted in feature importance rankings. Beeswarm plots demonstrated how feature values correlate with their influence on diagnosis outcomes. Discussion: Combining association rule mining with GPT for Type 2 diabetes mellitus classification yields limited effectiveness. Enhancements like preprocessing and hyperparameter tuning are required. Interpretation challenges and GPT’s dependency on provided rules indicate the necessity for prompt engineering and similarity score methods. Variations in feature importance rankings underscore the complexity of T2DM. Concerns regarding GPT’s reliability emphasize the importance of iterative approaches for improving prediction accuracy. Full article
(This article belongs to the Special Issue Machine Learning and Artificial Intelligence with Applications)
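A hedged sketch of the rule-conditioned prompting step using the current OpenAI Python SDK; the prompt wording, the example rule, and the feature values are invented for illustration.

```python
# Hedged sketch of prompting GPT-3.5 with mined rules plus patient
# features (rule text and values are toy stand-ins, not the study's data).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
rules = "IF Glucose > 140 AND BMI > 30 THEN diabetic"   # one NiaARM-style rule
patient = "Glucose=155, BMI=33.1, Age=47"

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Classify the patient as diabetic or non-diabetic "
                    "using only the given rules, and explain briefly."},
        {"role": "user", "content": f"Rules: {rules}\nPatient: {patient}"},
    ],
)
print(resp.choices[0].message.content)
```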
Figures:

Figure 1. Architecture of the proposed approach.
Figure 2. Ten GPT-recognized rules that contributed to predicted non-diabetic status with the highest frequency. Percentages in the circular pie chart represent the proportions among the ten most impactful rules.
Figure 3. Ten GPT-recognized rules that contributed to predicted diabetic status with the highest frequency. Percentages in the circular pie chart represent the proportions among the ten most impactful rules.
Figure 4. A ranking diagram of feature importance displayed as the average impact on model output magnitude. The bigger the average impact, the higher the importance of the feature.
Figure 5. Feature importance displayed as an average impact on model output magnitude.
27 pages, 9431 KiB  
Article
Generative Pre-Trained Transformer (GPT) in Research: A Systematic Review on Data Augmentation
by Fahim Sufi
Information 2024, 15(2), 99; https://doi.org/10.3390/info15020099 - 8 Feb 2024
Cited by 4 | Viewed by 5896
Abstract
GPT (Generative Pre-trained Transformer) represents advanced language models that have significantly reshaped the academic writing landscape. These sophisticated language models offer invaluable support throughout all phases of research work, facilitating idea generation, enhancing drafting processes, and overcoming challenges like writer’s block. Their capabilities extend beyond conventional applications, contributing to critical analysis, data augmentation, and research design, thereby elevating the efficiency and quality of scholarly endeavors. Strategically narrowing its focus, this review explores alternative dimensions of GPT and LLM applications, specifically data augmentation and the generation of synthetic data for research. Employing a meticulous examination of 412 scholarly works, it distills a selection of 77 contributions addressing three critical research questions: (1) GPT on Generating Research data, (2) GPT on Data Analysis, and (3) GPT on Research Design. The systematic literature review adeptly highlights the central focus on data augmentation, encapsulating 48 pertinent scholarly contributions, and extends to the proactive role of GPT in critical analysis of research data and shaping research design. Pioneering a comprehensive classification framework for “GPT’s use on Research Data”, the study classifies existing literature into six categories and 14 sub-categories, providing profound insights into the multifaceted applications of GPT in research data. This study meticulously compares 54 pieces of literature, evaluating research domains, methodologies, and advantages and disadvantages, providing scholars with profound insights crucial for the seamless integration of GPT across diverse phases of their scholarly pursuits. Full article
(This article belongs to the Special Issue Editorial Board Members’ Collection Series: "Information Processes")
Figures:

Figure 1. Conceptual diagram of how GPT performs feature extraction, data augmentation, and synthetic data generation.
Figure 2. Use of GPT and associated LLMs in all phases of research: 48 scholarly works on data augmentation (starred, denoting the main focus of this review), 12 existing publications on critical analysis (i.e., research data analysis), and 10 papers on research design.
Figure 3. Search keywords used for obtaining relevant existing academic works on "GPT, LLM, and associated technologies in different phases of research".
Figure 4. Schematic diagram of the systematic literature review (i.e., use of GPT, LLM, and associated technologies in different phases of research).
Figure 5. A comprehensive classification framework for "GPT's use on research data".
Figure 6. A comparative schematic of the feature extraction process with NLP and GPT.
Figure 7. Chat2VIS analyzes data and shows results in a visualization with a GPT prompt like "plot the gross against budget" [23].
Figure 8. PRISMA flow diagram of the systematic literature review of "GPT for research".
Figure 9. Timeline analysis of existing literature on the use of GPT in research.
Figure A1. Database search from Scopus using a Scopus-specific advanced query. From Scopus, 99 documents were returned, including duplicates. After removing the duplicates, records were screened. For example, the first record, "Beyond the Scalpel: Assessing ChatGPT's Potential as an Auxiliary Intelligent Virtual Assistant in Oral Surgery", is not relevant to the focus of this study (i.e., GPT in research, data augmentation, data generation, or solving research problems).
Figure A2. Database search from IEEE Xplore using IEEE Xplore-specific advanced queries. A total of 119 documents were returned, including duplicates. After removing the duplicates, records were screened. For example, the first record was included, and the second record was screened out as it does not address "GPT in research".
Figure A3. Database search from PubMed using a PubMed-specific advanced query. From PubMed, 47 documents were returned, including duplicates.
Figure A4. Database search from Web of Science using their supported advanced query. From Web of Science, 306 documents were returned, including duplicates. After removing the duplicates, the records were screened. For example, the first record was screened out as that paper focused on nanotechnology and nanomaterials.
Figure A5. Database search from the ACM Digital Library using their supported advanced query. From the ACM Digital Library, 102 documents were returned, including duplicates.
Figure A6. Litmaps suggests 20 possibly relevant articles by visually analyzing the citation maps of [66].
19 pages, 2348 KiB  
Article
BookGPT: A General Framework for Book Recommendation Empowered by Large Language Model
by Zhiyu Li, Yanfang Chen, Xuan Zhang and Xun Liang
Electronics 2023, 12(22), 4654; https://doi.org/10.3390/electronics12224654 - 15 Nov 2023
Cited by 1 | Viewed by 1818
Abstract
With the continuous development and change exhibited by large language model (LLM) technology, represented by generative pretrained transformers (GPTs), many classic scenarios in various fields have re-emerged with new opportunities. This paper takes ChatGPT as the modeling object, incorporates LLM technology into the typical book resource understanding and recommendation scenario for the first time, and puts it into practice. By building a ChatGPT-like book recommendation system (BookGPT) framework based on ChatGPT, this paper attempts to apply ChatGPT to recommendation modeling for three typical tasks: book rating recommendation, user rating recommendation, and the book summary recommendation; it also explores the feasibility of LLM technology in book recommendation scenarios. At the same time, based on different evaluation schemes for book recommendation tasks and the existing classic recommendation models, this paper discusses the advantages and disadvantages of the BookGPT in book recommendation scenarios and analyzes the opportunities and improvement directions for subsequent LLMs in these scenarios. The experimental research shows the following: (1) The BookGPT can achieve good recommendation results in existing classic book recommendation tasks. Especially in cases containing less information about the target object to be recommended, such as zero-shot or one-shot learning tasks, the performance of the BookGPT is close to or even better than that of the current classic book recommendation algorithms, and this method has great potential for improvement. (2) In text generation tasks such as book summary recommendation, the recommendation effect of the BookGPT model is better than that of the manual editing process of Douban Reading, and it can even perform personalized interpretable content recommendations based on readers’ attribute and identity information, making it more persuasive than interpretable one-size-fits-all recommendation models. Finally, we have open-sourced the relevant datasets and experimental codes, hoping that the exploratory program proposed in this paper can inspire the development of more LLMs to expand their applications and theoretical research prospects in the field of book recommendation and general recommendation tasks. Full article
Figures:

Figure 1. ChatGPT's search volume in the Baidu Index from November 2022 to April 2023.
Figure 2. Framework of the BookGPT.
Figure 3. Prompt examples for the BookGPT.
Figure 4. Example of role injection.
Figure 5. NDCG scores obtained in the user rating preference recommendation task. NDCG is a metric used to evaluate the performance of recommendation and information retrieval systems, considering both the relevance and ranking of recommended items. The value of NDCG ranges from 0 to 1, with 1 indicating optimal performance; hence, a higher NDCG value signifies a more effective system.
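To make the NDCG gloss in the Figure 5 caption concrete, here is a toy computation (invented relevance scores) showing the logarithmic rank discount and the normalization by the ideal ordering.

```python
# Worked NDCG sketch on toy relevance scores (not the paper's data):
# DCG discounts relevance by log2(rank + 1); NDCG divides by the ideal DCG.
import numpy as np

def dcg(rels):
    return sum(r / np.log2(i + 2) for i, r in enumerate(rels))

ranked_rels = [3, 2, 3, 0, 1]           # relevance of items as recommended
ideal_rels = sorted(ranked_rels, reverse=True)
ndcg = dcg(ranked_rels) / dcg(ideal_rels)
print(round(ndcg, 4))                    # 1.0 only for a perfect ranking
```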
19 pages, 4236 KiB  
Article
Improved Leakage Detection and Recognition Algorithm for Residual Neural Networks Based on Transfer Learning
by Liangliang Li, Yu Chen, Zhengxiang Ma, Xinling Wen, Jiabao Pang and Weitao Yuan
Electronics 2023, 12(20), 4378; https://doi.org/10.3390/electronics12204378 - 23 Oct 2023
Viewed by 903
Abstract
Due to the lack of other component information in traditional magnetic leakage defect signals and the low accuracy of existing prediction methods, this paper proposes an improved residual-network-based method for magnetic leakage defect recognition that predicts defect size across different detection speeds. A new defect diagnosis method based on ResNet18, a convolutional neural network (CNN), is proposed in this study. This method transfers the pre-trained ResNet18 network and replaces the activation function in the transferred network structure. It extracts features from two-dimensional images obtained by converting the original experimental signals and signals with added noise, removing the influence of manual feature engineering. The results demonstrated that the improved ResNet18 network model, after transfer learning, achieved 100% prediction accuracy for all 10,000 grayscale images generated with a defect length of 50 mm, a width of 2 mm, and depths of 2 mm, 4 mm, 6 mm, and 8 mm. Moreover, the prediction accuracies for the quasi-static, slow, compensated fast, and fast scanning speeds were 99.20%, 98.50%, 93.30%, and 94.00%, respectively, for defect depths of 2 mm, 4 mm, 6 mm, and 8 mm. These accuracies surpass those of other models, demonstrating the significant improvement in prediction accuracy achieved by this method. Full article
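A minimal sketch of the transfer-learning recipe described above: load a pre-trained ResNet18, swap its activation functions, and re-head it for the four defect-depth classes. The replacement activation (LeakyReLU) is an assumption, as the abstract does not name the authors' choice.

```python
# Sketch of the transfer-learning setup (LeakyReLU swap is an assumption).
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

def swap_relu(module):
    # recursively replace every ReLU with LeakyReLU
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.LeakyReLU(0.1, inplace=True))
        else:
            swap_relu(child)

swap_relu(model)
model.fc = nn.Linear(model.fc.in_features, 4)   # 2/4/6/8 mm depth classes
```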
Figures:

Figure 1. Detection principle of the magnetic leakage method.
Figure 2. The characteristics of defect leakage magnetic signals in the radial and axial components.
Figure 3. Radial leakage curve.
Figure 4. Axial leakage curve. Note: ρ_s/(2πμ_0) is a constant, the defect width n = 5 mm, the lift-off distance y = 5 mm, and the defect depth h is taken as 2 mm, 4 mm, 6 mm, and 8 mm, respectively. Under the same conditions, as the defect depth increases, the difference between the peak and valley values of the radial leakage magnetic signal increases, and the peak value of the axial leakage magnetic signal increases.
Figure 5. Simple CNN architecture.
Figure 6. Improved ResNet18 architecture with transfer learning.
Figure 7. Overall algorithm flowchart.
Figure 8. Process flowchart of converting magnetic signals to grayscale images.
Figure 9. Signal–image conversion diagram.
Figure 10. Feature maps corresponding to different defect sizes. "l" represents the length of the defect, "w" the width, and "d" the depth.
Figure 11. Input layer feature changes.
Figure 12. Basic block changes.
Figure 13. Recognition accuracy of ResNet18 after the transfer-learning improvement.
Figure 14. Comparison of confusion matrices for the four types of defects.
Figure 15. Comparison of the cost function for the four scanning detection speeds vs. defects of the same depth.
Figure 16. Comparison of the recognition accuracy for defects of different depths at the four scanning detection speeds.
72 pages, 4682 KiB  
Review
Deep Learning for Medical Image-Based Cancer Diagnosis
by Xiaoyan Jiang, Zuojin Hu, Shuihua Wang and Yudong Zhang
Cancers 2023, 15(14), 3608; https://doi.org/10.3390/cancers15143608 - 13 Jul 2023
Cited by 18 | Viewed by 6642
Abstract
(1) Background: The application of deep learning technology to realize cancer diagnosis based on medical images is one of the research hotspots in the fields of artificial intelligence and computer vision. Given the rapid development of deep learning methods, the very high accuracy and timeliness that cancer diagnosis requires, and the inherent particularity and complexity of medical imaging, a comprehensive review of relevant studies is necessary to help readers better understand the current research status and ideas. (2) Methods: Five types of radiological images, namely X-ray, ultrasound (US), computed tomography (CT), magnetic resonance imaging (MRI), and positron emission computed tomography (PET), as well as histopathological images, are reviewed in this paper. The basic architecture of deep learning and classical pretrained models are comprehensively reviewed. In particular, advanced neural networks emerging in recent years, including transfer learning, ensemble learning (EL), graph neural networks, and vision transformers (ViT), are introduced. Overfitting prevention methods, including batch normalization, dropout, weight initialization, and data augmentation, are summarized. The application of deep learning technology in medical image-based cancer analysis is sorted out. (3) Results: Deep learning has achieved great success in medical image-based cancer diagnosis, showing good results in image classification, image reconstruction, image detection, image segmentation, image registration, and image synthesis. However, the lack of high-quality labeled datasets limits the role of deep learning, and it faces challenges in rare cancer diagnosis, multi-modal image fusion, model explainability, and generalization. (4) Conclusions: There is a need for more public standard databases for cancer. Pre-training models based on deep neural networks have the potential to be improved, and special attention should be paid to research on multimodal data fusion and supervised paradigms. Technologies such as ViT, ensemble learning, and few-shot learning will bring surprises to cancer diagnosis based on medical images. Full article
Figures:

Figure 1. Basic schematic diagram of CT.
Figure 2. General MRI imaging procedures.
Figure 3. Block diagram of a real-time two-dimensional color flow ultrasonic imaging system.
Figure 4. Basic schematic diagram of X-ray.
Figure 5. Basic schematic diagram of PET.
Figure 6. Schematic diagram of a convolutional neural network structure.
Figure 7. The structure of an autoencoder.
Figure 8. Scheme diagram of the DC-ELM network with three-layer convolution.
Figure 9. The RNN model structure diagram.
Figure 10. The GAN model structure diagram.
Figure 11. The classic DBN network structure.
Figure 12. The LeNet-5 architecture.
Figure 13. The basic residual block.
Figure 14. Schematic of a deep DenseNet with three dense blocks.
Figure 15. Diagram of applying dropout.
24 pages, 717 KiB  
Article
DiffuD2T: Empowering Data-to-Text Generation with Diffusion
by Heng Gong, Xiaocheng Feng and Bing Qin
Electronics 2023, 12(9), 2136; https://doi.org/10.3390/electronics12092136 - 7 May 2023
Viewed by 2340
Abstract
Surrounded by structured data, such as medical data, financial data, knowledge bases, etc., data-to-text generation has become an important natural language processing task that can help people better understand the meaning of those data by providing them with user-friendly text. Existing methods for data-to-text generation show promising results in tackling two major challenges: content planning and surface realization, which transform structured data into fluent text. However, they lack an iterative refinement process for generating text, which can enable the model to perfect the text step-by-step while accepting control over the process. In this paper, we explore enhancing data-to-text generation with an iterative refinement process via diffusion. We have four main contributions: (1) we use the diffusion model to improve the prefix tuning for data-to-text generation; (2) we propose a look-ahead guiding loss to supervise the iterative refinement process for better text generation; (3) we extract content plans from reference text and propose a planning-then-writing pipeline to give the model content planning ability; and (4) we conducted experiments on three data-to-text generation datasets, where both automatic evaluation criteria (BLEU, NIST, METEOR, ROUGE-L, CIDEr, TER, MoverScore, BLEURT, and BERTScore) and human evaluation criteria (Quality and Naturalness) show the effectiveness of our model. Our model can improve the competitive prefix tuning method by 2.19% in terms of a widely-used automatic evaluation criterion, BLEU (BiLingual Evaluation Understudy), on the WebNLG dataset with GPT-2 Large as the pretrained language model backbone. Human evaluation criteria also show that our model can improve the quality and naturalness of the generated text across all three datasets. Full article
(This article belongs to the Special Issue Natural Language Processing and Information Retrieval)
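One plausible reading of the look-ahead guiding loss, sketched below as a hinge on the gap between the generation losses of successive denoising steps; the paper's exact formulation may differ.

```python
# Hedged sketch (assumption, not the paper's verbatim definition):
# penalize a denoising step whenever the further-denoised prefix
# generates *worse* text, i.e., encourage
# L_gen(y0^{t-2}) < L_gen(y0^{t-1}) via a hinge on the gap.
import torch
import torch.nn.functional as F

def look_ahead_guiding_loss(loss_prev: torch.Tensor,
                            loss_next: torch.Tensor) -> torch.Tensor:
    # loss_prev = L_gen(y0^{t-1}); loss_next = L_gen(y0^{t-2})
    return F.relu(loss_next - loss_prev)

# toy check: an improving step (0.8 -> 0.6) incurs no penalty
print(look_ahead_guiding_loss(torch.tensor(0.8), torch.tensor(0.6)))  # tensor(0.)
```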
Figures:

Figure 1. An illustration of an example of data-to-text generation. In this example, the structured data are seven related triples from the knowledge base. Each triple consists of the name of the entity, the type of the information, and the corresponding value. Given the structured data, a text report faithfully expresses the information in the data.
Figure 2. An illustration of the step-by-step optimizing process of DiffuD2T. The left of the figure presents the iterative refinement process of the diffusion model for the representation of prefixes, as illustrated in Section 3.2. y_T is random noise sampled from the Gaussian distribution N(0, I). Through the reverse process that denoises y_T step-by-step, we ultimately obtain a high-quality y_0 for the representation of the prefix; the forward process adds noise to y_0 step-by-step. After obtaining y_0, a linear transformation maps it to the shape expected by the PLM so that it can serve as "virtual tokens" that help the PLM adapt to different tasks. The parameters of the diffusion model and the linear transformation are trainable, while the parameters of the PLM are frozen during training. The PLM takes the structured data as input and generates the text with the help of the prefixes.
Figure 3. An illustration of the proposed look-ahead guiding loss. During each denoising step t, we first predict y_0^{t-1} directly through f_θ(y_t, t), then obtain y_{t-1} through the forward process by iteratively applying q(y_{t-1} | y_{t-2}), according to Section 3.2. The top of the figure shows that we use the predicted y_0^{t-1} to obtain the corresponding prefix and calculate the loss L_gen(y_0^{t-1}) of the PLM on the target text, which indicates the denoised y_{t-1}'s performance on generating text. Then, as shown in the bottom part of the figure, we take one step further to denoise y_{t-1} again, obtaining y_0^{t-2} and y_{t-2}, and similarly obtain its text generation loss L_gen(y_0^{t-2}). Since a lower loss means better performance, we propose a look-ahead guiding loss L_ahead to supervise L_gen(y_0^{t-2}) < L_gen(y_0^{t-1}); that is, the further-denoised representation should perform better in generating text than its predecessor.
Figure 4. An illustration of the planning-then-writing pipeline. The bottom of the figure shows the planning stage. We extract important information from the text to construct the content plan by identifying words that appear in both the target text and the structured data. Then, we use the structured data and the extracted plan to train a content planner with the same model structure as the text generator, as described in Sections 4.1 and 4.2. Finally, we use the parameters of the content planner to initialize the training of the text generator (shown at the top of the figure in gray), helping the text generator produce better text with the ability to plan.
19 pages, 9341 KiB  
Article
TESR: Two-Stage Approach for Enhancement and Super-Resolution of Remote Sensing Images
by Anas M. Ali, Bilel Benjdira, Anis Koubaa, Wadii Boulila and Walid El-Shafai
Remote Sens. 2023, 15(9), 2346; https://doi.org/10.3390/rs15092346 - 29 Apr 2023
Cited by 8 | Viewed by 2744
Abstract
Remote Sensing (RS) images are usually captured at resolutions lower than those required. Deep Learning (DL)-based super-resolution (SR) architectures are typically used to increase the resolution artificially. In this study, we designed a new architecture called TESR (Two-stage approach for Enhancement and super-resolution), leveraging the power of Vision Transformers (ViT) and the Diffusion Model (DM) to increase the resolution of RS images artificially. The first stage is the ViT-based model, which serves to increase resolution. The second stage is an iterative DM pre-trained on a larger dataset, which serves to increase image quality. Every stage is trained separately on the given task using a separate dataset. The self-attention mechanism of the ViT helps the first stage generate global and contextual details. The iterative Diffusion Model helps the second stage enhance the image’s quality and generate consistent and harmonic fine details. We found that TESR outperforms state-of-the-art architectures on super-resolution of remote sensing images on the UCMerced benchmark dataset. Considering the PSNR/SSIM metrics, TESR improves SR image quality as compared to state-of-the-art techniques from 34.03/0.9301 to 35.367/0.9449 in the scale ×2. On a scale of ×3, it improves from 29.92/0.8408 to 32.311/0.91143. On a scale of ×4, it improves from 27.77/0.7630 to 31.951/0.90456. We also found that the Charbonnier loss outperformed other loss functions in the training of both stages of TESR. The improvement was by a margin of 21.5%/14.3%, in the PSNR/SSIM, respectively. The source code of TESR is open to the community. Full article
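The PSNR/SSIM comparisons quoted above can be reproduced mechanically; here is a scikit-image sketch on stand-in arrays (in real use, the ground-truth and super-resolved images).

```python
# PSNR/SSIM computation as used in the abstract's comparisons; random
# arrays stand in for the ground-truth (GT) and super-resolved (SR) images.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

gt = np.random.rand(256, 256, 3)
sr = gt + 0.01 * np.random.randn(256, 256, 3)       # pretend SR output
psnr = peak_signal_noise_ratio(gt, sr, data_range=1.0)
ssim = structural_similarity(gt, sr, channel_axis=-1, data_range=1.0)
print(f"PSNR={psnr:.2f} dB, SSIM={ssim:.4f}")
```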
Show Figures

Figure 1
<p>Block diagram for the proposed algorithm (TESR).</p>
Full article ">Figure 2
<p>Illustration of different SR frameworks. (<b>a</b>) Pre-upsample framework. (<b>b</b>) Post-upsample framework.</p>
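The distinction in Figure 2 is where the upsampling happens; the schematic PyTorch sketch below contrasts the two orderings (layer widths and the ×4 factor are placeholders, not the paper's architecture).

```python
import torch.nn as nn
import torch.nn.functional as F

class PreUpsampleSR(nn.Module):
    """Pre-upsample: interpolate first, then refine at high resolution."""
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.refine = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, lr):
        up = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                           align_corners=False)
        return up + self.refine(up)            # residual refinement

class PostUpsampleSR(nn.Module):
    """Post-upsample: extract features at low resolution, upsample last."""
    def __init__(self, scale=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * scale ** 2, 3, padding=1))
        self.shuffle = nn.PixelShuffle(scale)  # channels -> spatial detail

    def forward(self, lr):
        return self.shuffle(self.features(lr))
```

The post-upsample ordering is cheaper because all convolutions run at the low input resolution.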
Full article ">Figure 3
<p>Block diagram for the SwinIR ViT model.</p>
Full article ">Figure 4
<p>The main concept behind iterative DM.</p>
Full article ">Figure 5
<p>The three main steps to train the TESR architecture.</p>
Full article ">Figure 6
<p>Result comparisons on the UCMerced dataset at different stages. (<b>a</b>) The ground-truth scene. (<b>b</b>) A bicubic interpolation scene with a ×4 factor. (<b>c</b>) A scene upsampled by SwinIR with a ×4 factor. (<b>d</b>) An iterative DM scene.</p>
Full article ">Figure 7
<p>Samples of RS images before and after applying our proposed model, including ground-truth (GT), low-resolution (LR), and super-resolution images for each stage.</p>
Full article ">Figure 8
<p>Samples of RS images after the second stage (iterative DM).</p>
Full article ">Figure 9
<p>Illustration of the histograms of the original RS image, the interpolated LR image, an image enlarged using SwinIR, and an image enhanced using DM.</p>
Full article ">
27 pages, 1427 KiB  
Article
An Empirical Analysis of State-of-Art Classification Models in an IT Incident Severity Prediction Framework
by Salman Ahmed, Muskaan Singh, Brendan Doherty, Effirul Ramlan, Kathryn Harkin, Magda Bucholc and Damien Coyle
Appl. Sci. 2023, 13(6), 3843; https://doi.org/10.3390/app13063843 - 17 Mar 2023
Cited by 2 | Viewed by 3177
Abstract
Large-scale companies across various sectors maintain substantial IT infrastructure to support their operations and provide quality services for their customers and employees. These IT operations are managed by teams who deal directly with incident reports (i.e., reports generated automatically by autonomous systems or [...] Read more.
Large-scale companies across various sectors maintain substantial IT infrastructure to support their operations and provide quality services for their customers and employees. These IT operations are managed by teams who deal directly with incident reports (i.e., reports generated automatically by autonomous systems or raised by human operators). (1) Background: Early identification of major incidents can provide a significant advantage for reducing the disruption to normal business operations, especially for preventing catastrophic disruptions, such as a complete system shutdown. (2) Methods: This study conducted an empirical analysis of eleven (11) state-of-the-art models to predict the severity of these incidents using an industry-led use-case composed of 500,000 records collected over one year. (3) Results: The datasets were generated from three stakeholders (i.e., agency, customer, and employee). Separately, the bidirectional encoder representations from transformers (BERT), the robustly optimized BERT pre-training approach (RoBERTa), the enhanced representation through knowledge integration (ERNIE 2.0), and the extreme gradient boosting (XGBoost) methods performed best for the agency records (93% AUC), while the convolutional neural network (CNN) was the best model for the rest (employee records at 95% AUC and customer records at 74% AUC). The average prediction horizon was approximately 150 min, which is significant for real-time deployment. (4) Conclusions: The study provided a comprehensive analysis that supported the deployment of artificial intelligence for IT operations (AIOps), specifically for incident management within large-scale organizations. Full article
(This article belongs to the Special Issue Machine/Deep Learning: Applications, Technologies and Algorithms)
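The abstract's model comparison reduces to fitting each candidate and scoring ROC/AUC per stakeholder dataset; a minimal scikit-learn sketch of that loop is below, showing only two of the eleven models, with the TF-IDF features, split ratio, and binary severity labels all assumed for illustration.

```python
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def compare_models(texts, labels):
    """Fit candidates on TF-IDF features of incident text; report AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=0)
    vec = TfidfVectorizer(max_features=20000)
    Xtr, Xte = vec.fit_transform(X_tr), vec.transform(X_te)
    for name, model in [("naive Bayes", MultinomialNB()),
                        ("gradient boost", GradientBoostingClassifier())]:
        model.fit(Xtr, y_tr)
        scores = model.predict_proba(Xte)[:, 1]   # P(major incident)
        print(f"{name}: AUC = {roc_auc_score(y_te, scores):.3f}")
```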
Show Figures

Figure 1
<p>Overview of IT incident management system.</p>
Full article ">Figure 2
<p>Distribution of the dataset.</p>
Full article ">Figure 3
<p>Proposed IT incident risk framework.</p>
Full article ">Figure 4
<p>Graphical representation of the pipeline stages for training and testing for ML, DL, and transformer models.</p>
Full article ">Figure 5
<p>Graphical representation of our best-performing transformer (Roberta) model architecture.</p>
Full article ">Figure 6
<p>Average tokenizer/class for training and test data across (<b>a</b>) Agency_records, (<b>b</b>) Employee_records and (<b>c</b>) Customer_records.</p>
Full article ">Figure 7
<p>Quantitative ROC/AUC results across (<b>a</b>) Agency_records, (<b>b</b>) Employee_records, and (<b>c</b>) Customer_records for state-of-art methods.</p>
Full article ">Figure 8
<p>Quantitative ROC/AUC results across (<b>a</b>) Agency_resampled, (<b>b</b>) Employee_resampled (<b>c</b>) Customer_resampled, and (<b>d</b>) Combine_All for state-of-art methods.</p>
Full article ">Figure 9
<p>Quantitative ROC/AUC results for synthetic data (SMOTE) across (<b>a</b>) agency, (<b>b</b>) employee, and (<b>c</b>) customer for state-of-art methods.</p>
Full article ">Figure 10
<p>ROC/AUC results across <span class="html-italic">k</span>-fold cross-validation.</p>
Full article ">Figure 11
<p>Number of MIR across actual, ML (naive Bayes, gradient boost, XGBoost, CatBoost, SVM), transformers (BERT, RoBERTa, ERNIE 2.0), and DL algorithms (GRU, Bi-LSTM, CNN).</p>
Full article ">Figure 12
<p>Prediction horizon comparison (MIR opened at vs. incident opened at). The blue line indicates IOA records, while the orange shows the MOA records.</p>
Full article ">Figure 13
<p>Prediction horizon comparison (MIR opened at vs. incident opened at).</p>
Full article ">
16 pages, 1438 KiB  
Article
A Comparative Analysis of Word Embedding and Deep Learning for Arabic Sentiment Classification
by Sahar F. Sabbeh and Heba A. Fasihuddin
Electronics 2023, 12(6), 1425; https://doi.org/10.3390/electronics12061425 - 16 Mar 2023
Cited by 10 | Viewed by 4058
Abstract
Sentiment analysis on social media platforms (i.e., Twitter or Facebook) has become an important tool to learn about users’ opinions and preferences. However, the accuracy of sentiment analysis is disrupted by the challenges of natural language processing (NLP). Recently, deep learning models have [...] Read more.
Sentiment analysis on social media platforms (i.e., Twitter or Facebook) has become an important tool to learn about users’ opinions and preferences. However, the accuracy of sentiment analysis is disrupted by the challenges of natural language processing (NLP). Recently, deep learning models have demonstrated superior performance over statistical- and lexical-based approaches in NLP-related tasks. Word embedding is an important layer of deep learning models, used to generate input features. Many word embedding models have been presented for text representation, covering both classic and context-based word embeddings. In this paper, we present a comparative analysis that evaluates both classic and contextualized word embeddings for sentiment analysis. The four most frequently used word embedding techniques were used in both their trained and pre-trained versions. The selected embeddings represent both classical and contextualized techniques: classical word embeddings include GloVe, Word2vec, and FastText, while ARBERT is used as the contextualized embedding model. Since word embeddings are typically employed as the input layer in deep networks, we used the deep learning architectures BiLSTM and CNN for sentiment classification. To achieve these goals, the experiments were applied to a series of benchmark datasets: HARD, Khooli, AJGT, ArSAS, and ASTD. Finally, a comparative analysis was conducted on the results obtained for the experimented models. Our outcomes indicate that, in general, an embedding trained for the task achieves higher performance than the pretrained version of the same technique, by around 0.28 to 1.8% in accuracy, 0.33 to 2.17% in precision, and 0.44 to 2% in recall. Moreover, the contextualized transformer-based embedding model BERT achieved the highest performance in both its pretrained and trained versions. Additionally, the results indicate that BiLSTM outperforms CNN by approximately 2% on 3 datasets, HARD, Khooli, and ArSAS, while CNN achieved around 2% higher performance on the smaller datasets, AJGT and ASTD. Full article
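Since the paper feeds each embedding into BiLSTM and CNN classifiers, a minimal PyTorch sketch of the BiLSTM track over a pretrained embedding matrix is shown below; all dimensions, the mean-pooling, and the frozen-embedding choice are assumptions, not the authors' hyperparameters.

```python
import numpy as np
import torch
import torch.nn as nn

class BiLSTMSentiment(nn.Module):
    """BiLSTM sentiment classifier over a (pre)trained embedding matrix."""
    def __init__(self, embedding_matrix, hidden=128, n_classes=2,
                 freeze=True):
        super().__init__()
        # Rows of embedding_matrix are word vectors (e.g., GloVe/FastText).
        self.embed = nn.Embedding.from_pretrained(
            torch.as_tensor(embedding_matrix, dtype=torch.float),
            freeze=freeze)
        self.lstm = nn.LSTM(embedding_matrix.shape[1], hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))
        return self.head(h.mean(dim=1))        # mean-pool over time steps

# Toy use: 1000-word vocabulary, 300-d vectors (random stand-ins).
model = BiLSTMSentiment(np.random.randn(1000, 300))
logits = model(torch.randint(0, 1000, (8, 40)))  # batch of 8 sentences
```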
Show Figures

Figure 1
<p>The accuracy of pretrained embeddings.</p>
Full article ">Figure 2
<p>The accuracy of trained embedding models.</p>
Full article ">Figure 3
<p>The accuracy of BiLSTM vs. CNN for different datasets.</p>
Full article ">