Welcome to CSEIT 2024

11th International Conference on Computer Science, Engineering and Information Technology (CSEIT 2024)

July 27 ~ 28, 2024, London, United Kingdom



Accepted Papers
Information Extraction From Product Labels: a Machine Vision Approach

Hansi Seitaj and Vinayak Elangovan, Computer Science program, Penn State Abington, Abington, PA, USA

ABSTRACT

This research tackles the challenge of manual data extraction from product labels by employing a blend of computer vision and Natural Language Processing (NLP). We introduce an enhanced model that combines Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) in a Convolutional Recurrent Neural Network (CRNN) for reliable text recognition. Our model is further refined by incorporating the Tesseract OCR engine, enhancing its applicability in Optical Character Recognition (OCR) tasks. The methodology is augmented by NLP techniques and extended through the Open Food Facts API (Application Programming Interface) for database population and text-only label prediction. The CRNN model is trained on encoded labels and evaluated for accuracy on a dedicated test set. Importantly, our approach enables visually impaired individuals to access essential information on product labels, such as directions and ingredients. Overall, the study highlights the efficacy of deep learning and OCR in automating label extraction and recognition.

KEYWORDS

Optical Character Recognition (OCR); Machine Vision; Machine Learning; Convolutional Recurrent Neural Network (CRNN); Natural Language Processing (NLP); Text Recognition; Text Classification; Product Labels; Deep Learning; Data Extraction.
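
The abstract above mentions that the CRNN is trained on encoded labels. One common way to do this is to map each character to an integer index and pad to a fixed length. The sketch below illustrates that idea only; the character set, the padding index, and the function names are assumptions, not details taken from the paper.

```python
import string

# Hypothetical character set; the paper's actual vocabulary is not given.
CHARSET = string.ascii_lowercase + string.digits + " "
PAD_INDEX = 0  # index 0 is reserved for padding (assumption)
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(CHARSET)}
INDEX_TO_CHAR = {i: ch for ch, i in CHAR_TO_INDEX.items()}


def encode_label(text, max_len):
    """Map a label string to a fixed-length list of integer indices."""
    # Characters outside CHARSET are silently dropped in this sketch.
    indices = [CHAR_TO_INDEX[ch] for ch in text.lower() if ch in CHAR_TO_INDEX]
    indices = indices[:max_len]
    return indices + [PAD_INDEX] * (max_len - len(indices))


def decode_label(indices):
    """Invert encode_label, skipping padding positions."""
    return "".join(INDEX_TO_CHAR[i] for i in indices if i != PAD_INDEX)


encoded = encode_label("Sugar 25g", max_len=12)
print(encoded)
print(decode_label(encoded))
```

A real pipeline would likely also need punctuation and a dedicated symbol for unknown characters.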


Unveiling the Value of User Reviews on Steam: a Predictive Modeling of User Engagement Approach Using Machine Learning

Leonardo Espinosa-Leal1, María Olmedilla2, Jose-Carlos Romero-Moreno3, and Zhen Li1, 1Arcada University of Applied Sciences, Graduate Studies and Research, Finland, 2SKEMA Business School – Université Côte d’Azur, France, 3Applied Computational Social Sciences-Institute, University of Paris-Dauphine-PSL, France

ABSTRACT

In an era where user-generated content is both ubiquitous and influential, accurately evaluating videogame reviews’ relevance becomes critical. The vast digital domain of videogames brims with user feedback, presenting the challenge of distinguishing genuinely helpful reviews. Our study, analyzing over a million videogame reviews from the Steam platform, employs cutting-edge machine learning techniques to ascertain review helpfulness. We applied both regression and binary classification models, revealing the latter’s enhanced predictive prowess. Interestingly, our findings contradict the anticipated benefit of incorporating features from pre-trained NLP models into enhancing prediction accuracy. This investigation not only highlights methods for assessing review helpfulness effectively but also promotes the application of computational techniques for the insightful analysis of user-generated content. Furthermore, it provides valuable perspectives on the elements influencing user engagement and the intrinsic value of feedback within the context of videogame consumption, marking a significant contribution to understanding digital user interaction dynamics.

KEYWORDS

Videogames, helpfulness, machine learning, NLP, online reviews.
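
The abstract above compares regression and binary classification for review helpfulness. The sketch below shows one way the two targets could be derived from vote counts. The vote fields, the 0.5 threshold, and the function names are assumptions for illustration; the paper's actual target definitions are not given here.

```python
def helpfulness_ratio(helpful_votes, total_votes):
    """Regression target: share of votes that marked the review helpful."""
    if total_votes == 0:
        return 0.0  # assumption: reviews without votes get a ratio of 0
    return helpful_votes / total_votes


def helpfulness_label(helpful_votes, total_votes, threshold=0.5):
    """Binary target: 1 if the ratio reaches the (hypothetical) threshold."""
    return 1 if helpfulness_ratio(helpful_votes, total_votes) >= threshold else 0


# Made-up vote counts, not data from the study.
for helpful, total in [(3, 4), (1, 4), (0, 0)]:
    print(helpful, total,
          helpfulness_ratio(helpful, total),
          helpfulness_label(helpful, total))
```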


Predict the Consumer Price Index in Vietnam Using Long Short-term Memory (LSTM) Network Based on Cloud Computing

Pham Trong Huynh, University of Natural Resources and Environment Ho Chi Minh City, Viet Nam

ABSTRACT

In Vietnam, the Consumer Price Index (CPI) serves as a pivotal gauge for evaluating inflation, alongside the Gross Domestic Product (GDP) Index. CPI data are used not only to assess economic performance but also to forecast future inflation trends. This research endeavors to predict CPI utilizing Long Short-Term Memory networks (LSTMs), an advancement over Recurrent Neural Networks (RNNs). The model takes basic price variables in Vietnam as inputs to forecast CPI values. To enhance prediction accuracy, various optimization algorithms were employed, including Stochastic Gradient Descent (SGD), Root Mean Square Propagation (RMSProp), Adaptive Gradient (AdaGrad), Adaptive Moment Estimation (Adam), Adadelta, Nesterov Adam (Nadam), and Adamax. Results demonstrate Nadam's superiority, with a Root Mean Square Error (RMSE) of 4.088. Although the model's accuracy falls short of expectations, potential enhancements include adjusting epoch numbers, hidden layers, batch sizes, and input variables. This study not only presents the model but also proposes an approach that uses CPI data on essential food prices to forecast inflation rates.

KEYWORDS

LSTM, Machine Learning, CPI, Prediction, Nadam.
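
The abstract above reports model quality as an RMSE. For reference, RMSE is the square root of the mean squared difference between actual and predicted values. The sketch below computes it in plain Python; the sample numbers are made up for illustration and are not data from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only (not CPI figures from the study).
actual_cpi = [100.0, 102.0, 104.0]
predicted_cpi = [101.0, 102.0, 101.0]
print(rmse(actual_cpi, predicted_cpi))
```

Because RMSE is expressed in the same units as the target, a value such as 4.088 is read as a typical error of about four index points.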


Navigating Efficiency: Proximal Policy Optimization for Efficient Product Transportation in Reinforcement Learning Environments

Asharful Islam and Chuan Li, Department of Computer Science, Sichuan University, Chengdu, China

ABSTRACT

Optimizing the delivery of products from a central depot to multiple retail locations presents a multifaceted challenge, especially when considering factors such as minimizing costs while ensuring product availability for customers. Traditional approaches to this problem often rely on heuristic methods or mathematical optimization techniques. However, these approaches may struggle to adapt to dynamic real-world scenarios with complex, evolving conditions. This study pioneers the application of Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning algorithm, to the domain of product transportation and inventory management. By creating a custom simulation environment, “ProductTransportEnv,” we delve into the complexities of supply chain logistics, demonstrating the significant potential of reinforcement learning to transform operational efficiencies. The “ProductTransportEnv” mimics real-world logistics scenarios, allowing for a detailed exploration of transportation routes, inventory levels, and demand fluctuations, providing a rigorous testing ground for the PPO algorithm.

KEYWORDS

OpenAI Gym environment, “ProductTransportEnv,” PPO, RL, DQN, Inventory management.
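
The abstract above applies PPO but does not say which variant. The sketch below shows the per-sample term of the widely used clipped surrogate objective, min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the probability ratio between the new and old policy and A is the advantage estimate. The function name and the default eps = 0.2 are assumptions for illustration.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate term (to be maximized)."""
    # Keep the probability ratio inside [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum gives a pessimistic bound on the policy improvement.
    return min(ratio * advantage, clipped_ratio * advantage)


# A large ratio with a positive advantage is capped at (1 + epsilon) * A.
print(clipped_surrogate(ratio=1.5, advantage=1.0))
# A small ratio with a negative advantage is bounded by (1 - epsilon) * A.
print(clipped_surrogate(ratio=0.5, advantage=-1.0))
```

In training, this term would be averaged over a batch of transitions collected from the environment.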


Enhancing Cost-effective License Plate Detection and Recognition on Low-compute Edge Devices Through Unified Modeling and TensorRT Quantization

Sonu Kumar, Hassan Berry, Bahram Baloch, Ibrahim Chippa, and Abdul Muqsit Abbasi

ABSTRACT

In the realm of smart city and intelligent transportation systems, the efficient detection and recognition of license plates on low-compute edge devices present a significant challenge. Traditional high-compute infrastructures, while powerful, are neither cost-effective nor scalable for widespread implementation. This research paper addresses this challenge by introducing a novel, unified model designed to optimize license plate detection and recognition on these resource-constrained devices. Our comprehensive approach includes data augmentation using core computer vision techniques and a custom YOLOv3 configuration tailored for this specific task. Key innovations in our methodology are the use of flipped and unflipped numbers in a dual-phase training regimen and the quantization of models using TensorRT. This enables efficient deployment on edge devices, overcoming the traditional tradeoffs between performance and computational demands. The results demonstrate that our model not only performs with high accuracy in detecting license plates and recognizing characters but also stands out in terms of cost-effectiveness and scalability. This positions our research at the forefront of automatic license plate recognition (ALPR) technology, offering a practical, efficient solution for smart city and surveillance technologies.
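
The abstract above names model quantization as a key step. The sketch below illustrates the general idea with symmetric 8-bit quantization in plain Python. It does not use TensorRT; the INT8 precision, the per-tensor scale, and the function names are assumptions for illustration, since the abstract does not state the precision or calibration method.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.0, 0.25, -1.0]  # made-up values, not model weights from the paper
q, scale = quantize_int8(weights)
print(q, scale)
print(dequantize(q, scale))
```

The rounding error per value is at most half the scale, which is the accuracy cost paid for the smaller and faster integer representation.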


Revolutionizing Utility Meter Reading in Developing Economies: a Computer Vision-powered Solution - a Case Study of Pakistan

Eman Ahmed, Ibrahim Chippa, Bahram Baloch, Hassan Berry, and Abdul Muqsit Abbasi

ABSTRACT

This research paper explores the modernization of meter reading processes in third-world countries, with a specific focus on Pakistan. Traditional manual meter reading practices in these regions are labor-intensive, error-prone, and time-consuming, leading to suboptimal utility management and financial losses. To address these challenges, our study introduces a digitalized meter reading system enhanced by computer vision and machine learning technologies. This system automates data collection, enables real-time monitoring, and employs data analytics to enhance accuracy and efficiency. By reducing human error and ensuring timely data transmission, this digitized assistant empowers utility providers to make informed decisions and optimize resource allocation. Using Pakistan as a case study, we evaluate the impact of the digitized meter reading assistant on operational efficiency, cost-effectiveness, and overall utility management. Through key performance indicators and case studies, we demonstrate how computer vision and machine learning can enhance service delivery, reduce financial losses, and promote sustainability in third-world economies. This research contributes to the discourse on technological interventions in developing countries by highlighting the potential of digitizing essential services like meter reading. The findings offer valuable insights for policymakers, utility providers, and researchers seeking innovative solutions to address operational challenges in similar socio-economic contexts.


AraSpider: Democratizing Arabic-to-SQL

Ahmed Heak, Youssef Mohamed, and Ahmed B. Zaky, Department of Computer Science, Egypt-Japan University of Science and Technology

ABSTRACT

This study presents AraSpider, the first Arabic version of the Spider dataset, aimed at improving natural language processing (NLP) in the Arabic-speaking community. Four multilingual translation models were tested for their effectiveness in translating English to Arabic. Additionally, two models were assessed for their ability to generate SQL queries from Arabic text. The results showed that using back translation significantly improved the performance of both ChatGPT 3.5 and SQLCoder models, which are considered top performers on the Spider dataset. Notably, ChatGPT 3.5 demonstrated high-quality translation, while SQLCoder excelled in text-to-SQL tasks. The study underscores the importance of incorporating contextual schema and employing back translation strategies to enhance model performance in Arabic NLP tasks. Moreover, the provision of detailed methodologies for reproducibility and translation of the dataset into other languages highlights the research's commitment to promoting transparency and collaborative knowledge sharing in the field. Overall, these contributions advance NLP research, empower Arabic-speaking researchers, and enrich the global discourse on language comprehension and database interrogation.

KEYWORDS

Semantic Parsing, SQL Generation, Text-to-SQL, Spider Dataset, Natural Language Processing.
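
The abstract above stresses the value of giving the model contextual schema information. The sketch below shows one way a schema-aware text-to-SQL prompt could be assembled. The prompt layout, the function name, and the example schema and question are assumptions for illustration, not the prompts used in the study.

```python
def build_prompt(schema, question):
    """Build a text-to-SQL prompt from a schema and a natural language question.

    schema: dict mapping each table name to a list of its column names.
    """
    lines = ["Database schema:"]
    for table, columns in schema.items():
        lines.append(f"- {table}({', '.join(columns)})")
    lines.append(f"Question: {question}")
    lines.append("SQL:")
    return "\n".join(lines)


# Hypothetical schema and an illustrative Arabic question
# ("How many singers do we have?").
example_schema = {"singer": ["singer_id", "name", "country"]}
example_question = "كم عدد المغنين لدينا؟"
print(build_prompt(example_schema, example_question))
```

For back translation, the Arabic question would first be translated into English by a separate model and that English text passed as the question.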


Performance Evaluation of Large Language Model for Copy Number Variation Extraction From Medical Journal

Jongmun Choi, Department of Molecular Genetics and Artificial Intelligence Research Center, Seegene Medical Foundation, Seoul, South Korea

ABSTRACT

This study assesses the efficacy of using Large Language Models (LLMs), specifically GPT-4, for extracting Copy Number Variations (CNVs) from medical journal articles, a task critical for advancing genetic research and clinical decision-making. CNVs significantly contribute to genetic diversity and disease, yet their complexity and the variable nature of their genetic content pose challenges for interpretation in clinical genetics. Traditional methods for CNV data extraction from clinical journals have faced limitations in accuracy, partly due to the inherent complexity of genetic data. This paper evaluates an alternative approach using GPT-4, comparing its performance against CNV-ETLAI, a specialized NLP-based model designed for CNV extraction. Our methodology involved configuring GPT-4 to process and interpret medical journal PDFs, developing custom prompts for CNV information extraction, and benchmarking its performance using a dataset of 146 true positive CNVs. The results revealed that while GPT-4 shows promise, with commendable performance despite the lack of fine-tuning for medical document analysis, it significantly lags behind CNV-ETLAI, particularly in extracting information from tables, a crucial aspect of data interpretation in genomics. Despite GPT-4's lower accuracy, its potential for improvement and adaptability highlights the evolving capabilities of LLMs as valuable tools for medical data extraction. This study underscores the superiority of CNV-ETLAI in current clinical genetic settings while pointing towards the promising future of LLMs in enhancing the efficiency and breadth of medical data extraction across various applications.

KEYWORDS

Large Language Model, GPT-4, Copy Number Variation, Natural Language Processing, Text-mining, Genetic Interpretation.
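
The abstract above benchmarks extraction against a set of true positive CNVs but does not name the metrics used. The sketch below shows how precision, recall, and F1 could be computed by comparing extracted CNVs with a gold set. The tuple representation, the sample records, and the function name are assumptions for illustration.

```python
def extraction_scores(gold, extracted):
    """Return (precision, recall, f1) for extracted items against a gold set."""
    gold_set, extracted_set = set(gold), set(extracted)
    true_positives = len(gold_set & extracted_set)
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative (region, type) records, not the study's benchmark data.
gold_cnvs = [("22q11.21", "deletion"), ("16p11.2", "duplication"),
             ("15q13.3", "deletion")]
model_output = [("22q11.21", "deletion"), ("16p11.2", "duplication"),
                ("1q21.1", "duplication")]
print(extraction_scores(gold_cnvs, model_output))
```

Exact tuple matching is strict; a real evaluation might need to normalize coordinates or nomenclature before comparing.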