With the rapid development of computer vision technology, the rendering and stylization of landscape drawings have become a focus of attention in the field of design. This paper studies a computer-vision-based method for rendering and stylizing landscape drawings and discusses how advanced vision algorithms can be applied to the automatic stylization of landscape design drawings. It reviews the current applications of computer vision in landscape design and analyzes the challenges and opportunities the technology faces. It then proposes a new landscape drawing stylization framework that combines deep learning, image processing, and pattern recognition techniques to achieve efficient rendering and stylization. By training on a large corpus of landscape drawing data, the model learns to recognize and simulate different rendering styles, providing designers with a variety of design solutions. The paper also explores the importance of user interaction in the stylization process and proposes a user-oriented approach that lets designers customize and adjust rendering effects through simple interactions. Finally, the effectiveness and practicality of the proposed method are verified through a series of experiments, which show that it significantly improves the quality and efficiency of landscape drawing rendering and brings new technical innovation to the field of landscape design.
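The abstract does not disclose the framework's architecture; as a rough illustration of the learned-stylization idea, the sketch below applies classic Gatys-style neural style transfer with a pretrained VGG-19 in PyTorch. The layer choices, loss weights, and optimizer settings are assumptions, not the paper's method.

```python
# Minimal Gatys-style neural style transfer sketch (PyTorch); illustrative
# only -- the paper's own stylization framework is not published.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = {0, 5, 10, 19, 28}   # conv layers whose Gram matrices define style
CONTENT_LAYER = 21                  # layer whose activations define content

def features(x):
    style, content = [], None
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            style.append(x)
        if i == CONTENT_LAYER:
            content = x
    return style, content

def gram(f):
    b, c, h, w = f.shape            # assumes batch size 1
    f = f.view(c, h * w)
    return f @ f.t() / (c * h * w)

def stylize(content_img, style_img, steps=300, style_weight=1e6):
    with torch.no_grad():
        style_grams = [gram(f) for f in features(style_img)[0]]
        content_ref = features(content_img)[1]
    target = content_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([target], lr=0.02)
    for _ in range(steps):
        opt.zero_grad()
        s_feats, c_feat = features(target)
        loss = F.mse_loss(c_feat, content_ref)
        loss = loss + style_weight * sum(F.mse_loss(gram(f), g)
                                         for f, g in zip(s_feats, style_grams))
        loss.backward()
        opt.step()
    return target.detach()          # stylized image tensor (1, 3, H, W)
```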
This paper studies a computer-vision-based method for movement recognition and injury risk assessment of Wushu athletes. Optimizing athletes' movement trajectories can effectively improve movement quality. Based on a hybrid real-time synchronization algorithm, an arm motion trajectory model for martial arts athletes is developed, a combined dynamic and static arm recognition algorithm is proposed, and a motion feature extraction method for Wushu movements is studied. Dynamic arm recognition operates on video or image sequences, which capture the position, shape, and motion trajectory of the arm at different time points; static arm recognition operates on a single image or frame. On this basis, the computer software architecture of the digital site system is designed, realizing real-time collection, analysis, display, and storage of motion information and assembly configuration. The simulation results show that the method has high accuracy and provides a sound scientific basis for reducing athletes' movement injuries by predicting the injury risk an athlete may face during training or competition.
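As an illustration of the dynamic/static split described above, the sketch below uses MediaPipe Pose as a stand-in pose estimator (the paper's own tracker is not published): the dynamic branch accumulates wrist positions over a video sequence, while the static branch extracts joint positions from a single frame.

```python
# Sketch of dynamic vs. static arm recognition using MediaPipe Pose as a
# stand-in estimator; not the paper's algorithm.
import cv2
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose
WRIST = mp_pose.PoseLandmark.RIGHT_WRIST

def dynamic_arm_trajectory(video_path):
    """Dynamic branch: accumulate the wrist position over a frame sequence."""
    trajectory = []
    cap = cv2.VideoCapture(video_path)
    with mp_pose.Pose() as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            res = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if res.pose_landmarks:
                lm = res.pose_landmarks.landmark[WRIST]
                trajectory.append((lm.x, lm.y))
    cap.release()
    return np.array(trajectory)        # (T, 2) motion trajectory

def static_arm_pose(image_bgr):
    """Static branch: joint positions from a single image only."""
    with mp_pose.Pose(static_image_mode=True) as pose:
        res = pose.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not res.pose_landmarks:
        return None
    lm = res.pose_landmarks.landmark
    elbow = lm[mp_pose.PoseLandmark.RIGHT_ELBOW]
    wrist = lm[WRIST]
    return np.array([elbow.x, elbow.y, wrist.x, wrist.y])
```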
One of the most important aspects of industrial manufacturing is the assessment of product appearance and quality, which frequently occurs manually. A product's appearance can convey practical qualities, aesthetic value, and a sense of quality to consumers. This study explores the use of computer vision in the automatic generation of product appearance designs and introduces a dolphin swarm-driven weighted random forest (DS-WRF) design approach to help designers generate appearance designs more efficiently. Product image data were collected and preprocessed by resizing the images. The method has two primary components: sketch inversion and design scheme creation. The produced sketches, paired with matching color images, are fed into the sketch inversion model for training, so that hand-drawn sketches can be transformed into color design patterns. The experimental results demonstrate that the proposed method automatically generates product appearance designs of high quality.
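A minimal sketch of the weighted random forest vote at the core of DS-WRF, assuming scikit-learn; the dolphin swarm optimizer is not reproduced here, so tree weights are instead derived from held-out validation accuracy as a simple stand-in.

```python
# Weighted random forest vote (scikit-learn); validation-accuracy weights
# stand in for the dolphin-swarm-optimized weights of DS-WRF.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def weighted_forest_predict(X, y, X_query):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                                random_state=0)
    forest = RandomForestClassifier(n_estimators=100,
                                    random_state=0).fit(X_tr, y_tr)
    # Weight each tree by its validation accuracy (a swarm search would
    # tune these weights instead).
    weights = np.array([t.score(X_val, y_val) for t in forest.estimators_])
    weights /= weights.sum()
    # Weighted soft vote over per-tree class probabilities.
    probs = sum(w * t.predict_proba(X_query)
                for w, t in zip(weights, forest.estimators_))
    return forest.classes_[np.argmax(probs, axis=1)]
```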
Data security and privacy have become areas of concern as image recognition and computer vision technologies have advanced. In Internet of Things (IoT) systems for remote sensing, conventional encryption methods are computationally demanding and unsuitable, and multimedia data require strong protection mechanisms. This study proposes a novel Authentication Vision Optimizer (ACVO)-driven module that combines advanced cryptographic approaches with computer vision to build a privacy-preserving image identification system for e-archives in IoT environments. The framework consists of four modules: encryption, secure sharing, authentication, and recognition optimization, applied to high-resolution satellite images captured via electronic remote sensing and gathered from Kaggle. The encryption module employs the Advanced Encryption Standard (AES) algorithm, while the secure sharing module uses Visual Cryptography (VC) for human-readable reconstruction. The authentication module uses Trusted Execution Environments (TEE) to ensure data authenticity. The image recognition optimization module applies transfer learning to fine-tune a pre-trained Efficient Golden Jackal Tuned Deep Convolute Neuronet (EGJ-DCNN) on small-scale encrypted datasets. The study achieved high recognition accuracy on encrypted satellite images while reducing computational time through AES and visual cryptography encryption. Data integrity is maintained through blind watermarking verification, and the privacy protections show high resistance to unauthorized access.
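A minimal sketch of the AES encryption step only, assuming it behaves like standard AES-GCM from Python's cryptography package; the VC share generation, TEE attestation, and watermark verification steps are not reproduced, and the file name is hypothetical.

```python
# AES-GCM encryption of image bytes (cryptography package); one plausible
# realization of the framework's AES layer.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_image(image_bytes: bytes, key: bytes) -> bytes:
    nonce = os.urandom(12)                   # unique nonce per image
    ct = AESGCM(key).encrypt(nonce, image_bytes, None)
    return nonce + ct                        # prepend nonce for decryption

def decrypt_image(blob: bytes, key: bytes) -> bytes:
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, None)

key = AESGCM.generate_key(bit_length=256)
with open("satellite_tile.png", "rb") as f:  # hypothetical input file
    blob = encrypt_image(f.read(), key)
```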
In this paper, we introduce a novel system, called the online training and offline testing system (OTOT), to assist in training 6D object pose estimation networks: it is deployed only during training to optimize the network parameters and is not used at the testing stage. OTOT consists of two modules: a feature fusion module and a supervision module. The feature fusion module fuses several feature maps from the pose estimation network in a specified order to obtain a fused feature. The supervision module then uses an encoder–decoder network to implicitly extract useful features from the fused feature and optimizes the pose estimation network online through back-propagation. OTOT can be migrated to any network with an encoder–decoder structure. A network trained with OTOT achieves 56.11% accuracy in terms of the VSD metric on the T-LESS dataset using RGB inputs, compared to 46.70% for the original network trained without OTOT. Experiments show that OTOT greatly improves the accuracy of the pose estimation network, and since OTOT is not deployed at the testing stage, it adds no parameters during testing and does not affect the original speed of the network.
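A rough PyTorch sketch of the two OTOT modules as described: feature maps are fused in a fixed order, then passed through an encoder-decoder whose reconstruction loss back-propagates into the pose network. The channel sizes and concatenation scheme are assumptions, not the paper's code.

```python
# Feature fusion + encoder-decoder supervision, per the OTOT description;
# used only at training time, discarded at test time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    def __init__(self, in_channels, fused_channels=256):
        super().__init__()
        self.proj = nn.Conv2d(sum(in_channels), fused_channels, 1)

    def forward(self, feats):
        # Resize every map to the first map's resolution, concat in order.
        h, w = feats[0].shape[-2:]
        feats = [F.interpolate(f, (h, w), mode="bilinear",
                               align_corners=False) for f in feats]
        return self.proj(torch.cat(feats, dim=1))

class SupervisionModule(nn.Module):
    def __init__(self, c=256):
        super().__init__()
        # Assumes fused maps with spatial size divisible by 4.
        self.encoder = nn.Sequential(
            nn.Conv2d(c, c // 2, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(c // 2, c // 4, 3, 2, 1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(c // 4, c // 2, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(c // 2, c, 4, 2, 1))

    def forward(self, fused):
        recon = self.decoder(self.encoder(fused))
        # Auxiliary reconstruction loss back-propagated into the pose net.
        return F.mse_loss(recon, fused)
```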
Behavioral monitoring can be used to track aquatic ecosystems and water quality over time. With precise and rapid detection of fish behavior, fish farmers can make informed management decisions about recirculating aquaculture systems while decreasing labor. Researchers have developed a large number of sensors and procedures for recognizing fish behavior. Deep learning (DL) techniques have revolutionized automatic video analysis and have been applied to behavior analysis, live fish detection, biomass estimation, water quality monitoring, and species classification. The benefit of DL is that it can automatically learn image feature extraction and shows excellent performance in recognizing sequential actions. This paper presents a Dwarf Mongoose Optimization with Transfer Learning-based fish behavior classification (DMOTLB-FBC) model, which aims to effectively monitor and classify fish behaviors. Initially, the DMOTLB-FBC technique applies a Gaussian filtering (GFI) technique for noise removal. Next, a transfer learning (TL)-based neural architecture search network (NASNet) model produces a collection of feature vectors. For fish behavior classification, a graph convolutional network (GCN) model is employed. To improve the classification results, the DMO algorithm is applied as a hyperparameter optimizer for the GCN model. The DMOTLB-FBC technique is evaluated on a fish video dataset, and a comprehensive comparison study shows its improvements over other recent approaches.
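A sketch of the preprocessing and feature stages as described: Gaussian filtering for denoising, then a pretrained NASNet backbone (Keras) as the transfer-learned feature extractor. The GCN classifier and DMO hyperparameter search are not reproduced, and the input size and pooling choices are assumptions.

```python
# Gaussian filtering + NASNet feature extraction, per the pipeline above.
import cv2
import numpy as np
from tensorflow.keras.applications import NASNetMobile
from tensorflow.keras.applications.nasnet import preprocess_input

# Pretrained backbone with global average pooling as the TL feature extractor.
backbone = NASNetMobile(weights="imagenet", include_top=False, pooling="avg")

def frame_features(frame_bgr):
    denoised = cv2.GaussianBlur(frame_bgr, (5, 5), sigmaX=1.0)  # GFI step
    rgb = cv2.cvtColor(denoised, cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (224, 224)).astype(np.float32)
    return backbone.predict(preprocess_input(rgb[None]))[0]     # feature vector
```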
In recent years, advancements in computer vision (CV) and machine learning (ML) have facilitated significant progress in plant disease prediction and detection. The growing threat that plant diseases pose to worldwide food security necessitates accurate and efficient methods for early disease identification. This paper provides an extensive overview of the current landscape of plant disease prediction from plant images, focusing on datasets, models, and potential future directions. First, publicly available datasets comprising annotated images of healthy and diseased plants, which enable researchers to develop and evaluate predictive models, are analyzed. These datasets, encompassing a wide range of crops and diseases, serve as crucial resources for training and benchmarking various algorithms. The paper then explores both traditional and modern approaches, including expert systems, ML, and deep learning (DL) algorithms; model architectures, transfer learning strategies, and ensemble techniques are discussed in terms of their effectiveness in disease classification and localization. The paper also addresses the challenges faced in plant disease prediction, such as data scarcity, model robustness, and scalability. By describing the present state of research and identifying potential avenues for future exploration, this paper aims to contribute to the advancement of plant disease prediction methods, fostering more resilient and productive agricultural practices.
In the field of materials science, accurate and precise material characterization is essential for understanding material properties. Manual microstructural classification is subjective, time-consuming, and prone to human error, so there is a growing need for reliable, automated techniques to identify and classify microstructural features. This project develops an integrated system that combines computer vision, machine learning algorithms, and advanced deep learning techniques to automate material microstructure prediction. Deep learning has already been applied to feature extraction and learning across various domains, including materials science; however, most applications focus either on complex industrial processes or on data simulation, with few addressing property forecasting and analysis, and the limited programs available often require expensive equipment or high maintenance. The project is divided into three key steps: database creation, model training, and model evaluation. Database creation involves collecting grey cast iron samples, preparing them metallographically, and imaging them under the microscope to generate microstructural images. These images undergo feature engineering before different deep learning models are trained on them. The model that performs best at 500 epochs with an image size of 1056 is selected, achieving a mean Average Precision (mAP) of 90.2% at a threshold of 0.5. The selected model is then evaluated on different grey cast iron microstructures to determine the percentages of the various types of carbon flakes. The result is a Python-embedded machine learning program that automates carbon flake characterization using deep learning techniques with 90% accuracy, contributing to the advancement of automated microstructural characterization of grey cast iron.
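The reported setup (500 epochs, image size 1056, mAP@0.5) matches a YOLO-family detection workflow; the sketch below uses the Ultralytics API as a plausible stand-in, with a hypothetical dataset configuration file, and is not the project's actual code.

```python
# Plausible detector training loop matching the reported settings; the
# specific model family and dataset config are assumptions.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                 # pretrained starting point
model.train(data="cast_iron_flakes.yaml",  # hypothetical dataset config
            epochs=500, imgsz=1056)
metrics = model.val()                      # reports mAP@0.5 and mAP@0.5:0.95
print(metrics.box.map50)
```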
The integration of Artificial Intelligence (AI) with sign language recognition is a hot topic in the field of AI+Science, aimed at addressing the communication barriers faced by deaf and hard-of-hearing communities. This paper examines that integration and its potential to bridge communication gaps, reviews the evolution of sign language recognition from data gloves to computer vision, and underscores the role of extensive databases. The paper also discusses the benefits of multi-modal AI models in improving recognition accuracy and highlights the importance of government and industry support, ethical data practices, and user-centered design in advancing the technology. The challenges and opportunities of integrating this technology into daily life, including technical, interface, and ethical considerations, are explored, emphasizing the need for user-focused solutions and innovative technical approaches.
Background modeling and foreground detection are key parts of any computer vision system. These problems have been addressed in the literature with several probabilistic approaches based on mixture models. Here we propose a new kind of probabilistic background model based on probabilistic self-organising maps, which models background pixels with more flexibility. In addition, a statistical correlation measure is used to test the similarity among nearby pixels, enhancing detection performance by providing feedback to the process. Several well-known benchmark videos have been used to assess the performance of our proposal relative to traditional neural and non-neural methods, with favourable results both qualitatively and quantitatively. A statistical analysis of the differences among methods demonstrates that ours is significantly better than its competitors, presenting a strong alternative to classical methods.
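For context, a minimal version of the classical mixture-model baseline this line of work compares against (OpenCV's MOG2 background subtractor); the proposed probabilistic self-organising map model itself is not reproduced here, and the video file name is hypothetical.

```python
# Mixture-of-Gaussians foreground detection baseline (OpenCV MOG2).
import cv2

cap = cv2.VideoCapture("benchmark_video.avi")  # hypothetical input
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = mog2.apply(frame)                   # 255 = foreground, 0 = background
    cv2.imshow("foreground", mask)
    if cv2.waitKey(1) == 27:                   # Esc to quit
        break
cap.release()
```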
In this work, a novel self-organizing model called growing neural forest (GNF) is presented. It is based on the growing neural gas (GNG), which learns a general graph with no special provisions for datasets with separated clusters. In contrast, the proposed GNF learns a set of trees so that each tree represents a connected cluster of data. High dimensional datasets often contain large empty regions among clusters, so this proposal is better suited to them than other self-organizing models because it represents these separated clusters as connected components made of neurons. Experimental results show the self-organization capabilities of the model and demonstrate its suitability for unsupervised clustering and foreground detection applications. In particular, the GNF is shown to correctly discover the connected component structure of some datasets, and it outperforms some well-known foreground detectors in both quantitative and qualitative terms.
One of the most important challenges in computer vision applications is background modeling, especially when the background is dynamic and the input distribution may be non-stationary, i.e., the distribution of the input data can change over time (e.g., changing illumination, waving trees, water). In this work, an unsupervised learning neural network is proposed that can cope with progressive changes in the input distribution. It is based on a dual learning mechanism that manages changes of the input distribution separately from cluster detection. The proposal is suited to scenes where the background varies slowly. The performance of the method is tested against several state-of-the-art foreground detectors, both quantitatively and qualitatively, with favorable results.
Abnormal event detection is a popular research direction in intelligent transportation and public safety. Features that characterize abnormal events are extracted from a given video sequence through computer vision techniques, and the abnormal events are then detected automatically by a classification model. To describe the motion characteristics of events more accurately, a new feature based on motion entropy is proposed in this paper: the entropy of the motion pixels in each video frame is calculated and used as the input feature of the classification model. Because motion entropy differs substantially between normal and abnormal events, it is well suited as a discriminative feature. In addition, an abnormal event detection model based on motion entropy and dual support vector data description (ME-DSVDD) is presented to address the problem of insufficient sample diversity. The proposed model is evaluated on a standard dataset, and the experimental results show that the method effectively improves the performance of abnormal event detection.
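One plausible reading of the motion-entropy feature: threshold a frame difference to select motion pixels, then take the Shannon entropy of their intensity histogram. The paper's exact formulation may differ.

```python
# Motion entropy of a video frame: entropy over motion-pixel intensities.
import cv2
import numpy as np

def motion_entropy(prev_gray, curr_gray, motion_thresh=15):
    diff = cv2.absdiff(curr_gray, prev_gray)
    motion_pixels = diff[diff > motion_thresh]   # pixels flagged as moving
    if motion_pixels.size == 0:
        return 0.0
    hist, _ = np.histogram(motion_pixels, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())        # Shannon entropy
```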
A transputer-based parallel processing paradigm for real-time extraction of road traffic data from video images of roadway scenes is proposed. The model can monitor three lanes of motorway traffic in real time by processing images from two windows associated with each lane. Parallel algorithms are distributed among a network of transputers to perform similar and/or different tasks concerning image data analysis and traffic data extraction. The model can be expanded to cover more lanes or duplicated to monitor a further multi-lane carriageway.
Current methods for sampling on feature maps struggle to balance detection speed and detection accuracy. We combine the parameterless interpolation algorithm used in traditional methods with deep learning, addressing the shortcomings of traditional interpolation by introducing learnable parameters. The new algorithm preserves the features of the original image by computing correlations between feature map channels and between pixels, ensuring that image features are restored correctly during up-sampling. In this paper, we propose a new up-sampling operation model, Reparametric Blend Up-sampling (RBU), which rearranges pixels to obtain a feature map matching the up-sampling size, uses blending to compute inter-pixel and inter-channel correlations, and learns the appropriate parameters through back-propagation to preserve feature extraction. In object detection, replacing the original up-sampling operation with this model improves mAP by 1.1% and the detection of small target objects by 3.1% on the VOC dataset, which covers a rich set of application scenarios.
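A hedged PyTorch sketch of the RBU idea as described: pixel rearrangement (PixelShuffle) to reach the up-sampled size, followed by learnable convolutions that blend inter-channel and inter-pixel correlations. The specific layer choices are assumptions, not the paper's design.

```python
# Pixel rearrangement + learnable blending up-sampler, per the RBU description.
import torch
import torch.nn as nn

class RBUUpsample(nn.Module):
    def __init__(self, channels, scale=2):
        super().__init__()
        # Expand channels so PixelShuffle can rearrange them into space.
        self.expand = nn.Conv2d(channels, channels * scale * scale, 1)
        self.shuffle = nn.PixelShuffle(scale)
        # Learnable blending: 1x1 conv mixes channels, depthwise 3x3 mixes pixels.
        self.channel_blend = nn.Conv2d(channels, channels, 1)
        self.pixel_blend = nn.Conv2d(channels, channels, 3,
                                     padding=1, groups=channels)

    def forward(self, x):
        x = self.shuffle(self.expand(x))   # rearrange to up-sampled size
        return self.pixel_blend(self.channel_blend(x))

x = torch.randn(1, 64, 32, 32)
print(RBUUpsample(64)(x).shape)            # torch.Size([1, 64, 64, 64])
```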
This paper describes probabilistic modeling methods to solve the problem of discriminating between five facial orientations with very little labeled data. Three models are explored. The first model maintains no inter-pixel dependencies, the second model is capable of modeling a set of arbitrary pair-wise dependencies, and the last model allows dependencies only between neighboring pixels. We show that for all three of these models, the accuracy of the learned models can be greatly improved by augmenting a small number of labeled training images with a large set of unlabeled images using Expectation–Maximization. This is important because it is often difficult to obtain image labels, while many unlabeled images are readily available. Through a large set of empirical tests, we examine the benefits of unlabeled data for each of the models. By using only two randomly selected labeled examples per class, we can discriminate between the five facial orientations with an accuracy of 94%; with six labeled examples, we achieve an accuracy of 98%.
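A minimal sketch of the labeled-plus-unlabeled EM scheme for the simplest of the three models (no inter-pixel dependencies), using Gaussian naive Bayes from scikit-learn; implementing the M-step with posterior-weighted samples is one common approach, not necessarily the paper's.

```python
# Semi-supervised EM with a Gaussian naive Bayes observation model.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def em_semi_supervised(X_lab, y_lab, X_unlab, n_iter=10):
    model = GaussianNB().fit(X_lab, y_lab)     # initialize from labeled data
    classes = model.classes_
    for _ in range(n_iter):
        post = model.predict_proba(X_unlab)    # E-step: class posteriors
        # M-step: refit on labeled data plus unlabeled data replicated once
        # per class, weighted by posterior probability.
        X_all = np.vstack([X_lab] + [X_unlab] * len(classes))
        y_all = np.concatenate([y_lab] +
                               [np.full(len(X_unlab), c) for c in classes])
        w_all = np.concatenate([np.ones(len(X_lab))] +
                               [post[:, k] for k in range(len(classes))])
        model = GaussianNB().fit(X_all, y_all, sample_weight=w_all)
    return model
```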
We present a new, efficient stereo algorithm addressing robust disparity estimation in the presence of occlusions. The algorithm is an adaptive, multiwindow scheme using left–right consistency to compute disparity and its associated uncertainty. We demonstrate and discuss its performance on both synthetic and real stereo pairs, and show how our results improve on those of closely related techniques in both accuracy and efficiency.
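A sketch of the left-right consistency test used to flag occlusions: a pixel's disparity is kept only if the left-image and right-image disparity estimates agree within a tolerance. The disparity maps can come from any block matcher; the adaptive multi-window scheme itself is not shown.

```python
# Left-right consistency check over a pair of disparity maps.
import numpy as np

def left_right_check(disp_left, disp_right, tol=1.0):
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    ys = np.arange(h)[:, None].repeat(w, axis=1)
    # Where does each left-image pixel land in the right image?
    xr = np.clip((xs - disp_left).astype(int), 0, w - 1)
    consistent = np.abs(disp_left - disp_right[ys, xr]) <= tol
    out = disp_left.copy()
    out[~consistent] = -1        # mark occluded/unreliable pixels
    return out
```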
Conventional application of hidden Markov models to the task of recognizing human gesture may suffer from multiple sources of systematic variation in the sensor outputs. We present two frameworks based on hidden Markov models which are designed to model and recognize gestures that vary in systematic ways. In the first, the systematic variation is assumed to be communicative in nature, and the input gesture is assumed to belong to a gesture family; the variation across the family is modeled explicitly by the parametric hidden Markov model (PHMM). In the second framework, variation in the signal is overcome by relying on online learning rather than conventional offline, batch learning.
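For reference, the conventional per-class HMM gesture recognizer that both frameworks extend, sketched with hmmlearn; the PHMM's explicit variation parameter and the online-learning variant are not shown.

```python
# Per-class Gaussian HMM gesture recognition baseline (hmmlearn).
import numpy as np
from hmmlearn import hmm

def train_gesture_models(sequences_by_class, n_states=5):
    models = {}
    for label, seqs in sequences_by_class.items():  # seqs: list of (T_i, D) arrays
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
        models[label] = m.fit(X, lengths)
    return models

def classify(models, seq):
    # Pick the class whose HMM assigns the highest log-likelihood.
    return max(models, key=lambda lbl: models[lbl].score(seq))
```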
This paper presents an efficient general purpose search algorithm for alignment and an applied procedure for IC print mark quality inspection. The search algorithm is based on normalized cross-correlation and enhances it with a hierarchical resolution pyramid, dynamic programming, and pixel over-sampling to achieve subpixel accuracy on one or more targets. The general purpose search procedure is robust with respect to linear changes in image intensity and thus can be applied to general industrial visual inspection. Accuracy, speed, reliability, and repeatability are all critical for industrial use. After proper optimization, the proposed procedure was tested on the IC inspection platforms in the Mechanical Industry Research Laboratories (MIRL), Industrial Technology Research Institute (ITRI), Taiwan. The proposed method meets all these criteria and has worked well in field tests on various IC products.
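An illustrative coarse-to-fine normalized cross-correlation search in OpenCV, in the spirit of the pyramid scheme described; the dynamic programming and subpixel over-sampling refinements are omitted for brevity, and the search window size is an assumption.

```python
# Coarse-to-fine NCC template search over a resolution pyramid.
import cv2

def pyramid_ncc_search(image, template, levels=3):
    imgs, tpls = [image], [template]
    for _ in range(levels - 1):               # build resolution pyramids
        imgs.append(cv2.pyrDown(imgs[-1]))
        tpls.append(cv2.pyrDown(tpls[-1]))
    x = y = 0
    for lvl in range(levels - 1, -1, -1):     # coarsest level first
        img, tpl = imgs[lvl], tpls[lvl]
        if lvl == levels - 1:
            search, ox, oy = img, 0, 0        # full search at coarsest level
        else:
            x, y = x * 2, y * 2               # project estimate up one level
            oy, ox = max(y - 8, 0), max(x - 8, 0)
            search = img[oy:oy + tpl.shape[0] + 16,
                         ox:ox + tpl.shape[1] + 16]
        res = cv2.matchTemplate(search, tpl, cv2.TM_CCOEFF_NORMED)
        _, _, _, (mx, my) = cv2.minMaxLoc(res)
        x, y = ox + mx, oy + my
    return x, y                               # top-left corner of best match
```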
This paper provides a survey of the variety of computer vision (CV) and image processing (IP) courses being taught at institutions around the world. The survey shows that, in addition to classic survey courses in CV/IP, there are many focused and multidisciplinary courses being taught that reportedly improve both student and faculty interest in the topic. It also demonstrates that students can successfully undertake a variety of complex lab assignments. In addition, this paper includes a comparative review of current textbooks and supplemental texts appropriate for CV/IP courses.