In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models–for which only informal (and often confusing) semantics is generally provided–all share a common underlying model, namely, the Dataflow model. The model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.
A systematic framework for jet definition is developed from first principles of physical measurement, quantum field theory, and QCD. A jet definition is found which:
• is theoretically optimal with regard to both minimization of detector errors and inversion of hadronization;
• is similar to a cone algorithm with dynamically negotiated jet shapes and positions found via shape observables that generalize the thrust to any number of axes;
• involves no ad hoc conventions;
• allows a fast computer implementation.
The framework offers an array of options for systematic construction of quasi-optimal observables for specific applications.
The Taiji-1 satellite is a pioneering space technology mission designed by the Chinese Academy of Sciences (CAS) to test key technologies required for gravitational wave detection in space. Temperature stability is a critical element because it can couple into the gravitational wave measurement. A dedicated three-level thermal control method was used on the Taiji-1 satellite science module. Simulation analysis shows that the temperature stability of its scientific instruments can reach ±1.7 mK. Combined with the in-orbit temperature results, the temperature stability obtained using a linear smoothing filter and a Kalman filter reached ±1.1 and ±0.5 mK, respectively, in good agreement with the simulation data, indicating that the thermal control of the Taiji-1 satellite science module reached a high precision.
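The Kalman-filter estimate of temperature stability described above can be illustrated with a scalar filter tracking a slowly varying temperature level. This is a minimal sketch, not the mission's actual pipeline: the process and measurement noise variances, the synthetic 20 °C series, and the settling window are all illustrative assumptions.

```python
import numpy as np

def kalman_smooth(z, q=1e-6, r=1e-2):
    """Scalar Kalman filter for a slowly varying temperature level.
    q: process noise variance, r: measurement noise variance (both assumed)."""
    x, p = z[0], 1.0
    out = np.empty_like(z, dtype=float)
    for i, zi in enumerate(z):
        p += q                   # predict: uncertainty grows by process noise
        k = p / (p + r)          # Kalman gain
        x += k * (zi - x)        # update state with the new measurement
        p *= (1 - k)             # shrink uncertainty
        out[i] = x
    return out

# Synthetic example: a 20 degC level with mK-scale measurement noise.
rng = np.random.default_rng(0)
temps = 20.0 + 0.001 * rng.standard_normal(1000)
smoothed = kalman_smooth(temps)
# Report stability as a +/- band of the filtered series after settling.
stability = (smoothed[100:].max() - smoothed[100:].min()) / 2
```

The filtered band is substantially narrower than the raw measurement band, which is the sense in which filtering tightens the reported ± stability figure.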
TAIJI-1 is a micro-gravity experiment spacecraft. The mission target is to verify key techniques of the spacecraft payloads for gravitational wave detection, which include the laser interferometer, the gravity reference sensor, drag-free control technology, the micro-propulsion system, and the super-quiescent spacecraft platform. The verification of the data processing pipeline required by the next stage of the TAIJI Program is also performed. To benefit from future joint observations between TAIJI and LISA, the science operations follow the existing ESA and NASA standard models, which include the Mission Operations Center (MOC), the Science Operations Center (SOC), and the Data Processing Center (DPC). The data processing pipeline connects the SOC and the DPC: the SOC obtains the level-0 data from the MOC, while the DPC performs the data processing and distributes the level-2 and level-3 data to the SOC. For the TAIJI-1 mission, the SOC and DPC are two subsystems of the science application system (SAS), which is one of the six functional systems that operate a Chinese space mission. The MOC corresponds to the ground support system (GSS) and the spacecraft control system, which are also among these six functional systems. The on-orbit experiment plans are transmitted from the SAS to the GSS, analogous to the flow from SOC to MOC in the NASA standard model. The computing hardware and software, the basic elements of the SAS, were completed before TAIJI-1 was launched. After TAIJI-1 entered orbit, the data processing pipeline began to work and the experimental items of TAIJI-1 were performed in the pipeline. The basic functions, performance, and optimization functions of the detection devices in the payloads were verified completely within 3 months after launch. At the same time, the methods of data analysis and processing were also verified.
As a result, the required indicators of the key techniques of the spacecraft for gravitational wave detection are verified, and the data processing pipeline is shown to be sound. The relevant codes for data analysis and processing will benefit the next stage of the TAIJI Program.
Data processing across multiple domains is an important concept on any platform; it deals with both multimedia and textual information. Whereas textual data processing focuses on structured or unstructured data and computes in less time with no compression of the data, multimedia data processing involves algorithms where compression is needed. This involves processing video and its frames and compressing them into short forms so that both storage and access can be performed quickly. There are different ways of performing compression, such as fractal compression, the wavelet transform, compressive sensing, and contractive transformation. One way of performing such compression is to work with the high-frequency components of multimedia data. One of the most recent topics is fractal transformation, which exploits block self-similarity and achieves a high compression ratio. Yet there are limitations, such as the speed and cost of performing proper encoding and decoding with fractal compression. Swarm optimization and related algorithms help make fractal compression practical. In this paper, we review multiple algorithms in the field of fractal-based video compression and swarm intelligence for optimization problems.
Concealed microcracks in shield tunnel lining are small, of unknown shape, and difficult to detect. Based on the finite-difference time-domain (FDTD) approach, this study proposes a new construction method for a refined grid that accommodates and combines the variable shapes of microcracks and is capable of modeling cross-type, mesh-type, and wave-type microcracks. The new method also configures steel bars in the models to simulate actual engineering conditions. Characteristic response images of the models under different working conditions were obtained using ground-penetrating radar (GPR) technology and were compared and analyzed to identify the imaging characteristics and differences of microcracks with variable geometric shapes. The waveform, amplitude, and time span of the characteristic single-channel signal were also studied. The results show that the new method can successfully simulate the GPR characteristic response images of 0.5 mm microcracks of diverse geometric shapes. When the microcracks were wavy, their real shape could only be determined after signal pre-processing. The density and quantity of steel bars directly affected the appearance of the microcrack characteristic signals: the greater the density and quantity of steel bars, the greater the interference with the waveform, amplitude, and time-frequency range of the electromagnetic wave signals. A distinctive correlation existed between the maximum root mean square value of the amplitude and the single-channel signal of the cracks. Moreover, the finding that the extension in time and distance in the GPR time-distance profile intersected with the cracks may provide fresh insights into identifying the characteristic points of cracks in GPR images.
The new method proposed in this study successfully obtained the GPR numerical simulation images and characteristic signals of microcracks with variable geometric shapes. The processing and analysis of the characteristic response signals of microcracks yield conclusions that provide an interpretation basis for the detection of microcracks in practical engineering.
In the framework of the representability of ordinal qualitative data by means of interval-valued correspondences, we study interval orders defined on a nonempty set X. We analyse the continuous case, which corresponds to a set endowed with a topology furnishing an idea of continuity, so that it becomes natural to ask for the existence of quantifications based on interval-valued mappings from the set of data into the real numbers under preservation of order and topology. In the present paper we solve a continuous representability problem for interval orders. We furnish a characterization of the representability of an interval order through a pair of continuous real-valued functions, so that each element of X is associated, in a continuous manner, with a characteristic interval or, equivalently, a symmetric triangular fuzzy number.
To predict the performance of a diesel engine, current practice relies on black-box identification, where numerous experiments must be carried out in order to obtain numerical values for model training. Although many diesel engine models based on artificial neural networks (ANNs) have already been developed, they have many drawbacks, such as local minima, the user burden of selecting an optimal network structure, large training data requirements, and poor generalization performance, making them difficult to put into practice. This paper proposes to use the extreme learning machine (ELM), which can overcome most of the aforementioned drawbacks, to model the emission characteristics and the brake-specific fuel consumption of a diesel engine under scarce and exponential sample data sets. The resulting ELM model is compared with models developed using popular ANNs such as the radial basis function neural network (RBFNN) and advanced techniques such as the support vector machine (SVM) and its variants, namely the least squares support vector machine (LS-SVM) and the relevance vector machine (RVM). Furthermore, some emission outputs of diesel engines suffer from the problem of exponentiality (i.e., the output y grows exponentially with the input x), which deteriorates the prediction accuracy. A logarithmic transformation is therefore applied to pre-process and post-process the sample data sets in order to improve the prediction accuracy of the model. Evaluation results show that ELM with the logarithmic transformation outperforms SVM, LS-SVM, RVM, and RBFNN with or without the logarithmic transformation, in terms of both model accuracy and training time.
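The logarithmic pre/post-processing step is independent of the learner, so it can be sketched with an ordinary least-squares fit standing in for the ELM. The exponential data below are a synthetic illustration, not the paper's engine measurements:

```python
import numpy as np

# Hypothetical exponential output: emissions y grow exponentially with load x.
x = np.linspace(0.1, 1.0, 20)
y = 2.0 * np.exp(3.0 * x)

# Pre-process: fit in log space so the model sees a near-linear target.
log_y = np.log(y)
coeffs = np.polyfit(x, log_y, 1)         # linear model in log space

# Post-process: exponentiate predictions back to the original scale.
y_pred = np.exp(np.polyval(coeffs, x))

rel_err = np.max(np.abs(y_pred - y) / y)  # relative error on the original scale
```

Fitting in log space keeps large-y samples from dominating the loss, which is the mechanism by which the transformation improves accuracy on exponential outputs.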
The Flicker Noise Analysis methodology is applied to voltage signals recorded during the electrochemical etching of crystalline silicon surfaces. The porous silicon material resulting from this procedure is composed of nanocrystals possessing remarkable properties which can be employed in the development of optical, electronic, or biological sensor devices. As a result of the analysis, two dynamic processes during the electrochemical reaction have been identified: one related to the dissolution of the crystalline matrix and the other related to the porous morphology itself.
On-the-fly data integration, i.e. at query time, happens mostly in tightly coupled, homogeneous environments where the partitioning of the data can be controlled or is known in advance. During the process of data fusion, the information is homogenized and data inconsistencies are hidden from the application. Going beyond this, we propose in this paper the Nexus metadata model and a processing approach that support on-the-fly data integration in a loosely coupled federation of autonomous data providers, thereby advancing the status quo in terms of flexibility and expressive power. The model is able to represent data and schema inconsistencies such as multi-valued attributes and multi-typed objects. In an open environment, this best suits the needs of applications where the data processing infrastructure cannot decide which attribute value is correct. The Nexus metadata model provides the foundation for integration schemata that are specific to a given application domain. The corresponding processing model provides four complementary query semantics in order to account for the subtleties of multi-valued and missing attributes. In this paper we show that these query semantics are sound and easy to implement, and that they build upon existing query processing techniques. Thus the Nexus metadata model provides a unique level of flexibility for on-the-fly data integration.
The era of the web has evolved and industry strives to work better every day. The constant need for data to be accessible at any moment is expanding, and with this expansion, the need for a meaningful query technique on the web is a major concern. To transmit meaningful data with rich semantics, machines/programs need the ability to reach the correct information and make adequate connections. This problem is addressed by the emergence of Web 3.0: the semantic web is developing and collecting an immense amount of information to process, which poses a giant data management challenge in providing an ideal result whenever needed. Accordingly, in this article, we present a system for managing huge information using MapReduce frameworks, which internally help an engine fetch information by exploiting parallel processing with smaller map jobs and link-discovery procedures. Similarity calculations can be challenging; this work implements five similarity detection algorithms and measures the time each takes, so that the observed patterns can inform a better choice of algorithm. The proposed framework is built on the most recent and widespread data format, JSON, and uses the HIVE query language to obtain and process the information according to the customer's needs, together with algorithms for link discovery. Finally, the results are made available on a web page that helps a user load JSON data and make connections between dataset 1 and dataset 2. The results are examined in two different sets and show that the proposed approach interlinks significantly faster; regardless of how large the information is, the time it takes does not grow radically.
The results demonstrate that the interlinking of dataset 1 and dataset 2 is most notable using LD and JW, and the time required is acceptable for both algorithms. This paper has automated the interlinking process via a web page, where customers can merge two datasets that should be associated and used.
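Assuming LD and JW denote Levenshtein distance and Jaro-Winkler similarity, as is usual in link discovery, minimal reference implementations look like the following (a sketch for illustration, not the paper's code):

```python
def levenshtein(s, t):
    """LD: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

def jaro(s, t):
    """Jaro similarity: matches within a sliding window, minus transpositions."""
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    window = max(ls, lt) // 2 - 1
    s_m, t_m = [False] * ls, [False] * lt
    matches = 0
    for i, c in enumerate(s):
        for j in range(max(0, i - window), min(lt, i + window + 1)):
            if not t_m[j] and t[j] == c:
                s_m[i] = t_m[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    t_idx = [j for j in range(lt) if t_m[j]]
    k = trans = 0
    for i in range(ls):
        if s_m[i]:
            if s[i] != t[t_idx[k]]:
                trans += 1
            k += 1
    trans //= 2
    return (matches / ls + matches / lt + (matches - trans) / matches) / 3

def jaro_winkler(s, t, p=0.1):
    """JW: Jaro boosted by the common prefix length (capped at 4 characters)."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

LD rewards near-exact spellings while JW favors shared prefixes, which is why the two often complement each other when matching entity labels across datasets.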
Due to the current focus of research on ankle rehabilitation robots on structural design, there is still limited research on ankle human–machine interaction technology. In order to enable rehabilitation robots to conduct personalized rehabilitation training based on patients’ ankle movement intentions, we propose a new ankle motion recognition method based on plantar pressure. First, we designed a plantar pressure collection system based on array sensors. Then, we collected nine types of ankle joint motion pressure data from five volunteers and conducted algorithm selection, data processing, and algorithm optimization. Finally, we proposed a small sample optimization algorithm based on support vector machine, with an average recognition rate of 93.16%. The recognition method proposed in this paper can be combined with ankle rehabilitation robots to achieve active rehabilitation functions, laying the foundation for the clinical application of active rehabilitation technology.
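A small-sample pipeline of the kind described (multi-class SVM with cross-validated hyperparameters) might be sketched as follows. The synthetic plantar-pressure data, the 8x8 array size, and the parameter grid are all illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical stand-in data: 9 ankle-motion classes, each sample a
# flattened plantar-pressure frame (e.g. an 8x8 sensor array -> 64 features).
rng = np.random.default_rng(1)
n_classes, n_per_class, n_feat = 9, 10, 64
centers = rng.normal(size=(n_classes, n_feat))
X = np.vstack([c + 0.1 * rng.normal(size=(n_per_class, n_feat)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Small-sample-friendly: RBF-kernel SVM with cross-validated C and gamma,
# since with few samples per class the hyperparameters dominate accuracy.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": ["scale", 0.01]}, cv=3)
grid.fit(X, y)
acc = grid.score(X, y)
```

Kernel SVMs are a natural fit here because their capacity is controlled by a few regularization parameters rather than a network architecture, which suits the small-sample regime the abstract targets.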
The paper focuses on the field of artificial intelligence techniques and their use in biomedical data processing. It concerns clustering techniques inspired by ant colonies. The behavior of ant colonies shows many interesting properties that have been used in static and dynamic combinatorial problem-solving tasks (mostly since 1990); applications to data clustering have also been proposed, and this branch is a subject of ongoing research. After an introduction to the state of the art of ant-colony-inspired metaheuristics, an overview of ant-colony-inspired clustering metaheuristics is presented, together with the ACO_DTree method, developed by the first author, which is based on the autocatalytic collective behavior of real insect colonies. Beyond the basic algorithm, it incorporates techniques to increase the robustness and performance of the method. An application to electrocardiogram and electroencephalogram data processing is also presented, together with a comparison to other clustering methods.
Sensitivity and data processing speed are important in spectral-domain optical coherence tomography (SD-OCT) systems. To obtain higher sensitivity, zero-padding interpolation together with linear interpolation is commonly used to re-sample the interference data in SD-OCT, which limits the data processing speed. Recently, a time-domain interpolation for SD-OCT was proposed. By eliminating the large Fast Fourier Transform (FFT) operations, the operation count of time-domain interpolation is much smaller than that of zero-padding interpolation. In this paper, a numerical simulation is performed to evaluate the computational complexity and the interpolation accuracy. More than a six-fold acceleration is obtained. At the same time, the normalized mean square error (NMSE) results show that the time-domain interpolation method with cut-off lengths L = 21 and L = 31 improves by about 1.7 dB and 2.1 dB, respectively, over the zero-padding interpolation method with padding factor M = 4 when the distance mismatch is 2.4 mm. Furthermore, this method lends itself to parallel processing because only the data in the cut-off window are processed. By using a Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA) programming model, a frame (400 A-lines × 2048 pixels × 12 bits) of data can be processed in 6 ms, and processing rates of 164,000 lines/s for 1024-OCT and 71,000 lines/s for 2048-OCT can be achieved when the cut-off length is 21. Thus, a high-sensitivity, ultra-high-speed data processing SD-OCT is realized.
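For reference, the conventional zero-padding interpolation that the time-domain method replaces can be sketched as follows. This is a simplified illustration; the wavenumber mapping, the padding factor, and the sanity-check signal are assumptions:

```python
import numpy as np

def zero_pad_resample(fringe, k_nonlinear, M=4):
    """Re-sample a spectral fringe onto a uniform wavenumber grid:
    FFT -> pad the spectrum M-fold -> inverse FFT (dense fringe) ->
    linear interpolation at the uniform-k sample positions."""
    N = len(fringe)
    spec = np.fft.fft(fringe)
    padded = np.zeros(M * N, dtype=complex)
    half = N // 2
    padded[:half] = spec[:half]            # positive-frequency bins
    padded[-half:] = spec[-half:]          # negative-frequency bins
    dense = np.fft.ifft(padded).real * M   # M-fold upsampled fringe
    dense_pos = np.arange(M * N) / M       # positions in original-sample units
    k_uniform = np.linspace(k_nonlinear[0], k_nonlinear[-1], N)
    # Map each uniform-k target to a fractional original-sample position.
    pos = np.interp(k_uniform, k_nonlinear, np.arange(N, dtype=float))
    return np.interp(pos, dense_pos, dense)

# Sanity check: with an already-uniform grid, re-sampling returns the input.
N = 256
n = np.arange(N, dtype=float)
fringe = np.cos(2 * np.pi * 5 * np.arange(N) / N)
resampled = zero_pad_resample(fringe, n, M=4)
```

The two large FFTs over M·N points are exactly the cost that the time-domain interpolation avoids by working in a short cut-off window instead.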
Dynamic virtual plant simulation is an attractive research issue in both botany and computer graphics. Data-driven methods are an efficient way to perform motion analysis and animation synthesis. As a widely used tool, motion capture has been applied to plant motion data acquisition and analysis. The most prominent problem in motion capture for plants is primary data processing, such as the reconstruction of missing markers. This paper presents a novel physics-based approach to motion capture data processing for plants. First, a physics-based mechanics model is built using Lagrangian mechanics for a motion-captured plant organ such as a leaf; its dynamic mechanical properties are then analyzed and the relevant model parameters are evaluated. Further, using the physical model with the evaluated parameters, we can calculate the next positions of a marker to reconstruct the missing markers in a motion capture sequence. We use a maize leaf and a pachira leaf as examples to examine the proposed approach, and the results show that the physics-based method is feasible and effective for plant motion data processing.
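The idea of predicting missing markers from a fitted mechanical model can be illustrated with a one-degree-of-freedom damped oscillator standing in for the leaf's Lagrangian model. The parameters, the gap location, and the finite-difference velocity estimate are illustrative assumptions:

```python
import numpy as np

# Damped-oscillator surrogate: m*x'' + c*x' + k*x = 0 (parameters assumed;
# in practice they would be estimated from the captured sequence).
m, c, k, dt = 1.0, 0.4, 25.0, 0.01

def step(x, v):
    """One semi-implicit Euler step of the marker's displacement."""
    v = v + dt * (-(c * v + k * x) / m)
    return x + dt * v, v

# Ground-truth track of one marker coordinate, then a gap (frames 40-59).
x, v = 1.0, 0.0
track = []
for _ in range(100):
    track.append(x)
    x, v = step(x, v)
track = np.array(track)

# Reconstruct the gap by integrating the model forward from the last known
# state; velocity is estimated by finite difference before the gap.
xg = track[39]
vg = (track[39] - track[38]) / dt
filled = []
for _ in range(20):
    xg, vg = step(xg, vg)
    filled.append(xg)
gap_err = np.max(np.abs(np.array(filled) - track[40:60]))
```

Because the model encodes the organ's dynamics, the filled positions follow the oscillatory decay rather than a straight-line interpolation, which is the advantage the abstract claims over purely geometric gap filling.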
The Galileo Project aims to shed light on the nature and characteristics of Unidentified Aerial Phenomena (UAP). We are developing a multi-modal instrumentation suite that will monitor the sky in seven electromagnetic and three audio bands. Computing will play a critical role in this project, enabling the automated collection and processing of data. In this paper, we provide a brief overview of data sources, and describe our plan for computing infrastructure and architecture. We present a proposed real-time pipeline for distinguishing between natural and human-made phenomena, and for detecting objects that fall outside the phenomenological envelope of known phenomena. In addition, we outline the algorithms we will test and evaluate for use in offline data analysis. While preliminary, our work represents a significant step towards a unified data capture and analysis platform for the systematic detection and rigorous scientific study of unusual aerial phenomena in a regional airspace.
Agriculture catalyzes the economy in developing nations. Malaysian agriculture covers 4.06 million hectares, 80% of which comprises industrial crops and agro-food production, and it boosts the economy through the implementation of precision agriculture (PA). Precision agriculture minimizes environmental impact by using unmanned aerial vehicles (UAVs), improving sustainability, productivity, and crop production up to 30-fold compared with conventional methods. This study reviews UAV applications based on technical requirements, with insights into the potential of precision agriculture in UAV agriculture technologies, their limitations, and solutions.
To optimize the delivery in lung radiation therapy, a better understanding of tumor motion is required, on the one hand to improve tumor-targeting efficiency, and on the other hand to spare normal tissues as much as possible. Four-dimensional computed tomography (4D-CT) makes it possible to quantify tumor motion, but due to artifacts it introduces biases and errors in tumor localization. Despite this disadvantage, we propose a method to simulate lung motion based on data provided by the 4D-CT for several patients. To reduce the uncertainties introduced by the 4D-CT scan, we treat the data using artificial neural networks; more precisely, our approach consists of a data augmentation technique. The data resulting from this processing step are then used to build a training set for another artificial neural network that learns the lung motion. To improve the learning accuracy, we studied the number of phases required to precisely describe the displacement of each point. Thus, from 1118 points scattered across five patients and defined over 8 or 10 phases, we obtained 5800 points over 50 phases. After training, the network is used to compute the positions of 40 points from five other patients over 10 phases. These points allow us to quantify the prediction performance. In comparison with the original data, the data issued from our treatment process provide a significant increase in prediction accuracy: an average improvement of 16% can be observed. The motion computed for several points by the neural network that has learnt the lung motion exhibits a hysteresis close to that given by the 4D-CT, with an error smaller than 1 mm along the cranio-caudal axis.
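One simple form of the phase-upsampling augmentation described (from 10 breathing phases to 50) is periodic interpolation of each point's trajectory. The sinusoidal stand-in trajectory below is an illustrative assumption, not patient data:

```python
import numpy as np

# A point's periodic trajectory sampled at 10 breathing phases (stand-in
# cranio-caudal displacement; real trajectories come from the 4D-CT).
phases10 = np.linspace(0.0, 1.0, 10, endpoint=False)
traj10 = np.sin(2 * np.pi * phases10)

# Upsample to 50 phases by periodic linear interpolation: append the
# first sample at phase 1.0 so the breathing cycle wraps around.
phases50 = np.linspace(0.0, 1.0, 50, endpoint=False)
traj50 = np.interp(phases50,
                   np.append(phases10, 1.0),
                   np.append(traj10, traj10[0]))
```

The augmented 50-phase trajectories give the motion-learning network a much denser sampling of the breathing cycle than the raw 8- or 10-phase scans provide.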
Three techniques of data analysis are examined: dimensionality reduction, locally linear embedding, and the F-transform. We show that all of them can be connected using the notion of a fuzzy partition and the corresponding construction of a non-local Laplace operator. The article includes the results of two comprehensible numerical experiments that compare the achieved outputs. An important conclusion is that the outputs correspond to the values of the first-degree F-transform components.
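The fuzzy-partition construction that links these techniques can be sketched via the direct degree-0 F-transform over a uniform triangular partition. This simplified version is for illustration only; the paper's first-degree components add a linear term per basis function:

```python
import numpy as np

def triangular_partition(a, b, n):
    """n triangular basis functions forming a uniform fuzzy partition of [a, b]."""
    nodes = np.linspace(a, b, n)
    h = nodes[1] - nodes[0]
    def A(k, x):
        return np.clip(1 - np.abs(x - nodes[k]) / h, 0.0, 1.0)
    return nodes, A

def f_transform(x, f, A, n):
    """Direct (degree-0) F-transform: F_k = sum(f * A_k) / sum(A_k)."""
    return np.array([np.sum(f * A(k, x)) / np.sum(A(k, x)) for k in range(n)])

# Example: components of a constant signal over [0, 1] with 5 basis functions.
x = np.linspace(0.0, 1.0, 101)
nodes, A = triangular_partition(0.0, 1.0, 5)
F_const = f_transform(x, np.full_like(x, 2.0), A, 5)
```

Each component is a locally weighted average of the signal, which is exactly the averaging structure that the non-local Laplace operator in the article is built on.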
The problem of calibrating and checking a measuring instrument by comparison with another instrument used as a reference standard is discussed. The typical data processing tasks are formulated and solved. Both previously published results and new ones are presented.