
  Bestsellers

  • Article (No Access)

    Traffic Sign Recognition Using a Synthetic Data Training Approach

    Traffic Sign Recognition (TSR) is a crucial component in many automotive applications, such as driver assistance, sign maintenance, and vehicle autonomy. In this paper, we present an efficient approach to training a machine learning-based TSR solution. As our recognition method, we opted for convolutional neural networks, which have demonstrated best-in-class performance in previous work on TSR. One challenge in training deep neural networks is the large amount of training data they require. To circumvent the tedious process of acquiring and manually labelling real data, we investigate the use of synthetically generated images. Our networks, trained on synthetic data only, are capable of recognising traffic signs in challenging real-world footage. The classification results achieved on the GTSRB benchmark outperform existing state-of-the-art solutions.
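    The core idea, training a classifier exclusively on synthetically perturbed renderings of sign templates, can be sketched as follows. This is a minimal illustration, not the authors' pipeline; the perturbations chosen here (brightness, noise, translation) are assumptions standing in for a full rendering system:

```python
import numpy as np

def synthesize_sign_images(template, n, rng=None):
    """Generate n synthetic training variants of a sign template.
    Hypothetical sketch: random brightness, sensor noise and small
    shifts stand in for the full synthetic rendering pipeline."""
    rng = np.random.default_rng(rng)
    h, w = template.shape
    out = []
    for _ in range(n):
        img = template.astype(float)
        img *= rng.uniform(0.6, 1.4)            # illumination change
        img += rng.normal(0, 10, size=(h, w))   # sensor noise
        dy, dx = rng.integers(-2, 3, size=2)    # small translation
        img = np.roll(img, (dy, dx), axis=(0, 1))
        out.append(np.clip(img, 0, 255))
    return np.stack(out)
```

    A CNN would then be trained on such batches, with the class label of each template carried over unchanged to all of its variants.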

  • Article (No Access)

    PrimitivePose: Generic Model and Representation for 3D Bounding Box Prediction of Unseen Objects

    A considerable amount of research is concerned with the challenging task of estimating three-dimensional (3D) pose and size for multi-object indoor scene configurations. Many existing models rely on a priori known object models, such as 3D CAD models, and are therefore limited to a predefined set of object categories. This closed-set constraint limits the range of applications for robots interacting in dynamic environments where previously unseen objects may appear. This paper addresses the problem with a highly generic 3D bounding box detection method that relies entirely on geometric cues obtained from depth data percepts. While the generation of synthetic data, e.g. synthetic depth maps, is commonly used for this task, the well-known synth-to-real gap often emerges, which prevents models trained solely on synthetic data from transferring to the real world. To ameliorate this problem, we use stereo depth computation on synthetic data to obtain pseudo-realistic disparity maps. We then propose an intermediate representation, namely disparity-scaled surface normal (SN) images, which encodes geometry while, unlike the commonly used standard SNs, preserving depth/scale information. In a series of experiments, we demonstrate the usefulness of our approach, detecting everyday objects on a captured data set of tabletop scenes, and compare it to the popular PoseCNN model. We quantitatively show that standard SNs are less adequate for challenging 3D detection tasks by comparing predictions from models trained on disparity alone, standard SNs, and disparity-scaled SNs. Additionally, in an ablation study we investigate the minimal number of training samples required for such a learning task. Lastly, we make the tool used for 3D object annotation publicly available at: https://preview.tinyurl.com/3ycn8v5k. A video showcasing our results can be found at: https://preview.tinyurl.com/dzdzabek.
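    The intermediate representation can be illustrated with a small sketch. Everything here is an assumption about the flavor of the method, not the paper's exact encoding: normals are derived from depth gradients, and each unit normal is multiplied by the stereo disparity d = f*b/z, so that scale information survives where standard unit-length SNs would discard it:

```python
import numpy as np

def disparity_scaled_normals(depth, focal=1.0, baseline=1.0):
    """Sketch of a disparity-scaled surface-normal image.
    Assumptions (not from the paper): normals are computed from
    depth gradients, then each unit normal is multiplied by the
    stereo disparity d = focal * baseline / depth."""
    dz_dy, dz_dx = np.gradient(depth)
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    disparity = focal * baseline / depth       # preserves depth/scale
    return normals * disparity[..., None]
```

    On a flat fronto-parallel plane, the direction component is constant while the magnitude still encodes how far the plane is from the camera.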

  • Article (Open Access)

    Synthetic Data Generation for Morphological Analyses of Histopathology Images with Deep Learning Models

    In this study, we introduce a new synthetic data generation procedure for augmentation of histopathology image data. This is an extension of our previous research, in which we demonstrated that deep learning models for the morphological analysis of tumor cells can be trained on synthetic data only. The medical problem considered relates to calculation of the Ki-67 protein proliferation index. We focus on the problem of counting cells in cell conglomerates, structures composed of overlapping tumor cells. The lack of large, standardized data sets is a critical problem in medical image classification, and classical augmentation procedures are not sufficient. In this research, we therefore expand our previous augmentation approach for histopathology images and demonstrate its applicability to the cell-counting problem.
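    The kind of training example described, a conglomerate of overlapping cells with a known count, can be mimicked with a toy generator. This is a hypothetical stand-in; real histopathology synthesis is far richer:

```python
import numpy as np

def synth_conglomerate(n_cells, size=64, radius=6, rng=None):
    """Toy version of the augmentation idea: paint n_cells random,
    possibly overlapping disks ('tumor cells') onto a canvas and
    return the image together with the ground-truth count."""
    rng = np.random.default_rng(rng)
    yy, xx = np.mgrid[:size, :size]
    img = np.zeros((size, size))
    for _ in range(n_cells):
        cy, cx = rng.integers(radius, size - radius, size=2)
        img[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] += 1
    return np.clip(img, 0, 1), n_cells
```

    A counting model can then be trained on (image, count) pairs where the label is exact by construction, which is precisely what manual annotation of overlapping cells cannot guarantee.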

  • Article (Open Access)

    Integrated Computing Platform for Detection and Tracking of Unidentified Aerial Phenomena (UAP)

    The Galileo Project aims to shed light on the nature and characteristics of Unidentified Aerial Phenomena (UAP). We are developing a multi-modal instrumentation suite that will monitor the sky in seven electromagnetic and three audio bands. Computing will play a critical role in this project, enabling the automated collection and processing of data. In this paper, we provide a brief overview of data sources, and describe our plan for computing infrastructure and architecture. We present a proposed real-time pipeline for distinguishing between natural and human-made phenomena, and for detecting objects that fall outside the phenomenological envelope of known phenomena. In addition, we outline the algorithms we will test and evaluate for use in offline data analysis. While preliminary, our work represents a significant step towards a unified data capture and analysis platform for the systematic detection and rigorous scientific study of unusual aerial phenomena in a regional airspace.

  • Chapter (Open Access)

    Implementing a Cloud Based Method for Protected Clinical Trial Data Sharing

    Clinical trials generate a large amount of data that have been underutilized due to obstacles to data sharing, including risks to patient privacy, data misrepresentation, and invalid secondary analyses. To address these obstacles, we developed a novel data sharing method that ensures patient privacy while also protecting the interests of clinical trial investigators. Our flexible and robust approach involves two components: (1) an advanced cloud-based querying language that allows users to test hypotheses without direct access to the real clinical trial data, and (2) corresponding synthetic data for the query of interest that allows for exploratory research and model development. Both components can be modified by the clinical trial investigator depending on factors such as the type of trial or the number of patients enrolled. To test the effectiveness of our system, we first implement a simple and robust permutation-based synthetic data generator. We then use this generator, coupled with our querying language, to identify significant relationships among variables in a realistic clinical trial dataset.
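    The abstract does not spell out the permutation scheme. A common simple variant shuffles each variable independently, preserving every marginal distribution exactly while destroying subject-level linkage (and with it, much of the re-identification risk). A sketch under that assumption:

```python
import numpy as np

def permutation_synthesize(data, rng=None):
    """Toy permutation-based synthetic data generator (a guess at the
    flavor of method described, not the paper's exact scheme): each
    column of the subjects-by-variables matrix is permuted
    independently, so marginals are preserved exactly while
    cross-variable linkage at the subject level is destroyed."""
    rng = np.random.default_rng(rng)
    synthetic = data.copy()
    for j in range(synthetic.shape[1]):
        synthetic[:, j] = rng.permutation(synthetic[:, j])
    return synthetic
```

    Note that independent column permutation also destroys genuine correlations, which is why such a generator serves exploratory querying rather than final inference on the real data.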

  • Chapter (Open Access)

    SynTwin: A graph-based approach for predicting clinical outcomes using digital twins derived from synthetic patients

    The concept of a digital twin came from the engineering, industrial, and manufacturing domains, where virtual objects or machines inform the design and development of real ones. The idea is appealing for precision medicine, where digital twins of patients could help inform healthcare decisions. We have developed a methodology for generating and using digital twins for clinical outcome prediction, introducing a new approach, SynTwin, that combines synthetic data and network science to create digital twins for precision medicine. First, we estimate the distance between all subjects based on their available features. Second, the distances are used to construct a network with subjects as nodes and edges connecting pairs whose distance falls below the percolation threshold. Third, communities or cliques of subjects are defined. Fourth, a large population of synthetic patients is generated using a synthetic data generation algorithm that models the correlation structure of the data. Fifth, digital twins are selected from the synthetic patient population: those within a given distance of a subject community in the network. Finally, we compare and contrast community-based prediction of clinical endpoints using real subjects, digital twins, or both, within and outside of the community. Key to this approach are the digital twins defined by patient similarity: hypothetical unobserved patients with patterns similar to nearby real patients, as defined by network distance and community structure. We apply SynTwin to predicting mortality in a population-based cancer registry (n=87,674) from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (USA). Our results demonstrate that nearest-network-neighbor prediction of mortality in this study is significantly improved with digital twins (AUROC=0.864, 95% CI=0.857-0.872) over using real data alone (AUROC=0.791, 95% CI=0.781-0.800). These results suggest that a network-based digital twin strategy using synthetic patients may add value to precision medicine efforts.
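    The first three steps of such a pipeline can be sketched in simplified form. Assumptions here, not taken from the paper: Euclidean distance, a fixed caller-supplied threshold in place of the estimated percolation threshold, and connected components standing in for the paper's community detection:

```python
import numpy as np

def network_communities(features, threshold):
    """Sketch of the first three SynTwin-style steps under simplified
    assumptions: Euclidean distances (step 1), a thresholded graph
    (step 2), and connected components as communities (step 3)."""
    n = len(features)
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))                 # step 1
    adj = (dist < threshold) & ~np.eye(n, dtype=bool)   # step 2
    # step 3: communities via union-find over the thresholded graph
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i, j]:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

    Synthetic patients falling within the threshold distance of a community's members would then be adopted as that community's digital twins.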

  • Chapter (No Access)

    Synthetic data based nonparametric testing of parametric mean-regression models with censored data

    We develop a kernel-smoothing-based test of a parametric mean-regression model against a nonparametric alternative when the response variable is right-censored. The new test statistic is inspired by the synthetic data approach for estimating the parameters of a (non)linear regression model under censoring. The asymptotic critical values of our tests are given by the quantiles of the standard normal law. The test is consistent against any fixed alternative, against local Pitman alternatives, and uniformly over alternatives in Hölder classes of functions of known regularity.
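    The synthetic data approach referred to is the classical Koul-Susarla-Van Ryzin transformation, which converts right-censored responses into "synthetic" ones whose conditional mean equals the regression function, so they can be smoothed as if uncensored. A sketch, assuming for simplicity that the censoring distribution G is known (in practice it is replaced by a Kaplan-Meier estimate):

```python
import numpy as np

def kss_transform(y, delta, G):
    """Koul-Susarla-Van Ryzin synthetic responses for right-censored
    data: Y* = delta * Y / (1 - G(Y)), where delta is the censoring
    indicator and G the censoring distribution (assumed known here).
    Then E[Y* | X] recovers the regression function m(X)."""
    return delta * y / (1.0 - G(y))

def nw_smoother(x_eval, x, y_star, h):
    """Nadaraya-Watson kernel smoother of the synthetic responses,
    the building block of a smoothing-based lack-of-fit statistic."""
    k = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (k * y_star).sum(1) / k.sum(1)
```

    A lack-of-fit statistic would then compare this smoother of the synthetic responses against the fitted parametric mean, with the bandwidth h driving the asymptotics.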

  • Chapter (No Access)

    Synthetic dataset for compositional learning

    This contribution presents a framework for the generation of synthetic images. The framework is built on top of Unreal Engine 4, a software kit capable of rendering realistic images. Besides image data, additional label information, such as depth, normal maps, and object component masks, is generated. The hierarchical nature of the generated labels corresponds to the hierarchical representations we want the neural network to capture. Such labels enable training deep models in a compositional manner, which leads to a better understanding of the models' internal representations and accelerates the learning procedure. The framework allows users to render arbitrary scenes and objects according to their specific domain.

  • Chapter (Open Access)

    ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports

    Accurately interpreting medical images and writing radiology reports is a critical but challenging task in healthcare. Both human-written and AI-generated reports can contain errors, ranging from clinical inaccuracies to linguistic mistakes. To address this, we introduce ReXErr, a methodology that leverages Large Language Models to generate representative errors within chest X-ray reports. Working with board-certified radiologists, we developed error categories that capture common mistakes in both human and AI-generated reports. Our approach uses a novel sampling scheme to inject diverse errors while maintaining clinical plausibility. ReXErr demonstrates consistency across error categories and produces errors that closely mimic those found in real-world scenarios. This method has the potential to aid in the development and evaluation of report correction algorithms, potentially enhancing the quality and reliability of radiology reporting.
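    ReXErr itself uses Large Language Models to generate the errors; as a toy stand-in, category-based injection can be sketched with hand-written substitution rules. All names, categories, and rules below are hypothetical illustrations, not the paper's scheme:

```python
import random

def inject_errors(sentences, error_catalog, k=1, seed=None):
    """Toy sketch of category-based error injection. error_catalog
    maps a category name to an (old, new) substitution rule; k
    sentences are sampled as corruption targets, and a log records
    which sentence received which error category."""
    rng = random.Random(seed)
    corrupted = list(sentences)
    targets = rng.sample(range(len(sentences)), k)
    log = []
    for i in targets:
        category, (old, new) = rng.choice(list(error_catalog.items()))
        if old in corrupted[i]:
            corrupted[i] = corrupted[i].replace(old, new)
            log.append((i, category))
    return corrupted, log
```

    The (original, corrupted, log) triples are exactly the kind of supervision a report-correction model could be trained or evaluated on.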

  • Chapter (Open Access)

    All Together Now: Data Work to Advance Privacy, Science, and Health in the Age of Synthetic Data

    There is a disconnect between data practices in biomedicine and public understanding of those data practices, and this disconnect is expanding rapidly every day (with the emergence of synthetic data, digital twins, and more widely adopted Artificial Intelligence (AI)/Machine Learning tools). Transparency alone is insufficient to bridge this gap. Concurrently, there is an increasingly complex landscape of laws, regulations, and institutional/programmatic policies to navigate when engaged in biocomputing and digital health research, which makes it increasingly difficult for those wanting to “get it right” or “do the right thing.” Mandatory data protection obligations vary widely, sometimes focused on the type of data (with nuanced definition and scope parameters), the actor/entity involved, or the residency of the data subjects. Additional challenges come from attempts to celebrate biocomputing discoveries and digital health innovations, which frequently transform fair and accurate communications into exaggerated hype (e.g., to secure financial investment in future projects or lead to more favorable tenure and promotion decisions). Trust in scientists and scientific expertise can be quickly eroded if, for example, synthetic data is perceived by the public as “fake data” or if digital twins are perceived as “imaginary” patients. Researchers appear increasingly aware of the scientific and moral imperative to strengthen their work and facilitate its sustainability through increased diversity and community engagement. Moreover, there is a growing appreciation for the “data work” necessary to have scientific data become meaningful, actionable information, knowledge, and wisdom, not only for scientists but also for the individuals from whom those data were derived or to whom those data relate.
Equity in the process of biocomputing and equity in the distribution of benefits and burdens of biocomputing both demand ongoing development, implementation, and refinement of embedded Ethical, Legal and Social Implications (ELSI) research practices. This workshop is intended to nurture interdisciplinary discussion of these issues and to highlight the skills and competencies all too often considered “soft skills” peripheral to other skills prioritized in traditional training and professional development programs. Data scientists attending this workshop will become better equipped to embed ELSI practices into their research.