//www.snoollab.com:80/handle/2SGJ60CL/16338
The Department of Statistics and Data Science was established in April 2019 and currently has 8 faculty members in the research-and-teaching track: 1 chair professor, 2 professors, 2 associate professors, 2 tenure-track assistant professors, and 1 visiting assistant professor. The department has 2 disciplinary directions, statistics and data science (in preparation), covering main research areas that include biostatistics, financial statistics, experimental design, high-dimensional data, random matrices, time series, probability theory, and data science. All of the department's faculty (100%) have studied or worked overseas. They include 1 invited speaker at the International Congress of Mathematicians who is also a National Distinguished Expert, 1 winner of the Second Prize of the State Natural Science Award, and 1 Fellow of the Institute of Mathematical Statistics who is also an executive member of its Council and a Medallion Lecture speaker.
Fri, 02 Aug 2024 00:07:02 GMT

PractiCPP: a deep learning approach tailored for extremely imbalanced datasets in cell-penetrating peptide prediction
//www.snoollab.com:80/handle/2SGJ60CL/789280
Authors: Shi, Kexin; Xiong, Yuanpeng; Wang, Yu; Deng, Yifan; Wang, Wenjia; Jing, Bingyi; Gao, Xin
Abstract: Motivation: Effective drug delivery systems are paramount in enhancing pharmaceutical outcomes, particularly through the use of cell-penetrating peptides (CPPs). These peptides are gaining prominence due to their ability to penetrate eukaryotic cells efficiently without inflicting significant damage to the cellular membrane, thereby ensuring optimal drug delivery. However, the identification and characterization of CPPs remain a challenge due to the laborious and time-consuming nature of conventional methods, despite advances in proteomics. Moreover, current computational models are predominantly tailored for balanced datasets, an approach that falls short in real-world applications characterized by a scarcity of known positive CPP instances. Results: To address this shortfall, we introduce PractiCPP, a novel deep-learning framework tailored for CPP prediction in highly imbalanced data scenarios. Designed around hard negative sampling and a sophisticated feature extraction and prediction module, PractiCPP facilitates learning from imbalanced data.
Our extensive computational validations highlight PractiCPP's ability to outperform existing state-of-the-art methods, demonstrating remarkable accuracy even in datasets with an extreme positive-to-negative ratio of 1:1000. Furthermore, through methodical embedding visualizations, we have established that models trained on balanced datasets are not conducive to practical, large-scale CPP identification, as they do not accurately reflect real-world complexities. In summary, PractiCPP potentially offers new perspectives in CPP prediction methodologies. Its design and validation, informed by real-world dataset constraints, suggest its utility as a valuable tool in supporting the acceleration of drug delivery advancements. Availability and implementation: The source code of PractiCPP is available on Figshare at https://doi.org/10.6084/m9.figshare.25053878.v1.
Fri, 19 Jul 2024 08:29:18 GMT

Risk prediction models of exacerbations in adults with asthma: An updated systematic review
//www.snoollab.com:80/handle/2SGJ60CL/789259
Authors: Liu, Anqi; Chen, Wenjia; Yadav, Chandra Prakash
Fri, 19 Jul 2024 06:51:22 GMT

Time-delayed Geodemographic Scaling Algorithm for Fast COVID-19 Simulation
//www.snoollab.com:80/handle/2SGJ60CL/789189
Authors: Sijin Wu; Zhenqing Wu; Lang Mo; Zhejun Huang; Lili Yang
Abstract: COVID-19 has brought a series of challenges to human society and safety. Here a fine-grained COVID-19 agent-based model (ABM) is proposed to investigate the possible outcomes of different policy interventions and their influence on subpopulations.
A simulation platform is built to mimic individual social contacts in different activities as the primary route of virus transmission. The platform also uses a novel scale extension algorithm to enlarge the simulation size while preserving granularity, allowing the simulation to range from a community of 30,000 residents to a megacity of 20 million residents. Finally, the model takes the recent Shenzhen outbreak as a case study to validate the simulation platform. The simulation results are consistent with the reported data, which lays the foundation for supporting different simulation scenarios that answer what-if questions efficiently.
Fri, 19 Jul 2024 06:41:16 GMT

A Causal Inference Method Based on Front-Door Criterion and Difference in Differences for Analyzing Traffic Conditions
//www.snoollab.com:80/handle/2SGJ60CL/789188
Authors: Yixuan Ding; Qianqian Wang; Zimo Qi; Liuxin Zhu; Youxin Zhu; Lizhuo Luo; Lili Yang
Abstract: During the pandemic, various regions implemented distinct control measures, leading to subtle changes in road traffic patterns. These fluctuations were further influenced by external factors such as public risk perception and weather conditions. Although prior research has employed observational methods to assess traffic states and applied causal inference techniques to passive data, the fusion of these approaches in the field of transportation remains relatively unexplored. This study addresses this research gap by investigating the causal impact of pandemic containment policies on urban traffic in multiple cities across China. Our analytical framework combines the Additive Noise Model (ANM) algorithm with rigorous statistical tests designed to capture causal relationships.
Specifically, we employ the front-door criterion and a difference-in-differences approach to validate these causal connections. Refutation tests and placebo tests are conducted. Ultimately, our methodology yields a generalized linear model optimized for predicting traffic patterns under varying intervention scenarios. Our research thus paves the way for incorporating causal inference methodologies into the formulation of resilient transportation strategies amid public health crises.
Fri, 19 Jul 2024 06:41:08 GMT

Road Traffic Accident Severity Prediction under Unbalanced Data
//www.snoollab.com:80/handle/2SGJ60CL/789187
Authors: Jiaxin Lu; Zhejun Huang; Lili Yang
Abstract: In recent years, road traffic safety has been a major concern. Research on road traffic accidents is crucial to address people's worries about their daily commutes. This study focuses on California, using U.S. traffic accident data from February 2016 to the end of 2021. The research employs two data equalization methods: single sample and mixed sample. Traffic accident severity prediction models are established based on the XGBoost algorithm. These models are compared using accuracy and ROC curves. The model with the "random undersampling + SMOTENC oversampling" sequence is the most effective. The SHAP interpretability method is used to analyze the impact of accident factors on the model results. Consequently, specific strategies for preventing road traffic accidents are proposed.
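The "undersample first, then oversample" sequence named in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: plain random duplication stands in here for SMOTENC (which instead synthesizes new minority rows from nearest neighbours and handles categorical features), and the function name, the `target` parameter, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def undersample_then_oversample(rows, labels, target, seed=0):
    """Resample every class to exactly `target` rows.

    Classes with at least `target` rows are randomly undersampled
    (without replacement); smaller classes keep all their rows and are
    topped up by random duplication (with replacement). The duplication
    step is a simple stand-in for SMOTENC, not SMOTENC itself.
    """
    rng = random.Random(seed)

    # Group rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target:
            chosen = rng.sample(members, target)          # undersample
        else:
            extra = rng.choices(members, k=target - len(members))
            chosen = members + extra                      # oversample
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy data (made up): 90 minor accidents, 10 severe ones.
rows = [(i,) for i in range(100)]
labels = ["minor"] * 90 + ["severe"] * 10

new_rows, new_labels = undersample_then_oversample(rows, labels, target=30)
class_counts = Counter(new_labels)
```

Resampling of this kind would normally be applied to the training split only, so that the evaluation data keeps the original class ratio.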
Fri, 19 Jul 2024 06:41:03 GMT

Non-stationary temporal-spatio correlation analysis of information-driven complex financial dynamics
//www.snoollab.com:80/handle/2SGJ60CL/788751
Authors: Zhang, Jiu; Zheng, Bo; Jin, Lifu; Li, Yan; Jiang, Xiongfei
Abstract: The explosion of information production provides a new perspective on investigating complex dynamic systems such as financial markets. Using large-scale historical data, a text-based sentiment index is introduced to study the correlations between external information and price returns in the stock market. In particular, a novel approach that takes into account the non-stationary effect of the sentiment is proposed to compute the sentiment-return correlation function, and it reveals a non-zero correlation between past sentiment and the future motion of the price return. This computation is then extended to a cross-correlation form that describes the correlations between different sentiment indexes and price returns. A stratified structure of the cross-correlation functions is observed. Using random matrix theory, the features of the stratified structure are quantitatively analyzed. Finally, an investment strategy is constructed based on the temporal correlation.
Fri, 19 Jul 2024 04:09:11 GMT

A Scale-Invariant Relaxation in Low-Rank Tensor Recovery with an Application to Tensor Completion
//www.snoollab.com:80/handle/2SGJ60CL/788712
Authors: Zheng, Huiwen; Lou, Yifei; Tian, Guoliang; Wang, Chao
Abstract: In this paper, we consider a low-rank tensor recovery problem.
Based on the tensor singular value decomposition (t-SVD), we propose the ratio of the tensor nuclear norm and the tensor Frobenius norm (TNF) as a novel nonconvex surrogate for the tensor's tubal rank. The rationale of the proposed model for enforcing a low-rank structure is analyzed, along with its theoretical properties. Specifically, we introduce a null space property (NSP) type condition, under which a low-rank tensor is a local minimum for the proposed TNF recovery model. Numerically, we consider a low-rank tensor completion problem as a specific application of tensor recovery and employ the alternating direction method of multipliers (ADMM) to secure a model solution with guaranteed subsequential convergence under mild conditions. Extensive experiments demonstrate the superiority of our proposed model over state-of-the-art methods.
Fri, 19 Jul 2024 04:03:39 GMT

Association of daytime napping with incidence of chronic kidney disease and end-stage kidney disease: A prospective observational study
//www.snoollab.com:80/handle/2SGJ60CL/788681
Authors: Li, Qinjun; Shan, Ying; Liao, Jingchi; Wang, Ling; Wei, Yanling; Dai, Liang; Kan, Sen; Shi, Jianqing; Huang, Xiaoyan; Lu, Guoyuan
Abstract: Background and aims: Few studies have examined the relationship between daytime napping and risk of kidney diseases. We aimed to investigate the association of daytime napping with the incidence of chronic kidney disease (CKD) and end-stage kidney disease (ESKD). We also examined whether sleep duration modified the association of napping with CKD or ESKD. Methods: We recruited 460,571 European middle- to older-aged adults without prior CKD or ESKD between March 13, 2006, and October 1, 2010, in the UK Biobank. Sleep behavior data were obtained through questionnaires administered during recruitment.
The relationship between napping and the occurrence of CKD and ESKD was analyzed using Cox proportional hazards regression models. The modifying role of sleep duration on the effect of napping on CKD and ESKD was also examined. Results: After a mean follow-up of 11.1 (standard deviation 2.2) years, we observed 28,330 incident CKD cases and 927 ESKD cases. Daytime napping was associated with incident CKD (P for trend = .004). After full adjustment, compared with participants who did not nap, those in the "sometimes" and "usually" napping groups had a higher risk of CKD. Nevertheless, the available evidence did not support a link between daytime napping and ESKD (P for trend = .06). There was also insufficient evidence that sleep duration modified the association of daytime napping with incident CKD or ESKD. Conclusion: Daytime napping was associated with an increased risk of CKD. However, there was no conclusive evidence of a connection between daytime napping and ESKD.
Fri, 19 Jul 2024 03:59:51 GMT

Distributional Approximation for General Curie-Weiss Models with Size-dependent Inverse Temperatures
//www.snoollab.com:80/handle/2SGJ60CL/788574
Authors: Shao, Qi-Man; Zhang, Mengchen; Zhang, Zhuo-Song
Abstract: The Curie-Weiss model is a statistical physics model that describes the behavior of a system of particles with mutual interactions. In this paper, we apply Stein's method to establish Berry-Esseen bounds for both normal and non-normal approximations of a broad class of Curie-Weiss models, incorporating a size-dependent inverse temperature. Our result encompasses the Blume-Emery-Griffiths model as a particular instance, while surpassing the convergence rate of earlier findings by Eichelsbacher and Martschink (2014).
By using Stein's method, we provide a comprehensive analysis of the Curie-Weiss model, offering improved bounds on the rate of convergence.
Fri, 19 Jul 2024 03:45:40 GMT

Flexible, efficient, and accurate tests for epidemics
//www.snoollab.com:80/handle/2SGJ60CL/788429
Authors: Fang, Linjiajie; Jing, Bing-Yi; Ling, Shen; Wang, Qiyue; Yang, Qing
Abstract: Group testing involves discovering a small subset of distinguished subjects from a large population while efficiently reducing the total number of tests. It has been widely used in industrial testing, information technology, and biology, especially epidemic screening. In reality, tests are noisy because of false outcomes. Some tests are accurate but time-consuming, while others are cheaper but less accurate. Exactly which test to use is constrained by various considerations, such as availability, cost, accuracy, and efficiency. In this paper, we propose flexible, efficient, and accurate tests (FEATs). FEATs are based on group testing with simple but careful designs that incorporate ideas such as close contact cliques and repeated tests. FEATs could dramatically improve the efficiency or accuracy of existing tests. For example, for accurate but slow tests, the FEAT can improve efficiency several times over without compromising accuracy. On the other hand, for fast but inaccurate tests, the FEAT can sharply reduce the false-negative rate (FNR) and significantly increase efficiency. Theoretical justifications are provided. We point out some scenarios where the FEAT can be effectively employed.
Fri, 19 Jul 2024 03:26:39 GMT
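To make the group-testing idea in the last abstract concrete, here is a minimal sketch of classic two-stage (Dorfman) pooling, together with the effect of repeating a noisy test. It is not the FEAT design itself, which the abstract does not specify in detail. The function names are hypothetical, the pooled test is assumed to be perfect, and the repeat-test formula assumes independent repeats and counts a subject as positive if any repeat is positive.

```python
def expected_tests_per_person(prevalence, pool_size):
    """Expected tests per person under two-stage Dorfman pooling.

    One pooled test is shared by `pool_size` people; if the pool is
    positive, everyone in it is retested individually. Assumes a perfect
    test and independent infections with probability `prevalence`.
    """
    if pool_size == 1:
        return 1.0  # no pooling: one test each
    p_pool_negative = (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def dorfman_screen(statuses, pool_size):
    """Screen a list of True/False infection statuses with a perfect test.

    Returns (indices of positives found, total number of tests used).
    """
    positives, n_tests = [], 0
    for start in range(0, len(statuses), pool_size):
        pool = statuses[start:start + pool_size]
        n_tests += 1                      # stage 1: one pooled test
        if any(pool):
            n_tests += len(pool)          # stage 2: retest each member
            positives.extend(start + i for i, s in enumerate(pool) if s)
    return positives, n_tests


def fnr_after_repeats(fnr, repeats):
    """False-negative rate when a subject is called positive if any of
    `repeats` independent tests is positive (false positives ignored)."""
    return fnr ** repeats


# Toy population (made up): 20 people, one infected (index 10).
statuses = [False] * 10 + [True] + [False] * 9
found, used = dorfman_screen(statuses, pool_size=10)
```

In this toy run the first pool is cleared with a single test and only the second pool is retested, so 12 tests replace the 20 that individual testing would need. The saving shrinks as prevalence rises, which is why pool size has to be matched to the setting.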