Projects

Track 1: LLM

Member of a startup focused on AI-driven trading solutions

1, 2025

NL2SQL-driven QA System for Robo-advisor, Zhipu Finance LLM Challenge

1, 2025

Development of Financial Event (News) Analysis and Intelligent Dialogue System Based on Large Language Models

In Collaboration with Shenzhen Investoday Data Technology Co., Ltd.

12, 2024

Track 2: Quant and Data Science

Research Assistant in Finance (Asset Pricing) @ Hunan University, College of Finance and Statistics

Working Paper: Reevaluating Stock Market Timing: Machine Learning Perspectives on Low-Frequency Characteristics, Lin Zhu, Guohao Tang, Yulong Zhu, and Fuwei Jiang
Currently under review at the Journal of Portfolio Management.

Advanced Decision Framework for Electronics Manufacturing

9, 2024

Employing a hybrid approach of statistical sampling, dynamic programming, and integer programming with penalty terms, this project addresses critical decision-making challenges in electronic product manufacturing. It focuses on optimizing component inspection, assembly quality control, and cost management to maximize profits amidst fluctuating defect rates. The methodological synergy ensures a balance between production efficiency and product reliability, providing a cutting-edge solution for quality assurance in electronics production.

⭐ This paper was written for the CUMCM (China Undergraduate Mathematical Contest in Modeling) and won the National Second Prize.

Under the Framework of Automated Alpha (Stock Selection Factor) Mining: A Comparative Study of Different Methods (Tree-based, Reinforcement Learning, Deep Learning) and Research on Overfitting and Generalization Capabilities

7, 2024 (In progress)

Under the framework of automated Alpha (stock selection factor) mining, this project conducts a comparative study of different methods, including tree-based models, reinforcement learning (RL), and deep learning, to explore their effectiveness in generating Alpha signals. Custom operators across hundreds of dimensions have been developed for both stock and cryptocurrency markets, constructing Alpha signals using Reverse Polish Notation (RPN). The reinforcement learning component, optimized with algorithms like Proximal Policy Optimization (PPO), has been fine-tuned at multiple stages to enhance the Alpha Mining framework. Current efforts focus on refining RL details, improving algorithm execution efficiency, and addressing challenges related to overfitting and generalization capabilities to ensure robust and scalable performance.
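As a rough illustration of how an Alpha expression written in Reverse Polish Notation can be evaluated, here is a minimal sketch. The operator names (`neg`, `delta1`, `add`, `sub`, `div`) and the feature names are hypothetical placeholders; the project's own operator library is not shown on this page.

```python
from typing import Callable, Dict, List, Sequence

Series = List[float]
NAN = float("nan")


def delta1(x: Series) -> Series:
    """One-step change x[t] - x[t-1]; the first entry has no history, so it is NaN."""
    return [NAN] + [x[t] - x[t - 1] for t in range(1, len(x))]


def safe_div(a: Series, b: Series) -> Series:
    """Element-wise a / b, returning NaN where the denominator is zero."""
    return [p / q if q != 0 else NAN for p, q in zip(a, b)]


# Hypothetical operator tables (assumption: not the project's real operator set).
UNARY: Dict[str, Callable[[Series], Series]] = {
    "neg": lambda a: [-p for p in a],
    "delta1": delta1,
}
BINARY: Dict[str, Callable[[Series, Series], Series]] = {
    "add": lambda a, b: [p + q for p, q in zip(a, b)],
    "sub": lambda a, b: [p - q for p, q in zip(a, b)],
    "div": safe_div,
}


def eval_rpn(tokens: Sequence[str], features: Dict[str, Series]) -> Series:
    """Evaluate an RPN token sequence against named feature series."""
    stack: List[Series] = []
    for tok in tokens:
        if tok in features:
            stack.append(list(features[tok]))
        elif tok in UNARY:
            if len(stack) < 1:
                raise ValueError(f"operator {tok!r} needs one operand")
            stack.append(UNARY[tok](stack.pop()))
        elif tok in BINARY:
            if len(stack) < 2:
                raise ValueError(f"operator {tok!r} needs two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[tok](left, right))
        else:
            raise ValueError(f"unknown token {tok!r}")
    if len(stack) != 1:
        raise ValueError("malformed expression: stack should hold exactly one result")
    return stack[0]


# Example: (close[t] - close[t-1]) / volume[t], written in RPN.
data = {"close": [10.0, 11.0, 12.0], "volume": [100.0, 110.0, 90.0]}
alpha = eval_rpn(["close", "delta1", "volume", "div"], data)
```

Because an RPN expression is a flat token sequence, a generator such as an RL policy can emit one token at a time, and validity can be checked by tracking the stack depth.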

⭐ This is an early-stage project that is still under exploration.

Replication of Barra CNE6

9, 2023

Implemented the Barra risk model for A-shares using optimization techniques (via CVXPY), with 19 primary factors, 31 industry factors, and 1 country factor. Performed portfolio risk evaluation and performance attribution; primary factor returns showed a t-value greater than 2 in approximately 60% of the periods.
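To make the "t-value greater than 2 in approximately 60% of the periods" statistic concrete, here is a simplified sketch. It is an assumption-laden stand-in: it runs a single-factor OLS regression with an intercept for one period, whereas the model described above is a multi-factor regression solved with CVXPY. The function names are illustrative only.

```python
import math
from typing import Sequence


def slope_t_stat(exposure: Sequence[float], returns: Sequence[float]) -> float:
    """t-statistic of the slope in returns = a + f * exposure + error (plain OLS).

    Assumes at least 3 stocks, non-constant exposures, and a non-perfect fit.
    """
    n = len(exposure)
    mean_x = sum(exposure) / n
    mean_y = sum(returns) / n
    sxx = sum((x - mean_x) ** 2 for x in exposure)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(exposure, returns))
    slope = sxy / sxx                      # estimated factor return
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(exposure, returns))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


def share_significant(t_stats: Sequence[float], threshold: float = 2.0) -> float:
    """Fraction of periods whose absolute t-statistic exceeds the threshold."""
    return sum(abs(t) > threshold for t in t_stats) / len(t_stats)


# Toy usage with made-up t-statistics for five periods.
print(share_significant([2.5, -3.0, 0.5, 1.0, 2.1]))
```

Whether the reported figure uses the absolute value of the t-statistic is not stated above; the sketch assumes it does.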

⭐ To be supplemented: Style factor return curve chart.

Industry Factors' Return

Fundamental Factors and Stock Returns - Based on Machine Learning Methods

@ Jing Xia, @ Xinfang Tian

12, 2022

This study uses a set of 207 fundamental and volume-price factors from the United States equity market, spanning January 1985 to October 2022. Using 10 machine learning techniques, including linear regression, penalized linear regression, tree-based methods, and neural networks, the research combines factor signals and constructs investment portfolios. The empirical results indicate that machine learning algorithms effectively capture the relationship between anomalies (factors) and portfolio returns. With a 1-year training window and monthly rebalancing, the resulting long-short portfolios generate average annualized returns ranging from 16.5% to 22.8%, with Sharpe ratios between 0.69 and 1.43. With a 3-month training window, annualized returns range from 11.9% to 19.6% and Sharpe ratios from 0.57 to 1.18; with a 24-month training window, annualized returns range from 4.96% to 21.2% and Sharpe ratios from 0.59 to 1.54.
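The portfolio statistics quoted above can be illustrated with a small sketch. It makes several assumptions that the summary does not confirm: the long and short legs are equal-weighted, each leg holds a fixed fraction of stocks ranked by the model's predicted signal, the annualized return is the arithmetic monthly mean times 12, and the Sharpe ratio ignores the risk-free rate.

```python
import math
import statistics
from typing import Sequence


def long_short_return(signal: Sequence[float], realized: Sequence[float],
                      fraction: float = 0.1) -> float:
    """One-period return of long top-ranked minus short bottom-ranked stocks."""
    order = sorted(range(len(signal)), key=lambda i: signal[i])
    k = max(1, int(len(signal) * fraction))     # stocks per leg
    short_leg = [realized[i] for i in order[:k]]
    long_leg = [realized[i] for i in order[-k:]]
    return statistics.mean(long_leg) - statistics.mean(short_leg)


def annualized_return(monthly: Sequence[float]) -> float:
    """Arithmetic annualization: mean monthly return times 12."""
    return statistics.mean(monthly) * 12


def annualized_sharpe(monthly: Sequence[float]) -> float:
    """Mean over sample standard deviation of monthly returns, scaled by sqrt(12)."""
    return statistics.mean(monthly) / statistics.stdev(monthly) * math.sqrt(12)


# Toy usage with made-up monthly long-short returns.
monthly = [0.01, 0.03, -0.02, 0.02]
print(annualized_return(monthly), annualized_sharpe(monthly))
```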

Returns curve of different portfolios

Machine Learning Defense Slides (with cover).pdf

Exploring the Impact of the U.S. Macroeconomy on the Chinese Stock Market - Based on Market and Individual Stock Dimensions

10, 2022

This study first uses Partial Least Squares (PLS) to explain volatility in the Chinese equity market, drawing on 120 macroeconomic indicators from the FRED-MD database. The empirical findings suggest a robust association with many factors, including the global economic environment, supply chain fluctuations, and the dynamics of the China-US trade relationship. The paper then assesses the systemic risk imparted by Chinese enterprises upon the US macroeconomy, using a combination of macroeconomic indices and individual stock return series. For dimensionality reduction, the study applies both Principal Component Analysis (PCA) and Sparse PCA. The analysis shows that Sparse PCA yields principal components with clearer economic interpretation than traditional PCA.

⭐ This work is still unfinished. I intend to continue with the idea presented in the second part of the paper and construct stock selection or risk factors to further explore the role of macroeconomic data.

Sparse PCA component weights

The Impact of R&D-to-Market Ratio on Factor Models: Evidence from China's A-Share Market

6, 2022

Based on data from the Chinese A-share market post-2006, this study investigates the impact of the R&D expenditure to market capitalization ratio (RDm) on factor models. Data processing and analysis were conducted using SAS software, and equally weighted and market capitalization-weighted investment portfolios were constructed to test the effectiveness of the RDm indicator across different industries and sample periods. The results show that the RDm indicator achieved an average return of 0.941% and a Sharpe ratio of 1.131 in equally weighted portfolios, both of which are significant at the 5% level. In market capitalization-weighted portfolios, the average return was 0.944%, with a Sharpe ratio of 0.901. Further analysis reveals that the RDm indicator exhibits significant excess returns in the CAPM model but fails to emerge as an independent new factor in the Fama-French three-factor and five-factor models. Additionally, industry-specific tests indicate that the RDm indicator performs better in the primary industry compared to the secondary and tertiary industries. Across different sample periods, the returns from 2010-2020 were significantly higher than those from 2015-2020. The findings provide investors with a reference for R&D expenditure-based investment strategies and highlight the limitations of the RDm indicator in factor models.
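The two weighting schemes compared above differ only in how individual stock returns are averaged. A minimal sketch follows; the function names and numbers are illustrative, and the original analysis was done in SAS, not Python.

```python
from typing import Sequence


def equal_weighted_return(returns: Sequence[float]) -> float:
    """Every stock in the portfolio gets the same weight."""
    return sum(returns) / len(returns)


def value_weighted_return(returns: Sequence[float],
                          market_caps: Sequence[float]) -> float:
    """Each stock is weighted by its share of total market capitalization."""
    total_cap = sum(market_caps)
    return sum(r * cap for r, cap in zip(returns, market_caps)) / total_cap


# Toy usage: three stocks with made-up returns and market caps.
rets = [0.02, -0.01, 0.04]
caps = [100.0, 300.0, 600.0]
print(equal_weighted_return(rets), value_weighted_return(rets, caps))
```

Value weighting lets large-cap stocks dominate, which is one reason the two schemes can give different Sharpe ratios for the same sorting variable.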

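Integration of Agriculture and Tourism, Supported by Green Finance, in Building Beautiful Villages: A Survey of Zhelin Town, Yongxiu County, Jiangxi Province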

@Ke Yi, @Yinting Chen, @Yiwen Wang, @Haoyuan Wei, @Jing Xia

9, 2021

This research report was produced during our "Sanxiaxiang" (三下乡, "going to the countryside") summer social practice in Jiangxi Province, in the summer vacation of our freshman year. The experience was truly unforgettable!

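Attachment: Integration of Agriculture and Tourism, Supported by Green Finance, in Building Beautiful Villages: A Survey of Zhelin Town, Yongxiu County, Jiangxi Province.pdf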

Mosaic tile color selection model based on clustering algorithm

6, 2021

Mosaic tiles readily accommodate various texts or patterns through assembly; however, their limited color palette, due to technological and cost constraints, necessitates a preliminary selection of tiles with hues closest to the original image's colors for successful composition. Drawing inspiration from the K-means clustering algorithm, this paper streamlines and refines it, leveraging Monte Carlo methods and multi-objective programming principles. By employing particle swarm optimization, we derive an algorithm suited to ascertain the correlation between original colors and tile hues.

We simplify and enhance the K-means clustering algorithm by skipping the choice of initial centroids and clustering directly, because the points are already known. We use the Davies-Bouldin index (DBI) to select the best distance metric, which is Euclidean distance, and thereby obtain the optimal mapping from original colors to tile colors.
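One plausible reading of this step is a single K-means assignment pass with fixed centers: each original color is mapped to the tile color nearest to it in RGB space under Euclidean distance. The sketch below follows that reading; the palette values are made up, and the paper's actual procedure may differ.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def nearest_tile(color: RGB, palette: Sequence[RGB]) -> int:
    """Index of the palette color closest to `color` in Euclidean RGB distance."""
    return min(range(len(palette)), key=lambda k: math.dist(color, palette[k]))


def map_image(pixels: Sequence[RGB], palette: Sequence[RGB]) -> List[int]:
    """Assign every pixel to its nearest tile color (one assignment pass, no centroid updates)."""
    return [nearest_tile(p, palette) for p in pixels]


# Toy usage with a hypothetical three-tile palette: black, white, red.
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
print(map_image([(200, 10, 10), (30, 30, 30), (250, 240, 245)], palette))
```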

Presuming the uniformity of point distribution as a measure of the mosaic's visual fidelity, we introduce the concept of coverage rate (𝜔) to gauge the evenness of the distribution of corresponding points in RGB color space, calculating its value via Monte Carlo methods.
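The exact definition of the coverage rate is not given here, so the sketch below assumes one: the fraction of uniformly sampled points in the RGB cube that lie within a chosen radius of at least one tile color. The `radius` parameter and the sampling scheme are assumptions for illustration, not the paper's definition.

```python
import math
import random
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def coverage_rate(palette: Sequence[RGB], radius: float,
                  n_samples: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the share of the RGB cube within `radius` of a tile color."""
    rng = random.Random(seed)              # fixed seed for reproducible estimates
    hits = 0
    for _ in range(n_samples):
        point = (rng.uniform(0, 255), rng.uniform(0, 255), rng.uniform(0, 255))
        if any(math.dist(point, tile) <= radius for tile in palette):
            hits += 1
    return hits / n_samples


# Toy usage with a hypothetical two-tile palette.
print(coverage_rate([(0, 0, 0), (255, 255, 255)], radius=120.0))
```

The standard error of such an estimate shrinks roughly with the square root of `n_samples`, so more samples give a steadier value at higher cost.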

Subsequently, we structure a multi-objective function M = 𝛼𝑒 − 𝛽𝑖, accounting for varying corporate priorities on cost (𝑖) and visual effect (𝑒), modifying the weights (𝛼, 𝛽) accordingly, and determining additional necessary color types and their RGB values for diverse scenarios.
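A small sketch shows how the weighted objective M = αe − βi can rank candidate scenarios. The option names and their (visual effect, cost) scores are made-up values; how the paper measures e and i is not specified here.

```python
from typing import Dict, Tuple


def objective(effect: float, cost: float, alpha: float, beta: float) -> float:
    """M = alpha * e - beta * i: reward visual effect, penalize cost."""
    return alpha * effect - beta * cost


def best_option(options: Dict[str, Tuple[float, float]],
                alpha: float, beta: float) -> str:
    """Name of the option with the largest M; each option is an (effect, cost) pair."""
    return max(options, key=lambda name: objective(*options[name], alpha, beta))


# Hypothetical scenarios: adding more tile colors improves the effect but costs more.
options = {
    "add_0_colors": (0.60, 0.0),
    "add_5_colors": (0.75, 0.2),
    "add_10_colors": (0.82, 0.5),
}
print(best_option(options, alpha=1.0, beta=0.1))   # low weight on cost
print(best_option(options, alpha=1.0, beta=1.0))   # high weight on cost
```

Raising β relative to α shifts the choice toward cheaper palettes, which is how different corporate priorities lead to different recommended color sets.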

By modifying the classic steps of the K-means clustering algorithm, using the DBI metric to measure clustering quality, and introducing "coverage rate" as a standard for mosaic expressiveness, we use particle swarm optimization (PSO) to find the optimal coverage rate.
