Multi group merging algorithm for solving data Shuffle and data skew of securities companies

Bibliographic Details
Main Authors: CAO Yakun, TANG Xiaoyong
Format: Article
Language: Chinese
Published: China InfoCom Media Group 2025-01-01
Series: 大数据
Online Access: http://www.j-bigdataresearch.com.cn/zh/article/111999042/
Description
Summary: In the securities industry, the processing and analysis of user data are critical technologies that significantly affect business decision-making and risk control. However, the vast scale and complexity of user data at securities companies led to heavy Shuffle operations and data skew in big data computations. Existing optimization methods either relied on hardware upgrades or were limited by domain-specific constraints, and failed to address the problem effectively. To resolve this, a multi-group merging algorithm (MGMA) based on user relationships was proposed, which improved computational efficiency and reduced resource consumption through effective grouping and optimization strategies. Experimental results showed that, compared to the non-optimized (NO) control group, MGMA achieved a data skew rate of 20%, memory usage of 72%, and computation time of 61%. On all three indicators, MGMA outperformed the other four optimization methods used for comparison.
ISSN: 2096-0271