Fix W2V stream data newline error. #82

Merged on Jan 9, 2024 (1 commit)

@yupyub yupyub commented Jan 9, 2024

When using the W2V model, a memory error occurs if the input stream data contains empty lines (lines with no characters other than the newline).

Cause
While reading stream data, a line that contains only a newline character yields no data, yet the num_nnz variable is still incremented by 1:

data_size = len(data)  # 0 for an empty line
_vali_size = min(vali_n, len(data) - 1)  # min(vali_n, -1) == -1
num_nnz += (data_size - _vali_size)  # 0 - (-1) == +1

Later, num_nnz is used as total_lines in the _sort_and_compressed_binarization() function.
The values stored in the path file are passed to the records vector, and this vector is read based on total_lines.
If num_nnz is inflated by empty lines, total_lines exceeds the size of the records vector, so the read goes out of bounds.
Reading those unexpected values then causes a segmentation fault or other program malfunction.
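
The miscount can be reproduced outside the library with a minimal sketch. The helper name `count_nnz_as_before` and the tokenization via `line.strip().split()` are assumptions made for illustration; only the three counting lines come from the snippet quoted above.

```python
def count_nnz_as_before(lines, vali_n):
    """Hypothetical helper that mimics the counting logic quoted above."""
    num_nnz = 0
    for line in lines:
        data = line.strip().split()  # assumed tokenization; [] for an empty line
        data_size = len(data)
        _vali_size = min(vali_n, len(data) - 1)  # -1 when data is empty
        num_nnz += (data_size - _vali_size)
    return num_nnz


lines = ["a b c\n", "\n", "d e\n"]
actual_tokens = sum(len(line.split()) for line in lines)

print(actual_tokens)                   # 5
print(count_nnz_as_before(lines, 0))   # 6: the empty line added a phantom entry
```

With `vali_n = 0`, every token should be counted exactly once, so the extra 1 is precisely the out-of-bounds read described above.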

Changes
Empty input lines are now skipped with a continue statement. A typo found during debugging has also been fixed.
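
The shape of the fix can be sketched as follows. This is not the merged diff: the helper name `count_nnz_skipping_empty`, the `if not data` guard, and the `line.strip().split()` tokenization are assumptions chosen to illustrate skipping empty lines before the count is updated.

```python
def count_nnz_skipping_empty(lines, vali_n):
    """Hypothetical helper: same counting logic, but empty lines are skipped."""
    num_nnz = 0
    for line in lines:
        data = line.strip().split()  # assumed tokenization
        if not data:
            continue  # empty line: contributes no records, so count nothing
        data_size = len(data)
        _vali_size = min(vali_n, len(data) - 1)  # now always >= 0
        num_nnz += (data_size - _vali_size)
    return num_nnz


sample = ["a b c\n", "\n", "d e\n"]
print(count_nnz_skipping_empty(sample, 0))  # 5, matching the number of tokens
```

Because the guard runs before `_vali_size` is computed, `len(data) - 1` can no longer be negative, so the count stays in step with the records that are actually written.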

@ita9naiwa ita9naiwa self-requested a review January 9, 2024 11:46
@ita9naiwa ita9naiwa merged commit efd7d0c into kakao:dev Jan 9, 2024
3 checks passed