[Question]: Why is the entire parsed .md file put into a single chunk? #3992
Platform: Windows (WSL)
Could this be a carriage return / line feed (CRLF) delimiter issue?
It is supposed to work that way, but I am not sure why it fails to recognize the line breaks in the .md file. I am using ragflow/readme.md as the document.
Could you please check and report back which BOM (byte order mark), if any, the file has? This might be an encoding/decoding issue.
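One way to answer this is to read the raw bytes and report which BOM, if any, the file starts with, along with how many CRLF, LF-only, and CR-only line endings it contains. This is a minimal Python sketch; `inspect_text_file` is a hypothetical helper, not part of RAGFlow, and the line-ending counts assume an ASCII-compatible encoding such as UTF-8.

```python
import codecs

# Check longer BOMs first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def inspect_text_file(data: bytes) -> dict:
    """Report the BOM (if any) and the line-ending mix of raw file bytes.

    Hypothetical helper for diagnosis only. Line-ending counts assume an
    ASCII-compatible encoding such as UTF-8.
    """
    bom = None
    for marker, name in _BOMS:
        if data.startswith(marker):
            bom = name
            break
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # "\n" not preceded by "\r"
    cr_only = data.count(b"\r") - crlf  # "\r" not followed by "\n"
    return {"bom": bom, "crlf": crlf, "lf": lf_only, "cr": cr_only}
```

For example, pass it the bytes of `README.md` opened in binary mode (`"rb"`). A result with `"bom": None` and a non-zero `"crlf"` count would point to Windows line endings rather than a BOM.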
@Snify89 It's UTF-8.
Manually copying the content of the .md file into a .docx file works properly, but it would be best to be able to parse the .md file directly.
Maybe it's an HTML parsing issue? I haven't reproduced it yet, but it's definitely worth a look. Thanks for reporting.
What about turning down the
I tried reinstalling the latest nightly version, but I still get the same result.
Do you get more than one chunk if you use other chunk methods? Maybe the chunk method decided to use just one chunk? Edit: It's weird, though, that the .docx chunks better.
In QA mode the file is chunked normally, so there seems to be a bug in General mode's Markdown chunking.
Describe your problem
I am using the default settings. The resulting chunk from the .md file is clearly longer than the configured chunk length, which I suspect is because no delimiter is found. The delimiter setting includes \n, but the line breaks in the .md file are not recognized.
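To make the expected behavior concrete, here is a minimal sketch of delimiter-based chunking: split the text after any delimiter character, then greedily pack the pieces into chunks up to a token budget. It is an illustration under stated assumptions, not RAGFlow's implementation: the function names are hypothetical, tokens are approximated by whitespace-separated words, and line endings are normalized to \n before splitting. It also reproduces the reported symptom: when no delimiter ever matches, the whole document becomes a single piece and therefore a single oversized chunk.

```python
def count_tokens(text: str) -> int:
    # Assumption: approximate tokens by whitespace-separated words
    # (a real tokenizer would count differently).
    return len(text.split())


def split_by_delimiters(text: str, delimiters: str) -> list[str]:
    """Split text after every character found in `delimiters`, keeping it."""
    # Normalize CRLF and lone CR line endings to LF first,
    # so a "\n" delimiter also matches lines ending in "\r\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    pieces: list[str] = []
    current: list[str] = []
    for ch in text:
        current.append(ch)
        if ch in delimiters:
            pieces.append("".join(current))
            current = []
    if current:  # trailing text with no delimiter after it
        pieces.append("".join(current))
    return pieces


def merge_into_chunks(pieces: list[str], max_tokens: int) -> list[str]:
    """Greedily pack consecutive pieces into chunks of at most max_tokens.

    A single piece larger than max_tokens still becomes its own
    (oversized) chunk, which is what happens when no delimiter matches.
    """
    chunks: list[str] = []
    current, current_tokens = "", 0
    for piece in pieces:
        n = count_tokens(piece)
        if current and current_tokens + n > max_tokens:
            chunks.append(current)
            current, current_tokens = "", 0
        current += piece
        current_tokens += n
    if current:
        chunks.append(current)
    return chunks


def chunk_text(text: str, delimiters: str, max_tokens: int) -> list[str]:
    """Hypothetical end-to-end helper: split on delimiters, then merge."""
    return merge_into_chunks(split_by_delimiters(text, delimiters), max_tokens)
```

For example, `chunk_text("# Title\r\n\r\nfirst line here\r\nsecond line here\r\nthird line here\r\n", "\n", 6)` returns two chunks, while passing an empty delimiter string returns the whole text as one chunk. Packing greedily keeps chunks close to the budget without ever splitting inside a delimited piece.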