Describe
Model I am using (UniLM, MiniLM, LayoutLM ...): UniLM
Issue Description:
I have noticed an issue with the MWPBench test dataset downloaded from the UNILM repository. According to the paper, the test set should contain 18,408 entries, but the downloaded dataset contains only 17,470.
After investigating, I found that the missing entries all belong to the AGIEval-Math Competition subset. This gap could affect the evaluation of models benchmarked on this dataset.
Details:
- Expected number of entries (per the paper): 18,408
- Actual number of entries (downloaded): 17,470
- Missing entries: 938, all corresponding to the AGIEval-Math Competition subset (see the counting sketch below)
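For reference, here is a minimal sketch of how I reproduced the counts, assuming the test split is a JSON Lines file with one record per line and that each record carries a per-source label. The path `MWPBench/data/full_test.json` and the field name `data_topic` are assumptions on my part; adapt them to the actual layout of the release.

```python
import json
from collections import Counter

# Assumed location of the downloaded test split; adjust as needed.
TEST_FILE = "MWPBench/data/full_test.json"

counts = Counter()
total = 0
with open(TEST_FILE, "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # "data_topic" is an assumed key for the per-source label;
        # substitute whatever key the release actually uses.
        counts[record.get("data_topic", "unknown")] += 1
        total += 1

print(f"total entries: {total}")  # paper reports 18,408; download yields 17,470
for topic, n in sorted(counts.items()):
    print(f"  {topic}: {n}")
```

Running a count like this against the current download is how I confirmed that the AGIEval-Math Competition subset is absent from the per-source breakdown.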
Impact:
This discrepancy may lead to inaccurate evaluation results for models tested on this dataset, since scores on the incomplete test set will not be comparable to the numbers reported in the paper.
Request:
Could you please look into this issue and provide an updated dataset that includes all the entries reported in the paper? This would ensure fair and accurate evaluation of models.
Thank you for your attention to this matter.