HTML Parser V2 Ontology? #3805
martinmoldrup asked this question in Q&A
Hi,
I would prefer to use the v2 HTML parser in partition_html(), but I get this error message:
No <body class='Document'> or <div class='Page'> element found in the HTML.
I have not been able to find any documentation on how the v2 parser is supposed to be used, or an ontology schema for it. Could you share any details on the plans for this feature and how to use it?
I want to use it because I do not like the way the v1 parser promotes short paragraphs to titles. I would also like to see the original HTML tag of each element in the parsed output. This was possible in older versions of unstructured, via the metadata.
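To make the error concrete, here is a minimal sketch of the check the message seems to describe, written with only the Python standard library. It does not call unstructured at all; the helper name `has_v2_root` and the matching rule (a `body` tag with class `Document`, or a `div` tag with class `Page`) are assumptions inferred from the error text, not the library's actual implementation.

```python
from html.parser import HTMLParser


class _V2RootFinder(HTMLParser):
    """Records whether a <body class="Document"> or <div class="Page"> start tag appears."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        classes = (dict(attrs).get("class") or "").split()
        if (tag == "body" and "Document" in classes) or (
            tag == "div" and "Page" in classes
        ):
            self.found = True


def has_v2_root(html: str) -> bool:
    """Hypothetical pre-check: True if the HTML has a root the error message asks for."""
    finder = _V2RootFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Ordinary HTML, like the input that triggers the error.
plain = "<html><body><p>Hello</p></body></html>"
# HTML carrying the class names mentioned in the error message.
wrapped = '<body class="Document"><div class="Page"><p>Hello</p></div></body>'

print(has_v2_root(plain))
print(has_v2_root(wrapped))
```

This only shows why ordinary HTML is rejected. Whether adding these class names to existing HTML yields useful v2 output, and which other classes the ontology expects, is exactly what I am asking about.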