Update XSLTs for new Presentation XML format #418

Open
11 of 12 tasks
Intelligent2013 opened this issue Nov 3, 2024 · 19 comments

@Intelligent2013 Intelligent2013 added the enhancement New feature or request label Nov 3, 2024
@Intelligent2013 Intelligent2013 self-assigned this Nov 3, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Metanorma Nov 3, 2024
@opoudjis

With the new Presentation XML refactor coming out of metanorma/isodoc#617, metanorma-iso now breaks on trying to test for STS conversion. Should I remove that integration testing, or keep it?

@Intelligent2013
Contributor Author

With the new Presentation XML refactor coming out of metanorma/isodoc#617, metanorma-iso now breaks on trying to test for STS conversion. Should I remove that integration testing, or keep it?

Please keep it. I'm going to update mnconvert so that, at a minimum, the test passes. If the integration testing is blocking the release, then remove it temporarily.

@Intelligent2013
Contributor Author

Currently, mnconvert can convert ISO/NISO STS XML to Metanorma XML.
Metanorma XML is constantly changing, and every change should be reflected in the STS XML to Metanorma XML conversion rules in mnconvert. Errors and discrepancies between 'true' Metanorma XML and the converted XML may occur, because the conversion is not just a 'tag-to-tag' mapping; it often involves complex logic. Constantly updating the conversion isn't a complex task, but it takes time.

Does generating MN XML directly from STS XML make sense? In which scenarios can it be used?

Metanorma XML can be generated in this workflow:

  • ISO/NISO STS XML to Metanorma Adoc
  • generate Metanorma XML from Adoc

Could we drop the ISO/NISO STS to Metanorma XML conversion in mnconvert?

@ronaldtse @opoudjis any thoughts on this?

@Intelligent2013
Contributor Author

With the new Presentation XML refactor coming out of metanorma/isodoc#617, metanorma-iso now breaks on trying to test for STS conversion. Should I remove that integration testing, or keep it?

@opoudjis mnconvert updated (https://github.com/metanorma/mnconvert/releases/tag/v1.65.0).

@opoudjis

opoudjis commented Dec 4, 2024

mnconvert is updated to 1.65.0 on my side, but STS generation is still crashing when running rspec in metanorma-iso:

  1) Metanorma::Iso::Processor generates STS from Metanorma XML
     Failure/Error:
       MnConvert.convert(in_fname,
                         { input_format: MnConvert::InputFormat::MN,
                           output_file: out_fname || "#{filename}.#{@suffix}" })
     
     RuntimeError:
       Input: XML (test.xml)
       Output: XML (test.sts.xml), format NISO STS
     
       Transforming...
       Validate XML against XSD NISO-STS-interchange-1-MathML3-XSD/NISO-STS-interchange-1-mathml3.xsd...
       /Users/nickn/Documents/Arbeit/upwork/ribose/metanorma-iso/test.sts.xml is NOT valid reason:
       [ERROR] org.xml.sax.SAXParseException; lineNumber: 51; columnNumber: 22; cvc-complex-type.2.4.a: Invalid content was found starting with element 'fmt-title'. One of '{title, address, alternatives, array, boxed-text, chem-struct-wrap, code, fig, fig-group, graphic, media, non-normative-note, normative-note, non-normative-example, normative-example, notes-group, preformat, supplementary-material, table-wrap, table-wrap-group, disp-formula, disp-formula-group, editing-instruction, def-list, list, tex-math, "http://www.w3.org/1998/Math/MathML":math, p, related-article, related-object, disp-quote, speech, statement, verse-group, fn-group, glossary, ref-list, sec, term-sec}' is expected.
       [ERROR] org.xml.sax.SAXParseException; lineNumber: 56; columnNumber: 11; cvc-complex-type.2.4.a: Invalid content was found starting with element 'fmt-name'. One of '{"urn:iso:std:iso:30042:ed-1":crossReference, "urn:iso:std:iso:30042:ed-1":definition, "urn:iso:std:iso:30042:ed-1":example, "urn:iso:std:iso:30042:ed-1":externalCrossReference, "urn:iso:std:iso:30042:ed-1":note, "urn:iso:std:iso:30042:ed-1":see, "urn:iso:std:iso:30042:ed-1":source, "urn:iso:std:iso:30042:ed-1":subjectField, "urn:iso:std:iso:30042:ed-1":xGraphic, "urn:iso:std:iso:30042:ed-1":xMathML, "urn:iso:std:iso:30042:ed-1":xSource, "urn:iso:std:iso:30042:ed-1":tig}' is expected.
     # ./lib/isodoc/iso/sts_convert.rb:27:in 'IsoDoc::Iso::StsConvert#convert'
     # ./lib/metanorma/iso/processor.rb:62:in 'Metanorma::Iso::Processor#output'
     # ./spec/metanorma/processor_spec.rb:166:in 'block (2 levels) in <top (required)>'

@Intelligent2013
Contributor Author

Confirmed. I'll fix.

Intelligent2013 added a commit that referenced this issue Dec 4, 2024
mn2xml xslt updated to fix validation issue, #418
@Intelligent2013
Contributor Author

@opoudjis

Currently, mnconvert can convert ISO/NISO STS XML to Metanorma XML. Metanorma XML is constantly changing, and every change should be reflected in the STS XML to Metanorma XML conversion rules in mnconvert. Errors and discrepancies between 'true' Metanorma XML and the converted XML may occur, because the conversion is not just a 'tag-to-tag' mapping; it often involves complex logic. Constantly updating the conversion isn't a complex task, but it takes time.

Does generating MN XML directly from STS XML make sense? In which scenarios can it be used?

Metanorma XML can be generated in this workflow:

  • ISO/NISO STS XML to Metanorma Adoc
  • generate Metanorma XML from Adoc

Could we drop the ISO/NISO STS to Metanorma XML conversion in mnconvert?

@ronaldtse @opoudjis any thoughts on this?

I'm sorry to say I don't have the business context for STS XML > MN XML conversion, but I agree with you that using Asciidoc as an intermediary representation makes much more sense, and I've argued for it for years in comparable contexts, for the same reason you've given: the XML may be more expressive and more formal, but it is also volatile and will remain so. The Asciidoc is simply more stable.

@ronaldtse
Contributor

@Intelligent2013 yes indeed. We can drop the STS to MN XML conversion. It's more important that we focus on what users need, which, in this use case, is the ability to re-edit and re-publish STS XML.

@opoudjis

The STS apparently needs to be updated for Presentation XML for terms as well; the conversion is breaking in metanorma-iso:

       Transforming...
       Validate XML against XSD NISO-STS-interchange-1-MathML3-XSD/NISO-STS-interchange-1-mathml3.xsd...
       /Users/nickn/Documents/Arbeit/upwork/ribose/metanorma-iso/test.sts.xml is NOT valid reason:
       [ERROR] org.xml.sax.SAXParseException; lineNumber: 57; columnNumber: 16; cvc-complex-type.2.4.a: Invalid content was found starting with element 'fmt-preferred'. One of '{"urn:iso:std:iso:30042:ed-1":crossReference, "urn:iso:std:iso:30042:ed-1":definition, "urn:iso:std:iso:30042:ed-1":example, "urn:iso:std:iso:30042:ed-1":externalCrossReference, "urn:iso:std:iso:30042:ed-1":note, "urn:iso:std:iso:30042:ed-1":see, "urn:iso:std:iso:30042:ed-1":source, "urn:iso:std:iso:30042:ed-1":subjectField, "urn:iso:std:iso:30042:ed-1":xGraphic, "urn:iso:std:iso:30042:ed-1":xMathML, "urn:iso:std:iso:30042:ed-1":xSource, "urn:iso:std:iso:30042:ed-1":tig}' is expected.

in spec/metanorma/processor_spec.rb, line 161

@Intelligent2013
Contributor Author

A week ago I tested on the ISO Rice document and the STS output XML was OK.
OK, I'll check the XML from the gem.

Intelligent2013 added a commit that referenced this issue Jan 21, 2025
mn2xml.xsl updated for new term tags, #418
@Intelligent2013
Contributor Author

The STS apparently needs to be updated for Presentation XML for terms as well; the conversion is breaking in metanorma-iso

Issue fixed in https://github.com/metanorma/mnconvert/releases/tag/v1.67.0.

@Intelligent2013
Contributor Author

Source XML - new Presentation XML format without Semantic part
Output XML - NISO STS XML.

Validation issues:

[ERROR] org.xml.sax.SAXParseException; lineNumber: 48; columnNumber: 25; cvc-complex-type.2.4.a: Invalid content was found starting with element 'source-highlighter-css'. One of '{custom-meta-group}' is expected.
[ERROR] org.xml.sax.SAXParseException; lineNumber: 50; columnNumber: 12; cvc-complex-type.2.3: Element 'nat-meta' cannot have character [children], because the type's content type is element-only.
[ERROR] org.xml.sax.SAXParseException; lineNumber: 68; columnNumber: 7; cvc-complex-type.2.3: Element 'sec' cannot have character [children], because the type's content type is element-only.
[ERROR] org.xml.sax.SAXParseException; lineNumber: 69; columnNumber: 4; cvc-complex-type.2.4.a: Invalid content was found starting with element 'p'. One of '{sec, term-sec, fn-group, glossary, ref-list, non-normative-note, normative-note, non-normative-example, normative-example, notes-group}' is expected.
[ERROR] org.xml.sax.SAXParseException; lineNumber: 885; columnNumber: 41; cvc-complex-type.2.4.a: Invalid content was found starting with element 'p'. One of '{sec, term-sec, sub-part, "http://www.w3.org/2001/XInclude":include, sig-block}' is expected.
[ERROR] org.xml.sax.SAXParseException; lineNumber: 885; columnNumber: 41; cvc-complex-type.3.2.2: Attribute 'class' is not allowed to appear in element 'p'.
[ERROR] org.xml.sax.SAXParseException; lineNumber: 885; columnNumber: 41; cvc-complex-type.3.2.2: Attribute 'displayorder' is not allowed to appear in element 'p'.
[ERROR] org.xml.sax.SAXParseException; lineNumber: 888; columnNumber: 41; cvc-complex-type.3.2.2: Attribute 'class' is not allowed to appear in element 'p'.
[ERROR] org.xml.sax.SAXParseException; lineNumber: 888; columnNumber: 41; cvc-complex-type.3.2.2: Attribute 'displayorder' is not allowed to appear in element 'p'.
[ERROR] org.xml.sax.SAXParseException; lineNumber: 2830; columnNumber: 22; cvc-complex-type.3.2.2: Attribute 'autonum' is not allowed to appear in element 'table'.
[ERROR] org.xml.sax.SAXParseException; lineNumber: 3573; columnNumber: 109; cvc-complex-type.2.4.a: Invalid content was found starting with element 'biblio-tag'. One of '{break, bold, fixed-case, italic, monospace, num, overline, roman, sans-serif, sc, strike, underline, ruby, alternatives, inline-graphic, private-char, chem-struct, inline-formula, label, abbrev, index-term, index-term-range-end, milestone-end, milestone-start, named-content, styled-content, annotation, article-title, chapter-title, collab, collab-alternatives, comment, conf-acronym, conf-date, conf-loc, conf-name, conf-sponsor, data-title, date, date-in-citation, day, edition, email, elocation-id, etal, ext-link, fpage, gov, institution, institution-wrap, isbn, issn, issn-l, issue, issue-id, issue-part, issue-title, lpage, month, name, name-alternatives, object-id, page-range, part-title, patent, person-group, pub-id, publisher-loc, publisher-name, role, season, series, size, source, std, string-name, supplement, trans-source, trans-title, uri, version, volume, volume-id, volume-series, year, fn, target, "urn:iso:std:iso:30042:ed-1":entailedTerm, xref, sub, sup}' is expected.
[ERROR] org.xml.sax.SAXParseException; lineNumber: 3577; columnNumber: 130; cvc-complex-type.2.4.a: Invalid content was found starting with element 'biblio-tag'. One of '{break, bold, fixed-case, italic, monospace, num, overline, roman, sans-serif, sc, strike, underline, ruby, alternatives, inline-graphic, private-char, chem-struct, inline-formula, label, abbrev, index-term, index-term-range-end, milestone-end, milestone-start, named-content, styled-content, annotation, article-title, chapter-title, collab, collab-alternatives, comment, conf-acronym, conf-date, conf-loc, conf-name, conf-sponsor, data-title, date, date-in-citation, day, edition, email, elocation-id, etal, ext-link, fpage, gov, institution, institution-wrap, isbn, issn, issn-l, issue, issue-id, issue-part, issue-title, lpage, month, name, name-alternatives, object-id, page-range, part-title, patent, person-group, pub-id, publisher-loc, publisher-name, role, season, series, size, source, std, string-name, supplement, trans-source, trans-title, uri, version, volume, volume-id, volume-series, year, fn, target, "urn:iso:std:iso:30042:ed-1":entailedTerm, xref, sub, sup}' is expected.
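A long validator log like the one above is easier to act on when grouped by the offending element or attribute. The following is only a minimal Ruby sketch, not part of mnconvert: the helper name `summarize_validation_errors` and the three pattern names are hypothetical, and the regular expressions cover only the three message shapes that appear in this log.

```ruby
# Sketch only: group XSD validation errors by offending element/attribute.
# `summarize_validation_errors` and the pattern names are hypothetical.

# A few short lines copied from the log above.
SAMPLE_LOG = <<~'LOG'
  [ERROR] org.xml.sax.SAXParseException; lineNumber: 48; columnNumber: 25; cvc-complex-type.2.4.a: Invalid content was found starting with element 'source-highlighter-css'. One of '{custom-meta-group}' is expected.
  [ERROR] org.xml.sax.SAXParseException; lineNumber: 50; columnNumber: 12; cvc-complex-type.2.3: Element 'nat-meta' cannot have character [children], because the type's content type is element-only.
  [ERROR] org.xml.sax.SAXParseException; lineNumber: 885; columnNumber: 41; cvc-complex-type.3.2.2: Attribute 'class' is not allowed to appear in element 'p'.
  [ERROR] org.xml.sax.SAXParseException; lineNumber: 885; columnNumber: 41; cvc-complex-type.3.2.2: Attribute 'displayorder' is not allowed to appear in element 'p'.
  [ERROR] org.xml.sax.SAXParseException; lineNumber: 888; columnNumber: 41; cvc-complex-type.3.2.2: Attribute 'class' is not allowed to appear in element 'p'.
  [ERROR] org.xml.sax.SAXParseException; lineNumber: 2830; columnNumber: 22; cvc-complex-type.3.2.2: Attribute 'autonum' is not allowed to appear in element 'table'.
LOG

# The three message shapes seen in the log.
PATTERNS = {
  unexpected_element: /Invalid content was found starting with element '([^']+)'/,
  unexpected_attribute: /Attribute '([^']+)' is not allowed to appear in element '([^']+)'/,
  unexpected_text: /Element '([^']+)' cannot have character \[children\]/
}.freeze

# Returns a Hash mapping [kind, name] to the number of occurrences.
def summarize_validation_errors(log)
  summary = Hash.new(0)
  log.each_line do |line|
    next unless line.include?("[ERROR]")

    PATTERNS.each do |kind, pattern|
      match = pattern.match(line)
      next unless match

      # Attributes are reported as "element/@attribute", everything else by element name.
      name = kind == :unexpected_attribute ? "#{match[2]}/@#{match[1]}" : match[1]
      summary[[kind, name]] += 1
      break # one classification per log line
    end
  end
  summary
end

summarize_validation_errors(SAMPLE_LOG).each do |(kind, name), count|
  puts "#{kind}\t#{name}\t#{count}"
end
```

Grouping this way separates new presentation-only elements that leak into the output (such as `source-highlighter-css` or `biblio-tag`) from presentation-only attributes (`class`, `displayorder`, `autonum`), which presumably call for different template changes.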

Intelligent2013 added a commit that referenced this issue Jan 26, 2025
@Intelligent2013
Contributor Author

Intelligent2013 commented Jan 26, 2025

In the output NISO STS XML, the tags for term references are missing.

For instance, in the source BSI document (https://github.com/metanorma/mn-samples-bsi/tree/main/sources/bs-202000), in 00-introduction.adoc there are references to the terms process and standard:

SM is a through-life {{process}} that enables the drafting, development and maintenance of {{standard,standards}}. It promotes the intelligent selection and implementation of {{standard,standards}}, policy, {{process,processes}} and tools, and delivers:

The Presentation XML contains the plain text only:

<p id="_34c2a109-0b03-f25c-efc8-7ba7fd222919">SM is a through-life process that enables the drafting, development and maintenance of standards. It promotes the intelligent selection and implementation of standards, policy, processes and tools, and delivers:</p>

Previously, the Semantic part (which was used as the 'base' for the conversion) contained this markup:

<semantic__p id="semantic___34c2a109-0b03-f25c-efc8-7ba7fd222919">SM is a through-life <semantic__concept><semantic__refterm>process</semantic__refterm><semantic__renderterm>process</semantic__renderterm><semantic__xref target="semantic__term-process"/></semantic__concept> that enables the drafting, development and maintenance of <semantic__concept><semantic__refterm>standard</semantic__refterm><semantic__renderterm>standards</semantic__renderterm><semantic__xref target="semantic__term-standard"/></semantic__concept>. It promotes the intelligent selection and implementation of <semantic__concept><semantic__refterm>standard</semantic__refterm><semantic__renderterm>standards</semantic__renderterm><semantic__xref target="semantic__term-standard"/></semantic__concept>, policy, <semantic__concept><semantic__refterm>process</semantic__refterm><semantic__renderterm>processes</semantic__renderterm><semantic__xref target="semantic__term-process"/></semantic__concept> and tools, and delivers:</semantic__p>

I'll add the issue in the isodoc repository.
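To make the lost information concrete, here is a small Ruby sketch (not Metanorma code) that reads the AsciiDoc line quoted above and lists the term references it carries, next to the plain text that remains once the markup is dropped. The helper names `concept_mentions` and `plain_text` are hypothetical, and the regular expression handles only the two forms used in this sample, `{{term}}` and `{{term,rendering}}`.

```ruby
# Sketch only: handles just the {{term}} and {{term,rendering}} forms
# that occur in the sample line; helper names are hypothetical.
CONCEPT = /\{\{([^{},]+)(?:,([^{}]+))?\}\}/

# The line from 00-introduction.adoc quoted above.
ADOC_LINE = <<~'ADOC'.strip
  SM is a through-life {{process}} that enables the drafting, development and maintenance of {{standard,standards}}. It promotes the intelligent selection and implementation of {{standard,standards}}, policy, {{process,processes}} and tools, and delivers:
ADOC

# One entry per mention: the referenced term and the text shown to the reader.
def concept_mentions(adoc)
  adoc.scan(CONCEPT).map do |term, rendering|
    { refterm: term.strip, renderterm: (rendering || term).strip }
  end
end

# The text with every mention replaced by its rendering only.
def plain_text(adoc)
  adoc.gsub(CONCEPT) { (Regexp.last_match(2) || Regexp.last_match(1)).strip }
end

concept_mentions(ADOC_LINE).each do |mention|
  puts "#{mention[:refterm]} -> #{mention[:renderterm]}"
end
puts plain_text(ADOC_LINE)
```

The four pairs correspond to the `semantic__refterm` / `semantic__renderterm` elements in the Semantic markup, while `plain_text` reproduces the content of the Presentation XML paragraph, from which the link back to the term can no longer be recovered.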

@opoudjis

There is one remaining issue I will create after Monday's release: refactoring of footnotes. There may be other requests as @strogonoff starts using the refactored code, but after that issue, this task can be treated as completed.

@ronaldtse
Contributor

@opoudjis what's the plan for footnotes? Placing them in a unified container per document? Ideally we separate footnote content and applicable locality, in order to allow shared footnotes (and differentiation in locality).

Intelligent2013 added a commit that referenced this issue Feb 17, 2025
Intelligent2013 added a commit that referenced this issue Feb 18, 2025
Intelligent2013 added a commit that referenced this issue Feb 18, 2025
Intelligent2013 added a commit that referenced this issue Feb 18, 2025
Intelligent2013 added a commit that referenced this issue Feb 20, 2025
Intelligent2013 added a commit that referenced this issue Feb 20, 2025
Intelligent2013 added a commit that referenced this issue Feb 20, 2025
Intelligent2013 added a commit that referenced this issue Feb 21, 2025
Intelligent2013 added a commit that referenced this issue Feb 21, 2025
@Intelligent2013
Contributor Author

mnconvert fails when processing the XML from the mn-samples-iso repository:
https://github.com/metanorma/mnconvert/actions/runs/13461620115/job/37618126610?pr=422
because the XMLs are in the old format (iso-standard root tag). I'll re-generate the XMLs.

@Intelligent2013
Contributor Author

version 2.9.5 (from Jan, 20):

<iso-standard xmlns="https://www.metanorma.org/ns/iso" type="presentation" version="2.9.5" schema-version="v1.4.1">

because the Metanorma Docker image was last updated 30 days ago: https://hub.docker.com/r/metanorma/metanorma.
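A quick way to see which format a sample file is in is to inspect its root element before running the conversion. The Ruby sketch below uses REXML; the helper names `describe_root` and `old_format?` are hypothetical, and the only rule it encodes is the one stated in this thread for ISO samples, namely that an `iso-standard` root tag means the old format.

```ruby
require "rexml/document"

# The root element quoted above, closed here so that it parses on its own.
ROOT_SAMPLE = '<iso-standard xmlns="https://www.metanorma.org/ns/iso" ' \
              'type="presentation" version="2.9.5" schema-version="v1.4.1">' \
              '</iso-standard>'

# Hypothetical helper: report the root element name and its format attributes.
def describe_root(xml)
  root = REXML::Document.new(xml).root
  {
    name: root.name,
    namespace: root.namespace,
    type: root.attributes["type"],
    version: root.attributes["version"],
    schema_version: root.attributes["schema-version"]
  }
end

# Assumption taken from this thread: an iso-standard root tag marks the old format.
def old_format?(xml)
  describe_root(xml)[:name] == "iso-standard"
end

info = describe_root(ROOT_SAMPLE)
puts "#{info[:name]} #{info[:type]} #{info[:version]} old=#{old_format?(ROOT_SAMPLE)}"
```

Other flavours would need their own root names added to the check; the sketch deliberately covers only the ISO case mentioned here.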

@Intelligent2013
Contributor Author

Intelligent2013 commented Feb 22, 2025

XSLT updated for old XML to STS XML conversion and validation.
Now, the issue metanorma/metanorma-plugin-lutaml#192 occurs.
https://github.com/metanorma/mnconvert/actions/runs/13475078544/job/37653234020?pr=422
