The client is a leading academic publisher with a global footprint in scholarly communications. They manage an extensive catalog of peer-reviewed journals and scholarly books, covering research articles, reviews, case reports, and reference-rich book chapters across scientific and medical disciplines. Their publications are widely indexed in PubMed and PubMed Central, making research easily accessible and visible to the global academic community.
The client needed consistent XML production that aligns with PubMed/PubMed Central (PMC) standards across multiple journals and book imprints:
Content arrived in varied formats, including Word, PDF, LaTeX, InDesign exports, and legacy XML, making normalization a critical first step.
Journals and books required different versions of PMC/JATS XML (Authoring and Publishing tag sets) to be supported simultaneously, along with the client's own schema extensions.
Complex content had to be handled faithfully: multi-level headings, nested lists, advanced table models, MathML equations, chemical formulas, figure groups, and supplementary files.
References had to be structured accurately, mapped to DOI, PMID, and PMCID identifiers, and standardized across mixed reference styles while accommodating corrections and updates.
Processing high monthly volumes under tight SLAs required multi-level XML validation and compliance checks, reproducible processes, and measurable quality assurance.
Deliverables had to be ingestion-ready for PubMed Central, discovery services, abstracting & indexing databases, and institutional repositories without loss of fidelity.
Our XML conversion workflow ensured that the content remained accurate, met PubMed/PubMed Central (PMC) standards, and was easily uploadable to various scholarly platforms. It was designed to handle a wide range of source files, complex research content, and large monthly volumes without compromising quality.
We standardized the source files (Word documents, PDFs, LaTeX manuscripts, InDesign exports, and even older XML files) into a consistent baseline before processing. Issues like missing figures, font-dependent special symbols, or malformed tables were flagged during automated checks early in the workflow, allowing us to resolve errors before they affected XML output.
Each manuscript/article was mapped into standard JATS/NLM sections such as <front>, <body>, and <back>. Within these, we ensured consistent tagging of elements like <sec> for sections, <title> for headings, <abstract> for summaries, and <kwd-group> for keywords. Contributor information was accurately structured using <contrib-group>, with precise tagging for <name> (author names), <aff> (affiliations), and <xref> (cross-references). This level of semantic detail ensured that the content was machine-readable, metadata-rich, and ready for indexing by scholarly databases, such as PubMed.
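To make the mapping concrete, here is a minimal Python sketch of the structure described above, built with the standard library's ElementTree. The function name `build_article` and its parameters are hypothetical and chosen for illustration. The output is a small subset of JATS and would not pass PMC validation on its own (journal metadata, identifiers, and dates are omitted).

```python
import xml.etree.ElementTree as ET


def build_article(title, abstract, keywords, authors, sections):
    """Build a minimal JATS-style <article> tree (illustrative subset only).

    authors:  list of (surname, given_names, affiliation) tuples
    sections: list of (heading, paragraph_text) tuples
    """
    article = ET.Element("article", {"article-type": "research-article"})
    meta = ET.SubElement(ET.SubElement(article, "front"), "article-meta")

    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title

    # Each author gets a <name> plus an <xref> that points at an <aff> by id.
    contrib_group = ET.SubElement(meta, "contrib-group")
    affiliations = []
    for i, (surname, given, affiliation) in enumerate(authors, start=1):
        contrib = ET.SubElement(contrib_group, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
        ET.SubElement(contrib, "xref", {"ref-type": "aff", "rid": f"aff{i}"})
        affiliations.append((f"aff{i}", affiliation))
    # Affiliations are appended after all contributors.
    for aff_id, text in affiliations:
        ET.SubElement(contrib_group, "aff", {"id": aff_id}).text = text

    ET.SubElement(ET.SubElement(meta, "abstract"), "p").text = abstract

    kwd_group = ET.SubElement(meta, "kwd-group")
    for keyword in keywords:
        ET.SubElement(kwd_group, "kwd").text = keyword

    # Body sections: <sec> wraps a <title> heading and its paragraphs.
    body = ET.SubElement(article, "body")
    for heading, paragraph in sections:
        sec = ET.SubElement(body, "sec")
        ET.SubElement(sec, "title").text = heading
        ET.SubElement(sec, "p").text = paragraph

    ET.SubElement(article, "back")
    return article
```

Calling `ET.tostring(build_article(...), encoding="unicode")` serializes the tree, which makes it easy to see how the `rid` on each `<xref>` lines up with the `id` of its `<aff>`.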
Mathematical expressions were encoded in MathML, making them both human- and machine-readable. We differentiated between inline math (equations within running text) and display math (standalone equations), preserving their distinct formatting and semantic meaning. Image renderings (PNG/SVG) were added as a fallback for systems that don’t support MathML.
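One way to pair MathML with an image fallback in JATS is the `<alternatives>` wrapper inside `<inline-formula>` or `<disp-formula>`. The sketch below assumes that approach. The helper name `formula` and the file name used in the usage note are hypothetical, and the source does not say which mechanism the production workflow actually used.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"
XLINK = "http://www.w3.org/1999/xlink"
# Serialize with the conventional mml: and xlink: prefixes.
ET.register_namespace("mml", MML)
ET.register_namespace("xlink", XLINK)


def formula(mathml, fallback_href, display=False, formula_id=None):
    """Wrap a MathML string and an image fallback in a JATS formula element.

    display=False -> <inline-formula> with an <inline-graphic> fallback
    display=True  -> <disp-formula> with a <graphic> fallback
    """
    wrapper = ET.Element("disp-formula" if display else "inline-formula")
    if formula_id:
        wrapper.set("id", formula_id)
    alternatives = ET.SubElement(wrapper, "alternatives")

    math = ET.fromstring(mathml)
    if math.tag != f"{{{MML}}}math":
        raise ValueError("expected a <math> element in the MathML namespace")
    if display:
        math.set("display", "block")
    alternatives.append(math)

    # The image rendering sits next to the MathML as an alternative.
    graphic = ET.SubElement(alternatives, "graphic" if display else "inline-graphic")
    graphic.set(f"{{{XLINK}}}href", fallback_href)
    return wrapper
```

For example, `formula('<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>', "eq1.png", display=True, formula_id="eq1")` returns a `<disp-formula>` whose `<alternatives>` holds both renderings, so a consumer can choose whichever it supports.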
Complex tables—including those with spanning headers and footnotes—were represented using JATS-compliant table structures to preserve their meaning. Figures and supplementary media were packaged with proper captions, alternative text for accessibility, persistent identifiers, and licensing metadata, ensuring they could be discovered, reused, and cited correctly.
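As a rough illustration of a table with a spanning header and a footnote, the sketch below builds a JATS `<table-wrap>` in Python. The function name `build_table_wrap` and its input shapes are assumptions made for the example, not the client's data model.

```python
import xml.etree.ElementTree as ET


def build_table_wrap(table_id, label, caption, header_rows, body_rows, footnotes=()):
    """Build a JATS <table-wrap> with optional spanning headers and footnotes.

    header_rows: list of rows; each row is a list of (text, colspan) pairs
    body_rows:   list of rows; each row is a list of cell strings
    """
    wrap = ET.Element("table-wrap", {"id": table_id})
    ET.SubElement(wrap, "label").text = label
    ET.SubElement(ET.SubElement(wrap, "caption"), "title").text = caption

    table = ET.SubElement(wrap, "table")
    thead = ET.SubElement(table, "thead")
    for row in header_rows:
        tr = ET.SubElement(thead, "tr")
        for text, colspan in row:
            th = ET.SubElement(tr, "th")
            if colspan > 1:
                th.set("colspan", str(colspan))  # header spanning several columns
            th.text = text

    tbody = ET.SubElement(table, "tbody")
    for row in body_rows:
        tr = ET.SubElement(tbody, "tr")
        for text in row:
            ET.SubElement(tr, "td").text = text

    # Footnotes live in <table-wrap-foot>, each as an <fn> with its own id.
    if footnotes:
        foot = ET.SubElement(wrap, "table-wrap-foot")
        for i, note in enumerate(footnotes, start=1):
            fn = ET.SubElement(foot, "fn", {"id": f"{table_id}-fn{i}"})
            ET.SubElement(fn, "p").text = note
    return wrap
```

Keeping the span as a `colspan` attribute, rather than flattening the header into repeated cells, is what preserves the relationship between a group heading and the columns beneath it.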
References were transformed into structured <ref-list> entries, with detailed <element-citation> tagging for author names, journal titles, publication years, volume/issue numbers, page ranges, and DOIs. Automated lookups and normalization matched each reference to a DOI, PMID, or PMCID where one was available. This removed inconsistencies in citation formatting and supported downstream bibliographic linking.
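The sketch below shows what a structured reference entry might look like when generated in Python. The record keys (`authors`, `title`, `journal`, and so on), the helper names, and the DOI normalization rule (strip a resolver prefix, then lowercase) are assumptions for illustration. The actual lookup services and normalization rules used in the project are not described in the source.

```python
import re
import xml.etree.ElementTree as ET

# Matches common DOI prefixes such as "https://doi.org/" or "doi:".
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw):
    """Return the bare, lowercased DOI (DOIs are case-insensitive)."""
    return _DOI_PREFIX.sub("", raw.strip()).lower()


def build_ref(ref_id, rec):
    """Build a JATS <ref> with an <element-citation> from a plain dict."""
    ref = ET.Element("ref", {"id": ref_id})
    cit = ET.SubElement(ref, "element-citation", {"publication-type": "journal"})

    group = ET.SubElement(cit, "person-group", {"person-group-type": "author"})
    for surname, given in rec.get("authors", []):
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given

    # Only emit the bibliographic fields that are actually present.
    fields = [("article-title", "title"), ("source", "journal"), ("year", "year"),
              ("volume", "volume"), ("issue", "issue"),
              ("fpage", "fpage"), ("lpage", "lpage")]
    for tag, key in fields:
        if rec.get(key):
            ET.SubElement(cit, tag).text = str(rec[key])

    # Identifiers become <pub-id> elements, one per available type.
    for id_type in ("doi", "pmid", "pmcid"):
        value = rec.get(id_type)
        if value:
            if id_type == "doi":
                value = normalize_doi(value)
            ET.SubElement(cit, "pub-id", {"pub-id-type": id_type}).text = str(value)
    return ref
```

Emitting identifiers only when they exist mirrors the "where available" qualifier: a reference without a PMID simply has no PMID `<pub-id>` rather than an empty one.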
Every XML file was validated against JATS/NLM rules (DTD/XSD/RNG schemas) and further checked with Schematron rules for business logic—such as ensuring abstracts, mandatory sections, identifiers, and references were complete. Specialized conformance checks guaranteed that the XML passed PubMed Central’s requirements so it could be ingested without errors.
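Business-logic checks of the kind described above can be approximated in a few lines of Python. The sketch below is a simplified stand-in for Schematron, not a replacement for DTD/XSD/RNG validation, and the specific rules and messages are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def check_article(xml_text):
    """Return a list of rule violations for a JATS-style article string."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    if meta is None:
        return ["missing front/article-meta"]

    problems = []
    if meta.find("abstract") is None:
        problems.append("missing abstract")
    if meta.find("article-id[@pub-id-type='doi']") is None:
        problems.append("missing DOI article-id")
    if not root.findall("back/ref-list/ref"):
        problems.append("empty or missing reference list")

    # Every <xref rid="..."> must resolve to an element id in the document.
    known_ids = {el.get("id") for el in root.iter() if el.get("id")}
    for xref in root.iter("xref"):
        for rid in (xref.get("rid") or "").split():
            if rid not in known_ids:
                problems.append(f"xref points to unknown id: {rid}")
    return problems
```

An empty list means the document passed these particular rules. Returning messages instead of raising on the first failure makes it straightforward to record every defect in a log.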
After validation, the XML files were bundled into submission-ready packages that included all supporting materials (figures, tables, multimedia) and a manifest file listing the contents. Each package was named and structured according to the specific requirements of PubMed Central and other platforms, so it could be uploaded without errors. We also supported update packages for handling corrections, errata, or post-publication changes, ensuring that previously published content stayed accurate and consistent across versions.
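A packaging step along these lines could be sketched as follows. The manifest layout (file name, size in bytes, SHA-256 checksum, tab-separated), the `manifest.txt` name, and the function name `build_package` are all hypothetical. Real naming and manifest requirements come from each target platform's submission guidelines and are not reproduced here.

```python
import hashlib
import zipfile
from pathlib import Path


def build_package(package_path, files):
    """Zip the given files with a manifest listing name, size, and SHA-256.

    Files are stored flat under their base names, so names must be unique.
    Returns the manifest lines that were written.
    """
    paths = sorted(Path(p) for p in files)
    names = [p.name for p in paths]
    if len(set(names)) != len(names):
        raise ValueError("duplicate file names in package")

    lines = []
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            data = path.read_bytes()
            zf.writestr(path.name, data)
            digest = hashlib.sha256(data).hexdigest()
            lines.append(f"{path.name}\t{len(data)}\t{digest}")
        # The manifest is written last so it describes everything before it.
        zf.writestr("manifest.txt", "\n".join(lines) + "\n")
    return lines
```

Recording a checksum per file gives the receiving side a simple way to confirm that figures and supplementary files arrived intact.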
Quality assurance was critical in this project because of the scale and complexity of the content. With equations, tables, figures, chemical formulas, multilingual abstracts, and diverse JATS/NLM standards to handle across 10,000+ pages a month, QC couldn’t stop at generic checks. It required a multi-layered framework that combined automated XML quality assurance, rule-based validation, and expert editorial review, all backed by traceable metrics and audit-ready defect logs.
We established a durable, long-term collaboration with the client, grounded in the consistent delivery of structured, interoperable XML that met current compliance needs and also facilitated discoverability, archiving, accessibility, and seamless reuse across scholarly platforms.
Met strict timelines with predictable turnaround and minimal rework.
Achieved high conformance for PMC/JATS submissions, reducing ingestion errors.
Maintained consistent schedules and reliable turnaround throughout the partnership.
They’ve taken the stress out of XML production for us. Even with high volumes, files are delivered on time, pass validation on the first try, and require minimal corrections. It’s allowed our team to focus more on publishing than troubleshooting.
- Production Manager
Scale your publishing operations without sacrificing quality with ePublishing services from SunTec India. In addition to XML conversion, you can also get XML/DTD design, TEI/PRISM XML conversion, and related support, with predictable turnaround, minimal rework, and the advantage of specialist review.