Nature Methods Paper Leverages PacBio Sequencing Technology to Develop the Platinum Pedigree Benchmark, a New Standard for Accurate Characterization of Variation in the Human Genome that Improves Training for AI Models
PacBio (NASDAQ: PACB) announced the publication of a groundbreaking study in Nature Methods introducing the Platinum Pedigree benchmark, a comprehensive truth-set of genomic variation. The study, conducted in collaboration with multiple institutions, utilized deep sequencing across a 28-member multi-generational family to create the most complete view of validated genetic variation to date.
The benchmark successfully catalogs over 37 Mb of genetic variation and adds more than 200 million bases, extending benchmark regions to 2.77 Gb. When used to retrain Google's DeepVariant AI tool, it achieved a 34% reduction in erroneously called variants. The dataset is freely available and includes the first large pedigree-validated tandem repeat and structural variant truth sets.
- Creation of the most comprehensive family-based variant dataset for improving AI-based variant classification
- 34% reduction in variant calling errors when used to retrain Google's DeepVariant tool
- Dataset extends benchmark regions by 200 million bases to 2.77 Gb, including previously excluded complex regions
- First large pedigree-validated tandem repeat and structural variant truth sets
Insights
PacBio's new genomic benchmark significantly advances variant detection accuracy, especially in complex regions, demonstrating their scientific leadership in the field.
The publication in Nature Methods represents a significant scientific achievement for PacBio. The company has developed the "Platinum Pedigree" benchmark - the most comprehensive family-based variant dataset to date - which substantially improves the accuracy of genetic variant detection, especially in traditionally challenging genomic regions.
What makes this breakthrough particularly valuable is its application to complex regions of the genome (often called the "dark genome") that previous benchmarks couldn't adequately address. By incorporating data from 28 family members across multiple generations and using inheritance patterns to validate variants, PacBio has created a resource that confidently catalogs over 37 Mb of genetic variation.
The immediate practical impact is demonstrated by the 34% reduction in erroneously called variants achieved when Google's DeepVariant was retrained on the benchmark.
For PacBio as a company, this publication showcases their scientific leadership in the genomics field and the unique capabilities of their long-read sequencing technology. By making this resource freely available, they're positioning themselves at the center of the genomics ecosystem while enabling advancements in clinical sequencing and AI-driven genomic analysis. This aligns with the industry shift toward more comprehensive genomic analyses that include complex structural variants and repeat regions previously inaccessible with short-read technologies.
The most comprehensive, family-based variant dataset ever published will improve variant classification using AI-based tools
MENLO PARK, Calif., Aug. 04, 2025 (GLOBE NEWSWIRE) -- PacBio (NASDAQ: PACB), a leading provider of high-quality, highly accurate sequencing platforms, today announced the publication of a study in Nature Methods describing a new, comprehensive truth set of genomic variation that characterizes both simple and complex variation. These improved benchmarks were used to retrain Google’s DeepVariant, a popular AI-based variant calling tool, resulting in a 34% reduction in erroneously called variants.
Combining inheritance-based validation with long-read sequencing, this benchmark accurately characterizes variants, even in difficult, repeat-rich regions of the genome, producing the most complete view of validated genetic variation to date.
“Comprehensive benchmarking datasets that include all variant types are foundational to progress in genomics methods development and the application of AI-driven tools, as well as to our understanding of genomic variation for both research and diagnostic purposes,” said Zev Kronenberg, lead author and Senior Manager at PacBio. “The Platinum Pedigree benchmark doesn’t just include simple variants in easy-to-sequence regions, it includes variants from across the entire genome, including regions that were previously excluded from benchmarks due to their complex nature.”
The Platinum Pedigree dataset was developed using deep sequencing from three sequencing platforms across a 28-member, multi-generational family (CEPH-1463). By tracking the inheritance of genetic variants from parents to multiple children, the study confidently catalogs over 37 Mb of genetic variation segregating within the family, from single nucleotide variants to large structural variants.
The dataset introduces the first large pedigree-validated tandem repeat and structural variant truth sets. It also adds more than 200 million bases, extending the benchmark regions to 2.77 Gb, including difficult-to-map areas such as segmental duplications and low-complexity regions.
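The idea of validating variants through inheritance can be illustrated with a minimal sketch. This is not the study's pipeline. It assumes unphased diploid genotypes encoded as allele pairs (0 for the reference allele), and the function names are hypothetical. It only checks whether each child's genotype could have come from one maternal and one paternal allele.

```python
from typing import Iterable, Tuple

# Assumption: an unphased diploid genotype is a pair of allele indices,
# where 0 is the reference allele and 1, 2, ... are alternate alleles.
Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """Return True if the child could carry one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def passes_family_check(mother: Genotype, father: Genotype,
                        children: Iterable[Genotype]) -> bool:
    """A site passes only if every child's genotype is consistent with the parents."""
    return all(is_mendelian_consistent(c, mother, father) for c in children)


# Hypothetical genotypes at one site for two parents and three children.
site_ok = passes_family_check((0, 1), (0, 1), [(0, 0), (0, 1), (1, 1)])
# A child carrying an allele that neither parent has fails the check.
site_bad = passes_family_check((0, 0), (0, 0), [(0, 0), (0, 1)])
```

A failing site in this simplified check could reflect either a genotyping error or a genuine de novo mutation, which is one reason having many children per parent pair gives stronger evidence than a single trio.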
A Benchmark Built for the Dark Genome
To demonstrate how improved benchmarks can strengthen AI and ML methods, the researchers retrained Google’s DeepVariant, a popular software tool that uses deep learning to identify genetic variants, on the Platinum Pedigree benchmark data. The updated DeepVariant model reduced errors by up to 34%.
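For readers unfamiliar with how such an error reduction is typically expressed, the sketch below computes a relative reduction between two models. It assumes that "errors" means false positives plus false negatives against a truth set, which the release does not specify, and the counts are hypothetical placeholders rather than results from the study.

```python
def relative_error_reduction(baseline_errors: int, retrained_errors: int) -> float:
    """Fraction of baseline errors eliminated by the retrained model."""
    if baseline_errors <= 0:
        raise ValueError("baseline_errors must be positive")
    return (baseline_errors - retrained_errors) / baseline_errors


# Hypothetical counts (not from the study): false positives + false negatives.
baseline = 600 + 400
retrained = 450 + 300

print(f"{relative_error_reduction(baseline, retrained):.0%}")  # prints 25%
```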
“This benchmark pushes accuracy where it matters most,” said Michael Eberle, senior author and Vice President of Computational Biology at PacBio. “It enables better evaluation of variant calling pipelines and accelerates the development of methods that finally reach the full genome, including some of the complex regions that are important for human health.”
A New Standard for Clinical and Population Genomics
The Platinum Pedigree benchmark is freely available and already being used by scientists to develop new sequence analysis tools and validate clinical sequencing workflows. It also provides a roadmap for future benchmarking efforts, especially those involving more complete genomes like T2T-CHM13.
The full dataset, analysis code, and pipelines are publicly available at: https://github.com/Platinum-Pedigree-Consortium.
About the Study
The study was published in Nature Methods on August 4, 2025. It was led by scientists at PacBio, the University of Washington, and the University of Utah, with support from the NIH and the Howard Hughes Medical Institute.
About PacBio
PacBio (NASDAQ: PACB) is a premier life science technology company that designs, develops, and manufactures advanced sequencing solutions to help scientists and clinical researchers resolve genetically complex problems. Our products and technologies, which include our HiFi long-read sequencing, address solutions across a broad set of research applications including human germline sequencing, plant and animal sciences, infectious disease and microbiology, oncology, and other emerging applications. For more information, please visit the PacBio website and follow @PacBio.
PacBio products are provided for Research Use Only. Not for use in diagnostic procedures.
Forward Looking Statements
This press release contains “forward-looking statements” within the meaning of Section 21E of the Securities Exchange Act of 1934, as amended, and the U.S. Private Securities Litigation Reform Act of 1995. All statements other than statements of historical fact are forward-looking statements, including statements relating to the uses, advantages, quality or performance of, or the benefits or expected benefits of using, PacBio products or technologies, including in connection with the Platinum Pedigree dataset, its potential to enable better evaluation of variant calling pipelines and accelerate development methods that reach the full genome, and other future events. You should not place undue reliance on forward-looking statements because they are subject to assumptions, risks, and uncertainties that could cause actual outcomes and results to differ materially from currently anticipated results. These risks include, but are not limited to, rapidly changing technologies and extensive competition in genomic sequencing; unanticipated increases in costs or expenses; and other risks associated with general macroeconomic conditions and geopolitical instability. Additional factors that could materially affect actual results can be found in PacBio’s most recent filings with the Securities and Exchange Commission, including PacBio’s most recent reports on Forms 8-K, 10-K, and 10-Q, and include those listed under the caption “Risk Factors.” These forward-looking statements are based on current expectations and speak only as of the date hereof; except as required by law, PacBio disclaims any obligation to revise or update these forward-looking statements to reflect events or circumstances in the future, even if new information becomes available.
Contacts
Investors and Media:
Todd Friedman