So, clinical sample collections take time and money and, like Sheenah said, they are the life of researchers. And we at NuGen really get it. For 13 years we have focused our energy and resources in the area of sample prep. By sample prep, we mean you have DNA or you have RNA, and you get them ready for an analytical platform: for example, you build a sequencing library, or you get them ready for microarray analysis.
Our goal is to enable scientists to access true biology in all samples available. That is a tall order. We're talking about all samples, independent of the type, quantity and quality, to get the true biology using simple, fast, easy methods that give reproducible results day in and day out. And not only that, you can do your research using samples you collected over 20, 30 years, such as FFPE samples that are degraded, and you can generate renewable resources from those samples so you can analyze them five, 10 years from now when we have even better technologies.
We work with scientists as partners. People come and tell us the challenges they face. They normally start like this: this is all I have, okay? What we've got is what we've got. We have to work with those samples. They are very quantity limited, and FFPE samples are often quality compromised. And you want to do RNA-seq, and you don't want to spend 70 or 90 percent of your sequencing dollars generating kind of useless information like ribosomal RNA.
So, we have been working hard and smart to address these very concerns. So, today I'm very happy to share with you some, I'd call them rather elegant, technologies that address these very challenges: to allow scientists like you to work with samples of extremely small quantities, be it DNA or RNA, to get rid of the things you don't want to see in your sequencing and really get the true information out of your samples, and to do targeted sequencing fast and accurately.
So now, how to make the most out of your precious samples. We have customers who have needle biopsy samples, in very small quantities, and these are patient samples. They need a very high success rate. Others have nasal swab samples and FFPE samples. And sometimes, I'm sure some of you do this, you have one sample and you want to kind of throw the book at it. You want to do qPCR. You want to do microarray. You want to do sequencing. You can never have enough sample to do all these analyses.
We have invented this technology called SPIA, Single Primer Isothermal Amplification. And it works like this, okay? You have your original template; say this is total RNA. We introduce a specific tag sequence, a SPIA tag, either by priming in the first [unintelligible] or by ligation. With the tag in place, we anneal the SPIA primer, which is a very smart primer. It is a chimeric primer made of both DNA and RNA. Once you have the SPIA primer annealed, we have an enzyme mix to displace the strand and extend it out to make the amplification product. Then the RNA portion gets degraded, allowing more SPIA primers to come in. So, keep doing this and you end up with many copies of the very original sample. So, this is unlike PCR, where you are amplifying copies of the copies. This is amplification of your original sample.
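The contrast drawn here between copying the original template and copying copies can be illustrated with a toy count. This is only a sketch under idealized assumptions (perfect doubling per PCR cycle, one new product per original template per linear round); the function names are made up for illustration and nothing here models the actual SPIA chemistry or its yields.

```python
def exponential_molecules(templates: int, cycles: int) -> int:
    """Idealized PCR: every cycle doubles every molecule (copies of copies)."""
    return templates * 2 ** cycles


def linear_products(templates: int, rounds: int) -> int:
    """Idealized linear amplification: each round copies only the originals."""
    return templates * rounds


def products_inheriting_error(rounds_after_error: int, linear: bool) -> int:
    """Products carrying a copying error that arose in one product molecule.

    Linear model: that product is never a template, so the error stays in 1.
    Exponential model: the faulty product is re-copied and doubles each cycle.
    """
    return 1 if linear else 2 ** rounds_after_error


if __name__ == "__main__":
    # Hypothetical numbers: 100 starting templates, 10 cycles or rounds.
    print("exponential:", exponential_molecules(100, 10))
    print("linear:", linear_products(100, 10))
    print("error spread, linear:", products_inheriting_error(5, linear=True))
    print("error spread, exponential:", products_inheriting_error(5, linear=False))
```

The point of the toy model is the last function: when only the original is ever used as a template, a mistake made in one copy is not propagated into later copies.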
So, with SPIA you get a product that you can take to whatever type of analysis you want to do today, microarray or next-gen sequencing, and you can save a portion to retrieve in the future. Particularly if you have a vast collection, thousands of samples, you don't even know which one you will want to get back to in the future. This provides a very innovative way to keep the samples.
You can go down to very low input with this technology. Like I mentioned, you know, there are about 400 million FFPE samples in North America. What do people do with those samples? Most of them are degraded. And a lot of people have those collections and they have to use them. This technology allows you to get the true biology out of those heavily degraded samples.
So now, when we talk about low input, how low is low, okay? With SPIA technology we say, okay, you can go down to picograms. And if you work in the field you will know a picogram is almost next to nothing. But people study the viral load in healthy individual carriers, and look at what they have, all the leading zeros, okay? You may have 1 to 1,000 copies. You do not even have a picogram of total RNA material. And at this input range even highly sensitive, high-precision qPCR doesn't work. So, you are really left with very few choices.
One researcher, Dr. Christina Maltbooth [sp] from the Broad Institute [sp], was really having problems with her samples. She wanted to study the viruses in the clinical samples she collected from healthy individuals. They all have very, very low copy numbers. So, she came to us. We worked with her as a partner and she did an off-label application with our RNA-seq kit. So, she was thrilled. She was really thrilled. I just took this result table out of her paper here: she was able to get 100 percent coverage or near complete coverage for samples at very low copy numbers. This is from the nasal swab preps, so very impressive.
Now, what if you have DNA? I once talked to a researcher who was involved in, I call it the Manhattan Dirty Air Project. So, they collected some air in the midair of Manhattan and prepped the samples, and they were pretty dirty. They are not so easy to work with and there is a very low quantity. And they wanted to be able to make a genomic library for sequencing. And he basically had no way of doing that. That was the time we were developing this ultra-low technology. So, think about the NGS library construction business: it's really signal to noise, okay? You have a lot of good DNA, you have no problem, right? You can build a library. You have all the reads you want. But as the input level keeps going down and down, the noise comes up and overwhelms the signal. That's why, when you look at any commercial kit, they will tell you the lowest input level you can go down to and still get a good library.
We have come up with this very smart, very unique adapter structure that basically reduces the noise, which allows you to go down to as little as one nanogram. Actually, for one of my collaborators, the first thing they did was 100 picograms. They wanted to see how low they could go, okay? They build a rate. They build for libraries and it is fairly fast.
So, with this ultra-low technology we did a test; we wanted to see how it performs on specific genomes. So, first we started with some bugs. We have E. coli, which is, you know, kind of right in the middle of the GC range. It's not too hard. Then some people have very difficult samples with very high GC or with very low GC. So, we have three bugs, and we made sequencing libraries using one nanogram, in triplicate, and sequenced them.
The curve you see here is the coverage curve. The gray one is the theoretical. So, you have the genome, you computationally fragment it and pull the fragments back together, and if you have no bias then you will see this coverage curve. The other curves, superimposed on top of each other, are the actual experimental data from the one-nanogram libraries. And you can see they are really on top of the theoretical curve, which gives you very good coverage without any amplification bias. And when you have low GC, I think most of the sequencing chemistries have trouble with high AT content. So, you do see the curve is a little bit broader, but you can see it's still smack in the middle here.
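The GC content (fraction of G and C bases) curves described here can be built, in principle, by computing the GC fraction of each read or fragment and binning the values. The following is a minimal sketch of that idea only; the function names and the tiny example reads are hypothetical, and it is not the analysis pipeline used for the slides.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of called bases (A, C, G, T) that are G or C."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


def gc_histogram(reads, bins: int = 10):
    """Count reads per GC bin; bin i covers [i/bins, (i+1)/bins)."""
    counts = [0] * bins
    for read in reads:
        # A read with GC fraction 1.0 goes into the last bin.
        index = min(int(gc_fraction(read) * bins), bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    example_reads = ["ATGC", "AAAA", "GGCC"]  # made-up reads
    print(gc_histogram(example_reads, bins=4))
```

Comparing the histogram from sequenced reads with one from computationally generated fragments of the reference is one way to see whether a library is skewed toward or away from GC-rich sequence.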
What about human samples? For the 1000 Genomes Project, the data deposited in the database were all from Illumina sequencing, using Illumina library prep reagents. And that was calling for one to five micrograms of starting material. Using our ultra-low library we got a [unintelligible] sample and we started with 50 nanograms of material, okay? We sequenced it to the same number of reads.
Looking at the ultra-low library, we started with about 550 million reads, the same for the Illumina library sample we downloaded from the database. We actually downloaded five genomes just to make sure we didn't just happen to look at a special one. They all look about the same, and I'm showing here just one of the five. And you can see our very unique adapter structure basically eliminates all adapter artifacts; that's why you can go lower and lower.
So, we have very little adapter artifact compared to the Illumina library. On the other parameters, you know, the NuGen ultra-low comes out slightly better. I will say they are comparable, okay, statistically: 88 percent alignment versus 85, and the reads that were not aligned were slightly better. But keep in mind, this is a 50 nanogram input, and the library was made without any gel purification.
The Illumina library used more than one microgram of DNA and it went through gel purification, a very labor-intensive step. So, again, using the GC content curve as a way to show library sequencing bias: the human genome is about 41 percent GC. So, if it is totally random, it will be just like this. And the Illumina library and the NuGen library actually both came out really nicely here.
To look at the coverage: particularly if you are a clinician, you tell your patient, you know, yeah, I sequenced your genome, everything looks pretty good, but we only covered like 92 percent, okay? We don't know what happened to the other eight. That doesn't seem satisfactory if I were on the other side.
So, we took 10 megabases from chromosome 20, which is sort of the average of the human genome, to look at the coverage plot, okay? So, the reads we took gave us about 21X coverage, so if everything were covered perfectly you would just get a straight line, right, at 21X. In real experimental data you always have regions covered a little bit more and a little bit less. And look at this part, okay: this is the part that gets covered, but less than average. With the 50 nanogram library using the ultra-low technology, we actually have fairly significantly better coverage than the one microgram input, okay?
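The arithmetic behind a number like 21X, and behind "covered less than average", can be sketched as follows. The read count and the 100-base read length are assumptions chosen only so that the example lands on 21X over a 10 megabase region; they are not figures from the talk, and the per-base depths are made up.

```python
def mean_coverage(n_reads: int, read_length: int, region_length: int) -> float:
    """Average depth if reads were spread perfectly evenly over the region."""
    return n_reads * read_length / region_length


def fraction_below(depths, threshold: float) -> float:
    """Fraction of positions whose depth is strictly below the threshold."""
    return sum(depth < threshold for depth in depths) / len(depths)


if __name__ == "__main__":
    # Assumed: 2.1 million reads of 100 bases over a 10 Mb region.
    average = mean_coverage(2_100_000, 100, 10_000_000)
    print(f"mean coverage: {average:.0f}X")

    # Hypothetical per-position depths, just to show the calculation.
    depths = [21, 10, 30, 5]
    print("fraction below mean:", fraction_below(depths, average))
```

A library with less bias pushes the under-covered fraction down without needing more reads, which is the comparison the coverage plot is making.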
We always say you spend 20 percent of the resources to get 80 percent of the data, but the last 20 percent costs you a lot of money to get. And this is the part that makes the difference.
So, make the most out of your precious samples, particularly the ones saved from the Sandy storm. If you have total RNA and you want to do RNA-seq, or you have very precious DNA, for example from [unintelligible], where you usually just have a few nanograms of material, this technology will help you get wherever you want to go.
So now, moving on to RNA-seq, okay? People do RNA-seq to try to get a complete picture of the transcriptome. Unfortunately, total RNA has a lot of things you don't want: ribosomal RNA is often 70, 80, sometimes 90 percent of your sample. So, you want to get a complete picture without wasting a significant portion of your sequencing capacity. Now, if you are working with whole blood total RNA samples, then you have both ribosomal RNA and globin RNA to get rid of. And we talked to people they work great seems seriously from the space, okay? They want to do RNA-seq, and they want to see the signals not buried by that 70, 80 percent of uninformative reads. How do you do that? There are multiple ways. What most people do is, okay, I'm going to do polyA selection, right? I get really the messenger RNA. But then you lose the other part of the transcriptome, the [unintelligible] RNA and the small RNA. Okay, I can do a hybridization-based ribosomal pullout reduction, but you still end up with a significant portion of the ribosomal RNA and, trust me, you do lose information when you do pullout.
So, we came up with this really elegant technology we call inDA-C, Insert Dependent Adaptor Cleavage. This is how it works. What I'm now talking about is that we have a way to make strand-specific RNA-seq libraries. So, take those strand-specific RNA-seq libraries. InDA-C starts when you anneal probes that were designed for the things you don't want, like ribosomal RNA or globin RNA. You anneal those probes and extend them. Upon completion of the extension, it creates a new restriction site. So, you cut it off. So, what's left are the ones that are free of the things you don't want.
InDA-C is built into the library prep process. It takes about an hour, this step alone, okay? So, it allows you to produce a library free of the things you don't want, and it doesn't impact the complexity of the library. This is just a quick slide showing one single whole blood sample and the basis of blood pool sample out of four all of those are [unintelligible] prepared, total human blood, RNA. And you can see, without using inDA-C you have over 50 percent ribosomal RNA and close to 20 percent globin RNA. Think about it: not only are you burdened with this 70 percent of junk, your valued scientists have to spend extra time to remove it as well. You pay twice. With the inDA-C probes, you can see the total ribosomal and globin reads came down well below 10 percent. And this is the 5-prime to 3-prime coverage plot from that very data. You want to have your transcriptome covered from 5 prime to 3 prime, ideally as a straight line. And look at this one, it's fairly close. It's very good coverage. You don't suffer from either 3-prime or 5-prime bias.
So, this is a very interesting slide, okay? This data actually came from Dr. Andy Brooks' lab. They did a beta comparison using NuGen's whole blood RNA-seq kit, starting with 100 nanograms of total RNA, and Illumina's current TruSeq RNA sample prep kit, which calls for 1 microgram of input because you have to do polyA selection, okay? These are six samples put through the two lines of product. Look at the percentage aligned: the NuGen whole blood kit is consistently better by about two to five percentage points. And then look at the percentage of ribosomal and globin combined: we are well below 10, versus here, where it is very sample dependent. This is after polyA selection. And you could still throw away nearly half of your reads, okay.
Now, after you throw away the things you don't want, let's see what you really get, okay? So, these numbers are the RPKM thresholds: one as the cutoff meaning they are fairly abundant, and as the cutoff comes down, this gets you to the more rare transcripts, okay, possibly the things you are looking for. So, if you look at the highly abundant transcripts here, with 1 microgram input you actually get a lot of reads in the highly abundant transcripts. With the threshold coming down, the whole blood kit using the NuGen inDA-C technology starts picking up a lot more rare transcripts. It's fairly significant. We are talking about 2,000 genes in every sample. So, you are missing 2,000 genes even with 10 times higher input.
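For readers less familiar with RPKM thresholds, the standard definition is reads per kilobase of transcript per million mapped reads, and counting genes above a cutoff is then a simple filter. This is a small sketch with made-up gene names and values, not the data from the slide.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return gene_reads * 1e9 / (total_mapped_reads * gene_length_bp)


def genes_detected(rpkm_by_gene: dict, threshold: float) -> int:
    """Number of genes whose RPKM is at or above the threshold."""
    return sum(value >= threshold for value in rpkm_by_gene.values())


if __name__ == "__main__":
    # Hypothetical gene: 500 reads, 2,000 bases long, 25 million mapped reads.
    print(rpkm(500, 2_000, 25_000_000))

    # Hypothetical expression values; lowering the cutoff admits rarer genes.
    sample = {"geneA": 10.0, "geneB": 0.3, "geneC": 0.05}
    for cutoff in (1.0, 0.1):
        print(cutoff, genes_detected(sample, cutoff))
```

Lowering the cutoff from 1 toward smaller values is what moves the comparison from abundant transcripts to the rare ones discussed here.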
So, getting rid of the things you don't want: we were actually quite overwhelmed, it's amazing how many things people don't want. You can start with total RNA again, you know, some 100 nanograms, and construct a strand-specific RNA-seq library in one working day. And many people are still struggling with two, three or even five working days today. And with this approach, the ribosomal RNA or other things you don't like get simultaneously reduced or eliminated in the process.
So, moving on to targeted sequencing, particularly for clinical samples. Researchers want to see what's in your genome and what's in your exome. They want to sequence a lot of things. Clinicians sometimes just want to see what happened to my patient in this particular gene and this particular region. So, the complete coverage of my big [unintelligible] base of 300 genes is very, very important. And like I said, if you are a clinician you really don't want to tell your patient that we only got 95 percent of your gene. And particularly for tumor samples: they are highly complex, and they're sometimes very degraded. And you really don't want false negatives. False positives may cost you more money to straighten out, but with false negatives you simply don't know what you are missing.
And every researcher has their own favorite gene list. Even when they are studying the same disease and the same patient population, they all have their own thing they want to specialize in. So, we have this technology, SPET. It's Single Primer Extension Technology. And this is really one of the most elegant technologies that I have had the pleasure of working on. And this is how it works. The purple part is your sheared genomic DNA. You ligate an adaptor to the 5-prime end, then you anneal the probes specific to the targets you want. So, whether you are interested in p53 or in whatever gene panel, you design those probes and the probes will anneal. And only the ones that get extended can complete the sequencing library. So, it's almost the opposite of inDA-C, okay? So, you extend. So, basically at the end of the day you get a library that only contains the things you want to have.
And look at some comparisons here. This is a paper published a couple of years ago comparing the three most popular target enrichment technologies, using the very same target regions, five megabases. And you can see the hybridization-based approaches, [unintelligible] and Agilent [sp]: you have about 10 percent design dropout because it's hybridization based. You can't design probes in the highly repetitive regions. And the [unintelligible], being PCR based, had a little bit better luck in that [unintelligible] aspect. And the on-target rate is all around 50, 60 percent. So, even when you go through all the trouble, you know, you get what you want, but the efficiency still suffers a little bit.
So, we have a couple of test panels we have developed with our partners. One is the 344-gene cancer panel, where we have a design dropout of less than 0.2 percent. The on-target rate is about 80 percent. And the experimental dropout at the end is about 2 percent. The experimental dropout is unreported in this paper; for the people who are familiar with this, the experimental dropout, meaning you can [unintelligible], is usually around 5, 6, 7 percent, okay. And this is the very same cancer panel that a customer wanted to try out. They got about 70 percent on target. And on another custom panel, 113 genes, we also have fairly good coverage.
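The two metrics quoted for these panels, on-target rate and dropout, reduce to simple ratios. The sketch below uses one common way of defining them (a read counts as on target if its start falls inside a target interval; dropout is the share of target bases left uncovered). Those definitions, the function names and the numbers are assumptions for illustration, not necessarily how the panel figures were computed.

```python
def on_target_rate(read_starts, targets) -> float:
    """Fraction of reads whose start lies in any half-open target interval."""
    hits = sum(
        any(start <= position < end for start, end in targets)
        for position in read_starts
    )
    return hits / len(read_starts)


def dropout(target_bases: int, covered_bases: int) -> float:
    """Fraction of target bases that are not covered.

    With bases covered by probe design this gives design dropout; with
    bases covered by sequencing reads it gives experimental dropout.
    """
    return 1 - covered_bases / target_bases


if __name__ == "__main__":
    targets = [(0, 20), (100, 110)]   # hypothetical target intervals
    read_starts = [5, 15, 25, 105]    # hypothetical read start positions
    print("on-target rate:", on_target_rate(read_starts, targets))
    print("dropout:", dropout(target_bases=1000, covered_bases=998))
```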
And this approach stands on its own in target enrichment because we design the probes so they actually interrogate both strands of the target DNA. If you are familiar with the hybridization-based approach, the hybridization probe really just hybridizes to one strand of your target. But we design the probes to interrogate both strands. So, when you see in the sequencing data that the variant shows up in both orientations, the reads come from two different strands. Then you have much higher confidence that this is real. And that is a really good thing for tumor sequencing.
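The idea of trusting a call more when it is seen in both orientations can be expressed as a simple filter. This is an illustrative sketch only: the variant labels are invented, the input format is hypothetical, and real variant callers use more elaborate strand-bias statistics than a yes/no check.

```python
from collections import defaultdict


def both_strand_variants(calls):
    """Return variants observed on both the '+' and the '-' strand.

    `calls` is an iterable of (variant_label, strand) pairs, one per
    supporting read, where strand is '+' or '-'.
    """
    strands_seen = defaultdict(set)
    for variant, strand in calls:
        strands_seen[variant].add(strand)
    return {v for v, strands in strands_seen.items() if {"+", "-"} <= strands}


if __name__ == "__main__":
    # Made-up observations: the first variant is seen in both orientations,
    # the second only on the forward strand.
    observations = [
        ("chr17:100 A>T", "+"),
        ("chr17:100 A>T", "-"),
        ("chr17:200 G>C", "+"),
        ("chr17:200 G>C", "+"),
    ]
    print(both_strand_variants(observations))
```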
This method is very straightforward. It's very elegant. You get a sequenceable library from genomic DNA: start at 8 o'clock in the morning and you have it at the end of the day. And the input can be from 10 to 100 nanograms instead of micrograms. We have tried many custom probe designs in partnership with our customers. We often get zero design dropout. And we get a couple of percent experimental dropout. I will say this really stands on its own in performance. And once again, when you are sequencing tumor samples, when you sequence any patient samples, you really want to be sure that this is real before you do a whole lot more, possibly invasive, procedures. So, this technology interrogates both strands.
So, in summary, we work with our customers as partners. We really listen to understand what their problems are and we work with them to address their concerns. So, we have invented some really interesting technologies that allow you to go down to picograms of total RNA for RNA-seq, and we will actually have a single-cell product very soon as well. A single cell is about 10 picograms of total RNA. And they allow you to go down to one nanogram to make a really beautiful genomic library for metagenomics, for anything you want to study.
We have a very complete workflow to generate strand-specific RNA-seq libraries without the junk, if you will, and to give you renewable cDNA resources. So, this is a big deal for people who want to keep their precious samples for the next 10, 15 years. We have a very fast and efficient method for target enrichment, and this afternoon some of you might be going to the [unintelligible] SP-plus workshops. This is a totally cute machine, smaller than a coffee machine. It's total walkaway automation to do the sample prep for you.
Like I said, we really thrive in the area of sample preparation. We work hard. We try to push the breakthroughs to enable you to do cutting-edge research. Thank you.