Documentation



1. Introduction

Instead of being translated into proteins, long noncoding RNAs (lncRNAs) exert their functions in cellular processes, organismal development and disease directly as RNA. However, because their mechanisms are far more complex and variable than those of protein-coding genes, most lncRNAs have not been well studied, even though many studies have demonstrated the significance and diversity of their regulatory functions. The CRISPR/Cas9 system, a revolutionary genome editing tool used across molecular biology, provides new opportunities for studying lncRNA function in depth and is receiving increasing attention in the lncRNA field.

Unlike protein-coding genes, many lncRNAs are confined to the nucleus, and some exert their molecular functions in a transcript-independent manner, meaning that the act of transcribing the lncRNA can itself affect target genes. RNAi is therefore likely to be of limited use for loss-of-function studies of lncRNAs. CRISPR/Cas9, by contrast, has a major advantage in lncRNA research because it acts in cis on the genomic locus within the cell nucleus. CRISPR/Cas9 can be used in its unmodified form (wtCas9) to cut DNA and create a knockout genotype (CRISPRko). After the nuclease domains are inactivated to create a dead Cas9 (dCas9), additional domains can be appended to extend the range of activities of CRISPR/Cas9 technology, such as transcriptional activation (CRISPRa) and transcriptional interference (CRISPRi).

Many tools for ab initio sgRNA design are now available, each with its own characteristics, whereas systematic collections of validated sgRNAs remain rare. As the first database of validated sgRNAs for lncRNAs, CRISPRlnc provides not only a new and powerful resource for genome editing of lncRNAs, but also a gold-standard dataset that is crucial for the subsequent development of sgRNA design tools for lncRNAs. We believe that CRISPRlnc will prove useful to researchers working on lncRNAs and CRISPR/Cas9.



2. Aims of CRISPRlnc

The first step of CRISPR/Cas9 gene editing is to design a single guide RNA (sgRNA) that targets your gene of interest. However, because sgRNAs vary widely in activity and in the models used to design them, designing an sgRNA is not easy and its effectiveness cannot be guaranteed. It is therefore worthwhile to collect validated sgRNA sequences, so that an sgRNA with the expected activity can be chosen efficiently. For example, Varshney et al. constructed CRISPRz, published in the NAR 2016 Database Issue, to collect validated sgRNAs for zebrafish coding genes. However, CRISPR/Cas9 applications for lncRNAs differ considerably from those for coding genes, as many published works indicate. For instance, an lncRNA does not need to maintain an intact open reading frame to function. In addition, lncRNAs and their neighboring coding or noncoding genes have complicated genomic architectures, such as sense/antisense and intergenic/intragenic arrangements (Ho et al. NAR 2015; Sanjay et al. NAR 2016; Goyal et al. NAR 2017). Therefore, a validated sgRNA database specifically for lncRNAs is highly valuable to the relevant academic community.
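To make the sgRNA design step concrete, the following minimal sketch lists candidate target sites in a DNA sequence. It assumes the common SpCas9 convention of a 20-nt protospacer followed by an NGG PAM; that convention and the function names are assumptions made for this illustration, not something stated by CRISPRlnc. It does not predict activity, which is exactly why validated sgRNAs are preferable when they exist.

```python
import re

# Complement table for the four DNA bases (upper case only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_candidate_protospacers(seq, spacer_len=20):
    """List (offset, protospacer, pam) for every NGG PAM on the given strand.

    Assumption: SpCas9-style sites, i.e. spacer_len bases followed by NGG.
    Offsets are 0-based positions within `seq`.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def scan_both_strands(seq):
    """Scan the sequence and its reverse complement.

    Offsets in the "-" list refer to the reverse-complemented string.
    """
    return {
        "+": find_candidate_protospacers(seq),
        "-": find_candidate_protospacers(reverse_complement(seq)),
    }
```

Calling `scan_both_strands` on a locus sequence returns the candidate sites per strand; each candidate would still need to be checked for off-targets and, ideally, compared with validated sgRNAs for the same lncRNA.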



3. Data source and processing

CRISPRlnc is a manually curated database of validated CRISPR/Cas9 sgRNAs for lncRNAs from all species. Based on a manual review of more than 200 published papers, the current version of CRISPRlnc contains hundreds of lncRNA entries and thousands of validated sgRNAs across 8 species, including mammals, insects and plants. For each lncRNA we curated the ID, genomic position, sequence and functional description; for each sgRNA we curated the sequence, PAM motif, CRISPR type and validity.
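The curated fields listed above can be pictured as two linked record types. The sketch below is only an illustration: the class names, attribute names and example values are hypothetical and do not reflect the actual CRISPRlnc schema or download format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SgRNA:
    """One curated sgRNA (hypothetical field names)."""
    sequence: str     # protospacer sequence
    pam: str          # PAM motif
    crispr_type: str  # e.g. "CRISPRko", "CRISPRi", "CRISPRa"
    validity: str     # validity class, e.g. "I", "II" or "III"


@dataclass
class LncRNA:
    """One curated lncRNA with its sgRNAs (hypothetical field names)."""
    gene_id: str
    species: str
    chrom: str
    start: int
    end: int
    strand: str
    sequence: str = ""
    description: str = ""
    sgrnas: List[SgRNA] = field(default_factory=list)


# Placeholder entry; every value here is invented for illustration only.
example = LncRNA(
    gene_id="EXAMPLE_LNC1",
    species="Homo sapiens",
    chrom="chr1",
    start=1000,
    end=5000,
    strand="+",
    description="placeholder description",
)
example.sgrnas.append(
    SgRNA(sequence="A" * 20, pam="TGG", crispr_type="CRISPRi", validity="II")
)
```

Keeping sgRNAs nested under their lncRNA mirrors how the 'details' page groups them, which is the reason for this layout in the sketch.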



4. Statistics of CRISPRlnc

(A) Distribution of sgRNAs across species. (B) Functional types of sgRNAs. (C) The top 10 lncRNAs with known designed sgRNAs, excluding large-scale data. (D) Positional distribution of sgRNAs on lncRNAs.

Figure A shows the distribution of sgRNAs across species. The collected sgRNAs come mainly from human (Homo sapiens, 1777 in total), mouse (Mus musculus, 79 in total) and fly (Drosophila melanogaster, 233 in total). Figure B shows the types of sgRNAs: the most common is CRISPR interference (CRISPRi, about 52.1%), while the least common is CRISPRedit (about 0.2%). Judging from the top 10 lncRNAs with designed sgRNAs (Figure C, excluding type III sgRNAs supported by high-throughput experiments), CRISPR/Cas9 applications for lncRNAs currently focus on a few potentially important human lncRNAs, such as NEAT1 (Nuclear Enriched Abundant Transcript 1), PVT1 (Plasmacytoma variant translocation 1) and MALAT1 (Metastasis Associated Lung Adenocarcinoma Transcript 1). We also calculated where sgRNAs tend to be located relative to lncRNAs: upstream or downstream of the gene, or in the gene body near the 5' or 3' end (Figure D). The collected sgRNAs were more often located in the gene body near the 5' end than near the 3' end (61.4% vs. 12.6%), and more often upstream than downstream of the gene (20.0% vs. 5.9%).
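A positional tally like the one in Figure D can be sketched as follows. The text does not say how "near the 5' end" and "near the 3' end" were delimited, so this example assumes a simple split of the gene body at its midpoint; that rule, the function name and the category labels are all assumptions made for illustration.

```python
from collections import Counter


def classify_sgrna_position(sgrna_pos, gene_start, gene_end, strand):
    """Classify an sgRNA coordinate relative to a gene.

    gene_start < gene_end are genomic coordinates on the reference;
    strand is "+" or "-". Assumption: the gene body is split at its
    midpoint into a 5' half and a 3' half.
    """
    if gene_start >= gene_end:
        raise ValueError("gene_start must be smaller than gene_end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    # Outside the gene: left of the gene is upstream only on the + strand.
    if sgrna_pos < gene_start:
        return "upstream" if strand == "+" else "downstream"
    if sgrna_pos > gene_end:
        return "downstream" if strand == "+" else "upstream"

    # Inside the gene body: orientation decides which half is the 5' half.
    in_left_half = sgrna_pos <= (gene_start + gene_end) / 2
    if strand == "+":
        return "gene body, 5' half" if in_left_half else "gene body, 3' half"
    return "gene body, 3' half" if in_left_half else "gene body, 5' half"


def tally_positions(sites):
    """Count categories for an iterable of (pos, start, end, strand) tuples."""
    return Counter(classify_sgrna_position(*site) for site in sites)
```

Dividing each count returned by `tally_positions` by the total number of sgRNAs gives percentages comparable in form to those reported above, although the exact values depend on the boundary rule chosen.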



5. Usage of Database

a. Browse

In Browse, you can browse all lncRNAs and sgRNAs in CRISPRlnc. We divided our data into three types according to the validity of the collected sgRNAs: (I) recommended high-activity sgRNAs, (II) experimentally validated sgRNAs, and (III) sgRNAs designed by experts with partial experimental validation.


b. Details

Each lncRNA in CRISPRlnc has a ‘details’ page. This page mainly contains the lncRNA’s information (ID, transcript, sequence and functional description) and a table of its sgRNAs. Each lncRNA is annotated with its literature source, so you can trace it back to the original publication. If this information does not meet your needs, the ‘details’ page also provides links to lncRNA databases and sgRNA design tools.


c. GBrowse

In GBrowse, you can view lncRNAs, sgRNAs and other genomic annotations according to their location in the genome. If you know the location of an lncRNA, we recommend using GBrowse to find out whether it has validated sgRNAs.


d. BLAST

You can find an lncRNA in CRISPRlnc through a BLAST sequence similarity search. The lncRNA you are interested in may not have validated sgRNAs in your species, but a homologue in another species may. To date, CRISPRlnc has collected 205 lncRNAs and 1049 validated sgRNAs across 8 species, including mammals, insects and plants.


e. Search

In Advanced Search, you can find the lncRNA or sgRNA you are interested in according to its Gene ID, Species, CRISPR Type, Validity, Cell line and PMID.
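The same kind of multi-field filtering can be reproduced offline on a local table of records. The sketch below assumes records held as plain dictionaries keyed by the six search fields; the key names, the matching rule (exact, case-insensitive) and all example values are hypothetical and are not an export format or API of CRISPRlnc.

```python
def search(records, **criteria):
    """Return the records that match every given field exactly.

    Matching is case-insensitive; a record lacking a requested field
    never matches. Field names are hypothetical.
    """
    def matches(record):
        return all(
            key in record and str(record[key]).lower() == str(value).lower()
            for key, value in criteria.items()
        )

    return [record for record in records if matches(record)]


# Placeholder rows; the cell lines and PMIDs are invented for illustration.
records = [
    {"gene_id": "NEAT1", "species": "Homo sapiens", "crispr_type": "CRISPRi",
     "validity": "II", "cell_line": "CELL_LINE_A", "pmid": "PMID_PLACEHOLDER_1"},
    {"gene_id": "MALAT1", "species": "Homo sapiens", "crispr_type": "CRISPRko",
     "validity": "II", "cell_line": "CELL_LINE_B", "pmid": "PMID_PLACEHOLDER_2"},
    {"gene_id": "EXAMPLE_LNC1", "species": "Mus musculus", "crispr_type": "CRISPRi",
     "validity": "III", "cell_line": "CELL_LINE_C", "pmid": "PMID_PLACEHOLDER_3"},
]

human_crispri = search(records, species="homo sapiens", crispr_type="CRISPRi")
```

Here `human_crispri` holds only the first placeholder row, since it is the only one that matches both criteria.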



6. Submission to CRISPRlnc

To keep pace with the rapid growth of research on lncRNAs and CRISPR/Cas9, we will update the database regularly. As CRISPRlnc is the first database specifically devoted to validated sgRNAs for lncRNAs, we hope to receive support and advice from the academic community to help us improve it. Data submissions are greatly appreciated. All submissions will be reviewed, curated and added to the database as soon as possible. If you have any comments or suggestions, please e-mail us.


7. Tips for CRISPR/Cas9 of lncRNAs

1. CRISPRa and CRISPRi are more suitable for CRISPR/Cas9 of lncRNAs
CRISPRko has limited ability to perturb lncRNAs because small indels may affect function only rarely (unlike the functional consequences of frameshifts in protein-coding genes). Therefore, CRISPRa and CRISPRi technologies are the methods of choice for manipulating the expression of this class of genes.

2. Target lncRNA at conserved regions or splice junctions
Instead of being translated into protein, lncRNAs act directly as RNA. Targeting an lncRNA at conserved regions or splice junctions may be more effective.

3. Obtain the full-length of lncRNA by RACE
The transcription of lncRNAs is more complex than that of coding genes, and the lncRNAs in most databases are not full-length. We recommend confirming the full length of the lncRNA before designing sgRNAs.

4. Selected target sites should be sequenced
Selected target sites should be sequenced before experiments are carried out, because of genomic heterogeneity such as SNPs or indels. Design primers that amplify a product of about 300 bp spanning the target site and its flanks, then sequence it by the Sanger chain-termination method.

5. More sgRNAs
Simple but very useful. Targeting multiple loci increases the probability of success.
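For tip 4, the coordinates of the region to amplify can be worked out before primers are picked. The sketch below assumes 1-based inclusive coordinates and a window centred on the target site; the function name and the clamping behaviour at chromosome ends are choices made for this illustration. Actual primer selection within the window is left to a dedicated primer design tool.

```python
def amplicon_window(target_start, target_end, product_len=300, chrom_len=None):
    """Return (start, end) of a window of product_len bp around a target site.

    Coordinates are 1-based and inclusive. The window is centred on the
    target and shifted inward if it would run off either chromosome end.
    """
    target_len = target_end - target_start + 1
    if target_len <= 0:
        raise ValueError("target_end must not be smaller than target_start")
    if product_len < target_len:
        raise ValueError("product_len is shorter than the target site")

    # Split the flanking sequence as evenly as possible on both sides.
    left_flank = (product_len - target_len) // 2
    start = max(1, target_start - left_flank)
    end = start + product_len - 1

    # Shift the window left if it extends past the end of the chromosome.
    if chrom_len is not None and end > chrom_len:
        end = chrom_len
        start = max(1, end - product_len + 1)
    return start, end
```

For a 20-nt target at positions 1000 to 1019, `amplicon_window(1000, 1019)` yields a 300-bp window with 140 bp of flank on each side; forward and reverse primers would then be placed at the two ends of that window.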



References

1. Ho TT, Zhou N, Huang J, Koirala P, Xu M, Fung R, Wu F, Mo YY: Targeting non-coding RNAs with the CRISPR/Cas9 system in human cell lines. Nucleic Acids Res 2015, 43(3):e17.

2. Goyal A, Myacheva K, Gross M, Klingenberg M, Duran Arque B, Diederichs S: Challenges of CRISPR/Cas9 applications for long non-coding RNA genes. Nucleic Acids Res 2017, 45(3):e12.

3. Varshney GK, Zhang S, Pei W, Adomako-Ankomah A, Fohtung J, Schaffer K, Carrington B, Maskeri A, Slevin C, Wolfsberg T et al: CRISPRz: a database of zebrafish validated sgRNAs. Nucleic Acids Res 2016, 44(D1):D822-826.

4. Doench JG: Am I ready for CRISPR? A user's guide to genetic screens. Nat Rev Genet 2018, 19(2):67-80.