The goal of this project is to build a next-generation replacement for the BioMart tool that provides a way to download custom reports of genes, transcripts,...
New FAANG backend with Elasticsearch and GraphQL
Sunny Tarawade
Current limitations: The current backend for the Functional Annotation of Animal Genomes (FAANG) project provides users with a public REST API to...
Using Machine Learning to Identify and Classify Repeat Features
Yantong
A number of tools exist for identifying repeat features, but it remains a problem that the DNA sequence of some genes can be identified as being a...
Investigating and Implementing Compact Data Representation of Homology Relationship
KevinGao
A key challenge surrounding modern bioinformatics is to manage and store the growing amount of biological data with both space efficiency and...
Extract important information from scientific papers
Malay Joshi
During GSoC 2021, a Named Entity Recognition (NER) system based on BioBERT and RegEx/string matching was developed to recognize and extract...
GSoC 2022 Proposal - Extract text from tables in Scientific Papers by Kshitij Soni
kshitijsoni
PyTesseract is really helpful. The first time I came across it, I used it directly to detect a short piece of text, and the result was satisfying. Then,...