My research vision is to incorporate commonsense knowledge and social intelligence into an embodied agent by integrating information from multiple modalities. The questions I am broadly investigating in my research are:
How can we refine visual grounding so that it (a) understands visual information in terms of relations, attributes, and their correlations, (b) ensures proper semantic interpretation and alignment of visual content, and (c) is capable of grounding information locally and executing complex reasoning to summarize the global context?
How can we establish an efficient, interpretable, and compositional connection among language, vision, and social cognition, so that (a) interactions between natural language and vision support query understanding and answer formulation, and (b) connections between vision and social cognition help interpret intents and interactions?
I am currently a second-year master's student (MLT) and an incoming Ph.D. student at the Language Technologies Institute, Carnegie Mellon University. I am working on multimodal commonsense reasoning and social intelligence modeling under the supervision of Professor Eric Nyberg.
Apart from exploring a plethora of exciting research problems, I love reading books, sketching, working with charcoal, and writing and reciting poems. Robert Frost is my inspiration for writing: "And miles to go before I sleep, And miles to go before I sleep"...
Master of Language Technologies (2021 - Present)
Carnegie Mellon University
Undergraduate Student in Computer Science (2016 - 2021)
Bangladesh University of Engineering and Technology
The high network communication cost of synchronizing weights and gradients in geo-distributed data analysis with DNN models erodes the benefits of advances in computation and optimization techniques. Quantization methods for weights, gradients, or both have proved to be infeasible for distributed training across multiple data centers around the world. We introduce WeightGrad, which acknowledges the limitations of quantization: it provides loss-aware weight-quantized networks with quantized gradients for local convergence, and for global convergence it dynamically eliminates insignificant communication between data centers while still guaranteeing the correctness of the DNN models. Our experiments on our prototype of WeightGrad running across 3 Amazon EC2 regions show that WeightGrad provides a 1.06% gain in top-1 accuracy, a 5.36× speedup over the baseline, and a 1.4×-2.26× speedup over four state-of-the-art distributed ML systems.
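To make the two key ideas concrete, here is a minimal Python sketch (not the actual WeightGrad implementation) of gradient quantization plus a significance check that decides whether a local update is worth communicating across data centers; the function names, bit width, and threshold are illustrative assumptions.

```python
# Illustrative sketch only: uniform gradient quantization and a simple
# significance test for skipping cross-data-center synchronization.
# Names, bit width, and threshold are assumptions, not WeightGrad's actual code.
import numpy as np

def quantize_gradient(grad, num_bits=8):
    """Uniformly quantize a gradient tensor to signed num_bits integers plus a scale."""
    scale = np.max(np.abs(grad)) / (2 ** (num_bits - 1) - 1) + 1e-12
    q = np.round(grad / scale).astype(np.int8)
    return q, scale

def dequantize_gradient(q, scale):
    """Recover an approximate float gradient from its quantized form."""
    return q.astype(np.float32) * scale

def is_significant(grad, threshold=1e-3):
    """Communicate only when the average update magnitude exceeds a threshold."""
    return np.abs(grad).mean() > threshold

# Toy usage: a "data center" ships its gradient only when it matters.
local_grad = np.random.randn(1024).astype(np.float32) * 1e-2
if is_significant(local_grad):
    q, scale = quantize_gradient(local_grad)
    # ... transmit (q, scale) to remote data centers ...
    approx_grad = dequantize_gradient(q, scale)
```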
Using the multilingual pre-trained model XLM-RoBERTa, we develop a model for contextual, commonsense-based question answering (QA). We propose a new embedding layer, preceded by a topic modeling structure, to increase the accuracy of context-based question answering for low-resource languages. Multiple persons, organizations, or events in the context create ambiguity while generating an answer; we address this by applying an attention layer followed by an entity extraction layer to ensure query-based tagging, which narrows the search space and eliminates ambiguity. We emphasize cost-effective, well-structured fine-tuning steps rather than modifying the pre-trained model to address the challenges of incorporating commonsense into language models.
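As a rough illustration of the backbone this project builds on, the following sketch loads XLM-RoBERTa for extractive QA via the HuggingFace transformers library; the topic-modeling embedding layer and the attention/entity-extraction layers described above are not shown, and the question/context strings are made up.

```python
# Sketch of the pre-trained backbone only; the project's additional layers
# (topic-modeling embedding, attention, entity extraction) are not included.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForQuestionAnswering.from_pretrained("xlm-roberta-base")

question = "Who founded the organization?"
context = "The organization was founded by a local teacher in 1916."
inputs = tokenizer(question, context, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# Decode the most likely answer span from the start/end logits.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax()) + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end]))
```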
In this work, we use Natural Language Processing (NLP) and Social Network Analysis (SNA) to study anonymized Twitter data. The NLP component performs the following procedures to analyze social media posts: keyword gathering, frequency analysis, information extraction, automatic categorization and clustering, automatic summarization, and finding associations within the data. The objective of these procedures is to find how stigma manifests itself on social media. The SNA component studies de-identified data to analyze and visualize existing network structures and to map the spread of misinformation. The objective here is to understand how misinformation and stigmatization spread and change across social media channels. I am working on the NLP part of this project, which includes experimenting with different topic modeling approaches (LDA, GSDMM, BTM, lda2vec, BERT) for short texts like tweets, building advanced classifiers as well as DNN architectures, and comparing them.
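As one concrete example of the topic modeling experiments, here is a minimal LDA sketch over tweet-like texts, assuming the gensim library; the preprocessing is heavily simplified, and GSDMM, BTM, lda2vec, and the BERT-based variants are not shown.

```python
# Minimal LDA example on short texts, assuming gensim; preprocessing is toy-level.
from gensim import corpora
from gensim.models import LdaModel

tweets = [
    "vaccine rumors spreading fast online",
    "stigma around the illness keeps people from getting tested",
    "misinformation about the outbreak on social media",
]
tokenized = [t.lower().split() for t in tweets]

dictionary = corpora.Dictionary(tokenized)                # token -> id mapping
corpus = [dictionary.doc2bow(doc) for doc in tokenized]   # bag-of-words vectors

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)
for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)
```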
Phylogenetic trees depict the evolution of a set of taxa from their most recent common ancestor. A species tree is a phylogenetic tree that models the evolutionary history of a set of species, while a gene tree models the genealogy of a gene. In this work, we aim to predict a species tree from a set of gene trees. A wide array of algorithms and computer programs is available for inferring phylogenetic trees from various types of data. Although those algorithms work well for a fixed and small number of taxa, they perform poorly as the number of taxa increases. To address this limitation of existing state-of-the-art algorithms, we use embedding-based DNN models to predict a species tree from a set of gene trees over a wide range of taxa.
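To sketch what an embedding-based approach could look like here, the toy example below flattens each gene tree's leaf-to-leaf distance matrix into a feature vector, averages across gene trees, and feeds the result to a small network scoring candidate species-tree topologies; the representation, dimensions, and architecture are illustrative assumptions rather than the project's actual model.

```python
# Illustrative embedding of gene trees; not the project's actual architecture.
import numpy as np
import torch
import torch.nn as nn

def embed_gene_tree(dist_matrix: np.ndarray) -> np.ndarray:
    """Use the upper triangle of a leaf-to-leaf distance matrix as a feature vector."""
    iu = np.triu_indices_from(dist_matrix, k=1)
    return dist_matrix[iu]

# Toy input: 10 gene trees over 5 taxa, each given as a 5x5 distance matrix.
gene_trees = [np.random.rand(5, 5) for _ in range(10)]
features = np.mean([embed_gene_tree(d) for d in gene_trees], axis=0)

model = nn.Sequential(
    nn.Linear(features.size, 32),
    nn.ReLU(),
    nn.Linear(32, 3),   # scores over, e.g., 3 candidate species-tree topologies
)
scores = model(torch.tensor(features, dtype=torch.float32))
print(scores)
```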
The goal of this project is to build a mobile-based application for detecting clinical signs of malnutrition and dehydration, and to learn a new malnutrition and dehydration level detection scale based on the accuracy achieved by different clinical sign detection techniques. The application captures images and videos of specific body regions (face, lips, tongue, eyes, fontanelle, urine, and video of the skin turgor test) to detect clinical signs of malnutrition and dehydration. We develop a CNN model to extract features (e.g., body part detection, segmentation, color, border, shape, and texture of skin) from the images and videos, and build an SVM classifier to predict the malnutrition and dehydration level (mild, moderate, acute). This project is a collaboration with the International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b).
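The overall pipeline (CNN features feeding an SVM) can be sketched as follows, assuming torchvision and scikit-learn; the backbone, input size, and random toy data are placeholders rather than the trained models used in the project.

```python
# Sketch of CNN feature extraction followed by an SVM classifier;
# the backbone and data here are placeholders.
import torch
import torchvision.models as models
from sklearn.svm import SVC

# CNN backbone used only as a feature extractor (classification head removed).
cnn = models.resnet18()
cnn.fc = torch.nn.Identity()
cnn.eval()

def extract_features(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 3, 224, 224) preprocessed body-region crops."""
    with torch.no_grad():
        return cnn(images)

# Toy data: random "images" with labels 0/1/2 for mild/moderate/acute.
images = torch.randn(12, 3, 224, 224)
labels = [0, 1, 2] * 4
features = extract_features(images).numpy()

svm = SVC(kernel="rbf")
svm.fit(features, labels)
print(svm.predict(features[:3]))
```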