Research
My research focuses on computer vision and robot motion control in complex dynamic environments. I am working under the supervision of Dr. Yi Guo.
In the past, I worked with Dr. Sheikh Anowarul Fattah, Professor, Department of EEE, BUET. Our research focused on biomedical disease identification through computer vision and machine learning techniques. In addition, as a Research Engineer at Celloscope, I have been working on NLP tasks (such as implementing ASR, TTS, and an NLU-based chatbot) since December 2021. Some of our work has been published as research articles. I completed my undergraduate thesis under the supervision of Dr. Apratim Roy, Department of EEE, BUET. You can find a link to my undergraduate thesis paper below. I also worked as a research assistant at Bangladesh University of Engineering and Technology under the supervision of Dr. A. B. M. Alim Al Islam.
Current Works
Past Research
2023
- Neural Voice Banking System
We have developed the first voice banking system in Bangladesh here at Celloscope. It is a speech-to-speech engine that lets users perform
regular banking tasks (such as balance transfer and balance inquiry) using only their voice. The system responds with a
synthetic voice to provide a voice-to-voice conversation experience. It is currently integrated into the voice banking
app of Agrani Bank. We have integrated numerous state-of-the-art (SOTA) AI systems to ensure a safe and secure banking experience for users.
- Rail Cop
Derailment is a significant problem for rail systems in developing and underdeveloped countries. It can occur for
a number of reasons, such as natural disasters or unethical human activities. However, a complete derailment detection
system is not yet available on the market. Our approach focuses on developing a system that senses the vibrations
of the rail track and detects whether there is a derailment within a certain range of the track.
Our current prototype uses multichannel piezoelectric crystals to sense vibrations in the rail track and convert the mechanical signals
to electrical signals. To process the signals and make decisions, we have trained and deployed tinyML and EdgeML models on RP2040
and Atmel processors.
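As a rough illustration of the sensing-to-decision flow described above, the sketch below splits each vibration channel into windows, computes the RMS energy of every window, and flags windows whose energy is far above that channel's typical level. The trained tinyML and EdgeML models are not described here, so a simple threshold rule stands in for them; the function names, `window_size`, and `ratio_threshold` are hypothetical and chosen only for this example.

```python
import math


def window_rms(samples, window_size):
    """Split one channel into non-overlapping windows and return each window's RMS."""
    rms_values = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        rms_values.append(math.sqrt(sum(x * x for x in window) / window_size))
    return rms_values


def flag_anomalous_windows(channels, window_size, ratio_threshold):
    """Return sorted indices of windows where any channel's RMS exceeds
    ratio_threshold times that channel's median RMS.

    Assumption: this threshold rule is only a stand-in for a trained model.
    """
    flagged = set()
    for samples in channels:
        rms_values = window_rms(samples, window_size)
        if not rms_values:
            continue  # channel shorter than one window
        baseline = sorted(rms_values)[len(rms_values) // 2]  # (upper) median
        for index, value in enumerate(rms_values):
            if baseline > 0 and value > ratio_threshold * baseline:
                flagged.add(index)
    return sorted(flagged)
```

A median baseline is used so that a single loud window does not raise the reference level for its own channel.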
2022
- SpectroCardioNet: An Attention Based Deep Learning Network Using Triple-Spectrograms of PCG Signal for Heart Valve Disease Detection [PDF]
- IEEE Sensors Journal
- Sakib Chowdhury, Monjur Morshed, Shaikh Anowarul Fattah
The phonocardiogram (PCG) signal is used for the early detection of cardiovascular diseases as it captures the heart sound characteristics.
In this paper, a spectral-attention-based deep learning network, namely SpectroCardioNet, is proposed for the automatic detection of cardiac disease from the
spectrograms of PCG signals. From a given PCG audio signal, in view of simultaneously utilizing both time and frequency domain information,
a spectrogram, delta-spectrogram and double-delta-spectrogram are generated. The extracted triple-spectrogram representation is applied to the proposed
network as a three-channel 2D input, where it passes through spectral and sequential feature paths. In the spectral feature path, a spectral attention block is
designed to emphasize some regions in the spectrograms based on a deep attention network, and its output is then processed through the spectral pattern
detectors. On the other hand, in order to extract the temporal behavior of the frequency components of the spectrograms, a 1D-convolution-based sequential
feature extractor is also proposed. Extensive experimentation is carried out on two standard PCG datasets and a very satisfactory performance is achieved
in comparison to that obtained by some existing methods.
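The triple-spectrogram input mentioned in the abstract can be sketched as follows. This is a minimal, dependency-free illustration and not the paper's implementation: it assumes a log-magnitude short-time Fourier spectrogram with a Hann window, and it uses the regression-style delta formula common in speech processing. The frame size, hop, delta width, and function names are all assumptions made for this example.

```python
import cmath
import math


def spectrogram(signal, frame_size, hop):
    """Log-magnitude spectrogram: a list of frames, each a list of frequency bins."""
    # Symmetric Hann window
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):  # non-negative frequencies only
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            bins.append(math.log(abs(coeff) + 1e-10))
        frames.append(bins)
    return frames


def delta(frames, width=2):
    """Regression-based delta along the time axis; edge frames are repeated."""
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def triple_spectrogram(signal, frame_size=64, hop=32):
    """Stack spectrogram, delta and double-delta as channels x time x frequency."""
    base = spectrogram(signal, frame_size, hop)
    first = delta(base)
    second = delta(first)
    return [base, first, second]
```

The three returned channels share the same time and frequency dimensions, which is what allows them to be fed to a network as a single three-channel 2D input.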
- SHONGLAP: A Large Bengali Open-Domain Dialogue Corpus [PDF]
- LREC
- Syed Mostofa Monsur, Sakib Chowdhury, Md Shahrar Fatemi, Shafayat Ahmed, Muhammad Abdullah Adnan
We introduce SHONGLAP, a large annotated open-domain dialogue corpus in the Bengali language.
Due to the unavailability of high-quality dialogue datasets for low-resource languages like Bengali, existing neural open-domain dialogue systems suffer
from data scarcity. We propose a framework to prepare a large-scale open-domain dialogue dataset from publicly available multi-party discussion podcasts
and talk shows and to label them using weak-supervision techniques, which is particularly suitable for low-resource languages. Using the framework, we
prepared our corpus, which is the first reported Bengali open-domain dialogue corpus and can serve as a strong baseline for future work.
Experimental results show that using our corpus improves the performance of large language models (BanglaBERT) on downstream classification tasks during
fine-tuning.
- An Intelligent Pixelated Electrode Array for High Density Surface Electromyography Sensors [PDF]
- Undergrad Thesis
- Sakib Chowdhury, Dipayon Kumar Sikder, Apratim Roy
A surface electromyography (sEMG) sensor is a
non-invasive diagnostic tool for identifying muscle diseases
which is also utilized in portable smart devices to recognize
body mass compositions. In this article, we propose a novel
pixelated electrode array that can modulate electrode size and
leverage natural filtering property of electrodes to maximize
the signal-to-noise ratio in different body locations and
for individuals with varied skin features and composition.
While determining the ideal electrode layout, the suggested
concept significantly reduces the signal sample duration and
is able to avoid additional computational overhead once
the optimal arrangement is identified. Moreover, it offers
a substantial advantage over traditional signal processing
techniques, which need constant processing to filter out noise
and improve the intelligent signal with digital filters and
transceiver amplifiers. Because of truncated processing cost,
the proposed array is suitable for wearable devices driven
by low-power embedded microcontrollers, where high
computational requirements often prove to be prohibitive.
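The idea of choosing the electrode arrangement that maximizes the signal-to-noise ratio can be illustrated with a small sketch. The thesis's actual selection procedure is not described here, so this example makes its own assumptions: each candidate layout has one recording taken during muscle activity and one taken at rest, the rest recording is treated as noise, and the layout names and sample values are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a sampled signal."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(active_samples, rest_samples):
    """SNR in decibels, assuming the rest recording approximates the noise floor."""
    return 10 * math.log10(mean_power(active_samples) / mean_power(rest_samples))


def best_layout(recordings):
    """Pick the layout with the highest SNR.

    recordings maps a layout name to (active_samples, rest_samples).
    """
    return max(recordings, key=lambda name: snr_db(*recordings[name]))
```

Once the best layout has been found for a body location, it can be reused without recomputation, which matches the abstract's point about avoiding ongoing processing overhead.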
2021
- CovTANet: A Hybrid Tri-Level Attention-Based Network for Lesion Segmentation, Diagnosis, and Severity Prediction of COVID-19 Chest CT Scans [PDF]
- IEEE Transactions on Industrial Informatics
- Tanvir Mahmud, Md Jahin Alam, Sakib Chowdhury, Shams Nafisa Ali, Md Maisoon Rahman, Shaikh Anowarul Fattah, Mohammad Saquib
Rapid and precise diagnosis of COVID-19 is
one of the major challenges faced by the global community
to control the spread of this overgrowing pandemic. In this
article, a hybrid neural network is proposed, named CovTANet, to provide an end-to-end clinical diagnostic tool for
early diagnosis, lesion segmentation, and severity prediction of COVID-19 utilizing chest computed tomography (CT)
scans. A multiphase optimization strategy is introduced for
solving the challenges of complicated diagnosis at a very
early stage of infection, where an efficient lesion segmentation network is optimized initially, which is later integrated
into a joint optimization framework for the diagnosis and
severity prediction tasks providing feature enhancement of
the infected regions. Moreover, for overcoming the challenges with diffused, blurred, and varying shaped edges
of COVID lesions with novel and diverse characteristics,
a novel segmentation network is introduced, namely the tri-level attention-based segmentation network. This network
has significantly reduced semantic gaps in subsequent
encoding–decoding stages, with immense parallelization of
multiscale features for faster convergence providing considerable performance improvement over traditional networks. Furthermore, a novel tri-level attention mechanism
has been introduced, which is repeatedly utilized over the
network, combining channel, spatial, and pixel attention.
2020
- A RNN based parallel deep learning framework for detecting sentiment polarity from Twitter derived textual data [PDF]
- 2020 11th International Conference on Electrical and Computer Engineering (ICECE)
- Sakib Chowdhury, Md Latifur Rahman, Shams Nafisa Ali, Md Jahin Alam
Social media platforms have become one of the primary mediums of communication nowadays.
Along with communication, they are currently being utilized in a wide range of activities like digital marketing, customer care, e-learning etc.
The unceasing use of social media is generating a gigantic amount of textual data every day.
It is essential to properly analyze these data with consideration of the underlying human traits and sentiments in order to explore the full potential of these
platforms. However, sentiment analysis from text has been considered a challenging task because of the rapid use of informal and noisy words.
Updated and powerful word embeddings are being invented almost every two years so that machines can understand the underlying features
of linguistics. Each of these embedding techniques excels in different aspects. In this paper, we present a novel RNN-based sentiment polarity
detection framework which feeds the power of three different word embeddings (Word2Vec, GloVe and SSWE) into a single powerful network with three
parallel branches. The proposed network effectively utilizes the semantic, syntactic and sentiment-polarity-wise embeddings in word vectors, encoding
three major aspects of language from the viewpoint of extracting sentiment information from text. Posts collected from Twitter were used to train and
validate the proposed network. The results demonstrate that the proposed network containing multiple word embeddings configured in parallel outperforms
the single word vectorization techniques. Additionally, it shows comparable or better evaluation scores when compared to several contemporary
state-of-the-art models.