Problem Context: I'm building a Morse code audio decoder using a CNN-BiLSTM with CTC loss. My current 4-layer model reaches a Levenshtein distance of ≈0.6, but adding a 5th CNN layer to push performance further caused training to collapse (loss stuck at ~4). I'm looking for practical suggestions to reduce the error rate.

Current Architecture (Working 4-Layer):

import torch.nn as nn

# Inside the model's __init__: four Conv2d -> BatchNorm -> ReLU -> MaxPool2d blocks,
# followed by a bidirectional LSTM over the flattened CNN features.
self.cnn = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2),
    # ... 3 more blocks ...
)
# per-time-step feature = 128 CNN channels x 8 remaining frequency bins
self.rnn = nn.LSTM(128 * 8, 256, bidirectional=True)
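
For context, the glue between CNN and LSTM is roughly the following (simplified sketch; the shapes assume a 128-bin spectrogram input, and self.fc is the final linear projection to the CTC alphabet, not shown above):

# Simplified forward pass: CNN features are flattened per time step, fed time-major
# into the BiLSTM, then projected to per-frame log-probabilities for nn.CTCLoss.
def forward(self, x):                      # x: (batch, 3, 128, T)
    f = self.cnn(x)                        # (batch, 128, 8, T/16) after four MaxPool2d(2)
    f = f.permute(3, 0, 1, 2).flatten(2)   # (T/16, batch, 128*8), time-major for the LSTM
    out, _ = self.rnn(f)                   # (T/16, batch, 512)
    return self.fc(out).log_softmax(-1)    # (T/16, batch, n_classes) log-probs for CTC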

Specific Questions:

Architecture Improvements:

Would replacing later CNN layers with dilated convolutions better preserve temporal features?
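
To make this concrete, this is the kind of swap I'm imagining for the last block: dilation along the time axis instead of a fourth pooling of it, so the LSTM still sees a reasonably high frame rate (the 96 -> 128 channel counts are placeholders, not my actual settings):

import torch.nn as nn

# Hypothetical replacement for the 4th Conv/MaxPool block: dilation along the time
# axis grows the receptive field without throwing away time resolution, and the pool
# only halves frequency.
dilated_block = nn.Sequential(
    nn.Conv2d(96, 128, kernel_size=3, padding=(1, 2), dilation=(1, 2)),
    nn.BatchNorm2d(128),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(2, 1)),   # halve frequency only; keep the time axis intact
)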

Is there an optimal ratio between CNN depth and LSTM complexity for this task?

Noise Robustness:

Which techniques work best for Morse decoding in noisy conditions?

Additive noise layers during training?
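
What I mean by that option, roughly (the SNR range is a guess and the module name is made up):

import torch
import torch.nn as nn

class AdditiveNoise(nn.Module):
    """Training-time augmentation: add white noise to the input at a random SNR."""
    def __init__(self, snr_db_range=(0.0, 20.0)):
        super().__init__()
        self.snr_db_range = snr_db_range

    def forward(self, x):                                # x: (batch, channels, freq, time)
        if not self.training:
            return x                                     # no noise at eval time
        snr_db = torch.empty(x.size(0), 1, 1, 1, device=x.device).uniform_(*self.snr_db_range)
        signal_power = x.pow(2).mean(dim=(1, 2, 3), keepdim=True)
        noise_power = signal_power / 10 ** (snr_db / 10)
        return x + torch.randn_like(x) * noise_power.sqrt()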

Learnable spectral suppression?
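
And what I have in mind for this one: a per-frequency-bin gate learned jointly with the rest of the network (the 128-bin count and the module name are my own assumptions):

import torch
import torch.nn as nn

class SpectralGate(nn.Module):
    """Learnable per-bin gate on the spectrogram; my reading of 'learnable spectral suppression'."""
    def __init__(self, n_bins=128):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(1, 1, n_bins, 1))   # one weight per frequency bin

    def forward(self, x):                    # x: (batch, channels, n_bins, time)
        return x * torch.sigmoid(self.gate)  # sigmoid keeps the gate in (0, 1)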

Sequence Modeling:

Would adding attention between CNN and LSTM help?
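
By this I mean something like a single self-attention layer over the time axis, applied to the flattened CNN features just before the BiLSTM (head count and dropout here are arbitrary):

# In __init__ (hypothetical): self-attention over time on the (time, batch, 128*8) features.
self.pre_rnn_attn = nn.MultiheadAttention(embed_dim=128 * 8, num_heads=8, dropout=0.1)

# In forward(), just before self.rnn, with f shaped (time, batch, 128*8):
#   f, _ = self.pre_rnn_attn(f, f, f)
#   out, _ = self.rnn(f)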

Should I try hybrid CTC/attention loss?
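
i.e. keeping the CTC head, adding a small attention decoder, and interpolating the two losses, roughly as below (decoder_logits, decoder_targets and pad_id are placeholders for whatever the decoder would produce):

import torch.nn.functional as F

# Hybrid objective sketch: mix alignment-free CTC loss with the attention decoder's
# cross-entropy. lambda_ctc would need tuning.
lambda_ctc = 0.3
ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, zero_infinity=True)
att = F.cross_entropy(decoder_logits.transpose(1, 2), decoder_targets, ignore_index=pad_id)
loss = lambda_ctc * ctc + (1 - lambda_ctc) * att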

Training Tricks:

What curriculum learning strategies work for Morse?
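
One curriculum I've been considering, sketched below: start on short, clean transmissions and widen the pool every few epochs (the msg_len/snr_db fields and the schedules are just illustrative):

from torch.utils.data import SubsetRandomSampler

def curriculum_indices(dataset, epoch,
                       max_len_schedule=(5, 10, 20, 40),      # max characters per message
                       min_snr_schedule=(20, 10, 5, 0)):      # minimum SNR in dB
    """Return the sample indices allowed at this curriculum stage (fields are hypothetical)."""
    stage = min(epoch // 5, len(max_len_schedule) - 1)        # advance one stage every 5 epochs
    max_len, min_snr = max_len_schedule[stage], min_snr_schedule[stage]
    return [i for i, ex in enumerate(dataset)
            if ex["msg_len"] <= max_len and ex["snr_db"] >= min_snr]

# e.g. DataLoader(dataset, sampler=SubsetRandomSampler(curriculum_indices(dataset, epoch)))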

Are there better ways to initialize the LSTM for CTC?
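
For reference, the kind of thing I mean by "initialization", sketched with my own placeholder names (self.fc is the projection head, blank_idx the CTC blank index, default 0): forget-gate biases set to 1, and the output layer nudged toward the blank symbol.

# Set the LSTM forget-gate biases to 1 (PyTorch packs biases as [input, forget, cell, output]).
for name, param in self.rnn.named_parameters():
    if "bias" in name:
        hidden = param.size(0) // 4
        param.data[hidden:2 * hidden].fill_(1.0)

# Bias the output projection toward the blank symbol so early CTC training stays stable.
nn.init.constant_(self.fc.bias, 0.0)
self.fc.bias.data[blank_idx] = 2.0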

Alternative Approaches:

Would a U-Net style encoder-decoder outperform CNN-BiLSTM?
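
Roughly what I picture by that (a toy two-level sketch with arbitrary channel counts; the real thing would be deeper): an encoder that downsamples, a decoder that upsamples back with skip connections, and a CTC head reading the decoder output along time.

import torch
import torch.nn as nn

class TinyUNetCTC(nn.Module):
    """Toy U-Net-style encoder-decoder with a CTC head; channel counts are arbitrary."""
    def __init__(self, n_classes):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.down2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.fuse = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):                        # x: (batch, 3, freq, time)
        d1 = self.down1(x)                       # (batch, 32, freq/2, time/2)
        d2 = self.down2(d1)                      # (batch, 64, freq/4, time/4)
        u1 = self.up1(d2)                        # back to (batch, 32, freq/2, time/2)
        f = self.fuse(torch.cat([u1, d1], 1))    # skip connection from the encoder
        logits = self.head(f).mean(dim=2)        # average out frequency -> (batch, classes, time/2)
        return logits.permute(2, 0, 1).log_softmax(-1)   # (time/2, batch, classes) for CTC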

Are there lightweight transformer variants suitable for this task?
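
By "lightweight" I mean something on the scale of a couple of encoder layers dropped in where the BiLSTM sits now, e.g. (layer/head counts arbitrary):

# Hypothetical drop-in for the BiLSTM: a small TransformerEncoder over the same
# (time, batch, 128*8) features; num_layers / nhead / dim_feedforward are arbitrary.
encoder_layer = nn.TransformerEncoderLayer(d_model=128 * 8, nhead=4, dim_feedforward=512, dropout=0.1)
self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
# in forward(): out = self.encoder(f)   # f: (time, batch, 128*8) -> same shape out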
