Sequence-to-Sequence (Seq2Seq) Data

Seq2Seq refers to a scenario where both the input and the output of a model are ordered sequences. We aren't just predicting a single value (like "Will the price go up?"), but rather mapping one series of events to another.

Standard neural networks (like a simple Feed-Forward network) require a fixed number of input features and produce a fixed number of outputs. However, financial data is fluid. You might have 20 days of S&P 500 data and want to predict the next 1 hour, or the next 5 days.

The Encoder-Decoder architecture acts as a "bridge" that compresses any number of inputs into a single mathematical representation and then expands it into the desired number of outputs.
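The "bridge" idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the tanh recurrent cell, the weight names (`W_xh`, `W_hh_enc`, and so on), the hidden size of 8, and the random untrained weights are all choices made here, not something the text prescribes. The point is that inputs of different lengths collapse to the same fixed-size vector, and the decoder can be unrolled for any number of output steps.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8      # size of the fixed representation (illustrative choice)
N_FEATURES = 1  # one value per time step, e.g. a closing price

# Untrained random weights: placeholders for parameters a real model would learn.
W_xh = rng.normal(scale=0.1, size=(N_FEATURES, HIDDEN))
W_hh_enc = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_hh_dec = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_yh = rng.normal(scale=0.1, size=(1, HIDDEN))  # feeds the previous output back in
W_hy = rng.normal(scale=0.1, size=(HIDDEN, 1))

def encode(x_seq):
    """Compress a (T, N_FEATURES) sequence into one (HIDDEN,) vector."""
    h = np.zeros(HIDDEN)
    for x_t in x_seq:
        h = np.tanh(x_t @ W_xh + h @ W_hh_enc)
    return h

def decode(context, n_steps):
    """Expand the context vector into n_steps outputs, one step at a time."""
    h = context
    y = np.zeros(1)  # dummy "previous output" for the first step
    outputs = []
    for _ in range(n_steps):
        h = np.tanh(y @ W_yh + h @ W_hh_dec)
        y = h @ W_hy
        outputs.append(y[0])
    return np.array(outputs)

long_input = rng.normal(size=(20, N_FEATURES))  # e.g. 20 time steps
short_input = rng.normal(size=(7, N_FEATURES))  # e.g. 7 time steps

print(encode(long_input).shape, encode(short_input).shape)  # (8,) (8,)
print(decode(encode(long_input), 5).shape)                  # (5,)
print(decode(encode(short_input), 1).shape)                 # (1,)
```

Because the weights are random, the output values themselves are meaningless; only the shapes matter here.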

What is an Encoder-Decoder?

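Think of it as a two-part translation system: an Encoder that reads the input sequence and compresses it into a single representation, and a Decoder that expands that representation into the output sequence.

[Figure: Pasted image 20260401005118.png]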

The Context Vector (C) is a bottleneck representation that compresses the relevant information from an input sequence $X = (x_1, \dots, x_T)$ into a fixed-dimensional vector.
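One common way to write this down, assuming a recurrent encoder, is shown here. The update function $f_\theta$, its parameters $\theta$, the zero initial state, and the dimension $d$ are notation introduced for illustration; the hidden state $h_t$ is the encoder's running summary after step $t$.

```latex
h_0 = \mathbf{0}, \qquad
h_t = f_\theta(h_{t-1}, x_t) \quad \text{for } t = 1, \dots, T, \qquad
C = h_T \in \mathbb{R}^{d}
```

Whatever the length $T$ of the input, $C$ always has the same dimension $d$.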

Example

To illustrate how the Encoder-Decoder and Transformer architectures work, let's use a specific financial forecasting task:
Predicting the next 3 days of a stock's closing price based on the previous 5 days of trading data.

  • Input (X): 5 days of (Price, Volume) pairs.
  • Output (Y): 3 days of predicted Prices.
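To make the task concrete, here is a minimal sketch of how such (X, Y) pairs could be cut from a longer history with a sliding window. The helper name `make_windows` is hypothetical, and the 30 days of prices and volumes are synthetic placeholders, not real market data.

```python
import numpy as np

def make_windows(data, n_in=5, n_out=3):
    """Slice a (days, 2) array of (price, volume) rows into Seq2Seq pairs.

    Returns X with shape (samples, n_in, 2) and Y with shape (samples, n_out),
    where Y holds only the price column for the days right after each window.
    """
    X, Y = [], []
    for start in range(len(data) - n_in - n_out + 1):
        X.append(data[start : start + n_in])
        Y.append(data[start + n_in : start + n_in + n_out, 0])
    return np.array(X), np.array(Y)

# Synthetic placeholder data: 30 days of made-up prices and volumes.
rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.normal(size=30))
volumes = rng.integers(1_000_000, 2_000_000, size=30)
data = np.column_stack([prices, volumes])

X, Y = make_windows(data)
print(X.shape, Y.shape)  # (23, 5, 2) (23, 3)
```

Each sample in X is one 5-day window of (Price, Volume) pairs, and the matching row of Y is the 3 closing prices that follow it.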

Imagine you are a senior trader. You have an analyst (the Encoder) who watches the market for 5 days. You don't want to see every single tick or trade; you want a summary that tells you everything you need to know to make a 3-day forecast.

The Financial Input (X)

Let's say our 5-day window looks like this:

The Encoding Process

The Encoder processes the five days one at a time. By Day 5, it doesn't just "remember" Day 5; it has updated its internal hidden state $h_t$ five times. The final state, $h_5$, is the Context Vector.

While the raw data is a sequence of prices and volumes, $h_5$ is a single, fixed-length vector of real numbers. If our model's "hidden dimension" is 128, then $h_5$ is simply a list of 128 numbers.

Mathematically, these numbers don't represent "Price" or "Volume" directly. Instead, they represent High-Level Features (or latent variables) that the model has learned are important for forecasting.

To visualize it, imagine $h_5$ looks something like this:

$h_5 = [0.12, 0.85, 2.31, 0.04, \dots, 1.12]$

Each "slot" in that 128-number vector might correspond to a specific market concept the neural network has discovered, such as:

| Stage | Data Shape | Representation |
| --- | --- | --- |
| Input (X) | (5, 2) | 5 days, each with (Price, Volume) |
| Processing | (5, 128) | The hidden state at each of the 5 steps |
| Context (C) | (1, 128) | The final state $h_5$ only |
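The shapes in this table can be reproduced with a small NumPy sketch. It assumes a plain tanh recurrent cell with random, untrained weights and a random stand-in for the 5-day window, so the numbers it produces carry no market meaning; the weight names `W_xh`, `W_hh`, and `b_h` are illustrative.

```python
import numpy as np

SEQ_LEN, N_FEATURES, HIDDEN = 5, 2, 128

rng = np.random.default_rng(0)
# Random stand-ins for learned weights and for a standardised 5-day window.
W_xh = rng.normal(scale=0.1, size=(N_FEATURES, HIDDEN))
W_hh = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
b_h = np.zeros(HIDDEN)
X = rng.normal(size=(SEQ_LEN, N_FEATURES))

h = np.zeros(HIDDEN)  # initial hidden state
states = []
for x_t in X:
    # One update per day: mix today's (price, volume) with the running summary.
    h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
    states.append(h)

states = np.stack(states)  # the hidden state at each of the 5 steps
context = states[-1:]      # the final state only, keeping a leading axis

print(X.shape, states.shape, context.shape)  # (5, 2) (5, 128) (1, 128)
```

The loop body runs five times, once per day, and only the last hidden state is kept as the context.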

The number 128 isn't a fixed mathematical constant; it is a hyperparameter chosen by the person designing the model.
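As a rough illustration of what that choice costs, the sketch below counts the weights of a single-layer tanh recurrent encoder for a few hidden sizes. The cell type and the helper name `rnn_encoder_param_count` are assumptions made here; other cell types (LSTM, GRU) have more parameters per hidden unit.

```python
def rnn_encoder_param_count(n_features, hidden_dim):
    """Count weights in one tanh RNN cell: input matrix, recurrent matrix, bias."""
    input_weights = n_features * hidden_dim
    recurrent_weights = hidden_dim * hidden_dim
    bias = hidden_dim
    return input_weights + recurrent_weights + bias

# Two input features per day: (Price, Volume).
for hidden_dim in (32, 128, 512):
    print(hidden_dim, rnn_encoder_param_count(2, hidden_dim))
# 32 1120
# 128 16768
# 512 263680
```

The recurrent matrix grows with the square of the hidden dimension, so a larger context vector gives the model more capacity but also more parameters to fit.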