The Softmax output function transforms a previous layer's output into a vector of probabilities. It is commonly used for multiclass classification. Given an input vector $x$ and a weighting vector $w$ we have:

$$ P(y=j \mid{x}) = \frac{e^{x^{T}w_{j}}}{\sum^{K}_{k=1}e^{x^{T}wk}} $$

Source: Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition