Custom output such as exponential function? #167
I think it should be possible with basic operations like matrix multiplication, addition, and exp.
I'm not sure how to get this working. As an example, suppose I have a net with 1 hidden layer of 10 nodes and an output layer of 4 nodes, and suppose that the activation of the output layer should be the exponential function. Thanks in advance.

net = @mx.chain mx.Variable(:data) =>
mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
mx.Activation(name = :fc1_out, act_type = :relu) =>
mx.FullyConnected(name = :fc2_in, num_hidden = 4) =>
mx.Activation(name = :fc2_out, act_type = :softrelu)
You can use symbolic calculations to get what you need.

using MXNet
# net without activation layer
net = @mx.chain mx.Variable(:data) =>
mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
mx.Activation(name = :fc1_out, act_type = :relu) =>
mx.FullyConnected(name = :fc2_in, num_hidden = 4)
# Same net with exponential activation
net_out = mx.exp(net, name = :fc2_out)
# Here outputs of two nets are joined together, for easier comparison
net = mx.Group(net, net_out)
println(mx.list_arguments(net)) # arguments = input data + hidden layers weights
println(mx.list_outputs(net)) # outputs, since we use grouped net, we have two outputs: before and after activation
# some random data for forward propagation. Since we are not going to train model, no labels are needed
x = rand(Float32, 10, 2)
data = mx.ArrayDataProvider(:data => x, batch_size=2)
model = mx.FeedForward(net)
# usually you do not use this function directly, it is called internally from train function
mx.init_model(model, mx.UniformInitializer(), data=(10, 2))
# This is forward pass with some random weights. We get two arrays, before and after exponential activation
res = mx.predict(model, data)
# And we can check, that everything is fine
@assert all(abs(exp(res[1]) .- res[2]) .< 1e-6)

But for training a model, a loss layer is of course needed as usual.
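For instance, a minimal sketch of what attaching such a loss could look like (the :label variable name and the absolute-error loss are illustrative assumptions, not the only choice; the data provider would then have to supply labels under that same name):

label = mx.Variable(:label)                       # labels enter the symbolic graph as their own variable
netloss = mx.MakeLoss(mx.abs(net_out - label))    # e.g. absolute error between the exp output and the labels
model = mx.FeedForward(netloss, context = mx.cpu())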
Thanks, that works.
It's hard to tell without the source code and error messages. Can you give a link and say what exactly is not working?
Sure, code below, together with the resulting error. One obvious problem is that I don't know where

using MXNet
# Custom eval metric
import MXNet.mx: get, reset!, _update_single_output
type CustomMetric <: mx.AbstractEvalMetric
loss::Float64
n::Int
CustomMetric() = new(0.0, 0)
end
function mx.reset!(metric::CustomMetric)
metric.loss = 0.0
metric.n = 0
end
function mx.get(metric::CustomMetric)
[(:CustomMetric, metric.loss / metric.n)]
end
function mx._update_single_output(metric::CustomMetric, label::mx.NDArray, pred::mx.NDArray)
label = mx.copy(label)
pred = mx.copy(pred)
n = size(label, 1)
metric.n += n
for i = 1:n
z = 0.0
for j = 1:4
z += j * pred[j, i]
end
loss = sqrt(abs(z - label[i]))
metric.loss += loss
end
end
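For reference, the per-sample loss this metric accumulates is sqrt(|sum_{j=1..4} j * pred[j, i] - label[i]|). The same quantity written over plain Julia arrays, purely to illustrate the formula (the random data here is just a stand-in):

pred  = rand(Float32, 4, 8)   # 4 network outputs for 8 observations
label = rand(Float32, 1, 8)
avg_loss = sum(sqrt(abs(sum(j * pred[j, i] for j = 1:4) - label[1, i])) for i = 1:8) / 8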
# Base net
net = @mx.chain mx.Variable(:data) =>
mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
mx.Activation(name = :fc1_out, act_type = :softrelu) =>
mx.FullyConnected(name = :fc2_in, num_hidden = 4)
netout = mx.exp(net, name = :fc2_out)
# data
x = rand(Float32, 1, 8) # 8 observations of 1 variable
y = exp(x) + 2.0 * exp(0.5 * x) + 3.0 * exp(0.3 * x) + 4.0 * exp(0.25 * x)
# Connect net, data and hyperparameters
batch_size = 4
train_prov = mx.ArrayDataProvider(x, y; batch_size = batch_size)
eval_prov = mx.ArrayDataProvider(x, y; batch_size = batch_size)
# predictions from model with random parameters
model = mx.FeedForward(netout)
mx.init_model(model, mx.UniformInitializer(), data = (1, 8))
res = mx.predict(model, eval_prov)
# train
netout = mx.MakeLoss(netout)
mdl = mx.FeedForward(netout, context = mx.cpu())
opt = mx.SGD(lr = 0.1, momentum = 0.9, weight_decay = 0.00001) # Optimizing algorithm
mx.fit(mdl, opt, train_prov, n_epoch = 2, eval_data = eval_prov, eval_metric = CustomMetric())

And the resulting error:

ERROR: MXNet.mx.MXError("[16:46:47] src/symbol/symbol.cc:155: Symbol.InferShapeKeyword argument name softmax_label not found.\nCandidate arguments:\n\t[0]data\n\t[1]fc1_in_weight\n\t[2]fc1_in_bias\n\t[3]fc2_in_weight\n\t[4]fc2_in_bias\n")
in macro expansion at /home/jock/.julia/v0.5/MXNet/src/base.jl:58 [inlined]
in _infer_shape(::MXNet.mx.SymbolicNode, ::Array{AbstractString,1}, ::Array{UInt32,1}, ::Array{UInt32,1}) at /home/jock/.julia/v0.5/MXNet/src/symbolic-node.jl:276
in #infer_shape#214(::Array{Any,1}, ::Function, ::MXNet.mx.SymbolicNode) at /home/jock/.julia/v0.5/MXNet/src/symbolic-node.jl:319
in (::MXNet.mx.#kw##infer_shape)(::Array{Any,1}, ::MXNet.mx.#infer_shape, ::MXNet.mx.SymbolicNode) at ./<missing>:0
in #init_model#931(::Bool, ::Array{Any,1}, ::Function, ::MXNet.mx.FeedForward, ::MXNet.mx.UniformInitializer) at /home/jock/.julia/v0.5/MXNet/src/model.jl:90
in (::MXNet.mx.#kw##init_model)(::Array{Any,1}, ::MXNet.mx.#init_model, ::MXNet.mx.FeedForward, ::MXNet.mx.UniformInitializer) at ./<missing>:0
in _init_model(::MXNet.mx.FeedForward, ::MXNet.mx.ArrayDataProvider, ::MXNet.mx.UniformInitializer, ::Bool) at /home/jock/.julia/v0.5/MXNet/src/model.jl:258
in #fit#954(::Array{Any,1}, ::Function, ::MXNet.mx.FeedForward, ::MXNet.mx.SGD, ::MXNet.mx.ArrayDataProvider) at /home/jock/.julia/v0.5/MXNet/src/model.jl:355
in (::MXNet.mx.#kw##fit)(::Array{Any,1}, ::MXNet.mx.#fit, ::MXNet.mx.FeedForward, ::MXNet.mx.SGD, ::MXNet.mx.ArrayDataProvider) at ./<missing>:0
There are a few things that should be considered.
Despite the fact that you are unable to add a custom loss operator, in this exact task you can get around it with the following trick: the summation can be expressed inside the network itself, as a FullyConnected layer with frozen gradients. You may see this gist for the details of the realization: https://gist.github.com/Arkoniak/5402ddf4d272d2c32cc74343d5ce1793. Yet, maybe I am overcomplicating the problem and a simpler solution exists :-)
Thanks again, much appreciated. In response to your points above:
Well, for the most part the answers are in the gist from the previous comment.
The network's output is what is optimized. The main idea is the following: you build a network with the structure

Net with loss output = Input -> Calculations -> Result -> Loss calculation -> Loss output

All of these are symbolic calculations and may of course include more than one step. For example, in your case the loss output consists of the following steps: exponential activation, multiplication by a weight matrix, subtraction from the labels, abs and square root. In the gist this is

netexp = @mx.chain net =>
mx.exp(name = :fc2_out) =>
mx.FullyConnected(name = :output1, num_hidden = 1, attrs = Dict(:grad => "freeze"))
netloss = mx.sqrt(mx.abs(netexp - label))
netloss = mx.MakeLoss(netloss)

But from the solver's point of view this is unimportant: it simply sees one big network with lots of internal steps. After training you obtain a net that takes your input and produces the loss output, i.e. something that is close to zero.

In the second step you construct a new net:

Output Net = Input -> Calculations -> Result

which is exactly the "Net with loss output" without the loss calculations. Then you transfer the weights from the full network, possibly deleting the excessive ones. In the gist it is done as

model_coeff = mx.FeedForward(net)
model_coeff.arg_params = Dict(k => v for (k, v) in mdl.arg_params)
delete!(model_coeff.arg_params, :output1_weight) # these were used in loss calculation and not needed for predictions.
delete!(model_coeff.arg_params, :output1_bias)
model_coeff.aux_params = mdl.aux_params

The output of this new net is then exactly what you need in the first place. In the case of predefined loss outputs, like SoftmaxOutput, the same things are done, with the following differences:
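In either case, prediction with the stripped-down model is then just the usual predict call, e.g. (a sketch reusing the names from the snippet above; the provider only needs the input data):

pred_prov = mx.ArrayDataProvider(:data => x, batch_size = 4)
pred = mx.predict(model_coeff, pred_prov)   # outputs of the Result node, without the loss part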
Ah OK. To summarize, my previous understanding was that the loss layer and the network could be specified separately, with the construction of the full network and the stripping away of the loss output handled internally.
Yes, exactly. I think it's quite possible to hide all of this stuff from the user, so you'd be able to proceed exactly as in your initial understanding, i.e. specifying the loss layer and the network separately, with the construction of the full network, the stripping away of the loss results, etc. done internally. I presume it'll be implemented in a high-level module API.
Hi there,
I am trying to define the activation of the last layer as the exponential function. If x is the input vector to a node in the last layer, the output of the node would be exp(w*x + b). Is this possible?
Thanks!
Jock