Custom output such as exponential function? #167
I think it should be possible with basic operations like matrix multiplication, addition, and exp.
I'm not sure how to get this working. As an example, suppose I have a net with 1 hidden layer of 10 nodes and an output layer of 4 nodes, and suppose that the activation of the output layer should be the exponential function. Thanks in advance.

net = @mx.chain mx.Variable(:data) =>
mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
mx.Activation(name = :fc1_out, act_type = :relu) =>
mx.FullyConnected(name = :fc2_in, num_hidden = 4) =>
mx.Activation(name = :fc2_out, act_type = :softrelu)
You can use symbolic calculations to get what you need.

using MXNet
# net without activation layer
net = @mx.chain mx.Variable(:data) =>
mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
mx.Activation(name = :fc1_out, act_type = :relu) =>
mx.FullyConnected(name = :fc2_in, num_hidden = 4)
# Same net with exponential activation
net_out = mx.exp(net, name = :fc2_out)
# Here outputs of two nets are joined together, for easier comparison
net = mx.Group(net, net_out)
println(mx.list_arguments(net)) # arguments = input data + hidden layers weights
println(mx.list_outputs(net)) # outputs, since we use grouped net, we have two outputs: before and after activation
# some random data for forward propagation. Since we are not going to train model, no labels are needed
x = rand(Float32, 10, 2)
data = mx.ArrayDataProvider(:data => x, batch_size=2)
model = mx.FeedForward(net)
# usually you do not use this function directly, it is called internally from train function
mx.init_model(model, mx.UniformInitializer(), data=(10, 2))
# This is forward pass with some random weights. We get two arrays, before and after exponential activation
res = mx.predict(model, data)
# And we can check, that everything is fine
@assert all(abs(exp(res[1]) .- res[2]) .< 1e-6)

But for training a model, a loss layer is of course needed as usual.
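For instance, a minimal sketch of what attaching such a loss could look like (the :label variable name and the absolute-error loss are illustrative assumptions, not the only choice; the data provider would then have to supply labels under that same name):

label = mx.Variable(:label)                       # labels enter the symbolic graph as their own variable
netloss = mx.MakeLoss(mx.abs(net_out - label))    # e.g. absolute error between the exp output and the labels
model = mx.FeedForward(netloss, context = mx.cpu())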
Thanks, that works.
It's hard to tell without the source code and error messages. Can you give a link and say what exactly is not working?
Sure, code below, together with the resulting error. One obvious problem is that I don't know where

using MXNet
# Custom eval metric
import MXNet.mx: get, reset!, _update_single_output
type CustomMetric <: mx.AbstractEvalMetric
loss::Float64
n::Int
CustomMetric() = new(0.0, 0)
end
function mx.reset!(metric::CustomMetric)
metric.loss = 0.0
metric.n = 0
end
function mx.get(metric::CustomMetric)
[(:CustomMetric, metric.loss / metric.n)]
end
function mx._update_single_output(metric::CustomMetric, label::mx.NDArray, pred::mx.NDArray)
label = mx.copy(label)
pred = mx.copy(pred)
n = size(label, 1)
metric.n += n
for i = 1:n
z = 0.0
for j = 1:4
z += j * pred[j, i]
end
loss = sqrt(abs(z - label[i]))
metric.loss += loss
end
end
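For reference, the per-sample loss this metric accumulates is sqrt(|sum_{j=1..4} j * pred[j, i] - label[i]|). The same quantity written over plain Julia arrays, purely to illustrate the formula (the random data here is just a stand-in):

pred  = rand(Float32, 4, 8)   # 4 network outputs for 8 observations
label = rand(Float32, 1, 8)
avg_loss = sum(sqrt(abs(sum(j * pred[j, i] for j = 1:4) - label[1, i])) for i = 1:8) / 8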
# Base net
net = @mx.chain mx.Variable(:data) =>
mx.FullyConnected(name = :fc1_in, num_hidden = 10) =>
mx.Activation(name = :fc1_out, act_type = :softrelu) =>
mx.FullyConnected(name = :fc2_in, num_hidden = 4)
netout = mx.exp(net, name = :fc2_out)
# data
x = rand(Float32, 1, 8) # 8 observations of 1 variable
y = exp(x) + 2.0 * exp(0.5 * x) + 3.0 * exp(0.3 * x) + 4.0 * exp(0.25 * x)
# Connect net, data and hyperparameters
batch_size = 4
train_prov = mx.ArrayDataProvider(x, y; batch_size = batch_size)
eval_prov = mx.ArrayDataProvider(x, y; batch_size = batch_size)
# predictions from model with random parameters
model = mx.FeedForward(netout)
mx.init_model(model, mx.UniformInitializer(), data = (1, 8))
res = mx.predict(model, eval_prov)
# train
netout = mx.MakeLoss(netout)
mdl = mx.FeedForward(netout, context = mx.cpu())
opt = mx.SGD(lr = 0.1, momentum = 0.9, weight_decay = 0.00001) # Optimizing algorithm
mx.fit(mdl, opt, train_prov, n_epoch = 2, eval_data = eval_prov, eval_metric = CustomMetric())

And the resulting error:

ERROR: MXNet.mx.MXError("[16:46:47] src/symbol/symbol.cc:155: Symbol.InferShapeKeyword argument name softmax_label not found.\nCandidate arguments:\n\t[0]data\n\t[1]fc1_in_weight\n\t[2]fc1_in_bias\n\t[3]fc2_in_weight\n\t[4]fc2_in_bias\n")
in macro expansion at /home/jock/.julia/v0.5/MXNet/src/base.jl:58 [inlined]
in _infer_shape(::MXNet.mx.SymbolicNode, ::Array{AbstractString,1}, ::Array{UInt32,1}, ::Array{UInt32,1}) at /home/jock/.julia/v0.5/MXNet/src/symbolic-node.jl:276
in #infer_shape#214(::Array{Any,1}, ::Function, ::MXNet.mx.SymbolicNode) at /home/jock/.julia/v0.5/MXNet/src/symbolic-node.jl:319
in (::MXNet.mx.#kw##infer_shape)(::Array{Any,1}, ::MXNet.mx.#infer_shape, ::MXNet.mx.SymbolicNode) at ./<missing>:0
in #init_model#931(::Bool, ::Array{Any,1}, ::Function, ::MXNet.mx.FeedForward, ::MXNet.mx.UniformInitializer) at /home/jock/.julia/v0.5/MXNet/src/model.jl:90
in (::MXNet.mx.#kw##init_model)(::Array{Any,1}, ::MXNet.mx.#init_model, ::MXNet.mx.FeedForward, ::MXNet.mx.UniformInitializer) at ./<missing>:0
in _init_model(::MXNet.mx.FeedForward, ::MXNet.mx.ArrayDataProvider, ::MXNet.mx.UniformInitializer, ::Bool) at /home/jock/.julia/v0.5/MXNet/src/model.jl:258
in #fit#954(::Array{Any,1}, ::Function, ::MXNet.mx.FeedForward, ::MXNet.mx.SGD, ::MXNet.mx.ArrayDataProvider) at /home/jock/.julia/v0.5/MXNet/src/model.jl:355
in (::MXNet.mx.#kw##fit)(::Array{Any,1}, ::MXNet.mx.#fit, ::MXNet.mx.FeedForward, ::MXNet.mx.SGD, ::MXNet.mx.ArrayDataProvider) at ./<missing>:0
There are a few things that should be considered.
Despite the fact that you are unable to add a custom loss operator, in this exact task you can get around it with the following trick: the summation can be expressed inside the network itself, as a FullyConnected layer with frozen gradients. You may see this gist for the details of the realization: https://gist.github.com/Arkoniak/5402ddf4d272d2c32cc74343d5ce1793. Yet, maybe I am overcomplicating the problem and a simpler solution exists :-)
Thanks again, much appreciated. In response to your points above:
Well, for the most part the answers are in the gist from the previous comment.
The network's output is what is optimized. The main idea is the following: you build a network with the structure

Net with loss output = Input -> Calculations -> Result -> Loss calculation -> Loss output

All of these are symbolic calculations and may of course include more than one step. For example, in your case the loss output consists of the following steps: exponential activation, multiplication by a weight matrix, subtraction from the labels, abs and square root. In the gist this is

netexp = @mx.chain net =>
mx.exp(name = :fc2_out) =>
mx.FullyConnected(name = :output1, num_hidden = 1, attrs = Dict(:grad => "freeze"))
netloss = mx.sqrt(mx.abs(netexp - label))
netloss = mx.MakeLoss(netloss)

But from the solver's point of view this is unimportant: it simply sees one big network with lots of internal steps. After training you obtain a net that takes your input and produces the loss output, i.e. something that is close to zero.

In the second step you construct a new net:

Output Net = Input -> Calculations -> Result

which is exactly the "Net with loss output" without the loss calculations. Then you transfer the weights from the full network, possibly deleting the excessive ones. In the gist it is done as

model_coeff = mx.FeedForward(net)
model_coeff.arg_params = Dict(k => v for (k, v) in mdl.arg_params)
delete!(model_coeff.arg_params, :output1_weight) # these were used in loss calculation and not needed for predictions.
delete!(model_coeff.arg_params, :output1_bias)
model_coeff.aux_params = mdl.aux_params

The output of this new net is then exactly what you need in the first place. In the case of predefined loss outputs, like SoftmaxOutput, the same things are done, with the following differences:
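In either case, prediction with the stripped-down model is then just the usual predict call, e.g. (a sketch reusing the names from the snippet above; the provider only needs the input data):

pred_prov = mx.ArrayDataProvider(:data => x, batch_size = 4)
pred = mx.predict(model_coeff, pred_prov)   # outputs of the Result node, without the loss part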
Ah OK. To summarize, my previous understanding was that the loss layer and the network could be specified separately, with the construction of the full network and the stripping away of the loss output handled internally.
Yes, exactly. I think it's quite possible to hide all of this stuff from the user, so you'd be able to proceed exactly as in your initial understanding, i.e. specifying the loss layer and the network separately, with the construction of the full network, the stripping away of the loss results, etc. done internally. I presume it'll be implemented in a high-level module API.
Hi there,
I am trying to define the activation of the last layer as the exponential function. If x is the input vector to a node in the last layer, the output of the node would be exp(w*x + b). Is this possible?
Thanks!
Jock