Orthonorm layer weights #14
Hi, thanks for looking at our code. Please note that the weights are tuned by QR decomposition, which means that it is not a 'trainable weight' in the traditional sense, since finding it does not involve gradient descent. Instead, it is simply computed by Cholesky decomposition at each orthogonalization step. In particular, when the orthogonalization layer is applied (1), we call:

import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import Lambda
# orthonorm_op (the Cholesky-based orthogonalization) is defined in the repo

def Orthonorm(x, name=None):
    '''
    Builds a Keras layer that handles orthogonalization of x
    x: an n x d input matrix
    name: name of the Keras layer
    returns: a Keras layer instance. During evaluation, the instance returns an n x d orthogonal matrix
    if x is full rank and not singular
    '''
    # get dimensionality of x
    d = x.get_shape().as_list()[-1]
    # compute orthogonalizing matrix
    ortho_weights = orthonorm_op(x)
    # create variable that holds this matrix
    ortho_weights_store = K.variable(np.zeros((d, d)))
    # create op that saves matrix into variable
    ortho_weights_update = tf.assign(ortho_weights_store, ortho_weights, name='ortho_weights_update')
    # switch between stored and calculated weights based on training or validation
    l = Lambda(lambda x: K.in_train_phase(K.dot(x, ortho_weights), K.dot(x, ortho_weights_store)), name=name)
    l.add_update(ortho_weights_update)
    return l
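For context, here is a minimal sketch of what a Cholesky-based orthonorm_op could look like. This is an illustration, not the repository's exact implementation: given an n x d input x with full column rank, factor x^T x = L L^T and use W = sqrt(n) * (L^{-1})^T, so that (x W)^T (x W) = n * I.

import tensorflow as tf

def orthonorm_op(x, epsilon=1e-7):
    # x: n x d tensor, assumed to have full column rank
    d = x.get_shape().as_list()[-1]
    n = tf.cast(tf.shape(x)[0], tf.float32)
    # Gram matrix x^T x (d x d), with a small jitter for numerical stability
    gram = tf.matmul(x, x, transpose_a=True) + epsilon * tf.eye(d)
    # Cholesky factorization: x^T x = L L^T
    L = tf.cholesky(gram)
    # W = sqrt(n) * (L^{-1})^T, so that (x W)^T (x W) = n * I
    return tf.sqrt(n) * tf.transpose(tf.matrix_inverse(L))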
Does that mean the matrix will change (not be updated) at every orthogonalization step, without any information from previous steps (no gradient)? Then how can we make sure we end up with the final matrix we want?
Note that the orthogonalization layer orthogonalizes the input, and nothing else. Unlike fully connected, recurrent, convolutional, etc. layers, which are functions that depend both on the input AND a set of parameters (i.e., weights and biases), the orthogonalization layer depends only on the input (i.e., given a fixed input, it performs a fixed operation, namely the orthogonalization of that fixed input). So your question about finding the 'correct' final matrix is the wrong way to think about this layer. (It is somewhat akin to asking how to find the 'correct' ReLU function; there is simply no tunable parameter in that fixed operation.)

Since the orthogonalization layer is differentiable, the learning signal passes through it to the previous layers, which do have tunable parameters. These parameters are trained so that the output of the orthogonalization layer is indeed the desired output, that is, it approximates the solution of the constrained optimization problem.
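To make the 'no tunable parameters' point concrete, here is a hypothetical usage sketch (the layer sizes and names are illustrative, not taken from the repository): the Orthonorm layer sits on top of trainable Dense layers, exposes no weights of its own, and the loss gradient flows through it into the layers below.

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(10,))
h = Dense(64, activation='relu')(inputs)
h = Dense(5)(h)                              # last trainable layer (d = 5)
outputs = Orthonorm(h, name='orthonorm')(h)  # orthogonalizes the embedding
model = Model(inputs=inputs, outputs=outputs)

# The orthogonalization layer itself holds no trainable parameters:
print(model.get_layer('orthonorm').trainable_weights)  # -> []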
Hello. I have a question about the alternating training: why do you divide the training into two steps? Why not do both in the same mini-batch: get y_tilde, then compute the loss L, tune the weights of the last layer, and finally get the final output y? I think that would guarantee the orthogonalization for any input. Is there any problem with doing it that way? Thanks.
Hi,
I am having some trouble understanding the orthogonalization training steps. I noticed that the orthogonalization layer is actually a Keras Lambda layer, which only performs a simple operation on its inputs. Moreover, the Lambda layer has no trainable weights, which seems inconsistent with the paper:
In each orthogonalization step we use the QR decomposition to tune the weights of the last layer.
I have read the code carefully, but I still can't figure out how the orthogonalization layer is trained to produce an orthogonal output.
I'd appreciate it if you could explain it.