Three-layer neural network for MNIST with Python
I'm writing my own code to implement a single-hidden-layer neural network and testing the model on the MNIST dataset. I'm getting a weird result (the NLL is unacceptably high), even though I've checked the code for two days without finding what's gone wrong.
Here are the global parameters:
    import math
    import numpy as np

    layers = np.array([784, 300, 10])
    learningrate = 0.01
    momentum = 0.01
    batch_size = 10000
    num_of_batch = len(train_label) // batch_size   # 50,000 / 10,000 = 5 mini-batches per epoch
    nepoch = 30
Softmax function definition:
    def softmax(x):
        x = np.exp(x)
        x_sum = np.sum(x, axis=1)           # shape = (nsamples,)
        for row_idx in range(len(x)):       # normalize each row so it sums to 1
            x[row_idx, :] /= x_sum[row_idx]
        return x
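For reference, the same normalization can be written without the explicit Python loop; this is only a sketch of an equivalent vectorized form (the row-wise max subtraction is an extra numerical-stability safeguard of mine, not part of the original code):

    def softmax_vectorized(x):
        # subtract the row-wise max before exponentiating (guards against overflow),
        # then divide each row by its sum; equivalent to the loop version above
        e = np.exp(x - np.max(x, axis=1, keepdims=True))
        return e / np.sum(e, axis=1, keepdims=True)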
Sigmoid function definition:
    def f(x):
        return 1.0 / (1 + np.exp(-x))
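The term a2*(1.0-a2) used in the backward pass below is just the derivative of this sigmoid evaluated at the activations; a quick finite-difference sanity check (the test point and step size here are arbitrary choices of mine):

    x0, eps = 0.3, 1e-6
    analytic = f(x0) * (1.0 - f(x0))                     # sigmoid'(x0) = f(x0) * (1 - f(x0))
    numeric = (f(x0 + eps) - f(x0 - eps)) / (2 * eps)    # central finite difference
    print(analytic, numeric)                             # both should be ~0.2445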
Initialize w and b:
    k = np.vectorize(math.sqrt)(layers[0:-2] * layers[1:])
    w1 = np.random.uniform(-0.5, 0.5, layers[0:2][::-1])
    b1 = np.random.uniform(-0.5, 0.5, (1, layers[1]))
    w2 = np.random.uniform(-0.5, 0.5, layers[1:3][::-1])
    b2 = np.random.uniform(-0.5, 0.5, (1, layers[2]))
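With these definitions, each weight matrix maps the previous layer onto the next (note that k is computed here but never used afterwards); a quick shape check, assuming the arrays above:

    print(w1.shape, b1.shape)   # (300, 784) (1, 300): hidden-layer weights and bias
    print(w2.shape, b2.shape)   # (10, 300)  (1, 10) : output-layer weights and bias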
And here is the core part for each mini-batch:
    err = []
    for idx in range(num_of_batch):
        # forward, vectorized over the mini-batch
        x = train_set[idx * batch_size:(idx + 1) * batch_size, :]
        y = Y[idx * batch_size:(idx + 1) * batch_size, :]   # Y: one-hot-encoded labels
        a1 = x
        a2 = f(np.dot(np.insert(a1, 0, 1, axis=1), np.insert(w1, 0, b1, axis=1).T))
        a3 = softmax(np.dot(np.insert(a2, 0, 1, axis=1), np.insert(w2, 0, b2, axis=1).T))
        # compute deltas
        d3 = a3 - y
        d2 = np.dot(d3, w2) * a2 * (1.0 - a2)
        # compute gradients
        D2 = np.dot(d3.T, a2)
        D1 = np.dot(d2.T, a1)
        # update parameters
        w1 = w1 - learningrate * (D1 / batch_size + momentum * w1)
        b1 = b1 - learningrate * (np.sum(d2, axis=0) / batch_size)
        w2 = w2 - learningrate * (D2 / batch_size + momentum * w2)
        b2 = b2 - learningrate * (np.sum(d3, axis=0) / batch_size)
        # negative log-likelihood for this mini-batch
        e = -np.sum(y * np.log(a3)) / batch_size
        err.append(e)
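Since the suspicion falls on the back-prop step, one way to test it independently of the training curve is a finite-difference gradient check on a tiny batch. The sketch below is my own debugging suggestion, not part of the original code; it reuses f, softmax, train_set and the one-hot labels Y from above, and the helper name, probed index, and step size are arbitrary:

    def nll(x, y, w1, b1, w2, b2):
        # same forward pass as in the training loop, returning the mean NLL
        a2 = f(np.dot(np.insert(x, 0, 1, axis=1), np.insert(w1, 0, b1, axis=1).T))
        a3 = softmax(np.dot(np.insert(a2, 0, 1, axis=1), np.insert(w2, 0, b2, axis=1).T))
        return -np.sum(y * np.log(a3)) / len(x)

    xs, ys = train_set[:5], Y[:5]      # a tiny batch keeps the check cheap
    eps, i, j = 1e-5, 0, 0             # probe a single entry of w2
    a2 = f(np.dot(np.insert(xs, 0, 1, axis=1), np.insert(w1, 0, b1, axis=1).T))
    a3 = softmax(np.dot(np.insert(a2, 0, 1, axis=1), np.insert(w2, 0, b2, axis=1).T))
    analytic = np.dot((a3 - ys).T, a2)[i, j] / len(xs)   # analytic mean gradient, as in D2 / batch_size
    w2p, w2m = w2.copy(), w2.copy()
    w2p[i, j] += eps
    w2m[i, j] -= eps
    numeric = (nll(xs, ys, w1, b1, w2p, b2) - nll(xs, ys, w1, b1, w2m, b2)) / (2 * eps)
    print(analytic, numeric)           # should agree to several decimal places

Repeating the same probe on an entry of w1 (against np.dot(d2.T, a1) divided by the batch size) exercises the hidden-layer delta d2, which is where back-prop mistakes usually hide.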
After one epoch (50,000 samples), I got the following sequence of e, which seems large:
    10000/50000    4.033538
    20000/50000    3.924567
    30000/50000    3.761105
    40000/50000    3.632708
    50000/50000    3.549212
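For scale: a classifier that simply predicted a uniform 1/10 for every digit would already score an NLL of ln(10) ≈ 2.303, so values above 3.5 after a full epoch do look too high; a one-line check for reference:

    print(-np.log(1.0 / 10))   # ≈ 2.302585, the NLL of uniform predictions over 10 classes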
I think the back-prop code should be correct, but I couldn't find what's going wrong. It has tortured me for two days.