c0e01fd63c
Test for XNOR-conv on CUDA
2018-09-08 02:46:05 +03:00
b141f85cab
Compile fix
2018-09-07 15:07:46 +03:00
007878393f
Temporary slow implementation of XNOR on CUDA (shared memory)
2018-09-06 23:21:26 +03:00
c4a9e3422e
Temporary implementation of XNOR on CUDA
2018-08-31 02:47:58 +03:00
9753b72aeb
temp fix, don't use it
2018-08-30 17:24:41 +03:00
18d5e4f39c
Fixed yolov3-tiny_xnor.cfg
2018-08-24 18:29:40 +03:00
31b6b0bad3
XNOR-net 4x acceleration on CPU for yolov2-tiny - 22 FPS (CPU Core i7 6700K)
2018-08-23 02:44:21 +03:00
f606b5456e
XNOR-net 21 FPS on CPU yolov2-tiny.cfg
2018-08-22 17:52:48 +03:00
f92b20580a
Some fixes for AVX support on CPU
2018-08-14 01:51:31 +03:00
b1dddf02cc
Fixed AVX compilation bug
2018-08-13 02:43:45 +03:00
1f2155b886
Experiments
2018-08-11 02:49:55 +03:00
a9fef1bd66
Bug fixes. Tested im2col_cpu_custom_transpose - not a good approach.
2018-08-11 00:26:53 +03:00
3e856ec04e
Optimized: transpose
2018-08-10 01:27:20 +03:00
d6162af210
Optimized on CPU: gemm_bin, im2col, activation, transpose
2018-08-09 02:31:36 +03:00
a284a7da8d
Tried to use avx_hs() - slow, and requires alignment to 4096 bits < (l.size*l.size*l.c)
It may be faster only for 8192 bits or more.
2018-08-08 19:08:58 +03:00
0a326e7afe
XNOR-net on CPU AVX2
2018-08-08 02:45:47 +03:00
cfc5fedbb6
Used spaces instead of tabs for indentation
2018-07-10 23:29:15 +03:00
ec68838342
Fixed memory leaks for Yolo: train, test
2018-05-23 18:27:18 +03:00
c1bb8c129d
Fixed xnor for random=1
2018-05-19 16:52:05 +03:00
8b5344ee2d
Added BFLOPs output for network configurations
2018-05-14 13:34:40 +03:00
028696bf15
Output improvements for detector results:
When printing detector results, output was produced in random order, which made the results hard to interpret. Now:
1. Text output includes the rect coordinates (left, right, top, bottom, in pixels) along with the label and score
2. Text output is sorted by rect left edge to simplify finding the matching rects in the image
3. If several class probabilities exceed thresh for a detection, the most probable class is written first and the coordinates are not repeated for the others
4. Rects are drawn on the image in order of their best class probability, so the most probable rects are always on top and are not overlaid by less probable ones
5. The most probable label for a rect is always written first
Also:
6. The message about low GPU memory includes the required amount
2018-05-03 16:33:46 +03:00
9bae70b225
Accelerated by another 5% using FP16/32 Batch-norm for Tensor Cores.
2018-04-17 02:51:11 +03:00
c52fa47428
Loss graph is stored automatically at the end of training (iterations == max_batches)
2018-04-16 13:09:10 +03:00
eb9c88ef73
Fixed Tensor Cores bugs on V100 (1. descriptor in batch norm, 2. manually selected algo).
Also fixed time measurement on Linux with multi-threading.
2018-04-15 01:51:21 +03:00
537d135feb
Improve training performance - batch-norm using cuDNN.
2018-03-20 02:16:51 +03:00
880cf187d8
Fixed multi-GPU training for Tensor Cores
2018-03-09 19:44:46 +03:00
cad4d1618f
Added support for Tensor Cores CC >= 7.0 (V100). For FP16/32 (mixed precision), the CUDNN_HALF define should be used.
2018-02-25 16:29:44 +03:00
cd2bdec090
Updated to CUDA 9.1 and fixed no_gpu dependencies.
2018-02-23 15:05:31 +03:00
f558d5c39c
Fix
2018-02-22 23:16:36 +03:00
dda993f3dd
Use half_float16 instead of float32 if both CUDNN and CUDNN_HALF are defined. Use Tensor Cores.
2018-02-22 22:54:40 +03:00
033e934ce8
If CUDNN consumes excessive GPU RAM, then do not use the Workspace
2018-02-21 19:14:01 +03:00
4b0be8c701
Optimized resizing of network for random=1
2018-02-21 15:06:11 +03:00
bc810016a1
cuDNN 6.0 supported. Also speed of console example improved.
2017-08-03 01:36:22 +03:00
d7a30ada7e
Fixed behavior if missing library cudnn.lib
2017-01-16 12:51:42 +03:00
3b9afd4cd2
Fixed behavior if missing library cudnn.lib
2017-01-16 00:44:41 +03:00
62235e9aa3
cpu batch norm works
2016-11-18 21:51:36 -08:00
fc9b867dd9
🔥 🔥 :dragonite:
2016-11-16 00:15:46 -08:00
0d6b107ed2
hey
2016-11-15 22:53:58 -08:00
c7a700dc22
new font strategy
2016-11-05 14:09:21 -07:00
352ae7e65b
ADAM
2016-10-26 08:35:44 -07:00
481b57a96a
So I have this new programming paradigm.......
2016-09-24 23:12:54 -07:00
73f7aacf35
better multigpu
2016-09-20 11:34:49 -07:00
5c067dc447
good chance I didn't break anything
2016-09-12 13:55:20 -07:00
8f1b4e0962
updates and things
2016-09-01 16:48:41 -07:00
845ab75796
some more stuff
2016-08-05 15:27:07 -07:00
9361292c42
updates
2016-07-19 14:50:01 -07:00
08c7cf9c88
no mean on input binarization
2016-06-19 14:28:15 -07:00
8322a58cf6
hate warnings
2016-06-14 11:30:28 -07:00
7520949d84
idk just in case
2016-06-08 11:07:31 -07:00
8a767f1066
stuff for carlo
2016-06-06 15:48:52 -07:00