| Commit | Message | Date |
|---|---|---|
| 038289eb7d | Some conv-lstm, sgdr and other fixes | 2019-05-14 14:35:22 +03:00 |
| 4f72fcc015 | Added grouped convolutional (depth-wise convolutional) | 2019-05-10 16:46:48 +03:00 |
| 31dc6c8680 | Added LSTM sequence detector, and blur data augmentation (for OpenCV only) | 2019-05-07 16:28:31 +03:00 |
| b3254ed523 | Fixed many warnings | 2019-03-18 19:25:48 +03:00 |
| b6e15f1656 | ZED 3D Camera support added to ./uselib (yolo_console_cpp.exe) example | 2019-03-18 02:48:52 +03:00 |
| 75f2a3e7cf | Added object Detection & Tracking using conv-rnn layer on frames from video | 2019-03-02 03:32:24 +03:00 |
| 9b09abe122 | Fixed convolutional-layer when it is used as base for crnn-layer | 2019-02-28 20:47:22 +03:00 |
| 00de023601 | fully separate C-API from CPP-API | 2019-02-19 15:57:18 +01:00 |
| b3579380dc | improve compatibility with c++ compilers, prepare for CMake | 2019-02-15 17:27:12 +01:00 |
| 28106c0fd8 | Optimized memory allocation for XNOR on CPU | 2019-02-12 22:16:11 +03:00 |
| 9e07605bc5 | get_connected_workspace_size() and get_convolutional_workspace_size() | 2019-02-08 00:51:20 +03:00 |
| 12b6e93893 | CHECK_CUDA is used everywhere | 2019-02-05 16:18:36 +03:00 |
| f09a9c3315 | XNOR uses Tensor Cores on Turing GPU CC>=7.3 (not Volta) | 2019-02-02 00:24:34 +03:00 |
| c7309c1fdb | Fixed CRNN (RNN based on Convolution) layer | 2019-02-01 01:30:02 +03:00 |
| 640bdbc063 | LSTM, RNN, GRU - use connected_layer that uses cuDNN. Fixed CRNN for conv-layer with cuDNN. | 2019-01-28 23:50:51 +03:00 |
| 090d934c0f | Minor speedup on CPU | 2019-01-26 19:12:46 +03:00 |
| 2d3220cef5 | Look at wmma::bmma_sync(), bmmaBitOpXOR, bmmaAccumulateOpPOPC | 2019-01-23 00:35:44 +03:00 |
| 46be08db37 | Minor fix | 2019-01-22 16:23:44 +03:00 |
| 3a51f4af74 | Experimental repack | 2019-01-18 19:52:11 +03:00 |
| 5343aa4235 | CUDA minor performance improvement | 2019-01-16 18:08:11 +03:00 |
| 4c05166215 | Temporary experimental XNOR on GPU (repack channels) | 2019-01-16 02:43:44 +03:00 |
| c75fbb5f2e | Minor fix | 2019-01-06 15:45:10 +03:00 |
| 48d461f9bd | Temporary experimental XNOR improvements on CPU | 2019-01-04 23:19:45 +03:00 |
| 64e478db07 | Fix training approach (convolutional layer) | 2018-12-27 00:31:28 +03:00 |
| cb998db949 | Some fix for CUDNN_HALF | 2018-12-11 21:16:18 +03:00 |
| 7c2f302321 | Fixed nan issue for training with CUDNN_HALF=1 by using Tensor Cores | 2018-12-07 22:40:10 +03:00 |
| 25f65f6878 | Added fast_binarize_weights_gpu() | 2018-11-05 22:38:35 +03:00 |
| 31ac46ba22 | Fixed bug for 32-bit compilation without GPU. | 2018-10-23 21:59:57 +03:00 |
| d487bdf471 | transpose 32x32 on GPU | 2018-10-19 22:55:25 +03:00 |
| 9e2c894a32 | Transpose on CPU fix | 2018-10-19 16:31:55 +03:00 |
| 7dd97537fb | XNOR-net tiny-yolo_xnor.cfg ~2x faster than cuDNN on CUDA (nVidia GPU Maxwell) | 2018-09-22 02:01:14 +03:00 |
| c0e01fd63c | Test for XNOR-conv on CUDA | 2018-09-08 02:46:05 +03:00 |
| b141f85cab | Compile fix | 2018-09-07 15:07:46 +03:00 |
| 007878393f | Temporary Slow implementation of XNOR on CUDA (shared_memory) | 2018-09-06 23:21:26 +03:00 |
| c4a9e3422e | Temporary implementation of XNOR on CUDA | 2018-08-31 02:47:58 +03:00 |
| 9753b72aeb | temp fix, don't use it | 2018-08-30 17:24:41 +03:00 |
| 18d5e4f39c | Fixed yolov3-tiny_xnor.cfg | 2018-08-24 18:29:40 +03:00 |
| 31b6b0bad3 | XNOR-net 4x acceleration on CPU for yolov2-tiny - 22 FPS (CPU Core i7 6700K) | 2018-08-23 02:44:21 +03:00 |
| f606b5456e | XNOR-net 21 FPS on CPU yolov2-tiny.cfg | 2018-08-22 17:52:48 +03:00 |
| f92b20580a | Some fixes for AVX support on CPU | 2018-08-14 01:51:31 +03:00 |
| b1dddf02cc | Fixed AVX compiled bug | 2018-08-13 02:43:45 +03:00 |
| 1f2155b886 | Experiments | 2018-08-11 02:49:55 +03:00 |
| a9fef1bd66 | Bug fixes. Tested im2col_cpu_custom_transpose - bad way. | 2018-08-11 00:26:53 +03:00 |
| 3e856ec04e | Optimized: transpose | 2018-08-10 01:27:20 +03:00 |
| d6162af210 | Optimized on CPU: gemm_bin, im2col, activation, transpose | 2018-08-09 02:31:36 +03:00 |
| a284a7da8d | Try to use avx_hs() - slow and requires alignment 4096 bits < (l.size*l.size*l.c). May be faster only from 8192 bits and more. | 2018-08-08 19:08:58 +03:00 |
| 0a326e7afe | XNOR-net on CPU AVX2 | 2018-08-08 02:45:47 +03:00 |
| cfc5fedbb6 | Just used spaces for indents instead of Tabs | 2018-07-10 23:29:15 +03:00 |
| ec68838342 | Fixed memory leaks for Yolo: train, test | 2018-05-23 18:27:18 +03:00 |
| c1bb8c129d | Fixed xnor for random=1 | 2018-05-19 16:52:05 +03:00 |