89 Commits

SHA1 Message Date
ae3ce1b181 Minor fixes 2019-05-20 22:37:42 +03:00
4f72fcc015 Added grouped convolution (depth-wise convolution) 2019-05-10 16:46:48 +03:00
da74882fe1 cleanup in preparation to opencv-4 work 2019-03-28 17:54:27 +01:00
b6e15f1656 ZED 3D Camera support added to ./uselib (yolo_console_cpp.exe) example 2019-03-18 02:48:52 +03:00
75f2a3e7cf Added object Detection & Tracking using conv-rnn layer on frames from video 2019-03-02 03:32:24 +03:00
b3579380dc improve compatibility with c++ compilers, prepare for CMake 2019-02-15 17:27:12 +01:00
3d9c8530a0 Use Tensor Cores only when (channels % 8 == 0) and (filters % 8 == 0) 2019-02-12 23:13:25 +03:00
5448e07445 Try to fuse conv_xnor+shortcut -> conv_xnor 2019-02-12 02:05:15 +03:00
fa1415e3c2 CUDNN_HALF and CC 7.5 by default in darknet.sln 2019-02-05 20:43:07 +03:00
edfdf2c20e Fixed bug in Tensor Cores training 2019-02-05 19:33:10 +03:00
12b6e93893 CHECK_CUDA is used everywhere 2019-02-05 16:18:36 +03:00
d767e8ca38 Minor fixes 2019-02-04 23:29:06 +03:00
61156239e0 Minor performance improvement 2019-02-03 00:18:30 +03:00
41814fc4b3 Minor fixes 2019-02-02 15:16:57 +03:00
f91d5a5e09 Fixed __shfl() and __ballot() warnings 2019-02-02 03:16:05 +03:00
f09a9c3315 XNOR uses Tensor Cores on Turing GPU CC>=7.3 (not Volta) 2019-02-02 00:24:34 +03:00
640bdbc063 LSTM, RNN, GRU - use connected_layer that uses cuDNN. Fixed CRNN for conv-layer with cuDNN. 2019-01-28 23:50:51 +03:00
85b99872cb Use non-default stream for all CUDA-functions 2019-01-28 20:19:26 +03:00
17019854c3 XNOR minor fix 2019-01-19 03:18:50 +03:00
0e022d0912 Fixed timer 2019-01-18 21:29:06 +03:00
5343aa4235 CUDA minor performance improvement 2019-01-16 18:08:11 +03:00
4c05166215 Temporary experimental XNOR on GPU (repack channels) 2019-01-16 02:43:44 +03:00
64e478db07 Fix training approach (convolutional layer) 2018-12-27 00:31:28 +03:00
3969ce30ed Speedup Tensor Cores: 1st layer uses FP32 and pre-allocate GPU memory for Tensor Cores 2018-12-11 23:48:58 +03:00
25f133d6ef Another minor fix 2018-12-11 21:26:36 +03:00
cb998db949 Some fix for CUDNN_HALF 2018-12-11 21:16:18 +03:00
a621235783 Switch to Tensor Cores after 2000 iterations. 2018-12-10 01:35:08 +03:00
dc7f8a32ae mAP calculation during training, if the -map flag is used 2018-12-09 18:18:47 +03:00
742bb7c7ce Compile fix 2018-12-07 22:52:07 +03:00
7c2f302321 Fixed nan issue for training with CUDNN_HALF=1 by using Tensor Cores 2018-12-07 22:40:10 +03:00
21a4ec9390 Saving loss-chart for each 100 iterations automatically 2018-11-26 11:11:56 +03:00
9f7d7c58b5 Minor fixes. Use CUDA 10.0 2018-11-17 02:48:46 +03:00
25f65f6878 Added fast_binarize_weights_gpu() 2018-11-05 22:38:35 +03:00
c0e2512af2 Activation improvement, more robust timer. 2018-09-27 23:10:54 +03:00
7dd97537fb XNOR-net tiny-yolo_xnor.cfg ~2x faster than cuDNN on CUDA (nVidia GPU Maxwell) 2018-09-22 02:01:14 +03:00
03e95320a1 XNOR coalesced memory access, and avoid bank conflicts 2018-09-17 23:39:25 +03:00
ca43bbdaae Fixed openmp bugs for XNOR 2018-09-12 16:22:54 +03:00
c0e01fd63c Test for XNOR-conv on CUDA 2018-09-08 02:46:05 +03:00
b141f85cab Compile fix 2018-09-07 15:07:46 +03:00
007878393f Temporary Slow implementation of XNOR on CUDA (shared_memory) 2018-09-06 23:21:26 +03:00
c4a9e3422e Temporary implementation of XNOR on CUDA 2018-08-31 02:47:58 +03:00
9753b72aeb temp fix, don't use it 2018-08-30 17:24:41 +03:00
cfc5fedbb6 Just used spaces for indents instead of Tabs 2018-07-10 23:29:15 +03:00
9bae70b225 Accelerated by another 5% using FP16/32 Batch-norm for Tensor Cores. 2018-04-17 02:51:11 +03:00
537d135feb Improve training performance - batch-norm using cuDNN. 2018-03-20 02:16:51 +03:00
880cf187d8 Fixed multi-GPU training for Tensor Cores 2018-03-09 19:44:46 +03:00
cad4d1618f Added support for Tensor Cores CC >= 7.0 (V100). For FP16/32 (mixed precision), the CUDNN_HALF define should be used. 2018-02-25 16:29:44 +03:00
cd2bdec090 Updated to CUDA 9.1. And fixed no_gpu dependencies. 2018-02-23 15:05:31 +03:00
6332ea99ab one more fix 2018-02-23 00:13:08 +03:00
b2b5756d86 Added __float2half_rn() and __half2float() 2018-02-22 23:52:43 +03:00