Roadmap 2023 #4578

JanuszL · 2023-01-17T20:17:28Z

The following represents a high-level overview of our 2023 plan. You should be aware that this roadmap may change at any time and the order below does not reflect any type of priority.

We strongly encourage you to comment on our roadmap and provide us feedback on this issue here.

Some of the items mentioned below are a continuation of the 2022 effort (#3774).

Improving Usability:

eager mode - extending support for using DALI operators as standalone entities and improving their interoperability with other libraries like VPF, CV-CUDA or MONAI
conditional execution - providing a convenient API to conditionally apply an operation based on a predicate, providing AutoAugment-style capabilities:
- conditional execution itself (Add tutorial for using conditionals in DALI #4569, Add experimental support for if statements in DALI #4561, Make conditionals work in debug mode #4738, Fix classification of argument input-only operators in AutoGraph #4618, Track DataNodes produced by .gpu() in conditionals #4602, Add DALI Conditionals documentation #4589, Support inferring batch size from tensor argument inputs #4617, Add lazy and and or, and not lazy not support #4629, Fix the logical expression tests to avoid short-cutting them #4676)
- automatic augmentation module with AutoAugment, RandAugment, and TrivialAugment ([AA] Add auto augmentation wrapper #4694, Add augmentations used by AA #4699, [AA] Add select operator/wrapper #4696, Add AutoAugment and ImageNet policy #4702, Add RandAugment and TrivialAugment to auto_aug module #4704, Do not use numpy.typing when not available #4706, Rename as_param to mag_to_param #4710, Add more AutoAugment policies #4753, Add EfficientNet example using automatic augmentations with DALI #4678)
support for NVIDIA Grace Hopper Superchip, including a flexible execution model that utilizes fast CPU<->GPU memory transfers, where data can go from CPU to GPU and back to the CPU in a single pipeline
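
To make the conditional-execution and automatic-augmentation items more concrete, here is a plain-Python sketch of the per-sample behaviour they aim for: a predicate decides, sample by sample, whether an operation runs, and a TrivialAugment-style policy picks one augmentation and one magnitude at random for each sample. This is not DALI code. `apply_if`, `trivial_augment`, `brighten`, and `darken` are hypothetical names used only for illustration, and each sample is just a list of pixel values.

```python
import random


def apply_if(batch, predicates, op):
    """Apply `op` only to the samples whose predicate is True."""
    return [op(sample) if flag else sample for sample, flag in zip(batch, predicates)]


# Toy augmentations on a "sample" represented as a list of pixel values.
def brighten(sample, magnitude):
    return [min(255, p + magnitude) for p in sample]


def darken(sample, magnitude):
    return [max(0, p - magnitude) for p in sample]


def trivial_augment(batch, augmentations, max_magnitude, rng):
    """For each sample, pick one augmentation and one magnitude uniformly at random."""
    out = []
    for sample in batch:
        aug = rng.choice(augmentations)
        magnitude = rng.randint(0, max_magnitude)
        out.append(aug(sample, magnitude))
    return out


batch = [[0, 100, 200], [10, 20, 30], [250, 251, 252]]
flags = [True, False, True]

# Only the first and third samples are reversed; the second passes through unchanged.
flipped = apply_if(batch, flags, lambda s: s[::-1])

# Each sample gets its own randomly chosen augmentation and magnitude.
augmented = trivial_augment(batch, [brighten, darken], max_magnitude=30, rng=random.Random(0))
```

In DALI itself the predicate would be a per-sample value inside the pipeline and the branch would be written as an ordinary `if` statement, as described in the linked pull requests.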

Extending input format support:

Extending support of formats and containers with variable frame rate videos
- decoding raw H264 and H265 streams from memory (Extend decoding support #4480)
Support for higher dynamic range data (int32, float) through the whole data processing pipeline
Adding GPU acceleration for more image formats, like TIFF, or new profiles of the existing ones
- lossless JPEG decoding on CPU and GPU with fn.experimental.decoders.image (Optimize CPU time of JPEG lossless decoder #4625, Skip JPEG lossless tests for compute capability < SM60 #4600, Improve error message when trying to decode JPEG lossless images on the CPU backend #4587, Support for JPEG lossless images in GPU fn.experimental.decoders.image #4572, Add nvjpeg calls used for lossless jpeg decoding to the stub generator #4592, Add axes_utils.h #4548)

Performance:

optimizing memory consumption
- cudaMallocAsync support (Add a memory resource based on cudaMallocAsync #4900, Add alignment to cuda_malloc_async_memory_resource. #4923, and Fix number of devices for JAX multigpu test #4921)
- API for pre-allocation and releasing of memory pools (Add functions to preallocate pools and release unused pool memory #4563, Add release_unused function to memory pools. #4556)
operators performance optimizations
- O_DIRECT mode support in fn.readers.tfrecord (Add O_DIRECT support to the TFRecord reader #4820)
- O_DIRECT mode support in fn.readers.numpy (Add O_DIRECT support in numpy_reader #4796, Fix race condition in the CPU numpy reader #4848)

New transformations:

We are constantly extending the set of operations supported by DALI. Currently, this section lists the most notable additions to our areas of interest that we plan to do this year. This list is not exhaustive and we plan on expanding the set of operators as the needs or requests arise.

new transformations for general data processing
- fn.experimental.tensor_resize operator (Add experimental.tensor_resize operator #4492)
new transformations for image processing
- median blur (Add median blur operator #4950, Exclude median_blur test from xavier tests #4975)
- histogram equalization operator (fn.experimental.equalize) (Add equalize CPU variant #4742, Equalization operator #4575, Equalize kernel #4565)
- 2-D convolution (fn.experimental.filter) (Add CPU filter operator #4764, Add GPU filter kernel #4298, Add GPU filter operator (2D, 3D) #4525)
new transformations for video processing
- the above image transformations are applicable to video as well

songyuc · 2023-02-06T09:22:02Z

Hi, guys!
Is there a planned release date for stable support for Conditional Execution?

JanuszL · 2023-02-06T09:26:47Z

Hi @songyuc,

I think it is a matter of sufficient testing rather than feature completeness (it is, or is going to be, available in the experimental module in DALI 1.23 and the current nightly builds). What we are focusing on now is checking the quality and performance in different use cases. I hope we can call it stable (not experimental anymore) in a couple of releases from now.

songyuc · 2023-02-06T09:59:48Z

Thanks for your response!
I will try to apply it in DALI 1.23.

prefer-potato · 2023-04-14T13:11:40Z

from nvidia.dali import pipeline_def, fn, types

# create_callback is the poster's own helper (not shown) that builds the external_source callback.
@pipeline_def(batch_size=5, num_threads=2, device_id=0, py_num_workers=4, py_start_method='spawn')
def my_pipepline(shard_id, num_shards, batch_size):
    jpegs, labels = fn.external_source(
        source=create_callback(batch_size, shard_id, num_shards),
        num_outputs=2, batch=False, parallel=True, dtype=[types.UINT8, types.INT32])
    decode = fn.decoders.image(jpegs, device="mixed")
    return decode, labels

pipe = my_pipepline(batch_size=10, shard_id=0, num_shards=2)

I have to provide 'batch_size' twice: once for external_source in 'my_pipepline' and once in 'pipeline_def'. Sometimes 'batch_size' is an external argument and is not equal to the preset value in 'pipeline_def'.

For example, in this case, I will get only 5 samples in each 10-sample batch.

env: Python 3.11, PyTorch 2.0, CUDA 11.8.

JanuszL · 2023-04-14T13:20:01Z

@prefer-potato,

I have to provide 'batch_size' twice: once for external_source in 'my_pipepline' and once in 'pipeline_def'. Sometimes 'batch_size' is an external argument and is not equal to the preset value in 'pipeline_def'.

I understand this may be inconvenient in some cases. The idea is that the batch size provided to the pipeline is the maximum one while the external source has the freedom of providing batches of variable length (in your case it is fixed but it doesn't have to be).
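
The contract described here can be sketched in plain Python: the pipeline-level batch size acts as an upper bound, and the source may hand over batches of any size up to it. `variable_batches` and `check_batch` are hypothetical helpers written for illustration, not DALI functions, and rejecting an oversized batch with `ValueError` is an assumption of this sketch.

```python
def variable_batches(samples, sizes):
    """Yield consecutive batches from `samples` with the requested sizes."""
    start = 0
    for size in sizes:
        batch = samples[start:start + size]
        if not batch:
            return
        yield batch
        start += size


def check_batch(batch, max_batch_size):
    """A batch may be smaller than the maximum, but never larger."""
    if len(batch) > max_batch_size:
        raise ValueError(
            f"batch of {len(batch)} samples exceeds max_batch_size={max_batch_size}"
        )
    return batch


MAX_BATCH_SIZE = 10  # plays the role of the batch size given to the pipeline
samples = list(range(25))

# The source is free to vary the batch size from one iteration to the next.
batches = [
    check_batch(b, MAX_BATCH_SIZE)
    for b in variable_batches(samples, sizes=[10, 5, 7, 3])
]
batch_lengths = [len(b) for b in batches]
```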

prefer-potato · 2023-04-15T09:30:52Z

Thank you very much for replying. 🥰🥰

Etzelkut · 2023-04-25T08:56:46Z

Hello, are there any plans for accelerating audio reading and compression (mp3)? Or is there, by any chance, a different team (library) that you know of that is working on that? Thanks for your work!

JanuszL · 2023-04-25T10:47:57Z

Hi @Etzelkut,

Thank you for reaching out.

Hello, are there any plans for accelerating audio reading and compression (mp3)? Or is there, by any chance, a different team (library) that you know of that is working on that?

We are discussing this internally. Could you tell us what your use case is? Is it about training or inference? What do you use now for decoding?

dyhan316 · 2023-04-26T08:23:28Z

Hello! Are there any plans to add readers for the NIFTI file format?

My three reasons are:

Most neuroimaging data, even data that has already been preprocessed by a pipeline, is in NIFTI format.
NIFTI files contain useful metadata.
It avoids having to save two versions of the same file.

Reason 1

I cannot speak for all medical imaging people, but at least in neuroimaging, I believe that the .nii.gz and .nii formats are the ones most commonly used as inputs and outputs of data preprocessing.

(below is a list of preprocessing pipelines that input/output NIFTI files)

Freesurfer (Structural MRI data preprocessing pipeline)
QSIPrep (diffusion MRI data preprocessing pipeline)
fMRIPrep (functional MRI data preprocessing pipeline)
... and so on

Reason 2

Moreover, unlike .npy files, the NIFTI file format also stores useful image-specific metadata, such as dimension info, the time for each slice (for example, the length of each image in 4D fMRI), the affine matrix, the size of each voxel, and so on. It would be helpful if this metadata could be accessed via a DALI reader.

Reason 3

Due to the extra metadata that NIFTI files contain, we cannot just delete them to make room for .npy files. This leaves us with two versions of the same files: one in .npy for use with DALI, and another in .nii format for other uses.

Thank you for reading this, and for making such a powerful tool!

JanuszL · 2023-04-26T08:29:03Z

Hi @dyhan316,

Thank you for reaching out.
Have you checked the cuCIM library? It may provide the workflow you are looking for.
However, if you still prefer to use DALI, I would start with the external source operator and use one of the Python libraries for the initial data loading inside it, like this one.
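
A minimal sketch of that suggestion, assuming the `nibabel` package is used for loading (the library originally linked is not preserved in this text, so that choice is an assumption). `NiftiSource` is a hypothetical per-sample callback in the style accepted by `fn.external_source` with `batch=False`: it receives an object with an `idx_in_epoch` attribute and raises `StopIteration` when the data is exhausted. Returning the affine matrix alongside the volume is one way to keep part of the metadata available.

```python
from types import SimpleNamespace


class NiftiSource:
    """Per-sample callback that returns (volume, affine) for one NIFTI file."""

    def __init__(self, paths):
        self.paths = list(paths)

    def __call__(self, sample_info):
        idx = sample_info.idx_in_epoch
        if idx >= len(self.paths):
            raise StopIteration  # signals the end of the epoch

        # Imported lazily so the class can be constructed without nibabel installed.
        import nibabel as nib
        import numpy as np

        img = nib.load(self.paths[idx])
        volume = np.asarray(img.get_fdata(dtype=np.float32))
        affine = np.asarray(img.affine, dtype=np.float32)
        return volume, affine


# With an empty file list, the very first request ends the epoch.
empty = NiftiSource([])
try:
    empty(SimpleNamespace(idx_in_epoch=0))
    epoch_ended = False
except StopIteration:
    epoch_ended = True
```

Such a callback could then be passed as `fn.external_source(source=NiftiSource(paths), num_outputs=2, batch=False)`, mirroring the `external_source` call shown earlier in this thread.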

Etzelkut · 2023-04-26T09:35:02Z

Hi @JanuszL,

We are discussing this internally. Could you tell us what your use case is? Is it about training or inference? What do you use now for decoding?

Thanks for your reply! It is more related to training: many researchers who work on audio need to constantly load, and sometimes save, audio and then move the data from RAM (CPU) to the GPU. This can be one of the bottlenecks in training speed. Loading and decoding .mp3 files is done on the CPU, and I was not able to find a suitable library that does it on the GPU. It would be very helpful for audio-related research if, in the future, there were a library that loads and decodes audio on the GPU (similarly to images and video) and can also encode it back to mp3 (or convert from .wav to .mp3) on the GPU. Right now, we are using torchaudio.

dyhan316 · 2023-04-27T11:22:53Z

Thank you @JanuszL for your suggestion :)

filippocastelli · 2023-11-28T15:39:28Z

Hello!
Will non-experimental support for Python 3.11 be added to this or next year's roadmap?
We've been seeing growing adoption of 3.11 as a platform, and having DALI officially support it would be great.

Thank you in advance for reading this, have a nice day!

JanuszL · 2023-11-28T16:16:27Z

Hi @filippocastelli,

Your question comes right on time. It has just been added in #5196 and #5174 and will ship in the 1.33 release.

JanuszL · 2024-02-14T08:02:03Z

Please continue the discussion in #5320.

JanuszL pinned this issue Jan 17, 2023

JanuszL mentioned this issue Jan 17, 2023

DALI 2022 roadmap #3774

Closed

JanuszL changed the title ~~GitHub Roadmap 2023~~ Roadmap 2023 Jan 17, 2023

jantonguirao assigned klecki Jan 18, 2023

klecki assigned JanuszL Jan 20, 2023

awolant mentioned this issue Jan 24, 2023

Question about experiemental video reader #4606

Open

songyuc mentioned this issue Mar 15, 2023

The pip command install dali 0.27.0 on python-3.11-conda environment #4417

Closed

awolant mentioned this issue Mar 20, 2023

Will operations like sort, argsort or nms get implemented? #4725

Open

JanuszL mentioned this issue Feb 14, 2024

GitHub Roadmap 2024 #5320

Open

JanuszL unpinned this issue Feb 14, 2024

JanuszL closed this as completed Feb 14, 2024
