In this video, we talk about why GPUs are better suited for parallelized tasks. We go into how a GPU is better than a CPU at certain tasks. Finally, we set up the NVIDIA CUDA programming packages to use the CUDA API in Visual Studio. GPUs are a great platform to execute code that can take advantage of hyper-parallelization. For example, in this video we show the difference between adding vectors on a CPU versus adding vectors on a GPU. By taking advantage of the CUDA parallelization framework, we can do mass addition in parallel. Join me on Discord!: 🤍 Support me on Patreon!: 🤍
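For readers who want to see what that vector-add comparison looks like in code, here is a minimal CUDA C sketch of the GPU side (the variable names and sizes are my own, not the code from the video):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread adds one pair of elements, so the whole array is summed in parallel.
__global__ void addVectors(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                 // one million elements
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);          // unified memory keeps the example short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    addVectors<<<blocks, threads>>>(a, b, c, n);   // launch one thread per element
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);           // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```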
If you can parallelize your code by harnessing the power of the GPU, I bow to you. GPU code is usually abstracted away by the popular deep learning frameworks, but knowing how it works is really useful. CUDA is the most popular of the GPU frameworks, so we're going to add two arrays together, then optimize that process using it. I love CUDA! Code for this video: 🤍 Alberto's Winning Code: 🤍 Hutauf's runner-up code: 🤍 Please Subscribe! And like. And comment. That's what keeps me going. Follow me: Twitter: 🤍 Facebook: 🤍 More learning resources: 🤍 🤍 🤍 🤍 🤍 🤍 🤍 🤍 🤍 Join us in the Wizards Slack channel: 🤍 No, Nvidia did not pay me to make this video lol. I just love CUDA. And please support me on Patreon: 🤍 Follow me: Twitter: 🤍 Facebook: 🤍 Instagram: 🤍 Signup for my newsletter for exciting updates in the field of AI: 🤍 Hit the Join button above to sign up to become a member of my channel for access to exclusive content! Join my AI community: 🤍
The Mythbusters, Adam Savage and Jamie Hyneman demonstrate the power of GPU computing.
What's the difference between a CPU and GPU? And what the heck is a TPU, DPU, or QPU? Learn how computers actually compute things in this quick lesson. #computerscience #tech #programming 💬 Chat with Me on Discord 🤍 🔗 Resources Learn more about CPUs 🤍 CPU in 100 Seconds 🤍 Math for Programmers 🤍 JS Worker Threads 🤍 🔥 Get More Content - Upgrade to PRO Upgrade at 🤍 Use code YT25 for 25% off PRO access 🎨 My Editor Settings - Atom One Dark - vscode-icons - Fira Code Font 🔖 Topics Covered - What is a CPU architecture? - ARM vs x86-64 - CPU versus GPU - Why are GPUs so fast? - Why do you need a GPU? - What is a DPU? - Quantum computing basics - How are silicon chips made?
Learn to use a CUDA GPU to dramatically speed up code in Python. 00:00 Start of Video 00:16 End of Moore's Law 01:15 What is a TPU and ASIC 02:25 How a GPU works 03:05 Enabling GPU in Colab Notebook 04:16 Using Python Numba 05:40 Building Mandelbrot sets with and without GPU and Numba 07:49 CUDA Vectorize Functions 08:27 Copy Data to GPU Memory Tutorial: 🤍 Book: 🤍 If you enjoyed this video, here are additional resources to look at: Coursera + Duke Specialization: Building Cloud Computing Solutions at Scale Specialization: 🤍 O'Reilly Book: Practical MLOps: 🤍 O'Reilly Book: Python for DevOps: 🤍 Pragmatic AI: An Introduction to Cloud-based Machine Learning: 🤍 Pragmatic AI Labs Book: Python Command-Line Tools: 🤍 Pragmatic AI Labs Book: Cloud Computing for Data Analysis: 🤍 Pragmatic AI Book: Minimal Python: 🤍 Pragmatic AI Book: Testing in Python: 🤍 Subscribe to Pragmatic AI Labs YouTube Channel: 🤍 View content on noahgift.com: 🤍 View content on Pragmatic AI Labs Website: 🤍
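The course material above stays in Python with Numba, but the "Copy Data to GPU Memory" step at 08:27 corresponds to the explicit host-to-device transfers shown below in plain CUDA C. This is an illustrative sketch under my own naming, not code from the video:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Double every element of an array that already lives in GPU memory.
__global__ void scale(float* x, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= factor;
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = float(i);

    float* dev;
    cudaMalloc(&dev, n * sizeof(float));                               // allocate GPU memory
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);  // host -> device copy

    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);  // device -> host copy
    printf("host[10] = %f\n", host[10]);   // expect 20.0
    cudaFree(dev);
    return 0;
}
```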
In this video, presenter Noel Chalmers introduces GPU programming concepts specific to ROCm. CTA: 🤍 CTA: 🤍 Watch the next video in the series: 🤍 View the full playlist: 🤍 * Subscribe: 🤍 Like us on Facebook: 🤍 Follow us on Twitter: 🤍 Follow us on Twitch: 🤍 Follow us on Linkedin: 🤍 Follow us on Instagram: 🤍 ©2020 Advanced Micro Devices, Inc. AMD, the AMD Arrow Logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.
In this tutorial, we will talk about CUDA and how it helps us accelerate the speed of our programs. Additionally, we will discuss the difference between processors (CPUs) and graphics cards (GPUs) and why we can use both to process code. By the end of this video, we will install CUDA and perform a quick speed test comparing the speed of our GPU with the speed of our CPU. We will create 2 extremely large data structures with PyTorch and we will multiply one by the other to test the performance. Specifically, I'll be comparing Nvidia's GeForce RTX 3090 GPU with Intel's i9-12900K 12th-Gen Alder Lake Processor (with DDR5 memory). I'll be posting some more advanced benchmarks in the next few tutorials, as the code I'm demonstrating in this video is 100% beginner-friendly! ⏲️ Time Stamps ⏲️ * 00:00 - what is CUDA? 00:47 - how processors (CPU) operate? 01:42 - CPU multitasking 03:16 - how graphics cards (GPU) operate? 04:02 - how come GPUs can run code faster than CPUs? 04:59 - benefits of using CUDA 06:03 - verify our GPU is capable of CUDA 06:48 - install CUDA with Anaconda and PyTorch 09:22 - verify if CUDA installation was successful 10:32 - CPU vs GPU speed test with PyTorch 14:20 - freeze CPU with torch.cuda.synchronize() 15:51 - speed test results 17:55 - CUDA for systems with multiple GPUs 18:28 - next tutorials and thanks for watching! 🔗 Important Links 🔗 * ⭐ My Anaconda Tutorial for Beginners: 🤍 ⭐ My CUDA vs. TensorRT Tutorial for Beginners: 🤍 ⭐ CUDA Enabled GPUS: 🤍 ⭐ Complete Notebook Code: 🤍 💻 Install with VENV instead of Anaconda (LINUX) 💻 * ❗install venv: $ sudo apt-get install -y python3-venv 🥇create working environment: $ python3 -m venv my_env 🥈activate working environment: $ source my_env/bin/activate 🥉install PIP3 and PyTorch+CUDA: (my_env) $ sudo apt install python3-pip (my_env) $ pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f 🤍 🏆more information about VENV: 🤍 🏆more information about installing Pytorch: 🤍 🙏SPECIAL THANK YOU 🙏 * Thank you so much to Robert from Nvidia for helping me with the speed test code! Thank you to SFX Buzz for the scratched record sound: 🤍 Thank you to Flat Icon for the beautiful icon graphics: 🤍
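The tutorial itself is pure Python/PyTorch, but the synchronization pitfall it covers at 14:20 (kernel launches are asynchronous, so you must wait before reading a timer) exists in CUDA C as well. Below is a hedged sketch of how the same measurement is typically done with CUDA events; the busy-work kernel is a made-up stand-in for the tensor multiply, not anything from the video:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Arbitrary busy-work kernel so there is something to time.
__global__ void busyWork(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        for (int k = 0; k < 1000; ++k) v = v * 1.000001f + 0.5f;
        x[i] = v;
    }
}

int main() {
    const int n = 1 << 22;
    float* x;
    cudaMalloc(&x, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    busyWork<<<(n + 255) / 256, 256>>>(x, n);   // launch returns immediately
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                 // wait for the GPU, like torch.cuda.synchronize()

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    return 0;
}
```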
What does a GPU do differently to a CPU and why don't we use them for everything? First of a series from Jem Davies, VP of Technology at ARM. Floating Point Numbers: 🤍 Why Computers Use Binary: 🤍 How Bitcoin Works: 🤍 Triangles & Pixels (Graphics Playlist): 🤍 🤍 🤍 This video was filmed and edited by Sean Riley. Computer Science at the University of Nottingham: 🤍 Computerphile is a sister project to Brady Haran's Numberphile. More at 🤍
Read the blog to learn how IBM Cloud enhances GPUs → 🤍 Check out IBM Cloud for GPUs → 🤍 In the latest in our series of lightboarding explainer videos, Alex Hudak is going to tackle the subject of GPUs. What is a GPU? What's the difference between a GPU and CPU? What are the most relevant use cases for GPUs, and how do GPUs figure into your cloud strategy? Get started for free on IBM Cloud → 🤍 #GPU #HPC #AI
Using the GPU can substantially speed up all kinds of numerical problems. Conventional wisdom dictates that for fast numerics you need to be a C/C++ whizz. It turns out that you can get quite far with only Python. In this video, I explain how you can use CuPy together with Numba to perform calculations on NVIDIA GPUs. Production quality is not the best, but I hope you may find it useful. 00:00 Introduction: GPU programming in python, why? 06:52 Cupy intro 08:39 Cupy demonstration in Google colab 19:54 Cupy summary 20:21 Numba.cuda and kernels intro 25:07 Grids, blocks and threads 27:12 Matrix multiplication kernel 29:20 Tiled matrix multiplication kernel and shared memory 34:31 Numba.cuda demonstration in Google colab 44:25 Final remarks Edit 3/9/2021: the notebook used for the demonstration can be found here 🤍 Edit 9/9/2021: at 23:56 one of the grid elements should be labeled 1,3 instead of 1,2. Thanks to _ for pointing this out.
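The tiled, shared-memory matrix multiplication discussed at 29:20 is written with numba.cuda in the video; the sketch below shows the same kernel idea in CUDA C. The names are my own, and for brevity it assumes the matrix size is a multiple of the tile width:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 16

// C = A * B for square n x n matrices. Each block loads one TILE x TILE tile of A and B
// into shared memory at a time, so each element is read from global memory far fewer times.
__global__ void matmulTiled(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();                       // wait until the tile is fully loaded
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                       // finish using the tile before overwriting it
    }
    C[row * n + col] = acc;
}

int main() {
    const int n = 512;                         // must be a multiple of TILE in this sketch
    size_t bytes = n * n * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    dim3 block(TILE, TILE);
    dim3 grid(n / TILE, n / TILE);
    matmulTiled<<<grid, block>>>(A, B, C, n);
    cudaDeviceSynchronize();
    printf("C[0] = %f\n", C[0]);               // expect 1024 (= 2 * n)
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```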
💡Enroll to gain access to the full course: 🤍 Artificial intelligence with PyTorch and CUDA. Let's discuss how CUDA fits in with PyTorch, and more importantly, why we use GPUs in neural network programming. Strange Loop: 🤍 🕒🦎 VIDEO SECTIONS 🦎🕒 00:00 Welcome to DEEPLIZARD - Go to deeplizard.com for learning resources 00:30 Help deeplizard add video timestamps - See example in the description 13:03 Collective Intelligence and the DEEPLIZARD HIVEMIND 💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥 👋 Hey, we're Chris and Mandy, the creators of deeplizard! 👉 Check out the website for more learning material: 🔗 🤍 💻 ENROLL TO GET DOWNLOAD ACCESS TO CODE FILES 🔗 🤍 🧠 Support collective intelligence, join the deeplizard hivemind: 🔗 🤍 🧠 Use code DEEPLIZARD at checkout to receive 15% off your first Neurohacker order 👉 Use your receipt from Neurohacker to get a discount on deeplizard courses 🔗 🤍 👀 CHECK OUT OUR VLOG: 🔗 🤍 ❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind: Tammy Mano Prime Ling Li 🚀 Boost collective intelligence by sharing this video on social media! 👀 Follow deeplizard: Our vlog: 🤍 Facebook: 🤍 Instagram: 🤍 Twitter: 🤍 Patreon: 🤍 YouTube: 🤍 🎓 Deep Learning with deeplizard: Deep Learning Dictionary - 🤍 Deep Learning Fundamentals - 🤍 Learn TensorFlow - 🤍 Learn PyTorch - 🤍 Natural Language Processing - 🤍 Reinforcement Learning - 🤍 Generative Adversarial Networks - 🤍 🎓 Other Courses: DL Fundamentals Classic - 🤍 Deep Learning Deployment - 🤍 Data Science - 🤍 Trading - 🤍 🛒 Check out products deeplizard recommends on Amazon: 🔗 🤍 🎵 deeplizard uses music by Kevin MacLeod 🔗 🤍 ❤️ Please use the knowledge gained from deeplizard content for good, not evil.
This video is part of an online course, Intro to Parallel Programming. Check out the course here: 🤍
In my previous video, I talked about why CPUs cannot have thousands of cores. While this is true, due to thermal, electrical, and memory limitations, a lot of the comments on the video were about how GPUs have thousands of cores. In this video, we discuss the subtle differences in GPU microarchitecture that make CUDA "cores" and CPU cores significantly different. CPU cores are heavy computing machines that are able to process arbitrary input from users using arbitrary programs. Because of this, CPUs are more generalized. GPUs, on the other hand, are good at one thing: bulk processing on bulk data. 🏫 COURSES 🏫 🤍 🔥🔥🔥 SOCIALS 🔥🔥🔥 Low Level Merch!: 🤍 Follow me on Twitter: 🤍 Follow me on Twitch: 🤍 Join me on Discord!: 🤍
It’s 2019, and Moore’s Law is dead. CPU performance is plateauing, but GPUs provide a chance for continued hardware performance gains, if you can structure your programs to make good use of them. In this talk you will learn how to speed up your Python programs using Nvidia’s CUDA platform. EVENT: PyTexas2019 SPEAKER: William Horton PUBLICATION PERMISSIONS: Original video was published with the Creative Commons Attribution license (reuse allowed). ATTRIBUTION CREDITS: Original video source: 🤍
🤍 — Discussion & Comments: 🤍 — Presentation Slides, PDFs, Source Code and other presenter materials are available at: 🤍 — Computer system architecture trends are constantly evolving to provide higher performance and computing power, to support the increasing demand for high-performance computing domains including AI, machine learning, image processing and automotive driving aids. The most recent being the move towards heterogeneity, where a system has one or more co-processors, often a GPU, working with it in parallel. These kinds of systems are everywhere, from desktop machines and high-performance computing supercomputers to mobile and embedded devices. Many-core GPU has shaped by the fast-growing video game industry that expects a tremendous massive number of floating-point calculations per video frame. The motive was to look for ways to maximize the chip area and power budget dedicated to floating-point calculations. The solution is to optimize for execution throughput of a massive number of threads. The design saves chip area and power by allowing pipelined memory channels and arithmetic operations to have long latency. The reduce area and power on memory and arithmetic allows designers to have more cores on a chip to increase the execution throughput. In CPPCON 2018, we presented "A Modern C Programming Model for CPUs using Khronos SYCL", which provided an introduction to GPU programming using SYCL. This talk will take this further. It will present the GPU architecture and the GPU programming model; covering the execution and memory model. It will describe parallel programming patterns and common parallel algorithms and how they map to the GPU programming model. Finally, through this lens, it will look at how to construct the control-flow of your programs and how to structure and move your data to achieve efficient utilisation of GPU architectures. This talk will use SYCL as a programming model for demonstrating the concepts being presented, however, the concepts can be applied to any other heterogeneous programming model such as OpenCL or CUDA. SYCL allows users to write standard C code which is then executed on a range of heterogeneous architectures including CPUs, GPUs, DSPs, FPGAs and other accelerators. On top of this SYCL also provides a high-level abstraction which allows users to describe their computations as a task graph with data dependencies, while the SYCL runtime performs data dependency analysis and scheduling. SYCL also supports a host device which will execute on the host CPU with the same execution and memory model guarantees as OpenCL for debugging purposes, and a fallback mechanism which allows an application to recover from failure. — Gordon Brown Codeplay Software Principal Software Engineer, SYCL & C Edinburgh, United Kingdom Gordon Brown is a principal software engineer at Codeplay Software specializing in heterogeneous programming models for C. He has been involved in the standardization of the Khronos standard SYCL and the development of Codeplay's implementation of the standard from its inception. More recently he has been involved in the efforts within SG1/SG14 to standardize execution and to bring heterogeneous computing to C. — Videos Filmed & Edited by Bash Films: 🤍 *-* Register Now For CppCon 2022: 🤍 *-*
CUDA Teaching Center Oklahoma State University ECEN 4773/5793
EE380: Computer Systems Colloquium Seminar NVIDIA GPU Computing: A Journey from PC Gaming to Deep Learning Speaker: Stuart Oberman, NVIDIA Deep Learning and GPU Computing are now being deployed across many industries, helping to solve big data problems ranging from computer vision and natural language processing to self-driving cars. At the heart of these solutions is the NVIDIA GPU, providing the computing power to both train these massive deep neural networks as well as efficiently provide inference and implementation of those networks. But how did the GPU get to this point? In this talk I will present a personal perspective and some lessons learned during the GPU's journey and evolution from being the heart of the PC gaming platform, to today also powering the world's largest datacenters and supercomputers. About the Speaker: Stuart Oberman is Vice President of GPU ASIC Engineering at NVIDIA. Since 2002, he has contributed to the design and verification of seven GPU architectures. He currently directs multiple GPU design and verification teams. He previously worked at AMD, where he was an architect of the 3DNow! multimedia instruction set and the Athlon floating-point unit. Stuart earned the BS degree in electrical engineering from the University of Iowa, and the MS and PhD degrees in electrical engineering from Stanford University, where he performed research in the Stanford Architecture and Arithmetic Group. He has coauthored one book and more than 20 technical papers. He holds more than 55 granted US patents. For more information about this seminar and its speaker, you can visit 🤍 Support for the Stanford Colloquium on Computer Systems Seminar Series provided by the Stanford Computer Forum. Colloquium on Computer Systems Seminar Series (EE380) presents the current research in design, implementation, analysis, and use of computer systems. Topics range from integrated circuits to operating systems and programming languages. It is free and open to the public, with new lectures each week. Learn more: 🤍
Introduction to NVIDIA's CUDA parallel architecture and programming model. Learn more by following 🤍gpucomputing on twitter.
Computer Architecture, ETH Zürich, Fall 2020 (🤍 Lecture 25: GPU Programming Lecturer: Professor Onur Mutlu (🤍 Date: December 30, 2020 Slides (pptx): 🤍 Slides (pdf): 🤍
This simple program will display "Hello World" to the console. The screen output will be produced by the GPU instead of the CPU.
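A minimal version of such a program might look like the following sketch, compiled with nvcc; the kernel name is my own:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// printf from device code: each GPU thread prints its own "Hello World" line.
__global__ void hello() {
    printf("Hello World from GPU thread %d!\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();            // 1 block of 4 threads -> four lines of output
    cudaDeviceSynchronize();      // wait so the program doesn't exit before the prints arrive
    return 0;
}
```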
In this video we look at a step-by-step performance optimization of matrix multiplication in CUDA! Spreadsheet: 🤍 For code samples: 🤍 For live content: 🤍
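Such optimization walkthroughs usually start from a naive kernel along these lines (a sketch with my own naming: one thread per output element, every operand read straight from global memory); tiling into shared memory and improving coalescing are the typical next steps:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Naive baseline: one thread per element of C, no data reuse between threads.
__global__ void matmulNaive(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)
            acc += A[row * n + k] * B[k * n + col];
        C[row * n + col] = acc;
    }
}

int main() {
    const int n = 256;
    size_t bytes = n * n * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 1.0f; }

    dim3 block(16, 16);
    dim3 grid((n + 15) / 16, (n + 15) / 16);
    matmulNaive<<<grid, block>>>(A, B, C, n);
    cudaDeviceSynchronize();
    printf("C[0] = %f (expected %d)\n", C[0], n);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```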
In this video, presenter René Van Oostrum provides a general introduction to ROCm and programming with HIP (Heterogeneous-Computing Interface for Portability). CTA: 🤍 CTA: 🤍 Watch the next video in the series: 🤍 View the full playlist: 🤍 * Subscribe: 🤍 Like us on Facebook: 🤍 Follow us on Twitter: 🤍 Follow us on Twitch: 🤍 Follow us on Linkedin: 🤍 Follow us on Instagram: 🤍 ©2020 Advanced Micro Devices, Inc. AMD, the AMD Arrow Logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.
Computer Architecture, ETH Zürich, Fall 2022 (🤍 Lecture 26: GPU Programming Lecturer: Professor Onur Mutlu (🤍 Date: January 6, 2023 Lecture 26 Slides (pptx): 🤍 Lecture 26 Slides (pdf): 🤍 Recommended Reading: Intelligent Architectures for Intelligent Computing Systems 🤍 A Modern Primer on Processing in Memory 🤍 RowHammer: A Retrospective 🤍 RECOMMENDED LECTURE VIDEOS & PLAYLISTS: Computer Architecture Fall 2021 Lectures Playlist: 🤍 Digital Design and Computer Architecture Spring 2021 Livestream Lectures Playlist: 🤍 Featured Lectures: 🤍 Interview with Professor Onur Mutlu: 🤍 The Story of RowHammer Lecture: 🤍 Accelerating Genome Analysis Lecture: 🤍 Memory-Centric Computing Systems Tutorial at IEDM 2021: 🤍 Intelligent Architectures for Intelligent Machines Lecture: 🤍 Computer Architecture Fall 2020 Lectures Playlist: 🤍 Digital Design and Computer Architecture Spring 2020 Lectures Playlist: 🤍 Public Lectures by Onur Mutlu, Playlist: 🤍 Computer Architecture at Carnegie Mellon Spring 2015 Lectures Playlist: 🤍 Rethinking Memory System Design Lecture 🤍stanfordonline : 🤍
Presented at the Argonne Training Program on Extreme-Scale Computing 2017. Slides for this presentation are available here: 🤍
In this video, presenter Damon McDougall summarizes the various Compilers, Libraries and Tools available with ROCm. CTA: 🤍 CTA: 🤍 Watch the next video in the series: 🤍 View the full playlist: 🤍 * Subscribe: 🤍 Like us on Facebook: 🤍 Follow us on Twitter: 🤍 Follow us on Twitch: 🤍 Follow us on Linkedin: 🤍 Follow us on Instagram: 🤍 ©2020 Advanced Micro Devices, Inc. AMD, the AMD Arrow Logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.
Is an Nvidia 4080 faster than a Threadripper 3970x? Dave puts them to the test! He explains the differences between how CPUs and GPUs operate and then explores whether the GPU can be leveraged to solve prime numbers faster than the CPU.
This video is part of an online course, Intro to Parallel Programming. Check out the course here: 🤍
Speed up your MATLAB® applications using NVIDIA® GPUs without needing any CUDA® programming experience. Parallel Computing Toolbox™ supports more than 700 functions that let you use GPU computing. Any GPU-supported function automatically runs using your GPU if you provide inputs as GPU arrays, making it easy to convert and evaluate GPU compute performance for your application. In this video, watch a brief overview, including code examples and benchmarks. In addition, discover options for getting access to a GPU if you do not have one in your desktop computing environment. Also, learn about deploying GPU-enabled applications directly as CUDA code generated by GPU Coder™. Parallel Computing Toolbox: 🤍 Get a free product trial: 🤍 Learn more about MATLAB: 🤍 Learn more about Simulink: 🤍 See what's new in MATLAB and Simulink: 🤍 © 2022 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See 🤍mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.
Learn how to write, compile, and run a simple C program on your GPU using Microsoft Visual Studio with the Nsight plug-in. Find code used in the video at: 🤍 Learn more at the blog: 🤍
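Beyond "Hello World", the first thing worth adding to a small test program like this is error checking around launches, since failed kernels are otherwise silent. Below is a minimal hedged sketch; the CUDA_CHECK macro name is my own convention, not something from the video:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call so failures are reported instead of silently ignored.
#define CUDA_CHECK(call)                                                      \
    do {                                                                      \
        cudaError_t err = (call);                                             \
        if (err != cudaSuccess) {                                             \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                       \
                    cudaGetErrorString(err), __FILE__, __LINE__);             \
            exit(EXIT_FAILURE);                                               \
        }                                                                     \
    } while (0)

__global__ void fill(int* data, int value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

int main() {
    const int n = 1000;
    int* data;
    CUDA_CHECK(cudaMalloc(&data, n * sizeof(int)));

    fill<<<(n + 255) / 256, 256>>>(data, 42, n);
    CUDA_CHECK(cudaGetLastError());          // catches bad launch configurations
    CUDA_CHECK(cudaDeviceSynchronize());     // catches errors raised while the kernel ran

    CUDA_CHECK(cudaFree(data));
    printf("ok\n");
    return 0;
}
```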
Julia has several packages for programming GPUs, each of which support various programming models. In this workshop, we will demonstrate the use of three major GPU programming packages: CUDA.jl for NVIDIA GPUs, AMDGPU.jl for AMD GPUs, and oneAPI.jl for Intel GPUs. We will explain the various approaches for programming GPUs with these packages, ranging from generic array operations that focus on ease-of-use, to hardware-specific kernels for when performance matters. Most of the workshop will be vendor-neutral, and the content will be available for all supported GPU back-ends. There will also be a part on vendor-specific tools and APIs. Attendees will be able to follow along, but are recommended to have access to a suitable GPU for doing so. Materials 🤍 🤍 Enjoyed the workshop? Consider sponsoring us on GitHub: 🤍 00:00 Welcome! 00:24 Welcome 01:20 Outline 02:44 JuliaGPU packages 04:08 JuliaGPU back-ends 05:34 GPU Architecture 07:25 Parallel programming models 08:55 Follow along and links to notebooks, JuliaHub 12:37 Start of tutorial with notebook 16:00 Array programming 28:20 Kernel programming 34:32 Parallel programming + questions 58:40 Profiling 1:01:50 Profiling: NVIDIA Nsight Systems: live example 1:11:00 Profiling: NVIDIA Nsight Compute: live example → optimize single kernel invocation 1:19:05 Common issues: unsupported array operations 1:21:50 Common issues: unsupported kernel operations 1:27:40 Parallel programming issues 1:31:55 Tour of accompanying Github repo 1:32:40 Case Study I: Image processing using AMDGPU 1:57:00 Break 2:01:30 Case Study II: Fun with arrays, Machine Learning 2:10:47 Case Study III: Random number generators 2:22:10 Kernel abstractions 2:42:10 Example: Solving heat equation with GPU 2:56:30 Sneak peek of Enzyme (automatic differentiation framework) 2:59:18 Questions and Future plans Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: 🤍
In this tutorial I will go over the following: When to use the GPU for calculations over the CPU How to write Metal Compute Kernel Functions How multithreading with the GPU works How Threadgroups work The difference in time taken for the GPU vs the CPU Why the GPU shouldn't be called the GPU. Episode Source Code: 🤍 Resources: Threads and Threadgroups: - 🤍 - 🤍 - 🤍 Metal Shading Language Specification: 🤍 Become A Patron: 🤍 Discord: Join me on Discord for discussions about Metal. I am always open to talk code :) 🤍 Affiliate Links: Sweet Standing Desks: 🤍 Blender Tutorials: 🤍
Quick GPU #shorts for y'all! Need more info? Check these out: CUDA Powered GPUs: 🤍 NVidia CUDA: 🤍 ROCm: 🤍 Oh, and don't forget to connect with me! LinkedIn: 🤍 Facebook: 🤍 GitHub: 🤍 Patreon: 🤍 Join the Discussion on Discord: 🤍 Happy coding! Nick P.s. Let me know how you go and drop a comment if you need a hand!
Prof Soumyajit Dey Department of Computer Science and Engineering IIT Kharagpur
Computer Architecture, ETH Zürich, Fall 2017 (🤍 Lecture 9: GPUs and GPGPU Programming Lecturers: Professor Onur Mutlu (🤍 TA Juan Gomez Luna Date: October 19, 2017 Slides (ppt): 🤍 Slides (pdf): 🤍
MPAGS: High Performance Computing in Julia In this lecture, we talk about the concept of GPU programming, including the differences between GPU and CPU hardware. We discuss some models of how to compute on a GPU, with particular focus on CUDA and the CUDA.jl library. We cover some examples of the high-level array-based programming mechanism provided by CUDA.jl to avoid the need to write one's own kernels. This is a module designed for the Midlands Physics Alliance Graduate School (MPAGS). More information can be found on the website.
#GPU #C++ #AccuConf Parallel programming can be used to take advantage of multi-core and heterogeneous architectures and can significantly increase the performance of software. It has gained a reputation for being difficult, but is it really? Modern C++ has gone a long way towards making parallel programming easier and more accessible, providing both high-level and low-level abstractions. C++11 introduced the C++ memory model and standard threading library, which includes threads, futures, promises, mutexes, atomics and more. C++17 takes this further by providing high-level parallel algorithms; parallel implementations of many standard algorithms; and much more is expected in C++20. The introduction of the parallel algorithms also opens C++ to supporting non-CPU architectures, such as GPUs, FPGAs, APUs and other accelerators. This talk will show you the fundamentals of parallelism: how to recognise when to use parallelism, how to make the best choices, and common parallel patterns such as reduce, map and scan which can be used over and over again. It will show you how to make use of the C++ standard threading library, but it will take this further by teaching you how to extend parallelism to heterogeneous devices, using the SYCL programming model to implement these patterns on a GPU using standard C++. - Michael Wong is the Vice President of Research and Development at Codeplay Software, a Scottish company that produces compilers, debuggers, runtimes, testing systems, and other specialized tools to aid software development for heterogeneous systems, accelerators and special-purpose processor architectures, including GPUs and DSPs. He is now a member of the open consortium group known as Khronos and is Chair of the C++ Heterogeneous Programming language SYCL, used for GPU dispatch in native modern C++ (14/17), OpenCL, as well as guiding the research and development teams of ComputeSuite, ComputeAorta/ComputeCPP. For twenty years, he was the Senior Technical Strategy Architect for IBM compilers. He is a member of the ISO C++ Directions Group (DG), and the Canadian Head of Delegation to the ISO C++ Standard and a past CEO of OpenMP. He is also a Director and VP of ISOCPP.org, and Chair of all Programming Languages for Canada's Standards Council. He has so many titles, it's a wonder he can get anything done. He chairs WG21 SG14 Games Development/Low Latency/Financial/Embedded Devices and WG21 SG5 Transactional Memory, and is the co-author of a book on C++ and a number of C++/OpenMP/Transactional Memory features including generalized attributes, user-defined literals, inheriting constructors, weakly ordered memory models, and explicit conversion operators. Having been the past C++ team lead for IBM's XL C++ compiler means he has been messing around with designing the C++ language and C++ compilers for twenty-five years. His current research interest, i.e. what he would like to do if he had time, is in the area of parallel programming, future programming models for neural networks, AI, machine vision, safety/critical programming vulnerabilities, self-driving cars and low-power devices, lock-free programming, transactional memory, C++ benchmark performance, object model, generic programming and template metaprogramming. He holds a B.Sc. from the University of Toronto, and a Masters in Mathematics from the University of Waterloo.
He has been asked to speak/keynote at many conferences, companies, research centers, universities, including CPPCON, Bloomberg, U of Houston, U of Toronto, ACCU, C++Now, Meeting C++, ADC, CASCON, Bloomberg, CERN, Barcelona Supercomputing Center, FAU Erlangen, LSU, Universidad Carlos III de Madrid, Texas A&M University, Parallel, KIT School, CGO, IWOMP/IWOCL, Code::dive, many C++ user group meetings, Euro TM Graduate School, and Going Native. He is the current Editor for the Concurrency TS and the Transactional Memory TS. 🤍 - Future Conferences: ACCU 2019 Autumn Conference, Belfast (UK): 2019-11-11 and 2019-11-12. ACCU 2020 Spring Conference, Bristol (UK), Marriott City Centre: 2020-03-24 to 2020-03-28. - ACCU Website: 🤍accu.org ACCU Conference Website: conference.accu.org ACCU Twitter: 🤍ACCUConf ACCU YouTube: 🤍 Filmed and Edited by Digital Medium Ltd - events.digital-medium.co.uk Contact: events🤍digital-medium.co.uk
In this video we introduce the field of GPU architecture that we expand upon in later videos in the series! For code samples: 🤍 For live content: 🤍
High-performance computing is now dominated by general-purpose graphics processing unit (GPGPU) oriented computations. How can we leverage our knowledge of C++ to program the GPU? NVIDIA's answer to general-purpose computing on the GPU is CUDA. CUDA programs are essentially C++ programs, but have some differences. CUDA comes as a Toolkit SDK containing a number of libraries that exploit the resources of the GPU: fast Fourier transforms, machine learning training and inference, etc. Thrust is a C++ template library for CUDA. In this month's meeting, Richard Thomson will present a brief introduction to CUDA with the Thrust library to program the GPU. Programming the GPU with CUDA is a huge topic covered by lots of libraries, tutorials, videos, and so on, so we will only be able to present an introduction to the topic. You are encouraged to explore more on your own! Utah C++ Programmers meetup: 🤍 Utah C++ Programmers blog: 🤍 CUDA: 🤍 Thrust: 🤍
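To give a flavor of what Thrust code looks like, here is a short hedged sketch (my own example, not material from the meetup):

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

// Thrust lets you express GPU work as STL-style algorithms over device_vectors,
// so there is no explicit kernel, launch configuration, or cudaMemcpy in sight.
int main() {
    const int n = 1 << 20;
    thrust::device_vector<float> a(n, 1.0f);   // allocated and filled on the GPU
    thrust::device_vector<float> b(n, 2.0f);
    thrust::device_vector<float> c(n);

    // c = a + b, executed on the device
    thrust::transform(a.begin(), a.end(), b.begin(), c.begin(), thrust::plus<float>());

    // sum of c, also on the device; only the final scalar comes back to the host
    float total = thrust::reduce(c.begin(), c.end(), 0.0f, thrust::plus<float>());
    printf("total = %.0f (expected %d)\n", total, 3 * n);
    return 0;
}
```

Because the algorithms are templates over iterators, moving the same code between host and device is largely a matter of swapping host_vector for device_vector.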
In Fall 2020 and Spring 2021, this was MIT's 18.337J/6.338J: Parallel Computing and Scientific Machine Learning course. Now these lectures and notes serve as a standalone book resource. 🤍 Chris Rackauckas, Massachusetts Institute of Technology Additional information on these topics can be found at: 🤍 and other Julia programming language sites Many of these descriptions originated on 🤍