How to Choose the Right Azure Virtual Machine for Data Warehousing
With over 600 virtual machine sizes available on Microsoft Azure, selecting the right one can be challenging. In this guide, we will explore which VM parameters matter for data transformation and how many cores you need to increase throughput and processing speed.
When choosing a virtual machine for data warehousing on Azure, you should prioritize the number of cores and the amount of RAM.
The most useful way to compare virtual machines for data warehousing is to focus on the parameters the VM needs for data transformation.
The memory-optimized Ebd series is a popular choice for most projects.
The amount of available memory is directly proportional to the number of cores you select. Depending on how much memory you require, you can choose a virtual machine with 4, 8, or 16 cores.
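Because memory scales with cores, picking a size comes down to finding the smallest core count that covers your memory requirement. The sketch below illustrates that rule. The ratio of 8 GiB per core is an assumption for illustration only; check the Azure size table for the series you actually use.

```python
from typing import Optional

# Assumption: 8 GiB of RAM per core. The article only says memory is
# proportional to cores; verify the real ratio for your VM series.
GIB_PER_CORE = 8
CORE_OPTIONS = (4, 8, 16)  # core counts mentioned in the article


def pick_core_count(required_memory_gib: float) -> Optional[int]:
    """Return the smallest listed core count whose memory covers the requirement."""
    for cores in CORE_OPTIONS:
        if cores * GIB_PER_CORE >= required_memory_gib:
            return cores
    return None  # requirement exceeds the largest option listed


print(pick_core_count(50))
```

With the assumed ratio, a 50 GiB requirement rules out the 4-core size (32 GiB) and lands on the 8-core size (64 GiB).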
Costs of Azure Virtual Machines
It’s essential to keep in mind that we won’t run the VM for the full month of 720 hours (24 hours a day) but only for 2 hours every working day. That makes 40 hours a month (2 hours × 20 working days), which is 18 times less. Storage is another cost consideration for any VM: depending on how much data you need to store, you may need to add disk storage to your virtual machine. Storage costs add to the total cost of the virtual machine, so make sure to account for them when selecting a VM.
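The arithmetic above can be written out as a small cost function. The hourly rate and storage price in the example are hypothetical placeholders, not real Azure prices; storage is assumed to be billed for the whole month, since attached managed disks keep incurring charges even while the VM is stopped.

```python
HOURS_FULL_MONTH = 24 * 30  # 720 hours: running around the clock
HOURS_PER_RUN = 2           # daily run length from the article
WORKING_DAYS = 20           # working days per month (assumption)


def monthly_cost(hourly_rate: float, storage_cost_per_month: float,
                 hours: float = HOURS_PER_RUN * WORKING_DAYS) -> float:
    """Compute time you pay for plus storage, assumed billed for the full month."""
    return hourly_rate * hours + storage_cost_per_month


# Hypothetical prices: 1.00 per hour of compute, 10.00 per month of storage.
print(HOURS_FULL_MONTH / (HOURS_PER_RUN * WORKING_DAYS))  # 18.0
print(monthly_cost(1.0, 10.0))                            # 50.0
print(monthly_cost(1.0, 10.0, hours=HOURS_FULL_MONTH))    # 730.0
```

Note that the storage portion does not shrink when you run the VM less, so it becomes a larger share of the bill under a 2-hours-a-day schedule.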
Finally, you need to consider the performance of the virtual machine.
An eight-core VM can process roughly 50 to 100 GB of Dynamics data per hour, while a 16-core VM can handle 150 to 200 GB per hour. Keep in mind that these numbers can vary by up to 50%, depending on other factors such as the size of your tables and the total amount of data. The number of cores significantly impacts the performance of the virtual machine, so if you are working on a large project where speed is essential, aim for as many cores as your budget allows.
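To turn these throughput figures into a time estimate, divide the data volume by the rate. The sketch below uses the ranges quoted above and, as an interpretation on our part, applies the "up to 50%" variation to both ends of each range to get a best and worst case. The 300 GB volume in the example is hypothetical.

```python
# GB per hour of Dynamics data, as quoted in the article.
THROUGHPUT_GB_PER_HOUR = {8: (50, 100), 16: (150, 200)}
VARIATION = 0.5  # "can vary by up to 50%", applied to both ends (assumption)


def processing_hours(data_gb: float, cores: int) -> tuple[float, float]:
    """Return (best_case, worst_case) hours to process data_gb on the given VM."""
    low, high = THROUGHPUT_GB_PER_HOUR[cores]
    best = data_gb / (high * (1 + VARIATION))   # fastest plausible rate
    worst = data_gb / (low * (1 - VARIATION))   # slowest plausible rate
    return best, worst


print(processing_hours(300, 8))   # (2.0, 12.0)
print(processing_hours(300, 16))  # (1.0, 4.0)
```

The wide spread between best and worst case is a reminder to benchmark with your own tables before committing to a size.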
Let’s look at the three phases of the process to better understand how it works:
Phase One: Copying data into the Data Warehouse
To get data into the Data Warehouse, you need to copy it from Dynamics, legacy systems, and/or external tables. For this operation, you need a virtual machine with enough cores, because the number of cores defines the number of parallel lines of data traffic. The more lines, the higher the throughput and the faster the processing. This is why you should prioritize the number of cores when choosing a virtual machine for this phase.
Phase Two: Doing the Data Transformation in the Data Warehouse
Once the data is in the Data Warehouse, you need to transform it. The Data Warehouse is a SQL database that consists of a few hundred tables. Transforming it requires join operations, and ideally you want to perform these joins in memory. However, doing everything in memory is not always possible because it would require a lot of RAM. As a result, some parts of the operation have to be performed using disk space.
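The trade-off between in-memory and on-disk joins can be illustrated with a simplified Grace hash join. A database engine such as SQL Server handles this internally (hash joins that exceed their memory grant spill to tempdb); the sketch below only shows the idea. It joins fully in memory when the build side fits under a row budget, and otherwise partitions both inputs into temporary files and joins one partition at a time. The row budget and partition count are illustrative assumptions.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def _in_memory_join(left, right, key):
    """Classic hash join: index the right side, then probe with each left row."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def _partition_to_disk(rows, key, partitions, directory, prefix):
    """Write each row to one of `partitions` files chosen by hashing its key."""
    paths = [os.path.join(directory, f"{prefix}{i}.pkl") for i in range(partitions)]
    files = [open(p, "wb") for p in paths]
    try:
        for row in rows:
            pickle.dump(row, files[hash(row[key]) % partitions])
    finally:
        for f in files:
            f.close()
    return paths


def _read_rows(path):
    """Yield rows back from a partition file until it is exhausted."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def hash_join(left, right, key, max_rows_in_memory=1000, partitions=4):
    """Join two lists of dicts on `key`, spilling to disk if `right` is too big."""
    if len(right) <= max_rows_in_memory:
        return _in_memory_join(left, right, key)
    result = []
    with tempfile.TemporaryDirectory() as tmp:
        left_paths = _partition_to_disk(left, key, partitions, tmp, "L")
        right_paths = _partition_to_disk(right, key, partitions, tmp, "R")
        # Matching keys always land in the same partition pair, so each
        # pair can be joined independently with much less memory.
        for lp, rp in zip(left_paths, right_paths):
            result.extend(_in_memory_join(_read_rows(lp), list(_read_rows(rp)), key))
    return result
```

Both paths return the same rows; the spilling path trades extra disk I/O for a smaller memory footprint, which is exactly why more RAM speeds up this phase.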
Phase Three: Pushing Data to the Tabular Database
After transforming the data, you need to push it to the Tabular database. For this, you need a virtual machine with enough RAM, because the Tabular database requires a lot of memory: the more RAM you have, the better the performance.
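A rough RAM estimate for this phase can be sketched as a formula: compressed model size times a processing headroom factor, plus a fixed overhead. Every parameter value below (10x compression, 2x headroom during processing, 4 GiB of overhead) is a hypothetical assumption, not a figure from this article; measure your own model before sizing a VM.

```python
def estimate_tabular_ram_gib(source_data_gib: float,
                             compression_ratio: float = 10.0,  # assumption
                             processing_factor: float = 2.0,   # assumption
                             overhead_gib: float = 4.0) -> float:  # assumption
    """Roughly estimate RAM needed to load and process a Tabular model."""
    model_size_gib = source_data_gib / compression_ratio  # in-memory model size
    # Processing may hold old and new copies of data at once, hence the factor.
    return model_size_gib * processing_factor + overhead_gib


print(estimate_tabular_ram_gib(200))  # 44.0
```

Under these assumptions, 200 GiB of source data would call for roughly 44 GiB of RAM; change any assumption and the answer moves proportionally.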
Conclusion
By understanding which VM parameters matter for data transformation, and how many cores and how much RAM you need, you can make an informed decision. The right virtual machine improves the speed and efficiency of your data warehouse solution and keeps costs down, which makes choosing it a critical step for the success of your project.
Watch the video to learn more about how to choose the right virtual machine for data warehousing on Microsoft Azure.