Cannot associate an Azure WAF policy to regional Azure Application Gateway (v2)

With App Gateway v2, you can enable the new WAF policies, which allow finer tweaking of the WAF rules and settings, for instance: exclusions, custom rules, multiple rule sets and more – still a fairly basic WAF, but still OK pricing too, though.

Anyhow, the idea is that when you create an App Gateway with WAF, by default it only comes with the legacy (v1-style) WAF configuration; associating a WAF policy will then only upgrade that part to v2 – the policy, not the App Gateway itself, which already needs to be a v2 SKU…

And so, I was facing some issues when trying to pre-configure everything in a WAF policy and then associate it to an App Gateway: the association would fail and the deployment spits out:

{'code':'DeploymentFailed','message':'At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.','details':[{'code':'BadRequest','message':'{\r\n \'error\': {\r\n \'code\': \'ApplicationGatewayWafPolicyCannotBeAttachedDueToConflict\',\r\n \'message\': \'Cannot attach Firewall policy /subscriptions/yourID/resourceGroups/yourRG/providers/Microsoft.Network/ApplicationGatewayWebApplicationFirewallPolicies/WAFpolicy1 to the Application Gateway /subscriptions/yourID/resourceGroups/sameRG/providers/Microsoft.Network/applicationGateways/webgw1, since the former is not in sync with WebApplicationFirewallConfiguration.\',\r\n \'details\': []\r\n }\r\n}'}]}

the error

By "not in sync", it really wants to tell you that the standalone Azure WAF policy and the App Gateway's current WAF configuration don't match. Looking at the error (since the former is not in sync with WebApplicationFirewallConfiguration), you have to take the following into consideration:

When adding a WAF policy, it must match the exact current configuration of its App Gateway (or of the currently attached WAF policy, if you are replacing one). Once a policy is associated with your Application Gateway, you can then continue to make changes to your WAF rules and settings.

This means the policy must contain the exact same settings that are currently applied to the gateway (a rough Azure CLI sketch of the flow follows the list below). The same:

- Custom rules
- WAF configuration (it is very important to check: the OWASP rule set and the disabled rules must be identical)
- Exclusions
- Global parameters, if applied
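Reusing the names from the error above (yourRG, WAFpolicy1, webgw1), that flow looks roughly like this. The waf-policy sub-commands and flags are from memory of recent az CLI versions and may differ on yours, so treat this as a sketch, not a recipe:

# 1. Create the standalone WAF policy.
az network application-gateway waf-policy create \
  --resource-group yourRG \
  --name WAFpolicy1

# 2. Mirror the gateway's current WAF configuration in the policy: same rule set
#    type/version and mode, plus any disabled rules, exclusions and custom rules
#    the gateway already has (adjust the OWASP version to match yours).
az network application-gateway waf-policy managed-rule rule-set add \
  --resource-group yourRG \
  --policy-name WAFpolicy1 \
  --type OWASP \
  --version 3.0

az network application-gateway waf-policy policy-setting update \
  --resource-group yourRG \
  --policy-name WAFpolicy1 \
  --state Enabled \
  --mode Prevention

# 3. Only once the two match, associate the policy with the gateway
#    (generic --set shown here; your CLI version may offer a dedicated flag).
az network application-gateway update \
  --resource-group yourRG \
  --name webgw1 \
  --set firewallPolicy.id="/subscriptions/yourID/resourceGroups/yourRG/providers/Microsoft.Network/ApplicationGatewayWebApplicationFirewallPolicies/WAFpolicy1"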

Hoping this helps…

#appgw, #application-gateway, #azure, #policy

SaaS application integration: what do you need to ask?

Don’t let your application govern you. When selecting a SaaS (application-as-a-service) offering, ask the prospective vendor the right questions.

What is your disaster recovery plan?
Most SaaS vendors have a disaster recovery plan, but not all plans are created equal. Some mistakenly believe taking regular backups constitutes disaster recovery.
Make sure your SaaS vendor has a solid plan that covers a recovery timeline, routine testing, and geographic isolation. In other words, if there is an earthquake, is that going to wipe out all of your data centres?
If the SaaS vendor doesn’t provide recovery services, do they allow your IT to access the data and back it up?

What if you go out of business? What is your data-out policy?
Often we think of catastrophic events in the form of natural disasters, but a vendor going out of business can do just as much damage. Having access to that data is non-negotiable no matter what happens outside your control.
There must be a data-out clause in your contract that allows you access to your data at any time, especially if the business is not going well.

Do you take my security seriously?
In the digital world, security is paramount, and it doesn’t take much for your company to end up on the front page of the news.
If you find it difficult to know which security features are most important, bring in your IT department for guidance.

You will want to find an answer to the following:
  • Residency of the data – under which country's/province's legislation will your data be governed?
  • Encrypting data – when stored with the vendor, data must be encrypted with a key you control to ensure only you can view it. Ask about their zero-trust policy.
  • Secure data transmission and storage – data needs to be secured not only at rest but also in transit between users and systems. Ask about the protocols in place.
  • Access restrictions – who has access to what? Whether it is to the data or to the data centres hosting it, ensure you are comfortable with who can get close to your data.
  • Staff training and sensitive-information handling – if they are given access, will support or maintenance staff know what to do should they stumble upon sensitive information? Most vendors train their staff to ensure they know what to do and are bound not to mishandle or leak your data.
  • Secure practices and certifications – the industry has various certifications that vendors need to achieve to show their business is secure. Demand those!
  • Regular monitoring and scanning – similarly, you might want to know the result of their last security penetration test. How did they score? Are they transparent? Will they allow your IT team to perform their own tests?

How scalable is your product?
It is one thing to watch a flawless demo or run through a proof of concept without a glitch. But can the application withstand what the real world throws at it? Unfortunately, it is tough to know the answer to this until the real world happens.
For example, if one of the other clients of the service provider executes a huge project, is that going to negatively impact your performance? It is smart, and absolutely appropriate, to inquire about how well the vendor can scale their product to meet demand, and how quickly those demands can be met.

There has to be a strong commitment to clearly announced SLAs and SLOs. The vendor should clarify the following:
  • Response time in case of emergency (service interruption) – when things go down, how fast can they answer you or provide you with a status? What guarantees do they offer that the system will be available to you and your team?
  • Response time in case of a non-urgent question or problem (utilization or configuration) – even when it is not critical, you want to know that your business is a priority to them.
  • Is there a guaranteed resolution time? If so, what is it?
  • What is the support escalation ladder (what are the various levels and their roles)? When the SLAs or SLOs above are defined, the escalation path should be well known; a lack of transparency is a bad sign.
  • Are SLA breaches backed by penalties? You are paying money for this; how about getting some back should the promised service not be available?

Do you offer robust integration?
Sometimes it is easy to forget that, to ensure productivity, new systems need to be well integrated so they allow for automation and ease of use, which drives higher adoption. In general, think about all of the interfaces this application is going to have, to keep human intervention to a minimum.

  • Single sign-on and authentication – the vendor must support the use of your credentials to log in, with platforms like Azure Active Directory (SAML 2.0).
  • Automated user provisioning – as with all systems, the user base needs to be created and easily maintained, ideally with seamless provisioning from platforms like Azure Active Directory (SCIM).
  • Document storage within Office 365 (Teams/SharePoint Online) – ease of access and security are best when documents don’t need to go outside of your corporate boundaries.
  • If the app is part of any financial systems, is it able to speak that language and present the information so that the recipient system can process it?
  • Also, most organizations have some business-intelligence data lake where data is processed to create funky reports and dashboards. It is very important for your executive team to be able to view important scorecards using that single system. Can the vendor allow pull access from something like Cognos Analytics? Can it export the relevant data to another location?

Microsoft Phone System Azure VoiceMail transcription is garbage

When we moved to Microsoft Phone System I was psyched: my phone would follow me around – who wouldn’t want that? – it would work on any device I carry around, it would notify me in my inbox of missed calls (with details, should I have the caller in my contact book), and it would also transcribe my voicemails for me so I don’t have to listen to them anymore.

Actually, my voicemail greeting says not to leave a voicemail and just send me an email, because I never check my voicemails anyway. And I thought Microsoft Phone System Azure VoiceMail transcription would help with that.

After a couple of months of usage, it turns out whatever engine Microsoft uses for transcription of audio to text is utter rubbish:

  • it misses the whole transcription even if the audio is clear enough
  • it usually doesn’t transcribe the start of the VoiceMail, so you’d miss the name of the caller
  • it doesn’t understand Microsoft jargon
  • it transcribes words that don’t even exist – not in my dictionary

It sucks so much that I actually have to listen to the voicemail to get an idea of what audio Azure VM was trying to transcribe. In its defence, it seems that the deeper into the voicemail you get, the better the transcription works.

And so I took the audio file and ran it through a few other online transcription services I found duckduckgo-ing around. They all did better!

  • Original vs Azure VM
    • Levenshtein: 393
  • Original vs Sonix
    • Levenshtein: 224
  • Original vs Temi
    • Levenshtein: 120

I am not familiar with the other providers, but overall the transcription quality from all of them is so much more accurate – a lower Levenshtein distance from the original means fewer edits, i.e. a closer transcript.

I believe Microsoft is missing out a lot here. I had created a support ticket with Microsoft; it sat in the queue for 4 months before they finally brushed me off with the promise of a new engine coming late 2018 – still waiting for it as of the date of this post.

In this time of AI and contextually rich information, Azure VoiceMail could really use grammar and spelling engines on top of a better transcription engine.

#azure, #phone-system, #transcription, #voicemail

Azure SQL Managed Instance setup and performance thoughts

I am not going to describe here what SQLMI is and how it compares to the other SQL offerings on Azure, but if you need to know more about MI this is a great starting place: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-managed-instance. Either way, it is a very viable and cool option to host databases in Azure.

Also, if you are looking to test this out and don’t want to integrate the MI into an existing vnet, you can look at this quick-start template: https://github.com/Azure/azure-quickstart-templates/tree/master/101-sql-managed-instance-azure-environment


Getting a SQL MI ready

This said, getting an MI installed is not like any other resource or PaaS service, and you will need to do the following (a quick CLI sketch of these steps follows the list):

  1. Configure the Virtual Network where the Managed Instance will be placed.
  2. Create a route table that will enable the Managed Instance to communicate with the Azure management service.
  3. Optionally create a dedicated subnet for the Managed Instance (or use the default one that is created with the Virtual Network).
  4. Assign the route table to the subnet.
  5. Double-check that you have not added anything that might cause a problem.
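A minimal Azure CLI sketch of those steps could look like this (yourRG, miVnet, miSubnet and the address ranges are just placeholders for the example):

# 1 & 3. Virtual network with a dedicated subnet for the Managed Instance(s).
az network vnet create \
  --resource-group yourRG \
  --name miVnet \
  --address-prefix 10.10.0.0/16 \
  --subnet-name miSubnet \
  --subnet-prefix 10.10.1.0/24

# 2. Route table the MI will use to reach the Azure management service,
#    with the mandatory 0.0.0.0/0 -> Internet route.
az network route-table create \
  --resource-group yourRG \
  --name miRouteTable

az network route-table route create \
  --resource-group yourRG \
  --route-table-name miRouteTable \
  --name ToAzureManagement \
  --address-prefix 0.0.0.0/0 \
  --next-hop-type Internet

# 4. Assign the route table to the MI subnet.
az network vnet subnet update \
  --resource-group yourRG \
  --vnet-name miVnet \
  --name miSubnet \
  --route-table miRouteTable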


The vnet is the network container for all your stuff; this said, the MI must not be on a subnet that has anything else in it, so creating a dedicated subnet for all of your MIs makes sense. There cannot be any Service Endpoints attached to the subnet either. If you want to have only one subnet in your Virtual Network (the Virtual Network blade lets you define a first subnet called default), you need to know that a Managed Instance subnet must have between 16 and 256 addresses; therefore, use subnet masks from /28 to /24 when defining the IP range of that default subnet. If you know how many instances you will have, make sure you have at least 2 addresses per instance + 5 system addresses in the subnet.


The route table will allow the MI to talk to the Azure management service. This is required because the Managed Instance is placed in your private Virtual Network, and if it cannot communicate with the Azure service that manages it, it will become inaccessible. Add a new “Route table” resource and, once it is created, go to its Routes blade and add a “0.0.0.0/0, next hop Internet” route. This route enables Managed Instances placed in your Virtual Network to communicate with the Azure management service that manages the instance. Without it, the Managed Instance cannot be deployed.


Rules for the subnet

  • You have the Managed Instance route table assigned to the subnet.
  • There should be no Network Security Groups on your subnet.
  • There should be no service endpoints on your subnet.
  • There should be no other resources in the subnet.


Altogether (a quick CLI sanity check follows this list):

  • Virtual Network should have Service Endpoints disabled
  • Subnet must have between 16 and 256 IP addresses (masks from /28 to /24)
  • There should be no other resources in your Managed Instance subnet.
  • Subnet must have a route with 0.0.0.0/0 Next hop internet
  • Subnet must not have any Network Security Group
  • Subnet must not have any service endpoint
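Before kicking off the deployment, a quick way to sanity-check the subnet against that list (using the example names from the sketch above; the property names come from the subnet resource model):

az network vnet subnet show \
  --resource-group yourRG \
  --vnet-name miVnet \
  --name miSubnet \
  --query "{addressPrefix:addressPrefix, routeTable:routeTable.id, nsg:networkSecurityGroup, serviceEndpoints:serviceEndpoints}"

# Expect: a /28 to /24 addressPrefix, your MI route table id, and null/empty for nsg and serviceEndpoints.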


More on configuring your SQLMI on MSDN (or whatever the new name is:) ) https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/03/14/how-to-configure-network-for-azure-sql-managed-instance/


Access and restores

And then what, you ask? You need to connect and get on with the database stuff?

SQLMIs are private by default, and the easy way in is to connect from a VM within the same vnet using SSMS, or from your app running next to the SQLMI – the usual architecture stuff, right?

But wait, there are more scenarios here! https://docs.microsoft.com/en-us/azure/sql-database/sql-database-managed-instance-connect-app

(screenshot: sqlmi1)

Don’t be fooled by the version number here: this is not your regular v12 MSSQL (aka 2014). These Azure MSSQL flavours follow a different numbering and are actually more like a mixture of 2017 and 2019 (as of today!).

And if you thought you could restore a DB from a backup (.bak) file, you can, but it will have to come from a blob storage container, as SQLMI only understands that device type (RESTORE ... FROM URL).
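For illustration, the restore-from-blob flow looks roughly like this (the storage account, container, SAS token, instance endpoint and credentials are all placeholders; the same T-SQL works from SSMS):

sqlcmd -S yourmi.xxxxxxxx.database.windows.net -U sqladmin -P '<password>' -Q "
CREATE CREDENTIAL [https://yourstorage.blob.core.windows.net/backups]
    WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
    SECRET = '<sas-token-without-the-leading-question-mark>';
RESTORE DATABASE [MyDb]
    FROM URL = 'https://yourstorage.blob.core.windows.net/backups/MyDb.bak';
"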

Or use DMS https://docs.microsoft.com/en-us/azure/dms/tutorial-sql-server-to-managed-instance


Performance

Because SQLMI is managed and under SLA, all databases are fully logged, with log-rate throttling (https://blogs.msdn.microsoft.com/sqlcat/2018/07/20/storage-performance-best-practices-and-considerations-for-azure-sql-db-managed-instance-general-purpose/) to ensure the cluster can ship and replay the logs in a reasonable amount of time. When doing more intensive operations, such as inserting millions of records, this can play a role in slowing things down.


The answer to that is: over-allocated storage! Indeed, behind the DB files are blobs with various IOPS capabilities, and just like managed disks, the disk IOPS for your DBs come with the amount of storage you allocate. The tiers are roughly up to 512 GB; 513–1024 GB; 1025–2048 GB… so even if you have a small DB, you might want to go with more space on the instance and grow your DB (and log) files to the maximum right away – you pay for it after all! More on premium disks here: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/premium-storage#scalability-and-performance-targets
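As a rough sketch, pre-growing the data and log files so they land on a bigger (faster) blob is plain T-SQL (the logical file names and the 520 GB target are just examples; check sys.database_files for the real names):

sqlcmd -S yourmi.xxxxxxxx.database.windows.net -U sqladmin -P '<password>' -Q "
ALTER DATABASE [MyDb] MODIFY FILE (NAME = 'MyDb', SIZE = 520GB);     -- push the data file into the next storage tier
ALTER DATABASE [MyDb] MODIFY FILE (NAME = 'MyDb_log', SIZE = 520GB); -- same idea for the log file
"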


Tip! Remember that tempdb is not fully logged and lives on local SSD for the Business Critical version of SQLMI. Use that tier if you want untethered speed.

Also of interest, a script to measure IOPS: https://github.com/dimitri-furman/managed-instance/blob/master/MI-GP-storage-perf/MI-GP-storage-perf.sql


How to fix NTP issues on Nutanix CVMs

First, review the following article: https://portal.nutanix.com/#/page/kbs/details?targetId=kA032000000bmjeCAA

But if you suspect an offset anyway, run the following. In this example, the time is way off:

allssh grep offset ~/data/logs/genesis.out
================== 173.23.33.12 =================
2018-06-20 08:54:07 INFO time_manager.py:555 NTP offset: -119.154 seconds
2018-06-20 09:04:43 INFO time_manager.py:555 NTP offset: -119.149 seconds
2018-06-20 09:15:14 INFO time_manager.py:555 NTP offset: -119.154 seconds
2018-06-20 09:25:50 INFO time_manager.py:555 NTP offset: -119.145 seconds
2018-06-20 09:36:21 INFO time_manager.py:555 NTP offset: -119.154 seconds

Cassandra will not allow the server to immediately jump back to the correct time, because the large offset would mess up timestamps. But fear not: the CVMs come with a script to slowly catch up, “fix_time_drift”.

allssh '(/usr/bin/crontab -l && echo "*/5 * * * * bash -lc /home/nutanix/serviceability/bin/fix_time_drift") | /usr/bin/crontab -'

Then you can keep an eye on the cluster time offset using:

for i in `svmips` ; do echo CVM $i: ; ssh $i "/usr/sbin/ntpq -pn" ; echo ; done
CVM 173.23.33.12:
FIPS mode initialized
Nutanix Controller VM
remote refid st t when poll reach delay offset jitter
==============================================================================
*173.23.33.1 13.65.245.138 3 u 23 256 377 0.513 23.415 21.630
127.127.1.0 .LOCL. 10 l 107m 64 0 0.000 0.000 0.000

CVM 173.23.33.14:
FIPS mode initialized
Nutanix Controller VM
remote refid st t when poll reach delay offset jitter
==============================================================================
*173.23.33.12 173.23.33.1 4 u 135 256 377 0.226 6.950 6.836

CVM 173.23.33.16:
FIPS mode initialized
Nutanix Controller VM
remote refid st t when poll reach delay offset jitter
==============================================================================
*173.23.33.12 173.23.33.1 4 u 30 256 377 0.240 -2.570 6.010

or with the previously mentioned command:

allssh grep offset ~/data/logs/genesis.out
================== 173.23.33.12 =================
2018-06-21 09:41:16 INFO time_manager.py:555 NTP offset: -118.121 seconds
2018-06-21 09:40:22 INFO time_manager.py:555 NTP offset: -0.000 seconds
2018-06-21 09:50:52 INFO time_manager.py:555 NTP offset: 0.005 seconds
2018-06-21 10:01:22 INFO time_manager.py:555 NTP offset: 0.006 seconds

When all caught up, run the NTP health check:

ncc health_checks network_checks check_ntp

Also, after all is clear, don’t forget to remove the fix_time_drift crontab job!

allssh "(/usr/bin/crontab -l | sed '/fix_time_drift/d' | /usr/bin/crontab -)"

Azure Cloud, IOPS, DTU and VM equivalence, PaaS vs IaaS MSSQL

I am building the next generation of data warehousing for a client, and just because the cloud’s sky is the limit doesn’t mean they can afford to spend without limit. Millions (25 million) of records need to be analyzed to produce meaningful reports and dashboards.

Cranking the power all the way up gives us the information we need in seconds. The highest database tier offered is something called P11. But what is behind P11? Did I mention this PaaS option was expensive?

First, it is not readily translatable into regular machines, as the various tiers use a mix of compute, memory and disk IOPS – per Microsoft, “a blended measure of CPU, memory, and data I/O and transaction log I/O…”. This blended unit is the Database Transaction Unit, aka DTU.

Based on performance trending, it comes to something like this (courtesy of “What the heck is a DTU?”):

Number of cores         | IOPS         | Memory | DTUs | Service Tier    | Comparable Azure VM size
1 core, 5% utilization  | 10           | ???    | 5    | Basic           | Standard_A0, barely used
<1 core                 | 150          | ???    | 100  | Standard S0-S3  | Standard_A0, not fully utilized
1 core                  | up to 4000   | ???    | 500  | Premium – P4    | Standard_DS1_v2
2-3 cores               | up to 12000  | ???    | 1000 | Premium – P6    | Standard_DS3_v2
4-5 cores               | up to 20000  | ???    | 1750 | Premium – P11   | Standard_DS4_v2

Notice that this is a comparable size… and that a P11 is about CAD $15,000/month!!! And while PaaS is great, that price tag is a little too much for the common mortals getting into DWH.

Anyhow, trying to recreate a cheaper P11, I went and picked a VM size of E8s_v3, because it said “Great for relational database servers, medium to large caches, and in-memory analytics”. More sizes here: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes

Size            | vCPU | Memory: GiB | Temp storage (SSD) GiB | Max data disks | Max cached and temp storage throughput: IOPS / MBps (cache size in GiB) | Max uncached disk throughput: IOPS / MBps | Max NICs / Expected network bandwidth (Mbps)
Standard_E8s_v3 | 8    | 64          | 128                    | 16             | 16,000 / 128 (200)                                                      | 12,800 / 192                              | 4 / 4,000

And so, to host my data, I am adding the best-suited disk I can find on Azure:

This one, because it is closest to the size I need:
a P20 disk is 512 GB; it can achieve 2300 IOPS and up to 150 MB/s.

When I could have gone for the highest:
a P30 disk is 1024 GB; it can achieve 5000 IOPS and up to 200 MB/s. But it only comes in the 1 TB size.

The key thing* to understand is that the VM tier will limit the IOPS of what you connect to it.

And so here, while my disk is rated at 2300 IOPS and 150 MB/s, the machine specifications are going to limit me to 16,000 IOPS (no problem here) but only 128 MB/s of cached throughput – which is OK, because the P20 is only rated at 150 MB/s anyway.

Once the VM is up and the disk attached, a quick benchmark roughly matches the given specifications:

(screenshot: iaas iops benchmark)

135 MB/s for 150 MB/s advertised…

But wait, is this fast? My laptop gives me way more:

(screenshot: laptop iops benchmark)

How to attach faster disks to my VMs then?  What if I want to create a VHD with 600 GB and 400 MB/s of throughput?

You will not obtain such throughput if you just create a 600 GB VHD, because Azure will create a 600 GB VHD on a P30 Disk, and then you will have only 200 MB/s.

To achieve that, you should use striping, and you can proceed in different ways (a CLI sketch for provisioning the disks follows this list):

  • Create two 600 GB VHDs. Azure will create them using P30 disks. Then you use your striping tool (Storage Spaces) to create a 1200 GB volume. This volume will permit 400 MB/s and 10000 IOPS. But in this case, you will have 600 unneeded GB.
  • Create three VHDs with 200 GB each. Azure will create them using P20 disks. Then you use your striping tool (Storage Spaces) to create a 600 GB volume. This volume will permit 450 MB/s (150 MB/s × 3) and 6900 IOPS (2300 IOPS × 3).
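For example, the three-disk option could be provisioned like this from the Azure CLI before striping the disks inside the guest with Storage Spaces (resource group and VM names are placeholders):

# Create and attach three new 200 GB Premium data disks to the VM.
for i in 1 2 3; do
  az vm disk attach \
    --resource-group yourRG \
    --vm-name dwhVM \
    --name "dwh-data-$i" \
    --new \
    --size-gb 200 \
    --sku Premium_LRS
done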

Wait, I need more! What if I want to create a VHD with 600 GB and 600 MB/s of throughput? Unfortunately, we can’t just dream and ask Azure to do it, not as of today. In fact, the maximum throughput possible is 512 MB/s; we can’t do better.

OK. You have created the striped volumes and you are still not getting what you want? Remember the key thing I mentioned above*? The total data storage, the IOPS and the throughput are limited by the VM series and size. Each Azure Virtual Machine type is limited to a number of disks (total storage size), a maximum IOPS and a maximum throughput. For example, you may achieve 128 MB/s of cached throughput on a Standard_E8s_v3; other VM types will throttle your IOPS or throughput once you reach their threshold.

While I was looking at the memory-optimized server sizes, note there are also storage-optimized VM sizes:

Size            | vCPU | Memory: GiB | Temp storage (SSD) GiB | Max data disks | Max cached and temp storage throughput: IOPS / MBps (cache size in GiB) | Max uncached disk throughput: IOPS / MBps | Max NICs / Expected network bandwidth (Mbps)
Standard_E8s_v3 | 8    | 64          | 128                    | 16             | 16,000 / 128 (200)                                                      | 12,800 / 192                              | 4 / 4,000
Standard_L8s    | 8    | 64          | 32                     | 32             | 40,000 / 400                                                            | 10,000 / 250                              | 4 / 8,000

And so, beyond the disk creation steps above, you will also want to pick a VM size that can actually deliver the throughput you need.
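If the VM size turns out to be the cap, moving to a bigger or storage-optimized size is a one-liner (placeholder names again; note the VM reboots during a resize):

az vm resize \
  --resource-group yourRG \
  --name dwhVM \
  --size Standard_L8s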

#azure, #database, #dtu, #iaas, #mssql, #paas, #sql

IBM v7000 WebUI not responding

I ran into the case of a non-responding WebUI to manage LUNs (and other things) on an IBM Storwize V7000.
You can only run satask commands if you connect to the CLI using the SSH private key associated with the user called superuser. No other SSH key will allow you to run satask commands:

IBM_Storwize:SAN1:superuser>sainfo lsservicenodes
panel_name cluster_id cluster_name node_id node_name relation node_status error_data
01-1 00000100204029A2 SAN1 1 node1 local Active
01-2 00000100204029A2 SAN1 2 node2 partner Active

To find out which node is the configuration node, run the following command:

sainfo lsservicestatus <Active panel name>

The following output tells you which node is the config node:

IBM_Storwize:SAN1:superuser>sainfo lsservicestatus 01-1
panel_name 01-1
cluster_id 00000100204029a2
cluster_name SAN1
cluster_status Active
cluster_ip_count 2
cluster_port 1
cluster_ip 172.xx.xx.33
cluster_gw 172.xx.xx.252
cluster_mask 255.255.254.0

Use the following command to restart the web service on the configuration node:

satask restartservice -service tomcat 01-1

Wait at least 5 minutes for the service to restart before assuming that this has failed.

If this does not solve your problem, you should restart the canister with the configuration node on it. Remember to connect to the node that is not the configuration node and restart the “partner”. If you are unable to reset it that way, reset the canister physically: pull it out a few inches from the chassis and then insert it again after 30 seconds.

#ibm, #storage-2, #v7000, #webui