Learn why Snowflake's cloud data warehouse solution is gaining popularity so quickly—and how your business can make the most of it.
by Ricky Sharma
Why Snowflake is special
First, let’s talk about why Snowflake is gaining momentum as a top cloud data warehousing solution:
- It has out-of-the-box features like separation of storage and compute, on-the-fly scalable compute, data sharing, data cloning, and third-party tool support.
- It serves a wide range of technology areas, including data integration, business intelligence, advanced analytics, and security & governance.
- It provides drivers and connectors for programming languages such as Go, Java, .NET, Python, C, and Node.js.
- For general users, Snowflake provides broad ANSI SQL support for managing day-to-day operations.
- It’s cloud agnostic: it runs on Amazon Web Services (AWS) and Microsoft Azure (with the prospect of adding Google Cloud soon), scaling seamlessly on either platform.
- Beyond cloud infrastructure, it offers many choices for designing modern architectures, which makes it well suited to agile methodologies and dynamic usage patterns.
- Snowflake fits many use cases: data lakes with raw data, operational data stores (ODS) with staged data, and data warehouses/data marts with presentable, modeled data.
- Data processing is simplified: Users can do data blending, analysis, and transformations against various types of data structures with a single language, SQL.
- Snowflake offers dynamic, scalable computing power with charges based purely on usage.
- Snowflake's ability to work directly against massive volumes of raw data has significantly boosted data analysis. For example, “schema on read” lets you access structured and semi-structured data without having to model it first.
- It lets us analyze a variety of data formats, including CSV, JSON, XML, Parquet, and Avro, and blend them in the same query using SQL.
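To make “schema on read” concrete, here is a minimal Python sketch of the idea, not Snowflake code: raw JSON records are stored untouched, and each consumer applies its own projection only when reading. In Snowflake itself you would typically land such records in a VARIANT column and query paths with notation like `src:customer.name::string`. The sample records and the helper names `get_path` and `read_with_schema` are hypothetical.

```python
import json
from typing import Any, Iterable

# Raw records landed as-is, with no upfront schema (a stand-in for a
# table holding one VARIANT column). Sample data is hypothetical.
RAW_EVENTS = [
    '{"order_id": 1, "customer": {"name": "Ana", "tier": "gold"}, "total": 42.5}',
    '{"order_id": 2, "customer": {"name": "Ben"}, "total": 10}',
    '{"order_id": 3, "customer": {"name": "Cy", "tier": "silver"}, "total": 7.25, "coupon": "X1"}',
]


def get_path(doc: Any, path: str) -> Any:
    """Follow a dotted path like 'customer.tier'; return None if any key is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def read_with_schema(raw: Iterable[str], columns: dict[str, str]) -> list[dict[str, Any]]:
    """Apply a schema at read time: parse each raw record and project the requested paths."""
    rows = []
    for line in raw:
        doc = json.loads(line)
        rows.append({name: get_path(doc, path) for name, path in columns.items()})
    return rows


# Two consumers read the same raw data through different "schemas".
orders = read_with_schema(RAW_EVENTS, {"id": "order_id", "total": "total"})
tiers = read_with_schema(RAW_EVENTS, {"customer": "customer.name", "tier": "customer.tier"})
print(orders)
print(tiers)
```

Note that the record missing `tier` simply yields `None` instead of failing a load, much as a missing path in a Snowflake VARIANT query returns NULL, and the extra `coupon` field is ignored until some consumer asks for it.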
All that adds up to a lot of power. And we all know what comes with great power! It’s important to establish Snowflake correctly for your enterprise and to enable your people for the big jump ahead.
To help get you there, here are four keys to establishing a sustainable and adaptive enterprise data warehouse with Snowflake.
1. Yes, you may need to do some rebuilding
Many customers moving from on-prem to cloud ask, “Can I leverage my current infrastructure standards and best practices, such as database and user management, security, and DevOps?” This raises a valid concern about building policies from scratch, but it’s important to adapt to new technology and new business opportunities, and that may in fact require some rebuilding. If you took an engine from a 1982 Chevrolet and installed it in a 2019 Mustang, would you expect the same performance?
It’s important to make choices not because “that’s how we’ve always done it,” but because those choices will help you adopt new technology, gain agility, and empower your business applications and processes. Key areas to review include policies, user management, sandbox setups, data loading practices, ETL frameworks, tools, and code base.
2. Get your data modelling right
Snowflake serves multiple purposes: data lake, data mart, data warehouse, ODS, and database. It also supports various modelling techniques, such as star schema, snowflake schema, Data Vault, and BEAM.
Snowflake supports both “schema on read” and “schema on write,” which sometimes creates confusion about how to position Snowflake appropriately.
The solution is to let your usage patterns drive your data model. Think about how you expect your data consumers and business applications to use the data assets in Snowflake. This will help you organize your data and resources to get the best results from Snowflake.
Here’s an example. In complex use cases, it’s usually good practice to build a layered solution involving:
- Layer 1 as a data lake to ingest all raw structured and semi-structured data.
- Layer 2 as an ODS to store staged and validated data.
- Layer 3 as a data warehouse to store cleansed, categorized, normalized, and transformed data.
- Layer 4 as data marts to deliver targeted data assets to end consumers and applications.
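The four layers can be pictured as a chain of transformations. The plain-Python sketch below illustrates what each layer is responsible for; it is not Snowflake code. In practice each layer would typically be its own database or schema, populated by SQL or an ETL tool. The sample records, the validation rule, and the function names are all hypothetical.

```python
from collections import defaultdict

# Layer 1 (data lake): raw records exactly as received. Sample data is hypothetical.
raw = [
    {"id": "1", "country": " us ", "amount": "19.99"},
    {"id": "2", "country": "DE", "amount": "oops"},   # fails validation
    {"id": "3", "country": "de", "amount": "5.01"},
    {"id": "1", "country": "US", "amount": "19.99"},  # duplicate id
]


def to_ods(records):
    """Layer 2 (ODS): keep only records that pass basic validation, deduplicated by id."""
    staged = {}
    for r in records:
        try:
            float(r["amount"])
        except (KeyError, ValueError):
            continue  # a real pipeline would route rejects to an error table
        staged.setdefault(r["id"], r)  # first occurrence wins
    return list(staged.values())


def to_warehouse(records):
    """Layer 3 (warehouse): cleanse and normalize types and codes."""
    return [
        {
            "id": int(r["id"]),
            "country": r["country"].strip().upper(),
            "amount": round(float(r["amount"]), 2),
        }
        for r in records
    ]


def to_mart(records):
    """Layer 4 (data mart): a consumer-specific aggregate, here revenue per country."""
    totals = defaultdict(float)
    for r in records:
        totals[r["country"]] += r["amount"]
    return {country: round(total, 2) for country, total in sorted(totals.items())}


mart = to_mart(to_warehouse(to_ods(raw)))
print(mart)  # {'DE': 5.01, 'US': 19.99}
```

Keeping the raw layer untouched means a bug in the validation or cleansing rules can be fixed and the downstream layers rebuilt without re-extracting data from the sources.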
3. Figure out ingestion and integration
Snowflake adapts seamlessly to various data integration patterns, including batch (e.g., fixed schedule), near real time (e.g., event based), and real time (e.g., streaming). To identify the best pattern, evaluate your data loading use cases. You may want to combine these patterns: data received on a fixed schedule goes through a static batch process, while data delivered at varying times uses a dynamic, event-driven pattern. Evaluate your data sourcing requirements and delivery SLAs to map each to an appropriate ingestion pattern.
Also account for future use cases. For example, “data X” is received by 10am daily, so it makes sense to schedule a batch workflow at 10am, right? But what if it were ingested by an event-based workflow instead? That would improve your SLA, deliver data faster, avoid manual effort when delays happen, and replace a static time dependency with an automated trigger. Try to think through as many scenarios as you can.
Once integration patterns are identified, ETL tooling comes next. Snowflake supports many integration tools and partners, such as Talend, Informatica, Matillion, SnapLogic, Dell Boomi, and Alteryx, and many of these have developed native connectors for Snowflake. Snowflake also supports tool-free integration using open source languages like Python.
To choose the right integration platform, evaluate these tools against your data volume, processing requirements, and usage demands. Also consider whether each tool processes data in its own memory and/or can perform SQL pushdown (using the Snowflake warehouse for processing). Pushdown is especially helpful in big data use cases because it removes the bottleneck of the tool’s memory.
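A small simulation makes the trade-off in the “data X” example concrete. The Python sketch below compares when data would be ready under a fixed 10am batch versus an event-triggered load. The arrival times, the 15-minute load duration, and the rule that a late file waits for the next day's scheduled run are illustrative assumptions, not Snowflake behavior. (In Snowflake, event-driven loading is often handled by Snowpipe, which can be triggered by cloud storage event notifications.)

```python
from datetime import datetime, timedelta

# Hypothetical arrival times of "data X" over five days; Thursday's file arrives late.
ARRIVALS = [
    datetime(2019, 6, 3, 9, 40),
    datetime(2019, 6, 4, 9, 5),
    datetime(2019, 6, 5, 9, 55),
    datetime(2019, 6, 6, 11, 30),  # late delivery
    datetime(2019, 6, 7, 8, 50),
]
LOAD_DURATION = timedelta(minutes=15)  # assumed processing time for either pattern


def scheduled_load(arrival, run_hour=10):
    """Fixed-schedule batch: the load starts at run_hour; late data waits for the next day's run."""
    run = arrival.replace(hour=run_hour, minute=0, second=0, microsecond=0)
    if arrival > run:
        run += timedelta(days=1)  # today's window was missed (unless someone reruns it manually)
    return run + LOAD_DURATION


def event_load(arrival):
    """Event-based: the load is triggered as soon as the data lands."""
    return arrival + LOAD_DURATION


for arrival in ARRIVALS:
    batch, event = scheduled_load(arrival), event_load(arrival)
    print(f"arrived {arrival:%a %H:%M}  batch ready {batch:%a %H:%M}  event ready {event:%a %H:%M}")
```

On the late Thursday delivery, the scheduled batch only makes the data available Friday at 10:15, while the event-triggered load has it ready Thursday at 11:45 with no manual rerun.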
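The difference between in-memory processing and SQL pushdown can be shown with a tiny experiment. In this Python sketch, the standard library's `sqlite3` stands in for the Snowflake warehouse, and the `sales` table and its rows are hypothetical. Both functions compute the same totals, but the pushdown version lets the database do the aggregation and moves only the result rows to the client.

```python
import sqlite3
from collections import defaultdict

# sqlite3 is only a stand-in for the warehouse; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0), ("west", 25.0), ("north", 10.0)],
)


def aggregate_in_memory(conn):
    """ETL-tool style: pull every row out of the database, then aggregate in the tool's memory."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(sorted(totals.items())), len(rows)


def aggregate_pushdown(conn):
    """Pushdown style: send the aggregation to the database and fetch only the results."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows), len(rows)


in_memory, moved_in_memory = aggregate_in_memory(conn)
pushed, moved_pushdown = aggregate_pushdown(conn)
assert in_memory == pushed  # same answer either way
print(f"rows moved: in-memory={moved_in_memory}, pushdown={moved_pushdown}")
```

With five rows the gap is trivial, but the in-memory approach moves every source row through the tool, so its cost grows with table size, while pushdown moves only one row per group.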
4. Managing Snowflake
Here are a few things to consider after Snowflake is up and running:
Security practices. Establish strong security practices for your organization. Favor Snowflake’s role-based access control (RBAC) over discretionary access control (DAC). Snowflake also supports federated authentication and SSO, integrating with third-party identity providers such as Okta and Active Directory.
Access management. Identify user groups and the roles and privileges they need, then define a role hierarchy for your users and applications.
Resource monitors. Snowflake provides virtually unlimited, scalable storage and compute. The tradeoff is that you must establish monitoring and control protocols to keep your operating budget in check. The two main considerations here are:
- Snowflake warehouse configuration. It's usually best to create separate Snowflake warehouses for each user group, business area, or application. This makes it easier to segregate billing and apply chargebacks when needed. To govern further, assign roles specific to warehouse actions (access, monitor, update, create) so that only designated users can alter or create warehouses.
- Billing alerts. Resource monitors help you track costs, avoid billing overages, and take the right action at the right time. You can customize their thresholds and actions for different scenarios, ranging from simple email notifications to suspending a warehouse.
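A role hierarchy is easier to reason about once it is written down. The Python sketch below models one, computing each role's effective privileges (its own plus everything inherited from roles granted to it) and emitting Snowflake-style `GRANT ROLE ... TO ROLE ...` and `GRANT <privilege> ON ... TO ROLE ...` statements. All role, warehouse, and schema names are hypothetical, and the grant set is deliberately incomplete (for example, database and schema USAGE grants are omitted). Granting the top custom role to SYSADMIN follows a commonly recommended pattern, but check the statements against Snowflake's access control documentation before using them.

```python
# Hypothetical role hierarchy: each role lists the roles granted to it (and so inherits from).
ROLE_GRANTS = {
    "reporting_reader": [],
    "analyst": ["reporting_reader"],
    "data_engineer": ["analyst"],
    "sysadmin": ["data_engineer"],  # top custom role rolls up to the system role
}

# Hypothetical privileges granted directly to each role (not a complete grant set).
DIRECT_PRIVILEGES = {
    "reporting_reader": {
        "USAGE ON WAREHOUSE reporting_wh",
        "SELECT ON ALL TABLES IN SCHEMA analytics.mart",
    },
    "analyst": {"SELECT ON ALL TABLES IN SCHEMA analytics.dw"},
    "data_engineer": {"INSERT ON ALL TABLES IN SCHEMA analytics.dw"},
    "sysadmin": set(),
}


def effective_privileges(role, seen=None):
    """A role holds its own privileges plus those of every role granted to it, recursively."""
    seen = set() if seen is None else seen
    if role in seen:  # guard against visiting a role twice
        return set()
    seen.add(role)
    privileges = set(DIRECT_PRIVILEGES.get(role, set()))
    for child in ROLE_GRANTS.get(role, []):
        privileges |= effective_privileges(child, seen)
    return privileges


def grant_statements():
    """Emit Snowflake-style GRANT statements that would build this hierarchy."""
    statements = []
    for role, children in ROLE_GRANTS.items():
        for child in children:
            statements.append(f"GRANT ROLE {child} TO ROLE {role};")
        for privilege in sorted(DIRECT_PRIVILEGES.get(role, set())):
            statements.append(f"GRANT {privilege} TO ROLE {role};")
    return statements


for statement in grant_statements():
    print(statement)
print(sorted(effective_privileges("analyst")))
```

Because privileges flow upward through role grants, an analyst can read the mart and the warehouse layer but cannot insert, while the data engineer inherits everything the analyst can do.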
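Both considerations can be scripted together. The Python sketch below generates, for each business area, a dedicated warehouse, a resource monitor attached to it, and warehouse-specific grants, plus a small local function that shows which monitor actions a given credit usage would trip. The group names, credit quotas, thresholds, and `<group>_user`/`<group>_admin` role names are hypothetical, and `triggered_actions` is a local model of the threshold logic, not a Snowflake API (Snowflake tracks actual credit consumption itself). The generated statements follow Snowflake's CREATE WAREHOUSE, CREATE RESOURCE MONITOR, and ALTER WAREHOUSE syntax; note that by default only account administrators can create resource monitors.

```python
# Hypothetical business areas, each with its own warehouse and monthly credit quota.
GROUPS = {"finance": 100, "marketing": 40}

# Threshold (percent of quota) -> resource monitor action.
# SUSPEND lets running queries finish; SUSPEND_IMMEDIATE cancels them.
TRIGGERS = [(75, "NOTIFY"), (100, "SUSPEND"), (110, "SUSPEND_IMMEDIATE")]


def warehouse_ddl(group, quota):
    """Build DDL for one group's warehouse, its resource monitor, and role grants."""
    wh, monitor = f"{group}_wh", f"{group}_monitor"
    triggers = " ".join(f"ON {pct} PERCENT DO {action}" for pct, action in TRIGGERS)
    return [
        f"CREATE WAREHOUSE IF NOT EXISTS {wh} WITH WAREHOUSE_SIZE = 'XSMALL' "
        "AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;",
        f"CREATE RESOURCE MONITOR {monitor} WITH CREDIT_QUOTA = {quota} TRIGGERS {triggers};",
        f"ALTER WAREHOUSE {wh} SET RESOURCE_MONITOR = {monitor};",
        # Group users may run queries; only the group's admin role may monitor or resize it.
        f"GRANT USAGE ON WAREHOUSE {wh} TO ROLE {group}_user;",
        f"GRANT MONITOR, MODIFY ON WAREHOUSE {wh} TO ROLE {group}_admin;",
    ]


def triggered_actions(credits_used, quota):
    """Return the actions whose thresholds the given usage has reached (local model only)."""
    pct_used = 100 * credits_used / quota
    return [action for pct, action in TRIGGERS if pct_used >= pct]


for group, quota in GROUPS.items():
    for statement in warehouse_ddl(group, quota):
        print(statement)
print(triggered_actions(credits_used=80, quota=100))  # ['NOTIFY']
```

One warehouse and one monitor per group keeps each group's credit consumption separately visible for chargeback, and a breach in one group suspends only that group's warehouse.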
Snowflake provides excellent training resources and blogs to accelerate your journey. They frequently conduct “Snowflake in 0 to 90 minutes” workshops, and offer webinars, training videos, and valuable up-to-date online documentation. Here are some great resources to get you going:
- “Snowflake in 0 to 90 minutes” workshops
- Online training on cloud adoption
- Online webinars
- Technical training videos
- Snowflake Community
I hope this post has given you an understanding of Snowflake’s differentiating features and how to leverage them to meet your organization’s growing data and analytics requirements.