Strava gets a PB with cloud-based data warehouse

Social networks are renowned for the amount of data they amass, and sports social networks like Strava are no different. Strava's data function was buckling under the weight of its own popularity, having recently passed the milestone of two billion uploaded activities. It had a 120 TB data warehouse, 13 trillion GPS data points and 15 million uploads per week. That volume of data became very difficult to handle and process.

"Trying to access and query the data caused gridlock."

There was a bottleneck and performance was suffering because there were limited connections into the database. “We had a growing set of users trying to access and query that data and it caused gridlock and poor performance on all sides. That was a real challenge for us to deal with,” said Cathy Tanimura, senior director of analytics and data science at Strava.

Tanimura, who manages the analysis and data science teams, saw that database performance was so poor that people who spent all day querying would leave for a coffee break while a query ran, or even have to rerun a query overnight if they had made a mistake. She and her team also had to ‘trick’ the database into returning queries.

Cloud-based data warehouse company Snowflake appeared on Strava’s radar because of its “compelling approach to compute and storage”. The decision to move Strava’s data to Snowflake was made at the end of 2017, implementation began in March 2018 and data had been fully transferred by June.

The transition took place with support, consultation and guidance from Snowflake, which helped to unload the data from Strava’s old solution and load it into the cloud data warehouse.

According to Tanimura, the switchover was virtually seamless, despite moving jobs that were running at the time. “The analysis team just didn’t have downtime. They were asking how to adjust over the course of a week, then folks were just up and running, got up to speed and got productive quickly,” she said.

"To query one billion rows is 20 minutes not 20 hours."

The result is much faster performance. Now, if a member of her team needs to query one billion rows, it takes 20 minutes to run instead of 20 hours. Team members can also change table structures without affecting colleagues who are querying the data, because those workloads run on different compute clusters.

A tangible upside of now being able to query quickly and have a continuous flow of analysis is that the analysts and data scientists have been freed up to do other things. Tanimura said they have rolled out a new tool to help track athlete interactions in the app, so Strava can understand which features users like and use.

They have also rolled out a new ETL scheduling process that helps Strava be more nimble and facilitates self-service of data. Strava is also changing its email vendor, and data from that vendor, combined with data from the warehouse, is helping to build a picture of each athlete – such as the type of athlete they are, the activities they have done in the past and whether they are a member of the premium Summit service – to improve the email communications sent to them. Finally, the data scientists and analysts have created a ‘Year in Sport’ highlight reel video for each user.

So Strava has hit a PB with data, while helping its users track theirs.

DataIQ is a trading name of IQ Data Group Limited
10 York Road, London, SE1 7ND
