I have a lot of data. Objects with other objects nested like this (JSON):
{
id: 12,
Code: "a code",
country: "at",
prices:
[{
{ price: 100, from "2017-05-05", to "2017-06-05" },
{ price: 150, from "2017-05-06", to "2017-06-06" },
{ price: 200, from "2017-05-07", to "2017-06-07" },
}]
}
etc. I have about 60k of these main/parent objects. The prices are about 3 million objects.
Current situation
I have made a relational Azure SQL DB with tables for the objects and a price table with foreign key. Same for country table etc. I have no issues with relational databases. I have had an S3 instance of an Azure SQL DB and also a LocalDB. Both are not fast enough. To insert or update, I use stored procedure. One per table. So to store just 1 object, I execute an stored proc to find out of it needs an insert or update and store the main/parent object. Return the (new) ID and set it as foreign key for the price object. Also store about 100 price objects and move on to the next. This takes for over an hour to store 100 main/parent objects with their related prices.
The solutions I consider
- upscale to Azure SQL Premium and scale back to S0 (standard) when I am done. docs
- move to in-memory (which does not have foreign keys...)
- move to NoSQL (cosmos db)
I tried solution 3 and came up with this:
I have made a proof of concept storing the objects as json in cosmos db (it was still document db back then) and it was fast. However, querying it was hard because you cannot do a subquery or avg or sum. https://stackoverflow.com/questions/43521000/documentdb-get-min-of-subarray
or should I store the price rows as new documents and have a 'foreign key' to the main/parent object?
Should I dig in more in option 3, or would you recommend 1 or 2?
Edit for Ewan My stored proc's: for main/parent object:
ALTER PROCEDURE [dbo].[InsertUpdateParent]
(
@code nvarchar(max),
@name nvarchar(max),
@lat float,
@lng float,
@country int,
@countryCode nvarchar(2)
)
AS
BEGIN
DECLARE @id int;
DECLARE @err int;
DECLARE @newObject bit;
SET NOCOUNT ON
IF (@country = 0)
begin
select @country = id from countries where code = @countrycode;
SELECT @err = @@error;
end
IF EXISTS (SELECT ID FROM ParentObject where code = @code)
begin
SELECT @id = ID FROM ParentObject where code = @code;
SELECT @err = @@error;
SELECT @newObject = 0;
UPDATE ParentObject SET
[Name] = @name,
Latitude = @lat,
Longitude = @lng,
CountryId = @country
WHERE code = @code;
SELECT @err = @@error;
end
else
begin
SELECT @newObject = 1;
INSERT INTO [dbo].[ParentObject]
([Code]
,[Name]
,[Latitude]
,[Longitude]
,[CountryId]
VALUES
(@code, @name, @lat, @lng, @country);
select @id = SCOPE_IDENTITY();
SELECT @err = @@error;
end
IF @err <> 0
PRINT '@err is ' + ltrim(str(@err)) + '.'
else
return @id;
END
and for prices:
ALTER PROCEDURE [dbo].[InsertUpdatePrices]
(
@from as datetime,
@to as datetime,
@parentId as int,
@price as decimal(18,2)
)
AS
BEGIN
declare @id int;
SET NOCOUNT ON
IF EXISTS (SELECT ID FROM prices where ArrivalDateFrom = @from and ArrivalDateTo= @to and ParentObjectId = @parentId )
begin
SELECT @id = ID FROM prices where ArrivalDateFrom = @from and ArrivalDateTo= @to and ParentObjectId = @parentId;
UPDATE [dbo].[Prices]
SET
[ArrivalDateFrom] = @from,
[ArrivalDateTo] = @to,
[Price] = @price,
[ParentObjectId] = @parentId
WHERE ID = @id
end
else
begin
INSERT INTO [dbo].[Prices]
([ArrivalDateFrom]
,[ArrivalDateTo]
,[Price]
,[ParentObjectId])
VALUES
(@from
,@to
,@price
,@parentId);
select @id = SCOPE_IDENTITY();
end
return @id;
END