
Extracting Data from (Azure) SQL Server Big Tables in RFC 4180-Compliant CSV Files | by Luca Zavarella | Jan, 2023


(Photo from Unsplash)

When a team of Data Scientists from outside your company is hired to implement Machine Learning models, you somehow have to share with them the data to be used for model training. If that team cannot directly access the data persisted in a database, the first option is to extract it from the database into files in CSV format. Considering that most of the time this data comes in large quantities (5+ GB) and that some fields may contain special characters (the comma, which coincides with the field separator; carriage return and/or line feed characters), the usual tools used for export by non-developer users may not be adequate, and may even cause memory problems.

In this article you will see how to solve the problem of extracting a large amount of data containing special characters from an (Azure) SQL Server database into RFC 4180-compliant CSV files using PowerShell functions.

When you need to extract data from an (Azure) SQL Server database, the first tools that come to mind are SQL Server Management Studio (SSMS) and Azure Data Studio (ADS). This is because both provide simple features that allow you to extract data from a database with a few clicks.

The interfacing tool with (Azure) SQL Server par excellence is SSMS. Recently Microsoft has been investing heavily in adding features to ADS to make it the tool of choice for the Microsoft data platform on Azure and beyond. Accordingly, when you install the latest versions of SSMS today, the setup also installs ADS behind the scenes.

Any third-party system that involves importing a CSV file to load a dataset must rely on a standard that defines the CSV format. Therefore, before moving on to practical tests, let's see if there is a standard definition of the CSV format.

RFC 4180 is a standard that formalizes the format used for Comma-Separated Values (CSV) files and the specific Multipurpose Internet Mail Extensions (MIME) type associated with the CSV format ("text/csv"). The contents of this standard can be found here:

As you can see from the definition of the format in the previous link, while the first four points are fairly obvious, the remaining three must be read carefully:

  • Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.
  • Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double quotes.
  • If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote.

Keeping in mind also the examples given in the link, it is evident that the value of a field will be enclosed in double quotes only when needed. It does not make sense to use double quotes for all the values of a field when only some of the values need them.
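
For example, in a compliant file only the values that actually need quoting are quoted, and a double quote inside a quoted value is escaped by doubling it (a minimal illustration consistent with the rules above):

id,name,notes
1,Luca,no quoting needed
2,"Zavarella, Luca","a comma and a ""quoted"" word"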

When you need to share information with third-party systems using CSV format files, the following applies:

It is essential that the CSV files you generate from your exports are RFC 4180 compliant, to ensure that the files can be read by any external system that provides a CSV file import capability.

In order to test how the above-mentioned tools extract data in CSV format, let's create a simple table containing the special characters mentioned in the RFC 4180 standard, plus Unicode characters to ensure the generality of the contents of the text fields.

First, you have to create the extract_test table in your SQL Server instance using the following script:

CREATE TABLE [dbo].[extract_test](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [name] [nvarchar](50) NULL,
    [notes] [nvarchar](100) NULL
)

Then you can add data to this table using the following script:

SET IDENTITY_INSERT [dbo].[extract_test] ON 
GO
INSERT [dbo].[extract_test] ([id], [name], [notes]) VALUES (1, N'Luca', N'let''s add a carriage return
here')
GO
INSERT [dbo].[extract_test] ([id], [name], [notes]) VALUES (2, N'Zavarella, Luca', N'the name contains a comma')
GO
INSERT [dbo].[extract_test] ([id], [name], [notes]) VALUES (3, N'Luca Zavarella', N'here we have a comma and a double quotation mark: ,"')
GO
INSERT [dbo].[extract_test] ([id], [name], [notes]) VALUES (4, N'秋彦', N'this Japanese name means "bright prince"')
GO
SET IDENTITY_INSERT [dbo].[extract_test] OFF
GO

As you can see from the contents of the INSERT statements, we have included all the special characters mentioned in the standard. We have also used Japanese characters, so that we can verify that the CSV file is written correctly for the Unicode character set.

Obviously the table created in this case will not be a 5 GB table, but it will contain the special characters needed to test CSV format exports. Here is the output of a SELECT in ADS:

Figure 1 — Output of the content of your dummy table in ADS (by the author)

Don't be alarmed by the fact that the carriage return does not show up in the output grid of ADS or SSMS. Given the way the INSERT of that row was executed, the carriage return is there.

So, let's try to extract data from this table using SSMS and ADS.

Let's first try the traditional tool with which we interface with SQL Server, namely SQL Server Management Studio.

Extracting data with SSMS

Once you have opened SSMS and connected to your database instance, right-click on the name of the database hosting the table you've just created, go to Tasks and then Export Data:

Figure 2 — Exporting data from a database using SSMS (by the author)

You will be shown an initial screen describing the Extract Data activity. If you go forward, you will be shown this window:

Figure 3 — Selecting a data source in the Export Wizard (by the author)

Select the SQL Server Client data source, enter your server instance name, then choose the authentication to use to log in to the database. In my case, having persisted the test table in an Azure SQL database, I used SQL Server authentication to access my test-sql-bug database, as you can see in Figure 3.

On the next screen of the wizard you have the option of selecting the export destination. In our case, select Flat File Destination, and create a CSV destination file via the Browse button in your preferred folder (remember to select the CSV extension in the Open window that appears after pressing Browse). Remember to check the Unicode flag to make sure you also handle the Japanese characters in our example. After that, select Delimited as the format, leaving the Text qualifier at "<none>". Also make sure that the "Column names in the first data row" flag is checked. Then press Next:

Figure 4 — Selecting a destination for the data output (by the author)

In the next window select Copy data from one or more tables or views and press Next again.

In the configuration window that appears you can then select the [dbo].[extract_test] table as Source table or view. For the other options, you can leave everything as is, since the row delimiter (CRLF) and the column delimiter (comma) are as defined by the RFC 4180 standard. Then press Next:

Figure 5 — Configuring the flat file destination options (by the author)

In the next window keep Run immediately selected and press Finish. A summary window of the selected options will appear. Press Finish again and the extraction will start. When it's done, press Close.

If you now try to open the output CSV file with a text editor (not Excel), you'll notice the following:

Figure 6 — The output of the SSMS Export Wizard without a text qualifier (by the author)

Basically, in this case the Export Wizard extracts the contents of each text field as-is, regardless of whether it may contain special characters (comma and carriage return). This means that any carriage return contained in a text field is interpreted as a row delimiter by the system that has to read the file, just as any comma contained in a text field is interpreted as a field delimiter. Unicode characters, on the other hand, are handled correctly. Therefore, the generated CSV file will not be recognized as correct by any third-party system that has to import that information.

If you try to repeat the export, this time entering the double quote as a text qualifier, you will get the following:

Figure 7 — The output of the SSMS Export Wizard using double quotes as the text qualifier (by the author)

In this case, all extracted values are surrounded by double quotes, including the header. However, this forces an external system that has to read the data to treat all numeric values as strings. Moreover, if a value in a text field contains a double quote character, it is not escaped, causing parsing problems for external systems. Therefore, once again, the generated CSV file will not be recognized as correct by any third-party system that has to import this information.

Regarding the scalability of the extraction operation on very large amounts of data, there are no problems, because the Export Wizard uses SQL Server Integration Services (SSIS) as its engine, which is designed to handle huge volumes of bulk data.

Moreover, it may sometimes happen that you need to take action on the data source data types to avoid errors during export with the Export Wizard, as highlighted in this blog post:

We can conclude this section by stating the following:

Using the SSMS Export Wizard as a tool for extracting data in CSV format from an (Azure) SQL Server database does not guarantee a format that complies with the standard defined by RFC 4180, with the consequence that the extracted information may not be properly read by an external system.

Instead, let's see what happens when we use Azure Data Studio to extract the information in CSV format.

Extracting data with ADS

Once Azure Data Studio is open, the first thing to do is add a new connection to your server instance. Watch out: starting with the newer versions, the Encrypt option is set to True by default. This will not result in connection errors if you connect to an Azure SQL database, but it might generate one if your data source is an on-prem SQL Server. In that case, you can set the option to False.

That said, in order to extract the contents of a table (or view, or query) in ADS, you must first run a SELECT query and display its contents in the output grid. After that, simply press the "Save As CSV" button at the top right of the grid:

Figure 8 — Saving the output of a query in ADS in CSV format (by the author)

An output file selection window will open, allowing you to name the file to be extracted (in our case ExtractTestADS.csv). As soon as you press the Save button, the contents of the CSV file will be shown directly inside ADS:

Figure 9 — The output of ADS in CSV format (by the author)

Wow! The output generated by ADS complies with the RFC 4180 standard to all intents and purposes! Thus, it would seem that ADS is the perfect tool for extracting information in CSV format from an (Azure) SQL database.

However, there is a scalability problem. Since ADS requires that the query output first be displayed in the output grid, this limits its usefulness when dealing with many GB of data. In those cases, holding all that data in a grid means taking up a great deal of the system's RAM, causing the application to crash.

We can therefore conclude this section as follows:

ADS's CSV data export procedure guarantees output that conforms to the RFC 4180 standard. However, the use of ADS for extraction tasks is only indicated when the size of the dataset to be exported is fairly limited. When more than 2–3 GB of data needs to be extracted, ADS may occupy the entire system memory and crash.

In general, we can therefore conclude that:

Unfortunately, the user-friendly features provided by Microsoft's data platform tools do not allow you to extract huge volumes of data in CSV format following the RFC 4180 standard.

Let's see if we can achieve our goal with more specialized tools known to expert users.

The Bulk Copy Program (BCP) command-line utility is used to import large numbers of new rows into SQL Server tables or to export data from tables into data files in a user-specified format. It is the solution that imports or exports data as fast as possible, even in very large quantities. Therefore, it has no problem with scalability.
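
As a reference, a basic BCP export of our dummy table might look like the following (server name and credentials are placeholders; -w writes Unicode, -t"," sets the comma as field terminator). Note that it emits neither a header row nor any quoting:

bcp <your-database-name>.dbo.extract_test out C:\Temp\ExtractTestBCP.csv -S <your-server-name> -U <username> -P <password> -w -t","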

In addition to being installed by default with a typical on-prem SQL Server installation, and in addition to being installable stand-alone on a Windows operating system, the BCP utility can also be used from the Azure Cloud Shell to interact with an Azure SQL database, as shown in this blog post:

Without going into too much detail, the main problem with BCP is that it does not extract table headers and does not handle double quotes in a simple way out of the box. This is evidenced by the fact that Erland Sommarskog's reference guide for its use reports a number of workarounds for getting both headers and double quotes, as you can see here:

One of the drawbacks of this approach is that you have to know in advance which fields need double quotes (unless you provide them for all text fields). Often I don't have the ability to know in advance which fields might need double quotes; I just want to extract the data worry-free. Even if you manage to get headers and double quotes via Erland's advice, the quotes would be applied to all values in the selected fields. As Erland himself points out:

… the assumption is that the data should always be quoted. If you only want to quote when needed, you will need to handle this in your query, which is outside the scope of this article. All I can say is: good luck. Or more directly: avoid it if you can.

Also, should a quoted field contain a string with both a comma and a double quote, BCP does not handle escaping the double quote by doubling it.

We can therefore state that:

Using BCP to export data in a CSV format that includes both headers and double quotes is very arcane for the non-expert user. One downside is that you have to know in advance which fields to provide double quotes for. In addition, the result may still not be in a format consistent with the RFC 4180 standard.

I will not go into the details of using Microsoft's other command-line tool, SQLCMD, because its issues are similar to those highlighted in this section.

So what? How to proceed? Since I could not find an application on the Internet that was able to extract data in an RFC 4180-compliant CSV format and at the same time handle very large masses of data, the only possible option was to develop a custom solution that could be easily used even by the non-expert user. Let's see how this solution works.

The first thing I asked myself when I decided to develop a specific solution for this problem was which programming language to use. The first language that came to mind was undoubtedly Python. Then I reflected, however, that a typical user approaching the world of automation on a Windows machine may not know Python, and would not find it preinstalled on the operating system. That is why the choice fell on PowerShell, which provides, among other things, a specific module for SQL Server.

Issues with the SQL Server PowerShell module

The first attempt I made was to use the SQL Server PowerShell module, which allows SQL Server developers, administrators, and business intelligence professionals to automate database development and server administration.

Specifically, the command I tried to use to send the query needed to retrieve the data to the Azure SQL database was Invoke-Sqlcmd. This command does nothing more than invoke the sqlcmd.exe command-line utility, often used by automation processes to retrieve information from a SQL Server database. So far, so good. The problem is that Invoke-Sqlcmd persists all the query output directly into PowerShell data structures. As you can guess, when the query output takes up more than 3–4 GB, you have the same problem encountered with the extraction done in ADS: your system becomes unstable due to excessive RAM consumption.
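
For reference, the naive approach would look something like this sketch (server and credentials are placeholders; the Select-Object projection is there because Invoke-Sqlcmd returns DataRow objects that carry extra ADO.NET properties). It works, but the whole result set is materialized in memory before anything is written:

Import-Module SqlServer
Invoke-Sqlcmd -ServerInstance "<your-server-name>" -Database "<your-database-name>" `
    -Username "<username>" -Password "<password>" `
    -Query "SELECT * FROM dbo.extract_test" |
    Select-Object -Property id, name, notes |
    Export-Csv -Path "C:\Temp\ExtractTestNaive.csv" -UseQuotes AsNeeded -Encoding utf8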

Therefore, I found it appropriate to use ADO.NET objects directly in PowerShell to try to work around the problem. Let's see how I used them in this solution.

Batch exporting data to the output file

The main idea of my solution is to always use an intermediate data structure (a DataTable) that collects the query data, but only a certain number of rows at a time. Once the maximum capacity of the intermediate data structure is reached, its contents are written to the target file, it is emptied, and it is immediately loaded with the next rows of data from the data source:

Figure 10 — Main process of the solution (image by the author)

This process goes on as long as there are new rows to read from the data source.

You might wonder why I used an intermediate DataTable and did not implement a direct write stream to the output file via a StreamWriter. The answer lies in the ability to use PowerShell's Export-Csv cmdlet directly, as shown in the sketch below.
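
To make the idea concrete, here is a minimal sketch of the batch loop, assuming PowerShell 7+ (for the -UseQuotes parameter discussed in the next section) and the System.Data.SqlClient ADO.NET provider. It is an illustration of the approach, not the module's actual code, and the connection values are placeholders:

# Illustration of the batch-export idea (placeholder connection values)
$connectionString = "Server=<your-server-name>;Database=<your-database-name>;User Id=<username>;Password=<password>;"
$batchSize  = 30000
$outputPath = "C:\Temp\output.csv"

$connection = [System.Data.SqlClient.SqlConnection]::new($connectionString)
$connection.Open()
$command = $connection.CreateCommand()
$command.CommandText = "SELECT * FROM dbo.extract_test"
$reader = $command.ExecuteReader()

# Build an empty DataTable with the same schema as the result set
$dataTable = [System.Data.DataTable]::new()
foreach ($schemaRow in $reader.GetSchemaTable().Rows) {
    [void]$dataTable.Columns.Add([string]$schemaRow["ColumnName"], [type]$schemaRow["DataType"])
}
# Project only the real columns when writing (DataRow objects expose extra ADO.NET properties)
$columnNames = $dataTable.Columns | ForEach-Object { $_.ColumnName }

$firstBatch = $true
while ($reader.Read()) {
    $values = [object[]]::new($reader.FieldCount)
    [void]$reader.GetValues($values)
    [void]$dataTable.Rows.Add($values)
    if ($dataTable.Rows.Count -ge $batchSize) {
        # Buffer full: flush it to the CSV file, then empty it
        $dataTable.Rows | Select-Object -Property $columnNames |
            Export-Csv -Path $outputPath -Append:(-not $firstBatch) -UseQuotes AsNeeded -Encoding utf8
        $dataTable.Clear()
        $firstBatch = $false
    }
}
# Flush the last, possibly partial, batch
if ($dataTable.Rows.Count -gt 0) {
    $dataTable.Rows | Select-Object -Property $columnNames |
        Export-Csv -Path $outputPath -Append:(-not $firstBatch) -UseQuotes AsNeeded -Encoding utf8
}
$reader.Close()
$connection.Close()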

Writing data using Export-Csv

One of the goals I set for myself whenever I have to solve a problem is not to reinvent the wheel if there are already convenient features that solve it completely or in part. In this case, I figured I could dispense with rewriting all the logic that handles the special characters mentioned by the RFC 4180 standard by using the Export-Csv cmdlet.

Checking the PowerShell cmdlet documentation, I noticed that Export-Csv provides the parameters that control the use of double quotes only as of version 7:

Figure 11 — Differences between Export-Csv versions 6 and 7 (image by the author)

Specifically, the UseQuotes parameter provides the value AsNeeded and defines its functionality as follows:

only quote fields that contain a delimiter character, double-quote, or newline character

Basically, it is exactly what we want in order to meet the requirements of the RFC 4180 standard.

Should you need to apply double quotes only to certain fields, you can specify them explicitly via the QuoteFields parameter.
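
A quick illustration of the two options (file paths are placeholders; note that UseQuotes and QuoteFields cannot be combined in the same call):

$rows = @(
    [pscustomobject]@{ id = 1; name = 'Luca';            notes = 'no special characters' }
    [pscustomobject]@{ id = 2; name = 'Zavarella, Luca'; notes = 'the name contains a comma' }
)
# Quote only the values containing a delimiter, a double quote, or a newline (RFC 4180 behavior)
$rows | Export-Csv -Path "C:\Temp\as-needed.csv" -UseQuotes AsNeeded
# Or force quoting on the listed columns only
$rows | Export-Csv -Path "C:\Temp\quoted-fields.csv" -QuoteFields name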

Now there is just a small problem with PowerShell versioning. Keep in mind that Windows 10, Windows 11, and Windows Server 2022 come with version 5.1 of Windows PowerShell preinstalled (also known as the Desktop edition). In order to use the newer versions of the Export-Csv cmdlet, you must install a newer version of PowerShell (at least PowerShell 7.0), which is for all intents and purposes a separate piece of software from Windows PowerShell, based on .NET Core (if you are interested in its evolution over time, you can learn more at this link).

It is important to emphasize the following:

Since this module was developed for Core versions of PowerShell, it can also be used on Linux and macOS systems.

That said, let's see how to use this new module.

The new SqlBulkExport module is available on GitHub here:

It provides two functions:

  • Export-SqlBulkCsv: Exports the content of a SQL Server database table, view or query to an RFC 4180-compliant CSV file. This function supports the export of huge result sets, writing the CSV file content in multiple batches.
  • Export-SqlBulkCsvByPeriod: Exports the content of a SQL Server database table, view or query to multiple RFC 4180-compliant CSV files, broken down by time period (yearly, monthly or daily), based on the contents of a given date field. This function supports the export of huge result sets, writing each CSV file's content in multiple batches.

Both functions accept the following parameters:

  • ServerName: The SQL Server instance name to connect to.
  • Port: The SQL Server instance port number. By default, it is 1433.
  • DatabaseName: The SQL Server database name to connect to.
  • SchemaName: The database schema of the table or view from which to extract data. By default, it is "dbo".
  • TableViewName: The database table or view name from which to extract data.
  • Query: The T-SQL query with which to extract data.
  • User: The username to use to connect to the database.
  • Password: The password of the username used to connect to the database.
  • ConnectionTimeout: The connection timeout in seconds. By default, it is 30 seconds.
  • DatabaseCulture: The database culture code (e.g. it-IT). It is used to extract the decimal separator properly. By default, it is "en-US".
  • BatchSize: The size (number of rows) of the batches that are written to the output file until the data to extract is exhausted.
  • OutputFileFullPath: The full path (including the filename and csv extension) of the output file.
  • SeparatorChar: The character used to build the string separators shown in the console.

The Export-SqlBulkCsvByPeriod function provides three additional mandatory parameters used to partition the result set according to a time period:

  • DateColumnName: The date/time column by which data will be broken down by time period.
  • StartPeriod: A time period string (allowed formats: "yyyy", "yyyy-MM", "yyyy-MM-dd") representing the period from which to start extracting data (period in question included).
  • EndPeriod: A time period string (allowed formats: "yyyy", "yyyy-MM", "yyyy-MM-dd") representing the period up to which to extract data (period in question included).

It goes without saying that the formats used for the two input periods must be consistent with each other (for example, StartPeriod "2022-01" pairs with EndPeriod "2022-03", not with "2022").

It is important to note that extracting multiple CSV files broken down by a time period using the Export-SqlBulkCsvByPeriod function is only possible from a table/view, and not from a query. If there are, for example, specific needs for selecting fields and filters to be applied to a table, you must first expose a view with those logics in order to then be able to extract multiple CSV files by time period.

Moreover, the Export-SqlBulkCsvByPeriod function requires the string token {} (open and closed curly brackets) within the name of the output CSV file; this token will be replaced by the string associated with the time period of the transactions contained in the CSV file in question. For example, with monthly periods, output_{}.csv produces output_2022-01.csv, output_2022-02.csv, and so on.

Both functions automatically recognize whether to connect using Windows authentication or SQL Server authentication, based on whether or not the User and Password parameters are passed, as in the example below.
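
For instance, based on this behavior, a call that omits User and Password (instance and table names are placeholders) would fall back to Windows authentication:

Export-SqlBulkCsv -ServerName "<your-server-name>" -DatabaseName "<your-database-name>" -TableViewName "<your-table-or-view-name>" -BatchSize 30000 -OutputFileFullPath "C:\Temp\output.csv"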

Before proceeding with the examples, make sure you have installed the latest version of PowerShell.

Installing the latest PowerShell and SqlBulkExport versions

In order to install the latest version of PowerShell on Windows machines, download and run the 64-bit installer (in our case, version 7.3.0) from this link.

Click Next through all the Setup Wizard windows, then click Finish. You'll see the PowerShell 7 (x64) prompt installed among your applications:

Figure 12 — PowerShell 7 just installed (image by the author)

Run it and you'll see the PowerShell prompt ready for your commands:

Figure 13 — PowerShell 7 prompt ready (image by the author)

You can enter the $PSVersionTable command and press Enter to check that everything is working fine:

Figure 14 — PSVersionTable output (image by the author)

Great! If necessary, you can also install PowerShell on Linux or macOS.

Now you have to download the SqlBulkExport module files:

  1. Go to the releases page of the SqlBulkExport GitHub repository and download the Source code.zip file of the latest release.
  2. Once the file is saved on your machine, unzip it and copy its contents into the C:\Temp folder (or choose your preferred folder). Your module files will then be persisted in the C:\Temp\sql-bulk-export-<version> folder.

OK! Now you are ready to try a few examples.

Exporting the content of our dummy table to a single CSV file

Let us try extracting the contents of the extract_test table created at the beginning of this article, to check its consistency with the RFC 4180 standard. In our case, the table in question is persisted in an Azure SQL database:

  1. Open the PowerShell 7 prompt, enter the cd C:\Temp\sql-bulk-export-<version> command and press Enter to change your working directory to the module's one.
  2. Enter the Import-Module -Name ".\SqlBulkExport.psd1" command to import the SqlBulkExport module.
  3. Enter the Export-SqlBulkCsv -ServerName "<your-server-name>" -DatabaseName "<your-database-name>" -User "<username>" -Password "<password>" -TableViewName "extract_test" -BatchSize 30000 -OutputFileFullPath "C:\Temp\ExtractedTestPS.csv" command to export the content of the table into the ExtractedTestPS.csv file in batches of 30K rows. Here is the output:
Figure 15 — Console output of the command that extracts the contents of the dummy table into a CSV file (image by the author)

Here is the content of the output CSV file:

Figure 16 — The dummy table extracted to a CSV file using the SqlBulkExport module (image by the author)

As you can see, the content of the output CSV file meets the RFC 4180 standard. Because the dummy table used here has only a few rows, a single batch was enough for the extraction. Let's now try to extract the contents of a table with several tens of thousands of rows.

Exporting the content of a table/view to a single CSV file

As before, the table we are going to extract data from is persisted in an Azure SQL database:

  1. Open the PowerShell 7 prompt, enter the cd C:\Temp\sql-bulk-export-<version> command and press Enter to change your working directory to the module's one.
  2. Enter the Import-Module -Name ".\SqlBulkExport.psd1" command to import the SqlBulkExport module.
  3. Enter the Export-SqlBulkCsv -ServerName "<your-server-name>" -DatabaseName "<your-database-name>" -User "<username>" -Password "<password>" -TableViewName "<your-table-or-view-name>" -BatchSize 30000 -OutputFileFullPath "C:\Temp\output.csv" command to export the content of a database table (or view) into the output.csv file in batches of 30K rows. Here is the output:
Figure 17 — Console output of the command that extracts the contents of a table/view into a CSV file (image by the author)

As you can see, it took 3 batches of 30K rows to extract the contents of a table of about 74K rows, taking a total of 1 second and 88 milliseconds. Not bad!

Let's try using a query to export the data.

Exporting the output of a query to a single CSV file

In this case we will extract data from the same table as in the previous case, but using a query like SELECT * FROM <table> WHERE <condition>.

  1. Open the PowerShell 7 prompt, enter the cd C:\Temp\sql-bulk-export-<version> command and press Enter to change your working directory to the module's one.
  2. Enter the Import-Module -Name ".\SqlBulkExport.psd1" command to import the SqlBulkExport module.
  3. Enter the Export-SqlBulkCsv -ServerName "<your-server-name>" -DatabaseName "<your-database-name>" -User "<username>" -Password "<password>" -Query "SELECT * FROM <your-table-or-view-name> WHERE <condition>" -BatchSize 30000 -OutputFileFullPath "C:\Temp\output.csv" command to export the content of a query result set into the output.csv file in batches of 30K rows. Here is the output:
Figure 18 — Console output of the command that extracts the output of a query into a CSV file (image by the author)

Everything works like a charm! Let us now try exporting the contents of a view to multiple monthly CSV files.

Exporting the content of a table/view to multiple monthly CSV files

Imagine you have a transaction table containing hundreds of thousands of rows per month. A team of Data Scientists from outside the company is assigned to do advanced analysis on the transaction history. For convenience, they ask you to extract a dataset consisting of a subset of the fields available in the table, for a few months of transactions. Instead of generating a single CSV file, they ask you to provide them with multiple CSV files broken down by month.
Let's see how to do this thanks to the Export-SqlBulkCsvByPeriod function:

  1. Open the PowerShell 7 prompt, enter the cd C:\Temp\sql-bulk-export-<version> command and press Enter to change your working directory to the module's one.
  2. Enter the Import-Module -Name ".\SqlBulkExport.psd1" command to import the SqlBulkExport module.
  3. Enter the Export-SqlBulkCsvByPeriod -ServerName "<your-server-name>" -DatabaseName "<your-database-name>" -User "<username>" -Password "<password>" -TableViewName "<your-table-name>" -DateColumnName "<your-date-column-name>" -StartPeriod "2022-01" -EndPeriod "2022-03" -BatchSize 30000 -OutputFileFullPath "C:\Temp\output_{}.csv" command to export the content of a database table (or view) into multiple monthly CSV files in batches of 30K rows, from January 2022 to March 2022. Here is the output:
Figure 19 — Console output of the command that extracts multiple monthly CSV files (image by the author)

Amazing! You just extracted about 1.5 million rows broken down into three monthly CSV files in just 1 minute and 19 seconds!

The need that prompted me to write this article was to extract a large amount of data (3–4+ GB) into one or more files in CSV format compliant with the RFC 4180 standard.

You have seen how the tools provided by Microsoft (whether IDEs, such as SSMS and ADS, or command-line tools, such as BCP) are unable to meet this need. The only tool that seemed a bit more suitable is ADS, but it cannot extract large amounts of data without crashing. Without mincing words, it is quite embarrassing that Microsoft has not yet made a tool available to users that can meet the requirements set forth above.

Not having found software on the Internet that met the above needs, I wrote the SqlBulkExport PowerShell module, which solves the problem, and made it available open source on GitHub under an MIT license. I should emphasize that I am not a PowerShell developer, so any input from you that can improve the solution is truly welcome!
