Subsystem: Search 🔎
This tool creates the resources needed to run the NuGet search service. These resources can be updated using the Catalog2AzureSearch and Auxiliary2AzureSearch jobs.
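Specifically, this tool creates the Azure Search indexes, the Azure Blob Storage container for the search auxiliary files, and the auxiliary files themselves.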
You can run this job using:
NuGet.Jobs.Db2AzureSearch.exe -Configuration path\to\your\settings.json
The easiest way to run the tool if you are on the nuget.org team is to use the DEV environment resources:
- Install the certificate used to authenticate as our client AAD app registration into your CurrentUser certificate store.
- Clone our internal NuGetDeployment repository.
- Update your cloned copy of the DEV Db2AzureSearch appsettings.json file to authenticate using the certificate you installed:
{
  ...
  "KeyVault_VaultName": "PLACEHOLDER",
  "KeyVault_ClientId": "PLACEHOLDER",
  "KeyVault_CertificateThumbprint": "PLACEHOLDER",
  "KeyVault_ValidateCertificate": true,
  "KeyVault_StoreName": "My",
  "KeyVault_StoreLocation": "CurrentUser"
  ...
}
- Update the -Configuration CLI option to point to the DEV Azure Search settings: NuGetDeployment/src/Jobs/NuGet.Jobs.Cloud/Jobs/Db2AzureSearch/DEV/northcentralus/appsettings.json
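As a rough illustration of the appsettings.json update described above, the following Python sketch sets the Key Vault certificate fields on an already-parsed settings dictionary. The helper name `with_certificate_auth` and the example argument values are hypothetical; only the setting names and the fixed values (`true`, `My`, `CurrentUser`) come from the snippet above.

```python
import json


def with_certificate_auth(settings, vault_name, client_id, thumbprint):
    """Return a copy of `settings` with the Key Vault certificate fields filled in.

    Hypothetical helper: the key names mirror the appsettings.json snippet,
    everything else is an assumption made for illustration.
    """
    updated = dict(settings)  # shallow copy so the caller's dict is untouched
    updated.update({
        "KeyVault_VaultName": vault_name,
        "KeyVault_ClientId": client_id,
        "KeyVault_CertificateThumbprint": thumbprint,
        "KeyVault_ValidateCertificate": True,
        "KeyVault_StoreName": "My",
        "KeyVault_StoreLocation": "CurrentUser",
    })
    return updated


if __name__ == "__main__":
    # The vault name, client ID, and thumbprint here are made-up sample values.
    example = with_certificate_auth({}, "my-vault", "my-client-id", "ABCDEF0123")
    print(json.dumps(example, indent=2))
```

Editing the file by hand works just as well; the sketch only makes explicit which six keys change.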
As an alternative to using nuget.org's DEV resources, you can also run this tool using your personal Azure resources. You will need the following:
- Gallery DB. This can be initialized locally using the NuGetGallery.
- Azure Search. You can create your own Azure Search resource using the Azure Portal.
- Azure Blob Storage. You can create your own Azure Blob Storage using the Azure Portal.
In your Azure Blob Storage account, you will need to create a container named ng-search-data and upload the following files:
- downloads.v1.json with content []
- ExcludedPackages.v1.json with content []
You will also need to create a second container (if it does not already exist) named content and upload the following file:
- flags.json with content {}
If you are on the nuget.org team, you can copy these files from the PROD auxiliary files container.
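If you are starting from scratch, the empty seed files can be staged locally before uploading them. The sketch below writes them into one folder per container; the function name `write_seed_files` and the local folder layout are assumptions for illustration, and the upload itself (for example through the Azure Portal) is left out.

```python
import json
from pathlib import Path

# Container name -> {blob name: initial JSON content}, as listed above.
SEED_FILES = {
    "ng-search-data": {
        "downloads.v1.json": [],
        "ExcludedPackages.v1.json": [],
    },
    "content": {
        "flags.json": {},
    },
}


def write_seed_files(root):
    """Write each seed file to <root>/<container>/<blob> and return the paths."""
    written = []
    for container, blobs in SEED_FILES.items():
        folder = Path(root) / container
        folder.mkdir(parents=True, exist_ok=True)
        for name, content in blobs.items():
            path = folder / name
            # json.dumps([]) is "[]" and json.dumps({}) is "{}"
            path.write_text(json.dumps(content), encoding="utf-8")
            written.append(path)
    return written


if __name__ == "__main__":
    for path in write_seed_files("seed"):
        print(path)
```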
Once you've created your Azure resources, you can create your settings.json file. There are a few PLACEHOLDER values you will need to fill in yourself:
- The GalleryDb:ConnectionString setting is the connection string to your Gallery DB.
- The SearchServiceName setting is the name of your Azure Search resource. For example, use the name foo-bar for the Azure Search service with URL https://foo-bar.search.windows.net.
- The SearchServiceApiKey setting is an admin key that has write permissions to the Azure Search resource. Make sure the Azure Search resource you're connecting to has API keys enabled (either in parallel with managed identity "RBAC" access or with managed identity authentication disabled).
- The StorageConnectionString and AuxiliaryDataStorageConnectionString settings are both the connection string to your Azure Blob Storage account.
- The DownloadsV1JsonUrl setting is the URL to the downloads.v1.json file above. Make sure it works without authentication.
- The FeatureFlags:ConnectionString setting is the connection string to your Azure Blob Storage account.
{
  "GalleryDb": {
    "ConnectionString": "PLACEHOLDER"
  },
  "Db2AzureSearch": {
    "AzureSearchBatchSize": 1000,
    "MaxConcurrentBatches": 4,
    "MaxConcurrentVersionListWriters": 8,
    "SearchServiceName": "PLACEHOLDER",
    "SearchServiceApiKey": "PLACEHOLDER",
    "SearchIndexName": "search-000",
    "HijackIndexName": "hijack-000",
    "StorageConnectionString": "PLACEHOLDER",
    "StorageContainer": "v3-azuresearch-000",
    "StoragePath": "",
    "GalleryBaseUrl": "https://www.nuget.org/",
    "AuxiliaryDataStorageConnectionString": "PLACEHOLDER",
    "AuxiliaryDataStorageContainer": "ng-search-data",
    "AuxiliaryDataStorageExcludedPackagesPath": "ExcludedPackages.v1.json",
    "DownloadsV1JsonUrl": "PLACEHOLDER",
    "FlatContainerBaseUrl": "https://api.nuget.org/",
    "FlatContainerContainerName": "v3-flatcontainer",
    "AllIconsInFlatContainer": false,
    "DatabaseBatchSize": 10000,
    "CatalogIndexUrl": "https://api.nuget.org/v3/catalog0/index.json",
    "EnablePopularityTransfers": true,
    "Scoring": {
      "FieldWeights": {
        "PackageId": 9,
        "TokenizedPackageId": 9,
        "Tags": 5
      },
      "DownloadScoreBoost": 30000,
      "PopularityTransfer": 0.99
    }
  },
  "FeatureFlags": {
    "ConnectionString": "PLACEHOLDER"
  }
}
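Before running the job, it can help to confirm that no PLACEHOLDER value was left behind. The following sketch walks a parsed settings.json and reports the remaining ones using the same colon-separated notation as the list above. Both helpers (`find_placeholders` and `service_name_from_url`) are hypothetical names, not part of the tool, and the second one assumes the service URL has the `https://<name>.search.windows.net` shape shown in the foo-bar example.

```python
import json
from urllib.parse import urlparse


def find_placeholders(node, path=""):
    """Return the colon-separated paths of settings still equal to "PLACEHOLDER"."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}:{key}" if path else key
            found.extend(find_placeholders(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_placeholders(value, f"{path}:{index}"))
    elif node == "PLACEHOLDER":
        found.append(path)
    return found


def service_name_from_url(url):
    """Extract the SearchServiceName value (first host label) from a service URL."""
    return urlparse(url).hostname.split(".")[0]


if __name__ == "__main__":
    # Assumes settings.json sits in the current working directory.
    with open("settings.json", encoding="utf-8") as handle:
        for setting in find_placeholders(json.load(handle)):
            print("Still a placeholder:", setting)
```

Run against the template above, this would list the seven settings described in the bullets, for example GalleryDb:ConnectionString and Db2AzureSearch:SearchServiceName.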
For local development and fast iteration, you can run the job against the NuGet.Insights Kusto tables.
You can use the following configuration as a starting point:
{
  "Db2AzureSearch": {
    "AzureSearchBatchSize": 1000,
    "MaxConcurrentBatches": 4,
    "MaxConcurrentVersionListWriters": 8,
    "SearchServiceName": "<AZURE AI SEARCH RESOURCE NAME>",
    "SearchServiceUseDefaultCredential": true,
    "SearchIndexName": "search-001",
    "HijackIndexName": "hijack-001",
    "StorageConnectionString": "<AZURE STORAGE CONNECTION STRING>",
    "StorageContainer": "v3-azuresearch-001",
    "StoragePath": "",
    "GalleryBaseUrl": "https://www.nuget.org/",
    "FlatContainerBaseUrl": "https://api.nuget.org/",
    "FlatContainerContainerName": "v3-flatcontainer",
    "AllIconsInFlatContainer": false,
    "EnablePopularityTransfers": true,
    "Scoring": {
      "FieldWeights": {
        "PackageId": 9,
        "TokenizedPackageId": 9,
        "Tags": 5
      },
      "DownloadScoreBoost": 30000,
      "PopularityTransfer": 0.99
    },
    "Development": {
      "ReplaceContainersAndIndexes": true,
      "DisableVersionListWriters": false,
      "KustoConnectionString": "https://<KUSTO CLUSTER NAME>.kusto.windows.net",
      "KustoDatabaseName": "<KUSTO DATABASE NAME>",
      "KustoTableNameFormat": "Ni{0}",
      "KustoTopPackageCount": 100000,
      "KustoOnlyLatestPackages": true
    }
  },
  "FeatureFlags": {
    "ConnectionString": "<FEATURE FLAGS AZURE STORAGE CONNECTION STRING>"
  },
  "KeyVault_VaultName": "<KEY VAULT NAME, IF NEEDED>",
  "KeyVault_UseManagedIdentity": true
}
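To see how this development configuration differs from the personal-resources one, you can compare the setting names of the two files. The sketch below is a hypothetical helper (`compare_settings` is not part of the tool) that lists the settings present in only one of two parsed configuration files. It compares names only, not values, so a changed index name such as search-000 versus search-001 would not show up.

```python
def leaf_paths(node, path=""):
    """Return the set of colon-separated paths to every non-object setting."""
    if isinstance(node, dict):
        paths = set()
        for key, value in node.items():
            paths |= leaf_paths(value, f"{path}:{key}" if path else key)
        return paths
    return {path}


def compare_settings(full, dev):
    """List the setting names that appear in only one of the two configurations."""
    full_keys, dev_keys = leaf_paths(full), leaf_paths(dev)
    return {
        "only_in_full": sorted(full_keys - dev_keys),
        "only_in_dev": sorted(dev_keys - full_keys),
    }


if __name__ == "__main__":
    # Trimmed-down stand-ins for the two configurations in this document.
    full = {"GalleryDb": {"ConnectionString": "..."},
            "Db2AzureSearch": {"SearchServiceApiKey": "..."}}
    dev = {"Db2AzureSearch": {"SearchServiceUseDefaultCredential": True}}
    print(compare_settings(full, dev))
```

Applied to the two full examples in this document, the comparison would show, among other differences, that the development file has no GalleryDb section or SearchServiceApiKey and adds SearchServiceUseDefaultCredential plus the Development block.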
At a high level, here's how Db2AzureSearch works:
- Create the Azure Search indexes
- Create the Azure Blob storage container for the search auxiliary files
- Capture the catalog's cursor
- Load initial data from Gallery DB and statistics auxiliary files
- Process package metadata in batches
  - Load a chunk of packages from Gallery DB
  - Generate and upload documents to the Azure Search indexes
  - Update the search version list resource
- Write the search auxiliary files to search storage
- Write the catalog's cursor to search storage
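The ordering of these steps can be sketched as a small Python driver. This is only an illustration of the sequence, not the job's actual implementation: the function name, the step labels, and the stand-in cursor value are all made up, and each real step is reduced to a log entry. It also assumes the three chunk-level steps repeat once per batch.

```python
def run_db2azuresearch(packages, batch_size, log):
    """Record the high-level steps in order; every step is a stub."""
    log.append("create indexes")
    log.append("create auxiliary container")

    # The cursor is captured up front and only written at the very end.
    cursor = "captured-cursor"  # stand-in for the catalog cursor value
    log.append("capture cursor")

    log.append("load initial data")

    # Process package metadata one batch at a time.
    for start in range(0, len(packages), batch_size):
        batch = packages[start:start + batch_size]
        log.append(f"load batch of {len(batch)}")
        log.append("upload documents")
        log.append("update version list")

    log.append("write auxiliary files")
    log.append(f"write cursor {cursor}")
    return log


if __name__ == "__main__":
    for step in run_db2azuresearch(["A", "B", "C"], batch_size=2, log=[]):
        print(step)
```

With three packages and a batch size of 2, the loop body runs twice: once for a batch of two packages and once for the remaining one.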