Exporting data from ICM and making it available to SPARQUE AI is a common requirement. This guide shows one way to automate this workflow. The automation consists of the following steps:
Create a product export job
Create a catalog export job (for each catalog)
Create a file destination in Microsoft Azure
Create a transport configuration
Create a process chain for automatic export
Create all job configurations via DBInit
Configure SPARQUE to read from export destination
More detailed instructions can be found in the following sections.
A cartridge containing the jobs and process chain described in this document can be found here:
To automate the export of all products, create a job that runs the product export:
# Name of job configuration
RunProductExport.Name=RunProductExport
RunProductExport.Description=RunProductExport
#RunProductExport.Date=2010.11.01 at 00:00:00
#RunProductExport.Interval=1440
RunProductExport.PipelineName=ProcessImpexJob
RunProductExport.PipelineStartNode=Start
RunProductExport.EnableJob=true
RunProductExport.ApplicationSite=inSPIRED-Site
RunProductExport.ApplicationURLIdentifier=inTRONICS
# add custom attributes (keypair with AttributeName<Number> = AttributeValue<Number>)
RunProductExport.AttributeName1=DomainName
RunProductExport.AttributeValue1=inSPIRED-inTRONICS
RunProductExport.AttributeName2=ExportDirectory
RunProductExport.AttributeValue2=sparque
RunProductExport.AttributeName3=JobName
RunProductExport.AttributeValue3=ProcessCatalogImpex
RunProductExport.AttributeName4=ProcessPipelineName
RunProductExport.AttributeValue4=ProcessProductExport
RunProductExport.AttributeName5=ProcessPipelineStartNode
RunProductExport.AttributeValue5=Export
RunProductExport.AttributeName6=SelectedFile
RunProductExport.AttributeValue6=exportFromProcessChain.xml
RunProductExport.AttributeName7=DeterminePageablePipeline
RunProductExport.AttributeValue7=ProcessProductSearch-SimpleSearch
Catalogs need to be exported separately, one export per catalog. This can also be done via a job configuration, similar to the following:
# Name of job configuration
RunCatalogCamerasExport.Name=RunCatalogCamerasExport
RunCatalogCamerasExport.Description=RunCatalogCamerasExport
#RunCatalogCamerasExport.Date=2010.11.01 at 00:00:00
#RunCatalogCamerasExport.Interval=1440
RunCatalogCamerasExport.PipelineName=ProcessImpexJob
RunCatalogCamerasExport.PipelineStartNode=Start
RunCatalogCamerasExport.EnableJob=true
RunCatalogCamerasExport.ApplicationSite=inSPIRED-Site
RunCatalogCamerasExport.ApplicationURLIdentifier=inTRONICS
# add custom attributes (keypair with AttributeName<Number> = AttributeValue<Number>)
RunCatalogCamerasExport.AttributeName1=DomainName
RunCatalogCamerasExport.AttributeValue1=inSPIRED-inTRONICS
RunCatalogCamerasExport.AttributeName2=ExportDirectory
RunCatalogCamerasExport.AttributeValue2=sparque
RunCatalogCamerasExport.AttributeName3=CatalogID
RunCatalogCamerasExport.AttributeValue3=Cameras-Camcorders
RunCatalogCamerasExport.AttributeName4=ProcessPipelineName
RunCatalogCamerasExport.AttributeValue4=ProcessCatalogExport
RunCatalogCamerasExport.AttributeName5=ProcessPipelineStartNode
RunCatalogCamerasExport.AttributeValue5=Export
RunCatalogCamerasExport.AttributeName6=SelectedFile
RunCatalogCamerasExport.AttributeValue6=exportCameras.xml
To create a file destination in Microsoft Azure, perform the following steps (for an Azure CLI alternative, see the sketch after this list):
Go to Microsoft Azure.
Create a storage account or use an existing one.
Create a new container or fileshare. In this example we will call it sparque.
See also Microsoft | Create a container.
See also Microsoft | Create Fileshare.
Create an access key; it will be required in the next step.
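These steps can also be scripted with the Azure CLI. The following is a minimal sketch; the resource group my-resource-group and the storage account mystorageaccount are placeholders (not part of this guide), while the container/file share name sparque matches the example above:
# Create a storage account (skip this if an existing one is used)
az storage account create \
  --name mystorageaccount \
  --resource-group my-resource-group \
  --sku Standard_LRS

# Create a blob container ...
az storage container create \
  --name sparque \
  --account-name mystorageaccount

# ... or a file share, depending on the destination type you want to use
az storage share create \
  --name sparque \
  --account-name mystorageaccount

# Show the account access keys; one of them is required in the next step
az storage account keys list \
  --account-name mystorageaccount \
  --resource-group my-resource-group \
  --output table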
For the full transport, create a transport configuration as shown below:
domain=inSPIRED-inTRONICS
process.id=SparqueTransport
process.displayname=SparqueTransport
process.type=AZURE
location.local=<path to shared file system>/sites/inSPIRED-inTRONICS-Site/units/inSPIRED-inTRONICS/impex/export/sparque
account.key=<previously created access key>
account.name=<storage account name>
# Important: use the prefix blob:// for a container or file:// for a fileshare
file.share=<previously created container/fileshare name, e.g. blob://sparque>
process.direction=PUSH
process.delete=0
The transport can then be automated using a job.
ExecuteSparqueTransport.Name=ExecuteSparqueTransport
ExecuteSparqueTransport.Description=ExecuteSparqueTransport
#ExecuteSparqueTransport.Date=2010.11.01 at 00:00:00
#ExecuteSparqueTransport.Interval=1440
ExecuteSparqueTransport.PipelineName=FileTransportJob
ExecuteSparqueTransport.PipelineStartNode=Start
ExecuteSparqueTransport.EnableJob=true
# add custom attributes (keypair with AttributeName<Number> = AttributeValue<Number>)
ExecuteSparqueTransport.AttributeName1=TransportProcessID
ExecuteSparqueTransport.AttributeValue1=SparqueTransport
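After the transport job has run, the result can be checked with the Azure CLI. This is only a sketch, assuming the destination is the sparque share or container created earlier; the account name and key are placeholders:
# List the transported files in the file share ...
az storage file list \
  --share-name sparque \
  --account-name <storage account name> \
  --account-key <access key> \
  --output table

# ... or, when a blob container is used as the destination
az storage blob list \
  --container-name sparque \
  --account-name <storage account name> \
  --account-key <access key> \
  --output table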
The process chain combines all of the export jobs above with the transport job. Adjust the timeouts to match your project. Depending on the number of products and categories, it may also be faster to run the exports concurrently.
For details on all of the process chain options, see Concept - Process Chains (valid to 11.x).
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<chain xmlns="https://www.intershop.com/xml/ns/semantic/processchain/v1" name="Chain 1" timeout="90">
    <sequence name="Chain 1.1 - Sequence" timeout="90">
        <job job="RunProductExport" domain="inSPIRED-inTRONICS" name="Chain 1.1.1 - Job" timeout="60"/>
        <job job="RunCatalogCamerasExport" domain="inSPIRED-inTRONICS" name="Chain 1.1.2 - Job" timeout="60"/>
        <!-- more catalog exports, e.g. <job job="RunCatalogSpecialsExport" domain="inSPIRED-inTRONICS" name="Chain 1.1.3 - Job" timeout="60"/> -->
        <job job="ExecuteSparqueTransport" domain="inSPIRED-inTRONICS" name="Chain 1.1.4 - Job" timeout="30"/>
    </sequence>
</chain>
A process chain can be triggered manually in the back office; to automate it, create a job configuration for this as well:
# Name of job configuration
ExecuteSparqueProcessChain.Name=ExecuteSparqueProcessChain
ExecuteSparqueProcessChain.Description=ExecuteSparqueProcessChain
#ExecuteSparqueProcessChain.Date=2010.11.01 at 00:00:00
#ExecuteSparqueProcessChain.Interval=1440
ExecuteSparqueProcessChain.PipelineName=ExecuteProcessChain
ExecuteSparqueProcessChain.PipelineStartNode=Start
ExecuteSparqueProcessChain.EnableJob=true
# add custom attributes (keypair with AttributeName<Number> = AttributeValue<Number>)
ExecuteSparqueProcessChain.AttributeName1=XmlFileName
ExecuteSparqueProcessChain.AttributeValue1=inSPIRED-inTRONICS-Site/units/inSPIRED-inTRONICS/impex/config/ExportAndTransportProducts.xml
Job configurations and transport configurations can be created through DBInit using the PrepareTransportConfiguration and PrepareJobConfigurations preparers:
Class1500 = com.intershop.component.transport.dbinit.PrepareTransportConfiguration \
    com.intershop.demo.responsive.dbinit.data.job.TransportConfiguration
Class1505 = com.intershop.beehive.core.dbinit.preparer.job.PrepareJobConfigurations \
    inSPIRED-inTRONICS \
    com.intershop.demo.responsive.dbinit.data.job.JobConfigurations
To access the created file in the file share, a Shared Access Signature (SAS) must be created.
Navigate to Security + networking | Shared access signature:
Settings:
Allowed services: File
Allowed resource types: Service, Container, Object
Allowed permissions: Read, List
Allowed IP addresses: Add if necessary
Define the expiry date/time: select a suitable date in the future. Make sure to renew the signature after the expiration date.
Signing Key: Use the same access key as above.
Click on Generate SAS and connection string.
Copy the SAS token string.
See also Microsoft | Grant limited access to Azure Storage resources using shared access signatures (SAS).
Alternatively, if the Azure Portal is not available for this task, you can create a SAS token using the Azure CLI as follows:
az storage share generate-sas --name <share-name> --account-name <storage-account-name> --permissions rl --https-only --expiry 2028-01-01T00:00Z --account-key <storage-account-key>
Example:
user@computer:~$ az storage share exists --name sharename --account-name azurestorageaccountname --account-key your_own_key==
{
"exists": false
}
user@computer:~$ az storage share create --name sharename --account-name azurestorageaccountname --account-key your_own_key==
{
"created": true
}
user@computer:~$ az storage share generate-sas --name sharename --account-name azurestorageaccountname --permissions rl --expiry 2029-01-01T00:00Z --account-key your_own_key==
"se=2029-01-01T00%3A00Z&sp=rl&sv=2021-06-08&sr=s&sig=secretkeyULZi%2BT9McrADbEBvCtRRTgK0MIumRzac%3D"
To create a Shared Access Signature for a file stored in a blob container, do the following:
Navigate to Containers.
Open the container.
Click the three dots next to your file, and then click Generate SAS:
Use the following settings:
Select the time frame in which the SAS token will be available.
Click Generate SAS token and URL and copy the value.
Alternatively, you can use the Azure CLI to create the token:
az storage blob generate-sas \
  --account-name $STORAGE_ACCOUNT_NAME \
  --container-name $CONTAINER_NAME \
  --name $BLOB_NAME \
  --permissions r \
  --expiry <expiry-date-time> \
  --https-only \
  --output tsv
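For example, with placeholder values (the account name and key are illustrative; the container and file name follow the examples in this guide):
# Placeholder values, adjust to your environment
STORAGE_ACCOUNT_NAME=azurestorageaccountname
CONTAINER_NAME=sparque
BLOB_NAME=exportFromProcessChain.xml

az storage blob generate-sas \
  --account-name $STORAGE_ACCOUNT_NAME \
  --container-name $CONTAINER_NAME \
  --name $BLOB_NAME \
  --permissions r \
  --expiry 2029-01-01T00:00Z \
  --https-only \
  --account-key your_own_key== \
  --output tsv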
After the export jobs and the transport job have run, the exported and transported files are located in the created file share. SPARQUE.AI can use this file share as a base for a dataset. To use this function, configure a dataset source of type Fetch a file from URL and enter the URL of the exported file, including the SAS token (see the examples below). This allows SPARQUE.AI to fetch data from this data source.
Example fileshare: https://<storageaccount>.file.core.windows.net/<filesharename>/<exportfile>?<SAS token>
Example blob storage: https://<storageaccount>.blob.core.windows.net/<containername>/<exportfile>?<SAS token>
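Before configuring the dataset source, the URL can be verified, for example with curl. This is only a sketch; replace the placeholders with your storage account, share or container, export file, and SAS token:
# Fetch the exported file from the file share via the SAS-protected URL
curl -fsS "https://<storageaccount>.file.core.windows.net/<filesharename>/<exportfile>?<SAS token>" -o export-check.xml

# The same check for blob storage
curl -fsS "https://<storageaccount>.blob.core.windows.net/<containername>/<exportfile>?<SAS token>" -o export-check.xml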