When you upload a data file whose data is more complex than a CSV can naturally represent, the split operation can be useful.
Note: In this article, rows are referred to as "units".
Any column of your uploaded CSV may be internally delimited by a special character (by default, a blank space). Calling split on such a column tells Appen to treat its contents as a collection of discrete items rather than as a single block of text.
Split
Method | Endpoint | Parameters
PUT | https://api.figure-eight.com/v1/jobs/{job_id}/units/split | on, with
Suppose your existing dataset is an arbitrary collection of major authors.
author,major_works,countries_active
Homer,The Iliad|The Odyssey,Greece
Dickens,David Copperfield|Bleak House,England
Nabokov,Camera Obscura|Lolita,Russia|United States
Rabelais,Gargantua and Pantagruel,France
Cervantes,Don Quixote,Spain
When this data is posted as a CSV to Appen, one unit is created for each of the five rows of data. Each unit has data associated with the three CSV columns provided. When the data is first posted, Appen treats every value as free text with no depth or structure. After the initial data post, Dickens' major_works field is set to the single string David Copperfield|Bleak House.
To let Appen know that the major_works and countries_active columns are each actually collections of delimited values, you can use the split operation.
curl -X PUT --data-urlencode "key={api_key}" "https://api.figure-eight.com/v1/jobs/{job_id}/units/split?on=major_works,countries_active&with=|"
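For readers calling the API from a script rather than a shell, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function name build_split_request, the placeholder key YOUR_API_KEY, and the job ID 12345 are hypothetical, and the endpoint and parameter names are copied from the curl example above. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL copied from the curl example; everything else below is illustrative.
API_BASE = "https://api.figure-eight.com/v1"


def build_split_request(api_key, job_id, columns, delimiter):
    """Build (but do not send) a PUT request equivalent to the curl call."""
    # "on" is a comma-separated list of columns; "with" is the delimiter.
    # safe="," keeps the commas readable, while "|" is percent-encoded as %7C.
    query = urlencode({"on": ",".join(columns), "with": delimiter}, safe=",")
    url = f"{API_BASE}/jobs/{job_id}/units/split?{query}"
    # Like --data-urlencode in curl, the API key travels in the request body.
    body = urlencode({"key": api_key}).encode("ascii")
    return Request(url, data=body, method="PUT")


# Hypothetical placeholder values; substitute your own key and job ID.
request = build_split_request(
    "YOUR_API_KEY", "12345", ["major_works", "countries_active"], "|"
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would actually send it. Percent-encoding the delimiter avoids the shell and URL quoting problems that a literal | or & can cause.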
After the PUT, Appen will consider Dickens' major_works field to be set to the collection [ "David Copperfield", "Bleak House" ]. Similarly, Nabokov's countries_active field will be set to [ "Russia", "United States" ]. The brackets indicate a data structure that is analogous to a List or Vector in Java, a list in Python, an Array in Ruby, etc. If you were to request Homer's major_works from Appen, it would be returned as a JSON array:
{"major_works": ["The Iliad", "The Odyssey"]}
Because the author field was not split, it is not treated as a collection:
{"author": "Homer"}
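The effect of the split can be previewed locally before calling the API. The sketch below is an illustration only, not Appen's implementation: split_units is a hypothetical helper that applies Python's str.split to the chosen columns of the sample CSV. One assumption worth flagging is that a value with no delimiter (such as Homer's countries_active) becomes a one-element list here; the article does not say how Appen represents that case.

```python
import csv
import io
import json

# The sample dataset from this article.
SAMPLE_CSV = """author,major_works,countries_active
Homer,The Iliad|The Odyssey,Greece
Dickens,David Copperfield|Bleak House,England
Nabokov,Camera Obscura|Lolita,Russia|United States
Rabelais,Gargantua and Pantagruel,France
Cervantes,Don Quixote,Spain
"""


def split_units(csv_text, on, delimiter="|"):
    """Return one dict per CSV row, with the `on` columns split into lists."""
    units = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        unit = dict(row)
        for column in on:
            # Columns not named in `on` (here: author) stay plain strings.
            unit[column] = unit[column].split(delimiter)
        units.append(unit)
    return units


units = split_units(SAMPLE_CSV, on=["major_works", "countries_active"])
# Show the Dickens unit as JSON to compare with the examples above.
print(json.dumps(units[1], indent=2))
```

Running this on your own file first is a cheap way to check that the delimiter you plan to pass in the with parameter does not also appear inside values you want kept whole.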