{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Creating a Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from SPARQLWrapper import SPARQLWrapper, JSON\n",
    "import pandas as pd\n",
    "import hashlib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To install the use the dependencies for this notebook:\n",
    "\n",
    "```bash\n",
    "conda env create -f environment.yml\n",
    "source activate fornax_tutorial\n",
    "```\n",
    "\n",
    "To run this notebook from the project root:\n",
    "\n",
    "```bash\n",
    "cd docs/tutorial\n",
    "jupyter-notebook\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download\n",
    "\n",
    "For the duration of this tutorial we will be using the social network of Marvel Comicbook characters.\n",
    "\n",
    "Nodes will represent \n",
    "\n",
    "* characters \n",
    "* aliases \n",
    "* groups of characters\n",
    "\n",
    "edges will represent relationships between the nodes.\n",
    "\n",
    "For example Wolverine, Logan and X-Men are all nodes. \n",
    "There is an edge between Wolverine and Logan because Logan is an alternative name for Wolverine.\n",
    "There is an edge between Wolverine and X-Men because Wolverine is a member of X-Men.\n",
    "There is no direct relationship between Logan and X-Men so there is no edge between them.\n",
    "\n",
    "### SPARQL\n",
    "\n",
    "Below is a SPARQL query which will return data in the following format (using 'Wolverine' as an example):\n",
    "\n",
    "```json\n",
    "{\n",
    "    \"group\": {\n",
    "        \"type\": \"uri\",\n",
    "        \"value\": \"http://www.wikidata.org/entity/Q2690825\"\n",
    "    },\n",
    "    \"character\": {\n",
    "        \"type\": \"uri\",\n",
    "        \"value\": \"http://www.wikidata.org/entity/Q186422\"\n",
    "    },\n",
    "    \"birthName\": {\n",
    "        \"xml:lang\": \"en\",\n",
    "        \"type\": \"literal\",\n",
    "        \"value\": \"James Howlett\"\n",
    "    },\n",
    "    \"characterLabel\": {\n",
    "        \"xml:lang\": \"en\",\n",
    "        \"type\": \"literal\",\n",
    "        \"value\": \"Wolverine\"\n",
    "    },\n",
    "    \"groupLabel\": {\n",
    "        \"xml:lang\": \"en\",\n",
    "        \"type\": \"literal\",\n",
    "        \"value\": \"Horsemen of Apocalypse\"\n",
    "    },\n",
    "    \"characterAltLabel\": {\n",
    "        \"xml:lang\": \"en\",\n",
    "        \"type\": \"literal\",\n",
    "        \"value\": \"Logan, Weapon X, Jim Logan, Patch, James Howlett, Agent Ten, Experiment X, Weapon Ten\"\n",
    "    }\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "sparql = SPARQLWrapper(\"https://query.wikidata.org/sparql\")\n",
    "sparql.setQuery(\"\"\"\n",
    "    SELECT ?character ?characterLabel ?group ?groupLabel ?birthName ?characterAltLabel \n",
    "    WHERE {\n",
    "        ?group wdt:P31 wd:Q14514600 ;  # group of fictional characters\n",
    "              wdt:P1080 wd:Q931597.    # from Marvel universe\n",
    "        ?character wdt:P463 ?group.    # member of group\n",
    "        optional{ ?character wdt:P1477 ?birthName. }\n",
    "      SERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\".}\n",
    "    }\n",
    "\"\"\")\n",
    "sparql.setReturnFormat(JSON)\n",
    "results = sparql.query().convert()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading with Pandas\n",
    "\n",
    "We'll be using pandas to do some data manipulation so lets put the result inside a pandas dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the results into a pandas DataFrame\n",
    "records = []\n",
    "for result in results[\"results\"][\"bindings\"]:\n",
    "    character_id = result['character']['value']\n",
    "    group_id = result['group']['value']\n",
    "    name = result['characterLabel']['value']\n",
    "    group = result['groupLabel']['value']\n",
    "    alt_names = None\n",
    "    if 'characterAltLabel' in result:\n",
    "        alt_names = result['characterAltLabel']['value']\n",
    "    birth_name = None\n",
    "    if 'birthName' in result:\n",
    "        birth_name = result['birthName']['value']\n",
    "    records.append((character_id, group_id, name, group, birth_name, alt_names))\n",
    "\n",
    "frame = pd.DataFrame.from_records(records, columns=['character_id', 'group_id', 'name', 'group', 'birth_name', 'alt_names'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>character_id</th>\n",
       "      <th>group_id</th>\n",
       "      <th>name</th>\n",
       "      <th>group</th>\n",
       "      <th>birth_name</th>\n",
       "      <th>alt_names</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>http://www.wikidata.org/entity/Q60002</td>\n",
       "      <td>http://www.wikidata.org/entity/Q2603976</td>\n",
       "      <td>Colossus</td>\n",
       "      <td>Excalibur</td>\n",
       "      <td>Пётр Николаевич Распутин</td>\n",
       "      <td>Peter Rasputin, Piotr Nikolayevich Rasputin, P...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>http://www.wikidata.org/entity/Q258015</td>\n",
       "      <td>http://www.wikidata.org/entity/Q2603976</td>\n",
       "      <td>Rachel Summers</td>\n",
       "      <td>Excalibur</td>\n",
       "      <td>Rachel Anne Summers</td>\n",
       "      <td>Phoenix, Prestige, Marvel Girl, Mother Askani,...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>http://www.wikidata.org/entity/Q369197</td>\n",
       "      <td>http://www.wikidata.org/entity/Q2527918</td>\n",
       "      <td>Black Widow</td>\n",
       "      <td>Thunderbolts</td>\n",
       "      <td>Наталья Алиановна Романова</td>\n",
       "      <td>Natalie Rushman, Natasha Romanoff, asesina rus...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>http://www.wikidata.org/entity/Q388316</td>\n",
       "      <td>http://www.wikidata.org/entity/Q2527918</td>\n",
       "      <td>Bullseye</td>\n",
       "      <td>Thunderbolts</td>\n",
       "      <td>None</td>\n",
       "      <td>Lester, Hawkeye, Benjamin Poindexter</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>http://www.wikidata.org/entity/Q432272</td>\n",
       "      <td>http://www.wikidata.org/entity/Q2457162</td>\n",
       "      <td>Medusa</td>\n",
       "      <td>Frightful Four</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             character_id  \\\n",
       "0   http://www.wikidata.org/entity/Q60002   \n",
       "1  http://www.wikidata.org/entity/Q258015   \n",
       "2  http://www.wikidata.org/entity/Q369197   \n",
       "3  http://www.wikidata.org/entity/Q388316   \n",
       "4  http://www.wikidata.org/entity/Q432272   \n",
       "\n",
       "                                  group_id            name           group  \\\n",
       "0  http://www.wikidata.org/entity/Q2603976        Colossus       Excalibur   \n",
       "1  http://www.wikidata.org/entity/Q2603976  Rachel Summers       Excalibur   \n",
       "2  http://www.wikidata.org/entity/Q2527918     Black Widow    Thunderbolts   \n",
       "3  http://www.wikidata.org/entity/Q2527918        Bullseye    Thunderbolts   \n",
       "4  http://www.wikidata.org/entity/Q2457162          Medusa  Frightful Four   \n",
       "\n",
       "                   birth_name  \\\n",
       "0    Пётр Николаевич Распутин   \n",
       "1         Rachel Anne Summers   \n",
       "2  Наталья Алиановна Романова   \n",
       "3                        None   \n",
       "4                        None   \n",
       "\n",
       "                                           alt_names  \n",
       "0  Peter Rasputin, Piotr Nikolayevich Rasputin, P...  \n",
       "1  Phoenix, Prestige, Marvel Girl, Mother Askani,...  \n",
       "2  Natalie Rushman, Natasha Romanoff, asesina rus...  \n",
       "3               Lester, Hawkeye, Benjamin Poindexter  \n",
       "4                                               None  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Splitting into Tables\n",
    "\n",
    "The dataframe above is unwieldy since it contains a list of values in the alt_names column.\n",
    "Most values also appear in many rows.\n",
    "Below we seperate the frame into a set of tables with unique rows much like we would for a relational database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "names = frame[['character_id', 'name']].drop_duplicates()\n",
    "groups = frame[['group_id', 'group']].drop_duplicates()\n",
    "character_group = frame[['character_id', 'group_id']].drop_duplicates()\n",
    "birth_names = frame[\n",
    "    frame['birth_name'].notna() # do not include a row for characters without a birthname\n",
    "][['character_id', 'birth_name']].drop_duplicates()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "records = []\n",
    "for uid, alt_names in zip(frame['character_id'], frame['alt_names']):\n",
    "    if alt_names is None:\n",
    "        continue\n",
    "    for name in alt_names.split(','):\n",
    "        records.append({'character_id': uid, 'alt_name': name})\n",
    "alt_names = pd.DataFrame.from_records(records).drop_duplicates()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analysis\n",
    "\n",
    "Lets quickly examine the data to check it still makes sense.\n",
    "\n",
    "There are 399 unique characters in the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "400"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# count the number of unique character ids\n",
    "len(names)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The characters belong to 107 different groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "107"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# count the number of unique group ids\n",
    "len(groups)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's find Wolverine..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>character_id</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [character_id, name]\n",
       "Index: []"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "names[names['name'] == 'Wolverine']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What groups has he been in?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0                     X-Men\n",
       "1              Alpha Flight\n",
       "2                  Avengers\n",
       "3    Horsemen of Apocalypse\n",
       "4                    Team X\n",
       "Name: group, dtype: object"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "character_group[\n",
    "    character_group['character_id'] == 'http://www.wikidata.org/entity/Q186422'\n",
    "].merge(groups, on='group_id')['group']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What are his alternative names?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>alt_name</th>\n",
       "      <th>character_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>187</th>\n",
       "      <td>Logan</td>\n",
       "      <td>http://www.wikidata.org/entity/Q186422</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>188</th>\n",
       "      <td>Weapon X</td>\n",
       "      <td>http://www.wikidata.org/entity/Q186422</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189</th>\n",
       "      <td>Jim Logan</td>\n",
       "      <td>http://www.wikidata.org/entity/Q186422</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>190</th>\n",
       "      <td>Patch</td>\n",
       "      <td>http://www.wikidata.org/entity/Q186422</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>191</th>\n",
       "      <td>James Howlett</td>\n",
       "      <td>http://www.wikidata.org/entity/Q186422</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192</th>\n",
       "      <td>Agent Ten</td>\n",
       "      <td>http://www.wikidata.org/entity/Q186422</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>193</th>\n",
       "      <td>Experiment X</td>\n",
       "      <td>http://www.wikidata.org/entity/Q186422</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>194</th>\n",
       "      <td>Weapon Ten</td>\n",
       "      <td>http://www.wikidata.org/entity/Q186422</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           alt_name                            character_id\n",
       "187           Logan  http://www.wikidata.org/entity/Q186422\n",
       "188        Weapon X  http://www.wikidata.org/entity/Q186422\n",
       "189       Jim Logan  http://www.wikidata.org/entity/Q186422\n",
       "190           Patch  http://www.wikidata.org/entity/Q186422\n",
       "191   James Howlett  http://www.wikidata.org/entity/Q186422\n",
       "192       Agent Ten  http://www.wikidata.org/entity/Q186422\n",
       "193    Experiment X  http://www.wikidata.org/entity/Q186422\n",
       "194      Weapon Ten  http://www.wikidata.org/entity/Q186422"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "alt_names[alt_names['character_id'] == 'http://www.wikidata.org/entity/Q186422']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is his birthname?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>character_id</th>\n",
       "      <th>birth_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>103</th>\n",
       "      <td>http://www.wikidata.org/entity/Q186422</td>\n",
       "      <td>James Howlett</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                               character_id     birth_name\n",
       "103  http://www.wikidata.org/entity/Q186422  James Howlett"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "birth_names[birth_names['character_id'] == 'http://www.wikidata.org/entity/Q186422']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is the biggest team?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(116,                                    group_id  group\n",
       " 125  http://www.wikidata.org/entity/Q128452  X-Men)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sorted(\n",
    "    [\n",
    "        (len(group), group[['group_id', 'group']].drop_duplicates()) \n",
    "        for uid, group \n",
    "        in character_group.merge(groups, on='group_id').groupby('group_id')\n",
    "    ], \n",
    "    key=lambda x: x[0], \n",
    "    reverse=True\n",
    ")[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Who has been in the most groups?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(8, 102    Cannonball\n",
       " Name: name, dtype: object)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sorted(\n",
    "    [\n",
    "        (len(group), group['name'].drop_duplicates()) \n",
    "        for uid, group \n",
    "        in character_group.merge(names, on='character_id').groupby('character_id')\n",
    "    ],\n",
    "    key=lambda x: x[0], \n",
    "    reverse=True\n",
    ")[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export to CSV\n",
    "\n",
    "Let's write each node to a csv file, we need to record\n",
    "\n",
    "* a unique ID for each node (we use a hash of the Wikidata URL since later we will need an integer ID)\n",
    "* a label (such as the name of the character or the group)\n",
    "* a type (0, 1, 2 for character, group or birthname)\n",
    "\n",
    "Node that birth names don't have a Wikidata URL so we just use a hash of the name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_id(url):\n",
    "    \"\"\" A function to map the python hash function onto 32-bit integers\"\"\"\n",
    "    return int(hashlib.sha256(url.encode('utf-8')).hexdigest(), 16) % 2147483647 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "nodes = pd.concat(\n",
    "    [\n",
    "        pd.DataFrame({'uid': [get_id(item) for item in names['character_id']], 'type': 0, 'label': names['name']}),\n",
    "        pd.DataFrame({'uid': [get_id(item) for item in groups['group_id']], 'type': 1, 'label': groups['group']}),\n",
    "        pd.DataFrame({'uid': [get_id(item) for item in birth_names['birth_name']], 'type': 2, 'label': birth_names['birth_name']}),\n",
    "        pd.DataFrame({'uid': [get_id(item) for item in alt_names['alt_name']], 'type': 2, 'label': alt_names['alt_name']})\n",
    "    ], \n",
    "    sort=True\n",
    ").drop_duplicates()\n",
    "nodes.to_csv('./nodes.csv', index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>type</th>\n",
       "      <th>uid</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Colossus</td>\n",
       "      <td>0</td>\n",
       "      <td>2105314676</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Rachel Summers</td>\n",
       "      <td>0</td>\n",
       "      <td>298635603</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Black Widow</td>\n",
       "      <td>0</td>\n",
       "      <td>1897346471</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Bullseye</td>\n",
       "      <td>0</td>\n",
       "      <td>2027281781</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Medusa</td>\n",
       "      <td>0</td>\n",
       "      <td>347320780</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            label  type         uid\n",
       "0        Colossus     0  2105314676\n",
       "1  Rachel Summers     0   298635603\n",
       "2     Black Widow     0  1897346471\n",
       "3        Bullseye     0  2027281781\n",
       "4          Medusa     0   347320780"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nodes.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Edges connect characters to their birth names and their groups."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# character_group_edges = frame[['character_id', 'group_id']].drop_duplicates()\n",
    "# character_birth_name_edges = frame[['character_id', 'birth_name']].drop_duplicates()\n",
    "\n",
    "edges = pd.concat([\n",
    "    # character to group\n",
    "    pd.DataFrame([\n",
    "        {'start': get_id(start), 'end': get_id(end)}\n",
    "        for start, end in zip(character_group['character_id'], character_group['group_id'])\n",
    "    ]),\n",
    "\n",
    "    # character to alt name\n",
    "    pd.DataFrame([\n",
    "        {'start': get_id(start), 'end': get_id(end)}\n",
    "        for start, end in zip(alt_names['character_id'], alt_names['alt_name'])\n",
    "    ])\n",
    "])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "edges.to_csv('./edges.csv', index=False)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}