Data Structure

Oct 15 2020 Documentation an hour read (About 9915 words)

初级数据结构研究报告。我正在着手学习更高级的数据结构，此时，初级数据结构成为了很好的基础。谨此记录一些时间复杂度分析和数据结构优缺点分析。

Lab1 Recursion

Subject1. Solve the hanoi problem with recursion

Introduction

Write a recursive program to solve the Hanoi problem, output the procedure of movements of the disks.

Algorithms Specification

Hanoi is a classic problem of recursion algorithm, which can be easily divided into smaller tasks to be solved.
I wrote a function called Hanoi, which handles the situation when there are n disks need to be move.
When the function is handling only 1 disk, then it is obvious that just move the disk from needle A to C.
Otherwise, we break the task into two smaller pieces, which handle scale of n-1 disks respectively.
There is no specific data structure used.

Test results

input n=5

This implies that there are 5 disks need to be moved initially.
The output is:

which is defintely correct.

input n=0

There is no output, and the program reach its end.
Which means the input is invalid.

Analysis and Comments

Time complexity:

$T(n)=2\times T(n-1)+1$
$=2\times (2\times T(n-2)+1)+1$
$=2^{2}\times T(n-2)+1+2^{1}$
$…$
$=2^{k}\times T(1)+1+2^{1}+…+2^{k-1}$
$=O(2^k)+O(2^k)$
$=O(2^k)$
and we know, $k=n-1$.
Therefore, time complexity is $O(2^n)$

Space complexity:
This requires space for saving the recursions.

Therefore, the time complexity and space complexity are both large. However, for this problem, we cannot get more time efficient algorithm since the requirement is output the total moves and the number of moves is $O(2^n)$.

Source code

#include<stdio.h>

void Hanoi(int n,char a,char b,char c){ //the hanoi function
    if(n<1) return; //when the input is invalid, just return
    if(n==1) printf("Move the %dth disk from %c to %c\n",n,a,c); //when there is just one disk, move it from A to C
    else {
        Hanoi(n-1,a,c,b); //recursive step: using C
        printf("Move the %dth disk from %c to %c\n",n,a,c); //move the bottom disk from A to C
        Hanoi(n-1,b,a,c); //recursive step: using A
    }
}

int main(){
    puts("Please input the number of disk(s):");
    int n;scanf("%d",&n);
    Hanoi(n,'A','B','C');
}

Subject2. The problem of finding the paths and the number of paths

Introduction

Given an $m$ by $n$ grid as shown in the following figure (in the figure $m=2$ and $n=5$).

Suppose there is a robot at the upper left corner (with the coordinate (m,n)) and the robot can only take one step to the right or down at a time, the questions are:

how many paths are there for the robot to reach the lower right corner (with the
coordinate (1,1))
output the paths.

Algorithms Specification

The number of paths can be obtained in two different ways:

Since the robot can only walk in two directions, the number of steps to the destination is a constant then, which equals to $m+n-2$. Therefore, we choose $m-1$ steps from the total steps to walk downwards, which is simply $\tbinom{m+n-2}{m-1}$.
the other way to obtain the number of ways is using recursion.

$f(m,n)= \begin{cases} 0& \text{m<0||n<0}\\ 1& \text{m=1&&n=1}\\ f(m-1,n)+f(m,n-1)& \text{otherwise} \end{cases}$

For the part of finding the exact steps, we can simply write one function of DFS while using an array to record the steps. The DFS (i.e.recursion) stops at (1,1), which is the destination. When we reach the destination, print the array.

To record the steps, I use a struct data type named PathType.

Test results

input: 5 2

Input the sample input of the question to test. The answer is correct.

input: -4 -4

Input an invalid input to test if it will fall into an infinite loop. The result seems everything is fine.

Analysis and Comments

Time complexity:

pathnum

Pathnum uses combination to calculate the number of paths, since it needs to calculate factorial. The easier way to calculate factorial is simply one while loop or recursion, since it is always better to use while loop then recursion to save the space, we choose while loop, which requires $O(n)$.
There is another way to calculate factorial, that is, using the decomposition of Number Theory. The factorial can be expressed as a product of primes with specific powers.

$n!=\prod_{p<=n}p^{\sum_{r=1}^{\infty}\left[ \frac{n}{p^r}\right]}$

pathnum2

Pathnum2 uses recursion to calculate the number of paths, the time complexity is roughly $O(\frac{(n+m)!}{n!m!})$.

Prove:

(i) We first use programs to get the times of calculating f(m,n), and record them inside one table.

#include<stdio.h>
int t;

int pathnum2(int m,int n){
    t++;
    if(m<1||n<1) return 0;
    if(m==1&&n==1) return 1;
    return pathnum2(m-1,n)+pathnum2(m,n-1);
}

int main(){
    int m, n;
    for(m=1;m<=10;++m){
        for(n=1;n<=10;++n){
            t=0;pathnum2(m,n);
            printf("%07d ",t);
        }
        puts("");
    }
}

We obtain the matrix A:
whose row represents $m$, and column represents $n$, and value represents how many times the recursion function is called during calculating the number of paths starting from (m,n).

$\left[ \begin{matrix} 1 & 3 & 5 & 7 & 9 & \cdots \\ 3 & 7 & 13 & 21 & 31 &\cdots \\ 5 & 13 & 27 & 49 & 81 &\cdots \\ 7 & 21 & 49 & 99 & 181 &\cdots \\ 9 & 31 & 81 & 181 & 363 &\cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ \end{matrix} \right]$

Observing A, we can construct a new matrix B whose values on the corresponding position are $\frac{A+1}{2}$.

$\left[ \begin{matrix} 1 & 2 & 3 & 4 & 5 & \cdots \\ 2 & 4 & 7 & 11 & 16 &\cdots \\ 3 & 7 & 14 & 25 &\cdots \\ 4 & 11 & 25 & 50 &\cdots \\ 5 & 16 &\cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ \end{matrix} \right]$

B has one characteristic(*):

$b_{m,n} = b_{m-1,n}+b_{m,n-1}$

Therefore, we only need to find the relation equation between $m$,$n$ and $b_{m,n}$ , and after that we can get the representing equation between $m$,$n$ and $a_{m,n}$, as well as the time complexity.

We know that Pascal triangle also has the characteristic(*). So we write the matrix of Pascal triangle(C):

$\left[ \begin{matrix} 1 & 1 & 1 & 1 & 1 & \cdots \\ 1 & 2 & 3 & 4 & 5 &\cdots \\ 1 & 3 & 6 & 10 & 15 &\cdots \\ 1 & 4 & 10 & 20 & 35 &\cdots \\ 1 & 5 & 15 & 35 & 70 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ \end{matrix} \right]$

Compare C and B, we obtain that the only difference is that the starting values of each row and column are different. The ones of B is $1,2,3,4,...$ , while the ones of C is $1,1,1,1,...$ .
Thus, we guess that B is obtained by addition of C and another C that offset a bit, since both B and C satisfy (*).

By comparing C and B one by one, we obtain that C is a sum of B and another Pascal triangle.
Therefore, we obtain the relation equation of $m$ and $n$ and $a_{m,n}$ using the matrix expression of Pascal triangle:

$a_{m,n}=T(f(m,n))=2\times (\tbinom{m+n}{n}-\tbinom{m+n-2}{n-1})-1$

Therefore:

$T(f(m,n))=O(\frac{(n+m)!}{n!m!})$

Completed.

disppath

Disppath also use recursion to record the paths, so its time complexity is the same as the one of pathnum2.

Space complexity:

pathnum

Since we do not need any array to save temporary data, the space complexity is $O(1)$.

pathnum2

It requires some space of stack to save the statuses when go through recursion.

disppath

It is the same as the one of pathnum2.

The time complexity and space complexity are quite large, and they depend on how many statuses encountered when doing the recursion. Therefore, if we consider Memory Search, that is, avoid repeating the statuses that already been through, the time and space will be saved to some extent.

Source code

#include<stdio.h>
const int N = 1e5+5;

typedef struct{  //the struct to record the path
    int i, j;  //x and y
}PathType;

PathType path[N];
int ca; //the idex of the current path

int fac(int n){ //get the factorial of n, use for combination
    int ans=1;
    while(n>=1) ans*=n,n--;
    return ans;
}

int C(int n,int m){ //get the combination of n and m
    return fac(n)/fac(n-m)/fac(m);
}

int pathnum(int m, int n){ //calculate the number of paths directly
    return C(m+n-2,m-1);
}

int pathnum2(int m,int n){ //the second way to get the number of paths: recursion
    if(m<1||n<1) return 0;
    if(m==1&&n==1) return 1;
    return pathnum2(m-1,n)+pathnum2(m,n-1);
}

void disppath(int x, int y, PathType path[], int dis){ //print the paths
    if(x == 1&& y== 1) { //when it reaches the destination
        path[dis].i = x, path[dis].j = y;
        printf("For path No.%d:\nthe distance of this path is %d\n", ++ ca, dis);
        for(int i = 0; i <= dis; ++ i) printf("(%d %d)", path[i].i, path[i].j); //output the path
        puts(""),puts("");
        return;
    }
    path[dis].i = x, path[dis].j = y; //record current position
    if(x > 1) disppath(x-1, y, path, dis+1); //recusive step
    if(y > 1) disppath(x, y-1, path, dis+1); //recusive step
}

int main(){
    int m, n;
    puts("Please input the location of the robot:");
    scanf("%d%d", &m, &n);
    printf("The number of paths is %d\n",pathnum(m,n));
    disppath(m, n, path, 0);
}

Lab2 List

Subject1. Sequential list

Introduction

Implement the abstract data structure: Sequential list, introducing its characteristics.

Algorithms Specification

There is no specific algorithm used.

When we need to modify some elements, we just modify them according to the index that we already know.
When we need to do addition and deletion, we need to move the elements behind the one we are going to operate.

Test Results

Analysis and Comments

Time Complexity:

create: $O(1)$
initialization: $O(n)$
check if is empty: $O(1)$
check the length: $O(1)$
display the list: $O(n)$
get the element: $O(1)$
locate one element (query) $O(n)$
insert and delete $O(n)$

Space Complexity:
it only requires space for the array. Therefore, the space complexity is $O(n)$.

Sequential list is easy to implement.
The only significant thing it contains is one array of some data type.
Therefore, obtaining element in specific position is quite direct, as well as modifying the value of several elements.
However, it may be somehow troublesome to delete or add elements, since the elements are consecutive, and they have unique indexes.
Moreover, one should know the scale of the amount of data first, which is not flexible enough.

Source Code

//
// Created by Artist on 2020/10/16.
// seqlist
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define MAXLEN 100
#define ERROR -1
#define OK 1
#define OVER 0

typedef int ElemType;

typedef struct{
    ElemType data[MAXLEN];
    int Length;
}SeqList;

SeqList *Create(){
    SeqList *L=(SeqList *)malloc(sizeof(SeqList));
    return L;
}

void InitList(SeqList *L){
    int i;
    printf("Please input the length of the list:\n");
    scanf("%d",&L->Length);

    printf("Please input the values of the elements:\n");
    for(i=0;i<L->Length;i++) scanf("%d",&L->data[i]);
}

void Destroy(SeqList *L){
    if(L==NULL) return;
    free(L);
}

bool ListEmpty(SeqList *L){
    return (L->Length)?false:true;
}

int ListLength(SeqList *L){
    return L->Length;
}

void DispList(SeqList *L){
    for(int j=0;j<L->Length;++j) printf("%d ",L->data[j]);
    puts("");
}

ElemType GetElem(SeqList *L,int i){
    if(i<1||i>L->Length) {puts("Location Error!");return ERROR;}
    return L->data[i-1];
}

int LocateElem(SeqList *L,ElemType e){
    for(int j=0;j<L->Length;++j) if(L->data[j]==e) return j+1;
    puts("the element does not exist");
    return ERROR;
}

int SeqInsert(SeqList *L,int i,ElemType e){
    if(L->Length>=MAXLEN) {puts("The List is full!");return OVER;}
    if(i<1||i>L->Length) {puts("Location Error!");return ERROR;}
    if(i==L->Length) {L->data[i]=e;L->Length++;return OK;}
    for(int j=L->Length-1;j>=i-1;--j) L->data[j+1]=L->data[j];      // move the other data to the next position
    L->data[i-1]=e;
    L->Length++;
    return OK;
}

int SeqDelete(SeqList *L,int i){
    if(L->Length==0) {puts("Empty List!");return ERROR;}
    if(i<1||i>L->Length) {printf("the %dth element does not exist",i);return ERROR;}
    int res = L->data[i-1];
    for(int j=i;j<=L->Length-1;++j) L->data[j-1]=L->data[j];        // move the data to the former position
    L->Length--;
    puts("Done");
    return res;
}


int main() {
    puts("************************************************* MENU *****************************************************");
    puts("* At the beginning of this program, you are going to create one sequential list, as well as initialize it. *");
    puts("* After that, you can do several operations on that sequential list according to the following prompts.    *");
    puts("* All you need to do to operate the sequential list is to input one number represents the operation.       *");
    puts("* You can input the numbers continuously, separated by several blanks, terminated by one sigle 0.          *");
    puts("* 1. check whether a sequential list is empty or not.                                                      *");
    puts("* 2. return the number of elements in the list.                                                            *");
    puts("* 3. output the elements in the list.                                                                      *");
    puts("* 4. return the ith element in the list.                                                                   *");
    puts("* 5. search the element e in the list and return the index.                                                *");
    puts("* 6. insert a new element into the list as the ith element.                                                *");
    puts("* 7. delete the ith element and return its value.                                                          *");
    puts("************************************************************************************************************");
    puts("We first initialize the sequential list.");
    SeqList *L = Create();
    InitList(L);
    puts("Done.");
    puts("Now you can input the number sequence representing the operations:");
    int opt,i,e,res,ele,pos;
    while(~scanf("%d",&opt)&&opt){
        switch (opt) {
            case 1:puts("Check whether a sequential list is empty or not:");
                    ListEmpty(L)?puts("Yes"):puts("No");break;
            case 2:puts("Return the number of elements in the list:");
                    printf("The number is %d\n",ListLength(L));break;
            case 3:puts("Output the elements in the list:");
                    DispList(L);break;
            case 4:puts("Return the ith element in the list. Please input i:");
                    scanf("%d",&i);
                    ele = GetElem(L,i);
                    if(ele!=ERROR) printf("%d\n",ele);break;
            case 5:puts("Search the element e in the list and return the index. Please input e:");
                    scanf("%d",&e);
                    pos = LocateElem(L,e);
                    if(pos!=ERROR) printf("%d\n",pos);break;
            case 6:puts("Insert a new element into the list as the ith element. Please input i and the element:");
                    scanf("%d%d",&i,&e);
                    res = SeqInsert(L,i,e);break;
            case 7:puts("delete the ith element and return its value. Please input i:");
                    scanf("%d",&i);
                    res = SeqDelete(L,i);
                    if(res!=ERROR) printf("%d\n",res);break;
        }
    }
    Destroy(L);
    return 0;
}

Subject2. Linked list

Introduction

Implement the abstract data structure: Linked list, introducing its characteristics.

Algorithms Specification

Linked list is a data structure that specifies in pointers.
Data is not sequential, but be as node while linked by pointers.
Therefore, addition and deletion take $O(1)$ time, while find operation will take $O(n)$ time, since it should go through the list, and there is no direct match between data and position.
This data structure makes it faster to modify the number of data inside. However, query will be slower than before.

Test Results

Comment:

initializes the linked list, making it becomes “artist”.
it display the linked list. After that, it adds five elements from ‘a’ to ‘e’ to the back of the linked list and print the list.
it returns the number of elements in the list and print it.
insert one element ‘f’ into the position after the 3rd node and print the list.
delete the element after the 2nd node and print the list.

Analysis and Comments

Time Complexity:

initialization: $O(n)$
return the length of the list: $O(n)$
find the position of one element: $O(n)$
insert and delete one element: $O(1)$. In this program, however, the corresponding functions require $O(n)$ time since we need to find the position first.

Space Complexity:
Generally, it requires $O(n)$ space.
It may be larger than using sequential list since linked list requires space for storing pointers.

Therefore, if we need to do a large amount of addition and deletion, linked list is obviously better than sequential list.
However, if we need to do a large amount of modification instead, sequential will be a better choice.

Source Code

//
// Created by Artist on 2020/10/16.
// linked list
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define ERROR -1
#define OVER 0
#define OK 1

typedef char ElemType;

typedef struct{
    ElemType a;
    struct Node *nxt;
}Node;

typedef Node *node_ptr;
typedef node_ptr List;
typedef node_ptr position;

List Create(){
    node_ptr head=(node_ptr)malloc(sizeof(Node));
    head->nxt=NULL;
    return head;
}

void InitList(List L){
    puts("Please input the length of the list:");
    int len;scanf("%d",&len);
    puts("Please input the elements of the list, separated by one space.");
    getchar();
    node_ptr cur=L;
    for(int i=1;i<=len;++i){
        node_ptr next=(node_ptr)malloc(sizeof(Node));
        cur->nxt=next,cur=next;
        scanf("%c",&cur->a);getchar();
    }
    cur->nxt=NULL;
}

// use recursion to destroy the list
void Destroy(List L){
    if(L==NULL) return;   // avoid empty pointer
    if(L->nxt==NULL) {free(L);return;}
    Destroy(L->nxt);
    if(L->nxt==NULL) free(L);
}

bool isEmpty(List L){
    return (L->nxt==NULL);
}

int ListLength(List L){
    int i=0;
    List cur=L;
    while(cur!=NULL) cur=cur->nxt,i++;
    return --i;
}

void DispList(List L){
    if(L==NULL) {puts("The list is empty!");return;}
    position p=L->nxt;
    while(p!=NULL) printf("%c ",p->a),p=p->nxt;
    puts("");
}

position find(List L,ElemType e){
    position p=L->nxt;
    while(p!=NULL&&p->a!=e) p=p->nxt;
    return p;
}

// insert the element into the back of position p in list
int Insert(List L,position p,ElemType e){
    position cur=p,next=(position)malloc(sizeof(Node));
    if(cur==NULL) {puts("the position does not exist int the list");return ERROR;}
    next->a=e;
    if(cur->nxt==NULL) {next->nxt=NULL;cur->nxt=next;return OK;}
    next->nxt=cur->nxt;
    cur->nxt=next;
    return OK;
}

// delete the element in the position p
int Delete(List L,position p){
    position cur=find(L,p->a),next;
    if(cur==NULL||cur->nxt==NULL) {puts("the position does not exist int the list.");return ERROR;}
    next=cur->nxt;
    if(next->nxt==NULL) {puts("There is no element after the position");return OVER;}
    cur->nxt=next->nxt;
    free(next);
}

int main(){
    List L=Create();
    InitList(L);                            // initialization
    DispList(L);
    position p=L;
    while(p->nxt!=NULL) p=p->nxt;           // p points to the last node of list
    for(char ch='a';ch<='e';ch++){          // insert 'a' to 'e'
        Insert(L,p,ch);
        p=p->nxt;
    }
    DispList(L);                            // display the list
    printf("%d\n",ListLength(L));  // get the length
    p=L;
    int id=0;
    while(p!=NULL&&id!=3) p=p->nxt,id++;
    Insert(L,p,'f');                     // insert to the 3rd position of list
    DispList(L);
    p=L;
    id=0;
    while(p!=NULL&&id!=2) p=p->nxt,id++;    // delete the 2nd node
    Delete(L,p);
    DispList(L);
    Destroy(L);                             // destroy the list
}

Subject3. Polynomial (optional)

Introduction

Implement the abstract data structure: Linked list, introducing its characteristics.
Polynomial is one application of linked list.
This application uses the feature that insertion and deletion in linked list are quite efficient.

Algorithms Specification

The addition and deletion of nodes are easy to implement since it only requires to modify the next pointer of the node you specify and the next pointer of the position node.
The addition and multiplication of two polynomials are slightly more complicated.

As for addition:
We go through all the nodes of the operands A and B simultaneously. When the exponent of the current node in A is greater than that of B, it means that this node in A has one unique exponent. Therefore, we insert this node into the end of the result polynomial directly. This is the same for the case that the current node of B has a greater exponent than the one of A. In the third case that the node of A and that one of B have same exponent, we can do addition of their coefficients, after which we obtain a new node and insert it into the result polynomial.

As for multiplication:
The process consists of two nested while loops. The outside one goes through all nodes of A, as for each node in A, we again go through all nodes in B. Based on this process, we can get the product of each node of A and each node of B. All we need to do is to add all of them together.
Therefore, we see the partial product of each single node as one contribution to the result polynomial. We add them one by one.

Test Results

Analysis and Comments

Time Complexity:
The other simple operations have the same time complexity as linked list, since this polynomial is based on linked list.
The addition operation takes $O(n)$ time. Since it needs to go through all the nodes in A and B simultaneously once.
The multiplication operation takes $O(n^2)$ time. Since it needs to fix one node in A and loop through all nodes in B.

Space Complexity:
It takes $O(n)$ space to store polynomials.

The time complexity of multiplication is $O(n^2)$, but we cannot reduce this because it is necessary to go through all combinations to implement multiplication of polynomials.
It seems that linked list is one good choice of data structure to do this thanks to the convenience for modify.

Source Code

//
// Created by Artist on 2020/10/16.
// polynomial
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

typedef int ElemType;

typedef struct{
    ElemType coeff;
    ElemType expo;
    struct Node *nxt;
}Node;

typedef Node *node_ptr;
typedef node_ptr position;
typedef node_ptr Polynomial;

Polynomial Create(){
    Polynomial p = (node_ptr)malloc(sizeof(Node));
    p->nxt = NULL;
    return p;
}

// initialization
void InitList(Polynomial L){
    int len; scanf("%d",&len);          // length
    position cur = L;                            // current position
    // it requires readers to input in increasing order of exponent
    while(len--){
        node_ptr next = (node_ptr)malloc(sizeof(Node));    // new node
        scanf("%d%d",&next->coeff,&next->expo);   // receive the inputs one by one
        next->nxt = NULL;
        cur->nxt = next,cur = next;              // link the node into the polynomial
    }
}

// check if one polynomial is empty
bool isEmpty(Polynomial L){
    return (L->nxt==NULL);
}

// using recursion to free the memory of one polynomial
void Destroy(Polynomial L){
    if(L==NULL) return;
    if(isEmpty(L)) {free(L);return;}       // the stop condition of recursion
    Destroy(L->nxt);
    if(isEmpty(L)) free(L);                // trace back
}

// return the length of one polynomial
int ListLength(Polynomial L){
    int i = 0;
    Polynomial cur = L;
    while(cur!=NULL) cur = cur->nxt,i++;
    return --i;
}

// display one polynomial
void DispList(Polynomial L){
    if(isEmpty(L)) return;
    Polynomial cur = L->nxt;
    printf("%dx^%d",cur->coeff,cur->expo);
    cur = cur->nxt;
    while(cur!=NULL) printf(" + %dx^%d",cur->coeff,cur->expo),cur = cur->nxt;
    puts("");
}

// insert one node into the beginning of one polynomial
void Insert(Polynomial L,node_ptr S){
    S->nxt=L->nxt;
    L->nxt=S;
}

// insert one node into the back of one polynomial
void Pushback(Polynomial L,node_ptr S){
    position cur = L;
    while(cur->nxt!=NULL) cur = cur->nxt;
    cur->nxt = S;
    S->nxt = NULL;
}

// do addition of two polynomials, return the answer
Polynomial AddPoly(Polynomial A,Polynomial B){
    position curA = A->nxt,curB = B->nxt,res = Create();     // current position of A, current position of B, result
    while(curA!=NULL&&curB!=NULL) {
        if(curB->expo==curA->expo) {                  // if the exponential of current position of A and B is equal
            node_ptr S = (node_ptr)malloc(sizeof(Node));
            S->expo = curA->expo,S->coeff = curA->coeff+curB->coeff;
            Pushback(res,S);                          // push the new node into the result
            curA = curA->nxt;                         // move to the next node of A and B
            curB = curB->nxt;
        }
        else if(curB->expo>curA->expo) {              // if the exponential of current position of B is greater
            node_ptr S = (node_ptr)malloc(sizeof(Node));
            S->expo = curB->expo,S->coeff = curB->coeff;
            Pushback(res,S);                          // directly push this node of B
            curB = curB->nxt;
        }
        else {
            node_ptr S = (node_ptr)malloc(sizeof(Node));   // if the exponential of current position of A is greater
            S->expo = curA->expo,S->coeff = curA->coeff;
            Pushback(res,S);
            curA = curA->nxt;
        }
    }
    while(curA!=NULL) {                              // if there are still nodes in A that are not processed
        node_ptr S = (node_ptr)malloc(sizeof(Node));
        S->expo = curA->expo,S->coeff = curA->coeff;
        Pushback(res,S);
        curA = curA->nxt;
    }
    while(curB!=NULL) {                              // if there are still nodes in B that are not processed
        node_ptr S = (node_ptr)malloc(sizeof(Node));
        S->expo = curB->expo,S->coeff = curB->coeff;
        Pushback(res,S);
        curB = curB->nxt;
    }
    return res;
}

// do multiplication of two polynomials, return the answer
Polynomial MultiPoly(Polynomial A,Polynomial B){
    position curA = A->nxt,res = Create();           // result
    while(curA!=NULL){                               // use the nodes of A one by one to multiply the nodes of B
        // a temporary polynomial that stores the product of current node of A and all nodes of B
        Polynomial ctr = Create();
        position curB = B->nxt;                      // for each node of A, go through the nodes of B
        while(curB!=NULL){
            Polynomial ctr2 = Create();              // the product of current nodes of A and B
            node_ptr tmp = (node_ptr)malloc(sizeof(Node));
            tmp->coeff = curA->coeff*curB->coeff,tmp->expo = curA->expo+curB->expo;
            Pushback(ctr2,tmp);
            // push the product into the polynomial consists of product of current node of A and all the B nodes
            ctr = AddPoly(ctr,ctr2);
            Destroy(ctr2);
            curB = curB->nxt;
        }
        // contribute to the result polynomial
        res = AddPoly(res,ctr);
        Destroy(ctr);
        curA = curA->nxt;
    }
    return res;
}

int main(){
    // suppose that the polynomial is already sorted.
    // the number of test cases
    int ca = 0; scanf("%d",&ca);
    while(ca--){
        // initialize
        Polynomial pa = Create(), pb = Create();
        InitList(pa), InitList(pb);
        Polynomial sum = AddPoly(pa,pb);
        Polynomial pro = MultiPoly(pa,pb);

        // print the sum
        printf("the sum of pa and pb is:\n");
        DispList(pa),puts("+"),DispList(pb),puts("="),DispList(sum);

        //print the product
        printf("the product of pa and pb is:\n");
        DispList(pa),puts("*"),DispList(pb),puts("="),DispList(pro);

        //destroy
        Destroy(pa);
        Destroy(pb);
        Destroy(sum);
        Destroy(pro);
    }
}

Lab3 Binary Tree

Subject1. The creation and traversal of binary trees

Introduction

Implement the abstract data structure: Binary Tree, introducing its characteristics. Use linked representation as the storage structure of a binary tree. We need to convert one string into binary tree and find nodes on it.

Algorithms Specification

Input the string which represents the binary tree.
Convert the string into binary tree in pointers form, using a stack to save the position of nodes in the string.
Find a node in the binary tree, first consider finding in the left child, after which consider finding in the right child.
Recursively get the height of the tree. Using the thinking of dynamic programming to divide a bigger task into smaller tasks, which are easier to solve.
Delete tree: first delete the children of the tree, after which can we delete the current node. Otherwise we may lose the position.
Pre-order, in-order and post-order traversal are easy to implement, just using recursion to divide and conquer.

Test results

Comment:
We can input whatever numbers of test cases we want. The only thing we need to consider is we should terminate the inputs by one sign character ‘0’.

first input: A(B(D,E(H(J,K(L,M(,N))))),C(F,G(,I)))

We take the leaves as height of 1. After initializing the binary tree, we got the traversals, and we can search for the nodes one by one, terminated by one ‘0’ as well.

Second input: A(,B)
Third input: A

Those all got correct results.

Analysis and Comments

Time complexity:

Initialization: $O(n)$. Since we go through the whole string.
Get Height: $O(n)$. Since we go through all nodes once.
Traversal: $O(n)$. Since we go through all nodes once.
Find: $O(log n)$. Since we go half the array each time we choose to go to the left son or right son.

Space complexity:
The whole program needs one string to record the string representation of the binary tree, as well as a tree with n nodes to store the data. That is $O(n)$.

Comment:
Binary tree is a very efficient data structure, which appears in many applications. Segment tree, a popular data structure for modifying a consecutive segment, is based on binary search tree. AVL, red black tree, Treap and so on are data structures that use techniques to make binary tree balanced in order to prevent the binary tree from degenerating.
When we do depth first search one tree, it is obvious that we encounter one node twice. The first time is visit it, and the second time is trace back. This process can be used for dynamic programming, finding the height or depth of one node, finding the lowest common ancestor, and so on.When we do depth first search one tree, it is obvious that we encounter one node twice. The first time is visit it, and the second time is trace back. This process can be used for dynamic programming, finding the height or depth of one node, finding the lowest common ancestor, and so on.

Source code

#include <stdio.h>
#include <malloc.h>
#define MaxSize 100
#define max(x,y) x>y?x:y
typedef char ElemType;
typedef struct node{
    ElemType data;
    struct node *lchild,*rchild;
}BTNode;
typedef BTNode *BinTree;

// create one binary tree from sequential representation, and assign it.
BinTree CreateBTree(char *str){
    BTNode *St[MaxSize],*p; // use the data structure stack
    BinTree b;
    int top=-1,k,j=0;char ch;
    b=NULL;              // at the beginning the tree is empty
    ch=str[j];
    while(ch!='\0'){
        switch(ch){
            case '(':top++;St[top]=p;k=1;break;
            case ')':top--;break;
            case ',':k=2;break;
            default:p=(BTNode *)malloc(sizeof(BTNode));
                p->data=ch;p->lchild=p->rchild=NULL;
                if(b==NULL) b=p;
                else {
                    switch(k){
                        case 1:St[top]->lchild=p;break;
                        case 2:St[top]->rchild=p;break;
                    }
                }
        }
        j++;ch=str[j];
    }
    return b;
}

// find a node whose value is the same as the query
BTNode *FindNode(BinTree T,char x){
    BTNode *p;
    if(T==NULL) return NULL;
    else if(T->data==x) return T;
    else {
        p=FindNode(T->lchild,x);
        if(p!=NULL) return p;
        else return FindNode(T->rchild,x);
    }
}

// get the height of the tree using dynamic programming in tree
int GetHeight(BinTree T){
    if(T==NULL) return 0;
    int height1=GetHeight(T->lchild);
    int height2=GetHeight(T->rchild);
    int height=max(height1,height2);
    return height+1;
}

// delete the tree using recursion
void Delete(BinTree T){
    if(T==NULL) return;
    Delete(T->lchild);
    Delete(T->rchild);
    free(T);
}

// pre-order traversal
void Preorder(BinTree T){
    if(T==NULL) return;
    printf("%c",T->data);
    Preorder(T->lchild);
    Preorder(T->rchild);
}

// in-order traversal
void Inorder(BinTree T){
    if(T==NULL) return;
    Inorder(T->lchild);
    printf("%c",T->data);
    Inorder(T->rchild);
}

// post-order traversal
void Postorder(BinTree T){
    if(T==NULL) return;
    Postorder(T->lchild);
    Postorder(T->rchild);
    printf("%c",T->data);
}

/*
 * A(B(D,E(H(J,K(L,M(,N))))),C(F,G(,I)))
 */

int main() {
    char str[MaxSize];
    puts("Please input a string representation of a binary tree, terminated by one '0':");
    while(1){
        scanf("%s",str);
        if(str[0]=='0') break;
        BinTree tree = CreateBTree(str);
        printf("The height of the tree is:%d\n",GetHeight(tree));
        puts("Pre-order traversal:");
        Preorder(tree);
        puts("");
        puts("In-order traversal:");
        Inorder(tree);
        puts("");
        puts("Post-order traversal:");
        Postorder(tree);
        puts("");
        puts("You can get any node in the tree you want,");
        puts("Try input the char, separated by newline notation, terminated by one '0':");
        while(1){
            scanf("%s",str);
            if(str[0]=='0') break;
            BTNode* node = FindNode(tree,str[0]);
            if(node==NULL) puts("Not found!");
            else puts("Found.");
        }
        Delete(tree);
    }
    return 0;
}

Subject2. Application of BST/AVL tree.(Optional)

Introduction

Problem description

Problem Description:Hardwoods are the botanical group of trees that have broad leaves, produce a fruit or nut, and generally go dormant in the winter. America’s temperate climates produce forests with hundreds of hardwood species – trees that share certain biological characteristics. Although oak, maple and cherry all are types of hardwood trees, for example, they are different species. Together, all the hardwood species represent 40 percent of the trees in the United States.
On the other hand, softwoods, or conifers, from the Latin word meaning “conebearing,” have needles. Widely available US softwoods include cedar, fir, hemlock, pine, redwood, spruce and cypress. In a home, the softwoods are used primarily as structural lumber such as 2x4s and 2x6s, with some limited decorative applications.
Using satellite imaging technology, the Department of Natural Resources has compiled an inventory of every tree standing on a particular day. You are to compute the total fraction of the tree population represented by each species.

Input

Input to your program consists of a list of the species of every tree observed by the satellite; one tree per line. No species name exceeds 30 characters. There are no more than 10,000 species and no more than 1,000,000 trees.

Output

Print the name of each species represented in the population, in alphabetical order, followed by the percentage of the population it represents, to 4 decimal places.

Algorithms Specification

We use one binary tree to store the names of trees.
Names that are greater in alphabetical order are stores in the left subtree of one node.
When inserting one new name, we first find the position of it according to the alphabetical order. If the current name we encounter is alphabetically greater than the name we are inserting, we go to the left son of the current node to search further. When the position is found, we first check if there is one node that is already exists. If so, we simply increase the count of that name(node), otherwise, we create one new node over there.
After inserting all the nodes, we need to print them altogether in alphabetical order. Recall that in-order traversal is corresponding to alphabetical order because of its characteristics.
At last, we simply delete the tree that is built.

Test results

Comment:

sample output

pics

input Dang Xilin\nKim Junkyu\nDeng Chutong

pics
Which are both correct.

Analysis and Comments

Time Complexity:

insert one node into the binary tree: $O(log n)$. Since we need to find the position of the node first.
Using in-order traversal to print the names: $O(n)$. Since we should go to every node once.
Delete the tree: $O(n)$. Since we should go to every node once. Space Complexity: The space complexity is $O(n)$. Each name is stored in one node.

Comment:
Binary tree is efficient for Interval processing, $O(log n)$ is quite good for processing data.
However, according to the shape of the binary tree, sometimes the binary tree may degenerates into one array in the worst case. Balanced tree is needed when the scale of data is large. We usually use Treap, WBLT, left leaning red black tree and Splay since they are relatively easy to implement.

Source code

#include <stdio.h>
#include <malloc.h>
#include <string.h>
#define MaxSize 33
const int N = 1e4+4;
typedef struct node{
    char str[MaxSize];
    struct node *lchild,*rchild;
    int cnt;     // the number of occurrence that this word appear
}BTNode;
typedef BTNode *BinTree;
int num;

// compare two string, if str1 is bigger than str2, return 1
int Cmp(char *str1,char *str2){
    int len1=strlen(str1),len2=strlen(str2);
    int i;
    for(i=0;i<len1;++i){
        if(i>=len2) return 1;
        if(str1[i]>str2[i]) return 1;
        if(str1[i]<str2[i]) return -1;
    }
    if(len1!=len2) return -1;
    else return 0;
}

//insert one node into the binary tree.
BinTree Insert(BinTree T,char *s){
    if(T==NULL){
        T=(BTNode *)malloc(sizeof(BTNode));
        strcpy(T->str,s);
        T->lchild=T->rchild=NULL;
        T->cnt=1; // the first time it insert, the count is 1
        return T;
    }
    int tmp = Cmp(T->str,s);
    if(tmp>0) T->lchild=Insert(T->lchild,s);
    if(tmp<0) T->rchild=Insert(T->rchild,s);
    if(tmp==0) T->cnt++;  // reach the destination
    return T;
}

// in-order traversal to print the nodes in order
void Dfs(BinTree T){
    if(T==NULL) return;
    Dfs(T->lchild);
    printf("%s %.4lf\n",T->str,100.0*T->cnt/num);
    Dfs(T->rchild);
}

// delete the entire tree
void Delete(BinTree T){
    if(T==NULL) return;
    Delete(T->lchild);
    Delete(T->rchild);
    free(T);
}

int main(){
    // change the stream from keyboard into the file test.txt
    freopen("test.txt","r",stdin);
    char str[MaxSize];
    BinTree b=NULL;  // initialize the binary tree: empty
    char * find;
    while(1){
        // gets one line of name
        void *tmp=fgets(str,32,stdin);
        if(tmp==NULL) break;
        // delete the new line sign
        find=strchr(str,'\n');
        if(find) *find = '\0';
        // insert the node into the tree
        b=Insert(b,str);
        // total number of tree
        num++;
    }
    // print the nodes in order using depth first search
    Dfs(b);
    Delete(b);
}

Lab4 Sorting

Introduction

Understand sorting procedure and algorithm design.

Sorting is one of the most fundamental algorithmic problems within computer science. It has been claimed that as many as 25% of all CPU cycles are spent sorting, whichprovides a lot of incentive for further study and optimization of sorting. In addition, there are many other tasks (searching, calculating the median, finding the mode, remove duplicates, etc.) that can be implemented much more efficiently once the data sorted.

Subject1. Basic sorting algorithm & binary insertion sort

Algorithms Specification

bubble sort

The basic idea of bubble sort is to compare the adjacent elements each time, and swap them if the left one is greater. In this process, great elements will be like bubble merging towards the end of the array. Since each step we only compare adjacent elements, swap them if needed and then go to the next position, regardless of the relationship between the swapped one and the former ones. The array will not be sorted after only one scan. Therefore, in order to make every element be in their correct positions, we need to scan the array $n$ times. Note that after $x$ times’ scanning, the last x elements are already in their correct positions. Therefore, in the $x+1$th scan, we only need to scan till position $n-x$.

merge sort

Merge sort is based on the idea of divide and conquer. There are lots of algorithms and techniques derived from this idea, such as CDQ divide and conquer(this algorithm was developed by a Chinese high school girl named Danqi Chen, and she then becomes a famous doctor in computer science). This idea has many applications, such as finding the numbers of reverse pairs in an array or Three-dimensional partial order problems(三维偏序). This idea is based on recursion. We first assume that we already get the array nearly sorted, that is, the left half part and right half part are sorted. Then, we merge these two part into one completely sorted array. To make the two parts sorted, we can do recursion at the top of the functions, to sort the smaller arrays, and merge those two arrays into one in the process of backtracking.

insertion sort

The idea of insertion sort is to maintain a sorted array consisted of the first few elements when scanning the entire array. When we encounter one element, we find the position of it in the sorted array and insert it into the array. Then we get one longer sorted array and we continue this process until we reach the end of the entire array.

Since the array we are maintaining is already sorted, we do not need to scan it thoroughly. Instead, we use binary search to find the position, by which the time complexity can be declined.

Test Results

input file generated by the generator, note that there are 1000 random numbers in range (1,10000).

we can see from the output file that the three algorithms sorted the numbers perfectly.

Analysis and Comments

bubble sort

To implement the bubble sort, we use a for loop to scan the whole array $n$ times, while use another inner
loop to swap the adjacent elements. Note that in the $x$th loop, the last $x$ elements are already sorted. Therefore, the inner loop needs only go till the $n-i-1$th element.

We use the swap function defined by the header file to simplify the code. The condition for swapping is that the former one is greater than the latter one, which does not satisfy the sorted requirement.

Since we use two for loop, even through the duration of the inner one is deceasing by one each time, the total time complexity is still $O(n^2)$. ($\sum_{i=1}^n{i}=\frac{n\times (n+1)}{2}$).

The swapping is limited in adjacent elements. It lifts the lower bound of the sorting method to $O(n^2)$. To optimize the time complexity, we can use the idea of block(分块思想), like Shell sort. By swapping the elements that are not adjacent several times, the total time complexity can be compressed. However, the main point is to set the size of the steps nicely.

The Space complexity is $O(n)$ to store the input.

merge sort

To implement merge sort, we need to call a recursion function, which is declared by sort in the structure merge_sort. Inside the function, we first sort the left part and right part of the array recursively, after that, in the backtracking part, we scan the left part of the array, while at the same time scanning the right part. When we encounter one element in the right part that is less than the current element in the left part, we stop scanning the right part temporarily, and store the current element in the left part. Otherwise, we store the current element in the right part. By doing this, the array we are currently handling is sorted in a temporary array. We update the original array using this and end one recursion process. Merge sort has a time complexity of $O(nlog n)$, which is quite efficient among all the sorting methods. This time complexity benefits from the idea of divide and conquer.

The Space complexity is $O(n)$ to store the input and the temporary array.

Prove:

Let $k=\log_2{n}$.

$T(n)=2\times T(\frac{n}{2})+n$ $=2\times (2\times T(\frac{n}{2^2})+\frac{n}{2})+n$ $...$ $=2^k\times T(\frac{n}{2^k})+k\times n$ $...$ $=2^k+n\times k$ $=2^{\log_2n}+n\times \log_2n$ $=n+n\log_2n$

insertion sort

To complement the insertion sort, we use a for loop to scan the whole array, for each element, we find the position of it in the former sorted part and insert it. The finding process can be optimised by binary search. We can code a binary search function, or we can directly use the lower_bound function packed by the header file, whose low-level implementation is still binary search.

The Time complexity is $O(nlog n)$. $n$ is for scanning the whole array, $log n$ is for binary search.

The Space complexity is $O(n)$ to store the input.

Comments:

In my opinion, the best sorting method is quick sort:), which is also a default sorting method in the c++ function sort.

void qsort(int l,int r)
{
    int mid=a[(l+r)/2];
    int i=l,j=r;
    do{
        while(a[i]<mid) i++;
        while(a[j]>mid) j--;
        if(i<=j)
        {
            swap(a[i],a[j]);
            i++;
            j--;
        }
    }while(i<=j);
    if(l<j) qsort(l,j);
    if(i<r) qsort(i,r);
}

Source Code

generator

# include<bits/stdc++.h>
using namespace std;

// quick read
inline int read(){
    char ch=getchar();int x=0,f=1;
    while(ch<'0'||ch>'9') {if(ch=='-') f=-1;ch=getchar();}
    while(ch>='0'&&ch<='9') x=x*10+ch-'0',ch=getchar();
    return x*f;
}

int main(){
    freopen("input.txt","w",stdout);
    int n=read();
    cout<<n<<endl;
    srand(time(NULL));   // generate the random seed
    for(int i=1;i<=n;++i) cout<< (int)rand()%10000+1 <<" ";
    puts("");
}

program

# include<bits/stdc++.h>
using namespace std;

inline int read(){
    char ch=getchar();int x=0,f=1;
    while(ch<'0'||ch>'9') {if(ch=='-') f=-1;ch=getchar();}
    while(ch>='0'&&ch<='9') x=x*10+ch-'0',ch=getchar();
    return x*f;
}

// bubble sort
void bubble_sort(vector<int> a,int n){
    puts("************* Bubble sort *************");
    for(int i=0;i<n-1;++i)
        for(int j=0;j<n-i;++j)
            if(a[j]>a[j+1]) swap(a[j],a[j+1]);
    for(auto i:a) cout<<i<<" ";  // print the sorted array
    putchar(10);  // print one new line character
}

// merge sort
struct merge_sort{
    vector<int> a;
    int tmpa[1005];  // temporary array
    void read(vector<int> b){
        for(auto i:b) a.push_back(i);
    }
    void sort(int l,int r){
        if(l>=r) return;
        int mid = (l+r)>>1;  // divide into two parts
        sort(l,mid),sort(mid+1,r);  // recursion
        int tmp=mid+1,cnt=l-1;
        for(int i=l;i<=mid;++i){ // loop the left part
            // loop the right part
            while(tmp<=r&&a[tmp]<a[i]) tmpa[++cnt] = a[tmp++];
            tmpa[++cnt] = a[i];
        }
        // if there are elements in the right part are not included
        for(int i=tmp;i<=r;++i) tmpa[++cnt] = a[i];
        // copy the temporary array to the original array
        for(int i=l;i<=r;++i) a[i] = tmpa[i];
    }
    void solve(vector<int> b,int n){
        puts("************* Merge sort *************");
        read(b);
        sort(0,n-1);
        for(auto i:a) cout<<i<<" ";
        putchar(10);
    }
};

struct binary_insertion_sort{
    vector<int> a;
    void read(vector<int> b){
        for(auto i:b) a.push_back(i);
    }
    // binary search
    int b_search(int st,int ed,int val){
        int ans=ed+1;
        while(st<=ed){
            int mid = (st+ed)>>1;
            if(a[mid]<val) st = mid+1;
            else ed = mid-1,ans = mid;
        }
        return ans;
    }
    void sort(int n){
        for(int i=1;i<=n;++i){
            int tmp = a[i];
            // we can use the functions packed by the header file directly
            //int pos = lower_bound(a.begin(),a.begin()+i,tmp)-a.begin();
            int pos = b_search(0,i-1,tmp);
            // move the elements one position to the end
            for(int j=i-1;j>=pos;--j){
                a[j+1] = a[j];
            }
            a[pos] = tmp;   // insert
        }
    }
    void solve(vector<int> b,int n){
        puts("************* Insertion sort *************");
        read(b);
        sort(n-1);
        for(auto i:a) cout<<i<<" ";
        putchar(10);
    }
};
int main(){
    freopen("input.txt","r",stdin);  // read the data
    freopen("output.txt","w",stdout);  // output

    int n=read();
    vector<int> a;
    for(int i=1;i<=n;++i) a.push_back(read());
    
    bubble_sort(a,n);
    
    merge_sort mer;
    mer.solve(a,n);

    binary_insertion_sort ins;
    ins.solve(a,n);
}

Subject2. One ACM problem (optional):

Problem Statement

Matt’s friend K.Bro is an ACMer. Yesterday, K.Bro learnt an algorithm: Bubble sort. Bubble sort will compare each pair of adjacent items and swap them if they are in the wrong order. The process repeats until no swap is needed.

Today, K.Bro comes up with a new algorithm and names it K.Bro Sorting.

There are many rounds in K.Bro Sorting. For each round, K.Bro chooses a number, and keeps swapping it with its next number while the next number is less than it. For
example, if the sequence is “1 4 3 2 5”, and K.Bro chooses “4”, he will get “1 3 2 4 5” after this round. K.Bro Sorting is similar to Bubble sort, but it’s a randomized algorithm because K.Bro will choose a random number at the beginning of each round. K.Bro wants to know that, for a given sequence, how many rounds are needed to sort this sequence in the best situation. In other words, you should answer the minimal number of rounds needed to sort the sequence into ascending order. To simplify the problem, K.Bro promises that the sequence is a permutation of 1, 2, … , N . Input The first line contains only one integer T (T ≤ 200), which indicates the number of test cases. For each test case, the first line contains an integer N (1 ≤ N ≤ 10 ^6).

The second line contains N integers a i (1 ≤ a i ≤ N ), denoting the sequence K.Bro gives you.

The sum of N in all test cases would not exceed 3 × 10 ^6. Output For each test case, output a single line “Case #x: y”, where x is the case number (starting from 1), y is the minimal number of rounds needed to sort the sequence.

Sample Input

2
5
5 4 3 2 1
5
5 1 2 3 4

Sample Output

Case #1: 4
Case #2: 1

Test Results

I handed my code to the online judge since I am an ACMer, I know that the sample input is just pretests.

Analysis and Comments

This problem is quite asy to solve. Its difficulty is fairly low. Found that when we choose one number to bubble, the relative positions of the other numbers stay unchanged. Therefore, to make the array sorted, the minimum number we choose to bubble is the number of elements have at least one element less to it is latter to it.

Therefore, we scan the array from right to left. Maintain the minimum value we encountered, and check if the current element is greater to it. If yes, increase the answer by one.

The time complexity is $O(n)$ since we only scan the array once, and there is no other operation.

The space complexity is $O(n)$ to store the input.

Source Code

#include <bits/stdc++.h>
using namespace std;

inline int read(){
    char ch=getchar();int x=0,f=1;
    while(ch<'0'||ch>'9') {if(ch=='-') f=-1;ch=getchar();}
    while(ch>='0'&&ch<='9') x=x*10+ch-'0',ch=getchar();
    return x*f;
}

int main(){
    int t=read(),cas = 0;
    while(t--){
        int n=read();
        vector<int>a;
        for(int i=1;i<=n;++i) a.push_back(read());
        int maxa = 0x3f3f3f3f,ans=0;
        // for each number, check if there exist a number after it is smaller than it.
        // if yes, contribute one.
        for(int i=n-1;i>=0;--i){
            if(maxa<a[i]) ans++;
            maxa = min(maxa,a[i]);
        }
        printf("Case #%d: %d\n",++cas,ans);
    }
}

Lab5 Basic Operations on Graph

Experiment Objectives

Grasp the basic methods to represent a graph
Grasp the BFS method to traverse a graph

Experiment Contents

ADT of Graph:

Structure Undirected Graph is an object composed of a nonempty set of vertices and a set of undirected edges, where each edge is a pair of vertices.

Functions:

for all $graph\in Graph$, $v, v1, v2 \in Vertices$
Graph Create(), returns an empty graph.
Graph InsertVertex(Graph, v), returns a graph with v inserted. v has no incident edge.
Graph InsertEdge(Graph, v1,v2), returns a graph with new edge between v1 and v2.
Graph DeleteVertex(Graph, v), returns a graph in which v and all edges incident to it are removed.
Graph DeleteEdge(Graph, v1, v2), returns a graph in which the edge (v1, v2) is removed.
Boolean IsEmpty(Graph), if the graph is empty returns TRUE else returns FALSE.

Finish the following tasks:

Use both adjacency matrix and adjacency list to represent graphs and define the corresponding abstract data type.
Realize all the above functions for the two graph ADTs.
Base on the input, generate objects to represent the graph.
Traverse the graph in BFS manner.

Input

The input consists of several test cases. Each test case contains two lines. The first line contains the vertex set V. The second line contains edge set E. A single 0 indicate the end of the input.

Output

The output is the BFS of the graph.

Sample input

(note that every bunk of input has two line, each string of vertex is separated by a space, and the input ends when 0 is scanned)
A B C D E F G
A B A C A D B D B E C F C D D E D F D G
0

Sample output

A B C D E F G

Algorithms Specification

Graph is a quite important model in computer science, many real-world problems can be abstracted into graph problems.
Adjacency matrix and adjacency list are two basic methods for graph representation.

adjacency matrix

adjacency matrix’s basic idea is to set a matrix of size $n^2$ to store the properties of the edges links vertex $i$ and $j$ for any $i,j\in V$.
We define the structure of graph to have four variables: enum type variable kind for the graph type information, integer variable maxnum for the total number of vertices that have ever been added into this graph, integer variable vexnum for current number of vertices, vertex type variable for storing the value of vertices while number them by indices and integer variable adj for preserving the information about edges between every two vertices.
The LocateVertex function is for locating the index according to the value of the given vertex. We simply go through the whole vexs array in the graph and check if any is the one we are looking for.
The InsertVertex function is for inserting a new vertex into the graph given the value of the new vertex. We first use the previous function LocateVertex to check if the given value is already in the graph, then we don’t have to count another duplicately. If not, we add this vertex into the array, while setting all the edges between this vertex and the other to be undefined.
The InsertEdge function is for inserting a new edge between two vertices that already exist in the graph whose values are given by the parameters. We first locate the indices of these two vertices according to their values and then set the edge inside the matrix to some specific values.
The DeleteVertex function is for deleting a vertex given the value of it. We first locate the index of it, then loop through all edges that are adjacent to this vertex to set them undefined, after which we set the vertex index in the array to be unavailable.
The DeleteEdge function is for deleting an edge between two vertices that already exist in the graph whose values are given by the parameters. We first locate them two, and set the edge in the matrix to be undefined.
The BFS function is for breath first search. It uses a queue to store the indices of those temporary vertices that are visited while waiting for accessing their sons. For every step, all we need to do is to take one element from the queue, visit its sons that are unvisited and push the sons into the queue. When the queue is empty, we jump out of the loop and scan the array of vertices to see if there are still vertices that are unvisited and push them into the queue to do the above procedure again. These vertices form a component that is unable to be accessed from the previous vertices.

adjacency list

Adjacency list is the most widely used data structure for graph representation. Its idea is to use linked list to store the value of edges, which is dynamic and memory-saving.
We define the structure of graph to have two variables: one is the number of vertices, another is an array to store each vertices and the pointers to edges pointing from that vertex.
The LocateVertexHead function is for locating the index in the array of the vertex. We go through the array to check whether its value is equal to the given vertex. Return the index if we find it.
The LocateVertexLinkPre function is for locating the position of the node before the given vertex from the head vertex (note that, in the concepts of graph, head is defined as the vertex that is pointed to by another vertex. However, in my code, I set the vertex pointing out edge to be head since it seems to be easier to understand). We go from the head node to the next one until we find the node whose link points to the vertex we are finding. Then we return this previous node.
The InsertVertex function is for inserting a new vertex into the graph. We need to check if the vertex is already in the graph by the function LocateVertexHead, if not, we initialize a new space in the array.
The InsertEdge function is for inserting an edge between two vertices that already exist in the graph whose values are given by the parameters. We first locate the index of the first vertex, insert the second vertex into the end of the links pointing from the first vertex. There, we only consider the first vertex to be head, and the second to be tail, since the graph may be a directed graph. If it is undirected, we can call this function twice for one time, only to swap the two vertices in the second call.
The Destroy function is for destroying a link pointing from one vertex. It uses recursion to finish the procedure.
The DeleteEdge function is for deleting an edge between two vertices. Just locate the index of the head, and go through the link of head to find the tail, then free the memory allocated. (note that in my code, since I am using C++ instead of pure C, I use delete and new instead of malloc and free, there, they have a little difference for the manipulation of memory of string, which is a data type in STL).
The DeleteVertex function is for deleting a vertex given the value. We first locate the position, then free the memory of its link. Then we merge the following values in the array sequentially to occupy this empty position. In addition, we need to loop through all links of head vertices to delete the edges that point to the vertex.
The BFS function is for breath first search. Each time we need to access sons of current vertex popped from the queue, we loop through its link to visit new vertices.
The DestroyGraph function is for destroying all memory allocated by the specific graph.

features

We can represent different kinds of graph in the code. Directed graph, undirected graph and so on. Simply modify the main function and input data.
In my code, the vertices of input need not to be a single character. It can be anything one like, such as numbers, character strings. Since I set the data type to be string in STL while writing a powerful input function, whose logic is based on separating input by finding spaces.
new and delete is just malloc and free in C++. While they support more properties.
Actually, we can use vector<datatype>Graph[MAX_SIZE] or vector<vector<datatype> >Graphinstead of adjacency list. vector in STL has many associated efficient functions and can allocate memory dynamically. Moreover, they can decrease the length of codes. Some functions inside STL::vector are implemented based on the knowledge we learn. However, some functions use some efficient ways for acceleration. Such as reading and modifying any element has constant time complexity, and inserting and deleting at the end of the sequence is constant time complexity.
Actually, there is another way to implement adjacency list. It is called 链式前向星(I do not know its English name…), its basic idea is to use an array to store the link lists, which is similar to the array representation of binary tree.

struct node{
    int to,next;
}edge[maxn*2];
void add_edge(int u,int v){
    ++tot;
    edge[tot].to=v;
    edge[tot].next=head[u];
    head[u]=tot;
}

Test Results

test1

A B C D E F G
A B A C A D B D B E C F C D D E D F D G
Ice-cream LiuRujia Muxin Artist
Muxin Ice-cream
1 2 3 4 5 6 7
1 2 4 5 3 7 5 2
0

test2 (error test)

A B C D E
456 345
Icecream Artist Icecream
Artist Icecream
Icecream Artist
Rain Icecream
0

test3 (delete edge or vertices)

Analysis and Comments

time complexity

matrix

LocateVertex: This procedure is linear, it costs $O(V)$ to scan the whole array.
InsertVertex: Since we need to check this vertex in the array and set all related edges, this procedure requires $O(V)$ time.
InsertEdge: Since we need to check this vertex in the array and set all related edges, this procedure requires $O(V)$ time.
DeleteVertex: This procedure takes $O(V)$ times, since it needs to locate the vertex and modify the values between this vertex and the other by scanning the whole array that is related to the vertex.
DeleteEdge: This procedure takes linear time, since it needs to locate the two vertices of the arc.
BFS: $O(V^2)$. Each time we dequeue one vertex from the queue. In total, we will dequeue $V$ times. For each vertex, we go through all the vertices related to it to see whether there is an edge between them and the new vertex has not been visited yet.

comments:

It shows that the time complexity is bounded by the LocateVertex function, to improve the time complexity, we need to modify this function. One possible method to optimise this function is to build a binary search tree to store the indices of each vertices. To simplify the code, we can use STL::map to map the vertices’ values to indices one to one, whose underlying implementation is a red-black tree. The time complexity can be improve to $O(log V)$.
The mapping of vertices and indices is unique and cannot be changed even if we delete the vertex, since if we change the mapping, the values stored in the matrix will be meaningless and we have to reconstruct the matrix for the new mapping relation. Therefore, if we delete some vertices, the indices that maps to them cannot be used a second time for newly inserted vertices. The only acceptable result for these indices is set them to be unavailable.
In the BFS function, we check all the vertices that if each is linked to the current vertex. If the map is not dense, we will waste a lot of time in this procedure. Nevertheless, this scanning process cannot be improved since it is determined by the nature of matrix.

list

LocateVertexHead: This procedure is linear, it costs $O(V)$ to scan the whole array to find the index of the given vertex.
LocateVertexLinkPre: This procedure is $O(V)$. We need to loop through the link representing the vertices pointing from the given head vertex.
InsertVertex: This procedure is linear, since we need to use function 1 to check if there already exists such vertex.
InsertEdge: This procedure is linear. We locate the head vertex, as well as go to the end of the link of this vertex.
DeleteEdge: $O(V)$. We locate the head vertex1 as well as find the position of the given vertex2 in the link of vertex1.
DeleteVertex: $O(V+E)$. We need to check each vertices in the graph to see if there is an edge from this vertex and the given vertex. If there is, we delete the edge. Outer loop is going through every vertices, inner loop is finding the possible position of the deleted vertex in the linked list of the outer loop vertex.
BFS: $O(E)$. Each time we pop one vertex from the queue. We will pop $V$ times. For each vertex, we loop through all points that are pointing from it, which is the number of edges linked from the vertex. Therefore, we encounter $E$ edges in total.

comments:
The DeleteVertex procedure takes a bunch of time since the graph may be a dense graph, where the number of edges may be up to $V^2$.
This is because we scatter the edges that point to a certain node into a linked list of each head node.
How to improve the time complexity?
For linear time complexity functions that need to scan the link to find the given vertices, we can replace linked list with a binary search tree. We can implement this using a STL::set for each vertex. The time complexity can be reduced to $O(log V)$ to locate the tail vertex.
For linear time complexity functions that need to locate the indices of a given head vertex in the array, we can also replace this array with a STL::set, which is based on a binary tree, as stated above.
Therefore, the time complexity of DeleteVertex can be reduced to O(V log V), but this resulted graph is not a classical linked list and the code will be very long and complicated.

matrix

$O(V^2)$ to store all relationship between every two vertices.
If the number of vertices reach $10^5$, the space requirement is too large for a program.
Therefore, adjacent matrix is better used in small dense graph.

list

$O(E)$ to store all edges.
It is quite efficient.

Source Code

adjacent matrix version

#include<bits/stdc++.h>
using namespace std;
#define MAX_VEX 30
#define INF 0x3f3f3f3f
typedef string VexType;
typedef int ArcValType;
typedef enum{DG,AG,WDG,WAG} GraphKind;

int iserror;  // if we encounter an error by now

struct ArcType{   // arc
    VexType vex1,vex2;
    ArcValType ArcVal;
};

typedef struct{
    GraphKind kind;  // type of the graph
    int maxnum,vexnum; // current number of vertices
    // max index of all vertices
    VexType vexs[MAX_VEX];  // array saving vertices
    int adj[MAX_VEX][MAX_VEX];
}Graph;

Graph Create(){
    Graph g;
    g.kind=AG;
    g.maxnum=g.vexnum=0;
    return g;
}

int LocateVertex(Graph g,VexType p){
    for(int i=0;i<g.maxnum;++i) if(g.vexs[i]==p) return i;
    return -1;
}

Graph InsertVertex(Graph g,VexType v){
    if(g.maxnum>=MAX_VEX)
        {puts("Vertex Overflow!");iserror=1;return g;}
    if(LocateVertex(g,v)!=-1)
        {puts("Vertex has existed!");iserror=1;return g;}
    int k = g.maxnum;
    g.vexs[g.maxnum++]=v;
    ++g.vexnum;
    if(g.kind==DG||g.kind==AG)
        for(int i=0;i<g.maxnum;++i)
            g.adj[i][k] = g.adj[k][i] = 0;
    else
        for(int i=0;i<g.maxnum;++i)
            g.adj[i][k] = g.adj[k][i] = INF;
    return g;
}

Graph InsertEdge(Graph g,ArcType arc){
    int i = LocateVertex(g,arc.vex1),j = LocateVertex(g,arc.vex2);
    if(i==-1||j==-1)
        {puts("Arc's Vertex does not existed!");iserror=1;return g;}
    if(g.kind==DG||g.kind==WDG)
        g.adj[i][j]=arc.ArcVal;
    else g.adj[i][j]=g.adj[j][i]=arc.ArcVal;
    return g;
}

Graph DeleteVertex(Graph g,VexType v){
    int i = LocateVertex(g,v);
    if(i==-1)
        {puts("The Vertex does not existed!");iserror=1;return g;}
    if(g.kind==DG||g.kind==AG)
        for(int j=0;j<g.maxnum;++j)
            g.adj[i][j] = g.adj[j][i] = 0;
    else
        for(int j=0;j<g.maxnum;++j)
            g.adj[i][j] = g.adj[j][i] = INF;
    g.vexs[i]="NULL";
    --g.vexnum;  // the actual number of vertex decrease.
    // but the max index (maxnum) of vertex does not decrease!
    return g;
}

Graph DeleteEdge(Graph g,ArcType arc){
    int i = LocateVertex(g,arc.vex1),j = LocateVertex(g,arc.vex2);
    if(i==-1||j==-1)
        {puts("Arc's Vertex does not existed!");iserror=1;return g;}
    if(g.kind==DG) g.adj[i][j] = 0;
    if(g.kind==WDG) g.adj[i][j] = INF;
    if(g.kind==AG) g.adj[i][j] = g.adj[j][i] = 0;
    if(g.kind==WAG) g.adj[i][j] = g.adj[j][i] = INF;
    return g;
}

bool IsEmpty(Graph g){
    return g.vexnum?1:0;
}

bool visited[MAX_VEX]; // for BFS

struct Queue{
    int elem[MAX_VEX];
    int front, rear;
};

void Visit(Graph g,int i){
    cout<<g.vexs[i]<<" ";
}

void BFS(Graph g){
    Queue q;
    memset(visited,0,sizeof(visited));
    q.front=q.rear=0;
    for(int i=0;i<g.maxnum;++i)
        if(g.vexs[i]=="NULL") visited[i]=1;
    for(int i=0;i<g.maxnum;++i){
        if(!visited[i]){
            visited[i] = 1;
            Visit(g,i);
            q.elem[++q.rear]=i;
            while(q.front!=q.rear){
                int w=q.elem[++q.front];
                for(int j=0;j<g.maxnum;++j){
                    // loop through all vertices in the graph
                    if((g.kind==DG||g.kind==AG)&&g.adj[w][j]&&!visited[j]){
                        visited[j]=1;
                        Visit(g,j);
                        q.elem[++q.rear]=j;
                    }
                    if((g.kind==WDG||g.kind==WAG)&&g.adj[w][j]!=INF&&!visited[j]){
                        visited[j]=1;
                        Visit(g,j);
                        q.elem[++q.rear]=j;
                    }
                }
            }
        }
    }
    putchar(10);
}

int main(){
    string s;
    // c++ method for scanning a whole line
    while(getline(cin,s)){
        if(s=="0") return 0;
        iserror=0;
        Graph cur = Create();
        string delimiter = " ";
        size_t pos = 0;
        string token,token2;
        s+=" ";  // use space to separate each string
        while((pos = s.find(delimiter)) != string::npos) {
            token = s.substr(0,pos);
            s.erase(0,pos + delimiter.length());
            if(!iserror) cur = InsertVertex(cur,token);
        }
        getline(cin,s);
        s+=" ";
        if(s.empty()) {puts("This is an empty graph!");continue;}
        pos = 0;
        while((pos = s.find(delimiter)) != string::npos) {
            token = s.substr(0,pos);
            s.erase(0,pos + delimiter.length());
            pos = s.find(delimiter);
            token2 = s.substr(0,pos);
            s.erase(0,pos + delimiter.length());
            ArcType arc={token,token2,1};
            if(!iserror) cur = InsertEdge(cur,arc);  //   undirected graph
        }
        if(!iserror) BFS(cur);
        /*
        //    ----------------------------------- test ---------------------------------
            ArcType arc;
            arc.vex1="A",arc.vex2="B",arc.ArcVal=1;
            cur=DeleteEdge(cur,arc);
            BFS(cur);
            cur=DeleteVertex(cur,"D");
            BFS(cur);
        */
    }
}

linked list version

#include<bits/stdc++.h>
using namespace std;
#define MAX_VEX 50
#define ERROR -1
#define OK 1
typedef struct _node *node_ptr;
typedef string VexType;

typedef struct _node{
    VexType vertex;
    node_ptr link;
}node;

typedef struct{
    int vexnum;
    node AdjList[MAX_VEX];  // array of vertex heads and their links
}graph;

typedef graph *Graph;

Graph Create(){
    Graph g = new graph;
    g->vexnum=0;
    return g;
}

int LocateVertexHead(Graph g,VexType p){
    for(int i=0;i<g->vexnum;++i)
        if(g->AdjList[i].vertex==p) return i;
    return -1;
}

// get the position of the previous node of the vertex
node_ptr LocateVertexLinkPre(Graph g,int head,VexType p){
    node_ptr cur=&g->AdjList[head];
    while(cur->link!=NULL&&cur->link->vertex!=p)
        cur=cur->link;
    if(cur->link==NULL) cur=NULL;
    return cur;
}

int InsertVertex(Graph g,VexType v){
    if(g->vexnum>=MAX_VEX)
        {puts("Vertex Overflow!");return ERROR;}
    if(LocateVertexHead(g,v)!=-1)
        {puts("Vertex has existed!");return ERROR;}
    g->AdjList[g->vexnum].link=NULL;
    g->AdjList[g->vexnum++].vertex=v;
    return OK;
}

int InsertEdge(Graph g,VexType v1,VexType v2){
    int i = LocateVertexHead(g,v1);
    if(i==-1)
        {puts("Arc's Vertex does not existed!");return ERROR;}
    node_ptr cur=&g->AdjList[i];
    while(cur->link!=NULL&&cur->link->vertex!=v2) cur=cur->link;
    if(cur->link!=NULL)
        {puts("This edge already exists!");return ERROR;}
    cur->link=new node;
    cur->link->link = NULL,cur->link->vertex=v2;
    return OK;
}

// recursion
void Destroy(node_ptr cur){
    if(cur==NULL) return;
    Destroy(cur->link);
    delete cur;
}

int DeleteEdge(Graph g,VexType v1,VexType v2){
    int i = LocateVertexHead(g,v1);
    if(i==-1)
        {puts("This vertex is not in the graph!");return ERROR;}
    node_ptr pre = LocateVertexLinkPre(g,i,v2);
    if(pre==NULL)
        {puts("This edge is not in the graph!");return ERROR;}
    node_ptr cur = pre->link;
    pre->link = cur->link;
    free(cur);
    return OK;
}

int DeleteVertex(Graph g,VexType v){
    int i = LocateVertexHead(g,v);
    if(i==-1)
        {puts("This vertex is not in the graph!");return ERROR;}
    Destroy(g->AdjList[i].link);
    for(int j=i+1;j<g->vexnum;++j)
        g->AdjList[j-1]=g->AdjList[j];
    --g->vexnum;
    // loop through all vertices in the graph
    for(int j=0;j<g->vexnum;++j){
        // find if the v is pointed by this vertex
        node_ptr pre = LocateVertexLinkPre(g,j,v);
        if(pre!=NULL){
            node_ptr cur = pre->link;
            pre->link = cur->link;
            delete cur;
        }
    }
    return OK;
}

bool IsEmpty(Graph g){
    return g->vexnum?1:0;
}

int visited[MAX_VEX];

struct Queue{
    int elem[MAX_VEX];
    int front, rear;
};

void Visit(node cur){
    cout<<cur.vertex<<" ";
}

void BFS(Graph g){
    Queue q;
    memset(visited,0,sizeof(visited));
    q.front=q.rear=0;
    for(int i=0;i<g->vexnum;++i){
        if(!visited[i]){
            visited[i] = 1;
            Visit(g->AdjList[i]);
            q.elem[++q.rear]=i;
            while(q.front!=q.rear){
                int w=q.elem[++q.front];
                node_ptr cur = g->AdjList[w].link;
                // loop through the vertex's link
                while(cur!=NULL){
                    int j=LocateVertexHead(g,cur->vertex);
                    if(!visited[j]){
                        visited[j]=1;
                        Visit(*cur);
                        q.elem[++q.rear]=j;
                    }
                    cur=cur->link;
                }
            }
        }
    }
    putchar(10);
}

void DestroyGraph(Graph g){
    // destroy every links of heads
    for(int i=0;i<g->vexnum;++i)
        Destroy(g->AdjList[i].link);
    delete g;
}

int main(){
    string s;
    int iserror;
    while(getline(cin,s)){
        if(s=="0") return 0;
        iserror = 1;
        Graph cur = Create();
        string delimiter = " ";
        size_t pos = 0;
        string token,token2;
        s+=" ";
        while((pos = s.find(delimiter)) != string::npos) {
            token = s.substr(0,pos);
            s.erase(0,pos + delimiter.length());
            if(iserror!=-1) iserror = InsertVertex(cur,token);
        }
        getline(cin,s);
        s+=" ";
        if(s.empty()) {puts("This is an empty graph!");continue;}
        pos = 0;
        while((pos = s.find(delimiter)) != string::npos) {
            token = s.substr(0,pos);
            s.erase(0,pos + delimiter.length());
            pos = s.find(delimiter);
            token2 = s.substr(0,pos);
            s.erase(0,pos + delimiter.length());
            //   undirected graph
            if(iserror!=-1) iserror = InsertEdge(cur,token,token2);
            if(iserror!=-1) iserror = InsertEdge(cur,token2,token);
        }
        if(iserror!=-1) BFS(cur);
        /*
           // ----------------------------------- test ---------------------------------
            DeleteVertex(cur,"C");
            BFS(cur);
            DeleteEdge(cur,"A","B");
            BFS(cur);
        */
        DestroyGraph(cur);
    }
}

#Algorithms

Data Structure

Lab1 Recursion

Subject1. Solve the hanoi problem with recursion

Introduction

Algorithms Specification

Test results

Analysis and Comments

Source code

Subject2. The problem of finding the paths and the number of paths

Introduction

Algorithms Specification

Test results

Analysis and Comments

Source code

Lab2 List

Subject1. Sequential list

Introduction

Algorithms Specification

Test Results

Analysis and Comments

Source Code

Subject2. Linked list

Introduction

Algorithms Specification

Test Results

Analysis and Comments

Source Code

Subject3. Polynomial (optional)

Introduction

Algorithms Specification

Test Results

Analysis and Comments

Source Code

Lab3 Binary Tree

Subject1. The creation and traversal of binary trees

Introduction

Algorithms Specification

Test results

Analysis and Comments

Source code

Subject2. Application of BST/AVL tree.(Optional)

Introduction

Algorithms Specification

Test results

Analysis and Comments

Source code

Lab4 Sorting

Introduction

Subject1. Basic sorting algorithm & binary insertion sort

Algorithms Specification

Test Results

Analysis and Comments

Source Code

Subject2. One ACM problem (optional):

Problem Statement

Test Results

Analysis and Comments

Source Code

Lab5 Basic Operations on Graph

Experiment Objectives

Experiment Contents

Algorithms Specification

Test Results

Analysis and Comments

time complexity

Source Code

Comments

Your browser is out-of-date!