<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>2227-1899</journal-id>
<journal-title><![CDATA[Revista Cubana de Ciencias Informáticas]]></journal-title>
<abbrev-journal-title><![CDATA[Rev cuba cienc informat]]></abbrev-journal-title>
<issn>2227-1899</issn>
<publisher>
<publisher-name><![CDATA[Editorial Ediciones Futuro]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S2227-18992017000100004</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Adapting a Reinforcement Learning Approach for the Flow Shop Environment with sequence-dependent setup time]]></article-title>
<article-title xml:lang="es"><![CDATA[Adaptación de un Algoritmo del Aprendizaje Reforzado para el Flow Shop con tiempos de configuración]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Fonseca-Reyna]]></surname>
<given-names><![CDATA[Yunior César]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Martínez-Jiménez]]></surname>
<given-names><![CDATA[Yailen]]></given-names>
</name>
<xref ref-type="aff" rid="A02"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[,Universidad de Granma Departamento de Informática ]]></institution>
<addr-line><![CDATA[Bayamo Granma]]></addr-line>
<country>Cuba</country>
</aff>
<aff id="A02">
<institution><![CDATA[,Universidad Central de las Villas Departamento de Ciencia de la Computación ]]></institution>
<addr-line><![CDATA[Santa Clara Villa Clara]]></addr-line>
<country>Cuba</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>03</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>03</month>
<year>2017</year>
</pub-date>
<volume>11</volume>
<numero>1</numero>
<fpage>41</fpage>
<lpage>57</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_arttext&amp;pid=S2227-18992017000100004&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_abstract&amp;pid=S2227-18992017000100004&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_pdf&amp;pid=S2227-18992017000100004&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[ABSTRACT The task scheduling problem on linear production systems, known as the Flow Shop Scheduling Problem, is of great importance in operations research, which seeks to establish optimal job schedules on the machines of an industrial production process. The problem considered here is to find a permutation of jobs to be processed sequentially on a number of machines, under the restriction that the processing of each job has to be continuous, with the objective of minimizing the completion time of all jobs, known in the literature as makespan or Cmax. Furthermore, setup times between consecutive jobs and initial preparation times of the machines are considered. This problem is NP-hard; it is typical of combinatorial optimization and can be found in manufacturing environments where conventional machine tools process different types of pieces that share the same route. This paper presents an adaptation of the Reinforcement Learning algorithm known as Q-Learning to solve problems of the Flow Shop category. This algorithm is based on learning an action-value function that gives the expected utility of taking a given action in a given state, where an agent is associated with each of the resources. Finally, the algorithm is tested on problems of different levels of complexity to assess the quality of the solutions it obtains.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[RESUMEN El Flow Shop Scheduling es un problema de optimización que se presenta con frecuencia en sistemas de producción convencionales automatizados. Este es un problema común donde está involucrada la toma de decisiones con respecto a la mejor asignación de recursos a procesos de información en los cuales se tienen restricciones de temporalidad. Este problema es típico de la optimización combinatoria y se presenta en talleres con tecnología de maquinado donde existen máquinas-herramientas convencionales y se fabrican diferentes tipos de piezas que tienen en común una misma ruta. En este artículo se presenta una adaptación de un enfoque del Aprendizaje Reforzado conocido en la literatura como Q-Learning para resolver problemas de scheduling de tipo Flow Shop con tiempos de configuración entre trabajos y tiempos iniciales de preparación de las máquinas, teniendo como objetivo minimizar el tiempo de finalización de todos los trabajos, conocido en la literatura como makespan o Cmax. Por último, se presentan casos de pruebas para comprobar la validez de dicha adaptación de este algoritmo al problema de secuenciación de tareas.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Flow-shop]]></kwd>
<kwd lng="en"><![CDATA[makespan]]></kwd>
<kwd lng="en"><![CDATA[optimization]]></kwd>
<kwd lng="en"><![CDATA[q-learning]]></kwd>
<kwd lng="en"><![CDATA[scheduling]]></kwd>
<kwd lng="es"><![CDATA[Aprendizaje reforzado]]></kwd>
<kwd lng="es"><![CDATA[flow-shop]]></kwd>
<kwd lng="es"><![CDATA[makespan]]></kwd>
<kwd lng="es"><![CDATA[optimización]]></kwd>
<kwd lng="es"><![CDATA[secuenciación]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[ <p align="right"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><B>ART&Iacute;CULO  ORIGINAL</B></font></p>     <p>&nbsp;</p>     <p><font size="4"><strong><font face="Verdana, Arial, Helvetica, sans-serif">Adapting a Reinforcement  Learning Approach for the Flow Shop Environment with sequence-dependent setup time</font></strong></font></p>     <p>&nbsp;</p>     <p><font size="3"><strong><font face="Verdana, Arial, Helvetica, sans-serif">Adaptaci&oacute;n de un Algoritmo del Aprendizaje Reforzado para  el Flow Shop con tiempos de configuraci&oacute;n</font></strong></font></p>     <p>&nbsp;</p>     <p>&nbsp;</p>     <P><font size="2"><strong><font face="Verdana, Arial, Helvetica, sans-serif">Yunior C&eacute;sar Fonseca-Reyna<strong><sup>1*</sup></strong>, </font></strong><font face="Verdana, Arial, Helvetica, sans-serif"><strong>Yailen Mart&iacute;nez-Jim&eacute;nez<sup>2</sup></strong></font></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><sup>1 </sup>Departamento de Inform&aacute;tica,  Universidad de Granma, Km 18 &frac12; Carretera Manzanillo, Bayamo, Granma, Cuba, fonseca@udg.co.cu </font> <font size="2" face="Verdana, Arial, Helvetica, sans-serif">    <br>     <sup>2 </sup>Departamento de Ciencia de la  Computaci&oacute;n, Universidad Central de las Villas, Carretera a Camajuan&iacute; Km 5 &frac12;,  Santa Clara, Villa Clara, Cuba, e-mail: yailenm@uclv.edu.cu         ]]></body>
<body><![CDATA[<br>       </font></p>     <P><font face="Verdana, Arial, Helvetica, sans-serif"><span class="class"><font size="2">*Corresponding author: </font></span></font><font size="2" face="Verdana, Arial, Helvetica, sans-serif"> <a href="mailto:fonseca@udg.co.cu">fonseca@udg.co.cu</a></font>     <p>&nbsp;</p>     <p>&nbsp;</p> <hr>     <P><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>ABSTRACT</b></font>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The task scheduling problem on linear production systems, known as the Flow Shop Scheduling Problem, is of great importance in operations research, which seeks to establish optimal job schedules on the machines of an industrial production process. The problem considered here is to find a permutation of jobs to be processed sequentially on a number of machines, under the restriction that the processing of each job has to be continuous, with the objective of minimizing the completion time of all jobs, known in the literature as makespan or Cmax. Furthermore, setup times between consecutive jobs and initial preparation times of the machines are considered. This problem is NP-hard; it is typical of combinatorial optimization and can be found in manufacturing environments where conventional machine tools process different types of pieces that share the same route. This paper presents an adaptation of the Reinforcement Learning algorithm known as Q-Learning to solve problems of the Flow Shop category. This algorithm is based on learning an action-value function that gives the expected utility of taking a given action in a given state, where an agent is associated with each of the resources. 
Finally, the algorithm is tested on problems of different levels of complexity to assess the quality of the solutions it obtains.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>Key words:</b> Flow-shop, makespan, optimization, Q-learning, scheduling.</font></p> <hr>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>RESUMEN</b></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">El <em>Flow Shop Scheduling</em> es un problema de optimizaci&oacute;n que se presenta  con frecuencia en sistemas de producci&oacute;n convencionales automatizados. Este es  un problema com&uacute;n donde est&aacute; involucrada la toma de decisiones con respecto a  la mejor asignaci&oacute;n de recursos a procesos de informaci&oacute;n en los cuales se  tienen restricciones de temporalidad. Este problema es t&iacute;pico de la  optimizaci&oacute;n combinatoria y se presenta en talleres con tecnolog&iacute;a de maquinado  donde existen m&aacute;quinas-herramientas convencionales y se fabrican diferentes  tipos de piezas que tienen en com&uacute;n una misma ruta. En este art&iacute;culo se  presenta una adaptaci&oacute;n de un enfoque del Aprendizaje Reforzado conocido en la  literatura como &nbsp;Q-Learning para resolver  problemas de scheduling de tipo <em>Flow Shop</em> con tiempos de configuraci&oacute;n entre trabajos y tiempos iniciales de preparaci&oacute;n  de las m&aacute;quinas, teniendo como objetivo minimizar el tiempo de finalizaci&oacute;n de  todos los trabajos, conocido en la literatura como <em>makespan</em> o Cmax. 
Por &uacute;ltimo, se presentan casos de pruebas para  comprobar la validez de dicha adaptaci&oacute;n de este algoritmo al problema de  secuenciaci&oacute;n de tareas.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>Palabras clave:</b></font> <font size="2" face="Verdana, Arial, Helvetica, sans-serif">Aprendizaje reforzado, flow-shop,  makespan, optimizaci&oacute;n, secuenciaci&oacute;n.</font></p> <hr>     ]]></body>
<body><![CDATA[<p>&nbsp;</p>     <p>&nbsp;</p>     <p><font size="3" face="Verdana, Arial, Helvetica, sans-serif"><b>INTRODUCTION</b></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Scheduling is a very active field with high practical relevance. Manufacturing environments have long been known to require distributed solution approaches in order to find high-quality solutions, because of their intrinsic complexity and, possibly, an inherent distribution of the tasks involved (Akhshabi and Khalatbari, 2011; Wu, et al., 2005). Scheduling is a decision-making process that is used on a regular basis in every situation where a specific set of tasks has to be performed on a specific set of resources. Practical machine scheduling problems are numerous and varied. They arise in diverse areas such as flexible manufacturing systems, production planning, computer design, logistics and communication, where the schedule construction process plays an important role because it can have a major impact on the productivity of the company. A scheduling problem consists of finding sequences of jobs on given machines with the objective of minimising some function of the job completion times (Pinedo, 2008; &Scaron;eda, 2007). Manufacturing scheduling is defined as an optimization process that allocates limited manufacturing resources over time among parallel and sequential manufacturing activities. This allocation must obey a set of constraints that reflect the temporal relationships between activities and the capacity limitations of a set of shared resources.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Scheduling problems can be classified according to different characteristics, for example, the number of machines (one machine, parallel machines), the job characteristics (preemption allowed or not, equal processing times) and so on. 
When each job has a fixed number of operations requiring different machines, we are dealing with a shop problem which, depending on its constraints, can be classified as Open Shop, Job Shop, Flow Shop, etc. (Brucker, 2007; Doulabi, et al., 2010; Seido Naganoa, et al., 2012).</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In this research we focus on manufacturing scheduling where all jobs share the same route, specifically the Flow Shop Scheduling Problem (FSSP), which has been extensively studied due to its application in industry. This problem is typical of combinatorial optimization and can be found in manufacturing environments where conventional machine tools process different types of pieces that share the same route.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The scheduling literature abounds with solution procedures for the general flow shop scheduling problem that develop permutation schedules to minimize the makespan or other criteria. Ruiz and Moroto (Ruiz and Moroto, 2005), and Mehmet and Betul (Mehmet and Betul, 2014) have presented an extensive review and evaluation of many exact methods, approximation methods, heuristics and meta-heuristics for the flow shop scheduling problem with the makespan criterion.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Due to the NP-hard nature of the problem (Anc&acirc;u, 2012; &#268;i&#269;kov&aacute; and &Scaron;tevo, 2010; Garey, et al., 1976), most of the solution procedures employ heuristic approaches to obtain near-optimal sequences in reasonable time. Many methods approximate the optimal solution by searching only a part of the space of feasible solutions (represented here by all permutations). For complex combinatorial problems, stochastic heuristic techniques are frequently used. 
</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In 1954 Johnson presented an algorithm that yields an optimal sequence for the n-job, 2-machine problem (Johnson, 1954). Researchers have tried to extend this well-known result to obtain polynomial time algorithms for more general cases (Betul and Mehmet Mutlu, 2008; Kubiak, et al., 2002; Li, et al., 2011; Tavares-Neto and Godinho-Filho, 2011). Other authors proposed mathematical models for flow shop scheduling based on mixed integer programming (Ramezanian, et al., 2010; &Scaron;eda, 2007).</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Anc&acirc;u (Anc&acirc;u, 2012) proposed two variants of heuristic algorithms to solve the classic FSSP. Both algorithms are simple and very efficient. The first algorithm is a constructive heuristic based on <em>&alpha;</em>-greedy selection, while the second is a modified version of the first, based on an iterative stochastic start. The numerical results place the proposed algorithms among the best known heuristic algorithms in the field.</font></p>     ]]></body>
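<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">As an illustration of the 2-machine result cited above, Johnson's rule can be sketched in a few lines of Python. This is only a minimal sketch of the textbook rule, not code from this paper: the function names <em>johnson_order</em> and <em>two_machine_makespan</em> and the sample data are hypothetical, and setup times and machine preparation times are ignored here.</font></p>

```python
def johnson_order(times):
    """Return a job order for the 2-machine flow shop using Johnson's rule.

    times[i] = (a_i, b_i): processing times of job i on machines 1 and 2.
    """
    # Jobs whose machine-1 time does not exceed their machine-2 time
    # are scheduled first, by increasing machine-1 time.
    front = sorted(
        (i for i, (a, b) in enumerate(times) if a <= b),
        key=lambda i: times[i][0],
    )
    # The remaining jobs are scheduled last, by decreasing machine-2 time.
    back = sorted(
        (i for i, (a, b) in enumerate(times) if a > b),
        key=lambda i: times[i][1],
        reverse=True,
    )
    return front + back


def two_machine_makespan(order, times):
    """Makespan of a permutation on two machines (no setup times)."""
    t1 = t2 = 0
    for i in order:
        a, b = times[i]
        t1 += a                 # machine 1 processes jobs back to back
        t2 = max(t2, t1) + b    # machine 2 waits for the job and for itself
    return t2


# Hypothetical instance with four jobs.
sample_times = [(3, 2), (1, 4), (5, 5), (2, 1)]
sample_order = johnson_order(sample_times)
sample_cmax = two_machine_makespan(sample_order, sample_times)
```

<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">For this sample instance the rule produces the order [1, 2, 0, 3] with a makespan of 14. The rule does not carry over directly to more than two machines, which is why the heuristics reviewed below are needed for the general case.</font></p>
]]></body>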
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Framinan et al. (Framinan, et al., 2002) proposed two heuristics based on the NEH heuristic (Nawaz, et al., 1983) for the <em>m</em>-machine FSSP to minimize makespan and flowtime. The proposed heuristics were evaluated and found to be better than existing heuristics.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The branch-and-bound (B&amp;B) technique can find optimal solutions, but at a very high computational cost, and therefore it cannot handle very large problems. It can be used to find optimal solutions for small flow shop problems. Some authors have applied B&amp;B; for example, Peter Brucker, in his PhD thesis, presented a method based on branch-and-bound techniques to solve general scheduling problems, which finds feasible solutions to the FSSP (Brucker, 2007). </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Nagar et al. (Nagar, et al., 1995) proposed a B&amp;B procedure for the 2-machine flow shop problem to minimize a weighted sum of flow time and makespan. They also presented a greedy algorithm that provides the upper bound for the B&amp;B algorithm. The B&amp;B method can be used as a preceding algorithm to a heuristic in order to obtain an initial solution.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Sayin and Karabati (Say&#305;n and Karabat&#305;, 1999) presented a B&amp;B algorithm for a 2-machine flow shop with makespan and flowtime objectives. The algorithm obtained all of the efficient solutions to the problem.    <br>   Parviz et al. (Parviz, et al., 2014) demonstrated the efficiency of the B&amp;B methodology. The objective considered is to minimize the completion time of all products (makespan). In this research, some lower and upper bounds are developed to increase the efficiency of the proposed algorithm. 
</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In recent years, metaheuristic approaches such as simulated annealing (SA), tabu search (TS) and genetic algorithms (GA) have become very attractive for solving combinatorial optimization problems because of their computational performance. Recent studies on the flow shop scheduling problem frequently propose solution methods based on metaheuristics. </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Takeshi Yamada (Yamada, 2003) applied GA, SA and TS to the job shop scheduling problem (and the flow shop scheduling problem as its special case), which is among the hardest combinatorial optimization problems. The author showed that the research in that dissertation helps to advance the understanding of this significant field. </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Ling Wang et al. (Ling Wang, et al., 2006) proposed a hybrid genetic algorithm (HGA) for permutation flow shop scheduling with limited buffers, in which multiple genetic operators based on an evolutionary mechanism are used simultaneously and a neighborhood structure based on a graph model is employed to enhance the local search. The results obtained were compared with SA and TS results and demonstrated the effectiveness of the HGA.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Varadharajan and Rajendran (Varadharajan and Rajendran, 2005) presented a simulated annealing algorithm for the <em>m</em>-machine flow shop problem with the objectives of minimizing makespan and total flowtime. 
Two variants of the proposed simulated annealing algorithm, with different parameter settings, were shown to outperform four previous multi-objective flow shop scheduling algorithms.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Nagar et al. (Nagar, et al., 1995) combined the B&amp;B procedure with a GA to find approximate solutions to the objective function made of the weighted sum of average flowtime and makespan for the 2-machine problem.</font></p>     ]]></body>
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Other researchers have applied these metaheuristics and obtained good solutions (Akhshabi and Khalatbari, 2011; &Aacute;lvarez, et al., 2008; Chaudhry and Munem khan, 2012; Fonseca, et al., 2014; Ling Wang, et al., 2006; Reeves, 1995; Sadegheih, 2006; Y. Zhang, et al., 2009). </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The Ant Colony Optimization (ACO) approach has also been used to solve combinatorial optimization problems. Xiangyong Li et al. (Li, et al., 2011) compared three different mathematical formulations and proposed an ACO-based metaheuristic to solve the flow shop scheduling problem, showing that this metaheuristic is computationally efficient. Other authors applied this technique to minimize the makespan or other objectives in a permutation flow shop environment and tested it on well-known problems from the literature (Betul and Mehmet Mutlu, 2008, 2010; Rajendran and Ziegler, 2004; Tavares-Neto and Godinho-Filho, 2011).</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Tasgetiren et al. investigated Particle Swarm Optimization (PSO) algorithms, called PSOvns and HCPSO respectively, which found many best solutions for the first 90 Taillard benchmark instances (Taillard, 1993; Tasgetiren, et al., 2007). On the other hand, Quan-Ke et al. (Quan-Ke, et al., 2008) applied a discrete particle swarm optimization algorithm to the no-wait flow shop scheduling problem. Rahimi-Vahed and Mirghorbani (Rahimi-Vahed and SM., 2007) studied a PSO approach with the objectives of weighted mean completion time and weighted mean tardiness. The proposed multi-objective particle swarm algorithm was compared with a multi-objective genetic algorithm. 
The proposed algorithm outperformed the multi-objective genetic algorithm on some specific performance metrics.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Zhang and Xiaoping (Yi Zhang and Xiaoping, 2011) applied a hybrid Estimation of Distribution Algorithm (EDA) to permutation flow shops. This method improved 42 out of 90 current best solutions for Taillard benchmark instances.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Based on the idea of adaptive learning, Anurag Agarwal et al. (Anurag, et al., 2006) proposed an improvement-heuristic approach for the general flow shop problem. This approach employs a one-pass heuristic to give a good starting solution in the search space and uses a weight parameter to perturb the data of the original problem to obtain improved solutions. This algorithm obtained good solutions for several benchmark problem sets.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">All the previous approaches focus on optimization problems that are actually a very simplified version of reality. The exclusion of real-world constraints limits the applicability of those methods. Industry needs systems for optimized production scheduling that adjust to the conditions in the production plant and generate good solutions in a short time. In this paper we tackle a flow shop scheduling problem with sequence-dependent setup times and initial preparation times of the machines with the criterion of total completion time minimization.  
This criterion is more realistic than the more common makespan minimization, as it is known to increase productivity while reducing work-in-progress.</font></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif"><strong><font size="3">COMPUTATIONAL METHODOLOGY </font></strong></font></p>     <p><font size="2"><strong><font face="Verdana, Arial, Helvetica, sans-serif">Flow Shop Scheduling Description</font></strong></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The Flow Shop Scheduling Problem is one of the most important problems in the area of production management (&#268;i&#269;kov&aacute; and &Scaron;tevo, 2010). It can be briefly described as follows: there is a set of <strong><em>m</em></strong> machines and a set of <strong><em>n</em></strong> jobs. Each job comprises a set of <strong><em>m</em></strong> operations which must be executed on different machines. All jobs have the same processing order when passing through the machines. There are no precedence constraints among operations of different jobs. Operations cannot be interrupted and each machine can process only one operation at a time. The problem is to find the job sequences on the machines that minimize the makespan, which is the maximum completion time of all the operations. The flow shop scheduling problem is NP-complete and thus it is usually solved by approximation or heuristic methods (&Aacute;lvarez, et al., 2008; Toro, et al., 2006; Toro, et al., 2006b).</font></p>     ]]></body>
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The problem investigated in this paper is  conventionally given the notation <strong><em>n|m|p|C<sub>max</sub>&nbsp; </em></strong>(Reeves, 1995) and is defined as follows:</font></p> <ul>       <li>         <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Each job <em>i</em> can only be processed on one machine at any time.</font></p>   </li>       <li>         <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Each machine <em>j</em> can process only one job <em>i</em> at any  time.</font></p>   </li>       <li>         <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">No preemption is allowed, i.e. the processing of a job <em>i</em> on a machine <em>j</em> cannot be interrupted. </font></p>   </li>       <li>         <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">All jobs are independent and are available for  processing at time zero.</font></p>   </li>       <li>         ]]></body>
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The setup times of the jobs on the machines are considered.</font></p>   </li>       <li>         <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The machines are continuously available.</font></p>   </li>       <li>         <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The initial preparation times of the machines are considered.</font></p>   </li>     </ul>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">As mentioned, the objective is to find a permutation of jobs to be processed sequentially on a number of machines, under the restriction that the processing of each job has to be continuous, so as to minimize Cmax.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Therefore:</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Let <em>r(j)</em> be the preparation time of machine <em>j</em>, <em>p(i, j)</em> the processing time of job <em>i</em> on machine <em>j</em>, and <em>s(i, k, j)</em> the setup time between job <em>i</em> and job <em>k</em> on machine <em>j</em>. Given a job permutation {<em>J1, J2,&hellip;,Jn</em>}, the completion times <em>C(Ji, j)</em> are calculated as follows:</font></p>     <p align="center"><img src="/img/revistas/rcci/v11n1/fo0104117.jpg" alt="fo01" width="573" height="118"></p>     ]]></body>
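<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The completion-time calculation can be sketched in Python as follows. The exact expression is given in the figure above and is not reproduced here, so this sketch assumes the usual recurrence for flow shops with sequence-dependent setup times: a job starts on machine <em>j</em> as soon as it has left machine <em>j</em>-1 and the machine is ready, where the machine is ready after <em>r(j)</em> for the first job and after the previous completion plus <em>s(i, k, j)</em> for later jobs. The function name <em>makespan</em>, the list layout of <em>r</em>, <em>p</em>, <em>s</em> and the sample data are hypothetical.</font></p>

```python
def makespan(perm, r, p, s):
    """Cmax of a job permutation under an assumed standard recurrence.

    perm       : sequence of job indices (the permutation J1..Jn)
    r[j]       : initial preparation time of machine j
    p[i][j]    : processing time of job i on machine j
    s[i][k][j] : setup time on machine j when job k follows job i
    """
    m = len(r)
    prev_done = None   # completion times of the previous job on each machine
    prev_job = None
    for job in perm:
        done = [0] * m
        for j in range(m):
            if prev_done is None:
                # First job: the machine only needs its initial preparation.
                machine_ready = r[j]
            else:
                # Later jobs: previous completion plus sequence-dependent setup.
                machine_ready = prev_done[j] + s[prev_job][job][j]
            # The job must also have finished on the preceding machine.
            job_ready = done[j - 1] if j > 0 else 0
            done[j] = max(machine_ready, job_ready) + p[job][j]
        prev_done, prev_job = done, job
    return prev_done[-1] if prev_done else 0


# Hypothetical instance: 2 jobs, 2 machines.
r_demo = [1, 2]
p_demo = [[3, 2], [2, 4]]
s_demo = [
    [[0, 0], [1, 1]],   # setups after job 0
    [[2, 1], [0, 0]],   # setups after job 1
]
cmax_01 = makespan([0, 1], r_demo, p_demo, s_demo)
cmax_10 = makespan([1, 0], r_demo, p_demo, s_demo)
```

<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Under these assumptions the permutation (0, 1) gives a makespan of 11 and the permutation (1, 0) gives 10, which shows how the order of the jobs changes the objective value through the setup times.</font></p>
]]></body>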
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In other words, <strong><em>Cmax</em></strong> is the completion time of the last operation on the last machine (R&iacute;os-Mercado, 1999, 2001).</font></p>     <p><font size="2"><strong><font face="Verdana, Arial, Helvetica, sans-serif">Reinforcement Learning and Multi-Agent Systems</font></strong></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The ideas involved in Reinforcement Learning (RL) were originally developed by Sutton and Barto (Sutton and Barto, 1998) and applied to topics of interest to researchers in Artificial Intelligence. RL is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal. In the standard RL model, an agent is connected to its environment via perception and action, as depicted in Figure 1. In each interaction step, the agent perceives the current state <strong><em>s</em></strong> of its environment, and then selects an action <strong><em>a </em></strong>to change this state. This transition generates a reinforcement signal <strong><em>r</em></strong>, which is received by the agent. The task of the agent is to learn a policy for choosing actions in each state so as to receive the maximal long-run cumulative reward. RL methods explore the environment over time to come up with a desired policy (Mart&iacute;nez, 2012).</font></p>     <p align="center"><img src="/img/revistas/rcci/v11n1/f0104117.jpg" alt="f01" width="300" height="206"><a name="f01"></a></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">A typical kind of environment is one that possesses the Markov property. In such an environment, what will happen in the future depends only on the current state of the environment and the action taken. 
Most reinforcement learning researchers have focused on learning in this type of environment, coming up with a number of important reinforcement learning methods such as the Q-learning algorithm (C. Watkins, 1989; C. Watkins and Dayan, 1992). </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">One of the challenges that arise in reinforcement learning and not in other kinds of learning is the trade-off between exploration and exploitation. To obtain a high reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task. The agent must try a variety of actions and progressively favor those that appear to be best. On a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward. Proper control of the trade-off between exploration and exploitation is important in order to construct an efficient learning method. </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Formally, the basic reinforcement learning model consists of: </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">-&nbsp; a set of environment states S; </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">-&nbsp; a set of actions A;</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">-&nbsp; a set of scalar &quot;rewards&quot; in <img src="/img/revistas/rcci/v11n1/fo0204117.jpg" alt="fo02" width="13" height="13">;</font></p>     ]]></body>
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">- a transition function <em>T</em>.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">At each time <em>t</em>, the agent perceives its state <img src="/img/revistas/rcci/v11n1/fo0304117.jpg" alt="fo03" width="41" height="16"> and the set of possible actions A(s<sub>t</sub>). It chooses an action <img src="/img/revistas/rcci/v11n1/fo0404117.jpg" alt="fo04" width="62" height="18"> and receives from the environment the new state s<sub>t+1</sub> and a reward r<sub>t+1</sub>. This means that the agent implements a mapping from states to probabilities of selecting each possible action. This mapping is called the agent's policy and is denoted &pi;<sub>t</sub>, where &pi;<sub>t</sub>(<em>s, a</em>) is the probability that a<sub>t</sub> = <em>a</em> if s<sub>t</sub> = <em>s</em>; in words, it is the probability of selecting action <em>a</em> in state <em>s</em> at time <em>t</em>.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The reward function defines the goal in an RL problem. Roughly speaking, it maps each perceived state (or state-action pair) of the environment to a single number, a reward, indicating the intrinsic desirability of that state. An RL agent's sole objective is to maximize the total reward it receives in the long run. The reward function defines what the good and bad events are for the agent. Besides RL, intelligent agents can be designed by other paradigms, notably planning and supervised learning, but there are some differences between these approaches. In general, planning methods require an explicit model of the state transition &delta;(s, a). Given such a model, a planning algorithm can search through the state-action space to find an action sequence that will guide the agent from an initial state to a goal state. Since planning algorithms operate using a model of the environment, they can backtrack or &ldquo;undo&rdquo; state transitions that enter undesirable states. 
In contrast, RL is intended to apply to situations in which a sufficiently tractable action model does not exist. Consequently, an agent in the RL paradigm must actively explore its environment to observe the effects of its actions. Unlike planning agents, RL agents normally cannot undo state transitions. Of course, in some cases it may be possible to build up an action model through experience (Sutton and Barto, 1998), enabling more planning as experience accumulates. </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">There are therefore two basic approaches: </font></p> <ul>       <li>         <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Model-based approach: learn the model, and use it to derive the optimal policy. </font></p>   </li>       <li>         <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Model-free approach: derive the optimal policy without learning the model. </font></p>   </li>     </ul>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Agents can also be trained through supervised learning. In supervised learning, the agent is presented with examples of state-action pairs, along with an indication that the action was either correct or incorrect. The goal in supervised learning is to induce a general policy from the training examples. Thus, supervised learning requires an oracle that can supply correctly labeled examples. In contrast, RL does not require prior knowledge of correct and incorrect decisions. RL can be applied to situations in which rewards are sparse; for example, rewards may be associated only with certain states. In such cases, it may be impossible to label particular decisions as correct or incorrect without reference to the agent&rsquo;s subsequent decisions, making supervised learning infeasible (Moriarty, et al., 1999). </font></p>     ]]></body>
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In summary, RL provides a flexible approach to the design of intelligent agents in situations for which, for example, planning and supervised learning are impractical. RL can be applied to problems for which significant domain knowledge is either unavailable or costly to obtain (Moriarty, et al., 1999). In this sense, some authors have applied RL approaches to solve scheduling problems. Van Vreckem et al. (Van Vreckem, et al., 2013) proposed a method based on Learning Automata to solve Hybrid Flexible Flowline Scheduling Problems (HFFSP) with additional constraints such as sequence-dependent setup times, precedence relations between jobs and machine eligibility. Experiments on a set of benchmark problems indicate that this method can yield good results. Su&aacute;rez (Su&aacute;rez, 2010) introduces an alternative for solving the Job Shop Scheduling Problem with Parallel Machines using the QL algorithm, and compares its results with those reported by other approaches. Bargaoui and Belkahla (Bargaoui and Belkahla, 2014) opted for a multi-agent architecture based on cooperative behavior combined with the Tabu Search meta-heuristic to solve the FSSP. The proposed approach was tested on different benchmark data sets, and the results demonstrate that it reaches high-quality solutions.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In this paper, the QL algorithm is first described and then applied to the solution of the <strong><em>n|m|p|Cmax </em></strong>sequencing problem. In order to validate the quality of the solutions, computational results are presented and compared with the optimum values of the test problems. 
</font></p>     <p><font size="2"><strong><font face="Verdana, Arial, Helvetica, sans-serif">Q-Learning Algorithm</font></strong></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">A well-known RL algorithm is Q-Learning (Mart&iacute;nez, 2012), which works by learning an action-value function that expresses the expected utility (i.e. cumulative reward) of taking a given action in a given state. The core of the algorithm is a simple value iteration update: each state-action pair <strong><em>(s, a)</em></strong> has an associated Q-value. When action <strong><em>a</em></strong> is selected by the agent located in state <strong><em>s</em></strong>, the Q-value for that state-action pair is updated based on the reward received when selecting that action and the best Q-value for the subsequent state. The update rule for the state-action pair <strong><em>(s, a)</em></strong> is the following:</font></p>     <p align="center"><img src="/img/revistas/rcci/v11n1/fo0504117.jpg" alt="fo05" width="399" height="27"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In this expression <strong>&alpha;<img src="/img/revistas/rcci/v11n1/fo0604117.jpg" alt="fo06" width="12" height="12"> </strong> [0, 1] is the learning rate and <strong><em>r </em></strong>the reward or penalty resulting from taking action <strong><em>a </em></strong>in state <strong><em>s</em></strong>. The learning rate <strong>&alpha; </strong>determines the degree to which the old value is updated. QL has the advantage that it is proven to converge to the optimal policy in Markov Decision Processes under some restrictions (Tsitsiklis, 1994).</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a href="/img/revistas/rcci/v11n1/fo0704117.jpg" target="_blank">Algorithm 1</a> is used by the agents to learn from experience or training. Each episode is equivalent to one training session. 
In each training session, the agent explores the environment and collects rewards until it reaches the goal state. The purpose of the training is to enhance the knowledge of the agent, represented by the Q-values. More training gives better values, which the agent can use to act in a more optimal way.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The agents need to balance exploration and exploitation. The &epsilon;-greedy action selection method instructs the agent to follow the current policy &pi; most of the time, but sometimes to choose an action at random (with equal probability for each possible action <strong><em>a</em></strong> in the current state <strong><em>s</em></strong>). The probability <strong><em>&epsilon;</em></strong> determines how often a random action is chosen; this allows some balance between exploration and exploitation.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><strong>Adapting Q-Learning to solve the FSSP</strong></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In the FSSP all the jobs have the same processing operation order when passing through the machines. This model takes the processing times of the operations as input parameters, with the objective of finding a job sequence that minimizes the idle time in the long run.</font></p>     ]]></body>
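<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">As a reference point for the adaptation, the tabular Q-Learning update and the &epsilon;-greedy rule described in the previous section can be sketched in Python as follows. This is a minimal illustration and not the Java implementation used in the experiments. The function names, the discount factor <em>gamma</em> (standard in Q-Learning, although the text names only &alpha; and <em>r</em>) and the use of negative cost as reward are assumptions made for the example.</font></p>

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Greedy choice: the action with the highest Q-value in this state.
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions, alpha, gamma):
    """Apply one tabular Q-learning step to the pair (state, action).

    gamma is an assumed discount factor; reward is assumed to be a
    negative cost, so that a lower cost gives a higher reward.
    """
    # Best Q-value reachable from the next state (0.0 if it is terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: states and jobs are plain labels.
q_table = defaultdict(float)
rng = random.Random(0)
q_update(q_table, "s0", "j1", reward=-10.0, next_state="s1",
         next_actions=["j2"], alpha=0.5, gamma=0.9)
chosen = epsilon_greedy(q_table, "s0", ["j1", "j2"], epsilon=0.0, rng=rng)
```

<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">With <em>epsilon</em> = 0 the selection is purely greedy, and with <em>epsilon</em> = 1 it is purely random; intermediate values give the balance between exploration and exploitation discussed above.</font></p>     ]]></body>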
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">To fit the QL method, it is reasonable to define states as job sequences, or more precisely job precedence relations. State changes (or actions) are defined as changes in these relations. An action step is performed by a permutation operator, which sets up a job sequence according to precedence preferences. At the beginning no preferences are given, so states are traversed randomly. As learning proceeds, preferences are updated, which in turn influences the action selection policy, converging to the quasi-optimal job sequence found. In this respect the learning algorithm is a directed search procedure.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In this research, we consider the <strong><em>n|m|p|Cmax</em></strong> problem, where we have only one agent, associated with the first resource (machine). This agent makes decisions about future actions. For this agent, taking an action means deciding which job to process next from the set of currently available jobs. When a job is selected, it is processed by all the machines. The agent can select the best job according to the associated <em>q-value</em> (exploitation), or can select one job randomly (exploration). The action selection mechanism is the <em>&epsilon;-greedy </em>strategy described in (Mart&iacute;nez, 2012).</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In our approach, we have one agent that will execute <strong><em>n </em></strong>actions (one operation from each of the <strong><em>n</em></strong> jobs). 
According to (Gabel and Riedmiller, 2007), the set of states for the agent is defined as: <img src="/img/revistas/rcci/v11n1/fo0804117.jpg" alt="fo08" width="76" height="20">. This gives rise to <img src="/img/revistas/rcci/v11n1/fo0904117.jpg" alt="fo09" width="65" height="21"> local states for every agent <strong><em>i</em></strong>; in our case <strong><em>i</em></strong> = 1, which results in an upper limit of <img src="/img/revistas/rcci/v11n1/fo1004117.jpg" alt="fo10" width="110" height="21"> possible system states if we have, for example, 6 jobs.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">There are different possible feedback signals that can be used when solving a scheduling problem (Mart&iacute;nez, 2012). We use cost as the reward signal, meaning that the lower the cost, the better the action. This is based on the idea that the makespan of a schedule is minimized if few resources with queued jobs are in the system.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The proposed <a href="#fo11">algorithm</a> is summarized as:</font></p>     <p align="center"><img src="/img/revistas/rcci/v11n1/fo1104117.jpg" alt="fo11" width="399" height="304"><a name="fo11"></a></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif"><strong><font size="3">COMPUTATIONAL RESULTS</font></strong></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">FSSP benchmark problems have been defined by several authors and are widely used by researchers in the scheduling field to test their solutions and compare them with those of other approaches. They are available online (Beasley, 1990; Taillard, 1993). However, there are no benchmark problems available for flow shop scheduling problems with sequence-dependent setup times and initial preparation times of machines. 
For this reason, all data for the computational experiments were generated randomly. To test the proposed algorithm, ten instances were used. Because the search space of the problem is <em>n!</em>, these instances were created with small dimensions so that an exhaustive search of this space could determine the optimal solutions, which were then compared with those obtained by the QL algorithm. The instance sizes are 5x3, 5x4, 5x5, 7x6, 7x7, 8x8, 9x4, 9x9, 10x8 and 10x10. We generated random numbers for the initial preparation times of the machines and the setup times between two jobs. </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">To determine the quality of our solutions, the Relative Error (RE) is defined as:</font></p>     ]]></body>
<body><![CDATA[<p align="center"><img src="/img/revistas/rcci/v11n1/fo1204117.jpg" alt="fo12" width="139" height="45"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <em>MK</em> is the best makespan obtained by our approach and <em>OP</em> is the optimum. The MRE takes into account the <em>RE</em> of all the instances.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a href="#t01">Table 1</a>, <a href="#t02">Table 2</a> and <a href="/img/revistas/rcci/v11n1/t0304117.jpg" target="_blank">Table 3</a> show the processing times, setup times and initial preparation times of the machines for the 5x5 instance. </font></p>     <p align="center"><img src="/img/revistas/rcci/v11n1/t0104117.jpg" alt="t01" width="220" height="164"><a name="t01"></a> <img src="/img/revistas/rcci/v11n1/t0204117.jpg" alt="t02" width="196" height="166"><a name="t02"></a></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">We coded the Q-Learning algorithm in Java and ran it on a PC with a 3.5 GHz Core i3 CPU and 2 GB of RAM. <a href="#f02">Figure 2</a> shows the solution for the 5x5 instance, where Cmax = 114. <a href="#t05">Table 5</a> shows the experimental results in relation to the optimal values for the instance set.</font></p>     <p align="center"><img src="/img/revistas/rcci/v11n1/f0204117.jpg" alt="f02" width="539" height="227"><a name="f02"></a></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">From <a href="#t05">Table 5</a> we can see that the proposed algorithm is able to obtain good results. It reaches the optimum for 5 of the 10 instances and slightly worse values for the other 5. The MRE over all instances was less than 0.03% with respect to the optimal values.</font></p>     <p align="center"><img src="/img/revistas/rcci/v11n1/t0504117.jpg" alt="t05" width="524" height="162"><a name="t05"></a></p>     <p>&nbsp;</p>     <p>&nbsp;</p>     ]]></body>
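<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The evaluation procedure described in this section (exhaustive search over the <em>n!</em> sequences, followed by the relative error) can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the experiments. It assumes that <em>proc[j][k]</em> is the processing time of job <em>j</em> on machine <em>k</em>, that <em>setup[k][i][j]</em> is the setup time on machine <em>k</em> when job <em>j</em> follows job <em>i</em>, that <em>prep[k]</em> is the initial preparation time of machine <em>k</em>, that a setup may start as soon as the machine is free, and that RE is the percentage gap 100 (MK - OP) / OP. All names are hypothetical.</font></p>

```python
from itertools import permutations


def makespan(seq, proc, setup, prep):
    """Completion time of the last job on the last machine for sequence seq."""
    machines = len(prep)
    free_at = [0.0] * machines  # time at which each machine finished its previous job
    prev = None
    for job in seq:
        done = 0.0  # completion time of this job on the previous machine
        for k in range(machines):
            # Assumed model: preparation before the first job, setup afterwards.
            wait = prep[k] if prev is None else setup[k][prev][job]
            start = max(done, free_at[k] + wait)
            done = start + proc[job][k]
            free_at[k] = done
        prev = job
    return free_at[-1] if seq else 0.0


def exhaustive_optimum(proc, setup, prep):
    """Best (makespan, sequence) over all n! job permutations; small n only."""
    jobs = range(len(proc))
    return min((makespan(s, proc, setup, prep), s) for s in permutations(jobs))


def relative_error(mk, op):
    """Assumed RE definition: percentage gap between makespan mk and optimum op."""
    return 100.0 * (mk - op) / op


# Hypothetical 2-job, 2-machine instance (not taken from the paper's tables).
proc = [[3, 2], [1, 4]]
prep = [1, 2]
setup = [[[0, 2], [1, 0]], [[0, 1], [0, 0]]]
best_value, best_seq = exhaustive_optimum(proc, setup, prep)
gap = relative_error(makespan((0, 1), proc, setup, prep), best_value)
```

<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Enumerating all permutations is only practical for small instances such as those used here, which is why the instance sizes were limited to at most 10 jobs.</font></p>     ]]></body>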
<body><![CDATA[<p><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><B>CONCLUSION AND PERSPECTIVES</B></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">We implemented an algorithm based on Reinforcement Learning, known as Q-Learning. This algorithm was adapted to the FSSP with sequence-dependent setup times and evaluated on ten test cases of this problem. The algorithm provides good job sequences for the test cases. The obtained results lead to the following conclusions:</font></p> <ul>       <li>         <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">This approach constitutes an interesting alternative for solving complex mathematical problems.</font></p>   </li>       <li>         <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The Q-Learning adaptation for the FSSP with setup times between jobs and initial preparation times of machines yielded good results with respect to the optimal values for the instance set.</font></p>   </li>       <li>         <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">We are currently studying the main parameters of the QL algorithm. We may also add a new reward function to our learning algorithm in order to construct alternative solutions, and adapt other methods, such as NEH, GA and PSO, to generate initial solutions. At the same time, we are considering other real-world constraints and larger benchmarks. </font></p>   </li>     </ul>     <p>&nbsp;</p>     ]]></body>
<body><![CDATA[<p align="left"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><B>REFERENCES</B></font>     <!-- ref --><p><font size="2"><a><font face="Verdana, Arial, Helvetica, sans-serif">Akhshabi, M. and Khalatbari, J. Solving flexible job-shop scheduling problem using clonal selection algorithm. Indian Journal of Science and Technology<em>, </em>2011, 10(4): p. 1248-1251.    </font></a></font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>&Aacute;lvarez, M.; Toro, E., et al. Simulated Annealing Heuristic For Flow Shop Scheduling Problems. Scientia et Technica<em>, </em>2008, XIV(40): p. 159-164.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Anc&acirc;u, M. On Solving Flow Shop Scheduling Problems. Proceedings of the Romanian Academy<em>, </em>2012, 13(1): p. 71-79.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Anurag, A.; Selcuk, C., et al. Improvement heuristic for the flow-shop scheduling problem: An adaptive-learning approach. European Journal of Operational Research<em>, </em>2006, 169: p. 801-815.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Bargaoui, H. and Belkahla, O. Multi-Agent Model based on Tabu Search for the Permutation Flow Shop Scheduling Problem. Advances in Distributed Computing and Artificial Intelligence Journal<em>, </em>2014, 3(1): p. 29-38.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Beasley, J. E. (1990). OR-Library. Retrieved January 14, 2014, from </a><a href="http://people.brunel.ac.uk/~mastjjb/jeb/info.html">http://people.brunel.ac.uk/~mastjjb/jeb/info.html</a> </font><!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Van Vreckem, B.; Borodin, D., et al. 
A Reinforcement Learning Approach to Solving Hybrid Flexible Flowline Scheduling Problems. In: 6th Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA). Gent, Belgium: 2013, p. 402-409.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Betul, Y. and Mehmet Mutlu, Y. Ant colony optimization for multi-objective flow shop scheduling problem. Computers &amp; Industrial Engineering, 2008, 54: p. 411-420.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Betul, Y. and Mehmet Mutlu, Y. A multi-objective ant colony system algorithm for flow shop scheduling problem. Expert Systems with Applications<em>, </em>2010, 37: p. 1361-1368.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Brucker, P. Scheduling Algorithms. Berlin, Springer-Verlag, 2007, 378.    </a></font></p>     ]]></body>
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>&#268;i&#269;kov&aacute;, Z. and &Scaron;tevo, S. Flow Shop Scheduling using Differential Evolution. Management Information Systems<em>, </em>2010, 5(2): p. 008-013.</a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Chaudhry, I. A. and Munem khan, A. Minimizing makespan for a no-wait flowshop using genetic algorithm. Sadhana<em>, </em>2012, 36(6): p. 695-707.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Doulabi, S. H. H.; Jaafari, A. A., et al. Minimizing weighted mean flow time in open shop scheduling with time-dependent weights and intermediate storage cost. International Journal on Computer Science and Engineering, 2010, 2(3): p. 457-460.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Fonseca, Y.; Mart&iacute;nez, Y., et al. Behavior of the main parameters of the Genetic Algorithm for Flow Shop Scheduling Problems. Revista Cubana de Ciencias Inform&aacute;ticas<em>, </em>2014, 8(1): p. 99-111.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Framinan, J. M.; Leisten, R., et al. Efficient heuristics for flowshop sequencing with objectives of makespan and flowtime minimization. European Journal of Operational Research<em>, </em>2002, 141: p. 561-571.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Gabel, T. and Riedmiller, M. On a Successful Application of Multi-Agent Reinforcement Learning to Operations Research Benchmarks. In: IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. Honolulu, USA.: I. Press, 2007, p. 68-75.    
</a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Garey, M. R.; Johnson, D. S., et al. The Complexity of Flowshop and Jobshop Scheduling. Mathematics of Operations Research<em>, </em>1976, 1(2): p. 117-129.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Johnson, S. M. Optimal two and three stage production schedules with setup times included. Naval Research Logistics Quarterly<em>, </em>1954, 1: p. 402-452.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Kubiak, W.; Blazewicz, J., et al. Two-machine flowshop with limited machine availability. Eur. J. Oper. Res<em>, </em>2002, 136: p. 528-540.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Li, X.; Baki, M. F., et al. Flow shop scheduling to minimize the total completion time with a permanently present operator: Models and ant colony optimization metaheuristic. Computers &amp; Operations Research<em>, </em>2011, 38: p. 152-164.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Ling Wang, L.; Zhang, L., et al. An effective hybrid genetic algorithm for flow shop scheduling with limited buffers. Computers &amp; Operations Research<em>, </em>2006, 33: p. 2960-2971.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Mart&iacute;nez, Y. A Generic Multi-Agent Reinforcement Learning Approach for Scheduling Problems<em>.</em> PhD Thesis, Vrije Universiteit Brussel, Brussel, 2012.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Mehmet, Y. and Betul, Y. Multi-objective permutation flow shop scheduling problem: Literature review, classification and current trends. Omega<em>, </em>2014, 45: p. 119-135.    
</a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Moriarty, D.; Schultz, A., et al. Evolutionary Algorithms for Reinforcement Learning. Journal of Artificial Intelligence Research<em>, </em>1999, 11: p. 241-276.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Nagar, A.; Heragu, S., et al. A branch and bound approach for two-machine flowshop scheduling problem. Journal of the Operational Research Society<em>, </em>1995, 46: p. 721-734.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Nawaz, M.; Enscore, E., et al. A heuristic algorithm for the m-machine, n-job flowshop sequencing problem. OMEGA - The International Journal of Management Science<em>, </em>1983, 11(1): p. 91-95.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Parviz, F.; Seyed Mohammad, H. H., et al. A branch and bound algorithm for hybrid flow shop scheduling problem with setup time and assembly operations. Applied Mathematical Modelling<em>, </em>2014, 38: p. 119-134.    </a> </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Pinedo, M. Scheduling Theory, Algorithms, and Systems. New Jersey, Prentice Hall Inc., 2008, 586.</a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Quan-Ke, P.; Fatih, M. T., et al. A discrete particle swarm optimization algorithm for the no-wait flowshop scheduling problem. Computers and Operations Research<em>, </em>2008, 35(9): p. 2807-2839.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Rahimi-Vahed, A. and SM., M. A multi-objective particle swarm for a flowshop scheduling problem. Journal of Combinatorial Optimization<em>, </em>2007, 13(1): p. 79-102.    
</a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Rajendran, C. and Ziegler, H. Ant-colony algorithms for permutation flowshop scheduling to minimize makespan/total flowtime of jobs. European Journal of Operational Research<em>, </em>2004, 115: p. 426-438.    </a> </font></p>     ]]></body>
<body><![CDATA[<!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Ramezanian, R.; Aryanezhad, M. B., et al. A Mathematical Programming Model for Flow Shop Scheduling Problems for Considering Just in Time Production. International Journal of Industrial Engineering &amp; Production Research<em>, </em>2010, 21(2): p. 97-104.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Reeves, C. R. A genetic algorithm for flowshop sequencing. Computers &amp; Operations Research<em>, </em>1995, 22(1): p. 5-13.    </a></font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>R&iacute;os-Mercado, Z. An enhanced TSP-based heuristic for makespan minimization in a Flowshop with setup times. Journal of Heuristics<em>, </em>1999, 5(1): p. 57-74.    </a></font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>R&iacute;os-Mercado, Z. Secuenciando &oacute;ptimamente l&iacute;neas de flujo en sistemas de manufactura. Revista de Ingenier&iacute;as<em>, </em>2001, IV(10): p. 48-67.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Ruiz, R. and Maroto, C. A comprehensive review and evaluation of permutation flowshop heuristics. European Journal of Operational Research<em>, </em>2005, 64: p. 278-275.    </a> </font></p>     ]]></body>
<body><![CDATA[<!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Sadegheih, A. Scheduling problem using genetic algorithm, simulated annealing and the effects of parameter values on GA performance. Applied Mathematical Modelling<em>, </em>2006, 30: p. 147-154.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Say&#305;n, S. and Karabat&#305;, S. A bicriteria approach to the two-machine flowshop scheduling problem. European Journal of Operational Research<em>, </em>1999, 112: p. 435-449.</a></font><!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>&Scaron;eda, M. Mathematical Models of Flow Shop and Job Shop Scheduling Problems. World Academy of Science, Engineering and Technology<em>, </em>2007, 1(31): p. 122-127.    </a> </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Seido Naganoa, M.; Almeida da Silva, A., et al. A new evolutionary clustering search for a no-wait flow shop problem with set-up times. Engineering Applications of Artificial Intelligence<em>, </em>2012, 25: p. 1114&ndash;1120.</a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Su&aacute;rez, Y. Soluci&oacute;n al problema de secuenciaci&oacute;n en m&aacute;quinas paralelas utilizando Aprendizaje Reforzado. Universidad Central de las Villas, Villa Clara, 2010.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Sutton, R. and Barto, A. Reinforcement Learning (An Introduction). Cambridge, Massachusetts, The MIT Press, 1998, 312.    </a> </font></p>     ]]></body>
<body><![CDATA[<!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Taillard, E. Benchmarks for basic scheduling problems. European Journal of Operational Research<em>, </em>1993, 64(2): p. 278-285.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Tasgetiren, M. F.; Liang, Y. C., et al. A particle swarm optimization algorithm for makespan and total flowtime minimization in the permutation flowshop sequencing problem. European Journal of Operational Research<em>, </em>2007, 177: p. 1930-1947.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Tavares-Neto, R. F. and Godinho-Filho, M. An ant colony optimization approach to a permutational flowshop scheduling problem with outsourcing allowed. Computers &amp; Operations Research, 2011, 38: p. 1286-1293.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Toro, M.; Restrepo, G., et al. Adaptaci&oacute;n de la t&eacute;cnica de Particle Swarm al problema de secuenciaci&oacute;n de tareas. Scientia et Technica UTP<em>, </em>2006, XII(32): p. 307-313.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Toro, M.; Restrepo, G. Y., et al. Algoritmo gen&eacute;tico modificado aplicado al problema de secuenciamiento de tareas en sistemas de producci&oacute;n lineal - Flow Shop. Scientia et Technica<em>, </em>2006b, XII(30): p. 285-290.    </a> </font></p>     ]]></body>
<body><![CDATA[<!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Tsitsiklis, J. Asynchronous stochastic approximation and Q-learning. Machine Learning<em>, </em>1994, 16: p. 185-202.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Varadharajan, T. and Rajendran, C. A multi-objective simulated-annealing algorithm for scheduling in flowshops to minimize the makespan and total flowtime of jobs. European Journal of Operational Research<em>, </em>2005, 167: p. 772-795.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Watkins, C. Learning from delayed rewards<em>.</em> PhD Thesis, University of Cambridge, 1989.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Watkins, C. and Dayan, P. Technical Note: Q-Learning. Machine Learning, 1992, 8: p. 279-292.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Wu, T.; Ye, N., et al. Comparison of distributed methods for resource allocation. International Journal of Production Research<em>, </em>2005, 43(3): p. 515-536.    </a> </font></p>     ]]></body>
<body><![CDATA[<!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Yamada, T. Studies on Metaheuristics for Jobshop and Flowshop Scheduling Problems<em>.</em> PhD Thesis, Kyoto University, Kyoto, Japan, 2003.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Zhang, Y.; Li, X., et al. Hybrid genetic algorithm for permutation flowshop scheduling problems with total flowtime minimization. European Journal of Operational Research<em>, </em>2009, 196: p. 869-876.    </a> </font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a>Zhang, Y. and Xiaoping, L. Estimation of distribution algorithm for permutation flow shops with total flowtime minimization. Computers &amp; Industrial Engineering<em>, </em>2011, 60: p. 706-718.    </a></font></p>     <p name="_ENREF_1">&nbsp;</p>     <p name="_ENREF_1">&nbsp;</p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Received: 30/11/2015    <br> Accepted: 10/10/2016</font></p>     ]]></body>
<body><![CDATA[ ]]></body><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Akhshabi]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Khalatbari]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Solving flexible job-shop scheduling problem using clonal selection algorithm]]></article-title>
<source><![CDATA[]]></source>
<year>2011</year>
<volume>10</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>1248-1251</page-range></nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Álvarez]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Toro]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Simulated Annealing Heuristic For Flow Shop Scheduling Problems.]]></article-title>
<source><![CDATA[]]></source>
<year>2008</year>
<volume>XIV</volume>
<numero>40</numero>
<issue>40</issue>
<page-range>159-164</page-range></nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ancâu]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[On Solving Flow Shop Scheduling Problems]]></article-title>
<source><![CDATA[]]></source>
<year>2012</year>
<volume>13</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>71-79</page-range></nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Anurag]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Selcuk]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
</person-group>
<source><![CDATA[Improvement heuristic for the flow-shop scheduling problem: An adaptive-learning approach]]></source>
<year>2006</year>
<volume>169</volume>
<page-range>801-815</page-range><publisher-name><![CDATA[European Journal of Operational Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bargaoui]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Belkahla]]></surname>
<given-names><![CDATA[O]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Multi-Agent Model based on Tabu Search for the Permutation Flow Shop Scheduling Problem.]]></article-title>
<source><![CDATA[]]></source>
<year>2014</year>
<volume>3</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>29-38</page-range></nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Beasley]]></surname>
<given-names><![CDATA[J. E]]></given-names>
</name>
</person-group>
<source><![CDATA[OR-Library]]></source>
<year>1990</year>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Van Vreckem]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
<name>
<surname><![CDATA[Borodin]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<source><![CDATA[Reinforcement Learning Approach to Solving Hybrid Flexible Flowline Scheduling Problems.]]></source>
<year>2013</year>
<page-range>402-409</page-range><publisher-loc><![CDATA[Gent]]></publisher-loc>
</nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Betul]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Mehmet Mutlu]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
</person-group>
<source><![CDATA[Ant colony optimization for multi-objective flow shop scheduling problem.]]></source>
<year>2008</year>
<volume>54</volume>
<page-range>411-420</page-range><publisher-name><![CDATA[Computers & Industrial Engineering]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Betul]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Mehmet Mutlu]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
</person-group>
<source><![CDATA[A multi-objective ant colony system algorithm for flow shop scheduling problem.]]></source>
<year>2010</year>
<volume>37</volume>
<page-range>1361-1368</page-range></nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Brucker]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
</person-group>
<source><![CDATA[Scheduling Algorithms]]></source>
<year>2007</year>
<page-range>378</page-range><publisher-loc><![CDATA[Berlin]]></publisher-loc>
<publisher-name><![CDATA[Springer-Verlag]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[&#268;i&#269;ková]]></surname>
<given-names><![CDATA[Z]]></given-names>
</name>
<name>
<surname><![CDATA[&#352;tevo]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Flow Shop Scheduling using Differential Evolution]]></article-title>
<source><![CDATA[]]></source>
<year>2010</year>
<volume>5</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>008-013</page-range></nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chaudhry]]></surname>
<given-names><![CDATA[I. A]]></given-names>
</name>
<name>
<surname><![CDATA[Munem khan]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Minimizing makespan for a no-wait flowshop using genetic algorithm]]></article-title>
<source><![CDATA[]]></source>
<year>2012</year>
<volume>36</volume>
<numero>6</numero>
<issue>6</issue>
<page-range>695-707</page-range></nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Doulabi]]></surname>
<given-names><![CDATA[S. H. H]]></given-names>
</name>
<name>
<surname><![CDATA[Jaafari]]></surname>
<given-names><![CDATA[A. A]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Minimizing weighted mean flow time in open shop scheduling with time-dependent weights and intermediate storage cost]]></article-title>
<source><![CDATA[]]></source>
<year>2010</year>
<volume>2</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>457-460</page-range></nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Fonseca]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Martínez]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Behavior of the main parameters of the Genetic Algorithm for Flow Shop Scheduling Problems]]></article-title>
<source><![CDATA[]]></source>
<year>2014</year>
<volume>8</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>99-111</page-range></nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Framinan]]></surname>
<given-names><![CDATA[J. M]]></given-names>
</name>
<name>
<surname><![CDATA[Leisten]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
</person-group>
<source><![CDATA[Efficient heuristics for flowshop sequencing with objectives of makespan and flowtime minimization.]]></source>
<year>2002</year>
<volume>141</volume>
<page-range>561-571</page-range><publisher-name><![CDATA[European Journal of Operational Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Gabel]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[Riedmiller]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[On a Successful Application of Multi-Agent Reinforcement Learning to Operations Research Benchmarks]]></article-title>
<source><![CDATA[]]></source>
<year>2007</year>
<page-range>68-75</page-range><publisher-name><![CDATA[I. Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Garey]]></surname>
<given-names><![CDATA[M. R]]></given-names>
</name>
<name>
<surname><![CDATA[Johnson]]></surname>
<given-names><![CDATA[D. S]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[The Complexity of Flowshop and Jobshop Scheduling]]></article-title>
<source><![CDATA[]]></source>
<year>1976</year>
<volume>1</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>117-129</page-range></nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Johnson]]></surname>
<given-names><![CDATA[S. M]]></given-names>
</name>
</person-group>
<source><![CDATA[Optimal two and three stage production schedules with setup times included]]></source>
<year>1954</year>
<volume>1</volume>
<page-range>402-452</page-range><publisher-name><![CDATA[Naval Research Logistics Quarterly]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kubiak]]></surname>
<given-names><![CDATA[W]]></given-names>
</name>
<name>
<surname><![CDATA[Blazewicz]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<source><![CDATA[Two-machine flowshop with limited machine availability.]]></source>
<year>2002</year>
<volume>136</volume>
<page-range>528-540</page-range></nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
<name>
<surname><![CDATA[Baki]]></surname>
<given-names><![CDATA[M. F]]></given-names>
</name>
</person-group>
<source><![CDATA[Flow shop scheduling to minimize the total completion time with a permanently present operator: Models and ant colony optimization metaheuristic]]></source>
<year>2011</year>
<volume>38</volume>
<page-range>152-164</page-range><publisher-name><![CDATA[Computers & Operations Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B21">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
</person-group>
<source><![CDATA[An effective hybrid genetic algorithm for flow shop scheduling with limited buffers.]]></source>
<year>2006</year>
<volume>33</volume>
<page-range>2960-2971</page-range><publisher-name><![CDATA[Computers & Operations Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B22">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Martínez]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
</person-group>
<source><![CDATA[A Generic Multi-Agent Reinforcement Learning Approach for Scheduling Problems]]></source>
<year>2012</year>
<publisher-loc><![CDATA[Brussel]]></publisher-loc>
<publisher-name><![CDATA[Vrije Universiteit Brussel]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B23">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mehmet]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Betul]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
</person-group>
<source><![CDATA[Multi-objective permutation flow shop scheduling problem: Literature review, classification and current trends]]></source>
<year>2014</year>
<volume>45</volume>
<page-range>119-135</page-range><publisher-name><![CDATA[Omega]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B24">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Moriarty]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[Schultz]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<source><![CDATA[Evolutionary Algorithms for Reinforcement Learning.]]></source>
<year>1999</year>
<volume>11</volume>
<page-range>241-276</page-range><publisher-name><![CDATA[Journal of Artificial Intelligence Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B25">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nagar]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Heragu]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<source><![CDATA[A branch and bound approach for two-machine flowshop scheduling problem.]]></source>
<year>1995</year>
<volume>46</volume>
<page-range>721-734</page-range><publisher-name><![CDATA[Journal of the Operational Research Society]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B26">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nawaz]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Enscore]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A heuristic algorithm for the m-machine, n-job flowshop sequencing problem]]></article-title>
<source><![CDATA[]]></source>
<year>1983</year>
<volume>11</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>91-95</page-range></nlm-citation>
</ref>
<ref id="B27">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Parviz]]></surname>
<given-names><![CDATA[F]]></given-names>
</name>
<name>
<surname><![CDATA[Seyed Mohammad]]></surname>
<given-names><![CDATA[H. H]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A branch and bound algorithm for hybrid flow shop scheduling problem with setup time and assembly operations]]></article-title>
<source><![CDATA[]]></source>
<year>2014</year>
<volume>38</volume>
<page-range>119-134</page-range></nlm-citation>
</ref>
<ref id="B28">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pinedo]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Scheduling Theory, Algorithms, and Systems]]></source>
<year>2008</year>
<page-range>586</page-range><publisher-loc><![CDATA[New Jersey]]></publisher-loc>
<publisher-name><![CDATA[Prentice Hall Inc]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B29">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Quan-Ke]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Fatih]]></surname>
<given-names><![CDATA[M. T]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A discrete particle swarm optimization algorithm for the no-wait flowshop scheduling problem]]></article-title>
<source><![CDATA[]]></source>
<year>2008</year>
<volume>35</volume>
<numero>9</numero>
<issue>9</issue>
<page-range>2807-2839</page-range></nlm-citation>
</ref>
<ref id="B30">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rahimi-Vahed]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[SM]]></surname>
<given-names><![CDATA[M. A]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Multi-objective particle swarm for a flowshop scheduling problem.]]></article-title>
<source><![CDATA[]]></source>
<year>2007</year>
<volume>13</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>79-102</page-range></nlm-citation>
</ref>
<ref id="B31">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rajendran]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Ziegler]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
</person-group>
<source><![CDATA[Ant-colony algorithms for permutation flowshop scheduling to minimize makespan/total flowtime of jobs]]></source>
<year>2004</year>
<volume>115</volume>
<page-range>426-438</page-range><publisher-name><![CDATA[European Journal of Operational Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B32">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ramezanian]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Aryanezhad]]></surname>
<given-names><![CDATA[M. B]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A Mathematical Programming Model for Flow Shop Scheduling Problems for Considering Just in Time Production]]></article-title>
<source><![CDATA[]]></source>
<year>2010</year>
<volume>21</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>97-104</page-range></nlm-citation>
</ref>
<ref id="B33">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Reeves]]></surname>
<given-names><![CDATA[C. R]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A genetic algorithm for flowshop sequencing]]></article-title>
<source><![CDATA[]]></source>
<year>1995</year>
<volume>22</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>5-13</page-range></nlm-citation>
</ref>
<ref id="B34">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ríos-Mercado]]></surname>
<given-names><![CDATA[Z]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[An enhanced TSP-based heuristic for makespan minimization in a Flowshop with setup times]]></article-title>
<source><![CDATA[]]></source>
<year>1999</year>
<volume>5</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>57-74</page-range></nlm-citation>
</ref>
<ref id="B35">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ríos-Mercado]]></surname>
<given-names><![CDATA[Z]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Secuenciando óptimamente líneas de flujo en sistemas de manufactura.]]></article-title>
<source><![CDATA[]]></source>
<year>2001</year>
<volume>IV</volume>
<numero>10</numero>
<issue>10</issue>
<page-range>48-67</page-range></nlm-citation>
</ref>
<ref id="B36">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ruiz]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Maroto]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
</person-group>
<source><![CDATA[A comprehensive review and evaluation of permutation flowshop heuristics.]]></source>
<year>2005</year>
<volume>64</volume>
<page-range>278-275</page-range><publisher-name><![CDATA[European Journal of Operational Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B37">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sadegheih]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<source><![CDATA[Scheduling problem using genetic algorithm, simulated annealing and the effects of parameter values on GA performance.]]></source>
<year>2006</year>
<volume>30</volume>
<page-range>147-154</page-range></nlm-citation>
</ref>
<ref id="B38">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Say&#305;n]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Karabat&#305;]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<source><![CDATA[A bicriteria approach to the two-machine flowshop scheduling problem]]></source>
<year>1999</year>
<volume>112</volume>
<page-range>435-449</page-range><publisher-name><![CDATA[European Journal of Operational Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B39">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[&#352;eda]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Mathematical Models of Flow Shop and Job Shop Scheduling Problems.]]></article-title>
<source><![CDATA[]]></source>
<year>2007</year>
<volume>1</volume>
<numero>31</numero>
<issue>31</issue>
<page-range>122-127</page-range><publisher-name><![CDATA[Engineering and Technology]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B40">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Seido Nagano]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Almeida da Silva]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<source><![CDATA[A new evolutionary clustering search for a no-wait flow shop problem with set-up times]]></source>
<year>2012</year>
<volume>25</volume>
<page-range>1114-1120</page-range><publisher-name><![CDATA[Engineering Applications of Artificial Intelligence]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B41">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Suárez]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
</person-group>
<source><![CDATA[Solución al problema de secuenciación en máquinas paralelas utilizando Aprendizaje Reforzado Universidad Central de las Villas]]></source>
<year>2010</year>
<publisher-loc><![CDATA[Villa Clara]]></publisher-loc>
</nlm-citation>
</ref>
<ref id="B42">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sutton]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Barto]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<source><![CDATA[Reinforcement Learning (An Introduction).]]></source>
<year>1998</year>
<page-range>312</page-range><publisher-loc><![CDATA[Cambridge, Massachusetts]]></publisher-loc>
<publisher-name><![CDATA[The MIT Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B43">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Taillard]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Benchmarks for basic scheduling problems.]]></article-title>
<source><![CDATA[]]></source>
<year>1993</year>
<volume>64</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>278-285</page-range></nlm-citation>
</ref>
<ref id="B44">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tasgetiren]]></surname>
<given-names><![CDATA[M. F]]></given-names>
</name>
<name>
<surname><![CDATA[Liang]]></surname>
<given-names><![CDATA[Y. C]]></given-names>
</name>
</person-group>
<source><![CDATA[A particle swarm optimization algorithm for makespan and total flowtime minimization in the permutation flowshop sequencing problem.]]></source>
<year>2007</year>
<volume>177</volume>
<page-range>1930-1947</page-range><publisher-name><![CDATA[European Journal of Operational Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B45">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tavares-Neto]]></surname>
<given-names><![CDATA[R. F]]></given-names>
</name>
<name>
<surname><![CDATA[Godinho-Filho]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[An ant colony optimization approach to a permutational flowshop scheduling problem with outsourcing allowed]]></source>
<year>2011</year>
<volume>38</volume>
<page-range>1286-1293</page-range><publisher-name><![CDATA[Computers & Operations Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B46">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Toro]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Restrepo]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
</person-group>
<source><![CDATA[Adaptación de la técnica de Particle Swarm al problema de secuenciación de tareas.]]></source>
<year>2006</year>
<page-range>307-313</page-range><publisher-name><![CDATA[Scientia et Technica UTP]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B47">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Toro]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Restrepo]]></surname>
<given-names><![CDATA[G. Y]]></given-names>
</name>
</person-group>
<source><![CDATA[Algoritmo genético modificado aplicado al problema de secuenciamiento de tareas en sistemas de producción lineal - Flow Shop]]></source>
<year>2006</year>
<month>b</month>
<page-range>285-290</page-range><publisher-name><![CDATA[Scientia et Technica]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B48">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tsitsiklis]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<source><![CDATA[Asynchronous stochastic approximation and Q-learning. Machine Learning]]></source>
<year>1994</year>
<volume>16</volume>
<page-range>185-202</page-range></nlm-citation>
</ref>
<ref id="B49">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Varadharajan]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[Rajendran]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
</person-group>
<source><![CDATA[A multi-objective simulated-annealing algorithm for scheduling in flowshops to minimize the makespan and total flowtime of jobs]]></source>
<year>2005</year>
<volume>167</volume>
<page-range>772-795</page-range><publisher-name><![CDATA[European Journal of Operational Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B50">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Watkins]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
</person-group>
<source><![CDATA[Learning from delayed rewards]]></source>
<year>1989</year>
<publisher-name><![CDATA[University of Cambridge]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B51">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Watkins]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Dayan]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
</person-group>
<source><![CDATA[Technical Note: Q-Learning]]></source>
<year>1992</year>
<volume>8</volume>
<page-range>279-292</page-range></nlm-citation>
</ref>
<ref id="B52">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wu]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[Ye]]></surname>
<given-names><![CDATA[N]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Comparison of distributed methods for resource allocation.]]></article-title>
<source><![CDATA[]]></source>
<year>2005</year>
<volume>43</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>515-536</page-range></nlm-citation>
</ref>
<ref id="B53">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Yamada]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
</person-group>
<source><![CDATA[Studies on Metaheuristics for Jobshop and Flowshop Scheduling Problems]]></source>
<year>2003</year>
<publisher-loc><![CDATA[Kyoto]]></publisher-loc>
<publisher-name><![CDATA[Kyoto University]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B54">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
</person-group>
<source><![CDATA[Hybrid genetic algorithm for permutation flowshop scheduling problems with total flowtime minimization]]></source>
<year>2009</year>
<volume>196</volume>
<page-range>869-876</page-range><publisher-name><![CDATA[European Journal of Operational Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B55">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Xiaoping]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
</person-group>
<source><![CDATA[Estimation of distribution algorithm for permutation flow shops with total flowtime minimization]]></source>
<year>2011</year>
<volume>60</volume>
<page-range>706-718</page-range><publisher-name><![CDATA[Computers & Industrial Engineering]]></publisher-name>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
